Have you ever wanted to play an old Apple II game, get instant access to 3 million books, or browse through early iterations of your favorite websites? You can do all of those things and more at Internet Archive, a nonprofit Internet library founded by Brewster Kahle that exists to “change the content of the Internet from ephemera into enduring artifacts of our political and cultural lives.”
Kahle, who is also the founder of Alexa Internet, has been focused on developing technologies for information discovery since the mid-1980s. In an effort to preserve our digital cultural commons, he has made hundreds of billions of webpages, part of a collection that now totals 16 petabytes of data (in other words, a lot of data), accessible to the public for free.
With his ultimate goal of creating universal access to all knowledge, Kahle is a true visionary in the world of open innovation. I recently had a chance to ask him a few questions about when he decided to archive the Internet, how he addresses the challenge of curation, what the Internet Archive means for the commons, and his hopes for its future.
Jessica Conrad: What originally led you to make the connection between technology and its ability to expand our knowledge commons?
Brewster Kahle: A friend once said to me, “Brewster, you’re an idealist.” I replied, “Yes.” Then he said, “And you’re a technologist.” My reply was the same. Finally he challenged me to describe how I could use technology to make things better, which was not an easy question to answer. I came up with two ideas: enhanced privacy and building the great library. It quickly became clear, however, that it would be too difficult to create affordable privacy solutions for the commons with the technology available at the time. So I turned to the digital library, an idea I’ve been working on ever since.
And focusing on universal access to all knowledge has been a great career choice. By picking a goal that I can never achieve, not only do I have something to work on forever, but other people can also participate. The true beauty of it is that we’re all working toward the same thing, yet we don’t have to work for one another. Even competing organizations can agree on it.
Jessica Conrad: At what point did you decide it would be important to catalog the ephemera of the Internet?
Brewster Kahle: By 1994, I had helped anchor the new World Wide Web by bringing The Wall Street Journal, The New York Times, and the Encyclopedia Britannica online. That’s when I decided we could start building a digital library. So in 1996 I started an organization called Alexa Internet with the goal of cataloging the Internet in such a way that it would be easy to navigate. At the same time, we built the nonprofit Internet Archive, which would become the long-term repository.
Jessica Conrad: What do you archive?
Brewster Kahle: In 1996, we started collecting the World Wide Web with a snapshot every two months. The web now contains 360 billion webpages and it’s growing at a rate of one billion pages a week. Needless to say, it’s huge. Today we work with libraries all over the world to figure out how we should crawl the web and archive it.
By 2000 we started archiving DVD-quality television, 24 hours a day, from Russian, Chinese, and Japanese channels, Al Jazeera, and more. In an effort to help people understand the importance of a global perspective, we made one week of television news available on October 11, 2001. At the time, some people believed Palestinians were dancing in the streets in response to the attacks on September 11. But by watching Palestinian television, you could tell that people weren’t dancing in the streets; in fact, their reaction was quite different. The misunderstanding came from a news channel that had used stock footage from a mid-1990s celebration in Palestine.
I think we now know that news comes with a point of view. You can’t just watch one channel; to think critically, you need to be able to quote, compare, and contrast. But that’s challenging with television because it just blows over, and people make stuff up because nobody can hold them accountable, except for Jon Stewart. At Internet Archive, we want to make everyone into a Jon Stewart research department. So we’ve made it possible to search television news broadcast since the middle of 2009 using closed captions or program transcriptions. Users can also get short clips on the web, or if they want to borrow a whole program, we’ll burn it onto a DVD and lend it out.
That’s the television piece. Then we started archiving movies, music, and books. We’ve got a great live music selection, but we don’t have the classic commercial releases because we haven’t figured out how to make them very accessible. We also have archival footage of movies, but, again, we don’t yet know how to handle the classic Hollywood films. Books, however, are a different story. We’re now digitizing 1,000 books a day, and we have over three million available for free at archive.org. We also lend over 200,000 newer books through a website called openlibrary.org.
Jessica Conrad: That’s a tremendous amount of content. What’s the archive’s capacity?
Brewster Kahle: Our total collection now contains 16 petabytes of data. Digital measurement goes kilobyte, megabyte, gigabyte, terabyte, then petabyte. So the collection is very big, but as long as it remains affordable to grow and maintain, we believe we can keep up.
Jessica Conrad: How do you approach the challenge of curation?
Brewster Kahle: We try to archive everything, but in some cases we have to be selective. For example, we haven’t archived everything on YouTube; instead, we just try to get the important parts of YouTube. This gets into your question of curation. How do we tell what’s important? Any YouTube video that’s linked to from Twitter goes into the archive. That’s our way of selecting YouTube videos.
Jessica Conrad: So if people are talking about a YouTube video on Twitter, then the video is part of our cultural conversation?
Brewster Kahle: That’s the idea. But if someone refers to a YouTube video elsewhere on the web, we’ll take the cue to archive it, too.
Jessica Conrad: What threats do you see to this kind of information sharing? Commodification? Lack of funding?
Brewster Kahle: Corporate control poses the biggest threat. Our 2005 study of World Wide Web usage revealed that the top 10 websites received 20% of all clicks worldwide. That means 10 corporations control 20% of web usage everywhere in the world. I don’t think we’ve seen that level of concentrated power since the Roman Empire.
Now I’ll give you the good news, which is that the web has a long-tail distribution. Even though there’s a great deal of concentration at the high end, the top 100,000 websites are still getting real attention from a worldwide population. There isn’t the same barrier that there used to be for getting your product onto the shelf or your words into the newspaper. If you write something worth reading, people can find it, no matter where they are. There are lots of ways for people to get information.
Even so, the threat of corporate control is real, and we might need to contend with it in the next decade.
Jessica Conrad: Do you see any other threats? Could the Internet ever shut down?
Brewster Kahle: Yes, but I think the way the Internet would “shut down” is by becoming irrelevant. Not to pick on Apple here, but take the iPhone. Apple controls its operating system, and if you try to make any changes, no matter how small, they call it jailbreaking. Clearly it’s not encouraged. What’s more, only Apple-approved applications work on its devices. This ecosystem of closed hardware, closed operating systems, and closed applications is the antithesis of the Internet and World Wide Web. If creative people migrate away from open innovation toward the closed world of corporate control, we will lose something really great.
So we must make it more fun and interesting to be part of the open world. We need to enable people to make money and support themselves, which is something we’re currently not very good at. Ultimately, we have to evolve the open world so that people can choose to stay in it, rather than ending up, out of necessity, inside some large corporation with private security, private contracts, and end-user agreements. For me, that would be total defeat.
Jessica Conrad: How do you think about the Internet Archive in the context of the commons?
Brewster Kahle: We think of the Internet Archive as a home for the commons. In the United States, if you give something to the public (through a charity, for example), not only can you do it for free, but you also get a tax benefit and a pat on the back. This is true everywhere except on the Internet. If you try to make something publicly available on your own website and it gets popular, you end up with a huge bill. We want the Internet Archive to be an alternative for anyone who wants to make something accessible to the cultural world and as permanent as we know how to make it. Our site may not have all of the commercial bells and whistles, but we’re motivated differently. We want to provide unlimited storage and unlimited bandwidth forever for free.
Jessica Conrad: What’s your hope for the Internet Archive’s future?