Building the Universal Library

Author: Chris Sherman

Date published: May 17, 2006

Categories: Industry

What will it take for Google or another search engine to truly assemble a library of all of the world’s information? A thought-provoking essay by Wired magazine’s “senior maverick” takes a fascinating look at the challenges.

The various book scanning projects underway throughout the world don’t snare as much media coverage as higher-profile products and services introduced by the search engines, but they’re nonetheless important initiatives. As Wired co-founder Kevin Kelly writes in a recent New York Times Magazine article, “The dream is an old one: to have in one place all knowledge, past and present. All books, all documents, all conceptual works, in all languages.”

Building a Universal Library is a huge undertaking, and not just because the physical effort of scanning tens of millions of books is in itself such a massive task. Once scanned, the books must be indexed and made searchable, all the while respecting the copyrights of books not yet in the public domain.

Kelly offers some interesting stats about the current progress of various large-scale book scanning projects that we’ve written about at Search Engine Watch, such as Google Print, the Yahoo- and Microsoft-backed Open Content Alliance, the Internet Archive’s Million Books Project and others.

He says these projects are scanning about a million books a year. Although this sounds like an impressive pace, it amounts to just 5% of all books currently in print. Fortunately, much of the new information created by humans is now in digital format, so it can more easily be included in the Universal Library without the extensive physical effort of scanning books.

And let’s not forget the web. Although the search engines have become fairly proficient at creating comprehensive indexes of the surface web, they’re still missing massive amounts of content located in databases or other dynamic sources (the Invisible web)—not to mention web pages that have disappeared.

“The grand library naturally needs a copy of the billions of dead Web pages no longer online and the tens of millions of blog posts now gone—the ephemeral literature of our time.”

Including this “ephemeral literature” could prove to be a major challenge. Various studies have put the “half-life” of an average web page at just under two years, with the half-life of a typical web site being just over two years.
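To get a feel for what a two-year half-life implies, here is a minimal sketch. It assumes pages disappear according to a simple exponential decay model and rounds the half-life to exactly two years; both are illustrative assumptions rather than figures from the studies, and the function and constant names are hypothetical.

```python
def surviving_fraction(years_elapsed: float, half_life_years: float) -> float:
    """Fraction of pages still online, assuming simple exponential decay."""
    if half_life_years <= 0:
        raise ValueError("half_life_years must be positive")
    return 0.5 ** (years_elapsed / half_life_years)


# Assumption: the half-life is rounded to two years for illustration.
ASSUMED_HALF_LIFE_YEARS = 2.0

for years in (1, 2, 4, 10):
    share = surviving_fraction(years, ASSUMED_HALF_LIFE_YEARS)
    print(f"after {years:>2} years: {share:.1%} of pages remain")
```

Under these assumptions, only about 3 percent of the pages published in a given year would still be online a decade later, which is why an archive has to capture pages while they are live.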

The most complete publicly accessible archive of the web, the Internet Archive, contains just a fraction of all content that has been posted to the web—some 55 billion pages in all.

But I think it’s a fair bet to say that Google and Yahoo haven’t thrown away the pages they’ve crawled through the years. And there’s a precedent for digital restoration on a massive scale: Google’s painstaking effort to build an archive of the Usenet.

Assembling archives stored on magnetic tape, CD-ROM and other sources, Google restored a comprehensive archive of Usenet, dating back to 1981, and made this available to users in December 2001. Although still not totally complete, the renamed Google Groups now likely contains more than 99 percent of all Usenet postings ever made.

It’s not unthinkable that Google and Yahoo, the longest-surviving crawler-based engines, could collaborate to restore a comprehensive archive of the web. Surely there are data archives from search engines now long gone that could also be mined to build out an archive.

Apart from the challenges of simply creating the Universal Library and making it searchable, Kelly thinks the entire paradigm of how we consume information must change. He envisions the emergence of Wikipedia-like directories where fans of particular types of information can write reviews, or create pointers to obscure works for other fans. In essence, we will all become librarians in the Universal Library, helping each other navigate the vast amount of information that’s difficult for us to cope with today.

And, just as we do with our digital music now, we’ll be able to mix and mash content to create “playlists” (Kelly calls them “bookshelves”) to share with others.

Ah, but what about copyright? How can we create mashups without violating existing laws? Kelly spends a lot of time analyzing the current state of copyright law and how it poses a major barrier to the creation and fluid operation of the Universal Library.

These are just a few of the topics Kelly touches on in “Scan This Book!”, his terrific intellectual romp through the issues surrounding a Universal Library. It’s a fascinating and thoughtful read, well worth the time of anyone who consumes a lot of digital information and is impatiently awaiting the arrival of the Universal Library.

Search Headlines

NOTE: Article links often change. In case of a bad link, use the publication’s search facility, which most have, and search for the headline.

Yahoo: Our ads are better…
CNET News.com May 17 2006 9:31PM GMT

The next Yahoo: social search, user content…
InfoWorld May 17 2006 9:28PM GMT

Yahoo sees no financial gain from ad system in ’06…
Reuters May 17 2006 7:11PM GMT

ShopWiki Launches Mobile Shopping Search Engine…
Electronic Commerce Guide May 17 2006 6:55PM GMT

Google, Microsoft and Adobe – The battle for the new operating system…
ZDNet May 17 2006 4:58PM GMT

Yahoo to Personalize Search…
Red Herring May 17 2006 4:48PM GMT

DoD to use Microsoft Virtual Earth; Google China controversy continues – 05/12/2006…
ITworld.com May 17 2006 2:29PM GMT

Google is most loved brand…
Guardian Unlimited reg May 17 2006 10:32AM GMT

Riya heading into Web search…
ZDNet May 17 2006 6:17AM GMT

Digging into Google Notebook javascript…
ZDNet May 17 2006 5:36AM GMT

What’s Next for Yahoo?…
iMedia Connection May 17 2006 5:05AM GMT

Google fine-tunes video service…
CNET News.com May 17 2006 4:50AM GMT

Study Finds RSS Ad CTR Leveling Off…
ClickZ Today May 17 2006 4:08AM GMT

Is Google God?…
Time May 17 2006 12:44AM GMT

The Aristocracy Of Relevance…
Media Post May 16 2006 9:57PM GMT
