How to Build Your Own Search Engine

Want a detailed glimpse into the black boxes we call search engines? Mining the Web is a textbook that discusses everything from building your own crawler to the future of information finding on the web.

Search engines are designed to be simple to use. Type a few words into a query box and, voilà, you’re presented with a set of results likely to match your information need.

This simplicity masks some heavy-duty complexity. Although we refer to a “search engine” in the singular, Google, Teoma, AlltheWeb and others are actually software systems made up of a number of components, each specialized and tuned to perform a specific function that contributes to the whole.
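To make the idea of cooperating components concrete, here is a minimal sketch of two of them: an indexer that builds an inverted index, and a query processor that looks terms up in it. It is not taken from the book or from any commercial engine. The page names, the sample text and the function names are all hypothetical, and the hard-coded `pages` dictionary stands in for whatever a crawler component would actually fetch.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a real engine would do far more."""
    return text.lower().split()


def build_index(pages):
    """Indexer component: map each term to the set of page IDs containing it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for term in tokenize(text):
            index[term].add(page_id)
    return index


def search(index, query):
    """Query component: return the IDs of pages containing every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    # index.get() avoids creating empty entries for unknown terms.
    matches = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(matches)


# Hypothetical stand-in for the output of a crawler component.
pages = {
    "a.html": "search engines crawl the web",
    "b.html": "crawlers fetch pages from the web",
    "c.html": "an index maps words to pages",
}

index = build_index(pages)
print(search(index, "the web"))  # ['a.html', 'b.html']
```

Even this toy version shows the division of labor: the index is built once, ahead of time, so that each query only has to intersect a few precomputed sets rather than scan every page.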

Mining the Web: Discovering Knowledge from Hypertext Data is one of the first books that actually describes, in detail, the parts of contemporary search engines and how they function. The author, Soumen Chakrabarti, is an assistant professor of computer science and engineering at the Indian Institute of Technology in Bombay, and the book offers a rare glimpse into the inner workings of our favorite search tools.

Most commercial search engines guard the details of their innermost operations closely, revealing casual hints here and offhand remarks there, but almost never offering complete information about the “secret sauce” underlying their operations.

That’s what makes this book so interesting. If you really want to understand how search engines work, this book provides an excellent and fairly detailed explanation of the processes they all use, to one degree or another.

The book’s not for the technically faint of heart, however. It assumes a good working knowledge of math, logic and computer science, and the book is dense with formulae and graphs. But don’t let that scare you — Dr. Chakrabarti writes clearly, and the book is well organized, progressing logically from topic to topic.

Even if you find technical language challenging, skimming past the details will leave you with a good fundamental understanding of search engine technology.

The book begins with an introduction to search engine technology. Subsequent chapters deal with crawling the web, search and information retrieval, and basic relevance algorithms. The second part of the book is dedicated to machine learning — how search engines can be engineered to get “smarter” about processing queries and returning better results.

Part three shifts gears, focusing on practical techniques and applications of search engine technology. Here’s where Dr. Chakrabarti really gives us a peek behind the curtain, talking about the differences between Google’s PageRank algorithm and some of the techniques used by other commercial search engines to differentiate themselves from one another.
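For readers curious what a link-based ranking looks like in practice, below is a minimal sketch of PageRank in its commonly described power-iteration form. It is an illustration only, not the book's presentation and certainly not Google's production system. The four-page link graph is hypothetical, and the damping factor of 0.85 and the fixed 50 iterations are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    `links` maps each page to the list of pages it links to. Every link
    target must also appear as a key (an assumption of this sketch).
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # Split this page's damped rank evenly among its outlinks.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: C is linked to by A, B and D.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}

ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this graph, page C ends up with the highest score because three pages link to it, while D, which nothing links to, keeps only the baseline share. That is the core intuition: a page matters when other pages that matter point to it.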

The last chapter takes a look at the future of web mining, offering tantalizing glimpses of what we can expect over the next few y
