Basic SE infrastructure question


bigshark
11-27-2004, 05:41 PM
I am interested in developing a structured data search engine and would like to know what you guys think of the following basic infrastructure:

The db would have a max of about 200,000 records.

Windows Server 2003
MySQL
PHP

Those I have consulted with have informed me that MySQL is too small to provide lightning-fast search results and that Windows servers are basically no good for search.

Please help. What are my options, and what are your comments and thoughts?

Thanks very much for your time!

bigshark
11-28-2004, 02:02 PM
Never mind about the operating system. After checking with Netcraft, it appears that just about all search engines run on Linux... even beta.search.msn, which runs on AkamaiGHost, which we all know uses Linux.


Any ideas about the best choice of db to use?

orion
11-30-2004, 02:15 PM
Welcome to the Forums, bigshark. Please feel at home.

You may want to check Matt Wells's Gigablast architectural approach and its query pros and cons: http://www.gigablast.com/rants.html

For a start-up engine Matt's approach is a sound strategy. I'm not sure if they still have the document that briefly describes their dual pinging between machines.

Orion

iapain
12-19-2004, 04:22 PM
Actually, I used to think that searching in a database was what I needed too. But think about it: can any database handle 8 billion pages? My advice is not to use a database management system at all.
How can you provide FAST search without a database?
Create an index of all the words on your website and hash each word.
In simple words, have you seen the INDEX at the end of a book? Like:
A
Array 2,44,555

Something like the above is called an index: "Array" is a keyword that occurs on page 2, page 44 and page 555.
Store all the words of your website in a similar INDEX and search that INDEX; after searching, fetch the matching pages from the document server or repository.
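
To make the book-index idea concrete, here is a minimal sketch in Python. It is only an illustration under assumptions, not the setup described in this post: the names tokenize, build_index and search and the sample pages are made up, and a plain in-memory dict (a hash table keyed by word) stands in for the hashed index. A real engine would keep the index on disk and add ranking.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of page ids it occurs on (an inverted index)."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in tokenize(text):
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the sorted ids of pages that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    # .get() avoids creating empty entries for words that were never indexed.
    postings = [index.get(word, set()) for word in words]
    return sorted(set.intersection(*postings))


# Hypothetical sample pages, keyed by page number as in the book example.
pages = {
    2: "An array stores items in order.",
    44: "Sorting an array in place.",
    555: "Hash tables versus the array.",
    7: "Linked lists and trees.",
}
index = build_index(pages)
print(search(index, "array"))          # [2, 44, 555]
print(search(index, "array sorting"))  # [44]
```

Each lookup is a hash-table access per query word followed by a set intersection, so the cost depends on the number of query words and the size of their page lists rather than on the total number of pages.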

A search took approx. 0.1 sec on 10,000,000 pages with 2 billion words in the INDEX on an average machine.

About the server and scripting?
Linux is the best for this kind of work, and for scripting... Python and CGI.

Last but not least, RANKING plays an important role, so think about that too!

-Deepak
---------