Newbie needs SE resources help


bigshark
08-26-2004, 02:45 PM
I am looking to design a directory with information that I maintain and control. This is basically going to be a type of web directory.

I want to develop an SE that can search the information using keywords associated with it. However, I do not want to simply match keywords; I would like to create a db that associates related words, so that if one word is entered, e.g. 'car', all records containing car, automobile, auto, etc. are returned.

I have only really found resources for designing an SE for the web, which involves indexing web pages. Since I control the information, I am assuming this will be a lot easier.

Can anyone point me in the right direction for resources on creating this type of a keyword topic directory search?

Thanks in advance!

seobook
08-26-2004, 06:36 PM
I would probably look for lexical database (I think that's what it's called) or latent semantic indexing info on some of the major search engines.

bigshark
08-26-2004, 09:45 PM
thanks for the lead.

seobook
08-26-2004, 09:56 PM
no problem
http://javelina.cet.middlebury.edu/lsa/out/cover_page.htm
this is about latent semantic indexing

also, if you control the entire information database (or have a fairly clean one), you will be able to put a ton more weight on "on the page" features than commercial search engines do, and you will probably not need exceptionally sophisticated link analysis algorithms.
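To make the "on the page" point concrete, here is a minimal sketch of field-weighted scoring. Everything in it is an assumption for illustration: the field names (`title`, `keywords`, `body`), the weights, and the helper names `score` and `rank` do not come from the thread or from any particular search engine.

```python
# Hypothetical weights: a match in the title counts more than one in the body.
FIELD_WEIGHTS = {"title": 3.0, "keywords": 2.0, "body": 1.0}


def score(doc, query):
    """Sum weighted counts of exact query-term matches across the document's fields."""
    terms = query.lower().split()
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = doc.get(field, "").lower().split()
        total += weight * sum(words.count(term) for term in terms)
    return total


def rank(docs, query):
    """Return titles of matching documents, highest score first."""
    scored = [(score(doc, query), doc["title"]) for doc in docs]
    scored.sort(key=lambda pair: -pair[0])
    return [title for points, title in scored if points > 0]


docs = [
    {"title": "Travel tips", "keywords": "travel", "body": "renting a car abroad"},
    {"title": "Car rental", "keywords": "car auto", "body": "rent a car today"},
]
print(rank(docs, "car"))
```

Because you control the records, you can trust fields like the title and keywords more than a web-wide engine could, so simple weights like these may go a long way before anything fancier is needed.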

bigshark
08-28-2004, 11:59 AM
Thanks for the article,

So from what I read, it appears that I will be creating what the author describes as a very time-consuming and cost-prohibitive taxonomy.

However, since the amount of searchable data will be relatively small, around 1 million db records at most, I still think it would work. The number would start off much lower, only a couple hundred, and eventually rise to ~1m.

Could someone critique this approach:

1. Create a db table (t1) with a list of categories.
2. Assign specific documents to their corresponding categories in t1. They could be attached to more than one category.
3. Create a second table (t2) with a list of keywords...this could eventually number in the many thousands.
4. Assign keywords from t2 to each relevant category in t1 so that if any of the assigned keywords are entered, all results attached to that category will be returned.
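The steps above can be sketched with SQLite as follows. This is only an illustration under assumptions: the table and column names (`categories` for t1, `keywords` for t2, plus a `documents` table and two link tables), the sample rows, and the helpers `build_demo_db` and `search` are all made up for the example.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical tables and sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT UNIQUE);  -- t1
        CREATE TABLE keywords   (id INTEGER PRIMARY KEY, word TEXT UNIQUE);  -- t2
        CREATE TABLE documents  (id INTEGER PRIMARY KEY, title TEXT);
        -- A document may belong to more than one category.
        CREATE TABLE document_categories (
            document_id INTEGER REFERENCES documents(id),
            category_id INTEGER REFERENCES categories(id),
            PRIMARY KEY (document_id, category_id));
        -- Several keywords (car, automobile, auto) can point at one category.
        CREATE TABLE category_keywords (
            category_id INTEGER REFERENCES categories(id),
            keyword_id  INTEGER REFERENCES keywords(id),
            PRIMARY KEY (category_id, keyword_id));
    """)
    conn.executemany("INSERT INTO categories VALUES (?, ?)",
                     [(1, "Vehicles"), (2, "Insurance")])
    conn.executemany("INSERT INTO keywords VALUES (?, ?)",
                     [(1, "car"), (2, "automobile"), (3, "auto"), (4, "insurance")])
    conn.executemany("INSERT INTO documents VALUES (?, ?)",
                     [(1, "Used sedan listings"), (2, "Auto insurance quotes"),
                      (3, "Home insurance guide")])
    conn.executemany("INSERT INTO document_categories VALUES (?, ?)",
                     [(1, 1), (2, 1), (2, 2), (3, 2)])
    conn.executemany("INSERT INTO category_keywords VALUES (?, ?)",
                     [(1, 1), (1, 2), (1, 3), (2, 4)])
    return conn


def search(conn, term):
    """Return titles of all documents in any category the keyword is assigned to."""
    rows = conn.execute("""
        SELECT DISTINCT d.title
        FROM keywords k
        JOIN category_keywords ck   ON ck.keyword_id = k.id
        JOIN document_categories dc ON dc.category_id = ck.category_id
        JOIN documents d            ON d.id = dc.document_id
        WHERE k.word = ?
        ORDER BY d.title
    """, (term.strip().lower(),)).fetchall()
    return [row[0] for row in rows]


conn = build_demo_db()
print(search(conn, "automobile"))
```

Note that 'car', 'automobile' and 'auto' all return the same documents, because each keyword resolves to a category rather than to text inside the records. The flip side is that every document in a category is returned with equal standing, so some kind of ranking would probably be needed as the table grows.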

I am sure this is too simplistic of an approach but coming from a non-SE development background it seems fairly intuitive. Please feel free to shoot holes through this!

Thanks for the help,