hit list and index


francophilie
11-16-2006, 05:56 PM
I'm happy to be a member of the best forum.

I've been searching the net about two search engine topics, but I can't find enough information about the hit list and the forward and inverted indexes.

I know that each document is converted into a set of word occurrences called hits, but this is not clear to me, and neither is the difference between the forward and inverted indexes. Can any member help?

Thanks

PhilC
11-16-2006, 08:00 PM
http://infolab.stanford.edu/~backrub/google.html

Scroll down to section 4.

francophilie
11-19-2006, 01:25 AM
Two weeks ago I found the original paper by Larry and Sergey, which is the same as this site, and I read it, but I can't understand the terms hit list, forward index and inverted index, so I posted here to get them clarified.

Thanks PhilC

PhilC
11-19-2006, 05:32 AM
Each of those things is explained in that document (sections 4.2.5, 4.2.6 and 4.2.7), complete with diagrams. You are asking for complete explanations to be written again, but it is better to spend time thinking about the document's explanations and trying to understand them, and then, if there is something you don't understand, ask about it.

Just out of interest, why are you asking?

francophilie
11-20-2006, 01:37 PM
I have a project about search engines, and I'm having some problems understanding these topics. Now, after a lot of searching about the inverted index, I get the meaning. But the term hit list still confuses me.
I expect that if I get some hints, I will understand it.

Thanks for your help, PhilC!

PhilC
11-20-2006, 01:56 PM
Ok. Think of a "hit" as an occurrence of a word in a document, and a "hit list" as a list of occurrences of a particular word in a particular document. It is the same in the forward and inverted indexes.

In both indexes, the Google system knows what the word is, and each "hit" in the hit list is simply data about that particular occurrence of that word in a particular document.
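To make that concrete, here is a rough Python sketch of the two indexes. It is a simplification, not how Google stores things: a "hit" here is only a word position, words are plain strings rather than word IDs, and the names `docs`, `build_forward_index` and `invert` are made up for the example. The point is that the same hit lists appear in both indexes; only the lookup order changes (document first, or word first).

```python
from collections import defaultdict

# Two tiny example documents, keyed by a made-up document ID.
docs = {
    1: "the quick brown fox",
    2: "the lazy dog and the fox",
}

def build_forward_index(docs):
    """Forward index: doc ID -> word -> hit list (here, a list of positions)."""
    forward = {}
    for doc_id, text in docs.items():
        hits = defaultdict(list)
        for position, word in enumerate(text.lower().split()):
            hits[word].append(position)  # one hit per occurrence of the word
        forward[doc_id] = dict(hits)
    return forward

def invert(forward):
    """Inverted index: word -> doc ID -> the same hit list."""
    inverted = defaultdict(dict)
    for doc_id, words in forward.items():
        for word, hit_list in words.items():
            inverted[word][doc_id] = hit_list
    return dict(inverted)

forward = build_forward_index(docs)
inverted = invert(forward)

print(forward[2]["the"])  # hit list for "the" in document 2
print(inverted["fox"])    # every document containing "fox", with its hit list
```

The forward index answers "which words are in this document?", while the inverted index answers "which documents contain this word?", which is the question a search query needs answered.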

Each "hit" is 2 bytes long, and contains the following data about the particular occurrence: its capitalization, its font size, and its position in the document.

There are two types of "hits" - fancy hits and plain hits. The exact details of the data for each type can be read in the Stanford document.
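As a hint for the 2-byte layout, here is a small Python sketch of a plain hit. The Stanford document gives a plain hit a capitalization bit, 3 bits of font size and 12 bits of position, and says a font size value of 7 is reserved to flag a fancy hit. The order of the fields inside the 16 bits, the clamping of large positions to 4095, and the function names are assumptions made for this example.

```python
FANCY_FLAG = 7         # font size value reserved to mark a fancy hit
MAX_POSITION = 0xFFF   # largest value that fits in 12 bits (4095)

def pack_plain_hit(capitalized, font_size, position):
    """Pack one plain hit into a 16-bit integer.

    Assumed layout: [1 bit capitalization][3 bits font size][12 bits position].
    """
    if not 0 <= font_size < FANCY_FLAG:
        raise ValueError("plain hits use font sizes 0-6; 7 marks a fancy hit")
    if position < 0:
        raise ValueError("position must be non-negative")
    position = min(position, MAX_POSITION)  # assumption: clamp long documents
    return (int(capitalized) << 15) | (font_size << 12) | position

def unpack_plain_hit(hit):
    """Return (capitalized, font_size, position) from a packed plain hit."""
    return bool(hit >> 15), (hit >> 12) & 0x7, hit & MAX_POSITION

def is_fancy(hit):
    """A hit whose font size field is 7 is a fancy hit, not a plain one."""
    return (hit >> 12) & 0x7 == FANCY_FLAG

hit = pack_plain_hit(capitalized=True, font_size=3, position=42)
print(hex(hit), unpack_plain_hit(hit), is_fancy(hit))
```

A hit list is then just a sequence of these 16-bit values for one word in one document, which is why the whole list stays so compact.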