xan
04-05-2005, 07:32 AM
From a nice and short article from L'Express (http://www.lexpress.mu/display_article_sup.php?news_id=39158):
"The principles of searching a catalogue are different from that of a WWW and the WWW is devoid of cataloguing and classification as opposed to a library system.
“the WWW has created a revolution in the accessibility of information” (NISO, 2004).
Information overload can be defined as “the inability to extract needed knowledge from an immense quantity of information for one of many reasons” (Nelson, 1997).
“The volume of information on the Internet creates more problems than just trying to search an immense collection of data for a small and specific set of knowledge” (Nelson, 1997).
“finding authoritative information on the Web is a challenging problem” (Savoy in Baeza-Yates and Schauble, 2002) as opposed to a library where we would get mostly authoritative information.
“A Web page typically contains various types of materials that are not related to the topic of the Web page” (Yu et. al., 2003). As such, the heterogeneous nature of the Web affects information retrieval. Most of the Web pages would consist of multiple topics and parts such as pictures, animations, logos, advertisements and other such links. “Although traditional documents also often have multiple topics, they are less diverse so that the impact on retrieval performance is smaller” (Yu et. al., 2003). For instance, whilst searching in an OPAC, one won’t find any animations or pictures interfering with the search. "
Other differences not mentioned: the size of web pages varies hugely, the web grows exponentially, the documents are unstructured, the index has to be kept fresh, and content quality varies.
In a digital library the user comes with a clear idea of what they are looking for (a document, an author, a specific subject). On the web that is not so, which makes it necessary to refine the query further.
I think we often forget that web search engines are still very new and that we are still inexperienced, relatively speaking. Nearly all previous work has been in digital libraries and data mining. In fact, the majority of IR work has been done in DLs. We have to modify tried and tested techniques and invent new ones to deal with this new format. This suggests that previous IR work can help us understand what is going on in web work today.