Questions about Ontologies/Taxonomies and Use in Modern Search
randfish
03-10-2005, 04:32 PM
I have been doing considerable reading about ontologies and taxonomies (http://en.wikipedia.org/wiki/Ontology_%28computer_science%29#Anatomy_of_an_Ontology) (classification and subject hierarchy systems) and their use in information retrieval and web search engines.
My questions are:
1. Do modern search engines use a constructed/fixed ontology system for classifying information, topics & communities?
2. If they do, is it dynamic, and built from the search engine's index or is it a human constructed model?
3. Is this an area of knowledge that SEOs (like me) should be digging around in, or is this an impractical topic to focus on?
Thanks much!
orion
03-10-2005, 05:21 PM
Is this an area of knowledge that SEOs (like me) should be digging around in, or is this an impractical topic to focus on?
It won't hurt. I'm all for SEO education.
Orion
1. Do modern search engines use a constructed/fixed ontology system for classifying information, topics & communities?
Yes, search engines do, but perhaps not the ones you mean. Web search engines for the most part don't, and can't really, because of scalability. It's also very hard to enforce a standard.
2. If they do, is it dynamic, and built from the search engine's index or is it a human constructed model?
My question is: why bother constructing an automated ontology on the fly? Information storage is expensive enough as it is, so if you can get what you want automatically, use it and then ditch it. We do anyway.
3. Is this an area of knowledge that SEOs (like me) should be digging around in, or is this an impractical topic to focus on?
It's not impractical, and dig if you so wish, that's all cool. I can't tell you how useful this will be though. It would depend on whether you want to actually build a web ontology or whether you want to understand how existing ones used on the web work (no, I'm not being sarcastic; maybe you do want to build, I don't know). In a highly dynamic environment like this, annotation is laborious, slow, impractical, and therefore quite useless. In a closed environment like engineering publication repositories it's very powerful indeed, and I would always proceed that way; it makes sense. If used for an online search engine... I see problems.
However, ontologies have been used and looked into for personalization:
Pretschner, A. & Gauch, S. Ontology Based Personalized Search (http://www.ittc.ku.edu/obiwan/publications/papers/ictai99-2.pdf)
You'll find them used or studied often in relation to things like that. How about the user creates his/her own ontology for personalized results? There are many ways to use them, but they are not always the best solution.
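To make the "user creates his/her own ontology" idea a bit more concrete, here is a minimal sketch in Python. It is not the method from the paper linked above, just one simple way it could work: results are tagged with a concept from a tiny hand-made hierarchy, and a user profile with interest weights is used to re-rank them. The hierarchy, the profile weights, the decay factor and all function names are made up for illustration.

```python
# Hypothetical toy taxonomy: child concept -> parent concept.
PARENT = {
    "jaguar (car)": "automobiles",
    "jaguar (animal)": "zoology",
    "search engines": "information retrieval",
    "information retrieval": "computer science",
}


def ancestors(concept):
    """Yield the concept itself, then each ancestor, nearest first."""
    while concept is not None:
        yield concept
        concept = PARENT.get(concept)


def profile_score(concept, profile, decay=0.5):
    """Score a concept against a user profile.

    A match found further up the hierarchy is discounted by `decay`
    per level (an assumed, arbitrary choice).
    """
    weight = 1.0
    for node in ancestors(concept):
        if node in profile:
            return weight * profile[node]
        weight *= decay
    return 0.0


def rerank(results, profile):
    """Sort (title, concept) pairs by profile score, highest first."""
    return sorted(results, key=lambda r: profile_score(r[1], profile), reverse=True)


# A user who mostly cares about cars.
profile = {"automobiles": 0.9, "zoology": 0.1}
results = [
    ("Jaguar habitat", "jaguar (animal)"),
    ("Jaguar XK review", "jaguar (car)"),
]
for title, concept in rerank(results, profile):
    print(title, "-", concept)
```

With this profile the car review moves ahead of the habitat page. The hard parts in practice are the ones raised above: who builds and maintains the hierarchy, and who tags the documents.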
I went to the BCS annual meeting this week and we discussed ontologies/taxonomies a fair bit. I wrote a report of what went on and what was discussed. The semantic web is pretty interesting to everyone right now.
You know where the blog is. :)
Also Rand, I found this as well in our bits and pieces:
Peter Norvig: (Mr. Norvig is director of search quality at Google.) [There are] four individual challenges. First is a chicken-and-egg problem: How do we build this information, because what's the point of building the tools unless you got the information, and what's the point of putting the information in there unless you have tools. A friend of mine just asked can I send him all the URLs on the web that have dot-RDF, dot-OWL, and a couple other extensions on them; he couldn't find them all. I looked, and it turns out there's only around 200,000 of them. That's about 0.005% of the web. We've got a ways to go.
The next problem is competing ontologies. Everybody's got a different way to look at it. You have some tools to address it. We'll see how far that will scale. Then the Cyc problem, which is a problem of background knowledge, and the spam problem. That's something I have to face every day. As you get out of the lab and into the real world, there are people who have a monetary advantage to try to defeat you.
So, the chicken-and-egg problem. That's "What interesting information is in these kind of semantic technologies, and where is the other information?" It turns out most of the interesting information is still in text. What we concentrate on is how do you get it out of text. Here's an example of a little demo called IO Knot. You can type a natural language question, and it pulls out documents from text and pulls out semantic entities. And you see, it's not quite perfect—couldn't quite resolve the spelling problem. But this is all automated, so there's no work in putting this information into the right place.
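As a quick sanity check on the numbers in the quote, the snippet below works out what web size the two quoted figures imply together. It only uses the 200,000 and 0.005% figures from the quote; the variable names are made up, and the result is just what the arithmetic implies, not a reported index size.

```python
# Figures taken from the quote above.
rdf_owl_urls = 200_000        # URLs with .rdf, .owl and similar extensions
share_percent = 0.005         # "about 0.005% of the web"

# If 200,000 URLs are 0.005% of the total, the total is 200,000 / 0.00005.
implied_web_size = round(rdf_owl_urls / (share_percent / 100))

print(f"Implied web size: {implied_web_size:,} URLs")
```

So the two figures together correspond to a web of roughly four billion URLs.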
randfish
03-11-2005, 12:33 AM
Xan,
Thanks so much for sharing. It sounds like there is work in this field, but that it's a long way from being something that's implemented in commercial web search. I do find it fascinating though. Perhaps when I've retired from SEO, I'll go into IR. :)
You'd be welcome, I'm sure :)
Blogged lowdown available for you now at .search-science (http://spaces.msn.com/members/search-science/)