#1
Anaphora, complex stemming, etc...
Hi,
I was reading this thread and wondered what you guys thought about it. It's relatively basic but interesting, covering the potential use of anaphora resolution in Google and the "stemming of complex plurals". Just wondered what your thoughts were on that.
MissOpie x
#2
Re: Anaphora, complex stemming, etc...
A lot of good thoughts in that thread and much to think about going forward. But it did get quite off track from the initial topic of stemming complex plurals (as many such threads are wont to do).
Anaphora resolution is basically associating a pronoun with its antecedent. The wiki article gives a good example of its purpose and also an example of the inherent complexity: Quote:
A bit more interesting to me at the moment from that thread is Robbo's example passage: Quote:
Quote:
Quote:
#3
Re: Anaphora, complex stemming, etc...
I'm not so sure that is what is happening. I can't say it's not, of course, but the problem with linguistic methods is... language. Since there is obviously more than one language on the web, linguistic analysis is more limited than you may think.
What I do think might be very possible, even probable, is a lookup table based on related words. Think about it - stemming only gets you so far (fishes to fish). But if you had a lookup table for certain words, you could very easily combine a great many words into a single "concept". An example used in the thread that was referenced earlier was "Nitrous Oxide" and NOS. How about search engine optimization and SEO as another example?
It would be simple to find related terms - Google has access to AdWords data, term vectors, a huge database of sites, a dictionary (the define: command) and no doubt many other sources of information. It would not be difficult to say that if someone searched for "search engine optimization", they should also check "SEO" and "search engine optimisation". It would probably be more efficient than trying to analyse pages and queries linguistically. Just do a lookup. Google is good at lookups.
Think about Google Translate. Take a very close look at how it works. It starts by doing a standard translation. But then it adds phrase lookups based on both user input and on documents that it knows are identical but translated. The result is far better than a dictionary or linguistic based translation wherever the human translation is available.
The wonderful thing about lookup tables is that they are multi-lingual - it doesn't matter what language is being used - it's just a lookup. If I were them, I would then assign a weighting to the lookups, where an exact match was given more weight than an inexact match, and a popular inexact match was given more weight than an unpopular one.
Some concepts that would work great with lookups but not with stemming or linguistics:
Las Vegas, Vegas, LV, Los Vegas
SEO, Search engine optimisation, Search engine optimization
Woman, female, babe, lady
Man, male, guy, dude, gent
And so on.
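The weighted lookup idea above can be sketched roughly as follows. This is only a toy illustration, not a description of how Google actually works: the `CONCEPTS` table, the popularity numbers, and the names `expand_query`, `EXACT_WEIGHT` and `INEXACT_SCALE` are all made up for the example.

```python
# Hypothetical concept table: each entry groups surface forms that should be
# treated as one "concept". The popularity values (0..1) are invented.
CONCEPTS = [
    {"las vegas": 0.9, "vegas": 0.7, "lv": 0.2, "los vegas": 0.1},
    {
        "search engine optimization": 0.9,
        "search engine optimisation": 0.6,
        "seo": 0.8,
    },
]

EXACT_WEIGHT = 1.0   # assumed weight for the term the user actually typed
INEXACT_SCALE = 0.5  # assumed discount applied to every related term


def expand_query(query):
    """Return {term: weight} for the query and its looked-up related terms.

    The exact query always gets the highest weight; related terms are
    discounted, with popular ones weighted above unpopular ones.
    """
    normalized = " ".join(query.lower().split())
    weights = {normalized: EXACT_WEIGHT}
    for concept in CONCEPTS:
        if normalized in concept:
            for term, popularity in concept.items():
                if term != normalized:
                    weights[term] = INEXACT_SCALE * popularity
    return weights


if __name__ == "__main__":
    for term, weight in expand_query("SEO").items():
        print(f"{term}: {weight:.2f}")
```

Note that nothing here depends on the language of the terms: a misspelling such as "los vegas" or a spelling variant such as "optimisation" is just another row in the table, which is the point being made about lookups versus linguistic analysis.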
I would suggest that Occam's razor would support a simple lookup based on Google's vast database of keywords and searches, rather than language-specific, on-demand linguistic analysis of every search and page. It would also deal with common spelling mistakes fairly easily - let linguistic analysis try that!
Ian
__________________
International SEO
#4
Re: Anaphora, complex stemming, etc...
Quote:
Quote:
I do think that this is where things are now. We might not see "Beret" bolded in the snippets when we search for [France], but I'd (almost) bet that if both terms occurred on the same page, a bit more weight would be given to the occurrence of "France."
#5
Re: Anaphora, complex stemming, etc...
Quote:
The problem is that the only way to talk about it easily with non-statisticians (which includes me) is with pictures and graphs (or a drawing on the back of a napkin, which is how Dr. Garcia first explained it to me and one other guy you probably know a couple of years ago). This means I don't discuss it in forums - the last time I tried, it was a disaster. Half the people thought I was crazy, and the other half thought they'd discovered the holy grail of spam. It's neither, and as Dr. Garcia will quickly point out, it is very difficult to do if you are not a search engine. My version is simply "good enough" and better than what my competition uses, but I don't pretend it's perfect.
As for details, it is safe to say that if you type a search term into Google, they can quickly bring up ALL the major terms in every document in their inventory and compare them very quickly. Terms that constantly show up in "natural" texts are then looked for as indicators of a particular document being both on-topic and non-spammy. It's not just things like "poker" being related to "Las Vegas"; it includes the number "800" being a term vector for an online shopping portal. Any guesses why?
Although TVA is a different issue, I do suspect that TVA is a good source for some of those lookup terms. I know I'd look there, if I was a search engine programmer.
Ian
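For readers who prefer code to napkin drawings, here is a very rough sketch of the co-occurrence idea in the post above: find the terms that keep showing up alongside a query term in "natural" texts, then check how many of them a given document contains. It is a simple counting toy, not real term vector analysis, and the function names `cooccurring_terms` and `on_topic_score` and the sample corpus are invented for illustration.

```python
from collections import Counter


def cooccurring_terms(docs, query, top_n=2):
    """Return the top_n terms that most often share a document with `query`."""
    counts = Counter()
    for doc in docs:
        terms = set(doc.lower().split())
        if query in terms:
            # Count each co-occurring term once per document.
            counts.update(terms - {query})
    return [term for term, _ in counts.most_common(top_n)]


def on_topic_score(doc, expected_terms):
    """Fraction of the expected co-occurring terms that appear in `doc`."""
    if not expected_terms:
        return 0.0
    terms = set(doc.lower().split())
    return sum(term in terms for term in expected_terms) / len(expected_terms)


if __name__ == "__main__":
    # Made-up corpus standing in for "natural" texts.
    corpus = [
        "vegas poker casino hotel",
        "vegas poker casino",
        "vegas poker",
        "garden tools shed",
    ]
    expected = cooccurring_terms(corpus, "vegas")
    print(expected)
    print(on_topic_score("vegas poker casino nights", expected))
    print(on_topic_score("vegas vegas vegas cheap", expected))
```

A page that repeats the query term but contains none of the terms that normally accompany it scores low, which is one simple way to read the "on-topic and non-spammy" remark.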
#6
Re: Anaphora, complex stemming, etc...
I think that statistical machine translation, that is, translation using parallel texts, has produced less than adequate translations. In French, I recently had "server" translated as "waiter".
I think a lot more work is now being done on natural language techniques. My view here is that language does not abide by a set number of rules; it's more fluid than that. Lookup techniques use up a lot of space and aren't flexible enough, in my opinion, to deal with things like natural language queries, which are seeing a lot of research activity at the moment (I'm not saying this is going to be used, just that there is activity). I do know that there is interest in natural language generation also, a hugely difficult problem, that one. Personally, of course, I'm not sure what's going on, but it's always interesting to chat about when something worth some banter pops up.
#7
Re: Anaphora, complex stemming, etc...
Ian, yes, much too complicated for folks who aren't Dr. Garcia to try to talk about in forums. But it does have to be discussed to a certain extent, just so folks know there are things going on other than the obvious. The things I mentioned are more along the lines of what I try to keep in the back of my mind as I work, or as I try to understand why I can't understand certain SERPs.