Search Engine Watch

Go Back   Search Engine Watch Forums > General Search Issues > Search Technology & Relevancy

Old 05-10-2005   #1
nanocontext
next generation information services

Join Date: May 2005
Location: Bangalore
Posts: 16
The relevance of "relevance"

For the past few years, since the search engine "race" began, the term "search relevance" has become a buzzword. One question arises: is it even relevant anymore?

It was relevant when the Internet boom happened (as did the bust) and we were grappling with making sense of ever-growing content. For the past five years everyone has talked about semantics, clustering, and natural language processing as ways to increase relevancy. But the question remains...what is the "usability" of search results? By that I mean the quality of the impact they really have on our need.

This is related to my work, which is about modeling the human intellectual framework. When applied to search (a demo can be seen at nanocontext), the focus has been "usability", not relevance. So I was wondering: how relevant is this "relevancy" thing, really? And when are we going to move on to a better way of looking at things?
Old 05-10-2005   #2
orion

Join Date: Jun 2004
Posts: 1,044

Hi, nanocontext

Excellent questions.

In my opinion, relevancy has a lot to do with perception.

1. Which content is relevant according to the user's perception?

2. Which content is relevant according to the scoring functions used by a machine (an IR system or search engine)?

3. Which documents, already scored and prequalified as relevant by a search engine's algorithm, are actually relevant according to the user's perception and to the query that was used?

These three are different questions. I believe Rustybrick tries to answer the last question (Q#3) in The Search Engine Relevancy Challenge.

My take on their experiment is that it has real merit, and I am happy many are paying attention to it. However, I would like to point out that these studies are not new; they have often been carried out in terms of precision versus recall and E measures. In Ricardo Baeza-Yates's "Modern Information Retrieval", Chapter 3, pages 75-79, Ricardo even shows examples, with graphs, of how to evaluate the retrieval performance of different systems.

As mentioned in this article, users tend to assess document relevancy by visually interpreting the information that has been displayed. This is not a linear process. Indeed, the display can be tricked in many forms and shapes: CSS positioning, nested tables, graphics, dynamic effects, and even sound. Even colors, background images, and subliminal techniques (watermarks, upper-scale steganography, etc.) can affect the user's perception of relevancy.

Machine-based implementations, at least most current ones, tend to read the information coded in documents tag by tag, in the order the tags are found in the document. This is a linearized process. Things like word proximity and local frequency as found in the code and as displayed to the user are often mismatched. Thus, the gap between what a machine scores as relevant and what the user perceives as relevant can be evident: the two often do not match.

This is what makes document linearization so important for optimization purposes, in my opinion.
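The tag-by-tag reading described above can be sketched with a toy linearizer. This is a minimal illustration, not any real engine's parser; the nested-table page and the proximity measure below are hypothetical.

```python
from html.parser import HTMLParser

class Linearizer(HTMLParser):
    """Collect text in source (tag-by-tag) order, the way a crawler reads it."""
    def __init__(self):
        super().__init__()
        self.tokens = []
    def handle_data(self, data):
        self.tokens.extend(data.split())

def linearize(html):
    p = Linearizer()
    p.feed(html)
    return p.tokens

def proximity(tokens, a, b):
    """Smallest token distance between terms a and b in the linearized stream."""
    pa = [i for i, t in enumerate(tokens) if t.lower() == a]
    pb = [i for i, t in enumerate(tokens) if t.lower() == b]
    return min(abs(i - j) for i in pa for j in pb)

# Nested-table layout: visually adjacent cells can be far apart in source order.
page = """
<table><tr>
  <td>cheap</td>
  <td><table><tr><td>unrelated sidebar text here</td></tr></table></td>
  <td>flights</td>
</tr></table>
"""
tokens = linearize(page)
# "cheap" and "flights" may render side by side, but the crawler sees
# the sidebar words between them.
print(proximity(tokens, "cheap", "flights"))
```

The linearized stream is what a scoring function sees, which is why the same two terms can be "close" to the eye yet far apart to the machine.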


Orion

Old 05-10-2005   #3
xan
Member
 
Join Date: Feb 2005
Posts: 238
Hi nanocontext,

I've seen your work before, actually; it's good stuff, interesting.

NLP involves identifying a set of concepts and sorting out the interrelationships between them. Sentences are then split into semantic parts. Because language is ambiguous, scores get assigned to words and to concepts.

I think those techniques are always going to be the basis for pre-processing.

As for relevance, well, I think you're right to an extent. Relevance feedback is being tested a lot, and after inconclusive results (meaning we didn't really test it very well!) it's being tried again, particularly in QA systems, where it has shown good results.

Errors do happen in IR systems, often because stemming is applied to queries rather than to indexing tasks. This doesn't really show up in IR scores, but it has a big impact on the user.

You probably know this, but in IR the relevancy score is generally calculated as recall: dividing the number of relevant documents retrieved by the total number of relevant documents (its counterpart, precision, divides by the total number retrieved). This is of course "relevant" only in the way the weighting is calculated.
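The two ratios are easy to pin down in code. A minimal sketch, with a made-up judgment set (docs 1-4 relevant, engine returned docs 1, 2, 7, 9):

```python
def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(retrieved)

def recall(retrieved, relevant):
    """Fraction of all relevant documents that were retrieved --
    the 'relevant retrieved over total relevant' ratio described above."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(relevant)

# Hypothetical judgment set: docs 1-4 are relevant; the engine returned 1, 2, 7, 9.
retrieved = [1, 2, 7, 9]
relevant = [1, 2, 3, 4]
print(precision(retrieved, relevant))  # 0.5
print(recall(retrieved, relevant))     # 0.5
```

Both numbers depend entirely on who judged the documents relevant, which is exactly the gap between machine and human relevancy being discussed here.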

Relevancy has never even been defined for search engines, though. It's a grey area. It's really, really hard to calculate. How relevant is the relevancy? Not very, really. We can't even measure it.

TREC has run a load of experiments on relevancy. There's also a model called ASK (Anomalous State of Knowledge) by Belkin, which is about the role of users.

The user sees relevancy differently simply because they are human. Their needs are subjective, dynamic, and will depend on their perception as well. More user input might be the answer to the problem. This would definitely affect SEO.

Are you planning more work on your system?
Old 05-11-2005   #4
nanocontext
next generation information services

Join Date: May 2005
Location: Bangalore
Posts: 16
Quote:
Originally Posted by orion
1. Which content is relevant according to the user's perception?
2. Which content is relevant according to the scoring functions used by a machine (an IR system or search engine)?
3. Which documents, already scored and prequalified as relevant by a search engine's algorithm, are actually relevant according to the user's perception and to the query that was used?
#3 is the most critical question, because that's where the money is. The smaller the gap between the user's perception and the system's perception, the more usable the information is. But at least as of now, the gap is huge. So to make the information really usable, I see a need for an intelligent mix of both the user's perspective and the system's perspective. But even the initiatives that try to bring in some user input are in their infancy.

And to answer Xan's question...yes, there is a lot of work planned ahead for nanocontext. It has taken a really long time (5 years) to figure things out...development will start once I find the right partners. I guess by year's end it should be ready to rock.

I think it would be good to start a new survey on usability...let me think of something for it.
Old 05-11-2005   #5
I, Brian
Whitehat on...Whitehat off...Whitehat on...Whitehat off...
 
Join Date: Jun 2004
Location: Scotland
Posts: 940
Search relevance is still an important topic. From a user and SEO point of view, if a query for keywords relating to a website only shows that website after a long list of scraper sites in the search results, then there's something wrong with the relevancy of that particular search query. Perhaps Google therefore thinks that #2 is the most important from Orion's list?
Old 05-11-2005   #6
nanocontext
next generation information services

Join Date: May 2005
Location: Bangalore
Posts: 16
Google and the others may have been thinking that way for the past few years while building those fancy algos, but I think they all realize that whoever addresses #3 is the winner, because then the user will say, "Hey, these people give me exactly what I want for my need...so let me stick around."

The starting point is still "keywords", which makes query relevance itself very low, and when your starting point has low relevance, how can you return results of high relevance? What I have been pointing toward is that we need to move away from the present view of relevance to a new one...something I call "usability".

I really wonder why, while everyone has been developing those fancy ranking algos, no one has gone beyond making the search bar anything better than a plain text box.
Old 05-11-2005   #7
seobook
I'm blogging this
 
Join Date: Jun 2004
Location: we are Penn State!
Posts: 1,943
>I really wonder why, while everyone has been developing those fancy ranking algos, no one has gone beyond making the search bar anything better than a plain text box.

Many have advanced search pages.

Also, by leaving it plain, maybe they learn conceptual relationships better.
http://searchenginewatch.com/searchd...le.php/3503931
__________________
The SEO Book
Old 05-11-2005   #8
nanocontext
next generation information services

Join Date: May 2005
Location: Bangalore
Posts: 16
Quote:
Originally Posted by seobook
Many have advanced search pages.
Of course everyone has advanced search options, but all they offer is either filtering (language, geography, document type, domain...) or logical operations on keywords (exact phrase, all words, at least one word...). That still does not allow us to specify what we want.

What I am talking about are richer query description options (like the query interface shown in my demo). As you can see, it has options for specifying "what" you want, the "context", conditions, and so forth. I think any search engine could do better with such user input (if they want even better, they can of course pay me).
Old 05-11-2005   #9
orion

Join Date: Jun 2004
Posts: 1,044

Hi, nanocontext.

Q#3 is most definitely the most important one. I checked the demo. Please let me know when the working version is available. I'll be happy to test it. Keep up the hard work.

Cheers.

Orion
Old 05-11-2005   #10
orion

Join Date: Jun 2004
Posts: 1,044

Here in The fractal nature of relevance, Jim Ottaviani, Library Head, Univ. of Michigan, describes a fractal model for relevancy that may be of interest. A lot of research has been conducted in this area. The point to be made is that perception also has a lot to do with the clustering of topics, ideas, notions, and words in the minds of users.

It would be nice if a system that can tackle question #3 were also able to arrive at the same type of clusters as perceived by users. AI is still in its infancy, but projects like nanocontext and 20Q and similar ones are nice steps in the right direction, in my opinion.

Orion

Old 05-12-2005   #11
claus
It is not necessary to change. Survival is not mandatory.
 
Join Date: Dec 2004
Location: Copenhagen, Denmark
Posts: 62
Quote:
Originally Posted by nanocontext
the term "search relevance" has become a buzzword. One question arises: is it even relevant anymore?
(...)
But the question remains...what is the "usability" of search results? By that I mean the quality of the impact they really have on our need.
(...)
So I was wondering: how relevant is this "relevancy" thing, really? And when are we going to move on to a better way of looking at things?
I find these highly ambiguous questions. Do you mean:
  • How do i define "relevancy"?
  • Is "relevancy" equal to "search relevancy"? If not, how is the latter defined?
  • How is "useability" defined? And how is it related to "relevancy" or "search relevancy"?
  • Are there different types of "relevancy" ? If yes, which types are there? And which are "better" given the specific scenario [insert description here]?
  • Given the definition of "relevancy" as [insert definition here], in which scenarios is "relevancy" itself not "relevant"? In which scenarios is it "relevant"?
  • Given scenario [insert description here] - is "relevancy" as defined by [insert definition here], more or less important than [insert other specific factors here] in [insert context here] in order to [insert user need here]?

As for the move to a better way, I think we first have to agree on which way we are on. Then we can discuss whether said way is good or not, and perhaps then discuss the relative benefits of this way compared to other ways.

If we then find that other ways have larger net benefits than the current one, we might start to speculate on a possible time frame for switching ways (given, of course, that the larger net benefits hold for the people we wish to switch to another way).
Old 05-12-2005   #12
nanocontext
next generation information services

Join Date: May 2005
Location: Bangalore
Posts: 16
Sure, it's a good idea to define things, but let's not make it complicated. Let's have something simple enough that people can relate to...and discuss.

Let's say we search using some keywords/keyphrases. These keywords will be related to a whole gamut of things. But when we search, we look for something specific in that whole range of things. The question is (A) how best can we specify our query as per our needs, and (B) with respect to those needs, how relevant are the results? That would be a good view of "relevancy".

The issue of usability is like saying "I want a vegetarian recipe without onions". The ideal case is that you simply open the first link and get the recipe, without having to read through it to make sure there are no onions. I tried this on Google (recipe vegetarian -onions)...the first result was a link to another search, the second was for a dessert, and the next few links had recipes with onions...and so on. So usability, by that measure, is far too low...and that's valid for all search engines as of now.
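That first-link test can be written down as a tiny success-at-rank metric. A minimal sketch with made-up result snippets; note that the naive keyword filter also rejects a recipe that says "no onions needed", which is part of the usability gap being described.

```python
def satisfies(snippet, excluded_term):
    """Naive keyword test: usable only if the excluded term never appears."""
    return excluded_term.lower() not in snippet.lower()

def success_at_k(snippets, excluded_term, k=1):
    """1 if any of the top-k results passes the user's constraint, else 0."""
    return int(any(satisfies(s, excluded_term) for s in snippets[:k]))

# Hypothetical first-page snippets for the query (recipe vegetarian -onions):
snippets = [
    "A page of links to more recipe searches, including onions",
    "Chocolate dessert recipe",                                    # passes, but useless
    "Vegetarian curry recipe: potatoes, peas, no onions needed",  # rejected!
]
print(success_at_k(snippets, "onions", k=1))  # 0: first result mentions onions
print(success_at_k(snippets, "onions", k=3))  # 1: only because of the dessert
```

The dessert passing the filter while the actually useful curry fails it shows why keyword matching alone is a poor proxy for what the user needs.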

The buzz around relevancy exists because irrelevant results are far too many; as long as results are mostly irrelevant, relevancy ranking matters. But the kind of usability I am pointing to requires first addressing the issue of non-relevant results, which will automatically make "relevancy" obsolete.

None of this is going to come overnight. I have been trying to look at the kind of stuff that needs to go into making it happen, to know how soon it can happen, and, while we make that transition, to see where "relevancy" stands.
Old 05-12-2005   #13
xan
Member
 
Join Date: Feb 2005
Posts: 238
This conversation is turning to QA systems. 20Q works on neural nets, an old technique not used so much now. There is a lot of research being done on QA systems, and they have evolved to be more intelligent than the 20Q game (thankfully, or we'd all be in trouble), but only in very focused subject areas. General spaces, like the search space, are very difficult to tackle using the methods that work effectively on small collections. This is why there is a lot of flutter about clustering, but at an internal system level. This is based around data mining algorithms, and it is very much an area those experts deal with. I've seen some really good things being done with it.

There appears not to be a whole lot of work going on, but a lot is happening behind the scenes. Security is a major issue at the moment. Without excellent security we can't release some of these systems to the public, because it's not allowed.

As for relevancy, like I suggested, it's so difficult to measure that it's near impossible to tell how relevant something really is. We don't even know what it is as far as search engines are concerned. In closed environments, like digital libraries, you can measure it easily.

Claus, I do agree with you:

Relevancy needs to be defined, and in IR it is, by way of equations. The thing is, this is not the same relevancy a person will have. So it can be thought that the IR equation is ambiguous.

The definition of "relevance" according to Webster's:

"The relation of something to the matter at hand."

And in the field of computing it is defined as (Webster's again):

"The capability of a search engine or function to retrieve data appropriate to a user's needs."

Those look quite similar to me. The difference is that "appropriate" suggests any suitable result will do, which leaves it a little more vague. That is to be expected, as a machine is not a human.

I think relevancy is highly subjective. If I'm given two objects, a rubber duck and a shoe, they mean nothing to me; the two are completely unrelated. To someone else, they might be very much related; perhaps there was a memorable event, such as the time they trod on a rubber duck.
People look for different things, for different reasons, and people are fickle! They often change the thing they are searching for halfway through the search because they have refined it in their mind, which is fine, but the engine doesn't know that.

People's expectations have grown immensely, and we tend to expect more from search engines than from our cars, for example. It's getting to the point where an engine can't ever please.

Smaller search engines like Ask Jeeves do much better with all of this going on, because people are so busy attacking the big engines that they leave the smaller ones alone, and of course praise them instead, because they are getting better. But most of those who praise them don't even use them!

Relevancy has always been an issue, but the mistake in my opinion was to measure relevancy within the data set, rather than relevancy according to humans.
Old 05-12-2005   #14
claus
It is not necessary to change. Survival is not mandatory.
 
Join Date: Dec 2004
Location: Copenhagen, Denmark
Posts: 62
claus will become famous soon enough
I just saw this study published by Dogpile, which shows that the overlap between Yahoo, Google, and Ask is only around 3% for the top ten results (first page). Moreover, 85% of the results were unique to one of the engines.

So, it seems that even the top search engines disagree totally about what constitutes a relevant page.

Here's the PDF (2 pages total)
Quote:
Missing Pieces: A Study of First Page Web Search Engine Results Overlap
Conducted by Metasearch Engine Dogpile.com in collaboration with:
Dr. Amanda Spink, Associate Professor at the School of Information Sciences at the University of Pittsburgh
Dr. Jim Jansen, Assistant Professor at the School of Information Sciences and Technology at The Pennsylvania State University
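The overlap statistics the study reports can be computed for any set of first-page lists with a sketch like this. The engine names and URL lists below are made up for illustration, not the study's data.

```python
from collections import Counter

def first_page_overlap(results_by_engine, k=10):
    """Return (shared, unique): the fraction of distinct top-k URLs that appear
    on more than one engine's first page, and the fraction found on only one."""
    counts = Counter()
    for urls in results_by_engine.values():
        counts.update(set(urls[:k]))          # each engine counts a URL once
    total = len(counts)
    shared = sum(1 for c in counts.values() if c > 1)
    return shared / total, (total - shared) / total

# Hypothetical top-3 lists for a single query:
results = {
    "engine_a": ["u1", "u2", "u3"],
    "engine_b": ["u1", "u4", "u5"],
    "engine_c": ["u6", "u7", "u8"],
}
shared, unique = first_page_overlap(results, k=3)
print(shared, unique)  # 0.125 0.875: one shared URL out of eight distinct
```

A real replication would aggregate these fractions over many queries, as the Dogpile study did.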

Old 05-14-2005   #15
nanocontext
next generation information services

Join Date: May 2005
Location: Bangalore
Posts: 16
It's an interesting study, I would say. Given that every search engine thinks it knows the best way, this was to be expected...though if asked to guess, I would have guessed something higher than 3%.

The study also brings up an interesting observation. If you search and find the 3% that is found by all the search engines...what can be said about its relevancy? Dogpile may be taking a mean of the rankings of the common results and displaying a new relevancy. Is the mean relevancy more valid? I think it would be a good idea to put that to the test; perhaps Rustybrick can include it in the next version of the Search Engine Relevancy Challenge...
Old 05-14-2005   #16
mcanerin

Join Date: Jun 2004
Location: Calgary, Alberta, Canada
Posts: 1,564
Here is an interesting issue regarding this.

Anyone who has ever had a conversation with me knows that my usual response to a simple question is to answer it as fully as possible, occasionally to the annoyance of whoever I'm talking to. A friend of mine once accused me of responding to a question like "what time is it" with a treatise on how to make a clock, followed by a discussion of timezones, etc.

The point is, that's actually a bad answer, though to a computer scientist it might be considered a good one.

If someone asks you for a vegetarian recipe without onions, the best response might not be the most complete answer, but rather the answer most likely to make the questioner happy, which in some cases might be a so-so but easy and popular recipe rather than a culinary masterpiece that takes a trained chef and specialized equipment to make.

In short, a popular, simple answer may be better than an accurate, complete, or even "best" or "award winning" one, especially for an initial question coming from someone who is not an expert on the subject matter.

The same answer would probably annoy a professional chef, however.

I suppose the real question of relevance is "relevant to whom?". There could be a good argument for being able to choose your technical level in the search box, in order to get answers ranging from "holding your hand" to "complete engineering specs".

Flowing from this, I suppose one might be able to guess at a search engine's target market by looking at what it considers "relevant results", though that might be pushing it too far.

Ian
__________________
International SEO

Old 05-15-2005   #17
orion

Join Date: Jun 2004
Posts: 1,044

Umberto Eco

I agree. The user's perception of relevancy matters a lot. Some readers do not understand expressions like "the semantics, syntax and pragmatics of symbols" but are familiar with expressions like "the meaning, structure and use of signs". Some may prefer to read A Theory of Semiotics while others prefer The Role of the Reader.

In the former case the terms mean almost the same thing; in the latter, both books are written by the same author, Umberto Eco. Professor Eco is both a wonderful novelist and one of the greatest semiotic theoreticians alive. He is best known in the mainstream for The Name of the Rose rather than for his Model "Q" of semantics.



Orion

Old 05-15-2005   #18
nanocontext
next generation information services

Join Date: May 2005
Location: Bangalore
Posts: 16
Quote:
Originally Posted by mcanerin
a popular, simple answer may be better than an accurate, complete, or even "best" or "award winning" one, especially for an initial question coming from someone who is not an expert on the subject matter.

The same answer would probably annoy a professional chef, however.

I suppose the real question of relevance is "relevant to whom?". There could be a good argument for being able to choose your technical level in the search box, in order to get answers ranging from "holding your hand" to "complete engineering specs".
Ian
Even if we could get info on "who/what" the person is, the relevancy could be higher. Instead, most search engines are trying to say "let me guess who/what you are", and it continues with "let me guess what you want and what we think is relevant to you". The idea of getting more info from the user can be extended to many more things, but the question is how much is good enough. (Someone commented that the input form in my demo is too laborious and would confuse users.)

But suppose someone gives all that info: to what degree can the relevance of the results be improved using current technologies?
Old 05-16-2005   #19
mcanerin

Join Date: Jun 2004
Location: Calgary, Alberta, Canada
Posts: 1,564
I'll take that as a challenge, just because I'm supposed to be doing work right now and am (predictably) avoiding it.

Here is the basis for my thinking: a short while ago, a usability study was performed in the field on some accounting software. The software was owned by an auto mechanic.

The usability experts watched the mechanic use the accounting software and took notes. They noted that he was very hesitant, tended to always go with the default choice, and never considered customizing the reports, among other things. Classic "barely computer literate" stuff. I'll get back to how that relates to search engines in a second.

On this occasion, however, something else happened. Normally they bring these guys into the lab to do these tests; this was one of the few times they did it in the field, even though there is no really good reason to do so for this type of work. Being usability experts, they did not shut off their brains or their observations just because the mechanic had finished demonstrating the accounting software.

See, in this case he then had to go out into the shop and do some work. This work was computerized diagnosis of some high-tech vehicles. The software used for this was very interesting, and the usability experts could not help but notice an amazing transition.

The software was at least as complicated as the accounting software, if not more so. The mechanic flew through it. His actions were precise, he questioned the computer and forced additional diagnostics, and he showed complete confidence and control in every aspect of the software. When questioned, the mechanic also proudly showed how he had ordered, compiled, and installed custom modules, upgraded custom chips, and in one case actually rebuilt the software after a hardware crash.

In short - an expert level skill set.

The usability experts were astonished, as the two demonstrations seemed to be polar opposites. One was a classic newbie and the other a classic computer expert, and both were the same person! They concluded that you can't design software interfaces based on people's general technical skills, but rather on their technical skill in the specific area being addressed.

Let's apply this to search engines. Something like 70% of all searches are for information. Presumably some of these searchers are experts looking for technical answers and some haven't a clue but are looking for one. The ideal SERP for each is different.

Even if you know that I'm an expert user in general, you should not assume that I'm an expert in knitting, should I decide to do a search about it one day.

Likewise, my mother might need severe handholding for most searches, but would demand "quality" in searches related to knitting. "Quality" in this case being a technical level I probably would have difficulties understanding.

The point is that I think "personalization" is the wrong way to go. The same "person" may have multiple interests, and different skill levels in each of those interests.

If I tell Google that I'm interested in SEO and knitting, it should not assume that my interest is an indication of expertise, since it is in one case and most definitely is not in the other.

Likewise, I would NEVER claim warehousing was an interest of mine, but I have an unfortunate level of experience in the area which I'm trying hard to forget as a bad time in my working life.

In order to be able to figure out this level of granularity, I suspect that an SE would need a disturbing amount of information and history on someone to provide accurate results.

I think the most efficient way of doing so would be to ask the user to specify their level of expertise in the area for each search. If they did not specify anything, then I think it would be a fair assumption that they are newbies.

This has the advantage of not needing any personal information on the searcher ahead of time, which is a huge positive.
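The per-search expertise selector described above can be sketched as a simple re-ranker. Everything here is hypothetical: the level names, the per-document technicality labels (which a real engine would have to estimate from vocabulary or readability), and the results themselves.

```python
def rerank_by_expertise(results, user_level="newbie"):
    """Order results by how closely each document's technicality matches the
    searcher's self-declared level; with nothing declared, assume a newbie."""
    levels = {"newbie": 0, "intermediate": 1, "expert": 2}
    target = levels[user_level]
    return sorted(results, key=lambda r: abs(levels[r["technicality"]] - target))

# Hypothetical knitting results with hand-assigned technicality labels:
results = [
    {"url": "knitting-engineering-specs", "technicality": "expert"},
    {"url": "knitting-for-beginners", "technicality": "newbie"},
    {"url": "knitting-patterns-101", "technicality": "intermediate"},
]
# Default (no declared level): hand-holding content comes first.
print([r["url"] for r in rerank_by_expertise(results)])
# A self-declared expert gets the engineering specs first.
print([r["url"] for r in rerank_by_expertise(results, "expert")])
```

The key point of the design survives the sketch: the level is supplied per query, so no stored personal profile is needed.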

Hmmmm.... Maybe I should patent this - simple but way better (IMO) than most personalization attempts right now.

Ian
__________________
International SEO
Old 05-16-2005   #20
xan
Member
 
Join Date: Feb 2005
Posts: 238
Hi Ian,

"The point is that I think "personalization" is the wrong way to go. The same "person" may have multiple interests, and different skill levels in each of those interests."

I totally agree with you, for all the reasons you gave, and also because it's too laborious for the user. I think it opens up a whole new can of worms, which is a problem when we're already swamped with a load of things to sort out!