Search Engine Watch
Old 10-09-2005   #1
orion
Join Date: Jun 2004
Posts: 1,044
Fallacies of Relevance

Old material with great information on relevancy!

In Fallacies of Relevance, Dr. Garth Kemerling summarizes many patterns of incorrect reasoning (fallacies). Dr. Kemerling writes:

"The informal fallacies considered here are patterns of reasoning that are obviously incorrect. The fallacies of relevance, for example, clearly fail to provide adequate reason for believing the truth of their conclusions."


Fallacies of relevance, along with fallacies of presumption and fallacies of ambiguity, are part of a larger set known as informal fallacies.

These types of noisy patterns are hard to quantify in written words (e.g. documents). One of the goals of AI research is to construct a machine able to score these patterns. So far, current search engines and their IR cousins have failed in this regard since they target the "mechanics" of the problem but not its "soul". To top it off, users tend to assign relevancy, or make an assessment of it, based on how useful a piece of information is to them, while current systems tend to score string patterns rather than meaning or users' intentions, and call those scores "relevancy".

And what about the source or the author of the generated piece of information? To different degrees, from time to time one can come across news articles, press releases, marketing material, even research papers contaminated with such noise; and even trained editors can fail to recognize these patterns. At some point in our lives we may be guilty of some of these.

Depending on the end goal, not all of these patterns of reasoning are necessarily bad. For instance, an appeal to emotion is a common sales technique.

For those interested, here are some of the patterns discussed at G.K.'s site. Specific examples are given at his site.

1. Appeal to Force (argumentum ad baculum)
In the appeal to force, someone in a position of power threatens to bring down unfortunate consequences upon anyone who dares to disagree with a proffered proposition.

2. Appeal to Pity (argumentum ad misericordiam)
Turning this on its head, an appeal to pity tries to win acceptance by pointing out the unfortunate consequences that will otherwise fall upon the speaker and others, for whom we would then feel sorry.

3. Appeal to Emotion (argumentum ad populum)
In a more general fashion, the appeal to emotion relies upon emotively charged language to arouse strong feelings that may lead an audience to accept its conclusion.

4. Appeal to Authority (argumentum ad verecundiam)
Each of the next three fallacies involves the mistaken supposition that there is some connection between the truth of a proposition and some feature of the person who asserts or denies it. In an appeal to authority, the opinion of someone famous or accomplished in another area of expertise is supposed to guarantee the truth of a conclusion.

5. Ad Hominem Argument
The mirror-image of the appeal to authority is the ad hominem argument, in which we are encouraged to reject a proposition because it is the stated opinion of someone regarded as disreputable in some way. This can happen in several different ways, but all involve the claim that the proposition must be false because of who believes it to be true.

6. Appeal to Ignorance (argumentum ad ignorantiam)
An appeal to ignorance proposes that we accept the truth of a proposition unless an opponent can prove otherwise.

7. Irrelevant Conclusion (ignoratio elenchi)
Finally, the fallacy of the irrelevant conclusion tries to establish the truth of a proposition by offering an argument that actually provides support for an entirely different conclusion.

Food for thought. Dare to discuss any of these and their impact in your business or profession?


Orion

Last edited by orion : 10-09-2005 at 04:37 PM.
Old 10-10-2005   #2
traian
Member
Join Date: Sep 2004
Posts: 187
LSI software

One question Orion,

You all are talking about LSI here and how that could improve the work of an SEO. But, besides the www.Theme-Master.com website is there any other tool that we can use? BTW, what is your opinion about theme master?
Old 10-10-2005   #3
orion
Join Date: Jun 2004
Posts: 1,044

Quote:
Originally Posted by traian
One question Orion,

You all are talking about LSI here and how that could improve the work of an SEO. But, besides the www.Theme-Master.com website is there any other tool that we can use? BTW, what is your opinion about theme master?

Hi, Traian

First, thank you for posting.

I believe your post is misplaced since this thread is not about LSI.

About other tools/methods: In my work, I use semantic patterns and co-occurrence theory, which can be used with global, local and fractal systems.

Regarding the second part of your question, see the Term Vector thread started a year ago. Mike (thememaster), who recently co-started a company, is still a grad student at the Univ of Virginia (according to http://www.fortuneinteractive.com/Mike_Marshall.htm). You may want to repost in that thread, so this one stays on-topic. However, I would say this: to compute term vector weights as search engines do, you need a corpus; otherwise, whatever you compute is something other than what an engine scores. Current term weight models are quite fancy (e.g. BM25 and similar models).
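To illustrate why a corpus is needed, here is a minimal sketch of a BM25-style term weight. It is only an illustration, not what any particular engine runs: the function name `bm25_weight`, the toy documents, the parameter defaults (k1=1.2, b=0.75) and the smoothed IDF variant are all assumptions chosen for the example.

```python
import math
from collections import Counter

def bm25_weight(term, doc, corpus, k1=1.2, b=0.75):
    """BM25-style weight of `term` in `doc` (a token list), relative to `corpus`."""
    n_docs = len(corpus)
    doc_freq = sum(1 for d in corpus if term in d)   # documents containing the term
    avg_len = sum(len(d) for d in corpus) / n_docs   # average document length
    tf = Counter(doc)[term]                          # raw term frequency in this doc
    # Smoothed IDF (the "+1 inside the log" variant, which never goes negative).
    idf = math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1.0)
    length_norm = 1.0 - b + b * len(doc) / avg_len
    return idf * tf * (k1 + 1.0) / (tf + k1 * length_norm)

# Hypothetical toy data: the same document scored against two different corpora.
doc = "cheap flights to rome".split()
small_corpus = [doc, "cheap hotels in rome".split(), "rome travel guide".split()]
large_corpus = small_corpus + [
    "cheap car rental".split(),
    "cheap phone deals".split(),
    "cheap laptops".split(),
]

weight_small = bm25_weight("rome", doc, small_corpus)
weight_large = bm25_weight("rome", doc, large_corpus)
print(weight_small, weight_large)
```

The document itself never changes, yet the weight of "rome" does, because document frequency and average length come from the corpus. A tool that scores a single page in isolation has to substitute something for those two quantities.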




Orion

Last edited by orion : 10-10-2005 at 01:16 PM.
Old 10-10-2005   #4
traian
Member
Join Date: Sep 2004
Posts: 187
Thanks Orion,

This is all so mystifying to me. All these terms... But I am willing to learn. Could you point me to some e-works to start with? I mean about LSI and semantic analysis in search engines, but at a beginner level?

Thanks
Old 10-10-2005   #5
randfish
Member
Join Date: Sep 2004
Location: Seattle, WA
Posts: 436
I love this post, Orion. It's so universal - but you can think of how it applies to linking, search and the role of certain types of authority sites and authority sources of information on the web.

History is certainly passed down in such ways - through fear or charisma, a certain set of events becomes the accepted "truth" whether or not it is factually accurate. The same goes in politics, religion, even science (occasionally).

I think this is one of the topics that the French political structure was complaining about with Google - they have the ability to determine what kinds of knowledge are "popular" and, therefore, widely accepted.

I wonder, what made you think of this?
Old 10-10-2005   #6
orion
Join Date: Jun 2004
Posts: 1,044

Quote:
Originally Posted by traian
Thanks Orion,

This is all so mystifying to me. All these terms... But I am willing to learn. Could you point me to some e-works to start with? I mean about LSI and semantic analysis in search engines, but at a beginner level?

Thanks

Oh, please, don't worry.

I hate to drop my own links, but here are some tutorials:

indexing: http://www.miislita.com/information-.../indexing.html

term weight calculations: http://www.miislita.com/information-...-tutorial.html

EF-Ratios:
http://www.miislita.com/information-...-tutorial.html


As for tutorials on LSI, I have a collection, but none of them satisfies me. LSI fails in many instances to capture the actual "soul" of relevancy, as do most search engines. They are not as smart as many think. There are too many problems in LSI associated with reducing dimensions (the so-called dimensionality reduction curse) and with implementing it on a very large corpus. So far, some of the large search engines seem to limit LSI to small datasets of paid services, rather than applying it to the larger corpus of organic results.



Orion

Last edited by orion : 10-10-2005 at 08:18 PM.
Old 10-10-2005   #7
orion
Join Date: Jun 2004
Posts: 1,044

Quote:
Originally Posted by randfish
I wonder, what made you think of this?

University research and a private AI company working in this area.

Orion
Old 10-11-2005   #8
traian
Member
Join Date: Sep 2004
Posts: 187
Thanks

Thanks Orion,

I will read the documentation and try to see how it can help in the SEO field. If I come up with anything, I'll post it here, OK?

Traian
Old 10-16-2005   #9
massa
Member
Join Date: Jun 2004
Location: home
Posts: 160
>Dare to discuss any of these and their impact in your business or profession?<

SEMs interested in top placement in search results will see the focus shift from getting the right links to getting the right people to say the right thing about a specific topic. It has already had a significant influence on my business in terms of my building a different kind of network with different kinds of compensation arrangements, even though I believe we are at least 12 to 36 months away from seeing any dramatic changes in the results. It won't come as a nuclear blast; it will come, as it has been coming for some time now, in small, almost imperceptible increments.

Now, I would like to discuss the meat of your post, Doc. Sorry for the length, but you of all people know how difficult it can be to get these kinds of abstract ideas explained and scrutinized. So, here it goes.
*****************************

Back in about 2000 or so, since we lacked the programming skills to turn actions into equations, SearchKing's idea for determining relevance was to attempt to simply match like-minded humans. We put up a little link below the search box that said, "show my hometown results first". No big deal. Had already been done to death. BUT, the idea was to start matching people searching to other people organizing data who would likely feel the same way about a specific topic.

We had already learned that the people who had paid the most for their independent portals were the ones who were producing the best results. Out of some 6,800 portals we were hosting, over 6,000 were free and out of those about 5,900 had little reason to exist. BUT, even that small of a percentage did prove to us that even free portals COULD produce a certain percentage of quality and relevance, (assuming we could find a way to define that term). So we decided to continue offering free portal hosting without class distinctions but mostly as a way to keep gathering data and test our theories.

The plan was, as someone would sign up for a portal, we would ask them questions so we could start categorizing the person making the relevancy decisions for that specific portal. The very easiest matching is by zip code, city, state, and country. We knew that by knowing a portal operator's location and then matching that to a search, giving precedence NOT TO THE KEYWORDS, BUT TO THE HUMAN DOING THE SEARCH, we would have a shot at coming closer to getting the person doing the search to "feel" the results were more relevant. The thinking, of course, was that people who live in close proximity to each other are more likely to think a particular result relevant due to geo-specific influences. For example, a person who lived near a river would likely be more concerned with flooding than someone from an arid region. So we thought that a search for real estate done by someone from a specific city, matched to a portal operator from the same city, would have a greater chance of being relevant to the searcher, as we could assume both parties would see value in specific types of real estate that were not prone to flooding.
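The zip, city, state, country matching described above can be sketched roughly as follows. This is a hypothetical reconstruction for illustration only, not SearchKing's actual code: the `Location` fields, the 0 to 4 point scale and the place names are all assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Location:
    zip_code: str
    city: str
    state: str
    country: str

def geo_score(searcher: Location, operator: Location) -> int:
    """Score 0-4: one point per matching level, from country down to zip code,
    stopping at the first level where the two locations differ."""
    score = 0
    for level in ("country", "state", "city", "zip_code"):
        if getattr(searcher, level) != getattr(operator, level):
            break
        score += 1
    return score

def rank_operators(searcher, operators):
    """Sort (name, Location) pairs so the geographically closest operators come first."""
    return sorted(operators, key=lambda pair: geo_score(searcher, pair[1]), reverse=True)

# Made-up places; the searcher lives in "Rivertown".
searcher = Location("11111", "Rivertown", "ST", "US")
operators = [
    ("desert_homes", Location("99999", "Drytown", "ZZ", "US")),
    ("rivertown_east", Location("11112", "Rivertown", "ST", "US")),
    ("rivertown_realty", Location("11111", "Rivertown", "ST", "US")),
]

ranked_names = [name for name, _ in rank_operators(searcher, operators)]
print(ranked_names)
```

Comparing from the broadest level down avoids treating two different cities that happen to share a name as a match. The keywords play no part here; the ordering depends only on who is searching and who organized the data.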

Setting up something to give scores to geo-specific locations was easy. As I said, it had been done to death and this was just the first step. What we really wanted was more of a personality match and most of all an emotional motivation match. Naturally things like religion, political affiliation and demographics were very important but what we REALLY wanted was to know the things that made people feel good about themselves and things that pissed them off. We felt emotions were the essence of relevancy.

We knew we could expect to get a lot of personal information from the people wanting to operate a portal but we also knew we could not ask the general public to give us private information about themselves just to do a search. Without both parts of that equation, there could be no matching anything. We also knew we did not have the resources to start setting up tracking programs, (even from the portal operators who were sitting on our servers) and then storing and analyzing the data to make guesses about what the personal preferences of the portal operators were. Even if we did have those kinds of resources and had a way of determining that your browser went to a certain place x number of times in x amount of time, that must mean ------- WHAT? No matter how we tracked or what measuring stick we would use to determine what the tracking meant, it still would be only a guess and it still would not give us much, if any, insight into the portal operator's emotional motivations for thinking a specific data set was relevant to that portal operator.

What we did have was a very active forum. A place where a majority of the operators that cared about their portals interacted. What we learned was that we could, over time, start to identify certain personality traits. We started seeing who got upset when they felt threatened. Who changed their minds quickly. Who stood up for what they believed and on and on. We started seeing a way we could assign points to various personality types and then predict an emotional response to a stimulus, like a website submission or a search query.

Using what we had learned from being in such a unique position, we started working on developing algorithms not on the data, but on the humans reviewing data and assigning relevancy scores to that data to work in conjunction with pretty much standard mathematical algorithms. We didn't really realize it at the time, but we were building a kind of trust rank. We could give points first to the on site criteria of a source doc in our database, then we would add and subtract points based on what we knew about the person/persons who had reviewed that data and assigned some relevancy value to it.
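As a rough sketch of the "points first for on-site criteria, then add and subtract based on the reviewer" idea, something like the following would do. Everything here is assumed for illustration: the `Reviewer` fields, the `trait_match` similarity measure, and the weights and caps were not part of the original system as described.

```python
from dataclasses import dataclass

@dataclass
class Reviewer:
    reviews_done: int    # how many submissions this operator has reviewed
    trait_match: float   # 0.0-1.0 similarity to the searcher (hypothetical measure)

def adjusted_score(on_site_points, reviewers, max_bonus=10.0, review_cap=100):
    """Start from the on-site score, then add or subtract points per reviewer."""
    score = float(on_site_points)
    for reviewer in reviewers:
        # Cap experience so sheer review volume cannot dominate the score.
        experience = min(reviewer.reviews_done, review_cap) / review_cap
        # trait_match above 0.5 adds points; below 0.5 subtracts them.
        score += max_bonus * experience * (reviewer.trait_match - 0.5) * 2
    return score

# Same document, same on-site score, reviewed by two very different operators.
like_minded = adjusted_score(50, [Reviewer(reviews_done=100, trait_match=1.0)])
opposite = adjusted_score(50, [Reviewer(reviews_done=200, trait_match=0.0)])
print(like_minded, opposite)
```

The design choice worth noting is that the document's text never changes between the two calls. Only what is known about the human who judged it moves the score, which is the point being made above.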

Now, we see we can take a search for real estate and based on things like how many operators had submissions related to this search term, how many submissions had been reviewed by that operator, how active their portal was, PLUS, where the operator was from, how old were they, a few demographics, (all of this stuff is pretty much standard), AND what kind of personality traits did that operator display, we could come up with a different kind of algorithm. The trick now was how to get this information from the searcher.

Our first idea was cyber guides. I won't go into that because I see where the concept, while sounding good at the time, is completely flawed in relation to a searching experience. People don't go to search engines to interact with the search engine. They go there to get information. Relevant information. So, while I still think there is tremendous potential in having entities that would interact with visitors, I don't see any point in discussing it in this context. I will say though that Dan Kavanaugh came up with the idea of having cyber guides simply being shapes as opposed to something intended to look or act human and that was brilliant.

So, we talked a little more with some psychologists about how we could use a fast, non-intrusive system for getting enough information from someone doing a search to be able to match personality and emotional motivation to another person involved in the same search. Without spending a LOT of money I didn't have on university-level lab and research, the best we came up with was giving something of enough value to the visitor to get them to take an action unrelated to the search they had planned on doing. It could have been anything from something as simple as SHOW HOMETOWN RESULTS FIRST to something for free if they answer a question, but what we settled on (and never got the chance to implement and test) was using the show hometown results first button and then asking them to take a short poll in addition to their zip code (which we could then match to a city, state, country of a portal operator). The poll would be less than five questions that would enable us to identify traits that would most likely point to various personality classifications such as A types and B types for example. We would then match that to the data and the portal operator that closely shared the same personality traits.

All of this doesn't matter. In an over-zealous attempt to obtain capital resources, we failed and were forced to abandon the entire project. My purpose in taking the time to show you this, Doc, is to point out that I still believe most of the AI and IR people are missing a very key factor in trying to assign relevancy to the actions humans take that actually determine relevancy to those very humans. I would feel very good indeed if I could offer anything to you that may save you some time or even just stimulate a few brain cells because I know how much you love a challenge when it comes to this stuff.

I think the focus on search engine relevancy research at this time is not so much on relevant search results as it is on being able to display relevant ads in close proximity to the results. That is not a bad idea in my opinion because results without a revenue model are pretty much doomed to fail. Besides, it looks to me like the conventional wisdom is that by getting improved relevancy for one, you get the other by default. If we can track and analyze the online behavior of a specific searcher and use that to display ads likely to be perceived as more relevant to that searcher, then the perception would be that the search results too would be more relevant. And vice versa.

I think displaying results (and ads) based on behavioral patterns is going to be a huge leap forward, but I also don't think it is going to be the holy grail as some would think. I think what is missing has a lot to do with what you've brought up right here. For whom does the relevancy fallacy bell toll?

(see next post please for continuation)
Old 10-16-2005   #10
massa
Member
Join Date: Jun 2004
Location: home
Posts: 160
I think Dr. Kemerling is much closer to finding the key to relevancy than link popularity could ever hope to be. That is as it should be of course because there is a time for all things and a link pop based algo had to come first. The problem I would think men like yourself face is who is making the determinations? Who decides whether a specific set of data was an appeal to authority or an appeal to pity? Naturally the answer would be that the program would decide based on LSI. Ahh, but who wrote the program and does that person have a type A personality or a type B and does that influence the design document?

I've spoken with enough IR folks to understand that the intention is to convert emotional responses to mathematical equations based on characters in a text document, but can that REALLY be done and if it is possible, then which road do we take? I believe it can be done but it will require looking beyond anchor text and source documents.

Let me present just one example of how far beyond links I believe we need to look.

Nick Wilson started Threadwatch about a year ago. It has experienced exceptional growth. Those are facts. Now we ask why?

Nick doesn't actually create any original content. He only reviews other people's content and then assigns it a relevancy score by virtue of expressing his own opinion of the value of that content. Why would anyone care what his opinion is? My answer is honesty and honesty, at least in an abstract way, is the essence, or soul, of relevancy.

Nick writes what he honestly thinks and feels. He probably pisses off more people than he pleases BUT, there can be little doubt that whatever he says, it is the truth and by virtue of that perception, you may see someone argue with his point, but never will you see anyone question the relevancy of his conclusions.

When we can find a way to write an equation that identifies and assigns value to honesty, regardless of intent and without prejudice, THEN match the emotional motivations of the searcher to the emotional motivations of the "honest" content provider, I believe we will then see more relevant ads.
Old 10-16-2005   #11
mcanerin
Join Date: Jun 2004
Location: Calgary, Alberta, Canada
Posts: 1,564
Lawyers (naturally) run into all of the above arguments all the time (sometimes all at once in the same document!) - I'm pleased to see you've even included the proper Latin terms for them.

I have a couple more to add to the pile:

8) Appeal to Repetition (argumentum ad nauseam)
False proof of a statement by (prolonged) repetition, possibly by different people. In short, as the saying often attributed to Lenin goes, if you tell a lie often enough, people will eventually believe you. KWD comes quickly to mind, as do numerous other "theories" common in forums.

9) Appeal to Consequences (argumentum ad consequentiam)
An argument where you attempt to prove something by pointing out its consequences. An example would be: "Google would not have programming errors in its algo - that would make its results unpopular!" aka wishful thinking.

10) Appeal to Novelty (argumentum ad novitatem)
An argument that because something is new, it's superior or better. I run into this all the time on the web with people claiming that the latest "standard" will save us all, or that the fact that they just did something justifies a press release, which is then supposedly proof their company is superior. The opposite of this (but no less fallacious) is Appeal to Tradition (argumentum ad antiquitatem), which is that since we've always done it this way, it must be the best way. For example, stuffing keywords into metatags is what has always been done, therefore it's what SEOs should do.

There are LOTS more - and most of them we see in forums every day. Frankly, I think the world would be a much better place if recognising fallacies were taught early and well to children (and adults).

Ian
__________________
International SEO
Old 10-16-2005   #12
mcanerin
Join Date: Jun 2004
Location: Calgary, Alberta, Canada
Posts: 1,564
As a thought - one of the fastest, clearest ways to "break" most false arguments is to put them into a Venn diagram.

A Venn diagram can be rendered fairly well mathematically. I wonder if there is research into how to judge relevancy via the use of Venn diagrams?
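One way to make the Venn diagram idea concrete is with plain set operations. The sketch below is only an illustration with made-up site names and terms: it uses a subset check to expose a false argument, and the Jaccard coefficient (a standard set-overlap measure) as one possible stand-in for "how much do these two circles overlap".

```python
def jaccard(a: set, b: set) -> float:
    """Overlap of two sets: |A & B| / |A | B| (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Premise: every authority site is a well-linked site (circle A sits inside circle B).
authority_sites = {"site1", "site2"}
well_linked_sites = {"site1", "site2", "site3"}
premise_holds = authority_sites <= well_linked_sites

# Fallacious conclusion: "site3 is well linked, therefore it is an authority".
# The part of B that lies outside A holds the counterexamples.
counterexamples = well_linked_sites - authority_sites

# Overlap between what the searcher asked about and what a document covers.
query_terms = {"flood", "safe", "real", "estate"}
doc_terms = {"real", "estate", "listings", "flood", "zones"}
overlap = jaccard(query_terms, doc_terms)

print(premise_holds, counterexamples, overlap)
```

A non-empty `counterexamples` set is the "break" in the argument: the premise is true, yet the conclusion does not follow for everything inside the larger circle.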

Ian
__________________
International SEO
Old 10-17-2005   #13
orion
Join Date: Jun 2004
Posts: 1,044

Hi, Bob and Ian

I wish I had the time to reply in detail to your great posts. I'm taking a flight in a few hours to the east coast. In short, those are great comments.

Ian, there are dozens of fallacies, but the fallacies I mentioned above are what we call fallacies of relevance, which to my knowledge no current system can grasp yet. These are part of a special class. Regarding Venn diagram representations, yes, these types of studies can be mapped to Venn diagrams. It is done all the time. After returning I can provide some example maps. Sorry, I have to rush. Feel free to take it from here, guys.


Orion