How Fair is the Link Popularity Algorithm?


Nacho
09-22-2004, 11:10 PM
We all know how important it is to get links to rank high on the primary search engines, right? And we also know how much effort, time and money it takes to get external inbound links to any given website (.com, .org, or .whatever). Whether you build the links yourself or outsource the work to someone anywhere in the world, it still costs time and money.

The question is, how fair is the link popularity algorithm across all websites on the Internet? Let me explain a little bit more.

Although I would not like this thread to go in depth about PageRank, I will use this one example to illustrate a point. Some time ago, I was reading a great thread (http://forums.seochat.com/archive/t-13719) in SEO Chat about how many links it takes to reach a given PageRank, and our good friends Sharon and Roy Montero gave great insight about this:

How many backlinks from PR7 Web pages (not Websites) would it take to acquire a PR8 & PR9 Web page?

Back on November 4, 2001 we put together a chart that answers your question so the answer is ... 110 pages to acquire a PageRank8 page and 550 pages to acquire a PageRank9 page.


The chart is based on the assumption that all pages have an average of 22 links per page.

---------------------<><><><><>-------------------

How soon does the average Web page "grow up" to be a PageRank8 page?

With all PageRank3 pages you will need to be linked from 68,750 pages.
With all PageRank4 pages you will need to be linked from 13,750 pages.
With all PageRank5 pages you will need to be linked from 2,750 pages.
With all PageRank6 pages you will need to be linked from 550 pages.
With all PageRank7 pages you will need to be linked from 110 pages.
With all PageRank8 pages you will need to be linked from 22 pages.
With all PageRank9 pages you will need to be linked from 5 pages.
With all PageRank10 pages you will need to be linked from 1 page.

---------------------<><><><><>-------------------

How soon does the average Web page "grow up" to be a PageRank9 page?

With all PageRank3 pages you will need to be linked from 343,750 pages.
With all PageRank4 pages you will need to be linked from 68,750 pages.
With all PageRank5 pages you will need to be linked from 13,750 pages.
With all PageRank6 pages you will need to be linked from 2,750 pages.
With all PageRank7 pages you will need to be linked from 550 pages.
With all PageRank8 pages you will need to be linked from 110 pages.
With all PageRank9 pages you will need to be linked from 22 pages.
With all PageRank10 pages you will need to be linked from 5 pages.

---------------------<><><><><>-------------------

Your Friends,

Sharon and Roy Montero
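Both tables above follow one pattern: each step in toolbar PageRank is treated as worth five times the step below it, and a linking page's value is split across the chart's assumed 22 links per page. The factor of five is inferred from the numbers, not stated in the chart, so this is only a sketch of the chart's arithmetic under that assumption, not of anything Google computes. The names `links_needed`, `LINKS_PER_PAGE` and `STEP_FACTOR` are hypothetical.

```python
import math
from fractions import Fraction

# Assumptions inferred from the chart (not stated by its authors):
# each toolbar PageRank step is worth 5x the one below it, and every
# linking page has 22 outbound links (the chart's stated average).
LINKS_PER_PAGE = 22
STEP_FACTOR = 5


def links_needed(source_pr: int, target_pr: int) -> int:
    """Pages of PageRank `source_pr` needed to build one page of `target_pr`."""
    # Exact rational arithmetic avoids float noise for negative exponents.
    exact = LINKS_PER_PAGE * Fraction(STEP_FACTOR) ** (target_pr - source_pr)
    # You cannot get a fraction of a link, so round up.
    return math.ceil(exact)


for target in (8, 9):
    print(f"To reach PageRank{target}:")
    for source in range(3, 11):
        print(f"  from PageRank{source} pages: {links_needed(source, target):,}")
```

Rounding up to whole links is what turns 22/5 = 4.4 and 22/25 = 0.88 into the chart's 5 and 1.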

Example 1: many links, lower quality (in PageRank points).

So, based on this example, you could get a PageRank of 8 by being "linked from 68,750 [PR3] pages". Just for argument's sake, let's say each of these links costs $10 (very competitive pricing anywhere in the world). That means that for any website, this link building campaign would cost $687,500 to get a PageRank 8. And regardless of the search engine, 68,750 links is A LOT!! That is enough to rank very well in almost all search engines, even if the links come from PageRank 3 pages.

Example 2: fewer links, higher quality (in PageRank points).

Let's say this website seeks links from PageRank5 pages instead, and those cost about $25 to $50 each. The chart calls for 2,750 of them to reach PageRank 8, so the investment might still be around $68,750 to $137,500.
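The arithmetic behind both examples, as a small sketch: the link counts are the chart's figures for reaching PageRank 8 (68,750 PR3 pages or 2,750 PR5 pages), and the per-link prices are the hypothetical ones from the examples. `campaign_cost` is an illustrative name, not anything from the original post.

```python
# Link counts from the quoted chart for reaching a PageRank8 page,
# keyed by the PageRank of the linking pages.
LINKS_FOR_PR8 = {3: 68_750, 5: 2_750}


def campaign_cost(source_pr: int, price_per_link: int) -> int:
    """Total cost of buying every link the chart says is needed for PageRank 8."""
    return LINKS_FOR_PR8[source_pr] * price_per_link


# Example 1: PR3 links at a hypothetical $10 each.
print(f"Example 1: ${campaign_cost(3, 10):,}")
# Example 2: PR5 links at a hypothetical $25 to $50 each.
print(f"Example 2: ${campaign_cost(5, 25):,} to ${campaign_cost(5, 50):,}")
```

It prints $687,500 for Example 1 and $68,750 to $137,500 for Example 2.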

OK, whichever example you use, Link Building IS NOT CHEAP!!

For a large corporate website, that money could even come out of the petty cash box. For a smaller "mom & pop" website (regardless of its level of SEO), it could mean a lifetime of savings, or more human resources than it will ever have.

Let's also take the service or product being offered out of the equation (e.g., the "Paris Hilton video" scandal).

With this in mind, think about it for a minute: how fair do you think the search engines' link popularity algorithms are across all websites on the Internet?

I will run a quick poll to get a quick overview of opinion in this thread:

I agree with its fairness
I disagree with its fairness
I neither agree nor disagree with its fairness

seobook
09-22-2004, 11:21 PM
1.) no such thing as fair.
2.) PageRank is not that important.
3.) link building can be done on the cheap if you focus on good ideas rather than just buying links.
4.) if you are buying PageRank just for the heck of buying it then you should pay a ton.

Nacho
09-22-2004, 11:55 PM
Aaron (seobook),

#1 and #3 are valid points for discussion, but please, please let's not turn this into a discussion about PageRank. I just used an example to illustrate a point, and perhaps I should have used a different one.

The true discussion is: do we have A LEVEL PLAYING FIELD across all websites (huge marketing budgets vs. NO budgets) in how search engines value link popularity in their algorithms today?

Thanks! :)

rustybrick
09-23-2004, 12:06 AM
Is link pop fair?

Of course you will need to break down link pop by engines:

Is Google's link pop fair? I think it can be better. Fair? Yes. But can be better.

Is Yahoo's link pop fair? Err, fair - I hate the word fair.

Is Teoma's link pop fair?

etc.

MSN's Block Level Link Analysis paper, which came out a while back, is really the most advanced link equity solution I have seen. So when it's actually put into play, we will have to see how it impacts relevancy. It will be interesting.

seobook
09-23-2004, 12:54 AM
The true discussion is: do we have A LEVEL PLAYING FIELD across all websites (huge marketing budgets vs. NO budgets) in how search engines value link popularity in their algorithms today?
fair enough...

2 years ago I had no knowledge of marketing or web stuff. currently I rank just fine for a bunch of phrases with minimal marketing spend.

being smaller means having less fat (other than my table muscle of course ;) ). it also means you are not going to be constricted by corporate crap that says stuff like "all inbound links must only say XYZ Corp"

in the long run intelligence, branding, and a good understanding of psychology and social networks are likely more important than dollar volume.

I still need to do lots of reading and learning though... :(

I remember recently talking with Mike Grehan, and he mentioned research on this topic...about how the PageRank concept causes a significant delay before new competitors can compete in a market.

Nacho
09-23-2004, 01:08 AM
Excellent points, Barry!

I am also very excited about Block Level Link Analysis (http://forums.searchenginewatch.com/showthread.php?t=832), and as Dr. E Garcia (Orion) says, "This is where we are all heading to, to semantic blocks or passages of information semantically connected across the web as a graph of nodes (semantic one)." This is exactly what I'm hoping to get out of this discussion: a look at equal opportunity. It's not about the number of links you get, but where they are on the web that matters.

The problem is, there will always be someone willing to pay, and others willing to receive payment, for a link regardless of where it is. That brings us back to square one: how can link popularity be fair to the average small-budget website compared with Corporate America's deep pockets? Will the solution be to give it less importance within the 100 or so factors of a search engine's total algorithm? Or can there be a link popularity algorithm that builds a level playing field into the equation?

Yes, Barry (RustyBrick) is right to say that each search engine values link pop differently in its algorithm. However, whichever search engine it is today, I strongly feel there is not enough fairness (or whatever word you want to use) in them, since the results can easily be tilted in favor of whoever has the most $$$ to build links.

I, Brian
09-23-2004, 06:49 AM
The trouble with the notion of questioning link pop is that there is no standard "all links have the same value". And I'm not simply talking about issues such as link weight and PR value of the page linking out, but issues such as age of the link, and what sort of context you have for the link.

Overall, the idea of link pop is pretty fair - the problem is how the concept is executed to provide relevancy in rankings - there are certainly improvements that can be made in Google at least.

rustybrick
09-23-2004, 09:32 AM
Is it fair that some companies can afford the TV spots at the Super Bowl? Is it fair that some companies can put a full page ad in the New York Times? Is it fair that some companies can afford huge ad space on Yahoo!'s home page?

Is it fair that some companies can buy more links than others?

In my opinion, Yes.

seobook
09-23-2004, 02:29 PM
fair is usually a concept based upon selfish individualism and usually fails to put ideas in a proper social context.

dannysullivan
09-23-2004, 03:25 PM
The true discussion is: do we have A LEVEL PLAYING FIELD across all websites (huge marketing budgets vs. NO budgets) in how search engines value link popularity in their algorithms today?
It wasn't level before link analysis was used. You had sites that were overtly trying to push their relevancy higher with on-the-page manipulation. You had sites that were naturally getting a lower ranking than a human might think they deserved because crawlers don't see images, were troubled by frames and so on. In short, we've never had a level playing field in terms of web search.

Link analysis has pros and cons. It was definitely a great step forward, the second generation of search after being dependent on on-the-page analysis. But it can be manipulated and has been growing increasingly weak. Continued link bombs are a sign of this.

But that chart above? The numbers are misleading. PR4 sites might outrank PR10 sites on a particular query because the context of the link text also has to be taken into account. So the deck's not stacked as much as it seems.

EricWard-LinkMensch
09-23-2004, 03:32 PM
You guys know me. I try to be as fair to Google as possible, but ten years of link building has shown me that it's not fair at all to judge sites based on links, for many reasons.

However, that's the world we live in, so you have to either ignore PageRank, which is what I typically do 90% of the time when I'm building links for clients, or bow down to it, which is what too many people do.

Here's my stance per a ClickZ article from three years ago. Yep. Three years ago. Every word of it still holds.

The Five Major Flaws of Link Popularity (http://www.clickz.com/experts/archives/linking/build_links/article.php/836371)

Go back in time to that very first moment when someone paid for a link for the first time due to the site's PR. At that moment PR became tainted. Google created PageRank as a way to let content be the ultimate judge of content. Very true. We link to stuff that's good. Sadly, now it's "We link to stuff that pays us to link to it". And that means PR is simply another thing to be rigged, the opposite of Google's intent. So, PR is a ticking clock, and at some point it will be pointless.

What Google will then have to do is stop supporting the PR toolbar, and keep the link pop algo out of sight, behind the curtain. You can't rig what you can't see. And remember, the Google toolbar number is not the number that Google uses internally for searchers.

Links are my life, and they are the lifeblood of the web. Google recognized that, and for this I commend them. But I build links for reasons that have nothing to do with Google, and always will.

Eric

Nacho
09-23-2004, 03:41 PM
Is it fair that some companies can afford the TV spots at the Super Bowl? Is it fair that some companies can put a full page ad in the New York Times? Is it fair that some companies can afford huge ad space on Yahoo!'s home page?

Is it fair that some companies can buy more links than others?

In my opinion, Yes.
Perhaps it is best for me to use an analogy to describe this one.

Last winter I went to Lake Tahoe for the first time (I love skiing :) ), but I love pizza more than anything when it comes to food. So, whether I was on the slopes, at the grocery store or just walking around town, I kept asking the locals, "Where is the best pizza in town?" Amazingly, 3 out of every 4 people I asked said, "Lake Tahoe Pizza Co. (http://www.google.com/search?hl=en&lr=&ie=UTF-8&c2coff=1&q=%22lake+tahoe+pizza+co.%22)". Sure enough, I went, and as an "amateur-expert" pizza lover, I can truly say that it was one of the best pizzas I have ever had in my life.

However, I have never seen Lake Tahoe Pizza Co. in a TV commercial, on the radio or even in the brochure at the hotel where we stayed. Not in the least. I would say a single Super Bowl commercial probably costs about as much as their whole business is worth.

Did I get a "fair" non-commercial recommendation from the locals? YES! Was this what I wanted? YES! Can a search engine eventually do the same? I don't know. Are the search engines taking this into account today with current link popularity algorithms? IMO, NO.

seobook
09-23-2004, 03:56 PM
Did I get a "fair" non-commercial recommendation from the locals? YES! Was this what I wanted? YES! Can a search engine eventually do the same? I don't know. Are the search engines taking this into account today with current link popularity algorithms? IMO, NO.

Search engines are dumb. they just match text and link patterns to search queries. they will eventually get smarter, but IMHO what you are talking about is more about branding. often the best products are not the best-selling, most widely used products.

if we search for web browser we see that many people have been pushing mozilla and opera to where they rank above the evil empire ;) in Google.

if you want people to care you need to establish a brand that other people care about...and ask them to push it.

rustybrick
09-23-2004, 04:06 PM
Good point Nacho,

One more thing. I am a huge Apple fan. I couldn't fit the number of Apple products I have owned in my life into this little text box.

When I really think about it, it's not because I think Apple makes everything better than everyone else. I sometimes try to convince myself that they do, but I know deep down inside they are not the best at creating everything.

Why do I buy, recommend and love Apple products so much? They are damn good marketers. Where else have you found such loyal customers? It's almost like a Mac Cult (http://blog.wired.com/cultofmac/), oh wait, I didn't make that up, it's for real.

What helped this? $$$ I think.

Something as widely used as computers and search engines can manipulate the user/customer and can be manipulated by the user/customer (referrals). Of course, this is my personal opinion, I do not work for Apple. :D

Buddha
09-23-2004, 06:11 PM
I don't think link pop is fair, but I still think it is the best indicator of relevancy because it is the hardest thing to manipulate. Especially as incumbents and established websites become entrenched, it becomes harder to knock out top sites. This is the natural evolution of business.

So what can't you manipulate? The actual user.
I do believe that SEs will incorporate their toolbar data in order to improve the quality of relevance. However, any method they employ will always leave room for manipulation. But the cost of manipulation will increase, and manipulation will become much more difficult. At some point, the cost of creating quality content should fall below the cost of manipulation. Then spammers will be forced to produce value.

Linking may not be fair, but those are the current rules we play by.

The Russian oil tycoon and founder of Yukos was imprisoned and had his multi-billion $ company taken away by the Russian government. Is it fair? Should he have "invested" in having more politicians on his payroll?

I know that linking works within the current SE environment and I will continue to invest in link building accordingly.

I, Brian
09-24-2004, 09:39 AM
Did I get a "fair" non-commercial recommendation from the locals? YES! Was this what I wanted? YES! Can a search engine eventually do the same? I don't know. Are the search engines taking this into account today with current link popularity algorithms? IMO, NO.
You may not have got recommendations from commercial sources - but you still got recommendations from human users. Those locals were recommending you look to a specific place because it was considered a quality place to recommend.

So you are actually arguing, by metaphor, the reasoning behind the benefits of link popularity in the first place. :)

Mikkel deMib Svendsen
09-24-2004, 09:51 AM
1) I do not think the purpose of a search engine is to be "fair" - it is to be good. And that's not the same. Search engines serve users and try to do that the best they can. If they do it well, users will stick around and watch some commercials ... Search engines do not have to list all companies to serve users well. It may not be "fair" to you that they don't list YOUR company, but users may not care at all, as long as what the search engine serves them makes them happy.

2) PageRank was never about being fair or democratic, I think. It was, and is, about identifying the elite - the best websites. Not the many websites, not all views on an issue, not a balanced political view, not necessarily the most clever thoughts, but just the view of the few selected top sites. The most popular.

andrewgoodman
09-24-2004, 11:11 AM
PageRank is not just about fairness, of course, it's about relevance or "aboutness."

As in "which pages" are "widely accepted by authoritative related sites to be" "about" "x subject" (x subject being indicated by the user's query).

As a system, PageRank makes sense in a general way.
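To make "the system" concrete, here is a minimal sketch of PageRank as originally published by Brin and Page: power iteration with the commonly cited damping factor of 0.85, where each page passes its score evenly to the pages it links to. It is the textbook formulation only, not Google's production algorithm or the toolbar number, and the four-page web is hypothetical.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Textbook PageRank by power iteration.

    `links` maps each page to the pages it links to; every page must
    appear as a key. Pages with no outbound links spread their score
    evenly over all pages.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # Split this page's score evenly across its outbound links.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: distribute its score across the whole graph.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page web; "d" links out but nobody links to it.
web = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Page "d" ends up with only the random-jump share, which is the "no inbound links, no score" effect that the thread is arguing about.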

But users might have many different intentions... so PageRank is only one stab -- at the time a wildly successful stab -- at improving the state of whole web search.

But as soon as any algorithm becomes the subject of widespread gaming/optimization, its fairness, authoritativeness, relevance, etc. are bound to deteriorate.

"Clickstream search" (now back in vogue with a vengeance, particularly at A9) is another interesting way to attack the same problem -- so that search works very much like browsing Amazon... "users who viewed this page are also fond of .... THIS OTHER PAGE," etc. Of course the flipside is you need to really follow folks around, and there are privacy implications there. But I veer off topic.
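A minimal sketch of the "users who viewed this page are also fond of THIS OTHER PAGE" idea is plain co-visitation counting over browsing sessions. This is a generic illustration with hypothetical page names and session data, not how A9 or Amazon actually implement it.

```python
from collections import Counter
from itertools import combinations


def co_visits(sessions: list[list[str]]) -> dict[str, Counter]:
    """Count how often each pair of pages is viewed in the same session."""
    counts: dict[str, Counter] = {}
    for session in sessions:
        # Count each pair once per session, ignoring order and repeat views.
        for a, b in combinations(sorted(set(session)), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts


def also_viewed(page: str, sessions: list[list[str]], top: int = 3) -> list[str]:
    """Pages most often viewed in the same sessions as `page`."""
    return [p for p, _ in co_visits(sessions).get(page, Counter()).most_common(top)]


# Hypothetical browsing sessions.
sessions = [
    ["tahoe-pizza", "ski-rentals", "lodging"],
    ["tahoe-pizza", "ski-rentals"],
    ["tahoe-pizza", "chain-pizza"],
    ["ski-rentals", "lodging"],
]
print(also_viewed("tahoe-pizza", sessions))
```

The `sessions` input is exactly the "follow folks around" data mentioned above, which is where the privacy question comes from.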

Nacho
09-24-2004, 11:55 PM
Links are my life, and they are the lifeblood of the web.

Google recognized that, and for this I commend them. But I build links for reasons that have nothing to do with Google, and always will.
I agree 100% with that, and that's how it should be done.

I will never forget when Mike Grehan first explained to me the equity and true value of a link. He explained how at one point it was thought that link popularity could be used to determine the next Nobel Prize winner. However, that theory had many flaws because today’s link popularity algorithm is mainly based on the number of links and not the true quality or authority of those links.

Recently, I have been meditating on this more than ever as I build new websites or rebuild existing ones. When strategically designing the visual and functional architecture of each page, I am carefully adapting it to the future’s link popularity algorithms rather than today’s. I now know and understand that links residing on those “links.html” or other grocery-list pages, intended purely for reciprocal benefit to manipulate the SERPs, are going to be worthless in the not so distant future.

And as I value and understand the meaning of the new “Block Level Link Analysis” on top of what has been studied in the past (PageRank and HITS), I realize that there must be other elements in the equation to effectively minimize the risk of spammers manipulating this algorithm:

Honest human ratings and reviews, subjective but not corrupted by monetary gain, will need to be taken into consideration to verify the value of the content, product or service provided on the actual page.
There will also need to exist 3rd party authorities that validate these ratings with an “expert” opinion.
There will also need to be a better flow of information between websites and search engines, of a kind that could not be manipulated by the website owner.

In my opinion, the current primary search engines will have to adapt more to the user’s true needs and not the website owner’s ambitions if they want to stay at the top or gain market share in this industry. After all, search engines live off their reputation as a problem-solving source of organized information on the web for the users who feed them search query traffic. If the relevancy dies, the volume dies, and the search engine’s reputation dies with it.

Mikkel deMib Svendsen
09-25-2004, 03:09 AM
that could not be manipulated by the website owner.


All systems, with or without a human component, can be manipulated. Search engines can make manipulation more difficult, but they can never stop the most talented SEOs from doing exactly what they've successfully done since SEO started: manipulate systems.

In fact, I believe the skills SEOs have acquired doing what we do are likely to spread into other areas. There are other systems that companies and people are likely to want us to manipulate - and we have the skills to do it.


Honest human ratings and reviews ....

There has always been a human component to search, and I think there always will be. But I also believe that every engine, and especially Google, will try to solve relevancy as algorithmically as possible. Humans are not as easy to scale as machines :)

However, as far as I can see, the human component is not always making search better - it just makes it easier to adjust to the political agenda of the engine in question. Did Google improve when they penalized Search King? Did Google improve when they agreed to limit the index in China? Humans did these things - not the algorithm - and as far as I can see, it was not done to serve the users better.

What I am trying to say is that I don't think humans are any guarantee of quality. Humans can be manipulated too - in some cases even more easily than algorithms :)

seobook
09-25-2004, 03:21 AM
All systems, with or without a human component, can be manipulated. Search engines can make manipulation more difficult, but they can never stop the most talented SEOs from doing exactly what they've successfully done since SEO started: manipulate systems.
I have seen in forums that it usually only takes one or two influential people with a few followers to totally shape or frame a discussion. same thing with the web in general...really easy for an SEO to spread whatever info they want about anything...SEOs can even spread information they know to be false if they feel like it...many media networks already do this anyway.

there are systems all over the world that can be manipulated. just think what a person could do if they knew how to manipulate the stock market. some of the terrorists who knew about September 11th were shorting airline stocks.

What I am trying to say is that I don't think humans are any guarantee of quality. Humans can be manipulated too - in some cases even more easily than algorithms :)

I am reading a book right now on political framing which really dissects some of these ideas rather nicely.

If people do not think other people are easily manipulated then:
1.) why do many people frequently vote against their self interests?
2.) why does marketing even exist at all?

Nacho
09-25-2004, 12:41 PM
All systems, with or without a human component, can be manipulated. Search engines can make manipulation more difficult, but they can never stop the most talented SEOs from doing exactly what they've successfully done since SEO started: manipulate systems.
Of course, this is true and I agree completely. However, there must be a way to keep the elements of the algorithm completely independent of each other, and to make them reflect what true "popularity" really is, so as to deliver higher relevancy and minimize the risk of "spamming" manipulation. I underline the words "minimize the risk", because I know that it can probably never be removed completely.

Nacho
09-25-2004, 12:54 PM
Moderation Note:

I see 9 participants in the thread and 21 participants in the poll. We really want to hear your opinions, please. They are very important to all of us, and discussions like this are what help us stay ahead in our industry.

Rate this Thread: Don't forget to "Rate this Thread" at the bottom, as it will help point other Members and Guests toward quality discussions.

Rate this Post: Our forum is very unique in that you can use the "Rate this Post" feature for Members (located to the right of each member's post, next to the scale symbol), which will award or subtract points from their reputation level depending on whether you choose "I approve" or "I disapprove" for each post. You do not have to do this on every post, but you can certainly use it as a way to say "thank you for that great contribution of knowledge" (for example) when you find one that stands out.

Thank you!! :)

bobmutch
09-25-2004, 04:51 PM
Things are not fair in this world and they don't need to be. Life is not fair. It takes money to make money. This is like discussing whether it is fair for the rich to have such a great advantage, with all their money to make money.

Now the fairness we want to be concerned with is whether it is fair for everyone with a big budget. Does Google allow people with brown eyes to get as many links as people with blue eyes? Now that would be an issue.

Life is NOT a level playing field. But all things being equal it should be. That means if you have 10mil and I have 10mil it should be a level playing field.

Now I can't help but say this. Anybody who thinks it takes $687,500 or $68,750 to get a PR8 is whacked. PR9 links go for $1,200 a month, and one from a page with fewer than 14 outbound links will give you a PR8. Even better, get 4 PR8 links from pages with fewer than 10 external outbound links each, and guess what: you've got a PR8.

Ya ya ya, you don't want this to turn into a PR thread, and I'm not into hijacking threads, but come on, people. That's like saying in a car performance forum that it takes $687,500 to build a good 500-horsepower engine.

Anyway, that's my rant for the day.

seobook
09-25-2004, 05:53 PM
capitalism and shareholder value and CEOs are not based on the concept of equality...and since the web is a reflection of the cross sections of the various broken social & political systems around the world, it does not make sense that anything that attempts to organize that database would come up with anything that resembled the concept of fair.

my own rankings after my limited experience in a limited amount of time prove that search engines (and the web as a whole) are more fair than most portions of society...greedy megacorps have not fully sunk their claws into the web yet.

Nacho
09-25-2004, 06:39 PM
This thread is about a reflection of the link popularity algorithm and not about life's fairness. Let's not go off topic here please.

As a user, when I ask the search engine for "best pizza in Lake Tahoe", I'd rather not see Pizza Hut or Domino's just because they have big enough marketing budgets to buy links to manipulate their rankings for this location. I would like that search engine to help me identify the truly most popular pizza in Lake Tahoe.

Can a search engine some day give a true objective response that is a reflection of reality rather than SEO abuse or manipulation?

seobook
09-25-2004, 07:02 PM
This thread is about a reflection of the link popularity algorithm and not about life's fairness. Let's not go off topic here please.
the word fair is rather abstract and to define it as a yes or no we need to compare it against something...life is something we all do...and thus it is something that is easy to relate to others.

Can a search engine some day give a true objective response that is a reflection of reality rather than SEO abuse or manipulation?
for the most part no, but that is because most people are lazy and follower sheep...few people really speak their mind. again the web is just a reflection of the world in which we live...although it is more skewed toward some of the people who are more willing to speak / more interested in self expression.

projectphp
09-25-2004, 08:25 PM
As a user, when I ask the search engine for "best pizza in Lake Tahoe", I'd rather not see Pizza Hut or Domino's just because they have big enough marketing budgets to buy links to manipulate their rankings for this location.
No area of SEO gets more contradictory, yet incredibly well argued, evidence presented against it than Link Pop. "It is too hard for commercial sites to get links" and "Big budgets lead to more links" and "Non-commercials have an advantage over commercials" are just some of the comments that get bandied about.

So, who is right? Everyone is, in this case. Link analysis offers opportunities for manipulation, as does on-page SEO and any other business opportunity. Yet link analysis is the single hardest element to "game", and the easiest to counter when it is gamed.

IMHO, this is why link analysis is very, very fair, particularly if fair isn't a momentary phenomenon but a sense of long-term balancing. Today's shoot-to-number-one link spammer will be gone next week, next month or next year.

So I vote yes to fair, in the whole wash of things.

I, Brian
09-26-2004, 06:19 AM
Can a search engine some day give a true objective response that is a reflection of reality rather than SEO abuse or manipulation?

Actually, I think you've missed an important point here - the main obstacle to the best relevancy in search engine results - to "provide a reflection of reality" - is simply the complexity involved in assigning semantic order to the vast, disordered Internet, and then somehow matching that up with the intentions of the person making the query*.

Any kind of "abuse" or "manipulation" simply attempts to provide meaningful results in terms that search engines understand.

But would the SERPs be fair if there was zero abuse or manipulation? Absolutely not - and this is not least because search engines often have difficulty finding deeper pages and meaning on non-optimised pages in the first place.

SEO to any degree is a form of manipulation - but at the end of the day, SEO in all forms simply tries to help search engines assign meaning to URLs and pages.

Whether a given page deserves the degree of meaning assigned to it is an issue for the search engines themselves to decide. And if anybody disagrees with a search engine's decision, then that is a criticism of the search engine, rather than of the webmasters whose pages are assigned any particular value.




* If you go to http://www.search-engine-book.co.uk/, you'll find Mike Grehan interviewed by the BBC, making interesting comments to the effect that Google is looking to use services such as Gmail to help collect data on user behaviour, to help with issues of relevancy.

Mikkel deMib Svendsen
09-26-2004, 07:02 AM
true objective response

I think the point is that there IS no such thing as a "true objective response", because there is no single true objective. There are lots of them, and they change day by day. Search engines are just trying to catch up with them.

In my mind, a "true objective response" would not be useful to anyone.

Mike Grehan
09-26-2004, 06:16 PM
There are some very valid comments relating to hyperlink-based algorithms, such as the two most important ones, HITS and PageRank.
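For readers who haven't seen the two algorithms side by side, here is a minimal Python sketch of textbook PageRank (power iteration with a damping factor, here the commonly cited 0.85) and HITS (alternating hub and authority updates) on a small, made-up four-page graph. It illustrates the published ideas under simplifying assumptions (uniform teleportation, dangling pages spreading their rank evenly, fixed iteration counts); it is not how any search engine actually computes its rankings.

```python
# Minimal PageRank (power iteration) and HITS on a toy link graph.
# The graph and page names are hypothetical placeholders.


def _all_pages(links):
    return sorted(set(links) | {t for targets in links.values() for t in targets})


def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = _all_pages(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by dangling pages (no out-links) is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                # Each page splits its vote equally among its out-links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


def hits(links, iterations=50):
    """Return (hub, authority) scores, each normalised to sum to 1."""
    pages = _all_pages(links)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A good authority is pointed to by good hubs...
        auth = {p: 0.0 for p in pages}
        for page, targets in links.items():
            for target in targets:
                auth[target] += hub[page]
        total = sum(auth.values()) or 1.0
        auth = {p: v / total for p, v in auth.items()}
        # ...and a good hub points to good authorities.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        total = sum(hub.values()) or 1.0
        hub = {p: v / total for p, v in hub.items()}
    return hub, auth


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    print(pagerank(graph))
    print(hits(graph))
```

With this toy graph, page c, which collects the most in-links, ends up with both the highest PageRank and the highest authority score, while page a, which links out to two pages, is the strongest hub.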

However, the notion of fairness is rather like the notion of quality: It means different things to different people.

If we take the minimal approach to "is a hyperlink-based algorithm fair to EVERYONE on the web?", then the answer is most certainly no where search engines are concerned. It's only fair, to a point, to the fraction of the web that a search engine has charted, and more so to the popular pages that are most frequently linked to.

I have a very long article, which I'm mailing out this week, on the "rich get richer" problem with a "static" hyperlink-based algorithm such as PageRank: how such an algorithm not only accelerates the problem, but is possibly also harmful to the web ecology as a whole.

There's no real point in me duplicating it here, or trying to publish it in advance, as it is a long article extracted from some of the research for the third edition of my book. The reason I've written some of it up as a feature article is the alarming rate at which the "rich get richer" problem affects the work and endeavours of the search engine optimiser.
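The "rich get richer" effect can be illustrated with a simple preferential-attachment model. This is a standard stand-in I am substituting here, not anything taken from Mike's article: each new link goes to a site with probability proportional to the links it already has. The sketch compares that against a baseline where every site is equally likely to get the next link. The starting condition (one in-link per site), the number of sites and the number of links are arbitrary assumptions.

```python
import random


def simulate_links(num_sites=10, new_links=1000, preferential=True, seed=42):
    """Hand out new_links one at a time; return the in-link count of each site."""
    rng = random.Random(seed)
    inlinks = [1] * num_sites  # assumption: every site starts with one in-link
    for _ in range(new_links):
        if preferential:
            # Rich get richer: a site is chosen with probability
            # proportional to the links it already has.
            target = rng.choices(range(num_sites), weights=inlinks, k=1)[0]
        else:
            # Baseline: every site is equally likely to get the next link.
            target = rng.randrange(num_sites)
        inlinks[target] += 1
    return inlinks


def top_share(inlinks):
    """Fraction of all links held by the best-linked site."""
    return max(inlinks) / sum(inlinks)


if __name__ == "__main__":
    print("preferential:", round(top_share(simulate_links()), 3))
    print("uniform:     ", round(top_share(simulate_links(preferential=False)), 3))
```

Under the preferential rule the best-linked site typically ends up holding a much larger share of all links than under the uniform rule, even though every site started out equal.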

As for block level link analysis which has been covered in more detail elsewhere, I have to say I'm a little sceptical about the reason that Microsoft just happened to leave their research work lying around on the web. Rather like Google leaving their original research paper around for webmasters if you ask me.

Block level may be a new term, but it's not new as far as research into the composition of a web page and where the important linkage data lies. Soumen Chakrabarti looked at this three years ago (http://www10.org/cdrom/papers/489/); I referenced it in my last edition.

And two years ago, just after my second edition was published, I came across this http://www2002.org/CDROM/refereed/579/ and this http://www2002.org/CDROM/poster/78.pdf

There's so much more I'd like to add, Nacho, but I'm so tight on deadlines just now. Take a look at the links I've given and wait until midweek, when my article is mailed out. It has some VERY interesting links on why the "rich get richer at search engines" problem needs to be tackled now.

And then I'd love to try and find the time to get back into some more 'interesting' posting in this very 'interesting' thread!

Cheers!

Mike.

Nacho
09-26-2004, 11:41 PM
Mike,

Thank you for taking some of your time to read this thread and post your comments; it is much appreciated. I look forward to your article and the research behind these links.

Regards!

St0n3y
09-27-2004, 02:58 PM
Link popularity has its pros and cons, but the biggest problem with link pop is the PR bar. Get rid of that, and link popularity building will be more about linking to quality and relevance than linking to higher-PR sites.

CSE Monkey
09-27-2004, 03:58 PM
On a personal/quantifiable note, it would seem to me that the value of a link would depend on the user experience after the clickthrough. Non-relevant pages will quickly generate a click on the Back button, while relevant pages will translate into more page views on that site and more time spent on it. There are obvious privacy concerns here, but if the data were anonymized they could be dealt with. Perhaps these factors are already taken into consideration, or perhaps they are too difficult to process, since a search engine would have to record not just the links but link activity.

FYI, I am a relative amateur on the subject.
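CSE Monkey's idea could be sketched as a score computed from what happens after a click. The Python below is purely hypothetical: the `Visit` record, the 50/50 weighting of pages viewed against time on site, and the caps of 10 pages and 300 seconds are my own assumptions, not anything a search engine is known to use.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    bounced: bool          # hypothetical: user hit Back almost immediately
    pages_viewed: int      # pages viewed on the destination site
    seconds_on_site: float


def click_value(visits, max_pages=10, max_seconds=300.0):
    """Average engagement score in [0, 1] for visits that followed a link."""
    if not visits:
        return 0.0
    total = 0.0
    for v in visits:
        if v.bounced:
            continue  # a quick Back click contributes nothing
        # Cap each signal so one very long visit can't dominate.
        pages = min(v.pages_viewed, max_pages) / max_pages
        time = min(v.seconds_on_site, max_seconds) / max_seconds
        total += 0.5 * pages + 0.5 * time
    return total / len(visits)
```

As Mikkel notes below, a signal like this would at best be one input among many rather than a replacement for link analysis.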

Mikkel deMib Svendsen
09-27-2004, 04:11 PM
CSE Monkey, click tracking has been done, and is still done by some, but it has never turned out to be a magic bullet. Done right, it may contribute to better search, but on its own it's just as weak (or strong) as most other methods.

Buddha
09-27-2004, 07:33 PM
"The Rich Get Richer"

It's easy for the rich to get richer because once you have a focused network of SEO sites, it's very easy to create additional sites in the same niche. All you have to do is leverage your existing assets (content, knowledge, link partners, etc) and build.

But I don't think what is occurring is any different from the evolution of the cable networks. Turner built a local channel, then TBS, then CNN, Headline News, and a whole bunch of other cable channels.

Mel
09-29-2004, 01:10 AM
This is a very interesting thread and there have been some great responses but:

Fair is a hard word to quantify and has different meanings to different people.

Fair to whom is also an important consideration: fair to the searcher, fair to the webmaster, or fair to the search engine?

IMO, life in general is not fair, search engines are not fair (and really should not try to be, IMO), and ranking algos in general are not fair.

PageRank (or linkpop or ...) does not try to be the single answer to ranking relevancy, but only a tool to help improve relevancy beyond simply analyzing the words on the page, and that is a fair thing to do; so, in my view, from that perspective it is fair from the search engine's point of view. After all, it's not the search engine's job to give equal rankings to all websites; their job is to identify the best websites and put them first.

People will always use tools like PageRank that are put at their disposal, but different people have different capabilities and thus will use the tools differently according to their abilities. Is it fair that some people are great graphic designers and others not? Is it fair that some people are great copywriters and others not? IMO the answer has to be that of course it's fair to use what is available to the best of your abilities. So, looking from the webmaster's point of view, it's fair to use PageRank to the best of your ability, as long as you don't try to deceive with it.

But what about the searcher: is it fair to him? Not to ramble on for too long, let me just say, IMO, that as long as the search results are better than they would be without PageRank, it is a fair thing to do.

In summary, while it is not imperative that search engines are fair, the use of the linkpop algo is a fair way to increase relevancy to all parties concerned. If there is a fairer way then use it to build the next great search engine.

Sharon and Roy
10-05-2004, 08:26 PM
Hi Nacho,

We all know how important it is to get links to rank high on the primary search engines, right?


Right!

And we also know how much effort, time and money it takes to get external inbound links to any given website (.com, .org, or .whatever). Whether it's you who is getting the links or outsourcing it to anyone in the world, it still costs time and money.


While we do know how much effort, time and money it takes on "average" we would like to add that for some folks the "investment" is considerably less than the "average" and for others considerably more, right?

The question is, how fair is the link popularity algorithm across all websites on the Internet?


It is EXTREMELY fair!

Example 1: many links, lower quality (in PageRank points).

So, based on this example, you could get a PageRank of 8 "to be linked from 68,750 [PR3] pages" and, just for argument, let's say each of these links costs $10 (very competitive pricing across the world). That means that for any website this link-building campaign would cost $687,500 to get a PageRank 8, and, regardless of the search engine, 68,750 links is A LOT!! and enough to rank very well in almost all search engines, even if they are coming from PageRank 3 pages.

Example 2: fewer links, higher quality (in PageRank points).

Let's say this website seeks PageRank5 links on average, and those cost about $25 to $50 each; then the 2,750 links needed might still be an investment of around $68,750 to $137,500.

OK, then whatever the investment in any example you pick, Link Building IS NOT CHEAP!!


Well, Nacho, that depends!

Consider this third example.

Example 3: even fewer links, even higher quality (in PageRank points).

So, based on this example, you could get a PageRank of 8 "to be linked from 1 [PR10] page" and, just for argument, let's say this link cost $0 (because the link is given as a gift from a friend). That means that for any website this link-building campaign would cost $0 to get a PageRank 8, and, regardless of the search engine, 1 link from a PageRank10 page is A LOT!! and yet NOT enough to rank very well in almost any search engine, even though it is from a PageRank10 page.
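The arithmetic behind the three examples can be checked against the PageRank 8 chart quoted at the start of the thread (which assumes an average of 22 links per page). This is a minimal sketch assuming a flat price per link; the prices are the hypothetical ones used in the examples.

```python
# PageRank 8 chart quoted at the start of the thread:
# PageRank of the linking pages -> number of such pages needed.
PAGES_NEEDED_FOR_PR8 = {3: 68_750, 4: 13_750, 5: 2_750, 6: 550,
                        7: 110, 8: 22, 9: 5, 10: 1}


def campaign_cost(source_pr, price_per_link):
    """Total cost of buying every link the chart calls for, at a flat price."""
    return PAGES_NEEDED_FOR_PR8[source_pr] * price_per_link


print(campaign_cost(3, 10))                         # Example 1: 687500
print(campaign_cost(5, 25), campaign_cost(5, 50))   # Example 2: 68750 137500
print(campaign_cost(10, 0))                         # Example 3: 0
```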

With this in mind, think about it for a minute: what's your opinion? How fair do you think the search engines' link popularity algorithm is across all websites on the Internet?


Again, this is completely fair!

Here's another question to ponder. In example 3, is it fair that the PageRank8 page doesn't rank well in any search engine even though it is a PageRank8 and the site owner spent $0 for it?

What if the site owner spent $1000 per month for it for a year ($12,000), would that make the situation any less fair?

Would that make the situation any more fair?

If you ask us, it would still be completely fair.

If you really analyze the current algorithms, you quickly come to the conclusion that the page with the most links pointing to it doesn't always rank the highest; in fact, in the majority of cases it doesn't. So is that fair?

If you ask us, yes, it is completely fair.

Here's yet another question to ponder. If two or more students who want to rank high for the same keyword phrase enroll in our SEO class, and they are all taught the exact same methods of acquiring links back to their pages, and one student quickly acquires a #1 ranking while another never gets higher than #100, is that fair?

If you ask us, yes, it is completely fair.

Oh, and one last question to ponder. Say a site owner has been acquiring links to their homepage consistently, month in and month out, for the last 5 years, and they have been ranked #1 for several terms for the last few years ... But let's also say that they DIDN'T have the foresight to understand that one day the search engines would be able to provide a certain type of algorithm that would revolutionize the way link popularity was calculated. Now, there's another site owner who has only been acquiring links to their homepage for the last year, and who has not yet acquired enough links to even come close to entering the top 100 for any of the same terms as the first site owner ... But let's also say that this site owner DID have the foresight to understand that one day the search engines would be able to provide a certain type of algorithm that would revolutionize the way link popularity was calculated, and they were only acquiring certain links in light of this yet-to-come algorithm.

Then the day arrives, and this forward-thinking site owner, who "invested" heavily in the search engines one day "catching up" with his particular linking strategy, sees his page catapulted into the #1 spots for all of the same terms for which our first site owner now isn't even ranked in the top 100.

Is this fair?

If you ask us, yes, it is completely fair.

Your Friends,

Sharon and Roy Montero

artmaker
10-06-2004, 11:12 AM
This is all interesting, but I use my site as a portfolio more than a store, so how it ranks is of little use to me.
With that in mind, I still DO maintain several links pages. Why? Because I find the sites I have linked with useful and relevant to what MY site is about. I actually have several other people's links pages bookmarked only because they are so full of resources. And of course, that site's main page is there every time I use their links.
I refuse to link with those link companies. I hope no one out there uses them. It's a cheat, plain and simple, and if enough people cheat, search engines will find another way to rank pages. I may never have the numbers of links needed to matter. And I sure can't afford to pay for a service to do that for me. I hope the validity of links counts for more than numbers.
My two cents. ;)

orion
10-08-2004, 03:52 PM
As for block level link analysis which has been covered in more detail elsewhere, I have to say I'm a little sceptical about the reason that Microsoft just happened to leave their research work lying around on the web. Rather like Google leaving their original research paper around for webmasters if you ask me.

Block level may be a new term, but it's not new as far as research into the composition of a web page and where the important linkage data lies. Soumen Chakrabarti looked at this three years ago (http://www10.org/cdrom/papers/489/); I referenced it in my last edition.

And two years ago, just after my second edition was published, I came across this http://www2002.org/CDROM/refereed/579/ and this http://www2002.org/CDROM/poster/78.pdf

http://www10.org/cdrom/papers/489/

Ah, the old paper referencing HITS and topic distillation. Let's put this in the right perspective. In that famous work, Soumen discusses the hubs and distillation thing in addition to DOM segmentation.

Let's put DOM segmentation in the right perspective.

First, the concept of passages and segmentation of text comes from readability theory, not from IR theory; it can be traced back to the 1940s and predates search engines. I'm sure there are references from even earlier. Second, segmentation research is nothing new and predates Soumen's work: research was done in this area back in '94. I am listing some references from the '90s.
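Passage segmentation of the kind studied in the 1990s work can be illustrated with a much-simplified sketch loosely in the spirit of Hearst's TextTiling: compare the vocabulary on either side of each sentence gap and start a new segment where the similarity drops. The window size, threshold and crude tokenization are assumptions; Hearst's actual method uses token-sequence blocks, smoothing and depth scores, which this sketch omits.

```python
import math
import re
from collections import Counter


def _cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def segment(sentences, window=2, threshold=0.1):
    """Split a list of sentences into topical segments.

    At each gap between sentences, compare the words in the `window`
    sentences before the gap with those in the `window` sentences after it;
    start a new segment where the similarity drops below `threshold`.
    """
    if not sentences:
        return []
    bags = [Counter(re.findall(r"[a-z]+", s.lower())) for s in sentences]
    segments, current = [], [sentences[0]]
    for gap in range(1, len(sentences)):
        left = sum(bags[max(0, gap - window):gap], Counter())
        right = sum(bags[gap:gap + window], Counter())
        if _cosine(left, right) < threshold:
            segments.append(current)
            current = []
        current.append(sentences[gap])
    segments.append(current)
    return segments
```

Fed two sentences about PageRank followed by two about pizza in Lake Tahoe (with `window=1`), the sketch splits the text into two segments at the topic change.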

The Microsoft research on block-level analysis is very different from Soumen's DOM segmentation analysis, and is aimed at extracting semantic information from blocks, not mere links or phrases inside links that are then used for topic distillation (http://research.microsoft.com/asia/dload_files/group/ims/21.pdf).

It is true that DOM segmentation provides a nice structure for a web document, in which tags are used to identify blocks and the links inside those blocks. The problem with DOM-based approaches is that they are hard to compare with other DOM structures. Another problem is that DOM is, as the Microsoft researchers put it, "a kind of linear structure and usually unable to represent the semantic structure of a page. From this perspective, DOM based blocks are, in some sense, similar to traditional discourse passages."

This quote from the http://research.microsoft.com/asia/dload_files/group/ims/21.pdf paper says it well:

"DOM provides each web page with a fine-grained structure, which illustrates not only the content but also the presentation of the page. In general, similar to discourse passages, the blocks produced by DOM-based methods tend to partition pages based on their pre-defined syntactic structure, i.e., the HTML tags. There are some approaches that take into account the problem of page segmentation, but there is no consistent way to do it and, to the best of our knowledge, few works are done on applying DOMbased page segmentation methods on web information retrieval. Some simple experiments are performed in, where sub-trees tagged with <TITLE>, <P>, <H1>~<H3> and <META> are
treated as blocks, but the results are not encouraging. The reasons may lie in the following three aspects. First, DOM is still a linear structure, so visually adjacent blocks may be far from each other
in the structure and departed wrongly. Secondly, tags such as "<TABLE>" and "<P>" are used not only for content presentation but also for layout structuring. It is therefore difficult to obtain the appropriate segmentation granularity. Thirdly, in many cases DOM prefers more on presentation to content and therefore not accurate enough to discriminate different semantic blocks in a web page."
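To make the criticism concrete, here is a minimal Python sketch of the naive tag-based segmentation the quote describes, treating TITLE, P, H1 to H3 and META content as blocks, built on the standard library's `html.parser`. It is my own illustration, not code from any of the papers, and it deliberately shows the weakness: text that sits in layout markup such as a table cell is never assigned to any block.

```python
from html.parser import HTMLParser

BLOCK_TAGS = {"title", "p", "h1", "h2", "h3"}


class TagBlockSegmenter(HTMLParser):
    """Collect (tag, text) blocks from TITLE, P, H1-H3 and META tags."""

    def __init__(self):
        super().__init__()
        self.blocks = []   # (tag, text) pairs in document order
        self._open = None  # tag of the block currently being collected
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            # META has no body; use its content attribute as the block text.
            content = dict(attrs).get("content")
            if content:
                self.blocks.append(("meta", content))
        elif tag in BLOCK_TAGS and self._open is None:
            self._open, self._text = tag, []

    def handle_endtag(self, tag):
        if tag == self._open:
            text = " ".join("".join(self._text).split())  # normalise whitespace
            if text:
                self.blocks.append((tag, text))
            self._open = None

    def handle_data(self, data):
        # Text outside a recognised block (e.g. in a <td>) is simply dropped.
        if self._open is not None:
            self._text.append(data)
```

Usage: create a `TagBlockSegmenter`, call `feed()` with the page's HTML, then `close()`, and read `blocks`. A navigation label inside a table cell never shows up, which is exactly the layout-versus-content problem the researchers describe.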

Microsoft's research accounts for semantics in each block and passage and any link within them. From this, they construct block-to-page and page-to-block subgraphs. These and other papers on the subject should make clear that we are not talking about the same things, Mike.
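As I read the Microsoft paper, the two subgraphs can be written as a page-to-block matrix X (how important each block is within its page) and a block-to-page matrix Z (which pages each block links to), giving a page graph W_P = XZ and a block graph W_B = ZX. The sketch below builds both in plain Python and runs a basic PageRank-style iteration on W_P. The block-importance weights are simply taken as inputs; how the paper derives them from visual segmentation is not reproduced, so treat this as an assumption-laden outline rather than the researchers' implementation.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def block_graphs(page_of_block, block_weight, block_links, num_pages):
    """Build the page graph W_P = X Z and the block graph W_B = Z X.

    page_of_block[b] -- index of the page that contains block b
    block_weight[b]  -- importance of block b within its page (assumed to
                        sum to 1 over the blocks of each page)
    block_links[b]   -- distinct pages that block b links to
    """
    num_blocks = len(page_of_block)
    # X: page-to-block matrix, X[p][b] = importance of block b in page p.
    X = [[0.0] * num_blocks for _ in range(num_pages)]
    for b, p in enumerate(page_of_block):
        X[p][b] = block_weight[b]
    # Z: block-to-page matrix, Z[b][p] = 1/s_b if block b links to page p,
    # where s_b is the number of pages block b links to.
    Z = [[0.0] * num_pages for _ in range(num_blocks)]
    for b, targets in enumerate(block_links):
        for p in targets:
            Z[b][p] = 1.0 / len(targets)
    return matmul(X, Z), matmul(Z, X)


def weighted_pagerank(W, damping=0.85, iterations=100):
    """Power iteration on a row-stochastic weight matrix W."""
    n = len(W)
    rank = [1.0 / n] * n
    for _ in range(iterations):
        rank = [(1 - damping) / n + damping * sum(rank[i] * W[i][j] for i in range(n))
                for j in range(n)]
    return rank
```

The design point is that a link's weight in W_P depends on the importance of the block it sits in, so a link from a page's main content block counts for more than one from a low-weight navigation block on the same page.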

In my own research, I use the concept of blocks exactly to mean passages in the readability sense. Thus we segment documents according to that definition. Then we apply semantic tools to what's inside these portions.


Previous work on segmentation:

Salton, G., Allan, J., and Buckley, C., Approaches to passage retrieval in full text information systems, In Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Pittsburgh, Pennsylvania, USA, 1993, pp. 49-58.

Callan, J. P., Passage-Level Evidence in Document Retrieval, In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Dublin, 1994, pp. 302-310.

Hearst, M. A., Multi-Paragraph Segmentation of Expository Text, In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, New Mexico State University, Las Cruces, New Mexico, 1994, pp. 9-16.

Salton, G., Singhal, A., Buckley, C., and Mitra, M., Automatic Text Decomposition Using Text Segments and Text Themes, In Proceedings of the Seventh ACM Conference on Hypertext (Hypertext'96), ACM Press, New York, 1996.

Kaszkiel, M. and Zobel, J., Passage Retrieval Revisited, In Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1997, pp. 178-185.

Ponte, J. M. and Croft, W. B., Text Segmentation by Topic, In Proceedings of the 1st European Conference on Research and Advanced Technology for Digital Libraries, 1997.

Crivellari, F. and Melucci, M., Web Document Retrieval Using Passage Retrieval, Connectivity Information, and Automatic Link Weighting--TREC-9 Report, In The Ninth Text REtrieval Conference (TREC 9), 2000.


Orion

fathom
10-08-2004, 04:09 PM
Is this fair?

In the general sense, "completely"... well, no more and no less fair than any business development: partnerships, limited partnerships, mergers, cooperatives, associations, community economic development, and county, state/province/country and international alliances.

It's also generally known as "leverage".

Mike Grehan
10-09-2004, 01:28 PM
Microsoft's research accounts for semantics in each block and passage and any link within them. From this, they construct a block-to-pages and pages-to-blocks subgraphs. These and other papers on the subject should make clear that we are not talking about the same things, Mike.

Orion,

I hope I haven't been too confusing in my post. I wasn't suggesting that Dr Chakrabarti's work (or the others) was related in any way to the Microsoft research. I was just giving a few general examples of the way that, over the years, researchers have attempted to better get the gist of a page by document analysis and modelling.

By the way, I've just finished reading the excellent paper you wrote on the results of your recent experiment (on topic analysis). It's fascinating.

On the themes subject, I think that there is still a little misunderstanding in some parts of the search marketing community about the principle. I don't think it's so widely known that the thematic structure of texts can be broken down and discovered in a paragraph or a sentence, for instance.

An on-topic web site, which begins as subject-specific and breaks down into the various subsets, is good in the utility sense for both the searcher and the search engine. But a web site that attempts to cover the same topic on every page, in an effort to look stronger for a specific query, is of little use to a searcher or a search engine, in my opinion.

I have to say, I think your posts here are quite scintillating. I always look forward to your observations and research.

Once again, my sincere compliments to you on your recent work.

Mike.

orion
10-10-2004, 05:24 PM
Hi, Mike.

Thank you for your kind words. I have the same respect for your work.

About the three papers you have referenced:

http://www10.org/cdrom/papers/489/
http://www2002.org/CDROM/refereed/579/
http://www2002.org/CDROM/poster/78.pdf

Certainly these research papers have merit and contain valuable information. The papers are aimed at mining the DOM and HTML tag structures of documents. However, these three models are hard to implement in the presence of commercial noise and, frankly, seem to fail to address the problem of unveiling semantics.

Contrary to what many think, Microsoft's block-level analysis is not a rehash of these segmentation techniques (DOM, HTML segmentation) or another name for old things. Rather, block-level analysis demonstrates why these models have failed. In the particular case of link models combined with DOM approaches (HITS, CLEVER, etc., and even PageRank), these models fail to grasp document semantics and provide no information about the underlying term data structures present in specific passages.

On the subject of themes, many SEOs are even confused by the idea that block level has something to do with theming. The misunderstanding seems to come from superficial references to expressions like "theme", "topics", and "content" when talking about block-level analysis. Let's address these issues separately: DOM and theming.


1. DOM/HTML-based Models

The above three references account for the mechanical structure of documents (DOM and HTML), but do not account for the fact that many of the same descriptors and tags are used for both layout (presentation) and content (e.g. <table>, <p>, <div>, etc.), and we haven't yet discussed CSS positioning.

DOM-based segmentation models didn't make a real impact in the area of semantics and are now history. HTML tag-based segmentation is going the same way. Unlike these two approaches, block-level analysis looks at the localized semantics rather than at the mechanical structure of a site.

One of the references above (http://www2002.org/CDROM/poster/78.pdf), when looking at improving retrieval precision, states the key idea of its HTML segmentation model as follows: "In this paper, we propose a technique to deal with the problem. The key idea is to segment each Web page to identify different micro information units or topic areas according to its HTML tags and contents." Another paper introduces the idea of "pagelets" as a novelty. The three papers ignore the problem of trying to extract semantics from mechanical structures. Once we attach tag requirements, mapping semantics becomes problematic or, at the very least, too restrictive.

I hate to revisit this quote from Microsoft's research team, but they put it better than I can:

"There are some approaches that take into account the problem of page segmentation, but there is no consistent way to do it and, to the best of our knowledge, few works are done on applying DOMbased page segmentation methods on web information retrieval. Some simple experiments are performed in, where sub-trees tagged with <TITLE>, <P>, <H1>~<H3> and <META> are treated as blocks, but the results are not encouraging. The reasons may lie in the following three aspects. First, DOM is still a linear structure, so visually adjacent blocks may be far from each other in the structure and departed wrongly. Secondly, tags such as "<TABLE>" and "<P>" are used not only for content presentation but also for layout structuring. It is therefore difficult to obtain the appropriate segmentation granularity. Thirdly, in many cases DOM prefers more on presentation to content and therefore not accurate enough to discriminate different semantic blocks in a web page". (http://research.microsoft.com/asia/dload_files/group/ims/21.pdf)


2. About Themes

I must agree with your observation that theming is not well understood in some SEO circles. Each page of a site stands in its own right, as a crawler cannot assign merit to a page based on the architecture in which the page resides. On the other hand, terms repeated/focused across many pages would compete with each other, blur the main theme of a site, and would not help a lot, as you rightly pointed out.

Many SEOs have the wrong perception about themes and block-level analysis, too. The two are not related. Theming is about designing a hierarchical architecture for a site. Block-level analysis is about extracting semantic information from specific portions of individual web documents.

My take on Microsoft's block-level analysis is that it could fail in the presence of commercial noise. I believe the best way to grasp semantics from commercial documents consists of extracting the underlying term data structures of those portions, blocks, or segments while they are in their "natural habitat", i.e., in the presence of commercial noise, and then mining those data structures. This is what on-topic analysis tries to accomplish.


3. About On-Topic Analysis

On-topic analysis consists of

(a) querying a system,
(b) collecting passages from the top N results, and
(c) "asking" the passages the following question:

"Show me your underlying data structure"

Passages can be titles, descriptions, abstracts and even a link array (if we want to uncover data structures from a link subgraph of the Web).

In my view, the most significant finding of the on-topic analysis paper is that it is possible to extract well-organized data structures from commercial collections full of noise, and to do so client-side. These data structures often follow the sequence

Top > Broader > Narrower, etc.
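Purely as an illustration of what "show me your underlying data structure" might look like, the sketch below assumes the passages from the top N results have already been collected (no search API is called) and groups terms into Top, Broader and Narrower tiers by the fraction of passages they appear in, plus a simple co-occurrence ratio between two terms. This is my own hypothetical stand-in, including the stopword list and the cut-offs; it is not the on-topic analysis method described in the paper.

```python
import re
from collections import Counter

# Hypothetical minimal stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "is", "on", "with"}


def term_sets(passages):
    """Distinct non-stopword terms in each passage."""
    return [set(re.findall(r"[a-z]+", p.lower())) - STOPWORDS for p in passages]


def tiers(passages, top_cut=0.75, broad_cut=0.4):
    """Group terms by the fraction of passages they appear in."""
    sets = term_sets(passages)
    df = Counter(t for s in sets for t in s)  # passage frequency per term
    n = len(sets)
    result = {"top": [], "broader": [], "narrower": []}
    # Most widespread terms first; ties broken alphabetically.
    for term, count in sorted(df.items(), key=lambda kv: (-kv[1], kv[0])):
        share = count / n
        if share >= top_cut:
            result["top"].append(term)
        elif share >= broad_cut:
            result["broader"].append(term)
        else:
            result["narrower"].append(term)
    return result


def co_occurrence(passages, a, b):
    """Jaccard-style ratio: passages with both terms / passages with either."""
    sets = term_sets(passages)
    both = sum(1 for s in sets if a in s and b in s)
    either = sum(1 for s in sets if a in s or b in s)
    return both / either if either else 0.0
```

With five made-up passages about pizza, two of which mention Lake Tahoe, "pizza" lands in the Top tier, "tahoe" in Broader, and one-off terms such as "dough" or "coupons" in Narrower.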

I am fascinated by the idea that we can extract order from disordered systems, since my formal education is in this area.

Now once we know the right structures, we can study them or target them for monetary gain or any other purpose at hand.

My problem now is how to explain what is/is not on-topic analysis and what it can/cannot do for SEOs. Like any IR model, it has limitations and should not be taken as an instant-gratification approach or a silver bullet. I have discussed block analysis, theming, and on-topic analysis before, and it seems many folks are confused about these subjects.


Orion