Search Engine Watch
SEO News

Old 08-04-2005   #1
Mel
Just the facts ma'm
 
Join Date: Jun 2004
Location: Malaysia
Posts: 793
Mel is just really nice
Ridiculous Increase In Yahoo Backlink Counts & Is Bigger Index Real?

Just noted tonight that my 2,000+ links on Yahoo had shot up to over 24,000!

I checked with a few others on our forum, who are all reporting roughly tenfold increases in the backlinks reported by Yahoo.
__________________
Mel Nelson
Expert SEO Dont settle for average SEO
Singapore Search Engine Optimization and web design
Old 08-08-2005   #2
xbob
SEO Addict
 
Join Date: Aug 2005
Location: McCleary, WA
Posts: 3
xbob is on a distinguished road
Yahoo link surge possibly a programming error?

Yes, this has been happening for two solid days now. It is also occurring in Overture's Alltheweb (which is basically Yahoo as well).

One strange thing I noticed: if you click through the result pages and watch the number of backlinks at the top of each page, you will find what appears to be the actual number of backlinks, as opposed to the erroneous figure you receive from the initial link: search.

I feel sorry for the programmer who made that mistake.
Old 08-08-2005   #3
shor
aka Lucas Ng. Aussie online marketer.
 
Join Date: Aug 2004
Posts: 161
shor is a jewel in the rough
Well, at least we know where most of these new backlinks came from:
Yahoo Expands Its Search Engine Index
Old 08-08-2005   #4
Everyman
Member
 
Join Date: Jun 2004
Posts: 133
Everyman is a jewel in the rough
Yahoo now claims 19 billion documents indexed, according to a widely-distributed Associated Press piece by Michael Liedtke that went out today, August 8, 2005. This is the first time Yahoo has announced its total.

Last November Google said, "I raise my bet from 4.2 billion to 8.2 billion." I was skeptical of Google's numbers. On my large site, I can prove that Google's counting is utterly inflated, by as much as 10 to 20 times the total pages that have ever existed on this site. This has been going on for many months.

Yahoo's actual counts for numbers of pages have been more believable than Google's since it started, but recently they're inflated also.

With this 19 billion figure that Yahoo reported today to the Associated Press, I think Yahoo got tired of Google's numbers, and is saying to Google, "I'll see your 8.2 billion and raise you 10.8 billion."

I don't believe Yahoo's numbers either, but nothing will shut Google up except a better bluff.

Here's what might be going on: With the Google/Baidu bubble, some people in high positions are thinking that they had better start letting the air out of Google before it bursts. It's better to have a soft landing, because there will be less damage that way.
Old 08-08-2005   #5
PhilC
Member
 
Join Date: Oct 2004
Location: UK
Posts: 1,657
PhilC has much to be proud of
I've been posting that Google's 8 billion figure isn't true ever since they put it up, but people still think they have that number of pages in the index. One of my sites has a maximum of ~23,000 pages, but, when Google were claiming 4 billion, they always reported anywhere between 23,000/24,000 and the low 30 thousands. When they put the 8 billion figure up, that site's reported pages also doubled. There is still a maximum of ~23,000, but they've been reporting in the high 50 thousands or low 60 thousands since then.
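
To put rough numbers on that, here is a minimal Python sketch of the arithmetic. The function name and the two sample values (30,000 and 60,000) are assumptions picked from inside the ranges quoted above. They are not measured figures.

```python
def inflation_factor(reported_pages: int, actual_max_pages: int) -> float:
    """Return how many times larger the reported count is than the real maximum."""
    if actual_max_pages <= 0:
        raise ValueError("actual_max_pages must be positive")
    return reported_pages / actual_max_pages


ACTUAL_MAX = 23_000  # approximate real page count quoted in the post

# Illustrative values inside the quoted ranges (assumed, not measured).
before = inflation_factor(30_000, ACTUAL_MAX)  # "low 30 thousands"
after = inflation_factor(60_000, ACTUAL_MAX)   # "low 60 thousands"

print(f"before: {before:.2f}x, after: {after:.2f}x")
```

With those sample values the reported count goes from about 1.3 times the real maximum to about 2.6 times, which matches the doubling described above.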

It sounds like Yahoo! has decided to play the same game, although that doesn't account for the silly backlinks figures.
Old 08-09-2005   #6
shor
aka Lucas Ng. Aussie online marketer.
 
Join Date: Aug 2004
Posts: 161
shor is a jewel in the rough
Quote:
Originally Posted by PhilC
I've been posting that Google's 8 billion figure isn't true ever since they put it up, but people still think they have that number of pages in the index. One of my sites has a maximum of ~23,000 pages, but, when Google were claiming 4 billion, they always reported anywhere between 23,000/24,000 and the low 30 thousands. When they put the 8 billion figure up, that site's reported pages also doubled. There is still a maximum of ~23,000, but they've been reporting in the high 50 thousands or low 60 thousands since then.
This is also true for one of our database-driven sites. It is not mathematically possible for our site to have 3 million documents in the index, but that was the number Google published for around 5 weeks. That number has since fallen: the current index total for the site sits at just above 2 million out of a possible ~2.5 million.
Old 08-09-2005   #7
PhilC
Member
 
Join Date: Oct 2004
Location: UK
Posts: 1,657
PhilC has much to be proud of
I've been playing around a bit with Yahoo!'s backlinks searches, and, in spite of the (claimed) fact that they now have 19 billion pages in the index, I've found that the backlinks search is broken - maybe even intentionally.

On the site in my profile, a backlinks search for the homepage returns 254,000 backlinks. The same search for the PageRank article in the site returns 238,000. But a linkdomain: search for the whole site returns only 85,200 backlinks. The backlinks searches are broken. I say "only 85,200", but that's probably a ridiculous figure as well.
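
The inconsistency can be written as a simple check. This is only a sketch, and it assumes the page-level and site-level searches count the same kinds of links, so that no single page can have more backlinks than the whole site. The function name and the page labels are made up for illustration. The three counts are the ones quoted above.

```python
def find_inconsistent_counts(site_total: int, page_counts: dict[str, int]) -> list[str]:
    """Return the pages whose own backlink count exceeds the whole-site count.

    A site-wide count covers links to every page on the site, so a single
    page reporting more backlinks than the site total signals a broken count.
    """
    return [page for page, count in page_counts.items() if count > site_total]


# Figures quoted in the post; the page labels are hypothetical.
reported = {"homepage": 254_000, "pagerank_article": 238_000}
linkdomain_total = 85_200

suspect = find_inconsistent_counts(linkdomain_total, reported)
print(suspect)
```

Both pages are flagged, which is the reason for calling the backlinks searches broken.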

I did find one thing that may account for where some of the claimed massive increase in the index size came from. When Yahoo! bought Inktomi, they acquired Inktomi's large index of banned sites/pages, which included one of my sites. Yahoo! said that bans would stay. Last night I found that the site's home page isn't banned any more, although it does rank way below where it would normally rank on a search for its exact Title - like it's now penalised, but not banned. So maybe Yahoo! has declared an amnesty on the pages/sites in the banned index and released them in a penalised state. They could have done it to inflate the index size, or because they think that their programming is now good enough to catch them again if they are still 'bad'.

Having said that, it's only the index page that's no longer banned, but that may be due to changes in the site since the ban.
Old 08-10-2005   #8
Ben Anderson
Member
 
Join Date: Jan 2005
Posts: 8
Ben Anderson is on a distinguished road
Programmer's Mistake

xbob, what exactly do you mean? How can I see the apparent actual number of backlinks?
Old 08-11-2005   #9
hardball
Member
 
Join Date: Oct 2004
Posts: 83
hardball will become famous soon enough
Indexed and available are two different things. It's doubtful anyone has their genuine "index count" in one index; we already know Google runs several indexes: supplemental, main, dupes, banned. For all anyone knows it could be 100 different indexes, so the big number doesn't mean much.
Old 08-11-2005   #10
Mel
Just the facts ma'm
 
Join Date: Jun 2004
Location: Malaysia
Posts: 793
Mel is just really nice
Interesting idea that Google may have separate indexes for banned and duplicate sites. Can you point me to something which substantiates this idea?
Old 08-11-2005   #11
Marcia
 
Join Date: Jun 2004
Location: Los Angeles, CA
Posts: 5,476
Marcia has a reputation beyond repute
Quote:
we already know Google runs several indexes: supplemental, main, dupes, banned. For all anyone knows it could be 100 different indexes
No, we don't know that. Is there any documentation or verification for that statement?
Old 08-12-2005   #12
hardball
Member
 
Join Date: Oct 2004
Posts: 83
hardball will become famous soon enough
Quote:
Originally Posted by Mel
Interesting idea that Google may have separate indexes for banned and duplicate sites. Can you point me to something which substantiates this idea?
Maybe they just drag them to the recycle bin (my bad).

As I said: "For all anyone knows".

My take is that it is entirely possible for Y! to take the lead in index size. They have one critical advantage: fewer queries per second. Searches on large indexes begin to bog down not only with the size of the index but also with the load placed on the query servers (a trade-off that could work for Y!). In addition, the latecomer has the hardware advantage: Google is probably still using a lot of old commodity boxes that lack crunch. I don't think Y! bought up surplus PCs for infrastructure, but again I can't prove it.

You'll see consistent "pages found" for a "the" search across all of Y!'s search properties (AV, ATW and Y!); Google and MSN show significantly fewer results.

That being said, size as it relates to search doesn't matter all that much after a certain point; it's not necessarily a "quality signal".

Last edited by hardball : 08-12-2005 at 12:28 PM.
Old 08-12-2005   #13
PerformanceSEO
"Be Accountable"
 
Join Date: Aug 2005
Location: Los Angeles, CA
Posts: 78
PerformanceSEO will become famous soon enough
I noticed substantial Yahoo updates in regards to links over the past 7 days or so.

Last edited by Chris_D : 08-12-2005 at 10:10 PM. Reason: Sorry - no sigs - see http://forums.searchenginewatch.com/faq.php?faq=vb_user_maintain#faq_sigfiles
Old 08-14-2005   #14
Mel
Just the facts ma'm
 
Join Date: Jun 2004
Location: Malaysia
Posts: 793
Mel is just really nice
Quote:
Originally Posted by hardball
...

My take is that it is entirely possible for Y! to take the lead in index size. They have one critical advantage: fewer queries per second. Searches on large indexes begin to bog down not only with the size of the index but also with the load placed on the query servers (a trade-off that could work for Y!). In addition, the latecomer has the hardware advantage: Google is probably still using a lot of old commodity boxes that lack crunch. I don't think Y! bought up surplus PCs for infrastructure, but again I can't prove it. ...
I suspect that this is not the case at all. Search engines don't search the full page content of every page in the index for every query.

For instance, Google has presorted inverted word barrels and chooses only the top n results from the relevant barrels, so that the time to search a large index or a small index is exactly the same.
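
To illustrate the idea, here is a toy Python sketch. It is not Google's implementation: the term-count scoring and the names `build_index` and `top_n` are assumptions made up for this example. Each word maps to a posting list that is sorted once at index time, and a single-word query then reads only the first n entries.

```python
from collections import defaultdict


def build_index(docs: dict[int, str]) -> dict[str, list[tuple[int, int]]]:
    """Map each word to a posting list of (doc_id, term_count), best first."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        counts: dict[str, int] = {}
        for word in text.lower().split():
            counts[word] = counts.get(word, 0) + 1
        for word, count in counts.items():
            index[word].append((doc_id, count))
    # Presort once at index time so that queries never have to sort.
    for postings in index.values():
        postings.sort(key=lambda p: (-p[1], p[0]))
    return dict(index)


def top_n(index: dict[str, list[tuple[int, int]]], word: str, n: int) -> list[tuple[int, int]]:
    """Return the first n postings for a word without scanning the rest."""
    return index.get(word.lower(), [])[:n]


docs = {
    1: "yahoo index yahoo yahoo",
    2: "google index",
    3: "yahoo google yahoo",
}
index = build_index(docs)
print(top_n(index, "yahoo", 2))
```

In this toy the lookup cost depends on n and not on how long the posting list is. Multi-word queries, and the storage and update cost of the index itself, are outside what the sketch shows.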

I also strongly suspect that all search engines update their servers from time to time, so the latecomer does not automatically have the hardware advantage. Possibly more important is the way the hardware is used, and Google's system of distributing the computing chores over more than 10,000 machines is hard to beat.

Last edited by Mel : 08-14-2005 at 08:54 PM.
Old 08-15-2005   #15
dyn4mik3
Michael Nguyen
 
Join Date: Feb 2005
Location: Riverside,CA
Posts: 49
dyn4mik3 is on a distinguished road
Three researchers from NCSA just released a study on Yahoo/Google index sizes.

My short summary and a link to the study. Obviously the whole study is based on some heavy assumptions, but it's an interesting read.
Old 08-16-2005   #16
shor
aka Lucas Ng. Aussie online marketer.
 
Join Date: Aug 2004
Posts: 161
shor is a jewel in the rough
Interesting study, but was it necessary?

The whole Cold War my-index-is-bigger-than-yours one-upmanship has been around for a long time and means little beyond what a journalist can quote in the media. The public can never verify the actual index size without a proper third-party audit, as the SEs can simply claim that an outsider cannot properly measure their index.

Thus we have the ongoing e-penis wars. THIS BIG
Old 08-16-2005   #17
Mel
Just the facts ma'm
 
Join Date: Jun 2004
Location: Malaysia
Posts: 793
Mel is just really nice
IMO it's interesting to know how many pages search engines have indexed, as the more pages they have, the more likely they are to contain content which other engines do not.

I agree that the "my index (or whatever) is bigger than yours" scenario is a bit overdone, but what else can the people in the Yahoo marketing department do to convince users that theirs is the best search engine?

Surely not a search engine shoot out?
Old 08-16-2005   #18
dannysullivan
Editor, SearchEngineLand.com (Info, Great Columns & Daily Recap Of Search News!)
 
Join Date: May 2004
Location: Search Engine Land
Posts: 2,085
dannysullivan has much to be proud of
Surely yes, a shoot out and regular relevancy reporting according to an agreed industry standard would be much better. Size is just the same old thing, a figure used as a surrogate for relevancy but which doesn't mean the same thing at all.
Old 08-16-2005   #19
xan
Member
 
Join Date: Feb 2005
Posts: 238
xan has a spectacular aura about
Hi all,

I think that there are many reasons to be skeptical about it. They have a huge amount of data (assume they do have a massive index) and the same method of providing results. If anything the results run the risk of being worse.

There have been tests done by Google and the National Center for Supercomputing Applications which seem to show that the index is not that big at all.

Large indexes go stale pretty quick. Have a look at the crawls on your site maybe, are they harder and faster?

I wrote a lot more on this at your trusty Search Science blog.

Hope you find it interesting. Let's see what happens to this scenario.
Old 08-16-2005   #20
PerformanceSEO
"Be Accountable"
 
Join Date: Aug 2005
Location: Los Angeles, CA
Posts: 78
PerformanceSEO will become famous soon enough
Quote:
Originally Posted by dannysullivan
Surely yes, a shoot out and regular relevancy reporting according to an agreed industry standard would be much better. Size is just the same old thing, a figure used as a surrogate for relevancy but which doesn't mean the same thing at all.
I fully agree with Danny. There should be routine, independent third-party testing of relevancy on a semi-annual or quarterly basis. Perhaps by the IAB or the W3C? This would really help the public understand which search engine is "bigger and/or better".

I recall when MSN dropped Looksmart from their syndication earlier than anticipated after being given a walk-through of the results which turned out to be completely off-topic.