#1
Ridiculous Increase In Yahoo Backlink Counts & Is Bigger Index Real?
Just noticed tonight that my 2,000+ backlinks on Yahoo had shot up to over 24,000!
I checked with a few others on our forum, and they are all reporting roughly tenfold increases in the backlinks Yahoo reports.
#2
Yahoo link surge possibly a programming error?
Yes, this has been happening for two solid days now. It is also occurring on Overture's Alltheweb (which is basically Yahoo too).
One strange thing I noticed: if you click through the result pages and watch the number of backlinks at the top of the page, you will find what appears to be the actual number of backlinks, as opposed to the erroneous figure you receive from the initial link: search. I feel sorry for the programmer who made that mistake.
#3
Well, at least we know where most of these new backlinks came from:
Yahoo Expands Its Search Engine Index

#4
Yahoo now claims 19 billion documents indexed, according to a widely distributed Associated Press piece by Michael Liedtke that went out today, August 8, 2005. This is the first time Yahoo has announced its total.

Last November Google said, "I raise my bet from 4.2 billion to 8.2 billion." I was skeptical of Google's numbers. On my large site, I can prove that Google's counts are utterly inflated, by as much as 10 to 20 times the total number of pages that have ever existed on the site, and this has been going on for many months. Yahoo's page counts have been more believable than Google's since it started, but recently they have become inflated too.

With the 19 billion figure that Yahoo gave the Associated Press today, I think Yahoo got tired of Google's numbers and is saying to Google, "I'll see your 8.2 billion and raise you 10.8 billion." I don't believe Yahoo's numbers either, but nothing will shut Google up except a better bluff.

Here's what might be going on: with the Google/Baidu bubble, some people in high positions are thinking that they had better start letting the air out of Google before it bursts. A soft landing is better, because it does less damage.
#5
I've been posting that Google's 8 billion figure isn't true ever since they put it up, but people still think they have that many pages in the index. One of my sites has a maximum of ~23,000 pages, but when Google were claiming 4 billion, they always reported anywhere from 23,000-24,000 up to the low 30,000s. When they put the 8 billion figure up, that site's reported page count also doubled. There is still a maximum of ~23,000 pages, but they've been reporting the high 50,000s or low 60,000s ever since.
It sounds like Yahoo! has decided to play the same game, although that doesn't account for the silly backlink figures.
#6

#7
I've been playing around a bit with Yahoo!'s backlinks searches, and in spite of the (claimed) fact that they now have 19 billion pages in the index, I've found that the backlinks search is broken, maybe even intentionally.
On the site in my profile, a backlinks search for the homepage returns 254,000 backlinks. The same search for the PageRank article on the site returns 238,000. But a linkdomain: search for the whole site returns only 85,200 backlinks, which can't be right, since links to the whole site should number at least as many as links to any one page on it. The backlinks searches are broken. I say "only 85,200", but that's probably a ridiculous figure as well.

I did find one thing that may account for some of the claimed massive increase in index size. When Yahoo! bought Inktomi, they acquired Inktomi's large index of banned sites/pages, which included one of my sites. Yahoo! said that the bans would stay. Last night I found that the site's home page isn't banned any more, although it ranks way below where it normally would on a search for its exact Title, as if it's now penalised but not banned. So maybe Yahoo! has declared an amnesty on the pages/sites in the banned index and released them in a penalised state. They could have done it to inflate the index size, or because they think their programming is now good enough to catch them again if they are still 'bad'. Having said that, only the index page is no longer banned, and that may be due to changes in the site since the ban.
#8
Programmer's Mistake
xbob, what exactly do you mean? How can I see the apparent actual number of backlinks?
#9
|
Indexed and available are two different things. It's doubtful anyone keeps their genuine "index count" in one index; we already know Google runs several indexes: supplemental, main, dupes, banned. For all anyone knows there could be 100 different indexes, so the big number doesn't mean much.
#10
|
Interesting idea that Google may have separate indexes for banned and duplicate sites. Can you point me to something that substantiates it?
#11

#12
As I said: "For all anyone knows". My take is that it is entirely possible for Y! to take the lead in index size, because they have one critical advantage: fewer queries per second. Searches on large indexes begin to bog down not only with the size of the index but with the load placed on the query servers (a trade-off that could work for Y!). In addition, the latecomer has the hardware advantage: Google is probably still using a lot of old commodity boxes that lack crunch, and I don't think Y! bought up surplus PCs for its infrastructure, but again I can't prove it.

You'll see consistent "pages found" counts for a search on "the" across all of Y!'s search properties (AV, ATW and Y!), while Google and MSN show significantly fewer results. That being said, size as it relates to search doesn't matter all that much after a certain point; it's not necessarily a "quality signal".
Last edited by hardball : 08-12-2005 at 12:28 PM.
#13
I noticed substantial Yahoo updates with regard to links over the past 7 days or so.
Last edited by Chris_D : 08-12-2005 at 10:10 PM. Reason: Sorry - no sigs - see http://forums.searchenginewatch.com/faq.php?faq=vb_user_maintain#faq_sigfiles
#14
In the case of Google, for instance, they have presorted inverted word barrels and take only the top n results from the relevant barrels, so that searching a large index takes exactly the same time as searching a small one. I also strongly suspect that all search engines update their servers from time to time, so the latecomer does not automatically have the hardware advantage. Possibly more important is the way the hardware is used, and Google's system of distributing the computing chores over more than 10,000 machines is hard to beat.
Last edited by Mel : 08-14-2005 at 08:54 PM.
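Mel's point about presorted barrels can be sketched roughly as follows. This is a toy illustration under stated assumptions, not Google's actual implementation: each term's posting list is assumed to be presorted by a single query-independent score (the hypothetical `static_score`), and a query reads at most `depth` entries from the head of each list. The work per query is then bounded by the number of query terms times `depth`, no matter how many documents are indexed. The names `build_index`, `top_n`, `static_score` and `depth` are made up for this example.

```python
from typing import Dict, List, Optional, Set, Tuple


def build_index(docs: Dict[int, str], static_score: Dict[int, float]) -> Dict[str, List[int]]:
    """Build a toy inverted index whose posting lists are presorted by a
    query-independent score, best document first (ties broken by doc id)."""
    postings: Dict[str, List[int]] = {}
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            postings.setdefault(term, []).append(doc_id)
    return {
        term: sorted(ids, key=lambda d: (-static_score[d], d))
        for term, ids in postings.items()
    }


def top_n(
    index: Dict[str, List[int]],
    query: str,
    static_score: Dict[int, float],
    n: int,
    depth: int,
) -> Tuple[List[int], int]:
    """Return up to n doc ids matching every query term, reading at most
    `depth` postings per term, plus the number of postings actually read."""
    candidates: Optional[Set[int]] = None
    postings_read = 0
    for term in query.lower().split():
        head = index.get(term, [])[:depth]  # only the head of the presorted list
        postings_read += len(head)
        head_set = set(head)
        candidates = head_set if candidates is None else candidates & head_set
    if not candidates:
        return [], postings_read
    ranked = sorted(candidates, key=lambda d: (-static_score[d], d))
    return ranked[:n], postings_read


if __name__ == "__main__":
    docs = {
        1: "yahoo index size",
        2: "google index size",
        3: "yahoo backlinks",
        4: "index size claims",
    }
    scores = {1: 0.9, 2: 0.8, 3: 0.5, 4: 0.3}
    idx = build_index(docs, scores)
    print(top_n(idx, "index size", scores, n=2, depth=2))  # ([1, 2], 4)
```

The trade-off in this sketch is recall: a document that matches every query term but sits deep in one of the lists is never considered, so bounded query time is bought at the cost of possibly missing some matches.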
#15
Three researchers from NCSA just released a study on Yahoo/Google index sizes.
Here is my short summary and a link to the study. Obviously the whole study rests on some heavy assumptions, but it's an interesting read.
#16
Interesting study, but was it necessary?
The whole Cold War, my-index-is-bigger-than-yours one-upmanship has been around for a long time and means little beyond what a journalist can quote in the media. The public can never verify the actual index size without a proper third-party audit, as the SEs can simply claim that an outsider cannot properly measure their index. Thus we have the ongoing e-penis wars. THIS BIG
#17
IMO it's interesting to know how many pages search engines have indexed, as the more pages they have, the more likely they are to contain content that other engines do not.
I agree that the my-index-(or whatever)-is-bigger-than-yours scenario is a bit overdone, but what else can the people in the Yahoo marketing department do to convince users that theirs is the best search engine? Surely not a search engine shoot-out?
#18
Surely yes: a shoot-out and regular relevancy reporting against an agreed industry standard would be much better. Size is just the same old thing, a figure used as a surrogate for relevancy that doesn't mean the same thing at all.
#19
|
Hi all,
I think there are many reasons to be skeptical about it. Even assuming they do have a massive index, they have a huge amount of data and the same method of providing results, so if anything the results run the risk of getting worse. There have been tests done by Google and the National Center for Supercomputing Applications which seem to show that the index is not that big at all. Large indexes also go stale pretty quickly. Have a look at the crawls on your site: are they harder and faster? I wrote a lot more on this at your trusty Search Science blog. Hope you find it interesting. Let's see how this scenario plays out.
#20
I recall when MSN dropped LookSmart from their syndication earlier than anticipated, after being given a walk-through of the results, which turned out to be completely off-topic.