Significant Changes In Google Results: March 2005


PhilC
03-23-2005, 12:23 AM
There were some significant ranking changes in the serps today which are being talked about in at least one forum, but I don't believe that it's an update in the normal sense. Bear with me while I explain...

I started watching the 64 DCs some weeks ago. At that time I recognised 3 distinct groups of DCs. They were recognisable by the significantly different rankings that were returned for a particular searchterm. I look lower down the serps where the differences are greater and more easily noticed. There are DC sets within the groups, which return slightly different rankings, but not too significant.

Soon after I started watching, 2 of the DC groups merged, and there have been 2 groups ever since. Group A was the smallest. It had about one third of the DCs, and group B had the other 2 thirds. Through the weeks, group A has gradually grown to about 2 thirds of the DCs - the trend is towards group A.

Today group A had all but 3 of the DCs when I first looked - that's 61 out of 64. It meant that many people saw changes in rankings due to a lot more DCs returning group A results, plus some shuffling within the original group A DCs, and it has been talked about.

But even now, things are back to the way they were yesterday, with group A having about 2 thirds of the DCs. I've seen it happen before but not to the extent that it did today.

I keep using the word "about" when it comes to numbers. It's because we don't always receive the results from the datacenter that we explicitly search. We are sometimes redirected to another datacenter. It's something to bear in mind if you use one of those "Google dance" tools.

So right now, things look pretty much the same as they did at this time yesterday. There was a significant change, and people noticed it, but it doesn't appear to have lasted.

Anyone seen anything? Anyone seen any significant ranking changes that *have* lasted? I don't analyse the DC results so I could be missing something.

Robert_Charlton
03-23-2005, 05:24 AM
>There was a significant change, and people noticed it, but it doesn't appear to have lasted.

Phil - Not sure which forum you're talking about, but I was an early reporter of some changes on one discussion, and one significant change I saw does seem to have lasted till now... ie, one search phrase I've monitored apparently falling to oblivion and not yet returning to what had heretofore been a pretty solid position.

It wasn't a highly commercial phrase... more informational... but it brought visitors to a site, and I'm hoping it's temporary. I've seen lots of pages drop out and reappear from time to time. The disappearance of this one gives me an uneasy feeling, though, because it was accompanied by some other changes... the biggest movement I've seen in these particular serps in a year... and I'm sensing algo change rather than glitch. It's just a gut feeling, though.

nongasia
03-23-2005, 06:28 AM
Results today on many terms are different from what they have been for the past week. I have joined this forum to try and find out how this all works. It is very difficult, if not impossible, for small businesses to keep up with these changes. I started reading everyone's opinions about <h1><h4> etc, inbound links, outbound links, keyword density, and how some search engines look at Meta tags and Google doesn't. How does a small guy with a website about his business achieve any sort of position in a search? Recently, looking for an Indian restaurant in a particular town generated results about hotels with restaurants and references to 'Indian Ocean' islands.

It is interesting to move outside the box and try searches for things not relevant to your own business and see what kind of results show up. It is all very confusing.

sootledir
03-23-2005, 08:32 AM
I've seen a great number of changes, including the re-emergence of terms that were "lost" on 2/2. The results are very unstable and blink on and off.

inlogicalbearer
03-23-2005, 10:28 AM
On my side I see two different sets, even this morning: the first set, with the stop words manipulated to 8 billion, and the new one, going off and on since the first week of March, where "the" gives you less than 3 billion.

http://inlogicalbearer.blogspot.com/2005/03/something-have-changed-at-google.html

chriseo
03-23-2005, 10:52 AM
Hi all, first post :) I have noticed quite a few changes recently with results. But I have also seen backlinks in Google alternate for the past 2 weeks. It seems every day they switch between 2 sets of results. Any thoughts?

PhilC
03-23-2005, 11:05 AM
The results across the DCs continue to change for me, but on the whole, group A still continues to grow.

The "blinking" is due to receiving the results from different datacenters, which I separate into 2 groups. Right now, there are 52 in group A and 12 in group B. For a search on "the" (no quotes), group B DCs return 8 billion results and group A returns various figures between 2.96 billion and 3.7 billion, depending on which DC supplies the results. For other searches, the 3 sets that are currently in group B vary in the number of results returned. (A "set" is a small group of DCs that have the same IP address except for the last number - e.g. 64.233.171.xxx)
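A minimal sketch of how the "sets" described above could be worked out from a list of server IPs, by grouping on everything except the last number. The function name and the sample IPs are assumptions made up for illustration; they are not an actual list of Google servers.

```python
from collections import defaultdict


def group_into_sets(ips):
    """Group IP addresses by their first three numbers (one "set" each)."""
    sets = defaultdict(list)
    for ip in ips:
        # Drop the last dotted number: "64.233.171.99" -> "64.233.171"
        prefix = ip.rsplit(".", 1)[0]
        sets[prefix].append(ip)
    return dict(sets)


# Hypothetical sample; the last numbers are made up for illustration.
sample_ips = ["64.233.171.99", "64.233.171.104", "216.239.57.99"]
for prefix, members in sorted(group_into_sets(sample_ips).items()):
    print(prefix + ".xxx", len(members))
```

Grouping on the prefix keeps the comparison at the level of a set, so one odd server in a set is easy to spot against its neighbours.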

It's just not like it used to be, where the different DCs would converge over a few days. They have shown no signs of converging since I've been watching them. Even though group A is getting bigger, there are a number of different results being returned by its datacenters, and the same is true of group B. My impression is that there are 2 Googles - 2 distinctly different algorithms. And within each Google, there are small variations in algorithms. It seems to me that we can no longer think in terms of a page's ranking for a particular searchterm - we have to think of its various rankings for the searchterm.

hiero
03-23-2005, 11:09 AM
I've seen mixed results being returned for the same searches since the February changes, so I don't believe this to be new. I do the same searches on a daily basis and have been doing them for some time since February and there are 2 sets of results being returned. One set has a larger index and the other a smaller index. It's weird that some people have just started to see that, maybe it was being tested on the west coast DCs.

dazzlindonna
03-23-2005, 11:18 AM
I've seen the same thing, PhilC. In my case, Group B resulted in a return of my rankings from the mid-December update (which not everyone noticed), but Group A is now on nearly every datacenter and it looks like the return of the mid-December results. Like you, I saw Group B as the dominant group most of the time over the last few weeks, but the last couple of days have seen Group A overtake Group B.

lots0
03-23-2005, 11:39 AM
>My impression is that there are 2 Googles - 2 distinctly different algorithms.

Now you're making some sense, not that you don't usually. ;)

I remember discussing the issue of different google algos some time ago (about a year ago), with Chris Ridings. It seemed to me to be the only explanation of what was going on with the google SERP.

If you want to discourage people from learning your algo, I can't think of a better way than to keep changing it up on an almost random basis.

When the algo shifted the other night, I had two servers go down because the traffic I was getting from google overwhelmed them. Let me stress that this was not traffic from googlebot; this was genuine, real traffic from google, and a lot of it. I've never seen anything like it before. The traffic lasted only about 6 hours, then the SERP reverted to what it was before and the traffic went back to "normal".

<added>
I do believe that there are at least three completely different algos in play with google.

PhilC
03-23-2005, 11:54 AM
>It's weird that some people have just started to see that, maybe it was being tested on the west coast DCs.

It's been discussed, but quietly, for some time. I brought it up yesterday because the change (in group A) was far more significant than the changes that we've been used to for a while.

Some time ago, Matt Cutts said that Google uses several algorithms at random. Until I started to watch the DCs a few weeks ago, I simply didn't believe it. But I changed my mind. I don't believe the different numbers of results are anything to do with different index sizes, because it doesn't make sense to not update a bunch of DCs for so long. It does make sense to use different DCs to test algo tweaks without ruining all the serps. It also makes sense to run significantly different algos on some DCs while a major change is being tested and tweaked - the 2 groups.

PhilC
03-23-2005, 12:00 PM
>I do believe that there are at least three completely different algos in play with google.

There was a small group (2 sets - 64.233.183 and 64.102.11) that held its own, but I lumped it in with group B because its rankings were closer to B than to A. But it seems to have merged into group A now. Maybe you mean another group. Which DCs are in the third group?

lots0
03-23-2005, 12:08 PM
Phil, I am seeing the same thing you are.

It does not look like the "third" algo is currently in use.

But, I'll bet a third (smaller) group breaks out again in the next few days.

PhilC
03-23-2005, 12:12 PM
Well that small group has been in A for at least a week now, but if you want to bet on it....

any takers????? ;)

Frank Kilkelly
03-23-2005, 12:23 PM
3 different algos? Could it be possible that Google is monitoring and comparing how users interact with the different algos? e.g. Algo A results in 33% fewer result page navigations than Algo B, so therefore the user must be more satisfied with the results from Algo A. They could be testing the relevancy satisfaction of different algos.
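To make the arithmetic in that example concrete, here is a rough sketch of the comparison. The per-session averages are invented purely for illustration, and nothing here is known about how Google actually measures user interaction.

```python
def relative_reduction(navs_a, navs_b):
    """Fraction by which algo A needs fewer result page navigations than algo B."""
    if navs_b <= 0:
        raise ValueError("navs_b must be positive")
    return (navs_b - navs_a) / navs_b


# Hypothetical average result page navigations per search, per algo.
avg_navs = {"A": 2.0, "B": 3.0}
reduction = relative_reduction(avg_navs["A"], avg_navs["B"])
print(f"Algo A: {reduction:.0%} fewer result page navigations than algo B")
```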

lots0
03-23-2005, 12:28 PM
I guess I was wrong.

64.233.167.99

216.239.57.105

64.233.183.99


When I last checked these three were returning different results.

PhilC
03-23-2005, 12:28 PM
That's quite possible. But there are more than 3 algos, imo. Each of the 2 groups produces several different "of about" numbers for the results, and, rightly or wrongly, I put that down to minor algo tweaks/variations. I'm assuming that all the DCs have the same index most of the time.

RyanM
03-23-2005, 12:29 PM
Yesterday a co-worker and I were running a search for the number of pages with a particular website. The search on Google was "site:www.domain.com www". She got 4,000 pages while I got 1,400 pages. We wrote it off as a fluke, but PhilC's post sheds some light on this.

sootledir
03-23-2005, 12:34 PM
The main reason they would do something like this would involve measuring user metrics. Does he hit the back button? Does he read 4 pages?

This way they could monitor the "quality" of the SERPS based on several distinct algos. They could also figure out which algo generates the most revenue.

PhilC
03-23-2005, 12:36 PM
>I guess I was wrong.
>
>64.233.167.99
>216.239.57.105
>64.233.183.99
>
>When I last checked these three were returning different results.

The 57 set has been frequently switching between groups, although the switches may be just redirections.

The 183 set is almost always the same as the 11 set (part of that small ex 3rd group), although odd DCs do switch for short periods of time.

Individual DCs in most of the sets switch from time to time, as do whole sets.

I believe/assume the redirections are due to load balancing, and updating.

lots0
03-23-2005, 12:39 PM
>But there are more than 3 algos, imo.
What I think is that there are three main algo (for lack of a better word) templates.

Within these templates, there are tweaks and adjustments done by the engineers to polish up the product...

PhilC
03-23-2005, 12:49 PM
Google has shown varying numbers for a site: search for a very long time. Different algos may have been around for a very long time without being noticed. Everything was always put down to "Google flux".

That idea of testing algos on users could be right, sootledir. In another thread, I posted a thought about why Google seems to give new sites high rankings for a short period - to find out what people think of the sites/pages (how quickly they click the Back button - like DirectHit used to do) and give them a score for ranking purposes. It was just a passing thought, but maybe Google really is trying to incorporate a "user evaluation" into the rankings. It's an interesting idea :)

Frank Kilkelly
03-23-2005, 12:55 PM
>maybe Google really is trying to incorporate a "user evaluation" into the rankings. It's an interesting idea :)

It makes absolute sense to do this. I think a search engine's goal is to have the user spend the least amount of time on their result pages for a given query. A lingering user is a confused and unsatisfied user.

>What I think is that there are three main algo (for lack of a better word) templates.

I really like this idea, 'plug and play' configurations of their overall algo. Survival of the fittest or if you will Algolution :)

lots0
03-23-2005, 01:00 PM
>Google really is trying to incorporate a "user evaluation" into the rankings. It's an interesting idea

That might explain those google referral URLs I have been seeing.
http://www.google.com/search?sourceid=navclient&ie=UTF-8&rls=GGLD,GGLD:2004-34,GGLD:en&q=f

and yes I do optimize for the letter "f".... :rolleyes:

PhilC
03-23-2005, 01:10 PM
>I think a search engine's goal is to have the user spend the least amount of time on their result pages for a given query.

Some would suggest that they *want* people to click the Back button and come back to where the AdWords are, so they could evolve the algo to that end.

>and yes I do optimize for the letter "f".

You live in a different world, Lots0 :D

Frank Kilkelly
03-23-2005, 01:16 PM
>Some would suggest that they *want* people to click the Back button and come back to where the AdWords are, so they could evolve the algo to that end.

Heh, you could be right there, Phil. I was of course thinking of the good of relevant results, but Google may have 'other' priorities ;)

PhilC
03-23-2005, 03:57 PM
I wonder if all the DCs are actually going to converge into group A. I am seeing all but 1 of them in group A right now, but it could be due to a fair number of redirections.

JohnW
03-23-2005, 10:34 PM
Here is a great tool that lets you compare kw results simultaneously at all of the different datacenters.

http://www.scroogle.org/scraper3.html

I, Brian
03-24-2005, 06:55 AM
>Some time ago, Matt Cutts said that Google uses several algorithms at random. Until I started to watch the DCs a few weeks ago, I simply didn't believe it.

I could certainly see this in some niche searches - I was actually going to perform a public study of it, because there was a very obvious repeating pattern and I wanted to track how this was being applied.

However, it's been so difficult to take note of such since the Feb update, because the SERPs have been so widely fluctuating and there have been some apparently incomplete indices migrating around.

I've actually wondered if Google had actually been using multiple algo variations intentionally over the past few weeks, and using feedback and click data to determine which variations might be producing the more relevant results, and proceeding carefully in that manner.

By what you're reporting, it looks like Google could be fairly close to settling on a form that they actually have full confidence in (presuming my presumptions are in any way reflective of this) - in which case, perhaps we will soon be able to sit down and perform proper keyword ranking analyses on this.

If they are rotating algo variations on a single complete index, then it should be fairly easy to observe - UK searches on Google UK were able to provide a window on previous algo rotation, so it'll be interesting to see what applies over the rest of the month.

PhilC
03-24-2005, 11:51 AM
I can't see them using different indexes on different DCs, but maybe there's a reason to do that. I put the different "of about" numbers down to different algos.

The idea of them using click and return data to tweak the algos was suggested in another thread somewhere. It was also coupled with new sites getting high rankings briefly, and the suggestion was that they did that to gather click and return data so that they could give the site/pages some sort of "user evaluation" figure. But that was just a wild guess.

I think it's difficult to observe algo rotations, if that's what they are doing, because you don't always get the results from the datacenter that you explicitly search. Many DCs are very stable and many are not, and I feel sure that their instability is down to redirecting and not to switching algos within the DC. My view is that the different algos are in different DCs.

lots0
03-24-2005, 11:56 AM
>I wonder if all the DCs are actually going to converge into group A. I am seeing all but 1 of them in group A right now, but it could be due to a fair number of redirections.
I am seeing things all over the board this morning.


Phil, when checking the DCs, are you using pages that are coming up in the top 10 or 20 results? ...or are you using pages that are lower in the rankings?

I have been using lower ranking pages. I may be wrong, but it seems the lower you get in the SERP the more the results vary.

Also, has anyone noticed a difference in the results after clearing google's pref cookie?

lots0
03-24-2005, 12:06 PM
>I feel sure that their instability is down to redirecting and not to switching algos within the DC. My view is that the different algos are in different DCs.

It would only be logical to have different algos installed in different DCs.

google is doing so much redirecting right now, it may very well be one of the main reasons for the instability of the results.

JohnW
03-24-2005, 12:14 PM
IMO what we are seeing is that some DCs have different indexes, clearly indicated by the number of results returned for various searches – kws, link, site, allin’s etc.

Maybe there are multiple algos at play, maybe not. Either way I don’t see this as being the cause of the major differences we are seeing. I don’t think the algo would cause the size of the index to be different. I do think that the same algo would produce different results if applied to a different index.

lots0
03-24-2005, 12:28 PM
>Either way I don’t see this as being the cause of the major differences we are seeing. I don’t think the algo would cause the size of the index to be different.
I hope I am not taking your statement out of context here.
Cuz, I have to disagree.

The number of results returned for a search query does not indicate the size of the index.

The number of results returned for a search query is directly affected by how the algo parses the (reverse) index.

PhilC
03-24-2005, 12:49 PM
I saw group A recede last night, but right now I'm seeing it as having 61 of the 64 DCs - but that's just an 'about' number due to possible redirections. The individual DCs in one set are certainly switching (redirecting)

I look lower down the serps Lots0. The results are like a pendulum - the further away from the top, the greater the movement, and the easier they are to notice.

They may have different indexes, JohnW, but I don't see the reason for it. As Lots0 said, the "of about" numbers are produced by algos, and different algos are likely to produce different "of about" numbers.

Don't forget that there are more than 2 "of about" figures. For a search on 'the', right now I see (in millions):- 2760, 2950, 3200, 3220, 3420, 3710, and 8000. Either that's 7 different indexes, or the numbers are produced by algos. I prefer the latter.
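As a rough illustration of how those figures fall into the two groups, the sketch below buckets them with a simple cut-off. The threshold value and the function name are assumptions; the only data used are the seven figures listed above.

```python
from collections import Counter

# "of about" figures for a search on 'the', in millions, as listed above.
OF_ABOUT_MILLIONS = [2760, 2950, 3200, 3220, 3420, 3710, 8000]

# Assumed cut-off (in millions) between the two groups. Any value between
# 3710 and 8000 splits the figures above in the same way.
GROUP_THRESHOLD = 5000


def classify(of_about_millions, threshold=GROUP_THRESHOLD):
    """Return "B" for the large figure and "A" for the smaller ones."""
    return "B" if of_about_millions >= threshold else "A"


groups = Counter(classify(n) for n in OF_ABOUT_MILLIONS)
print(dict(groups))
```

This only sorts the figures into the 2 big groups. It says nothing about whether the differences inside a group come from the algo or from the index.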

Frank Kilkelly
03-24-2005, 01:09 PM
>That's 7 different indexes, or the numbers are produced by algos. I prefer the latter.

I agree that it is the same index with different variations of the algo. But what could be the factor/s that affect the index size so much? I mean, 2760 all the way up to 8000, that's one hell of a variation. Any ideas? Is it LSI related maybe?

Naiden
03-24-2005, 01:15 PM
I don't know if this will help you, but on one of my sites I'm seeing something I haven't seen before.
I search the BLABLA keyword:
At google.com, my position is 5. An hour later it goes to 8. Some time later it goes to 5. Maybe an hour later it goes to 8 and later returns to 8.
At google.es, my position for the same keyword is 19. An hour later it goes to 16. Some time later it goes to 19 and later returns to 16.
And this is what Google has been doing for the last three days.

PhilC
03-24-2005, 01:27 PM
That's normal, Naiden. It's because you receive the results from whatever datacenter Google chooses at the time, and your page is ranked differently in different datacenters. The datacenter that you receive the results from often changes from page to page of the results.

I never gave much thought to what could make such big differences in the "of about" figures. I did have one thought a few weeks ago - that it could be something to do with pre-selection in one form or another. For instance, the Hilltop algo uses a form of pre-selection in that the only pages that get into the results are those that are linked to from expert pages. I'm not suggesting that they are using an expert system - it's just an example of a pre-selection method.

It would be interesting to come up with possible reasons for the big differences in the "of about" numbers. The small differences would be just minor algo changes (I think).

JohnW
03-24-2005, 03:02 PM
My thoughts are based only on what I have been seeing – on Feb 3rd one of my sites, a #1 ranked site in a competitive space went down to the 450-500 range for the top kws. Since Feb 6, I have tracked a specific kw, multiple times daily, across all of the DCs that I know about. (64 servers/17 DCs). I have recorded and modeled the wide variety of changes I have seen.

For a while, we had to look at each server in each DC separately. On March 2, for my kws, the various servers (.99, .104 etc) in the DCs finally came in sync with each other, at all DCs except the 64.233.161 DC, which finally synced up its servers (for my terms) on Mar 10. As of Mar 10 through the present, for my kws, all servers in each DC have been the same (i.e. the .99, .104, .147 etc. match each other). This does not mean that all DCs matched each other, they did not. But as of Mar 10, at least all servers within each DC were consistent with each other, either good or bad, for my kws.
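A minimal sketch of that kind of per-DC consistency check, assuming the rank for one kw has already been recorded for each server. The function name and all of the ranks below are hypothetical and not taken from my records.

```python
def sync_report(rank_by_server):
    """Map each DC prefix to True if all of its servers report the same rank."""
    ranks_by_dc = {}
    for ip, rank in rank_by_server.items():
        # The DC is identified by the IP without its last number.
        dc = ip.rsplit(".", 1)[0]
        ranks_by_dc.setdefault(dc, set()).add(rank)
    return {dc: len(ranks) == 1 for dc, ranks in ranks_by_dc.items()}


# Hypothetical observations: server IP -> rank for one kw.
observed = {
    "64.233.161.99": 1,
    "64.233.161.104": 470,
    "64.233.161.147": 1,
    "216.239.57.99": 1,
    "216.239.57.104": 1,
}
print(sync_report(observed))
```

A DC that reports False still has at least one server out of step with the others. It does not tell you which result set is the "good" one.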

As all of this progressed from Feb 6, I saw my kws coming back (the larger “good” index showing instead of the smaller “bad” index) in more and more servers/data centers, falling out, coming back etc., trending up, down, up again. Based on what I saw and recorded, the results seemed to be related to which index was being used, the differentiation between the indexes was based on apparent differences in the size of the index. This size assumption was based only on the logical assumption that the universe (size) of the index is related to the total number of results returned for various queries. At each point along the way, we looked at total results returned for kw plus various allin and other queries.

On March 14, things started getting better each day and finally on Mar 18 and since then, my kws are back at #1 in 17/17 of the DCs that I track, on all servers in each DC. Based on what I have seen, my thoughts were that the indexes, rather than the algos, were different.

I admit that I am not sure. Because results have settled down for me, for now, the fact that others are still seeing major swings is puzzling. Does the algo control the number of available results? Or does the universe of data that the algo is applied to affect the number of results? Based on what can be actually seen, I concluded that there was more evidence saying that the algo works pretty much the same way but provides different results when it is comparing different sets of pages (and links, etc) that it is looking at. If there is any data that shows the algo itself controlling the total number of results returned, I would like to see it. And if this is a fact, how does this explain (or even allow for) the wide swings and settling that occurred on one site, while my other sites were unaffected, and the swings that I seem to be now past still being experienced by others?

Frank Kilkelly
03-24-2005, 03:44 PM
JohnW, were the keywords you were tracking competitive? Maybe the wild swings that some people are still seeing are somehow related to how competitive the kws are, similar to the way the competitiveness of a kw supposedly affects length of time in the sandbox.

JohnW
03-24-2005, 04:12 PM
Good point. I guess competitive is a relative term. The particular kw shows search traffic of only 20k on Overture, as an example, but these are $$$ words where every one of the top 20 or so results is the product of professional SEO work.

PhilC
03-24-2005, 06:11 PM
I've been using the word "datacenters" instead of "servers", but "servers" makes more sense so I'll change. And I've been using the word "sets" instead of "datacenters" - I'll change that too.

Just one small point John. Don't forget that we are sometimes redirected when we search a specific server, so you don't always get the results from the server you requested them from. It means that we can't know that we have the results from all the servers when we search them all.

I can't be certain that we are not seeing the results from different indexes, but I don't think we are. Right now, the (competitive) phrase, 'search engine optimization', that I regularly check has the same ranking for me across all group A servers, but the results are not the same across them all - other rankings are different. There are 6 different "about" figures in the 15 group A DCs, and 2 different ones in the 2 group B DCs (63 and 161). All of the figures are in the same ballpark.

The reason I don't think we are looking at different indexes is because I can't come up with a good reason for Google to have 8 different indexes across the DCs. I could understand 2 different indexes but not 8. So I have to assume that the different "about" figures are to do with slight differences in the algo rather than different indexes. But it's just an assumption.

Grumpus
03-24-2005, 07:20 PM
I haven't really kept up with what servers are where in recent months. Are the ones you guys are looking at different servers at the same location? Or are they different servers across a distance? If they are doing a massive DB update and the data has to cross a distance, it would take some time - and, it also takes time to infuse the new data into the index on that set of servers. So, the different "of abouts" could just be a matter that this set of servers has managed to assimilate X amount of data so far while that set of servers has only assimilated Y.

G.

JohnW
03-24-2005, 07:45 PM
>we are sometimes redirected when we search a specific server, so you don't always get the results from the server you requested them from. It means that we can't know that we have the results from all the servers when we search them all.

That’s true with a browser – and you can see the redirects by watching the HTTP response headers. But a request to a specific server IP address seems to go to the specified server. The redirects seem to occur when there is DNS involved but not with the IP. One clue is to mouse over the cache link and look at the IP in the browser chrome. That is the real number.
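A rough sketch of that header check, using only the Python standard library. Both function names are assumptions, and the fetch function is only defined here, not called, since it needs a live server IP to talk to. It will only catch ordinary HTTP redirects; any rerouting that happens below the HTTP level would not show up.

```python
import http.client
from urllib.parse import quote_plus


def redirect_target(status, headers):
    """Return the Location value if the response is an HTTP redirect, else None."""
    if status in (301, 302, 303, 307, 308):
        # Header names are case-insensitive, so normalise before the lookup.
        lowered = {name.lower(): value for name, value in headers}
        return lowered.get("location")
    return None


def fetch_status_and_headers(ip, query, timeout=10):
    """Send a search request straight to one server IP (no DNS lookup involved)."""
    conn = http.client.HTTPConnection(ip, 80, timeout=timeout)
    try:
        conn.request("GET", "/search?q=" + quote_plus(query))
        response = conn.getresponse()
        return response.status, response.getheaders()
    finally:
        conn.close()
```

Calling `redirect_target(*fetch_status_and_headers(ip, "the"))` for each server IP would return None whenever the server answered directly.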

PhilC
03-24-2005, 11:52 PM
The datacenters are at different locations, Grumpus, but it doesn't take well over a month to update them, which is how long I've been watching them. It only used to take a few days.

Checking the cache URLs or response header is a good idea John but it's something that I haven't done, and I'm not sure that it would work. Have you been able to see redirects by looking at either of those?

I haven't looked at either before, and I've based my belief that servers sometimes redirect on the speed/frequency at which some of them sometimes change their results. Sometimes a server will change its results every few seconds.

The cache URLs are relative in the page's code, and don't show anything useful in the browser, and a 200 response header always includes ".google.com" in the cookie, regardless of which server was specifically searched. I haven't checked the header when a server is known to be switching results. Have you done it?

The reason that I'm doubtful that checking the header will work is because I'm assuming that the redirect is not done in the way that we would do it, so I don't think that a redirecting header will be returned. For that to happen, the server that is being redirected from would need to send it, and I doubt that that's the way Google would do it. I've assumed that the redirect is deeper in the Internet system than that, and is done by the DNS system or something along those lines - zone files? (that stuff is beyond my current knowledge). They must surely have that kind of redirecting system in place so that they can close a server down when necessary without affecting any functioning of the overall system.

strategicrankings
03-25-2005, 03:23 AM
With the current PR update, I see Google's home page with a PR8 on some DCs, and Yahoo! at PR7. I see some sites dropping from PR6 to PR3. I know, :D it's not yet finished.....


more and more difficult.

I, Brian
03-25-2005, 08:44 AM
One thing that has really stood out for me in the Feb update - and seems to be continued in the March results - is that Google, by my perception, seems to be trying to handle redirects in a different way.

I keep seeing URLs in the SERPs that redirect to different domains - and this is even for domains that I know for a fact have been redirecting anywhere between 6-12 months, so it's hard to put it down to merely length of spidering issues.

I'm not talking about doorways here - simply mainstream websites that have updated to new domains as part of general internet evolution - and so far as I know are using 301's, rather than 302's, which seems all the more odd.

Normally I'd not expect to see these redirected URLs listed in the SERPs, even when tagged as Supplemental Results, so I find it not only odd that I've been seeing them since the Feb update, but that they have also remained.

It became such a point of issue that I started removing previous 301's I retained control over, instead leaving placement pages or even old archived versions of sites up instead.

JohnW
03-25-2005, 08:56 AM
From what I can tell, they use some type of a distributed dns solution that combined with some DC hardware/proxy solution tries to route browser based queries (DNS) to the nearest DC, based on availability. They can estimate your location by IP and send you to the closest server, provided that server is up and not too busy. I believe that is what accounts for the fact that a different server is offered from time to time. What I have not really thought about though, is if their cookie may also play a role for the browser.

>I'm assuming that the redirect is not done in the way that we would do it, so I don't think that a redirecting header will be returned.

That would be a cute trick. I do not know of any type of redirect that could be completely hidden but if there is such a thing I would sure like to know about it ;-)

Like most things in this biz, all you can do is try stuff and toss out ideas for critique or validation.

One additional comment:

>I can't come up with a good reason for Google to have 8 different indexes across the DCs.

What makes you think they did this on purpose? Looking back over the stuff that has happened over the past 4+ years, I would say it is an accident that they are still fixing.

PhilC
03-25-2005, 09:18 AM
>I'm assuming that the redirect is not done in the way that we would do it, so I don't think that a redirecting header will be returned.

>That would be a cute trick. I do not know of any type of redirect that could be completely hidden but if there is such a thing I would sure like to know about it ;-)

I admit that I don't know much about the low level workings of the Internet itself, but wouldn't a change in the zone files, or something like that, cause an immediate change in the particular server that is accessed?

I do know that, if redirections do occur, they are not done by sending redirecting response headers such as 302s. But I feel sure that they do occur. Not only do individual servers sometimes switch for very short periods, but whole DCs switch for much longer periods, before switching back. Imagine doing a complete index update without preventing the server from providing results.
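
The claim about response headers can be checked directly. The sketch below, assuming Python's standard `http.client` (which does not follow redirects on its own), sends one request to a chosen IP and reports the status code and any Location header. The function names and the default Host header are illustrative, not taken from any tool mentioned in this thread:

```python
import http.client

REDIRECT_CODES = {301, 302, 303, 307, 308}

def fetch_status(ip, path="/", host_header="www.google.com", timeout=10):
    """Send one GET to the given IP and return (status, Location header or None)."""
    conn = http.client.HTTPConnection(ip, 80, timeout=timeout)
    try:
        conn.request("GET", path, headers={"Host": host_header})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()

def is_http_redirect(status):
    """True if the status code is one of the standard redirecting responses."""
    return status in REDIRECT_CODES
```

If a queried IP answers with a plain 200 while its results match another DC, any switching is happening below HTTP (DNS, load balancer or similar), which a header check like this cannot see.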

>What makes you think they did this on purpose? Looking back over the stuff that has happened over the past 4+ years, I would say it is an accident that they are still fixing.

It's possible, but it's something that could be fixed very quickly. It shouldn't be a problem to turn the DCs off one at a time to update the whole index. It doesn't take that long.

aixtal
03-25-2005, 10:50 AM
If anybody is interested, I have made a list of systematic queries on 40 different DCs this morning, which clearly shows three different groups : see A snapshot of the update (http://aixtal.blogspot.com/2005/03/google-snapshot-of-update.html)

PhilC
03-25-2005, 12:47 PM
You're missing quite a few servers, aixtal. There's a list of 64 currently working ones at http://www.vaughns-1-pagers.com/internet/google-data-centers.htm

The 161s, 37s, and 39s frequently switch for periods of time - short times to long times - but most of the time they are in group A or group B, so I don't see them as a group in themselves, but maybe I haven't looked enough at the "of about" numbers.

aixtal
03-25-2005, 12:57 PM
Great! Many thanks. I will update my scripts. Anyway, this seems to confirm that there are three groups with different behaviours.

aixtal
03-25-2005, 01:09 PM
PhilC> I checked the list you pointed to above. It doesn't differ much to mine in terms of C-class addresses, which is the reason I was kind of sloppy. I assumed that the entire group of IP addresses from the same C-class returns the same results. Am I wrong ? Has anybody seen differences in the same class?

Also, I noticed that our lists overlap. I missed 64.233.179 and 216.239.63, but at the same time Vaughn seems to miss 64.233.185 and 64.233.189, which do respond.

Are there any others around? Does anybody have a complete list?

PhilC
03-25-2005, 01:15 PM
Yes, the servers in some DCs do sometimes change individually. 161, 39 and 53 do it quite often, but others also do it. In fact, 161 is doing it right now.

I'm only aware of 64 currently working servers.

Everyman
03-25-2005, 01:32 PM
Here are 74 that currently work:

216.239.37.104
216.239.37.105
216.239.37.106
216.239.37.107
216.239.37.147
216.239.37.99
216.239.39.104
216.239.39.106
216.239.39.107
216.239.39.99
216.239.53.104
216.239.53.106
216.239.53.107
216.239.53.99
216.239.57.104
216.239.57.105
216.239.57.106
216.239.57.107
216.239.57.147
216.239.57.98
216.239.57.99
216.239.59.104
216.239.59.105
216.239.59.106
216.239.59.107
216.239.59.147
216.239.59.99
216.239.63.104
216.239.63.99
64.233.161.104
64.233.161.105
64.233.161.106
64.233.161.107
64.233.161.147
64.233.161.99
64.233.167.104
64.233.167.105
64.233.167.106
64.233.167.107
64.233.167.147
64.233.167.99
64.233.171.104
64.233.171.105
64.233.171.106
64.233.171.107
64.233.171.147
64.233.171.99
64.233.179.104
64.233.179.106
64.233.179.107
64.233.179.99
64.233.183.104
64.233.183.106
64.233.183.107
64.233.183.99
64.233.187.104
64.233.187.106
64.233.187.107
64.233.187.99
64.233.189.104
66.102.11.104
66.102.11.106
66.102.11.107
66.102.11.99
66.102.7.104
66.102.7.105
66.102.7.106
66.102.7.107
66.102.7.147
66.102.7.99
66.102.9.104
66.102.9.106
66.102.9.107
66.102.9.99

aixtal
03-25-2005, 01:37 PM
Great! Thanks. That makes 16 different C-classes:

7 216.239.57
6 66.102.7
6 64.233.171
6 64.233.167
6 64.233.161
6 216.239.59
6 216.239.37
4 66.102.9
4 66.102.11
4 64.233.187
4 64.233.183
4 64.233.179
4 216.239.53
4 216.239.39
2 216.239.63
1 64.233.189
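
A tally like the one above can be reproduced by grouping each address under its /24 network. A small Python sketch using the standard `ipaddress` module; `SAMPLE_IPS` is only a sample from Everyman's list (not all 74 addresses), and the function names are illustrative:

```python
import ipaddress
from collections import Counter

# A sample taken from the list posted above (not all 74 addresses).
SAMPLE_IPS = [
    "216.239.57.104", "216.239.57.105", "216.239.57.106", "216.239.57.107",
    "216.239.57.147", "216.239.57.98", "216.239.57.99",
    "216.239.63.104", "216.239.63.99",
    "64.233.189.104",
]

def count_by_c_block(ips):
    """Count addresses per /24 network (what the thread calls a C-class)."""
    counts = Counter()
    for ip in ips:
        # strict=False lets us pass a host address and get its enclosing /24.
        block = ipaddress.ip_network(f"{ip}/24", strict=False)
        counts[str(block.network_address)] += 1
    return counts

def format_counts(counts):
    """Lines like '7 216.239.57.0', largest block first."""
    return [f"{n} {block}" for block, n in counts.most_common()]
```

Grouping by /24 only shows which addresses share a block. Whether every address in a block returns the same results is a separate question that has to be tested per query.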

Everyman
03-25-2005, 02:37 PM
Oops, I forgot one Class C:

64.233.185.104
64.233.185.106
64.233.185.107
64.233.185.99

aixtal
03-25-2005, 02:47 PM
Yes, I had that class in my list. That makes 78 servers and 17 classes. I just pinged them all. They all respond.

PhilC
03-25-2005, 05:22 PM
That's excellent Everyman!

Connie
03-25-2005, 11:55 PM
I just want to commend those of you who take the time to research these kinds of results. Personally I would never do it. However, this thread explains a lot of abnormalities I have seen for a while. Most of my key words are pretty stable. But some take big jumps every few days. I only use the DP API so I don't know which data center is producing the results.

Where I see the biggest changes is with lower-ranking key words. Key words that rank lower than page 3 change rank drastically.

Does that make sense in regard to this thread?

PhilC
03-26-2005, 12:00 AM
I think it was said earlier in the thread that lower down the serps is the best place to notice the changes. I think it was me that said it :D

It's like a pendulum. Near the top the movement is very little, but the lower down you go, the movement becomes greater. I look on pages 3 to 5. The differences there show up more, but they aren't too big, and are often on the same page.

Connie
03-26-2005, 01:44 AM
>I think it was said earlier in the thread that lower down the serps is the best place to notice the changes. I think it was me that said it :D


I was not disagreeing. :) Just sharing my experience based on the API. In my mind what I see concurs with what you are seeing tracking all those data centers.

PhilC
03-26-2005, 01:53 AM
I didn't think you were disagreeing at all, Connie. It was a very good point, and I was agreeing with you :D

PhilC
03-26-2005, 02:13 AM
I've just looked at a completely different set of results for the first time in quite a while, and they are definitely showing at least 2 different indexes, so I've changed my mind about that. 35 of the 78 servers (almost half) are currently showing me an old Title from a long time ago.

Through the years people have reported seeing old results but it's hard to understand why old results would appear in the serps, and, to myself, I always questioned the accuracy of people's memories. But they are there for me right now. Where do the old results come from? Why aren't all the indexes up to date? Were they there yesterday? I don't know because I haven't checked that searchterm for a few weeks. What's the point of keeping old data?

Some of the current differences may be due to algo variations, but there are at least 2 different indexes of Titles, one of which is very old. And they are not confined to group A or B - they are in them both - AND they show various "of about" numbers.

I'd already begun to consider the possibility that searching specific servers wasn't any good at all, and those old Titles on so many servers, and across the groups makes me wonder about it even more. But before I get into it, is someone able to tell me if low level instant switching between servers (hardware) is possible - by changing something in a file somewhere, for instance.

Is it possible for Google to use the low levels of the Internet to instantly switch requests to an IP address to a different server; i.e. one server instantly replaces the other?


[added for better clarity (I hope)]

What I'm thinking is that, if Google wants to update the 66.102.9.104 server, for instance, they would want all requests to that server to go to another server, say, 64.233.167.104, while the update is being done. I'm assuming that they can change something so that all requests to 66.102.9.104 won't even reach that machine - instead they will go to 64.233.167.104. I'd like to know if that switch can be made instantly.
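
To make the question concrete, here is a toy Python model of the kind of indirection being asked about: a table that maps each public IP to whichever backend currently answers for it, so that a "switch" is a single table update. This is purely a hypothetical illustration. The class and method names are invented, and nothing here describes how Google actually routes requests:

```python
class FrontEndTable:
    """Toy mapping from a public IP address to whichever backend currently serves it."""

    def __init__(self, public_ips):
        # Initially every public address is served by its own backend.
        self._backend = {ip: ip for ip in public_ips}

    def drain(self, ip, stand_in):
        """Send traffic for `ip` to `stand_in` (e.g. while `ip` is being updated)."""
        self._backend[ip] = stand_in

    def restore(self, ip):
        """Put `ip` back on its own backend."""
        self._backend[ip] = ip

    def route(self, ip):
        """Return the backend that would answer a request sent to `ip`."""
        return self._backend[ip]
```

In this model, after `drain("66.102.9.104", "64.233.167.104")` a client still connects to 66.102.9.104 and has no way to tell from the outside that a different machine produced the answer.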

sootledir
03-26-2005, 07:31 AM
"GoogleWatching" is not the relaxing past-time it once was.

martinuboo
03-26-2005, 10:27 AM
> <snip> The reason that I'm doubtful that checking the header will work is because I'm assuming that the redirect is not done in the way that we would do it, so I don't think that a redirecting header will be returned. For that to happen, the server that is being redirected from would need to send it, and I doubt that that's the way Google would do it. I've assumed that the redirect is deeper in the Internet system than that, and is done by the DNS system or something along those lines - zone files? (that stuff is beyond my current knowledge). They must surely have that kind of redirecting system in place so that they can close a server down when necessary without affecting any functioning of the overall system.

I'm not too much of a server infrastructure person, but I think that all the redirects between DCs and servers are handled by Google's load balancing system at the application layer.

I did some searches (http://www.google.com/search?hl=en&q=Google+servers+load+balancing&btnG=Google+Search) and found a few things.

Cache (http://216.239.63.104/search?q=cache:YsH8riWIBmwJ:www.rankforsales.com/n-ae/208-seo-sep-02-03.html+Google+servers+load+balancing&hl=en) view of an article (couldn't get the page to load)

Power Point Presentation (http://www.cs.rochester.edu/~kshen/csc257-fall2004/lectures/lecture19-loadbalancing.pdf) (in pdf file) on load balancing with Google as an example

Case Study (http://cio.co.nz/cio.nsf/0/1A198D155248057CCC256DE100734B34?OpenDocument) about Google load balancing

I'm sure there are some folks here at SEW that can explain these things and my apologies to anyone that thinks this info too "obvious", but I found it interesting in regards to PhilC's observations and questions on how the G handles redirects.

Everyman
03-26-2005, 12:55 PM
Another piece of information that is missing from our analysis is whether all of the 78 IP addresses are used to serve results to the public.

The cache copy typically uses the static IP address in the URL. If you have a page in Google's cache that includes images, and the searcher clicked on the cache URL, then the images only from that page get picked up by the searcher from your site. They should show a referrer from Google's cache URL. The IP address in the referrer would be the IP that resolved for that user from www.google.com.

I don't allow any bots to show cache links for any of my pages, so I cannot do this sort of research. But for someone who has a lot of traffic, from all over the world, they may be able to pull out at least a few hundred of these referrers. Then they could list which IPs were the most popular.

If enough people did this sort of research, we could determine whether all 78 IPs were actually used. And, of course, discover any new IPs that come along.
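
The log research described above could be sketched as follows in Python. It assumes the access log is in the common "combined" format (request, status, size, referrer, user agent) and counts referrers whose host is a bare IPv4 address and whose query contains `cache:`. The regular expressions and the function name are illustrative assumptions, and other log formats would need a different pattern:

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# In the combined log format the referrer is the second-to-last quoted field.
LOG_LINE = re.compile(r'"(?P<request>[^"]*)" \d{3} \S+ "(?P<referrer>[^"]*)" "[^"]*"$')
IPV4_HOST = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")

def cache_referrer_ips(lines):
    """Count referrer hosts that are bare IPv4 addresses with a cache: query."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line.strip())
        if not match:
            continue  # not a combined-format line
        parts = urlsplit(match.group("referrer"))
        host = parts.hostname or ""
        if IPV4_HOST.match(host) and "cache:" in parts.query:
            counts[host] += 1
    return counts
```

Feeding it the lines of a log file, for example `cache_referrer_ips(open("access.log"))` with a hypothetical file name, and then calling `.most_common()` on the result would list the most frequently seen IPs first.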

PhilC
03-26-2005, 03:30 PM
I recently asked people to post the position of a particular page in the serps, to find out if Google was serving results from both group A and group B - and they were. But that's as close as I got to checking the DCs. Checking them all is a good idea, Everyman.

I don't allow caches of the main part of my site, and the image bandwidth is the second reason why I don't allow it, but I do have some suitable logs for gathering IP information. To gather the information, we could do with a research thread, just for posting the IPs. Otherwise the research could get sidetracked and buried by other posts in the thread.

Judging by that PDF document, it is possible for Google to switch servers with immediate effect. It means that all requests to the 66.102.9.104 server, for instance, could be sent to the 64.233.167.104 server, and we would never know. At least I think that's what I read. It seems to happen for load balancing, and, if it can happen for that, it can happen for other reasons.

What has been occurring to me is that we just don't know which IPs are being switched to which datacenters at any particular time. How can we know that the datacenter at the end of 64.233.167.104 today is the same datacenter that will be at the end of that IP address tomorrow, next week or next month? I don't think that there's any reason why Google should keep them the same all the time, and they could even switch them around just to confuse us - knowing that people do search the datacenters.

I've been assuming that, when a DC, or a set of DCs, switches from group A to group B, that I am receiving the results from other known DCs instead of the one I searched, so that the other DCs provide the results for more than one IP address. But that's just an assumption. Google could have more than one DC (bunch of computers) behind each IP address, and just switch between those for certain purposes, such as updating. (That could be how old serps sometimes show up - older serps being in secondary DCs.)

In other words, I'm wondering if following the DCs can really tell us much about anything. For instance, if we do the research and find that all known IP addresses are being used to deliver results to people, how do we know that we aren't just seeing the results from a smaller number of DCs, and that some of the IP addresses are being switched, while others are doubling up?

Btw, because of what I read via the links that sootledir posted (thanks sootledir - they were very useful), I've gone back to thinking of one IP address as a datacenter, and IPs in the same C block as "sets". I think they are the most realistic descriptions.

PhilC
03-26-2005, 07:07 PM
This is the sort of thing I can't figure out if the DCs we query really are the ones returning the results.

Right now, there are 5 C sets in group B:- 179s, 185s, 39s, 161s, and 63s.

I'll use result "types" to indicate the different results; i.e. type 1s are the same results, type 2s are the same, but type 2 results are different to type 1 results.

In the 179s: 2 DCs return type 1 results, and 2 return type 2 results.

In the 185s: 2 DCs return type 1 results, and 2 return type 2 results.

In the 39s: 3 type 1s and 1 type 2.

In the 161s: 4 type 1s and 2 type 2s.

In the 63s: 2 type 1s.

It would make good sense to me if the same results type were in a whole C set, but the 2 types are scattered through the 5 C sets. Why?

On the whole, C set DCs do contain the same results, but why are there different types in those C sets? A change has occurred, but why would it occur like that instead of in complete C sets?

I know I'm rambling, but it's things like that that are leading me to think that maybe we simply can't trust that any queries to specific IPs return results from that IP.
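
The "types" bookkeeping above can be automated once the ranked results for one query have been collected from each IP. A minimal Python sketch, assuming the results are already available as a mapping from IP to an ordered list of URLs (how they are collected is left out, and the function names are illustrative):

```python
from collections import defaultdict

def group_by_results(results_by_ip):
    """Group IPs whose ranked result lists are identical.

    results_by_ip maps an IP string to the ordered list of result URLs
    it returned for one query. Returns a list of IP groups, largest first.
    """
    groups = defaultdict(list)
    for ip, urls in results_by_ip.items():
        groups[tuple(urls)].append(ip)
    return sorted((sorted(ips) for ips in groups.values()), key=len, reverse=True)

def types_per_c_set(groups):
    """For each C block (first three octets), count how many IPs fall in each group."""
    table = defaultdict(lambda: defaultdict(int))
    for type_number, ips in enumerate(groups, start=1):
        for ip in ips:
            c_set = ip.rsplit(".", 1)[0]
            table[c_set][type_number] += 1
    return {c_set: dict(types) for c_set, types in table.items()}
```

The output of `types_per_c_set` has the same shape as the hand count above, e.g. `{"64.233.179": {1: 2, 2: 2}}` for a C set with two DCs of each type. It only describes what each IP returned, not which machine actually produced the results.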

bobmutch
03-28-2005, 01:48 PM
I am getting reports on PR and BL changes on a number of sites. It looks like a quirk to me. Anyone else seeing anything?


I am seeing BL updates on McDars Tool:
http://www.mcdar.net/q-check/datatool.asp

216.239.53.99
66.102.7.99
216.239.53.104
66.102.7.104
66.102.7.105
66.102.7.147

I am seeing a PR change here but I think it is a quirk.
http://www.seochat.com/?go=1&option=com_seotools&tool=9&url=http%3A%2F%2Fwww.worldtraveldirectory.com&submit=Check
and here also:
http://www.seochat.com/?go=1&option=com_seotools&tool=9&url=www.reedelsevier.nl&submit=Check
A few monitoring tools:
http://www.pageranktool.net/
http://www.mcdar.net/q-check/datatool.asp
http://www.seochat.com/seo-tools/future-pagerank/

PhilC
03-28-2005, 11:46 PM
It looks like I may be the only one left in this discussion, but it's ok - I like talking to myself :)

Until recently we've believed that, if we search a specific datacenter, then we'll get the results from that datacenter. In the last 6 weeks, I've become convinced that that isn't necessarily so. I explained why I believe that earlier in the thread. I am sure that most of the time, some DCs direct searches to other DCs, even when it's an IP that is being searched. In recent days, I wondered if the results for any searches to specific datacenters can be relied on as being from the actual datacenter, and I suggested that Google may use low level directing to direct searches anywhere, regardless of the IP that is searched.

I now want to add something else, and put another conjecture forward.

Higher up this page, I mentioned that 35 of the 78 DCs were showing an old Title for a particular search, even though the snippet was unnecessarily the page Description. I say unnecessarily because there is printable text on the page that would have suited. For those listings, the cache is up to date - with the current Title - and the latest spider date is only 2 days ago. In other words, only the Title is old, and it's at least 6 months old.

Since I posted that, I've seen the old Titles on as few as 10 DCs, and currently it's on 65 of the 78 DCs - almost all of them. Some of that will be due to being directed to other DCs, but I can't see most of the 65 DCs being directed to just a few DCs, and I can't see that those 65 have been recently updated with old data. So I want to suggest a possibility.

Each DC consists of a bunch of computers. It's likely that, for each DC, there is a secondary backup DC - another bunch of computers. It's very likely (almost certain) that the various indexes are contained in dedicated computers within the DC, and it is quite likely that each secondary index can be switched in as needed. Remember that there are several indexes in the Google system.

Right now there are old Titles on most DCs. I'm sure that some of it is due to low level directing, but where did the old Titles come from in the first place? Unfortunately, I didn't check the particular search before a few days ago, so I don't know if they've been there all the time. Assuming they haven't been there all the time, I suggest that they are secondary indexes that have been switched in for some purpose - perhaps to update the primary ones. Dunno why the old ones wouldn't be updated before being switched in though.

If all this is anything like the reality, what it boils down to is that we cannot trust that the results from any search on any DC are correct for that DC. It means that those "dance" type tools can only give some sort of general overview at best, and it might have reliability implications for the allin*: searches. The simple way that we've thought of the DCs up to recently may not be anything like the reality.

Anyone got any thoughts?

bobmutch
03-28-2005, 11:59 PM
Looks like the PR changes are a quirk and the BL changes are left over from last BL update. I guess I am easy to get excited : )

PhilC
03-29-2005, 12:12 AM
Other people are reporting PR changes on some DCs. One person's domain is showing PR8 on some DCs, and he says that the site has never had that before; i.e. it's new.

composer
03-29-2005, 12:42 AM
I believe the DCs use caches for normal searches (maybe more than one cache level, e.g. first cache, second cache, etc.).

Indexes, cache indexes and search frequency are important factors in a search engine's performance, I think. But it's not possible to know without abusing the SE (testing with massive searches).

Say I make massive searches with my keyword phrases: is it possible for my site (and similar sites) to move up in position? I don't know, and nobody knows without testing; but nobody wants to go down into the sandbox for months or years...

You know, getting past these cache indexes is possible with "-ksdjfsdf" nonsense keywords.

And we can't do anything about this multi-layered cache system, I think.

Mel
03-29-2005, 01:59 AM
>Each DC consists of a bunch of computers. It's likely that, for each DC, there is a secondary backup DC - another bunch of computers. It's very likely (almost certain) that the various indexes are contained in dedicated computers within the DC, and it is quite likely that each secondary index can be switched in as needed. Remember that there are several indexes in the Google system.

I will try to dig up the article URL (actually it was a video of a Google engineer's presentation at the University of Washington) where it was explained that Google breaks up the indexes into Shards, with each Shard being stored on one PC. The location and number of duplicate Shards are indexed by PageRank, with higher-PR shards having more copies on other computers etc.

This is by way of explanation of why I don't feel that there is necessarily a separate backup index.

I realize that this may not make it any easier to understand.

Frank Kilkelly
03-29-2005, 05:38 AM
>I will try to dig up the article URL (actually it was a video of a Google engineer's presentation at the University of Washington) where it was explained that Google breaks up the indexes into Shards with each Shard being stored on one PC.

Mel, I think this is the URL you mentioned http://www.uwtv.org/programs/displayevent.asp?rid=2459 :)

Mel
03-29-2005, 10:45 AM
Yes, that's the one, thanks Frank.

It's not really all that revealing but it does have a few snippets that were interesting to me.

martinuboo
03-29-2005, 10:49 AM
Excellent video! I had heard/seen lots of that information before, but never all in one place. Thanks!

Unfortunately, I think this video might raise more questions than it answers. Shards, Chunk Servers and all the replication of data is "food for thought" (and even more educated guesses). PhilC, you may think you are writing to yourself in this thread, but I'm sure there are many others, like myself, following this thread with great interest.

I have seen the subsets of data that you (and others) have documented. Depending on which tool I use and which type of query I make, I see different "clusters" of IP addresses. Have we decided on the "correct terminology" for the IP addresses....datacenters, servers, indexes?

I only started really paying attention to the multiple datacenter (Dance) tools, during the February 03, 2005 update, to keep my mind and fingers busy, so that I wouldn't start trying to tweak my site up from the abyss (top 40 to below #800). I'm glad that I didn't tweak anything during the update, because on or about 03/04/05, it came back better than before (top 5).

Thanks again for the link to that video!

martin

PhilC
03-29-2005, 11:07 AM
It's an interesting video - thanks.

Splitting the indexes into what they call shards is necessary because they won't fit in single machines. It has the massive advantage of allowing parallel searches of an index. Splitting an index across multiple machines is effectively the same as having the index in one, multi-processor, machine, except that it allows the switching in of partial indexes.
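
As a rough illustration of the idea (not of Google's actual design, which the thread only knows from the video), here is a toy Python sketch of an inverted index split into shards that are queried in parallel and whose partial answers are merged. The document-to-shard assignment by `doc_id % shard_count` and all function names are assumptions made for the example:

```python
from concurrent.futures import ThreadPoolExecutor

def build_shards(documents, shard_count):
    """Split {doc_id: text} into shard_count small inverted indexes."""
    shards = [dict() for _ in range(shard_count)]
    for doc_id, text in documents.items():
        shard = shards[doc_id % shard_count]  # assign each document to one shard by id
        for word in set(text.lower().split()):
            shard.setdefault(word, set()).add(doc_id)
    return shards

def search_shard(shard, word):
    """Look one word up in a single shard."""
    return shard.get(word.lower(), set())

def search(shards, word):
    """Query every shard in parallel and merge the partial answers."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(lambda shard: search_shard(shard, word), shards)
        return sorted(set().union(*partials))
```

Because `search` only sees whatever shards are in the list at query time, replacing one shard with an older or empty copy changes the merged answer without any other part of the system changing. That is the "switching in of partial indexes" effect in miniature.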

Having more copies of pages with higher PRs is different though, but I don't think it matters to this discussion. I think what matters is that they do have copies of the indexes, and they can be switched in as desired. Maybe that's where old listings come from.

What's come out of this for me is that the returned results, when searching a specific datacenter, can't be considered to be the results from the searched DC. They may come from a different DC, and they may be just one possible set of results from the data stored in whichever DC supplied them.

I don't think it makes much difference to us though. Maybe it will occasionally provide a reason why Google appears to be doing strange things. It probably accounts for the everflux, and the occasional appearance of old serps, for instance. But it does mean that there isn't one Google that provides one set of rankings, and all the datacenters agree. Those days appear to have gone, if they ever existed.

PhilC
03-29-2005, 11:19 AM
>Have we decided on the "correct terminology" for the IP addresses....datacenters, servers, indexes?

:D

I was temporarily sidetracked with the terminology. By "datacenter", people really mean an IP address. At the IP address there's a webserver up front and a bunch of computers that store and process the data behind it. At least that's what we think it is, but I sometimes wonder if there is just one bunch of computers for each C block.

Personally I think of the datacenters on the same C block as "sets", and I think of a "group" as those datacenters that display very similar results.

"The index" is a generic term that means the whole data storage system. But different data is actually stored in different indexes.

excell
03-29-2005, 11:43 AM
Sidenote for sootledir:

>"GoogleWatching" is not the relaxing past-time it once was.

Yup - better just to ignore it (leave the endless figuring up to the "maths" folks) and get on with what you know best ;)

bobmutch
03-29-2005, 04:20 PM
PhilC: I am getting reports in Holland that there are a number of sites changing.

PhilC
03-29-2005, 04:41 PM
It seems to be a bit small scale, and, if it weren't for that *new* PR8 showing, I'd be tempted to think that it's due to switches in the indexes, but I've got that particular topic on the brain at the moment :confused:

strategicrankings
03-29-2005, 04:55 PM
>I am getting reports on PR and BL changes on a number of sites. It looks like a quirk to me. Anyone else seeing anything?

Mentioned that here (http://forums.searchenginewatch.com/showpost.php?p=40906&postcount=46) and have been seeing it for only around one or two hours every day since.

Everyman
03-29-2005, 08:09 PM
Crawlers often pick up referral logs. By doing a search on Yahoo for each of these IP addresses, these are the total counts I get. This suggests that many of the addresses below are perhaps used internally and do not serve results to the public. Most of the very low counts are only from forum posts that list Google data centers!

216.239.37.104 - 123,000
216.239.37.105 - 140
216.239.37.106 - 79
216.239.37.107 - 144
216.239.37.147 - 706
216.239.37.99 - 11,600
216.239.39.104 - 143,000
216.239.39.106 - 70
216.239.39.107 - 82
216.239.39.99 - 1,690
216.239.53.104 - 38,200
216.239.53.106 - 53
216.239.53.107 - 102
216.239.53.99 - 1,320
216.239.57.104 - 60,300
216.239.57.105 - 644
216.239.57.106 - 100
216.239.57.107 - 548
216.239.57.147 - 730
216.239.57.98 - 694
216.239.57.99 - 2,310
216.239.59.104 - 84,900
216.239.59.105 - 193
216.239.59.106 - 11
216.239.59.107 - 84
216.239.59.147 - 697
216.239.59.99 - 1,730
216.239.63.104 - 20,300
216.239.63.99 - 1
64.233.161.104 - 46,000
64.233.161.105 - 842
64.233.161.106 - 86
64.233.161.107 - 24
64.233.161.147 - 111
64.233.161.99 - 1,010
64.233.167.104 - 43,500
64.233.167.105 - 2
64.233.167.106 - 2
64.233.167.107 - 2
64.233.167.147 - 65
64.233.167.99 - 1,030
64.233.171.104 - 729
64.233.171.105 - 40
64.233.171.106 - 2
64.233.171.107 - 28
64.233.171.147 - 84
64.233.171.99 - 620
64.233.179.104 - 21,700
64.233.179.106 - 4
64.233.179.107 - 5
64.233.179.99 - 576
64.233.183.104 - 28,600
64.233.183.106 - 1
64.233.183.107 - 13
64.233.183.99 - 567
64.233.185.104 - 607
64.233.185.106 - 3
64.233.185.107 - 5
64.233.185.99 - 552
64.233.187.104 - 31,900
64.233.187.106 - 3
64.233.187.107 - 3
64.233.187.99 - 716
64.233.189.104 - 717
66.102.11.104 - 45,400
66.102.11.106 - 95
66.102.11.107 - 87
66.102.11.99 - 1,550
66.102.7.104 - 49,300
66.102.7.105 - 138
66.102.7.106 - 14
66.102.7.107 - 13
66.102.7.147 - 688
66.102.7.99 - 1,060
66.102.9.104 - 62,400
66.102.9.106 - 1
66.102.9.107 - 17
66.102.9.99 - 688
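
To see which last octets dominate, the counts can be summed across C blocks. A short Python sketch using the figures for just two of the blocks listed above (216.239.37 and 66.102.7); the variable and function names are illustrative:

```python
from collections import defaultdict

# Counts copied from two of the C blocks in the list above.
YAHOO_COUNTS = {
    "216.239.37.104": 123000, "216.239.37.105": 140, "216.239.37.106": 79,
    "216.239.37.107": 144, "216.239.37.147": 706, "216.239.37.99": 11600,
    "66.102.7.104": 49300, "66.102.7.105": 138, "66.102.7.106": 14,
    "66.102.7.107": 13, "66.102.7.147": 688, "66.102.7.99": 1060,
}

def totals_by_last_octet(counts):
    """Sum the counts per last octet and return (octet, total) pairs, largest first."""
    totals = defaultdict(int)
    for ip, n in counts.items():
        totals[ip.rsplit(".", 1)[1]] += n
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

For these two blocks the .104 addresses account for 172,300 of the hits, with the .99 addresses a distant second at 12,660. These are Yahoo page counts for pages mentioning the IP, so they are only an indirect hint at real usage.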

PhilC
03-29-2005, 10:18 PM
Excellent stuff, Everyman!!!

The 104s get the heaviest usage in each set, and those website statistics files show them for cache and translation use, but we can probably assume that the searches themselves were also done by the 104s. A quick glance at some serps for other IPs in the same C blocks doesn't show any website statistics pages.

So it looks very much like the 104s provide the serps. It agrees with what we know about the C block sets - every set has a 104 - even the 1 DC set (189) is a 104.

The next up are the 99s. All sets that have more than one IP have a 99, including the 2 DC (63) set. But the first few results pages don't show any website statistics pages, so it looks as though the 99s don't provide the results.

Right now the 161s are showing me one set of results on 99 and 104, and a different set of results on the other 4 DCs, but I'd put that down to being directed someplace else for those 4. It's very common for the 161s to do that.

I haven't examined the serps very far, but my thinking now is towards the idea that if we do a normal search on a specific DC, then it will be treated as a search on the 104 of that set, and the 104 will return the results - or it will be directed to the 104 of another set. And, for the serps, each C block set is the datacenter, and there isn't a different datacenter for the serps behind each IP address. I love this stuff! :)

If that's right, it makes me wonder what the other IPs are used for. Google has quite a lot of different things on the go, and there may be dedicated datacenters (small or large) behind each of the other IPs. I think a comprehensive examination of the pages in the results for those other IPs is in order.

Excellent thinking Everyman!!!

chachi
03-29-2005, 11:24 PM
Phil, I hope you are able to sleep at night. :) Just read through this whole thread and thankfully someone posted that video from UW for you. You may also want to check out this video (http://edcorner.stanford.edu/) of Larry Page (Google founder) speaking at Stanford. It is not as technical as the UW vid, but it touches on some subjects and why he and his partner went into the search business themselves. The reason is not what you would think. Grab a bucket o' corn and watch.

I have to agree with JohnW here. I have seen a number of sites do some crazy things over the past 2 months. And, now that I have seen many of them return to "normal", it appears to me that Google may have just been cleaning house more than they were implementing/testing a new algo(s). I have seen a number of sites I control lose 30-50% of their page count using the G API and site: operator. I think the page count and internal links are largely responsible for the fluctuations and changes we have seen. It would make sense to me that they would be rotating these "cleaned" sets of data into and out of their clusters however they see fit. It doesn't seem like people have been talking about the number of pages Google had for a particular site in question over this period. I have this data somewhere and I will post it here just for kicks if someone wants to look at it.

PhilC
03-30-2005, 12:58 AM
I do have sleep problems, chachi, but it's nothing to do with this stuff :)

I enjoy this stuff even though none of it will make much of a difference to us. But it's interesting to find that there appears to be no point in looking for rankings in any IPs other than the 104s, because the others aren't providing us with any information of their own. It's also interesting to realise things like the results that we receive from a specific DC aren't necessarily returned by that DC, and Google doesn't have complete indexes, but each index is made up of chunks called shards, that may or may not be part of an index at the time of the query.

I know that this thread sort of evolved from a sudden change in rankings to discussing the datacenters, but we wouldn't have that stuff if we hadn't gone through it all.

But about those ranking changes....

I'm currently seeing that old Title on all but 8 IPs ;)

Everyman
03-30-2005, 02:55 AM
I discovered 4 additional Class C blocks, but these look like they're waiting for future development. Those PhDs seem to like patterns -- the third quad is always an odd number (I tried some adjacent even numbers and they never work). Maybe the even numbers in the third quad are always used for ethernet connections.

64.233.163.104 - 2
64.233.163.106 - 0
64.233.163.107 - 0
64.233.163.99 - 0

66.249.81.104 - 4
66.249.81.99 - 1

66.249.85.104 - 7
66.249.85.99 - 4

66.249.87.104 - 2
66.249.87.99 - 0
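
The odd-third-quad pattern is easy to check against every C block named in this thread so far. A trivial Python sketch; the list is typed in from the posts above and the function names are illustrative:

```python
# The 17 C blocks listed earlier in the thread plus the 4 new ones above.
KNOWN_BLOCKS = [
    "216.239.37", "216.239.39", "216.239.53", "216.239.57", "216.239.59",
    "216.239.63", "64.233.161", "64.233.163", "64.233.167", "64.233.171",
    "64.233.179", "64.233.183", "64.233.185", "64.233.187", "64.233.189",
    "66.102.7", "66.102.9", "66.102.11", "66.249.81", "66.249.85", "66.249.87",
]

def third_octet(block):
    """Return the third number of a dotted block such as '64.233.163'."""
    return int(block.split(".")[2])

def even_third_octets(blocks):
    """Return the blocks whose third octet is even (none, if the pattern holds)."""
    return [b for b in blocks if third_octet(b) % 2 == 0]
```

An empty result from `even_third_octets(KNOWN_BLOCKS)` is consistent with the observation, though it only covers the blocks people here have found. It does not show that the even-numbered blocks are unused.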

PhilC
03-30-2005, 11:49 AM
Good find, Everyman! I've been examining a 1.27 gig log file from one of my sites, and none of those new IPs are in it, so it looks like they are unused for us at the moment.

In examining my log I found things pretty much the same as you found them in the searches. 104s all over the place and not much else. I did find that the 37s and 39s are dedicated to translations, so there doesn't seem to be much point in checking the rankings on those.

I'm rethinking those 104s though. All the log file entries are as referers, of course, and all of them are for images when people are looking at the cache or when Google is translating a page for them. In both cases, the images are acquired from the site, and that's how the IPs appear as referers. There are a few log entries when people search for an IP address that is on a page (like in this thread), but they are not of any interest.

What I'm wondering is perhaps Google uses other IPs than the 104s for normal searches, and uses the 104s for caches and translations. And maybe that is why the referers are full of 104s. It could be that the 104s are no good for checking rankings either, because they are dedicated to caches and translations.

Unfortunately, I only have one site that is suitable for testing it with, and it will take a little time, so if anyone can test that by searching in a non-104 IP address, then clicking the "cache" link without clicking through to the site itself, and then checking what is in the log file, it would help. The browser's temporary internet files would need to be deleted first to make sure that the cache gets the images from the site and that they are not displayed from the computer's caches.

Everyman
03-30-2005, 12:44 PM
"What I'm wondering is perhaps Google uses other IPs than the 104s for normal searches, and uses the 104s for caches and translations."
No, they use 104, 147, and 99 for normal searches. I agree that the translations seem to be dedicated to 37 and 39.

Here is a good snapshot of major U.S. providers. We need something equivalent for the rest of the world.

Cached results from major Internet providers (http://www.dnsstuff.com/tools/ispdns.ch?name=www.google.com&type=A):

64.233.161.104 - 13 cached results
64.233.161.147 - 13 cached results
64.233.161.99 - 13 cached results

64.233.167.104 - 6 cached results
64.233.167.147 - 6 cached results
64.233.167.99 - 6 cached results

64.233.187.104 - 5 cached results
64.233.187.99 - 5 cached results

66.102.7.104 - 5 cached results
66.102.7.147 - 5 cached results
66.102.7.99 - 5 cached results

lots0
03-30-2005, 01:02 PM
http://www.l.google.com/search?hl=en&q=seo+forum&btnG=Google+Search
Interesting subdomain format.

Everyman
03-30-2005, 02:31 PM
The "l" in www.l.google.com probably stands for "load." I'm no DNS guru, but it appears that the plain www.google.com gets referred to a www.l.google.com for resolution. This probably gives them more control over the load balancing. Notice that all the TTLs are quite short -- a matter of minutes, not hours.

I tried to resolve the plain google.com with no subdomain in front from three different locations in the U.S., and all three times I got these three addresses: 216.239.37.99, 216.239.39.99, 216.239.57.99

Since many folks are too lazy to key in the "www" in front, I think it's likely that these three serve general search results to the public, and not translation results.

We need a DNS wizard to spend an hour or two digging into Google's DNS setup.
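
Short of a DNS wizard, a rough first pass is possible with Python's standard library. The sketch below assumes nothing beyond `socket.gethostbyname_ex`, which returns the canonical name, any aliases, and the IPv4 addresses for a hostname. It does not expose TTLs, so those still need a proper DNS tool. The helper names `resolve` and `group_by_last_quad` are made up for illustration.

```python
import socket
from collections import defaultdict


def resolve(hostname):
    """Return (canonical_name, aliases, ipv4_addresses) for a hostname.

    Needs network access; raises OSError (socket.gaierror) on failure.
    """
    canonical, aliases, addresses = socket.gethostbyname_ex(hostname)
    return canonical, aliases, addresses


def group_by_last_quad(addresses):
    """Group dotted-quad addresses by their fourth quad (e.g. '99', '104')."""
    groups = defaultdict(list)
    for address in addresses:
        groups[address.rsplit(".", 1)[1]].append(address)
    return dict(groups)


# The three addresses reported above for plain google.com.
print(group_by_last_quad(["216.239.37.99", "216.239.39.99", "216.239.57.99"]))
```

Calling `resolve("www.google.com")` and `resolve("google.com")` from several locations, then passing the address lists to `group_by_last_quad`, would show whether the canonical name differs between the two and which fourth quads each one hands out.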

PhilC
03-30-2005, 04:46 PM
Out of almost 13 thousand log entries with Google IP referers, I have 11 referrals from 99s (all plain searches), and none from 147s. Almost all of the rest are cache referrals from 104s. The site is used by people all over the world.

If Google is using 99s, 104s, and 147s for normal searches, and if the cache uses the normal search IPs, I would expect to have some cache referrals from all three, but I don't have any from the 99s or from the 147s. Every one of the 10,457 cache referrals is from a 104.

It looks to me like the 104s are dedicated to the cache searches. They could be used for normal searches as well, but I don't see why only they would be used for the cache and not the others.
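
For anyone who wants to run the same kind of count on their own logs, here is a minimal Python sketch. It assumes a combined-format access log where the referer is a quoted URL, and it treats any referer containing `q=cache:` as a cache view, which is an assumption about how the cache URLs look. The sample lines are invented for illustration and are not from the log discussed above.

```python
import re
from collections import Counter

# Matches a quoted referer whose host is a bare IPv4 address.
REFERER_IP = re.compile(r'"https?://(\d{1,3}(?:\.\d{1,3}){3})(/[^"]*)?"')

# Made-up sample lines in combined log format.
SAMPLE = [
    '192.0.2.1 - - [30/Mar/2005:11:00:00 +0000] "GET /logo.gif HTTP/1.1" 200 512 '
    '"http://64.233.161.104/search?q=cache:abc:www.example.com/" "Mozilla/4.0"',
    '192.0.2.2 - - [30/Mar/2005:11:01:00 +0000] "GET /logo.gif HTTP/1.1" 200 512 '
    '"http://64.233.167.104/search?q=cache:def:www.example.com/" "Mozilla/4.0"',
    '192.0.2.3 - - [30/Mar/2005:11:02:00 +0000] "GET / HTTP/1.1" 200 2048 '
    '"http://216.239.37.99/search?q=seo+forum" "Mozilla/4.0"',
    '192.0.2.4 - - [30/Mar/2005:11:03:00 +0000] "GET / HTTP/1.1" 200 2048 '
    '"http://www.example.org/links.html" "Mozilla/4.0"',
]


def tally_referers(lines):
    """Count IP-address referers by (fourth quad, 'cache' or 'other')."""
    counts = Counter()
    for line in lines:
        match = REFERER_IP.search(line)
        if match is None:
            continue  # referer is missing or uses a hostname, not an IP
        address, path = match.group(1), match.group(2) or ""
        kind = "cache" if "q=cache:" in path else "other"
        counts[(address.rsplit(".", 1)[1], kind)] += 1
    return counts


for (quad, kind), count in sorted(tally_referers(SAMPLE).items()):
    print(quad, kind, count)
```

This counts every IP-address referer, so the addresses would still need filtering down to known Google blocks before drawing any conclusions.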

lots0
04-01-2005, 04:21 AM
The "l" in www.l.google.com probably stands for "load."
That's what I figured too.

Networking all those PCs all over the world and then load balancing all the massive traffic, it boggles the mind.

Kudos to the boys at the "Plex".