Search Engine Watch
Old 05-16-2006   #1
traian
Member
 
Join Date: Sep 2004
Posts: 187
Amazing screenshots from Google SERPs!

Hi Danny,


In a thread asking about the moderators, more exactly where they are and why they are not responding to questions about the latest drop-offs, your answer was something like: "Now when Google makes an algo shift, I think it tends to hit the third-party sites, if you will, harder."

Well, I have some screenshots showing top sites hit by this issue: content-rich sites like Wikipedia and About that are suffering incredible drop-offs.

Doing a site:www.yahoo.com on Google will show you the amazing number of 612 pages indexed by Google from www.yahoo.com.

The same for site:www.about.com - man, 195 pages indexed.

I told you, this is amazing. For unbelievers and for future reference, I have saved these screenshots here. Check them all, it's worth it!

Observation: Doing a site:yahoo.com on Google returns 395,000,000 pages. That's it. Canonical issues at Google.


Matt,

Is that correct? How long will this take? How much has this affected Google awareness, and most of all, can you deal with it?

Thanks,
Traian
Old 05-16-2006   #2
traian
Member
 
Join Date: Sep 2004
Posts: 187
Yet another interesting discovery!

Check this out.

Do a site:www.yahoo.com on Google. I have magnified the screenshot of the SERP to show the results more clearly. You will get the following results.

The first three results are all www.yahoo.com/, aren't they?

Well, don't be so sure about that. Not so fast!

The first result is indeed www.yahoo.com, with a cached page from 14 May 2006. I will save this one to my HDD.

Take a look at the second result, but pay attention to the spelling (more exactly, the ASCII characters used) in yahoo. Isn't that great? It is not the normal yahoo but www.y%EF%BD%81h%EF%BD%8Fo.com. Interesting, no? Now go further and check the cached page: "This is G o o g l e's cache of http://www.yahoo.com/ as retrieved on 18 Oct 2005 23:05:37 GMT." Nice. I'll save this one too.

What will the next one be?
In the third result, pay attention to the letter "m" in .com. It is not the letter "m" itself but the ASCII code %EF%BD%8D. There is no cached page for this one.

After the second result, I thought that Yahoo had somehow registered the weird version of yahoo. But after I saw the third result, with the letter "m" in .com, I realized that this TLD doesn't exist at all. So it must be an internal Google problem.
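
For reference, the %EF%BD%.. sequences in these results are not ASCII codes but percent-encoded UTF-8 bytes. A minimal Python sketch (standard library only) that decodes the hostname quoted above and names the characters involved; the helper name describe_non_ascii is made up for this example:

```python
import unicodedata
from urllib.parse import unquote

# Hostname copied from the second result described in the post.
encoded_host = "www.y%EF%BD%81h%EF%BD%8Fo.com"

# Percent-decoding; unquote() interprets the bytes as UTF-8 by default.
decoded_host = unquote(encoded_host)

def describe_non_ascii(text):
    """Return (character, Unicode name) pairs for every non-ASCII character."""
    return [(ch, unicodedata.name(ch)) for ch in text if ord(ch) > 127]

for ch, name in describe_non_ascii(decoded_host):
    print(f"U+{ord(ch):04X} {name}")

# The sequence reported for the "m" in the third result.
print(unicodedata.name(unquote("%EF%BD%8D")))

# NFKC compatibility normalization folds fullwidth letters to plain ASCII.
print(unicodedata.normalize("NFKC", decoded_host))
```

This only identifies the characters as fullwidth forms of the Latin letters. Why such URLs ended up in the index is not something the sketch can answer.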

Thoughts?

Thanks,
Traian
Old 05-16-2006   #3
ReSiever
Member
 
Join Date: Jun 2005
Posts: 27
Punycode in domain names...?
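
For comparison, a Punycode (IDNA) hostname would look different from the percent-encoded forms quoted earlier in the thread: the encoded label is plain ASCII and starts with xn--. A small Python sketch using the standard library's punycode and idna codecs; the hostname bücher.example is an illustrative assumption and does not come from this thread:

```python
# Illustrative internationalized hostname (an assumption, not from the thread).
unicode_host = "bücher.example"

# Raw Punycode of a single label, without the IDNA "xn--" prefix.
label_punycode = "bücher".encode("punycode").decode("ascii")

# The idna codec (IDNA 2003 in Python's standard library) encodes each label
# and adds the "xn--" prefix to labels that contain non-ASCII characters.
ascii_host = unicode_host.encode("idna").decode("ascii")

print(label_punycode)
print(ascii_host)
```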
Old 05-16-2006   #4
Chris_D
 
Join Date: Jun 2004
Location: Sydney Australia
Posts: 1,099
Hi Traian

You are 'asking the wrong question'.

Quote:
The word "site" followed by a colon enables you to restrict your search to a specific site.
You are asking for all the pages indexed from www.yahoo.com - and you are getting them. That's NOT where the content is at Yahoo! or About.com.

The command is returning the correct result - it's NOT a canonical URL issue.

hostname.domainname is the format - www is a hostname. The hostname can be anything you like - or you can have none at all.

Yahoo and many other sites & portals choose hostnames like finance or cars or ftp or whatever - they are different hostnames.

Most Yahoo! content is on other host names - finance.yahoo.com, news.yahoo.com, movies.yahoo.com, etc.
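
The distinction can be sketched in a few lines of Python. This is only an illustration of the host-suffix matching described here, not Google's actual implementation; the function name matches_site and the sample URLs are assumptions:

```python
from urllib.parse import urlsplit

def matches_site(url, site):
    """Return True if the URL's host is `site` itself or a host under it."""
    host = (urlsplit(url).hostname or "").lower()
    site = site.lower()
    return host == site or host.endswith("." + site)

urls = [
    "http://www.yahoo.com/",
    "http://finance.yahoo.com/",
    "http://news.yahoo.com/",
    "http://yahoo.com/",
]

# site:yahoo.com covers every host name under the domain.
print([u for u in urls if matches_site(u, "yahoo.com")])

# site:www.yahoo.com covers only the www host.
print([u for u in urls if matches_site(u, "www.yahoo.com")])
```

Under these assumptions, a small count for site:www.yahoo.com and a huge one for site:yahoo.com are both expected.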

Try this: http://www.google.com/search?hl=en&l...te%3Ayahoo.com

Quote:
Results 1 - 10 of about 361,000,000 from yahoo.com
Same for About.com

http://www.google.com/search?hl=en&l...om&btnG=Search

Quote:
Results 1 - 10 of about 58,000,000 from about.com
Now try http://www.google.com/search?hl=en&q...ance.yahoo.com

ninemsn here in Australia doesn't use the www hostname at all - its homepage is http://ninemsn.com.au/

So how many pages are indexed under www.ninemsn.com.au? http://www.google.com/search?hl=en&l...ninemsn.com.au

Now look at http://www.google.com/search?hl=en&l...&btnG=Search

It's not a problem - it's the correct answer to the question you asked. The issue is the question.
Old 05-16-2006   #5
traian
Member
 
Join Date: Sep 2004
Posts: 187
Quote:
Originally Posted by Chris_D
hostname.domainname is the format - www is a hostname. The hostname can be anything you like - or you can have none at all.

Yahoo and many other sites & portals choose hostnames like finance or cars or ftp or whatever - they are different hostnames.
I know that very well, but I was referring to the specific www.abc.com domain, not to all subdomains of www.abc.com, like finance.abc.com.

As I said before, Google now has even fewer pages indexed for www.yahoo.com (590 instead of 617, 4 hours after the first post), and let's be serious, eBay has a lot of content in subdirectories and not just 2 relevant pages in Google's index.

The way I see it, Google is trying to resolve the canonical issues to eliminate the problems that appeared between pages like abc.com and www.abc.com. You know very well that URLs which are the same page for users:
http://abc.com
http://abc.com/
http://abc.com/index.html
http://www.abc.com

and even more, are different from the SEs' POV.
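
To illustrate, here is a minimal Python sketch of one possible canonicalization of those four URLs. The rules (drop a leading www., treat an empty path and /index.html as /) are assumptions chosen for the example, not a description of what any search engine actually does:

```python
from urllib.parse import urlsplit, urlunsplit

def canonicalize(url):
    """Map common variants of a home page URL to a single form.

    Ports, credentials and fragments are ignored for brevity.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path
    if path in ("", "/index.html"):
        path = "/"
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))

variants = [
    "http://abc.com",
    "http://abc.com/",
    "http://abc.com/index.html",
    "http://www.abc.com",
]

# Four distinct strings collapse to a single canonical URL.
print({canonicalize(u) for u in variants})
```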

But I might be wrong, so, please correct me if so.

Traian
Old 05-17-2006   #6
traian
Member
 
Join Date: Sep 2004
Posts: 187
Quote:
Originally Posted by Chris_D
It's not a problem - it's the correct answer to the question you asked. The issue is the question.
Yes it is. Read Matt's blog today and you will find some answers there.

Good luck,
Traian
Old 05-17-2006   #7
Chris_D
 
Join Date: Jun 2004
Location: Sydney Australia
Posts: 1,099
Quote:
I was referring to the specific www.abc.com domain, not to all subdomains of www.abc.com, like finance.abc.com
traian,

The DOMAIN is abc.com (without a hostname)

The host name is www or finance or ftp or whatever.

Can you post a specific link where Matt says there is an issue with the site: command?

Alternatively, how many pages can you find at www.ninemsn.com.au? (remembering it is really http://ninemsn.com.au or http://news.ninemsn.com.au etc.)

Well - Google finds one: http://www.google.com/search?hl=en&l...nemsn.com.au

MSN finds one: http://search.msn.com/results.aspx?q....com&FORM=QBRE

Yahoo finds more: http://search.yahoo.com/search?p=sit...&cop=&ei=UTF-8 - but if you click on the SERP links you'll see what they resolve to....
Old 05-17-2006   #8
traian
Member
 
Join Date: Sep 2004
Posts: 187
Quote:
The team refreshing our supplemental results checked out feedback, and on May 5th they discovered that a “site:” query didn’t return supplemental results. I think that they had a fix out for that the same day. Later, they noticed that a difference in the parser meant that site: queries didn’t work with hyphenated domains. I believe they got a quick fix out soon afterwards, with a full fix for site: queries on hyphenated domains in supplemental results expected this week.
...from Matt Cutts's blog, found here: http://www.mattcutts.com/blog/indexing-timeline/

Anyway, if you haven't read Matt's latest blog posts yet, you will see that he admitted they made some fixes (I suppose they fixed bugs, not pipes).

In this post I was referring to the number of Yahoo pages indexed by google.com and yahoo.com, using the equivalent command on each:
site:www.yahoo.com on Google - 610 results and
site:www.yahoo.com on Yahoo - 349,000 results

Still, can no one give an answer about the strange ASCII in the domain names?

Cheers,
Traian
Old 05-17-2006   #9
Chris_D
 
Join Date: Jun 2004
Location: Sydney Australia
Posts: 1,099
Quote:
site: use to find all documents within a particular domain and all it's subdomains.
(my bold)
http://help.yahoo.com/help/us/ysearc...basics-04.html

Quote:
If you include [site:] in your query, Google will restrict the results to those websites in the given domain. For instance, [help site:www.google.com] will find pages about help within www.google.com.
http://www.google.com/help/operators.html
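
For anyone who wants to repeat these checks, a site: query URL can be built like this in Python. Only the hl and q parameters are used, since the links in this thread are truncated; that parameter set and the helper name site_query_url are assumptions:

```python
from urllib.parse import urlencode

def site_query_url(site, terms="", base="http://www.google.com/search"):
    """Build a search URL that restricts results with the site: operator."""
    query = f"{terms} site:{site}".strip()
    # urlencode percent-encodes the colon and turns spaces into "+".
    return base + "?" + urlencode({"hl": "en", "q": query})

print(site_query_url("yahoo.com"))
print(site_query_url("www.google.com", terms="help"))
```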
Old 05-17-2006   #10
jZillan
Hopeless Semantic
 
Join Date: May 2006
Location: Pacific Palisades, Ca
Posts: 8
new version of Google cache

Quote:
Originally Posted by traian
In the third result, pay attention to the letter "m" in .com. It is not the letter "m" itself but the ASCII code %EF%BD%8D. There is no cached page for this one.

After the second result, I thought that Yahoo had somehow registered the weird version of yahoo. But after I saw the third result, with the letter "m" in .com, I realized that this TLD doesn't exist at all. So it must be an internal Google problem.

Thoughts?

Thanks,
Traian
Traian,

You're absolutely correct. I believe this is just one side effect of Google's new crawl caching proxy.

Test out the following search method in Google:

1. Go to Google datacenter http://66.249.93.104
2. Type in www.myspace.com/ site:www.myspace.com or click here
3. Hit the cache
4. Compare cache version with another Google datacenter
5. Compare cache version with 2nd search result
6. Voila! 3 different variations of Google Cache
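
A comparison like the one in steps 4 and 5 can be made less subjective by diffing saved copies of the cached pages. A minimal Python sketch using difflib; the two HTML strings are made-up stand-ins for saved cache pages, the function name compare_snapshots is hypothetical, and nothing here fetches anything from Google:

```python
import difflib

def compare_snapshots(a, b):
    """Return a similarity ratio (0.0 to 1.0) and a unified diff of two pages."""
    ratio = difflib.SequenceMatcher(None, a, b).ratio()
    diff = list(difflib.unified_diff(
        a.splitlines(), b.splitlines(),
        fromfile="datacenter_a", tofile="datacenter_b", lineterm="",
    ))
    return ratio, diff

# Hypothetical saved cache pages that differ only in font markup.
page_a = '<font face="arial">cached page</font>\n<p>Same content</p>'
page_b = '<font face="verdana">cached page</font>\n<p>Same content</p>'

ratio, diff = compare_snapshots(page_a, page_b)
print(f"similarity: {ratio:.2f}")
print("\n".join(diff))
```

An empty diff would mean the two saved copies are identical; a diff limited to markup would point to a presentation change and not a different crawl of the page.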

Now keep in mind that after the Big Daddy update, Google's caching infrastructure completely changed. They now use what is called a crawl caching proxy; read more here.

Here is a quick screenshot I took this morning.

Also...

What's also interesting is that Matt Cutts and GoogleGuy mention that they're refreshing supplemental results, check this out: site:www.searchenginewatch.com

Now try it without the www: site:searchenginewatch.com

P.S. Here is a 4th version of Google's new cache

-jzillan

Last edited by jZillan : 05-17-2006 at 02:09 PM.
Old 05-17-2006   #11
simons1321
Member
 
Join Date: Nov 2005
Location: Dallas, TX
Posts: 100
Quote:
Originally Posted by jZillan
1. Go to Google datacenter http://66.249.93.104
2. Type in www.myspace.com/ site:www.myspace.com or click here
3. Hit the cache
4. Compare cache version with another Google datacenter
5. Compare cache version with 2nd search result
6. Voila! 3 different variations of Google Cache
#3 and #4 are identical from where I'm sitting. Identical cache date, identical cache time, identical web page. #5 refers to the cache of a completely different page from #3 and #4, so it should be expected that it is different.

Also, the www version of SEW is not used by SEW, so I don't know what you're seeing or saw, but that site: search returns exactly what it should... nada.

And your 4th version of G's cache also refers to a completely different page, not to mention a completely different domain.
Old 05-17-2006   #12
jZillan
Hopeless Semantic
 
Join Date: May 2006
Location: Pacific Palisades, Ca
Posts: 8
Apparently we're not on the same page simons1321, twice now.

Look closer at the cached pages. They're different versions of the cache. You can see that they are testing different "variations" of the cache. The font type change should be evidence enough.

Anyway, these caching modifications fit with Google's recent Big Daddy update, in that their entire caching infrastructure has changed, meaning they're no longer caching a site like they used to. I read somewhere that if you could illustrate the BD update, it's like putting a whole new engine in a car, not just a paint job or, in our world, an algorithm change.

To answer #4 - This was something I noticed separately from my analysis today... again, a different variation or font type change.

Finally, to answer your question about the www vs non-www for SEW: this was to provide extra clarity on what was said not only by GoogleGuy but also by Matt Cutts this past week. Supplemental results... Supplemental Results... one more time... Supplemental Results. OK, the supplemental results in Google have recently been refreshed, which should account for one of the reasons webmasters are noticing a big drop in the index count. To paraphrase what GoogleGuy posted last week, this probably isn't something you will notice as far as the quality or quantity of traffic is concerned. However, if you frequently check your index numbers in Google (which I do), then you will have noticed reduced results, or a drop in the number of pages indexed, lately.

-jZillan
Old 05-18-2006   #13
traian
Member
 
Join Date: Sep 2004
Posts: 187
Well,

I've always suspected Google of keeping multiple versions (and a history too) of web pages. Now I'm sure. No doubt about that.

And now they are introducing the proxy cache (the famous Mediabot will in fact do that), hmm....

My guess is that Google is trying to compare different versions of duplicate (or almost duplicate) content, figure out which was the first version of the content, keep those websites relevant for some queries, and demote the affiliate versions (which sometimes rank higher than the content provider) or content scrapers.
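
As a rough illustration of how near-duplicate text can be compared in general, here is a Python sketch using word shingles and Jaccard similarity. This is a generic textbook technique, not a claim about what Google does; the function names and sample sentences are made up for the example:

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

original = "the quick brown fox jumps over the lazy dog"
scraped = "the quick brown fox jumps over the sleepy dog"
unrelated = "search engines index pages from many different hosts"

# A lightly edited copy scores well above zero; unrelated text scores zero.
print(jaccard(shingles(original), shingles(scraped)))
print(jaccard(shingles(original), shingles(unrelated)))
```

Similarity alone says nothing about which copy came first; that would need something like the date each version was first crawled.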

Keep in mind that Matt told affiliates to "add value" to their websites. So you must, if you are an affiliate.

Good luck,
Traian
Old 05-22-2006   #14
simons1321
Member
 
Join Date: Nov 2005
Location: Dallas, TX
Posts: 100
Quote:
Originally Posted by jZillan
Apparently we're not on the same page simons1321, twice now.

Look closer at the cached pages. They're different versions of the cache. You can see that they are testing different "variations" of the cache. The font type change should be evidence enough.

Anyway, these caching modifications fit with Google's recent Big Daddy update, in that their entire caching infrastructure has changed, meaning they're no longer caching a site like they used to. I read somewhere that if you could illustrate the BD update, it's like putting a whole new engine in a car, not just a paint job or, in our world, an algorithm change.

To answer #4 - This was something I noticed separately from my analysis today... again, a different variation or font type change.
I'm sorry, I just can't reproduce the results you're seeing.