View Full Version : images missing in google cache


janemc
06-26-2005, 06:53 AM
Our site is visited by Googlebot regularly, recently every 4 or 5 days, but the Google cache is always missing the images.
Is this happening to anyone else?
ProvenceBeyond (http://www.provencebeyond.com/)
or
Beyond.fr (http://www.beyond.fr)
(same physical site, 2 URLs)

thanks for any comments/suggestions anyone might have

PhilC
06-27-2005, 01:54 PM
You're lucky. In the cache, Google doesn't display a page's images from their own resources; they leave the page to load them from the website itself, which means that they display your page in their site, and use your bandwidth to do it. Personally, I greatly object to that.

janemc
06-28-2005, 05:32 AM
Thanks very much Phil.
We use relative addressing for all our graphics - important for working on our multi-thousand page web locally. I think maybe we'll now use absolute addressing for our home page images so Google users can see them. Actually, I wonder how many "normal people" actually look at the G cache? Maybe it's not very important. Do you have an opinion on this?

PhilC
06-28-2005, 06:25 AM
Google doesn't have a problem seeing the graphics with relative URLs. It's the browser that can't get them when in the cache because the URLs are relative to the Google site, and they don't exist there, of course.

I've no idea if people use the cache a lot or not, but I imagine they do.

janemc
06-28-2005, 06:49 AM
I appreciate your time, Phil, and don't want to bother you too much, but would appreciate one more clarification about this:
Does Google do this with all webs, or do they keep only some in their system?
Or perhaps this means that the majority of web masters use absolute links for their images?

PhilC
06-28-2005, 07:16 AM
Actually, it's peculiar. Google isn't preventing the images from showing in any way that I can see, and yet they didn't show for me when viewing the cache of the site. Google's cache page consists of some of their own html, followed by the page's html. At the top of Google's chunk of html, they put the line:
<BASE HREF="http://www.beyond.fr/">
which means that all relative URLs on the page are relative to that base URL. The combined URLs are correct, and the images should display.
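To illustrate how a browser combines the BASE HREF with a relative URL, here is a minimal Python sketch. It is not from the thread: the `resolve` helper and the image path `images/photo.jpg` are hypothetical and only show the resolution rule.

```python
from urllib.parse import urljoin

# Base URL taken from the <BASE HREF> line Google adds to the cached page.
BASE_HREF = "http://www.beyond.fr/"


def resolve(relative_url: str, base: str = BASE_HREF) -> str:
    """Resolve a URL against the base, as a browser does when a BASE HREF is present."""
    return urljoin(base, relative_url)


# "images/photo.jpg" is a hypothetical path used only for illustration.
print(resolve("images/photo.jpg"))
# An absolute URL is left unchanged by the base.
print(resolve("http://www.provencebeyond.com/images/photo.jpg"))
```

The relative path resolves to the original site rather than to Google, which is why the cached copy should be able to load the images.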

But I looked at the cache and no images were displayed. So I got an image in the browser, and then it displayed in the cache. I went to the page itself, all the images were displayed, I went to the cache page, and all the images displayed. They now display on the cache page for me because the combined URLs are correct, and they are in my browser's cache, so they weren't requested again.

I can't see any reason why the images don't display in Google's cache (when the user doesn't already have them in the browser's cache). The only thing I can think of is that there is something in the back end of your site that deals with file requests differently to normal when the request's referrer is a Google page. I.e. the difference between me getting your images by typing the page's URL (or image URL) into my browser, and me getting your images by viewing Google's cache of your page, is the referrer. Note that Google does not request the images for the cache - the browser requests them, and the Google cache page is merely the referring page - the referrer.
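The referrer difference described above can be sketched in Python as two requests for the same image that differ only in the Referer header. This is an illustration, not something run in the thread: the image URL, the cache page URL and the `build_image_request` helper are all hypothetical, and nothing is sent over the network.

```python
from typing import Optional
from urllib.request import Request


def build_image_request(image_url: str, referer: Optional[str] = None) -> Request:
    """Build (but do not send) a request for an image, optionally with a Referer header."""
    headers = {}
    if referer is not None:
        headers["Referer"] = referer
    return Request(image_url, headers=headers)


# Both URLs are hypothetical placeholders, not taken from the site or from Google.
IMAGE_URL = "http://www.beyond.fr/images/photo.jpg"
CACHE_PAGE = "http://www.google.com/search?q=cache:www.beyond.fr/"

# Typing the image URL into the browser: no referrer is sent.
direct = build_image_request(IMAGE_URL)
# Viewing the cached page: the browser names the cache page as the referrer.
from_cache = build_image_request(IMAGE_URL, referer=CACHE_PAGE)

print(direct.has_header("Referer"))
print(from_cache.get_header("Referer"))
```

Passing each request to `urllib.request.urlopen` and comparing the responses would be one way to check whether a server treats the two cases differently.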

You said that it's only been happening for a few days. Were any technical changes made to the site a few days ago, such as some sort of Google tracking? Maybe it's been happening longer, but you only noticed it a few days ago.

janemc
06-28-2005, 08:13 AM
Actually I noticed it mid May, but do not know when it started. We have not consciously put any special code in the site for any specific bots or referrals.

PhilC
06-28-2005, 08:25 AM
I forgot to mention that I even copied the source code of Google's cache page, saved it as an html file, opened it in a browser, and all the images loaded - and that was without them having been stored in my browser's cache. So it has to be something specific to the referring page - that's not spider checking.

Maybe your server administrator has added something that blocks requests made by Google cache pages. Blocking the referrer somewhere is the only thing I can think of that could account for what's happening.

janemc
06-28-2005, 08:33 AM
Good idea. I'll check with them and let you know as soon as I get their response. Thanks again for all this help, Phil.

PhilC
06-28-2005, 08:57 AM
I'm very interested to know the result, so don't forget :)

Just as an aside, I strongly object to search engine caches for 2 reasons. One is that they do not have permission to reproduce other people's entire pages on their own site, but they do it, and I'm sure it must be illegal. The other reason is that, not only do they reproduce other people's entire pages within their site, but they have the nerve to cause the other people's bandwidth to be used in doing it - those graphics.

I would honestly like to see all servers block all file requests from all search engine cache pages, and I would also like to see all webpages use the meta tags to disallow search engine caches. But that's just me and my little rant.

janemc
06-28-2005, 10:08 AM
I solved the problem thanks to the lead you gave me when talking about something happening on the server level. As an oldtime content-heavy web, we have always had a lot of theft (lifting!) of both text and images from our web. A lot of deeplinking straight to our images, for example. In order to try to cut down on this, we had put instructions in the htaccess file to only let certain domain names link to the images. We had not specified Google, and therefore our images were not accessible to our page in their cache. We've just altered the htaccess file, and everything is fine now.
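For readers who want to see the logic of that kind of rule, here is a rough Python sketch of a referrer allowlist. It is not the actual htaccess configuration, which is not shown in this thread: the host names in `ALLOWED_HOSTS`, the function name and the Google cache URL are assumptions for illustration. The sketch also assumes that requests with no referrer are let through, which fits Phil's observation that the images loaded when the URL was typed directly.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical allowlist: hosts permitted to embed the site's images.
ALLOWED_HOSTS = frozenset({"www.provencebeyond.com", "www.beyond.fr"})


def is_image_request_allowed(referer: Optional[str],
                             allowed_hosts: frozenset = ALLOWED_HOSTS) -> bool:
    """Decide whether to serve an image, based only on the referring page's host."""
    # Assumption: a missing or empty referrer (URL typed directly) is allowed.
    if not referer:
        return True
    host = urlparse(referer).hostname
    return host is not None and host in allowed_hosts


# Hypothetical referrer for a page viewed in Google's cache.
cache_referer = "http://www.google.com/search?q=cache:www.beyond.fr/"

# Before the fix: the cache page's host is not on the list, so images are refused.
print(is_image_request_allowed(cache_referer))
# After the fix: adding the cache host to the list lets the images through.
print(is_image_request_allowed(cache_referer, ALLOWED_HOSTS | {"www.google.com"}))
```

The same allowlist that stops other sites deeplinking to the images also stops any legitimate page that was left off the list, which is what happened with the cache.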
However, we still need to find solutions to the "content borrowing" that hits us so much. Do you have experience in this? But in the meantime, thanks very much for the tip that led us to the answer.
:)

PhilC
06-28-2005, 10:13 AM
Excellent! It's good when a bit of detective work pays off :D

What's the content borrowing problem that you have? It sounds like it's worth another thread.

janemc
06-28-2005, 12:03 PM
Good idea, Phil. I'm new here, and not sure where to start that thread. It's not google specific, obviously. I looked over the different forums and did not find a logical place. Do you have a suggestion?

PhilC
06-28-2005, 01:28 PM
You're right, Jane. This being a search engine type forum, there aren't any sub-forums for general website stuff. I think the best place would be the Padded Room, which is sort of like a lobby for discussing any other subject.