
Caching Made Legal - Do You Agree? I Don't!


dannysullivan
01-26-2006, 03:38 PM
I blogged earlier (http://blog.searchenginewatch.com/blog/060126-075753) today about how a US court has ruled that Google's caching isn't a copyright violation, since there's an opt out ability. That means sites give an "implied license" to search engines to show cached pages, the court found.

I was pretty much OK with this, thinking of it from the same perspective as web crawling. I mean, with web crawling, we could say that everything should be opt in. The problem is, many people would fail to explicitly put up a "come and get it" notice in their robots.txt files. And since most people want to be indexed, opt out seems a fair way to go.

With page caching, the court seemed to agree that Google (and other search engines) can't notify everyone out there that they will cache pages. That's why having an opt out mechanism seems to make sense.

On the Daily SearchCast today, I asked DaveN what he thought. He completely disagreed and thought caching should be opt in. I can see that view, especially in that unlike web indexing, a site owner doesn't get any particular benefit by having Google cache their pages.

Then over at WebmasterWorld, Brett's kicked off an excellent (http://www.webmasterworld.com/forum30/32936.htm) discussion, where he's against the ruling. He raises the issue that nocache is Google specific, rather than a standard (and the tidbit that Google seems to have added this because he asked them about it way back in 1999). There's also the spectre that Google might consider the tag its own intellectual property (pretty tough, since they explicitly give versions to use for any spider) or that it breaks browsers.

Let's take the standard issue. It's actually pretty standard among search engines now:

Google
http://www.google.com/webmasters/remove.html#uncache
<META NAME="ROBOTS" CONTENT="NOARCHIVE">

Yahoo
http://help.yahoo.com/help/us/ysearch/crawling/crawling-02.html
<META NAME="robots" CONTENT="noarchive">

Ask
http://about.ask.com/docs/about/aj/teoma.htm#5
< META NAME = "ROBOTS" CONTENT = "NOARCHIVE" >

MSN
http://search.msn.com.my/docs/siteowner.aspx?t=SEARCH_WEBMASTER_REF_RestrictAccessToSite.htm
<META NAME="robots" CONTENT="noarchive" />

OK, MSN is kind of confusing, because they show this:

<META NAME="msnbot" CONTENT="noarchive" />

But I read the instructions to mean changing msnbot to robots will work for all spiders.

So we do have a standard, among the majors. But as someone else mentions in that WMW thread, it's not just the majors. I mean, potentially anyone could come along, spider a billion pages they show through a cache and say sorry, we're not violating copyright -- just use our nocache command. Here it is:

<meta name="robots" content="nocachingplease">

See the difference? They have their own nocache command -- it just doesn't match. The court opinion does make mention of this particular standard -- the actual noarchive command -- so perhaps someone would be skating on thin ice to have their own.
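To make the difference concrete, here is a minimal Python sketch of how a spider might check a page for the noarchive command. It is only an illustration, not how any search engine actually does it: the names `NoArchiveFinder`, `has_noarchive` and `DEFAULT_NAMES` are made up for this example, and the choice to honor "msnbot" alongside "robots" is an assumption based on the MSN example above.

```python
from html.parser import HTMLParser

# Meta names this hypothetical spider honors: the generic "robots" plus
# one bot-specific name ("msnbot", taken from the MSN example above).
DEFAULT_NAMES = ("robots", "msnbot")


class NoArchiveFinder(HTMLParser):
    """Collects content tokens from <meta> tags whose name we honor."""

    def __init__(self, names=DEFAULT_NAMES):
        super().__init__()
        self.names = {n.lower() for n in names}
        self.tokens = set()

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; values keep their case.
        if tag != "meta":
            return
        attr = {k.lower(): (v or "") for k, v in attrs}
        if attr.get("name", "").strip().lower() in self.names:
            # content may hold several comma-separated commands.
            for token in attr.get("content", "").split(","):
                self.tokens.add(token.strip().lower())


def has_noarchive(html, names=DEFAULT_NAMES):
    """Return True if the page asks not to be shown from a cache."""
    finder = NoArchiveFinder(names)
    finder.feed(html)
    return "noarchive" in finder.tokens
```

With this sketch, `has_noarchive('<META NAME="ROBOTS" CONTENT="NOARCHIVE">')` is true, while the made-up `nocachingplease` value is simply ignored, which is the whole problem with a spider inventing its own command.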

But then again, what if I don't want my page cached by some search engines but it's OK in others? There's no "cache" command that I can use for the big four, so I've got to block all or none.

It's important to note that the person in this case did know all about the command:

Field concedes he was aware of these industry standard mechanisms, and knew that the presence of a “no archive” meta-tag on the pages of his Web site would have informed Google not to display “Cached” links to his pages. Despite this knowledge, Field chose not to include the no-archive meta-tag on the pages of his site. He did so, knowing that Google would interpret the absence of the meta-tag as permission to allow access to the pages via “Cached” links. Thus, with knowledge of how Google would use the copyrighted works he placed on those pages, and with knowledge that he could prevent such use, Field instead made a conscious decision to permit it. His conduct is reasonably interpreted as the grant of a license to Google for that use.

I'm actually a bit freaked by what comes up later in the argument. There's a separate issue from "implied license" over whether Google's reprinting (that's what it is, reprinting rather than "caching" in the traditional sense of loading into your browser) is OK under fair use. Yes, it is. And why? Because it adds new things. And those are?

1) Making it available if your site is down. Um. OK, I'm off to get copies of books that are no longer in print, and I'll reprint them myself, since they are no longer accessible. From the ruling:

First, Google’s cache functionality enables users to access content when the original page is inaccessible. The Internet is replete with references from academics, researchers, journalists, and site owners praising Google’s cache for this reason. In these circumstances, Google’s archival copy of a work obviously does not substitute for the original. Instead, Google’s “Cached” links allow users to locate and access information that is otherwise inaccessible.

2) You can detect if a page has changed, something the original page can't tell someone. Yes, that's true in some cases. It's not true in others, nor is it true over time. Nor does making copies of everything without permission for comparison reference purposes feel like fair use.

3) The pages show highlighted search terms. "By affording access to a page within its cache, Google enables users to determine whether and where the relevant language appears, and thus whether the page is truly germane to their inquiry."

Hmm. OK, you can do the same with an original page using the Google Toolbar. You don't need to have that. But this means anyone can copy your pages and just highlight a few things. And what about people who look at cached pages not through a search but through the toolbar? The pages aren't changed then.

4) I'll just quote here "Fourth, Google utilizes several design features to make clear that it does not intend a 'Cached' link of a page to substitute for a visit to the original page." Right, I'm off to copy some books. Don't worry, if they are your books. After someone buys them, I'll be sure the first page informs you that you really should go buy the original book.

5) The strongest point, that people can get out of the cache if they want to.

Buried elsewhere is this nugget:

Here there is no evidence of any market for Field’s works. Field makes the works available to the public for free in their entirety, and admits that he has never received any compensation from selling or licensing them. See Field Dep. at 132:10-17. There is likewise no evidence that by displaying “Cached” links for pages from Field’s site, Google had any impact on any potential market for those works.
More generally, there is no evidence before the Court of any market for licensing search engines the right to allow access to Web pages through “Cached” links, or evidence that one is likely to develop. “Cached” links are simply one way that search engines enable end-users to obtain information that site owners make freely available to the world. There is compelling evidence that site owners would not demand payment for this use of their works

In other words, this particular person wasn't trying to make money off their site, so what harm if Google reprinted. But the second paragraph is free. I may make information freely available to those who COME to my web site, but that's a far different thing than freely allowing others to REPRINT my content. And guess what? If I thought significant numbers of people were going to places that were reprinting me without permission, you bet I'd be demanding payment or issuing takedown requests.

Again, a commercial site or the proven impact might make a difference. From the court docs:

Because there is no evidence that Google’s “Cached” links had any impact on the potential market for Field’s copyrighted works, the fourth fair use factor weighs strongly in favor of a fair use determination.

I also like the part later on where keeping a cached page online for 14-20 days is deemed "transient storage." Hmm. Yes, then the page changes to another copy of the page -- maybe even the same page, if the page has had no changes. That's transient? If I sit in the Google lobby for 14-20 days, think I'll be transient?

All in all, I've been pretty mellow about caching because opt out is easy. But now that we have a court that seems to not understand some real copyright issues, I've flip-flopped. I'm with Dave and Brett -- make it opt in. And we don't need a court decision for that to happen. The search engines themselves could do it. The problem is, with this ruling in their pocket, they have less reason to do so.

dazzlindonna
01-26-2006, 03:55 PM
I totally agree. If the court had said, it's okay as long as it's opt-in, I would have no problem with it. But to say it's ok, because there is an opt-out feature is just bass-ackwards, imo.

vayapues
01-26-2006, 04:12 PM
You know, when I read about this earlier, I thought that it made sense, and was a good ruling. But now that I read some of the arguments in the forums, I realize that maybe it is not such a good ruling.

IMO there is a simple solution to the whole thing. We need a law to be presented and passed by a body of legislators, such as Congress (i.e., not the judiciary), setting up a specific opt-out tag. Of course, they would just use the noarchive meta-tag, which is fine. This way, it would be official. No third party could set up their own opt-out tag.

This also addresses the issue of the search engines benefiting from copyrighted materials, while the copyright owner does not. If you make a legal opt-out tag, it really becomes an opt-in issue. The whole glass is half-full, half-empty argument.

This ruling is simply too open-ended, and without the power of legislative / executive blessing.

darn, there I go suggesting regulation of search engines again. Just can't get away from that one can I. :) lol (http://forums.searchenginewatch.com/showthread.php?t=9533).

randfish
01-26-2006, 04:14 PM
It's akin to saying - we're going to make every magazine in the US that's offered as free reading in bookstores, cafes, dentist's offices, etc. available at the touch of a finger in your home. But, specific magazines can opt out by making sure that they put a stamp inside their magazine that says "don't archive me."

That's just not how copyright law works. I have no real issue with Google making cached copies, but I take issue with this ruling - it's just not a correct interpretation of the law, IMO. I give it a year before it's overturned.

grnidone
01-26-2006, 04:35 PM
It seems that Google has paid off the judge as it really makes no sense that google keeping a copy of someone's web site is legal.

It seems cut and dried to me. I'm utterly shocked at the verdict.

rogerd
01-26-2006, 05:07 PM
>>paid off the judge

It sounds more like a judge unfamiliar with technology combined with a poorly argued case by the plaintiff.

I'm quite surprised Google won. The comparison of what Google does to what an ISP does is particularly bogus, IMO. An ISP may deliver you a recent copy of that page, particularly if they have no reason to believe it has changed. Google gives users the option of seeing a page that may be months old, that may no longer have current content and ads, and may look very different if the site delivers content based on browser capability.

How many site owners have a clue that they can turn off caching with a special tag? How many site owners lack the technical knowledge or the code access to implement the tag? (Think of all the hosted blogs, social profiles, etc., where the only access is by web form.) What about non-HTML documents?

mcanerin
01-26-2006, 07:16 PM
In general, I'm in favor of opt-out for public information, and opt-in for private information.

But the concept of keeping the page in case it's not available is a serious issue, and a strong one against opt-out. What if I remove a page because I lost a lawsuit, or changed my mind, or discovered I had made a typo that completely misrepresented my position, or any number of other things?

What right does a search engine have to then make that information available afterward? I can't think of a stronger indication of a refusal to allow another to use something than deleting it! That lack of existence is a far stronger declaration of NO than a mere noarchive tag, or even a robots.txt disallow.

No means no. Not "well, no unless it's inconvenient or I really want to".

I don't buy the idea that I just need to go to the search engine in question and remove it manually. I already removed it manually. The user should not have to manually go to every search engine on the planet manually deleting a page. Some of them don't even have manual deletion options, or are not in the user's language.

If the cache was truly transient, then its shelf life would be hours, or maybe a day or even a weekend at the outside. But not a month. That's not transient, that's residence.

Next business day would be the maximum I could see as even beginning to approach reasonable.

When a search engine has something in its possession for that length of time, they become caretakers, not carriers. The rules are different for caretakers than carriers. It's that simple.

As long as it's being kept for as long as it is, it should no longer be considered public information (and opt-out) but private, and therefore opt-in.

My opinion,

Ian

randfish
01-26-2006, 07:22 PM
Reading over the arguments for and against, it's totally shocking to me that Google did win this case. I think rogerd must be correct - the judge's incompetence in understanding the material is the only reason I can imagine for them getting this wrong.

This ruling conforms with none of the previous precedents set about copyright law in the US, nor does it match with a new interpretation of an existing law. If this were a non-technical, non-machine-data based case, there would have been no chance of the current ruling. For Google, the judge's ignorance was their bliss.

mcanerin
01-26-2006, 07:25 PM
That judge sure seemed to be quoting Google's brief a lot.

Until I read closer, I actually thought it was Google's arguments being printed, not the actual judgement!

Ian

Wail
01-27-2006, 07:54 AM
Proxy servers make caches of web pages all the time. When I visit SearchEngineWatch our network's web proxy (and even my browser) checks to see whether the page has changed since I was last there. If it has not changed then rather than making SearchEngineWatch's server work again, I'm simply shown a local cache of the page I already have.

Web pages can opt-out of this process.
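As a rough illustration of that opt-out, here is a small Python sketch of the check a shared proxy could make before keeping a copy. It only looks at the standard `Cache-Control` response header and the `no-store` and `private` directives; real proxies weigh many more conditions, and the function names `cache_directives` and `shared_cache_may_store` are made up for this example.

```python
def cache_directives(header_value):
    """Split a Cache-Control header value into lowercase directive names."""
    directives = set()
    for part in header_value.split(","):
        # "max-age=3600" -> "max-age"; bare directives pass through as-is.
        name = part.split("=", 1)[0].strip().lower()
        if name:
            directives.add(name)
    return directives


def shared_cache_may_store(headers):
    """Return False when the response opts out of storage by a shared proxy."""
    value = ""
    for key, val in headers.items():
        # Header names are case-insensitive.
        if key.lower() == "cache-control":
            value = val
    directives = cache_directives(value)
    # "no-store" forbids any cache from keeping a copy;
    # "private" forbids shared caches such as a network proxy.
    return not ({"no-store", "private"} & directives)
```

So a page sent with `Cache-Control: no-store` would be fetched fresh every time, while a page with no such header is fair game for the proxy.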

Danny's point that there's no standard for a search engine cache opt-out is valid but it is worth bearing in mind that this was a Google cache case and not a Search Engine cache case.

The fact that anyone can create a web page does not mean that anyone should. I believe people should take some responsibility. You should not publish something without knowing who has distribution rights or what the front cover might be. I don't believe you can upload a web page and then defend yourself with a "I didn't know that would happen" complaint when the "what happened" was a harmless side effect aspect of your upload.

I don't believe the web will progress if we have to go at the speed of the lowest common denominator all the time. Technology improves - caches, for example - and rules and regulations should try and keep up, not slow things down.

I'm glad Google won this case and I'm actually a bit shocked that so many webmasters here disapprove of the win.

mcanerin
01-27-2006, 12:53 PM
Wail, the problem isn't the cache, so much as it is an archive. Calling it a cache doesn't mean it is one.

The average proxy server has a specific page in it for what? A few hours, maybe a little longer. I've seen some set to 3 days, but that is considered too long by many admins.

I agree completely that if you upload stupid things, then you are acting, well.. stupid. It's not Google's job to protect you from yourself.

Having said that, I've seen "cache" pages that were last indexed YEARS ago. I can't imagine that being considered a cache by any network admin. This one, for example, is from 2004!:

http://72.14.203.104/search?q=cache:UeO4JrnKst4J:www.medicum.net/dic2-ltrC-page107.html+cytozoic&hl=en&ct=clnk&cd=79

That's not a cache, it's an archive.

What if I upload something that I'm perfectly able to upload, say an affiliate page or something. Then the contract expires and I am no longer allowed to advertise the product on the net. Accordingly, I delete my page, or change it to something else.

But, like the living dead, this document rises from its grave to wreak havoc in the form of an archived file. I'm now in breach of contract. A court will give you a "reasonable" length of time to comply. This is usually next business day for websites. Is Google liable for contractual interference? Maybe.

Nightmare Scenario

What if I uploaded an information page on an obscure drug or herbal remedy in good faith and with the most current information? Later, it turns out that this drug has very serious side effects and can be fatal to some people. Or maybe it works fine, but is counter-indicated for certain conditions. Naturally, everyone removed the information about this drug, or updated their pages to reflect the new information.

Except Google. Who, as we saw above, may see fit to keep and display the information for years.

Now someone hears about an obscure drug or herb that might help them with an ailment other drugs don't seem to be helping with, and goes looking for information on it. They have a hard time finding it (since it was deleted) but eventually the information page shows up in cache. Since the information page is from a reputable source, they read and believe it. They live where this herb or drug is available. Maybe it's perfectly legal and functional for some things, but is counter-indicated for this guy's condition. Now he's dead.

There is a lot of information in the world, and the more people turn to search engines to find obscure information that isn't available locally, the more they are going to have to trust the accuracy of that information. At the point where you are presenting outdated medical information as a current result, you've gone too far, IMO.

The rules for a temporary cache are different from the rules for a long-term archive, IMO.

Ian

Wail
01-27-2006, 01:03 PM
Yes, that's a good point. I don't have a copy of the Judge's ruling handy here (just firing off a quick reply) but I do believe the temporary state of the cache was a factor. If the cache is not so temporary then it would be a different matter.

It is possible to have old caches/archives removed by Google via their page removal form.

In your nightmare example (which is scary) the correct and responsible thing to do would be to change your incorrect page to explain the nasty situation, provide correct data and even steps to take if you acted on the wrong information. The cache would then update to reflect that users who returned to the page would also be told about the important update.

It's my understanding that it was the very concept of the cache which was being decided here - not the finer aspects of when a cache becomes an archive or whether there are some situations when a cache would be unwelcome. After all, lawyers aren't illegal and there are plenty of nightmare scenarios involving lawyers! :D

Lenz
01-28-2006, 11:15 AM
and have a couple of long posts on my blog (http://k.lenz.name/LB).

In brief, I don't agree that there is much "transformation" going on here, and I don't buy the idea that the Google cache falls under the safe harbor, or that I give Google any "implied license" by refusing to jump through the hoops they set up for their opt-out.

In contrast, Patry and Lessig applaud on their respective blogs, but with not much meaningful analysis. I agree with Patry though that one can expect this opinion to feature heavily in future litigation.

BradBristol
01-28-2006, 02:08 PM

It is possible to have old caches/archives removed by Google via their page removal form.

This is NOT just about google! This case helped establish case law for ALL search engines.

I have a feeling that this decision by this obviously biased judge will not stand for long, from what I understand an appeal is being prepared already.

Google lucked out this time, a favorable, unknowledgeable judge and a very very weak plaintiff attorney...

PhilC
01-28-2006, 04:58 PM
I'm absolutely staggered that a judge would make such a ruling. I've always been against search engine "caches" for the simple reason that, if people are going to view my pages, I want them in my site, and not in someone else's site. I am certain that it's illegal, regardless of the ruling, and I'm very glad to hear that an appeal is being organised.

Mikkel deMib Svendsen
01-29-2006, 05:09 AM
This is a US ruling, right? I would like to see how other countries' courts may look at this. With Google opening more and more local offices/businesses they may be prosecuted locally for local copyright violations.

In the Newsbooster case (some years ago) a Danish court ruled in regards to crawling that the opt-out option of robots.txt did NOT give Newsbooster the right to crawl news sites. One of the newspapers even had a robots.txt file up, which Newsbooster fully respected - but this newspaper still won the case!

With that in mind, I am pretty sure that a case here in Denmark about the cache would get a similar ruling.

Personally, I definitely think the "cache" (or "archive", as Ian rightly labels it) should be opt-in.

On a funny side-note: I guess with the current US ruling building scraper sites based on Google SERPS is perfectly OK - after all, it's just a cache :) If they don't like it they can just use my own noarchive tag ... hehe

rogerd
01-29-2006, 02:56 PM
>>If the cache was truly transient, then it's shelf life would be hours, or maybe a day or even a weekend at the outside. But not a month.

I think that's a key distinction that the judge didn't understand. The Internet would bog down without caching at various stages, but Google's so-called "cache" is more like an archive.

PhilC
01-29-2006, 06:41 PM
Yes it's an archive, but few people object to search engines archiving their pages. It's an essential part of running a search engine, so it isn't archiving that's the problem - it's displaying the pages in their entirety within Google's own site that's the problem. It can't possibly come within the realms of fair use.

Mikkel deMib Svendsen
01-29-2006, 07:04 PM
> It's an essential part of running a search engine

I don't think so. I have seen many local Google partners over time drop the cache link in SERPS and it leaves the search experience just as good as having it. Also, just because something might be useful doesn't make it either legal or right to do.

PhilC
01-29-2006, 08:27 PM
Yes it is, Mikkel. I'm not talking about the cache use of archived pages - I'm talking about storing them. It's essential for running a search engine.

Mikkel deMib Svendsen
01-29-2006, 08:37 PM
No, you don't need to store the entire page to run a search engine.

PhilC
01-30-2006, 06:07 AM
Perhaps not but it's an essential part of Google's search engine. Where could they get the relevant snippets from if they didn't store the entire page? Yes, they store the location of each of the words on the page, but it would take a lot of computing time to reconstruct the whole page from that data just to produce a suitable snippet.

From the beginning Google showed that they store the pages in the Repository, and I've never seen a word written against it. The only thing that people have objected to is when they display whole pages within their own site. It isn't archiving that people are against, but displaying their pages in Google's site.

Mikkel deMib Svendsen
01-30-2006, 06:13 AM
They only need to store the raw text - not the markup, images etc, to create the snippets. Also, if it turns out that Google have built their engine in such a way that it turns out to be illegal (in some countries) then tuff **** - they just have to redo it in another way. Bad programming is no excuse to break laws! :)
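To show what "raw text is enough for snippets" could look like, here is a minimal Python sketch. It says nothing about how Google really builds snippets; the names `TextOnly`, `raw_text` and `snippet`, and the default window of 30 characters, are assumptions made up for this example.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Keeps only the text between tags and drops the markup itself."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Note: this also keeps script/style text; a real indexer would skip it.
        self.chunks.append(data)


def raw_text(html):
    """Return the page text with tags removed and whitespace collapsed."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def snippet(text, term, width=30):
    """Return the text around the first match of term, or None if absent."""
    pos = text.lower().find(term.lower())
    if pos == -1:
        return None
    start = max(0, pos - width)
    end = min(len(text), pos + len(term) + width)
    return text[start:end]
```

For example, `snippet(raw_text(page), "cytozoic")` needs only the stored text of the page, not the images or layout.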

Wail
01-30-2006, 06:19 AM
Google needs to store the mark up! As we all know, the mark up [<b>test</b>] has a very different search engine significance than [test].

Google needs to know where the text appears on the page, high up or low down, whether it's large text or small print.

PhilC
01-30-2006, 06:20 AM
Perhaps, Mikkel, but there's still nothing wrong with archiving entire pages. The thread was discussing the difference between a cache and an archive, and I'm addressing that point. People are not against search engines holding archives of their pages, and there can't be anything illegal with it, imo. What people are against, and what must be illegal in most countries, is displaying other people's entire pages within a site without their permission.

PhilC
01-30-2006, 06:22 AM
Google needs to store the mark up! As we all know, the mark up [<b>test</b>] has a very different search engine significance than [test].

Google needs to know where the text appears on the page, high up or low down, whether it's large text or small print.
They store that information along with each word, Wail, so it isn't necessary for creating the results.

Mikkel deMib Svendsen
01-30-2006, 07:25 AM
People are not against search engines holding archives of their pages, and there can't be anything illegal with it, imo.

Technically I don't think you are right. In Danish copyright law, as in many other countries, it very specifically says that it is illegal to make any kind of copy of a copyright protected work. And that is basically what search engines do: They copy our work to bring themselves profits. It is legal (here) to use a small quote and link to a site but making a copy of the entire site or pages is definitely not legal (here).

The only reason people have not done anything about this so far is that they feel it's a fair trade - search engines steal our content, but in return send us qualified visitors. However, the archive doesn't bring any value to the site owners and therefore should be a pure opt-in program.

PhilC
01-30-2006, 07:41 AM
I agree completely that the "cache" system should be opt-in - I've been dead against the cache for years, and I'm even more against it because Google uses OUR bandwidth to display OUR pages in THEIR site - notice where the graphics in the cached pages come from. But I'm not against the engines holding copies of my pages, and I don't think that people in general are against it, and I doubt that there are any laws against it that won't be changed to be more in keeping with today's world.

it very specifically says that it is illegal to make any kind of copy of a copyright protected work
What about the copies that we make in our browsers when we are viewing a page - like you are doing right now? They are copies of protected works. What about the archives that we make (our browsers) for viewing later via the History button? In IE pages can be archived for 99 days - over 3 months. I don't think that even in Denmark the law would prevent those copies from being made.

Mikkel deMib Svendsen
01-30-2006, 07:54 AM
Phil, there is nowhere in Danish copyright law where it says that the browser caching, or any other caching is legal. In fact, it basically just says that any copying is illegal, with a few exceptions. But as long as no one has brought these issues to the courts we just won't know for sure ...

One thing, however, that is essential in copying is whether you just do it for your own benefit or if you do it for commercial reasons. The browser cache only benefits me personally. The Google copying is an essential part of them making billions - it's a commercial use of copyrighted work.

Similarly, it's legal to make a back-up copy of a CD you buy here or convert it to MP3 for your MP3 player. However, it is not legal to sell that copy to anyone or put the MP3 file on your website.

Lenz
01-30-2006, 08:46 AM
is of course a reproduction, which is copyright infringement everywhere on the planet.

Therefore, Google and other search engines need either a license (for example a Creative Commons license allowing commercial use) or an exception (like fair use, or personal use in European and Japanese copyright).

The court did not discuss if the original act of entering content into the database is infringement since the plaintiff had not alleged it. I think there is no way to deny that this is a reproduction. The only question is if it falls under fair use or not, and opinions may differ on that one.

PhilC
01-30-2006, 09:25 AM
The Google copying is an essential part of them making billions - it's a commercial use of copyrighted work.
Now it's my turn to question the use of the word "essential" :)

I don't agree that Google's "cache" system makes any money for them at all, let alone that it's an essential part of making money for them. Perhaps I should ask, in what way does Google's storing of pages make money for them? In what way is it commercial? You have already said that it isn't essential for running a search engine. Also, the search engine itself doesn't make money - they make money from ads.

Mikkel deMib Svendsen
01-30-2006, 09:37 AM
If it's not essential they should just drop it :)

PhilC
01-30-2006, 09:53 AM
I agree - about the cache. I've said for years that the cache system is morally and legally wrong, and it always annoys me when I remember that they use MY bandwidth (for graphics) when they display my pages in their site.

I'm not convinced that there aren't any necessary or useful search engine reasons for storing entire pages, though ;)

Mikkel deMib Svendsen
01-31-2006, 11:18 AM
and it always annoys me when I remember that they use MY bandwidth (for graphics) when they display my pages in their site.


Hhhmm, maybe we should play around with some mod_rewrite and serve different images for Google cache hehe - maybe something like: "THIS IS AN ILLEGAL CACHE MADE BY GOOGLE!" - with a bit of added CSS it should be quite easy to make it take up the entire screen hehehe would that be too evil? :D

PhilC
01-31-2006, 11:28 AM
LOL!!! Actually, that's a damned good idea! I like it so much that I might just do it. Usually, the first graphic is top-right, so it would be easy enough to return a mega-huge graphic that fills the screen for all requests that come from Google's cache. It wouldn't even need to be a huge filesize. I love it!!!

I wonder what Google would do to the site.

Mikkel deMib Svendsen
01-31-2006, 11:35 AM
I wonder what Google would do to the site.

Nothing - they wouldn't know unless an editor actually took a look :)

OK, so let's do the script and give it away to anyone that wants it ...

PhilC
01-31-2006, 11:49 AM
Sounds good to me.

I would use an .htaccess intercept, and simply deliver a different graphic, but that's something that can't easily be given away. How would you do it?
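For what it's worth, the decision itself is simple enough to sketch. The Python below is only an illustration of the logic, not a drop-in .htaccess rule: it assumes the browser sends a Referer header when it loads graphics from a cached page, and that the cache URL looks like the one posted earlier in the thread (a `/search` path with a `q=cache:...` parameter). The names `is_cache_view` and `image_to_serve` and the path `/cache-notice.gif` are made up for this example.

```python
from urllib.parse import urlparse, parse_qs


def is_cache_view(referer):
    """Guess whether an image request was triggered by a cached-page view."""
    if not referer:
        return False
    parsed = urlparse(referer)
    if parsed.path != "/search":
        return False
    # Cached views carry a query such as q=cache:<id>:<url>+<terms>.
    queries = parse_qs(parsed.query).get("q", [])
    return any(q.startswith("cache:") for q in queries)


def image_to_serve(requested_path, referer, notice_path="/cache-notice.gif"):
    """Swap in the notice graphic only for requests that come from a cache view."""
    return notice_path if is_cache_view(referer) else requested_path
```

An ordinary search referer such as `http://www.google.com/search?q=cytozoic` does not match, so normal visitors would still get the real graphic.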

Feydakin
02-01-2006, 10:59 PM
The solution while very cool would also have the side effect of doing the same thing to any google image search.. But, I suppose that if you didn't want Google to cache your pages you wouldn't want them coming up in the image search anyway..

PhilC
02-02-2006, 06:47 AM
Personally, I think that their image search is just as bad, if not worse than the cache. Why would anyone do an image search if it isn't to look for suitable images to use on webpages? That's got to be the biggest use of the search. To my way of thinking, Google makes the stealing of images easy for people.

engine
02-02-2006, 06:08 PM
I'm with the opt in proposal. Look at the problems caused by a mistake http://hardware.silicon.com/desktops/0,39024645,39156123,00.htm

If you're not Dell with a big budget and friends in high places it'll be tougher to get the cache removed. The little guy with trade secrets accidentally puts something online and BAM, it's there, he's not going to have the same pulling power as Dell.

One might argue that they shouldn't have put it there in the first place. That may be true.

I wonder how this sits with the Data Protection Act in the UK?

Feydakin
02-02-2006, 06:17 PM
Personally, I think that their image search is just as bad, if not worse than the cache. Why would anyone do an image search if it isn't to look for suitable images to use on webpages? That's got to be the biggest use of the search. To my way of thinking, Google makes the stealing of images easy for people.

Mostly I agree.. Except I recently started using the image search to drive traffic to our site.. It's not much yet, and I'm still working on a better way to convert that inbound person into a real browser of our site, but I am seeing some small amount of traffic coming in from the image search..

Personally I think that caching the site and holding it for weeks, sometimes months, for online viewing is not an acceptable use of our work.. Holding it internally to refine and develop the search result I'm not so against.. Opt in for both the image search and the caching seems only appropriate..

Wail
02-03-2006, 04:58 AM
I see a lot of SEOs speaking out against the cache. I wonder how many of you recommend to clients to use "noarchive" metas?

Mikkel deMib Svendsen
02-03-2006, 05:18 AM
No, I would never recommend the use of the noarchive tag - not even on a cloaked page. Especially not on a cloaked page :)

OptimizeDotNet
03-15-2006, 02:46 PM
Mikkel, I was reading this discussion you were having earlier last month. I wanted to ask why you would never use the "noarchive" tag? I've never used it or experimented with using it, but given the usage outlined in the various search engine description pages...

Q. Can I prevent Teoma/Ask search engine from showing a cached copy of my page?
A: Yes. We obey the "noarchive" meta tag. If you place the following command in your HTML page, we will not provide an archived copy of the document to the user.

Is it an "on moral grounds" thing? I vectored over here from Danny Sullivan's "25 things I hate about Google" article, and I think the discussion was interesting, even if I found myself disagreeing with most people on this issue.

One of the reasons I personally like seeing "cached" pages is for seeing the context of the terms used to return said page. Having the SE highlight the terms, and show which terms are only on "linked" text, and showing visually where these items appear, I've found to be a help. If Google swapped this for a "raw text" version, it would likely introduce the same questionable issues of content control... as the article text would still be available outside of the original page author's explicit permission.

--Also, with the images coming from the original server, I don't view that as Google "using" someone, but a great way to know how someone is viewing your website (given the referer log data this produces). If Google only showed "raw text" or perhaps gave no indication to searchers WHY a certain page comes up in a search, that would be a little disappointing. --Especially if the cache is old, and you end up using the "current" data on the site to measure why they are successful.

Some thoughts.

~ Dudley

Mikkel deMib Svendsen
03-15-2006, 03:06 PM
I wanted to ask why you would never use the "noarchive" tag?

That's an easy one :)
Simple answer is the fact that Google have proven once to remove all pages that had that noarchive tag! Yes, that's what they did only a few months after they launched it. A lot of cloakers had implemented it as a nice way not to have Google show the cloaked crap. I did too. Only I heard the rumor about Google doing this before they did, so I managed to get the noarchive off my pages before Google hit us all.

With that in mind you'll understand that I simply can't afford to trust that tag anymore.

PhilC
03-15-2006, 03:11 PM
I've used it for years without having any problems.

OptimizeDotNet
03-15-2006, 04:51 PM
That's an easy one :)
Simple answer is the fact that Google have proven once to remove all pages that had that noarchive tag! Yes, that's what they did only a few months after they launched it. A lot of cloakers had implemented it as a nice way not to have Google show the cloaked crap. [-SNIP-] With that in mind you'll understand that I simply can't afford to trust that tag anymore.

Hm. --But, am I missing something, or isn't it a serious faux pas to "cloak" if you want pages listed... cloaking being a no-no Google will even delist its own pages over (http://blog.searchenginewatch.com/blog/050309-092708)?

Not that Google will always catch you, but couldn't this be the reason Google drops certain pages from the index and NOT the "noarchive" tag? I have to admit, part of what I was reading mentioned something about NOT being able to single out search engines from caching, but having to tell ALL search engines not to cache... or else one runs into the possibility of Google double-checking for "cloaked" pages... and delisting you.

I'm more interested to hear that Google (or whoever) would treat "noarchive" like "noindex,nofollow" or something... as such, I'm still wondering if this happens.

~ Dudley