#1
Indexing Summit 2: Give Your Feedback On Handling Redirects
Indexing Summit is returning for our next SES show. One of the two big topics will be how search engines handle redirects. I've posted an article, Revisiting Hijacking & Redirects: Moving To A Solution, on the SEW Blog (URL to come shortly) that explains the situation with the W3C rules, what Yahoo does to break some of those for good reason, and what Google does that causes some problems. The goal is to arrive at an overall standard for all the major search engines to use. Your feedback for the summit is really helpful. How would you like to see things work? What unusual situations might come up that require special handling?
Last edited by dannysullivan: 08-01-2005 at 02:08 PM.
#2
Yahoo complies with the rules all right
Just a minor, albeit important, correction:
About the special keyword "SHOULD", this document says:
...which brings me back on topic: by doing something other than what RFC 2616 says a client SHOULD do in certain situations, Yahoo is actually acting in full compliance with both RFCs. They don't break the rules; they don't even bend them. In fact, they act 100% as they're supposed to according to these rules.

BTW: Sorry I can't join you at SES San Jose, Danny. I'd like to, especially with this subject on the agenda, but it's just not possible.
#3
Good point, Claus. I'll go back and clarify; I recall you'd made this point well in your article.
"Recommendations" is the better word. But it's funny, because in talking with Yahoo, they were clearly somewhat uncomfortable going against the recommendations, even though the recommendations themselves recommend that on the odd occasion. Just wanted to use recommendations three times in that! They do feel they are doing the right thing, but there's that sense of somehow going against a standard. That's why I hope they'll all come up with a standard, but one that makes sense for search engines. Sorry you won't be there, but I'd love to make sure any suggestions you have get passed along. I'll go back through your article for them. Chiefly, it seemed to be adopting the Yahoo approach.
#4
Hi everybody,
Wouldn't a solution be for the target domain to acknowledge the 302 redirect in some way, and for search engines to treat the redirect as permanent unless it does? If the initiator of the redirect has control over the target domain, it can acknowledge that the redirect is accepted. This could be done in one of the following ways:
- A list of accepted redirecting domains could be included in robots.txt, and the treatment of 302 redirects could be defined in some future version of the Robots Exclusion Standard, which might become a "Robots Instruction Standard".
- A meta tag could be used to acknowledge the 302 redirects.
The latter looks like the best shot. What do you think?
Hope this may help somehow.
Regards,
Vladimir Granitsky
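The meta-tag variant of this proposal could look roughly like the sketch below. It is only an illustration under stated assumptions: the tag name `accept-redirects-from` is hypothetical (no search engine or standard defines it), its content is assumed to be a space-separated list of host names, and an acknowledged 302 is assumed to mean "keep indexing the source URL" while an unacknowledged one is treated like a 301 and the target URL is indexed.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# HYPOTHETICAL meta tag name used only for this illustration.
ACK_META_NAME = "accept-redirects-from"


class _AckParser(HTMLParser):
    """Collects host names listed in the hypothetical acknowledgment meta tag."""

    def __init__(self) -> None:
        super().__init__()
        self.hosts: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == ACK_META_NAME:
            # Assumed format: space-separated host names.
            self.hosts.update(h.lower() for h in (attr.get("content") or "").split())


def accepted_sources(target_html: str) -> set[str]:
    """Return the (lower-cased) hosts the target page says may redirect to it."""
    parser = _AckParser()
    parser.feed(target_html)
    parser.close()
    return parser.hosts


def indexed_url_for_302(source_url: str, target_url: str, target_html: str) -> str:
    """Acknowledged 302: honour it as temporary and keep the source URL.
    Unacknowledged 302: treat it as permanent and index the target URL."""
    source_host = (urlsplit(source_url).hostname or "").lower()
    if source_host in accepted_sources(target_html):
        return source_url
    return target_url
```

With this logic, a third party that 302s to someone else's page without being listed simply ends up pointing at the target URL, which is what the proposal is meant to achieve.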
#5
I first saw this problem years ago, in 2001 or 2002 I think. I looked into a problem someone was having on iSearch, and this Google 302 "pagejacking" turned out to be the cause. I reported it to Google with a proposed solution (virtually identical to Yahoo's current solution) and, for a while, they fixed it. Then they broke it again. Over the intervening years they have fixed it and broken it a few times. I don't know why... they had it working once!
I think the problem is that some engineer has become too tied up in the meaning of "Temporary Redirect", and is following the W3C guidelines a little too closely for their own good. The simplest solution IMO is as follows:
#6
I disagree with the points above about only handling redirects if they are within the same domain.
There are many legitimate reasons to redirect from one domain to another, such as branding. For example, I work with clients who spend years trying to buy their branded domain from a squatter, so they register another one in the meantime. So when it comes time to move to the new domain they've always wanted, why should they be penalized for performing legitimate 301s from the old domain to the branded one? Personally, I like the idea (however cumbersome) of acknowledging the redirect via a meta tag. That is something that could be explored, I think.
#7
All redirects should be handled. It's just a matter of which URL a piece of content is indexed under. I'm suggesting that a piece of content should always be indexed under the URL that directly references that content, without a redirect. I'm suggesting that the only exception should be when a "home page" on the same site redirects to that piece of content; in that case the content should be indexed under the home page URL rather than the URL that actually addresses it.
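The rule above is small enough to state as a function. This is a minimal sketch, not any engine's implementation, assuming that "same site" means the same host name and that a "home page" is a URL whose path is empty or "/"; `indexed_url` and `is_root_url` are hypothetical names. The test cases are the siteA/siteB examples used later in the thread.

```python
from urllib.parse import urlsplit


def is_root_url(url: str) -> bool:
    """True for a domain or subdomain root such as http://www.siteA.com/."""
    return urlsplit(url).path in ("", "/")


def indexed_url(redirect_url: str, target_url: str) -> str:
    """Index content under the URL that serves it directly, except when the
    root ("home page") URL of the same host redirects to it."""
    # urlsplit().hostname is lower-cased, so host comparison is case-insensitive.
    same_host = urlsplit(redirect_url).hostname == urlsplit(target_url).hostname
    if is_root_url(redirect_url) and same_host:
        return redirect_url
    return target_url
```

The only branch that keeps the redirecting URL is the same-host home page case; every other redirect is indexed under the page that actually holds the content.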
#8
I guess my point is: if you 301 a root URL to another URL for legitimate reasons, the way I read your post, Alan, was that the engine should not give credit for the redirect and should instead treat it as a link?
Or am I still misunderstanding something? Don't get me wrong, I like the idea of tackling this whole hijacking issue, but my concern is for those legitimate sites that always seem to get swept out with the trash when the engines perform a major update.
#9
What's the difference, as you see it? A redirect from "http://www.siteA.com/" to "http://www.siteB.com/" should mean that the content of "http://www.siteB.com/" is indexed under the URL "http://www.siteB.com/". All links to "http://www.siteA.com/" should be treated as links to "http://www.siteB.com/". I think this is what you want. A redirect from "http://www.siteA.com/" to "http://www.siteA.com/subpage.htm" should mean that the content of "http://www.siteA.com/subpage.htm" is indexed under the URL "http://www.siteA.com/".
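The "all links to siteA count as links to siteB" part of the first case can be pictured as a link-counting step. A minimal sketch, assuming the engine has already decided which URL each redirecting URL's content is indexed under and keeps that decision in a plain dictionary; `credited_link_counts` and the dictionary layout are hypothetical.

```python
from collections import Counter
from typing import Iterable


def credited_link_counts(
    links: Iterable[tuple[str, str]], indexed_as: dict[str, str]
) -> Counter:
    """Count inbound links per URL, crediting a link that points at a
    redirecting URL to the URL its content is indexed under.

    links:      (linking_page, linked_url) pairs
    indexed_as: redirecting URL -> URL its content is indexed under;
                URLs not in the dict are credited unchanged.
    """
    counts: Counter = Counter()
    for _linking_page, linked_url in links:
        counts[indexed_as.get(linked_url, linked_url)] += 1
    return counts
```

With `indexed_as = {"http://www.siteA.com/": "http://www.siteB.com/"}`, two links to siteA plus one link to siteB all count toward siteB, so a legitimate 301 does not lose the old domain's links.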
#10
Yes, that is what I was thinking...
Sometimes, you know, it's just easier talking to get a message across.
#11
No "accept redirects from" header, please
At first glance it may seem like a great thing, because then somebody is in control. But that is a false impression, as "somebody" is always in control: the sender of the traffic. And that's the right person.
I fully understand that the people at Yahoo (or Google, or anyone) have respect for this system. Some really bright people thought about all these things (and more) in the early days of the internet, and they knew so much more than we ever will, because they had the full picture back when the web was relatively small and simple. It's really a complete system of interconnected rules and recommendations, so if you change the way you do one thing, you risk affecting another that you didn't even know about.

An "Accept redirects from" header would work directly against some of the useful things you would really use a 302 for (and for which it was intended in the first place). Specifically, with a 301 or a 302, control always sits on the sending end, never on the receiving end. The sender is in full control, and it was always intended that way: only the sender should be able to control where his or her traffic goes.

One example of such a useful thing is a "coral cache". This is something you would use if one of your pages suddenly became hugely popular and your server had trouble keeping up with the demand. A network of a few hundred servers then takes over and mirrors a copy of that page, so that everybody can access it. The way this works is that you simply issue a 302 to a special URL that instructs these cache servers to deliver a copy of your page. When the peak load is over, you just remove the 302. And here's the point: the receiving servers never know which pages will redirect to them, or when; it's all automatic. If they had to authorize pages linking in, such systems would never work. It's situations like this that 302s were intended for, way back when. Link-counting scripts came much later, but these should really use the 302 code as well (just as most of them do), because links do change. (In all respect, the "Cool URIs Don't Change" piece never was more than a pipe dream.
Cool URIs have change management: 301s and 302s.)

So the error is not with the sender (the issuer of the 302), and it's not with the receiver either. Neither the redirecting site nor the receiving one is broken, or doing anything wrong, in any way. The error lies only with the client, which is the search engine (except Yahoo, of course). That also means this can only be fixed on the client side (the SE) because, frankly speaking, the client is broken.

Yes, I totally agree with the Yahoo approach. The first time I heard about it I was actually against it, as they do treat a 302 as if it were a 301 (and there is an important difference), but it works just right in this context. So it "sounds wrong" but "works right", and I'll prefer that any day to something that sounds right but doesn't work. That's also the general idea behind those recommendations: if something else works better, all things considered, you should do that instead. It might be that they found this out only by accident, because they focused on doing something quick, but then they instinctively did exactly the right thing.

Alan Perkins' suggestion #4 also makes a lot of sense to me: treat those status codes exactly like any other link. Don't make any difference at all. Take only the URL that the content is found on, no more and no less, and rely on the next spidering round to change the link if the page is now somewhere else. Of course, they then lose some information, but that information could be stored for internal use anyway. Not everything the SEs have is what we get to see in the SERPs anyway.
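The coral-cache scenario can be sketched with Python's standard `http.server` to show that everything happens on the sending side. Assumptions, stated plainly: `mirror.example.net` is a hypothetical stand-in for the cache network (the real Coral URL format is not reproduced here), and the `overloaded` flag is a hypothetical switch the site owner flips during the peak.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# HYPOTHETICAL mirror location standing in for a cache network.
MIRROR_BASE = "http://mirror.example.net"


class PeakLoadHandler(BaseHTTPRequestHandler):
    # Set to True while the page is overloaded; clearing it "removes the 302".
    overloaded = False

    def do_GET(self):
        if type(self).overloaded:
            # 302: the mirror copy is temporary; the original URL stays the real one.
            self.send_response(302)
            self.send_header("Location", MIRROR_BASE + self.path)
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        body = b"<html><body>Original page</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet
```

Run it with `ThreadingHTTPServer(("", 8000), PeakLoadHandler).serve_forever()` and set `PeakLoadHandler.overloaded = True` while the load lasts. The mirror takes no part in the decision and has nothing to authorize, which is the point about control sitting with the sender.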
#12
I'd prefer that URLs that return a redirect were not indexed (unless they are the root URL of a domain or subdomain). Instead, the URLs they redirect to should be indexed. I do think that, in practical terms, a redirect is a redirect whether it's a 301, a 302, a 307, an HTTP Refresh, a Meta Refresh or a client-side redirect (e.g. JavaScript or Flash). Even a one-frame frameset may be considered a redirect. Taking this wide definition of "redirect" and applying the logic I posted earlier gives the following sample results:
+----------------------------+----------------------------+----------------------------+
| Redirect Page              | Target Page                | Indexed URL                |
|----------------------------+----------------------------+----------------------------|
| www.siteA.com              | www.siteB.com              | www.siteB.com              |
| www.siteA.com              | www.siteA.com/subpage.htm  | www.siteA.com              |
| www.siteA.com              | www.siteB.com/subpage.htm  | www.siteB.com/subpage.htm  |
| www.siteA.com/subpage1.htm | www.siteA.com/subpage2.htm | www.siteA.com/subpage2.htm |
| www.siteA.com/subpage1.htm | www.siteB.com/subpage2.htm | www.siteB.com/subpage2.htm |
+----------------------------+----------------------------+----------------------------+
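Under that wide definition, a crawler first has to recognise a redirect before it can decide which URL to index. A minimal sketch with Python's standard `html.parser`, assuming a response is available as a status code, a header dict and an HTML body; `redirect_target` is a hypothetical helper. It covers 301/302/307 with a Location header, an HTTP Refresh header, a meta refresh, and a one-frame frameset. JavaScript and Flash redirects are deliberately left out, since detecting them would mean executing the page.

```python
from html.parser import HTMLParser
from typing import Optional

# Status codes counted as redirects in the post (301, 302, 307).
REDIRECT_STATUSES = {301, 302, 307}


class _RedirectHTMLParser(HTMLParser):
    """Records a meta refresh value and every <frame src> in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.meta_refresh: Optional[str] = None
        self.frame_srcs: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("http-equiv") or "").lower() == "refresh":
            self.meta_refresh = attr.get("content") or ""
        elif tag == "frame":
            self.frame_srcs.append(attr.get("src") or "")


def _refresh_target(value: str) -> Optional[str]:
    """Extract the URL from a refresh value such as '0; url=http://x/'."""
    for part in value.split(";")[1:]:
        key, sep, url = part.strip().partition("=")
        if sep and key.strip().lower() == "url":
            return url.strip().strip("'\"") or None
    return None


def redirect_target(status: int, headers: dict, html: str) -> Optional[str]:
    """Return the redirect target if the response counts as a redirect, else None."""
    lower = {k.lower(): v for k, v in headers.items()}
    if status in REDIRECT_STATUSES:
        return lower.get("location")
    target = _refresh_target(lower.get("refresh", ""))
    if target:
        return target
    parser = _RedirectHTMLParser()
    parser.feed(html)
    parser.close()
    if parser.meta_refresh:
        target = _refresh_target(parser.meta_refresh)
        if target:
            return target
    # A frameset with exactly one frame is treated as a redirect to that frame.
    if len(parser.frame_srcs) == 1 and parser.frame_srcs[0]:
        return parser.frame_srcs[0]
    return None
```

Once every form is reduced to a single target URL, the table's indexing rule can be applied to all of them in the same way.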