Search Engine Watch
SEO News

Search Engine Watch Forums > Search Engines & Directories > Yahoo! > Other Yahoo! Issues
02-04-2005   #1
mcanerin
Join Date: Jun 2004
Location: Calgary, Alberta, Canada
Posts: 1,564
Rackspace hates Yahoo? [Update - Bogons!]

Well this is an interesting one.

I have a client. He has a website. He ranks very well on all search engines.

But his site is a bit slow, and due to his increased volume he needs to move to a better, faster server. In late December, we move the site to the new server (leaving a copy on the old server to deal with DNS propagation lag).

The new server has brand-spanking-new, never-before-used IP addresses: 72.3.155.1 and 72.3.155.2, specifically.

We wait (watching the logs) until all of the spiders and visitors are resolving to the new site, then prepare to remove the old site. No one is visiting anymore, so everything has been switched and there is no possibility of a duplication issue.

All nice and clear and best practice, right? I assure you that the site itself is clean and has no spam issues.

One problem. His Yahoo rankings tank. Also, his Yahoo Paid Inclusion stops working. We check and find that the spiders are reporting errors. An 8006 error, to be exact.

But only Yahoo's spiders. Everyone else is doing fine. BTW, the Yahoo spiders ARE visiting occasionally, so they are not lost, AFAIK.

We call, and Position Tech and Yahoo blame robots.txt in what appears to be a form letter. The robots.txt is fine - basically says all spiders are allowed everywhere. 2 lines. After a few days of this, I remove the robots.txt file. No change, no Yahoo spiders. All other spiders are fine.
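A quick way to confirm that a permissive two-line robots.txt really does admit Yahoo's crawler is Python's standard urllib.robotparser module. This is a minimal sketch: the file contents are an assumption (the post only says the file had two lines allowing all spiders everywhere), and "Slurp" is assumed as Yahoo's user-agent token.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents of the two-line, allow-everything robots.txt described above.
ROBOTS_TXT = "User-agent: *\nDisallow:\n"


def allows(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines; an empty Disallow means "allow all".
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Slurp", "Googlebot", "msnbot"):
        print(agent, allows(ROBOTS_TXT, agent, "http://www.example.com/page.html"))
```

If this reports True for Slurp, robots.txt can be ruled out and attention can move to the network path.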

So then we get really annoyed and start bothering PT and Yahoo. Now, they are saying that Rackspace is blocking Yahoo internally:

From Position Tech:

Quote:
A very skilled internal technology team has examined your problem situation in great detail. You are using a Web Hosting service (Rackspace) that is blocking the Yahoo! crawler. It appears that the blocking is occurring within the Hosting service's security system.

There is nothing that we can do to resolve this issue. You must get resolution with your Web Hosting service (Rackspace). If you do not get a satisfactory response, then we suggest that they are not operating in their customer's best interest and you may want to consider other hosting services.

The Yahoo! crawler will continue to attempt to process your web pages on a regular basis. When the crawler is "granted" access to your web pages, then your Site Match service will be restored accordingly.

I'm still waiting for Rackspace to get back to me. Anyone else having issues with Rackspace and Yahoo?

At this point, I can only see 2 possibilities:

1. Rackspace's fault: they are blocking Yahoo's spiders. But you'd think other Rackspace clients would have noticed - they have a LOT of them. Maybe an isolated issue?

2. Yahoo's fault: the IPs are very new - I'm wondering if they have been blocked off (or never opened up) within the bowels of Yahoo.

Naturally, I expect each one to blame the other.

Just a word of warning. I'll keep you posted, assuming either one actually cares enough to do something about it.

My question is, has this happened to anyone else? What was the cause and solution?

Ian
__________________
International SEO

Last edited by mcanerin : 02-24-2005 at 11:29 AM.
02-04-2005   #2
TheotherTim
Official Y! Rep
Join Date: Jun 2004
Location: Palo Alto, CA
Posts: 27
I just talked to someone in support, and they say that this sort of thing happens quite often. A customer of a hosting company like Rackspace may be getting hit by our crawler without realizing who it is, so they report the IP to Rackspace (or another provider), which then blocks the IP at the router level. Other search engines would still have access, but we would not.
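The router-level block described above amounts to a source-address deny rule: packets from the listed crawler addresses are dropped, while every other visitor, including other search engines' spiders, gets through. A minimal model of such a rule using Python's standard ipaddress module; the prefixes are hypothetical documentation ranges, not real crawler addresses.

```python
import ipaddress

# Hypothetical deny rule a provider might add after a customer reports an
# unknown crawler. 192.0.2.0/24 is a documentation range (RFC 5737).
DENY_SOURCES = [ipaddress.ip_network("192.0.2.0/24")]


def router_permits(src_ip: str) -> bool:
    """Return False if the packet's source address matches any deny rule."""
    addr = ipaddress.ip_address(src_ip)
    return not any(addr in net for net in DENY_SOURCES)


if __name__ == "__main__":
    # Hypothetical source addresses for two different crawlers.
    print("crawler A", router_permits("192.0.2.17"))    # False: dropped
    print("crawler B", router_permits("198.51.100.5"))  # True: allowed
```

Because a rule like this keys on the source address, the site looks healthy from the owner's workstation and to every other spider, which matches the symptoms in the first post.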

I am not quite sure why you are blaming Yahoo! It looks like you have received very good service from us as we have identified the problem and notified you on how to fix it. Please send the domain to me via forum mail if you want us to look into it further.
Tim
02-04-2005   #3
Hank Cowdog
Member
Join Date: Feb 2005
Location: Protohell (aka Texas)
Posts: 5
The BOGONs are to blame

If this were indeed the issue, then a traceroute to and from the Yahoo! spider's server and the customer's server would isolate which specific hop in the routing chain is to blame. The implication here is that the Yahoo! spider's IP address is triggering the problem.

I think the problem is more likely the fact that the customer server's IP address has only recently been allocated, and may still be on one or more routers' BOGON list.

Possibility 3: A router between some or all of Yahoo!'s spiders and the server is blocking the traffic in the mistaken belief that the brand-new IP address is still unallocated (and hence not valid).

If you look at the BOGON allocation list, it shows that 94% of the IP addresses in the 72.X.X.X range are NOT currently allocated. If the IP in question was really just recently allocated, there may be some routers out there that are still blocking this traffic.
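A bogon filter is essentially a list of prefixes whose addresses are treated as invalid. The sketch below, using Python's ipaddress module, shows how a stale list that still marks all of 72.0.0.0/8 as unallocated would catch the client's new addresses. The list contents are an assumption for illustration (10.0.0.0/8 and 192.168.0.0/16 are private ranges that do belong on such lists); real bogon lists are longer and change over time.

```python
import ipaddress

# Hypothetical, out-of-date bogon list: it still treats 72.0.0.0/8 as unallocated.
STALE_BOGONS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("72.0.0.0/8"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_bogon(ip: str, bogons) -> bool:
    """Return True if ip falls inside any prefix on the bogon list."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in bogons)


if __name__ == "__main__":
    for server_ip in ("72.3.155.1", "72.3.155.2"):
        verdict = "dropped" if is_bogon(server_ip, STALE_BOGONS) else "passes"
        print(server_ip, verdict)
```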

Last edited by Hank Cowdog : 02-04-2005 at 03:08 PM.
02-04-2005   #4
TheotherTim
Official Y! Rep
Join Date: Jun 2004
Location: Palo Alto, CA
Posts: 27
Ian,
We did some further testing and our crawler IPs are in fact blocked from Rackspace.
Tim
02-04-2005   #5
mcanerin
Join Date: Jun 2004
Location: Calgary, Alberta, Canada
Posts: 1,564
Excellent!

Well, the situation isn't, but the response from Yahoo is - thank you for your time and effort to check this.

My thinking on whether Yahoo may be the issue was very much in line with what Hank Cowdog was saying, so getting an actual confirmation from Yahoo was, IMO, a needed step in a troubleshooting checklist. Nothing personal.

So far, PositionTech and Yahoo agree that the spiders are being blocked, and that is good information.

I'm focused on Rackspace now. As expected, their initial response was that they have lots of clients that are not affected and it is probably Yahoo's fault, though they did acknowledge it could be an "isolated incident". I'll point this thread out to them - I'm inclined to believe Yahoo and PT at this point.

Thanks for the quick response, TheotherTim! For the record, I HAVE received excellent service from Yahoo for myself and my clients for many years now.

If I could ask, is there a simple method or checklist an SEO or webmaster could use to isolate this issue and determine whether or not it is the ISP?

I.e., how can I tell if it's the spider IP being blocked or simply an unspiderable site, etc.? I know this particular site is OK, but I'm wondering how many people out there are being blocked and just thinking they need to "optimize better".

Since my own workstations' IPs are not blocked, I'm not sure how to do it from my end, and since most webhosts will immediately deny any fault (being human nature), it would be nice to go to them with some evidence.
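One rough check that needs nothing more than the server's own logs is to compare when each search engine's spider was last seen, for example before and after an IP change. A minimal Python sketch, assuming Apache's "combined" log format and assuming the user-agent substrings below identify each crawler; the sample lines are made up.

```python
import re
from datetime import datetime

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)
# Assumed User-Agent substrings for each crawler.
CRAWLERS = {"Yahoo": "Slurp", "Google": "Googlebot", "MSN": "msnbot"}

SAMPLE = [  # made-up log lines using documentation IP addresses
    '192.0.2.10 - - [20/Dec/2004:09:00:00 -0700] "GET / HTTP/1.0" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Yahoo! Slurp)"',
    '198.51.100.20 - - [28/Dec/2004:10:00:00 -0700] "GET / HTTP/1.1" 200 5120 "-" '
    '"Googlebot/2.1"',
]


def last_seen(lines):
    """Map crawler name -> most recent timestamp it appears in the log."""
    seen = {}
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines in other formats
        when = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        for name, token in CRAWLERS.items():
            if token in m.group("agent") and (name not in seen or when > seen[name]):
                seen[name] = when
    return seen


if __name__ == "__main__":
    for name, when in sorted(last_seen(SAMPLE).items()):
        print(f"{name:6} last seen {when:%Y-%m-%d %H:%M}")
```

A crawler whose last visit predates the move while the others keep coming is a hint that something on the path is blocking that one crawler, though it is not proof of where.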

Thanks again for looking into this,

Ian

Last edited by mcanerin : 02-04-2005 at 03:58 PM. Reason: spelink errers.. ;)
02-04-2005   #6
I, Brian
Whitehat on...Whitehat off...Whitehat on...Whitehat off...
Join Date: Jun 2004
Location: Scotland
Posts: 940
I have a client on Rackspace UK who is fine for UK rankings - I figure you're talking about Rackspace in the US?

Also, is there a Yahoo! address general webmasters can contact about such concerns? I have a clean non-commercial reference site that Yahoo! doesn't recognise, despite recognising 20,000 links to it. Possible redirects issue.
02-04-2005   #7
Hank Cowdog
Member
Join Date: Feb 2005
Location: Protohell (aka Texas)
Posts: 5
I think the fact that Yahoo! cannot reach the RackSpace server is clear. The question is: where in the path of many hops between the Yahoo! spider's machine and the RackSpace customer's machine does the comm break down?

Could be just outside of Yahoo!'s server's location.
Could be inside RackSpace's Network.
Could be anywhere in between.

A traceroute from the Yahoo! spider's machine (or at least from somewhere in that address space) and a traceroute from the customer's machine _back_ to that same Yahoo! machine would quickly isolate which router is performing the blocking and whether the problem is within either network, or somewhere in the Internet in between.
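Comparing the two traceroutes described above comes down to finding the last hop that answered in each direction. A small Python helper that does this for typical Unix traceroute text output; the sample traces and host names are hypothetical, and routers that filter or rate-limit ICMP can make a silent hop misleading.

```python
def last_responding_hop(traceroute_text: str):
    """Return (hop_number, line) for the last hop that answered, or None."""
    last = None
    for raw in traceroute_text.splitlines():
        parts = raw.split()
        if not parts or not parts[0].isdigit():
            continue  # skip the "traceroute to ..." banner and blank lines
        if all(p == "*" for p in parts[1:]):
            continue  # every probe to this hop timed out
        last = (int(parts[0]), raw.strip())
    return last


# Hypothetical traces run from each end of the path.
FORWARD = """traceroute to server.host.example (203.0.113.10), 30 hops max
 1  gw.crawler.example (198.51.100.1)  1.0 ms  1.1 ms  1.0 ms
 2  transit-a.example (192.0.2.1)  10.0 ms  10.1 ms  10.2 ms
 3  * * *
 4  * * *
"""
REVERSE = """traceroute to crawler.example (198.51.100.50), 30 hops max
 1  gw.host.example (203.0.113.1)  0.5 ms  0.6 ms  0.5 ms
 2  transit-b.example (192.0.2.9)  8.0 ms  8.1 ms  8.0 ms
 3  * * *
"""

if __name__ == "__main__":
    print("forward stops after:", last_responding_hop(FORWARD))
    print("reverse stops after:", last_responding_hop(REVERSE))
```

If both traces die at or just past the same network in the middle, that network is a reasonable first place to ask about filtering.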

I am reminded of the Microsoft Support Tech at the shooting range: He keeps shooting at the target, but the spotter keeps telling him he missed the target. Finally, the tech puts his finger in front of the barrel and pulls the trigger, with obvious results. His response? "Things are fine leaving here, the problem must be on the other end."
02-04-2005   #8
netguy1970
Posts: n/a
who is to blame

if this is indeed a problem with rackspace, then there are two possibilities:

1. rackspace has intentionally decided to screw their customers, and filter yahoo spiders, thus dropping their customers sites from search results, thus causing their customers to go out of business, thus ultimately causing rackspace to go out of business. or...

2. some sort of automated system has blocked yahoo, and the network engineers with rackspace are too stupid to figure it out.

now, obviously option number 1 doesn't make any sense, so we have to look at option 2. speaking with rackspace support, they have assured me that they do not run any network-wide ids systems that would block yahoo. this makes sense since rackspace actually sells dedicated ids systems to their customers. also, rackspace has a very, very diverse client base that pushes god-knows-what kind of traffic patterns and butt-tons of bandwidth. this type of environment would not be conducive to a system-wide ids; they would be seeing all sorts of connectivity issues on a daily basis if they tried to implement an ids in that environment.

so now we have to look at the possibility that filtering is taking place either at yahoo or at one of yahoo's network providers. looking at the affected ip space, you'll see that it was newly assigned as of october. this means that it is highly subject to bogon filters. i have actually dealt with this type of filtering in the past, and believe me, it can be a real pain. pinpointing the device that is responsible for the filtering can be tricky.

let's look at it from the perspective of an entry-level yahoo tech. he does a traceroute to rackspace from his desk that comes out something like this:

yahoo hop 1           1ms  1ms  1ms
yahoo hop 2           3ms  3ms  3ms
sbc hop 3            10ms 10ms 10ms
sbc hop 4            15ms 15ms 15ms
sbc-rackspace hop 5  20ms 20ms 20ms
* * *
* * *

now from his perspective the traceroute is dying within the rackspace network, so he assumes that rackspace is filtering him. what he doesn't know is that his own network is employing a bogon filter that happens to include rackspace's newly obtained ip space. this means that as soon as his traceroute reaches rackspace, the return traceroute traffic is coming from an ip that is being filtered by yahoo's bogon filter, so the responses never actually reach him. depending on how well he understands this problem, and how much he digs into it, he may quickly lay the blame on rackspace.

unfortunately, from rackspace's perspective, if they do a traceroute to yahoo they will see similar problems. as soon as they reach the yahoo network, the bogon filter will cause their traceroute to time out. they would thus blame yahoo for the filtering... and back and forth we go, one company blaming the other until the end of time.

so who is the more likely culprit here, yahoo or rackspace? if i were a gambling man, i would go with yahoo. here is my reasoning: who has more to lose by yahoo spiders being blocked, rackspace or yahoo? yahoo doesn't necessarily gain or lose anything by a few websites not making their search listings. rackspace, on the other hand, has plenty to lose if their customers begin leaving them due to this problem. a problem like this might make it to a level 1 tech at yahoo, who very likely doesn't fully understand the problem. however, this sort of issue would get pushed to the highest levels at rackspace, very likely to someone with a good understanding of network engineering. if i had to take the word of an entry-level yahoo support tech or a high-level network engineer at rackspace, i would go with the engineer. in addition to this is the plain and clear fact that we are dealing with newly assigned ip space, ip space that is quite likely within a great many folks' bogon filters. it makes much more sense that yahoo is blocking rackspace's new ip space than that rackspace is blocking a well-known entity such as yahoo.
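The key point above is that the filter sits on the return path: the probes go out fine, but replies from routers numbered in the new address space are dropped on the way back, so those hops show up as timeouts. A toy Python model of that effect; the hop names, the router address 72.3.128.1, and the filter contents are all hypothetical.

```python
import ipaddress

# Hypothetical stale bogon filter on the probing (crawler-side) network.
PROBER_BOGONS = [ipaddress.ip_network("72.0.0.0/8")]

# Hypothetical forward path: (hop name, address the hop replies from).
PATH_TO_HOST = [
    ("crawler-side hop 1", "198.51.100.1"),
    ("crawler-side hop 2", "198.51.100.2"),
    ("transit hop 3", "192.0.2.3"),
    ("host-side router 4", "72.3.128.1"),
    ("target server 5", "72.3.155.1"),
]


def visible_hops(path, bogons):
    """Render the traceroute the prober would see: a reply whose source
    address is on the prober's bogon list never arrives, so that hop
    prints as '* * *' even though the probe itself got through."""
    lines = []
    for hop_no, (name, ip) in enumerate(path, start=1):
        addr = ipaddress.ip_address(ip)
        if any(addr in net for net in bogons):
            lines.append(f"{hop_no} * * *")
        else:
            lines.append(f"{hop_no} {name} ({ip})")
    return lines


if __name__ == "__main__":
    print("\n".join(visible_hops(PATH_TO_HOST, PROBER_BOGONS)))
```

With an empty filter list every hop answers, which is why the same trace can look clean from any network whose filters are up to date.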
02-05-2005   #9
wtsandman
Newbie
Join Date: Feb 2005
Posts: 4
Hopefully we can get this all straightened out.. I'm the head of networking at Rackspace, and I will make sure this gets fixed regardless of where the problem lies.

Netguy1970.. I thought I was going to be annoyed by your post after reading the first paragraph; however, thank you.. you made just about every point I would have wanted to.

The datacenter (in Dallas/Grapevine, TX) that has the IP block 72.3.128.0/19 is brand new, along with the IPs. ARIN allocated them to Rackspace in Sept 2004; however, you would be amazed at the number of providers/ISPs/businesses out there that are still blocking them, largely due to bogon lists that don't get updated. I have easily contacted over 50 providers since Oct. 2004 about getting this filtering removed or adjusted for this IP block, so it wouldn't surprise me if that were the case here as well. As Netguy1970 mentioned, Rackspace doesn't filter anything that would affect customer connectivity: we don't run an internal IDS, and we allow complete access to any ports, protocols, or IPs for our customers and their clients.
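As a quick sanity check with Python's ipaddress module, the client's two new addresses from the first post do fall inside the 72.3.128.0/19 block mentioned above, which spans 8,192 addresses.

```python
import ipaddress

BLOCK = ipaddress.ip_network("72.3.128.0/19")
CLIENT_IPS = ["72.3.155.1", "72.3.155.2"]

if __name__ == "__main__":
    print(BLOCK.num_addresses)                             # 8192
    print(BLOCK.network_address, BLOCK.broadcast_address)  # 72.3.128.0 72.3.159.255
    for ip in CLIENT_IPS:
        print(ip, ipaddress.ip_address(ip) in BLOCK)       # True for both
```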

TheotherTim and Mcanerin, I would be happy to work with both of you to get this resolved. Finding out where (which IP space) the spiders are coming from, and getting a traceroute from that location, will be key in resolving this problem. We do have a number of customers online in this new data center already and haven't heard of any other problems regarding this yet; however, this certainly could be an isolated incident.

Feel free to contact me with as much information about the spiders as possible, and we will certainly get this resolved.

Thank you,
Tom Sands tsands@rackspace.com
Chief Network Engineer
Rackspace Managed Hosting
02-05-2005   #10
mcanerin
Join Date: Jun 2004
Location: Calgary, Alberta, Canada
Posts: 1,564
Thank you Tom - the response is much appreciated.

It's becoming clearer that this could be far from a simple issue to resolve, and that it could happen due to a number of possible factors.

I'd be happy to act as a go-between to help resolve this (especially since my client is one of the ones affected!) and will be in touch with everyone concerned.

Ironically - it might even be a completely different party at fault - which means that this may not be a Rackspace/Yahoo issue specifically - they may have just been the first victims discovered. I'd personally like to know how many other ISPs are using really new IPs, and how they are faring...

Regardless of the reason it happened, I think it's clear that it's in everyone's best interest that this get taken care of quickly - not just for the immediate client but for others who may be unknowingly affected as well.

Thanks again - also, for the record, my client is a huge fan of Rackspace, has been hosted with you for years with no problems, and has specifically requested that I try to resolve this issue without considering changing hosts.

Additionally, when I phoned I was answered quickly and professionally, and my client was also contacted quickly with an offer to do whatever it took to resolve this to his satisfaction - kudos on the service!

Hank Cowdog - thanks for the insight! If this is indeed what's going on (and we are still checking), then we could be seeing the beginning of a serious problem as more and more people start websites and more and more IPs are put into play.

SEOs especially are known to prefer dedicated to shared IPs and may be accelerating the issue. I'd hate to think what issues there might be when IPv6 is brought in...

Now, all we need to do is actually *solve* this...

Ian

Last edited by mcanerin : 02-05-2005 at 03:14 PM.
02-06-2005   #11
Hank Cowdog
Member
Join Date: Feb 2005
Location: Protohell (aka Texas)
Posts: 5
Tom,

Below is a list of the Yahoo! Slurp domain names culled from our access_logs over the last month. No guarantee that these are the ones creating the problem, but it is a start. Since mcanerin is seeing _no_ Yahoo! Slurp traffic since the move, I suspect that they all have the problem.
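Pulling the Slurp host names out of an access log can be scripted. A minimal Python sketch, assuming the combined log format with Apache's HostnameLookups enabled (so the first field is a host name) and assuming Yahoo's crawler identifies itself with "Slurp" in the User-Agent; the sample lines and host names are made up.

```python
import re

# First field = client host; last quoted field = User-Agent (combined log format).
LINE_RE = re.compile(r'^(?P<host>\S+) .*"(?P<agent>[^"]*)"\s*$')

SAMPLE = [  # hypothetical log lines
    'b.crawler.example - - [02/Dec/2004:01:00:00 -0700] "GET / HTTP/1.0" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Yahoo! Slurp)"',
    'a.crawler.example - - [03/Dec/2004:01:00:00 -0700] "GET /a.html HTTP/1.0" 200 812 "-" '
    '"Mozilla/5.0 (compatible; Yahoo! Slurp)"',
    'b.crawler.example - - [04/Dec/2004:01:00:00 -0700] "GET /b.html HTTP/1.0" 200 812 "-" '
    '"Mozilla/5.0 (compatible; Yahoo! Slurp)"',
    'bot.other.example - - [04/Dec/2004:02:00:00 -0700] "GET / HTTP/1.1" 200 5120 "-" '
    '"Googlebot/2.1"',
]


def slurp_hosts(lines):
    """Return the sorted, de-duplicated client hosts whose User-Agent mentions Slurp."""
    hosts = set()
    for line in lines:
        m = LINE_RE.match(line)
        if m and "Slurp" in m.group("agent"):
            hosts.add(m.group("host"))
    return sorted(hosts)


if __name__ == "__main__":
    for host in slurp_hosts(SAMPLE):
        print(host)
```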

{edited}

The downside: when I tried tracing from a couple of locations on the 'net, I kept getting no traceroute results after entering the network. These are from "old" IP blocks, so for my traceroutes, the lack of response is not a bogon issue. I guess that makes these inconclusive at best...

{edited}

Hope this helps.

I think we would really need a traceroute _from_ the Yahoo! spider machine to really solve this.

In the meantime, perhaps re-IPing the machine to an IP address that isn't brand new would solve his problem while the underlying issue is being resolved?

Last edited by mcanerin : 02-06-2005 at 01:44 PM.
02-06-2005   #12
wtsandman
Newbie
Join Date: Feb 2005
Posts: 4
Rackspace

Morning guys,

Ian - you would be amazed at how often this happens; it is something that is regularly posted on NANOG.org (North American Network Operators' Group). There are a lot of ISPs that believe using bogon filters is a good idea, and I will not knock that as a security practice. However, updating these lists is a manual process for most ISPs/businesses. Additionally, not everyone uses the same lists, and not all of these lists are kept as accurately as others.
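Since many bogon lists are updated by hand, a periodic comparison against a current reference list can flag entries that have gone stale. A minimal Python sketch: given a local list, a current list, and an address that ought to be reachable, it reports which local prefixes still block the address even though the current list no longer covers it. Both lists here are hypothetical.

```python
import ipaddress


def stale_entries(local_list, current_list, address):
    """Return prefixes in local_list that match address when no prefix in
    current_list does, i.e. entries that are blocking now-allocated space."""
    addr = ipaddress.ip_address(address)
    if any(addr in ipaddress.ip_network(p) for p in current_list):
        return []  # still a bogon by the current list, so not a stale entry
    return [p for p in local_list if addr in ipaddress.ip_network(p)]


LOCAL_LIST = ["10.0.0.0/8", "72.0.0.0/8"]        # hypothetical, last edited long ago
CURRENT_LIST = ["10.0.0.0/8", "192.168.0.0/16"]  # hypothetical current reference

if __name__ == "__main__":
    print(stale_entries(LOCAL_LIST, CURRENT_LIST, "72.3.155.1"))  # ['72.0.0.0/8']
```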


Hank - thank you very much for the info, this is a great start. However, I see the same results as you, likely due to filtering other than bogon lists. I agree that our next required course of action is to try to get someone from Yahoo involved who can help look into this. I will likely try posting to the NANOG forum about this, since members of Yahoo's network team likely lurk there, and most members are very responsive.

If anyone has any additional information, please feel free to chime in, or contact me directly.

Thank you,
Tom Sands tsands@rackspace.com
Chief Network Engineer
Rackspace Managed Hosting
02-06-2005   #13
wtsandman
Newbie
Join Date: Feb 2005
Posts: 4
Update

Just another note..

One good sign is that I can resolve all the spider hosts listed, which would mean that at least we know Yahoo's DNS servers aren't blocking any requests from this IP space. However, when doing a traceroute to ns2.yahoo.com, I take the same path and complete successfully. With known filtering occurring at hop 11, I would venture to say that we are probably pretty close.


{edited}


Tom Sands

Last edited by mcanerin : 02-06-2005 at 01:45 PM.
02-06-2005   #14
Hank Cowdog
Member
Join Date: Feb 2005
Location: Protohell (aka Texas)
Posts: 5
Tom,
FWIW, I can successfully traceroute to their nameserver (ns2.yahoo.com) as well. It is on a different subnet, however, so the routing will be different.

Last edited by mcanerin : 02-06-2005 at 01:46 PM.
02-06-2005   #15
mcanerin
Join Date: Jun 2004
Location: Calgary, Alberta, Canada
Posts: 1,564
Update: Yahoo, Rackspace, and we are all working on this together now, and the cooperation is excellent. Due to potential security and privacy issues, I've edited some of the above posts and sent the details separately in an email to all related parties.

I have to say I'm very impressed at the willingness of everyone involved to get to the bottom of this.

Naturally, once this is solved I'll let you know what happened and how to prevent it from happening to your clients.

Ian
02-22-2005   #16
wtsandman
Newbie
Join Date: Feb 2005
Posts: 4
Update

Just to update anyone who was following this: the issue of Yahoo spiders not being able to access the 72.3.128.0/19 IP range was indeed resolved on Feb 7th.

The resolution to this actually involved transit filtering and nothing on the part of Rackspace or Yahoo.

Thank you,
Tom Sands
02-24-2005   #17
mcanerin
Join Date: Jun 2004
Location: Calgary, Alberta, Canada
Posts: 1,564
This issue is now resolved for the client, but it may affect others in the future. Mike Churchill and I just published an article about the story in Mike Grehan's E-Marketing News (came out today).

Bogons ate my website!

It was an interesting problem - many thanks to Yahoo, Rackspace, and everyone who contributed to this thread!

Ian