#1
Well this is an interesting one.
I have a client. He has a website. He ranks very well on all search engines. But his site is a bit slow, and due to his increased volume he needs to move to a better, faster server.

Late December, we move the site to the new server (leaving a copy on the old server to deal with DNS propagation lag). The new server has brand spanking new, never-before-used IP addresses: 72.3.155.1 and 72.3.155.2, specifically. We wait (watching the logs) until all of the spiders and visitors are resolving to the new site, then prepare to remove the old site. No one is visiting the old copy anymore, so everything has been switched and there is no possibility of a duplication issue. All nice and clear and best practice, right? I assure you that the site itself is clean and has no spam issues.

One problem. His Yahoo rankings tank. Also, his Yahoo Paid Inclusion stops working. We check and find that the spiders are reporting errors. An 8006 error, to be exact. But only Yahoo's spiders. Everyone else is doing fine. BTW, the Yahoo spiders ARE visiting occasionally, so they are not lost, AFAIK.

We call, and Position Tech and Yahoo blame robots.txt in what appears to be a form letter. The robots.txt is fine - it basically says all spiders are allowed everywhere. 2 lines. After a few days of this, I remove the robots.txt file. No change, no Yahoo spiders. All other spiders are fine.

So then we get really annoyed and start bothering PT and Yahoo. Now, they are saying that Rackspace is blocking Yahoo internally: From Position Tech: Quote:

At this point, I can only see 2 possibilities:

1. Rackspace's fault: they are blocking Yahoo's spiders. But you'd think other Rackspace clients would have noticed - they have a LOT of them. Maybe an isolated issue?
2. Yahoo's fault: the IPs are very new - I'm wondering if they have been blocked off (or never opened up) within the bowels of Yahoo.

Naturally, I expect each one to blame the other. Just a word of warning. I'll keep you posted, assuming either one actually cares enough to do something about it.

My question is, has this happened to anyone else? What was the cause and solution?

Ian
__________________
International SEO Last edited by mcanerin : 02-24-2005 at 12:29 PM. |
|
#2
I just talked to someone in support and they say that this sort of thing happens quite often. A customer of a hosting company like Rackspace may have a customer of their own that is being hit by our crawler and does not realize who it is, so they report the IP to Rackspace or another provider, who then blocks the IP at the router level. That would leave the other search engines with access while we have none.

I am not quite sure why you are blaming Yahoo! It looks like you have received very good service from us, as we have identified the problem and notified you on how to fix it. Please send the domain to me via forum mail if you want us to look into it further.

Tim
|
#3
The BOGONs are to blame
If this were indeed the issue, then a traceroute to and from the Yahoo spider's server and the customer's server would isolate which specific hop in the routing chain is to blame. The implication here is that the Yahoo! Spider's IP address is triggering the problem.
I think the problem is more likely the fact that the customer server's IP address has only recently been allocated, and may still be on one or more routers' BOGON list.

Possibility 3: A router between all/some of Yahoo!'s spiders and the customer's server is blocking the traffic from a mistaken belief that the brand new IP address is still unallocated (and hence not valid).

If you look at the BOGON allocation list, it shows that 94% of the IP addresses in the 72.X.X.X range are NOT currently allocated. If the IP in question was really just recently allocated, there may be some routers out there that are still blocking this traffic.

Last edited by Hank Cowdog : 02-04-2005 at 04:08 PM.
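To make the bogon idea concrete, here is a minimal sketch of the check a filtering router effectively performs: is the packet's source address inside any prefix on its bogon list? The list below is hypothetical. It assumes a stale list that still treats all of 72.0.0.0/8 as unallocated, and the function name `matching_bogon` is made up for illustration.

```python
import ipaddress

# Hypothetical stale bogon list. 0.0.0.0/8 and 10.0.0.0/8 are reserved or
# private ranges; 72.0.0.0/8 is the ASSUMED out-of-date entry.
STALE_BOGONS = [
    ipaddress.ip_network(prefix)
    for prefix in ("0.0.0.0/8", "10.0.0.0/8", "72.0.0.0/8")
]

def matching_bogon(address, bogons=STALE_BOGONS):
    """Return the first bogon prefix containing address, or None."""
    ip = ipaddress.ip_address(address)
    for net in bogons:
        if ip in net:
            return net
    return None

print(matching_bogon("72.3.155.1"))   # the client's new address
print(matching_bogon("192.0.2.10"))   # a documentation address, not listed
```

With this assumed list the first call reports a match on 72.0.0.0/8 and the second reports no match. A router using such a list would drop traffic from the new server even though the address is legitimately allocated.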
|
#4
Ian,
We did some further testing and our crawler IPs are in fact blocked from Rackspace. Tim |
|
#5
|
||||
|
||||
|
Excellent!
Well, the situation isn't, but the response from Yahoo is - thank you for your time and effort to check this. My thinking on whether Yahoo may be the issue was very much in line with what HankCowdog was saying, so getting an actual confirmation from Yahoo was, IMO, needed as part of a troubleshooting checklist. Nothing personal.

So far, PositionTech and Yahoo agree that the spiders are being blocked, and that is good information. I'm focused on Rackspace now. As expected, their initial response was that they have lots of clients that are not affected and it is probably Yahoo's fault, though they did acknowledge it could be an "isolated incident". I'll point this thread out to them - I'm inclined to believe Yahoo and PT at this point.

Thanks for the quick response, TheotherTim! For the record, I HAVE received excellent service from Yahoo for myself and my clients for many years now.

If I could ask, is there a simple method or checklist an SEO or webmaster could use to isolate this issue and ascertain whether or not it is the ISP? I.e., how can I tell if it's the spider IP being blocked or simply an unspiderable site, etc. I know this particular site is ok, but I'm wondering how many people out there are being blocked and just think they need to "optimize better". Since my own workstations' IPs are not blocked I'm not sure how to do it from my end, and since most webhosts will immediately deny any fault (being human nature), it would be nice to go to them with some evidence.

Thanks again for looking into this,

Ian
__________________
International SEO Last edited by mcanerin : 02-04-2005 at 04:58 PM. Reason: spelink errers.. ;) |
|
#6
I have a client on Rackspace UK who is fine for UK rankings - I figure you're talking about Rackspace in the US?
Also, is there a Yahoo! address general webmasters can contact about such concerns? I have a clean non-commercial reference site that Yahoo! doesn't recognise, despite recognising 20,000 links to it. Possible redirects issue. |
|
#7
I think the fact that Yahoo! cannot reach the RackSpace server is clear. The question is: where in the path of many hops between the Yahoo! spider's machine and the RackSpace customer's machine does the comm break down?
Could be just outside of Yahoo!'s server's location. Could be inside RackSpace's Network. Could be anywhere in between. A traceroute from the Yahoo! spider's machine (or at least from somewhere in that address space) and a traceroute from the customer's machine _back_ to that same Yahoo! machine would quickly isolate which router is performing the blocking and whether the problem is within either network, or somewhere in the Internet in between. I am reminded of the Microsoft Support Tech at the shooting range: He keeps shooting at the target, but the spotter keeps telling him he missed the target. Finally, the tech puts his finger in front of the barrel and pulls the trigger, with obvious results. His response? "Things are fine leaving here, the problem must be on the other end." |
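As a rough aid for comparing the two traceroutes described above, the sketch below finds the last hop that answered in a saved traceroute output. The sample text and its addresses are made up (documentation ranges), the helper name `last_responding_hop` is hypothetical, and the line format is assumed to be the common Unix one: hop number, then either an address with timings or asterisks.

```python
import re

# A hop line starts with its hop number; header lines do not.
HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")

def last_responding_hop(traceroute_text):
    """Return (hop_number, rest_of_line) for the last hop that answered, or None."""
    last = None
    for line in traceroute_text.splitlines():
        match = HOP_LINE.match(line)
        if not match:
            continue
        hop, rest = int(match.group(1)), match.group(2).strip()
        # A hop that timed out is printed as asterisks only.
        if rest.replace("*", "").strip():
            last = (hop, rest)
    return last

# Made-up sample output; the addresses are documentation ranges.
SAMPLE = """\
traceroute to 192.0.2.50 (192.0.2.50), 30 hops max
 1  198.51.100.1  1.0 ms  1.1 ms  0.9 ms
 2  198.51.100.9  3.2 ms  3.0 ms  3.1 ms
 3  203.0.113.7  10.4 ms  10.1 ms  10.2 ms
 4  * * *
 5  * * *
"""

print(last_responding_hop(SAMPLE))
```

Run on a trace from each end, this gives the two points where replies stop. It only shows where the answers end, which is not necessarily the device doing the filtering.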
|
#8
who is to blame
if this is indeed a problem with rackspace, then there are two possibilities:
1. rackspace has intentionally decided to screw their customers, and filter yahoo spiders, thus dropping their customers' sites from search results, thus causing their customers to go out of business, thus ultimately causing rackspace to go out of business.

or...

2. some sort of automated system has blocked yahoo, and the network engineers with rackspace are too stupid to figure it out.

now, obviously option number 1 doesn't make any sense so we have to look at option 2. speaking with rackspace support they have assured me that they do not run any network wide ids systems that would block yahoo. this makes sense since rackspace actually sells dedicated ids systems to their customers. also, rackspace has a very very diverse client base that push god-knows what kind of traffic patterns, and butt-tons of bandwidth. this type of environment would not be conducive to a system wide ids. they would be seeing all sorts of connectivity issues on a daily basis if they tried to implement an ids in that environment.

so now we have to look at the possibility that filtering is taking place either at yahoo, or one of yahoo's network providers. looking at the affected ip space, you'll see that it was newly assigned as of october. this means that it is highly subject to bogon filters. i have actually dealt with this type of filtering in the past, and believe me it can be a real pain. pinpointing the device that is responsible for the filtering can be tricky.

let's look at it from the perspective of an entry level yahoo tech. he does a traceroute to rackspace from his desk that comes out something like this

    yahoo hop 1          1ms  1ms  1ms
    yahoo hop 2          3ms  3ms  3ms
    sbc hop 3            10ms 10ms 10ms
    sbc hop 4            15ms 15ms 15ms
    sbc-rackspace hop 5  20ms 20ms 20ms
    * * *
    * * *

now from his perspective the traceroute is dying within the rackspace network, so he assumes that rackspace is filtering him.

what he doesn't know is that his own network is employing a bogon filter that happens to include rackspace's newly obtained ip space. this means that as soon as his traceroute reaches rackspace, the return traceroute traffic is coming from an ip that is being filtered by yahoo's bogon filter, thus the responses never actually reach him. now depending on how much he understands this problem, and how much he digs into it, he may quickly lay blame on rackspace.

unfortunately, from rackspace's perspective if they do a traceroute to yahoo they will see similar problems. as soon as they reach the yahoo network, the bogon filter will cause their traceroute to time out. they would thus blame yahoo for the filtering... and back and forth we go. one company blaming the other until the end of time.

so who is the more likely culprit here? yahoo or rackspace? if i were a gambling man, then i would go with yahoo. here is my reasoning: who has more to lose by yahoo spiders being blocked? rackspace or yahoo? yahoo doesn't necessarily gain or lose anything by a few websites not making their search listings. rackspace, on the other hand, has plenty to lose if their customers begin leaving them due to this problem. a problem like this might make it to a level 1 tech with yahoo, who very likely doesn't fully understand the problem. however, this sort of issue would get pushed to the highest levels with rackspace. very likely to someone with good understanding of network engineering. if i had to take the word of an entry level yahoo support tech or a high-level network engineer of rackspace, then i would go with the engineer.

in addition to this is the plain and clear fact that we are dealing with newly assigned ip space. ip space that is quite likely within a great many folks' bogon filters. it makes much more sense that yahoo is blocking rackspace's new ip space than it is that rackspace is blocking a well known entity such as yahoo.
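The "each side sees the trace die at the other's border" argument can be illustrated with a toy simulation. Everything here is an assumption for illustration: the hop addresses (mostly documentation ranges), the network labels, the stale 72.0.0.0/8 bogon entry, and the placement of the filter at the edge of the "yahoo" network as the post hypothesizes. A packet is dropped when it enters the filtering network carrying a source address on the bogon list.

```python
import ipaddress

# ASSUMPTION: the filtering network's stale list still covers all of 72.0.0.0/8.
STALE_BOGONS = [ipaddress.ip_network("72.0.0.0/8")]

def is_bogon(address):
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in STALE_BOGONS)

def visible_hops(source, path, filtering_net):
    """Return what a traceroute from source would display for each hop.

    source is (address, network); path is a list of (address, network).
    Probes carry the source's address; replies carry the hop's address.
    """
    src_ip, src_net = source
    shown = []
    for hop_ip, hop_net in path:
        probe_enters = src_net != filtering_net and hop_net == filtering_net
        reply_enters = src_net == filtering_net and hop_net != filtering_net
        dropped = (probe_enters and is_bogon(src_ip)) or (
            reply_enters and is_bogon(hop_ip)
        )
        shown.append("* * *" if dropped else hop_ip)
    return shown

# Hypothetical endpoints and hops.
YAHOO_HOST = ("198.51.100.10", "yahoo")
RACK_HOST = ("72.3.155.1", "rackspace")
PATH_Y_TO_R = [
    ("198.51.100.1", "yahoo"), ("198.51.100.2", "yahoo"),
    ("203.0.113.1", "transit"), ("203.0.113.2", "transit"),
    ("72.3.128.1", "rackspace"), ("72.3.155.1", "rackspace"),
]
PATH_R_TO_Y = [
    ("72.3.128.1", "rackspace"),
    ("203.0.113.2", "transit"), ("203.0.113.1", "transit"),
    ("198.51.100.2", "yahoo"), ("198.51.100.1", "yahoo"),
    ("198.51.100.10", "yahoo"),
]

print(visible_hops(YAHOO_HOST, PATH_Y_TO_R, "yahoo"))
print(visible_hops(RACK_HOST, PATH_R_TO_Y, "yahoo"))
```

In this model the trace from the yahoo side goes silent at the first rackspace hop, and the trace from the rackspace side goes silent at the first yahoo hop, even though a single filter is responsible. The same symmetry would appear if the filter sat in a transit network, so two traces alone may not settle who is filtering.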
|
#9
Hopefully we can get this all straightened out.. I'm the head of networking at Rackspace, and I will make sure this gets fixed regardless of where the problem lies.
Netguy1970.. I thought I was going to be annoyed by your post after reading the first paragraph, however thank you.. you made just about every point I would have wanted to.

The datacenter (in Dallas/Grapevine TX) that has the IP block 72.3.128.0/19 is brand new, along with the IPs. ARIN allocated them to Rackspace in Sept 2004, however you will be amazed at the number of providers/ISPs/businesses out there that are still blocking it, largely due to bogon lists that don't get updated. I have easily contacted over 50 providers since Oct. 2004 about getting this filtering removed or adjusted for this IP block, so it wouldn't surprise me if that were the case here as well.

As Netguy1970 mentioned, Rackspace doesn't filter anything that would affect customer connectivity, we don't run an internal IDS, and we allow complete access to any ports, protocols, or IPs for our customers and their clients.

TheotherTim and Mcanerin, I would be happy to work with both of you to get this resolved. Finding out where (IP space) the spiders are coming from, and getting a traceroute from that location, will be key in resolving this problem. We do have a number of customers online in this new data center already and haven't heard of any other problems regarding this yet, however, this certainly could be an isolated incident.

Feel free to contact me with as much information about the spiders as possible, and we will certainly get this resolved.

Thank you,
Tom Sands
tsands@rackspace.com
Chief Network Engineer
Rackspace Managed Hosting
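For readers who want to confirm that the client's two addresses from the first post fall inside the 72.3.128.0/19 block mentioned here, a short check with Python's standard `ipaddress` module is enough. This is only arithmetic on the prefix and says nothing about who is filtering.

```python
import ipaddress

block = ipaddress.ip_network("72.3.128.0/19")

# First address, last address, and size of the block.
print(block[0], block[-1], block.num_addresses)

# The two addresses given in the opening post.
for addr in ("72.3.155.1", "72.3.155.2"):
    print(addr, ipaddress.ip_address(addr) in block)
```

A /19 spans 8192 addresses, here 72.3.128.0 through 72.3.159.255, so both client addresses are inside the newly allocated range.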
|
#10
|
||||
|
||||
|
Thank you Tom - the response is much appreciated.
It's becoming clearer that this could very well be far more than a simple issue to resolve, and that it could happen due to a number of possible factors. I'd be happy to act as a go-between to help resolve this (especially since my client is one of the ones affected!) and will be in touch with everyone concerned.

Ironically - it might even be a completely different party at fault - which means that this may not be a Rackspace/Yahoo issue specifically - they may have just been the first victims discovered. I'd personally like to know how many other ISPs are using really new IPs, and how they are faring...

Regardless of the reason it happened, I think it's clear that it's in everyone's best interest that this get taken care of quickly - not just for the immediate client but for others who may be unknowingly affected as well.

Thanks again - also for the record, my client is a huge fan of Rackspace and has been hosted with you for years with no problems, and has specifically requested that I try to resolve this issue without considering changing hosts. Additionally, when I phoned I was answered quickly and professionally, and my client was also contacted quickly with an offer to do whatever it took to resolve this to his satisfaction - kudos on the service!

Hankcowdog - thanks for the insight! If this is indeed what's going on (and we are still checking), then we could be seeing the beginning of a serious potential issue as more and more people start websites and more and more IPs are put into play. SEOs especially are known to prefer dedicated to shared IPs and may be accelerating the issue. I'd hate to think of what possible issues there might be when IPv6 is brought in...

Now, all we need to do is actually *solve* this...

Ian
__________________
International SEO Last edited by mcanerin : 02-05-2005 at 04:14 PM. |
|
#11
Tom,
Below is a list of the Yahoo! Slurp domain names culled from our access_logs over the last month. No guarantee that these are the ones creating the problem, but it is a start. Since mcanerin is seeing _no_ Yahoo! Slurp traffic since the move, I suspect that they all have the problem.

{edited}

The downside: when I tried tracing from a couple of locations on the 'net, I kept getting no traceroute results after entering the network. These are from "old" IP blocks, so for my traceroutes, the lack of response is not a bogon issue. I guess that makes these inconclusive at best...

{edited}

Hope this helps. I think we would really need a traceroute _from_ the Yahoo! spider machine to really solve this.

In the meantime, perhaps re-IPing the machine to an IP address that isn't brand new would solve his problem while the issue is being resolved?

Last edited by mcanerin : 02-06-2005 at 02:44 PM.
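A sketch of how such a list could be culled from an access log, and how a sudden drop in spider visits might be spotted. It assumes the log is in the common "combined" format with hostname lookups enabled (otherwise the first field is an IP address). The sample lines, host names, and the helper name `slurp_hits_by_day` are hypothetical.

```python
import re
from collections import Counter

# Combined log format: host ident user [time] "request" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] "[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def slurp_hits_by_day(lines):
    """Count requests per day whose user-agent mentions Slurp, and collect hosts."""
    hits = Counter()
    hosts = set()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "slurp" in match.group("agent").lower():
            day = match.group("when").split(":", 1)[0]  # e.g. 20/Dec/2004
            hits[day] += 1
            hosts.add(match.group("host"))
    return hits, hosts

# Made-up sample lines; hosts and user-agent strings are illustrative only.
SLURP_UA = "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
SAMPLE_LINES = [
    'crawler1.example.net - - [20/Dec/2004:10:00:01 -0600] '
    '"GET /robots.txt HTTP/1.0" 200 48 "-" "%s"' % SLURP_UA,
    'crawler2.example.net - - [20/Dec/2004:10:05:12 -0600] '
    '"GET / HTTP/1.0" 200 5120 "-" "%s"' % SLURP_UA,
    'visitor.example.org - - [21/Dec/2004:09:00:00 -0600] '
    '"GET / HTTP/1.1" 200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
]

hits, hosts = slurp_hits_by_day(SAMPLE_LINES)
print(dict(hits))
print(sorted(hosts))
```

Comparing the per-day counts before and after a server move gives a host some concrete evidence that one spider has stopped arriving while other traffic continues.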
|
#12
Rackspace
Morning guys,
Ian - you would be amazed at how often this happens; it is something that is regularly posted on NANOG.org (North American Network Operators' Group). There are a lot of ISPs that believe using bogon filters is a good idea, and I will not knock that as a security practice. However, updating these lists is a manual process for most ISPs/businesses. Additionally, not everyone uses the same lists, and not all of these lists are kept as accurately as others.

Hank - thank you very much for the info, this is a great start. However, I see the same results as you, likely due to filtering other than bogon lists. I agree that our next required course of action is to try and get someone from Yahoo involved who can help look into this. I will likely try posting to the NANOG forum about this since network members of Yahoo likely lurk there, and most members are very responsive.

If anyone has any additional information, please feel free to chime in, or contact me directly.

Thank you,
Tom Sands
tsands@rackspace.com
Chief Network Engineer
Rackspace Managed Hosting
|
#13
Update
Just another note..
One good sign is that I can resolve all the spider hosts listed, which would mean that at least we know Yahoo's DNS servers aren't blocking any requests from this IP space. However, when doing a traceroute to ns2.yahoo.com I take the same path, and complete successfully. With known filtering occurring at hop 11, I would venture to say that we are probably pretty close.

{edited}

Tom Sands

Last edited by mcanerin : 02-06-2005 at 02:45 PM.
|
#14
Tom,
FWIW, I can successfully traceroute to their nameserver (ns2.yahoo.com) as well. It is on a different subnet, however, so the routing will be different. Last edited by mcanerin : 02-06-2005 at 02:46 PM. |
|
#15
|
||||
|
||||
|
Update: Yahoo, Rackspace, and we are all working on this together now, and the cooperation is excellent. Due to potential security and privacy issues, I've edited some of the above posts and sent them separately in an email to all related parties.

I have to say I'm very impressed at the willingness to get to the bottom of this on the part of everyone involved. Naturally, once this is solved I'll let you know what happened and how to prevent it from happening to your clients.

Ian
__________________
International SEO |
|
#16
Update
Just to update anyone who was following this. The issue of Yahoo spiders not being able to access the 72.3.128.0/19 IP range was indeed resolved on Feb 7th.

The resolution to this actually involved transit filtering and nothing on the part of Rackspace or Yahoo.

Thank you,
Tom Sands
|
#17
|
||||
|
||||
|
This issue is now resolved for the client - but may affect others in the future. Mike Churchill and I just published an article about the story in Mike Grehan's E-Marketing News (came out today).

Bogons ate my website!

It was an interesting problem - many thanks to Yahoo, Rackspace, and everyone who contributed to this thread!

Ian
__________________
International SEO |