How to handle addon domain in robots?


clarkmurray
10-31-2005, 08:49 PM
I have an addon domain on my site. So Domain A points to my web root directory "/" and Domain B points to a subfolder "/b". Domain B represents a totally separate business. I just do it this way to save on hosting costs. Should I add an exclude statement for /b in my robots.txt file for Domain A? If I do, will Domain B still be crawled?

mcanerin
10-31-2005, 11:29 PM
If your add-on domain looks like this:

secondsite.firstsite.com

Then you need a separate robots.txt for it.

If it looks like:

www.firstsite.com/secondsite

then you have to use the same robots.txt you are using for the main site.
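
To illustrate why the host name is what matters: crawlers generally request robots.txt from the root of whichever host they are visiting, so a subdomain and a subfolder resolve to different files. A minimal sketch, assuming plain HTTP and using the placeholder host names from this thread (the helper name `robots_url` is made up for the example):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url):
    """Return the robots.txt URL a crawler would request for page_url."""
    parts = urlsplit(page_url)
    # Only the scheme and host carry over; the path is always /robots.txt.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

# Subdomain: gets its own robots.txt.
print(robots_url("http://secondsite.firstsite.com/page.html"))
# Subfolder: shares the main site's robots.txt.
print(robots_url("http://www.firstsite.com/secondsite/page.html"))
```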

Ian

clarkmurray
11-01-2005, 12:13 AM
There is a subdomain defined for my second site--i.e. secondsite.firstsite.com. However, I can also get to the second site by entering www.secondsite.com.

So if I understand your response correctly, I do need to enter an exclude statement for /secondsite in the robots.txt file for firstsite.com. And secondsite.com will still get crawled. Is that correct?

mcanerin
11-01-2005, 03:49 AM
Hmmm - we may be going in the wrong direction here.

Before I answer, I need to be absolutely sure of something. Is this ONE site, just with 2 possible ways to get to it (i.e. www.secondsite.com and secondsite.firstsite.com), or are these physically 2 different sites?

Ian

clarkmurray
11-01-2005, 11:34 AM
This is one physical site. secondsite.com is a subdirectory of firstsite.com.

mcanerin
11-01-2005, 11:54 AM
I'm assuming that you are concerned about duplication issues (if not, let me know).

In this case, I don't think there is a problem - it's the same site, just with two different ways to get to it. This even counts for sub-directories. If you type in one of my local newspapers, for example: www.calgaryherald.com you will be taken to a subdirectory that looks like this: http://www.canada.com/calgary/calgaryherald/index.html

...and there are no problems with this at all. In essence, this is not a sub-domain - it's more of a redirect to a folder. The only robots.txt that would work would be the one for the main site, but if you lock off the subfolder there is an extremely high chance that you would kill both. As a matter of fact, if it didn't kill both, it would be broken. It's supposed to work that way.

I run into this a lot where I have clients that have a main .com site and then country-specific subfolders. They then buy ccTLDs for each of the countries and point them at the folders. No problem, no duplication.

In your case, it's even easier. secondsite.firstsite.com is considered to be a totally different thing from www.firstsite.com/secondsite. Completely different. The scenario you are using, where you have a subdomain that also has its own domain name, is perfectly normal and nothing to be worried about.

You may run into a possible IBL splitting/duplication issue if you do link building under both URLs (stick with just one), but otherwise this is a very normal setup and nothing to be concerned with. All a robots.txt would do in this case is affect both.

If you want to avoid even the slightest problem you could 301 the www.secondsite.com to the subdomain, but I don't think it's even necessary, unless you have a significant number of links using the subdomain address instead of the www one.
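
As a rough sketch of what that 301 amounts to: the server looks at the host name in the request and, if it is the alias, answers with a permanent redirect to the same path on the canonical host. The function name, its parameters, and the use of plain HTTP are assumptions for illustration; in practice this is normally set up in the web server or hosting control panel, and the direction can be reversed if the www domain is the one you build links to.

```python
def redirect_for(host, path,
                 canonical_host="secondsite.firstsite.com",
                 aliases=("www.secondsite.com",)):
    """Return (301, location) if the request host is an alias, else None."""
    if host.lower() in aliases:
        # Keep the requested path so deep links land on the same page.
        return 301, "http://" + canonical_host + path
    return None  # Already on the canonical host: serve the page normally.

print(redirect_for("www.secondsite.com", "/about.html"))
print(redirect_for("secondsite.firstsite.com", "/about.html"))
```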

In short, if I understand you properly, the best option is to keep away from the robots.txt and do linkbuilding using your www.secondsite.com domain name.

Ian

clarkmurray
11-01-2005, 12:06 PM
Sorry, but I'm still not sure we're on the same wavelength. If this were a redirect, you would see the new URL in your browser's address bar. If you enter www.nationalheritagemusic.com and go to the site, you will always see www.nationalheritagemusic.com. In fact, this site is physically located in a subfolder of another site, www.digitalmusicdoctor.com.

rogerd
11-01-2005, 04:14 PM
Let me make sure I've got it straight: You have a site, example.com. In the example.com site, you have a folder, second, which is reachable over the Web at second.com. The content in the second folder can't be reached directly from the example site, i.e., there are no links to example.com/second/, only to second.com.

Assuming that's correct, you shouldn't need to include the second folder in your robots.txt unless there is some way to navigate to that folder from a link. At the same time, putting it in your robots.txt file shouldn't hurt anything, since the bots will never see the /second/ folder; they'll only see the second.com root folder.

To control spidering in the second site, you'll need to include a separate robots.txt file in the /second/ folder. (The file/folder names in that robots.txt should treat that folder as document root and not use the /second/ path.)
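
The point about paths can be checked with Python's standard `urllib.robotparser`. This is only a sketch: the `/private/` folder, the bot name, and the helper `parse_robots` are invented for the example, and real crawlers may differ in details. It compares a rule written relative to the second.com root with one that wrongly includes the /second/ prefix.

```python
from urllib.robotparser import RobotFileParser

def parse_robots(text):
    """Build a parser from robots.txt content held in a string."""
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    return rp

AGENT = "ExampleBot"  # hypothetical crawler name

# File stored on disk as /second/robots.txt, served as second.com/robots.txt.
rooted = parse_robots("User-agent: *\nDisallow: /private/\n")
# Same intent, but written with the on-disk /second/ prefix by mistake.
prefixed = parse_robots("User-agent: *\nDisallow: /second/private/\n")

url = "http://second.com/private/report.html"
rooted_blocks = not rooted.can_fetch(AGENT, url)
prefixed_blocks = not prefixed.can_fetch(AGENT, url)

print("root-relative rule blocks it:", rooted_blocks)
print("prefixed rule blocks it:", prefixed_blocks)
```

The prefixed rule never matches, because from the crawler's point of view the URL path on second.com starts at /private/, not /second/private/.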

Don't be too sure you'll never have a link to the /second/ folder. Traffic reports, for example, may list the full example.com/second/ URL. If a traffic report gets spidered somehow, or if a link to the /second/ folder gets published in error, a bot could find it and then spider the whole site the wrong way if you use relative URLs. For that reason, I'd lean toward excluding that folder in the primary site's robots.txt.
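
A small sketch of that failure mode, using the same placeholder names (the page paths and bot name are made up): once a bot lands on a stray example.com/second/ URL, every relative link resolves under that prefix, and a Disallow line in the primary robots.txt is what stops it.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

AGENT = "ExampleBot"  # hypothetical crawler name

# A link to the folder that leaked out, e.g. via a traffic report.
stray_link = "http://example.com/second/index.html"
# A relative link on that page resolves against the stray URL.
next_page = urljoin(stray_link, "products/list.html")
print(next_page)

# Primary site's robots.txt with the folder excluded.
primary = RobotFileParser()
primary.parse(["User-agent: *", "Disallow: /second/"])

print(primary.can_fetch(AGENT, stray_link))   # folder URLs are refused
print(primary.can_fetch(AGENT, next_page))
print(primary.can_fetch(AGENT, "http://example.com/index.html"))  # rest of the main site is unaffected
```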

Marcia
11-02-2005, 12:48 AM
"There is a subdomain defined for my second site--i.e. secondsite.firstsite.com. However, I can also get to the second site by entering www.secondsite.com."

Clark, it's not 100% clear whether your second site is in a subdomain like this

secondsite.mainsite.com

or in a sub-folder like this

mainsite.com/secondsite/

It makes a big difference - which way is it for sure?