#1
Hi all,
I have a domain, www.mydomainname.com, and I have now set up www1.mydomainname.com, www2.mydomainname.com and www3.mydomainname.com. They are all working mirrors of www.mydomainname.com. Is this treated as spam? Will it affect the SERPs for mydomainname.com? Also, today I ran a 'site:www2.domain-name.com' query on Google, and many pages had already been indexed under it. Will our primary site (www.mydomainname.com) be penalized by Google? Or how can I use robots.txt to block spiders from my mirror sites and allow them only on the main server (www)? Thanks in advance,
#2
That's duplicate content, and it's a problem in the making. A page should show up under ONE URL only, not more than one.
#3
Hello Marcia,
Thanks for the quick reply, but I'm very confused. I've noticed many companies that use multiple subdomains and are not penalized. They seem to have succeeded in getting www spidered while keeping Googlebot off their www2 and www3 servers:
www.cnn.com: 399,000 results
www2.cnn.com: 607 results
www3.cnn.com: 10,900 results
Can you give any insight into how they're doing this? Thanks again.
#4
You could simply put a robots.txt file on the other servers that denies Google access.
#5
Could you tell me more about how to use robots.txt to deny Googlebot access to my other servers (www1, www2, www3)? Thanks again.
#6
Sure
Read this: http://www.javascriptkit.com/howto/robots.shtml
It's very simple and is done by adding a single file called robots.txt to your server, containing:
User-agent: *
Disallow: /
/Seebach
#7
Hi Seebach,
Thanks again for your help. I can't use those two lines, because the multiple domains are on one server (i.e. www1, www2, www3 and www are all on one server). So I just don't know how to use robots.txt to block Googlebot from my mirror sites (www1, www2, www3). Thanks again.
Last edited by zhan: 07-01-2006 at 04:26 AM.
#8
You'll need a robots.txt in the root directory of each of those subdomains. There's a pretty simple explanation, with references, here:
http://modwest.com/help/kb2-197.html
It goes into the same directory as the main index page of each subdomain.
#9
I can't believe that some companies (including a major airline) consider this a good implementation of load balancing. Not only is it bad for SEO, it's poor, lazy technical planning. I looked at a website recently which used JavaScript to randomly choose one of four servers (www, www2, www3, www4) in an attempt to balance load. If www2 went down, the JS would still send a quarter of new visitors to that server, and people with www2 in their bookmarks, history or search results would also end up at a broken server. Unsurprisingly, the website, with four times more indexed pages than actual pages, didn't even rank for its own brand name.
Not a dig at you, zhan, just at the people who should know better.
Rob
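The "quarter of new visitors" figure follows from uniform random choice: with N servers, each one receives about 1/N of the picks whether it is up or not. The short simulation below is only a sketch of that arithmetic; the hostnames are hypothetical stand-ins for the four servers described, and the picking function imitates the criticised script rather than reproducing it.

```python
import random

# Hypothetical stand-ins for the four hostnames described above.
SERVERS = ["www", "www2", "www3", "www4"]


def pick_server_blindly(rng):
    """Imitate the criticised script: uniform choice, no health check."""
    return rng.choice(SERVERS)


def share_sent_to(down_host, trials=100_000, seed=1):
    """Fraction of simulated new visitors routed to down_host."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    hits = sum(pick_server_blindly(rng) == down_host for _ in range(trials))
    return hits / trials


print(f"{share_sent_to('www2'):.3f} of visitors sent to www2; "
      f"expected about {1 / len(SERVERS):.3f}")
```

Bookmarked, historical and indexed www2 URLs come on top of that share, since they never pass through the script at all.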
#10
If they point to different folders, you'll be able to use Marcia's advice. Otherwise, can you tell us whether the server runs Apache or IIS, and what your pages are programmed in (HTML, PHP, ASP, CFML, etc.)?
#11
Each of the subdomains is equivalent to a separate domain, and each needs its own robots.txt file. For the mirrors, that file contains:
User-agent: *
Disallow: /
It goes into the root directory of each subdomain, the same folder that holds the subdomain's main index page. You'll find complete documentation for the protocol at this site: Robots Exclusion Protocol
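To check what a given robots.txt actually tells Googlebot, Python's standard `urllib.robotparser` module can parse the lines and answer `can_fetch`. This is a small sketch using the thread's placeholder hostnames; it tests only the logic of the file's contents, not whether Google has fetched the file from each subdomain.

```python
from urllib.robotparser import RobotFileParser

# The file quoted above, one directive per line, for the mirror hosts.
BLOCK_ALL = ["User-agent: *", "Disallow: /"]
# An empty Disallow value allows everything; suitable for www.
ALLOW_ALL = ["User-agent: *", "Disallow:"]


def googlebot_may_fetch(robots_lines, url):
    """Parse robots.txt lines and report whether Googlebot may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch("Googlebot", url)


print(googlebot_may_fetch(BLOCK_ALL, "http://www2.mydomainname.com/index.html"))  # False
print(googlebot_may_fetch(ALLOW_ALL, "http://www.mydomainname.com/index.html"))   # True
```

Because a crawler fetches /robots.txt separately for each host name, the mirror hosts would need to serve the first file and www the second.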
#12
If the four subdomains resolve to the same physical place, then there is absolutely no need to have four subdomains in the first place.
#13
I haven't got a clue why there are extra subdomains or what they actually resolve to.
#14
:scratches head and backs away slowly:
#15
Many thanks for all the input.
The 4 subdomains all go to the same web folder on an IIS-based server, and all the pages are written in plain HTML. So I can't put a separate robots.txt in the root directory of each subdomain. Any ideas?
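When every host name shares one folder, one possible approach is to generate robots.txt per request from the Host header, so www gets an allow-all file and www1 to www3 get a block-all file. The Python WSGI app below is a hypothetical sketch of that idea only: the site in question is static HTML on IIS, so routing /robots.txt to a script (or expressing the same host test as a server-side rewrite rule) depends on the IIS setup and is not shown. `CANONICAL_HOST`, `robots_body` and `app` are illustrative names, not part of any existing configuration.

```python
# Hypothetical sketch: choose robots.txt content by the Host header.
CANONICAL_HOST = "www.mydomainname.com"  # placeholder from the thread

# An empty Disallow line permits crawling of everything.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
# "Disallow: /" asks compliant crawlers to skip the whole host.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"


def robots_body(host):
    """Return robots.txt text for the given Host header value."""
    # Drop any ":port" suffix and compare case-insensitively.
    # A missing or empty Host header falls through to the block-all file.
    hostname = host.split(":", 1)[0].strip().lower()
    return ALLOW_ALL if hostname == CANONICAL_HOST else BLOCK_ALL


def app(environ, start_response):
    """Minimal WSGI app that answers only /robots.txt."""
    if environ.get("PATH_INFO") != "/robots.txt":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found\n"]
    body = robots_body(environ.get("HTTP_HOST", "")).encode("ascii")
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` would serve it, and requesting /robots.txt with different Host headers shows the two files.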
#16
Many sites set up extra subdomains for load balancing, but we just use ours to track sales for each sales representative.
#17
Rob