Change of web structure - effects on Google Index
splinters
06-20-2005, 09:17 PM
I've recently changed our web site structure from querystring-based URLs (company.com/genericpage.asp?parent=x&child=y) to folder-structured URLs (company.com/parent_x/child_y/pagename.asp). The domain name and page content are both unchanged. For backward compatibility (users' bookmarks, etc.), the querystring form of the URL still works. Will this adversely affect the indexing of our site's pages on subsequent Googlebot crawls because of the apparent duplicate page content?
If so, what is the best strategy for getting the site fully indexed by its new URL structure?
By the way, no redirect is being used to handle both forms of URL - I'm using a mod_rewrite-style URL rewriting method.
alexo
06-20-2005, 11:03 PM
Hmm... good question.
I'm in the same situation.
Marcia
06-20-2005, 11:21 PM
For Apache you'd have to redirect the old pages to the new ones server-side. There's a tutorial in the Dynamic Websites Forum here on how to do that. But with IIS it's different - the equivalent is ISAPI Rewrite.
What you want to do is redirect those URLs server-side, which is transparent to user agents. That way you avoid ending up with duplicate content.
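To make the redirect idea concrete, here is a rough sketch in Python (not ASP or Apache config) of the mapping a server-side 301 would perform. The function name `redirect_for`, the `PAGE_NAME` placeholder, and the assumption that the folder names are simply `parent_` and `child_` plus the querystring values are all hypothetical; the thread does not say how the real site derives them.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical placeholder: the thread never says how the final page name
# is chosen, so a fixed name is assumed here.
PAGE_NAME = "pagename.asp"


def redirect_for(url):
    """Return (status, location) for an old querystring URL, else None."""
    parts = urlsplit(url)
    if parts.path != "/genericpage.asp":
        return None  # not an old-style URL, so serve it normally

    query = parse_qs(parts.query)
    parent = query.get("parent", [None])[0]
    child = query.get("child", [None])[0]
    if parent is None or child is None:
        return None  # incomplete querystring, nothing sensible to map to

    # 301 marks the move as permanent, so the old URL is replaced
    # rather than kept alongside the new one.
    return 301, f"/parent_{parent}/child_{child}/{PAGE_NAME}"


print(redirect_for("http://company.com/genericpage.asp?parent=x&child=y"))
```

The point of the sketch is only that each old URL answers with a permanent redirect to exactly one new URL, instead of both URLs returning the same page with a 200.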
splinters
06-20-2005, 11:31 PM
Thanks Marcia. I'm actually using Context.RewritePath() in ASP.NET, which does the same thing as ISAPI Rewrite, to transparently map the folder-structured URLs to their querystring-style equivalents. That's not my concern. Google has already indexed the site using the querystring-style URLs, but now the site is accessed using the folder-structured ones. Won't the crawler recognize identical content for 2 different URLs and hence not index the new-style (folder-structured) URLs?
kool aussie
06-21-2005, 12:45 AM
Splinters, I've recently done something similar and used robots.txt to tell search engines not to index the old company.com/genericpage.asp?parent=x&child=y (except I'm using php). Google, instead of dropping these, has put the 'php' pages into its supplemental index, which may be causing a 'duplicate page' penalty, as my site suffered a major drop in the Bourbon update.
In some cases these 'php' pages no longer exist due to the way I've rewritten the website, but Google still lists them in its supplemental index. In fact, the supplemental index also includes pages which I've specifically disallowed in the robots.txt file.
Does anyone know another way to stop Google adding pages to the supplemental index? Google now says I have over 80,000 pages in my site (site:domain.com) when it should be only about half that.
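For anyone wanting to check what a robots.txt rule like this actually blocks, here is a small Python sketch using the standard library's `urllib.robotparser`. The robots.txt contents, the `can_crawl` helper, and the example URLs are assumptions made up for illustration, not the real file from the site above. Keep in mind that a Disallow rule only stops a compliant crawler from fetching a URL; it does not by itself remove a URL that is already listed, and a blocked crawler never gets to see a redirect on that URL either.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks the old querystring-style page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /genericpage.php
"""


def can_crawl(url, agent="Googlebot"):
    """Report whether the rules above let the given agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


old_url = "http://company.com/genericpage.php?parent=x&child=y"
new_url = "http://company.com/parent_x/child_y/pagename.php"

print("old URL crawlable:", can_crawl(old_url))
print("new URL crawlable:", can_crawl(new_url))
```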