#1  
01-24-2006
CaliforniaGirl
Member
 
Join Date: Sep 2005
Location: Sydney, Australia
Posts: 42
SEO For Google News

Before I tell my client that their dynamic URL structure needs to be rethought, I would like to check the following.

My client is a prominent news site. I read the following on Pandia:

Quote:
Google News
This is also a prerequisite for getting indexed by Google’s news spiders. However, Google do not care about RSS feeds when it comes to their Google news service.

Instead they will crawl your news home page (i.e. the web page itself) on a regular basis, often several times a day.

In order to have your articles crawled by Google News, their URLs must contain a number consisting of at least three digits.

For example, the Google news crawler will not crawl articles with the following URLs:
www.pandia.com/news/article13.html
http://www.pandia.com/getting-indexe...ogle-news.html

It can crawl these pages:
www.pandia.com/news/20112005/article.html
http://www.pandia.com/news/getting-i...ews/23467.html
Can anyone confirm or deny?

Cheers, CaliGirl

PS the article: http://www.pandia.com/sew/118-how-to...h-engines.html
  #2  
01-25-2006
CaliforniaGirl
Further research reveals

Further to my research on URL structures for news sites, particularly in reference to Google News, I have found the following information:

1. Do not reuse URLs. The numbers should be static and unique for each article.
2. Do not start filenames with the year, and do not include an ID that begins with the year.

Not acceptable: http://www.newnews.com/2006topnews.html
as this might display a different article every day

3. The crawler expects to see numbers in article URLs so that it can tell an article from a section page on a news site. The recommendation is to use between 3 and 6 digits.

4. The numbers can also appear in folder names, for example http://www.newnews.com.au/2006/01/25...executions.htm

Summary:
The Google news crawler will not crawl articles with the following URLs:
www.newnews.com.au/news/article15.html
www.newnews.com.au/us_prisoner_executions.htm

It can crawl these pages:
www.newnews.com.au/news/20112005/article.html
http://www.newnews.com.au/news/us_pr...htm/23467.html

If anyone has more information, rules, or references to add, please do so. The information above is second-hand only; nothing here has been explicitly set down by Google News itself.
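
For anyone who wants to test a batch of URLs against the rules collected above, here is a rough Python sketch. It is only my reading of these second-hand rules, not anything published by Google News. The function name `looks_crawlable` is made up for illustration, and I am assuming that "a number" means a run of three or more consecutive digits in the URL path (not the domain or query string) and that "begins with the year" means a filename starting with a four-digit year such as 2006.

```python
import re
from urllib.parse import urlparse

# Assumption: the rule is satisfied by 3+ consecutive digits anywhere in the path.
DIGIT_RUN = re.compile(r"\d{3,}")
# Assumption: a "year" at the start of a filename is 19xx or 20xx.
YEAR_PREFIX = re.compile(r"^(19|20)\d{2}")


def looks_crawlable(url: str) -> bool:
    """Return True if the URL fits the second-hand rules from this thread."""
    # urlparse only separates host and path reliably when a scheme is present.
    if "://" not in url:
        url = "http://" + url
    path = urlparse(url).path
    filename = path.rsplit("/", 1)[-1]

    has_number = DIGIT_RUN.search(path) is not None
    starts_with_year = YEAR_PREFIX.match(filename) is not None
    return has_number and not starts_with_year


if __name__ == "__main__":
    # All four URLs are taken from the examples in this thread.
    for candidate in (
        "www.newnews.com.au/news/article15.html",
        "www.newnews.com.au/us_prisoner_executions.htm",
        "www.newnews.com.au/news/20112005/article.html",
        "http://www.newnews.com/2006topnews.html",
    ):
        print(candidate, looks_crawlable(candidate))
```

With these assumptions, only the third URL passes. The first has just two digits, the second has none, and the fourth has a filename that starts with the year.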

Thanks,
CaliGirl
  #3  
01-25-2006
Archiseek
Member
 
Join Date: Jan 2006
Location: Ireland
Posts: 5
Google News includes my original news stories. I operate a virtual news clippings service on architecture, and as part of that I publish original news. Google knows the location of the index page that lists only the original stories, and it normally indexes them within 7 or 8 minutes of publication.

My links are in the format:

site.com/news/year/000001.html

Never reuse a URL; it confuses the results.
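
As a rough illustration of that format, a Python sketch might look like the following. The helper name `make_url_builder` is hypothetical, and I am assuming a single counter that never resets, so that no number is ever handed out twice. A real site would keep the counter in its database rather than in memory.

```python
import itertools


def make_url_builder(base: str = "site.com/news", start: int = 1):
    """Return a function that builds article URLs with a never-repeating ID."""
    # Assumption: one global counter, zero-padded to six digits as in 000001.html.
    counter = itertools.count(start)

    def build(year: int) -> str:
        return f"{base}/{year}/{next(counter):06d}.html"

    return build


build_url = make_url_builder()
first = build_url(2006)   # site.com/news/2006/000001.html
second = build_url(2006)  # site.com/news/2006/000002.html
```

Because the counter only moves forward, each article gets its own URL even when several are published in the same year.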