Search Engine Watch
SEO News

Go Back   Search Engine Watch Forums > Search Engines & Directories > Google > Other Google Issues
Old 02-24-2005   #1
SEORoy
Member
 
Join Date: Feb 2005
Posts: 33
How to get Google Spider to Crawl a site?

Hi,

I have been looking at the Googlebot reports for some of our sites. On one site, Googlebot does not seem to stick around for long; it seems to crawl just one page and then leave.

Can you please point me to some articles, or advise me on how to get Googlebot to stay longer on the website, in other words, to crawl more pages? Or do I have to update my website every 2-3 days so that Googlebot sees some changes and decides it needs to crawl some more?

Any comments/suggestions/ideas will be much appreciated.
Old 02-24-2005   #2
mcavill
Member
 
Join Date: Jun 2004
Location: UK
Posts: 7
More links pointing at your site (including deep links) usually result in deeper, more frequent crawling. Keeping the content fresh also seems to help.
Old 02-25-2005   #3
azhariqbal
Member
 
Join Date: Sep 2004
Posts: 12
All you need are good, proper inbound links that tie the whole site together and show the spider the way to crawl it. A site map page is also a good option for guiding the spider through the whole site.
Old 02-25-2005   #4
Mikkel deMib Svendsen
 
 
Join Date: Jun 2004
Location: Copenhagen, Denmark
Posts: 1,576
It could be that something "confuses" the crawler on your site or makes it not want to index the site fully, even if you have all the links in the world. It is not always easy to identify all such indexing barriers, but basically you have to:

1) Make sure your entire site can be accessed with a text-only browser following (text) links from page to page.

2) Make sure that pages can be accessed from only ONE URL, and that there are no endless (or near-endless) loops of links (for example, endless calendars or auto-generated pages)
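For the endless-loop case, a common complement to fixing the links themselves is to fence off the auto-generated areas in robots.txt. A minimal sketch; the paths are invented and would need to match your own site:

```
User-agent: *
# Auto-generated date pages go on (nearly) forever
Disallow: /calendar/
# Printer-friendly duplicates of normal pages
Disallow: /print/
```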
Old 02-25-2005   #5
Michael Martinez
Member
 
Join Date: Jul 2004
Posts: 336

Every page should have a text-only link to your site map. Your site map should consist of text-only links. Don't depend on fancy navigation tools like JavaScript or Flash.
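To illustrate (the file and function names here are only examples): the first link below is one a crawler can follow; the second, script-driven one is effectively invisible to it.

```html
<!-- Crawlable: a plain text link to the site map -->
<a href="/sitemap.html">Site Map</a>

<!-- Not crawlable: navigation that only works with JavaScript -->
<a href="javascript:openPage('products')">Products</a>
```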
Old 03-04-2005   #6
SEORoy
Member
 
Join Date: Feb 2005
Posts: 33
Thanks, all of you, for your replies and ideas.

From most of my Googlebot reports I can see that the spider crawled through 2-3 pages and then came back the next day and crawled through 2-3 more.

The other question would be how long the spider stays on/crawls your site at each visit, and how we can, if at all possible, get it to stay a bit longer.

Also, I'm not sure whether it is actually an issue if the spider does not crawl more than 2-3 pages per visit.

Any ideas/advice will be much appreciated.
Old 03-04-2005   #7
Marcia
 
 
Join Date: Jun 2004
Location: Los Angeles, CA
Posts: 5,476
Welcome to the forums, Roy.

If a human visitor can navigate the entire site by following text links, then a crawler should have no problem. Sometimes when a site is brand new only a few pages are crawled, but eventually an entire site can be crawled in a single session.
Old 03-04-2005   #8
Michael Martinez
Member
 
Join Date: Jul 2004
Posts: 336

In my experience (confirmed by conversations with engineers at both Google and Inktomi), there are two kinds of crawls.

The regular crawl is where the spiders only retrieve specific URLs which are placed in a queue. As your site grows and is linked to by an increasing number of other sites, more of your page URLs will be placed into the regular queue.

However, from time to time the search engines may initiate a special "deep crawl" of your site, in which the spider concentrates on your site alone, following links as soon as it discovers them. A deep crawl like that is fairly rare for the majority of sites, although MSN and Google seem to be retrieving all actively updated pages from my forums these days.
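To make the distinction concrete: a deep crawl behaves roughly like a breadth-first walk of one site's link graph, discovering links as it goes, rather than pulling pre-queued URLs. A rough sketch; the toy URLs and the `fetch` callback are invented for illustration:

```python
from collections import deque

def deep_crawl(start_url, fetch, max_pages=100):
    """Breadth-first "deep crawl" sketch: follow every in-site link as
    it is discovered, instead of pulling single URLs from a global queue.
    `fetch(url)` is assumed to return the list of links on that page."""
    seen = {start_url}
    queue = deque([start_url])
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for link in fetch(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order

# A toy site: each page's outgoing links, keyed by URL (all invented).
site = {
    "/": ["/sitemap", "/products"],
    "/sitemap": ["/", "/products", "/about"],
    "/products": ["/widget-1"],
    "/about": [],
    "/widget-1": [],
}
print(deep_crawl("/", site.get))  # reaches all five pages in one session
```

Note how a site map page linked from the home page lets even this naive walk reach every page within two hops.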

I did not learn why Google ended up deep-crawling my sites on the occasions where I had to call them (their crawler got stuck and nearly crashed my server). When Inktomi's crawler got stuck, they told me the deep crawl had been requested by one of their customers.

If you have dynamic URLs (such as those generated by forums and extensive product catalogues), the chances are very good that you will know when you are being deep-crawled: the crawlers will affect your server performance.

Less than two weeks ago, my server started running very slowly. Our statistics indicated that at least two crawlers (Google's and MSN's) were actively retrieving a large number of pages. We have been watching for more behavior like that, because sometimes it becomes a problem and we have to modify our robots.txt file to tell the crawlers to leave us alone for a while.

I don't like doing that, but sometimes it's necessary.
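For what it's worth, the kind of temporary robots.txt change I mean looks something like this. The path is an example only; note that msnbot and Slurp honor the non-standard Crawl-delay directive, while Googlebot does not, so for Googlebot the only blunt instrument is Disallow:

```
# Ask crawlers that support it to slow down
User-agent: msnbot
Crawl-delay: 30

# Googlebot ignores Crawl-delay, so keep it out of the
# heaviest dynamic section instead (temporarily)
User-agent: Googlebot
Disallow: /forums/
```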
Old 03-05-2005   #9
JohnW
 
 
Join Date: Jun 2004
Location: Virginia Beach, VA.
Posts: 976
>modify our robots.txt file to tell the crawlers to leave us alone for a while.

Wouldn’t it make more sense to clean up the problems and add proper resources? Is this site profitable/important to you, and are the pages relevant? If so, I cannot imagine blocking Gbot with robots.txt.

>Every page should have a text-only link to your site map.

That's not right. Did you mean to say to use text links from the site map to every page?

SEORoy said:
>update my website every 2-3 days so that Googlebot sees some changes and decides that it needs to crawl some more?

In addition to some of the other things mentioned here, yes. Assuming Gbot already knows about the pages and you are simply trying to get them crawled more frequently, adding/refreshing content is another very good idea.
Old 03-05-2005   #10
Michael Martinez
Member
 
Join Date: Jul 2004
Posts: 336

Quote:
Originally Posted by JohnW
>Every page should have a text-only link to your site map.

That’s not right. Did you mean to say - to use text links from the site map to every page?
Every page should have a text-only link to the site map.
Old 03-05-2005   #11
JohnW
 
 
Join Date: Jun 2004
Location: Virginia Beach, VA.
Posts: 976
Let me be clear: that is absolutely incorrect. I'm not trying to start an argument here, but that is some of the worst advice I have seen posted here and I can't let it slide.

The proper thing to do is link to your site map from only enough pages to make sure it gets spidered (the home page alone is usually enough), and use the site map to identify other pages. Unless, that is, you are trying to get your site map itself to rank highly, which doesn't sound like smart SEO to me.
Old 03-06-2005   #12
Michael Martinez
Member
 
Join Date: Jul 2004
Posts: 336

Quote:
Originally Posted by JohnW
Let me be clear - that is absolutely incorrect.
Let me be even more clear. You are completely wrong.

EVERY PAGE SHOULD LINK TO THE SITE MAP. The most important reason is that there is no way of knowing which page a user will enter a site from. Getting to the site map should be easy and quick. "Users" includes spiders.

Now. You've stated your position, I have stated mine.

That should be sufficient.
Old 03-06-2005   #13
Mikkel deMib Svendsen
 
 
Join Date: Jun 2004
Location: Copenhagen, Denmark
Posts: 1,576
Sorry, guys, but this is not a question of "correct" or "incorrect"; those are in fact the only really incorrect statements here! It all depends on the site, the design, and other issues. Is it a 10-page site? A 1,000-page site? Or a site with 17 million pages? What kind of navigation scheme do you use?

What is, in my opinion, very wrong is to believe there is only one "truth" to SEO! The fact is that 99% of what we do is a matter of personal experience and beliefs. Please do not turn this into a "fact" discussion.
Old 03-06-2005   #14
JohnW
 
 
Join Date: Jun 2004
Location: Virginia Beach, VA.
Posts: 976
Mikkel, thanks for the adult viewpoint. You make a good point: there are different types of sites, some much larger than others. Your point about correct vs. incorrect "facts" is also valid. And opinions are like, well, like the old story goes: everybody has one.

Here's mine. On larger sites, it may make sense to use multiple site maps (maybe one per category) and then link all of the site maps together with a master site map. On a large or very large site, the importance of a proper internal linking structure stands out even more than on a small site.

Regardless of the size of the site, IMO it would be poor practice to link every page on the site to the site map, because doing so would create unnecessary meshing in the site design and dilute the focus of internal link popularity (call it PR if you like). The only exception I can think of would be if one were trying to gain rankings for the site map page for some reason, and that does not make sense in most cases.

Michael Martinez: we seem to be going down a rabbit trail here. Could I invite you to start a new thread about proper site-map implementation?
Old 03-06-2005   #15
SEORoy
Member
 
Join Date: Feb 2005
Posts: 33

I really appreciate the information in this discussion; I am trying to understand each of your comments.

While I can understand where Mikkel is coming from, I guess there won't be anybody out there who can be sure of a list of steps one can carry out to get a high ranking (because at the end of the day, that's what everybody is in SEO for: to achieve high rankings for some keywords). So most of the information posted will be based on users' experience.

The problem is that for people like me, I mean newbies, it's easy to misunderstand.

Anyway, I would like to know:

1. Is it a bad idea, or are there any shortcomings, to have a link to the sitemap on every page of my website? I agree with Michael: users can land on any page of my website, and if users can, spiders can too, so having this sitemap link on every page looks like a good idea; it also seems to give the spider more of a chance to crawl more pages on my site. Does this reasoning look okay to you, or am I missing anything obvious? I am very sorry if this looks like a stupid question.

2. I have noticed that one of my sites, with PR 5, gets a visit from Googlebot basically every day, but another of my sites, with PR 4, gets visited every 2-3 days, sometimes after a longer period. So I am thinking maybe there's some "policy" by which Googlebot has been instructed to visit all sites with PR 5+ every day; but then there would be so many sites with PR 5 that it could not manage to visit them all in a day, so somehow that does not make sense. Does anyone have an idea whether PR plays any role when Googlebot decides which sites to visit?

3. I have read in a couple of places that Googlebot will eventually visit your site more frequently if your website content changes frequently (I think by more than 10KB). But from my observations up to now, it seems that no matter how much I modify my PR 4 site, and I change nothing on my PR 5 site, Googlebot still visits the PR 5 site daily while the PR 4 site gets visited only now and then, even though there have been quite a few changes on it. However, I do understand that it may take quite a while for Google to find out that my PR 4 site is updated regularly; perhaps they will then decide to give me a PR 5, and perhaps the site will start appearing in at least the top 20 for my most competitive keyword, for which we are currently in the top 5 on Yahoo and MSN but nowhere to be found on Google.

Well, these are the thoughts in a newbie's diary right now. Please advise if you think I am on the wrong track or have made some wrong assumptions.
Old 03-06-2005   #16
PhilC
Member
 
Join Date: Oct 2004
Location: UK
Posts: 1,657
I think you are both correct about links to the sitemap. Michael Martinez is correct if the sitemap is there for people's benefit - a link to it from every page is the best thing to do. If it is there for the benefit of spiders, then a link to it from the home page only is the best thing to do - as JohnW suggested. It comes down to the purpose of the sitemap page.
Old 03-06-2005   #17
lots0
 
Posts: n/a
Then there are people like me...
I usually put more than one site map on a site, because, as has been pointed out here, there are different reasons to put up a site map and different expectations of what a site map should do.

So why limit yourself to just one?
Old 03-06-2005   #18
Mikkel deMib Svendsen
 
 
Join Date: Jun 2004
Location: Copenhagen, Denmark
Posts: 1,576
I will try to address your questions as precisely as possible, SEORoy.

1) From an SEO point of view I often find it best to add a link to the sitemap from every page. On large websites this can be either to the master sitemap or to the relevant category sitemap. I have yet to see a design where it could not fit in.

2) Link popularity definitely plays a role in how often a website is spidered, but it's far from the only factor. Server response, content updates, and internal link structure also play a role. So I don't think it's as simple as "PR 5 gets spidered daily, PR 4 every 3-4 days...".

3) In my experience, a frequently updated website can indeed play a very important role in indexing as well as ranking. I have even seen examples of a new, highly dynamic section, such as a forum, increasing rankings for non-forum pages on other parts of the same site. However, in my experience it is the total dynamic of the site that counts; it's not just a simple question of how many new pages are added or how much (in KB) is changed.
Old 03-06-2005   #19
Nacho
 
 
Join Date: Jun 2004
Location: La Jolla, CA
Posts: 1,382
Site maps are only one tactic for improving the crawler's ability to take your site back to the search engine's index. I believe the purpose of this thread is a lot more than just talking about one tactic, don't you?

I have always believed that clean navigation, clean URLs, no duplication, fast pages, readable pages, clean code, and a lot more are ways to let the crawler know you have a site worth deep-crawling. Then again, crawling is one thing, but re-crawling is much more important (just as repurchases are in the retail world). This goes back to what we have talked about before, which is the "time to refresh" a webpage in the search engine's index. It's about letting the search engines know you have a popular website.
Old 03-06-2005   #20
Michael Martinez
Member
 
Join Date: Jul 2004
Posts: 336

Quote:
Originally Posted by SEORoy
1. Its a bad idea / or if there are any shortcomings in having a link to the sitemap on every page of my website.. I agree with Michael... as Users can land on any pages of my website...if users can..that means spiders also can...so looks like having this sitemap on every page is a good idea... and it also looks like... giving the spider more chance to crawl more pages on my site... Does this comment look okay to you ?... Or am i missing any obvious things... I am very sorry in case this looks like a stupid question.
The concern about link popularity making a site map outrank other content pages is based either on a misunderstanding of how the search engines apply link popularity, or on the experience of having looked at sites which were so poorly designed that the site maps DID outrank other content pages (I have seen that happen myself).

The site map is the most important part of a Web site. It is the glue which binds the site together. It is also the key validating factor in a Web site that separates the real content pages from the spam pages. If a page won't link back to the site map, that is a red flag. Red flags, by themselves, don't necessarily cause anything bad to happen, but it is reasonable to ask why a page won't link to the site map.

Quote:
2. I have noticed that one of my site with PR 5... gets a visit from Googlebot basically everyday... but one of my other site with PR 4... gets visited by Googlebot every 2-3 days...sometimes after longer period... so I am just thinking maybe there's some "policy" by Googlebot...where its been instructed to Visit all sites with PR 5+ everyday... but then there would be so many sites with PR 5 how would it manage to visit all those site in a day... So somehow it does not fit/make sense.. Does anyone have some idea if PR plays any role when Googlebot decide which sites to visit?
I have PR4 and PR3 sites which get visited regularly. While I believe that Google does return to higher PR sites more frequently, the chief value in having links from those sites lies in the fact that Google will follow them (and therefore follow incoming links to respider your site).

Google operates multiple spiders at a time, so crawling millions of pages every day is not a challenge for them. In fact, the "crawling" consists only of retrieving some information about the page, comparing it to the last time the page was cached, and then retrieving the page itself if its current information indicates it has changed.

The rest of the work done by Google (parsing the page, reindexing the content, etc.) is handled by programs other than the spiders.
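The check the spider makes before a full download can be sketched like this. It is only a simplification of the real logic (real crawlers also use ETags and conditional GETs), and the headers and dates below are made up:

```python
from email.utils import parsedate_to_datetime

def should_refetch(cached_headers, probe_headers):
    """Return True if a lightweight probe suggests the page has changed
    since it was last cached, so the full body is worth downloading."""
    cached = cached_headers.get("Last-Modified")
    probe = probe_headers.get("Last-Modified")
    if cached is None or probe is None:
        return True  # no freshness information: re-download to be safe
    return parsedate_to_datetime(probe) > parsedate_to_datetime(cached)

# Headers as a crawler might have stored them, and as a fresh probe returns them
old = {"Last-Modified": "Fri, 25 Feb 2005 10:00:00 GMT"}
new = {"Last-Modified": "Fri, 04 Mar 2005 09:30:00 GMT"}
print(should_refetch(old, new))  # page changed: fetch the body
print(should_refetch(old, old))  # unchanged: skip the download
```

The parsing and reindexing Michael describes would then happen downstream of this decision, in separate programs.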

Quote:
3. I have read couple of places that ...Googlebot will eventually visit your site more frequently if your website content changes (I think by more than 10KB) frequently... But from my observation upto now..seems that no matter how much modification to my PR 4 site ...and none on my PR 5 site... Googlebot still visits my PR 5 site on a daily basis... while my PR4 site gets visited ...only now and then..even though there's been quite some changes on it. However I do understand that it may take quite a while for Google to find out that my PR 4 site is updated regularly... and perhaps they will decide to give me a PR 5 and perhaps the site will then start appearing in at least TOP 20 for my most competitive word..for which we are currently listed in TOP 5 in Yahoo and MSN...but nowhere to be found on Google.
Cross-linking your sites will help Google crawl the PR 4 site more frequently. In fact, I cross-link all my sites (keep in mind that they are related, without duplicating content), so Google is constantly following links from one site to another. You can guide Google throughout your network of Web sites if you provide good, thorough linkage.

There are, of course, many people who operate Web sites as third party administrators. Cross-linking between their client sites like that is not advisable (certainly not without the clients' knowledge or permission). And, in fact, most people would probably do it wrong anyway (by trying to "hide" it as a reciprocal link program).

I mostly link to relevant content. In a few places, I "plug" specific content in such a way (such as in site news articles) that the context of the link is clear from the surrounding text.