Search Engine Watch Forums > Search Engines & Directories > Google > Google Web Search
Old 05-15-2006   #1
atlanta404
SEO for process engineering instrumentation
 
Join Date: Apr 2006
Posts: 68
Why won't Google do a "deep dive" of my site?

Hello, and thanks in advance for reading my post.

I work on an e-commerce site in which the links on the homepage allow visitors to shop by category (flowmeters, controllers, etc) or shop by manufacturer (Fuji, Partlow, etc). In addition to these links, my homepage also has links to Support, About Us, Security, Shopping Cart, etc.

The Google spiders have come around a couple of times, and the pages that have been indexed (in addition to my homepage) are the very high level pages that link directly off my homepage - the landing pages for Flowmeters, Partlow, About Us, etc. The spiders haven't gone any deeper than these landing pages to the individual product pages, which is what I want them to index!

Any suggestions???

Thanks again.
Old 05-15-2006   #2
PhilC
Member
 
Join Date: Oct 2004
Location: UK
Posts: 1,657
Are the lower pages actually spiderable? You should make sure they are.

Do the lower pages' URLs contain too many parameters?

Do they contain session IDs, such as "id="? Google generally won't crawl pages that include an "id" parameter.

Google's crawling depends a lot on PageRank. The higher the PR, the more often, and deeper, a site gets crawled. It's a good reason to get IBLs and build the PageRank in the site.

You could try using a sitemap that is linked to from the homepage.

You could also create and submit a Google Sitemap.
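For reference, a minimal Google Sitemap file is just an XML list of URLs. This is a sketch with hypothetical URLs (the 0.84 schema is the one Google's Sitemaps program uses):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
  <!-- One <url> entry per page you want Google to know about -->
  <url>
    <loc>http://www.example.com/</loc>
    <changefreq>weekly</changefreq>
  </url>
  <url>
    <loc>http://www.example.com/product.php?id=123</loc>
    <changefreq>monthly</changefreq>
  </url>
</urlset>
```

You then submit the file's URL through your Google Sitemaps account.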
Old 05-15-2006   #3
atlanta404
SEO for process engineering instrumentation
 
Join Date: Apr 2006
Posts: 68
Thanks PhilC! Here's some more information about my site... Any more suggestions?

My product pages (deepest pages) do contain "id=", but this hasn't prevented Google from spidering my other e-commerce site which is similarly structured...

The PageRank of my homepage is 1/10, but the product pages are a 0. I do have 8 or so inbound links to my homepage that Yahoo recognizes, but Google doesn't return any of these links when I enter "link:www.instrumart.com" into the search bar...


Quote:
Originally Posted by PhilC
Are the lower pages actually spiderable? You should make sure they are.

Do the lower pages' URLs contain too many parameters?

Do they contain session IDs, such as "id="? Google generally won't crawl pages that include an "id" parameter.

Google's crawling depends a lot on PageRank. The higher the PR, the more often, and deeper, a site gets crawled. It's a good reason to get IBLs and build the PageRank in the site.

You could try using a sitemap that is linked to from the homepage.

You could also create and submit a Google Sitemap.
Old 05-15-2006   #4
PhilC
Member
 
Join Date: Oct 2004
Location: UK
Posts: 1,657
There are some pages in Google's index that contain "id=" in the URLs, but Matt Cutts recently said that they don't index such pages, so maybe they are getting firmer about it. I'd change the URLs if it were me.
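One common way to change the URLs without rebuilding the store is an Apache mod_rewrite rule that maps clean URLs onto the existing script. This is a sketch; the script name and path pattern are hypothetical, not taken from the site in question:

```apache
# Serve /products/123 from the existing product.php?id=123
RewriteEngine On
RewriteRule ^products/([0-9]+)/?$ /product.php?id=$1 [L,QSA]
```

For this to help, the site's internal links also need to use the clean /products/123 form, so that's the only form the spider ever sees.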

If Yahoo! shows the IBLs, the chances are that Google also has them even if they don't show them. But if they haven't got them now, they will have them soon enough.

Even so, you should always be doing what you can to build up the IBLs. The more PageRank a site has, the better it gets spidered by Google. Also, the link text from IBLs is the most powerful ranking factor of them all, so you not only need IBLs for PageRank, you also need the links to have the right link text. I.e. a link to your site that has "click here" as the link text, won't do anything much for your rankings, but the same link that has "New York hotels" as the link text will help the site's ranking for 'New York hotels'.
Old 05-15-2006   #5
glengara
Member
 
Join Date: Nov 2004
Location: Done Leery
Posts: 1,118
Recently came across a site with problems that also used a secondary search by manufacturer; on the product pages this manufacturer list formed the majority of the textual content, as seen in the text-only version of the cache.

You might check that it's not the case with your site...

Old 05-15-2006   #6
pleeker
www.SmallBusinessSEM.com
 
Join Date: Jun 2004
Location: Washington state
Posts: 295
How new is the site in question? You mention that the spider has visited a couple times, which makes it sound like a relatively new site.

So in addition to the other replies, I'd just add that Google isn't crawling as voraciously as it has in the past. And Matt Cutts recently mentioned that under the new BigDaddy infrastructure, G has a different crawl priority than it used to have. Something to keep in mind.... but the other replies are also helpful and not to be ignored in favor of my "this is just how it is these days" post.
Old 05-15-2006   #7
Wilksy
Ben Wilks
 
Join Date: Mar 2006
Location: Various locations around Australia
Posts: 120
G'Day Atlanta 404,

The above advice is great. Be sure you have plenty of on-topic links to your site, not only to the homepage but also deep into your products; be sure to get a nice spread.

Another thing might be to look at the 'uniqueness' of your pages:
- do they have a significant amount of individual content per page?
- do you have a unique meta description on each page?
- do you have a logical internal navigation that themes your site? - see: http://www.webmasterworld.com/forum10003/3060.htm
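On the unique-description point, the usual approach on a database-driven store is to generate the title and meta description from each product's own fields. A minimal sketch (the field names and templates here are invented for illustration, not from the poster's site):

```python
# Sketch: building a unique <title> and meta description per product page
# from the product record. Field names and templates are hypothetical.

def build_meta(product):
    """Return a (title, description) pair unique to one product."""
    title = f"{product['name']} - {product['manufacturer']} {product['category']}"
    description = (
        f"Buy the {product['manufacturer']} {product['name']}, "
        f"a {product['category'].lower()} for {product['application']}. "
        f"Model {product['model']}."
    )
    # Search engines typically truncate snippets around 155-160 characters.
    return title, description[:160]

product = {
    "name": "PXR4 Temperature Controller",
    "manufacturer": "Fuji",
    "category": "Controllers",
    "application": "process engineering",
    "model": "PXR4TAY1-GV",
}
title, desc = build_meta(product)
print(title)
print(desc)
```

Because every page's text is driven by its own record, no two pages end up with the same title or description.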

This Big Daddy confusion seems to have hit some sites really hard; I'm sure the above could alleviate most of the indexing issues currently experienced.

Also, if your site is new, keep on keeping on; build so many links around the site that SEs cannot ignore you (be sure to vary your anchor text).

Cheers,

Ben
Old 05-15-2006   #8
Robert_Charlton
Member
 
Join Date: Jun 2004
Location: Oakland, CA
Posts: 743
Quote:
Originally Posted by atlanta404
The Google spiders have come around a couple of times, and the pages that have been indexed (in addition to my homepage) are the very high level pages that link directly off my homepage - the landing pages for Flowmeters, Partlow, About Us, etc. The spiders haven't gone any deeper than these landing pages to the individual product pages, which is what I want them to index!
atlanta404 - You don't say how many pages you have, but it sounds like it might be a large site.

To describe the situation in pre-BigDaddy terms, because I'm not sure exactly what's happening with the new infrastructure... Googlebot will spend only a limited amount of time spidering a given site, and the amount of time it will spend is directly influenced by PageRank. Within that time, Google will crawl only so wide or so deep into your site's structure.

Spider-friendly urls might help things by speeding up the crawl.

External inbounds going to your landing pages may also help the spiders go deeper into the site by providing deeper entry points.

Though unique meta descriptions shouldn't have anything to do with crawling, substantial similarity among pages, usually similarity in page content, will keep them from ranking and may get them dropped. I don't think meta descriptions would do that, but templated pages without much unique content, or very low PR pages with unique content but with identical titles, eg, often will go supplemental.

Depending on the size and structure of your site, it may well be that you're going to need a PR of 5 or 6 or 7 to get all your pages indexed and ranking.
Old 05-15-2006   #9
Wilksy
Ben Wilks
 
Join Date: Mar 2006
Location: Various locations around Australia
Posts: 120
Quote:
Originally Posted by Robert_Charlton
Though unique meta descriptions shouldn't have anything to do with crawling, substantial similarity among pages, usually similarity in page content, will keep them from ranking and may get them dropped. I don't think meta descriptions would do that, but templated pages without much unique content, or very low PR pages with unique content but with identical titles, eg, often will go supplemental.
Ok let's think like a search engine for a second here...

How many large sites with bunches of product pages and funky URLs actually have unique meta descriptions? Bugger all (wow, this could be what Big_D went after...). It's a clear signal that someone can use a database (wow) --> so what! Do these pages deserve to be in an index alongside other pages that are clearly telling the SE what they are about (or even in the index at all)? I think not.

Do some tests, Robert_Charlton, and you WILL find that not only will a unique meta description pull some pages out of the supplemental index, but they are also seen as a signal of quality (how could they not be?). More effort = more ranking.

(tedster noted this at WMW when the Big_D update began {long time back now..} and it makes total sense) See also a great post by Ian mcanerin http://forums.searchenginewatch.com/...threadid=11444 about them.

If Google finds some nice unique content with all the right signals, it will crawl substantially deeper; why would it not? You're forgetting to tell the SE what your pages are about, and doing so will certainly help if the pages are similar and the bot cannot work it out for itself.

Food for thought,

Ben
Old 05-16-2006   #10
Robert_Charlton
Member
 
Join Date: Jun 2004
Location: Oakland, CA
Posts: 743
Quote:
Originally Posted by Wilksy
Do some test's Robert_Charlton and you WILL find that not only will a unique meta description pull some pages out of the supplemental index, but they are also seen as a signal of quality (how could they not be). More effort = more ranking.
Wilksy - Google considers those fabled 100 factors, and IMO the meta description is simply not that high on the list of things to pay attention to, at least not for rankings. It should definitely be tuned to help present a more attractive snippet for searchers, though.

As far as testing goes, I think I inadvertently just ran a test. I've got a page on a client site that last week moved up to #5 on Google for a two word phrase with 70-million competing pages... reasonably competitive... and I realized that I'd never tuned the meta-description for this page. It's got the same generic description that I use on the unoptimized pages in the site, which doesn't happen to contain this particular phrase. For this page, Google had been pulling the snippet from the first paragraph.

So I tuned the meta description a few days ago, not for ranking purposes but to present a more attractive description for that particular search. My point is that the meta description doesn't seem to have affected the ranking much, but I'll let you know if tuning it gets us to #1.

PS - The question of the original poster is essentially about crawling and indexing, though... and here I don't think the meta description affects crawling at all, nor do I think it's a large enough factor to put a page in supplemental. I do agree with you that it's important for selling the page to the searcher.

Old 05-16-2006   #11
PhilC
Member
 
Join Date: Oct 2004
Location: UK
Posts: 1,657
We ran a public test a few months ago, and it was found that the Description tag isn't used at all for rankings in Google, but it is used by Yahoo!. I'd run a test a year earlier, and found the same thing.
Old 05-16-2006   #12
glengara
Member
 
Join Date: Nov 2004
Location: Done Leery
Posts: 1,118
I'd agree with Wilksy's comment. I don't know about ranking, but from what I've recently read from a number of experienced posters, a generic D tag on dynamic sites with low link juice may well hinder spidering/indexing.
Old 05-16-2006   #13
scrubs
UK
 
Join Date: Mar 2006
Location: UK
Posts: 169
The key here is to be proactive about developing your store. Don't sit back and hope G will index a whole list of similar titles, descriptions and content. You may experience short-lived success with a good listing, but personally I would focus on the bigger picture and what can be indexed as a whole.

As stated earlier in the discussion, you need to make sure the store is locatable by the spiders; for starters, introduce search-safe URLs and ditch the "id="s. A good site linking structure is important for getting the complete site crawled.

PhilC - Thanks for the update btw rgd Matt Cutts; this is something I had guessed may have happened to sharpen up results in the SERPs. This can only be a good thing, I feel - do you agree?

I am currently holding good positions for a store whose product pages are all scripted with unique META info and page content. Without the Description tag I feel this would not be possible; imo the pages' relevancy would decrease. In some cases the des tag is shown in the SERPs; in others it is not, and Google pulls text from the page instead.

Final thoughts - cover all bases, make sure all your pages are valued by G, and don't be lazy.
Old 05-16-2006   #14
Robert_Charlton
Member
 
Join Date: Jun 2004
Location: Oakland, CA
Posts: 743
Quote:
Originally Posted by Wilksy
If Google finds some nice unique content with all the right signals it will crawl substantially deeper, why would it not?
I'm no authority on the architecture of a search engine, but it seems to me that for a spider to decide that content is unique before it crawls it is simply putting the cart before the horse. How does the spider know the content is unique if it hasn't crawled it? To determine uniqueness, doesn't the engine need to gather the data, add it to the main database, and run a data sort to compare it with other pre-existing content?

Quote:
Originally Posted by glengara
...from what I've recently read from a number of experienced posters a generic D tag on dynamic sites with low link juice may well hinder spidering/indexing.
Low link juice, I think, is the determining factor. I think you do have to be careful on large dynamic sites to "ration" the number of pages you show to a spider, and particularly to limit similar pages, because you'll otherwise be wasting your link juice... but I've got to think that, at the spidering level, the search engine isn't making content judgements. Those, I'm assuming, are made later. Conjecture on my part... and this may be something that is changing with BigDaddy.
Old 05-16-2006   #15
glengara
Member
 
Join Date: Nov 2004
Location: Done Leery
Posts: 1,118
I could well see where, with "funky" URLs, G would need some enticement to dig deeper, and with generic titles and D tags it may well require higher than "normal" PR...
Old 05-17-2006   #16
Wilksy
Ben Wilks
 
Join Date: Mar 2006
Location: Various locations around Australia
Posts: 120
Quote:
Originally Posted by Robert_Charlton
Wilksy - Google considers those fabled 100 factors, and IMO the meta description is simply not that high on the list of things to pay attention to, at least not for rankings. It should definitely be tuned to help present a more attractive snippet for searchers, though.

As far as testing goes, I think I inadvertently just ran a test. I've got a page on a client site that last week moved up to #5 on Google for a two word phrase with 70-million competing pages... reasonably competitive... and I realized that I'd never tuned the meta-description for this page. It's got the same generic description that I use on the unoptimized pages in the site, which doesn't happen to contain this particular phrase. For this page, Google had been pulling the snippet from the first paragraph.

So I tuned the meta description a few days ago, not for ranking purposes but to present a more attractive description for that particular search. My point is that the meta description doesn't seem to have affected the ranking much, but I'll let you know if tuning it gets us to #1.

PS - The question of the original poster is essentially about crawling and indexing, though... and here I don't think the meta description affects crawling at all, nor do I think it's a large enough factor to put a page in supplemental. I do agree with you that it's important for selling the page to the searcher.
Wouldn't a true test be done on a site with multiple 'funky URLs' in the supplemental index? That's what we are talking about here, no? Try it on some similar pages in the supplemental index and let me know how you go. The pages are not in the supplemental index due to a lack of a description tag, but this tag can be used to tell the bot how similar pages are different and get those pages back into the regular index.

My reference on effort and rankings was about crawlability not ranking ; ) One leads to the other.

Also, while I am certainly no SE engineer myself, I have done a lot of thinking, and I am sure the copy taken back to the plex after a crawl is assessed, and the next time the site is crawled it's crawled according to its merits. Hence the better the signals, the better the crawl - makes sense, no?
Old 05-17-2006   #17
Robert_Charlton
Member
 
Join Date: Jun 2004
Location: Oakland, CA
Posts: 743
Quote:
Originally Posted by Wilksy
Wouldn't a true test be done on a site with multiple 'funky url's' in the supplemental index, that's what we are talking about here, no?? Try it on some similar pages in the supplemental index and let me know how you go. The pages are not in the supplemental index due to a lack of a description tag, this tag can be used to tell the bot how similar pages are different and get these pages back into the regular index.
Wilksy - Interesting thought experiment... it doesn't necessarily correspond to real-world optimizing, though.

When I optimize a database-driven catalog site such as the one described in the original post, with "funky urls" (if I understand what you mean by these), I generally have the programmers apply mod_rewrite to fix the URLs, and come up with a database-driven approach to generate optimized titles, descriptions, and content... all together. It would be a waste of everybody's time and resources to optimize only the description.

If I'm working on a static site, I'd take care of titles and descriptions at the same time. I do have a site coming up that, in its archives, has hundreds of more or less identical titles, all of which are returned only with the dupe filter off, even though the pages have original content... and I have proposed a series of step tests to determine how much we need to change things to bring relevant content in these pages to user attention. I've got to tell you though, that it's never occurred to me to change only the descriptions, without first changing titles or internal nav links or page headings. That simply doesn't make sense.

Quote:
Originally Posted by Wilksy
My reference on effort and rankings was about crawlability not ranking ; ) One leads to the other.
Here too, the thought experiment doesn't correspond to the real world situation. The pages have in fact been crawled. They're simply not differentiated enough from each other, by title or internal links, for Google to return them even on a site search, but they're indexed.

Quote:
Originally Posted by Wilksy
Also while I am certainly no se engineer myself I have done a lot of thinking and I am sure the copy taken back to the plex after a crawl is assessed and the next time the site is crawled it's crawled according to it's merits.
Well, I'm not so sure, because I'm not a search engineer either. I have no idea what kind of computation resources it would take for a crawler to carry this kind of individual page information with it when it decides to crawl a site. I have reason to believe that this wasn't happening in the past, and that time allocated to crawling a site because of PageRank was in fact the main factor that determined how much of a site got crawled.

I mentioned that BigDaddy might be changing things. I've seen hints from Matt's postings that there are various bots from Google that are now "co-operating" with each other, so the above might make sense.

Quote:
Originally Posted by Wilksy
Hence the better the signals the better the crawl, makes sense no??
Again, if the description is not a factor that's used in ranking, and many of us believe that it's not, why should it be an important enough factor to influence crawl?

And again, returning to the original question, and to several earlier posts on this thread, there are some other well-known factors that need to be taken care of first. Of course, descriptions ought to be fixed. I don't think they're the source of the crawling problem, though.
Old 05-18-2006   #18
atlanta404
SEO for process engineering instrumentation
 
Join Date: Apr 2006
Posts: 68
Thanks for all of the great posts!

I've checked my site's log files, and Google has been at the very deepest level pages of my site... But for some reason, Google has chosen not to index them.
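One way to verify this kind of crawl activity is to tally Googlebot requests per URL from the access log. A sketch, with invented log lines in Apache's combined format (note the user-agent string can be spoofed, so serious checks should also verify the requesting IP):

```python
# Sketch: counting Googlebot requests per URL in an access log.
# The sample log lines are made up for illustration.
import re
from collections import Counter

LOG_LINES = [
    '66.249.66.1 - - [15/May/2006:10:01:02 -0400] "GET / HTTP/1.1" 200 5120 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [15/May/2006:10:01:09 -0400] "GET /product.php?id=42 HTTP/1.1" 200 7300 "-" "Googlebot/2.1"',
    '10.0.0.5 - - [15/May/2006:10:02:00 -0400] "GET /product.php?id=42 HTTP/1.1" 200 7300 "-" "Mozilla/4.0"',
]

request_re = re.compile(r'"GET (\S+) HTTP/1\.[01]"')

def googlebot_hits(lines):
    """Count requests per URL where the user-agent mentions Googlebot."""
    hits = Counter()
    for line in lines:
        if "Googlebot" not in line:
            continue
        m = request_re.search(line)
        if m:
            hits[m.group(1)] += 1
    return hits

print(googlebot_hits(LOG_LINES))
```

Seeing deep product URLs in this tally confirms the pages were fetched, which is a separate question from whether Google chose to index them.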

What do you think???
Old 05-18-2006   #19
PhilC
Member
 
Join Date: Oct 2004
Location: UK
Posts: 1,657
Matt Cutts gave us some new information a few days ago, which concerns this topic. With the Big Daddy update, Google now...

1) intentionally crawls more pages than it will index, so pages will be crawled that won't get into the index.

2) has new criteria about how deep and how often to crawl a site. This is the reason why many sites are having pages dumped from the index, and why many sites won't get all their pages into the index, even though they may be crawled.

You can read what Matt has to say, and the subsequent posts which include some angry ones from me at http://www.mattcutts.com/blog/indexi...line/#comments
Old 05-19-2006   #20
glengara
Member
 
Join Date: Nov 2004
Location: Done Leery
Posts: 1,118
Read your posts, Phil, and couldn't quite see what the excitement was all about - haven't PR/links always determined site indexing?

Though it's been known for a long time in our circles, it's only relatively recently that G has suggested gaining links in their guidelines; maybe MC was simply reiterating it for a wider audience?