At a Loss With Google Indexing A Site.
briandoakes
03-10-2005, 12:18 PM
Hello,
One of the sites that I do the creative work for doesn't seem to be getting fully indexed by Google.
We have well over 40,000 pages on the site... but only about 14,000 of them get indexed.
I am at a loss, because the CEO keeps asking what the issue is. I am not an SEO guy... I am more of a design and creative person who was given the responsibility of looking into this.
I have exhausted all of my outlets and resources.
If anyone could take a quick look at http://www.globalindustrial.com and let me know if there is anything you see that can be done to improve these rankings, I would be greatly appreciative.
thanks,
Brian
Mikkel deMib Svendsen
03-10-2005, 01:24 PM
It looks like there are several reasons your entire website is not getting indexed. This is not uncommon for large dynamic websites such as yours. I could give you a few hints here, but to be honest what you really need is to go through the entire site, investigate each and every indexing barrier, and have each of them removed. The case is simply too complex to give a simple and useful answer.
However, I can say that one of the things you have a problem with is the number of parameters in URLs - some pages have 4+ and that makes it very unlikely that search engines will crawl them.
briandoakes
03-10-2005, 02:50 PM
Thanks... when you say 4+, are you referring to something in the URL like http://www.globalindustrial.com/productinfo4+732982DHS.jsp
something like that?
Honestly not quite sure what you mean when you say go through the entire site and investigate every indexing barrier and have them removed.
Like I said... I am not much of an SEO guy, more of a design guy... this was just thrown on me.
Mikkel deMib Svendsen
03-10-2005, 03:16 PM
I will try and detail things a little more - just ask away if you get lost :)
Thanks... when you say 4+... are you referring to in the url like
I am talking about the number of parameters in the URL. A "normal" URL may look like this:
- www.domain.com/filename.asp
Most dynamic websites, such as yours, are however built using templates - one file that serves up multiple pages, for example all product pages. The same file name is used (that's the template), and then parameters are added to the file name to let the back-end system know what to grab from the DB and how to render and order the content on the page. It could have a form such as:
- www.domain.com/filename.asp?pageid=23
In this example there is one parameter: the name of the parameter is "pageid" and the value is "23". On your site, as on many other dynamic websites, you use multiple parameters. Each parameter is separated in the URL by an & sign, like this (taken from your site):
picGroupKey=3309&options.parentCategoryKey=128&index=7&catSearchParams.categoryKey=1618&REQ_SUB_CAT=Computer+Cabinets#gridProdAnchor
As you see, in this case we have no fewer than 5 parameters. There are a lot of reasons search engines do not like such URLs, but what's important to you right now is that you have to change the format of those URLs to get those pages indexed. And you should! You have a great deal of great content hidden under all those "ugly" URLs that could improve your visibility in search engines dramatically if it got indexed.
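If you want to check this yourself across a list of URLs, here is a minimal Python sketch (standard library only) that counts query-string parameters. The page name somepage.jsp is a placeholder, since the example above only shows the query string, and the threshold of 4 simply mirrors the "4+" mentioned earlier in this thread; it is not a documented search engine limit.

```python
from urllib.parse import parse_qsl, urlsplit


def count_url_parameters(url: str) -> int:
    """Count the name=value pairs in a URL's query string.

    The #fragment is never sent to the server, so it is not counted.
    """
    query = urlsplit(url).query
    # keep_blank_values=True so "a=&b=1" still counts as two parameters
    return len(parse_qsl(query, keep_blank_values=True))


def flag_heavy_urls(urls, threshold=4):
    """Return the URLs whose parameter count is at or above the threshold."""
    return [u for u in urls if count_url_parameters(u) >= threshold]


# "somepage.jsp" is a placeholder; only the query string comes from the site.
example = (
    "http://www.globalindustrial.com/somepage.jsp?"
    "picGroupKey=3309&options.parentCategoryKey=128&index=7"
    "&catSearchParams.categoryKey=1618&REQ_SUB_CAT=Computer+Cabinets"
    "#gridProdAnchor"
)

print(count_url_parameters(example))  # 5
print(flag_heavy_urls([example, "http://www.domain.com/filename.asp?pageid=23"]))
```

Run flag_heavy_urls over a crawl or log export of your own URLs to see how many pages sit behind 4+ parameters.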
However, keep in mind that A) there are many ways to solve this, and it is important to know more about your back-end systems before anyone can recommend the best solution to you, and B) this may not be the only indexing barrier.
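As one illustration of point A (and only one of the many possible approaches), a common technique is to publish static-looking paths and have the server translate them back into the parameters the back-end expects. The sketch below is hypothetical: the /category/<key>/<name>/ scheme, somepage.jsp and the function names are made up, and it assumes the category key alone is enough to rebuild the page, which may not be true for your system.

```python
import re
from urllib.parse import parse_qs, urlsplit


def slugify(text: str) -> str:
    """Lowercase and turn runs of non-alphanumerics into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def to_static_path(url: str) -> str:
    """Map a parameterized category URL to a hypothetical static-looking path."""
    params = parse_qs(urlsplit(url).query)  # parse_qs decodes "+" to a space
    key = params["catSearchParams.categoryKey"][0]
    name = params.get("REQ_SUB_CAT", [""])[0]
    path = f"/category/{key}/"
    if name:
        path += slugify(name) + "/"  # the name is cosmetic, for readability
    return path


STATIC_PATH = re.compile(r"/category/(\d+)/(?:[a-z0-9-]+/)?")


def from_static_path(path: str) -> dict:
    """Recover the back-end parameter from a static path (server side)."""
    match = STATIC_PATH.fullmatch(path)
    if match is None:
        raise ValueError(f"not a category path: {path!r}")
    return {"catSearchParams.categoryKey": match.group(1)}


url = ("http://www.globalindustrial.com/somepage.jsp?"
       "picGroupKey=3309&options.parentCategoryKey=128&index=7"
       "&catSearchParams.categoryKey=1618&REQ_SUB_CAT=Computer+Cabinets")
path = to_static_path(url)
print(path)                    # /category/1618/computer-cabinets/
print(from_static_path(path))  # {'catSearchParams.categoryKey': '1618'}
```

In practice the reverse mapping usually lives in the web server's rewrite rules or the application's routing layer. The other parameters (picGroupKey, index and so on) would have to be derived or dropped, which is exactly why knowing the back-end matters before picking a solution.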
Honestly not quite sure what you mean when you say go through the entire site and investigate every indexing barrier and have them removed.
What I mean is that there could very well be other indexing barriers on your site besides the many parameters described above. Often you can find out if there are other limitations for search engines, but sometimes you won't find them all before you remove the first "layers" of problems.
It is like peeling an onion from the inside - layer by layer you work your way to the outside, but you just don't know how big the onion is to begin with :)
briandoakes
03-10-2005, 03:23 PM
Great thanks.....
These are things I suggested to the programming team... that the URLs seemed too long and chaotic... but I didn't have anything other than my own speculation to back it up with...
again... thanks!
Mikkel deMib Svendsen
03-10-2005, 03:53 PM
It looks like your biggest challenge is, like so many others, dealing with the humans inside your company. Technically everything is doable, and I guarantee you that it is possible to build a website that your engineers can approve of and that is still VERY search engine friendly. Lots of companies do that now. And that's probably the reason your CEO is asking for it.
One way or another you HAVE to make him understand that the way you currently publish content on your site HAS to be changed in order to get the results he is looking for. And that's just a start. You SHOULD require that SEO becomes an integrated part of all future development, just as design, IT security and usability (hopefully) are today. Don't let the engineers drive your website - let them build what YOU need, not what THEY want.
This may sound like a big task to you, and I won't lie to you: it is! But trust me, it is indeed possible to turn most companies around on these issues, and the results are usually well worth the effort.
Michael Martinez
03-11-2005, 04:12 PM
Most large content sites don't get deep-crawled by Google. You have to understand that some people estimate there may be as many as 600,000,000,000 Web pages in existence (and that number is now probably outdated). Google only claims to index 8,000,000,000 pages (give or take).
The best way to get 40,000 pages crawled is to create multiple crawl pages and link to them from several important pages. The crawl pages can have up to 200 URLs on them (Google suggests no more than 100, but I have seen them crawl 200+ URLs on a clean crawl page).
I would create multiple crawl pages (cover each URL 3 times in random patterns), link them together in some sort of hierarchy (in other words, have crawl pages for the crawl pages), have the detail crawl pages link to the higher crawl pages, and then have the highest level crawl pages linked to by the site map and 2-3 important pages.
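Here is a rough sketch of that crawl-page structure in Python. It assumes you already have a flat list of every content URL; the crawl-<level>-<n>.html naming, the example product URLs and the function names are hypothetical. Each pass shuffles the full list and cuts it into pages, so with copies=3 every URL lands on three different detail pages. Index levels are then stacked until the top level is small enough to link from the site map.

```python
import random


def chunk(items, size):
    """Split a list into consecutive pieces of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def page_name(level, index):
    """Hypothetical file name for a crawl page."""
    return f"crawl-{level}-{index:04d}.html"


def build_crawl_hierarchy(urls, per_page=100, copies=3, seed=0):
    """Return a list of levels; each level is a list of pages (lists of links).

    Level 0 holds the detail crawl pages that link to content URLs.
    Each higher level links to the pages of the level below it.
    """
    rng = random.Random(seed)
    detail_pages = []
    for _ in range(copies):
        shuffled = list(urls)
        rng.shuffle(shuffled)  # a new random order for each pass
        detail_pages.extend(chunk(shuffled, per_page))

    levels = [detail_pages]
    # Add index levels until the top level is small enough for the site map.
    while len(levels[-1]) > per_page:
        below = len(levels) - 1
        names = [page_name(below, i) for i in range(len(levels[-1]))]
        levels.append(chunk(names, per_page))
    return levels


def parent_index(page_index, per_page=100):
    """Index of the higher-level page that links to this page (for back-links)."""
    return page_index // per_page


urls = [f"http://www.domain.com/product{i}.asp" for i in range(40000)]
levels = build_crawl_hierarchy(urls, per_page=100)
print([len(level) for level in levels])  # [1200, 12]
```

per_page=100 stays within the figure Google suggests; the post mentions seeing 200+ crawled on a clean page, so raising it is a judgment call. parent_index gives each detail page the higher-level page to link back to.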
Then submit 2-5 of the detailed crawl pages to Google per day until either you get them all submitted or you start seeing pages you have NOT submitted get crawled.
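The drip-feeding itself is simple to plan. A minimal sketch, assuming you feed it your own list of detail crawl page names (the names below are hypothetical): it only splits them into daily batches. It does not talk to Google at all, so checking whether unsubmitted pages have started getting crawled (the stop condition above) is left to you.

```python
def submission_schedule(pages, per_day=5):
    """Yield (day, batch) pairs with at most `per_day` pages per day."""
    if per_day < 1:
        raise ValueError("per_day must be at least 1")
    for day, start in enumerate(range(0, len(pages), per_day), start=1):
        yield day, pages[start:start + per_day]


pages = [f"crawl-0-{i:04d}.html" for i in range(12)]
for day, batch in submission_schedule(pages, per_day=5):
    print(day, batch)
```

Keep per_day in the 2-5 range suggested above.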
Feeding the spiders slowly is not deemed abusive, and it usually entices them to come dig a little deeper.
This approach takes time. I know some people can get thousands of pages into the index pretty quickly; I have never tried, so I don't know how to do it. Worst case, it would take Google about 2 months to crawl all the pages. Best case, maybe 2-3 weeks.
But once the pages are crawled, there is no guarantee they will be included in the index.
All you can do is entice Google to crawl them. If they have poor internal linkage, they are probably dead in the water.