How do I increase the indexed pages?


CarlosO
02-04-2008, 02:24 PM
I have an issue with MSN Live indexing only a small number of my site's pages (2,780). Google has 158,000 pages indexed and Yahoo has 234,667. I use the MSN webmaster tools and they report that everything is OK.

Here is my robots file:
1. User-agent: *
2. Sitemap: http://www.site.com/siteMapIndex.xml
3. Disallow: /tools/
4. Disallow: /tracking/
5. Disallow: /etailers/link/
6. Disallow: /javascript/
7. Disallow: /goldbot/
8. Disallow: /hp_etailers/link/
9. Disallow: /*.do

After validating it I get the following message:

Line #9: Disallow: /*.do
Warning: MSNBOT doesn't support wildcard characters

Can this be the issue?
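
To illustrate what that warning means in practice, here is a minimal Python sketch (not MSNBOT's actual code) of the difference between a crawler that treats a Disallow value as a plain path prefix and one that expands * as a wildcard. The function names prefix_blocked and wildcard_blocked are made up for illustration, and the wildcard handling assumes Google-style semantics where * matches any run of characters (the $ end anchor is not handled).

```python
import re

def prefix_blocked(rule, path):
    """Original robots.txt convention: the rule is a literal path prefix."""
    return path.startswith(rule)

def wildcard_blocked(rule, path):
    """Google-style extension (assumed here): '*' matches any run of characters."""
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.match(pattern, path) is not None

rule = "/*.do"
for path in ("/search.do", "/tools/report.do", "/index.html"):
    # Columns: path, blocked under prefix reading, blocked under wildcard reading
    print(path, prefix_blocked(rule, path), wildcard_blocked(rule, path))
```

Under the plain-prefix reading the rule only matches paths that literally begin with /*.do, so it blocks none of the .do pages. Whether a particular bot ignores the line or mishandles it is something only that bot's documentation can confirm.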

AussieWebmaster
02-04-2008, 04:21 PM
I would at least take it down and see... if that fixes it, drop back and let everyone know.

CarlosO
02-04-2008, 06:00 PM
Will do, thank you.

jewboy
02-08-2008, 12:11 AM
there is currently no accurate way for the engines to count your inclusion ratio, simple as that. what google, yahoo, msn report are rough estimates, and are often WAY off the mark. discrepancies like this are not uncommon. you do want to make sure that the bot is visiting the pages you want ranked. i find that the site: command is not much more helpful than a link: command.

cscgal
02-10-2008, 04:14 PM
I don't care about how many pages they have indexed. I don't even care about what I rank for or how well I rank. I just care that I get a dependable, good amount of traffic from the SERPS. If my traffic decreases, I investigate what on my site changed or what I can do to increase my backlink count in general - just good ole viral marketing. I don't care what people are using to link to me and I don't care about anchor text and I don't care about my PR.

Jazajay
03-01-2008, 08:39 PM
Yeah I agree with Aussie.

Only Google handles wildcards in the robots.txt file properly, as they came up with them, so by placing a Google-specific rule in there you have effectively caused an error for Live.

Yahoo may be handling them now or may just be ignoring them when it comes across them. In which case, good on Yahoo for noticing a possible issue.

Have you tried specific rules?
ie:
User-agent: Googlebot
Disallow: /*.do

User-agent: *
Sitemap: http://www.site.com/siteMapIndex.xml
Disallow: /tools/
Disallow: /tracking/
Disallow: /etailers/link/
Disallow: /javascript/
Disallow: /goldbot/
Disallow: /hp_etailers/link/

That should fix it, to be honest, as only Google would get the wildcard.
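
One thing to check with per-bot groups: under the robots.txt convention a crawler obeys only the group that matches it most specifically, so a bot that finds its own User-agent group generally skips the * group. A quick sketch with Python's standard urllib.robotparser (which only approximates how real crawlers read the file; the test URL is made up and the rule list is shortened) suggests Googlebot would no longer be kept out of /tools/ with a file laid out like the one above, unless the shared rules are repeated in its group:

```python
from urllib.robotparser import RobotFileParser

def parse_robots(text):
    """Build a parser from robots.txt text without fetching anything."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

# Layout from the post above, shortened to two shared rules.
split_rules = """\
User-agent: Googlebot
Disallow: /*.do

User-agent: *
Disallow: /tools/
Disallow: /tracking/
"""

# Same idea, but the shared rules are repeated for Googlebot.
repeated_rules = """\
User-agent: Googlebot
Disallow: /*.do
Disallow: /tools/
Disallow: /tracking/

User-agent: *
Disallow: /tools/
Disallow: /tracking/
"""

url = "http://www.site.com/tools/page.html"  # hypothetical URL
split = parse_robots(split_rules)
repeated = parse_robots(repeated_rules)
print(split.can_fetch("Googlebot", url))     # Googlebot's own group has no /tools/ rule
print(split.can_fetch("msnbot", url))        # falls back to the * group
print(repeated.can_fetch("Googlebot", url))  # shared rule now applies to Googlebot too
```

If the directories should stay blocked for every engine, repeating them in each named group is the safer layout.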

Jaza

CarlosO
03-03-2008, 01:40 PM
So I did take the Disallow: /*.do down, three weeks ago, but nothing has changed. Live still only indexes about a tenth of the pages that Google and Yahoo have indexed.

Ideas?

AussieWebmaster
03-03-2008, 04:43 PM
I would create a sitemap with all pages listed....
http://www.searchengineguide.com/stoney-degeyter/google-yahoo-ms.php

CarlosO
03-03-2008, 07:13 PM
I do have an XML sitemap with all the pages listed (almost 400,000).
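
A note on sitemaps of this size: the sitemaps.org protocol caps each sitemap file at 50,000 URLs, so roughly 400,000 pages need at least eight files listed in the sitemap index. A rough Python sketch of the split, where chunk_urls and the example URLs are hypothetical and not taken from the real site:

```python
from math import ceil

MAX_URLS_PER_SITEMAP = 50_000  # per-file limit in the sitemaps.org protocol

def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Yield successive lists of at most `size` URLs, one list per sitemap file."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]

# Hypothetical URL list standing in for the real site.
urls = [f"http://www.site.com/page{i}.html" for i in range(400_000)]
files = list(chunk_urls(urls))
print(len(files), ceil(len(urls) / MAX_URLS_PER_SITEMAP))
```

If any single file in the index goes over the limit, an engine may skip or truncate it, so it is worth confirming each file's URL count.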

Jazajay
03-04-2008, 03:51 AM
Try the following:
1. Increase the amount of links that have weight in Live.

You have to remember that each search engine has a different ranking algorithm: a link with a lot of equity in Google may have little equity in Yahoo, Live or Ask. A good link is not necessarily a good link across all the engines; it may be a good link in only 1 or 2 of them.

Look at your competitors who rank high in Live, see where their links come from in Live's index, and try to get links from those pages, as they will have high equity in Live's algo.

It may be down to Live having insufficient links to your site, or insufficient equity, so you are not high on their crawl list.

2. Live won't crawl a page if it returns a 304 (Not Modified). By 2012 Live will have 800,000 servers delivering its results and Google will have about 600,000 or 700,000 (can't remember which), from something I read the other day.
So Live has made a decision not to re-crawl a page that hasn't changed, obviously to save on their electricity bill, which is extremely high.

If your home page, or only a few pages, have all the links going to them and they haven't changed, your other pages may not be getting indexed because Live gets a 304 on the pages it reaches from external links.

Try adding some content on some odd pages; this will return a 200 (OK) and not a 304 (Not Modified). Live should then crawl that page and any links off it.

3. Do some deep linking, bringing in point 1 (links that have weight in Live); this should bypass any 304s as well as give you some more equity to help with rankings.
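
For background on point 2: a server normally sends a 304 only in answer to a conditional request, i.e. when the crawler supplies an If-Modified-Since (or If-None-Match) header and the page has not changed since then. A simplified Python sketch of that decision follows. conditional_status is a hypothetical helper, it ignores ETags, and it says nothing about how Live itself schedules crawls.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def conditional_status(request_headers, last_modified):
    """Return 304 if the requester's copy is still current, otherwise 200.

    `last_modified` must be a timezone-aware datetime.
    """
    header = request_headers.get("If-Modified-Since")
    if header is None:
        return 200  # unconditional request: send the full page
    try:
        since = parsedate_to_datetime(header)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the condition
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # treat bare dates as UTC
    # HTTP dates have one-second resolution, so drop microseconds.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200

# Hypothetical dates: page last edited 1 March, crawler last fetched 3 March.
page_changed = datetime(2008, 3, 1, 12, 0, tzinfo=timezone.utc)
last_crawl = format_datetime(datetime(2008, 3, 3, tzinfo=timezone.utc), usegmt=True)
print(conditional_status({"If-Modified-Since": last_crawl}, page_changed))
print(conditional_status({}, page_changed))
```

So a page whose modification time moves forward past the crawler's last fetch answers 200 again on the next conditional request.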

Jaza

CarlosO
03-04-2008, 03:18 PM
Hi Jazajay, thanks for your very detailed suggestions. I will implement them and report back in a few weeks.

Jazajay
03-04-2008, 07:43 PM
Place this at the top of the page if you use PHP and have a .php file extension.

<?php header("Cache-Control: no-cache, must-revalidate")?>
Bear in mind this will tell the browser, SE etc. to fetch the page fresh every time; none of the page will be cached either.

If that is an issue, change it to

<?php header("HTTP/1.1 200 OK")?>
This is actually better than the first one. It tells the browser, SE etc. that the page was fetched OK and should remove the Not Modified status.

That said, I have never used that on its own before, so I am not aware of any results with it. In theory it should work.

If you're not sure about using them, post again before implementing.

Jaza

dinkoonks
03-13-2008, 06:26 PM
thanks for the info guys, it was exactly what i was looking for for about half an hour .. :\

Mitchelle Johnson
03-14-2008, 06:13 AM
try to increase off-page optimization...