View Full Version : why is my site dropped from this update?


catchmeifucan
02-21-2005, 09:01 PM
I think the update is over. My site got completely dropped: daily referrals from Google fell from 300/day to only 30/day. All the indexed URLs show only a URL and are listed as Supplemental Results, with no cache and no snippet. Googlebot is still visiting the site every day, though.

Can I post the URL here? www.samszone.com

Does anybody have any idea why it was wiped out?

Thanks

Michael Martinez
02-21-2005, 10:21 PM
Your site is still in the index (as are over 17,000 links to your site). And I don't think the update is over.

dannysullivan
02-22-2005, 05:27 AM
Yes, the fact that you have 10,400 pages but only 29 or so of those registering (http://www.google.com/search?q=site:www.samszone.com&hl=en&lr=&c2coff=1&start=20&sa=N) as unique does suggest a problem. We've seen this type of thing even before the recent update: Only 31 results shown for Yahoo (http://forums.searchenginewatch.com/showthread.php/?t=3385) and Google Results Very Odd? (http://forums.searchenginewatch.com/showthread.php?t=3918). As noted, the update may still be continuing, so it's hard to give you any specific advice -- especially when this could be a bug on Google's part. Definitely send them a message asking them to investigate, and review some of the current threads (http://forums.searchenginewatch.com/showthread.php?t=4153) on the recent update to see if they might help more.

catchmeifucan
02-22-2005, 01:32 PM
Thanks,

I read quite a few posts but I still don't have a clue. Google AdSense on my site works poorly this week; the ads are not even shown. I suspect that the index for my site is broken and they can no longer determine what to show. I sent them an email a few days ago, but the answer was to "wait".

A search for more results for my site (http://www.google.com/search?hl=en&lr=&c2coff=1&rls=GGLD,GGLD:2004-40,GGLD:en&q=+site:www.samszone.com+samszone) yields more than 10,000 results; however, all of them are cached back to April 2004.

Was it some kind of penalty or a bug? How do I know if it is some kind of penalty? Say, duplicate content? What are the symptoms?

Should I just sit tight and wait?

Michael Martinez
02-22-2005, 03:01 PM
The fact that Google told you to wait implies strongly that they are trying to fix something. This does not look like a penalty to me.

The good news is that you probably won't have to fix anything. The bad news is that you have no control over how long it will take for Google to fix the problem.

palms
02-23-2005, 06:31 PM
<<Yes, the fact that you have 10,400 pages but only 29 or so of those registering as unique does suggest a problem>>

Well, here's a similar problem but from the OTHER angle:

When I do a site:www.mydomain.c0m, my 11,000 page site shows 31,000+ results! Any reason why that would be?

catchmeifucan
02-23-2005, 06:35 PM
Is your site dynamic or static? If it is dynamic, then it could happen that two different URLs go to the same page.

palms
02-23-2005, 07:34 PM
<<dynamic or static?>>

Static. This one has me scratching my head. I could see the total count being off by a few hundred or so, but 20,000?

static
02-23-2005, 07:50 PM
[QUOTE=palms]
Static. This one has me scratching my head. I could see the total count being off by a few hundred or so, but 20,000?
[/QUOTE]

Does it happen to have any perl/cgi files or any other "dynamic" files linked from each page like addurl, vote, email a friend, rate this, blah blah....?

palms
02-23-2005, 08:05 PM
<<like....email a friend>>

...[gulp]...uh, yeah, why?

Whoa, now I get it.
Added:

User-agent: *
Disallow: /cgi-bin/

Thanks
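
For anyone who wants to check what those two robots.txt lines actually cover, here is a minimal sketch using Python's standard `urllib.robotparser`. The two example paths are hypothetical (the thread never gives palms' real URLs), and the check only tells you what the rules say a compliant crawler may fetch, not what a search engine will list in its index.

```python
from urllib.robotparser import RobotFileParser

# The two rules palms added to robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""


def is_allowed(path, user_agent="Googlebot"):
    """Return True if the rules above permit user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for path in ("/cgi-bin/email-a-friend.cgi?page=123", "/products/widget.html"):
        print(path, "->", "allowed" if is_allowed(path) else "blocked")
```

Every script link under /cgi-bin/ is covered by the one Disallow line, which is why a single rule can account for thousands of extra URLs.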

Chris_D
02-23-2005, 08:05 PM
Catchmeifyoucan,

Have you changed the server config of your site recently? Added session cookies for tracking?

There is a problem with the way your server is configured.

This is what happens:

GET /scooters/electric-scooters/280w-e015lx-deluxe-electric-scooter-with-front-and-rear-suspension.html HTTP/1.0
Host: www.samszone.com
Header (Length = 409):
HTTP/1.0·404·Not·Found
.
.
.
Set-Cookie:·cookie_test=please_accept_for_session;·expires=Sat,·26-Mar-05·00:00:19·GMT;·path=/;·domain=www.samszone.com

The server is eventually delivering the pages - but the initial response provided by the server is a '404 page not found' then making the user accept a session cookie.

I don't have time to dig further now - but all your 'deep' pages initially return a 404, or a 301 to a 404.

I think returning 404s until the bots eat your cookies - that's the problem.......
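
A header dump like the one above can also be read mechanically. The sketch below is only an illustration: it parses a pasted response head (with ordinary spaces instead of the "·" marks the header viewer prints) and pulls out the status code and headers. The function name and the sample constant are mine, not part of any tool mentioned in this thread.

```python
# Response head as reported above, with "·" replaced by plain spaces.
SAMPLE = """\
HTTP/1.0 404 Not Found
Set-Cookie: cookie_test=please_accept_for_session; expires=Sat, 26-Mar-05 00:00:19 GMT; path=/; domain=www.samszone.com
"""


def parse_response_head(raw):
    """Split a raw HTTP response head into (status_code, headers)."""
    lines = [line.strip() for line in raw.strip().splitlines() if line.strip()]
    # The status line looks like "HTTP/1.0 404 Not Found".
    parts = lines[0].split(None, 2)
    status_code = int(parts[1])
    headers = {}
    for line in lines[1:]:
        # Split at the first colon only; header values may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers


if __name__ == "__main__":
    status, headers = parse_response_head(SAMPLE)
    print("status:", status)
    print("sets a cookie:", "set-cookie" in headers)
```

The point is that the status line is the first thing a client sees, whatever the body or cookies that follow it contain.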

Marcia
02-23-2005, 08:11 PM
Aside from what others have suggested, I noticed two things - first, there isn't much unique text on the pages, with too many having that same first line aside from what look like datafeed links. There have been problems with such sites with getting hit for dups. Even if that's nothing to do with what's happening now, it should be looked at.

Then, the links are being redirected through a script. I didn't check myself, but apparently there's a problem - so also, could you check it out to see if any of the links are using a 302 redirect and let us know?

Just a wild guess with nothing whatsoever to base it on, except that there have been problems and HOPING they're working on it - but it isn't impossible that they're doing something about fixing their problems with redirected links.

Again - NO basis for the *guess* except for keeping fingers crossed. I could well be dead wrong.

static
02-23-2005, 08:21 PM
[QUOTE=palms]
Added:

User-agent: *
Disallow: /cgi-bin/
[/QUOTE]

You have to make sure that you also have the robots meta tag <meta name="robots" content="noindex,nofollow"> for Google, and even that doesn't stop them from at least showing the URL in their index and counting those URLs as unique pages.... I know; I have had many thousands of pages indexed (URL only) ever since Google decided to double/triple its index. These pages were always blocked (for over 3 years) via robots.txt and also with the meta robots tag, and Google decided to include them in their index one day....

Added: Just to show you what I mean about Google needing more than robots.txt, you can do a search like http://www.google.com/search?q=allinurl:editors/dmoz.org&hl=en&lr=&filter=0
and even though /editors/ is blocked via robots.txt, Google still feels the need to list/count all those pages they have been blocked from.

catchmeifucan
02-23-2005, 09:41 PM
[QUOTE=Chris_D]
Have you changed the server config of your site recently? Added session cookies for tracking?

[...]

I think returning 404s until the bots eat your cookies - that's the problem.
[/QUOTE]

Chris,
Thanks a lot, I did do some changes, but I am not sure whether this is the problem.

The site is based on osCommerce. The original URL is something like www.samszone.com/product_info.php?product_id=xxxx&ocsid=xxxxxxxxxxxxxxx for product detail pages, and www.samszone.com/index.php?cPath=xxx&oscId=xxxxxxxxxxxxxx for category pages.

I outsourced a mod_rewrite job to a programmer to make the URLs look like what you see now, but the oscId still exists. I don't quite understand how mod_rewrite works; however, I know he used some sort of 404 page. It worked fine (I mean, Google and the other search engines didn't have a problem indexing it) until recently, when I hired an SEO who suggested that I should get rid of the oscId in the URL. I did some research on the osCommerce forum: one way to get rid of the oscId is to force a cookie, whose purpose is to allow the user to start a session. So I turned it on. That's approximately when I started having problems.

But Yahoo and MSN seem to have no problem with this cookie thing; MSN's cache of my home page is dated 2/21.

Chris, how did you get this report?

[QUOTE=Chris_D]
GET /scooters/electric-scooters/280w-e015lx-deluxe-electric-scooter-with-front-and-rear-suspension.html HTTP/1.0
Host: www.samszone.com
Header (Length = 409):
HTTP/1.0·404·Not·Found
.
.
.
Set-Cookie:·cookie_test=please_accept_for_session;·expires=Sat,·26-Mar-05·00:00:19·GMT;·path=/;·domain=www.samszone.com
[/QUOTE]

Chris, if I get rid of the cookie, do you think the mod_rewrite setup is still a problem? I mean, the fact that a 301 or a 404 is returned before the page is delivered?

Thanks

Kevin

Marcia
02-23-2005, 09:52 PM
>>some sort of 404 page

I'm not much of a tech-type person, but I've heard of issues with those types of redirects. I've got no idea, but hopefully someone will come along who knows for sure what the details are. I do know that with Apache, redirect/mod_rewrite does not need to have anything to do with any 404s.

HTTP/1.0·200·OK(CR)
(LF)
Date:·Thu,·24·Feb·2005·01:48:28·GMT(CR)
(LF)
Server:·Apache/1.3.33·(Unix)·mod_auth_passthrough/1.8·mod_log_bytes/1.2·mod_bwlimited/1.4·FrontPage/5.0.2.2635·mod_ssl/2.8.22·OpenSSL/0.9.7a·PHP-CGI/0.1b(CR)
(LF)
X-Powered-By:·PHP/4.3.10(CR)
(LF)
Set-Cookie:·cookie_test=please_accept_for_session;·expires=Sat,·26-Mar-05·01:48:28·GMT;·path=/;·domain=www.samszone.com(CR)
(LF)
Connection:·close(CR)
(LF)
Content-Type:·text/html(CR)
(

That is what comes up for the site just checking the homepage using the server header checker at www.rexswain.com

That is not right.

>>I think returning 404s until the bots eat your cookies - that's the problem
Boy, that would do it!

phpmaven
02-24-2005, 12:02 AM
catchmeifucan,

I'm going to make an educated guess that whoever set this up for you is running every one of those product URLs through a PHP script that creates a dynamic URL and feeds that to another PHP script that retrieves the page. One way this can be done is to set up a .htaccess file that uses a custom 404 page that is actually a PHP script. It will work fine if you set it up properly, but you have to make sure that your script issues a 200 header before it does anything else, or any SE spider will get that 404 header and you're dead. A normal browser will just pull up the page and it will look like everything is fine.

Just a guess :D
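
To make that guess concrete, here is a small sketch of the same pattern in Python rather than PHP/Apache: a catch-all handler that resolves a friendly URL and decides the status itself. It is an analogy under stated assumptions, not the site's actual setup. `KNOWN_PAGES` is a hypothetical stand-in for the product database, and the handler is called directly instead of through a web server.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical lookup table standing in for the product database.
KNOWN_PAGES = {
    "/scooters/electric-scooters/250w-electric-scooter.html":
        "<h1>250W electric scooter</h1>",
}


def catch_all_app(environ, start_response):
    """Serve a known friendly URL with 200, anything else with 404."""
    body = KNOWN_PAGES.get(environ.get("PATH_INFO", ""))
    if body is not None:
        status = "200 OK"  # the page exists, so say so explicitly
    else:
        status = "404 Not Found"
        body = "<h1>Page not found</h1>"
    payload = body.encode("utf-8")
    start_response(status, [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(payload))),
    ])
    return [payload]


def request(path):
    """Call the app directly and return (status, body) as a client sees them."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    seen = {}

    def start_response(status, headers, exc_info=None):
        seen["status"] = status

    body = b"".join(catch_all_app(environ, start_response))
    return seen["status"], body


if __name__ == "__main__":
    for path in list(KNOWN_PAGES) + ["/no-such-page.html"]:
        status, body = request(path)
        print(status, len(body), "bytes", path)
```

Both requests come back with an HTML body, so in a browser they look equally "fine"; only the status string tells them apart, and that is the part a spider acts on.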

catchmeifucan
02-24-2005, 01:19 AM
Hi, phpmaven,

here is the 404.php page, can you tell me if it is the problem?
-------------------------------------------------------------------------
<?php

$seo_extension = '.html';

function error404() {
}

require_once('includes/configure.php');

mysql_connect(DB_SERVER, DB_SERVER_USERNAME, DB_SERVER_PASSWORD);
mysql_select_db(DB_DATABASE);

// remove leading junk (eg "/" or "/test/")
//$req = substr($_SERVER['REQUEST_URI'], 6);
$req = substr($_SERVER['REQUEST_URI'], 1);

$req = explode('/', $req);


// the last element should be product's name, it might be empty if the url end with a slash,
// in that case category listing will be showed. In case no such product exists, category is tried
// instead, if category with that name exists, 301 Moved Permanently response is given with the proper url
$product_urlname = array_pop($req);

// there might be a query string after the product's name. Extract it and set GET variables
if(strpos($product_urlname, '?') !== false) {
  list($product_urlname, $query_string) = explode('?', $product_urlname);
  parse_str($query_string, $HTTP_GET_VARS);
  // globals will be set later
}

// this array will be later used instead of $cPath_array at some places
$cPath_array_seo = $req;

// this variable will be used so that only the proper category hierarchy is traversed
$last_cat_id = 0;

$cPath = array();

$error404 = false;

while($category_urlname = array_shift($req)) {
  $sql = "select categories_id from categories where categories_urlname = '" . mysql_real_escape_string($category_urlname) . "' and parent_id = '$last_cat_id'";
  $cat = mysql_query($sql);
  if(list($cat_id) = mysql_fetch_row($cat)) {
    $cPath[] = $cat_id;
    $last_cat_id = $cat_id;
  } else {
    $error404 = true;
    $product_urlname = '';
  }
}


if($product_urlname != '') {
  //$product_urlname = substr($product_urlname, 0, strlen($product_urlname) - strlen($seo_extension));

  // should be product displayed only if it is in the right category? no for now
  $sql = "select p.products_id from products p, products_to_categories p2c where p.products_id = p2c.products_id and p.products_urlname = '" . mysql_real_escape_string(substr($product_urlname, 0, strlen($product_urlname) - strlen($seo_extension))) . "'";
  $prod = mysql_query($sql);
  if(list($prod_id) = mysql_fetch_row($prod)) {
    $HTTP_GET_VARS['products_id'] = $prod_id;
    $_SERVER['PHP_SELF'] = 'product_info.php';
    $HTTP_SERVER_VARS['PHP_SELF'] = 'product_info.php';
    $PHP_SELF = 'product_info.php';
  } else { // product with this name does not exists, let's check if it is not a category
    $sql = "select categories_id from categories where categories_urlname = '" . mysql_real_escape_string($product_urlname) . "' and parent_id = '$last_cat_id'";
    $cat = mysql_query($sql);
    if(list($cat_id) = mysql_fetch_row($cat)) { // it is, redirect with 301 Moved Permanently
      header('HTTP/1.1 301 Moved Permanently');
      header('Location: ' . ((!isset($_SERVER['HTTPS']) || $_SERVER['HTTPS']!="on") ? 'http://' : 'https://') . $_SERVER['SERVER_NAME'] . $_SERVER['REQUEST_URI'] . '/');
      exit;
    } else {
      $error404 = true;
    }
  }
} else {
  $_SERVER['PHP_SELF'] = 'index.php';
  $HTTP_SERVER_VARS['PHP_SELF'] = 'index.php';
  $PHP_SELF = 'index.php';
}

if($error404 == true) {
  $_SERVER['PHP_SELF'] = 'index.php';
  $HTTP_SERVER_VARS['PHP_SELF'] = 'index.php';
  $PHP_SELF = 'index.php';
  $HTTP_GET_VARS['error_message'] = 'Error 404 - The page you are looking for was not found on this server.';
  require('index.php');
} else {
  header("HTTP/1.1 200 OK");
  $cPath = implode('_', $cPath);
  $HTTP_GET_VARS['cPath'] = $cPath;

  if(0) {
    echo '<pre>';
    print_r($cPath);

    echo '</pre>';
  }

  // set globals from GET vars:
  foreach($HTTP_GET_VARS as $key => $value) $$key = $value;
  $_GET = $HTTP_GET_VARS;


  include($_SERVER['PHP_SELF']);
}
?>
------------------------------------------------------------------------

and here is the .htaccess file

------------------------------------------------------------------------
# $Id: .htaccess,v 1.3 2003/06/12 10:53:20 hpdl Exp $
#
# This is used with Apache WebServers
#
# For this to work, you must include the parameter 'Options' to
# the AllowOverride configuration
#
# Example:
#
# <Directory "/usr/local/apache/htdocs">
# AllowOverride Options
# </Directory>
#
# 'All' will also work. (This configuration is in the
# apache/conf/httpd.conf file)

# The following makes adjustments to the SSL protocol for Internet
# Explorer browsers

<IfModule mod_setenvif.c>
<IfDefine SSL>
SetEnvIf User-Agent ".*MSIE.*" \
nokeepalive ssl-unclean-shutdown \
downgrade-1.0 force-response-1.0
</IfDefine>
</IfModule>

# Fix certain PHP values
# (commented out by default to prevent errors occurring on certain
# servers)

#<IfModule mod_php4.c>
# php_value session.use_trans_sid 0
# php_value register_globals 1
#</IfModule>

RewriteEngine on
RewriteCond %{HTTP_HOST} !^www\.samszone\.com
RewriteRule ^(.*)$ http://www.samszone.com/$1 [R=permanent,L]

ErrorDocument 404 /404.php
<Files 403.shtml>
order allow,deny
allow from all
</Files>
-----------------------------------------------------------------
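
As a side note, the RewriteCond/RewriteRule pair in the .htaccess above only deals with the host name; it is separate from the ErrorDocument line that routes missing files to 404.php. A rough Python model of what that host rule decides is sketched below. It is a simplification: the function name is mine, and query strings and other mod_rewrite details are left out.

```python
import re

CANONICAL_HOST = "www.samszone.com"
# Same pattern as the RewriteCond; mod_rewrite negates it with a leading "!".
HOST_PATTERN = re.compile(r"^www\.samszone\.com")


def canonical_redirect(host, path):
    """Return (301, location) if the host rule would redirect, else None."""
    if HOST_PATTERN.search(host):
        return None  # the "!^www..." condition is false, so the rule is skipped
    # In .htaccess context the leading slash is stripped before ^(.*)$ matches.
    relative = path[1:] if path.startswith("/") else path
    # R=permanent means the redirect is sent with status 301.
    return 301, f"http://{CANONICAL_HOST}/{relative}"


if __name__ == "__main__":
    print(canonical_redirect("samszone.com", "/scooters/electric-scooters/"))
    print(canonical_redirect("www.samszone.com", "/scooters/electric-scooters/"))
```

Requests that already use the www host never touch this rule, so it does not explain a 404 on www.samszone.com URLs.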

catchmeifucan
02-24-2005, 01:29 AM
Marcia,

[QUOTE=Marcia]

Then, the links are being redirected through a script. I didn't check myself, but apparently there's a problem - so also, could you check it out to see if any of the links are using a 302 redirect and let us know?

[/QUOTE]

What links are you talking about? The "buy now" links? They go through a PHP script, which does something like

header('Location: ' . $url)

I don't know whether that is a 302 or not. Is there any problem with it?

Chris_D
02-24-2005, 01:37 AM
Hi Kevin,

I think returning a 404 is the problem on your content pages.

Even using mod_rewrite / ISAPI rewrite etc., it should be returning an HTTP/1.1·200·OK for a specific valid URL request.

You can see the headers returned by your server with any header viewer - like the Rex Swain one Marcia posted. Marcia's post shows that your homepage returns a 200 OK.

There are tons of these in G:

www.samszone.com/product_reviews_ info.php?products_id=1611&reviews_id=77 - 42k - Supplemental Result

ie the 'old' urls.

I'm flat out right now, so I'm not sure without doing more digging - but if I were a betting man, I'd say it's the 404 that's stopping Gbot dead.

As an experiment, you could try removing one single page. E.g. pick an indexed page with the rewritten URL which returns a 404 in the header. Then use
http://services.google.com:8882/urlconsole/controller?cmd=reload&lastcmd=login to request removal of that page.

If you ask to remove a single page, Google will only accept it if there is a 404 in place or the page carries a noindex. If Google accepts the removal of this page, then it's seeing the 404, and that's probably the problem....

Chris_D
02-24-2005, 02:02 AM
<added>

Your 'old' pages like

http://www.samszone.com/product_reviews_%20info.php?products_id=1611&reviews_id=77

are returning 400s / 404s - that's why they are going 'supplemental' in G

phpmaven
02-24-2005, 10:46 AM
catchmeifucan,

I don't have the time or the inclination to debug this code for you, but obviously the proper headers are not being issued. I can see in your 404 script that it is sending a 200 header at some point. Your programmer needs to do some basic debugging to figure out where this script is going wrong. I think it's pretty clear that the problem lies with this script.

catchmeifucan
02-24-2005, 02:36 PM
Thanks a lot, guys!

This is the best forum so far. I posted my problems in other forums but never got such insight.

Marcia,

The links are redirected with a 302; I just ran one through the header checker. But why is that a problem? If I turn it into an affiliate link such as http://qvert.net-pid-aid, would that still be a problem?

catchmeifucan
02-24-2005, 02:52 PM
Hi, Guys,

I still have some questions.

What does a spider do when it hits a "404"? If it stops spidering, why did those "404" links still get indexed?

The mod_rewrite was done in April last year. Many URLs such as "www.samszone.com/scooters/electric-scooters/" got indexed by Google, MSN, and Yahoo. The huge traffic drop only started recently.

phpmaven
02-24-2005, 02:52 PM
catchmeifucan,

No offense, but you need to have someone who knows what they are doing take a look at your whole site. Links to product pages right on your home page are returning 404 errors. You need to get that fixed ASAP before you do anything else.

Just for example the following link on your home page:
"/rc-toys/robosapien/robosapien-robot-remote-control-rc-human-robot.html"
Is returning a 404 error even though the product page is being returned.

Michael Martinez
02-24-2005, 03:05 PM
[QUOTE=catchmeifucan]
What does a spider do when it hits a "404"? If it stops spidering, why did those "404" links still get indexed?
[/QUOTE]

It depends on how the 404 is handled. The default configuration for a Web server (at least, for Apache) is to serve a code "404 - Page Not Found". Nothing else comes back. The spider would log it as a dead link.

However, you can set up a 404 Page, and one is included with most if not all Web servers, which basically tells the visitor, "You just tried to go to a non-existing page!"

While this message page may be useful for visitors (I find it annoying), spiders accept it as a normal content page. They may even index it (although I think Google finally figured out how to stop doing that a couple of years ago).

You can configure your server to redirect 404 traffic to a specific content page on your site. I send my 404 traffic to my main index page.

What this does is serve my visitors something useful and relevant to whatever they were looking for. I highlight currently hot topics on my main index AND have a link to my site map.

Spiders take the page in stride, follow the links, and eventually whatever dead link led to the 404 gets replaced in the next database build.

In a few cases, where I have formerly popular URLs with many inbound links pointing to them (that I cannot control), I put up static pages which do an http-meta refresh to new URLs. I include a ROBOTS tag with a "noindex,nofollow" value.

Chris_D
02-24-2005, 05:09 PM
[QUOTE=catchmeifucan]
why did those "404" links still get indexed?
[/QUOTE]

Because they returned a 200 OK when they first got spidered/indexed - but they return a 404 now (after they were indexed) - which is why they are going supplemental.

You have many pages returning a 404 - that's why you have 'Supplemental Result' on so many pages in G. If you don't fix it soon, those pages will be removed from the index...

catchmeifucan
02-24-2005, 05:16 PM
Thanks, everybody, I am working on it. I'll keep you updated on whether it fixes my problem.

Chris_D
02-24-2005, 06:44 PM
[QUOTE=Michael Martinez]
In a few cases, where I have formerly popular URLs with many inbound links pointing to them (that I cannot control), I put up static pages which do an http-meta refresh to new URLs. I include a ROBOTS tag with a "noindex,nofollow" value.
[/QUOTE]

Wouldn't a "noindex,follow" be more appropriate Michael? i.e. tell the spiders 'don't index this page, but follow the link to the new page (and index that new page)'?

Michael Martinez
02-24-2005, 07:03 PM
[QUOTE=Chris_D]
Wouldn't a "noindex,follow" be more appropriate Michael? i.e. tell the spiders 'don't index this page, but follow the link to the new page (and index that new page)'?
[/QUOTE]

"noindex,follow" doesn't work (because you're telling the spider NOT to index the content).

My new URLs are crawled regularly because of my site map. I link to that site map everywhere, and the spiders are intimately familiar with it.

And in these cases, the URLs are "new" only with respect to the fact that there are still "old" URLs I have to maintain to allow for traffic coming from what would otherwise be dead links.

I rarely move content around. If I have a server crash or expand a page into a full section/sub-site, then I may change a URL. But I hate doing that because they all end up with links coming into them.

catchmeifucan
02-24-2005, 09:13 PM
I find this problem more and more interesting now.

I did a log analysis for Jan and Feb.

For Jan, here are the total server responses:

Server response   Hits     Page views   Visitors   Size
200               825197   141333       18325      6751469481
304               184243   296          4610       0
302               19247    19247        1282       14287715
404               7814     1991         1228       130338613
206               977      4            529        10947639
301               843      841          461        150630
408               58       58           33         0
401               57       57           11         0


There aren't many 404s, though.
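
To put a number on that, here is a quick sketch that turns the January hit counts from the table above into shares of all hits. The figures are copied from the table; the function name is just for illustration.

```python
# Hits per status code for January, copied from the table above.
JANUARY_HITS = {
    200: 825197,
    304: 184243,
    302: 19247,
    404: 7814,
    206: 977,
    301: 843,
    408: 58,
    401: 57,
}


def status_share(hits):
    """Return each status code's fraction of the total number of hits."""
    total = sum(hits.values())
    return {code: count / total for code, count in hits.items()}


if __name__ == "__main__":
    shares = status_share(JANUARY_HITS)
    # Print the codes from most to least frequent.
    for code, share in sorted(shares.items(), key=lambda item: -item[1]):
        print(f"{code}: {share:.2%}")
```

By this count the 404s are well under 1% of January's hits, which fits the picture of a site that was still answering normally that month.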

Here is part of Googlebot's path:

Date/time Page Server response Page views Size
1/30/2005 19:45 /rc-toys/rc-helicopters/draganflyer-radio-control-electric-rc-helicopter.html 200 1 29066
1/30/2005 19:47 /rc-toys/rc-helicopters/http://www.samszone.com/rc-toys/rc-airplanes 404 1 40900
1/30/2005 19:49 /rc-toys/rc-helicopters/?(parameters) 200 1 28832
1/30/2005 19:51 /rc-toys/rc-helicopters/blade-radio-remote-control-helicopter.html 200 1 28745
1/30/2005 19:52 /rc-toys/rc-boats/?(parameters) 200 1 21398
1/30/2005 19:54 /rc-toys/rc-boats/18-electric-power-rc-dorado-boat.html 200 1 28930
1/30/2005 19:56 /rc-toys/rc-tank/?(parameters) 200 1 23584
1/30/2005 19:57 /rc-toys/gas-rc-cars/110-nitro-gas-2speed-mustang-saleen-rc-car-xpro-enginertr.html 200 1 29004
1/30/2005 19:58 /rc-toys/mini-rc/mini-yoshi-kart-radio-remote-control.html 200 1 29654
1/30/2005 19:59 /rc-toys/mini-rc/?(parameters) 200 1 33892
1/30/2005 19:59 /video-games/game-cube/monopoly-party-game-cube.html 200 1 28268
1/30/2005 20:00 /video-games/game-cube/mary-kate-and-ashley-sweet-16-g-cube.html 200 1 28363
1/30/2005 20:01 /video-games/game-cube/nascar-thunder-2003-game-cube.html 200 1 28319
1/30/2005 20:02 /video-games/game-cube/nba-live-2003-game-cube.html 200 1 28265
1/30/2005 20:02 /video-games/game-cube/extreme-g3-game-cube.html?(parameters) 302 1 939
1/30/2005 20:02 /cookie_usage.php 200 1 17112
1/30/2005 23:35 /scooters/tri-scooters/?(parameters) 200 1 22209
1/30/2005 23:36 /robots.txt 200 1 2518
1/30/2005 23:36 /scooters/tri-scooters/tri5-scooter.html 200 1 39918
1/30/2005 23:37 /scooters/mini-chopperharley 301 1 0
1/30/2005 23:38 /robots.txt 200 1 2518
1/30/2005 23:38 /robots.txt 200 1 2518
1/30/2005 23:38 /video-games/playstation2/armored-core-2-ps2.html?(parameters) 302 1 939
1/30/2005 23:38 /cookie_usage.php 200 1 17226
1/30/2005 23:39 /product_reviews.php?(parameters) 200 1 23744
1/30/2005 23:39 /product_reviews_info.php?(parameters) 200 1 26037
1/30/2005 23:40 /video-games/playstation2/smugglers-run-ps2.html 200 1 28562
1/30/2005 23:41 /product_reviews.php?(parameters) 200 1 30520
1/30/2005 23:41 /pc-software/video-photography/movie-pack-pro-30.html 200 1 29558
1/30/2005 23:42 /rc-toys/electric-rc-cars/?(parameters) 200 1 42728
1/30/2005 23:43 /product_reviews.php?(parameters) 200 1 23855
1/30/2005 23:43 /rc-toys/electric-rc-cars/110-scale-radio-controlled-race-car.html?(parameters) 302 1 939
1/30/2005 23:43 /cookie_usage.php 200 1 17165
1/30/2005 23:43 /pc-software/games-entertainment/puzzle-solving/{default} 200 1 25835
1/30/2005 23:43 /pc-software/games-entertainment/startrek/{default} 200 1 37554
1/30/2005 23:44 /rc-toys/gas-rc-cars/smartech-1-speed-saleen-radio-remote-control-rc-nitro-gas-car-4wd-rtr.html?(parameters) 302 1 939
1/30/2005 23:44 /cookie_usage.php 200 1 17202
1/30/2005 23:44 /product_reviews.php?(parameters) 302 1 651
1/30/2005 23:44 /cookie_usage.php 200 1 17168


Those rewritten URLs are not returning a "404".

catchmeifucan
02-24-2005, 09:32 PM
Got it!

This is the latest for this month: a lot of 404s for Google, and the same for MSN and Yahoo! Thanks, guys! I've got to get it fixed ASAP!

Date/time Page Server response Page views Size
2/24/2005 0:12 /scooters/tri-scooters/tri8-scooter.html?(parameters) 404 1 38245
2/24/2005 0:13 /scooters/gas-scooters/pocket-harley-gas-scooter.html?(parameters) 404 1 30431
2/24/2005 0:14 /scooters/electric-scooters/thesportsauthority-bladez-xtr-heavy-duty-electric-scooter.html?(parameters) 404 1 29993
2/24/2005 0:16 /scooters/scooter-accessories/storage-lockable-trunk.html?(parameters) 404 1 30236
2/24/2005 0:17 /robots.txt 200 1 2518
2/24/2005 0:17 /scooters/electric-scooters/rad2go-zz-cruiser-300w-electric-scooter.html?(parameters) 404 1 29646
2/24/2005 0:18 /scooters/electric-scooters/boreem-601s-350-350w-electric-scooter.html?(parameters) 404 1 29835
2/24/2005 0:19 /scooters/electric-scooters/250w-electric-scooter.html?(parameters) 404 1 30039
2/24/2005 0:20 /scooters/pocket-bike/scooterstore-new-2005-sport-cvt-pocketbike.html?(parameters) 404 1 29625
2/24/2005 0:21 /robots.txt 200 1 2518
2/24/2005 0:21 /{default} 200 1 41690
2/24/2005 0:21 /scooters/tri-scooters/trikke-8.html?(parameters) 404 1 31641
2/24/2005 0:22 /scooters/gas-scooters/scooteroutfitters-milan-49cc-gas-scooter.html?(parameters) 404 1 29771
2/24/2005 0:23 /scooters/kids-scooter/dynamite-motorcycle.html?(parameters) 404 1 28388
2/24/2005 0:25 /rc-toys/rc-helicopters/smartech-aerohawk-4channel-advanced-remote-control-helicopter.html?(parameters) 404 1 31332
2/24/2005 0:26 /scooters/pocket-bike/minipocketbikes-jet-mg50-49cc-pocketbike.html?(parameters) 404 1 29491
2/24/2005 0:27 /rc-toys/rc-helicopters/radio-remote-control-rc-helicopter-mini-spy-cam.html?(parameters) 404 1 32737
2/24/2005 0:29 /scooters/electric-scooters/250w-e015etlx-deluxe-electric-scooter-with-front-and-rear-suspension.html?(parameters) 404 1 30153
2/24/2005 0:29 /scooters/electric-scooters/thesportsauthority-razor-300w-electric-scooter.html?(parameters) 404 1 29868
2/24/2005 0:31 /rc-toys/rc-airplanes/2-channel-dragonfly-radio-remote-control-rc-plane.html?(parameters) 404 1 29480
2/24/2005 0:31 /robots.txt 200 1 2518
2/24/2005 0:31 /scooters/scooter-accessories/rear-rack-with-mounting-kit.html?(parameters) 404 1 31718
2/24/2005 0:33 /scooters/mobility-scooters/allelectricscooters-golden-companion-2-gc321-3wheel-scooter.html?(parameters) 404 1 29988
2/24/2005 0:34 /scooters/gas-scooters/extremetoys-viza-viper-33cc-gas-powered-scooter.html?(parameters) 404 1 30051
2/24/2005 0:34 /scooters/electric-scooters/neoscooters-bladez-xtr-street-450w-electric-powered-scooter.html?(parameters) 404 1 30004
2/24/2005 0:35 /rc-toys/rc-airplanes/advance-2channel-radio-remote-control-plane.html?(parameters) 404 1 29431
2/24/2005 0:36 /scooters/pocket-bike/scooterselection-47cc-pocketbike-2g-zgrb.html?(parameters) 404 1 29948
2/24/2005 0:38 /scooters/pocket-bike/xr-wildcat-80cc-motorcycle.html?(parameters) 404 1 29479
2/24/2005 0:39 /scooters/scooter-accessories/electric-horn-with-mounting-bracket.html?(parameters) 404 1 30198
2/24/2005 0:40 /rc-toys/rc-airplanes/3ch-remote-control-rc-airplane.html?(parameters) 404 1 29460
2/24/2005 0:41 /scooters/pocket-bike/scooterselection-49cc-super-pocketbike-x5-zosb.html?(parameters) 404 1 29551
2/24/2005 0:43 /robots.txt 200 1 2518
2/24/2005 0:43 /scooters/gas-scooters/street-speed-43cc-gas-skateboard.html?(parameters) 404 1 28759
2/24/2005 0:44 /scooters/pocket-bike/47cc-full-fairing-pocket-bike.html?(parameters) 404 1 28890
2/24/2005 0:45 /scooters/mopeds/extremetoys-tank-50qt9-sporty-2passenger-moped-gas-scooter.html?(parameters) 404 1 29835
2/24/2005 0:46 /scooters/gas-scooters/overstock-tanaka-royal-paverunner-35cc-gas-scooter.html?(parameters) 404 1 29925
2/24/2005 0:47 /site_map.php 200 1 240734
2/24/2005 0:47 /scooters/electric-scooters/extremetoys-gt-schwinn-mongoose-full-size-electric-scooter.html?(parameters) 404 1 29974
2/24/2005 0:48 /video-games/game-cube/007-bond-agent-under-fire-game-cube.html 404 1 29150
2/24/2005 0:49 /products_all.php?(parameters) 200 1 34274
2/24/2005 0:50 /scooters/electric-scooters/100w-vs03-electric-scooter-with-foldable-seat-perfect-gift-for-kids.html 404 1 33181
2/24/2005 0:51 /products_all.php?(parameters) 200 1 33245
2/24/2005 0:52 /rc-toys/electric-rc-cars/radio-control-electric-power-ferrari-sports-car.html 404 1 29706
2/24/2005 0:53 /rc-toys/gas-rc-cars/734-mph-cen-ct4s-subaru-wrx-radio-remote-control-rc-nitro-gas-race-car-rtr-w-2-speed.html 404 1 30105
2/24/2005 0:54 /rc-toys/rc-parts-accessories/rc-airplanes-parts/1set-propeller-2pcs-for-all-rc-2ch-plane-dragonfly-seagull-butterfly-yellowbee.html
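
The before/after difference in those two Googlebot listings can be tallied with a few lines of Python. This is only a sketch: `ROWS` holds a handful of rows copied from the listings above (not the full log), and the column layout is assumed to be date, time, page, response, page views, size.

```python
from collections import Counter, defaultdict

# A few rows copied from the two Googlebot listings above.
ROWS = """\
1/30/2005 19:45 /rc-toys/rc-helicopters/draganflyer-radio-control-electric-rc-helicopter.html 200 1 29066
1/30/2005 19:51 /rc-toys/rc-helicopters/blade-radio-remote-control-helicopter.html 200 1 28745
1/30/2005 23:36 /robots.txt 200 1 2518
2/24/2005 0:12 /scooters/tri-scooters/tri8-scooter.html?(parameters) 404 1 38245
2/24/2005 0:17 /robots.txt 200 1 2518
2/24/2005 0:19 /scooters/electric-scooters/250w-electric-scooter.html?(parameters) 404 1 30039
2/24/2005 0:47 /site_map.php 200 1 240734
"""


def tally_product_pages(rows):
    """Count responses per date for the rewritten .html product pages."""
    tallies = defaultdict(Counter)
    for line in rows.splitlines():
        fields = line.split()
        # Skip blank or truncated rows that lack the three numeric columns.
        if len(fields) < 6 or not fields[-3].isdigit():
            continue
        date = fields[0]
        path = " ".join(fields[2:-3])  # tolerate stray spaces inside the path
        status = int(fields[-3])
        if ".html" in path:
            tallies[date][status] += 1
    return tallies


if __name__ == "__main__":
    for date, counts in tally_product_pages(ROWS).items():
        print(date, dict(counts))
```

Grouping by date like this makes it easy to see on which day the rewritten pages switched from 200 to 404 once the full log is fed in.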

Chris_D
02-25-2005, 07:32 AM
Hi Catchmeifucan,

That's great!

[QUOTE=Michael Martinez]
"noindex,follow" doesn't work (because you're telling the spider NOT to index the content).
[/QUOTE]

No Michael. That isn't correct.

"noindex,follow" is telling the spider NOT to index the content of the page - AND that the spider SHOULD follow the links to your new page.

What to put into the Robots META tag
The content of the Robots META tag contains directives separated by commas. The currently defined directives are [NO]INDEX and [NO]FOLLOW. The INDEX directive specifies if an indexing robot should index the page. The FOLLOW directive specifies if a robot is to follow links on the page. The defaults are INDEX and FOLLOW. The values ALL and NONE set all directives on or off: ALL=INDEX,FOLLOW and NONE=NOINDEX,NOFOLLOW.

Some examples:

<meta name="robots" content="index,follow">
<meta name="robots" content="noindex,follow">
<meta name="robots" content="index,nofollow">
<meta name="robots" content="noindex,nofollow">
http://www.robotstxt.org/wc/meta-user.html
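
The rules quoted above are simple enough to write down as code. The sketch below only models the syntax as described in that passage (defaults INDEX and FOLLOW, plus ALL and NONE); the function name is mine, and it says nothing about how any particular spider actually behaves when it sees these values.

```python
def parse_robots_meta(content):
    """Return (index, follow) booleans for a robots META content value."""
    index, follow = True, True  # the defaults are INDEX and FOLLOW
    for directive in content.split(","):
        directive = directive.strip().upper()
        if directive == "ALL":
            index, follow = True, True
        elif directive == "NONE":
            index, follow = False, False
        elif directive == "INDEX":
            index = True
        elif directive == "NOINDEX":
            index = False
        elif directive == "FOLLOW":
            follow = True
        elif directive == "NOFOLLOW":
            follow = False
        # Anything else is ignored in this sketch.
    return index, follow


if __name__ == "__main__":
    for value in ("index,follow", "noindex,follow", "index,nofollow", "noindex,nofollow"):
        print(value, "->", parse_robots_meta(value))
```

Under this reading, "noindex,follow" asks for the page to be left out of the index while its links are still followed; whether a given crawler honours that combination is exactly what is in dispute here.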

Michael Martinez
02-25-2005, 10:30 AM
[QUOTE=Chris_D]
No Michael. That isn't correct.
[/QUOTE]

The robotstxt site notwithstanding, "noindex" takes precedence over "follow". I and a number of other people tested this extensively with multiple spiders several years ago (including GoogleBot). None of them followed any links on a page they were not indexing.

Anyone who wants to know for sure should do their own test, as the spiders could have been reconfigured.

catchmeifucan
02-25-2005, 12:46 PM
Got it sorted out. On Feb 8th, my host did an upgrade to the server that messed up everything. Now everything looks fine. Got to stay cool and wait for it to get indexed again, hopefully soon.

Who can answer the outbound link question? Is a 302 outbound link bad?

I, Brian
02-28-2005, 09:31 AM
Google seems to be having a particularly bad day today. I'm finding duplicates of BBC pages, and my business reference site again no longer ranks for its own unique name.

ThouShaltSeo
02-28-2005, 01:42 PM
You're not alone Brian. It is getting scary now. Bug after bug...


my business reference site again no longer ranks for its own unique name.

I, Brian
02-28-2005, 02:50 PM
The SERPs are in a constant state of change - I don't remember seeing anything like this before.

What is all the more amazing is that we're now nearly 4 weeks into this update and Google's results are still very off-center.

It would also be nice if Google could actually communicate to the world about what they are doing - the lack of information or dialogue is disconcerting. Are webmasters nothing more than a commodity to be used and commercialised by Google?

Is it possible for Google to talk to the internet without panicking about what the New York Exchange is going to think?

ThouShaltSeo
02-28-2005, 02:58 PM
I don't blame them for not communicating, because they don't want to show their cards. I blame them for the slow response on bugs, and for the wide net that catches many innocent sites.

What is all the more amazing is that we're now nearly 4 weeks into this update and Google's results are still very off-center.

It would also be nice if Google could actually communicate to the world about what they are doing - the lack of information or dialogue is disconcerting. Are webmasters nothing more than a commodity to be used and commercialised by Google?

Is it possible for Google to talk to the internet without panicking about what the New York Exchange is going to think?

catchmeifucan
02-28-2005, 09:35 PM
Back on MSN quickly. Top 10 rankings on MSN for several keywords already. But not for Google. How long does it take for Google to update its index?

By the way, should I forbid spiders to crawl my product_info.php page? Although the URLs are rewritten with mod_rewrite, spiders still pick up these pages for some reason.

Michael Martinez
03-01-2005, 02:08 AM
Back on MSN quickly. Top 10 rankings on MSN for several keywords already. But not for Google. How long does it take for Google to update its index?

In the past, a dance used to take 2-4 days.

By the way, should I forbid spiders to crawl my product_info.php page? Although the URLs are rewritten with mod_rewrite, spiders still pick up these pages for some reason.

And why would you not want them indexed?

catchmeifucan
03-01-2005, 12:49 PM
And why would you not want them indexed?

after mod rewrite. these two pages are the same

product_info.php?product_id=1596

and /scooters/electric-scooters/250w-electric-scooters.html

I don't know why the spiders still have the product_info.php?... version, so I think maybe I should forbid spiders from spidering them.

Michael Martinez
03-01-2005, 06:39 PM
after mod rewrite. these two pages are the same

product_info.php?product_id=1596

and /scooters/electric-scooters/250w-electric-scooters.html

I understand.

I don't know why the spiders still have the product_info.php?... version, so I think maybe I should forbid spiders from spidering them.

Two possibilities I can think of: they have leftover URLs from the last crawl, or other pages are linking to the old dynamic URLs.

Remember there are a lot of content scoopers out there.

I suppose it's also possible you might have some content you forgot about, so maybe your own site is improperly linking to itself.

Chris_D
03-01-2005, 08:45 PM
The 'duplicate' page is still there because www.samszone.com/product_info.php?product_id=1596 was previously indexed, and still returns a 200 - i.e. you are showing it as a valid page.

catchmeifucan
03-01-2005, 09:01 PM
The 'duplicate' page is still there because www.samszone.com/product_info.php?product_id=1596 was previously indexed, and still returns a 200 - i.e. you are showing it as a valid page.

So should I or should I not prevent spiders from spidering these dup pages?

ThouShaltSeo
03-01-2005, 09:59 PM
you don't want dupe content.
So should I or should I not prevent spiders from spidering these dup pages?

Chris_D
03-02-2005, 02:31 AM
So should I or should I not prevent spiders from spidering these dup pages?

If you leave dup pages, G will decide which one to index and keep, and get rid of the other one from its index. It will decide which one to keep, and which one to trash. However, G's decision may not be the same as yours.....

Do any of these 'old' pages have any external links to them?

You have 4 options:
- do nothing and let G decide which one to keep;
- 301 the 'old' url to the new page;
- meta tag with 'noindex,follow' the old page to the new page; or
- just 404 the old page URL (if the new one is already indexed).
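
The 301 option can be sketched as a lookup from the old dynamic URL to the rewritten one. This is only an illustration: the PRODUCT_PATHS table and the redirect_for helper are hypothetical, and the single entry comes from the example URLs posted earlier in this thread. On a real store the table would presumably be generated from the product database rather than typed by hand.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical lookup table: product_id -> rewritten path.
PRODUCT_PATHS = {
    "1596": "/scooters/electric-scooters/250w-electric-scooters.html",
}


def redirect_for(old_url):
    """Return (status, location) for an old product_info.php URL.

    301 with the new path when the product is known; otherwise
    (200, None), meaning serve the page as before.
    """
    parts = urlsplit(old_url)
    if parts.path != "/product_info.php":
        return 200, None
    ids = parse_qs(parts.query).get("product_id", [])
    new_path = PRODUCT_PATHS.get(ids[0]) if ids else None
    if new_path is None:
        return 200, None
    return 301, new_path


print(redirect_for("http://www.samszone.com/product_info.php?product_id=1596"))
```

Unknown product IDs fall through to a normal response here; whether they should 404 instead is a choice for the site owner.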

catchmeifucan
03-02-2005, 12:49 PM
301 is not an option, too time consuming to match thousands of products.
Is <noindex, follow> equivalent to Disallow: /product_info.php in robots.txt?
404 is not an option, either.
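
On the robots.txt question, the two are not the same thing: a Disallow rule asks spiders not to fetch the URL at all, whereas a robots META tag can only be read once the page has been fetched. A small sketch using Python's standard urllib.robotparser, with the rule written the way it is proposed above and the example URLs from this thread:

```python
from urllib.robotparser import RobotFileParser

# The rule under discussion, as it would appear in robots.txt.
rules = """\
User-agent: *
Disallow: /product_info.php
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

old_url = "http://www.samszone.com/product_info.php?product_id=1596"
new_url = "http://www.samszone.com/scooters/electric-scooters/250w-electric-scooters.html"

# Disallow is a prefix match, so the query-string variants are covered too.
print("old URL fetchable:", parser.can_fetch("Googlebot", old_url))
print("new URL fetchable:", parser.can_fetch("Googlebot", new_url))
```

This shows only how a rule-following parser reads the file; it says nothing about how quickly an engine drops URLs it has already indexed.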

SEO1
03-06-2005, 05:39 PM
Hi there everyone,

I am the SEO Kevin hired to optimize the website samszone.com.

I appreciate all of you helping him to find an answer I gave him many months ago in my initial bid, which was dated 12-10-04.

When I optimize a site I optimize all pages linked from the home page where possible.

I would do an analysis using the software that I use for such, but it found that you have a header error, 404 Not Found, for the url http://www.samzone.com/scooters/. Now I have seen the page and know this is not the case, but apparently you have an issue with your webhost, as I did a header check and the response is below.

Current Date and Time: 2004-12-10T18:37:41-0800 User IP Address: 151.199.248.153

#1 Server Response: http://www.samzone.com/scooters/
HTTP Status Code: HTTP/1.1 404 Not Found
Date: Sat, 11 Dec 2004 02:37:39 GMT
Server: Apache/1.3.33 (Unix) mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 PHP/4.3.9 FrontPage/5.0.2.2634a mod_ssl/2.8.21 OpenSSL/0.9.6b
Connection: close
Content-Type: text/html; charset=iso-8859-1


This could also be what prevents bots from indexing your site properly. The page does have Google Page Rank but that is the value from the links leaving the page which is in reality the hardlinks from the server.

I did a Google search for your site and found it is not included in Google's index, even though there is a link to it as well as several mentions on other sites.

Your webhost may not support the If-Modified-Since HTTP header.

From Google
Make sure your web server supports the If-Modified-Since HTTP header. This feature allows your web server to tell Google whether your content has changed since we last crawled your site. Supporting this feature saves you bandwidth and overhead.
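
A rough sketch of what supporting If-Modified-Since amounts to on the server side, assuming the server knows when each page last changed. The helper name conditional_status and the sample dates are illustrative only; this is not how Apache or any particular host implements it.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_status(if_modified_since, last_modified):
    """Return 304 if the page is unchanged since the client's copy, else 200.

    if_modified_since: header value sent by the spider, or None.
    last_modified: timezone-aware datetime of the page's last change.
    """
    if not if_modified_since:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: send the full page
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200


# Illustrative value: pretend the page last changed on Feb 8th, 2005.
page_changed = datetime(2005, 2, 8, 12, 0, 0, tzinfo=timezone.utc)
print("Last-Modified:", format_datetime(page_changed, usegmt=True))
print(conditional_status("Sat, 11 Dec 2004 02:37:39 GMT", page_changed))
print(conditional_status("Thu, 24 Feb 2005 00:50:00 GMT", page_changed))
```

A 304 carries no body, which is where the bandwidth saving mentioned in the quote comes from.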


What also happened during the course of this is that the OSC SEF URLs component was turned on. Subsequently, between the server errors and the SEF implementation, Google seemed to dump Kevin's site quite hard.

I don't know if I should start a separate thread, but this is not the first OSC-based website I have worked on where the site was dumped from Google's ranks soon after the SEF URL implementation was turned on.

The very thing that is supposed to end the session IDs seemed to get the sites dumped for over 6 months. I have also talked to others with the same issue happening to them on different OSC-based sites, so there is some variation in hosting environments, but a commonality in the implementation.

Any ideas or reasons this happens?

And again thank you for the help.

Clint

kenpomachine
03-07-2005, 11:09 AM
I think the update is over. My site got completely dropped. daily referral from Google dropped from 300/day to only 30/day. all the indexed url show only a url and as supplement result, no cache, no snipet. GB is still visiting the site everyday, though.
I think the update is still going on.

One of my sites appeared briefly from 02/21 to 03/02, and then Google began showing only supplement results for this blog (another blog under the same domain began showing in the SERPs on 02/23 and is still doing well, plus the content in this one hasn't been refreshed in some months now).

And the pages of a phpBB2 forum (www.abretelibro.com/foro) have this same problem currently, with the consequent drop in referrals. And the robots.txt hasn't seemed to stop GB from indexing the profile URLs.

catchmeifucan
03-07-2005, 01:18 PM
Hi, Clint,

I checked the server log carefully. The 404 error occurs only after Feb 8th, when my host updated their server and messed up everything.

SEO1
03-08-2005, 04:16 PM
For Kenpo

And the pages of a phpBB2 forum (www.abretelibro.com/foro) have this same problem currently, with the consequent drop in referrals.

Looking at the URLs of your forum, there are session IDs being used. Googlebot, while able to index some dynamic URLs, has problems with most, and this could be the reason you are seeing referrals drop.
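
To illustrate the session-ID point: the same page reached with different session IDs looks like many different URLs to a spider. Below is a sketch of normalizing such URLs by dropping the session parameter. The parameter names in SESSION_PARAMS are assumptions (sid for phpBB2, osCsid for osCommerce, PHPSESSID for plain PHP sessions), and the example.com URL is made up.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed session parameter names (lowercased); adjust for the software in use.
SESSION_PARAMS = {"sid", "oscsid", "phpsessid"}


def strip_session_id(url):
    """Return url with any session-ID query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_session_id("http://www.example.com/viewtopic.php?t=42&sid=0123456789abcdef"))
```

In practice the cleaner fix is usually to stop issuing session IDs to spiders in the first place; this only shows what the de-duplicated URL looks like.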

Also, Google updates sites several times daily; this depends on how often content is added to the website. The changes in the SERPs seem to happen about every three days.

Kevin

I posted the original post from the bid so you could see the date, which is December 12, 2004. If the host messed things up more on Feb 8th, that would only be adding insult to injury.

The problems stemmed from the initial errors in December, and February's incident just made a bad situation worse for you.

Clint

catchmeifucan
03-09-2005, 01:02 PM
clint,

I posted part of the server log here,

http://forums.searchenginewatch.com/showpost.php?p=35915&postcount=30

You can see that before Feb 8th, 200 is returned for Googlebot, the same as for other bots.
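
A quick way to check a claim like this against the whole log rather than an excerpt is to tally status codes per day. This is a sketch that assumes whitespace-separated lines shaped like the posted excerpt (date, time, path, status, then two further columns whose meaning is not stated in the thread). The two sample lines are copied from that excerpt, and the function name is made up.

```python
from collections import Counter
from datetime import datetime


def status_counts_by_date(lines):
    """Count (date, status) pairs in log lines shaped like the excerpt.

    Assumes whitespace-separated fields: date, time, path, status, ...
    Lines that do not fit (for example truncated ones) are skipped.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 4 or not fields[3].isdigit():
            continue
        try:
            day = datetime.strptime(fields[0], "%m/%d/%Y").date()
        except ValueError:
            continue
        counts[(day, int(fields[3]))] += 1
    return counts


sample = [
    "2/24/2005 0:51 /products_all.php?(parameters) 200 1 33245",
    "2/24/2005 0:52 /rc-toys/electric-rc-cars/radio-control-electric-power-ferrari-sports-car.html 404 1 29706",
]
for (day, status), n in sorted(status_counts_by_date(sample).items()):
    print(day, status, n)
```

Run over the full log, a jump in 404 counts starting on a particular day would show up directly in the output.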

catchmeifucan
03-09-2005, 01:43 PM
Hi, everyone,

Thanks for all the help here. My serps are back on Google, but it seems the site is not fully indexed.

site:www.samszone.com inurl:scooters

only shows 116 results. Many of my product pages are not back, and there are still quite a few supplement results. It seems the SERPs have not changed for quite a few days. However, a search for my most important keywords still won't show my site (of my top 50 keywords, not even one shows up in the top 30).

On MSN, I am doing pretty well; I got "scooters" #1 on MSN now.

Should I just wait for Google to keep indexing, or should I take some measures?

Thanks

SEO1
03-09-2005, 02:08 PM
Kevin

Sadly, me sitting here discussing a client's website and its problems is not what I ever wanted.

Yes, Google is not having a problem with the category pages, thanks to the mod rewrite. As I said in my bid posting, the problem lies in the product pages due to session IDs, and if all of this had been attended to at that time it would not be an issue now.

I hope you fare well in whichever direction you decide to follow with the marketing of your website.

Clint

PhilC
03-09-2005, 02:24 PM
Well, here's a similar problem but from the OTHER angle:

When I do a site:www.mydomain.c0m, my 11,000 page site shows 31,000+ results! Any reason why that would be?
I'm sorry, but I didn't read all the way through this thread, and I don't know if you found a solution to your question or not, so....

I have a site that has a maximum of around 23,000 pages. Almost all of them are dynamic, so overlooking the dynamic possibilities isn't the answer to this. When the front page of Google stated that they had 4 billion pages in their index, a site: search showed (very) roughly the right number of pages for my site - almost always higher than was possible - up to 10,000 higher, but a similarish ballpark.

When they doubled the stated number of pages in their index to 8 billion, they also doubled the number of pages that they claim to have indexed from my site - literally - never below 50,000 and up to 60,000+. And it's been the same ever since.

So I've never believed that they have 8 billion pages in the index, or anywhere near it. That figure doubled at the same time that the number of pages they claim to have from my site doubled. For some reason, the figure from my site is wrong, and I assume that the 8 billion figure is wrong for the same reason.

Since then, I've assumed that they have multiple indexes, in which many, most, or all pages are stored multiple times in one form or another, and that the 8 billion figure is the sum of the multiple indexes, and not the number of unique pages that they've indexed.

If you didn't figure out an answer to your question, that may be it.

dannysullivan
03-10-2005, 07:56 AM
Relevant posts split into this new thread: PR Leakage Real & Inbound Link Issues (http://forums.searchenginewatch.com/showthread.php?t=4604).

catchmeifucan
03-16-2005, 03:10 PM
It's been 20 days since I corrected the problem. Got quite a few top 10 listings on MSN. Google keeps updating the SERPs. A lot of pages are cached, but so far only one top 30 listing on Google. Referrals from Google are only 10%, compared to MSN's 50%. Google used to be my #1 in referrals. Any suggestions?

Michael Martinez
03-16-2005, 07:33 PM
It's been 20 days since I corrected the problem. Got quite a few top 10 listings on MSN. Google keeps updating the SERPs. A lot of pages are cached, but so far only one top 30 listing on Google. Referrals from Google are only 10%, compared to MSN's 50%. Google used to be my #1 in referrals. Any suggestions?

Have you determined from your logs how many of your pages have been retrieved by Google?

Also, are your internal links still good? Can Google follow them and rebuild the relationship between all your pages?

It may take a couple of crawls and a major index rebuild for them to get you fully back into the swing of things.

I don't think they have done a second index rebuild since the beginning of February. They seem to have gone back to their weekly rollout pattern from December. I have inferred from that behavior that they are somehow updating the index without completely rebuilding it.