Why does Yahoo add strange query string to pages indexed?


sickone
03-08-2005, 01:09 AM
On a few of my sites, Yahoo has added some strange query string variables to the pages that are indexed.

One of them is:
www.example.com/?aref=9

Another one is:
www.example.net/?D=D

Anyone have any ideas why this is happening?

Marcia
03-14-2005, 06:38 AM
I've seen things in the stats that look a little funny, but to be honest I've never noticed that appended to a URL before. The only time has been when site owners were doing something like adding a field to identify PPC clicks (like from Adwords) and the URL got picked up by a search engine that way.

Are you running PPC with an identifier? Or can you identify any sites linking to you with characters appended to the URL?

Just grasping, I really don't know - hopefully someone else will come along who does know.

Marcia
03-14-2005, 08:03 AM
OK, I just clicked on a merchant link through the income reports at CJ and this is what it looks like

http://www.example.com/?xtr=cj

That is not an affiliate link; it is a way of tracking clicks to the merchant's site from the network, identifying where the click came from. That's an example of one of the ways I've seen such a link, but how it would make its way into the index I don't know.

sickone
03-14-2005, 10:53 AM
I'm not running any sort of PPC or affiliate programs for the site. It's a site I built and it only has backlinks coming from another site of mine. I've double-checked the backlinks and they do not contain a query string, so I really do not understand why Yahoo is adding the strange query string.

The two sites that have the strange query string happen to be auto generated sites. Could this have anything to do with it? Could Yahoo be picking up on the format of the auto-generated sites? I do have some other sites that use the same script that don't have the strange query string added. Has anyone ever seen anything like this in Yahoo?

lots0
03-14-2005, 10:55 AM
Yahoo has been tracking clicks to sites in its index for a very long time.

I don't know what they use the info for, but I do know they are collecting it.

lots0
03-14-2005, 10:59 AM
let me make a wild guess;
The pages with the query string are the pages that are generating the most traffic for you in yahoo right now?

sickone
03-14-2005, 11:17 AM
let me make a wild guess;
The pages with the query string are the pages that are generating the most traffic for you in yahoo right now?

The query strings are on the home pages of each of the two sites, so those pages are generating the most traffic for those two sites. However, my other sites that use the same script and do not have the strange query string have more traffic as they target keywords that are searched for more.

lots0
03-14-2005, 11:58 AM
I also think that the yahoo tracking is in part based on keywords. Some keywords get tracked some don't.

In most of the high traffic areas like porn, gambling and (legal) drugs, almost all the listings on the first page have the tracking query strings.

just do a search for "porn" or "casino" in yahoo and take a look at the URLs.

You'll also notice that most of the high traffic pages are also being tracked thru yahoo's "rds" subdomain.

sickone
03-14-2005, 12:36 PM
I also think that the yahoo tracking is in part based on keywords. Some keywords get tracked some don't.

In most of the high traffic areas like porn, gambling and (legal) drugs, almost all the listings on the first page have the tracking query strings.

just do a search for "porn" or "casino" in yahoo and take a look at the URLs.

You'll also notice that most of the high traffic pages are also being tracked thru yahoo's "rds" subdomain.

To clarify, I'm not talking about the query string of the actual redirection script that does the tracking. The query string is being attached to the indexed page's url (the green url that is displayed directly under the site description in the serps). It is also attached to the url variable that is used in the redirection script.

lots0
03-14-2005, 12:48 PM
do you have an example you can post?

sickone
03-14-2005, 01:37 PM
do you have an example you can post?

Unfortunately not ;)

lots0
03-14-2005, 03:34 PM
The query string is being attached to the indexed page's url (the green url that is displayed directly under the site description in the serps). It is also attached to the url variable that is used in the redirection script.

Interesting, would you PM me the URL?
I am curious about this.

I don't understand why yahoo would add a variable query string to one of your URLs.

lots0
03-15-2005, 12:31 PM
Without looking at the URL.

I think it is safe to say that Yahoo did NOT change your URL, the "script" you are using must have added the query string to your URL and Yahoo just indexed your URL.

Chris Boggs
03-15-2005, 02:56 PM
lots0, any guess you want to venture as to why Yahoo would hold this data? Perhaps for use within the paid inclusion sales cycle?

lots0
03-15-2005, 03:23 PM
I am not sure Chris.
With Yahoo's style of corporate management, my best and only guess would be that they are gathering and holding all that data until they can figure out how to make money with it - if they are not already using it to make money.

I know one thing, I would love to get my hands on that data...

sickone
03-15-2005, 05:44 PM
Without looking at the URL.

I think it is safe to say that Yahoo did NOT change your URL, the "script" you are using must have added the query string to your URL and Yahoo just indexed your URL.

I disagree with your guess. The script I created is fairly simple and I have triple-checked the links to the home pages and none of them contain the query string. I have total control over every backlink pointing to the site so they are not an issue either.

I think it's just a quirk in Yahoo, but I will continue doing research on the subject to try and find some other examples of it that I can post.

Marcia
03-17-2005, 12:54 PM
OK, I have that thing in Cpanel that shows the last 300 pageviews and this is what showed up earlier with a visit from the Yahoo crawler - and I've seen others like it, many of them

/holiday?M=A

But nothing like that has shown up in the SERPs ever.

sickone
03-19-2005, 01:38 AM
OK, I have that thing in Cpanel that shows the last 300 pageviews and this is what showed up earlier with a visit from the Yahoo crawler - and I've seen others like it, many of them

/holiday?M=A

But nothing like that has shown up in the SERPs ever.

That sounds exactly like what I'm seeing except the pages are indexed in Yahoo. I wouldn't be surprised if some of your pages started getting indexed with the strange query string.

I checked the log files for one of the sites and found I have many file requests for "/?M=D", "/?D=D", "/?M=A", "/?N=A", and "/?D=A".
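For anyone who wants to run the same check on their own logs, here is a minimal Python sketch. It assumes the access log is in the common Apache log format, and the sample lines, IP addresses and the `count_odd_queries` helper are made up for illustration. It only counts requests whose path ends in a one-letter query string like the ones listed above; it says nothing about why the crawler asks for them.

```python
import re
from collections import Counter

# Matches the request part of a log line, e.g. "GET /?M=D HTTP/1.0".
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+"')
# A query string of exactly one uppercase letter, "=", one uppercase letter.
ODD_QUERY_RE = re.compile(r'\?[A-Z]=[A-Z]$')


def count_odd_queries(log_lines):
    """Count requested paths that end in a one-letter query such as ?M=D."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match and ODD_QUERY_RE.search(match.group(1)):
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample lines, not taken from any real log.
sample_log = [
    '10.0.0.1 - - [19/Mar/2005:01:00:00 -0800] "GET /?M=D HTTP/1.0" 200 512',
    '10.0.0.1 - - [19/Mar/2005:01:00:05 -0800] "GET /?D=D HTTP/1.0" 200 512',
    '10.0.0.2 - - [19/Mar/2005:01:01:00 -0800] "GET /index.html HTTP/1.1" 200 2048',
    '10.0.0.1 - - [19/Mar/2005:01:02:00 -0800] "GET /?M=D HTTP/1.0" 200 512',
]

odd_counts = count_odd_queries(sample_log)
for path, hits in odd_counts.most_common():
    print(path, hits)
```

To use it on a real log, pass an open file object instead of `sample_log`; the function only needs something that yields lines.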

PhilC
03-19-2005, 03:46 AM
It looks like Marcia might have found the answer. Do you use Cpanel? If you do, that seems to be your problem - there is or was a live link. I've never used cpanel, so I can't visualise it.

I agree with Lots0 - it won't be Yahoo! changing the URLs, which means they must have found the URLs somewhere. Have you done a link: search on some of them to find out what pages are linking to them? If you have, and if there are none, it may be that there were links at one time, however briefly.

Without an actual URL, it's very difficult to check anything.

Marcia
03-19-2005, 04:23 AM
PhilC
they must have found the URLs somewhere
Well now, here's a good one for the tin foil hat crowd. ;)

Phil, it's password protected and just a record of latest visits; but I keep the Yahoo toolbar enabled at all times, including when I view stats. How about you sicko, do you have the Yahoo toolbar installed?

PhilC
03-19-2005, 04:42 AM
Out of curiosity, I did a search on inurl:net/?D=D (from one of the examples in the first post) and there are 4,650,000 results - and that's just for that particular one. Looking at a few "more from this site" listings, there are many URLs with different letters, so there are millions of URLs like that in the index.

The odd thing about them is that, apart from the #1 ranked page, no matter where I looked, none of the listed URLs are normal webpages, and even the #1 page redirects. Many are directory listings (because there is no index page in the directory), and many are pages with one word on them, or even one graphic on them. I didn't examine them all, but they aren't normal webpages or normal websites. It's the weirdest set of results I've ever seen.

Marcia
03-19-2005, 05:11 AM
Grab a look at the ones with html?M=A - some are real webpages, like the hotbot listing for yellow bellied sapsucker.

http://search.yahoo.com/search?p=inurl%3Ahtml%2F%3FM%3DA&ei=UTF-8&fr=slv1-&fl=0&x=wrt

I was just reading in some paper about crawling (and indexing) vs. ranking algorithms, so presumably there is a sequencing pattern for crawls. I wonder if it could be that those appendages to URLs are actually codes inserted to direct the crawlers - or more comprehensible to me, as some means of identification.

Also, some are from /directory/ index pages with access not even permitted. So apparently the crawler is trying to fetch pages that are on the server but aren't even linked to. Would that have functionality for detecting cloaked and/or orphaned doorway pages?

sickone
03-19-2005, 05:58 AM
How about you sicko, do you have the Yahoo toolbar installed?

I don't have the Yahoo toolbar installed, but I do use the search box that comes packaged with Firefox which lets you search Yahoo.

sickone
03-19-2005, 06:39 AM
I wonder if it could be that those appendages to URLs are actually codes inserted to direct the crawlers - or more comprehensible to me, as some means of identification.

I thought about that as well.

The url example I gave with the ?D=D appendage only had a splash page with a "coming soon" image on it for the first few months, so it is similar to the sites that are being returned in the strange set of results.


Would that have functionality for detecting cloaked and/or orphaned doorway pages?

I don't really see how it would be used to detect cloaked pages. The search engines would be cloaked by the ip address and/or user agent so the file request should be irrelevant for cloaking.

PhilC
03-19-2005, 12:54 PM
If they were crawler codes, they wouldn't have been left intact unless there was a brief glitch. When clicking on a link in the serps, Yahoo! does send people to a URL that includes those bits.

It wouldn't help in detecting orphan pages - only an examination of links can do that.

Switching to imagination mode (I think this might be what you meant about cloaking detection, Marcia)...

Suppose there is a particular serverside system that produces spam pages for search engines, and it is known to produce a particular kind of page when those, or even any, parameters are added. So Yahoo! requests one or more of those URLs to find out if the system is operating in the site. In my imagination, it's possible that it could happen.

But then, why would they index the returned pages? If they are the result of the serverside system, they wouldn't index them, and if they are not the result of it, they still wouldn't index the returned pages under fictitious URLs.

My favourite is that the URLs were found somewhere, and without an actual example, or knowing all the history (how the pages were made, uploaded, etc.) I think we're banging our heads against a brick wall. We don't even know if the software runs serverside or not. It's like trying to detect something blindfolded.

Marcia
03-19-2005, 01:05 PM
My favourite is that the URLs were found somewhere
Well, putting the tin foil hat back on: people have been thinking, when they've got no idea how Google found their pages, that somehow pages have been found through data sent to Google by the toolbar phoning home.

Using that logic, if one person is using a Yahoo searchbox in Firefox and another is using the Yahoo toolbar - plus all the other sources that data can be pulled from Yahoo's index (or indices) doing searches, maybe the source of searches is collected by the browser or toolbar phoning home. It could be that those appendages in some way indicate where the URLs or searches for them came from. Or just from visiting a page with the toolbar, data could be beamed back to the starship, no?

It still wouldn't explain why they're showing up in the index, though.

PhilC
03-19-2005, 01:24 PM
I've never bought into Google passing URLs to the search engine from the Toolbar, but Yahoo! may do it, and, if they spidered the URLs, and received pages back, they would index them.

sickone doesn't have the Y! toolbar. I don't have the Y! searchbox. Is it placed in webpages, on the desktop, or in the browser? If it's passing every URL a person visits back to base, then it's a bit unscrupulous unless there's an option to turn that aspect on.

lots0
03-19-2005, 03:34 PM
I've never bought into Google passing URLs to the search engine from the Toolbar...
The tb sure used to; it hasn't done so in quite a while though. But back when the google tb first came out it sure did. Can I prove it? No, not now. But I sure got a lot of pages indexed real quick using the toolbar, in the “old days”. :)

This strange add-on to the URL is popping up in a few places now. IMO, this is more of a glitch in yahoo.

sickone
03-22-2005, 12:02 PM
Using that logic, if one person is using a Yahoo searchbox in Firefox and another is using the Yahoo toolbar - plus all the other sources that data can be pulled from Yahoo's index (or indices) doing searches, maybe the source of searches is collected by the browser or toolbar phoning home.

Yahoo is most likely indexing pages without links pointing to them. About 4 days ago I installed a Wordpress blog for a client on a directory that is not linked to from ANYWHERE in the client's site so they could take a look at it to see if they liked the system. As of yesterday, the blog was indexed in Yahoo. I am the only one who knew the link to the blog and have not given it out to anyone. Why did the blog get indexed in Yahoo?

Unless WordPress automatically gives every new blog system a link from their site (which I doubt since I don't see any backlinks in yahoo for anything like that), then I have no idea how Yahoo indexed the new blog.

lots0
03-22-2005, 12:27 PM
Why did the blog get indexed in Yahoo?

Was the blog on a virtual or shared server?

sickone
03-22-2005, 01:30 PM
Was the blog on a virtual or shared server?

Yes. Can you explain why that matters? Are you implying that Yahoo may try to index pages based on an IP address and not links?

Marcia
03-22-2005, 01:50 PM
No mystery about a blog - Yahoo finds blogs immediately. And WordPress by default gets picked up by the feeds. Once that happens Slurp is all over them on a steady basis.

But that's probably not the same issue, I'm getting the strange URL appendages on sites with no blog - though not in the serps. It does sound like a glitch.

Was the blog on a virtual or shared server?

I do wonder about sites on the same shared IPs though; I'm getting almost paranoid over that.

sickone
03-22-2005, 03:14 PM
And WordPress by default gets picked up by the feeds. Once that happens Slurp is all over them on a steady basis.

So are you saying that WordPress provides a public link/feed to every installation of the blog system that is installed on a server? Can someone inform me on how to access this feed/link?

Marcia
03-22-2005, 03:41 PM
I really don't know, I installed through Cpanel and it's just there. :)

Here's what's linked to from the homepage as default with the install

http://www.marciahoo.com/feed/rss2/

sickone
03-22-2005, 07:06 PM
The blog I'm speaking of is in a directory that is not linked to from anywhere on the site. I had just created it for testing purposes and put it in a hidden directory around 4 days ago and Yahoo has already indexed it. I have not given out the link to anyone either, so I know that it hasn't been posted anywhere.

I do not understand how Yahoo was able to find it and index it.

kbman
02-24-2006, 07:26 AM
WP blogs ping blog directories (like Technorati etc.) whenever you publish a post. The search engines monitor these blog directories and then follow the links from there back to your blog. If the parent domain of the subdirectory already has PR and backlinks, these entries should appear quickly in the search engines.

You can switch this option off by removing rpc.ping-o-matic.com from the ping list in the WP options (under Writing, if I remember correctly), or you can create a robots.txt file to exclude that directory from being crawled.
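As a rough illustration of the robots.txt option, here is a small Python sketch that checks a rule set with the standard library's `urllib.robotparser`. The directory name `/testblog/`, the example.com URLs and the `is_allowed` helper are assumptions made up for this example. Note that robots.txt only asks well-behaved crawlers to stay out; it does not hide the directory.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: ask all crawlers to stay out of a test directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /testblog/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "Slurp" is used here as the crawler name; the wildcard rule covers any agent.
blog_allowed = is_allowed(ROBOTS_TXT, "Slurp", "http://www.example.com/testblog/")
home_allowed = is_allowed(ROBOTS_TXT, "Slurp", "http://www.example.com/index.html")
print(blog_allowed, home_allowed)
```

The file itself would be saved as robots.txt in the root of the domain, not inside the directory being excluded.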

Chris_D
02-25-2006, 05:14 PM
I realise that I'm responding to an old post, but Marcia wrote:

OK, I have that thing in Cpanel that shows the last 300 pageviews and this is what showed up earlier with a visit from the Yahoo crawler - and I've seen others like it, many of them

/holiday?M=A

But nothing like that has shown up in the SERPs ever.

One thing that slurp is known for is making 'spurious' requests to test for 404's - particularly on dynamically generated sites.

Your crawler is asking for strange URLs that have never existed on my site, like /piopio/darkness-halo-bottom-camera.htm. Are you looking on the wrong host?
Some web servers send a site navigation page or other response page with a "HTTP 200 OK" response instead of a "HTTP 404 Not Found" result for page-not-found conditions. To check on web server handling of page-not-found conditions, Slurp will occasionally send deliberately odd URLs built from random words to sites from which no 404 results have been seen. These URLs are built intentionally to not match any actual content at the site. We save information on the web server response to requests for non-existent pages so we can correctly recognize and remove obsolete URLs in our search database.
A Slurp check for 404 results from a web server consists of requests for up to 10 such URLs. The check for 404 behavior is not a normal part of Slurp site refresh, so such requests will be rare.
http://help.yahoo.com/help/us/ysearch/slurp/slurp-10.html
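
The check described in that help page can be sketched roughly as follows in Python. This is only an illustration of the idea, not Yahoo's actual code: the word list, the URL shape and the function names are all assumptions, and the "server" is a stand-in function that returns a status code so the example runs without a network connection.

```python
import random

# Hypothetical word list; the real crawler's vocabulary is not known.
WORDS = ["darkness", "halo", "bottom", "camera", "piopio", "lantern", "meadow"]


def build_probe_path(rng, words=WORDS):
    """Build a deliberately odd path that should not exist on any site."""
    directory = rng.choice(words)
    page = "-".join(rng.sample(words, 4))
    return f"/{directory}/{page}.htm"


def returns_real_404s(fetch_status, rng, probes=10):
    """True if every probe for a nonexistent page gets a 404 status.

    fetch_status is any callable that takes a path and returns an HTTP
    status code; probes defaults to 10, the upper limit the help page gives.
    """
    return all(fetch_status(build_probe_path(rng)) == 404 for _ in range(probes))


def honest_server(path):
    # Stand-in for a server that answers unknown pages with 404 Not Found.
    return 404


def soft_404_server(path):
    # Stand-in for a server that answers unknown pages with 200 OK.
    return 200


rng = random.Random(0)
honest = returns_real_404s(honest_server, rng)
soft = returns_real_404s(soft_404_server, rng)
print(honest, soft)
```

A site that fails this kind of check is one where a crawler cannot tell a removed page from a live one by status code alone.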