How Do I Spot Cloaked Sites?


rustybrick
08-31-2004, 12:56 PM
Forget the debate about cloaking, I am a bit tired of that anyway. How does one detect some of the cloaking going on around the Web? Follow these instructions:

(1) Download the Firefox Browser (http://www.mozilla.org/products/firefox/)
(2) Install it
(3) Download the User Agent Switcher for Firefox/Mozilla (http://useragentswitcher.mozdev.org/) while using Firefox
(4) Restart the browser
(5) Go to Tools --> User Agent Switcher --> Options --> Options (that will open a dialog box)
(6) Click Add under the User Agents section
(7) In the description add "Googlebot" and in the user agent add "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
(8) Repeat this process for all the spiders you want to test. Updated comprehensive list of user agents (http://www.psychedelix.com/agents.html).
(9) Under Tools --> User Agent Switcher --> select the user agent
(10) Then navigate to the pages that you want to test for cloaking.

Hope this helps some people be Googlebot. :)
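
The manual steps above can also be scripted. What follows is only a minimal sketch using Python's standard library: the function names, the browser user-agent string and the 0.9 threshold are illustrative assumptions, not anything from the extension. Like the extension, it only changes the user agent, so it can only reveal sites that key on that string.

```python
import difflib
import urllib.request

# Googlebot string from step (7); the browser string is an illustrative assumption.
GOOGLEBOT_UA = "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) Firefox/0.9"


def fetch(url, user_agent, timeout=10):
    """Fetch a URL while sending the given User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def similarity(page_a, page_b):
    """Return a ratio in [0, 1]; 1.0 means the two bodies are identical."""
    return difflib.SequenceMatcher(None, page_a, page_b).ratio()


def looks_ua_cloaked(url, threshold=0.9):
    """Flag a URL whose body differs a lot between the two user agents."""
    as_bot = fetch(url, GOOGLEBOT_UA)
    as_browser = fetch(url, BROWSER_UA)
    return similarity(as_bot, as_browser) < threshold
```

Calling `looks_ua_cloaked("http://example.com/")` would fetch the page twice and compare the bodies. Pages with rotating ads or timestamps differ slightly on every request, which is why the sketch uses a threshold rather than an exact match.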

Nick W
08-31-2004, 01:02 PM
Not bad Rusty.

Unfortunately that's only going to catch the incompetent cloakers. Most cloakers wouldn't give a stuff about UAs, they care about IPs ;)

Shame Firefox is so 'different' to Moz, some of the stuff is really round the wrong way, I just can't get used to it...

Nick

mcanerin
08-31-2004, 01:10 PM
Unfortunately, although it's easy to spoof an IP one way, it's very hard to spoof an IP and actually get a response back (the response would go to the spoofed IP, not yours).

There are ways around this but they're pretty complex, and usually not worth it.

I wonder how often (if ever) Google employees go home with a bunch of websites and visit them from there :D That would be really funny. Heck, you could even have a separate line/ISP coming into the Googleplex just for surfing the web on non-Googlebot IPs. Would be easy if they wanted to. Could even write a script to automatically compare a site that had been flagged.

That would be lots of fun, too..... ;)

Yes, I'm evil.....

Ian

hiero
08-31-2004, 01:15 PM
Pretty cool Barry......anyone have a url to test?

rustybrick
08-31-2004, 01:18 PM
>Not bad Rusty.

>Unfortunately that's only going to catch the incompetent cloakers. Most cloakers wouldn't give a stuff about UAs, they care about IPs ;)

>Shame Firefox is so 'different' to Moz, some of the stuff is really round the wrong way, I just can't get used to it...

>Nick

Nick, 100% right. :) But it's funny to see those people who cloak without the IP delivery methods. Of course mcanerin's idea would work, but you would need some powerful software that's updated frequently to find these sites that way. And then, if you're doing that, you can then easily cloak yourself.

hiero
08-31-2004, 04:05 PM
Here is a link to some cloaking examples:
http://fantomaster.com/fafaqcloak4.html

Using their first example:
Now if I'm using Firefox with the user-agent set to Googlebot I should be seeing the bot page, but I'm not.

So I can only assume that the CGI script doesn't include Googlebot 2.1 as an agent, right?

rustybrick
08-31-2004, 04:23 PM
Um, well no. Fantomaster's products are a bit more sophisticated than simply changing the user agent. They use IP Delivery. They know Googlebot's IP addresses and based on that info, they redirect the bot to a different page. What Nick said above in post #2.

NFFC
08-31-2004, 04:25 PM
>they redirect the bot to a different page

I think the word redirect isn't quite right.

It seems to me that they provide a customised experience according to user expectations.

hiero
08-31-2004, 04:30 PM
Got it, thanks. ;)

Then what real value do you see the user agent switcher for Firefox having? Is it a worthwhile add on?

Nick W
08-31-2004, 04:33 PM
It would help with those dumb arsed sites that insist you get IE before you can view their crappy AOL like pages.......

...as if you'd want to eh?

Nick

rustybrick
08-31-2004, 04:36 PM
There are many sites that deploy user agent based cloaking only. Why? I guess because it's easier. :confused:

hiero
08-31-2004, 04:51 PM
So if I write a JavaScript routine that looks at the user agent and returns a page for that user agent, that would be cloaking also. Hmmmm..... weird.

seomike
08-31-2004, 06:22 PM
Go to webmasterworld.com: if you have a user agent with a search engine bot name you'll see meta tags, otherwise you won't :)

Mikkel deMib Svendsen
08-31-2004, 09:06 PM
There are a lot of reasons to detect either IP or agent names and change the content based on it. If you only check sites using Googlebot agent names you will mostly find sites that do agent identification for browser adjustments - not for SEO-cloaking.

Spoofing an IP is not only difficult - it is highly illegal, I believe, in most nations. So don't start on that just for de-cloaking purposes.

cline
08-31-2004, 11:04 PM
How about just comparing what G has cached vs. what you see?

Or is that too simple? :o
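
For anyone who wants to make that comparison less of an eyeball job, here is a rough sketch. It assumes you have already saved the cached copy and the live copy as HTML strings yourself; the class and function names are made up for illustration, and word overlap is just one crude way to measure "different".

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible words, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.words.extend(data.lower().split())


def visible_words(html):
    """Return the set of lower-cased visible words in an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return set(parser.words)


def word_overlap(cached_html, live_html):
    """Jaccard overlap of visible words; 1.0 means the same vocabulary."""
    cached, live = visible_words(cached_html), visible_words(live_html)
    if not cached and not live:
        return 1.0
    return len(cached & live) / len(cached | live)
```

A low overlap between the cached copy and what your browser receives is a hint worth a closer look, not proof: pages also change between crawls for perfectly ordinary reasons.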

rustybrick
09-01-2004, 12:53 AM
Most sites that cloak also use the noarchive request

mcanerin
09-01-2004, 03:46 AM
>Most sites that cloak also use the noarchive request

I find it amusing that cloakers assume that Google will obey the rules.

I find it astonishing that Google appears to actually obey them.

Ian

Mikkel deMib Svendsen
09-01-2004, 04:52 AM
>I find it astonishing that Google appears to actually obey them.

They don't. Only when and if they want (or by accident, I don't know). I believe I am probably not the only one who still remembers how, 3 months after Google released the noarchive tag, they deleted all sites using it! Remember?

That event is one of the reasons I NEVER again listen to "good advice" from Google. I listen, yes, but then I do what I find right :)

seobook
09-01-2004, 05:33 AM
>I believe I am probably not the only one who still remembers how, 3 months after Google released the noarchive tag, they deleted all sites using it! Remember?

Wow, good deal for all ;)

Mikkel deMib Svendsen
09-01-2004, 06:10 AM
>Wow, good deal for all

How? The majority of cloakers already knew about it so they were not hit. Most of the sites I saw disappear were good sites that would have served the users well :)

Nick W
09-01-2004, 06:10 AM
>I listen, yes, but then I do what I find right

Spot on Mikkel, only way to fly....

>>no archive

Bah humbug! So don't bother with it! For me a noarchive just throws up a big RED FLAG. If you do it right, and for the right reasons, the differences between what G has cached and what the user really sees should be pretty minimal.

If someone really wants to mess with your site, a noarchive tag ain't gonna stop 'em. If someone really wants to damage you, do you think they'd go fill out a Snitch Report? Would they bollocks! ;) That's just for people that have no clue what they're doing...

Nick

Mikkel deMib Svendsen
09-01-2004, 06:14 AM
>If you do it right, and for the right reasons, the differences between what G has cached and what the user really sees should be pretty minimal.

Yes, there are in fact several ways to "cloak" the cached page too, so (the majority of) users still don't see what was cloaked. But I guess you, Nick, already know about that :)

seobook
09-01-2004, 06:18 AM
>How? The majority of cloakers already knew about it so they were not hit. Most of the sites I saw disappear were good sites that would have served the users well :)

I was joking...stating that Google broke people off pretty fierce with that move. I think you were joking too.

One of the hardest things about forums is that many people down the road will read a joke and not realize what people are thinking. I wonder if and how many people I may have ever given bad advice to in a joke taken out of context.

Nick W
09-01-2004, 06:25 AM
>But I guess you, Nick, already know about that

HAHA, contrary to what you might think, I really don't cloak much. I know there is a way to do that, but I don't know it. (seem to remember someone showing me but I've certainly forgotten it now...)

So, do tell Mikkel, how is that done?

Nick

seobook
09-01-2004, 07:20 AM
>HAHA, contrary to what you might think, I really don't cloak much. I know there is a way to do that, but I don't know it.

Nick,
With as much as you tend to like broken algorithms (see your negative reputation marks ;)) one would assume that you would cloak everything even if just for the heck of it.
:D

Nick W
09-01-2004, 07:26 AM
Not at all, I'm pretty low/medium risk on a daily basis. Religious white hatters just wind me up ;)

Nick

Mikkel deMib Svendsen
09-01-2004, 07:28 AM
>With as much as you tend to like broken algorithms (see your negative reputation marks )

Reputation is not a very good way to identify cloakers :rolleyes:

seobook
09-01-2004, 07:30 AM
>Reputation is not a very good way to identify cloakers :rolleyes:
True enough... some of the people who could only hope to improve would never dream of cloaking :)

Chris_D
09-01-2004, 08:33 AM
The only cloaked sites you can easily detect are ones using a badly implemented cloaking script.

And if the script is badly implemented - you'll find a range of fingerprints which will expose the whole network.

Cloaking - as mentioned earlier in this thread - is about serving different content based on IP address AND user agent.

Here is a simplified explanation of the process. A webserver (which is running a cloaking script) gets a user agent (browser, bot etc) request for a page.

When any user agent (your IE web browser, or a Googlebot etc) makes a 'request' for a webpage from a server, it "identifies" itself (and its IP address, and a whole pile of other information, like referrer details etc) to the webserver. That's what you see in your logfiles.

At the point of the request, the cloaking script has a quick check of the IP address to see if it is a known Googlebot IP address. Many scripts don't even look at the user agent any more.

If the request comes from a known Googlebot IP address, the cloaking script returns a 'different' page to the requestor (in this case Googlebot): a different page to the one that it would have returned for anyoldfred using IE6.0 on his local ISP IP address.

And just to be sure, a quick JavaScript redirect at the top of the cloaked page will ensure that, even if the page is cached, what someone running IE etc sees when viewing the cache isn't the page Googlebot saw.

Effective? Absolutely. Risky? Absolutely.

Why is it risky? Because all Google or Yahoo! has to do is quietly obtain a new block of otherwise unknown IPs, and quietly start spidering under a new 'experimental name'. And then compare the results to a 'known' Googlebot IP which follows it. And then the game's up................
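
To make it concrete why a spoofed user agent gets you nowhere against this kind of script, here is a simplified sketch of the decision described above. It is an assumption-laden illustration, not anyone's actual product: the network ranges are reserved documentation addresses (RFC 5737), not real crawler IPs, and the page names and function names are hypothetical.

```python
import ipaddress

# Hypothetical crawler ranges for illustration only. These are reserved
# documentation networks (RFC 5737), not real search engine addresses.
KNOWN_CRAWLER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_known_crawler(remote_addr):
    """True when the requesting IP falls inside a listed crawler network."""
    address = ipaddress.ip_address(remote_addr)
    return any(address in network for network in KNOWN_CRAWLER_NETWORKS)


def choose_page(remote_addr, user_agent):
    """Pick a page by IP only; the user agent is deliberately ignored."""
    if is_known_crawler(remote_addr):
        return "crawler_page.html"
    return "visitor_page.html"
```

With this logic `choose_page("203.0.113.9", "Googlebot/2.1")` still returns the visitor page, because the address is not on the list. It also shows the risk: a crawler arriving from an address that is not on the list gets the visitor page too, and the two versions can then be compared.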

Mikkel deMib Svendsen
09-01-2004, 08:50 AM
There are a lot of ways to cloak; some of them follow the way you describe, Chris_D, and some use a lot more identifiable facts and behaviours from bots.

>Cloaking - as mentioned earlier in this thread - is about serving different content based on IP address AND user agent.

I would personally define it a lot broader. Something like: ... is about serving different content based on one or more accessible identifiers determined to be strong and unique enough.

The JavaScript redirect is not very effective at hiding cached versions. All you have to do is turn off JavaScript in your browser or request it with the "view-source:" prefix in IE.
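
Along the same lines, once you have the raw source of a page (saved from view-source, for instance) you can scan it for signs of a client-side redirect instead of reading it by hand. A rough sketch follows; the pattern list is an illustrative assumption and far from exhaustive, and a redirect hidden inside an external script file would not be caught by it.

```python
import re

# Patterns that commonly indicate a client-side redirect. This list is an
# illustrative assumption, not a complete catalogue.
REDIRECT_PATTERNS = [
    # Assignment to location or location.href (but not an == comparison).
    re.compile(r"(?:window\.|document\.|self\.|top\.)?location(?:\.href)?\s*=[^=]", re.I),
    # location.replace(...) or location.assign(...)
    re.compile(r"location\.(?:replace|assign)\s*\(", re.I),
    # <meta http-equiv="refresh" ...>
    re.compile(r"<meta[^>]+http-equiv\s*=\s*[\"']?refresh", re.I),
]


def find_redirect_hints(html_source):
    """Return the first matching snippet for each pattern found in the source."""
    hints = []
    for pattern in REDIRECT_PATTERNS:
        match = pattern.search(html_source)
        if match:
            hints.append(match.group(0))
    return hints
```

An empty list does not mean the page is clean; it only means none of these particular patterns appeared in the source you gave it.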

Chris_D
09-01-2004, 09:08 AM
I was just trying to provide a simplified example of the IP based cloaking process (IP is one of the 'uniques' to identify as you mentioned), as much of this thread has focussed on the content of the user agent identification string.

Of course, then there's the /file.exe method......

rustybrick
09-01-2004, 09:33 AM
Any chance we can get fantomaster back in the forums? ;)

seomike
09-01-2004, 12:46 PM
I'm sure Ralph and Dirk are busy catching new spiders and staying on top of the SEs. I think their view on SEO isn't fogged like most people's. They know that if true SEO is to survive then there can't be any marriage between the SEO and SE industries: too many conflicts of interest. Though it would be nice to have them drop a line here and there.

Incubator
09-01-2004, 01:13 PM
It would be good to hear from fantomaster, to hear about the latest in cloaking and their perceptions


cheers

WC

rustybrick
09-01-2004, 01:31 PM
I agree seomike, as to what they are doing. But if someone has personal contact with these guys, maybe they can make a guest appearance?

Anyway, I just want to make sure this topic stays on the 'how to' versus the ethics.

Thanks.

Golgotha
09-01-2004, 03:05 PM
Here's an example of 'cloaking' although I would rather call it redirecting, or better yet, 'translating' in this example. This example shows why redirecting can be valuable to the 100% Flash website.

htt[p]://www.search-this.com/website_promotion/ASP.NET_redirection.aspx

I believe the above example is an indication that there ARE still some gray areas when it comes to cloaking. I believe most people would agree that misrepresenting your site to the search engines should be frowned upon and punishable. However, if it is used as an interpreter for Flash so that the bots can have an understanding, then I think this is legit...

Mike Grehan
09-01-2004, 04:51 PM
>I agree seomike, as to what they are doing. But if someone has personal contact with these guys, maybe they can make a guest appearance?

>Anyway, I just want to make sure this topic stays on the 'how to' versus the ethics.

>Thanks.

Rusty Barry,

If you want to simply detect if your competitor is
cloaking at Google, there's a simple litmus test.

Google (and Yahoo!) actually give a page weight (or
page size) when they return the results of a search.

Do a search in either place and note that they have:

www.somesite.com 44k Cached More pages from this site

or whatever.

So, you click on the top link which has a page weight
of 44k and a 3 meg flash movie full of topless dancers
appears... yup, that's cloaked ;-)
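
If you want to run that litmus test over more than a handful of results, the comparison is easy to sketch. The following assumes you have noted the size label from the results page and measured the size of what you actually received yourself; the function names, the "k = 1024 bytes" reading of the label and the 50% tolerance are all assumptions for illustration.

```python
import re


def parse_listed_size(label):
    """Convert a size label such as '44k' into bytes (assuming k = 1024 bytes)."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*k\s*", label, re.I)
    if not match:
        raise ValueError(f"unrecognised size label: {label!r}")
    return int(float(match.group(1)) * 1024)


def size_mismatch(listed_label, actual_bytes, tolerance=0.5):
    """True when the received size deviates from the listed size by more than the tolerance."""
    listed = parse_listed_size(listed_label)
    if listed == 0:
        return actual_bytes > 0
    return abs(actual_bytes - listed) / listed > tolerance
```

So `size_mismatch("44k", 3 * 1024 * 1024)` flags the 3 meg case, while a page that comes back at roughly 44k does not. A generous tolerance is deliberate, since ordinary pages change size between crawls.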

And by the way, Ralph (Fantomaster) is a close friend
and contributor to my new newsletter. I am so proud
because, not only is he one of the most intelligent guys
you'll ever meet, he's also one of the nicest.

And when it comes to cloaking... IMO why go anywhere else!

Cheers!

Mike.

dannysullivan
09-01-2004, 05:09 PM
>It would be good to hear from fantomaster, to hear about the latest in cloaking and their perceptions
I dropped Ralph an email about the thread, so perhaps he'll have a chance to swing by.

rustybrick
09-01-2004, 05:21 PM
Thanks Mike and Danny, and I hope to thank Ralph (fantomaster) too.

seomike
09-01-2004, 05:24 PM
>And when it comes to cloaking... IMO why go anywhere

I totally agree. I've written a spider audit for their spiderspy list just to see how good it is. For example every time Google changes their bots' IPs they have it updated within hours!

My hat goes off to them because that is a very big undertaking, tracking all those Googlebots. Not only that, but there are all the other spiders that make it into their list, which is now a text file of over 540 kilobytes.

Incubator
09-01-2004, 09:41 PM
>For example every time Google changes their bots' IPs they have it updated within hours!

>Not only that, but there are all the other spiders that make it into their list, which is now a text file of over 540 kilobytes.
As far as spider IPs go............... agreed, every 4 hours they update, if changes happen.

The problem being they have to move away from the flat file and find another delivery method, either MySQL or a trigger-friendly .db

cheers

WC

seomike
09-01-2004, 10:30 PM
Agreed. Maybe if we get them on here we can badger them into giving an SQL dump every 4 hours instead :).

fantomaster
09-02-2004, 05:12 PM
Hi everyone, known and as yet unknown - and thanks, Danny, for the invitation and gentle nudging! Glad to be on board, albeit somewhat pressed for time (so what's new, eh ...)

So, to get straight to the point: indeed we have an SQL version of the fantomas spiderSpy(TM) botBase in the making, hoping to launch it sometime in Fall.

However, for the time being what we conceive to be the better solution is to simply allow it to generate your own, fully customized spider lists for further processing.

Still, it might indeed be viable to allow for download of the whole SQL database in one fell swoop as well if you feel that would be of any use.

Because it's not as if the current size of the db poses such a big performance problem in flat file format on any professional systems we're aware of. We do a lot of stuff with SQL and at the end of the day, there's lots of scenarios where flat solutions will simply perform better if only on the stability and reliability score. Having to reconstruct a corrupted db (and they always seem to go corrupt sometime sooner or later) sure is no fun!

As for UAs, we feature the spiders' in the db, of course, but for industrial-strength cloaking they're of little use IMO - far too risky to rely on that sort of easily manipulated data.

Actually, my kudos go to my partner Dirk, who bears the brunt of the work of constantly monitoring more than 8K sites' traffic to catch the spiders as they come - this process isn't easily automated reliably, so it's really quite a chore.

NFFC
09-02-2004, 05:34 PM
>industrial-strength

Love that, I'm a non-cloaker but that is great branding.

I would be interested in your view of this
http://www.google.com/search?q=+site:www.microsoft.com+Vadmin+3.0+optimises+websites+for+more+search&hl=en&lr=&ie=UTF-8&filter=0

BTW
Don't forget to look at http://forums.searchenginewatch.com/showthread.php?t=1430

littleman
09-02-2004, 05:46 PM
My goodness, that is straight-up spam from MS, I thought such a fine company would be above such ugly blackhat tactics! Shocking!
BTW, you can still see the original if you use a non-Mozilla/MSIE browser.

From the de-cloaked page of
www.microsoft.com/asia/solutionMarketPlace/portal/broadcast-automation-india.htm ...

first there is an image which reads:
Welcome to our company. This page has been designed to help our visitors finding directly the information, product or service they are searching in our website.




The entry page to Microsoft's Web site. Find software, solutions and answers. Support, and Microsoft news.

Broadcast automation india



Microsoft


Solution Information The Vadmin 3.0 CMS Enterprise Edition is a web-based application that allows clients to modify, create and delete website content or images on the fly (from any computer which has an Internet connection). Vadmin 3.0 CMS Enterprise Edition broadcast automation india supports sections and sub-sections within sections. This feature enables higher levels of security due to an enhanced administration section for user roles and content access. It also seamlessly integrates with the Vadmin Registration and Security system, broadcast automation india and the Vadmin 3.0 User Management System. The Vadmin 3.0 Content Management System Enterprise Edition has a simple to use administration interface with a tree view that resembles the website navigation structure. The aim of the Vadmin 3.0 Content Management broadcast automation india System (CMS) Enterprise Edition is to give website administrators the ability to manage website content within a large enterprise level website, without the need to employ the services of a web design company. Business Issue Websites can have multiple broadcast automation india design templates associated to them. Vadmin 3.0 optimises websites for more search engines as some search engines have difficulty indexing website pages that end in query strings i.e. http://www.mysite.com/content.asp?pageid=357 Websites can have multiple broadcast automation india navigation bars that users can associate and remove pages from. Navigation bars can be textual or image based. Website administrators can create and edit. A simple hyper linking tool allows website. Value to Customer The Vadmin 3.0 CMS Enterprise Edition

broadcast automation india

is also designed to provide control over the quality of website content multiple website updaters. This is both in terms of content accuracy and content formatting with. Content quality is also assured through the use of predefined formatting broadcast automation india technology and document approval. The predefined formatting technology is employed to define the style of the text used within the website content. The use of consistent formatting in website content improves website aesthetics, and enhances website functionality broadcast automation india and the corporate brand due to a high level of consistency. Document approval is an optional feature that requires that content is approved by a content editor with approval rights before it is displayed on the website. This reduces the occurrence of errors broadcast automation india and helps enforce a high standard of content. Content is divided into blocks and can support expiry options. Users can be notified when content blocks have expired and need to be updated. Administering a website using the Vadmin CMS requires limited computer broadcast automation india knowledge. Company Information Established in 1998 Enlighten, has five years of experience under its belt in an ever-evolving industry. An IT Solutions Provider, our services include; consultation, software development, database development, SMS text services, broadcast automation india graphic design and corporate identity development, intranet development and windows based applications. Other services include; Windows NT Hosting ( www.enlightenhosting.com ) and Domain Registration ( www.enlightendomains.com ). Enlighten is an .NZ Authorised broadcast automation india Registrar and rebuilt the Shared Registry System Interface in the .Net platform. Enlighten also work closely with Telecom and Vodafone as a SMS (Short Message Service) provider, with our own XML gateways. 
Being a technology driven business, our primary broadcast automation india tools of development include; Web – HTML/DHTML/XML/ JavaScript/C#/Flash/VB Script/ASP and .NET Software development – Crystal Reports/C#/C++/XML/VB Script and .NET Databases – SQL 7 + Solution Information NETS Identity Management Web Edition provides perfectly broadcast automation india the authentication infrastructure to the eBusiness players (B2C). We have practical references supporting to over 2 thousand users with the proven solution to Korea Market for 4 years. For function, it is composed of Multi-site, Multi-Domain, Multi-Server broadcast automation india Single Sign-On, Automated Provisioning, Single Point Management and so on that meets with the needs of the eBusiness players NETS Identity Management Web Edition is the solution of the Authentication and Access control applicable to the Enterprise Environment. broadcast automation india It provides the suitable Total Identity & Access Management function to the Enterprise Environment through SiteMinder of Netegrity Inc, leading provider of IM. It helps enterprise manage to many structures of organization within enterprise, enhance more broadcast automation india strong security, control the access between Identity and Information. Business Issue NETS IM Web Edition NETS IM Enterprise Edition Value to Customer Reduced complexity through consolidation of identity information from across the enterprise, such as preferences, policies, and processes Reduced costs through enhanced provisioning automation, delegation, and self-service from within and outside the firewall Ensuring higher and more consistent levels of security and privacy for customers and all stakeholders interacting with enterprise systems and data Company Information NETS Company Limited(NETS) is a key enabler for wired and wireless web-based identity management(IM) infrastructure in Korea.

broadcast software china
MICROSOFT ASIA

HEHEHEhehe

Mikkel deMib Svendsen
09-02-2004, 05:49 PM
Sorry to break the fun, but this is not really cloaking - you can see the pages with a normal IE browser from any IP. They are just using a fast client side redirect

littleman
09-02-2004, 05:58 PM
You are right, but it is still fun, because it is still very blackhat.

Check out the spammy HEAD section:
<!--TOOLBAR_START--><!--TOOLBAR_EXEMPT--><!--TOOLBAR_END-->
<html>
<SCRIPT language="JavaScript" SRC="javascript/balise.js"></SCRIPT>
<SCRIPT language="JavaScript">balise("http://www.microsoft.com/asia/solutionmarketplace/solution.asp?ind=13&sid=130201&type=1&sLanguage=6", "7", "78");</SCRIPT>
<HEAD>
<META http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Microsoft - Broadcast automation india</title>
<META HTTP-EQUIV="pragma" CONTENT="no-cache">
<META HTTP-EQUIV="cache-control" content="no-cache">
<META HTTP-EQUIV="Content-Type" content="text/html; charset=UTF-8">
<META HTTP-EQUIV="Content-Language" content="">
<META name="DESCRIPTION" content="The entry page to Microsoft's Web site. Find software, solutions and answers. Support, and Microsoft news.">
<META name="KEYWORDS" content="Microsoft, asia, china, india, australia broadcast automation india">
<META name="CLASSIFICATION" content="broadcast automation india">
<META name="ROBOTS" content="INDEX|FOLLOW">
<META name="ROBOTS" content="NOARCHIVE">
<STYLE type="text/css">
.pscss {position:absolute; top:600px; left:0px; width:0px; height:0px; z-index:2;}
H1{display: inline; font-size:12px}
H2{display: inline; font-size:12px}
</STYLE>
</HEAD>

Mikkel deMib Svendsen
09-02-2004, 06:06 PM
If you really must put a hat on the solution I think this one is more appropriate :)

http://www.partydomain.co.uk/d-commerce/media/hatspitzblkwhite.jpg

NFFC
09-02-2004, 06:10 PM
>If you really must put a hat on the solution

hehe, clown hat seems about right.

>Sorry to break the fun, but this is not really cloaking

I'm not sure I agree, they are certainly showing VERY different content to a search engine than they expect to show to the huge majority of users.

I'm not a great fan of cloaking, to be truthful I think it's kind of lame, but when done well it is exceptional. My idea of "well" is having the content broadly the same between users; imho MS haven't done that and I don't think that serves the users well.

You think they will get banned?

seomike
09-02-2004, 06:11 PM
This brings up a point that is rarely discussed.

The point being:
You can use whatever aggressive technique you want if your site would be sorely missed and banning it would bring bad credibility to an SE.

What would Google, MSN or Yahoo! be like if they banned eBay or Amazon for cloaking? Users would search somewhere else.

There's a point where you get sooooo big that you actually take the SEs by the cojones. Mainly, if you aren't found in their index then searchers just go somewhere else and the SEs get the "crappy search engine reputation".

hiero
09-02-2004, 06:12 PM
>Sorry to break the fun, but this is not really cloaking - you can see the pages with a normal IE browser from any IP. They are just using a fast client side redirect
Even so, it's still cloaking. Why do you think it's not?

seomike
09-02-2004, 06:12 PM
I wonder if MS uses Frontpage to edit their website :D

littleman
09-02-2004, 06:18 PM
>Even so, it's still cloaking. Why do you think it's not?
hiero,
I am afraid IHY has done his damage. Client side redirection can act very much like server side cloaking but they are technically very different.

hiero
09-02-2004, 06:23 PM
I guess what throws me off is they are using a particular phrase to key in on in each of those links that you mentioned before. That's why it would seem like cloaking.

seomike
09-02-2004, 06:26 PM
Probably seeing how well Google picks up on JavaScript cloaks. IMHO Google sucks at it :D

If you could corrupt a future competitor's results with gibberish they can't control, ummmm... you win, ha!

fantomaster
09-02-2004, 06:36 PM
>industrial-strength
>Love that, I'm a non-cloaker
You'll see the light some day, don't worry - pace another Google Dance or two. :)

>but that is great branding.
Thanks - getting along quite nicely. And yes, people really seem to love it.

>I would be interested in your view of this
>http://www.google.com/search?q=+site:www.microsoft.com+Vadmin+3.0+optimises+websites+for+more+search&hl=en&lr=&ie=UTF-8&filter=0
Funny. :)
But, ehm, it's not exactly the most competitive of keyword combinations, would you say?

>BTW
>Don't forget to look at http://forums.searchenginewatch.com/showthread.php?t=1430
O boy - sure would love to come (might have to don a false beard to get an invitation, eh?) but not at all sure I can make it.
But thanks for pointing it out.

Golgotha
09-02-2004, 06:47 PM
>I wonder if MS uses Frontpage to edit their website :D

No, but I'm sure they use VisualStudio.NET

littleman
09-02-2004, 06:47 PM
Okay, the meaning of cloaking has gotten all garbled up and is taking on something new. I blame it on misinformation and the fact that SEM has brought in a wave of people who fancy themselves as SEOs but are just bidding on key words.

What cloaking used to mean:
When a request is made to a webserver it comes with information; this includes IP, User Agent (most of the time), Referer (some of the time) and a host of more obscure details that those 'in the know' would rather you not think about.

On the host computer (the webserver) there sits a script, program, .htaccess command, or modified/custom webserver which monitors this header information and decides what content to deliver. This can be used for SEO purposes, but also for a host of other reasons.

What it means to a lot of you today:
Showing the search engines and the end user two separate pages, by either server side or client side manipulation.

------------------------
If you have no idea how server side stuff works just keep this in mind:
True cloaking is just like asking "who is knocking on my door", with the greeting varying depending on who you see through the peep-hole before the door is opened.

NFFC
09-02-2004, 06:53 PM
>You'll see the light some day

I don't know, I've tried but spending days on designing a logo that only the spiders will see makes me think it's not for me ;)

>Funny.
>But, ehm. it's not exactly the most competitive of keyword combinations, would you say?

It's funny for sure but I bet some webmaster is getting beaten out on his keywords by that. Normally the advice would be to try harder but in this case you have to beat MS, and that won't be easy if they leverage their PR/Linkage/Trust. I think that, as in the case of Yahoo's continuing SEO efforts, they are breaking the golden rule.

>might have to don a false beard to get an invitation, eh?

You have an invitation, come and donate some software for the raffle, for charity.

seomike
09-02-2004, 06:54 PM
>What it means to a lot of you today:
>Showing the search engines and the end user two separate pages, by either server side or client side manipulation.

Funny, we've been using Ralph's services for over 3 years and the only thing that has changed is that his spiderspy.txt file has gotten bigger. It still works the same way: if you are a spider you go here, if you aren't you go there.

I think what you call cloaking is really content-specific delivery based on user preferences, which is still cloaking, just more widely used.

Golgotha
09-02-2004, 07:12 PM
>I think what you call cloaking is really content-specific delivery based on user preferences, which is still cloaking, just more widely used.

And that's the problem: Google has made the blanket statement, "no cloaking", thus tarring all websites that cloak with the same brush.

But redirecting is the best method of allowing the bots to understand the content of a 100% Flash website.

Not only that, but it's really the only method of sending the user to a specific location within a Flash site based on their search results. Very nice if you are the user.

Redirecting levels the playing field for Flash... I know, I know, no one here cares about Flash... but hey, always trying to help my clients.

hiero
09-02-2004, 07:17 PM
Creating a keyword rich PDF of the contents of a flash file works well also.

Golgotha
09-02-2004, 07:20 PM
Creating a keyword rich PDF of the contents of a flash file works well also.

er?
that helps how?

hiero
09-02-2004, 07:25 PM
On the same page that the Flash file exists on, link to a PDF file that describes everything in the Flash file that you want the search engines to pick up on. Just make sure that the text isn't locked when you create and save the PDF. Engines that read PDF files will then index your information.

Incubator
09-02-2004, 07:37 PM
Can't you use the embedded tags as well?


Cheers WC

Golgotha
09-02-2004, 07:40 PM
That really isn't that helpful. I don't want to hijack this thread and turn it into a lesson on optimizing Flash; you can find that here: htt[p]://www.search-this.com/website_promotion/ASP.NET_redirection.aspx (remove braces around the p in htt[p]), but my point is that auto-redirecting comes in very handy in this case.

littleman
09-02-2004, 07:49 PM
Getting off topic but...
>flat file vs.MySql

You would be amazed at how fast a flat file can be referenced, especially if you come up with creative ways to keep the file sizes down

Off topic even more.
fantomaster, you still making a living promoting your warez? I mean the heyday for machine generated stuff has come and gone unless you are talking about bottom feeding. I remember the way you used to logspam the net, did you get much of a return on that?

fantomaster
09-02-2004, 08:40 PM
Getting off topic but...
>flat file vs.MySql

You would be amazed at how fast a flat file can be referenced, especially if you come up with creative ways to keep the file sizes down
Quite my point.

Off topic even more.
fantomaster, you still making a living promoting your warez? I mean the heyday for machine generated stuff has come and gone unless you are talking about bottom feeding. I remember the way you used to logspam the net, did you get much of a return on that?
Wouldn't exactly call it bottom feeding catering to Fortune 500 corps, no.

Also, apodictic statements like that tend to be a mite too black-and-white for my personal taste.
Depends entirely on what you actually produce, mechanical or otherwise - there's plenty of human generated stuff around that wouldn't hit the mark if you rubbed it on with a wad of Brillo.

We can probably agree that at the end of the day, all that counts are results. So yes, we're quite comfy with selling our fantomas shadowMaker(TM), thank you very much. But of course, the real money is in actually servicing SEO clients with the caliber of results we do tend to achieve.

I remember the way you used to logspam the net, did you get much of a return on that?
That, my dear, wasn't "logspam", it was highly targeted "logfile marketing". :) Sure beats the hell out of banner ads.

Actually, Brett brought it up in an exchange and we decided to have a bash at it.
It's quite effective, provided you target the right people, i.e. the more tech-savvy webmaster, with the right sort of product.
That apart, it's a pretty good method for getting a fairly reliable indication of how many sites are actually online under which TLD, etc. -
invaluable for statistical analysis.

littleman
09-02-2004, 08:54 PM
>highly targeted "logfile marketing".

That is priceless, I'll have to quote you some time.

Well, I don't know how targeted you actually were, I had a couple hundred thousand pages floating out there and I got several hundred a day from you.

>apodictic statements

Okay, I'll put it this way...
cloaking = radio
link pop = television

Or more:
cloaking = polyester
link pop = cotton

You can bottom feed for F500 companies though, I've done it plenty -- those 200k pages weren't for porn.

lots0
09-02-2004, 08:56 PM
How Do I Spot Cloaked Sites?
Spoof one of googlebot's IPs, that is the only way I know of.

But why would anyone other than a SE want to spot a cloaked page in the first place?

fantomaster
09-02-2004, 09:43 PM
>highly targeted "logfile marketing".

That is priceless, I'll have to quote you some time.
Help yourself.

Well, I don't know how targeted you actually were, I had a couple hundred thousand pages floating out there and I got several hundred a day from you.
The targeting wasn't in the domains we picked - in fact, we picked them all, if that should still qualify for "picking".
What we did target, however, was webmasters who would actually look at their logs and follow up links -
that's a very, very tiny minority, but what you do get in the end is highly prequalified techie traffic.
Now, all you need is some product or service that will appeal to this specific crowd, and bingo!

Anyway, kudos to you for noticing it at all - that makes you a member of a very select elite. As if you didn't know ... :)

>apodictic statements

Okay, I'll put it this way...
cloaking = radio
link pop = television

Or more:
cloaking = polyester
link pop = cotton.
IMO that's just more of the same. Just try (no, better don't!) watching tv when steering a car ...

In any case, I find the analogy pretty much flawed, just as is Google's PR rationale: link pop may - possibly -
work like a song in academia (after all, it's nothing but an overblown citation index - and let's not get into
the issue that there are plenty of those who would reasonably argue that citation indices are nothing but a
load of bull in the first place, blatantly copycatting the music industry as they do ...).
But in a commercial/commercialized Internet environment? Bah!
Call it human nature, call it stupidity or call it entrepreneurial shrewdness - people simply aren't that amenable
to cross linking with their competitors.

Which begs the obvious question of what to do if you're running a company nobody important wants to link to.

Plus, we've always viewed cloaking as being just one option in a whole arsenal of tools.
A very powerful one, indeed, but it's never been a particularly wise policy putting all your eggs
in one basket.
And if you know how to get decent link pop to your cloaked domains as well, why not make use of it?

Of course, link pop shouldn't be equated with PR (not saying you did - just pointing to the fact that lots of people
seem to subscribe to exactly this simplistic view these days) as it's a lot more than that.
But that is quite another story. As is the manner in which Google's continually been demoting the linkpop aspect
of their PR algo for ages now ...

You can bottom feed for F500 companies though, I've done it plenty -- those 200k pages weren't for porn.
Point taken, though we haven't had that questionable pleasure ourselves to date.

Nick W
09-03-2004, 04:46 AM
I have a couple of questions if I may fantomaster:


What purposes do webmasters use cloaking for other than on-page stuff?
How is your txt file delimited?


The reason I ask the first one is because I cloak some sites in a rather mild manner. I've not done anything too hard-core with the techniques but am always interested to see the possibilities...

Presumably hiding links, and networks..?

Nick

seomike
09-03-2004, 12:16 PM
Though I'm not fantomaster

>> What purposes do webmasters use cloaking for other than on-page stuff?

Once you have their spiderspy(TM) list then it's on. You can just match IP's when regular pages are loaded and add or subtract blocks of code ie meta tags, additional text, navigation etc...

Since the spiderspy.txt file is structured like so:

216.22.22.22
#216.22.22.21
#216.22.22.20
216.22.22.19

It's fairly easy to use a $buffer = trim(fgets(...)) followed by if ($buffer == $_SERVER["REMOTE_ADDR"]) to see if it's a spider. Of course this code is just a partial example, but you can see where I'm going with it. I call these hybrid cloaks because you serve the same page to both user and spider.

Just remember that the # comments are bots you don't want. An example would be Babelfish. You let that sucker through and it will translate your spider-only site, and when someone tries to view your site in Spanish they see a translated version of your spider food, ha!
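
A minimal sketch of the check described above, assuming PHP 7 or later. The function names load_spider_ips and is_spider are hypothetical, and the addresses are the placeholder values from the sample list above, not real spider IPs.

```php
<?php
// Hypothetical helper: parse a spiderspy-style list, one IP per line.
// Lines starting with '#' are bots we do NOT want to treat as spiders.
function load_spider_ips(string $text): array
{
    $ips = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if ($line === '' || $line[0] === '#') {
            continue; // blank line or commented-out bot
        }
        $ips[$line] = true; // keyed by IP so the lookup is a single isset()
    }
    return $ips;
}

// Hypothetical helper: exact match of the visitor address against the list.
function is_spider(string $remoteAddr, array $spiderIps): bool
{
    return isset($spiderIps[$remoteAddr]);
}

// Placeholder data copied from the sample list in this post.
$list = "216.22.22.22\n#216.22.22.21\n#216.22.22.20\n216.22.22.19\n";
$spiders = load_spider_ips($list);

// REMOTE_ADDR is only set under a web server; the fallback is for CLI runs.
$remote = $_SERVER['REMOTE_ADDR'] ?? '216.22.22.19';
echo is_spider($remote, $spiders) ? "spider\n" : "human\n";
```

In a real setup the text would come from the list file on disk (for example via file_get_contents) rather than an inline string, and the result would decide which blocks of the page get added or left out.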

Nick W
09-03-2004, 12:39 PM
I use this:

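function isbot() {
    // Load the list of known spider IPs, one per line.
    $iplist = file('/path/to/ips.txt');
    $remote = $_SERVER['REMOTE_ADDR'];
    foreach ($iplist as $val) {
        $ip = trim($val);
        if ($ip == '') {
            continue; // an empty pattern would match every visitor
        }
        // Escape the dots and anchor the pattern so that 1.2.3.4
        // does not also match 11.2.3.45.
        if (preg_match('/^' . preg_quote($ip, '/') . '$/', $remote)) {
            return TRUE; // Spider
        }
    }
    return FALSE; // Human
}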


I wouldn't be at all surprised to find it's flawed, but it's stood me in good stead for some time now ;)

Nick

Incubator
09-03-2004, 12:48 PM
I do have a question for the great fantomaster. I have seen your apps in action and they are, as you say, very "industrial".

I came across a small site offering a cloaking solution
http://www.searchenginecloaker.com/

do you have a comment on that product compared to yours, this one seems new to me.....

Thanks Fantomaster for showing up here !

Cheers

WC

fantomaster
09-03-2004, 06:25 PM
Hi, Incubator,
and thanks for the friendly welcome!

I do have a question for the great fantomaster. I have seen your apps in action and they are, as you say, very "industrial".

I came across a small site offering a cloaking solution
http://www.searchenginecloaker.com/

do you have a comment on that product compared to yours, this one seems new to me.....

As it's not our policy to comment on competitors' products and services in public, I'm afraid I'll have to pass on this one.

However, I guess I may point out that AFAIK it's not an entirely "new" product - I seem to remember having seen it around for the better part of a year and a half or so.

fantomaster
09-03-2004, 06:29 PM
Though I'm not fantomaster
Once you have their spiderspy(TM) list then it's on. You can just match IP's when regular pages are loaded and add or subtract blocks of code ie meta tags, additional text, navigation etc...

<snip>

Just remember that the # comments are bots you don't want. An example would be Babelfish. You let that sucker through and it will translate your spider-only site, and when someone tries to view your site in Spanish they see a translated version of your spider food, ha!
Couldn't have said it better, Mike! :)

Incubator
09-03-2004, 07:30 PM
Thanks Fantomaster I respect that....maybe i just have to buy them all to see for myself....much appreciated, you have supplied some great code over the time!!!


Cheers
WC

littleman
09-03-2004, 08:45 PM
When I was cloaking to hell I had servers doing a lot of work. Because of that I would actually hard wire my IP regex code into the script itself instead of grabbing an external file and throwing the IPs into a foreach loop. The response time made a difference. The system completely smokes some big operations out there pushing *SQL DBs on a lot more horsepower. Of course, today servers are a lot stronger, so it is probably less important
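
A minimal sketch of what hard-wiring the IP check into the script could look like, assuming PHP 7 or later. The constant and function names are made up for illustration, and the two addresses are just the placeholder values from the spiderspy.txt sample earlier in the thread, not anything actually used in production.

```php
<?php
// Hypothetical hard-wired pattern: every spider IP lives in one anchored
// alternation, so no list file is read and no foreach loop runs per request.
// The D modifier stops '$' from matching before a trailing newline.
const SPIDER_IP_REGEX = '/^(?:216\.22\.22\.22|216\.22\.22\.19)$/D';

function is_spider_hardwired(string $remoteAddr): bool
{
    // preg_match returns 1 on a match, 0 on no match.
    return preg_match(SPIDER_IP_REGEX, $remoteAddr) === 1;
}

// REMOTE_ADDR is only set under a web server; the fallback is for CLI runs.
$remote = $_SERVER['REMOTE_ADDR'] ?? '216.22.22.19';
echo is_spider_hardwired($remote) ? "spider\n" : "human\n";
```

The trade-off is the one described above: the script has to be edited and redeployed whenever the IP list changes, in exchange for skipping the file read on every hit.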

Mikkel deMib Svendsen
09-03-2004, 08:53 PM
Of course, today servers are a lot stronger, so it is probably less important

Yes, and not only that. Script engines are also a lot better now. ASP and PHP are not the same as they were 3-4 years ago.

However, for large-scale systems you still want to embed the most resource-intensive processes in some more controllable and faster-executing environment than scripting. It's less flexible but still more stable, in my experience :)

downtown
09-10-2004, 08:53 PM
Shame firefox is so 'different' to Moz, some of the stuff is really round the wrong way, i just cant get used to it...


This same User Agent Switcher extension ALSO works for Mozilla.

You can find lots of other great extensions for both Firefox AND Mozilla at http://extensionroom.mozdev.org/

Roger B.

Nick W
09-11-2004, 02:27 AM
Thanks downtown, i'll get right on that this morning! ;-)

Nick

davidof
09-22-2004, 09:50 AM
You are right, but it is still fun, because it is still very blackhat.

Check out the spammy HEAD section:
<!--TOOLBAR_START--><!--TOOLBAR_EXEMPT--><!--TOOLBAR_END--><html><SCRIPT language="JavaScript" SRC="javascript/balise.js">

The balise function name got me suspicious; it is a French word meaning marker or buoy. I found this French site, which is identical in form to the Microsoft spam pages:

http://www.hiver.vvf-vacances.fr/location-chalet-jura.htm

So Microsoft is using software to create its spam... Anyone know which one?