Scoring system for detecting PPC click fraud
Clicklab
06-24-2004, 06:50 PM
Greetings:
Brittany Thompson started an excellent discussion on the looming PPC click fraud problem, Have You Been a Victim of Click Fraud? (http://www.webproworld.com/viewtopic.php?t=22173).
We at Clicklab have developed a score-based system for fighting click fraud. We call it Click Inflation Index.
The system applies a number of tests to each user session on your website, much like spam filters do with emails. Penalty points are assigned to the session for each failed test. If the cumulative penalty score exceeds the threshold, the user session is tagged as inflated. The Click Inflation Index is the share of sessions tagged this way out of all user sessions, expressed as a percentage.
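The tag-and-count logic above can be sketched in a few lines of Python. This is a minimal illustration, not Clicklab's actual implementation: the session IDs, the penalty points and the threshold of 10 are hypothetical, and a session counts as inflated when its summed penalties exceed the threshold.

```python
def tag_inflated(session_penalties: dict[str, list[int]], threshold: int) -> dict[str, bool]:
    """Tag each session as inflated when its cumulative penalty exceeds the threshold."""
    return {sid: sum(points) > threshold for sid, points in session_penalties.items()}


def click_inflation_index(session_penalties: dict[str, list[int]], threshold: int) -> float:
    """Share of sessions tagged as inflated, as a percentage of all sessions."""
    if not session_penalties:
        return 0.0
    tags = tag_inflated(session_penalties, threshold)
    return 100.0 * sum(tags.values()) / len(tags)


# Hypothetical penalty points per session (one entry per failed test).
penalties = {
    "s1": [],            # passed every test
    "s2": [4],           # one failed test: below the threshold
    "s3": [4, 3, 5],     # three failed tests: 12 > 10
    "s4": [3, 3, 3, 3],  # four failed tests: 12 > 10
}
print(click_inflation_index(penalties, threshold=10))  # 50.0
```

Keeping the threshold above any single test's points means no one failed test can tag a session on its own.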
Here is a brief summary of the click fraud tests you can apply:
Test 1. Visit depth, measured as the number of pageviews or the time spent on the site.
Test 2. Visitors per IP triggers a penalty if the ratio for a particular IP exceeds the control group's by a certain percentage.
Test 2a. Paid clicks per IP works the same way as Test 2, but applies only to paid clicks.
Test 3. No cookie, no play? We think you should penalize a user session if it doesn't accept cookies.
Test 4. Pageview frequency helps to differentiate between human user sessions and auto-clickers.
Test 5. Anonymous proxy servers should definitely trigger a penalty.
Test 6. Geographic origin. You may want to penalize countries from which you have never received, and likely never will receive, a viable lead.
Test 7 and beyond. Fine-tune and customize. You can devise your own triggers and assign points to them.
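A sketch of per-session penalty scoring for the session-level tests in the list above, under stated assumptions: the Session fields, the point values, the 10-second and 1-second cut-offs and the placeholder country code "XX" are all hypothetical. The per-IP tests (visitors per IP, paid clicks per IP) need data aggregated across sessions, so they are left out here.

```python
from dataclasses import dataclass


@dataclass
class Session:
    pageviews: int
    seconds_on_site: float
    accepted_cookie: bool
    via_anonymous_proxy: bool
    country: str  # ISO country code


# Hypothetical placeholder for countries that never produce viable leads.
NO_LEAD_COUNTRIES = {"XX"}


def penalty_score(s: Session) -> int:
    """Sum penalty points for each failed session-level test (hypothetical weights)."""
    score = 0
    # Visit depth: a single pageview or a very short visit.
    if s.pageviews <= 1 or s.seconds_on_site < 10:
        score += 3
    # Cookies: the session did not accept a cookie.
    if not s.accepted_cookie:
        score += 2
    # Pageview frequency: average gap between pageviews under one second.
    if s.pageviews > 1 and s.seconds_on_site / (s.pageviews - 1) < 1.0:
        score += 4
    # Anonymous proxy.
    if s.via_anonymous_proxy:
        score += 5
    # Geographic origin.
    if s.country in NO_LEAD_COUNTRIES:
        score += 3
    return score


bot = Session(pageviews=20, seconds_on_site=8.0, accepted_cookie=False,
              via_anonymous_proxy=True, country="XX")
human = Session(pageviews=5, seconds_on_site=240.0, accepted_cookie=True,
                via_anonymous_proxy=False, country="US")
print(penalty_score(bot), penalty_score(human))  # 17 0
```

Custom triggers of your own would simply be more `if` blocks with their own point values.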
Here's an article describing this click fraud fighting system (http://www.webpronews.com/ebusiness/seo/wpn-4-20040622UsingScoringSystemtoCombatPPCClickFraud.html) in a little more detail.
Have you been using other methods of detecting and documenting PPC click fraud? What other tests would you apply in your particular situation?
Mikkel deMib Svendsen
06-26-2004, 05:48 AM
I am not sure what you want to do with this.
First of all, most of the tests are not within the bounds of what is usually defined as fraud, and certainly not in line with what I understand the engines agree on. So even if you detect fraud by your definition, I am not sure what to do with it if the engines will not accept the "evidence" and refund my accounts.
Personally, I do not agree with things like your cookie test. In some regions of the world a very high percentage of users have turned off cookies; does that make every click from Germany fraud? I don't think so. And what if people accept the cookie and then wipe it out within the month? (The last numbers I heard were that over 40% of US users wipe out ALL cookies within a month!)
Also, the number of page views per session, I think, may have more to do with how precisely your ads target that page and how good the page is. So if your target page is bad, does that mean the clicks are fraud? I don't think so.
The last problem I have with such a list is that fraud detection only works if you keep your tests very secret. Now that you have told the world how you test, how hard do you think it will be for someone to design a click fraud scam that goes right through all the tests? Not very difficult, I promise you :)
Clicklab
06-26-2004, 09:24 AM
First of all, most of the tests are not within the bounds of what is usually defined as fraud, and certainly not in line with what I understand the engines agree on. So even if you detect fraud by your definition, I am not sure what to do with it if the engines will not accept the "evidence" and refund my accounts.
Correct, none of these tests alone should trigger your click inflation flag. Your goal is to tune the system so that at least three or four simultaneous offenses are needed for the score to reach the threshold.
If this isn't "evidence" for PPC SEs, then what would be, pray tell?
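One way to check the tuning goal above, as a sketch: sort the penalty points per test from heaviest to lightest and count how many failures it takes for the running total to exceed the threshold. That count is the fewest offenses that can trip the flag. The weights and thresholds shown are hypothetical.

```python
from typing import Optional


def min_offenses_to_flag(weights: list[int], threshold: int) -> Optional[int]:
    """Fewest failed tests whose combined points can exceed the threshold.

    Taking the heaviest tests first gives the worst case; returns None if even
    failing every test cannot exceed the threshold."""
    total = 0
    for count, points in enumerate(sorted(weights, reverse=True), start=1):
        total += points
        if total > threshold:
            return count
    return None


weights = [3, 2, 4, 5, 3]  # hypothetical points per test
print(min_offenses_to_flag(weights, threshold=10))  # 3
print(min_offenses_to_flag(weights, threshold=8))   # 2: too easy to trip
```

If the result drops below three, either raise the threshold or lower the heaviest weights.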
Personally, I do not agree with things like your cookie test. In some regions of the world a very high percentage of users have turned off cookies; does that make every click from Germany fraud? I don't think so.
See my reply above. It's only one of many tests.
And what if people accept the cookie and then wipe it out within the month? (The last numbers I heard were that over 40% of US users wipe out ALL cookies within a month!)
Our test is a little different. We determine whether this particular visitor accepts a cookie during the session, not whether it's preserved for later.
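A sketch of how an in-session cookie test could work: the server sets a probe cookie on the first response and checks whether any later request in the same session sends it back. The cookie name `cl_probe` and the Request type are hypothetical, and the sketch assumes requests have already been grouped into sessions by some cookie-independent key (for example IP address plus user agent). A one-request session cannot be judged either way.

```python
from dataclasses import dataclass, field
from typing import Optional

PROBE_COOKIE = "cl_probe"  # hypothetical name of the cookie set on the first response


@dataclass
class Request:
    cookies: dict[str, str] = field(default_factory=dict)  # cookies sent by the browser


def accepted_cookie(session_requests: list[Request]) -> Optional[bool]:
    """True if the probe cookie set on the first response came back on any later
    request in the same session, False if it never did, None if there is only one request."""
    if len(session_requests) < 2:
        return None  # single-page session: nothing to check
    return any(PROBE_COOKIE in r.cookies for r in session_requests[1:])


print(accepted_cookie([Request(), Request({"cl_probe": "1"})]))  # True
print(accepted_cookie([Request(), Request()]))                   # False
print(accepted_cookie([Request()]))                              # None
```

Because the check only looks within one session, a visitor who deletes cookies later in the month still passes.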
Also, the number of page views per session, I think, may have more to do with how precisely your ads target that page and how good the page is. So if your target page is bad, does that mean the clicks are fraud? I don't think so.
Again, this is just one of many tests.
BTW, to make this particular test more accurate, you can apply it to the IP address, i.e. measure "percentage of single-page visits from IP address," and compare it to a control group. In other words, penalize for a statistical deviation instead of an absolute value.
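The "statistical deviation instead of an absolute value" idea can be sketched with a one-sample z-test for a proportion, a standard technique swapped in here for illustration. The cut-off of 3 and the sample counts are hypothetical, and the normal approximation behind the z-score is rough when an IP has only a handful of visits.

```python
import math

Z_CUTOFF = 3.0  # hypothetical: penalize deviations above 3 standard errors


def single_page_deviation(ip_single: int, ip_visits: int,
                          ctrl_single: int, ctrl_visits: int) -> float:
    """z-score of an IP's single-page-visit share against the control group's share."""
    p_ctrl = ctrl_single / ctrl_visits
    p_ip = ip_single / ip_visits
    # Standard error of a proportion over ip_visits visits, assuming the control rate holds.
    se = math.sqrt(p_ctrl * (1.0 - p_ctrl) / ip_visits)
    if se == 0.0:
        # Control share of exactly 0% or 100%: any difference is an infinite deviation.
        return 0.0 if p_ip == p_ctrl else math.copysign(math.inf, p_ip - p_ctrl)
    return (p_ip - p_ctrl) / se


def penalize_ip(ip_single: int, ip_visits: int,
                ctrl_single: int, ctrl_visits: int) -> bool:
    """Penalize only IPs whose single-page share is unusually HIGH versus the control."""
    return single_page_deviation(ip_single, ip_visits, ctrl_single, ctrl_visits) > Z_CUTOFF


# Control group: 400 single-page visits out of 1,000 (40%).
print(penalize_ip(27, 30, 400, 1000))  # True: 90% single-page visits from one IP
print(penalize_ip(14, 30, 400, 1000))  # False: about 47%, within normal variation
```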
The last problem I have with such a list is that fraud detection only works if you keep your tests very secret. Now that you have told the world how you test, how hard do you think it will be for someone to design a click fraud scam that goes right through all the tests? Not very difficult, I promise you
I disagree. Without knowing your settings and control group statistics, it will be difficult to design an auto-clicker that passes all tests.
Besides, in order to go right through all the tests, click thieves will be forced to slow down their defrauding activity, thus reducing both their ROI and your losses.
Mikkel deMib Svendsen
06-26-2004, 05:16 PM
> Again, this is just one of many tests.
Yes, but like some of the other tests, I do not think it has anything to do with click fraud; it has more to do with bad web design, bad campaign segmentation, etc. I am pretty sure the engines are not going to refund anything if I tell them: these users did not accept cookies, did not see many pages per session, came from a country I do not like and used a proxy server. Even if all those things are true, it still doesn't prove click fraud to me, and I am pretty sure the search engines would see it the same way.
So even though these tests might be useful for fine-tuning your campaign, I would not expect to get any refunds from them.
> I disagree. Without knowing your settings and control group statistics, it will be difficult to design an auto-clicker that passes all tests.
Do you know why search engines do not talk publicly in detail about how they do this filtering, or how they do spam filtering on organic results? One of the reasons is that it makes it way too easy for spammers and click frauders to trick the systems. The less you tell the public about any anti-fraud system, the less likely it is that anyone breaks it :)
Clicklab
06-26-2004, 05:52 PM
> Do you know why search engines do not talk publicly in detail about how they do this filtering, or how they do spam filtering on organic results? One of the reasons is that it makes it way too easy for spammers and click frauders to trick the systems. The less you tell the public about any anti-fraud system, the less likely it is that anyone breaks it
That's one way to look at it. Another is the open source way.
Thanks for the feedback.
Mikkel deMib Svendsen
06-26-2004, 07:07 PM
> That's one way to look at it. Another is the open source way.
Try and sell that one to the banks. I am sure they'll love open source security systems :)
Clicklab
06-26-2004, 07:37 PM
Here's a chapter from a book by David A. Wheeler that offers a good recap of the current state of this perpetual debate:
http://www.dwheeler.com/secure-programs/Secure-Programs-HOWTO/open-source-security.html
Quotes:
"Generally attackers (against both open and closed programs) start by knowing about the general kinds of security problems programs have. There's no point in hiding this information; it's already out, and in any case, defenders need that kind of information to defend themselves."
"In short, the effect on security of open source software is still a major debate in the security community, though a large number of prominent experts believe that it has great potential to be more secure."
Mikkel deMib Svendsen
06-29-2004, 03:49 AM
Personally, I would never base an important security system on Open Source software, and I have never seen any examples of it. Even though "some experts" may have theoretical arguments that it would be OK, I do not think so, and judging by how little Open Source security systems have spread, it looks like most people agree with me :)
However, Open Source or not, it doesn't really change the fact that I think your tests show more about the quality of the site or the segmentation of the campaign than about the quality of the visitors.
I do not think you can call a visit fraud just because the visitor does not perform as expected. It may be a bad campaign or a bad website, but it is certainly not fraud, and as such I very much doubt the engines will refund anything based on such findings.
Clicklab
07-02-2004, 04:41 PM
One other test you may want to consider is "sales per IP" or "actions per IP," referring to the total order amount or the number of actions taken (such as newsletter subscriptions) per IP address.
To make sure it's not a design or usability flaw, as justly pointed out earlier in this thread, compare the results against a control group of unbiased traffic sources, such as organic search results.
It's unlikely that click ghosts will be placing orders or signing up for your newsletters :)
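A sketch of the actions-per-IP test, assuming the organic conversion rate r serves as the control: if paid clicks from an IP converted like organic traffic, the chance of seeing zero actions in n clicks is (1 - r)^n. The 1% cut-off and the example IPs (taken from documentation address ranges) are hypothetical, and the sketch only flags IPs with zero actions.

```python
def prob_zero_actions(clicks: int, control_rate: float) -> float:
    """Chance that `clicks` visits yield no actions if they convert at the control rate."""
    return (1.0 - control_rate) ** clicks


def suspicious_ips(paid: dict[str, tuple[int, int]], control_rate: float,
                   alpha: float = 0.01) -> list[str]:
    """IPs whose paid clicks produced zero actions when that would be unlikely (< alpha)
    for traffic converting like the control group."""
    return sorted(
        ip for ip, (clicks, actions) in paid.items()
        if actions == 0 and prob_zero_actions(clicks, control_rate) < alpha
    )


# Control: organic search traffic, e.g. 50 actions from 1,000 visits.
organic_rate = 50 / 1000
paid_by_ip = {                   # ip -> (paid clicks, actions)
    "203.0.113.5": (120, 0),     # 0.95**120 is about 0.002: flagged
    "198.51.100.7": (20, 0),     # 0.95**20 is about 0.36: too few clicks to judge
    "192.0.2.44": (150, 6),      # converting: not flagged
}
print(suspicious_ips(paid_by_ip, organic_rate))  # ['203.0.113.5']
```

Comparing against organic traffic to the same pages is what separates a usability problem (both sources convert poorly) from suspicious paid traffic.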
AlchemistMedia
07-16-2004, 07:12 PM
ClickLab, good start on identifying click fraud.
Mikkel, valid points as always. See you in SJ?
For more information on the click fraud issue:
Background on Click Fraud (http://www.alchemistmedia.com/CPC_Click_Fraud.htm).
To obtain refunds for any identified fraudulent or non-legitimate CPC traffic, very specific click data has to be retrieved and then analyzed by both technological and human means using a combination of benchmarking, trending, and nitty-gritty research. Then a detailed case must be made based on both analyses and presented to the CPC engine(s), often multiple times and working with multiple data sets, until a suitable refund is negotiated between the two parties.
I've been handling this aspect of CPC campaigns for a few years now and welcome advertisers who would like more help in handling click fraud internally, or who would like help with their data analysis and presenting a case for refund negotiation. It takes a lot of legwork, but it's been done with success!
For more information: Alchemist Media Click Fraud Auditing & Refund Negotiation (http://www.alchemistmedia.com/Click_Fraud_Auditing.htm)
Clicklab
07-17-2004, 10:01 AM
Welcome AlchemistMedia :)
Good to see PPC click fraud consulting services spring up!
Mikkel deMib Svendsen
07-17-2004, 10:33 AM
Good to see you on board, AlchemistMedia. Yes, I'll be in SJ :)
As always, you have some excellent information about click fraud. What is scarier about click fraud is not the kind you (with your skills and knowledge) can detect, but all the rest (!). Both human and bot-based systems can be (and most likely are being) built so well that I just do not see how anyone could identify them in a way that would satisfy the engines' reasonable requirements for documentation. That was part of my point in some of the previous posts.
Unfortunately, the more we openly discuss how to detect click fraud and develop detection systems, the easier it becomes for click fraud artists (if you can call them that) to adjust their systems to go under our radar... So you say you look at page views? Great, I'll add a random number of page views and a random time spent on site to my bot. Oh, you check IPs too? I'd better add a bunch of dial-up accounts and public proxy servers. You get the picture :)
I am not sure what the best way around this paradox is: I agree the issue is very important, but how do we avoid giving too much of our click fraud detection knowledge away to the wrong people?
orion
07-19-2004, 01:17 PM
CNET is running this article today on search engines and click frauds,
http://news.com.com/Exposing+click+fraud/2100-1024_3-5273078.html?tag=nefd.lede
It's only a matter of time before the SEC initiates probes into the publicly traded SEs and PPCs involved.
Orion
Clicklab
07-19-2004, 01:40 PM
Thank you Orion - and Stefanie Olsen :)
The free market will punish PPCs that aren't taking adequate measures against click fraud well before the SEC gets to them (if it ever does).
orion
07-20-2004, 11:46 AM
So many SEC probes have demonstrated that when fraud and deception are involved, the free market cannot regulate itself. History has shown that sooner or later all vested interests will collude. Murphy's Law applies: "If anything can go wrong, it will."
Meanwhile, here is a thread on click frauds at slashdot
http://slashdot.org/articles/04/07/20/1234238.shtml?tid=217&tid=95
Some Slashdot posters could be mistaken for SE PR folks trying to save face, but I could be wrong.
Orion
bradbyrd
07-21-2004, 09:09 PM
click fraud has actually been a topic of debate within the industry for years...
who remembers direct hit (acquired by ask jeeves in 2000 (http://searchenginewatch.com/sereport/article.php/2162261)) or globalbrain (acquired by nbci/snap.com in 1999 (http://searchenginewatch.com/sereport/article.php/2167341)) , both of which ranked search results in part based on click popularity?
i know for a fact that folks were building tools to inflate their clickthrough numbers (and improve their SEO rankings) back then! :)
the key point to consider is that PPC boils down to one thing: results.
do you get value out of what you pay for?
pay-per-click search engines have several incentives to do all they can to limit click fraud, including:
(a) avoiding potential lawsuits if they are found to be overcharging customers, and
(b) delivering VALUE, because advertisers will continue to invest if they find value in the advertising channel.
inflated numbers only hurt the PPC search engines in the long run, because they discourage advertisers from INCREASING their ad spend. IOW, if your ad budget isn't working for you someplace, you shouldn't/won't continue to spend there (assuming you've made all reasonable efforts to optimize your campaigns).
our experience has been that the engines are surprisingly good and PROACTIVE about detecting and crediting potentially fraudulent traffic. it happens frequently (corrective refunds), they just don't make a big deal about it. well, make that the MAJOR ENGINES are good... several secondary engines (without naming names) have proven notoriously BAD -- based on our experience (YMMV) -- at detecting and filtering fraudulent clickthroughs.
i agree with orion - the free market will punish engines that can not find and fix these problems. case in point: we limit our advertising on engines which have a demonstrated track record of poor click fraud policing.
but i don't agree with the tone of the CNET article. in fact, this problem really is being addressed by the leading engines, and they have proven they can do a respectable job at it.
i absolutely agree with Mikkel that it is IN THEORY feasible and possible to build powerful click fraud tools. but i can only see two scenarios where there's an incentive to do so:
(a) a PPC affiliate for a search engine (like an AdSense partner or Overture distribution partner) artificially inflates clickthrough numbers for their own site's traffic, to collect affiliate revshare on the billed traffic, or
(b) an advertiser intends to waste their competitor's ad budget by generating false clickthroughs that burn up ad dollars without generating value.
in both instances, the offending party risks punishment: being banned either as an affiliate or as an advertiser. i can not imagine that any major advertiser would knowingly and willingly pursue a "burn" strategy against their competitors: that's a major lawsuit waiting to happen if it's discovered.
while it may be the case that rogue, small advertisers "make a run at it", they will eventually be caught and punished. in other words, while it may not be possible to eliminate the risk, it can be mitigated.
and even if there is some click fraud going on, it fundamentally boils down to the same question: are you still generating value from your advertising dollar? if you are, you can stomach some waste. sure, you'd like to eliminate it, but you'll run with it. and if the engines can do a better job of paring down the waste, you'll spend even more...
in short, the weight of economic rewards seems to lie with the current model, which means the model may have room to evolve and refine itself but doesn't seem to face any major threat to its legitimacy. and that's because, fundamentally, advertisers like it.
Mikkel deMib Svendsen
07-21-2004, 10:09 PM
i can only see two scenarios where there's an incentive to do so
I think you are forgetting the kid-factor :)
A lot of the viruses that have caused billions in damage were made by "adventurous" teenagers with no real purpose other than to show they could do it (I guess). Brag value or something. Or just plain "cyber-terrorism". There are probably plenty of crazy fanatics who would love to be the ones to bring down "that evil commercial part of the Internet", if they thought they could. The point is that you only mention the rational reasons for click fraud, and I personally believe that a great deal of it could be in the "terror" category.
I am not an expert in security issues or viruses and such. Not even close! But I have heard some scary scenarios from techie friends about viruses and how we (all of us) handle them today. Basically, they told me that as long as viruses and worms attack us at the frequency they do, we (they) can stop them and limit the damage, but if some kind of group or hostile country decides to launch a coordinated attack with hundreds of new viruses at once, or a new virus every day for a year, we would basically not be able to defend ourselves, not even with the help of the wildest geeks.
Do we have the same situation with PPC click fraud? In the end, can we do nothing but sit back and hope a serious attack never comes, or are there "defend" strategies that everyone should look into now?
orion
07-22-2004, 01:13 AM
No one is questioning supply and demand or the validity of PPCs. Still, at the end of the day (or the account), Murphy's Law remains in the picture and we have to face reality.
I agree with Mikkel. Here is another scenario:
In BI, we have seen first-hand competitors of type A trying to exhaust the accounts of competitors of type B through fraudulent click schemes, essentially ruining other businesses' accounts (collateral damage to innocent third parties). It appears this is going on at all levels, from average-joe accounts to sussie corporate accounts.
Orion
Clicklab
07-22-2004, 09:07 AM
Thank you, Bradbyrd, for taking this conversation a step further. You are absolutely correct. What matters in the end is the value you derive from a given traffic source.
Can this value be improved? Absolutely -- if you have a proper means of measuring and quantifying all the factors that affect your end value. Factors such as your being able to drive better quality traffic at lower cost, website usability, and so on.
In the end, can we do nothing but sit back and hope a serious attack never comes, or are there "defend" strategies that everyone should look into now?
Very good point, Mikkel. Imagine a virus that sits dormant on millions of servers and desktops. It's designed not to interfere. Instead, it runs in the background and clicks on PPC ads, using your identity and Internet connection. It may forever remain below the radar, or be used in a massive coordinated attack that would wreak havoc in the ecommerce world.
The second scenario is similar to the distributed denial of service attacks (DDoS). The difference is in the target - networks and data centers vs. paid ads.
To defuse such an attack, you need to detect its onset quickly. That’s where the scoring system for detecting click fraud comes in.
Let’s say you determine that there is a baseline of X per cent of questionable clicks. You still make money on the campaign, so you continue to run it, watching your click inflation index closely. In the ideal world, your monitoring system will send an alert about the onset of a massive click fraud attack to your email or cellphone, so you can quickly react.
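The baseline-and-alert idea can be sketched as a simple control-chart rule: alert when today's Click Inflation Index exceeds the recent mean by more than k standard deviations. The k = 3 default, the seven-day minimum history and the sample values are assumptions, and sending the alert to email or a phone is left out.

```python
import statistics


def should_alert(history: list[float], today: float,
                 k: float = 3.0, min_history: int = 7) -> bool:
    """Alert when today's Click Inflation Index exceeds the baseline mean by more than
    k sample standard deviations. Returns False until enough history exists."""
    if len(history) < min_history:
        return False
    baseline = statistics.mean(history)
    spread = statistics.stdev(history)
    return today > baseline + k * spread


# Hypothetical daily index values (percent) for the past week.
last_week = [4.1, 3.8, 4.5, 4.0, 3.9, 4.2, 4.3]
print(should_alert(last_week, 4.4))  # False: within normal day-to-day variation
print(should_alert(last_week, 9.7))  # True: possible onset of an attack
```

Using a deviation from your own baseline rather than a fixed index level lets a campaign with a steady X per cent of questionable clicks keep running without constant false alarms.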
The only way to be in control is to have reliable intelligence. Or so they say on TV :)
Mikkel deMib Svendsen
07-22-2004, 11:13 AM
In the ideal world, your monitoring system will send an alert about the onset of a massive click fraud attack to your email or cellphone, so you can quickly react.
OK, let's assume we all get this sort of alerting installed and a massive attack is launched. I guess we will all pause our accounts, right? The question then is, will the engines still be there when we get back? :)
orion
07-22-2004, 01:43 PM
Mikkel:
True. Dos-and-don'ts recipes are not enough. A real-time monitoring system is better.
I found some interesting behaviors in the SEW threads.
CIRCULAR FORMATIONS
Sometimes forum threads circle around the same topics, usually with the same posters cross-citing forums. This, actually, is a good thing. Example:
http://www.webproworld.com/viewtopic.php?t=22173&postdays=0&postorder=asc&start=25&sid=e580e0f4120443aa7c3794dd548337c4
TRIANGULAR FORMATIONS
Sometimes forum threads adopt a triangular formation. That is equally a good thing. Example:
1. This SEW thread discusses click frauds
"Scoring system for detecting PPC click fraud"
http://forums.searchenginewatch.com/showthread.php?p=6344#post6344
2. This SEW thread brings click-through data into keyword research
"Tools to analyze keyword search frequency"
http://forums.searchenginewatch.com/showthread.php?p=6346#post6346
3. This other thread discusses keyword research
"Keywords Co-occurrence and Semantic Connectivity"
http://forums.searchenginewatch.com/showthread.php?p=5893#post5893
Other threads have already covered this ground. Good thing, too. Here is some news on the eternal click-through fraud problem:
Exposing Click Fraud (CNET)
http://news.com.com/Exposing+click+fraud/2100-1024_3-5273078.html?tag=nefd.lede
Google's Fraud Squad Battles (Slashdot)
http://slashdot.org/articles/04/07/20/1234238.shtml?tid=217&tid=95
Google Adwords Ignores Click Fraud (JimWorld)
http://jimworld.com/apps/webmaster.forums/action::thread/forum::googleadwords/thread::1064825360/
Click fraud or high visits from India and Nigeria (WebMasterWorld)
http://www.webmasterworld.com/forum81/1778.htm
Garrett's Article on Click Fraud (WebProNews)
http://www.webpronews.com/insiderreports/searchinsider/wpn-49-20040505ClickFraudTheGoogleKiller.html
A Perfect Storm for Pay-Per-Click? (CNET; Hans Riemer, from Market-Vantage)
http://news.com.com/2010-1024_3-5139481.html?type=pt&part=inv&tag=feed&subj=news
It would be interesting to know the degree of bias that fraudulent click-throughs introduce into keyword marketing programs, and how this bias plays into keyword searches and demographic data.
If anyone already has this kind of info, let us know. We're conducting research on click-throughs (fraudulent and valid) and term co-occurrence/sequencing. The goal is to build a math model and make some predictions based on LSI and real-time data. We are interested only in queries consisting of 2-3 words. We also want negative and control terms. We can talk over regular email.
Orion
PS. We had some problems accessing the SEW forum this morning. Has anyone experienced similar incidents?
Mikkel deMib Svendsen
07-22-2004, 01:58 PM
PS. We had some problems accessing the SEW forum this morning. Has anyone experienced similar incidents?
There was a temporary glitch of some kind that our engineers were quick to fix :)
Clicklab
07-22-2004, 02:10 PM
If anyone already has this kind of info, let us know. We're conducting research on click-throughs (fraudulent and valid) and term co-occurrence/sequencing. The goal is to build a math model and make some predictions based on LSI and real-time data. We are interested only in queries consisting of 2-3 words. We also want negative and control terms. We can talk over regular email.
Interesting...but vague :)
orion
07-22-2004, 03:43 PM
I don't know about the vague part.
What we need: fraudulent and valid click-through data
Data type needed: 2-3 word queries, negative and control terms
Objectives: A math framework for predicting/modeling queries.
Long-term goal: A real-time monitoring system.
Tools to be used: LSI and semantic tools (c-indices, EF ratios).
Orion