Fox News & Danger Of Citing Search Counts
dannysullivan
06-18-2004, 01:57 PM
Had a reporter just ask me about a case where Fox News tried to defend calling the BBC anti-American in part by using a Google count. The story (http://news.bbc.co.uk/2/hi/entertainment/3805691.stm) he's following up on says:
The network also said searching for the phrase "BBC anti-American" into the Google internet search engine resulted in 47,200 hits.
The count means nothing, of course. There was a good story (http://news.com.com/2100-1032_3-5211658.html) from News.com recently that I reviewed (http://searchenginewatch.com/sereport/article.php/3360261#law) that talked about a number of examples like this, where counts are used to try to "prove" things in court.
Here are some funny things to consider in breaking down the Fox defense. Remember, the test Fox did was to say that 47,200 matches for bbc anti-american suggests the network is anti-American. So presumably, more matches mean even more anti-Americanism. Given this, here are some counts from today:
51,300 for bbc anti-american (http://www.google.com/search?q=bbc+anti-american)
54,000 for fox anti-american (http://www.google.com/search?q=fox+anti-american)
143,000 for white house anti-american (http://www.google.com/search?q=white+house+anti-american)
351,000 for bush anti-american (http://www.google.com/search?q=bush+anti-american)
How could Bush be so anti-American? Of course, the counts only show how many pages, or links to pages, have those words in them -- nothing else.
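Here is a minimal Python sketch of how a count for an unquoted query like bush anti-american can arise. It assumes a toy inverted index over a made-up four-page corpus. This is hypothetical data, not how Google actually computes its estimates. A page is counted whenever it contains every query term, anywhere, no matter what the page actually says.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase; keep letters, digits, underscores and hyphens, so "anti-American" is one token.
    return re.findall(r"[\w-]+", text.lower())


def build_index(docs: list[str]) -> dict[str, set[int]]:
    """Toy inverted index: term -> ids of the documents that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def hit_count(index: dict[str, set[int]], query: str) -> int:
    """Count documents containing ALL query terms, anywhere and in any order."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    return len(set.intersection(*postings)) if postings else 0


# Hypothetical mini-corpus: none of these pages claims Bush is anti-American.
docs = [
    "Bush denounced anti-American protests abroad.",
    "Critics of Bush were called anti-American by pundits.",
    "An article about bush planting; elsewhere, a note on anti-American sentiment in 1968.",
    "Bush gave a speech on trade.",
]
index = build_index(docs)
print(hit_count(index, "bush anti-american"))
```

Three of the four pages match, including one about planting shrubs. The count measures word co-occurrence, not opinion.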
Anyone else know of other examples of counts being used wrongly like this?
rustybrick
06-18-2004, 02:48 PM
I am shocked that people would use a simple Google "total results" found to determine how anti anything a news site is. I'd also be interested in hearing if anyone else has heard of this type of logic being used elsewhere.
Oh by the way: 10,200 for danny sullivan anti-american (http://www.google.com/search?hl=en&lr=&ie=UTF-8&c2coff=1&q=danny+sullivan+anti-american&btnG=Search) :)
pleeker
06-18-2004, 02:56 PM
Anyone else know of other examples of counts being used wrongly like this?
Yes. See post #2 above from rustybrick. :)
Seriously, this is astonishing. People really do think Google knows everything, that Google is God. I appreciate Google as much as the next guy, but it's time to start educating people on what it is and isn't and what it can and cannot do. There's nothing wrong with using Google for research, as many judges (or reporters) are apparently doing. But the system falls apart when the researcher isn't able to understand the real value (or lack thereof) of what Google gives you.
garyp
06-18-2004, 04:54 PM
Nice post Pleeker. I agree.
I've been writing about this topic on ResourceShelf for a long time. As a librarian who often works with news and media librarians, I can tell you the issue continues.
A few links.
A great article on the topic from MediaBistro.com with examples
http://www.mediabistro.com/articles/cache/a1217.asp
A ResourceShelf post from 11 months ago, with examples from two papers:
http://www.resourceshelf.com/archives/2003_06_01_resourceshelf_archive.html#105591499073339479
Finally, none other than William Safire in The New York Times has used page estimates in two columns. (NOT available free on the web)
On October 13, 2003, Safire writes:
"Before joining Dean in castigating McCain for putting words in his mouth, I went to Google and keyed in "ends justify the means" and "Dean." To my astonishment, amid the 368 hits was this Associated Press dispatch by Holly Ramer from Manchester, N.H."
On April 14, 2003, he writes:
"That favorite saying of heavyweight champion Jack Dempsey gets a half-million hits on Google, including George Washington in 1799: 'Offensive operations, often times, is the surest, if not the only means of defence.'"
NOTE: Searching the phrase today returns an estimated 9,900 hits. Safire forgot to use quotation marks ("") when phrase searching.
Searching without the quotation marks (as individual terms) returns 736,000.
Because of stopwords, you're really searching for best defense good offense.
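The difference between Safire's unquoted search and a true phrase search can be sketched like this. The stopword list and the three sample pages are hypothetical. The point is only this: an unquoted query matches any page containing the remaining words in any order. A quoted query requires them to appear together, in sequence.

```python
import re

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "is", "a", "an", "of", "and", "to", "in"}


def tokens(text: str) -> list[str]:
    # Lowercase letters only; punctuation and hyphens split words.
    return re.findall(r"[a-z]+", text.lower())


def matches_terms(text: str, query: str) -> bool:
    """Unquoted search: every non-stopword query term appears somewhere in the text."""
    words = set(tokens(text))
    return all(t in words for t in tokens(query) if t not in STOPWORDS)


def matches_phrase(text: str, query: str) -> bool:
    """Quoted search: the query words appear consecutively and in order."""
    words, phrase = tokens(text), tokens(query)
    n = len(phrase)
    return any(words[i:i + n] == phrase for i in range(len(words) - n + 1))


pages = [
    "Dempsey liked to say the best defense is a good offense.",
    "A good offense needs the best defense coordinator in the league.",
    "Our best product: good offense-proof defense gear.",
]
query = "the best defense is a good offense"
print(sum(matches_terms(p, query) for p in pages))
print(sum(matches_phrase(p, query) for p in pages))
```

All three pages satisfy the unquoted query, but only the first contains the actual saying.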
Daria_Goetsch
06-18-2004, 07:28 PM
Great posts everyone. Interesting, isn't it? The creators of Google are obviously brilliant. But put into perspective, we are talking about an automated program that gathers information and stores it in a database. Human beings wrote the software programs. Software programs have bugs and fail all the time for users. Absolute factual trust in a program doesn't seem like the way to go.
garyp
06-18-2004, 08:22 PM
You're right, Daria. Google's founders are brilliant on many levels, including having a brilliant marketing team in place. Again, this genius marketing has led to what Pleeker does a great job of describing in his post :)
From a marketing angle, larger numbers (as in estimated page totals) lead the unsophisticated user to believe Google has more than others. Bigger is better, so to speak, and the public loves numbers and loves comparing this type of info.
Even if you wanted to review each and every page in a massive result set, you can't. Google will only display the first 1000 results of a result set.
Another issue with Google (and other general web engines) is duplicate and near-duplicate pages. Think about how many times identical content from various ODP or Amazon affiliates appears in the database.
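One standard way to spot near-duplicates is w-shingling. Each page is broken into overlapping word sequences, and the resulting sets are compared with Jaccard similarity. The sketch below assumes 3-word shingles and a 0.8 similarity threshold, both of which are arbitrary choices, and it uses made-up affiliate-style pages. It is not a description of how Google, or any of the services mentioned here, actually deduplicates.

```python
def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Set of overlapping k-word sequences ("shingles") in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """|A intersect B| / |A union B|; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def dedup_count(docs: list[str], threshold: float = 0.8, k: int = 3) -> int:
    """Count pages left after dropping any page whose shingle set is at least
    `threshold`-similar to a page already kept (greedy, O(n^2): a sketch only)."""
    kept: list[set] = []
    for doc in docs:
        s = shingles(doc, k)
        if all(jaccard(s, other) < threshold for other in kept):
            kept.append(s)
    return len(kept)


base = "this blender crushes ice in seconds and has a five year warranty"
docs = [
    "best deals " + base,
    "top deals " + base,  # affiliate copy with a one-word change
    "our review says the blender is loud but sturdy",
    "best deals " + base,  # exact duplicate
]
print(len(docs), dedup_count(docs))
```

A raw count sees four pages. After collapsing near-duplicates, there are only two distinct ones.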
More traditional information retrieval databases can produce better estimates. While far from perfect, services like ProQuest, the Dialog family of databases, and Factiva produce more useful estimates for the material they cover, for example when comparing press mentions, which is something journalists like to do. Of course, with these types of services, removing duplicates is often a goal (some services even let the searcher do it), and spam is really not an issue.
A paper I enjoy referring people to was published two years ago. Its authors include Craig Silverstein and Monika Henzinger of Google. It discusses many of the challenges they face in building a web database.
"Challenges in Web Search Engines"
http://www.acm.org/sigs/sigir/forum/F2002/henzinger.pdf
cheers,
gary
seobook
06-18-2004, 09:54 PM
351,000 for bush anti-american (http://www.google.com/search?q=bush+anti-american)
Results seem accurate to me :)
though it does not parallel correctly.
2,090 for bush anti-world (http://www.google.com/search?sourceid=navclient&ie=UTF-8&oe=UTF-8&q=bush+anti%2Dworld)
Anthony Parsons
06-18-2004, 11:13 PM
51,300 for bbc anti-american
54,000 for fox anti-american
143,000 for white house anti-american
351,000 for bush anti-american
Funny, really. Now let's compare against who is actually targeting the phrase, using "allintitle":
6 for bbc anti-american
0 for fox anti-american
0 for white house anti-american
57 for bush anti-american
A bit more interesting now!
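Here is a rough sketch of restricting matches to the title, which is approximately what the allintitle: operator does. The pages are hypothetical, and the tokenizer is deliberately simple. Real engines index and match titles in their own ways.

```python
import re


def terms(text: str) -> set[str]:
    # Lowercase word tokens; hyphens kept so "anti-American" stays one term.
    return set(re.findall(r"[\w-]+", text.lower()))


def count_anywhere(pages: list[dict], query: str) -> int:
    """Pages whose title and body together contain every query term."""
    q = terms(query)
    return sum(q <= terms(p["title"] + " " + p["body"]) for p in pages)


def count_in_title(pages: list[dict], query: str) -> int:
    """Pages whose title alone contains every query term (allintitle:-like)."""
    q = terms(query)
    return sum(q <= terms(p["title"]) for p in pages)


pages = [
    {"title": "Is the BBC anti-American?", "body": "A look at the coverage."},
    {"title": "BBC schedule", "body": "Some viewers call it anti-American; others disagree."},
    {"title": "World news", "body": "The BBC reported on anti-American protests."},
]
print(count_anywhere(pages, "bbc anti-american"))
print(count_in_title(pages, "bbc anti-american"))
```

All three pages mention both terms somewhere. Only one of them is actually about the question.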
Dodger
06-19-2004, 12:03 AM
News writers should always confirm their facts, shouldn't they? The fact that they used just Google as a sole source is sloppy journalism.
They should have used Yahoo too. I hate shoddy footwork. :mad:
1,300,000 for bbc anti-american (http://search.yahoo.com/search?p=bbc+anti-american&fr=my_top)
1,560,000 for fox anti-american (http://search.yahoo.com/search?p=fox+anti-american&ei=UTF-8&fr=my_top&n=100&fl=0&x=wrt)
2,348,000 for danny sullivan anti-american (http://search.yahoo.com/search?p=Just+joking+Danny&ei=UTF-8&fr=my_top&n=100&fl=0&x=wrt)
3,730,000 for white house anti-american (http://search.yahoo.com/search?p=white+house+anti-american&ei=UTF-8&fr=my_top&n=100&fl=0&x=wrt)
5,330,000 for bush anti-american (http://search.yahoo.com/search?p=bush+anti-american&ei=UTF-8&fr=my_top&n=100&fl=0&x=wrt)
seobook
06-19-2004, 12:17 AM
News writers should always confirm their facts, shouldn't they? The fact that they used just Google as a sole source is sloppy journalism.
They should have used Yahoo too. I hate shoddy footwork. :mad:
If you really wanted authoritative results, you should search with Teoma :)
garyp
06-19-2004, 11:21 AM
The problem with many of the examples given is that they just search
bush anti-american
bbc anti-american
or something similar. Not only is the page-estimate software buggy and inaccurate, SO are the search strategies themselves.
Huh?
Because a web page has both of these terms on a page DOESN'T mean that they have any relationship to one another.
For example, the word "Bush" could be in the first paragraph of a web page and anti-american could be in paragraph twenty-five of the same page.
If Google (and other web engines) offered a proximity operator beyond "" or the - symbol, it would allow for more accurate searching.
No web engine allows the searcher to control the proximity of two terms. AltaVista USED to offer NEAR and WITHIN but stopped offering this feature a couple of months ago.
Many of the database services that I mentioned in an earlier post in this thread DO offer this feature. For example Bush near5 "anti american".
This tells the database Bush needs to be within 5 words (in either direction) of "anti american".
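A proximity operator like Bush near5 "anti american" can be sketched in a few lines. This assumes that "within k words" means at most k words between the two matches, in either direction. Individual database services define their own exact semantics. The tokenizer here also splits on hyphens, so anti-American and anti American are treated alike. That is an assumption of the sketch and not necessarily how any given web engine treats them.

```python
import re


def tokens(text: str) -> list[str]:
    # Assumption: split on hyphens too, so "anti-American" == "anti American" here.
    return re.findall(r"[a-z0-9]+", text.lower())


def positions(words: list[str], phrase: list[str]) -> list[int]:
    """Start indexes where `phrase` (a list of words) occurs in `words`."""
    n = len(phrase)
    return [i for i in range(len(words) - n + 1) if words[i:i + n] == phrase]


def near(text: str, term: str, phrase: str, k: int = 5) -> bool:
    """True if `term` and `phrase` occur with at most k words between them,
    in either order (a sketch of a NEARk-style proximity operator)."""
    words = tokens(text)
    t_words, p_words = tokens(term), tokens(phrase)
    for t in positions(words, t_words):
        t_end = t + len(t_words) - 1
        for p in positions(words, p_words):
            p_end = p + len(p_words) - 1
            # Number of words strictly between the two matches (negative if they overlap).
            gap = p - t_end - 1 if p > t_end else t - p_end - 1
            if 0 <= gap <= k:
                return True
    return False


pages = [
    "President Bush dismissed the anti-American rhetoric.",
    "Bush spoke in Ohio about jobs, taxes, schools, roads and farms. "
    "Elsewhere, protesters chanted anti-American slogans.",
    "Anti-American feelings, Bush said, are rare.",
]
print([near(p, "bush", "anti american") for p in pages])
```

A plain AND search counts all three pages. The proximity check drops the second page, where the terms are thirteen words apart.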
Even using the - (to show adjacency), as in anti-american,
vs.
"anti american" produces completely different numbers:
anti-american 552,000
"anti american" 530,00
=================
Finally, another issue in getting accurate totals from web engines comes from the distributed nature of Google (and other web engines). Numbers can vary greatly from moment to moment depending on the database cluster you hit.
garyp
06-19-2004, 11:54 AM
Rustybrick:
You asked for more examples.
This just in.
The new Forbes Celebrity 100 list is just out, and one of the factors used to determine the ranking is "web hits" from Google.
http://www.forbes.com/celebrities/2004/06/16/celebs04land.html
pleeker
06-19-2004, 02:57 PM
Because a web page has both of these terms on a page DOESN'T mean that they have any relationship to one another.
Amen.
So whose job is it to educate the media, judiciary, etc., about the error of their ways? (And I don't just mean WRT Google search counts; I mean the overall bigger picture of relying too heavily on Google as the be-all and end-all of investigative research.) When it's a Forbes celebrity popularity contest, no big deal. When it's news reporting or determining legal outcomes, big deal.
You don't exactly see Google going out of its way to rein this in. Heck, I've seen interviews with Larry (or maybe it was Sergey) where the story of the heart attack victim finding out what to do by doing a Google search has been told as evidence of Google's power.
garyp
06-19-2004, 03:17 PM
Amen.
When it's a Forbes celebrity popularity contest, no big deal. When it's news reporting or determining legal outcomes, big deal.
Agreed on your other comments.
As I pointed out yesterday I've been talking about this on my site (http://www.resourceshelf.com) for more than 3 years as have many of my librarian colleagues. We're trying.
Chris and I talked about the topic in our book that was published three years ago.
I authored an article for SearchDay in 2002 about the high quality, authoritative and FREE databases (accessible from home) that most public libraries offer.
http://www.searchenginewatch.com/searchday/article.php/2161631
Yet, many people, including the mainstream press, often don't want to listen.
However, in the last few months there has been some good news. Many journos are starting to realize these issues. This article from the Mercury News is an example.
http://www.siliconvalley.com/mld/siliconvalley/8704895.htm
This fall I've been asked to teach a class on this topic at the journalism school at the Univ. of North Carolina. So, if all goes well, these students will have a better idea of what the web can and cannot do and when to use other resources, both electronic and print.
Dodger
06-19-2004, 03:31 PM
The problem with many of the examples given is that they just search
bush anti-american
bbc anti-american
or something similar. Not only is the page-estimate software buggy and inaccurate, SO are the search strategies themselves.
Huh?
Gary - I did the Yahoo results as a joke. It wasn't meant to be anything more than that. I do agree with your take on this whole thing, it is the whole point of this thread to begin with.
But I think everybody missed the result for danny sullivan anti-american in the middle of all that. :)
rustybrick
06-19-2004, 11:02 PM
Thanks for that. A metric used to develop the Forbes Celebrity 100 is "Web mentions on Google". Very interesting. I can see how it could be used to gauge popularity. I mean, of course Forbes is smart enough to use a dozen factors (all with different weights).
But are there examples of people taking it to the extreme and using Google as the final word in an argument?
Rustybrick:
You asked for more examples.
This just in.
The new Forbes Celebrity 100 list is just out, and one of the factors used to determine the ranking is "web hits" from Google.
http://www.forbes.com/celebrities/2004/06/16/celebs04land.html
Everyman
06-20-2004, 11:23 AM
Here's an example from one year ago. This is the lead of a column by Ellen Goodman, a very prominent syndicated columnist for the Boston Globe:
"BOSTON--It's been 20 years since that Hollywood moment when Michael Keaton lost his job, found his kids and entered "Mr. Mom" into the cultural lingo. Even today, if you search for Mr. Mom on the Internet, you get 875,000 Google hits, including a Web site for a diner and a fiberglass pool company."
The problem is that Ms. Goodman didn't use quotation marks around "Mr. Mom" in the Google box. I sent her an email and scolded her for being so dumb.
The second, more recent problem is that at least since last September, Google's total counts have been utterly, hopelessly inflated. I think it's deliberate, or Google is so broken that the engineers no longer know what they're doing.
seobook
06-20-2004, 02:51 PM
The second, more recent problem is that at least since last September, Google's total counts have been utterly, hopelessly inflated. I think it's deliberate, or Google is so broken that the engineers no longer know what they're doing.
It does not hurt them to show somewhat inflated numbers. You can't go past 1,000 results deep anyway. If they are off, I would say it is not much of a priority to fix 'em.
Also, they probably run the results through some filters after they come up with that number.
Chris Sherman
06-20-2004, 02:57 PM
Even journalists who are savvy enough to do phrase searches, enclosing search terms in double quotes, are missing the boat when they equate the number of search results found with instances of their search terms. Many pages returned in search results don't contain the search terms at all -- they're only in links pointing to the page.
Google also now does stemming, using variants of search terms, and it also appears to count fragments of words appearing in URLs (e.g., "ante" in the URLs antelaguerraactua.org, dantedesigngallery.com, etc.).
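A toy matcher can illustrate both of Chris's points: anchor-text matches and stemming. Everything here is an assumption for illustration. The pages are made up, and the crude_stem suffix stripper is hypothetical and far cruder than a real stemmer such as Porter's. URL-fragment matching is not modeled.

```python
import re


def words(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(word: str) -> str:
    """Hypothetical, deliberately crude suffix stripper (NOT the Porter stemmer)."""
    for suffix in ("ing", "ers", "er", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def matches(page: dict, query: str, anchors: bool = False, stemming: bool = False) -> bool:
    """Does every query word occur in the page body (plus, optionally, inbound anchor text)?"""
    text = page["body"]
    if anchors:
        text += " " + " ".join(page["inbound_anchors"])
    norm = crude_stem if stemming else (lambda w: w)
    page_terms = {norm(w) for w in words(text)}
    return all(norm(w) in page_terms for w in words(query))


pages = [
    {"body": "Fiberglass pool builder serving the valley.", "inbound_anchors": []},
    {"body": "Contact us for a quote.", "inbound_anchors": ["great pool builder"]},
    {"body": "We are pool builders since 1990.", "inbound_anchors": []},
]
q = "pool builder"
for a, s in [(False, False), (True, False), (False, True), (True, True)]:
    print("anchors:", a, "stemming:", s, "count:", sum(matches(p, q, a, s) for p in pages))
```

Only one page contains the exact words. Counting anchor text and stemmed variants triples the count, even though the second page never says "pool builder" at all.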
AussieWebmaster
06-21-2004, 12:55 AM
I think the major part of this is really about the weight being given to Google... misinterpreted or not, the general public perceives Google's brand much the way these poorly skilled journalists do.
That people place some stock in a silly count definitely reflects a lack of true understanding of search engines.
As time passes this will change... just like the old notion that "if it was on TV, it had to be true," this misconception will soon be recognized for what it is.
seobook
06-21-2004, 06:20 AM
I would say a large percentage of the population believes that if it is on TV, it must be true, especially stuff on news channels or news programs.
Dodger
06-21-2004, 12:26 PM
I would say a large percentage of the population believes that if it is on TV, it must be true, especially stuff on news channels or news programs.
I hear you on the News stuff ... I was going to say something about Wolf Blitzer here, but decided not to. I have picked on him enough in the past.
But I do believe that Relacore is not for me...I only need to shed a few pounds and Relacore is for those who want to lose 15, 20 to 30 pounds or more. That leaves me out. ;)
pleeker
06-21-2004, 03:38 PM
It all stems from lazy journalists and they are everywhere.
As someone who worked in print, TV, and radio media for 7+ years, I'd have to disagree that the problem is "lazy" journalists. There are hard-working journalists everywhere, many of whom are expected to be experts in too many areas. (And yes, a few lazy ones, too....)
The key in this situation, as I said above, is educating them on what Google can and cannot do, and what Google is and isn't. They don't report "Google hits" because they're lazy, they report them because they think it means something.
And I appreciate Gary's posts and efforts along these lines. But how much can one man do? :)
AussieWebmaster
06-21-2004, 04:59 PM
As someone who worked in print, TV, and radio media for 7+ years, I'd have to disagree that the problem is "lazy" journalists. There are hard-working journalists everywhere, many of whom are expected to be experts in too many areas. (And yes, a few lazy ones, too....)
The key in this situation, as I said above, is educating them on what Google can and cannot do, and what Google is and isn't. They don't report "Google hits" because they're lazy, they report them because they think it means something.
And I appreciate Gary's posts and efforts along these lines. But how much can one man do? :)
I agree with your conclusions... having worked in the field for a lot of years as well. But one thing I learned when I came to the US for my Master's in Journalism was the reverence the public gives to people who are not all-knowing... just people trained to put together paragraphs in a set way (with as much information as can be verified)... Guess Google is now a verified source!
hulkster
06-21-2004, 06:25 PM
My site comes up #1 out of 12,500,000 for car stories... I only noticed this recently, and it really wasn't something I was trying for!
Maybe Fox News should give me a call next time they want to do a car story?!? ;-)
alek
P.S. For those wondering, the inbound/referral traffic is pretty minor; it's certainly not a competitive keyphrase that is actually searched very often.
Dodger
06-21-2004, 07:10 PM
P.S. For those wondering, the inbound/referral traffic is pretty minor; it's certainly not a competitive keyphrase that is actually searched very often.
No, but I bet backseat stories are. :eek:
seobook
06-21-2004, 07:16 PM
No, but I bet backseat stories are. :eek:
what are backseat stories?
AussieWebmaster
06-21-2004, 07:35 PM
what are backseat stories?
You have spent too much time in front of a computer.......
Dodger
06-21-2004, 08:03 PM
what are backseat stories?
Man you have led a sheltered life too. Google it Aaron ... or have a talk with your father. This is something that you should not learn from the streets.
orion
06-22-2004, 12:42 PM
I am posting this information in two threads of the SEW Forums:
1. Fox News & Danger Of Citing Search Counts, started by Danny Sullivan
http://forums.searchenginewatch.com/forum/showthread.php?t=299&highlight=news
2. Keywords Co-Occurrence and Semantic Connectivity, started by Orion (me)
http://forums.searchenginewatch.com/forum/showthread.php?t=48
I hope this post clarifies, in some way, the misconception news writers and reporters have about interpreting search results. Let's see.
I agree with Danny and most readers, and I would go a little further. The use or misuse of query-driven absolute result counts, like absolute ranking results for that matter, often leads to misleading statistics. As Rich Ord, a writer from WebProNews, put it, search results are not Gallup polls.
To assess the association of terms and concepts, we need to conduct semantic association studies at both the database and document levels.
For the terms and phrases discussed in Danny's thread, we conducted a semantic connectivity analysis at the database level; these are the results (as of today). Results may change over time. See my thread for the theory behind the results. Here I use term co-occurrence at the database level (term co-occurrence and semantic connectivity at the document level will soon be discussed in that thread).
This is what we got.
IR/SEARCH ENGINE: GOOGLE
QUERY MODE: FIND ALL
DATE/TIME: 06-22-04 AT 10:30 AM
CASE: INSENSITIVE
k1=Fox n1=20,500,000
k2=anti-american n2=546,000
k12=Fox anti-american n12=50,800
c12=2.24 ppt
k1=bbc n1=25,400,000
k2=anti-american n2=546,000
k12=BBC anti-american n12=51,400
c12=1.98 ppt
k1=white house n1=12,400,000
k2=anti-american n2=546,000
k12=white house anti-american n12=143,000
c12=11.17 ppt
k1=bush n1=31,400,000
k2=anti-american n2=546,000
k12=bush anti-american n12=338,000
c12=10.69 ppt
k1=danny sullivan n1=1,290,000
k2=anti-american n2=546,000
k12=danny sullivan anti-american n12=8,990
c12=4.92 ppt
RESULTS
1. c-index values, in ppt: 11.17 (WhiteHouse) > 10.69 (Bush) > 4.92 (Danny) > 2.24 (FOX) > 1.98 (BBC)
2. The c-indices (fractions of co-occurrence) indicate loose term co-occurrence between the cases (around 1% or less, or if you wish, around 11 ppt or less). If we take the c-index results blindly, then danny sullivan "would be" more anti-american than Fox or BBC. Similarly, we could blindly and incorrectly assume that Fox is slightly more anti-american than BBC.
3. Do the homework: results in EXACT mode (find "phrases") demonstrate no significant correlation at all. Cluster similarity results don't help either.
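The post does not state how the c-index is computed. The posted BBC, White House, Bush and Danny Sullivan values are all reproduced by the Jaccard-style ratio c12 = n12 / (n1 + n2 - n12), expressed in parts per thousand, so the sketch below assumes that formula. Under this formula, the Fox figure comes out to about 2.42 ppt rather than the posted 2.24, possibly because of transposed digits. The ranking is unchanged.

```python
def c_index_ppt(n1: int, n2: int, n12: int) -> float:
    """Co-occurrence index in parts per thousand, ASSUMING c12 = n12 / (n1 + n2 - n12)."""
    return 1000 * n12 / (n1 + n2 - n12)


# (n1, n2, n12) exactly as posted above for 06-22-04 on Google.
counts = {
    "White House": (12_400_000, 546_000, 143_000),
    "Bush": (31_400_000, 546_000, 338_000),
    "Danny Sullivan": (1_290_000, 546_000, 8_990),
    "Fox": (20_500_000, 546_000, 50_800),
    "BBC": (25_400_000, 546_000, 51_400),
}

# Sort highest index first.
ranked = sorted(((c_index_ppt(*v), name) for name, v in counts.items()), reverse=True)
for c, name in ranked:
    print(f"{name:15s} {c:6.2f} ppt")
```

Normalizing changes the order relative to the raw n12 counts, which put Bush ahead of the White House and BBC ahead of Fox. As the post argues, neither ordering says anything about who is actually anti-American.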
CONCLUSION
1. If we wrongly cite absolute results, then we must wrongly assume that the White House and Bush "are" more anti-american than Fox or BBC.
2. It is clear that absolute results cited by "reporters" to reinforce a story are meaningless. Without a clear understanding of term and concept association, semantic connectivity and the underlying theory, those results qualify as fabricated facts, at least in my book.
3. The above results are valid in the queried database only, i.e., Google.
4. For a true confidence-level analysis, we need to track how c-index values change over time (a time-series co-occurrence study). Only then can we talk in terms of trends, patterns and "spikes" of significance. Certainly, conducting such studies during spring or summer is not the same as conducting them during fall or 30 days before or after Election Day 2004.
I invite news writers, as well as marketers interested in conducting similar semantic connectivity studies of phrases or concepts (to be targeted or marketed, i.e., slogans, brands, catchy political gimmicks, etc.), to revisit my thread on term (or concept) co-occurrence and semantics.
They may find new applications for my c-indices that I have overlooked. I can provide any interested party with sound scientific marketing research data or help them assess their current data. They just need to ask.
Sorry to step in. I cannot help myself when others try to misguide so-called "public opinion".
Sometime today or tomorrow I will start on term co-occurrence at the document level, an area that may interest SEOs.
Orion
Errata/Clarification: Rich Ord, mentioned above, is a staff writer for WebProNews and is actually CEO of iEntry, Inc. His article can be found at http://www.webpronews.com/news/ebusinessnews/wpn-45-20040622CitingSearchResultCountsIsNotNews.html
nuclei
06-22-2004, 01:01 PM
351,000 for bush anti-american (http://www.google.com/search?q=bush+anti-american)
Actually this one seems right on the money.
St0n3y
06-22-2004, 02:04 PM
I certainly did not mean to imply that all or even most journalists are lazy. There are some lazy journalists, and there are good journalists who just don't do their homework from time to time. I don't know which journalists made the report mentioned above, so I don't want to call them lazy, but in this case they didn't know the facts, or assumed something to be true that wasn't. Assumptions can be proven true or false with a little legwork, something that does not happen enough, regardless of the journalist or news company.
News accuracy is only important if it is profitable. It is not laziness or accuracy that is the issue, its profitability that is important. Churning out more stories by doing inadequate research makes more profit. Making news more like Maxim or FHM draws more eyeballs to sell more ads & makes more profit.
I don't think I agree with this, any more than I believe that a search engine can turn a profit by producing poor-quality results. Yes, it is unfortunate that news has turned more into entertainment, but the quality of news has to be intact in order to gain an audience. CNN lost its audience to Fox because the quality of its fact reporting was sub-par and had an obvious bias. It was pretty easy to spot.
351,000 for bush anti-american
Actually this one seems right on the money.
861,000 for "kerry bad for america"
Seems right on the money to me. (I'm sure we can go on and on with this silliness)
I'd want to add that even if a bush anti-american search did indicate that all returned pages claim he is anti-american, it would mean that many people think so, not that he in fact is.
That's a difference which should be taken seriously, above all in court.
Strictly speaking, it wouldn't even mean that many people think so, just that there are many web sites saying it, for whatever reason.
AussieWebmaster
07-07-2004, 11:06 AM
I'd want to add that even if a bush anti-american search did indicate that all returned pages claim he is anti-american, it would mean that many people think so, not that he in fact is.
That's a difference which should be taken seriously, above all in court.
Strictly speaking, it wouldn't even mean that many people think so, just that there are many web sites saying it, for whatever reason.
I like the reply, and yes, numbers do not make things true... they are just a count of numbers.
St0n3y
07-07-2004, 08:26 PM
That was the point I was trying to make, but with an example and not so much with words. Gotta remember, just because you find it on the internet does not make it true... and similarly, just because you hear it on the news doesn't make it true either. I don't believe anything until I see it from at least two sources.
But even then, you can't believe most of it, as many news agencies use each other as sources rather than researching the truth for themselves. How many times have we heard a news anchor (TV or radio) say "The New York Times is reporting today..."?
Peter (IMC)
09-13-2004, 05:01 PM
lol... if that picture really is from the Fox News site, then that says it all. Funny... but let's not get too political in an SEO forum. :D
Back to the original post: it is interesting how "the average" user doesn't know much about what all these numbers mean.
Most people seem to think that when a search engine says it found XXXXXXXXXX documents, there are that many documents about the phrase they searched for. That's not correct, of course.
I wonder how many more misunderstandings there are about search engines numbers. I know PageRank is very misunderstood by many people.
dannysullivan
09-15-2004, 07:09 AM
I've split the debate about the quality of Fox News and others over here: Fox News & Accusations Of Bias (http://forums.searchenginewatch.com/showthread.php?t=1649). It was getting pretty far from the original topic: how all types of organizations, such as news organizations like Fox News, might rely on search counts as "facts" to include in their reports.
St0n3y
09-15-2004, 02:05 PM
I think using the info the way it was presented clearly reflected the journalist's misunderstanding of search itself. To me that is inexcusable, but you see it all the time, especially in news reports or articles about this industry. What the reporter knows and reports is often pretty far behind the curve.