#1
Fox News & Danger Of Citing Search Counts
Had a reporter just ask me about a case where Fox News tried to defend calling the BBC anti-American in part by using a Google count. The story he's following up on says:
Quote:
Here are some funny things to consider in breaking down the Fox defense. Remember, the test Fox did was to say that 47,200 matches for bbc anti-american suggests the network is anti-American. So presumably, more matches mean even more anti-Americanism. Given this, here are some counts from today:
How could Bush be so anti-American? Of course, the counts only show how many pages, or links to pages, have those words in them -- nothing else. Anyone else know of some example of counts being used wrongly like this?
Last edited by dannysullivan : 06-18-2004 at 12:59 PM.
#2
I am shocked that people would use a simple Google "total results" found to determine how anti anything a news site is. I'd also be interested in hearing if anyone else has heard of this type of logic being used elsewhere.
Oh by the way: 10,200 for danny sullivan anti-american
#3
Quote:
Seriously, this is astonishing. People really do think Google knows everything, that Google is God. I appreciate Google as much as the next guy, but it's time to start educating people on what it is and isn't and what it can and cannot do. There's nothing wrong with using Google for research, as many judges (or reporters) are apparently doing. But the system falls apart when the researcher isn't able to understand the real value (or lack thereof) of what Google gives you.
#4
Nice post Pleeker. I agree.
I've been writing about this topic on ResourceShelf for a long time. As a librarian who often works with news and media librarians, I see the issue continue. A few links.
A great article on the topic from MediaBistro.com, with examples: http://www.mediabistro.com/articles/cache/a1217.asp
A ResourceShelf post from 11 months ago. Examples from two papers.
Finally, none other than William Safire in The New York Times has used page estimates in two columns. (NOT available free on the web) http://www.resourceshelf.com/archive...499073 339479
On October 13, 2003 Safire writes, "Before joining Dean in castigating McCain for putting words in his mouth, I went to Google and keyed in "ends justify the means" and "Dean." To my astonishment, amid the 368 hits was this Associated Press dispatch by Holly Ramer from Manchester, N.H."
On April 14, 2003 he writes, "That favorite saying of heavyweight champion Jack Dempsey gets a half-million hits on Google, including George Washington in 1799: "Offensive operations, often times, is the surest, if not the only means of defence.""
NOTE: Searching the phrase today returns an estimated 9,900 hits. Safire forgot to use "" when phrase searching. Searching without the "" marks (individual terms) returns 736,000. Due to stopwords, you're really searching best defense good offense.
Last edited by garyp : 06-18-2004 at 04:19 PM.
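To make the phrase-versus-individual-terms gap concrete, here is a minimal sketch over a made-up four-document corpus. The documents, the stopword list, and the query wording are all assumptions for illustration; this is not how Google computes its estimates, only a toy showing why the two kinds of count diverge.

```python
# Toy illustration only: the corpus, stopword list, and query are invented.
STOPWORDS = {"the", "is", "a", "an", "of"}

DOCS = [
    "the best defense is a good offense",
    "a good defense lawyer is the best offense against a bad charge",
    "good offense wins games but the best defense wins titles",
    "nothing relevant here",
]

def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()

def count_all_terms(query, docs):
    """Count docs containing every non-stopword query term, anywhere, in any order."""
    terms = [t for t in tokenize(query) if t not in STOPWORDS]
    return sum(1 for d in docs if all(t in tokenize(d) for t in terms))

def count_phrase(query, docs):
    """Count docs containing the query words as one contiguous phrase."""
    q = tokenize(query)
    n = len(q)

    def has_phrase(doc):
        words = tokenize(doc)
        return any(words[i:i + n] == q for i in range(len(words) - n + 1))

    return sum(1 for d in docs if has_phrase(d))

QUERY = "the best defense is a good offense"
print("without quotes:", count_all_terms(QUERY, DOCS))
print("with quotes:   ", count_phrase(QUERY, DOCS))
```

Three of the four toy documents match the unquoted search, but only one contains the actual saying, which is the same kind of gap as 736,000 versus 9,900.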
#5
Great posts everyone. Interesting, isn't it? The creators of Google are obviously brilliant. But put into perspective, we are talking about an automated program that gathers information and stores it in a database. Human beings wrote the software programs. Software programs have bugs and fail all the time for users. Absolute factual trust in a program doesn't seem like the way to go.
#6
You're right Daria, Google's founders are brilliant on many levels, including having a brilliant marketing team in place. Again, this genius marketing has led to what Pleeker does a great job of describing in his post.
From a marketing angle, larger numbers (as in estimated page totals) lead the unsophisticated user to believe Google has more than others. Bigger is better, so to speak, and the public loves numbers and loves to compare this type of info.
Even if you wanted to review each and every page in a massive result set, you can't. Google will only display the first 1000 results of a result set.
Another issue with Google (and other general web engines) is duplicate pages and near duplicate pages. Think about how many times the identical content from various ODP or Amazon affiliates is in the database.
More traditional info retrieval databases can produce better estimates. While far from perfect, services like ProQuest, the Dialog family of databases, and Factiva produce more useful estimates for the amount of material they cover, for example when comparing press mentions, which is something journalists like to do. Of course, with these types of services, removing duplicates is often a goal (some services even allow the searcher to do it) and spam is really not an issue.
A paper I enjoy referring people to was published two years ago. Authors include Craig Silverstein and Monika Henzinger of Google. It discusses many of the challenges they face in building a web database. "Challenges in Web Search Engines" http://www.acm.org/sigs/sigir/forum/F2002/henzinger.pdf
cheers, gary
Last edited by garyp : 06-18-2004 at 07:24 PM.
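A rough sketch of how duplicate pages inflate a raw count. The URLs and page text below are hypothetical, and the fingerprint only catches exact duplicates after case and whitespace are normalized; near-duplicate detection is a harder problem that this does not attempt.

```python
import hashlib
import re

def fingerprint(text):
    """Hash the text after normalizing case and whitespace (exact duplicates only)."""
    normalized = re.sub(r"\s+", " ", text.strip().lower())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

def raw_and_distinct_counts(pages):
    """Return (number of matching pages, number of distinct page bodies)."""
    return len(pages), len({fingerprint(body) for body in pages.values()})

# Hypothetical result set: three affiliate sites carrying the same listing.
PAGES = {
    "http://affiliate-a.example/book": "Great Book   by Some Author. Buy now!",
    "http://affiliate-b.example/book": "great book by some author. buy now!",
    "http://affiliate-c.example/book": "Great Book by Some Author. Buy now!\n",
    "http://review.example/book": "An original review of the book.",
}

raw, distinct = raw_and_distinct_counts(PAGES)
print(f"{raw} pages matched, {distinct} distinct documents")
```

A count that treats the three affiliate copies as separate "mentions" doubles the apparent coverage in this toy case.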
#7
Quote:
though it does not parallel correctly.
__________________
The SEO Book
Last edited by seobook : 06-18-2004 at 08:56 PM.
#8
51,300 for bbc anti-american
54,000 for fox anti-american
143,000 for white house anti-american
351,000 for bush anti-american
Funny really. Now let's compare against who is actually targeting the phrase, using "allintitle".
6 for bbc anti-american
0 for fox anti-american
0 for white house anti-american
57 for bush anti-american
A bit more interesting now!
#9
News writers should always confirm their facts, shouldn't they? The fact that they used just Google as a sole source is sloppy journalism.
They should have used Yahoo too, I hate shoddy footwork.
1,300,000 for bbc anti-american
1,560,000 for fox anti-american
2,348,000 for danny sullivan anti-american
3,730,000 for white house anti-american
5,330,000 for bush anti-american
__________________
I am Ronnie
#10
Quote:
__________________
The SEO Book
#11
The problem with many of the examples given is that just searching
bush anti-american
bbc anti-american
or any other example is flawed in two ways: the page estimate software has problems and is inaccurate, and SO are the search strategies themselves.
Huh? Because a web page has both of these terms on a page DOESN'T mean that they have any relationship to one another. For example, the word "Bush" could be in the first paragraph of a web page and anti-american could be in paragraph twenty-five of the same page.
If Google (and other web engines) offered a proximity operator aside from the "" or the - symbol, it would allow for more accurate searching. No web engine allows the searcher to control the proximity of two terms. AltaVista USED to offer NEAR and WITHIN but stopped offering this feature a couple of months ago.
Many of the database services that I mentioned in an earlier post in this thread DO offer this feature. For example: Bush near5 "anti american". This tells the database Bush needs to be within 5 words (in either direction) of "anti american".
Even using the - (to show adjacency) as in anti-american vs "anti american" produces completely different numbers.
anti-american 552,000
"anti american" 530,00
=================
Finally, another issue in getting accurate totals from web engines comes from the distributed nature of Google (and other web engines). Numbers can vary greatly from moment to moment depending on the database cluster you hit.
Last edited by garyp : 06-19-2004 at 10:47 AM.
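For anyone wondering what a proximity operator buys you, here is a small sketch of a near5-style check next to a plain both-terms-on-the-page check. The function names, the sample sentences, and the exact way distance is counted are my assumptions; each commercial service defines its own operator a little differently.

```python
import re

def tokenize(text):
    """Lowercase and split on non-alphanumerics, so "anti-american" and
    "anti american" both become ["anti", "american"] (an assumption of this sketch)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def phrase_positions(words, phrase):
    """Start indexes at which the token list `phrase` occurs contiguously in `words`."""
    n = len(phrase)
    return [i for i in range(len(words) - n + 1) if words[i:i + n] == phrase]

def contains_both(text, term, phrase):
    """What a plain web search checks: term and phrase both appear somewhere."""
    words = tokenize(text)
    return term.lower() in words and bool(phrase_positions(words, tokenize(phrase)))

def near(text, term, phrase, max_distance):
    """True if `term` is within `max_distance` words of `phrase`, in either direction."""
    words = tokenize(text)
    target = tokenize(phrase)
    term_positions = [i for i, w in enumerate(words) if w == term.lower()]
    for start in phrase_positions(words, target):
        end = start + len(target) - 1
        for t in term_positions:
            # Distance 1 means the term sits directly next to the phrase.
            distance = start - t if t < start else t - end
            if 0 < distance <= max_distance:
                return True
    return False

CLOSE = "Critics called Bush policy anti-American in tone."
FAR = "Bush spoke today. " + "filler " * 30 + "Some anti american slogans appeared."

for label, page in (("close", CLOSE), ("far", FAR)):
    print(label, contains_both(page, "Bush", "anti american"),
          near(page, "Bush", "anti american", 5))
```

Both made-up pages would be counted by a search for bush anti-american, but only the first one passes the near5 test.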
#12
Rustybrick:
You asked for more examples. This just in: the new Forbes Celebrity 100 list is out, and one of the factors used to determine the ranking is "web hits" from Google. http://www.forbes.com/celebrities/20...ebs04land.html
#13
Quote:
So whose job is it to educate the media, judiciary, etc., about the error of their ways? (And I don't just mean WRT Google search counts; I mean the overall bigger picture of relying too heavily on Google as the be-all and end-all of investigative research.) When it's a Forbes celebrity popularity contest, no big deal. When it's news reporting or determining legal outcomes, big deal.
You don't exactly see Google going out of its way to rein this in. Heck, I've seen interviews with Larry (or maybe it was Sergey) where the story of the heart attack victim finding out what to do by doing a Google search has been told as evidence of Google's power.
#14
Quote:
As I pointed out yesterday, I've been talking about this on my site (http://www.resourceshelf.com) for more than 3 years, as have many of my librarian colleagues. We're trying. Chris and I talked about the topic in our book that was published three years ago. I authored an article for SearchDay in 2002 about the high quality, authoritative and FREE databases (accessible from home) that most public libraries offer. http://www.searchenginewatch.com/sea...le.php/2161631
Yet many people, including the mainstream press, often don't want to listen. However, in the last few months, some good news. Many journos are starting to realize these issues. This article from the Mercury News is an example. http://www.siliconvalley.com/mld/sil...ey/8704895.htm
This Fall I've been asked to teach a class on this topic at the journalism school at the Univ. of North Carolina. So, if all goes well, these students will have a better idea of what the web can and cannot do and when to use other resources, both electronic and print.
#15
Quote:
But I think everybody missed the result for danny sullivan anti-american in the middle of all that.
__________________
I am Ronnie
#16
Thanks for that, a metric used to develop the Forbes Celebrity 100 is "Web mentions on Google". Very interesting. I can see how it can be used to figure out popularity. I mean, of course Forbes is smart enough to use a dozen factors (all with different weights).
But are there examples of people taking it to the extreme and using Google as the final word in an argument?
Quote:
#17
Here's an example from one year ago. This is the lead for a column by Ellen Goodman, a very prominent columnist for The Boston Globe:
"BOSTON--It's been 20 years since that Hollywood moment when Michael Keaton lost his job, found his kids and entered "Mr. Mom" into the cultural lingo. Even today, if you search for Mr. Mom on the Internet, you get 875,000 Google hits, including a Web site for a diner and a fiberglass pool company."
The problem is that Ms. Goodman didn't use quotation marks around "Mr. Mom" in the Google box. I sent her an email and scolded her for being so dumb.
The second, more recent problem is that at least since last September, Google's total counts have been utterly, hopelessly inflated. I think it's deliberate, or Google is so broken that the engineers no longer know what they're doing.
Last edited by Everyman : 06-20-2004 at 10:26 AM.
#18
Quote:
also, they probably run the results through some filters after they come up with that #.
__________________
The SEO Book
#19
Even journalists who are savvy enough to do phrase searches, enclosing search terms in double quotes, are missing the boat when they equate the number of search results found with instances of their search terms. Many pages returned in search results don't contain the search terms at all -- they're only in links pointing to the page.
Google also now does stemming, using variants of search terms, and it also appears to count fragments of words appearing in URLs (e.g. "ante" in the URLs antelaguerraactua.org, dantedesigngallery.com, etc.).
#20
hmmm
I think the major part of this is really about the weight being given to Google... misinterpreted or not, the branding of Google has the general public perceiving it just the way these poorly skilled journalists do.
That people place some stock in a silly count definitely reflects a lack of true understanding of search engines. As time passes this will change... but just like the statement "if it was on TV it had to be true", this misconception will soon be realised.