Search Engine Watch
Old 06-18-2004   #1
dannysullivan
Editor, SearchEngineLand.com (Info, Great Columns & Daily Recap Of Search News!)
 
Join Date: May 2004
Location: Search Engine Land
Posts: 2,085
Fox News & Danger Of Citing Search Counts

Had a reporter just ask me about a case where Fox News tried to defend calling the BBC anti-American in part by using a Google count. The story he's following up on says:

Quote:
The network also said searching for the phrase "BBC anti-American" into the Google internet search engine resulted in 47,200 hits.
The count means nothing, of course. There was a good story from News.com recently that I reviewed that talked about a number of examples like this, where counts are used to try to "prove" things in court.

Here are some funny things to consider in breaking down the Fox defense. Remember, the test Fox did was to say that 47,200 matches for bbc anti-american suggests the network is anti-American. So presumably, more matches mean even more anti-Americanism. Given this, here are some counts from today:

How could Bush be so anti-American? Of course, the counts only show how many pages, or links to pages, have those words in them -- nothing else.

Anyone else know of some example of counts being used wrongly like this?

Last edited by dannysullivan : 06-18-2004 at 12:59 PM.
Old 06-18-2004   #2
rustybrick
 
Join Date: Jun 2004
Location: New York, USA
Posts: 2,810
I am shocked that people would use a simple Google "total results" figure to determine how anti-anything a news site is. I'd also be interested in hearing if anyone else has heard of this type of logic being used elsewhere.

Oh by the way: 10,200 for danny sullivan anti-american
Old 06-18-2004   #3
pleeker
www.SmallBusinessSEM.com
 
Join Date: Jun 2004
Location: Washington state
Posts: 295
Quote:
Originally Posted by dannysullivan
Anyone else know of some example of counts being used wrongly like this?
Yes. See post #2 above from rustybrick.

Seriously, this is astonishing. People really do think Google knows everything, that Google is God. I appreciate Google as much as the next guy, but it's time to start educating people on what it is and isn't and what it can and cannot do. There's nothing wrong with using Google for research, as many judges (or reporters) are apparently doing. But the system falls apart when the researcher isn't able to understand the real value (or lack thereof) of what Google gives you.
Old 06-18-2004   #4
garyp
 
Join Date: Jun 2004
Posts: 265
Nice post Pleeker. I agree.

I've been writing about this topic on ResourceShelf for a long time. As a librarian who often works with news and media librarians, I see the issue continue.
A few links.

A great article on the topic from MediaBistro.com with examples
http://www.mediabistro.com/articles/cache/a1217.asp


A ResourceShelf post from 11 months ago. Examples from two papers.
Finally, none other than William Safire in The New York Times has used page estimates in two columns. (NOT available free on the web)
http://www.resourceshelf.com/archive...499073 339479

On October 13, 2003 Safire writes,
"Before joining Dean in castigating McCain for putting words in his mouth, I went to Google and keyed in "ends justify the means" and "Dean." To my astonishment, amid the 368 hits was this Associated Press dispatch by Holly Ramer from Manchester, N.H."

On April 14, 2003 he writes,
That favorite saying of heavyweight champion Jack Dempsey gets a half-million hits on Google, including George Washington in 1799: "Offensive operations, often times, is the surest, if not the only means of defence."

NOTE: Searching the phrase today returns an estimated 9,900 hits. Safire forgot to use "" when phrase searching.
Searching without the "" marks (individual terms) returns 736,000.
Due to stopwords you're really searching best defense good offense.
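
To make the stopword point concrete, here is a minimal Python sketch of what an engine might do to an unquoted query. The stopword list is hypothetical (real engines' lists differ and are not public), and the input assumes the saying was typed as "the best defense is a good offense"; neither comes from Safire's column.

```python
# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "is", "a", "an", "of", "and", "to", "in"}

def effective_query(query: str) -> str:
    """Drop stopwords from an unquoted query, keeping the other terms in order."""
    terms = query.lower().split()
    return " ".join(t for t in terms if t not in STOPWORDS)

# Assumed wording of the saying; the column's exact input is not known.
print(effective_query("the best defense is a good offense"))
```

Without the "" marks, the four remaining terms can match anywhere on a page and in any order, which is one reason the unquoted estimate is so much larger than the phrase estimate.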

Last edited by garyp : 06-18-2004 at 04:19 PM.
Old 06-18-2004   #5
Daria_Goetsch
SEOExplore.com - SEO Research Directory
 
Join Date: Jun 2004
Location: Eureka, California
Posts: 226
Great posts everyone. Interesting, isn't it? The creators of Google are obviously brilliant. But put into perspective, we are talking about an automated program that gathers information and stores it in a database. Human beings wrote the software programs. Software programs have bugs and fail all the time for users. Absolute factual trust in a program doesn't seem like the way to go.
Old 06-18-2004   #6
garyp
 
Join Date: Jun 2004
Posts: 265
You're right Daria, Google's founders are brilliant on many levels including having a brilliant marketing team in place. Again, this genius marketing has led to what Pleeker does a great job of describing in his post.

From a marketing angle, larger numbers (as in estimated page totals) lead the unsophisticated user to believe Google has more than others. Bigger is better, so to speak, and the public loves numbers and loves to compare this type of info.

Even if you wanted to review each and every page in a massive result set, you can't. Google will only display the first 1000 results of a result set.

Another issue with Google (and other general web engines) is duplicate pages and near duplicate pages. Think about how many times identical content from various ODP or Amazon affiliates is in the database.

More traditional info retrieval databases can produce better estimates. While far from perfect, services like ProQuest, the Dialog family of databases, and Factiva produce more useful estimates for the amount of material they cover, for example when comparing press mentions, which is something journalists like to do. Of course with these types of services, removing duplicates is often a goal (some services even allow the searcher to do it) and spam is really not an issue.

A paper I enjoy referring people to was published two years ago. Authors include Craig Silverstein and Monika Henzinger of Google. It discusses many of the challenges they face in building a web database.
"Challenges in Web Search Engines"
http://www.acm.org/sigs/sigir/forum/F2002/henzinger.pdf

cheers,
gary

Last edited by garyp : 06-18-2004 at 07:24 PM.
Old 06-18-2004   #7
seobook
I'm blogging this
 
Join Date: Jun 2004
Location: we are Penn State!
Posts: 1,943
Quote:
Originally Posted by dannysullivan
Results seem accurate to me
though it does not parallel correctly.
__________________
The SEO Book

Last edited by seobook : 06-18-2004 at 08:56 PM.
Old 06-18-2004   #8
Anthony Parsons
Rubbing the shine of the knobs who think they're better than everyone else...
 
Join Date: Jun 2004
Location: Melbourne Australia
Posts: 478
51,300 for bbc anti-american
54,000 for fox anti-american
143,000 for white house anti-american
351,000 for bush anti-american

Funny really. Now let's compare against who is actually targeting the phrase, using "allintitle".

6 for bbc anti-american
0 for fox anti-american
0 for white house anti-american
57 for bush anti-american

A bit more interesting now!
Old 06-18-2004   #9
Dodger
Honorary Member
 
Join Date: Jun 2004
Location: Central US
Posts: 349
News writers should always confirm their facts, shouldn't they? The fact that they used just Google as a sole source is sloppy journalism.

They should have used Yahoo too; I hate shoddy footwork.

1,300,000 for bbc anti-american
1,560,000 for fox anti-american
2,348,000 for danny sullivan anti-american
3,730,000 for white house anti-american
5,330,000 for bush anti-american
__________________
I am Ronnie
Old 06-18-2004   #10
seobook
I'm blogging this
 
Join Date: Jun 2004
Location: we are Penn State!
Posts: 1,943
Quote:
Originally Posted by Dodger
News writers should always confirm their facts, shouldn't they? The fact that they used just Google as a sole source is sloppy journalism.

They should have used Yahoo too; I hate shoddy footwork.
if you really wanted authoritative results you should search with teoma
__________________
The SEO Book
Old 06-19-2004   #11
garyp
 
Join Date: Jun 2004
Posts: 265
The problem with many of the examples given is that just searching

bush anti-american
bbc anti-american

or any other example is itself a flawed strategy. Not only does the page estimate software have problems and produce inaccurate numbers, SO do the search strategies themselves.

Huh?

Because a web page has both of these terms on a page DOESN'T mean that they have any relationship to one another.

For example, the word "Bush" could be in the first paragraph of a web page and anti-american could be in paragraph twenty-five of the same page.

If Google (and other web engines) offered a proximity operator aside from the "" or the - symbol it would allow for more accurate searching.

No web engine allows the searcher to control the proximity of two terms. AltaVista USED to offer NEAR and WITHIN but stopped offering this feature a couple of months ago.

Many of the database services that I mentioned in an earlier post in this thread DO offer this feature. For example Bush near5 "anti american".
This tells the database Bush needs to be within 5 words (in either direction) of "anti american".
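
For anyone who wants to see what a proximity operator actually tests, here is a rough Python sketch. It is only an illustration of the idea; the function names and the sample sentences are made up, and it is not how Dialog, Factiva, or any web engine implements the feature.

```python
import re

def tokenize(text: str) -> list[str]:
    # Lowercase and split on anything that is not a letter or digit,
    # so "anti-American" becomes the two tokens "anti" and "american".
    return re.findall(r"[a-z0-9]+", text.lower())

def near(text: str, term: str, phrase: str, n: int) -> bool:
    """True if the single word `term` is within `n` words of `phrase`, in either direction."""
    tokens = tokenize(text)
    word = term.lower()
    target = tokenize(phrase)
    k = len(target)
    if k == 0:
        return False
    # Start positions where the whole phrase occurs as adjacent tokens.
    starts = [i for i in range(len(tokens) - k + 1) if tokens[i:i + k] == target]
    hits = [i for i, t in enumerate(tokens) if t == word]
    for i in hits:
        for s in starts:
            end = s + k - 1
            # Word distance from the term to the nearest edge of the phrase.
            distance = s - i if i < s else i - end
            if 0 < distance <= n:
                return True
    return False

# Made-up sample pages: both contain both terms, only one relates them.
close = "Critics called Bush anti-American after the speech."
far = "Bush spoke first. " + "filler " * 30 + "Later, an unrelated letter mentioned anti-American protests."

print(near(close, "Bush", "anti american", 5))
print(near(far, "Bush", "anti american", 5))
```

A plain two-term web search would count both sample pages, while the near5 test keeps only the first, which is the whole argument for proximity searching.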


Even using the - (to show adjacency) as in anti-american
vs
"anti american" produces completely different numbers.
anti-american 552,000
"anti american" 530,00
=================

Finally, another issue in getting accurate totals from web engines comes from the distributed nature of Google (and other web engines). Numbers can vary greatly from moment to moment depending on the database cluster you hit.

Last edited by garyp : 06-19-2004 at 10:47 AM.
Old 06-19-2004   #12
garyp
 
Join Date: Jun 2004
Posts: 265
Rustybrick:
You asked for more examples.

This just in.

The new Forbes Celebrity 100 list is just out and one of the
factors used to determine the ranking is "web hits" from Google.

http://www.forbes.com/celebrities/20...ebs04land.html
Old 06-19-2004   #13
pleeker
www.SmallBusinessSEM.com
 
Join Date: Jun 2004
Location: Washington state
Posts: 295
Quote:
Originally Posted by garyp
Because a web page has both of these terms on a page DOESN'T mean that they have any relationship to one another.
Amen.

So whose job is it to educate the media, judiciary, etc., about the error of their ways? (And I don't just mean WRT Google search counts; I mean the overall bigger picture of relying too heavily on Google as the be-all and end-all of investigative research.) When it's a Forbes celebrity popularity contest, no big deal. When it's news reporting or determining legal outcomes, big deal.

You don't exactly see Google going out of its way to rein this in. Heck, I've seen interviews with Larry (or maybe it was Sergey) where the story of the heart attack victim finding out what to do by doing a Google search has been told as evidence of Google's power.
Old 06-19-2004   #14
garyp
 
Join Date: Jun 2004
Posts: 265
Quote:
Originally Posted by pleeker
Amen.
When it's a Forbes celebrity popularity contest, no big deal. When it's news reporting or determining legal outcomes, big deal.
agreed on your other comments.


As I pointed out yesterday I've been talking about this on my site (http://www.resourceshelf.com) for more than 3 years as have many of my librarian colleagues. We're trying.

Chris and I talked about the topic in our book that was published three years ago.

I authored an article for SearchDay in 2002 about the high quality, authoritative and FREE databases (accessible from home) that most public libraries offer.
http://www.searchenginewatch.com/sea...le.php/2161631

Yet, many people, including the mainstream press, often don't want to listen.

However, in the last few months, some good news. Many journos are starting to understand these issues. This article from the Mercury News is an example.
http://www.siliconvalley.com/mld/sil...ey/8704895.htm

This Fall I've been asked to teach a class on this topic at the journalism school at the Univ. of North Carolina. So, if all goes well, these students will have a better idea of what the web can and cannot do and when to use other resources, both electronic and print.
Old 06-19-2004   #15
Dodger
Honorary Member
 
Join Date: Jun 2004
Location: Central US
Posts: 349
Quote:
Originally Posted by garyp
The problem with many of the examples given is that just searching

bush anti-american
bbc anti-american

or any other example and in addition to the fact that the page estimate software has problems and is inaccurate SO are the search strategies themselves.

Huh?
Gary - I did the Yahoo results as a joke. It wasn't meant to be anything more than that. I do agree with your take on this whole thing, it is the whole point of this thread to begin with.

But I think everybody missed the result for danny sullivan anti-american in the middle of all that.
__________________
I am Ronnie
Old 06-19-2004   #16
rustybrick
 
Join Date: Jun 2004
Location: New York, USA
Posts: 2,810
Thanks for that, a metric used to develop the Forbes Celebrity 100 is "Web mentions on Google". Very interesting. I can see how it can be used to figure out popularity. I mean, of course Forbes is smart enough to use a dozen factors (all with different weights).

But are there examples of people going to the extreme of using Google as the final word in an argument?

Quote:
Originally Posted by garyp
Rustybrick:
You asked for more examples.

This just in.

The new Forbes Celebrity 100 list is just out and one of the
factors used to determine the ranking is "web hits" from Google.

http://www.forbes.com/celebrities/20...ebs04land.html
Old 06-20-2004   #17
Everyman
Member
 
Join Date: Jun 2004
Posts: 133
Here's an example from one year ago. This is the lead for a column by Ellen Goodman, a very prominent columnist for Newsweek:

"BOSTON--It's been 20 years since that Hollywood moment when Michael Keaton lost his job, found his kids and entered "Mr. Mom" into the cultural lingo. Even today, if you search for Mr. Mom on the Internet, you get 875,000 Google hits, including a Web site for a diner and a fiberglass pool company."

The problem is that Ms. Goodman didn't use quotation marks around "Mr. Mom" in the Google box. I sent her an email and scolded her for being so dumb.

The second, more recent problem is that at least since last September, Google's total counts have been utterly, hopelessly inflated. I think it's deliberate, or Google is so broken that the engineers no longer know what they're doing.

Last edited by Everyman : 06-20-2004 at 10:26 AM.
Old 06-20-2004   #18
seobook
I'm blogging this
 
Join Date: Jun 2004
Location: we are Penn State!
Posts: 1,943
Quote:
Originally Posted by Everyman
The second, more recent problem is that at least since last September, Google's total counts have been utterly, hopelessly inflated. I think it's deliberate, or Google is so broken that the engineers no longer know what they're doing.
it does not hurt them to show somewhat inflated numbers. you can't go past 1,000 results deep anyway. if they are off I would say it is not much of a priority to fix em.

also, they probably run the results through some filters after they come up with that #.
__________________
The SEO Book
Old 06-20-2004   #19
Chris Sherman
Executive Editor, SearchEngineWatch.com
 
Join Date: Jun 2004
Location: Boulder, CO
Posts: 111
Even journalists who are savvy enough to do phrase searches, enclosing search terms in double quotes, are missing the boat when they equate the number of search results found with instances of their search terms. Many pages returned in search results don't contain the search terms at all -- they're only in links pointing to the page.

Google also now does stemming, using variants of search terms, and it also appears to count fragments of words appearing in URLs (eg: "ante" in the URLs antelaguerraactua.org, dantedesigngallery.com, etc.).
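
A small Python sketch of why fragment matching inflates counts. The two matching rules here are assumptions made for illustration, not Google's actual method, and the third URL is a hypothetical example added for contrast.

```python
import re

def substring_hits(term: str, urls: list[str]) -> list[str]:
    """URLs that contain the term anywhere, even inside a longer word."""
    return [u for u in urls if term.lower() in u.lower()]

def whole_word_hits(term: str, urls: list[str]) -> list[str]:
    """URLs where the term stands alone, not glued to other letters or digits."""
    pattern = re.compile(r"(?<![a-z0-9])" + re.escape(term.lower()) + r"(?![a-z0-9])")
    return [u for u in urls if pattern.search(u.lower())]

urls = [
    "antelaguerraactua.org",
    "dantedesigngallery.com",
    "example.com/ante/rules",  # hypothetical URL, added for contrast
]

print(len(substring_hits("ante", urls)))
print(whole_word_hits("ante", urls))
```

Under the loose rule every URL in the list counts as a "hit" for ante, even though only the last one uses it as a word.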
Old 06-20-2004   #20
AussieWebmaster
Forums Editor, SearchEngineWatch
 
Join Date: Jun 2004
Location: NYC
Posts: 8,153
hmmm

I think the major part of this is really about the weight being given to Google... misinterpreted or not, the branding of Google is such that the general public perceives it just as these poorly skilled journalists do.
That people place some stock in a silly count definitely reflects more on the lack of true understanding of search engines.
As time passes this will change... but, just like the belief that "if it was on TV it had to be true," this misconception will soon be recognised for what it is.