#1
Microsoft Scraping Google and Yahoo! SERPs?
Hot in my inbox is the WebProWorld article "Microsoft Crawling Google Results For New Search Engine?" They already have an interesting thread going at WPW.
Quote:
One thing you can't say about MS is that they aren't resourceful.
Last edited by Jeff Martin : 11-11-2004 at 03:03 PM.
#2
No comment. Well, except for one.
Quote:
Last edited by Nacho : 11-11-2004 at 03:03 PM.
#3
Seems a bit unethical to me, if true.
#4
They should buy Fantomaster's spiderspy list and redirect their spiders LOL.
They would be the definition of irony.
#5
I was thinking of discussing this topic at the blog I write at. But I do not think it's really worth it, unless this topic gets really hot. I know Jason; I actually speak with him on a regular basis. I respect him, the company he works for and his colleagues.
OK, that being said, I think MSN would never even consider this. Couldn't they seed their index with the Yahoo! results they paid for and are still paying for to run the non-beta version of MSN Search? Why would they use the Google API or screen scrape when they have access to Yahoo!'s index? Of course, everyone needs something to seed the index. I believe most new engines start off with the Yahoo! Directory and ODP. I am sure others, smaller ones, use Jason's method. But MSN? I highly doubt it.
#6
From my understanding, search engines feed their crawlers from just about anywhere they can get their hands on to find new relevant documents. I've talked with SE engineers about this, and it amazed me to learn where they sometimes start looking.
Quote:
How they get what you call "seed" pages for the crawlers to fetch is totally different from what the SEs use to analyze and store in their index modules. From looking at results with limited testing, I can tell their algorithms are completely unique and different from Google's. They have a lot of tweaking on their hands to do, but I'd rather not speculate about what MSN Search (beta) is doing unless I have enough testing to prove any theories. If MSN Search is, then I would pay attention to this comment:
Quote:
#7
|
Quote:
G and Y! don't own any of the content... we do. If we, as the internet as a whole, placed 4 lines of text in our robots.txt files, we would shut down the two most powerful search properties in a matter of months. Since G and Y! don't own any of the content, on what legal ground could they stand? SEOMike's got it right (wonder where that idea came from, Mike?): get a subscription to Ralph's list and block 'em.
Last edited by Jeff Martin : 11-11-2004 at 06:29 PM.
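For illustration only, here is a rough sketch of what such a robots.txt block might look like and how a well-behaved crawler would read it. The post does not say which four lines it has in mind, so the rules below (shutting out Googlebot and Yahoo!'s Slurp) are an assumption, and `is_allowed` is a hypothetical helper built on Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents: the post never spells out its "4 lines".
# Googlebot and Slurp are the crawler names used by Google and Yahoo!.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /
User-agent: Slurp
Disallow: /
"""

def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a crawler that honours robots.txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    for agent in ("Googlebot", "Slurp", "msnbot"):
        print(agent, is_allowed(agent, "http://example.com/page.html"))
```

Note that robots.txt only works against crawlers that choose to honour it, which is why the thread also mentions blocking by spider list.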
#8
So MSN Search can be considered the "Black Hat Search Engine"?
#9
Quote:
I suppose we could place this method into the grey shaded area that quite a bit of SEO falls into.
#10
Quote:
http://www.google.co.uk/terms_of_service.html
Quote:
Unethical? No, just commercial stupidity for a billion-dollar corporation - if they did.
#11
Brian,
IF MSN is doing it, then it is NOT doing what this quote suggests:
Quote:
From this quote, the technique discussed in this thread is NOT breaking any laws that I'm aware of, UNLESS Google has, legally (terms & cond.) or by accepted methods (i.e. robots.txt), disallowed SE crawlers from going through its results pages using a scrape method. The Google API has its own terms & cond. that must be met, so I'm sure MSN would not be this dumb. It would be like spitting up at the sky, wouldn't you think?
#12
Quote:
Quote:
#13
|
Quote:
Again, IF it's legally and accessibly possible, then I think it's a genius strategy.
#14
What's an SE to do with "no results found"?
Log the query and go scrape.
#15
Not being a lawyer, I am still pretty sure that databases and collections of data are protected under copyright law - as I know they are in most of Europe. You can own the collection (the index) without owning the content (the URLs), and it is illegal (at least here) to reuse the entire data collection.
#16
Note MSNdude's response to this topic within the thread at wmw:
http://www.webmasterworld.com/forum97/229.htm
Very key answer there, and GG doesn't appear to believe this is the case himself, either.
#17
So far, and up to post #31, MSNDude has only said:
Quote:
I would expect a much better statement than that, something like: "We respect Google.com and as a result we will not use any of Google’s search results to build our index." We'll just have to wait and see if MSN really clears this up.
#18
Quote:
To a large extent, Database Right grew up as a development of copyright to make a distinction between owning the intellectual endeavour involved in creating original material and owning the intellectual endeavour involved in creating the compilation. Since SE databases are basically compiled by computers, does that demonstrate that computers are intellectuals, I wonder?
#19
There are plenty of software packages that will screen scrape search results in order to create search fodder for those trying to generate AdSense or other traffic.
It's entirely possible that MSN has simply crawled one of these pages. So yes, it would have crawled Google search results -- but these could have been Google search results that were copied and transferred to a different site.
That's far more likely than the idea that MSN is somehow scraping Google. I mean, what, MSN starts jumping over to Google, entering site:someonessite.com commands for umpteen million sites to do some guesswork on harvesting sites? Far-fetched. Much more likely it ran across the results as I've described.
The actual story is also just incorrect. MSN never required a fee to be spidered. MSN still, on the flagship site, partners with Yahoo for its search results. Yahoo has operated a paid inclusion program but, as many will attest, has also spidered pages for free aside from this. MSN dropped paid inclusion pages back in July -- but despite this, they already were and still are crawling the web for free via Yahoo (and via themselves, on the beta site).
And the fastest way to get relevant pages is to crawl Google for every page listed from a site? Not. You'd instead do what the other crawlers do: harvest links from across the web and start indexing the ones you see most often.
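The idea in that last sentence, harvesting links and giving priority to the URLs that turn up most often, can be sketched roughly as below. This is an illustrative sketch only, not a description of how MSN or any other engine actually crawls; `LinkExtractor` and `rank_crawl_candidates` are hypothetical names, and the input is assumed to be a dict of already-fetched pages.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collects absolute URLs from the <a href="..."> tags of one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))

def rank_crawl_candidates(pages):
    """pages maps page URL -> HTML text (assumed already fetched).

    Returns the linked URLs, most frequently linked first.
    """
    counts = Counter()
    for url, html in pages.items():
        extractor = LinkExtractor(url)
        extractor.feed(html)
        # Count each linking page once, however often it repeats a link.
        counts.update(set(extractor.links))
    return [link for link, _ in counts.most_common()]
```

Counting each linking page only once is a design choice made here so that a single page repeating the same link cannot inflate a URL's rank.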
#20
Quote: