Search Engine Watch
Old 02-17-2005   #1
rustybrick
 
Join Date: Jun 2004
Location: New York, USA
Posts: 2,810
Scraping the Engines - Other Choices?

This thread is inspired by the most recent news at ThreadWatch under the title of Link Tool put on ice pending Discussions with Search Engines. Basically, Text Link Ads (they advertise on the "Search Engine Watch Marketplace") was going to sponsor (pay for) the development time that goes into building one of the most useful link analysis tools out there.

Along with that come some challenges. When I had my advanced link analysis tool built, I made sure to only use the Google API. Search engines all have a clause in their TOS that disallows the use of screen scraping tools to capture data. Why? Simply because they do not want to have hundreds, thousands, etc. of desktop programs querying their engine for unintended uses. It puts a strain on the servers and costs money. Fair enough.

So I used the Google API. What did that do for me? Well, for a link analysis tool, not much. We all know that Google limits what we can see when we conduct a link command. Yahoo does a pretty good job, but then Yahoo! doesn't really have an API that can be used to do this (or it isn't financially feasible).

So what options do we have? Well, if the search engines do not provide us with the access (API) and the data (good linkage data) then what can we do?

The ThreadWatch tool that was just taking off had a major setback. "GoogleGuy" said it would be against Google's TOS and Text Link Ads was smart enough to back down as a sponsor of the tool. Text Link Ads cannot be blamed; they saw an opportunity to help the SEO community and tried. But they also want to respect the Search Engines, which is honorable.

So again, what options does Text Link Ads have? Can a tool be developed? Can we bridge that gap so that the search engines and the search engine optimizers are more open with each other? I believe Yahoo and Ask Jeeves (also MSN) have gone a long way in doing so. My personal feeling: Google is just "teasing" us. Most of you know my style, I try to stay very un-opinionated, so I apologize for that.

Now please, let me know your thoughts.
Old 02-17-2005   #2
JasonD
Member
 
Join Date: Jan 2005
Posts: 51
Hi RustyBrick,

What an excellent post and I agree with the thoughts behind it.

I have a few potential answers and I'll air them here. None of this is set in stone but I do know that there are options, options, options and OPTIONS.

I understand the business model of the search engines is built upon delivering the highest quality organic results to their visitors. I also respect the viewpoint that some search engine engineers have that anyone who tries to raise their rankings artificially is a spammer.

In a Utopian world no business would undertake any marketing efforts and the best would naturally rise to the top. We don't live in Utopia though and trying to raise a website's rank in the SERPs is a natural choice for any business owner. It's how the whole SEO/M business started.

Building a tool that delivers information on structures of the web and reporting on how that "might" have an effect on your site's ranking in the SERPs is a form of spam analysis. It can help deliver a solution to SE Spam or a worksheet of tasks to be undertaken depending on whether you are a search engine or a website owner.

I believe if you use a search engine analysis tool you are a spammer and definitely a spam researcher. Your goal is to raise your site's ranking and that is unnatural in Utopia. I'll be honest here, under the definition above I sure am a spammer as I want to rank well for the sites I operate and to be frank I don't do too badly overall.

Putting specific search engines aside for a moment there comes the question of what can we and what can't we do?

I believe that there is a world of difference between breaching a company's terms of service and breaking a law. The two are not synonymous.

It is legal for me to automate delivery of searches to my computer, although it may be against the TOS of the search engine owners. Increased bandwidth, hardware costs and general maintenance and upkeep mean that there are very real costs involved for a search engine to provide data to me if I automate a search compared to the time it takes to do it manually. To be fair to Google for a moment, they know that they have an immensely powerful tool and launched the Google API to help developers gain access to that store of data. They have also been extremely accommodating with their 1,000-query API limit for projects they like such as Google Alert and their sister site CopyScape.

Those sites offer real and valuable information and resources and are targeted at Mr and Mrs Joe Bloggs rather than the webmaster (although it's great for them too).
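
To make the idea of an API limit concrete, here is a minimal sketch of a per-key daily allowance. It is only an illustration: the class name `DailyQuota`, the `allow` method and the in-memory counter are all assumptions made for this example, and nothing here describes how Google actually enforces its limit.

```python
from collections import defaultdict
from datetime import date


class DailyQuota:
    """Per-key daily query allowance (hypothetical, not any engine's real code)."""

    def __init__(self, limit=1000):
        self.limit = limit
        # Maps (key, day) -> number of queries already served on that day.
        self.used = defaultdict(int)

    def allow(self, key, today=None):
        """Return True and count the query if the key still has allowance today."""
        day = today or date.today()
        if self.used[(key, day)] >= self.limit:
            return False
        self.used[(key, day)] += 1
        return True
```

A project that the engine likes could simply be given a larger `limit` for its key, which is roughly the kind of accommodation described above.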

There are a myriad of options that will not specifically breach the TOS a search engine has. Even the mighty publisher O'Reilly came under fire over their book Google Hacks.

When it was originally published they advocated the use of automation to get search results. They quietly changed that so that rather than direct and automated querying it would import a "manually searched for and saved page" from Google.
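
As a rough sketch of what importing a manually saved page could look like, the snippet below pulls the absolute links out of an HTML file that a person has saved from their browser. It is not the code from Google Hacks; the names `LinkCollector` and `links_from_saved_page` and the file name `saved_results.html` are made up for this example, and no request is sent to any search engine.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect absolute http(s) links from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.startswith(("http://", "https://")):
                self.links.append(value)


def links_from_saved_page(html_text):
    """Return the links found in a page that was saved by hand."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links


# Hypothetical usage with a file saved from the browser:
# with open("saved_results.html", encoding="utf-8") as f:
#     print(links_from_saved_page(f.read()))
```

Relative links (site navigation and the like) are skipped on purpose, since only the outbound URLs matter for link analysis.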

I am pretty sure that 99% of users dropped the manual part and added the automation back in.

We could do the same.
We could also query another search engine that carries the same data but does not have a restrictive set of terms.
We could release an application engine and allow the owners of the application to choose how they want to search by opening up an API for searching.
We could carry on with all of these options and not be in direct breach of any search engine's TOS but we will definitely be annoying the search engines by not complying with the spirit of their TOS.
We could also get the big guns out and allow the community to come together and collectively build a grid style search and analysis application that will deliver the same kind of data the major search engines have without having to ever query them to find it out. This could be used to deliver a search engine in its own right but would be more likely used to deliver the data that the link (and other information) analysis application could look at on a personal level.

At the same time I don't think it is sensible to reinvent the wheel or build an open source collaborative engine that might be a competitor to the existing and established market leaders.

The reality is that although we can stay well inside any TOS for any and all search engines I (and other members) would prefer to speak directly with the engines so that a sensible and workable solution can be found where as many parties as possible are happy.

I owe the search engines a huge thanks. They have indirectly paid my bills for a number of years and because of that I truly do owe them a debt of gratitude as well as a huge thanks. But at the same time if the TW application doesn't go ahead someone else will launch one. I think it would be in everyone's best interests (Searchers, Website owners, TW members and Search Engine operators) for us to sit down, have a cup of tea and a chat about how we can work together and build a tool that will deliver the functionality that end users are looking for whilst not upsetting the search engines.

I think comms is the key to the project rather than looking at TOS because as I mentioned above I can ensure we stay within pretty much any TOS that ultimately gets deployed but common sense says we should simply have a chat. Remember that if TW don't build it someone else will eventually!
Old 02-17-2005   #3
rustybrick
 
Join Date: Jun 2004
Location: New York, USA
Posts: 2,810
I see your point, however, there are always two sides to it.

The reason I would like to see such a tool in place is simply for academic reasons.

Anyway, your points are valid. But I am not optimistic about the engines participating on any level beyond the Google API. Of course, I can be wrong.
Old 02-17-2005   #4
JasonD
Member
 
Join Date: Jan 2005
Posts: 51
Quote:
Originally Posted by rustybrick
But I am not optimistic about the engines participating on any level beyond the Google API. Of course, I can be wrong.
I hope you are wrong on this one pal, as the engines are operated by people just like you and me. People who have an interest in search and search technologies. They understand the financial value that a high ranking can deliver and also that they owe the community of website owners some thanks for building the sites that allow them to operate.

Most importantly of all they aren't silly or foolish. They know, as well as you and I do, that if the TW project doesn't go ahead then someone else will build it and release it to the masses. I am sure that they would prefer to be part of the project than see it come to fruition without their input. Either way, if they choose not to chat about it then they are simply delaying the inevitable discussions with "someone" when such an application is launched.

My contact details are on my website, which is accessible from my profile - I look forward to calls from any and all search engine reps.
Old 02-17-2005   #5
mugshot
Member
 
Join Date: Sep 2004
Posts: 51
Quote:
I believe if you use a search engine analysis tool you are a spammer and definitely a spam researcher.
This is kinda disturbing. Every SEO whom I know uses some sort of analysis tool to "check out" search engines. A lot of SEOs don't consider themselves spammers but I'm sure this will shed some light on what we truly are? :-)

It would be nice if we went beyond the API.
Old 02-17-2005   #6
JasonD
Member
 
Join Date: Jan 2005
Posts: 51
Quote:
Originally Posted by mugshot
This is kinda disturbing. Every SEO whom I know uses some sort of analysis tool to "check out" search engines. A lot of SEOs don't consider themselves spammers but I'm sure this will shed some light on what we truly are? :-)

It would be nice if we went beyond the API.
It's my definition and not everyone will agree but essentially if you try to increase your rank you're a spammer or at the very least a wannabe spammer! If it ain't natural it's spam.

But as Shakespeare once said....

The good news is that Yahoo have been very gracious and given us permission to do what has to be done in a workable format. I want to take my hat off to DaveN and the guys n gals at Yahoo for getting together and working out a solution. Well done

If any other search engine reps want to chat about the project please get in touch
Old 02-18-2005   #7
ihelpyou
 
Posts: n/a
Quote:
This is kinda disturbing. Every SEO whom I know uses some sort of analysis tool to "check out" search engines.
You should do your homework. That's a general and sweeping statement that is patently false. Many of us have "zero" need for such things.


oops; just read the 'whom I know' part. Maybe who you know, but I'm here to tell ya that you must not know many seo's.

Last edited by ihelpyou : 02-18-2005 at 07:36 AM.
Old 02-18-2005   #8
JasonD
Member
 
Join Date: Jan 2005
Posts: 51
To be fair to mugshot he did say

Quote:
Every SEO whom I know
N.B. My emphasis added

so it is far from a general and sweeping statement. Tools are merely a facilitator to a person undertaking a job.

If I were a master carpenter (I am not by the way) I may use a hand drill, chisels and wooden pegs to make a piece of furniture but the labour would be increased and therefore costs escalate compared to using power tools and screws.

Either way an excellent piece of furniture could be delivered, it is just that manual construction is not as efficient as automated or semi automated construction.

The research stages of SEO are no different
Old 02-18-2005   #9
ihelpyou
 
Posts: n/a
I guess you did not see my 'edit'.

But anyway; I fail to see the need for ANY search engine giving spammers OR SEO's the ability to easily scrape their databases for ANY reason. My goodness; they have enough problems producing relevant results because of spammers. Any tool built to facilitate the businesses of spammers is a tool I could "never" endorse.

Besides, any SEO I know of who practices 'ethical' techniques does not need an automated tool for any reason. See Alan Perkins' newest article:
http://www.silverdisc.co.uk/articles/ethical-seo/

Why a search engine would agree to such a tool is beyond me. This industry is going downhill fast anyway. More and more of the web in general have a bad taste in their mouths about the SEM industry. Tools such as these only help perpetuate that taste, and only help those in this industry who wish to manipulate the se results by using unethical techniques.
Old 02-18-2005   #10
JasonD
Member
 
Join Date: Jan 2005
Posts: 51
IhelpYou, you edited after I posted and added

Quote:
but I'm here to tell ya that you must not know many seo's
I may be wrong but (to me) that reads as a derogatory remark and I believe is uncalled for in a thread that has been (and I am sure will continue to be) polite and forthright in its open discussion of the topic.

Anyway, back on topic

Let me ask you a couple of simple questions related to my furniture related example above. When you undertake a new project to work on, do you not look at (manually) the SERPs to see how a client ranks currently, peer at their competitors and formulate a plan of action?

If so, then isn't your preparation just a manual version of what automated tools do, only less efficient because you undertake it by hand?
Old 02-18-2005   #11
ihelpyou
 
Posts: n/a
The same analogy can be applied to my forums. Why would I want any automated scraper doing anything at all in them? If I provided any kind of advertising in my forums, what benefit is it to me if I allowed a scraper? Does a scraper click on ads and provide good ROI for my advertisers?

So you see, any search engine who agrees with your tool and analogy would be undercutting their actual existence, and ways they make money. As an example; Google uses Adwords. Now why would they endorse such a tool since a robot scraper cannot click on an ad? I do "everything" manually because that's how I expect others to treat me.

I don't understand why people think the search engines somehow owe them something, especially when that engine gives them Free visitors and referrals on a daily basis.

Now tell me again how this automated server scraper would benefit the search engines? Seems to me it simply benefits SEO's, and would enable them to "figure out" algos quicker.
Old 02-18-2005   #12
JasonD
Member
 
Join Date: Jan 2005
Posts: 51
We cross posted again mate. Nice to see some lively debate between us

I think I answered your points in my post above, that automation is no different to manual research other than it is more efficient.

I will answer a couple of your specific points though.

Quote:
This industry is going downhill fast anyway. More and more of the web in general have a bad taste in their mouths about the SEM industry.
That's a personal opinion and not one I agree with. The web is increasing exponentially and the users, site owners, search engines and any and all individuals in between have a myriad of different points of view. There is good, bad and downright evil as well as those who have a holier than thou attitude.

None of them are right or wrong as we don't live in a black and white world. Thankfully we live in a world dominated by pastels and simply painting in those 2 diverse colours doesn't give a full picture.

It definitely offers a way of delivering a striking image, but striking is not the same as accurate!


Quote:
Tools such as these only help perpetuate that taste, and only help those in this industry who wish to manipulate the se results by using unethical techniques.
Lots of bad things were said about automation previously, but history has shown differently

The reality is that automation isn't going to go away and I feel that the search engines would rather work with it than work against it

Edited to fix my bad typing!

Last edited by JasonD : 02-18-2005 at 08:13 AM.
Old 02-18-2005   #13
ihelpyou
 
Posts: n/a
Cross posted again.
Quote:
I think I answered your points in my post above, that automation is no different to manual research other than it is more efficient.
yes; more efficient to whom? Read my post above that answers that question.

Automation is "very" much different than manual. You really cannot be serious, right?
Old 02-18-2005   #14
JasonD
Member
 
Join Date: Jan 2005
Posts: 51
Isn't cross posting fun? I'll wait a while after posting this so we can try to get the thread back in sync.

You said,

Quote:
The same analogy can be applied to my forums. Why would I want any automated scraper doing anything at all in them? If I provided any kind of advertising in my forums, what benefit is it to me if I allowed a scraper? Does a scraper click on ads and provide good ROI for my advertisers?
There would be absolutely zero benefit to you if you allowed a scraper into your forums but with the greatest respect (and I do mean this respectfully, and hope it doesn't come out in a rude and disrespectful manner) your forums have close to zero value. This is especially true when we compare them to the search engines. I would not visit your forums to find out how I can increase targeted traffic to my website, whereas I would (and do, as you do too) go to the search engines to analyse what they do, how they do it and try to understand this in as efficient a way as possible.

I doubt you are troubled by crawlers at all, or at least to the same extent that the search engines are. Let's presume I am wrong for a moment and that you have a massive problem with site scrapers, that have a very real and direct cost on your bandwidth, processing power, network infrastructure as well as time.

You could ban them one by one by blocking IP addresses, but in the long term that would be a fruitless battle as more and more chinks in the ban list are opened. You'd spend more time administering your bans than actually running your forums. I am sure you would be thinking, there has to be a better answer.

And I believe there is. Speak with the enemy!

Work with them so that the resources that are drained are managed and are not a drain on your long term revenue streams. Make sure that you give an alternative to direct scraping of your forums in the form of a feed - such as RSS or Atom. Access to a single feed can be managed much more efficiently than access to your whole site, and if you give a reason for a scraper to use a controlled resource rather than leech even more of your bandwidth and other resources then I believe it'll work.
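
For anyone wondering how small such a feed can be, here is a minimal sketch that builds an RSS 2.0 document from a list of thread titles and links using Python's standard library. The function name `build_rss` and the example URLs are assumptions for illustration only; a real forum would add dates, descriptions per item and some form of access control.

```python
import xml.etree.ElementTree as ET


def build_rss(title, link, description, items):
    """Build a minimal RSS 2.0 document; items is a list of (title, link) pairs."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for item_title, item_link in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = item_link
    # ElementTree escapes characters such as "&" in text for us.
    return ET.tostring(rss, encoding="unicode")


# Hypothetical usage:
# print(build_rss("My Forums", "http://example.com/", "Latest threads",
#                 [("Scraping the Engines", "http://example.com/t/1")]))
```

Serving one small document like this is a single, cacheable request, which is the point being made above about a feed being easier to manage than crawls of the whole site.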

In the search engine analogy, I know, the search engines know and I believe you know that data extraction is rife already, and delivering a solution to these issues reduces the hassle for all involved. Of course they would prefer no one tried to increase their site in the SERPs but the reality is that most of us want that to happen, you included. Isn't it better to have some say on what can and can't be done? Terms of Service don't actually work in this instance whereas agreement between parties can and does.


Quote:
So you see, any search engine who agrees with your tool and analogy would be undercutting their actual existence, and ways they make money. As an example; Google uses Adwords. Now why would they endorse such a tool since a robot scraper cannot click on an ad? I do "everything" manually because that's how I expect others to treat me.
Google are one of the great guys out there. They released the G API and I believe it was for the reasons I stated above that they did this. The guys n gals at the plex understand what I am saying.

Quote:
I don't understand why people think the search engines somehow owe them something, especially when that engine gives them Free visitors and referrals on a daily basis.
I said above and I'll say it again here: I feel the complete opposite. I actually owe the search engines a huge thanks as they have indirectly paid my bills and given me a greater than average income for quite some considerable time.

Quote:
Now tell me again how this automated server scraper would benefit the search engines? Seems to me it simply benefits SEO's, and would enable them to "figure out" algos quicker.
Website owners and SEOers are in practice close to being the same entity.

Show me a commercial website owner that doesn't want to get a higher SERP ranking, so that (s)he can increase their customers and therefore bank balance, and I'll let you pick any item you want from a milliner's of your choice and I'll eat it at the TW meet in Stansted, have it videoed for you and give you full rights to it to use in a marketing campaign (or however you wish) proving you're right and I'm wrong.

N.B. recipes for making hats taste nice are gratefully received, just in case

Most website owners simply don't know how to do it. SEOers do!

Search engines definitely owe a thanks to website owners as without them, they have no business model at all. As well as that, the single most important benefit that the SEs get is control of an action that is ongoing anyway. Sure it may help understand the algo, but if scraping is going on and can't be controlled by brute force methods then surely trying to find a way of controlling it, whilst reducing costs, resources and hassles, has to be a massive win for the engines themselves.


P.S. I am definitely going to stop posting now and allow the cross posting to sort itself out but 1st.....

Quote:
Automation is "very" much different than manual. You really cannot be serious, right?
Deadly so
Old 02-18-2005   #15
ihelpyou
 
Posts: n/a
Yes indeed; If I had a search engine who endorsed SEO's and site owners to download a feed to their computers that enabled those computers to simply scrape all data they wanted to scrape, my business as a search engine would cease to exist as I would not have any real visitors. That's great stuff.

Just think if all internet users in the world had this scraper; No need to "ever" visit Google again manually. That's great stuff. Each of us could quickly and more efficiently figure out Google's algo. Again; Great stuff.

I'm all for this scraper tool.
Quote:
And I believe there is. Speak with the enemy!
yep. Many of us don't see search engines as the enemy at all. That's a huge difference in our world and our views of this industry.
Old 02-18-2005   #16
JasonD
Member
 
Join Date: Jan 2005
Posts: 51
Cool

Quote:
my business as a search engine would cease to exist as I would not have any real visitors
I think you misunderstand me. SEOers are not the search engines' target market, but whilst the target market has access to the SERPs so do SEOers. If you can reduce the strain that researchers (as that is what an SEOer will be using the SERPs for) cause on the infrastructure then it is more open and accessible for the target market, increasing business.

Quote:
Many of us don't see search engines as the enemy at all.
Neither do I, but I do respect that it is possible (some would say likely) that many within the search engine companies can see SEOers as the enemy. I can well imagine that within meetings at search engine companies the world over SEOers are the enemy, whereas I (and I believe you, and many other SEOers) look at the search engines as beneficial givers of traffic.

But whilst we live, operate and work in a commercial world rather than Utopia we want to rank better and get the benefits that delivers.

Whether the Search Engines look at me and others like me as an enemy or not is really a moot point. Speaking with a possible enemy has to deliver better rewards than the risk of ignoring what is ongoing anyway.

P.S. My offer of the hat eating session is very real
Old 02-18-2005   #17
rustybrick
 
Join Date: Jun 2004
Location: New York, USA
Posts: 2,810
I just wanted to clarify my points in the original post of this thread.

This is not about a specific tool.

It is about the false intentions being displayed by some search engines. We all say how the bridge between the Search Engines and the SEOs is getting better. But it's just one big fog. It's funny, we get an API, which is outstanding. But then:

- the link command doesn't work
- results are often limited
- data doesn't always match between the API and the data centers

Just a few examples.

Many people, like Danny Sullivan, have done so much to try to improve the relationship. I just get this feeling that it's a big game to some engines.

To be honest, I understand both sides. But I really just don't like the game.
Old 02-18-2005   #18
Marcia
 
Join Date: Jun 2004
Location: Los Angeles, CA
Posts: 5,476
>>This is not about a specific tool.

Coulda fooled me. Between the first post and a few of the subsequent ones, that is exactly what I started out thinking, since it seemed to be obvious going out the gate - at least at first it did until it finally seemed to change course.

>>Now please, let me know your thoughts.

I would, if I could figure out what this thread is really about. IMHO it's the equivalent of a mixed metaphor and is totally mixing unrelated issues.

It has ALWAYS been a violation of Google's Terms of Usage to use automated tools, site scrapers, whatever to query their engine. And it is their site, so it's their privilege to limit anything they want however they want to. So how all of a sudden is anyone an innocent victim if they are not being allowed to?

What am I missing? Really - what is the core issue here? Is it about how Google shows (or doesn't show) links, or whether development of a tool was abandoned, or is it about whether or not Google's being scraped, or allowing their site to be scraped?

Last edited by Marcia : 02-18-2005 at 09:55 AM.
Old 02-18-2005   #19
rustybrick
 
Join Date: Jun 2004
Location: New York, USA
Posts: 2,810
Quote:
Originally Posted by Marcia
What am I missing? Really - what is the core issue here? Is it about how Google shows (or doesn't show) links, or whether development of a tool was abandoned, or is it about whether or not Google's being scraped, or allowing their site to be scraped?
Sorry for the confusion... The core issue:

Quote:
Many people, like Danny Sullivan, have done so much to try to improve the relationship. I just get this feeling that it's a big game to some engines.

To be honest, I understand both sides. But I really just don't like the game.
Old 02-18-2005   #20
Marcia
 
Join Date: Jun 2004
Location: Los Angeles, CA
Posts: 5,476
Ok, so if that is the issue, what has it got to do with development of a tool being abandoned because of violating Google's TOS? Maybe that is what needs to be clarified for us. What has one got to do with the other?