Search Engine Watch
SEO News

Search Engine Watch Forums > Search Engines & Directories > Google > Google Web Search

Old 10-25-2004   #1
DaveN
Join Date: Jun 2004
Location: North Yorkshire
Posts: 434
Google's signals and load balancing

Google Load Balancing

The setup they have now basically detects your IP on a DNS query, so when you search Google for blue widgets...

The first thing that happens is that Google's load-balancing system selects a cluster from the user's geographical area (each cluster has a few thousand machines), for two main reasons: time to deliver results and DC failures.
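The cluster selection described above can be sketched in a few lines. This is a toy model, not Google's actual code; the cluster names, the health flag, and the region field are all invented for illustration. The point is the two reasons given: prefer a nearby cluster for latency, and fall back to another cluster when a DC fails.

```python
# Toy sketch of geo-aware cluster selection with DC failover.
# All cluster names and fields here are hypothetical.
CLUSTERS = {
    "us-east": {"region": "us", "healthy": True},
    "us-west": {"region": "us", "healthy": True},
    "eu-west": {"region": "eu", "healthy": True},
}

def pick_cluster(user_region, clusters=CLUSTERS):
    """Prefer a healthy cluster in the user's region; fall back to any healthy one."""
    local = [name for name, c in clusters.items()
             if c["region"] == user_region and c["healthy"]]
    if local:
        return local[0]
    # DC failure in the user's region: route to the next available cluster
    healthy = [name for name, c in clusters.items() if c["healthy"]]
    if not healthy:
        raise RuntimeError("no healthy clusters")
    return healthy[0]
```

A user in Europe lands on the EU cluster; mark that cluster as failed and the same query silently shifts to a US one.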

Once Google has determined which cluster you are querying, the results are totally dependent on that cluster, and then the HTTP request is split into two main areas:

1) index servers
2) document servers

The query to the index servers gets the hit list: a relevant set of documents matched by keyword, and then each document is scored in turn.

This score determines which pages are displayed in the SERPs. Because of the mammoth amount of data, G splits it down into index shards; each shard is a random subset of the main index, and a pool of machines serves the requests for each shard.
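The sharding idea above can be shown with a tiny mock-up. The postings and scores here are made up; the shape is what matters: each shard holds an inverted index over its own subset of documents, a query fans out to every shard, and the per-shard hit lists merge into one.

```python
# Minimal sketch of a sharded inverted index (all data is invented).
# Each shard maps term -> {docid: score} for its own subset of documents.
SHARDS = [
    {"blue": {"doc1": 0.9, "doc3": 0.4}},                # shard 0
    {"blue": {"doc2": 0.7}, "widgets": {"doc2": 0.5}},   # shard 1
]

def query_shard(shard, term):
    """Return {docid: score} for one term from one shard."""
    return dict(shard.get(term, {}))

def fan_out(term, shards=SHARDS):
    """Send the query to every shard and merge the hit lists.
    (In reality each shard is served in parallel by a pool of machines.)"""
    hits = {}
    for shard in shards:
        hits.update(query_shard(shard, term))
    return hits
```

Because docids are disjoint across shards, merging is just a union; no shard needs to know about any other.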

The final part is to get the docids into an ordered list; then the document servers compute the title and the snippet, and the GWS checks the spell-check server and the ad server.
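That final assembly step can be sketched too. The document store below is fake, but it shows the two-phase split the post describes: the index side produces scored docids, and only the top few are sent to the "document servers" to be turned into titles and snippets.

```python
# Sketch of SERP assembly (hypothetical data): order the scored docids,
# then fetch title + snippet for each from a toy "document server".
DOC_STORE = {
    "doc1": ("Blue Widgets Inc", "We sell the finest blue widgets..."),
    "doc2": ("Widget World", "Widgets in every colour, including blue..."),
}

def build_serp(hits, doc_store=DOC_STORE, n=10):
    """hits: {docid: score} -> list of (title, snippet) in descending score order."""
    ordered = sorted(hits, key=hits.get, reverse=True)[:n]
    return [doc_store[d] for d in ordered if d in doc_store]
```

The design point: snippets are expensive, so they are computed only for the handful of docids that survive the ranking, never for the whole hit list.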

Then you get your results... well, that's it in practice. We have all heard talk of Google "signals"; the Florida update was supposedly so bad because they didn't have enough signals (yeah, whatever)...

What if Google's signals ran after the GWS got the spell-checker info and ads? Could they not run a blacklist, spam list, whitelist, or just an extra filter via IP location, making the results subtly different for everyone? We guess that local search could go live on all G DCs, meaning that if you live in Washington DC and search for "mobile phones", an extra boost would be given to companies that operate from DC.
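That location-boost speculation is easy to model as a post-processing pass over the ranked results. Everything below is hypothetical: the tuple layout, the boost value, and the idea that location arrives as a plain label.

```python
# Hypothetical post-ranking filter for the local-boost idea above:
# results whose business location matches the searcher's inferred
# location get a fixed score bump, then the list is re-sorted.
def local_boost(results, user_location, boost=0.2):
    """results: list of (docid, score, location) -> re-ranked list."""
    boosted = [(doc, score + boost if loc == user_location else score, loc)
               for doc, score, loc in results]
    return sorted(boosted, key=lambda r: r[1], reverse=True)
```

Running this as a last-stage filter, per DC, would match the post's guess: the base index stays identical everywhere, yet a DC searcher sees DC companies float up.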

What if you linked to a mobile phone site and your site was about coffee mugs, and that got you a -10 or -20 penalty? Would that not stop PR selling? Anyway, food for thought.
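For concreteness, the penalty thought experiment could look like this. The topic labels and the penalty size are invented; this is the -10/-20 idea as arithmetic, nothing more.

```python
# Purely hypothetical scoring for the off-topic-link penalty above:
# each outbound link whose topic differs from the page's own topic
# docks a fixed number of points from the page's score.
def apply_link_penalty(score, page_topic, outbound_link_topics, penalty=10):
    """Return the score after docking `penalty` per off-topic outbound link."""
    off_topic = sum(1 for t in outbound_link_topics if t != page_topic)
    return score - penalty * off_topic
```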

Old 10-25-2004   #2
Join Date: Jun 2004
Posts: 55
Jeremy_Goodrich
Nice idea, but...I don't think so ;)

>>>what if you linked to a mobile phone site and your site was about coffee mugs then give you a -10 or -20 pen, would that not stop pr selling

Seems there are too many instances of 'off topic' anchor text for the above to be useful. Not saying this wouldn't work in theory, but my guess is you'd have too much collateral damage from implementing such a filter.

Blog spam seems to have come and gone (for the most part...) on Google, but the reality is, you can still do nearly anything in small doses and rank well. I think of it more in terms of the speed limit, like when you're driving your car.

Everybody goes about 5-8 mph (miles per hour) over the limit (here in California, anyway), and this is fine. If you step across that invisible line, *and* there is a cop around, then they'll bust you and you get a ticket. If you're driving only 7 mph over and you happen across a few cops (say, spam engineers for some search engine), then the likelihood is that they'll look at you but not issue you a ticket.

My metaphor may suck, but imho this is more likely how the whole shebang works. There are still too many large sites with enough off-topic anchor text that, once it's taken away, the remaining on-topic / semantically related anchor text shouldn't be enough to drive them to the top of the SERP; yet in many cases, they're doing just fine. Other thoughts?
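The speed-limit metaphor boils down to two conditions that must both hold before a penalty fires. This toy version makes that explicit; the threshold and the "reviewed" flag are invented stand-ins for whatever the engines actually measure.

```python
# Toy model of the speed-limit metaphor: mild over-optimisation is
# tolerated, and a penalty fires only when the spam score crosses an
# (invented) threshold AND a reviewer ("cop") actually looks at the page.
def gets_penalised(spam_score, threshold=8, reviewed=False):
    """Penalty requires both crossing the line and someone watching."""
    return spam_score > threshold and reviewed
```

Driving 7 over with cops around is fine; 9 over with no cop is also fine; only 9 over in front of a cop earns the ticket.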
Old 10-25-2004   #3
Honorary Member
Dodger
Join Date: Jun 2004
Location: Central US
Posts: 349
Maybe MapReduce is being used as you described to break this down quickly. MapReduce is an apparently not-so-new process that Google has been using since February of 2003.

Abstract of MapReduce:
MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper.

Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system.

Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day.
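The abstract above is usually illustrated with word count: the user-supplied map emits a (word, 1) pair per word, and reduce sums the values for each key. The tiny single-machine driver below mimics the model's shape; the real system's value, as the abstract says, is parallelising these same two phases across thousands of machines with partitioning and failure handling done for you.

```python
# Single-machine sketch of the MapReduce programming model (word count).
from collections import defaultdict

def map_fn(doc):
    """Map: emit an intermediate (word, 1) pair for every word in a document."""
    return [(word, 1) for word in doc.split()]

def reduce_fn(word, counts):
    """Reduce: merge all intermediate values associated with one key."""
    return sum(counts)

def map_reduce(docs):
    intermediate = defaultdict(list)
    for doc in docs:                          # map phase
        for key, value in map_fn(doc):
            intermediate[key].append(value)   # group by intermediate key
    return {k: reduce_fn(k, v) for k, v in intermediate.items()}  # reduce phase
```

Swap in a different `map_fn`/`reduce_fn` pair and the same driver expresses grep, inverted-index building, and the other tasks the paper lists.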
I am Ronnie

Last edited by Dodger : 10-25-2004 at 05:04 PM.
