Search Engine Watch
SEO News

Old 03-31-2005   #1
msgraph
ºº
 
Join Date: Jun 2004
Posts: 31
Does New Google Patent Validate Sandbox Theory?

Must read

Probably one of the best bits of information released by them in a patent.

A large number of inventors are listed on it, even Matt Cutts, that guy who attends those SE conferences. It explains a bit about what is already known through experience, as well as comments made by search engine representatives.

Example:

Quote:
[0039] Consider the example of a document with an inception date of yesterday that is referenced by 10 back links. This document may be scored higher by search engine 125 than a document with an inception date of 10 years ago that is referenced by 100 back links because the rate of link growth for the former is relatively higher than the latter. While a spiky rate of growth in the number of back links may be a factor used by search engine 125 to score documents, it may also signal an attempt to spam search engine 125. Accordingly, in this situation, search engine 125 may actually lower the score of a document(s) to reduce the effect of spamming.
Information retrieval based on historical data
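
To make the arithmetic in [0039] concrete, here is a rough sketch of scoring by rate of link growth. This is only an illustration of the quoted example, not how any search engine is known to work: the `Doc` class, the `growth_score` function, and the `spam_rate` threshold and its penalty formula are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    age_days: float   # days since the document's inception date
    back_links: int   # back links observed so far


def link_growth_rate(doc: Doc) -> float:
    """Average number of back links gained per day since inception."""
    return doc.back_links / max(doc.age_days, 1.0)


def growth_score(doc: Doc, spam_rate: float = 50.0) -> float:
    """Score by growth rate, lowering the score when growth looks spiky.

    spam_rate is a made-up threshold; the patent gives no numbers.
    """
    rate = link_growth_rate(doc)
    if rate > spam_rate:
        # Hypothetical penalty: the spikier the growth, the lower the score.
        return spam_rate / rate
    return rate


new_doc = Doc(age_days=1, back_links=10)        # yesterday, 10 back links
old_doc = Doc(age_days=3650, back_links=100)    # 10 years ago, 100 back links
spiky_doc = Doc(age_days=1, back_links=5000)    # suspiciously fast growth

print(growth_score(new_doc) > growth_score(old_doc))    # new document wins
print(growth_score(spiky_doc) < growth_score(new_doc))  # spike is penalised
```

Under these assumptions the day-old document with 10 back links outscores the 10-year-old document with 100, while the document with a sudden spike is scored lower, which matches the two outcomes the paragraph describes.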

Last edited by msgraph : 03-31-2005 at 12:02 PM.
Old 03-31-2005   #2
I, Brian
Whitehat on...Whitehat off...Whitehat on...Whitehat off...
 
Join Date: Jun 2004
Location: Scotland
Posts: 940
Bloody interesting read. Not only does it look like the Sandbox Uncovered, it's got Matt Cutts's name on it!!
Old 03-31-2005   #3
Phoenix
Member
 
Join Date: Jun 2004
Location: Austin, Texas
Posts: 97
Brilliant find. It does explain a few things, but then again not everything. I wonder why we haven't found this sooner. The last sentence of the passage is particularly interesting in how it explains the "boost" new sites normally receive after initially getting indexed. You do well for a while, and then you crash later. There seems to be a "lag" in the scoring for this.

Quote:
Accordingly, in this situation, search engine 125 may actually lower the score of a document(s) to reduce the effect of spamming.
Old 03-31-2005   #4
Phoenix
Member
 
Join Date: Jun 2004
Location: Austin, Texas
Posts: 97
Quote:
"Also, "stale" documents (i.e., those documents that have not been updated for a period of time and, thus, contain stale data) may be ranked higher than "fresher" documents (i.e., those documents that have been more recently updated and, thus, contain more recent data). In some particular contexts, the higher ranking stale documents degrade the search results."
Here is a line to support the "fresher sites do better" line of thinking. It appears that older and "staler" is not always better.
Old 03-31-2005   #5
bhartzer
Search Engine Optimization, Search Engine Marketing Expert
 
Join Date: Jun 2004
Location: Dallas, Texas
Posts: 534
Quote:
I wonder why we haven't found this sooner.
We couldn't find it sooner. Notice the date on the patent, March 31, 2005.

I'm particularly interested in this part:
Quote:
38. The method of claim 1, wherein the one or more types of history data includes domain-related information corresponding to domains associated with documents; and wherein the generating a score includes: analyzing domain-related information corresponding to a domain associated with the document over time, and scoring the document based, at least in part, on a result of the analyzing.

40. The method of claim 38, wherein the domain-related information is related to at least one of an expiration date of the domain, a domain name server record associated with the domain, and a name server associated with the domain.
It looks like I'll be renewing my domains for additional years now. I've heard that you can now register your domain with NetSol for the next 100 years.
__________________
Bill Hartzer is an internet marketing consultant in Dallas and has been practicing organic SEO since 1996.
Old 03-31-2005   #6
SEjack
Member
 
Join Date: Mar 2005
Posts: 13
But what does this all mean? It's a patent, and that doesn't mean that this is how the SE works. Correct me if I am wrong, but... this doesn't mean anything.
Old 03-31-2005   #7
dannysullivan
Editor, SearchEngineLand.com (Info, Great Columns & Daily Recap Of Search News!)
 
Join Date: May 2004
Location: Search Engine Land
Posts: 2,085
If they've taken the time to patent this, it strongly suggests they are using some of what's described. You're correct that it doesn't explain exactly how the search engine works completely, or that any or all of what's described is being used. But parts of what is described fit in with things some people have been scratching their heads over -- the new info seems to neatly explain some of what's been seen.
Old 03-31-2005   #8
SEjack
Member
 
Join Date: Mar 2005
Posts: 13
I see what you mean, Danny. Great points. I am so glad I joined this forum filled with great, knowledgeable members.
Old 03-31-2005   #9
Mike Grehan
Member
 
Join Date: Jun 2004
Posts: 116
For what it's worth, I wrote a long paper about network theory which covers some of this (Filthy Linking Rich).

I'm with my dear friend and supporter Dr. Edel Garcia. If we don't start paying attention to information retrieval science, which Edel is an expert in, and even listen to my own musings on network theory - we're screwed!

If we talk about anecdotal nonsense over and over again, just to point clients in the direction of some substance, and then fail miserably over and over again...

It's just as I said in New York about "do we need to know how a search engine really works."

And this supports the article Edel wrote about keyword density analysis yesterday (I can't find where it is in the forum - help moderator!).

This industry is growing - we better learn to grow with it.
Old 03-31-2005   #10
rustybrick
 
Join Date: Jun 2004
Location: New York, USA
Posts: 2,810
Quote:
38. The method of claim 1, wherein the one or more types of history data includes domain-related information corresponding to domains associated with documents; and wherein the generating a score includes: analyzing domain-related information corresponding to a domain associated with the document over time, and scoring the document based, at least in part, on a result of the analyzing.
I am happy this came out as well.

Matt Cutts told me at SES NYC, "we did not become a registrar to register domain names." He went on to explain exactly what is written above. Why didn't I say anything? Because we all knew it, but now it's on paper.
Old 03-31-2005   #11
dazzlindonna
Internet Entrepreneur
 
Join Date: Jan 2005
Location: Franklinton, LA, USA
Posts: 91
The user behavior parts of the document also fit right in with the purchase of Urchin.
Old 03-31-2005   #12
I, Brian
Whitehat on...Whitehat off...Whitehat on...Whitehat off...
 
Join Date: Jun 2004
Location: Scotland
Posts: 940
Check out 114-116 - toolbar data usage for ranking purposes.
Old 03-31-2005   #13
PhilC
Member
 
Join Date: Oct 2004
Location: UK
Posts: 1,657
Quote:
5. The method of claim 2, wherein the inception date corresponding to the document is based on at least one of a date when a search engine first discovers the document, a date when a search engine first discovers a link to the document, and a date when the document includes at least a predetermined number of pages.
Alright. What does that last bit mean? Up to then I'd assumed that a document *is* a page.
Old 03-31-2005   #14
Frank Kilkelly
Member
 
Join Date: Mar 2005
Location: Malmö, Sweden
Posts: 19
From the DETAILED DESCRIPTION section

Quote:
[0020] A "document," as the term is used herein, is to be broadly interpreted to include any machine-readable and machine-storable work product. A document may include an e-mail, a web site, a file, a combination of files, one or more files with embedded links to other files, a news group posting, a blog, a web advertisement, etc. In the context of the Internet, a common document is a web page. Web pages often include textual information and may include embedded information (such as meta information, images, hyperlinks, etc.) and/or embedded instructions (such as Javascript, etc.). A page may correspond to a document or a portion of a document. Therefore, the words "page" and "document" may be used interchangeably in some cases. In other cases, a page may refer to a portion of a document, such as a sub-document. It may also be possible for a page to correspond to more than a single document.
Old 03-31-2005   #15
Relevancy
Relevancy Brings Results
 
Join Date: Jan 2005
Location: CA
Posts: 225
So basically they are analyzing registrar information to see how many years you are committing to the domain, and because they want to see the interconnectivity of all your domains?

Then they are looking at fresh-index types of data, such as the staleness of a page and how many links point to it, plus whether those links stay pointing to it or not?

Last edited by Relevancy : 03-31-2005 at 10:26 PM.
Old 03-31-2005   #16
Phoenix
Member
 
Join Date: Jun 2004
Location: Austin, Texas
Posts: 97
Quote:
17. The method of claim 1, wherein the one or more types of history data includes information relating to search terms that increasingly appear in search queries over time; and wherein the generating a score includes: determining whether the document is associated with the search terms, and scoring the document based, at least in part, on whether the document is associated with the search terms.
The above might be of interest to Dr. Garcia. It validates the importance of considering linearization in optimization, which he mentioned, as well as the importance of looking at how tokenization and especially filtering apply to your documents. You can then use the information obtained from that study so that you don't end up on the wrong side of what is mentioned in the quote above: a document not associated with the search terms used.

The patent also mentions something I have studied more closely: how it tries to determine whether a domain has changed owners. It says "[0128] a significant change over time in the set of topics associated with a document may indicate that the document has changed owners and previous document indicators, such as score, anchor text, etc., are no longer reliable."

I know a few people that have avoided the "sandbox", so to speak, by "rehabbing" older domains in order to set up new content or pages on the domain. For the most part this works if done without extending the domain too much, as far as I have discovered. As it says, "Similarly, a spike in the number of topics could indicate spam...that the document has been taken over as a "doorway" document". I don't think this was something you would have considered before when rehabbing a domain, but it sure will be now. At least for me.
Old 04-01-2005   #17
whistleman
 
Posts: n/a
A little deeper look and what do we find?

Google is also acknowledging the following while describing how it gathers "User Maintained/Generated Data" in its patent filings:


Quote:
"According to an implementation consistent with the principles of the invention, user maintained or generated data may be used to generate (or alter) a score associated with a document. For example, search engine 125 may monitor data maintained or generated by a user, such as "bookmarks," "favorites," or other types of data that may provide some indication of documents favored by, or of interest to, the user. Search engine 125 may obtain this data either directly (e.g., via a browser assistant) or indirectly (e.g., via a browser). Search engine 125 may then analyze over time a number of bookmarks/favorites to which a document is associated to determine the importance of the document.

[0115] Search engine 125 may also analyze upward and downward trends to add or remove the document (or more specifically, a path to the document) from the bookmarks/favorites lists, the rate at which the document is added to or removed from the bookmarks/favorites lists, and/or whether the document is added to, deleted from, or accessed through the bookmarks/favorites lists. If a number of users are adding a particular document to their bookmarks/favorites lists or often accessing the document through such lists over time, this may be considered an indication that the document is relatively important. On the other hand, if a number of users are decreasingly accessing a document indicated in their bookmarks/favorites list or are increasingly deleting/replacing the path to such document from their lists, this may be taken as an indication that the document is outdated, unpopular, etc. Search engine 125 may then score the documents accordingly. "
I don't know about you, but reading this particular part definitely gave me chills. After posting this article I guess I will let the Google toolbar go.
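
For anyone wondering what "analyzing upward and downward trends" in bookmarks could look like in practice, here is a minimal sketch. It is purely an assumption for illustration: the function name `bookmark_trend` and the idea of sampling a per-period count of users who have a document bookmarked are hypothetical, and nothing in the patent says the trend is computed this way.

```python
from typing import Sequence


def bookmark_trend(counts: Sequence[int]) -> float:
    """Average per-period change in how many users have a document bookmarked.

    counts holds one (hypothetical) sample per time period, oldest first.
    A positive result suggests growing interest, a negative one suggests the
    document is being dropped from bookmarks/favorites lists.
    """
    if len(counts) < 2:
        return 0.0  # not enough history to see a trend
    return (counts[-1] - counts[0]) / (len(counts) - 1)


rising = bookmark_trend([10, 14, 18])    # users keep adding the document
falling = bookmark_trend([30, 20, 10])   # users keep removing it
print(rising, falling)
```

A score could then be nudged up or down according to the sign and size of the trend, which is all that paragraph [0115] really claims.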

Last edited by whistleman : 04-01-2005 at 01:00 AM. Reason: corrected URL
Old 04-01-2005   #18
Scoreboard
Member
 
Join Date: Dec 2004
Location: San Antonio, TX
Posts: 9
I wouldn't worry too much about anything in this patent. Pragmatically, it conflicts with its own aims, and if I were a betting man, I'd say Google floated it purely to lay claim to any concept-based arguments to be made by competitors going forward.

It really sounds like a bunch of engineers got high on the Tahoe ski trip and whiteboarded their thoughts.
Old 04-01-2005   #19
Relevancy
Relevancy Brings Results
 
Join Date: Jan 2005
Location: CA
Posts: 225
I agree, it does sound like a lot of clutter to mask a few key ideas. Most of it is "ifs" and "in some ways". The basic idea is that they seem to be analyzing registrar information, and then they go into how they value fresh vs. stale pages.

Last edited by Marcia : 04-01-2005 at 05:12 AM. Reason: Unnecessary URL removed.
Old 04-01-2005   #20
Mel
Just the facts ma'm
 
Join Date: Jun 2004
Location: Malaysia
Posts: 793
It gives me the chills too, Whistleman, but there are a couple of questions I have regarding this patent. The first is: why is it not in Google's name, or at least assigned to Google?

Secondly, the idea of rankings based at least partially on how often a page is accessed from bookmarks reminds me of Alexa: just too easy to manipulate.
__________________
Mel Nelson
Expert SEO Dont settle for average SEO
Singapore Search Engine Optimization and web design