Search Engine Watch
Old 01-31-2005   #1
rustybrick
 
Join Date: Jun 2004
Location: New York, USA
Posts: 2,810
Has Google Dropped Their 101K Cache Limit?

Via Research Buzz:

Quote:
Heretofore Google would only index the first 101K of a Web page, no matter how long it was. This was bad if you were searching for pages that tended to be really large (some resource roundup pages, LOTS of PDF documents) because you couldn't be sure that you were searching the entire document.


Now it appears that they're indexing more entire pages.
What are your thoughts?
Old 01-31-2005   #2
Nacho
 
Join Date: Jun 2004
Location: La Jolla, CA
Posts: 1,382
Quote:
Google would only index the first 101K of a Web page
Just to be clear for some forum readers, this was 101K of the HTML code and did not include images (source: WebmasterWorld). And on June 9, 2003, GoogleGuy said:

Quote:
Originally Posted by GoogleGuy
The "100 links" guidelines is just a good rule of thumb. Keeping pages below 100K is always a very good idea too. But it's not anything that would cause a penalty.
The best roundup I've seen has to be Danny Sullivan's blog post "Search Engine Size Wars V Erupts", where he describes the limit as follows:

Quote:
Originally Posted by dannysullivan
In the past, if a page were longer than 101K, only the first 101K worth of text was indexed by Google. Everything else was ignored. My assumption right now is that Google still operates this way. If not, we'll bring an update as more information is gained.
I guess the best way to prove this is by example, so here goes. When you do a search on one of my favorite articles, "131 (Legitimate) Link Building Strategies", you will notice that it says . . .
131 (Legitimate) Link Building Strategies
... com, and Animations.com. 131 (Legitimate) Link Building Strategies. By Chris Sherman, Associate Editor July 11, 2002. By Robin Nobles ...
searchenginewatch.com/searchday/article.php/2160301 - 101k - Jan 31, 2005 - Cached - Similar pages
. . . 101k right after the URL and before the last cached date. When you click on the Cached version, you can see that Google did in fact cache only about that much. To check, I viewed the source code, removed the Google cache header at the top, and saved it to my desktop as an HTML file. The result was a file size of 111 KB (113,857 bytes). So unless Windows XP is wrong or anyone has other tests to share, it looks like Google is being about 10% more generous than before.
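For anyone who wants to repeat the arithmetic on their own saved cache copies, here is a minimal sketch. It assumes 1 KB = 1024 bytes (the way Windows reports file sizes) and takes the old limit as 101 KB; whether Google's "101k" figure uses the same convention is not known. The function names are made up for illustration.

```python
def size_in_kb(num_bytes: int) -> float:
    """Convert a byte count to KB, assuming 1 KB = 1024 bytes."""
    return num_bytes / 1024


def percent_over(size_kb: float, limit_kb: float = 101.0) -> float:
    """How far size_kb exceeds limit_kb, as a percentage of the limit."""
    return (size_kb - limit_kb) / limit_kb * 100


# Byte count of the saved cache copy reported in the post above.
saved_bytes = 113_857
kb = size_in_kb(saved_bytes)
print(f"{kb:.1f} KB, {percent_over(kb):.1f}% over 101K")
# prints: 111.2 KB, 10.1% over 101K
```

Swap in the byte count of your own saved file to see how far past the old limit it lands.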
Old 02-01-2005   #3
I, Brian
Whitehat on...Whitehat off...Whitehat on...Whitehat off...
 
Join Date: Jun 2004
Location: Scotland
Posts: 940
I have a few large literature reference files online with text over 101k, still showing as 101k for the cache. Will be interesting if Google are caching larger sizes.
Old 02-01-2005   #4
Serp
Member
 
Join Date: Jan 2005
Posts: 9
hm

When I think about this subject...
Reason 1: the Google database, with 8,058,044,651 pages indexed, is too big. It takes too much memory, so they want to cut it down.

Reason 2: maybe links. On big pages where people exchange links, the link usually goes at the end of the page (that is where I always put mine). So now, if the page is big enough, those links will not pass.
Old 02-01-2005   #5
Lance Housley
Looking from the Searcher's Angle
 
Join Date: Jun 2004
Location: Canterbury, England, UK
Posts: 24
Quote:
Originally Posted by I, Brian
I have a few large literature reference files online with text over 101k, still showing as 101k for the cache. Will be interesting if Google are caching larger sizes.
Spot on!
However, checking back over my testing and training records for the past couple of years, I have copious notes to the effect that the 101k limit only ever applied to HTML documents, and that other file types (and certainly pdf files) were indexed way beyond 101k, and perhaps without limit.
I cannot see anything at the moment to make me rethink that assessment.
Old 02-01-2005   #6
rustybrick
 
Join Date: Jun 2004
Location: New York, USA
Posts: 2,810
Quote:
Originally Posted by Lance Housley
Spot on!
However, checking back over my testing and training records for the past couple of years, I have copious notes to the effect that the 101k limit only ever applied to HTML documents, and that other file types (and certainly pdf files) were indexed way beyond 101k, and perhaps without limit.
I cannot see anything at the moment to make me rethink that assessment.
Yes, I believe Yahoo! said that they index X amount of a PDF document, Y amount of Word docs, Z amount of HTML and so on.

So I would assume that Google and the other engines do something similar.
Old 04-10-2005   #7
Nacho
 
Join Date: Jun 2004
Location: La Jolla, CA
Posts: 1,382
Members from Webmaster World are reporting a change in Google's cache limit.

I tested the "131 legitimate link building strategies" webpage, and the cache limit is still 111K, as reported back on 1/31/05. But I wouldn't call it either way on the basis of just this one test. So if you have other cached pages larger than 100K or so, please let us know.
Old 04-10-2005   #8
jan
Member
 
Join Date: Jul 2004
Location: Netherlands
Posts: 40
For pdf files I have examples of just over 1 MB indexed.
Old 04-11-2005   #9
dannysullivan
Editor, SearchEngineLand.com (Info, Great Columns & Daily Recap Of Search News!)
 
Join Date: May 2004
Location: Search Engine Land
Posts: 2,085
PDF files have always been indexed in excess of 101K. That limit was only for HTML and text files.

We covered the upping back in this Feb. blog post: Google Upping 101K Page Index Limit? We found it wasn't always dependable. I did ask Google, and the response was the typical "we're always testing stuff." So expect that you may find exceptions, but that this won't always be dependable.