Level of Trust for Matt Cutts' Sandbox Explanation @ SES NYC
randfish
03-06-2005, 05:11 AM
I'm hoping to get some discussion on what I and several others "read between the lines" from what Matt Cutts (and Craig Manning) had to say about the "sandbox effect". My impression was that both suggested Google watches for sites that fall outside the norms of their sector for link building and optimization efforts, and will "hold back" those sites until they can be manually reviewed and the penalty lifted.
Matt mentioned looking specifically at sites like ChristopherReeve.org & Tsunami.Blogspot.com - do you think there's any validity to his statements? How much or little do you think this has to do with what we in the industry call "sandbox"?
I'd appreciate your input, as I don't have much experience judging how much or how little to trust these guys.
I, Brian
03-06-2005, 10:18 AM
I've heard it said for years that links too fast will "raise a flag".
However, I find it hard to imagine that a technology-focussed company like Google is going to assign issues of relevancy on this scale entirely to human review.
I'm curious: what information are you referring to on sandboxing?
Nacho
03-06-2005, 01:05 PM
Instead of reading between the lines, can anyone actually quote Matt and Craig on what was said?
PhilC
03-06-2005, 01:19 PM
I'd like to know that too.
Did you post your "reading between the lines" or what was actually said?
A quick thought on it:-
For several months, people were reporting that no new sites were added to the index after a certain date last year - the start of the sandbox. That doesn't equate with auto-identifying and holding back oddities, but, then, I don't know if your posted understanding is "between the lines" or not.
randfish
03-06-2005, 01:47 PM
OK, I'm going to do my best to quote both Matt & Craig from the different sessions, and not the reading between the lines, but the actual text.
Matt:
"If a new site is getting 1,000 links a day, it's going to look suspicious. Don't get me wrong, there are sites that deserve them like ChristopherReeve.org and Tsunami.Blogspot.com, but we look and see if those sites deserve their links."
Craig:
When someone asks about being able to filter out spam and spot "outliers" and the computational expensiveness of that process, Craig says "We have lots and lots of computers".
Matt (in a later session):
While talking about building too many links too fast, he says "Who here can afford to lose all their rankings for all their sites?" Only Eric Ward & Greg Boser raise their hands (pretty funny).
Greg (during same session):
(Talking about sites that build links too fast) I call it the "litter box", not the "sandbox", and it takes forever because these guys (looks at Matt) are gonna check out the site (Matt nods and smiles in a way that makes it seem to me like Greg & Matt have discussed this before).
I know this isn't a lot of evidence, but I'm trying to look for some nugget of truth regarding the phenomenon of sites that rank #1 at every engine but Google (where they're #550). This explanation makes a lot of sense, except for the fact that so many were "released" on Feb. 2. If they were getting manually reviewed, you'd think they'd be "released" as they were checked out, not all together in big clumps...
Nacho
03-06-2005, 01:57 PM
Okay then.
So now we can better read between the lines with a little more information. Let me just ask you this: are all websites that fall into this so-called sandbox getting thousands of links per day?
I don't think so.
As to the trust thing: yes, I do trust what Matt and Craig say, always. It's what they don't say that's always the mystery, which opens up the boundaries of possibilities. ;)
lots0
03-06-2005, 02:19 PM
Matt:
"If a new site is getting 1,000 links a day, it's going to look suspicious."
Yes, it would look suspicious, if there were any way for Google to tell you were getting thousands of links a day.
Googlebot can only find links on the pages it parses.
What if it takes several weeks for gbot to find and index the 10,000 pages that have the 10,000 links pointing to the page in question?
Greg (during same session):
(Talking about sites that build links too fast) I call it the "litter box", not the "sandbox", and it takes forever because these guys (looks at Matt) are gonna check out the site (Matt nods and smiles in a way that makes it seem to me like Greg & Matt have discussed this before).
Maybe the nod and smile is 'cuz they have pulled off a great misdirection (sleight of hand) move. ;)
Michael Martinez
03-06-2005, 02:56 PM
When I first set up my personal domain (to replace my personal homepage on Xenite), I only linked to it from my own sites. I did not pursue links from other sites. It only included a few content pages, but they were full of content.
The site remained in the sandbox for a very long time. I eventually updated the content, and noticed a number of other sites were now linking to it. About that time, the site started to appear in Google, but still doesn't rank very well for my name (I doubt it ever will, but we'll see).
So, that is an example of a site which did NOT get thousands of links (much less a thousand per day). But it is still an example of a site which did not generate a lot of unique inbound links.
Nacho
03-06-2005, 03:22 PM
Moderation Note: This thread is based on what Matt Cutts' (and Craig Manning) had to say about the "sandbox effect". It is NOT about providing examples of sandbox sites. Let's stick to this topic please.
Michael Martinez
03-06-2005, 03:33 PM
Moderation Note: This thread is based on what Matt Cutts' (and Craig Manning) had to say about the "sandbox effect". It is NOT about providing examples of sandbox sites. Let's stick to this topic please.
And the original post included the following question:
How much or little do you think this has to do with what we in the industry call "sandbox"?
How do you expect to get meaningful discussion if we don't attempt to relate what they said to what we are seeing in the real search results? It's not like I dropped a link in there.
I don't have enough information to decide whether I trust what they have reportedly said.
I, Brian
03-06-2005, 05:07 PM
I really don't think it's as simple as monitoring whether a site is getting too many links or not - my personal suspicion is that sandboxing is largely an automated process, relying on a number of factors.
Somehow the age of the domain registration seems to be involved, but the search frequency of the keywords involved also looks to me like it could be a factor. One could additionally argue that a reduced number of link variations may also play some role in sandboxing.
EDIT: I'd posted an example of research here showing differences between a couple of sites, but the mods seem very twitchy today so I've removed it.
Michael Martinez
03-06-2005, 06:14 PM
I agree. I think Google introduced some sort of aging factor (in fact, I have written extensively about this on another forum). Maybe they have since decided to dispense with the aging factor, perhaps because they found that too many innocent sites were being dumped into the sandbox.
If that is the case, then the reported statements from the conference won't shed much light on what Google has actually been doing, since they don't offer much information regarding the extent of Google's efforts, or the duration.
These comments are really being provided out of context. We would need to see a transcript for all the sessions concerned. And even then, questions might arise for us that were not asked.
PhilC
03-06-2005, 06:42 PM
Moderation Note: This thread is based on what Matt Cutts' (and Craig Manning) had to say about the "sandbox effect". It is NOT about providing examples of sandbox sites. Let's stick to this topic please.
I didn't see an example of a sandboxed site. I saw an example of a personal experience of the sandbox, which the topic of the thread merits, imo.
I don't find what those guys said matches the experiences that people have reported since the so-called sandbox came into effect. If what they said is correct, then it must surely be only a part of the truth about the sandbox.
About the reason why so many were released recently:- if they really are evaluating all the sandboxed sites by hand, which is very hard to believe because of the sheer quantity of sites, then they may have decided to clear the decks and start again due to a massive build up.
randfish
03-06-2005, 11:44 PM
Phil - That actually sounds fairly logical to me. Brian, maybe you could PM me your research. I'd be interested to see it if you can't share it on the board.
I was thinking about the quantity of sites each day that "stand out" from the norm and seem to be getting many more backlinks than usual. Certainly they are automatically "caught" by the indexing system, but the question is how feasible it would be for a human (or several) to actually look at the outliers each day. There's no question in my mind this would improve search relevancy; the only question is: is it possible?
Let me make up some figures: let's say each day there are 3000 sites that get "tagged" by the indexer as looking fishy. If a human being needs to look at each one, I'd estimate no more than 5 minutes per site, which would be 250 man-hours each day. At 8 hours per reviewer, that would require a team of at least 30 people doing nothing but reviewing sites all day long... seems possible but unlikely.
If the indexer were only catching 1000 sites each day, I could easily see a ten-man team set up to watch for this type of activity. Anyone else's opinion?
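Rand's back-of-the-envelope numbers can be checked with a few lines of Python. This is only a sketch: the 3000 and 1000 flagged sites per day and the 5 minutes per site are his made-up figures, the 8-hour reviewing day per person is an added assumption, and 5000 is included as the top of the 1000-5000 range guessed later in the thread.

```python
import math


def reviewers_needed(flagged_sites_per_day: int,
                     minutes_per_site: float = 5.0,
                     hours_per_reviewer: float = 8.0) -> tuple[float, int]:
    """Return (total review hours per day, full-time reviewers needed).

    Reviewer count is rounded up, since you can't hire a fraction of a person.
    """
    review_hours = flagged_sites_per_day * minutes_per_site / 60.0
    reviewers = math.ceil(review_hours / hours_per_reviewer)
    return review_hours, reviewers


for flagged in (1000, 3000, 5000):
    hours, people = reviewers_needed(flagged)
    print(f"{flagged} sites/day -> {hours:.0f} review hours -> {people} reviewers")
# 1000 sites/day -> 83 review hours -> 11 reviewers
# 3000 sites/day -> 250 review hours -> 32 reviewers
# 5000 sites/day -> 417 review hours -> 53 reviewers
```

Rounded up, the 3000-site case comes to 32 full-time reviewers and the 1000-site case to 11, in line with the "at least 30" and "ten-man team" estimates.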
rustybrick
03-07-2005, 09:45 AM
I had a five minute discussion with Matt on this topic by the speaker room.
I brought up a classic example of a company that specializes in building "insect rearing rooms." There is zero competition for this company, they wrote long, detailed information on how to build such a room. They are simply the authority on the Web for this topic. So he said, let me take a look at the site. And he will, I should hear back by mid this week. (I have never sent a specific example of a site to any of the engines before this one, and I probably never will again.)
But then I popped the question about the domain registrar topic. Why did Google become a registrar? Matt told me it is not to register domain names. We then moved to the topic of how it's important for Google to look at the "freshness" of a page and the age of a domain name. Bingo? I don't know. I did not get a direct answer from Matt about the sandbox. One thing is for sure: Google does not use that term inside the Googleplex.
I'll let you know what I can about the results from my example site, sent to Matt. I am not expecting much but we will see.
Of course, Matt gave examples of sites that were not affected by this. But so many are.
I was never convinced "sandbox" existed. It didn't seem plausible for the task of retrieving relevant results.
bragadocchio
03-07-2005, 10:27 AM
Matt mentioned looking specifically at sites like ChristopherReeve.org & Tsunami.Blogspot.com - do you think there's any validity to his statements? How much or little do you think this has to do with what we in the industry call "sandbox"?
I attended both of these sessions, too. (Rand and I shared some thoughts on what was said.) The representatives from the search engines seem pretty aware that we would be looking at, and dissecting their every word, and I think that they were pretty careful in the way they presented information and responded to questions.
Rand's quotes pretty much match what I wrote in my notes (I got a big kick out of the "lots and lots of computers" answer.)
It's tempting to try to read more into some of those statements than may have been there, and a session on linkbuilding seemed like a reasonable place for someone to ask a question about a sandbox effect.
But, going back over them in my head, and the contexts in which they were uttered, I think that they would have given the same answers and responses regardless of whether or not there was a sandbox.
I'm skeptical about the existence of a "sandbox" in the manner that many portray it, but I was willing to sit there and listen and see if I would hear some type of admission that something of the sort existed. I just don't feel that there was enough there to draw any conclusions from.
I, Brian
03-07-2005, 10:37 AM
I was never convinced "sandbox" existed. It didn't seem plausible for the task of retrieving relevant results.
I don't think it's meant for that task, as much as creating a barrier against short-term spam.
randfish
03-07-2005, 01:00 PM
Just to clarify what I personally mean when I say "sandbox", I mean a site that ranks top 5 at Yahoo!, MSN, Teoma, for the allin searches @ Google and yet appears at number 100+ in the Google SERPs. Certainly Google has their own unique algorithm, but typically an experienced professional (or an IR researcher like yourself, Xan) can look at the site in particular and the rankings and conclude that something is "funny".
Bill,
I have to agree that they probably would have said the same thing, sandbox or no, but just because we use a different term (Greg called it the "litter box") doesn't mean they don't know what we're talking about. It's certainly a relatively new phenomenon (about 1 year old now) and had never been seen prior to about March of 2004.
lots0
03-07-2005, 01:38 PM
I mean a site that ranks top 5 at Yahoo!, MSN, Teoma, for the allin searches @ Google and yet appears at number 100+ in the Google SERPs. Certainly Google has their own unique algorithm, but typically an experienced professional (or an IR researcher like yourself, Xan) can look at the site in particular and the rankings and conclude that something is "funny".
Nothing "funny" about it; it can easily be explained: the page in question did not rank in the Google algo. In other words, Google uses different criteria and/or puts different weights on the ranking factors than the other SEs.
Google is far more advanced in the way they look at anchor text and linking structure than any of the other engines and Google also has different filters in place than the other engines. Either one of these alone could well explain the ranking discrepancy.
I don't believe that there is or ever was such a thing as a "sandbox".
"Sandbox" = Someone trying to explain something they don't understand.
I have looked at a lot of so called "sandboxed" pages, in every case there was a completely rational and logical reason the page was not ranking, usually something rather obvious that was overlooked.
ThouShaltSeo
03-07-2005, 02:29 PM
Hmmmm...total number of new links/number of days since last calculation. Not exactly hard.
There's probably some truth in their statements; however, it is in their best interest to make us believe that they're much more sophisticated and thorough than they really are.
What if it takes several weeks for gbot to find and index the 10,000 pages that have the 10,000 links pointing to the page in question?
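The per-site arithmetic really is simple; the open question in this thread is what "outside the norms in their sector" would mean. Below is a minimal, hypothetical sketch: a site's velocity is new links divided by days since the last calculation, and a site is flagged when its velocity exceeds a `tolerance` multiple of its sector's median velocity. The median baseline, the `tolerance` knob and the example domains are all assumptions for illustration, not anything Google has described. Note that `new_links` can only count links the crawler has already found, so the crawl lag lots0 describes would spread a burst of links over several calculations.

```python
from statistics import median


def link_velocity(new_links: int, days_since_last_calc: int) -> float:
    """New inbound links per day since the previous calculation."""
    if days_since_last_calc <= 0:
        raise ValueError("days_since_last_calc must be positive")
    return new_links / days_since_last_calc


def flag_outliers(velocities: dict[str, float], tolerance: float = 10.0) -> list[str]:
    """Return sites whose velocity exceeds `tolerance` times the sector median.

    `tolerance` is a hypothetical knob: a larger value flags fewer sites.
    """
    if not velocities:
        return []
    norm = median(velocities.values())
    return sorted(site for site, v in velocities.items() if v > tolerance * norm)


# Hypothetical sector: three established sites and one new site, over 30 days.
sector = {
    "a.example": link_velocity(30, 30),       # 1.0 link/day
    "b.example": link_velocity(60, 30),       # 2.0
    "c.example": link_velocity(45, 30),       # 1.5
    "new.example": link_velocity(30000, 30),  # 1000.0
}
print(flag_outliers(sector))  # ['new.example']
```

Raising `tolerance` shrinks the list of flagged sites, which is the kind of adjustment that would control how many sites reach a manual review queue. The sketch itself is a single pass over a sector's sites; it says nothing either way about what Google actually computes.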
lots0
03-07-2005, 04:15 PM
Hmmmm...total number of new links/number of days since last calculation. Not exactly hard.
It would not be hard if they only had to do it for a few thousand pages. However, when you are talking about comparing BILLIONS of pages it becomes a task that borders on the highly improbable.
I know my opinion/position ruffles some feathers, some folks have tried to make a career off the so called "sandbox". But like I said I have yet to see a page that was "sandboxed" that did not have a reason (other than the sandbox) as to why the page was not ranking.
detlev
03-07-2005, 05:18 PM
Hello everyone,
I think Matt made it pretty clear that they try to allow sites such as ChristopherReeve.org to make it through the filters, but that such a site would have natural-looking link development. When a site gains links with the same link text and from some questionable types of sites, then it may come under the kind of scrutiny suggested by a sandbox. I am not a big believer in a "sandbox", which implies that a site is limited in ranking from the start just because it is new. That would ruin a search user's experience. New sites that should rank well straight away need to make it past any such filtering. Examples of such sites were given by Matt on the panel.
I don't pay much attention to "sandbox" theories, except to think that a publisher who complains must have done something that triggered generally poor rankings in Google. If the site rises in rankings over time, that doesn't mean it was in and then out of a Google sandbox; I attribute it to the general ebb and flow of rankings over time. New sites come under scrutiny, as would an old site that jumps in backlinks overnight. It is these statistical anomalies, usually caused by spamdexing, that can have the effect of poor rankings or removal from the index.
Best,
-detlev
I agree with you detlev.
Also, there are many other ways to deal with spam that are far cleverer than barricading everything and blocking entrance to the index until every site has been checked. Sounds like a nightclub on a Saturday night!
It is very difficult to gain an understanding of what SE representatives mean when they are talking about search issues in public.
GoogleGuy's posts in various forums are good examples: they are always helpful and informative on first reading, but they never reveal anything about the search engine's algo, and after you read them a few times you begin to see that there are other possibilities in addition to those that spring to mind on the first reading.
I suspect that when people like Matt and Greg go out to speak in public they are given a small plastic laminated card to refer to in case of doubt, which reads:
Thou shalt not reveal anything regarding the algo noway nohow.
And while we all hope to learn from these gurus who possess the ultimate knowledge we all seek, that's pretty much wishful thinking.
Nacho
03-07-2005, 09:14 PM
If Google were really doing something about how fast links are gathered, then a date would need to be tracked per link, and that would therefore infringe on the TLA patent filed by IBM.
Voasi
03-07-2005, 09:19 PM
Matt mentioned looking specifically at sites like ChristopherReeve.org & Tsunami.Blogspot.com - do you think there's any validity to his statements?
Looking at ChristopherReeve.org, that site has been around since '99. How does that fall into the Sandboxy theory, considering the theory was "started" about a year ago?
Regarding the other site, well, we all know about subdomains and the Sandbox. :)
randfish
03-07-2005, 09:50 PM
Don't know how many of you are aware of this, but it certainly suggests more manual reviews of SERPs - http://forums.seochat.com/t24243/s.html
Nacho
03-07-2005, 10:11 PM
Don't know how many of you are aware of this, but it certainly suggests more manual reviews of SERPs
Yes, I was aware and find it fascinating. However, this does not have anything to do with the sandbox. The sandbox can be an algorithmic process, filter or flaw that is not humanly manipulated on Google's behalf, at least not that I know of.
notredamekid
03-07-2005, 10:28 PM
I can't believe you guys are still arguing that the sandbox is just a BS explanation for other reasons why a site doesn't rank.
There was an exact date after which millions of new domains were no longer able to rank the way they could before. So whether or not there is a "sandbox", there was clearly a new algorithmic feature implemented at that time.
Even Brett Tabke finally admitted there's a sandbox - that is, after he started his own new site... which he said ranked well for a few days... and then got sandboxed.
As for what the Google reps said, that sounds like the biggest bull**** I've ever heard... Google, who prides itself in handling everything algorithmically, implementing a feature which would probably require thousands of man hours per day? Yeah right, hey Cutts, I have a bridge I'd like to sell you...
Think about it: if they actually told the SEO community what factor "tripped" the sandbox, well, then their sandbox wouldn't work anymore. And they obviously want it to work.
PhilC
03-07-2005, 10:30 PM
Millions? Can we see a list as evidence? ;)
Michael Martinez
03-07-2005, 10:39 PM
If Google were really doing something about how fast links are gathered, then a date would need to be tracked per link, and that would therefore infringe on the TLA patent filed by IBM.
Whether Google is infringing on or using (under license) an IBM patent is not anything I care to speculate on. However, they ARE reporting crawl dates and times for the pages in their cache. So, we know that they are recording temporal data already.
We also know that Google is requesting Last-Modified HTTP headers for pages from any servers that report such information. Ostensibly, this request is to speed up their crawling process (in fact, it should speed up their crawling considerably). Nonetheless, these behaviors draw upon research going back at least as far as 1997.
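For reference, the standard HTTP mechanism behind this is the conditional request: a crawler that remembers when it last fetched a page sends an `If-Modified-Since` header, and a server that supports it can answer `304 Not Modified` with no body if the page's `Last-Modified` date is not newer. A minimal sketch using only Python's standard library; the function names are hypothetical, and this illustrates generic HTTP behavior, not Googlebot's actual implementation. `changed_since` does the same comparison on the client side, given a `Last-Modified` value from a response.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_headers(last_crawl: datetime) -> dict[str, str]:
    """Build re-crawl headers asking the server to skip an unchanged page.

    `last_crawl` must be timezone-aware; HTTP dates are always sent in GMT.
    """
    return {"If-Modified-Since": format_datetime(last_crawl.astimezone(timezone.utc),
                                                 usegmt=True)}


def changed_since(last_modified_header: str, last_crawl: datetime) -> bool:
    """True if the server's Last-Modified date is newer than our last crawl.

    A GMT HTTP-date parses to an aware datetime, so the comparison is valid.
    """
    return parsedate_to_datetime(last_modified_header) > last_crawl


last_crawl = datetime(2005, 3, 6, 14, 56, tzinfo=timezone.utc)
print(conditional_headers(last_crawl))
# {'If-Modified-Since': 'Sun, 06 Mar 2005 14:56:00 GMT'}
print(changed_since("Mon, 07 Mar 2005 10:00:00 GMT", last_crawl))  # True
```

Either way, what gets recorded is a date per page, not per link.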
In case anyone is interested, a summary of references can be found here:
http://64.233.167.104/search?q=cache:JixwELlA4CEJ:einat.webir.org/JASIS_2004_Temporal_Links_Analysis.pdf+Temporal+Link+Analysis&hl=en
And while it seems that everyone is now going to pooh-pooh the idea of the sandbox (it's amazing how quickly the pendulum swings), I think it's important to reiterate that the concept has been a useful metaphor for an otherwise inexplicable phenomenon.
To suggest that people who speak about the sandbox don't understand what is happening is, in fact, a misunderstanding of the metaphor itself -- OBVIOUSLY no one understands what is happening. That is why the metaphor was proposed in the first place. No one (outside of Google) knew what to call the effect, or how to correctly identify what was going on.
notredamekid
03-07-2005, 10:49 PM
Millions? Can we see a list as evidence?
ha, how bout all the domains registered in the last 9 months? :D
PhilC
03-07-2005, 10:51 PM
Pure conjecture. We need real evidence :p
notredamekid
03-07-2005, 11:03 PM
real evidence
I'm being serious. Any site less than 9 months old.
Could you provide a single example of a site that new that isn't sandboxed?
PhilC
03-07-2005, 11:06 PM
I'm just being a bit mischievous. :rolleyes:
I don't create new sites, so I haven't had any experience of the sandbox effect one way or the other.
notredamekid
03-07-2005, 11:28 PM
discrediting two innocent, honest, upstanding people of impeccable reputation, based on some figment of the imagination pulled out of thin air.
I never implied they were bad or dishonest - but let's face it, they do have to keep the workings of their algorithm secret.
It would be absolutely impossible for them to do a human review on every new site. The notion is seriously ridiculous, unless there are 20,000 new hires at Mountain View.
a great misdirection (sleight of hand) move.
Sleight of hand is exactly how something like this works. If we knew exactly what caused the sandbox it wouldn't have achieved its purpose now would it?
I remember when everyone lauded GG as being more open than other SE representatives.
Then he was silent for nearly a year about the sandbox - even when there were freaking 100 page (1000 post) threads on the topic in forums at which he regularly posts. The silence speaks volumes. And so does some BS explanation.
I'm not speaking to his character. But let's be honest: Google has silently declared a war on light grey hat SEO.
Marcia
03-07-2005, 11:33 PM
they do have to keep the workings of their algorithm secret.
I wasn't quoting you, kid; it was someone else.
But just to set the record straight - when we're talking about "they" and "their algorithm", that's fine when referring to Matt Cutts in that context. But unless there is another Greg who is with Google being referred to, I assume it's Greg Boser being referred to - who is an SEO and marketing consultant - actually, he is in fact one of the finest Google SEOs who ever walked the face of the earth.
notredamekid
03-07-2005, 11:37 PM
I think we were talking about Craig Manning? I think 'Greg' got used by accident.
lots0, referencing the above
Tell us you're kidding - you must be joking, because it's not even possible that you believe that. I can't believe a slam could be publicly posted directed at and discrediting two innocent, honest, upstanding people of impeccable reputation like Matt and Greg, based on some figment of the imagination pulled out of thin air. You're smoking socks, dude. :cool:
LOL, I am not sure if that's a joke or not, Marcia, but on the off chance that it's not, are you of the opinion that either of these two ever reveals anything about the Google algo?
lots0
03-08-2005, 12:39 AM
Sorry folks, I misread; I thought the so-called "look" was between Matt Cutts and Craig Manning, not between Matt and Greg.
My bad... :o
randfish
03-08-2005, 12:41 AM
Now I'm confused. The session in which Greg (Boser - not from Google, but from WebGuerilla) said he called it a litter box was one in which Matt Cutts (from Google) was also on the panel. I never meant to insinuate that these two are in cahoots, simply that Greg looked over to Matt as if to say "You want to say something about this?" and Matt smiled as if to say "No. You go ahead."
My feeling was that Greg and Matt had discussed this before, probably with similar results (i.e., Matt keeping quiet). But that is pure speculation based on nothing but glances, smirks and facial tics - nothing to base anything solid on.
I like Greg a whole bunch. I actually got to hang out with him at the Yahoo! party and was even more impressed with his charisma and intelligence outside the panel. There were lots of young, pretty girls who came up to him and introduced themselves - I would have been quite flattered to be in his shoes.
Please do not take away the idea that I have anything bad to say about Matt or Greg (or Craig). I just have little experience knowing whether Google's public comments at events like this are to be taken at face value.
I believe that Matt & Craig & Greg's suggestions were not that ALL new sites were checked out, but only those that "stood out" as gaining links unnaturally fast for their particular sector (someone also mentioned unnaturally focused anchor text as being something that would stand out). So, if we are asking how many sites would have to be manually reviewed, my guess is between 1000 & 5000 sites each day, depending on how strict the algo that finds "unnatural" elements is (I'd also venture to guess that Google could easily adjust the tolerance to get only those sites that stuck out particularly badly).
In any case, I'm just hoping that we can discuss whether we should believe these guys.
BTW, Marcia, from the little I know of Greg, he seems like someone who might be more wounded by the statement of being an "upstanding" member of the SEO community than anything else :) He's a great guy - one of my favorites for sure.
notredamekid
03-08-2005, 12:55 AM
my bad, I guess there's a Craig AND a Greg.
I like Greg a whole bunch. I actually got to hang out with him at the Yahoo! party and was even more impressed with his charisma and intelligence outside the panel. There were lots of young, pretty girls who came up to him and introduced themselves
Wait... isn't this guy an SEO??? ;)
Nacho
03-08-2005, 03:50 AM
Whether Google is infringing on or using (under license) an IBM patent is not anything I care to speculate on. However, they ARE reporting crawl dates and times for the pages in their cache. So, we know that they are recording temporal data already.
We also know that Google is requesting Last-Modified HTTP headers for pages from any servers that report such information. Ostensibly, this request is to speed up their crawling process (in fact, it should speed up their crawling considerably). Nonetheless, these behaviors draw upon research going back at least as far as 1997.
A better explanation of the TLA model (http://domino.research.ibm.com/comm/wwwr_seminar.nsf/pages/sem_abstract_238.html) was discussed here (http://forums.searchenginewatch.com/showthread.php?t=2317) in SEW. As it was explained, links are recorded with dates and times. Links are completely different from a page's cache or the Last-Modified HTTP headers. And also, to be clear, Google presents the cache retrieved date, but a page can be crawled multiple times before it gets cached.
Therefore, if Google were using TLA (recording dates on links found, not pages) for sandboxing purposes, it would be at risk of a potential lawsuit over IBM's patent filing. IBM could not license such a patent because it is still pending issuance by the US government.
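The distinction drawn here is between dates attached to pages (crawl or cache dates) and dates attached to individual links. Below is a minimal sketch of the per-link bookkeeping, assuming a hypothetical in-memory store keyed by (source page, target site): it keeps the earliest date each link was observed and counts links first seen within a date window. It is not IBM's TLA method or anything Google has confirmed; also note that "first seen" is a discovery date, so it inherits whatever crawl lag exists.

```python
from datetime import date


class LinkFirstSeen:
    """Hypothetical store of the date each (source, target) link was first observed."""

    def __init__(self) -> None:
        self._first_seen: dict[tuple[str, str], date] = {}

    def observe(self, source: str, target: str, crawl_date: date) -> None:
        # Keep only the earliest sighting; re-crawls never push the date later.
        key = (source, target)
        if key not in self._first_seen or crawl_date < self._first_seen[key]:
            self._first_seen[key] = crawl_date

    def new_links(self, target: str, start: date, end: date) -> int:
        """Count links to `target` first seen between `start` and `end` inclusive."""
        return sum(1 for (_, t), seen in self._first_seen.items()
                   if t == target and start <= seen <= end)


store = LinkFirstSeen()
store.observe("a.example/page1", "site.example", date(2005, 3, 1))
store.observe("a.example/page1", "site.example", date(2005, 3, 5))  # re-crawl, date kept
store.observe("b.example/page2", "site.example", date(2005, 3, 4))
print(store.new_links("site.example", date(2005, 3, 2), date(2005, 3, 6)))  # 1
```

Dividing `new_links` for a window by the window's length in days gives the same links-per-day figure discussed earlier in the thread, but computed from link dates rather than page dates.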
With regard to the question of trust, I think the nod was more a knowing "here we go again" type of look, but that's just what I thought. You could tell in some of the other sessions that Matt was not too comfortable with Greg's jokes on spamming, especially those in reference to link spam. I cannot remember if it was this session or another one with Matt and Greg in which it was implied that the sandbox was more of a penalty box, where you got different levels of penalties depending on your crime, and those who break all the guidelines ("grand slam spammers", I think they were called) go into this box and, as Matt implied, never come out.
I ditto Rand: Greg seemed like a great guy and didn't seem to hold back anything at any of the sessions. That was especially funny in the Advanced Link Building Forum, where I think Debra or Greg joked during the Q&A session, "For the real answers, ask us again in the Link Building Clinic when Matt's not here!", and then they opened the Link Building Clinic by saying "Forget everything we just said." Very good, very interesting.
dannysullivan
03-08-2005, 08:17 AM
So a couple of things as I do some catch-up reading.
1) I didn't attend either of these sessions, off moderating other ones but...
2) It would be weird for both Matt and Craig from Google to be on the same panel. Either one has spoken on our sessions before, but having both on the same panel doesn't make a lot of sense. They'd just duplicate each other. I'm assuming that when Matt and Craig are said to be on the same panel, this was actually Matt and Greg Boser -- who isn't from Google but who is often on the same panels that Matt's on.
3) I've talked with Matt many times on the sandbox issue, the idea that a new site simply won't rank well on Google. He completely disagrees with it. He feels that people will simply concoct whatever solution they seem to feel fits a problem, even if that's not really the case.
4) Others disagree with Matt -- believing there is some type of sandboxing and that while Google may be revealing on some fronts, they won't be on others.
5) The Greg-Matt interaction some have seen is pretty common for any experienced SEO who is on a panel with Matt. Matt knows and likes many of the SEOs he deals with, even if they are black hat or if he disagrees with the views they hold. In the sandbox case, Greg looking over at Matt to see if he wanted to say something is, in my view, simply Greg kind of joking, knowing that Matt doesn't subscribe to the idea and really doesn't have much to say other than the standard denial. And Matt smiling back at Greg as a sort of nod of "No, you go ahead" is exactly that: go ahead and explain the speculation that Google itself doesn't publicly agree with.
6) I think it would actually be better for Google and the other search engines at times to simply state what the speculation is and comment on it directly. Matt knows the various sandbox theories out there. To have Google summarize them even briefly then provide its view, even if it's a "we don't do that," is useful. To be fair, I have seen Google do this type of thing in the past, but it sounds like a hand-off to Greg was seen as easier in this case. And to also be fair, there are a billion different theories about everything out there. You can't comment on them all.
7) It is interesting to hear some of the details that if links suddenly ramp up, that might trigger a review and a possible hold with Google. I actually don't think that's relatively new. Anything that's out of the norm has long been something that might cause a search engine to take a closer look at a site. But perhaps Google is being more forthcoming that automated processes are better as spotting artificial link structures and penalizing for them. Sandbox, litterbox or whatever, it could have some of the impact some have seen.
8) There's also long been discussed the idea of a more manual penalty box, which Google has admitted to before. The idea here is that if your site was identified as spamming and tossed out after a manual review, you might have to "sit out" your penality in the penalty box for a set period of time. That's not the same thing of sandboxing. With sandboxing, the idea is that all sites automatically get "held back" unless something prompts their "release" such as good links or some manual override or other speculation. A penalty box is something applied to only certain sites after a manual review.
Sandbox - IN or OUT? (http://forums.searchenginewatch.com/showthread.php?t=1917): A 101-style thread of resources on the topic from across the web.
Compilation of Anti-Sandbox Tactics (http://forums.searchenginewatch.com/showthread.php?t=3984): A member-contributed list of things that seemed to override a sandbox effect.
Sandbox - meaning? (http://forums.searchenginewatch.com/showthread.php?t=3062): A short guide to how various people define it.
SEO World Obsessed with Sandbox (http://forums.searchenginewatch.com/showthread.php?t=3782): Has more observations on the effect, how someone found a "subdomain" route out of it, and a few other examples.
Google Not SandBoxing Anymore (http://forums.searchenginewatch.com/showthread.php?t=4360): On whether the sandbox is over.
And you may try our SEW Forum Search (http://forums.searchenginewatch.com/search.php?) with the word "sandbox" for many more forum threads.
randfish
03-08-2005, 01:02 PM
I don't know if anyone else was there when Greg (Boser from WebGuerrilla) was asked about the sandbox and he, along with Eric Ward and someone else, said "I don't believe in sandbox". At least three of the panelists said this quickly, then Greg said - "when we want to dodge it we just create the site on an existing subdomain, then 301 it to the new domain after it's ranking."
Danny, you're absolutely right about Craig and Matt never being on the same panel. I hope I didn't confuse anyone. Greg (Boser) and Matt were on 2 panels together and Craig was on one panel where the issue was brought up.
I suppose the classic definition of the 'sandbox' - that all new sites are held back - is not how I define it. I wrote a while back that the sandbox is:
"The penalty or devaluation in the Google SERPs of sites with SEO efforts begun after March of 2004."
We know when the filter started to become visible to the SEO community and we know it affects sites we work on more than sites that are created outside the SEO community. That alone can give us some insight into some of the factors involved.
siteseo
03-08-2005, 01:17 PM
"If a new site is getting 1,000 links a day, it's going to look suspicious...we look and see if those sites deserve their links."
When Matt said "we LOOK and see" I wonder if that is to be taken literally (manual review) or figuratively (processed via algorithm). Seems to me that ALL sites are dumped into the sandbox and may only get out when certain criteria are met...a "guilty until proven innocent" model.
This also flies in the face of Google's statement that "there is almost nothing that someone else can do to negatively impact your rankings." Hah. If I go out and purchase run-of-site links for my competitor's new site, by Matt's definition they'd be penalized. I think it's unethical to hold back (penalize) a site because of inbound links, as that is an external factor. They can certainly be DIMINISHED in their impact on rankings - that would be logical. But links shouldn't trigger a penalty. I don't buy this explanation anyway, because even WITHOUT obtaining hordes of links a new site will STILL not rank well in Google.
PhilC
03-08-2005, 01:37 PM
I have no personal experience of the sandbox effect, because I don't create new websites. But I've read enough to believe that all new sites are affected, as people say, and I've never seen anyone show the URL of a new site that didn't get sandboxed.
It has long made sense that search engines would want to do something about the large number of new sites that are created solely for search engine promotional purposes. The engines would also want to do something about links that are organised solely for search engine promotional purposes. Those kinds of things are identifiable by human eyes, and some of their characteristics should be identifiable programmatically - e.g. the growth rate of new IBLs.
So would it make sense if the sandbox is simply a quarantine, into which new sites are placed, and programmatically monitored over a period of time to check for any of the identifiable characteristics? When a period of time has elapsed, without any of the characteristics showing up, then the site is released. That could account for the different sandboxed periods that people report.
It's very similar to what siteseo just said:-
Seems to me that ALL sites are dumped into the sandbox and may only get out when certain criteria are met
The criteria being that a certain period of time has elapsed without any of the characteristics being seen (by the program).
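PhilC's quarantine idea can be made concrete as a small model. This is purely a hypothetical sketch, not anything Google has described: the 1,000-links-a-day threshold is borrowed from Matt's quoted example, the 90-day holding period is an assumed stand-in for the "3 months" people have reported, and the names `Site`, `release_date` and `is_released` are made up for illustration. The clock starts when a site is first seen and restarts after any day that exceeds the threshold, which is one way different sites could end up with different sandbox periods.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date, timedelta

# Hypothetical parameters, not known Google values.
QUARANTINE_DAYS = 90          # assumed holding period (roughly 3 months)
MAX_NEW_LINKS_PER_DAY = 1000  # Matt's example of a link rate that "looks suspicious"


@dataclass
class Site:
    domain: str
    first_seen: date
    # New inbound links discovered on each day, keyed by date.
    daily_new_links: dict[date, int] = field(default_factory=dict)


def release_date(site: Site) -> date:
    """Date the site leaves quarantine: QUARANTINE_DAYS after first sighting,
    or after the most recent suspicious day if that is later."""
    suspicious_days = [
        day for day, count in site.daily_new_links.items()
        if count > MAX_NEW_LINKS_PER_DAY
    ]
    clock_start = max([site.first_seen, *suspicious_days])
    return clock_start + timedelta(days=QUARANTINE_DAYS)


def is_released(site: Site, today: date) -> bool:
    """A site is out of quarantine once its release date has arrived."""
    return today >= release_date(site)


quiet = Site("quiet.example", date(2005, 1, 1), {date(2005, 1, 10): 20})
spiky = Site("spiky.example", date(2005, 1, 1), {date(2005, 2, 1): 1500})
print(release_date(quiet))  # 2005-04-01
print(release_date(spiky))  # 2005-05-02
```

In this toy model a burst of links doesn't get a site banned; it just pushes the release date back, which fits the "quicksand" description some posters give.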
Robert_Charlton
03-08-2005, 03:16 PM
I don't know if anyone else was there when Greg (Boser from WebGuerrilla) was asked about the sandbox and he, along with Eric Ward and someone else, said "I don't believe in sandbox". At least three of the panelists said this quickly, then Greg said - "when we want to dodge it we just create the site on an existing subdomain, then 301 it to the new domain after it's ranking."
randfish - This may have been your point, but I don't understand why anybody would take the trouble to "dodge" something they don't believe in.
The above exchange is further confusing for me because 301ing an existing domain to a new domain was, in my experience, one of the sure ways to put a site into the sandbox. From reports I've read, I was certainly not alone.
I say "was," not knowing whether this still happens (or perhaps I should say "still doesn't happen") with new domains, since all of my existing sandboxed sites were sprung from this figment of my imagination, by chance all at about the same time as Microsoft rolled out its new search engine. ;)
Were any comments made about whether this non-existent phenomenon is still being sighted?
It had been my guess, by the way, that the technique Greg is said to have described is one of the things that would have prompted the sandbox in the first place if the sandbox had been/is about fighting network spam.
Michael Martinez
03-08-2005, 03:35 PM
A better explanation of the TLA model (http://domino.research.ibm.com/comm/wwwr_seminar.nsf/pages/sem_abstract_238.html) was discussed here (http://forums.searchenginewatch.com/showthread.php?t=2317) in SEW. As it was explained, links are recorded with dates and times. Links are completely different from a page's cache or the last-modified http headers.
Links come from pages, and their times can only realistically be derived either from the crawl or the page.
...And also to be clear, Google presents the cache retrieved date, but a page can be crawled multiple times before it gets cached.
The point was to show that Google does indeed record temporal data, not to argue that they are or are not doing anything with it.
So, there is no basis for arbitrarily ruling out any temporal-based algorithm by Google, regardless of the status of the IBM patent.
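To illustrate the point that link dates can only realistically come from the crawl, here is a hypothetical sketch of dating each link by the first crawl that saw it and counting new inbound links per day for one target domain. The input format (source page, target domain, crawl date) and the function name `new_links_per_day` are assumptions for illustration; nothing here describes how Google actually stores link data.

```python
from collections import Counter
from datetime import date


def new_links_per_day(observations, target):
    """Count new inbound links to `target` per day.

    observations: iterable of (source_page, target_domain, crawl_date) tuples.
    Each link is dated by the earliest crawl that saw it, so re-crawling the
    same page later does not make the link look new again.
    """
    first_seen = {}
    for source, linked_domain, crawled in observations:
        if linked_domain != target:
            continue
        if source not in first_seen or crawled < first_seen[source]:
            first_seen[source] = crawled
    return Counter(first_seen.values())


obs = [
    ("http://a.example/1", "new.example", date(2005, 3, 1)),
    ("http://a.example/1", "new.example", date(2005, 3, 5)),  # re-crawl, same link
    ("http://b.example/x", "new.example", date(2005, 3, 1)),
    ("http://c.example/y", "other.example", date(2005, 3, 1)),
]
counts = new_links_per_day(obs, "new.example")
print(counts[date(2005, 3, 1)])  # 2
```

Note that the re-crawl on March 5 adds nothing: the link keeps its first-seen date, which is the kind of temporal record an algorithm could use without relying on cache dates at all.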
Marcia
03-08-2005, 09:02 PM
I've heard it said for years that links too fast will "raise a flag".
However, I find it hard to imagine that a technology-focussed company like Google is going to assign issues of relevancy on this scale entirely to human review.
I've also heard that certain things can "raise flags" - but why do we have to assume that when flags are raised each and every one needs human review? They are a search company and they have certain tools internally. Isn't it possible that they've got technical solutions in place for sites that raise flags?
Michael Martinez
03-09-2005, 12:06 AM
I've also heard that certain things can "raise flags" - but why do we have to assume that when flags are raised each and every one needs human review? They are a search company and they have certain tools internally. Isn't it possible that they've got technical solutions in place for sites that raise flags?
Here is some speculation.
Suppose that Google implemented some sort of redirectional filter last year intended only to take a small percentage of sites out of the stream of new content. Let's suppose this filter worked pretty much as expected and therefore could not explain ALL of the sandbox effects people have described. Only a minority of sandbox sites would be caught by this filter.
Still, they could be dealing with thousands of sites that need to be validated. So, do you rely upon random sampling or do you check each and every site manually (at least up to a certain point) to confirm or adjust your algorithm?
When I was writing massive data conversion projects, I often did the same thing. It was tedious work but necessary. One just never knew how many variations on "Smith Avenue" bored typists could come up with.
Web sites are much more complicated than the average chunk of business data.
randfish - This may have been your point, but I don't understand why anybody would take the trouble to "dodge" something they don't believe in.
The above exchange is further confusing for me because 301ing an existing domain to a new domain was, in my experience, one of the sure ways to put a site into the sandbox. From reports I've read, I was certainly not alone. ...
Not exactly the same as 301ing from one domain to another, but I have recently (January) worked on a small (25 pages) site which previously existed as a subdirectory of another domain, and which was moved to its own domain on a dedicated IP address as part of an SEO project.
Once the site SEO work was complete the site went live on the new IP address with its own domain name by way of a 301 on the old server, and within two days the site was spidered and ranking well in Google, and after another week on Yahoo and MSN.
This site survived the early Feb update with some ranking drops in Google but still has 15 first page rankings in Google on moderately competitive commercial terms.
Based on this, I do not feel that 301ing an existing site from a subdirectory to its own address is going to result in a sandbox. I must admit that this site does not rank well for its domain name, but since that name is a dashed combination of its most competitive keyword, that might just be the competitiveness of the keyword coming into play, not the sandbox.
Robert_Charlton
03-09-2005, 06:27 AM
Not exactly the same as 301ing from one domain to another but, I have recently (January) worked on a small (25 pages) site which previously existed as a subdirectory of another domain, and which was moved to its own domain on a dedicated IP address as part of an SEO project.
Mel - This may have something to do with the timing, or I could be assigning completely wrong reasons to why an industry leader with several hundred inbounds disappeared (in Google only) after it was redirected to a new domain. It came out of the sandbox in early to mid-Feb, after having been buried for seven months.
I don't know whether a redirect now would create a problem at all. I'd be happy to get some sort of handle on current user experience, but I don't want to take this thread off-topic into that question.
I was wondering out loud about Greg's reported comments and what he may have meant by them. Eg, was he being ironic in saying that he didn't believe in the sandbox and then saying, "when we want to dodge it..."? Ditto with the workaround.
dannysullivan
03-09-2005, 07:31 AM
Hah. If I go out and purchase run-of-site links for my competitor's new site, by Matt's definition they'd be penalized.
Potentially, and it is an issue that can cause concern. But I suspect you'd see other factors also come into play. If the only influx of new links to your site were from run-of-site purchases, that would be unusual. A more "normal" site probably has a number of "natural" links along with the run-of-site links. Google might detect this either through a manual review or through some automated fashion.
Again, I completely understand your worry. And it could be that Google is indeed mistakenly catching sites that are being targeted by competitors in this way. But I suspect they also have some normalizing features in place to help them better understand when a site or page might tip into discount/penalty mode.
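One hypothetical form such a normalizing check could take: measure how much of a site's new inbound links come from a single linking host, and only treat the influx as suspicious when one host dominates, as a run-of-site purchase would. The function names and the 0.8 threshold are invented for illustration, and the sketch assumes full URLs with a scheme; it is not a description of anything Google has confirmed.

```python
from collections import Counter
from urllib.parse import urlsplit


def dominant_host_share(linking_urls):
    """Fraction of new inbound links that come from the single most common host."""
    hosts = Counter(urlsplit(url).hostname for url in linking_urls)
    if not hosts:
        return 0.0
    return max(hosts.values()) / sum(hosts.values())


def looks_like_run_of_site(linking_urls, threshold=0.8):
    """Hypothetical rule: flag only when one host supplies most of the new links,
    so a site with plenty of "natural" links from varied sources is not flagged."""
    return dominant_host_share(linking_urls) >= threshold


# Usage: nine links from one seller's pages plus one editorial link.
bought = [f"http://seller.example/page{i}" for i in range(9)] + ["http://blog.example/post"]
print(dominant_host_share(bought))     # 0.9
print(looks_like_run_of_site(bought))  # True
```

A rule like this would still catch a competitor buying run-of-site links for a brand-new site with few natural links, which is exactly the worry raised above; it only helps once a site has a more "normal" mix of links.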
I, Brian
03-09-2005, 08:36 AM
I know my opinion/position ruffles some feathers, some folks have tried to make a career off the so-called "sandbox". But like I said I have yet to see a page that was "sandboxed" that did not have a reason (other than the sandbox) as to why the page was not ranking.
Lots0, register an entirely brand new domain name, then set up the best quality content pages you can build, for any major commercial search terms.
Then set up a few hundred thousand links across a few thousand IP ranges, based on keyword anchor text describing those pages.
Within a couple of months, you're pretty assured a strong position on Yahoo! and MSN for those keywords. A year later, you would likely be asking why you still don't rank on Google for any of them.
A pint at any SES next year says you can prove otherwise. :)
Overall, though, I can only see Google implementing an automated control, with either a direct WHOIS lookup or an algorithm based on page caching dates used as part of that process we call the Google Sandbox.
Whichever methods and parameters are actually used, the practical reality is that relatively newer domains are simply harder to get rankings for than relatively older domains.
I can see the benefits of Google using this to try to tackle spam issues - but it still seems like punishing the majority for the infractions of the minority, like a teacher putting the entire class in detention because of a few misbehaving boys at the back.
Personally, I always hated that - when you feel you have legitimate sites affected in this manner, it simply feels so unfair. But the world isn't fair, and business is business - so it can simply be a case of expanding your methods to achieve your goals. I never touched anything really blackhat until I hit the sandbox for the first time last year. I didn't continue those methods, but there are some ugly stains on my hat now from it.
As long as the Google Sandbox remains mysterious and webmasters have no real information to work with, there's little we can do to try and work with the system - so no wonder Google are sitting quiet on it.
But I'm still not convinced that the sandbox approach is necessarily the best way to tackle spam issues. Not when that essentially condemns links simply for being links when it comes to newer domains. Surely issues of relevancy and quality are core issues in search?
Xan has posted some interesting perceptions (http://spaces.msn.com/members/search-science/Blog/cns%211pRwySEtDnGfJ2qQMroazHIg%21232.entry) on SEO, and helps show why Information Retrieval dislikes SEO - it sees us as diminishing their quality of results. Yet the counter argument is surely that at its heart SEO is simply about improving accessibility (http://www.platinax.co.uk/blogs/brian/archives/2005/03/ir_researchers.html), as I've written about in reply.
The number of SEOs here who might try to rank off-topic content - such as porn sites for the keyword "Disney" - is surely diminutive. The buzzword is conversion, and if the traffic doesn't convert properly, it can be an effective waste of money to develop. So SEO nowadays is surely focussed on relevancy - and that surely means there can be a broad area of overlap between the interests of SEs and SEOs.
The sandbox seems a slap against that - an indication that Google will try to protect its index against influence as much as possible, even if that means a loss of relevancy - yesterday's search today, today's search in the sandbox. Old and often outdated content is regarded as more important than today's updated content.
Whatever the sandbox is, it also seems to be changing. Last year, when I first encountered it, a straight 3 months was about all that was required to leave it. Now it seems that factors such as the number of links acquired play a bigger role, and that it has become less a Google Sandbox than a Google Quicksand, where newer domains are probably best approaching link development in small doses - because with too many links too quickly it looks like you can get stuck, and if you try to struggle out by throwing more links in, you could just add to the problem rather than escape it.
Either way, the core question was how much reliance should be placed on suggestions that Google holds sites back and involves a process of manual review. Certainly Google is holding sites back under a set of criteria, and this process is almost certainly mostly automated. If it were simply a case of manually reviewing them, a lot more sites should be out of it sooner.
And if anyone here still disbelieves in the sandbox, then you're welcome to follow the advice I gave to Lots0. :)
2c rant & ramble.