2005 Year End Revisit: Is There A Google Sandbox?


Mike Grehan
12-21-2005, 04:38 PM
Moderator Note: Thread split from Getting Out Of Google Sandbox Using Subdomain & Redirection (http://forums.searchenginewatch.com/showthread.php?t=9137)

Mike Grehan, said '...the sandbox doesn't exist...'. ... Nope that was me ... we were on the same panel ...

Tut, tut, tut...

Yes I did say the sandbox doesn't exist Dave.

And I've said it at every session at every conference (dozens of them this year), since whichever "I don't understand marketing" nitwit came up with this half-baked theory about nothing, which never affects a well marketed business.

In fact, insofar as the Chicago panel is concerned, I actually said it here, before the session:

http://www.mikegrehan.com/2005/12/sandbox-about-as-real-as-pagerank-is.html

I said it here in my ClickZ column, after the conference:

http://www.clickz.com/experts/search/results/article.php/3569996

And I'll say it again... If your web site doesn't rank anywhere at all at a search engine - it's probably because it has no differentiating appeal or simply because it sucks.

What you actually need is called advertising and promotion and it has nothing to do with code of any kind, or subdomains or servers, or...

Here's a tip I'd like to pass on for Christmas. Don't read any more of this thread or any others about this ridiculous notion of a sandbox.

Buy a good book on marketing instead.

Happy reading, happy holidays and all the best for the new year.

mcanerin
12-21-2005, 05:39 PM
Mike, I'd like to respond to that with a conditional "I agree".

There is no Google sandbox, at least as it is thought of by many people.

Well, there is, but people play beach volleyball on it over at the Googleplex. I've been there, and I'm willing to sell you, dear reader, a vial of genuine Google Sandbox sand for only $19.95 + shipping. Yes! Own a piece of the sandbox now, just in time for Christmas! Buy N.. Oops, sorry. I've been writing too much sales content lately... :o

Anyway, there is an effect that some people are calling the sandbox, but frankly it reminds me of those so-called "syndromes" that are just excuses for medical papers and herbal marketing. Seems like nowadays there is a special "syndrome" for every war and police action that comes around, for example. At a certain point, over-labeling stops people from focusing on the real issues, which is the root cause in the first place.

Yes, there is an issue that shows up. But it's not some sort of attack on new webmasters. It's just that new websites are most vulnerable to it. Big deal. New websites are also vulnerable to "link dearth syndrome" (not enough links), "content deprecation syndrome" (not enough content) and so forth...

I'm not trying to sound insensitive, but the fact that the hurricane in New Orleans harmed certain segments of US society more than others does not mean that God or mother nature is out to get those people specifically or punish them for some misdeed. They just happened to be more vulnerable to its effects and were living in the wrong area. The blame, if any, lies elsewhere (probably as to why they were vulnerable in the first place). Blaming it on some personal attack by the weather gods won't do any good, and distracts from the real issues and solutions.

Likewise, the effects you see that can be described as the Google sandbox apply to all sites, but some are much more vulnerable than others due to a lack of historical data, among other things.

I was sloppy in using the term earlier because I knew everyone in this thread knew what I was referring to. But anyone thinking of this as an attack by Google on new sites will not be in a mental position to figure out how to deal with the effects.

If it was an attack, how come there are so many exceptions? How come you can be placed into a "sandbox" even if you have an old site if you mess around with redirects the wrong way?

Nope, that's backward thinking. There are real issues, but most people who are feeling victimized are blaming the wrong thing for the wrong reason, IMO.

Additionally, as with all the other major buzz phrase syndromes and filters in the industry, some people with, let's face it, crappy sites and a tendency to refuse to take personal responsibility for things, have latched onto this as the root cause of all their problems. If you can blame the sandbox, well, I guess it's better than blaming yourself! :rolleyes:

On a completely different topic, I'd like to announce that I've discovered "link prophylactic syndrome", whereby some links (especially paid ones) don't seem to work the way some people expect them to.

This is obviously a vicious attack on life, liberty and the pursuit of happiness, and is intended to punish new website owners and innocent spammers by taking away their livelihood, and just before x-mas yet! Together we can find a cure. Donations can be made to.. Oops, there I go again....

Ian ;)

dannysullivan
12-21-2005, 06:01 PM
I'll try to strike the middle ground between Mike and Dave (who are both sandbox non-believers, though Dave does see a sandbox effect) and those still hanging on to the idea of a sandbox, period.

Is there a sandbox? Various people say no, including Google, in the sense that all sites must go to Coventry for a set period of time.

Is there a sandbox-like effect? Yes, various marketers have seen this happen, and Google itself has said there are various filters that can cause a new site to have to sit in a waiting area, as it were.

Is there a great deal of confusion? Yes. To me, the idea of a sandbox has become the latest in a run of excuses for why a site doesn't necessarily do well, the universal truth trotted out for everything regardless of whether it really exists.

We've had years of this. Oh, the site's not ranking well because search engines are now doing themes, which they never ever did and to my knowledge still do not do today. But building a "themed" site often meant people took a big site and divided it into smaller ones, thus increasing the chance of getting more pages indexed. It also meant that they ended up having a number of very targeted home pages, your most important page, rather than one single home page diluted on many topics. That also meant they got more links pointing at the home pages of their more targeted sites. And often, in the process of building their "theme" sites, they created BETTER content.

All of this had nothing to do with the traditional "theme" idea that a search engine would magically look at all of the pages in your web site and reward those that were more on a particular topic. It should have been blindingly obvious to anyone that search engines did not and still do not operate in this way, otherwise Amazon or Wikipedia would never rank for anything, since they lack any theme at all. But because the medicine you took to cure your "theme disease" showed an improvement, themes became a reality for many people.

Oh, my site's not doing well because of term vector stuff now being used. Oh, it's because LSI is now big. Oh, it's because of the sandbox. In 2006, I'm sure we'll have some other smoke-and-mirrors type thing that will get trotted out.

If a site isn't ranking, that's not necessarily because of a sandbox effect. It's probably because of a variety of other reasons, but the sandbox is an easy excuse for people to blame, or they blame it without knowledge.

If it's a brand new site that isn't really ranking as well as you might expect, then *possibly* some of the sandbox effect people have reported and Google itself says exists might be to blame. But I think it's fair to say, the number of people who say they've been sandboxed far exceeds the actual capacity of the sandbox, virtual or real where volleyball is being played :)

Mike Grehan
12-21-2005, 06:34 PM
Ian,

Nice post. Allow me a little riposte or two here...

New websites are also vulnerable to "link dearth syndrome" (not enough links), "content deprecation syndrome" (not enough content) and so forth...

Link dearth - usually one of two things:

a) Nobody's ever heard of you, so how do they know what to link to?

b) People have heard of you but there's no more reason to link to you than there is to anyone else in your sector.

As for, not enough content?

Try this at Google. Search for:

(single word) Currency

And then...

Currency converter

Change currency

Foreign currency

And other variations. At the top of the pile (one or two) you'll always find xe.com. It's ONE page, with little or no copy. Just a functional tool to use. How much more than that do they need to rank at number one for their specific keywords? NOW THAT'S CONTENT!

Okay.

Here's a challenge to the sandbox believers. Show me an international brand, like Virgin, for instance, which launches new sites for new brand extensions over and over, and gets involved in tactical promotions online, co-promotion, sponsorship and marketing of every kind... that ever got (so called) sandboxed.

Marcia
12-21-2005, 07:05 PM
Here's a challenge to the sandbox believers. Show me an international brand, like Virgin, for instance, which launches new sites for new brand extensions over and over, and gets involved in tactical promotions online, co-promotion, sponsorship and marketing of every kind... that ever got (so called) sandboxed.

They've got the massive traffic to begin with, which gives their new brand extensions enough oomph for usage statistics to go through the roof going out the gate.

Take the same type of promotion done by a brand that, unlike Virgin, is neither major nor international, and see how they fare without the initial traffic & usage impetus.

The rich get richer.

Mike Grehan
12-21-2005, 07:24 PM
The rich get richer.
Correct, Marcia.

And how did they get rich in the first place?

Richard Branson started Virgin in a tiny little rented flat, above a furniture shop in London. Just like a brand new, tiny little web site, he had to grow his business out of nowhere.

He had vision and knew how to market himself and his business. And in my own experience, if you know how to create awareness campaigns and build brands, you succeed very well at search engines.

It’s not just a case of being available online – you have to be in demand. And that's a marketing exercise, not a technical process.

Marcia
12-21-2005, 08:05 PM
Richard Branson started Virgin in a tiny little rented flat, above a furniture shop in London. Just like a brand new, tiny little web site, he had to grow his business out of nowhere.

That he did, but I'd like to see him start from scratch like that today, without the benefit his newer properties can now derive from his older properties.

Domain Name: virgin.com

Created on..............: Wed, Sep 10, 1997
Expires on..............: Mon, Sep 09, 2013
Record last updated on..: Fri, Jan 14, 2005

Mike Grehan
12-21-2005, 08:22 PM
Don't seem to remember seeing this one in the sandbox, at all. Do a search for Shayne Ward, this month's winner of the X factor in the UK

shayneward.org.uk
Registered on: 22-Nov-2005
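
For anyone comparing the two registration dates quoted so far, here is a minimal Python sketch of the age arithmetic. It assumes the day these records were posted (12-21-2005) as the reference point, and the helper name `domain_age_days` is purely illustrative rather than part of any WHOIS tool.

```python
from datetime import date, datetime


def domain_age_days(created: str, fmt: str, as_of: date) -> int:
    """Return the number of days between a WHOIS creation date and as_of."""
    created_on = datetime.strptime(created, fmt).date()
    return (as_of - created_on).days


# Assumption: use the thread's posting date as the reference point.
posted = date(2005, 12, 21)

# Date strings copied from the WHOIS excerpts quoted in the thread.
virgin_age = domain_age_days("Wed, Sep 10, 1997", "%a, %b %d, %Y", posted)
shayne_age = domain_age_days("22-Nov-2005", "%d-%b-%Y", posted)

print(f"virgin.com: {virgin_age} days old")
print(f"shayneward.org.uk: {shayne_age} days old")
```

The two records use different date layouts, so the format string is passed in explicitly rather than guessed.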

rustybrick
12-21-2005, 11:17 PM
Mike,

Are you saying you never launched a site in the past year or so that had sandbox-like symptoms?

I can tell you I have launched both types of sites; some that have no such effect and some that do.

I think the most critical issue is how people define the sandbox.

I would not say there is no sandbox. Possibly the first time I ever argued with you, Mike. I have that feeling of shame, when writing this.

Jill Whalen
12-21-2005, 11:56 PM
I don't generally disagree with Mike either, but in this case, he's flat out wrong.

And I'll say it again... If your web site doesn't rank anywhere at all at a search engine - it's probably because it has no differentiating appeal or simply because it sucks.

If you've ever worked on a brand new site in the past couple of years since the aging delay has been in effect, you would know for sure that there most definitely is an aging delay and the only way out of it is time (or apparently a few tricks that a few people are starting to learn).

If there were no sandbox, then the same techniques we were using years ago to get listed right away in Google would still get sites listed right away in Google. Now, they definitely DO get listed in Google, but not until approximately 9 months have passed.

I didn't believe it either until I worked on a brand new site. I am lucky that I usually get to work on existing sites, in general. And I can tell you that as long as G continues to implement an aging delay on sites, I won't be working on any brand new domains.

It's just too frustrating. You can have the best site in the world, and Google will agree that it is indeed the best site, but only after 9 months or so have passed.

I understand why they do it, and I even agree with it in many respects, as any long term business proposition can certainly wait 9 months to see Google success.

But, there most definitely IS an aging delay, that is a fact.

I also agree with Danny that it's an easy excuse for many webmasters as well.

The sandbox isn't a penalty, it's simply an aging delay until trust can be established for a new domain.

Marcia
12-22-2005, 01:16 AM
Aside from the tongue-in-cheek tinfoil hat Adwords reference, this is why I continue to believe there's a significant traffic metric involved

http://www.webmasterworld.com/forum34/920.htm

Now, a few people have asked me about the thread I referenced in the MSN forum at WmW where the lady seems to have nailed it. I'll have to dig that out, but in short she's talking about using the Alexa Toolbar to do deep research to avoid the "sandbox effect."

And in another thread (which can no longer be found) she mentioned very clearly - keeping in mind that her main business site is years old and very heavily trafficked - that when her company launches new sites, they link from their established, heavy trafficked ones to give the new sites their jump-start. Heavy traffic going out the gate. No sandbox.

In view of what I'd posted about usage statistics in rankings a few months before, when I came across those posts (and others that are around by other people with sites with heavy traffic as well), it started to make even more sense. I'm no scholar, but it all seems to fit together.

What is the main thing that Alexa measures over time?

Added:

Don't seem to remember seeing this one in the sandbox, at all. Do a search for Shayne Ward, this month's winner of the X factor in the UK

shayneward.org.uk
Registered on: 22-Nov-2005

Ah, the BUZZ factor. ;) Significant buzz creates significantly heavy traffic (usage statistics), isn't that the case?

But being just a Yank from the left coast, I'm afraid I'll need a little help on this one, never having heard of Shayne Ward. May I ask what competitive commercial keywords that site is ranking for at Google?

DaveN
12-22-2005, 05:38 AM
http://www.google.co.uk/search?q=WINNER+OF+X+FACTOR

hmmm, what is it then ... BAD SEO? .. so if you don't rank then it's bad SEO

DaveN

Robert_Charlton
12-22-2005, 05:53 AM
I generally don't disagree with Mike either... and I believe in success the old fashioned way and all that, but, call it the "sandbox" or the "deep freeze" or "limbo" or whatever, there's a factor or a group of factors that works against new domains.

Like Jill, I believe I understand why Google's doing this. I disagree with her that it's "simply an aging delay"... I think it's probably a complicated delay, where several age-related factors multiply each other and look almost exponential when they do. But I do go along with the second part of her statement: "until trust can be established for a new domain." In some cases, sites may never establish this trust. In others, I think Google has some problems in not being able to evaluate what's trustworthy a lot faster.

I'm afraid I'll need a little help on this one, never having heard of Shayne Ward. May I ask what competitive commercial keywords that site is ranking for at Google?

Marcia - I'd never heard of Shayne Ward either. Apparently, he's the competitive phrase... there are 127,000 pages on Google with exact matches for his name.

I'm not sure that the Shayne search is a good illustration of what I think Mike is trying to demonstrate... that something brand new can rank with enough buzz and traffic. Because all the sites targeting Shayne are probably the same age, I think it's likely to be a pretty level playing field, but I don't know much about Shayne or these sites.

X Factor winner Shayne Ward has set a new record for the fastest-selling single of 2005. Music retailers predict that Shayne's debut, That's My Goal, could be one of the fastest-selling singles ever....

...Woolworths said copies had flown off the shelves at seven times the rate of the previous fastest-selling 2005 single, the Crazy Frog.

And Matt Cutts had a college roommate whose site got lots of press and beat the non-existent sandbox too. In this country, we can all win the lottery, and we can all become President, and we all may have 15-minutes of fame... but let's not pretend that not being this exceptional always suggests the absence of merit or of marketing virtues. Agreed, there are some who are using the sandbox as a catch-all excuse. But I feel that the delay in ranking is all too often about collateral damage, and there are some extremely meritorious sites, companies, and organizations that are unfortunately being affected.

(Added... and DaveN's X-Factor search suggests that Shayne may still be in limbo.)

dannysullivan
12-22-2005, 07:24 AM
I'm tempted to split this thread into a new "Is There A Sandbox" one but lack the energy, at the moment :)

Some of what we're getting into has been discussed, and discussed, and discussed. How does Matt's roommate's site escape the sandbox if it hits all new sites? One reason might be that some feel like if you get good links from very trusted sites, bang -- you're out of the sandbox.

Others who don't believe in a formal sandbox but instead in filters that may hold back suspicious content are on the same page. Got a new site? Want Google to know it is trustworthy and should rank well? Pick up some good links.

It sort of comes back to what Dave's talking about. These older sites aren't just doing well because they are actually old. They've established some trust through their age, as well. So if you subdomain, Dave's finding you pick up trust from that domain pretty similar to what Barry was finding for those making use of the com.uk subdomain. Of course, outing that made it an easy target for Google to discount trust :)

SEO World Obsessed with Sandbox (http://forums.searchenginewatch.com/showthread.php?t=3782) has Barry talking more about that technique, plus lots and lots of debate on whether there is a sandbox and so on.

Other past threads well worth reading:

Symptoms of Exiting the Sandbox (http://forums.searchenginewatch.com/showthread.php?t=6678)

Compilation of Anti-Sandbox Tactics (http://forums.searchenginewatch.com/showthread.php?t=3984)

Does New Google Patent Validate Sandbox Theory? (http://forums.searchenginewatch.com/showthread.php?t=4978)

Level of Trust for Matt Cutts' Sandbox Explanation @ SES NYC (http://forums.searchenginewatch.com/showthread.php?t=4524)

Filthy Linking Rich (http://forums.searchenginewatch.com/showthread.php?t=2063)

At this point, you've got plenty of search marketers that I trust seeing some type of sandbox effect / aging delay and you've got Matt Cutts saying that such things may be happening. Yep, you can't absolutely trust everything a search engine says, but I feel like in this, Matt is simply acknowledging what many people have seen.

So for me, the debate isn't about whether a new site might have difficulties ranking. Yes, they might. The debate is really about whether every new site will have problems ranking. The answer to that I think is firmly no. There is no formal sandbox where every new site must sit things out. There are elements of the ranking algorithm, however, that may make it harder for a new site to immediately take off when compared to the past. However, other elements such as trusted links can counteract that.

DaveN
12-22-2005, 07:49 AM
OK .. i have been testing something out ..

and if Mike agrees .. that i have the skills as an SEO to rank in the search engines .. then this may shed some light on the sandbox effect

I took two sites ..
one domain I have held for 2 years with no content, just a holding page (old domain)
one domain which is 1 month old (new domain)

both domains have got around 500 pages indexed now, seo and linking the same on both .. blah blah

BUT the 2 year old domain gets 45 times more traffic than the new domain, yes both rank for certain terms, but the new domain is missing KEY keyword places, in fact for some keywords it's not in the top 999.

both sites are in the same industry and compete against each other.

DaveN

DaveN
12-22-2005, 07:56 AM
example : i will use mikes

http://www.google.co.uk/search?hl=en&q=shayne+ward

http://www.google.co.uk/search?hl=en&q=shayne+ward+x+factor oops that's missing here :)

DaveN

Jill Whalen
12-22-2005, 09:32 AM
Google seems to give searches for people's names different results than just phrases. If you have a new site and mention the site owner on an about us page, for instance, one of the few phrases you can often rank with in Google when you're being affected by the aging delay is that site owner's name.

Beats me how they do it, but it's one thing I've noticed. Now, it could simply be the competitiveness (or lack thereof) but when you view the results, they "feel" different for names than they do for keywords, for whatever that is worth.

I do find it interesting and slightly amusing (when it's not my site) that one day a site is apparently not relevant for pretty much anything that it is related to, and the next it's considered relevant to everything that it is related to. (After a good 9 months have passed though.)

notredamekid
12-22-2005, 11:15 AM
1) Of COURSE Google is rewarding domains with an earlier creation date. DaveN's experiment confirms this, I've done similar experiments with similar results.

2) Of COURSE Google is rewarding sites with trusted links. "TrustRank is the new PageRank"

3) Of COURSE a new site will not have #1, it probably will not have #2, and thus it will not be ranking "like it should".

This is the Sandbox.

And of COURSE the Sandbox has become the catch-all excuse anyone can use for not ranking a new site.

Is anyone really disagreeing here?

notredamekid
12-22-2005, 11:27 AM
I disagree with some of Mike's assessment. There is a clear trust/age delay for new sites in Google. You can't just blame it all on bad marketing.

BUT I agree with him that a lot of it CAN be chalked up to marketing. In the real world, authority/branding/trust takes time to build. Why would search engines want it any differently?

Look at Mike Grehan (as a person, not as a Web site). Is he necessarily smarter than anyone else in this thread?

No.

But his SEO career's "creation date" is a lot older than most. Likewise, he has obtained "trusted links" (recommendations) from authorities like Danny Sullivan.

So when he says "there is no sandbox", his statement gets blogged on SearchEngineRoundtable. Yet a thousand other SEOs (possibly just as smart or smarter) with less trust, age and authority have said the same thing as him, and no one blogged it.

This is how the real world works, and why should we be surprised when search engine algos mirror the real world a bit more as time goes on?

Take the age/trust delay, add in a bit of Filthy Linking Rich phenomenon, and of course the optional "bad marketing" and you get this 23-headed monster known as the Sandbox.

AndyBeal
12-22-2005, 11:33 AM
...so I'll add my 2 cents. :)

From my experience there is absolutely some kind of maturing process that "some" sites go through. I don't believe there is a sandbox that everyone must endure, but newer sites do tend to take a whole lot longer to achieve any meaningful rankings - even when they have fantastic content, highly relevant info and are indeed worthy of being #1 for the desired search term.

So I agree with my good buddy, Mike - there isn't a sandbox. But I also agree with Jill, newer sites do often have to wait.

DaveN
12-22-2005, 11:47 AM
Andy ... was that a "I can and can not confirm the existence of an effect that some people may or may not call the sandbox"

DaveN

jkemp
12-22-2005, 11:50 AM
All I can say is what I have seen from personal experience. The company I work for launched a web site for a very high profile hotel that was being constructed in downtown Knoxville, TN, http://www.cumberlandhousehotel.com. The site was launched in late summer. Because it was so high profile, it gained a lot of links locally very quickly. A few months after launch, the Google Toolbar showed it had a PR of 4, but it did not show up in the results even when searching for the hotel name plus the location. If the sandbox doesn't exist, how do you explain that?

Google has to have some kind of filter on domains that are a certain age. I believe in this case that the site got too many links too fast in addition to being new and Google deemed them to be unnatural. How do you explain that to a client? I would consider that good marketing, advertising, pr, whatever. Now granted, the site has almost no content, which I explained to them was not good. I think this could explain why it still does not appear in many searches even though it shows up now when you search for the name although it is still like 3 or 4 down. I suggested to them to use Google Adwords and after they saw they got no traffic, they were convinced it was worth it. Now most of their search traffic comes from Google ads. Isn't that what Google wants anyway?

DaveN
12-22-2005, 11:52 AM
Jkemp that's something different .. sorry ;)

DaveN

AndyBeal
12-22-2005, 12:03 PM
Andy ... was that a "I can and can not confirm the existence of an effect that some people may or may not call the sandbox"

Hah..so sitting on the fence doesn't work around these parts? ;)

Is there a "sandbox" for new sites? Yes!
Does every new site have to endure it? No.
Do I know what criteria are used to judge which sites get put in the sandbox? No.

DaveN
12-22-2005, 12:06 PM
Do I know what criteria are used to judge which sites get put in the sandbox? No.

I will stand beside you on that one too !!

DaveN

DaveN
12-22-2005, 12:08 PM
I posted this a few weeks ago .. still going strong, the domain was brand new !!

brand new site launched 6 weeks ago. last week's logs

1 No Referrer 5,296
2 http://www.google.co.uk 3,103
3 http://www.google.com 1,580
4 #*$!xxxxxxxxxxxxxxxxxxxx 901
5 xxxxxxxxxxxxxxxxxxxxxxx 714
6 http://search.msn.co.uk 336
7 http://www.google.com.au 120
8 http://www.google.ca 83
9 http://aolsearch.aol.co.uk 79
10 xxxxxxxxxxxxxxxxxxxxx 78
11 http://www.google.nl 76
12 http://www.google.fr 66
13 http://www.google.se 57
14 http://www.google.de 53
15 http://www.google.it 44
16 http://search.msn.com 40

brand new domain ... xxxxxxxxx are sites that you will need to find imo
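
The log above can be totalled by source with a short Python sketch. The counts are copied from the list; the grouping rules (matching on `google.`, `search.msn.` and `aolsearch.`) and the names `classify` and `tally` are assumptions made for illustration, and the masked referrers are simply lumped together as "other".

```python
from collections import defaultdict

# (referrer, hits) pairs copied from the log above.
# The three masked rows are recorded as "masked" because their hosts are hidden.
REFERRERS = [
    ("No Referrer", 5296),
    ("http://www.google.co.uk", 3103),
    ("http://www.google.com", 1580),
    ("masked", 901),
    ("masked", 714),
    ("http://search.msn.co.uk", 336),
    ("http://www.google.com.au", 120),
    ("http://www.google.ca", 83),
    ("http://aolsearch.aol.co.uk", 79),
    ("masked", 78),
    ("http://www.google.nl", 76),
    ("http://www.google.fr", 66),
    ("http://www.google.se", 57),
    ("http://www.google.de", 53),
    ("http://www.google.it", 44),
    ("http://search.msn.com", 40),
]


def classify(referrer: str) -> str:
    """Assign a referrer to a coarse source bucket (hypothetical rules)."""
    if referrer == "No Referrer":
        return "direct"
    if "google." in referrer:
        return "google"
    if "search.msn." in referrer:
        return "msn"
    if "aolsearch." in referrer:
        return "aol"
    return "other"


def tally(rows):
    """Sum hits per source bucket."""
    totals = defaultdict(int)
    for referrer, hits in rows:
        totals[classify(referrer)] += hits
    return dict(totals)


totals = tally(REFERRERS)
grand_total = sum(totals.values())
for source, hits in sorted(totals.items(), key=lambda item: -item[1]):
    print(f"{source:<7} {hits:>6} {100 * hits / grand_total:5.1f}%")
```

Grouping by source makes it easier to see how much of a week's traffic each engine sent, whatever one concludes about the sandbox from it.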

DaveN

bewarne
12-22-2005, 03:05 PM
For what it is worth, here is my experience with the sandbox/age/credibility thing at Google:

1) Launched site about the TV show "House" (http://www.housemd-guide.com) on May 13.
2) Submitted to Google, Yahoo and MSN at that time and got links from several sites about the show (Internet Movie Database, TV.com, all the other sites about the show or the stars). Got a link from Open Directory within a month or so. Also linked from my highly rated "West Wing" site. And the site started being mentioned in the forums.
3) All three search engines had the site listed in their databases within a few weeks.
4) And for Yahoo and MSN, the site was coming up for competitive words soon thereafter: Words like "House MD" or "House Fox TV Show". Was #1 or #2 in MSN for several major keywords within a couple of months.
5) Google would have the site come up only for searches where my site was just about the only one covering those words or subjects with any kind of decent density.
6) At 5.5 months (just short of the usual six months I had been hearing about), Google started listing my site on the first page for everything connected with the show. No major change to the site happened at that time.

What this indicates --- I do not think it proves anything--- is that you can come up in Google immediately for non-competitive words (your name, or information that you have that few other sites have) but for anything competitive, you at first end up last.

The speed with which the site got out of the sandbox effect may be due to its informative rather than commercial nature.

Jill Whalen
12-22-2005, 03:12 PM
What this indicates --- I do not think it proves anything--- is that you can come up in Google immediately for non-competitive words (your name, or information that you have that few other sites have) but for anything competitive, you at first end up last.


Yep, those are the exact same symptoms reported by thousands of others who've created new sites. Very standard.

Marshall Clark
12-22-2005, 03:15 PM
Have to put my $.015 in on this.

From personal experience I see evidence for a number of hoops that new sites are required to jump through in order to develop competitive rankings. Older sites may already innately satisfy these requirements and not be aware of their existence.

Call it a “sandbox”, “a series of complex filters”, or “TrustRank” – when you get down to it, it’s just semantics ;) (sorry)

The lack of a common vector in the SEO community regarding the cause of the “sandbox” might suggest a number of factors being analyzed in combination with only a portion of them having to be satisfied at any one time.

Wild speculation conclusively demonstrates that sandbox criteria may include:
* Age of the domain
* Age of inbound links
* Reputation of linking sites (and their links, etc.)
* Age of inbound links of linking sites (and their links,...ad infinitum)
* Presence of an inbound link/301 from an established, parent company (likely one with few other outbound links)

Best way to avoid the sandbox in client websites:
* Only take on Fortune 500 clients

JasonD
12-22-2005, 05:23 PM
A damn fine research document!!! (http://www.iana.org/cctld/cctld-whois.htm)

Disclaimer: There may or may not be a reason why the above URL has been placed in a thread that may or may not be discussing an effect that may or may not have a name that is related to a container that may or may not contain silicon based granules

graywolf
12-22-2005, 05:41 PM
I have two sites each with half a dozen pages about the same subject. Both have lots of low quality inbound links from SEO owned sites with optimal exact anchor text, both are nowhere in the top 1000.

I picked up an old site from 1999 and put up one page with one paragraph about the exact same topic. No external links with the desired anchor text. The only link is internal and comes straight off the home page. Within one week of putting the page up it went to number 15. The site doesn't have lots of high quality inbounds but does have links from a wide variety of IPs, none of which are from SEO sites.

randfish
12-22-2005, 05:42 PM
Mike, you know I'm a big fan, but I do have a very good sandbox example - I believe this is exactly what we're all talking about when we say sandbox:

www.seobythesea.com
Registrar: TUCOWS INC.
Whois Server: whois.opensrs.net
Creation Date: 19-jun-2005
Expiration Date: 19-jun-2006
Registered by Bill Slawski (bragadocchio)

Google - "seo by the sea" - http://www.google.com/search?q=%22seo+by+the+sea%22

- 15,800 results
- ranks #45
- unique phrase that didn't exist prior to Bill's creation of the site
- every one of the 44 pages ranking in front of it (and hundreds behind it) link to the site
- Yahoo! and MSN it's #1
- 1,000+ links according to Yahoo! linkdomain from lots of respected, on-topic industry sites

that's what I call being boxed. Bill's never bought unnatural links, his site deserves recognition and it's been getting it, naturally! I've seen this trend many times - seomoz.org itself was boxed for 9 months and couldn't be found for a search for "seomoz". Through the sandbox detection tool I've seen sites "boxed" even worse (although I can't confirm their link structure).

This isn't the exception to your rule, it's very common. And yes, big sites have been boxed, too; Electoral-Vote.com couldn't be found in the top 100 for 6+ months prior to the US election, despite being one of the most popular sites during that time (mentioned on CNN, CBS, Fox, every online news source, etc. - 100K+ links).

I'm just saying...

BTW, "Only take on Fortune 500 clients" - this might be why you're not seeing it ;)

Marcia
12-22-2005, 06:29 PM
"ranks #45" - #44 & 45, but not the homepage, which is nowhere to be found.

jasonsot
12-22-2005, 08:05 PM
Creation Date: 19-jun-2005
Expiration Date: 19-jun-2006



I agree 100%, but wouldn't this leave a hole in this example?

stuntdubl
12-22-2005, 10:23 PM
Call it a sandbox. Call it a trustbox. Call it bad marketing. Call it poor seo.
"The Google Sandbox" has taken on a connotation of its own depending on the vernacular by which you frame your overall SEO philosophy and what you've personally ascribed to its meaning through the interpretation of others (some who claim it doesn't even exist!) or your own personal experience (which for most is with an extremely limited data set).

The question of a sandbox brings up interesting discussions on other valid and important points of order for search marketers. I call it intelligent application of information from toolbar data, personalized search, and other applications which they've acquired over the last few years. It's the evolution beyond singular variables and data and storage issues. The irony is G is now sometimes not nearly as smart as we give the engine credit for, and sometimes it's way more intelligent than we possibly believe it can be. The "natural search algorithm" is an evolving organism with increasingly complex characteristics.

There are definite phenomena related to time, trust, and traffic. Each of these is a quality indicator that helps sites achieve their ultimate goals, for both engines and webmasters.

Time
I think there is a substantial disconnect on age filtering right now, and it is an error that will be remedied on the engine's part in the next year. Not so much logical flaws, as lack of good philosophy. relevance = good enough right now rather than as good as it could be if applied more liberally and in different areas. The engines have found new and better ways to store and retrieve LARGE portions of data in the last three years. They are still figuring out how to disseminate it.

The new filters and ranking algorithms are much more sophisticated than hundreds of lightly dependent variables. There are literally multitudes of interdependent criteria that help to determine if a site or page is the "best". It is not the marketing success breakthrough for AI yet, but it is good enough.

Trust
They will get better at using personalized data, and acquire much more user data. If Web 2.0 is about user generated content, then the engines are WAAAY ahead of the curve. They are the kings of scraper content. They just haven't applied it heavily because they are still contemplating long term privacy concerns.

Traffic.
Fortune 500 companies don't get sandboxed, litterboxed, trust boxed, filtered, delayed, and seldom penalized. As long as the media likes it it's okay now I would think. If you're backing a site up with a TV campaign, you're not going to have a problem. The barrier to entry has been raised, and it's in the form of traffic validation of natural algo search results. The linking rich get filthy stinkin' richer, and those getting traffic amass more virally. Think viral optimization - one of the nice recommended ways of beating the box.


There are only a few ways to learn about algos or conduct yourself as an SEO:
(I'd love to hear others chime in on how they learn SEO/SEM)
- listen and read from smart people who you trust, make up your mind what's right and apply to sites
- listen and read from smart people who you trust, believe everything and don't test
- listen and read from smart people who you trust, test everything anyways
- read whitepapers all day, try to understand their meaning and how they may be applied, and spend time testing
- conduct your own scientific research against the efforts of dozens of well-bred Ph.D.s
- believe everything you read, and post wild theories to forums hoping that your competitors or their associates read and apply your disinformation
- blame everything on things like the sandbox and read blogs and check statistics all day



Success = build quality sites and trust over time. It works for business and it works for goog.

MODERATOR NOTE: I've split the good stuff about what 2006 might bring into a new thread, seeing as how I'm in a thread splitting type of mood this week! You'll find it here, SEO/SEM Tactics For 2006 (http://forums.searchenginewatch.com/showthread.php?t=9329), and I invite you all to explore non-sandbox stuff in that one.

orion
12-23-2005, 01:38 AM
Well, I hope this sheds some light on the subject.


Last year I was discussing web dynamics and temporal link analysis via email with Prof. Ricardo Baeza-Yates. He then sent me an email (Friday, November 12, 2004) in which he kindly attached a great work he published in 2002 with Felipe Saint-Jean and Carlos Castillo. Their work expands on AltaVista's famous Bowtie Theory (a paper I read back in 2000).


Baeza's paper was: Web Structure, Age and Page Quality (http://citeseer.ist.psu.edu/cache/papers/cs/26938/http:zSzzSzwww.webdyn.orgzSz.zSzproceedingszSzbaez a_yates_web_strucutre.pdf/baeza-yates02web.pdf)


The abstract states (emphasis added):

"This paper is aimed at the study of quantitative measures of the relation between Web structure, age, and quality of Web pages. Quality is studied from different link-based metrics and their relationship with the structure of the Web and the last modification time of a page. We show that, as expected, Pagerank is biased against new pages."


The authors tried to address a critical question. In their own words:

"Are link-based ranking schemes providing a fair score to newer pages? We found that the answer is “no” for Pagerank..."


When was this research conducted? Well, the data was collected from the last half of 2000 and first half of 2001. Note that this was way before SEOs started promoting theories about a "sandbox" in discussion forums and at SES conferences.


Note the researchers mention that this was "as expected". So the phenomenon was related to the dynamics of the web and the way PageRank weight was passed through a dynamical system, not to any funny name "last updates from Google", a "new patent" or to some mythical "on/off page factors". ("If you do this or that to your site you can get sandboxed").


Although the study was limited to the Chilean web and they defined age as "last modified", the age effect was well studied, addressed, and documented. The group later expanded their studies on two different occasions.

1. Web Dynamics, Structure, and Page Quality (http://www.dcc.uchile.cl/~ccastill/papers/baeza04_web_dynamics_structure_page_quality.pdf) Baeza, et al.

2. Analysis of Link Based Ranking for the Web (http://citeseer.ist.psu.edu/cache/papers/cs/29455/http:zSzzSzwww.ricardo.clzSzftpzSzlinka.pdf/analysis-of-link-based.pdf) Baeza, et al.


In article 2 they wrote:

"What about the correlation between link ranking and age? Figure 8 shows the PageRank of all pages with respect to age. The bottom dots are normal pages, being the lower region, low ranked pages in low ranked sites, which is the most common case from the point of view of a link based ranking. The fact that most of the new or recently modified pages have low rank (the solid red region) shows that PageRank is biased to old pages. This is bad considering the constant change and fast growth of the Web."


In their concluding remarks they state:

"Considering that most visits are the product of a search, this dependency can have a large impact in electronic commerce as they benefit older sites."


Back then, these papers documented that for new pages there was an initial increment in PageRank in the first 3 months, followed by a drastic reduction several months later, after which the "true" PageRank is reached. So there was a period of many months (6 - 9) in which one would expect some sort of readjustment/oscillations in the scores. They also found that growth of the Web follows periods of bursts, all amounting to oscillations. These two papers are full of graphs which allow you to visualize the "age effect".


It is interesting to point out that, after Baeza's work, Junghoo Cho and Robert E. Adams from UCLA, in

Page Quality: In Search of an Unbiased Web Ranking (http://citeseer.ist.psu.edu/cache/papers/cs/30827/http:zSzzSzrose.cs.ucla.eduzSz~chozSzpaperszSzcho-quality.pdf/cho03page.pdf) proposed a time-based model for page popularity and quality to solve the PageRank bias problem, but even in their model they found oscillations across several months.

Probably Google was aware of all those research findings and similar papers from those years and eventually decided to avoid all the hassle by simply assigning a waiting period to new sites. To recognize these as new, they probably use a specific criterion (whois? http headers? last modified? I don't know.)

Why was the phenomenon not reported before? Probably because back in 1998-99 no one studied the phenomenon or paid attention to it. At least I did not find any paper on the subject from 1998-99.

The fact that as of today unused aged domains can be utilized to try to game the system suggests that they may need to redefine better the "born on" stamp or whatever flag they might have before passing weights.


Orion

gomer
12-23-2005, 04:38 AM
The debate is really about whether every new site will have problems ranking. The answer to that I think is firmly no.

I fully agree with this statement. But the ratio of sites experiencing this to ones that don't is what needs to be looked at.

I believe that well over 99% of sites launched will experience the sandbox. Yes, some sites getting links from whitehouse.gov and cnn can avoid this but that is like someone's analogy of winning the lottery or becoming president.

Would love to see a naysayer walk us through a real world example of getting a site out and ranking quickly.

DaveN
12-23-2005, 05:44 AM
Randfish, that example is different; that's not the sandbox, it's something else.
DaveN

rustybrick
12-23-2005, 09:05 AM
Randfish, that example is different; that's not the sandbox, it's something else.
DaveN
DaveN, why not share with Randfish, what it is? ;)

randfish
12-23-2005, 10:35 AM
Whatever it is, that's what I call sandbox. That's what the "sandbox detection tool" was made to look for and it's the phenomenon I've seen repeatedly.

Dave, why must we have separate names for it?
And what's an example of what you call "sandboxing"?

erik
12-23-2005, 11:32 AM
I came into the issue a little later than some. Like Jill, I don't work on too many sites that are newly registered. About Oct. of 2004 I was working on a site that truly baffled me; I hadn't seen anything like it before then. It showed the same exact characteristics that bwarne and Rand describe. Performance for nothing but the longest-of-the-long tail terms.

Back then, SEO Chat and a few others were talking about it, but not too many. I think many sites were in it, but not too many had come out of it yet. A lot of people felt vindicated when Danny summed up sentiment at the SES Winter 2004 keynote (as summarized (http://www.seroundtable.com/archives/001267.html) by Barry):

"Last year's Florida largely hasn't repeated but everything is in the sandbox"
Speaking only for myself, if you're not convinced you're in the box while you're in it, you're sure you were in it after you leave it. (Does that make sense?) Literally overnight, and not tied to any algo updates, the keywords for which you've always been optimized start to flood the keyword reports, accompanied by a traffic increase unlike any you've seen. (I wouldn't call it a "spike" because it typically doesn't go back down.)

In the last couple months, the pendulum has swung back pretty far. It's sort of popular now to say it just doesn't exist. When you get into these articles deeper, they say that "all you need to do to avoid it" is A, B, C, F, G, and 90% of Q. To me (along with what I've seen), that just confirms that it's real.

Brian M
12-23-2005, 01:41 PM
I would like to see a better example of a site in the sandbox, because randfish has given us one that obviously has some problems.

The non-www version of seobythesea has a Page Rank=0, while the www version has a Page Rank=4. If I can see this, then Google's bots can see this as well, so there is bound to be confusion, particularly if the site was originally submitted without the www as a prefix.

Also, the large number of pages appearing in Google's supplemental index indicates that there was something else that was once very wrong with this site, and an attempt to fix it was made some time ago.

So, it makes perfect sense to me that this site would sit in a "sandbox" while Google's algorithms sort it out. Eventually, it will suddenly appear in the Google SERPS, and its creator will naturally assume that there is a sandbox.

There are also other problems with this site that may be causing the delay, so if anybody has another example that they would like to share, please provide it...

Jill Whalen
12-23-2005, 02:47 PM
I believe that well over 99% of sites launched will experience the sandbox. Yes, some sites getting links from whitehouse.gov and cnn can avoid this but that is like someone's analogy of winning the lottery or becoming president.

Yep! Couldn't agree more.

mcanerin
12-23-2005, 02:54 PM
I wrote a series of articles months ago that actually detailed what I felt the sandbox effect was, as well as how to get in/out of it, etc. I later deleted them and have since kept my results and tests fairly quiet for personal reasons.

As a result I can't remember the exact wording I used at the time, but one part of it went something like this:

The key to SEO (any type of SEO, not just sandbox avoidance) isn't links, or hilltops, or content or even trust. Trust is the closest - I just don't like the word "trust" used in conjunction with a search for a really bad site, for example. No, the holy grail, in my opinion, is confidence.

The more a search engine can be confident that the result it supplies to you is what you are looking for, the more likely you are going to be supplied with that result - i.e. the higher the site will rank.

Things like links, and content, and authority and all that stuff are just methods of attempting to ascertain how confident a search engine can be in presenting the site.

This may seem obvious or trite to someone not used to thinking things through very deeply, but put down the eggnog and indulge me for a moment...

Stop thinking about links and content. What else would inspire confidence in a website? What about the lack of duplicate data? What about a URL structure that lets the search engine know that it's definitely not indexing the same 15 pages over and over again? What about a server NOT going down all the time? What about links outwards to sites that are known to be useful to searchers for the content they've just searched on? What if the site approaches the search term from a different angle than most of the other sites (i.e., it's a museum or directory rather than a commercial site, etc.)?

What about how long people link to it? A site that people link to for 2 months and then stop is probably not a good site (and probably buying or trading for them, or doing some sort of serial linking campaign). A site that has static 4-year-old links from trusted authority sources is probably a good site.

All of these things can affect the confidence levels a site has as a result for a particular query, or for a position on a results page for a particular query.

Of course, these are usually not yes/no answers - if you only rate a 46% confidence level for a keyword, that kind of sucks (I'm making these numbers up for illustration ONLY), but if the other choices are all 22% or lower, then you will be firmly placed in a top position, even though frankly it's not that great of a site. Just because a site is number one doesn't mean it's a good site, it's just considered the best of a bad lot.
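The "best of a bad lot" point can be sketched in a few lines of Python. This is only an illustration: the site names are hypothetical and the scores are the made-up numbers from the post, not anything a search engine is known to compute.

```python
def rank_by_confidence(scores):
    """Order candidate sites from highest to lowest confidence score."""
    return sorted(scores, key=scores.get, reverse=True)

# Invented scores (percent) for one query; 46 is weak in absolute terms,
# but it still wins because every alternative is weaker.
scores = {
    "so-so-site.example": 46,
    "rival-a.example": 22,
    "rival-b.example": 15,
}

ranking = rank_by_confidence(scores)
print(ranking[0])
```

Position depends only on the ordering of the scores, so a low absolute confidence can still take the top spot.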

I want you to look at something - find a site that is in a sandbox, and look at a keyword that it ranks for. Now look for the closest Supplementary Result. See a connection? Now think about what Supplementary Results are, and what that connection means. Look really, really close.

Sandboxed sites usually appear immediately above supplementary results. If there are no displayed supplementary results for a search (because there are so many other ones that the search engine can show instead), your site probably won't show up.

The Supplementary Results are a separate database of "last gasp, only show if nothing else works" results. They have a confidence score (else they would not show up at all), but it's extremely low. These include pages that either go down a lot, or that have been recently not found but used to be good, etc. In short, they are on topic, but there is almost no confidence in them.

I've noticed that "sandboxed" sites typically are sites whose confidence score is very low, but better than the ones in the supplementary results database (I suspect that they are the lowest or bottom results in the normal database).

That's a fairly accurate method to tell if something's been sandboxed. Find its relation to the Supplementary Results for that search term. It's not the only method, but it's quick and easy.

The sandbox has nothing to do with trust or age, or ccTLD - it's all about confidence, IMO. If you want to declare all sites that have very low confidence ratings as "sandboxed", then fine. For me, they are just sites that the search engine isn't confident about (yet).

It's perfectly possible (even common) for a site to be highly relevant, but not be assigned a high confidence level due to other factors.

IMO, the sandbox effect is related mostly to the length of time a domain has had particular links to it for. Which is actually very different from site age itself. An old site with no links to it will be "sandboxed" based on the first day new links are discovered. Likewise, an established site that resets its historical data through a redirect, merge, or change in ownership/direction will often suffer the same effect.

Since link age is only one criterion, a site that can show itself to be trustworthy because of other factors (i.e., really, really good links, etc.) would override the negative aspect of the young links.

It appears you need links for about 6 months before Google begins to be confident that they are permanent links and gives you full credit for them. In short, you need at least 6 months of historical data. Since it usually takes 1-3 months for a new site to be fully spidered, you will note that the most common "sandbox" times are 6 + (1-3), or 7-9 months. It could be as soon as 6 months and one day, or as late as 12 months, but I most often see 7-9 as the common range for standard (non-aggressive but competent) sites.
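The arithmetic behind that 7-9 month range, as a small Python sketch. The constants are just the estimates from the post above (6 months of link history, 1-3 months to be spidered), not published Google figures, and the function name is made up.

```python
# Estimates taken from the post; treat them as assumptions.
LINK_HISTORY_MONTHS = 6
SPIDER_MONTHS_RANGE = (1, 3)

def sandbox_window(link_history=LINK_HISTORY_MONTHS,
                   spider_range=SPIDER_MONTHS_RANGE):
    """Return (earliest, latest) typical month of leaving the 'sandbox'."""
    fastest_spider, slowest_spider = spider_range
    return link_history + fastest_spider, link_history + slowest_spider

print(sandbox_window())  # (7, 9)
```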

A brand new site launched by a very trustworthy company, or a site that has garnered lots of natural links, may easily be deemed as a site a search engine can present as a result with confidence, regardless of the youth of its links. Young links are only one aspect of the whole thing, that's why (IMO) there are so many exceptions to the so-called "sandbox".

You can also avoid the effect if the site is assigned some of the historical data of another via a merge of some sort.

My suggestion for SEO in 2006 - make your site one that a search engine could show with complete confidence to a searcher for your term. Make sure its technology is sound, its links trustworthy and its content useful. If that sounds like what the search engines have been preaching all along, it's because it is - they are just finding different ways of measuring it.

Of course, I'm sure some people's response to all this will be along the lines of the old joke: "The secret to success is sincerity - once you can fake that, you've got it made!"

My opinion,

Ian

orion
12-23-2005, 07:32 PM
IMO, the sandbox effect is related mostly to the length of time a domain has had particular links to it for. Which is actually very different from site age itself. An old site with no links to it will be "sandboxed" based on the first day new links are discovered. Likewise, an established site that resets its historical data through a redirect, merge, or change in ownership/direction will often suffer the same effect.


"Age" of site in Baeza's papers does not refer to when the site was created. They use "age" in a loose sense to indicate modifications to pages; in particular, this refers to the age of links and to an age-based PageRank.


The following is from Section 5 of
Web Structure, Age and Page Quality (http://citeseer.ist.psu.edu/cache/papers/cs/26938/http:zSzzSzwww.webdyn.orgzSz.zSzproceedingszSzbaez a_yates_web_strucutre.pdf/baeza-yates02web.pdf)


5. AN AGE BASED PAGERANK

(emphasis added)


"Suppose that page P1 has an actualization date of t1, and similarly t2 and t3 for P2 and P3 such that t1 < t2 < t3. Let's assume that P1 and P3 reference P2. Then, we can make the following two observations:

1. The link (P3,P2) has a higher value than (P1, P2) because at time t1 when the first link was made the content of P1 may have been different although usually the content and the links of a page improves with time. It is true that the link (P3, P2) could have been created before t3 but the fact that was not changed at t3 validates the quality of that link.

2. For a smaller t2-t1, the reference (P1, P2) is fresher, so the link should increase its value. On the other hand the value of the link (P3, P2) should not depend on t3-t2 unless the content of P2 changes."


The authors actually mentioned the problem of using site creation as age as opposed to link age and suggest the following solution.


"A problem with the assumptions above is that we do not really know when a link was changed and that they use information from the servers hosting the pages, which is not always reliable. These assumptions could be strengthened by using the estimated rate of change of each page.

Let w(t, s) be the weight of a link from a page with modification time t to a page with modification time s, such that w(t, s) = 1 if t>= s or w(t, s) = f(s-t) otherwise, with f a fast decreasing function. Let Wj be the weight of all the out-links of page j, then we can modify Pagerank using:

PR_i = q + (1 - q) * SUM_j [ w(t_j, t_i) * PR_mj / W_mj ]

where tj is the modification time of page j. One drawback of this idea is that changing a page may decrease its Pagerank."
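For anyone who wants to play with the idea, here is a rough Python sketch of an age-weighted PageRank along the lines of the quoted formula. Several things are my assumptions, not the paper's: the exponential form of f and its `tau` parameter (the paper only asks for a "fast decreasing function"), the reading of W_j as the sum of the weights of page j's out-links, the fixed iteration count, and the toy three-page graph.

```python
import math

def link_weight(t_src, t_dst, tau=30.0):
    """w(t, s): 1 if the linking page was modified at or after the target,
    otherwise f(s - t). The exponential f and tau are assumptions."""
    if t_src >= t_dst:
        return 1.0
    return math.exp(-(t_dst - t_src) / tau)

def age_weighted_pagerank(links, mod_time, q=0.15, iterations=50):
    """links: {page: [pages it links to]}; mod_time: {page: last-modified day}."""
    pages = list(mod_time)
    # W_j: total weight of all out-links of page j (assumed interpretation).
    out_weight = {
        j: sum(link_weight(mod_time[j], mod_time[i]) for i in links.get(j, []))
        for j in pages
    }
    pr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new_pr = {}
        for i in pages:
            total = 0.0
            for j in pages:
                if i in links.get(j, []) and out_weight[j] > 0:
                    w = link_weight(mod_time[j], mod_time[i])
                    total += w * pr[j] / out_weight[j]
            new_pr[i] = q + (1 - q) * total
        pr = new_pr
    return pr

# Toy graph: A (modified day 10) links to B (day 50) and C (day 5).
links = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
mod_time = {"A": 10, "B": 50, "C": 5}
example_pr = age_weighted_pagerank(links, mod_time)
print(example_pr)
```

In this toy graph A's link to B is stale relative to B's last change, so more of A's score flows to C than to B. Note that with this normalization the weighting only matters relative to a page's other out-links.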

The idea here according to Baeza et al. is that each time a page is changed but the links stay, implies a reaffirmation of the links on that page.


Studying the phenomenon with a small subgraph such as TodCl was a good choice since the researchers were able to have a less noisy environment. However, when the phenomenon became known, some dismissed it as a feature of small subgraphs or a consequence of "the rich getting richer".

The fact is that the phenomenon studied by Baeza was different from the one reported by Barabasi, et al (the "rich-get-richer"), which is a more global and older phenomenon, observed across several length scales, and that predated Google.

By 2002-2003 both phenomena were well known, widespread and no longer possible to be ignored --proving fallacious the notion of PageRank as a democratic scoring system that levels the field.

While only Google can confirm, it is possible that around those days they decided to impose a very wise "wait and see" approach in an attempt to level the field, which is now referred to as sandboxing. Thus the notion of placing some sites in a "waiting", "maturing" status as some refer to as sandboxing is only valid on the grounds of cause-effect arguments.


Again, this research was from 2000-2001 and published in 2002 and further expanded. Was Google aware of these results? Let's look at two places in Baeza's paper: the conclusion and a little footnote after the Reference section (link may not work since it is very old)


Conclusion:

"In this paper we have shown several relations between the macro structure of the Web, page and site age, and quality of pages and sites. Based on these results we have presented a modified Pagerank that takes in account the age of the pages. Google might be already doing something similar according to a BBC article 3* pointed by a reviewer, but they do not say how. We are currently trying other functions, and we are also applying the same ideas to hubs and authorities"


Footer:

"3* ****://news.bbc.co.uk/hi/english/sci/tech/newsid_1868000/186395.stm(in a private communication with Google staff they said that journalist had a lot of imagination)."

End of the quote.

Back in August, Rand kindly sent me his San Jose SES presentation on link analysis for a quick review, and I suggested that he modify it to include the rate of change of pages, including the rate of change of links. The reason is that when dealing with temporal data you need to look at change ratios, not absolute counts.


The second article from Baeza, et al, Web Dynamics, Structure, and Page Quality (http://www.dcc.uchile.cl/~ccastill/papers/baeza04_web_dynamics_structure_page_quality.pdf), expands on both the age of pages and rates of changes. One needs to look at date of creation, updates, deletions, and age of page: the latter is different from date of creation: "We focus on webpage age, that is, the time elapsed after the last modification (recency)."--they stated.

In particular they looked at:

Data

The data is obtained by repeated access to a large set of pages during a period of time.

Notice that in all cases the results will be only estimation because they are obtained by polling for events (changes), not by the resource notifying events.

For each page i and each visit i there is:

* The access timestamp of the page: visit_i.
* The last-modified timestamp (given by most web servers; about 80%-90% of the requests in practice): modified_i.
* The text of the page itself, which can be compared to an older copy to detect changes, especially if modified_i is unknown.

There are other data that can sometimes be estimated, especially if the revisiting period is short:

* The creation timestamp of the page: created_i.
* The delete timestamp of the page, when the page is no longer available: deleted_i.

Metrics

There are different time-related metrics for a web page; the most common are:

* Age: visit_i - modified_i
* Lifespan: deleted_i - created_i
* Number of modifications during the lifespan: changes_i
* Change interval: lifespan_i / changes_i

For the entire web or for a large collection, useful metrics are:

Distribution of change intervals.
Time it takes for 50% of the web to change.
Average lifespan of pages.
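The per-page metrics listed above are simple differences and ratios of timestamps. A minimal Python sketch follows; the class and field names are hypothetical, timestamps are assumed to be in days, and the sample values are invented.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PageHistory:
    """Timestamps (in days) gathered by polling one page."""
    visit: float                     # when the crawler fetched the page
    modified: float                  # last-modified time reported or inferred
    created: Optional[float] = None  # only sometimes known
    deleted: Optional[float] = None  # only sometimes known
    changes: int = 0                 # modifications seen during the lifespan

def age(p):
    """Age = visit - modified (recency, not time since creation)."""
    return p.visit - p.modified

def lifespan(p):
    """Lifespan = deleted - created, or None if either is unknown."""
    if p.created is None or p.deleted is None:
        return None
    return p.deleted - p.created

def change_interval(p):
    """Change interval = lifespan / changes, or None if undefined."""
    span = lifespan(p)
    if span is None or p.changes == 0:
        return None
    return span / p.changes

page = PageHistory(visit=400, modified=370, created=0, deleted=360, changes=3)
print(age(page), lifespan(page), change_interval(page))
```

Because the data comes from polling, all of these are estimates: a page that changed twice between two visits is counted as one change.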

"One of the most important metrics is the change interval; Figure 1.2 was obtained in a study from 2000 [13]. An estimation of the average change interval is about 4 months."


The gist of this research was that with temporal Web data, weight occurrence and measurement are not necessarily synchronized events. Mismatches can accumulate across scales, to a point that can no longer be ignored. The time it takes for 50% of a graph to disappear does not help either.


Orion

PhilC
12-23-2005, 11:35 PM
Of course, I'm sure some people's response to all this will be along the lines of the old joke: "The secret to success is sincerity - once you can fake that, you've got it made!"
I don't know if it's a joke or not, but I can vouch for the fact that it's wholly true ;)

Excellent post, Ian. I've no idea if you've hit the nail on the head or not, but it's definitely food for serious thought.

gomer
12-24-2005, 02:06 AM
Great post Ian, thanks.

In my opinion, confidence and trust are the same thing.

bragadocchio
12-24-2005, 03:03 AM
Since it was specifically brought up as an example, I figure I should address whether www.seobythesea.com is in the sandbox.

I don't think so. There's more of a "cobbler's children have no shoes" effect going on there. :)

The blog was originally created as a one-shot on June 22, for an event hosted in August. It was picked up and linked to within a day or two on Search Engine Roundtable, the SEW blog, Threadwatch, Search Engine Journal, Cre8pc Blog, SEO Book, Cre8asite Forums, High Rankings Forum, Search Engine Watch Forum, SEOmoz, BPWrap, Gray Hat News, and more. With the RSS feeds on those blogs, a search in Google for "SEO by the Sea" was showing more than 13,000 results in less than a week.

That's what it was intended to do.

I've made a few posts there since then, but not really regularly until the last couple of weeks. I didn't do much beyond adding some posts. It suffers from many of the problems of a WordPress blog, but I did spend a little time today tweaking it to fix some of those issues.

I changed the vhost.conf file to resolve the canonical URL issue, added a robots.txt file to address duplicate content under different URLs, tweaked titles so that post titles come before the site name, fixed or removed some bad links (even nofollowed ones), restricted blog posts to single categories instead of multiples, and I even made my first submission of the site anywhere, at DMOZ. It still needs some work.
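The canonical URL fix mentioned here usually comes down to one rule: pick a single hostname and permanently redirect the other variant to it. The Python sketch below only shows that decision logic. It is not the actual vhost.conf change (which isn't shown in the thread), the function name is made up, and treating the www host as canonical is an assumption.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed choice of canonical host; the site owner could equally pick non-www.
CANONICAL_HOST = "www.seobythesea.com"

def canonical_redirect(url, canonical_host=CANONICAL_HOST):
    """Return the URL to 301-redirect to, or None if url is already canonical
    (or belongs to some other host entirely)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    bare = canonical_host[4:] if canonical_host.startswith("www.") else canonical_host
    if host in (bare, "www." + bare) and host != canonical_host:
        return urlunsplit(
            (parts.scheme, canonical_host, parts.path, parts.query, parts.fragment)
        )
    return None

print(canonical_redirect("http://seobythesea.com/?p=5"))
```

With a rule like this in place, the www and non-www versions stop looking like two separate sites with separate link data.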

Yahoo! and MSN are a lot more forgiving than Google, but both picked up on the fact that the second blog post was titled the same thing as what was in the title field of the software. The page title shows both blog post title and title field, so the phrase was repeated twice. Both of those search engines list that second page before the domain index page. Guess they give page title a lot of weight.

Sandbox? I've read the papers that Orion cited above, and they are definitely worth reading, but I think that it's possible many sites claimed to be in the sandbox have other issues. I know mine does. Good to have a holiday break to make some shoes for my own children. :)

PhilC
12-24-2005, 09:07 AM
In my opinion, confidence and trust are the same thing.
They are not the same. You can be confident that a particular horse will win a race, but you don't trust it to win. Similarly, you can be confident that a website is what it claims to be, but that's not expressing any trust in it. Confidence in something or someone is something that you yourself have, but trust is something that you bestow on someone else. Also, confidence is in degrees - it can grow and diminish, but trust is either on or off.

A search engine can measure their degree of confidence that a site is good, but it doesn't mean or imply any trust in the site.

randfish
12-24-2005, 11:23 AM
Bill, my point isn't that everything is done perfectly on SEO by the Sea, it's that Yahoo! and MSN (and Google prior to March 2004) never did anything like this. If 100 sites pointed to a site with anchor text saying something unique, and that was the title of the site, Google always used to rank that site for that phrase.

A few SEO errors or non-canonical issues couldn't hurt that. Even dual competing names wouldn't affect it - there's a very different filter going on now at GG and that's what I call sandbox.

If there's something else you call "sandbox", please let me know. I've always referred to this effect (of sites that obviously should be ranking for something not ranking for it) as "sandbox."

AND - I bet dollars to donuts you could fix nothing about the site, Bill, and 3-6 months from now, during an update of some kind at Google, your site, along with a few dozen others, would all "suddenly" be ranking competitively for your respective phrases and terms. I've seen this many, many times and it's very consistent.

When SEOmoz "hopped" out of the box it went from ranking in the 60s and 70s for its own name to #1 and ranking top 3 for dozens of other relatively competitive phrases. I didn't "DO" anything to the site and not surprisingly, I got emails that same weekend and saw threads started by other folks who had also jumped out.

Same phenomenon happened with avatarfinancial.com - the first site I observed in the sandbox. Also occurring with Etsy.com - try searches like http://www.google.com/search?q=buy+handmade+online - Etsy should be ranking #1 - their links (http://search.yahoo.com/search?p=linkdomain%3Aetsy.com+-site%3Aetsy.com&ei=UTF-8&fr=sfp&fl=0&x=wrt) slaughter anyone else's in the top 20, but nada...

brandall
12-25-2005, 02:55 PM
They are not the same. You can be confident that a particular horse will win a race, but you don't trust it to win. Similarly, you can be confident that a website is what it claims to be, but that's not expressing any trust in it. Confidence in something or someone is something that you yourself have, but trust is something that you bestow on someone else. Also, confidence is in degrees - it can grow and diminish, but trust is either on or off.

A search engine can measure their degree of confidence that a site is good, but it doesn't mean or imply any trust in the site.

Hmmm....

From dictionary.com (Webster’s Dictionary):
con·fi·dence
n.

1. Trust or faith in a person or thing.
2. A trusting relationship: I took them into my confidence.

From dictionary.cambridge.org (Cambridge Dictionary):
confidence (CERTAINTY)
noun [U]
the quality of being certain of your abilities or of having trust in people, plans, or the future

You may be using them differently, but in fact, to most of the world, the two have the same meaning. Trust is no more a binary term than is confidence. I can trust one person a lot and another person a little, and yet a 3rd person not at all. Both trust and confidence are bestowed by the one experiencing the trust or confidence. Neither comes from the object thereof. I may trust you while someone else may not. And I may have confidence in you while someone else may not. The difference in both cases is in the person experiencing the trust or confidence (or lack thereof) not in you (the object of that experience).

In any case, I think you are correct about the "sandbox" being fundamentally about confidence (or trust). As for all of the distinctions being pointed out about what is sandboxing and what is something else, most of it is just semantics that has no effect in the real world as to how one goes about optimizing a web site. I don't really care if Rand's example is a case of what some here call the sandbox and others call something else. What I know for certain (i.e., I am highly confident in the statement to follow and I have a high level of trust in its accuracy) is that the site would have ranked high on page 1 in Google within weeks had it been launched prior to March 2004, and after that time it does not.

This phenomenon is what I call the sandbox. Might there be a number of different filters, algo components, etc. at play in this phenomenon? Certainly. Is it of value to distinguish the various components and identify methods of addressing each in our optimization tactics? Of course. But simply saying "the sandbox doesn't exist" or "other issues are the cause of this..." adds no value IMHO.

kservik
12-25-2005, 03:42 PM
It is easy to say that great content will be linked to, but the truth is that in a non-English market like Norway (http://en.wikipedia.org/wiki/Norway) there is not much of a culture of linking to others.

This means huge content and old sites have it much easier ranking for a lot of terms, while quality content sites are not going to the top of rankings, something that is quite a flaw in the way Google ranks pages.

I, Brian
12-26-2005, 05:54 AM
Sandboxing began as a process which simply delayed the impact of link anchors (http://www.platinax.co.uk/blogs/brian/30-09-2004/the-google-sandbox-an-early-history/) - once upon a time, dropping tens or hundreds of thousands of links for a semi-competitive term pointing to most sites would show ranking benefits on Google very quickly.

I think it's fair to say since then Google has developed the concept further, so what is now properly meant as the Google Sandbox refers to a set of "tools" within the algo to combat a range of potential spam elements.

However, while I disagree with Mike about his refusal to accept its existence, I think Mike's reasons for his objections (http://www.mikegrehan.com/2005/12/not-sandbox-again-i-dont-usually.html) are very sound:


How could anyone in this business take on a client, tell them not to expect any results for a possible nine months AND expect to be paid?

Paid, to sit around and hope that some technical process might make your website popular with end users?

Search engines are not a panacea for marketing. They're simply in the mix.

orion
12-26-2005, 03:22 PM
I've read the papers that Orion cited above, and they are definitely worth reading, but I think that it's possible many sites claimed to be in the sandbox have other issues.

Well, Bill. The above papers describe an aging effect driven by the structure of the Web while some sites claim unrelated things. In that sense I agree with you and Mike that there are many myths out there. However, there is a perceived aging effect against new sites that many sincerely feel are in a "box".

Most of the research papers I have read on this aging effect that feels, smells and looks like a "waiting period" or "box" have something in common: they discuss the effect of the Structure and Evolution of the Web and new non batch modes for crawling that structure, particularly with incremental web crawlers. Nothing like ending 2005 with more research readings.

Cheers

Orion

natureday
12-27-2005, 07:35 PM
Marketing is all that it takes to be successful in this world.

mcanerin
12-27-2005, 11:34 PM
Well, talent (content) helps a bit, too ;)

Ian

NuevoJefe
12-28-2005, 03:30 PM
Something that Orion referenced struck me as possibly being an important factor behind the sites I'm seeing leave the box early. It related to the possible perceived value of links that remain constant on a page while the rest of that page goes through other updates.

This may explain why some of the newer sites I've analyzed which have many natural, in-content blog links (blogs have replies added and are therefore updated) seem to be doing very well. However, that factor alone cannot be heavily weighted in this case scenario considering that a blog post normally receives updates only for a few days after originally posted, while a link directory page will likely receive regular updates as links are added every now and again (and we know how much link directory links are helping these days).

There may be something there though.

SparkysDaddy
12-28-2005, 05:06 PM
Here's a tip I'd like to pass on for Christmas. Don't read anymore of this thread or any others about this ridiculous notion of a sandbox.

Buy a good book on marketing instead.


Mike--

Before deciding whether or not to heed your sage advice on this ridiculous Sandbox notion (an obvious excuse employed by the unwashed masses incapable of grasping even the most essential concepts of good marketing), I would like to know something about your qualifications:

How many completely new Web sites utilizing newly registered domain names have you created in the last two years?

Should the answer be, "0," I would politely suggest that you limit yourself to topics with which you might have some expertise.

SANDBOX

Here is the best definition I've read:

From SEOmoz:

"The observed phenomenon of a site whose rankings in the Google SERPs are vastly, negatively disparate from its rank in other search engines (including Yahoo!, MSN & Teoma) and in Google's own allin: results for the same queries."

As far as my own Sandbox experience is concerned I will provide a typical example:

One of my newer sites ranked #13 on Yahoo and #3 and #4 on MSN for a moderately competitive search term, yet was nowhere to be found on Google (#180+). Now I know that their algos differ, but they don't differ that much. Without apparent rhyme or reason, it suddenly emerged from a nearly five month tour of duty in the box to assume position #11 on Google. No spammy tactics, less than fifty hand-picked inbound links and lots of original content. It is presently ranked #3 of 2.44 million result pages.
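The SEOmoz definition quoted above can be turned into a rough check. What follows is only a minimal sketch: the function names and the 100-position threshold are made up for illustration and are not anything Google, SEOmoz, or anyone in this thread has published. It simply measures how far a site's Google position trails its median position on the other engines, using the figures from the example above.

```python
from statistics import median


def rank_disparity(google_rank, other_ranks):
    """Positions by which the Google rank trails the median rank elsewhere.

    A large positive number means the site does far worse on Google
    than on the other engines for the same query.
    """
    return google_rank - median(other_ranks)


def looks_sandboxed(google_rank, other_ranks, threshold=100):
    """Flag a 'vast, negative' disparity.

    The threshold is an arbitrary assumption for this sketch, not a
    known cutoff of any kind.
    """
    return rank_disparity(google_rank, other_ranks) >= threshold


# Figures from the post: #13 on Yahoo, #3 on MSN (best of its two
# listings), about #180 on Google while "in the box", #11 on emerging.
in_the_box = rank_disparity(180, [13, 3])
after_release = rank_disparity(11, [13, 3])

print(in_the_box, looks_sandboxed(180, [13, 3]))
print(after_release, looks_sandboxed(11, [13, 3]))
```

A check like this only describes the symptom. It says nothing about why the disparity exists, which is what the rest of the thread is arguing about.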

Mike--how would you explain such a meteoric rise in Google rankings?

Was I (a simple-minded sandbox believer) suddenly able to grasp the elusive intricacies of "good marketing?"

From WebProNews 11/16/05, Q&A With Google's Matt Cutts:

"Does the sandbox exist?
"Here comes the audience participation part: Show of hands? Most say yes. The fact is that there are some things in our indexing infrastructure that could be perceived as a 'sandbox' effect."

From Threadwatch:

"Additionally, at SES London in "Meet the crawlers", a small business raised the problem to Google of new sites being held back from ranking. There was a huge murmur in the room. The Google engineer responded that Google will act as it sees fit to control the SERPs, and effectively acknowledged that they are involved in some process to this effect."

From 925m:

"Some intrepid bloggers came away from the 2005 SES conference in San Jose with confirmation that yes, Google does place some new sites into a sort of temporary holding classification. Rand Fishkin of SEOmoz.org reports on a couple of conversations he had with some SEO gurus, including Google’s Matt Cutts that the sandbox does indeed exist, and it presents a difficult challenge for zealous search engine optimizers:

Greg & Dave [Naylor] in particular had some choice words about the subject and I commented too. We all shared the opinion that ranking new sites at Google was a pain since the inception of "sandbox" and Matt noted (this is a near word-for-word quote) - "OK, so it's really working. Even on you (guys)."

Fishkin later spoke with some folks at Meet the Google Engineers, who also confirmed the existence of a sandbox, but who also noted that sites go through a filter which determines whether or not they find their way into it. Threadwatch.org member DougS also recalls listening to a Google engineer at SES, saying that the engineer did “openly acknowledge that they place new sites, regardless of their merit, or lack thereof, in a sort of probationary category.”"

Question for Matt Cutts:

"Does gravity exist?"

"Most say yes. The fact is that there are certain observable physical phenomena that could be perceived as a 'gravity effect.'"

:cool:

dannysullivan
12-28-2005, 05:34 PM
As it happens, I was at a friend's house yesterday who has a completely new site, only a month and a half old. He was wondering why another site was outranking him. I'm probably going to go into detail about what I found, but fair to say, there was a sandbox effect from everything I could see. It ranked for some terms but wouldn't rank for other ones that it absolutely, positively should have -- given the other terms it was doing well for.

It will be clearer when I get into more details. Waiting for DaveN to get back so we can do some more playing with it. But it all came down to a single word that as far as I can tell, when used caused a different set of ranking criteria to kick in -- and age of site simply had to be a factor in this, esp. given the crud and junk that was outranking it.

Which brings me to this point. My friend's not an SEO, just a guy with a small side business he was starting. He's sitting there not understanding what's going on and likely would start changing various things on the site to fix the rankings. But if it's a sandbox effect, none of that helps necessarily. I really had to think, "what would prevent this person from wasting time, if they didn't know about this." Going to the Google site wouldn't have said something like, "you're outta luck on some terms for 6 to 9 months."

Now I fully and absolutely admit there are things I'm sure will bump him out -- the right trusted links, esp. But if I sat you down and showed you the queries, even you Mike :), I think you'd be reaching to sandbox as to why he doesn't rank at Google (but does at Yahoo and MSN).

thrion
12-28-2005, 05:37 PM
First off I should say that I am *not* a professional SEO, so my opinions may be way off base. I am a consumer, and I am the owner of an e-commerce site.

So from the standpoint of a consumer, I'm glad that Google waits a while before giving trust to new websites; it makes me feel more secure that when I am looking at a site that is listed in Google that it has either been around for a while, or there is a lot of buzz around the site. It gives me more confidence in the site.

As the owner of an e-commerce site, it was frustrating at first. We went through the whole, "hey wait, our site is designed well, looks great, performs awesome, meets all of the criteria, why are sites that have been around longer that look worse that have less of a selection showing up and we aren't?" phase.

But at the end of the day, the owner in me is glad that we didn't do better earlier. We have experienced a "ramping up" effect with traffic from Google that has given us the opportunity to tweak our internal operations. We have been able to work out kinks in our business, hire additional people in places we would not have expected to, change our look and feel in direct response to those people that did find us, and generally build a better end user experience.

Now that our company is just over a year old, we have a solid website built, our infrastructure is sound, and we are ready to accommodate a larger amount of traffic and business.

Although the site was designed with SEO/SEM in mind, we are now putting more focus on this aspect of the business, and are seeing a steady increase of traffic from Google, and we believe that what people find now is a much richer experience than if they had come to us a year before.

calebw
12-28-2005, 07:03 PM
Is the "sandbox" effect limited only to new sites, or rather to a combination of circumstances? I can't point to an example of a new site I've worked on that ranked poorly for reasons other than the reality of being new and lacking the reputation, content, links, etc... that come with a mature site.

I have, however, seen long-established sites start ranking poorly for terms which they formerly held great rankings. This has always been in conjunction with a redesign effort or site architectural changes. Here's an example with an odd twist.

A colleague of mine was recently telling me about a site of hers that is over 8 years old and holds a tremendous number of top-tier rankings for terms related to the country it is about (it is a content site with information and resources about a specific country). They made some large content changes and modified URLs to some content, and within the following weeks their rankings plummeted. The content wasn't degraded but rather improved. The URL changes could have impacted things, but they changed the URLs back to the original paths as soon as the problem was discovered.

Now what caught my attention about this story was that my colleague noted they had an investor seeking to buy the site. That investor heard about the Google problems and purportedly "talked to his friends at Google" and shortly thereafter the site "appeared" back up to where it had historically ranked. Apparently a filter or penalty of some kind had been applied to the site. The circumstances surrounding its application mimic those of other sites I've worked with that have had similar drops in rank (and unfortunately lack the close connections to Google).

Maybe 'sandbox' isn't the right term, but, as Rand noted, there are obvious filters that get applied within Google that negatively affect rankings for legitimate sites that really shouldn't rank so poorly.

calebw
12-28-2005, 07:06 PM
Now that our company is just over a year old, we have a solid website built, our infrastructure is sound, and we are ready to accommodate a larger amount of traffic and business ... and are seeing a steady increase of traffic from Google, and we believe that what people find now is a much richer experience than if they had come to us a year before.

Thrion: if all webmasters went about building sites with this in mind finding things would be MUCH easier and the search engines would love the webmasters! Way to approach the situation and I'm sure you will continue to reap benefits from this strategy in the long-run.

Jill Whalen
12-29-2005, 12:06 AM
esp. given the crud and junk that was outranking it.

Yep, that's a common symptom and dead giveaway that it's a sandbox (aging delay) thing.

Which brings me to this point. My friend's not an SEO, just a guy with a small side business he was starting. He's sitting there not understanding what's going on and likely would start changing various things on the site to fix the rankings. But if it's a sandbox effect, none of that helps necessarily. I really had to think, "what would prevent this person from wasting time, if they didn't know about this."

Tons of new site owners have no clue and are doing just that. The other thing they're doing is hiring SEO companies who really don't need to do anything because by the time they're done with their "work" the waiting period is over and they look like heroes.

Kind of funny how Google is helping to keep many SEOs in biz.

2much
12-29-2005, 04:00 AM
http://www.google.com/search?hl=en&lr=&q=tailrank

http://www.google.com/search?hl=en&lr=&q=tail+rank&btnG=Search

Created: 20-sep-2005
Expires: 20-sep-2006
Status: REGISTRAR-LOCK

http://www.google.com/search?hl=en&lr=&q=performancing

http://www.google.com/search?hl=en&lr=&q=bloggers+succeed

Created: 13-feb-2005
Expires: 13-feb-2007

Why?

http://search.msn.com/results.aspx?q=linkdomain%3Awww.performancing.com&FORM=QBRE

maybe 4000 links from multiple domains with authority status on different class c's?

http://search.msn.com/results.aspx?q=linkdomain%3Awww.tailrank.com&FORM=QBRE

this one has 2000

Both have built a massive number of links in a short period of time, for new terms that are "fresh". In Google's historical data patent they talk about "fresh terms" and how "fresh terms" like "fresh links".

I think this is part of it, of course, with trustrank.

The question is, for a normal small business selling a product, how do you get that type of effect?
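For anyone who wants to compare more examples the same way, the WHOIS creation dates quoted above can be turned into site ages. This is only a small sketch: the helper names are invented for illustration, and the date format is assumed to match the "20-sep-2005" style shown in the WHOIS output above. The reference date is the date of this post.

```python
from datetime import date

# Month abbreviations as they appear in the WHOIS output quoted above.
MONTHS = {
    "jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
    "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12,
}


def parse_whois_date(text):
    """Parse a date written like '20-sep-2005' (format assumed from the post)."""
    day, month, year = text.strip().lower().split("-")
    return date(int(year), MONTHS[month], int(day))


def age_in_days(created, as_of):
    """Days between the WHOIS creation date and a reference date."""
    return (as_of - parse_whois_date(created)).days


as_of = date(2005, 12, 29)  # date of this post
print("tailrank:", age_in_days("20-sep-2005", as_of), "days")       # 100 days
print("performancing:", age_in_days("13-feb-2005", as_of), "days")  # 319 days
```

Bear in mind that a domain's registration date is not necessarily the date the site launched or was first crawled, so this is only a rough proxy for site age.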

I, Brian
12-29-2005, 08:33 AM
My friend's not an SEO, just a guy with a small side business he was starting. He's sitting there not understanding what's going on and likely would start changing various things on the site to fix the rankings. But if it's a sandbox effect, none of that helps necessarily.

I remember at "Meet the Crawlers" at London SES this June, someone challenged the Google Engineer about newer sites being artificially constrained in the index, and a very loud murmur of agreement went around the sizable hall. That told me that ordinary small businesses were seeing real problems with it.


But it all came down to a single word that as far as I can tell, when used caused a different set of ranking criteria to kick in


Remember Florida? It seems generally accepted that an "authority concept", such as described in papers such as Hilltop or Localrank, had been implemented. But if memory recalls, these required the building of a specific keyword dependent set to be related to.

Doesn't seem implausible that Google applied the first keyword set at Florida, expanded it at Allegra, and now has a long list of commercial keywords, that when used in a query, invoke SERPs processed by authority criteria.

Jill Whalen
12-29-2005, 10:15 AM
The question is, for a normal small business selling a product, how do you get that type of effect?

Shoot for keywords you make up yourself. That's all those sites have done with your examples.

2much
12-29-2005, 03:02 PM
So far those are the only sites I've seen that have gotten right out.

If others post some more examples, we can compare similarities.

Jill Whalen
12-29-2005, 03:16 PM
But that's just the illusion of being out. I imagine those same sites don't show up for any keyword phrases that are competitive.

2much
12-29-2005, 03:32 PM
http://www.google.com/search?hl=en&lr=&q=helping+bloggers

11 million results

http://www.google.com/search?q=Build+Your+Blog+Profile+&btnG=Search&hl=en&lr=&rls=GGLG%2CGGLG%3A2005-27%2CGGLG%3Aen

22 million

You might be right, there wasn't much I could find.

It's almost like they're filling a gap with specific keywords, but the moment I look for keywords that are more widespread, they don't rank on the home page.


It'd be great if other people can post examples so we can analyze.

robwatts
12-29-2005, 03:39 PM
I have, however, seen long-established sites start ranking poorly for terms which they formerly held great rankings. This has always been in conjunction with a redesign effort or site architectural changes. Here's an example with an odd twist.


I've seen this too. It's almost a case of change too much too soon, without the requisite level of whatever it is that's required to keep you where you were beforehand, and you are toast.


The content wasn't degraded but rather improved. The URL changes could have impacted things, but they changed the URLs back to the original paths as soon as the problem was discovered.


I can imagine why, for some types of queries, a SE might downgrade a page that had changed significantly from what it was previously, but I would argue that it's handled in a less than satisfactory way. For example, users (we are told it's all about them, right?) might be trying to find a previously used site via a particular query but, due to the above effect, will be unable to, simply because the algo decided it shouldn't rank anymore. Irrespective of how useful/relevant it once was, the algo arbitrarily determines that it no longer is.

Is this effect related to how consistent a page/site is over time? Has the algo really reached a state of maturity where it detects that a page (or site, for that matter) has fluctuated from state x to state y without the requisite level of sustaining metrics, and then automatically dumps it so it can't rank for jack once more? Is this because, perhaps, it's deemed not as reliable as a page that remains more or less static for a longer period of time?

Like everyone else outside the loop, all I can do is assume and hypothesize, but I suspect that much of the above relates in some way to whether a query is determined to be informationally commercial (1), informationally educational (2), or even a mix of both (3).



(1) For the purposes of illustration, my definition of Informationally Commercial would apply to the majority of sites out there, determined via a mix of directory seed and AdWords kw data. If your site is one of these and you don't have a satisfactory level of supporting data (whitelist, links, age, history, unique content etc) and you make radical alterations/improvements/content additions, then you run the risk of getting boxed.


How much of this is manually determined (http://www.searchbistro.com/index.php?/archives/19-Google-Secret-Lab,-Prelude.html) is a key point, which I guess should in theory drive us all to make something as kick bottom as poss, so we don't fall foul of things like the above...

(2) For the purposes of illustration, my definition of Informationally Educational would apply to sites that have a .edu, .org or .gov TLD, determined as above via a mix of directory seed and AdWords kw data. To illustrate further, I would bet that 'Kondratiev Cycle (http://www.google.com/search?q=kondratiev+cycle&btnG=Search)' isn't remotely competitive, making for a good example of a query that should return SERPs that are, in the main, informationally educational and not part of some result set that is competitively bid on and stalked. The topic itself is always the 'Kondratiev Cycle'; it shouldn't ever alter, it can't, he's dead, the findings and what it entails are set in stone, end of subject. From an algo viewpoint the task is easy: it isn't competitive, no one is trying that hard to game it, so there's no need to filter sites that are relevant for it.

Sidenote: From a query perspective a relatively simple question remains: what should be the authoritative source on his works, and how should this be determined?

(3) We all remember the accounts of freshbot and the once noticeable boosts given to new content and whatnot, and this may well still apply to news-type sites/blogs with the requisite level of checked boxes (links, content, age etc). These sites are subtly different from the others in the sense that they are expected to change with greater frequency than, say, a site with pages that contain universally accepted info on the workings of a particular product or theory, which generally shouldn't alter that radically; see (2).



I guess the bottom line for some of us is that we have our list of what we think is important in avoiding the consequences of the 'sandbox' or whatever else you want to call it. In some ways the people Google is trying to outfox (SEOs) are receiving a direct benefit; it's kind of ironic that most of us do rather nicely helping or advising people who fall foul of something that, in the minds of many, was designed to prevent the perception of an easy-to-game algo.

2 cents :)

Marcia
12-29-2005, 04:18 PM
But that's just the illusion of being out. I imagine those same sites don't show up for any keyword phrases that are competitive.

Exactly. Coined phrases wouldn't even turn up in a lexicon, so there's no tangible indication of the sites being "out."

Remember Florida? It seems generally accepted that an "authority concept", such as described in papers such as Hilltop or Localrank, had been implemented. But if memory recalls, these required the building of a specific keyword dependent set to be related to.

Brian, the Hilltop interpretation was way overplayed, but there were things that were evidenced with Florida that clearly smacked of elements from Kleinberg's HITS, which is query dependent.

There were long, long discussions on it, with some analysis going on between individuals by PM of individual sites in specific topical areas, particularly where local listings were concerned. There were a few elements identified that if applied would pull a site out from under the filters. If those elements weren't applied to a site, then it didn't recover.

Doesn't seem implausible that Google applied the first keyword set at Florida, expanded it at Allegra, and now has a long list of commercial keywords, that when used in a query, invoke SERPs processed by authority criteria.

The question there would be whether it's happening at query time, or whether there's a pre-processing step happening beforehand.

IMHO Florida and the significant changes we saw around that time were a portent of things to come. I've always thought of what's perceived as the Sandbox Effect as "Florida on Steroids" with additional factors thrown in for good measure.

That doesn't mean they're 100% related, but they aren't unrelated.

claus
12-30-2005, 06:27 PM
I've been referred to this thread from Threadwatch. I'm sorry to say that I almost never use this forum as it loads far too slowly for my liking, but never mind. I discovered it, that's the important part.

Allow me to throw my two cents on the table, and thanks for some sensible posts Mike, Ian, and Danny (#1, #2, #3)[1]. Oh, and Jill was extremely close to getting to the very core of the matter at the end of her post (#10) as well, and so was Danny at the start of #14, but nobody seemed to notice?

Here's my two cents. I posted it at TW but I feel it belongs here, in context.

-----------------------------------------------------------------------

The thing that bothers me most about "sandbox" is that it's a catchy name that draws people's attention *away* from the core of the matter. It does not describe the core of the matter and it does not identify it. IOW it's a misleading term in so many ways.

Danny's post (#59, not #3) is pretty trivial stuff: He simply says that he saw a site that did not rank for terms that it really should rank for, especially given the (other) terms that it did rank for.

That's where most people will scream "sandbox". But, try a reality check: So, if I like ice-cream should it follow from that that I also like rollercoasters? Or?

What Danny has seen is simply a site that does not rank for some terms. Period. Perhaps it will acquire the credibility needed to rank for those terms, perhaps it will not, but as of now it simply does not have enough credibility to rank for those terms.

That's the core of the issue. Add credibility and the site will rank. It's not Google's fault, it's not a "sandbox" (or a "freezer"), it's lack of credibility. Plain and simple.

It's not a bad site, though, as it apparently ranks for something. That's a start. All good businesses start with a start.

-----------------------------------------------------------------------
Added:

[1] When writing this post I had only read page one of the thread, plus Danny's post #59 linked from Threadwatch. Now that I've read the rest of the thread I really need to mention Ian McAnerin's post at #44, as that's the one that is closest to my own thoughts on this matter.

Oh, and Orion too, as "data decay" (for lack of a better word) is one thing I've talked about myself in several posts (on the WMW forum) since the updates turned incremental, and it's nice to know that others have pointed out more problems with incremental ways of indexing. So, I agree, but I feel that this issue is somewhat another one. That is, I'm not sure how much these issues overlap (it could eg. be two different sides of the same coin, or it could be two different coins).

andrewgoodman
12-30-2005, 07:05 PM
I'll try to strike the middle ground between Mike and Dave (who are both non sandbox believers, though Dave does see a sandbox effect) and those still hanging on to the idea of a sandbox period.

Is there a sandbox? Various people say no, including Google, in the sense that all sites must go to Coventry for a set period of time.

Is there a sandbox-like effect? Yes, various marketers have seen this happen, and Google itself has said there are various filters that can cause a new site to have to sit in a waiting area, as it were.

Is there a great deal of confusion? Yes. To me, the idea of a sandbox has become the latest in a run of excuses for why a site doesn't necessarily do well, the universal truth trotted out for everything regardless of whether it really exists.

We've had years of this. Oh, the site's not ranking well because search engines are now doing themes, which they never ever did and to my knowledge still do not do today. But building a "themed" site often meant people took a big site and divided it into smaller ones, thus increasing the chance of getting more pages indexed. It also meant that they ended up having a number of very targeted home pages (the home page being your most important page) rather than one single home page diluted across many topics. That also meant they got more links pointing at the home pages of their more targeted sites. And often, in the process of building their "theme" sites, they created BETTER content.

All of this had nothing to do with the traditional "theme" idea that a search engine would magically look at all of the pages in your web site and reward those that were more on a particular topic. It should have been blindingly obvious to anyone that search engines did not and still do not operate in this way, otherwise Amazon or Wikipedia would never rank for anything, since they lack any theme at all. But because the medicine you took to cure your "theme disease" showed an improvement, themes became a reality for many people.

Oh, my site's not doing well because of term vector stuff now being used. Oh, it's because LSI is now big. Oh, it's because of the sandbox. In 2006, I'm sure we'll have some other smoke-and-mirrors type thing that will get trotted out.

If a site isn't ranking, that's not necessarily because of a sandbox effect. It's probably because of a variety of other reasons, but the sandbox is an easy excuse for people to blame, or they blame it without knowledge.

If it's a brand new site but not really seeming to rank as well as you might expect, then *possibly* some of the sandbox effect people have reported and Google itself says exists might be to blame. But I think it's fair to say, the number of people who say they've been sandboxed far exceeds the actual capacity of the sandbox, virtual or real where volleyball is being played :)

Another point to consider is this: we get so caught up in talking about "sites," when the underlying reality is that we are generally talking about "companies."

If you don't have a company of any substance (a startup can have substance), then your "site" will be seen by a smart SE as just that: a "site" hoping to get paid. A "company" will have a site that will, if all goes well, attract its fair share of attention of all sorts.

Which brings us back to Mike's key point: this is marketing first and foremost.

I could add something about a case study based on a high quality startup I'm working with now, that has great user-generated content, a great niche, a usability expert on board, me on board :), and lots of local rah-rah appeal. They ranked within a few weeks on all sorts of interesting queries, and we're just getting warmed up. But I don't want to dilute Mike's key point: understand the marketing first. And take it up a notch. Be a real company.

Sandbox effect could be mappable to some sort of linkage effect. New sites don't get a lot of links at first, and linking patterns take time to establish and confirm. The algo is de facto sandboxing you. But that doesn't mean you can't rank right away purely based on keyword matches and some basic checking the algo can do about your site's "veracity" and user behavior.

So - sure, this company is de facto sandboxed, but that's on some arbitrarily chosen terms where competitors just have so much link equity and (as Ian said) built-up SE "confidence," it *has* to take time to outrank them. But on other relatively competitive phrases, including some branded ones, this startup (launched in October) already ranks *in the top 5* - up there with pages from kijiji, Craigslist, etc. etc.

The domain is brand new, also. They actually had to change their company name in late Oct.

All in all, then, the term sandbox and the concept of sandbox are ill-conceived. Some sites are ranking OK right away -- why, in this case? My guess is the site is content laden (reviews and such, user-generated), and Goog has a way of measuring user interest in the site early on.

This aside, again, why dilute an excellent point made by Mike: look at this as marketing. Forget your "site" for a moment. And take it up even one level further. Be a "company". A real company. A great one, even.

Jill Whalen
12-30-2005, 09:17 PM
Sandbox effect could be mappable to some sort of linkage effect. New sites don't get a lot of links at first, and linking patterns take time to establish and confirm. The algo is de facto sandboxing you. But that doesn't mean you can't rank right away purely based on keyword matches and some basic checking the algo can do about your site's "veracity" and user behavior.

Ok, so you've never seen the sandbox in action either (like Mike).

It's not anything like that.

It's a great site, great links, whatever. Doesn't matter. But for any keyword phrases that have any other sites using the same phrases in their content, it won't rank.

Unless you've seen it in action, you simply won't get it.

But you can count on the fact that it does indeed exist. Regardless of whether you've seen it or not! (Like ghosts and UFOs? ;) )

All in all, then, the term sandbox and the concept of sandbox are ill-conceived. Some sites are ranking OK right away -- why, in this case?

No they're not. Not for keyword phrases where other sites have targeted those phrases.

claus
12-31-2005, 09:43 AM
Here's ten web credibility guidelines from Stanford (http://www.webcredibility.org/guidelines/). They're good, but that's it. It's *not* the be-all-and-end-all solution to the Google credibility issue. They illustrate the point made by Ian and me perfectly, though.

I have to repeat: That link is *not* the secret recipe you're looking for. It should help you think about some important issues, but it's no easy fix.

Also, if you don't like the terms "credibility", "trust", or "confidence", try "signals of quality". That's a term the SE engineers use themselves, AFAIK, and it's really, ultimately, what it's all about. Feel free to disagree.

----------------------------------------------------------------
Added:

There are some technical issues that we must not ignore, but there's no purpose in discussing the tech issues blindly, forgetting about the reasoning (as most do). IMO, all Google tech is built on reasoning. First you do the reasoning, then you build the algo. Most SEO forum contributors seem to think it's the other way round, or that there is only tech, and no reasoning.

----------------------------------------------------------------

>> Companies / sites

And really, what Google is all about is pages. The interesting thing is that you're right; Google's increasingly considering site factors as well. At least that's my own understanding. Again, feel free to disagree.


Anyway, happy new year to all :)

andrewgoodman
01-03-2006, 02:52 AM
Ok, so you've never seen the sandbox in action either (like Mike).

It's not anything like that.

It's a great site, great links, whatever. Doesn't matter. But for any keyword phrases that have any other sites using the same phrases in their content, it won't rank.

Unless you've seen it in action, you simply won't get it.

But you can count on the fact that it does indeed exist. Regardless of whether you've seen it or not! (Like ghosts and UFOs? ;) )



No they're not. Not for keyword phrases where other sites have targeted those phrases.

My client's site ranked quickly on phrases like "decorium toronto." It was and is on the first or second page in Google results -- #14 in my search. (Decorium is a well known furniture company.)

I guess I may be missing what you mean by "where other sites have targeted those phrases." All I know is that they are ranking competitively with pages from Kijiji, yellowpages.ca, workopolis.com, neattv.com, and some various spammy pages. The #1 listing overall is, as it should be, decorium.com. They have a title tag that begins with Decorium Furniture Toronto.

But my client is on page 2 of the SERP's. That's not ahead of everyone, but it's there. What did I miss?

In a general sense, this site is getting very few search referrals and is not yet well indexed. But when I search, it's there. These aren't monstrously competitive phrases, but ...

Marcia
01-03-2006, 02:56 AM
>>First you do the reasoning, then you build the algo.

claus, that may be the initial sequence, but then the algo itself has no reasoning capability, it's just a set of programs.

Stanford's criteria for credibility is fine and good, but it basically takes most all human judgment and reasoning and there's precious little in there that an algo can determine. Last updated, sure - but can they detect if it's any indication of value? nope.

There is virtually nothing mentioned in any search patents or white papers that has anything to do with companies and their credibility or quality or their marketing expertise, as such, not in the sense that human judgment can determine by observation and reasoning. That's where the link-based algos came in, PageRank and HITS - trying to use links as measures of importance and reputation. And those can be and *are* tampered with, or in some cases, the deck could be stacked in the first place, as in the case of long-established Fortune 500's which get around filters that result in the "sandbox effect". But that isn't by reasoning or judgment, it's by metrics that are mathematically measurable.

There's a *group* of filters operating, and people who don't DO SEO don't comprehend it, and those who work with Fortune 500's and the like won't experience it because they're working with a deck that's stacked in their favor to begin with, that's got the ability to by-pass and overcome the effects of some of the key filters to begin with.

Robert_Charlton
01-03-2006, 05:06 AM
My client's site ranked quickly on phrases like "decorium toronto." It was and is on the first or second page in Google results -- #14 in my search.

If you search for "decorium toronto" on Google, you'll see that Google cites only 14,500 pages (http://www.google.com/search?hl=en&lr=&q=decorium+toronto&btnG=Search) containing all the words, and only 18 pages (http://www.google.com/search?hl=en&q=%22decorium+toronto%22+&btnG=Google+Search) satisfying the exact query. It returns 8 of them. It's not a very competitive phrase.

But my client is on page 2 of the SERP's. That's not ahead of everyone, but it's there. What did I miss?

That many of us are talking about searches with hundreds of thousands or millions of pages containing all the words, and many tens or hundreds of thousands of pages containing exact matches.

Further, that these pages often belong to hospitals that don't rank for their own names... to industry leaders that changed a domain name and then vanished for their own products... or to highly relevant, well-linked sites that are nowhere on Google but at the top everywhere else, and which, one day, will apparently magically and arbitrarily pop up at the top on Google too.

Jill Whalen
01-03-2006, 12:40 PM
My client's site ranked quickly on phrases like "decorium toronto." It was and is on the first or second page in Google results -- #14 in my search. (Decorium is a well known furniture company.)

The phrase, according to Keyword Discovery, had a whopping 12 searches in the past year. Hardly something that could be considered a competitive phrase.

That's exactly the type of phrase you CAN show up for while still being affected by the aging delay.

And according to Robert's post above, you came in 14 out of 18. Out of the aging delay, you will probably be a lot higher if optimized well.

Marcia
01-03-2006, 05:22 PM
>>Out of the aging delay, you will probably be a lot higher if optimized well.

If it's decorium.com - that's a flash splash page with interior pages in a frames structure, so optimization doesn't really enter into it, particularly with ranking for the company name. Ranking for furniture or home furnishings would be where the action is.

Plus, if that is the domain being referred to, it's been registered since 2001 so neither would any aging delay factor apply, because it's too old a site to be "sandboxed."

andrewgoodman
01-03-2006, 08:47 PM
Sure, fair enough.

But while we're on this topic, could you explain to me how any brand new site is going to rank well for the phrase "home furnishings" -- "sandbox" or no "sandbox"? You would have to build up relevant linkage and other indicators of a page's meaning & status before you could rank at all on those kinds of phrases. So the page's score would be so low that it would be zero or very near zero, and not worth displaying at all. I'm thinking as long as Google's algo has been sophisticated enough to filter out the worst kinds of link spam and assess behavioral/quality indicators, there would have been a sandbox-like effect on competitive phrases.

Today, this site [we were talking about homestars.ca by the way, but it started life as homedirection.ca] appears #1 & #3 on a designer's name, "hildi weiman," etc. In this case you can see that the #1 listing is of the old site, so both sites are ranking on the phrase... which makes this whole site a bad example to use, because it'll be awhile before Google figures out that homestars and not homedirection is the real site. I agree that's not a popular phrase, but...

We seem now to be defining the sandbox effect as "not getting high listings on very competitive phrases." But isn't the point of assessing the linking structure of the web one that would have inherently involved a sandbox effect for engines like Google and Teoma, so this current situation is more of a continuation/extension of something that always existed?

I wrote a blurb in October 1999 - http://www.traffick.com/story.asp?StoryID=29 - about Google, pointing to an argument that was emerging at the time against Google and PageRank:

"Google's reliance on an automated measure of 'reputation' may magnify the popularity of the biggest, most popular sites, and make it difficult for newer, high quality sites to be discovered." and "A major issue may be ‘lag time’ or inertia. Older, more established sites may fare better, and this can become a vicious circle. Some now-obscure pages buried deep in a major website's archives may rank too high."

On competitive phrases, hasn't it long been the case that SE's won't just rank new sites out of the blue on popular terms?

How can anyone point with certainty to the "day" the "sandbox" was "invented"? Probably because it never was, but what seems like a sandbox effect has ebbed and flowed as the technology has evolved. You could argue that editorial review (dmoz, etc.) is a "sandbox" as well. Editorially, sites and pages need to be "accepted" and gain some kind of confidence score higher than "infinitesimal" before they're going to be featured on a search engine. For a new site, they don't even have basic site-specific info for how often it updates. That data takes time to gather. Who expects to rank on a term like "home furnishings toronto" overnight?

I checked the registration dates for the current owners of the sites in the top ten listings on that particular query. They are:

Sep. 1996
Jan. 2001
Feb. 1997
Jan. 1999
Feb. 1996
[Google Directory category]
Mar. 1999
Apr. 1998
Jan. 1998
Jan. 2003

--
next 10:

May 1996
Aug 2000
Jan. 2003
[already mentioned]
[dealtime]
[already mentioned]
Aug. 1994
Nov. 2000
[already mentioned]
[already mentioned]

--
next 10:

Oct. 1999
Feb. 2000
Oct. 1994
May 1997
Feb. 2005 [page on romanian adult industry discussion site / redirect to a furniture company page / thus spam ] [so we get to the 25th result before freshness trumps reliability, and smack, a spam page is the result - until here, the youngest site is three years old]
Oct. 1995
[already mentioned]
May 2003
Nov. 1997
Mar. 2000

--
next 10:

[already mentioned]
[already mentioned]
[craigslist]
Nov. 2002
Mar. 1996
Jul. 2001
Aug. 2003 [spammy, broken, irrelevant, India]
[yahoo directory]
[yellowpages.ca]
Sep. 2000

Some sandbox! If you were telling a client how long you'd have to wait before having a shot at being ranked in the top 40 on a moderately popular term like "home furnishings toronto," (and of course you get virtually no clicks outside of the top 10 anyway), you'd have to tell them THREE YEARS!! (Unless they have special Romanian or Indian spam techniques up their sleeve, in which case they'd make it to #25 or #37 and get no clicks anyway.) Or if you wanted to give them the average or median age of site in those positions: more like 4-6 years.
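For anyone who wants to check the arithmetic, here is a rough sketch that turns the registration dates listed above into ages. It rests on a few assumptions that are not in the list itself: ages are measured as of January 2006 (the date of this post), entries shown as a directory, a portal or "[already mentioned]" are left out, and the two entries described as spam are flagged so they can be excluded. The names (REGISTRATIONS, age_in_months and so on) are made up for illustration.

```python
from statistics import median

# Assumption: ages are measured as of January 2006, the date of this post.
AS_OF_YEAR, AS_OF_MONTH = 2006, 1

MONTHS = {"Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
          "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12}

# (month, year, described_as_spam), in ranking order.
# Directory/portal entries and "[already mentioned]" repeats are omitted.
REGISTRATIONS = [
    # results 1-10
    ("Sep", 1996, False), ("Jan", 2001, False), ("Feb", 1997, False),
    ("Jan", 1999, False), ("Feb", 1996, False), ("Mar", 1999, False),
    ("Apr", 1998, False), ("Jan", 1998, False), ("Jan", 2003, False),
    # results 11-20
    ("May", 1996, False), ("Aug", 2000, False), ("Jan", 2003, False),
    ("Aug", 1994, False), ("Nov", 2000, False),
    # results 21-30
    ("Oct", 1999, False), ("Feb", 2000, False), ("Oct", 1994, False),
    ("May", 1997, False), ("Feb", 2005, True), ("Oct", 1995, False),
    ("May", 2003, False), ("Nov", 1997, False), ("Mar", 2000, False),
    # results 31-40
    ("Nov", 2002, False), ("Mar", 1996, False), ("Jul", 2001, False),
    ("Aug", 2003, True), ("Sep", 2000, False),
]


def age_in_months(month, year):
    """Whole months between the registration month and the as-of month."""
    return (AS_OF_YEAR - year) * 12 + (AS_OF_MONTH - MONTHS[month])


ages = [age_in_months(m, y) for m, y, _ in REGISTRATIONS]
clean_ages = [age_in_months(m, y) for m, y, spam in REGISTRATIONS if not spam]

print(f"dated entries: {len(ages)}")
print(f"median age: {median(ages) / 12:.1f} years")
print(f"mean age: {sum(ages) / len(ages) / 12:.1f} years")
print(f"youngest non-spam entry: {min(clean_ages) / 12:.1f} years")
```

The result shifts a little depending on the as-of date and on whether the repeats and spam entries are counted, so treat it as a sanity check on the rough figures above rather than a precise statistic.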

Again I say: that's some sandbox!

But maybe we should be going for something more retail-practical, like a certain type of chair [recliners, for example]. Looking at one such query I see sites, including a client's site, in top ten positions with a fairly similar pattern as far as domain age goes: they are all old, having been registered in years like 1997, 1995, 2000, etc. Either that or they are portal sites like bizrate or "knowns" like Google Answers. Again, quite a sandbox!

Although I was too lazy to check beyond a few of them, I also assume that *all* of the above (save for the spam entry that snuck in, possibly because Google was having trouble doing its automated checking on foreign domains??) have a stable, long-standing pattern of inlinks from sites with high confidence. As one of the SE reps said in Chicago, a small retailer just doesn't get 1,500 links all of a sudden, spontaneously. Most would be happy to have 1,500 customers.

Both the age of the sites, and the continued importance of some types of links, underscore the limitations of the search technology. There is no particular value to this link, for example:

http://www.ctv.ca/servlet/ArticleNews/show/CTVShows/1104961165034_100369888

Except that it's a pretty sweet link from a national television station to a major furniture retailer. In short, a kind of "crony system." Little wonder then that companies will try to recreate spammy versions of same. To weed these schemes out, further/stronger filters and double-checks of quality are used, and that makes the pre-existing "sandbox effect" or "crony system" stronger (but if it's relaxed, spammy stuff comes right back in, so it can't be relaxed too much).

I guess a lot of it is about who you know, as always, huh?

Just playing devil's advocate. If a site is brand spanking new, I'm not sure how any of its pages could have enough PageRank (or other link recognition, or quality indicators) to outrank established pages/sites on core, popular terms. Based on what? By definition, such new sites come in tabula rasa (if they don't, please explain to me how they don't) and are de facto assumed to be spam until they prove otherwise. Guilty until proven innocent.
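To make the tabula rasa point concrete, here is a minimal sketch of the textbook PageRank power iteration on a made-up four-page web. It is only a toy: the graph, the page names and the 0.85 damping factor are assumptions for illustration, and it says nothing about the filters or other signals Google actually layers on top. It just shows that a page with a single inlink starts with a small share of the link-based score compared with pages that already link to each other.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration over a dict of page -> outlinks."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets the same small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # Split this page's damped rank evenly among its outlinks.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical toy web: three established pages that link to each other,
# plus one new site with a single inlink (from old_c).
LINKS = {
    "old_a": ["old_b", "old_c"],
    "old_b": ["old_a", "old_c"],
    "old_c": ["old_a", "old_b", "new_site"],
    "new_site": ["old_a"],
}

ranks = pagerank(LINKS)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:10s} {score:.3f}")
```

In this toy graph the new site lands at the bottom purely because of link structure, with no special waiting-period rule anywhere in the code.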

If a new site has 4-5 quality new links, maybe it should rank, but I see why it won't. Mainly because link schemes are so prevalent and SEO's have been sitting around aging domains and buying them and so forth, I'm sure there is some kind of waiting period in order to gain high enough confidence that a site isn't seen as "suspect."

Could something (evidence of major unmistakeable user interest in a new site) override that waiting period? I suspect so, but presumably there has always been a waiting period of at least 60-90 days before being decently indexed. Just because it seems longer now doesn't mean that's the sandbox length or anything like that. I don't know if "sandbox length" even makes any sense. Perhaps site history with organic is similar to how account history is measured in the AdWords algo now: undisclosed, but likely on a continuum. There is no set waiting period - you simply build a history (Ian said it -- reliability/confidence increases with more data).

Another thing I notice is that a few rather spammy (link farm driven) listings still do well on the phrase "home furnishings toronto." (I won't say which ones are the cheesiest as you're not supposed to out people on the forums.) No doubt, those listings will eventually be gone. But it looks like the reason they'll be dropping may have everything to do with spam reports by users and competitors. The top 20 listings in popular categories eventually come to the attention of the engines, and eventually the ones getting too much traffic by virtue of deliberate interlinking will get penalized based on judgment, not algorithms. Honestly, all you really have to do in a lot of cases is to look at the inlinks, then glance at the sites involved. So - human filtering is happening. There is so much going on behind the scenes, it's not funny.

If a site that got registered five years ago gets penalized for link farming, that could mean that over time, the average age of top-ranking sites on valuable phrases gets EVEN OLDER... at least until there's a user backlash and users decide they prefer freshness & diversity at the cost of at least some spam.

So are all new sites seen as suspect in a spam-ridden world? Yes, it seems, when it comes to ranking on popular, lucrative, high volume phrases, as long as users and competitors alike scream about spam.

Is this new? No, I don't think so.

Is it good for searchers? Not really. It would be better if SE's could understand what is really relevant to a user instead of relying so much on "fail-safe" methods like giving so much credence to domain age and stability/age/reliability/relevance of linkage. Someday they'll be better at personalization etc.

Can you explain the sandbox rules, or how long it might take, to rank well on a popular term? I doubt it!

Anyway, to sum up, isn't the idea of a sandbox on core popular terms built right into what the current generations of SE's actually measure, which is reputation, etc.? Is that not business as usual?

I suppose the problem with trying to suss out just exactly what the so-called sandbox is, is that its rules & parameters can change without notice, and exceptions can prove & disprove "rules" all over the place.

Jill Whalen
01-03-2006, 10:22 PM
how any brand new site is going to rank well for the phrase "home furnishings"

They're not.

But there is some middle ground between a phrase like the one you previously mentioned which basically gets no searches, and a phrase like "home furnishings."

It's those middle ground phrases that the aging delay eats for breakfast.

When you do searches for phrases that appear in the title tag of a page, yet those words happen to be on some other people's pages, but your relevant page shows after every single one, you will know what the aging delay feels like. I can't stress enough how completely different it is to the usual "takes awhile to rank" phenomenon that we all know and love.

Can you explain the sandbox rules, or how long it might take, to rank well on a popular term? I doubt it!

I haven't dealt with it enough to say for sure, but from the sites I've seen, there are no rules. It's simply wait 9 months (approx) and bang, you're out, regardless of anything else you do.

dazzlindonna
01-03-2006, 10:35 PM
On competitive phrases, hasn't it long been the case that SE's won't just rank new sites out of the blue on popular terms?

Not necessarily. I know myself and many other SEOs could fairly easily rank for competitive phrases within 30 days (back in 2003 and early 2004) with a brand new site - of course with lots of links being thrown at the site. The average site created by John Doe...well, yeah, it probably took him a while.

How can anyone point with certainty to the "day" the "sandbox" was "invented"?

Not the "day", but certainly a narrow range of time. Many SEOs began noticing the change in March/April of 2004, myself included. Between Jan. and March, I launched several new sites...30 days to the top. March/April and onwards...no SERPs love anymore. At that time, we all started taking a long hard look at what was going on, and it was obvious that something major had changed.

Marcia
01-03-2006, 10:44 PM
I have, however, seen long-established sites start ranking poorly for terms on which they formerly held great rankings. This has always been in conjunction with a redesign effort or site architectural changes. That's because canonicals and site structures are being looked at, which *may* enter into the collection of factors that make up the delay, but a site that drops out of rankings because of a structural change has nothing to do with being sandboxed - it's something else.

Can you explain the sandbox rules, or how long it might take, to rank well on a popular term? I doubt it!

There can be no rules, because it isn't a "thing" - it's a set of qualities and values looked for, combined with a set of filters. It isn't just one thing - it's a combination of things all put together.

Some sites will never come out of it because of continuing to run into certain filters or not accruing enough of what it takes to rank for given search terms. At some point, that's no longer the "sandbox delay" for those sites, they just don't qualify to rank or have something wrong that's preventing ranking.

mcanerin
01-04-2006, 01:08 AM
There can be no rules, because it isn't a "thing" - it's a set of qualities and values looked for, combined with a set of filters. It isn't just one thing - it's a combination of things all put together.

Exactly!

It's like asking what causes death. The list of possibilities and causes is almost endless and therefore the question can't really be answered as asked, but that doesn't mean there is no such thing as death as a result.

It just means that it's the WRONG question.

Any question involving the word "sandbox" is probably badly worded and therefore unanswerable, IMO.

That doesn't mean that the effect isn't real - it means that by using the term "sandbox" you have limited yourself already in the type of answer - it's an inherently biased question because it assumes that the definition of "sandbox" is fixed or even can be limited to a single set of circumstances.

The final effect on the other hand, like death, is pretty unmistakable, if you know what to look for.

Ian

I, Brian
01-04-2006, 05:49 AM
But while we're on this topic, could you explain to me how any brand new site is going to rank well for the phrase "home furnishings" -- "sandbox" or no "sandbox"?


It's not simply "brand new" but simply "newer" - which can mean a domain even a few years old. If you link build for a pre-2000 and post-2000 domain, the difference in ranking ability is significant, even when the link records for both sites are pretty much the same.


I suppose the problem with trying to suss out just exactly what the so-called sandbox is, its rules & parameters can change without notice, and exceptions can prove & disprove "rules" all over the place.

Certainly it's a concept that has become more developed at Google. For a brief history of how the term entered the SEO language, see this:
http://www.platinax.co.uk/blogs/brian/30-09-2004/the-google-sandbox-an-early-history/

Nacho posted a good list of links discussing the issue at SEW a while back:
http://forums.searchenginewatch.com/showthread.php?t=1917

claus
01-04-2006, 12:53 PM
>>First you do the reasoning, then you build the algo.

claus, that may be the initial sequence, but then the algo itself has no reasoning capability, it's just a set of programs.

Agree. Well, I don't even need to agree, it's as obvious as daylight :)

Stanford's criteria for credibility is fine and good, but it basically takes most all human judgment and reasoning and there's precious little in there that an algo can determine. Last updated, sure - but can they detect if it's any indication of value? nope.

That's not really an unusual situation. It's a fairly "classic" kind of problem in Marketing Research (or in any other type of research for that matter, I believe). You want to measure something, but the thing you really want to measure just doesn't lend itself to measuring.

So, you have to measure something else instead. Things that have some kind of connection to the items that you really want to track. Approximations.

I did specifically say that that article was *not* the easy fix. I did hope that it would be an inspiration, though.

There is virtually nothing mentioned in any search patents or white papers that has anything to do with companies and their credibility or quality or their marketing expertise, as such, not in the sense that human judgment can determine by observation and reasoning.

Again I don't even need to agree as it's obvious. The keyword from above is "approximation (http://en.wikipedia.org/wiki/Approximation)". You can't make a computer perform human judgment, so you make it do something it can instead - only, you try to make sure that what the computer does is somehow related to what you would really want it to do (if only it could).

That's where the link-based algos came in, PageRank and HITS - trying to use links as measures of importance and reputation.

Exactly. That's hitting the nail right on the head. It's "measures of" and not the real thing itself.

And those can be and *are* tampered with, or in some cases, the deck could be stacked in the first place, as in the case of long-established Fortune 500's which get around filters that result in the "sandbox effect".

You know, lists... *sigh* I often build and maintain various types of lists. Quite elaborate lists, usually.

From an engineer's perspective (I know a few) they're a PITA. You forget things that should have been on them, you include too much, you put something on them and then things change and they shouldn't be there any more. E.g., an F500 list would only be the real list once a year when it's published. And then you've got all the Enrons of this world, too - as well as companies that move from serving one market to serving another, splitting up, reorganizing, merging, changing names, and buying/selling. And then there are errors.

Name any kind of list - as long as it gets big enough it's not a list anymore, it's a jungle.

But of course, in the Brick-and-mortar world there are companies specializing in making and maintaining lists of real businesses, as well as those that exist only on paper. So, I guess you could outsource that. I'm not saying I agree with you, and not that I disagree either.

I only find "stacked decks" as such a plain stupid thing to do, as the flexibility that Google needs would require a lot of manpower, and their usual power preference is electrical. Then again, perhaps they're being a bit stupid - it would not be the first time. Even with a high number of PhD's on the payroll they sometimes try out things that they haven't really got enough prior experience or knowledge about, and sometimes to an outsider some of those things look quite stupid.

Anyway, to cut them some slack:

What I find more likely is the thought that some of these "F500" have some properties that the other ones just don't have. IOW they're both out in the exact same rain, the F500s have just got a bigger umbrella. Or, they're on the exact same road, the F500s have just got more horsepower.


But that isn't by reasoning or judgment, it's by metrics that are mathematically measurable.

There's a *group* of filters operating

Okay, let me be a bit provocative...

Q: What's new here?

A: Nothing, really. It's the same as it ever was. Google just got smarter that's all.

and people who don't DO SEO don't comprehend it, and those who work with Fortune 500's and the like won't experience it because they're working with a deck that's stacked in their favor to begin with, that's got the ability to by-pass and overcome the effects of some of the key filters to begin with.

I could repeat the Q and A here. However that's not very productive. There are always differences in perspective. However, I don't deny the existence of those issues that you all refer to as "sandbox" issues. Not at all. But I *still* don't speak of a sandbox if I can help it. I maintain that the term is wrong and misleading.

Even "rain" does not equal "wet" - "no umbrella" plus "rain" is a bit closer.

I think it's more appropriate and fruitful to think in terms of "Survival of the fittest" than in terms of sandboxes. <tongue-in-cheek> It is "organic" SERPs after all (sorry about the pun, couldn't help it ;) ) </tongue-in-cheek>

And those that are fittest in some contexts will be the established sites, while in other contexts it will be new-ish sites. (And of course, by "X sites" I mean "pages on X sites").

So, to turn the attention to something productive again, think about your typical plant. How would a new plant of any particular type get a slice of the precious sunlight? Those "signals of quality" I mention are what makes the sun shine on a smaller or larger part of your plant instead of the other plants. And of course, the old and big plants tend to overshadow the new ones. No wonder.

So, let's say that you have a sun that favours the plants that have the highest likelihood of becoming nutritional ingredients in a salad. That might turn out to be the plants that already get some sunlight. Yes, of course it's skewed. No, of course all animals on the farm are not equal.

Hope you get my point now :)

</rant>

Marcia
01-04-2006, 04:20 PM
claus, did you read this whole thread? Scroll back up and read msg #73.

claus
01-04-2006, 07:25 PM
Yes I read it... I don't understand, perhaps I missed something? I just read that post a second time, still don't get it, I'm sorry - what did I miss? Was your post partially a response to #73 and not mine, is that it? If so, I'm sorry I didn't get it. :confused:

andrewgoodman
01-09-2006, 07:55 PM
I want to thank Jill in particular for explaining the "sandbox-like effect" to me so patiently. I don't think it hurts to ask "stupid questions," though. Because these forums tend to get rather self-referential and before you know it, some post someone made in October is required reading even though it was only hints and guesses.

So, although I can certainly see the existence of a sandbox-like effect, I do also hope I offered a bit of food for thought.

claus's statement:

Q: What's new here?

A: Nothing, really. It's the same as it ever was. Google just got smarter that's all.

...was probably closest to what I was trying to get at.

Considering all the junk that gets thrown so aggressively at the engines, it's a good thing for users that these new pages do get "sandboxed".

What has to happen in the future, though, is that Google has to get *even* smarter. The sandboxy treatment of newer pages is a pretty blunt instrument. It raises real questions about the moat-like divide between older and newer sites/pages. Can you keep extending your lead on newer sites, all else being equal, if you have "tenure" and "history"? If you have to wait up to a year to gain decent traction on SE's, then you might be in pretty shaky shape by the time you "come out." And that in turn will make it hard to crack the top rankings, etc.

But if Google tries to get *even* smarter to *validate* sites in some way, then what form does that take? Clearly, they are thinking about that on several fronts. They have a verification system for the Local listings product; they have editors for AdWords and News; they have SiteMaps; etc.

So in the future Google seems poised to consider forms of paid inclusion or at least "trusted inclusion"; or to introduce further editorial intervention (or more weight on editorial gatekeepers) that they don't admit is editorial at all.

I think we do need to be asking more specific questions here, trying to isolate *what* needs to be older to help you expedite the exit. Domain? Pages known to Google? Links? Business registration date? Other? A combination of things? It may well be that the sandbox-like treatment of new pages & sites is itself in a kind of infancy, and will soon become more sophisticated, so the "effect" is felt very differently by different new sites & businesses.

PhilC
01-09-2006, 09:30 PM
This is a fantastic thread!

It seems to me that there isn't any major difference of opinions as to whether or not the sandbox effect exists. Andrew Goodman suggests that it's just a development of the age-old delay in getting rankings for decent searchterms, but he does seem to accept that there is a change to the age-old delay. The other side says the same thing, except that it's not merely a development of the age-old delay, but an intentional thing by Google.

Certainly there was a specific time period when the sandbox effect was realised, as dazzlindonna pointed out, so either a new sandbox effect started then, or a development of the age-old delay came into play then. Either way both Andrew's view, and the other view, amount to the same thing - there is a sandbox-like effect, which can be simply called "the sandbox". The only real difference is how it came about, but that doesn't matter.

I rarely create new sites, and I've no personal experience of the sandbox, but I'd like to suggest something that occurred to me whilst reading this thread...

It's almost unanimous that long tail terms aren't affected by the sandbox, and it's the more popular terms that are affected. The thinking seems to be that it's the searchterms that make the difference. But how about this for an alternative possibility:-

The reason for the difference between searchterms is not because the more popular ones are listed in some way, but it's the site's/page's confidence score that determines it all. So when Google can get a large enough results set from pages that have a good confidence score, they show them. But when they can't get a large enough results set, they include pages that don't have a high confidence score - just like they do with pages in the Supplemental index. Since popular searchterms are targeted by many sites, there is no problem in getting a large enough results set without needing to include low confidence pages.

I've never liked the idea of a search engine having a list of searchterms for special treatment. It's come up a number of times in the past, and it just seems unGoogle-like to me. I seriously like the 'confidence' idea that's been put forward as what the sandbox is about, and, for me, the size of the results set is a much more palatable idea than an arbitrary list of popular searchterms.

Marcia
01-09-2006, 09:36 PM
We can also ask why some sites go up and are never sandboxed at all - which some aren't. If it were strictly an age thing that wouldn't happen; it isn't that simple.

It's a collection of algo requirements and filters that result in the "sandbox effect" for most new sites, but obviously, some sites don't get sandboxed, so those must pass muster in spite of their age. So there have to be factors or indicators that over-ride the filters and the age factor and allow some sites to rank.

PhilC
01-09-2006, 09:42 PM
We can also ask why some sites go up and are never sandboxed at all - which some aren't. If it were strictly an age thing that wouldn't happen; it isn't that simple.

Is that a response to my post, Marcia? If it is, I didn't suggest anything about how the confidence score is arrived at. I only suggested that it might not be the searchterms themselves that decide whether or not a low confidence page is listed, but the size of the results set that Google can compile for the query.

added:
In Google's original engine, they compiled a results set of about 40,000 pages, which they then ranked according to certain criteria. I'm suggesting that, when they can get a suitably sized results set for a query without needing to include low confidence pages, as they can for popular searchterms, then they don't include low confidence pages. But when they can't get a suitably sized results set, they do include low confidence pages. In that way, it isn't the searchterms themselves that decide whether or not a low confidence page is ranked, but the size of the results set.
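
A rough sketch of the idea described above, purely to illustrate the suggestion. It is speculation about how such a filter might behave, not Google's actual algorithm. The names `Page`, `compile_results_set` and the 0.5 threshold are all hypothetical; only the 40,000 figure comes from the post.

```python
from dataclasses import dataclass

# The 40,000 figure is the results-set size mentioned above.
# The 0.5 threshold is an invented placeholder.
RESULT_SET_SIZE = 40_000
CONFIDENCE_THRESHOLD = 0.5


@dataclass
class Page:
    url: str
    confidence: float  # hypothetical per-page confidence score, 0.0 to 1.0


def compile_results_set(matching_pages, set_size=RESULT_SET_SIZE,
                        threshold=CONFIDENCE_THRESHOLD):
    """Fill the results set with high-confidence pages first, and fall back
    to low-confidence pages only when there are not enough of the former."""
    high = [p for p in matching_pages if p.confidence >= threshold]
    if len(high) >= set_size:
        # Popular searchterm: enough trusted pages exist, so the
        # low-confidence ("sandboxed") pages never enter the set.
        return high[:set_size]
    low = [p for p in matching_pages if p.confidence < threshold]
    # Long-tail searchterm: top up the set with low-confidence pages.
    return high + low[:set_size - len(high)]
```

With a tiny `set_size` standing in for the 40k, a new page with a low score is left out when two trusted pages already fill a set of two, but is included when the set has room for three. Ranking would then happen on whatever ends up in the set.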

Marcia
01-09-2006, 10:11 PM
Is that a response to my post, Marcia?

No Phil it wasn't (I posted before you did, just hadn't hit enter yet), but the confidence score is a good point, apart from arbitrarily being 100% age-dependent.

So how come some sites never experience the "effect" and don't ever get hit with it, while others have found ways to get out from under it?

I'm not convinced it's totally number of results available to return for a search, because I've got a site that never got sandboxed and it started out ranking for search terms with from 200K pages returned on up to close to 500K pages returned and never hit the sandbox; it's been steady like that all along. It's now ranking for a search term that's got close to 2 million pages returned for it - after 5-6 months. BTW, it's not a commercial site and while there may be plenty of pages returned and those initial search terms get looked for a lot, there's no commercial value.

PhilC
01-10-2006, 12:01 AM
If there's a confidence score, we don't know how they come by it - we don't know what a site needs to have (apart from time) to get a good enough score to be ranked properly, so we don't know if your site had what it takes.

200k results isn't a lot, and it's possible that the searchterms weren't popular enough to make many of the 200k pages rank well for it, but they got there because they satisfied the criteria for only one of the searchterm words. Perhaps the compiling algo gets what pages it can that contain all the words, including low confidence pages if necessary, plus all the pages it can that contain fewer words - if you see what I mean. I haven't worded that very well.

I'm suggesting that the searchterms weren't popular enough to fill a 40k results set without including low confidence pages. Don't forget that the results set isn't the 200k or n million results etc. They get a small results set regardless of how many actual results there are.

andrewgoodman
01-10-2006, 03:28 AM
There is also personalization to factor into the mix -- and, I naively hope, coming soon... better personalization.

Experts at the engines tell us that users wouldn't want to set a dial to make "page freshness" a preference for them, but one way or another, SE's are going to privilege fresh pages on your behalf, in different ways. (Or, they'll punish them.)

True, but I still love playing with that feature on MSN Search. (Or at least I do in theory. The feature is too one-dimensional to be effective.)

Of course freshness is something you measure on established sites. Fresh pages on fresh sites... another matter worthy of sandbox-like discourse.

In any case, if you were to ask me, I'd say having 40 results that don't include more than a handful of new pages is a potential negative, but then again, I suppose that's query-dependent. On a stable term, you get "stable pages". On a "hot" term, perhaps freshness matters. Which is too bad, because that means my blog post on Scarlett Johansson which has ranked in the top ten for over a month now is soon going to cool off.

You can only assume there is so much for the SE's to consider in matching pages with users, that it would be wrong to get too down in the dumps about there being a permanent "sandbox" affecting all new ventures.

P.S. I don't like the idea of buying an old domain and bolting your new site onto it, because domain age is probably going to get downplayed as a criterion if too many sites start doing that. Plus, if you've chosen your company name carefully, why would you go out and buy up some other name?? On the other hand, acquiring established sites that others are undervaluing could be a smart move.

Jill Whalen
01-10-2006, 09:34 AM
Glad to see you finally drank the koolaid, Andrew. Now can you pour a glass for Mike? :D

Robert_Charlton
01-10-2006, 06:09 PM
Glad to see you finally drank the koolaid, Andrew. Now can you pour a glass for Mike? :D

I think Mike's drinking champagne now and doesn't drink Kool Aid anymore. ;)

Jill Whalen
01-10-2006, 06:42 PM
Maybe if we tell him it's Merlot? ;)

claus
01-10-2006, 07:07 PM
You're all entitled to your opinions, and I'm not really getting any kicks out of being a rebel these days, but I stick to my own opinion nevertheless...

All I'm saying is that if you think you can rank for [Texas Holdem] in a month just by filling a few thousand pages with related words and throwing a few hundred (or thousand) links up, then your strategy might not be as long term as you would want it to be. It just might evolve to be a thing of the past, if it's not there already.

Thinking a bit ahead, if you're simply doing what anybody else with sufficient resources could be (or already is) doing, and you're not among the first doing it, then why should you rank at all? You know how many people climb Mount Everest each year? Would you like to carefully examine a list of the names?

Whatever... I don't think I've got more to add at this moment.

Big Bill
01-11-2006, 04:14 AM
And I'll say it again... If your web site doesn't rank anywhere at all at a search engine - it's probably because it has no differentiating appeal or simply because it sucks.

What you actually need is called advertising and promotion and it has nothing to do with code of any kind, or subdomains or servers, or...



If I might, what a site needs can be summed up in one word; provenance.
People can look it up for themselves, I imagine.

BB

EcoSea
01-17-2006, 06:40 AM
My SEO GODS on one thread awesome. :)

MSN is listing some of my pages for my new site CheapCharliesHotels. I put almost same same but different content on EcoSea and the stuff on EcoSea gets ranked sometimes in a week or less on Google and the Cheapcharlieshotels only gets ranked if you type the url exactly. My limited experience is that MSN and Google rank the pages fairly much the same even though they use different methods so it seems to me to be semantics as there is a lag for new sites on Google.

I am wondering though that if you run a ppc against a new domain how much you would have to spend to have enough traffic to get listed earlier on Google? Assuming of course that you are not link rich or a Guru.

PS found out that cheap hotels is bad But I do kinda feel like Robin Hood for poor guys like me trying to find a hotel for under $100 a night.

Thanks for all the insights fred

Big Bill
01-17-2006, 09:36 AM
My SEO GODS on one thread awesome. :)



I blush...

BB :)

andrewgoodman
01-17-2006, 06:52 PM
If I might, what a site needs can be summed up in one word; provenance.
People can look it up for themselves, I imagine.


Nice word. Which brings me to a thorny issue.

Domain age is a poor substitute for proof of origin, historical record, or ownership. Why? Because cybersquatters and domain hijackers have played havoc with a significant portion of the domains owned by legit businesses.

Here's the (real life) scenario, name changed for privacy's sake. It affects the mid-sized business who hasn't been able to "get on top of" their web presence in a timely fashion, but wishes to do so now.

TrousersByPatricia is a beloved niche retail brand, with a brick and mortar presence in twelve malls in the midwest, one in Florida, and one in California. Weeks before the owner, Imelda (there is no Patricia), went to register the TrousersByPatricia domain name to launch an ecommerce site, some clever SOB registered TrousersByPatricia.com, as well as the .net, and the .org, etc. Angry, but thinking there was no recourse, Imelda contented herself with windfall brick and mortar sales, and nearing the peak of her business, wisely sold a stake in her company for $3 million. She sailed around the world, dressed in marvelous fashions, to celebrate. Then upon returning she sulked for nearly two years before hiring a consultant to figure things out with the web stuff. At this point, it was decided that the .biz was available, so it was launched. TrousersByPatricia.biz was born.

But to this day, the hijacked domain, which gets decent organic traffic and generates affiliate revenue from a faux catalog, outranks the "real" TrousersByPatricia.biz site. Worse, the hijacked site outranks the "real" site on queries like "trousers by patricia," "trousersbypatricia," and "patty's pants."

Eventually, Imelda will prevail, especially if she hires me, Al Franken, and some smart lawyers. But for 2-3 years while all this gets sorted out, a very bad site gets too much credit, and a real live retailer gets outranked by the fake site. Which just goes to show, search engines can be pretty dumb.

andrewgoodman
01-17-2006, 06:54 PM
My SEO GODS on one thread awesome. :)

MSN is listing some of my pages for my new site CheapCharliesHotels. I put almost same same but different content on EcoSea and the stuff on EcoSea gets ranked sometimes in a week or less on Google and the Cheapcharlieshotels only gets ranked if you type the url exactly. My limited experience is that MSN and Google rank the pages fairly much the same even though they use different methods so it seems to me to be semantics as there is a lag for new sites on Google.

I am wondering though that if you run a ppc against a new domain how much you would have to spend to have enough traffic to get listed earlier on Google? Assuming of course that you are not link rich or a Guru.

PS found out that cheap hotels is bad But I do kinda feel like Robin Hood for poor guys like me trying to find a hotel for under $100 a night.

Thanks for all the insights fred

On the PPC question, my impression from recent new accounts is that it makes no difference in the short term, as far as exiting the sandbox-like nether world. Over a period of years, it all helps you withstand the ups and downs of algos, as you build a real live customer base, links, and other behaviors out in the "real world" which a smart SE can measure. We are talking about a war of position requiring more staying power than the NHL playoffs.

stevenmad
07-10-2006, 01:02 PM
hi there, I have about twenty websites and I've never experienced the sandbox. The only thing that I do, is some basic SEO, and that's it. Sometimes I submit, others I don't, they always get spidered and then included in the index after a few days, less than a week.

scrubs
07-10-2006, 01:12 PM
hi there, I have about twenty websites and I've never experienced the sandbox. The only thing that I do, is some basic SEO, and that's it. Sometimes I submit, others I don't, they always get spidered and then included in the index after a few days, less than a week.

You must have some sort of linking strategy in place to assure they get spidered that quickly? Are you achieving good natural results after this short space of time?

I personally would not tell a customer they would achieve natural results in less than a week.

brandall
07-10-2006, 01:13 PM
hi there, I have about twenty websites and I've never experienced the sandbox. The only thing that I do, is some basic SEO, and that's it. Sometimes I submit, others I don't, they always get spidered and then included in the index after a few days, less than a week.
Getting spidered and indexed has NOTHING to do with the sandbox, so the fact that your sites get spidered and indexed tells us nothing about whether or not you avoid the sandbox.

Want to convince me you can avoid the sandbox? Buy a new domain today. Build a site on it and rank in the top 20 for Gift Cards in the next 3 - 4 months. 3 years ago, that could be done. Today, from what I have seen, it cannot.

scrubs
07-10-2006, 01:16 PM
Those were the days brandall :) These days it's more about enjoying a challenge!

Phil Beck
07-10-2006, 04:50 PM
I set up my website in March 2005, and it took a couple of months or so for Google to find it. At that time, to see it anywhere in Google I had to type in my name or postal code. Generic terms such as IVA appeared nowhere, and "Individual voluntary arrangement" first appeared on page 37, slowly climbing up to page 26. Even though my website got to PageRank 5/10, it sat well below sites with much lower PageRank.

Now, suddenly, last week I found that my website has been promoted from page 26 to page 1 of the rankings. The website hasn't changed in any way, nor has it suddenly acquired any high ranking links. My conclusion: I've dropped out of the sandbox after 15 months, and now I'm finally ranking in line with my competitors.

EcoSea
07-13-2006, 12:20 AM
My site Cheap Charlies Hotels has my 5 years of amateur SEO/SEM experience behind it and I am running PPC at it, and finally, after 8 months or so, you can find it by searching on the name Cheap Charlies Hotels. The Google index had it, if you typed the URL exactly, within 2 weeks from spidering, while on MSN it was showing up near the top on some terms within a month. I am seeing the exact same thing with a new site I put up after Cheap Charlies.