#81
"Is", "Isn't", or "Always Was"
Sure, fair enough.
But while we're on this topic, could you explain to me how any brand new site is going to rank well for the phrase "home furnishings" -- "sandbox" or no "sandbox"? You would have to build up relevant linkage and other indicators of a page's meaning & status before you could rank at all on those kinds of phrases. So the page's score would be so low that it would be zero or very near zero, and not worth displaying at all. I'm thinking as long as Google's algo has been sophisticated enough to filter out the worst kinds of link spam and assess behavioral/quality indicators, there would have been a sandbox-like effect on competitive phrases.

Today, this site [we were talking about homestars.ca by the way, but it started life as homedirection.ca] appears #1 & #3 on a designer's name, "hildi weiman," etc. In this case you can see that the #1 listing is of the old site, so both sites are ranking on the phrase... which makes this whole site a bad example to use, because it'll be awhile before Google figures out that homestars and not homedirection is the real site. I agree that's not a popular phrase, but...

We seem now to be defining the sandbox effect as "not getting high listings on very competitive phrases." But isn't the point of assessing the linking structure of the web one that would have inherently involved a sandbox effect for engines like Google and Teoma, so this current situation is more of a continuation/extension of something that always existed?

I wrote a blurb in October 1999 - http://www.traffick.com/story.asp?StoryID=29 - about Google, pointing to an argument that was emerging at the time against Google and PageRank: "Google's reliance on an automated measure of 'reputation' may magnify the popularity of the biggest, most popular sites, and make it difficult for newer, high quality sites to be discovered." and "A major issue may be ‘lag time’ or inertia. Older, more established sites may fare better, and this can become a vicious circle. Some now-obscure pages buried deep in a major website's archives may rank too high."

On competitive phrases, hasn't it long been the case that SE's won't just rank new sites out of the blue on popular terms? How can anyone point with certainty to the "day" the "sandbox" was "invented"? Probably because it never was, but what seems like a sandbox effect has ebbed and flowed as the technology has evolved.

You could argue that editorial review (dmoz, etc.) is a "sandbox" as well. Editorially, sites and pages need to be "accepted" and gain some kind of confidence score higher than "infinitesimal" before they're going to be featured on a search engine. For a new site, they don't even have basic site-specific info for how often it updates. That data takes time to gather.

Who expects to rank on a term like "home furnishings toronto" overnight? I checked the registration dates for the current owners of the sites in the top ten listings on that particular query. They are:

Sep. 1996, Jan. 2001, Feb. 1997, Jan. 1999, Feb. 1996, [Google Directory category], Mar. 1999, Apr. 1998, Jan. 1998, Jan. 2003

-- next 10:
May 1996, Aug. 2000, Jan. 2003, [already mentioned], [dealtime], [already mentioned], Aug. 1994, Nov. 2000, [already mentioned], [already mentioned]

-- next 10:
Oct. 1999, Feb. 2000, Oct. 1994, May 1997, Feb. 2005 [page on Romanian adult industry discussion site / redirect to a furniture company page / thus spam]
[so we get to the 25th result before freshness trumps reliability, and smack, a spam page is the result - until here, the youngest site is three years old]
Oct. 1995, [already mentioned], May 2003, Nov. 1997, Mar. 2000

-- next 10:
[already mentioned], [already mentioned], [craigslist], Nov. 2002, Mar. 1996, Jul. 2001, Aug. 2003 [spammy, broken, irrelevant, India], [yahoo directory], [yellowpages.ca], Sep. 2000

Some sandbox!

If you were telling a client how long you'd have to wait before having a shot at being ranked in the top 40 on a moderately popular term like "home furnishings toronto" (and of course you get virtually no clicks outside of the top 10 anyway), you'd have to tell them THREE YEARS!! (Unless they have special Romanian or Indian spam techniques up their sleeve, in which case they'd make it to #25 or #37 and get no clicks anyway.) Or if you wanted to give them the average or median age of the sites in those positions: more like 4-6 years. Again I say: that's some sandbox!

But maybe we should be going for something more retail-practical, like a certain type of chair [recliners, for example]. Looking at one such query I see sites, including a client's site, in top ten positions with a fairly similar pattern as far as domain age goes: they are all old, having been registered in years like 1997, 1995, 2000, etc. Either that or they are portal sites like bizrate or "knowns" like Google Answers. Again, quite a sandbox!

Although I was too lazy to check beyond a few of them, I also assume that *all* of the above (save for the spam entry that snuck in, possibly because Google was having trouble doing its automated checking on foreign domains??) have a stable, long-standing pattern of inlinks from sites with high confidence. As one of the SE reps said in Chicago, a small retailer just doesn't get 1,500 links all of a sudden, spontaneously. Most would be happy to have 1,500 customers.

Both the age of the sites, and the continued importance of some types of links, underscore the limitations of the search technology. There is no particular value to this link, for example: http://www.ctv.ca/servlet/ArticleNew...5034_100369888 Except that it's a pretty sweet link from a national television station to a major furniture retailer. In short, a kind of "crony system." Little wonder then that companies will try to recreate spammy versions of same.
To weed these schemes out, further/stronger filters and double-checks of quality are used, and that makes the pre-existing "sandbox effect" or "crony system" stronger (but if it's relaxed, spammy stuff comes right back in, so it can't be relaxed too much). I guess a lot of it is about who you know, as always, huh?

Just playing devil's advocate. If a site is brand spanking new, I'm not sure how any of its pages could have enough PageRank (or other link recognition, or quality indicators) to outrank established pages/sites on core, popular terms. Based on what? By definition, such new sites come in tabula rasa (if they don't, please explain to me how they don't) and are de facto assumed to be spam until they prove otherwise. Guilty until proven innocent.

If a new site has 4-5 quality new links, maybe it should rank, but I see why it won't. Mainly because link schemes are so prevalent and SEO's have been sitting around aging domains and buying them and so forth, I'm sure there is some kind of waiting period in order to gain high enough confidence that a site isn't seen as "suspect." Could something (evidence of major unmistakable user interest in a new site) override that waiting period? I suspect so, but presumably there has always been a waiting period of at least 60-90 days before being decently indexed. Just because it seems longer now doesn't mean that's the sandbox length or anything like that. I don't know if "sandbox length" even makes any sense.

Perhaps site history with organic is similar to how account history is measured in the AdWords algo now: undisclosed, but likely on a continuum. There is no set waiting period - you simply build a history (Ian said it -- reliability/confidence increases with more data).

Another thing I notice is that a few rather spammy (link farm driven) listings still do well on the phrase "home furnishings toronto." (I won't say which ones are the cheesiest as you're not supposed to out people on the forums.)
No doubt, those listings will eventually be gone. But it looks like the reason they'll be dropping may have everything to do with spam reports by users and competitors. The top 20 listings in popular categories eventually come to the attention of the engines, and eventually the ones getting too much traffic by virtue of deliberate interlinking will get penalized based on judgment, not algorithms. Honestly, all you really have to do in a lot of cases is to look at the inlinks, then glance at the sites involved. So - human filtering is happening. There is so much going on behind the scenes, it's not funny.

If a site that got registered five years ago gets penalized for link farming, that could mean that over time, the average age of top-ranking sites on valuable phrases gets EVEN OLDER... at least until there's a user backlash and users decide they prefer freshness & diversity at the cost of at least some spam.

So are all new sites seen as suspect in a spam-ridden world? Yes, it seems, when it comes to ranking on popular, lucrative, high volume phrases, as long as users and competitors alike scream about spam. Is this new? No, I don't think so. Is it good for searchers? Not really. It would be better if SE's could understand what is really relevant to a user instead of relying so much on "fail-safe" methods like giving so much credence to domain age and stability/age/reliability/relevance of linkage. Someday they'll be better at personalization etc.

Can you explain the sandbox rules, or how long it might take, to rank well on a popular term? I doubt it!

Anyway, to sum up, isn't the idea of a sandbox on core popular terms built right into what the current generations of SE's actually measure, which is reputation, etc.? Is that not business as usual? I suppose the problem with trying to suss out just exactly what the so-called sandbox is, is that its rules & parameters can change without notice, and exceptions can prove & disprove "rules" all over the place.
Last edited by andrewgoodman : 01-03-2006 at 09:08 PM. |
|
#82
Quote:
But there is some middle ground between a phrase like the one you previously mentioned which basically gets no searches, and a phrase like "home furnishings." It's those middle ground phrases that the aging delay eats for breakfast.

When you do searches for phrases that appear in the title tag of a page, yet those words happen to be on some other people's pages, but your relevant page shows after every single one, you will know what the aging delay feels like. I can't stress enough how completely different it is to the usual "takes awhile to rank" phenomenon that we all know and love.
Quote:
Last edited by Jill Whalen : 01-03-2006 at 10:24 PM. |
|
#83
Quote:
Quote:
|
|
#84
Quote:
Quote:
Some sites will never come out of it because of continuing to run into certain filters or not accruing enough of what it takes to rank for given search terms. At some point, that's no longer the "sandbox delay" for those sites, they just don't qualify to rank or have something wrong that's preventing ranking. |
|
#85
Quote:
It's like asking what causes death. The list of possibilities and causes is almost endless and therefore the question can't really be answered as asked, but that doesn't mean there is no such thing as death as a result. It just means that it's the WRONG question. Any question involving the word "sandbox" is probably badly worded and therefore unanswerable, IMO. That doesn't mean that the effect isn't real - it means that by using the term "sandbox" you have limited yourself already in the type of answer - it's an inherently biased question because it assumes that the definition of "sandbox" is fixed or even can be limited to a single set of circumstances. The final effect on the other hand, like death, is pretty unmistakable, if you know what to look for. Ian
|
#86
Quote:
Quote:
http://www.platinax.co.uk/blogs/bria...early-history/ Nacho posted a good list of links discussing the issue at SEW a while back: http://forums.searchenginewatch.com/...ead.php?t=1917 |
|
#87
Quote:
Quote:
So, you have to measure something else instead. Things that have some kind of connection to the items that you really want to track. Approximations. I did specifically say that that article was *not* the easy fix. I did hope that it would be an inspiration, though. Quote:
Quote:
Quote:
From an engineer's perspective (I know a few) they're a PITA. You forget things that should have been on them, you include too much, you put something on them and then things change and they shouldn't be there anymore. E.g. a F500 list would only be the real list once a year when it's published. And then you've got all the Enrons of this world, too - as well as companies that move from serving one market to serving another, splitting up, reorganizing, merging, changing names, and buying/selling. And then there are errors. Name any kind of list - as long as it gets big enough it's not a list anymore, it's a jungle.

But of course, in the brick-and-mortar world there are companies specializing in making and maintaining lists of real businesses, as well as those that exist only on paper. So, I guess you could outsource that.

I'm not saying I agree with you, and not that I disagree either. I just find "stacked decks" as such to be a plain stupid thing to do, as the flexibility that Google needs would require a lot of manpower, and their usual power preference is electrical.

Then again, perhaps they're being a bit stupid - it would not be the first time. Even with a high number of PhD's on the payroll they sometimes try out things that they haven't really got enough prior experience or knowledge about, and sometimes to an outsider some of those things look quite stupid.

Anyway, to cut them some slack: What I find more likely is the thought that some of these "F500" have some properties that the other ones just don't have. IOW they're both out in the exact same rain, the F500s have just got a bigger umbrella. Or, they're on the exact same road, the F500s have just got more horsepower.
Quote:
Q: What's new here? A: Nothing, really. It's the same as it ever was. Google just got smarter that's all. Quote:
Even "rain" does not equal "wet" - "no umbrella" plus "rain" is a bit closer. I think it's more appropriate and fruitful to think in terms of "Survival of the fittest" than in terms of sandboxes. <tongue-in-cheek> It is "organic" SERPs after all (sorry about the pun, couldn't help it) </tongue-in-cheek>

And those that are fittest in some contexts will be the established sites, while in other contexts it will be new-ish sites. (And of course, by "X sites" I mean "pages on X sites").

So, to turn the attention to something productive again, think about your typical plant. How would a new plant of any particular type get a slice of the precious sunlight? Those "signals of quality" I mention are what makes the sun shine on a smaller or larger part of your plant instead of the other plants. And of course, the old and big plants tend to overshadow the new ones. No wonder.

So, let's say that you have a sun that favours the plants that have the highest likelihood of becoming nutritional ingredients in a salad. That might turn out to be the plants that already get some sunlight. Yes, of course it's skewed. No, of course all animals on the farm are not equal.

Hope you get my point now </rant>
Last edited by claus : 01-04-2006 at 12:56 PM.
|
#88
claus, did you read this whole thread? Scroll back up and read msg #73.
|
|
#89
Yes I read it... I don't understand, perhaps I missed something? I just read that post a second time, still don't get it, I'm sorry - what did I miss? Was your post partially a response to #73 and not mine, is that it? If so, I'm sorry I didn't get it.
|
#90
Pesky agglomeration of granules revisited
I want to thank Jill in particular for explaining the "sandbox-like effect" to me so patiently. I don't think it hurts to ask "stupid questions," though. Because these forums tend to get rather self-referential and before you know it, some post someone made in October is required reading even though it was only hints and guesses.
So, although I can certainly see the existence of a sandbox-like effect, I do also hope I offered a bit of food for thought. claus's statement: Q: What's new here? A: Nothing, really. It's the same as it ever was. Google just got smarter that's all. ...was probably closest to what I was trying to get at. Considering all the junk that gets thrown so aggressively at the engines, it's a good thing for users that these new pages do get "sandboxed". What has to happen in the future, though, is that Google has to get *even* smarter. The sandboxy treatment of newer pages is a pretty blunt instrument. It raises real questions about the moat-like divide between older and newer sites/pages. Can you keep extending your lead on newer sites, all else being equal, if you have "tenure" and "history"? If you have to wait up to a year to gain decent traction on SE's, then you might be in pretty shaky shape by the time you "come out." And that in turn will make it hard to crack the top rankings, etc. But if Google tries to get *even* smarter to *validate* sites in some way, then what form does that take? Clearly, they are thinking about that on several fronts. They have a verification system for the Local listings product; they have editors for AdWords and News; they have SiteMaps; etc. So in the future Google seems poised to consider forms of paid inclusion or at least "trusted inclusion"; or to introduce further editorial intervention (or more weight on editorial gatekeepers) that they don't admit is editorial at all. I think we do need to be asking more specific questions here, trying to isolate *what* needs to be older to help you expedite the exit. Domain? Pages known to Google? Links? Business registration date? Other? A combination of things? It may well be that the sandbox-like treatment of new pages & sites is in itself, in a kind of infancy. And will soon become more sophisticated, so the "effect" is felt very differently by different new sites & businesses. 
Last edited by andrewgoodman : 01-09-2006 at 08:46 PM. Reason: spelling error |
|
#91
This is a fantastic thread!
It seems to me that there isn't any major difference of opinions as to whether or not the sandbox effect exists. Andrew Goodman suggests that it's just a development of the age-old delay in getting rankings for decent searchterms, but he does seem to accept that there is a change to the age-old delay. The other side says the same thing, except that it's not merely a development of the age-old delay, but an intentional thing by Google.

Certainly there was a specific time period when the sandbox effect was realised, as dazzlindonna pointed out, so either a new sandbox effect started then, or a development of the age-old delay came into play then. Either way both Andrew's view, and the other view, amount to the same thing - there is a sandbox-like effect, which can be simply called "the sandbox". The only real difference is how it came about, but that doesn't matter.

I rarely create new sites, and I've no personal experience of the sandbox, but I'd like to suggest something that occurred to me whilst reading this thread...

It's almost unanimous that long tail terms aren't affected by the sandbox, and it's the more popular terms that are affected. The thinking seems to be that it's the searchterms that make the difference. But how about this for an alternative possibility:-

The reason for the difference between searchterms is not because the more popular ones are listed in some way, but it's the site's/page's confidence score that determines it all. So when Google can get a large enough results set from pages that have a good confidence score, they show them. But when they can't get a large enough results set, they include pages that don't have a high confidence score - just like they do with pages in the Supplemental index. Since popular searchterms are targeted by many sites, there is no problem in getting a large enough results set without needing to include low confidence pages.

I've never liked the idea of a search engine having a list of searchterms for special treatment. It's come up a number of times in the past, and it just seems unGoogle-like to me. I seriously like the 'confidence' idea that's been put forward as what the sandbox is about, and, for me, the size of the results set is a much more palatable idea than an arbitrary list of popular searchterms.
Last edited by PhilC : 01-09-2006 at 09:32 PM.
|
#92
We can also ask why some sites go up and are never sandboxed at all - which some aren't. If it were strictly an age thing that wouldn't happen; it isn't that simple.
It's a collection of algo requirements and filters that result in the "sandbox effect" for most new sites, but obviously, some sites don't get sandboxed, so those must pass muster in spite of their age. So there have to be factors or indicators that over-ride the filters and the age factor and allow some sites to rank. |
|
#93
Quote:
added: In Google's original engine, they compiled a results set of about 40,000 pages, which they then ranked according to certain criteria. I'm suggesting that, when they can get a suitably sized results set for a query without needing to include low confidence pages, as they can for popular searchterms, then they don't include low confidence pages. But when they can't get a suitably sized results set, they do include low confidence pages. In that way, it isn't the searchterms themselves that decide whether or not a low confidence page is ranked, but the size of the results set. Last edited by PhilC : 01-09-2006 at 09:51 PM. |
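PhilC's suggestion above can be sketched in a few lines of Python. This is only an illustration of the hypothesis as stated in the thread, not a description of how Google actually works: the `Page` fields, the 0.5 confidence cut-off, the toy scores and the function name `compile_results` are all invented for the example. The 40,000 default is the results-set size the post attributes to Google's original engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    url: str
    relevance: float   # hypothetical query-match score, higher is better
    confidence: float  # hypothetical trust score between 0 and 1


def compile_results(candidates, target_size=40_000, min_confidence=0.5):
    """Build the results set for one query, preferring high-confidence pages."""
    trusted = [p for p in candidates if p.confidence >= min_confidence]
    pool = list(trusted)
    if len(pool) < target_size:
        # Not enough trusted pages: top up with the best of the rest.
        untrusted = sorted(
            (p for p in candidates if p.confidence < min_confidence),
            key=lambda p: p.confidence,
            reverse=True,
        )
        pool.extend(untrusted[: target_size - len(pool)])
    # Rank whatever made it into the pool by relevance.
    return sorted(pool, key=lambda p: p.relevance, reverse=True)[:target_size]


established = [Page(f"old-site-{i}.example", 0.6, 0.9) for i in range(5)]
newcomer = Page("new-site.example", 0.95, 0.1)

# "Popular" term: enough trusted pages to fill the set, so the newcomer is left out.
popular = compile_results(established + [newcomer], target_size=3)

# "Long tail" term: too few trusted pages, so the newcomer is let in.
long_tail = compile_results(established[:2] + [newcomer], target_size=3)

print([p.url for p in popular])
print([p.url for p in long_tail])
```

With these toy numbers the same new page is missing from the "popular" results and ranks first in the "long tail" results, even though no list of searchterms appears anywhere in the code. Only the number of trusted candidates differs.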
|
#94
Quote:
So how come some sites never experience the "effect" and don't ever get hit with it, while others have found ways to get out from under it? I'm not convinced it's totally number of results available to return for a search, because I've got a site that never got sandboxed and it started out ranking for search terms with from 200K pages returned on up to close to 500K pages returned and never hit the sandbox; it's been steady like that all along. It's now ranking for a search term that's got close to 2 million pages returned for it - after 5-6 months. BTW, it's not a commercial site and while there may be plenty of pages returned and those initial search terms get looked for a lot, there's no commercial value. Last edited by Marcia : 01-09-2006 at 10:30 PM. |
|
#95
If there's a confidence score, we don't know how they come by it - we don't know what a site needs to have (apart from time) to get a good enough score to be ranked properly, so we don't know if your site had what it takes.
200k results isn't a lot, and it's possible that the searchterms weren't popular enough to make many of the 200k pages rank well for it, but they got there because they satisfied the criteria for only one of the searchterm words. Perhaps the compiling algo gets what pages it can that contain all the words, including low confidence pages if necessary, plus all the pages it can that contain fewer words - if you see what I mean. I haven't worded that very well. I'm suggesting that the searchterms weren't popular enough to fill a 40k results sets without including low confidence pages. Don't forget that the results set isn't the 200k or n million results etc. They get a small results set regardless of how many actual results there are. |
|
#96
There is also personalization to factor into the mix -- and, I naively hope, coming soon... better personalization.
Experts at the engines tell us that users wouldn't want to set a dial to make "page freshness" a preference for them, but one way or another, SE's are going to privilege fresh pages on your behalf, in different ways. (Or, they'll punish them.) True, but I still love playing with that feature on MSN Search. (Or at least I do in theory. The feature is too one-dimensional to be effective.)

Of course freshness is something you measure on established sites. Fresh pages on fresh sites... another matter worthy of sandbox-like discourse.

In any case, if you were to ask me, I'd say having 40 results that don't include more than a handful of new pages is a potential negative, but then again, I suppose that's query-dependent. On a stable term, you get "stable pages". On a "hot" term, perhaps freshness matters. Which is too bad, because that means my blog post on Scarlett Johansson which has ranked in the top ten for over a month now is soon going to cool off.

You can only assume there is so much for the SE's to consider in matching pages with users, that it would be wrong to get too down in the dumps about there being a permanent "sandbox" affecting all new ventures.

P.S. I don't like the idea of buying an old domain and bolting your new site onto it, because domain age is probably going to get downplayed as a criterion if too many sites start doing that. Plus, if you've chosen your company name carefully, why would you go out and buy up some other name?? On the other hand, acquiring established sites that others are undervaluing could be a smart move.
Last edited by andrewgoodman : 01-10-2006 at 03:33 AM.
|
#97
Glad to see you finally drank the koolaid, Andrew. Now can you pour a glass for Mike?
|
#98
Quote:
|
#99
Maybe if we tell him it's Merlot?
|
#100
nevermind...
You're all entitled to your opinions, and I'm not really getting any kicks out of being a rebel these days, but I stick to my own opinion nevertheless...
All I'm saying is that if you think you can rank for [Texas Holdem] in a month just by filling a few thousand pages with related words and throwing a few hundred (or thousand) links up, then your strategy might not be as long term as you would want it to be. It just might evolve to be a thing of the past, if it's not there already.

Thinking a bit ahead, if you're simply doing what anybody else with sufficient resources could be/are doing, and you're not among the first doing it, then why should you rank at all? You know how many people climb Mount Everest each year? Would you like to carefully examine a list of the names?

Whatever... I don't think I've got more to add at this moment.