Google and website 'profiles'
PhilC
07-08-2005, 12:25 PM
Here's an extract from this article (http://www.seo-scoop.com/direct_link.cfm?thepost=418) that wheelsoffire pointed to in this thread (http://forums.searchenginewatch.com/showthread.php?t=6678):-
The question asked of the Google engineer was in regards to whether or not Google penalizes or filters a site that has a sudden influx of new pages. This was the explanation given (paraphrased)...
Google does not specifically filter for any one particular thing like that. Instead the algorithm looks at other similar situations and determines if the action is good or bad. For example, if a 2-page site suddenly adds 10,000 pages, there may in fact be a legitimate reason for it to do so. But the algorithm will first make the assumption that the action is "suspicious" and will then look at a large sample of other 2-page sites that have suddenly added 10,000 pages. If the majority of those sites were considered spammy, then your site will get lumped into the same spammy category. Of course, if the majority of those sites were deemed to be legitimate, then your site would likewise be deemed legitimate.
The question was about the sudden addition of a large number of pages to a site, but the response was in general terms, and is indicative of what's inside the Google 'mind'. I can think of a number of things that the response may account for - the sandbox, and Bourbon, for instance.
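To make the engineer's description concrete, here is a minimal sketch of that kind of comparison, written in Python. It is only an illustration of the idea as paraphrased above, not Google's actual method: the `SiteChange` record, the `similar` test, its 50% tolerance, and the simple majority vote are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class SiteChange:
    pages_before: int   # size of the site before the change
    pages_added: int    # number of pages that suddenly appeared
    was_spam: bool      # hypothetical label from an earlier review


def similar(a, b, tolerance=0.5):
    """Treat two changes as similar if both sizes are within a relative tolerance."""
    def close(x, y):
        return abs(x - y) <= tolerance * max(x, y, 1)
    return close(a.pages_before, b.pages_before) and close(a.pages_added, b.pages_added)


def classify(change, history):
    """Label a change by majority vote over similar changes seen in the past."""
    cohort = [h for h in history if similar(change, h)]
    if not cohort:
        return "unknown"  # nothing comparable to judge against
    spam_share = sum(h.was_spam for h in cohort) / len(cohort)
    return "spammy" if spam_share > 0.5 else "legitimate"


history = [
    SiteChange(2, 10000, True),
    SiteChange(3, 9000, True),
    SiteChange(2, 12000, False),
    SiteChange(500, 20, False),
]
verdict = classify(SiteChange(2, 10000, False), history)
```

With this made-up history, two of the three comparable sites were spam, so the new site is lumped in with them even though it may be perfectly legitimate. That is the collateral damage the engineer's answer implies.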
In another thread, we discussed whether or not Google had targeted directories in the Bourbon update. There is evidence that they were hit, but there are directories that weren't hit. There are also non-directory sites that were hit. One of my comments was that maybe they hadn't specifically targeted directories at all, but that some other thing may have encompassed a fair number of directories, and I'm wondering if the Google engineer's response might shed some light on Bourbon. I'm also wondering if it might shed some light on the sandbox.
The response shows that Google incorporates what I'm thinking of as 'profiles' into their system, so that a website's profiles are examined when necessary; e.g. the sudden addition of a lot of new pages, or the sudden addition of a lot of new IBLs, would be compared with sites that displayed the same changes in the past. Other profiles make sense to me as well - new sites that have too good a number of IBLs already, site structure (internal linkages), the site structure of new sites, sudden major changes in site structure, and so on.
Perhaps some new sites get into the sandbox because of their 'new site profile'. It could be the reason why some new sites escape that fate. Perhaps Bourbon hit many directories because Google applied a new profile that snapped many of them up along with other sites.
All this is just my immediate imagination, but that engineer's response is very revealing to my way of thinking, and I'd like to see some discussion on it.
wheelsoffire
07-08-2005, 03:32 PM
Reading that article shed some major light on my situation.
I had a site of 500+ pages. At the time, I didn't have a clue about SEO, Sandbox, or penalties.
The site was finally starting to get some traffic from the SEs, including Google.
Then one day I decided to change the extensions of all my pages from php to html. I did this after buying the book "Google Hacks". There was a part by Brett Tabke that said you should try to stick with .html or .htm.
So without really thinking about any consequences, I did a big find and replace, uploaded the site, and it was done.
Not until a month or so later did I realize that I should have used 301 redirects.
My traffic disappeared. MSN was actually pretty fast to realize what happened and re-index all of the new pages.
Though my site is back in the Google index (only after using sitemaps), a search for my domain name without the dots won't even show my site. I am only just recently getting referrals again, but only for search terms in exact quotes.
So after reading that article, I feel pretty sure that Google only saw a jump of almost 600 pages all at once. I know it's not 10,000, but I still think it's enough to set off some alarms.
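For anyone making the same php-to-html change, the 301 mapping amounts to something like the sketch below. It is a hedged illustration in Python rather than a server config: the function name `redirect_for` and the example.com URLs are made up, and in practice the rule would live in the web server's own redirect settings.

```python
from urllib.parse import urlsplit, urlunsplit


def redirect_for(url):
    """Return (301, new_url) for an old .php URL, or None if no redirect applies."""
    parts = urlsplit(url)
    if not parts.path.endswith(".php"):
        return None
    # Swap only the extension; host, query string and fragment are kept as-is.
    new_path = parts.path[:-len(".php")] + ".html"
    return 301, urlunsplit(parts._replace(path=new_path))


result = redirect_for("http://example.com/articles/page.php?id=3")
```

The point of the permanent (301) status is that each old URL tells the engines where its replacement lives, instead of the old pages vanishing and 500+ apparently new pages appearing at once.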
PhilC
07-08-2005, 04:09 PM
It could be, but I'd be more inclined to think it's simply that the new urls are new pages as far as the engines are concerned, and they have to go through the system and find their own rankings. That would be clouded by the continuing existence of the old urls in the index for some time. That would be my first thought, anyway.
It's a pity that the book didn't teach the point about 301s, or maybe you overlooked it :)
>profiles
I think they call them signals.
It's a very good perspective from which to view your own sites, and quite the cutting edge, it seems. First we had text-driven, then link-driven, and now we have signals. I like it.
PhilC
07-08-2005, 05:51 PM
They can call them what they want - I like 'profiles' because it describes them better :)
Actually, I think we are talking about different things. A signal is what is raised when something happens; a profile is the pattern that is compared - a small site suddenly adding a lot more pages, for instance.
>a small site suddenly adding a lot more pages
That's a signal :)
The question is... is it a signal of quality?
PhilC
07-08-2005, 06:55 PM
It's a signal when it's an event, but it's a profile when it's a pattern to compare ;)
It's the concept of matching profiles/patterns that interests me, because it could account for a number of things. I know the idea has been thrown into the ring in little 'maybes' and as conjecture before, but I don't think we've seen anything quite as concrete as that engineer's response.
pleeker
07-08-2005, 07:19 PM
You're both right. How about this: A site's profile is made up of many signals.
Good way to look at it, PhilC. I like.
PhilC
07-08-2005, 07:57 PM
hehe...
Actually I was thinking of general profiles that are compared by Google, sometimes on-the-fly, rather than a website's profiles. E.g. Google notices an event, such as they find a new site, or a site changes something big, and that raises a signal. They then respond to the signal by treating the change as a profile (or pattern) and check it against similar changes on similar sites from the past. That's pretty much what the engineer said they do. I think of that pattern as a profile, whether on-the-fly or preset.
But I'd really like to discuss the possibilities that such a system might be responsible for some of the things that we see happening, or if it could be very limited in scope at the moment. Maybe it doesn't inspire anyone else in the way that it inspires me right now. I just thought that the engineer's response was very revealing of part of the Google 'mind' - the way that they actually conceive of dealing with some things.
Robert_Charlton
07-08-2005, 10:05 PM
But I'd really like to discuss the possibilities that such a system might be responsible for some of the things that we see happening...
Phil - On the thread you cite I had posted the following...
While I don't know all the details, I'm guessing that, based on this linking pattern, Google is not seeing your site as a good quality site, and that you've probably raised some flags.
If you raise enough flags with Google these days, you are liable to have ranking difficulties.
Essentially, I had some sort of profile system in mind. The article strongly suggests that statistical analysis, say, of page growth is one of the tools Google is using to look for flags, and that there are many such flags or signals.
We can assume that Google is looking at sites of varying quality (remember that thread about the quality checkers?) and statistically analyzing what other patterns should raise flags or add gold stars.
They're undoubtedly looking at patterns associated with known spam techniques, and I'm sure they have a good idea what these techniques are.
And a lot of it is common sense. If you're spamming Google, what techniques would you use? Google mentions many possible flags in their recent patent, and I'm sure they're analyzing as many likely patterns, and patterns of patterns, among these as they can. Remember, the foundation of their business is the organization of data.
They probably then run ranking tests in which they weight statistical analysis, TrustRank, Hilltop, in various combinations with the other 100 factors, to see what gives the best overall results with the least collateral damage. I think statistical analysis by itself would have some big problems, but I suspect when Google can reach an acceptable margin of error using a particular combination of signs, flags, or omens, they'll use it. I'm also guessing that until computing power is sufficient, some of these flags are probably more broadly applied than any of us would like, including Google.
The overarching pattern, I think, is the one that fits the fabled "good quality" site, and I'm thinking that even many dark grey hats who are in it for the long haul are seeing that pursuing such a profile might be the best strategy. As the saying goes, "The secret of success is sincerity. Once you can fake that, you've got it made." The question is whether they can fake it. That can lead to questions of what socialization is in real life, and how we learn to be good.
PhilC
07-09-2005, 11:26 AM
As the saying goes, "The secret of success is sincerity. Once you can fake that, you've got it made."
I can personally vouch for that where women are concerned ;)
What's "the overarching pattern"?
That was a very interesting overview, Robert.
Yes, I do remember the thread about the quality checkers, and that, together with the engineer's response, certainly seems to indicate that we are watching the development of a significantly different type of search engine than the clinical "a + 2b + 3c = a top ranking" type that we have been used to through the years.
I have a client whose Google-listed backlinks suddenly included a very large percentage of scraper sites in February. It constituted a massive increase in backlinks, and all from scraper sites. At the same time, the rankings dropped - not totally out of sight - just dropped significantly, so that the traffic plummeted. The site's name went from #1 to 100+ for a number of weeks, and then got back to #1, but the rankings that produced the traffic didn't recover much at all.
It was a mystery at the time, but I'm now considering the profile effect of those new IBLs from scraper sites. As you suggested, the computing power isn't yet sufficient to avoid throwing out some wheat with the chaff.
Some thoughts:-
Some, if not all, profiles could be preset, such as a site that adds above a certain percentage of new pages as compared with the number of its existing pages, and a large number of new IBLs over a short period of time, as compared with the number of IBLs it had previously.
When they add a new profile, I'm wondering if they run it over the index. If they do, they would need to have dates of when new pages or new links are found, so that they can run the profile retrospectively. It could account for much of what we've seen in recent updates.
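As a rough illustration of what a preset profile with discovery dates could look like, here is a small Python sketch. Everything in it is an assumption for the sake of the example: the names `growth_ratio` and `matched_profiles`, the 30-day window, and the limit of 5.0 are invented, and nothing here is known to reflect what Google does.

```python
from datetime import date, timedelta


def growth_ratio(dates_found, as_of, window_days=30):
    """Items first seen inside the window, relative to items known before it."""
    start = as_of - timedelta(days=window_days)
    before = sum(1 for d in dates_found if d <= start)
    recent = sum(1 for d in dates_found if start < d <= as_of)
    return recent / max(before, 1)  # avoid dividing by zero for brand-new sites


def matched_profiles(page_dates, link_dates, as_of, limit=5.0):
    """Return the names of the preset growth profiles that this site matches."""
    matched = []
    if growth_ratio(page_dates, as_of) > limit:
        matched.append("sudden page growth")
    if growth_ratio(link_dates, as_of) > limit:
        matched.append("sudden IBL growth")
    return matched


# A 2-page site that gains 100 pages in late June, with modest link growth.
page_dates = [date(2005, 1, 1)] * 2 + [date(2005, 6, 20)] * 100
link_dates = [date(2005, 1, 1)] * 10 + [date(2005, 6, 25)] * 5

now = matched_profiles(page_dates, link_dates, as_of=date(2005, 7, 1))
earlier = matched_profiles(page_dates, link_dates, as_of=date(2005, 3, 1))
```

Because every page and link carries the date it was found, the same check can be run for any past date by changing `as_of`, which is what running a new profile retrospectively over the index would need.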
To be honest, it's all as clear as mud to me. The concept is straightforward enough, but applying it to the things that we've seen is just too open-ended at the moment. If they really are applying profiles (which may or may not be detrimental to a site) to significant site changes, including the appearance of new sites, and the appearance of new IBLs and pages, what's the best course to steer?
I know - I'm just rambling. My excuse is that it may matter :confused:
qp8qp
09-15-2005, 01:11 AM
I have a client whose Google-listed backlinks suddenly included a very large percentage of scraper sites in February. It constituted a massive increase in backlinks, and all from scraper sites. At the same time, the rankings dropped - not totally out of sight - just dropped significantly, so that the traffic plummeted. The site's name went from #1 to 100+ for a number of weeks, and then got back to #1, but the rankings that produced the traffic didn't recover much at all.
It was a mystery at the time, but I'm now considering the profile effect of those new IBLs from scraper sites. As you suggested, the computing power isn't yet sufficient to avoid throwing out some wheat with the chaff.
I've always read that "inbound links can't hurt you otherwise competitors could manipulate others' rankings." But from some of the things I've read lately it seems like that isn't true. Isn't it a bad idea to give people a way to damage their competitors' rankings?
PhilC
09-15-2005, 09:40 AM
For a long time, people have had the idea that IBLs can't hurt you, but it was never true, and Google never said it. What they did say is that IBLs can't "usually" hurt you.
qp8qp
09-15-2005, 12:25 PM
For a long time, people have had the idea that IBLs can't hurt you, but it was never true, and Google never said it. What they did say is that IBLs can't "usually" hurt you.
It seems like that would make it possible for a competitor or disgruntled employee to sabotage a business' rankings.
PhilC
09-15-2005, 12:35 PM
That's always been the case, but the competitor runs the risk of boosting the target site's traffic. A new term has emerged recently - "Google bowling". It's exactly that - other people doing off-site ultra-spammy things that look like they are intended to boost the target site's rankings, with the intention of getting the target site banned or penalised.
qp8qp
09-15-2005, 01:00 PM
That's always been the case, but the competitor runs the risk of boosting the target site's traffic. A new term has emerged recently - "Google bowling". It's exactly that - other people doing off-site ultra-spammy things that look like they are intended to boost the target site's rankings, with the intention of getting the target site banned or penalised.
I heard about the http://yoursite.com and http://www.yoursite.com duplicate content problem and I fixed all my sites so that they redirect to "www". At least on the Linux servers - I don't know how to do it on IIS yet. I just started reading about Google bowling and see that they are talking about link text.
If it is that easy for people to damage rankings, then the search engines should have explicit warnings about what to look for in the webmaster guidelines. Now people will be actively trying to destroy other people's web sites. Then other people will be destroying their own web sites while claiming that the competitor did it...
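For reference, the "www" fix mentioned above boils down to logic like this Python sketch. It is only meant to show the rule, not any particular server's configuration: the function name `www_redirect` is made up, and the sketch assumes plain URLs without a port number.

```python
from urllib.parse import urlsplit, urlunsplit


def www_redirect(url, bare_host="yoursite.com"):
    """Return (301, www_url) when the URL uses the bare host, else None."""
    parts = urlsplit(url)
    if (parts.hostname or "").lower() != bare_host:
        return None  # already on "www" (or some other host): leave it alone
    # Assumes no port or credentials in the URL; only the host name changes.
    return 301, urlunsplit(parts._replace(netloc="www." + bare_host))


result = www_redirect("http://yoursite.com/page.html")
```

Every request for the bare host gets a permanent redirect to the same path on "www", so only one version of each page should end up being indexed.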
DanThies
09-15-2005, 02:14 PM
Does anyone have any info or links related to a "Google bowling" attempt actually achieving its objective?
I have seen a couple cases where "someone" has fired a bunch of, erm, questionable links at a site without their knowledge, but in both cases rankings improved. This in itself could be seen as a form of sabotage, since a significant increase in SE referrals is not necessarily a good thing for every business.
For example, I try to limit my organic traffic to the most relevant terms, so that my time isn't wasted by a bunch of casual tire-kickers. I would be really piffed if someone linked my site onto the first page for a marginally relevant yet high traffic search term like "search engine optimization." (Please don't do this, anyone - it really would be a problem.)