Measuring the "Quality" of Web Content
randfish
11-17-2004, 02:06 PM
An excellent post from EGOL over at SEOChat:
Someday, Google will rank a site mainly, if not entirely, by its content. You can bull**** people with links but content is the only measure of true value.
I couldn't help but agree, but I also wondered: what technologies could be used to measure the actual 'value' and 'quality' of web-based content - especially images, Flash files, scripts, etc.?
Is it possible for a computer to 'visit' a web page and objectively rank the quality of the content contained therein?
Is this an idea that search engineers are working with or have developed theories/ideas about? Or is the focus still on moving away from on-page factors?
>Is it possible for a computer to 'visit' a web page and objectively rank the quality of the content contained therein?
Well, sort of..... http://www.stanford.edu/class/archive/cs/cs276a/cs276a.1032/projects/reports/rdg12-afw.pdf
[warning PDF]
randfish
11-17-2004, 02:43 PM
The PDF (Using Semantic Analysis to Classify Search Engine Spam) was an interesting read, but it focused more on how to use patterns in documents to identify and classify them as spam or non-spam.
While this is very useful, it is (as far as I can tell) not being used by the major SEs yet. Also, I don't think it has application beyond recognizing or filtering for very poor content.
If an IR system is to rank content based on quality, it needs objective ways to measure how relevant and useful a human would find it.
NFFC - Are you suggesting that semantic analysis could be applied in a more advanced form to filter thousands of 'high quality' documents out of millions of 'average' documents? It's an interesting idea...
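To make the spam/non-spam idea a bit more concrete, here is a toy sketch of pattern-based classification using a naive Bayes word-frequency model. This is not the method from the Stanford paper and not anything a search engine is known to use; the function names `train` and `classify`, the labels 'spam' and 'ok', and the whitespace tokenizer are all assumptions made for illustration.

```python
import math
from collections import Counter

def train(labeled_docs):
    """Count words per label. labeled_docs: iterable of (text, label),
    where label is 'spam' or 'ok' (hypothetical labels for this sketch)."""
    word_counts = {"spam": Counter(), "ok": Counter()}
    doc_counts = Counter()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, doc_counts

def classify(text, model):
    """Return the more likely label for text.
    Assumes the model saw at least one document of each label."""
    word_counts, doc_counts = model
    vocab = set(word_counts["spam"]) | set(word_counts["ok"])
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in ("spam", "ok"):
        total_words = sum(word_counts[label].values())
        # Start from the log prior, then add a log likelihood per word.
        score = math.log(doc_counts[label] / total_docs)
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

model = train([
    ("cheap pills cheap pills buy now", "spam"),
    ("buy cheap watches now now", "spam"),
    ("how search engines rank web pages", "ok"),
    ("measuring quality of web content", "ok"),
])
print(classify("cheap pills now", model))
print(classify("quality web content", model))
```

Note that a model like this only separates documents that look like known junk from documents that don't. It says nothing about which of the 'ok' documents a human would find most useful, which is the limitation raised above.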
>not being used by the major SEs yet.
Not fair, you didn't ask for stuff that is being used :)
>advanced form
Semantics are, relatively, computationally expensive. IMHPO, as the web grows and the "work" that a search engine needs to do gets greater, we should look at "solutions" that require the least amount of work.
Some food for thought:
At one level simple semantics are already being used; I assume we all know that. I think that area will continue to grow and that the "quality control" of on-page stuff will become a bigger factor. It's still easy to fake good content though, and I don't see that changing.
Having said that, I feel the major change that we will see from the SEs will [has already] involve "trust". "Trust" is harder to game than links and at its heart is a very, very simple concept, easy to implement and light on costs; you just need a few smart humans to seed it.
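One possible reading of human-seeded "trust", sketched as code: start with a small hand-picked set of trusted pages and let trust flow outward along links, losing strength at each hop. This is only an illustration under assumptions; the function `propagate_trust`, the `damping` value, and the example graph are hypothetical and do not describe any search engine's actual system.

```python
def propagate_trust(links, seeds, damping=0.85, iterations=20):
    """Spread trust from hand-picked seed pages along outgoing links.

    links: dict mapping a page to the list of pages it links to.
    seeds: set of pages that humans have reviewed and marked as trusted.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    # Only seed pages receive trust directly; everyone else starts at zero.
    seed_score = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(seed_score)
    for _ in range(iterations):
        # Each round, seeds keep a fixed share and the rest flows via links.
        nxt = {p: (1.0 - damping) * seed_score[p] for p in pages}
        for page, targets in links.items():
            if not targets:
                continue
            share = damping * trust[page] / len(targets)
            for target in targets:
                nxt[target] += share
        trust = nxt
    return trust

# Hypothetical graph: "a" is the only seed; nothing trusted links to "spam".
graph = {"a": ["b"], "b": ["c"], "spam": ["spam2"]}
scores = propagate_trust(graph, seeds={"a"})
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 4))
```

In this sketch a page cannot raise its own score by linking out or by collecting links from other untrusted pages; it only gains trust when a chain of links leads back to a seed, which is one way the idea could be harder to game than raw link counts.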
"If you can fake sincerity, you've got it made."
randfish
11-17-2004, 03:50 PM
NFFC,
With processor speed advancing so quickly, is computational expense a real concern in the long term?
Also, I like your idea of "trust" - are you suggesting some type of 'inclusion' program like BBB membership - almost a "Google Trusted Source" or "Yahoo! Trusted Source" club that websites can belong to?
Or, are you thinking more along the lines of using the algos to measure the trustworthiness of the site somehow?
BTW, you're right, I didn't ask for stuff that's being used and I'm not going to - this is a future-thinking thread :cool:
I wonder why they aren't using this yet?? Maybe it excludes too many 'non-spam' sites too...