#1
Can Tagging Help Search?
I blogged yesterday about how there's a lot of excitement over "tagging" that's in use in places like Flickr and Technorati. Now that Yahoo owns Flickr, some are wondering if tagging -- labeling posts, images and so on into different categories -- might help web search. Yahoo's game of photo tag is a good article from News.com that looks at this more and caused me to kick off my post. And Tags & Folksonomies - What are they, and why should you care? from Threadwatch is a nice roundup of what tagging is, if you need to come up to speed. Also see our other forum discussion, Questions about Ontologies/Taxonomies and Use in Modern Search.
In my post, I argue that we've had tagging of web pages for years and that the search engines don't use that information because it's not trustworthy. My feeling is that tagging is not somehow going to become a solution to better search relevancy even if it is "community driven" for all the same reasons -- it will ultimately be untrustworthy. Agree? Disagree? Please chime in!
#2
Quote:
#3
Tagging works when people are honest. But honesty leaves when profit is involved.
So tagging would be spammed the way links are today. A little bit of tag analysis could still be used, IMHO, but it should not be decisive. Just one criterion in a million (or a hundred).
#4
authoritative tagging?
What if there was a reputable website/organization that would apply these tags according to certain criteria independent of the intentions of the blogger/content creator/website?
Imagine I create a blog and activate this "authoritative tagging" option (which could be a component of the blog software). Now, when I make a post, the RSS feed goes to the "tag authority" site and enters a queue for eventual evaluation by either a human editor or a piece of software, which then somehow transmits the unbiased tag entry to my blog and labels the post accordingly.
Hmmm, the more I think about it, the logistics of that make it nearly impossible to do (at least in a timely manner). It might work better if you just had some kind of software built into your blog that could evaluate and tag it (the same way AdSense scans your page to discover the topic). So in this case you might choose to activate the "Google tagger" in your blog, which maybe would warrant a higher ranking in the search results. This might actually work pretty well: just paste the code on your page (like AdSense) and let Google label it. Or would this be redundant, since basically G does that anyway when they spider your page?
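To make the "software built into your blog" idea a bit more concrete, here is a minimal sketch that assumes a naive word-frequency approach. It is not how AdSense or Google actually classify pages; the function name `suggest_tags` and the tiny stop-word list are made up for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); a real tagger would need far more.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "is",
              "it", "for", "with", "that", "this", "my", "are", "was", "at", "by"}

def suggest_tags(post_text, max_tags=3):
    """Return the most frequent non-stop-words in a post as candidate tags."""
    words = re.findall(r"[a-z]+", post_text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(max_tags)]

post = "The cat sat on the wall. My cat likes the wall. Cat photos."
print(suggest_tags(post, max_tags=2))  # ['cat', 'wall']
```

Because the tags come from the page text itself, this only tells a search engine what its own spider could already work out, which is the redundancy raised at the end of the post.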
#5
Quote:
Quote:
Taking your ideas as a whole, however, it would be possible for some AdSense-like organization to autogenerate tags for a third party. Tagging text is not where the real difficulties lie, though. Images, music, video and other binary formats are much more difficult.
#6
Tagging could be the ultimate authority of organic search. Take the concept of Furl or del.icio.us: if I tag a page and a hundred other people in my demographic do so too, you have a consensus. I suggest data organization via tagging is the purest form of data aggregation: truly organic, and more pertinent than PageRank, where bought or irrelevant links can dominate.
I believe advertisers will ask aggregators for tagged data first, once the data can be properly presented.
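A rough sketch of what "consensus" could mean in practice, assuming bookmarks arrive as (user, URL, tag) records and that a tag only counts once enough distinct users agree on it. The record format, the `min_users` threshold and the sample data are all assumptions, not anything Furl or del.icio.us is known to do.

```python
from collections import defaultdict

def consensus_tags(bookmarks, min_users=3):
    """Keep only tags that at least `min_users` distinct users applied to a URL."""
    users_per_tag = defaultdict(set)
    for user, url, tag in bookmarks:
        users_per_tag[(url, tag.lower())].add(user)  # one vote per user per tag
    result = defaultdict(list)
    for (url, tag), users in users_per_tag.items():
        if len(users) >= min_users:
            result[url].append(tag)
    return {url: sorted(tags) for url, tags in result.items()}

bookmarks = [
    ("ann", "http://example.com/page", "search"),
    ("bob", "http://example.com/page", "Search"),
    ("cat", "http://example.com/page", "search"),
    ("bob", "http://example.com/page", "cheap-pills"),
    ("bob", "http://example.com/page", "cheap-pills"),  # repeat vote, counted once
]
print(consensus_tags(bookmarks))  # {'http://example.com/page': ['search']}
```

Counting distinct users rather than raw tag occurrences means one person repeating a tag gains nothing; creating many accounts is a different problem.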
#7
If the short history of the WWW shows us anything, it is that if any weight is given to these “tags” in any major search engine algo, they will be spammed into uselessness in a very short time.
It sounds like a great idea: label your content appropriately... I don’t know about anyone else, but I think appropriately labeling your content is just a basic part of website design.
#8
I think you're right, webvisitor: community input is probably the best way to get accuracy in tags. But I would tend to think it would only work well if you had a large community actively tagging... too small a sample would be easy to manipulate artificially. Most websites probably don't have large active communities like that.
Take Wikipedia, for example: the pages with only one or two people working on them are more likely to be skewed to one person's opinion, but the pages with many active users will be able to maintain more of a general consensus.
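The sample-size point can be put in numbers. This is only a back-of-the-envelope sketch, assuming one vote per account and that the most-applied tag "wins"; the tag counts and the function name `votes_to_overtake` are invented for the example.

```python
def votes_to_overtake(tag_counts, target_tag):
    """Extra one-vote accounts needed for `target_tag` to strictly lead."""
    leader = max((c for t, c in tag_counts.items() if t != target_tag), default=0)
    return max(0, leader + 1 - tag_counts.get(target_tag, 0))

small_page = {"gardening": 2}                 # two honest taggers
busy_page = {"gardening": 180, "plants": 45}  # a large active community

print(votes_to_overtake(small_page, "casino"))  # 3
print(votes_to_overtake(busy_page, "casino"))   # 181
```

Three throwaway accounts are enough to hijack the small page, while the busy page needs 181.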
#9
Hmm, if the community of people doing the tagging has to pay for the privilege, or jump some equally onerous membership barrier, and that community is policed to the extent that the burden of often being booted out and rejoining is too great to make it worth spamming, then the tagging may work.
#10
[quote=DarkMatter]But I would tend to think it would only work well if you had a large community actively tagging... too small a sample would be easy to manipulate artificially. Most websites probably don't have large active communities like that.[/quote]
I agree, DarkMatter: it will only work well if a large community, i.e. an SE, aggregates the tags, and yes, it would only work with a representative "larger" sampling. I disagree with the assertion that spamming would be a huge issue. Spammers don't work hard enough to spoil or seed a tagged subject to the degree that it would alter the data. I am watching Furl and how LookSmart will use the data from Furled pages. Del.icio.us is not aligned with an SE, so it will be more difficult to measure how those tags/bookmarks are used.
#11
I don't think anything easily manipulated would be used. This means the search engines have to index these in a chosen way. Classification again. It's already used in the backend. I think it could be extended, though, just not manually or by the site/blog owners.
Well, that's my idea anyway.
#12
Quote:
IMO, these "new" tags are nothing more than an extension of the meta description and keyword tags.
#13
I like Nick's description of tags very much:
Quote:
I agree with you, Danny, that tags are not ready for the web today, but hopefully they can be improved so that they become trustworthy in the future. I believe that it's in everyone's best interest (search engines, webmasters and users) to see search engines' results for any given query improve through algorithmic definition that excludes manual editorial reviews.
#14
This is all about the "semantic web" again. I hate the definition; it's silly.
In digital libraries everything is "tagged". There are lots of different ways to do this, like Dublin Core, OWL, RDF, ... I can't see how it can be used any time soon on the public web, as it's not standardized for it, it's not even researched properly yet (there is still work to do), inference rules have to be imposed, ...
There's XML, which has been a great idea, sowing seeds for progress, but it doesn't tell you anything about what the structure of the document means. RDF (RDF/XML) solves this to an extent, as it is structured as one or more triples. A triple is: (1) the subject, (2) the property and (3) the actual value. The subject and property are identified by a Uniform Resource Identifier (URI), and the value is either a URI or a plain literal. With RDF a machine can recognise different sets of vocabulary as well. URIs make sure that concepts are not just words in a page but are linked to a unique definition that everyone can find on the Web. On top of this there is the subject of ontologies, which I think there is a thread on already, or there's a lowdown on my blog.
So great, all these things exist but aren't practical because of manipulation, errors, no standards, etc... People are talking about "the Semantic Web's unifying language" (which is the logical inferences made using rules and information such as those specified by ontologies). Things called proofs get exchanged between agents in order to make a decision on a result. Ontologies would exist for a large number of resources and would have to be brought together into a new model. The semantic structure is just a foundation for complex AI techniques, which will decide what belongs where, what the relationships are, what's going on, who wants what and what is best, ...
It's possible, and it has been discussed, that digital signatures could be used to validate documents. In fact they could be parsed through a validation service between upload and loading. If markup like this is allowed, it will be difficult to manipulate, I think.
A lot of very famous faces, Susan Dumais for example, do not believe in the "semantic web".
This is an excellent book, even though it was written in 1999 (how far have we really come?): Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web by Its Inventor. Tim Berners-Lee, with Mark Fischetti. Harper San Francisco, 1999.
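For anyone who has not seen triples before, here is a minimal sketch of the subject / property / value structure using plain Python tuples rather than a real RDF library. The Dublin Core property URIs are real; the document URI, the sample values and the `match` helper are made up for illustration, and the values here are plain literals, which RDF also allows.

```python
# Dublin Core element URIs (a real vocabulary); the document URI is a made-up example.
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
DC_SUBJECT = "http://purl.org/dc/elements/1.1/subject"
DOC = "http://example.org/docs/tagging-thread"

# Each triple is (subject, property, value).
triples = [
    (DOC, DC_TITLE, "Can Tagging Help Search?"),
    (DOC, DC_SUBJECT, "tagging"),
    (DOC, DC_SUBJECT, "search engines"),
]

def match(triples, subject=None, prop=None, value=None):
    """Return triples matching the given parts; None acts as a wildcard."""
    return [
        (s, p, v) for s, p, v in triples
        if (subject is None or s == subject)
        and (prop is None or p == prop)
        and (value is None or v == value)
    ]

subjects = [v for _, _, v in match(triples, subject=DOC, prop=DC_SUBJECT)]
print(subjects)  # ['tagging', 'search engines']
```

Because the property is a URI rather than the bare word "subject", two sites using the same vocabulary mean the same thing by it, which is the point about unique definitions.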
#15
IMHO, tagging can, does and will help. The question is how much. When I see an ad for a restaurant proclaiming "best customer service ever", I am obviously skeptical. When I see a review in a newspaper proclaiming "best customer service ever", I am less skeptical, and likely to believe it. When my best friend, whom I dine with regularly and who has similar expectations of customer service, says to me "best customer service ever", I probably believe it without question.
The level of trust in each case is important, but in all cases the fact that the statement is made will have some impact on my decision to believe or not. SE tagging could play a similar role. Use the info from tags, just don't trust it much. This would mean tags only have much influence when there is little else to go on. So, if I wanted a photo of the Concorde, tagging shouldn't be relied on (as there are better methods). If, as the old Steven Wright joke goes, I wanted a "... rare picture of Norman Rockwell beating up a child", then trusting tags is as good a way as any to source such a photo, which probably doesn't exist.
What would be great, though, in terms of tagging, is more negative tags: noimageindex, nonewsindex, etc. That would help the indexing of stuff, because it would ensure that copyrighted material isn't searchable and people are kept out of inappropriate indexes.
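One way to picture "use the info from tags, just don't trust it much" is a score where each tag claim is discounted by how far its source is trusted and tags as a whole get only a small share of the final relevance. Everything here is an assumption for the sake of the sketch: the trust values, the 0.2 tag weight, the function names, and the simplification that sources are wrong independently of each other.

```python
# Hypothetical trust levels, mirroring the ad / newspaper review / best friend example.
TRUST = {"site_owner": 0.1, "editorial_review": 0.5, "trusted_community": 0.9}

def tag_evidence(claims):
    """Combine claims that a tag applies, each discounted by its source's trust."""
    doubt = 1.0
    for source in claims:
        doubt *= 1.0 - TRUST[source]  # chance that every source so far is wrong
    return 1.0 - doubt

def relevance(content_score, claims, tag_weight=0.2):
    """Blend an algorithmic content score with tag evidence; tags get a small weight."""
    return (1 - tag_weight) * content_score + tag_weight * tag_evidence(claims)

# A page with strong content signals barely needs its tags...
print(relevance(0.9, ["site_owner"]))
# ...while for a page with no other signals, the tags are all there is to go on.
print(relevance(0.0, ["trusted_community"]))
```

With no content signal at all, a self-applied tag still contributes a tiny amount, and a tag from a trusted community contributes nine times as much.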
#16
I love Flickr (what little I know of it as a newcomer to the service). I believe that tagging works well there but relies on the fact that it's a community of reasonable people.
Clearly SEs are coming back to "workarounds" that amount to reintegrating metadata & categorization into search where it failed before. Per my recent blog about wcities.com, they power Yahoo Travel > Restaurants and have some very cool info in their database, such as neighborhood, which is a pretty localized concept. Google Local Business Center strikes me as a quiet entry into this realm as well. Having sites/businesses enter categorized info about themselves relies on them being trustworthy, but if the format of the listings is different from general search, I'd be optimistic that it would be less spammable. Certainly I doubt that many businesses would find it helpful to tag themselves as being located in "Parkdale" when in fact they are in "Cabbagetown."
So on the whole, I think that some ad hoc tagging is going to be helpful. It's going to be necessary to go in that direction because a unified metadata protocol or universal naming system just doesn't seem to be on the table. If you've taken a photo involving a cat standing near a brick wall, I think there is probably some good will there: you will probably tag your photo with cat and brickwall, and you may not have a lot of incentive to also put j-lo ringtones cash viagra etc. in there. Plus, if the community is of reasonable size, I would think the organizers of something like Flickr could just bounce you for putting spam in tags. So much for purely automated search.
#17
We have believed in the concept of "tagging" since 1999, when we first sat down at the drawing board with the intention of designing a web directory which is self-organized by the webmasters themselves. Early on we concluded that the only feasible way of deferring categorization on a large scale to the web community is to use "tagging" in some form. The tag in question lets the webmaster label the web page with a value which refers to a subject category in a taxonomy. The tag can be retrieved by a robot and used by a "web directory engine" to organize the web in a fully automated process. We decided to use the meta tag as a data vehicle since it is unobtrusive, lightweight, and has a standardized name-value pair format.
So far our experience with community-driven, tag-based categorization of the web has not only met our expectations but far exceeded them:
1) 90-95% of all submissions are correctly classified.
2) Submissions that require administrator intervention mostly involve assigning a different classification. They rarely involve a delete, and almost never a block.
3) The general acceptance of tag-based categorization is increasing. From an initial acceptance rate of less than 10%, the acceptance rate is now over 50% (acceptance rate is measured as the number of listed web pages divided by the number of submissions per day).
4) The number of candidate categories is increasing (user-suggested subcategories when a category is full).
5) Spam has not been as big a problem as one might have expected. Somehow the tag concept in combination with administrative supervision seems to prevent the worst forms of spam.
6) People want to be a part of the organization of the web.
In our experience there is no doubt that a community-driven, tag-based paradigm really does work and, to answer Danny's question, yes, it can help web search. A bottom-up, user-driven categorization scheme in combination with a top-down keyword algorithmic search may become a "meet in the middle" search tool that gives SERPs of unexpected quality to the end user.
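For readers wondering what the robot side of such a scheme might look like, here is a minimal sketch using Python's standard `html.parser`. The meta name `category`, the taxonomy codes and the `classify` helper are invented for the example; the post does not say what name or values the actual directory uses.

```python
from html.parser import HTMLParser

# Hypothetical taxonomy; the codes and the meta name "category" are assumptions.
TAXONOMY = {"rec.gardening": "Recreation > Gardening",
            "comp.search": "Computers > Search"}

class CategoryTagParser(HTMLParser):
    """Collect the content of <meta name="category" content="..."> tags."""
    def __init__(self):
        super().__init__()
        self.categories = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "category" and attr.get("content"):
            self.categories.append(attr["content"].strip())

def classify(html):
    """Return (listed, needs_review) category lists for one submitted page."""
    parser = CategoryTagParser()
    parser.feed(html)
    listed = [TAXONOMY[c] for c in parser.categories if c in TAXONOMY]
    needs_review = [c for c in parser.categories if c not in TAXONOMY]
    return listed, needs_review

page = ('<html><head><meta name="category" content="comp.search">'
        '<meta name="category" content="misc.viagra"></head></html>')
print(classify(page))  # (['Computers > Search'], ['misc.viagra'])
```

Values that do not map to a known category are set aside rather than listed, which is one place the administrative supervision mentioned in point 5 could come in.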
#18
Quote:
As for "trust" - well, that depends on who's doing the tagging, and how. Try looking at some controversial self-tagging, say "flickr > porn" or something. Even with an adult filter turned way up, I wouldn't call this pr0n. It's just like the days when people were putting the word "sex" in every meta tag to get traffic (I still see this on sites you wouldn't believe; the last example I saw was a gardener's web site). Still, I believe there are some SEO benefits in all this nonsense, but that's another story. And besides being off-topic, it doesn't really relate to the tagging itself either, more to the fact that a lot of people think tags are fun to play around with (sort of like blogs, but different).
#19
A number of posts in this thread talk about communities creating meta tags. If you can describe a bureaucracy as a community, then the Australian Government already does it. All Aus Gov websites have their high-level pages tagged with a Dublin Core set of meta tags. This doesn't help public search engines generally, but it does dramatically simplify maintenance of what are called the entry points, whose content and search engines use the .gov.au domain meta tags.
The entry point is www.australia.gov.au and the standards are at www.agls.gov.au. AGLS stands for Australian Government Locator Service. The driving force behind it was the government archives, whose record-keeping mandate was being seriously threatened by online-only information. There is another entry point, www.business.gov.au, which pre-dates AGLS. It used a different voluntary standard to bring together government web sites dealing with business in Oz. The thing is, though, that if the bureaucrats didn't tag the page, you'll never find it from the entry point - but Google will, if you look hard enough.
#20
An article about tagging and human information in search
Some of you might be interested in this article I posted on my blog yesterday on the effects of tagging, meta tags and human information on search:
Coming to terms with tags: folksonomies, tagging systems and human information