Does Meta Data Within Images Count in Search?
PennyPracticesSEM
04-30-2008, 02:12 PM
Apparently, whether the meta data embedded within an image (JPG) matters for search engine relevance and ranking is a debated topic. Does it even matter to have meta data for your image, since the ALT text and title already have pull?
metasynman
04-30-2008, 02:52 PM
Do you mean using text within the graphics themselves? No... without implementing some form of OCR, the search engines don't actually "see" what you've got in your images. However, a manual review of your site will tell Google whether your ALT & title tags for an image are actually relevant to what is in the image itself. Image file names with keywords can help to some degree as well, but as far as the actual image contents it only becomes a factor during a manual review.
cryptblade
04-30-2008, 04:45 PM
I read the question as "meta data" in images being file/data property information that you could edit in certain programs, is that what you are asking? Or were you asking about words IN the images?
mcanerin
04-30-2008, 05:47 PM
I just read a newspaper article that said that Google does look at file meta data for images (I know singingfish does, too) but I can't find the reference now.
Ian
metasynman
04-30-2008, 05:54 PM
Oh THAT meta data... yeah, that would make more sense, wouldn't it. And yes, Google uses that meta data for Google Images, but I was under the impression it wasn't taken into consideration for organic content search listings. I could be mistaken, however. Let us know if you find that article again, Ian.
cryptblade
04-30-2008, 08:05 PM
Yeah, I'd like to see that article too. But do we know what the question meant? Was the question about file meta data or words IN the images?
Possible OCR (Optical Character Recognition) has been confirmed. In fact, OCR is pretty standard for Google Books and similar. Here is the blog post on Tesseract OCR:
http://google-code-updates.blogspot.com/2006/08/announcing-tesseract-ocr.html
The "property" meta information cryptblade mentioned is more properly called EXIF (Exchangeable Image File Format). I asked Matt Cutts about EXIF during Google's webmaster live chat last month:
"Brian Ussery: Does Google extract exif from images?"
"Matt Cutts: Brian, I'm not sure, personally. I could imagine that any stuff embedded in an image file might be used, though."
If you do a search for "iso: 400" in Picasa, you'll see EXIF in action. You can see the data by clicking on a photo and clicking the "more info" box to view the EXIF.
http://picasaweb.google.com/lh/searchbrowse?q=iso%3A+400&uname=&psc=G&filter=1#3+1
So, the answer to both the "words in images" and "words in file" question is yes!
In case you missed it, I'd urge everyone to check out the full transcript:
http://groups.google.com/group/Google_Webmaster_Help-chit-chat/msg/c7b26a3b5b0a87ac
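For anyone wondering where the EXIF actually lives in a JPG, here is a minimal sketch, assuming a well-formed baseline JPEG. It walks the file's marker segments and returns the raw payload of the APP1 segment that starts with the "Exif" signature. The function name `find_exif_payload` is made up for illustration, and nothing here says how any search engine reads the data. It does not decode individual tags, and it ignores padding bytes and length-less markers.

```python
import struct
from typing import Optional


def find_exif_payload(jpeg_bytes: bytes) -> Optional[bytes]:
    """Return the TIFF-formatted Exif payload of a JPEG, or None if absent.

    Hypothetical helper: a simplified scan that assumes every segment
    before the image data carries a 2-byte big-endian length field.
    """
    if jpeg_bytes[:2] != b"\xff\xd8":  # every JPEG starts with the SOI marker
        raise ValueError("not a JPEG stream")

    pos = 2
    while pos + 4 <= len(jpeg_bytes):
        if jpeg_bytes[pos] != 0xFF:
            return None  # lost marker sync; give up rather than guess
        marker = jpeg_bytes[pos + 1]
        if marker in (0xDA, 0xD9):  # SOS or EOI: metadata segments are over
            return None
        # The length counts its own two bytes but not the two marker bytes.
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        payload = jpeg_bytes[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return payload[6:]  # strip the 6-byte "Exif\0\0" signature
        pos += 2 + length
    return None
```

Calling it on the bytes of a camera photo should hand back a block beginning with `II` or `MM` (the TIFF byte-order mark), while an image with its metadata stripped gives `None`.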
cryptblade
04-30-2008, 10:20 PM
sweet find B-man!
Funny you say that, I was a photographer long before there was such a thing as a search engine. Every photographer knows they have been using EXIF in some cases for at least two years.
PennyPracticesSEM
05-01-2008, 01:14 PM
Sorry for the confusion, but yes, I was referring to the meta data in the image that you can edit in programs such as Photoshop, not actual visible text in the image. This is great knowledge-sharing, thanks guys!:D
mcanerin
05-01-2008, 03:10 PM
THAT'S where I read it - in a Google Groups thread! No wonder I couldn't find it while looking in newspaper sites...
I must be losing it. Just ignore the senile, muttering old guy feeding the spiders over here in the corner....
Ian
cryptblade
05-01-2008, 04:46 PM
That's right Ian - pave the way for the young blood, da youngins. Yee-yah, step off unc! :cool:
- just kidding!
I was referring to the meta data in the image that you can edit in programs such as Photoshop, not actual visible text in the image.
Yep, that is EXIF. And mcanerin, come on now, it is about time I answered a question correctly :)
jason
05-04-2008, 01:50 PM
Hi everybody, I think that by metadata here we have to distinguish between what we call "scene text" (car plates, t-shirt brands) and "overlay text" (aka subtitles, banners, transcriptions). There is a whole domain of image text recognition, and it consists of detection, binarization, filtering, and finally OCR. If the background is strange or the colors cannot be discriminated, the task is difficult. For normal overlays the precision/recall values are up to 90%, although no benchmark exists. For scene text the numbers are clearly lower due to the difficulty of the task, although some progress has been achieved lately. Hope this helps. I would call EXIF media description rather than pure metadata in the semantic sense that I believe applies here. But I could be wrong.
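To make the binarization step of that pipeline concrete, here is a rough sketch using Otsu's method, which is one common way to pick a global threshold. The post does not say which technique is meant, so Otsu is an assumption, as are the function names. It takes 8-bit grayscale values and splits them into dark and bright classes so that overlay text separates from its background.

```python
from typing import List


def otsu_threshold(pixels: List[int]) -> int:
    """Return the 0-255 threshold that maximizes between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark = 0    # number of pixels at or below the candidate threshold
    sum_dark = 0  # sum of their gray levels
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue  # no dark pixels yet, nothing to split
        w_bright = total - w_dark
        if w_bright == 0:
            break  # every pixel is already in the dark class
        mean_dark = sum_dark / w_dark
        mean_bright = (sum_all - sum_dark) / w_bright
        between_var = w_dark * w_bright * (mean_dark - mean_bright) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(image: List[List[int]], threshold: int) -> List[List[int]]:
    """Map each pixel to 1 if it is brighter than the threshold, else 0."""
    return [[1 if p > threshold else 0 for p in row] for row in image]
```

A single global threshold like this tends to work for clean overlays. The "strange background" case mentioned above is where it breaks down and local or adaptive methods are usually tried instead.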
billse
05-04-2008, 03:46 PM
I expect they read that meta, and I think they value that meta; I don't think they put a lot of weight on that meta in ranking just yet. It does tell an engine about the quality of the image. Obviously, engines want to deliver quality. Hopefully meta ranks more than I suspect. Nice thread!
@ Jason,
This is pretty cutting edge stuff, but you are correct. Technically, Exif is a carrier of metadata, as it was originally intended to provide for interoperability in a/v equipment. "EXIF metadata" doesn't mean that Exif is metadata, but rather that Exif carries metadata. The real secret here is that Exif can provide Geo information via GPS, but that will only happen when GPS recording features are built into a/v software and equipment. Oh, and by the way, welcome to SearchEngineWatch.com
see 3.12:
http://www.w3.org/2005/Incubator/mmsem/XGR-vocabularies-20070724/#existing-SI
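On the Geo point: Exif stores a GPS latitude or longitude as three rational numbers (degrees, minutes, seconds) plus a one-letter reference (N/S or E/W). Here is a small sketch of turning that into the signed decimal degrees most mapping tools expect. The function name and the tuple layout for the already-extracted values are assumptions for illustration.

```python
from fractions import Fraction
from typing import Sequence, Tuple

Rational = Tuple[int, int]  # (numerator, denominator), as stored in Exif


def dms_to_decimal(dms: Sequence[Rational], ref: str) -> float:
    """Convert Exif degrees/minutes/seconds rationals to signed decimal degrees.

    `ref` is the matching GPSLatitudeRef or GPSLongitudeRef value.
    South and West are returned as negative numbers.
    """
    degrees, minutes, seconds = (Fraction(num, den) for num, den in dms)
    value = degrees + minutes / 60 + seconds / 3600
    if ref in ("S", "W"):
        value = -value
    return float(value)
```

For example, `dms_to_decimal(((51, 1), (30, 1), (0, 1)), "N")` gives `51.5`. A zero denominator, which malformed files can contain, raises `ZeroDivisionError` here rather than being silently guessed at.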
@billse,
Exif may not have a significant impact on rankings, except when users query for information that images without Exif simply don't carry. For example, if a user queries "iso 400", chances are that non-Exif images won't rank well. Now think about how that will affect users searching for a specific location in a few years, when Geo is built in and some images still have no Exif. Either way, Exif is one of the only ways to tell original images from those that have been retouched in Photoshop. Detecting retouched images is important to folks like AP photo editors. It's safe to say a non-photoshopped image might be more "trusted" than a photoshopped image in some cases. One of the best ways to tell if an image has been retouched is via Exif, and that is why I think its importance will grow in the future (5+ years).
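A toy sketch of both ideas, assuming the Exif has already been extracted into plain dictionaries keyed by the standard tag names `ISOSpeedRatings` and `Software`. The sample records, the hint list, and the function names are all hypothetical. Keep in mind that the Software field is easy to strip or rewrite, so a match is only a hint and a missing value proves nothing.

```python
from typing import Any, Dict, List

# Illustrative substrings only; not an exhaustive or authoritative list.
EDITOR_HINTS = ("photoshop", "gimp", "lightroom")


def matches_iso(exif: Dict[str, Any], iso: int) -> bool:
    """True if the record carries the requested ISO value."""
    return exif.get("ISOSpeedRatings") == iso


def edited_by_software(exif: Dict[str, Any]) -> bool:
    """True if the Software tag names a known image editor."""
    software = str(exif.get("Software", "")).lower()
    return any(hint in software for hint in EDITOR_HINTS)


def search_by_iso(photos: Dict[str, Dict[str, Any]], iso: int) -> List[str]:
    """Return the names of photos whose Exif matches the ISO query."""
    return sorted(name for name, exif in photos.items() if matches_iso(exif, iso))


# Hypothetical, hand-written records standing in for parsed Exif.
photos = {
    "beach.jpg": {"ISOSpeedRatings": 400, "Software": "Camera firmware 1.0"},
    "portrait.jpg": {"ISOSpeedRatings": 400, "Software": "Adobe Photoshop CS3"},
    "stripped.jpg": {},  # no Exif at all
}
```

Here `search_by_iso(photos, 400)` finds the two photos that carry an ISO value and can never return `stripped.jpg`, which is the "non-Exif images won't rank" effect in miniature.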
billse
05-05-2008, 01:02 PM
I'm even thinking outside of just the EXIF (sort of like what the Google scientists were talking about in their paper "PageRank for Product Image Search" in Beijing last month).
Takes me back to my visual tech days!!! That's where untalented graphic designers go (haha!).