View Full Version : Search through HTML code?
joeminori
01-14-2006, 05:20 PM
Hi,
Is there any search engine that is capable of searching
through the entire HTML code, and not just the text
contents?
For instance, if a webpage contains
<script type="text/javascript" src="http://www.google.com">
I would like this webpage to be included in the results
when I search for http://www.google.com.
Is there any search engine or another solution capable
of this?
Regards,
Joe Minori
mcanerin
01-15-2006, 03:13 AM
It's technically possible, but I'm afraid I've never heard of anyone doing it.
I know it would be helpful for some things, but I suspect in general there would be so much duplication (almost every webpage on the planet has certain common tags, like html, body, etc) that it would take a LOT of storage, but be of little value to anyone but a few (probably non-paying) researchers.
I don't imagine a commercial site taking it on, but it might make an interesting research project.
Right now, the only choice you have that I'm aware of is the old "view source and look for text string" approach, which is slow and tedious, and assumes you already know the pages you want, which doesn't sound like what you are looking for, sorry.
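For anyone who wants to automate that "view source and look for a text string" approach, here is a minimal Python sketch. It assumes you already have a list of candidate URLs; the function names fetch_source and pages_containing are made up for illustration, and the match is a plain case-insensitive substring test on the raw markup, nothing smarter.

```python
from urllib.request import urlopen


def fetch_source(url, timeout=10):
    """Download a page and return its raw HTML as text (no rendering)."""
    with urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def pages_containing(sources, needle):
    """Return the URLs whose raw HTML contains `needle` (case-insensitive).

    `sources` maps URL -> raw HTML string, e.g. built with fetch_source().
    """
    needle = needle.lower()
    return [url for url, html in sources.items() if needle in html.lower()]
```

To use it, build the dictionary with something like {url: fetch_source(url) for url in urls} and pass it to pages_containing along with the string you are after, for example "http://www.google.com". It still only covers pages you already know about, so it does not replace a real index.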
It would be nice if it were, because there have been a few times I've wanted to do the same. But my cynical self is telling me it's not likely to happen any time soon - too many expensive resources would be needed to do it, but with no reliable method of making the money back or even breaking even. I think it would have to be government or research grant funded to even get off the ground.
Just my opinion, of course,
Ian
runarb
01-16-2006, 12:02 AM
in general there would be so much duplication (almost every webpage on the planet has certain common tags, like html, body, etc) that it would take a LOT of storage
One could use Zipf's law to remove those frequently occurring terms automatically.
See http://en.wikipedia.org/wiki/Zipfs_law
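A rough Python sketch of that idea, under some assumptions: Zipf's law suggests a small number of tokens (html, body and the like) account for most occurrences, so a simple document-frequency cutoff can drop them before indexing. The cutoff value, the tokenizer, and the names tokenize, common_terms and build_index are all illustrative choices here, not anything a real search engine is known to use.

```python
import re
from collections import Counter

# Assumption: tokens are runs of lowercase letters and digits, tags included.
TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(html):
    """Split raw HTML (markup included) into lowercase alphanumeric tokens."""
    return TOKEN.findall(html.lower())


def common_terms(documents, max_doc_fraction=0.5):
    """Return tokens that occur in more than `max_doc_fraction` of documents."""
    doc_freq = Counter()
    for html in documents:
        doc_freq.update(set(tokenize(html)))  # count each token once per document
    limit = max_doc_fraction * len(documents)
    return {term for term, count in doc_freq.items() if count > limit}


def build_index(documents, max_doc_fraction=0.5):
    """Map each remaining token to the set of document positions containing it."""
    skip = common_terms(documents, max_doc_fraction)
    index = {}
    for doc_id, html in enumerate(documents):
        for term in set(tokenize(html)) - skip:
            index.setdefault(term, set()).add(doc_id)
    return index
```

With a cutoff like this, tags that appear on nearly every page never reach the index, while rarer strings such as the pieces of a script URL are kept. The trade-off is that a search for a very common tag returns nothing.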
byronm
01-21-2006, 10:11 PM
Most search engines won't index the HTML markup itself; however, they typically do pull in anchor text, even from JavaScript, PDFs, or any other content they parse.
If you want to search source code, there are a few niche search engines that do that.