Search Engine Watch
SEO News

08-30-2004   #1
garyp
 
Join Date: Jun 2004
Posts: 265
Waypath Adds Topic Streams to Site

Blog search engine Waypath has just added a new feature called Topic Streams.

http://www.waypath.com/

"Topic Streams automatically aggregate topic-specific information feeds with content from around 3 million weblogs. We're still perfecting Topic Streams in the Waypath Labs."

Topic Streams are also available in RSS.
08-31-2004   #2
seobook
I'm blogging this
 
Join Date: Jun 2004
Location: we are Penn State!
Posts: 1,943
I like that they offer features such as:

find all posts with related topics
find all posts that reference this post
show only results from this weblog
hide results from this weblog

But currently it appears that they are doing full-page indexing. They would probably do better to stick with the semantic RSS data and not index full pages, because common site navigation elements and sitewide advertising will make this product less useful for general search terms.
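One common way to deal with the navigation and sitewide-ad problem described above is template detection: text blocks that repeat across most pages of the same site are treated as boilerplate and dropped before indexing. The sketch below is illustrative only. It assumes each page has already been split into text blocks, and the function name `strip_sitewide_blocks` and the 0.5 threshold are hypothetical choices, not anything Waypath has described.

```python
from collections import Counter
from typing import List


def strip_sitewide_blocks(pages: List[List[str]], threshold: float = 0.5) -> List[List[str]]:
    """Drop text blocks that recur on more than `threshold` of one site's pages.

    `pages` holds one list of text blocks (e.g. paragraphs) per page.
    Hypothetical assumption: blocks shared by most pages are navigation,
    blogrolls, or sitewide ads rather than post content.
    """
    if not pages:
        return []
    # Count each distinct block once per page (document frequency), so a
    # block repeated within a single page is not over-counted.
    doc_freq = Counter()
    for blocks in pages:
        doc_freq.update(set(blocks))
    cutoff = threshold * len(pages)
    # Keep only blocks that appear on at most `cutoff` pages.
    return [[b for b in blocks if doc_freq[b] <= cutoff] for blocks in pages]
```

For example, if three pages from one weblog all carry the same "Home | Archives | About" bar and two of them share an ad block, only each page's own post text survives. Blocks that legitimately repeat (a recurring quote, say) would also be dropped, which is the usual trade-off of this approach.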
__________________
The SEO Book
08-31-2004   #3
steveATwaypath
 
Posts: n/a
Waypath's applications are useful because they work on the full text of weblog posts. Summaries do not contain enough information and index pages (showing multiple posts) are too semantically fractured to be useful.

Where we can, we use the full text from RSS or Atom feeds. However, we have found that only 63% of the weblogs we crawl have feeds, and only 22% have full text in their feeds.

To get the post text for the remaining 78% of weblogs, we crawl the HTML.
Our HTML-parsing algorithms are not perfect, but they are very good: we can currently identify and extract just the text of the post itself for about 98% of the weblogs we crawl. Of course, at the volume we crawl, that remaining 2% adds up... we're working on that.
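The feed-first, HTML-fallback approach described above might be sketched as follows. This is a hypothetical illustration, not Waypath's algorithm: the truncation test (a trailing ellipsis marker, or fewer than 150 words) is an assumed heuristic, and `fetch_html_text` is a placeholder for whatever HTML post extractor is actually used.

```python
from typing import Callable, Optional, Tuple

# Markers that often end a truncated feed summary (assumed, not exhaustive).
TRUNCATION_MARKERS = ("...", "\u2026", "[...]")


def is_probably_full_text(feed_text: str, min_words: int = 150) -> bool:
    """Rough guess at whether a feed entry carries the full post.

    Hypothetical heuristic: treat the text as a summary if it ends with an
    ellipsis marker or is shorter than `min_words` words.
    """
    stripped = feed_text.rstrip()
    if stripped.endswith(TRUNCATION_MARKERS):
        return False
    return len(stripped.split()) >= min_words


def post_text(feed_text: Optional[str],
              fetch_html_text: Callable[[], str]) -> Tuple[str, str]:
    """Return (text, source): use the feed text when it looks complete,
    otherwise fall back to extracting the post from the HTML page.

    `fetch_html_text` is a hypothetical callable; it is only invoked when
    the feed entry is missing or looks truncated.
    """
    if feed_text and is_probably_full_text(feed_text):
        return feed_text, "feed"
    return fetch_html_text(), "html"
```

Passing the HTML fetch as a callable keeps it lazy, so weblogs with full-text feeds never trigger a page crawl. A short but complete post would be misclassified as a summary and sent down the HTML path; that costs an extra fetch but loses no text.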