

#1
04-30-2006
orion
Visual Information Analysis

If you are an IR practitioner, researcher or into information visualization, this might interest you.


The Need For Metrics In Visual Information Analysis, by Nancy Miller, Beth Hetzler, Grant Nakamura, and Paul Whitney of Pacific Northwest National Laboratory, is an outstanding piece of research that describes, among other methods, a technique for visualizing document collections as fractal patterns. Their abstract states (emphasis added):


"This paper explores several methods for visualizing the
thematic content of large document collections. As opposed to
traditional query-driven document retrieval, these methods are
used for exploring and gaining insight into document
collections. For our experiments, we used 12,000 medical
abstracts."

"The SPIRE system was used to create the
mathematical signal from text and to project the documents
into a universe of “docustars” and as a thematic contour map
based on thematic proximity. A self-organizing map is used to
project the documents onto a “Tree” fractal. A topic-based
approach is used to align documents between concepts in the
“Cosmic Tumbleweed” projection. In the 32-D Hypercube,
documents are organized by cascading theme strengths."

"An argument is made for a new type of metric that would facilitate
comparisons among the many methods for visualizing or
browsing document collections. An initial organization is
proposed for some of the relevant research that metrics for
information visualization can draw upon."


I bolded some key terms to highlight the connection between structural patterns, on-topic analysis, and semantics.


Amazing research!


The authors show in Figure 2 of their paper how document vectors were projected to construct a tree-like fractal pattern.


They explain:


"The line segments show the structure of the fractal. Documents are
projected onto the nodes (where the segments branch) of the
fractal. In this figure, projected documents are shown as colored
dots. This fractal is constructed by pasting together successively
smaller “angles” and is a member of the class of iterated function
system fractals [1]. In this “Tree” fractal, the distance between
nodes is calculated by path distance within the fractal.
Documents (or other information objects) are assigned to a node
such that the distance between the information objects is similar
to the distance between the fractal nodes (up to scale). A variant
of self-organizing maps [8] is used to carry out the projection."
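
To make the "path distance within the fractal" idea concrete, here is a minimal sketch, not the authors' code. It assumes a binary tree in which each node is named by its sequence of left/right turns from the root (e.g. "LR") and each segment is shorter than its parent by a fixed ratio. The node naming, the ratio of 0.5, and the function name path_distance are all my assumptions for illustration.

```python
def path_distance(a, b, ratio=0.5):
    """Distance between two tree nodes measured along the branches.

    Nodes are strings of 'L'/'R' turns from the root ("" is the root).
    The segment entering a node at depth d is assumed to have length
    ratio ** d, so branches shrink as the tree grows.
    """
    # Length of the shared prefix = depth of the lowest common ancestor.
    k = 0
    while k < min(len(a), len(b)) and a[k] == b[k]:
        k += 1
    # Walk from a up to the common ancestor, then down to b.
    up = sum(ratio ** d for d in range(k + 1, len(a) + 1))
    down = sum(ratio ** d for d in range(k + 1, len(b) + 1))
    return up + down


if __name__ == "__main__":
    print(path_distance("L", "R"))    # siblings under the root
    print(path_distance("LL", "LR"))  # siblings one level deeper
    print(path_distance("", "LL"))    # root to a depth-2 node
```

A projection in the spirit of the quote would then try to place documents on nodes so that these path distances track the document-to-document distances up to a scale factor.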



The pattern shown in their Figure 2 looks intermediate between two well-known patterns: the dense-radial pattern and the tree-like pattern of diffusion-limited aggregation (DLA). While it grows from a central point, its branches resemble tree-like L-system patterns grown under recursive grammar rules.
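
For readers unfamiliar with L-systems, here is a minimal sketch of the recursive grammar rewriting that produces such branching patterns. The axiom and the rule F -> F[+F][-F] are a generic textbook-style example chosen by me, not anything taken from the paper; expand is a hypothetical helper name.

```python
def expand(axiom, rules, iterations):
    """Rewrite every symbol in parallel, `iterations` times.

    Symbols without a rule (brackets, '+', '-') are copied unchanged.
    """
    s = axiom
    for _ in range(iterations):
        s = "".join(rules.get(ch, ch) for ch in s)
    return s


if __name__ == "__main__":
    # F = draw a segment, [ ] = start/end a branch, +/- = turn.
    rules = {"F": "F[+F][-F]"}
    for n in range(3):
        print(n, expand("F", rules, n))
```

Each pass replaces every segment with a segment plus two side branches, which is what gives the self-similar, tree-like look when the string is drawn.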


No doubt, classic document embedding and analysis tools (TVT, LSI) are falling behind and becoming obsolete ("caduque"). Fractal analysis and diffusion geometries as tools for embedding documents and visualizing collections are here to stay.


Orion

Last edited by orion : 04-30-2006 at 02:15 PM.
#2
05-06-2006
orion

And for those interested in learning how relevant diffusion geometries, dimensionality reduction, and self-similar scaling techniques are to document classification, here is Stephane Lafon's brilliant research:

Diffusion Geometries for Data Mining and Dimensionality Reduction

Extended work was presented at IPAM.


Orion