#1
If you are an IR practitioner, researcher or into information visualization, this might interest you.
The Need For Metrics In Visual Information Analysis, by Nancy Miller, Beth Hetzler, Grant Nakamura, and Paul Whitney of Pacific Northwest National Laboratory, is an outstanding piece of research describing a technique for visualizing document collections as fractal patterns. Their abstract states (emphasis added):

"This paper explores several methods for visualizing the thematic content of large document collections. As opposed to traditional query-driven document retrieval, these methods are used for exploring and gaining insight into document collections. For our experiments, we used 12,000 medical abstracts."

"The SPIRE system was used to create the mathematical signal from text and to project the documents into a universe of “docustars” and as a thematic contour map based on thematic proximity. A self-organizing map is used to project the documents onto a “Tree” fractal. A topic-based approach is used to align documents between concepts in the “Cosmic Tumbleweed” projection. In the 32-D Hypercube, documents are organized by cascading theme strengths."

"An argument is made for a new type of metric that would facilitate comparisons among the many methods for visualizing or browsing document collections. An initial organization is proposed for some of the relevant research that metrics for information visualization can draw upon."

I bolded some key terms to highlight the connection of structural patterns with on-topic analysis and semantics. Amazing research!

The authors show in Figure 2 of their paper how document vectors were projected to construct a tree-like fractal pattern. They explain:

"The line segments show the structure of the fractal. Documents are projected onto the nodes (where the segments branch) of the fractal. In this figure, projected documents are shown as colored dots. This fractal is constructed by pasting together successively smaller “angles” and is a member of the class of iterated function system fractals [1]. In this “Tree” fractal, the distance between nodes is calculated by path distance within the fractal. Documents (or other information objects) are assigned to a node such that the distance between the information objects is similar to the distance between the fractal nodes (up to scale). A variant of self-organizing maps [8] is used to carry out the projection."

The pattern shown in their Figure 2 is clearly intermediate between two well-known patterns: the dense-radial pattern and the tree-like diffusion-limited aggregation (DLA) pattern. While it grows from a central point, its branches resemble tree-like L-system patterns grown under recursive grammar rules.

No doubt classic document embedding and analysis tools (TVT, LSI) are falling behind ("caduque"). Fractal analysis and diffusion geometries as tools for embedding documents and visualizing collections are here to stay.

Orion
Last edited by orion : 04-30-2006 at 02:15 PM.
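For anyone who wants to play with the path-distance idea from the Figure 2 description quoted in post #1, here is a minimal Python sketch. It is not the authors' code. It assumes a plain binary branching tree, and the branching angle (30 degrees) and shrink ratio (0.6) are values picked only for illustration; the paper's fractal may well differ. The function names tree_nodes, path_distance and node_position are hypothetical. The self-organizing-map projection that assigns documents to nodes is not implemented here.

```python
import math


def tree_nodes(depth):
    """Return node labels of a binary tree: '' is the root, then 'L', 'R', 'LL', ..."""
    nodes = [""]
    frontier = [""]
    for _ in range(depth):
        # Every node on the current level branches into a left and a right child.
        frontier = [label + side for label in frontier for side in "LR"]
        nodes.extend(frontier)
    return nodes


def path_distance(a, b):
    """Count the segments on the unique path between nodes a and b."""
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    # Walk up from a to the deepest shared ancestor, then down to b.
    return (len(a) - common) + (len(b) - common)


def node_position(label, angle=math.pi / 6, ratio=0.6):
    """Return the 2-D position of a node (angle and ratio are assumed values).

    Starting at the origin and heading straight up, each step turns left or
    right by `angle`, moves forward, and shrinks the next segment by `ratio`.
    """
    x, y = 0.0, 0.0
    heading, length = math.pi / 2, 1.0
    for side in label:
        heading += angle if side == "L" else -angle
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        length *= ratio
    return x, y


if __name__ == "__main__":
    nodes = tree_nodes(3)
    print(len(nodes), "nodes")
    print("path distance LL to RR:", path_distance("LL", "RR"))
    print("position of LR:", node_position("LR"))
```

Note that two nodes can sit close together on screen and still be far apart by path distance, which is why the quoted passage measures distance within the fractal and not in the plane.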
#2
And for those interested in learning how relevant diffusion geometries, dimensionality reduction, and self-similar scaling techniques are to document classification, here is Stephane Lafon's brilliant research:
Diffusion Geometries for Data Mining and Dimensionality Reduction
Extended work was presented at IPAM.

Orion