Search Engine Watch
Old 04-15-2005   #1
orion
 
Join Date: Jun 2004
Posts: 1,044
Document Space Workshop at UCLA's IPAM

I have the honor of learning, via Dr. Mark L. Green (Director, Institute for Pure and Applied Mathematics, and Professor of Mathematics, UCLA), about IPAM's Document Space Workshop. If you are interested, please feel free to register now. The workshop will be held January 23-27, 2006 at UCLA. The organizing committee consists of:

Carey Priebe, Chair (Johns Hopkins University, Center for Imaging Science/Applied Mathematics and Statistics)
Damianos Karakos (Johns Hopkins University, Center for Language and Speech Processing)
Mauro Maggioni (Yale University, Mathematics/Program in Applied Mathematics)
David Marchette (Naval Surface Warfare Center, Dahlgren, VA)

To register or learn more about this very important workshop, please visit http://www.ipam.ucla.edu/programs/ds2006/

Material to be included in the workshop (emphasis added):

"This workshop on Document Space has the goal of bringing together researchers in Mathematics, Statistics, Electrical Engineering, Computer Science and Linguistics; the hope is that a unified theory describing "document space" will emerge that will become the vehicle for the development of algorithms for tackling efficiently (both in accuracy and computational complexity) the challenges mentioned above."

"Text documents are sequences of words with high syntactic structure, where the number of distinct words per document ranges from a few hundreds to a few thousands (depending on the size of the document). Much effort has been devoted to finding useful low-dimensional representations of these inherently high-dimensional documents that would facilitate NLP tasks such as document categorization, question answering, machine translation, etc. Most approaches can be categorized as either (i) symbolic, or (ii) stochastic. Specifically:"


"* The symbolic approaches aim to find the class, or parts-of-speech (POS) tag, for each word in the text. These tags (e.g., noun, verb, pronoun, determiner, etc.) provide a significant amount of information about the word and its neighbors. For example, POS tags can tell us something about how a word should be pronounced (the word lead, for instance, can be either a verb or a noun), or, they can help for further document processing (e.g., word stemming)."

"* The stochastic approaches use word (or N-gram) frequencies to derive summarized versions of documents. In the vast majority of these techniques, each document is represented as a vector of word (or N-gram) frequencies, where each frequency corresponds to the number of times a word appears in a document. Usually, these frequencies are scaled by the number of documents in a corpus that contains each word (the inverse document frequencies). Other approaches use the point-wise mutual information between a word and its context. The end result is a vector representation for each document; the dimensionality of the space in which these vectors lie is usually in the thousands. Dimensionality reduction techniques through Principal Components Analysis, Multidimensional Scaling, Laplacian Eigenmaps, etc., have been explored, with some success."
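
As a rough illustration of the vector representation described in the stochastic bullet above, here is a minimal Python sketch. It is not from the workshop materials: the function name, the toy documents, whitespace tokenization, and the log(N / df) form of the inverse document frequency are all assumptions (several weighting variants are in common use).

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Hypothetical helper: map each document to a vector of
    term frequency * inverse document frequency weights."""
    # Naive tokenization (assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    # One common idf variant (assumption): log(N / df).
    idf = {word: math.log(n_docs / df[word]) for word in vocab}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw count of each word in this document
        vectors.append([tf[word] * idf[word] for word in vocab])
    return vocab, vectors

# Toy corpus, made up for illustration.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cats and the dogs",
]
vocab, vectors = tf_idf_vectors(docs)
```

Each vector has one coordinate per distinct word in the corpus, which is why the dimensionality runs into the thousands for real collections. A word that occurs in every document (here "the") gets weight zero under this idf variant.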

"We can thus see that many IR tasks can be formulated as problems of clustering, outlier detection, and statistical modeling in this high-dimensional space. Many important questions then arise:"


"* What is the best way to perform dimensionality reduction? The fact that documents can have diverse features in terms of vocabulary, genre, style, etc., makes the mapping into a common space very challenging."

"* Is there a single best metric for measuring similarity between documents? Documents can be similar in many ways (in terms of content, style, etc); how do different vector representations facilitate different similarity judgments?"

"* How can the semantics of each word be incorporated into the analysis and representation? For example, there are many cases where related documents share very few common words (e.g., due to synonymy). On the other hand, documents with high vocabulary overlap are not necessarily on the same topic."

"* It has been argued that sub-corpus dependent feature extraction (that is, document feature computation that depends on collective features of a subset of the corpus) yields far better retrieval results than when the features depend only on each document independently. Hence, efficient representation of documents into a common space becomes a "hard" problem: in principle, one would have to consider all possible subsets of a corpus in order to find the one that yields the best feature selection."

"* There is a natural duality between the symbolic and stochastic approaches described above, which have been exploited in order to organize document corpora. Symbolic information can be used to define coordinates and/or similarities between documents, and conversely the stochastic approach can lead to the definition of symbolic information. As above, this correspondence is relative to different subsets, of both documents and symbols, and organizing and fully exploiting it, with efficient algorithms, is challenging."
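
To make the similarity and synonymy questions above concrete, here is a small sketch using cosine similarity on raw word counts. Cosine similarity is only one possible metric and is my choice for illustration, not something the workshop text prescribes; the function name and example sentences are made up.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two bag-of-words count vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Only words present in both texts contribute to the dot product.
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for empty text (assumption)
    return dot / (norm_a * norm_b)

# Related meaning, but no shared words (synonymy).
synonyms = cosine_similarity("cheap automobile repair",
                             "inexpensive car maintenance")

# High vocabulary overlap, but arguably different topics.
overlap = cosine_similarity("jaguar speed in the wild",
                            "jaguar speed on the road")
```

With plain word counts the first pair scores lower than the second, even though a human reader would likely judge the first pair as closer in topic. That gap is the kind of problem the semantics question in the list is pointing at.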

I'll be there. Anyone going? Nacho, are you going, too?


Dr. E. Garcia

Last edited by orion : 04-15-2005 at 11:09 PM.
Old 04-15-2005   #2
Nacho
 
Join Date: Jun 2004
Location: La Jolla, CA
Posts: 1,382
Quote:
Originally Posted by orion
I'll be there. Anyone going? Nacho, are you going, too?
Are you kidding? For $100 in registration fees, how can anyone miss this?

You got it, I'm in! Thanks for the heads up.
Old 04-22-2005   #3
orion
 
Join Date: Jun 2004
Posts: 1,044

Good, Nacho.

I forgot to mention that they have financial support for those who qualify.
https://www.ipam.ucla.edu/elements/c...aspx?pc=ds2006

Orion