Search Engine Watch

Old 04-29-2005   #1
orion
 
Join Date: Jun 2004
Posts: 1,044
L-Systems and Semantics (Copy)

If you are interested in conceptual semantics, cognition, and how these relate to relevancy and semantic search engines, I have written the Fractals, L-Systems and Semantics article. Feel free to discuss it here if you wish.


Orion
Old 04-29-2005   #2
randfish
Member
 
Join Date: Sep 2004
Location: Seattle, WA
Posts: 436
Dr. Garcia,

My guess is that this paper will settle the debate over whether sentence and document structure is fractal in nature (or at least help your position considerably).

If we assume that search engines can use this type of fractal analysis to grade documents, what are some possible applications?

1. Could search engines use this to further eliminate "spam" and auto-generated content from their index (or at least penalize its rankings)?

2. Conversely, could they use this to boost the rankings of documents exhibiting particularly on-topic and relevant content?

Thanks for this work, you've enlightened us all once again.
Old 04-29-2005   #3
orion
 
Join Date: Jun 2004
Posts: 1,044

Thanks, Rand

Quote:
Originally Posted by randfish
1. Could search engines use this to further eliminate "spam" and auto-generated content from their index (or at least penalize its rankings)?
From my side, I can say I have a method for detecting spam using fractal measures.

Quote:
2. Conversely, could they use this to boost the rankings of documents exhibiting particularly on-topic and relevant content?
Absolutely.


Orion
Old 05-02-2005   #4
2much
Member
 
Join Date: Feb 2005
Location: Los Angeles
Posts: 24
layman's terms

orion, I read the article, but as I'm not mathematically, linguistically, or technically oriented (I'm a marketer), could you translate your article into layman's terms?

If you were explaining this concept to a 10-year-old, how would you explain it?
Old 05-02-2005   #5
orion
 
Join Date: Jun 2004
Posts: 1,044
Hi

Hola, Marcela

Dr. Kevin Jones (Kingston University) expressed all this better than I could in his winning essay for the THES/OUP Science Writing Prize for 1999 ("Self-similar syncopations: Fibonacci, L-systems, limericks and ragtime"). This is reference 23 of the paper.


BTW, some university colleagues expressed interest in this topic, but from the IR/semantics standpoint rather than marketing, so this morning I uploaded a printer-friendly version of the article and removed some typos.

Cheers,

Orion
Old 05-03-2005   #6
orion
 
Join Date: Jun 2004
Posts: 1,044

Randfish:

Between yesterday and this morning, I have received email communication and good feedback from Profs Simon Levy (Washington & Lee Univ), Kevin Jones (Kingston Univ), Brian Davison (AIRWeb Chair, Lehigh Univ) and so many others working in fractals applied to semantics and IR/mining. Yes, fractals are relevant to the nature of semantics.

Marcela:

Explain this to a little boy; that's a task. I would say, "Look at things in Nature, the sound of music, rhythms, the shape of things. What do you see and feel?" and go from there.

How could marketers use the essence of the article? That's a different story.

1. Optimize pages for NV phrases and noun groups (NN, NNN, etc.), since these structures convey more precise information and are more flexible if users are doing query expansion.

2. Use these structures in titles, URLs, first paragraphs, etc.

3. Buy AdSense with these structures.

4. Use a subliminal technique for embedding these structures, described in the article as latent sequences. So, let's say there is a hypothetical camping company known as "Selection Camps", writing a sentence like

For staying overnight, a good selection is one of these camps

After stopword removal, it is evident you can score for NN == selection camps

assuming you optimize for "Selection Camps".
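
The stopword-removal step can be sketched in a few lines of Python. This is only an illustration: the stopword list and the helper names (`STOPWORDS`, `content_tokens`, `adjacent_pairs`) are assumptions, not something taken from the article, and a real engine's list and tokenizer will differ. It simply shows that the two target terms end up next to each other once the stopwords are dropped.

```python
import re

# Hypothetical stopword list, chosen for illustration only.
STOPWORDS = {"a", "an", "the", "for", "is", "one", "of", "these"}

def content_tokens(sentence):
    """Lowercase the sentence, split on non-letters, drop stopwords."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [t for t in tokens if t not in STOPWORDS]

def adjacent_pairs(tokens):
    """Return every pair of neighbouring tokens as a two-word phrase."""
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]

sentence = "For staying overnight, a good selection is one of these camps"
tokens = content_tokens(sentence)
pairs = adjacent_pairs(tokens)
print(tokens)  # ['staying', 'overnight', 'good', 'selection', 'camps']
print("selection camps" in pairs)  # True
```

Whether the pair is found depends entirely on which words are treated as stopwords; if "one" were kept, "selection" and "camps" would no longer be adjacent.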

Just one of many examples. I would be happy to provide full examples about using fractal and semantics but in a proper training environment.


Orion

Last edited by orion : 05-03-2005 at 10:12 PM.
Old 05-03-2005   #7
randfish
Member
 
Join Date: Sep 2004
Location: Seattle, WA
Posts: 436
Orion,

I'm thrilled to hear the response to the paper has been so positive. It suggests to me that there is great potential for expanding into this field with IR.

Your suggestions about the use of fractals are also revealing; they suggest that pattern discrimination and detection can go much deeper than most in the field suspect...
Old 05-04-2005   #8
xan
Member
 
Join Date: Feb 2005
Posts: 238

Sorry!

I chatted about this with very influential people in IR and computational linguistics, and with some highbrow research groups, and the response was not so positive (Cambridge University, MIT, Xerox and IBM, among others).

Here are some basic problems with the claims:

You are right in saying that grammar is self-similar – but only to a point.
This shouldn’t be anything particularly shocking. Formal grammar should be satisfied with a relatively simple set of production rules, although in practice this isn’t the case. As I found out in my research, and as has been evident in research by other labs, there are always weird exceptions that make the number of rules required exponentially bigger.

The examples seem to give the impression that English grammar might work like L-systems, in that your NV model descends to a certain depth. English grammar isn’t that restrictive and can descend further in one branch than it might in another.

Furthermore, the examples are chosen to fit the model, and this does not qualify as proper research. If this were submitted as serious research, it would get knocked back straight away because none of the claims are backed up and no proper research is evident. It therefore qualifies as an idea, but not as a scientific anything.

Additionally, you do not compare the technique to any other methods, and you do not draw any statistical evidence from it. Numbers are all-important.

I do understand that you are not writing in an academic or research environment, and that it is meant to benefit a different audience and not IR and CL research, but really, all claims should be backed up, don't you think?

Good luck nonetheless.

Last edited by orion : 05-04-2005 at 01:53 PM.
Old 05-04-2005   #9
orion
 
Join Date: Jun 2004
Posts: 1,044

You and "they" are entitled to an opinion, whatever that may be. Those quoted in the article differ from "they", of course.

Thanks, anyway.

Orion

PS. In your post, I hit the "edit" instead of the "quote" button, but did not edit anything. Sorry, my mistake.

Last edited by orion : 05-05-2005 at 10:24 PM. Reason: Removed personal message that should be taken via PM.
Old 05-05-2005   #10
general
Member
 
Join Date: Feb 2005
Posts: 13
OK guys, here is a different point of view, from an unscientific, normal business guy trying to optimize [and keep optimized] a website which is a leader in its field.

Search engines probably are not retrieving data based on semantic or linguistic techniques at this point in time. I believe, however, that it is a matter of 1 to 1-1/2 years before they start extracting in this manner [I find the new Yahoo semantic tool very interesting]. Even studying the beta Yahoo semantic analyzer, and trying to break down the syntactic patterns that they may be using, definitely points to the use of noun phrase patterns.

So, we have taken Orion's theories and put them to real-world use [whether they are relevant or not at this point in time] in the following manner:

1) All of the content for every page is now written in one of the following formats:

a) Very clear Noun Phrase to Verb to Noun Phrase sentences, whereby the first noun phrase in the sentence is the subject [keyword phrase]. Previously, most of our sentences were written in random sequences. Also, we try to start most of these sentences with a determiner, "the".
b) If we do not use the "noun phrase, verb, noun phrase" sequence, we use certain indicator verbs, based on research referenced in a lot of his articles. For example, one study determined that the verb "including" or "includes" very often defines a key phrase [subject] in DMOZ categories; the verb "is a" is also a great indicator.
c) We have improved our HTML layout with CSS and tried to make it more text-browser friendly [linearization], in hopes that a spider would also read it this way.
d) We implant related topic words [C-Index or synonym] into our text.
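
As a rough illustration of item (b), here is a small Python sketch that flags sentences in which an indicator verb follows the opening phrase. The regular expression and the function name `split_on_indicator` are assumptions made for this example, not the method used in the research mentioned above; a real system would more likely rely on a part-of-speech tagger than on a regular expression.

```python
import re

# Indicator verbs named in the post; the regex itself is an assumption.
INDICATOR = re.compile(
    r"^(?P<subject>.+?)\s+(?P<verb>is an|is a|includes|including)\s+(?P<rest>.+)$",
    re.IGNORECASE,
)

def split_on_indicator(sentence):
    """Return (subject, verb, rest) if an indicator verb is found, else None."""
    match = INDICATOR.match(sentence.strip().rstrip("."))
    if match is None:
        return None
    return match.group("subject"), match.group("verb").lower(), match.group("rest")

print(split_on_indicator("The mountain bike is a bicycle built for off-road riding."))
# ('The mountain bike', 'is a', 'bicycle built for off-road riding')
print(split_on_indicator("Ride carefully."))
# None
```

The example sentence is made up; the point is only that the phrase before the indicator verb can be pulled out as a candidate key phrase.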

Just some quick examples... so Orion's theories may not qualify as "scientific acceptable proof" in the academic world [or maybe they do]. But, from a real-world point of view relative to optimizing, there may be some good rationale to help us "real guys" trying to keep our websites on the cutting edge. Again, I think we are ahead of the curve, but starting to optimize a 10,000-page website now for Semantic Extraction will help us 1 year from now. The basic premise, in the "macro" view, is that the true goal of the SE is to be able to interpret the true meaning and true context of a document... it is just a matter of time before they figure out how to extract in a more semantic sense. I certainly cannot objectively prove to you that taking the above steps [Orion's theories] has improved the ranking of one page or another, but I feel it will only help in a general sense as we move toward a semantic world.

So, I like hearing both Xan's and Orion's points of view... one is probably a bit theoretical [ahead of today's reality] but nonetheless fruitful, while the other brings us back to reality. I personally am aggressive, more of a risk taker, so I am willing to test and try some of Orion's theories, knowing it probably can't hurt. But a more conservative person, or one without as much time to take risks, may take Xan's point of view.

Keep on posting both of you.
Old 05-05-2005   #11
xan
Member
 
Join Date: Feb 2005
Posts: 238
Orion,

I didn't start the digression. I simply asked you to respond to my observations:

"You are right in saying that grammar is self-similar – but only to a point.
This shouldn’t be anything particularly shocking. Formal grammar should be satisfied with a relatively simple set of production rules, although in practice this isn’t the case. As I found out in my research, and as has been evident in research by other labs, there are always weird exceptions that make the number of rules required exponentially bigger.

The examples seem to give the impression that English grammar might work like L-systems, in that your NV model descends to a certain depth. English grammar isn’t that restrictive and can descend further in one branch than it might in another."



Thank you, General. You're quite right.

"Search engines probably are not retrieving data based on semantic or linguistic techniques at this point in time"
There's evidence of that in the news sourcing and this search too. Mainstream search has to comply with this. Every other kind of search does, like digital libraries for example. The problem with the web is that so much content is unstructured, but there's a lot of work on automatic markup and metadata being done, meaning that it will definitely need to be used.
Linguistic techniques are used to retrieve documents by all search engines. The ranking, however, isn't, as you know.

Starting sentences with "The"... why is that? "The" is a stopword. Because it precedes a noun? Well, so does "a" or "an".
"noun phrase, verb, noun phrase" - I'm not sure why. Maybe you're trying to tie nouns to their referent verbs. This would help you. But relying on any fixed set of rules could easily throw up some nice patterns to search providers, which would get picked up, analysed and worked out of the system, if you see what I mean. Never write for the machine; write for the user.

Synonyms and all that are just fine, and you look like you are facing in the right direction, with Orion's help perhaps.

You are right, I'm used to the research arena, and the way it works is particular and very harsh! Nothing will even get published if you haven't backed up every single point that you make and shown it from all angles, even those where the theory fails.

Last edited by Nacho : 05-05-2005 at 09:10 PM. Reason: Removed personal message that should be taken via PM.
Old 05-05-2005   #12
orion
 
Join Date: Jun 2004
Posts: 1,044

General:

Thank you for your feedback/comments. However, this thread is not about c-indices or co-occurrence. I do agree with your observations on testing.

Orion

Last edited by Nacho : 05-05-2005 at 09:15 PM. Reason: Removed personal message that should be taken via PM.
Old 05-09-2005   #13
orion
 
Join Date: Jun 2004
Posts: 1,044

Quote:
Originally Posted by general
Just some quick examples... so Orion's theories may not qualify as "scientific acceptable proof" in the academic world [or maybe they do].
Hope this helps.

Prof. Brian Davison (W3C's AIRWeb Chair) emailed me to mention that he will be offering a graduate course in web mining next semester, and that he will encourage the students to consider a semester project related to the published work. Other colleagues are currently conducting or will be conducting research in this area. I have several research projects for the summer in collaboration with some of them.

Orion
Old 05-12-2005   #14
claus
It is not necessary to change. Survival is not mandatory.
 
Join Date: Dec 2004
Location: Copenhagen, Denmark
Posts: 62
Interesting reading, and thanks for the link to Self-similar syncopations. That one was very nice, and through a few other pages, it led me to an entertaining book that I've just ordered (although not from Amazon).

Speaking of books:

Quote:
These specific thoughts and events, when co-occurring with similar thoughts or associated with other segments give rise to branching paths which allow words to evolve into topics and topics to evolve into documents.
So, is a book forthcoming?
Old 05-12-2005   #15
orion
 
Join Date: Jun 2004
Posts: 1,044

Hi, Claus

Yes, it's a great read. Prof. Kevin Jones, award winner and author of the piece, read mine and emailed me last week. We have similar goals: he from the mathematics of computerized music, I from the mathematical semantics of mining. His is a gem of a work.

Orion
Old 05-12-2005   #16
orion
 
Join Date: Jun 2004
Posts: 1,044

Claus, you may also be interested in the following work

Recognition and Generation of Fractal Patterns by using Syntactic Techniques, by Jacques Blanc-Talon (C.S.I.R.O. Division of Information Technology, Australia). Here the author shows the connection between D0L-systems (a special type of context-free L-system), language and fractals. I briefly described D0L-systems in my paper.
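
For readers who have not met D0L-systems, here is a minimal Python sketch of one. It uses Lindenmayer's well-known algae system (axiom A, rules A -> AB and B -> A) rather than an example taken from either paper, and the function name `rewrite` is an assumption made for illustration.

```python
# Deterministic, context-free L-system (D0L): every symbol is rewritten
# in parallel at each step, and each symbol has exactly one rule.
RULES = {"A": "AB", "B": "A"}

def rewrite(axiom, steps, rules=RULES):
    """Apply the rules to every symbol in parallel, `steps` times."""
    word = axiom
    for _ in range(steps):
        word = "".join(rules.get(symbol, symbol) for symbol in word)
    return word

for n in range(5):
    word = rewrite("A", n)
    print(n, word, len(word))
# 0 A 1
# 1 AB 2
# 2 ABA 3
# 3 ABAAB 5
# 4 ABAABABA 8
```

The word lengths 1, 2, 3, 5, 8 are consecutive Fibonacci numbers, and each word contains the previous one as a prefix, which is the kind of self-similarity under discussion.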

For fractals applied to cognition, semantics and the psychology of language, you may want to check the work of Prof. Ben Shanon (Department of Psychology, The Hebrew University, Jerusalem, Israel). He has written dozens of research papers on the subject. There is an old, yet nice, article, "Fractal patterns in language", in New Ideas in Psychology, 11, 105-109, 1993, which may interest you, too.

Orion

Last edited by orion : 05-12-2005 at 04:20 PM.