Search Engine Watch
02-28-2005 #1
rustybrick
 
 
Join Date: Jun 2004
Location: New York, USA
Posts: 2,810
Search Algorithm Research & Developments - SES NYC 05

Orion couldn't make it because of a mudslide; he will be here tomorrow. They will try to run his presentation with voice-overs.

Mike Grehan was up first. He deleted his presentation last night, so he had to restart from scratch, but everyone sympathized. He shows the SEW Forums and explains that people are very interested in "this stuff." He highlights the keyword co-occurrence thread, which had 46,401 views, so there is a lot of interest. He said Orion deserves a ton of credit.

"What are the ages of my three sons?" He starts a story in which all three of his sons are having a birthday today, then gives clues to figure out their ages: the product of their ages is 36, the sum of their ages equals the number of windows in the building, and the last clue is that his oldest son has blue eyes. Mike then gives a breakdown of how to solve it with equations. The answer is 9, 2, and 2. The window count alone doesn't settle it, because two combinations (9, 2, 2 and 6, 6, 1) both add up to 13; the last clue, about the blue eyes, says there is an oldest son, so 6, 6, and 1 can't be the right answer.

He explains that engines want the most relevant results, which is hard "because end users are search nitwits!" If you walk into a travel store and tell the clerk "travel," he will kick you out, but search engines have to respond. Then there is the "abundance" problem: with too many results, which are the best results, and which are the most relevant?

Social networks were extensively researched long before the Web. He briefly explains citation analysis: we have a Web graph with directed edges (links) and undirected edges (co-citation). If you have questions about this, let me know. Then he discusses PageRank and HITS. He sums it up this way: PageRank is keyword independent, while HITS (Teoma) is keyword dependent. Great way of explaining the difference. He says there is only one problem with these two solutions: "Neither of them work." As for PageRank, well, they don't use it, so he skipped it. He then goes on to HITS and says topic drift, nepotistic linking, and runtime analysis are its three issues. The first two were corrected, but runtime analysis is still an issue.
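For anyone who wants to check Mike's puzzle, here is a small brute-force sketch in Python. It is not Mike's method (he used equations); it simply lists every age triple whose product is 36, keeps only the sums shared by more than one triple (since knowing the window count was not enough), and then applies the blue-eyes clue, which tells you there is a single oldest son.

```python
from collections import defaultdict
from itertools import combinations_with_replacement


def age_triples(product=36):
    """Every unordered triple of whole-number ages whose product is `product`.

    Triples are returned oldest first, e.g. (9, 2, 2).
    """
    triples = []
    for a, b, c in combinations_with_replacement(range(1, product + 1), 3):
        if a * b * c == product:
            triples.append((c, b, a))  # a <= b <= c, so reverse to put the oldest first
    return triples


def solve(product=36):
    # Group the candidate triples by their sum (the number of windows).
    by_sum = defaultdict(list)
    for triple in age_triples(product):
        by_sum[sum(triple)].append(triple)
    # Clue 2: the window count alone did not settle it, so the sum must be shared.
    ambiguous = [triple for group in by_sum.values() if len(group) > 1 for triple in group]
    # Clue 3: there is a single oldest son, so the two eldest ages must differ.
    return [triple for triple in ambiguous if triple[0] > triple[1]]


print(solve())
```

Only the sum 13 is shared (by 9, 2, 2 and 6, 6, 1), and the single-oldest-son filter leaves `[(9, 2, 2)]`.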
He explains how AG from Ask Jeeves (Teoma) cracked it, and puts up a graph of hubs and authorities. So what happened next? The B&H (Bharat and Henzinger) algorithm died with AV (AltaVista); the two of them went to Google, and Hilltop came out. Then in February 2003, Google patented Local Link (Bharat). Then he goes into the Florida update (nice little graph), which he says had a lot to do with Google moving from keyword-independent to keyword-dependent ranking. He puts up some links to advanced papers on where this is heading. He finishes off his presentation with another story: a guy walking in the desert finds a dead guy on the sand with a bag on his back. What was in his bag? A parachute.
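The hubs-and-authorities idea can be sketched in a few lines of Python. This is a minimal textbook HITS iteration on a made-up toy graph, not Teoma's or Ask Jeeves' production algorithm: a page's authority score is the sum of the hub scores of the pages linking to it, a page's hub score is the sum of the authority scores of the pages it links to, and both are normalized after each step. The page names, graph, and iteration count are assumptions for illustration. In Kleinberg's original formulation the graph is a query-specific subgraph (pages returned for the query plus their link neighbors), which is what makes HITS keyword dependent.

```python
import math


def hits(graph, iterations=50):
    """Textbook HITS on a small directed graph.

    `graph` maps each page to the list of pages it links to.
    Returns (hub, authority) score dictionaries.
    """
    pages = set(graph) | {dst for targets in graph.values() for dst in targets}
    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}
    for _ in range(iterations):
        # Authority: sum of the hub scores of the pages that link to you.
        auth = {page: 0.0 for page in pages}
        for src, targets in graph.items():
            for dst in targets:
                auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {page: v / norm for page, v in auth.items()}
        # Hub: sum of the authority scores of the pages you link to.
        hub = {page: sum(auth[dst] for dst in graph.get(page, [])) for page in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {page: v / norm for page, v in hub.items()}
    return hub, auth


# Hypothetical toy graph: A and D point to both B and C, E points only to B.
toy = {"A": ["B", "C"], "D": ["B", "C"], "E": ["B"]}
hub, auth = hits(toy)
print(auth)
print(hub)
```

On this toy graph B ends up as the strongest authority, and A and D (which point to both authorities) as the strongest hubs.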

Next up was Rahul Lahiri of Ask Jeeves, who helped me with a relevancy issue about a month ago. He says there is some overlap with Mike's presentation. He goes over the Ask properties and growth numbers. Ask's mission is relevance, index completeness, freshness, and structured data (smart answers). The algorithmic drivers are content/text analysis and link analysis. He focuses on the link side and shows a graph of page A linking to pages B and C (Mike showed something similar). Ask looks at what the links "are about." He goes into hubs and authorities. The key challenges are solving the problem in real time and identifying the communities. He then gives example queries such as "buffalo" vs. "bay area airports." They need to weed out the noise from the good stuff. He explains that small enthusiast sites get a chance to rise to the top, which is great. They can then do a better job of identifying different communities and refining search.

Now they give Orion's (Dr. E. Garcia's) presentation a try. It sounds like Nacho. Cool, it's working. Nacho introduces it. Co-occurrence suggests association or relatedness. I'll summarize it later; it's very technical.

Q & A:

LSI (latent semantic indexing): Mike said that engines will use it, but implied that they do not at this time.
02-28-2005 #2
Phoenix
Member
 
Join Date: Jun 2004
Location: Austin, Texas
Posts: 97
Search Algorithms and Research

I covered this session as well. Here is my take on it. I asked Barry to gather what he could so he could possibly write an additional summary. This was the most difficult session of the day to report on: it was fast paced, partly delivered as an audio voice-over, and highly technical, but amazingly fascinating.

“End users want to achieve their goals with a minimum of cognitive load and a maximum of enjoyment.” ~ Marchionini. Why? Because search users are nitwits. Mike asks us to consider the following: what if someone goes into a travel store and, when asked what he is looking for, answers “travel”? He goes on to describe what it takes to get ranked in the top ten. Social sciences and bibliometry are also mentioned on the screen; both have existed for a long time, since well before search engines, and are being applied today in the algorithms created for search engines. The web is a social network, he continues, and social networks were extensively researched long before the web. He describes citation analysis and how it is applied in search engines. There is a difference between a citation and a reference.
Hyperlink analysis algorithms make one or both of these simple assumptions. Assumption 1: a hyperlink from page A to page B is a recommendation of page B by the author of page A. Co-citation: if a page C cites pages A and B, then A and B are said to be co-cited by C, and A and B being co-cited by many other pages is evidence that they are related. There are two main algorithms based on links. PageRank (Google): each page on the web has a measure of prestige that is independent of any information need or query, i.e. it is keyword independent. Roughly speaking, the prestige of a page is proportional to the sum of the prestige scores of the pages linking to it. The second is HITS, or Hyperlink-Induced Topic Search. The problem, Mike says, is that neither of these algorithms works.
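Co-citation and keyword-independent prestige can both be illustrated with a short Python sketch on a hypothetical toy graph. `co_citations` counts how many pages cite each pair of pages; `pagerank` is the textbook power iteration in which each page passes its score evenly along its outgoing links, with no query anywhere in the computation. The damping factor of 0.85 is the value commonly quoted from the original PageRank paper and is an assumption here, as is spreading the score of pages with no out-links evenly over all pages; nothing in the session describes Google's actual settings.

```python
from collections import Counter
from itertools import combinations


def co_citations(graph):
    """Count, for each pair of pages, how many pages link to both (co-cite them)."""
    counts = Counter()
    for targets in graph.values():
        for pair in combinations(sorted(set(targets)), 2):
            counts[pair] += 1
    return counts


def pagerank(graph, damping=0.85, iterations=100):
    """Keyword-independent prestige via power iteration (no query involved)."""
    pages = sorted(set(graph) | {dst for targets in graph.values() for dst in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = graph.get(page, [])
            if targets:
                # A page shares its prestige evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
            else:
                # Assumption: a page with no out-links spreads its prestige over all pages.
                for dst in pages:
                    new_rank[dst] += damping * rank[page] / n
        rank = new_rank
    return rank


toy = {"A": ["B", "C"], "D": ["B", "C"], "E": ["B"]}
print(co_citations(toy)[("B", "C")])  # number of pages that co-cite B and C
print(pagerank(toy))
```

Here B and C are co-cited by A and D, and B, with the most incoming prestige, gets the highest score whatever the query.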

The problems with HITS: topic drift, nepotistic linking, and runtime analysis. Mike says there are three steps to success. Teoma cracked the runtime problem, cutting the time of a search from 11 seconds to instant. He describes Teoma and subject-specific popularity.
Adventures in search algorithms: what happened next? Krishna Bharat and Monika Henzinger both joined Google. Mike believes that with Florida, Google moved from keyword-independent to keyword-dependent ranking.

Ending joke:
A guy is trapped in the desert, looking for signs of life. He finds a man lying face down in the sand with a bag on his back. What was in the bag that would have saved him? Answer: a parachute.

Next up was Rahul Lahiri. He presents some of the properties that Ask Jeeves controls. Today they are ranked #7 on the web and have done exceedingly well since this time last year. Their mission: relevance. He goes into general link analysis methods. The challenge is discovering what the links are about. A link from page A to page B (or C) is a vote or recommendation by the author of page A for page B (or C). The problem is that if you have a link with the anchor text "budget," you don't know what budget means. Was it Budget rent-a-car, or someone's company budget? That's obviously a problem. He continues with organizing sites into local subject communities; this is how Teoma views the web. One challenge they face is solving the problem in real time: they have 200 ms (milliseconds) to do this computation for each query, millions of times per day. You also have to identify the communities, and the link structure of the web is noisy. Hubs link to topic-specific pages. He contrasts topic-focused and broad topic areas: topic focused is a search for "buffalo," and broad topic areas is a search for "bay area airports." One benefit is that smaller enthusiast sites get a chance to come up to the top of the search listings (example search: fantasy football).
The power of communities: a better vision, expert validation, contextualization, and a better user experience.
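To make the "budget" example concrete, here is a hypothetical Python sketch of ranking by links from within a subject community rather than by raw link counts. It only illustrates the idea of subject-specific popularity and is not Teoma's algorithm: the page names, the link graph, and the two hand-picked communities are invented, and identifying the communities (one of the hard problems Rahul names) is simply assumed to be done already.

```python
from collections import Counter


def global_indegree(graph):
    """Raw link popularity: how many links each page receives from anywhere."""
    return Counter(dst for targets in graph.values() for dst in targets)


def subject_popularity(graph, candidates, community):
    """Links each candidate receives only from pages inside `community`.

    `community` is a set of pages assumed to be about the query's topic; how
    such communities are found is not modeled here.
    """
    return {
        page: sum(1 for src in community if src != page and page in graph.get(src, []))
        for page in candidates
    }


# Hypothetical graph: two unrelated pages are both linked with the anchor text "budget".
graph = {
    "rental-blog": ["budget-rental"],
    "travel-forum": ["budget-rental", "budget-planner"],
    "money-blog": ["budget-planner"],
    "finance-news": ["budget-planner"],
    "tax-site": ["budget-planner"],
}
candidates = ["budget-rental", "budget-planner"]
car_rental = {"rental-blog", "travel-forum"}
personal_finance = {"money-blog", "finance-news", "tax-site"}

print(global_indegree(graph))
print(subject_popularity(graph, candidates, car_rental))
print(subject_popularity(graph, candidates, personal_finance))
```

On raw link counts the budgeting page wins (4 links to 2), but counted only within the car-rental community the rental page comes first (2 links to 1).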

Next to present was Dr. E. Garcia, a pioneer who has helped us, as marketers, better understand the search engines. His plane has been delayed until tomorrow because of the weather (it's snowing heavily here), BUT there is a voice-over for his presentation. The tape starts. He is going to discuss grasping co-occurrence. Co-occurrence suggests association or relatedness. Side note: people are leaving because the audio isn't great, but not many, as there is a good amount of interest in this. Back to co-occurrence. Co-occurrence can be global, local, or fractal. This presentation is highly technical, and while I understand his work, it's hard to follow. I am trying to get what I can, as it requires very detailed listening and comprehension at this point. I apologize for any errors in this document; please feel free to correct them.

Example: "Hawaii" is semantically connected to aloha, Hawaiian, and Maui. C-indices can be used to estimate the relative presence of targeted keywords across search engines. He gives another example, "comida + mexicana," whose terms are semantically connected. C-indices can also be used to monitor keyword trends, word patterns, and topics over time. He goes on to talk about competitive words. Based on his research, the examples suggest that many competitive queries in Google tend to exhibit high C12 indices. His research indicates that overused queries tend to exhibit unusually high C-indices, while unrelated terms in a query tend to exhibit very small C-indices. He gives the example of "guacamole optimization," with a low C-index of 0.12.

On to term sequencing: EF-ratios. He talks about query modes such as FINDALL and EXACT, and how order and frequency matter. EF-ratios can be used to estimate the relative frequency of natural sequences and phrases in a source. So what about candidate sequences? EF-ratios can be used to examine how easy or difficult it would be to rank for a given sequence in a given search engine; keyword competitiveness is specific to each search engine. When queried in EXACT mode, some search engines return documents in which the queried terms are separated by a delimiter (hyphen, underscore), a space, or stopwords (in, of, with). To recap, co-occurrence theory can be used to understand semantic associations between terms, products, and services.
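The FINDALL versus EXACT distinction can be simulated on a toy collection. In this hypothetical Python sketch, FINDALL counts documents that contain every query term anywhere, and EXACT counts documents that contain the terms as one contiguous sequence. Tokenizing on anything that is not an ASCII letter or digit means a hyphen or underscore behaves like a space; that choice, the sample documents, and the decision not to skip stopwords are assumptions, since engines differ on all three.

```python
import re


def tokenize(text):
    """Lower-case ASCII words; hyphens, underscores, and punctuation act like spaces (assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def findall_match(tokens, terms):
    """FINDALL/AND mode: every query term appears somewhere, in any order."""
    return all(term in tokens for term in terms)


def exact_match(tokens, terms):
    """EXACT mode: the query terms appear as one contiguous sequence, in order."""
    k = len(terms)
    return any(tokens[i:i + k] == terms for i in range(len(tokens) - k + 1))


def hit_counts(docs, query):
    """Return (n12 FINDALL, n12 EXACT) document counts for a multi-word query."""
    terms = tokenize(query)
    tokenized = [tokenize(doc) for doc in docs]
    n_findall = sum(findall_match(tokens, terms) for tokens in tokenized)
    n_exact = sum(exact_match(tokens, terms) for tokens in tokenized)
    return n_findall, n_exact


docs = [
    "Authentic comida mexicana in Austin",
    "comida-mexicana recipes",
    "Mexicana airlines: comida served on board",
    "comida_mexicana blog",
    "pizza and pasta",
]
print(hit_counts(docs, "comida mexicana"))
```

This returns `(4, 3)`: the airline page has both words but not in sequence, so it counts only under FINDALL. Every EXACT hit is also a FINDALL hit, which is why the EXACT results sit inside the FINDALL results.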

Q: Interested in how we will be searching in 5-10 years time? Personalization?
A: Where is search going? Mike did an interview with the founder of Teoma, which he says was interesting. The most interesting point was that search needs to get 10 steps up the ladder, and currently we are at 3 or 4. The one thing that will change this is personalization, which is misunderstood. It's not about giving you a search just for you; it's about returning results for your peer group. From there, they can start to tailor the search specifically to you. There is work now on using genetic algorithms and other techniques to create search engines. Mike concludes that the more information we give the search engines, the better our experience will be.

Last edited by Phoenix : 02-28-2005 at 08:14 PM. Reason: spacing
02-28-2005 #3
rustybrick
 
 
Join Date: Jun 2004
Location: New York, USA
Posts: 2,810
Orion's presentation:

First, excuse me if I make major mistakes in my interpretation of the presentation. I hope Dr. G. (Orion) reviews this and makes any necessary corrections.

Orion's first slide goes over some of the basics of co-occurrence. Orion explains that co-occurrence shows a type of "relatedness" between words: if two terms are often discussed together or found in the same document, they tend to be more related. He then gives the example of the term "aloha." What does aloha make us think of? Hawaii is the correct answer. Orion explains that this is important when conducting "keyword-brand associations."

In his second example he shows an equation he discussed in the forums:

c12-index = (n12 / (n1 + n2 - n12)) x 1000

He overlays an example of keywords k1 and k2, with n1 and n2 as the results for each keyword and n12 as the overlap in the middle, and explains that a query of 3 keywords in AND mode (n123) is much more complex. He then brings back the old "aloha hawaii" example to explain "term associations." When you compute the values in Google for "aloha hawaii" versus "aloha indiana" or "aloha montana," you will notice that the C-index is much higher for "aloha hawaii" (28.11) than for "aloha indiana" (3.23). This shows that aloha AND hawaii are more "semantically connected" than the other examples.

He then shows how you can use the C-index computation to determine on which engines it would be easier to target a specific keyword phrase: the higher the C-index, the more competitive that keyword phrase is in that engine, relative to other engines. Orion then explains that the C-index can be used to monitor keyword trends over time, and shows some very interesting slides to prove it. Orion's benchmark for a "competitive query" is one with a C-index above 25 points; he lists a number of those submitted to him via the SEW Forums for a study he did several months ago. He then computes the C-index of some spam-related keywords that were way above the 100 mark on the scale. Neat stuff. Orion then explains that most engines use AND (FINDALL) mode as opposed to EXACT. When you compare the two, you should find the results for EXACT mode within the FINDALL results.
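The C-index formula translates directly into code. Below is a minimal Python sketch, assuming n1, n2, and n12 are the result counts an engine reports for k1 alone, k2 alone, and k1 AND k2; the counts in the example are hypothetical, not figures from the session, and the 25-point threshold is Orion's benchmark as reported in this thread. Before the x 1000 scaling, the ratio is the Jaccard coefficient of the two result sets.

```python
def c_index(n1, n2, n12):
    """C-index = n12 / (n1 + n2 - n12) x 1000.

    n1, n2: result counts for k1 alone and k2 alone.
    n12: result count for k1 AND k2 (FINDALL mode).
    """
    if not 0 <= n12 <= min(n1, n2):
        raise ValueError("n12 must lie between 0 and min(n1, n2)")
    union = n1 + n2 - n12  # results containing k1, k2, or both
    if union == 0:
        raise ValueError("neither keyword returned any results")
    return n12 * 1000 / union


def is_competitive(c, threshold=25.0):
    """Orion's reported benchmark: a C-index above 25 points marks a competitive query."""
    return c > threshold


# Hypothetical result counts, not figures from the session.
strong = c_index(n1=100, n2=200, n12=50)   # 50 * 1000 / 250
weak = c_index(n1=1000, n2=1000, n12=10)   # 10 * 1000 / 1990
print(strong, is_competitive(strong), round(weak, 2), is_competitive(weak))
```

The first pair scores 200.0 and clears the benchmark; the second scores about 5.03 and does not.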
The reason has to do with order and proximity, which matter in EXACT mode but not in FINDALL mode. Using this information, Orion defines a new ratio, the "EF-Ratio":

EF-Ratio = (n12 EXACT results / n12 FINDALL results) x 100

The EF-Ratio shows us the "natural sequences" of words used, meaning how words are used in language and documents (real life). EF-Ratios can be used to determine the competitiveness of a keyword: the lower the number, the less competitive it is. In fact, he showed that competitiveness for the same keyword phrases differs from search engine to search engine. The last slide we will save for those who were at the session.
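The EF-Ratio is just as easy to compute. This is a minimal Python sketch of the formula as reported, applied to the same phrase on two engines; the engine names and (EXACT, FINDALL) counts are hypothetical, and the "lower means less competitive" reading is the rule of thumb from the presentation, not something the code verifies.

```python
def ef_ratio(n12_exact, n12_findall):
    """EF-Ratio = (n12 EXACT results / n12 FINDALL results) x 100."""
    if n12_findall <= 0:
        raise ValueError("the FINDALL count must be positive")
    if not 0 <= n12_exact <= n12_findall:
        raise ValueError("EXACT results should be a subset of the FINDALL results")
    return n12_exact * 100 / n12_findall


# Hypothetical (EXACT, FINDALL) counts for one phrase on two engines.
counts = {"engine_a": (3_000, 4_000), "engine_b": (1_000, 5_000)}
ratios = {engine: ef_ratio(*pair) for engine, pair in counts.items()}
# Per the reported rule of thumb, the lower EF-Ratio marks the less competitive engine.
least_competitive = min(ratios, key=ratios.get)
print(ratios, least_competitive)
```

With these made-up counts, engine_a scores 75.0 and engine_b scores 20.0, so the phrase would look easier to target on engine_b.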
03-01-2005 #4
massa
Member
 
Join Date: Jun 2004
Location: home
Posts: 160
Great job guys.

This is the one that disappoints me the most. Business didn't allow me to leave until Monday, and when we got to the airport we found that our flight had been cancelled. Since we knew we would already be missing the first day, I didn't see the point in trying to reschedule, so I had to accept that it wasn't meant to be this time.

I had the great pleasure of working with Dr. Garcia a couple of years ago and have been a huge fan ever since. I was going mostly so I could finally meet him face-to-face. I wouldn't have been able to attend the seminar anyway, but I was actually looking forward more to just meeting with Doc, and I also wanted to bend Mikkel's ear. Oh well, maybe the next one.

In the meantime, it is great to be able to read the pitch-to-swing reports you guys are providing.
03-07-2005 #5
orion
 
 
Join Date: Jun 2004
Posts: 1,044

Hi, Bob.

Thank you for such kind words. I have a lot of respect for you, too. I guess we'll meet at a future marketing event unless you invite me again to OK.

BTW, I PMed you several times prior to the SES. Feel free to PM or email me.

Orion
03-07-2005 #6
massa
Member
 
Join Date: Jun 2004
Location: home
Posts: 160
Mi casa, su casa, Doc. Just let me know when you can make it and I'll take care of the accommodations!

I have had a couple of other people mention problems with my SEW PM. Try my email please. I have something I think you'd enjoy discussing.
03-07-2005 #7
stuntdubl
Traffic not SEO.
 
Join Date: Jun 2004
Location: Upstate NY
Posts: 45
Yes...very nice presentation. I was impressed that you were able to do it while not actually being there in person Dr. Garcia.

I would be interested in re-reading the presentation, as well as Mike Grehan's from that session. Does anyone know if these are posted anywhere...and if so where?
03-07-2005 #8
orion
 
 
Join Date: Jun 2004
Posts: 1,044

Quote:
Originally Posted by stuntdubl
Yes...very nice presentation. I was impressed that you were able to do it while not actually being there in person Dr. Garcia.

I would be interested in re-reading the presentation, as well as Mike Grehan's from that session. Does anyone know if these are posted anywhere...and if so where?
Thanks much, stuntdubl, for such kind words. I just uploaded some advanced information, which is available at http://www.miislita.com/fractals/ove...-patterns.html This is part of an ongoing series on The Fractal Nature of Semantics I'm writing. Enjoy it!

I'm debating if I should make available the entire SES presentation. I need to resort several issues first.

Cheers

Orion