Search Engine Watch
04-26-2005   #1
orion
Join Date: Jun 2004
Posts: 1,044
The Future: Stuff I've Seen (SIS)

Here is the presentation by MSN's Susan Dumais in which the future of search and indexing is unveiled: SIS http://www.infonortics.com/searcheng...des/dumais.pdf

1. Users see something and it gets indexed.
2. You are the search agent!

Interesting Features of this technology

1. Re-use vs. search discovery
2. Implicit Queries: Score = tf_doc / log(tf_corpus + 1)
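
To make the Implicit Queries score concrete, here is a minimal Python sketch. It assumes the numerator is a term's count in the current document and the denominator uses its count across the whole indexed corpus; the slides may define these differently. The function name, the sample data, the natural log and the handling of unseen terms are all assumptions made for illustration.

```python
import math
from collections import Counter

def implicit_query_terms(doc_tokens, corpus_tf, top_n=3):
    """Rank a document's terms by tf_doc / log(tf_corpus + 1)."""
    doc_tf = Counter(doc_tokens)
    scores = {}
    for term, tf_doc in doc_tf.items():
        # Assumption: a term unseen in the corpus counts as seen once,
        # which keeps the denominator above zero (log(0 + 1) would be 0).
        tf_corpus = max(corpus_tf.get(term, 0), 1)
        scores[term] = tf_doc / math.log(tf_corpus + 1)
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_n]]

# Hypothetical email text and hypothetical corpus-wide term counts.
email = "longhorn search demo the longhorn index".split()
corpus_counts = {"the": 50000, "search": 900, "index": 400,
                 "longhorn": 12, "demo": 30}
print(implicit_query_terms(email, corpus_counts))
```

Terms that are frequent in the document but rare in the corpus rise to the top, so they could serve as an automatic query without the user typing anything.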

Imagine a technology that can index what you have seen while browsing the Web, opening an email or chatting.

While the concept of SIS is not new, the technology was a barrier. Well, not anymore.

Comments?

Orion

04-26-2005   #2
Nacho
Join Date: Jun 2004
Location: La Jolla, CA
Posts: 1,382
This couldn't be any more fascinating. I believe that what we are seeing in technology, science and marketing now may one day make our industry more important than television is today. I'm speechless!

This method, which was discussed long ago by Tim Berners-Lee, seems to be closer than ever and no longer a computational obstacle. Users become additional crawlers by the millions, and indexing becomes more up to date and refreshed based on true traffic demand and navigation for each page in question.

MSN, by having its own operating system and browser on hundreds of millions of PCs (at homes, offices, public facilities, ... you name it), can gain indexing speed faster than Google or Yahoo! can. What a BIG move this could be. Could this be what forced Google to go into creating its own browser or even an operating system???

04-26-2005   #3
xan
Member
Join Date: Feb 2005
Posts: 238
It was discussed at the Search Engine Meeting in Boston, which I attended; I blogged about the meeting and about this. She entitled her lecture: "personal information retrieval: Finders Keepers." (Don't quote me, but I think the SIS method was in a paper at SIGIR 2003.)

SIS as she explained just works on a simple hierarchy of folders. This is nothing different from any other product out there but SIS is aimed at researchers and research.

Microsoft's lowdown is cool: here

Gordon Bell's MyLifeBits is also a project similar to that, and you can find others if you take a look.

I agree that we can look towards this kind of stuff in the future, and to a much more integrated search as well.

This was a bit of a reiteration of stuff we already knew, but the most important point she made was that the semantic web is nonsense. She's never believed in it, and I have trouble too. It's a very silly name for it as well.

Carol Tenopir made a good point about having too many "my's" to deal with (my email, my account, my searches,...)

Susan has done a lot in this area, but so have many others as well. Elizabeth Liddy had some very good points to make, as did the people from Clairvoyance.

Check it here, there's also a link to the slides at infonortics.

Almost everybody had some good points to make, and it was nice to meet up again as there are always new projects and ideas to discuss. In fact the only person who was out of place there was a speaker I won't mention here who claimed to have taken secret camera footage at Google of the appliances and so on and then produced screenshots from the short film they released. He was apparently booed off by Microsoft as well, and didn't do too well at SEM either.

Some people were introducing new products and things on the first day but it was all interesting.

Last edited by xan : 04-26-2005 at 03:15 PM.

04-26-2005   #4
Phoenix
Member
Join Date: Jun 2004
Location: Austin, Texas
Posts: 97
Oh, this is incredible stuff. I like the timeline feature, and it's probably closer to reality than we imagine. Some of the things she mentioned in the presentation we are doing already. Personal digital libraries for many things will lead to an alternative space where SIS will be important, so you can search anything under an integrated index. I have to ask though, do we really want to give the search engines (beyond SIS) the ability to index anything and everything we look at? Personally I would not allow them to do so.

The other question I have is, as marketers, will this technology make it more difficult for us to market to a particular segment, or easier? Meaning, since personalization and integration of this will take such a course in many directions, will there be any common reference points we can share with others?

04-26-2005   #5
Nacho
Join Date: Jun 2004
Location: La Jolla, CA
Posts: 1,382
Excellent points Ben (Phoenix)!

On your first, the search engines will run into HUGE privacy barriers. Bigger than what they probably have now. Will you want a search engine indexing your girlfriend's email on your Yahoo! Mail account? Not likely! Perhaps they can put in limitations like "IF and ONLY IF" there is an outside inbound link to it, then it gets indexed; otherwise it gets dropped.

On your second, as a marketer that's what I find most fascinating. Someday I believe you will be able to request search traffic based on the demographics you want (especially on the paid side of things, like PPC and PFI). For example, you may want to target only women age 25-35 on a query for "chocolates" during February. If that's what you want, then search engines should be able to provide it at a premium cost, don't you think?

04-26-2005   #6
orion
Join Date: Jun 2004
Posts: 1,044

Good post, Phoenix and Nacho.

Having an IR index system that at the other end can be fed with records from, let's say, the browser History or Favorites is a demographic goldmine for MSN or Google. The search engines could have more control over pricing for several services. It may also solve the current click fraud problems. A one-two punch.

Now imagine these SIS features integrated with TV programming. Adsense-like and scripts from local news, programs or documentaries can also be indexed.

Aside from the obvious privacy issues, there is an indexing overhead they will need to overcome. Indexing everything will consume them. So for such technology to succeed it must be self-restrictive, I think.

Orion

04-26-2005   #7
Phoenix
Member
Join Date: Jun 2004
Location: Austin, Texas
Posts: 97
Quote:
Aside from the obvious privacy issues, there is an indexing overhead they will need to overcome. Indexing everything will consume them. So for such technology to succeed it must be self-restrictive, I think.
This is a really interesting point, and one I would agree with. Restriction as a means of self-regulation. Adding self-restriction could, in a way, be another area in which MSN or Google foresee potential new markets (or maybe not), and we as marketers and individuals see how the combined restricting of particular documents or libraries makes one more valued than the other. That could give us a better look at the actual person behind the computer screen. What data do they not want to give to a search engine? But that would mean attributing a value to certain things, which I don't know if we can do yet as a whole. Like the combined value of the world's personal email, or video libraries. How about the combined value of someone's searching habits on a Tuesday? Translate that into giving marketers access to select premium areas that most people don't have access to. That would command a higher price, I think. To take off from Nacho's example: targeting women who shop the day before Valentine's Day at 8 am for restaurant recommendations, supplementing the search with their past search history, personal data on recently visited restaurants, and maybe other people's recommendations or search histories. Add the AdWords component in there and... you get the picture.

04-26-2005   #8
Webvisitor
Member
Join Date: Jun 2004
Location: NearYosemite
Posts: 107
Smarter cookies? Some day it will reach a point that search engines and browsers will have to pay me to play on their platforms. Cynical yes but why give up your personal traits and information for free?

04-26-2005   #9
incrediblehelp
Member
Join Date: Sep 2004
Location: Toledo
Posts: 24
Quote:
Originally Posted by orion
Here is the presentation by MSN's Susan Dumais in which the future of search and indexing is unveiled: SIS http://www.infonortics.com/searcheng...des/dumais.pdf

1. Users see something and it gets indexed.
2. You are the search agent!

Interesting Features of this technology

1. Re-use vs. search discovery
2. Implicit Queries: Score = tf_doc / log(tf_corpus + 1)

Imagine a technology that can index what you have seen while browsing the Web, opening an email or chatting.

While the concept of SIS is not new, the technology was a barrier. Well, not anymore.

Comments?

Orion
Isn't this along the same lines as Personalization?

http://labs.google.com/personalized

04-27-2005   #10
xan
Member
Join Date: Feb 2005
Posts: 238
I think you will be finding it in Longhorn, as that's part of its concept, making previously viewed information easy to find whatever the format.

It's very much a reality; it's in use all over Microsoft. Lots of these are around and in beta and development. We clearly need a new way to search, especially with the advances with Internet2.

04-27-2005   #11
xan
Member
Join Date: Feb 2005
Posts: 238
I thought this might shed some light!

"whatis" and other resources about Longhorn and its search

Look here for Implicit queries and their explanation.

(While I was writing this I wondered why it was that I kept coming back here to read about stuff people find interesting, because there are a lot of other places to go. A good reason is that it's cool to see how non-research people respond to technology, and the reason they can is because Orion brings up the subjects.)

so anyway...

"What's more, the next evolution of "Stuff I've Seen" and "Implicit Query" will monitor what you're working on and provide suggested information sources and files from both your desktop and the web. "
(Searching for Dominance: What Will Microsoft Search Look Like? )

I think that people are still uncomfortable about this sort of thing, I mean security has to be really good to help people relax about monitoring and logging movements around the computer and the net. A lot of good work is going into security, so that's good. I think we need to remember how much info we have lying around in the real world as well. I'm not so concerned about this issue.

Last edited by xan : 04-27-2005 at 06:14 PM.

05-01-2005   #12
claus
It is not necessary to change. Survival is not mandatory.
Join Date: Dec 2004
Location: Copenhagen, Denmark
Posts: 62
On the home page for Stuff I've Seen there are screenshots, but no download. It also appears that this will only search Outlook email and IE cache. Microsoft researchers generally make some nice stuff, but although a lot of people use MS programs only (or mostly) i'm not one of those users personally, and i find that most technically advanced users aren't either. Ironically, these technically advanced users are the ones that will benefit most from such products, if done right.

By the way, distributed indexing is not new - meet Grub (from LookSmart). This is just a crawler that uses your idle time, and as such it is not really storing the pages you watch (afaik).

However, recently the list of personal information managers has grown a lot. Some examples:


I suppose that to some extent you could include services such as "del.icio.us", "Flickr", "Yahoo360", as well.

All of these are "My", but (besides the desktop search products) they are not really "mine". Here's a quote from a post i made on another forum a fortnight ago regarding this:

Quote:
The "My" that is not mine: All the "My whatever" sites are not really mine. Of course it's something that i use and something that i shape, but i can't take any of it with me. Ie. "My Yahoo" only works on Y! properties, Passport only works on MS soil, "My Google" will only work with Google. So, effectively, "My Yahoo" is not my personal interface to (or, version of) Yahoo, it's Yahoo's limited interface to (or, version of) me, aka. "Yahoo's me". And the same goes for the rest. None of it is really mine, as if it were i would have access to these things even if i was not on that particular property. It would follow me wherever i took it with me (as the memex). Which brings me to:

Source: msg #56
What it brought me to was "walled gardens" versus open exchange formats like OpenSearch, RSS/XML and so forth. Read the post for clarification, it's not a long one.

The point (to me at least) is that the stuff that will be really useful to an individual will not be really useful to any one particular company/service, as the data would belong to that individual and not the company/service in question. We (at least i) need to be able to integrate stuff that is personal with stuff that is MS specific, Yahoo-specific, Google-specific, web, email, documents, and so on.

The only place i am able to do this at the moment is on my PC, but the more i use personal web services the harder this becomes, as these services do not interact. There's no exchange from "My Yahoo" to "My Google", and each of those only have part of the full picture. And, i should add, only should have part of the picture.

So either i (as a user) have to work a lot to make sure that each of them gets loaded with all the different types of information, or i (as a user) have to work a lot to extract the information from all those different services and reformat it on my PC to suit my personal needs. Neither of these options spells "time-saver" to me.

The third option - to rely on one particular service exclusively - just doesn't compute to me for a multitude of reasons.

Last edited by claus : 05-01-2005 at 09:54 AM. Reason: Added some

05-01-2005   #13
xan
Member
Join Date: Feb 2005
Posts: 238
I agree with everything you said, Claus. At the moment, the way the desktop stuff is working and all that, you can just as easily make your own if you have enough knowledge in this area. There would be no download because it's still in closed beta, and not public. This might be because of patents or something, or simply for competition purposes, as is the usual trend in this business. You're right about Grub as well. SIS is not new at all, as has been said here, and there are much more interesting things going on in the area of search methodologies. I'm writing an article on the semantic grid. So many researchers are busy with the network, and people forget about all the computational linguists working to add functionality to the network. Potentially things like SIS will integrate into such things, especially since they will be present in your operating system.

But like I said, you can make your own as well. Susan's best work goes well beyond this, and as she said, all it is is a hierarchy of folders.

05-02-2005   #14
claus
It is not necessary to change. Survival is not mandatory.
Join Date: Dec 2004
Location: Copenhagen, Denmark
Posts: 62
>> a hierarchy of folders.

There's another way emerging slowly, but getting a lot of blogger buzz these days: Tags. Instead of organizing items in folders, you "tag" them with labels, the point being that any one item can have more than one tag.

The "The Brain" application that i mentioned in another thread essentially works the same way: You have all kinds of items (documents, bookmarks, folders, etc.) and then you have labels. What you do next is simply to establish a connection between labels and items (and among labels as well).

Of course, in terms of usability tags require more work in the archiving process, as you have to think about more than one folder name, but in retrieval it makes things easier.
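
The difference between folders and tags can be sketched in a few lines of Python. This is only an illustration of the many-to-many idea, not how The Brain or any tagging service is actually built; the class name, method names and sample items are hypothetical.

```python
from collections import defaultdict

class TagIndex:
    """Many-to-many mapping between items and tags (illustrative sketch)."""

    def __init__(self):
        self._items_by_tag = defaultdict(set)
        self._tags_by_item = defaultdict(set)

    def tag(self, item, *tags):
        # Archiving takes more thought than filing: one item, several labels.
        for t in tags:
            self._items_by_tag[t].add(item)
            self._tags_by_item[item].add(t)

    def find(self, *tags):
        # Retrieval: return the items that carry every requested tag.
        if not tags:
            return set()
        sets = [self._items_by_tag.get(t, set()) for t in tags]
        return set.intersection(*sets)

    def tags_of(self, item):
        # Return a copy so callers cannot modify the index by accident.
        return set(self._tags_by_item.get(item, set()))

# Hypothetical items and labels.
index = TagIndex()
index.tag("dumais.pdf", "search", "slides", "microsoft")
index.tag("grub-notes.txt", "search", "crawler")
print(sorted(index.find("search")))
print(sorted(index.find("search", "slides")))
```

A folder hierarchy would force "dumais.pdf" into exactly one place; here it can be reached through any of its three labels, alone or in combination.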

05-02-2005   #15
xan
Member
Join Date: Feb 2005
Posts: 238
Tags have been used for a very long time in IR, and they are a laborious but effective way to get things marked up. However, on a very, very large set everything must be automatic or else we can't use it, and with the amount of information we all have it gets hard. Tagging has especially been used by large companies with masses of data.

"We are not inventing relational models for data, or query systems or rule-based systems. We are just webizing them. We are just allowing them to work together in a decentralized system - without a human having to custom handcraft every connection." (W3C)

Having all of that power sitting under the hood of your computer, here you are labelling bits of data and putting them away - it isn't a solution. W3C are right in saying that we need to adapt methods that work in smaller datasets to this expansive one.

The SIS folder hierarchy is just like anything else. I know a lot of people say "we have created a folder hierarchy which...", usually meaning "we just didn't get round to it, so we thought that would do". There is still some time to go until things are finalized as far as SIS is concerned, so we should be seeing something cleverer than that.

05-03-2005   #16
claus
It is not necessary to change. Survival is not mandatory.
Join Date: Dec 2004
Location: Copenhagen, Denmark
Posts: 62
"Here I am, brain the size of a galaxy, and they tell me to open the door..."

Contrary to "The Hitchhiker's Guide", I think we tend to expect too much from our non-organic, mostly-idle peers. Of course, being a seasoned PC user with double-digit years of computer experience, I have learned to reduce my expectations to the level of "little more than typewriter and pocket calculator".

You still have to punch all the cards yourself, so to speak. Now, I should really read that PDF from Susan Dumais... downloading now. Ah, it's just slides, I thought it was a very large written document. "Information silos" - I like that term, it's accurate. Meta data...already on slide 6 - now where does that meta data come from - who's going to punch those cards...again *sigh*

Ah, I see... they don't: "Frequent use of query iteration in UI (48%)". Oh, and after you give those people this new tool to try out, significantly fewer other queries are carried out. And they're having most problems with e-mail search - not surprising - and tend to search recent stuff, i.e. stuff they remember, so they search by date.

Ah... time line with landmarks. Of course. Now that's useful! Kudos!

So, what Search is missing according to SD:
  • User modelling
  • Domain modelling
  • Context of information use

Now, I would concentrate on Domain modelling (if this term means what I think it means). Second, context. I've worked a lot with user modelling, and I normally put the user first, but not in this case. Why? Because you don't want to model the user, not even the behaviour of the user. Locating "segments" is a no-brainer, but actually inferring something useful from those is a fair bit harder, as every marketer will know.

Plus, they're not as stable as they used to be, as people "trend hop" (I just invented that term - it means that they move among segments). The age of stereotypes is generally over. These days we have conservative pot smokers and stuff like that, or even better/worse, take your pick.

This is all about the needs of the user, and they're every bit as different as users are, and then multiplied by a large number, as they shift all the time according to time, taste, events, and what's hot this season. It's just a chaos model essentially, nothing useful comes from that except very broad trends that are not specific enough for this purpose.

As for domain modelling, we need synapses (for lack of a better word). Something that will decide from context (without asking me) that this email is about a fruit and that email is about a PC, even though they both mention apples. Am I likely to consider fruits with that price tag? No, of course not, so it's something else. Or, would I process a PC to fluid form and drink it? Not likely, so that's the other one. That's knowledge, not just information. Don't get me to label all my recipes with apples "fruit" - I know it's fruit, I'm not at all interested in that. Why should I teach the computer - it will just have to learn by itself, like the rest of us do.
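
As a toy illustration of deciding from context which "apple" a message is about, here is a minimal Python sketch based on counting overlapping cue words. The hand-written cue lists are hypothetical and are exactly the kind of labelling a real system would have to learn by itself; nothing here reflects how SIS or any search engine works.

```python
import re

# Hypothetical cue words per sense; a real system would have to learn these.
SENSE_CUES = {
    "fruit": {"recipe", "pie", "tree", "orchard", "eat", "juice", "bake"},
    "computer": {"mac", "laptop", "software", "keyboard", "price", "upgrade", "pc"},
}

def guess_sense(text, cues=SENSE_CUES):
    """Return the sense whose cue words overlap most with the text, or None."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    overlaps = {sense: len(words & vocab) for sense, vocab in cues.items()}
    # On a tie, max() keeps the first sense listed in the cue table.
    best = max(overlaps, key=overlaps.get)
    # No cue word matched at all: decline to guess rather than pick arbitrarily.
    return best if overlaps[best] > 0 else None

print(guess_sense("Bake the apple pie using this recipe"))
print(guess_sense("Is the new Apple laptop worth the price?"))
```

Even this crude overlap count separates the two example messages, but it knows nothing beyond its word lists, which is why the general problem is so hard.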

As for context, the above is document context (or, item context), not usage context. Of course I don't want to write "apple-dash-fruit" or some other nonsense like that. I've been looking at computer sites haven't I? Am I a vegetarian perhaps? Enjoying a healthy lifestyle, even? No, just give me the right variety - don't bother me.

On the other hand, it's apple season in the fall unless you're a supermarket person. But then if that was a major event for you - climbing trees, getting a sore back, and so on, then you would likely have perused or crafted information about it as well. How hard can it be?

*lol* extremely hard. But one can dream. Or, have a nightmare as it might be. I'm off to bed, if the above is unclear it's because I am, it's 2am here.

Last edited by claus : 05-03-2005 at 08:07 PM.

05-04-2005   #17
shepherd
Member
Join Date: May 2005
Posts: 15
Quote:
Originally Posted by incrediblehelp
Isn't this along the same lines as Personalization?

http://labs.google.com/personalized
Not at all! Creating an account on Google will diminish your privacy and let Google learn everything about you. Doing the indexing on your own computer, where the information never leaves, is totally private.
Another thing is, by building a profile you basically say: I'm static, this is who I am and I will never change. Is that really true, though? Today you have a girlfriend, so you like romantic movies, but tomorrow you break up and go back to your old self: action movies. Static profiles can never recognize those kinds of changes in user development.

05-08-2005   #18
yellowwing
Member
Join Date: Jul 2004
Location: Winnipeg Canada
Posts: 21
I think it is the wrong way to go with search technology.

Remember the 2004 elections? The promise of an infinite Internet delivering a diverse spectrum of information was ignored. People only read and believed what they wanted to.

Hard-core Republicans didn't read Liberal newspapers. Liberals didn't read Matt Drudge every day.

The technology of your own personal algorithm will only deliver what you already know and believe. It will give folks a nice warm fuzzy feeling but no real knowledge.

If you team up a good SEO with a competent sociologist, you can easily optimize content to manipulate people.

What dangers can be cooked up with a spammer and a sociologist?

Orson Welles in 1938 scared the heck out of people with War of the Worlds when radio was the state of the art medium.