Search Engine Watch
Old 09-08-2004   #1
garyp
 
Join Date: Jun 2004
Posts: 265
A Scalable Topic-Based Open Source Search Engine

Here's a paper set for publication this month (Proceedings of the IEEE/WIC/ACM Conference on Web Intelligence (WI'04), Beijing, China, September 2004) that might be of interest to some of you.


Title: A Scalable Topic-Based Open Source Search Engine
http://cosco.hiit.fi/Articles/wi04search.ps
If you need to convert from PostScript to PDF, simply enter the URL in this conversion tool:
http://view.samurajdata.se/

Note: The paper is not yet in ResearchIndex.


Btw, many (if not all) of the authors are part of the Next Generation Information Search project at the Complex Systems Computation Group, which is part of the Helsinki Institute for Information Technology (HIIT).
http://cosco.hiit.fi/search/

Last edited by garyp : 09-08-2004 at 03:41 PM.
Old 09-08-2004   #2
Nick W
Member
 
Join Date: Jun 2004
Posts: 593
oooooooooh baby, please let it be so!

Who's behind it? The uni or a company?

I love open source. I recently taught myself the basics of Python; think what I could do if I could take an informed look at the algo! hehehehe.....

Many OS projects flop as abysmally (I cant speel) as regular ones, but sometimes they really, really fly... what fun!

Nick
Old 09-08-2004   #3
garyp
 
Join Date: Jun 2004
Posts: 265
Nick:
Take a look at this page:
http://cosco.hiit.fi/search/

Includes info about funding and links to all of their projects.
I believe some of the work comes from a consortium with members from France, Switzerland, Finland, Denmark, Slovenia, and Spain.

All sorts of interesting stuff. Demos coming soon.

+ From the web site, "The new economy is based on innovation, and innovation is based on up-to-date information. The semi-static Internet alone has in the order of 1000 million pages of information, and search has become a fundamental service required both by individual citizens and businesses alike."

+ From the web site, "The project will conduct research in the design, use and interoperability of topic-specific search engines with the goal of developing an open source prototype of a distributed, semantic-based search engine. Existing search engines provide poor foundation for semantic web operations, and US companies such as Google are becoming monopolies, distorting the entire information landscape."

+ Goal #1 -- Implement a semantic-based search engine, with the code as Open Source. A public demonstration of a topic-specific search engine in a topic to be determined will be done.

Last edited by garyp : 09-08-2004 at 02:37 PM.
Old 09-08-2004   #4
Nick W
Member
 
Join Date: Jun 2004
Posts: 593
Very interesting stuff, I'd love to see it happen.

How would they deal with the fact that people would look at the algo and optimize for it though? - Would it be a problem if they did?

I'll follow that, thanks for the links! ;-)

Nick
Old 09-08-2004   #5
Dodger
Honorary Member
 
Join Date: Jun 2004
Location: Central US
Posts: 349
I found out about this in early July when its spider hit a couple of sites. If you check your logs, you will see the User-agent string larbin_2.6.3_for_(http://cosco.hiit.fi/search/) Tomi.Silander (the name at the end is an email address, I think; I have seen a couple of other names used). It "sampled" the sites and I have not seen it since.
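For anyone who wants to check their own logs for this bot, here is a minimal Python sketch. It assumes your server writes the Apache "combined" log format (the one that ends with the referer and User-agent in quotes). The function name `find_crawler_hits` and the two sample log lines are made up for illustration; only the User-agent string itself comes from the post above.

```python
import re

# Apache "combined" log format (an assumption about your server setup):
# host ident user [time] "request" status size "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def find_crawler_hits(lines, marker="larbin"):
    """Return (host, time, user_agent) for each line whose User-agent contains marker."""
    hits = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            # Skip lines that are not in the combined format.
            continue
        agent = match.group("agent")
        if marker.lower() in agent.lower():
            hits.append((match.group("host"), match.group("time"), agent))
    return hits


# Invented sample lines for demonstration only (documentation IP ranges);
# the first one carries the User-agent string quoted in the post.
SAMPLE_LOG = [
    '192.0.2.10 - - [06/Jul/2004:10:15:32 +0000] "GET /robots.txt HTTP/1.0" '
    '200 68 "-" "larbin_2.6.3_for_(http://cosco.hiit.fi/search/) Tomi.Silander"',
    '198.51.100.7 - - [06/Jul/2004:10:16:01 +0000] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
]

if __name__ == "__main__":
    for host, when, agent in find_crawler_hits(SAMPLE_LOG):
        print(host, when, agent)
```

To run it against a real file, pass an open file object instead of `SAMPLE_LOG`, since the function only needs something it can iterate over line by line. Matching on the substring "larbin" rather than the full string should also catch the variants with other names at the end.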

The project, I think, is being run by this group: http://cosco.hiit.fi/people.html. There you will see that one of the researchers' names matches the one in the Larbin bot UA string. I think that researcher is in Beijing right now.

I almost forgot about this until I read this post. It appears they are behind a little on their demo of the project. The site does not seem to have been updated since I last saw it.
__________________
I am Ronnie
Old 09-20-2004   #6
Nick W
Member
 
Join Date: Jun 2004
Posts: 593
http://www.alvis.info/
Quote:
The vast quantity of information sets new challenges for even the best commercial search engines. Building next generation search engines is not just a question of scaling existing techniques. What is needed is a departure from the existing keyword search that has made current search cumbersome even for the skilled. Qualitatively better ways are needed to allow more meaningful, semantically aware queries, and new delivery modes are needed to make search another common resource in the spirit of the web itself, to make search peer to peer.
Nick

Last edited by Nick W : 09-20-2004 at 04:03 AM.
Old 09-20-2004   #7
newreality
Member
 
Join Date: Jun 2004
Posts: 315
What we will be doing, beyond working with keywords and assessing semantics and that whole intertwined snarl, is going beyond the word itself. This is clear realizing that translators will have their cutoff limitations.

We will be moving closer and closer through the intent of every person. This is obvious looking not only at search engine technology but global society as a whole.

We will also realize that this is not philosophy being relegated to some categorized subject matter for any certain nation or people. This is the new reality. We are in the process of moving beyond the category itself. To think it is actualization.

"How" won't be the important question. If there is a question, it will be "why".
Old 09-27-2004   #8
massa
Member
 
Join Date: Jun 2004
Location: home
Posts: 160
>How would they deal with the fact that people would look at the algo and optimize for it though?<

That is how it should have been from the very beginning. Optimizing your site should not be a bad thing. Every webmaster SHOULD optimize their sites. Optimizing your site does not a spammer make. Optimizing your site just makes you a better webmaster.

Making up and changing the rules of the game and then penalizing people for breaking self-serving guidelines is what should have been recognized as the problem all along. It never had to be that way. It could have all been looked at differently right from the start. We, as an industry, have tried to commit virtual hara-kiri on ourselves by sitting idle and allowing major corporate propaganda machines to cause us to divide against ourselves. We should have had the balls to cry foul and unite the very first time Infoseek's spam assassin made a post at the Warrior's Forum calling one of our own a spammer. Remember Infoseek? The great promise of a search engine that was one of the first to set forth "guidelines" that claimed to protect the internet from evil while cloaking their own lack of ability to build a better search engine. Didn't their vice president get busted for trying to pick up a 13 year old he wooed in a chat room just before Disney bought them? Am I the only one who appreciates that kind of irony?

The commercial internet is a pretty good self-policing entity. None of us do anything for very long that does not work. All a search engine has to do is make what they don't like not work. Bingo! It don't work, we don't do it if we want to be in the top of that engine. No need to call people names. No need to wipe out people's businesses, no need to try to dictate "rules" that had no reason to exist in the first place to try to save an internet that didn't need saving in the first place. Getting people in our own industry to defend these poor, helpless engines with nothing but a few billion to work with against all those nasty, evil spammers that also never existed in the first place, now THAT is PR!

It is a personal triumph for me to see other people in the search engine industry realizing that allowing total power without demanding accountability creates monopolies and that is not a good thing.
Old 09-27-2004   #9
Nick W
Member
 
Join Date: Jun 2004
Posts: 593
Man.. my Moz browser don't like the rep system here, but I'm moved to go open Opera and give you the proverbial big sloppy kiss for that, massa.

You're an eloquent speaker on such matters and it's appreciated.

Nick
Old 09-28-2004   #10
I, Brian
Whitehat on...Whitehat off...Whitehat on...Whitehat off...
 
Join Date: Jun 2004
Location: Scotland
Posts: 940
Excellent comments, Bob - would you mind if I quote an excerpt offsite?
Old 09-28-2004   #11
massa
Member
 
Join Date: Jun 2004
Location: home
Posts: 160
Thank you for the compliments on my hillbillyesque, self-defecating remarks. While I'm not sure why anyone would want to (Google may wipe out your page rank for mentioning me, you know), anyone is welcome to repeat anything I say if it serves them well. That may sound strange, not demanding a link or something; it's just that over the past two years, I've been quoted and misquoted a few times. Once in a while in my defense, but more often using my own words against me. What I have found is that what others do to me or for me has not nearly the impact of what I do to or for myself.

Quote in good health with my blessings. I appreciate you asking. It's a sign of good manners, character and respect.

Nick W, I really appreciate a compliment like that coming from you. I have been a great fan of your posts for a while now. That said, I urge you to try to develop an offline relationship with a real human. That kiss thing was really descriptive and indicative that you may be having some urges not being met in cyberspace.

Good luck with that.

Have we strayed off topic far enough or should I keep going?

Last edited by massa : 09-28-2004 at 05:02 PM.
Old 09-28-2004   #12
rustybrick
 
Join Date: Jun 2004
Location: New York, USA
Posts: 2,810
Quote:
Originally Posted by massa
Have we strayed off topic far enough or should I keep going?
I love and respect you all too. But, let's not go too far off topic here.

Great posts!
Old 09-28-2004   #13
Nick W
Member
 
Join Date: Jun 2004
Posts: 593
HAHA, you're not English, massa, it don't mean nothing ;-)

Nick