Search Engine Watch
Old 06-15-2005   #1
randfish
Member
 
Join Date: Sep 2004
Location: Seattle, WA
Posts: 436
OpenRank - An Open Source WWW Index

One of the biggest issues we have in SEO/SEM is getting accurate measurements of link popularity, index size, and number of results. These figures are critical for many SEO/M projects that would like to take a more technical approach to optimization.

The new OpenRank project - www.openrank.org - was started just recently with the goal of creating an open source index of web pages - much like a search engine without an algorithm or the ability to search.

Obviously, this project is an enormous undertaking, and will require massive investment and participation. I'm hoping to spur some discussion here at SEW on the subject - gauge the interest of the community, and hopefully get your input and ideas.

I personally think this type of service would be a massive boost to the tools available to people in SEO/M. The project would also allow for 3rd party querying and creation of tools, allowing for an unlimited amount of creativity in attacking search engine rankings.
Old 06-15-2005   #2
Scottie
In search of stuff...
 
Join Date: Jun 2004
Location: Columbia, SC
Posts: 45
Details?

Ok, so I could just click the link and read it, but I'm lazy. What are the requirements for participation, Rand?
Old 06-15-2005   #3
I, Brian
Whitehat on...Whitehat off...Whitehat on...Whitehat off...
 
Join Date: Jun 2004
Location: Scotland
Posts: 940
I'm afraid I don't understand what the actual aims of the project are - for example, are rights to use and apply commercial tools to the data really open?

Regardless, and as a general point, I should think that any attempt to provide an open source search solution is something I'm happy to help support - though as a potential aside point, would perhaps need to think aside from the concept of datacenters of server farms for powering such projects, and instead look to the method of use of bit torrents hosted upon webmaster sites and similar as a way of sharing and processing resources on a large scale. 2c.
Old 06-15-2005   #4
Marcia
 
 
Join Date: Jun 2004
Location: Los Angeles, CA
Posts: 5,476
Quote:
open source index of web pages - much like a search engine without an algorithm or the ability to search.
But algorithms and search are what *define* a search engine. This is in the search relevancy and technology forum - what kind of search technology is it using? And if there's no algo, how is relevancy determined?

A search engine crawls, indexes and retrieves results using algorithms - which is why it's called a "search" engine - yes or no, or am I missing something? So if there's no algo and no search, what makes this any different from a directory?

Last edited by Marcia : 06-15-2005 at 07:10 PM.
Old 06-15-2005   #5
Scottie
In search of stuff...
 
Join Date: Jun 2004
Location: Columbia, SC
Posts: 45
Diy?

Is the idea that you could build your own search engine and test algos on this open source index of sites? That might be kind of cool.

The next and better Google could come from an underfunded mathematical mind that doesn't have to worry about the semantics of getting the database to run the algo on...
Old 06-15-2005   #6
xan
Member
 
Join Date: Feb 2005
Posts: 238
It sounds just like an index to use as a test bed.

Agreed marcia, it's not a search engine anything. It's a repository of web pages.

Scottie, the problem is running an algorithm on the database.

Last edited by xan : 06-15-2005 at 08:16 PM.
Old 06-15-2005   #7
randfish
Member
 
Join Date: Sep 2004
Location: Seattle, WA
Posts: 436
Sorry to be vague. I guess it does require clarification.

What we're talking about is an index, similar to Google or Yahoo!'s index of millions and millions of web pages. These pages would be stored on servers and be accessible to the public not via a search engine, but via development codes or other types of access.

In other words, there is no "search engine", just the framework of stored web pages. The purpose for this would be to help better understand the world wide web and allow for open-source coders and tool builders to create their own systems that use the data.

My personal goals would be:

1. Create an alternative to PageRank using the link structure stored in the database. This new measure of global popularity would be both "accurate and precise" and could help people see exactly what links are influencing their "OpenRank"

2. Create a measurement system of local popularity by segmenting pages into subject-specific communities and then examining the popularity of pages based on their links within those communities.

3. Create tools that would conduct more advanced and comprehensive link analysis than is currently available using the APIs or link commands at the search engines. One could, for example, see the exact number of links to each web page with a particular anchor text or view all the links sorted by anchor text, or by extracted page topic or popularity, etc.
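
To make the three goals a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the sample link records, the `COMMUNITY` topic labels, and the function names are hypothetical, and the global measure is a plain PageRank-style power iteration standing in for whatever formula an "OpenRank" might eventually use.

```python
from collections import Counter, defaultdict

# Hypothetical link records as a stored index might hold them:
# (source_page, target_page, anchor_text).
LINKS = [
    ("a.com", "b.com", "seo tools"),
    ("a.com", "c.com", "link analysis"),
    ("b.com", "c.com", "link analysis"),
    ("c.com", "a.com", "home"),
    ("d.com", "c.com", "seo tools"),
]

# Hypothetical subject labels used for the local popularity measure.
COMMUNITY = {"a.com": "seo", "b.com": "seo", "c.com": "seo", "d.com": "travel"}


def global_rank(links, damping=0.85, iterations=50):
    """Goal 1: PageRank-style power iteration over the stored link graph."""
    pages = sorted({p for s, t, _ in links for p in (s, t)})
    out = defaultdict(list)
    for s, t, _ in links:
        out[s].append(t)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / len(pages) for p in pages}
        for p in pages:
            # Pages without outbound links spread their rank over all pages.
            targets = out[p] or pages
            share = damping * rank[p] / len(targets)
            for t in targets:
                new[t] += share
        rank = new
    return rank


def local_popularity(links, community):
    """Goal 2: count inbound links that come from the same community."""
    counts = Counter()
    for s, t, _ in links:
        topic = community.get(s)
        if s != t and topic is not None and topic == community.get(t):
            counts[t] += 1
    return counts


def anchor_text_counts(links, target):
    """Goal 3: count the links to one page, grouped by anchor text."""
    return Counter(a for _, t, a in links if t == target)


if __name__ == "__main__":
    ranks = global_rank(LINKS)
    for page in sorted(ranks, key=ranks.get, reverse=True):
        print(page, round(ranks[page], 4))
    print(local_popularity(LINKS, COMMUNITY))
    print(anchor_text_counts(LINKS, "c.com"))
```

Because every score is computed from link records anyone can inspect, a tool built this way could show exactly which links contribute to a page's score, which is the transparency goal 1 describes.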

The possibilities of what could be done with an open index like this are endless, but the idea really spawned when Google's PageRank went down, and people started e-mailing me asking if I could come up with a substitute for PR.

As xan knows, this project is almost ludicrously ambitious in size and scope, but there has already been such a buzz since I first mentioned the possibility of it on my blog, that I think it's at least worthwhile to pursue until we have a good estimate of the costs of starting and operating the project - at which time we can truly decide on the feasibility.

As for why it's in the search technology thread, I simply could not think of a more appropriate thread to put it in and spent some time debating between this and beta test.

Quote:
though as a potential aside point, would perhaps need to think aside from the concept of datacenters of server farms for powering such projects, and instead look to the method of use of bit torrents hosted upon webmaster sites and similar as a way of sharing and processing resources on a large scale. 2c.
This is an excellent idea, Brian. I know search engines can't do this because queries would take more than 10 full seconds to run even over a speedy distributed network, but perhaps an installed client that spiders the web and sends link and stripped-down page data back to a central server (or many) could help do the job.
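
As a rough illustration of what such a client might send back, here is a minimal sketch using only the Python standard library. The report layout (`page`, `title`, `links`) and the names `LinkExtractor` and `build_report` are hypothetical; fetching pages and uploading the report are left out, since no server or protocol has been defined for the project.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the page title and (absolute URL, anchor text) for each link."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.title = ""
        self._href = None      # URL of the <a> tag currently open, if any
        self._text = []        # text fragments seen inside that <a> tag
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self._href = urljoin(self.base_url, href)
                self._text = []
        elif tag == "title":
            self._in_title = True

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)
        elif self._in_title:
            self.title += data

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace so anchor text compares cleanly.
            anchor = " ".join("".join(self._text).split())
            self.links.append({"url": self._href, "anchor": anchor})
            self._href = None
        elif tag == "title":
            self._in_title = False


def build_report(page_url, html):
    """Strip a fetched page down to the link data a central server would store."""
    parser = LinkExtractor(page_url)
    parser.feed(html)
    parser.close()
    return json.dumps(
        {"page": page_url, "title": parser.title.strip(), "links": parser.links}
    )


if __name__ == "__main__":
    sample = (
        "<html><head><title>Demo</title></head><body>"
        '<a href="/tools">SEO <b>tools</b></a>'
        "</body></html>"
    )
    print(build_report("http://example.com/index.html", sample))
```

Sending only URLs and anchor text instead of full pages keeps each report small, which matters if volunteers run the client on home connections.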

Last edited by randfish : 06-15-2005 at 08:54 PM.
Old 06-15-2005   #8
NFFC
"One wants to have, you know, a little class." DianeV
 
Join Date: Jun 2004
Posts: 468
Would building a "web map" about sum it up?

>this project is almost ludicrously ambitious in size and scope

No almost about it.
Old 06-15-2005   #9
projectphp
What The World, Needs Now, Is Love, Sweet Love
 
Join Date: Jun 2004
Location: Sydney, Australia
Posts: 449
Rand, not to be blunt, but is this your initiative? So many questions. What is the purpose, what are the goals, who is backing it etc etc.

The only info I have is one sentence:
Quote:
A project for an open source tool to measure the value of link popularity.
What does that mean?

Can you perhaps put up a page on the site about the principles, goals, ideals etc etc. http://dmoz.org/socialcontract.html is a good guide, and that would help clarify things a bit!
Old 06-16-2005   #10
dazzlindonna
Internet Entrepreneur
 
Join Date: Jan 2005
Location: Franklinton, LA, USA
Posts: 91
To add to Brian's idea, I was imagining something along the lines of what Grub did/does at grub.org
Old 06-16-2005   #11
stuntdubl
Traffic not SEO.
 
Join Date: Jun 2004
Location: Upstate NY
Posts: 45
"An open source project to have local and global link popularity measurable independently of any search engine" - is the tentative project goal at this point. Though I think these are the type of things that can certainly be put to a vote once there is more opinion voiced on the matter, and we have a more comprehensive project scope.

Quote:
participation
Certainly anyone can add ideas to the project at this point. I tossed up openrank.org as a repository for folks to share ideas on the topic. I think certainly an outline of the technology and goals of the project will be required and is very important at some point, but I think the discussion of issues is important to address firstly.

Quote:
...would perhaps need to think aside from the concept of datacenters of server farms for powering such projects, and instead look to the method of use of bit torrents hosted upon webmaster sites and similar as a way of sharing and processing resources on a large scale.
Best suggestion I've heard yet. Decentralizing something like this would be a great way to go, I think; this way it is not dependent on financial resources to purchase the necessary processing power. I would imagine storage will still be an issue, but all it takes is a few ideas like this to come up with great solutions.

Quote:
It sounds just like an index to use as a test bed.
Agreed marcia, it's not a search engine anything. It's a repository of web pages.
Actually, it's not even really that yet. It's not really anything except for a few good ideas. I think those good ideas could certainly turn into a lot of tangible projects though if they share a fairly common vision and understanding that is communicated publicly and non-commercially. I think folks will visualize it a bit differently at this point, and throwing all those thoughts and ideas out on the table and somehow organizing and prioritizing them will be very important to the success of such a project.

I think there are some amazing ideas floating around out there in the SEO community, that could really benefit a project like this if folks decide that it really is a worthwhile venture. The discussion and ideas so far have been extremely positive. Yes, it would be a boatload of work, but I don't think it has to be done tomorrow either. This can be an ongoing thing that people support as time and resources permit.

From those who may have dealt with non-profits or open source projects in the past, do you have suggestions of how to make the communication or promotion of the organization or community more effective?

Quote:
Can you perhaps put up a page on the site about the principles, goals, ideals etc etc. http://dmoz.org/socialcontract.html is a good guide, and that would help clarify things a bit!
Another great idea! There are certainly a TON of "need-to-do's" from all aspects of this project. If anyone can help, please stop by the site and register, and state how you may be able to help. We're gonna need all the help we can get. We'll keep adding thoughts and ideas, and hopefully others will do the same.

Last edited by stuntdubl : 06-16-2005 at 10:58 AM.
Old 06-16-2005   #12
randfish
Member
 
Join Date: Sep 2004
Location: Seattle, WA
Posts: 436
Todd - Thank you. I should have given you credit for the site in the first post - an error of omission.

Right now, OpenRank is just a set of ideas, but from reading around the web, I feel that this type of project has the support of many people in the SEO/M community. Certainly, I'd love it if the project could run off donations, but that doesn't seem entirely feasible unless we were all giving up $1000 - and that ROI could take quite a while.

What will probably happen is that SEOmoz will foot the initial bill along with perhaps 2-4 other individuals or companies. I have received several e-mails with offers of financial support already (thank you!). I'm hoping to find 1-2 programmers who have significant interest in the project and skill who can contribute some of their time and expertise. I can probably hire a 3rd person to work on this full-time in the next 6-9 months.

The biggest initial problem will be hosting fees and bandwidth charges for this massive amount of data scraping, which is why Brian's idea of a distributed spidering system that we could all install and run in the background would be such a great idea. Everything in this project takes time and money, and these are resources we haven't yet secured. Certainly my little company does not have the deep pockets to fund all of this development, but even little by little, I believe it's a step that has taken too long to begin.

So let me postulate some ideas of what you can do to help:

1. Sign up at OpenRank and contribute ideas
2. E-mail me or Todd if you are interested enough to contribute programming time or funding
3. Keep your eyes and ears open about the project and consider if you would be willing to install a distributed computing software piece on your home/work machines to help the project.

That's all there is to do actively for now, but I want to thank so many people who have contacted me privately or published publicly that they are supportive of this effort. I think Todd & I will see what we can do about organizing an e-mail notification newsletter on the subject of OpenRank.
Old 06-16-2005   #13
stuntdubl
Traffic not SEO.
 
Join Date: Jun 2004
Location: Upstate NY
Posts: 45
Thanks Rand. As mentioned, a "social contract" type document similar to the ODP's or something would be nice. I think this should be on the top of the agenda so folks can be completely comfortable with the project intentions from the get go.

The project will certainly need lots of assistance, but I think everyone will be more receptive to contributing their ideas (or even donations) to something if they understand exactly how they will be used, and how it could potentially benefit them as well.

>email newsletter

Pretty sure the CMS software has this capability, and there is an RSS feed that folks can subscribe to as well. I know there is functionality for polling as well if there are some questions you would like to get public opinion on.

Another "need" we can add to the list is someone with some Drupal experience for a bit of assistance with the management of the site. I'm pretty much a noob with it, and just muddling my way through.

Thanks to everyone so far that has taken a little bit of time to post thoughts and ideas. I really think this is what will get the ball rolling, and the more input we can get the more likely it will be that this bird ever gets off the ground.
Old 06-16-2005   #14
I, Brian
Whitehat on...Whitehat off...Whitehat on...Whitehat off...
 
Join Date: Jun 2004
Location: Scotland
Posts: 940
You're asking for a sign up and help - but to be honest, I don't know what you are actually asking us to help with and sign up for.

Both of you are people I respect - I've had the pleasure of speaking to Rand on the phone, and the stuntdubl blog is an excellent resource. Therefore whatever you are trying to get us involved with almost certainly is going to be interesting.

But - - - what are you actually asking for help with?? For some reason I'm having difficulty reading what clear aims and end goals the project has.

It seems that somehow, somewhere, amongst all of the floating ideas, is the concept of an open source search movement. In which case, how much have you looked at hooking up with Nutch? Also, is it worth trying to chase after Jux2 and recover it?

Anyway, if you need resources then I should be able to help in some way - but I would be grateful for some clearer idea of what I can help with.
Old 06-16-2005   #15
randfish
Member
 
Join Date: Sep 2004
Location: Seattle, WA
Posts: 436
Brian -

Right now OpenRank is a project with the intention to replace the measurement of PageRank with more accurate measurements that calculate both local and global popularity. Pretty simple.

We're also dedicated to building the project completely open source - so anyone can plug into the index we build or data we collect and use it for any (non-commercial) purpose.

The hows, whens and wheres are complex, but the whys and the whos are fairly direct. This is a long term project that requires help of all kinds. If you have interest in supporting it, please stop by the site, sign up for an account and start posting on the board. Whatever you can provide - time, ideas, money, server space, programming, contacts, etc. is welcome. This project was started because several people had the idea of an OpenRank at the same time and it's grown into a real project because of the interest from the SEO community and the number of supporters we've already found.
Old 06-16-2005   #16
xan
Member
 
Join Date: Feb 2005
Posts: 238
It could be a fun play, but there are a lot of projects that have been going for a while at sourceforge on alternatives to existing search technology, and a lot of stuff is available at code repositories and other projects will issue this code as well. An index is easily created. You need somewhere to put it all, a method to classify it all, and a damn heavy spider to constantly collect.

Maybe you can test the existing methods and available systems and see what works for you, but then you are tailoring something that returns search for you from a limited index.

It'd be cool to see what it churns out! An empty field has so much potential for building don't you think?