#1
One of the biggest issues we have in SEO/SEM is getting accurate measurements of link popularity, index size, and number of results. These figures are critical for many SEO/M projects that would like to take a more technical approach to optimization.
The new OpenRank project (www.openrank.org) was started just recently with the goal of creating an open source index of web pages, much like a search engine without an algorithm or the ability to search. Obviously, this project is an enormous undertaking and will require massive investment and participation. I'm hoping to spur some discussion on the subject here at SEW, gauge the interest of the community, and hopefully get your input and ideas. I personally think this type of service would be a massive boost to the tools available to people in SEO/M. The project would also allow third parties to query the data and build their own tools, allowing for an unlimited amount of creativity in attacking search engine rankings.

#2
Details?
Ok, so I could just click the link and read it, but I'm lazy. What are the requirements for participation, Rand?

#3
I'm afraid I don't understand what the actual aims of the project are. For example, are the rights to use the data and apply commercial tools to it really open?
Regardless, and as a general point, I think any attempt to provide an open source search solution is something I'm happy to help support. As an aside, though, such projects might need to think beyond datacenters and server farms for their computing power, and instead look at using BitTorrent-style sharing hosted on webmaster sites and similar as a way of sharing and processing resources on a large scale. 2c.

#4
A search engine crawls, indexes and retrieves results using algorithms, which is why it's called a "search" engine. Yes or no, or am I missing something? So if there's no algo and no search, what makes this any different from a directory?
Last edited by Marcia : 06-15-2005 at 06:10 PM.

#5
DIY?
Is the idea that you could build your own search engine and test algos on this open source index of sites? That might be kind of cool.
The next and better Google could come from an underfunded mathematical mind that doesn't have to worry about the logistics of getting the database to run the algo on...

#6
It sounds just like an index to use as a test bed.
Agreed, Marcia, it's not a search engine or anything. It's a repository of web pages. Scottie, the problem is running an algorithm on the database.
Last edited by xan : 06-15-2005 at 07:16 PM.

#7
Sorry to be vague. I guess it does require clarification.
What we're talking about is an index, similar to Google's or Yahoo!'s index of millions and millions of web pages. These pages would be stored on servers and be accessible to the public not via a search engine, but via developer codes or other types of access. In other words, there is no "search engine", just the framework of stored web pages. The purpose is to help people better understand the World Wide Web and to let open source coders and tool builders create their own systems that use the data.

My personal goals would be:
1. Create an alternative to PageRank using the link structure stored in the database. This new measure of global popularity would be both "accurate and precise" and could help people see exactly which links are influencing their "OpenRank".
2. Create a measurement system for local popularity by segmenting pages into subject-specific communities and then examining the popularity of pages based on their links within those communities.
3. Create tools that conduct more advanced and comprehensive link analysis than is currently available through the search engines' APIs or link: commands. One could, for example, see the exact number of links to each web page with a particular anchor text, or view all the links sorted by anchor text, by extracted page topic, by popularity, etc.

The possibilities of what could be done with an open index like this are endless, but the idea really spawned when Google's PageRank went down and people started e-mailing me asking if I could come up with a substitute for PR. As xan knows, this project is almost ludicrously ambitious in size and scope, but there has already been such a buzz since I first mentioned the possibility on my blog that I think it's at least worth pursuing until we have a good estimate of the costs of starting and operating the project, at which point we can truly decide on its feasibility.
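All three goals above work off the same stored link graph. As a rough illustration only, here is a minimal Python sketch that assumes each stored link is a (source page, target page, anchor text) tuple. The global score uses a standard PageRank-style power iteration (damping 0.85) purely as a stand-in, since nothing in this thread defines how OpenRank itself would be calculated; "local" popularity just reruns the same calculation on links inside one topic community; and the anchor-text report counts inbound links per normalized anchor. The function names `rank`, `local_rank` and `links_by_anchor` and the sample domains are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical link records: (source_page, target_page, anchor_text).
links = [
    ("a.example", "b.example", "SEO tools"),
    ("c.example", "b.example", "seo tools"),
    ("b.example", "a.example", "home"),
]


def rank(links, damping=0.85, iterations=50):
    """Global popularity via PageRank-style power iteration.

    A stand-in only: the thread does not define an OpenRank formula.
    """
    pages = set()
    out_links = defaultdict(set)
    for src, dst, _anchor in links:
        pages.update((src, dst))
        if src != dst:  # ignore self-links
            out_links[src].add(dst)
    n = len(pages)
    if n == 0:
        return {}
    scores = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Pages with no outbound links spread their score evenly over all pages.
        dangling = sum(scores[p] for p in pages if not out_links.get(p))
        new = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        # Every other page splits its score equally among the pages it links to.
        for src, targets in out_links.items():
            share = damping * scores[src] / len(targets)
            for dst in targets:
                new[dst] += share
        scores = new
    return scores


def local_rank(links, community, **kwargs):
    """Local popularity: rank using only links between pages in one community."""
    inside = [(s, d, a) for s, d, a in links if s in community and d in community]
    return rank(inside, **kwargs)


def links_by_anchor(links, target):
    """Count inbound links to `target`, grouped by normalized anchor text."""
    return Counter(a.strip().lower() for _src, dst, a in links if dst == target)


if __name__ == "__main__":
    scores = rank(links)
    print(sorted(scores, key=scores.get, reverse=True))
    # ['b.example', 'a.example', 'c.example']
    print(links_by_anchor(links, "b.example"))
    # Counter({'seo tools': 2})
```

Reusing one ranking routine for both the global and the local score keeps the two numbers comparable; how pages would actually be grouped into subject communities is left open here, just as it is in the post.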
As for why it's in the search technology forum, I simply could not think of a more appropriate forum to put it in, and spent some time debating between this one and beta test.
Last edited by randfish : 06-15-2005 at 07:54 PM.

#8
Would building a "web map" about sum it up?
>this project is almost ludicrously ambitious in size and scope
No almost about it.

#9
Rand, not to be blunt, but is this your initiative? So many questions: what is the purpose, what are the goals, who is backing it, etc.
The only info I have is one sentence: Quote:
Can you perhaps put up a page on the site about the principles, goals, ideals, etc.? http://dmoz.org/socialcontract.html is a good guide, and that would help clarify things a bit!

#10
To add to Brian's idea, I was imagining something along the lines of what Grub did/does at grub.org

#11
"An open source project to have local and global link popularity measurable independently of any search engine" is the tentative project goal at this point, though I think this is the kind of thing that can certainly be put to a vote once more opinions have been voiced and we have a more comprehensive project scope.
I think there are some amazing ideas floating around in the SEO community that could really benefit a project like this, if folks decide that it really is a worthwhile venture. The discussion and ideas so far have been extremely positive. Yes, it would be a boatload of work, but I don't think it has to be done tomorrow either. This can be an ongoing thing that people support as time and resources permit. For those who have dealt with non-profits or open source projects in the past, do you have suggestions on how to make the organization's communication and promotion, or its community, more effective?
We'll keep adding thoughts and ideas, and hopefully others will do the same.
Last edited by stuntdubl : 06-16-2005 at 09:58 AM.

#12
Todd - Thank you. I should have given you credit for the site in the first post - an error of omission.
Right now, OpenRank is just a set of ideas, but from reading around the web, I feel that this type of project has the support of many people in the SEO/M community. Certainly, I'd love it if the project could run off donations, but that doesn't seem entirely feasible unless each of us gave something like $1000, and the ROI on that could take quite a while. What will probably happen is that SEOmoz will foot the initial bill along with perhaps 2-4 other individuals or companies. I have already received several e-mails with offers of financial support (thank you!). I'm hoping to find 1-2 skilled programmers with a significant interest in the project who can contribute some of their time and expertise. I can probably hire a third person to work on this full-time in the next 6-9 months.

The biggest initial problem will be hosting fees and bandwidth charges for this massive amount of data scraping, which is why Brian's idea of a distributed spidering system that we could all install and run in the background is such a great one. Everything in this project takes time and money, and those are resources we haven't secured yet. My little company certainly does not have the deep pockets to fund all of this development, but even if it happens little by little, I believe it's a step that has taken too long to begin.

So let me suggest some ways you can help:
1. Sign up at OpenRank and contribute ideas.
2. E-mail me or Todd if you are interested in contributing programming time or funding.
3. Keep your eyes and ears open about the project, and consider whether you would be willing to install a distributed computing client on your home/work machines to help the project.

That's all there is to do actively for now, but I want to thank the many people who have contacted me privately or said publicly that they support this effort. I think Todd and I will see what we can do about organizing an e-mail notification newsletter on the subject of OpenRank.
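Nothing in the thread says how a distributed spider would split up the work, so the following is only a hypothetical Python sketch of one simple approach: hash each URL's host name to pick which volunteer machine fetches it, so every page of a given site goes to the same machine. That makes it easier to crawl each site politely and avoids two volunteers downloading the same pages. `assign_node`, `partition` and the example URLs are made up for illustration. Plain modulo hashing reshuffles most sites whenever the number of volunteers changes; consistent hashing is the usual fix if machines join and leave often.

```python
import hashlib
from urllib.parse import urlsplit


def assign_node(url, num_nodes):
    """Pick which volunteer machine crawls `url` from a stable hash of its host.

    Every page on the same host maps to the same machine.
    """
    host = urlsplit(url).hostname or ""  # urlsplit already lowercases the host
    # Python's built-in hash() is randomized per process, so use SHA-1 to get
    # the same answer on every volunteer's machine.
    digest = hashlib.sha1(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def partition(urls, num_nodes):
    """Split a crawl frontier into one work list per volunteer machine."""
    buckets = {node: [] for node in range(num_nodes)}
    for url in urls:
        buckets[assign_node(url, num_nodes)].append(url)
    return buckets


if __name__ == "__main__":
    frontier = [
        "http://www.example.com/a",
        "http://www.example.com/b",
        "http://example.org/",
    ]
    for node, work in partition(frontier, 4).items():
        print(node, work)
```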

#13
Thanks Rand. As mentioned, a "social contract" type document similar to the ODP's would be nice. I think this should be at the top of the agenda so folks can be completely comfortable with the project's intentions from the get-go.
The project will certainly need lots of assistance, but I think everyone will be more receptive to contributing their ideas (or even donations) if they understand exactly how those will be used and how the project could benefit them as well.
>email newsletter
Pretty sure the CMS software has this capability, and there is an RSS feed that folks can subscribe to as well. There is also polling functionality, if there are questions you would like to get public opinion on.
Another "need" we can add to the list is someone with some Drupal experience to help a bit with managing the site. I'm pretty much a noob with it and am just muddling my way through.
Thanks to everyone so far who has taken a little time to post thoughts and ideas. I really think this is what will get the ball rolling, and the more input we get, the more likely it is that this bird ever gets off the ground.

#14
You're asking us to sign up and help, but to be honest, I don't know what you are actually asking us to sign up for and help with.
Both of you are people I respect. I've had the pleasure of speaking to Rand on the phone, and the stuntdubl blog is an excellent resource, so whatever you are trying to get us involved with is almost certainly going to be interesting. But what are you actually asking for help with? For some reason I'm having difficulty working out what clear aims and end goals the project has. It seems that somewhere amongst all of the floating ideas is the concept of an open source search movement. In that case, how much have you looked at hooking up with Nutch? Also, is it worth trying to chase after Jux2 and recover it? Anyway, if you need resources then I should be able to help in some way, but I would be grateful for a clearer idea of what I can help with.

#15
Brian,
Right now OpenRank is a project whose aim is to replace PageRank as a measurement with more accurate measurements that calculate both local and global popularity. Pretty simple. We're also dedicated to building the project completely open source, so anyone can plug into the index we build or the data we collect and use it for any (non-commercial) purpose. The hows, whens and wheres are complex, but the whys and the whos are fairly direct. This is a long-term project that requires help of all kinds. If you're interested in supporting it, please stop by the site, sign up for an account and start posting on the board. Whatever you can provide (time, ideas, money, server space, programming, contacts, etc.) is welcome. This project started because several people had the idea of an OpenRank at the same time, and it has grown into a real project because of the interest from the SEO community and the number of supporters we've already found.

#16
It could be a fun play, but there are a lot of projects on SourceForge that have been working for a while on alternatives to existing search technology, a lot of code is available in repositories, and other projects will release their code as well. An index is easily created. You need somewhere to put it all, a method to classify it all, and a damn heavy spider to constantly collect.
Maybe you can test the existing methods and available systems and see what works for you, but then you are tailoring something that returns search results for you from a limited index. It'd be cool to see what it churns out! An empty field has so much potential for building, don't you think?