SpiderSafeURL Gives You More PR
bobmutch
12-03-2004, 11:35 AM
I just had SpiderSafeURL (a free product put out by cfdev.com) installed on the server hosting my ColdFusion Inventory Listing Web Application, so I won't have URLs in my source pointing to inventory.cfm?id=777 type file names.
It is just a simple DLL that is installed on the IIS server, and it gives IIS the ability to convert /'s back to ?, =, and &.
In the source I just did a replace on the link URLs, something like #ReplaceList(linkURL, "?,=,&", "/,/,/")#, that replaced all the ?, = and & with /, so now there are no URLs with inventory.cfm?id=777 in the source.
carsinlondon.com/forsale-london-ontario.cfm?ID=216&for-sale-used-=cars&car&buick&century
now shows up as
carsinlondon.com/forsale-london-ontario.cfm/ID/216/for-sale-used-cars/car/buick/century
in the source, so there is no more PR bleed to query strings, and after the next toolbar PR update the bleed will be gone.
Disclaimer: I am not associated with cfdev.com other than that I use their SpiderSafeURL product.
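For anyone who wants the replace step spelled out, here is a minimal sketch in Python rather than ColdFusion. It is not the code running on the site. The function name `to_path_url` is made up for illustration, and it assumes the link contains no other ?, = or & that should be left alone.

```python
def to_path_url(url: str) -> str:
    """Replace the query-string delimiters ?, = and & with /."""
    for delimiter in "?=&":
        url = url.replace(delimiter, "/")
    return url


print(to_path_url("example.com/inventory.cfm?id=777"))
# prints: example.com/inventory.cfm/id/777
```

This only rewrites the links in the page source. The server still needs something like the DLL to turn the slashes back into a query string when the request comes in.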
I, Brian
12-03-2004, 12:14 PM
Sounds good. :)
Now if only Apache were as easy...
powerofeyes
12-03-2004, 02:44 PM
bobmutch: "...there is no more PR bleed to query strings, and after the next toolbar PR update the bleed will be gone."
If you are talking only about toolbar PR then it's fine. There is no PR bleed, or whatever you call it, in the internal PageRank which Google knows.
Query strings don't seem to affect the internal PageRank calculation. They only affect the visible PageRank display on the toolbar.
seomike
12-03-2004, 03:01 PM
Sorry to be the bearer of bad news, but you'll be lucky to get spiders to even crawl directories that deep unless pushed by some hefty links.
AussieWebmaster
12-03-2004, 03:16 PM
But does it create a hard-coded copy of the page, or does it convert the URL back when it is requested inbound?
If it does not convert, it is naming a page that does not exist.
bobmutch
12-03-2004, 03:31 PM
AussieWebmaster: "But does it create a hard-coded copy of the page, or does it convert the URL back when it is requested inbound? If it does not convert, it is naming a page that does not exist." Not sure if I am following your question. It's a dynamic page, let's say example.com/inventory.cfm?id=777 . All the SpiderSafeURL DLL does is let you ask IIS for the same page with example.com/inventory.cfm/id/777 and it returns what example.com/inventory.cfm?id=777 would have given.
Then all you do from there is a replace() on the links in your code so that they are not example.com/inventory.cfm?id=777 but example.com/inventory.cfm/id/777 .
seomike: Very interesting point. I am just wondering about your position. If I have links to files like carsinlondon.com/forsale-london-ontario.cfm/ID/183/for-sale-used-cars/car/chevrolet/monte-carlo off one of my sub pages, are you trying to say that Googlebot will not crawl them because they are so many directories down?
I would say Googlebot will not go to Page10, then from there go down a level to Page100, then down to Page1000, and then down to Page10000. But as far as Googlebot finding a link with a few directories in it before it hits the content, do you really think Googlebot will not crawl that page even though it is, let's say, directly linked from the home page?
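To picture what happens on the inbound side, here is a rough Python sketch of turning the slash form back into a query string. This is only a guess at the idea and not how the SpiderSafeURL DLL is actually written. It assumes the segments after the script name alternate key, value, key, value, and `to_query_url` and its `script_ext` parameter are names invented for this example.

```python
def to_query_url(url: str, script_ext: str = ".cfm") -> str:
    """Rebuild 'page.cfm?key=value&...' from 'page.cfm/key/value/...'.

    Assumption: segments after the script name alternate key, value.
    """
    head, sep, tail = url.partition(script_ext + "/")
    if not sep:
        return url  # no path segments after the script name
    parts = [p for p in tail.split("/") if p]
    if not parts:
        return head + script_ext
    pairs = []
    for i in range(0, len(parts), 2):
        key = parts[i]
        # A trailing key without a value gets an empty value.
        value = parts[i + 1] if i + 1 < len(parts) else ""
        pairs.append(f"{key}={value}")
    return head + script_ext + "?" + "&".join(pairs)


print(to_query_url("example.com/inventory.cfm/id/777"))
# prints: example.com/inventory.cfm?id=777
```

The point is that both forms reach the same dynamic page, so no hard-coded copy of the page exists anywhere.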
I can do it this way I guess.
I changed
carsinlondon.com/forsale-london-ontario.cfm/ID/183/for-sale-used-cars/car/chevrolet/monte-carlo
to
carsinlondon.com/forsale-london-ontario.cfm/ID-183-for-sale-used-cars-car-chevrolet-monte-carlo
AussieWebmaster
12-03-2004, 05:49 PM
You covered my question.
bobmutch
12-03-2004, 09:05 PM
powerofeyes: "If you are talking only about toolbar PR then it's fine. There is no PR bleed, or whatever you call it, in the internal PageRank which Google knows. Query strings don't seem to affect the internal PageRank calculation. They only affect the visible PageRank display on the toolbar."
There are 2 kinds of PR: real PR and visible PR. PR bleed refers to real PR value that is voted to external pages (sites) via outbound links and that would remain within the site if there were no external links (or fewer external links). Example: an 11-page fully meshed link structure with 10 external outbound links on the home page. 10/20, or 50%, of the home page's real PR vote goes to external pages (sites) and would have been voted to internal pages if there were no external links.
Toolbar PR (visible) is just a 10-unit logarithmic or exponential scale that represents the whole range of real PR. So no, I am not talking about toolbar PR. I am talking about real PR. Everything to do with ranking weight, algos, bleed and votes has to do with real PR. The PR bleed is real PR.
Having said that, if you bleed enough of your PR off a page (or the site, as one page affects the others), then at the next toolbar PR update, if the real PR is no longer in the range for the toolbar PR you used to have, the toolbar PR will be lower.
So when I talk about PR being bled to a file with a query string (inventory.cfm?), I am talking about real PR, not toolbar PR.
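The 10/20 arithmetic can be checked with a few lines of Python. This is a simplified sketch that assumes a page splits its vote equally across all of its outbound links and ignores everything else in the real PageRank calculation. The function name is made up for the example.

```python
def external_vote_share(internal_links: int, external_links: int) -> float:
    """Fraction of a page's vote that goes to external links,
    assuming the vote is split equally across all links on the page."""
    total = internal_links + external_links
    if total == 0:
        return 0.0
    return external_links / total


# Home page of an 11-page fully meshed site: 10 internal links,
# plus the 10 external outbound links from the example.
share = external_vote_share(internal_links=10, external_links=10)
print(f"{share:.0%}")
# prints: 50%
```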
powerofeyes
12-03-2004, 11:12 PM
bobmutch: "I changed carsinlondon.com/forsale-london-ontario.cfm/ID/183/for-sale-used-cars/car/chevrolet/monte-carlo to carsinlondon.com/forsale-london-ontario.cfm/ID-183-for-sale-used-cars-car-chevrolet-monte-carlo"
There is no need for it. What seomike is saying is that the first URL makes it look as if the site has many directories, and that Googlebot won't crawl that deep. But from what I can see you are not really that many directories deep. From the 2nd URL I can tell the page is only 2 or 3 levels deep, and that won't be an issue whether you use the first URL or the 2nd one.
Search engines don't usually look at the number of directories in the URL. They mostly look at the source of the link. If the link is 2 or 3 levels deep from the main page they will crawl it regardless of the words or directory structure in the query string. We just have to make the URL search engine friendly in some cases, like the way you did.
Regarding PageRank, I don't want to hijack this thread into a PR discussion. I know you already have the knife pointing at me :D
Mikkel deMib Svendsen
12-03-2004, 11:27 PM
carsinlondon.com/forsale-london-ontario.cfm/ID-183-for-sale-used-cars-car-chevrolet-monte-carlo
That will get you trapped in the "if-there-are-too-many-hyphens-its-probably-spam" filter :)
Joking aside, in my experience URLs of that sort are not good. You want something as short, nice and easy to read as possible. This often means you have to make a more "intelligent" conversion of the URL instead of just converting characters like you do now.
There is also some information you could, to my mind, strip out of your suggested URLs: stuff that doesn't make any sense to users or engines, such as /ID/183/
I would maybe strip it down to:
carsinlondon.com/chevrolet/monte-carlo.html
To make an easy lookup you could extend the file name with the page ID like this:
carsinlondon.com/chevrolet/monte-carlo_183.html
Now, you have all the information you really need right in the URL in order to pull the right page from the server.
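As a rough illustration of that lookup, here is a small Python sketch that pulls the page ID back out of a file name of the form name_ID.html. The pattern and the function name `page_id_from_url` are assumptions made for this example, not part of any existing tool.

```python
import re
from typing import Optional

# Matches a trailing "_<digits>.html", e.g. "monte-carlo_183.html".
PAGE_ID_PATTERN = re.compile(r"_(\d+)\.html$")


def page_id_from_url(url: str) -> Optional[int]:
    """Return the numeric page ID embedded in the file name, if any."""
    match = PAGE_ID_PATTERN.search(url)
    return int(match.group(1)) if match else None


print(page_id_from_url("carsinlondon.com/chevrolet/monte-carlo_183.html"))
# prints: 183
```

The readable part of the URL is then free to change without breaking the lookup, since only the number is used to fetch the record.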
bobmutch
12-04-2004, 12:06 AM
Mikkel deMib Svendsen: All the info I need to get the right record is id=183. I agree there may be a few too many dashes in the URL : ) Also, I am not worried about people reading the URL, and I can't strip out id=183 as that is the info that is needed to pull the record.
I guess I am back to the question about the carsinlondon.com/forsale-london-ontario.cfm/ID/183/for-sale-used-cars/car/chevrolet/monte-carlo URL.
seomike said "You'll be lucky to get spiders to even crawl directories that deep..." but the directories are not that deep. That is just a link. That URL could be off my home page. I don't think that a spider is going to say "that is more than 4 directories deep, I am not going to crawl it". It just looks at it as a URL and goes there.
Now if it had to go to forsale-london-ontario.cfm and look up a link to the dir called ID, and from there to a dir 183, and then from there... all the way down, then no, it won't crawl that deep without some heavy PR links.
I do agree that the dashed link is too spammy, but the many-directories link should be OK.
powerofeyes: I agree. I don't think it should be a problem, but I would like to know for sure!
Mel
Please explain to me how changing the URL from one form to another is going to affect the PR bleed.
Why worry about PR bleed at all?
bobmutch
12-04-2004, 02:07 AM
Page URLs with query strings, such as inventory.cfm?id=777, are seen as another file by Google. So you have inventory.cfm and inventory.cfm? and both have PR voted to them.
The inventory.cfm? URL would be found in the source of the inventory.cfm page and would have PR voted to it by the inventory.cfm page. While both of these files are seen as the same by IIS and Apache, Google sees them as different files.
By changing the inventory.cfm?id=777 URL in the source with something like #ReplaceList(linkURL, "?,=", "/,/")# you get inventory.cfm/id/777 and there is no more inventory.cfm? for the inventory.cfm page to vote PR to. Then the SpiderSafeURL DLL that is installed on the IIS server is able to take inventory.cfm/id/777, convert it to inventory.cfm?id=777 and pass it to the ColdFusion server.
One example of why someone may not want to bleed their PR away into phantom query string pages would be to negate the possibility of having a page slip into a lower toolbar PR value.
Mel
You have lost me, Bob. A link is a link is a link. If you link to a page (any page) from the source of one page, it is still counted as a link regardless of the form of the URL.
You seem to be saying that a link to inventory.cfm?id=777 will be seen as a link but a link to inventory.cfm/id/777 will not be???
seomike
12-04-2004, 03:06 AM
I've learned the hard way with deep directory structures. Even Yahoo would have a tough time getting crawls in the deep directories/folders if it weren't for the high page scores.
I usually don't go 3 deep.
bobmutch
12-04-2004, 03:12 AM
I was noting that the URLs example.com/inventory.cfm? and example.com/inventory.cfm are seen as two different files by Google, but as the same file by IIS and Apache. When the example.com/inventory.cfm? URL is in the source of the example.com/inventory.cfm file, inventory.cfm will vote PR to it.
We get a similar example with example.com and www.example.com . When there are inbound links to both these URLs they both get PR. When this happens you can 301 one of them into the other and that one will get all the PR voted to it. You can't do that with inventory.cfm? and inventory.cfm, as IIS and Apache see them as the same file and it would create a loop. Therefore you eliminate one of them.
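The 301 idea for the two host names can be sketched in Python like this. It is only an illustration of the decision being made, not IIS or Apache configuration. The choice of www.example.com as the canonical host, the http scheme, and the function name are assumptions for the example.

```python
from typing import Optional, Tuple

# Assumption for this sketch: the www host is the one we want to keep.
CANONICAL_HOST = "www.example.com"


def canonical_redirect(host: str, path: str) -> Optional[Tuple[int, str]]:
    """Return (301, target URL) when the request came in on the
    non-canonical host, or None when no redirect is needed."""
    if host.lower() == CANONICAL_HOST:
        return None
    return 301, f"http://{CANONICAL_HOST}{path}"


print(canonical_redirect("example.com", "/inventory.cfm"))
# prints: (301, 'http://www.example.com/inventory.cfm')
print(canonical_redirect("www.example.com", "/inventory.cfm"))
# prints: None
```

This works for host names because the server can tell the two requests apart. The same trick is not available when the server treats the two URLs as the same file.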
bobmutch
12-04-2004, 03:18 AM
seomike: I think you are missing my question to you altogether. You seem to be saying this link carsinlondon.com/forsale-london-ontario.cfm/ID/183/for-sale-used-cars/car/chevrolet/monte-carlo is 7 directories deep, and therefore a search engine spider will not be as inclined to crawl it as if it was carsinlondon.com/chevrolet/monte-carlo which is only 2 dir deep.
My question is if this "7 deep directory" URL is on the home page, are you saying that the search engine spiders see it as being 7 directories deep and will not be inclined to crawl the link?
Mel
bobmutch: "...You can't do that with inventory.cfm? and inventory.cfm as IIS and Apache see them as the same file and it would create a loop. Therefore you eliminate one of them."
Yes, but in eliminating one of them you have created another, so I fail to see the point.
bobmutch
12-04-2004, 12:49 PM
Mel: The one eliminated was useless and the one created is useful.
Mel
So you still have two links out, but one has a different name. How does that give you more PR or reduce PR bleed?
bobmutch
12-04-2004, 01:59 PM
Mel: PR bleed is PR lost to a cause that doesn't help you. A phantom file that gets PR is PR lost. PR voted to a page you want displayed on the internet is a good thing.
Mel
PR bleed is PR that could be contributing to your site pages but is not. It has nothing to do with causes or morals.
At any rate the effect on your PR is exactly the same in each case, and I thought the title of this thread said something about this tool improving your PR?
bobmutch
12-05-2004, 02:40 AM
Mel: "PR bleed is PR that could be contributing to your site pages but is not." I would agree with that statement.
What it does do is get rid of PR being bled to phantom pages, so that it is applied to pages that you want to rank.
seomike
12-06-2004, 01:29 PM
bobmutch: "My question is if this "7 deep directory" URL is on the home page, are you saying that the search engine spiders see it as being 7 directories deep and will not be inclined to crawl the link?"
I doubt it will crawl it unless you have a high page score on your index page.
I don't think you can put a universal rule on what Googlebot will and won't crawl as far as deepness goes. It's more like trial and error, since I couldn't even get Google to crawl three directories deep until I had a PR of 4 on 3 of my personal sites. And it was going index.htm >> /folder/folder/folder/file.htm, which isn't as deep as the above URLs.
For about 6 months, until I got enough links into the core, nothing happened. Then all of a sudden the entire site got crawled. People that are just getting into mod rewrites for the first time generally make this mistake, like I did with my first couple of mod sites.
Now when I do a mod I only go 1 directory deep or less, and give unique names to files so I can query the database using names instead of ID numbers (it makes the URL look less like a mod).
/folder/file.php
file.php
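Querying by name instead of ID could look something like the Python sketch below. The table name, columns and the sample row are all made up for illustration, and an in-memory SQLite database stands in for whatever database the site really uses.

```python
import sqlite3
from typing import Optional, Tuple


def slug_from_path(path: str) -> str:
    """Turn '/folder/file.php' (or 'file.php') into 'file'."""
    filename = path.rstrip("/").rsplit("/", 1)[-1]
    return filename.rsplit(".", 1)[0]


def find_listing(conn: sqlite3.Connection, slug: str) -> Optional[Tuple[int, str]]:
    """Look a record up by its unique name rather than its ID."""
    return conn.execute(
        "SELECT id, title FROM listings WHERE slug = ?", (slug,)
    ).fetchone()


# Hypothetical sample data, only here so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE listings (id INTEGER PRIMARY KEY, slug TEXT UNIQUE, title TEXT)"
)
conn.execute(
    "INSERT INTO listings (id, slug, title) "
    "VALUES (183, 'monte-carlo', 'Chevrolet Monte Carlo')"
)

print(find_listing(conn, slug_from_path("/chevrolet/monte-carlo.php")))
# prints: (183, 'Chevrolet Monte Carlo')
```

The names have to be unique for this to work, which is why the slug column is declared UNIQUE in the sketch.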
bobmutch
12-06-2004, 02:54 PM
seomike: OK, thanks for the feedback. I will give this some thought and try to decide what the best thing to do is. I will post what I do and then report how it works out.