Welcome to the Dynamic Web Site Issues forum


Mikkel deMib Svendsen
06-02-2004, 10:48 AM
I am very much looking forward to seeing what this dynamic forum will bring. As anybody who knows me knows, I am a dynamic geek, but there is always room to learn, and I hope to do so from all the great people who will soon populate this forum.

I will personally do my best to share useful information about this important subject. My personal experience, working with search engine marketing and dynamic websites, is that it's not really a problem but rather an option. It's an option to do better than any static site will ever do.

Let's get some real postings rolling :)

rustybrick
06-02-2004, 10:55 AM
I am sure the spiders will get better at indexing dynamic content, so I do anticipate this forum to get a lot of use. I've heard you speak at SES and I look forward to your posts in this forum.

Welcome!

Mikkel deMib Svendsen
06-02-2004, 11:00 AM
> I am sure the spiders will get better at indexing dynamic content

At SES in Toronto, Chris Sherman talked about a conversation he had with some of the leading engineers at Google who work on the issues of crawling dynamic content. Google said, according to Chris, that for them to develop a solution that would uncover the entire "hidden web" would (conservatively) take 50 years! I bet they are right.

So, for the next 50 years or so we sure need this forum :)

rustybrick
06-02-2004, 11:02 AM
Toronto was a long time ago. ;)

Uncovering all of the 'hidden web', in my opinion, will never be done.

I agree.

AdamH
06-02-2004, 12:27 PM
Do most people use mod_rewrites to create hard links and get dynamic content indexed, or other tactics?

I currently have a client whose site is in a proprietary content management system, and I am working to make this site more search engine friendly.

Any advice from those with prior experience would be greatly appreciated.


Mikkel deMib Svendsen
06-03-2004, 06:44 AM
The reason I put The Hidden Web in "" is that I do not think you can ever define how many pages it represents. How many pages does Google have? One for every search made, right? There is an endless number of pages. And the same goes for most other query-based dynamic ("hidden") web sites and content.

So, if the hidden web is infinite how can you ever index it all? :)

rustybrick
06-03-2004, 09:21 AM
Mikkel, lol, very true.

AdamH, I think most people use mod_rewrite if they are running their dynamic site on a Unix box. I use mod_rewrite all the time, not just for SEO reasons.

Mikkel deMib Svendsen
06-03-2004, 11:05 AM
Yes, mod_rewrite is only for Unix, but with ISAPI filters you can do the same for NT - and, as you say, there are many interesting things you can use it for besides just SEO.

Ron Carnell
06-03-2004, 02:58 PM
> Do most people use mod_rewrite to create hard links and get dynamic content indexed, or other tactics?

I'm going to go against the grain a bit here, Adam, because IMO mod_rewrite should be the last tool a developer reaches for, not the first.

If your CMS isn't SE-friendly, that's a problem with the CMS, not with the web server, and the best solution will be implemented at the correct level. Realistically, I know it's not always possible to solve the problem at the source, but I think that should always be the goal. At the end of the day, mod_rewrite is typically a Band-Aid used to cover up sloppy programming. Heal the wound, and the Band-Aid can be removed.

mcanerin
06-03-2004, 07:10 PM
I agree with Ron entirely.

I also think the general direction of SEO will change in the future because of this. CMS systems will come SE friendly "out of the box", HTML editors will have SEO (and accessibility) tools built in and easier to use, web design courses will include an SEO/SEM component, and so forth.

I suspect that a lot of the simple (and not so simple, in the case of getting some CMS systems to an SE friendly stage) "grunt work" will stop being in the SEO realm, and SEOs will be able to focus on other issues that can't be handled as easily as upgrading to properly designed software or launching a wizard that checks for Alt, Meta and Title tags and other simple issues.

Ian

Mikkel deMib Svendsen
06-07-2004, 07:27 AM
Theoretically speaking I agree with you, Ron, but in my personal experience it is very often easier in the long run to have the URL-rewrite middle layer, so engineers (as good or bad as they come) can have it their way and marketers and SEOs can have it their way. It sort of bridges two groups of people in large corporations that do not always communicate very well :)

On top of that comes all the other cool stuff you can do once you start utilizing mod_rewrite ...

Ron Carnell
06-07-2004, 08:31 AM
Hey, Mikkel! What's it been, five or six years since we had not dissimilar discussions at SEF? Except, I think that was before there was an ISAPI solution and all the threads revolved around 401 traps. Glad to see you've kept busy. :)

You're absolutely right that it's very often easier, but I'm not so sure that adding unnecessary layers is necessarily better. Those engineers (as good or bad as they come) aren't stupid people, and I think education can sometimes be a better, more long-term solution. If an old sea dog like me can learn ... :)

While mod_rewrite falls in the very cool category, nothing comes without cost, and maintaining multiple APIs in anything except the most static environment can be a high price to pay, especially when you have to pay and pay and continue paying. Not to mention the server overhead when the monthly uniques start to push into the high seven figures.

There are times when the underlying software simply can't be changed, for various reasons, and mod_rewrite is the only answer available. Other than that, I think it's just a short-term solution. My rule of thumb is, if you expect to live with the application for more than six months, pay the piper and fix it right. A year down the road, you'll be glad you did.

Mikkel deMib Svendsen
06-07-2004, 09:08 AM
It's good to see different points of view on the issue. There are many valid arguments on both sides (though I am not even sure we are on different sides) :rolleyes:

rcjordan
06-08-2004, 06:40 PM
>Not to mention the server overhead

Yeah, there's the sticking point for me. Even if I'm not hitting high-7 uniques, I tend to have a lot of other draws on the server and I've found out the hard way that sites crack under load on their best $$-performance days.

Mikkel deMib Svendsen
06-08-2004, 07:29 PM
I have used URL rewriting even on sites with heavy traffic without ever running into server problems because of it. In my experience, and from talking to other engineers I know who use rewriting on NT, .NET and Apache, most say the same: rewriting uses very few resources. Of course, if you are running on the limit it may be the determining factor, but then you might benefit from a bit more headroom anyway :)

Every process you run on a webserver "costs", but if the benefits are good enough it may well be worth it. I do not always find it the most important goal to save a few percent of CPU if that power is used for something useful.

If I really wanted to save resources I'd stop server monitor logging, which can take up as much as 25% of actual resources (on NT), and move data collection to network level (packet sniffing) or client-side tracking - but that's another discussion :)

JBL
06-08-2004, 08:34 PM
Just under a year ago, I used Spider Safe URL to rewrite my dynamic URLs. Old URLs looked like www.my-web-site.com/?city=10. Now they look like www.my-web-site.com/default.asp/city/10. Google has it figured out and ranks the new URLs properly. But Yahoo has both sets of URLs in their index and seems to be penalizing me for duplicate content. At least, that's what Site Match tells me.
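
For readers new to the idea, here is a minimal sketch of the kind of mapping a rewrite layer performs for URLs like the two above. It is not how Spider Safe URL itself works (that is not described in this thread); the function name `to_internal_url` and the exact pattern are assumptions made for illustration.

```python
import re
from typing import Optional

# Hypothetical pattern for the new-style path, e.g. /default.asp/city/10
NEW_STYLE = re.compile(r"^/default\.asp/city/([0-9]+)$")


def to_internal_url(path: str) -> Optional[str]:
    """Translate a path-style URL into the query-string form the script expects.

    Returns None when the path does not look like a rewritten city URL.
    """
    match = NEW_STYLE.match(path)
    if match is None:
        return None
    return "/?city=" + match.group(1)


print(to_internal_url("/default.asp/city/10"))  # the old-style query URL
print(to_internal_url("/about.html"))           # not a rewritten URL
```

Both forms reach the same content, which is why a search engine that still holds the old URLs can end up with two entries for one page.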

Site Match seems to think I should go back to the old urls. But, I'm not willing to risk the good positions I have at Google. Will Yahoo ever figure out that this is not really duplicate content, but really just old urls that shouldn't be in the index at all?

Thanks for any advice or comments.

Dodger
06-08-2004, 09:02 PM
Yahoo also does not appear to be indexing pages past the first query parameter in the URL. Anything after that first ampersand is not being picked up. You can do a site: query on Yahoo for any of your popular forums and see that compound queries in the URL are not there.

Another thing they are not picking up is Blogger pages. But that is for another subject altogether. ;0)


As for mod_rewrite ... yep, that is sloppy, I agree. But someone mentioned it was for Unix only. My impression was that it is for Apache Server, which runs on Unix, Linux, and Windows boxes.

seomike
06-08-2004, 10:13 PM
There are many mod rewrite programs available for IIS. I think there is a thread in here that has a pretty good list.

Ron Carnell
06-09-2004, 10:20 AM
> Of course, if you are running on the limit it may be the determining factor but then you might benefit from a bit more headroom anyway

Server overhead covers a lot of ground and, unfortunately, isn't always a black and white issue. For example, one of my sites is a poetry site, which tends to be very seasonal. During most of the year, our uniques hang around 2M a month, while the summer months will see a dip of almost fifty percent and major holidays can see a short-term rise of several hundred percent. On February 13 and 14, it's not unusual to see over 25K uniques an hour for several hours at a time. Ideally, the server would have sufficient headroom to handle that kind of load without a hiccup, but realistically, it's hard to justify paying for that much server 365 days a year. There's not quite enough money in poetry, I'm afraid, to warrant unnecessary expenses.

My situation may be extreme, but seasonal fluctuations are certainly not unusual. I've seen too many developers implement mod_rewrite solutions that "work" just fine during average loads, but result in unacceptably sluggish response times during peak loads. This can be especially dangerous on large sites if a high-traffic user period coincides with a spider crawl, since the whole goal of mod_rewrite is to fool the spiders into thinking they are getting static pages (which WILL be requested faster than dynamic ones).

I still think mod_rewrite is a great Swiss-army knife every developer should have in their tool kit. But it's a knife with very sharp edges and should be handled accordingly. :)

Mikkel deMib Svendsen
06-09-2004, 05:02 PM
You've got a very valid point, Ron. Your case is probably extreme, as you say, but I guess there are many sites with similar problems (although maybe not as dramatic).

Do you have any estimate of how much CPU (%) the overloaded servers used on rewriting? As far as I have been able to find out it's usually less than 2%, and in that case you should be able to squeeze it in on most systems.

But, of course, only if the code is decent and as efficient as possible :)

DianeV
06-09-2004, 05:21 PM
It is just great to see Mikkel & Ron talking again. </off topic>

Mikkel deMib Svendsen
06-09-2004, 05:44 PM
It's good to see both of you again. Been some time :)

For the rest of you who have no clue what we are talking about: Ron, DianeV and I used to spend a great deal of time together some years ago in another prominent SEO forum ... Look out for those guys! :D

Ron Carnell
06-09-2004, 06:08 PM
Obviously, Mikkel, any CPU percentage will depend on the CPU, but I suspect your 2 percent number is a fair estimate for most. The bigger challenge, as you implied, is the number and complexity of the regular expressions required. I suspect each application is going to be unique, making any estimate just an estimate.

<edit>LOL. I see Diane snuck in while I was typing! She does that kind of thing ... :) </edit>

seomike
06-09-2004, 11:23 PM
That's why we should teach good syntax if people ask about using mods in here. ;)

I see too many tutorials that just say (.*), which will nab everything, when all it needs to find is a ([0-9]{1,5}).
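
To make the difference between the two patterns concrete, here is a small sketch of what each one accepts. It uses Python's `re` module rather than mod_rewrite itself, and the `/city/<id>` URL layout and the helper name `captured` are assumptions for illustration only.

```python
import re

# Hypothetical URL layout: /city/<id>, where <id> is a short numeric key.
GREEDY = re.compile(r"^/city/(.*)$")
SELECTIVE = re.compile(r"^/city/([0-9]{1,5})$")


def captured(pattern, path):
    """Return the text captured by the pattern's group, or None if no match."""
    match = pattern.match(path)
    return match.group(1) if match else None


for path in ["/city/10", "/city/123456", "/city/10/../admin"]:
    print(path, "->", captured(GREEDY, path), "|", captured(SELECTIVE, path))
```

The wildcard hands every one of these paths to the script behind it, while the selective pattern only passes through the short numeric ID and rejects the rest.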

Mikkel deMib Svendsen
06-17-2004, 04:35 PM
So true, seomike, and I hope you will step in whenever you see suggestions using bad coding and generally help teach better coding practice in here. I am not even close to perfect when it comes to that myself - always a lot to learn :D

I have, however, been fortunate enough to work with really good programmers and have seen how much you can optimize just about any code and how much it matters. If people knew that they would give programmers more time and money ...

That is also why I think 1-2% CPU usage for URL rewriting should not be a problem, as you should easily be able to save it back by optimizing just a few things. I mentioned server logging as one. I know of many webservers that companies just leave on default logging for everything - even things they never use. Turning some of that off should save a lot more than 2% CPU.

Then, if you move on to optimizing the code running on the server (whatever applications or services it may be) you can win even more.

Ron Carnell
06-17-2004, 05:08 PM
> I see too many tutorials that just say (.*) that'll nab everything when all it needs to find is a ([0-9]{1,5}).

As always, the code should depend on your goal. The implication, however, that (.*) is more processor intensive than ([0-9]{1,5}) is, frankly, just plain wrong.

If I ask you to list all the months (.*) it should require very little thought on your part. If I ask you to list only months that have an "e" in them and are less than 8 characters long, it will likely require a few more brain waves. It's the same with computers. Using (.*) will always be more efficient than a more selective filter (though certainly not always safer). There are exceptions, but generally speaking, if the goal is to save CPU cycles then your regexp should be as selective as necessary to do the job, but not MORE selective than necessary.
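
Anyone who wants to check this on their own setup can time the two patterns directly. The sketch below does that with Python's `re` and `timeit` modules; Python's engine is not the one Apache uses, and the `/city/10` test path and the helper name `time_pattern` are assumptions, so treat any numbers it prints as a rough indication only.

```python
import re
import timeit

GREEDY = re.compile(r"^/city/(.*)$")
SELECTIVE = re.compile(r"^/city/([0-9]{1,5})$")


def time_pattern(pattern, path, repeats=100_000):
    """Return the total seconds taken to match `path` `repeats` times."""
    return timeit.timeit(lambda: pattern.match(path), number=repeats)


if __name__ == "__main__":
    path = "/city/10"  # hypothetical rewritten URL
    print("(.*)         :", time_pattern(GREEDY, path))
    print("([0-9]{1,5}) :", time_pattern(SELECTIVE, path))
```

Results will vary with the regex engine, the length of the URL and how many rules run per request, so it is worth measuring with real URLs from your own logs.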

Dodger
06-17-2004, 05:21 PM
> I see too many tutorials that just say (.*) that'll nab everything when all it needs to find is a ([0-9]{1,5}).

Checking for individual characters (pattern matching) is more processor intensive than wildcard matching.

seomike
06-18-2004, 09:59 AM
I stand corrected. I was under the impression that wildcards took more out of the CPU. Guess ya learn something every day :D