Weblog Entry

Rewrite!

March 04, 2005

mod_rewriting your way to fame, fortune… and frustration.

A new feature I’ve added to the Zen Garden is a far nicer URI schema. I’ve moved from this:

http://www.csszengarden.com/?cssfile=145/145.css

to this:

http://www.csszengarden.com/145/

Nice, right? Except it’s not quite perfect yet.

For the sake of easy, short URIs for the book, I looked into mod_rewrite at the time, but never quite got it working. So I hacked this all together manually by adding redirects for each of the designs to an index.php file in their respective directories. Ugly, unmaintainable, and more work than necessary, but it worked.

Then Mathias Bynens asked if the new links could be applied to all designs. And since I had originally planned for this anyway, I went further into the mod_rewrite manual, which is completely incomprehensible to my puny brain, and managed to come up with something anyway. The affirmation from an anonymous helper monkey kept me at it, and finally I came up with this monstrosity:

RewriteEngine On
RewriteRule ^([0-9]{3,})(/?)(/&page=([0-9]*))?$ index.php?cssfile=/$1/$1.css&page=$4

In plain English… oh geez, I don’t even want to try. There’s still a query string variable in there to indicate the current page of designs; I’d love to go with post data for that, but there’s no way to retrofit a form in the ZG markup at this point without really breaking things. The important thing is, the variable is passed as I’d expect, which is good.
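For anyone who does want the plain-English version, here is a rough reading of that rule sketched in Python rather than Apache config. It assumes the rule sits in a per-directory context (an .htaccess file at the site root), where the path is matched without its leading slash. The helper name `rewrite` is made up for illustration, and Python's `re` module is only standing in for Apache's regex engine.

```python
import re

# The pattern from the RewriteRule above: three or more digits, an optional
# trailing slash, and an optional "/&page=N" suffix.
PATTERN = re.compile(r"^([0-9]{3,})(/?)(/&page=([0-9]*))?$")


def rewrite(path):
    """Return the internal target for `path`, or None if the rule does not match.

    `path` is the request path without its leading slash, which is an
    assumption about how the rule is deployed (per-directory context).
    """
    match = PATTERN.match(path)
    if match is None:
        return None
    design = match.group(1)
    # An unmatched group expands to nothing in the rule; Python gives None.
    page = match.group(4) or ""
    return f"index.php?cssfile=/{design}/{design}.css&page={page}"


for path in ["145/", "145/&page=2", "145", "about/"]:
    print(path, "->", rewrite(path))
```

In this sketch the pattern accepts 145 with no trailing slash as well as 145/, and anything that does not start with at least three digits is left alone.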

What’s not so good is that while these three possible URI scenarios are covered:

http://www.csszengarden.com/145/
http://www.csszengarden.com/145/&page=2
http://www.csszengarden.com/?cssfile=145/145.css

This one is not:

http://www.csszengarden.com/145

Note the lack of a trailing slash. Follow the link and see what happens to the URI — ugh. I’m not sure how to fix that, and so I turn to you dear reader. Tell me how.

I already tried an initial rule before the previous one to add the slash back in, but this only netted me server errors:

RewriteRule ^([0-9]{3,})(/?)$ /$1/

So, what now? First answer that works gets a free copy of the book.

Update: Got it, thanks to Roger Johansson. See the comments for details.


March 04, 02h

How about this:

RewriteRule ^([0-9]{3,})$ $1/

Dave S. says:
March 04, 02h

Roger – Nope. No server error, but no change.

ak says:
March 04, 02h

try replacing the (/?) with ([/]?) that’s what i use for an optional trailing slash

March 04, 02h

Hmm. Seems to work for me.
What about if you add an R flag after the first rule?

RewriteRule ^([0-9]{3,})$ $1/ [R]

Dave S. says:
March 04, 03h

ak – nope. When applied to the big rule, no change. When applied to the smaller rule, server error.

Which highlights something – when replying, let me know if I should be applying your suggestions to the larger rule that’s already working, the smaller one that I used last, or a brand new one of your devising.

ph says:
March 04, 03h

i’ve used this and it seems to work

RewriteRule ^/([0-9]{3,})/?$ /test.php?t=$1 [R]

ph says:
March 04, 03h

It’s a redirect; you need to use the [L] flag, not the [R]

March 04, 03h

Like ph says, it could be a later rule messing things up. Removing the first (short) rule and adding an L flag to the big rule works for me:

RewriteRule ^([0-9]{3,})(/?)(/&page=([0-9]*))?$ index.php?cssfile=/$1/$1.css&page=$4 [R,L]

Dave S. says:
March 04, 03h

Roger: got yours working with the addition of a slash:

RewriteRule ^([0-9]{3,})$ /$1/

Without the leading slash, it tries to write the physical server path. With it, it does exactly what we’d expect.

Thanks, I’ll email you about the book!

Dave S. says:
March 04, 03h

This is the final result, and the book goes to Roger:

RewriteRule ^([0-9]{3,})$ /$1/
RewriteRule ^([0-9]{3,})(/?)(/&page=([0-9]*))?$ index.php?cssfile=/$1/$1.css&page=$4

So, no R/L flags. Should I be using them?

11
Jim says:
March 04, 03h

> Its a redirect you need to use the [L] flag not the [R]

Use both. If you omit the R, it’s not a redirect in the eyes of clients, and the two resources are cached separately. The worst case scenario (an even 50:50 split of people using the two different URIs) would more than double your bandwidth use.

In the particular case of adding slashes on for directories, it can also result in broken links/stylesheets/images/etc.

Preferably use a permanent redirect so that search engines pay attention.

I find the full names are more readable too, especially when I’m editing a config file I haven’t touched in months.

[redirect=permanent,last]

…is equivalent to:

[R=301, L]

March 04, 03h

Aw man, almost had a book… You need to have problems like this more often, Dave! (Or at least offer free books from time to time.) ;^)

Tom says:
March 04, 03h

how about enabling MultiViews and using php explode() to get /what-ever-number? I personally don’t like messing with re-write rules.

March 04, 03h

I know the problem is mostly solved, but just wanted to point out this regular expression resource:
http://regexlib.com/RETester.aspx
I know, the patterns are run against the ASP.NET Regex engine, but it is quite similar to the Unix version. It makes pattern testing a snap, because I can just tweak the pattern until I get it right, then take it to the config file (or whatever).

You’re really getting into the nitty gritty now Dave, sahweet!

Dave S. says:
March 04, 03h

Hmm, maybe there’s still room for another. I’m having trouble with the flag thing. This:

RewriteRule ^([0-9]{3,})$ /$1/
RewriteRule ^([0-9]{3,})(/?)(/&page=([0-9]*))?$ index.php?cssfile=/$1/$1.css&page=$4 [R=301,L]

rewrites this:

http://www.csszengarden.com/052/

to this:

http://www.csszengarden.com/home/httpd/vhosts/csszengarden.com/httpdocs/index.php?cssfile=/052/052.css&page=

So, what’s up now? The only difference between the solution that appears to work and this one is the addition of the R/L flags.

March 04, 03h

dave, why didn’t you drop me a line? mod_rewrite is one of my hobbies…ah well, glad it’s fixed :)

March 04, 03h

Wow, great!

Thanks Dave. Really looking forward to receiving the book! :-)

And I see your question about the flags has already been answered.

You may need to add “RewriteBase /” to the line after “RewriteEngine on” to avoid the rules trying to use the physical path.
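A very simplified Python model of why the physical path can show up, based on one reading of how mod_rewrite treats relative targets in .htaccess files: a relative result gets the directory's filesystem path put back on the front unless RewriteBase supplies a URL prefix instead, and with an external redirect that path ends up in the address bar. The function `finish_rewrite` and its parameters are hypothetical, and the real module does considerably more than this.

```python
def finish_rewrite(substitution, directory_path, rewrite_base=None):
    """Toy model of per-directory prefix handling after a RewriteRule matches.

    substitution   -- the rule's target after back-references are expanded
    directory_path -- filesystem directory holding the .htaccess file
    rewrite_base   -- value of RewriteBase, or None if the directive is absent
    """
    if substitution.startswith("/"):
        # An absolute target is used as written.
        return substitution
    # A relative target is resolved against RewriteBase when it is set,
    # otherwise against the physical directory.
    prefix = rewrite_base if rewrite_base is not None else directory_path
    return prefix.rstrip("/") + "/" + substitution


# Path and target taken from the redirect reported earlier in the thread.
DOCROOT = "/home/httpd/vhosts/csszengarden.com/httpdocs"
TARGET = "index.php?cssfile=/052/052.css&page="

print(finish_rewrite(TARGET, DOCROOT))                    # physical path leaks in
print(finish_rewrite(TARGET, DOCROOT, rewrite_base="/"))  # /index.php?...
print(finish_rewrite("/" + TARGET, DOCROOT))              # leading slash: same result
```

Under this model, either a leading slash on the target or a RewriteBase of / would keep the filesystem path out of the result.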

March 04, 03h

How about this one:
RewriteRule ([0-9]{3,}[^/]*) index.php?cssfile=/$1/$1.css&page=$4 [R=301,L]

Dave S. says:
March 04, 03h

Justin - nope. Same problem.

20
Jim says:
March 04, 03h

You want a leading slash in the destination:

RewriteRule ^([0-9]{3,})(/?)(/&page=([0-9]*))?$ /index.php?cssfile=/$1/$1.css&page=$4 [R=301,L]

March 04, 03h

I’m going to try one more:
RewriteRule ([0-9]{3,}[^/]*).*(page=([0-9]*)){0,1} index.php?cssfile=/$1/$1.css&page=$3 [NC]

I forgot about the “&page=XX” part.

22
Jim says:
March 04, 03h

Ahh, hang on a sec, I see what the problem is. I was looking at it in the opposite direction to you. You need an extra rule before the others.

Essentially, it’s:

1. Directory hack with external redirect (and stop processing).

2. Rewrite new URI to old URI with local redirect (and stop processing).

3. Rewrite old URI to new URI with external redirect.

March 04, 04h

Darn it. I was too late for the question I actually could have answered.

24
Jim says:
March 04, 04h

Something like this should work:

# Fix up missing trailing slash
RewriteRule ^([0-9]{3,})$ /$1/ [R,L]

# Redirect old URIs to new URIs permanently as far as the rest of the world is concerned.
RewriteRule ^index.php?cssfile=/([0-9]{3,})/[0-9]{3,}.css&page=([0-9]*)$ /$1/&page=$2 [R=301,L]

# Redirect new URIs to old URIs internally.
RewriteRule ^([0-9]{3,})(/?)(/&page=([0-9]*))?$ index.php?cssfile=/$1/$1.css&page=$4 [L]

That’s what you want to do, right?

Dave S. says:
March 04, 04h

Here’s where we’re at. Everything’s working perfectly with the R/L flags:

RewriteEngine On
RewriteBase /
RewriteRule ^([0-9]{3,})$ $1/
RewriteRule ^([0-9]{3,})(/?)(/&page=([0-9]*))?$ index.php?cssfile=/$1/$1.css&page=$4 [R=301,L]

The only catch is that, from what I understand, it’s the R flag that expands the URI. So instead of the final address being a nice tidy /100/ , it ends up /index.php?cssfile=/100/100.css&page=0 at the moment.

Any way to shrink that back down, or is that where it has to stand if I don’t want two separate resources being indexed/bandwidth doubled?

26
Jim says:
March 04, 04h

Did you spot my comment above? It sounds crazy to write two rules, each redirecting to each other, but what you have to remember is that only one of them will apply for any given request, and that the external redirect generates a new request. It works like this:

Browser requests old URI.
Old URI tells the client to redirect to the new URI.
Browser requests the new URI.
The server serves what was at the old URI.
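The four steps above can be sketched as a toy request handler in Python. This is only a model of the flow, not of mod_rewrite itself: `handle_request`, `follow`, and the tuples they return are made up for illustration, and the URI shapes are the ones from the post.

```python
import re

OLD_QUERY = re.compile(r"^cssfile=/?([0-9]{3,})/[0-9]{3,}\.css(?:&page=([0-9]*))?$")
NEW_PATH = re.compile(r"^/([0-9]{3,})/(?:&page=([0-9]*))?$")
SLASHLESS = re.compile(r"^/([0-9]{3,})$")


def handle_request(path, query=""):
    """Return ("redirect", status, location) or ("serve", internal_target)."""
    # 1. Missing trailing slash: external redirect, stop processing.
    m = SLASHLESS.match(path)
    if m:
        return ("redirect", 302, f"/{m.group(1)}/")
    # 2. Old-style URI: permanent external redirect to the new form.
    m = OLD_QUERY.match(query) if path in ("/", "/index.php") else None
    if m:
        page = f"&page={m.group(2)}" if m.group(2) else ""
        return ("redirect", 301, f"/{m.group(1)}/{page}")
    # 3. New-style URI: internal rewrite, invisible to the browser.
    m = NEW_PATH.match(path)
    if m:
        design, page = m.group(1), m.group(2) or ""
        return ("serve", f"/index.php?cssfile=/{design}/{design}.css&page={page}")
    return ("serve", path)


def follow(path, query=""):
    """Follow redirects the way a browser would and return every hop."""
    hops = []
    result = handle_request(path, query)
    while result[0] == "redirect":
        hops.append(result)
        # Each external redirect is a brand-new request from the browser.
        path, query = result[2], ""
        result = handle_request(path, query)
    hops.append(result)
    return hops


print(follow("/", "cssfile=145/145.css"))
```

Only one branch fires per request, which is why the two opposite rewrites do not fight each other in this model. One thing the toy glosses over: it never feeds the internal target back through the rules, so it cannot show whether a real configuration would loop.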

March 04, 04h

Jim’s looks spot on, I’d say he deserves a book if there’s a second one to be given. Looks like we’ve got a mod_rewrite master on the premises.

As long as you request the new URI, you should be served the document transparently without any redirect. And if you’re asking for the old URI, well, you are told that you should be asking for the new URI. Sounds like that’s what you are after, right Dave?

28
Jim says:
March 04, 04h

Ha! I’m no master, I feel like I’m banging my head against a wall every time I touch the damn thing. It’s just I’ve done it enough times to have a couple of tricks up my sleeve and a bumpy head :).

I may well be off on the regexp particulars though, I can never remember exactly which directories they start from.

Turnip says:
March 04, 05h

“finally I came up with this monstrosity”

How can you say that?! Regular Expressions are beautiful ;-)

30
Michael Daines says:
March 04, 06h

One option might be the implementation used in Movable Type dynamic publishing, which would require editing the script that shows the gardens. What you do is rewrite everything that isn’t an actual request for a file or directory to a central script (in their implementation, it’s called “view.php” I think), and then use the REQUEST_URI variable available in that script to see what to display. You’ll get something like “/145” if the user is coming from the book, which you can regexp and transform as you see fit. See the view.php page in the Movable Type installation for more hints.
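As a rough illustration of the central-script idea, here is the parsing step sketched in Python. The real thing would be PHP reading the REQUEST_URI server variable; `parse_request_uri`, the returned dictionary, and the default page of 0 are all assumptions, and the accepted URI shapes are only the ones mentioned in this thread.

```python
import re

# Accepts "/145", "/145/", "/145/&page=2" and "/145/?page=2".
REQUEST_SHAPE = re.compile(r"^/([0-9]{3,})/?(?:[&?]page=([0-9]+))?$")


def parse_request_uri(request_uri):
    """Pull the design number and list page out of a raw request URI.

    Returns None when the URI is not a design address, so the caller can
    fall back to the default page or a 404.
    """
    match = REQUEST_SHAPE.match(request_uri)
    if match is None:
        return None
    design = match.group(1)
    # Assumed default: page 0 when no page is given.
    page = int(match.group(2)) if match.group(2) else 0
    return {"cssfile": f"/{design}/{design}.css", "page": page}


print(parse_request_uri("/145"))
print(parse_request_uri("/145/&page=2"))
```

The trailing slash stops mattering here because the script, not the server, decides what counts as the same address.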

A technique like this is also used in Ruby on Rails to make URLs nicer. Of course, in a Rails application, there are more types of URLs to worry about, and they carry a little more information, but it’s the same deal.

Here is the Movable Type documentation page on this: http://www.sixapart.com/movabletype/docs/mtmanual_dynamic

March 04, 08h

I fight with regular expressions all the time, and the trailing-slash issue is a frequent source of frustration.

In the future, I’ll be sticking to the excellent method proposed in this old ALA article: http://www.alistapart.com/articles/succeed/

Basically: Forget the regexes. Just feed the whole url into a traffic-light script in your home directory that just include()s whatever document has been requested. It’s much safer, and only incurs one extra disk-read.

March 04, 08h

with regards to jim’s suggested rules, a small comment: i may be wrong, but rewrite rules can’t actually act on the GET variable part of a request…you can only check the actual file/path being called, not the parameters passed.

anyway, i may be missing something dave, but what is the problem at the moment? are you trying to automatically redirect the old-style urls to new short style ones? or are you worried about search engines spidering the new-style urls with and without the slash and doubling up the results? if it’s the latter, i may be wrong, but i suspect that google and co treat www.foo.com/bar and www.foo.com/bar/ as exactly the same, and only index them once.

oh, and how about this really short rule? should cover your original need (and handles any additional slash or &page= or anything else for that matter)

RewriteEngine on
RewriteRule ^([0-9]{3,})/?(.*)$ /index.php?cssfile=$1/$1.css$2


if, however, your search engine concern is that of old-style versus new-style, here’s an idea (again keeping in mind that mod_rewrite can’t see the GET part): change your current index.php to check the request url. if it’s old-style, send a permanent redirect (301) to itself in the new-style url format.

Jon says:
March 04, 08h

A random comment here… from what I remember about RewriteBase, it should point at the filesystem directory that the rewrite applies to. Therefore, when I’m testing my site on this iBook, RewriteBase is set to “/www/public_html”. I remember having all manner of fun if I then directly transferred this file to my server without editing it…
Of course, it might not make any difference… but it’s 4am and I’ve just woken up (why did I choose to live so far away from my customers), and that’s what jumped out at me.

That said, I’d personally rewrite all requests for /nnn(/)? to /index.php, and parse $_SERVER["REQUEST_URI"] to include the appropriate stylesheet… but anyways.

Good luck, Dave - mod_rewrite is a tough one to ‘get’, but it’s well worth it :)

-jdp

Dustin says:
March 04, 09h

This is when I count my blessings for knowing a wide array of languages so well, from CSS to advanced OOP… even to mastering regular expressions…

I wish I would have seen this post earlier. Could have got myself a free book.

Congrats Roger.

Aristotle says:
March 05, 03h

Firstly, requests for the URL with no trailing slash should return a redirect to the same URL but including a trailing slash. This is what happens when you ask for http://example.com/directory and get http://example.com/directory/ served to you.

Secondly, the mod_rewrite rule WILL NOT MATCH QUERY PARAMETERS. You cannot use RewriteRule to match something like ?page=X in the URL. You must use a RewriteCond on the QUERY_STRING environment variable to do that, except in your case you don’t need to. You do want to add a “cssfile” parameter of your own. You can do this by including a query string in the result URL, but then it will overwrite the incoming query string which might have contained a page= that you want to preserve. So use the [QSA] option which means “Query String Append”.

That gives:

RewriteRule ^([0-9]{3,})$ $1/ [L,R=301]
RewriteRule ^([0-9]{3,})/$ index.php?cssfile=$1/$1.css [QSA]

These are tested and work.
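A small Python sketch of the difference [QSA] makes, as described above: without it, a query string in the target replaces the incoming one; with it, the incoming one is appended after the target's own. `rewrite_with_query` is a made-up helper, not anything Apache exposes.

```python
import re

# Pattern from the second rule above (path without its leading slash).
RULE = re.compile(r"^([0-9]{3,})/$")


def rewrite_with_query(path, incoming_query, qsa):
    """Apply the second rule above to `path`, modelling [QSA] on or off."""
    match = RULE.match(path)
    if match is None:
        return None
    design = match.group(1)
    target = f"index.php?cssfile={design}/{design}.css"
    if qsa and incoming_query:
        # Query String Append: keep what the browser sent, after our own part.
        target += "&" + incoming_query
    return target


print(rewrite_with_query("145/", "page=2", qsa=True))
print(rewrite_with_query("145/", "page=2", qsa=False))
```

With the flag off, the page parameter is silently lost in this model, which is the overwrite problem the comment warns about.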

Furthermore, I suggest:

RewriteRule ^([0-9]{3,})/([0-9]+)$ index.php?cssfile=$1/$1.css&page=$2

so that you can say http://www.csszengarden.com/145/2 to get page 2. Note the absence of trailing slash which makes sure that the browser requests relative URLs in the page relative to /145/ instead of relative to /145/2/. If that is of no consequence and you prefer the trailing slash, you can trivially paraphrase the other previous rules to achieve that.

March 05, 05h

:) Darnit indeed. And to think all those hours playing with mod_rewrite could have earned me a free book.

Anyway, I’d just be repeating other people’s comments at this point (most notably Aristotle’s comment on the [QSA] flag, which you certainly should be using) so I guess that’s all for now.

Best wishes.

March 05, 06h

perke…hitting “next designs” and “previous designs” doesn’t actually change the design that’s displayed at the moment, but it does change the list of available designs in the menu (i.e. it changes the “page” GET parameter)

Yaniv says:
March 05, 06h

mod_rewrite can give serious headaches. When I get to this point in a scenario as simple as you describe, I drop it altogether and write:

ScriptAliasMatch ^/[0-9]{3,} "/<replace-with-path>/index.php"

This will make your index.php be invoked every time an address beginning with 3 digits is accessed. I then do a little internal parsing of the REQUEST_URI environment variable in my script to fetch the parameters I need.
Keep in mind that with this “fluid” behavior you may end up with an audience that doesn’t know what a correct address should look like. However, with proper internal parsing, almost every address they type will fetch the page they wanted, so it’s not clear if it’s good or bad.
You may also write several more specific ScriptAliasMatch directives to restrict the address to several templates.

March 05, 07h

Getting a redirect for something like http://www.csszengarden.com/145 is very easy.

Say I enter a URL like this: http://www.csszengarden.com/145

What will happen?
Since /145 doesn’t exist, it will be handled by your default 404 handler.

Now the trick is to add code to your ErrorHandler script (a simple php file).

You can extract the CSS number (and possibly other variables) from the $_SERVER["REQUEST_URI"] variable, by doing some basic matching and string trimming.

Hope this helps!

P.S: This is how http://www.php.net does their URL function search. For example if you write a url like this http://www.php.net/str_replace it will take you to the manual page for that function.

J. J. says:
March 05, 09h

Wow. Now if I could only get that kind of help by posting a simple question on my blog…. I’ve got a couple old 15” monitors if anyone wants to help me with my spam and redirect problems; see my last couple of posts ;-)

perke says:
March 05, 11h

ooops!
I was looking at the garden archive a couple of minutes ago and hit <next design> and got the same design!
So, Dave, if you were playing with it at that time, OK… but if not, you may have messed it up. Just a heads-up.

I was here:

http://www.csszengarden.com/?cssfile=http://home.comcast.net/~glyphfoundry/zengarden/zengarden.css&page=4

cheers, mate

Dave S. says:
March 05, 12h

Alright, a tiny bit more progress. I hadn’t seen Jim’s suggestion (comment #24) when I posted the prior update, but I just plugged it in and it worked great.

Except for one little hitch. I need one or the other to be canonical. Right now, what I’ve come up with thanks to Roger’s input is that all /###/ are redirected to /?cssfile=###/###.css, and all /?cssfile=###/###.css remain the same. So the ugly querystring version is canonical, which it turns out is very important. (the canonical bit, not the querystring bit)

Jim’s suggestion allowed either /###/ or /?cssfile=###/###.css to function, without a redirect to one or the other. I believe my previous wording suggested that would be okay, but I’ve realized it’s not.

Due to some of the current setup (style sheet importing script, next/previous design list links, and historical links to existing designs all being variables here) I have to hardwire a certain amount of the linkage to various designs. An example: on the current page, if you hit the ‘Next Designs’ link, does that take you to ‘/&page=$nextPage’ or ‘/###/&page=$nextPage’? Depends on the URI scheme.

Actually, there ought to be an easy way to script around the discrepancy now that I think about it, instead of further mod_rewrite gymnastics. I kind of like the idea of having either linking method work without a visible redirect, and I think I’m about done with mod_rewrite on this one. Maybe I should just work out the logic and make that happen. Maybe Patrick’s on to something (comment 32) with the self-redirect idea. (But he’s already got a book. So this one goes to Jim.)

March 06, 09h

What’s wrong with “145.csszengarden.com” ?

I’ve never tried anything with Movable Type and dynamic subdomains, but it shouldn’t be that hard to implement, and I don’t see anything wrong with using that space for something useful.

March 06, 10h

Something you might want to try instead of mod_rewrite is an .htaccess file with Options +MultiViews.
This is what I use, and then I just rip apart (regex) the URL and use that to determine what data gets retrieved where.
Hope this helps your quest to kill the kruft.

John Y. says:
March 06, 10h

Interestingly, ESPN has this same problem: http://sports.espn.go.com/mlb/ works, but http://sports.espn.go.com/mlb does not. It can be pretty frustrating, really.

keesj says:
March 06, 12h

“— Due to some of the current setup (style sheet importing script, next/previous design list links, and historical links to existing designs all being variables here) I have to hardwire a certain amount of the linkage to various designs. An example: on the current page, if you hit the ‘Next Designs’ link, does that take you to ‘/&page=$nextPage’ or ‘/###/&page=$nextPage’? Depends on the URI scheme. —”

Try using the base-tag. (http://www.w3.org/TR/REC-html40/struct/links.html#h-12.4)

47
Jim says:
March 07, 03h

> If you add ” [L]” to the end of your mod_rewrite rule, the URI will redirect silently

No, the URI redirects silently by default, unless you supply a flag like [R].

The [L] says that if the rule is triggered, no more processing should be done (i.e. it’s the Last rule). It’s to avoid processing rules that won’t/shouldn’t match.

March 07, 03h

I’m not quite sure if I like the new IRI-design (yes, it’s an IRI, not a URI).

I think /design/number would suit better than /number, since a hierarchy-IRI design should have more than one node.

The issue with the page-number is quite tricky. I don’t look at /design/number/page/number OR /design/number/number as a *good* IRI.
I have no good answer for this one, unfortunately.

By the way, a static IRI shouldn’t be followed by a trailing slash at the end.
I look at this as a good guideline:
URIs used to access HTML content for static web pages should have no .html extension and no trailing slash.

Oh, and the book looks great. :)

March 07, 04h

It’s extremely cool you actually listen to me ;), and I’m sure this enhances the usability of The Garden. But why not rewrite the whole thing instead of redirecting to the old URIs? I’d rather redirect the old crufty URIs to the new ones!

50
That Random Guy says:
March 07, 05h

Quote: “an anonymous helper monkey” *grins*. Like your terminology Dave!

March 07, 11h

If you add ” [L]” to the end of your mod_rewrite rule, the URI will redirect silently, therefore still showing the URI the user typed in in the address bar while also showing the redirected page in the browser window. This method leaves the URI as the short, clean one they typed.

RewriteEngine On
RewriteRule ^([0-9]{3,})(/?)(/&page=([0-9]*))?$ index.php?cssfile=/$1/$1.css&page=$4 [L]

March 08, 03h

Working on a neat CMS and am also banging my head against mod_rewrite…

We’ve got it working so that you can have a site in a subfolder and have it redirect to the appropriate place with the following rules:

RewriteBase /devLocation/
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ /devLocation/index.php?q=$1 [L,QSA]

Our CMS grabs an alias assigned to each page to rewrite the URL nicely. However, we’d like to be able to use the site hierarchy to have the same alias in arbitrarily deep folders so that we can use the same alias without having to absolutely reference all the CSS files and whatnot. The goal would be to have the following work:

mysite.com/news/new.html
mysite.com/products/misc/new.html

In the above example, both pages have their alias set to “new”, but they reside in different locations on the document tree.

I’d like to be able to use a template header that calls the link to the CSS relatively (like “assets/site/style.css”). This would allow me to stage the site in a “non-public” directory off the back end of a site, and then simply move it up and theoretically change the rewriteBase to reflect the new location.

Is this doable at all?

Mrten says:
March 08, 05h

regarding comment 32: you can get to the parameters-part of the request, it’s just a bit tricky.

there is this pretty badly documented variable %{THE_REQUEST} which I’ve been using for a couple of years now to redirect all requests and parameters to one specific php-script like this:

RewriteCond %{THE_REQUEST} ^.*.html\?(.*)\ HTTP/.*$
RewriteRule ^(.*).html$ page.php?request=$1&%1 [L]

i’m sure dave can adapt this example to his needs :)

Mike says:
March 08, 06h

You can grab the querystring using mod-rewrite (the bit after the question mark):

RewriteCond %{QUERY_STRING} ^(.*)$
RewriteRule ^(.*)$ index.php?page=$1&%1 [L]

This grabs the querystring, stuffs it into a variable (%1), falls through to the next rule, rewrites everything to index.php, and stuffs the requested link into the page variable ($1), then attaches the querystring variable at the end.
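To make the two kinds of back-reference concrete, here is a loose Python rendering of that pair of directives. The RewriteCond acts as a guard on the RewriteRule directly after it; its captures are the %N references, while the rule's own captures are the $N references. The function name `apply_cond_and_rule` is hypothetical, and Python's `re` is only a stand-in for Apache's matcher.

```python
import re


def apply_cond_and_rule(path, query_string):
    """Model the RewriteCond/RewriteRule pair above for one request."""
    # RewriteCond %{QUERY_STRING} ^(.*)$   -> group 1 here plays the role of %1
    cond = re.match(r"^(.*)$", query_string)
    # RewriteRule ^(.*)$ ...               -> group 1 here plays the role of $1
    rule = re.match(r"^(.*)$", path)
    if cond is None or rule is None:
        # The rule only fires when both the pattern and the condition match.
        return None
    return f"index.php?page={rule.group(1)}&{cond.group(1)}"


print(apply_cond_and_rule("145/", "sort=asc"))
```

Because both patterns accept any string, this particular pair fires on every request, including ones with an empty query string.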

March 09, 11h

Ryan, just add a slash at the beginning of the link: /assets/site/style.css. This will effectively look for http://yourdomain/assets/site/style.css. This also causes less overhead than using full URIs.

Jon Berg says:
March 13, 02h

It seems that there is a big market for custom mod_rewrite. I have even gotten emails from people that I don’t know asking me to make them regular expression redirects. And I am no guru.

February 12, 06h

dave, i know this is an old entry, but…have you now reverted back to not using rewrites? the URLs from the book seem to work, but any other number brings up a “forbidden”.

February 18, 21h

Wow, a definitely nice article, and a lot of useful replies!

I too certainly like Apache for this wonderful tool that it has, but I’m unsure whether mod_rewrite can cause an increase in load on extremely loaded servers?