About a month ago I was updating sfadb.com and browsing around the site and happened to click on Richard Cowper’s page, http://www.sfadb.com/Richard_Cowper, and got a ‘file not found’ error message. Huh? Also for the following page in the alphabetical sequence of Names, David Cowperthwaite. Aside from those two, everything seemed fine.
At the time I rebuilt and uploaded the Cowper and Cowperthwaite pages to make sure they hadn’t been overlooked somehow, but still got the error messages. Didn’t have time to investigate further.
Tonight I updated several pages on the site with award winners announced this past month, which went pretty quickly, since updating records already compiled does not involve a lot of text entry, verification of book and story titles, and so on. It's just tagging certain records as winners, then reassembling and rebuilding the affected Name and Listing pages. Updated five awards in under an hour, though that did include tracking down English-language titles for finalists in the foreign novel category of the Deutsche Phantastik Preis…
And then returned to this bug. Did it affect any other pages than those two? I stepped through the entire set of Co Name pages and found no other problems. So, then what?
Well, the likely suspect was the ‘htaccess’ file that manages redirects, i.e., that converts visible URLs, like “http://www.sfadb.com/Richard_Cowper”, to the actual URLs of files on the site. When I set the site up in early 2012, I spent a lot of time figuring out how to implement simple, clean URLs, like the Cowper link, without the “.php” extension, even though all those files are actually located within a /db subdirectory off the main sfadb.com domain. (Thus, the actual Richard Cowper page is http://www.sfadb.com/db/Richard_Cowper.php.)
This involves an ‘htaccess’ file — a text file with no extension — that instructs the server how to process requests for URLs. It consists of a series of ‘rewrite’ conditions and one key instruction, which is this:
RewriteRule (.*) /db/$1 [L]
This says: take any request like sfadb.com/filename, replace it with sfadb.com/db/filename, then process that URL request and supply that page.
This is preceded in the htaccess file by a long list of *exceptions*, individual file names or file name rules, that are exempt from this rewrite instruction. For example,
RewriteCond %{REQUEST_URI} !/graphics
In this case the ! means ‘not’, and the instruction means that any URL request such as www.sfadb.com/graphics/filename should *not* be mapped to sfadb.com/db/graphics/filename per the rewrite rule, because the graphics subdirectory is a top-level subdirectory at the same level as /db.
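For fellow geeks, here is a rough Python sketch of how the exceptions and the rewrite rule work together. It is a simplified model and not how Apache itself is implemented: the EXCEPTIONS list, the rewrite function, and the logo.png file name are made up for illustration, only the /graphics pattern comes from the real file, and the “.php” handling is left out.

```python
import re

# Hypothetical stand-in for the exception list; only "/graphics" is from the real file.
EXCEPTIONS = [r"/graphics"]

def rewrite(uri: str) -> str:
    """Model 'RewriteCond %{REQUEST_URI} !pattern' lines followed by
    'RewriteRule (.*) /db/$1 [L]' (simplified; assumes a leading slash)."""
    # A negated condition passes only if its pattern is NOT found in the URI,
    # so a single hit anywhere in the string blocks the rewrite.
    if any(re.search(pattern, uri) for pattern in EXCEPTIONS):
        return uri
    # No exception applied: serve the request from the /db subdirectory.
    return "/db/" + uri.lstrip("/")

print(rewrite("/Richard_Cowper"))     # rewritten into /db
print(rewrite("/graphics/logo.png"))  # left alone (logo.png is a made-up name)
```

The thing to notice is re.search, which looks for the pattern anywhere in the string rather than only at the beginning.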
So… was there an exception to the rewrite rule that somehow affected Richard Cowper…?
Well, yes. It was this:
RewriteCond %{REQUEST_FILENAME} !wp(.*)
It was designed to exempt from the rewrite rule the WordPress-installed files in the top-level directory, which all begin with ‘wp’.
I fixed the Cowper problem by changing this rule to:
RewriteCond %{REQUEST_FILENAME} !wp-(.*)
since all of the WordPress-installed files are named things like wp-activate.php, wp-config.php, and so on. They all have that hyphen.
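The original rule found the ‘wp’ in ‘Cowper’ and ‘Cowperthwaite’, so it exempted those two pages from the rewrite and the server went looking for them in the top-level directory, where they don’t exist. With the hyphen added, the rule still exempts the wp- files but no longer catches the Cowper and Cowperthwaite files. So Richard Cowper’s pages are visible again now.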
I think I didn’t anticipate any problem with the original rule because… you’d think !wp(.*) would apply only to file names *beginning* with ‘wp’.
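Apparently not. The pattern is a regular expression, and with nothing anchoring it to the start of the name, it matches ‘wp’ anywhere in the string.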
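A quick way to see this is to try the two patterns in Python, whose re.search also matches anywhere in a string. This is only a sketch: Python's regex engine stands in for Apache's, the Cowperthwaite file name is my guess at the pattern, and the ANCHORED pattern is a hypothetical alternative I have not tried on the site.

```python
import re

OLD = r"wp(.*)"         # pattern from the original condition
NEW = r"wp-(.*)"        # pattern after the fix
ANCHORED = r"(^|/)wp-"  # hypothetical: 'wp-' only at the start of a name or path segment

# David_Cowperthwaite.php is an assumed file name, following the Richard_Cowper.php pattern.
names = ["Richard_Cowper.php", "David_Cowperthwaite.php",
         "wp-config.php", "wp-activate.php"]

def exempted(pattern: str, name: str) -> bool:
    """True if the pattern is found anywhere in the name (unanchored search)."""
    return re.search(pattern, name) is not None

# For each file: (caught by old pattern, caught by new pattern)
results = {name: (exempted(OLD, name), exempted(NEW, name)) for name in names}

for name, (old_hit, new_hit) in results.items():
    print(f"{name:26} old={old_hit} new={new_hit}")
```

The old pattern catches all four names, while the new one catches only the two wp- files. An anchored pattern would be stricter still, though since REQUEST_FILENAME can be a full file system path, a plain ^wp would probably not be enough on its own.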
It took less time to fix this problem than it’s taken me to write it up. I do so because it’s another example of my philosophy about computer science and programming: you can *always* figure out and solve any problem. And fellow geeks might be interested.