Wednesday, June 25, 2008

Technical Methods for Optimizing the Dynamic Pages of Any Site

There are a few technical aspects that need to be considered when optimizing dynamic websites.

Let's start with .htaccess and mod_rewrite. These are the two concepts you will have to master to understand how to cloak search-engine-unfriendly URLs. Keep in mind that both are implemented on the Apache server. For IIS, equivalents are available, as described later in this article.

So, starting from the basics:

.htaccess File:

An .htaccess file is just a plain text file. It has one directive per line, like this:
RewriteEngine on

The "RewriteEngine" portion is the directive and "on" is a parameter that describes what "RewriteEngine" should do.


The .htaccess file usually lives in the root directory of a site and allows each site to configure how Apache delivers its content. Its directives apply to the entire site, but a subdirectory can contain its own .htaccess file, which applies to that subdirectory and everything beneath it. You could have a different .htaccess file in every subdirectory and make each one behave a little differently.
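As a rough sketch of how this layering works, assume a site with a hypothetical /blog subdirectory and a server whose AllowOverride setting permits these directives. The two files are shown together only for comparison; the file paths in the comments and the choice of DirectoryIndex as the directive are illustrative assumptions.

```apache
# File: /.htaccess (site root), applies to the whole site
DirectoryIndex index.html

# File: /blog/.htaccess (hypothetical subdirectory), overrides the
# setting above for /blog and everything beneath it
DirectoryIndex index.php
```

With these two files in place, a request for the site root would be answered with index.html, while a request for /blog/ would be answered with index.php.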

Mod_rewrite:

Mod_rewrite is an Apache module that rewrites or redirects requested URLs. A typical rule set looks like this:

Options +FollowSymLinks
RewriteEngine on
RewriteRule ^url1\.html$ url2.html [R=301,L]

Let's look at this a little more closely. The first directive instructs Apache to follow symbolic links within the site. Symbolic links are "abbreviated nicknames" for things within the site and may be disabled by the server configuration. Since mod_rewrite requires this option when it is used in an .htaccess file, we turn it on.

The "RewriteEngine on" directive does exactly what it says. The rewrite engine is off by default, and this directive enables the processing of the mod_rewrite directives that follow.

In this example, the pattern has a caret at the beginning and a dollar sign at the end. These are regex (regular expression) special characters called anchors. The caret anchors the match to the start of the string, so the match must begin with the character that immediately follows it, in this case a "u". The dollar sign anchors the match to the end of the string.

In this simple example, "url1\.html" and "^url1\.html$" both match the string "url1.html", but they are not equivalent: "url1\.html" matches any URL containing "url1.html" anywhere (aurl1.html, for example), while "^url1\.html$" matches only a string that is exactly equal to "url1.html". In a more complex redirect, anchors (and other special regex characters) are often essential.
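The difference can be sketched with two alternative rules, which are not meant to be used together. The substitution is written with a leading slash here, on the assumption that the site lives at the document root, so that the redirect target does not depend on a RewriteBase setting.

```apache
# Alternative A, unanchored: matches any path that contains "url1.html",
# such as "aurl1.html" or "old/url1.html.bak"
RewriteRule url1\.html /url2.html [R=301,L]

# Alternative B, anchored: matches only the path "url1.html" exactly
RewriteRule ^url1\.html$ /url2.html [R=301,L]
```

Alternative A would also redirect pages that merely happen to contain the same text in their names, which is rarely what is intended.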

Once the requested URL matches the pattern, Apache replaces it with "url2.html".

In our example, we also have "[R=301,L]". These are called flags in mod_rewrite, and they are optional parameters. "R=301" instructs Apache to send the browser a redirect with a 301 (moved permanently) status code; when the code is omitted, as in [R,L], it defaults to 302 (moved temporarily). Mod_rewrite accepts a redirect status code in the 300-399 range and, unlike mod_alias, REQUIRES square brackets around the flags, as in this example.


The "L" flag tells Apache that this is the last rule that it needs to process. It's not required in this simple example but, as the rules grow in complexity, it will become very useful.
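A minimal sketch of where the "L" flag starts to matter, assuming a second, hypothetical rule that moves a whole section of the site (the names old-section and new-section are made up for illustration):

```apache
RewriteEngine on

# Rule 1: redirect one retired page. [L] stops rule processing for
# this request as soon as the rule matches.
RewriteRule ^url1\.html$ /url2.html [R=301,L]

# Rule 2 (hypothetical): only evaluated when rule 1 did not match.
RewriteRule ^old-section/(.*)$ /new-section/$1 [R=301,L]
```

Without "L" on the first rule, Apache would go on to test the rewritten URL against the rules that follow, which can produce surprising results as the rule set grows.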

The Apache docs for mod_rewrite are at http://httpd.apache.org/docs/mod/mod_rewrite.html

Some examples can be found at http://httpd.apache.org/docs/misc/rewriteguide.html


Now suppose we rename or delete url1.html and then request it again. Mod_rewrite can redirect from non-existent URLs (url1.html) to existing ones. This is essentially how we cloak dynamic pages: the first URL can be the dynamic page that we want replaced by the static-looking "url2". This, then, is how cloaking works on the Apache server. Other methods are available, but this remains the most popular and reliable.
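The example above uses an external redirect. For dynamic pages, a commonly used variant is an internal rewrite, which is a different technique from the R=301 redirect: the visitor requests a static-looking URL and Apache quietly serves the dynamic script behind it. The sketch below assumes a hypothetical script named product.php that takes a numeric "id" parameter; neither name comes from this article.

```apache
RewriteEngine on

# Internal rewrite (no R flag): a request for products/42.html is
# served by product.php?id=42, and the address shown in the browser
# does not change. "product.php" and "id" are hypothetical names.
RewriteRule ^products/([0-9]+)\.html$ product.php?id=$1 [L]
```

The parentheses capture the numeric part of the requested URL, and $1 inserts it into the query string of the script.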

IIS Server Redirects:

As long as one uses one of the mod_rewrite cousins for IIS (IIS Rewrite, ISAPI_Rewrite), the method is mostly the same for IIS as for Apache. Where the rules are placed depends on which software is being used (obviously not in httpd.conf or .htaccess), but writing the rules is much the same either way.

The most widely used product of this kind is ISAPI_Rewrite. For more information, see http://www.isapirewrite.com/ . The site offers a free download version and a paid version for 69 USD.

For IIS Rewrite functionality, Qwerksoft remains the most popular alternative (http://www.qwerksoft.com/products/iisrewrite/). Again, there is a basic free download and a 99 USD purchase option.

However, user experience suggests that ISAPI_Rewrite outperforms the others thanks to its ease of configuration and a number of small extras. One of its biggest benefits is that you don't have to restart IIS each time you change the .ini file. In other words, once ISAPI_Rewrite is installed, the .ini file can sit in the site's root folder and be changed as needed without a restart.

These products also support shared hosting, so the hosting provider may be persuaded to buy and install them. Some other products in this category are listed below:

http://www.pstruh.cz/help/urlrepl/library.htm (free ISAPI)

http://www.motobit.com/help/url-replacer-rewriter/iis-mod-rewrite.asp

Also, if you are using the .NET platform, this works for free:

http://msdn.microsoft.com/msdnmag/issues/02/08/HTTPFilters/
