Wednesday, June 25, 2008

Advanced SEO - Dynamic Page Optimization

Dynamic page optimization:

As the Internet user base grew, website owners made their sites more and more attractive and user friendly, often by generating pages dynamically. The most important thing to keep in mind is that a dynamic web page is not a separate file sitting on the server; it is created on the fly when a user performs some action.

Let's see what exactly a dynamic site is! With a normal HTML website, the content of a static page doesn't change unless you actually code the changes into your HTML file: open the file, edit the content, save the file, and upload it to the server. All search engine spiders can index static web pages. A dynamic web page, by contrast, is a template that displays specific information in response to queries. A database is connected to the website, and the response is generated from that database. These sites are easy for the webmaster to update: because the pages are driven by the database, a change in the database is reflected across all pages. This is much simpler than with normal HTML pages, where you need to change the desired content in each and every page.

For the marketer, creating new pages or updating existing ones means either adjusting information in the database or, when it comes to the site's visual presentation, adjusting one or a few template pages. Naturally, site owners want to build sites this way, but the problem started when these beautiful, content-rich sites failed to rank well in search engines.

The problem lies in this very advantage. As noted earlier, a dynamic page is generated in response to a query. Users send queries through a site's search function, or the queries are already coded into links on the page. A search engine spider, however, doesn't know how to use your search function, or what questions to ask. Dynamic scripts often need certain information before they can return the page content: cookie data, a session ID, or a query string are common requirements. Spiders usually stop indexing a dynamic site because they can't answer the question.

Search engines care about content, not the flashy elements in your website. Search engine crawlers are programmed to read text only: they ignore flashy elements such as pictures, frames, and video, treat them as empty space, and move on. Some search engines may not even be able to locate a dynamic page at all. But if we make websites search-engine friendly only, and not user friendly, we will most likely end up losing visitors. This presents a big problem for marketers who have done very well with their rankings using static pages but who wish to switch to a dynamic site.

This is why SEOs came up with advanced techniques to optimize dynamic pages. Here are a few methods you can use.

Methods to make search engine spiders index dynamic pages:

1. Use of software – Various tools are available on the market that will remove the "?" in the query string and replace it with "/", thereby allowing search engine spiders to index the dynamic content.

Example -
http://www.my-online-store.com/books.asp?id=1190 will change to
http://www.my-online-store.com/books/1190.

The latter, being a static-looking URL, can easily be indexed by search engine spiders.
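Under the hood, such tools usually generate an Apache mod_rewrite rule. Here is a minimal sketch of what the resulting configuration might look like (the path and parameter names are taken from the example above; this assumes Apache with mod_rewrite enabled):

```apache
# Illustrative .htaccess sketch: serve the dynamic script when the
# static-looking URL is requested. The URL the visitor (and the
# spider) sees never contains a "?".
RewriteEngine on
RewriteRule ^books/([0-9]+)$ /books.asp?id=$1 [L]
```

A request for /books/1190 is rewritten internally to /books.asp?id=1190, so the spider only ever encounters the clean URL.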

2. Use of CGI/Perl scripts - One of the easiest ways to get your dynamic site indexed by search engines is to use CGI/Perl scripts. PATH_INFO and SCRIPT_NAME are server variables through which a dynamic application receives the requested URL path (the query string arrives separately). To fix this problem, you'll need to write a script that pulls all the information before the query string and assigns the rest of the information to a variable. You can then use this variable in your URL address.

Example - http://www.my-online-store.com/books.asp?id=1190

When you use CGI/Perl scripts, the query part of the dynamic URL is assigned to a variable.

So, in the above example, "?id=1190" is assigned to a variable, say "A". The dynamic URL http://www.my-online-store.com/books.asp?id=1190
will change to http://www.my-online-store.com/books/A through the CGI/Perl script, and this URL can easily be indexed by search engines.

3. Re-configuring your web servers -

(i) Apache Server - Apache has a rewrite module (mod_rewrite) that enables you to turn URLs containing query strings into URLs that search engines can index. This module, however, isn't enabled in the Apache software by default, so you need to check with your web hosting company about installation.

(ii) ColdFusion - You'll need to reconfigure ColdFusion on your server so that the "?" in a query string is replaced with a "/" and the parameter values are passed as part of the URL path.

4. Creation of a Static Page linked to an array of dynamic Pages -

This approach is very effective, especially if you own a small online store selling a few products. Just create a static page linking to all your dynamic pages and optimize this static page for search engine rankings. Include a link title for each product category, and place an appropriate "alt" tag on each product image along with a product description containing popular keywords relevant to your business (you can conduct keyword research for your site through http://www.wordtracker.com ). Submit this static page along with all the dynamic pages to the various search engines, conforming to their submission guidelines.

Technical methods for optimizing dynamic pages on any site

There are a few technical aspects that need to be considered when optimizing dynamic websites.

Let's start with .htaccess and mod_rewrite. These are the two concepts you will have to master to understand how to cloak search-engine-unfriendly URLs. Keep in mind that these two components are implemented on the Apache server; for the IIS server, equivalents are available, as we will see later in this article.

So, starting from the basics:

.htaccess File:

An .htaccess file is just a plain text file, with one directive per line, like this:
RewriteEngine on

The "RewriteEngine" portion is the directive, and "on" is a parameter that tells "RewriteEngine" what to do.


The .htaccess file usually lives in the root directory of a site and allows each site to uniquely configure how Apache delivers its content. Its directives apply to the entire site, but subdirectories can contain their own .htaccess files, each applying to that subdirectory and all of its subdirectories, and so on down the tree. You could have a different .htaccess in every subdirectory and make each one behave a little differently.
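As a quick illustration of the format (the file names and directives here are just examples, not taken from any particular site), a small .htaccess might look like this:

```apache
# One directive per line; applies to this directory and everything below it.
Options +FollowSymLinks
RewriteEngine on

# Serve a custom page for "not found" errors
ErrorDocument 404 /notfound.html
```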

Mod_rewrite:

Mod_rewrite is an Apache module that rewrites or redirects requested URLs. Its typical format looks like this:

Options +FollowSymLinks
RewriteEngine on
RewriteRule ^url1\.html$ url2.html [R=301,L]

Let's look at this a little more closely. The first directive instructs Apache to follow symbolic links within the site. Symbolic links are "abbreviated nicknames" for things within the site and are usually disabled by default. Since mod_rewrite relies on them, we must turn them on.

The "RewriteEngine on" directive does exactly what it says. Mod_rewrite is normally disabled by default, and this directive enables the processing of subsequent mod_rewrite directives.

In this example, we have a caret at the beginning of the pattern and a dollar sign at the end. These are regex (regular expression) special characters called anchors. The caret tells regex to begin looking for a match with the character that immediately follows it, in this case a "u". The dollar sign anchor tells regex that this is the end of the string we want to match.

In this simple example, "url1\.html" and "^url1\.html$" are interchangeable expressions and match the same string; however, "url1\.html" matches any string containing "url1.html" anywhere in the URL ("aurl1.html", for example), while "^url1\.html$" matches only a string that is exactly equal to "url1.html". In a more complex redirect, anchors (and other special regex characters) are often essential.
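The difference is easy to see side by side (hypothetical rules, for illustration only):

```apache
RewriteEngine on

# Anchored: matches ONLY a request for exactly "url1.html"
RewriteRule ^url1\.html$ url2.html [R=301,L]

# Unanchored: would also match "aurl1.html" or "url1.htmlx"
# RewriteRule url1\.html url2.html [R=301,L]
```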

Once the requested page matches the pattern, Apache replaces it with "url2.html".

In our example, we also have "[R=301,L]". These are called flags in mod_rewrite, and they're optional parameters. "R=301" instructs Apache to return a 301 status code with the delivered page; when no code is specified, as in [R,L], it defaults to 302. Unlike mod_alias, mod_rewrite can return any status code in the 300-400 range, and it requires the square brackets surrounding the flags, as in this example.


The "L" flag tells Apache that this is the last rule that it needs to process. It's not required in this simple example but, as the rules grow in complexity, it will become very useful.
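Putting these pieces together for a dynamic URL, a common pattern (sketched here with the bookstore example from earlier; mod_rewrite is assumed to be enabled) is to 301-redirect the query-string form of a URL to its clean form. Note that a RewriteRule pattern never sees the query string itself, so a RewriteCond on %{QUERY_STRING} is needed:

```apache
RewriteEngine on

# If the query string is exactly "id=<number>"...
RewriteCond %{QUERY_STRING} ^id=([0-9]+)$
# ...301-redirect /books.asp?id=1190 to /books/1190.
# %1 refers to the group captured in the RewriteCond above;
# the trailing "?" drops the old query string from the target.
RewriteRule ^books\.asp$ /books/%1? [R=301,L]
```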

The Apache docs for mod_rewrite are at http://httpd.apache.org/docs/mod/mod_rewrite.html and some examples can be found at http://httpd.apache.org/docs/misc/rewriteguide.html .


Now suppose we rename or delete url1.html and then request it again: mod_rewrite can redirect from non-existent URLs (url1.html) to existing ones. This is essentially how we cloak dynamic pages. The first URL can be the dynamic page that we want replaced by the static-looking "url2.html". This, then, is how cloaking works on the Apache server. There are other methods available, but this remains the most popular and reliable one.

IIS Server Redirects:

As long as one uses one of the mod_rewrite cousins for IIS (IISRewrite, ISAPI_Rewrite), the method is much the same for IIS as for Apache. However, where the rules are inserted depends on which software is being used (obviously not httpd.conf or .htaccess). The rule generation remains pretty much the same either way.

The most used tool in this genre is ISAPI_Rewrite. For more information, consult http://www.isapirewrite.com/ . The site offers a free download version of the software and a paid version for 69 USD.

For IIS rewrite functionality, Qwerksoft's IISRewrite remains the most popular alternative (http://www.qwerksoft.com/products/iisrewrite/). Again, a basic free download or a 99 USD purchase option exists.

However, user experience suggests that ISAPI_Rewrite outperforms the others due to its ease of configuration and a bunch of other little extras. One of the biggest benefits of ISAPI_Rewrite is that you don't have to restart IIS each time you make a change to the .ini file. In other words, once ISAPI_Rewrite is installed, you can keep the .ini file in the root folder and make changes as you go along, if necessary, without a restart.
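As a rough sketch only (the exact file name and rule syntax vary by product and version, so treat this as an assumption and check the vendor documentation), an ISAPI_Rewrite rules file might contain something like:

```apache
[ISAPI_Rewrite]

# Illustrative: map the static-looking URL to the dynamic script,
# mirroring the Apache example earlier in this article
RewriteRule /books/(\d+) /books.asp\?id=$1
```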

These products also support shared hosting, so your hosting provider can be persuaded to buy and install them. Some other products in this category are listed below:

http://www.pstruh.cz/help/urlrepl/library.htm (free ISAPI filter)

http://www.motobit.com/help/url-replacer-rewriter/iis-mod-rewrite.asp

Also, if you are using the .NET platform, this works for free:

http://msdn.microsoft.com/msdnmag/issues/02/08/HTTPFilters/