Sunday, June 22, 2008

Dynamic URL Rewrites

Dynamic pages can be roadblocks to high search engine positioning, especially those whose URLs contain "?" or "&". On a dynamic site, variables are passed in the URL and the page is generated on the fly, often from information stored in a database, as is the case with many e-commerce sites. Normal .html pages are static: they are hard-coded, their information does not change unless someone edits them, and there are no "?" or "&" characters in the URL.


URL rewriting is a programming technique that makes the URL shown in the browser's location or address bar more search engine friendly by removing the question mark (?) and ampersand (&) from it. This lets search engines index the page without variables or session IDs embedded in the URL.

Pages with dynamic URLs are indexed by several engines, notably Google and AltaVista, even though AltaVista publicly claims that its spider does not crawl dynamic URLs. To a spider, a "?" represents a sea of endless possibilities: some pages can automatically generate a potentially massive number of URLs, trapping the spider in a virtually infinite loop.

As a general rule, search engines will not properly index documents that:

  • Contain a "?" or "&"

  • End in one of the following file extensions: .cfm, .asp, .shtml, .php, .stm, .jsp, .cgi, .pl

  • Could potentially generate a large number of URLs.
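The checklist above can be turned into a quick URL audit. This is a minimal Python sketch, assuming the extension list given in the post; `indexing_risks` and `DYNAMIC_EXTENSIONS` are hypothetical names, and the third rule is left out because it depends on the site as a whole rather than on a single URL.

```python
from urllib.parse import urlsplit

# Extensions the post lists as ones search engines tended to avoid (circa 2008).
DYNAMIC_EXTENSIONS = (".cfm", ".asp", ".shtml", ".php", ".stm", ".jsp", ".cgi", ".pl")


def indexing_risks(url: str) -> list[str]:
    """Return which of the post's checklist rules this URL trips.

    The third rule (a page that can generate many URLs) cannot be judged
    from one URL, so this hypothetical helper does not check it.
    """
    path = urlsplit(url).path.lower()
    risks = []
    if "?" in url or "&" in url:
        risks.append("contains '?' or '&'")
    if path.endswith(DYNAMIC_EXTENSIONS):
        risks.append("dynamic file extension")
    return risks


print(indexing_risks("http://www.yourdomain.com/shop.php?cat_id=1&item_id=2"))
print(indexing_risks("http://www.yourdomain.com/shop/1/2.html"))
```

The first call flags both rules; the second, already rewritten URL flags none.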

Where a page has to stay dynamic, it is still possible to clean up its query string. URL rewriting generally replaces the ‘?’, ‘&’ and ‘+’ symbols in URLs with more user-friendly characters. Consider the following URL: http://www.yourdomain.com/shop.php?cat_id=1&item_id=2

This dynamic URL can be converted into: http://www.yourdomain.com/shoppinglist/apparels/shirts

This makes the page look static, although it is in fact dynamic. URL rewriting needs careful strategy and planning. A few tools are available for it; they are rule-based, and the best known are mod_rewrite for Apache and ISAPI Rewrite for IIS. mod_rewrite can solve all sorts of URL-based problems and provides the functions you need to manipulate URLs, but its complex rule-based matching engine makes it hard to learn. Once you understand the basic idea, however, you can master all of its features.

ISAPI Rewrite is a powerful URL manipulation engine based on regular expressions. It works much like Apache's mod_rewrite, but it is designed specifically for Microsoft Internet Information Services (IIS). ISAPI Rewrite is an ISAPI filter written in pure C/C++, so it is extremely fast, and it gives you the freedom to go beyond standard URL schemes and develop your own.
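Both tools work by matching the requested URL against an ordered list of regular-expression rules. The Python sketch below imitates that idea for the post's shop example; it is not mod_rewrite or ISAPI Rewrite syntax, and the rule table and the `rewrite` function are hypothetical. In Apache the same job would be done with mod_rewrite's `RewriteRule` directive.

```python
import re

# Hypothetical rule table: (pattern matched against the public path, internal target).
# Rules are tried in order and the first match wins.
REWRITE_RULES = [
    (re.compile(r"^/shop/(\d+)/(\d+)\.html$"), r"/shop.php?cat_id=\1&item_id=\2"),
    (re.compile(r"^/shop/(\d+)/?$"), r"/shop.php?cat_id=\1"),
]


def rewrite(path: str) -> str:
    """Map a clean, public URL path onto the internal dynamic URL.

    Paths that match no rule are returned unchanged.
    """
    for pattern, target in REWRITE_RULES:
        match = pattern.match(path)
        if match:
            # Fill \1, \2 ... in the target with the captured groups.
            return match.expand(target)
    return path


print(rewrite("/shop/1/2.html"))  # /shop.php?cat_id=1&item_id=2
print(rewrite("/about.html"))     # /about.html
```

Stopping at the first match loosely mirrors a `RewriteRule` carrying the `[L]` (last) flag, and letting unmatched paths pass through keeps ordinary static files working.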


There are two types of URL rewrite: simple and advanced. Both make URLs search engine friendly, but the advanced rewrite goes further by also putting keywords into the URL.


URL Without Rewriting

http://www.yourdomain.com/shop.php?cat_id=1&item_id=2

The above URL tells the application that the returned information should come from the category with id equal to 1 and the item with id equal to 2. This works fine for the system because it understands the variables. Many search engines, however, do not understand this form of URL.


Simple URL Rewrite

http://www.yourdomain.com/shop/1/2.html

The simple URL rewrite takes the URL and modifies it so that it appears without the question mark (?) and ampersand (&). This enables search engines to index all of your pages, but it still falls short in one important area: the URL carries no keywords.
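A simple rewrite has two halves: the web server maps the clean URL back onto the script, and the site's own pages must link to the clean form. Here is a minimal Python sketch of the second half for the post's example, assuming the `cat_id` and `item_id` parameter names and numeric ids; `to_simple_url` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit


def to_simple_url(dynamic_url: str) -> str:
    """Turn /shop.php?cat_id=X&item_id=Y into /shop/X/Y.html.

    Anything that does not fit that exact shape is returned unchanged.
    """
    parts = urlsplit(dynamic_url)
    params = parse_qs(parts.query)
    cat_ids = params.get("cat_id", [])
    item_ids = params.get("item_id", [])
    if parts.path != "/shop.php" or not cat_ids or not item_ids:
        return dynamic_url
    cat_id, item_id = cat_ids[0], item_ids[0]
    # Only rewrite numeric ids, so a value like "a/b" cannot add path segments.
    if not (cat_id.isdigit() and item_id.isdigit()):
        return dynamic_url
    new_path = f"/shop/{cat_id}/{item_id}.html"
    # Keep scheme and host; drop the query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, new_path, "", ""))


print(to_simple_url("http://www.yourdomain.com/shop.php?cat_id=1&item_id=2"))
# http://www.yourdomain.com/shop/1/2.html
```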


Advanced URL Rewrite

http://www.yourdomain.com/oranges/mandarin_oranges.html

The advanced URL rewrite lets your URLs include your keywords. The URL is another place where search engines look for important information about your pages, so including keywords in it can help lift your page toward the top of the search engine result pages.
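An advanced rewrite needs a way to turn names stored in the database into keyword URLs and back again. This is a minimal sketch, assuming a hypothetical in-memory `CATALOG` that ties the ids from the earlier shop URL to the names in the oranges example. In practice the names would come from the shop's database, and the underscore-joined slug style simply follows the example URL.

```python
import re
from typing import Optional, Tuple

# Hypothetical catalog: (cat_id, item_id) -> (category name, item name).
CATALOG = {
    (1, 2): ("Oranges", "Mandarin Oranges"),
}


def slugify(name: str) -> str:
    """Lower-case a name and join its words with underscores."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return "_".join(words)


def keyword_url(cat_id: int, item_id: int) -> str:
    """Build a keyword-rich URL such as /oranges/mandarin_oranges.html."""
    category, item = CATALOG[(cat_id, item_id)]
    return f"/{slugify(category)}/{slugify(item)}.html"


# Reverse index so the server can map a keyword URL back to the ids.
URL_TO_IDS = {keyword_url(c, i): (c, i) for (c, i) in CATALOG}


def resolve(path: str) -> Optional[Tuple[int, int]]:
    """Return (cat_id, item_id) for a keyword URL, or None if unknown."""
    return URL_TO_IDS.get(path)


print(keyword_url(1, 2))                          # /oranges/mandarin_oranges.html
print(resolve("/oranges/mandarin_oranges.html"))  # (1, 2)
```

Because the reverse lookup is a plain dictionary, two items whose names produce the same slug would collide, so a real site would need to keep slugs unique.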

URLs can also be cleaned server-side using a web server extension that implements content negotiation, such as mod_negotiation for Apache or PageXchanger for IIS. However, installing a filter that does content negotiation is only half the job. The URLs in your HTML and other files must have their file extensions removed to realize the abstraction and security benefits of content negotiation. Removing file extensions in source code is easy enough with search and replace in a web editor such as Dreamweaver MX or HomeSite, and tools such as w3compiler are being developed to improve page preparation for negotiation and transmission. One word of reassurance: don't jump to the conclusion that your files will no longer be named page.html. On your server, the precious extensions are safe and sound; content negotiation only means that the extensions disappear from source code, markup, and typed URLs.
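Both halves of the job can be sketched in Python, assuming a fixed extension preference order. `negotiate` stands in for the server-side step: given an extensionless request, it picks a matching file. Real modules such as Apache's mod_negotiation (enabled with `Options MultiViews`) also weigh the client's `Accept` headers, which this sketch ignores. `strip_link_extensions` is the search-and-replace step on the markup. All names and the extension list are hypothetical.

```python
import re
from typing import Optional

# Hypothetical preference order used when several variants exist.
PREFERRED_EXTENSIONS = [".html", ".php", ".asp"]


def negotiate(request_path: str, files_on_server: set[str]) -> Optional[str]:
    """Pick the on-disk file that an extensionless request should serve."""
    for ext in PREFERRED_EXTENSIONS:
        candidate = request_path + ext
        if candidate in files_on_server:
            return candidate
    return None


# Double-quoted href values ending in .htm/.html/.php/.asp, optionally
# followed by a query string or fragment.
HREF_EXT = re.compile(r'(href="[^"?#]+?)\.(?:html?|php|asp)(?=["?#])')


def strip_link_extensions(html: str) -> str:
    """Search-and-replace step: drop file extensions from href attributes."""
    return HREF_EXT.sub(r"\1", html)


files = {"/products.html", "/contact.php"}
print(negotiate("/products", files))  # /products.html
print(negotiate("/contact", files))   # /contact.php
print(strip_link_extensions('<a href="/products.html">Products</a>'))
# <a href="/products">Products</a>
```

The regular expression only touches double-quoted `href` values, so it is a starting point for a site-wide search and replace rather than a full HTML parser.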

To avoid complications, consider creating static pages whenever possible, using the database to update the pages rather than to generate them on the fly.
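Here is a minimal sketch of that approach, assuming a hypothetical `items(cat_id, item_id, name)` table in SQLite. The database is queried once to render plain HTML under clean paths, and the script is rerun whenever the data changes, so visitors and spiders only ever see static pages.

```python
import sqlite3
from html import escape
from pathlib import Path


def build_static_pages(conn: sqlite3.Connection) -> dict[str, str]:
    """Render one HTML page per item row, keyed by its clean relative path."""
    pages = {}
    rows = conn.execute("SELECT cat_id, item_id, name FROM items ORDER BY cat_id, item_id")
    for cat_id, item_id, name in rows:
        title = escape(name)
        pages[f"shop/{cat_id}/{item_id}.html"] = (
            f"<html><head><title>{title}</title></head>"
            f"<body><h1>{title}</h1></body></html>"
        )
    return pages


def write_pages(pages: dict[str, str], out_dir: Path) -> None:
    """Write the rendered pages to disk, creating directories as needed."""
    for rel_path, html in pages.items():
        target = out_dir / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(html, encoding="utf-8")


# Demo with an in-memory database using the hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (cat_id INTEGER, item_id INTEGER, name TEXT)")
conn.execute("INSERT INTO items VALUES (1, 2, 'Mandarin Oranges')")
pages = build_static_pages(conn)
print(sorted(pages))  # ['shop/1/2.html']
# write_pages(pages, Path("static_site"))  # uncomment to write the files
```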
