Future web, dynamic or static?
It is perhaps hard to contest that the modern web is becoming increasingly dynamic in nature. There is barely a site out there that does not have at least part of its content, if not all of it, dynamically generated. All the mighty of the industry are throwing their weight behind their respective technologies. Herds of them exist today, and counting. The (relatively) recent advancement of client-based dynamic content generation, Flash and AJAX in particular, brings it all to a whole new level.
Points of entry. Many or one?
There are also so-called (in a good sense) best practices, or patterns, of good web design. For instance, the nowadays-predominant MVC (Model View Controller). Not that I am trying to say you don't already know what it is, or are unfamiliar with the term. It's more that in our ever-so-populated terminology space there are sometimes duplicates and triplicates of the same abbreviations. Though I've never heard of MVC being overloaded before, there's always someone in the crowd who definitely has. Many of the web-oriented MVC design patterns pitch the idea of the 'front controller', a universal application dispatcher and coordinator which orchestrates delivery of incoming requests to the appropriate place in the handling chain. That is, everything is siphoned through this one component. In URL terms, how may that look? Perhaps somewhat like: http://your.site.com/FrontController?bla=ssdas&bla1=... With requests mostly differing only in the query part, how relevant might that be in the context of search engines? We're about to find out.
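To make the idea concrete, here is a minimal sketch of what a front controller's dispatch logic might look like. Everything in it (the `Handler` interface, the registry, the `page` parameter) is a hypothetical convention made up for illustration, not any particular framework's API:

```java
import java.util.HashMap;
import java.util.Map;

// A toy front controller: every request funnels through dispatch(),
// which picks a handler based on a query parameter. All names here
// are hypothetical, for illustration only.
class FrontController {

    interface Handler {
        String handle(Map<String, String> params);
    }

    private final Map<String, Handler> handlers = new HashMap<>();

    void register(String page, Handler h) {
        handlers.put(page, h);
    }

    // Parse a query string like "page=MyPage&lang=en" into name/value pairs.
    static Map<String, String> parseQuery(String query) {
        Map<String, String> params = new HashMap<>();
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    // The single entry point: everything is siphoned through here.
    String dispatch(String query) {
        Map<String, String> params = parseQuery(query);
        Handler h = handlers.get(params.getOrDefault("page", "home"));
        return h == null ? "404" : h.handle(params);
    }
}
```

Note that from the URL's point of view, every page of such a site is the same path with a different query string, which is exactly what makes it interesting to a crawler.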
Dynamic URLs. Bad?
Speaking of URLs inside our dynamic pages, they are unlikely to be static either, except image URLs for the most part. The URL rewrite technique is convenient, as it gives the same power without involving forms. Yes, ASP.NET tribe, no forms and no __postback() javascripting. I'd call dynamic URLs 'inline forms': tracking the current page and the total number of pages, previous and next page numbers (if the page is long enough to be paginated), and whatever else we may stuff these URLs with. Is it bad? Perhaps not. Or at least not always. Some web servers actually rewrite URLs behind your back. Take, for example, almost any servlet/jsp container. Disable cookies in your browser and give one of its demo apps a try. Occasionally look at the page source. Very probably you will see something looking like ';jsessionid' appended to the end of some URLs. Yes, to make things faster many if not all servlet/jsp containers pre-create user sessions for you in the hope that you will eventually need them. Of course, a servlet container will attempt to use cookies first, but with many browsers now supporting cookie filters and users ever more security-paranoid over the plethora of viruses, spyware and malware, there's a good chance session cookies will be rejected every now and then. So what might all this mean for a search engine? Read on!
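The rewriting a container performs in that fallback case can be sketched roughly like this. Real containers do it through HttpServletResponse.encodeURL(); this standalone stand-in (class and method names are made up) only mimics the observable result, namely the session id spliced into the path before any query string:

```java
// A sketch of the kind of URL rewriting a servlet container performs
// when session cookies are rejected: the session id is appended to the
// URL path, ahead of the query string. Hypothetical, for illustration.
class SessionUrlRewriter {

    // Append ";jsessionid=..." to the path portion of a URL,
    // unless the client accepted the session cookie.
    static String encodeURL(String url, String sessionId, boolean cookiesAccepted) {
        if (cookiesAccepted) {
            return url; // no rewriting needed, the cookie carries the id
        }
        int q = url.indexOf('?');
        if (q < 0) {
            return url + ";jsessionid=" + sessionId;
        }
        return url.substring(0, q) + ";jsessionid=" + sessionId + url.substring(q);
    }
}
```

So the very same page can present two different URLs to two different visitors, a crawler included.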
In-browser rendered dynamic content
Client-side scripting, whether in the form of JavaScript-driven dynamic HTML or a plugin-based technology (like Flash), though very convenient for humans and pleasing in appearance, is not so convenient, perhaps not at all, and certainly not pretty to the faceted eye of a search engine. How not-pretty? We'll see.
Ranks to the flanks
Everyone seems obsessed over their site, page, blog, etc. search engine rankings. If you aren't listed, you do not exist. You have got to be in the top 10. Myth or fact: people (almost) never look past the top 10, the (first) page of search results? Not to mention that some search engines' result pages contain more than 10 entries on the first page ;). Search engine optimization services of all shapes, forms and colors flock cyberspace. One can spit and it will likely land on some SEO. They might be very much like hair loss remedies, sometimes.
If Search Engines were humans
So how do the search engines of the present day cope with all of that? Hopefully not the way The Inquisition did. I claim no expertise in search engine technology; in fact I know perhaps little more or little less about it than your ordinary Joe the programmer. I'm only rambling over what is published on the Net. A cursory examination of the webmaster guidelines published by some mainstream search engines brings up a number of things an S.E.-wary webmaster may want to watch for. Here's a short (perhaps incomplete; feel free to add) summary of them:
- Every page should be accessible through a static link
- Some sort of word of caution regarding dynamic URLs, i.e. try to be able to render (navigate through) your pages without them
- Content produced by means of in-browser scripting, i.e. JavaScript-ed DHTML (there goes your AJAX), or plugin-based content (followed by Flash, cheers), is unlikely to be considered
- A reiteration of the word of caution regarding dynamic URLs, usually more specific this time. To a webmaster that may mean that some of the URL query parameters you might be willing to use, such as language codes, page numbers, article numbers, view modes, etc., may hinder the crawler's crawling abilities
- Multi-sourced pages, i.e. framesets, are tough for crawlers. One's guess might be that IFRAMEs are perhaps just as tough.
- Google goes further down the path of dynamic URLs: they seemingly ignore URLs which contain '&id=':
...Don't use "&id=" as a parameter in your URLs, as we don't include these pages in our index.
Read more at: www.google.com/webmasters/guidelines.html
Let's take a closer look at what these guidelines may actually mean with respect to popular web development techniques.
Single application entry point (AKA Front Controller)
One of the ways of using a front controller is to have query parameters steer it, for example, in URL terms:
http://your.site.com/FrontController?page='MyPage'&lang=en
Renders 'MyPage' in English.
Sometimes, perhaps not without the search engine factor involved, a front controller may draw its input from the remaining URL path, found beyond its context path, i.e. what follows it in the URL:
http://your.site.com/FrontController/page/MyPage?lang=en
Still, even with this approach some degree of query parameterization may be desired, as above. It becomes more problematic to encode parameters as fragments of the URL path as the number of parameters grows, a contributing factor being that query parameters are by nature name/value pairs, something URL path fragments are not. It may quickly grow out of hand and become plain ugly, to say the least. But how much does this matter in connection with the webmaster guidelines listed above? Let's see:
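The path-based approach forces you to invent a pairing convention of your own; a common one is alternating /name/value segments. A minimal sketch, assuming that convention (class and method names here are made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

// A sketch of parsing front-controller parameters out of the URL path
// instead of the query string. Path fragments carry no '=' pairing, so
// a convention must be imposed; here: alternating /name/value segments.
// The convention and all names are hypothetical.
class PathParams {

    // "/page/MyPage/lang/en" -> {page=MyPage, lang=en}
    static Map<String, String> parse(String pathInfo) {
        Map<String, String> params = new HashMap<>();
        String[] parts = pathInfo.split("/");
        // parts[0] is the empty string before the leading '/'
        for (int i = 1; i + 1 < parts.length; i += 2) {
            params.put(parts[i], parts[i + 1]);
        }
        return params;
    }
}
```

Notice how every parameter now costs two path segments, which is exactly why this scheme gets unwieldy as the parameter count grows.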
- .... Rrriight. Could be a bit tricky, but certainly doable. Nothing is not-doable. We can always go with the second approach in our front controller and default whatever additional parameters we might have to some meaningful values, like setting the page language to 'en' in the above example. That will work. Unless we want the search engine to index our French version of a page as well. And maybe our Spanish one too... On to the next point!
- .... This is pretty much along the same lines as the first point with regard to a front controller. The remaining points can either be seen through the same lens (4) or are not applicable to a front controller.
Dynamic URLs, URL re-write, form-based navigation(ASP.NET)
It might be a warm welcome to search engine hell for the ASP.NET folks. It looks like they have to either avoid using some of the goodies of ASP.NET (which are, unfortunately, based on form navigation) in favor of more traditional URL encoding, or probably forget about search engines altogether. Why is that? Mainly because of:
- I am not 100% sure of that, but I have a feeling that crawlers won't submit your forms. Secondly...
- Number 3 above. Yes, to differentiate between different ASP.NET events the form must be submitted in different ways, which is done by javascripting the form submission. Bummer.
It's perhaps not as gloomy for URL rewriting and other dynamic forms of URLs as it appears, because it seems that search engines will honor them for the most part, with the noted exceptions. At least it is perhaps not as dark as it is for producers of in-browser generated content...
The Flash of AJAX
Perhaps the AJAX and Flash content-producing folk are in the deepest trouble of all regarding the guidelines. They are outright denied any indexing of their content, at least until search engines begin executing javascript and Flash-ing as they crawl. By that time, perhaps, that ASP.NET hell will have frozen over ;). However, AJAX people are probably more or less OK, as long as there is at least some HTML left that crawlers can see without actually running any javascript. That is likely often the case, as AJAX apps tend to be user-session centric, like web-based e-mail, all sorts of online planners, etc., which require user personification prior to their usage. But if the application is entirely in Flash, and HTML is only used to bootstrap the Flash player, there is perhaps little if anything that might get you noticed by the crawler. RSS feeds maybe, who knows.
If humans were search engines
One theme also sounds throughout the webmaster guidelines. That is: develop your site around your users, not search engines. If only it were ever so simple...
References
All trademarks which may be mentioned in this article for the purpose of discussion are the property of their respective owners.