Peter Marklund

Fri Aug 18 2006 05:38:00 GMT+0000 (Coordinated Universal Time)

Rails Recipe: HTML Validation

In this howto I'll show a simple approach to HTML validation that I use in my current Rails application. For me, HTML validation is a way to achieve wide browser compatibility and to do a baseline check for correct rendering and a broken UI. By putting ampersands and less-than signs in my test fixtures, I can also use HTML validation tests to check that I haven't forgotten any HTML escaping in my templates, i.e. that I haven't forgotten to use the following constructs:

<%=h some_variable %>
<%= link_to h(some_variable) ... %>
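
To see why those fixture characters matter, here is a minimal sketch of what the h helper does to them. It assumes that h behaves like ERB::Util.html_escape from Ruby's standard library (Rails gets h from that module); the fixture value is made up for illustration.

```ruby
require 'erb'

# Hypothetical fixture value containing the characters that break markup.
company_name = 'Johnson & Johnson <Sweden>'

# ERB::Util.html_escape is the method behind the h helper.
escaped = ERB::Util.html_escape(company_name)

puts escaped
```

This prints Johnson &amp;amp; Johnson &amp;lt;Sweden&amp;gt; as literal text, i.e. the ampersand and angle brackets come out as entities. Without the escaping, the raw ampersand and less-than sign would end up in the page and both validators would complain, which is exactly the signal the fixtures are meant to produce.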

Two common tools for HTML validation are the W3C Validator and Tidy, and since I've found them to be complementary I've decided to use both. Tidy warns about empty tags, which the W3C validator doesn't. On the other hand, Tidy sometimes misses obvious errors such as missing paragraph end tags.

The approach I came up with was to do the validation in an after_filter, but only when tests are run, so I added the following to my test_helper.rb:

# HTML validate the response of all requests
require File.join(File.dirname(__FILE__), '..', 'app', 'controllers', 'application')
class ApplicationController
  after_filter :assert_valid_markup

  def status_code
    @response.headers['Status'][0,3].to_i
  end

  def assert_valid_markup
    return if RAILS_ENV != 'test'
    return if !(status_code == 200 &&
      @response.headers['Content-Type'] =~ /text\/html/i && @response.body =~ /<html/i)

    assert_tidy

    # Going to the W3C validator over HTTP is a bit slow so we make this optional
    return if !ENV['HTML_VALIDATE']
    assert_w3c_validates
  end

  def assert_tidy
    tidy = RailsTidy.tidy_factory
    begin
      tidy.clean(@response.body)
      return if tidy.errors.size.zero?

      # Build a line-numbered listing of the body so that the line
      # references in Tidy's messages are easy to look up
      message = ("-" * 40) + $/
      i = 1
      @response.body.each_line do |line|
        message << sprintf("%4u %s", i, line)
        i += 1
      end
      message << ("-" * 40) + $/
      message << tidy.errors.join($/)
      raise "Tidy detected html errors in response body: #{$/} #{message}"
    ensure
      # Release the Tidy object even when validation fails
      tidy.release
    end
  end
 
  def assert_w3c_validates
    require 'cgi'
    require 'net/http'
    print "Querying W3C XHTML validator ... "
    response = Net::HTTP.start('validator.w3.org') do |w3c|
      # POST the rendered page as a fragment and ask for XML output
      query = 'fragment=' + CGI.escape(@response.body) + '&output=xml'
      w3c.post2('/check', query)
    end
    # The validator reports its verdict in a response header
    raise response.body if response['x-w3c-validator-status'] != 'Valid'
    print response['x-w3c-validator-status']
  end
end

As you can see from the code above, I only make the time-consuming HTTP request to the W3C validator if the environment variable HTML_VALIDATE is set. This makes it easy to turn W3C validation off, the obvious risk being that it stays turned off for good. Possible solutions include running the tests with full HTML validation nightly and installing the W3C validator locally.
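
Both ideas can be sketched with a couple of small helpers. This is only an illustration under stated assumptions: W3C_VALIDATOR_HOST is a hypothetical variable that is not part of my setup, the helper names are made up, and URI.encode_www_form stands in for the hand-built CGI.escape query string used in the filter.

```ruby
require 'uri'

DEFAULT_VALIDATOR_HOST = 'validator.w3.org'

# W3C_VALIDATOR_HOST is a hypothetical variable for pointing the tests
# at a locally installed validator; it falls back to the public one.
def validator_host(env = ENV)
  env.fetch('W3C_VALIDATOR_HOST', DEFAULT_VALIDATOR_HOST)
end

# Mirrors the HTML_VALIDATE switch: any value at all enables the check.
def w3c_validation_enabled?(env = ENV)
  !env['HTML_VALIDATE'].nil?
end

# Form-encodes the same two parameters that the filter posts to /check.
def validation_query(html)
  URI.encode_www_form('fragment' => html, 'output' => 'xml')
end

env = { 'HTML_VALIDATE' => '1' } # e.g. set by a nightly build
puts w3c_validation_enabled?(env)
puts validator_host(env)
puts validation_query('<p>a & b</p>')
```

A nightly job could then run the suite with something like HTML_VALIDATE=1 rake test, while everyday test runs leave the variable unset and skip the HTTP round trip.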

In the code above you can also see that I use the excellent assert_tidy command from the RailsTidy plugin, so installing that plugin along with the Tidy library itself is a prerequisite for the code to work.

Great tools that I use for manual HTML validation include the Web Developer Extension for Firefox with its W3C validation capability for local HTML and the Safari Tidy plugin that allows you to see for every loaded page any Tidy errors and warnings.

When selecting a DOCTYPE to validate against, I was choosing between XHTML 1.0 Strict and Transitional, and I was convinced by certain experts that Strict was the way to go.

Doing HTML validation is not without its frustrations, of course. For example, I had to work around the fact that in XHTML 1.0 Strict, form elements (input, select etc.) need to be inside a p, div, or fieldset tag rather than directly inside the form tag. Also, Tidy requires tables to have the summary attribute. Those are just small annoyances though, and I haven't come across any bigger stumbling blocks yet. All in all I'm very happy with my validation efforts, and I have a lot more confidence in the UI of my Rails application now that I'm validating its markup automatically in my controller and integration tests.
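
For reference, the two workarounds mentioned above look roughly like this in a template. It is a sketch rendered with plain ERB outside of Rails; the form action, field name, and summary text are made up for illustration.

```ruby
require 'erb'
include ERB::Util # provides the h helper outside of Rails

# Hypothetical template: the input sits inside a fieldset instead of
# directly inside the form, and the table carries a summary attribute.
TEMPLATE = <<~HTML
  <form action="/search" method="get">
    <fieldset>
      <input type="text" name="q" value="<%=h query %>" />
    </fieldset>
  </form>
  <table summary="Search results">
    <tr><td><%=h result %></td></tr>
  </table>
HTML

query = 'R&D'
result = 'a < b'
html = ERB.new(TEMPLATE).result(binding)
puts html
```

The rendered fragment should keep both the Strict DTD and Tidy quiet on these two points, and the ampersand and less-than sign in the sample values come out escaped.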