Peter Marklund


Rails Recipe: HTML Validation

In this howto I'll show a simple approach to HTML validation that I use in my current Rails application. For me, HTML validation is a way to achieve wide browser compatibility and a baseline check for rendering problems and a broken UI. By putting ampersands and less-than signs in my test fixtures, I can also use the HTML validation tests to check that I haven't forgotten any HTML escaping in my templates, i.e. that I haven't left out either of the following constructs:

<%=h some_variable %>
<%= link_to h(some_variable) ... %>
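
To see what the escaping protects against, here is a minimal standalone sketch. It assumes only Ruby's standard ERB library, whose ERB::Util.html_escape is the method that h is an alias for; the fixture string and the name_cell helper are hypothetical names made up for illustration.

```ruby
require 'erb'

# Hypothetical fixture value containing the characters that break markup
FIXTURE_NAME = 'Tom & <Jerry>'

# Illustrative helper (not part of Rails): renders a table cell,
# optionally escaping the value the way h does in a template
def name_cell(value, escape)
  text = escape ? ERB::Util.html_escape(value) : value
  "<td>#{text}</td>"
end

puts name_cell(FIXTURE_NAME, false) # raw ampersand and a stray tag: invalid markup
puts name_cell(FIXTURE_NAME, true)  # entities instead: safe for a validator
```

The unescaped cell is the kind of output a validator flags, which is why fixtures with these characters make forgotten escaping visible.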

Two common tools for HTML validation are the W3C Validator and Tidy, and since I've found them to be complementary I've decided to use both. Tidy warns about empty tags, which the W3C validator doesn't. On the other hand, Tidy sometimes misses obvious errors such as missing paragraph end tags.

The approach that I came up with was to do the validation in an after_filter, but only when tests are run, so I added the following to my test_helper.rb:

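# HTML validate the response of all requests
require File.join(File.dirname(__FILE__), '..', 'app', 'controllers', 'application')
class ApplicationController
  after_filter :assert_valid_markup

  # Keep the helpers private so they cannot be routed to as actions
  private

  # Numeric HTTP status of the response. The body of this method is missing
  # from the listing; reading the 'Status' header ("200 OK") is an assumption.
  def status_code
    @response.headers['Status'].to_s[0, 3].to_i
  end

  def assert_valid_markup
    return if RAILS_ENV != 'test'
    # Only validate successful responses that are complete HTML documents
    return if !(status_code == 200 &&
      @response.headers['Content-Type'] =~ /text\/html/i && @response.body =~ /<html/i)

    assert_tidy

    # Going to the W3C validator over HTTP is a bit slow so we make this optional
    return if !ENV['HTML_VALIDATE']

    assert_w3c_validates
  end

  def assert_tidy
    tidy = RailsTidy.tidy_factory
    # The call that runs Tidy is missing from the listing; clean and release
    # from the Ruby Tidy bindings are assumed here.
    tidy.clean(@response.body)

    unless tidy.errors.empty?
      # Echo the body with line numbers so Tidy's messages can be located
      message = ("-" * 40) + $/
      i = 1
      @response.body.each_line do |line|
        message << sprintf("%4u %s", i, line)
        i += 1
      end
      message << ("-" * 40) + $/
      message << tidy.errors.join($/)
    end

    tidy.release
    raise "Tidy detected html errors in response body: #{$/} #{message}" unless message.nil?
  end

  def assert_w3c_validates
    require 'net/http'
    print "Querying W3C XHTML validator ... "
    # The validator host name is blank in this listing; fill it in before use
    response = Net::HTTP.start('') do |w3c|
      query = 'fragment=' + CGI.escape(@response.body) + '&output=xml'
      w3c.post2('/check', query)
    end
    raise response.body if response['x-w3c-validator-status'] != 'Valid'
    print response['x-w3c-validator-status']
  end
end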

As you can see from the code above, I only do the time-consuming HTTP request to the W3C validator if the environment variable HTML_VALIDATE is set. This way I can easily turn off W3C validation, the obvious risk being that it always stays turned off. Possible solutions include running the tests with full HTML validation nightly and installing the W3C validator locally.
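
The toggle and the request body can be tried in isolation. The following is a minimal sketch, not the code from the listing: the method names w3c_validation_enabled? and w3c_query are hypothetical, the environment is passed in as a parameter so it can be exercised without touching the real ENV, and treating an empty HTML_VALIDATE value as "off" is an assumption (the listing above only checks that the variable is present).

```ruby
require 'cgi'

# Hypothetical helper: true when the slow remote check has been switched on.
# Assumption: an empty value counts as off.
def w3c_validation_enabled?(env = ENV)
  !env['HTML_VALIDATE'].to_s.empty?
end

# Hypothetical helper: builds the form-encoded body that is posted to the
# validator's /check path, using the same two parameters as the listing.
def w3c_query(html)
  'fragment=' + CGI.escape(html) + '&output=xml'
end

puts w3c_validation_enabled?('HTML_VALIDATE' => '1')
puts w3c_query('<p>a & b</p>')
```

With a toggle like this, a nightly run would presumably set the variable in the shell before invoking the test task, while ordinary runs leave it unset.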

In the code above you can also see that I use the excellent assert_tidy command from the RailsTidy plugin, so installing that plugin along with the Tidy library itself is a prerequisite for the code to work.
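
The error message built in the Tidy check is easier to read when you see it on its own. Here is a small standalone sketch of just the line-numbering step, assuming nothing beyond core Ruby; numbered_listing is a hypothetical name, and the sketch does not call Tidy at all.

```ruby
# Hypothetical helper: formats a response body with right-aligned line
# numbers between two horizontal rules, so that "line 2 column 1" in a
# validator message can be matched to the markup.
def numbered_listing(body)
  rule = ('-' * 40) + "\n"
  listing = rule.dup
  body.each_line.with_index(1) do |line, number|
    listing << format('%4u %s', number, line)
  end
  # Make sure the closing rule starts on its own line
  listing << "\n" unless body.empty? || body.end_with?("\n")
  listing << rule
end

puts numbered_listing("<p>\n</p>\n")
```

Using each_line rather than iterating the string directly keeps the sketch working on Ruby versions where String no longer has an each method.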

Great tools that I use for manual HTML validation include the Web Developer Extension for Firefox with its W3C validation capability for local HTML and the Safari Tidy plugin that allows you to see for every loaded page any Tidy errors and warnings.

When selecting a DOCTYPE to validate against, I was choosing between XHTML 1.0 strict and transitional, and I was convinced by certain experts that strict was the way to go.

Doing HTML validation is not without its frustrations of course. For example I had to work around the fact that in XHTML 1.0 strict, form elements (input, select etc.) need to be inside a p, div, or fieldset tag. Also, Tidy requires tables to have the summary attribute. Those are just small annoyances though and I haven't come across any bigger stumbling blocks yet. All in all I'm very happy about my validation efforts and I have a lot more confidence in the UI of my Rails application now that I'm validating its markup automatically in my controller and integration tests.

7 comment(s)


coteyr said 2008-11-12 09:56:

There's a bad space in there that needs fixing: IO.popen ("validate #{RAILS_ROOT}/tmp/validate.html") { |f| should be IO.popen("validate #{RAILS_ROOT}/tmp/validate.html") { |f|. Also, calling it from the teardown method won't work. Use it as a normal helper method. If anyone knows how to call the method automatically after every get/post then let me know. The command line validate executable is available in the post by Lambda and is an apt package on Ubuntu (most likely others too). Just make sure it's in your path. It's fast enough to call every time you do a get/post. Just wish there was a better way.

coteyr said 2008-11-12 08:57:

# This should work for using a local validator. Just require it in your controller
# test, then set an after_filter.
require 'action_controller/test_process'
require 'test/unit'
require 'ftools'

class Test::Unit::TestCase
  def assert_valid_markup(fragment=@response.body)
    lines = []
    if !@response.redirect? # rails makes bad html
      validate_temp = File.open("#{RAILS_ROOT}/tmp/validate.html", 'w')
      validate_temp.puts fragment
      validate_temp.close
      IO.popen ("validate #{RAILS_ROOT}/tmp/validate.html") { |f| lines = f.readlines }
    end
    if lines.length == 0 # valid markup returns nothing
      assert true
    else # report why
      puts lines
      assert false, 'Markup validation failed'
    end
  end
end

Lambda said 2008-10-13 07:56:

RE: I agree that it would be great if the W3C validator could run locally. Anyone know if there an easy way to do this? The w3c validator is open source and can be downloaded from:

Tung said 2006-10-20 19:58:

assert_valid_markup by Peter Donald

Tung said 2006-10-20 19:49:

I agree that it would be great if the W3C validator could run locally. Anyone know if there an easy way to do this?

Jarkko said 2006-09-06 04:34:

Hmmm... seems that the textile parser doesn't work quite right. A preview would be nice :-)

Jarkko Laine said 2006-09-06 04:33:

Great recipe, Peter! bq. For example I had to work around the fact that in XHTML 1.0 strict, form elements (input, select etc.) need to be inside a p, div, or fieldset tag. Work around? That's how you should do it, not work around it ;-) bq. Tidy requires tables to have the summary attribute. If you run any accessibility check, you will see that you must have the summary anyway. If you can't come up with a good summary for the table, it's a good indicator that you're using tables for layout which you should avoid anyway. I'm thus very grateful that tidy nags about them, they are my mistakes after all. The problem with using the W3C validator is that it really makes the tests too slow. In fact, for our current project querying the validator timed out several times, causing the test process to break. It would be very interesting to get the W3C validator running locally, though.