Peter Marklund


Fri August 18, 2006
Programming

Rails Recipe: HTML Validation

In this how-to I'll show a simple approach to HTML validation that I use in my current Rails application. For me, HTML validation is a way to achieve wide browser compatibility and a baseline check against incorrect rendering and a broken UI. By putting ampersands and less-than signs in my test fixtures, I can use HTML validation tests to check that I haven't forgotten any HTML escaping in my templates, i.e. that I haven't forgotten to use constructs such as:

<%=h some_variable %>
<%= link_to h(some_variable) ... %>
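To see why an ampersand or less-than sign in a fixture catches a missing h, compare a value interpolated raw with the same value passed through the Ruby standard library's ERB::Util.h, which is essentially what Rails' h helper does. This is a minimal sketch outside Rails; the fixture value is made up for illustration.

```ruby
require 'erb'

# A made-up fixture value containing characters that must be escaped in HTML
name = 'Smith & <Jones>'

# Forgetting h() emits the raw value: a stray "&" and a bogus <Jones> tag
unescaped = "<p>#{name}</p>"

# ERB::Util.h escapes &, < and > (among others) into entities
escaped = "<p>#{ERB::Util.h(name)}</p>"

puts unescaped # <p>Smith & <Jones></p>
puts escaped   # <p>Smith &amp; &lt;Jones&gt;</p>
```

A validator run over the unescaped output fails, so a forgotten h shows up as a failing test rather than as a silent rendering bug.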

Two common tools for HTML validation are the W3C Validator and Tidy. I've found them to be complementary, so I use both: Tidy warns about empty tags, which the W3C validator doesn't, while Tidy sometimes misses obvious errors such as missing paragraph end tags.

My approach is to validate the markup in an after_filter on ApplicationController that only does its work when the tests are run, so I added the following to my test_helper.rb:

# HTML validate the response of all requests
require File.join(File.dirname(__FILE__), '..', 'app', 'controllers', 'application')
class ApplicationController
  after_filter :assert_valid_markup

  # Numeric HTTP status of the response, e.g. 200 for "200 OK"
  def status_code
    @response.headers['Status'][0,3].to_i
  end

  def assert_valid_markup
    return if RAILS_ENV != 'test'
    # Only validate successful, complete HTML pages (skip redirects, RJS, XML etc.)
    return unless status_code == 200 &&
      @response.headers['Content-Type'] =~ /text\/html/i &&
      @response.body =~ /<html/i

    assert_tidy

    # Going to the W3C validator over HTTP is a bit slow so we make this optional
    return unless ENV['HTML_VALIDATE']
    assert_w3c_validates
  end

  def assert_tidy
    tidy = RailsTidy.tidy_factory
    begin
      tidy.clean(@response.body)

      unless tidy.errors.size.zero?
        # Include the response body with line numbers so errors are easy to locate
        message = ("-" * 40) + $/
        i = 1
        @response.body.each_line do |line|
          message << sprintf("%4u %s", i, line)
          i += 1
        end
        message << ("-" * 40) + $/
        message << tidy.errors.join($/)
        raise "Tidy detected html errors in response body: #{$/} #{message}"
      end
    ensure
      # Release the Tidy handle even when validation fails
      tidy.release
    end
  end

  def assert_w3c_validates
    require 'net/http'
    require 'cgi'
    print "Querying W3C XHTML validator ... "
    response = Net::HTTP.start('validator.w3.org') do |w3c|
      query = 'fragment=' + CGI.escape(@response.body) + '&output=xml'
      w3c.post2('/check', query)
    end
    # The validator reports its verdict in the X-W3C-Validator-Status header
    raise response.body if response['x-w3c-validator-status'] != 'Valid'
    print response['x-w3c-validator-status']
  end
end

As you can see from the code above, I only make the time-consuming HTTP request to the W3C validator if the environment variable HTML_VALIDATE is set. This makes it easy to turn W3C validation off, the obvious risk being that it always stays off. Possible remedies include running the tests with full HTML validation nightly and installing the W3C validator locally.
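For the nightly run, one option is a small Rake task that sets HTML_VALIDATE before invoking the test suite. This is a sketch under assumptions: the task name test:html_validate is hypothetical, and it assumes the project already defines a top-level test task, as Rails projects do.

```ruby
require 'rake'

# Hypothetical task: run the test suite with W3C validation switched on.
# Assumes a top-level "test" task already exists (Rails provides one).
namespace :test do
  desc 'Run the tests with full HTML validation, including the W3C validator'
  task :html_validate do
    ENV['HTML_VALIDATE'] = '1' # the variable checked by assert_valid_markup
    Rake::Task['test'].invoke
  end
end
```

A nightly cron job could then run `rake test:html_validate`, while everyday test runs stay fast.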

In the code above you can also see that I rely on the excellent RailsTidy plugin (through RailsTidy.tidy_factory), so installing that plugin along with the Tidy library itself is a prerequisite for the code to work.

Great tools that I use for manual HTML validation include the Web Developer extension for Firefox, whose W3C validation feature works on local HTML, and the Safari Tidy plugin, which shows any Tidy errors and warnings for every page you load.

When selecting a DOCTYPE to validate against, I was choosing between XHTML 1.0 Strict and Transitional, and certain experts convinced me that Strict was the way to go.
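The W3C validator checks a page against whatever DOCTYPE the page itself declares, so a layout that accidentally loses the Strict DOCTYPE would quietly be validated against a different standard. A check along these lines could guard against that; the helper name strict_doctype? is hypothetical, and it matches only on the XHTML 1.0 Strict public identifier.

```ruby
# Hypothetical helper: does an HTML body declare the XHTML 1.0 Strict DOCTYPE?
# Allows an optional <?xml ...?> declaration and any whitespace or line
# breaks inside the DOCTYPE; matching is case-insensitive.
XHTML_STRICT_DOCTYPE =
  %r{\A\s*(<\?xml[^>]*\?>\s*)?<!DOCTYPE\s+html\s+PUBLIC\s+"-//W3C//DTD XHTML 1\.0 Strict//EN"}i

def strict_doctype?(body)
  !!(body =~ XHTML_STRICT_DOCTYPE)
end

strict_doctype?(%(<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">))
# => true
strict_doctype?(%(<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">))
# => false
```

Inside assert_valid_markup this could be used as `raise 'Missing XHTML 1.0 Strict DOCTYPE' unless strict_doctype?(@response.body)`.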

Doing HTML validation is not without its frustrations, of course. For example, I had to work around the fact that in XHTML 1.0 strict, form elements (input, select etc.) need to be inside a p, div, or fieldset tag. Also, Tidy requires tables to have the summary attribute. Those are just small annoyances, though, and I haven't come across any bigger stumbling blocks yet. All in all I'm very happy with my validation efforts, and I have a lot more confidence in the UI of my Rails application now that its markup is validated automatically in my controller and integration tests.

Comments

Jarkko Laine said over 8 years ago:

Great recipe, Peter!

"For example I had to work around the fact that in XHTML 1.0 strict, form elements (input, select etc.) need to be inside a p, div, or fieldset tag."

Work around? That’s how you should do it, not work around it ;-)

"Tidy requires tables to have the summary attribute."

If you run any accessibility check, you will see that you need the summary anyway. If you can’t come up with a good summary for a table, that’s a good indicator that you’re using tables for layout, which you should avoid anyway. So I’m very grateful that Tidy nags about them; they are my mistakes, after all.

The problem with using the W3C validator is that it really makes the tests too slow. In fact, on our current project, queries to the validator timed out several times, causing the test run to break. It would be very interesting to get the W3C validator running locally, though.

--------------------------------------------------------------------------------

Jarkko said over 8 years ago:

Hmmm… seems that the textile parser doesn’t work quite right. A preview would be nice :-)

--------------------------------------------------------------------------------

Tung said over 8 years ago:

I agree that it would be great if the W3C validator could run locally. Does anyone know if there is an easy way to do this?

--------------------------------------------------------------------------------

Tung said over 8 years ago:

assert_valid_markup by Peter Donald

http://www.realityforge.org/articles/2006/03/15/rails-plugin-to-validate-x-html-and-css

--------------------------------------------------------------------------------

Lambda said over 6 years ago:

RE: I agree that it would be great if the W3C validator could run locally. Anyone know if there an easy way to do this?

The w3c validator is open source and can be downloaded from:
http://validator.w3.org/source/
http://validator.w3.org/validator.tar.gz

--------------------------------------------------------------------------------

coteyr said over 6 years ago:

# This should work for using a local validator. Just require it in your
# controller test, then set an after_filter.
require 'action_controller/test_process'
require 'test/unit'

class Test::Unit::TestCase
  def assert_valid_markup(fragment = @response.body)
    lines = []
    unless @response.redirect? # skip redirects: Rails makes bad HTML for them
      path = "#{RAILS_ROOT}/tmp/validate.html"
      File.open(path, 'w') { |file| file.puts fragment }

      # The command line validator prints nothing for valid markup
      IO.popen("validate #{path}") { |f| lines = f.readlines }
    end

    # Report why validation failed, if it did
    puts lines unless lines.empty?
    assert lines.empty?, 'Markup validation failed'
  end
end

--------------------------------------------------------------------------------

coteyr said over 6 years ago:

There's a bad space in there that needs fixing:
IO.popen ("validate #{RAILS_ROOT}/tmp/validate.html") { |f|
should be
IO.popen("validate #{RAILS_ROOT}/tmp/validate.html") { |f|

Also, calling it from the teardown method won't work. Use it as a normal helper method.

If anyone knows how to call the method automatically after every get/post, let me know. The command line validate executable is available from the links in Lambda's comment above, and it is also an apt package on Ubuntu (and most likely other distributions). Just make sure it's in your path. It's fast enough to call every time you do a get/post; I just wish there were a better way.
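One possible way to run the check after every get/post is to wrap those methods in a module included into the test case, so each request is followed by assert_valid_markup. This is an untested sketch: the module name ValidateAfterRequest is hypothetical, and it assumes get and post are defined in a superclass or an already-included module (as ActionController's test helpers are), and that assert_valid_markup is defined as in the comment above.

```ruby
# Hypothetical module: validate the markup after every get/post request.
# Include it in a test class whose get/post come from a superclass or an
# earlier-included module, so that super reaches the real implementation.
module ValidateAfterRequest
  [:get, :post].each do |verb|
    define_method(verb) do |*args, **kwargs, &block|
      result = super(*args, **kwargs, &block) # perform the actual request
      assert_valid_markup                     # then validate the response
      result
    end
  end
end

# Usage (assumed Rails setup):
#   class ItemsControllerTest < ActionController::TestCase
#     include ValidateAfterRequest
#   end
```

Because assert_valid_markup already skips redirects, wrapping every request this way should not need extra conditions in the module itself.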

--------------------------------------------------------------------------------