Peter Marklund

Peter Marklund's Home

Thu Nov 09 2006 13:57:00 GMT+0000 (Coordinated Universal Time)

Rails QA: Watch Out for Duplication in your Fixtures

I had a long debugging session caused by duplicated record keys in one of my fixture files. It turns out the YAML parser will not complain if you have duplicated record keys (the keys on the top level) or duplicated column keys. Even though the YAML specification says that map keys should be unique, the parser will happily overwrite any existing value for a key if it encounters that key again.
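To make the overwriting behavior concrete, here is a minimal sketch using Ruby's standard yaml library. The fixture content (first_user, Alice, Bob, Robert) is made up for illustration, and the exact behavior may depend on the Ruby and YAML parser version you run it with.

```ruby
require 'yaml'

# Hypothetical fixture text: the record key "first_user" appears twice,
# and the column "name" appears twice within the second record.
fixture = <<~YAML
  first_user:
    id: 1
    name: Alice
  first_user:
    id: 2
    name: Bob
    name: Robert
YAML

# No exception is raised here; later keys replace earlier ones.
records = YAML.load(fixture)

puts records.keys.inspect
puts records["first_user"].inspect
```

With the parser I have at hand, only one first_user record survives, carrying id 2 and the name Robert. The first record is gone without any warning, which is exactly the kind of silent failure that is hard to track down.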

Under the motto of crashing early and avoiding silent failures, I came up with the following unit test:

# Check for duplication in Fixture files, i.e. that:
#
# - keys of records are unique
# - column names are unique
# - id values are unique
#
# The idea is that this test can complement the syntax checking of the YAML
# parser and help avoid debugging nightmares due to duplicated records or columns.
# When a record or column key is duplicated, the YAML parser just uses the
# last one and lets it overwrite the earlier ones.
# Before writing this script I tried using the Kwalify YAML validator but I
# couldn't quite coerce it into doing what I wanted.
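#
# Assumes the Rails test environment (e.g. test_helper) is already loaded so
# that Test::Unit::TestCase.fixture_path is defined.
require 'erb'

class FixtureTest < Test::Unit::TestCase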
  def test_fixtures
    fixture_file_paths.each do |file_path|
      initialize_variables(file_path)
      # each_line works on all Ruby versions; String#each was removed in Ruby 1.9
      fixture_contents(file_path).each_line do |line|
        next if skip_line?(line)
        if is_record?(line)
          assert_record_not_dupe(line)
        elsif is_column?(line)
          assert_column_not_dupe(line)
          assert_id_not_dupe(line) if is_id?(line)
        end # End if statement
      end # End line loop
    end # End fixture file loop
  end

  private
  def initialize_variables(file_path)
    @file_path = file_path
    @record_keys = []
    @column_keys = []
    @ids = []
    # Forget the column indentation remembered from the previous file
    @column_indent = nil
  end

  def fixture_file_paths
    Dir["#{Test::Unit::TestCase.fixture_path}/**/*.yml"]
  end

  def fixture_contents(file_path)
    ERB.new(IO.read(file_path)).result
  end

  # Skip YAML directive, comments, and whitespace lines
  def skip_line?(line)
    line =~ /^\s*$/ || line =~ /^\s*#/ || line =~ /^---/
  end

  # A record has no indentation (leading white space)
  def is_record?(line)
    line =~ /^\S/
  end

  # A column line has some indentation (should be same as the first column)
  # and a colon
  def is_column?(line)
    # Column names may contain digits after the first character (e.g. address1)
    return false if line !~ /^\s+[a-zA-Z_][a-zA-Z0-9_]*:/

    # Looks like a column - check the indentation level as well
    indent_level = line[/^(\s+)/].length
    if @column_indent
      # If the indentation level is different from the first column this may
      # not be a column, it could be nested data of some sort
      return @column_indent == indent_level
    else
      # This is the first column so remember its indentation level
      @column_indent = indent_level
      return true
    end
  end

  def is_id?(line)
    column_key(line) == "id"
  end

  def column_key(line)
    line[/^\s+([^:]+):/, 1]
  end

  def assert_record_not_dupe(line)
    record_key = line[/^(?:- )?([^:]+):/, 1]
    assert !@record_keys.include?(record_key),
      "Record key #{record_key} in fixture file #{@file_path} " +
      "is duplicated on this line: #{line.chomp}"
    @record_keys << record_key
    @column_keys = []
    @column_indent = nil
  end

  def assert_column_not_dupe(line)
    assert !@column_keys.include?(column_key(line)),
      "Column #{column_key(line)} for record #{@record_keys.last} is duplicated " +
      "in file #{@file_path} on this line: #{line.chomp}"
    @column_keys << column_key(line)
  end

  def assert_id_not_dupe(line)
    id_value = line[/^\s+id:\s*(\S+)/, 1]
    assert !@ids.include?(id_value),
      "Value for id column duplicated in file #{@file_path} on this " +
      "line: #{line.chomp}"
    @ids << id_value
  end
end
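
If you want to try the line-scanning idea outside of a Rails test suite, the following standalone sketch applies the same record-key check to a string. The helper name duplicate_record_keys and the sample data are hypothetical and not part of the test above; the sketch only covers duplicated record keys, not columns or id values.

```ruby
# Hypothetical helper: returns the top-level (record) keys that occur
# more than once in the given fixture text.
def duplicate_record_keys(yaml_text)
  seen = Hash.new(0)
  yaml_text.each_line do |line|
    # Skip blank lines, comments, and the YAML document marker
    next if line =~ /^\s*$/ || line =~ /^\s*#/ || line =~ /^---/
    # A record line has no leading whitespace; indented lines yield nil
    key = line[/^(?:- )?([^:\s][^:]*):/, 1]
    seen[key] += 1 if key
  end
  seen.select { |_key, count| count > 1 }.keys
end

# Made-up fixture text with "first_user" defined twice
sample = <<~YAML
  first_user:
    id: 1
  second_user:
    id: 2
  first_user:
    id: 3
YAML

dupes = duplicate_record_keys(sample)
p dupes
```

Scanning the raw lines rather than the parsed result is the point of the approach: once the YAML parser has built its hash, the duplicates have already been overwritten and can no longer be detected.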