When a test file declares one or more expected diagnostics, we check those
instead of checking the result value. The severities and source ranges
must match.
We don't test the error messages themselves because they are not part of
the specification and may vary between implementations or, in future, be
translated into other languages.
The harness can now run tests that decode successfully and compare the
result with a given value. Further work is required in later commits to
deal with other cases, such as tests that intentionally produce errors.