Fun with XML, part 4

The last of the “technical” parts of this mini-series is about how to validate an XML file against an XSD schema with a parser. This completes the list of things I didn’t know before I started on this journey to create a T5 XML file.

If you’re interested in the previous parts, part 1 is the story behind all of this, part 2 is a deep dive into the SQL code and syntax for the query, and part 3 is about getting the namespaces correct on the XML file.

What’s a parser?

So, if you’re an actual developer reading this, you’ll probably laugh, but I had no idea that was a thing. I had created all of the code, painstakingly checked the XML it created, and ensured it matched exactly what the CRA’s specs had, or so I thought. I made absolutely sure that I included every single element listed, validated the lengths and content types of the data, and made sure I had no typos in my element names. I thought I was set. What I ultimately missed was that I had empty elements in the file, which I had assumed would be no issue, as they are typically fine in other uses of XML in my experience.

The Finance team submitted the T5 XML file to the CRA for processing and a few hours later, we got a lovely email back, more or less scolding us for not checking the file against the XSD before submitting. Doh. Embarrassing!

The CRA website itself says to check the file with a parser, but otherwise you’re completely on your own for where to find one and how to use it. So… this post may be useful to someone out there.
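For anyone else new to the term, here is a minimal Python sketch of why my file looked fine to me. A plain XML parser only checks that the markup is well-formed; checking the content against the rules in an XSD is a separate validation step. The sample strings are made up for illustration, and Python's standard library parser is used here only because it needs no extra install (it does not do XSD validation).

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the text parses as XML at all (no schema rules applied)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Made-up snippets: an empty element is still well-formed XML...
empty_element = "<T5Slip><addr2_l2_txt></addr2_l2_txt></T5Slip>"
# ...while a missing closing tag is not.
broken_markup = "<T5Slip><addr2_l2_txt></T5Slip>"

print(is_well_formed(empty_element))
print(is_well_formed(broken_markup))
```

The first snippet passes this kind of check even though a schema can still reject it, which is the gap I fell into.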

Notepad++ can do it

It turns out the tool I was using to check the file also has a plugin to validate a file against an XSD schema. Here’s what you need to do:

Download and install Notepad++

…if you haven’t already got it. It’s free and it’s a great tool for various things.

Install XML Tools plugin

Go to the Plugins menu > Plugins Admin, find XML Tools in the list and click Install.

Download & extract your schema files

In my case, I had to download the Canada Revenue Agency’s XML schema, which is a zip file containing 64 different XSD schema files. Unzip it into a folder. If you’re validating against some other kind of schema, the same logic should apply.

Save XML file in the schema folder

This might not apply to everyone, and there’s likely a different way to do this, but if my XML file wasn’t in the same folder as the schema, the validation failed with an error. I didn’t want to alter the schema path in the header of my XML file to make it work, so saving my XML in the same folder as the schema seemed like the simplest thing to do.
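If you want to see which schema file your XML header points at before moving anything, a rough Python sketch could look like the one below. It assumes the reference sits on the root element as `xsi:noNamespaceSchemaLocation` or `xsi:schemaLocation` and is a relative file name resolved against the XML file's own folder. The `Submission` root, `t5.xml` and `example.xsd` are placeholder names for the demo, not necessarily what your file uses.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

XSI = "http://www.w3.org/2001/XMLSchema-instance"

def schema_reference(xml_path):
    """Return (schema reference on the root element, True if that file sits beside the XML)."""
    xml_path = Path(xml_path)
    root = ET.parse(xml_path).getroot()
    ref = root.get(f"{{{XSI}}}noNamespaceSchemaLocation")
    if ref is None:
        # xsi:schemaLocation holds "namespace location" pairs; use the first location.
        tokens = (root.get(f"{{{XSI}}}schemaLocation") or "").split()
        ref = tokens[1] if len(tokens) >= 2 else None
    if ref is None:
        return None, False
    return ref, (xml_path.parent / ref).is_file()

# Demo with throwaway files; all names here are placeholders.
with tempfile.TemporaryDirectory() as folder:
    xml_file = Path(folder) / "t5.xml"
    xml_file.write_text(
        f'<Submission xmlns:xsi="{XSI}" '
        'xsi:noNamespaceSchemaLocation="example.xsd"/>',
        encoding="utf-8",
    )
    before = schema_reference(xml_file)   # schema not yet in the folder
    (Path(folder) / "example.xsd").write_text("<placeholder/>", encoding="utf-8")
    after = schema_reference(xml_file)    # schema now beside the XML file

print(before)
print(after)
```

This only checks that the referenced file can be found next to the XML; it does not validate anything.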

Run “Validate Now” option for XML Tools

On the Plugins menu > XML Tools, choose Validate Now. I didn’t touch any of the other settings and options in this window, FWIW. If the file has any issues, you get a list of validation errors.

The errors I describe below are based on a mocked-up file, to show what kinds of errors I got originally. I had dozens of lines of T5Slip sections and the same errors were occurring in most of my file. It’s these errors that drove the changes in the “Big T5Slip” section of part 2 of this series, where I flagged “see note 1” and “see note 2”.

  1. I had to ensure the data was NULL, not an empty string, if the element should be excluded from the file. That addressed the “addr2_l2_txt” messages in the validation error list. The CRA is telling me that if the tag exists, it expects a minimum length of 1 character, not 0.
  2. On parent-child elements where there would be no child data, I needed to exclude the parent. Having NULL data in the child elements wasn’t enough, so I used WHERE clauses to control whether the BUS_NM or RCPNT_NM sections were included or left out.
  3. You will see other errors if an element is missing entirely. That example wasn’t in my file, but it triggers an “expecting ___” type of message.
  4. If the data isn’t in the right format or value, you will also get an error. I triggered one on purpose (but didn’t get a screenshot): there is an element called recipient type that expects a value of 1 to 5. I left it blank to see what the error was, and it basically said the value must be a single character and that 1 to 5 are valid.
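Notes 1 and 2 can be pre-checked with a few lines of Python before running the real validation. This is only a sketch, not what I did (my fix was in the SQL): it flags elements with no text, plus parents whose children are all empty, and it does not read the XSD at all. The child names under RCPNT_NM, BUS_NM and RCPNT_ADDR in the sample are placeholders.

```python
import xml.etree.ElementTree as ET

def is_effectively_empty(elem):
    """True if elem has no text and every child is itself effectively empty."""
    has_text = bool(elem.text and elem.text.strip())
    return not has_text and all(is_effectively_empty(child) for child in elem)

def find_empty_elements(elem, path=""):
    """Return paths of the outermost effectively-empty elements (attributes are ignored)."""
    here = f"{path}/{elem.tag}"
    if is_effectively_empty(elem):
        return [here]
    found = []
    for child in elem:
        found.extend(find_empty_elements(child, here))
    return found

# Mocked-up slip; child element names are placeholders.
SAMPLE = """
<T5Slip>
  <RCPNT_NM>
    <snm>Smith</snm>
  </RCPNT_NM>
  <BUS_NM>
    <l1_nm></l1_nm>
  </BUS_NM>
  <RCPNT_ADDR>
    <addr_l1_txt>1 Main St</addr_l1_txt>
    <addr2_l2_txt></addr2_l2_txt>
  </RCPNT_ADDR>
</T5Slip>
"""

empty_paths = find_empty_elements(ET.fromstring(SAMPLE.strip()))
for p in empty_paths:
    print(p)
```

Here BUS_NM is reported as a whole because its only child is empty (note 2), while addr2_l2_txt is reported on its own because the rest of its parent has data (note 1).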

So, overall, since the file I had was so close to correct, Notepad++ worked just fine for my needs to validate against the schema.

That’s it. Once I corrected the areas causing the errors, eventually the validation came back clean. Yay! I hope this helps someone out there!
