The last of the "technical" parts of this mini-series covers how to validate an XML file against an XSD schema using a parser. It rounds out the list of things I didn't know before I started this journey to create a T5 XML file.
For those interested in the previous parts of the series:
- Part 1 is the story behind all of this.
- Part 2 is a deep dive into the SQL code and syntax for the query.
- Part 3 is about getting the namespaces correct on the XML file.
What's a parser?
So, an actual developer reading this is probably going to laugh, but I had no idea that was a thing: a validating parser, a tool that reads an XML file and checks it against the rules in an XSD schema, such as which elements are allowed, in what order, and what values they can hold. I had created all of the code and painstakingly checked the XML it produced to make sure it matched the CRA's specs exactly, or so I thought. I included every single element listed, validated the lengths and content types of the data, made sure I had no typos in my element names, etc. I thought I was set. What I ultimately missed was that I had empty elements in the file, which I assumed would be no issue since, in my experience, they're typically fine in other uses of XML.
The Finance team submitted the T5 XML file to the CRA for processing, and a few hours later we got a lovely email back, more or less scolding us for not checking the file against the XSD before submitting. Doh. Embarrassing!
The CRA website itself said to check the file with a parser, but beyond that, I was completely on my own to figure out where to find one and how to use it. So… this post may be useful to someone out there.
Notepad++ can do it
It turns out the tool I was using to check the file also has a plugin to validate a file against an XSD schema. Here's what I needed to do:
Download and install Notepad++
It's free and it's a great tool for various things.
Install XML Tools plugin
Go to the Plugins menu > Plugins Admin, find XML Tools in the list, and click Install.
Download & extract the schema files
In my case, I had to download the Canada Revenue Agency's XML schema, which is a zip file containing 64 different XSD files. Unzip it into a folder. I'm sure the same approach would apply if I were validating against some other schema.
Save the XML file in the schema folder
This might not apply to everyone, and there's likely another way to do this, but if my XML file wasn't in the same folder as the schema, I got an error. I didn't want to alter the schema pathname in the file's header to make it work, so saving my XML in the schema folder seemed like the simplest thing to do.
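My best guess at why the folder mattered (I didn't confirm this for XML Tools specifically): the pathname in the XML header is a schema location hint, and a relative hint is resolved against the XML file's own location, not wherever you happen to be working. Here's a minimal sketch using only Java's standard library that reads that hint and shows where it points. The `SchemaHint` class and `resolveHint` method are made-up names for illustration, and it assumes the header uses either `xsi:noNamespaceSchemaLocation` or `xsi:schemaLocation`.

```java
import java.io.File;
import java.net.URI;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

class SchemaHint {
    // Hypothetical helper: returns the schema location named in the XML header,
    // resolved against the XML file's own folder, or null if there is no hint.
    static URI resolveHint(File xmlFile) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // needed to read the xsi:* attributes by namespace
        Element root = dbf.newDocumentBuilder().parse(xmlFile).getDocumentElement();
        String xsi = XMLConstants.W3C_XML_SCHEMA_INSTANCE_NS_URI;

        String hint = root.getAttributeNS(xsi, "noNamespaceSchemaLocation");
        if (hint.isEmpty()) {
            // xsi:schemaLocation holds "namespace location" pairs; use the first location.
            String[] pairs = root.getAttributeNS(xsi, "schemaLocation").trim().split("\\s+");
            if (pairs.length < 2) {
                return null;
            }
            hint = pairs[1];
        }
        // Relative hints resolve against the document's location, not the current directory.
        return xmlFile.getAbsoluteFile().toURI().resolve(hint);
    }

    public static void main(String[] args) throws Exception {
        URI schema = resolveHint(new File(args[0]));
        if (schema == null) {
            System.out.println("No schema location in the header.");
        } else {
            boolean found = "file".equals(schema.getScheme()) && new File(schema).exists();
            System.out.println("Header points to " + schema + (found ? "" : " (not found!)"));
        }
    }
}
```

If the header says something like `main.xsd` with no folder, the resolved path lands next to the XML file, which would explain why moving the XML into the schema folder made the error go away.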
Run the "Validate Now" option for XML Tools
On the Plugins menu > XML Tools, choose Validate Now. FWIW, I didn't touch any of XML Tools' other settings and options. If the file had any issues, I would get a list of validation errors.
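For anyone who would rather script this than click through Notepad++, here's a rough equivalent of Validate Now using the XSD validation built into Java's standard library (`javax.xml.validation`). It's a sketch, not what XML Tools does internally, so the error wording will differ. The `ValidateNow` class name is made up, and it assumes the schema folder has one top-level XSD that includes the others; that's the file you'd pass in. Because the schema is loaded from its own path, its includes resolve relative to the schema folder, so, as far as I can tell, the XML file doesn't need to be moved there.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class ValidateNow {
    // Returns one "line N, col M: message" entry per problem; an empty list means the file is valid.
    static List<String> validate(File xsd, File xml) throws SAXException, IOException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // Compiling the schema from its own path lets any xs:include/xs:import
        // inside it resolve relative to the schema folder.
        Validator validator = factory.newSchema(xsd).newValidator();
        List<String> problems = new ArrayList<>();
        validator.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { }
            // Schema violations: record and keep going, so every error gets listed.
            @Override public void error(SAXParseException e) { problems.add(describe(e)); }
            // Malformed XML: the parser can't continue, so stop (recorded in the catch below).
            @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        try {
            validator.validate(new StreamSource(xml));
        } catch (SAXParseException e) {
            problems.add(describe(e));
        }
        return problems;
    }

    static String describe(SAXParseException e) {
        return "line " + e.getLineNumber() + ", col " + e.getColumnNumber() + ": " + e.getMessage();
    }

    public static void main(String[] args) throws Exception {
        List<String> problems = validate(new File(args[0]), new File(args[1]));
        problems.forEach(System.out::println);
        System.out.println(problems.isEmpty() ? "No errors found." : problems.size() + " error(s).");
    }
}
```

With Java 11 or later you can run a single source file directly, e.g. `java ValidateNow.java path/to/main.xsd path/to/T5.xml` (both paths are placeholders). Collecting errors in a list instead of stopping at the first one mirrors what I found most useful in Notepad++: seeing every problem at once.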
The errors I'm describing here are based on a mocked-up file that shows the kinds of errors I got originally. I had dozens of T5Slip sections, and the same errors were occurring in most of my files. These errors drove the changes in the "Big T5Slip" section of Part 2 of this series, where I marked "see note 1" and "see note 2".
- I had to make sure the data was NULL, not an empty string, whenever an element should be excluded from the file. That addressed the "addr2_l2_txt" messages in the validation error list. The CRA is telling me that if the tag exists, it expects a minimum length of 1 character, not 0.
- For parent-child elements with no child data, I needed to exclude the parent as well. NULL data in the child elements wasn't enough, because the empty parent tag still showed up, so I used WHERE clauses to control when the BUS_NM or RCPNT_NM sections appear.
- I would also see errors if a required element was missing entirely. That example wasn't in my file, but it triggers an "expecting ___" type of message.
- If the data isn't in the right format or doesn't have an allowed value, I also get an error. To trigger one (I didn't get a screenshot), I left the recipient type element blank; it expects a value from 1 to 5. The error said the value must be a single character and that 1 to 5 are the valid values.
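To see each of those error types side by side, here's a small self-contained Java sketch with a toy schema. The schema and its element names (`slip`, `recipient_name`, `surname`, `address_line`, `recipient_type`) are hypothetical stand-ins, not the CRA's actual definitions; they only mimic the rules above: a minimum length of 1, a parent that needs a child, a required element, and a value limited to 1 to 5.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class ErrorKinds {
    // A toy schema, NOT the CRA's: element names are made up for illustration.
    static final String XSD = String.join("\n",
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>",
        "  <xs:simpleType name='nonEmpty'>",
        "    <xs:restriction base='xs:string'><xs:minLength value='1'/></xs:restriction>",
        "  </xs:simpleType>",
        "  <xs:element name='slip'><xs:complexType><xs:sequence>",
        "    <xs:element name='recipient_name' minOccurs='0'><xs:complexType><xs:sequence>",
        "      <xs:element name='surname' type='nonEmpty'/>",
        "    </xs:sequence></xs:complexType></xs:element>",
        "    <xs:element name='address_line' type='nonEmpty' minOccurs='0'/>",
        "    <xs:element name='recipient_type'><xs:simpleType>",
        "      <xs:restriction base='xs:string'><xs:pattern value='[1-5]'/></xs:restriction>",
        "    </xs:simpleType></xs:element>",
        "  </xs:sequence></xs:complexType></xs:element>",
        "</xs:schema>");

    // Returns null if the XML is valid, otherwise the validator's first error message.
    static String firstError(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            // With no ErrorHandler set, the validator throws on the first error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return String.valueOf(e.getMessage());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            // Optional elements left out entirely: valid
            "<slip><recipient_type>1</recipient_type></slip>",
            // Element present but empty: fails minLength
            "<slip><address_line></address_line><recipient_type>1</recipient_type></slip>",
            // Parent present with no child: content incomplete
            "<slip><recipient_name/><recipient_type>1</recipient_type></slip>",
            // Required element missing: "expecting recipient_type"
            "<slip><address_line>Unit 4</address_line></slip>",
            // Value outside 1 to 5: fails the pattern
            "<slip><recipient_type>6</recipient_type></slip>",
        };
        for (String xml : samples) {
            String err = firstError(xml);
            System.out.println(xml + "\n  -> " + (err == null ? "valid" : err));
        }
    }
}
```

Running `main` prints each sample with either "valid" or the validator's first error message. The first two samples show the fix from the first bullet: leaving an optional element out entirely passes, while including it empty does not.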
So, overall, since my file was already close to correct, Notepad++ worked just fine for validating it against the schema.
That's it. Once I corrected the areas causing the errors, the validation eventually came back clean. Yay! I hope this helps someone out there!