Today’s post is a continuation of the mini-series I started on Feb 28th. You can read the first part here if you’d like, which contains the story behind this. In this post I’ll be getting into the technical details on what I learned.
This is about the guts of the XML itself, in SQL, where I ended up writing this procedure. First, before I get into the details, I wanted to reference this Stack Overflow post that was the single most useful post and example for the work I did. The person who posted the example, Ryan, also laid out a very simple temp table structure so I literally copied that code and pasted it into my test SQL server and played with it to understand it.
What I needed to create
The file format I needed to create is for the Canada Revenue Agency and specifically for T5 submissions. The specs are relatively well laid out here, although the formatting of the page makes it hard to read the details and find sections easily (IMHO).
The basic format of the file is this:
Here’s a visual example of the simple file I’ve mocked up in order to write this post. This is how my XML results from SQL show in Notepad++. I have 4 T5 slips so my “Return > T5” section shows 4 sets of T5Slip elements, and then the 1 T5Summary to complete the T5 section. Beyond that each file would only have one T619 section as that is just information about the organization filing the submission. The T619 section is collapsed so the end tag isn’t visible, likewise with the T5Slip and T5Summary sections.
What I found challenging
There were a few challenging things along the way. The biggest was there are multiple levels of information required in the file and based on the Stack Overflow post and answer, that would require multiple levels of nested select statements to accommodate. At times it was getting hard to identify which part was related to what parent element. In each section, there are further nested elements required which made the file contain a lot more levels than any example I could find online to follow along with. For example: any fields with names or addresses are encapsulated in a parent node with child elements underneath.
How I approached it
First, I used the example from that Stack Overflow post referenced at the top of this article. I used the sample temp table scripts and dummy data and then played with it, trying to add more complexity to it, until I understood the code. I’m not a developer, so this was all new to me. I’ve written tons of SQL queries in my life but none using FOR XML.
Second, I mapped his example vs. my file format to see what sections aligned with his, now that I understood the working parts a little better. In my case I had to nest a lot deeper than the example so I basically started from the inside out, at the lowest level of the child elements in the file. Here’s an example. The original example lined up nicely with the T5 section inwards. This helped me get my query right, from the simplest innermost detail and working outwards back to the Submission level overall. The biggest diff in his example and mine: this data was all coming from one source table, not 3 tables. His example had an bunch of joins to get it working together which I didn’t need, yet.
Stack Overflow example & my comparative section in brackets
(T5)
(T5Slip)
(fields I need not in the original example)
(Recipient Name section)
some data
(Recipient Address section)
some data
(T5 Amount section)
some data
(Next T5 Slip)
repeat as needed for next customer section
<Elements
I started with the T619 section since that was the simplest to create: it would contain 1 record of data, for our organization, plus, it has fewer nested levels to figure out.
Example using FOR XML PATH
Using a generic example that resembles the T619 section of the CRA XML, my sample below is based on my data fields in SQL not being the same as the element names I need in the file so I am aliasing everything. I show some constant values as well, since some of my XML file were constant values hard-coded in my query, where most of the rest came from SQL tables. I used PATH instead of AUTO because I found it easier to work with, by explicitly telling SQL what my XML element name was for a given section. Each subquery is for an area where there is a parent-child relationship to lower level data in the file.
SELECT ‘TEST01’ as element1,
A.FilerBN as element2,
(SELECT OrgName as Name1
FROM MySQLTable1
FOR XML PATH(‘FilerName’),TYPE),
(SELECT OrgAddr1 as Addr1,
OrgCity as City,
OrgProv as Prov,
OrgCountry as Country,
OrgPC as PC
FROM MySQLTable1
FOR XML PATH(‘FilerAddress’),TYPE),
(SELECT OrgContact as ContactName,
OrgPhone as ContactPhone,
OrgEmail as ContactEmail
FROM MySQLTable1
FOR XML PATH(‘FilerContact’),TYPE)
FROM MySQLTable1 A
FOR XML PATH(‘T619’), TYPE
The results of that code generated this in SQL. This resembles the T619, with fewer fields just to demonstrate the concept.
Once that section was working, I then moved on to what would make that fit in the larger file, i.e. the T619 section fits inside a Submission section. So, I took that script above and placed it inside another SELECT statement, within brackets. I’ve italicized the part that was already there, and bold, non-italic text is what’s new.
SELECT
(SELECT ‘TEST01’ as element1,
A.FilerBN as element2,
(SELECT OrgName as Name1
FROM MySQLTable1
FOR XML PATH(‘FilerName’),TYPE),
(SELECT OrgAddr1 as Addr1,
OrgCity as City,
OrgProv as Prov,
OrgCountry as Country,
OrgPC as PC
FROM MySQLTable1
FOR XML PATH(‘FilerAddress’),TYPE),
(SELECT OrgContact as ContactName,
OrgPhone as ContactPhone,
OrgEmail as ContactEmail
FROM MySQLTable1
FOR XML PATH(‘FilerContact’),TYPE)
FROM MySQLTable1 A
FOR XML PATH(‘T619’), TYPE)
FROM MySQLTable1 Z
FOR XML PATH(‘Submission’)
That worked exactly as expected and the beginning of the final file format began to take shape. I was missing namespace info on the top line of the file and of course I need the Return section in as well.
Next example: the big T5Slip section
Here is where things got interesting. I got a certain point very quickly because the core file format was similar to the above code. Again, here’s a simplified example with some additional nuances thrown in to handle null values. The table this data is from is a different table than the organizational info but what I did include was one field in common to link the two together later. I chose the FilerBN (business number) field but could have easily made up a field instead. You’ll notice too, like the CRA file, the element names are the same for some data. I did that intentionally to mimic the CRA format (and to avoid having to make up new examples!).
SELECT (SELECT RcpName as BusName1
NULLIF(RcpName2,’’) as BusName2 – ** see note 1 **
FROM MySQLTable2
WHERE RecipientType > 2 – ** see note 2 **
FOR XML PATH(‘BusinessName’),TYPE),
(SELECT RcpSurname as Surname,
RcpFirstName as GivenName,
NULLIF(RcpInitial,’’) as Initial – ** see note 1 **
FROM MySQLTable2
WHERE RecipientType < 3 – ** see note 2 **
FOR XML PATH(‘IndividualName’),TYPE),
(SELECT RcpAddr1 as Addr1,
RcpCity as City,
RcpProv as Prov,
RcpCountry as Country,
RcpPC as PC
FROM MySQLTable2
FOR XML PATH(‘RecipientAddress’),TYPE),
‘CAD’ as CurrencyID,
B.RecipientType as RcpType,
(SELECT InterestAmount as IntAmt
FROM MySQLTable2
FOR XML PATH(‘T5_Amounts’),TYPE)
FROM MySQLTable2 B
FOR XML PATH(‘T5Slip’), TYPE
Here’s how that code turned out. I had 4 “records” (slips), one per person or business that I was expecting to see. I’ve expanded 2 of the 4 to highlight specific things.
Notes from above:
- NULLIF was a simple way to correct what for me is a data issue. My empty records are empty strings, not NULLs, so if I don’t do something to hide them, I will get an empty tag, which the CRA does not like. So, on fields like a 2nd business name (where the name is too long for one field), or a person’s initial which I may not have, I used NULLIF(fieldname,’’) to indicate if the value is an empty string, treat it as null. The result, for the first T5Slip shows the Initial element is not showing on Jane Smith because she didn’t give me her middle initial.
- In order to hide an entire Parent-Child section if null, I had to use a where clause on the subquery. Without this, I would have had below an empty set of tags. The CRA schema does not allow this. If a tag is present, the minimum length check should be 1 character or more (for some fields at least). You’ll note that the first slip contains no BusinessName element and the second slip contains no IndividualName element.
- One other observation: the CRA spec indicated where an & was used, I needed to put ‘amp;’ in the file after it. Initially, I literally was going to code a REPLACE in to do that, until I realized quite by accident that SQL generates it that way for you auto-magically. Cool. One less thing to worry about! (J&S Plumbing in the 2nd slip as the example).
Another round of nested selects
Once again, I wanted to visualize this nested inside a element instead a element inside a element. That meant nesting that code above inside 3 more select statements. The one difference here compared to my first example is I was selecting from my first table that only has one record. When I selected from the same “table2” detail table I would get a many:many thing going on and far too many records. Here the example, slimmed down to ignore the details in the actual working sections within.
SELECT – Submission level
(SELECT – Return level
(SELECT – T5 level
(** T5Slip code **)
FROM MySQLTable1 D
WHERE D.FilerBN = Z.FilerBN
FOR XML PATH(‘T5’), TYPE) – D - T5level
FROM MySQLTable1 E
WHERE E.FilerBN = Z.FilerBN
FOR XML PATH(‘Return’), TYPE) – E - Return level
FROM MySQLTable1 Z
FOR XML PATH(‘Submission’) – Z - Submission level
Result:
Here’s what it would look like if I had joined section D to the details (2nd) table not the summary (1st) table: a duplication of the data 4 times in the T5 level!
Putting it altogether
This was where I thought I was stuck originally. I got to this point pretty quickly, detours aside (from my previous post) and there was a point at which I couldn’t get it all in one Submission element. As soon as I tried to combine the T619 with the Return, I got some weird format back, not XML.
It turns out I was missing something SUPER basic and that was in my T619 section in my original code (not in this post), I used the tag ELEMENTS not TYPE. Well, that was one annoying little detail I missed that caused all hell to break loose when I thought I was close to getting a file ready.
Adding the T619 and T5Summary in
Here’s the above example, now with the other 2 working sections pieced in. Italics indicates the code above is duplicated, and bold, non-italic text indicates the new parts. Basically each of the other pieces becomes a select in brackets with a comma at the end to separate it from the next “field” selected in the query.
SELECT – Submission level
(** T619 code **), -- A - T619
(SELECT – Return level
(SELECT – T5 level
_ _ (** T5Slip code **), – B - T5Slip
(** T5Summary code **) – C - T5Summary
FROM MySQLTable1 D
WHERE D.FilerBN = Z.FilerBN
FOR XML PATH(‘T5’), TYPE) – D - T5level
FROM MySQLTable1 E
WHERE E.FilerBN = Z.FilerBN
FOR XML PATH(‘Return’), TYPE) – E - Return level
FROM MySQLTable1 Z
FOR XML PATH(‘Submission’) – Z - Submission level
End result
Recap
In the end, there’s a lot going on in this post but I know it might have helped me if I had found something like this when I was working through the issues. Using XML PATH and “TYPE” throughout got me the results I needed.
Next, I’ll cover how I got namespaces to work and another around using a parser in Notepad++ to validate the file against the CRA xsd schema files.