Fun with XML, part 1

This is going to be a short series of posts (short series, not short posts!) about my journey with creating an XML file for T5 filing with the Canada Revenue Agency (CRA for short). So many things happened in the last couple of weeks on this, that there were plenty of lessons learned to share and stories to tell! The story overall was one of a lot of frustration at times but in the end, a lot of pride in figuring things out with a deadline looming and ultimately, successfully creating an XML file to submit on time for our T5’s.

Background

The short version is a T5 form needs to be issued for investment income paid out to a business or individual in a given tax year once the total for the year exceeds $50. For businesses like the one I work for, we pay interest income on deposits, and we issue T5 forms every year. The vast majority of the forms we file are generated from interest paid out via our customer information system (CIS for short) (against security deposits on hydro accounts) and thankfully, that system has a function to generate a T5 XML file for the CRA. Oddly enough, it doesn’t print them properly so we have a separate “issue” each year of printing on pre-printed T5 forms. Any time you’re printing on pre-printed forms, it makes for all sorts of fun. But I digress…

What a T5 form looks like, typically 3 up on a legal sized piece of paper

There is another class of deposits that fall outside of the realm of our “hydro” business that incur interest payments and for those, there is no system to track the interest or generate the T5’s automatically. In previous years, there haven’t been that many of them and we have been able to file them via a Web Forms application the CRA provides. “Filing” in this case meant typing each T5 in to a system manually, but the number of forms was relatively minimal such that it wasn’t much of a burden.

Fast forward to this year and before the story gets too much longer: this year we were going have dozens of T5’s to file, and more than 100 in fact, which is the threshold at which a filer must submit the T5’s electronically, via XML. That’s where the story begins…

The Journey

Here is the relatively short version of the journey I went through, with plenty of self-doubt and questioning my assumptions along the way. If nothing else, this may serve as a warning to others (LOL!).

How hard can this be?

I remember thinking that exact thought when I was first alerted to “the issue” (the issue being “OMG we need to create an XML file for our T5’s this year”). In fact, it seemed so straight-forward that I sat on it for a week while I got to a logical breaking point on another development project I was working on before I even contemplated the details of this task.

That was my first mistake. If I ever say “how hard can this be?” for another thing in my lifetime, I hope someone reminds me of this and makes me re-live this experience again to humble me back into submission.

It’s got to be possible in Excel right?

As mentioned above, our CIS system has the functionality to generate an XML file in the proper format and we’ve submitted that for years for those T5’s. Since we could never print straight from that product though, a few years ago I macgyver’d a mail merge template in Word (of all things) to read the XML file in Excel and mail merge create the T5 forms. It’s worked like a charm for 3 years despite not being super fancy. Fitting things on a pre-printed form using Word is no more fun than it sounds but miraculously, it’s survived 3 year ends in a row including this year.

Well, back in my “how hard can this be” days, I figured Excel has some XML import/export functions and I assumed it would be a no brainer to mimic the file format from our CIS and create the XML from Excel.

You should have seen the look on my face when I realized it actually wasn’t as easy as I first thought. “Oh shit” came to mind, but with worse choice of words mixed in there. There were a couple of errors I got, either this one that says “Cannot save or export XML data. The XML maps in this workbook are not exportable.” or a sister message to it (which I don’t have a screenshot for) relating to there being “denormalized data” and thus it wouldn’t export either.

Let’s create a SQL query

Since Excel was out of the equation (or so I thought, I will come back to Excel later), my natural instinct as a non-developer was to use SQL and create a process around it. I consider my SQL skills fairly strong, albeit not creating XML specifically, but even still, I had the confidence I could figure it out. I found several examples of SQL queries online which guided me to a certain point, and dove into learning all about FOR XML.

SQL worked well up to a point and then I got stuck. I was able to figure out the “easy” part of the file, a T619 record that contains information about our company as the “filer” of the forms. I had examples from my searches on how to get to the more complex piece but the multi-level nested XML was giving me a headache.

Stop & re-evaluate what you know

At this point in the journey, I started keeping “score” like it was a game. It’s a coping mechanism, what can I say?

XML 2, Jen 0.

Each time I ran into a stumbling block, I forced myself to stop and re-evaluate the end game. It was starting to take a lot more time than I originally envisioned when I decided to take this on myself (vs. farm the work out to a developer).

First, I needed to re-confirm some assumptions, the biggest one was: “Is this a one-time event or will we have >100 T5 forms for the foreseeable future?” If the answer was “last year was an anomaly and we’ll never need to generate anywhere near 100 forms ever again”, then I would have stopped and the focus would then be “what’s the quickest, dirtiest way to create a file and be done with it?”. However, the answer was we are more than likely going to have this many for a while so that confirmed my approach to create a proper end to end process around this that we can re-use for several years (but only build it once).

What about SmartConnect?

Once I had confirmed that, the direction was clear(er): work on a repeatable process. It was then that I remembered SmartConnect from eOne Solutions has an XML Destination option. BEAUTIFUL, I thought to myself. From a process perspective, I’d already mapped out that the users, who have the recipients and interest amounts in Excel, would kick the process off by running a SmartConnect map to load the data to a staging table in SQL. So, having a second map that they could run to create the XML file itself would have been brilliant.

Alas, the file format proved to be too complex for SmartConnect. I won’t go into details on this one. I imported the sample XML file I mocked up, mapped the fields and it was SO EASY, but when I ran the map, I got this for each record in my source: (“SmartConnect does not support XML destination output where parent nodes are unmapped or null.”).

XML 3, Jen 0.

Cue the sad trombone. It seemed SO promising, and would have been a perfect solution, process-wise. The file format would possibly contain areas that are unmapped – it has sections for individuals and businesses so on every record one or the other would be unmapped and ideally the XML file produced would ignore it. Unfortunately SmartConnect couldn’t handle it being unmapped. Bummer.

SQL take 2

I was back looking at the SQL query again and whatever point I was at, I was stuck once again after making a very slight bit of progress from where I had left off.

Stop & re-evaluate once again

Just like earlier in this post, when I got to a seemingly insurmountable road block, I would have another moment of self doubt. Did I mention that T5 filings are due on Feb 28th each year? The days were rapidly disappearing as I worked on this and what seemed like plenty of time was relatively quickly becoming “OMG!”. I started to second-guess that I made the right decision to keep the development in my own hands, although I had made great progress and really should have had more confidence in my abilities. Still, it seemed like I was spinning my wheels so once again I stopped and looked at my options.

SmartConnect was failing because of unmapped parent nodes, which at the time I associated with my data having some individuals (with null/empty business sections) and vice versa on the business side. I went down a brief side road option of splitting my task into two parts. There was no requirement for all of our T5’s to be in one file so why not think outside the box and create a file for individuals and a file for businesses if that is more feasible given the tools I have at my disposal? Who cares right? The CRA wouldn’t have an issue with it, we already had been filing 2 submissions per year all this time anyway.

SmartConnect take 2

This didn’t take too long in terms of detours. I was learning that if I wanted to check the feasibility of a different approach, I was finding the shortest way possible to evaluate if it was viable or not so I wouldn’t get too far into the weeds on any one idea.

I found the second kick at the SmartConnect “can” also failed. The secondary issue was the CRA file format repeats node names throughout the file like address fields for instance. SmartConnect couldn’t handle the same node names twice in different parts of the file even though they fell in different sections under different parent nodes. Time to put this avenue to rest.

Excel take 2

XML 4, Jen 0.

At this point, it was late last week and I was hitting the “phone a friend” stage. I reached out on Twitter to my normally super useful followers and friends and once again fellow MVP Steve Endow came to my rescue with a couple of tips from some things he found about Excel that looked like avenues I hadn’t yet explored. We briefly discussed the feasibility of him creating a .Net app for this but I felt that the target market for this would be too small for him to waste his time. In brief, while many organizations need to create T5’s, I figured the vast majority use software that can produce what is needed or they are too small to require software at all and use the CRA web forms site. The number of those in between firms that are large enough to have >100 forms but not have software seemed like a sweet spot that would be hard to capture.

Anyway, to make a long story a tiny bit shorter, what Steve sent me looked SUPER promising and so much so that I jinxed myself by celebrating early on Friday that I had found the solution. I checked the file format and it was using headings not data values for the company info but the data section was perfect and I was thrilled. Minor changes, so I thought, and I’d be off to the races.

No so fast there Jen…

Once I changed the mapping on those company fields, I was right back to the “you have denormalized data” and XML was not exportable. FFS.

XML 5, Jen 0.

SQL take 3

At this point, I was getting more stressed but intent on finishing this up even if it meant working all weekend on my own time to figure it out. I wasn’t about to let this thing get the better of me. I’m nothing if not persistent… sometimes to my detriment.

Success!

I worked the rest of Friday afternoon on the file and still was stumbling and felt like I was going in circles on the same damn problem over and over. I’ll get into the technical details on another post but I had successfully created a working “T619” section and a working “Return” section but for the life of me, I could not figure out how to combine the 2 sections together under one parent “Submission” node. I went home, I played with it some more, I went to play my regular Friday night hockey game, I came back home and played with it some more. Wouldn’t you know it, at 11:30 at night I figured out that I had ONE DAMN WORD WRONG in my syntax and once I saw it (comparing to the example I was basing my code on), it worked! Like holy shit, it’s actually working! OMG! I can sleep now! (and I must say I had the best night’s sleep that night!)

Filing with the CRA failed

I left the code alone all weekend, now that the panic was over. I had some minor things to sort out (and still do) but nothing that was preventing us from filing. 3 hours after filing yesterday morning, the dreaded email came back that it failed validation. It looked perfect to me, and yet, it wasn’t. I will get into more about this in a post about using a parser to validate against the XSD schema. Suffice to say, the CRA format does not like empty tags, If the tag is present, the validation expects a minimum value length of 1 character. Dumb, since most apps know to ignore empty tags but it’s their file… and that was relatively easy to fix. After working through checking my file against the CRA XSD schemas, I finally got the message I was looking for. We re-filed and all is good.

Next posts in the series…

So, this post was about the story and the journey but no technical details. If you’re on this post because a search hit some keywords, rest assured I intend to finish up with 2-3 more posts about some of the technical details. I will update this post with the related posts once I write them.

Roughly, here is what I plan to write about:

There may be more than that, but at this point, that’s the logical next few posts (perhaps not in that exact order). Stay tuned!

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to top