Supercharge Your Tag & Compliance Audits With This Toolkit

Too Much Going On

We recently wrote a piece on the importance of starting with the KPI when devising a QA automation strategy for analytics tags.

But the reality for many who work in organizations managing dozens (nay, hundreds) of web properties is that sometimes you simply don't have the luxury of starting where you should.

Analysts spread thin over many websites, often dealing with multiple dev teams and stakeholders with competing priorities, need all the help they can get when they're asked to enforce organization-wide standards and compliance.

"These are just the websites we know about..."

Unsanctioned tags fired by an under-researched social share plugin someone added. PII leaks in the latest conversion form update, shipped by a team that didn't realize their code changes affected tracking. dataLayer defects in obscure corners of the website that go unnoticed for months (until the quarterly reports are due). These all-too-common crises usually hit all at once, on a Monday from hell.

The Challenges of Deep Auditing

Traditional QA automation tools, for all their emphasis on "the big picture", are in fact crucially flawed when it comes to assisting in many auditing use cases that we consider foundational:

Ability to inspect authenticated and personalized content, not just the public pages of a website.
Ability to catch PII leaks that occur after key interactions such as submitting a form or going through a social login. That is, most PII leaks!
Ability to not just flag unsanctioned tags and cookies, but also to help pinpoint their source.

"Turns out, all our PII Leaks were under the surface..."

Our Answers

The QA2L toolkit has included these key enhancements for a while now, but until recently they were only available in our full automation and scheduling UI.

Recognizing that some of our power users prefer to work in places where multiple tools can converge as needed, we are rolling out a set of new API calls that bring advanced auditing features directly into external applications like Google Sheets.

Here's a breakdown of how these features come together to meet the challenge of the 3 power-auditing use cases above:

1. Ability to inspect authenticated and personalized content, not just the public pages of a website.

Let's get into the nuts & bolts. To reach authenticated or personalized content, the session typically has to perform a series of actions that uniquely identify the user. Subsequent content in the session may be tailored to that user, and thus expose tags that a "dumb" site crawler would never get to see.

For a long time now, site scans and audits created in our UI have had the option to specify a "precursor" flow task that can perform any kind of interactive login, including social logins passing through third-party websites like Facebook and LinkedIn.

A similar use case is now supported entirely via the API, with the new Turing Flow method, here shown inside our free Google Sheets Add-on:

Turing Flow is our in-house term for natural-language, self-documenting tasks that involve page interactions.

In the example above, interactions are necessary to authenticate the browser into a restricted staging environment, but you can easily picture a social login instead (going across sites is fine), or even a set of actions that put the user into a certain A/B test group by grabbing a special cookie, for example.

After authentication, we specify a set of checks to be performed on a series of URLs, and we can perform the validation in real-time. We can tweak checks based on the values seen and iterate as many times as needed.
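To make the idea of "checks" concrete, here is a minimal Python sketch of validating the parameters of a single captured tag request. It is not the QA2L API. The function name `check_beacon`, the `RULES` table, and the sample parameter names are assumptions made purely for illustration.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Hypothetical rules: parameter name -> regex the value must fully match.
RULES = {
    "tid": r"UA-\d+-\d+",
    "t": r"pageview|event",
}

def check_beacon(beacon_url, rules):
    """Return {parameter: (status, value)} for one captured tag request URL."""
    params = parse_qs(urlsplit(beacon_url).query, keep_blank_values=True)
    results = {}
    for name, pattern in rules.items():
        values = params.get(name)
        if not values:
            # The tag fired without this parameter at all.
            results[name] = ("missing", None)
        elif re.fullmatch(pattern, values[0]):
            results[name] = ("pass", values[0])
        else:
            results[name] = ("fail", values[0])
    return results
```

Tightening or loosening a check is then a one-line change to the rules table, which is the kind of quick iteration described above.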

The above screenshot also showcases our ability to inspect Google Tag Manager dataLayer variables and to execute arbitrary code during our tests. In this case, we push a dataLayer event to more quickly trigger a set of DoubleClick, AdWords, and Facebook conversion tags.

2. Ability to catch PII leaks that occur after key interactions such as submitting a form or going through a social login.

Anyone who tells you that you can rest easy with page-level audits as a way to identify PII leaks is telling you a fairytale you want to hear, not the truth.

The vast majority of PII leaks take place after the user has entered some data into a website form or is returning from a login sequence with some unfortunate URL parameters that get auto-collected.

The honest truth? Automating the detection of PII leaks happening "in the deep" is not easy. But with the help of the right tools, you can at least set a close watch on all the workflows that are likely to spring a leak: those involving the end user sharing information about themselves.
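As a rough illustration of what "keeping watch" can mean once the tag requests from such a workflow have been captured, the Python sketch below scans the query parameters of a request URL for email addresses. It is an assumption-laden example rather than a description of how QA2L detects PII: the function name `find_pii`, the single email pattern, and the sample URL in the usage note are all hypothetical, and a real audit would need more patterns (phone numbers, names, hashed values) and would also look at request bodies.

```python
import re
from urllib.parse import urlsplit, parse_qsl

# Deliberately simple pattern; a production check would need to be stricter.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def find_pii(request_url):
    """Return (parameter, kind, match) for each query parameter that contains an email."""
    findings = []
    # parse_qsl percent-decodes values, so "jane%40example.com" is seen as "jane@example.com".
    for name, value in parse_qsl(urlsplit(request_url).query, keep_blank_values=True):
        match = EMAIL_RE.search(value)
        if match:
            findings.append((name, "email", match.group(0)))
    return findings
```

For instance, a tag request whose page-location parameter carries `...%2Fthanks%3Femail%3Djane%40example.com` would be flagged on that parameter, which is exactly the "unfortunate URL parameters that get auto-collected" scenario.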

3. Ability to not just flag unsanctioned tags and cookies, but also to help pinpoint their source.

We like looking at tag hierarchies as much as the next analyst. They're pretty. Unfortunately, they can also be pretty mysterious:

Occasionally, hierarchies will show rogue tags that shouldn't be there. But tracking the exact source of those tags is never simple and can consume a lot of time if the auditing & pruning needs to be done at scale.

We recently had the opportunity to help a client with this exact challenge, and did so inside a Google Sheets workflow that the client was comfortable using, publishing internally, and rolling up into their existing organizational documentation.

1. First, QA2L enabled them to scan a wide range of URLs and collect all the tags and cookies, identifying AddThis as one of the unsanctioned vendors:

2. Unfortunately, it wasn't at all obvious how AddThis was even ending up on those pages. For that same set of URLs, our client was able to re-run the scans while blocking individual JS scripts from loading in our testing browser:

After a few quick rounds, they were able to identify the source of both the unwanted tags and the unwanted cookies. The culprit was the one script that, when blocked, lopped off an entire branch in their tag tree and cleaned up their cookie inventory to show only secure, server-side cookies across all URLs placed under the microscope.
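The logic behind those two steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scan results are plain lists of request URLs, the allowlist and all function names (`unsanctioned`, `find_culprits`) are hypothetical, and a script counts as the culprit only if blocking it makes every unsanctioned host disappear.

```python
from urllib.parse import urlsplit

def is_approved(host, approved):
    """True if host equals an approved domain or is a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in approved)

def unsanctioned(request_urls, approved):
    """Step 1: hosts seen in a scan that are not on the allowlist."""
    hosts = {urlsplit(u).hostname for u in request_urls}
    return {h for h in hosts if h and not is_approved(h, approved)}

def find_culprits(baseline_requests, blocked_runs, approved):
    """Step 2: compare re-scans, each made with one script blocked.

    blocked_runs maps a blocked script URL to the request URLs seen
    during the scan in which that script was prevented from loading.
    """
    rogue = unsanctioned(baseline_requests, approved)
    culprits = [
        script
        for script, requests in blocked_runs.items()
        if rogue and not unsanctioned(requests, approved)
    ]
    return rogue, sorted(culprits)
```

In this simplified model, a re-scan that comes back with no unsanctioned hosts points straight at the script that was blocked for that run. A fuller version would also diff the cookie inventory between runs.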

For the techies in our audience, rest assured, we do have other ways of blocking and altering page content dynamically to help track down rogue code. We can also help you perform all kinds of other detailed validation tasks, such as checking cookies set by response headers, monitoring page load speeds, checking for tags hardcoded on the page, etc.
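As one example of such a validation task, here is a minimal sketch of checking cookies set by response headers for missing `Secure` and `HttpOnly` flags. It uses only Python's standard `http.cookies` module. The function name `audit_set_cookie` and the sample header values in the usage note are assumptions for illustration, not part of any QA2L interface.

```python
from http.cookies import SimpleCookie

def audit_set_cookie(header_values):
    """Return (cookie_name, missing_flags) for each cookie lacking Secure or HttpOnly.

    header_values is a list of raw Set-Cookie header values, one per header.
    """
    findings = []
    for raw in header_values:
        jar = SimpleCookie()
        jar.load(raw)
        for name, morsel in jar.items():
            # Flag attributes are True when present and an empty string otherwise.
            missing = [flag for flag in ("secure", "httponly") if not morsel[flag]]
            if missing:
                findings.append((name, missing))
    return findings
```

Given the two headers `sid=abc123; Path=/; Secure; HttpOnly` and `prefs=dark; Path=/`, only `prefs` would be reported, with both flags missing.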

These are just some of the big differences between doing these kinds of audits manually with a tool like Fiddler and using the QA2L API:

We make the process a lot more scalable and streamlined, while also flexible and responsive to changing requirements.
We help you build up a library of reusable tests, which also doubles as your (automated) tag documentation!
We make validation user-friendly so that even junior resources in your organization can be empowered to spearhead complex audits.

In Closing

Auditing tags for compliance at scale is a formidable challenge in all mid-to-large organizations with multiple web properties. You could use a strong ally, and an ally who is also a veteran is even better.

Some tag audit automation platforms tout shallow scans as the be-all and end-all of governance and compliance. Nothing could be further from the truth, or more dangerous to organizations that take compliance seriously.

We, on the other hand, aim to solve the real-world challenges of automating audits and compliance checks. Our recent API enhancements paired with our ever-evolving Google Sheets Add-on are the latest arrows in the unique QA2L quiver to help organizations govern their tracking at scale.


