Too Much Going On
We recently wrote a piece on the importance of starting with the KPI when devising a QA automation strategy for analytics tags.
But the reality for many who work in organizations managing dozens (nay, hundreds) of web properties is that sometimes you simply don't have the luxury of starting where you should.
Analysts spread thin over many websites, often dealing with multiple dev teams and stakeholders with competing priorities, need all the help they can get when they're asked to enforce organization-wide standards and compliance.
"These are just the websites we know about..."
Unsanctioned tags fired by an under-researched social share plugin someone added; PII leaks in the latest conversion form update, shipped by a team that didn't realize their code changes affected tracking; dataLayer defects in obscure corners of the website that go unnoticed for months (until the quarterly reports are due). These all-too-common crises usually hit all at once, on a Monday from hell.
The Challenges of Deep Auditing
Traditional QA automation tools, for all their emphasis on "the big picture", fall short in several auditing use cases that we consider foundational:
- Ability to inspect authenticated and personalized content, not just the public pages of a website.
- Ability to catch PII leaks that occur after key interactions such as submitting a form or going through a social login. That is, most PII leaks!
- Ability to not just flag unsanctioned tags and cookies, but also to help pinpoint their source.
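To make the third point concrete, here is a minimal Python sketch of what "pinpointing the source" can look like once you have a network log in hand. It is an illustration only, not the QA2L implementation: the allowlist, the function name, and the (request URL, initiator URL) input shape are all assumptions made for the example.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical allowlist of hostnames for approved tag vendors.
SANCTIONED_HOSTS = {"www.google-analytics.com", "www.googletagmanager.com"}


def find_unsanctioned(requests, sanctioned=SANCTIONED_HOSTS):
    """Group requests to non-allowlisted hosts by the script that initiated them.

    `requests` is an iterable of (request_url, initiator_url) pairs, for
    example as exported from a browser's network log (assumed input shape).
    """
    by_initiator = defaultdict(set)
    for request_url, initiator_url in requests:
        host = urlsplit(request_url).hostname or ""
        if host not in sanctioned:
            by_initiator[initiator_url].add(host)
    # Sort the hosts so the report is stable from run to run.
    return {initiator: sorted(hosts) for initiator, hosts in by_initiator.items()}
```

Grouping by initiator rather than by page is the point: the report leads with the plugin or script responsible, which is usually what the dev team needs in order to act.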
"Turns out, all our PII leaks were under the surface..."
The QA2L toolkit has included these key enhancements for a while now, but until recently they were only available in our full automation and scheduling UI.
Recognizing that some of our power users prefer to work in places where multiple tools can converge as needed, we are rolling out a set of new API calls that bring advanced auditing features directly into external applications like Google Sheets.
Here's a breakdown of how these features come together to meet the challenge of the 3 power-auditing use cases above:
1. Ability to inspect authenticated and personalized content, not just the public pages of a website.
Let's get into the nuts & bolts. To reach authenticated or personalized content, the session typically has to perform a series of actions that uniquely identify the user. Subsequent content in the session may be geared towards that user, and thus expose tags to automated QA that a "dumb" site crawler would never see.
For a long time now, site scans and audits created in our UI have had the option to specify a "precursor" flow task that can perform any kind of interactive login, including social logins passing through third-party websites like Facebook and LinkedIn.
A similar use case is now supported entirely via the API, with the new Turing Flow method, here shown inside our free Google Sheets Add-on:
Turing Flow is our in-house term for natural-language, self-documenting tasks that involve page interactions.
In the example above, interactions are necessary to authenticate the browser into a restricted staging environment, but you can easily picture a social login instead (going across sites is fine), or even a set of actions that put the user into a certain A/B test group by grabbing a special cookie, for example.
After authentication, we specify a set of checks to be performed on a series of URLs, and we can perform the validation in real time. We can tweak checks based on the values seen and iterate as many times as needed.
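For readers who think in code, the "checks over a series of URLs" idea can be sketched in a few lines of Python. This is not the QA2L API or the Turing Flow syntax; the parameter names, the patterns, and the shape of the captured data are hypothetical and chosen only to illustrate the check-tweak-rerun loop.

```python
import re

# Hypothetical checks: tag parameter name -> regex its value must fully match.
CHECKS = {
    "tid": r"UA-\d+-\d+",
    "dl": r"https://staging\.example\.com/.*",
}


def validate_hit(params, checks=CHECKS):
    """Return a list of failure messages for one captured tag hit."""
    failures = []
    for name, pattern in checks.items():
        value = params.get(name)
        if value is None:
            failures.append(f"{name}: missing")
        elif not re.fullmatch(pattern, value):
            failures.append(f"{name}: unexpected value {value!r}")
    return failures


def validate_pages(hits_by_url, checks=CHECKS):
    """Map each URL to its failures, omitting URLs that pass every check."""
    results = {}
    for url, params in hits_by_url.items():
        failures = validate_hit(params, checks)
        if failures:
            results[url] = failures
    return results
```

Because the checks live in a plain dictionary, tightening a pattern after seeing real values and re-running the validation is a one-line change.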
The above screenshot also showcases our ability to inspect Google Tag Manager dataLayer variables and to execute arbitrary code during our tests. In this case, we push a dataLayer event to trigger a set of DoubleClick, AdWords, and Facebook conversion tags more quickly.
2. Ability to catch PII leaks that occur after key interactions such as submitting a form or going through a social login.
Anyone who tells you that you can rest easy with page-level audits as a way to identify PII leaks is telling you a fairytale you want to hear, not the truth.
The vast majority of PII leaks take place after the user has entered some data into a website form or is returning from a login sequence with some unfortunate URL parameters that get auto-collected.
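As a rough illustration of the URL-parameter case, the Python sketch below scans a URL's query string for email-like values and for parameter names that commonly carry personal data. The regex, the list of suspect names, and the function name are assumptions for the example, not a complete PII detector; it also decodes only one level of URL encoding, so doubly encoded values would slip through.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# Deliberately simple email pattern; good enough to flag candidates for review.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical list of parameter names that often carry personal data.
SUSPECT_KEYS = {"email", "phone", "firstname", "lastname", "name"}


def find_pii(url):
    """Return (parameter, reason) pairs for query parameters that look like PII."""
    findings = []
    # parse_qsl percent-decodes keys and values once (e.g. %40 becomes @).
    for key, value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
        if EMAIL_RE.search(value):
            findings.append((key, "email-like value"))
        elif key.lower() in SUSPECT_KEYS and value:
            findings.append((key, "suspect parameter name"))
    return findings
```

The same function can be pointed at outgoing tag requests as well as page URLs: when a tag auto-collects the page location, the leaked address shows up inside that tag's own query parameter.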
The honest truth? Automating the detection of PII leaks happening "in the deep" is not easy. But with the help of the right tools, you can at least set a close watch on all the workflows that are likely to spring a leak—those involving the end user sharing information about themselves: