Navigating Rough Data Seas

Principles For Governing Clickstream Data Quality

Governing data is like trying to steer a battered ship. You do your best to chart a course and navigate through the storms on the horizon, and all the while you are bailing water out.

If you disagree, just look at some numbers. A 2017 Experian Data Quality study says executives estimate 33% of their organization's data to be inaccurate; an earlier study had shown that 92% of organizations had experienced problems because of inaccurate data.

Clickstream data is even worse - its nature and sheer volume can make it feel like all you have is a thimble to bail out the flood.

Whether it's Adobe Analytics, Google Analytics or your own custom tracking system, you have probably experienced the horror first-hand: wrong revenue numbers, misclassified campaign codes, missing KPIs... More often than not, data is in shambles.

So what can you do? What are some of the ways to dress gaping wounds in the hull, steer through one rocky storm after another, and get aboard a sustainable clickstream tracking implementation?

At QA2L, we believe in these five guiding principles for a sound governance strategy to support sustainable clickstream data tracking:

1. Focus on the big picture

Your most important metrics are called Key Performance Indicators (KPIs) for a reason - they are the keys to your digital kingdom. Your tracking strategy started with the definition of these KPIs. They measure the pulse of your business and give you a direct answer to the most important question: "How is my business doing?"

And if that's not enough, these metrics get all the eyeballs - such KPIs are discussed by everyone in your organization, from your data analysts and product managers to your C-suite and Wall Street stock analysts. So maintaining the quality of these metrics should top your agenda. 

For each KPI, your data governance dictionary should include at minimum these four artifacts: 

  • A concise business definition of the KPI.
  • A technical definition describing how this data is collected, with details on which tags are used to report on it.
  • The list (and description) of flows on your site/app that a user can take in order to complete the event(s) that will increment the KPI number.
  • Scheduled (preferably automated) audits of each of the flows in the above bullet.
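
As a rough illustration, the four artifacts above could be kept together in one record per KPI. This is only a minimal sketch: the class name, field names, and the completeness check are assumptions made for the example, not the format of any particular governance tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KpiEntry:
    """One record in a hypothetical data governance dictionary."""
    name: str
    business_definition: str   # concise, plain-language meaning of the KPI
    technical_definition: str  # how the data is collected and which tags report it
    flows: List[str] = field(default_factory=list)  # user paths that increment the KPI
    audit_schedule: str = ""   # e.g. "daily"; empty means no audit is scheduled yet

    def missing_artifacts(self) -> List[str]:
        """Return the names of the artifacts that have not been filled in."""
        missing = []
        if not self.business_definition.strip():
            missing.append("business_definition")
        if not self.technical_definition.strip():
            missing.append("technical_definition")
        if not self.flows:
            missing.append("flows")
        if not self.audit_schedule.strip():
            missing.append("audit_schedule")
        return missing


new_accounts = KpiEntry(
    name="New Accounts",
    business_definition="Number of accounts successfully created on the website.",
    technical_definition="",  # not documented yet
    flows=["Account create page >> Form fields >> Successful account creation"],
    audit_schedule="daily",
)
print(new_accounts.missing_artifacts())  # ['technical_definition']
```

A check like missing_artifacts() makes it easy to list which KPIs are still under-documented before relying on them.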

The last bullet leads us to the next stepping stone in your clickstream data governance strategy:

2. Scheduled (preferably automated) audits of your flows

To better define this step, let's use an example.

One of your KPIs is the number of New Accounts created on your website. To create an account, a user needs to go to the account create page, fill out a set of form fields, and successfully submit their information.

The exact sequence of steps that increments the "New Accounts" metric is: Going to the account create page >> Filling out the form fields >> Successful account creation.

Your scheduled audits should walk through each of these steps and confirm that the tags associated with the account create event fire with the expected values. When done manually, such verifications are time-consuming and subject to human error.
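
To make the idea concrete, here is a minimal sketch of the comparison at the heart of a flow audit. It assumes the tag requests for each step have already been captured (for example by a browser automation tool) and only shows the validation logic. The step names follow the example above; the tag parameter names and values are hypothetical.

```python
# Expected tag parameters for each step of the "New Accounts" flow.
# Parameter names and values are made up for illustration.
EXPECTED_FLOW = [
    ("Account create page", {"page_name": "account:create"}),
    ("Form fields filled out", {"event": "form_complete"}),
    ("Successful account creation", {"event": "account_created"}),
]


def audit_flow(expected_flow, captured_hits):
    """Compare captured tag parameters with expectations, step by step.

    captured_hits maps a step name to the dict of parameters seen in the
    tag request for that step. Returns a list of readable failure messages.
    """
    failures = []
    for step, expected in expected_flow:
        hit = captured_hits.get(step)
        if hit is None:
            failures.append(f"{step}: no tag fired")
            continue
        for key, want in expected.items():
            got = hit.get(key)
            if got != want:
                failures.append(f"{step}: {key} expected {want!r}, got {got!r}")
    return failures


captured = {
    "Account create page": {"page_name": "account:create"},
    "Form fields filled out": {"event": "form_complete"},
    "Successful account creation": {"event": "account_create"},  # wrong value
}
for failure in audit_flow(EXPECTED_FLOW, captured):
    print(failure)
```

An empty list means every step of the flow fired the tags the KPI depends on; anything else points at the exact step and parameter that broke.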

Automated data quality platforms allow you to build out such flows as part of your data quality strategy. The best-of-breed solutions offer features such as visual design of flows (no coding required), automated discovery of tags (triggered by page loads or by other user interactions), robust validation logic, and real-time preview of audit results.

This type of auditing is very different from wholesale scans of an entire domain, where the robot is simply trying to discover a basic pattern for generic tags/elements that should be present on every page. The whole point of flow-based auditing is to be able to narrow down the audits and the results to a very specific event on the website that is directly linked to one of your KPIs.

3. Data Quality audits should be integrated in the Software Development Life Cycle

Automated audits of your key flows done on a schedule achieve the goal of alerting you when something breaks. But even then, you are in a reactive position - trying to fix something that may have broken without your knowledge.

What if you could prevent tags from breaking in the first place? What if your data governance strategy integrates tagging quality checks as part of the SDLC and thus prevents the majority of tag issues from surfacing in production? What if such automation could be extended not just to individual tagging vendors, but also to your data layer, bullet-proofing the foundation of all of your clickstream data?

The best data governance platforms allow you to do this by enabling features such as:

  • Documenting business requirements and tagging specifications directly as part of your automated data quality checks.
  • Running checks in your QA or Dev environments.
  • Providing the necessary APIs for your code repositories to connect and automate the checking of tags before they go into production.
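
One way to picture the last point is a small gate that a build pipeline runs after the tag audits complete in QA, blocking the release when any audit fails. The sketch below is an assumption about how such a step might look: the function name, the result format, and the audit names are hypothetical, and the results would normally come from your audit platform rather than a hard-coded dict.

```python
import sys


def gate(audit_results, environment):
    """Return a process exit code: 0 if every audit passed, 1 otherwise.

    audit_results maps an audit name to True (passed) or False (failed).
    """
    failed = [name for name, passed in audit_results.items() if not passed]
    for name in failed:
        print(f"[{environment}] tag audit failed: {name}", file=sys.stderr)
    return 1 if failed else 0


# Hypothetical results from an audit run against the QA environment.
results = {
    "New Accounts flow": True,
    "Data layer: page_name present": False,
}
exit_code = gate(results, environment="qa")
print(exit_code)  # 1
# In a pipeline step, finishing with sys.exit(exit_code) would fail the build.
```

A non-zero exit code is the conventional signal most build systems use to stop a deployment, which is what keeps a broken tag out of production.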


4. Leverage the best collaboration tools

Platforms such as Slack have revolutionized many team communication workflows. So wouldn't it make sense for your data governance strategy to incorporate these best-in-class solutions and let your team take advantage of the full suite of collaboration features such platforms offer?

Leveraging collaboration platforms can bring a number of benefits:

  • Receiving timely updates in any Slack channel when something breaks.
  • Checking at a glance the list of steps and the KPIs that have been audited.
  • Managing your auditing tasks directly from your Slack channel.
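
For the first benefit, a minimal sketch of an alert could rely on Slack's incoming webhooks, which accept a JSON body with a "text" field. The message wording, function names, and the way failures are passed in are assumptions for illustration, and the webhook URL must be one you create for your own workspace.

```python
import json
import urllib.request


def build_alert(flow_name, failures):
    """Build the JSON payload for a Slack incoming webhook message."""
    lines = [f"Tag audit failed for flow: {flow_name}"]
    lines += [f"- {failure}" for failure in failures]
    return {"text": "\n".join(lines)}


def send_alert(webhook_url, payload):
    """POST the payload to the webhook URL and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


payload = build_alert(
    "New Accounts",
    ["Successful account creation: no tag fired"],
)
print(payload["text"])
# send_alert(your_webhook_url, payload) would deliver it to the channel
# tied to that webhook; it is not called here.
```

Keeping message construction separate from delivery makes the alert text easy to test without posting to a real channel.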

5. Keeping everyone honest

Part of your data governance strategy should include publishing a data quality score, a metric in itself that keeps the organization honest and makes any data quality issues transparent, documenting when they occurred and when they were resolved. And what if you could publish this quality score directly in tools such as Google Analytics or Adobe Analytics, where analysts will be able to retrieve it as they retrieve any other metric? The benefits of such an approach include:

  • Providing your analysts with a direct gauge for the quality of the data as they work with the data. No longer would your colleagues need to ask if the metric is correct. The audit results will be right in front of them telling them that a metric passed 7 of the 7 audits for that week, or that a dimension started failing its audits after the 29th day of the month.
  • The audit score would get updated in real time as the audits take place, presenting the analysts with a living history of what the audit score is at different times for different reports. As a result, when dealing with historical data (that has been audited), analysts will know whether they need to adjust for any tagging gaps, without having to dig through the emails or JIRA tickets where issues such as these are usually documented.
  • Perhaps the greatest benefit is that the quality of the data will be consistently and objectively communicated on a company level. It is a fundamental change in the perception of tagging quality. Data quality becomes a KPI in itself, a metric that is tracked over time with the potential to build a strategy around addressing the reasons that lead to gaps in tagging quality.
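
The score itself can be very simple. The sketch below assumes a score defined as the percentage of audits passed in a period, and also reports when failures began, mirroring the "7 of 7" and "29th day" examples above. The function names and dates are hypothetical, and publishing the number into an analytics tool is not shown.

```python
from datetime import date


def quality_score(audits):
    """Percentage of audits that passed, or None if nothing was audited.

    audits is a list of (day, passed) pairs.
    """
    if not audits:
        return None
    passed = sum(1 for _, ok in audits if ok)
    return round(100 * passed / len(audits), 1)


def first_failure(audits):
    """Date of the earliest failed audit, or None if every audit passed."""
    failed_days = [day for day, ok in audits if not ok]
    return min(failed_days) if failed_days else None


# A metric that passed 7 of 7 daily audits in a week.
week = [(date(2024, 1, d), True) for d in range(1, 8)]
print(quality_score(week))  # 100.0

# A dimension that starts failing on the 29th of the month.
month_end = [(date(2024, 1, d), d < 29) for d in range(27, 32)]
print(quality_score(month_end))  # 40.0
print(first_failure(month_end))  # 2024-01-29
```

Stored alongside each report, these two values tell an analyst both how trustworthy a period is and where any adjustment to historical data should start.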

So how do you account for clickstream data in your Data Governance Strategy? Curious to see how QA2L addresses these issues?

Tags: Data Governance, Data Quality
