Navigating Rough Data Seas

Principles For Governing Clickstream Data Quality


Governing data is like trying to steer a battered ship. You do your best to chart a course and navigate through the storms on the horizon, all the while bailing water out of the hold.

If you disagree, just look at the numbers. A 2017 Experian Data Quality study found that executives estimate 33% of their organization's data to be inaccurate; an earlier study had shown that 92% of organizations had experienced problems because of inaccurate data.

Clickstream data is even worse: its nature and sheer quantity can make it feel as if all you have to empty out the flooding water is a thimble.

Whether it's Adobe Analytics, Google Analytics, or your own custom tracking system, you have probably experienced the horror first-hand: wrong revenue numbers, misclassified campaign codes, missing KPIs... More often than not, the data is in shambles.

So what can you do? How do you dress the gaping wounds in the hull, steer through one storm after another, and arrive at a sustainable clickstream tracking implementation?

At QA2L, we believe in these five guiding principles for a sound governance strategy to support sustainable clickstream data tracking:

1. Focus on the big picture

Your most important metrics are called Key Performance Indicators (KPIs) for a reason: they are the keys to your digital kingdom. Your tracking strategy began with the definition of these KPIs. They measure the pulse of your business and provide a direct answer to the most important question: "How is my business doing?"

And if that's not enough, these metrics get all the eyeballs: KPIs are discussed by everyone in your organization, from your data analysts and product managers to your C-suite and Wall Street stock analysts. Maintaining the quality of these metrics should top your agenda.

For each KPI, your data governance dictionary should include at minimum these four artifacts: 

The last bullet leads us to the next stepping stone in your clickstream data governance strategy:

2. Scheduled (preferably automated) audits of your flows

To better define this step, let's use an example.

One of your KPIs is the number of New Accounts created on your website. To create an account, a user needs to go to the account creation page, fill out a set of form fields, and successfully submit their information.

The exact sequence of steps that increments the "New Accounts" metric is: going to the account creation page >> filling out the form fields >> successful account creation.

Your scheduled audits should step through each of these stages and confirm the tags associated with the account creation event. Done manually, such verifications are time-consuming and subject to human error.
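
To make this concrete, here is a minimal sketch of such a flow audit in Python using Playwright (one possible tool choice; the URLs, form selectors, and the expected "new_account" event name are all hypothetical placeholders):

    # Flow-based tag audit: step through the account-creation flow and
    # confirm that the expected analytics beacon fires along the way.
    # The URLs, selectors, and event name below are hypothetical.
    from playwright.sync_api import sync_playwright

    ANALYTICS_HOST = "www.google-analytics.com"  # endpoint to watch for tag fires
    captured = []

    def on_request(request):
        # Record every analytics beacon the page sends during the flow.
        if ANALYTICS_HOST in request.url:
            captured.append(request.url)

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.on("request", on_request)

        # Step 1: go to the account creation page.
        page.goto("https://www.example.com/account/create")

        # Step 2: fill out the form fields.
        page.fill("#email", "audit-bot@example.com")
        page.fill("#password", "s3cret-audit-pw")

        # Step 3: submit and wait for the confirmation page.
        page.click("button[type=submit]")
        page.wait_for_url("**/account/welcome")

        browser.close()

    # Fail the audit if no beacon carried the expected event.
    assert any("new_account" in url for url in captured), \
        "No 'new_account' tag fired during the account-creation flow"

A scheduler (cron, CI, or a data quality platform) can then run this audit hourly or daily and alert you the moment the flow stops tagging correctly.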

Automated data quality platforms allow you to build out such flows as part of your data quality strategy. Best-of-breed solutions offer features such as visual design of flows (no coding required), automated discovery of tags (triggered as part of page-load or non-page-load user interactions), robust validation logic, and real-time preview of audit results.

This type of auditing is very different from wholesale scans of an entire domain, where the robot simply tries to discover a basic pattern of generic tags/elements that should be present on every page. The whole point of flow-based auditing is to narrow the audits and the results down to a very specific event on the website that is directly linked to one of your KPIs.

3. Data quality audits should be integrated into the Software Development Life Cycle (SDLC)

Automated audits of your key flows, run on a schedule, achieve the goal of alerting you when something breaks. But even then, you are in a reactive position, trying to fix something that may have broken without your knowledge.

What if you could prevent tags from breaking in the first place? What if your data governance strategy integrated tagging quality checks into the SDLC and thus prevented the majority of tag issues from ever surfacing in production? What if such automation extended not just to individual tagging vendors, but also to your data layer, bullet-proofing the foundation of all of your clickstream data?
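
As a sketch of what such a check might look like, here is a small CI-style test that validates a page's data layer payload against a JSON Schema before a release ships; the schema, field names, and sample payload are illustrative assumptions rather than a prescribed standard:

    # Validate a data layer payload against a contract in CI, so tagging
    # regressions are caught before they reach production. The schema and
    # payload below are illustrative assumptions.
    import jsonschema

    DATA_LAYER_SCHEMA = {
        "type": "object",
        "required": ["event", "pageType", "revenue"],
        "properties": {
            "event": {"type": "string"},
            "pageType": {"enum": ["home", "product", "checkout", "account"]},
            "revenue": {"type": "number", "minimum": 0},
        },
    }

    def validate_data_layer(payload: dict) -> None:
        # Raises jsonschema.ValidationError on any contract violation,
        # which fails the build before bad tagging ships.
        jsonschema.validate(instance=payload, schema=DATA_LAYER_SCHEMA)

    validate_data_layer({"event": "purchase", "pageType": "checkout", "revenue": 49.99})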

The best data governance platforms allow you to do this by integrating automated tagging checks directly into your development and release process.

4. Leverage the best collaboration tools

Platforms such as Slack have revolutionized team communication workflows. So wouldn't it make sense for your data governance strategy to incorporate these best-in-class solutions and let your team take advantage of the full suite of collaboration features such platforms offer?

Leveraging collaboration platforms can deliver a number of benefits.
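
For example, a failed audit can be routed straight into a team channel the moment it happens. Here is a minimal sketch using Slack's incoming webhooks (the webhook URL and message wording are placeholders):

    # Post a data quality alert to a Slack channel via an incoming webhook.
    # The webhook URL is a placeholder; generate a real one in Slack's app settings.
    import json
    import urllib.request

    SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"

    def post_audit_alert(flow: str, detail: str) -> None:
        # Incoming webhooks accept a simple JSON payload with a "text" field.
        payload = {"text": f":rotating_light: Tag audit failed for *{flow}*: {detail}"}
        req = urllib.request.Request(
            SLACK_WEBHOOK_URL,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req)

    post_audit_alert("New Account creation", "expected 'new_account' event did not fire")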

5. Keep everyone honest

Part of your data governance strategy should be to publish a data quality score: a metric in itself that keeps the organization honest and makes any data quality issues transparent, documenting when they occurred and when they were resolved. And what if you could publish this quality score directly into tools such as Google Analytics or Adobe Analytics, where analysts could retrieve it as they retrieve any other metric? The benefits of such an approach are considerable.
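
As a sketch of what publishing the score could look like, here is one way to send a daily data quality score into Google Analytics 4 via the Measurement Protocol; the measurement ID, API secret, and event/parameter names are all assumptions you would replace with your own:

    # Publish a daily data quality score to GA4 via the Measurement Protocol,
    # so analysts can query it like any other metric. The IDs, secret, and
    # event/parameter names are placeholders.
    import json
    import urllib.request

    MEASUREMENT_ID = "G-XXXXXXX"    # placeholder GA4 property
    API_SECRET = "your-api-secret"  # placeholder Measurement Protocol secret

    def publish_quality_score(score: float) -> None:
        url = (
            "https://www.google-analytics.com/mp/collect"
            f"?measurement_id={MEASUREMENT_ID}&api_secret={API_SECRET}"
        )
        body = {
            "client_id": "data-governance-bot",
            "events": [{"name": "data_quality_score", "params": {"value": score}}],
        }
        req = urllib.request.Request(
            url,
            data=json.dumps(body).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req)

    publish_quality_score(0.97)  # e.g., 97% of audited tags passed today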


So how do you account for clickstream data in your data governance strategy? Curious to see how QA2L addresses these issues?

Request a Demo
