What is a Data Dictionary anyway?

It seems to be one of those buzzwords that get thrown around too easily and too often. Fuzzy definitions are plentiful. There seems to be little consensus on what should make it into such a dictionary. We're reminded of the Buddhist at the hot-dog stand who allegedly said, "Give me one with everything!" At the same time, any actual implementations are few and far between, usually looking pretty drab compared to the original intent.

With that in mind, we asked ourselves what would a practical implementation of a data dictionary for managing clickstream analytics solutions such as Adobe Analytics and Google Analytics look like?

Recognizing that specific organizations' needs may vary, there are nevertheless some common themes on what's likely to make the final cut:

  1. Clear descriptions, categorization, and origins of business requirements.
  2. The link between requirements and the individual data points (dimensions & measures) that have been created to meet them.
  3. The mapping of data points to solution-specific variables (Adobe Analytics eVars, Google Event Actions, etc). Optionally, dependencies on any data layer elements and/or TMS rules.
  4. The URLs and user flows that trigger these solution-specific variables.
  5. The teams and team members involved in decisions and support of different aspects of the implementation.
  6. Flags that tell stakeholders if collection for individual data elements, and any downstream reporting, is in good health or not.

Items 1 through 5 are an exercise in documentation and having good discipline in keeping things up to date. But the prospect of manually verifying each data element in even a medium-size organization at any kind of sensible cadence is often daunting. In most organizations, item number 6 comes into play infrequently: once when the tags are first added, and then when (if) some data irregularity is caught. The latter usually happens only when bad data has been flowing for a while.

The Google Sheets template we have put together shows one way to structure and address the first five items. Since different organizations may have vastly different takes on documentation and governance, we've chosen an open and flexible platform that can be customized by anyone who is familiar with how spreadsheets work. Where we really aim to add value is in the enabling of ongoing automated checks for key data points and data elements through a built-in live bridge to the QA2L API.

In our free and open Data Dictionary template, automated checks come in two flavors:

  • URL Validation

You provide a set of URLs, tracking parameters and their expected values. You click the "Validate URLs" button and the Google Sheet returns which parameters passed inspection and which ones failed.
You may ask, how is that better than using sniffing tools or browser devtools? The short answer is, it's faster and easier. But there are also deeper advantages:

  1. You define the rules once and from that point on they are testable. You can run validations across multiple pages and workflows with a single click, and you can even start while tags are still being added!
  2. Validation can be performed by team members that are not well-versed in the details of analytics implementations or the tools and site actions that may (or may not) expose the respective data points.
  3. Automation ensures that the tests are performed consistently and that there is no room for human error or dalliance.

The template ships with page-level validation support for Adobe and Google Analytics. It also includes support for our special QA2L tag, which offers an easy way to interrogate the landing page for loading speed, headers, cookies, and even DOM variables. The embedded script is modifiable, allowing for more vendors.

  • Workflow Validation

For the really complex use cases (and where key tags most often break), you need to go through multiple steps: user authentication, form submission, a certain sequence of clicks to trigger the data points you want to validate. If you are an existing client of ours, you will automatically have access to all your recorded flows via the Google Sheet template. If you are not, you can request a free trial account to test-drive our visual Task Design interface along with the Google Sheets bridge.

Access to our API to perform real-time validation is free for up to 100 requests per day until further notice. You will need a secret key to unlock this functionality, which will be automatically emailed to you when you submit the form above. You will also receive a link to download the free Google Sheets add-on. 


Setup Instructions

1. Install the add-on from the direct link, then access the Add-ons menu from a Sheet of your choice. Select QA2L Connector > Create Default Tabs:

2. Navigate to the newly created QA2L Settings tab. Enter your API Secret and if you are a customer, toggle the evaluation mode cell to N:

3. You're now ready for take-off.

Here's an example of a real-time URL validation row created by populating cells A2-E2 and then running QA2L Connector > Scan URLs:

Here's an example of a Turing Flow which executes some page interactions prior to performing tag validations:

4. At any point, refer to the QA2L Connector > QA2L Help file for a complete description of all other menu options.

Subscribe to our quarterly data quality newsletter