Discussing Data Layers with Jim Gordon

Jim Gordon is a veteran in the data analytics industry, an analytics technologist, and a product builder. He is best known for creating Tagtician and the Satellite/DTM/Launch cheat sheets. 

We sat down with Jim to discuss a range of topics, including the legacy burden Tag Management Systems continue to carry, Jim's quest to popularize the Event-Driven Data Layer (EDDL) as a logical improvement over the W3C Customer Experience Digital Data Layer (CEDDL), the future of tag management systems, and the continued importance of talking to your developers.


You are known in the industry for creating Tagtician and the DTM cheat sheets. What prompted you to work on these projects?
 

JG: I worked at Search Discovery (I actually just started working with the great folks at Search Discovery again) when they were building Satellite, the precursor of DTM, which has now evolved into Adobe Launch. I was working with the Engineering team, running tech support, conducting implementations, and was one of the key people for provisioning, conducting demos, and support. While there, I created the original Satellite cheat sheet, and it included common functions that I used. A lot of the cheat sheet was jQuery, which I barely use any more because that library is being deprecated more or less. But there were some other functions that I did not want to memorize and that I wanted to be easy for people to copy and paste, especially people who are new to the tool. It is a lot easier to have a resource containing all of these functions, so I don’t have to keep looking them up. So I created this one sheet that I could print out, put on my cubicle wall, and reference.

Tagtician was created because I was concerned about people’s ownership of their implementation. When you house your implementation in a Tag Management System--whether owned by Google, Adobe, or somebody else--it is rooting you that much deeper into the vendor's ecosystem. I felt I had less ownership over my implementation, I could not export all my tags easily, I could not have a living document. I did not have much control and whatever I put into the TMS, I was not able to get back out easily, especially if I wanted to switch platforms. That did not really sit well with me, so I created Tagtician. 

Originally, it was intended to be a library viewer. I wanted to learn from other people’s implementations. And then we added debugging, so you could see what rules are firing and what is in the rules. We also added search features, which are really popular since they unlock capabilities that are normally not available within the DTM/Launch interface.

We have approximately 6000 weekly active users and we are probably going to roll out a feature that will pull in third-party scripts, most likely as a toggle-able option.

 

For a long time Tag Management Systems were being sold as the ultimate cure for all data collection issues. One of the key (but false) selling points was the promise that the TMS would reduce dependencies on Dev teams. What are some of the main misconceptions about data collection that you still see in the marketplace?

JG: One hundred percent! I was part of that problem in selling Tag Management systems that way. 

Eight years ago, tag management systems were a new concept and were seen as a vulnerability by a lot of folks. A lot of times you were trying to justify the sale to Development teams and the Tech teams. The TMS is a JavaScript container where one can deploy anything, so vendors erred on the side of doing a lot of things in the UI. Many tools--Tealium, Ensighten, Tagman, Signal--had a lot of these templates, where you were kept within these walls, you were kept on rails. You couldn’t just deploy any JavaScript. Technically, deploying JavaScript was possible, but it was done in a way to make people feel safer. All of the tag management was done within the TMS interface. Since we had jQuery selectors and CSS selectors, you wouldn’t have to deploy a single line of code. It was easy to describe to developers, and that made it easy to sell. And these teams are thinking about the bottom line: how much time is this going to save? If we came in and said, “Oh, by the way you have to implement code just like you did with GA/AA, inline, on the page,” they would have said, “Hell no, we are not doing it.” So at the time, we had to really sell the tool as something that does not require any additional implementation. And actually, there is new technology coming out that is using the same sales pitch, it is so cyclical.

So ultimately, we set that foundation for those misconceptions. And the development teams, who want to do what’s best, were sold these tools under the pretense that this will reduce the amount of time they were spending micromanaging analytics. 

Tag Manager Rules Save a Website Tracked "the Old Way"
William-Adolphe Bouguereau, 1878
(Inspired by Classic Programmer Paintings)


We need to educate our development teams and help them understand why we do the things we do. I have a recent blog post on why you need a data layer. It goes into the value proposition of why we are doing what we are doing and the importance of a data layer. I also make the claim that we should no longer use the legacy W3C Customer Experience Digital Data Layer (CEDDL), but instead an event-driven data layer (EDDL).

On the plus side, a lot of organizations weren’t using a data layer, so they can start from scratch. In terms of tag management and current misconceptions, I think that all stems from how we originally sold it: "Hey it is an easy thing, your analytics team or front-end designer could go in there and just pop in the tags and you are good to go."

In reality, things have gotten more complex since then. We can get more data from users (within reason), we are tracking more stuff, we are using more tools - it is not just AdWords and Google Analytics. You have 20 different marketing pixels, all requiring different formats of data, and as the world becomes more complex and we have a more diverse set of tools, we need better ways to ingest that data. Ultimately, one of the biggest problems is education. 

As analysts, we have accepted that the cost of doing business is that the data isn’t pure, the data does not look perfect. We have reached a point where we are spending way too much time transforming the data from CSS selectors, or losing way too much data from site changes as Dev cycles speed up, and people need to implement a data layer. There are very few cases where someone does not need a data layer.

 

There are many ongoing issues with data quality. Data layers can alleviate a lot of the issues that you see if you are relying on CSS selectors, but are data layers bulletproof against data quality issues? What are some of the more common issues you see?

JG: Data Layers are not bulletproof. If there is zero structure around your data layers, of course things are going to fall apart. A lot of factors can affect it: staff turnover, tag density, and there is definitely such a thing as collecting too much data. If you are tagging every single element on a page, developers are going to say, “Hell no, I can’t prioritize this, this is ridiculous, this is too much.” So there is a balance to everything.

It starts with education, like I said before: we need to educate why we are doing what we are doing and figure out the business justification for everything that we add to a data layer.

For instance, if I want to be able to track email opt-ins, 10 years ago I might have told you that one way to do it would be to track it onclick of the submit button. That’s not going to be accurate, because I may not have put in a valid email, or I may click it 50 times and that will inflate the numbers. You can serialize it, or look at it on a session or a visitor basis, but still it is noisy and it adds server calls.

Another way I may have done this is on form submit, which is a little better. Form submit still relies on a form ID or some sort of CSS selectors where it tracks when someone has submitted the form. That’s a lot more reliable, but it is still error prone, especially if you transition between different form types. Sometimes teams don’t feel it is necessary to tell analytics that they switched from form engine A to form engine B, especially since it looks the exact same way and functions the exact same way. So we are still susceptible to site changes and deployments screwing up your tags.

Another way: you could do URL destination (going from URL A to URL B), which is pretty reliable, but email signups are usually done via AJAX, so you don’t have the URL transition. You could build a rule that fires if the element exists or if it is visible on your screen. I have done that before and I don’t recommend anyone doing that.

We know that you have submitted your email when you see the "Thank You" page on your screen. The best way to do it is to integrate the tracking with your form validation system. It might still be susceptible to changes because, for example, switching form engines may result in losing the data layer code, whether it is done through dataLayer.push() or _satellite.track(), but we have business rules that are in place to make it less prone to error.
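As a rough illustration of that last option (not code from the interview), the sketch below pushes an opt-in event only after the form's own validation passes. The event name `emailOptIn`, the field `formName`, and the functions `isValidEmail` and `handleSignup` are hypothetical, and a local array stands in for the browser's `window.dataLayer` so the snippet runs on its own.

```javascript
// Hypothetical sketch: `emailOptIn`, `formName`, `isValidEmail`, and
// `handleSignup` are illustrative names, not part of any vendor API.

// In a browser this would usually be `window.dataLayer = window.dataLayer || []`.
const dataLayer = [];

// Deliberately loose format check, standing in for the site's real validation.
function isValidEmail(value) {
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(value);
}

// Called by the form's own submit logic, whichever form engine is in use.
function handleSignup(email, formName) {
  if (!isValidEmail(email)) {
    return false; // invalid input: nothing is tracked
  }
  // ... the AJAX request to the signup service would go here ...
  dataLayer.push({ event: "emailOptIn", formName: formName });
  return true;
}

handleSignup("not-an-email", "footer-newsletter");    // ignored
handleSignup("pat@example.com", "footer-newsletter"); // pushes one event
console.log(dataLayer);
```

Because the push lives next to the validation logic rather than on a click or a CSS selector, repeated clicks with bad input add no events. In a real implementation the push would more likely sit in the success callback of the signup request.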

 

The Death of an Implementation Caused by Changing CSS Selectors
Francois-Joseph Navez, 1824
(Inspired by Classic Programmer Paintings)

 

What are some of the challenges of being able to properly validate data layers?

JG: The biggest challenge, in particular with Adobe Launch, is that there is no standard data layer. The documentation Adobe provides says to use the W3C data layer in combination with either JavaScript events or direct call rules (functions that can send JSON objects to DTM/Launch). The problem is that this approach is tricky to QA because it does not push the elements to a global object. You have to write custom code, like a static JSON object in your header.

We are mixing multiple methodologies when implementing the data layer, which makes it really difficult to prioritize. As an analyst, I have to tell the Dev team, “Hey, folks - we are going to do this data layer thing, and we want to paste in this big ugly tree of elements that fires above our DTM or Launch container and also we are gonna need to track events. You are going to have to use this _satellite.track() thing or you could use custom events.” Maybe you even ask them for their advice, what they’d recommend, and they are like, “We have no idea what a data layer is.” Ultimately, it dies before it even starts in some cases, because it sounds difficult. The hardest part is getting a data layer implemented in the first place.

In terms of QA, there are a lot of sequencing issues that people have with data layers: making sure the data is available and ready before the beacons fire. You have to worry not just about whether the data is there but also about when it is there, before you send it to your reporting platforms.
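One way to picture the sequencing problem is the sketch below, which assumes a small home-made event queue rather than any specific tag manager. The helper `createEventLayer` and the event names are hypothetical. The queue keeps a history of pushes and replays it to any listener that registers late, so a rule that loads after the data was set still sees it.

```javascript
// Hypothetical helper, not a vendor API: an event queue that replays
// earlier pushes to listeners that register late.
function createEventLayer() {
  const history = [];
  const listeners = [];
  return {
    push(event) {
      history.push(event);
      listeners.forEach((listener) => listener(event));
    },
    subscribe(listener) {
      history.forEach((event) => listener(event)); // replay what was missed
      listeners.push(listener);
    },
  };
}

const layer = createEventLayer();
const beacons = [];

// The page pushes data before the tag container has loaded...
layer.push({ event: "pageLoaded", pageName: "home" });

// ...and the (simulated) tag rule subscribes afterwards.
layer.subscribe((e) => beacons.push(e.event + ":" + (e.pageName || "")));

// Later events are delivered as they happen.
layer.push({ event: "emailOptIn" });

console.log(beacons);
```

With a static object read once at load time, the rule would have to guess whether the value had been set yet. Here the order of pushes is explicit and can be inspected during QA.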

Race Conditions in Data Elements in W3C CEDDL
Salvador Dali, 1931
(Inspired by Classic Programmer Paintings)

 

Once an organization has a data layer, does QA become more difficult because you might have to QA in two different places, once at the data layer level and perhaps a second time to inspect the actual tracking requests that go out to the different MarTech vendors?

JG: The right way to do it is all of your pixels get their information from the data layer. So if one platform isn’t getting the data, all I have to do is look in the data layer. It is centralized. If something isn’t coming through correctly, I know where to identify it, because it is all from a single source. If we were implementing everything inline - let’s say I have five developers working on five different business units - each of them might treat values differently. With a data layer, the right way to do it would be to have a centralized library defining what all those values mean. If handed off to a new developer, there is at least an initial validation of what’s being set in the data layer. The wrong way to do it would be to do everything ad hoc or bespoke, and not use that central library and not set any validation parameters around the data layer. If nothing is consistent, you might as well not have a data layer.
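A centralized library with validation could look something like the following minimal sketch. Everything in it is an assumption made for illustration: the event names, the field names, and the functions `validateEvent` and `track` are hypothetical, and a local array again stands in for `window.dataLayer`.

```javascript
// Hypothetical central definitions: one shared place that says which events
// exist and which fields (and types) each one must carry.
const EVENT_DEFINITIONS = new Map([
  ["pageLoaded", { pageName: "string", siteSection: "string" }],
  ["emailOptIn", { formName: "string" }],
]);

// Returns a list of problems; an empty list means the event is acceptable.
function validateEvent(payload) {
  const definition = EVENT_DEFINITIONS.get(payload.event);
  if (!definition) {
    return ["unknown event: " + String(payload.event)];
  }
  const problems = [];
  for (const [field, expectedType] of Object.entries(definition)) {
    if (typeof payload[field] !== expectedType) {
      problems.push(field + " should be a " + expectedType);
    }
  }
  return problems;
}

const dataLayer = [];

// Every team pushes through the same function instead of writing to the
// array directly.
function track(payload) {
  const problems = validateEvent(payload);
  if (problems.length > 0) {
    console.warn("Rejected data layer event:", problems.join("; "));
    return false;
  }
  dataLayer.push(payload);
  return true;
}

track({ event: "emailOptIn", formName: "footer-newsletter" }); // accepted
track({ event: "emailOptIn" });                                // rejected: formName missing
track({ event: "emailOptin", formName: "footer-newsletter" }); // rejected: misspelled event name
```

The design choice here is that five developers on five business units share one definition of each value, and a typo or missing field is caught at the point of the push instead of months later in a report.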

 

There is a common pitfall that developers sometimes fall into: you tell them, "The data layer is your source of truth, it is the most robust source of metadata," but because it was easier for them, they implement it by scraping the page to put some values in the data layer. That defeats the whole purpose of a data layer. In reality, you are introducing fragility at an even lower level, which is even harder to debug.

JG: That’s exactly right. I can’t tell you how many times we have introduced the concept of a data layer to a developer and they say they get it and then they put a container with a bunch of jQuery on the site. It’s like, “Whoa, whoa... wait a second, that’s not a data layer!” And that’s why education is so important. Data layers can be an abstract concept. It is not abstract to us because we live in it, but it is abstract to folks who don’t live in the analytics world.

 

Typically in a data layer driven implementation, the kinds of tracking defects that come about have to do with the value of a data layer variable changing for some unknown reason, outside of what you might expect, rather than having too much tracking or something disappearing completely. Have you seen this happen?

JG: I have seen it happen, certainly. If you are using something like QA2L or running your own unit test, you will run into this less frequently. But yes, any time anything is touched by a person, there is an opportunity for an error. One misconception with data layers is just because you have one, it means that everything is OK, and you never have to QA anything again. That’s not the case. That is far from the case. You have to QA less than someone who uses CSS selectors. The variability will be a lot lower, but there is absolutely still variability. If someone says a data layer will solve all of their problems… that's probably a fairly ridiculous thing to say.

 

One of the things that can be a little scary about data layers is when they are in the hands of a developer who does not really understand what the tracking is supposed to do. You write the specs and tell them these data layer events should be sent at these times, but you don’t really go into the reasons why. A couple of months later you realize they have been making changes to the data layer on their own: now that you have set up the rules and the tracking seems to flow from the data layer into the reporting, they try to do things themselves because they can. With a data layer push they can fire a tracking event. So they have the power, but not necessarily the knowledge and understanding of why we are doing things a certain way.

JG: It is always important to instill mutual accountability. Everyone has a resume, you have a resume, I have a resume, developers have a resume. While throughput is nice for a developer, it would surely be nice if they had some kind of a business outcome associated with what they are doing, as opposed to, “Hey, I implemented 200 lines of code in 200 seconds.” While that’s nice, it is not gonna get you hired at Apple.

What people often underestimate is that everyone wants to feel important, everyone wants to feel like what they are doing is meaningful. The more we partner with developers and other teams, the more they will be invested in the process, your process. Often times, we are the analysts behind the curtain doing this magical stuff and eventually an analysis comes out, or we send out a report, which means literally nothing to most developers.

This is why education is so important and helping folks understand what a data layer is and why it is important will ensure that they know how it impacts the business. Analysts may often not think past what’s on their plate, and that’s natural, it's human nature. But in order to inspire people, you have to help people understand why we are doing the things we are doing. The developers I have spoken to really appreciated and bought into our initiatives after explaining how it is going to help the business. It is a good exercise for analysts: don’t just implement a data layer to implement a data layer. You, as an analyst, have to understand how it impacts the business, because that affects prioritization. As a product owner, I am not going to prioritize your work if I don’t think it is going to help us make more money. And the whole organization needs to know that what you are doing is helping them make more money.

Some time ago, I talked to a client’s analytics team and asked them, “Hey, what’s going on?” and they said, “The developers...” (I hate saying the word “the developers” - it feels so siloed), “...the developers are not prioritizing our work, they say they are going to do this at this time, but it is just not getting done.” And I asked them, “Well, have you talked to them?” They said, “Yeah, we sent out an email a couple of times this week,” and I said, “No, have you TALKED to them, like a human to a human?” Their response was, “No, I guess we could.” So we found out where they sat and went over and actually talked to the developers. And the developers were like, “Oh, I get what you are doing, this makes a lot more sense, I can bang this out real quick.” And they actually got it done while we were sitting up there, hovering over their shoulder. We were very self-aware, saying, “Hey, we get that we are hovering over your shoulder, let us know if you want us to go away,” and they got it done on the spot.

 

 ...They liked the attention

JG: They did. Sometimes our jobs can be very lonely. We are all working with humans and we all need to feel like we are doing meaningful work. And if we don’t help make their work more meaningful, then there is not a heck of a lot of incentive to do it. I know I am not incentivized when my work is not meaningful. If I don’t feel like what I am doing is actually making a difference, I am probably going to reprioritize it and push it off and procrastinate, because I am not going to see the outcome of it. And we work in an industry that is 100% dependent on creating outcomes. Talking to developers and explaining why we are doing what we are doing has never failed. If you fail, it is not a failure on your developers, it is a failure on our end in justifying the value.

 

How often do you see organizations iterating on the structure of their data layer?

JG: It varies wildly from organization to organization. Especially if you want to tag a lot of stuff with a data layer, you are prioritizing different phases of a data layer implementation, so more often it is broken out into phase 1, phase 2, phase 3. They are changing the data layer in each of these phases, so you are exposing yourself to something being inaccurate or something breaking. We are working with humans, there is the capacity to make mistakes, on our end and the developers' end.

I have worked with clients that have put up a data layer and then just never changed the site again, but odds are if you are in an agile organization and you have two-week time boxes and you are releasing something every two weeks, you are updating the data layer pretty frequently, perhaps once a month. If you are updating it once a week or once every other week, there might be a process issue going on in there.

It also depends on how many business units you are working with. At a really large organization, things are constantly changing and business units are operating independently on their own schedules. 

I have worked in almost every vertical, and I think the verticals that change the least are typically B2B. And I am probably going to get someone from Salesforce saying, “Hey, we change things all the time,” but typically B2B moves the slowest because they have these really long sales cycles and are not constantly pushing out promotions. When there are fewer promotions, I’d say we are modifying data layers less.

Surprisingly, eComm is probably in the middle. The most variability you will see is with account information where you typically collect information about a user if they are registered, but they are not constantly updating preferences and profiles.

A social media website will have constant updates, because the features you are implementing are in an effort to always learn something new about the user, not necessarily to promote a deal which is a little bit more modular. A social media site might be pushing out new features that help get a deeper sense into the persona, because that’s how we make money - collecting accounts, understanding the user and selling advertising. I think that’s where you will see the most updates to a data layer. But, granted, we are talking about a huge spectrum.

 

Showcasing the Benefits of a Data Layer in front of the Executive Committee
Hugo Kauffmann, 1866

(Inspired by Classic Programmer Paintings)

 

You have seen different industry types in terms of their implementation and we know ideally everything should be data-layer driven, but for a lot of implementations, that may not be achievable and a lot of the time for a very good reason. For example, nowadays you can do a lot with CSS selectors that are specially added to the page for tracking purposes. These are special attributes that provide an easy way for the developers to build something robust and then you use your listeners and all the logic from the TMS to fire off certain onclick conversion tags. In your observation, what is the typical ratio of event-driven or listener-driven calls vs. something that is driven purely by a data layer?

JG: So how many are CSS-based vs. data layer ... there is a ton of variation. When I first started writing about data layers, I got a lot of comments from consultants saying, “You are a million miles behind the curve on this one, we have known about data layers, all of my clients know about data layers, why are you writing about data layers?” That was from consultants who work with organizations that raise their hand and say “We want to be better with data” and maybe these are just departments in organizations or business units, not necessarily the entire organization. So consultants think everyone has a data layer.

When I was working at VML, we had a lot of creative-driven scopes and almost no one had a data layer, which was mind-boggling to me. I worked on dozens of accounts and I can count maybe on one hand how many of them had a data layer. For consultants this is old news because they are working with people who have already recognized the value of data, identified the need for something like a data layer, and who are hiring consultants to build out that data layer. But I’d say that’s a minority.

I think there is an opportunity to transition people, especially in Adobe Launch/DTM, onto an event-driven data layer. And I recommend reading the blog post on EDDL (Event-Driven Data Layer) vs. CEDDL (Customer Experience Digital Data Layer), because a lot of folks don’t know what the difference is. 
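For readers who have not seen the two side by side, here is a small sketch of the difference in shape. It is an illustration under assumptions, not a reference implementation: the `digitalData.page.pageInfo.pageName` path follows the W3C CEDDL naming, while the event names, the values, and the `currentState` helper are hypothetical.

```javascript
// CEDDL-style: one static object describing the page, typically set once
// in the header. (Property path follows the W3C layout; values are made up.)
const digitalData = {
  page: { pageInfo: { pageName: "newsletter" } },
};

// EDDL-style: an ordered list of messages, each saying what just happened.
const dataLayer = [];
dataLayer.push({ event: "pageLoaded", pageName: "newsletter" });
dataLayer.push({ event: "emailOptIn", formName: "footer-newsletter" });

// Hypothetical helper: fold the events into a "current state" snapshot
// (shallow merge, later pushes win).
function currentState(events) {
  return events.reduce((state, e) => Object.assign(state, e), {});
}

console.log(digitalData.page.pageInfo.pageName);
console.log(currentState(dataLayer));
```

The static object can describe what the page is, but it has no built-in way to say that something just happened. The event list carries both the data and the moment it became available, and a snapshot can still be derived from it when one is needed.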

 

What are some future trends in the development of data layers and tag management as a whole?

JG: EDDL is the future. Period. Google Tag Manager has an event-driven data layer and it is very good and I did not appreciate it fully until the last year or two. I was like, “Holy Cow, this is what a data layer should be.” Before that I was thinking, “What is this janky array that I have to deal with?” but the event-driven data layer is the future. With EDDL you don’t have the same dependencies as the W3C recommendation, which says a lot about the schema and does not focus on the vehicle in which you send the data. W3C is more about what your data should look like. It was a good start, but we have matured past that. And I think Adobe needs to update its documentation to start talking about EDDL. They either need to build their own version of an EDDL, or start promoting Search Discovery’s Data Layer Manager because that’s an EDDL. I think ultimately Adobe will need to own the data layer, in my opinion.

For the future of tag management - GTM just released their Templates feature which is really cool, even though it is totally ripping off of Adobe (Launch) Extensions. I don’t know who started the project first, whether GTM was already working on Templates when Launch worked on Extensions, but I think Templates are really neat. I like the workflow of it.

Platforms go through these processes of expansion and contraction, diversification and consolidation. Over the next five years, we'll see a lot of noise with Extensions and Templates, which is going to make things a lot more complicated and make it tough to separate the signal from the noise. We are going to run into a lot of dependencies on these third-party Templates and Extensions. They are great, fine tools, but I don’t think it is healthy for an implementation to be dependent on a lot of coding, and Extensions/Templates require coding. So over the next five years, Adobe will find ways to improve their workflow for updating and curating Extensions.

Another prediction - I think there will be a lot more interest in server-side tag management in the near future. Server-side tag management will grow. The trust of front end pixels will begin to erode (with Google Calendar going down at one point). I think people will want to consolidate where they are sending their data and I think that will lead to more server-side tag management implementations. It has the potential to make data layers easier to modify.

 

The server is the one place where you can more easily track cross-platform experiences, the server is also the place that will know when to fire the tags more intelligently.

JG: As far as I know, GTM right now does not have a server-side solution and Launch does not have a server-side solution yet, they might be working on something like that, I don’t know. Tealium, Ensighten, and Signal have server-side solutions. The third-party Tag Management Systems have a server-side component, because that’s their differentiator, but once we start talking about shaving off milliseconds from load time, we are going to start questioning the bulk that every extension, every rule adds to the page.

I ran a cursory analysis of how heavy each extension is within Adobe Launch just by installing it and looking at the default state of an extension without adding any of the actions or any of the data elements. Most of them are 400 to 600 bytes, but some of them are 100+ kilobytes. This is huge and this load is added on every single page. So when we talk about people going to server-side, it is for reasons like this.

 

So you are thinking server-side will get you to a point where it will really let you optimize the client side tracking as well, not just server-to-server tracking but also using it to optimize situations like that where we don’t want a heavy page full of tracking logic that does not apply to this user?

JG: It will mostly be for enterprise-level customers, I don’t see this happening for small and mid-sized businesses because often times they might not have the resources to implement server-side tag management.

What you just described is logic that GTM can currently roll out if they really cared about controlling container sizes - they could easily roll that out - conditional rule loading. You’d be able to do a lot of that using their cloud server, maybe that is something we should be telling them to do. The Templates feature was the logical next step, they do have this point where they tell you “your container is too big,” but they don’t necessarily tell you what to do about it.

JG: Right now we are all worried about getting data layers on the page, but what happens when we have mastered data layers - the reality is OK, these tag management file sizes are getting kind of big, maybe we should move everything to the server side.

Optimization is going to be a big topic and clients already by default suspect the tag management container for every bit of slowness on their website as it is something external that they don’t have any visibility into. It is very easy for developers to say, "We are doing everything right in our application, but this external container which we have no idea what it does - that must be the reason why the page takes an extra X seconds to load." We are under a lot of scrutiny.

JG: It is only going to get more aggressive. Google started penalizing sites for load speed and even if your TMS containers are loading asynchronously it does not necessarily help to have two-megabyte files loading on your pages. A lot of applications nowadays load content asynchronously, so even though your container is loaded async that does not mean it doesn’t undermine the user experience.

