
Detecting Internal Traffic in Adobe Analytics in the Time of COVID-19

As the world fights the COVID-19 (Coronavirus) pandemic, working from home has become the new normal for all businesses where this is viable. But for many, this abrupt change has added significant challenges to correctly identifying internal traffic in Adobe Analytics. This in turn has resulted in compromises in data quality with internally-generated transactions, adding yet another source of anomalies to the already abnormal traffic and conversion rates.

Before COVID-19, IP ranges were the de facto standard for detecting internal traffic, but multiple factors were already chipping away at their usability. Enterprise network proxies (such as Zscaler) have been routing internal traffic through IP address ranges that are shared by different customers, making IP detection suspect. Excluding shared-tenancy IP ranges often results in removing valid B2B traffic together with the internal traffic.

In addition, GDPR/ITP developments have imposed requirements for anonymizing IP addresses in Adobe reports. Adobe’s IP Obfuscation capabilities can still correctly exclude a specific IP address before it gets obfuscated in the reports. However, the option to replace the last octet of the IP with 0 requires changing all of the existing IP address-based filters and introduces a level of risk of excluding valid non-internal traffic.
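To illustrate why a zeroed last octet widens a filter, here is a minimal Python sketch. The address is hypothetical (taken from a range reserved for documentation), and the helper only imitates the effect of the obfuscation setting; it is not Adobe's actual processing.

```python
import ipaddress


def zero_last_octet(ip: str) -> str:
    """Replace the last octet of an IPv4 address with 0."""
    octets = ip.split(".")
    octets[-1] = "0"
    return ".".join(octets)


# Hypothetical office address from the documentation-only range 203.0.113.0/24.
office_ip = "203.0.113.57"
obfuscated = zero_last_octet(office_ip)

# Count how many addresses collapse to the same obfuscated value.
network = ipaddress.ip_network(f"{obfuscated}/24")
collisions = [str(ip) for ip in network if zero_last_octet(str(ip)) == obfuscated]

print(obfuscated)       # 203.0.113.0
print(len(collisions))  # 256
```

A filter written against the obfuscated value can no longer tell the office address apart from the other 255 addresses in the same block, which is where the risk of excluding valid traffic comes from.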

With employees working from home, our new reality is that IP addresses are no longer reliable as a method for internal traffic identification. Employees can now generate internal traffic from hundreds/thousands of dispersed IP addresses that can change dynamically at any time.

With the old playbook out of the window, we’ll review some of the possible solutions for addressing this challenge.

 

Understanding levels of detection/exclusion

The internal traffic problem is so common that over the years multiple different approaches have been adopted for the detection of such traffic:

  • One approach is to prevent internal traffic from being collected in your reports

The most common way to do this is to identify a range of IPs so that all traffic from those IPs is automatically excluded from the report suite in question. This is done under Admin/Exclude by IP.

The approach does not work retroactively, so your IP definitions always need to be up-to-date.

Another method (although less commonly used) is the cookie opt-out approach. Your organization gets a specialized URL where users can opt out of tracking.

 

  • The detection approach allows internal traffic to reach the various Adobe reports, but you also have a flag (or a selection of attributes) that gives you the option to segment out such traffic.

This is a very flexible approach with many benefits (including in some cases the ability to retroactively identify and segment out traffic) and various different flavors - from being able to create a separate “internal only” report suite to letting internal traffic flow into the production report suite, but with the option to create segments (and virtual report suites) that can account for this traffic.

  • A quasi-exclusion approach is to leverage the “Bot Detection” feature which lets internal traffic flow into specialized tables that are reserved for bot traffic.

In such implementations, the internal traffic does not pollute any of the production dimensions/metrics and you get two specialized reports (Bots and Bot Pages) where you can see details about this internal traffic. One advantage of this approach is that settings can be easily applied in bulk across a list of report suites.

 

Possible solutions

  • Virtual Private Network (VPN)

IP-based detection can still be used when employees working from home are required to connect through a VPN. In such cases, legacy IP-based definitions will continue to work. Your Adobe implementation will likely still see traces of internal traffic, as employees might access your public website from their work computer without being connected to the VPN.

  • Hosts file updates

One option could be for the company to push a policy update to all company-owned computers to use an updated version of a hosts file. The hosts file controls which IP address a hostname resolves to, so adding an entry such as:

127.0.0.1       xchq2.d1.sc.omtrdc.net

would result in traffic normally sent to that Adobe collection domain being routed to your local machine instead, thereby preventing your activity from polluting any reports.

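This approach also works for implementations that use a CNAME (a first-party collection domain) instead of the omtrdc.net or the legacy Omniture 112.2o7.net domains. In that case, the hosts entry needs to list the first-party hostname that the browser actually requests.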
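As a rough way to audit such a policy, the Python sketch below parses the text of a hosts file and lists the hostnames it points at the local machine. The function name, the sample file, and metrics.example.com (standing in for a first-party collection host) are hypothetical.

```python
LOOPBACK = {"127.0.0.1", "::1"}


def loopback_hosts(hosts_text: str) -> set:
    """Return the hostnames that a hosts file maps to a loopback address."""
    blocked = set()
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split "address name [alias ...]".
        entry = line.split("#", 1)[0].split()
        if len(entry) >= 2 and entry[0] in LOOPBACK:
            blocked.update(name.lower() for name in entry[1:])
    return blocked


# Hypothetical hosts file content pushed by IT.
sample = """
# Block analytics collection on company machines
127.0.0.1       localhost
127.0.0.1       xchq2.d1.sc.omtrdc.net
127.0.0.1       metrics.example.com   # hypothetical first-party (CNAME) host
"""

blocked = loopback_hosts(sample)
print("xchq2.d1.sc.omtrdc.net" in blocked)  # True
```

On a real machine the same function could be fed the contents of the local hosts file to confirm that the policy update has landed.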

  • Browser plugins

A selection of browser plugins (such as Ghostery) can automatically block analytics (as well as advertising) tracking requests. If deployed via a group policy update by the organization’s IT team on the browsers of remote users, this approach can go a long way toward keeping production data clean.

  • Browser policy updates 

Another option lets data reach the Adobe Analytics collection servers and the corresponding report suite while giving you the flexibility to segment it out or have it recorded in the Bots tables. Work with your IT team to push a group policy update for the User Agent settings of your company's browsers. Using Chrome as an example, IT can set the User Agent to a custom string that is easily identifiable and can be used to segment out internal traffic.

In cases where you cannot rely on your IT teams to deploy policy updates to employees' computers, users can still be instructed to individually change their hosts files or update their browser's User Agent. 
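To make the custom User Agent idea concrete, here is a small Python sketch that flags hits in exported data by looking for a marker in the User Agent string. The "AcmeInternal" token, the hit structure, and the function name are all hypothetical; inside Adobe Analytics the equivalent check would be configured as a segment or a bot rule rather than written as code.

```python
# Hypothetical marker that IT appends to the browser User Agent via policy.
INTERNAL_UA_TOKEN = "AcmeInternal"


def is_internal_user_agent(user_agent: str) -> bool:
    """Return True when the User Agent carries the internal marker."""
    return INTERNAL_UA_TOKEN.lower() in user_agent.lower()


# Hypothetical exported hits.
hits = [
    {"page": "/home",
     "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AcmeInternal/1.0"},
    {"page": "/pricing",
     "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) Safari/605.1.15"},
]

external_hits = [h for h in hits if not is_internal_user_agent(h["user_agent"])]
print([h["page"] for h in external_hits])  # ['/pricing']
```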

  • Query parameter segmentation

An approach that requires no browser or hosts file configuration, but is more susceptible to user error or omission, is to ask users to append a query parameter when visiting the company’s website, e.g. www.QA2L.com becomes www.QA2L.com?internal=true.

(After publishing this blog post, we noticed that a similar approach has been documented by Simo Ahava with respect to filtering internal traffic in Google Analytics)

You can create a custom dimension to capture all instances of the "internal" query parameter and then create a processing rule that maps the "internal" parameter to this dimension.

The custom dimension could be used to break out other dimensions or within a visitor-level segment to filter out any activity that might be generated by employees.

This solution is easy to implement, but it relies on employees remembering to add internal=true at least once during their cookie lifetime.
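The visitor-level logic can be sketched in Python as follows. This is an illustration on hypothetical exported hits (visitor ID plus URL), not the Adobe segment builder: a visitor who used internal=true on any hit has all of their hits excluded.

```python
from urllib.parse import parse_qs, urlparse


def has_internal_flag(url: str) -> bool:
    """Return True when the URL carries internal=true in its query string."""
    query = parse_qs(urlparse(url).query)
    return "true" in (value.lower() for value in query.get("internal", []))


# Hypothetical hits as (visitor ID, URL) pairs.
hits = [
    ("visitor-a", "https://www.QA2L.com/?internal=true"),
    ("visitor-a", "https://www.QA2L.com/pricing"),
    ("visitor-b", "https://www.QA2L.com/"),
]

# Any visitor who used the flag once is treated as internal for all hits.
internal_visitors = {visitor for visitor, url in hits if has_internal_flag(url)}
external_hits = [(v, u) for v, u in hits if v not in internal_visitors]

print(external_hits)  # [('visitor-b', 'https://www.QA2L.com/')]
```

Note that the second hit from visitor-a is dropped even though its URL has no flag, which mirrors how a visitor-level segment behaves.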

These are some ideas that come to mind with respect to filtering out internal traffic for employees working from home.

We'd love to hear your feedback or different ideas at info@qa2l.com!

