A privacy-first approach to building a Google Analytics alternative
{%tip-box title="Disclaimer"%}This blog post is based on my own personal research and should not be considered legal advice. Make sure to seek legal counsel that specializes in data privacy regulations to ensure your company complies with any such regulations.{%tip-box-end%}
Google Analytics has been the de facto standard in web analytics for well over a decade. It gives individual content creators, small businesses, and enterprises a way to track, analyze, and report on website visits, marketing conversions, and ad-generated revenue. It’s also a pretty impressive feat of engineering, rather gracefully handling what must be an absolutely colossal load.
But it’s having a pretty rough go these days.
Recent regulations, like GDPR in Europe and the CCPA in California, don’t just define legal boundaries for companies using Google Analytics; they also signal a broader concern about digital privacy. Ad campaigns from Apple and Samsung further highlight how this concern extends all the way down to the individual consumer.
In case you need a refresher on digital privacy legislation, here are a few points that cover how this shift affects Google Analytics:
- The European Union regulates how data is shared between the EU and the US, and it looks like Google Analytics is not 100% compliant.
- Website tracking of users in the EU is only legal if you (among other requirements) ask for and obtain their explicit consent before activating any non-necessary cookies and website trackers on your domain. Cookies that feed Google Analytics have a hard time classifying as “necessary” to avoid these consent barriers.
- Authorities in Austria, the Netherlands, France, and Italy have already ruled against Google Analytics, and other European countries are expected to follow.
- Some web browsers, like Microsoft Edge, already block tracking by default, and Google itself has announced its intent to make Chrome free of third-party cookies at some point in the future (though they keep delaying… I wonder why?).
One thing is certain: things aren’t getting easier for analytics leaders at enterprises that track visitors across the web, and many companies are lawyering up (or already have) at considerable expense.
Some product insights leaders see these compliance frustrations as the final provocation to transition away from Google Analytics, and many are turning to other tools like Fathom or Plausible that are marketed as privacy-first alternatives to Google Analytics.
But even if you can get to compliance with Google Analytics or privacy-first tools, there are some very good reasons to consider building web analytics yourself, either to replace these tools entirely or to cover use cases they can’t be extended to support.
In particular:
You want all the data.
If you use Google Analytics, two things conspire against getting an accurate picture of behavior on your website:
- Google Analytics, like many others, currently relies on cookies to track behavioral analytics. These cookies require consent. And as it turns out, people don’t like consenting to being tracked. This will partially change with the transition to GA4, but that can be a heavy lift for analytics teams.
- Google Analytics samples data to cope with the considerable load of serving analytics. So the analytics you do see, regardless of consent, aren’t always 100% accurate.
You want the most recent data.
Data available in Google Analytics can be delayed by 36 hours or more, depending on the size of your property.
Running a flash sale on your massive eCommerce site and want to see real-time conversions? Sorry, no insights for you. Enhanced data freshness isn’t guaranteed, even if you pay the $150K+ a year for GA 360. (By the way, once you hit 25 billion events, you’re considered a “Large” property, and your freshness SLA drops to 48 hours on 98% uptime. 250 billion? Now you’re “XL” and the SLA is 7 days…)
You want control.
Google Analytics is a behemoth. Configuration is complex. Control is fleeting. Flexibility demands engineering hours. Hell, when you use Google Analytics, you don’t even own the data you collect, nor can you decide where it’s stored.
It might seem like I’m trashing Google Analytics. Maybe I am. But I will acknowledge that it’s still exceptionally valuable to many people and companies the world over, especially those without the engineering resources to build something themselves.
But if you have the resources and you’re thinking about a DIY approach, here’s what to consider to build a privacy-first solution and lower your risk of non-compliance.
Guidelines to reduce your risk when building web analytics
As Google has no doubt discovered, GDPR and similar data privacy regulations can be a minefield of legal gray areas and litigation risk. Even if you think you’re taking a compliant approach to web analytics, other entities (and their lawyers) may disagree. As you build an analytics platform, then, pursue these two goals:
- Minimize your risk by reducing and managing the processing of personal information.
- Maintain full visibility, transparency, and control over the personal information you process.
Here are our guidelines for taking a privacy-first approach as you build a Google Analytics alternative:
Guideline #1: Minimize and control your use of tracking technologies
You can’t talk about web session analytics and not talk about cookies. Entire domains dedicate themselves to helping users assess the legal compliance of their cookie policies, so we will avoid too much detail here. You can do your own research.
That said, there are important guidelines regarding cookies that will help you take a privacy-first approach to DIY web analytics.
First, if you can accomplish your use case without any cookies, even necessary cookies, then by all means do it. If you don’t use cookies to track users or even sessions, you shouldn’t need consent, though you should still observe GDPR guidelines for how you store and process the data. An example of this approach might be if you only need aggregated metrics on events, like pageviews or button clicks, and you don’t need to map those events back to a unique user or session.
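To make the cookieless case concrete, here is a minimal Python sketch of server-side event counting that never sees a user or session identifier. The `record_event` function, the in-memory `counts` store, and the hourly bucket are all assumptions made for illustration; a real pipeline would write to a database instead.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical in-memory store: (hour bucket, event type, path) -> count.
counts: Counter = Counter()

def record_event(event_type: str, path: str, when: datetime) -> None:
    """Increment an aggregate counter. No user ID, session ID, or IP is kept."""
    hour = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:00Z")
    counts[(hour, event_type, path)] += 1

# Three events in the same hour, from any number of visitors.
ts = datetime(2023, 1, 1, 12, 30, tzinfo=timezone.utc)
record_event("pageview", "/pricing", ts)
record_event("pageview", "/pricing", ts)
record_event("button_click", "/pricing", ts)
```

Because the only keys are the time bucket, the event type, and the path, there is nothing in the stored data to map back to a person. The tradeoff is that you can't compute per-session metrics like bounce rate.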
For most use cases, though, cookies are useful. But just as IRL cookies come in chocolate chip, snickerdoodle, and macadamia nut, web cookies come in different flavors, too. And each flavor has its own risk profile.
Third-party cookies carry the most risk. By definition, a third-party cookie is set by a domain different from the site being visited. If your website uses third-party cookies, your visitors’ information is being shared with an entity in which they have no interest. This shouldn’t be a shock, but it turns out that people don’t like this. The jig is up on third-party cookies, and if you’re pursuing a privacy-first approach to DIY analytics, then you should avoid them.
Then there are first-party cookies, which by definition are set by your domain. There are many different types of first-party cookies, and the ways they can be used are nigh innumerable. If you decide to use first-party cookies for analytics, here are some considerations:
- No matter how you’re using the cookie, or whether it is a persistent cookie or a session cookie, you should at the very least notify your web visitors about the cookie and what you’re doing with the data it collects. This ensures transparency under GDPR.
- Depending on how you use the cookie, you may or may not need to ask for explicit consent. There’s a benefit to not asking for consent: you’re more likely to capture a complete analytics picture because there’s a less direct path for users to reject the cookie. But there are some strict legal boundaries on cookies not requiring consent, so if you need to take this approach, you should be on the same page with your legal team.
- If you don’t ask for consent, still give people the ability to opt out unless the cookie can be deemed “strictly necessary”. Will opt-outs create some flutter in your analytics? Perhaps, but it will demonstrate your commitment to your visitors’ privacy, which is probably a good tradeoff for your brand.
- Make sure your use of cookies is easily auditable, in case you need to defend your claims on how they are being used.
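As a rough illustration of the considerations above, this Python sketch builds a first-party session cookie and skips it entirely for visitors who have opted out. The cookie name `sid`, the `build_session_cookie` helper, and the `opted_out` flag are hypothetical; how you record an opt-out, and whether this setup satisfies your legal requirements, is something to settle with your legal team.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Optional

def build_session_cookie(opted_out: bool) -> Optional[str]:
    """Return a Set-Cookie header value, or None if the visitor opted out."""
    if opted_out:
        return None  # Respect the opt-out: set no analytics cookie at all.
    cookie = SimpleCookie()
    cookie["sid"] = secrets.token_urlsafe(16)  # Random, carries no user data.
    cookie["sid"]["path"] = "/"
    cookie["sid"]["secure"] = True      # Only sent over HTTPS.
    cookie["sid"]["httponly"] = True    # Not readable from page scripts.
    cookie["sid"]["samesite"] = "Lax"   # Not sent on most cross-site requests.
    # No Domain attribute: the cookie is scoped to the host that set it.
    # No Max-Age or Expires: the browser treats it as a session cookie.
    return cookie["sid"].OutputString()

header = build_session_cookie(opted_out=False)
```

Keeping cookie creation in one small function like this also helps with auditability, since there is a single place to point to when someone asks what the cookie contains.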
The TL;DR on cookies? Avoid them if you can, use them if you must, get aligned with your legal team, and keep everything transparent to your web visitors.
Guideline #2: Avoid storing PII or anything that can be mapped to an individual user
This should go without saying, but you should avoid storing any personally identifiable information (PII) for your analytics. PII includes, but is not limited to, usernames, passwords, email addresses, and credit card numbers. These may be necessary to actually run your site (for example, if you have an eCommerce store), but you really shouldn’t need them for your analytics. As you define the events data schema for your analytics, never include fields that could contain PII, and even consider building masking functions into the tracking code that you embed on your website.
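A masking function could look something like the sketch below. It is written in Python for consistency with the other examples here, though the same logic would normally live in the JavaScript tracking snippet or at the ingestion endpoint. The `BLOCKED_FIELDS` set, the email pattern, and the `scrub_event` name are assumptions, and the pattern is deliberately simple rather than a complete email matcher.

```python
import re

# Hypothetical deny-list of field names that must never reach analytics.
BLOCKED_FIELDS = {"email", "username", "password", "card_number"}

# Simple pattern for email-like strings embedded in URLs or free text.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def scrub_event(event: dict) -> dict:
    """Drop blocked fields and redact email-like strings in the rest."""
    clean = {}
    for key, value in event.items():
        if key.lower() in BLOCKED_FIELDS:
            continue  # Drop the field entirely.
        if isinstance(value, str):
            value = EMAIL_RE.sub("[redacted]", value)
        clean[key] = value
    return clean

scrubbed = scrub_event({
    "event": "signup",
    "email": "jane@example.com",
    "url": "/welcome?u=jane@example.com",
})
```

Here the `email` field is dropped and the address inside the URL is replaced with `[redacted]`. Query strings and referrers are a common place for PII to leak in, so they are worth scrubbing even when the schema itself looks clean.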
Related: How an eCommerce giant replaced Google Analytics.
Even beyond PII, it’s a good idea to avoid “fingerprinting”, that is, creating or storing any kind of data that could be tied back to an individual user. For example, if you’re tracking sessions, anonymize them by creating randomized and/or hashed session IDs.
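Both options mentioned above can be sketched in a few lines of Python. The function names are hypothetical, and the idea of rotating the salt (for example, daily) and then discarding it is one common approach rather than a requirement: once the salt is gone, the hashed IDs can't be recomputed from the raw values.

```python
import hashlib
import hmac
import secrets

def random_session_id() -> str:
    """A purely random ID: it encodes nothing about the visitor."""
    return secrets.token_hex(16)

def hashed_session_id(raw_id: str, salt: bytes) -> str:
    """Keyed hash of an existing ID. Without the salt it can't be recomputed."""
    return hmac.new(salt, raw_id.encode("utf-8"), hashlib.sha256).hexdigest()

# Assumed rotation policy: generate a fresh salt per day and never persist it.
todays_salt = secrets.token_bytes(32)
tomorrows_salt = secrets.token_bytes(32)

a = hashed_session_id("internal-session-123", todays_salt)
b = hashed_session_id("internal-session-123", todays_salt)
c = hashed_session_id("internal-session-123", tomorrows_salt)
# a == b (stable within a day), a != c (unlinkable across days)
```

A keyed hash (HMAC) is used instead of a bare SHA-256 so that someone who knows or guesses the raw ID can't confirm it by hashing it themselves.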
Guideline #3: Always aggregate metrics above individual levels
As you build analytical dashboards or processes that use the data sent by your tracking code, remember that every row in a table of raw events was generated by an individual user. You relied on that person to give you that information, and you (hopefully) asked for their consent and/or demonstrated that you would never analyze their individual behavior.
We think that the ethical approach to privacy-first web analytics is to never create analytical views that would process or expose data at the individual level.
Said differently, all of your metrics should be aggregated above the visitor, item, or order level. And in that vein, you should only store raw, user-level data for as short a time as possible, and rely on materialized aggregations for long-term analysis.
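The roll-up-then-purge pattern described above might look like this, using Python's built-in `sqlite3` module purely for illustration. The table names, columns, and the 7-day cutoff are assumptions; a production system would more likely use materialized views or TTLs in its analytics database.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE raw_events (ts TEXT, session_id TEXT, path TEXT);
CREATE TABLE daily_pageviews (
    day TEXT, path TEXT, views INTEGER, sessions INTEGER,
    PRIMARY KEY (day, path)
);
""")
db.executemany("INSERT INTO raw_events VALUES (?, ?, ?)", [
    ("2023-01-01T10:00:00", "s1", "/"),
    ("2023-01-01T10:05:00", "s1", "/pricing"),
    ("2023-01-01T11:00:00", "s2", "/"),
    ("2023-01-09T09:00:00", "s3", "/"),  # Recent: stays in raw_events.
])

def roll_up_and_purge(conn: sqlite3.Connection, cutoff_day: str) -> None:
    """Aggregate raw rows from before cutoff_day, then delete those rows."""
    with conn:  # One transaction: both statements commit, or neither does.
        conn.execute("""
            INSERT OR REPLACE INTO daily_pageviews
            SELECT substr(ts, 1, 10), path, COUNT(*), COUNT(DISTINCT session_id)
            FROM raw_events
            WHERE substr(ts, 1, 10) < ?
            GROUP BY substr(ts, 1, 10), path
        """, (cutoff_day,))
        conn.execute(
            "DELETE FROM raw_events WHERE substr(ts, 1, 10) < ?", (cutoff_day,)
        )

roll_up_and_purge(db, "2023-01-08")
rows = db.execute(
    "SELECT day, path, views, sessions FROM daily_pageviews ORDER BY path"
).fetchall()
remaining = db.execute("SELECT COUNT(*) FROM raw_events").fetchone()[0]
```

After the roll-up, the long-lived table holds only counts per day and path, and the session IDs for the older rows no longer exist anywhere.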
Guideline #4: Avoid use case scope creep
If you’re choosing to build an alternative to Google Analytics, chances are you have a particular use case or set of use cases in mind. And if these use cases depend on tracking visitors across your website, then you’ll likely be using a cookie. And when you set a cookie, you will either need to ask for consent or notify your visitors of that cookie. And importantly, you should be telling them how you are using the data collected by that cookie.
But what if a Product Manager comes running, sees the amazing analytics you’ve built, and asks you to add a new event type into your tracking code so they can solve their own use case?
If you acquiesce, you may jeopardize your legal standing. It’s not that you can’t solve their use case; you just need to be transparent, ethical, and thoughtful about it. You’ll need to update your cookie policies, refresh your consent/notice banners, and circle back with your legal team. Which is about as good a reason as any to turn down a request from a PM. You’re welcome ;).
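One way to make scope creep harder to do by accident is to enforce an allow-list at ingestion, so a new event type can't be collected until someone deliberately adds it (ideally in the same change that updates your policies). The following Python sketch assumes a hypothetical `APPROVED_EVENTS` mapping and `validate_event` helper; the event names and fields are made up.

```python
# Hypothetical allow-list: event type -> fields disclosed in the cookie policy.
APPROVED_EVENTS = {
    "pageview": {"path", "referrer"},
    "button_click": {"path", "button_id"},
}

class UnapprovedEventError(ValueError):
    """Raised when an event or field falls outside the approved scope."""

def validate_event(event_type: str, payload: dict) -> dict:
    """Return the payload unchanged if it is in scope, otherwise raise."""
    allowed = APPROVED_EVENTS.get(event_type)
    if allowed is None:
        raise UnapprovedEventError(f"event type not approved: {event_type}")
    extra = set(payload) - allowed
    if extra:
        raise UnapprovedEventError(f"fields not approved: {sorted(extra)}")
    return payload

ok = validate_event("pageview", {"path": "/pricing"})
```

With this in place, a request such as `validate_event("scroll_depth", {...})` fails loudly instead of quietly widening what you collect.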
Want to create a privacy-first alternative to Google Analytics?
If you’re convinced that you want to build your own Google Analytics alternative, and you’re willing to take the steps to ensure privacy, transparency, and legal compliance, then you might check out Tinybird. Tinybird offers a DIY analytics platform for analysts and developers that lets you build privacy-first, low-latency analytics on any volume of traffic. Tinybird solves the sampling, latency, and freshness issues that Google Analytics presents, and does it in a way that gives you complete control over what data is collected, stored, and processed.
If you need a starting point, then check out Tinybird’s Web Analytics Starter Kit. You can create a basic, privacy-first visitor analytics dashboard in about 3 minutes.