Choosing between Deterministic and Probabilistic Data

In the cookieless era, identity data plays a significant role in building customer profiles for marketers. Are you using the right approach?

Building customer profiles is not only about acquisition or sales growth. It also strongly affects marketing strategy, data privacy, and customer relationships. To identify the customers who interact with the brand and deliver relevant experiences, companies need to resolve cross-device data into unified customer profiles. One of the best ways to do that is through identity data.

What does identity data entail?

Identity data is a collection of customer details such as name, address, bank account number, health records, and other highly sensitive information. While governments and healthcare companies are the main collectors, other sectors such as retail, banking, and third-party payment processors also collect it, mainly for marketing purposes.

There are two main types of identity data that marketers access: deterministic and probabilistic identity data.

These terms have been familiar to digital advertisers, publishers and ad tech executives for years. However, as the entire industry hunts for alternatives to the third-party cookie, they are being used more frequently, especially in descriptions of how the new crop of so-called cookieless identifiers works.

What is probabilistic data?

Probabilistic data is data based on behavioural events like page views, time spent on page, or click-throughs. This data is analysed and grouped by the likelihood that a user belongs to a certain demographic, socio-economic status or class.

To generate probabilistic data, algorithms will identify pre-defined behavioural patterns such as interests or browsing behaviours to determine the probability of the user’s age, gender or socio-economic status. Behavioural patterns could be as general as grouping users according to the types of media they’re most likely to consume, or they could be more precise and group audiences by the type of device they’re most likely to use to access a touchpoint.
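As a rough illustration of how behavioural events can be turned into a probability, the sketch below applies a simple naive Bayes update. The age bands, event names, priors and likelihoods are all hypothetical values chosen for the example; a real system would estimate them from users whose attributes are already known.

```python
from math import prod

# Hypothetical priors and per-event likelihoods, for illustration only.
PRIORS = {"18-34": 0.5, "35+": 0.5}
LIKELIHOODS = {
    "18-34": {"viewed_gaming": 0.6, "viewed_pensions": 0.1},
    "35+": {"viewed_gaming": 0.2, "viewed_pensions": 0.5},
}


def segment_probabilities(events):
    """Return P(segment | events) using a naive Bayes update."""
    scores = {
        # Events with no known likelihood are ignored (factor of 1.0).
        seg: PRIORS[seg] * prod(LIKELIHOODS[seg].get(e, 1.0) for e in events)
        for seg in PRIORS
    }
    total = sum(scores.values())
    return {seg: score / total for seg, score in scores.items()}


probs = segment_probabilities(["viewed_gaming", "viewed_gaming"])
```

With these made-up numbers, two gaming page views shift the estimate strongly towards the younger band, but the result is still a likelihood rather than a fact.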

How is probabilistic data used?

Probabilistic data can be used to add more value to deterministic datasets and to scale deterministic data models. If something is unknown in a deterministic dataset, enriching the data with probabilistic data can offer more accurate insights.
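One way to picture this enrichment is a function that fills gaps in a known profile with predicted values while recording where each value came from. This is a minimal sketch under assumptions: the field names, the `(value, confidence)` format and the 0.7 confidence cut-off are hypothetical.

```python
def enrich_profile(known, predicted, min_confidence=0.7):
    """Fill gaps in a deterministic profile with sufficiently confident predictions."""
    # Keep every known (non-missing) value and mark it as deterministic.
    enriched = {
        field: {"value": value, "source": "deterministic"}
        for field, value in known.items()
        if value is not None
    }
    # Only add a prediction where nothing is known and confidence is high enough.
    for field, (value, confidence) in predicted.items():
        if field not in enriched and confidence >= min_confidence:
            enriched[field] = {
                "value": value,
                "source": "probabilistic",
                "confidence": confidence,
            }
    return enriched


profile = enrich_profile(
    known={"email": "a@example.com", "age_band": None},
    predicted={"age_band": ("18-34", 0.9), "gender": ("f", 0.55)},
)
```

Keeping the source label on every field means a predicted value is never mistaken for a known one later on.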

What is deterministic data?

Deterministic data is linked to something that identifies a user, like an email address or a cookie ID, and is taken to be true with certainty. Deterministic data provides a solid foundation for marketing operations because it is based on fact. For example, if a user signs up in one year and gives their current age, it is a fact that the following year they will be a year older.

In addition to demographic information, deterministic data can also take the form of a user’s interests or commonly visited geographical locations. Having factual data of this kind is critical to helping marketers refine the accuracy of their personalised and targeted marketing efforts.

How is deterministic data used?

Deterministic data can be used to provide accuracy and clarity in targeted marketing campaigns and to enhance probabilistic segments. One effective use case for deterministic data is the creation of granular segments to target users with relevant campaigns. For example, you could group users who you know for a fact share an interest in cycling.

Deterministic data can also be used to supplement and enhance the accuracy of marketing prediction. Prediction is used to make educated guesses about users when the information is not available in deterministic data. Marketers may attempt to guess ages, genders or interests and then create probabilistic segments from those predictions, which can be fed into CDPs (Customer Data Platforms).

However, predictions can be wholly inaccurate, which can then lead machine-learning algorithms to produce unsatisfactory results. For this reason, filling in unknown information with deterministic data wherever possible gives the algorithm a higher level of accuracy.

How deterministic and probabilistic data are collected

Deterministic and probabilistic data are collected in two different ways:

Deterministic data is typically collected when users enter their own information, for example when signing up for a service. Common channels for collecting deterministic data include online surveys, social media platforms, point of sale (POS) software and newsletters. Companies can then use deterministic matching to encrypt personally identifiable information (PII) and use it to recognise profiles at future logins.
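A minimal sketch of deterministic matching might look like the following. It swaps in a one-way SHA-256 hash of the normalised email address as the matching key, which is an assumption made for this example rather than something the approach prescribes; the `profiles` store and `profile_id` values are hypothetical.

```python
import hashlib


def email_key(email):
    """Normalise an email address and return its SHA-256 hex digest."""
    normalised = email.strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


# Hypothetical profile store keyed by hashed email rather than raw PII.
profiles = {email_key("Jane.Doe@example.com"): {"profile_id": "p-001"}}


def match_login(email):
    """Return the stored profile for an exact identifier match, else None."""
    return profiles.get(email_key(email))


returning_user = match_login("  jane.doe@EXAMPLE.com ")
unknown_user = match_login("someone.else@example.com")
```

The match is all or nothing: either the normalised identifier is identical or no profile is returned.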

Probabilistic data is usually collected anonymously, based on a user's browsing behaviour, for example by gathering browser cookies or tracking website clicks. The information is then aggregated to create a model of a customer, which can be compared to deterministic data points. Probabilistic matching takes place when a user's behavioural data is matched, with some level of confidence, to a registered, known user. It can also be used in identity resolution to recognise the same user across multiple devices and applications.
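Probabilistic matching can be sketched as scoring how many signals an anonymous session shares with each known profile and accepting the best candidate only above a threshold. The signals, weights and the 0.6 threshold below are hypothetical; production systems typically learn them from data.

```python
# Hypothetical signal weights, for illustration only.
WEIGHTS = {"ip": 0.4, "user_agent": 0.2, "timezone": 0.1, "top_category": 0.3}


def match_score(session, profile):
    """Sum the weights of the signals on which session and profile agree."""
    return sum(
        weight
        for signal, weight in WEIGHTS.items()
        if session.get(signal) is not None
        and session.get(signal) == profile.get(signal)
    )


def best_match(session, profiles, threshold=0.6):
    """Return (profile_id, score) for the best candidate above threshold, else None."""
    scored = [(pid, match_score(session, p)) for pid, p in profiles.items()]
    if not scored:
        return None
    pid, score = max(scored, key=lambda item: item[1])
    return (pid, score) if score >= threshold else None


known_profiles = {
    "p-001": {"ip": "203.0.113.7", "user_agent": "Safari",
              "timezone": "Europe/London", "top_category": "cycling"},
    "p-002": {"ip": "198.51.100.2", "user_agent": "Firefox",
              "timezone": "Europe/London", "top_category": "cooking"},
}
session = {"ip": "203.0.113.7", "user_agent": "Chrome",
           "timezone": "Europe/London", "top_category": "cycling"}
result = best_match(session, known_profiles)
```

Unlike a deterministic match, the result carries a score, so the threshold decides how much reach is traded for accuracy.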

Choosing between deterministic and probabilistic data

Deciding which data approach is best depends on the underlying business goal.

When to choose a probabilistic model

If your goal is to target audiences who might be interested in buying certain types of products, probabilistic data can help you reach a larger audience rather than pinpointing precisely which consumers qualify as prospects.

When to choose a deterministic model

A deterministic model is appropriate when an outcome can be determined with certainty. For example, a software platform selling its technology products may use this type of model to set prices or forecast demand for new products. In general, this type of modelling is used in situations where it is important to make decisions based on objective facts rather than subjective opinions about what might happen in the future.

If your organisation is making use of CDPs, deterministic data can be used to create 360-degree customer views.

CDPs use AI and machine learning to collect, manage and analyse both deterministic and probabilistic data from multiple disparate sources at high speed. Some CDPs rely mainly on deterministic data (unique identifiers such as email addresses and phone numbers), but they need access to both deterministic and probabilistic data to perform customer identity resolution and build a complete customer profile.

Most data management processes use both methods together. More specifically, probabilistic data can be used to add value to deterministic data. One way is to use probabilistic data to widen the scale and expand the reach of deterministic data. When something is unknown in the deterministic dataset, probabilistic data can give companies their best estimate. Another way is to use probabilistic data to learn more about the users in the deterministic dataset, for example by finding out which known customers might be interested in other products or by understanding their preferred browsing behaviour.

Deterministic data can also be used to train probabilistic data models. When a probabilistic model is created, it can be compared to the known deterministic data for validation. Without a solid foundation of deterministic data, the probabilistic data cannot be precise.
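The validation step can be sketched as a simple accuracy check: compare each predicted segment with the known value for the users where both exist. The user IDs and age bands below are hypothetical.

```python
def validation_accuracy(predicted, known):
    """Share of users whose predicted segment equals the known value.

    Only users present in both mappings are compared; returns None
    when there is no overlap to validate against.
    """
    overlap = [uid for uid in predicted if uid in known]
    if not overlap:
        return None
    correct = sum(predicted[uid] == known[uid] for uid in overlap)
    return correct / len(overlap)


predicted_age = {"u1": "18-34", "u2": "35+", "u3": "18-34", "u4": "35+"}
known_age = {"u1": "18-34", "u2": "18-34", "u3": "18-34"}
accuracy = validation_accuracy(predicted_age, known_age)
```

Here `u4` has no known age, so it is left out of the comparison, which is why the size of the deterministic dataset limits how well a probabilistic model can be checked.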

Applications of deterministic and probabilistic data

When combined, deterministic and probabilistic data can be used for:

Properly executing cross-device tracking and attribution.
Validating the success of marketing campaigns toward new audiences.
Enhancing deterministic data with probabilistic information, such as profiling multiple family members that share the same account.
Creating buyer personas that can be used for customer segmentation.
Launching programmatic buying campaigns, such as making product suggestions.
Charting customer profiles in an accurate, real-time identity graph.
Expanding the reach of advertising across various audiences.
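To make the identity graph and cross-device items above more concrete, here is a minimal sketch that links identifiers with a union-find structure, so that any two identifiers connected through a shared login resolve to the same person. The class name, identifier formats and the choice of union-find are assumptions for illustration, not a description of how any particular platform works.

```python
class IdentityGraph:
    """Union-find over identifiers; linked identifiers share one root."""

    def __init__(self):
        self.parent = {}

    def find(self, identifier):
        """Return the root identifier of the group this identifier belongs to."""
        self.parent.setdefault(identifier, identifier)
        while self.parent[identifier] != identifier:
            # Path halving keeps lookups fast as the graph grows.
            self.parent[identifier] = self.parent[self.parent[identifier]]
            identifier = self.parent[identifier]
        return identifier

    def link(self, a, b):
        """Record that identifiers a and b belong to the same person."""
        self.parent[self.find(a)] = self.find(b)

    def same_person(self, a, b):
        return self.find(a) == self.find(b)


graph = IdentityGraph()
graph.link("email:hash-abc", "device:phone-1")   # e.g. a login on the phone
graph.link("email:hash-abc", "device:laptop-7")  # e.g. a login on the laptop
```

In this sketch the links would come from deterministic events such as logins, while probabilistic matches could be added as further links once they pass a confidence threshold.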