Campaign Ads Segmentation and Wide Table

Segmenting customers and targeting the right audience are critical marketing needs for any manufacturer. A key prerequisite is bringing the relevant data sources together so that past customer behavior can be analyzed for trends and patterns.

SpringML has been involved in building Customer Data Depots on Google Cloud Platform using Google Cloud Dataflow, Google Cloud Storage, and Google BigQuery. These data depots integrate data from the systems that make up a typical ad tech stack, such as DoubleClick for Publishers, Data Management Platforms like BlueKai, and ad measurement systems like MOAT.

Why did we choose Google Cloud Platform?  Three reasons:

  1. Integrated stack – Dataflow, Pub/Sub, BigQuery, and GCS form an integrated stack that lets developers implement solutions rapidly. This contrasts with platforms assembled through tech acquisitions, whose components may not work together seamlessly.
  2. Serverless – GCP allows developers to focus on writing code and solving business problems, and not worry about scaling or infrastructure setup.
  3. Automated – pipelines can be fully automated requiring minimal intervention for tuning or deployments.

Understanding Campaign Ad Analysis

A campaign is a program of publishing one or more ads for a particular product. The ads can be published across multiple ad platforms, such as websites, video, and Flashtalking.

The goal is to identify user segments (for example age group, demographics, location, and interests) and to find clusters or trends in how these segments respond to the ads, so that the customer can target them better in future campaigns. Ads data arrives as cookies with their associated categories (user segments and information) and as MOAT metrics (user experience metrics).

Ads data comes from two major data sources: BlueKai and MOAT.


BlueKai ads data consists predominantly of ad cookies and ad categories. BlueKai categories contain user-specific information such as age bracket, location, interests, and behaviours.

The BlueKai database uses a taxonomy that holds two hierarchies: Brand-Campaigns and User Categories. Both hierarchies are stored in the BlueKai taxonomy table, but they are identified differently.

To identify a Brand-Campaign, look for a taxonomy_path that contains Self-Classification, such as Company – Private –> Self-Classification –> –> FY17 ABC Tires. Here, the node immediately after Self-Classification (Self-Classification+1) is the brand, and Self-Classification+2 is the campaign (FY17 ABC Tires).
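The Self-Classification rule above can be sketched as a small parser. This is a minimal sketch that assumes each taxonomy_path is a string whose nodes are separated by "->"; the real delimiter in the BlueKai taxonomy table may differ, so `sep` is a parameter, and "ExampleBrand" in the usage line is a hypothetical brand name.

```python
from typing import Optional, Tuple

SELF_CLASSIFICATION = "self-classification"


def brand_and_campaign(taxonomy_path: str, sep: str = "->") -> Optional[Tuple[str, str]]:
    """Return (brand, campaign) for a Brand-Campaign path, or None otherwise.

    Assumption: nodes are separated by `sep`; the actual BlueKai delimiter may differ.
    """
    nodes = [node.strip() for node in taxonomy_path.split(sep)]
    lowered = [node.lower() for node in nodes]
    if SELF_CLASSIFICATION not in lowered:
        return None  # no Self-Classification node: this is a User Category path
    i = lowered.index(SELF_CLASSIFICATION)
    if len(nodes) < i + 3:
        return None  # path ends before reaching the campaign level
    # Self-Classification+1 is the brand, Self-Classification+2 is the campaign
    return nodes[i + 1], nodes[i + 2]
```

For example, `brand_and_campaign("Company - Private -> Self-Classification -> ExampleBrand -> FY17 ABC Tires")` returns `("ExampleBrand", "FY17 ABC Tires")`, while a path without Self-Classification returns `None`.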

User Categories hold the user-specific information described above. To find the User Category hierarchies, look for all taxonomy_paths that do not contain Self-Classification. For example, Autos –> Aftermarket –> Auto Parts Buyers describes nested user groupings: a group of users interested in Autos, a subset of that group interested in Aftermarket, and, within the Aftermarket subgroup, users who are auto parts buyers. A user may fall into the Aftermarket bucket without being interested in auto parts. BlueKai attributes cover demographics, age, and location, but they go well beyond that, as this example shows.
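One way to express the nesting described above is to expand a User Category path into every level it implies: a user tagged at the leaf also belongs to each ancestor group, but not the other way round. This is a sketch that assumes the same hypothetical "->" delimiter as a string-encoded taxonomy_path.

```python
from typing import List

SELF_CLASSIFICATION = "self-classification"


def is_user_category(taxonomy_path: str, sep: str = "->") -> bool:
    """True if no node is Self-Classification, i.e. the path is not a Brand-Campaign."""
    return all(node.strip().lower() != SELF_CLASSIFICATION for node in taxonomy_path.split(sep))


def category_levels(taxonomy_path: str, sep: str = "->") -> List[str]:
    """List every prefix of the path; a user at the leaf belongs to all of them."""
    nodes = [node.strip() for node in taxonomy_path.split(sep)]
    return [f" {sep} ".join(nodes[: k + 1]) for k in range(len(nodes))]
```

For "Autos -> Aftermarket -> Auto Parts Buyers" this yields three levels, from "Autos" down to the full path. Membership only propagates upward, which matches the point that an Aftermarket user is not necessarily an auto parts buyer.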


MOAT metrics capture user experience information, such as how long a user spent on a screen, hovers, clicks, and whether the user came back to that screen. MOAT metrics come from three logs: the MOAT video log (video ad categories), the MOAT display log (display ad categories), and the MOAT Flashtalking log (display, rich media, etc.).

Some examples of MOAT categories are:

  • Very Much Liked (watched the whole thing)
  • Not Liked at all (closed the window after the ad was displayed)
  • Meh (the ad was displayed but the user did not pay attention)
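To make the three example categories concrete, here is a sketch that maps a single ad impression to one of them. The boolean fields are hypothetical stand-ins for whatever the MOAT logs actually record, and the order in which the rules are checked is an assumption; real MOAT categories are derived from richer metrics.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdImpression:
    # Hypothetical per-impression flags; not actual MOAT log field names.
    watched_to_end: bool
    closed_after_display: bool
    user_attentive: bool


def moat_category(imp: AdImpression) -> Optional[str]:
    """Assign one of the example categories; rule precedence is an assumption."""
    if imp.watched_to_end:
        return "Very Much Liked"
    if imp.closed_after_display:
        return "Not Liked at all"
    if not imp.user_attentive:
        return "Meh"
    return None  # belongs to some other MOAT category not modeled here
```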

For example, some customers have around 50 MOAT categories against which ads are measured.

BlueKai and MOAT data need to be seen together to get a true picture of user responsiveness to ads. The join is done using the BKUID and Campaign ID fields that are present in both the BlueKai and MOAT datasets.
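The join on BKUID and Campaign ID can be sketched in plain Python. In BigQuery this would typically be a SQL JOIN on both columns; the sketch below only illustrates the key logic. The row shape (`bkuid`, `campaign_id`, `category` keys) is a hypothetical simplification of the two datasets.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Key = Tuple[str, str]  # (BKUID, Campaign ID)
Row = Dict[str, str]   # hypothetical shape: {"bkuid": ..., "campaign_id": ..., "category": ...}


def _group(rows: Iterable[Row]) -> Dict[Key, Set[str]]:
    """Collect the set of categories seen for each (BKUID, Campaign ID) pair."""
    grouped: Dict[Key, Set[str]] = defaultdict(set)
    for row in rows:
        grouped[(row["bkuid"], row["campaign_id"])].add(row["category"])
    return grouped


def join_bluekai_moat(bluekai_rows: Iterable[Row], moat_rows: Iterable[Row]) -> Dict[Key, Dict[str, Set[str]]]:
    """Inner join: keep only keys present in both BlueKai and MOAT data."""
    bk = _group(bluekai_rows)
    moat = _group(moat_rows)
    return {key: {"bluekai": bk[key], "moat": moat[key]} for key in bk.keys() & moat.keys()}
```

An inner join drops cookies that have no MOAT activity for the campaign. If those users should still appear in the final table (with zeros for every MOAT category), a left join from the BlueKai side is the better choice.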


One way to analyze this integrated data for customer segments is to create a pivot table of the BlueKai cookies (aka BlueKai IDs) against the BlueKai and MOAT categories associated with them. The table below defines the requirement:

BlueKai ID | Age Group | Location | Interests | Very Much Liked | Not Liked at All | Meh

Each cell holds a 0 or a 1. A 0 means the user never touched that category, i.e. the category does not apply to the associated BlueKai ID. A 1 (100%) means the user recorded activity against that category. Once data is in this format, it can be analyzed in many ways, including applying machine learning models to cluster users or to predict which users are likely to engage positively with an ad.
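The pivot from long-format activity records to the wide 0/1 table can be sketched as follows. This assumes the input is a sequence of (BlueKai ID, category) pairs and that the caller supplies the ordered list of category columns; in practice, each age bracket, location, and interest is its own column rather than one grouped "Age Group" column.

```python
from typing import Dict, Iterable, List, Sequence, Set, Tuple, Union

Cell = Union[str, int]


def pivot_to_wide(
    activity: Iterable[Tuple[str, str]], categories: Sequence[str]
) -> Tuple[List[str], List[List[Cell]]]:
    """Turn (bkuid, category) pairs into a header plus one 0/1 row per BlueKai ID."""
    seen: Dict[str, Set[str]] = {}
    for bkuid, category in activity:
        seen.setdefault(bkuid, set()).add(category)
    header = ["BlueKai ID", *categories]
    rows: List[List[Cell]] = [
        # 1 if the user recorded activity against the category, else 0
        [bkuid, *(1 if c in seen[bkuid] else 0 for c in categories)]
        for bkuid in sorted(seen)
    ]
    return header, rows
```

Categories that are not listed in `categories` are ignored, so the column set stays fixed across campaigns, which keeps the output usable as input to a clustering model.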

We used Google Cloud Dataflow to build the pivot table, sourcing data from BigQuery and writing the output as a CSV file to Google Cloud Storage. Dataflow offered several advantages. Parameterization let end users create such tables for a particular campaign, for example. Dataflow's autoscaling allocated workers dynamically based on the volume of data being processed. Finally, the pipeline could be scheduled as cron jobs managed through Google App Engine, which made support and maintenance easy for IT.
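The Dataflow pipeline itself is not reproduced here. As a stand-in, the sketch below shows the two ideas the paragraph describes, a campaign parameter and CSV output, using only the Python standard library. The option names `--campaign_id` and `--output` and the (bkuid, campaign_id, category) record shape are hypothetical, not the original pipeline's options.

```python
import argparse
import csv
import io
from typing import Dict, Iterable, List, Sequence, Set, Tuple


def parse_args(argv: Sequence[str]) -> argparse.Namespace:
    # Hypothetical job parameters, illustrating per-campaign parameterization.
    parser = argparse.ArgumentParser(description="Build a wide segmentation table for one campaign")
    parser.add_argument("--campaign_id", required=True)
    parser.add_argument("--output", default="wide_table.csv")
    return parser.parse_args(list(argv))


def wide_csv_for_campaign(
    records: Iterable[Tuple[str, str, str]], campaign_id: str, categories: List[str]
) -> str:
    """Filter (bkuid, campaign_id, category) records to one campaign; render 0/1 CSV text."""
    seen: Dict[str, Set[str]] = {}
    for bkuid, cid, category in records:
        if cid == campaign_id:
            seen.setdefault(bkuid, set()).add(category)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["BlueKai ID", *categories])
    for bkuid in sorted(seen):
        writer.writerow([bkuid, *(int(c in seen[bkuid]) for c in categories)])
    return buf.getvalue()
```

In a real Dataflow job the CSV text would be written to a Cloud Storage path rather than returned as a string; the filtering step is where a runtime campaign parameter would plug in.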

Customer segmentation is a complex problem, and the approach here is one option that lets business and marketing analysts quickly analyze trends and create segments for new ads. There are several other ways to do this, and we will outline them in future blogs.