Using Informatica for Complex Data Integrations into Einstein Analytics

Salesforce Einstein Analytics is a great tool for end users to gain insights on business performance.

However, critical business data doesn’t always live 100% within Salesforce. For example, manually maintained sales goals, commission plans, and revenue are external datasets that are often needed to get the full picture on business performance.  Some external datasets can be a simple CSV upload, however some integrations require enterprise grade ETL.

In this post, I’ll give a high-level overview of using Informatica Cloud, and a few example of common use cases.

High Level Use Case

Informatica Cloud gives you the tools you need for complex data integrations into Einstein Analytics.  Combined with a Staging Database or Data Lake, you can:

  1. Connect to dozens of external systems, using Informatica’s many Connectors.
  2. Build complex mappings for combining, augmenting, filtering, and transforming source data.
  3. Schedule and organize integration tasks with Task Flows.
  4. Synchronize Salesforce objects and external database tables into a central data lake.
  5. Re-use existing integrations to build quick data pipelines
  6. Use Informatica validation services e.g Address Doctor for address validation.

Use of Einstein Analytics Connectors

Einstein has a built-in Salesforce Analytics connector which makes it very easy to load data into Einstein Analytics.  Creating the connection is as simple as entering your Salesforce credentials, and defining a temp folder for your dataset definitions.  You can then use Einstein Analytics as a target in a Mapping or Data Synchronization Task..

Informatica uses as JSON file to define the target dataset metadata.  It will even automatically create a target metadata file for you, using the metadata from your source system.  You can also modify the JSON definition file after Informatica creates it for you.  The most common change you might make is changing data types between text and numeric, so that Einstein Analytics correctly categorizes them as dimension or measures.

Using Mappings to pull data from REST APIs

One common use case is extracting data from a REST API.  I recommend first using Postman for testing and troubleshooting your API connection.  Then you can use Informatica Cloud to build your connection to the API (which I won’t cover here).  You can find more information in Informatica’s Online Help.

Once you have your API connection setup, you’re ready to build a mapping. In the example below, we are doing 6 things in order.

  1. Source – Use a table to return one row per date.  This allows us to parameterize the dates we want to load, and also lowers the volume of data returned in each response from the API.
  2. Expression – Set up your API authorization request parameters.
  3. Web Service – Call the API to get an access token.
  4. Expression – Set up your report request parameters (including the authorization).
  5. Web Service – Call the API to get your data.
  6. Target

Data Synchronization and Data Replication Tasks

For making a quick direct copy (with minimal transformations and no additional steps) of a source object into another system, you can use a Data Synchronization Task.  This allows you to define a source, target, filters, and map your source and target fields.

If you need to replicate a larger number of objects, you can use a Data Replication Task.  This is very similar to Source Data Replication within Einstein Analytics.

With a Data Replication Task you can:

  • Select multiple sources
  • Define a prefix for your target objects (eg, “SF_”)
  • Include/exclude fields
  • Filter data
  • Schedule incremental/full replication

Things to consider:

  1. Security and Firewall considerations (and installation) for Secure Agent for connecting to on-premise applications.
  2. Dataset refresh. Performance considerations on full data refresh vs incremental appends.
  3. Error handling. Setting up email notifications on data processing failures/warnings, and designing tasks/mappings to be easily restarted without missing or loading duplicate data.

The Salesforce Analytics connector can only create and write to datasets.  If you need to read from an existing dataset, you’ll need to the use Analytics REST API.