Eliminating ETL Tools and Reducing Costs using Snowflake’s Native Capabilities

A lot of companies are adopting cloud data warehousing services like Snowflake since it enables data processing, storage, and analytic solutions that are easier to use, faster, and more flexible than traditional offerings.

In this article, we demonstrate a flexible and scalable solution for managing and analyzing usage data present in Google Cloud Storage (GCS) without using any ETL tool, to land the data into Snowflake, where it can be joined with other data sources to provide greater insights. To access the usage data in GCS, we will create a Snowflake stage that points to the GCS bucket containing the usage data file. This stage is used to load the usage data into Snowflake using the COPY command. To get the Account and Subscription data present in Salesforce into Snowflake, we will use the third-party integration tool, Fivetran, to load the Salesforce data into a Snowflake table. Fivetran offers a number of key features that make it easy to manage and process data including, support for large volumes of data, incremental loading, and integration with transformation tools like dbt.

Data from both these sources (GCS and Salesforce) are loaded into Snowflake tables to aggregate the monthly usage data which is used for billing purposes. The usage data is then visualized using Tableau.

We will go over the following steps in detail:

  • Architecture
  • Google Cloud Storage (GCS) to Snowflake
  • Salesforce to Snowflake
  • Aggregation of the Usage Data in Snowflake
  • Visualization in Tableau

Architecture

Figure 1 below illustrates the high-level architecture, including all the components involved and how the data flows.

Figure 1 - Notional architecture diagram
Figure 1 - Notional architecture diagram

Snapshots of the Account, Subscription, and Usage tables (Figures 2, 3, and 4) are included below.

Figure 2 - Snapshot of the Account Table
Figure 2 - Snapshot of the Account Table
Figure 3 - Snapshot of the Subscription Table
Figure 3 - Snapshot of the Subscription Table
Figure 4 - Snapshot of the Usage Table
Figure 4 - Snapshot of the Usage Table
Figure 5 - Relationship diagram
Figure 5 - Relationship diagram

As shown in Figure 2, Account is the master table, and each Subscription is related to an Account through a Customer Id. We are using 250 Accounts and 200 Subscription sample records and imported them into Salesforce using the data import wizard. For each Subscription, we added 6 to 7 months of daily usage data. Lastly, we uploaded the CSV file to Google Cloud Storage.

Google Cloud Storage (GCS) to Snowflake

There are two ways in which we can load data from GCS into Snowflake:

  1. Bulk Loading Using the COPY Command:
    This is a good option for loading large amounts of historical data from GCS into Snowflake in a batch process.
  2. Continuous Loading Using Snowpipe:
    This is useful for real-time data ingestion and analysis where we need to have the most up-to-date data available in Snowflake as soon as it is generated.

The best solution may depend upon the volume of data to load and the frequency of loading.
For this use case, we did bulk loading of the data from GCS into Snowflake by configuring storage integration and creating an external stage.

Alternatively, we could directly access the data in GCS through an external table feature in Snowflake without having to move the data. This can be useful if you want to perform analysis on large datasets without incurring the cost and time associated with transferring data between systems.

For Snowflake to access data from GCS, we must first grant Snowflake access to the GCS bucket through the GCS console. Then, create an external stage in Snowflake that points to the GCS bucket and then use the COPY INTO command to load the data from the stage into a Snowflake table.

Loading data from Salesforce into Snowflake through Fivetran

The Account and Subscription data was loaded from Salesforce into Snowflake tables using by connecting Salesforce as the source and Snowflake as the target.

Fivetran discovers all available standard and custom objects in Salesforce and automatically pulls data for the selected objects into Snowflake. Using the Fivetran dashboard we can start the data sync to extract data from Salesforce and load it into the target table in Snowflake.

Fivetran allows us to easily set up and manage the data transfer between Salesforce and Snowflake, ensuring that the data is always up-to-date and available for analysis in Snowflake.

Aggregation of the Usage Data in Snowflake

The final table Monthly Usage was created by combining all the 3 tables (Account, Subscription, and Usage), including the two calculated fields, ‘Quantity’ and ‘Status’.

The Logic used to compute Quantity and Status is as follows:

  • Status in the Usage table can either have a value of 1, which indicates quality checks have been passed, or a value of 0, which indicates a failed quality check.
  • The usage is summed for each of the customers for the prior month where there is a status of 1. If the number of records summed is equal to the number of days in the billing month, the billing status is set to ‘New’. Alternatively, whenever the number of records summed is less than the number of records expected in the billing month (because there were some records with a status of 0 or a record was missing altogether), the billing status is set to ‘New with Warning’ in the final table.
  • Next, the calculated usage is compared to the prepaid usage limit for each customer. If the usage exceeds the prepaid limit, the difference is recorded in the Quantity (Overage) field. If the usage did not exceed the prepaid limit, a value of zero is recorded in the Quantity (Overage) field. A snapshot of the final table Monthly_Usage is shown below.
Figure 6 - Snapshot of Monthly_Usage Table
Figure 6 - Snapshot of Monthly_Usage Table

Visualization in Tableau

Once the Final table is loaded in Snowflake, we connect it to Tableau and create a dashboard to visualize the Account and Usage information to get insights.

Figure 7 - Tableau Dashboard
Figure 7 - Tableau Dashboard

Summary

Snowflake is a powerful tool for data-driven organizations that need to access and analyze data from multiple sources to make informed decisions and drive business growth.
In this article, we demonstrate how you can use Snowflake to access and analyze data from GCS without the need for a third-party ETL (Extract, Transform, Load) tool. This reduces costs and latency allowing us to use a single platform for both extraction and transformation of the data.
We also demonstrate how you can extract data from Salesforce through Fivetran and load it directly into Snowflake, and how to use Tableau to visualize the data to gain insights.

As a trusted Snowflake partner, we can help you on your Snowflake journey. Contact us today, at info@springml.com to learn more about how we can accelerate time-to-market, deliver a better customer experience, and improve operational efficiency by leveraging the speed, scale, and performance of Snowflake’s platform.

Thought Leadership