SpringML’s takeaways from the Snowflake Summit 2022

Let’s talk about the next big things happening in the cloud world. Snowflake, the fastest growing software company to achieve a billion dollars in revenue, organized a summit recently in Vegas where they highlighted new advancements in their ecosystem and announced future developments in their pipeline that will soon be in public preview. I was lucky enough to be present at the summit to represent SpringML and showcase some of the solution accelerator use cases that we built. The summit frequently focused on the term “application disruption” and described how Snowflake can help monetize apps. The events at the summit also helped paint a clear picture of Snowflake as a Data Cloud platform with many different workloads.

I attended the summit to showcase how SpringML is keeping up with the advancements in the Snowflake community and how we can help customers achieve their data cloud dream. We presented three use cases: data migration, machine learning, and application development, all of them powered by Snowflake. Our demos resonated with the crowd at the summit.

  • BTEQ (Teradata) to Snowflake migration – We have developed this accelerator using Streamlit and dbt to cover three main pain points.
    • Conversion of Teradata (BTEQ Scripts) to Snowflake SQL format.
    • Applying the transformations into Snowflake.
    • Validating the transformed data using dbt testing.

It provides a one-stop solution for data migration tasks from Teradata to Snowflake using just a Streamlit application. The beauty of this solution is that it can run serverless with the power of dbt.
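To give a flavor of the first pain point, here is a minimal Python sketch of what a conversion step might look like. It is not the accelerator's code: the function name `convert_bteq` and the tiny rule set are hypothetical, and the sketch only assumes that BTEQ control commands start with a dot and that Teradata accepts abbreviated keywords such as `SEL`. A real converter would need a proper SQL parser.

```python
# Hypothetical, deliberately tiny rule set (an assumption for illustration only).
KEYWORD_ABBREVIATIONS = {"SEL": "SELECT", "INS": "INSERT", "DEL": "DELETE"}


def convert_bteq(script: str) -> str:
    """Drop BTEQ dot-commands and expand an abbreviated leading keyword per line."""
    converted = []
    for line in script.splitlines():
        stripped = line.strip()
        # BTEQ control commands (.LOGON, .QUIT, .IF ...) start with a dot
        # and have no Snowflake SQL equivalent, so they are skipped here.
        if not stripped or stripped.startswith("."):
            continue
        first, _, rest = stripped.partition(" ")
        keyword = KEYWORD_ABBREVIATIONS.get(first.upper(), first)
        converted.append(f"{keyword} {rest}".rstrip())
    return "\n".join(converted)


script = ".LOGON tdpid/user,password;\nSEL id, amount FROM sales;\n.QUIT;"
print(convert_bteq(script))
```

The sketch only looks at the first word of each line, so it would miss abbreviations inside subqueries, which is one reason a production tool needs real parsing.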

  • Retail price optimization and what-if analysis using Looker – This demo showcased how we can leverage Snowflake together with Looker. The use case addresses the problem of optimizing price to clear inventory within a given number of days. We built a model that predicts future sales for certain products at different price points, simulated the results for the next 180 days, and stored them in our Snowflake database. We then used Looker to run a what-if analysis on price and the number of days to clear inventory. The dashboard gives the user a holistic view of how price can be optimized to increase sales or margin by tuning these two parameters.
  • Anomaly Detection using Streamlit – Snowflake has advocated operationalizing machine learning models interactively by building applications around them. Hence, they acquired Streamlit, which lets data scientists present their model results and findings in an interactive application. We built this kind of application in Streamlit to surface useful descriptive insights and detect anomalies in the underlying data. The user only needs to select a table from Snowflake and the dimensions and metrics to analyze; the app then generates descriptive metrics and reports whether there is an abnormal trend in the data.
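For readers curious what "detecting an abnormal trend" in a metric can look like, below is a minimal sketch using a simple z-score rule. This is an assumption chosen for illustration, not necessarily the method our app uses; the function name `find_anomalies` and the sample values are hypothetical.

```python
from statistics import mean, stdev


def find_anomalies(values, threshold=3.0):
    """Return (index, value) pairs whose z-score exceeds the threshold."""
    if len(values) < 2:
        return []  # a standard deviation needs at least two points
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical daily metric with one obvious spike at the end.
daily_orders = [100, 102, 98, 101, 99, 100, 103, 97, 100, 500]
print(find_anomalies(daily_orders, threshold=2.5))
```

With only ten points a single outlier cannot reach a z-score of 3, so the example lowers the threshold to 2.5. In an app like ours, the values would come from the table, dimensions, and metrics the user selects.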

As a first-time attendee of the summit, I was astounded by the close relationship between Snowflake and its partner community, and I was amazed to see hundreds of new products based solely on Snowflake. No doubt Snowflake is going to continue as the future of the Data Cloud, and the turnout at the summit reflected that. Compared to the last in-person conference three years ago, attendance increased by almost 300%.

Being there was an amazing experience in itself: I got to connect and share thoughts with other partners and Snowflake users. There were Happy Hour events at the end of every day, where we could chat and discuss the new features and improvements with others.

In addition to all that, Snowflake’s main aim was to educate attendees about the advancements they are making towards their goal of making Snowflake a one-stop solution for everyone’s data cloud needs. This was reflected in their keynote too. As a techie myself, I believe this was the most technical keynote I have ever seen, with multiple demos and use cases shown live during the session. There are a few takeaways from the keynote that I think captured everyone’s attention. I would like to divide them into 5 parts:

  • Core Platform Advancements
  • Financial Governance
  • Apache Iceberg
  • Streaming Data Ingestion and Transformation
  • Disruption of application development in the cloud
    • Python in Snowpark
    • Streamlit native
    • Snowflake Marketplace
    • Unistore and Hybrid tables

Core Platform Advancements

As a part of the core platform advancement, this part of the keynote covered the following points –

  • 10% faster compute on AWS
  • 10% faster performance for write-heavy workloads
  • 5XL and 6XL warehouse sizes on AWS
  • 5x faster searches on maps as part of the Search Optimization Service

Financial Governance/Governance

There has long been a need in Snowflake for budgets and financial controls on resources. This part of the keynote introduced a new concept called resource groups, which lets users select objects that consume resources and assign a budget to them.

In addition to that, we all know about the replication feature that Snowflake has had for years, which allows us to replicate data across multiple accounts. They have now extended this feature to back up and retain information on users, roles, network policies, and a whole host of additional settings that are part of a Snowflake account. This also includes external resources, as part of something called pipeline replication.

Another useful feature that they introduced is masking policies. Users can redact or mask columns that contain sensitive information. This is going to be in private preview soon.

Apache Iceberg

Snowflake has supported external tables for years, and they were excited to announce support for Apache Iceberg tables on Snowflake. However, when the audience was asked how many people use Apache Iceberg, only a few hands were raised. Snowflake then took the time to explain the concept behind Apache Iceberg and how supporting Iceberg tables can help the big data community.

Streaming Data Ingestion and Transformation

This is one of the features that intrigued me the most, and I feel it makes Snowflake stand out among its competitors. With this announcement, they have made streaming ingestion into Snowflake easier. With “Snowpipe streaming”, it becomes easier to query data as soon as it lands in the Snowflake ecosystem, with 10x lower latency.

Another important announcement on the transformation side was the introduction of materialized tables, which can be seen as a midpoint between streams/tasks and materialized views. They can be paired with Snowpipe streaming to give users a simple and flexible way to query the materialized tables directly as soon as the data lands in Snowflake.

The next section of the keynote, “Disrupting data app development in the cloud”, is where they focused most deeply, and it will guide the future of monetizing data apps. As part of this section, they introduced many new technologies and concepts into Snowflake that can help with the monetization of applications, data governance, machine learning app development, and more.

Python in Snowpark

The Snowflake execs were excited about this and emphasized the importance of having a sandbox-type environment for data transformation that is secured from outside interference. It can be run from your local machine or natively, and it uses the same distributed power of Snowflake. I was amazed by this, and the demo, in which a senior developer used multiple ML libraries together with SQL, showed me the true power of Python in Snowpark. Moreover, they also addressed computation-intensive machine learning techniques by introducing large memory instances in Snowflake.
This feature was so important to them that their senior product development manager even asked me to give some feedback on it and say some nice things to the Snowpark product team.

Streamlit Integration

We are all aware of the existing gap between machine learning models and using the models to generate some actionable insights, rather than just preparing slides of the findings. By acquiring Streamlit, Snowflake took a step forward in the direction of bridging this gap.

Their main motivation for introducing Streamlit as a native app in Snowflake’s UI is to put power in the hands of data scientists to build simple applications that convey their results to their model users. We saw this live in the keynote session itself, where a senior developer built a simple native Streamlit application on top of the model that was developed in the previous demo. The most interesting aspect was that the model was stored as a UDF in Snowflake, and the Streamlit application simply called that UDF to get its prediction result.

Snowflake Marketplace

The title of this section might seem familiar to you and, to spoil the surprise, it is the same data marketplace that Snowflake introduced a couple of years ago. However, they have rebranded it as Snowflake Marketplace because it is no longer just for monetizing data; it is also for monetizing anything related to or built on Snowflake, i.e., applications. As a Snowflake partner, this intrigued us the most, as it lets Snowflake users publish and monetize any application built on Snowflake and make it accessible to millions of other Snowflake users.

Snowflake demonstrated how easy it becomes to publish a Streamlit application that does predictive forecasting, with the backend built using the Snowpark API. It also becomes very easy for users to start using the app with just a single click (if they are willing to pay for the application :D).

With this in place, Snowflake addressed the problems of security, governance, distribution, monetization, and serverless deployment. Now, nobody needs to worry about accessing customer data, as they can already access it via the Snowflake data share.

Last but not least, one of the most interesting and exciting announcements, and one that makes Snowflake stand out from its competitors, was made by Benoit: the introduction of Unistore.

Unistore and Hybrid Tables

Even after more than two hours of keynote, this announcement got a cheerful “Hell Yeah!” from the audience.

Unistore was introduced as a new workload to tackle the problem of handling transactional data on Snowflake. People have always complained that they can’t perform transactional data analysis on Snowflake. With this feature added to Snowflake’s tech stack, they have now covered all the bases. You can now analyze a combination of historical and transactional data and create reports on both. One of the live demos showed us the power of Unistore.

Unistore is powered by hybrid tables, a new type of table with fine-grained reads and writes and great analytical query performance. You can get the same query latency without any compromise in efficiency. Hybrid tables also enforce a primary key constraint to make sure there are no duplicates. This feature solves the problem of real-time analytics on transactional data and removes the dependency on other transactional database ecosystems.
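To make the point about an enforced primary key concrete, here is a small sketch. Since I could not run hybrid tables myself, it uses Python's built-in SQLite as a stand-in, purely as an assumption for illustration; the `orders` table and the `insert_order` helper are hypothetical and this is not Snowflake syntax or behavior. It only shows what "enforced" means: the second insert with the same key is rejected instead of silently creating a duplicate.

```python
import sqlite3


def insert_order(conn, order_id, status):
    """Insert one row; return False if the primary key already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO orders (order_id, status) VALUES (?, ?)",
                (order_id, status),
            )
        return True
    except sqlite3.IntegrityError:
        # The database itself refuses the duplicate key.
        return False


# SQLite in-memory database used as a stand-in (assumption, not Snowflake).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT)")

print(insert_order(conn, 1, "new"))        # first insert succeeds
print(insert_order(conn, 1, "duplicate"))  # same key is rejected
```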

After all these announcements, I am excited to try all of them out. As a partner of Snowflake, we at SpringML can play a pivotal role in helping companies transition to this new data cloud architecture and unlock the power of their data.
