Snowflake makes it easy to unlock the value of your data, no matter where it lives. There are many benefits to bringing data into Snowflake, but the migration can take time to complete. In the interim, while your data sits in other locations, you still want to derive value from it. Snowflake external tables make that easy.
Snowflake external tables let you query and process data that lives elsewhere, including in a data lake, without ingesting it into Snowflake. As you would expect, external tables have been well received, especially because Snowflake continues to expand support for them.
One new way to work with your external data is to integrate Apache Hive metastores with Snowflake. You can use the new Hive metastore connector to connect to your Hadoop environments.
Snowflake support is also available if you are using newer technologies, such as Delta Lake or Apache Iceberg.
Delta Lake is a table format used on Spark-based data lake platforms. You can create a Snowflake external table that references your Delta Lake cloud storage locations. Delta Lake support is currently a Snowflake preview feature for all accounts.
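As a rough sketch of what that can look like, the statements below create an external table over a Delta Lake location and then refresh its metadata. The stage name `my_delta_stage`, the path `sales/`, and the table name `ext_sales_delta` are hypothetical placeholders, and because the feature is in preview the exact options may change:

```sql
-- Hypothetical names: my_delta_stage is assumed to be an existing external
-- stage that points at the cloud storage location holding the Delta Lake table.
CREATE EXTERNAL TABLE ext_sales_delta
  LOCATION = @my_delta_stage/sales/
  REFRESH_ON_CREATE = FALSE
  AUTO_REFRESH = FALSE
  FILE_FORMAT = (TYPE = PARQUET)
  TABLE_FORMAT = DELTA;

-- With automatic refresh turned off, sync the table metadata manually
-- so that Snowflake picks up the current set of Delta Lake files.
ALTER EXTERNAL TABLE ext_sales_delta REFRESH;

-- External tables expose each row through the VALUE column (a VARIANT).
SELECT value FROM ext_sales_delta LIMIT 10;
```

Because automatic refresh is off in this sketch, you would rerun the `ALTER EXTERNAL TABLE ... REFRESH` statement on whatever schedule suits how often the Delta Lake table changes.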
Snowflake supports processing and querying unstructured data that is stored externally in a data lake, and it can also store unstructured data directly in Snowflake. While it is possible to store unstructured data in a Snowflake table using a VARIANT data type, that is not the preferred way because there is a limit on the file size. When you use an internal Snowflake stage to store unstructured data, however, there is no inherent limitation. You can use the Snowflake platform as a data lake by storing unstructured data in an internal stage, or you can access data stored outside of Snowflake by using external tables.
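A minimal sketch of the internal stage approach might look like the following. The stage name `docs_stage` and the file path in the comment are hypothetical, and the directory table is one way, assumed here for illustration, to list what the stage holds:

```sql
-- Hypothetical internal stage for unstructured files. The directory table
-- keeps a queryable catalog of the files stored in the stage.
CREATE STAGE docs_stage
  DIRECTORY = (ENABLE = TRUE)
  ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE');

-- Files are uploaded from a client such as SnowSQL, for example:
--   PUT file:///tmp/report.pdf @docs_stage AUTO_COMPRESS = FALSE;

-- Update the directory table after new files have been uploaded.
ALTER STAGE docs_stage REFRESH;

-- List the staged files along with their size and last modified time.
SELECT relative_path, size, last_modified
FROM DIRECTORY(@docs_stage);
```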
One of the newest and most exciting announcements from Snowflake is the upcoming support for Apache Iceberg. Apache Iceberg addresses many of the challenges associated with object stores, which has made it a popular choice as a table format for data lakes. Whereas Hive keeps track of data at the folder level, Iceberg keeps track of a complete list of all files within a table using a persistent tree structure. Keeping track of the data at the folder level can lead to performance problems, and data can appear to be missing when file list operations are performed at the folder level. The Apache Iceberg table format is used by many leading technology companies, such as Netflix, Apple, LinkedIn, Expedia, and AWS.
Snowflake support for Iceberg tables will soon be in private preview. While the design is not yet finalized, you can expect the statement for creating an external table over your data in Apache Iceberg to look something like this:
CREATE EXTERNAL TABLE <table name>
TABLE_FORMAT = Iceberg
FILE_FORMAT = Parquet
REFRESH_ON_CREATE = False
AUTO_REFRESH = False
SNAPSHOT_LOCATION = @<stage name>/<file path>
External tables are just one way that the Snowflake platform supports a variety of different data types and workloads at scale, giving organizations the ability to easily implement their architectural design pattern of choice.
Be sure to check out our other blog posts and demos for more details about the latest Snowflake announcements.