Snowflake Data Cloud World Tour

Tom Swann
7 min read · Oct 10, 2023

In which our intrepid correspondent provides a summary, and some measure of insight, on the London leg of the Snowflake Data Cloud World Tour 2023.

Today I was (and still am at time of writing) at ExCeL London to find out about Snowflake’s plans and ambitions for the “Data Cloud”: their all-encompassing platform for data storage, processing, AI and applications.

I am a Data Engineer/Architect at Insider. Our organisation is a current Snowflake customer, so I have a vested interest in keeping a finger on the pulse of their roadmap. But hopefully this reportage will provide some insight for anyone who is an interested party, be ye data product org or consultant.

I am typing this in the expo hall: expect limited editing, plentiful typos and the bits that I would usually decide to delete from shame being left in.

Onwards!

Snowflake Data Cloud

In which Snowflake sets out to consume the data ecosystem like some kind of cloud data warehousing Galactus.

The first thing I should impress upon you is the scale of the whole affair. If, like me back in 2022, you came to Snowflake as a relative newcomer, you might be forgiven for thinking it is “just another data warehouse”, but y’know, cloud native, a foundational part of the “modern data stack”, etc. etc.

Well, Snowflake have filled a significant portion of ExCeL London. It feels more like the Strata Data Conference of old than a single-vendor conference. Unless that vendor is a Microsoft or an Amazon (more on which later). Each 30-minute slot is occupied by 9 parallel conference tracks. There’s a lot going on; by necessity, and for lack of a Tom-cloning machine (my boss is still holding out for it), I have had to skip large swathes of interesting talks, focusing (I hope) on the vital topics.

The Keynote — Snowflake eats the dataverse

Snowflake keynotes, ICC Auditorium

First up: off to the ICC Auditorium for the keynote speeches. Keynotes are hit-and-miss affairs, prone to coma inducement a lot of the time.

This one held my attention — it was delivered with a laser-like focus on the message and the vision that Snowflake sees for the platform. I was impressed and carried along by the enthusiasm and the conviction of CEO, Frank Slootman.

Snowflake, in short, want to own the data cloud real estate in the same way that AWS does more broadly. You can hardly fault them for ambition.

This means removing the reasons why you wouldn’t be on the Snowflake platform. Some of the key new developments that look to enable that approach will be summarised in this blog.

Data Cloud seeks to be the one storage platform for data, but also the architecture by which you process and manage it, be that traditional data engineering, AI and ML, or even native application development. A complete end-to-end platform for the data plane of an organisation.

Snowpark — going after those Spark jobs

Snowpark is Snowflake’s application development layer.

30% of Snowflake customers are currently using Snowpark, which leaves a lot of room for growth. All those Spark clusters and jobs running out there are the target: these are the workloads that Snowflake would see its own platform cannibalising.
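To make that concrete, here’s a minimal sketch of what a Spark-style aggregation looks like when expressed in Snowpark Python. It assumes the snowflake-snowpark-python package; the connection parameters and the SALES table and its columns are purely illustrative, not anything shown at the event.

```python
# A minimal sketch of a Spark-style aggregation in Snowpark Python.
# Requires the snowflake-snowpark-python package. Connection parameters
# and the SALES table/columns are illustrative.
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, sum as sum_

session = Session.builder.configs({
    "account": "<account>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
    "database": "<database>",
    "schema": "<schema>",
}).create()

# The DataFrame API mirrors Spark: transformations are lazy and compile
# down to SQL that runs inside Snowflake, not on a cluster you manage.
sales = session.table("SALES")
totals = (
    sales.filter(col("AMOUNT") > 0)
         .group_by("REGION")
         .agg(sum_("AMOUNT").alias("TOTAL_AMOUNT"))
)
totals.show()
```

The API will feel instantly familiar to Spark developers, which is rather the point: the migration pitch is that your DataFrame code carries over while the cluster disappears.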

The headline development here was Snowpark Container Services. Do you currently run the applications which interact with Snowflake on a Kubernetes cluster, or other containerised environment in your cloud provider of choice?

Following the vision presented in the keynote, these are workloads that Snowflake are going after. The idea is that you bring those applications within the boundary of the Data Cloud, within its data governance and security perimeter, and that you offload things like cluster provisioning and management to the Snowflake platform.

It’s a compelling argument, and one that would give pause to the idea of developing further third-party, container-based processes and applications, knowing that this is coming down the road.
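Since the feature was only just announced in preview, take the following as a rough sketch of the shape of the thing rather than gospel: the compute pool options, image repository path and spec layout are my assumptions based on what was shown. The `session` object is the Snowpark session from the sketch above.

```python
# Hypothetical sketch only: Snowpark Container Services was in preview at
# the time, so pool options, the repository path and the spec layout are
# assumptions. `session` is the snowflake.snowpark.Session created earlier.
service_spec = """
spec:
  containers:
    - name: app
      image: /my_db/my_schema/my_repo/my_app:latest
"""

# A compute pool is the managed capacity the containers run on; Snowflake
# handles the provisioning that Kubernetes would otherwise make you do.
session.sql("""
    CREATE COMPUTE POOL IF NOT EXISTS my_pool
      MIN_NODES = 1
      MAX_NODES = 1
      INSTANCE_FAMILY = CPU_X64_XS
""").collect()

# Register the containerised app as a long-running service in that pool.
session.sql(f"""
    CREATE SERVICE my_service
      IN COMPUTE POOL my_pool
      FROM SPECIFICATION $$
{service_spec}
      $$
""").collect()
```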

Iceberg — Open Table Data Format

Apache Iceberg is an open table format. Think of it as an open and interoperable alternative to Snowflake’s native table format.

Supporting it opens up internally managed Snowflake tables to direct processing by other tools (the obvious one being Spark), which is a boon for organisations that are already heavily invested in these technologies and don’t want their data locked into Snowflake’s proprietary table storage encoding.

For those who are less concerned with this aspect of things, the benefits extend to being able to use Iceberg for external tables: if you have Parquet files sitting in GCS or S3 storage then you should see performance benefits when using Iceberg as the storage format.

Blurry snap of Iceberg benchmarking
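For a flavour of the developer experience, this is roughly what creating a Snowflake-managed Iceberg table looks like, reusing the session from the Snowpark sketch above. The external volume and all names are illustrative, and the syntax reflects the preview documentation, so treat it as a sketch.

```python
# Sketch of a Snowflake-managed Iceberg table. The external volume
# (my_s3_volume) must already point at your S3/GCS bucket; all names are
# illustrative. `session` is the Snowpark Session from the earlier sketch.
session.sql("""
    CREATE ICEBERG TABLE events (
        event_id   STRING,
        event_ts   TIMESTAMP
    )
    CATALOG = 'SNOWFLAKE'            -- Snowflake manages the Iceberg metadata
    EXTERNAL_VOLUME = 'my_s3_volume'
    BASE_LOCATION = 'events/'        -- data lands here as Parquet + metadata
""").collect()
```

Because the data and Iceberg metadata live in your own bucket, an external Spark job pointed at the same Iceberg metadata can read the table directly; that is the whole interoperability argument in one line of DDL.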

Snowpark ML Lifecycle

This is the other exciting development as far as I’m concerned: support for an end-to-end machine learning development process on the Snowpark platform.

This is something that the likes of AWS SageMaker make a big deal of. Support for training, validation, a model registry and model deployment are big wins for inclusion in the Snowpark platform, as they represent a pipeline that is difficult to piece together from fragmented technology choices. A compelling value-add for vendors like SageMaker or Azure ML is the integrated nature of the tooling that supports this kind of (highly specialised) development lifecycle.

So it makes complete sense that Snowflake would also want to add more robust and complete support for the ML productionisation lifecycle natively within their own platform.
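As a flavour of what that looks like in practice: the snowflake-ml-python modelling API deliberately mirrors scikit-learn. A minimal sketch, with illustrative table and column names, reusing the session from the first Snowpark sketch:

```python
# Sketch of the scikit-learn-style modelling API in snowflake-ml-python.
# Table and column names are illustrative. `session` is the Snowpark
# Session from the first sketch.
from snowflake.ml.modeling.xgboost import XGBClassifier

train_df = session.table("TRAINING_DATA")

model = XGBClassifier(
    input_cols=["FEATURE_1", "FEATURE_2"],
    label_cols=["LABEL"],
    output_cols=["PREDICTION"],
)
model.fit(train_df)               # training executes inside Snowflake
scored = model.predict(train_df)  # adds the PREDICTION column
scored.show()
```

The appeal is the same as the Snowpark pitch more broadly: the training data never leaves the governance boundary, and there is no separate ML infrastructure to stand up.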

Unistore

Snowflake is traditionally an OLAP (online analytical processing) system, but with the introduction of Unistore the ambition is very much to start tackling OLTP (online transaction processing) and thereby to unify both of these classes of workload within the one platform.

This is not yet at the public preview stage, so you can’t go and play around with it, but it’s one to watch and more evidence of Snowflake’s ambition to be “the data cloud”, with many different types of application and workload within the boundary of their governance.
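Based on what was shown, the centrepiece is the “hybrid table”: row-oriented storage with an enforced primary key, aimed at transactional point reads and writes. Since the feature isn’t publicly available, treat the syntax below as an assumption:

```python
# Assumed preview syntax for a Unistore hybrid table: row-oriented storage
# with an enforced primary key, for transactional point reads/writes.
# `session` is the Snowpark Session from the first sketch.
session.sql("""
    CREATE HYBRID TABLE orders (
        order_id    INT PRIMARY KEY,
        customer_id INT,
        status      STRING
    )
""").collect()

# An OLTP-style point update against the same table that the analytical
# engine can query, with no copy between transactional and analytical stores.
session.sql(
    "UPDATE orders SET status = 'SHIPPED' WHERE order_id = 42"
).collect()
```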

Document AI — Unstructured Data

I sat up and took notice of this one, big time. Unstructured data (PDFs, invoices, receipts, scans, handwritten scrawls) represents the “long tail” of the data footprint within most organisations.

Traditionally your options here were to build a custom UDF or bespoke application, using either open-source libraries or a cloud managed service (think AWS Textract or Comprehend), with a bulk process that would try to wrangle the corpus of files into some semblance of structure.

Snowflake have (in preview) an LLM-based native tool called Document AI which can ingest documents and, remarkably to be completely honest, do a more than good job of extracting named values and structuring them in such a way that they can be queried and processed in the data warehouse.

As someone who is intimately familiar with building this type of application via the aforementioned bespoke approach, I can’t tell you how hyped I was to see this, and how closely I will be keeping tabs on its development and progress towards general availability.

For certain organisations this has the potential to be a real game-changing innovation.

At the moment it is limited to around 5 pages per document, which constrains its applicability for certain use cases, but I would expect this to improve over time.
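For the curious, here’s roughly what querying a Document AI model looks like. Since the feature is in preview, the model name, the stage and the exact shape of the !PREDICT call are all assumptions on my part:

```python
# Heavily hedged sketch of Document AI: the model name (invoice_model),
# the stage (@doc_stage) and the !PREDICT call shape are assumptions based
# on preview material. `session` is the Snowpark Session from the first sketch.
result = session.sql("""
    SELECT invoice_model!PREDICT(
        GET_PRESIGNED_URL(@doc_stage, 'invoices/inv_001.pdf'),
        1                            -- model version
    ) AS extracted_values
""").collect()

# The result is a JSON-style object of named values (totals, dates, parties)
# that can be flattened and queried like any other warehouse data.
print(result[0]["EXTRACTED_VALUES"])
```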

Media Data Cloud

Let’s get domain-specific for a moment. I am obviously very interested in digital media: content publishing, the advertising ecosystem, user identity and all that good stuff.

So are Snowflake — and they have an industry vertical data cloud offering to suit.

The “Media Data Cloud”

Much of the narrative here concerned the concept of “Clean Rooms”: essentially a secure, cross-organisation data sharing arrangement, where things like customer marketing data can be anonymised, aggregated and then safely analysed, primarily to learn whether your advertising spend is effective and hitting the desired audience.

If you don’t care about this, then chances are Snowflake have a domain specialism that you do care about — retail, government, education, manufacturing, healthcare and finance. You know — all the things.

Another keynote topic, and one of the major selling points of Snowflake at present, is the opportunity for data sharing across organisations that are part of the Data Cloud ecosystem.

One keynote use case was banking behemoth JP Morgan acquiring other banks, and leveraging the fact that both parties were Snowflake users to greatly reduce the otherwise painful process of integrating and sharing data assets.
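Under the hood, patterns like this (and clean rooms) build on Snowflake’s plain secure data sharing primitives. A sketch, with illustrative database, object and account names, reusing the session from the first Snowpark sketch:

```python
# Sketch of the vanilla secure data sharing primitives that patterns like
# clean rooms build on. Database, table and account names are illustrative.
# `session` is the Snowpark Session from the first sketch.
session.sql("CREATE SHARE campaign_share").collect()

# Expose only the objects you intend to share: usage on the containers,
# select on the (typically aggregated/anonymised) data itself.
session.sql("GRANT USAGE ON DATABASE marketing TO SHARE campaign_share").collect()
session.sql("GRANT USAGE ON SCHEMA marketing.public TO SHARE campaign_share").collect()
session.sql("""
    GRANT SELECT ON TABLE marketing.public.aggregated_audience
    TO SHARE campaign_share
""").collect()

# The partner account can now mount the share as a read-only database;
# no data is copied or moved between organisations.
session.sql(
    "ALTER SHARE campaign_share ADD ACCOUNTS = partner_org.partner_account"
).collect()
```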

Organisationally, Snowflake are supporting this specialism with Principal leadership roles in the various verticals: senior individuals on hand to present and answer questions regarding the various domains, so there are people within the organisation who can act as points of contact for these.

That’s A Wrap

I really feel like I was only able to scratch the surface of this expo.

I think the overall advice I would leave you with is that Snowflake has become a player and a platform that you cannot ignore. It has gathered that kind of cultural traction in the data arena.

Whether your interests lie in data engineering, third-party data exploration or the refined airs of AI and LLMs, if you’re not looking at what Snowflake Data Cloud has to offer then you have a blind spot, and a big one. Judging by the scope and interest shown at the expo today, there’s a high likelihood that your competitors or your customers will be looking.
