Exploring Cortex: AI on the Snowflake Data Cloud

Tom Swann
Mar 15, 2024

Not long ago, I covered some of the upcoming roadmap features of Snowflake, as they were presented at the London Data Cloud World Tour.

Being a bit of a data governance nerd, I am quite excited about Horizon, the integrated data governance, security and lineage tool suite.

That’s not what we’re here to talk about today though — oh no.

Auditing and compliance don’t exactly get people pulling on their coats and running over the fields.

For that you’d need some exciting AI features!

Luckily, Snowflake has us covered. In this article we’re going to take a look at Cortex.

Snowflake Cortex Architecture

LLMs as a Service, within the Data Warehouse

On our data engineering team we are long-time users of Snowpark ML.

Essentially this is a bring-your-own-model approach to delivering machine learning. Snowpark provides the compute framework within the governance boundary of the Snowflake data platform, and you provide the model, which is deployed into that hosting, runtime and monitoring environment.

You build your model using a popular framework such as scikit-learn or PyTorch, and you are then responsible for its deployment and operational lifecycle. It is also up to you how that model is used and through which interface it is exposed.

Cortex sits at a higher level of abstraction than this.

It is a fully managed service for Machine Learning on the Snowflake platform. It presents itself to the user as a set of functions that you call on your data in exactly the same way as any other built-in Snowflake SQL function.
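To make "just another SQL function" concrete, here is a minimal sketch in Python that only builds the query text. It assumes the Cortex functions are exposed as SNOWFLAKE.CORTEX.SENTIMENT and SNOWFLAKE.CORTEX.SUMMARIZE, each taking a single text argument, as they were documented at the time of writing. The table and column names are hypothetical, and the identifier check is a deliberately simple guard of my own. Running the resulting SQL needs a Snowflake session (for example a cursor from snowflake-connector-python) in a region where Cortex is available.

```python
import re

# Simple allow-list for table/column names so arbitrary text is never
# spliced into the SQL string (an illustrative guard, not a full parser).
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_$.]*$")


def cortex_enrichment_query(table: str, text_column: str) -> str:
    """Build a SELECT that applies Cortex LLM functions to a text column."""
    for name in (table, text_column):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    # The Cortex calls sit in the SELECT list like any built-in function.
    return (
        f"SELECT {text_column},\n"
        f"       SNOWFLAKE.CORTEX.SENTIMENT({text_column}) AS sentiment_score,\n"
        f"       SNOWFLAKE.CORTEX.SUMMARIZE({text_column}) AS summary\n"
        f"FROM {table}"
    )


if __name__ == "__main__":
    # 'product_reviews' and 'review_text' are hypothetical names.
    print(cortex_enrichment_query("product_reviews", "review_text"))
```

Because the result is plain SQL, the same expression can go into an ad-hoc worksheet query or a dbt model without any extra serving infrastructure.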

There are two categories, which I will describe below.

1. LLM Functions

As the name suggests, these built-in functions give Snowflake users an interface to powerful Large Language Model functionality:

  • COMPLETE: Given a prompt, returns a response that completes the prompt
  • SUMMARIZE: Returns a summary of the given text
  • SENTIMENT: Returns a sentiment score, from -1 to 1, representing the detected overall positive or negative sentiment of the given text
  • EXTRACT_ANSWER: Given a question and unstructured data, returns the answer to the question if it can be found in the data
  • TRANSLATE: Translates given text from any supported language to any other
  • EMBED_TEXT: Creates vectors (embeddings) from text documents, which can be used to evaluate the semantic similarity of two documents
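The similarity comparison that EMBED_TEXT enables is usually done with cosine similarity between the two vectors. The sketch below shows that arithmetic in plain Python. It assumes the embeddings have already been retrieved as lists of floats. The three-dimensional vectors are made up for illustration, since real embeddings have hundreds of dimensions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1 = same direction, 0 = unrelated."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings for three documents.
doc_a = [0.2, 0.8, 0.1]
doc_b = [0.25, 0.75, 0.05]  # points in almost the same direction as doc_a
doc_c = [0.9, -0.1, 0.3]    # points somewhere quite different

print(round(cosine_similarity(doc_a, doc_b), 3))
print(round(cosine_similarity(doc_a, doc_c), 3))
```

Documents whose vectors point in similar directions score close to 1, which is what makes embeddings useful for semantic search and deduplication.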

How cool are these?

A big part of being able to innovate is making disruptive technology accessible to as wide a range of people as possible.

This is a general solution, so it won’t address every use case that a bespoke model might. But being able to use these kinds of built-in functions, rather than having to roll your own model and build your own interface to it every time (with all the highly technical work that requires), is a real game changer.

2. Machine Learning Functions

Alongside the intuitive function interface to LLM capabilities, Cortex also provides functions for some common ML use cases:

  • Forecasting: predicts future metric values from past trends in time-series data
  • Anomaly Detection: flags metric values that differ from typical expectations
  • Contribution Explorer: finds dimensions and values that affect the metric in surprising ways
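To illustrate what "differ from typical expectations" means, here is a minimal sketch of a swapped-in, deliberately simple technique: a z-score rule that learns a mean and standard deviation from historical values and flags new values outside that band. This is not the model Snowflake uses. The function name, the threshold of 3 and the page-view numbers are all hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def flag_anomalies(
    history: Sequence[float],
    new_values: Sequence[float],
    threshold: float = 3.0,
) -> List[Tuple[int, float]]:
    """Return (index, value) for new values lying more than `threshold`
    standard deviations from the mean of the historical values."""
    if len(history) < 2:
        raise ValueError("need at least two historical values")
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation of the baseline
    return [
        (i, v) for i, v in enumerate(new_values) if abs(v - mu) > threshold * sigma
    ]


# Hypothetical daily page views: eight baseline days, then four new days to score.
baseline = [100, 102, 98, 101, 99, 100, 103, 97]
incoming = [101, 99, 250, 96]
print(flag_anomalies(baseline, incoming))
```

The baseline has mean 100 and standard deviation 2, so only the spike to 250 falls outside the 94 to 106 band. Fitting on history and then scoring new points mirrors the train-then-detect workflow, but a managed function can also model trend and seasonality, which this sketch ignores.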

Again, this is all about lowering the barrier to entry, and doing so in a way that lets us use these features inside the Data Warehouse in a Snowflake-native, and therefore scalable, way.

Our team is particularly excited about the Contribution Explorer.

To use Contribution Explorer directly in your queries and pipelines, you call the SNOWFLAKE.ML.TOP_INSIGHTS table function. This function finds the most important dimensions in a dataset, builds segments from those dimensions, and then detects which of those segments influenced the metric.
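As a rough intuition for "which segments influenced the metric", here is a toy sketch that compares a metric between a control period and a test period for each single-dimension segment (for example country = UK) and ranks segments by the size of the change. It is a hypothetical simplification and not the TOP_INSIGHTS algorithm. The row format, function name and web-traffic numbers are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Segment = Tuple[str, str]  # (dimension, value), e.g. ("country", "UK")


def segment_contributions(
    rows: Iterable[Tuple[Dict[str, str], float, bool]],
) -> List[Tuple[Segment, float]]:
    """rows are (dimensions, metric, is_test). Returns (segment, test - control)
    pairs, largest absolute change first (ties broken by segment name)."""
    control: Dict[Segment, float] = defaultdict(float)
    test: Dict[Segment, float] = defaultdict(float)
    for dims, metric, is_test in rows:
        bucket = test if is_test else control
        for dim, value in dims.items():
            bucket[(dim, value)] += metric
    segments = set(control) | set(test)
    changes = [(seg, test[seg] - control[seg]) for seg in segments]
    return sorted(changes, key=lambda item: (-abs(item[1]), item[0]))


# Hypothetical page views: control week (False) vs test week (True).
rows = [
    ({"country": "UK", "device": "mobile"}, 100, False),
    ({"country": "UK", "device": "desktop"}, 80, False),
    ({"country": "US", "device": "mobile"}, 50, False),
    ({"country": "UK", "device": "mobile"}, 40, True),
    ({"country": "UK", "device": "desktop"}, 85, True),
    ({"country": "US", "device": "mobile"}, 55, True),
]

for segment, change in segment_contributions(rows):
    print(segment, change)
```

Here "country = UK" and "device = mobile" tie at a drop of 55, and looking at one dimension at a time cannot tell you that UK mobile traffic is the real culprit. Distinguishing those cases is the job of the segment-building step described above, and it is why a managed function is attractive compared with hand-rolled breakdowns like this one.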

If it works as promised, it could be an incredibly useful tool, particularly for our web traffic data and content analysis.

The Wait

Snowflake currently runs on AWS, Azure or GCP, and feature availability varies between the three platforms, with AWS typically getting preview features first.

As a Google Cloud enjoyer, it makes me a tad sad that Cortex is currently only available in preview on AWS, so I have to wait a little longer to get my hands on these features.

I have a feeling that it will be worth the wait.

Snowflake continues to drive a lot of innovation into the core data platform. I particularly enjoy features like this because they are SQL-native, which makes it easy to bring powerful tools into everyday ad-hoc analysis and dbt transformations. They are not just within the gift of data engineers and data scientists.

Next Up

I would like to talk more about Horizon, though that will again depend on feature availability. If you’ve read my thoughts on DataHub, you’ll know that I really enjoy tools that help people analyse and make sense of their metadata, so having this capability natively in Snowflake is a super promising development.


Botherer of data, player of games. All my views are materialised.