Waldemar Hummer - LocalStack Snowflake Emulator Intro

An intro to the newest emulator by LocalStack. Contains examples and a roadmap for the future of the Snowflake emulator.

Anca Ghenade

June 08, 2024

Transcript

  1. What is Snowflake?

    • A cloud data platform that allows for scalable data processing
      ◦ Uploading data files (CSV/JSON/Parquet) to stages
      ◦ Running SQL statements to create databases/tables/views/…
      ◦ Running SELECT queries to query data from files and tables
      ◦ Running scheduled jobs to create ETL pipelines
      ◦ …
    • Lots of native integrations and SDKs/tools to interact with the platform
      ◦ Python Pandas dataframes, Snowpark libraries, JDBC driver, …
    • Some similarities to the data/big data services in AWS:
      ◦ Athena, Redshift, EMR, Managed Airflow, etc.
  2. Developing Data Pipelines Locally

    • Developing for Snowflake requires connectivity to the remote cloud at all times
      ◦ → How does that fit into the development lifecycle?
      ◦ → Is there a local development story?
    • Often requested feature, even in Snowflake forums, as well as on StackOverflow, Reddit, etc.
    • Similar challenges as for the AWS cloud
      ◦ Speed up development cycles; avoid resource conflicts; test reproducibility; costs …
  3. LocalStack for Snowflake

    • Available as a Docker image
      ◦ Can be easily installed locally
    • Emulates the actual Snowflake API surface
      ◦ → Integrates natively with all tooling
      ◦ JDBC, DB visualization tools, etc. work out of the box
    • Easy to extend from local into CI pipelines - running tests in CI
    • Recent announcement: https://blog.localstack.cloud/2024-05-22-introducing-localstack-for-snowflake/
  4. Supported Feature Set (Excerpt)

    • Some of the key features are already available today, including:
      ◦ Basic operations on warehouses, databases, schemas, and tables (e.g., Using the Python Connector)
      ◦ Storing files in user/data/named stages (Choosing an Internal Stage for Local Files)
      ◦ Snowpark libraries (e.g., Snowpark Developer Guide for Python)
      ◦ Snowpipe streaming with Kafka connector (Using Snowflake Connector for Kafka with Snowpipe Streaming)
      ◦ JavaScript and Python UDFs (Introduction to JavaScript UDFs)
      ◦ Tasks for scheduled execution
      ◦ Table streams for change data capture and audit logs
      ◦ … and quite a bit more!
  5. Seamless integration with DB viz tools (e.g., DBeaver)

    Source: https://www.youtube.com/watch?v=1l9i_755MlA
  6. Starting Up

    • Configure your auth token, then use the localstack CLI to start up:

        $ export LOCALSTACK_AUTH_TOKEN=<your-auth-token>
        $ IMAGE_NAME=localstack/snowflake localstack start

    • Configure your client app to connect to the local endpoint:

        import snowflake.connector as sf

        conn = sf.connect(
            user="test",
            password="test",
            account="test",
            database="test",
            host="snowflake.localhost.localstack.cloud",
        )
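Once connected, a quick smoke test can confirm that the emulator answers basic statements. The following is a minimal sketch, not part of the original deck: the connection settings are the ones from the slide, while the `smoke_test` table, its columns, and the helper names are hypothetical and chosen only for illustration. The connector import is deferred into `main()` so that the helper can be exercised without the connector installed.

```python
# Connection settings copied from the slide; everything else here is illustrative.
LOCAL_CONNECTION_PARAMS = {
    "user": "test",
    "password": "test",
    "account": "test",
    "database": "test",
    "host": "snowflake.localhost.localstack.cloud",
}

# Hypothetical smoke-test statements (table name and columns are made up).
SMOKE_TEST_STATEMENTS = [
    "CREATE TABLE IF NOT EXISTS smoke_test (id INT, message VARCHAR)",
    "INSERT INTO smoke_test (id, message) VALUES (1, 'hello')",
    "SELECT id, message FROM smoke_test",
]


def run_smoke_test(conn):
    """Run the statements on an open connection; return the rows of the last one."""
    cursor = conn.cursor()
    try:
        for statement in SMOKE_TEST_STATEMENTS:
            cursor.execute(statement)
        return cursor.fetchall()
    finally:
        cursor.close()


def main():
    # Imported lazily so run_smoke_test can be reused without the connector installed.
    import snowflake.connector as sf

    conn = sf.connect(**LOCAL_CONNECTION_PARAMS)
    try:
        print(run_smoke_test(conn))
    finally:
        conn.close()
```

Calling `main()` with the emulator running should print the inserted row, assuming the emulator supports these basic table operations as listed under the supported feature set.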
  7. Sample: Queries over Covid19 dataset

    • Taken from Snowflake “Getting Started” Guide
      ◦ https://quickstarts.snowflake.com/guide/data_science_with_dataiku/index.html#0
    • Data set contains a lot of different data points
      ◦ mobility data
      ◦ vaccination data
      ◦ …
    • For this sample, we’ll focus on:
      ◦ Putting CSV files to a local S3 stage
      ◦ Loading the CSV data into a table
      ◦ Running some simple SELECT queries
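The three steps listed for this sample (stage the CSV, load it into a table, query it) can be sketched as a sequence of standard Snowflake SQL statements issued through a connector cursor. This is an illustration under assumptions rather than the code shown in the talk: the stage name, table name, columns, and file path are hypothetical, and the sketch uses a named internal stage with `PUT`, which may differ from the S3-backed stage used in the demo.

```python
def build_load_statements(csv_path, stage="covid_stage", table="covid_cases"):
    """Return the SQL statements for a stage -> table -> query round trip.

    All object names and columns are hypothetical placeholders.
    """
    return [
        f"CREATE STAGE IF NOT EXISTS {stage}",
        f"CREATE TABLE IF NOT EXISTS {table} "
        "(country VARCHAR, report_date DATE, cases INT)",
        # PUT uploads a local file into the stage.
        f"PUT file://{csv_path} @{stage}",
        # COPY INTO loads the staged CSV rows into the table, skipping the header row.
        f"COPY INTO {table} FROM @{stage} "
        "FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)",
        f"SELECT country, SUM(cases) AS total_cases FROM {table} GROUP BY country",
    ]


def load_and_query(cursor, csv_path):
    """Execute the statements in order and return the rows of the final SELECT."""
    for statement in build_load_statements(csv_path):
        cursor.execute(statement)
    return cursor.fetchall()
```

With a connection to the local endpoint, `load_and_query(conn.cursor(), "/tmp/cases.csv")` would run the whole sequence; the path is only an example.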
  8. Sample App: NYC Citybike Trips

    • Taken from Snowflake “Getting Started” Guide
    • Contains trips and weather information over several years
    • Data available in a public S3 bucket
      ◦ → Can be integrated in a local Snowflake stage directly!
    • Web app displays the data in simple charts
  9. Table Streams

    • See https://docs.snowflake.com/en/user-guide/streams-intro
    • Enables Change Data Capture (CDC) for Snowflake tables
    • Stream = minimal set of changes from its current offset to the current version of the table
    • Streams can be “consumed” via DML queries, e.g.: INSERT INTO target … SELECT * FROM stream …
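To make the “consume via DML” idea concrete, here is a small sketch of the statement sequence: create a stream on a source table, change the source, then move the captured changes into a target table. The table and stream names are hypothetical. The sketch selects explicit columns instead of `SELECT *`, because in Snowflake a stream also exposes metadata columns such as `METADATA$ACTION`; whether the emulator mirrors every detail of this behavior is not something the deck states.

```python
# Hypothetical object names; the SQL follows standard Snowflake stream syntax.
STREAM_SETUP = [
    "CREATE TABLE IF NOT EXISTS source_events (id INT, payload VARCHAR)",
    "CREATE TABLE IF NOT EXISTS target_events (id INT, payload VARCHAR)",
    "CREATE STREAM IF NOT EXISTS source_events_stream ON TABLE source_events",
]

# A change to the source table that the stream will record.
SOURCE_CHANGE = "INSERT INTO source_events (id, payload) VALUES (1, 'created')"

# Reading the stream inside a DML statement consumes it and advances its offset.
CONSUME_STREAM = (
    "INSERT INTO target_events (id, payload) "
    "SELECT id, payload FROM source_events_stream "
    "WHERE METADATA$ACTION = 'INSERT'"
)


def capture_and_consume(cursor):
    """Set up the stream, apply one change, and consume it into the target table."""
    for statement in STREAM_SETUP:
        cursor.execute(statement)
    cursor.execute(SOURCE_CHANGE)
    cursor.execute(CONSUME_STREAM)
    cursor.execute("SELECT id, payload FROM target_events")
    return cursor.fetchall()
```

On Snowflake itself, a second run of the consuming statement without further changes to the source inserts nothing, since the stream offset has already moved past the earlier change.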
  10. Streamlit Apps

    • Streamlit = Python UI framework
      ◦ https://streamlit.io
    • Integrates natively with Snowflake
    • Lots of UI components available
      ◦ Charts
      ◦ Widgets
      ◦ Maps
      ◦ Graphs
      ◦ …
    • → Easy way to create data apps!
  11. Cloud Pods: Persistent Shareable Sandboxes

    Cloud Pods are a mechanism that allows you to take a snapshot of the state in your current LocalStack instance, persist it to a storage backend, and easily share it with your team members.
  12. Cloud Pods: Save & Load DB Snapshots

    • Cloud pods can be saved and loaded from the CLI:

        $ localstack pod save my-pod-123
        $ localstack pod load my-pod-123

    • We’ve prepared a cloud pod with a table named “test” - load it like this:

        $ localstack pod load pod-snowflake
        $ snow sql -c local --query 'select * from test'
        +----------------------------------+
        | MESSAGE                          |
        |----------------------------------|
        | Hello from LocalStack Snowflake! |
        +----------------------------------+
  13. Implementation

    • High-level architecture
      ◦ Query processors
      ◦ Core DB engine
        ▪ Current: Postgres
        ▪ Alternative: DuckDB
      ◦ Auxiliary services
        ▪ Streams, stages, …
    • Written in Python
    • Running in Docker

    https://blog.localstack.cloud/2024-05-22-introducing-localstack-for-snowflake
  14. Challenges: Query Transpilation, Data Types

    • Snowflake and Postgres SQL are similar, yet there are many subtle differences
    • Query parsing using sqlglot
      ◦ Allows us to create a query AST, and perform modifications on it
      ◦ Big shout-out to Tobiko Data for providing this library! 🙌
      ◦ We’ve also been able to contribute a few upstream PRs :) (#2989, #3510, #3519)
    • Challenge: high-fidelity support for Snowflake data types
      ◦ Often either no direct mapping, or different semantics in Postgres
      ◦ Example: timestamps (TIMESTAMP_LTZ, TIMESTAMP_NTZ, etc.)
      ◦ Advanced data types in Snowflake, like the generic VARIANT type
      ◦ Needed to introduce a custom VARIANT data type in the core DB engine

    https://blog.localstack.cloud/2024-05-22-introducing-localstack-for-snowflake
  15. Bridging Local & Remote: Connection Proxy

    • Easily flip the switch between local and remote execution
    • Can be configured with real Snowflake credentials (see screenshot below)
      ◦ Calls will be forwarded to the real cloud and returned to the client
    • Enables a lot of exciting use cases:
      ◦ Duplex mode - running queries against local AND remote (local mirror)
      ◦ Route only requests for certain tables to upstream: JOIN local & remote

    [Diagram labels: Local Machine; Client (e.g., Python connector); LocalStack Snowflake; Connection Proxy; Core Engine; Real Snowflake Cloud Account]
  16. Roadmap

    • LocalStack “vNext”: Expanding our focus into the data engineering space
      ◦ Based on our learnings and foundation of the LocalStack AWS emulator
      ◦ Turns out that there is a need for better local testing of data pipelines as well!
    • The Snowflake emulator is still early stage, but the direction looks very promising
    • Nicely integrates with our existing LocalStack AWS features
      ◦ Using local S3; soon: AWS<>Snowflake integrations (e.g., Kinesis Firehose, …)
      ◦ LS Cloud Platform: saving/loading of Cloud Pods to manage persistent state
    • Exploring interesting challenges related to data testing
      ◦ Test data management; testing data/ETL in CI pipelines; …
    • Most of all: We’d love to LEARN about YOUR use cases!
      ◦ Get in touch to participate in the LocalStack Snowflake preview!