
Migratory Patterns - KubeCon Salt Lake City, 2024

Big technical migrations - like switching databases - can feel like you're swapping out the engine of a bus while continuing to drive down the freeway (with all your users screaming in the back). However, there are ways to make these transitions safe, incremental, and low-stress. In this talk we'll walk through a real-world case study of switching a production system from one database to another with no downtime and no tears, using techniques like Expand/Contract, Dark Launch, and Parallel Run. We'll also see hands-on examples of using CNCF open standards like OpenFeature and OpenTelemetry to manage this migration.

Pete Hodgson

November 16, 2024

Transcript

  1. (diagram: ✨AI-POWERED ✨GenAI✨ Service)
     Our AI service uses RAG, and needed a vector store (just a specialized database). @thepete.net
  2. (diagram: ✨AI-POWERED ✨GenAI✨ Service, Vector Store)
     Our AI service uses RAG, and needed a vector store (just a specialized database). @thepete.net
  3. (diagram: ✨AI-POWERED ✨GenAI✨ Service, Vector Store)
     Our AI service uses RAG, and needed a vector store (just a specialized database). We were loading data into that vector store via a data ingestion pipeline. @thepete.net
  4. (diagram: ✨AI-POWERED ✨GenAI✨ Service, Data Ingestion, Vector Store)
     Our AI service uses RAG, and needed a vector store (just a specialized database). We were loading data into that vector store via a data ingestion pipeline. @thepete.net
  5. (diagram: ✨AI-POWERED ✨GenAI✨ Service, Data Ingestion, Pinecone Vector Store)
     Our AI service uses RAG, and needed a vector store (just a specialized database). We were loading data into that vector store via a data ingestion pipeline. For our initial proof-of-concept, we’d used Pinecone for our vector store technology. @thepete.net
  6. (diagram: ✨AI-POWERED ✨GenAI✨ Service, Data Ingestion, Pinecone Vector Store, Pinecone → Postgres)
     Our AI service uses RAG, and needed a vector store (just a specialized database). We were loading data into that vector store via a data ingestion pipeline. For our initial proof-of-concept, we’d used Pinecone for our vector store technology. For various reasons, we wanted to re-platform that vector store to Postgres (w. pgvector). @thepete.net
  7. BIG BANG MIGRATION
     1. STOP THE WORLD  2. BACKFILL DATA
     (diagram: Writer, Reader, Pinecone, Postgres) @thepete.net
  8. BIG BANG MIGRATION
     1. STOP THE WORLD  2. BACKFILL DATA  3. CUT-OVER
     (diagram: Writer, Reader, Pinecone, Postgres) @thepete.net
  9. BIG BANG MIGRATION
     1. STOP THE WORLD  2. BACKFILL DATA  3. CUT-OVER  4. START THE WORLD
     (diagram: Writer, Reader, Pinecone, Postgres) @thepete.net
  10. Disadvantages of Big Bang Migrations
      • We have to stop the world
        • downtime for our users
        • very stressful for us!
      • No safe way to test production changes
      • No plan B if things go wrong
      @thepete.net
  11. NO PLAN B
      Changes have only been written to Postgres. Not feasible to fall back to Pinecone.
      (diagram: Writer, Reader, Pinecone, Postgres) @thepete.net
  12. EXPAND/CONTRACT MIGRATION
      1. DUAL WRITE  2. BACKFILL  3. CUT-OVER  4. WRAP UP
      (diagram: Writer, Reader, Pinecone, Postgres) @thepete.net
  13. Expand/Contract enables confident migrations
      A sequence of steps, where every step has an option to fall back.
      Safety -> Courage -> Speed
  14. DUAL WRITE, IRL
      (diagram: Writer → Pinecone and Postgres; prepare data, upload data)

      def run_pipeline(self):
          print("loading projects...")
          projects = self._hippocampus_db.read_projects()
          print("building project descriptions...")
          build_docs(projects)
          print("prepping upload...")
          upload_df = prep_upload(projects)
          # at this stage of our migration from pinecone to postgres we're
          # dual-writing to both pinecone and local PG
          print("uploading to pinecone vectorstore...")
          self.upload_to_pinecone_vectorstore(upload_df)
          print("uploading to pg vectorstore...")
          self.upload_to_postgres_vectorstore(upload_df)
          print("PIPELINE COMPLETE!")

      @thepete.net
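Step 2 of the expand/contract sequence, the backfill, isn't shown as code in the deck. A minimal sketch (not from the deck) could reuse the same pipeline helpers shown above, re-reading the source data and writing only to the new store; `backfill_postgres_vectorstore` is a hypothetical name:

      def backfill_postgres_vectorstore(self):
          # one-off backfill: rebuild the full dataset from the source
          # database and load it into the new Postgres vector store,
          # while the dual-writing pipeline keeps new changes in sync
          print("loading projects for backfill...")
          projects = self._hippocampus_db.read_projects()
          build_docs(projects)
          upload_df = prep_upload(projects)
          print("backfilling pg vectorstore...")
          self.upload_to_postgres_vectorstore(upload_df)
          print("BACKFILL COMPLETE!")

Because the source data is re-read rather than exported out of Pinecone, re-running the backfill after a partial failure should be safe, assuming the upload is an upsert.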
  15. CUT-OVER, IRL
      (diagram: Reader → feature flag → Pinecone or Postgres)

      def lookup_candidates_for(self, description: str):
          if feature_flags.use_postgres_for_vector_store():
              vector_store = self._postgres_vector_store
          else:
              vector_store = self._pinecone_vector_store
          results = vector_store.similarity_search(
              description, self._number_of_results
          )
          return self._candidates_from(results)
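The deck doesn't show how `feature_flags.use_postgres_for_vector_store()` is implemented. A minimal sketch using the OpenFeature Python SDK (one of the CNCF standards mentioned in the abstract) might look like this; the flag key and default value are assumptions:

      from openfeature import api

      # assumes a provider (e.g. your flag vendor's OpenFeature provider)
      # has been registered elsewhere via api.set_provider(...)
      _client = api.get_client()

      def use_postgres_for_vector_store() -> bool:
          # default to False so that if flag evaluation fails we keep
          # reading from Pinecone, the known-good store
          return _client.get_boolean_value("use-postgres-for-vector-store", False)

Defaulting to the old store means a flag-system outage degrades to the pre-migration behaviour rather than to an untested one.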
  16. One Standard, Many Vendors
      AppDynamics (Cisco), Aria by VMware (Wavefront), Arize Phoenix, Aspecto, Axiom, Better Stack, BugSnag, Causely, Centreon, Chronosphere, Control Plane, Coralogix, Cribl, Dash0, DaoCloud, Datadog, Dynatrace, Elastic, F5, observIQ, OneUptime, OpenObserve, OpenText, Oracle, qryn, Red Hat, Sentry Software, ServicePilot, SigNoz, SolarWinds, Splunk, Sumo Logic, TelemetryHub, Traceloop, Uptrace, Google Cloud Platform, Grafana Labs, Helios, Highlight, Honeycomb, HyperDX, Immersive Fusion, Instana, ITRS, KloudFuse, KloudMate, ServiceNow Cloud Observability (Lightstep), Last9 Levitate, LogicMonitor, LogScale by Crowdstrike (Humio), Lumigo, MetricsHub, Middleware, New Relic, Observe, Inc., Apache SkyWalking, Fluent Bit, Jaeger, ObserveAny, GreptimeDB, TingYun, VictoriaMetrics, Tracetest, Alibaba Cloud, Seq, VuNet Systems, Bonree, Embrace, groundcover @thepete.net
  17. PARALLEL EXECUTION
      1. Stand up the new thing
      2. Keep state synchronized between old and new things
      3. Choose when to send traffic to the new thing
      (diagram: Consumer → Old Thing, New Thing) @thepete.net
  18. • DB schema changes
      • API schema changes
      • Re-platforming to different infrastructure
      • Extracting a microservice
      • Switching service providers
      {v1} {v2} @thepete.net
  19. PARALLEL EXECUTION
      1. Stand up the new thing
      2. Keep state synchronized between old and new things
      3. Choose when to send traffic to the new thing
      (diagram: Consumer → Old Thing, New Thing) @thepete.net
  20. Extracting a Microservice
      - Put a shim in front of the module
      - Have all internal calls route through the shim
      (diagram: The Monolith, Accounting Module, Accounting Service) @thepete.net
  21. Extracting a Microservice
      - Put a shim in front of the module
      - Have all internal calls route through the shim
      - Shim routes calls to internal module or external service (based on feature flag)
      (diagram: The Monolith, Accounting Module, Accounting Service) @thepete.net
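A minimal sketch of such a shim (all names here are illustrative, not from the deck): internal callers depend on the shim, which consults a feature flag per call and routes either to the in-process module or to a client for the extracted service:

      class AccountingShim:
          def __init__(self, accounting_module, accounting_service_client):
              self._module = accounting_module            # old: in-process module
              self._service = accounting_service_client  # new: remote service

          def create_invoice(self, order):
              # route per-call, so the flag can be flipped (or rolled
              # back) without a deploy
              if feature_flags.use_accounting_service():
                  return self._service.create_invoice(order)
              return self._module.create_invoice(order)

Because the shim is the single choke point for all internal calls, it is also the natural place to later add the parallel-run comparison shown on the next slides.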
  22. PARALLEL RUN pattern
      Call both implementations; check the results agree; return the result from the old module and discard the parallel result from the new service.
      (diagram: Consumer → Accounting Module and Accounting Service) @thepete.net
  23. PARALLEL RUN, IRL

      def similarity_search(self, description: str) -> List[Chunk]:
          # dark launch w. parallel run: we will use both pinecone- and
          # postgres-backed vector stores to find candidate comans,
          # and compare the results to ensure that the postgres-backed
          # vector store is working as expected.
          pinecone_results = self._pinecone_store.similarity_search(description)
          postgres_results = self._postgres_store.similarity_search(description)
          self._check_for_parallel_run_discrepancy(pinecone_results, postgres_results)
          # *FOR NOW*, WE THROW AWAY THE POSTGRES-BACKED RESULTS
          # AND ONLY RETURN THE PINECONE-BACKED RESULTS
          return pinecone_results
  24. PARALLEL RUN, IRL

      def _check_for_parallel_run_discrepancy(
          self,
          pinecone_results: list[PineconeEntry],
          postgres_results: list[PostgresEntry],
      ):
          if len(pinecone_results) != len(postgres_results):
              self._record_discrepancy(
                  "different number of results",
                  {
                      "pinecone_count": len(pinecone_results),
                      "postgres_count": len(postgres_results),
                  },
              )
              # no point in continuing comparison if we're not comparing the same entries!
              return
          for pinecone_entry, postgres_entry in zip(pinecone_results, postgres_results):
              # pinecone ids have a "pmatch:" prefix which we need to take
              # into account when comparing
              if pinecone_entry.coman_id != "pmatch:" + postgres_entry.coman_id:
                  self._record_discrepancy(
                      "different coman ids",
                      {
                          "pinecone_coman_id": pinecone_entry.coman_id,
                          "postgres_coman_id": postgres_entry.coman_id,
                      },
                  )
                  # no point in continuing comparison if we're not comparing the same entries!
                  continue
              # a very small variance is possible due to the way that the
              # vector stores calculate vector distance
              if abs(pinecone_entry.similarity - postgres_entry.similarity) > 1e-5:
                  self._record_discrepancy(
                      "different similarity scores",
                      {
                          "pinecone_similarity": pinecone_entry.similarity,
                          "postgres_similarity": postgres_entry.similarity,
                      },
                  )
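`_record_discrepancy` itself isn't shown in the deck. One plausible sketch, using the OpenTelemetry metrics and trace APIs mentioned in the abstract (the meter, counter, and event names here are assumptions):

      from opentelemetry import metrics, trace

      _meter = metrics.get_meter("vector-store-migration")
      _discrepancy_counter = _meter.create_counter(
          "parallel_run.discrepancies",
          description="Mismatches between pinecone and postgres results",
      )

      def _record_discrepancy(self, reason: str, details: dict):
          # count each mismatch, tagged with the reason, so a dashboard
          # can chart the discrepancy rate trending towards zero
          _discrepancy_counter.add(1, attributes={"reason": reason})
          # also attach the details to the current span for debugging
          trace.get_current_span().add_event(
              "parallel_run_discrepancy", attributes={"reason": reason, **details}
          )

Watching that counter sit at zero for a while is what builds the confidence to flip the cut-over flag.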
  25. Dark Launch
      “The secret for going from zero to seventy million users overnight is to avoid doing it all in one fell swoop. We chose to simulate the impact of many real users hitting many machines by means of a ‘dark launch’ period in which Facebook pages would make connections to the chat servers, query for presence information and simulate message sends without a single UI element drawn on the page.”
      https://engineering.fb.com/2008/05/13/web/facebook-chat/
  26. In conclusion
      • Big-bang migrations are risky
      • Expand/contract reduces the stress
      • The pattern is applicable to a surprising variety of migrations
      • There are fancy variants, but they all build on the same core ideas
      • Feature flags and observability make things even better
      @thepete.net
  27. Thanks! Questions?
      Slides will be in my socials. I love talking about this stuff! Come chat.
      Pete Hodgson · https://thepete.net · @ph1 · beingagile · @thepete.net