
Migratory Patterns - KubeCon Salt Lake City, 2024

Big technical migrations - like switching databases - can feel like you're swapping out the engine of a bus while continuing to drive down the freeway (with all your users screaming in the back). However, there are ways to make these transitions safe, incremental, and low-stress. In this talk we'll walk through a real-world case study of switching a production system from one database to another with no downtime and no tears, using techniques like Expand/Contract, Dark Launch, and Parallel Run. We'll also see hands-on examples of using CNCF open standards like OpenFeature and OpenTelemetry to manage this migration.

Pete Hodgson

November 16, 2024

Transcript

  1. (diagram: ✨AI-POWERED ✨GenAI✨ Service)
     Our AI service uses RAG, and needed a vector store (just a specialized database). @thepete.net
  2. (diagram: ✨AI-POWERED ✨GenAI✨ Service, Vector Store)
     Our AI service uses RAG, and needed a vector store (just a specialized database). @thepete.net
  3. (diagram: ✨AI-POWERED ✨GenAI✨ Service, Vector Store)
     Our AI service uses RAG, and needed a vector store (just a specialized database). We were loading data into that vector store via a data ingestion pipeline. @thepete.net
  4. (diagram: ✨AI-POWERED ✨GenAI✨ Service, Data Ingestion, Vector Store)
     Our AI service uses RAG, and needed a vector store (just a specialized database). We were loading data into that vector store via a data ingestion pipeline. @thepete.net
  5. (diagram: ✨AI-POWERED ✨GenAI✨ Service, Data Ingestion, Pinecone Vector Store)
     Our AI service uses RAG, and needed a vector store (just a specialized database). We were loading data into that vector store via a data ingestion pipeline. For our initial proof-of-concept, we’d used Pinecone for our vector store technology. @thepete.net
  6. (diagram: ✨AI-POWERED ✨GenAI✨ Service, Data Ingestion, Pinecone Vector Store, Pinecone → Postgres)
     Our AI service uses RAG, and needed a vector store (just a specialized database). We were loading data into that vector store via a data ingestion pipeline. For our initial proof-of-concept, we’d used Pinecone for our vector store technology. For various reasons, we wanted to re-platform that vector store to Postgres (w. pgvector). @thepete.net
  7. BIG BANG MIGRATION
     1. STOP THE WORLD  2. BACKFILL DATA
     (diagram: Writer, Reader, Pinecone, Postgres) @thepete.net
  8. BIG BANG MIGRATION
     1. STOP THE WORLD  2. BACKFILL DATA  3. CUT-OVER
     (diagram: Writer, Reader, Pinecone, Postgres) @thepete.net
  9. BIG BANG MIGRATION
     1. STOP THE WORLD  2. BACKFILL DATA  3. CUT-OVER  4. START THE WORLD
     (diagram: Writer, Reader, Pinecone, Postgres) @thepete.net
  10. Disadvantages of Big Bang Migrations
      • We have to stop the world
        • downtime for our users
        • very stressful for us!
      • No safe way to test production changes
      • No plan B if things go wrong
      @thepete.net
  11. NO PLAN B
      Changes have only been written to Postgres. Not feasible to fall back to Pinecone.
      (diagram: Writer, Reader, Pinecone, Postgres) @thepete.net
  12. EXPAND/CONTRACT MIGRATION
      1. DUAL WRITE  2. BACKFILL  3. CUT-OVER  4. WRAP UP
      (diagram: Writer, Reader, Pinecone, Postgres) @thepete.net
  13. Expand/Contract enables confident migrations
      A sequence of steps, where every step has an option to fall back.
      Safety -> Courage -> Speed
  14. DUAL WRITE, IRL
      (diagram: Writer → Pinecone and Postgres; prepare data, upload data)

      def run_pipeline(self):
          print("loading projects...")
          projects = self._hippocampus_db.read_projects()
          print("building project descriptions...")
          build_docs(projects)
          print("prepping upload...")
          upload_df = prep_upload(projects)
          # at this stage of our migration from pinecone to postgres we're
          # dual-writing to both pinecone and local PG
          print("uploading to pinecone vectorstore...")
          self.upload_to_pinecone_vectorstore(upload_df)
          print("uploading to pg vectorstore...")
          self.upload_to_postgres_vectorstore(upload_df)
          print("PIPELINE COMPLETE!")

      @thepete.net
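Step 2 of the expand/contract sequence, the backfill, isn't shown as code in the deck. A minimal sketch (not from the deck) could reuse the same pipeline helpers shown above, re-reading the source data and writing only to the new store; `backfill_postgres_vectorstore` is a hypothetical name:

      def backfill_postgres_vectorstore(self):
          # one-off backfill: rebuild the full dataset from the source
          # database and load it into the new Postgres vector store,
          # while the dual-writing pipeline keeps new changes in sync
          print("loading projects for backfill...")
          projects = self._hippocampus_db.read_projects()
          build_docs(projects)
          upload_df = prep_upload(projects)
          print("backfilling pg vectorstore...")
          self.upload_to_postgres_vectorstore(upload_df)
          print("BACKFILL COMPLETE!")

Because the source data is re-read rather than exported out of Pinecone, re-running the backfill after a partial failure should be safe, assuming the upload is an upsert.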
  15. CUT-OVER, IRL
      (diagram: Reader → feature flag → Pinecone or Postgres)

      def lookup_candidates_for(self, description: str):
          if feature_flags.use_postgres_for_vector_store():
              vector_store = self._postgres_vector_store
          else:
              vector_store = self._pinecone_vector_store
          results = vector_store.similarity_search(
              description, self._number_of_results
          )
          return self._candidates_from(results)
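The deck doesn't show how `feature_flags.use_postgres_for_vector_store()` is implemented. A minimal sketch using the OpenFeature Python SDK (one of the CNCF standards mentioned in the abstract) might look like this; the flag key and default value are assumptions:

      from openfeature import api

      # assumes a provider (e.g. your flag vendor's OpenFeature provider)
      # has been registered elsewhere via api.set_provider(...)
      _client = api.get_client()

      def use_postgres_for_vector_store() -> bool:
          # default to False so that if flag evaluation fails we keep
          # reading from Pinecone, the known-good store
          return _client.get_boolean_value("use-postgres-for-vector-store", False)

Defaulting to the old store means a flag-system outage degrades to the pre-migration behaviour rather than to an untested one.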
  16. One Standard, Many Vendors
      AppDynamics (Cisco), Aria by VMware (Wavefront), Arize Phoenix, Aspecto, Axiom, Better Stack, BugSnag, Causely, Centreon, Chronosphere, Control Plane, Coralogix, Cribl, Dash0, DaoCloud, Datadog, Dynatrace, Elastic, F5, observIQ, OneUptime, OpenObserve, OpenText, Oracle, qryn, Red Hat, Sentry Software, ServicePilot, SigNoz, SolarWinds, Splunk, Sumo Logic, TelemetryHub, Traceloop, Uptrace, Google Cloud Platform, Grafana Labs, Helios, Highlight, Honeycomb, HyperDX, Immersive Fusion, Instana, ITRS, KloudFuse, KloudMate, ServiceNow Cloud Observability (Lightstep), Last9 Levitate, LogicMonitor, LogScale by Crowdstrike (Humio), Lumigo, MetricsHub, Middleware, New Relic, Observe, Inc., Apache SkyWalking, Fluent Bit, Jaeger, ObserveAny, GreptimeDB, TingYun, VictoriaMetrics, Tracetest, Alibaba Cloud, Seq, VuNet Systems, Bonree, Embrace, groundcover @thepete.net
  17. PARALLEL EXECUTION
      1. Stand up the new thing
      2. Keep state synchronized between old and new things
      3. Choose when to send traffic to the new thing
      (diagram: Consumer → Old Thing, New Thing) @thepete.net
  18. • DB schema changes
      • API schema changes
      • Re-platforming to different infrastructure
      • Extracting a microservice
      • Switching service providers
      {v1} {v2} @thepete.net
  19. PARALLEL EXECUTION
      1. Stand up the new thing
      2. Keep state synchronized between old and new things
      3. Choose when to send traffic to the new thing
      (diagram: Consumer → Old Thing, New Thing) @thepete.net
  20. Extracting a Microservice
      - Put a shim in front of the module
      - Have all internal calls route through the shim
      (diagram: The Monolith, Accounting Module, Accounting Service) @thepete.net
  21. Extracting a Microservice
      - Put a shim in front of the module
      - Have all internal calls route through the shim
      - Shim routes calls to internal module or external service (based on feature flag)
      (diagram: The Monolith, Accounting Module, Accounting Service) @thepete.net
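A minimal sketch of such a shim (all names here are illustrative, not from the deck): internal callers depend on the shim, which consults a feature flag per call and routes either to the in-process module or to a client for the extracted service:

      class AccountingShim:
          def __init__(self, accounting_module, accounting_service_client):
              self._module = accounting_module            # old: in-process module
              self._service = accounting_service_client  # new: remote service

          def create_invoice(self, order):
              # route per-call, so the flag can be flipped (or rolled
              # back) without a deploy
              if feature_flags.use_accounting_service():
                  return self._service.create_invoice(order)
              return self._module.create_invoice(order)

Because the shim is the single choke point for all internal calls, it is also the natural place to later add the parallel-run comparison shown on the next slides.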
  22. PARALLEL RUN pattern
      Call both implementations; check the results agree; return the result from the old module and discard the parallel result from the new service.
      (diagram: Consumer → Accounting Module and Accounting Service) @thepete.net
  23. PARALLEL RUN, IRL

      def similarity_search(self, description: str) -> List[Chunk]:
          # dark launch w. parallel run: we will use both pinecone- and
          # postgres-backed vector stores to find candidate comans,
          # and compare the results to ensure that the postgres-backed
          # vector store is working as expected.
          pinecone_results = self._pinecone_store.similarity_search(description)
          postgres_results = self._postgres_store.similarity_search(description)
          self._check_for_parallel_run_discrepancy(pinecone_results, postgres_results)
          # *FOR NOW*, WE THROW AWAY THE POSTGRES-BACKED RESULTS
          # AND ONLY RETURN THE PINECONE-BACKED RESULTS
          return pinecone_results
  24. PARALLEL RUN, IRL

      def _check_for_parallel_run_discrepancy(
          self,
          pinecone_results: list[PineconeEntry],
          postgres_results: list[PostgresEntry],
      ):
          if len(pinecone_results) != len(postgres_results):
              self._record_discrepancy(
                  "different number of results",
                  {
                      "pinecone_count": len(pinecone_results),
                      "postgres_count": len(postgres_results),
                  },
              )
              # no point in continuing comparison if we're not comparing the same entries!
              return
          for pinecone_entry, postgres_entry in zip(pinecone_results, postgres_results):
              # pinecone ids have a "pmatch:" prefix which we need to take
              # into account when comparing
              if pinecone_entry.coman_id != "pmatch:" + postgres_entry.coman_id:
                  self._record_discrepancy(
                      "different coman ids",
                      {
                          "pinecone_coman_id": pinecone_entry.coman_id,
                          "postgres_coman_id": postgres_entry.coman_id,
                      },
                  )
                  # no point in continuing comparison if we're not comparing the same entries!
                  continue
              # a very small variance is possible due to the way that the
              # vector stores calculate vector distance
              if abs(pinecone_entry.similarity - postgres_entry.similarity) > 1e-5:
                  self._record_discrepancy(
                      "different similarity scores",
                      {
                          "pinecone_similarity": pinecone_entry.similarity,
                          "postgres_similarity": postgres_entry.similarity,
                      },
                  )
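`_record_discrepancy` itself isn't shown in the deck. One plausible sketch, using the OpenTelemetry metrics and trace APIs mentioned in the abstract (the meter, counter, and event names here are assumptions):

      from opentelemetry import metrics, trace

      _meter = metrics.get_meter("vector-store-migration")
      _discrepancy_counter = _meter.create_counter(
          "parallel_run.discrepancies",
          description="Mismatches between pinecone and postgres results",
      )

      def _record_discrepancy(self, reason: str, details: dict):
          # count each mismatch, tagged with the reason, so a dashboard
          # can chart the discrepancy rate trending towards zero
          _discrepancy_counter.add(1, attributes={"reason": reason})
          # also attach the details to the current span for debugging
          trace.get_current_span().add_event(
              "parallel_run_discrepancy", attributes={"reason": reason, **details}
          )

Watching that counter sit at zero for a while is what builds the confidence to flip the cut-over flag.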
  25. Dark Launch
      “The secret for going from zero to seventy million users overnight is to avoid doing it all in one fell swoop. We chose to simulate the impact of many real users hitting many machines by means of a ‘dark launch’ period in which Facebook pages would make connections to the chat servers, query for presence information and simulate message sends without a single UI element drawn on the page.”
      https://engineering.fb.com/2008/05/13/web/facebook-chat/
  26. In conclusion
      • Big-bang migrations are risky
      • Expand/contract reduces the stress
      • The pattern is applicable to a surprising variety of migrations
      • There are fancy variants, but they all build on the same core ideas
      • Feature flags and observability make things even better
      @thepete.net
  27. Thanks! Questions?
      Slides will be in my socials. I love talking about this stuff! Come chat.
      Pete Hodgson · https://thepete.net · @ph1 · beingagile · @thepete.net