Observing Distributed Systems - PeruJUG

Observing Distributed Systems
Jorge Quilcate @jeqo89
github.com/jeqo/talk-observing-distributed-systems

Jorge Quilcate
➔ Peruvian in Norway
➔ Software Engineer at Sysco AS, part of the Middleware team
➔ Starting my journey in Distributed Systems
➔ Open-source contributor, Apache Kafka project
➔ Oracle ACE Associate
jeqo.github.io | github.com/jeqo | @jeqo89

Goal: Explore tools to increase the level of observability in our applications

Observability

Observability
Metrics, Tracing and Logging - Peter Bourgon https://peter.bourgon.org/blog/2017/02/21/metrics-tracing-and-logging.html

Logging
➔ Discrete events: `+load => +logs` (log volume grows with load)
➔ Log actionable events: peter.bourgon.org/blog/2016/02/07/logging-v-instrumentation.html
➔ Easy to add, hard to manage: blog.codinghorror.com/the-problem-with-logging/
➔ Don't try to manage logs as part of your application: 12factor.net/logs
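The advice on this slide (discrete, actionable events written to stdout and shipped by the environment) can be sketched in plain Java. This is a minimal illustration, not the author's demo code: `StdoutEventLog`, the `key=value` line layout, and the sample field names are all assumptions made for the example.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper: one discrete event becomes one line on stdout. */
final class StdoutEventLog {

    /** Formats an event as space-separated key=value pairs (an assumed layout). */
    static String format(Instant timestamp, String level, String message,
                         Map<String, String> fields) {
        StringJoiner line = new StringJoiner(" ");
        line.add("ts=" + timestamp);
        line.add("level=" + level);
        line.add("msg=\"" + message + "\"");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            line.add(field.getKey() + "=" + field.getValue());
        }
        return line.toString();
    }

    /** Writes the event to stdout; routing and storage are left to the environment. */
    static void log(String level, String message, Map<String, String> fields) {
        System.out.println(format(Instant.now(), level, message, fields));
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("user", "42"); // sample field, made up for the example
        log("warn", "tweet rejected", fields);
    }
}
```

The application never opens a log file or talks to a log server here; a forwarder outside the process can pick the lines up from stdout.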

https://twitter.com/copyconstruct/status/938444628923097089

OK Log
                                   --> ingestor . . . store
                                  |
service --(stdout)--> forwarder --|--> ingestor . . . store
                                  |
                                   --> ingestor . . . store
OK Log: Distributed and Coördination-Free Logging - Peter Bourgon https://www.youtube.com/watch?v=gWWK2eyZ-sc

Demo: Centralizing logs in Docker with Fluent Bit

Metrics
➔ Aggregated values: `+load => =metrics` (metric volume stays constant as load grows)
➔ RED method: twitter.com/LindsayofSF/status/692191001692237825
◆ Request rate
◆ Error rate
◆ Duration
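The three RED signals can be tracked with a handful of counters, which also shows why metrics stay the same size as load grows. The sketch below uses only the Java standard library and is not the Prometheus client API; `RedMetrics` and its methods are hypothetical names, and a mean duration stands in for the histograms a real metrics library would normally provide.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical RED (Rate, Errors, Duration) recorder built on three counters. */
final class RedMetrics {
    private final LongAdder requests = new LongAdder();
    private final LongAdder errors = new LongAdder();
    private final LongAdder totalDurationMillis = new LongAdder();

    /** Records one finished request; memory use does not grow with load. */
    void record(long durationMillis, boolean failed) {
        requests.increment();
        if (failed) {
            errors.increment();
        }
        totalDurationMillis.add(durationMillis);
    }

    /** Requests per second over an observation window of the given length. */
    double requestRate(double windowSeconds) {
        return requests.sum() / windowSeconds;
    }

    /** Fraction of requests that failed (0.0 when nothing was recorded). */
    double errorRatio() {
        long total = requests.sum();
        return total == 0 ? 0.0 : (double) errors.sum() / total;
    }

    /** Mean request duration in milliseconds (0.0 when nothing was recorded). */
    double meanDurationMillis() {
        long total = requests.sum();
        return total == 0 ? 0.0 : (double) totalDurationMillis.sum() / total;
    }

    public static void main(String[] args) {
        RedMetrics metrics = new RedMetrics();
        metrics.record(120, false);
        metrics.record(80, false);
        metrics.record(400, true);
        metrics.record(200, false);
        System.out.println("request rate (req/s): " + metrics.requestRate(2.0));
        System.out.println("error ratio: " + metrics.errorRatio());
        System.out.println("mean duration (ms): " + metrics.meanDurationMillis());
    }
}
```

Whether the service handles four requests or four million, the recorder still holds three numbers, in contrast to logs, where every request adds lines.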

Describing Load
Twitter use case - Nov, 2016
➔ Post tweets:
◆ 4.6k requests/second on average
◆ 12k requests/second at peak
➔ Home timeline:
◆ 300k requests/second
Handling a load of 12,000 writes/second would be fairly easy. However, the problem was not the volume of tweets but the fan-out.
`#reads/sec = 25 * #writes/sec`
Designing Data-Intensive Applications (Chapter 1) - Martin Kleppmann https://dataintensive.net
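The fan-out formula on this slide can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the slide's factor of 25 applies uniformly to both the average and the peak write rate; `FanOut` and `readsPerSecond` are hypothetical names.

```java
/** Hypothetical helper for the slide's formula: #reads/sec = 25 * #writes/sec. */
final class FanOut {
    /** Fan-out factor taken from the slide, assumed constant for this sketch. */
    static final int READS_PER_WRITE = 25;

    static long readsPerSecond(long writesPerSecond) {
        return READS_PER_WRITE * writesPerSecond;
    }

    public static void main(String[] args) {
        System.out.println(readsPerSecond(4_600));  // average write rate -> 115000
        System.out.println(readsPerSecond(12_000)); // peak write rate -> 300000
    }
}
```

Applied to the peak of 12k writes/second, the factor reproduces the 300k requests/second listed for the home timeline.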

Prometheus Prometheus Architecture https://prometheus.io/docs/introduction/overview

Demo: Instrumenting metrics in JAX-RS with Prometheus

Distributed Tracing with OpenTracing

Monolithic vs Distributed Tracing

Origins

OpenTracing
➔ Based on the “Dapper” paper
◆ Used in most systems at Google
◆ Follows an annotation-based approach, as opposed to a black-box approach
➔ “Just an API”
➔ `Trace = DAG[Span]`
DAG: Directed Acyclic Graph (in practice, usually a tree)
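The `Trace = DAG[Span]` idea can be illustrated with a small data structure. The sketch below is not the OpenTracing API: `Trace`, `Span`, `addSpan`, `childrenOf`, `depth`, and the operation names are hypothetical. It assumes a span may only reference spans that already exist, so edges always point backwards and no cycle can form.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of a trace as a directed acyclic graph of spans. */
final class Trace {
    /** A named unit of work that points back at the spans that caused it. */
    static final class Span {
        final String id;
        final String operation;
        final List<String> parentIds;

        Span(String id, String operation, List<String> parentIds) {
            this.id = id;
            this.operation = operation;
            this.parentIds = parentIds;
        }
    }

    private final Map<String, Span> spans = new LinkedHashMap<>();

    /** Adds a span; every parent must already be part of the trace. */
    Span addSpan(String id, String operation, String... parentIds) {
        if (spans.containsKey(id)) {
            throw new IllegalArgumentException("duplicate span: " + id);
        }
        for (String parentId : parentIds) {
            if (!spans.containsKey(parentId)) {
                throw new IllegalArgumentException("unknown parent span: " + parentId);
            }
        }
        Span span = new Span(id, operation, List.of(parentIds));
        spans.put(id, span);
        return span;
    }

    /** Spans that list the given span as a parent, in insertion order. */
    List<Span> childrenOf(String id) {
        List<Span> children = new ArrayList<>();
        for (Span span : spans.values()) {
            if (span.parentIds.contains(id)) {
                children.add(span);
            }
        }
        return children;
    }

    /** Length of the longest parent chain above a span (0 for a root span). */
    int depth(String id) {
        int max = 0;
        for (String parentId : spans.get(id).parentIds) {
            max = Math.max(max, 1 + depth(parentId));
        }
        return max;
    }

    public static void main(String[] args) {
        Trace trace = new Trace();
        trace.addSpan("a", "POST /tweets");            // root span
        trace.addSpan("b", "validate", "a");
        trace.addSpan("c", "insert-db", "a");
        trace.addSpan("d", "publish-event", "b", "c"); // two parents: a DAG, not a tree
        for (Span child : trace.childrenOf("a")) {
            System.out.println(child.operation);
        }
        System.out.println(trace.depth("d"));
    }
}
```

When every span has at most one parent, the same structure degenerates into a tree, which is the common case the slide alludes to.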

OpenTracing
[Diagram labels: service process; application logic; µ-service frameworks; control-flow packages; RPC frameworks; existing instrumentation; OpenTracing API; main(); Tracer (Jaeger); tracing infrastructure]
OpenTracing Isn't just Tracing: Measure Twice, Instrument Once - Ben Sigelman https://www.youtube.com/watch?v=NyySNe6Rr_g

Distributed Trace

JaegerTracing JaegerTracing - Architecture http://jaeger.readthedocs.io/en/latest/architecture/

Demo: Intro to the OpenTracing API

What happens when we take something smelly and increase its surface area?
[Image label: MONOLITH]
Engineering You - Martin Thompson https://www.youtube.com/watch?v=S4LzzuMTqjs&t=1177s

Demo: Tweets app “Instrument once, Measure twice”

Tweets App - v1: Monolith approach

Tweets App - v2: Data pipeline approach

Challenges and Opportunities with OpenTracing
➔ Adoption, compatibility, and level of conformance with the API (Gitter)
➔ Access, scope, and level of granularity for different scenarios (Canopy)

https://twitter.com/mipsytipsy/status/932551447555858433

What’s next?

Lineage-Driven Fault Injection
Orchestrating Chaos: Applying Database Research in the Wild - Peter Alvaro https://www.youtube.com/watch?v=YplkQu6a80Q

Intuition Engineering Intuition Engineering at Netflix - Justin Reynolds https://vimeo.com/173607639

References
➔ Benjamin Sigelman et al. - “Dapper, a Large-Scale Distributed Systems Tracing Infrastructure” https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/36356.pdf
➔ Raja R. Sambasivan et al. - “So, you want to trace your distributed system? Key design insights from years of practical experience” http://www.pdl.cmu.edu/PDL-FTP/SelfStar/CMU-PDL-14-102.pdf
➔ Monitoring in the time of cloud native https://medium.com/@copyconstruct/monitoring-in-the-time-of-cloud-native-c87c7a5bfa3e
➔ OK Log https://peter.bourgon.org/ok-log/
➔ Metrics, Tracing and Logging https://peter.bourgon.org/blog/2017/02/21/metrics-tracing-and-logging.html
➔ Distributed Tracing at Uber https://eng.uber.com/distributed-tracing/
➔ Monitoring and Observability https://medium.com/@copyconstruct/monitoring-and-observability-8417d1952e1c
➔ Measure Anything, Measure Everything https://codeascraft.com/2011/02/15/measure-anything-measure-everything/
➔ The death of ops is greatly exaggerated https://medium.com/@copyconstruct/the-death-of-ops-is-greatly-exaggerated-ff3bd4a67f24
➔ Logs and Metrics https://medium.com/@copyconstruct/logs-and-metrics-6d34d3026e38
➔ Logs - 12 Factor Application https://12factor.net/logs
➔ Take OpenTracing for a HotRod Ride https://medium.com/opentracing/take-opentracing-for-a-hotrod-ride-f6e3141f7941
➔ The Problem with Logging https://blog.codinghorror.com/the-problem-with-logging/
➔ Logging v. Instrumentation https://peter.bourgon.org/blog/2016/02/07/logging-v-instrumentation.html
➔ SRE Book https://landing.google.com/sre/book/index.html
➔ Canopy: An End-to-End Performance Tracing And Analysis System http://cs.brown.edu/~jcmace/papers/kaldor2017canopy.pdf
➔ Peter Alvaro et al. - Lineage-Driven Fault Injection https://people.eecs.berkeley.edu/~palvaro/molly.pdf
➔ Vizceral Open Source - Netflix Techblog https://medium.com/netflix-techblog/vizceral-open-source-acc0c32113fe

https://twitter.com/jessitron/status/579109266042150912

