What Does OpenTelemetry Have in Store for Us?

Ilya Kaznacheev

March 04, 2020

Transcript

  1. What does OpenTelemetry have in store for us?

  2. What is distributed tracing?

  3. 2009/11/10 23:00:01 starting application...
    2009/11/10 23:00:02 connecting to database
    2009/11/10 23:00:03 connecting to file storage
    2009/11/10 23:00:04 connecting to meow API 1...
    2009/11/10 23:00:05 connecting to meow API 2...
    2009/11/10 23:00:06 connecting to meow API 3...
    2009/11/10 23:00:22 GET /meow -> [200] "meow"
    2009/11/10 23:00:33 GET /meow -> [200] "meow"
    2009/11/10 23:00:44 GET /woof -> [400] "bad request"

  4. 2009/11/10 23:00:01 starting application...
    2009/11/10 23:00:02 connecting to database
    2009/11/10 23:00:03 connecting to file storage
    2009/11/10 23:00:04 connecting to meow API 1...
    2009/11/10 23:00:05 connecting to meow API 2...
    2009/11/10 23:00:06 connecting to meow API 3...
    2009/11/10 23:00:22 GET /meow -> [200] "meow"
    2009/11/10 23:00:33 GET /meow -> [200] "meow"
    2009/11/10 23:00:44 GET /woof -> [400] "bad request"
    [the nine log lines above repeat two more times on the slide]

  5. 2009/11/10 23:00:01 starting application...
    2009/11/10 23:00:02 connecting to database
    2009/11/10 23:00:03 connecting to file storage
    2009/11/10 23:00:04 connecting to meow API 1...
    2009/11/10 23:00:05 connecting to meow API 2...
    2009/11/10 23:00:06 connecting to meow API 3...
    2009/11/10 23:00:22 GET /meow -> [200] "meow"
    2009/11/10 23:00:33 GET /meow -> [200] "meow"
    2009/11/10 23:00:44 GET /woof -> [400] "bad request"
    [the nine log lines above repeat eleven more times on the slide]

  6. distributed tracing

  7. OpenTracing
    - Joined the CNCF in late 2016
    - API only
    - Developed by LightStep, Uber, New Relic, Datadog, Red Hat…
    - Widely used

  8. OpenCensus
    - Created by Google in early 2018
    - API and implementation
    - Developed by Google, Microsoft, Splunk, Honeycomb…
    - Widely used

  9. Problems
    - Tighter coupling between microservices
    - Vendor lock-in
    - OSS libraries also want to do tracing without being vendor-specific
    - As a library developer, you (probably) do not want to tie yourself
    to a particular vendor either; you want to let the user choose

  10. Solution

  11. OpenTelemetry
    - Created by the same teams in mid-2019
    - API and implementation
    - The API is separated from the implementations (spec + libs)
        - library developers can use the API alone
        - application developers can use the API plus an implementation
        - a "canonical" implementation for each programming language
    - Vendor-agnostic
    - Pluggable
    - Includes everything you need
        - tracing
        - monitoring
        - logging
  12. OpenTracing architecture

  13. Backward compatibility
    OpenTracing:
    - first via bridges
    - then via the API
    - changes:
        - tags -> attributes
        - logs -> events
    OpenCensus:
    - first via bridges
    - first as part of OC, but via the API
    - then separately via the API

  14. Support
    - OpenTelemetry is the next version of both OpenTracing and OpenCensus
    - OpenTelemetry is backward compatible with OpenTracing and OpenCensus
    - Once a stable OpenTelemetry release ships for a given language,
    development of OpenTracing and OpenCensus for that language stops
    - OpenTracing and OpenCensus are supported for 2 years (from summer 2019)
    - You can move to OpenTelemetry right away via bridges
    - New projects are expected to use the API directly

  15. Supported languages
    - Python
    - Java
    - JS (Node + browser)
    - Go
    - PHP
    - Ruby
    - .NET
    - Erlang
    - C++
    - Rust
    Status:
    - pre-release
    - production-ready: second half of 2020
    - timelines differ between languages

  16. Tracing API and Metrics API

  17. Tracing API
    Tracer: keeps track of the active Span.
    Functionality:
    - creates a new Span
    - returns the active Span
    - makes a given Span active
    SpanContext: serializable data passed between application layers
    (between different Spans).
    Contains TraceId, SpanId, TraceFlags, and other fields.

  18. Tracing API
    Span: represents a single operation in a trace (function, method, etc.).
    Spans nest inside one another along the request's call stack (tree).
    Includes:
    - Name
    - Parent Span (Span, SpanContext, or null)
    - start timestamp
    - end timestamp
    - Map of Attributes
    - List of Events
    - Links to other Spans
    - Status and other metadata

  19. Tracing API
    A Span can:
    - Add an Attribute (key-value): attributes/parameters of the Span
    - Add an Event (name, set of Attributes, timestamp): a log entry inside the Span
    - Set the status (Ok, Cancelled, NotFound, and 14 more)
    - Change its name
    - Return its context

  20. Metrics API
    Instruments: metric collection tools on the application side
    - Counter: a counter that sums values synchronously
        - Add() function
    - Measure: synchronous measurements of a varying quantity
        - Record() function
    - Observer: asynchronous snapshots/observations of a value
        - Observe() callback

  21. Metrics API
    Metric Event Format
    - Context (Span or other)
    - timestamp (collected implicitly by the SDK)
    - instrument definition (name, kind, options)
    - label set (key-value pairs for later use)
    - value (signed integer or floating point)

  22. Metrics API
    Aggregations: metric processing tools on the SDK side
    - Sum for Counter
    - MinMaxSumCount for Measure
    - LastValue for Observer

  23. Pluggable instrumentation
    - you can instrument code without connecting a collector
    - you can write code with tracing from the start and connect a backend later
    - less developer work, more reuse, standard practices
    - the same API across services, which reduces coupling
    - less work for DevOps: different backends (tracing, monitoring)
    are reached through one API across services and languages
    - Vendor-agnostic
    - libraries can be instrumented without being tied to a vendor
    -> no need to sacrifice observability when pulling in third-party libraries

  24. Links
    https://github.com/open-telemetry/opentelemetry-specification
    https://medium.com/opentelemetry
    https://opentelemetry.io

  25. Contacts
    kaznacheev.me
    github.com/ilyakaznacheev
    dev.to/ilyakaznacheev
    linkedin.com/in/ilyakaznacheev
    t.me/ilyakaznacheev
    t.me/kaznacheev_feed
