Foundational trends towards microservices
• … communicating over the network
Organizational trends:
• desire for teams to work independently, quick development, globalization of companies
Hardware trends:
• death of Moore's law leads to a need for parallelization
Microservices: an abstraction
• Concept of service is the right granularity for deployment, scaling, observability
• Independently deployable units
• Small, represent a single business capability
• Strictly hierarchical architectures
• Relatively stable topologies

[Figure: Example of a simple social network — services: Front end, Authentication, Friends, Feed, Ads, Posts, Scheduler, Observability Framework. Legend: dependency, stateless service, stateful service.]
… JSEP ’22, JSys ’22]
• Focuses on topology and request workflows
Microservice testbeds [ASPLOS ’19, TSE ’18]
• Small in scale and complexity
Tools evaluated on testbeds [OSDI ’20, SINAN ’21, ASPLOS ’21]
Abstraction → Finding
Topology:
• Service is sufficient dimension → ✗ Service is not one size fits all
• Topology is static → ✗ Long-term growth with daily churn
• Services are simple → ✗ Long tail of complex services
Workflows:
• Traces rep. of workflows → ✗ Observability loss impacts deep traces
• Wide & shallow → ✓ Wide & shallow
• Depth predicts # calls → ✗ Variation in # calls, even locally
• Workflows execute consistently → ✗ Variation in conc., decreased by children set
… lifetimes
Service Complexity (1 day):
• Endpoints exposed by deployed services, replication factors, and dependencies
Analysis granularity: service id, a unique name assigned to each service (e.g., authentication)
… info in the service id to utilize infrastructure support
Service granularity is not sufficient for all management tasks: multi-tenancy and data placement must be considered
… were also deprecated
• 40% of regular services lived the entire time range
High creation and deprecation rates for service ids, especially for the ill-fitting services
… deployed service instances nearly doubled
• Growth is due to new (regular) service ids, not an increase in replication factors for existing services
Abstraction → Finding
Topology:
• Service is sufficient dimension → ✗ Service is not one size fits all
• Topology is static → ✗ Long-term growth with daily churn
• Services are simple → ✗ Long tail of complex services
Workflows:
• Traces rep. of workflows → ✗ Observability loss impacts deep traces
• Wide & shallow → ✓ Wide & shallow
• Depth predicts # calls → ✗ Variation in # calls, even locally
• Workflows execute consistently → ✗ Variation in conc., decreased by children set
… the work done on behalf of a request
• Canopy [SOSP ’17]: Meta's distributed tracing framework
• Traces can be sampled anywhere in the topology

[Figure: Example Canopy trace — Authentication's "Verify User" followed by Feed's "Load Posts". Legend: Execution Unit, Service, Block, Point, Edge.]
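As a rough illustration of the trace model the slide describes, the sketch below defines a hypothetical, much-simplified Canopy-style trace (this is not Canopy's real schema — the class names, fields, and toy timings are assumptions): execution units belong to a service and contain blocks, and edges connect blocks across execution units. Points within blocks are elided.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified model of a Canopy-style trace (assumed names,
# not Canopy's actual schema).
@dataclass
class Block:
    name: str
    start: float  # made-up timestamps for illustration
    end: float

@dataclass
class ExecutionUnit:
    service: str
    blocks: list = field(default_factory=list)

@dataclass
class Trace:
    units: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # (src block name, dst block name)

# Toy trace mirroring the slide's example: Authentication's "Verify User"
# hands off to Feed's "Load Posts".
auth = ExecutionUnit("authentication", [Block("Verify User", 0.0, 1.5)])
feed = ExecutionUnit("feed", [Block("Load Posts", 1.5, 4.0)])
trace = Trace(units=[auth, feed], edges=[("Verify User", "Load Posts")])
```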
Node names: service id | endpoint name
Parent's characteristics:
• Number of calls: 6
• Max concurrency rate: 0.5 (3/6)

[Figure: timeline of Root → Parent → children (Child A ×4, Child B ×2) showing call concurrency over time]
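The slide's example (6 calls, at most 3 in flight at once, rate 3/6 = 0.5) can be reproduced with a standard sweep-line count. The definition used here — peak number of overlapping outgoing calls divided by the total number of calls — is an assumption based on the slide, and the call intervals are made up.

```python
# Sketch (assumed definition): max concurrency rate of a parent = largest
# number of its outgoing calls in flight at once / total number of calls.
def max_concurrency_rate(intervals):
    events = []
    for start, end in intervals:
        events.append((start, 1))   # call begins
        events.append((end, -1))    # call ends
    # at identical timestamps, process call ends before call starts
    events.sort(key=lambda e: (e[0], e[1]))
    cur = peak = 0
    for _, delta in events:
        cur += delta
        peak = max(peak, cur)
    return peak / len(intervals)

# 6 made-up calls, at most 3 overlapping -> rate 0.5, as in the slide
calls = [(0, 4), (1, 5), (2, 3), (6, 7), (8, 9), (8.5, 10)]
print(max_concurrency_rate(calls))  # -> 0.5
```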
Identified three categories of nodes: … Relay …
The majority of service|endpoints are leaves or single relays:
• Ads Manager: 54%
• Fetch Notifications: 66%
• RaaS: 72%
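A minimal sketch of this kind of classification, under assumed definitions (the slide does not spell them out): a leaf makes no outgoing calls, a single relay forwards to exactly one child, and everything else falls into a catch-all bucket. The node names are hypothetical.

```python
# Assumed category definitions for illustration only.
def categorize(num_children):
    if num_children == 0:
        return "leaf"
    if num_children == 1:
        return "relay"
    return "other"

# Hypothetical service|endpoint nodes with their outgoing-call counts.
nodes = {"cache|get": 0, "gateway|route": 1, "feed|load_posts": 7}
print({name: categorize(n) for name, n in nodes.items()})
```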
Predicting concurrency rates
Children set helps explain concurrency rate. Children sets (examples):
1. Always 0% concurrent
2. Always 100% concurrent
Standard deviation in concurrency decreases by 38-44% when grouping by children set
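The 38-44% figure comes from comparing the spread of concurrency rates overall against the spread within groups that share a children set. A toy version of that comparison, on entirely made-up samples:

```python
from statistics import pstdev
from collections import defaultdict

# Made-up samples: (set of children a node calls, observed concurrency rate).
samples = [
    (frozenset({"cache"}), 0.0), (frozenset({"cache"}), 0.0),
    (frozenset({"cache", "db"}), 1.0), (frozenset({"cache", "db"}), 0.9),
    (frozenset({"ads", "feed"}), 0.5), (frozenset({"ads", "feed"}), 0.6),
]

overall = pstdev([rate for _, rate in samples])

groups = defaultdict(list)
for children, rate in samples:
    groups[children].append(rate)
mean_group_sd = sum(pstdev(r) for r in groups.values()) / len(groups)

# Grouping by children set shrinks the spread on this toy data, mirroring
# the slide's finding (the slide's 38-44% is from real traces, not this).
print(mean_group_sd < overall)  # -> True
```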
Tooling for … diagnosis, capacity planning [Tprof, Sifter, VAIF, CRISP]:
• Needs to assume significant diversity in workflows originating from a root endpoint
Testbeds should be extended to provide support for:
• Heterogeneity of services, churn & growth of deployed instances
• Variable concurrency, number of children, and children sets, even within requests from a single root endpoint
Tooling that uses topology for resource management [Sage, FIRM, Sinan]:
• Should be adaptable to dynamic topology
• Needs to be tested for when results are based on stale data
Summary
• Long-term growth with daily churn
• Long tail of complex services
Workflows:
• Observability loss impacts deep traces
• Variation in # calls, even locally
• Variation in conc., decreased by children set
• Wide & shallow
Microservice abstraction should be extended to support different types of architectures
Data available @ github.com/facebookresearch/distributed_traces
… CRISP
• [SoCC ’21]: Characterizing Microservice Dependency and Performance: Alibaba Trace Analysis. SoCC
• [JSEP ’22]: Characterizing and synthesizing the workflow structure of microservices in ByteDance Cloud. JSEP
• [JSys ’22]: Huye & Shesagiri et al. [SoK] Identifying Mismatches Between Microservice Testbeds and Industrial Perceptions of Microservices. JSys
• [ASPLOS ’19]: Gan et al. An Open-Source Benchmark Suite for Microservices and Their Hardware-Software Implications for Cloud & Edge Systems. ASPLOS
• [TSE ’18]: Zhou et al. Fault Analysis and Debugging of Microservice Systems: Industrial Survey, Benchmark System, and Empirical Study. TSE
• [ASPLOS ’21]: Sage. ASPLOS
• [Tprof ’21]: Tprof. SoCC