SAC 2020 - Presentation Slides

Francisco Neves

March 18, 2020

Transcript

  1. Black-box inter-application traffic monitoring for adaptive container placement. Francisco Neves, Ricardo Vilaça and José Pereira. HASLab, INESC TEC and University of Minho, Braga, Portugal. ACM/SIGAPP Symposium on Applied Computing
  2. Introduction (slide 2)
     • Distributed software components are managed with containers, container pods and orchestrators
     • Container placement is important for achieving good performance
     • Inter-application traffic is a key factor in determining performance
  3. Problem (slide 3)
     • Cloud-based environments are unable to accurately monitor inter-application traffic in an application-independent way
     • Existing tracing tools that provide detailed information about data flow within a cloud application require instrumentation
     • Capturing network traffic is possible, but it incurs a large overhead and demands high computational resources during peaks
  4. Problem (slide 4)
     • How can the container placement of a system's deployment be optimized without application knowledge and with only negligible overhead?
     [Figure: containers distributed across Host 1, Host 2, Host 3, …, Host N]
  5. Monitoring at Kernel Layer (slide 5)
     • The kernel is the common low-level layer of virtualized environments
     • Observing system calls provides useful insight into which software components interact and how
     • Network communication between processes (even in containers) involves system calls for managing connections and for reading and writing messages to network channels
  6. Network Communication in Kernel (slide 6)
     • There are plenty of system calls for the same purpose (illustrated below)
     • System calls do not always provide relevant information
       ◦ file descriptors are meaningless out of the process context
     Write system calls: write(fd, buf, size), sendto(socket, …), sendmsg(socket, …), sendfile(to_fd, from_fd, …)
     Read system calls: read(fd, buf, size), recvfrom(socket, …), recvmsg(socket, …)
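
A small user-space illustration (not from the slides) of the point above: the same TCP payload can reach the kernel through several different system calls, and the file descriptor number is only meaningful inside the sending process.

    import os
    import socket

    # Loopback TCP connection, purely for demonstration purposes.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    client = socket.create_connection(server.getsockname())
    peer, _ = server.accept()

    client.send(b"via send\n")                 # send family of system calls
    os.write(client.fileno(), b"via write\n")  # plain write(2) on the socket fd
    client.sendmsg([b"via sendmsg\n"])         # sendmsg(2)
    client.shutdown(socket.SHUT_WR)

    # Receive side: recv/recvfrom/recvmsg all observe the same byte stream.
    data = b""
    while chunk := peer.recv(1024):
        data += chunk
    print(data.decode(), end="")
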
  7. Network Communication in Kernel (slide 7)
     • The write and read system calls on sockets reach, in kernel space, the routines:
       int sock_sendmsg(struct socket *sock, struct msghdr *msg)
       int sock_recvmsg(struct socket *sock, struct msghdr *msg, int flags)
  8. Network Communication in Kernel (slide 8)
     • struct socket contains connection details:
       ◦ local and remote IP addresses
         ▪ the sender's local and remote addresses are, respectively, the remote and local addresses at the receiver side
       ◦ local and remote ports
       ◦ socket family (AF_INET, AF_INET6) and socket type (SOCK_STREAM)
  9. Network Communication in Kernel (slide 9)
     • The return value indicates the amount of data actually sent/received
       ◦ or whether an error occurred
  10. Monitoring using eBPF (slide 10)
     • eBPF is a popular technology that permits efficient attachment of custom programs to the entry and exit points of kernel functions
     • Probes collect data in kernel space and publish events to ring buffers, which are consumed by a frontend program in user space (see the sketch below)
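
A minimal sketch of this approach, assuming the bcc toolkit (a Python frontend with an embedded eBPF C program). The probe points and struct fields follow the upstream kernel and bcc conventions; this is an illustration of the technique, not the authors' implementation.

    from socket import AF_INET, inet_ntop, ntohs
    from struct import pack
    from bcc import BPF

    bpf_text = r"""
    #include <linux/net.h>
    #include <net/sock.h>

    struct event_t {
        u32 pid;
        u32 saddr, daddr;
        u16 lport, dport;
        u64 bytes;
        u8  is_send;
    };
    BPF_PERF_OUTPUT(events);
    /* Remember the struct socket* seen at function entry, keyed by thread id,
     * so the return probe can pair it with the returned byte count. */
    BPF_HASH(entry_sock, u64, struct socket *);

    static __always_inline int on_entry(struct pt_regs *ctx, struct socket *sock) {
        u64 tid = bpf_get_current_pid_tgid();
        entry_sock.update(&tid, &sock);
        return 0;
    }

    static __always_inline int on_return(struct pt_regs *ctx, u8 is_send) {
        u64 tid = bpf_get_current_pid_tgid();
        struct socket **sockp = entry_sock.lookup(&tid);
        if (sockp == 0)
            return 0;
        entry_sock.delete(&tid);

        int ret = PT_REGS_RC(ctx);        /* bytes transferred, or a negative error */
        if (ret <= 0)
            return 0;

        struct sock *sk = (*sockp)->sk;
        struct event_t ev = {};
        ev.pid = tid >> 32;
        ev.is_send = is_send;
        ev.bytes = ret;
        /* Connection details live in struct sock_common (IPv4 case). */
        ev.saddr = sk->__sk_common.skc_rcv_saddr;
        ev.daddr = sk->__sk_common.skc_daddr;
        ev.lport = sk->__sk_common.skc_num;
        ev.dport = sk->__sk_common.skc_dport;   /* network byte order */
        events.perf_submit(ctx, &ev, sizeof(ev));
        return 0;
    }

    int kprobe__sock_sendmsg(struct pt_regs *ctx, struct socket *sock) { return on_entry(ctx, sock); }
    int kprobe__sock_recvmsg(struct pt_regs *ctx, struct socket *sock) { return on_entry(ctx, sock); }
    int kretprobe__sock_sendmsg(struct pt_regs *ctx) { return on_return(ctx, 1); }
    int kretprobe__sock_recvmsg(struct pt_regs *ctx) { return on_return(ctx, 0); }
    """

    b = BPF(text=bpf_text)   # kprobe__/kretprobe__ names are auto-attached by bcc

    def handle_event(cpu, data, size):
        ev = b["events"].event(data)
        print("pid=%d %s %d bytes  %s:%d -> %s:%d" % (
            ev.pid, "send" if ev.is_send else "recv", ev.bytes,
            inet_ntop(AF_INET, pack("I", ev.saddr)), ev.lport,
            inet_ntop(AF_INET, pack("I", ev.daddr)), ntohs(ev.dport)))

    b["events"].open_perf_buffer(handle_event)
    while True:
        b.perf_buffer_poll()

The (pid, addresses, ports) tuple is what lets the frontend attribute traffic to containers, which is the information the placement decisions later in the deck rely on.
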
  11. Monitoring using eBPF - Caveats (slide 11)
     • Reading kernel structures requires copying them first
     • The stack size of each probe is limited to 512 bytes
     • The ring buffer size is limited, and a high event throughput causes new events to overwrite the oldest ones (see the snippet below)
     • Processing events in the frontend program incurs CPU usage
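
Continuing the sketch above (reusing its b and handle_event), bcc's standard open_perf_buffer parameters can soften the ring-buffer caveat: a larger per-CPU buffer plus an explicit callback that counts events the kernel overwrote before the frontend could read them.

    lost_events = 0

    def on_lost(count):
        # Invoked by bcc with the number of samples dropped from the ring buffer.
        global lost_events
        lost_events += count

    # 256 pages per CPU instead of the default, plus a loss counter.
    b["events"].open_perf_buffer(handle_event, page_cnt=256, lost_cb=on_lost)
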
  12. Monitoring using eBPF - Overhead (slide 12)
     • Probes attached to the entry and exit points of the kernel routines sock_sendmsg and sock_recvmsg collect connection details and the number of bytes sent/received
     • Worst-case stress scenario set up with the iperf tool
     • Two versions implemented (the second is sketched below):
       ◦ one event for each read/write, aggregated in user space (UserAgg)
       ◦ events with already-aggregated statistics sent to user space (KernelAgg)
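
The KernelAgg variant named on the slide can be sketched as follows (again assuming bcc; illustrative, not the authors' code): instead of submitting one event per read/write, the return probe accumulates byte counts in a BPF hash map keyed by connection, and the user-space frontend only polls the aggregated counters at a low rate.

    import time
    from socket import AF_INET, inet_ntop, ntohs
    from struct import pack
    from bcc import BPF

    bpf_text = r"""
    #include <linux/net.h>
    #include <net/sock.h>

    struct flow_t { u32 saddr, daddr; u16 lport, dport; };

    BPF_HASH(tx_bytes, struct flow_t, u64);      /* per-connection byte counters */
    BPF_HASH(entry_sock, u64, struct socket *);

    int kprobe__sock_sendmsg(struct pt_regs *ctx, struct socket *sock) {
        u64 tid = bpf_get_current_pid_tgid();
        entry_sock.update(&tid, &sock);
        return 0;
    }

    int kretprobe__sock_sendmsg(struct pt_regs *ctx) {
        u64 tid = bpf_get_current_pid_tgid();
        struct socket **sockp = entry_sock.lookup(&tid);
        if (sockp == 0)
            return 0;
        entry_sock.delete(&tid);

        int ret = PT_REGS_RC(ctx);
        if (ret <= 0)
            return 0;

        struct sock *sk = (*sockp)->sk;
        struct flow_t flow = {};
        flow.saddr = sk->__sk_common.skc_rcv_saddr;
        flow.daddr = sk->__sk_common.skc_daddr;
        flow.lport = sk->__sk_common.skc_num;
        flow.dport = sk->__sk_common.skc_dport;

        u64 zero = 0;
        u64 *total = tx_bytes.lookup_or_try_init(&flow, &zero);
        if (total)
            *total += ret;                       /* aggregate in kernel space */
        return 0;
    }
    /* The receive side (sock_recvmsg) is symmetric and omitted for brevity. */
    """

    b = BPF(text=bpf_text)
    while True:
        time.sleep(5)        # the frontend wakes up rarely, so far fewer events cross into user space
        for flow, total in b["tx_bytes"].items():
            print("%s:%d -> %s:%d  %d bytes" % (
                inet_ntop(AF_INET, pack("I", flow.saddr)), flow.lport,
                inet_ntop(AF_INET, pack("I", flow.daddr)), ntohs(flow.dport),
                total.value))
        b["tx_bytes"].clear()
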
  13. Case Study (slide 14)
     • Example of layered and distributed data processing
       ◦ combination of Apache Cassandra and Apache Spark
     • Four n1-standard-4 Google Compute Engine instances
     • Docker containers orchestrated by Kubernetes
       ◦ 4 replicas of Apache Cassandra
       ◦ 4 Spark Worker replicas and 1 Spark Master
     • Populated with 2 million rows of ~2 KiB each
  14. Case Study - Default Placement (slide 15)
     • Traditional resource monitoring of two queries, Q1 and Q2
       ◦ CPU time in seconds
       ◦ other metrics in KiB
  15. Case Study - Default Placement (slide 16)
     • Traditional resource monitoring of two queries, Q1 and Q2
       ◦ CPU time in seconds
       ◦ other metrics in KiB
  16. Case Study - Default Placement (slide 17)
     • Traditional resource monitoring of two queries, Q1 and Q2
       ◦ CPU time in seconds
       ◦ other metrics in KiB
  17. Case Study - Default Placement (slide 19)
     • Which instances contribute to such traffic?
     [Figure: per-instance traffic matrix, with annotations for high intra-host traffic and high inter-host traffic]
  18. Case Study - Automatic Placement (slide 22)
     • The black-box approach is compatible with automatic techniques for optimizing container placement
     • The Pyevolve utility optimizes the placement, given the initial set of containers and servers, each with its corresponding processes
     • Three optimization factors (sketched below):
       ◦ the result is optimal for each server whose CPU cores are expected to be fully used
       ◦ the result is optimal for each server whose RAM is expected to be fully used
       ◦ the result is optimal when there is no cross-server communication
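
A hedged sketch of the objective the slide describes, in plain Python rather than the authors' Pyevolve setup; the container names, capacities and traffic figures are hypothetical. A placement scores higher when each server's CPU and RAM are close to fully used, and it is penalised for every byte that crosses server boundaries.

    from itertools import product

    SERVERS = {"host-1": {"cpu": 4, "ram": 15}, "host-2": {"cpu": 4, "ram": 15}}
    CONTAINERS = {
        "cassandra-0": {"cpu": 2, "ram": 7}, "cassandra-1": {"cpu": 2, "ram": 7},
        "spark-w-0":   {"cpu": 2, "ram": 7}, "spark-w-1":   {"cpu": 2, "ram": 7},
    }
    # Bytes exchanged per container pair, as reported by the kernel-level monitor.
    TRAFFIC = {("spark-w-0", "cassandra-0"): 900, ("spark-w-0", "cassandra-1"): 100,
               ("spark-w-1", "cassandra-1"): 800, ("spark-w-1", "cassandra-0"): 150}

    def fitness(placement):
        """placement maps container name -> server name; higher is better."""
        score = 0.0
        for server, caps in SERVERS.items():
            cpu = sum(CONTAINERS[c]["cpu"] for c, s in placement.items() if s == server)
            ram = sum(CONTAINERS[c]["ram"] for c, s in placement.items() if s == server)
            if cpu > caps["cpu"] or ram > caps["ram"]:
                return float("-inf")                 # infeasible placement
            # Factors 1 and 2: reward servers whose CPU and RAM are fully used.
            score += cpu / caps["cpu"] + ram / caps["ram"]
        # Factor 3: penalise traffic that has to cross server boundaries.
        cross = sum(v for (a, c), v in TRAFFIC.items() if placement[a] != placement[c])
        return score - cross / sum(TRAFFIC.values())

    # Exhaustive search stands in for the genetic algorithm used in the paper.
    best = max((dict(zip(CONTAINERS, combo))
                for combo in product(SERVERS, repeat=len(CONTAINERS))),
               key=fitness)
    print(best)   # co-locates each Spark worker with its busiest Cassandra peer
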
  19. Case Study - Automatic Placement (slide 23)
     • Q1: place two Spark workers and two Cassandra servers on each server
     • Q2: place three Cassandra servers on one instance, and the remaining Cassandra server together with all Spark workers on a second instance
     • Overall decrease in exchanged inter-host network traffic
  20. Case Study - Manual Placement (slide 24)
     • The collected data can also be used for manual placement and configuration
     • We manually placed and configured containers based on the observed network traffic
     • Q1: each Spark worker together with a Cassandra server
       ◦ improves locality
     • Q2: only one Spark worker, with 4x as many resources assigned
       ◦ avoids shuffling
  21. Conclusions (slide 27)
     • Monitoring at the kernel layer provides useful insights, in a black-box fashion, into system performance
     • Quantifying the amount of data exchanged between software components is key to improving performance
     • Monitoring network connections is feasible with low overhead and without application knowledge
  22. Black-box inter-application traffic monitoring for adaptive container placement. Francisco Neves, Ricardo Vilaça and José Pereira. HASLab, INESC TEC and University of Minho, Braga, Portugal. ACM/SIGAPP Symposium on Applied Computing
  23. Case Study - Default Placement (slide 30)
     • Which processes contribute to such traffic?
     [Figure: per-process traffic, with annotations for Cassandra-Spark data transfer and Spark-Spark shuffling]