Slide 1

Black-box inter-application traffic monitoring for adaptive container placement
Francisco Neves, Ricardo Vilaça and José Pereira
HASLab, INESC TEC and University of Minho, Braga, Portugal
ACM/SIGAPP Symposium on Applied Computing

Slide 2

Introduction

● Distributed software components are managed with containers, container pods and orchestrators
● Container placement is important for achieving good performance
● Inter-application traffic is a key factor in determining performance

Slide 3

Problem

● Cloud-based environments are unable to accurately monitor inter-application traffic in an application-independent way
● Existing tracing tools that provide detailed information about data flow within a cloud application require instrumentation
● Capturing network traffic is possible, but it incurs a large overhead and demands high computational resources during peaks

Slide 4

Problem

● How can a system's container placement be optimized without application knowledge, while incurring only negligible overhead?

(Figure: containers distributed across Host 1, Host 2, Host 3, …, Host N)

Slide 5

Monitoring at Kernel Layer

● The kernel is the common low-level layer of virtualized environments
● Observing system calls provides useful insights into which software components interact, and how
● Network communication between processes (even in containers) involves system calls for managing connections and for reading and writing messages to network channels

Slide 6

Network Communication in Kernel

● There are plenty of system calls for the same purpose
● System calls do not always provide relevant information
○ File descriptors are meaningless out of the process context

Write system calls: write(fd, buf, size), sendto(socket, …), sendmsg(socket, …), sendfile(to_fd, from_fd, …)
Read system calls: read(fd, buf, size), recvfrom(socket, …), recvmsg(socket, …)
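To make the first two bullets concrete, here is a small user-space sketch of ours (not from the paper): two different system calls push the same bytes through the same connected TCP socket, and the descriptor only identifies the connection inside this one process.

/* User-space sketch (ours): two system calls, one connection. Outside
 * this process, the descriptor 'sock' identifies nothing, so syscall
 * arguments alone cannot attribute the traffic to a connection. */
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static void send_twice(int sock)    /* sock: a connected TCP socket */
{
    const char msg[] = "hello";
    write(sock, msg, strlen(msg));      /* generic file write path */
    send(sock, msg, strlen(msg), 0);    /* socket-specific write path */
}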

Slide 7

Network Communication in Kernel

● In the kernel, the user-space write calls (write, sendto, sendmsg, sendfile) and read calls (read, recvfrom, recvmsg) all funnel into two routines, crossing from user space into kernel space:

int sock_sendmsg(struct socket *sock, struct msghdr *msg)
int sock_recvmsg(struct socket *sock, struct msghdr *msg, int flags)
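As a sketch of how probes can be attached to these two routines, the snippet below uses the libbpf CO-RE style; the tooling choice and the probe names are our assumptions, since the slides do not prescribe a specific front-end.

/* Sketch in libbpf CO-RE style (assumption: vmlinux.h generated with
 * 'bpftool btf dump file /sys/kernel/btf/vmlinux format c'). */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

SEC("kprobe/sock_sendmsg")
int BPF_KPROBE(on_sock_sendmsg, struct socket *sock, struct msghdr *msg)
{
    /* read connection details from *sock here (see the next slide) */
    return 0;
}

SEC("kprobe/sock_recvmsg")
int BPF_KPROBE(on_sock_recvmsg, struct socket *sock, struct msghdr *msg,
               int flags)
{
    return 0;
}

char LICENSE[] SEC("license") = "GPL";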

Slide 8

Network Communication in Kernel

● struct socket contains connection details:
○ local and remote IP addresses
■ the local and remote addresses on the sender side are, respectively, the remote and local addresses on the receiver side
○ local and remote ports
○ socket family (AF_INET, AF_INET6) and socket type (SOCK_STREAM)
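A hedged sketch of extracting these details from struct socket, for the IPv4 case only; the field names come from Linux's struct sock_common, while struct conn_info and fill_conn_info are our own illustrative names.

/* Sketch (IPv4 only). BPF_CORE_READ, from <bpf/bpf_core_read.h>,
 * performs the required copies out of kernel memory, with offsets
 * resolved by CO-RE relocations. */
struct conn_info {
    __u32 saddr, daddr;
    __u16 sport, dport;
    __u16 family;
};

static __always_inline void fill_conn_info(struct socket *sock,
                                           struct conn_info *ci)
{
    struct sock *sk = BPF_CORE_READ(sock, sk);

    ci->family = BPF_CORE_READ(sk, __sk_common.skc_family);
    ci->saddr  = BPF_CORE_READ(sk, __sk_common.skc_rcv_saddr); /* local, IPv4 */
    ci->daddr  = BPF_CORE_READ(sk, __sk_common.skc_daddr);     /* remote, IPv4 */
    ci->sport  = BPF_CORE_READ(sk, __sk_common.skc_num);       /* host order */
    ci->dport  = BPF_CORE_READ(sk, __sk_common.skc_dport);     /* network order */
}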

Slide 9

Network Communication in Kernel

● The return value indicates the amount of data actually sent/received
○ or whether any error occurred
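A minimal sketch of reading that return value at the exit probe (the probe name is ours):

/* The kretprobe sees the routine's return value: the number of bytes
 * actually sent, or a negative errno on failure. */
SEC("kretprobe/sock_sendmsg")
int BPF_KRETPROBE(on_sock_sendmsg_exit, int ret)
{
    if (ret <= 0)
        return 0;   /* error or nothing sent: nothing to account */
    /* account 'ret' bytes to the connection captured at entry */
    return 0;
}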

Slide 10

Monitoring using eBPF

● eBPF is a popular technology that permits efficiently attaching custom programs at the entry and exit points of kernel functions
● Programs collect data in kernel space and publish events to ring buffers, which are consumed by a front-end program in user space
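A minimal kernel-side sketch of publishing events through a BPF ring buffer; it assumes a kernel with BPF_MAP_TYPE_RINGBUF (5.8+), and the event layout and names are ours.

/* Event layout and ring buffer map (assumes vmlinux.h and
 * <bpf/bpf_helpers.h> are included as in the earlier sketch). */
struct event {
    __u32 saddr, daddr;   /* IPv4 addresses */
    __u16 sport, dport;   /* ports */
    __u64 bytes;          /* bytes moved by this call */
};

struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 1 << 20);   /* 1 MiB: an assumed size */
} events SEC(".maps");

static __always_inline void emit(const struct event *src)
{
    struct event *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
    if (!e)
        return;             /* buffer full: the event is lost */
    *e = *src;
    bpf_ringbuf_submit(e, 0);
}

On the user-space side, a libbpf front-end would consume these events with ring_buffer__new and ring_buffer__poll.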

Slide 11

Monitoring using eBPF - Caveats

● Reading kernel structures requires copying them first
● The stack size of each probe is limited to 512 bytes
● Ring buffer size is limited, and high event throughput leads new events to overwrite the oldest ones
● Processing events in the front-end program incurs CPU usage

Slide 12

Monitoring using eBPF - Overhead

● Probes attached to the entry and exit points of the kernel routines sock_sendmsg and sock_recvmsg collect connection details and the amount of bytes sent/received
● Worst-case stress scenario set up with the iperf tool
● Two versions implemented (the second is sketched below):
○ one event for each read/write, aggregating in user space (UserAgg)
○ events with already-aggregated statistics sent to user space (KernelAgg)
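A sketch of what the KernelAgg variant could look like: per-connection byte counters kept in a BPF hash map that the front-end scans periodically, instead of one event per call. All names and sizes here are ours, not the paper's.

/* KernelAgg idea (sketch): aggregate in kernel space, read lazily. */
struct conn_key {
    __u32 saddr, daddr;
    __u16 sport, dport;
};

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 65536);          /* assumed capacity */
    __type(key, struct conn_key);
    __type(value, __u64);
} tx_bytes SEC(".maps");

static __always_inline void account(const struct conn_key *key, __u64 bytes)
{
    __u64 *val = bpf_map_lookup_elem(&tx_bytes, key);
    if (val)
        __sync_fetch_and_add(val, bytes);    /* atomic add in eBPF */
    else
        bpf_map_update_elem(&tx_bytes, key, &bytes, BPF_NOEXIST);
}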

Slide 13

Monitoring using eBPF - Overhead

Slide 14

Case Study

● Example of layered and distributed data processing
○ combination of Apache Cassandra and Apache Spark
● Four n1-standard-4 Google Compute Engine instances
● Docker containers orchestrated by Kubernetes
○ 4 replicas of Apache Cassandra
○ 4 replicas of Spark workers and 1 of Spark master
● Populated with 2 million rows of ~2 KiB in size

Slide 15

Case Study - Default Placement

● Traditional resource monitoring of two queries, Q1 and Q2
○ CPU time in seconds
○ other metrics in KiB

Slide 16

Case Study - Default Placement

Slide 17

Case Study - Default Placement

Slide 18

Case Study - Default Placement

● Which instances contribute to such traffic?

Slide 19

Case Study - Default Placement

● Which instances contribute to such traffic?

(Chart highlights: high intra-host traffic; high inter-host traffic)

Slide 20

Case Study - Default Placement (Q1)

(Chart: Cassandra — Spark data transfer)

Slide 21

Case Study - Default Placement (Q2)

(Chart: Spark — Spark shuffling)

Slide 22

Case Study - Automatic Placement

● The black-box approach is compatible with automatic techniques for optimizing container placement
● Pyevolve utility used to optimize placement, given the initial set of containers and servers, each with their corresponding processes
● Three optimization factors (the third is sketched below):
○ optimal result when each server's CPU cores are expected to be fully used
○ optimal result when each server's RAM is expected to be fully used
○ optimal result when there is no cross-server communication
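As an illustration of the third factor only, here is a hypothetical scoring function of ours that rewards placements with little cross-server communication, given a measured traffic matrix; the actual Pyevolve fitness function is the authors'.

/* Hypothetical sketch: traffic[i][j] holds bytes measured between
 * containers i and j; host[i] is the candidate server for container i.
 * Returns 1.0 when no measured traffic crosses servers. */
#include <stdint.h>

double cross_server_score(int n, const uint64_t traffic[n][n],
                          const int host[n])
{
    uint64_t cross = 0, total = 0;
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++) {
            total += traffic[i][j];
            if (host[i] != host[j])
                cross += traffic[i][j];
        }
    return total ? 1.0 - (double)cross / (double)total : 1.0;
}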

Slide 23

Case Study - Automatic Placement

● Q1: place two Spark workers and two Cassandra servers on each host
● Q2: place three Cassandra servers on one instance, and the remaining Cassandra server together with all Spark workers on a second instance
● Overall decrease in exchanged inter-host network traffic

Slide 24

Case Study - Manual Placement

● The collected data can also be used for manual placement and configuration
● We manually placed and configured containers based on the observed network traffic
● Q1: each Spark worker placed together with a Cassandra server
○ improves locality
● Q2: only one Spark worker, with 4× as many resources assigned
○ avoids shuffling

Slide 25

Case Study - Manual Placement (Q1)

Slide 26

Case Study - Manual Placement (Q2)

Slide 27

Conclusions

● Monitoring at the kernel layer provides useful insights, in a black-box fashion, into system performance
● Quantifying the amount of data exchanged between software components is key to improving performance
● Monitoring network connections is feasible with low overhead and without application knowledge

Slide 28

Black-box inter-application traffic monitoring for adaptive container placement
Francisco Neves, Ricardo Vilaça and José Pereira
HASLab, INESC TEC and University of Minho, Braga, Portugal
ACM/SIGAPP Symposium on Applied Computing

Slide 29

Case Study - Default Placement

● Which processes contribute to such traffic?

Slide 30

Case Study - Default Placement

● Which processes contribute to such traffic?

(Chart highlights: Cassandra — Spark data transfer; Spark — Spark shuffling)