Slide 1

Slide 1 text

7-9 November, Tbilisi
Ivane Javakhishvili Tbilisi State University
Anton Sitnikov, Kirill Rudakov, Andrey Novikov, Evgeny Tsymbalov, Elena Trescheva, Alexey Zverev
Exactpro
Building an Adaptive Logs Classification System
An industrial report

Slide 2

Slide 2 text

Log analysis in the fintech world
● Any feasible way to improve software quality is in high demand
● Observing logs for passive testing is an elegant, non-disruptive way to discover errors early
● Log size is the problem

Slide 3

Slide 3 text

Reducing complexity

Slide 4

Slide 4 text

From millions of log lines to ~10k..100k signatures
● A signature is a log line with values replaced by placeholders (TIMESTAMP, ID, URL, and many more)
● Signatures are extracted with regular expressions, which have to be added manually over time
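The placeholder substitution described on this slide can be sketched as follows. This is a minimal illustration, not the authors' rule set: the three patterns, the `RULES` list, and the `extract_signature` name are assumptions chosen for the example, and a production system would carry many more rules.

```python
import re

# Hypothetical extraction rules (not from the slides). Order matters:
# timestamps and URLs are replaced before bare numbers, so their digits
# are not turned into ID placeholders first.
RULES = [
    ("TIMESTAMP", re.compile(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?")),
    ("URL", re.compile(r"https?://\S+")),
    ("ID", re.compile(r"\b\d+\b")),
]


def extract_signature(line: str) -> str:
    """Replace variable values in a log line with placeholder names."""
    for placeholder, pattern in RULES:
        line = pattern.sub(placeholder, line)
    return line


first = extract_signature(
    "2019-11-07 10:15:30,123 ERROR order 48213 rejected at http://host/api/v1"
)
second = extract_signature(
    "2019-11-08 23:01:02,456 ERROR order 7 rejected at http://other/api/v2"
)
```

Both lines collapse to the same signature, `TIMESTAMP ERROR order ID rejected at URL`, which is how millions of lines shrink to a much smaller set of signatures. A value that no rule covers stays in the text and produces a separate signature, which is why the rule list needs manual upkeep.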

Slide 5

Slide 5 text

…and then to thousands of clusters
A cluster is a set of close signatures that ideally correspond to the same error type for a human engineer.
Why are signatures not enough?
● IDs are difficult to catch
● Words can be added or deleted, which leads to different signatures
● Fast-growing clusters are a sign of a missing signature extraction rule

Slide 6

Slide 6 text

Clustering approaches

K-means on log lines
Train on several days of logs; see what is new on new days. A signature forms a new cluster if it is distant from all existing ones.
+ Can process massive logs
+ User defines the radius ad hoc
- Clusters move over time

Agglomerative on signatures
Fix the cluster diameter; make a new cluster if a signature is distant from the existing clusters. Keep cluster history forever.
+ Clusters are stable, history is consistent
- Slower
- Lots of small clusters; "user macro-clusters" need to be introduced

Slide 7

Slide 7 text

Initial classification
Input: log archive (week)
● Signature extraction. Vectorization in the space of 1-, 2-, 3-grams
● Greedy clustering into Jaccard-dense clusters
● Vectorization in the space of 1-, 2-, 3-grams
● K-means-facilitated human overview, population of user clusters
● Human classification: do two clusters mean the same? A human joins them into a user cluster and names it. A user cluster contains one or several dense clusters

Initial class structure
● 100k - 1M records a day
● ~10k signatures
● ~1k dense clusters
● Hundreds of user clusters
● UI shows user clusters

New logs processing
Input: new logs (hour to day)
● New signature? Calculate its distance to the old signatures
● New 1-, 2-, 3-gram? Add a dimension, re-evaluate distances
● Does adding the signature to an existing cluster break the Jaccard compactness criterion? Then it is a new cluster
● Old clusters do not move. A user is notified when a new cluster appears
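The processing of a new signature can be sketched as below. The slides do not define the compactness criterion, so this sketch assumes one: a signature may join a cluster only if its Jaccard distance to every member stays within `max_diameter`. The function names and the 0.5 threshold are hypothetical. N-grams are held as sets, so a previously unseen n-gram needs no explicit "add dimension" step here, unlike a matrix representation.

```python
def ngrams(signature, n_max=3):
    """Return the set of word 1-, 2- and 3-grams of a signature."""
    tokens = signature.split()
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    }


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def assign(signature, clusters, max_diameter=0.5):
    """Greedily place a signature; return the index of its cluster.

    clusters is a list of lists of n-gram sets. Existing clusters are never
    moved or merged: the signature joins the cluster it fits most tightly,
    or opens a new one if joining any cluster would exceed max_diameter.
    """
    grams = ngrams(signature)
    best_index, best_worst = None, None
    for index, members in enumerate(clusters):
        # Distance to the farthest member decides whether the cluster stays compact.
        worst = max(jaccard_distance(grams, member) for member in members)
        if worst <= max_diameter and (best_worst is None or worst < best_worst):
            best_index, best_worst = index, worst
    if best_index is None:
        clusters.append([grams])
        return len(clusters) - 1
    clusters[best_index].append(grams)
    return best_index


clusters = []
a = assign("TIMESTAMP ERROR order ID rejected", clusters)
b = assign("TIMESTAMP ERROR order ID was rejected", clusters)
c = assign("Connection to URL timed out", clusters)
```

The second signature differs from the first by one inserted word; the two share 10 of 17 distinct n-grams, a distance of about 0.41, so they land in the same cluster. The third shares nothing with either and opens a new cluster, which is the event a user would be notified about.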

Slide 8

Slide 8 text

Business use cases
● See new error types for the day
● Find a cluster (exact or nearest) by a raw log line
● Compare the error portraits of two days (and test runs)

Slide 9

Slide 9 text

Implementation

Slide 10

Slide 10 text

DB/backend queries
● What new clusters appeared today?
● What clusters appeared on Day X but not on Day Y?
● When did Cluster X appear during the day, and how often?
● What cluster does this log line belong to?
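The first three queries can be sketched against a single table of cluster appearances. Everything here is an assumption for illustration: the `appearances(cluster_id, seen_at)` schema, the function names, and the sample rows are hypothetical, and Python's built-in `sqlite3` stands in for the MySQL database named in the deck so that the example runs on its own.

```python
import sqlite3

# In-memory stand-in database with a hypothetical schema and sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE appearances (cluster_id INTEGER, seen_at TEXT)")
conn.executemany(
    "INSERT INTO appearances VALUES (?, ?)",
    [
        (1, "2019-11-07 09:00:00"),
        (2, "2019-11-07 10:30:00"),
        (1, "2019-11-08 09:05:00"),
        (3, "2019-11-08 11:00:00"),
        (3, "2019-11-08 11:20:00"),
    ],
)


def new_clusters(conn, day):
    """Clusters whose first ever appearance falls on the given day."""
    rows = conn.execute(
        "SELECT cluster_id FROM appearances GROUP BY cluster_id "
        "HAVING date(MIN(seen_at)) = ? ORDER BY cluster_id",
        (day,),
    )
    return [row[0] for row in rows]


def on_x_not_y(conn, day_x, day_y):
    """Clusters seen on day_x that were not seen on day_y."""
    rows = conn.execute(
        "SELECT DISTINCT cluster_id FROM appearances WHERE date(seen_at) = ? "
        "AND cluster_id NOT IN "
        "(SELECT cluster_id FROM appearances WHERE date(seen_at) = ?) "
        "ORDER BY cluster_id",
        (day_x, day_y),
    )
    return [row[0] for row in rows]


def hourly_counts(conn, cluster_id, day):
    """(hour, count) pairs showing when and how often a cluster appeared."""
    rows = conn.execute(
        "SELECT strftime('%H', seen_at) AS hour, COUNT(*) FROM appearances "
        "WHERE cluster_id = ? AND date(seen_at) = ? GROUP BY hour ORDER BY hour",
        (cluster_id, day),
    )
    return rows.fetchall()
```

With the sample rows, cluster 3 is the only one new on 2019-11-08, cluster 2 is the only one seen on 2019-11-07 but not on 2019-11-08, and cluster 3 appeared twice in hour 11. The fourth query, mapping a raw log line to its cluster, goes through signature extraction and distance computation rather than SQL alone.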

Slide 11

Slide 11 text

Components
● UI server (React or Flask)
● REST middleware for UI
● Parsing (regular, Bash-called)
● Clustering (after each parsing, called by parser/Bash)

Model storage (file)
● Vocabulary
● Signatures x Vocabulary
● Distances (Jaccard matrix)
● Signatures
● Clusters & Rangers
● Settings

MySQL
● Clustering settings
● Macro-clusters
● UI settings
● Signatures, appearances (each)

Data flows
● Read cluster & new signatures
● Save re-calculated clusters
● Read clusters

Slide 12

Slide 12 text


Slide 13

Slide 13 text

Q&A

Slide 14

Slide 14 text

Thanks