Slide 25
Monitoring, troubleshooting, and tweaking a system while millions of lines of data are being processed is a huge challenge. It is very important to have tools that give you visibility into the system not only at the micro scale (e.g. seeing how one particular company is processed), but also at the macro/global scale.
Simply adding logging to the code base to see what's happening at execution time is not sufficient, because you will end up with millions of log lines.
As already mentioned, the existing monitoring solution in Spark, the web UI, is not well suited to monitoring at the application level, so I built a custom solution on top of it. It lets me see a history of the calculations that have run, any warnings or errors that occurred, how long it took to process a particular company, and so on.
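The kind of application-level history described here can be sketched as a small record store that supports both the micro view (one company) and the macro view (aggregates). This is a minimal illustrative sketch, not the actual implementation from the talk; all names (`CalculationRun`, `CalculationHistory`, the fields) are assumptions.

```python
from dataclasses import dataclass, field
from statistics import mean

@dataclass
class CalculationRun:
    """One entry in the application-level history (fields are illustrative)."""
    company: str
    duration_s: float
    warnings: list = field(default_factory=list)
    errors: list = field(default_factory=list)

class CalculationHistory:
    """Stores past calculation runs and answers micro- and macro-scale queries."""

    def __init__(self):
        self.runs = []

    def record(self, run: CalculationRun):
        self.runs.append(run)

    def for_company(self, company):
        # Micro scale: drill into how one particular company was processed.
        return [r for r in self.runs if r.company == company]

    def summary(self):
        # Macro scale: a global view across all recorded runs.
        return {
            "total_runs": len(self.runs),
            "avg_duration_s": mean(r.duration_s for r in self.runs) if self.runs else 0.0,
            "companies_with_errors": sorted({r.company for r in self.runs if r.errors}),
        }

history = CalculationHistory()
history.record(CalculationRun("Acme", 12.0))
history.record(CalculationRun("Globex", 8.0, errors=["timeout"]))
```

In a real deployment the runs would be persisted (e.g. to a database) rather than kept in memory, but the micro/macro split in the query API is the point being illustrated.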
Each calculation consists of one or more Spark jobs, which link to their corresponding job page in the Spark UI for further investigation.
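Linking a calculation to its Spark jobs can be sketched as building deep links into the Spark web UI from captured job ids. How the job ids are captured (e.g. via a `SparkListener`) is omitted here; the function names and the `ui_base` value are assumptions, while `/jobs/job/?id=N` is the standard Spark UI job page path.

```python
def spark_job_url(ui_base: str, job_id: int) -> str:
    """Deep link to a single job's detail page in the Spark web UI.

    ui_base is deployment-specific (e.g. http://driver-host:4040);
    the /jobs/job/?id=N path is the Spark UI's job page route.
    """
    return f"{ui_base.rstrip('/')}/jobs/job/?id={job_id}"

def links_for_calculation(ui_base: str, job_ids: list) -> list:
    # One calculation may span several Spark jobs, so emit one link each.
    return [spark_job_url(ui_base, j) for j in job_ids]
```

A custom monitoring page can then render these links next to each calculation's history entry, so drilling from the applicative view down to the raw Spark job is one click.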
Application-focused logging - presenter notes