
A story of comics, neural networks, and Android!

Karumi
December 16, 2020

Finding a panel on a comic page is the hardest thing I've ever done in computer science. If you'd like to know how we can combine comics, artificial intelligence and Android code to build a software solution for this problem... this is your talk! We will review some topics related to machine learning, some computer vision algorithms and how to use machine learning in Android to analyze a comic page.

Transcript

  1. Pedro Gómez - [email protected] - @pedro_g_s - github.com/pedrovgs - Questions => sli.do #madgStylesAndComics

    A story of comics, neural networks & Android! Pedro Gómez, Senior Software Engineer at Karumi
  2. Questions? https://sli.do #madgStylesAndComics
  3. This story starts with an awesome app and one of my favorite hobbies
  4. Unfortunately, we don’t have this metadata, and comic pages are not always composed of rectangles
  5. How does Panels do it?
  6. Current solutions are based on computer vision
  7. OpenCV pros and cons
     • Already implemented solution.
     • Maintained by smart people.
     • Works with simple pages.
     • Really fast.
     • Obj-C bridging hell.
     • Doesn’t work with “cool” pages.
     • Doesn’t work with covers.
     • Doesn’t work with back covers.
     • Does not respect artist reading design.
  8. Can we do it better?
  9. “I’m sure we can do it better with Artificial Intelligence” Pedro Gómez, 2018
  10. Unfortunately, it’s not that simple
  11. What would we need to solve this using AI?
     • State of the art review.
     • Learn Python.
     • Learn about ML.
     • Learn about TensorFlow.
     • Find a model to use in TF.
     • Create a data set.
     • Train and export the model.
     • Consume the model from Android/iOS code.
     • Integrate the libraries into Panels.
  12. Let’s start with the hardest part: Machine Learning
  13. Convolutional neural networks to the rescue!
     Supervised deep learning models frequently used for image classification, segmentation and video processing.
  14. U-Net deep learning model
     U-Net is a convolutional neural network that was developed for biomedical image segmentation at the Computer Science Department of the University of Freiburg, Germany.
  15. Machine Learning ➡ ➡
  16. Using a supervised model means we will need a training dataset
  17. Machine Learning ➡ ➡
  18. Machine Learning ➡ ➡ Label 0 - Background. Label 1 - Border. Label 2 - Content.
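
To make the three labels concrete, here is a minimal sketch of what a mask for a tiny page with a single rectangular panel could look like. The list-of-lists encoding and the `make_panel_mask` helper are assumptions made for illustration; the slides do not show the real dataset format.

```python
# Hypothetical mask encoding for illustration; not taken from the DeepPanel dataset.
BACKGROUND, BORDER, CONTENT = 0, 1, 2


def make_panel_mask(height, width, top, left, bottom, right):
    """Build a mask holding one rectangular panel: a border ring around content."""
    mask = [[BACKGROUND] * width for _ in range(height)]
    for y in range(top, bottom + 1):
        for x in range(left, right + 1):
            on_edge = y in (top, bottom) or x in (left, right)
            mask[y][x] = BORDER if on_edge else CONTENT
    return mask


mask = make_panel_mask(6, 7, top=1, left=1, bottom=4, right=5)
for row in mask:
    print("".join(str(value) for value in row))
```

Each printed row is one line of pixels: `0` around the panel, a ring of `1` for the border, and `2` for the panel content.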
  19. Problems
     • I had no idea about Python.
     • My ML knowledge was poor.
     • I had no idea about TF.
     • We needed to create a dataset.
     • We needed to measure our accuracy.
     • Our first run gave us ~5% acc for the border.
     • Even if we got the model ready, we would still need to find the panels between the numbers.
  20. Solutions
     • I learned Python.
     • I studied some ML.
     • I reviewed some TF tutorials.
     • I manually created the masks for about 500 pages.
     • I started properly measuring the accuracy per label.
     • We would improve our model/training to get up to 80% accuracy.
     • We would take care of the panels later.
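
Measuring accuracy per label matters because the border covers few pixels, so a single overall number can look fine while the border is mostly wrong. The sketch below shows one possible definition (the fraction of pixels of each expected label that the model got right); the function name and the list-of-lists masks are assumptions, not the metric code used in the project.

```python
def per_label_accuracy(expected, predicted, labels=(0, 1, 2)):
    """For each label, the fraction of its expected pixels predicted correctly."""
    totals = {label: 0 for label in labels}
    hits = {label: 0 for label in labels}
    for expected_row, predicted_row in zip(expected, predicted):
        for expected_pixel, predicted_pixel in zip(expected_row, predicted_row):
            totals[expected_pixel] += 1
            if expected_pixel == predicted_pixel:
                hits[expected_pixel] += 1
    # None marks a label that never appears in the expected mask.
    return {
        label: (hits[label] / totals[label] if totals[label] else None)
        for label in labels
    }


expected = [[0, 0, 1, 1], [0, 2, 2, 1]]
predicted = [[0, 0, 0, 1], [0, 2, 2, 0]]
accuracy = per_label_accuracy(expected, predicted)
print(accuracy)
```

In this toy example 6 of 8 pixels are correct overall, yet only one of the three border pixels is, which is the kind of gap a per-label metric exposes.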
  21. ML accuracy improvements
     • Configure different metrics.
     • Increase training dataset size from 50 pages to 500 (Google uses 3k for any sample project).
     • Increase neural network size from 128 to 224.
     • Balance our dataset.
     • Run some random experiments and discard the results.
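
The slides do not say how the dataset was balanced. One common option for segmentation, shown here purely as an assumption, is to weight each label by the inverse of its pixel frequency so that rare labels such as the border count more during training. The helper name and mask encoding are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(masks, labels=(0, 1, 2)):
    """Weight each label by total / (number_of_labels * pixel_count_of_label)."""
    counts = Counter(pixel for mask in masks for row in mask for pixel in row)
    total = sum(counts[label] for label in labels)
    # Labels that never appear are skipped to avoid dividing by zero.
    return {
        label: total / (len(labels) * counts[label])
        for label in labels
        if counts[label]
    }


masks = [[[0, 0, 0, 0], [1, 1, 2, 2]]]
weights = inverse_frequency_weights(masks)
print(weights)
```

With this weighting, a label that covers half of the pixels gets a weight below 1, while the rarer labels get a weight above 1.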
  22. Once the model was ready, it was time to think about Android & iOS
  23. Using the model we created using Python from Android & iOS
  24. Using the model in mobile ➡ ➡ Label 0 - Background. Label 1 - Border. Label 2 - Content.
  25. Using TensorFlow Lite
     • We optimized the model and reduced the tflite model size from 24 MB down to 12 MB.
     • We created the code needed to extract the image features from mobile.
     • Using some Kotlin code we evaluated the model and we got a wonderful matrix of numbers.
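
The "matrix of numbers" still has to be turned into labels before any panel can be found. A minimal sketch of that step follows, assuming the model returns one score per label for every pixel (a height x width x 3 structure); that shape and the `scores_to_labels` name are assumptions, and the talk did this part in Kotlin rather than Python.

```python
def scores_to_labels(scores):
    """Pick, for every pixel, the index of the label with the highest score."""
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


scores = [
    [[0.90, 0.05, 0.05], [0.20, 0.70, 0.10]],
    [[0.10, 0.30, 0.60], [0.30, 0.30, 0.40]],
]
label_matrix = scores_to_labels(scores)
print(label_matrix)
```

The result is a plain matrix of 0, 1 and 2 values, the same encoding as the training masks.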
  26. Once we got the output matrix it was time to start working with the computer vision algorithms!
  27. Using the model in mobile ➡
  28. Connected Component Labeling aka the slowest algorithm I’ve ever implemented
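
Connected component labeling groups neighbouring pixels with the same label into numbered regions, which is how separate panels can be told apart in the label matrix. The sketch below is a breadth-first flood fill with 4-connectivity over the content label, and it also collects a bounding box per region. It illustrates the technique only; it is not the DeepPanel implementation, and `label_components` is a hypothetical name.

```python
from collections import deque


def label_components(mask, target=2):
    """Label 4-connected regions of `target` pixels and return their boxes."""
    height, width = len(mask), len(mask[0])
    labels = [[0] * width for _ in range(height)]  # 0 means "no component"
    boxes = []  # one (left, top, right, bottom) tuple per component
    for y in range(height):
        for x in range(width):
            if mask[y][x] != target or labels[y][x]:
                continue
            component = len(boxes) + 1
            labels[y][x] = component
            left, top, right, bottom = x, y, x, y
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] == target and not labels[ny][nx]:
                        labels[ny][nx] = component
                        queue.append((ny, nx))
            boxes.append((left, top, right, bottom))
    return labels, boxes


page = [
    [1, 1, 1, 1, 1, 1, 1],
    [1, 2, 2, 1, 2, 2, 1],
    [1, 2, 2, 1, 2, 2, 1],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 2, 2, 2, 2, 2, 1],
    [1, 1, 1, 1, 1, 1, 1],
]
labels, boxes = label_components(page)
print(boxes)
```

The toy page has two panels on the first row and one wide panel below, so three boxes come back in scan order.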
  29. The first Kotlin version needed 14 seconds to analyze a page
  30. We finally ported the algorithm to C++ and we are now able to analyze a page in less than 500 ms ⚡
  31. ±500 LOC, ±500 comic pages
  32. C++ code is fast; however, the main benefit for us is portability!
  33. Some Android/iOS details took us more time to implement than the ML part ⏳
  34. Thanks to this solution we expected Panels’ binary size to be lower, but it is not
  35. DeepPanel’s value is not only the code but also the dataset
  36. Our model doesn’t find panels. DeepPanel is able to read a comic!
  37. The best solution is not based on computer science but on asking the artist how we should read the page
  38. It is a great example of ML bias
  39. There is room for improvement
  40. Available in Maven Central and CocoaPods
  41. References
     • DeepPanel - https://github.com/pedrovgs/DeepPanel
     • DeepPanelAndroid - https://github.com/pedrovgs/DeepPanelAndroid
     • DeepPaneliOS - https://github.com/pedrovgs/DeepPaneliOS
     • Panels - https://twitter.com/Panels_ink
     • TensorFlow - https://www.tensorflow.org/
     • U-Net model - https://en.wikipedia.org/wiki/U-Net
     • CCL - https://en.wikipedia.org/wiki/Connected-component_labeling
     • Convolutional Neural Networks - https://en.wikipedia.org/wiki/Convolutional_neural_network
     • Dot CSV Video - https://www.youtube.com/watch?v=V8j1oENVz00
  42. Acknowledgements
     • Asun.
     • Rafael Muñóz.
     • Fillito.
     • Victor Baro.