Fridolín Pokorný - Thoth - how to recommend the best possible libraries for your application

Locking the libraries in your Python project to specific versions is a well-known best practice. Dependency management tools in the Python ecosystem lock dependencies to the latest available version, but what if the latest version is not the best fit for your application? The open source project Thoth is an advanced Python dependency resolver that recommends libraries for your project based on observations gathered about Python libraries in specific runtime environments. What do these recommendations look like? How are observations, such as the performance characteristics of machine learning libraries on particular hardware, gathered?

https://us.pycon.org/2019/schedule/presentation/185/

PyCon 2019

May 04, 2019

Transcript

  1. Thoth How to recommend the best possible packages for your

    application Fridolin Pokorny <fridolin@redhat.com> 2019-May-4
  2. Thoth Station • Fridolín “fridex” Pokorný • Senior Software Engineer

    at Red Hat • Distributed systems, AI/ML and (of course) Python fan • Projects: ◦ Reverse engineer RetDec (AVG) ◦ Linux kernel TLS/DTLS module AF_KTLS ◦ Selinon - distributed task flows scheduler on top of Celery ◦ Project Thoth $ whoami https://fridex.github.io
  3. Thoth Station • Project Thoth - https://github.com/thoth-station • Red Hat

    - Office of the CTO ◦ Emerging technologies ◦ AI team - https://github.com/aicoe • Initially 2 engineers, now growing ◦ Christoph Görn <goern@redhat.com> ◦ Francesco Murdaca <fmurdaca@redhat.com> ◦ Fridolín Pokorný <fridolin@redhat.com> ◦ Harshad Reddy Nalla <hnalla@redhat.com> ◦ Marek Cermak <macermak@redhat.com> ◦ Subin Modeel <smodeel@redhat.com> $ whoarewe # project Thoth https://thoth-station.ninja/
  4. Thoth Station What is Thoth? Why Thoth?

  5. Thoth Station Why Thoth? • PyPI - Python Package Index

    ◦ https://pypi.org/ ◦ 178,016 projects ◦ 1,303,926 releases (approx. 7 releases per project)
  6. Thoth Station Why Thoth? import tensorflow as tf from flask

    import Flask application = Flask()
  7. Thoth Station Why Thoth? import tensorflow as tf from flask

    import Flask application = Flask()
  8. Thoth Station $ pip3 install --user tensorflow $ pip3 install

    --user flask $ python3 ./app.py Error: tensorflow 1.10.1 has requirement numpy<=1.14.5,>=1.13.3, but you'll have numpy 1.15.1 which is incompatible. $
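The conflict on this slide can be reproduced with a small check. This is only a sketch: it assumes plain dotted numeric versions and an inclusive range, whereas real tools parse the full PEP 440 grammar. The helper names `parse` and `satisfies` are made up for illustration.

```python
from typing import Tuple


def parse(version: str) -> Tuple[int, ...]:
    """Turn a plain dotted version such as '1.14.5' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version: str, lower: str, upper: str) -> bool:
    """Check lower <= version <= upper, mirroring 'numpy<=1.14.5,>=1.13.3'."""
    return parse(lower) <= parse(version) <= parse(upper)


# Values taken from the error message on the slide.
installed_numpy = "1.15.1"
compatible = satisfies(installed_numpy, lower="1.13.3", upper="1.14.5")
print(compatible)
```

The installed numpy falls outside the range tensorflow 1.10.1 declares, which is what the error message reports.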
  9. Thoth Station Why Thoth? import tensorflow as tf from flask

    import Flask application = Flask() 59 releases 28 releases
  10. Thoth Station Why Thoth? import tensorflow as tf from flask

    import Flask application = Flask() 59 releases 28 releases Number of possible version combinations of the directly used libraries: 59 * 28 = 1,652
  11. Thoth Station Transitive dependencies • Flask ◦ click, itsdangerous, jinja2,

    markupsafe, werkzeug Estimated number of combinations: 54,395,000
  12. Thoth Station Transitive dependencies • TensorFlow ◦ absl-py, astor, backports-weakref,

    bleach, enum34, gast, google-pasta, grpcio, h5py, html5lib, keras, keras-applications, keras-preprocessing, markdown, mock, numpy, pbr, protobuf, pyyaml, scipy, setuptools, six, tensorboard, tensorflow-estimator, tensorflow-tensorboard, termcolor, tf-estimator-nightly, werkzeug, wheel Estimated number of combinations: 139,740,802,927,165,440,000 approx. 1.39*10^20
  13. Thoth Station Why Thoth? import tensorflow as tf from flask

    import Flask application = Flask() 1.39*10^20 combinations 54,395,000 combinations Estimated number of ways to install the full application stack, counting libraries used both directly and indirectly: 1.39*10^20 * 54,395,000 = 7.6*10^27
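The arithmetic on the preceding slides can be checked in a few lines. The counts below are the figures quoted on the slides; the variable names are illustrative only.

```python
import math

# Release counts of the two directly used libraries, as quoted on the slides.
direct_release_counts = [59, 28]
direct_combinations = math.prod(direct_release_counts)

# Estimated combinations once transitive dependencies are included (from the slides).
flask_stack_combinations = 54_395_000
tensorflow_stack_combinations = 139_740_802_927_165_440_000

total_combinations = flask_stack_combinations * tensorflow_stack_combinations

print(direct_combinations)          # 1652
print(f"{total_combinations:.1e}")  # 7.6e+27
```

Even at a million evaluated stacks per second, enumerating a space of this size is out of reach, which is why the resolver needs to prune and score rather than test every combination.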
  14. Thoth Station Why Thoth? import pandas as pd import tensorflow

    as tf from flask import Flask application = Flask() Operating System Fedora 30 Fedora 29 ... CentOS 7.6 CentOS 7.5 … Python interpreter
  15. Thoth Station Why Thoth? import pandas as pd import tensorflow

    as tf from flask import Flask application = Flask() Operating System Python interpreter glibc cuda
  16. Thoth Station Hardware Why Thoth? import pandas as pd import

    tensorflow as tf from flask import Flask application = Flask() Operating System Python interpreter glibc cuda GPU CPU
  17. Thoth Station Why Thoth? Python application

  18. Thoth Station Hardware Why Thoth? Python application Operating System Python

    interpreter Native dependencies Kernel modules Direct Python dependencies Transitive Python dependencies
  19. Thoth Station Why Thoth? • Create knowledge base ◦ What

    packages in which versions should I use? ▪ Application builds correctly ▪ Application runs correctly ▪ Application behaves and performs well • Create an advanced Python resolver which uses knowledge base to resolve software stacks Latest versions are not always greatest choices.
  20. Thoth Station Building Thoth’s knowledge base

  21. Thoth Station Gathering data for Thoth’s knowledge base • Resolving

    software stacks ◦ own resolution algorithm • Analyses of container images ◦ JupyterHub images ◦ Thoth’s container images • Amun, Dependency Monkey ◦ running CI and perf related tests ◦ performance related analyses • ...
  22. Thoth Station Optimized TensorFlow builds by Thoth team • Automated

    tests of libraries • Tests targeting performance • Optimized TensorFlow builds https://tensorflow.pypi.thoth-station.ninja/
  23. Thoth Station Recommendations

  24. Thoth Station How good is my software stack? simplelib anotherlib

  25. Thoth Station

  26. Thoth Station v1 v2 simplelib v1 v2 anotherlib v1 v2

    dependency1 v1 dependency2 v2
  27. Thoth Station v1 v2 simplelib v1 v2 anotherlib v1 v2

    dependency1 v1 dependency2 v2 pip/Pipenv (always latest): simplelib ==v2 anotherlib ==v2 dependency2 ==v2
  28. Thoth Station v1 v2 simplelib v1 v2 anotherlib v1 v2

    dependency1 v1 dependency2 v2
  29. Thoth Station v1 v2 simplelib v1 v2 anotherlib v1 v2

    dependency1 v1 dependency2 v2 Causes errors based on Thoth’s knowledge base.
  30. Thoth Station v1 v2 simplelib v1 v2 anotherlib v1 v2

    dependency1 v1 dependency2
  31. Thoth Station v1 v2 simplelib v1 v2 anotherlib v1 v2

    dependency1 v1 dependency2 Simplelib in version v1 performs better together with dependency1 in version v1 based on Thoth’s knowledge base.
  32. Thoth Station v1 v2 simplelib v1 v2 anotherlib v1 v2

    dependency1 v1 dependency2
  33. Thoth Station v1 v2 simplelib v1 v2 anotherlib v1 v2

    dependency1 v1 dependency2 Thoth (always greatest): simplelib ==v1 anotherlib ==v2 dependency1 ==v1 dependency2 ==v1
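The walk-through on slides 26 to 33 can be sketched as a toy resolver. Everything below is an assumption made for illustration: the dependency graph (`DEPENDS`), the knowledge-base entries (`KNOWN_ERRORS`, `PERFORMANCE_BONUS`), and the scoring are hypothetical and are not Thoth's actual data model or algorithm.

```python
from itertools import product

# Hypothetical graph: (package, version) -> {dependency: allowed versions}.
DEPENDS = {
    ("simplelib", "v1"): {"dependency1": ["v1"]},
    ("simplelib", "v2"): {},
    ("anotherlib", "v1"): {"dependency2": ["v1", "v2"]},
    ("anotherlib", "v2"): {"dependency2": ["v1", "v2"]},
}
DIRECT = {"simplelib": ["v1", "v2"], "anotherlib": ["v1", "v2"]}

# Hypothetical knowledge base: one pin known to cause errors, and one
# combination of pins observed to perform better together.
KNOWN_ERRORS = {("dependency2", "v2")}
PERFORMANCE_BONUS = {
    frozenset({("simplelib", "v1"), ("dependency1", "v1")}): 1.0,
}


def candidate_stacks():
    """Yield every fully pinned stack allowed by the toy graph."""
    for s_ver, a_ver in product(DIRECT["simplelib"], DIRECT["anotherlib"]):
        stack = {"simplelib": s_ver, "anotherlib": a_ver}
        requirements = {}
        for pin in stack.items():
            requirements.update(DEPENDS[pin])
        names = sorted(requirements)
        for versions in product(*(requirements[name] for name in names)):
            yield {**stack, **dict(zip(names, versions))}


def recency(stack):
    """Prefer newer direct dependencies first, then newer transitive ones."""
    direct = tuple(int(stack[name][1:]) for name in DIRECT)
    transitive = sum(int(v[1:]) for n, v in stack.items() if n not in DIRECT)
    return direct, transitive


def score(stack):
    """Sum the bonuses of all knowledge-base combinations present in the stack."""
    pins = set(stack.items())
    return sum(b for combo, b in PERFORMANCE_BONUS.items() if combo <= pins)


stacks = list(candidate_stacks())

# "Always latest" strategy: ignore the knowledge base entirely.
latest = max(stacks, key=recency)

# Knowledge-base-aware strategy: drop stacks with known errors, then rank
# by score and use recency only as a tie-breaker.
healthy = [s for s in stacks if not (set(s.items()) & KNOWN_ERRORS)]
recommended = max(healthy, key=lambda s: (score(s), recency(s)))

print(latest)
print(recommended)
```

Under these assumptions the "always latest" pick matches the pip/Pipenv result on slide 27, and the knowledge-base-aware pick matches the stack on slide 33.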
  34. Thoth Station Stack generation pipeline Remove pre-releases Construct dependency graph

    Remove install errors Remove run errors Adjust based on performance Adjust based on security Sort based on semver Resolved stacks stream Performance based scoring Security based scoring Lockfile generation Final score gating Runtime environment Requirements Analysis of application Lock file Justification
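A pipeline of this shape can be sketched as a chain of generator steps over a stream of candidate stacks. This is a simplified illustration, not Thoth's implementation: the step functions, the pre-release pattern, the sample stacks, the scores, and the gating threshold are all assumptions, and only a few of the slide's stages are modelled.

```python
import re
from typing import Callable, Dict, Iterable, Iterator, List, Set, Tuple

Stack = Dict[str, str]
Step = Callable[[Iterable[Stack]], Iterator[Stack]]

# Simplified pre-release detection (assumption): a trailing a/b/rc/dev marker.
PRE_RELEASE = re.compile(r"(a|b|rc|dev)\d*$")


def remove_pre_releases(stacks: Iterable[Stack]) -> Iterator[Stack]:
    """Drop stacks that pin any pre-release version."""
    for stack in stacks:
        if not any(PRE_RELEASE.search(v) for v in stack.values()):
            yield stack


def remove_known_errors(known_bad: Set[Tuple[str, str]]) -> Step:
    """Build a step that drops stacks containing a pin with observed errors."""
    def step(stacks: Iterable[Stack]) -> Iterator[Stack]:
        for stack in stacks:
            if not (set(stack.items()) & known_bad):
                yield stack
    return step


def run_pipeline(stacks: Iterable[Stack], steps: List[Step]) -> Iterable[Stack]:
    """Feed the stream of stacks through each step in order."""
    for step in steps:
        stacks = step(stacks)
    return stacks


def score(stack: Stack, perf_scores: Dict[Tuple[str, str], float]) -> float:
    """Sum per-pin performance scores; unknown pins contribute nothing."""
    return sum(perf_scores.get(pin, 0.0) for pin in stack.items())


# Hypothetical inputs.
candidates = [
    {"simplelib": "2.0.0rc1", "dependency2": "1.0.0"},
    {"simplelib": "1.0.0", "dependency2": "2.0.0"},
    {"simplelib": "1.0.0", "dependency2": "1.0.0"},
    {"simplelib": "2.0.0", "dependency2": "1.0.0"},
]
known_bad = {("dependency2", "2.0.0")}
perf_scores = {("simplelib", "1.0.0"): 0.9, ("simplelib", "2.0.0"): 0.4}

survivors = list(run_pipeline(
    candidates, [remove_pre_releases, remove_known_errors(known_bad)]
))
ranked = sorted(survivors, key=lambda s: score(s, perf_scores), reverse=True)
gated = [s for s in ranked if score(s, perf_scores) >= 0.5]  # final score gating
print(gated)
```

Expressing each stage as a generator keeps the stream lazy, so later stages only see the stacks that earlier filters let through.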
  35. Thoth Station Extending information about Python packages

  36. Thoth Station • License • Classifiers ◦ Programming Language ::

    Python :: 3.6 ◦ Programming Language :: Python :: Implementation :: CPython • Package purpose ◦ machine learning library ◦ plugin ◦ … • Is the given package affecting performance? Python package metadata
  37. Thoth Station • A vector space model • Each vector

    in vector space corresponds to a project • Each item in a vector represents a feature • Allows feature-based queries and similar-project search F = {python, machine-learning, web, django-framework, webassembly, sql, spark, gpu-support, Java} F_tf = {1, 1, 0, 0, 0, 0, 0, 1, 0} F - feature vector F_tf - feature vector for project TensorFlow project2vec
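The vector space idea can be illustrated with the feature list and the TensorFlow vector from the slide. The two other projects and the choice of cosine similarity are assumptions for this sketch; the slide does not say which similarity measure project2vec uses.

```python
import math
from typing import Dict, List

# Feature order and the TensorFlow vector are taken from the slide.
FEATURES = ["python", "machine-learning", "web", "django-framework",
            "webassembly", "sql", "spark", "gpu-support", "Java"]

PROJECTS: Dict[str, List[int]] = {
    "tensorflow": [1, 1, 0, 0, 0, 0, 0, 1, 0],
    # Hypothetical projects, added only to have something to compare against.
    "hypothetical-ml-project": [1, 1, 0, 0, 0, 0, 0, 1, 0],
    "hypothetical-web-project": [1, 0, 1, 0, 0, 0, 0, 0, 0],
}


def cosine(a: List[int], b: List[int]) -> float:
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def projects_with(feature: str) -> List[str]:
    """Feature-based query: all projects that have the given feature set."""
    index = FEATURES.index(feature)
    return sorted(name for name, vec in PROJECTS.items() if vec[index])


def most_similar(name: str) -> List[str]:
    """Rank the other projects by similarity to the named one."""
    target = PROJECTS[name]
    others = [other for other in PROJECTS if other != name]
    return sorted(others, key=lambda o: cosine(target, PROJECTS[o]), reverse=True)


print(projects_with("gpu-support"))
print(most_similar("tensorflow"))
```

With binary features, cosine similarity rewards shared features and ignores features that both projects lack, which suits a sparse vocabulary like the one on the slide.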
  38. Thoth Station Image address project2vec

  39. Thoth Station Information about Thoth • Website: ◦ https://thoth-station.ninja/ •

    Twitter ◦ https://twitter.com/thothstation ▪ Follow for updates on public availability • GitHub ◦ https://github.com/thoth-station
  40. Thoth Station

  41. Thoth Station • Community sitting at https://github.com/thoth-station/ • Bot Kebechet

    ◦ https://thoth-station.ninja/kebechet/ • Twitter ◦ https://twitter.com/thothstation $ pip3 install thamos $ cd ~/repositories/my-repo/ $ thamos config $ thamos advise It’s all on you
  42. THANK YOU plus.google.com/+RedHat linkedin.com/company/red-hat youtube.com/user/RedHatVideos facebook.com/redhatinc twitter.com/RedHat