Fridolín Pokorný - Thoth - how to recommend the best possible libraries for your application

Locking the libraries in your Python project to specific versions is a well-known best practice. Dependency management tools in the Python ecosystem lock dependencies to the latest available version, but what if the latest version is not the best fit for your application? The open source project Thoth is an advanced Python dependency resolver that recommends libraries for your project based on observations gathered about Python libraries in specific runtime environments. What do these recommendations look like? How are observations, such as the performance characteristics of machine learning libraries on particular hardware, gathered?

https://us.pycon.org/2019/schedule/presentation/185/

PyCon 2019

May 04, 2019

Transcript

  1. Thoth
    How to recommend the best possible packages for
    your application
    Fridolin Pokorny
    2019-May-4

  2. Thoth Station
    ● Fridolín “fridex” Pokorný
    ● Senior Software Engineer at Red Hat
    ● Distributed systems, AI/ML and (of course) Python fan
    ● Projects:
    ○ Reverse engineer RetDec (AVG)
    ○ Linux kernel TLS/DTLS module AF_KTLS
    ○ Selinon - distributed task flows scheduler on top of Celery
    ○ Project Thoth
    $ whoami
    https://fridex.github.io

  3. Thoth Station
    ● Project Thoth - https://github.com/thoth-station
    ● Red Hat - Office of the CTO
    ○ Emerging technologies
    ○ AI team - https://github.com/aicoe
    ● Initially 2 engineers, now growing
    ○ Christoph Görn
    ○ Francesco Murdaca
    ○ Fridolín Pokorný
    ○ Harshad Reddy Nalla
    ○ Marek Cermak
    ○ Subin Modeel
    $ whoarewe # project Thoth
    https://thoth-station.ninja/

  4. Thoth Station
    What is Thoth?
    Why Thoth?

  5. Thoth Station
    Why Thoth?
    ● PyPI - Python Package Index
    ○ https://pypi.org/
    ○ 178,016 projects
    ○ 1,303,926 releases (approx. 7 releases per project)

  6. Thoth Station
    Why Thoth?
    import tensorflow as tf
    from flask import Flask
    application = Flask(__name__)

  7. Thoth Station
    Why Thoth?
    import tensorflow as tf
    from flask import Flask
    application = Flask(__name__)

  8. Thoth Station
    $ pip3 install --user tensorflow
    $ pip3 install --user flask
    $ python3 ./app.py
    Error: tensorflow 1.10.1 has requirement
    numpy<=1.14.5,>=1.13.3, but you'll have numpy 1.15.1 which
    is incompatible.
    $
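
The conflict reported above comes down to an installed version falling outside a declared requirement range. A minimal sketch of that check follows, assuming plain dotted numeric versions only; it does not implement the full PEP 440 rules that real installers follow, and the helper names `parse` and `satisfies` are made up for this illustration.

```python
import operator

# Comparison operators supported by this simplified check.
OPERATORS = {
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
}


def parse(version):
    """Turn '1.14.5' into (1, 14, 5) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, requirement):
    """Return True if version meets every comma-separated clause."""
    for clause in requirement.split(","):
        clause = clause.strip()
        # Two-character operators are tried first so "<=" is not read as "<".
        for symbol in ("<=", ">=", "==", "<", ">"):
            if clause.startswith(symbol):
                bound = parse(clause[len(symbol):])
                if not OPERATORS[symbol](parse(version), bound):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


# The requirement and versions quoted in the error message on the slide.
numpy_requirement = "<=1.14.5,>=1.13.3"
print(satisfies("1.15.1", numpy_requirement))
print(satisfies("1.14.5", numpy_requirement))
```

Here numpy 1.15.1 fails the upper bound, which is the incompatibility the error message describes.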

  9. Thoth Station
    Why Thoth?
    import tensorflow as tf
    from flask import Flask
    application = Flask(__name__)
    tensorflow: 59 releases, flask: 28 releases

  10. Thoth Station
    Why Thoth?
    import tensorflow as tf
    from flask import Flask
    application = Flask(__name__)
    tensorflow: 59 releases, flask: 28 releases
    All possible combinations of the directly used libraries:
    59 * 28 = 1,652
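
The count above is the size of a Cartesian product: one release of each library per combination. A small sketch that enumerates the pairs; the release labels are placeholders, and only the counts (59 and 28) come from the slide.

```python
from itertools import product

# Placeholder release labels; only the counts are taken from the slide.
tensorflow_releases = [f"tensorflow-r{i}" for i in range(1, 60)]
flask_releases = [f"flask-r{i}" for i in range(1, 29)]

# Every way to pair one TensorFlow release with one Flask release.
combinations = list(product(tensorflow_releases, flask_releases))
print(len(combinations))
```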

  11. Thoth Station
    Transitive dependencies
    ● Flask
    ○ click, itsdangerous, jinja2, markupsafe, werkzeug
    Estimated number of combinations: 54,395,000

  12. Thoth Station
    Transitive dependencies
    ● TensorFlow
    ○ absl-py, astor, backports-weakref, bleach, enum34, gast, google-pasta, grpcio,
    h5py, html5lib, keras, keras-applications, keras-preprocessing, markdown, mock,
    numpy, pbr, protobuf, pyyaml, scipy, setuptools, six, tensorboard,
    tensorflow-estimator, tensorflow-tensorboard, termcolor, tf-estimator-nightly,
    werkzeug, wheel
    Estimated number of combinations: 139,740,802,927,165,440,000
    approx. 1.39*10^20

  13. Thoth Station
    Why Thoth?
    import tensorflow as tf
    from flask import Flask
    application = Flask(__name__)
    tensorflow: 1.39*10^20 combinations
    flask: 54,395,000 combinations
    All possible combinations of the application stack, counting libraries
    used directly and indirectly (estimate):
    1.39*10^20 * 54,395,000 = 7.6*10^27
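
The final figure can be checked by multiplying the two estimates quoted on the earlier slides. A short sketch, with variable names chosen for this illustration:

```python
# Estimates quoted on the slides for each library's dependency tree.
tensorflow_stack_combinations = 139_740_802_927_165_440_000
flask_stack_combinations = 54_395_000

# Python integers have arbitrary precision, so this product is exact.
total = tensorflow_stack_combinations * flask_stack_combinations

# Scientific notation with one decimal, for comparison with the slide.
print(f"{total:.1e}")
```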

  14. Thoth Station
    Why Thoth?
    import pandas as pd
    import tensorflow as tf
    from flask import Flask
    application = Flask(__name__)
    Operating System
    Fedora 30
    Fedora 29
    ...
    CentOS 7.6
    CentOS 7.5

    Python interpreter

  15. Thoth Station
    Why Thoth?
    import pandas as pd
    import tensorflow as tf
    from flask import Flask
    application = Flask(__name__)
    Operating System
    Python interpreter
    glibc cuda

  16. Thoth Station
    Hardware
    Why Thoth?
    import pandas as pd
    import tensorflow as tf
    from flask import Flask
    application = Flask(__name__)
    Operating System
    Python interpreter
    glibc cuda
    GPU CPU

  17. Thoth Station
    Why Thoth?
    Python application

  18. Thoth Station
    Hardware
    Why Thoth?
    Python application
    Operating System
    Python interpreter
    Native dependencies
    Kernel modules
    Direct Python dependencies
    Transitive Python dependencies

  19. Thoth Station
    Why Thoth?
    ● Create knowledge base
    ○ What packages in which versions should I use?
    ■ Application builds correctly
    ■ Application runs correctly
    ■ Application behaves and performs well
    ● Create an advanced Python resolver which uses the knowledge base to
    resolve software stacks
    The latest versions are not always the greatest choice.

  20. Thoth Station
    Building Thoth’s knowledge base

  21. Thoth Station
    Gathering data for Thoth’s knowledge base
    ● Resolving software stacks
    ○ own resolution algorithm
    ● Analyses of container images
    ○ JupyterHub images
    ○ Thoth’s container images
    ● Amun, Dependency Monkey
    ○ running CI and perf related tests
    ○ performance related analyses
    ● ...

  22. Thoth Station
    Optimized TensorFlow builds by Thoth team
    ● Automated tests of libraries
    ● Tests targeting performance
    ● Optimized TensorFlow builds
    https://tensorflow.pypi.thoth-station.ninja/

  23. Thoth Station
    Recommendations

  24. Thoth Station
    How good is my software stack?
    simplelib
    anotherlib

  25. Thoth Station

  26. Thoth Station
    v1 v2
    simplelib
    v1 v2
    anotherlib
    v1 v2
    dependency1
    v1
    dependency2
    v2

  27. Thoth Station
    v1 v2
    simplelib
    v1 v2
    anotherlib
    v1 v2
    dependency1
    v1
    dependency2
    v2
    pip/Pipenv (always latest):
    simplelib ==v2
    anotherlib ==v2
    dependency2 ==v2
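
The "always latest" behaviour can be sketched as a walk over the dependency graph that picks the newest version of every package it meets. The edges between packages are not recoverable from the slide, so the `REQUIRES` table below is a hypothetical graph chosen to reproduce the result shown; `resolve_latest` is an illustrative name, not how pip or Pipenv is implemented.

```python
from collections import deque

# Available versions per package (v1 and v2, as on the slide).
VERSIONS = {
    "simplelib": [1, 2],
    "anotherlib": [1, 2],
    "dependency1": [1, 2],
    "dependency2": [1, 2],
}

# Hypothetical edges: (package, version) -> packages it requires.
REQUIRES = {
    ("simplelib", 1): ["dependency1"],
    ("simplelib", 2): [],
    ("anotherlib", 1): ["dependency2"],
    ("anotherlib", 2): ["dependency2"],
}


def resolve_latest(direct):
    """Pin every reachable package to its newest available version."""
    stack = {}
    queue = deque(direct)
    while queue:
        name = queue.popleft()
        if name in stack:
            continue
        version = max(VERSIONS[name])
        stack[name] = version
        # Follow the requirements of the version that was just pinned.
        queue.extend(REQUIRES.get((name, version), []))
    return stack


print(resolve_latest(["simplelib", "anotherlib"]))
```

Under these assumed edges, dependency1 never enters the stack because only simplelib v1 requires it.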

  28. Thoth Station
    v1 v2
    simplelib
    v1 v2
    anotherlib
    v1 v2
    dependency1
    v1
    dependency2
    v2

  29. Thoth Station
    v1 v2
    simplelib
    v1 v2
    anotherlib
    v1 v2
    dependency1
    v1
    dependency2
    v2
    Causes errors based on
    Thoth’s knowledge base.

  30. Thoth Station
    v1 v2
    simplelib
    v1 v2
    anotherlib
    v1 v2
    dependency1
    v1
    dependency2

  31. Thoth Station
    v1 v2
    simplelib
    v1 v2
    anotherlib
    v1 v2
    dependency1
    v1
    dependency2
    Simplelib in version v1
    performs better together
    with dependency1 in
    version v1 based on
    Thoth’s knowledge base.

  32. Thoth Station
    v1
    v2
    simplelib
    v1 v2
    anotherlib
    v1
    v2
    dependency1
    v1
    dependency2

  33. Thoth Station
    v1
    v2
    simplelib
    v1 v2
    anotherlib
    v1
    v2
    dependency1
    v1
    dependency2
    Thoth (always greatest):
    simplelib ==v1
    anotherlib ==v2
    dependency1 ==v1
    dependency2 ==v1
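
A knowledge-base-driven choice can be sketched as: enumerate candidate stacks, drop those containing a package version known to cause errors, and prefer stacks with a known performance benefit. This is a toy illustration, not Thoth's actual algorithm. The dependency edges, the scoring values, and all function names are assumptions; only the two observations (dependency2 v2 causes errors, simplelib v1 performs better with dependency1 v1) come from the slides.

```python
from itertools import product

VERSIONS = {
    "simplelib": [1, 2],
    "anotherlib": [1, 2],
    "dependency1": [1, 2],
    "dependency2": [1, 2],
}

# Hypothetical edges: (package, version) -> packages it requires.
REQUIRES = {
    ("simplelib", 1): ["dependency1"],
    ("simplelib", 2): [],
    ("anotherlib", 1): ["dependency2"],
    ("anotherlib", 2): ["dependency2"],
}

# Observations from the slides, encoded as a tiny knowledge base.
KNOWN_ERRORS = {("dependency2", 2)}
PERFORMANCE_BONUS = {frozenset({("simplelib", 1), ("dependency1", 1)}): 1.0}


def candidate_stacks(direct):
    """Yield every stack for the direct packages (one dependency level only)."""
    for combo in product(*(VERSIONS[name] for name in direct)):
        stack = dict(zip(direct, combo))
        needed = sorted(
            {dep for item in stack.items() for dep in REQUIRES.get(item, [])}
        )
        for dep_combo in product(*(VERSIONS[name] for name in needed)):
            yield {**stack, **dict(zip(needed, dep_combo))}


def score(stack):
    """Return None for stacks with known errors, else the summed bonuses."""
    items = set(stack.items())
    if items & KNOWN_ERRORS:
        return None
    return sum(bonus for combo, bonus in PERFORMANCE_BONUS.items() if combo <= items)


def recommend(direct):
    """Pick the best-scoring stack; ties go to the higher sum of versions."""
    scored = [(score(stack), stack) for stack in candidate_stacks(direct)]
    valid = [(points, stack) for points, stack in scored if points is not None]
    _, best = max(valid, key=lambda pair: (pair[0], sum(pair[1].values())))
    return best


print(recommend(["simplelib", "anotherlib"]))
```

The tie-break on the sum of version numbers is a crude stand-in for "prefer newer releases when scores are equal".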

  34. Thoth Station
    Stack generation pipeline
    ● Remove pre-releases
    ● Construct dependency graph
    ● Remove install errors
    ● Remove run errors
    ● Adjust based on performance
    ● Adjust based on security
    ● Sort based on semver
    ● Resolved stacks stream
    ● Performance based scoring
    ● Security based scoring
    ● Lockfile generation
    ● Final score gating
    ● Runtime environment
    ● Requirements
    ● Analysis of application
    ● Lock file
    ● Justification
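
A few of the pipeline steps named above can be sketched as small functions applied in sequence to the candidate versions of a single package. This is a simplified illustration, not Thoth's implementation: the candidate versions, the error observations, the scores, and the threshold are all hypothetical, and the graph construction, security, and lock file steps are left out.

```python
import re

# Hypothetical inputs for one package.
CANDIDATES = ["1.0.0", "1.1.0", "2.0.0rc1", "1.2.0"]
INSTALL_ERRORS = {"1.2.0"}                       # assumed observation
PERFORMANCE_SCORE = {"1.0.0": 0.4, "1.1.0": 0.9}  # assumed observation
SCORE_THRESHOLD = 0.5                             # assumed gate


def remove_pre_releases(versions):
    """Keep only versions made of dot-separated integers."""
    return [v for v in versions if re.fullmatch(r"\d+(\.\d+)*", v)]


def remove_install_errors(versions):
    """Drop versions observed to fail during installation."""
    return [v for v in versions if v not in INSTALL_ERRORS]


def sort_by_semver(versions):
    """Order versions from newest to oldest by their numeric parts."""
    return sorted(
        versions,
        key=lambda v: tuple(int(part) for part in v.split(".")),
        reverse=True,
    )


def run_pipeline(versions):
    """Apply the steps in order, then score and gate the survivors."""
    for step in (remove_pre_releases, remove_install_errors, sort_by_semver):
        versions = step(versions)
    scored = [(v, PERFORMANCE_SCORE.get(v, 0.0)) for v in versions]
    return [(v, s) for v, s in scored if s >= SCORE_THRESHOLD]


print(run_pipeline(CANDIDATES))
```

Keeping each step as its own function means steps can be reordered or added without touching the others.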

  35. Thoth Station
    Extending information about Python packages

  36. Thoth Station
    ● License
    ● Classifiers
    ○ Programming Language :: Python :: 3.6
    ○ Programming Language :: Python :: Implementation :: CPython
    ● Package purpose
    ○ machine learning library
    ○ plugin
    ○ …
    ● Is the given package affecting performance?
    Python package metadata
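
Fields such as the license and classifiers are stored in a package's core metadata, which uses an email-header-like format. The sketch below parses such a document with the standard library; the package `examplelib` and its field values are hypothetical, apart from the two classifiers quoted on the slide.

```python
from email import message_from_string

# Hypothetical core metadata for an imaginary package.
METADATA = """\
Metadata-Version: 2.1
Name: examplelib
Version: 1.0.0
License: MIT
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: Implementation :: CPython
"""

message = message_from_string(METADATA)

# Single-valued fields are read like dictionary keys.
license_name = message["License"]

# Repeated fields such as Classifier are collected with get_all().
classifiers = message.get_all("Classifier")

# Example query: does the package declare CPython support?
supports_cpython = any(c.endswith(":: CPython") for c in classifiers)

print(license_name, classifiers, supports_cpython)
```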

  37. Thoth Station
    ● A vector space model
    ● Each vector in vector space corresponds to a project
    ● Each item in vector represents a feature
    ● Allows feature based queries and similar projects search
    F = {python, machine-learning, web, django-framework, webassembly, sql, spark, gpu-support, Java}
    F_tf = {1, 1, 0, 0, 0, 0, 0, 1, 0}
    F - feature vector
    F_tf - feature vector for project TensorFlow
    project2vec
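
The feature-based queries and similar-project search mentioned above can be sketched with binary feature vectors and cosine similarity. The TensorFlow vector is the one from the slide; `project_a` and `project_b` are hypothetical projects, and cosine similarity is one common choice of measure, assumed here rather than taken from the talk.

```python
import math

# Feature order as listed on the slide.
FEATURES = [
    "python", "machine-learning", "web", "django-framework", "webassembly",
    "sql", "spark", "gpu-support", "Java",
]

PROJECTS = {
    "tensorflow": [1, 1, 0, 0, 0, 0, 0, 1, 0],  # vector from the slide
    "project_a": [1, 1, 0, 0, 0, 0, 0, 0, 0],   # hypothetical
    "project_b": [1, 0, 1, 1, 0, 1, 0, 0, 0],   # hypothetical
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def with_feature(feature):
    """Feature-based query: names of projects that have the feature set."""
    index = FEATURES.index(feature)
    return sorted(name for name, vec in PROJECTS.items() if vec[index])


def most_similar(name):
    """Similar-project search: the other project with the highest cosine."""
    others = (other for other in PROJECTS if other != name)
    return max(others, key=lambda other: cosine(PROJECTS[name], PROJECTS[other]))


print(with_feature("gpu-support"))
print(most_similar("tensorflow"))
```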

  38. Thoth Station
    Image address
    project2vec

  39. Thoth Station
    Information about Thoth
    ● Website:
    ○ https://thoth-station.ninja/
    ● Twitter
    ○ https://twitter.com/thothstation
    ■ Follow for updates on public availability
    ● GitHub
    ○ https://github.com/thoth-station

  40. Thoth Station

  41. Thoth Station
    ● Community sitting at https://github.com/thoth-station/
    ● Bot Kebechet
    ○ https://thoth-station.ninja/kebechet/
    ● Twitter
    ○ https://twitter.com/thothstation
    $ pip3 install thamos
    $ cd ~/repositories/my-repo/
    $ thamos config
    $ thamos advise
    It’s all on you

  42. THANK YOU
    plus.google.com/+RedHat
    linkedin.com/company/red-hat
    youtube.com/user/RedHatVideos
    facebook.com/redhatinc
    twitter.com/RedHat
