Barry Warsaw - Get your resources faster, with importlib.resources

Resources are files that live within Python packages. Think test data files, certificates, templates, translation catalogs, and other static files you want to access from Python code. Sometimes you put these static files in a package directory within your source tree, and then locate them by importing the package and using its `__file__` attribute. But this doesn't work when the package is imported from a zip file, because the resulting path points inside the archive rather than at a real file on disk.
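
A minimal sketch of that failure, assuming a throwaway package called `naive_demo_pkg` (a hypothetical name, built on the fly in a temporary directory) that is imported from a zip file. The exact exception type can differ between platforms, so the sketch only records that opening the path fails with some `OSError`:

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path

# Hypothetical package name, used only for this illustration.
PKG = "naive_demo_pkg"

error = None
with tempfile.TemporaryDirectory() as tmp:
    # Build a zip file containing a package with one data file.
    zip_path = Path(tmp) / "naive_demo.zip"
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr(f"{PKG}/__init__.py", "")
        zf.writestr(f"{PKG}/sample.dat", b"hello")

    sys.path.insert(0, str(zip_path))
    importlib.invalidate_caches()
    try:
        module = importlib.import_module(PKG)
        # The usual trick: look for the data file next to the package's __file__.
        data_path = Path(module.__file__).parent / "sample.dat"
        try:
            with open(data_path, "rb") as fp:
                fp.read()
        except OSError as exc:
            # The path points inside the archive, so the OS cannot open it.
            error = exc
    finally:
        sys.path.remove(str(zip_path))

print(type(error).__name__)
```

The import itself succeeds; only the attempt to treat `__file__` as a location on the file system breaks.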

You could use `pkg_resources`, an API that comes with `setuptools` and hides the differences between files on the file system and files in a zip file. This is great because you don't have to use `__file__`, but it's not so great because `pkg_resources` is a big library and can have potentially severe performance problems, even at import time.

Welcome to `importlib.resources`, a new module and API in Python 3.7 that is also available as a standalone library for older versions of Python. `importlib.resources` is built on top of Python's existing import system, so it is very efficient. It also defines an abstract base class which loaders can implement to provide their own resource access. Python's built-in zipimporter uses this to provide efficient access to resources within a zip file. Third-party import hooks can do the same, so resources can come from anything that is importable by Python.
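
As a rough sketch of what this looks like in practice, the snippet below reads a resource out of a zip file with `read_binary`, the function shown in the talk. The package name `resources_demo_pkg` and the zip layout are assumptions made up for the example. Some newer Python releases emit a `DeprecationWarning` for this function, so the sketch silences that warning:

```python
import importlib
import sys
import tempfile
import warnings
import zipfile
from importlib.resources import read_binary
from pathlib import Path

# Hypothetical payload and package name, used only for this illustration.
PAYLOAD = b"sample bytes"
PKG = "resources_demo_pkg"

with tempfile.TemporaryDirectory() as tmp:
    # Build a zip file holding a package with a `data` subpackage.
    zip_path = Path(tmp) / "resources_demo.zip"
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr(f"{PKG}/__init__.py", "")
        zf.writestr(f"{PKG}/data/__init__.py", "")
        zf.writestr(f"{PKG}/data/sample.dat", PAYLOAD)

    sys.path.insert(0, str(zip_path))
    importlib.invalidate_caches()
    try:
        with warnings.catch_warnings():
            # Some newer Python versions flag this function as deprecated.
            warnings.simplefilter("ignore", DeprecationWarning)
            # The package is named by its import path; no __file__ involved.
            contents = read_binary(f"{PKG}.data", "sample.dat")
    finally:
        sys.path.remove(str(zip_path))

print(contents)
```

The same call works unchanged if the package lives in an ordinary directory, which is the point of going through the import system.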

This talk will step through the motivations behind `importlib.resources`, the library's usage, its interfaces, and the hooks made available to third-party packages. It will also cover the minor differences between the standalone version and the version in Python 3.7's standard library. Hopefully audience members will come away with compelling reasons to port their code to this much more efficient library.

https://us.pycon.org/2018/schedule/presentation/162/

PyCon 2018

May 11, 2018

Transcript

  1. importlib.resources
     If you can import it, you can read it*
     PyCon 2018, Cleveland, Ohio, May 2018
     Barry Warsaw, Python Foundation @ LinkedIn
  2. Types of static files
     - Templates
     - Sample data
     - Certificates
     - gettext translation catalogs
  3. Naive approach

         import thepkg
         from pathlib import Path

         pkg = Path(thepkg.__file__).parent
         path = pkg / 'data' / 'sample.dat'
         with open(path, 'rb') as fp:
             contents = fp.read()

  4. Zip files and zipapps

         pkg = Path(thepkg.__file__).parent
         path = pkg / 'data' / 'sample.dat'
         with open(path, 'rb') as fp:
             contents = fp.read()

         Traceback (most recent call last):
           File "run.py", line 7, in <module>
             with open(path, 'rb') as fp:
         NotADirectoryError: [Errno 20] Not a directory: '.../thepkg.zip/thepkg/data/sample.dat'

  5. pkg_resources
     Basic Resource Access

         from pkg_resources import \
             resource_string as resource_bytes

         contents = resource_bytes(
             'thepkg', 'data/sample.dat')

     Works for both file system paths and zip file paths
  6. pkg_resources
     - has import-time side-effects
     - is slow
     - tries to do too much
     - has funky APIs
     - is everywhere
     - still supports Python 2
  7. We can do better!
     Because we have Python's import machinery to help us
  8. importlib.resources

         from importlib.resources import read_binary

         contents = read_binary(
             'thepkg.data', 'sample.dat')

         import thepkg.data

         contents = read_binary(
             thepkg.data, 'sample.dat')

  9. Terminology
     Access a "resource" in a "package"
     - Q: What's a "package"?
       A: Any importable module with a __path__ attribute
       E.g. a directory containing an __init__.py
       Namespace packages cannot contain resources
     - Q: What's a "resource"?
       A: Any readable object contained in a package
       E.g. a file inside a package
       Subdirectories/subpackages are not resources!
  10. Packages and resources

          thepkg/
              __init__.py
              a.py
              b.py
              data/
                  __init__.py
                  sample.dat

      Package: thepkg.data
  11. importlib.resources API
      Get the contents of a resource

          read_binary(
              package: Package,
              resource: Resource) -> bytes

          read_text(
              package: Package,
              resource: Resource,
              encoding: str = 'utf-8',
              errors: str = 'strict') -> str

  12. importlib.resources API
      Get a file-like object open for reading

          open_text(
              package: Package,
              resource: Resource,
              encoding: str = 'utf-8',
              errors: str = 'strict') -> TextIO

          open_binary(
              package: Package,
              resource: Resource) -> BinaryIO

  13. importlib.resources API
      Get a concrete file system path

          with path(
                  thepkg,
                  'foo.cpython-37m-darwin.so'
                  ) as lib:
              import_shared_library(lib)

          path(
              package: Package,
              resource: Resource) -> Iterator[Path]

  14. importlib.resources API
      List what's in a package *

          contents(
              package: Package) -> Iterable[str]

          >>> print(sorted(contents(
                  'thepkg.data')))
          ['__init__.py', '__pycache__', 'sample.dat']

      * Items are not guaranteed to be resources!
  15. importlib.resources API
      Is a thing a resource? *

          is_resource(
              package: Package,
              name: str) -> bool

      * Use this with contents() to iterate over resources in a package
  16. API for loaders
      Low level API for custom loaders
      Built-in support for file system and zips

          loader.get_resource_reader(
              str: package_name
              ) -> importlib.abc.ResourceReader

  17. importlib.abc.ResourceReader

          open_resource(str: resource) -> BytesIO
          resource_path(str: resource) -> str
          is_resource(str: name) -> bool
          contents() -> Iterable[str]

      - FileNotFoundError raised when resource doesn't exist
      - resource_path() requires a concrete file system path
      - contents() can return non-resources
  18. Performance
      - CLIs start up 25-50% faster
      - importlib.resources
      - shiv (new open source replacement for pex)
        http://shiv.readthedocs.io/en/latest/
  19. Give it up for Brett Cannon
      First of hopefully many great collaborations between the LinkedIn and Microsoft Python teams
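
Slides 14 and 15 suggest pairing `contents()` with `is_resource()` to iterate over only the resources in a package. A minimal sketch of that pattern follows, assuming a throwaway file system package named `listing_demo_pkg` (a hypothetical name) with one data file and one subdirectory. Some newer Python releases emit a `DeprecationWarning` for these functions, so the sketch silences it:

```python
import importlib
import sys
import tempfile
import warnings
from importlib.resources import contents, is_resource
from pathlib import Path

# Hypothetical package name, used only for this illustration.
PKG = "listing_demo_pkg"

with tempfile.TemporaryDirectory() as tmp:
    # Lay out a package with one data file and one plain subdirectory.
    pkg_dir = Path(tmp) / PKG
    (pkg_dir / "nested").mkdir(parents=True)
    (pkg_dir / "__init__.py").write_text("")
    (pkg_dir / "sample.dat").write_bytes(b"data")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        with warnings.catch_warnings():
            # Some newer Python versions flag these functions as deprecated.
            warnings.simplefilter("ignore", DeprecationWarning)
            # contents() lists every entry, including directories.
            names = sorted(contents(PKG))
            # is_resource() keeps only the entries that are readable files.
            resources = [name for name in names if is_resource(PKG, name)]
    finally:
        sys.path.remove(tmp)

print(names)
print(resources)
```

The `nested` directory shows up in `names` but is filtered out of `resources`, matching the slide's note that subdirectories are not resources.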