Libraries, Shared Libraries, and Dynamic Loading

Course materials from Scientific Computing with Python, 2002.

David Beazley

January 21, 2002

Transcript

  1. Libraries, Shared Libraries, and Dynamic Loading, January 21, 2002 1

    Libraries, Shared Libraries, and Dynamic Loading
    David M. Beazley
    Department of Computer Science
    University of Chicago
    [email protected]
    January 21, 2002
    PY-MA
  2. Libraries, Shared Libraries, and Dynamic Loading, January 21, 2002 2

    Libraries?

    Scripted software is all about libraries
    • Applications are not monolithic executables
    • Organized as collection of modules --> Extension modules
    • Extension modules are shared libraries (DLLs)

    Libraries are problematic
    • Especially if you don’t know what you’re doing

    Most programmers do not understand libraries
    • At least in my general experience.
    • Even in CS
    • And the confusion might be worse after I’m done.

    This tutorial:
    • A crash course in programming libraries
  3. Libraries, Shared Libraries, and Dynamic Loading, January 21, 2002 3

    Modular Software

    Libraries are an essential part of modular software
    • The C library
    • The C++ library
    • Lapack
    • OpenGL
    • Vtk
    • MPI
    • Numerous others

    Most scientific programs can be organized as libraries
    • Remove main() and the user-interface
    • Repackage everything that remains as a library
    • Might split into multiple libraries (integration, I/O, data analysis, etc.)
    • Libraries often used to organize large monolithic applications
  4. Libraries, Shared Libraries, and Dynamic Loading, January 21, 2002 4

    Static Libraries

    Static libraries are collections of object files
        $ cc -c foo.c bar.c
        $ ls *.o
        foo.o bar.o
        $ ar cr libfoo.a foo.o bar.o
    • Create using ’ar’ command (create an archive)

    Using the library in other programs
        $ cc $(SRCS) -lfoo

    Typical code organization with libraries
        /src/                      /lib/
          /Analysis        --->      libanalysis.a
          /Visualization   --->      libvis.a
          /Integrators     --->      libintegrate.a
          /Grids           --->      libgrid.a

        $ cc main.c -lanalysis -lvis -lintegrate -lgrid -o myprog
  5. Libraries, Shared Libraries, and Dynamic Loading, January 21, 2002 5

    Static Linking

    Linking
    • Linker combines object files and binds symbol names to memory addresses
    • That’s all it does

    Object files
    • Broken into different segments
    • Linker merely combines the sections and resolves symbols to make an executable

        .text = Executable instructions
        .data = Initialized global variables
        .bss  = Uninitialized global variables

    [Diagram: foo.o and bar.o each contain .text, .data, and .bss sections; the linker merges them into the .text, .data, and .bss of a.out, and an unresolved reference (bar() = ???) in one object file is bound to the definition bar() { ... } in the other]
  6. Libraries, Shared Libraries, and Dynamic Loading, January 21, 2002 6

    Static Linking

    Libraries also used to resolve unknown symbols
    • Code is copied from library into target program
    • However, only the used symbols are copied (not the entire library)

    Comment
    • The ordering of libraries matters - symbols resolved in order specified

    [Diagram: $(OBJS) has a .text section with cos = ???? and a .data section; libm.a contains cos.o, which defines cos(); after ld $(OBJS) -lm, a.out holds the program's .text and .data plus a copy of cos.o]
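
The two points on this slide (only the needed members are copied, and library order matters) can be illustrated with a toy model. This is a minimal sketch, not how any real linker is implemented: the `link` function, the dictionary layout, and the archive contents (main.o, integrate.o, sin.o) are all hypothetical, and it assumes each archive is scanned once, left to right.

```python
def link(objects, archives):
    """Toy model of static linking.

    objects  : object files that are always linked in
    archives : archives, each scanned once, in the order given
    An object file is a dict {"name": str, "defines": set, "needs": set};
    an archive is a list of object files.
    Returns (names of linked members, symbols still undefined).
    """
    linked, defined, undefined = [], set(), set()

    def add(obj):
        linked.append(obj["name"])
        defined.update(obj["defines"])
        undefined.update(obj["needs"])
        undefined.difference_update(defined)

    for obj in objects:
        add(obj)

    for archive in archives:
        progress = True
        while progress:  # a member may need another member of the same archive
            progress = False
            for member in archive:
                if member["name"] not in linked and member["defines"] & undefined:
                    add(member)  # copy only members that resolve something
                    progress = True
    return linked, undefined


# Hypothetical inputs: main.o calls integrate(), which in turn calls cos().
main_o = {"name": "main.o", "defines": {"main"}, "needs": {"integrate"}}
libintegrate = [{"name": "integrate.o", "defines": {"integrate"}, "needs": {"cos"}}]
libm = [
    {"name": "cos.o", "defines": {"cos"}, "needs": set()},
    {"name": "sin.o", "defines": {"sin"}, "needs": set()},
]

linked_ok, missing_ok = link([main_o], [libintegrate, libm])    # -lintegrate -lm
linked_bad, missing_bad = link([main_o], [libm, libintegrate])  # -lm -lintegrate
```

In the first ordering sin.o is never copied and nothing is left undefined. In the second, libm is scanned before anything needs cos, so cos stays unresolved.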
  7. Libraries, Shared Libraries, and Dynamic Loading, January 21, 2002 7

    Problems with Static Linking

    Code bloat
    • Library code gets copied into every executable.
    • And it consumes more memory in every running process.
    • Big problem for system utilities (especially on Unix)

    Code maintenance
    • Suppose you fix a library bug
    • Then you have to relink everything.
    • A major pain if you don’t have the original object files (then you recompile)
    • Consider a multi-user project ==> Can’t easily propagate bug fixes (must relink)

    There are other subtle problems
    • Later
  8. Libraries, Shared Libraries, and Dynamic Loading, January 21, 2002 8

    Shared Libraries

    An alternative to static linking/static libraries
    • Fixes many of the problems with static libraries
    • Idea: libraries are "shared" by different executables.

    Creating a shared library
        $ cc -c $(SRCS)
        $ cc -shared $(OBJS) -o libfoo.so
    • Requires special compiler/linker options (-shared, -G, etc.)
    • Name of library has .so, .sl, or .dll suffix.
    • Both of these aspects are machine dependent (non-portable)
    • Already saw this when creating scripting language extensions

    Using shared libraries
        $ cc main.c -lfoo

    Comment
    • If both libfoo.a and libfoo.so exist, the linker usually picks the shared library
  9. Libraries, Shared Libraries, and Dynamic Loading, January 21, 2002 9

    Shared Libraries

    Dynamic linking
    • Run-time binding of libraries to programs
    • Library is decoupled from the executable.

    Idea
    • Executable is encoded with the name of the library ("libm.so")
    • Relocation list (patch list) lists all unresolved symbols and locations
    • Executable does not actually include any code from the library

    [Diagram: $(OBJS) has a .text section with cos = ???? and a .data section; libm.so contains cos.o, which defines cos(); after ld $(OBJS) -lm, the executable records the name "libm.so" and has .text, .data, and a .reloc section listing the references to cos]
  10. Libraries, Shared Libraries, and Dynamic Loading, January 21, 2002 10

    Running with Shared Libraries

    Dynamic Linking
    • Binding of libraries and symbols occurs at program startup.
    • Program is encoded with list of library dependencies

        $ ldd /usr/local/bin/python
        libpthread.so.1 => /usr/lib/libpthread.so.1
        libsocket.so.1 => /usr/lib/libsocket.so.1
        libnsl.so.1 => /usr/lib/libnsl.so.1
        libdl.so.1 => /usr/lib/libdl.so.1
        libthread.so.1 => /usr/lib/libthread.so.1
        libm.so.1 => /usr/lib/libm.so.1
        libc.so.1 => /usr/lib/libc.so.1
        libmp.so.2 => /usr/lib/libmp.so.2
        /usr/platform/SUNW,Ultra-80/lib/libc_psr.so.1

    • Library binding managed by a special run-time link-loader (ld.so.1)
  11. Libraries, Shared Libraries, and Dynamic Loading, January 21, 2002 11

    Shared Libraries

    Program startup
    • Execution begins in link-loader (ld.so.1)
    • It loads the libraries and links symbols
    • Then it starts your program.

    You can watch it in action
        $ env LD_DEBUG=basic a.out
        < bunch of output >
        $ env LD_DEBUG=help a.out       # Gives more options

    [Diagram: running $ a.out starts ld.so.1, which loads the libraries "libm.so" and "libc.so", links them, and then calls main()]
  12. Libraries, Shared Libraries, and Dynamic Loading, January 21, 2002 12

    Procedure Linking

    Run-time binding relies on Procedure Linkage Tables (PLT)
    • PLT is a jump table built into executable
    • Unresolved symbols point to the jump table
    • Run-time link/loader fills in jump table at program startup

    Performance comment
    • Adds small indirection to every function call
    • Fast startup time (ld.so.1 only has to fill in table).

    [Diagram: a.out records "libm.so" and has .plt, .reloc, and .text sections; calls to cos in .text go through the cos entry in .plt, which ld.so.1 patches to bind it to cos() { ... } in libm.so]
  13. Libraries, Shared Libraries, and Dynamic Loading, January 21, 2002 13

    Interesting Features

    Lazy binding
    • Dynamic linking does not resolve symbols until they are used!
        $ env LD_DEBUG=bindings a.out
    • Binding occurs on first invocation of each function.
    • This is accomplished using PLT tricks.
    • PLT is initially filled with code that calls a binding function in ld.so.1

    Maintenance
    • Since library is decoupled, can fix bugs and make changes
    • Changes automatically propagated to an application when it executes

    Sharing
    • Operating system can use page-sharing to reduce resources
    • One copy of C library loaded on whole machine---shared by all applications
    • Some magic involved (each process gets private data of course).
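
The lazy-binding trick can be modeled in a few lines. This is only a toy simulation under stated assumptions: `ToyPLT` is a hypothetical class, a Python dict stands in for the symbols exported by libm.so, and a real PLT is patched machine code rather than a dictionary of callables.

```python
import math


class ToyPLT:
    """Toy procedure linkage table with lazy binding."""

    def __init__(self, names, library):
        self.library = library   # stands in for the symbols of a shared library
        self.bind_log = []       # symbols in the order they were bound
        # Every slot starts out pointing at a stub that calls the binder.
        self.slots = {name: self._make_stub(name) for name in names}

    def _make_stub(self, name):
        def stub(*args):
            target = self.library[name]   # symbol lookup, done once
            self.slots[name] = target     # patch the slot with the real function
            self.bind_log.append(name)
            return target(*args)
        return stub

    def call(self, name, *args):
        # One extra indirection on every call, as the slides note.
        return self.slots[name](*args)


plt = ToyPLT(["cos", "sin"], {"cos": math.cos, "sin": math.sin})
first = plt.call("cos", 0.0)    # first call goes through the stub and binds cos
second = plt.call("cos", 0.0)   # later calls jump straight to the bound function
```

After these two calls only cos has been bound. The sin slot still holds its stub because sin was never called.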
  14. Libraries, Shared Libraries, and Dynamic Loading, January 21, 2002 14

    Complications

    The library search path
    • The run-time linker usually only searches standard locations
        /lib
        /usr/lib
        /usr/local/lib

    A common problem
        $ ld $(OBJS) -L/your/package/libs -lbar -o a.out
        $ a.out
        ld.so.1: ./a.out: fatal: libbar.so: open failed: No such file or directory
        Killed

    To fix
    • Use -R or -rpath option to encode path into executable.
        $ ld $(OBJS) -L/your/package/libs -R/your/package/libs -lbar
    • LD_LIBRARY_PATH environment variable
        $ env LD_LIBRARY_PATH=/your/package/libs a.out
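
The failure and both fixes can be mimicked with a simplified model of the search. This sketch assumes the order described in these slides (LD_LIBRARY_PATH first, then paths recorded with -R or -rpath, then the standard locations); real run-time linkers differ in the details. `resolve_library` and its parameters are hypothetical names, and a temporary directory stands in for /your/package/libs.

```python
import os
import tempfile


def resolve_library(name, ld_library_path="", rpath=(),
                    defaults=("/lib", "/usr/lib", "/usr/local/lib")):
    """Return the first existing path for `name`, or None if the search fails.

    Simplified order: LD_LIBRARY_PATH entries, then -R/-rpath entries,
    then the standard locations.
    """
    env_dirs = [d for d in ld_library_path.split(":") if d]
    for directory in [*env_dirs, *rpath, *defaults]:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


# Demo: an empty file named libbar.so in a throwaway directory.
# defaults=() keeps the demo independent of what is installed on this machine.
with tempfile.TemporaryDirectory() as package_libs:
    open(os.path.join(package_libs, "libbar.so"), "w").close()
    expected = os.path.join(package_libs, "libbar.so")
    not_found = resolve_library("libbar.so", defaults=())
    via_rpath = resolve_library("libbar.so", rpath=[package_libs], defaults=())
    via_env = resolve_library("libbar.so", ld_library_path=package_libs, defaults=())
```

Without either fix the lookup returns None, which corresponds to the "open failed" error above. With the directory supplied through rpath or LD_LIBRARY_PATH the library is found.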
  15. Libraries, Shared Libraries, and Dynamic Loading, January 21, 2002 15

    LD_LIBRARY_PATH Rant

    Shared libraries are thorny
    • Many users set LD_LIBRARY_PATH as a band-aid
    • It is generally a bad idea.

    Problems
    • It slows down all other programs.
    • Linker always searches the LD_LIBRARY_PATH first (defeats caching)
    • Can really mess things up--if the path is bad or points to old libraries

    Comments
    • Linking your libraries with -R or -rpath fixes the problem (and is preferred)
    • Unfortunately, this is not portable. Different on every machine (sigh).
    • If you have to set LD_LIBRARY_PATH, consider using a shell script
        #!/bin/sh
        env LD_LIBRARY_PATH=/your/package/libs yourprogram
  16. Libraries, Shared Libraries, and Dynamic Loading, January 21, 2002 16

    Shared Libraries: Why Bother?

    Shared libraries are the key to components
    • Users can work on their own applications (in their own directory)
    • But libraries are shared.
    • Library maintainers can fix libraries
    • Changes are automatically propagated to all users
    • Users always get the most up-to-date version when they run.

    This solves a major software maintenance problem
    • Each user having their own copy of the code.

    [Diagram: a common library repository /project/libs/ holding libvis.so and libmd.so, shared by the users Bob and Jane and their programs shockwave.exe and crack.exe]
  17. Libraries, Shared Libraries, and Dynamic Loading, January 21, 2002 17

    Scripting Language Extensions

    Scripting languages use dynamic loading for modules
    • Modules are defined in terms of shared libraries
    • Loaded on demand by import or similar statement
        >>> import foo

    Dynamic loading is different than dynamic linking
    • Dynamic modules are not linked to the application (python)
    • Modules aren’t loaded at program startup.
    • Instead, managed by some library calls (used by Python)
        handle = dlopen("foomodule.so",...)
        void *sym = dlsym(handle,"initfoo")
        ...
    • These functions are part of the runtime link/loader ld.so.1
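
The same load-then-look-up sequence can be tried from Python itself with the standard ctypes module, which wraps the platform's dynamic loading calls. This is a sketch assuming a Unix-like system; it loads the C math library rather than an extension module, and the variable names are illustrative.

```python
import ctypes
import ctypes.util

# find_library may return None on some systems. On POSIX, CDLL(None) then
# falls back to the symbols already loaded into the running process.
libm_path = ctypes.util.find_library("m")
libm = ctypes.CDLL(libm_path)      # roughly: handle = dlopen(libm_path, ...)

cos = libm.cos                     # roughly: sym = dlsym(handle, "cos")
cos.restype = ctypes.c_double      # ctypes cannot infer C signatures,
cos.argtypes = [ctypes.c_double]   # so they are declared by hand

result = cos(0.0)
```

Nothing about libm was known to this script until the CDLL call ran, which is the difference from dynamic linking at program startup.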
  18. Libraries, Shared Libraries, and Dynamic Loading, January 21, 2002 18

    Dynamic Loading

    Library is loaded into a running program
    • API functions in Python interpreter are exported as public.
    • Unresolved symbols in loaded modules are first resolved against Python
    • Libraries then used to resolve other symbols (if foomodule.so linked with libs)
    • Remaining unresolved symbols result in a loading error.

    Comments
    • It is never necessary to link a module with special Python libraries
    • Of course, a module has to link with other libraries to resolve other symbols

    [Diagram: on import foo, ld.so.1 binds the unresolved symbol PyArg_ParseTuple() in foomodule.so to the exported symbol PyArg_ParseTuple() in Python]
  19. Libraries, Shared Libraries, and Dynamic Loading, January 21, 2002 19

    Wrapper Code and Shared Libraries

    Decoupled libraries
    • In applications, it is often useful to decouple scripting wrappers and libraries
    • This can be done using shared libraries as before
    • Allows program libraries to be used independently of the Python interface
    • Python is incorporated as an optional enhancement
    • Allows libraries to be used in other settings (CORBA, COM, Tcl, Perl, etc.).

    Critical point
    • The shared library libvis.so is the component
    • Python wrappers are just wrappers (i.e., not the component).

    [Diagram: python imports the wrappers in vismodule.so, which use the library libvis.so; a stand-alone non-Python application a.out, built with ld ... -lvis, uses the same libvis.so]
  20. Libraries, Shared Libraries, and Dynamic Loading, January 21, 2002 20

    Dynamic Loading and Symbols

    Dynamic modules load into their own "namespace"
    • This is a rather amazing property --- more powerful than normal linking
    • Can concurrently control multiple packages from Python
    • Even if those packages have conflicting symbol names!

    Caveat
    • If Python defines a symbol, it takes precedence
    • Ex: If Python also defined spam(), it would be used instead.

    [Diagram: Python imports foomodule.so (with libfoo.so) and barmodule.so (with libbar.so); each side has its own spam(); different functions, no clash]
  21. Libraries, Shared Libraries, and Dynamic Loading, January 21, 2002 21

    Library Sharing

    Many modules might link against the same library
    • Only one copy of each library is loaded
    • libcommon.so loaded only once
    • Shared by multiple modules

    This is good
    • Consider the problem of duplicated state! Chaos.
    • Decoupling of Python wrappers from libraries also important here.

    [Diagram: Python imports foomodule.so (with libfoo.so) and barmodule.so (with libbar.so); both use a single shared libcommon.so]
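
One way to observe the "loaded only once" behavior is to open the same shared library twice and compare the handles the loader returns. This is a sketch assuming a Unix-like system, where a second dlopen of an already loaded library typically returns the same handle; the C math library stands in for libcommon.so.

```python
import ctypes
import ctypes.util

# May be None on some systems; CDLL(None) then refers to the running process.
path = ctypes.util.find_library("m")

first = ctypes.CDLL(path)    # loads the library (or finds it already loaded)
second = ctypes.CDLL(path)   # asks the loader for the same library again

# _handle is the system handle that ctypes keeps for the loaded library.
same_handle = first._handle == second._handle
```

Two separate Python objects end up referring to one loaded copy of the library, so any state inside it exists once per process.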
  22. Libraries, Shared Libraries, and Dynamic Loading, January 21, 2002 22

    Static Library Hell

    Never mix static libraries and shared libraries
    • You might be inclined to mix static libraries (.a files) with dynamic modules
    • A really really really really really really really bad idea.
    • Static library code is copied into each shared library
    • Might not notice for functions
    • Global variables cause total head explosion
    • Modification of variable in one module has no effect on value in other module
    • Two separate copies of variables (recall earlier slide)

    [Diagram: Python imports foomodule.so (with libfoo.so) and barmodule.so (with libbar.so); each side carries its own copy of libcommon.a]
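
The duplicated-state problem has a close analogy in pure Python: executing the same source twice produces two modules with independent globals. This is an analogy rather than real static linking, and every name in it (LIBCOMMON_SOURCE, copy_of_libcommon, counter, bump) is hypothetical.

```python
import types

# Stands in for the code and global data inside libcommon.a.
LIBCOMMON_SOURCE = """
counter = 0

def bump():
    global counter
    counter += 1
    return counter
"""


def copy_of_libcommon(name):
    """Build a fresh module from the source, like a private static copy."""
    module = types.ModuleType(name)
    exec(LIBCOMMON_SOURCE, module.__dict__)
    return module


foo_copy = copy_of_libcommon("libcommon_in_foomodule")
bar_copy = copy_of_libcommon("libcommon_in_barmodule")

foo_copy.bump()
foo_copy.bump()   # only foo's copy of the counter changes
```

After the two calls foo_copy.counter is 2 while bar_copy.counter is still 0. The functions behave identically, so the split only shows up through the global variable.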
  23. Libraries, Shared Libraries, and Dynamic Loading, January 21, 2002 23

    Dynamic Library Filters

    Function chaining
        #define _GNU_SOURCE        /* needed on glibc for RTLD_NEXT */
        #include <dlfcn.h>
        #include <stdio.h>

        typedef void *(*malloc_t)(size_t nbytes);

        void *malloc(size_t nbytes) {
            void *r;
            static malloc_t real_malloc = 0;
            if (!real_malloc) {
                /* Find the next definition of malloc after this library */
                real_malloc = (malloc_t) dlsym(RTLD_NEXT, "malloc");
            }
            r = (*real_malloc)(nbytes);
            /* %zu and %p are the matching conversions for size_t and void * */
            printf("malloc %zu bytes at %p\n", nbytes, r);
            return r;
        }
    • This isn’t the real malloc---it just calls the real malloc
    • How does this work?
  24. Libraries, Shared Libraries, and Dynamic Loading, January 21, 2002 24

    Dynamic Library Filters

    Compilation and Use
        $ cc -c mymalloc.c
        $ cc -shared mymalloc.o -o mallocfilter.so -ldl
        $ env LD_PRELOAD=./mallocfilter.so python
        Python 2.1 (#3, Aug 20 2001, 15:41:42)
        [GCC 2.95.2 19991024 (release)] on sunos5
        Type "copyright", "credits" or "license" for more information.
        malloc 28 bytes at bc5c0
        malloc 4 bytes at e57a8
        malloc 28 bytes at c07d8
        ...
        >>>

    [Diagram: ld.so.1 preloads mallocfilter.so; a call to malloc in python reaches malloc() in mallocfilter.so first, which uses dlsym(RTLD_NEXT,...) to reach malloc() in libc.so]
  25. Libraries, Shared Libraries, and Dynamic Loading, January 21, 2002 25

    Library Preloading

    Applications
    • Debugging
    • Reconfigurable runtime libraries
    • Many applications have a common core of functions
    • Preloading (and related tricks) allow you to have multiple versions
    • Debugging/production versions

    Comment
    • Can change configurations without relinking or recompilation.

    [Diagram: libvis.so, libmd.so, libanalysis.so, and libcommon.so]
  26. Libraries, Shared Libraries, and Dynamic Loading, January 21, 2002 26

    Dynamic Loading and C++

    C++ has special complications
        class Foo {
        public:
            Foo();
            ~Foo();
            ...
        };
        Foo f;
    • The object f has to be initialized before it can be used
    • Implies execution of class constructor (somehow)
    • Class destructor called on program termination

    Historically, this has been problematic
    • Especially for dynamic loading
    • It does work on modern machines however
  27. Libraries, Shared Libraries, and Dynamic Loading, January 21, 2002 27

    Dynamic Loading and C++

    Initialization of objects
    • All shared libraries have _init and _fini sections to handle initialization
    • Invocation is handled automatically by dynamic linker

    Comment
    • Not possible to unload modules in Python (not recommended anyways)
    • _fini() methods called when interpreter exits.

    [Diagram: Python imports foomodule.so, which uses the C++ library libfoo.so; _init() is executed on loading and _fini() is executed on unloading]
  28. Libraries, Shared Libraries, and Dynamic Loading, January 21, 2002 28

    Dynamic Loading and C++

    C++ modules may have additional library dependencies
    • Python is not written in C++
    • Need to bring extra support code into memory

    Always link C++ extensions with C++ compiler
        $ c++ -shared $(OBJS) -o somemodule.so

    May need additional libraries
        $ ld -G $(OBJS) -lCrun -o somemodule.so      (Solaris)

    No portable way to handle this
    • Depends on the platform (sigh)

    Comment
    • Some people say you have to link Python with C++ compiler to make it work
    • Not necessary in my experience.
  29. Libraries, Shared Libraries, and Dynamic Loading, January 21, 2002 29

    Miscellaneous Topics

    PIC code
    • Position independent code
    • Usually a special compiler option
        $ cc -c -fpic $(SRCS)

    What is it?
    • Generates relocatable code.
    • Uses indirection tables and other tricks

    Applications
    • Commonly used by system libraries
    • Enables the OS to optimize memory via page sharing.
    • Reduces load time (no patching required).

    Comments
    • Not generally needed for extensions
    • Introduces a modest performance hit (5-15%)
  30. Libraries, Shared Libraries, and Dynamic Loading, January 21, 2002 30

    Object File Formats

    Can I mix 32-bit and 64-bit code together?
    • No.
    • Python, all extension modules, and all libraries have to be of the same type.
    • All 32-bit or all 64-bit.

    Can different 32-bit formats be mixed together?
    • No.
    • Problem on SGI (-o32, -n32)

    Pitfall
    • Commercial libraries distributed only as binaries
    • Have to be in same binary format that you want to use.

    C++ binary compatibility
    • C++ does not specify a binary linking standard.
    • Code compiled with one compiler will not link with code from another.
    • Could create an unstable run-time environment (memory management)
    • May complicate cross-module data sharing.
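
A quick way to check whether a binary-only library matches your build is to look at its file header. The sketch below assumes the ELF object format, which the slides do not name: an ELF file starts with the bytes 0x7f 'E' 'L' 'F', and the next byte is 1 for 32-bit or 2 for 64-bit. The function names are hypothetical.

```python
def elf_class(header):
    """Return 32 or 64 for the first bytes of an ELF file, or None otherwise."""
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return None                         # not an ELF file
    return {1: 32, 2: 64}.get(header[4])    # EI_CLASS byte


def elf_class_of_file(path):
    """Read just enough of the file at `path` to classify it."""
    with open(path, "rb") as f:
        return elf_class(f.read(5))


# Hand-made headers, so the demo does not depend on any installed file.
class_64 = elf_class(b"\x7fELF\x02")
class_32 = elf_class(b"\x7fELF\x01")
not_elf = elf_class(b"MZ\x90\x00\x03")
```

Comparing the result for the Python executable with the result for each extension module or vendor library shows a 32-bit/64-bit mismatch before the loader reports one.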
  31. Libraries, Shared Libraries, and Dynamic Loading, January 21, 2002 31

    Putting it Together

    Modular Scripted Scientific Application

    [Diagram: a component library (libvis.so, libio.so, libphysics.so, ...) is linked by stand-alone applications and by scripting wrappers (vismodule.so, iomodule.so, physicsmodule.so, ...); Python scripts import the wrappers]
  32. Libraries, Shared Libraries, and Dynamic Loading, January 21, 2002 32

    Summary

    Shared libraries
    • A very powerful feature
    • Key to creating software components

    Dynamic loading
    • Fundamental extension mechanism of interpreters

    Separation of libraries
    • Keep the Python wrappers separate from the underlying application
    • Allows Python to be decoupled
    • Supports stand-alone apps (maybe the original application)
    • Allows components to be reused in other settings (non-Python)

    Worth the extra effort
    • Getting all of this to work smoothly is hard at first
    • Pays off later
    • Simplified maintenance, better software organization
    • Added flexibility
  33. Libraries, Shared Libraries, and Dynamic Loading, January 21, 2002 33

    Further Directions

    If your head hasn’t exploded yet...

    Software component frameworks (COM, CORBA, etc.)
    • Portable dynamic linking and software modules for objects
    • Fix a lot of the portability problems with C++
    • Worth considering if you are heavily invested in OO.
    • Personal preference--I think I would probably use COM (if I had to choose)

    Comments
    • Probably not worth it unless your application is really huge.
    • Scripting languages can work with these component frameworks
    • A little more complicated.

    .NET?
    • There’s some buzz
    • I honestly don’t know if this will be useful to scientists or not.