Libraries, Shared Libraries, and Dynamic Loading

David M. Beazley
Department of Computer Science
University of Chicago
[email protected]
January 21, 2002
PY-MA
Libraries?

Scripted software is all about libraries
• Applications are not monolithic executables
• Organized as a collection of modules --> extension modules
• Extension modules are shared libraries (DLLs)

Libraries are problematic
• Especially if you don’t know what you’re doing

Most programmers do not understand libraries
• At least in my general experience.
• Even in CS
• And the confusion might be worse after I’m done.

This tutorial:
• A crash course in programming libraries
Modular Software

Libraries are an essential part of modular software
• The C library
• The C++ library
• Lapack
• OpenGL
• Vtk
• MPI
• Numerous others

Most scientific programs can be organized as libraries
• Remove main() and the user interface
• Repackage everything that remains as a library
• Might split into multiple libraries (integration, I/O, data analysis, etc.)
• Libraries are often used to organize large monolithic applications
Static Linking

Libraries are also used to resolve unknown symbols
• Code is copied from the library into the target program
• However, only the used symbols are copied (not the entire library)

Comment
• The ordering of libraries matters: symbols are resolved in the order specified

[Figure: ld $(OBJS) -lm resolves the unknown symbol cos in $(OBJS) by copying cos.o, which defines cos(), from libm.a into the .text section of a.out]
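The two points on this slide (only the needed archive members are copied, and library order matters) can be illustrated with a toy model. This is only a sketch of single-pass archive resolution, not how any real ld is implemented; the function name static_link and the object and archive contents are hypothetical.

```python
def static_link(objects, archives):
    """Toy model of single-pass static linking (hypothetical helper).

    objects:  list of (defined_symbols, undefined_symbols), one per .o file
    archives: archives in command-line order; each maps a member name
              to (defined_symbols, undefined_symbols)
    Returns (members_copied, symbols_still_undefined).
    """
    defined, undefined = set(), set()
    for defs, undefs in objects:
        defined |= set(defs)
        undefined |= set(undefs)
    undefined -= defined

    copied = []
    for archive in archives:            # each archive is visited once, in order
        taken = set()
        progress = True
        while progress:                 # rescan while this archive keeps helping
            progress = False
            for member, (defs, undefs) in archive.items():
                if member in taken:
                    continue
                if undefined & set(defs):   # member defines something we need
                    taken.add(member)
                    copied.append(member)
                    defined |= set(defs)
                    undefined |= set(undefs)
                    undefined -= defined
                    progress = True
    return copied, undefined


# Hypothetical archives: libm.a provides cos and sin; libfoo.a needs cos.
libm = {"cos.o": ({"cos"}, set()), "sin.o": ({"sin"}, set())}
libfoo = {"foo.o": ({"foo"}, {"cos"})}
main_o = ({"main"}, {"foo"})

good = static_link([main_o], [libfoo, libm])   # like: ld main.o -lfoo -lm
bad = static_link([main_o], [libm, libfoo])    # like: ld main.o -lm -lfoo
```

In this model the first ordering copies foo.o and cos.o and leaves nothing undefined, while the second leaves cos unresolved because libm was already passed over before anything needed it. sin.o is never copied in either case.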
Problems with Static Linking

Code bloat
• Library code gets copied into every executable.
• And it consumes more memory in every running process.
• Big problem for system utilities (especially on Unix)

Code maintenance
• Suppose you fix a library bug
• Then you have to relink everything.
• A major pain if you don’t have the original object files (then you recompile)
• Consider a multi-user project ==> can’t easily propagate bug fixes (must relink)

There are other subtle problems
• Later
Shared Libraries

An alternative to static linking/static libraries
• Fixes many of the problems with static libraries
• Idea: libraries are "shared" by different executables.

Creating a shared library
    $ cc -c $(SRCS)
    $ cc -shared $(OBJS) -o libfoo.so
• Requires special compiler/linker options (-shared, -G, etc.)
• Name of library has a .so, .sl, or .dll suffix.
• Both of these aspects are machine dependent (non-portable)
• Already saw this when creating scripting language extensions

Using shared libraries
    $ cc main.c -lfoo

Comment
• If both libfoo.a and libfoo.so exist, the linker usually picks the shared library
Shared Libraries

Dynamic linking
• Run-time binding of libraries to programs
• Library is decoupled from the executable.

Idea
• Executable is encoded with the name of the library ("libm.so")
• Relocation list (patch list) lists all unresolved symbols and locations
• Executable does not actually include any code from the library

[Figure: ld $(OBJS) -lm produces an executable that records the library name "libm.so" and a .reloc section listing each unresolved reference to cos; cos() itself stays in libm.so]
Running with Shared Libraries

Dynamic Linking
• Binding of libraries and symbols occurs at program startup.
• Program is encoded with a list of library dependencies

    $ ldd /usr/local/bin/python
    libpthread.so.1 => /usr/lib/libpthread.so.1
    libsocket.so.1 => /usr/lib/libsocket.so.1
    libnsl.so.1 => /usr/lib/libnsl.so.1
    libdl.so.1 => /usr/lib/libdl.so.1
    libthread.so.1 => /usr/lib/libthread.so.1
    libm.so.1 => /usr/lib/libm.so.1
    libc.so.1 => /usr/lib/libc.so.1
    libmp.so.2 => /usr/lib/libmp.so.2
    /usr/platform/SUNW,Ultra-80/lib/libc_psr.so.1

• Library binding is managed by a special run-time link-loader (ld.so.1)
Shared Libraries

Program startup
• Execution begins in the link-loader (ld.so.1)
• It loads the libraries and links symbols
• Then it starts your program.

You can watch it in action
    $ env LD_DEBUG=basic a.out
    < bunch of output >
    $ env LD_DEBUG=help a.out     # Gives more options

[Figure: running a.out starts ld.so.1, which loads the libraries "libm.so" and "libc.so", links them, and then calls main()]
Procedure Linking

Run-time binding relies on Procedure Linkage Tables (PLT)
• PLT is a jump table built into the executable
• Unresolved symbols point to the jump table
• Run-time link/loader fills in the jump table at program startup

Performance comment
• Adds a small indirection to every function call
• Fast startup time (ld.so.1 only has to fill in the table).

[Figure: calls to cos in the .text section of a.out go through a .plt entry; ld.so.1 patches that entry to bind it to cos() in libm.so]
Interesting Features

Lazy binding
• Dynamic linking does not resolve symbols until they are used!
    $ env LD_DEBUG=bindings a.out
• Binding occurs on first invocation of each function.
• This is accomplished using PLT tricks.
• PLT is initially filled with code that calls a binding function in ld.so.1

Maintenance
• Since the library is decoupled, you can fix bugs and make changes
• Changes are automatically propagated to an application when it executes

Sharing
• Operating system can use page-sharing to reduce resources
• One copy of the C library is loaded on the whole machine, shared by all applications
• Some magic involved (each process gets private data of course).
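The lazy-binding trick can be modelled in a few lines. This is a sketch of the idea only: a dictionary plays the role of the PLT, a Python closure plays the role of the binding stub, and the class name LazyPLT and its bind_count counter are hypothetical. A real PLT is machine code patched by ld.so.1.

```python
import math


class LazyPLT:
    """Toy jump table whose slots are bound on first call (hypothetical model)."""

    def __init__(self, library):
        self.library = library   # stands in for the symbols exported by a library
        self.slots = {}          # symbol name -> callable currently in the slot
        self.bind_count = 0      # how many times the binder had to run

    def add_slot(self, name):
        def stub(*args):
            # First invocation: look the symbol up, patch the slot, forward the call.
            self.bind_count += 1
            target = self.library[name]
            self.slots[name] = target
            return target(*args)

        self.slots[name] = stub  # the slot initially holds the binder stub

    def call(self, name, *args):
        # Every call pays one indirection through the table.
        return self.slots[name](*args)


plt = LazyPLT({"cos": math.cos})
plt.add_slot("cos")
first = plt.call("cos", 0.0)    # runs the stub, which binds and then calls cos
second = plt.call("cos", 0.0)   # goes straight to the bound function
```

After both calls bind_count is 1: the lookup cost is paid once per symbol, and symbols that are never called are never looked up.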
Complications

The library search path
• The run-time linker usually only searches standard locations
    /lib
    /usr/lib
    /usr/local/lib

A common problem
    $ ld $(OBJS) -L/your/package/libs -lbar -o a.out
    $ a.out
    ld.so.1: ./a.out: fatal: libbar.so: open failed: No such file or directory
    Killed

To fix
• Use the -R or -rpath option to encode the path into the executable.
    $ ld $(OBJS) -L/your/package/libs -R/your/package/libs -lbar
• Or set the LD_LIBRARY_PATH environment variable
    $ env LD_LIBRARY_PATH=/your/package/libs a.out
LD_LIBRARY_PATH Rant

Shared libraries are thorny
• Many users set LD_LIBRARY_PATH as a band-aid
• It is generally a bad idea.

Problems
• It slows down all other programs.
• Linker always searches the LD_LIBRARY_PATH first (defeats caching)
• Can really mess things up if the path is bad or points to old libraries

Comments
• Linking your libraries with -R or -rpath fixes the problem (and is preferred)
• Unfortunately, this is not portable. Different on every machine (sigh).
• If you have to set LD_LIBRARY_PATH, consider using a shell script
    #!/bin/sh
    env LD_LIBRARY_PATH=/your/package/libs yourprogram
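To see why a stale LD_LIBRARY_PATH entry can shadow the library you meant to use, here is a toy model of the search. It assumes the order these slides describe (LD_LIBRARY_PATH first, then the path encoded with -R or -rpath, then the standard locations); real run-time linkers differ in the details, and the function name locate_library is hypothetical.

```python
import os
import tempfile


def locate_library(name, ld_library_path="", rpath="",
                   defaults=("/lib", "/usr/lib", "/usr/local/lib")):
    """Return the first file called `name` along the modelled search order."""
    search = [d for d in ld_library_path.split(":") if d]   # searched first
    search += [d for d in rpath.split(":") if d]            # then the encoded path
    search += list(defaults)                                # then standard places
    for directory in search:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


# Two throwaway directories, each holding a (fake) libbar.so.
old_dir = tempfile.mkdtemp()
new_dir = tempfile.mkdtemp()
for d in (old_dir, new_dir):
    with open(os.path.join(d, "libbar.so"), "wb") as f:
        f.write(b"")

# The executable was linked with an encoded path pointing at new_dir ...
with_rpath = locate_library("libbar.so", rpath=new_dir, defaults=())
# ... but an environment variable pointing at old libraries wins.
shadowed = locate_library("libbar.so", ld_library_path=old_dir,
                          rpath=new_dir, defaults=())
```

In this model with_rpath resolves to the copy in new_dir, while shadowed resolves to the copy in old_dir even though the executable itself asked for new_dir.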
Shared Libraries: Why Bother?

Shared libraries are the key to components
• Users can work on their own applications (in their own directory)
• But libraries are shared.
• Library maintainers can fix libraries
• Changes are automatically propagated to all users
• Users always get the most up-to-date version when they run.

This solves a major software maintenance problem
• Each user having their own copy of the code.

[Figure: a common library repository, /project/libs/, holding libvis.so and libmd.so, used by the programs of users Bob and Jane (shockwave.exe, crack.exe)]
Scripting Language Extensions

Scripting languages use dynamic loading for modules
• Modules are defined in terms of shared libraries
• Loaded on demand by import or a similar statement
    >>> import foo

Dynamic loading is different from dynamic linking
• Dynamic modules are not linked to the application (python)
• Modules aren’t loaded at program startup.
• Instead, managed by some library calls (used by Python)
    handle = dlopen("foomodule.so",...)
    void *sym = dlsym(handle,"initfoo")
    ...
• These functions are part of the run-time link/loader ld.so.1
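The same load-then-look-up sequence can be tried from Python itself, because the standard ctypes module loads shared libraries on Unix through dlopen and looks up names in them much as dlsym does. The sketch below assumes a Unix-like system where the C math library can be found; the helper name load_cos is hypothetical.

```python
import ctypes
import ctypes.util


def load_cos():
    """Load the math library at run time and return its cos() function."""
    path = ctypes.util.find_library("m")       # e.g. a libm shared object
    # If no separate libm is found, fall back to the symbols already
    # visible in the running program (the dlopen(NULL) case).
    lib = ctypes.CDLL(path) if path else ctypes.CDLL(None)
    cos = lib.cos                               # symbol lookup by name
    cos.argtypes = [ctypes.c_double]            # declare the C signature
    cos.restype = ctypes.c_double
    return cos


cos = load_cos()
print(cos(0.0))
```

Nothing here was linked into the interpreter ahead of time for this purpose: the library is named and bound while the program is already running, which is the distinction the slide draws between dynamic loading and dynamic linking.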
Dynamic Loading

Library is loaded into a running program
• API functions in the Python interpreter are exported as public.
• Unresolved symbols in loaded modules are first resolved against Python
• Libraries are then used to resolve other symbols (if foomodule.so is linked with libs)
• Remaining unresolved symbols result in a loading error.

Comments
• It is never necessary to link a module with special Python libraries
• Of course, a module has to link with other libraries to resolve other symbols

[Figure: on import foo, ld.so.1 binds the unresolved symbol PyArg_ParseTuple() in foomodule.so to the symbol exported by Python]
Wrapper Code and Shared Libraries

Decoupled libraries
• In applications, it is often useful to decouple scripting wrappers and libraries
• This can be done using shared libraries as before
• Allows program libraries to be used independently of the Python interface
• Python is incorporated as an optional enhancement
• Allows libraries to be used in other settings (CORBA, COM, Tcl, Perl, etc.).

Critical point
• The shared library libvis.so is the component
• Python wrappers are just wrappers (i.e., not the component).

[Figure: python imports vismodule.so (the wrappers), which uses libvis.so (the library); a stand-alone non-Python application a.out, built with ld ... -lvis, uses the same libvis.so]
Dynamic Loading and Symbols

Dynamic modules load into their own "namespace"
• This is a rather amazing property, more powerful than normal linking
• Can concurrently control multiple packages from Python
• Even if those packages have conflicting symbol names!

Caveat
• If Python defines a symbol, it takes precedence
• Ex: If Python also defined spam(), it would be used instead.

[Figure: Python imports foomodule.so (with libfoo.so) and barmodule.so (with libbar.so); the two sides contain separate definitions of spam(). Different functions, no clash]
Library Sharing

Many modules might link against the same library
• Only one copy of each library is loaded
• libcommon.so loaded only once
• Shared by multiple modules

This is good
• Consider the problem of duplicated state! Chaos.
• Decoupling of Python wrappers from libraries is also important here.

[Figure: Python imports foomodule.so and barmodule.so, which use libfoo.so and libbar.so; both share a single libcommon.so]
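The "only one copy is loaded" behaviour can be observed from Python with ctypes. The sketch below opens the same shared library twice and compares the address at which cos lives through each handle. It assumes a Unix-like system where the C math library can be found; the helper name symbol_address is hypothetical.

```python
import ctypes
import ctypes.util


def symbol_address(lib, name):
    """Return the address of a function in a loaded library as an integer."""
    return ctypes.cast(getattr(lib, name), ctypes.c_void_p).value


# find_library may return None; CDLL(None) then refers to the running program.
path = ctypes.util.find_library("m")
first = ctypes.CDLL(path)     # first request for the library
second = ctypes.CDLL(path)    # second, independent request for the same library

same_code = symbol_address(first, "cos") == symbol_address(second, "cos")
```

Both requests end up at the same address, so same_code is True: the second load did not bring a second copy of the code (or of the library's global data) into the process.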
Static Library Hell

Never mix static libraries and shared libraries
• You might be inclined to mix static libraries (.a files) with dynamic modules
• A really really really really really really really bad idea.
• Static library code is copied into each shared library
• Might not notice for functions
• Global variables cause total head explosion
• Modification of a variable in one module has no effect on its value in the other module
• Two separate copies of variables (recall earlier slide)

[Figure: Python imports foomodule.so and barmodule.so; libfoo.so and libbar.so each contain their own copy of libcommon.a]
Dynamic Library Filters

Function chaining

    #define _GNU_SOURCE   /* glibc only exposes RTLD_NEXT when this is defined */
    #include <dlfcn.h>
    #include <stdio.h>

    typedef void *(*malloc_t)(size_t nbytes);

    void *malloc(size_t nbytes) {
        void *r;
        static malloc_t real_malloc = 0;
        if (!real_malloc) {
            /* Find the next definition of malloc after this one */
            real_malloc = (malloc_t) dlsym(RTLD_NEXT, "malloc");
        }
        r = (*real_malloc)(nbytes);
        /* %zu matches size_t and %p matches a pointer.
           Caution: if printf itself allocates, this re-enters malloc. */
        printf("malloc %zu bytes at %p\n", nbytes, r);
        return r;
    }

• This isn’t the real malloc; it just calls the real malloc
• How does this work?
Library Preloading

Applications
• Debugging
• Reconfigurable runtime libraries
• Many applications have a common core of functions
• Preloading (and related tricks) allows you to have multiple versions
• Debugging/production versions

Comment
• Can change configurations without relinking or recompilation.

[Figure: libvis.so, libmd.so, and libanalysis.so, together with libcommon.so]
Dynamic Loading and C++

C++ has special complications

    class Foo {
    public:
        Foo();
        ~Foo();
        ...
    };
    Foo f;

• The object f has to be initialized before it can be used
• Implies execution of the class constructor (somehow)
• Class destructor is called on program termination

Historically, this has been problematic
• Especially for dynamic loading
• It does work on modern machines, however
Dynamic Loading and C++

Initialization of objects
• All shared libraries have _init and _fini sections to handle initialization
• Invocation is handled automatically by the dynamic linker

Comment
• Not possible to unload modules in Python (not recommended anyway)
• _fini() methods are called when the interpreter exits.

[Figure: Python imports foomodule.so, which uses the C++ library libfoo.so; _init() is executed on loading and _fini() on unloading]
Dynamic Loading and C++

C++ modules may have additional library dependencies
• Python is not written in C++
• Need to bring extra support code into memory

Always link C++ extensions with the C++ compiler
    $ c++ -shared $(OBJS) -o somemodule.so

May need additional libraries
    $ ld -G $(OBJS) -lCrun -o somemodule.so     (Solaris)

No portable way to handle this
• Depends on the platform (sigh)

Comment
• Some people say you have to link Python with the C++ compiler to make it work
• Not necessary in my experience.
Miscellaneous Topics

PIC code
• Position independent code
• Usually a special compiler option
    $ cc -c -fpic $(SRCS)

What is it?
• Generates relocatable code.
• Uses indirection tables and other tricks

Applications
• Commonly used by system libraries
• Enables the OS to optimize memory via page sharing.
• Reduces load time (no patching required).

Comments
• Not generally needed for extensions
• Introduces a modest performance hit (5-15%)
Object File Formats

Can I mix 32-bit and 64-bit code together?
• No.
• Python, all extension modules, and all libraries have to be of the same type.
• All 32-bit or all 64-bit.

Can different 32-bit formats be mixed together?
• No.
• Problem on SGI (-o32, -n32)

Pitfall
• Commercial libraries distributed only as binaries
• Have to be in the same binary format that you want to use.

C++ binary compatibility
• C++ does not specify a binary linking standard.
• Code compiled with one compiler will not link with code from another.
• Could create an unstable run-time environment (memory management)
• May complicate cross-module data sharing.
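A quick way to check for a 32-bit/64-bit mismatch before it shows up as a loading error is to compare the word size of the running interpreter with the class byte in a library's header. The sketch below assumes the library is an ELF file (common on Linux and Solaris, but not universal); the helper names elf_bits and interpreter_bits are hypothetical.

```python
import struct

ELF_MAGIC = b"\x7fELF"
ELF_CLASSES = {1: 32, 2: 64}    # EI_CLASS byte: ELFCLASS32 = 1, ELFCLASS64 = 2


def elf_bits(path):
    """Return 32 or 64 for an ELF file, or None if the file is not ELF."""
    with open(path, "rb") as f:
        header = f.read(5)      # 4 magic bytes followed by the EI_CLASS byte
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        return None
    return ELF_CLASSES.get(header[4])


def interpreter_bits():
    """Return the pointer width, in bits, of the running Python interpreter."""
    return struct.calcsize("P") * 8


def compatible(path):
    """True when the file is ELF and matches the interpreter's word size."""
    return elf_bits(path) == interpreter_bits()
```

Calling compatible() on an extension module or on a vendor-supplied binary library gives an early warning that it cannot be loaded into this interpreter. It says nothing about the other incompatibilities listed above, such as differing 32-bit ABIs or C++ compilers.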
Summary

Shared libraries
• A very powerful feature
• Key to creating software components

Dynamic loading
• Fundamental extension mechanism of interpreters

Separation of libraries
• Keep the Python wrappers separate from the underlying application
• Allows Python to be decoupled
• Supports stand-alone apps (maybe the original application)
• Allows components to be reused in other settings (non-Python)

Worth the extra effort
• Getting all of this to work smoothly is hard at first
• Pays off later
• Simplified maintenance, better software organization
• Added flexibility
Further Directions

If your head hasn’t exploded yet...

Software component frameworks (COM, CORBA, etc.)
• Portable dynamic linking and software modules for objects
• Fix a lot of the portability problems with C++
• Worth considering if you are heavily invested in OO.
• Personal preference: I think I would probably use COM (if I had to choose)

Comments
• Probably not worth it unless your application is really huge.
• Scripting languages can work with these component frameworks
• A little more complicated.

.NET?
• There’s some buzz
• I honestly don’t know if this will be useful to scientists or not.