Slide 1

Slide 1 text

PHP Internals for the Inquisitive Developer
Jeremy Mikola
@jmikola

Slide 2

Slide 2 text

A Little About Myself

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

Some Topics to Cover
● Request lifecycle
● Types of PHP extensions
● Internal data structures
● Navigating PHP’s source
● Executing a PHP file
● Opcodes, caching, and optimization
● Examining how PHP interacts with C
● Debugging crashes

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

Everything Starts Here*
int main(int argc, char *argv[])
{
    return 0;
}

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

Server APIs
● Apache mod_php
● Command Line Interface (CLI)
● Common Gateway Interface (CGI)
  ○ FastCGI allows for persistent processes
  ○ Apache mod_fcgid
● FastCGI Process Manager (FPM)
  ○ Apache mod_proxy_fcgi

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

Types of Extensions
● Extensions (aka “modules”)
  ○ Allow new concepts to be added to PHP
  ○ Integration with C, system libraries
  ○ e.g. APCu, MongoDB, OpenSSL
● Zend Extensions
  ○ More hooks for changing PHP’s behavior
  ○ Commonly used for debuggers and profilers
  ○ May also register normal extensions for user APIs
  ○ e.g. OPcache, Xdebug, phpdbg, Blackfire

Slide 14

Slide 14 text

Module Scope
● Module initialization
  ○ Allocate persistent read-only globals
  ○ Register INI entries, classes, constants, etc.
  ○ Initialize third-party libraries
● Module shutdown
  ○ Free persistent allocations
  ○ Unregister INI entries

Slide 15

Slide 15 text

Request Scope
● Request initialization
  ○ Allocate request-bound memory
  ○ Reset globals as needed
  ○ Avoid unintentionally altering global state
● Request shutdown
  ○ Free request-bound allocations
  ○ Zend Memory Manager helps catch leaks

Slide 16

Slide 16 text

Process Model

Slide 17

Slide 17 text

Thread Model

Slide 18

Slide 18 text

Global Scope
● Globals
  ○ Use persistent allocation, initialize to zero
  ○ Requests access globals via TSRM macros
  ○ Zend hash globals can be used as a cache
● Process model
  ○ GINIT is called once, before MINIT
  ○ GSHUTDOWN is called once, during MSHUTDOWN
● Thread model
  ○ Additionally called each time a thread spawns or dies
  ○ TSRM macros take care of thread local storage

Slide 19

Slide 19 text

PHP Process Manager
● Uses ReactPHP to manage a pool of PHP processes (CLI)
● Bootstraps application once per worker process
● Each worker handles a series of HTTP requests
● Leverages request/response design in frameworks
● Operates entirely within one CLI “request”
● Issues with memory leaks, resetting state

Slide 20

Slide 20 text

No content

Slide 21

Slide 21 text

It looks like you're giving a presentation on PHP internals. Would you like help? Talk about zvals

Slide 22

Slide 22 text

Zvals
● 16-byte struct consisting of three union fields
● value (8 bytes)
  ○ Integer and double values are stored inline
  ○ Pointers are used for other types (e.g. string, array, object)
  ○ Not used for null and boolean values (denoted by type_info)
● u1 contains type_info (4 bytes)
  ○ Type byte and various bit flags
● u2 is a multi-purpose union (4 bytes)
  ○ Used for hash tables, AST line numbers, foreach iteration, etc.

Slide 23

Slide 23 text

Zval Improvements in PHP 7
● Zvals are no longer individually heap-allocated
  ○ Can now be directly embedded (e.g. hash buckets)
● Zvals are no longer refcounted
  ○ Refcounts stored on values themselves (e.g. zend_string)
  ○ Values can be shared independently of the zval struct
● Much less indirection and pointer traversal

Slide 24

Slide 24 text

Improvements to Other Types
● New string representation
  ○ zend_string struct replaces char*
  ○ Encapsulates refcount, string length, and data
● Hash table redesigned
  ○ Buckets allocated in sequence (less pointer traversal)
  ○ Optimizations for “packed arrays”
● Objects are more lightweight
  ○ Declared properties embedded in zend_object

Slide 25

Slide 25 text

Types and Attributes
                 Refcounted  Collectable  Copyable  Immutable
Simple Types     -           -            -         -
String           ✘           -            ✘         -
Array            ✘           ✘            ✘         -
Object           ✘           ✘            -         -
Resource         ✘           -            -         -
Reference        ✘           -            -         -
Interned String  -           -            -         -
Immutable Array  -           -            -         ✘

Slide 26

Slide 26 text

Navigating PHP

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

No content

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

No content

Slide 32

Slide 32 text

No content

Slide 33

Slide 33 text

No content

Slide 34

Slide 34 text

Executing PHP

Slide 35

Slide 35 text

Interpretation vs. Compilation

Slide 36

Slide 36 text

Slide 37

Slide 37 text

Parsing Tokens (token_get_all)
T_OPEN_TAG T_WHITESPACE
T_FUNCTION T_WHITESPACE T_STRING ( ) T_WHITESPACE { T_WHITESPACE
T_ECHO T_WHITESPACE T_CONSTANT_ENCAPSED_STRING ; T_WHITESPACE
T_RETURN T_WHITESPACE T_STRING ; T_WHITESPACE
T_ECHO T_WHITESPACE T_CONSTANT_ENCAPSED_STRING ; T_WHITESPACE
} T_WHITESPACE

Slide 38

Slide 38 text

No content

Slide 39

Slide 39 text

Vulcan Logic Dumper

Slide 40

Slide 40 text

Examining Opcodes (vld)
function name: test
number of ops: 4
compiled vars: none
line  #*  E I O  op      fetch  ext  return  operands
-----------------------------------------------------
   5  0   E >    ECHO                        'Hello%21%0A'
   6  1     >    RETURN
   8  2*         ECHO                        'Not+executed.%0A'
   9  3*    >    RETURN                      null

Slide 41

Slide 41 text

Opcode Caching
● Various caching strategies
● OpArrays (i.e. opcode sequences) can be optimized
● OpArrays still interpreted at runtime
● JIT would allow machine code caching

Slide 42

Slide 42 text

OpCode Optimizations
Pass  Bit      Optimization
1     1 << 0   Casts, operators, internal functions with constant and literal arguments
2     1 << 1   Type coercion in expressions, conditional elimination (e.g. if statements)
3     1 << 2   Optimize self-assignment, post-increment, and jumps
5     1 << 4   Block optimization of control flow graph (CFG)
9     1 << 8   Optimize usage of temporary variables (register allocation)
10    1 << 9   Remove NOPs
-     1 << 14  Collect constants for future replacement
https://stackoverflow.com/a/21291587/162228
https://phpinternals.net/categories/opcache

Slide 43

Slide 43 text

Approaching C

Slide 44

Slide 44 text

System Calls
● open and close file descriptors
● read and write files, sockets, devices
● fork, exec, or wait on another process
● exit the current process
● Send signals to other processes (kill)
● Map files or devices to memory (mmap, munmap)
● Allocate process memory (brk, sbrk)

Slide 45

Slide 45 text

Tracing System Calls (strace)
/dev/null
write(1, "Hello!\n", 7) = 7
+++ exited with 0 +++

Slide 46

Slide 46 text

Tracing System Calls (strace)
$ strace -e openat php example.php 2>&1 | grep ini
openat(AT_FDCWD, "/usr/bin/php-cli.ini", O_RDONLY) = -1
openat(AT_FDCWD, "/etc/php/7.2/cli/php-cli.ini", O_RDONLY) = -1
openat(AT_FDCWD, "/usr/bin/php.ini", O_RDONLY) = -1
openat(AT_FDCWD, "/etc/php/7.2/cli/php.ini", O_RDONLY) = 3
openat(AT_FDCWD, "/etc/php/7.2/cli/conf.d/10-opcache.ini", …
openat(AT_FDCWD, "/etc/php/7.2/cli/conf.d/10-pdo.ini", …

Slide 47

Slide 47 text

Tracing Library Calls (ltrace)
$ ltrace -l mongodb.so php example.php
mongodb.so->mongoc_log_trace_disable(0, 1, 0x55a78b57a410, …
mongodb.so->mongoc_log_set_handler(0, 0, 0x55a78b57a410, …
mongodb.so->mongoc_init(0, 0, 0x55a78b552bb8, 0x7f1941ec6320 …
mongodb.so->_mongoc_openssl_init(0x7ffcf6e60460, …
mongodb.so->bson_malloc0(40, 0, 0, 0x7f1940ecf8e0) …
mongodb.so->bson_malloc0(40, 0x55a78b42a018, 0x55a78b554bc0, …
mongodb.so->_mongoc_counters_init(0x55a78b5a4710, …

Slide 48

Slide 48 text

Debugging Crashes (gdb)
● The most common crashes are segfaults
  ○ C makes it trivially easy to access memory incorrectly
● Ideally, crashes produce core dumps, which can be inspected
  ○ https://bugs.php.net/bugs-generating-backtrace.php
● PHP source includes a .gdbinit file with helpful macros

Slide 49

Slide 49 text

No content

Slide 50

Slide 50 text

This’ll Do Just Fine

Slide 51

Slide 51 text

Capturing a Core Dump
$ php class-tostring-recursion.php
Segmentation fault (core dumped)
$ ls
class-tostring-recursion.php core
$ gdb `which php` core

Slide 52

Slide 52 text

Capturing a Core Dump
$ gdb -q --args php class-tostring-recursion.php
Reading symbols from php...done.
(gdb) run
Starting program: /usr/bin/php class-tostring-recursion.php
Program received signal SIGSEGV, Segmentation fault.
0x0000555555c9ab9e in zend_call_function (fci=, fci_cache=) at /tmp/build_php-7.2.6.sx8/php-7.2.6/Zend/zend_execute_API.c:659
(gdb)

Slide 53

Slide 53 text

Traversing the Backtrace
(gdb) bt -20
#45993 0x0000555555ce6e8a in zend_call_method (object=0x7ffff3423110,
#45994 0x0000555555d13de3 in zend_std_cast_object_tostring (readobj=0x
#45995 0x0000555555ca5c8f in _zval_get_string_func (op=0x7ffff3423110)
#45996 0x0000555555cb4b68 in zend_make_printable_zval (expr=0x7ffff342
#45997 0x0000555555cae0a8 in concat_function (result=0x7ffff3423120, o
#45998 0x0000555555d4069a in ZEND_CONCAT_SPEC_CONST_TMPVAR_HANDLER ()
#45999 0x0000555555db6336 in execute_ex (ex=0x7ffff34230c0) at /tmp/bu
#46000 0x0000555555c9b9e1 in zend_call_function (fci=0x7fffffffb0b0, f
#46001 0x0000555555ce6e8a in zend_call_method (object=0x7ffff3423090,
#46002 0x0000555555d13de3 in zend_std_cast_object_tostring (readobj=0x
#46003 0x0000555555ca5c8f in _zval_get_string_func (op=0x7ffff3423090)

Slide 54

Slide 54 text

#45999 0x0000555555db6336 in execute_ex (ex=0x7ffff34230c0) at /tmp/bu
#46000 0x0000555555c9b9e1 in zend_call_function (fci=0x7fffffffb0b0, f
#46001 0x0000555555ce6e8a in zend_call_method (object=0x7ffff3423090,
#46002 0x0000555555d13de3 in zend_std_cast_object_tostring (readobj=0x
#46003 0x0000555555ca5c8f in _zval_get_string_func (op=0x7ffff3423090)
#46004 0x0000555555cb4b68 in zend_make_printable_zval (expr=0x7ffff342
#46005 0x0000555555cae0a8 in concat_function (result=0x7ffff34230b0, o
#46006 0x0000555555d4069a in ZEND_CONCAT_SPEC_CONST_TMPVAR_HANDLER ()
#46007 0x0000555555db6336 in execute_ex (ex=0x7ffff3423030) at /tmp/bu
#46008 0x0000555555dbaa7d in zend_execute (op_array=0x7ffff3489300, re
#46009 0x0000555555cb8f1e in zend_execute_scripts (type=8, retval=0x0,
#46010 0x0000555555befaa9 in php_execute_script (primary_file=0x7fffff
#46011 0x0000555555dbd88f in do_cli (argc=2, argv=0x5555568f9380) at /
#46012 0x0000555555dbed26 in main (argc=2, argv=0x5555568f9380) at /tm

Slide 55

Slide 55 text

(take a breath)

Slide 56

Slide 56 text

Resources and Further Reading
● References about Maintaining and Extending PHP
  https://wiki.php.net/internals/references
● PHP Internals (Thomas Punt)
  https://phpinternals.net/
● Derick Rethans’ Blog
  https://derickrethans.nl/
● Nikita Popov’s Blog
  https://nikic.github.io/
● PHP Internals Book (Nikita and Julien Pauli)
  http://www.phpinternalsbook.com/

Slide 57

Slide 57 text

Thanks!
Jeremy Mikola
@jmikola

Slide 58

Slide 58 text

Photo Credits
● https://imgur.com/gallery/0hcTtiW
● http://inception.wikia.com/wiki/Fischer_inception_job?file=Cobb_using_the_Mr._Charles_tactic.png
● http://www.phpinternalsbook.com/
● https://support.cloud.engineyard.com/hc/en-us/articles/205411888-PHP-Performance-I-Everything-You-Need-to-Know-About-OpCode-Caches
● http://www.leonarddavid.com/wp-content/uploads/2015/04/death-grip.jpg
● http://weca.mp/2016/images/coach/sara.png