
Aleksandr Koshkin, Positive Technologies: Know and love your CPython in the name of the Moon and great justice

IT-People
August 05, 2016

Talk given at the PyCon Russia 2016 conference

Transcript

  1. { "talk": { "title": ( "Love your CPython in the

    name of " + "the Moon and the great justice :3" ), "event_id": "PyConRu_2016", }, "speaker": { "qname" : "Aleksandr Koshkin", "linkedin": "lnkfy.com/7Do", "g+": "lnkfy.com/7jV", "github" : "/magniff", } }
  2. What this talk is up to Consider this talk as

    a little story about - short reminder of what the interpreter basically is - how did we get there - my thoughts and complaints Structure: - Know your python - Love (censored) your python
  3. ABC lang Extremely high level - Ridiculously user friendly -

    More theoretical than practical - Terribly inefficient PUT {} IN collection FOR line IN document: FOR word IN split line: IF word not.in collection: INSERT word IN collection RETURN collection
  4. To be a more 'real-life' programming language Efficient thanks to the

    C backend Should be very practical - Simple design (fairly primitive runtime) - Lots of corners cut, even at the expense of code architecture - Tailored for solving real problems (modules like grep and so on) - Highly extensible with C code BDFL (just Guido back then) started Python
  5. - Relatively simple LL(1) grammar - Stack based virtual machine

    (iterative refinement) - Simple GC based on reference counting - Fairly simple object model - GIL - ... Simple design
  6. -- result = value + "hello world" -- load_var "value"

    # from where, btw load_const "hello world" # consts you say? binary_sum # how to perform the sum bind_to result # where to put the result NOTE: This model does not allow any procedural decomposition, so frames appear on the stage Decompose: - Instructions out of context interface == code objects - Context interface out of context == frames Simple design
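The four pseudo-instructions on this slide can be run by a toy stack machine. This is only a sketch: the `run` function, the `(opname, argument)` tuple encoding and the opname spelling are assumptions made for illustration, and CPython's real evaluation loop is written in C.

```python
def run(instructions, variables, constants):
    """Execute (opname, argument) pairs on a value stack (toy model)."""
    stack = []
    for opname, arg in instructions:
        if opname == "load_var":
            stack.append(variables[arg])      # look the name up in the context
        elif opname == "load_const":
            stack.append(constants[arg])      # constants are addressed by index
        elif opname == "binary_sum":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opname == "bind_to":
            variables[arg] = stack.pop()      # store the result in the context
        else:
            raise ValueError("unknown instruction: %s" % opname)
    return variables


# result = value + "hello world"
program = [
    ("load_var", "value"),
    ("load_const", 0),
    ("binary_sum", None),
    ("bind_to", "result"),
]
variables = run(program, {"value": "Oh, "}, ["hello world"])
print(variables["result"])
```

Here the `program` list plays the role of a code object (instructions plus the names they expect), while the `variables` dictionary plays the role of a frame's context.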
  7. >>> factorial(3) current value: 3 current frame: <frame object at

    0x0000000002FDE748> parent frame: <frame object at 0x0000000002FDEAC8> code object <code object factorial at 0x00000000028C1DB0> current value: 2 current frame: <frame object at 0x0000000002FDEC88> parent frame: <frame object at 0x0000000002FDE748> code object <code object factorial at 0x00000000028C1DB0> current value: 1 current frame: <frame object at 0x0000000002FDEE48> parent frame: <frame object at 0x0000000002FDEC88> code object <code object factorial at 0x00000000028C1DB0> Code and frames
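Output of this shape can be reproduced with `sys._getframe()`, which is a CPython-specific helper. The sketch below is an assumption about how such a trace could be collected (the extra `trace` parameter is hypothetical); it is not the code used on the slide.

```python
import sys


def factorial(value, trace):
    frame = sys._getframe()  # frame object of the current call (CPython only)
    # Record the value, this frame, its caller's frame and the shared code object.
    trace.append((value, frame, frame.f_back, frame.f_code))
    return 1 if value <= 1 else value * factorial(value - 1, trace)


trace = []
result = factorial(3, trace)
for value, frame, parent, code in trace:
    print("current value:", value)
    print("current frame:", frame)
    print("parent frame: ", parent)
    print("code object:  ", code)
```

Every call gets its own frame, each frame's `f_back` is the frame of the previous call, and all of them point at the same code object.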
  8. - Frame represents running code - Code represents instructions and

    interface for context PyObject * PyEval_EvalFrameEx(PyFrameObject *f, int throwflag) { co = f->f_code; for (;;) { switch (opcode) { TARGET(LOAD_FAST) { /* implementation of LOAD_FAST */ } TARGET(LOAD_CONST) { /* implementation of LOAD_CONST */ } TARGET(STORE_FAST) { /* implementation of STORE_FAST */ } ... } } } Frame evaluator
  9. TARGET(BINARY_ADD) { PyObject *right = POP(); PyObject *left = TOP();

    PyObject *sum; ... sum = PyNumber_Add(left, right); ... DISPATCH(); } C API
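What `PyNumber_Add` does for `left + right` can be approximated in pure Python. The `binary_add` function below is a deliberately simplified model and an assumption of this note: the real C implementation also gives priority to a right operand whose type is a subclass of the left one, and falls back to sequence concatenation slots.

```python
def binary_add(left, right):
    """Simplified model of the dispatch behind `left + right`."""
    # Special methods are looked up on the type, not on the instance.
    add = getattr(type(left), "__add__", None)
    if add is not None:
        result = add(left, right)
        if result is not NotImplemented:
            return result
    # The left operand gave up, so give the right operand a chance.
    radd = getattr(type(right), "__radd__", None)
    if radd is not None:
        result = radd(right, left)
        if result is not NotImplemented:
            return result
    raise TypeError(
        "unsupported operand type(s) for +: %r and %r"
        % (type(left).__name__, type(right).__name__)
    )


print(binary_add(123, 456))
print(binary_add(1, 2.5))
```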
  10. typedef struct _object { _PyObject_HEAD_EXTRA /* expands to nothing in

    release builds */ Py_ssize_t ob_refcnt; struct _typeobject *ob_type; } PyObject; PyObject
  11. typedef struct _typeobject { PyObject_VAR_HEAD /* PyObject as well */ ...

    getattrofunc tp_getattro; /* __getattribute__ */ setattrofunc tp_setattro; /* __setattr__ */ ... PyObject *tp_dict; /* cls.__dict__ */ initproc tp_init; /* __init__ */ newfunc tp_new; /* __new__ */ ... PyObject *tp_bases; /* __bases__ */ ... } PyTypeObject; Types in Python
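The object header can be observed from Python with `ctypes`. The sketch below assumes a regular CPython build (not a debug build with extra header fields, and not a free-threaded build), where `ob_type` sits right after a refcount field the size of one `Py_ssize_t`; `type_pointer` is a hypothetical helper name.

```python
import ctypes


def type_pointer(obj):
    """Read the ob_type field straight from the object's header."""
    # Assumption: the header is ob_refcnt (one Py_ssize_t) followed by ob_type.
    offset = ctypes.sizeof(ctypes.c_ssize_t)
    # In CPython, id(obj) is the address of the PyObject structure.
    return ctypes.c_void_p.from_address(id(obj) + offset).value


value = 12345
print(type_pointer(value) == id(type(value)))  # ob_type of an int points to int
print(type_pointer(int) == id(type))           # types are objects too
```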
  12. (gdb) call PyLong_FromLong(123) (PyObject *) 0x9836e0 (gdb) call PyLong_FromLong(456) (PyObject

    *) 0x7ffff7f64e50 (gdb) # Hmm, what is 0x9836e0 up to (gdb) call PyLong_Type.tp_as_number->nb_add( 0x9836e0, 0x7ffff7f64e50 ) (PyObject *) 0x7ffff7f64ea0 (gdb) call PyObject_Print(0x7ffff7f64ea0, stderr, 1) 579 # 123 + 456 == 579 No need for Python
  13. from github.username.project import module >>> sys.meta_path [ <class '_frozen_importlib.BuiltinImporter'>, <class

    '_frozen_importlib.FrozenImporter'>, <class '_frozen_importlib_external.PathFinder'> ] >>> sys.meta_path = [] >>> import re # cache lookup runs before hooks >>> import code # import machinery is ruined now ImportError: No module named 'code' Import hooks magic
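An import hook is an object on `sys.meta_path` with a `find_spec` method. The following is a minimal sketch of one that serves modules from an in-memory dictionary; `VirtualFinder` and the module name `virtual_demo` are made up for illustration.

```python
import importlib.abc
import importlib.util
import sys


class VirtualFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Finder and loader in one: serves modules from a dict of sources."""

    def __init__(self, sources):
        self.sources = sources

    def find_spec(self, fullname, path=None, target=None):
        if fullname in self.sources:
            return importlib.util.spec_from_loader(fullname, self)
        return None  # let the next finder on sys.meta_path try

    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        exec(self.sources[module.__name__], module.__dict__)


finder = VirtualFinder({"virtual_demo": "answer = 6 * 7"})
sys.meta_path.insert(0, finder)
import virtual_demo
print(virtual_demo.answer)

# With the hook removed, the import still succeeds: the sys.modules
# cache is consulted before any finder is asked.
sys.meta_path.remove(finder)
import virtual_demo as cached
print(cached is virtual_demo)
```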
  14. __builtins__, __import__ and __build_class__ import magic # https://github.com/magniff/magic >>> def

    builder(default_builder, *args, **kwargs): ... print(args, kwargs) ... return default_builder(*args, **kwargs) ... >>> with magic.wonderland(builder): ... class MyAwesomeClass(int, metaclass=type): pass ... ( <function MyAwesomeClass at 0x7f1abf5cceb8>, 'MyAwesomeClass', <class 'int'> ) {'metaclass': <class 'type'>} >>> MyAwesomeClass <class '__main__.MyAwesomeClass'>
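A context manager with this behaviour can be sketched by temporarily replacing `builtins.__build_class__`, the function every `class` statement calls. This `wonderland` is an assumption about how such a hook might work, not the implementation from the `magic` package.

```python
import builtins
import contextlib


@contextlib.contextmanager
def wonderland(builder):
    """Route every class statement in the block through `builder`."""
    default_builder = builtins.__build_class__

    def hooked(*args, **kwargs):
        return builder(default_builder, *args, **kwargs)

    builtins.__build_class__ = hooked
    try:
        yield
    finally:
        builtins.__build_class__ = default_builder  # always restore


calls = []


def builder(default_builder, *args, **kwargs):
    # args is (class body function, class name, *bases)
    calls.append((args[1], args[2:], kwargs))
    return default_builder(*args, **kwargs)


with wonderland(builder):
    class MyAwesomeClass(int, metaclass=type):
        pass

print(calls)
```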
  15. Attribute descriptors and whatnot >>> def factorial(self): ... return 1

    if not self else factorial(self-1)*self ... >>> ten_factorial = factorial.__get__(10, int) >>> ten_factorial() 3628800 >>> print(ten_factorial) <bound method factorial of 10> >>> (10).factorial AttributeError: 'int' object has no attribute 'factorial'
  16. Descriptors and whatnot >>> s = SomeClass() >>> s.foo =

    100 >>> s.__dict__ {'foo': 100} >>> SomeClass.__dict__['__dict__'].__get__(s, SomeClass) {'foo': 100}
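The slide does not show `SomeClass`; a self-contained version, assuming an empty class body, could look like this. It shows that `s.__dict__` is itself served by a descriptor stored in the class namespace.

```python
class SomeClass:
    pass


s = SomeClass()
s.foo = 100

# The instance __dict__ is exposed through a descriptor that lives
# in the class's own namespace under the key '__dict__'.
dict_descriptor = SomeClass.__dict__["__dict__"]
print(type(dict_descriptor).__name__)
print(dict_descriptor.__get__(s, SomeClass))
print(dict_descriptor.__get__(s, SomeClass) is s.__dict__)
```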
  17. Magic methods >>> class MagicClass: ... pass ... >>> MagicClass.__call__

    = lambda self, value: value**2 >>> m = MagicClass() >>> m(10) # wow, but __call__ maps to tp_call 100
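The reason this works is that assigning `__call__` on the class updates the type's call slot, and `m(...)` consults the type, not the instance. A small sketch of the difference (the instance-level lambda is added here only for contrast):

```python
class MagicClass:
    pass


m = MagicClass()

# An attribute on the instance is ignored by the call syntax.
m.__call__ = lambda value: value ** 3
print(callable(m))

# An attribute on the class is picked up: the type's call slot is updated.
MagicClass.__call__ = lambda self, value: value ** 2
print(callable(m))
print(m(10))            # goes through the type
print(m.__call__(10))   # plain attribute lookup finds the instance lambda
```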
  18. super() + c3 linearization class BaseClass: def foo(self): print('From BaseClass!!!')

    class DerivatedClass(BaseClass): pass >>> def foo(self): ... return super().foo() >>> DerivatedClass().foo() RuntimeError: super(): __class__ cell not found
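Zero-argument `super()` relies on a `__class__` cell that the compiler creates only for functions defined inside a class body. A function defined elsewhere can still be attached later if it names the class explicitly; the sketch below assumes that this is the behaviour wanted.

```python
class BaseClass:
    def foo(self):
        return "From BaseClass!!!"


class DerivatedClass(BaseClass):
    pass


def broken(self):
    return super().foo()  # no __class__ cell here


def foo(self):
    return super(DerivatedClass, self).foo()  # explicit two-argument form


DerivatedClass.broken = broken
DerivatedClass.foo = foo

print(DerivatedClass().foo())
try:
    DerivatedClass().broken()
except RuntimeError as error:
    print("RuntimeError:", error)
```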
  19. All those things require magic eval + exec + compile

    function, ast module, dis module, Parser module, GC module CodeType, FrameType, memoryviews ... namedtuple, jinja2, greenlets ...
  20. - Python code as a string - Python AST -

    Python CodeObject - Internal machinery written in C - Binary layout of objects So what are the points of interest
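The first three levels can be walked through with the standard library alone. This is a small sketch; the source string and the `<demo>` filename are arbitrary.

```python
import ast
import types

source = "result = value + 'hello world'"

tree = ast.parse(source, mode="exec")    # string -> AST
code = compile(tree, "<demo>", "exec")   # AST -> code object

namespace = {"value": "Oh, "}
exec(code, namespace)                    # code object -> evaluation

print(type(tree).__name__, type(code).__name__)
print(code.co_names)
print(namespace["result"])
```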
  21. import ctypes class LLReader: def __get__(self, obj, klass=None): return ctypes.cast(

    id(obj), ctypes.POINTER(ctypes.c_ubyte) ) def __set__(self, obj, value): pass # we actually need this method ctypes.cast( id(object)+type.__dictoffset__, ctypes.POINTER(ctypes.py_object) )[0]['dump'] = LLReader() >>> (100).dump[12]=200 # index varies from build to build >>> 100+100 400 # yep, 200+200 == 400 PyOM (Python Object Massacre)
  22. >>> import sys >>> import hacky >>> int_100_memory = [

    hacky.read_memory_in(id(100)+shift) for shift in range(sys.getsizeof(100)) ] >>> print(int_100_memory) [ 32, 18, 143, 0, 0, 0, 0, 0, # 9376288 # py_object(99) 128, 18, 143, 0, 0, 0, 0, 0, # py_object(101) 12, 0, 0, 0, 0, 0, 0, 0, # ob_refcnt 224, 228, 137, 0, 0, 0, 0, 0, # id(int) 1, 0, 0, 0, 0, 0, 0, 0, 100, 0, 0, 0 ] # 99 -> 100 -> 101 # 9376288 -> 9376336 -> 9376384 >>> hacky.write_memory_in(id(100)+40, 200) # ctypes.memset >>> 100 200 https://github.com/magniff/hacky
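A read-only version of this dump needs nothing beyond `ctypes`. The helper `read_memory` below is a hypothetical stand-in, not the `hacky` implementation, and the layout assumption (first digit stored right after `int.__basicsize__` bytes) holds for regular 64-bit CPython builds; the slide's dump has two extra pointers at the front because it comes from a debug build.

```python
import ctypes
import sys


def read_memory(obj):
    """Return the raw bytes of a CPython object (read-only)."""
    return ctypes.string_at(id(obj), sys.getsizeof(obj))


value = 100
raw = read_memory(value)

# Assumption: the first digit of the int follows the fixed-size header.
offset = int.__basicsize__
digit = int.from_bytes(raw[offset:offset + int.__itemsize__], sys.byteorder)

print(list(raw))
print(digit)
```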
  23. >>> "hello world".__class__ = type TypeError: __class__ assignment only supported

    for heap types or ModuleType subclasses >>> class VerboseModule(ModuleType): ... def __getattribute__(self, attr_name): ... print("Looking up...") # warning: recursion ... return super().__getattribute__(attr_name) ... >>> re.__class__ = VerboseModule >>> re Looking up attribute '__loader__' of module 're' Looking up attribute '__spec__' of module 're' <module 're' from '/home/magniff/workspace/cpython/Lib/re.py'> https://github.com/magniff/hacky
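A complete version of the verbose module might look as follows. To keep the standard library intact it works on a throwaway module named `demo`, and it reads the module name through `object.__getattribute__` to avoid the recursion mentioned in the comment; both choices are assumptions of this sketch.

```python
import types

lookups = []


class VerboseModule(types.ModuleType):
    def __getattribute__(self, attr_name):
        # Bypass our own hook for bookkeeping, otherwise we would recurse.
        module_name = object.__getattribute__(self, "__name__")
        lookups.append(attr_name)
        print("Looking up attribute %r of module %r" % (attr_name, module_name))
        return super().__getattribute__(attr_name)


demo = types.ModuleType("demo")
demo.answer = 42
demo.__class__ = VerboseModule  # allowed: the target is a ModuleType subclass

print(demo.answer)
```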
  24. >>> class MyInt(int): ... def __repr__(self): ... return "This is

    %s." % self ... >>> hacky.set_class(100, MyInt) >>> 100 This is 100. >>> 50+50 This is 100. https://github.com/magniff/hacky
  25. >>> class MyInt(int): ... def __get__(self, instance, klass): ... print("Running

    __get__ of instance %s." % self) ... return self ... def __set__(self, instance, value): ... print("No way, man!") ... >>> class Some: ... foo = 100 ... >>> s = Some() >>> hacky.set_class(100, MyInt) >>> s.foo Running __get__ of instance 100. 100 >>> s.foo = 200 No way, man! Ints as descriptors (wait what?)
  26. >>> a = "hello world" >>> a[0] = "H" TypeError:

    'str' object does not support item assignment >>> Mutable strings, for god's sake
  27. >>> a = "hello world" >>> ord("h") 104 # ascii

    code for "h" >>> ord("e") 101 >>> hacky.read_memory_in(id(a)+64) # this is "h" 104 >>> hacky.read_memory_in(id(a)+65) # this is "e" 101 # WHY 64, 65? You don't want to know, trust me. Mutable strings, for god's sake
  28. >>> class MutableString(str): ... def __setitem__(self, index, item): ... if

    index < len(self): ... hacky.write_memory_in(id(self)+64+index, ord(item)) ... else: ... raise IndexError( ... "Object mutation out of boundary!" ... ) ... >>> a = "hello world" >>> hacky.set_class(a, MutableString) >>> type(a) <class '__main__.MutableString'> Mutable strings, for god's sake
  29. >>> a[0] = "H" >>> print(a) Hello world >>> a[100]

    = "H" IndexError: Object mutation out of boundary! Mutable strings, for god's sake
  30. >>> import types >>> class MyFunction(types.FunctionType): ... pass ... TypeError:

    type 'function' is not an acceptable base type >>> class MyBool(bool): pass TypeError: type 'bool' is not an acceptable base type $ grep "is not an acceptable base type" -rn * Objects/typeobject.c:1959 Objects/typeobject.c:2742 Inheritable types
  31. ... if (!PyType_HasFeature(base, Py_TPFLAGS_BASETYPE)) { PyErr_Format( PyExc_TypeError, "type '%.100s' is

    not an acceptable base type", base->tp_name ); ... >>> bin(hacky.get_flags(bool) ^ hacky.get_flags(int)) 0b10000000010000000000 ^ Py_TPFLAGS_BASETYPE Inheritable types
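The same flag can be inspected without touching memory, because type objects expose their `tp_flags` as the read-only `__flags__` attribute. A minimal sketch (the helper name `is_acceptable_base` is made up):

```python
import types

Py_TPFLAGS_BASETYPE = 1 << 10  # value taken from CPython's object.h


def is_acceptable_base(cls):
    """True if the type may be subclassed, judging by its tp_flags."""
    return bool(cls.__flags__ & Py_TPFLAGS_BASETYPE)


for cls in (int, bool, types.FunctionType, object):
    print(cls.__name__, is_acceptable_base(cls))
```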
  32. >>> hacky.set_flags( types.FunctionType, hacky.get_flags(types.FunctionType) | (1<<10) ) >>> class MyFunction(types.FunctionType):

    pass ... >>> >>> hacky.set_flags(object, hacky.get_flags(object) ^ (1<<10)) >>> class T: pass TypeError: type 'object' is not an acceptable base type Inheritable types
  33. Recursion >>> def factorial(value): ... return 1 if not value

    else value * factorial(value-1) ... >>> factorial(10) 3628800 >>> factorial(998) == factorial(997) * 998 # 999 leads to crash True >>> sys.getrecursionlimit() 1000
  34. Disassembly >>> dis.dis(factorial) 2 0 LOAD_FAST 0 (value) 3 POP_JUMP_IF_TRUE

    10 6 LOAD_CONST 1 (1) 9 RETURN_VALUE >> 10 LOAD_FAST 0 (value) 13 LOAD_GLOBAL 0 (factorial) 16 LOAD_FAST 0 (value) 19 LOAD_CONST 1 (1) 22 BINARY_SUBTRACT 23 CALL_FUNCTION 1 (blah blah) 26 BINARY_MULTIPLY 27 RETURN_VALUE
  35. >>> import opcode >>> opcode.opmap["LOAD_FAST"] 124 >>> hacky.read_memory_in(id(factorial.__code__.co_code)+48) 124 >>>

    hacky.write_memory_in(id(factorial.__code__.co_code)+48+13, 9) >>> hacky.write_memory_in(id(factorial.__code__.co_code)+48+14, 9) >>> hacky.write_memory_in(id(factorial.__code__.co_code)+48+15, 9) O_O
  36. >>> dis.dis(factorial) 2 0 LOAD_FAST 0 (value) 3 POP_JUMP_IF_TRUE 10

    6 LOAD_CONST 1 (1) 9 RETURN_VALUE >> 10 LOAD_FAST 0 (value) 13 NOP 14 NOP 15 NOP 16 LOAD_FAST 0 (value) 19 LOAD_CONST 1 (1) 22 BINARY_SUBTRACT 23 CALL_FUNCTION 1 (blah blah) 26 BINARY_MULTIPLY 27 RETURN_VALUE O_O
  37. >>> factorial(10) TypeError: 'int' object is not callable >>> hacky.write_memory_in(id(factorial.__code__.co_code)+48+23,

    86) >>> hacky.write_memory_in(id(factorial.__code__.co_code)+48+24, 9) >>> hacky.write_memory_in(id(factorial.__code__.co_code)+48+25, 9) O_O
  38. >>> dis.dis(factorial) 2 0 LOAD_FAST 0 (value) 3 POP_JUMP_IF_TRUE 10

    6 LOAD_CONST 1 (1) 9 RETURN_VALUE >> 10 LOAD_FAST 0 (value) 13 NOP 14 NOP 15 NOP 16 LOAD_FAST 0 (value) 19 LOAD_CONST 1 (1) 22 BINARY_SUBTRACT 23 YIELD_VALUE 24 NOP 25 NOP 26 BINARY_MULTIPLY 27 RETURN_VALUE \(O_O)/
  39. \(>_<)/ >>> factorial(10) 9 >>> factorial(10) 9 >>> def gen():

    yield >>> gen.__code__.co_flags 99 >>> factorial.__code__.co_flags 67 >>> hacky.read_memory_in(id(factorial.__code__)+48) 67
  40. |(+_+)| >>> hacky.write_memory_in(id(factorial.__code__)+48, 99) >>> factorial.__code__.co_flags 99 >>> factorial(3) <generator

    object factorial at 0x7f6ae975b0e0> >>> f_three = _ >>> next(f_three) 2 >>> f_three.send(2) StopIteration: 6 # result
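The difference between 67 and 99 is a single bit, 32, which the `inspect` module exposes as `CO_GENERATOR`. The sketch below only reads the flag; the absolute `co_flags` values vary between Python versions, so it prints them rather than assuming 67 and 99.

```python
import inspect


def plain():
    return None


def gen():
    yield


print(inspect.CO_GENERATOR)
print(plain.__code__.co_flags, gen.__code__.co_flags)
print(bool(plain.__code__.co_flags & inspect.CO_GENERATOR))
print(bool(gen.__code__.co_flags & inspect.CO_GENERATOR))
```

This bit is what decides whether calling the function runs its body or returns a generator object, which is why flipping it in memory turns `factorial` into a generator function.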
  41. |(+_+)| >>> f_ten = factorial(10) >>> hacky.write_memory_in( id(factorial.__code__.co_code)+48+27, 9 )

    >>> f_ten = factorial(10) >>> next(f_ten) 9 >>> f_ten.send(1) SystemError: unknown opcode
  42. https://github.com/magniff/endless def factorial(value): return 1 if value == 1 else

    value * factorial(value-1) @endless.make def factorial(value): return 1 if value == 1 else value * (yield {'value': value-1}) @endless.make def mccarthy91(value): # F(F(value)) == (yield (yield ...)) if value > 100: return value - 10 else: return (yield {'value': (yield {'value': value+11})})
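The idea of yielding the arguments of the recursive call can be driven by a small trampoline that keeps its own stack of generators. The `make` decorator below is a sketch written for this note under that assumption; it is not the implementation from the `endless` package, and it only accepts keyword arguments.

```python
def make(func):
    """Run a generator-based recursive function with an explicit stack."""

    def runner(**kwargs):
        stack = [func(**kwargs)]
        result = None
        while stack:
            try:
                # Resume the innermost call, handing it the last result.
                request = stack[-1].send(result)
            except StopIteration as stop:
                stack.pop()            # the call returned
                result = stop.value
            else:
                stack.append(func(**request))  # the call asked for a sub-call
                result = None          # a fresh generator must be started with None
        return result

    return runner


@make
def factorial(value):
    return 1 if value == 1 else value * (yield {"value": value - 1})


@make
def mccarthy91(value):
    if value > 100:
        return value - 10
    return (yield {"value": (yield {"value": value + 11})})


print(factorial(value=10))
print(mccarthy91(value=50))
print(len(str(factorial(value=3000))))  # far deeper than the default recursion limit
```

The depth is now bounded by the length of a Python list instead of the interpreter's recursion limit.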
  43. - Run CPython in tracee mode (ptrace, beeaaah) -

    Hack the CPython binary before start (ugly) - Somehow hack CPython's .data section at runtime (ugly and impossible) - Do some code injection from .so (dll) maybe (unstable) - Rework CPython to be more injection friendly - Then inject some C function from a C extension - Pass control flow back to Python code Nevertheless
  44. int PyRun_InteractiveLoopFlags(fp, filename_str, flags) { ... for (;;) { ret

    = PyRun_InteractiveOneObject(fp, filename, flags); _PY_DEBUG_PRINT_TOTAL_REFS(); if (ret == E_EOF) { err = 0; break; } ... } ... return err; } Custom REPL implementation
  45. Console of a healthy person int PyRun_InteractiveOneObject(fp, filename, flags) {

    mod = PyParser_ASTFromFileObject( fp, filename, enc, Py_single_input, ps1, ps2, flags, &errcode, arena ); ... v = run_mod( mod, filename, main_module_dict, main_module_dict, flags, arena ); ... }
  46. Smoker console int PyRun_InteractiveOneObject(fp, filename, flags) { ... techno_module =

    PyImport_ImportModule("techno"); module_entry = PyObject_GetAttr( techno_module, PyUnicode_FromString("main_entry") ); read = getline(&line, &len, fp); /* only one-liners will do */ ... line_object = PyUnicode_FromString(line); /* REPL input */ PyObject_Call( module_entry, entry_args=PyTuple_Pack(1, line_object), NULL ); ... }
  47. The techno module import __main__ import parser G = L

    = __main__.__dict__ DEFAULT_IMPLEMENTATION = ( lambda string: eval(parser.compilest(parser.suite(string)), G, L) ) IMPLEMENTATION = DEFAULT_IMPLEMENTATION def main_entry(string): global IMPLEMENTATION if string.strip() == 'reset': IMPLEMENTATION = DEFAULT_IMPLEMENTATION IMPLEMENTATION(string)
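The `parser` module used here was removed in Python 3.10, so on a current interpreter the same swappable-evaluator idea has to be built on `compile` and `exec`. The following sketch makes two further assumptions: it evaluates into its own `NAMESPACE` dictionary instead of `__main__`, and it adds a hypothetical `set_implementation` helper.

```python
NAMESPACE = {}


def default_implementation(string):
    # Compile one line of input and run it in the shared namespace.
    exec(compile(string, "<techno-sketch>", "exec"), NAMESPACE)


IMPLEMENTATION = default_implementation


def set_implementation(func):
    """Swap the function that receives every input line."""
    global IMPLEMENTATION
    IMPLEMENTATION = func


def main_entry(string):
    if string.strip() == "reset":
        set_implementation(default_implementation)
        return  # do not try to evaluate the word 'reset' itself
    IMPLEMENTATION(string)


main_entry("x = 6 * 7")
set_implementation(lambda string: print(string.strip()[::-1]))
main_entry('print("PyCon Russia 2016!")')
main_entry("reset")
main_entry("print(x)")
```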
  48. Little demo <magniff`s mode> print("PyCon Russia 2016!") PyCon Russia 2016!

    <magniff`s mode> import techno <magniff`s mode> variable = "Lets check context facility" <magniff`s mode> print(variable) Lets check context facility <magniff`s mode> <magniff`s mode> <magniff`s mode> techno.IMPLEMENTATION = ( ... lambda string: print(string.strip()[::-1]) ... ) <magniff`s mode> print("PyCon Russia 2016!") )"!6102 aissuR noCyP"(tnirp <magniff`s mode> reset <magniff`s mode> print("PyCon Russia 2016!") PyCon Russia 2016! <magniff`s mode>
  49. Little demo <magniff`s mode> import mybasic <magniff`s mode> # import

    hacks console evaluator <magniff`s mode> PROGRAM PYCON2016 <magniff`s mode> 10 LET msg = "HELLO WORLD!" <magniff`s mode> 20 PRINT "MSG WAS: " + msg <magniff`s mode> 30 GOTO 20 <magniff`s mode> RUN PYCON2016 MSG WAS: HELLO WORLD! MSG WAS: HELLO WORLD! MSG WAS: HELLO WORLD! MSG WAS: HELLO WORLD! MSG WAS: HELLO WORLD! MSG WAS: HELLO WORLD! MSG WAS: HELLO WORLD! MSG WAS: HELLO WORLD! ...
  50. https://github.com/magniff/techno >>> import techno >>> techno.init() # hacks interpreter internals

    >>> Frames evaluated: 1 # even though the input is empty, there is still one empty frame to evaluate >>> import re Frames evaluated: 1 >>> import code # a module that has not been cached yet Frames evaluated: 5301
  51. https://github.com/magniff/techno >>> import functools Frames evaluated: 1 >>> @functools.lru_cache(maxsize=None) ...

    def factorial(value): ... return 1 if not value else value * factorial(value-1) ... Frames evaluated: 4 >>> factorial(10) 3628800 Frames evaluated: 12 >>> factorial(10) 3628800 Frames evaluated: 1
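A rough approximation of this counter is possible without patching the interpreter, by counting `call` events delivered to a `sys.setprofile` hook. This is a different technique from the one `techno` uses, and `count_frames` is a hypothetical helper; the lambda passed in stands in for the frame of the REPL line.

```python
import functools
import sys


def count_frames(thunk):
    """Run thunk() and count the Python frames entered while it runs."""
    counter = 0

    def profiler(frame, event, arg):
        nonlocal counter
        if event == "call":  # a Python-level frame is being entered
            counter += 1

    sys.setprofile(profiler)
    try:
        result = thunk()
    finally:
        sys.setprofile(None)
    return result, counter


@functools.lru_cache(maxsize=None)
def factorial(value):
    return 1 if not value else value * factorial(value - 1)


first = count_frames(lambda: factorial(10))
second = count_frames(lambda: factorial(10))
print(first)
print(second)
```

The first run should enter eleven more frames than the second, one for each of `factorial(10)` down to `factorial(0)`; on the second run every result comes from the cache, so only the wrapper frame remains.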
  52. https://github.com/magniff/techno >>> class A(): pass ... Frames evaluated: 2 >>>

    Point = collections.namedtuple("Point", ("x", "y", "z")) Frames evaluated: 15 >>> a = A() Frames evaluated: 1
  53. https://github.com/magniff/techno >>> hello world Frames evaluated: 0 >>> import not_a_module

    ImportError: No module named 'not_a_module' Frames evaluated: 119 >>> techno.reset() # back to normal Frames evaluated: 1 >>> print("hello world") hello world >>>
  54. :3