Digging into the CPython Interpreter's Internals - PyCon Korea 24

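This talk digs into the internals of CPython, Python's primary interpreter. It is aimed at advanced users who want to build the interpreter and understand the functionality inside it.
In particular, it focuses on understanding Python's internal structure and the core system of the Python language itself.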

Sungmin Han

October 27, 2024

Transcript

  1. Speaker: Sungmin Han (한성민)

    Google Developer Experts (GDE) for AI/ML and Cloud
    Google Developer Groups (GDG) for Go
    Google Cloud Champion Innovator for Modern Architecture
    F-Lab Python Mentor
    Former Head of Tech at Riiid
    Former Research Engineer at Naver Clova
    Former Software Engineer at IGAWorks
    Former Software Engineer at Simsimi
  2. CPython is both an interpreter and a compiler: it compiles Python source code into bytecode and then interprets that bytecode.
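The two stages can be observed from Python itself. The following is a minimal sketch using only the standard library; the source string and the `<example>` filename are illustrative, and the exact bytecode listing varies between Python versions.

```python
import dis

source = "total = 1 + 2 * 3"

# Stage 1: compile the source text into a code object that holds bytecode.
code = compile(source, "<example>", "exec")
print(type(code).__name__)   # code

# Show a human-readable listing of the bytecode instructions.
dis.dis(code)

# Stage 2: the interpreter's evaluation loop executes the bytecode.
namespace = {}
exec(code, namespace)
print(namespace["total"])    # 7
```

The listing printed by `dis.dis` is a convenient way to check how a change to the compiler or evaluation loop affects the generated instructions.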
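  3. CPython source tree:

        CPython/
        ├── Doc/        # Official documentation and build-related information
        ├── Grammar/    # Python's EBNF grammar file
        ├── Include/    # Interpreter-wide header files
        ├── Lib/        # Standard library implemented in pure Python
        ├── Mac/        # macOS-specific code
        ├── Misc/       # Miscellaneous developer documentation
        ├── Modules/    # Standard library code implemented in C
        ├── Objects/    # Code for the built-in types
        ├── PC/         # Windows-specific code
        ├── PCbuild/    # MSVC build files
        ├── Parser/     # Parser and AST node definitions
        ├── Programs/   # Source code for the CPython interpreter executable
        ├── Python/     # Core CPython runtime code
        └── Tools/      # Tools for maintaining Python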
  4. Installing build dependencies (Ubuntu):

        # Add to /etc/apt/sources.list to enable source packages
        deb-src http://archive.ubuntu.com/ubuntu/ jammy main

        sudo apt-get update
        sudo apt-get build-dep python3
        sudo apt-get install pkg-config
        sudo apt-get install build-essential gdb lcov pkg-config \
              libbz2-dev libffi-dev libgdbm-dev libgdbm-compat-dev liblzma-dev \
              libncurses5-dev libreadline6-dev libsqlite3-dev libssl-dev \
              lzma lzma-dev tk-dev uuid-dev zlib1g-dev libmpdec-dev

        https://devguide.python.org/getting-started/setup-building/#install-dependencies
  5. Argument Clinic declaration for sum():

        /*[clinic input]
        sum as builtin_sum

            iterable: object
            /
            start: object(c_default="NULL") = 0

        Return the sum of a 'start' value (default: 0) plus an iterable of numbers

        When the iterable is empty, return the start value.
        This function is intended specifically for use with numeric values and may
        reject non-numeric types.
        [clinic start generated code]*/

        static PyObject *
        builtin_sum_impl(PyObject *module, PyObject *iterable, PyObject *start)
        /*[clinic end generated code: output=df758cec7d1d302f input=162b50765250d222]*/
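The clinic block above also determines the signature that Python code sees for `sum()`. As a small sketch, assuming a CPython build in which the builtin exposes its text signature (true for recent releases), the declaration can be inspected like this:

```python
import inspect

sig = inspect.signature(sum)
print(sig)

iterable_param = sig.parameters["iterable"]
start_param = sig.parameters["start"]

# The lone "/" in the clinic block makes `iterable` positional-only.
print(iterable_param.kind is inspect.Parameter.POSITIONAL_ONLY)

# "= 0" in the clinic block is the Python-visible default for `start`,
# while c_default="NULL" is what the C implementation receives when it is omitted.
print(start_param.default)

print(sum([1, 2, 3], start=10))  # 16
```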
  6. builtin_sum_impl (simplified):

        static PyObject *
        builtin_sum_impl(PyObject *module, PyObject *iterable, PyObject *start)
        {
            PyObject *result = start;
            PyObject *temp, *item, *iter;

            iter = PyObject_GetIter(iterable);
            if (iter == NULL)
                return NULL;

            /* ... validation ... */

            for (;;) {
                item = PyIter_Next(iter);
                if (item == NULL) {
                    /* End of the iterator, or an error while iterating. */
                    if (PyErr_Occurred())
                        Py_CLEAR(result);
                    break;
                }
                temp = PyNumber_Add(result, item);
                Py_DECREF(result);
                Py_DECREF(item);
                result = temp;
                if (result == NULL)
                    break;
            }
            Py_DECREF(iter);
            return result;
        }
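For readers less familiar with the C API, the loop above corresponds roughly to the following pure-Python sketch. `py_sum` is a hypothetical name used only for illustration; it omits the validation step and the numeric fast paths of the real builtin.

```python
def py_sum(iterable, start=0):
    """Pure-Python sketch of the loop in builtin_sum_impl (no validation, no fast paths)."""
    result = start
    iterator = iter(iterable)        # PyObject_GetIter
    while True:
        try:
            item = next(iterator)    # PyIter_Next
        except StopIteration:
            break                    # in C: PyIter_Next returns NULL with no error set
        result = result + item       # PyNumber_Add
    return result


print(py_sum([1, 2, 3]))         # 6
print(py_sum([0.5, 0.25], 10))   # 10.75
print(py_sum([]))                # 0
```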
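  7. builtin_sum_impl, modified to return twice the sum:

        static PyObject *
        builtin_sum_impl(PyObject *module, PyObject *iterable, PyObject *start)
        {
            PyObject *result = start;
            PyObject *temp, *item, *iter;

            iter = PyObject_GetIter(iterable);
            if (iter == NULL)
                return NULL;

            /* ... validation ... */

            for (;;) {
                item = PyIter_Next(iter);
                if (item == NULL) {
                    /* End of the iterator, or an error while iterating. */
                    if (PyErr_Occurred())
                        Py_CLEAR(result);
                    break;
                }
                temp = PyNumber_Add(result, item);
                Py_DECREF(result);
                Py_DECREF(item);
                result = temp;
                if (result == NULL)
                    break;
            }
            Py_DECREF(iter);
            if (result == NULL)
                return NULL;

            /* Experiment: multiply the final result by two. */
            PyObject *two = PyLong_FromLong(2);
            if (two == NULL) {
                Py_DECREF(result);
                return NULL;
            }
            PyObject *result_times_two = PyNumber_Multiply(result, two);
            Py_DECREF(two);
            Py_DECREF(result);
            return result_times_two;
        }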
  8. REPL startup traceback:

        Python 3.14.0a1+ (heads/main-dirty:9c01db40aa5, Oct 27 2024, 06:00:07) [Clang 14.0.3 (clang-1403.0.22.14.1)] on darwin
        Type "help", "copyright", "credits" or "license" for more information.
        Traceback (most recent call last):
          File "/usr/local/lib/python3.14/runpy.py", line 198, in _run_module_as_main
            return _run_code(code, main_globals, None, "__main__", mod_spec)
          File "/usr/local/lib/python3.14/runpy.py", line 88, in _run_code
            exec(code, run_globals)
            ~~~~^^^^^^^^^^^^^^^^^^^
          File "/usr/local/lib/python3.14/_pyrepl/__main__.py", line 6, in <module>
            __pyrepl_interactive_console()
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
          ...
          File "/usr/local/lib/python3.14/_pyrepl/completing_reader.py", line 261, in calc_screen
            screen = super().calc_screen()
          File "/usr/local/lib/python3.14/_pyrepl/reader.py", line 413, in calc_screen
            self.cxy = self.pos2xy()
                       ~~~~~~~~~~~^^
          File "/usr/local/lib/python3.14/_pyrepl/reader.py", line 595, in pos2xy
            p, l2 = self.screeninfo[y]
                    ~~~~~~~~~~~~~~~^^^
  9. Lib/_pyrepl/reader.py:

        def max_column(self, y: int) -> int:
            """Return the last x-offset for line y"""
            return self.screeninfo[y][0] + sum(self.screeninfo[y][1])
  10. Lib/_pyrepl/reader.py (modified):

        def max_column(self, y: int) -> int:
            """Return the last x-offset for line y"""
            return self.screeninfo[y][0] + (sum(self.screeninfo[y][1]) // 2)
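To see why a `sum()` that returns twice the real total throws off this column calculation, and why halving compensates, here is a small self-contained sketch. `doubled_sum` is a hypothetical stand-in for the experimentally modified builtin, and the shape of `screeninfo` (a prompt width plus a list of per-character widths for each line) is an assumption made for illustration.

```python
def doubled_sum(values, start=0):
    """Stand-in for the modified builtin: returns twice the real sum."""
    return sum(values, start) * 2


# Assumed shape: one (prompt width, per-character widths) entry per line.
screeninfo = [(4, [1, 1, 1, 1, 1])]   # a 4-column prompt followed by 5 one-column characters


def max_column(y, sum_fn):
    """Original formula, parameterised over the sum implementation."""
    return screeninfo[y][0] + sum_fn(screeninfo[y][1])


def max_column_halved(y, sum_fn):
    """Modified formula that halves the result of the sum."""
    return screeninfo[y][0] + (sum_fn(screeninfo[y][1]) // 2)


print(max_column(0, sum))                  # 9
print(max_column(0, doubled_sum))          # 14
print(max_column_halved(0, doubled_sum))   # 9
```

With the doubled sum, the computed column overshoots the real line width, which is consistent with the REPL failing on a screen position lookup.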
  11. io_uring: a Linux kernel system call interface for asynchronous I/O on storage devices. It addresses performance issues with similar interfaces, such as read()/write() and aio_read()/aio_write(), for operations on data accessed through file descriptors.
  12. Reading files with io_uring: main()

        int main(int argc, char *argv[]) {
            struct io_uring ring;

            if (argc < 2) {
                fprintf(stderr, "Usage: %s [file name] <[file name] ...>\n", argv[0]);
                return 1;
            }

            io_uring_queue_init(QUEUE_DEPTH, &ring, 0);

            for (int i = 1; i < argc; i++) {
                int ret = submit_read_request(argv[i], &ring);
                if (ret) {
                    fprintf(stderr, "Error reading file: %s\n", argv[i]);
                    return 1;
                }
                get_completion_and_print(&ring);
            }

            io_uring_queue_exit(&ring);
            return 0;
        }
  13. Reading files with io_uring: submit_read_request()

        int submit_read_request(char *file_path, struct io_uring *ring) {
            int file_fd = open(file_path, O_RDONLY);
            if (file_fd < 0) {
                perror("open");
                return 1;
            }

            off_t file_sz = get_file_size(file_fd),
                  bytes_remaining = file_sz,
                  offset = 0;
            int current_block = 0,
                blocks = (int) file_sz / BLOCK_SZ;
            if (file_sz % BLOCK_SZ) blocks++;

            struct file_info *fi = malloc(sizeof(*fi) + (sizeof(struct iovec) * blocks));
            char *buff = malloc(file_sz);

            while (bytes_remaining) {
                off_t bytes_to_read = bytes_remaining;
                if (bytes_to_read > BLOCK_SZ)
                    bytes_to_read = BLOCK_SZ;

                /* Each iovec covers one block-sized slice of the shared buffer. */
                fi->iovecs[current_block].iov_base = buff + offset;
                fi->iovecs[current_block].iov_len = bytes_to_read;

                offset += bytes_to_read;
                current_block++;
                bytes_remaining -= bytes_to_read;
            }
            fi->file_sz = file_sz;

            /* Queue a single vectored read for the whole file and submit it. */
            struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
            io_uring_prep_readv(sqe, file_fd, fi->iovecs, blocks, 0);
            io_uring_sqe_set_data(sqe, fi);
            io_uring_submit(ring);
            return 0;
        }
  14. Reading files with io_uring: get_completion_and_print()

        int get_completion_and_print(struct io_uring *ring) {
            struct io_uring_cqe *cqe;

            int ret = io_uring_wait_cqe(ring, &cqe);
            if (ret < 0) {
                perror("io_uring_wait_cqe");
                return 1;
            }
            if (cqe->res < 0) {
                fprintf(stderr, "Async readv failed.\n");
                return 1;
            }

            struct file_info *fi = io_uring_cqe_get_data(cqe);
            int blocks = (int) fi->file_sz / BLOCK_SZ;
            if (fi->file_sz % BLOCK_SZ) blocks++;

            for (int i = 0; i < blocks; i++)
                output_to_console(fi->iovecs[i].iov_base, fi->iovecs[i].iov_len);

            io_uring_cqe_seen(ring, cqe);
            return 0;
        }
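  15. io_uring vs epoll (a.k.a. async I/O)

    1. I/O model: epoll is event-based; io_uring performs asynchronous I/O directly in the kernel.
    2. Queue structure: epoll has a single event queue; io_uring has a dual-queue structure (SQ/CQ, the submission and completion queues).
    3. System calls: epoll requires several calls for a single I/O; io_uring can handle many I/Os with a single call.
    4. Performance: epoll requires many context switches; io_uring minimizes them.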
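The "several calls for a single I/O" point about epoll can be seen with the standard-library `selectors` module, which uses epoll on Linux. The sketch below is only an illustration of the readiness-based model on a Unix-like system: it uses a pipe rather than a real workload, and it does not demonstrate io_uring itself.

```python
import os
import selectors

# Readiness-based I/O: the selector only reports that a descriptor is ready;
# the program must then issue the read itself.
read_fd, write_fd = os.pipe()
selector = selectors.DefaultSelector()
selector.register(read_fd, selectors.EVENT_READ)

os.write(write_fd, b"hello")

received = b""
# Call 1: wait until the kernel reports the descriptor as readable.
for key, events in selector.select(timeout=1.0):
    if events & selectors.EVENT_READ:
        # Call 2: perform the actual read.
        received = os.read(key.fd, 1024)

selector.unregister(read_fd)
selector.close()
os.close(read_fd)
os.close(write_fd)

print(received)  # b'hello'
```

With io_uring, by contrast, the read request itself is queued in the submission queue, and the data is already in the buffer when the completion entry arrives.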
  16. Modules/_asynciouringmodule.c:

        #include <Python.h>
        #include <liburing.h>
        #include <errno.h>
        #include <fcntl.h>
        #include <sys/stat.h>
        #include <unistd.h>

        typedef struct {
            PyObject_HEAD
            int fd;
            struct io_uring ring;
        } IOUringObject;

        static PyObject *
        IOUring_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
        {
            IOUringObject *self;
            self = (IOUringObject *)type->tp_alloc(type, 0);
            if (self != NULL) {
                self->fd = -1;
            }
            return (PyObject *)self;
        }

        static int
        IOUring_init(IOUringObject *self, PyObject *args, PyObject *kwds)
        {
            const char *path;
            if (!PyArg_ParseTuple(args, "s", &path))
                return -1;

            self->fd = open(path, O_RDONLY);
            if (self->fd < 0) {
                PyErr_SetFromErrnoWithFilename(PyExc_OSError, path);
                return -1;
            }

            if (io_uring_queue_init(8, &self->ring, 0) < 0) {
                PyErr_SetString(PyExc_RuntimeError, "io_uring_queue_init failed");
                close(self->fd);
                self->fd = -1;  /* marks the ring as uninitialized for dealloc */
                return -1;
            }
            return 0;
        }

        static void
        IOUring_dealloc(IOUringObject *self)
        {
            /* fd >= 0 only after both open() and io_uring_queue_init() succeeded. */
            if (self->fd >= 0) {
                close(self->fd);
                io_uring_queue_exit(&self->ring);
            }
            Py_TYPE(self)->tp_free((PyObject *)self);
        }

        static PyObject *
        IOUring_read(IOUringObject *self, PyObject *Py_UNUSED(ignored))
        {
            struct io_uring_sqe *sqe;
            struct io_uring_cqe *cqe;
            PyObject *result;
            struct stat st;
            off_t file_size;
            char *buffer;

            if (fstat(self->fd, &st) < 0) {
                PyErr_SetFromErrno(PyExc_OSError);
                return NULL;
            }
            file_size = st.st_size;

            buffer = PyMem_Malloc(file_size);
            if (!buffer) {
                PyErr_NoMemory();
                return NULL;
            }

            /* Queue one read covering the whole file, then submit it. */
            sqe = io_uring_get_sqe(&self->ring);
            io_uring_prep_read(sqe, self->fd, buffer, file_size, 0);

            if (io_uring_submit(&self->ring) < 0) {
                PyErr_SetString(PyExc_RuntimeError, "io_uring_submit failed");
                PyMem_Free(buffer);
                return NULL;
            }

            if (io_uring_wait_cqe(&self->ring, &cqe) < 0) {
                PyErr_SetString(PyExc_RuntimeError, "io_uring_wait_cqe failed");
                PyMem_Free(buffer);
                return NULL;
            }

            if (cqe->res < 0) {
                /* io_uring reports failures as a negative errno value in cqe->res. */
                errno = -cqe->res;
                PyErr_SetFromErrno(PyExc_OSError);
                PyMem_Free(buffer);
                io_uring_cqe_seen(&self->ring, cqe);
                return NULL;
            }

            result = PyBytes_FromStringAndSize(buffer, cqe->res);
            PyMem_Free(buffer);
            io_uring_cqe_seen(&self->ring, cqe);
            return result;
        }

        static PyObject *
        IOUring_print(IOUringObject *self, PyObject *Py_UNUSED(ignored))
        {
            PyObject *data = IOUring_read(self, NULL);
            if (!data)
                return NULL;

            /* PySys_WriteStdout() returns void and truncates long output,
               so there is no result to check here. */
            PySys_WriteStdout("%s", PyBytes_AS_STRING(data));
            Py_DECREF(data);
            Py_RETURN_NONE;
        }

        static PyMethodDef IOUring_methods[] = {
            {"read", (PyCFunction)IOUring_read, METH_NOARGS, "Read data from the file"},
            {"print", (PyCFunction)IOUring_print, METH_NOARGS, "Print data to stdout"},
            {NULL}  /* Sentinel */
        };

        static PyTypeObject IOUringType = {
            PyVarObject_HEAD_INIT(NULL, 0)
            .tp_name = "asyncio.io_uring.IOUring",
            .tp_doc = "IOUring objects",
            .tp_basicsize = sizeof(IOUringObject),
            .tp_flags = Py_TPFLAGS_DEFAULT,
            .tp_new = IOUring_new,
            .tp_init = (initproc)IOUring_init,
            .tp_dealloc = (destructor)IOUring_dealloc,
            .tp_methods = IOUring_methods,
        };

        static PyModuleDef _asynciouringmodule = {
            PyModuleDef_HEAD_INIT,
            .m_name = "_asynciouring",
            .m_doc = "C extension module for asyncio io_uring support",
            .m_size = -1,
        };

        PyMODINIT_FUNC
        PyInit__asynciouring(void)
        {
            PyObject *m;

            if (PyType_Ready(&IOUringType) < 0)
                return NULL;

            m = PyModule_Create(&_asynciouringmodule);
            if (m == NULL)
                return NULL;

            Py_INCREF(&IOUringType);
            if (PyModule_AddObject(m, "IOUring", (PyObject *)&IOUringType) < 0) {
                Py_DECREF(&IOUringType);
                Py_DECREF(m);
                return NULL;
            }
            return m;
        }
  17. Python wrapper around the extension type:

        # Lib/asyncio.c
        from _asynciouring import IOUring

        def io_uring_open(path):
            """
            Open a file using io_uring.

            Args:
                path (str): The file path to open.

            Returns:
                IOUring: An IOUring object with read() and print() methods.
            """
            return IOUring(path)
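  18. Conclusion

    1. Building CPython and developing on it is not that difficult.
    2. Python's open-source contribution system is well structured (especially the process of searching for issues).
    3. Contributing to and reviewing CPython code greatly deepens your understanding of Python (in terms of both performance and behavior).
    4. Support for the latest OS features may arrive more slowly than you expect. Consider contributing yourself.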
  19. QnA