An example of the thought process when reading source code ~I wanted to know how Markdown is rendered, part 2: the markdown package~

Nov. 3rd, 2025
Satoru Takeuchi
X: satoru_takeuchi

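Introduction
• A follow-up to the previous video
  ◦ "The thought process when reading source code ~I wanted to know how Markdown is rendered~"
• Recap of the previous video
  ◦ I read the MkDocs code because I wanted to know how a Markdown file is rendered to HTML
  ◦ It turned out that the rendering itself lives in Python's markdown package
    ▪ MkDocs is merely a user of the markdown package
• What we do this time
  ◦ Read the code of the markdown package!
  ◦ https://github.com/Python-Markdown/markdown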

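Before reading the source... read the documentation
• I had pinned down where MkDocs uses the markdown package, but did not yet understand what those calls mean
• To understand that, I first need to know the markdown package's specification
• The specification is written in the documentation
  ◦ 📝 Sometimes there is no documentation, or the documentation contains outdated information!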

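The markdown documentation
• The official documentation is published on the web
  ◦ https://python-markdown.github.io/reference/
  ◦ Only the latest version is available there
• The markdown package used by MkDocs was the latest version
• 📝 If yours is not the latest, the top directory of the repository contains an mkdocs configuration file, so you can build the documentation site yourself and read it

$ pip list
Package         Version     Editable project location
--------------- ----------- -------------------------
...
Markdown        3.9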

What the documentation said
• Basic usage
  1. Create an object of the Markdown class with markdown.Markdown()
  2. Render Markdown by calling that object's convert(source) method
• The flow of convert()
  1. A bunch of preprocessors munge the input text.
  2. A BlockParser parses the high-level structural elements of the pre-processed text into an ElementTree object.
  3. A bunch of treeprocessors are run against the ElementTree object. One such treeprocessor (markdown.treeprocessors.InlineProcessor) runs inlinepatterns against the ElementTree object, parsing inline markup.
  4. Some postprocessors are run against the text after the ElementTree object has been serialized into text.
  5. The output is returned as a string.
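The basic usage above can be sketched as follows. This is a minimal example that assumes the markdown package is installed (for example via pip install markdown); the sample input text is made up for illustration.

```python
import markdown

# Create a Markdown object once; it can be reused for several documents.
md = markdown.Markdown()

# convert() takes Markdown source as a string and returns HTML as a string.
source = "# title\n\ntest\n\n- foo\n- bar"
html = md.convert(source)
print(html)
```

The five steps listed in the documentation all happen inside this single convert() call.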

Data conversion flow
string (Markdown)
-> preprocessors -> string
-> BlockParser -> ElementTree
-> tree processor -> ElementTree
-> serialize -> string
-> postprocessors -> string
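To make the stages concrete, here is a toy pipeline with the same shape (string -> string -> tree -> tree -> string -> string). Every function here is hypothetical and written only for illustration with the standard library; none of it is the markdown package's code, and the "parser" only knows headings and paragraphs.

```python
import xml.etree.ElementTree as etree


def preprocess(text: str) -> str:
    # string -> string: normalize line endings and trim outer blank lines
    return text.replace("\r\n", "\n").strip("\n")


def parse_blocks(text: str) -> etree.Element:
    # string -> tree: one element per blank-line-separated block
    root = etree.Element("div")
    for block in text.split("\n\n"):
        if block.startswith("# "):
            etree.SubElement(root, "h1").text = block[2:]
        else:
            etree.SubElement(root, "p").text = block
    return root


def treeprocess(root: etree.Element) -> etree.Element:
    # tree -> tree: strip stray whitespace from the text of every element
    for element in root.iter():
        if element.text:
            element.text = element.text.strip()
    return root


def serialize(root: etree.Element) -> str:
    # tree -> string
    return etree.tostring(root, encoding="unicode", method="html")


def postprocess(html: str) -> str:
    # string -> string: insert a newline between adjacent tags
    return html.replace("><", ">\n<")


def toy_convert(source: str) -> str:
    return postprocess(serialize(treeprocess(parse_blocks(preprocess(source)))))


print(toy_convert("# title \r\n\r\ntest\r\n"))
```

The point is only that each stage has a clear input and output type, which makes it easier to decide which stage to read first.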

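Where to start reading?
• BlockParser is probably the key
  ◦ This should be where the Markdown text is parsed into a tree structure

string (Markdown)
-> preprocessors -> string
-> BlockParser -> ElementTree    (start here)
-> tree processor -> ElementTree
-> serialize -> string
-> postprocessors -> string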

How is BlockParser used?
• Read the source of Markdown.convert()

class Markdown:
    ...
    def __init__(self, **kwargs):
        ...
        self.build_parser()  # initialize the object
        ...

    def build_parser(self) -> Markdown:
        ...
        self.parser = build_block_parser(self)  # initialize the parser
        ...

    def convert(self, source: str) -> str:
        ...
        root = self.parser.parseDocument(self.lines).getroot()
        …

How is BlockParser used?
• Presumably build_block_parser sets up the parser that does the Markdown syntax analysis, and that parser is then invoked through parseDocument

class Markdown:
    ...
    def __init__(self, **kwargs):
        ...
        self.build_parser()
        ...

    def build_parser(self) -> Markdown:
        ...
        self.parser = build_block_parser(self)
        ...

    def convert(self, source: str) -> str:
        ...
        root = self.parser.parseDocument(self.lines).getroot()
        …

Reading the build_block_parser code
• It registers handlers that look like what we are after

def build_block_parser(md: Markdown, **kwargs: Any) -> BlockParser:
    """ Build the default block parser used by Markdown. """
    parser = BlockParser(md)
    …
    parser.blockprocessors.register(CodeBlockProcessor(parser), 'code', 80)
    …
    parser.blockprocessors.register(OListProcessor(parser), 'olist', 40)
    parser.blockprocessors.register(UListProcessor(parser), 'ulist', 30)
    parser.blockprocessors.register(BlockQuoteProcessor(parser), 'quote', 20)
    …
    return parser

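The third argument to register() in build_block_parser is a priority, and in Python-Markdown's registry the items with a higher priority come first when iterating. The class below is a hypothetical toy registry that illustrates only that ordering idea; it is not the library's implementation, and the processors are replaced by plain strings.

```python
class ToyRegistry:
    """Hypothetical stand-in for a priority registry (not the library's class)."""

    def __init__(self):
        self._items = []  # list of (priority, name, item)

    def register(self, item, name, priority):
        self._items.append((priority, name, item))
        # Keep the highest priority first; the sort is stable for equal priorities.
        self._items.sort(key=lambda entry: entry[0], reverse=True)

    def __iter__(self):
        return (item for _, _, item in self._items)

    def names(self):
        return [name for _, name, _ in self._items]


registry = ToyRegistry()
# Registered out of order on purpose; the priorities are the ones in the excerpt.
registry.register("BlockQuoteProcessor", "quote", 20)
registry.register("CodeBlockProcessor", "code", 80)
registry.register("UListProcessor", "ulist", 30)
registry.register("OListProcessor", "olist", 40)

print(registry.names())
```

Under this reading, the code-block handler gets the first chance at each block and the blockquote handler a later one.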
Next: parseDocument, which presumably invokes the parser internally
• Presumably build_block_parser sets up the parser, and parseDocument then invokes it

class Markdown:
    ...
    def __init__(self, **kwargs):
        ...
        self.build_parser()
        ...

    def build_parser(self) -> Markdown:
        ...
        self.parser = build_block_parser(self)
        ...

    def convert(self, source: str) -> str:
        ...
        root = self.parser.parseDocument(self.lines).getroot()
        …

Reading parseDocument
parseDocument()
-> parseChunk()   # receives the whole text, with the lines joined by "\n"
-> parseBlocks()  # receives that text split on "\n\n", one block (paragraph) per entry

def parseBlocks(self, parent: etree.Element, blocks: list[str]) -> None:
    """ Process blocks of Markdown text and attach to given `etree` node.
    ...
    """
    while blocks:
        for processor in self.blockprocessors:
            if processor.test(parent, blocks[0]):
                if processor.run(parent, blocks) is not False:
                    # run returns True or None
                    break
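The test()/run() dispatch in parseBlocks can be tried out with a few toy processors. The three processor classes and the free function parse_blocks below are hypothetical and far simpler than the real ones; only the shape of the loop is taken from the excerpt above. Note that each run() must consume the block it handles (blocks.pop(0)), otherwise the while loop would never end.

```python
import xml.etree.ElementTree as etree


class HeadingProcessor:
    def test(self, parent, block):
        return block.startswith("# ")

    def run(self, parent, blocks):
        block = blocks.pop(0)  # consume the block we handle
        etree.SubElement(parent, "h1").text = block[2:]


class ListProcessor:
    def test(self, parent, block):
        return block.startswith("- ")

    def run(self, parent, blocks):
        block = blocks.pop(0)
        ul = etree.SubElement(parent, "ul")
        for line in block.split("\n"):
            etree.SubElement(ul, "li").text = line[2:]


class ParagraphProcessor:
    def test(self, parent, block):
        return True  # fallback: accepts anything

    def run(self, parent, blocks):
        etree.SubElement(parent, "p").text = blocks.pop(0)


def parse_blocks(parent, blocks, processors):
    # Same shape as the quoted loop: the first processor whose test() passes
    # and whose run() does not return False handles the block.
    while blocks:
        for processor in processors:
            if processor.test(parent, blocks[0]):
                if processor.run(parent, blocks) is not False:
                    break


root = etree.Element("div")
text = "# title\n\ntest\n\n- foo\n- bar"
parse_blocks(root, text.split("\n\n"),
             [HeadingProcessor(), ListProcessor(), ParagraphProcessor()])
print([child.tag for child in root])
```

Putting the catch-all paragraph processor last mirrors the idea that more specific handlers should be tried first.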

This is presumably what happens (* details are rough)

text:
# title
test
- foo
- bar

ElementTree:
section("title")
paragraph("test")
unordered list
    list item("foo")
    list item("bar")

By the way, what is ElementTree?
• It apparently comes from a separate package (xml.etree, part of the Python standard library)
• A tree structure that represents an XML document
  ◦ Each node is called an Element

import xml.etree.ElementTree as etree

"""Lightweight XML support for Python.

 XML is an inherently hierarchical data format, and the most natural way to
 represent it is with a tree. This module has two classes for this purpose:

 1. ElementTree represents the whole XML document as a tree and
 2. Element represents a single node in this tree.
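A short sketch of the two classes named in that docstring, using only the standard library. The tag names (div, h1, p, ul, li) are chosen here for illustration and mirror the rough example tree from the earlier slide.

```python
import xml.etree.ElementTree as etree

# Build the tree by hand: Element is a node, SubElement attaches a child.
root = etree.Element("div")
etree.SubElement(root, "h1").text = "title"
etree.SubElement(root, "p").text = "test"
ul = etree.SubElement(root, "ul")
for item in ("foo", "bar"):
    etree.SubElement(ul, "li").text = item

# ElementTree wraps the root element and represents the whole document.
tree = etree.ElementTree(root)

# Walk every node in document order.
for element in tree.iter():
    print(element.tag, repr(element.text))

print(etree.tostring(tree.getroot(), encoding="unicode"))
```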

So what markdown is doing is...
1. First the input is converted to XML (more precisely, to an ElementTree, the tree structure corresponding to XML)
2. Then that is converted to HTML in some way
3. So where does that conversion happen?
4. A rough search turned up "serializers.py"
   a. Presumably it serializes the ElementTree into HTML

$ git grep -i "xml.*html"
docs/extensions/api.md:[ElementTree]: https://docs.python.org/3/library/xml.etree.elementtree.html
markdown/core.py: [`xmlcharrefreplace `](https://docs.python.org/3/library/codecs.html#error-handlers)
markdown/inlinepatterns.py: [`Element`][xml.etree.ElementTree.Element] which contains HTML escaped text (with
markdown/serializers.py:from xml.etree.ElementTree import Comment, ElementTree, Element, QName, HTML_EMPTY

Reading serializers.py
• Looks like a hit. The call chain was as follows

Markdown.convert()
-> self.serializer(root of the ElementTree)
-> to_html_string()
-> _write_html()
-> _serialize_html()  # the XML tree is converted to HTML here
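Why a dedicated HTML serializer is needed can be seen with the standard library alone: the same Element serializes differently as XML and as HTML. This sketch uses xml.etree.ElementTree.tostring, not the markdown package's serializers.py, so treat it only as an illustration of the XML/HTML difference.

```python
import xml.etree.ElementTree as etree

# <p> containing text, an empty <br> element, and more text after it.
root = etree.Element("p")
root.text = "line one"
etree.SubElement(root, "br").tail = "line two"

as_xml = etree.tostring(root, encoding="unicode", method="xml")
as_html = etree.tostring(root, encoding="unicode", method="html")

print(as_xml)   # empty element written in XML style: <br />
print(as_html)  # empty element written in HTML style: <br>
```

The HTML_EMPTY name imported in serializers.py (visible in the git grep output) is the standard library's set of such empty HTML tags.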

The rough overall flow that finally emerged
1. Browser: accesses a page
2. MkDocs: creates an object of the markdown.Markdown class
3. MkDocs: calls that object's convert() method, passing the Markdown source of the page
4. markdown: inside convert(), uses the blockprocessors to turn the Markdown text into an ElementTree
   a. 📝 Omitted here: one of the treeprocessors run on the ElementTree handles inline elements (emphasis and so on)
5. markdown: serializes the ElementTree and returns HTML
6. MkDocs: returns the HTML to the browser

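Summary
• By reading Python's markdown package, I learned (one) way to render a Markdown file into an HTML file
• Because I had fixed the goal and the way of reading in advance, I reached the goal efficiently
  ◦ Excluding tests, MkDocs and markdown are about 7,000 and about 16,000 lines of code respectively
    ▪ 📝 Writing 16,000 lines for MkDocs in the previous video was a mistake; that count included the venv code
  ◦ The code I actually read was at most about 1,000 lines, and it took 2 to 3 hours in total
• "Reading source code" is really more than just that: it also involves reading the documentation and running the code
  ◦ Choosing what not to read is very important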
