Own Mustache

Your Own Mustache in 40 Minutes
Sergey Arkhipov, 2017

http://9seconds.github.io/curly

Why this talk is so odd
- It is for people who can write regular expressions;
- I have 40 minutes, and the Dragon Book has 1184 pages;
- I have not finished the Dragon Book;
- Nevertheless, I want to walk through a complete implementation of a template engine;
- If you know the difference between LR(0) and LR(2), you will be bored here;
- 40 minutes is not much time, so I will boldly and shamelessly cut a lot of corners.

Stanford CS143 https://web.stanford.edu/class/archive/cs/cs143/cs143.1128/

Language description
Template: Hello world!
Output: Hello world!

Language description
Template: Hello {{ name }}!
Context: {"name": "Jane"}
Output: Hello Jane!

Language description
Template: Hello {% if name %}{{ name }}{% else %}default{% /if %}!
Context: {"name": ""}
Output: Hello default!

Language description
Template: Hello {% for names %}{{ item }}{% /for %}!
Context: {"names": ["1", "2"]}
Output: Hello 12!

    Hello {{ first_name }},
    {% if last_name %}
      {{ last_name }}
    {% elif title %}
      {{ title }}
    {% else %}
      Doe
    {% /if %}!

    Here is the list of stuff I like:

    {% loop stuff %}
      - {{ item.key }} (because of {{ item.value }})
    {% /loop %}

    And that is all!

Lexer
Мама мыла раму. ("Mom washed the window frame.")
Noun(Мама), Space, Verb(мыла), Space, Noun(раму), Punct(.)

Parser
Noun(Мама), Space, Verb(мыла), Space, Noun(раму), Punct(.)
Sentence
    Noun(Мама)
    Space
    Verb(мыла)
    Space
    Noun(раму)
    Punct(.)

Template anatomy
Hello world {{ name }} {% if something %} {% elif condition %} {% else %} {% /if %}

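Template anatomy
Hello world {{ name }} {% if something %} {% elif condition %} {% else %} {% /if %}
- Literal: Hello world
- Print tag: {{ name }}
- Block tags: {% if something %}, {% elif condition %}, {% else %}, {% /if %}
- Function: if, elif, else
- Expression: name, something, condition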

Writing a lexer
Hello {%if title%}{{title}}{%/if%} {{name}}!
1. An explicit finite-state machine
2. A compound regular expression
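The slides go with the second option, so the first one is never shown. As a rough illustration only, here is what an explicit finite-state machine could look like, assuming three states (text, print, block) and a hypothetical `fsm_tokenize` helper that returns plain tuples instead of the token classes used later in the talk:

```python
def fsm_tokenize(text):
    """Split a template into ("text" | "print" | "block", value) tuples."""
    closers = {"{{": "}}", "{%": "%}"}
    kinds = {"{{": "print", "{%": "block"}
    state = "text"     # current state of the machine
    closer = None      # bracket that ends the current tag, if any
    buffer = []
    tokens = []
    pos = 0
    while pos < len(text):
        pair = text[pos:pos + 2]
        if state == "text" and pair in closers:
            # An opening bracket: flush the literal and enter a tag state.
            if buffer:
                tokens.append(("text", "".join(buffer)))
            buffer = []
            state, closer = kinds[pair], closers[pair]
            pos += 2
        elif state != "text" and pair == closer:
            # The matching closing bracket: emit the tag, back to text.
            tokens.append((state, "".join(buffer).strip()))
            buffer = []
            state, closer = "text", None
            pos += 2
        else:
            buffer.append(text[pos])
            pos += 1
    if state != "text":
        raise ValueError("unclosed tag")
    if buffer:
        tokens.append(("text", "".join(buffer)))
    return tokens


tokens = fsm_tokenize("Hello {%if title%}{{title}}{%/if%} {{name}}!")
```

Even this small version needs explicit bookkeeping for every transition, which is one reason a compound regular expression is attractive for a 40-minute talk.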

Writing a lexer
    Function        \w+
    Expression      (?:\\.|\w)+
    LPrintBracket   {{
    RPrintBracket   }}
    LBlockBracket   {%
    RBlockBracket   %}
    Literal         .*?

Writing a lexer
    FUNC_RE = r"[a-zA-Z0-9_-]+"
    EXP_RE = r"(?:\\.|[^\{\}%])+"

Writing a lexer
    PRINT_RE = r"""
        {{\s*
        (%s)
        \s*}}
    """ % EXP_RE

Writing a lexer
    START_BLOCK_RE = r"""
        {%%\s*
        (%s)
        \s*
        (%s)?
        \s*%%}
    """ % (FUNC_RE, EXP_RE)

Writing a lexer
    END_BLOCK_RE = r"""
        {%%\s*
        /\s*
        (%s)
        \s*%%}
    """ % FUNC_RE

Writing a lexer
    TOKENIZE_RE = r"(?P<{0}>{1})|(?P<{2}>{3})|(?P<{4}>{5})".format(
        "print", PRINT_RE,
        "start_block", START_BLOCK_RE,
        "end_block", END_BLOCK_RE
    )

Writing a lexer
    re.finditer(pattern, string, flags=0)
    re.Match.start([group])
    re.Match.end([group])
    re.Match.lastgroup
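The patterns from the previous slides and these `re` calls can be combined roughly as follows. This is a sketch under assumptions: the slides do not show how the pattern is compiled, so `re.VERBOSE` (needed because the patterns contain layout whitespace) and the `TOKENIZER` name are guesses, and the patterns are written on one line for brevity.

```python
import re

FUNC_RE = r"[a-zA-Z0-9_-]+"
EXP_RE = r"(?:\\.|[^\{\}%])+"
PRINT_RE = r"{{\s* (%s) \s*}}" % EXP_RE
START_BLOCK_RE = r"{%%\s* (%s) \s* (%s)? \s*%%}" % (FUNC_RE, EXP_RE)
END_BLOCK_RE = r"{%%\s* /\s* (%s) \s*%%}" % FUNC_RE
TOKENIZE_RE = r"(?P<{0}>{1})|(?P<{2}>{3})|(?P<{4}>{5})".format(
    "print", PRINT_RE,
    "start_block", START_BLOCK_RE,
    "end_block", END_BLOCK_RE,
)

# Assumption: VERBOSE makes the engine ignore the spaces used for layout.
TOKENIZER = re.compile(TOKENIZE_RE, re.VERBOSE)

text = "Hello {%if title%}{{title}}{%/if%} {{name}}!"

# lastgroup names the alternative that matched; start() and end() give
# the span of the whole tag, so the slice is the tag's source text.
found = [
    (match.lastgroup, text[match.start():match.end()])
    for match in TOKENIZER.finditer(text)
]
```

Note that `finditer` only reports the tags. The literal text between them has to be recovered from the gaps between one match's `end()` and the next match's `start()`.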

Writing a lexer
Hello {%if title%}{{title}}{%/if%} {{name}}!

Walkthrough, one line per slide:
1. match_end = 0, previous_match_end = 0: nothing emitted yet
2. match_end = 17, previous_match_end = 0: the first tag is matched
3. match_end = 17, previous_match_end = 0: emitted text='Hello '
4. match_end = 17, previous_match_end = 0: emitted func='if' expr=' title '
5. match_end = 26, previous_match_end = 17: emitted expr='title'
6. match_end = 33, previous_match_end = 26: emitted func='if'
7. match_end = 42, previous_match_end = 33: emitted text=' ' and expr='name'
8. match_end = 42, previous_match_end = 33: emitted the leftover text='!'

Writing a lexer
Hello {%if title%}{{title}}{%/if%} {{name}}!
    text='Hello '
    func='if' expr=' title '
    expr='title'
    func='if'
    text=' '
    expr='name'
    text='!'

Writing a lexer
    def tokenize(text):
        previous_end = 0
        tokens = get_token_patterns()

        if isinstance(text, bytes):
            text = text.decode("utf-8")

        for matcher in make_tokenizer_regexp().finditer(text):
            if matcher.start(0) != previous_end:
                yield LiteralToken(text[previous_end:matcher.start(0)])
            previous_end = matcher.end(0)

            match_groups = matcher.groupdict()
            token_class = tokens[matcher.lastgroup]
            yield token_class(match_groups[matcher.lastgroup])

        leftover = text[previous_end:]
        if leftover:
            yield LiteralToken(leftover)
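The function above depends on helpers that the slides do not show (`get_token_patterns`, `make_tokenizer_regexp`, the token classes). A self-contained sketch of the same gap-filling loop, assuming a simplified pattern and a hypothetical `Token` tuple in place of the real classes:

```python
import re
from collections import namedtuple

# Hypothetical stand-in for LiteralToken, PrintToken and the block tokens.
Token = namedtuple("Token", ["kind", "value"])

# Simplified pattern, not the one from the slides.
TOKENIZER = re.compile(
    r"""
    (?P<print>{{\s*[^{}%]+?\s*}})
    |(?P<end_block>{%\s*/\s*[\w-]+\s*%})
    |(?P<start_block>{%\s*[\w-]+[^{}%]*%})
    """,
    re.VERBOSE,
)


def tokenize(text):
    previous_end = 0
    for match in TOKENIZER.finditer(text):
        # Anything between the previous tag and this one is literal text.
        if match.start() != previous_end:
            yield Token("literal", text[previous_end:match.start()])
        previous_end = match.end()
        yield Token(match.lastgroup, match.group(match.lastgroup))
    # Text after the last tag is a literal as well.
    leftover = text[previous_end:]
    if leftover:
        yield Token("literal", leftover)


source = "Hello {%if title%}{{title}}{%/if%} {{name}}!"
tokens = list(tokenize(source))
```

Because every character ends up in exactly one token, joining the token values reproduces the source, which is a handy sanity check for a lexer of this kind.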

A few words about parsers
1. We will write the parser the old-fashioned way;
2. The parser is an almost classic shift-reduce one (bottom-up, rightmost derivation);
3. We will not write a grammar;
4. We will build the AST directly on the stack.

Grammar example: JSON
    object        → '{' pairs '}'
    pairs         → pair pairs_tail | ε
    pair          → STRING ':' value
    pairs_tail    → ',' pairs | ε
    value         → STRING | NUMBER | 'true' | 'false' | 'null' | object | array
    array         → '[' elements ']'
    elements      → value elements_tail | ε
    elements_tail → ',' elements | ε
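To make the grammar concrete, here is one way to read it as code. Note that this is a recursive-descent (top-down) sketch, not the shift-reduce approach the talk uses for templates. The names `lex`, `Parser` and `parse_json` are hypothetical, numbers are simplified, and string escapes are left undecoded.

```python
import re

TOKEN_RE = re.compile(r"""
    (?P<STRING>"(?:\\.|[^"\\])*")
    |(?P<NUMBER>-?\d+(?:\.\d+)?)
    |(?P<KEYWORD>true|false|null)
    |(?P<PUNCT>[{}\[\]:,])
    |(?P<SPACE>\s+)
""", re.VERBOSE)


def lex(text):
    """Turn JSON text into (kind, value) pairs, dropping whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError("unexpected character at offset %d" % pos)
        if match.lastgroup != "SPACE":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return (None, None)

    def take(self, kind, value=None):
        found_kind, found_value = self.peek()
        if found_kind != kind or value not in (None, found_value):
            raise ValueError("unexpected token %r" % (found_value,))
        self.pos += 1
        return found_value

    def object(self):  # object -> '{' pairs '}'
        self.take("PUNCT", "{")
        result = {}
        while self.peek()[0] == "STRING":  # pairs, pairs_tail
            key = self.take("STRING")[1:-1]  # pair -> STRING ':' value
            self.take("PUNCT", ":")
            result[key] = self.value()
            if self.peek() != ("PUNCT", ","):
                break
            self.take("PUNCT", ",")
        self.take("PUNCT", "}")
        return result

    def array(self):  # array -> '[' elements ']'
        self.take("PUNCT", "[")
        result = []
        while self.peek() != ("PUNCT", "]"):  # elements, elements_tail
            result.append(self.value())
            if self.peek() != ("PUNCT", ","):
                break
            self.take("PUNCT", ",")
        self.take("PUNCT", "]")
        return result

    def value(self):
        kind, text = self.peek()
        if kind == "STRING":
            return self.take("STRING")[1:-1]
        if kind == "NUMBER":
            self.pos += 1
            return float(text) if "." in text else int(text)
        if kind == "KEYWORD":
            self.pos += 1
            return {"true": True, "false": False, "null": None}[text]
        if (kind, text) == ("PUNCT", "{"):
            return self.object()
        if (kind, text) == ("PUNCT", "["):
            return self.array()
        raise ValueError("unexpected token %r" % (text,))


def parse_json(text):
    parser = Parser(lex(text))
    result = parser.object()  # the grammar's start symbol is object
    if parser.pos != len(parser.tokens):
        raise ValueError("trailing tokens after the object")
    return result


document = parse_json('{"a": [1, 2.5, true, null], "b": {"c": "d"}}')
```

Each method mirrors one nonterminal, and the `while` loops fold the `pairs_tail` and `elements_tail` rules into iteration. As written, the grammar allows a trailing comma, and so does this sketch.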

Writing a parser
Hello {%if title%}{{title}}{%else%}Mr.{%/if%} {{name}}!

Tokens: text='Hello ', func='if' expr=' title ', expr='title', func='else', text='Mr.', func='if', text=' ', expr='name', text='!'

Stack after each step, one line per slide:
1. Start: the stack is empty
2. text='Hello ': text('Hello ')
3. func='if' expr=' title ' (a conditional is opened first): text('Hello '), conditional
4. func='if' expr=' title ' (then the branch itself): text('Hello '), conditional, if(' title ')
5. expr='title': text('Hello '), conditional, if(' title '), print('title')
6. func='else': text('Hello '), conditional, if(' title '), print('title'), else
7. text='Mr.': text('Hello '), conditional, if(' title '), print('title'), else, text('Mr.')
8. func='if' (the closing tag, the conditional is closed): text('Hello '), [if(' title ') print('title') else text('Mr.')]
9. text=' ': text('Hello '), [if(' title ') print('title') else text('Mr.')], text(' ')
10. expr='name': text('Hello '), [if(' title ') print('title') else text('Mr.')], text(' '), print('name')
11. text='!': text('Hello '), [if(' title ') print('title') else text('Mr.')], text(' '), print('name'), text('!')
12. End of input: everything on the stack becomes the children of Root

Writing a parser
    def parse(tokens):
        stack = []

        for token in tokens:
            if isinstance(token, lexer.LiteralToken):
                stack = parse_literal_token(stack, token)
            elif isinstance(token, lexer.PrintToken):
                stack = parse_print_token(stack, token)
            elif isinstance(token, lexer.StartBlockToken):
                stack = parse_start_block_token(stack, token)
            elif isinstance(token, lexer.EndBlockToken):
                stack = parse_end_block_token(stack, token)
            else:
                raise exceptions.CurlyParserUnknownTokenError(token)

        root = RootNode(stack)
        validate_for_all_nodes_done(root)

        return root

Writing a parser
    def parse_print_token(stack, token):
        stack.append(PrintNode(token))
        return stack

    def parse_start_elif_token(stack, token):
        stack = rewind_stack_for(stack, search_for=IfNode)
        stack.append(IfNode(token))
        return stack

Writing a parser
    def rewind_stack_for(stack, *, search_for):
        nodes = []
        node = None

        while stack:
            node = stack.pop()
            if not node.done:
                break
            nodes.append(node)
        else:
            raise exceptions.CurlyParserNoUnfinishedNodeError()

        if not isinstance(node, search_for):
            raise exceptions.CurlyParserUnexpectedUnfinishedNodeError(
                search_for, node)

        node.done = True
        node.data = nodes[::-1]
        stack.append(node)

        return stack
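The rewinding idea can be tried in isolation. The sketch below is a simplified, self-contained version: the `Node` class and the tuple-shaped tokens are hypothetical stand-ins for the real classes, and only plain blocks are handled (no `elif`/`else`).

```python
class Node:
    """Hypothetical stand-in for the parser's node classes."""

    def __init__(self, kind, value=None, done=True):
        self.kind = kind
        self.value = value
        self.done = done   # False while a block is still open
        self.data = []     # child nodes, filled in when the block closes


def rewind(stack, kind):
    """Reduce: fold finished nodes into the nearest unfinished block."""
    children = []
    while stack:
        node = stack.pop()
        if not node.done:
            break
        children.append(node)
    else:
        # The stack ran out without meeting an open block.
        raise ValueError("no open block on the stack")
    if node.kind != kind:
        raise ValueError("expected open %r, found %r" % (kind, node.kind))
    node.done = True
    node.data = children[::-1]  # popping reversed them; restore source order
    stack.append(node)
    return stack


def parse(tokens):
    stack = []
    for kind, *rest in tokens:
        if kind in ("text", "print"):
            stack.append(Node(kind, rest[0]))  # shift a leaf
        elif kind == "start_block":
            stack.append(Node(rest[0], rest[1], done=False))  # shift an open block
        elif kind == "end_block":
            stack = rewind(stack, rest[0])  # reduce
        else:
            raise ValueError("unknown token %r" % (kind,))
    if any(not node.done for node in stack):
        raise ValueError("unclosed block")
    root = Node("root")
    root.data = stack
    return root


tokens = [
    ("text", "Hello "),
    ("start_block", "if", "title"),
    ("print", "title"),
    ("end_block", "if"),
    ("text", " "),
    ("print", "name"),
    ("text", "!"),
]
root = parse(tokens)
```

The `done` flag is what lets a single list serve as both the parse stack and the tree under construction: finished nodes are subtrees waiting for a parent, and the nearest unfinished node is that parent.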

Rendering the template
1. Do an in-order traversal of the tree;
2. Generate a piece of text from each node, based on the context;
3. Collect the pieces in the order they were generated;
4. Concatenate the pieces.

Rendering the template
    class Node:
        ...

        def process(self, context):
            return "".join(self.emit(context))

        def emit(self, context):
            for node in self:
                yield from node.emit(context)

Rendering the template
    class LiteralNode(Node):
        ...

        @property
        def text(self):
            return self.token.contents["text"]

        def emit(self, _):
            yield self.text

Rendering the template
    class PrintNode(ExpressionMixin, Node):
        ...

        def emit(self, context):
            yield str(self.evaluate_expression(context))
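The slides do not show `evaluate_expression`. Judging by the `{{ item.key }}` example earlier in the talk, it has to resolve dotted names against the context. A minimal guess at that behaviour, using a hypothetical `resolve` helper that only understands nested dictionaries:

```python
def resolve(expression, context):
    """Look up a dotted path such as "item.key" in nested dictionaries."""
    value = context
    # Surrounding whitespace comes from tags like "{{ name }}".
    for part in expression.strip().split("."):
        value = value[part]  # raises KeyError for unknown names
    return value


context = {"name": "Jane", "item": {"key": "tea", "value": "it is warm"}}
name = resolve(" name ", context)
key = resolve("item.key", context)
```

How the real implementation treats missing names, attributes or list indexes is not covered here.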

Rendering the template
    class IfNode(BlockTagNode):
        ...

        def emit(self, context):
            if self.evaluate_expression(context):
                yield from super().emit(context)
            elif self.elsenode:
                yield from self.elsenode.emit(context)

Rendering the template
    class LoopNode(BlockTagNode):
        ...

        def emit(self, context):
            resolved = self.evaluate_expression(context)
            context_copy = context.copy()

            if isinstance(resolved, dict):
                for key, value in sorted(resolved.items()):
                    context_copy["item"] = {"key": key, "value": value}
                    yield from super().emit(context_copy)
            else:
                for item in resolved:
                    context_copy["item"] = item
                    yield from super().emit(context_copy)
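The node classes above can be condensed into a runnable miniature. This is a sketch under assumptions: the trees are built by hand instead of by the parser, expressions are plain context keys, and the class names (`Literal`, `Print`, `If`, `Loop`) are shortened stand-ins for the ones on the slides.

```python
class Node:
    def __init__(self, *children):
        self.children = list(children)

    def process(self, context):
        return "".join(self.emit(context))

    def emit(self, context):
        # Visit children left to right and pass their pieces through.
        for child in self.children:
            yield from child.emit(context)


class Literal(Node):
    def __init__(self, text):
        super().__init__()
        self.text = text

    def emit(self, _):
        yield self.text


class Print(Node):
    def __init__(self, name):
        super().__init__()
        self.name = name

    def emit(self, context):
        yield str(context[self.name])


class If(Node):
    def __init__(self, name, *children, orelse=None):
        super().__init__(*children)
        self.name = name
        self.orelse = orelse

    def emit(self, context):
        if context.get(self.name):
            yield from super().emit(context)
        elif self.orelse is not None:
            yield from self.orelse.emit(context)


class Loop(Node):
    def __init__(self, name, *children):
        super().__init__(*children)
        self.name = name

    def emit(self, context):
        resolved = context[self.name]
        scope = dict(context)  # do not leak "item" into the caller's context
        if isinstance(resolved, dict):
            items = [{"key": k, "value": v} for k, v in sorted(resolved.items())]
        else:
            items = resolved
        for item in items:
            scope["item"] = item
            yield from super().emit(scope)


greeting = Node(
    Literal("Hello "),
    If("name", Print("name"), orelse=Literal("default")),
    Literal("!"),
)
listing = Node(Literal("Hello "), Loop("names", Print("item")), Literal("!"))

with_name = greeting.process({"name": "Jane"})
without_name = greeting.process({"name": ""})
looped = listing.process({"names": ["1", "2"]})
```

Using generators all the way down means no node builds intermediate strings, and the single `"".join` in `process` does all of the concatenation.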

Thank you!
@9seconds
https://speakerdeck.com/9seconds/own-mustache
https://9seconds.github.io/curly
