Slide 9
Text to Tokens
Example: “ZnO is a wide bandgap semiconductor”
https://platform.openai.com/tokenizer
Token IDs: [57, 77, 46, 374, 3094, 4097, 43554, 39290, 87836]
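The tokenizer linked above uses byte-pair encoding (BPE) with a learned vocabulary. The sketch below swaps in a simpler technique, greedy longest-match over a small vocabulary, to show how text becomes token IDs. `TOY_VOCAB` and `greedy_tokenize` are hypothetical and invented for illustration, and the toy IDs are not OpenAI's. Because the toy vocabulary has no entry for "Zn", the element symbol falls apart into "Z" and "n".

```python
# Hypothetical toy vocabulary (NOT the real OpenAI vocabulary or its IDs).
# It deliberately lacks "Zn", so the element symbol cannot stay whole.
TOY_VOCAB = {
    "Z": 0, "n": 1, "O": 2, " is": 3, " a": 4,
    " wide": 5, " band": 6, "gap": 7, " semiconductor": 8,
}

def greedy_tokenize(text, vocab):
    """Split text into vocabulary pieces by greedy longest match, left to right.

    This is a simplified stand-in for illustration; real GPT tokenizers
    apply learned BPE merges instead.
    """
    max_len = max(len(piece) for piece in vocab)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a vocab entry matches.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            raise ValueError(f"no vocabulary entry matches at position {i}: {text[i]!r}")
    return tokens

text = "ZnO is a wide bandgap semiconductor"
pieces = greedy_tokenize(text, TOY_VOCAB)
ids = [TOY_VOCAB[p] for p in pieces]
print(pieces)  # ['Z', 'n', 'O', ' is', ' a', ' wide', ' band', 'gap', ' semiconductor']
print(ids)     # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

Leading spaces belong to the following word's token (" is", " wide"), a convention GPT-style tokenizers also use; adding "Zn" to the vocabulary would keep the element symbol as a single token.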
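Each token ID selects a 768-dimensional embedding vector, i.e. one row of the model's embedding matrix. These looked-up vectors are not yet contextual; context is mixed in by the subsequent transformer layers. The embedding matrix, including its dimension, is model specific.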
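The embedding step is a plain table lookup: row t of the matrix is the vector for token ID t. The sketch below uses hypothetical toy sizes (`VOCAB_SIZE = 10`, `EMBED_DIM = 4`) instead of a real model's vocabulary and 768-dimensional width, and random numbers stand in for learned weights; `embed` is an illustrative helper, not a library API.

```python
import random

# Hypothetical toy sizes; a real model has its own vocabulary size and
# embedding width (768 in this slide's example).
VOCAB_SIZE = 10
EMBED_DIM = 4

rng = random.Random(0)
# Embedding matrix: one row of EMBED_DIM floats per token ID.
# Random values stand in for the learned weights of a trained model.
embedding_matrix = [
    [rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]
    for _ in range(VOCAB_SIZE)
]

def embed(token_ids, matrix):
    """Return one embedding row per token ID (a plain table lookup)."""
    return [matrix[t] for t in token_ids]

token_ids = [0, 1, 2, 3, 4, 0]  # toy IDs; ID 0 appears twice
vectors = embed(token_ids, embedding_matrix)
print(len(vectors), len(vectors[0]))  # prints "6 4": tokens x embedding width
```

Selecting row t is equivalent to multiplying a one-hot vector for t by the matrix. Because the lookup ignores position and neighbours, ID 0 receives the same vector at both of its positions.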
Note that Zn is split into two tokens (not ideal for chemistry).