Slide 8
Slide 8 text
Cambridge CheminformaAcs MeeAng 07/12/2016
Ma# Swain
Part-Of-Speech Tagging
8
• Assign a category to each token - 47 tags in total
• CRF trained on newspaper arAcles (WSJ) and
biomedical abstracts (GENIA)
• Used as a feature in machine learning enAty
recogniAon
• Used to construct parsing rules
NN Noun
NNS Plural Noun
JJ Adjec/ve
CD Cardinal Number
CC Conjunc/on
DT Determiner
VB Verb
FW Foreign Word
>>> cpt = ChemCrfPosTagger()
>>> cpt.tag(['NMR', 'spectra', 'were', 'recorded', 'on', 'a', '300', 'MHz', 'BRUKER', ‘spectrometer'])
[('NMR', 'NN'), ('spectra', 'NNS'), ('were', 'VBD'), ('recorded', 'VBN'), ('on', 'IN'),
('a', 'DT'), ('300', 'CD'), ('MHz', 'NNP'), ('BRUKER', 'NNP'), ('spectrometer', ‘NN')]