Slide 7
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Analyzing ASK AN EXPERT Logs!
Tokenization
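import io
from janome.tokenizer import Tokenizer

# Build the tokenizer with a user dictionary (UTF-8 encoded CSV)
t = Tokenizer("userdic.csv", udic_enc="utf8")
sodan_words = []  # surfaces of the tokens we keep
with io.open('./sodan.txt', 'r', encoding='utf-8') as f:
    for line in f:
        tokens = t.tokenize(line)
        for token in tokens:
            # part_of_speech is comma-separated; the first field is the top-level category
            partOfSpeech = token.part_of_speech.split(',')[0]
            # NOTE: the tag compared here is empty in the source; set it to the category to keep
            if partOfSpeech == u'':
                if token.surface == 'https': pass  # skip URL scheme fragments
                elif token.surface.isnumeric(): pass  # skip purely numeric tokens
                else: sodan_words.append(token.surface)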
https://github.com/mocobeta/janome
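The filtering step above can be tried without janome or the log file. The sketch below is an illustration under stated assumptions, not the slide author's code: `FakeToken`, `filter_surfaces`, and `keep_pos` are hypothetical names, and the tags "NOUN" and "VERB" are placeholders rather than janome's actual part-of-speech strings. It only mirrors the two attributes the loop reads, `surface` and `part_of_speech`.

```python
from collections import namedtuple

# Hypothetical stand-in for a janome token: only the two attributes the
# slide's loop reads (surface and part_of_speech) are modelled.
FakeToken = namedtuple("FakeToken", ["surface", "part_of_speech"])


def filter_surfaces(tokens, keep_pos):
    """Return surfaces whose top-level part of speech equals keep_pos,
    skipping the literal 'https' and purely numeric tokens."""
    kept = []
    for token in tokens:
        # The first comma-separated field is the top-level category.
        top_level = token.part_of_speech.split(",")[0]
        if top_level != keep_pos:
            continue
        if token.surface == "https" or token.surface.isnumeric():
            continue
        kept.append(token.surface)
    return kept


# Placeholder tags ("NOUN", "VERB") are assumptions for illustration only.
sample = [
    FakeToken("Lambda", "NOUN,general,*,*"),
    FakeToken("https", "NOUN,general,*,*"),
    FakeToken("2019", "NOUN,number,*,*"),
    FakeToken("use", "VERB,independent,*,*"),
]
print(filter_surfaces(sample, keep_pos="NOUN"))  # prints ['Lambda']
```

Passing the category as a parameter keeps the filter reusable; with real janome tokens, `keep_pos` would be whichever top-level tag the analysis should retain.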