Extracting Troubles from Daily Reports based on Syntactic Pieces
Yoshifumi Kakimoto and Kazuhide Yamamoto. Extracting Troubles from Daily Reports based on Syntactic Pieces. Proceedings of the Annual meetings of the Pacific Asia Conference on Language, Information and Computation (PACLIC 22), pp.411-417 (2008.11)
daily reports
Trouble extraction must take the context of the problem into account
We deal with syntactic pieces [Aoki et al. 07]
Examples of troubles:
  the server breaks
  the delay occurs
[Figure: system overview. Labels: no-trouble reports; training data; Web corpus; trouble dictionary; (A) extract syntactic pieces and calculate scores; (B) expand dictionary; new input report; pieces of input report; matching of pieces; trouble information in the new input report]
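The "matching of pieces" step named in the overview can be sketched as follows. This is a minimal sketch, not the authors' implementation: pieces are represented as plain strings, and `match_troubles`, the `threshold` parameter, and the input pieces are hypothetical. The two dictionary scores are borrowed from the final-score example elsewhere in the slides.

```python
def match_troubles(report_pieces, trouble_dict, threshold=0.0):
    """Return pieces of an input report found in the trouble dictionary.

    report_pieces: syntactic pieces already extracted from one report
                   (the extraction itself is out of scope here).
    trouble_dict:  mapping from piece to its trouble score.
    threshold:     hypothetical cut-off; only entries scoring above it count.
    """
    return [p for p in report_pieces
            if trouble_dict.get(p, float("-inf")) > threshold]


# Toy dictionary and input, for illustration only.
trouble_dict = {"the server breaks": 0.669, "the delay occurs": 0.805}
pieces = ["the delay occurs", "turn on power"]
print(match_troubles(pieces, trouble_dict, threshold=0.7))
```

Pieces absent from the dictionary are simply ignored, which is why expanding the dictionary matters for troubles not seen in the training data.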
pieces as troubles
The score measures how far a piece's occurrence deviates between trouble and no-trouble reports.
Range of values: -1 to +1
The reliability of a score depends on frequency, so the confidence interval estimation method is applied [Fujimura et al. 04] [Agresti et al. 98]
Pieces with positive scores are added to the trouble dictionary
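To illustrate why frequency matters for reliability, here is a sketch of the Agresti-Coull interval for a binomial proportion. The slide does not say how piece counts are mapped onto the interval, so treating "trouble reports containing the piece" as successes out of "all reports containing the piece" is an assumption, as are the function name, the 95% level, and the counts.

```python
import math


def agresti_coull_half_width(successes, trials, z=1.96):
    """Half-width of the Agresti-Coull interval for a binomial proportion.

    z=1.96 corresponds to a 95% level (an assumption; the slide gives none).
    """
    n_adj = trials + z ** 2                    # adjusted sample size
    p_adj = (successes + z ** 2 / 2) / n_adj   # adjusted proportion
    return z * math.sqrt(p_adj * (1 - p_adj) / n_adj)


# Same share of trouble reports, very different frequencies (invented counts).
rare = agresti_coull_half_width(10, 11)
frequent = agresti_coull_half_width(10000, 11000)
print(rare > frequent)  # the rare piece has the wider, less reliable interval
```

A wide interval signals that a high score rests on few observations, which is the motivation for adjusting scores by frequency.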
extract all troubles
Tackling troubles not included in the training data
Expansion of Trouble Dictionary
  motion is slow
Searching of similar verbal nouns:
  search, display, response, ...
Add to the dictionary:
  search is slow, display is slow, response is slow, ...
(don't appear on the screen)
    the delay occurs
(2) call for support
    return goods to selling office
(3) pull out a plug
    turn on power

Criterion                         Precision
Correct: base (1)                 0.30
Correct: base (1) and base (2)    0.40

Input: 266 reports
Threshold of the dictionary: 0.780
Number of extracted troubles: 407
reports
Our dictionary is constructed from syntactic pieces in the training data
The binary classifier had an F-value of 0.772
The extracted troubles had a precision of 0.400
large) ...
  the delay occurs
Pieces list of web corpus:
  the error occurs, the complaint occurs, the problem occurs, ...
Top modifiers list:
  the error, the complaint, the problem, ...
Add to the dictionary
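The expansion shown on this slide can be sketched as: collect web-corpus pieces that share the head of a dictionary piece, rank their modifiers by frequency, and add the top ones with the same head. The (modifier, head) tuple representation, the function name, the counts, and the `top_k` cut-off are assumptions for illustration; the slide does not say how the top modifiers are ranked.

```python
from collections import Counter


def expand_dictionary(trouble_pieces, web_pieces, top_k=3):
    """Add pieces that share a head with a dictionary piece.

    Pieces are (modifier, head) tuples. For each head in the dictionary,
    the top_k most frequent modifiers seen with that head in the web
    corpus are paired with it and added.
    """
    expanded = set(trouble_pieces)
    for head in {h for _, h in trouble_pieces}:
        modifiers = Counter(m for m, h in web_pieces if h == head)
        for modifier, _ in modifiers.most_common(top_k):
            expanded.add((modifier, head))
    return expanded


# Toy web corpus, invented for illustration.
web = ([("the error", "occurs")] * 5 + [("the complaint", "occurs")] * 3
       + [("the problem", "occurs")] * 2 + [("the sun", "rises")] * 9)
print(sorted(expand_dictionary({("the delay", "occurs")}, web)))
```

Pieces with an unrelated head, such as the invented ("the sun", "rises"), are never added because no dictionary piece shares that head.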
Pieces          Frequency in trouble reports   Frequency in no-trouble reports   Score
delay occurs    10000                          1000                              0.819

We want 'the delay occurs' to be valued more highly than the other piece.

Confidence interval
  the server breaks: ±0.150
  the delay occurs:  ±0.014

Final scores
  the server breaks: 0.669
  the delay occurs:  0.805
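The final scores on this slide are consistent with subtracting the interval half-width from the raw score, that is, taking the lower bound of the interval. This reading is inferred from the numbers shown rather than stated, and it assumes 'the server breaks' shares the raw score 0.819; `final_score` is a hypothetical helper name.

```python
def final_score(raw_score, half_width):
    """Lower bound of the interval: penalise scores with wide intervals."""
    return raw_score - half_width


# Half-widths as shown on the slide; raw score 0.819 assumed for both pieces.
for piece, half_width in [("the server breaks", 0.150),
                          ("the delay occurs", 0.014)]:
    print(piece, round(final_score(0.819, half_width), 3))
```

Under this reading, two pieces with the same raw score are separated by how much evidence supports them.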
P(wi): the frequency of trouble reports containing wi
N(wi): the frequency of no-trouble reports containing wi
Pdoc: the total number of trouble reports
Ndoc: the total number of no-trouble reports
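The score formula itself is not legible in this part of the slides. As a sketch only, one normalised difference built from the four quantities above that stays within -1 to +1 is shown below. It is an assumption, not the authors' definition, and `piece_score` and the corpus sizes are hypothetical.

```python
def piece_score(p_wi, n_wi, p_doc, n_doc):
    """Assumed score: normalised difference of relative document frequencies.

    Returns +1 if the piece occurs only in trouble reports, -1 if only in
    no-trouble reports, and 0.0 if it occurs in neither.
    """
    p_rate = p_wi / p_doc
    n_rate = n_wi / n_doc
    if p_rate + n_rate == 0:
        return 0.0
    return (p_rate - n_rate) / (p_rate + n_rate)


# Corpus sizes are invented; equal sizes keep the example easy to follow.
print(round(piece_score(10000, 1000, 50000, 50000), 3))
```

With frequencies of 10000 and 1000 on equal-sized corpora, this form gives a value close to, but not identical to, the 0.819 shown in the slides, so the actual definition likely differs in detail.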
or titles have the word 'trouble'.
  No-trouble reports: Tags and titles don't have the word 'trouble'.
Kakaku.com review boards
  Trouble reports: Tags have 'bad'.
  No-trouble reports: Tags have neither 'bad' nor 'question'.
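The Kakaku.com labelling rule above can be written as a small function. The report representation (a collection of tag strings), the function name, and the handling of reviews that match neither rule are assumptions for illustration.

```python
def label_kakaku_report(tags):
    """Label a review by its tags, following the rule on the slide.

    Returns "trouble" if the tags include 'bad', "no-trouble" if they
    include neither 'bad' nor 'question', and None otherwise (assumed:
    such reviews are left out of the training data).
    """
    tags = {t.lower() for t in tags}
    if "bad" in tags:
        return "trouble"
    if "question" not in tags:
        return "no-trouble"
    return None


print(label_kakaku_report({"bad"}))
print(label_kakaku_report({"good"}))
print(label_kakaku_report({"question"}))
```

Reviews tagged only 'question' fall outside both classes under this rule, which keeps ambiguous posts out of the no-trouble set.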