The Wordometer

Kai
August 26, 2013

Estimating the Number of Words Read Using Document Image Retrieval and Mobile Eye Tracking

Paper Presentation at ICDAR 2013 http://icdar2013.org/

Abstract: This paper introduces the Wordometer, a novel method to estimate the number of words a user reads using the eye gaze recorded by a mobile eye tracker and document image retrieval. We present a reading detection algorithm that achieves over 91 % accuracy across 10 test subjects using 10-fold cross validation. We implement two algorithms to estimate the read words using a line break detector. A simple version gives an average error rate of 13.5 % for 9 users over 10 documents. A more sophisticated word count algorithm based on support vector regression with an RBF kernel reaches an average error rate of only 8.2 % (6.5 % if one test subject with abnormal behavior is excluded). The achieved error rates are comparable to those of pedometers that count the steps in our daily life. This means the Wordometer can be used as a step counter for the information we read, to make our knowledge life healthier.

Transcript

  1. The Wordometer: Estimating the Number of Words Read Using Document Image Retrieval and Mobile Eye Tracking. Kai Kunze, Hitoshi Kawaichi, Kazuyo Yoshimura, Koichi Kise, Osaka Prefecture University
  2. Kai Kunze - The Wordometer. Overview: Motivation + Approach; Method; Reading Detection; Detection of Line Breaks; Estimating Words Read; Experimental Results; Conclusion and Future Work
  3. “Traditional” Document Analysis: The focus is on the object, that is, the document and its structure, usually using computer vision (scanner or camera). A lot of effort goes into reconstructing the document information. Sometimes the focus is also on the creator (historic documents and handwriting recognition), including topics like identity (who wrote it) and authenticity (is it the right person at the right time). What if we broaden the topics and change our focus?
  4. “User-centered” Document Analysis: Let’s focus on the readers. (1) Analyze the document through the users: What did they read? How much? How fast? How often? How much do they understand? (2) Analyze the users through the documents and their reading behavior. If you give quantified feedback, people can improve their habits (see step counters and their impact on weight loss). People who read more have higher vocabulary skills and higher general knowledge (A. Cunningham and K. Stanovich. What reading does for the mind. Journal of Direct Instruction, 1(2):137–149, 2001). “Can I copy the habits of my thesis advisor to become a better researcher?”
  5. Method Overview. Inputs: eye gaze data, scene camera. Steps: eye gaze projection to document coordinates using LLAH (“Locally Likely Arrangement Hashing”); reading recognition; line break detection; estimating word count (static words per line, SVM regression).
  6. Eye gaze projection using document image retrieval. [Figure panels: Scene camera, Retrieved Document, Detailed Stats]
  7. Problems: Sometimes a vertical offset is introduced, likely due to head movements. This is no issue as long as the user starts reading at the top of the page.
  8. Reading Recognition. Fixation features: number of fixations, sum of the duration of fixations, average time of fixations. Saccade features: average length of saccades, minimum length of saccades, horizontal element of saccades. Classification: 3-second sliding window, support vector classifier with RBF kernel. Reference: G. Buscher and A. Dengel, “Gaze-based filtering of relevant document segments,” in Workshop on Web Search Result Summarization and Presentation (WSSP-2009).
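The feature list on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the `Fixation` record, the function names, the 1-second window step, and the reading of "horizontal element of saccades" as the mean absolute horizontal jump are all hypothetical choices. Saccades are approximated as the jumps between consecutive fixations. The resulting feature dictionaries would then be passed to a classifier such as a support vector classifier with an RBF kernel, which is not shown here.

```python
from dataclasses import dataclass
from math import hypot
from statistics import mean


@dataclass
class Fixation:
    """One fixation: start time and duration in seconds, position in pixels."""
    start: float
    duration: float
    x: float
    y: float


def window_features(fixations):
    """Compute the six features named on the slide for one window."""
    durations = [f.duration for f in fixations]
    # Assumption: a saccade is the jump between two consecutive fixations.
    pairs = list(zip(fixations, fixations[1:]))
    lengths = [hypot(b.x - a.x, b.y - a.y) for a, b in pairs]
    horizontal = [abs(b.x - a.x) for a, b in pairs]
    return {
        "n_fixations": len(fixations),
        "sum_fixation_duration": sum(durations),
        "mean_fixation_duration": mean(durations) if durations else 0.0,
        "mean_saccade_length": mean(lengths) if lengths else 0.0,
        "min_saccade_length": min(lengths) if lengths else 0.0,
        "mean_horizontal_saccade": mean(horizontal) if horizontal else 0.0,
    }


def sliding_windows(fixations, width=3.0, step=1.0):
    """Yield (window_start, features) for each window position.

    The 3-second width comes from the slide; the step size is an assumption.
    """
    if not fixations:
        return
    t = fixations[0].start
    end = fixations[-1].start
    while t <= end:
        inside = [f for f in fixations if t <= f.start < t + width]
        yield t, window_features(inside)
        t += step
```

Each yielded feature dictionary corresponds to one 3-second frame that a trained classifier could label as reading or not reading.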
  9. Line Break Detection: A line break shows up as a long saccade in the opposite reading direction. It is detected via direction change and thresholding. Some succeeding fixations/saccades are also included. [Figure panels: “ideal” and “realistic”, shown over a sample English text (Ainsley Harriott, PET) with line numbers]
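The direction-change-plus-threshold idea can be illustrated with a minimal sketch. It assumes a left-to-right script and fixation positions already projected into document coordinates. The function name, the `line_width` parameter, and the default threshold of half a line width are hypothetical, and the handling of succeeding fixations/saccades mentioned on the slide is left out.

```python
def count_line_breaks(fixation_xs, line_width, min_return_fraction=0.5):
    """Count long backward jumps in a sequence of horizontal fixation positions.

    fixation_xs: x coordinates of fixations in time order (document coordinates).
    line_width: width of a text line in the same units.
    min_return_fraction: assumed threshold, as a fraction of the line width.
    """
    threshold = min_return_fraction * line_width
    breaks = 0
    for prev_x, next_x in zip(fixation_xs, fixation_xs[1:]):
        dx = next_x - prev_x
        # A negative dx runs against the (left-to-right) reading direction.
        # Short backward jumps are treated as regressions, not line breaks.
        if dx < 0 and -dx >= threshold:
            breaks += 1
    return breaks
```

With this kind of rule, small regressions inside a line are ignored, while a return sweep to the start of the next line is counted once.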
  10. Estimating Words Read. Simple method: use a fixed word count per line given by the document: words read = mean words per line (N) x lines detected (L + 1). SVR method: use a support vector regression model with a radial basis function kernel to estimate words read, f(N, L, x). Feature vector x: duration required for reading, number of fixations for a page, total distance of eye movements, total distance of saccades, average distance of saccades.
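The simple method is a single multiplication, sketched below. Reading L as the number of detected line breaks (so that L + 1 is the number of lines read) is an assumption based on the slide's "lines detected (L + 1)"; the function and parameter names are hypothetical.

```python
def estimate_words_simple(mean_words_per_line, line_breaks):
    """Simple method: words read = N * (L + 1).

    mean_words_per_line: N, the document's mean number of words per line.
    line_breaks: L, assumed here to be the number of detected line breaks.
    """
    if line_breaks < 0:
        raise ValueError("line_breaks must be non-negative")
    return mean_words_per_line * (line_breaks + 1)
```

For example, a page with a mean of 12.5 words per line and 19 detected line breaks would be estimated at 12.5 x 20 = 250 words.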
  11. Experiments: 10 English texts (between 200 and 400 words). Non-reading activities last as long as the reading; we pick playing with objects (or games on the smartphone). 10 participants (3 female, 7 male, average age ~23). Users wear the SMI mobile eye tracker, which records eye gaze at 30 Hz (binocular) and video at 1280x960. Participants read the documents naturally. 3-point calibration before each document.
  12. Results. Reading / not reading detection: 91 % (3 sec. sliding window). Word count error, simple method: 13.5% overall, 8.5% excluding one subject. Word count error, SVR method: 8.5% overall, 6.5% excluding one subject. Problems: uncharacteristic reading patterns, re-reading lines/parts of the sentence, etc.
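The slides do not spell out how the word count error is computed. One plausible reading, shown here purely as an assumption, is the absolute difference between estimated and actual word count relative to the actual count, averaged over documents. The function names are hypothetical.

```python
from statistics import mean


def word_count_error(estimated, actual):
    """Relative error of one estimate (assumed definition, not from the slides)."""
    if actual <= 0:
        raise ValueError("actual word count must be positive")
    return abs(estimated - actual) / actual


def mean_word_count_error(pairs):
    """Average relative error over (estimated, actual) pairs."""
    return mean(word_count_error(est, act) for est, act in pairs)
```

Under this definition, estimating 270 words for a 300-word document gives an error of 10 %.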
  13. Conclusion and Future Work: Remove the LLAH (document image retrieval) requirement. More natural experimental evaluations. Evaluate other modalities that could help with tracking reading. Documents, especially the questions “What are you reading?” and “How much do you understand?” (see the talk this afternoon ;-)). We are living in a knowledge economy ... We need to know more about how we process structured information.
  14. Future Work? [Figure: “read.it” mock-up with the labels: words read, word count, Manga, Science Papers, Concentrated Reading, 20 pages, 15 pages, Japanese, Overview, 30 min]
  15. Demo: What can we do for people who think a mobile eye tracker is too expensive or who don’t want to wear glasses the whole day?