Slide 1

Does Text Length Matter? Analysis and Estimation of News Article Reading Time with Multimodal Machine Learning
Shotaro Ishihara (Nikkei Inc.) and Yasufumi Nakama, [email protected]
IEEE BigData 2022, Industry and Government Program

Slide 2

Research Overview

[Diagram] Estimating reading time from:
● text length
● headline / body text
● thumbnail image
● others like genre
● past reading history

Slide 3

Summary 1: Dataset

[Same features → reading time diagram as the Research Overview]
✅ Real-world content and access logs from Nikkei

Slide 4

Summary 2: Text length

[Same features → reading time diagram as the Research Overview]
✅ Text length doesn't strongly correlate with reading time

Slide 5

Summary 3: Multimodal

[Same features → reading time diagram as the Research Overview]
✅ Multimodal features boosted estimation performance

Slide 6

Outline
● Introduction
● Problem Formulation
● Proposed Method
● Experiments
● Conclusion and Future Work

Slide 7

Reading time estimation helps:
● Push notifications [1]
● Recommendation [2, 4-6]
● User decision support [3, 7]
● Clickbait analysis [22-23]

Slide 8

How can we estimate reading time?

[Same features → reading time diagram as the Research Overview]

Slide 9

Research questions
1. How much does text length correlate with reading time?
2. How much do features other than text length improve the performance of reading time estimation?

Slide 10

Reading time dataset
● A large dataset of reading times for Japanese financial news from Nikkei.
  ○ About 1,000 articles a day, 800,000 paid subscribers (and data infrastructure)
  ○ Larger and more scalable than existing datasets that record eye movements [8, 9] or brain activity [10]

Slide 11

Dataset details

100,000 sessions × 3 splits:
● train: 2021-12-01
● val: 2021-12-08
● test: 2021-12-15
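A minimal sketch of this date-based split, assuming a hypothetical `sessions` table with a `date` column (not the paper's actual schema):

```python
# Date-based split sketch; the file name and columns are hypothetical.
import pandas as pd

sessions = pd.read_csv("sessions.csv", parse_dates=["date"])

train = sessions[sessions["date"] == "2021-12-01"].sample(100_000, random_state=0)
val = sessions[sessions["date"] == "2021-12-08"].sample(100_000, random_state=0)
test = sessions[sessions["date"] == "2021-12-15"].sample(100_000, random_state=0)
```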

Slide 12

RQ1: text length (x) & reading time (y)

Correlation coefficient: 0.04 (and 0.31)
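A sketch of measuring such a correlation with SciPy. The data here is dummy, and the slide does not say which coefficient variants its two values are, so both Pearson and Spearman are shown:

```python
# Correlation between text length and reading time; synthetic data stands in
# for the Nikkei dataset.
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(0)
text_length = rng.integers(100, 3000, size=10_000)
reading_time = rng.gamma(2.0, 30.0, size=10_000)  # seconds

r, _ = pearsonr(text_length, reading_time)
rho, _ = spearmanr(text_length, reading_time)
print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
```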

Slide 13

Proposed Method
● An architecture matched to each type of data
● End-to-end (E2E) fine-tuning
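A minimal sketch of such a per-modality architecture in PyTorch. The checkpoints, dimensions, and head are illustrative, not the authors' exact design, and the LSTM over reading history mentioned later is omitted for brevity:

```python
# Multimodal reading-time regressor sketch: BERT for text, Swin for the
# thumbnail, plus tabular features, fused by a small regression head.
import torch
import torch.nn as nn
from transformers import AutoModel

class ReadingTimeModel(nn.Module):
    def __init__(self,
                 text_model="cl-tohoku/bert-base-japanese",
                 image_model="microsoft/swin-tiny-patch4-window7-224",
                 n_tabular=16):  # number of tabular features (hypothetical)
        super().__init__()
        self.text_encoder = AutoModel.from_pretrained(text_model)    # BERT
        self.image_encoder = AutoModel.from_pretrained(image_model)  # Swin
        fused_dim = (self.text_encoder.config.hidden_size
                     + self.image_encoder.config.hidden_size + n_tabular)
        self.head = nn.Sequential(
            nn.Linear(fused_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 1),  # regression output: predicted reading time
        )

    def forward(self, input_ids, attention_mask, pixel_values, tabular):
        text = self.text_encoder(
            input_ids=input_ids,
            attention_mask=attention_mask).last_hidden_state[:, 0]  # [CLS]
        image = self.image_encoder(pixel_values=pixel_values).pooler_output
        return self.head(torch.cat([text, image, tabular], dim=-1)).squeeze(-1)
```

With E2E fine-tuning, the pretrained encoders are updated together with the head rather than used as frozen feature extractors.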

Slide 14

Experiments: Features & Models
1. The model was fixed to LightGBM [16] and the features were explored.
2. The features were fixed and differences between models were observed:
   a. Ridge regression
   b. MLP
   c. Proposed method (with/without E2E fine-tuning)
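A minimal LightGBM baseline sketch for step 1; the features, targets, and hyperparameters are synthetic and illustrative, not the paper's:

```python
# LightGBM baseline for reading-time regression on dummy data.
import lightgbm as lgb
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 10))      # e.g., length, genre, history stats
y_train = rng.gamma(2.0, 30.0, size=1000)  # dummy reading times in seconds
X_val = rng.normal(size=(200, 10))
y_val = rng.gamma(2.0, 30.0, size=200)

model = lgb.LGBMRegressor(n_estimators=1000, learning_rate=0.05)
model.fit(X_train, y_train,
          eval_set=[(X_val, y_val)],
          eval_metric="rmse",
          callbacks=[lgb.early_stopping(stopping_rounds=50)])
```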

Slide 15

Experiments: Features

Additional features improved the metric.

Slide 16

Important features by LightGBM
1. mean reading time
2. text length
3. minimum reading time
4. embedding of body text (dimension 193)
5. embedding of thumbnail image (dimension 88)
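A sketch of how such a ranking can be read off a fitted model, continuing the LightGBM baseline sketch above (feature names are hypothetical):

```python
# Rank features by total gain from the fitted LightGBM model above.
import pandas as pd

importance = pd.Series(
    model.booster_.feature_importance(importance_type="gain"),
    index=[f"feature_{i}" for i in range(X_train.shape[1])],
).sort_values(ascending=False)
print(importance.head(5))  # top-5 features, cf. the ranking above
```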

Slide 17

Experiments: Models
● With the same features, LightGBM worked better than Ridge regression and MLP.
● The proposed method outperformed LightGBM by adding an LSTM and E2E fine-tuning.

Slide 18

Multimodal training tips
● Different learning rates: 2e-5 for BERT, 1e-4 for Swin Transformer, and 1e-2 for the others
● CosineAnnealingLR for training stability
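A sketch of these tips in PyTorch with the learning rates from this slide; the optimizer choice (AdamW) and `T_max` are assumptions, and `model` refers to the multimodal sketch earlier:

```python
# Per-module learning rates via optimizer parameter groups, plus cosine
# annealing of every group's learning rate.
import torch

optimizer = torch.optim.AdamW([
    {"params": model.text_encoder.parameters(), "lr": 2e-5},   # BERT
    {"params": model.image_encoder.parameters(), "lr": 1e-4},  # Swin Transformer
    {"params": model.head.parameters(), "lr": 1e-2},           # the others
])
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10)

for epoch in range(10):
    # ... one epoch of training over mini-batches ...
    scheduler.step()  # anneal all group learning rates along a cosine curve
```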

Slide 19

Conclusion
● We highlighted the importance of reading time estimation and evaluated an implementation.
● Our analysis revealed that reading time does not strongly correlate with text length.
● Our experiments showed that a multimodal machine learning approach led to more accurate estimation than simply using text length.

Slide 20

Future Work
● Offline evaluation → online operation
● Further feature & model exploration
● Clickbait analysis