Analysis of Bias in Gathering Information Between User Attributes in News Application (ABCCS 2018)

ysekky
December 10, 2018

Transcript

  1. Analysis of Bias in Gathering Information
    Between User Attributes in News Application
    Yoshifumi Seki (Gunosy Inc.)
    Mitsuo Yoshida (Toyohashi University of
    Technology)
    ABCCS2018@IEEE Bigdata 2018
    2018.12.10

  2. Motivations
    ● Confirmation bias exists in information gathering on the web.
    ○ e.g., filter bubbles, echo chambers
    ○ These phenomena have been investigated by questionnaires.
    ● We would like to clarify these phenomena by analyzing behavior data.
    ○ In this study, we use user activity logs from a news application.
    ○ Applications: evaluating the diversity of recommender systems, improving long-term user
    satisfaction, and so on.

  3. Research Question
    ● Q. How does behavior in the news application differ between user attributes?
    ○ Ideally, we would like to analyze users based on their interests.
    ○ Instead of users’ interests, we analyze users based on their attributes.
    ● Our Contributions:
    ○ Clarify how user behavior relates across user attributes.
    ○ Detect keywords that are biased by attribute, using regression analysis.

  4. Data Source
    ● Gunosy
    ○ A popular Japanese news delivery service
    ○ Provides mobile applications (iOS, Android)
    ○ Over 24 million downloads
    ○ Delivers news from over 600 media sources

  5. Dataset
    ● August 1 to 31, 2018 (1 month)
    ● News articles
    ○ politics, society
    ● 2 types of action
    ○ Click, Like
    ● Clicked more than 100 times
    ● User attributes
    ○ Users register their own attributes in the application.
    ■ If users don’t register, their attributes are predicted by supervised learning.
    ○ Age
    ■ up to 29 (younger), 30-39 (middle), 40 and over (older)
    ○ Gender
    ■ male, female

  6. Gender Action Ratio
    action  gender   all     politics  society
    click   male     58.9%   76.2%     54.0%
    click   female   41.1%   23.8%     46.0%
    like    male     47.7%   78.2%     47.4%
    like    female   52.3%   21.8%     52.6%
    # of news                1,333     8,801

  7. Age Action Ratio
    action  age      all     politics  society
    click   young    34.7%   16.4%     23.1%
    click   middle   30.2%   22.1%     30.4%
    click   older    35.1%   61.5%     46.5%
    like    young    25.8%   8.8%      16.0%
    like    middle   25.4%   11.0%     22.1%
    like    older    48.7%   80.2%     61.9%
    # of news                1,333     8,801

  8. Normalize # of Actions
    ● The trend in the number of actions differs by category and attribute.
    ○ Normalization is needed.
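
The slides do not state how the counts were normalized. As a minimal sketch, assuming each article's count is divided by its attribute group's total (so that groups with very different activity levels become comparable), the step could look like the following. The function name `normalize_counts` and the sample numbers are hypothetical.

```python
from collections import defaultdict


def normalize_counts(counts):
    """Convert raw per-article action counts into shares of the group total.

    counts maps (attribute, article_id) -> raw number of actions.
    Returns a dict with the same keys; values sum to 1.0 per attribute.
    This share-based scheme is an assumption, not the authors' stated method.
    """
    totals = defaultdict(int)
    for (attribute, _article_id), n in counts.items():
        totals[attribute] += n
    return {
        key: (n / totals[key[0]] if totals[key[0]] else 0.0)
        for key, n in counts.items()
    }


# Made-up counts for illustration only.
raw = {
    ("male", "a1"): 300, ("male", "a2"): 100,
    ("female", "a1"): 30, ("female", "a2"): 10,
}
shares = normalize_counts(raw)
print(shares[("male", "a1")], shares[("female", "a1")])
```

In this toy input the male group is ten times more active, yet both groups give article `a1` the same share of their actions, which is the kind of comparison normalization is meant to enable.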

  9. Scatter Plot by gender
    [Figure: scatter plots by gender; panels: Click, Like]
    Pearson’s correlation coefficient
    Click: 0.902, 0.883 (strong positive correlation)
    Like: 0.502, 0.509 (weaker than click)

  10. Pearson’s coefficient by ages
                    politics          society
    pair            click    like     click    like
    young-middle    0.993    0.909    0.985    0.955
    middle-older    0.923    0.845    0.969    0.976
    older-young     0.901    0.786    0.936    0.902

  11. Result of Correlation Analysis
    ● Differences in user behavior by attribute within each category were compared using
    correlation coefficients.
    ○ Click counts have strong positive correlations between attributes.
    ○ Like counts are more weakly correlated than click counts.
    ● User behavior is strongly correlated between attributes.
    ○ We can therefore discuss the differences between attributes using user behavior data.
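
The comparison above rests on Pearson's correlation coefficient between two attribute groups' per-article action counts. A minimal sketch of that computation follows. The variable names and the sample counts are made up for illustration, and whether the authors used raw or normalized counts here is not stated.

```python
import math


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical click counts for the same five articles in two groups.
male_clicks = [120, 80, 45, 200, 10]
female_clicks = [60, 35, 30, 90, 8]
r = pearson(male_clicks, female_clicks)
print(round(r, 3))
```

Each index in the two lists must refer to the same article, so the coefficient measures whether articles popular in one group are also popular in the other.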

  12. Comparison by keywords
    ● Our purpose is to clarify how behavior differs between user attributes depending on
    the topic of news articles.
    ○ There are various definitions of news topics.
    ○ This study compares articles based on the keywords included in the title.
    ● Extract keywords from news articles.
    ○ Divide the title of each news article into morphemes using MeCab.
    ■ These morphemes are taken as keyword candidates.
    ○ Count the news articles that include each keyword candidate.
    ○ We adopt the top 100 words by this count as keywords.
    ■ Meaningless words are excluded.
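
The extraction steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the tokenizer is passed in as a function, and a plain whitespace splitter stands in for MeCab so the example runs without extra dependencies. The names `top_keywords` and `whitespace_tokenize`, the stopword set, and the sample titles are all hypothetical.

```python
from collections import Counter


def top_keywords(titles, tokenize, n=100, stopwords=frozenset()):
    """Return the n candidates that appear in the most article titles."""
    doc_freq = Counter()
    for title in titles:
        # set() counts each candidate once per article, not once per occurrence.
        doc_freq.update(set(tokenize(title)) - set(stopwords))
    return [word for word, _count in doc_freq.most_common(n)]


def whitespace_tokenize(title):
    """Stand-in tokenizer; a MeCab-based morpheme splitter would go here."""
    return title.lower().split()


# Made-up English titles for illustration only.
titles = [
    "Cabinet discusses summer time",
    "Police rescue missing boy",
    "Cabinet approves the budget",
    "Mother thanks police",
]
keywords = top_keywords(titles, whitespace_tokenize, n=2, stopwords={"the"})
print(keywords)
```

Passing the tokenizer as a parameter keeps the counting logic independent of the morphological analyzer, so a real MeCab wrapper can be swapped in without touching `top_keywords`.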

  13. Distribution of keyword correlation coefficient
    ● We would like to compare keywords between user attributes.
    ○ If the correlation coefficient of a keyword is weak, that keyword is not comparable.
    ● Keywords with a weak correlation coefficient include articles with very few
    actions.
    [Figure: distribution of keyword correlation coefficients; panels: Click, Like]

  14. Regression Analysis
    ● To detect differences between keywords, we adopt regression analysis.
    ● Regression analysis yields a slope and an intercept for each keyword.
    ○ Keywords whose coefficient of determination is 0.5 or less are excluded.
    ■ The coefficient of determination plays a role similar to the correlation coefficient
    (in simple linear regression it equals the square of Pearson’s r).
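
A minimal sketch of this step, assuming an ordinary least squares line is fitted per keyword with one attribute group's (normalized) action counts on the x-axis and the other group's on the y-axis. Which group goes on which axis is an assumption, and the names `fit_line` and `fit_keywords` and the sample data are hypothetical.

```python
def fit_line(xs, ys):
    """Least squares fit y = slope * x + intercept; returns (slope, intercept, r2)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values are constant; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    r2 = 1.0 - ss_res / ss_tot if ss_tot else 0.0
    return slope, intercept, r2


def fit_keywords(points_by_keyword, min_r2=0.5):
    """Fit each keyword and drop those whose R^2 is min_r2 or less."""
    fits = {}
    for keyword, (xs, ys) in points_by_keyword.items():
        slope, intercept, r2 = fit_line(xs, ys)
        if r2 > min_r2:
            fits[keyword] = (slope, intercept)
    return fits


# Made-up points: one keyword with a clear linear trend, one without.
data = {
    "cabinet": ([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8]),
    "weather": ([1.0, 2.0, 3.0, 4.0], [5.0, 1.0, 4.0, 2.0]),
}
fits = fit_keywords(data)
print(sorted(fits))
```

The `r2 > min_r2` test mirrors the "0.5 or less is excluded" rule: a keyword whose articles do not follow a roughly linear trend gives a slope and intercept that are not meaningful to compare.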

  15. Compare Keyword Intercept
    The slopes of these two keywords are close to the average,
    while one intercept is large and the other is small.

  16. Compare Keyword Slope
    The intercepts of these two keywords are close to the average,
    while one slope is large and the other is small.

  17. Compare keywords preferred by female
    The keyword “hospital” has many articles with fewer clicks than the keyword “mother”.

  18. Biased Keywords Detection
    ● Using slope (s) and intercept (i), keywords are divided into three categories
    based on mean ± σ.
    ○ larger than the upper bound (x > mean + σ)
    ○ smaller than the lower bound (x < mean - σ)
    ○ within the section (mean - σ < x < mean + σ)
    ● These categories are defined under the assumption that these parameters
    are normally distributed.
    ○ i.e., whether a value belongs to the 95% range or not.
    ● If one of the two parameters is within the section and the other is not, the keyword is regarded as biased.
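
The rule above can be sketched as follows, assuming the input is a mapping from keyword to its fitted (slope, intercept) pair. The band width is left as a parameter `k`: under a normal distribution mean ± 1σ covers roughly 68% of values and about 95% corresponds to roughly 2σ, so the multiplier may need adjusting. The use of the population standard deviation, the function names, and the sample values are assumptions made for illustration.

```python
import statistics


def band(value, mean, sigma, k=1.0):
    """Classify value as 'upper', 'lower', or 'within' relative to mean ± k*sigma."""
    if value > mean + k * sigma:
        return "upper"
    if value < mean - k * sigma:
        return "lower"
    return "within"


def detect_biased(fits, k=1.0):
    """Return keywords where exactly one of slope/intercept leaves its band.

    fits maps keyword -> (slope, intercept).
    The result maps keyword -> (slope_band, intercept_band).
    """
    slopes = [s for s, _ in fits.values()]
    intercepts = [i for _, i in fits.values()]
    s_mean, s_sigma = statistics.mean(slopes), statistics.pstdev(slopes)
    i_mean, i_sigma = statistics.mean(intercepts), statistics.pstdev(intercepts)
    biased = {}
    for keyword, (s, i) in fits.items():
        s_band = band(s, s_mean, s_sigma, k)
        i_band = band(i, i_mean, i_sigma, k)
        # Biased when one parameter is inside its band and the other is not.
        if (s_band == "within") != (i_band == "within"):
            biased[keyword] = (s_band, i_band)
    return biased


# Made-up (slope, intercept) pairs for five hypothetical keywords.
sample_fits = {
    "kw_a": (1.0, 0.0),
    "kw_b": (1.0, 0.1),
    "kw_c": (1.1, -0.1),
    "kw_d": (0.9, 0.0),
    "kw_e": (1.0, 2.0),
}
biased = detect_biased(sample_fits)
print(sorted(biased))
```

Here `kw_c` and `kw_d` stand out by slope and `kw_e` by intercept, while `kw_a` and `kw_b` stay inside both bands and are not reported.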

  19. Biased Keyword by intercept in gender
    ● Mio Sugita is a Japanese politician who published articles on LGBT in
    magazines. The claims in these articles caused controversy.
    ● There is news about the possible introduction of Summer Time before the
    2020 Summer Olympic Games in Tokyo.
    ● A 2-year-old boy was missing in the forest and was rescued by a volunteer.
    Keywords by category (each category covers click and like):
                                politics                                      society
    Upper (biased to male)      House of Representatives, China               Police, Obscenity
    Lower (biased to female)    Sugita Mio, Summer Time, Cabinet, Olympics    Child, Mother, Boy, Crush, Mother, Children

  23. Conclusion
    ● We analyzed behavior differences between user attributes based on the user
    behavior log of a news application and extracted keywords with biased
    behavior.
    ● Using regression analysis, we identify biased keywords from how far their
    slope and intercept depart from the average values.
    ● Future Work
    ○ Verify whether this result is valid according to social science knowledge.
    ○ Discover topics that are strongly biased by users' interests rather than user
    attributes.
    ○ Create a measure that can extract keywords more simply.
