Narrative of Iris

kilometer
September 19, 2020

in #88 Tokyo.R

Transcript

  1. #88
    2020.09.19
    Narrative of iris data
    kilometer00

  2. Who!?
    (Who is this?)

  3. Who!?
    Name: @kilometer
    Job: Post-Doc (Ph. D. in Engineering)
    Field: Behavioral Neurosci.
    Brain Imaging
    Medical System
    R: ~ 10 years

  4. Introduction of Iris data

  5. > head(iris)
    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    1 5.1 3.5 1.4 0.2 setosa
    2 4.9 3.0 1.4 0.2 setosa
    3 4.7 3.2 1.3 0.2 setosa
    4 4.6 3.1 1.5 0.2 setosa
    5 5.0 3.6 1.4 0.2 setosa
    6 5.4 3.9 1.7 0.4 setosa
    Iris Data
    > str(iris)
    'data.frame': 150 obs. of 5 variables:
    $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
    $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
    $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
    $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
    $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 ...

  6. library(tidyverse)
    # Reshape to one row per flower part (Sepal / Petal)
    iris_long <- iris %>%
      pivot_longer(cols = -Species,
                   names_sep = "\\.",
                   names_to = c("key", ".value"))
    > iris_long
    ## # A tibble: 6 x 4
    ## Species key Length Width
    ##
    ## 1 setosa Sepal 5.1 3.5
    ## 2 setosa Petal 1.4 0.2
    ## 3 setosa Sepal 4.9 3
    ## 4 setosa Petal 1.4 0.2
    ## 5 setosa Sepal 4.7 3.2
    Iris Data

  7. ggplot(data = iris_long) +
      aes(x = Width, y = Length,
          color = Species, shape = key) +
      geom_point()
    Iris Data

  8. “R for Data Science” (Wickham & Grolemund, 2017)

  9. “R for Data Science” (Wickham & Grolemund, 2017)
    Data

  10. “R for Data Science” (Wickham & Grolemund, 2017)
    Data Hypothesis & observation Objectives
    Background

  12. Iris flowers

  13. Iris flowers
    Northern Blue flag (Iris versicolor)
    Gordon, D. & Robertson, E., from Wikipedia, CC BY-SA 3.0

  14. Iris setosa var. canadensis
    Iris setosa var. interior
    Iris setosa
    Iris versicolor
    Iris virginica
    Iris virginica var. shrevei
    Iris flowers
    (Morphological classification of northern and sub-arctic blue flags)
    Anderson, E., 1936, Ann Mo Bot Gard.

  15. The species problem
    Iris setosa var. canadensis
    Genus: Iris
    Specific name: setosa
    Variety: canadensis
    "Species are groups of actually or potentially
    interbreeding natural populations, which are
    reproductively isolated from other such groups."
    ---- by Ernst Mayr, 1942
    Mayr, E., 1942, Columbia Univ. Press
    Queiroz, K., 2005, PNAS

  16. The species problem
    Queiroz, K., 2005, PNAS
    SC (species criterion) 1-8:
    the times at which the daughter lineages
    acquire different properties relative to
    one another

  17. "The Species Problem in Iris"
    ---- by Edgar Anderson, 1936
    "As a biological phenomenon the species problem is
    worthy of serious study as an end in itself."
    Anderson, E., 1936, Ann Mo Bot Gard.

  18. Iris flowers
    (Morphological classification of northern and sub-arctic blue flags)
    Iris setosa
    Iris versicolor
    Iris virginica
    Petals
    setose (bearing bristles)
    laminate (smooth)
    Sepals
    macroscopically pubescent at the base of blade (soft hairs at the base, visible to the naked eye)
    minutely papillate at the base of blade (finely papillate at the base)
    Anderson, E., 1936, Ann Mo Bot Gard.

  19. The other Irises

  20. Iris albicans
    "The Botanical Garden", detailing plants brought to Egypt after the
    campaigns of Tuthmosis III (around 1426 B.C.), Karnak Temple
    Farrar, L., 2016, Windgather Press, photo: en.wikipedia.org/wiki/Iris_albicans

  21. Iris, a Greek goddess
    ・Daughter of
    Thaumas (son of Pontus) &
    Electra (daughter of Oceanus)
    ・Messenger of Hera
    ・Goddess of the rainbow
    ・Goddess of the sky and sea
    Rainbow: bridge between Heaven and Earth
    (in ancient Greek, iris = rainbow, eiris = messenger)
    Koudu, H., 1953, Iwanami

  22. photo: https://www.theoi.com/Gallery/P21.6.html
    Hera & Iris (ca. 480 B.C.)
    kerykeion
    oinochoe jug
    wings
    sakkos
    Hera Iris

  23. Figures: en.wikipedia.org/wiki/Iris_(anatomy)
    Iris in anatomy
    Iris (the colored part of the eye)

  24. Iris, as a symbol
    Fleur-de-lis
    photos: en.wikipedia.org/wiki/Fleur-de-lis, wiki/Iris_pseudacorus, wiki/Iris_florentina
    I. pseudacorus I. florentina

  25. Iris
    Encode

  26. Ramen
    Encode

  27. Encode
    Apple
    Real
    Apple
    Information
    Decode

  28. Divergence
    Real
    Info
    Data Apple
    Encoding

  29. Loss
    Symbol grounding problem
    Divergence
    Real
    Info
    Data Apple
    Encoding

  30. Anderson's Iris study

  31. "The Species Problem in Iris"
    ---- by Edgar Anderson, 1936
    Symbol grounding problem

  32. Iris flowers
    (Morphological classification of northern and sub-arctic blue flags)
    Iris setosa
    Iris versicolor
    Iris virginica
    Petals
    setose (bearing bristles)
    laminate (smooth)
    Sepals
    macroscopically pubescent at the base of blade (soft hairs at the base, visible to the naked eye)
    minutely papillate at the base of blade (finely papillate at the base)
    Anderson, E., 1936, Ann Mo Bot Gard.

  33. Anderson, E., 1936, Ann Mo Bot Gard., towardsdatascience.com

  34. Anderson, E., 1936, Ann Mo Bot Gard.
    I. versicolor
    I. virginica
    I. virginica var. shrevei
    ideograph
    Sepal
    Petal

  35. Q.
    The northern blue flags ...... study the
    minutiae of variation so intensively in these
    two species that one might demonstrate
    the way in which one species had evolved
    from the other, or from some common
    ancestor.
    A.
    Iris versicolor might vary greatly and
    that Iris virginica might vary greatly
    but that each remained itself. ......
    The variation within could never be
    compounded into the variation between.

  36. The iris data

  37. > ?iris

  41. “R for Data Science” (Wickham & Grolemund, 2017)
    Data Hypothesis & observation Objectives
    Background

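  42. Fisher, R. A., 1936, Annals of Eugenics
    I. Discriminant functions
    When two or more populations have been measured on x1, ...., x8,
    interest attaches to the linear functions by which the populations
    are best discriminated. The author's suggestions, such as (a) ......
    and (b) ......, already applied in craniometry, show most distinctly
    a progressive or secular trend. In the present paper, the application
    of the same principle is illustrated on a taxonomic problem, and some
    questions connected with the precision of the processes employed are
    also discussed.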

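  43. Fisher, R. A., 1936, Annals of Eugenics
    I. Discriminant functions
    When two or more populations have been measured on x1, ...., x8,
    interest attaches to the linear functions by which the populations
    are best discriminated. The author's suggestions, such as (a) ......
    and (b) ......, already applied in craniometry, show most distinctly
    a progressive or secular trend. In the present paper, the application
    of the same principle is illustrated on a taxonomic problem, and some
    questions connected with the precision of the processes employed are
    also discussed.
    ("Annals of Eugenics": a journal of eugenics)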
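
For the two-group case, the kind of linear function described on this slide can be sketched as follows. This is an illustration on synthetic numbers, not a reproduction of Fisher's calculation: the data, the variable names, and the helpers `scatter()` and `ratio()` are all assumptions made for this example.

```r
set.seed(1)

# Synthetic measurements for two groups (not real data), two variables each.
n <- 50
group_a <- cbind(x1 = rnorm(n, mean = 5.0, sd = 0.4),
                 x2 = rnorm(n, mean = 3.4, sd = 0.4))
group_b <- cbind(x1 = rnorm(n, mean = 6.0, sd = 0.5),
                 x2 = rnorm(n, mean = 2.8, sd = 0.3))

mean_a <- colMeans(group_a)
mean_b <- colMeans(group_b)

# Hypothetical helper: sum-of-squares-and-products matrix around the group mean.
scatter <- function(m) crossprod(sweep(m, 2, colMeans(m)))
s_within <- scatter(group_a) + scatter(group_b)

# Coefficients of the linear function: solve S_within %*% w = mean_a - mean_b.
w <- solve(s_within, mean_a - mean_b)

# Score of each observation on the linear function.
score_a <- drop(group_a %*% w)
score_b <- drop(group_b %*% w)

# Hypothetical helper: squared difference of group means divided by the
# pooled within-group sum of squares.
ratio <- function(sa, sb) {
  (mean(sa) - mean(sb))^2 /
    (sum((sa - mean(sa))^2) + sum((sb - mean(sb))^2))
}

# Compare the combined function with each single measurement.
ratio(score_a, score_b)
ratio(group_a[, "x1"], group_b[, "x1"])
ratio(group_a[, "x2"], group_b[, "x2"])
```

The coefficients `w` are chosen so that the between-to-within ratio of the scores is at least as large as the ratio for any single measurement, which is what "best discriminated" means in this sketch.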

  45. -- Publisher's comment --
    The work of eugenicists was often pervaded by
    prejudice against racial, ethnic and disabled groups.
    Publication of this material online is for scholarly
    research purposes and is not an endorsement or
    promotion of the views expressed in any of these
    articles or eugenics in general.

  46. The number of citations is one of the most popular
    indices of scientific research impact.
    Do you REALLY want to give
    this paper any more impact?

  47. When you use the iris data,
    you also become one of the
    characters in its narrative.

  48. Is your iris "the iris"?

  49. Bezdek, J. C. et al., 1999, IEEE Transactions on Fuzzy Systems
    "We do not guarantee that all the results
    we discuss for “the” Iris data really
    pertain to the same numerical inputs."

  50. Bezdek, J. C. et al., 1999, IEEE Transactions on Fuzzy Systems
    Specifically, two vectors in Iris Setosa were
    wrong: vector 35 in Fisher is (4.9, 3.1, 1.5, 0.2),
    but in the machine learning electronic database
    it had the coordinates (4.9, 3.1, 1.5, 0.1); and
    vector 38 in Fisher is (4.9, 3.6, 1.4, 0.1), but in
    the electronic database it was (4.9, 3.1, 1.5, 0.1).
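
One way to run this check on whatever copy is at hand is to compare rows 35 and 38 against the values quoted on this slide. A minimal sketch in base R: `fisher_rows` and `check_iris_copy()` are names made up for this illustration, and the column order (sepal length, sepal width, petal length, petal width) is assumed to match R's `iris`.

```r
# Values for vectors 35 and 38 as quoted from Bezdek et al. (1999),
# in the order sepal length, sepal width, petal length, petal width.
fisher_rows <- list(
  "35" = c(4.9, 3.1, 1.5, 0.2),
  "38" = c(4.9, 3.6, 1.4, 0.1)
)

# Hypothetical helper: TRUE for each checked row of `df` that matches
# the values attributed to Fisher's paper.
check_iris_copy <- function(df) {
  vapply(names(fisher_rows), function(i) {
    observed <- unlist(df[as.integer(i), 1:4], use.names = FALSE)
    isTRUE(all.equal(observed, fisher_rows[[i]]))
  }, logical(1))
}

# Check the copy bundled with R.
check_iris_copy(iris)

# A deliberately miscopied version, using the erroneous database values
# quoted on the slide, to show what a failed check looks like.
miscopied <- iris
miscopied[35, 1:4] <- c(4.9, 3.1, 1.5, 0.1)
miscopied[38, 1:4] <- c(4.9, 3.1, 1.5, 0.1)
check_iris_copy(miscopied)
```

This only covers the two rows named on the slide; a full check would still mean going back to the table in Fisher's paper.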

  51. http://archive.ics.uci.edu/ml/datasets/Iris

  54. "Better yet (and we know many of you will
    check our version this way), return to the
    source and take the values directly from
    Fisher’s paper."
    Bezdek, J. C. et al., 1999, IEEE Transactions on Fuzzy Systems

  55. "Better yet (and we know many of you will
    check our version this way), return to the
    source and take the values directly from
    Fisher’s paper."
    Bezdek, J. C. et al., 1999, IEEE Transactions on Fuzzy Systems
    -> Or, stop using the Iris data.

  56. Movements

  61. Penguins?

  63. https://allisonhorst.github.io/palmerpenguins/
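
As a rough sketch of what switching datasets could look like, the scatter plot from the iris example can be redrawn with the penguins data. This assumes the `palmerpenguins` and `ggplot2` packages are installed; the choice of bill depth and bill length as axes is made here only for illustration.

```r
# Runs only if both packages are available (an assumption of this sketch).
if (requireNamespace("palmerpenguins", quietly = TRUE) &&
    requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  penguins <- palmerpenguins::penguins

  # Same style of plot as the iris example, on the penguins data.
  # na.rm = TRUE drops rows with missing measurements without a warning.
  p <- ggplot(data = penguins) +
    aes(x = bill_depth_mm, y = bill_length_mm, color = species) +
    geom_point(na.rm = TRUE)
  print(p)
}
```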

  64. Summary

  65. 1.
    Data always has its own narrative.
    Data Hypothesis & observation Objectives
    Background

  66. 2.
    Anderson, E., "Species problem in iris.", 1936
    Fisher, R., Annals of Eugenics, 1936
    Should not be cited any more,
    because citation count is one measure of scientific impact

  67. 3.
    There are several miscopied versions of the "iris".
    Original
    Miscopy

  68. 4.
    Community movement

  69. 5. My opinion
    The only way to stop citing Fisher's paper is to
    not use the iris data. That would also solve the other
    annoying problem of checking for miscopying.
    Don't forget: when you use the iris data, you
    also become one of the characters in its
    narrative.
    We can stop using the iris data today.
    Actually, it's quite easy.

  70. Enjoy!!!
    KTM
