
Narrative of Iris

kilometer
September 19, 2020


in #88 Tokyo.R


Transcript

  1. #88
    2020.09.19
    Narrative of iris data
    kilometer00


  2. Who!?

  3. Who!?
    Name: @kilometer
    Job: Post-Doc (Ph. D. in Engineering)
    Field: Behavioral Neurosci.
    Brain Imaging
    Medical System
    R: ~ 10 years


  4. Introduction of Iris data


  5. > head(iris)
      Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    1          5.1         3.5          1.4         0.2  setosa
    2          4.9         3.0          1.4         0.2  setosa
    3          4.7         3.2          1.3         0.2  setosa
    4          4.6         3.1          1.5         0.2  setosa
    5          5.0         3.6          1.4         0.2  setosa
    6          5.4         3.9          1.7         0.4  setosa
    Iris Data
    > str(iris)
    'data.frame': 150 obs. of 5 variables:
    $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
    $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
    $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
    $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
    $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 ...


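  6. library(tidyverse)
    # Split names such as "Sepal.Length" at the dot: the first part becomes
    # the `key` column, the second part names the value columns (Length, Width).
    iris_long <-
      iris %>%
      pivot_longer(cols = -Species,
                   names_sep = "\\.",
                   names_to = c("key", ".value"))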
    > iris_long
    ## # A tibble: 6 x 4
    ## Species key Length Width
    ##
    ## 1 setosa Sepal 5.1 3.5
    ## 2 setosa Petal 1.4 0.2
    ## 3 setosa Sepal 4.9 3
    ## 4 setosa Petal 1.4 0.2
    ## 5 setosa Sepal 4.7 3.2
    Iris Data


  7. ggplot(data = iris_long) +
      aes(x = Width, y = Length,
          color = Species, shape = key) +
      geom_point()
    Iris Data

  8. “R for Data Science” (Wickham & Grolemund, 2017)


  9. “R for Data Science” (Wickham & Grolemund, 2017)
    Data


  10. “R for Data Science” (Wickham & Grolemund, 2017)
    Data Hypothesis & observation Objectives
    Background


  11. “R for Data Science” (Wickham & Grolemund, 2017)
    Data Hypothesis & observation Objectives
    Background


  12. Iris flowers


  13. Iris flowers
    Northern Blue flag (Iris versicolor)
    Gordon, D. & Robertson, E., from Wikipedia, CC BY-SA 3.0


  14. Iris setosa var. canadensis
    Iris setosa var. interior
    Iris setosa
    Iris versicolor
    Iris virginica
    Iris virginica var. shrevei
    Iris flowers
    (Morphological classification of northern and sub-arctic blue flags)
    Anderson, E., 1936, Ann Mo Bot Gard.

  15. The species problem
    Iris setosa var. canadensis
    Genus: Iris
    Specific name: setosa
    Variety: canadensis
    "Species are groups of actually or potentially
    interbreeding natural populations, which are
    reproductively isolated from other such groups."
    ---- by Ernst Mayr, 1942
    Queiroz, K., 2005, PNAS
    Mayr, E., 1942, Columbia Univ. Press

  16. The species problem
    Queiroz, K., 2005, PNAS
    SC (species criterion) 1-8:
    the times at which the daughter lineages
    acquire different properties relative to
    one another


  17. "The species problem in Iris"
    ---- by Edgar Anderson, 1936
    "As a biological phenomenon the species problem is
    worthy of serious study as an end in itself."
    Anderson, E., 1936, Ann Mo Bot Gard.

  18. Iris flowers
    (Morphological classification of northern and sub-arctic blue flags)
    Iris setosa
    Iris versicolor
    Iris virginica
    Petals: setose (bearing bristles) or laminate (smooth)
    Sepals: minutely papillate at the base of blade, or
    macroscopically pubescent at the base of blade
    Anderson, E., 1936, Ann Mo Bot Gard.

  19. The other Irises


  20. Iris albicans
    "The Botanical Garden", detailing plants brought to Egypt after the
    campaigns of Tuthmosis III (around 1426 B.C.), Karnak Temple
    Farrar, L., 2016, Windgather Press, photo: en.wikipedia.org/wiki/Iris_albicans

  21. Iris, a Greek goddess
    ・Daughter of
    Thaumas (son of Pontus) &
    Electra (daughter of Oceanus)
    ・Messenger of Hera
    ・Goddess of the rainbow
    ・Goddess of the sky and sea
    Rainbow: bridge between Heaven and Earth
    (in ancient Greek, iris = rainbow, eiris = messenger)
    Koudu, H., 1953, Iwanami


  22. photo: https://www.theoi.com/Gallery/P21.6.html
    Hera & Iris (ca. 480 B.C.)
    kerykeion
    oinochoe jug
    wings
    sakkos
    Hera Iris

  23. Figures: en.wikipedia.org/wiki/Iris_(anatomy)
    Iris in anatomy
    Iris (the iris of the eye)

  24. Iris, as a symbol
    Fleur-de-lis
    photos: en.wikipedia.org/wiki/Fleur-de-lis, wiki/Iris_pseudacorus, wiki/Iris_florentina
    I. pseudacorus I. florentina


  25. Encode
    Apple
    Real
    Apple
    Information
    Decode

  26. Divergence
    Real
    Info
    Data Apple
    Encoding

  27. Loss
    Symbol grounding problem
    Divergence
    Real
    Info
    Data Apple
    Encoding

  28. Anderson's Iris study


  29. "The species problem in Iris"
    ---- by Edgar Anderson, 1936
    Symbol grounding problem

  30. Iris flowers
    (Morphological classification of northern and sub-arctic blue flags)
    Iris setosa
    Iris versicolor
    Iris virginica
    Petals: setose (bearing bristles) or laminate (smooth)
    Sepals: minutely papillate at the base of blade, or
    macroscopically pubescent at the base of blade
    Anderson, E., 1936, Ann Mo Bot Gard.

  31. Anderson, E., 1936, Ann Mo Bot Gard., towardsdatascience.com


  32. Anderson, E., 1936, Ann Mo Bot Gard.
    I. versicolor
    I. virginica
    I. virginica var. shrevei
    ideograph
    Sepal
    Petal


  33. Q.
    "The northern blue flags ...... study the
    minutiae of variation so intensively in these
    two species that one might demonstrate
    the way in which one species had evolved
    from the other, or from some common
    ancestor."
    A.
    "Iris versicolor might vary greatly and
    that Iris virginica might vary greatly
    but that each remained itself. ......
    The variation within could never be
    compounded into the variation between."

  34. The iris data


  35. “R for Data Science” (Wickham & Grolemund, 2017)
    Data Hypothesis & observation Objectives
    Background


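  36. Fisher, R. A., 1936, Annals of Eugenics
    I. Discriminant functions
    When two or more populations have been measured on x1, ...., x8,
    interest attaches to the linear functions by which the populations
    are best discriminated. At the author's suggestion this has already
    been used in craniometry, (a) ...... and (b) ......, showing most
    distinctly a progressive or secular trend. In the present paper the
    application of the same principle is illustrated on a taxonomic
    problem, and some questions connected with the precision of the
    processes employed are also discussed.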

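  37. Fisher, R. A., 1936, Annals of Eugenics
    I. Discriminant functions
    When two or more populations have been measured on x1, ...., x8,
    interest attaches to the linear functions by which the populations
    are best discriminated. At the author's suggestion this has already
    been used in craniometry, (a) ...... and (b) ......, showing most
    distinctly a progressive or secular trend. In the present paper the
    application of the same principle is illustrated on a taxonomic
    problem, and some questions connected with the precision of the
    processes employed are also discussed.
    (Annals of Eugenics)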

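  38. Fisher, R. A., 1936, Annals of Eugenics
    I. Discriminant functions
    When two or more populations have been measured on x1, ...., x8,
    interest attaches to the linear functions by which the populations
    are best discriminated. At the author's suggestion this has already
    been used in craniometry, (a) ...... and (b) ......, showing most
    distinctly a progressive or secular trend. In the present paper the
    application of the same principle is illustrated on a taxonomic
    problem, and some questions connected with the precision of the
    processes employed are also discussed.
    (Annals of Eugenics)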
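As a rough illustration of the kind of linear function described in the quoted abstract, the following R sketch computes a two-group linear discriminant on simulated data. Everything here is an assumption made for the example: the data are random, and the names `group_a`, `group_b`, `w`, and `cutoff` are hypothetical. It is not Fisher's own computation and it deliberately does not use the iris data.

```r
set.seed(1)

# Hypothetical data: two groups measured on two variables (not iris).
n <- 50
group_a <- cbind(x1 = rnorm(n, mean = 0), x2 = rnorm(n, mean = 0))
group_b <- cbind(x1 = rnorm(n, mean = 2), x2 = rnorm(n, mean = 1))

# Group means and pooled within-group covariance matrix.
mean_a <- colMeans(group_a)
mean_b <- colMeans(group_b)
pooled_cov <- ((n - 1) * cov(group_a) + (n - 1) * cov(group_b)) / (2 * n - 2)

# Coefficients of the linear function: solve(S, d) computes S^{-1} d.
w <- solve(pooled_cov, mean_b - mean_a)

# Project every observation onto the discriminant direction.
score_a <- drop(group_a %*% w)
score_b <- drop(group_b %*% w)

# Classify by the midpoint between the two projected group means.
cutoff <- (sum(w * mean_a) + sum(w * mean_b)) / 2
accuracy <- mean(c(score_a < cutoff, score_b > cutoff))
accuracy
```

The weights `w` define the single linear combination of the measurements along which the two simulated groups are separated most clearly relative to their within-group spread.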

  39. -- Publisher's comment --
    The work of eugenicists was often pervaded by
    prejudice against racial, ethnic and disabled groups.
    Publication of this material online is for scholarly
    research purposes and is not an endorsement or
    promotion of the views expressed in any of these
    articles or eugenics in general.

  40. The number of citations is one of the most popular
    indices of scientific research impact.
    Do you REALLY want to give
    this paper any more impact?

  41. When you use the iris data,
    you also become one of the
    characters in its narrative.


  42. Is your iris "the iris"?


  43. Bezdek, J. C. et al., 1999, IEEE Transactions on Fuzzy Systems
    "We do not guarantee that all the results
    we discuss for “the” Iris data really
    pertain to the same numerical inputs."


  44. Bezdek, J. C. et al., 1999, IEEE Transactions on Fuzzy Systems
    Specifically, two vectors in Iris Setosa were
    wrong: vector 35 in Fisher is (4.9, 3.1, 1.5, 0.2),
    but in the machine learning electronic database
    it had the coordinates (4.9, 3.1, 1.5, 0.1); and
    vector 38 in Fisher is (4.9, 3.6, 1.4, 0.1), but in
    the electronic database it was (4.9, 3.1, 1.5, 0.1).

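A minimal sketch of how one might check a copy of the data against the two vectors quoted on this slide. `matches_fisher()` is a hypothetical helper written for this example; the reference values are the ones quoted from Bezdek et al. for Fisher's paper, and the helper assumes the four measurement columns come first, in the same order as in R's built-in `iris`.

```r
# Values quoted from Bezdek et al. (1999) for Fisher's vectors 35 and 38.
fisher_rows <- rbind(
  "35" = c(4.9, 3.1, 1.5, 0.2),
  "38" = c(4.9, 3.6, 1.4, 0.1)
)

# Hypothetical helper: TRUE for each row whose four measurements match.
matches_fisher <- function(df) {
  rows <- as.integer(rownames(fisher_rows))
  candidate <- unname(as.matrix(df[rows, 1:4]))
  result <- apply(abs(candidate - unname(fisher_rows)) < 1e-8, 1, all)
  names(result) <- rownames(fisher_rows)
  result
}

# Check the copy that ships with R.
matches_fisher(iris)
```

The same call can be pointed at any other copy of the data, for example a CSV file read with `read.csv()`, as long as the column order is the same.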

  45. http://archive.ics.uci.edu/ml/datasets/Iris


  46. http://archive.ics.uci.edu/ml/datasets/Iris


  47. "Better yet (and we know many of you will
    check our version this way), return to the
    source and take the values directly from
    Fisher’s paper."
    Bezdek, J. C. et al., 1999, IEEE Transactions on Fuzzy Systems


  48. "Better yet (and we know many of you will
    check our version this way), return to the
    source and take the values directly from
    Fisher’s paper."
    Bezdek, J. C. et al., 1999, IEEE Transactions on Fuzzy Systems
    -> Or, stop using the Iris data.


  49. https://allisonhorst.github.io/palmerpenguins/

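One possible way to swap the data set, sketched under the assumption that the `palmerpenguins` and `ggplot2` packages are installed. The choice of the two bill measurements is only an example; the plot mirrors the style of the earlier iris scatter plot but is not taken from the slides.

```r
# Assumption: install.packages(c("palmerpenguins", "ggplot2")) has been run.
library(ggplot2)
library(palmerpenguins)

# Scatter plot of two bill measurements, colored by species,
# using the penguins data in place of iris.
p <- ggplot(data = penguins) +
  aes(x = bill_depth_mm, y = bill_length_mm, color = species) +
  geom_point(na.rm = TRUE)  # drop rows with missing measurements quietly

print(p)
```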

  50. 1.
    Data always has its own narrative.
    Data Hypothesis & observation Objectives
    Background

  51. 2.
    Anderson, E., "The species problem in Iris", 1936
    Fisher, R., Annals of Eugenics, 1936
    Should not be cited any more,
    because citation count is one measure of scientific impact

  52. 3.
    There are several miscopied versions of the "iris" data.
    Original
    Miscopy

  53. 4.
    Community movement


  54. The only way to stop citing Fisher's paper is to
    not use iris data. That would solve the other
    annoying problem of checking for miscopying.
    Don't forget when you use the iris data you
    also become one of the characters in its
    narrative.
    We can stop using the iris data starting today.
    Actually, it's quite easy.
    5. My opinion