Slide 1

Slide 1 text

#88 2020.09.19 Narrative of iris data kilometer00

Slide 2

Slide 2 text

Who!?

Slide 3

Slide 3 text

Who!?
Name: @kilometer
Job: Post-Doc (Ph.D. in Engineering)
Field: Behavioral Neurosci., Brain Imaging, Medical System
R: ~10 years

Slide 4

Slide 4 text

Introduction of Iris data

Slide 5

Slide 5 text

Iris Data
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
> str(iris)
'data.frame': 150 obs. of 5 variables:
 $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 ...

Slide 6

Slide 6 text

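Iris Data
library(tidyverse)
# Split names such as "Sepal.Length" at the dot: the first part becomes `key`,
# the second part (".value") becomes the Length / Width columns.
iris_long <- iris %>%
  pivot_longer(cols = -Species,
               names_sep = "\\.",
               names_to = c("key", ".value"))
> iris_long
## # A tibble: 6 x 4
##   Species key   Length Width
##   <fct>   <chr>  <dbl> <dbl>
## 1 setosa  Sepal    5.1   3.5
## 2 setosa  Petal    1.4   0.2
## 3 setosa  Sepal    4.9   3
## 4 setosa  Petal    1.4   0.2
## 5 setosa  Sepal    4.7   3.2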

Slide 7

Slide 7 text

Iris Data
ggplot(data = iris_long) +
  aes(x = Width, y = Length, color = Species, shape = key) +
  geom_point()

Slide 8

Slide 8 text

“R for Data Science” (Wickham & Grolemund, 2017)

Slide 9

Slide 9 text

“R for Data Science” (Wickham & Grolemund, 2017) Data

Slide 10

Slide 10 text

“R for Data Science” (Wickham & Grolemund, 2017) Data Hypothesis & observation Objectives Background

Slide 11

Slide 11 text

“R for Data Science” (Wickham & Grolemund, 2017) Data Hypothesis & observation Objectives Background

Slide 12

Slide 12 text

Iris flowers

Slide 13

Slide 13 text

Iris flowers
Northern blue flag (Iris versicolor)
Gordon, D. & Robertson, E., from Wikipedia, CC BY-SA 3.0

Slide 14

Slide 14 text

Iris flowers (Morphological classification of northern and sub-arctic blue flags)
Iris setosa var. canadensis
Iris setosa var. interior
Iris setosa
Iris versicolor
Iris virginica
Iris virginica var. shrevei
Anderson, E., 1936, Ann Mo Bot Gard.

Slide 15

Slide 15 text

The species problem
Iris setosa var. canadensis
Iris: genus; setosa: specific name; canadensis: variety
"Species are groups of actually or potentially interbreeding natural populations, which are reproductively isolated from other such groups." ---- by Ernst Mayr, 1942
Mayr, E., 1942, Columbia Univ. Press
Queiroz, K., 2005, PNAS

Slide 16

Slide 16 text

The species problem Queiroz, K., 2005, PNAS SC (species criterion) 1-8: the times at which the daughter lineages acquire different properties relative to one another

Slide 17

Slide 17 text

"The Species Problem in Iris" ---- by Edgar Anderson, 1936
"As a biological phenomenon the species problem is worthy of serious study as an end in itself."
Anderson, E., 1936, Ann Mo Bot Gard.

Slide 18

Slide 18 text

Iris flowers (Morphological classification of northern and sub-arctic blue flags)
Species: Iris setosa, Iris versicolor, Iris virginica
Petals: setose / laminate
Sepals: minutely papillate at the base of blade / macroscopically pubescent at the base of blade
Anderson, E., 1936, Ann Mo Bot Gard.

Slide 19

Slide 19 text

The other Irises

Slide 20

Slide 20 text

Iris albicans
'The Botanical Garden', detailing plants brought to Egypt after the campaigns of Tuthmosis III (around 1426 B.C.), Karnak Temple
Farrar, L., 2016, Windgather Press, photo: en.wikipedia.org/wiki/Iris_albicans

Slide 21

Slide 21 text

Iris, a Greek goddess
・Daughter of Thaumas (son of Pontus) & Electra (daughter of Oceanus)
・Messenger of Hera
・Goddess of the rainbow
・Goddess of the sky and sea
Rainbow: bridge between Heaven and Earth (in ancient Greek, iris = rainbow, eiris = messenger)
Koudu, H., 1953, Iwanami

Slide 22

Slide 22 text

Hera & Iris (ca. 480 B.C.)
Figure labels: Hera, Iris, kerykeion, oinochoe jug, wings, sakkos
photo: https://www.theoi.com/Gallery/P21.6.html

Slide 23

Slide 23 text

Iris in anatomy
Figures: en.wikipedia.org/wiki/Iris_(anatomy)

Slide 24

Slide 24 text

Iris, as a symbol
Fleur-de-lis, I. pseudacorus, I. florentina
photos: en.wikipedia.org/wiki/Fleur-de-lis, wiki/Iris_pseudacorus, wiki/Iris_florentina

Slide 25

Slide 25 text

Iris Encode

Slide 26

Slide 26 text

Ramen Encode

Slide 27

Slide 27 text

Encode Apple Real Apple Information Decode

Slide 28

Slide 28 text

Divergence Real Info Data Apple Encoding

Slide 29

Slide 29 text

Loss Symbol grounding problem Divergence Real Info Data Apple Encoding

Slide 30

Slide 30 text

Anderson's Iris study

Slide 31

Slide 31 text

"The Species Problem in Iris" ---- by Edgar Anderson, 1936
Symbol grounding problem

Slide 32

Slide 32 text

Iris flowers (Morphological classification of northern and sub-arctic blue flags)
Species: Iris setosa, Iris versicolor, Iris virginica
Petals: setose / laminate
Sepals: minutely papillate at the base of blade / macroscopically pubescent at the base of blade
Anderson, E., 1936, Ann Mo Bot Gard.

Slide 33

Slide 33 text

Anderson, E., 1936, Ann Mo Bot Gard., towardsdatascience.com

Slide 34

Slide 34 text

Anderson, E., 1936, Ann Mo Bot Gard. I. versicolor I. virginica I. virginica var. shrevei ideograph Sepal Petal

Slide 35

Slide 35 text

The northern blue flags
Q. ...... study the minutiae of variation so intensively in these two species that one might demonstrate the way in which one species had evolved from the other, or from some common ancestor.
A. Iris versicolor might vary greatly and that Iris virginica might vary greatly but that each remained itself. ...... The variation within could never be compounded into the variation between.

Slide 36

Slide 36 text

The iris data

Slide 37

Slide 37 text

> ?iris

Slide 38

Slide 38 text

No content

Slide 39

Slide 39 text

No content

Slide 40

Slide 40 text

No content

Slide 41

Slide 41 text

“R for Data Science” (Wickham & Grolemund, 2017) Data Hypothesis & observation Objectives Background

Slide 42

Slide 42 text

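Fisher, R. A., 1936, Annals of Eugenics
I. Discriminant functions
When two or more populations have been measured in x1, ...., x8, interest attaches to finding the linear functions by which the populations are best discriminated. At the author's suggestion this has already been applied in craniometry, (a) ...... and (b) ......, showing most distinctly a progressive or secular trend. In the present paper, the application of the same principle is illustrated on a taxonomic problem, and some questions connected with the precision of the processes employed are also discussed.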

Slide 43

Slide 43 text

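Fisher, R. A., 1936, Annals of Eugenics (Annals of Eugenics)
I. Discriminant functions
When two or more populations have been measured in x1, ...., x8, interest attaches to finding the linear functions by which the populations are best discriminated. At the author's suggestion this has already been applied in craniometry, (a) ...... and (b) ......, showing most distinctly a progressive or secular trend. In the present paper, the application of the same principle is illustrated on a taxonomic problem, and some questions connected with the precision of the processes employed are also discussed.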

Slide 44

Slide 44 text

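Fisher, R. A., 1936, Annals of Eugenics (Annals of Eugenics)
I. Discriminant functions
When two or more populations have been measured in x1, ...., x8, interest attaches to finding the linear functions by which the populations are best discriminated. At the author's suggestion this has already been applied in craniometry, (a) ...... and (b) ......, showing most distinctly a progressive or secular trend. In the present paper, the application of the same principle is illustrated on a taxonomic problem, and some questions connected with the precision of the processes employed are also discussed.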

Slide 45

Slide 45 text

-- Publisher's comment --
The work of eugenicists was often pervaded by prejudice against racial, ethnic and disabled groups. Publication of this material online is for scholarly research purposes and is not an endorsement or promotion of the views expressed in any of these articles or eugenics in general.

Slide 46

Slide 46 text

The number of citations is one of the most popular indices of scientific research impact. Do you REALLY want to give this paper any more impact?

Slide 47

Slide 47 text

When you use the iris data, you also become one of the characters in its narrative.

Slide 48

Slide 48 text

Is your iris "the iris"?

Slide 49

Slide 49 text

Bezdek, J. C. et al., 1999, IEEE Transactions on Fuzzy Systems "We do not guarantee that all the results we discuss for “the” Iris data really pertain to the same numerical inputs."

Slide 50

Slide 50 text

Bezdek, J. C. et al., 1999, IEEE Transactions on Fuzzy Systems
Specifically, two vectors in Iris Setosa were wrong: vector 35 in Fisher is (4.9, 3.1, 1.5, 0.2), but in the machine learning electronic database it had the coordinates (4.9, 3.1, 1.5, 0.1); and vector 38 in Fisher is (4.9, 3.6, 1.4, 0.1), but in the electronic database it was (4.9, 3.1, 1.5, 0.1).
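
The two vectors quoted above can be used to check which version of "the iris" a given copy is. The following is a minimal sketch, not part of the original slides: the helper name `check_iris_copy` is hypothetical, and it assumes the copy is a data frame whose first four columns are the measurements in Fisher's order and whose rows 35 and 38 correspond to his vectors 35 and 38.

```r
# Rows 35 and 38 as quoted from Bezdek et al. (1999) on this slide.
fisher  <- matrix(c(4.9, 3.1, 1.5, 0.2,
                    4.9, 3.6, 1.4, 0.1), nrow = 2, byrow = TRUE)
miscopy <- matrix(c(4.9, 3.1, 1.5, 0.1,
                    4.9, 3.1, 1.5, 0.1), nrow = 2, byrow = TRUE)

# Hypothetical helper: report which quoted version a copy agrees with.
check_iris_copy <- function(df) {
  # Take the four measurement columns of rows 35 and 38, dropping dimnames
  # so that only the numbers are compared.
  rows <- unname(as.matrix(df[c(35, 38), 1:4]))
  if (isTRUE(all.equal(rows, fisher))) {
    "matches Fisher (1936) as quoted"
  } else if (isTRUE(all.equal(rows, miscopy))) {
    "matches the miscopied electronic version"
  } else {
    "matches neither quoted version"
  }
}

# Usage: check the copy that ships with R.
check_iris_copy(iris)
```

The check covers only the two rows named in the quotation, so a passing result does not show that the rest of a copy is correct.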

Slide 51

Slide 51 text

http://archive.ics.uci.edu/ml/datasets/Iris

Slide 52

Slide 52 text

http://archive.ics.uci.edu/ml/datasets/Iris

Slide 53

Slide 53 text

No content

Slide 54

Slide 54 text

"Better yet (and we know many of you will check our version this way), return to the source and take the values directly from Fisher’s paper." Bezdek, J. C. et al., 1999, IEEE Transactions on Fuzzy Systems

Slide 55

Slide 55 text

"Better yet (and we know many of you will check our version this way), return to the source and take the values directly from Fisher’s paper." Bezdek, J. C. et al., 1999, IEEE Transactions on Fuzzy Systems -> Or, stop using the Iris data.

Slide 56

Slide 56 text

Movements

Slide 57

Slide 57 text

No content

Slide 58

Slide 58 text

No content

Slide 59

Slide 59 text

No content

Slide 60

Slide 60 text

No content

Slide 61

Slide 61 text

Penguins?

Slide 62

Slide 62 text

No content

Slide 63

Slide 63 text

https://allisonhorst.github.io/palmerpenguins/
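
As a rough illustration of how little has to change, here is a sketch of the earlier scatter plot redrawn with the penguins data from the package linked above. It is not from the original slides; it assumes the `palmerpenguins` and `ggplot2` packages are installed, and the choice of bill depth and bill length for the axes is only an example.

```r
library(ggplot2)

# Take the data explicitly from the palmerpenguins package
# (assumed installed, e.g. via install.packages("palmerpenguins")).
penguins <- palmerpenguins::penguins

# Same plotting style as the iris slide: two measurements, colored by species.
# na.rm = TRUE drops rows with missing measurements without a warning.
p <- ggplot(data = penguins) +
  aes(x = bill_depth_mm, y = bill_length_mm, color = species) +
  geom_point(na.rm = TRUE)
p
```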

Slide 64

Slide 64 text

Summary

Slide 65

Slide 65 text

1. Data always has its own narrative.
Data, Hypothesis & observation, Objectives, Background

Slide 66

Slide 66 text

2. Anderson, E., "The species problem in Iris", 1936
Fisher, R., Annals of Eugenics, 1936: should not be cited any more, because the number of citations is one measure of scientific impact.

Slide 67

Slide 67 text

3. There are several miscopied versions of "the iris" (original vs. miscopy).

Slide 68

Slide 68 text

4. Community movement

Slide 69

Slide 69 text

5. My opinion
The only way to stop citing Fisher's paper is to not use the iris data. That would also solve the other annoying problem of checking for miscopying. Don't forget: when you use the iris data, you also become one of the characters in its narrative. We can stop using the iris data today. Actually, it's quite easy.

Slide 70

Slide 70 text

Enjoy!!! KTM