
Kan Nishida
August 30, 2019

Exploratory: Introduction to Factor for Handling Ordered Categorical Data

Factor is one of the data types in R, designed to address typical challenges with categorical data. With the Factor data type, you can set the order of categorical values and manipulate that order as needed with a set of convenient functions.

In this seminar, Kan introduces the Factor data type and shows how to manage the order of values in Exploratory.


Transcript

  1. Speaker: Kan Nishida (@KanAugust), co-founder/CEO, Exploratory

    Summary: At the beginning of 2016, Kan launched Exploratory, Inc. to democratize Data Science. Prior to Exploratory, Kan was a director of product development at Oracle, leading teams building various Data Science products in areas including Machine Learning, BI, Data Visualization, Mobile Analytics, and Big Data. While at Oracle, Kan also provided training and consulting services to help organizations transform with data.
  2. Data Science is not just for Engineers and Statisticians. Exploratory makes it possible for Everyone to do Data Science.

    The Third Wave
  3. Democratization of Data Science

    First Wave (1976): Proprietary, UI & Programming. Monetization. Theme: Tools. Users: Statisticians.
    Second Wave (2000): Open Source, Programming. Commoditization. Theme: Algorithms. Users: Data Scientists.
    Third Wave (2016): Open Source, UI & Automation. Democratization. Theme: Experience. Users: Business Users.
  4. Exploratory Data Analysis

    Questions • Data Access • Data Wrangling • Visualization • Analytics (Statistics / Machine Learning) • Communication (Dashboard, Note, Slides)
  5. Factor

    • For Ordinal Data (Categorical Data with Order) Columns • Set Levels Explicitly • Manipulate Levels • Many Statistical Models rely on the ‘Base Level’ of a Factor
  6. Data Type in General vs. Data Type in R / Exploratory

    Numerical: numeric, integer
    Categorical: character
    Ordinal: factor
    Logical: logical
    Date, Time: Date, POSIXct
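A quick way to check this mapping in plain R is `class()`. The sketch below uses base R only, with one made-up value per type; note that a date-time (POSIXct) value reports two classes.

```r
# Base R only: one illustrative (made-up) value per data type on the slide
values <- list(
  numeric   = 12.5,
  integer   = 12L,
  character = "California",
  factor    = factor("Good", levels = c("Bad", "Neutral", "Good")),
  logical   = TRUE,
  date      = as.Date("2019-08-30")
)

# First class of each value: numeric, integer, character, factor, logical, Date
types <- vapply(values, function(v) class(v)[1], character(1))
types

# A date-time value (POSIXct) reports two classes: "POSIXct" "POSIXt"
class(as.POSIXct("2019-08-30 10:00:00", tz = "UTC"))
```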
  7. Categorical (e.g., California, Texas, New York, Florida, Oregon)

    • No continuous relationship • Limited Set of Values • Ordinal relationship is NOT necessary
  8. Ordinal: Really Bad (1), Bad (2), Neutral (3), Good (4), Really Good (5)

    There is no continuous relationship, but there is an inherent ordinal relationship.
  9. Character vs. Factor

    Character (Category only): Really Bad, Bad, Neutral, Good, Really Good
    Factor (Category with Level): Really Bad = 1, Bad = 2, Neutral = 3, Good = 4, Really Good = 5
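In plain R the difference shows up as soon as you sort. A minimal base R sketch using the rating values from the slides: the character vector sorts alphabetically, while the factor sorts by its levels and stores each value as its level number.

```r
ratings <- c("Good", "Really Bad", "Neutral", "Really Good", "Bad")

# Character: no built-in order, so sort() is alphabetical
sort(ratings)
# "Bad" "Good" "Neutral" "Really Bad" "Really Good"

# Factor: the levels define the order explicitly
rating_levels <- c("Really Bad", "Bad", "Neutral", "Good", "Really Good")
ratings_f <- factor(ratings, levels = rating_levels)

sort(ratings_f)        # Really Bad, Bad, Neutral, Good, Really Good
as.integer(ratings_f)  # level number of each value: 4 1 3 5 2
```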
  10. When do we need it?

    • Visualization • Window Calculation with Chart • Binning • Statistical Model with Categorical Predictors
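For a window calculation such as a running total, the order of the categories changes the result. A minimal base R sketch with hypothetical survey responses (an illustration of the idea, not how Exploratory computes window calculations): `table()` counts a factor in level order, keeping levels with zero responses, so `cumsum()` accumulates from Really Bad to Really Good instead of alphabetically.

```r
# Hypothetical survey responses with an explicit level order
responses <- factor(
  c("Good", "Bad", "Good", "Really Good", "Neutral", "Good", "Bad"),
  levels = c("Really Bad", "Bad", "Neutral", "Good", "Really Good")
)

# table() counts per level, in level order, keeping empty levels (Really Bad = 0)
counts <- table(responses)

# Running total from the worst to the best rating: 0, 2, 3, 6, 7
running_total <- cumsum(counts)
running_total
```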
  11. forcats functions • fct_relevel • fct_inorder • fct_infreq • fct_reorder

    • fct_rev (reverse) • fct_lump • and others…
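To see what a few of these functions do to the level order, here is a sketch with a hypothetical column of states. It assumes the forcats package is installed. The forcats documentation now recommends the more specific `fct_lump_n()` and related functions over `fct_lump()`, which still works.

```r
library(forcats)  # assumes the forcats package is installed

# Hypothetical column of states (Texas x4, Oregon x3, California x2, Florida x1)
state <- c("Oregon", "Texas", "California", "Texas", "Oregon",
           "Texas", "Florida", "Texas", "Oregon", "California")

levels(factor(state))           # default: alphabetical
# "California" "Florida" "Oregon" "Texas"
levels(fct_inorder(state))      # order of first appearance
# "Oregon" "Texas" "California" "Florida"
levels(fct_infreq(state))       # most frequent first
# "Texas" "Oregon" "California" "Florida"
levels(fct_rev(factor(state)))  # reverse the current (alphabetical) order
# "Texas" "Oregon" "Florida" "California"
levels(fct_lump(state, n = 2))  # keep the 2 most frequent, lump the rest into "Other"
# "Oregon" "Texas" "Other"
```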
  12. fct_relevel(`Online learning platforms and MOOCs`, "Much worse", "Slightly worse", "Neither better nor worse", "Slightly better", "Much better")

    List all the values whose levels you want to set explicitly, in the order you want.
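The same call can be tried outside Exploratory on a plain vector. A minimal sketch, assuming forcats is installed: in Exploratory the backticked name refers to a column, and here a short, hypothetical vector of answers stands in for it.

```r
library(forcats)  # assumes the forcats package is installed

# Hypothetical answers standing in for the `Online learning platforms and MOOCs` column
answers <- c("Slightly better", "Much worse", "Much better",
             "Neither better nor worse", "Slightly worse", "Much better")

# Levels follow the order of the values listed here
answers_f <- fct_relevel(answers,
  "Much worse", "Slightly worse", "Neither better nor worse",
  "Slightly better", "Much better")

levels(answers_f)
# "Much worse" "Slightly worse" "Neither better nor worse"
# "Slightly better" "Much better"
```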
  13. Category: Much Worse, Slightly Worse, Neither better nor worse, Slightly Better, Much Better, No Opinions

    Category / Level: Much Worse = 1, Slightly Worse = 2, Neither better nor worse = 3, Slightly Better = 4, Much Better = 5, No Opinions = (not set)
  14. Category: Much Worse, Slightly Worse, Neither better nor worse, Slightly Better, Much Better, No Opinions

    Category / Level: Much Worse = 1, Slightly Worse = 2, Neither better nor worse = 3, Slightly Better = 4, Much Better = 5, No Opinions = 6
    The values you didn’t set are added after them, in alphabetical order.
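A sketch of that rule, assuming forcats is installed: listing only three of the answers moves them to the front, and the unlisted values keep alphabetical order after them. The answer "Did not answer" is hypothetical, added so there are two unlisted values to compare.

```r
library(forcats)  # assumes the forcats package is installed

# Hypothetical answers: "No Opinions" and "Did not answer" are not listed below
answers <- c("Much better", "No Opinions", "Slightly worse",
             "Did not answer", "Much worse")

answers_f <- fct_relevel(answers, "Much worse", "Slightly worse", "Much better")
levels(answers_f)
# "Much worse" "Slightly worse" "Much better" "Did not answer" "No Opinions"
```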
  15. Notice that we are not sorting inside the chart. The countries are sorted according to the Factor order.
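The order comes from the data, not the chart. Plotting libraries such as ggplot2 draw a discrete axis in factor level order, and `table()` follows the same rule, so the sketch below uses it to show the effect without plotting. The countries and counts are hypothetical, and forcats is assumed to be installed.

```r
library(forcats)  # assumes the forcats package is installed

# Hypothetical responses: India x3, Japan x2, Brazil x1
country <- c("Japan", "India", "Brazil", "India", "Japan", "India")

table(country)                    # character: alphabetical (Brazil, India, Japan)

country_f <- fct_infreq(country)  # levels ordered by frequency
table(country_f)                  # factor: level order (India, Japan, Brazil)
```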
  16. What if we want to set the levels based on Sales Amount, NOT on Frequency (Number of Orders)?
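`fct_reorder()` covers this case by ordering the levels by a summary of another column. A sketch with a hypothetical orders table, assuming forcats is installed: `.fun = sum` totals the sales per category, and `.desc = TRUE` puts the largest total first.

```r
library(forcats)  # assumes the forcats package is installed

# Hypothetical orders: one row per order
orders <- data.frame(
  category = c("Chairs", "Tables", "Chairs", "Lamps", "Chairs", "Lamps"),
  sales    = c(100, 900, 150, 80, 120, 60)
)

# By frequency (number of orders): Chairs has 3 orders, so it comes first
levels(fct_infreq(orders$category))
# "Chairs" "Lamps" "Tables"

# By total sales amount, largest first: Tables (900), Chairs (370), Lamps (140)
levels(fct_reorder(orders$category, orders$sales, .fun = sum, .desc = TRUE))
# "Tables" "Chairs" "Lamps"
```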
  17. Category: Research Director, Healthcare Rep, Human Resources, Laboratory Technician, Manager, Manufacturing Director

    Category (alphabetical): Healthcare Rep, Human Resources, Laboratory Technician, Manager, Manufacturing Director, Research Director
  18. Character to Factor: set Research Director as the 1st Level.

    Character: Healthcare Rep, Human Resources, Laboratory Technician, Manager, Manufacturing Director, Research Director
    Factor: Research Director = 1
  19. Factor: Research Director = 1, Healthcare Rep = 2, Human Resources = 3, Laboratory Technician = 4, Manager = 5, Manufacturing Director = 6

    The rest of the values are assigned levels in alphabetical order.
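A sketch of this step, assuming forcats is installed: passing a single value to `fct_relevel()` moves it to the first level and leaves the rest in alphabetical order. The job roles are the ones shown on the slides.

```r
library(forcats)  # assumes the forcats package is installed

job_role <- c("Manager", "Research Director", "Healthcare Rep",
              "Laboratory Technician", "Human Resources", "Manufacturing Director")

levels(factor(job_role))  # alphabetical: Research Director comes last

levels(fct_relevel(job_role, "Research Director"))
# "Research Director" "Healthcare Rep" "Human Resources"
# "Laboratory Technician" "Manager" "Manufacturing Director"
```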
  21. Base Level: Research Director (level 1)

    Factor: Research Director = 1, Healthcare Rep = 2, Human Resources = 3, Laboratory Technician = 4, Manager = 5, Manufacturing Director = 6
  22. In the newer version, assigning a numerical column to the X-Axis automatically categorizes its values (binning).
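How Exploratory picks its bin boundaries isn't described here, so as a stand-in, base R's `cut()` performs the same kind of binning. The amounts and breaks below are hypothetical. Because `cut()` returns a factor, the bins keep their numeric order, whereas sorting the same labels as character puts "(100,150]" before "(50,100]".

```r
# Hypothetical order amounts, binned with base R's cut()
order_amount <- c(20, 75, 140, 90, 10, 120)
bins <- cut(order_amount, breaks = c(0, 50, 100, 150))

# Factor levels keep the numeric order of the intervals
levels(bins)
# "(0,50]" "(50,100]" "(100,150]"

# The same labels sorted as plain character strings lose that order
sort(unique(as.character(bins)))
# "(0,50]" "(100,150]" "(50,100]"
```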
  23. I’m not familiar with the Sales Executive role, so I want to compare all the Job Roles against Research Director.
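In R's `lm()`, an unordered factor predictor is encoded against its first level (treatment contrasts), so each coefficient is a difference from that base level. A sketch with hypothetical income data, assuming forcats is installed: after `fct_relevel()` puts Research Director first, the intercept is the Research Director mean, and each other coefficient is that role's mean minus it.

```r
library(forcats)  # assumes the forcats package is installed

# Hypothetical data: income by job role (made-up values)
hr <- data.frame(
  job_role = c("Research Director", "Research Director", "Healthcare Rep",
               "Healthcare Rep", "Manager", "Manager"),
  income   = c(100, 110, 60, 70, 90, 100)
)

# Alphabetically, "Healthcare Rep" would be the base level.
# Move "Research Director" to the front to make it the base level instead.
hr$job_role <- fct_relevel(hr$job_role, "Research Director")

fit <- lm(income ~ job_role, data = hr)
coef(fit)
# (Intercept): mean income of Research Director = 105
# job_roleHealthcare Rep: 65 - 105 = -40
# job_roleManager: 95 - 105 = -10
```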