Basic Steps in Working with Data

Jih-shien Lu
May 02, 2015

A section from chapter 5 of "The Data Journalism Handbook"

Transcript

  1. My own experience learning to use GA
    • Different business objectives
      • Drive product sales
      • Drive contact form submissions
      • Encourage engagement & awareness
      • Encourage frequent visitation
      • Provide information quickly
    • Different objectives need different metrics
  2. Step 1: Know the Questions You Want to Answer
    • Work BACKWARD
      i. List the data-evidenced statements you want to make
      ii. Decide which variables and records to acquire
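Working backward can be sketched in a few lines of Python: write down each statement you hope to make, note the variables it would need, and compare that against what the source actually offers. The statements and column names below are invented for illustration and are not from the slides.

```python
# Hypothetical statements, each mapped to the variables needed to support it.
# Every statement and column name here is invented for illustration.
statements = {
    "Judges differ in how often they impose jail time": {"judge", "jail_days"},
    "Fines are lower in recent years": {"fine", "sentence_date"},
}

# Hypothetical list of columns the data source says it can provide.
available_columns = {"judge", "jail_days", "fine"}


def variables_needed(statement_map):
    """Union of every variable required by at least one statement."""
    needed = set()
    for variables in statement_map.values():
        needed |= variables
    return needed


def missing_variables(statement_map, available):
    """Variables the statements require but the source does not provide."""
    return variables_needed(statement_map) - set(available)


print("acquire:", sorted(variables_needed(statements)))
print("missing:", sorted(missing_variables(statements, available_columns)))
```

Anything that shows up as missing is either a follow-up request to the source or a statement to drop.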
  3. Step 2: Clean Messy Data
    • Quick way to look for messiness: create frequency tables of categorical variables
      • Excel
        • Filter
        • Pivot Tables
      • Demo?
    • Standardize into a shorter list of possibilities
      • OpenRefine (formerly Google Refine) - http://openrefine.org/
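The slide names Excel and OpenRefine as tools. The same two ideas, a frequency table followed by standardization, can also be sketched with the Python standard library. The column values and the correction list below are invented for illustration.

```python
from collections import Counter

# Hypothetical raw values of one categorical column (invented for illustration).
raw_cities = ["Miami", "miami ", "MIAMI", "Miami Beach", "Maimi", "Miami Beach "]


def frequency_table(values):
    """Return (value, count) pairs, most frequent first."""
    return Counter(values).most_common()


def standardize(value, corrections):
    """Collapse whitespace, normalize case, then apply known spelling fixes."""
    cleaned = " ".join(value.split()).title()
    return corrections.get(cleaned, cleaned)


# Hand-built correction list, the kind you compile after reading the table.
corrections = {"Maimi": "Miami"}

before = frequency_table(raw_cities)
after = frequency_table(standardize(v, corrections) for v in raw_cities)

print("distinct values before:", len(before))
print("after standardizing:", after)
```

A long tail of values that each appear once or twice is the usual sign of messiness. After standardizing, the table should shrink to the short list of possibilities you expect.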
  4. Data dictionary
    • Describes how the data file is formatted
    • Explains the codes used by particular variables
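As a rough illustration, a data dictionary can be represented as a lookup table and used to decode records while flagging any code it does not document. The structure, column names, and codes below are hypothetical. Real data dictionaries come in whatever form the source publishes.

```python
# Hypothetical data dictionary: column name -> description and code meanings.
# The columns and codes are invented for illustration.
DATA_DICTIONARY = {
    "sex": {
        "description": "Sex of defendant",
        "codes": {"1": "male", "2": "female", "9": "unknown"},
    },
    "fine": {"description": "Fine in dollars", "codes": {}},
}


def decode(record, dictionary):
    """Translate coded values and collect codes the dictionary does not list."""
    decoded = {}
    undocumented = []
    for column, value in record.items():
        codes = dictionary.get(column, {}).get("codes", {})
        if codes and value not in codes:
            # A coded column holds a value the dictionary never mentions.
            undocumented.append((column, value))
            decoded[column] = value
        else:
            # Either a documented code, or a column with no code list.
            decoded[column] = codes.get(value, value)
    return decoded, undocumented


print(decode({"sex": "2", "fine": "250"}, DATA_DICTIONARY))
print(decode({"sex": "7", "fine": "0"}, DATA_DICTIONARY))
```

Values that land in the undocumented list are the ones worth raising with whoever maintains the data.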
  5. In reality, people find ways to work around design constraints (the data dictionary) to meet uncommon needs that were not anticipated at design time.
  6. Example
    • An analysis by the Miami Herald
    • “the varying rates of punishment that different judges were giving to drunk-driving people”
    • 1% to 2% of cases with no punishment?
    • That is against the state law, which required that anyone convicted of drunk driving be punished.
  7. Step 3: Resolve Undocumented Features in Data
    • Always ask the source whether the data contains any undocumented usage
    • Always examine the results of your analysis to see whether they make sense
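One way to check whether results make sense is to test them against a rule you know must hold, such as "every conviction carries some punishment". The sketch below is loosely modeled on the drunk-driving example. The records and the field names `judge`, `jail_days`, and `fine` are hypothetical.

```python
from collections import defaultdict

# Hypothetical conviction records; judges, fields, and values are invented.
records = [
    {"judge": "A", "jail_days": 30, "fine": 500},
    {"judge": "A", "jail_days": 0, "fine": 0},
    {"judge": "B", "jail_days": 0, "fine": 250},
    {"judge": "B", "jail_days": 0, "fine": 0},
    {"judge": "B", "jail_days": 10, "fine": 0},
]

# Assumption: these columns together capture every form of punishment.
PUNISHMENT_FIELDS = ("jail_days", "fine")


def has_no_punishment(record):
    """True when every punishment column is zero."""
    return all(record[field] == 0 for field in PUNISHMENT_FIELDS)


def share_without_punishment(rows):
    """Return {judge: fraction of that judge's cases with no punishment}."""
    totals = defaultdict(int)
    unpunished = defaultdict(int)
    for row in rows:
        totals[row["judge"]] += 1
        if has_no_punishment(row):
            unpunished[row["judge"]] += 1
    return {judge: unpunished[judge] / totals[judge] for judge in totals}


shares = share_without_punishment(records)
suspicious = [row for row in records if has_no_punishment(row)]

for judge, share in sorted(shares.items()):
    print(f"judge {judge}: {share:.0%} of cases show no punishment")
print(len(suspicious), "records to raise with the data source")
```

A nonzero share where the rule says zero does not prove anyone broke the rule. It may instead point to an undocumented use of the fields, which is exactly the question to take back to the source.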
  8. Recap: Basic Steps in Working with Data
    1. Know the Questions You Want to Answer
    2. Clean Messy Data
    3. Resolve Undocumented Features