Tony Ojeda - Human-Machine Collaboration for Improved Analytical Processes

HUMAN + MACHINE COLLABORATION FOR IMPROVED ANALYTICAL PROCESSES

TONY OJEDA (ME)
• Data Scientist @ Follett
• Founder @ District Data Labs
• Co-Author
  • Applied Text Analysis with Python (O’Reilly, Fall 2017)
  • Practical Data Science Cookbook (Packt, Fall 2014)
• Conference Speaker
  • Data Day Seattle 2016
  • PyData - Carolinas & DC 2016

SOME BACKGROUND…

RECENT AI HEADLINES
• Can The Rise Of Artificial Intelligence End Humanity?
• Will Artificial Intelligence Leave You Jobless?
• Essential Skills To Keep Your Job In The Era Of Artificial Intelligence
• Artificial intelligence probably won't kill you, but it could take your job

How can we combine human & machine abilities to produce better outcomes than either could on their own?

Not Human vs. Machine But Human + Machine

DESIGNING COLLABORATIVE ANALYTICAL PROCESSES

WHAT IS AN ANALYTICAL PROCESS?
• A series of tasks for ingesting, transforming, analyzing, modeling, or visualizing data.
Data Science Pipeline: Ingestion → Wrangling → Analysis → Modeling → Visualization

DECONSTRUCTING A PROCESS
[Diagram labels: Process, Tasks, Steps]

What types of tasks are humans better at?
What types of tasks are machines better at?

TYPES OF TASKS HUMANS ARE GOOD AT
• Sensory Tasks
• Social/Language/Communication Tasks
• General or Domain Knowledge Tasks
• Tasks Requiring Flexibility, Adaptability, or Creativity
• Exploratory or Investigative Tasks

TYPES OF TASKS MACHINES ARE GOOD AT
• Tasks Where Precision is Important
• Tasks that Require Processing Vast Amounts of Information
• Memory and Recollection Tasks
• Repetitive Tasks Where Consistency is Important

DESIGNING COLLABORATIVE PROCESSES
• Deconstruct the process into tasks and steps.
• Determine which steps should be performed by the human and which should be performed by the machine.
• Identify the points of interaction and ensure those are intuitive.
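
One way to make the three design steps above concrete is to write a task down as an ordered list of steps, tag each step with who performs it, and let the code find the hand-offs. This is only an illustrative sketch: the `Step` class, the `interaction_points` helper, and the human/machine assignments in `binning_task` are assumptions made for the example, not something taken from the talk.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Step:
    name: str
    performer: str  # "human" or "machine" (hypothetical labels)


def interaction_points(steps: List[Step]) -> List[Tuple[str, str]]:
    """Return adjacent step pairs where work passes between human and machine."""
    return [
        (a.name, b.name)
        for a, b in zip(steps, steps[1:])
        if a.performer != b.performer
    ]


# Hypothetical deconstruction of one task into steps.
binning_task = [
    Step("identify continuous variables", "machine"),
    Step("decide whether higher or lower is better", "human"),
    Step("compute quantile bins", "machine"),
    Step("name the resulting categories", "human"),
]

handoffs = interaction_points(binning_task)
for before, after in handoffs:
    print(f"{before} -> {after}")
```

Each pair printed is a point where an interface is needed, which is where the design effort should concentrate.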

THE INTERFACE IS IMPORTANT

COLLABORATIVE DATA EXPLORATION

DATA EXPLORATION FRAMEWORK
Prep Phase → Explore Phase

CREATE: CATEGORY AGGREGATIONS
• Categorical variables with a lot of categories (e.g., more than 10)
• Distill down into fewer categories

CATEGORY AGGREGATION REQUIREMENTS
• Identification of categorical variables and unique values
• Natural language understanding
• General and/or domain knowledge
• Similarity in meaning
• Sometimes creativity
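
A minimal sketch of how these requirements might be split: the machine finds the high-cardinality categorical columns, and a human supplies the mapping that needs language understanding and domain knowledge. The helper names, the `"Other"` fallback, and the toy data are assumptions for illustration only.

```python
import pandas as pd


def high_cardinality_columns(df, max_categories=10):
    """Machine step: non-numeric columns with more than max_categories unique values."""
    return [
        c for c in df.columns
        if not pd.api.types.is_numeric_dtype(df[c])
        and df[c].nunique() > max_categories
    ]


def aggregate_categories(series, mapping, default="Other"):
    """Machine step: apply a human-supplied mapping; unmapped values become `default`."""
    return series.map(mapping).fillna(default)


df = pd.DataFrame({"fruit": ["apple", "pear", "lime", "lemon", "kale", "apple"]})

# Threshold lowered to 3 so the toy column qualifies.
candidates = high_cardinality_columns(df, max_categories=3)

# Human step: group values by meaning (hypothetical mapping).
human_mapping = {"apple": "Pome", "pear": "Pome", "lime": "Citrus", "lemon": "Citrus"}
df["fruit group"] = aggregate_categories(df["fruit"], human_mapping)

print(candidates)
print(df["fruit group"].tolist())
```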

CREATE: CONTINUOUS BINS
Very Low | Low | Moderate | High | Very High
• Identify continuous variables
• Assign them to buckets or bins based on how high or low their values are.

BINNING REQUIREMENTS
• Identification of continuous variables
• Comparison, ordering, and segregation
• Knowing whether higher or lower values are better
• Meaningful naming of resulting categories

CONTINUOUS BINNING EXAMPLE

    import numpy as np
    import pandas as pd

    # `data` is assumed to be an existing pandas DataFrame.
    quint_levels = ['Very Low', 'Low', 'Moderate', 'High', 'Very High']
    numeric_cols = data.select_dtypes(include=[np.number]).columns.values

    for column in numeric_cols:
        # Quintile labels, then decile and percentile ranks.
        # Note: pd.qcut raises ValueError when the bin edges are not unique.
        data[column + ' Level'] = pd.qcut(data[column], 5, labels=quint_levels)
        data[column + ' Decile'] = pd.qcut(data[column], 10, labels=range(1, 11))
        data[column + ' Perc'] = pd.qcut(data[column], 100, labels=range(1, 101))

CREATE: CLUSTER CATEGORIES

CLUSTERING REQUIREMENTS
• Identification of numeric variables
• Clustering similar records together
• Determining quality and appropriate numbers of clusters
• Meaningful naming of resulting categories
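
The machine-side requirements above could be sketched as follows, assuming scikit-learn is available. K-means and the silhouette score are choices made for this example, not methods named in the talk; the synthetic data and the `cluster_numeric` helper are likewise hypothetical. Naming the clusters is left as a human step.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def cluster_numeric(df, k_range=range(2, 7), random_state=0):
    """Cluster the numeric columns and pick k by the highest silhouette score."""
    numeric = df.select_dtypes(include=[np.number])
    X = StandardScaler().fit_transform(numeric)
    scores, labels_by_k = {}, {}
    for k in k_range:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels_by_k[k] = model.fit_predict(X)
        scores[k] = silhouette_score(X, labels_by_k[k])
    best_k = max(scores, key=scores.get)
    return best_k, labels_by_k[best_k], scores


# Synthetic data: three well-separated groups of 30 points each.
rng = np.random.default_rng(0)
centers = [(0, 0), (10, 10), (20, 0)]
points = np.vstack([rng.normal(c, 0.5, size=(30, 2)) for c in centers])
df = pd.DataFrame(points, columns=["x", "y"])

best_k, labels, scores = cluster_numeric(df)

# Placeholder names; a human would replace these with meaningful ones.
df["Cluster"] = [f"Cluster {i}" for i in labels]
print(best_k)
```

The silhouette score is only one possible quality measure; a person reviewing the scores for each k may reasonably prefer a different number of clusters.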

DATA EXPLORATION FRAMEWORK
Prep Phase → Explore Phase

EXPLORE: FILTER + AGGREGATE

FILTER + AGGREGATE REQUIREMENTS
• Identifying categorical and numeric variables.
• Filtering/sub-setting the data set by categories.
• Aggregating on categories and calculation of numeric fields.
• Interpreting results and determining what is useful.
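
The filtering and aggregating steps map directly onto pandas operations, while interpreting the result stays with the human. A minimal sketch, in which the `filter_and_aggregate` helper and the sample data are assumptions for illustration:

```python
import pandas as pd


def filter_and_aggregate(df, filters, group_by, value, agg="mean"):
    """Subset rows by category values, then aggregate a numeric field per group."""
    subset = df
    for column, category in filters.items():
        subset = subset[subset[column] == category]
    return subset.groupby(group_by)[value].agg(agg)


df = pd.DataFrame({
    "region": ["East", "East", "West", "West", "East"],
    "product": ["A", "B", "A", "B", "A"],
    "sales": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Mean sales per product, restricted to the East region.
result = filter_and_aggregate(df, {"region": "East"}, "product", "sales")
print(result)
```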

EXPLORE: FIELD RELATIONSHIPS

FIELD RELATIONSHIP REQUIREMENTS
• Identifying numeric fields.
• Comparing cross-distributions of values across all combinations of numeric fields.
• Identifying existence, direction, strength, and type of relationship.
• Determining which relationships (or lack thereof) are interesting or insightful.
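
As a rough sketch of the machine's share of this work, pairwise correlations can be computed for every combination of numeric fields and ranked by strength, leaving a human to judge which are interesting. Pearson correlation is an assumption here and only captures linear relationships; the `rank_relationships` helper and the data are hypothetical.

```python
import numpy as np
import pandas as pd


def rank_relationships(df, method="pearson"):
    """Return (field_a, field_b, correlation) for every pair, strongest first."""
    corr = df.select_dtypes(include=[np.number]).corr(method=method)
    cols = list(corr.columns)
    pairs = [
        (cols[i], cols[j], corr.iloc[i, j])
        for i in range(len(cols))
        for j in range(i + 1, len(cols))
    ]
    # Absolute value gives strength; the sign of the value gives direction.
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "double_x": [2, 4, 6, 8, 10],
    "noise": [3, 1, 4, 1, 5],
})

ranked = rank_relationships(df)
for a, b, r in ranked:
    print(f"{a} vs {b}: {r:.2f}")
```

Passing `method="spearman"` to the same call would rank monotonic rather than strictly linear relationships.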

EXPLORE: ENTITY RELATIONSHIPS

GRAPH ANALYSIS REQUIREMENTS
• Identifying hierarchical entity levels in the data.
• Identifying similarities and strength of similarities between entities.
• Identifying clusters, communities, sub-networks and other important groupings within the network.
• Interpreting those relationships and what they mean in the real world.
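
A small standard-library sketch of the similarity and grouping steps: entities are linked when the sets of items they share are similar enough, and groups are read off as connected components. Jaccard similarity, the 0.5 threshold, and connected components as a simple stand-in for community detection are all assumptions for this example, as are the entity names.

```python
from collections import defaultdict
from itertools import combinations


def jaccard(a, b):
    """Share of items two sets have in common (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_edges(entity_items, threshold=0.5):
    """Link each pair of entities whose Jaccard similarity meets the threshold."""
    edges = {}
    for e1, e2 in combinations(sorted(entity_items), 2):
        score = jaccard(entity_items[e1], entity_items[e2])
        if score >= threshold:
            edges[(e1, e2)] = score
    return edges


def connected_groups(entities, edges):
    """Group entities that are reachable from one another through the edges."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, groups = set(), []
    for start in sorted(entities):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(neighbors[node] - group)
        seen |= group
        groups.append(group)
    return groups


# Hypothetical data: customers and the products each one bought.
purchases = {
    "ann": {"a", "b", "c"},
    "bob": {"a", "b"},
    "cat": {"x", "y"},
    "dan": {"x", "y", "z"},
    "eve": {"q"},
}

edges = similarity_edges(purchases)
groups = connected_groups(purchases, edges)
print(groups)
```

What each group means in the real world is the part that still needs a person.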

KEY TAKE-AWAYS
• Human-machine collaboration is important and very useful.
• We can design these processes via deconstruction into tasks and steps.
• Pay special attention to the interfaces.
• There is plenty of room for development and advancement in this area, and Python already contains a lot of the tools we need to make progress.

WHERE TO LEARN MORE & GET INVOLVED
• Blog: blog.districtdatalabs.com
• Cultivar: github.com/DistrictDataLabs/cultivar
• Yellowbrick: github.com/DistrictDataLabs/yellowbrick
• Twitter: @tonyojeda3
• LinkedIn: linkedin.com/in/tonyojeda

THANK YOU!

PyCon 2017
