A Blocks-Based Introduction to Text Analysis

Institute for Software Integrated Systems, Vanderbilt University
A Blocks-Based Introduction to Text Analysis
Brian Broll [email protected]
Clifford Anderson, Sarah Burriss, Corey Brady, Mark Schoenfield

Meet the Team
Sarah Burriss, Doctoral Student, Department of Teaching & Learning, Vanderbilt University
Mark Schoenfield, Professor of English, Vanderbilt University
And me
Corey Brady, Assistant Professor, Learning Sciences, Vanderbilt
Brian Broll, Research Scientist, Vanderbilt University

Research Questions: an origin
A Culture of Litigation: 1765-1835
A project underway on desktops (wooden and virtual) and on library shelves

Research Questions

Research Questions
▪ Could a systematic computer-assisted analysis of the 4m articles in BP supplement a close-reading analysis of a subsection?
▪ Could it improve the selection of that subsection?
▪ Could it clarify or help discover new questions?
1. Cross-examination as a popularized term, as well as evidence of public attitudes toward it
2. Developing respect for juries, and the tension between law and facts paralleling that between judge and jury
3. Generalization and discussions of the term "jury" and its relationship to citizenship and the "public versus the people" argument
4. Judges, lawyers, and other agents of the court as celebrity figures. As a subsection: celebrity figures as a phenomenon, the wit and witticisms of lawyers, and their public reputations

Institutional Context

Motivating Question
Can we use blocks-based programming (via NetsBlox) to aid both in these research questions and in students’ access to text-analysis concepts?

Brief Intro to NetsBlox
▪ NetsBlox is an extension of Snap! which provides many new features such as:
  ▪ Networking Capabilities
  ▪ Undo Capabilities
  ▪ Collaborative Editing
  ▪ Shared Projects
  ▪ Sharing libraries
▪ One of the new networking concepts is Remote Procedure Calls, which enable users to invoke code implemented remotely. Examples include:
  ▪ Google Maps
  ▪ Cloud Variables

Text Analysis in NetsBlox
▪ We explored a number of different questions within NetsBlox pertaining to learning text analysis concepts:
  ▪ Can we enable the students to interactively probe machine learning models to learn about their strengths and weaknesses? Can we hypothesize about the causes based on this interaction?
    ▪ Sentimental Writer Example
  ▪ Can we introduce students to word embeddings?
    ▪ Word Embeddings Example
  ▪ Can we enable students to train their own word embeddings?
    ▪ Training Word Embeddings on Middlemarch

Thank you!

Appendix
Additional information about presented topics can be found in this section.

FAQ
▪ I can’t find the “TextAnalysis” category in my services?
  ▪ These are private, auxiliary services and must be explicitly enabled for individual users or groups
  ▪ For more information, check out Services Overview

Text Analysis Exercises
This subsection contains more details about the motivation, goals, and discussion topics for each of the presented projects.

Probing Existing Models
▪ Motivation: Interaction with machine learning models for text analysis could facilitate a better understanding of their strengths and weaknesses.
▪ Question: Can we enable the students to interactively probe machine learning models to form hypotheses about their shortcomings?
▪ Approach: Using the ParallelDots Service, create a typewriter which color-codes the text based on the sentiment. Then explore the predictions made by ParallelDots by writing with the typewriter!
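The color-coding step of the typewriter can be sketched outside of NetsBlox as well. The snippet below is a minimal illustration, not the project's implementation: it assumes the sentiment service returns one score per label (`negative`, `neutral`, `positive`), which is an assumption about the response shape, and the palette and the `sentiment_color` helper are hypothetical names chosen for this example.

```python
from typing import Dict, Tuple

# Hypothetical palette: one RGB color per sentiment label.
PALETTE = {
    "negative": (220, 50, 50),
    "neutral": (128, 128, 128),
    "positive": (50, 180, 50),
}


def sentiment_color(scores: Dict[str, float]) -> Tuple[int, int, int]:
    """Blend the palette colors, weighted by the (assumed) per-label scores."""
    total = sum(scores.get(label, 0.0) for label in PALETTE)
    if total <= 0:
        # No usable scores: fall back to the neutral color.
        return PALETTE["neutral"]
    channels = []
    for i in range(3):
        weighted = sum(scores.get(label, 0.0) * PALETTE[label][i] for label in PALETTE)
        channels.append(round(weighted / total))
    return tuple(channels)


# Example: text scored as mostly positive is drawn in a greenish color.
print(sentiment_color({"negative": 0.1, "neutral": 0.2, "positive": 0.7}))
```

Blending the colors rather than picking the top label keeps uncertain predictions visibly muted, which may make it easier for students to spot text the model is unsure about.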

Probing Existing Models
▪ Student Questions:
  ▪ Can I fool the model?
  ▪ What if I use historic text?
  ▪ What if I use long sentences?
  ▪ What if I use unnatural punctuation?
  ▪ Does it care if I use ALL CAPS?
▪ Discussion Questions:
  ▪ Why do I think ____ fools the model (or doesn’t)?
  ▪ What is the impact of training data on the resultant model? Is this an error with the training itself or the data?
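Several of the student questions above amount to sending systematically altered versions of the same sentence to the model and comparing the predictions. A minimal sketch of generating such variants follows; the `probe_variants` helper and the chosen perturbations are illustrative assumptions, and the call to the sentiment model itself is deliberately left out.

```python
import string
from typing import Dict


def probe_variants(text: str) -> Dict[str, str]:
    """Return perturbed copies of `text` for probing a sentiment model."""
    return {
        "original": text,
        # Does the model care about ALL CAPS?
        "all_caps": text.upper(),
        # What if punctuation is removed entirely?
        "no_punctuation": "".join(ch for ch in text if ch not in string.punctuation),
        # What if the punctuation is unnatural?
        "extra_punctuation": text.rstrip(".!?") + "!!!???",
    }


for name, variant in probe_variants("What a fine day.").items():
    print(f"{name}: {variant}")
```

Each variant would then be scored by the model, and any change in the predicted sentiment becomes a starting point for the discussion questions.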

Word Embeddings
▪ Motivation: Word embeddings can be trained in an unsupervised way (i.e., we don’t need to label the training data by hand), so they are a good candidate for exploration of the 4+ million documents of interest.
▪ Question: Can we introduce students to word embeddings so they can understand how embeddings might be used to support or reject hypotheses?
▪ Approach: Use the WordEmbeddings service (only available to members of the class) to enable students to retrieve pre-trained word embeddings.
▪ Important Concepts: Vectors, Vector Spaces, Cosine Similarity, Euclidean Distance
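The two comparison measures listed under Important Concepts can be sketched in a few lines of plain Python. The two-dimensional vectors in `TOY_EMBEDDINGS` are made up for illustration and are not output from the WordEmbeddings service; real embeddings typically have tens to hundreds of dimensions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two points in the vector space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(word: str, embeddings: Dict[str, List[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Rank the other words by cosine similarity to `word`."""
    scored = [
        (other, cosine_similarity(embeddings[word], vector))
        for other, vector in embeddings.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy vectors, chosen only to make the example readable.
TOY_EMBEDDINGS = {
    "judge": [0.9, 0.1],
    "jury": [0.8, 0.3],
    "wit": [0.1, 0.9],
}

print(nearest("judge", TOY_EMBEDDINGS, k=2))
print(euclidean_distance(TOY_EMBEDDINGS["judge"], TOY_EMBEDDINGS["wit"]))
```

Cosine similarity ignores vector length and compares direction only, while Euclidean distance is sensitive to both; comparing the two on the same word pairs is one way to make that difference concrete for students.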

Training Word Embeddings
▪ Motivation: Training word embeddings could enable students to find quantitative evidence for hypotheses about the data.
▪ Question: Can we enable students to train their own word embeddings from scratch within NetsBlox?
▪ Approach: Using the Word2Vec and Datasets services, students can incrementally build datasets and then train word embeddings on the dataset.
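The build-a-dataset-then-train workflow can be illustrated with a very small skip-gram trainer in plain Python. This is a teaching sketch and not the NetsBlox Word2Vec service: it uses negative sampling with uniformly drawn negatives (a simplification of the usual frequency-based sampling), and the function name, parameters, and two-sentence corpus are all assumptions made for the example.

```python
import math
import random
from typing import Dict, List


def train_skipgram(sentences: List[List[str]], dim: int = 8, window: int = 2,
                   negatives: int = 3, epochs: int = 20, lr: float = 0.05,
                   seed: int = 0) -> Dict[str, List[float]]:
    """Train skip-gram embeddings with (uniform) negative sampling."""
    rng = random.Random(seed)
    vocab = sorted({word for sentence in sentences for word in sentence})
    index = {word: i for i, word in enumerate(vocab)}
    # Small random input vectors; output vectors start at zero.
    w_in = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in vocab]
    w_out = [[0.0] * dim for _ in vocab]

    for _ in range(epochs):
        for sentence in sentences:
            ids = [index[word] for word in sentence]
            for pos, center in enumerate(ids):
                lo, hi = max(0, pos - window), min(len(ids), pos + window + 1)
                for ctx_pos in range(lo, hi):
                    if ctx_pos == pos:
                        continue
                    context = ids[ctx_pos]
                    # One observed (positive) pair plus random negatives.
                    samples = [(context, 1.0)]
                    samples += [(rng.randrange(len(vocab)), 0.0) for _ in range(negatives)]
                    center_update = [0.0] * dim
                    for target, label in samples:
                        if label == 0.0 and target == context:
                            continue  # do not punish the true context word
                        score = sum(a * b for a, b in zip(w_in[center], w_out[target]))
                        predicted = 1.0 / (1.0 + math.exp(-score))
                        step = lr * (label - predicted)
                        for d in range(dim):
                            center_update[d] += step * w_out[target][d]
                            w_out[target][d] += step * w_in[center][d]
                    for d in range(dim):
                        w_in[center][d] += center_update[d]
    return {word: w_in[index[word]] for word in vocab}


# Incrementally build a (hypothetical) dataset, then train on it.
corpus: List[List[str]] = []
corpus.append("the judge addressed the jury".split())
corpus.append("the jury heard the lawyer".split())
vectors = train_skipgram(corpus, dim=4, epochs=5, seed=1)
print(sorted(vectors))
```

A corpus this small cannot produce meaningful vectors; the point is only to show the moving parts (context windows, positive and negative pairs, gradient steps) that the service hides behind a single block.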

Training Word Embeddings
▪ Student Questions:
  ▪ Can I train a language model using the Middlemarch text?
  ▪ Should I trust the model?
▪ Discussion Questions:
  ▪ Can I trust the results of the model?
  ▪ Did the model have enough data? How can I check that the results are not due to the random initialization of the model?
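One way to approach the random-initialization question is to train the model several times with different seeds and check whether a word keeps the same nearest neighbors across runs. The sketch below assumes each run is available as a dictionary from word to vector; the `neighbor_overlap` helper and the toy runs are hypothetical. Raw vectors from different runs are not directly comparable, so the comparison is done on neighbor sets instead.

```python
import math
from typing import Dict, List, Set


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_neighbors(run: Dict[str, List[float]], word: str, k: int) -> Set[str]:
    """The k words most similar to `word` within a single training run."""
    others = [w for w in run if w != word]
    others.sort(key=lambda w: cosine(run[word], run[w]), reverse=True)
    return set(others[:k])


def neighbor_overlap(run_a: Dict[str, List[float]], run_b: Dict[str, List[float]],
                     word: str, k: int = 2) -> float:
    """Jaccard overlap of the top-k neighbor sets from two runs (1.0 = identical)."""
    a, b = top_neighbors(run_a, word, k), top_neighbors(run_b, word, k)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical runs: run_b is run_a rotated by 90 degrees, so the raw numbers
# differ but every pairwise similarity is preserved. run_c swaps two words.
run_a = {"judge": [0.9, 0.1], "jury": [0.8, 0.3], "wit": [0.1, 0.9], "humour": [0.2, 0.8]}
run_b = {word: [-y, x] for word, (x, y) in run_a.items()}
run_c = {"judge": [0.9, 0.1], "jury": [0.1, 0.9], "wit": [0.8, 0.3], "humour": [0.2, 0.8]}

print(neighbor_overlap(run_a, run_b, "judge", k=1))  # stable neighbors
print(neighbor_overlap(run_a, run_c, "judge", k=1))  # neighbors changed
```

A consistently high overlap across seeds suggests a finding is not an artifact of initialization; a low overlap suggests the model may not have had enough data to settle on stable relationships.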
