Slide 1

Slide 1 text

Introduction to Data Storytelling
Rasagy Sharma, Principal Information Designer, Gramener

Slide 2

Slide 2 text

At Gramener, we narrate business insights as stories

Slide 3

Slide 3 text

“Can’t I just show the data?” “Why do I need charts?”

Slide 4

Slide 4 text

How many numbers are above 100? 23 32 71 72 58 87 11 77 70 16 17 21 56 44 68 51 84 20 60 40 37 8 107 14 12 41 69 14 18 71 62 55 59 64 33 55 71 58 103 92 101 56 45 34 43 15 73 78 6 93 39 53 22 26 26 94 60 82 99 74 11 12 36 67 70 71 97 59 73 99 75 74 69 69 51 48 2 66 92 98 15 10 41 58 104 94 92 84 74 82 12 52 10 57 33 77 88 81 81 91 15 56 25 30 21 7 66 66 78 87 29 23 5 34 11 96 74 99 99 88 37 10 43 15 50 71 65 60 101 98 46 34 19 102 57 70 95 84 63 91 3 34 39 37 60 81 65 63 9 71 48 46 25 50 22 64 91 76 71 79

Slide 5

Slide 5 text

How many numbers are below 10? 23 32 71 72 58 87 11 77 70 16 17 21 56 44 68 51 84 20 60 40 37 8 107 14 12 41 69 14 18 71 62 55 59 64 33 55 71 58 103 92 101 56 45 34 43 15 73 78 6 93 39 53 22 26 26 94 60 82 99 74 11 12 36 67 70 71 97 59 73 99 75 74 69 69 51 48 2 66 92 98 15 10 41 58 104 94 92 84 74 82 12 52 10 57 33 77 88 81 81 91 15 56 25 30 21 7 66 66 78 87 29 23 5 34 11 96 74 99 99 88 37 10 43 15 50 71 65 60 101 98 46 34 19 102 57 70 95 84 63 91 3 34 39 37 60 81 65 63 9 71 48 46 25 50 22 64 91 76 71 79

Slide 6

Slide 6 text

Which quadrant has the highest total? 23 32 71 72 58 87 11 77 70 16 17 21 56 44 68 51 84 20 60 40 37 8 107 14 12 41 69 14 18 71 62 55 59 64 33 55 71 58 103 92 101 56 45 34 43 15 73 78 6 93 39 53 22 26 26 94 60 82 99 74 11 12 36 67 70 71 97 59 73 99 75 74 69 69 51 48 2 66 92 98 15 10 41 58 104 94 92 84 74 82 12 52 10 57 33 77 88 81 81 91 15 56 25 30 21 7 66 66 78 87 29 23 5 34 11 96 74 99 99 88 37 10 43 15 50 71 65 60 101 98 46 34 19 102 57 70 95 84 63 91 3 34 39 37 60 81 65 63 9 71 48 46 25 50 22 64 91 76 71 79

Slide 7

Slide 7 text

Answer them again, with design changes…

Slide 8

Slide 8 text

23 32 71 72 58 87 11 77 70 16 17 21 56 44 68 51 84 20 60 40 37 8 107 14 12 41 69 14 18 71 62 55 59 64 33 55 71 58 103 92 101 56 45 34 43 15 73 78 6 93 39 53 22 26 26 94 60 82 99 74 11 12 36 67 70 71 97 59 73 99 75 74 69 69 51 48 2 66 92 98 15 10 41 58 104 94 92 84 74 82 12 52 10 57 33 77 88 81 81 91 15 56 25 30 21 7 66 66 78 87 29 23 5 34 11 96 74 99 99 88 37 10 43 15 50 71 65 60 101 98 46 34 19 102 57 70 95 84 63 91 3 34 39 37 60 81 65 63 9 71 48 46 25 50 22 64 91 76 71 79 How many numbers are above 100?

Slide 9

Slide 9 text

How many numbers are below 10? 23 32 71 72 58 87 11 77 70 16 17 21 56 44 68 51 84 20 60 40 37 8 107 14 12 41 69 14 18 71 62 55 59 64 33 55 71 58 103 92 101 56 45 34 43 15 73 78 6 93 39 53 22 26 26 94 60 82 99 74 11 12 36 67 70 71 97 59 73 99 75 74 69 69 51 48 2 66 92 98 15 10 41 58 104 94 92 84 74 82 12 52 10 57 33 77 88 81 81 91 15 56 25 30 21 7 66 66 78 87 29 23 5 34 11 96 74 99 99 88 37 10 43 15 50 71 65 60 101 98 46 34 19 102 57 70 95 84 63 91 3 34 39 37 60 81 65 63 9 71 48 46 25 50 22 64 91 76 71 79

Slide 10

Slide 10 text

Which quadrant has the highest total? 23 32 71 72 58 87 11 77 70 16 17 21 56 44 68 51 84 20 60 40 37 8 107 14 12 41 69 14 18 71 62 55 59 64 33 55 71 58 103 92 101 56 45 34 43 15 73 78 6 93 39 53 22 26 26 94 60 82 99 74 11 12 36 67 70 71 97 59 73 99 75 74 69 69 51 48 2 66 92 98 15 10 41 58 104 94 92 84 74 82 12 52 10 57 33 77 88 81 81 91 15 56 25 30 21 7 66 66 78 87 29 23 5 34 11 96 74 99 99 88 37 10 43 15 50 71 65 60 101 98 46 34 19 102 57 70 95 84 63 91 3 34 39 37 60 81 65 63 9 71 48 46 25 50 22 64 91 76 71 79
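For readers following along without the slides, here is a minimal sketch of the first two questions answered in code. It assumes the numbers are read in the order they appear in the transcript. The `highlight` helper and its 16-per-row layout are hypothetical stand-ins for the design changes on the slides, which the transcript does not preserve. The quadrant question is left out because the original grid layout cannot be recovered from the text.

```python
# The 160 numbers from the slide, in transcript order.
NUMBERS = """
23 32 71 72 58 87 11 77 70 16 17 21 56 44 68 51 84 20 60 40
37 8 107 14 12 41 69 14 18 71 62 55 59 64 33 55 71 58 103 92
101 56 45 34 43 15 73 78 6 93 39 53 22 26 26 94 60 82 99 74
11 12 36 67 70 71 97 59 73 99 75 74 69 69 51 48 2 66 92 98
15 10 41 58 104 94 92 84 74 82 12 52 10 57 33 77 88 81 81 91
15 56 25 30 21 7 66 66 78 87 29 23 5 34 11 96 74 99 99 88
37 10 43 15 50 71 65 60 101 98 46 34 19 102 57 70 95 84 63 91
3 34 39 37 60 81 65 63 9 71 48 46 25 50 22 64 91 76 71 79
""".split()

values = [int(n) for n in NUMBERS]

above_100 = [v for v in values if v > 100]
below_10 = [v for v in values if v < 10]


def highlight(values, predicate, per_row=16):
    """Render the numbers as a text grid, bracketing those that match.

    per_row is an assumed layout, not the slide's actual grid.
    """
    rows = []
    for start in range(0, len(values), per_row):
        row = values[start:start + per_row]
        rows.append(" ".join(
            f"[{v:>3}]" if predicate(v) else f" {v:>3} " for v in row
        ))
    return "\n".join(rows)


print(f"Above 100: {len(above_100)} numbers")
print(f"Below 10: {len(below_10)} numbers")
print(highlight(values, lambda v: v > 100))
```

Bracketing the matching values plays the same role as a colour or weight change on a slide: the reader no longer has to scan every number.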

Slide 11

Slide 11 text

Visually representing data helps us to see patterns in the data quickly
“The greatest value of a picture is when it forces us to notice what we never expected to see.” – John Tukey
Datasaurus dataset, animated by Autodesk Research

Slide 12

Slide 12 text

Stories have a huge impact on humans
Storytelling has a 30X Return on Investment
Rob Walker and Joshua Glenn auctioned common items like mugs, golf balls, toys, etc. The item descriptions were stories purpose-written by 200+ contributing writers. Items that were bought for $250 sold for over $8,000 – a return of over 3,000% for storytelling!
Stories are memorable and viral
People remember stories. They’ll act on them. People share stories. That enables collective action.
We analyze data to improve people’s decision making. For this to be effective, data stories are needed more than ever before.
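The "30X" and "over 3,000%" figures can be checked with a few lines of arithmetic. This is a sketch using the rounded amounts quoted on the slide. Since the items sold for "over" $8,000, both results are lower bounds.

```python
# Figures quoted on the slide; the sale total is a lower bound ("over $8,000").
purchase_cost = 250.0
sale_total = 8000.0

multiple = sale_total / purchase_cost
return_pct = (sale_total - purchase_cost) / purchase_cost * 100

print(f"Sale price was {multiple:.0f}x the purchase cost")
print(f"Return on investment: {return_pct:,.0f}%")
```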

Slide 13

Slide 13 text

Visual data storytelling is a critical skill for data scientists, analysts & managers
But analysts present their work, not their message
Data scientists present their analysis – what they did, and what they found. That’s not what the audience needs. Audiences need a message that tells them what to do, and why. Told in an engaging way. As a story.
Share your data & analysis as data stories
Whenever you share inferences from data – whether it’s as a presentation, or an email or document with your analysis, or as a dashboard – craft it as a story. This session will give you a glimpse of some of the data stories we’ve created at Gramener, and how you can make these yourself.

Slide 14

Slide 14 text

We’ve been telling stories with data for a long time…

Slide 15

Slide 15 text

…but the overload of data in today’s age makes this critical
“Every second of every day, our senses bring in way too much data than we can possibly process in our brains.” – Peter Diamandis
Data Storytelling helps make sense of this data
“Data Never Sleeps” infographic by Domo

Slide 16

Slide 16 text

With the growth of self-service BI, 85% of companies have lost track of how many dashboards they have generated
BUT 3 THINGS ARE UNCLEAR ON MOST DASHBOARDS
• What QUESTION does the dashboard answer?
• Is the ANSWER evident from the dashboard?
• What ACTION should the user take now?

Slide 17

Slide 17 text

Let’s look at 15 years of US Birth Data
https://gramener.com/posters/Birthdays.pdf
This is a dataset (1975 – 1990) that has been around for several years and has been studied extensively. Yet, a visualization can reveal patterns that are neither obvious nor well known. For example,
• Are birthdays uniformly distributed?
• Do doctors or parents exercise the C-section option to move dates?
• Is there any day of the month that has unusually high or low births?
• Are there any months with relatively high or low births?
Very high births in September. But this is fairly well known. Most conceptions happen during the winter holiday season.
Relatively few births during the Christmas and Thanksgiving holidays, as well as New Year and Independence Day.
Most people prefer not to have children on the 13th of any month, given that it’s considered an unlucky day.
Some special days like April Fool’s Day are avoided, but Valentine’s Day is quite popular.
Legend: More births / Fewer births … on average, for each day of the year (from 1975 to 1990)

Slide 18

Slide 18 text

The pattern in India is quite different
https://gramener.com/posters/Birthdays.pdf
This is a birth date dataset that’s obtained from school admission data for over 10 million children. When we compare this with births in the US, we see none of the same patterns. For example,
• Is there an aversion to the 13th or is there a local cultural nuance?
• Are holidays avoided for births?
• Which months have a higher propensity for births, and why?
• Are there any patterns not found in the US data?
Very few children are born in the month of August, and thereafter. Most births are concentrated in the first half of the year.
We see a large number of children born on the 5th, 10th, 15th, 20th and 25th of each month – that is, round numbered dates.
Such round numbered patterns are a typical indication of fraud. Here, birthdates are brought forward to aid early school admission.
Legend: More births / Fewer births … on average, for each day of the year (from 2007 to 2013)
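The round-date pattern described above can also be checked numerically. The sketch below is one possible approach, not the method used for the poster: it compares the average number of births on round-numbered days with the average on all other days. The function name, the restriction to days 1-28 (so every month contributes equally) and the toy records are all assumptions for illustration.

```python
from collections import Counter
from datetime import date

# Round-numbered days named on the slide.
ROUND_DAYS = {5, 10, 15, 20, 25}


def round_day_ratio(birth_dates):
    """Average births per round-numbered day divided by average per other day.

    Only days 1-28 are considered so that every month contributes equally.
    A ratio near 1.0 suggests no preference for round-numbered dates.
    """
    per_day = Counter(d.day for d in birth_dates)
    round_avg = sum(per_day[d] for d in ROUND_DAYS) / len(ROUND_DAYS)
    other_days = [d for d in range(1, 29) if d not in ROUND_DAYS]
    other_avg = sum(per_day[d] for d in other_days) / len(other_days)
    return round_avg / other_avg


# Hypothetical toy records: one birth per day, then extra births on round days.
uniform = [date(2010, m, d) for m in range(1, 13) for d in range(1, 29)]
shifted = uniform + [
    date(2010, m, d)
    for m in range(1, 13)
    for d in sorted(ROUND_DAYS)
    for _ in range(2)
]

uniform_ratio = round_day_ratio(uniform)
shifted_ratio = round_day_ratio(shifted)
print(f"Uniform data: {uniform_ratio:.1f}, shifted data: {shifted_ratio:.1f}")
```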

Slide 19

Slide 19 text

This adversely impacts children’s marks
https://gramener.com/posters/Birthdays.pdf
It’s a well-established fact that older children tend to do better at school in most activities. Since many children have had their birth dates brought forward, these younger children suffer. Children “born” on the 1st, 5th, 10th, 15th, etc. of the month tend to score lower marks on average.
Children “born” on round numbered days score lower marks on average, due to a higher proportion of younger children.
Legend: Higher marks / Lower marks … on average, for children born on a given day of the year (from 2007 to 2013)

Slide 20

Slide 20 text

Who is the *best* Batsman?

Slide 21

Slide 21 text

Seeing the best & the worst of the best
Sachin R Tendulkar, Sourav C Ganguly, Rahul Dravid, Mohammad Azharuddin, Yuvraj Singh, Virender Sehwag, Mahendra S Dhoni, Ajaysinhji D Jadeja, Navjot S Sidhu, Gautam Gambhir, Krishnamachari Srikkanth, Kapil Dev, Dilip B Vengsarkar, Suresh K Raina, Ravishankar J Shastri, Sunil M Gavaskar, Mohammad Kaif, Virat Kohli, Vinod G Kambli, Vangipurappu V S Laxman, Rabindra R Singh, Sanjay V Manjrekar, Mohinder Amarnath, Manoj M Prabhakar, Rohit G Sharma, Irfan K Pathan, Nayan R Mongia, Ajit B Agarkar, Dinesh Mongia, Harbhajan Singh, Krishna K D Karthik, Sandeep M Patil, Anil Kumble, Yashpal Sharma, Javagal Srinath, Hemang K Badani, Yusuf K Pathan, Robin V Uthappa, Raman Lamba, Zaheer Khan, Ravindra A Jadeja, Parthiv A Patel, Sadagopan Ramesh, Roger M H Binny, Woorkeri V Raman, Sunil B Joshi, Kiran S More, Praveen K Amre, Ashok Malhotra, Chetan Sharma

Slide 22

Slide 22 text

European brewery identified €15 m cost savings after consolidating vendors
A leading European brewery’s plants each purchased commodity raw materials from several vendors, and so had low volume discounts. Plants also placed multiple orders every week, leading to higher logistics cost.
Gramener built a custom analytics solution that showed how each plant performed compared to peers, shaming those with poor performance.
With this, they identified savings of €15 m, which the plant managers couldn’t refute.
€15 m: savings potential identified annually
40%: vendor based reduction identified

Slide 23

Slide 23 text

Global airline reduced cargo turnaround time by 15% with scenario modeling
A global airline company took up a service level agreement to deliver cargo from the flight to the warehouse in under 1.5 hours. This target was 15% lower than their current best.
Several factors affect cargo delay across airports: availability of forklifts, staff size, cargo type, part shipment, and many others. Altering any of these is expensive and takes a long time.
Gramener built a visual analytics solution that showed where cargo was delayed. This allowed the airline to reduce the turnaround time by 15% from 1.76 hrs to 1.5 hrs. The worst-case turnaround time also reduced by 34% from 2.9 hrs to 1.92 hrs.
15%: cargo turnaround time reduction (from 1.76 to 1.5 hrs)
34%: reduction in worst-case turnaround time
• Shift (Evening, Morning, Night): Recovery times are neutral during the evening and morning shifts (mornings are slightly worse), night times are the best.
• Weekday (Fri, Mon, Sat, Sun, Thu, Tue, Wed): Recovery times are worst on Fridays, and best on Saturdays & Wednesdays. Specifically, Friday mornings are particularly bad. So are Thursday mornings.
• Product category (FAH, N70, RPP, TDS, ZDH): The FAH product category has the best recovery time, while ZDH is much worse. However, RPP on Sundays is unusually slow.
• Part shipment (<20%, 20-40%, 40-60%, 60-80%, Full): Part-shipped products tend to perform worse than full shipments. Specifically the <20% and 40-60% part-shipments. This is especially problematic for ZDH.
This slide is best viewed in slideshow mode. The animations tell a story that isn’t obvious on the static version.
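As a quick check of the headline percentages, the reductions can be recomputed from the before and after times quoted on the slide. The helper name `pct_reduction` is only for illustration.

```python
def pct_reduction(before, after):
    """Percentage reduction going from `before` to `after`."""
    return (before - after) / before * 100


# Turnaround times in hours, as quoted on the slide.
typical = pct_reduction(1.76, 1.5)
worst_case = pct_reduction(2.9, 1.92)

print(f"Typical turnaround reduced by {typical:.1f}%")
print(f"Worst-case turnaround reduced by {worst_case:.1f}%")
```

Both values round to the 15% and 34% shown on the slide.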

Slide 24

Slide 24 text

Pharma IT&SM team saves €7.8 m by reducing delay in service requests
The IT & SM team of a large multinational wanted to understand the status of their service request delays, and the drivers of delay. This would also drive the decision on whether service management would remain in-house.
We analyzed every stage of service requests and visually represented how the requests flow and how long they are stuck at different stages.
The analysis showed the problem as rework, not efficiency. This reversed their strategy of optimization in favor of better screening.
€7.9 m: effort reduction due to reduced service management time
22%: reduction in service request time identified

Slide 25

Slide 25 text

World Bank used data stories to clarify impact of technology on innovation
The World Bank approached us to help communicate data stories from their economic development indicator data. Specifically: which countries have similar levels of innovation? Does technology drive innovation?
Gramener collated the diverse datasets and clustered the countries based on similarity of economic indicators. We then ran a series of visual analytics that showed the impact of one on another, annotated with narrative explanations.
We discovered that innovation is enabled by access to latest technology and reliance on professional management. But it does not align with appetite for entrepreneurship in high income countries. This interactive is featured on the World Bank website.
1.3 m: viewers read this interactive data story (as of Mar 2020)
75%: more people concluded the when shown the data story

Slide 26

Slide 26 text

Data stories can help communicate the data science process
https://gramener.com/cluster/cluster-census-2011-district
Previously, the client was treating contiguous regions as a homogeneous entity, from a channel content perspective. To deliver targeted content, we divided India into 6 clusters based on their demographic behavior. Specifically, three composite indices were created based on the economic development lifecycle:
• Education (literacy, higher education) that leads to...
• Skilled jobs (in mfg. or services) that leads to...
• Purchasing power (higher income, asset ownership)
Districts were divided (at the average cut-off) by:
• Purchasing power: Poorer / Richer
• Skilled jobs: Unskilled / Skilled
• Education: Uneducated / Educated
The 6 clusters are:
• Poor: Rural, uneducated agri workers. Young population with low income and asset ownership. Mostly in Bihar, Jharkhand, UP, MP.
• Breakout: Rural, educated agri workers poised for skilled labor. Higher asset ownership. Parts of UP, Bihar, MP.
• Aspirant: Regions with skilled labor pools but low purchasing power. Cusp of economic development. Mostly WB, Odisha, parts of UP.
• Owner: Regions with unskilled labor but high economic prosperity (landlords, etc.). Mostly AP, TN, parts of Karnataka, Gujarat.
• Business: Lower education but working in skilled jobs, and prosperous. Typical of business communities. Parts of Gujarat, TN, Urban UP, Punjab, etc.
• Rich: Urban educated population working in skilled jobs. All metros, large cities, parts of Kerala, TN.
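The split described above can be sketched as a small decision tree. This is one reading of the slide rather than Gramener's actual implementation: each district is marked above or below the average on the three indices, and the eight possible combinations collapse into the six named clusters. The function names, the index scale and the sample districts are hypothetical.

```python
from statistics import mean


def assign_cluster(educated, skilled, richer):
    """Map three above-average flags to a cluster name (assumed reading of the slide)."""
    if not richer:
        if skilled:
            return "Aspirant"  # skilled labor, low purchasing power
        return "Breakout" if educated else "Poor"
    if not skilled:
        return "Owner"  # unskilled labor, high prosperity
    return "Rich" if educated else "Business"


def cluster_districts(districts):
    """districts maps a name to (education, skilled_jobs, purchasing_power) index values."""
    # Average cut-off per index, computed across all districts.
    cutoffs = [mean(values[i] for values in districts.values()) for i in range(3)]
    result = {}
    for name, values in districts.items():
        educated, skilled, richer = (v > c for v, c in zip(values, cutoffs))
        result[name] = assign_cluster(educated, skilled, richer)
    return result


# Hypothetical districts with made-up index values between 0 and 1.
sample = {
    "D1": (0.2, 0.2, 0.2),
    "D2": (0.8, 0.2, 0.2),
    "D3": (0.2, 0.8, 0.2),
    "D4": (0.2, 0.2, 0.8),
    "D5": (0.2, 0.8, 0.8),
    "D6": (0.8, 0.8, 0.8),
}

clusters = cluster_districts(sample)
for name, cluster in clusters.items():
    print(name, cluster)
```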

Slide 27

Slide 27 text

Data stories can lead to higher engagement & answer new questions
https://gramener.com/fmcg/
A large FMCG organization wanted to create a visualization to review sales performance across geographies, channels and products for their board meetings.
Gramener built an interactive slide deck that allowed users to drill down within PowerPoint.
Dynamic presentations led to a complete revamp of the entire structure of board presentations.
Worldwide: $288 mn
• UK: 87.0
  • Stores: 34.4 (Product 9: 6.2, Product 10: 5.4, Product 7: 5.1, Product 15: 4.8, Product 8: 3.1, Product 14: 2.1)
  • Partners: 29.2 (Product 15: 6.7, Product 17: 4.1, Product 6: 3.4, Product 1: 3.2, Product 7: 2.9, Product 11: 2.4)
  • Direct: 23.5 (Product 17: 5.2, Product 8: 4.4, Product 16: 4.0, Product 14: 2.5, Product 1: 2.5)
• Japan: 71.9
  • Stores: 25.9 (Product 14: 6.0, Product 7: 5.4, Product 11: 4.0, Product 17: 2.8)
  • Partners: 25.5 (Product 8: 8.2, Product 11: 3.6, Product 16: 3.3, Product 1: 3.1, Product 9: 2.0)
  • Direct: 20.5 (Product 11: 5.2, Product 15: 4.5, Product 14: 2.8, Product 9: 2.3)
• China: 65.6
  • Partners: 27.3 (Product 10: 8.0, Product 3: 7.1, Product 15: 3.0, Product 2: 2.1, Product 8: 2.0)
  • Direct: 19.6 (Product 3: 5.5, Product 2: 4.7, Product 8: 2.6, Product 17: 2.1)
  • Stores: 18.7 (Product 10: 5.4, Product 14: 2.2, Product 7: 2.1, Product 15: 2.0)
• India: 46.6
  • Stores: 17.5 (Product 16: 6.8)
  • Direct: 15.6 (Product 10: 3.4, Product 16: 2.9, Product 17: 2.5, Product 7: 2.4)
  • Partners: 13.4 (Product 8: 2.5, Product 7: 2.3)
• US: 17.0
  • Partners: 6.0 (Product 10: 4.4)
  • Direct: 5.8 (Product 11: 3.9)
  • Stores: 5.3 (Product 11: 3.8)
Worldwide $288.0mn: A: Accelerate $68.9mn, B: Build $77.2mn, C: Cut down $141.9mn

Slide 28

Slide 28 text

Interactive data stories with comics can turn your analysis into a fun quiz https://blog.gramener.com/data-comics-storytelling-for-business-decisions/

Slide 29

Slide 29 text

Data stories can show you what you can’t see http://rasagy.in/500days/

Slide 30

Slide 30 text

Data stories can show you what you can’t see http://rasagy.in/500days/

Slide 31

Slide 31 text

Data stories can spark questions that you may have never asked http://rasagy.in/VisualizingTrains/

Slide 32

Slide 32 text

“So how can I make data stories?”

Slide 33

Slide 33 text

You have data. You have analysis. Now, narrate your story.
Understand the audience & intent | Find insights | Storyline | Design data stories

Slide 34

Slide 34 text

We use these steps to go from data to a data story:
1. Who is your audience? They determine the story
2. What is their problem? That defines your analysis
3. Find the right analysis to solve the problem
4. Filter for big, useful, surprising insights
5. Start with the takeaway. Summarize your entire story
6. Add supporting analyses as a tree
7. Pick a format based on how your audience will consume the story
8. Pick a visual design based on the takeaway
9. Annotate to explain & engage. Use four types of narratives

Slide 35

Slide 35 text

Who is your audience? They determine the story
Different people want different things from the same data. Given sales data:
• The Board: “Predict next quarter’s sales”
• Product head: “Which product grew the most?”
• Sales head: “Did we meet our target?”
They are not interested in each other’s questions.
DO IT: Who is the audience for your analysis?
☐ Role: _____________ Be specific. “Head of sales”, not “executive”.
☐ Example name: ______________ Name a real person. “Jim Fry”, not “any sales head”.

Slide 36

Slide 36 text

What is their problem? That defines your analysis
For each person, answer the following questions:
1. What’s their situation?
2. What problems do they face?
3. What action can they take?
4. What is the impact of this action?
DO IT: Write it in this structure
“[Person, Role] is in [situation], and faces this [problem]. By taking [action], she can drive [impact].”
Example
John, the Marketing head, [person, role] must create a region-wise budget, [situation] and doesn’t know the region-wise RoI. [problem] By prioritizing the region, [action] he can maximize RoI. [impact]

Slide 37

Slide 37 text

Here are three examples in real life
Purchasing Commodities
• Person, Role: Adam, the purchasing head of a leading European brewery
• Situation: had plants that purchased commodities from several vendors. Discounts were low. The number of weekly orders was high.
• Problem: But he didn’t know which plants and commodities were a problem. Every plant denied it.
• Action: By consolidating vendors and reducing order frequency,
• Impact: they could increase their discounts and reduce logistics cost.
Cargo Delay
• Person, Role: Cris, the operations head of a leading US airline
• Situation: had an SLA to deliver cargo from the flight to the warehouse in under 1.5 hours – 15% lower than their current best performance.
• Problem: But she didn’t know what the biggest drivers of this delay were – people, assets, or type of cargo.
• Action: By adding resources only to the largest levers of delay,
• Impact: she could reduce turnaround time with the lowest spend.
Customer Churn
• Person, Role: Ravi, the marketing manager of an Asian telecom company
• Situation: found that the cost of replacing customers was thrice the cost of retention.
• Problem: But he didn’t know which customers to make offers to in order to retain them.
• Action: By predicting which customer was likely to churn,
• Impact: they could tailor a retention offer and reduce re-acquisition cost.
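The five-part structure can be kept honest with a small template. The sketch below is illustrative only: the `ProblemStatement` class and its method names are hypothetical, and the sample text is the brewery example from the slide. It joins the five parts into one statement and reports any part that is still blank.

```python
from dataclasses import dataclass


@dataclass
class ProblemStatement:
    person_role: str
    situation: str
    problem: str
    action: str
    impact: str

    def render(self) -> str:
        """Join the five parts into a single problem statement."""
        return (f"{self.person_role} {self.situation} {self.problem} "
                f"{self.action} {self.impact}")

    def missing_parts(self) -> list[str]:
        """Names of the parts that are still empty."""
        return [name for name, value in vars(self).items() if not value.strip()]


brewery = ProblemStatement(
    person_role="Adam, the purchasing head of a leading European brewery,",
    situation="had plants that purchased commodities from several vendors.",
    problem="But he didn't know which plants and commodities were a problem.",
    action="By consolidating vendors and reducing order frequency,",
    impact="they could increase their discounts and reduce logistics cost.",
)

# A draft where the action has not been worked out yet.
draft = ProblemStatement("Cris, the operations head,", "had an SLA.", "Unknown drivers.", "", "Lower TAT.")

print(brewery.render())
print("Still missing in draft:", draft.missing_parts())
```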

Slide 38

Slide 38 text

Filter for big, useful, surprising insights
DO IT: Rate each analysis against B.U.S. Filter the analyses using this checklist
• IS THE INSIGHT BIG? We want a result that substantially changes the outcome.
• IS THE INSIGHT USEFUL? Can they take an action that improves their objective? What should they do next?
• IS THE INSIGHT SURPRISING? Is it non-obvious? Does it overturn an existing belief, or bring consensus?
Example
• There are twice as many restaurants in NYC as in any other city: B ✓, U ✓, S ✗
• Sales increased in every region except our largest branch, which dipped by 0.1%: B ✗, U ✓, S ✓
• Increase in rainfall increases the sale of umbrellas, and is the biggest driver of our sales: B ✓, U ✗, S ✗

Slide 39

Slide 39 text

Here are the analyses & filters for the problems we saw earlier
Purchasing Commodities (rate each on B, U, S)
• The most common commodity was ordered 10 times a week across 2.4 vendors
• The number of orders is correlated with the number of vendors. Reducing one will reduce the other
• Plant P126 was the plant with the most violations, especially on largest commodity
Cargo Delay (rate each on B, U, S)
• Fragile cargo is a big factor in the delay, with a 20% impact
• Fridays are when cargo is delayed the most
• Trained staff and forklifts impact delay the most
Customer Churn (rate each on B, U, S)
• Number of inbound calls does not impact churn.
• Customers who haven’t made any calls in the last 15 days are the most likely to churn
• Customers making infrequent calls, recharging small amounts infrequently, are most at risk

Slide 40

Slide 40 text

Here are the analyses & filters for the problems we saw earlier
Purchasing Commodities
• The most common commodity was ordered 10 times a week across 2.4 vendors: (none)
• The number of orders is correlated with the number of vendors. Reducing one will reduce the other: U
• Plant P126 was the plant with the most violations, especially on largest commodity: B U
Cargo Delay
• Fragile cargo is a big factor in the delay, with a 20% impact: B S
• Fridays are when cargo is delayed the most: (none)
• Trained staff and forklifts impact delay the most: B U S
Customer Churn
• Number of inbound calls does not impact churn: S
• Customers who haven’t made any calls in the last 15 days are the most likely to churn: B
• Customers making infrequent calls, recharging small amounts infrequently, are most at risk: B U S
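The B.U.S. filter amounts to keeping only the insights that pass all three checks. Here is a minimal sketch using the Cargo Delay ratings as read from the slide. The `Insight` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insight:
    text: str
    big: bool
    useful: bool
    surprising: bool

    def passes_bus(self) -> bool:
        """An insight survives only if it is big, useful and surprising."""
        return self.big and self.useful and self.surprising


# Ratings for the Cargo Delay analyses, as read from the slide.
cargo_delay = [
    Insight("Fragile cargo is a big factor in the delay, with a 20% impact",
            big=True, useful=False, surprising=True),
    Insight("Fridays are when cargo is delayed the most",
            big=False, useful=False, surprising=False),
    Insight("Trained staff and forklifts impact delay the most",
            big=True, useful=True, surprising=True),
]

shortlist = [insight for insight in cargo_delay if insight.passes_bus()]
for insight in shortlist:
    print(insight.text)
```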

Slide 41

Slide 41 text

Start with the takeaway. Summarize your entire story
Close your eyes. Think of a childhood tale. Summarize the moral of the story in one line.
We easily remember these stories, and their summary as a moral, several years later.
Close your eyes. Think of a business presentation from last week. Can you easily summarize the message in one line?
Stories are designed around a moral. A single takeaway. An “elevator pitch”.
It’s a one-sentence summary of the most important message for the audience.
DO IT: Write your takeaway as one sentence
What’s the one thing you want the audience to remember from your story? What’s the one message that the audience should take away?
CHECK IT: Verify these yourself
☐ Is it a single, complete sentence?
☐ Does it deliver what you want the audience to remember?
☐ Will your audience care a lot about this?

Slide 42

Slide 42 text

Here is the storyline for the analyses we saw earlier
Purchasing Commodities
• Takeaway: Focus on reducing the number of vendors for products ICG (in P126), FRS (in P121) and SWB (in P074) for a potential 40% reduction in logistics & vendor cost.
• Supporting point: ICG spend is among the highest, at €6.9m. P126 typically orders 40 times a week, often from 15-20 vendors.
• Supporting point: FRS spend is €3.2m. P121 orders from 3 vendors 8-14 times a week.
Cargo Delay
• Takeaway: To reduce the TAT to 1.5 hours at Airport XYZ, increase the number of forklifts from 1 to 2, and the number of trained staff from 4 to 6.
• Supporting point: The number of forklifts is the biggest driver of TAT. Each forklift typically reduces TAT by 15-30%.
• Supporting point: Total staff count does not impact TAT. Increasing trained staff has a more tangible impact of ~5-10% per person.
Customer Churn
• Takeaway: If a customer has not called in the last 5-14 days, and they have made only 1 recharge under $20 last quarter, make them an offer to retain them.
• Supporting point: The biggest driver of retention is when the customer made the outgoing call. The 5-14 days bucket has the highest variation.
• Supporting point: Customers who make at most 1 recharge under $20 are 280% more likely to churn than others.
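A storyline like the ones above is a tree: the takeaway at the root, with supporting analyses beneath it. The sketch below shows one way to hold that structure and print it as an outline. The `StoryNode` class and `outline` function are hypothetical names, and the content is the Cargo Delay storyline from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class StoryNode:
    message: str
    supports: list["StoryNode"] = field(default_factory=list)


def outline(node, depth=0):
    """Return the storyline as indented lines, takeaway first."""
    lines = ["  " * depth + "- " + node.message]
    for child in node.supports:
        lines.extend(outline(child, depth + 1))
    return lines


cargo_story = StoryNode(
    "To reduce the TAT to 1.5 hours at Airport XYZ, increase the number of "
    "forklifts from 1 to 2, and the number of trained staff from 4 to 6.",
    supports=[
        StoryNode("The number of forklifts is the biggest driver of TAT."),
        StoryNode("Total staff count does not impact TAT."),
    ],
)

story_lines = outline(cargo_story)
print("\n".join(story_lines))
```

Supporting points can carry their own `supports`, so deeper evidence stays attached to the claim it backs.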

Slide 43

Slide 43 text

Pick a format based on how your audience will consume the story

Slide 44

Slide 44 text

Here’s a fun quiz

Slide 45

Slide 45 text

Human memory is continuously capturing & forgetting information
• Iconic Memory (1-3 seconds): Pre-attentive processing, even before we pay attention. Unattended information is lost.
• Working Memory (15-30 seconds): Can hold and process between 5-9 chunks of information. Unrehearsed information is lost.
• Long-term Memory (1 second to lifetime): Information is stored by repeated application or through rehearsal. Some information may be lost over time.
Processes linking the stages: Attention, Maintenance Rehearsal, Encoding, Retrieval

Slide 46

Slide 46 text

Visual perception is the ability to interpret the surrounding environment by processing information that is contained in visible light.

Slide 47

Slide 47 text

Some visual attributes are noticed before we actively pay attention to them
There are 4 categories of pre-attentive visual attributes: Form | Colour | Spatial Position | Movement

Slide 48

Slide 48 text

…and these attributes vary in their effectiveness
1. Position is the most powerful encoding. The eye and brain are naturally wired to detect misalignment of the smallest order.
2. Colour, when used in context, is powerful. We can detect minuscule changes or variations in colour when comparing an element with neighbouring elements. This is what makes true colour (32-bit colour, i.e. about 4 billion values) a necessity in computer graphics.
3. Size is a useful differentiator. The eye can detect moderate size variations at moderate distances. Size also has a natural interpretation: that of priority.
4. Several other encodings are possible. Aesthetics such as angle, shadows, shapes, patterns, density, labelling, enclosures, etc. can each be used to map data.
Source: Designing Data Visualizations by Noah Iliinsky and Julie Steele (O’Reilly). Copyright 2011 Julie Steele and Noah Iliinsky, 978-1-449-31228-2.

Slide 49

Slide 49 text

Let’s start small: visualize two numbers (2 & 8) from today’s date
Sketch it out or watch others on Invision Freehand
Check the link in chat: https://gramener.invisionapp.com/freehand/document/PJDgyGxVU

Slide 50

Slide 50 text

There are many ways to visualize just two numbers
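To make the exercise concrete without the sketches from the session, here is a small text-only illustration of three encodings for the numbers 2 and 8: length, position and count. The function names, glyphs and the 0-10 axis are arbitrary choices for this sketch.

```python
values = {"a": 2, "b": 8}  # the two numbers from the slide; the labels are made up


def as_length(value, unit="█"):
    """Encode the value as the length of a bar."""
    return unit * value


def as_position(value, axis_max=10):
    """Encode the value as a single marker on a 0..axis_max number line."""
    return "".join("●" if i == value else "·" for i in range(axis_max + 1))


def as_count(value, unit="●"):
    """Encode the value as a number of separate dots."""
    return " ".join(unit for _ in range(value))


for label, value in values.items():
    print(f"{label} length   {as_length(value)}")
    print(f"{label} position {as_position(value)}")
    print(f"{label} count    {as_count(value)}")
```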

Slide 51

Slide 51 text

Properties and Best Uses of Visual Encodings, Noah Iliinsky, ComplexDiagrams.com/properties, 2012-06

Slide 52

Slide 52 text

Properties and Best Uses of Visual Encodings, Noah Iliinsky, ComplexDiagrams.com/properties, 2012-06

Slide 53

Slide 53 text

Some encoding methods are better* than others

Slide 54

Slide 54 text

Examples - Visual encoding: COLOR | POSITION | TEXT LABEL

Slide 55

Slide 55 text

Examples - Visual encoding (Olympic Medals): COLOR | POSITION | TEXT LABEL | SIZE/AREA

Slide 56

Slide 56 text

Guidelines on Visual Encodings
1. List what you want to convey about the data (remember the data relationships).
2. If there are multiple things, sort these in order of importance.
3. Shortlist the pre-attentive attributes that can be used for the above relationships.
4. Map these attributes to the messages.
5. Quick validation, by yourself and with your team.

Slide 57

Slide 57 text

How do we select which chart to use?

Slide 58

Slide 58 text

Pick a visual design based on the takeaway
Deviation | Change-over-Time | Spatial | Ranking | Correlation | Part-to-Whole | Flow | Magnitude | Distribution

Slide 59

Slide 59 text

How the data should be interpreted decides the type of chart to be used
https://gramener.github.io/visual-vocabulary-vega/
Help people explore data | Showcase high/low performance | Explain drivers of performance
• Deviation: Emphasise variations (+/-) from a fixed reference point. Typically the reference point is zero but it can also be a target or a long-term average.
• Change-over-Time: Give emphasis to changing trends. These can be short (intra-day) movements or extended series traversing decades or centuries.
• Spatial: Used only when precise locations or geographical patterns in data are more important to the reader than anything else.
• Ranking: Use where an item's position in an ordered list is more important than its absolute or relative value.
• Correlation: Show the relationship between two or more variables.
• Part-to-Whole: Show how a single entity can be broken down into its component elements.
• Flow: Show the reader volumes or intensity of movement between two or more states or conditions. These might be logical sequences or geographical locations.
• Magnitude: Show size comparisons. These can be relative (just being able to see larger/bigger) or absolute (need to see fine differences).
• Distribution: Show values in a dataset and how often they occur. The shape (or skew) of a distribution can be a memorable way of highlighting the lack of uniformity or equality in the data.
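The nine categories above work as a lookup: decide which relationship the takeaway is about, then pick from the charts suited to it. The sketch below illustrates that idea. The chart names are common examples chosen for this sketch, not a list taken from the slide, and `suggest_charts` is a hypothetical helper.

```python
# Category names come from the slide; the chart lists are illustrative examples only.
CHART_SUGGESTIONS = {
    "Deviation": ["diverging bar", "surplus/deficit filled line"],
    "Change-over-Time": ["line chart", "column chart", "area chart"],
    "Spatial": ["choropleth map", "proportional symbol map"],
    "Ranking": ["ordered bar", "slope chart", "lollipop chart"],
    "Correlation": ["scatterplot", "bubble chart"],
    "Part-to-Whole": ["stacked bar", "treemap", "pie chart"],
    "Flow": ["Sankey diagram", "chord diagram"],
    "Magnitude": ["bar chart", "column chart"],
    "Distribution": ["histogram", "box plot", "violin plot"],
}


def suggest_charts(relationship):
    """Return candidate chart types for a data relationship (case-insensitive)."""
    key = relationship.strip().lower()
    for name, charts in CHART_SUGGESTIONS.items():
        if name.lower() == key:
            return charts
    raise ValueError(f"Unknown relationship: {relationship!r}")


print(suggest_charts("ranking"))
print(suggest_charts("Part-to-Whole"))
```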

Slide 60

Slide 60 text

Several other chart frameworks
• Datavizcatalogue.com
• Datavizproject.com
• ft.com/vocabulary

Slide 61

Slide 61 text

Annotate to explain & engage. Use four types of narratives
Remember “SEAR”: Summarize, Explain, Annotate, Recommend
• Summarize the visual in its title. Don’t describe the chart. Don’t write the user’s question. Write the answer itself. Like a headline.
  Example: Teachers add marks to stop some students from failing
• Explain & interpret the visual. How should the user read it? What do you say when you talk through it? Explain what the visual is. Then the axes. Then its contents. Then the inference.
  Example: This chart shows Class 10 students’ English marks in Tamil Nadu, India, in 2011. The X-axis has the mark a student has scored. The Y-axis has the # of students who scored that mark. This is a bell curve. But the spike at 35 (the mark at which students pass) is unusual. Teachers must be adding marks to some of the students who are likely to fail by a small margin.
• Annotate essential elements. What should the user focus their eyes on? Point it out, or highlight it with colors. Interpret what they’re seeing, in words.
  Example: Large number of students score exactly 35 marks. Few (but not 0) students fail at 31-34 marks. No one scores 0-4 marks.
  What’s unusual: Large number of students score 35 marks. Few (but not 0) students score between 30-35.
• Recommend an action. How should I act on this? You need to change the audience. (Otherwise, you made no difference.)
  Example: Only some students get this benefit. Identify a fair policy that will be applied consistently.
[Chart: # students (Y-axis, 0 to 20,000) by marks (X-axis, 0 to 100)]
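The "Annotate" step depends on first knowing which points are unusual. As a rough illustration, the sketch below flags any mark whose count is far above both of its neighbours, then turns it into an annotation. The threshold, the function name and the counts are hypothetical. The counts are toy values shaped like the chart described above, not the actual Tamil Nadu data.

```python
def find_spikes(counts, ratio=2.0):
    """Return marks whose count is at least `ratio` times the mean of both neighbours."""
    spikes = []
    marks = sorted(counts)
    for prev, mark, nxt in zip(marks, marks[1:], marks[2:]):
        neighbour_mean = (counts[prev] + counts[nxt]) / 2
        if neighbour_mean > 0 and counts[mark] >= ratio * neighbour_mean:
            spikes.append(mark)
    return spikes


# Hypothetical number of students per mark around the pass mark of 35.
toy_counts = {
    30: 900, 31: 120, 32: 100, 33: 90, 34: 80,
    35: 4000, 36: 1100, 37: 1150, 38: 1200, 39: 1250, 40: 1300,
}

spikes = find_spikes(toy_counts)
annotations = [f"Large number of students score exactly {mark} marks" for mark in spikes]
print(annotations)
```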

Slide 62

Slide 62 text

In summary, here are the 9 steps to go from data to a data story
1. Who is your audience? They determine the story
2. What is their problem? That defines your analysis
3. Find the right analysis to solve the problem
4. Filter for big, useful, surprising insights
5. Start with the takeaway. Summarize your entire story
6. Add supporting analyses as a tree
7. Pick a format based on how your audience will consume the story
8. Pick a visual design based on the takeaway
9. Annotate to explain & engage. Use four types of narratives

Slide 63

Slide 63 text

“Where can I learn more?”

Slide 64

Slide 64 text

Recommended Resources
Books to read
• Resonate - Nancy Duarte
• Storytelling with Data - Cole Nussbaumer Knaflic
• The Truthful Art - Alberto Cairo
• The Design of Everyday Things - Don Norman
• The Back of the Napkin - Dan Roam
Tools to learn
• Paper & Pen (Collaborative)
• Excel
• Tableau & PowerBI
• JS (D3, Vega, Plot.ly)
• Python (Bokeh/Matplotlib)
• R (ggplot)
• Raw graphs
• Illustrator / Sketch / Figma
Data Storytelling at Gramener
1. Solutions on Gramener site: https://gramener.com/solutions/
2. Gramener’s Blog: https://blog.gramener.com/
3. Gramener’s upcoming webinars: https://linkedin.com/company/gramener/
You can find me (Rasagy) on Twitter/LinkedIn/Instagram

Slide 65

Slide 65 text

“Most of us need to listen to the music to understand how beautiful it is. But often that’s how we present statistics: we just show the notes, we don’t play the music.” – Hans Rosling

Slide 66

Slide 66 text

THANK YOU