Purely categorical data can come in a range of formats: • raw data: individual observations • aggregated data: counts for each unique combination of levels • cross-tabulated data: a table of counts with the levels of one variable as rows and the levels of the other as columns
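A minimal Python sketch of the three formats, using only the standard library. The car observations are hypothetical and were invented for illustration. They reuse the country and size levels from the mosaic plot example. `Counter` produces the aggregated counts, and a nested dict stands in for a cross-tabulation.

```python
from collections import Counter

# Hypothetical raw data: one (origin, size) tuple per observed car.
raw = [
    ("American", "Large"),
    ("American", "Medium"),
    ("European", "Small"),
    ("Japanese", "Small"),
    ("Japanese", "Small"),
    ("American", "Large"),
]

# Aggregated format: one count per unique combination of levels.
aggregated = Counter(raw)

# Cross-tabulated format: rows are origin levels, columns are size levels.
origins = sorted({origin for origin, _ in raw})
sizes = ["Small", "Medium", "Large"]
crosstab = {o: {s: aggregated.get((o, s), 0) for s in sizes} for o in origins}

for origin in origins:
    print(origin, [crosstab[origin][s] for s in sizes])
```

All three structures hold the same information. Going from raw to aggregated to cross-tabulated only changes how the counts are laid out.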
Bar charts show the levels of a categorical feature as bars. Levels are plotted on one chart axis, and values are plotted on the other axis. Each categorical level gets one bar, and the length of each bar corresponds to that level's value.
Bar charts plot categorical data. Example: the values on the horizontal axis are categorical; they are the names of the exhibitions. The vertical axis shows time in minutes. The height of each bar represents the median time for that exhibition. https://www.forbes.com/sites/naomirobbins/2012/01/04/a-histogram-is-not-a-bar-chart/#166f022f6d77
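Each bar height in a chart like this is a summary computed per category level. The sketch below assumes hypothetical visit records, with exhibition names and minutes made up for illustration. It computes one median per exhibition with the standard library's `statistics.median`. The helper name `bar_heights` is also hypothetical.

```python
from collections import defaultdict
from statistics import median

# Hypothetical visit records: (exhibition name, minutes spent).
visits = [
    ("Dinosaurs", 12.0), ("Dinosaurs", 20.0), ("Dinosaurs", 15.0),
    ("Gems", 5.0), ("Gems", 9.0),
    ("Oceans", 30.0), ("Oceans", 10.0), ("Oceans", 22.0),
]

def bar_heights(records):
    """Return {category: median value}; one bar per category level."""
    groups = defaultdict(list)
    for category, minutes in records:
        groups[category].append(minutes)
    return {category: median(values) for category, values in groups.items()}

heights = bar_heights(visits)
for name, h in sorted(heights.items()):
    # Crude text bar: one '#' per (rounded) minute.
    print(f"{name:10s} {'#' * round(h)} {h}")
```

The category order on the axis is arbitrary; here it is sorted alphabetically. That freedom to reorder the bars is one thing that separates a bar chart from a histogram.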
Histograms plot quantitative data, with ranges of the data grouped into bins or intervals. Example: the first bin includes visits from 0 up to and including 10 minutes, the second bin includes visits longer than 10 minutes up to and including 20 minutes, and so on. https://www.forbes.com/sites/naomirobbins/2012/01/04/a-histogram-is-not-a-bar-chart/#166f022f6d77
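The binning rule above can be sketched in Python as right-closed 10-minute intervals: bin 0 is [0, 10], bin 1 is (10, 20], and so on. The function names and sample durations are hypothetical. The sketch assumes durations are non-negative.

```python
import math
from collections import Counter

def bin_index(minutes, width=10):
    """Index of the right-closed bin containing `minutes`.

    Bin 0 covers [0, width], bin 1 covers (width, 2*width], and so on,
    so a value on an edge goes into the lower bin ("up to and including").
    """
    if minutes < 0:
        raise ValueError("durations must be non-negative")
    return max(0, math.ceil(minutes / width) - 1)

def histogram(durations, width=10):
    """Count per bin, including empty bins so bars stay contiguous."""
    counts = Counter(bin_index(d, width) for d in durations)
    last = max(counts) if counts else -1
    return [counts.get(i, 0) for i in range(last + 1)]

visit_minutes = [0, 3.5, 10, 10.5, 19, 20, 41]  # hypothetical durations
print(histogram(visit_minutes))
```

Unlike a bar chart, the bins are adjacent numeric intervals in a fixed order. Empty bins are kept as zero-height bars rather than dropped.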
Grouped bar graph, or grouped column graph: takes the bar graph one step further and plots two categorical variables instead of one. Color almost always represents the secondary variable.
Potential accessibility problems (we will address these in the future). Use when you: • want to look at how the second categorical variable changes within each level of the first • want to look at how the first categorical variable changes across levels of the second
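The layout of a grouped bar graph can be sketched as follows. Each level of the first variable gets a slot on the axis, and the bars for the second variable's levels sit side by side within that slot. The helper `grouped_bar_positions`, the integer slot spacing, and the default bar width are assumptions for illustration. The resulting x values could be passed to a plotting call such as matplotlib's `Axes.bar`, with one color per level of the second variable.

```python
def grouped_bar_positions(groups, subgroups, bar_width=0.25):
    """Map (group, subgroup) -> x-center of its bar.

    Each group (level of the first variable) sits at x = 0, 1, 2, ...;
    its subgroup bars (levels of the second variable) are placed side by
    side and centred on that integer position.
    """
    n = len(subgroups)
    positions = {}
    for gi, group in enumerate(groups):
        for si, sub in enumerate(subgroups):
            offset = (si - (n - 1) / 2) * bar_width
            positions[(group, sub)] = gi + offset
    return positions

pos = grouped_bar_positions(
    ["American", "European", "Japanese"], ["Small", "Medium", "Large"]
)
print(pos[("American", "Small")], pos[("American", "Medium")], pos[("Japanese", "Large")])
```

Keeping the subgroup order the same in every group lets readers compare one subgroup across groups by color and position.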
Grouped bar charts • Great for comparing each element in the categories • Good for comparing elements across categories • Hard to tell the difference between the totals of each group Stacked bar charts • Great for showing aggregate totals and the differences between them • Hard to compare sizes within one category Still confused? visual.ly/blog/how-groups-stack-up-when-to-use-grouped-vs-stacked-column-charts/
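The stacked-bar tradeoff follows from how the segments are built. Each segment starts where the previous one ended, so the top of every stack is the group total, measured from a shared baseline at 0. Segments above the first have no common baseline, which makes their sizes harder to compare. This is a minimal sketch with hypothetical counts and a hypothetical helper name, `stack_segments`.

```python
def stack_segments(crosstab, subgroups):
    """For each group, return ([(subgroup, bottom, top), ...], total).

    Segments are stacked in `subgroups` order; the last top equals the
    group total, which is why totals are easy to compare across stacks.
    """
    stacks = {}
    for group, counts in crosstab.items():
        bottom = 0
        segments = []
        for sub in subgroups:
            top = bottom + counts.get(sub, 0)
            segments.append((sub, bottom, top))
            bottom = top
        stacks[group] = (segments, bottom)
    return stacks

counts = {  # hypothetical counts
    "American": {"Small": 2, "Medium": 5, "Large": 4},
    "Japanese": {"Small": 6, "Medium": 3, "Large": 1},
}
stacks = stack_segments(counts, ["Small", "Medium", "Large"])
print(stacks["American"])
```

In a plotting library, the `bottom` values would be passed as each segment's starting height. For example, matplotlib's `Axes.bar` accepts a `bottom` argument.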
This chart is called a Mosaic Plot. In modern data visualization tools, including Tableau, it is known as a Marimekko Chart. It has many other names: matrix chart, stacked spinogram, spineplot, Olympic or submarine chart, Mondrian diagram, or simply mekko chart.
It can be described as: • a 100% stacked horizontal-bar chart using a different variable for each axis • a variable-width stacked column chart • a way to show part-to-whole relationships across two variables at once
A mosaic plot lets you examine the relationship among two or more categorical variables. The area of each box is proportional to the number of observations in that combination of levels. How to build a mosaic plot, step by step: http://www.pmean.com/definitions/mosaic.htm
• The proportions on the x-axis represent the number of observations for each level of the X variable, which is country. • The proportions on the y-axis at right represent the overall proportions of Small, Medium, and Large cars for the combined levels (American, European, and Japanese). • The scale of the y-axis at left shows the response probability, with the whole axis being a probability of one (representing the total sample).
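The mosaic geometry described above can be sketched for two variables on a unit square. Column widths are each country's share of all observations, and heights within a column are the proportions of Small, Medium, and Large within that country. As a result, each box's area equals its share of the total sample. The counts are hypothetical, invented for illustration. The function name `mosaic_rectangles` is also an assumption. This sketch does not cover plots with more than two variables.

```python
def mosaic_rectangles(crosstab, y_levels):
    """Return {(x_level, y_level): (x0, y0, width, height)} on a unit square.

    width  = marginal proportion of the X level (column total / grand total)
    height = conditional proportion of the Y level within that column
    so width * height = joint proportion (cell count / grand total).
    """
    total = sum(sum(row.values()) for row in crosstab.values())
    if total == 0:
        raise ValueError("cross-tabulation has no observations")
    rects = {}
    x0 = 0.0
    for x_level, row in crosstab.items():
        col_total = sum(row.values())
        width = col_total / total
        y0 = 0.0
        for y_level in y_levels:
            height = row.get(y_level, 0) / col_total if col_total else 0.0
            rects[(x_level, y_level)] = (x0, y0, width, height)
            y0 += height
        x0 += width
    return rects

counts = {  # hypothetical car counts by country and size
    "American": {"Small": 1, "Medium": 1, "Large": 2},
    "European": {"Small": 2, "Medium": 1, "Large": 1},
    "Japanese": {"Small": 6, "Medium": 2, "Large": 0},
}
rects = mosaic_rectangles(counts, ["Small", "Medium", "Large"])
print(rects[("Japanese", "Small")])
```

Because the heights in each column sum to 1, they are percentages rather than counts. The counts are carried by the column widths and the resulting box areas.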
The heights within each bin are percentages, not counts. You can use more than two variables (the example on the right has four), but too many variables make the plot hard to read.