Using Pictures to Look at Your Data

We can’t all take the time to make world changing data graphics like Edward Tufte or Hans Rosling; it’s just not always cost effective. However, there are some really nice pre-built graphics that we can feed data into that allow your brain to pick up on interesting information in the data. One of my favorites is the Box Plot (or Box and Whisker Plot). They display a lot of information in a concise format but the overall data is easy to interpret.

The Box Plot allows you to view the distribution of the individual data points but still look at things side by side.

In the above graphic plots each student’s high score for a particular assessment for a school year. The graphic allows you to see the distribution of scores for each teacher and school year and is laid out so it is easy to see scores for each teacher within the year.

This graphic shows the distribution of student assessment scores for each teacher/school year  but groups each teacher’s years together so you can more easily see how each teacher’s students did from year to year.

Even though the underlying data points are the same they each show different trends and comparisons. When looking at this second image of teacher clusters a couple of unexpected things stand out: 1. Which teachers have been at the school the longest. 2. How much turn-over in teachers there has been for this grade over the last few years.

Typically, what you are looking for are anomalies in the data and not the expected trends. Looking at a graphic like this can give you ideas to create more detailed queries that would pull out those anomalies more specifically. For example, after looking into these images you might be tempted to do another more specific query to ask which classrooms did not improve their scores during the year; and eventually drill-down to specific students and other factors which might affect scores like absences or whether they are transient students.

Moving Forward

This easy and nice Box Plot tool is available starting in Excel 2016 and is pretty easy to use as long as you can format the data the way it likes.

  1. Take a look at these examples and think about what types of questions you are trying to answer using your data. Think about what specific data points you have and what would be good to compare side by side or perhaps over time. Then flip one or two of those points and look at it with one of them in a different location on the plot.
  2. Extract your data in a format that’s easy for Excel or other charting utility to utilize. This is where something who is familiar with manipulating data can be a big help. The data used for these graphs had to be manipulated before the Excel chart could easily use the data. The highest score for a student for each year needed determined. And more importantly the data needed distributed into columns for the charting tool to use.

For Excel’s Box Plot the data works best in a format like the following but this is not the way most data is stored or organized in a database or spreadsheet. To group by Teacher the first column is the teacher and then scores are aligned under the school year columns.

Each student score falls somewhere in the grid. For this example, I ran a program that randomized the scores and teacher names. I used SQL to query the database and extract the data in the above format. Actually, I initially included all identifying information like the student id, SchoolYear, all scores for the year for the student, full teacher name and basically anything I need to then validate that the output of the query was precise. The SQL was created dynamically by a program which dynamically determined the number of years and created columns and distributed the student’s highest scores to the correct columns for each student and year. There are ways to do this all in Excel if you don’t have SQL. The key takeaway is that there are endless methods for transforming your data so it can be viewed in unique ways.

Here’s a quick primer on all of the data in each box plot image.*

*The box and whisker graphic was reused from the following web page. The web page is worth visiting for more information on Box Plots.