Data Transformation: Getting more out of your data – Part 1

Turn a multi-part field into individual rows

Many of my customers just want to press a button and print a simple grid report with numbers. More and more I’m finding that they still want those but they want visualizations along with more unique and more flexible ways of looking at their data. More people also want to play with their own data in Excel or other tools they are comfortable using. I help them enhance their existing data and provide an example or two of visualizations then can build on.

I’ve explained this example in more detail on OpenLDR.org but it’s a good example of transforming data so it’s in a structure that’s easier to work with; a more obvious structure.

Consider a table which has a laboratory test result in each row (think of it as a simple Excel worksheet if you like). However, within that row there is a cell/field which has a list of drugs to which the organism has tested resistant. This is a difficult format to try to use to report on those individual drugs.

TestId Organism ResistantDrugs
12345 Staphylococcus aureus Ampicillin~Gentamicin~ Nalidixic Acid~
54321 Haemophilus influenzae Ceftriaxone~Cefoxitin~Tetracycline~

 

A database person can create a simple SQl query to transform the data into a more obvious format that many existing tools like Excel, Crystal Reports or other report tool can easily use. The following format is easy to feed into a crosstab report or chart in order to report on individual organisms and drugs.

TestId Organism ResistantDrugs
12345 Staphylococcus aureus Ampicillin
12345 Staphylococcus aureus Gentamicin
12345 Staphylococcus aureus Nalidixic Acid
54321 Haemophilus influenza Ceftriaxone
54321 Haemophilus influenza Cefoxitin
54321 Haemophilus influenza Tetracycline

Depending on how much data you have and other factors your database person can also help you determine the best way to access the data; SQL query, database View or an extract to another data format; or if it would be better to actually store the data in another table for easy access and improved performance.

Include a Region column and you can produce a very useful visualization:

Bottom line: This is a simple example of how you can transform data into something more easy to use for analysis.