5 keys to robust data analysis
Lines of incomprehensible data flow across the screen. The more you try to decipher what they mean, the more panicked you become. Your finance team has given you a mountain of data and your executives expect you to tell a story. However, you don’t know what the data means and you aren’t sure where to start.
Below, you will find 5 guiding principles for taking the intimidation out of data analysis. These principles will make your data analysis comprehensive, relevant, and applicable to any problem you are solving.
1. Define the Story You Want to Tell
Think of the data you pull as akin to the main characters of your favorite novel. Without a plot, the characters are only static beings that are not relevant to anyone or anything. Imagine reading The Lord of the Rings without knowing that the Ring holds great and mysterious power. We would think it ludicrous that Frodo is risking everything for something of seemingly little value.
It is the same with data analysis. Rows upon rows and columns of values are just that: rows and columns. In order to draw out meaning and insight, we need the story. The story will provide support for the initial hypotheses and will allow you to target your effort. This can usually be framed within an overriding question. For example, “Which of my partners are leading the greatest number of developers to create the highest quality apps?”
If we pull specific data in an effort to answer a clearly worded question, we can drive to insight. Once that data is collected, we can ask the follow-up questions: “What are these partners doing to achieve such success?” And finally, “How can we leverage this knowledge?”
Knowing the story the data is meant to tell paves the way for application and ultimately process improvement.
2. Verify the Credibility of the Source
Ensure the data is accurate and complete. There are two ways to do this. The first is to go directly to the source of the data.
Have you ever played the game of telephone? It is pretty much a guarantee that the last person in the circle will receive a message that is totally different from the one the first person passed on.
So it is with data. The more people you have to go through to get the data, the less likely it is to be accurate. If possible, pull your data as close as possible to where it was generated. This will greatly increase the validity of your data.
The second way to verify the credibility of the source is to take some time to study the data before you start manipulating it. Do the values seem reasonable, measurable, and reproducible? Do they make sense within the context of the story you are telling? Oftentimes, as data analysts, we get zeroed in on single bits of information. It is important that we take a step back and ensure that what we are analyzing supports the overall picture.
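One way to study the data before manipulating it is to run a quick profile of each column. The sketch below is only an illustration: the column name apps_published, the sample values, and the helper profile_column are all made up for this example. It counts blank and non-numeric cells and reports the numeric range, so that an unreasonable value (such as a negative number of apps) stands out early.

```python
import csv
import io


def profile_column(rows, column):
    """Summarize one column: row count, blank cells, non-numeric cells, range."""
    values = [row.get(column, "") for row in rows]
    blanks = sum(1 for v in values if v is None or str(v).strip() == "")

    numeric = []
    for v in values:
        try:
            numeric.append(float(v))
        except (TypeError, ValueError):
            pass  # blank or non-numeric cell; counted separately below

    return {
        "rows": len(values),
        "blank": blanks,
        "non_numeric": len(values) - blanks - len(numeric),
        "min": min(numeric) if numeric else None,
        "max": max(numeric) if numeric else None,
    }


# Hypothetical sample data; in practice this would come from the source system.
SAMPLE_CSV = """partner,apps_published
Acme,12
Globex,
Initech,-3
Umbrella,n/a
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
report = profile_column(rows, "apps_published")
print(report)
```

Here the blank cell, the "n/a" entry, and the negative minimum are each a prompt to go back to the source before doing any analysis.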
3. Automate, Automate, Automate
When dealing with millions of pieces of data, it is all too easy to spend hours manipulating it. Not only that, but we are all prone to human error. The more manual a process, the greater the risk of making mistakes. Because of these two factors, it is imperative that you automate as much of the data analysis process as possible. Not only will this build confidence in your numbers, but it will also allow for scalability.
The key is to look for patterns. If anything seems repetitious, chances are you can automate it. It can be as simple as writing a macro that organizes the data in Excel or as complicated as creating a web crawling engine. However, beware of over-engineering the process. Success means automating the process in a way that is repeatable and easily executable by anyone, at any time.
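As a minimal sketch of what a repeatable step can look like, the function below replaces a manual sort-and-subtotal routine with a single call. The column names (partner, app, quality_score) and the function summarize_by_partner are assumptions invented for this example, loosely following the partner question from the first principle.

```python
import csv
import io
from collections import defaultdict


def summarize_by_partner(csv_text):
    """Return app count and average quality per partner, best average first."""
    totals = defaultdict(lambda: {"apps": 0, "quality_sum": 0.0})
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = totals[row["partner"]]
        entry["apps"] += 1
        entry["quality_sum"] += float(row["quality_score"])

    summary = [
        {
            "partner": partner,
            "apps": entry["apps"],
            "avg_quality": round(entry["quality_sum"] / entry["apps"], 2),
        }
        for partner, entry in totals.items()
    ]
    # Highest average quality first, so the top partners are easy to spot.
    return sorted(summary, key=lambda r: r["avg_quality"], reverse=True)


# Hypothetical export; the same function can be rerun on every new export.
SAMPLE_EXPORT = """partner,app,quality_score
Acme,a1,4.0
Acme,a2,5.0
Globex,g1,3.0
"""

result = summarize_by_partner(SAMPLE_EXPORT)
for line in result:
    print(line)
```

Because the whole routine lives in one function with no manual steps, anyone can rerun it on next month's export and get a result built the same way.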
4. Measure Twice, Cut Once
This adage is especially true if you have moved to the point where the data collection is almost entirely automated. It is easy to fall into the trap of pulling the data and moving on without taking a moment to ensure that it is correct.
To combat this, it is important to create a system of checks and balances. A good approach is to take a moment to brainstorm everything that could possibly go wrong, and then actively defend against it. This can include a simple checksum formula or a check for text in the cells where you expect text. You can even go a step further and check that a certain cell contains the value you are expecting. Whatever the case, this will help to make your data as robust as possible.
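The three checks described above could be sketched roughly as follows. This is an illustration under assumptions, not a prescribed method: the field names partner and apps, the control total, and the validate function are hypothetical, and "checksum" is read here as comparing a column total against a control total supplied by the source.

```python
def validate(rows, control_total, known_values):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []

    # Check 1: the column total must match the control total from the source.
    actual_total = sum(row["apps"] for row in rows)
    if actual_total != control_total:
        problems.append(f"total is {actual_total}, expected {control_total}")

    # Check 2: cells that should hold text must be non-empty strings.
    for i, row in enumerate(rows):
        name = row["partner"]
        if not isinstance(name, str) or not name.strip():
            problems.append(f"row {i}: partner is not text: {name!r}")

    # Check 3: spot-check cells whose correct values are already known.
    by_partner = {row["partner"]: row["apps"] for row in rows}
    for partner, expected in known_values.items():
        if by_partner.get(partner) != expected:
            problems.append(
                f"{partner}: apps is {by_partner.get(partner)!r}, expected {expected}"
            )

    return problems


# Hypothetical pull with two deliberate defects: a blank name and a short total.
rows = [
    {"partner": "Acme", "apps": 12},
    {"partner": "", "apps": 7},
    {"partner": "Initech", "apps": 5},
]

problems = validate(rows, control_total=25, known_values={"Acme": 12})
for problem in problems:
    print(problem)
```

Running the checks right after every automated pull, and stopping when the list is not empty, keeps a bad extract from flowing silently into the analysis.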
5. Present the Data in a Simple, Meaningful Way
This is key. When pulling thousands and thousands of rows of data, it is all too easy to get lost in the sea of numbers. Organizing the data in a way that makes sense will relieve you and everyone involved in the data analysis process of some serious headaches. It is this final principle that gets practiced the least.
Before presenting the data, you need to decide who the audience is going to be. Is it going to be an executive who wants a high-level summary? Or is it going to be someone in the field who wants the raw numbers for process improvement?
A good way to reach the widest audience is to present a high-level summary of your results with the option of drilling deeper. Too often, data will not be comprehensive, or it will be so complicated that no one besides the data analyst can understand it. A good test is to present your findings to someone with no knowledge of the project. If they can understand the meaning behind the data, then it’s a safe bet that your client will too. Remember, data is only relevant if it tells a story. It is this story that will bridge the gap from numbers to insight.