Five Steps to Get Started with Predictive Analytics
Predictive analytics is often treated as synonymous with other terms such as big data, machine learning, and artificial intelligence. While all of those terms are connected in one way or another, predictive analytics is its own discipline. I define predictive analytics as the use of data, formulas, and statistical modeling to predict future trends and outcomes based on known information.
Predictive analytics can easily become a complicated and expensive venture, but my goal today is to show you how predictive analytics is not as daunting as it may seem. I am going to take you through five steps you can take to start making more intelligent business decisions with the data you already have lying around.
1. What Is the Business Goal?
The first step I always take when thinking about predictive analytics is to determine the business goal. What are we trying to accomplish with our data? Here are some common business goals using predictive analytics:
- We want to measure marketing’s impact on sales, even for hard-to-track activities
- We want to better understand the customer’s purchasing tendencies
- We want to retain more customers and, in turn, increase lifetime value
- We want to determine from a customer service standpoint what hold times result in the most dropped calls
- We want to discover potential areas of risk within our organization
- We want to more intelligently control our inventory
- We want to identify common roadblocks in our sales funnel
- We want to find customers that are eligible for upsell and cross-sell opportunities
- We want to reduce downtime within our fleet by predicting probable maintenance and downtime events
These “We want to” business statements are a great way to get started. Think of common problems you deal with on a day-to-day basis and start making a list. Once you have that list determined, figure out which of these goals can be aided by predictive analytics. When you have your list narrowed down, select one goal to pursue and move on to the next step.
For the purpose of this article, I am going to walk you through the first bullet point listed above, measuring marketing’s impact on sales – even for hard-to-track activities.
2. Define Key Data Points
Next, we need to work on compiling our data. The key data points are the variables we need to cross-check against each other.
In this example, I am looking at marketing’s impact on sales for a specific product. Therefore, I know that I need total sales, marketing spend, and a specified time frame. This is the simplest example; ideally, I would add data points such as average buying cycle length (for this example, we are assuming the buying cycle for our product is instantaneous), medium, messaging, source, audience segment, etc.
Take all of your historical data, as far back as you deem relevant, and export it into spreadsheets if you have not done so already. You might have some cleaning and formatting to do, but it is well worth the time investment. Your end result should look something like the image below.
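If you would rather clean the export with a script than by hand, here is a minimal Python sketch of that step. The column names (month, marketing_spend, total_sales) and every figure in RAW_EXPORT are invented for illustration and are not the data behind this article; the sketch assumes the export is a CSV with dollar amounts stored as text.

```python
import csv
import io

# Hypothetical spreadsheet export; these figures are invented for illustration.
RAW_EXPORT = """month,marketing_spend,total_sales
2023-01,"$1,000","$12,500"
2023-02,"$1,500","$15,200"
2023-03,,"$14,100"
2023-04,"$2,000","$19,800"
"""


def parse_dollars(text):
    """Convert a string such as '$1,500' to a float; return None if blank."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    return float(cleaned) if cleaned else None


def load_rows(csv_text):
    """Return (month, spend, sales) tuples, skipping rows with missing values."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        spend = parse_dollars(record["marketing_spend"])
        sales = parse_dollars(record["total_sales"])
        if spend is None or sales is None:
            continue  # incomplete month: leave it out rather than guess
        rows.append((record["month"], spend, sales))
    return rows


rows = load_rows(RAW_EXPORT)
for month, spend, sales in rows:
    print(month, spend, sales)
```

Dropping incomplete months is only one option; depending on your goal, you may prefer to track down the missing figure instead.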
3. Choose a Model
Picking the correct model is key to predictive analytics. The easiest one to start with, and the one we will be using in our example today, is a regression model. A regression model is either simple linear or multiple, depending on the number of predictor variables you defined earlier. I am not going to get into the complex formulas used in predictive analytics, but it does help to understand how we get the results. Our key data points will always include one target variable and any number of predictor variables. With one predictor variable we use a simple linear regression; with two or more we use a multiple regression.
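To make the idea of one target variable and one predictor variable concrete, here is a minimal sketch of a simple linear regression fitted by ordinary least squares in plain Python. The function name fit_line and the spend and sales figures are assumptions made up for this illustration; a spreadsheet or statistics library will normally do this calculation for you.

```python
def fit_line(x, y):
    """Least-squares fit of y = slope * x + intercept for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Co-variation of x and y, and variation of x, around their means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented monthly figures, in thousands of dollars.
marketing_spend = [1.0, 1.5, 2.0, 2.5, 3.0]    # predictor variable
monthly_sales = [12.0, 15.5, 19.0, 21.5, 26.0]  # target variable

slope, intercept = fit_line(marketing_spend, monthly_sales)
print(f"sales = {slope:.2f} * spend + {intercept:.2f}")
```

The slope is the estimated change in sales for each additional unit of marketing spend, which is the number most business readers will care about.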
4. Plot and Display Your Data
The video and steps below show a walkthrough of how we are going to plot and display all of our data inside Excel. You can see that we have our y target variable defined as monthly sales. In the next column over, you can see our monthly advertising spend, which is our predictor variable. Since we are only using one predictor variable, we have a linear regression.
Now that you have plugged in all of your data, it is time to display it in a scatter plot. The scatter plot is going to show the relationship between marketing dollars spent and total sales. We hope to see that an increase in marketing dollars spent is accompanied by an increase in sales. If you are following along in Excel, here are the steps I used to get my data into the chart.
- Highlight your sales and marketing dollar columns, go to the insert tab at the top, and select the scatter plot from your chart options.
- Set your x-axis and y-axis appropriately. The predictor variable goes on the x-axis and the target variable on the y-axis, so my x-axis is marketing dollars spent and my y-axis is monthly sales. You should now see a scatter of colored dots on the chart.
- Next, we need to add our trendline. If you hover over your chart and select the green plus sign in the upper right, you will see a box for the trendline. Click on the arrow and select the option that says exponential. I use an exponential trendline here instead of a straight one because it represents this data set more accurately; if your dots fall along a straight line, the linear option is the better choice.
- Now that we have the dots and the trendline I like to format each element to make the chart a little more readable. You can do that by clicking format on each element you wish to adjust.
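For readers curious about what sits behind an exponential trendline like the one chosen in the steps above, here is a minimal sketch of one common way to fit such a curve: run a straight-line fit on the natural logarithm of the sales values. The function name fit_exponential and all figures are invented for illustration, and this is not a claim about how any particular spreadsheet computes its trendline.

```python
import math


def fit_exponential(x, y):
    """Fit y = a * exp(b * x) by least squares on ln(y); every y must be > 0."""
    log_y = [math.log(v) for v in y]
    n = len(x)
    mean_x = sum(x) / n
    mean_ly = sum(log_y) / n
    sxy = sum((xi - mean_x) * (li - mean_ly) for xi, li in zip(x, log_y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx                         # growth rate per unit of x
    a = math.exp(mean_ly - b * mean_x)    # value of the curve at x = 0
    return a, b


# Invented monthly figures, in thousands of dollars.
marketing_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
monthly_sales = [11.0, 13.5, 16.0, 20.0, 24.5]

a, b = fit_exponential(marketing_spend, monthly_sales)
# Points on the fitted curve, one per observed spend value.
trendline = [a * math.exp(b * s) for s in marketing_spend]
for s, actual, fitted in zip(marketing_spend, monthly_sales, trendline):
    print(f"spend={s:.1f} actual={actual:.1f} trendline={fitted:.1f}")
```

Comparing the actual and trendline columns side by side gives a quick feel for how closely the curve follows the dots before you calculate anything more formal.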
5. Evaluate Your Results
The most important part is to evaluate your results once you have plotted your data. Just by looking at the plotted data, we can probably tell whether there is some correlation or not. When we look at our chart, we can see that there is definitely a correlation between the two variables, but how strong it is is another question. To find out, we are going to calculate an R-squared value. The R-squared value can be read on a percentage scale from zero to 100 and describes how much of the variation in the target variable the trendline explains. One hundred means the trendline fits the data perfectly, and zero means it explains none of the variation. What is considered an acceptable R-squared value changes based on the business goal you are trying to achieve. It is important to decide what you would accept before you run your test in the spreadsheet. There is no universally right or wrong threshold.
If you look at the chart image above, you can see that we have about a 93 percent value for R-squared. For this test, I would have been really happy with anything over 90 percent. I was able to calculate that value by right-clicking on my trendline, selecting format options, and checking the box that says display R-squared value.
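If you want to check an R-squared value yourself, here is a minimal sketch of the standard calculation: one minus the ratio of the unexplained variation to the total variation. The function name r_squared and the numbers are invented for illustration and are unrelated to the 93 percent result described above.

```python
def r_squared(actual, predicted):
    """Share of the variation in `actual` explained by `predicted` (1.0 is perfect)."""
    mean_actual = sum(actual) / len(actual)
    # Variation left over after the trendline has made its predictions.
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total variation of the actual values around their own average.
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_residual / ss_total


# Invented figures: observed sales and the values a trendline predicted for them.
actual_sales = [10.0, 20.0, 30.0, 40.0]
trendline_sales = [12.0, 18.0, 31.0, 39.0]

score = r_squared(actual_sales, trendline_sales)
print(f"R-squared: {score:.1%}")  # prints "R-squared: 98.0%"
```

Deciding on your acceptance threshold first and then comparing score against it keeps the evaluation honest.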
I hope this was a nice entry point for you to start exploring how simple applications of predictive analytics can drive real business results. You might not want to track your marketing spend and sales dollars like I did, but that doesn’t mean there aren’t plenty of applications you can put into practice right now. If you want to get more in-depth and start exploring other modeling options, like multiple regression, decision trees, time series, association, clustering and more, there are many great resources available online to learn from and expand your knowledge.
Ryan Middleton - Database Marketing Analyst