Forecasting in Python with Prophet
Forecasting is often considered a natural progression from reporting. Reporting helps us answer the question "What happened?" Forecasting helps answer the next logical question: "What will happen?"
Historically, high-quality forecasts have been very challenging to produce. This resulted in a severe shortage of analysts who could deliver forecasts with the level of accuracy required to drive business decisions. To close this supply gap and make scalable forecasting dramatically easier, the Core Data Science team at Facebook created Prophet, a forecasting library for Python and R, which they open-sourced in 2017.
The intent behind Prophet is to "make it easier for experts and non-experts to make high-quality forecasts that keep up with demand." Prophet can produce reliable and robust forecasts (often performing better than other common forecasting techniques) with very little manual effort, while still letting you apply domain knowledge through easily interpretable parameters.
In this recipe, you'll learn how to use Prophet (in Python) to solve a common problem: forecasting a company's daily orders for the next year. This lightweight example should serve as a great way to get started with Prophet, and will hopefully spark some inspiration to dive even deeper into the library's vast functionality.
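This recipe is broken down into four main sections: data preparation and exploration, the Box-Cox transform, forecasting with Prophet, and the inverse Box-Cox transform.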
You can find implementations of all of the steps outlined below in this example Mode report.
Data Preparation & Exploration
Prophet works best with daily data that has at least one year of history. It's possible to use Prophet with sub-daily or monthly data, but for the purposes of this recipe, we'll use the recommended daily periodicity. We will use SQL to aggregate the data we'd like to forecast to a daily periodicity:
order by date
NOTE: While Prophet is relatively robust to missing data, it's important to ensure that your time series is not missing a significant number of observations. If it is, consider using a resampling technique or forecasting your data at a lower frequency (e.g. making monthly forecasts using monthly observations).
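One way to check for missing days, and to fall back to a lower frequency, can be sketched in pandas. The small `daily` dataframe below is a hypothetical stand-in for your query result, with one day deliberately left out.

```python
import pandas as pd

# Hypothetical query result: one row per day, but 2017-01-03 is missing.
daily = pd.DataFrame({
    "date": pd.to_datetime(["2017-01-01", "2017-01-02", "2017-01-04", "2017-01-05"]),
    "value": [120, 135, 128, 140],
})

# Compare the observed dates against a complete daily calendar.
full_range = pd.date_range(daily["date"].min(), daily["date"].max(), freq="D")
missing = full_range.difference(daily["date"])
missing_share = len(missing) / len(full_range)
print(f"{len(missing)} missing day(s) ({missing_share:.0%} of the range)")

# If too many days are missing, one option is to forecast at a lower
# frequency, e.g. by summing the observations into monthly totals.
monthly = daily.set_index("date")["value"].resample("MS").sum()
```

The 20% threshold you tolerate is a judgment call. Whether a missing day should be treated as zero orders or as unknown depends on how your data is collected.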
Now that we have our data at a daily periodicity, we can pipe our SQL query result set into a pandas dataframe object in the Python notebook. First, rename your SQL query to Daily Orders. Then, inside the Python notebook, use the following statement to pipe the query result set into a dataframe:
df = datasets["Daily Orders"]
To get a quick sense of how many observations your dataframe contains, you can run the following statement, which returns a tuple containing the number of rows and columns in your dataframe, respectively:
df.shape
Prophet always expects two columns in the input DataFrame: ds and y. The ds column represents the date from your SQL query and needs to be of date or datetime data type. The y column represents the value we are looking to forecast and must be of numeric data type. To check the types of the columns in your DataFrame, you can run the following statement in the Python notebook:
df.dtypes
Once you have confirmed that the columns in your dataframe are the correct data types, you can create a new column ds that is an exact copy of the date column, and a new column y that is an exact copy of the value column:
df['ds'] = df['date']
df['y'] = df['value']
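If the result set arrives with dates as strings or values as text, you can coerce the types before making the copies. Here is a minimal sketch on a small hypothetical dataframe that stands in for datasets["Daily Orders"]:

```python
import pandas as pd

# Hypothetical stand-in for the "Daily Orders" result set; both columns arrive as strings.
df = pd.DataFrame({
    "date": ["2017-01-01", "2017-01-02", "2017-01-03"],
    "value": ["120", "135", "128"],
})

# Coerce to the types Prophet expects: datetime for ds, numeric for y.
df["date"] = pd.to_datetime(df["date"])
df["value"] = pd.to_numeric(df["value"])

# Prophet reads its inputs from these two specifically named columns.
df["ds"] = df["date"]
df["y"] = df["value"]

print(df.dtypes)
```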
You can then repurpose the date column as the index of the dataframe. Because set_index returns a new dataframe, assign the result back to df:
df = df.set_index('date')
If the date column has a datetime data type, this gives your dataframe a DatetimeIndex, which lets pandas treat the dataset as a time series (for example, enabling date-based slicing and resampling).
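The following is a small sketch of what the DatetimeIndex enables, using ten days of hypothetical data. The ds column survives set_index, and that column is what Prophet reads.

```python
import pandas as pd

# Hypothetical daily data with a datetime "date" column.
df = pd.DataFrame({
    "date": pd.date_range("2017-01-01", periods=10, freq="D"),
    "value": [120, 135, 128, 140, 150, 110, 105, 130, 142, 138],
})
df["ds"] = df["date"]
df["y"] = df["value"]

# set_index returns a new dataframe, so assign the result back.
df = df.set_index("date")

# With a DatetimeIndex, pandas supports date-based slicing and resampling.
first_week = df.loc["2017-01-01":"2017-01-07", "value"]
weekly_totals = df["value"].resample("W").sum()
```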
Now that you have your data prepped to be used with Prophet, it’s good practice to plot it and inspect what the data looks like before feeding it into Prophet. Using our example data, we can use matplotlib to create the following visualization:
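A line plot like this can be sketched with matplotlib as follows. The synthetic series is hypothetical; substitute your own DatetimeIndex-ed dataframe. The Agg backend is used only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical daily order counts indexed by date (replace with your own df).
dates = pd.date_range("2016-01-01", periods=730, freq="D")
rng = np.random.default_rng(0)
values = 100 + 0.1 * np.arange(730) + rng.normal(0, 5, size=730)
df = pd.DataFrame({"value": values}, index=dates)

# One line for the raw observations over time.
fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(df.index, df["value"])
ax.set_title("Daily Orders")
ax.set_xlabel("Date")
ax.set_ylabel("Orders")
fig.tight_layout()
```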
There are a few things to notice about this plot:
- There is a noticeable change in trend trajectory around May 2017. By default, Prophet automatically detects these kinds of “trend changepoints” and allows the trend to adapt appropriately. Prophet also allows finer-grained control over the identification of these trend changepoints.
- There is noticeable weekly and yearly seasonality. Prophet will automatically fit weekly and yearly seasonalities if the time series is more than two cycles long.
- The mean and variance of our observations increase over time. Prophet natively models the increase in the mean over time, but to achieve the most accurate forecasting results we should take additional steps to stabilize the variance. We can do this by applying a power transform to our data.
Often in forecasting, you'll explicitly choose a specific type of power transform to stabilize the variance of the data before feeding it into a forecasting model (e.g. a log transform or square root transform, amongst others). However, it can sometimes be difficult to determine which type of power transform is appropriate for your data. This is where the Box-Cox transform comes in.
A Box-Cox transform is a family of power transforms parameterized by a coefficient λ. Fitting it means evaluating candidate λ values and selecting the one whose transformed data best approximates a normal distribution.
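For reference, the standard one-parameter Box-Cox transform maps each positive observation y to a transformed value for a given λ. This is the form documented for scipy's boxcox:

```latex
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[4pt]
\ln y, & \lambda = 0
\end{cases}
```

With λ = 1 the data is only shifted. With λ = 0.5 the result is a scaled and shifted square root, and λ = 0 gives the log transform.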
The scipy.stats Python module provides a built-in Box-Cox transform method called boxcox. It takes a dataset of positive values and returns it transformed by a Box-Cox power transformation. To use the boxcox method, first import it from the scipy.stats module by adding the following line to your import block:
from scipy.stats import boxcox
The boxcox method has one required input: a 1-dimensional array of positive data to transform. Optionally, you can specify the λ value you'd like to use for your transformation through the lmbda argument (e.g. λ = 0 for a log transform), in which case only the transformed data is returned. Otherwise, the boxcox method finds the λ that maximizes the log-likelihood function and returns it as the second output.
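The two calling styles can be compared on a small array of hypothetical positive values. This sketch shows that a fixed lmbda returns a single array, while omitting it also returns the fitted λ:

```python
import numpy as np
from scipy.stats import boxcox

# Hypothetical positive daily order counts.
values = np.array([120.0, 135.0, 128.0, 140.0, 150.0, 110.0, 105.0, 160.0, 175.0, 190.0])

# With an explicit lambda, boxcox returns only the transformed array.
log_transformed = boxcox(values, lmbda=0)    # lambda = 0 is the log transform
sqrt_like = boxcox(values, lmbda=0.5)        # (x**0.5 - 1) / 0.5

# Without lambda, boxcox also returns the lambda that maximizes the log-likelihood.
transformed, lam = boxcox(values)
print(f"optimal lambda: {lam:.3f}")
```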
For our example, we will let the boxcox method determine the optimal λ for our transformation and assign that value to a variable named lam:
# Apply Box-Cox Transform to value column and assign to new column y
df['y'], lam = boxcox(df['value'])
If we plot our newly transformed data alongside the untransformed data, we can see that the Box-Cox transformation was able to remove much of the increasing variance in our observations over time:
The first step in creating a forecast using Prophet is importing the fbprophet library into our Python notebook (from version 1.0 onward, the package is named prophet instead of fbprophet):
import fbprophet
Once we've imported the Prophet library into our notebook, we can begin by instantiating (creating an instance of) a Prophet object:
m = fbprophet.Prophet()
Once you have instantiated a Prophet object, you're ready to fit a model to your historical data. You do this by calling the fit method on the Prophet object and passing in your dataframe:
m.fit(df)
Once Prophet has fit the model to the Box-Cox transformed dataset, you can start making predictions for future dates. Prophet has a built-in helper function, make_future_dataframe, that creates a dataframe of future dates. The make_future_dataframe function lets you specify the frequency and the number of periods you would like to forecast into the future. By default, the frequency is set to days. Since we are using daily data in this example, we will leave freq at its default and set the periods argument to 365, indicating that we would like to forecast 365 days into the future.
future = m.make_future_dataframe(periods=365)
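By default (include_history=True), make_future_dataframe returns a single ds column containing the historical dates followed by the requested number of new dates. The pandas sketch below only approximates that shape for intuition. It is not Prophet's implementation, and the one-year `history` frame is hypothetical.

```python
import pandas as pd

# Hypothetical history: one year of daily dates, like the ds column passed to fit.
history = pd.DataFrame({"ds": pd.date_range("2017-01-01", "2017-12-31", freq="D")})

periods = 365
# New dates start the day after the last observed date.
start = history["ds"].max() + pd.Timedelta(days=1)
new_dates = pd.DataFrame({"ds": pd.date_range(start, periods=periods, freq="D")})

# History first, then future dates, mirroring the default include_history=True.
future_like = pd.concat([history, new_dates], ignore_index=True)
```

Because the history is included, predict returns fitted values for past dates as well as forecasts for the new ones.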
We can now use the predict method to make predictions for each row in the future dataframe.
forecast = m.predict(future)
At this point, Prophet will have created a new dataframe, assigned to the forecast variable. It contains the predicted values for every date in future (the history plus the 365 new days) under the column yhat, as well as uncertainty intervals and components for the forecast. We can visualize the forecast using Prophet's built-in plot helper function:
fig = m.plot(forecast)
In our example, our forecast looks as follows:
If you want to visualize the individual forecast components, you can use Prophet's built-in plot_components helper function:
fig = m.plot_components(forecast)
Running plot_components on our example data returns the following set of component visualizations:
The forecast and component visualizations show that Prophet was able to model the underlying trend in the data as well as the weekly and yearly seasonality (e.g. lower order volume on weekends and around holidays).
Inverse Box-Cox Transform
Since Prophet was fit on the Box-Cox transformed data, its forecasted values are in transformed units. To convert them back to their original units, you need to perform an inverse Box-Cox transform.
The scipy.special Python module provides a built-in inverse Box-Cox transform method called inv_boxcox. To use the inv_boxcox method, first import it from the scipy.special module by adding the following line to your import block:
from scipy.special import inv_boxcox
The inv_boxcox method has two required inputs: an array of data to transform and the λ value used for the original transform. We will inverse transform specific columns in our forecast dataframe, supplying the λ value we obtained earlier from our Box-Cox transform, which is stored in the lam variable:
# Apply inverse Box-Cox transform to specific forecast columns
forecast[['yhat','yhat_upper','yhat_lower']] = forecast[['yhat','yhat_upper','yhat_lower']].apply(lambda x: inv_boxcox(x, lam))
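A quick way to sanity-check this step is a round trip. The sketch transforms hypothetical values, wraps them in a dataframe that borrows Prophet's column names, applies the same inverse, and recovers the originals. The ±10% bounds are made up for illustration and are not real uncertainty intervals.

```python
import numpy as np
import pandas as pd
from scipy.special import inv_boxcox
from scipy.stats import boxcox

# Hypothetical positive daily order counts.
values = np.array([120.0, 135.0, 128.0, 140.0, 150.0, 110.0, 105.0, 160.0, 175.0, 190.0])
transformed, lam = boxcox(values)

# Hypothetical forecast frame in Box-Cox units, using Prophet's column names.
# The bounds are +/-10% of the values, transformed with the same lambda.
forecast = pd.DataFrame({
    "yhat": transformed,
    "yhat_lower": boxcox(values * 0.9, lmbda=lam),
    "yhat_upper": boxcox(values * 1.1, lmbda=lam),
})

cols = ["yhat", "yhat_upper", "yhat_lower"]
# apply passes each column as a Series; inv_boxcox works element-wise on it.
forecast[cols] = forecast[cols].apply(lambda col: inv_boxcox(col, lam))
```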
Now that your forecasted values are transformed back to their original units, you are able to visualize the forecasted values alongside the historical values:
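One way to build such a chart with matplotlib is sketched below. It uses hypothetical history and forecast frames in original units with Prophet-style column names (ds, yhat, yhat_lower, yhat_upper), and it shades the uncertainty interval around the forecast.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical history in original units.
history = pd.DataFrame({
    "ds": pd.date_range("2017-01-01", periods=60, freq="D"),
    "value": np.linspace(100, 160, 60),
})

# Hypothetical inverse-transformed forecast for the following 30 days.
start = history["ds"].max() + pd.Timedelta(days=1)
forecast = pd.DataFrame({
    "ds": pd.date_range(start, periods=30, freq="D"),
    "yhat": np.linspace(160, 175, 30),
})
forecast["yhat_lower"] = forecast["yhat"] * 0.9
forecast["yhat_upper"] = forecast["yhat"] * 1.1

# Actuals and forecast as lines, with the interval shaded between the bounds.
fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(history["ds"], history["value"], label="Actual")
ax.plot(forecast["ds"], forecast["yhat"], label="Forecast")
ax.fill_between(forecast["ds"], forecast["yhat_lower"], forecast["yhat_upper"],
                alpha=0.3, label="Uncertainty interval")
ax.legend()
fig.tight_layout()
```

With a real Prophet forecast, the frame also covers the historical dates, so you may want to filter it to dates after the last observation before plotting.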