To Data & Beyond

Hands-On Time Series Analysis with Python Course [4/6]: Forecasting with ARIMA Models [Part 1]

A practical guide for time series forecasting using ARIMA models in Python

Youssef Hosni
Jul 30, 2025

Have you ever tried to predict the future? What lies ahead is a mystery that is usually only solved by waiting. In this article and the next, we will stop waiting and learn to use the powerful ARIMA family of models to forecast the future.

You will learn how to use the statsmodels package to analyze time series, build tailored models, and forecast under uncertainty. How will the stock market move in the next 24 hours? How will the levels of CO2 change in the next decade? How many earthquakes will there be next year? You will learn to solve all these problems and more.

In this course, “Hands-On Time Series Analysis with Python”, I will go through the basic techniques for working with time series data: starting with data manipulation, analysis, and visualization to understand your data and prepare it for modeling, and then applying statistical, machine learning, and deep learning techniques for forecasting and classification. It is a practical guide, in which each discussed concept is applied to real data.

This series will consist of 6 articles:

  1. Manipulating Time Series Data In Python Pandas [A Practical Guide]

  2. Time Series Analysis in Python Pandas [A Practical Guide]

  3. Visualizing Time Series Data in Python [A Practical Guide]

  4. Time Series Forecasting with ARIMA Models in Python [Part 1] (You are here!)

  5. Time Series Forecasting with ARIMA Models In Python [Part 2]

  6. Machine Learning for Time Series Data [Regression]

Predicting the future with ARIMA models / Photo by Michael Dziedzic on Unsplash

The data and code used in this article can be found in this repository.

Table of contents:

1. Introduction to ARMA Models
1.1. Introduction to stationarity
1.2. Making a time series stationary
1.3. Introduction to AR, MA, and ARMA models

2. Fitting the Future
2.1. Fitting time series models
2.2 Forecasting
2.3. ARIMA models for non-stationary time series

3. References



1. Introduction to ARMA Models

We will start with a short introduction to stationarity and why it matters for ARMA models. Then we will review how to test for stationarity, both by eye and with a standard statistical test.

If you would like to get more information about these topics, you can check my previous article, Time Series Analysis in Python, as they are covered in more detail in it. Finally, you’ll learn the basic structure of ARMA models and use this to generate some ARMA data and fit an ARMA model.

We will use the candy production dataset, which represents the monthly candy production in the US between 1972 and 2018. Specifically, we will be using the industrial production index IPG3113N.

This is the total amount of sugar and confectionery products produced in the USA per month, as a percentage of the January 2012 production. So 120 would be 120% of the January 2012 industrial production.

1.1. Introduction to stationarity

Stationary means that the distribution of the data doesn’t change with time. For a time series to be stationary, it must fulfill three criteria:

  • The series has zero trend. It isn’t growing or shrinking.

  • The variance is constant. The average distance of the data points from the zero line isn’t changing.

  • The autocorrelation is constant. How each value in the time series is related to its neighbors stays the same.
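Before reaching for a formal test, a quick sanity check on these criteria is to compare rolling statistics of a series against those of something known to be stationary. The sketch below is my own illustration (not from the course data), contrasting white noise with a random walk:

```python
import numpy as np
import pandas as pd

# Simulate a stationary series (white noise) and a
# non-stationary one (a random walk)
rng = np.random.default_rng(42)
noise = pd.Series(rng.normal(size=500))
walk = noise.cumsum()

# The rolling mean of white noise hovers near zero, while the
# rolling mean of the random walk drifts over time
noise_drift = noise.rolling(100).mean().dropna().std()
walk_drift = walk.rolling(100).mean().dropna().std()
print(noise_drift, walk_drift)
```

The random walk’s rolling mean wanders far more than the white noise’s, which is exactly the kind of changing distribution that the criteria above rule out.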


Stationarity matters because a time series must be stationary before we can model it. Modeling is all about estimating parameters that represent the data; if those parameters change over time, they become very difficult to estimate.

Let’s first load and plot the monthly candy production dataset:

import pandas as pd
import matplotlib.pyplot as plt

# Load in the time series
candy = pd.read_csv('candy_production.csv',
                    index_col='date',
                    parse_dates=True)

# Change the plot style to fivethirtyeight
plt.style.use('fivethirtyeight')

# Plot and show the time series on axis ax1
fig, ax1 = plt.subplots()
candy.plot(ax=ax1, figsize=(12, 10))
plt.show()
Monthly production of candy in the US from 1974 to 2018.

Generally, in machine learning, you have a training set on which you fit your model and a test set against which you test your predictions. Time series forecasting is just the same, except that the train-test split must respect time.

We use the past values to make future predictions, and so we will need to split the data in time. We train on the data earlier in the time series and test on the data that comes later. We can split a time series at a given date as shown below using the DataFrame’s .loc method.

# Split the data into a train and test set
candy_train = candy.loc[:'2006']
candy_test = candy.loc['2007':]

# Create an axis
fig, ax = plt.subplots()

# Plot the train and test sets on the axis ax
candy_train.plot(ax=ax, figsize=(12,10))
candy_test.plot(ax=ax)
plt.title('train - test split of the monthly production of candy in US')
plt.xlabel('Date')
plt.ylabel('Production')
plt.show()
Train-test split of the monthly production of candy in the US.

1.2. Making a time series stationary

There are many ways to test stationarity, one of them with the eyes, and others are more formal using statistical tests. There are also ways to transform non-stationary time series into stationary ones. We’ll address both of these in this subsection, and then you’ll be ready to start modeling.

The most common test for identifying whether a time series is non-stationary is the augmented Dickey-Fuller test. This is a statistical test where the null hypothesis is that your time series is non-stationary due to trend.

We can implement the augmented Dickey-Fuller test using statsmodels. First, we import the adfuller function as shown, then we can run it on the candy production time series.

from statsmodels.tsa.stattools import adfuller
results = adfuller(candy)
print(results)

The results object is a tuple. The zeroth element is the test statistic; in this case, it is -1.77. The more negative this number is, the more likely that the data is stationary.

The next item in the results tuple is the test p-value. Here it’s 0.3. If the p-value is smaller than 0.05, we reject the null hypothesis and conclude that the time series is stationary.

The last item in the tuple is a dictionary. This stores the critical values of the test statistic, which equate to different p-values. In this case, for a p-value of 0.05 or below, our test statistic would need to be below -2.86.

Based on this result (a p-value of 0.3), we cannot reject the null hypothesis, so we treat the time series as non-stationary. Therefore, we will need to transform the data into a stationary form before we can model it.

We can think of this a bit like feature engineering in classic machine learning. One very common way to make a time series stationary is to take its difference: from each value in the series, we subtract the previous value.

# Calculate the first difference and drop the nans
candy_diff = candy.diff()
candy_diff = candy_diff.dropna()

# Run test and print
result_diff = adfuller(candy_diff)
print(result_diff)

From the results, we can see that the time series is now stationary. Taking the first difference was enough here, but other series may need to be differenced more than once.

Sometimes we will need to perform other transformations to make the time series stationary. This could be to take the log, or the square root of a time series, or to calculate the proportional change. It can be hard to decide which of these to do, but often the simplest solution is the best one.
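To make this concrete, here is a small illustrative sketch (not from the article’s repository) applying a log transform and a proportional change to a series with exponential growth, where differencing alone would not stabilise the variance:

```python
import numpy as np
import pandas as pd

# A series with exponential growth: its variance grows with its level,
# so a first difference alone would still grow over time
growth = pd.Series(np.exp(0.01 * np.arange(200)))

# Log then difference: turns a constant growth rate into a constant level
log_diff = np.log(growth).diff().dropna()

# Proportional change: the period-over-period growth rate
pct = growth.pct_change().dropna()

# Both transforms are flat for this series, since it grows at a fixed rate
print(log_diff.std())
```

Here both transformed series are essentially constant, so either would make this toy series stationary; on real data you would re-run the Dickey-Fuller test after each candidate transform.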

1.3. Introduction to AR, MA, and ARMA models


In an autoregressive (AR) model, we regress the values of the time series against previous values of the same time series. The equation for a simple AR model is shown below:

  • y(t) = a(1) * y(t-1) + ϵ(t)

The value of the time series at time (t) is the value of the time series at the previous step multiplied by parameter a(1), added to a noise or shock term ϵ(t). The shock term is white noise, meaning each shock is random and not related to the other shocks in the series.

Here, a(1) is the autoregressive coefficient at lag one. Compare this to a simple linear regression where the dependent variable is y(t) and the independent variable is y(t-1): the coefficient a(1) is just the slope of the line, and the shocks are the residuals of the line.

This is a first-order AR model. The order of the model is the number of time lags used. An order two AR model has two autoregressive coefficients and has two independent variables, the series at lag one and the series at lag two. More generally, we use p to mean the order of the AR model. This means we have p autoregressive coefficients and use p lags.
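As a quick illustration of a higher-order model, the loop below simulates a hypothetical AR(2) process by hand, with coefficients chosen arbitrarily for the sketch:

```python
import numpy as np

# Hypothetical AR(2): y(t) = 0.5*y(t-1) + 0.3*y(t-2) + eps(t)
# (a1 + a2 < 1 here, so the simulated process is stationary)
rng = np.random.default_rng(1)
a1, a2 = 0.5, 0.3

y = [0.0, 0.0]  # two starting values, one per lag
for _ in range(500):
    # Each new value depends on the TWO previous values plus a fresh shock
    y.append(a1 * y[-1] + a2 * y[-2] + rng.normal())
y = np.array(y)

print(y[:5])
```

The order of the model is simply how far back this dependence reaches: an AR(p) model would use p previous values inside the loop.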

In a moving average (MA) model, we regress the values of the time series against the previous shock values of this same time series. The equation for a simple MA model is shown below:

  • y(t) = m(1)*ϵ(t-1) + ϵ(t)

The value of the time series y(t) is m(1) times the value of the shock at the previous step, plus a shock term for the current time step. This is a first-order MA model.

Again, the order of the model means how many time lags we use. An MA two model would include shocks from one and two steps ago. More generally, we use q to mean the order of the MA model.

An ARMA model is a combination of the AR and MA models: the time series is regressed on its previous values and on the previous shock terms. The equation for an ARMA(1,1) model is shown below:

  • y(t) = a(1)*y(t-1) + m(1)*ϵ(t-1) + ϵ(t)

More generally, we write ARMA(p,q) to define an ARMA model, where p is the order of the autoregressive part and q is the order of the moving average part.

Using the statsmodels package, we can both fit ARMA models and generate ARMA data. Let’s take this ARMA(1,1) model and simulate data with these coefficients.

First, we import the arma_generate_sample function. Then we make lists of the AR and MA coefficients. Note that both coefficient lists start with one; this is the zero-lag term, and we will always set it to one. We set the lag-one AR coefficient to 0.5 and the MA coefficient to 0.2.

We generate the data, passing in the coefficients, the number of data points to create, and the standard deviation of the shocks. Here, we actually pass in the negative of the AR coefficients we desire. This is a quirk we will need to remember.

from statsmodels.tsa.arima_process import arma_generate_sample
ar_coefs = [1, -0.5] 
ma_coefs = [1, 0.2]
y = arma_generate_sample(ar_coefs, ma_coefs, nsample=100, scale=0.5)

The generated data can be represented with this equation:

  • y(t) = 0.5*y(t−1) + 0.2*ϵ(t−1) + ϵ(t)

Generated data with the ARMA model.

Fitting is covered in the next section, but here is a quick peek at how we might fit this data. First, we import the ARIMA model class. We instantiate the model, feed it the data, and define the model order; an ARMA(1,1) model is an ARIMA model with order (1, 0, 1). Then, finally, we fit.

from statsmodels.tsa.arima.model import ARIMA
# Instantiate model object
model = ARIMA(y, order=(1,0,1))
# Fit model
results = model.fit()

2. Fitting the Future

In this section, you’ll learn how to use the elegant statsmodels package to fit ARMA, ARIMA, and ARMAX models. Then you’ll use your models to predict the uncertain future of Amazon stock prices.

2.1. Fitting time series models

We had a quick look at fitting time series models in the last section, but let’s have a closer look. To fit these models, we first import the ARIMA model class from the statsmodels package.
