Commit 39c8df68 authored by Ahmad Said Yousf Ayad

Deleted Exercise_3.ipynb, airline-passengers.csv

%% Cell type:markdown id: tags:
This is a problem where, given a year and a month, the task is to predict the number of international airline passengers in units of 1,000. The data ranges from January 1949 to December 1960, or 12 years, with 144 monthly observations.
We can phrase the problem as: given the number of passengers (in units of thousands) this month, what is the number of passengers next month?
%% Cell type:markdown id: tags:
# Import Libraries
%% Cell type:code id: tags:
``` python
import numpy
import pandas
import matplotlib.pyplot as plt
from pandas import read_csv
import math
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
# fixed random seed for reproducibility
numpy.random.seed(7)
```
%% Output
Using TensorFlow backend.
%% Cell type:markdown id: tags:
We can load this dataset easily using the Pandas library. We are not interested in the date, given that each observation is separated by the same interval of one month. Therefore, when we load the dataset we can exclude the first column.
Extract the NumPy array from the dataframe and convert the integer values to floating point values, which are more suitable for modeling with a neural network. Load and plot the dataset below.
%% Cell type:markdown id: tags:
# Load and plot dataset
%% Cell type:code id: tags:
``` python
# load the dataset
dataframe = pandas.read_csv('airline-passengers.csv', usecols=[1], engine='python')
dataset = dataframe.values
dataset = dataset.astype('float32')
# preview the first rows (shown in the output below)
print(dataset[:4])
```
%% Output
[[112.]
 [118.]
 [132.]
 [129.]]
%% Cell type:markdown id: tags:
# Normalize the dataset
LSTMs are sensitive to the scale of the input data, specifically when the sigmoid (default) or tanh activation functions are used. It is good practice to rescale the data to the range of 0-to-1, also called normalizing. We can easily normalize the dataset using the MinMaxScaler preprocessing class from the scikit-learn library.
%% Cell type:code id: tags:
``` python
# normalize the dataset
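# A minimal sketch: fit MinMaxScaler on the series and keep the fitted
# scaler so predictions can be inverted back to passenger counts later.
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)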
```
%% Cell type:markdown id: tags:
# Create training and test sets
After we model our data and estimate the performance of our model on the training dataset, we need to get an idea of the accuracy of the model on new unseen data. For a normal classification or regression problem, we would do this using cross validation.
With time series data, the sequence of values is important. A simple method we can use is to split the ordered dataset into train and test datasets. Calculate the index of the split point and use the first 67% of the observations as the training dataset, leaving the remaining 33% for testing.
%% Cell type:code id: tags:
``` python
# split into train and test sets
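# A minimal sketch: keep the temporal order and cut at the 67% mark.
train_size = int(len(dataset) * 0.67)
train, test = dataset[0:train_size, :], dataset[train_size:len(dataset), :]
print(len(train), len(test))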
```
%% Cell type:markdown id: tags:
Write a simple function to convert our single column of data into a two-column dataset: the first column containing this month's (t) passenger count and the second column containing next month's (t+1) passenger count, to be predicted.
The function takes two arguments: the "dataset" and the "look_back", which is the number of previous time steps to use as input variables to predict the next time period. The function will then create a dataset where X is the number of passengers at a given time, t, and Y is the number of passengers at the next time, t + 1. In other words, Y(t) = X(t+1).
PS: the syntax "look_back=1" means that the default argument value of "look_back" is one. The function (algorithm) should still handle look_back values different from 1.
%% Cell type:code id: tags:
``` python
# convert an array of values into a dataset matrix
def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    # slide a window of look_back values along the series:
    # X is the window, Y is the value that immediately follows it
    for i in range(len(dataset) - look_back - 1):
        dataX.append(dataset[i:(i + look_back), 0])
        dataY.append(dataset[i + look_back, 0])
    return numpy.array(dataX), numpy.array(dataY)
```
%% Cell type:markdown id: tags:
Compare the first 5 rows to the original dataset sample listed in the previous section. Can you see the X=t and Y=t+1 pattern?
Use the function to prepare the train and test datasets for our model.
%% Cell type:code id: tags:
``` python
# reshape into X=t and Y=t+1
look_back = 1
# assumes 'train' and 'test' from the split above
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)
```
%% Cell type:markdown id: tags:
The LSTM network expects the input data (X) to be provided with a specific array structure in the form of: [samples, time steps, features].
Currently, our data is in the form: [samples, features] and we are framing the problem as one time step for each sample. You can transform the prepared train and test input data into the expected structure using numpy.reshape() as follows:
%% Cell type:code id: tags:
``` python
# reshape input to be [samples, time steps, features]
trainX = numpy.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
testX = numpy.reshape(testX, (testX.shape[0], 1, testX.shape[1]))
```
%% Cell type:markdown id: tags:
# Create an LSTM network
The sequential network has a visible layer with 1 input, a hidden layer with 4 LSTM blocks (neurons), and an output layer that makes a single-value prediction. The default sigmoid activation function can be used for the LSTM blocks. Train the network for 100 epochs with a batch size of 1. Mean squared error is a reasonable choice for the loss function, and the Adam optimizer can be used.
%% Cell type:code id: tags:
``` python
# create and fit the LSTM network
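# A minimal sketch of the network described above, assuming trainX/trainY
# from the previous steps: 4 LSTM units, a single Dense output, MSE loss,
# the Adam optimizer, 100 epochs, batch size 1 (verbose=2 prints the
# per-epoch log shown below).
model = Sequential()
model.add(LSTM(4, input_shape=(1, look_back)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=100, batch_size=1, verbose=2)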
```
%% Output
Epoch 1/100
- 0s - loss: 0.0410
Epoch 2/100
- 0s - loss: 0.0197
Epoch 3/100
- 0s - loss: 0.0141
Epoch 4/100
- 0s - loss: 0.0127
Epoch 5/100
- 0s - loss: 0.0117
Epoch 6/100
- 0s - loss: 0.0106
Epoch 7/100
- 0s - loss: 0.0096
Epoch 8/100
- 0s - loss: 0.0087
Epoch 9/100
- 0s - loss: 0.0076
Epoch 10/100
- 0s - loss: 0.0065
Epoch 11/100
- 0s - loss: 0.0057
Epoch 12/100
- 0s - loss: 0.0048
Epoch 13/100
- 0s - loss: 0.0041
Epoch 14/100
- 0s - loss: 0.0035
Epoch 15/100
- 0s - loss: 0.0030
Epoch 16/100
- 0s - loss: 0.0027
Epoch 17/100
- 0s - loss: 0.0025
Epoch 18/100
- 0s - loss: 0.0023
Epoch 19/100
- 0s - loss: 0.0022
Epoch 20/100
- 0s - loss: 0.0021
Epoch 21/100
- 0s - loss: 0.0021
Epoch 22/100
- 0s - loss: 0.0021
Epoch 23/100
- 0s - loss: 0.0021
Epoch 24/100
- 0s - loss: 0.0020
Epoch 25/100
- 0s - loss: 0.0020
Epoch 26/100
- 0s - loss: 0.0021
Epoch 27/100
- 0s - loss: 0.0020
Epoch 28/100
- 0s - loss: 0.0020
Epoch 29/100
- 0s - loss: 0.0020
Epoch 30/100
- 0s - loss: 0.0021
Epoch 31/100
- 0s - loss: 0.0020
Epoch 32/100
- 0s - loss: 0.0020
Epoch 33/100
- 0s - loss: 0.0021
Epoch 34/100
- 0s - loss: 0.0021
Epoch 35/100
- 0s - loss: 0.0021
Epoch 36/100
- 0s - loss: 0.0020
Epoch 37/100
- 0s - loss: 0.0021
Epoch 38/100
- 0s - loss: 0.0020
Epoch 39/100
- 0s - loss: 0.0021
Epoch 40/100
- 0s - loss: 0.0020
Epoch 41/100
- 0s - loss: 0.0020
Epoch 42/100
- 0s - loss: 0.0020
Epoch 43/100
- 0s - loss: 0.0021
Epoch 44/100
- 0s - loss: 0.0020
Epoch 45/100
- 0s - loss: 0.0021
Epoch 46/100
- 0s - loss: 0.0020
Epoch 47/100
- 0s - loss: 0.0020
Epoch 48/100
- 0s - loss: 0.0020
Epoch 49/100
- 0s - loss: 0.0020
Epoch 50/100
- 0s - loss: 0.0020
Epoch 51/100
- 0s - loss: 0.0020
Epoch 52/100
- 0s - loss: 0.0020
Epoch 53/100
- 0s - loss: 0.0020
Epoch 54/100
- 0s - loss: 0.0020
Epoch 55/100
- 0s - loss: 0.0021
Epoch 56/100
- 0s - loss: 0.0020
Epoch 57/100
- 0s - loss: 0.0020
Epoch 58/100
- 0s - loss: 0.0020
Epoch 59/100
- 0s - loss: 0.0020
Epoch 60/100
- 0s - loss: 0.0020
Epoch 61/100
- 0s - loss: 0.0021
Epoch 62/100
- 0s - loss: 0.0020
Epoch 63/100
- 0s - loss: 0.0020
Epoch 64/100
- 0s - loss: 0.0020
Epoch 65/100
- 0s - loss: 0.0020
Epoch 66/100
- 0s - loss: 0.0020
Epoch 67/100
- 0s - loss: 0.0020
Epoch 68/100
- 0s - loss: 0.0021
Epoch 69/100
- 0s - loss: 0.0020
Epoch 70/100
- 0s - loss: 0.0021
Epoch 71/100
- 0s - loss: 0.0020
Epoch 72/100
- 0s - loss: 0.0020
Epoch 73/100
- 0s - loss: 0.0020
Epoch 74/100
- 0s - loss: 0.0021
Epoch 75/100
- 0s - loss: 0.0021
Epoch 76/100
- 0s - loss: 0.0020
Epoch 77/100
- 0s - loss: 0.0021
Epoch 78/100
- 0s - loss: 0.0019
Epoch 79/100
- 0s - loss: 0.0022
Epoch 80/100
- 0s - loss: 0.0020
Epoch 81/100
- 0s - loss: 0.0020
Epoch 82/100
- 0s - loss: 0.0020
Epoch 83/100
- 0s - loss: 0.0020
Epoch 84/100
- 0s - loss: 0.0020
Epoch 85/100
- 0s - loss: 0.0021
Epoch 86/100
- 0s - loss: 0.0021
Epoch 87/100
- 0s - loss: 0.0020
Epoch 88/100
- 0s - loss: 0.0020
Epoch 89/100
- 0s - loss: 0.0020
Epoch 90/100
- 0s - loss: 0.0020
Epoch 91/100
- 0s - loss: 0.0020
Epoch 92/100
- 0s - loss: 0.0020
Epoch 93/100
- 0s - loss: 0.0021
Epoch 94/100
- 0s - loss: 0.0021
Epoch 95/100
- 0s - loss: 0.0020
Epoch 96/100
- 0s - loss: 0.0020
Epoch 97/100
- 0s - loss: 0.0020
Epoch 98/100
- 0s - loss: 0.0020
Epoch 99/100
- 0s - loss: 0.0020
Epoch 100/100
- 0s - loss: 0.0020
<keras.callbacks.callbacks.History at 0x2576945aa88>
%% Cell type:markdown id: tags:
Once the model is fit, we can estimate the performance of the model on the train and test datasets.
%% Cell type:markdown id: tags:
# Make predictions
Note that you must invert the scale of predictions before calculating error scores to ensure that performance is reported in the same units as the original data (thousands of passengers per month).
%% Cell type:code id: tags:
``` python
# make predictions
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)
# invert the 0-1 scaling so scores are in thousands of passengers
# (assumes 'scaler' is the MinMaxScaler fitted earlier)
trainPredict = scaler.inverse_transform(trainPredict)
trainY = scaler.inverse_transform([trainY])
testPredict = scaler.inverse_transform(testPredict)
testY = scaler.inverse_transform([testY])
```
%% Cell type:markdown id: tags:
# Evaluate performance
Use the root mean squared error to measure the performance in the training and test datasets.
%% Cell type:code id: tags:
``` python
# calculate root mean squared error
trainScore = math.sqrt(mean_squared_error(trainY[0], trainPredict[:, 0]))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = math.sqrt(mean_squared_error(testY[0], testPredict[:, 0]))
print('Test Score: %.2f RMSE' % (testScore))
```
%% Output
Train Score: 22.93 RMSE
Test Score: 47.60 RMSE
%% Cell type:markdown id: tags:
Finally, generate predictions using the model for both the train and test dataset to get a visual indication of the performance of the model.
Because of how the dataset was prepared, you must shift the predictions so that they align on the x-axis with the original dataset.
%% Cell type:code id: tags:
``` python
# shift train predictions for plotting
trainPredictPlot = numpy.empty_like(dataset)
trainPredictPlot[:, :] = numpy.nan
trainPredictPlot[look_back:len(trainPredict)+look_back, :] = trainPredict
# shift test predictions for plotting
testPredictPlot = numpy.empty_like(dataset)
testPredictPlot[:, :] = numpy.nan
testPredictPlot[len(trainPredict)+(look_back*2):len(dataset), :] = testPredict
```
%% Cell type:markdown id: tags:
# Plot the training and test results per month
Once prepared, the data is plotted, showing the original dataset in blue, the predictions for the training dataset in green, and the predictions on the unseen test dataset in red.
%% Cell type:code id: tags:
``` python
# plot baseline and predictions
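# A minimal sketch: plot the rescaled original series and the two shifted
# prediction arrays; colours are set explicitly to match the description above.
plt.plot(scaler.inverse_transform(dataset), color='blue')
plt.plot(trainPredictPlot, color='green')
plt.plot(testPredictPlot, color='red')
plt.show()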
```
%% Output
%% Cell type:markdown id: tags:
Now test the performance for different values of look_back. A sketch for look_back = 3 is shown below (a hypothetical re-run, assuming the train/test split, create_dataset(), and the model definition from above).
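%% Cell type:code id: tags:
``` python
# A sketch of the same pipeline with a wider window; only the window size
# and the LSTM input_shape change. Names (train, test, create_dataset)
# are assumed from the earlier cells.
look_back = 3
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)
trainX = numpy.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
testX = numpy.reshape(testX, (testX.shape[0], 1, testX.shape[1]))
model = Sequential()
model.add(LSTM(4, input_shape=(1, look_back)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=100, batch_size=1, verbose=2)
```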
"Month","Passengers"
"1949-01",112
"1949-02",118
"1949-03",132
"1949-04",129
"1949-05",121
"1949-06",135
"1949-07",148
"1949-08",148
"1949-09",136
"1949-10",119
"1949-11",104
"1949-12",118
"1950-01",115
"1950-02",126
"1950-03",141
"1950-04",135
"1950-05",125
"1950-06",149
"1950-07",170
"1950-08",170
"1950-09",158
"1950-10",133
"1950-11",114
"1950-12",140
"1951-01",145
"1951-02",150
"1951-03",178
"1951-04",163
"1951-05",172
"1951-06",178
"1951-07",199
"1951-08",199
"1951-09",184
"1951-10",162
"1951-11",146
"1951-12",166
"1952-01",171
"1952-02",180
"1952-03",193
"1952-04",181
"1952-05",183
"1952-06",218
"1952-07",230
"1952-08",242
"1952-09",209
"1952-10",191
"1952-11",172
"1952-12",194
"1953-01",196
"1953-02",196
"1953-03",236
"1953-04",235
"1953-05",229
"1953-06",243
"1953-07",264
"1953-08",272
"1953-09",237
"1953-10",211
"1953-11",180
"1953-12",201
"1954-01",204
"1954-02",188
"1954-03",235
"1954-04",227
"1954-05",234
"1954-06",264
"1954-07",302
"1954-08",293
"1954-09",259
"1954-10",229
"1954-11",203
"1954-12",229
"1955-01",242
"1955-02",233
"1955-03",267
"1955-04",269
"1955-05",270
"1955-06",315
"1955-07",364
"1955-08",347
"1955-09",312
"1955-10",274
"1955-11",237
"1955-12",278
"1956-01",284
"1956-02",277
"1956-03",317
"1956-04",313
"1956-05",318
"1956-06",374
"1956-07",413
"1956-08",405
"1956-09",355
"1956-10",306
"1956-11",271
"1956-12",306
"1957-01",315
"1957-02",301
"1957-03",356
"1957-04",348
"1957-05",355
"1957-06",422
"1957-07",465
"1957-08",467
"1957-09",404
"1957-10",347
"1957-11",305
"1957-12",336
"1958-01",340
"1958-02",318
"1958-03",362
"1958-04",348
"1958-05",363
"1958-06",435
"1958-07",491
"1958-08",505
"1958-09",404
"1958-10",359
"1958-11",310
"1958-12",337
"1959-01",360
"1959-02",342
"1959-03",406
"1959-04",396
"1959-05",420
"1959-06",472
"1959-07",548
"1959-08",559
"1959-09",463
"1959-10",407
"1959-11",362
"1959-12",405
"1960-01",417
"1960-02",391
"1960-03",419
"1960-04",461
"1960-05",472
"1960-06",535
"1960-07",622
"1960-08",606
"1960-09",508
"1960-10",461
"1960-11",390
"1960-12",432