Recurrent Neural Networks

CSC/DSC 340 Week 12 Slides

Author: Dr. Julie Butler

Date Created: November 4, 2023

Last Modified: November 5, 2023

Previously

  • We have been using regular neural networks to perform interpolation

[Figure: diagram of a standard (feedforward) neural network]

import tensorflow as tf
import numpy as np
from sklearn.metrics import mean_squared_error as mse
import matplotlib.pyplot as plt
from keras.preprocessing.sequence import TimeseriesGenerator


# Generate training data
x_train = np.random.uniform(-3*np.pi, 3*np.pi, 1000)  # Generate 1000 random numbers between -3*pi and 3*pi
y_train = np.sin(x_train)  # Get corresponding function values

# Create the neural network model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(1,)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1)
])

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

# Train the model
model.fit(x_train, y_train, epochs=500, verbose=0)

# Test the model
x_test = np.random.uniform(-3*np.pi, 3*np.pi, 100)
y_test = np.sin(x_test)
y_pred = model.predict(x_test)

print(mse(y_test, y_pred))


plt.scatter(x_train, y_train)
plt.scatter(x_test, y_pred)
0.0007015685019484169

[Figure: scatter plot of the training data and the network's predictions on the test points]
  • While regular neural networks perform well at interpolation, they tend to perform poorly when asked to extrapolate (this is true of most machine learning algorithms)

X_train = np.arange(0,100,0.5) 
y_train = np.sin(X_train)

X_test = np.arange(90,200,0.5) 
y_test = np.sin(X_test)

# Create the neural network model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(1,)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1)
])

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

# Train the model
model.fit(X_train, y_train, epochs=500, verbose=0)

y_pred = model.predict(X_test)

print(mse(y_test, y_pred))

plt.plot(X_train, y_train)
plt.plot(X_test, y_test)
plt.plot(X_test, y_pred, color='green')
1.154618569497259

[Figure: training data, true test values, and the network's extrapolated predictions (green)]
  • Many extrapolation cases with machine learning use recurrent neural networks

  • While traditional neural networks are feedforward (data passes only from the input layer toward the output layer), recurrent neural networks have a memory: part of each layer's output is fed back into the layer at the next step (a minimal sketch of this recurrence appears after the figure below)

[Figure: diagram of a recurrent neural network with a feedback loop]
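  • To make the idea of a recurrent "memory" concrete, below is a minimal NumPy sketch (an illustration only, not Keras's actual implementation) of a single recurrent cell: at each step the new hidden state depends on both the current input and the previous hidden state

# Minimal sketch of one recurrent cell (illustrative only)
rng = np.random.default_rng(0)
n_inputs, n_neurons = 1, 3

W = rng.normal(size=(n_inputs, n_neurons))   # input-to-hidden weights
U = rng.normal(size=(n_neurons, n_neurons))  # hidden-to-hidden (recurrent) weights
b = np.zeros(n_neurons)                      # biases

h = np.zeros(n_neurons)                      # hidden state (the "memory"), starts at zero
sequence = np.sin(np.arange(0, 5, 0.5)).reshape(-1, 1)  # a short input sequence

for x_t in sequence:
    # the new hidden state mixes the current input with the previous hidden state
    h = np.tanh(x_t @ W + h @ U + b)

print(h)  # final hidden state after the whole sequence has been read

  • A SimpleRNN layer in Keras applies essentially this update at every timestep, with the weights W, U, and b learned during training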

  • Note that the input data for an RNN, for both training and testing, needs to be three dimensional: (samples, timesteps, features) (see the reshape sketch after this list)

  • SimpleRNN takes the same basic arguments as Dense: the first argument is the number of neurons, and we can set the activation function

  • Note that you still need at least one Dense layer at the end of the network to “post-process” the results
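  • For example, the 1D arrays above can be reshaped into this (samples, timesteps, features) form; a quick sketch of the reshape used in the cells below:

X = np.arange(0, 100, 0.5)    # shape (200,): one value per sample
X_rnn = X.reshape(-1, 1, 1)   # shape (200, 1, 1): (samples, timesteps, features)
print(X.shape, X_rnn.shape)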

X_train = np.arange(0,100,0.5) 
y_train = np.sin(X_train)

X_test = np.arange(90,200,0.5) 
y_test = np.sin(X_test)

# Preprocess the data for the RNN
X_train = X_train.reshape(-1, 1, 1)  # Reshape the input data for RNN

# Create the RNN model
model = tf.keras.Sequential([
    tf.keras.layers.SimpleRNN(32, activation='relu', input_shape=(1, 1)),
    tf.keras.layers.Dense(1)
])

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

# Train the model
model.fit(X_train, y_train, epochs=100, verbose=0)

X_test = X_test.reshape(-1, 1, 1)
y_pred = model.predict(X_test)

print(mse(y_test.flatten(), y_pred))


plt.plot(X_train.flatten(), y_train)
plt.plot(X_test.flatten(), y_test)
plt.plot(X_test.flatten(), y_pred, color='green')
0.5288347596842933

[Figure: training data, true test values, and the RNN's extrapolated predictions (green)]
  • This network did not perform well at extrapolation, so let’s try some hyperparameter tuning

    • Try adding more SimpleRNN layers, which will result in more recurrent neurons (more memory)

X_train = np.arange(0,100,0.5) 
y_train = np.sin(X_train)

X_test = np.arange(90,200,0.5) 
y_test = np.sin(X_test)

# Preprocess the data for the RNN
X_train = X_train.reshape(-1, 1, 1)  # Reshape the input data for RNN

# Create the RNN model
model = tf.keras.Sequential([
    # return_sequences=True makes a recurrent layer output its full sequence,
    # which is needed when it feeds into another recurrent layer
    tf.keras.layers.SimpleRNN(32, activation='relu', return_sequences=True, input_shape=(1, 1)),
    tf.keras.layers.SimpleRNN(32, activation='relu', return_sequences=True),
    tf.keras.layers.Dense(1)
])

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

# Train the model
model.fit(X_train, y_train, epochs=100, verbose=0)

X_test = X_test.reshape(-1, 1, 1)
y_pred = model.predict(X_test)

print(mse(y_test, y_pred.flatten()))


plt.plot(X_train.flatten(), y_train)
plt.plot(X_test.flatten(), y_test)
plt.plot(X_test.flatten(), y_pred.flatten(), color='green')
0.5173149708723822

[Figure: training data, true test values, and the RNN's extrapolated predictions (green)]
  • Some applications of RNNs have shown that, in addition to using Dense layers to post-process the results of RNN layers, Dense layers can also be used to pre-process the data before it reaches the RNN layers

X_train = np.arange(0,100,0.5) 
y_train = np.sin(X_train)

X_test = np.arange(90,200,0.5) 
y_test = np.sin(X_test)

# Preprocess the data for the RNN
X_train = X_train.reshape(-1, 1, 1)  # Reshape the input data for RNN

# Create the RNN model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation='relu', input_shape=(1, 1)),
    tf.keras.layers.SimpleRNN(32, activation='relu', return_sequences=True),
    tf.keras.layers.SimpleRNN(32, activation='relu',return_sequences=True),
    tf.keras.layers.Dense(1)
])

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

# Train the model
model.fit(X_train, y_train, epochs=100, verbose=0)

X_test = X_test.reshape(-1, 1, 1)
y_pred = model.predict(X_test)

print(mse(y_test, y_pred.flatten()))


plt.plot(X_train.flatten(), y_train)
plt.plot(X_test.flatten(), y_test)
plt.plot(X_test.flatten(), y_pred.flatten(), color='green')
0.5700624385641841

[Figure: training data, true test values, and the RNN's extrapolated predictions (green)]
  • SimpleRNN layers are not working well at this task, so let's try another type of RNN layer

  • LSTM (long short-term memory) layers are improved RNN layers

  • An LSTM maintains long-term dependencies in the data by replacing the simple recurrent neuron with a memory cell that can store information over time

  • The memory is maintained through a set of gates that control what is stored, kept, and read out (the update equations are sketched after this list)

  • LSTMs are well suited for applications with ordered data, such as natural language processing
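  • As a rough sketch of the internals (one common formulation of the LSTM update; Keras's LSTM layer has this same structure), the gates decide what is forgotten from, written to, and read out of the memory cell \(c_t\):

Forget gate: \(f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)\)

Input gate: \(i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)\)

Output gate: \(o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)\)

Memory cell: \(c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c)\)

Hidden state (output): \(h_t = o_t \odot \tanh(c_t)\)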

X_train = np.arange(0,100,0.5) 
y_train = np.sin(X_train)

X_test = np.arange(90,200,0.5) 
y_test = np.sin(X_test)

# Preprocess the data for the RNN
X_train = X_train.reshape(-1, 1, 1)  # Reshape the input data for RNN

# Create the RNN model
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, activation='relu', return_sequences=True, input_shape=(1, 1)),
    tf.keras.layers.LSTM(32, activation='relu',return_sequences=True),
    tf.keras.layers.Dense(1)
])

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

# Train the model
model.fit(X_train, y_train, epochs=100, verbose=0)

X_test = X_test.reshape(-1, 1, 1)
y_pred = model.predict(X_test)

print(mse(y_test, y_pred.flatten()))


plt.plot(X_train.flatten(), y_train)
plt.plot(X_test.flatten(), y_test)
plt.plot(X_test.flatten(), y_pred.flatten(), color='green')
0.5052410511806004

[Figure: training data, true test values, and the LSTM network's extrapolated predictions (green)]
  • Another improved RNN layer is called a GRU (gated recurrent unit)

  • GRUs perform similarly to LSTM layers and are used in many of the same applications; like LSTMs, they are far less prone to the vanishing gradient problem than SimpleRNN layers, while using fewer gates and parameters (one common formulation is sketched below)
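  • For comparison, one common formulation of the GRU update (conventions for the update gate \(z_t\) vary slightly between references) uses only two gates and no separate memory cell:

Update gate: \(z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)\)

Reset gate: \(r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)\)

Candidate state: \(\tilde{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h)\)

Hidden state: \(h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t\)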

X_train = np.arange(0,100,0.5) 
y_train = np.sin(X_train)

X_test = np.arange(90,200,0.5) 
y_test = np.sin(X_test)

# Preprocess the data for the RNN
X_train = X_train.reshape(-1, 1, 1)  # Reshape the input data for RNN

# Create the RNN model
model = tf.keras.Sequential([
    tf.keras.layers.GRU(32, activation='relu', return_sequences=True, input_shape=(1, 1)),
    tf.keras.layers.GRU(32, activation='relu',return_sequences=True),
    tf.keras.layers.Dense(1)
])

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

# Train the model
model.fit(X_train, y_train, epochs=100, verbose=0)

X_test = X_test.reshape(-1, 1, 1)
y_pred = model.predict(X_test)

print(mse(y_test, y_pred.flatten()))


plt.plot(X_train.flatten(), y_train)
plt.plot(X_test.flatten(), y_test)
plt.plot(X_test.flatten(), y_pred.flatten(), color='green')
0.5355605067324839

[Figure: training data, true test values, and the GRU network's extrapolated predictions (green)]
  • All of these RNN networks are performing poorly on what should be an easy-to-recognize long-term pattern

  • Perhaps the problem is not the RNNs themselves but how we are trying to train them.

Traditional Training: \(f(x_i) = y_i\)

Time Series Training: \(f(y_{i-3}, y_{i-2}, y_{i-1}) = y_i\)
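  • Concretely, time-series training turns one long sequence into many (window, next value) pairs; here is a minimal sketch (using a window of three values, matching the formula above) of what that reformatting produces:

y = np.sin(np.arange(0, 5, 0.5))   # a short example sequence (10 values)
window = 3

# build the pairs f(y[i-3], y[i-2], y[i-1]) = y[i]
X_windows = np.array([y[i - window:i] for i in range(window, len(y))])
y_targets = y[window:]

print(X_windows.shape, y_targets.shape)   # (7, 3) (7,)
print(X_windows[0], "->", y_targets[0])

  • The TimeseriesGenerator used below builds this kind of windowed dataset for us (with a window length of seq = 20) and batches it for training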

  • Limitation: the data must be evenly spaced in some input variable (this variable is now hidden; it never enters the model)

  • Let’s try reformatting our data as a time series and training a very simple RNN

n_features = 1

train_series = y_train.reshape((len(y_train), n_features))
test_series  = y_test.reshape((len(y_test), n_features))

seq  = 20

train_generator = TimeseriesGenerator(train_series, train_series,
                                      length        = seq, 
                                      sampling_rate = 1,
                                      stride        = 1,
                                      batch_size    = 10)

test_generator = TimeseriesGenerator(test_series, test_series,
                                      length        = seq, 
                                      sampling_rate = 1,
                                      stride        = 1,
                                      batch_size    = 10)
n_neurons  = 4
model = tf.keras.Sequential()
model.add(tf.keras.layers.LSTM(n_neurons, input_shape=(seq, n_features)))
model.add(tf.keras.layers.Dense(1))
model.compile(optimizer='adam', loss='mse')

model.fit(train_generator,epochs=300, verbose=0)

y_pred  = model.predict(test_generator)

print(mse(y_test[20:], y_pred.flatten()))

plt.plot(X_train.flatten(),y_train)
plt.plot(X_test.flatten(),y_test)
plt.plot(X_test[20:].flatten(),y_pred)
3.928643762298923e-06

[Figure: training data, true test values, and the time-series LSTM's predictions, which closely track the true values]
  • Even with an exceedingly small and simple RNN, the network was able to capture the patterns in the data once we used the time-series data formatting.

  • This is how all RNNs are trained on sequential data, including for text/natural language processing (more on this next week!)