Recurrent Neural Networks

CSC/DSC 340 Week 12 Slides

Author: Dr. Julie Butler

Date Created: November 4, 2023

Last Modified: November 5, 2023

Previously

  • We have been using regular neural networks to perform interpolation

(Figure: feedforward neural network diagram)
import tensorflow as tf
import numpy as np
from sklearn.metrics import mean_squared_error as mse
import matplotlib.pyplot as plt
from keras.preprocessing.sequence import TimeseriesGenerator


# Generate training data
x_train = np.random.uniform(-3*np.pi, 3*np.pi, 1000)  # Generate 1000 random inputs between -3*pi and 3*pi
y_train = np.sin(x_train)  # Get corresponding function values

# Create the neural network model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(1,)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1)
])

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

# Train the model
model.fit(x_train, y_train, epochs=500, verbose=0)

# Test the model
x_test = np.random.uniform(-3*np.pi, 3*np.pi, 100)
y_test = np.sin(x_test)
y_pred = model.predict(x_test)

print(mse(y_test, y_pred))


plt.scatter(x_train, y_train)
plt.scatter(x_test, y_pred)
4/4 [==============================] - 0s 1ms/step
0.0004397257246680628

  • While regular neural networks perform well at interpolation, they tend to perform poorly when asked to extrapolate (this is true of most machine learning algorithms)
X_train = np.arange(0,100,0.5) 
y_train = np.sin(X_train)

X_test = np.arange(90,200,0.5) 
y_test = np.sin(X_test)

# Create the neural network model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(1,)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1)
])

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

# Train the model
model.fit(X_train, y_train, epochs=500, verbose=0)

y_pred = model.predict(X_test)

print(mse(y_test, y_pred))

plt.plot(X_train, y_train)
plt.plot(X_test, y_test)
plt.plot(X_test, y_pred, color='green')
7/7 [==============================] - 0s 709us/step
0.9014237629730215

  • Many extrapolation cases with machine learning use recurrent neural networks
  • While traditional neural networks are feedforward (data passes only from the input layer to the output layer), recurrent neural networks have a memory: each layer's previous output is fed back in as an extra input at the next step (see the update rule below the figure)

(Figure: recurrent neural network diagram)
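
  • Concretely, a simple recurrent layer reuses its previous output as an extra input at each step; for Keras's SimpleRNN the update is roughly

\(h_t = g(W_x x_t + W_h h_{t-1} + b)\)

where \(g\) is the chosen activation (tanh by default; relu in the examples below) and \(h_{t-1}\) is the hidden state, the network's "memory" of earlier inputs.
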
  • Note that the input data for an RNN, for both training and testing, needs to be three-dimensional, with shape (samples, timesteps, features) (see the shape check after this list).

  • SimpleRNN takes the same basic arguments as Dense: the first argument is the number of neurons, and we can also set the activation function.

  • Note that you still need at least one Dense layer at the end of the network to “post-process” the results
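
  • As a quick sanity check (a minimal sketch, not part of the lecture code), the reshape used below turns a 1-D array into the (samples, timesteps, features) shape that Keras RNN layers expect:

import numpy as np

X = np.arange(0, 100, 0.5)        # 200 evenly spaced input values
print(X.shape)                    # (200,)
print(X.reshape(-1, 1, 1).shape)  # (200, 1, 1): 200 samples, 1 timestep, 1 feature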

X_train = np.arange(0,100,0.5) 
y_train = np.sin(X_train)

X_test = np.arange(90,200,0.5) 
y_test = np.sin(X_test)

# Preprocess the data for the RNN
X_train = X_train.reshape(-1, 1, 1)  # Reshape the input data for RNN

# Create the RNN model
model = tf.keras.Sequential([
    tf.keras.layers.SimpleRNN(32, activation='relu', input_shape=(1, 1)),
    tf.keras.layers.Dense(1)
])

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

# Train the model
model.fit(X_train, y_train, epochs=100, verbose=0)

X_test = X_test.reshape(-1, 1, 1)
y_pred = model.predict(X_test)

print(mse(y_test.flatten(), y_pred))


plt.plot(X_train.flatten(), y_train)
plt.plot(X_test.flatten(), y_test)
plt.plot(X_test.flatten(), y_pred, color='green')
7/7 [==============================] - 0s 835us/step
0.5763407301417874

  • This network did not perform well at extrapolation, so let’s try some hyperparameter tuning
    • Try adding more SimpleRNN layers, which will result in more recurrent neurons (more memory)
X_train = np.arange(0,100,0.5) 
y_train = np.sin(X_train)

X_test = np.arange(90,200,0.5) 
y_test = np.sin(X_test)

# Preprocess the data for the RNN
X_train = X_train.reshape(-1, 1, 1)  # Reshape the input data for RNN

# Create the RNN model
model = tf.keras.Sequential([
    tf.keras.layers.SimpleRNN(32, activation='relu', return_sequences=True, input_shape=(1, 1)),
    tf.keras.layers.SimpleRNN(32, activation='relu', return_sequences=True),
    tf.keras.layers.Dense(1)
])

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

# Train the model
model.fit(X_train, y_train, epochs=100, verbose=0)

X_test = X_test.reshape(-1, 1, 1)
y_pred = model.predict(X_test)

print(mse(y_test, y_pred.flatten()))


plt.plot(X_train.flatten(), y_train)
plt.plot(X_test.flatten(), y_test)
plt.plot(X_test.flatten(), y_pred.flatten(), color='green')
7/7 [==============================] - 0s 1ms/step
0.5098268571541952

  • Some applications of RNNs have shown that, in addition to using Dense layers to post-process the output of RNN layers, Dense layers can also be used to pre-process the data before it reaches the RNN layers
X_train = np.arange(0,100,0.5) 
y_train = np.sin(X_train)

X_test = np.arange(90,200,0.5) 
y_test = np.sin(X_test)

# Preprocess the data for the RNN
X_train = X_train.reshape(-1, 1, 1)  # Reshape the input data for RNN

# Create the RNN model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation='relu', input_shape=(1, 1)),
    tf.keras.layers.SimpleRNN(32, activation='relu', return_sequences=True),
    tf.keras.layers.SimpleRNN(32, activation='relu', return_sequences=True),
    tf.keras.layers.Dense(1)
])

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

# Train the model
model.fit(X_train, y_train, epochs=100, verbose=0)

X_test = X_test.reshape(-1, 1, 1)
y_pred = model.predict(X_test)

print(mse(y_test, y_pred.flatten()))


plt.plot(X_train.flatten(), y_train)
plt.plot(X_test.flatten(), y_test)
plt.plot(X_test.flatten(), y_pred.flatten(), color='green')
7/7 [==============================] - 0s 988us/step
0.6448849821374143

  • SimpleRNN layers are not working well on this task, so let's try another type of RNN layer
  • LSTM (long short-term memory) layers are improved RNN layers
  • An LSTM maintains long-term dependencies in the data by replacing the simple recurrent neuron with a memory cell that can store information over many time steps
  • The memory cell is controlled by a set of learned gates
  • LSTMs are well suited for applications with ordered data, such as natural language processing
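
  • For reference, one common way of writing the LSTM cell update (Keras's LSTM layer follows this general form) is:

\(f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)\) (forget gate)
\(i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)\) (input gate)
\(\tilde{c}_t = \tanh(W_c [h_{t-1}, x_t] + b_c)\) (candidate memory)
\(c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t\) (updated cell state)
\(o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)\) (output gate)
\(h_t = o_t \odot \tanh(c_t)\) (new hidden state)
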
X_train = np.arange(0,100,0.5) 
y_train = np.sin(X_train)

X_test = np.arange(90,200,0.5) 
y_test = np.sin(X_test)

# Preprocess the data for the RNN
X_train = X_train.reshape(-1, 1, 1)  # Reshape the input data for RNN

# Create the RNN model
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, activation='relu', return_sequences=True, input_shape=(1, 1)),
    tf.keras.layers.LSTM(32, activation='relu', return_sequences=True),
    tf.keras.layers.Dense(1)
])

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

# Train the model
model.fit(X_train, y_train, epochs=100, verbose=0)

X_test = X_test.reshape(-1, 1, 1)
y_pred = model.predict(X_test)

print(mse(y_test, y_pred.flatten()))


plt.plot(X_train.flatten(), y_train)
plt.plot(X_test.flatten(), y_test)
plt.plot(X_test.flatten(), y_pred.flatten(), color='green')
7/7 [==============================] - 0s 770us/step
0.5191158924236982

  • Another improved RNN layer is the GRU (gated recurrent unit)
  • GRUs perform similarly to LSTM layers and are used in many of the same applications, but are less prone to the vanishing gradient problem
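
  • For comparison, the standard GRU update uses only two gates, an update gate \(z_t\) and a reset gate \(r_t\) (exact conventions vary slightly between references):

\(z_t = \sigma(W_z [h_{t-1}, x_t] + b_z)\)
\(r_t = \sigma(W_r [h_{t-1}, x_t] + b_r)\)
\(\tilde{h}_t = \tanh(W_h [r_t \odot h_{t-1}, x_t] + b_h)\)
\(h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t\)
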
X_train = np.arange(0,100,0.5) 
y_train = np.sin(X_train)

X_test = np.arange(90,200,0.5) 
y_test = np.sin(X_test)

# Preprocess the data for the RNN
X_train = X_train.reshape(-1, 1, 1)  # Reshape the input data for RNN

# Create the RNN model
model = tf.keras.Sequential([
    tf.keras.layers.GRU(32, activation='relu', return_sequences=True, input_shape=(1, 1)),
    tf.keras.layers.GRU(32, activation='relu', return_sequences=True),
    tf.keras.layers.Dense(1)
])

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

# Train the model
model.fit(X_train, y_train, epochs=100, verbose=0)

X_test = X_test.reshape(-1, 1, 1)
y_pred = model.predict(X_test)

print(mse(y_test, y_pred.flatten()))


plt.plot(X_train.flatten(), y_train)
plt.plot(X_test.flatten(), y_test)
plt.plot(X_test.flatten(), y_pred.flatten(), color='green')
7/7 [==============================] - 0s 766us/step
0.5433677944169741

  • All of these RNN networks are performing poorly on what should be an easy-to-recognize, long-term pattern
  • Perhaps the problem is not the RNNs themselves, but how we are trying to train them.

Traditional Training: \(f(x_i) = y_i\)

Time Series Training: \(f(y_{i-3}, y_{i-2}, y_{i-1}) = y_i\)

  • Limitation: the data must be evenly spaced in some underlying input variable (that variable is now hidden from the model)

  • Let’s try reformatting our data as a time series and training a very simple RNN
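
  • As a minimal sketch of what this reformatting does (make_windows is a hypothetical helper shown only for illustration; the TimeseriesGenerator call below automates the same windowing), a sliding window turns one sequence into many (previous values, next value) training pairs:

import numpy as np

def make_windows(series, seq_len):
    # Each row of X holds seq_len consecutive values; y is the value that follows them
    X, y = [], []
    for i in range(len(series) - seq_len):
        X.append(series[i:i + seq_len])
        y.append(series[i + seq_len])
    return np.array(X), np.array(y)

series = np.sin(np.arange(0, 10, 0.5))  # 20 samples of sin(x)
X, y = make_windows(series, seq_len=3)
print(X.shape, y.shape)                 # (17, 3) (17,)
print(X[0], "->", y[0])                 # three past values predict the next one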

n_features = 1

train_series = y_train.reshape((len(y_train), n_features))
test_series  = y_test.reshape((len(y_test), n_features))

seq  = 20

train_generator = TimeseriesGenerator(train_series, train_series,
                                      length        = seq, 
                                      sampling_rate = 1,
                                      stride        = 1,
                                      batch_size    = 10)

test_generator = TimeseriesGenerator(test_series, test_series,
                                      length        = seq, 
                                      sampling_rate = 1,
                                      stride        = 1,
                                      batch_size    = 10)
n_neurons  = 4
model = tf.keras.Sequential()
model.add(tf.keras.layers.LSTM(n_neurons, input_shape=(seq, n_features)))
model.add(tf.keras.layers.Dense(1))
model.compile(optimizer='adam', loss='mse')

model.fit(train_generator, epochs=300, verbose=0)

y_pred  = model.predict(test_generator)

print(mse(y_test[20:], y_pred.flatten()))

plt.plot(X_train.flatten(), y_train)
plt.plot(X_test.flatten(), y_test)
plt.plot(X_test[20:].flatten(), y_pred)
20/20 [==============================] - 0s 1ms/step
8.970706534730378e-06

  • Even an exceedingly small and simple RNN was able to capture the pattern in the data once the data was formatted as a time series.
  • This is how RNNs are trained on sequential data of all kinds, even text/natural language processing (more on this next week!)