12 Recurrent Neural Networks
Keywords: Neural Networks, Recurrent Neural Networks, LSTM, Deep Learning
This lecture discusses the basics of recurrent neural networks (RNNs), including their architecture, training process, and applications. It also covers the concept of long short-term memory (LSTM) networks and their role in handling sequential data.
Recurrent Neural Networks (RNNs)
- Applications:
- Language modeling
- Sequence tagging
- Text classification
- RNNs are a family of neural networks for processing sequential data of arbitrary length.
- The output of a layer can connect back to the layer itself or to an earlier layer.
- The same weights are shared across all time steps.
- A recurrence function is applied at each step: \[h_t=f_W(h_{t-1},x_t),\] where
- \(h_t\) is the new state
- \(f_W\) is a neural network with parameter \(W\)
- \(h_{t-1}\) is the old state
- \(x_t\) is the input feature vector at time step \(t\)
- Vanilla RNN: the hidden state at one time step is fed back as an input at the next time step, e.g. \(h_t=\tanh(W_{hh}h_{t-1}+W_{xh}x_t)\); a minimal implementation is sketched below.
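A minimal NumPy sketch of this recurrence, unrolled over a sequence. The dimensions, the \(\tanh\) nonlinearity, and all variable names are illustrative assumptions; the key point is that the same weights `W_xh` and `W_hh` are reused at every time step.

```python
import numpy as np

# Hypothetical dimensions: 4-dim inputs, 8-dim hidden state.
D_in, D_h = 4, 8
rng = np.random.default_rng(0)

# The same weights are shared across all time steps.
W_xh = rng.normal(scale=0.1, size=(D_h, D_in))  # input -> hidden
W_hh = rng.normal(scale=0.1, size=(D_h, D_h))   # hidden -> hidden
b_h = np.zeros(D_h)

def step(h_prev, x_t):
    """One application of the recurrence h_t = f_W(h_{t-1}, x_t)."""
    return np.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)

# Unroll over a sequence of arbitrary length (here 10 steps).
xs = rng.normal(size=(10, D_in))
h = np.zeros(D_h)  # initial state h_0
for x_t in xs:
    h = step(h, x_t)
print(h.shape)  # (8,)
```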
- The problem of long-term dependencies:
- The appeal of RNNs is that they can connect previous information to the current task.
- The gap between the relevant information and the point where it is needed can be large.
- Long-range dependencies are difficult to learn because gradients vanish or explode as they are propagated back through many time steps (a numerical illustration follows this list).
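A small numerical illustration of why this happens, under the simplifying assumption of a linear backward pass (the \(\tanh\) derivative in a real RNN only shrinks gradients further): backpropagation through time multiplies the gradient by the recurrent Jacobian once per step, so its norm decays or grows geometrically with sequence length.

```python
import numpy as np

rng = np.random.default_rng(1)
D_h, T = 8, 50  # hidden size and number of time steps (illustrative)

def gradient_norm_after_bptt(scale):
    """Norm of a gradient after backpropagating through T steps,
    approximated by repeated multiplication with the recurrent Jacobian."""
    W_hh = rng.normal(scale=scale, size=(D_h, D_h))
    g = np.ones(D_h)
    for _ in range(T):
        g = W_hh.T @ g  # one step of backpropagation through time
    return np.linalg.norm(g)

print(gradient_norm_after_bptt(0.05))  # tiny: gradient vanishes
print(gradient_norm_after_bptt(0.5))   # huge: gradient explodes
```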
- Addressing the problem of long-term dependencies:
- To mitigate the problem, we introduce LSTM and GRU networks, which use gating mechanisms (an LSTM cell is sketched after this list).
- There are also more advanced architectures, such as attention mechanisms and Transformer networks.
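For reference, a minimal sketch of a standard LSTM cell (the gate equations follow the common formulation; dimensions and names are illustrative assumptions, not taken from the lecture). The additive cell-state update \(c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t\) is what lets gradients flow across long gaps.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

D_in, D_h = 4, 8  # hypothetical input and hidden sizes
rng = np.random.default_rng(2)

def make_gate():
    """Parameters for one gate: input weights, recurrent weights, bias."""
    return (rng.normal(scale=0.1, size=(D_h, D_in)),
            rng.normal(scale=0.1, size=(D_h, D_h)),
            np.zeros(D_h))

W_f, U_f, b_f = make_gate()  # forget gate
W_i, U_i, b_i = make_gate()  # input gate
W_o, U_o, b_o = make_gate()  # output gate
W_c, U_c, b_c = make_gate()  # candidate cell state

def lstm_step(h_prev, c_prev, x_t):
    """One LSTM step: gates decide what to forget, write, and expose."""
    f = sigmoid(W_f @ x_t + U_f @ h_prev + b_f)   # what to keep from c_{t-1}
    i = sigmoid(W_i @ x_t + U_i @ h_prev + b_i)   # what to write
    o = sigmoid(W_o @ x_t + U_o @ h_prev + b_o)   # what to expose as h_t
    c_tilde = np.tanh(W_c @ x_t + U_c @ h_prev + b_c)
    c = f * c_prev + i * c_tilde                  # additive cell update
    h = o * np.tanh(c)
    return h, c

h = c = np.zeros(D_h)
for x_t in rng.normal(size=(10, D_in)):
    h, c = lstm_step(h, c, x_t)
```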