Week 13: Attention and Transformer Networks

Learning Objectives

Note that the learning objectives related to natural language processing from Week 12 will also apply here.

Concepts

Without any programming, you should be able to:

  • Describe the architecture of an attention network. Compare an attention network to neural networks (NNs), convolutional neural networks (CNNs), and recurrent neural networks (RNNs).
  • Describe the architecture of a transformer network. Compare a transformer network to neural networks (NNs), convolutional neural networks (CNNs), and recurrent neural networks (RNNs).
  • Explain the processes of encoding and decoding as they relate to attention layers.
  • Explain why both attention layers and transformer blocks are needed to create large language models and what purpose each of them serves. In what order do the layers need to be arranged?
  • Compare language processing with attention and transformer networks to language processing with recurrent neural networks.
  • Explain (though not necessarily compute by hand) the mathematics happening inside a transformer's attention layers (see the formula below).
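
The mathematics referenced in the last objective centers on scaled dot-product attention (Vaswani et al., 2017). As a reference point, its core computation is:

    \[
    \mathrm{Attention}(Q, K, V) = \operatorname{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
    \]

where Q, K, and V are the query, key, and value matrices and d_k is the key dimension; the softmax produces the attention weights used to mix the values.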

Implementation

Using the Python programming language, you should be able to:

  • Use the libraries TensorFlow and Keras to implement attention and transformer layers (see the sketch after this list).
  • Use the attention and transformer layers to perform simple natural language processing tasks.
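
Below is a minimal sketch, assuming TensorFlow 2.x, of how the built-in Keras layers compose into a single transformer encoder block; the layer sizes are illustrative choices, not values prescribed by the course.

    import tensorflow as tf
    from tensorflow.keras import layers

    # Illustrative hyperparameters (not prescribed by the course).
    embed_dim, num_heads, ff_dim = 64, 2, 128

    # Input: a batch of embedded token sequences, shape (batch, seq_len, embed_dim).
    inputs = tf.keras.Input(shape=(None, embed_dim))

    # Self-attention: queries, keys, and values all come from the same sequence.
    attn_out = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)(inputs, inputs)
    x = layers.LayerNormalization()(layers.Add()([inputs, attn_out]))  # residual + norm

    # Position-wise feed-forward network applied to every token.
    ff = layers.Dense(ff_dim, activation="relu")(x)
    ff = layers.Dense(embed_dim)(ff)
    outputs = layers.LayerNormalization()(layers.Add()([x, ff]))  # residual + norm

    encoder_block = tf.keras.Model(inputs, outputs)
    encoder_block.summary()

Stacking several such blocks, together with token and positional embeddings, yields the encoder side of the transformer architecture compared against RNNs in the concept objectives above.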