Week 1: Introduction to Regression

Learning Objectives

Concepts

Without any programming, you should be able to:

  • Define machine learning and explain the difference between supervised and unsupervised learning, and between regression and classification.
  • Describe what occurs in each of the six main steps of the machine learning workflow: data preprocessing, train-test split, training, testing performance, evaluating performance, and improving the model’s performance.
  • Explain the main components of every machine learning algorithm, including the input, the output, the trainable parameters, and the loss function.
  • Describe the following algorithms: linear regression, LASSO regression, logistic regression, and ridge regression. Be able to explain mathematically how each algorithm arrives at its trained parameters and how the loss functions differ.
  • Describe how the following techniques can be applied to each of the above algorithms to improve performance: feature engineering, use of a design matrix, and hyperparameter tuning.
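
As a reference point for the objective on loss functions, the four losses can be written side by side as below. This is a sketch in one common notation, which is an assumption and not necessarily the notation used in the course: X is the design matrix with rows x_i, y is the vector of targets, w is the vector of trainable parameters, λ ≥ 0 is the regularization strength (a hyperparameter), and σ is the sigmoid function. Scaling conventions (for example, dividing by the number of samples) vary between textbooks and libraries.

```latex
% Linear regression: sum of squared errors, closed-form minimizer
% (when X^T X is invertible)
L_{\text{linear}}(w) = \lVert y - Xw \rVert_2^2,
\qquad \hat{w} = (X^\top X)^{-1} X^\top y

% Ridge regression: squared error plus an L2 penalty, closed-form minimizer
L_{\text{ridge}}(w) = \lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_2^2,
\qquad \hat{w} = (X^\top X + \lambda I)^{-1} X^\top y

% LASSO regression: squared error plus an L1 penalty,
% no closed form in general (minimized iteratively)
L_{\text{lasso}}(w) = \lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_1

% Logistic regression: cross-entropy loss for labels y_i in {0, 1},
% no closed form (minimized iteratively)
L_{\text{logistic}}(w) = -\sum_{i=1}^{n} \Bigl[ y_i \log \sigma(x_i^\top w)
  + (1 - y_i) \log\bigl(1 - \sigma(x_i^\top w)\bigr) \Bigr],
\qquad \sigma(z) = \frac{1}{1 + e^{-z}}
```

The linear, ridge, and LASSO losses differ only in the penalty term, while logistic regression replaces the squared error with a cross-entropy loss suited to classification.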

Implementation

Using the Python programming language, you should be able to:

  • Use the Pandas library to import, analyze, and clean a data file.
  • Use the scikit-learn library to perform a train-test split on a given data set.
  • Use the scikit-learn library to implement the following regression algorithms: linear regression, LASSO regression, logistic regression, and ridge regression.
  • Write code that will train each of the above algorithms using the training set, test its performance with the test set, evaluate its performance, and improve its performance.
  • Implement linear regression and ridge regression from scratch using only the NumPy library.
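
To give a sense of what the from-scratch objective involves, the following is a minimal sketch using only NumPy. Everything in it is an assumption made for illustration: the synthetic data, the 80/20 split, the helper names (design_matrix, fit_linear, fit_ridge, mse), and the choice to leave the intercept unpenalized in ridge regression, which is a common convention but not the only one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: y = 2 + 3*x1 - 1*x2 + small Gaussian noise
n = 200
X_raw = rng.normal(size=(n, 2))
true_w = np.array([2.0, 3.0, -1.0])  # [intercept, slope 1, slope 2]
y = true_w[0] + X_raw @ true_w[1:] + 0.1 * rng.normal(size=n)


def design_matrix(X):
    """Prepend a column of ones so the first parameter acts as the intercept."""
    return np.column_stack([np.ones(len(X)), X])


# Train-test split: shuffle the row indices, then take 80% for training
idx = rng.permutation(n)
split = int(0.8 * n)
train_idx, test_idx = idx[:split], idx[split:]
X_train = design_matrix(X_raw[train_idx])
X_test = design_matrix(X_raw[test_idx])
y_train, y_test = y[train_idx], y[test_idx]


def fit_linear(X, y):
    """Least-squares fit; lstsq is numerically safer than inverting X^T X."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w


def fit_ridge(X, y, lam):
    """Ridge fit via the regularized normal equations.

    Assumption: the intercept (column 0) is not penalized.
    """
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)


def mse(X, y, w):
    """Mean squared error of the predictions X @ w against y."""
    return float(np.mean((y - X @ w) ** 2))


# Training uses the training set only
w_lin = fit_linear(X_train, y_train)
w_ridge = fit_ridge(X_train, y_train, lam=1.0)

# Testing and evaluation use the held-out test set
print("linear test MSE:", mse(X_test, y_test, w_lin))
print("ridge  test MSE:", mse(X_test, y_test, w_ridge))
```

In this sketch, lam is the hyperparameter one would tune to improve performance. Setting lam=0 reduces the ridge fit to ordinary linear regression, and larger values shrink the slope parameters toward zero.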