Week 1: Introduction to Regression
Learning Objectives
Concepts
Without any programming, you should be able to:
- Define machine learning and explain the differences between supervised and unsupervised learning, and between regression and classification.
- Describe what occurs in the six main steps of the machine learning workflow: data preprocessing, train-test split, training, testing performance, evaluating performance, and improving the model’s performance.
- Explain the main components of every machine learning algorithm, including the input, the output, the trainable parameters, and the loss function.
- Describe the following algorithms: linear regression, LASSO regression, logistic regression, and ridge regression. Be able to explain mathematically how each algorithm arrives at its trained parameters and how the loss functions differ.
- Describe how the following techniques can be applied to each of the above algorithms to improve performance: feature engineering, use of a design matrix, and hyperparameter tuning.
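As a reference point for comparing the four algorithms, their loss functions can be written side by side. This is a sketch in one common notation, which is an assumption and not necessarily the notation used in the course: X is the design matrix, y the vector of targets, θ the trainable parameters, α ≥ 0 the regularization strength, and σ(z) = 1 / (1 + e^(-z)) the sigmoid function. Scaling conventions (for example, a factor of 1/n or 1/2 in front of the squared error) vary between textbooks and libraries.

```latex
% Assumed notation: X is the n-by-p design matrix, y the targets,
% \theta the trainable parameters, \alpha \ge 0 the regularization strength,
% and \sigma(z) = 1 / (1 + e^{-z}) the sigmoid function.
\begin{align*}
\text{Linear regression:}\quad
  & L(\theta) = \lVert y - X\theta \rVert_2^2,
  & \hat{\theta} &= (X^\top X)^{-1} X^\top y \\
\text{Ridge regression:}\quad
  & L(\theta) = \lVert y - X\theta \rVert_2^2 + \alpha \lVert \theta \rVert_2^2,
  & \hat{\theta} &= (X^\top X + \alpha I)^{-1} X^\top y \\
\text{LASSO regression:}\quad
  & L(\theta) = \lVert y - X\theta \rVert_2^2 + \alpha \lVert \theta \rVert_1
  & & \text{(no closed form in general)} \\
\text{Logistic regression:}\quad
  & L(\theta) = -\sum_{i=1}^{n} \Bigl[ y_i \log \sigma(x_i^\top \theta)
      + (1 - y_i) \log\bigl(1 - \sigma(x_i^\top \theta)\bigr) \Bigr]
  & & \text{(no closed form)}
\end{align*}
```

The closed-form solution for linear regression assumes that XᵀX is invertible. LASSO and logistic regression are typically trained with iterative optimization, since their losses do not yield a general closed-form minimizer.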
Implementation
Using the Python programming language, you should be able to:
- Use the pandas library to import, analyze, and clean a data file.
- Use the scikit-learn library to perform a train-test split on a given data set.
- Use the scikit-learn library to implement the following regression algorithms: linear regression, LASSO regression, logistic regression, and ridge regression.
- Write code that trains each of the above algorithms using the training set, tests its performance with the test set, evaluates its performance, and improves its performance.
- Implement linear regression and ridge regression from scratch using only the NumPy library.
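To give a sense of what the last objective involves, here is a minimal sketch of linear and ridge regression using only NumPy, together with a manual train-test split. The helper names (`add_intercept`, `fit_linear`, `fit_ridge`, `predict`, `mse`), the synthetic data, and the choice to leave the intercept unpenalized in the ridge fit are all illustrative assumptions, not the course's reference implementation.

```python
import numpy as np


def add_intercept(X):
    """Build a design matrix by prepending a column of ones (intercept term)."""
    return np.hstack([np.ones((X.shape[0], 1)), X])


def fit_linear(X, y):
    """Least-squares fit: find theta minimizing ||y - X @ theta||^2."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta


def fit_ridge(X, y, alpha):
    """Ridge fit: solve (X^T X + alpha * I) theta = X^T y.

    Assumption: column 0 of X is the intercept column, which is left
    unpenalized here (a common convention, but a design choice).
    """
    penalty = alpha * np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)


def predict(X, theta):
    """Predictions are a linear combination of the design-matrix columns."""
    return X @ theta


def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return float(np.mean((y_true - y_pred) ** 2))


# Synthetic data (hypothetical): 200 samples, 3 features, known parameters.
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(200, 3))
true_theta = np.array([2.0, 1.5, -3.0, 0.5])  # intercept first
y = add_intercept(X_raw) @ true_theta + rng.normal(scale=0.1, size=200)

# Manual 80/20 train-test split using a random permutation of row indices.
idx = rng.permutation(200)
train_idx, test_idx = idx[:160], idx[160:]
X_train, X_test = add_intercept(X_raw[train_idx]), add_intercept(X_raw[test_idx])
y_train, y_test = y[train_idx], y[test_idx]

# Train on the training set, then evaluate on the held-out test set.
theta_ols = fit_linear(X_train, y_train)
theta_ridge = fit_ridge(X_train, y_train, alpha=1.0)

print("OLS test MSE:  ", mse(y_test, predict(X_test, theta_ols)))
print("Ridge test MSE:", mse(y_test, predict(X_test, theta_ridge)))
```

Here `alpha` is the hyperparameter one would tune, for example by comparing error on held-out data across several values. With `alpha=0.0` the ridge fit reduces to the ordinary least-squares fit.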