DSC 340 Week 1: Introduction to Machine Learning

Dr. Julie Butler

DATE: TBD

Plans for the Week

Monday

  • Introduction to the Course
  • Begin “What is Machine Learning?”

Wednesday

  • Finish “What is Machine Learning?”
  • Begin In-Class Project: Mathematical Crash Course with Python
  • Suggested Reading: Hands-On Machine Learning Chapter 1

Friday

  • In-Class Project: Mathematical Crash Course with Python

Course Information

Instructor Information

  • Dr. Julie Butler
  • Email: TBD
  • Office: Bracy 107
  • Office Hours: Monday 12:30pm-3pm, Tuesday 3:30pm-6pm, and by appointment

General Weekly Schedule

Monday

  • Introduction to new material with lecture and some guided examples
  • Office Hours: 12:30pm - 3pm
  • Pre-class homework due before class
  • In-class project from previous week due before class

Tuesday

  • Office Hours 3:30pm - 6pm

Wednesday

  • Finish lecture material
  • Begin weekly in-class project individually or in groups of up to three people
  • Post-class homework from previous week due before class

Thursday

  • Pre-class homework for next week released

Friday

  • Finish weekly in-class project
  • Post-class homework released

Topics Covered (By Week)

  1. What is machine learning?; Mathematics Crash Course
  2. Introduction to Machine Learning with Linear Regression
  3. Ridge and LASSO Regression for Regression
  4. Ridge and LASSO Regression for Classification
  5. Support Vector Machines (SVMs) for Classification and Regression
  6. Unsupervised Learning: Clustering and Dimensionality Reduction
  7. Introduction to Neural Networks with Keras
  8. Introduction to Neural Networks with TensorFlow
  9. Introduction to Neural Networks with JAX
  10. Convolutional Neural Networks for Image Classification Part 1
  11. Convolutional Neural Networks for Image Classification Part 2
  12. Recurrent Neural Networks for Time Series Analysis
  13. Closing Remarks and Work on Final Projects
  14. Final Project Presentations

Grading Policy

  • Pre-class Homework: 10%
  • In-Class Projects: 20%
  • Post-Class Homework: 30%
  • Final Project: 40%

The lowest two scores in each category (pre-class homework, in-class projects, and post-class homework) will be dropped, for 6 dropped scores in total.

Assignments that are not turned in will receive a zero.

Pre-Class Homework

  • Short assignments to introduce concepts before the lecture
  • Will include some reading assignments, some conceptual questions, and some programming questions
  • Should take around 1.5 hours to complete
  • Pre-class homeworks are released on Thursday the week before they are due. They are due on Mondays at the start of class.

In-Class Projects

  • In-class projects will be worked on during class on Wednesdays (if lecture is complete) and Fridays.
  • They can be completed after class on Friday if needed. They are due before class on Monday the week after they are assigned.
  • Should take less than 2 hours to complete and will allow you to explore the concepts learned in lecture with actual machine learning examples.

Post-Class Homework

  • Post-class homework assignments are released at the end of class on Friday and are due at the start of class on the following Wednesday.
  • Longer and more in-depth machine learning problems that should take around 3 hours to complete

Final Project

The final project will consist of two components: a 10-minute presentation given during the last week of the course and a report, written in the style of a scientific paper, due on the last day of the course.

Small pieces of the final project are due throughout the semester, and a complete list of expectations can be found at the website below.

Any topic and machine learning algorithm are valid for this project, but the proposed project must have sufficient complexity and be unique (different from other works found online).

You can complete this project individually or in groups of up to three people. If you are working in a group, it is expected that each member contributes equally.

More details: juliebutler.org/DSC340/finalproject

Late Work Policy

Everyone has 10 late days that they can apply to turning in any pre-class homework, in-class project, or post-class homework. You can apply as many days as you want, but you only get 10 late days in total for the entire semester. If you plan to use late days, you must email me and say how many you are using.

For example, you can apply 2 late days to pre-class homework #3 and turn it in on Wednesday instead of Monday. You then have 8 late days remaining to use on other assignments.

Other extensions will not be given on assignments and late work will not be accepted EXCEPT in very special circumstances (talk to me).

Note that late days cannot be used for the final report or the final presentation.

Group Work Policy

Working on assignments in groups is allowed and encouraged in this course. However, group size is limited to three people. For assignments that you complete in a group, all group members can submit the same assignment, but everyone must submit it and it must contain everyone’s names.

Working in groups implies that everyone in the group is contributing equally. If this becomes a problem, I reserve the right to restructure or disband certain groups.

Technology Requirements

A laptop or other device with an internet connection is required in every class period and to complete all out-of-class assignments. All assignments will be completed on Google Colab, a cloud-based Python notebook computing environment which has all libraries needed for this class pre-installed.

Assignment Submission

Assignments will be submitted through the course’s D2L page. For notebook submissions, please submit a “View” link to your Google Colab notebook. Details on getting this link can be found in the syllabus.

Important Resources

  • Course Website: juliebutler.org/DSC340
  • Assignments are turned in on the course D2L page
  • Course Slack: invitation in weekly email
  • Textbook: Hands-On Machine Learning, 3rd ed.
  • Additional readings and resources will be provided throughout the course
  • All assignments will be completed in Google Colab

Providing Feedback

Anonymous Feedback Form

Questions?

What is Machine Learning?

What is Machine Learning?

  • There has been much recent interest in machine learning and artificial intelligence (ChatGPT, self-driving cars, etc.), but generally not a good explanation of what it is.
  • Machine learning is the field at the intersection of artificial intelligence and data science; it is a collection of programs that learn from given data

When is Machine Learning Useful?

  • Large datasets
  • Datasets with unknown patterns
  • Image and video analysis
  • Text processing (Natural Language Processing)
  • Predicting future values

The Machine Learning Workflow

  1. Import your data set and format it
  2. Split the data into a training set and a test set
  3. Train your machine learning model with the training set
  4. Evaluate the trained model’s performance with the test set
  5. (Optional) Make improvements to your model to increase its performance
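
The workflow steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the data are synthetic (generated from a hypothetical line y = 2x + 1 with a little noise), the model is a simple least-squares line, and the 80/20 split is one common choice rather than a rule. In practice a library such as scikit-learn would usually handle the splitting and fitting.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# 1. Import and format the data (here: synthetic data, an assumption for illustration)
X = [i / 10 for i in range(100)]
y = [2.0 * x + 1.0 + random.gauss(0, 0.1) for x in X]

# 2. Split the data into a training set (80%) and a test set (20%)
indices = list(range(len(X)))
random.shuffle(indices)
n_train = int(0.8 * len(indices))
train_idx, test_idx = indices[:n_train], indices[n_train:]
X_train = [X[i] for i in train_idx]
y_train = [y[i] for i in train_idx]
X_test = [X[i] for i in test_idx]
y_test = [y[i] for i in test_idx]

# 3. Train the model: fit y ≈ slope * x + intercept by ordinary least squares
mean_x = sum(X_train) / len(X_train)
mean_y = sum(y_train) / len(y_train)
covariance = sum((x - mean_x) * (t - mean_y) for x, t in zip(X_train, y_train))
variance = sum((x - mean_x) ** 2 for x in X_train)
slope = covariance / variance
intercept = mean_y - slope * mean_x

# 4. Evaluate the trained model on the test set with the mean squared error
predictions = [slope * x + intercept for x in X_test]
test_mse = sum((p - t) ** 2 for p, t in zip(predictions, y_test)) / len(y_test)

print("slope:", slope, "intercept:", intercept, "test MSE:", test_mse)
```

The test set is never used in step 3, so the error in step 4 estimates how the model behaves on data it has not seen.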

Types of Machine Learning

  • Machine learning algorithms are classified by the kind of data they take
  • Data sets have two components:
    • X: the inputs or the independent variables; features
    • y: the outputs or the dependent variables; labels

Supervised Learning

  • Takes labelled data (i.e. both X and y) and learns the pattern between the features and the labels
  • Two types of supervised learning based on the task
    • Classification
      • Sorting inputs into a set number of categories
      • Ex: Given a picture, determine if the picture shows a cat or a dog (two categories)
    • Regression
      • Infinite number of possible outputs
      • Approximating a function f such that \(f(X) \approx y\)
      • Ex: Given some information on a house, what price should it sell for?
  • Examples: k-nearest neighbors, linear regression, support vector machines, neural networks
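
As a rough illustration of classification with labelled data, here is a minimal k-nearest neighbors sketch in plain Python. The feature values (a weight and an ear length for each animal), the function name `knn_predict`, and the choice k = 3 are all hypothetical and exist only for this example.

```python
import math
from collections import Counter

# Hypothetical toy data: each row of X is [weight_kg, ear_length_cm]; y holds the labels
X_train = [[4.0, 6.5], [3.5, 7.0], [5.0, 6.0], [20.0, 10.0], [25.0, 12.0], [18.0, 11.0]]
y_train = ["cat", "cat", "cat", "dog", "dog", "dog"]

def knn_predict(x_new, X, y, k=3):
    """Predict the label of x_new by majority vote among its k nearest training points."""
    # Pair every training point's Euclidean distance to x_new with its label, closest first
    distances = sorted((math.dist(x_new, x), label) for x, label in zip(X, y))
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

print(knn_predict([4.2, 6.8], X_train, y_train))    # a small animal with short ears
print(knn_predict([22.0, 11.5], X_train, y_train))  # a large animal with long ears
```

The model never sees a rule such as "dogs are heavier"; it only uses the labelled examples in X and y.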

Unsupervised Learning

  • Learns patterns from unlabelled data (i.e. it is given only X)
  • Can also be roughly split into two categories depending on the task
    • Clustering
      • Unsupervised version of classification
      • How many categories can the data be split into?
      • Ex: Given information on a collection of different iris flowers, how many species are present?
    • Dimensionality Reduction
      • Reduces the number of features by combining similar features
      • Ex: To determine whether a person is at risk for diabetes, you are given 8 measures of the person’s health. Can the total number of features be reduced by combining similar ones?
      • Supervised algorithms may perform better on data sets with many features if dimensionality reduction is performed first
  • Ex: principal component analysis (PCA), k-means, and hierarchical cluster analysis (HCA)
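
A minimal sketch of clustering with k-means, written in plain Python. The data points, the function name `k_means`, the number of clusters (k = 2), and the initialization from the first k points are assumptions made for this illustration. Note that this version needs k as an input; it does not discover how many groups exist.

```python
import math

# Hypothetical unlabelled data: two loose groups of 2-D points (only X, no y)
X = [[1.0, 1.1], [0.9, 1.0], [1.3, 0.9], [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]]

def k_means(points, k, n_iter=10):
    """Cluster points into k groups by alternating assignment and centroid updates."""
    centroids = [list(p) for p in points[:k]]  # simple initialization: the first k points
    assignments = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its closest centroid
        assignments = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Update step: move each centroid to the mean of the points assigned to it
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return assignments, centroids

assignments, centroids = k_means(X, k=2)
print(assignments)
print(centroids)
```

The cluster numbers themselves carry no meaning; only which points share a number matters.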

Semisupervised Learning

  • Most data is unlabelled but some is labelled
  • Not always considered a separate type of machine learning
  • Ex: Photo apps will identify and group faces together, but you have to provide the name of each person

Reinforcement Learning

  • Based around an agent that learns to perform a task by maximizing a reward
  • Common in fields such as robotics
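
The idea of an agent maximizing a reward can be sketched with a very small example. The setup below is hypothetical: an agent repeatedly chooses between two actions with made-up reward probabilities and uses an epsilon-greedy rule (mostly pick the action that looks best so far, occasionally try one at random). Real reinforcement learning problems, such as those in robotics, are far richer than this.

```python
import random

random.seed(3)

# Hypothetical environment: two actions that pay a reward of 1 with these probabilities
reward_probabilities = [0.2, 0.8]

estimates = [0.0, 0.0]  # the agent's running estimate of each action's average reward
counts = [0, 0]         # how many times each action has been tried
epsilon = 0.1           # fraction of steps spent exploring at random

for step in range(2000):
    if random.random() < epsilon:
        action = random.randrange(2)                          # explore
    else:
        action = max(range(2), key=lambda a: estimates[a])    # exploit the best estimate
    reward = 1.0 if random.random() < reward_probabilities[action] else 0.0
    counts[action] += 1
    # Incremental average: move the estimate toward the observed reward
    estimates[action] += (reward - estimates[action]) / counts[action]

best_action = max(range(2), key=lambda a: estimates[a])
print("estimated rewards:", estimates, "preferred action:", best_action)
```

The agent is never told which action is better; it learns this only from the rewards it receives.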

Offline vs. Online Learning

  • Offline Learning
    • All training data is given to the algorithm at once to train
    • If new data is added to the training set, the model must be entirely retrained with both the old and the new data
  • Online Learning
    • Training data can be given in batches at any time, so online algorithms can improve themselves whenever they are given new data
  • Most algorithms in this class will use offline learning, but online learning does have advantages when it comes to memory, since the full data set never has to be loaded at once
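
A minimal sketch of online learning, assuming a one-feature linear model updated by gradient descent on one small batch at a time. The class name `OnlineLinearModel`, the method name `partial_fit`, the learning rate, and the target function y = 3x - 1 are all hypothetical choices for this illustration.

```python
import random

random.seed(1)

class OnlineLinearModel:
    """1-D linear model y ≈ w * x + b, updated one batch at a time by gradient descent."""

    def __init__(self, learning_rate=0.05):
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def partial_fit(self, X_batch, y_batch):
        """Take one gradient step on the mean squared error of this batch only."""
        n = len(X_batch)
        errors = [self.w * x + self.b - t for x, t in zip(X_batch, y_batch)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, X_batch)) / n
        grad_b = 2 * sum(errors) / n
        self.w -= self.learning_rate * grad_w
        self.b -= self.learning_rate * grad_b

    def predict(self, x):
        return self.w * x + self.b

model = OnlineLinearModel()
for step in range(2000):
    # Each batch arrives, is used once, and is then discarded
    X_batch = [random.uniform(0, 2) for _ in range(10)]
    y_batch = [3.0 * x - 1.0 for x in X_batch]  # hypothetical target: y = 3x - 1
    model.partial_fit(X_batch, y_batch)

print("w:", model.w, "b:", model.b)
```

Only 10 points are held in memory at any time, yet the model keeps improving as batches arrive. An offline algorithm would instead need all 20,000 points together before training.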

Challenges of Machine Learning

Problems with Data Sets

  • Small training sets
  • Poor quality data
  • Too many features or irrelevant features
  • Overfitting or underfitting
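
Overfitting and underfitting can be seen by comparing training error with test error. The sketch below is an illustration on synthetic data (a hypothetical noisy line y = 2x); the three models and all names are assumptions made for this example. One model ignores the input entirely (underfitting), one memorizes the training set (overfitting), and one fits a least-squares line.

```python
import random

random.seed(2)

def make_data(n):
    """Hypothetical noisy data from the line y = 2x (an assumption for illustration)."""
    X = [random.uniform(0, 10) for _ in range(n)]
    y = [2.0 * x + random.gauss(0, 2.0) for x in X]
    return X, y

X_train, y_train = make_data(40)
X_test, y_test = make_data(200)

def mse(predict, X, y):
    """Mean squared error of a prediction function on a data set."""
    return sum((predict(x) - t) ** 2 for x, t in zip(X, y)) / len(y)

# Underfitting: ignore x and always predict the mean training label
mean_y = sum(y_train) / len(y_train)
def predict_mean(x):
    return mean_y

# Overfitting: memorize the training set and return the label of the closest training point
def predict_nearest(x):
    nearest = min(range(len(X_train)), key=lambda i: abs(X_train[i] - x))
    return y_train[nearest]

# A reasonable fit: the least-squares line through the training data
mean_x = sum(X_train) / len(X_train)
slope = (sum((x - mean_x) * (t - mean_y) for x, t in zip(X_train, y_train))
         / sum((x - mean_x) ** 2 for x in X_train))
intercept = mean_y - slope * mean_x
def predict_line(x):
    return slope * x + intercept

for name, model in [("mean", predict_mean), ("nearest", predict_nearest), ("line", predict_line)]:
    print(name, "train MSE:", mse(model, X_train, y_train), "test MSE:", mse(model, X_test, y_test))
```

The memorizing model has zero training error because it reproduces the noise in the training labels, which is why its test error is a more honest measure of its quality.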

Computational Limitations

  • Memory (RAM)
  • Lack of GPUs and computing clusters