Python Homework Part 3
The code you submit for this homework must follow the coding guidelines.
Problem 1 (50 pts.)
(This task was begun at the end of the Machine Learning notes.)
Consider the question of identifying the survivors of the Titanic. Determine an appropriate machine learning approach covered in class and implement it.
Specifically:
- Using an appropriate method, train a machine learning algorithm on the dataset titanic_train.csv
- Predict the fate of the passengers in titanic_test.csv
- Check your predictions against the complete table of survivors in titanic_complete.csv
- Report the % accuracy of your approach. (You’ll find that it isn’t incredibly high – this is a challenging data set to model and the basic methods we’re using here aren’t quite up to the task.)
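The last two steps (checking predictions and reporting accuracy) might look roughly like the sketch below. It uses small made-up tables in place of the real files, and the column names PassengerId, Predicted, and Survived are assumptions for illustration only; check the actual headers in the course CSV files.

```python
import pandas as pd

# Stand-in for your model's predictions on titanic_test.csv (made-up values).
predictions = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5],
    "Predicted": [0, 1, 1, 0, 0],
})

# Stand-in for the true outcomes in titanic_complete.csv (made-up values).
complete = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6],
    "Survived": [0, 1, 0, 0, 1, 1],
})

# Line up each prediction with the true outcome for the same passenger.
merged = predictions.merge(complete, on="PassengerId", how="inner")

# Accuracy is the fraction of rows where the prediction matches the truth.
accuracy = (merged["Predicted"] == merged["Survived"]).mean()
accuracy_pct = float(accuracy * 100)
print(f"Accuracy: {accuracy_pct:.1f}%")
```

If you are using scikit-learn, sklearn.metrics.accuracy_score(y_true, y_pred) computes the same fraction once the two columns are aligned.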
One challenge you’ll face: some columns you will want to include are categorical (e.g., male vs. female). You can use df["column name"] = df["column name"].map(lambda x : func(x)), where func is some function you write that converts a string to a float or converts continuous data to discrete data.
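As a minimal sketch of this conversion, the example below builds a tiny made-up table and maps two columns. The column names Sex and Age, the 0.0/1.0 encoding, and the age cutoffs are all assumptions chosen for illustration, not values taken from the course data.

```python
import pandas as pd

# Tiny made-up table; the real column names may differ.
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, 4.0, 61.0],
})


def sex_to_float(value):
    """Convert a category string to a float code (hypothetical encoding)."""
    return 1.0 if value == "female" else 0.0


def age_to_group(age):
    """Bin a continuous age into discrete groups (cutoffs chosen arbitrarily)."""
    if age < 18:
        return 0
    if age < 60:
        return 1
    return 2


# Passing the function directly is equivalent to map(lambda x: func(x)).
df["Sex"] = df["Sex"].map(sex_to_float)
df["AgeGroup"] = df["Age"].map(age_to_group)
print(df)
```

Note that a missing age (NaN) would fall through both comparisons and land in the last group here, so you would likely want to handle missing values before mapping.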
Submit a Jupyter Notebook to the D2L Dropbox.
Problem 2 (24 pts.)
In your own words, explain the following terms in a few sentences each:
1. (3 pts.) Machine Learning
2. (3 pts.) Unsupervised Machine Learning
3. (3 pts.) Classification
4. (3 pts.) k-Nearest Neighbors
5. (3 pts.) K-Means
6. (3 pts.) Accuracy Score
7. (3 pts.) Confusion Matrix
8. (3 pts.) Train-Test Split
Submit a Word Doc or a PDF with the answers to these questions.
Problem 3 (6 pts.)
In your own words, explain the machine learning workflow in a short paragraph or in bullet points. Explain what happens in each step, not just the name of the step.
Submit a Word Doc or a PDF with your answer to this question.