DSC 140S Final Exam Topics
Skills from the midterm exam will not be tested directly, but you may need them to answer final exam questions. See the list of midterm exam topics for a review.
Even if not directly stated below, you should be able to interpret and explain every graph you are asked to create and every statistic you are asked to calculate.
Machine Learning in Python
- Perform a train-test split of a data set with a specified amount of data going to the training set.
- Use k-nearest neighbors with a specified number of neighbors to classify a data set. This includes defining the algorithm, training the algorithm, and using the trained algorithm to make predictions.
- Compute an accuracy score using the results of a k-nearest neighbors algorithm. Interpret the results of the accuracy score.
- Create and interpret a confusion matrix.
- Create and interpret a correlation matrix. Be able to display the matrix in a meaningful, easy-to-understand way.
- Define and train a k-means algorithm with a specified number of clusters.
- Evaluate the performance of a k-means algorithm (note that you will not be asked to create an elbow plot from scratch but you may be given the code to create one and asked to interpret it).
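The workflow above can be sketched in Python as follows. This is a minimal example, not the course's reference solution. It assumes scikit-learn and pandas, which the topic list does not name, and it uses scikit-learn's bundled iris data as a stand-in data set. The 80% training split, k = 5 neighbors, and 3 clusters are illustrative choices.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix, silhouette_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data set (assumption): iris ships with scikit-learn, so no file is needed.
iris = load_iris(as_frame=True)
X = iris.data    # DataFrame with four numeric feature columns
y = iris.target  # Series of class labels 0, 1, 2

# Train-test split with 80% of the rows going to the training set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=42
)

# k-nearest neighbors with k = 5: define, train, then predict on unseen rows.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Accuracy = fraction of test rows whose predicted label equals the true label.
acc = accuracy_score(y_test, y_pred)

# Confusion matrix: row i, column j counts test rows of true class i predicted as class j.
cm = confusion_matrix(y_test, y_pred)

# Pearson correlation matrix of the feature columns, rounded for readability.
corr = X.corr().round(2)

# k-means with 3 clusters (the class labels are not used to fit it).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
cluster_labels = kmeans.fit_predict(X)

# Two common performance measures: inertia (within-cluster sum of squared
# distances, lower means tighter clusters) and silhouette score (-1 to 1, higher is better).
inertia = kmeans.inertia_
silhouette = silhouette_score(X, cluster_labels)

# Data behind an elbow plot: inertia for each candidate number of clusters.
inertias = {
    k: KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_
    for k in range(1, 7)
}

print(f"kNN accuracy: {acc:.3f}")
print(cm)
print(corr)
print(f"k-means inertia: {inertia:.2f}, silhouette: {silhouette:.3f}")
print(inertias)
```

A correlation matrix is often easier to read as a heatmap, for example with seaborn's heatmap(corr, annot=True). An elbow plot draws the inertias values against the number of clusters. The "elbow," where the curve stops dropping sharply, suggests a reasonable number of clusters.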
R
In the R programming language you should be able to:
- Make a comment.
- Print one or more pieces of information with a single statement.
- Perform the basic mathematical operations of addition, subtraction, multiplication, division, and exponentiation.
- Create variables and assign them values.
- Create and use boolean expressions.
- Create a list and access elements using list indexing.
- Determine the length, minimum value, and maximum value in a list using built-in functions.
- Create an if, if/else, or if/else if/else statement.
- Create a while or for loop.
- Create a function which has arguments and/or returned values.
- Import a data file saved to your computer as a dataframe.
- Access all column names in a dataframe.
- Access a column of data from a dataframe.
- Print a statistical summary of a dataframe, or a single column of a dataframe.
- Create a pie chart for a categorical variable.
- Create a contingency table with one or more columns of data.
- Using the ggplot2 library you should be able to:
  - Create a histogram, changing its color and bin size.
  - Create a bar plot of counts, or reshape the data to create a bar plot of other summaries (such as averages).
  - Create a scatter plot with two columns of data. Be able to set the color and/or shape of the points based on a third column of data.
  - Create a box plot for a single column of data or grouped by a second column of data.
  - Create a scatter plot with a line of best fit.
  - Change the x-axis label, y-axis label, and title of the plot.
- Calculate a Pearson’s correlation coefficient. Interpret the results.
- Use a mask to create a sub-dataframe.
- Find numeric values for the slope and intercept of a line of best fit. Interpret the results.
- Perform a train-test split of a data set with a specified amount of data going to the training set.
- Use k-nearest neighbors with a specified number of neighbors to classify a data set. This includes defining the algorithm, training the algorithm, and using the trained algorithm to make predictions.
- Compute an accuracy score using the results of a k-nearest neighbors algorithm. Interpret the results of the accuracy score.
- Create and interpret a confusion matrix.
- Create and interpret a correlation matrix. Be able to display the matrix in a meaningful, easy-to-understand way.
- Define and train a k-means algorithm with a specified number of clusters.
- Evaluate the performance of a k-means algorithm (note that you will not be asked to create an elbow plot from scratch but you may be given the code to create one and asked to interpret it).
- Perform a chi-squared test on two columns of data and interpret the results.
MySQL
Using the pymysql library you should be able to do the following:
- Connect to a database given the connection code.
- Display the names of every table in the database.
- Display the column names for every column in a table.
- Fetch the data from every column or selected columns in a table. Be able to fetch only the rows that meet a condition using the WHERE keyword.
- Use an INNER JOIN statement to select data across multiple tables.
- Convert the data retrieved from a database to a pandas DataFrame.
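The steps above can be sketched with pymysql and pandas as follows. This is a minimal example under stated assumptions. The table and column names (students, enrollments, student_id, name, major, course) and the connection values are hypothetical placeholders. The helper names run_query and rows_to_dataframe are illustrative and are not part of pymysql. The connection lines are commented out because they need a running MySQL server.

```python
import pandas as pd

# SQL for each skill. Table and column names are hypothetical placeholders.
SHOW_TABLES_SQL = "SHOW TABLES"
SHOW_COLUMNS_SQL = "SHOW COLUMNS FROM students"   # first result column is "Field"
SELECT_ALL_SQL = "SELECT * FROM students"
SELECT_SOME_SQL = "SELECT name, major FROM students"
SELECT_WHERE_SQL = "SELECT name, major FROM students WHERE major = %s"  # %s: pymysql placeholder
INNER_JOIN_SQL = (
    "SELECT s.name, e.course "
    "FROM students AS s "
    "INNER JOIN enrollments AS e ON s.student_id = e.student_id"
)


def rows_to_dataframe(rows, description):
    """Build a DataFrame from fetched rows.

    description is cursor.description; per DB-API 2.0, the first field of
    each entry is the column name.
    """
    columns = [col[0] for col in description]
    return pd.DataFrame(list(rows), columns=columns)


def run_query(connection, sql, params=None):
    """Execute sql on an open pymysql connection and return a DataFrame."""
    with connection.cursor() as cursor:
        cursor.execute(sql, params)
        return rows_to_dataframe(cursor.fetchall(), cursor.description)


# Example usage (hypothetical credentials; requires a MySQL server):
# import pymysql
# conn = pymysql.connect(host="localhost", user="USER",
#                        password="PASSWORD", database="DB")
# tables = run_query(conn, SHOW_TABLES_SQL)
# column_names = run_query(conn, SHOW_COLUMNS_SQL)["Field"].tolist()
# ds_students = run_query(conn, SELECT_WHERE_SQL, ("Data Science",))
# joined = run_query(conn, INNER_JOIN_SQL)
# conn.close()
```

Passing values through the params argument, rather than pasting them into the SQL string, lets pymysql quote them safely. Table names cannot be passed as parameters, which is why they appear directly in the SQL text.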