R Homework

Complete the following in R. Submit either a Jupyter Notebook file (.ipynb) or an RStudio file (.r) to the D2L Dropbox. Problems 1, 2, and 3 are worth 4, 8, and 12 points, respectively.

Problem 1 (4 pts.)

We practiced making a histogram, but that’s not the only way to visualize data:

  • Using custdata.tsv, make a “density plot” of the “Income” column. Be sure to clean the data first and remove the values that make no sense. Ensure that the plot has a title and labeled axes.
  • Add a red vertical line to your plot that represents the average value of the cleaned income data.
  • Also, explain what a density plot is and how it is both similar to and different from a histogram.

Some of this is new – look up the syntax as needed.
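
As a syntax pointer only, here is a minimal base R sketch of a density plot with a vertical line at the mean. It runs on made-up numbers, not on custdata.tsv; the variable names (income, income_clean) and the cleaning rule (drop NA and negative values) are assumptions for illustration, and the right rule for the real data is yours to decide and justify.

```r
# Toy values standing in for an income column (hypothetical, not custdata.tsv).
set.seed(1)
income <- c(rlnorm(200, meanlog = 10.5, sdlog = 0.6), NA, -5000)

# Assumed cleaning rule: drop missing and negative entries.
income_clean <- income[!is.na(income) & income >= 0]

# Kernel density estimate, plotted with a title and labeled axes.
dens <- density(income_clean)
plot(dens, main = "Density of income (toy data)",
     xlab = "Income", ylab = "Density")

# Red vertical line at the mean of the cleaned values.
income_mean <- mean(income_clean)
abline(v = income_mean, col = "red", lwd = 2)
```

The same plot can be built with other plotting packages; base graphics are used here only so the sketch has no dependencies.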

Problem 2 (8 pts.)

Using custdata.tsv, create a bar chart of housing type. Clean the data first by removing the records whose housing type is NA.
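
A rough base R sketch of the general pattern (drop NA values, count each category, draw the bars) might look like the following. The category labels and the vector name housing are invented for illustration and are not taken from custdata.tsv.

```r
# Toy categorical values with some missing entries (hypothetical labels).
housing <- c("Rented", "Owned", NA, "Rented", "Other", NA, "Rented")

# Remove the NA entries explicitly before counting.
housing_clean <- housing[!is.na(housing)]

# Count how many times each category occurs.
counts <- table(housing_clean)

# One bar per category, with a title and labeled axes.
barplot(counts, main = "Housing type (toy data)",
        xlab = "Housing type", ylab = "Count")
```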

Problem 3 (12 pts.)

Here’s a silly problem. You are given a data file (OA 6.6 survey.csv) containing observations from someone who surveyed their 1000 friends on social media to find out how much each friend travels (Miles), plays games (Games), and eats ice cream (Icecream). The surveyor also recorded whether or not they like each friend (Like). Use this data set and R to answer the following questions:

  • Is there a relationship between eating ice cream and playing games? What about playing games and traveling? Do a quantitative analysis, then report the correlation value for each pair along with your conclusion (1-2 sentences).
  • Use Miles to predict Games. Make a regression graph: find the linear regression fit line, plot it, and write the line's equation on the graph. (This last part is new, so look it up!)
  • Cluster the data based on outcome (Like). Use Miles and Games to plot the data and color the points using Like. Cluster the data using k-means and plot the same data using clustering information. Compare the two plots. In 2-4 sentences, explain how well your clustering worked.
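
The three tasks above rely on a handful of base R functions: cor, lm, and kmeans. The sketch below shows one way they fit together on synthetic data; the data frame toy and its columns x, y, and group are hypothetical stand-ins, not the survey file, and the choice of two clusters is an assumption tied to the two made-up groups.

```r
# Synthetic data: two groups that differ in x, with y loosely linear in x.
set.seed(42)
n <- 100
group <- rep(c(0, 1), each = n / 2)
x <- rnorm(n, mean = ifelse(group == 1, 60, 20), sd = 8)
y <- 5 + 0.1 * x + rnorm(n, sd = 1)
toy <- data.frame(x = x, y = y, group = group)

# Pearson correlation between the two numeric columns.
r <- cor(toy$x, toy$y)

# Linear regression of y on x, and the fitted line as a text label.
fit <- lm(y ~ x, data = toy)
b <- coef(fit)
eq <- sprintf("y = %.2f + %.2f x", b[1], b[2])

# Scatter plot colored by the known group, with the fit line and its equation.
plot(toy$x, toy$y, col = toy$group + 1, pch = 19,
     main = "Toy data colored by known group", xlab = "x", ylab = "y")
abline(fit, col = "blue")
legend("topleft", legend = eq, bty = "n")

# k-means with two clusters on the same two columns.
km <- kmeans(toy[, c("x", "y")], centers = 2, nstart = 10)
plot(toy$x, toy$y, col = km$cluster, pch = 19,
     main = "Toy data colored by k-means cluster", xlab = "x", ylab = "y")

# Cross-tabulate known groups against cluster labels to compare the two.
agreement <- table(known = toy$group, cluster = km$cluster)
print(agreement)
```

Cluster numbers from kmeans are arbitrary, so cluster 1 need not correspond to the first group; the cross-tabulation is what shows how well the two labelings line up. Because k-means uses Euclidean distance, columns on very different scales can dominate the result, so standardizing with scale() may be worth considering.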