Sample Midterm Exam 2
Part 1: Data Analysis in Excel
Answer each question below in a separate tab in an Excel workbook. You may create multiple copies of the data if needed. Download the file and perform the following tasks:
- Make a scatter plot of age on the x axis and avg_glucose_level on the y axis. Insert a linear line of best fit and display the equation and \(R^2\) value on the chart. Label the axes, include a title, and customize the style of the chart from the default.
- In a cell near your figure, respond to the question “How does glucose level tend to change as patients age?” Justify your answer.
- Repeat the first part only showing data for patients who have had a stroke (they have a “1” in the stroke column).
- Calculate the average age, average BMI, and average glucose level for (a) stroke patients and (b) non-stroke patients
- What is the correlation coefficient between the avg_glucose_level and bmi columns?
- Make a histogram of the avg_glucose_level column. Customize the bin width to be 12.
- Each patient can be characterized as Male or Female, with or without hypertension, and with or without heart disease.
- How many stroke patients fall into each of these 8 categories (e.g., how many are male/with hypertension/with heart disease)? Use a pivot table to answer this question.
- In a cell near the table, indicate the number of male patients with hypertension and without heart disease (this just requires you to interpret the values in your pivot table correctly).
Part 2: Python Basics
As this is a timed exam you do not have to provide detailed comments for your code, but you must use a comment to label the answers to each question.
- Write a function that takes three arguments (x,y,z) that are floating point numbers. The function should return the “Euclidean norm” of the values, which is given by \((x^2+y^2+z^2)^{1/2}\). Call the function for x=5.0, y=7.0, z=9.0. Print the answer to the console.
- The Fibonacci sequence starts with [1,1] and every following value is the sum of the previous two: [1,1,2,3,5,8,…]. Use an appropriate loop to determine the 50th number in the Fibonacci sequence.
Part 3: Data Analysis in Python
As this is a timed exam you do not have to provide detailed comments for your code, but you must use a comment to label the answers to each question.
- Download the file. This contains information on some games in the video game sales platform Steam. Load the file as a dataframe and print the columns to the console.
- Determine the minimum, maximum, and all three quartiles of the “price” column. Make a boxplot of the price distribution. Hint: use
import matplotlib.pyplot as plt
. - Make a scatter plot of positive_ratings (on the x axis) vs. negative_ratings (on the y axis).
- Generate a line of best fit and include it on the plot generated in (a). Print the value of the slope to the console and interpret: do most games have more positive or more negative ratings?
- Make a contingency table of developer and genres but only for games that have a price above 100.0. Display a list of the names of each of the games with a price above 100.0. Hint:
df2 = df.query(“A > 0”)
will select rows in df where the column named A has rows with values above 0.