Sample Midterm Exam

Part 1: Data Analysis in Excel

Answer each question below in a separate tab in an Excel workbook. You may create multiple copies of the data if needed.

  1. Download the fish dataset and open it in Excel. Using built-in functions calculate the mean, mode, and standard deviation of each column of numeric data. Be sure to provide labels in the Excel sheet for every value you calculate.
  2. Create a histogram of the Length of the Fish. Set the bin width to 500. How many bins are there? Which bin contains the most fish? How many? Be sure to include a title and adjust the axes so they can be easily read. Write the answers to the questions in the Excel file.
  3. Create a scatter plot of the Length of the fish vs. Age of the fish. Also create a scatter plot of Length of the fish vs. Temperature of the Water. Label the axes and title each of the plots.
  4. Using your columns of data, calculate the Pearson’s r correlation for the relationship between the Length of Fish vs. Age of Fish. Also compute the slope and y-intercept of the line of best fit using built-in functions (not by displaying it on the scatter plot).
  5. Without calculating the Pearson’s r correlation for the relationship between the Length of Fish vs. Temperature of the Water, can you make an educated guess for what it will be? Tell me what the Pearson’s r correlation tells you about the data set and why you made this guess.

Part 2: Python Basics

As this is a timed exam you do not have to provide detailed comments for your code, but you must use a comment to label the answers to each question.

  1. Write a function f(x) that returns 1⁄((1-x). Call the function for x=0.05 and print the answer.
  2. Use an appropriate loop to evaluate 1+x+x\(^2\)+⋯+x\(^{20}\) for x=0.05. Print the answer. [Hint: Your answers to #1 and #2 should be very similar.]

Part 3: Data Analysis in Python

As this is a timed exam you do not have to provide detailed comments for your code, but you must use a comment to label the answers to each question.

  1. Download the life expantancy dataset and load it as a Dataframe in a Python notebook. Generate a descriptive summary of the numerical columns of the data.
  2. Your output from #1 should include descriptive statistics including the standard deviation (abbreviated std) of each column of data. In a sentence or two (without using any mathematics), describe what is meant by this term.
  3. Two other columns of data are titled “Life Expectancy” and “GDP”. Make a scatter plot of these data. Label your axes and make the symbols small. Calculate and report a correlation coefficient. In a comment, interpret this value (e.g. “The results suggests there is very little correlation between the data.”). Add a line of best fit for the data to your scatter plot.
  4. What is the average life expectancy for entries before the year 2010? What is the average life expectancy for 2010 and later?