Sample Midterm Exam
Part 1: Data Analysis in Excel
Answer each question below in a separate tab in an Excel workbook. You may create multiple copies of the data if needed.
- Download the fish dataset and open it in Excel. Using built-in functions calculate the mean, mode, and standard deviation of each column of numeric data. Be sure to provide labels in the Excel sheet for every value you calculate.
- Create a histogram of the Length of the Fish. Set the bin width to 500. How many bins are there? Which bin contains the most fish? How many? Be sure to include a title and adjust the axes so they can be easily read. Write the answers to the questions in the Excel file.
- Create a scatter plot of the Length of the fish vs. Age of the fish. Also create a scatter plot of Length of the fish vs. Temperature of the Water. Label the axes and title each of the plots.
- Using your columns of data, calculate the Pearson’s r correlation for the relationship between the Length of Fish vs. Age of Fish. Also compute the slope and y-intercept of the line of best fit using built-in functions (not by displaying it on the scatter plot).
- Without calculating the Pearson’s r correlation for the relationship between the Length of Fish vs. Temperature of the Water, can you make an educated guess for what it will be? Tell me what the Pearson’s r correlation tells you about the data set and why you made this guess.
Part 2: Python Basics
As this is a timed exam you do not have to provide detailed comments for your code, but you must use a comment to label the answers to each question.
- Write a function f(x) that returns 1⁄((1-x). Call the function for x=0.05 and print the answer.
- Use an appropriate loop to evaluate 1+x+x\(^2\)+⋯+x\(^{20}\) for x=0.05. Print the answer. [Hint: Your answers to #1 and #2 should be very similar.]
Part 3: Data Analysis in Python
As this is a timed exam you do not have to provide detailed comments for your code, but you must use a comment to label the answers to each question.
- Download the life expantancy dataset and load it as a Dataframe in a Python notebook. Generate a descriptive summary of the numerical columns of the data.
- Your output from #1 should include descriptive statistics including the standard deviation (abbreviated std) of each column of data. In a sentence or two (without using any mathematics), describe what is meant by this term.
- Two other columns of data are titled “Life Expectancy” and “GDP”. Make a scatter plot of these data. Label your axes and make the symbols small. Calculate and report a correlation coefficient. In a comment, interpret this value (e.g. “The results suggests there is very little correlation between the data.”). Add a line of best fit for the data to your scatter plot.
- What is the average life expectancy for entries before the year 2010? What is the average life expectancy for 2010 and later?