In 1-3 sentences, define each of the following terms in your own words and provide an example:
Correlation coefficient (2 marks)
Linear regression (2 marks)
Small sample bias (2 marks)
Common cause relationship (2 marks)

You are hired to study the statistical relationship between the number of trees and the number of Starbucks in different neighbourhoods of a city. Describe how you would obtain this information and two biases that could occur in your survey. How would you avoid these biases? (5 marks)

The next 3 questions refer to the following data set. Show any relevant calculations or spreadsheet formulas that you used while responding to the questions.

Years        0.9   2.0   2.1   4.0    4.8    5.2   6.2    9.0    9.1
Height (cm)  0.79  3.98  4.81  11.41  29.72  24.2  26.81  91.47  129.45

Use automatic linear regression (e.g. using Excel or Sheets) to find a line of best fit, with the Years as the x values and the Height (cm) as y values. Explain exactly what you did. (4 marks)

Find the correlation coefficient and coefficient of determination for the data. Assess how well the line fits the data. (5 marks)

Create a residual plot for the data using your line of best fit. What conclusions can you draw from the residual plot? (4 marks)

For the next 5 questions, use the internet to find a reliable time series data set with at least 10 years of data, and at least 10 data points.

State the source of the data, and explain how you know it is a reliable source. (4 marks)

Perform a linear regression and interpret the results. (5 marks)

Using the regression results from the previous question, estimate the value that would occur 10 years before your first data point, and 10 years after your last data point. How valid are these predictions? (6 marks)

Create a residual plot and comment on the appropriateness of the model. (5 marks)

Even though the source of the data is reliable, describe at least two biases or mistakes that could have been made in collecting the data that would affect any conclusions. Include questions you may ask to investigate these potential issues. (6 marks)

The next 2 questions refer to the following data set. Show any relevant calculations or spreadsheet formulas that you used while responding to the questions.

x  1    2    3    4     5     6     7     8      9
y  7.2  6.2  2.4  17.0  35.1  60.0  96.6  133.5  180.1

Create a scatter plot of the data, and create a residual plot for a line of best fit. Based on your observation, does the data follow a linear pattern? (5 marks)

While we used a line of best fit to calculate residuals earlier, you can actually use any model to calculate residuals and create a residual plot. Use the estimate to calculate a new set of residuals, using a formula or a calculator. Explain your steps, then create a residual plot using this new model and assess its validity. (6 marks)

For each of the following relationships, predict whether they are cause-effect, accidental, or common-cause. Explain your conclusion and, if they are cause-effect, explain which causes the other and why.
Students with a shorter travel time to school have lower test scores. (3 marks)
Children with more books in their home are more likely to earn a PhD when they grow up. (3 marks)
People born between the 15th and 25th of any month are more likely to have a cell phone number ending in 26. (3 marks)
People who drink more coffee in the morning are more likely to have insomnia (the inability to sleep) at night. (3 marks)
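
The data-set questions above rely on three calculations that a spreadsheet normally hides: the least-squares line, the correlation coefficient, and the residuals. The following is a minimal Python sketch of those calculations, applied to the Years and Height (cm) data. It is an illustration of the arithmetic, not a model answer, and the names `fit_line`, `correlation`, and `residuals` are assumptions made for this sketch rather than anything the assignment prescribes.

```python
from math import sqrt


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def correlation(xs, ys):
    """Return the Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def residuals(xs, ys, model):
    """Return observed minus predicted values for any callable model(x)."""
    return [y - model(x) for x, y in zip(xs, ys)]


# Data copied from the first data set in the assignment.
years = [0.9, 2.0, 2.1, 4.0, 4.8, 5.2, 6.2, 9.0, 9.1]
heights = [0.79, 3.98, 4.81, 11.41, 29.72, 24.2, 26.81, 91.47, 129.45]

slope, intercept = fit_line(years, heights)
r = correlation(years, heights)
r_squared = r ** 2  # coefficient of determination for a simple linear fit

# Residuals of the fitted line; plot these against years for a residual plot.
line_residuals = residuals(years, heights, lambda x: slope * x + intercept)

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
for x, e in zip(years, line_residuals):
    print(f"x = {x:>4}: residual = {e:.2f}")
```

Because `residuals` accepts any callable, the same function can be reused with a non-linear model by passing a different function of `x` in place of the fitted line.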
