R Studio Help

The dataset has been changed from every previous iteration of the AA5221 course. You must download the new dataset and load it into your chosen software application for this summative assessment.

Throughout this course, you have learned about research methods, data analysis techniques, and statistical tests for studying differences between or among groups in a dataset. In this final assessment, you will apply the concepts from the first six weeks of the course to demonstrate your mastery of them. The data for this assessment is an expanded Human Resources (HR) dataset, so you will already be familiar with its variables. Please note, however, that this HR dataset is different from the dataset you have used in previous assignments. The new, expanded HR data is posted on Canvas in the Week 7 module, within the ‘Dataset for Assessment’ tab.

Background. As you have seen in the first weeks of this course, the HR Director at our notional company continues to seek knowledge about our current and former workforce. For this final assessment, the HR Director wants your help in forming a holistic assessment of the factors potentially related to employee retention (stay vs. leave) and employee performance levels, including how retention and performance relate to factors such as workload (average monthly work hours), salary, filing complaints, and how well employees are valued (last evaluation). The HR Director has given you five tasks, each with several sub-tasks, which are described below.

Task 1: Basic data analysis and descriptive statistics

  • Briefly explain the HR Dataset for the HR Director, including a short description of the variables and their measurement scales (e.g., nominal, ordinal, scale).
  • Produce a bar chart showing the counts for two mutually exclusive groups of employees: those who left the company and those who stayed with the company. Calculate and report the proportion of employees who have left the company.
  • Generate descriptive statistics (specifically the mean, median, mode, standard deviation, kurtosis, and skewness) for the performance scores of the two mutually exclusive groups, i.e., descriptive statistics for those who left the company and for those who stayed.
  • Produce two frequency distributions (i.e., histograms) of performance scores, one for each mutually exclusive group of employees (those who left, those who stayed). Explain and interpret the resultant shape of the frequency distributions.
  • Provide an interpretation of the descriptive statistics of the performance scores between the two mutually exclusive groups of those who left and those who stayed.
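
The descriptive statistics requested in Task 1 can be sketched in a few lines. The example below is a minimal illustration in standard-library Python, not the required method: the records are hypothetical toy values rather than the course dataset, and the names `records`, `describe`, and `summary` are invented for this sketch. Skewness and kurtosis are computed with the simple moment-based formulas; statistical packages often apply bias-corrected sample formulas, so their values may differ slightly.

```python
import statistics

# Hypothetical toy records: (left, performance score). NOT the course dataset.
# left = 1 means the employee left; left = 0 means the employee stayed.
records = [
    (0, 0.62), (0, 0.71), (0, 0.71), (0, 0.80), (0, 0.55),
    (1, 0.45), (1, 0.90), (1, 0.45), (1, 0.88),
]


def describe(values):
    """Return the descriptive statistics named in Task 1 for one group."""
    n = len(values)
    mean = statistics.mean(values)
    # Central moments with an n denominator (moment-based definitions).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "sd": statistics.stdev(values),      # sample SD (n - 1 denominator)
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3,  # 0 for a normal distribution
    }


stayed = [score for left, score in records if left == 0]
leavers = [score for left, score in records if left == 1]

# Proportion of all employees who left the company.
proportion_left = len(leavers) / len(records)

summary = {"stayed": describe(stayed), "left": describe(leavers)}
print(f"Proportion who left: {proportion_left:.3f}")
for group, stats in summary.items():
    print(group, {name: round(value, 3) for name, value in stats.items()})
```

The same group split (`stayed`, `leavers`) supplies the counts for the bar chart and the values for the two histograms in whichever plotting tool you use.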

Task 2: Z score interpretations

  • Produce a set of z scores for performance level for all employees in the dataset. Then, conduct descriptive statistics of the z scores and report the resultant mean and standard deviation of the z scores.
  • Interpret the z score for performance level for the first employee in the dataset (i.e., the first record in the HR dataset, employee #1). How does this employee's performance level compare with that of all employees (current and former) in the HR dataset? Does this employee perform better or worse relative to other employees?
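
As a rough illustration of Task 2, the sketch below standardizes a small list of scores and checks the mean and standard deviation of the resulting z scores. The values in `performance` are hypothetical, with the first entry standing in for employee #1. The sample standard deviation is assumed here; some tools standardize with the population standard deviation instead.

```python
import statistics

# Hypothetical performance scores; the first entry stands in for employee #1.
performance = [0.38, 0.80, 0.11, 0.72, 0.37, 0.41, 0.10, 0.92]

mean = statistics.mean(performance)
sd = statistics.stdev(performance)  # sample SD (assumption; see lead-in)

# z = (x - mean) / SD: distance from the mean in standard-deviation units.
z_scores = [(x - mean) / sd for x in performance]

# By construction the z scores have mean 0 and SD 1 (up to rounding error).
z_mean = statistics.mean(z_scores)
z_sd = statistics.stdev(z_scores)

print(f"mean of z = {z_mean:.6f}, SD of z = {z_sd:.6f}")
print(f"employee #1: z = {z_scores[0]:+.2f}")
```

A negative z score places an employee below the overall mean and a positive one above it; the magnitude states how many standard deviations separate that employee from the mean.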

Task 3: Tests of differences and associations

  • Conduct the appropriate statistical test to answer the following question: do those who have left the company differ from those who have stayed in terms of their average monthly work hours? Justify the test you used and explain if the difference is meaningful assuming a statistical significance threshold of 0.05 (i.e., α = 0.05).
  • Conduct the appropriate statistical test to answer the following question: do those who have left the company differ from those who have stayed in terms of their salary level? [Note: for this question you should assume the salary level is ordinal, i.e., salary level 1 is less than salary level 2 and salary level 2 is less than salary level 3]. Justify the test you used and explain if the difference is meaningful assuming a statistical significance threshold of 0.05 (i.e., α = 0.05).
  • Conduct the appropriate statistical test to answer the following question: are those who have filed a complaint more likely to have left than those who have not filed a complaint? Justify the test you used and explain if the difference is meaningful assuming a statistical significance threshold of 0.05 (i.e., α = 0.05).
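
For the complaint question, both variables are categorical, so one candidate test is a Pearson chi-square test of independence on a 2x2 table. The sketch below shows the mechanics only, under stated assumptions: the counts in `observed` are hypothetical, no continuity correction is applied (some packages apply one by default for 2x2 tables), and the choice of test still needs to be justified against your own data.

```python
import math

# Hypothetical counts, NOT taken from the course dataset.
# Rows: filed a complaint (no, yes); columns: (stayed, left).
observed = [[80, 20],
            [12, 18]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square statistic, without continuity correction.
chi2 = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(2)
    for j in range(2)
)

# A 2x2 table has 1 degree of freedom, where the upper-tail probability
# has the closed form P(X > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

alpha = 0.05
print(f"chi-square(1) = {chi2:.3f}, p = {p_value:.5f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

The closed-form p-value holds only for one degree of freedom; larger tables, and the tests for work hours and salary level, need the distribution functions of a statistics package.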

Task 4: Correlations and regression analysis

  • What is the correlation between turnover (using the nominal variable “left”) and performance level? Justify the correlation statistic you used and explain if the correlation is meaningful assuming a statistical significance threshold of 0.05 (i.e., α = 0.05).
  • Create a scatterplot of employee’s last evaluation score (as a predictor variable on the x-axis) and employee’s performance level (as an outcome variable on the y-axis). Provide an informal interpretation of any general trends in the scatterplot between the two variables, i.e., is there a positive or negative correlation between the two variables?
  • Conduct a bivariate linear regression between last evaluation (predictor) and performance level (outcome). Provide a summary of your results including the appropriate output tables. Finally, interpret the regression results in terms of overall model fit (F statistic), the regressor value (β) and its significance (p-value), and the amount of variance in the data accounted for by the regressor (adjusted R²).
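
The quantities named in the regression sub-task can be computed by hand for a single predictor. The sketch below is an illustration under stated assumptions: the paired values are hypothetical, the variable names are invented, and the p-values are omitted because they require t and F distribution functions from a statistics package.

```python
import math

# Hypothetical (last evaluation, performance) pairs, NOT the course dataset.
last_evaluation = [0.53, 0.86, 0.88, 0.87, 0.52, 0.50, 0.77, 0.85, 0.61, 0.70]
performance = [0.40, 0.78, 0.85, 0.74, 0.45, 0.38, 0.70, 0.80, 0.52, 0.66]

n = len(last_evaluation)
mean_x = sum(last_evaluation) / n
mean_y = sum(performance) / n

# Sums of squares and cross-products around the means.
sxx = sum((x - mean_x) ** 2 for x in last_evaluation)
syy = sum((y - mean_y) ** 2 for y in performance)
sxy = sum((x - mean_x) * (y - mean_y)
          for x, y in zip(last_evaluation, performance))

# Least-squares line: performance = intercept + slope * last_evaluation.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Pearson correlation and variance explained.
r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2

# Adjusted R-squared for one predictor (k = 1): penalizes model size.
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - 2)

# Overall F statistic with (1, n - 2) degrees of freedom.
f_stat = r_squared / ((1 - r_squared) / (n - 2))

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"r = {r:.3f}, R^2 = {r_squared:.3f}, adjusted R^2 = {adj_r_squared:.3f}")
print(f"F(1, {n - 2}) = {f_stat:.2f}")
```

The same Pearson formula applied to a 0/1 variable such as "left" yields the point-biserial correlation, which is one candidate statistic for the turnover question.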

Task 5: Summarize what you learned about employee retention and performance from your analysis, as well as the other factors you think might be related to retention or performance

  • Which factors seem to be most related to retention?
  • Which factors seem to be most related to performance?

Requirements: as stated
