For this project, you will conduct a comprehensive descriptive analysis of a real-world dataset using Microsoft Excel. You will organize, summarize, visualize, and interpret data to tell a compelling story about what the data reveals. This project allows you to apply the skills learned in Weeks 1-6 and demonstrate your ability to extract meaningful insights from data.
Weight: 30% of final grade (300 points)
Submission: Excel workbook (.xlsx) + Written Report (PDF or Word document)
This midterm project assesses your mastery of the following course learning outcomes:
- Organize, summarize, and visualize data using Excel to extract meaningful patterns and insights
- Apply fundamental descriptive statistical techniques (measures of central tendency, variation, frequency distributions) to analyze real-world datasets
- Ethically interpret and communicate data insights using appropriate visual and written forms, including consideration of data limitations and responsible use
By completing this project, you will demonstrate your ability to conduct a complete data analysis workflowfrom data cleaning and exploration through statistical analysis and professional communication of findings.
STEP 1: Select Your Dataset (Week 5)
Choose ONE dataset from the approved options below. All datasets are available for free download. You may need to create a free Kaggle account to access some datasets.
STEP 2: Create Your Excel Workbook (Weeks 5-6)
Your Excel file must include five sheets with the following content:
Sheet 1: Raw Data
- Import your original dataset exactly as downloaded
- Do NOT modify this sheet
- Label tab clearly as “Raw Data”
Sheet 2: Data Cleaning & Organization
- Copy raw data and document all cleaning steps:Handle missing values (delete, fill with average, mark as “Unknown”)
- Remove duplicates
- Create new calculated fields if needed
- Rename variables for clarity
- Filter to relevant subset if dataset is very large
- Add text boxes or comment cells explaining what you did and why
- Example: “Removed 15 rows with missing graduation rate data because this variable is essential to my analysis”
Sheet 3: Summary Statistics Create at least THREE summary tables including:
- Numerical variables: Mean, median, standard deviation, min, max, range, count
- Categorical variables: Frequency counts, percentages
- Grouped summaries: Use pivot tables or formulas to summarize by categories (e.g., average temperature by city, median tuition by public/private)
- Use Excel formulas (AVERAGE, MEDIAN, STDEV.S, COUNT, COUNTIF, SUMIF, etc.) – no manual calculations
Sheet 4: Visualizations Create at least FIVE different charts including:
- At least one histogram or bar chart (distribution/comparison)
- At least one scatterplot (relationship between variables)
- At least three different chart types total
- Every chart must have: descriptive title, axis labels with units, legend (if needed), appropriate scale, professional colors, data source note
Examples: histogram of graduation rates, bar chart of average sales by product, scatterplot of tuition vs. graduation rate with trendline, line chart of temperature trends, pie chart of public vs. private proportions
Sheet 5: Analysis Notes (Optional but Recommended)
- Document interesting findings, surprises, patterns, or questions as you work
- Helps you write your report later
STEP 3: Write Your Report (Week 6-7)
Submit a written report of 3-5 pages, double-spaced, 12-point font (Times New Roman or Arial) with these sections:
1. Introduction (0.5-1 page)
- What dataset did you choose and why?
- What is the data source? Is it credible?
- What research question(s) are you exploring? Be specific.
- What do you hope to learn?
2. Data Description (0.5-1 page)
- How many observations (rows) and variables (columns)?
- What are your key variables? Define them clearly.
- What time period or population does the data represent?
- What data cleaning steps did you perform and why?
3. Descriptive Analysis & Findings (2-3 pages) MOST IMPORTANT SECTION
- Organize by findings, NOT by methods
- Good: “Graduation Rates Vary Widely Across Institutions”
- Poor: “First I Made a Histogram”
- For each finding: state it clearly, present specific statistics as evidence, reference visualizations, interpret what it means
- Required: Discuss central tendency for 2+ variables, variation/spread, group comparisons, relationships between variables
- Reference all five visualizations from your Excel file
- Use specific numbers but explain them in context
4. Limitations & Ethical Considerations (0.5 page)
- Data limitations: What’s missing? How representative? How current?
- Interpretation cautions: What can you NOT conclude?
- Ethical issues: Privacy concerns? Potential biases? Who’s represented/missing? How might findings be misused?
5. Conclusion (0.5 page)
- Summarize 2-3 most important findings
- What “story” does your data tell?
- What questions remain unanswered?
- How might this analysis be useful in real-world decision-making?
6. References
- Cite your dataset source in APA format (with URL and access date)
- Include any additional sources consulted
- Excel WorkbookFilename: LastName_FirstName_Midterm.xlsx
- All 5 sheets properly labeled
- Formulas intact (NOT pasted as values)
- Charts professional and formatted
- Written ReportFilename: LastName_FirstName_MidtermReport.pdf or .docx
- 3-5 pages, double-spaced, 12-point font
- Proofread for grammar and clarity
- Charts embedded OR clearly referenced by figure number
- Excel WorkbookFilename: LastName_FirstName_Midterm.xlsx
- All 5 sheets properly labeled
- Formulas intact (NOT pasted as values)
- Charts professional and formatted
- Written ReportFilename: LastName_FirstName_MidtermReport.pdf or .docx
- 3-5 pages, double-spaced, 12-point font
- Proofread for grammar and clarity
- Charts embedded OR clearly referenced by figure number

Leave a Reply
You must be logged in to post a comment.