This practice dataset based on the Healthcare Administration scenario. This dataset is intentionally “dirty”it includes the exact inconsistencies we discussed: mixed gender formats, missing discharge reasons, and logical date errors.
You can copy and save the text below as healthcare_data.csv.
Code snippet
Patient_ID,Admission_Date,Discharge_Date,Gender,Discharge_Reason,Cost
P001,2023-01-10,2023-01-15,M,Recovered,5000
P002,2023-01-12,2023-01-14,Female,,3200
P003,2023-02-01,2023-01-28,F,Recovered,4500
P004,2023-02-05,2023-02-10,Male,Transfer,6100
P005,2023-03-01,2023-03-05,M,,2900
P006,2023-03-10,2023-03-12,Female,Recovered,3400
P001,2023-01-10,2023-01-15,M,Recovered,5000
Python Analysis Script
You can run this script in a Jupyter Notebook or Google Colab to practice the Cleaning and Validation steps. This script uses the pandas library, which is the gold standard for data wrangling in MS-HA and MBA programs.
Python
import pandas as pd
# 1. Load the data
df = pd.read_csv(“healthcare_data.csv”)
# 2. Deduplication: Remove the duplicate row for Patient P001
df = df.drop_duplicates()
# 3. Standardization: Fix the ‘Gender’ column
gender_map = {‘M’: ‘Male’, ‘F’: ‘Female’, ‘Male’: ‘Male’, ‘Female’: ‘Female’}
df[‘Gender’] = df[‘Gender’].map(gender_map)
# 4. Handling Missingness: Fill blank Discharge Reasons
df[‘Discharge_Reason’] = df[‘Discharge_Reason’].fillna(‘Standard’)
# 5. Validation: Identify the logical date error (P003)
df[‘Admission_Date’] = pd.to_datetime(df[‘Admission_Date’])
df[‘Discharge_Date’] = pd.to_datetime(df[‘Discharge_Date’])
df[‘Date_Error’] = df[‘Discharge_Date’] < df[‘Admission_Date’]
print(“— Cleaned Dataset —“)
print(df)
What to Look For in Your Results:
- Row Count: You should see 6 rows instead of 7 (the duplicate P001 is gone).
- Standardization: All “M” and “F” values should now be “Male” and “Female.”
- Logical Check: Patient P003 will have True in the Date_Error column because their discharge date is before their admission date. As a manager, you would flag this record for the IT or clinical audit team.
Managerial Takeaway
- By running this code, you are moving from data consumer to data governor. You aren’t just looking at a final report; you are ensuring the “plumbing” of the data is sound before it reaches the CFO or Chief Medical Officer.
Why this Chart Matters to a Healthcare Administrator:
- Resource Allocation: By seeing that “Transfer” patients have the highest average cost ($6,100), a manager might investigate if these patients require more intensive pre-transfer stabilization or if the billing for transfers needs auditing.
- Performance Benchmarking: If “Standard” discharges are significantly cheaper than “Recovered,” is it due to shorter stays or lower intensity of care? This chart prompts the next level of operational questioning.
- Data Integrity Assurance: By explicitly filtering out Date_Errors, you can confidently tell stakeholders: “This report represents only verified patient encounters, excluding administrative logging errors.”
<!–
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{mso-style-unhide:no;
mso-style-qformat:yes;
mso-style-parent:””;
margin-top:0in;
margin-right:0in;
margin-bottom:8.0pt;
margin-left:0in;
line-height:115%;
mso-pagination:widow-orphan;
font-size:12.0pt;
font-family:”Calibri”,sans-serif;
mso-ascii-font-family:Calibri;
mso-ascii-theme-font:minor-latin;
mso-fareast-font-family:Calibri;
mso-fareast-theme-font:minor-latin;
mso-hansi-font-family:Calibri;
mso-hansi-theme-font:minor-latin;
mso-bidi-font-family:”Times New Roman”;
mso-bidi-theme-font:minor-bidi;
mso-font-kerning:1.0pt;
mso-ligatures:standardcontextual;}
.MsoChpDefault
{mso-style-type:export-only;
mso-default-props:yes;
font-family:”Calibri”,sans-serif;
mso-ascii-font-family:Calibri;
mso-ascii-theme-font:minor-latin;
mso-fareast-font-family:Calibri;
mso-fareast-theme-font:minor-latin;
mso-hansi-font-family:Calibri;
mso-hansi-theme-font:minor-latin;
mso-bidi-font-family:”Times New Roman”;
mso-bidi-theme-font:minor-bidi;}
.MsoPapDefault
{mso-style-type:export-only;
margin-bottom:8.0pt;
line-height:115%;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;
mso-header-margin:.5in;
mso-footer-margin:.5in;
mso-paper-source:0;}
div.WordSection1
{page:WordSection1;}
/* List Definitions */
@list l0
{mso-list-id:873075596;
mso-list-template-ids:1364634374;}
@list l0:level2 lfo2
{mso-level-start-at:0;
mso-level-number-format:bullet;
mso-level-numbering:continue;
mso-level-text:o;
mso-level-tab-stop:1.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:”Courier New”;
mso-bidi-font-family:”Times New Roman”;}
@list l0:level2 lfo3
{mso-level-start-at:0;
mso-level-number-format:bullet;
mso-level-numbering:continue;
mso-level-text:o;
mso-level-tab-stop:1.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:”Courier New”;
mso-bidi-font-family:”Times New Roman”;}
@list l0:level2 lfo4
{mso-level-start-at:0;
mso-level-number-format:bullet;
mso-level-numbering:continue;
mso-level-text:o;
mso-level-tab-stop:1.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:”Courier New”;
mso-bidi-font-family:”Times New Roman”;}
–>
Question 1
50 Points
Your Submission Requirements:
Healthcare Majors
Copy the CSV file and strings of codes (listed in your designated assignment section) into the Python platform to create the following:
- Create a table/chart
- Create a simple bar chart
Upload your screen shot of the two charts
Business Majors:
Copy the CSV file and strings of codes (listed in your designated assignment section) into the Python platform to create the following:
- Stacked Bar chart in Python platform
- Post a screenshot of your Stacked Bar chart
Deliverable:
- Upload a screenshot of the Python-generated graphs , which were created from cleaned data.
Text Editor
Question 2
Question 2
50 Points
Part 2: Data Output (All Majors)
Deliverable: Screenshot of your Python pivot table output
Instructions:
- Run your Python code to generate the pivot table
- Ensure the entire pivot table is visible in your output window
- Take a screenshot that captures the complete table (use Snipping Tool on Windows, Command+Shift+4 on Mac, or your screenshot tool of choice)
- Save the screenshot as an image file (PNG or JPG format)
- Click the “Insert Image” button below and upload your screenshot
- Your screenshot should clearly show the pivot table with all column headers and data values
If using Jupyter Notebook:
- Run the cell containing your pivot table code
- Screenshot the output section below the code cell that displays your pivot table
- Upload the image using the “Add Image” button

Leave a Reply
You must be logged in to post a comment.