Healthcare Practice Dataset (CSV)

This practice dataset based on the Healthcare Administration scenario. This dataset is intentionally “dirty”it includes the exact inconsistencies we discussed: mixed gender formats, missing discharge reasons, and logical date errors.

You can copy and save the text below as healthcare_data.csv.

Code snippet

Patient_ID,Admission_Date,Discharge_Date,Gender,Discharge_Reason,Cost

P001,2023-01-10,2023-01-15,M,Recovered,5000

P002,2023-01-12,2023-01-14,Female,,3200

P003,2023-02-01,2023-01-28,F,Recovered,4500

P004,2023-02-05,2023-02-10,Male,Transfer,6100

P005,2023-03-01,2023-03-05,M,,2900

P006,2023-03-10,2023-03-12,Female,Recovered,3400

P001,2023-01-10,2023-01-15,M,Recovered,5000

Python Analysis Script

You can run this script in a Jupyter Notebook or Google Colab to practice the Cleaning and Validation steps. This script uses the pandas library, which is the gold standard for data wrangling in MS-HA and MBA programs.

Python

import pandas as pd

# 1. Load the data

df = pd.read_csv(“healthcare_data.csv”)

# 2. Deduplication: Remove the duplicate row for Patient P001

df = df.drop_duplicates()

# 3. Standardization: Fix the ‘Gender’ column

gender_map = {‘M’: ‘Male’, ‘F’: ‘Female’, ‘Male’: ‘Male’, ‘Female’: ‘Female’}

df[‘Gender’] = df[‘Gender’].map(gender_map)

# 4. Handling Missingness: Fill blank Discharge Reasons

df[‘Discharge_Reason’] = df[‘Discharge_Reason’].fillna(‘Standard’)

# 5. Validation: Identify the logical date error (P003)

df[‘Admission_Date’] = pd.to_datetime(df[‘Admission_Date’])

df[‘Discharge_Date’] = pd.to_datetime(df[‘Discharge_Date’])

df[‘Date_Error’] = df[‘Discharge_Date’] < df[‘Admission_Date’]

print(“— Cleaned Dataset —“)

print(df)

What to Look For in Your Results:

  • Row Count: You should see 6 rows instead of 7 (the duplicate P001 is gone).
  • Standardization: All “M” and “F” values should now be “Male” and “Female.”
  • Logical Check: Patient P003 will have True in the Date_Error column because their discharge date is before their admission date. As a manager, you would flag this record for the IT or clinical audit team.

Managerial Takeaway

  • By running this code, you are moving from data consumer to data governor. You aren’t just looking at a final report; you are ensuring the “plumbing” of the data is sound before it reaches the CFO or Chief Medical Officer.

Why this Chart Matters to a Healthcare Administrator:

  • Resource Allocation: By seeing that “Transfer” patients have the highest average cost ($6,100), a manager might investigate if these patients require more intensive pre-transfer stabilization or if the billing for transfers needs auditing.
  • Performance Benchmarking: If “Standard” discharges are significantly cheaper than “Recovered,” is it due to shorter stays or lower intensity of care? This chart prompts the next level of operational questioning.
  • Data Integrity Assurance: By explicitly filtering out Date_Errors, you can confidently tell stakeholders: “This report represents only verified patient encounters, excluding administrative logging errors.”

<!–

/* Style Definitions */

p.MsoNormal, li.MsoNormal, div.MsoNormal

{mso-style-unhide:no;

mso-style-qformat:yes;

mso-style-parent:””;

margin-top:0in;

margin-right:0in;

margin-bottom:8.0pt;

margin-left:0in;

line-height:115%;

mso-pagination:widow-orphan;

font-size:12.0pt;

font-family:”Calibri”,sans-serif;

mso-ascii-font-family:Calibri;

mso-ascii-theme-font:minor-latin;

mso-fareast-font-family:Calibri;

mso-fareast-theme-font:minor-latin;

mso-hansi-font-family:Calibri;

mso-hansi-theme-font:minor-latin;

mso-bidi-font-family:”Times New Roman”;

mso-bidi-theme-font:minor-bidi;

mso-font-kerning:1.0pt;

mso-ligatures:standardcontextual;}

.MsoChpDefault

{mso-style-type:export-only;

mso-default-props:yes;

font-family:”Calibri”,sans-serif;

mso-ascii-font-family:Calibri;

mso-ascii-theme-font:minor-latin;

mso-fareast-font-family:Calibri;

mso-fareast-theme-font:minor-latin;

mso-hansi-font-family:Calibri;

mso-hansi-theme-font:minor-latin;

mso-bidi-font-family:”Times New Roman”;

mso-bidi-theme-font:minor-bidi;}

.MsoPapDefault

{mso-style-type:export-only;

margin-bottom:8.0pt;

line-height:115%;}

@page WordSection1

{size:8.5in 11.0in;

margin:1.0in 1.0in 1.0in 1.0in;

mso-header-margin:.5in;

mso-footer-margin:.5in;

mso-paper-source:0;}

div.WordSection1

{page:WordSection1;}

/* List Definitions */

@list l0

{mso-list-id:873075596;

mso-list-template-ids:1364634374;}

@list l0:level2 lfo2

{mso-level-start-at:0;

mso-level-number-format:bullet;

mso-level-numbering:continue;

mso-level-text:o;

mso-level-tab-stop:1.0in;

mso-level-number-position:left;

text-indent:-.25in;

mso-ansi-font-size:10.0pt;

font-family:”Courier New”;

mso-bidi-font-family:”Times New Roman”;}

@list l0:level2 lfo3

{mso-level-start-at:0;

mso-level-number-format:bullet;

mso-level-numbering:continue;

mso-level-text:o;

mso-level-tab-stop:1.0in;

mso-level-number-position:left;

text-indent:-.25in;

mso-ansi-font-size:10.0pt;

font-family:”Courier New”;

mso-bidi-font-family:”Times New Roman”;}

@list l0:level2 lfo4

{mso-level-start-at:0;

mso-level-number-format:bullet;

mso-level-numbering:continue;

mso-level-text:o;

mso-level-tab-stop:1.0in;

mso-level-number-position:left;

text-indent:-.25in;

mso-ansi-font-size:10.0pt;

font-family:”Courier New”;

mso-bidi-font-family:”Times New Roman”;}

–>

Question 1

50 Points

Your Submission Requirements:

Healthcare Majors

Copy the CSV file and strings of codes (listed in your designated assignment section) into the Python platform to create the following:

  1. Create a table/chart
  2. Create a simple bar chart

Upload your screen shot of the two charts

Business Majors:

Copy the CSV file and strings of codes (listed in your designated assignment section) into the Python platform to create the following:

  • Stacked Bar chart in Python platform
  • Post a screenshot of your Stacked Bar chart

Deliverable:

  • Upload a screenshot of the Python-generated graphs , which were created from cleaned data.

Text Editor

Question 2

Question 2

50 Points

Part 2: Data Output (All Majors)

Deliverable: Screenshot of your Python pivot table output

Instructions:

  1. Run your Python code to generate the pivot table
  2. Ensure the entire pivot table is visible in your output window
  3. Take a screenshot that captures the complete table (use Snipping Tool on Windows, Command+Shift+4 on Mac, or your screenshot tool of choice)
  4. Save the screenshot as an image file (PNG or JPG format)
  5. Click the “Insert Image” button below and upload your screenshot
  6. Your screenshot should clearly show the pivot table with all column headers and data values

If using Jupyter Notebook:

  • Run the cell containing your pivot table code
  • Screenshot the output section below the code cell that displays your pivot table
  • Upload the image using the “Add Image” button

WRITE MY PAPER


Comments

Leave a Reply