Best Assignment Doers


Dataset

Use a public dataset from the approved list (Chicago Crime dataset selected)
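Loading the selected dataset in Colab could look like the sketch below. It assumes you have downloaded a CSV export and that it has a `Date` column. The function name `load_crime_data` and the sample rows are hypothetical and are not real records.

```python
import io
import pandas as pd


def load_crime_data(source, nrows=None):
    """Read a CSV export of the dataset; nrows caps the rows read for a quick first pass in Colab."""
    df = pd.read_csv(source, nrows=nrows, low_memory=False)
    if "Date" in df.columns:  # assumed column name
        # Unparseable timestamps become NaT instead of raising an error.
        df["Date"] = pd.to_datetime(df["Date"], errors="coerce")
    return df


# Tiny in-memory CSV standing in for the real download (hypothetical rows).
csv_text = """ID,Date,Primary Type,District,Arrest
1,01/05/2023 10:30:00 PM,THEFT,1,false
2,02/11/2023 08:15:00 AM,BATTERY,11,true
"""
demo = load_crime_data(io.StringIO(csv_text))
print(demo.shape)
print(demo.dtypes)
```

With the real file, you would call something like `load_crime_data("chicago_crimes.csv", nrows=100_000)`. The file name is a placeholder. Capping `nrows` keeps memory use manageable in Colab while you develop the notebook.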

Assignment Requirements

Q1: Problem Identification & Data Collection

In the notebook:

Clearly define a real-world problem suitable for EDA and ML
Describe the dataset and its source (include the link)
Identify:
- Target variable
- Feature variables
Show the dataset shape and a preview
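The identification and preview steps of Q1 can be sketched as follows. For illustration only, this assumes the problem is predicting whether an incident led to an arrest (`Arrest` as the target); the brief leaves the problem open. `summarize_problem` is a hypothetical helper, and the rows are made up.

```python
import pandas as pd


def summarize_problem(df, target):
    """Print shape, target/feature split and a preview; return the feature list."""
    if target not in df.columns:
        raise KeyError(f"target column {target!r} not found")
    features = [c for c in df.columns if c != target]
    print("Shape (rows, columns):", df.shape)
    print("Target variable:", target)
    print("Feature variables:", features)
    print(df.head())
    return features


# Hypothetical rows using column names in the dataset's style; not real records.
sample = pd.DataFrame({
    "Primary Type": ["THEFT", "BATTERY", "NARCOTICS"],
    "District": [1, 11, 7],
    "Domestic": [False, True, False],
    "Arrest": [False, False, True],
})
features = summarize_problem(sample, target="Arrest")
```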

Q2: Exploratory Data Analysis (Manual EDA)

Using the SAME dataset:

Comment on data quality (missing values, duplicates, data types)
Descriptive statistics and interpretation
Identify and remove outliers
Check feature distributions
Correlation analysis
Clearly list:
- Dependent variable
- Independent variables
Drop unnecessary independent features
Check skewness and test its significance using a p-value
Apply:
- Standardization
- Normalization
Save:
- Cleaned dataset
- Standardized dataset
- Normalized dataset
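The Q2 steps above can be chained as in the sketch below. It runs on a randomly generated stand-in frame, so no values are real. Several choices are assumptions:

- The column names and the `Arrest` target.
- The 1.5×IQR outlier rule.
- `scipy.stats.skewtest` as the skewness significance test.
- Population-std z-scores for standardization.
- Min-max scaling to [0, 1] for normalization.

```python
import numpy as np
import pandas as pd
from scipy import stats


def quality_report(df):
    """Per-column dtype and missing counts, plus the number of fully duplicated rows."""
    table = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "missing_pct": (df.isna().mean() * 100).round(2),
    })
    return table, int(df.duplicated().sum())


def remove_outliers_iqr(df, cols, k=1.5):
    """Keep rows inside [Q1 - k*IQR, Q3 + k*IQR] for every column in cols."""
    mask = pd.Series(True, index=df.index)
    for col in cols:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        mask &= df[col].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]


def skewness_table(df, cols):
    """Sample skewness plus the p-value of scipy's skewtest (H0: population skewness is 0)."""
    rows = [{"feature": c, "skew": df[c].skew(), "p_value": stats.skewtest(df[c]).pvalue} for c in cols]
    return pd.DataFrame(rows).set_index("feature")


def standardize(df, cols):
    """z-score (x - mean) / std with the population std (ddof=0), as scikit-learn's StandardScaler uses."""
    out = df.copy()
    out[cols] = (df[cols] - df[cols].mean()) / df[cols].std(ddof=0)
    return out


def normalize(df, cols):
    """Min-max scaling to [0, 1]; assumes no column in cols is constant."""
    out = df.copy()
    out[cols] = (df[cols] - df[cols].min()) / (df[cols].max() - df[cols].min())
    return out


# Hypothetical stand-in for a Chicago Crime extract: random values, not real records.
rng = np.random.default_rng(42)
n = 200
raw = pd.DataFrame({
    "ID": np.arange(n),
    "District": rng.integers(1, 26, n).astype(float),
    "Ward": rng.integers(1, 51, n).astype(float),
    "Latitude": rng.normal(41.84, 0.08, n),
    "Longitude": rng.normal(-87.67, 0.06, n),
    "Arrest": rng.integers(0, 2, n),
})
raw.loc[0, "Latitude"] = 36.6                              # implausible coordinate, acts as an outlier
raw.loc[5, "Ward"] = np.nan                                # a missing value
raw = pd.concat([raw, raw.iloc[[1]]], ignore_index=True)   # an exact duplicate row

target = "Arrest"                                          # dependent variable (assumed)
features = ["District", "Ward", "Latitude", "Longitude"]   # independent variables (assumed)

quality, n_duplicates = quality_report(raw)
print(quality, "\nDuplicate rows:", n_duplicates)

cleaned = raw.drop_duplicates().dropna().drop(columns=["ID"])  # a row identifier carries no predictive signal
cleaned = remove_outliers_iqr(cleaned, features)
print(cleaned.describe().T)
print(cleaned.corr())
print(skewness_table(cleaned, features))

standardized = standardize(cleaned, features)
normalized = normalize(cleaned, features)
for name, frame in [("cleaned", cleaned), ("standardized", standardized), ("normalized", normalized)]:
    frame.to_csv(f"{name}_dataset.csv", index=False)
```

For the distribution check, `cleaned[features].hist(bins=30)` draws one histogram per feature (it needs matplotlib, which Colab provides). A skewtest p-value below 0.05 suggests the skew is unlikely to be zero. With very large samples, however, even tiny skews become "significant", so read the p-value together with the skew value itself.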

Q3: Automated EDA (Sweetviz)

Using the SAME dataset in the SAME notebook:

Install and use Sweetviz (must work in Google Colab)
Generate:
- analyze() report for raw dataset
- analyze() report for cleaned dataset
- compare() report (raw vs cleaned)
- compare_intra() report (e.g., class-based comparison)
Display and save Sweetviz HTML reports
Provide written explanations comparing:
- Raw vs cleaned dataset
- Insights from analyze, compare, compare_intra
Discuss how dataset quality affects Linear Regression performance (conceptual explanation)
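To make the conceptual point concrete, the sketch below uses synthetic data: `y = 2x + 1` plus noise, with no connection to the crime dataset. It fits an ordinary least-squares line three times. The first fit uses clean data. The second uses data where five target values are corrupted. The third uses the corrupted data after an IQR filter. The helper names are hypothetical.

```python
import numpy as np


def fit_line(x, y):
    """Ordinary least-squares line; returns (slope, intercept, r_squared)."""
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    r2 = 1 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)
    return slope, intercept, r2


def iqr_mask(values, k=1.5):
    """True where a value lies inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values >= q1 - k * iqr) & (values <= q3 + k * iqr)


rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
y_clean = 2.0 * x + 1.0 + rng.normal(0, 0.5, size=x.size)

# Simulated poor data quality: five corrupted target values at the high-x end.
y_dirty = y_clean.copy()
y_dirty[-5:] += 50.0

keep = iqr_mask(y_dirty)  # cleaning step: drop rows whose target is outside the IQR fences
clean_fit = fit_line(x, y_clean)
dirty_fit = fit_line(x, y_dirty)
filtered_fit = fit_line(x[keep], y_dirty[keep])

for name, (m, b, r2) in [("clean", clean_fit), ("dirty", dirty_fit), ("filtered", filtered_fit)]:
    print(f"{name:>8}: slope={m:.3f} intercept={b:.3f} R^2={r2:.3f}")
```

Five bad rows out of 100 are enough to pull the slope well away from 2 and to lower R², because least squares penalizes large residuals quadratically. Removing those rows restores a slope close to the true value.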

Submission Expectations

ONE clean, well-structured .ipynb notebook
Clear markdown explanations (student level)
Code must run successfully in Google Colab
Proper handling of Sweetviz + NumPy compatibility
No plagiarism
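One way to handle the Sweetviz and NumPy compatibility point is sketched below. It assumes the Colab failure is an `AttributeError` from Sweetviz touching NumPy aliases that newer NumPy releases removed: `np.VisibleDeprecationWarning` in NumPy 2.x, or `np.warnings` in NumPy 1.24 and later. The patch restores those aliases before Sweetviz is imported. A commonly used alternative is installing `numpy<2` and restarting the runtime, but that may conflict with other preinstalled Colab packages. The function names and report file names are hypothetical. The Sweetviz calls used are `analyze`, `compare`, `compare_intra` and `show_html`.

```python
import warnings
import numpy as np


def patch_numpy_for_sweetviz():
    """Restore NumPy aliases that some Sweetviz releases reference (assumed cause of the error)."""
    if not hasattr(np, "VisibleDeprecationWarning"):
        # NumPy 2.x keeps this class only under numpy.exceptions (available since NumPy 1.25).
        from numpy.exceptions import VisibleDeprecationWarning
        np.VisibleDeprecationWarning = VisibleDeprecationWarning
    if not hasattr(np, "warnings"):
        # NumPy 1.24 removed the accidental np.warnings alias of the stdlib module.
        np.warnings = warnings


def build_sweetviz_reports(raw, cleaned, target, condition, names=("Group A", "Group B")):
    """Create and save the four Q3 reports; sweetviz is imported only after the patch runs."""
    patch_numpy_for_sweetviz()
    import sweetviz as sv

    reports = {
        "raw_analyze.html": sv.analyze(raw, target_feat=target),
        "cleaned_analyze.html": sv.analyze(cleaned, target_feat=target),
        "raw_vs_cleaned_compare.html": sv.compare([raw, "Raw"], [cleaned, "Cleaned"], target_feat=target),
        # condition: boolean Series aligned with `cleaned`, splitting it into two groups.
        "intra_compare.html": sv.compare_intra(cleaned, condition, list(names), target_feat=target),
    }
    for path, report in reports.items():
        report.show_html(filepath=path, open_browser=False)
    return reports
```

In Colab, run `!pip install sweetviz` first. A call might then look like `build_sweetviz_reports(raw, cleaned, "Arrest", cleaned["Domestic"] == True, ["Domestic", "Non-domestic"])`, assuming those columns exist and the target is numeric or boolean. Calling `report.show_notebook()` on any returned report displays it inline.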

What I Expect From You

Complete end-to-end solution
Same dataset across Q1, Q2, Q3
Ready to submit with no errors

Requirements: as long | Python
