Real-Time Object Recognition with CNNs

Summary

This is an individual assignment. Each student will independently design and implement a real-time object recognition system using OpenCV and PyTorch. The aim is to build a system that captures live video frames from a laptop camera, automatically recognizes four chosen object categories, and overlays the name of the recognized object as a subtitle on the video stream. The categories can be any everyday objects, e.g., shoes, pens, cards, or bottles.

Students will collect their own image datasets, train and test different CNN models, and compare results across three strategies: (1) a baseline CNN trained from scratch on the collected dataset, (2) the same CNN trained with data augmentation, and (3) transfer learning with a pre-trained network.

Deliverables include:

All Python code files, as separate files for: (i) data collection (a .py file); (ii) model training and evaluation (a .ipynb notebook containing the code and its outputs, i.e., plots and printed evaluation results); (iii) the real-time application (a .py file)
Final trained model file (.pt or .pth, best model only) that can be loaded with torch.load()
A brief written report (PDF, maximum 2000 words)
A short demonstration video of the real-time application (using the best model only)

Submission: Submit the report as a PDF file via Forum. Include links to the other artifacts (e.g., the video) on the first page of the report. You may host artifacts on Google Drive; make sure that every link is accessible.

Section 1: Data Collection and Preprocessing

Objective: Create and prepare a dataset for CNN training.

Steps/Questions:

Write an OpenCV script to capture and store frames from your laptop camera.
- The script should automatically create a directory and save frames as image files. (Note: You may need to grant camera permissions by running your code in a terminal: $ python your_script.py)
Collect 35 images per object category for four distinct objects. It is advisable to collect about twice as many and then manually filter out problematic samples (e.g., those that do not contain the object, or that contain artifacts such as blur or other noise).
Split the data for each category into training, validation, and test sets (20, 5, and 10 images, respectively).
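The steps above can be sketched roughly as follows. This is a minimal illustration, not a required implementation: the function names `capture_frames` and `split_dataset`, the key bindings ('s' to save, 'q' to quit), the file naming scheme, and the `train/val/test/<category>` folder layout are all assumptions made for the example.

```python
import random
import shutil
from pathlib import Path


def capture_frames(out_dir, n_frames=70, camera_index=0):
    """Save webcam frames into out_dir; press 's' to save a frame, 'q' to quit."""
    import cv2  # imported here so split_dataset can be used without OpenCV

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the directory automatically
    cap = cv2.VideoCapture(camera_index)
    saved = 0
    try:
        while saved < n_frames:
            ok, frame = cap.read()
            if not ok:  # camera unavailable or permission not granted
                break
            cv2.imshow("capture", frame)
            key = cv2.waitKey(1) & 0xFF
            if key == ord("s"):
                cv2.imwrite(str(out_dir / f"img_{saved:03d}.jpg"), frame)
                saved += 1
            elif key == ord("q"):
                break
    finally:
        cap.release()
        cv2.destroyAllWindows()
    return saved


def split_dataset(src_dir, dst_dir, counts=(20, 5, 10), seed=0):
    """Copy one category's images into train/val/test folders with fixed counts."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    images = sorted(p for p in src_dir.iterdir() if p.suffix.lower() in {".jpg", ".png"})
    if len(images) < sum(counts):
        raise ValueError(f"need {sum(counts)} images, found {len(images)}")
    random.Random(seed).shuffle(images)  # fixed seed keeps the split reproducible
    result, start = {}, 0
    for split, n in zip(("train", "val", "test"), counts):
        target = dst_dir / split / src_dir.name
        target.mkdir(parents=True, exist_ok=True)
        chosen = images[start:start + n]
        for path in chosen:
            shutil.copy2(path, target / path.name)
        result[split] = [path.name for path in chosen]
        start += n
    return result
```

Calling `split_dataset("raw/bottle", "data")` once per category (the paths are hypothetical) would give a `data/train/bottle`, `data/val/bottle`, `data/test/bottle` layout, which happens to match what `torchvision.datasets.ImageFolder` expects.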

Section 2: Model selection

Objective: Define three CNN-based approaches for comparison.

Steps/Tasks:

Model 1: Baseline CNN: Implement and train a CNN model from scratch using your dataset.

Model 2: Data Augmented CNN: Retrain the same CNN using data augmentation techniques (rotation, flips, brightness adjustments, etc.).
Model 3: Transfer Learning: Choose a pre-trained model (e.g., MobileNetV2, ResNet18) and fine-tune it on your dataset. Use a relatively small model suited to limited data and your laptop's resources. Note that large models trained on Colab may not run on your own computer.

IMPORTANT: You should avoid using YOLO object detection models! The models used should be lightweight CNN models.

Section 3: Model Training and Hyperparameter Tuning

Objective: Train and optimize models using PyTorch.

Steps/Tasks:

Implement a PyTorch training loop with clear code comments.
Train each of the three models using appropriate hyperparameters.
Explore hyperparameter variations (learning rate, batch size, optimizer choice).
Record training and validation performance (create learning curves).
Identify which hyperparameters have the biggest effect on performance.
Everything you do (e.g., hyperparameter tuning) should be reflected in the report, e.g., via graphs.
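A training loop along these lines might look like the sketch below. It assumes a classification model and data loaders that yield `(images, labels)` batches; the helper names `run_epoch` and `fit`, the choice of Adam, and the rule of keeping the weights with the lowest validation loss are assumptions for illustration rather than requirements.

```python
import torch
from torch import nn


def run_epoch(model, loader, criterion, optimizer=None, device="cpu"):
    """One pass over loader; trains if an optimizer is given, otherwise evaluates."""
    training = optimizer is not None
    model.train(training)  # toggles dropout / batch-norm behaviour
    total_loss, correct, seen = 0.0, 0, 0
    with torch.set_grad_enabled(training):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            logits = model(images)
            loss = criterion(logits, labels)
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total_loss += loss.item() * labels.size(0)
            correct += (logits.argmax(dim=1) == labels).sum().item()
            seen += labels.size(0)
    return total_loss / seen, correct / seen


def fit(model, train_loader, val_loader, epochs=10, lr=1e-3, device="cpu"):
    """Train with Adam and return per-epoch curves for plotting."""
    model.to(device)
    criterion = nn.CrossEntropyLoss()
    # Skip frozen parameters (relevant for transfer learning).
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.Adam(params, lr=lr)
    history = {"train_loss": [], "train_acc": [], "val_loss": [], "val_acc": []}
    best_val, best_state = float("inf"), None
    for _ in range(epochs):
        tr_loss, tr_acc = run_epoch(model, train_loader, criterion, optimizer, device)
        va_loss, va_acc = run_epoch(model, val_loader, criterion, None, device)
        for key, value in zip(history, (tr_loss, tr_acc, va_loss, va_acc)):
            history[key].append(value)
        if va_loss < best_val:  # remember the weights with the lowest validation loss
            best_val = va_loss
            best_state = {k: v.detach().clone() for k, v in model.state_dict().items()}
    if best_state is not None:
        model.load_state_dict(best_state)
    return history
```

The returned `history` lists can be plotted directly as learning curves. Calling `fit` in a loop over a few learning rates, batch sizes, or optimizers and overlaying the resulting validation curves is one simple way to show which hyperparameter matters most.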

Section 4: Model Evaluation

Objective: Evaluate model performance on unseen test data.

Steps/Questions:

Evaluate all three models on the test set.
Report accuracy, precision, recall, and the confusion matrix for each model using proper visualizations.
Compare performance results across the three models and explain differences.
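To make the metrics concrete, the sketch below computes them from lists of integer class labels using plain Python. The function names `confusion_matrix` and `summarize_metrics` are chosen for this example; libraries such as scikit-learn offer equivalent functions, and either route should be acceptable as long as the definitions match.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def summarize_metrics(matrix):
    """Accuracy plus per-class precision and recall from a confusion matrix."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    accuracy = sum(matrix[i][i] for i in range(n)) / total if total else 0.0
    precision, recall = [], []
    for c in range(n):
        tp = matrix[c][c]
        predicted_c = sum(matrix[r][c] for r in range(n))  # column sum
        actual_c = sum(matrix[c])                           # row sum
        precision.append(tp / predicted_c if predicted_c else 0.0)
        recall.append(tp / actual_c if actual_c else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Toy example with three classes and six test samples.
cm = confusion_matrix([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3)
print(cm)                      # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
print(summarize_metrics(cm))
```

Precision for a class is the fraction of predictions of that class that were correct (column-wise), while recall is the fraction of true samples of that class that were found (row-wise). With only 10 test images per category, each misclassification shifts recall by 10 percentage points, which is worth keeping in mind when comparing models.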

Section 5: Real-Time Demonstration

Objective: Deploy the best-performing model in a real-time OpenCV application.

Steps/Tasks:

Implement an application script using OpenCV that:
Captures real-time video feed.
Runs inference using the selected CNN model.
Displays the recognized object name as a subtitle on the video.
For an object outside the four categories, the model should predict “Other”.
Record a short video demo (1-2 minutes) of your application running with different objects (you may use the screen recording function of QuickTime Player). You must also appear in the video yourself. Please discuss with your instructor if you face any challenges with this part.
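The application described above could be structured as in the following sketch. It assumes the trained model has already been loaded and outputs one logit per category, and it produces "Other" by thresholding the softmax confidence. That thresholding rule, the 0.6 default, and the names `choose_label` and `run_demo` are assumptions for illustration; training an explicit "Other" class is an alternative the brief does not rule out.

```python
def choose_label(probs, class_names, threshold=0.6):
    """Return the top class name, or "Other" when the model is not confident."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return class_names[best] if probs[best] >= threshold else "Other"


def run_demo(model, class_names, img_size=128, camera_index=0):
    """Show the webcam feed with the predicted label drawn near the bottom edge."""
    # Heavy dependencies are imported here so choose_label stays testable on its own.
    import cv2
    import torch

    model.eval()
    cap = cv2.VideoCapture(camera_index)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:  # camera unavailable or stream ended
                break
            # OpenCV delivers BGR uint8 frames; convert to RGB floats in [0, 1].
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            rgb = cv2.resize(rgb, (img_size, img_size))
            x = torch.from_numpy(rgb).permute(2, 0, 1).float().div(255).unsqueeze(0)
            with torch.no_grad():
                probs = torch.softmax(model(x), dim=1)[0].tolist()
            label = choose_label(probs, class_names)
            # Draw the label as a subtitle near the bottom-left corner.
            cv2.putText(frame, label, (20, frame.shape[0] - 20),
                        cv2.FONT_HERSHEY_SIMPLEX, 1.0, (255, 255, 255), 2)
            cv2.imshow("Real-time recognition", frame)
            if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
                break
    finally:
        cap.release()
        cv2.destroyAllWindows()
```

The preprocessing inside `run_demo` has to mirror whatever was used at training time (input size, scaling, and any normalization); a mismatch there is a common reason for a model that tests well but performs poorly on live video.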

Section 6: Conclusions

Objective: Reflect on results, insights, and limitations.

Questions:

Show key results via graphs and charts
Which of the three approaches performed best? Why?
What challenges did you face during dataset collection and training?
Link the concepts you used in this assignment to the respective lessons from the course. Include a table for this part.

Section 7: Technical Interviews

After the assignment is submitted, each student will give a technical interview to the instructor. In the interview, the student will explain how they carried out the assignment, and the instructor will ask questions to assess the student's understanding of it.

Word limit: Max 2500

Grading dimensions

The following components will be considered for assessment:

Dataset collection, quality, and preprocessing
CNN model design and training implementation (including hyperparameter tuning)
Evaluation and comparison of three models
Functionality of real-time object recognition system
Code quality, report clarity, and demonstration video

IMPORTANT: Students are expected to develop a good understanding of their work and be able to answer the instructor's questions during the viva. This will be considered part of the assessment and reflected in their grades.

Assignment Information

Weight:

15%

Learning Outcomes Added

Explain the fundamentals of deep learning, including motivation, problem formulation, and architectures.
Apply and evaluate the design and implementation of deep learning architectures and techniques.
Recognize and critically analyze deep learning methods for different types of learning tasks across various domains.
