An end-to-end machine learning pipeline that classifies breadboard circuits as PASS or FAIL using overhead camera images. The system compares traditional ML models against a Convolutional Neural Network across activation functions, optimisation algorithms, and hyperparameter tuning strategies.
```mermaid
flowchart TD
A["Raw Dataset\n54 images · 19 PASS · 35 FAIL"] --> B["Offline Augmentation\n8 rotations x 2 mirrors = 16x expansion"]
B --> C["Augmented Dataset\n864 images · 304 PASS · 560 FAIL"]
C --> CNN["CNN Branch\n224x224x3 = 150,528 input dims"]
C --> CML["Classical ML Branch\n64x64x3 flatten = 12,288 input dims"]
subgraph S3_CNN["Sec 3 — CNN Design & Optimisation"]
CNN --> SP["70 / 15 / 15 Split\n605 train · 128 val · 131 test"]
SP --> ACT["Activation Comparison\nReLU 91.41% vs Tanh 65.62%"]
ACT --> OPT["CNN Optimiser Comparison\nAdam 91.41% vs SGD 65.62%"]
OPT --> RS["Keras Tuner RandomSearch\n10 trials · Best val: 93.75%\n64 filters · 128 units · dropout 0.2 · lr 0.001"]
end
subgraph S3_TRAD["Sec 3 — Traditional Optimisers"]
CML --> PCA["PCA Reduction\n50 components · 87.1% variance"]
PCA --> TOPT["Optimiser Comparison on Logistic Reg\nCoord Search 69.36% · GD 65.90% · Newton 73.41%"]
end
subgraph S4["Sec 4 — Baseline Comparison"]
CML --> BL["6 Classifiers · 80/20 Split\nSVM 82.08% · LR 79.19% · GB 78.03%\nRF 74.57% · KNN 69.94% · MLP 64.74%"]
BL --> RSCV["RandomizedSearchCV on SVM\nBest: C=0.1 linear · 82.08% (no gain)"]
end
subgraph S5["Sec 5 — Experimental Rigor"]
RSCV --> CV["5-Fold Stratified CV\nSVM 86.98% +/-2.77% best\nMLP 60.49% +/-13.26% worst"]
RS --> EVAL["CNN Test Evaluation\n87.02% accuracy · F1 = 0.86\nFAIL recall 95% · FP = 4 / 75"]
CV --> CMP["Model Comparison\nCNN 87.02% > SVM 82.08%"]
EVAL --> CMP
EVAL --> OFA["Overfitting Analysis\nTrain 99.17% · Val 90.62% · Gap 8.55%"]
end
CMP --> DEP["Deployment\nanalyzer.py · webcam inference\nArduino LED indicator · robotic arm integration"]
OFA --> DEP
```
| Model | Accuracy | Notes |
|---|---|---|
| CNN — Tuned (RandomSearch) | 93.75% (validation) | Best overall |
| CNN — ReLU + Adam | 87.02% | Baseline deep model |
| SVM (Linear) — CV | 86.98% mean | Best classical (5-fold CV) |
| SVM (Linear) — test | 82.08% | Best classical (held-out) |
| Logistic Regression | 79.19% | — |
| Gradient Boosting | 78.03% | — |
| Random Forest | 74.57% | — |
| KNN (k=5) | 69.94% | — |
| MLP (sklearn) | 64.74% | Unstable (±13.26% CV std) |
```
breadboard-analyzer/
├── PDE4444_Technical_Portfolio.ipynb     # Main deliverable — all 5 assessment sections
├── Dataset/
│   ├── breadboard_dataset/               # Original 54 images (PASS/FAIL subdirs)
│   ├── augmented_dataset/                # 864 images after 16x augmentation
│   └── flattened_dataset/                # CSV of 64x64 flattened pixels for classical ML
├── Models/
│   ├── cnn_relu_adam.keras               # Baseline CNN (ReLU + Adam)
│   ├── cnn_tanh_adam.keras               # Activation comparison model
│   ├── cnn_relu_sgd.keras                # Optimiser comparison model
│   ├── cnn_tuned_random_search.keras     # Best CNN (keras-tuner RandomSearch)
│   ├── best_traditional_model.joblib     # Best baseline classical model (SVM)
│   └── best_traditional_model_tuned.joblib  # Tuned SVM (RandomizedSearchCV)
├── Scripts/
│   ├── analyzer.py                       # Webcam inference script
│   ├── grid_search_tuner.py              # CNN architecture search (source)
│   ├── traditional_ml_tuner.py           # Classical ML training + tuning (source)
│   └── random_search_tuner.py            # Keras Tuner RandomSearch (source)
├── Utils/
│   ├── offline_augmentation.py           # 8-angle rotation + mirror augmentation
│   └── image_flattener.py                # Resize to 64x64 + flatten to CSV
└── Legacy/                               # Archived earlier scripts
```
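The 16x expansion (8 rotation angles x 2 mirror states) performed by `Utils/offline_augmentation.py` can be sketched as follows. This is a minimal illustration assuming 45° rotation steps via `scipy.ndimage.rotate`; the actual script may differ in interpolation, padding, and file handling.

```python
import numpy as np
from scipy.ndimage import rotate

def augment_16x(image: np.ndarray) -> list[np.ndarray]:
    """Expand one image into 16 variants: 8 rotations x 2 mirrors."""
    variants = []
    for mirrored in (False, True):
        base = np.fliplr(image) if mirrored else image
        for angle in range(0, 360, 45):  # 0, 45, ..., 315 degrees
            # reshape=False keeps the original frame size; corners are clipped/padded
            variants.append(rotate(base, angle, reshape=False, order=1))
    return variants

# Example: a single 64x64 RGB image becomes 16 augmented images
img = np.random.rand(64, 64, 3)
augmented = augment_16x(img)
print(len(augmented))  # -> 16
```

Applied to the 54 originals this yields the 864-image augmented dataset (54 × 16 = 864).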
The notebook follows the 5-section assessment structure:
| Section | Content | Key Cells |
|---|---|---|
| 1. Engineering Problem Definition | System overview, inputs/outputs, engineering relevance | Cells 3–5 |
| 2. Dataset Collection & Feature Representation | Augmentation pipeline, dimensionality analysis, CNN vs ML input comparison | Cells 6–12 |
| 3. Neural Network Design & Optimisation | CNN architecture, activation comparison (ReLU vs Tanh), optimiser comparison (Coord Search / GD / Newton + Adam vs SGD), hyperparameter tuning | Cells 13–35 |
| 4. Baseline Comparison | 6 classical ML models, RandomizedSearchCV tuning, CNN vs classical comparison | Cells 36–44 |
| 5. Experimental Rigor | Train/val/test splits, 5-fold stratified CV, overfitting analysis (learning curves, confusion matrix, classification report) | Cells 45–52 |
Python 3.10+ recommended. Install all dependencies into a virtual environment:
```bash
python -m venv venv
venv\Scripts\activate       # Windows
source venv/bin/activate    # Linux/macOS
pip install tensorflow==2.21.0 scikit-learn numpy pandas matplotlib seaborn pillow opencv-python scipy joblib keras-tuner tensorboard
```
Note: TensorFlow GPU is not supported on native Windows for TF >= 2.11. Training runs on CPU. Use WSL2 or the TensorFlow-DirectML plugin for GPU support.
```bash
pip install ipykernel
python -m ipykernel install --user --name=breadboard-venv --display-name "Python (breadboard-venv)"
```
Then open PDE4444_Technical_Portfolio.ipynb in VS Code and select Python (breadboard-venv) as the kernel.
Set `LOAD_PRETRAINED = True` in Cell 16 of the notebook. This loads all pre-trained models from `Models/` and skips the training cells (~1 hour of CPU training).
Set `LOAD_PRETRAINED = False` (the default). Run all cells top to bottom. Expected runtimes on CPU:
| Step | Estimated Time |
|---|---|
| Data augmentation | ~2 min (skipped if already done) |
| CNN ReLU training | ~5–10 min |
| CNN Tanh training | ~5–10 min |
| CNN SGD training | ~5–10 min |
| CNN RandomSearch (10 trials) | ~50–60 min |
| Classical ML baselines | ~2–5 min |
| RandomizedSearchCV (SVM) | ~2 min |
This section covers how to fully reproduce the experimental outcomes on your own machine, and how to run the deployment scripts once the notebook has completed.
```bash
git clone <repo-url>
cd breadboard-analyzer
python -m venv venv
venv\Scripts\activate       # Windows
source venv/bin/activate    # Linux/macOS
pip install tensorflow==2.21.0 scikit-learn numpy pandas matplotlib seaborn pillow opencv-python scipy joblib keras-tuner tensorboard ipykernel
python -m ipykernel install --user --name=breadboard-venv --display-name "Python (breadboard-venv)"
```
Open PDE4444_Technical_Portfolio.ipynb in VS Code and select Python (breadboard-venv) as the kernel. Run all cells from top to bottom using Run All.
To train from scratch, keep `LOAD_PRETRAINED = False` in Cell 16; to reuse the saved models instead, set `LOAD_PRETRAINED = True`. All outputs — plots, metrics, confusion matrices, and cross-validation tables — are generated inline within the notebook. The trained models are saved automatically to the `Models/` directory on the first run.
The following scripts can be run independently after the notebook has completed at least one full execution (so that models exist in Models/).
**Live webcam inference (`analyzer.py`)**
Loads the best saved CNN and opens a webcam feed. Each frame is classified in real time with a PASS/FAIL overlay.
```bash
python Scripts/analyzer.py
```
Press `q` to quit. The script also accepts a static image path as an argument for offline testing:

```bash
python Scripts/analyzer.py --image path/to/image.jpg
```
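The preprocessing and label mapping inside the inference script can be sketched roughly as below. This is an illustrative assumption, not the script's actual code: it assumes the CNN's 224x224x3 input and an alphabetical class order (`FAIL` = 0, `PASS` = 1, the convention of Keras directory-based loaders); the real script additionally loads the saved `.keras` model and draws the overlay on each frame.

```python
import numpy as np
from PIL import Image

CLASS_NAMES = ["FAIL", "PASS"]  # assumed alphabetical class order

def preprocess(img: Image.Image) -> np.ndarray:
    """Resize a frame to the CNN input size and add a batch dimension."""
    arr = np.asarray(img.convert("RGB").resize((224, 224)), dtype=np.float32)
    # No division by 255 here: the model's own Rescaling layer normalises inputs
    return arr[np.newaxis, ...]  # shape (1, 224, 224, 3)

def postprocess(probs: np.ndarray) -> str:
    """Map the softmax output of the final Dense(2) layer to a label."""
    return CLASS_NAMES[int(np.argmax(probs))]

# Hypothetical usage once a Keras model is loaded:
# batch = preprocess(Image.open("path/to/image.jpg"))
# label = postprocess(model.predict(batch)[0])
```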
**Arduino LED indicator (`led_notifier.py`)**
Sends the PASS/FAIL result from the model to an Arduino over serial, which lights a green or red LED accordingly. Ensure the Arduino is connected and the correct COM port is set in the script before running.
```bash
python Scripts/led_notifier.py
```
Both scripts depend on models saved in `Models/`. Run the notebook first to generate them, or use the pre-trained models already committed to the repository.
ReLU achieved 91.41% validation accuracy vs Tanh at 65.62% — a 25.79 percentage-point gap. Tanh converged to the majority-class baseline and failed to learn useful features, consistent with vanishing gradients in the 3-conv + 1-dense architecture.
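The vanishing-gradient explanation can be illustrated numerically: tanh saturates at even moderate input magnitudes, so its derivative (and hence the backpropagated signal) collapses, while ReLU's derivative stays at 1 for any positive input. A small numpy check, not taken from the notebook:

```python
import numpy as np

x = np.array([0.5, 2.0, 5.0])

tanh_grad = 1.0 - np.tanh(x) ** 2      # d/dx tanh(x)
relu_grad = (x > 0).astype(float)      # d/dx relu(x), for x != 0

print(tanh_grad)  # shrinks fast: ~[0.79, 0.071, 0.00018]
print(relu_grad)  # stays at 1 for positive inputs: [1. 1. 1.]
```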
Traditional ML (logistic regression on PCA-50 features):
| Method | Accuracy | Final Loss |
|---|---|---|
| Coordinate Search (zero-order) | 69.36% | 0.534 |
| Gradient Descent (first-order) | 65.90% | 1.287 (diverged) |
| Newton’s Method (second-order) | 73.41% | 0.476 |
Gradient Descent diverged at lr=0.01. Newton’s Method converged fastest (10 iterations) due to Hessian-scaled steps, feasible only because PCA reduced dimensions to 50.
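The second-order update that makes Newton's Method feasible at 50 dimensions can be sketched as follows: a generic Newton solver for L2-regularised logistic regression on synthetic data, not the notebook's code (the data, regularisation constant, and iteration count are illustrative).

```python
import numpy as np

def newton_logreg(X, y, iters=10, reg=1e-4):
    """Newton's method for L2-regularised logistic regression.

    Update: w <- w - H^{-1} g, with gradient g and Hessian
    H = X^T S X / n + reg*I, where S = diag(p * (1 - p)).
    Solving the d x d system is cheap once PCA has reduced d to 50.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        z = np.clip(X @ w, -30, 30)               # avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-z))              # sigmoid predictions
        g = X.T @ (p - y) / n + reg * w           # gradient of the log-loss
        S = p * (1.0 - p)                         # Hessian diagonal weights
        H = (X * S[:, None]).T @ X / n + reg * np.eye(d)
        w -= np.linalg.solve(H, g)                # Hessian-scaled step
    return w

# Illustrative separable problem in 50-D, mirroring the PCA-50 setting
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 50))
y = (X @ rng.normal(size=50) > 0).astype(float)
w = newton_logreg(X, y)
acc = np.mean(((X @ w) > 0) == y.astype(bool))
```

The Hessian-scaled step is what gives the fast convergence noted above; plain gradient descent with a fixed learning rate has no such curvature information.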
Deep learning (CNN):
| Optimiser | Val Accuracy |
|---|---|
| Adam | 91.41% |
| SGD | 65.62% |
| Approach | Method | Result |
|---|---|---|
| CNN | keras-tuner RandomSearch (10 trials) | 93.75% (+2.34pp), best: 64 filters, 128 units, dropout=0.2, lr=0.001 |
| SVM | sklearn RandomizedSearchCV (10 iter, 3-fold) | 82.08% (no improvement; C=0.1, linear kernel was already optimal) |
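The SVM tuning step follows the standard `RandomizedSearchCV` pattern, sketched below on synthetic data with an assumed search space (the notebook's actual parameter distributions may differ):

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the flattened features (reduced dimensions for speed)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

search = RandomizedSearchCV(
    SVC(),
    param_distributions={
        "C": loguniform(1e-2, 1e2),      # sample C log-uniformly
        "kernel": ["linear", "rbf"],
    },
    n_iter=10, cv=3, random_state=0,     # 10 trials, 3-fold CV as in Sec 4
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

A "no improvement" outcome, as reported here, simply means the sampled candidates never beat the baseline configuration's cross-validated score.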
| Model | CV Mean | Std Dev | Fold Range |
|---|---|---|---|
| SVM (Linear) | 86.98% | ±2.77% | 84.1% – 91.3% |
| Logistic Regression | 83.21% | ±2.64% | 79.0% – 87.0% |
| Gradient Boosting | 79.59% | ±3.52% | 74.6% – 84.8% |
| Random Forest | 77.42% | ±1.29% | 75.4% – 79.1% |
| KNN (k=5) | 72.36% | ±2.19% | 69.6% – 76.1% |
| MLP | 60.49% | ±13.26% | 34.8% – 73.2% |
SVM was the most consistent classical model. MLP was severely unstable; one fold dropped to 34.8%, below the majority-class baseline, confirming training divergence on that partition.
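The 5-fold protocol behind the table maps onto `StratifiedKFold` plus `cross_val_score`. A generic sketch on synthetic data, not the notebook's cell:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = (X[:, 0] > 0).astype(int)   # synthetic labels; real labels are PASS/FAIL

# Stratification keeps the class ratio identical in every fold, which
# matters given the 304 PASS / 560 FAIL imbalance.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(SVC(kernel="linear"), X, y, cv=cv)
print(f"{scores.mean():.2%} +/- {scores.std():.2%}")
```

Reporting mean ± std over folds, as the table does, exposes instability that a single held-out split hides (e.g. the MLP's 34.8% fold).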
```
Input (224×224×3)
└─ Rescaling (÷255)
   └─ Conv2D(32, 3×3, ReLU) + MaxPool(2×2)
      └─ Conv2D(64, 3×3, ReLU) + MaxPool(2×2)
         └─ Conv2D(128, 3×3, ReLU) + MaxPool(2×2)
            └─ Flatten
               └─ Dense(128, ReLU)
                  └─ Dropout(0.5)
                     └─ Dense(2, Softmax)
```
Training: Adam (lr=0.001), early stopping (patience=3), 70/15/15 split.
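A quick sanity check of the tensor shapes, assuming `padding='same'` convolutions (an assumption: the Keras default is `'valid'`, which would shrink each conv output by 2 px and change the totals slightly):

```python
# Feature-map sizes through the three conv/pool stages, under padding='same':
# a 3x3 'same' conv keeps the spatial size, each MaxPool(2x2) halves it.
size, channels = 224, 3
for filters in (32, 64, 128):
    size //= 2
    channels = filters

flatten_dim = size * size * channels
dense_params = flatten_dim * 128 + 128   # weights + biases of Dense(128)
print(size, flatten_dim, dense_params)   # -> 28 100352 12845184
```

Almost all of the network's parameters therefore sit in the Flatten → Dense(128) connection, which is why Dropout(0.5) is placed right after it.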
```
Raw images → Resize 64×64 → Flatten → 12,288-D feature vector
└─ StandardScaler
   └─ PCA (50 components, 87.1% variance explained) [for optimiser comparison only]
      └─ Classifier (SVM / LR / KNN / RF / GB / MLP)
```
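The classical branch maps directly onto an sklearn `Pipeline`. A sketch with synthetic stand-ins for the flattened 64x64x3 vectors (the notebook's feature CSV is the real input, and its variance figure of 87.1% refers to that data, not to random noise):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for the flattened_dataset CSV: n images x 12,288 pixel features
rng = np.random.default_rng(0)
X = rng.random((120, 12288))
y = rng.integers(0, 2, size=120)   # 0 = FAIL, 1 = PASS (illustrative)

pipe = Pipeline([
    ("scale", StandardScaler()),          # zero-mean / unit-variance pixels
    ("pca", PCA(n_components=50)),        # 12,288-D -> 50-D
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(X, y)
print(pipe.named_steps["pca"].explained_variance_ratio_.sum())
```

Wrapping the scaler and PCA inside the pipeline ensures both are fit only on training folds during cross-validation, avoiding leakage into the evaluation folds.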
See LICENSE.