Titanic Survival Predictor

Project Summary

A classic binary classification project to predict passenger survival on the Titanic. It includes data cleaning, imputation, encoding, scaling, and feature engineering (e.g., titles from names, family size, cabin deck). Multiple scikit‑learn models are compared using cross‑validation; the best model is persisted and exposed with a lightweight UI for quick inference.
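The feature engineering mentioned above could look roughly like the following sketch. It assumes the column names of the public Kaggle Titanic CSV (Name, SibSp, Parch, Cabin); the helper name add_features and the "U" placeholder for unknown decks are illustrative choices, not taken from the project.

```python
import pandas as pd


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with Title, FamilySize, and Deck columns added.

    Column names follow the Kaggle Titanic CSV (an assumption).
    """
    out = df.copy()
    # Title is the text between the comma and the first period,
    # e.g. "Mr" in "Braund, Mr. Owen Harris".
    out["Title"] = out["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()
    # Family size counts siblings/spouses, parents/children, and the passenger.
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1
    # Deck is the first letter of the cabin code; "U" marks a missing cabin.
    out["Deck"] = out["Cabin"].str[0].fillna("U")
    return out


# Two hand-written rows in the Kaggle format, used only for illustration.
sample = pd.DataFrame({
    "Name": [
        "Braund, Mr. Owen Harris",
        "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
    ],
    "SibSp": [1, 1],
    "Parch": [0, 0],
    "Cabin": [None, "C85"],
})
features = add_features(sample)
print(features[["Title", "FamilySize", "Deck"]])
```

Rare titles are often grouped into a single category before encoding; whether this project does so is not stated, so the sketch leaves titles as extracted.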

Skills Demonstrated

Data Cleaning, Imputation & Encoding, Feature Engineering, Exploratory Data Analysis, Model Selection, Cross‑Validation, Hyperparameter Tuning, Evaluation (Accuracy/ROC‑AUC), Model Persistence, Streamlit UI

Tools Used

Python 3.8+, Pandas, NumPy, Scikit‑Learn, Matplotlib, Seaborn, Joblib, Streamlit, Hugging Face Spaces

Solution

A reproducible ML pipeline built with scikit‑learn pipelines: preprocess (impute, encode, scale), engineer domain features, train and validate several classifiers, and persist the best performer. A minimal Streamlit UI hosted on Hugging Face Spaces accepts inputs and returns predicted survival with explanatory outputs.
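As a minimal sketch of the preprocessing described above, the impute, encode, and scale steps can be combined in a scikit‑learn ColumnTransformer and chained with a classifier. The column lists, the imputation strategies (median and most frequent), and the toy rows are assumptions made for illustration; the project's actual choices may differ.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed feature columns (Kaggle-style names plus engineered ones).
numeric_cols = ["Age", "Fare", "FamilySize"]
categorical_cols = ["Sex", "Embarked", "Title"]

# Numeric columns: fill gaps with the median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# Categorical columns: fill gaps with the mode, then one-hot encode.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])
preprocess = ColumnTransformer([
    ("num", numeric_steps, numeric_cols),
    ("cat", categorical_steps, categorical_cols),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("classifier", LogisticRegression(max_iter=1000)),
])

# Tiny made-up frame with one missing value per column type.
toy = pd.DataFrame({
    "Age": [22.0, 38.0, np.nan, 35.0, 54.0, 2.0],
    "Fare": [7.25, 71.28, 7.92, 53.10, 51.86, 21.08],
    "FamilySize": [2, 2, 1, 2, 1, 5],
    "Sex": ["male", "female", "female", "female", "male", "male"],
    "Embarked": ["S", "C", "S", np.nan, "S", "S"],
    "Title": ["Mr", "Mrs", "Miss", "Mrs", "Mr", "Master"],
})
labels = [0, 1, 1, 1, 0, 0]

model.fit(toy, labels)
probabilities = model.predict_proba(toy)[:, 1]  # P(survived) per row
```

Keeping preprocessing inside the pipeline means the imputers and scaler are refit on each training fold during cross‑validation, which avoids leaking statistics from the validation fold.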

Approach

Data Prep: Handle missing values, encode categoricals, scale numerics.
Feature Engineering: Extract titles, family size, ticket/cabin cues, deck, etc.
Modeling: Train a logistic regression baseline, tree‑based models (random forest, gradient boosting), and other classifiers with cross‑validation.
Tuning: Grid/Random search on key hyperparameters with stratified CV.
Evaluation: Compare accuracy and ROC‑AUC across models; inspect the confusion matrix.
Persist & Serve: Save best model (joblib) and expose via Streamlit UI.
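The modeling, tuning, evaluation, and persistence steps above can be sketched as follows. Synthetic data from make_classification stands in for the prepared Titanic features, and the candidate names, parameter grids, and the best_model.joblib file name are hypothetical; this is an illustration under those assumptions, not the project's actual configuration.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import (
    GridSearchCV,
    StratifiedKFold,
    cross_val_predict,
    cross_val_score,
)

# Synthetic stand-in for the preprocessed feature matrix and survival labels.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}
# Mean ROC-AUC over the stratified folds for each candidate.
scores = {
    name: cross_val_score(est, X, y, cv=cv, scoring="roc_auc").mean()
    for name, est in candidates.items()
}
best_name = max(scores, key=scores.get)

# Small illustrative grids; real searches would usually cover more values.
param_grids = {
    "logreg": {"C": [0.1, 1.0, 10.0]},
    "random_forest": {"max_depth": [3, 5, None]},
    "gradient_boosting": {"learning_rate": [0.05, 0.1]},
}
search = GridSearchCV(
    candidates[best_name], param_grids[best_name], cv=cv, scoring="roc_auc"
)
search.fit(X, y)

# Confusion matrix built from out-of-fold predictions of the tuned model.
matrix = confusion_matrix(
    y, cross_val_predict(search.best_estimator_, X, y, cv=cv)
)

# Persist the tuned model and load it back, as a serving app would.
model_path = os.path.join(tempfile.mkdtemp(), "best_model.joblib")
joblib.dump(search.best_estimator_, model_path)
restored = joblib.load(model_path)
```

A Streamlit app could then call joblib.load once at startup and pass user inputs to restored.predict_proba; that part is omitted here.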

Project Link(s)

Repository: Titanic Survival Predictor

Streamlit App (Hugging Face Space)

Completed: 2025

Project Snapshots

Project Technical Information