Titanic Survival Predictor


Project Technical Information

Project Name: Titanic Survival Predictor
Project Type: Classification, Supervised Learning, Feature Engineering
Tech Stack: Python 3.8+, Scikit-Learn, Pandas, NumPy, Matplotlib, Seaborn
AI Features: Cross-validation, Hyperparameter Tuning, Evaluation Metrics (Accuracy/ROC-AUC), Model Persistence

Project Summary

A classic binary classification project that predicts passenger survival on the Titanic. It covers data cleaning, imputation, encoding, scaling, and feature engineering (e.g., titles from names, family size, cabin deck). Multiple scikit‑learn models are compared using cross‑validation; the best model is persisted and exposed through a lightweight UI for quick inference.
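The engineered features named above (title, family size, cabin deck) could be derived roughly as follows. This is a minimal sketch, not the project's actual code: it assumes the column names of the public Kaggle Titanic dataset (Name, SibSp, Parch, Cabin), and the helper name add_engineered_features and the "U" placeholder for an unknown deck are hypothetical.

```python
import pandas as pd


def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with Title, FamilySize, and Deck columns added.

    Column names (Name, SibSp, Parch, Cabin) are assumed to follow the
    Kaggle Titanic dataset; the source does not name them.
    """
    out = df.copy()
    # Title: the text between the comma and the first period,
    # e.g. "Braund, Mr. Owen Harris" yields "Mr".
    out["Title"] = out["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()
    # Family size: siblings/spouses plus parents/children plus the passenger.
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1
    # Deck: first letter of the cabin code; "U" (unknown) when the cabin is missing.
    out["Deck"] = out["Cabin"].str[0].fillna("U")
    return out


# Two illustrative rows in the assumed Kaggle layout.
sample = pd.DataFrame(
    {
        "Name": ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina"],
        "SibSp": [1, 0],
        "Parch": [0, 0],
        "Cabin": [None, "C85"],
    }
)
features = add_engineered_features(sample)
```

Rare titles are often grouped into a single category before encoding; whether this project does so is not stated, so the sketch leaves the raw titles untouched.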

Skills Demonstrated

Data Cleaning, Imputation & Encoding, Feature Engineering, Exploratory Data Analysis, Model Selection, Cross‑Validation, Hyperparameter Tuning, Evaluation (Accuracy/ROC‑AUC), Model Persistence, Streamlit UI

Tools Used

Python 3.8+, Pandas, NumPy, Scikit‑Learn, Matplotlib, Seaborn, Joblib, Streamlit, Hugging Face Spaces

Solution

A reproducible ML workflow built on scikit‑learn pipelines: preprocess the data (impute, encode, scale), engineer domain features, train and validate several classifiers, and persist the best performer. A minimal Streamlit UI hosted on Hugging Face Spaces accepts passenger inputs and returns the predicted survival with explanatory outputs.
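One way the impute, encode, and scale steps might be wired into a single scikit‑learn pipeline is sketched below. The column lists, the imputation strategies (median, most frequent), and the logistic regression at the end are assumptions chosen for illustration; the toy rows are made up and are not project data.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed column split, following the Kaggle Titanic column names.
NUMERIC = ["Age", "Fare"]
CATEGORICAL = ["Sex", "Embarked"]


def build_pipeline() -> Pipeline:
    """Preprocessing plus a baseline classifier in one estimator."""
    numeric = Pipeline(
        [
            ("impute", SimpleImputer(strategy="median")),
            ("scale", StandardScaler()),
        ]
    )
    categorical = Pipeline(
        [
            ("impute", SimpleImputer(strategy="most_frequent")),
            # Ignore categories at inference time that were unseen in training.
            ("encode", OneHotEncoder(handle_unknown="ignore")),
        ]
    )
    preprocess = ColumnTransformer(
        [
            ("num", numeric, NUMERIC),
            ("cat", categorical, CATEGORICAL),
        ]
    )
    return Pipeline(
        [
            ("preprocess", preprocess),
            ("model", LogisticRegression(max_iter=1000)),
        ]
    )


# Made-up rows with one missing Age and one missing Embarked value.
toy = pd.DataFrame(
    {
        "Age": [22.0, 38.0, np.nan, 35.0, 54.0, 2.0],
        "Fare": [7.25, 71.28, 7.92, 53.10, 51.86, 21.08],
        "Sex": ["male", "female", "female", "female", "male", "male"],
        "Embarked": ["S", "C", "S", np.nan, "S", "S"],
    }
)
labels = [0, 1, 1, 1, 0, 0]

pipe = build_pipeline().fit(toy, labels)
probabilities = pipe.predict_proba(toy)[:, 1]  # estimated survival probability per row
```

Keeping the preprocessing inside the pipeline means the imputer and scaler are refit on each training fold during cross‑validation, which avoids leaking statistics from the validation fold.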

Approach

  1. Data Prep: Handle missing values, encode categoricals, scale numerics.
  2. Feature Engineering: Extract titles, family size, ticket/cabin cues, deck, etc.
  3. Modeling: Train baseline (LogReg), tree‑based (RF/GB), and others with CV.
  4. Tuning: Grid/Random search on key hyperparameters with stratified CV.
  5. Evaluate: Compare Accuracy/ROC‑AUC; inspect confusion matrix.
  6. Persist & Serve: Save best model (joblib) and expose via Streamlit UI.
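
Steps 3 to 6 can be sketched end to end as shown below. This is an illustrative outline under stated assumptions, not the project's training script: the data is a synthetic stand‑in generated with make_classification, the parameter grid is deliberately tiny, and the file name titanic_model.joblib is hypothetical.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import (
    GridSearchCV,
    StratifiedKFold,
    cross_val_predict,
    cross_val_score,
)

# Synthetic stand-in for the preprocessed Titanic features (assumption).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Steps 3 and 5: compare candidates by mean cross-validated ROC-AUC.
candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "rf": RandomForestClassifier(n_estimators=50, random_state=0),
    "gb": GradientBoostingClassifier(random_state=0),
}
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
    for name, model in candidates.items()
}

# Step 4: grid search with the same stratified folds (illustrative grid).
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=cv,
    scoring="roc_auc",
)
search.fit(X, y)

# Step 5: confusion matrix from out-of-fold predictions of the tuned model.
out_of_fold = cross_val_predict(search.best_estimator_, X, y, cv=cv)
matrix = confusion_matrix(y, out_of_fold)

# Step 6: persist the tuned model with joblib and load it back.
model_path = os.path.join(tempfile.mkdtemp(), "titanic_model.joblib")
joblib.dump(search.best_estimator_, model_path)
restored = joblib.load(model_path)
```

A serving layer such as the Streamlit UI would then only need joblib.load and a call to predict or predict_proba on the user's inputs.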

Designed and Developed by Aradhya Pavan H S