
Salifort Motors HR Analytics

Employee Turnover Prediction - Complete Documentation

Author: Abdoulaye Leye
Project Type: Capstone Project - Google Data Analytics
Date: November 2025

📊 Project Overview

This project analyzes employee turnover at Salifort Motors using logistic regression and tree-based ensemble models (Random Forest and Gradient Boosting). The goal is to predict which employees are likely to leave the company and to provide actionable insights for HR decision-making.

🏆 Key Achievements

  • ROC-AUC: 0.981 (Gradient Boosting)
  • Precision: 95.1%
  • Recall: 93.2%
  • Net annual savings: about $17.84M (estimated; see ROI Analysis)

📁 Dataset

  • Source: Salifort Motors HR Department
  • Records: 14,999 employees (11,991 after removing duplicates)
  • Features: 10 columns, i.e. 9 predictors (satisfaction level, last evaluation score, number of projects, average monthly hours, tenure, work accidents, promotions in the last 5 years, department, and salary) plus the target
  • Target: Employee departure (left = 1, stayed = 0)
  • Class Distribution: 76.19% stayed, 23.81% left in the raw 14,999 records (roughly 83.4% stayed, 16.6% left after deduplication)
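Deduplication and the class-share calculation can be sketched with the standard library alone. The records below are hypothetical toy rows, not Salifort data; in the actual notebook this step is presumably done with pandas (for example `drop_duplicates()` and `value_counts(normalize=True)`), which is an assumption. The example also shows why class shares should be reported on the deduplicated data: removing duplicates can shift them.

```python
from collections import Counter

def deduplicate(rows):
    """Drop exact duplicate records, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the whole record
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def class_distribution(rows, target="left"):
    """Return the percentage of records in each target class, rounded to 2 decimals."""
    counts = Counter(row[target] for row in rows)
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}

# Hypothetical toy records: the second row duplicates the first.
records = [
    {"satisfaction_level": 0.38, "number_project": 2, "left": 1},
    {"satisfaction_level": 0.38, "number_project": 2, "left": 1},
    {"satisfaction_level": 0.80, "number_project": 5, "left": 0},
    {"satisfaction_level": 0.72, "number_project": 4, "left": 0},
]

clean = deduplicate(records)
print(class_distribution(records))  # {1: 50.0, 0: 50.0}
print(class_distribution(clean))    # {1: 33.33, 0: 66.67}
```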

🔬 Methodology

Data Preprocessing

  • Removed 3,008 duplicate records
  • Standardized column naming conventions
  • Created engineered features:
    • satisfaction_squared: captures non-linear effects of satisfaction
    • hours_per_project: average monthly hours divided by number of projects (work intensity)
    • overworked/underworked: Workload category flags
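The engineered features listed above can be expressed as a small per-record function. This is a minimal sketch, not the project's code: the column names (`satisfaction_level`, `average_monthly_hours`, `number_project`) are assumed post-renaming names, and the overworked/underworked cut-offs are hypothetical because the documentation does not state them.

```python
# Hypothetical cut-offs (monthly hours); the documentation does not define them.
OVERWORKED_HOURS = 175
UNDERWORKED_HOURS = 150

def engineer_features(row):
    """Return a copy of one employee record with the engineered features added."""
    out = dict(row)
    # Squared term lets linear models pick up a curved satisfaction effect.
    out["satisfaction_squared"] = row["satisfaction_level"] ** 2
    # Monthly hours spread across the employee's projects.
    out["hours_per_project"] = row["average_monthly_hours"] / row["number_project"]
    # 0/1 workload flags based on the hypothetical cut-offs above.
    out["overworked"] = int(row["average_monthly_hours"] > OVERWORKED_HOURS)
    out["underworked"] = int(row["average_monthly_hours"] < UNDERWORKED_HOURS)
    return out

employee = {"satisfaction_level": 0.5, "average_monthly_hours": 260, "number_project": 2}
features = engineer_features(employee)
print(features["hours_per_project"], features["overworked"])  # 130.0 1
```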

Feature Engineering

  • One-hot encoding for categorical variables (department, salary)
  • StandardScaler standardization (zero mean, unit variance) of numerical features
  • Train-test split (80/20), stratified on the target so both sets keep the same class ratio
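To make the three steps above concrete, here is a dependency-free sketch of one-hot encoding, standardization, and a stratified 80/20 split. In scikit-learn these roughly correspond to `OneHotEncoder` (or `pandas.get_dummies`), `StandardScaler`, and `train_test_split(..., stratify=y)`; the functions below are illustrative stand-ins, not those APIs. The scaler is fitted on training values only, which avoids leaking test-set statistics; whether the original notebook does the same is not stated. Rounding of the test size may differ slightly from scikit-learn's.

```python
import random
from statistics import mean, pstdev

def one_hot(values, categories):
    """Encode each categorical value as a 0/1 vector over a fixed category list."""
    return [[int(v == c) for c in categories] for v in values]

def fit_standardizer(train_values):
    """Learn mean and population std (as StandardScaler does) from training data only."""
    mu = mean(train_values)
    sigma = pstdev(train_values) or 1.0  # guard against a constant column
    return lambda x: (x - mu) / sigma

def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx), splitting each class in the same proportion."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_size)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

print(one_hot(["low", "high"], ["low", "medium", "high"]))  # [[1, 0, 0], [0, 0, 1]]
scale = fit_standardizer([2, 4, 4, 4, 5, 5, 7, 9])
print(scale(9))  # 2.0
labels = [0] * 40 + [1] * 10  # hypothetical 80/20 class mix
train, test = stratified_split(labels)
print(len(train), len(test), sum(labels[i] for i in test))  # 40 10 2
```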

Model Development

  • Logistic Regression with GridSearchCV optimization
  • Random Forest with hyperparameter tuning
  • Gradient Boosting with learning rate optimization
  • 5-fold cross-validation for all models
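GridSearchCV combines the two ideas in this list: for every candidate hyperparameter value it fits the model on 4 of the 5 folds, scores it on the held-out fold, and keeps the value with the best mean score. The sketch below reproduces that loop without scikit-learn, using a hypothetical toy "binned majority" classifier on a single feature in place of the project's real models, and stratified folds (scikit-learn uses StratifiedKFold by default for classifiers when `cv` is an integer). The data are invented for illustration.

```python
from statistics import mean

def stratified_kfold(labels, k=5):
    """Deal each class's indices round-robin into k folds so every fold keeps the class ratio."""
    folds = [[] for _ in range(k)]
    for label in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == label]
        for j, i in enumerate(members):
            folds[j % k].append(i)
    return folds

def f1_score(y_true, y_pred):
    """Binary F1 = 2TP / (2TP + FP + FN), with 1 meaning 'left'."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def fit_binned(x, y, n_bins):
    """Toy model: cut [0, 1] into n_bins equal bins and store each bin's majority label."""
    votes = [[0, 0] for _ in range(n_bins)]  # [stayed, left] counts per bin
    for xi, yi in zip(x, y):
        votes[min(int(xi * n_bins), n_bins - 1)][yi] += 1
    return [int(v[1] > v[0]) for v in votes]

def predict_binned(model, x):
    n_bins = len(model)
    return [model[min(int(xi * n_bins), n_bins - 1)] for xi in x]

def grid_search_cv(x, y, grid, k=5):
    """Score each candidate n_bins by mean F1 over k stratified folds."""
    folds = stratified_kfold(y, k)
    results = {}
    for n_bins in grid:
        fold_scores = []
        for held_out in folds:
            held = set(held_out)
            train = [i for i in range(len(y)) if i not in held]
            model = fit_binned([x[i] for i in train], [y[i] for i in train], n_bins)
            preds = predict_binned(model, [x[i] for i in held_out])
            fold_scores.append(f1_score([y[i] for i in held_out], preds))
        results[n_bins] = mean(fold_scores)
    best = max(results, key=results.get)
    return best, results

# Hypothetical data: satisfaction levels 0.00-0.98; everyone below 0.2 left.
satisfaction = [i / 50 for i in range(50)]
left = [int(s < 0.2) for s in satisfaction]
best, scores = grid_search_cv(satisfaction, left, grid=[1, 2, 5])
print(best, scores)  # 5 {1: 0.0, 2: 0.0, 5: 1.0}
```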

Model Evaluation

  • Classification metrics (Precision, Recall, F1-Score, Accuracy)
  • ROC curves and AUC scores for model comparison
  • Confusion matrices for error analysis
  • Feature importance analysis
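The metrics listed above all follow from the confusion matrix, except ROC-AUC, which is computed from predicted scores. A minimal sketch, assuming binary labels with 1 = left: ROC-AUC is computed here through its pairwise interpretation (the probability that a randomly chosen leaver receives a higher score than a randomly chosen stayer, ties counting one half). This O(n²) form is for illustration only; scikit-learn's `roc_auc_score` gives the same value efficiently.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = left."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def roc_auc(y_true, scores):
    """Share of (leaver, stayer) pairs where the leaver gets the higher score."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predictions on six employees.
print(classification_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```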

📈 Model Results

Model                         Accuracy  Precision  Recall  F1-Score  ROC-AUC
Gradient Boosting (selected)  98.1%     95.1%      93.2%   0.941     0.981
Random Forest                 98.6%     98.9%      92.7%   0.957     0.978
Logistic Regression           75.9%     48.2%      87.9%   0.623     0.872

🎯 Winner: Gradient Boosting

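Gradient Boosting was selected because it achieved the highest ROC-AUC (0.981) and the highest recall (93.2%) of the three models while keeping precision high (95.1%). Random Forest scored slightly higher on accuracy, precision, and F1-score, but missed more of the employees who actually left. On the test set, Gradient Boosting correctly identifies 371 of the 398 employees who left, with 19 false positives.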
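The Gradient Boosting precision and recall can be checked from the counts stated in this document: 371 of 398 leavers caught (so 27 missed) and 19 false alarms, the figure used in the ROI analysis. A short sketch of that check:

```python
TP = 371        # leavers correctly flagged
FN = 398 - TP   # leavers missed
FP = 19         # stayers wrongly flagged (figure from the ROI analysis)

precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.1%} recall={recall:.1%} f1={f1:.3f}")
# precision=95.1% recall=93.2% f1=0.942
```

Computed from these counts, F1 is 742/788 ≈ 0.942; the table's 0.941 matches the F1 obtained from the already-rounded 95.1% and 93.2%.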

💼 Business Impact

ROI Analysis

  • Departures Flagged: 371 departing employees correctly identified (vs. 13 for the baseline)
  • Cost per Departure: $50,000 (recruitment, training, knowledge loss)
  • Annual Savings (assuming every flagged departure is prevented): (371 - 13) × $50K = $17.9M
  • False Alarm Cost: 19 false positives × 5 HR hours × $50/hour × 12 monthly cycles = $57K
  • Net Annual Benefit: $17.9M - $57K ≈ $17.84M per year
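The ROI arithmetic in this list can be reproduced in a few lines. The sketch keeps the document's inputs and adds one hypothetical parameter, `prevention_rate`, because the savings formula implicitly assumes that every correctly flagged departure is actually prevented; values below 1.0 show how sensitive the estimate is to that assumption.

```python
COST_PER_DEPARTURE = 50_000   # recruitment, training, knowledge loss
FLAGGED, BASELINE = 371, 13   # departures caught by the model vs. the baseline
FALSE_ALARMS, HOURS_EACH, HOURLY_RATE, MONTHS = 19, 5, 50, 12

def false_alarm_cost():
    """Yearly HR time spent on employees wrongly flagged as at risk."""
    return FALSE_ALARMS * HOURS_EACH * HOURLY_RATE * MONTHS

def net_annual_benefit(prevention_rate=1.0):
    """Net yearly benefit; prevention_rate (hypothetical) is the share of flagged leavers retained."""
    gross = (FLAGGED - BASELINE) * prevention_rate * COST_PER_DEPARTURE
    return gross - false_alarm_cost()

print(false_alarm_cost())        # 57000
print(net_annual_benefit())      # 17843000.0
print(net_annual_benefit(0.5))   # 8893000.0
```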

Key Recommendations

Immediate Actions (0-3 months)

  • Deploy monthly risk scoring system using Gradient Boosting predictions
  • Implement mandatory career reviews at 4-year tenure milestone
  • Audit and accelerate promotion timelines (73% risk reduction)
  • Investigate post-accident support programs
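A monthly risk-scoring run could look roughly like the sketch below. It assumes each employee already has a predicted probability of leaving (for a fitted scikit-learn `GradientBoostingClassifier` this would come from `predict_proba(X)[:, 1]`); the tier cut-offs, record fields, and employee IDs are hypothetical, since the documentation does not define them. The 4-year tenure flag mirrors the career-review recommendation above.

```python
# Hypothetical tier cut-offs on the predicted probability of leaving.
HIGH_RISK, MEDIUM_RISK = 0.7, 0.4

def risk_tier(probability):
    if probability >= HIGH_RISK:
        return "high"
    if probability >= MEDIUM_RISK:
        return "medium"
    return "low"

def monthly_report(employees):
    """Rank employees by predicted leave probability; attach a tier and a tenure-review flag."""
    report = []
    for emp in sorted(employees, key=lambda e: e["p_leave"], reverse=True):
        report.append({
            "id": emp["id"],
            "p_leave": emp["p_leave"],
            "tier": risk_tier(emp["p_leave"]),
            "tenure_review": emp["tenure"] == 4,  # 4-year career-review milestone
        })
    return report

# Hypothetical scored employees.
staff = [
    {"id": "E1", "p_leave": 0.15, "tenure": 4},
    {"id": "E2", "p_leave": 0.92, "tenure": 5},
    {"id": "E3", "p_leave": 0.55, "tenure": 2},
]
for row in monthly_report(staff):
    print(row["id"], row["tier"], row["tenure_review"])
```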

Strategic Initiatives (3-12 months)

  • Develop differentiated retention strategies for low vs. high satisfaction employees
  • Optimize project allocation to eliminate extreme workloads (2 or 7+ projects)
  • Integrate model outputs into performance management systems
  • Train managers on interpreting and acting on risk scores

🛠️ Technologies Used

  • Python 3.11 - Core programming language
  • Scikit-learn - Machine learning algorithms and GridSearchCV
  • Pandas & NumPy - Data manipulation and analysis
  • Matplotlib & Seaborn - Data visualization
  • Jupyter Notebook - Interactive development environment

📚 Additional Resources

Try the notebook yourself!

Click the badges at the top of the page to open the notebook in Google Colab or Binder, or to view the code on GitHub.
