📊 Project Overview
This project analyzes employee turnover at Salifort Motors using three supervised classifiers: Logistic Regression, Random Forest, and Gradient Boosting. The goal is to predict which employees are likely to leave the company and to give HR actionable insights for retention decisions.
🏆 Key Achievements
📁 Dataset
- Source: Salifort Motors HR Department
- Records: 14,999 employees (11,991 after removing duplicates)
- Features: 10 variables including satisfaction level, evaluation scores, project count, hours worked, tenure, accidents, promotions, department, and salary
- Target: Employee departure (left = 1, stayed = 0)
- Class Distribution: 76.19% stayed, 23.81% left
🔬 Methodology
Data Preprocessing
- Removed 3,008 duplicate records
- Standardized column naming conventions
- Created engineered features:
  - satisfaction_squared: Non-linear satisfaction effects
  - hours_per_project: Work intensity metric
  - overworked/underworked: Workload category flags
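The preprocessing steps above could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the input column names (`satisfaction_level`, `average_monthly_hours`, `number_project`) and the hour thresholds (240 and 130) are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical thresholds; the source does not state the cut-offs it used.
OVERWORKED_HOURS = 240
UNDERWORKED_HOURS = 130


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows and add the engineered columns."""
    out = df.drop_duplicates().reset_index(drop=True)
    # Squared term lets a linear model capture a curved satisfaction effect.
    out["satisfaction_squared"] = out["satisfaction_level"] ** 2
    # Average monthly hours spent per assigned project.
    out["hours_per_project"] = out["average_monthly_hours"] / out["number_project"]
    # Binary workload flags.
    out["overworked"] = (out["average_monthly_hours"] > OVERWORKED_HOURS).astype(int)
    out["underworked"] = (out["average_monthly_hours"] < UNDERWORKED_HOURS).astype(int)
    return out


# Tiny made-up sample: the third row duplicates the first.
sample = pd.DataFrame({
    "satisfaction_level": [0.4, 0.9, 0.4],
    "average_monthly_hours": [120, 260, 120],
    "number_project": [2, 5, 2],
})
features = preprocess(sample)
print(features)
```

Dividing by the project count is safe only if every employee has at least one project, which is assumed here.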
Feature Engineering
- One-hot encoding for categorical variables (department, salary)
- Standardization of numerical features with StandardScaler (zero mean, unit variance)
- Train-test split (80/20) with stratification
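One way to wire these three steps together in scikit-learn is shown below. It is a sketch on synthetic data with assumed column names, and it fits the scaler and encoder on the training split only, a common precaution against leakage; the source does not say in which order the notebook applies them.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the HR data (column names are assumptions).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "satisfaction_level": rng.random(n),
    "average_monthly_hours": rng.integers(96, 311, n),
    "department": rng.choice(["sales", "technical", "support"], n),
    "salary": rng.choice(["low", "medium", "high"], n),
    "left": np.r_[np.ones(40, dtype=int), np.zeros(160, dtype=int)],
})

X = df.drop(columns="left")
y = df["left"]

# 80/20 split that preserves the share of leavers in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["satisfaction_level", "average_monthly_hours"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["department", "salary"]),
])

# Fit on the training data only, then reuse the fitted transform on the test data.
X_train_t = preprocess.fit_transform(X_train)
X_test_t = preprocess.transform(X_test)
print(X_train_t.shape, X_test_t.shape)
```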
Model Development
- Logistic Regression with GridSearchCV optimization
- Random Forest with hyperparameter tuning
- Gradient Boosting with learning rate optimization
- 5-fold cross-validation for all models
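The tuning loop described above might look like the following. The parameter grids, the `roc_auc` scoring choice, and the synthetic data from `make_classification` are all assumptions made to keep the example small and runnable; the grids actually searched in the notebook are not given in this document.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Imbalanced toy data standing in for the employee records.
X, y = make_classification(
    n_samples=300, n_features=8, weights=[0.8, 0.2], random_state=0
)

# Illustrative grids only.
candidates = {
    "logistic_regression": (
        LogisticRegression(max_iter=1000),
        {"C": [0.1, 1.0, 10.0]},
    ),
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [3, None]},
    ),
    "gradient_boosting": (
        GradientBoostingClassifier(random_state=0),
        {"learning_rate": [0.05, 0.1]},
    ),
}

# The same 5 stratified folds are used for every model so scores are comparable.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid, cv=cv, scoring="roc_auc")
    search.fit(X, y)
    results[name] = (search.best_params_, search.best_score_)

for name, (params, score) in results.items():
    print(f"{name}: best CV ROC-AUC={score:.3f} with {params}")
```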
Model Evaluation
- Classification metrics (Precision, Recall, F1-Score, Accuracy)
- ROC-AUC curves for model comparison
- Confusion matrices for error analysis
- Feature importance analysis
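For reference, the first three evaluation outputs can be produced with standard scikit-learn functions. The labels and scores below are made up for illustration, and the 0.5 decision threshold is an assumption.

```python
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

# Toy ground truth (1 = left) and predicted probabilities of leaving.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_score = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.90, 0.60]

# Turn probabilities into class labels at the assumed 0.5 threshold.
y_pred = [int(score >= 0.5) for score in y_score]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)

# ROC-AUC uses the raw scores, not the thresholded labels.
auc = roc_auc_score(y_true, y_score)

report = classification_report(y_true, y_pred, target_names=["stayed", "left"])

print(cm)
print(f"ROC-AUC: {auc:.4f}")
print(report)
```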
📈 Model Results
| Model | Accuracy | Precision | Recall | F1-Score | ROC-AUC |
|---|---|---|---|---|---|
| Gradient Boosting (winner) | 98.1% | 95.1% | 93.2% | 0.941 | 0.981 |
| Random Forest | 98.6% | 98.9% | 92.7% | 0.957 | 0.978 |
| Logistic Regression | 75.9% | 48.2% | 87.9% | 0.623 | 0.872 |
🎯 Winner: Gradient Boosting
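Gradient Boosting was selected as the final model: it has the highest ROC-AUC (0.981) and recall (93.2%) of the three models while keeping precision high (95.1%). Random Forest scores slightly higher on accuracy, precision, and F1, but recall matters most here because a missed leaver is more costly than a false alarm. On the test set, Gradient Boosting correctly identifies 371 of the 398 employees who left while producing few false positives.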
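As a sanity check, the headline metrics can be recomputed from the counts quoted in this document: 371 of 398 leavers identified, plus the 19 false alarms from the ROI analysis. Treating those 19 false alarms as the Gradient Boosting false positives is an assumption.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


tp = 371        # leavers correctly flagged
fn = 398 - tp   # leavers the model missed
fp = 19         # stayers flagged as leavers (assumed from the ROI section)

precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Precision and recall match the table at three decimals, and F1 lands within 0.001 of the reported 0.941, which is consistent with rounding.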
💼 Business Impact
ROI Analysis
- Departures Prevented: 371 departing employees correctly identified (vs. 13 baseline), assuming every identified departure can be averted
- Cost per Departure: $50,000 (recruitment, training, knowledge loss)
- Annual Savings: (371 - 13) × $50K = $17.9M
- False Alarm Cost: 19 × 5 hours × $50/hour × 12 months = $57K
- Net Savings: $17.9M - $57K ≈ $17.84M per year
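The arithmetic above can be reproduced in a few lines. The `retention_rate` parameter is a hypothetical addition, not part of the original analysis: at its default of 1.0 it reproduces the figures above, and lowering it shows how the net figure changes if only some flagged employees are actually retained.

```python
COST_PER_DEPARTURE = 50_000   # recruitment, training, knowledge loss
IDENTIFIED = 371              # leavers correctly flagged by the model
BASELINE = 13                 # leavers identified without the model
FALSE_ALARMS = 19             # stayers incorrectly flagged
HOURS_PER_FALSE_ALARM = 5
HOURLY_RATE = 50
MONTHS = 12


def net_savings(retention_rate: float = 1.0) -> float:
    """Annual net savings; retention_rate is a hypothetical sensitivity knob."""
    gross = (IDENTIFIED - BASELINE) * retention_rate * COST_PER_DEPARTURE
    false_alarm_cost = FALSE_ALARMS * HOURS_PER_FALSE_ALARM * HOURLY_RATE * MONTHS
    return gross - false_alarm_cost


print(f"All flagged leavers retained: ${net_savings():,.0f}")
print(f"Half of flagged leavers retained: ${net_savings(0.5):,.0f}")
```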
Key Recommendations
Immediate Actions (0-3 months)
- Deploy monthly risk scoring system using Gradient Boosting predictions
- Implement mandatory career reviews at 4-year tenure milestone
- Audit and accelerate promotion timelines (73% risk reduction)
- Investigate post-accident support programs
Strategic Initiatives (3-12 months)
- Develop differentiated retention strategies for low vs. high satisfaction employees
- Optimize project allocation to eliminate extreme workloads (2 or 7+ projects)
- Integrate model outputs into performance management systems
- Train managers on interpreting and acting on risk scores
🛠️ Technologies Used
- Python 3.11 - Core programming language
- Scikit-learn - Machine learning algorithms and GridSearchCV
- Pandas & NumPy - Data manipulation and analysis
- Matplotlib & Seaborn - Data visualization
- Jupyter Notebook - Interactive development environment
📚 Additional Resources
Try the notebook yourself!
Click the badges at the top to open the notebook in Google Colab or Binder, or to view the code on GitHub.