
Edge Staker MLB

Using the power of Machine Learning to give you an edge

We’ve engineered a professional-grade analytics platform from the ground up.

Here’s what’s under the hood.

Ensemble AI Models

Instead of relying on a single model, our system combines predictions from a team of specialized models (XGBoost and LightGBM). Each model is pre-selected as a ‘champion’ for its strong performance on specific historical data sets. Their individual predictions are then weighted and combined into a single consensus forecast that is more robust and reliable than any one model on its own.
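The consensus step can be pictured as a weighted average of per-model win probabilities. The sketch below is a minimal illustration, assuming each champion model outputs a home-win probability and that the weights are fixed numbers (for example, proportional to validation scores); the model names and weight values are hypothetical, since the exact weighting scheme is not described here.

```python
def weighted_consensus(predictions: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-model win probabilities (hypothetical weighting scheme)."""
    total_weight = sum(weights[name] for name in predictions)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(prob * weights[name] for name, prob in predictions.items()) / total_weight


# Hypothetical champion outputs for one game (probability that the home team wins).
predictions = {"xgboost_champion": 0.58, "lightgbm_champion": 0.62}
# Hypothetical weights, e.g. derived from each model's validation performance.
weights = {"xgboost_champion": 0.4, "lightgbm_champion": 0.6}

print(round(weighted_consensus(predictions, weights), 3))  # 0.604
```

Because the result is divided by the total weight, the weights do not need to sum to 1, and a weighted average of probabilities is always itself a valid probability.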

Deep Statistical Features

The feature engine processes raw MLB StatsAPI data into more than 100 predictive variables for each game. These include derived metrics such as park-adjusted performance, pitcher fatigue indices, recent-form momentum (e.g., OPS vs. 30-day average), and Pythagorean luck differentials (actual winning percentage minus the winning percentage expected from runs scored and allowed), which quantify how much a team is over- or underperforming.
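A Pythagorean luck differential can be computed as actual winning percentage minus the Pythagorean expectation RS^x / (RS^x + RA^x). This is a minimal sketch; the exponent is an assumption (2 is Bill James’s original choice and 1.83 is a common refinement), and the source does not say which value the feature engine uses.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float, exponent: float = 1.83) -> float:
    """Expected winning percentage from runs scored and allowed (exponent is an assumption)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def pythagorean_luck(wins: int, losses: int, runs_scored: float, runs_allowed: float,
                     exponent: float = 1.83) -> float:
    """Actual minus expected winning percentage; positive means overperforming."""
    actual = wins / (wins + losses)
    return actual - pythagorean_win_pct(runs_scored, runs_allowed, exponent)


# A 50-30 team that has scored and allowed 400 runs "should" be a .500 team.
print(round(pythagorean_luck(50, 30, 400, 400, exponent=2), 3))  # 0.125
```

A large positive value flags a team whose record is running ahead of its run differential and may regress toward it.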

Advanced Betting Metrics

The application translates raw model probabilities into actionable betting metrics. For each prediction with available market odds, it automatically calculates the implied market probability, model edge (alpha), expected value (EV), and the optimal Kelly Criterion stake fraction to guide bankroll management.
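The four metrics follow from standard formulas. With decimal odds d and model probability p: implied probability = 1/d, edge = p - 1/d, expected value per unit staked = p·d - 1, and full-Kelly fraction = (b·p - (1 - p)) / b with b = d - 1. The sketch below assumes American moneyline odds and uses the raw implied probability, which still contains the bookmaker’s margin (vig); whether the application removes the vig is not stated.

```python
def american_to_decimal(odds: int) -> float:
    """Convert American moneyline odds (e.g. -150, +120) to decimal odds."""
    if odds >= 100:
        return 1 + odds / 100
    if odds <= -100:
        return 1 + 100 / -odds
    raise ValueError("American odds must be >= +100 or <= -100")


def betting_metrics(model_prob: float, american_odds: int) -> dict[str, float]:
    """Implied probability, edge, EV, and full-Kelly fraction for one side of a moneyline.

    Assumption: the implied probability is not de-vigged.
    """
    decimal_odds = american_to_decimal(american_odds)
    implied_prob = 1 / decimal_odds              # market probability, vig included
    edge = model_prob - implied_prob             # model "alpha" over the market
    ev = model_prob * decimal_odds - 1           # expected profit per 1 unit staked
    b = decimal_odds - 1                         # net profit per unit if the bet wins
    kelly = (b * model_prob - (1 - model_prob)) / b
    return {
        "implied_prob": implied_prob,
        "edge": edge,
        "ev": ev,
        "kelly_fraction": max(kelly, 0.0),       # no stake when the edge is not positive
    }


metrics = betting_metrics(0.55, +120)
# implied ≈ 0.4545, edge ≈ 0.0955, EV ≈ 0.21, Kelly fraction ≈ 0.175
print({name: round(value, 4) for name, value in metrics.items()})
```

Note that the full-Kelly fraction equals EV divided by b, so a bet with zero or negative expected value always receives a zero stake.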

Live Odds Integration

The system fetches live moneyline odds from The Odds API, aggregating prices from multiple US bookmakers. In the event of an API service disruption, a built-in fallback mechanism automatically scrapes ESPN’s scoreboard to ensure continuous odds availability.
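The failover behavior can be sketched as “try the primary source, fall back on error or empty data.” This is a minimal illustration under stated assumptions: the fetcher functions are hypothetical stand-ins (no real Odds API endpoint or ESPN page is called here), and taking the best available price per team is only one possible way to aggregate bookmaker quotes; the source does not say which aggregation it uses.

```python
import logging
from typing import Callable

# Hypothetical shape: {game_id: {team: american_moneyline}}
Odds = dict[str, dict[str, int]]


def fetch_with_fallback(primary: Callable[[], Odds], fallback: Callable[[], Odds]) -> tuple[Odds, str]:
    """Return odds from the primary source, or from the fallback if it fails or is empty."""
    try:
        odds = primary()
        if odds:
            return odds, "primary"
        logging.warning("primary odds source returned no data; using fallback")
    except Exception as exc:  # e.g. network error, exhausted quota, bad response
        logging.warning("primary odds source failed (%s); using fallback", exc)
    return fallback(), "fallback"


def best_prices(books: list[dict[str, int]]) -> dict[str, int]:
    """Best price per team across bookmakers (one hypothetical aggregation).

    For valid American odds (>= +100 or <= -100), a numerically larger
    value never pays less, so max() picks the most favorable price.
    """
    best: dict[str, int] = {}
    for book in books:
        for team, price in book.items():
            best[team] = max(price, best.get(team, price))
    return best


# Hypothetical stand-ins for the real data sources.
def odds_api_down() -> Odds:
    raise ConnectionError("service unavailable")


def scoreboard_scrape() -> Odds:
    return {"NYY@BOS": {"NYY": -130, "BOS": 110}}


odds, source = fetch_with_fallback(odds_api_down, scoreboard_scrape)
print(source, odds)  # fallback {'NYY@BOS': {'NYY': -130, 'BOS': 110}}
print(best_prices([{"NYY": -130, "BOS": 110}, {"NYY": -125, "BOS": 105}]))  # {'NYY': -125, 'BOS': 110}
```

Passing the sources in as callables keeps the failover logic independent of how each source is actually fetched or scraped.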

Isotonic Calibration

Raw model outputs are not used directly. Each model’s predictions are passed through a post-processing step using isotonic calibration (via scikit-learn), which fits a non-decreasing mapping from raw scores to observed outcomes. This corrects systematic over- or underconfidence so that predicted probabilities more closely track real-world win frequencies: games assigned a 60% win probability should be won roughly 60% of the time.
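Isotonic regression fits a non-decreasing step function from raw scores to observed outcomes using the pool-adjacent-violators algorithm. scikit-learn exposes this as `sklearn.isotonic.IsotonicRegression`; the dependency-free sketch below only illustrates the idea and is not the application’s code. It returns a step function, whereas scikit-learn interpolates linearly between fitted points. The calibrator would normally be fit on a held-out calibration set rather than on the training data.

```python
from bisect import bisect_right


def fit_isotonic(scores: list[float], outcomes: list[int]) -> tuple[list[float], list[float]]:
    """Illustrative pool-adjacent-violators fit; returns block start scores and calibrated values."""
    # Group identical scores first so tied scores share one calibrated value.
    grouped: dict[float, list[float]] = {}
    for score, outcome in zip(scores, outcomes):
        totals = grouped.setdefault(score, [0.0, 0.0])
        totals[0] += outcome   # number of wins at this score
        totals[1] += 1         # number of games at this score

    blocks: list[list[float]] = []  # each block: [start_score, wins, games]
    for score in sorted(grouped):
        wins, games = grouped[score]
        blocks.append([score, wins, games])
        # Pool adjacent blocks while their win rates violate monotonicity.
        while len(blocks) > 1 and blocks[-2][1] / blocks[-2][2] > blocks[-1][1] / blocks[-1][2]:
            _, wins_last, games_last = blocks.pop()
            blocks[-1][1] += wins_last
            blocks[-1][2] += games_last

    starts = [block[0] for block in blocks]
    values = [block[1] / block[2] for block in blocks]
    return starts, values


def calibrate(score: float, starts: list[float], values: list[float]) -> float:
    """Calibrated probability for a raw score (scores below the first block clip to it)."""
    index = bisect_right(starts, score) - 1
    return values[max(index, 0)]


raw = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60]
won = [0, 1, 0, 1, 1, 1]
starts, values = fit_isotonic(raw, won)
print(values)                            # [0.0, 0.5, 1.0, 1.0, 1.0]
print(calibrate(0.25, starts, values))   # 0.5
```

With scikit-learn, the equivalent would be `IsotonicRegression(out_of_bounds="clip").fit(raw, won)` followed by `.predict(...)` on new raw probabilities.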

Derived Feature Engine

The system includes a robust feature engineering pipeline that automatically calculates dozens of derived stats not available in standard data feeds, such as Fielding Independent Pitching (FIP), Batting Average on Balls In Play (BABIP), and Pythagorean win expectancy. The result is a rich, high-dimensional feature set for the models to learn from.
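Both stats have standard definitions: FIP = (13·HR + 3·(BB + HBP) - 2·K) / IP + C, and BABIP = (H - HR) / (AB - K - HR + SF). In this sketch the FIP constant C is an assumption: it is recalculated each season so that league-average FIP matches league-average ERA and usually lands a little above 3, so the default of 3.10 below is illustrative only. The hypothetical innings helper handles box-score notation, where “6.2” means 6⅔ innings.

```python
def innings_from_box_score(ip: str) -> float:
    """Convert box-score innings ("6.2" = 6 and 2/3 innings) to a true decimal value."""
    whole, _, outs = ip.partition(".")
    return int(whole) + int(outs or 0) / 3


def fip(hr: int, bb: int, hbp: int, k: int, innings: float, constant: float = 3.10) -> float:
    """Fielding Independent Pitching; the season constant here is an assumed placeholder."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / innings + constant


def babip(h: int, hr: int, ab: int, k: int, sf: int) -> float:
    """Batting Average on Balls In Play: hits on balls in play / balls in play."""
    return (h - hr) / (ab - k - hr + sf)


print(round(innings_from_box_score("6.2"), 3))     # 6.667
print(round(fip(20, 50, 5, 180, 180.0), 2))        # 3.46
print(round(babip(150, 20, 550, 120, 5), 3))       # 0.313
```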

Bayesian Optimization

Model performance is tuned through automated hyperparameter search using the Hyperopt library. This Bayesian optimization process uses the results of earlier trials to decide which model configurations (e.g., learning rate, tree depth, regularization) to try next, searching for the settings that achieve the highest predictive accuracy during cross-validation.

Time-Series Validation

Model integrity is maintained using a strict chronological data-splitting methodology. The dataset is partitioned into training, calibration, and testing sets by game date, so the model never trains on information from the future. This out-of-time validation gives a realistic estimate of real-world predictive power.
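A date-based split can be expressed with two cutoff dates. This is a minimal sketch; the cutoff dates and the game record shape (a dict with a `date` key) are hypothetical, and the source does not say where its boundaries fall.

```python
from datetime import date


def chronological_split(games: list[dict], calib_start: date, test_start: date):
    """Split games by date into train (< calib_start), calibration, and test (>= test_start)."""
    if calib_start >= test_start:
        raise ValueError("calib_start must come before test_start")
    ordered = sorted(games, key=lambda game: game["date"])
    train = [g for g in ordered if g["date"] < calib_start]
    calibration = [g for g in ordered if calib_start <= g["date"] < test_start]
    test = [g for g in ordered if g["date"] >= test_start]
    return train, calibration, test


# Hypothetical game records and cutoff dates.
games = [
    {"id": "g3", "date": date(2024, 4, 1)},
    {"id": "g1", "date": date(2023, 4, 1)},
    {"id": "g4", "date": date(2024, 7, 1)},
    {"id": "g2", "date": date(2023, 8, 1)},
]
train, calibration, test = chronological_split(games, date(2023, 7, 1), date(2024, 1, 1))
print([g["id"] for g in train], [g["id"] for g in calibration], [g["id"] for g in test])
# ['g1'] ['g2'] ['g3', 'g4']
```

Unlike a random K-fold split, every training game precedes every calibration game, which in turn precedes every test game, so no split can learn from games that had not yet been played.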

User-Friendly Dashboard

The front-end is a clean, single-page application that presents all predictions and associated betting metrics. Key functionality includes bankroll and Kelly fraction inputs for dynamic bet sizing, and the ability to sort all available games by game time, model edge, confidence, or expected value.
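The bet-sizing and sorting logic behind the dashboard could look like the following. It is written in Python for consistency with the other sketches (the actual front-end code is not shown), and the field names, as well as reading the Kelly fraction input as a fractional-Kelly multiplier, are assumptions.

```python
def size_and_sort(games: list[dict], bankroll: float, kelly_multiplier: float,
                  sort_by: str = "edge") -> list[dict]:
    """Attach a suggested stake to each game, then sort by the chosen column.

    Assumed sizing: stake = bankroll * kelly_multiplier * full-Kelly fraction
    (e.g. 0.25 = quarter Kelly). Game time sorts earliest first; the numeric
    metrics sort highest first. Field names are hypothetical.
    """
    if sort_by not in {"game_time", "edge", "confidence", "ev"}:
        raise ValueError(f"unsupported sort column: {sort_by}")
    if not 0 <= kelly_multiplier <= 1:
        raise ValueError("kelly_multiplier should be between 0 and 1")
    sized = [
        {**game, "stake": round(bankroll * kelly_multiplier * game["kelly_fraction"], 2)}
        for game in games
    ]
    return sorted(sized, key=lambda game: game[sort_by], reverse=(sort_by != "game_time"))


# Hypothetical rows, as produced by the betting-metrics step.
games = [
    {"matchup": "NYY@BOS", "game_time": "2024-06-01T19:10", "edge": 0.031,
     "confidence": 0.58, "ev": 0.05, "kelly_fraction": 0.04},
    {"matchup": "LAD@SF", "game_time": "2024-06-01T16:05", "edge": 0.095,
     "confidence": 0.55, "ev": 0.21, "kelly_fraction": 0.175},
]
for row in size_and_sort(games, bankroll=1000, kelly_multiplier=0.25, sort_by="ev"):
    print(row["matchup"], row["stake"])
# LAD@SF 43.75
# NYY@BOS 10.0
```

Staking only a fraction of the full-Kelly amount is a common way to reduce bankroll volatility when model probabilities carry estimation error.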