v2.4.1 — Apache 2.0 Open Source
$ automl fit --target revenue --deploy

Raw data in.
Production model out.

AutoMLHub eats your dataset and spits out a deployed model before most teams finish arguing about hyperparameters. Zero vendor lock-in. Fully open source.

automlhub — training run #2847
LIVE
Dataset: Ingested
file: revenue_q4.csv
rows: 847,293
features: 34
target: revenue
nulls: 0 (cleaned)
task: regression

Feature importance
customer_ltv: 82%
churn_score: 61%
region_code: 44%
product_tier: 38%

Model Leaderboard (sorted by F1 ↓)
| # | Model | F1 | Time |
| --- | --- | --- | --- |
| 1 | XGBoost_v3 | 0.947 | 2m 14s |
| 2 | LightGBM_v2 | 0.931 | 1m 52s |
| 3 | RandomForest_v1 | 0.918 | 3m 08s |
| 4 | CatBoost_v1 | 0.904 | |
| 5 | NeuralNet_v2 | | |

[14:32:07] XGBoost_v3 — converged at epoch 847 · F1=0.947
[14:32:09] CatBoost_v1 — training epoch 312/480 · ETA 1m 23s
[14:32:11] automl deploy --model XGBoost_v3 --env production
[14:32:12] ✓ deployed → api.company.io/predict · p50=14ms
47 models evaluated per run
3,200 lines of boilerplate saved
8.3× faster than manual pipeline
// Feature Matrix

Every manual step, automated.

Six capabilities. Six terminal commands. Scroll through and watch the manual alternative become progressively more absurd.

Data Ingestion

From any source to a clean tensor in one command

terminal
$ automl ingest --source revenue_q4.csv --profile
| Capability | AutoMLHub | Manual Workflow |
| --- | --- | --- |
| Supported formats | CSV, Parquet, JSON, Avro, SQL, S3, GCS | Write custom connectors per format (~200 LOC each) |
| Null handling | Auto-detect + impute (median/mode/KNN) | df.fillna() per column, 40–80 lines |
| Type inference | Automatic with datetime parsing | pd.to_datetime() + dtypes audit, ~60 lines |
| Outlier detection | IQR + Z-score, configurable policy | Custom logic per feature, 100+ lines |
| Schema validation | Auto-generated + drift alerts | Write Pydantic/Great Expectations schema manually |

AutoMLHub
LOC: 1 line
Time: 3s
Coverage: 12+
Manual
LOC: ~400 lines
Time: 2–3 days
Coverage: 1 at a time
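
To make the null-handling and outlier rows concrete, here is a minimal standard-library sketch of median imputation plus IQR and Z-score flagging. It is not AutoMLHub's implementation; the function names, the `k=1.5` fence, and the thresholds are illustrative assumptions.

```python
import statistics


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


def zscore_outliers(values, threshold=3.0):
    """Return indices whose absolute Z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]


# Hypothetical revenue column with two nulls and one extreme value.
revenue = [120.0, None, 135.0, 128.0, 9000.0, 131.0, None, 126.0]
clean = impute_median(revenue)
flagged = iqr_outliers(clean)
print(clean)
print(flagged)
```

On very small samples the Z-score rule rarely fires at a threshold of 3, which is one reason to pair it with the IQR rule.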

Feature Engineering

Automated transformations that would take a sprint to write

terminal
$ automl features --auto --importance-threshold 0.02
| Capability | AutoMLHub | Manual Workflow |
| --- | --- | --- |
| Categorical encoding | Auto-select: OHE / target / ordinal by cardinality | Decide + implement per column, ~80 lines |
| Numeric scaling | StandardScaler / MinMax / RobustScaler auto-chosen | sklearn Pipeline boilerplate, ~40 lines |
| Datetime features | 28 derived features from timestamp columns | Custom extraction per column, 60–120 lines |
| Feature interactions | Polynomial + cross-feature terms, pruned by importance | Manual specification, combinatorial explosion risk |
| Feature selection | SHAP-based pruning, configurable threshold | Recursive elimination or correlation matrix analysis |

AutoMLHub
LOC: 1 flag
Time: 8s
Coverage: 94 features auto
Manual
LOC: ~600 lines
Time: 3–5 days
Coverage: Whatever you thought of
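
As a rough illustration of choosing an encoding by cardinality, the sketch below picks one-hot for low-cardinality columns and target encoding otherwise. The `max_onehot=10` cut-off and the helper names are assumptions made for this example; the rule AutoMLHub actually applies is not documented here.

```python
from collections import defaultdict


def choose_encoding(column, max_onehot=10):
    """Pick an encoding from the number of distinct categories (assumed rule)."""
    return "one-hot" if len(set(column)) <= max_onehot else "target"


def one_hot(column):
    """Encode each value as a 0/1 vector over the sorted category list."""
    categories = sorted(set(column))
    return [[1 if value == c else 0 for c in categories] for value in column]


def target_encode(column, target):
    """Replace each category with the mean target value observed for it."""
    sums, counts = defaultdict(float), defaultdict(int)
    for category, y in zip(column, target):
        sums[category] += y
        counts[category] += 1
    return [sums[category] / counts[category] for category in column]


# Hypothetical column and target values.
tiers = ["gold", "basic", "gold", "basic", "pro"]
revenue = [900.0, 100.0, 700.0, 300.0, 500.0]
print(choose_encoding(tiers))
print(one_hot(tiers))
print(target_encode(tiers, revenue))
```

Plain target encoding like this leaks the label when computed on the same rows it is applied to; fitting it inside each cross-validation fold avoids that.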

Model Selection

Race 47 candidates. Crown the winner. Skip the argument.

terminal
$ automl select --task regression --metric f1 --time-budget 30m
| Capability | AutoMLHub | Manual Workflow |
| --- | --- | --- |
| Candidate models | 47 across tree, linear, neural, ensemble families | Whatever you have time to try (usually 3–5) |
| Evaluation metric | 20+ metrics, multi-objective support | Hardcode metric in training loop |
| Cross-validation | Stratified k-fold auto-configured | StratifiedKFold boilerplate, ~50 lines |
| Parallelism | All candidates on available cores simultaneously | Sequential unless you write joblib/Ray logic |
| Ensemble creation | Stacking + blending from top-k models | Complex custom implementation, 200+ lines |

AutoMLHub
LOC: 1 command
Time: 28 min
Coverage: 47 models
Manual
LOC: ~800 lines
Time: 1–2 weeks
Coverage: 3–5 models
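
The idea of racing candidates under cross-validation can be sketched in a few lines. This toy version uses plain (unstratified) k-fold, two hand-written candidates, and mean absolute error as the score because the example task is regression; all of those choices, and every name below, are assumptions for illustration rather than AutoMLHub internals.

```python
import statistics


def k_fold_indices(n_rows, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_rows) if i < start or i >= start + size]
        yield train, test
        start += size


def fit_mean(xs, ys):
    """Baseline candidate: always predict the training mean."""
    mean = statistics.fmean(ys)
    return lambda x: mean


def fit_line(xs, ys):
    """Candidate: ordinary least squares on a single feature."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: slope * x + (my - slope * mx)


def race(candidates, xs, ys, k=5):
    """Score every candidate by mean absolute error across k folds."""
    board = []
    for name, fit in candidates.items():
        fold_errors = []
        for train, test in k_fold_indices(len(xs), k):
            model = fit([xs[i] for i in train], [ys[i] for i in train])
            errors = [abs(ys[i] - model(xs[i])) for i in test]
            fold_errors.append(statistics.fmean(errors))
        board.append((name, statistics.fmean(fold_errors)))
    return sorted(board, key=lambda row: row[1])  # lowest error first


xs = [float(i) for i in range(20)]
ys = [3.0 * x + 2.0 for x in xs]
leaderboard = race({"mean_baseline": fit_mean, "linear": fit_line}, xs, ys)
for name, error in leaderboard:
    print(name, round(error, 4))
```

Each candidate's folds are independent, which is what makes this loop straightforward to spread across cores.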

Hyperparameter Tuning

Stop guessing. Let Bayesian search zero in on the best configuration your trial budget allows.

terminal
$ automl tune --model XGBoost_v3 --trials 200 --sampler tpe
| Capability | AutoMLHub | Manual Workflow |
| --- | --- | --- |
| Search algorithm | Bayesian (TPE), Grid, Random, CMA-ES | GridSearchCV (exhaustive, slow) or manual guessing |
| Search space definition | Auto-generated from model type | Define param_grid manually, ~40 lines |
| Early stopping | Automatic with configurable patience | Custom callback implementation |
| Parallel trials | Distributed across cores/machines | Requires Dask/Ray setup, ~200 lines overhead |
| Result tracking | Built-in MLflow-compatible experiment log | Set up MLflow or W&B integration separately |

AutoMLHub
LOC: 1 command
Time: ~2 hours
Coverage: Bayesian optimal
Manual
LOC: ~500 lines
Time: 3–7 days
Coverage: Grid or luck
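
For a sense of how a trial budget and patience-based early stopping interact, here is a small sketch that uses plain random search, not TPE, over a hypothetical two-parameter space. The toy loss, the parameter ranges, and the `patience=50` default are assumptions chosen only to keep the example self-contained.

```python
import random


def random_search(objective, space, trials=200, patience=50, seed=0):
    """Sample uniformly from `space`; stop after `patience` trials without improvement."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    stale = 0
    for _ in range(trials):
        params = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score, stale = params, score, 0
        else:
            stale += 1
            if stale >= patience:
                break  # early stopping: no recent improvement
    return best_params, best_score


def toy_loss(params):
    """Stand-in for a validation loss with its minimum at (0.1, 0.8)."""
    return (params["learning_rate"] - 0.1) ** 2 + (params["subsample"] - 0.8) ** 2


space = {"learning_rate": (0.01, 0.3), "subsample": (0.5, 1.0)}
best_params, best_score = random_search(toy_loss, space)
print(best_params, best_score)
```

A Bayesian sampler such as TPE replaces the uniform draw with proposals informed by earlier trials; the budget and stopping logic around it stay the same.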

Deployment

From winning model to live REST endpoint: one command.

terminal
$ automl deploy --model XGBoost_v3 --env production --replicas 3
| Capability | AutoMLHub | Manual Workflow |
| --- | --- | --- |
| Serving framework | FastAPI auto-generated with OpenAPI schema | Write Flask/FastAPI app from scratch, ~300 lines |
| Containerization | Dockerfile auto-generated + optimized | Write Dockerfile, manage base images |
| Input validation | Pydantic schema inferred from training data | Manual schema definition + validation logic |
| Scaling config | Replica count + resource limits one flag | Kubernetes YAML manifests, ~150 lines |
| Rollback | Instant rollback to any previous model version | Custom CI/CD pipeline with versioning logic |

AutoMLHub
LOC: 1 command
Time: < 4 min
Coverage: REST + gRPC + batch
Manual
LOC: ~1,200 lines
Time: 2–4 days
Coverage: Whatever you build
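
Inferring an input schema from training data can be pictured with a standard-library sketch like the one below. It stands in for the Pydantic-based approach named in the table and does not reproduce it; the function names and the rule that integers are accepted for float columns are assumptions.

```python
def infer_schema(rows):
    """Map each column name to the Python type first seen in the training rows."""
    schema = {}
    for row in rows:
        for name, value in row.items():
            schema.setdefault(name, type(value))
    return schema


def validate(payload, schema):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for name, expected in schema.items():
        if name not in payload:
            errors.append(f"missing field: {name}")
            continue
        # Assumption: JSON clients often send 8400 for 8400.0, so allow ints for floats.
        allowed = (int, float) if expected is float else (expected,)
        value = payload[name]
        if isinstance(value, bool) and bool not in allowed:
            errors.append(f"wrong type for {name}: bool")
        elif not isinstance(value, allowed):
            errors.append(f"wrong type for {name}: {type(value).__name__}")
    for name in payload:
        if name not in schema:
            errors.append(f"unexpected field: {name}")
    return errors


# Hypothetical training rows; column names are taken from the demo dataset.
training_rows = [
    {"customer_ltv": 8400.0, "region_code": "EU"},
    {"customer_ltv": 1250.5, "region_code": "US"},
]
schema = infer_schema(training_rows)
print(validate({"customer_ltv": 8400, "region_code": "EU"}, schema))
print(validate({"customer_ltv": "high"}, schema))
```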

Monitoring

Know the moment your model starts lying to your users.

terminal
$ automl monitor --model XGBoost_v3 --alert-threshold 0.05
| Capability | AutoMLHub | Manual Workflow |
| --- | --- | --- |
| Data drift detection | PSI + KS-test on all features, auto-baseline | Custom evidently/whylogs integration, ~300 lines |
| Performance tracking | Real-time F1/RMSE on labelled stream | Custom logging + metrics pipeline |
| Alerting | Slack / PagerDuty / webhook out-of-the-box | Custom alert logic + notification integration |
| Retraining trigger | Auto-trigger on drift threshold breach | Custom cron + threshold logic, ~200 lines |
| Explainability | SHAP values + feature drift per prediction | Separate SHAP integration, ~100 lines |

AutoMLHub
LOC: 1 command
Time: < 30s setup
Coverage: All features
Manual
LOC: ~900 lines
Time: 1–2 weeks
Coverage: What you build
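
The Population Stability Index (PSI) named in the drift row compares how a feature's values fall into buckets at training time versus in production. The sketch below is one common formulation with equal-width buckets; the bucket count, the `eps` smoothing, and treating 0.05 as a PSI threshold are assumptions, not documented AutoMLHub behavior.

```python
import math


def psi(baseline, current, bins=10, eps=1e-6):
    """Population Stability Index between two samples of one numeric feature."""
    low, high = min(baseline), max(baseline)
    width = (high - low) / bins or 1.0  # guard against a constant baseline

    def bucket_shares(values):
        counts = [0] * bins
        for v in values:
            index = int((v - low) / width)
            counts[min(max(index, 0), bins - 1)] += 1  # clamp out-of-range values
        # eps keeps empty buckets from producing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    expected = bucket_shares(baseline)
    actual = bucket_shares(current)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


# Hypothetical feature values: a baseline and a shifted production sample.
baseline = [i / 100 for i in range(100)]
shifted = [v + 0.3 for v in baseline]

ALERT_THRESHOLD = 0.05  # assumed to play the role of --alert-threshold
print(psi(baseline, baseline))
print(psi(baseline, shifted) > ALERT_THRESHOLD)
```

Every term in the sum is non-negative, so PSI is zero only when the two bucket distributions match and grows as they diverge.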
Ready to install

Install in 60 seconds.

One command. No config files. No cloud account. No vendor to call when pricing changes.

pip · terminal
$ pip install automlhub

Requires Python 3.9+. Installs CLI + Python SDK.

Quick start — 3 commands to a deployed model
$ automl ingest --source data.csv
# ingest & profile your dataset
$ automl fit --target revenue --deploy
# train, select, tune, deploy
$ curl api.localhost/predict -d '{"customer_ltv": 8400}'
# call your live endpoint
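
The same call can be made from Python with only the standard library. This sketch assumes an `http://` scheme, a JSON request body with a JSON content type, and a JSON response; none of those details are confirmed by the quick start, so adjust them to whatever your deployment exposes.

```python
import json
import urllib.request

# Assumed URL: the quick start shows api.localhost/predict without a scheme.
PREDICT_URL = "http://api.localhost/predict"


def build_predict_request(features, url=PREDICT_URL):
    """Build a POST request carrying the feature dict as a JSON body."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features, url=PREDICT_URL, timeout=5.0):
    """Send the request and decode the response, assuming it is JSON."""
    request = build_predict_request(features, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not touch the network; predict() does.
request = build_predict_request({"customer_ltv": 8400})
print(request.get_method(), request.full_url)
```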
Star on GitHub · 12.4k ★
⚖️ Apache 2.0 · Forever free
🔒 No telemetry · Your data stays yours
🖥️ Self-hosted · Your infra, your rules
47 models · Per training run