Make meta-learners scikit-learn compliant via BaseEstimator
Merged Jun 2026
Made BaseLearner inherit sklearn.base.BaseEstimator, giving every subclass get_params/set_params for free and enabling Pipeline and GridSearchCV compatibility out of the box
Refactored all five learner families (S/T/X/R/DR) to store constructor arguments verbatim in __init__ with no logic or deepcopy; all model construction deferred to fit()
Replaced the bespoke _unfitted_clone/_model_*_template machinery introduced in #910 with a direct clone(self) call in the bootstrap path, eliminating the regression where clone(self, safe=False) deepcopied fitted models on every bootstrap iteration
Fixed XGBRRegressor to use an explicit named-parameter signature with xgb_kwargs=None instead of *args/**kwargs, deferring all XGBRegressor construction to fit() so get_params()/clone() work correctly
Moved learner-presence validation out of __init__ into fit() across all learners, since __init__-time assertions break clone()
Added a self.propensity = {} sentinel to BaseXLearner and BaseDRLearner so that calling estimate_ate(pretrain=True) before fit() raises a clean ValueError instead of an AttributeError
Fixed BaseTClassifier.predict fail-fast ordering to match BaseTLearner.predict, checking mutually exclusive flags at the top before any computation
Made fit() return self across all learners for Pipeline/GridSearchCV method chaining; nested params now visible via get_params (e.g. learner__max_depth)
Added 31 sklearn compliance tests to test_meta_learners.py covering clone()/get_params() round-trips for all 8 learner variants, fit() returns self, XGBRRegressor bootstrap CI path, bit-identical equivalence guards, and propensity sentinel consistency
Addressed all blocking and non-blocking maintainer review comments across 6+ review rounds, including merge conflict resolution, Cython extension troubleshooting on Windows, architecture consistency, and statistical correctness
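The constructor and fit() conventions listed for this PR can be illustrated with a minimal sketch. SketchTLearner below is a hypothetical stand-in, not causalml's BaseTLearner; the control_name parameter, the binary control/treated split, and the trailing-underscore attribute names are assumptions made for the example. It only shows why storing arguments verbatim in __init__ and building models in fit() lets get_params(), set_params(), and clone() work.

```python
import numpy as np
from sklearn.base import BaseEstimator, clone
from sklearn.linear_model import LinearRegression


class SketchTLearner(BaseEstimator):
    """Hypothetical T-learner used only to illustrate the sklearn conventions."""

    def __init__(self, learner=None, control_name=0):
        # Store constructor arguments verbatim: no validation, no copies.
        self.learner = learner
        self.control_name = control_name

    def fit(self, X, treatment, y):
        # Validation lives here, because clone() re-runs __init__.
        if self.learner is None:
            raise ValueError("learner must be specified")
        X, treatment, y = np.asarray(X), np.asarray(treatment), np.asarray(y)
        is_control = treatment == self.control_name
        # Fresh, unfitted copies are built at fit time from the stored template.
        self.model_c_ = clone(self.learner).fit(X[is_control], y[is_control])
        self.model_t_ = clone(self.learner).fit(X[~is_control], y[~is_control])
        return self  # enables method chaining and Pipeline use

    def predict(self, X):
        X = np.asarray(X)
        # Treatment effect as the difference of the two outcome models.
        return self.model_t_.predict(X) - self.model_c_.predict(X)
```

With this layout, SketchTLearner(learner=LinearRegression()).get_params() exposes nested keys such as learner__fit_intercept, and clone() of a fitted instance returns an unfitted copy without deep-copying the fitted models.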
Add Post-Fit Confidence Intervals to BaseTLearner via store_bootstraps and return_ci
Merged May 2026
Added store_bootstraps=False to BaseTLearner.fit(), enabling storage of a bootstrap ensemble after training for train-once, score-many workflows
Added return_ci=False to BaseTLearner.predict(), allowing confidence intervals to be generated on new unseen datasets without retraining
Introduced a reusable bootstrap ensemble framework through BaseLearner.fit_bootstrap_ensemble(), making the implementation extensible to additional causal inference meta-learners
Refactored bootstrap training into module-level helper functions to eliminate joblib parallelization and pickling issues caused by nested functions
Replaced deepcopy() with sklearn.base.clone() following EconML-style design patterns for efficient model replication and reproducibility
Added support for reproducible bootstrap inference through random_state handling and parallel execution via joblib
Extended confidence interval support to BaseTClassifier.predict(), enabling uncertainty estimation for classification-based treatment effect models
Added comprehensive test coverage for reproducibility, parallel execution (n_jobs > 1), random seed behavior, BaseTLearner confidence intervals, and BaseTClassifier confidence intervals
Addressed all blocking and non-blocking maintainer review comments across multiple review rounds, including architecture refactoring, API consistency, parallelization safety, and statistical correctness
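The train-once, score-many workflow described in this PR can be sketched as follows. The function names, the 0/1 treatment coding, the seed-generation scheme, and the percentile interval are assumptions for illustration and are not taken from BaseLearner.fit_bootstrap_ensemble(); the sketch only shows a module-level helper that joblib can pickle, sklearn.base.clone() for replication, and intervals computed on new data from a stored ensemble.

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.base import clone
from sklearn.linear_model import LinearRegression


def _fit_one_bootstrap(learner, X, treatment, y, seed):
    """Module-level helper (picklable): fit one model pair on a resample."""
    rng = np.random.RandomState(seed)
    idx = rng.choice(len(y), size=len(y), replace=True)
    Xb, wb, yb = X[idx], treatment[idx], y[idx]
    model_c = clone(learner).fit(Xb[wb == 0], yb[wb == 0])
    model_t = clone(learner).fit(Xb[wb == 1], yb[wb == 1])
    return model_c, model_t


def fit_bootstrap_ensemble_sketch(learner, X, treatment, y,
                                  n_bootstraps=50, random_state=None, n_jobs=1):
    """Fit and return a list of (control_model, treatment_model) pairs."""
    X, treatment, y = np.asarray(X), np.asarray(treatment), np.asarray(y)
    # One integer seed per replicate keeps results reproducible under n_jobs > 1.
    seeds = np.random.RandomState(random_state).randint(
        0, 2**31 - 1, size=n_bootstraps
    )
    return Parallel(n_jobs=n_jobs)(
        delayed(_fit_one_bootstrap)(learner, X, treatment, y, int(seed))
        for seed in seeds
    )


def predict_ci_sketch(ensemble, X_new, alpha=0.05):
    """Percentile interval of the per-replicate effect estimates on new data."""
    X_new = np.asarray(X_new)
    effects = np.stack(
        [m_t.predict(X_new) - m_c.predict(X_new) for m_c, m_t in ensemble]
    )
    lower = np.percentile(effects, 100 * alpha / 2, axis=0)
    upper = np.percentile(effects, 100 * (1 - alpha / 2), axis=0)
    return lower, upper
```

Because the ensemble is stored after fitting, predict_ci_sketch() can be called repeatedly on different datasets without refitting any model.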
Add Bootstrap Confidence Intervals and P-values to rate_score()
Merged Apr 2026
Extended rate_score() in causalml/metrics/rate.py with return_ci=False, n_bootstrap=200, alpha=0.05, and random_state=None parameters following sklearn conventions
When return_ci=True, rate_score() runs a half-sample bootstrap (m = n // 2, drawn without replacement) per the Yadlowsky et al. (2021) functional CLT and returns the SE, CI bounds, and a two-sided p-value testing H0: RATE = 0
Refactored integration logic into a module-level _compute_rate_from_toc() helper to eliminate code duplication and avoid joblib pickle issues with nested functions
Added 4 new tests to tests/test_rate.py using existing synthetic_df and rct_df fixtures and RANDOM_SEED from tests/const.py; addressed all blocking and non-blocking review comments across two review rounds; passed black and CI checks
Bootstrap inference verified correct against the Yadlowsky et al. (2021) paper by the maintainer across two review rounds
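The half-sample bootstrap can be sketched for an arbitrary statistic as below. This is a simplified illustration, not the code in causalml/metrics/rate.py: half_sample_bootstrap and stat_fn are hypothetical names, and using the standard deviation of the half-sample estimates as the SE together with a normal approximation for the interval and p-value is an assumption about how such a procedure is commonly assembled.

```python
from statistics import NormalDist

import numpy as np


def half_sample_bootstrap(stat_fn, data, n_bootstrap=200, alpha=0.05,
                          random_state=None):
    """Estimate SE, CI, and a two-sided p-value (H0: statistic = 0)."""
    data = np.asarray(data)
    rng = np.random.default_rng(random_state)
    n = len(data)
    m = n // 2  # half-sample size
    point = float(stat_fn(data))

    draws = np.empty(n_bootstrap)
    for b in range(n_bootstrap):
        # Draw m rows WITHOUT replacement, then recompute the statistic.
        idx = rng.choice(n, size=m, replace=False)
        draws[b] = stat_fn(data[idx])

    # Assumption: the spread of half-sample estimates is used directly as the SE.
    se = float(draws.std(ddof=1))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    if se > 0:
        p_value = 2 * (1 - NormalDist().cdf(abs(point) / se))
    else:
        p_value = float("nan")  # degenerate case: no variation across draws
    return {
        "estimate": point,
        "se": se,
        "ci_lower": point - z * se,
        "ci_upper": point + z * se,
        "p_value": p_value,
    }
```

For a sample mean, drawing m = n // 2 rows without replacement gives estimates whose variance matches the usual s^2 / n, which is why no rescaling is applied to the SE in this sketch.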
Add Rank-weighted Average Treatment Effect (RATE) Metric
Merged Mar 2026
Added causalml/metrics/rate.py with three public functions — get_toc(), rate_score(), and plot_toc() — following the exact same API conventions as get_qini / qini_score / plot_qini
get_toc() computes the Targeting Operator Characteristic curve via O(n) cumulative sums; rate_score() computes the RATE scalar with AUTOC (1/q) or Qini (q) weighting; plot_toc() visualizes the TOC curve
Supported both oracle mode (simulated tau) and observed RCT mode (y + w); fixed a division-by-zero in normalize by scaling by max(|TOC|) instead of TOC(1); added a logger.warning for the observed-outcome fallback
Added 20 unit tests in tests/test_rate.py; addressed all blocking and non-blocking review comments across two review rounds; passed black and pre-commit checks
Implementation verified correct against the Yadlowsky et al. (2021) paper and the grf R package reference by the maintainer
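A minimal oracle-mode sketch of the TOC curve is shown below. toc_curve_sketch is a hypothetical function, not get_toc(), and it assumes the standard definition in which TOC(q) is the mean effect among the top q fraction of units ranked by score minus the overall mean effect. Under that definition TOC(1) is zero, which is consistent with normalizing by max(|TOC|) rather than by TOC(1); the zero-scale guard is an added assumption.

```python
import numpy as np


def toc_curve_sketch(tau, score, normalize=False):
    """Return (q, toc) for true effects `tau` ranked by `score` (oracle mode)."""
    tau = np.asarray(tau, dtype=float)
    score = np.asarray(score, dtype=float)
    n = len(tau)

    # Rank units from highest to lowest score.
    order = np.argsort(-score, kind="stable")
    sorted_tau = tau[order]

    # Mean effect among the top-k units, for every k, via one cumulative sum.
    top_k_mean = np.cumsum(sorted_tau) / np.arange(1, n + 1)
    q = np.arange(1, n + 1) / n
    toc = top_k_mean - tau.mean()

    if normalize:
        scale = np.max(np.abs(toc))
        if scale > 0:  # assumption: leave an all-zero curve unscaled
            toc = toc / scale
    return q, toc
```

For tau = [3, 1, 2, 0] ranked by itself, the curve is [1.5, 1.0, 0.5, 0.0] at q = [0.25, 0.5, 0.75, 1.0].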
Add Native NaN Support for UpliftTree and UpliftRandomForest
Merged Mar 2026
Added native NaN routing logic to each candidate split, evaluating both left/right directions and learning the optimal routing per node — consistent with scikit-learn's decision tree behavior
Stored the learned NaN routing in each DecisionTree node and applied it consistently during training, pruning, filling, and prediction
Guarded all np.isnan() calls with np.issubdtype(..., np.number) to prevent TypeError on string/categorical columns
Added NaN-aware percentile calculation by filtering out NaN values before computing split thresholds
Added two targeted tests: one for NaN values in numeric columns, one for None values in object-dtype columns
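The NaN handling in this PR can be illustrated with a small sketch. All function names here are hypothetical, mean_gap is a stand-in score rather than one of the uplift split criteria, and treating None as missing in object columns is an assumption. The sketch shows the dtype guard around np.isnan(), thresholds computed from observed values only, and a comparison of both NaN directions for one candidate split.

```python
import numpy as np


def nan_mask(column):
    """Boolean mask of missing entries; safe for string/object columns."""
    column = np.asarray(column)
    if np.issubdtype(column.dtype, np.number):
        return np.isnan(column)
    # np.isnan would raise TypeError here, so check for None instead.
    return np.array([v is None for v in column], dtype=bool)


def candidate_thresholds(x, percentiles=(10, 25, 50, 75, 90)):
    """Percentile thresholds computed after dropping NaN values."""
    x = np.asarray(x, dtype=float)
    observed = x[~np.isnan(x)]
    if observed.size == 0:
        return np.array([])
    return np.unique(np.percentile(observed, percentiles))


def mean_gap(y_left, y_right):
    """Stand-in split score: absolute difference in mean outcome."""
    return abs(y_left.mean() - y_right.mean())


def choose_nan_direction(x, y, threshold, score_fn):
    """Try NaN rows on the left and on the right; return (nan_left, score)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    missing = np.isnan(x)
    below = np.zeros(len(x), dtype=bool)
    below[~missing] = x[~missing] < threshold  # compare observed values only

    best = None
    for nan_left in (True, False):
        left = below | (missing & nan_left)
        right = ~left
        if not left.any() or not right.any():
            continue  # skip degenerate splits with an empty child
        score = score_fn(y[left], y[right])
        if best is None or score > best[1]:
            best = (nan_left, score)
    return best
```

The chosen direction would then be stored on the node and reused at prediction time, so that rows with a missing value follow the same branch they followed during training.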