Doubly-robust semiparametric inference

LFC() estimates per-gene log-fold changes with a doubly-robust AIPW estimator. Its covariate matrix W is typically the augmented matrix [X | U], where U holds the latent factors estimated by fit_gcate(). It returns a DataFrame with effect estimates, standard errors, and BH-adjusted p-values.
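The following is a minimal sketch of assembling W. The shapes are hypothetical, and random numbers stand in for the factors that fit_gcate() would return:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 6, 2

# Observed covariates: an intercept column plus one measured covariate.
X = np.column_stack([np.ones(n), rng.normal(size=n)])
# Placeholder for the (n, r) latent factors; in practice they come from fit_gcate().
U = rng.normal(size=(n, r))

# Both matrices must index the same cells in the same order.
assert X.shape[0] == U.shape[0]
W = np.hstack([X, U])  # W = [X | U]
print(W.shape)  # (6, 4)
```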

For screens with hundreds of perturbations, use gcate_lfc_batch(), which runs GCATE and LFC in batches to keep peak memory bounded.

Effect-size columns

The returned DataFrame reports the treatment-versus-control effect on both natural-log and base-2 scales:

Column	Definition
`tau`	Natural logarithm of the treated/control mean ratio.
`std`	Standard error of `tau` on the natural-log scale.
`log2fc`	Base-2 log fold change, exactly `tau / log(2)`.
`log2fc_se`	Standard error of `log2fc`, exactly `std / log(2)`.

log2fc_se is the standard error of the estimated log2 fold change, not the sample standard deviation of expression. The rescaling leaves stat, p-values, adjusted p-values, and discoveries unchanged. tau and std remain available for backward compatibility. The same columns are returned by gcate_lfc_batch; compatible caches made by older versions are upgraded in memory when loaded.
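Both log2fc and log2fc_se are tau and std divided by the same constant, so any statistic formed as their ratio stays the same. The quick NumPy check below uses made-up values. It assumes stat is a Wald-type ratio of estimate to standard error, which is an assumption about how stat is formed:

```python
import math
import numpy as np

# Made-up natural-log estimates and their standard errors for three genes.
tau = np.array([0.693, -1.2, 0.05])
std = np.array([0.2, 0.4, 0.1])

log2fc = tau / math.log(2)     # base-2 log fold change
log2fc_se = std / math.log(2)  # its standard error, not a sample SD

# The common factor 1 / log(2) cancels in the ratio.
stat_natural = tau / std
stat_base2 = log2fc / log2fc_se
print(np.allclose(stat_natural, stat_base2))  # True
```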

Choosing the variance estimator

LFC defaults to usevar='unequal' (Welch inference), which estimates the treatment and control variances separately. Prefer this default when the treatment and control sample sizes, or their propensity-weighted effective sample sizes, are meaningfully unbalanced. Also retain it when arm-specific pseudo-outcome variances may differ and for case-control, bulk, and donor-level pseudo-bulk analyses. Independence of the rows does not imply equal treatment-arm variances: disease severity, biological response, residual composition, library size, treatment imbalance, and heterogeneous expression can all make pooled inference anti-conservative.

For a small, approximately balanced perturbation comparison, usevar='pooled' may provide better power when the independent sampling units and arm-specific pseudo-outcome variances are reasonably comparable. Treat it as an opt-in, empirically justified analysis rather than an automatic small-sample choice. Balanced counts alone do not justify pooling in a case-control study. Pooled inference can produce much smaller standard errors and substantially more discoveries. For a deliberately justified batched analysis, pass lfc_kwargs=dict(usevar='pooled').

There is no universal sample-size ratio at which the recommendation changes. Compare nominal arm sizes, propensity-weighted effective sample sizes, arm-specific pseudo-outcome variability, and the stability of discoveries under both estimators. Retain usevar='unequal' when these diagnostics do not support pooling.
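One common way to compute a propensity-weighted effective sample size is Kish's formula, (Σw)² / Σw², applied to the inverse-propensity weights of each arm. The sketch below assumes weights of 1/π for treated cells and 1/(1 − π) for controls. kish_ess is a hypothetical helper, and summarize_propensity_scores() may define its inverse-weight effective sample size differently:

```python
import numpy as np

def kish_ess(weights):
    """Kish effective sample size, (sum w)^2 / sum w^2."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)

# Made-up propensity scores for the cells of one treatment-versus-control comparison.
pi_treated = np.array([0.5, 0.5, 0.5, 0.05])
pi_control = np.array([0.2, 0.2, 0.2, 0.2, 0.2, 0.9])

ess_treated = kish_ess(1.0 / pi_treated)          # weights 1 / pi
ess_control = kish_ess(1.0 / (1.0 - pi_control))  # weights 1 / (1 - pi)
print(f"treated: n={pi_treated.size}, ESS={ess_treated:.2f}")
print(f"control: n={pi_control.size}, ESS={ess_control:.2f}")
```

In this toy example, one treated score of 0.05 reduces four treated cells to an effective size of about 1.6. One control score of 0.9 reduces six controls to about 2.4.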

Donor-level independence alone does not establish equal arm variances. For example, the SEA-AD tutorial uses usevar='unequal' because disease severity, inter-individual response, residual cell composition, and library-size variation can produce different gene-wise variability between disease groups.

Welch inference does not itself model within-subject correlation. Repeated cells from the same donor or experimental unit should still be pseudo-bulked or handled with cluster-aware inference; usevar='unequal' only protects against unequal arm variances.
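Pseudo-bulking can be as simple as summing counts over the cells of each donor. The minimal NumPy sketch below uses hypothetical data and assumes treatment status is constant within a donor. Aggregating covariates and offsets to the donor level is left out:

```python
import numpy as np

# Made-up cell-level counts (cells x genes) and the donor of each cell.
counts = np.array([[1, 0, 3],
                   [2, 1, 0],
                   [0, 4, 1],
                   [5, 0, 2]])
donor = np.array(["d1", "d1", "d2", "d2"])

# Integer code per cell, then unbuffered in-place sums of rows into donor rows.
donors, codes = np.unique(donor, return_inverse=True)
pseudobulk = np.zeros((donors.size, counts.shape[1]), dtype=counts.dtype)
np.add.at(pseudobulk, codes, counts)
print(donors.tolist(), pseudobulk.tolist())  # ['d1', 'd2'] [[3, 1, 3], [5, 4, 3]]
```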

Propensity diagnostics

estimate_propensity_scores() estimates propensity scores without fitting the outcome model. Use K=5 to obtain out-of-fold scores for positivity and overfitting diagnostics. summarize_propensity_scores() reports overlap, tail mass, and inverse-weight effective sample size, while plot_propensity_scores() compares treatment and control distributions.
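Overlap and tail mass can be summarized directly from the scores. The helper below is hypothetical. It reports the share of cells inside overlap_bounds, the shares below and above those bounds, and the share outside the clipping bounds, using the defaults listed for summarize_propensity_scores(). The library's own definitions may differ:

```python
import numpy as np

def overlap_summary(pi, overlap_bounds=(0.05, 0.95), clip_bounds=(0.01, 0.99)):
    """Hypothetical per-arm summary; definitions may differ from causarray's."""
    pi = np.asarray(pi, dtype=float)
    lo, hi = overlap_bounds
    c_lo, c_hi = clip_bounds
    return {
        "frac_in_overlap": float(np.mean((pi >= lo) & (pi <= hi))),
        "lower_tail": float(np.mean(pi < lo)),
        "upper_tail": float(np.mean(pi > hi)),
        "frac_clipped": float(np.mean((pi < c_lo) | (pi > c_hi))),
    }

# Made-up out-of-fold scores for the control cells of one comparison.
pi_ctrl = np.array([0.002, 0.03, 0.2, 0.4, 0.6, 0.97])
summary = overlap_summary(pi_ctrl)
print(summary)
```

For control cells the AIPW weight is 1/(1 − π), so scores near 1 are the ones that inflate weights. For treated cells the weight is 1/π, so scores near 0 matter.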

Both the standalone estimator and LFC use class-balanced logistic propensity fitting by default. This preserves historical causarray behavior and ensures that standalone overlap diagnostics describe the same nuisance model used for effect estimation. Pass class_weight=None to estimate_propensity_scores or ps_class_weight=None to LFC for a calibrated-probability sensitivity analysis.
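Class-balanced weighting pulls fitted probabilities toward 0.5. Assume causarray follows scikit-learn's 'balanced' convention, where each class c receives weight n / (2 n_c); this is an assumption. An intercept-only weighted logistic fit then predicts 0.5 for every cell, whatever the treated share. For a correctly specified logistic model, and ignoring regularization, the weighting shifts only the intercept, by log(n_control / n_treated). The calibrated odds therefore equal the balanced odds times n_treated / n_control. The hypothetical helper below applies that prior correction:

```python
import numpy as np

def balanced_to_calibrated(p_bal, n_treated, n_control):
    """Prior correction: multiply the odds by n_treated / n_control.

    Exact for an intercept-only fit; for a richer model it relies on correct
    logistic specification and ignores regularization.
    """
    p_bal = np.asarray(p_bal, dtype=float)
    odds = p_bal / (1.0 - p_bal) * (n_treated / n_control)
    return odds / (1.0 + odds)

# Balanced weights give both classes equal total weight, so an intercept-only
# weighted fit predicts 0.5 even though only 5% of these cells are treated.
n_treated, n_control = 50, 950
print(float(balanced_to_calibrated(0.5, n_treated, n_control)))  # about 0.05
```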

LFC uses the standard AIPW pseudo-outcome, which may be negative for individual cells even though its counterfactual mean is positive. Individual pseudo-outcomes are never clipped because doing so can bias the arm means, particularly when a large shared control group is compared with much smaller treatment groups.
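Given outcome predictions μ₀ and μ₁ and a propensity score π, the standard AIPW pseudo-outcomes are φ₁ = μ₁ + A(Y − μ₁)/π and φ₀ = μ₀ + (1 − A)(Y − μ₀)/(1 − π). A treated cell with a small count and a small π therefore receives a large negative correction. The toy example below uses made-up values for one gene and shows why clipping φ₁ at zero would bias the arm mean upward:

```python
import numpy as np

def aipw_pseudo_outcomes(y, a, mu0, mu1, pi):
    """Standard AIPW pseudo-outcomes for one gene and one treatment."""
    phi1 = mu1 + a * (y - mu1) / pi
    phi0 = mu0 + (1 - a) * (y - mu0) / (1 - pi)
    return phi0, phi1

# Made-up cells: counts, treatment, outcome-model predictions, propensity scores.
y = np.array([0.0, 6.0, 1.0, 3.0])
a = np.array([1, 1, 0, 0])
mu0 = np.array([1.0, 1.0, 1.0, 1.0])
mu1 = np.array([2.0, 2.0, 2.0, 2.0])
pi = np.array([0.25, 0.5, 0.5, 0.5])

phi0, phi1 = aipw_pseudo_outcomes(y, a, mu0, mu1, pi)
print(phi1.tolist())                  # [-6.0, 10.0, 2.0, 2.0]
print(phi1.mean())                    # 2.0, the unclipped arm mean
print(np.clip(phi1, 0, None).mean())  # 3.5: clipping at zero biases it upward
```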

For a calibrated-propensity sensitivity analysis, use LFC(..., ps_class_weight=None) and diagnose the matching scores with estimate_propensity_scores(..., class_weight=None).

The result includes mean_control, mean_treated, and estimable. These are computed from the unclipped pseudo-outcomes. For numerical stability, valid aggregate arm means are floored at thres_diff only when constructing the log ratio and delta-method denominator. A pair with a nonfinite or nonpositive raw aggregate mean remains non-estimable, so the aggregate floor cannot create an extreme discovery from an invalid estimate.
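The rule above can be sketched as follows. The helper is hypothetical, not the causarray implementation. A pair is estimable only if both raw aggregate means are finite and positive, and the thres_diff floor is applied to valid means only when forming the log ratio. The same floored means would also serve as the delta-method denominators; that step is omitted here.

```python
import numpy as np

def floored_log_ratio(mean_control, mean_treated, thres_diff=0.01):
    """Hypothetical helper: log(treated / control) with the aggregate floor."""
    m0 = np.asarray(mean_control, dtype=float)
    m1 = np.asarray(mean_treated, dtype=float)
    # Non-estimable unless both raw aggregate means are finite and positive.
    estimable = np.isfinite(m0) & np.isfinite(m1) & (m0 > 0) & (m1 > 0)
    tau = np.full(m0.shape, np.nan)
    # The floor applies only to valid means and only inside the log ratio.
    f0 = np.maximum(m0[estimable], thres_diff)
    f1 = np.maximum(m1[estimable], thres_diff)
    tau[estimable] = np.log(f1) - np.log(f0)
    return tau, estimable

tau, estimable = floored_log_ratio([2.0, 0.004, -0.5, np.nan], [4.0, 1.0, 1.0, 1.0])
print(estimable.tolist())  # [True, True, False, False]
print(tau)                 # log 2, log(1 / 0.01) rather than log(1 / 0.004), nan, nan
```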

causarray.DR_learner.LFC(Y, W, A, W_A=None, family='nb', offset=False, Y_hat=None, pi_hat=None, cross_est=False, K=None, mask=None, usevar: Literal['unequal', 'pooled'] = 'unequal', thres_min=0.01, thres_diff=0.01, eps_var=0.0001, fdx=False, fdx_alpha=0.05, fdx_c=0.1, verbose=False, backend: str = 'auto', ps_clip=(0.01, 0.99), ps_class_weight='balanced', **kwargs)

Estimate log-fold changes of treatment effects (LFCs) using AIPW.

Fits a doubly-robust AIPW estimator for the log-ratio of counterfactual means E[Y(1)] / E[Y(0)]. Call this after fit_gcate() to incorporate estimated latent factors into the covariate matrix W.

Parameters:

Y : array, shape (n, p)

Count matrix of outcomes.

W : array, shape (n, d)

Covariate matrix, typically [X | U] where U are the latent factors from GCATE.

A : array, shape (n, a)

Binary treatment indicator matrix.

W_A : array or None, shape (n, d_A)

Covariate matrix for the propensity model. If None, W is used.

family : str

GLM family for the outcome model: 'nb' (default) or 'poisson'.

offset : bool or array-like

Log-scale offset for the outcome model. True computes size factors automatically; False or None disables the offset.

Y_hat : array or None, shape (n, p, a, 2)

Pre-computed counterfactual predictions. When provided, cross-fitting is skipped.

pi_hat : array or None, shape (n, a)

Pre-computed propensity scores. When provided, propensity fitting is skipped.

cross_est : bool

Whether to use two-fold cross-estimation for nuisance parameters. An explicit K takes precedence.

K : int or None

Number of nuisance-estimation folds. None uses 2 when cross_est=True and 1 otherwise.
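The fold rule for cross_est and K fits in a small function. Both helpers below are hypothetical. The random, near-equal fold assignment is an assumption, and causarray may assign folds differently:

```python
import numpy as np

def resolve_folds(cross_est=False, K=None):
    """Documented rule: an explicit K wins; otherwise 2 with cross_est, else 1."""
    if K is not None:
        return int(K)
    return 2 if cross_est else 1

def fold_ids(n, K, random_state=0):
    """Hypothetical random assignment of n cells to K folds of near-equal size."""
    rng = np.random.default_rng(random_state)
    return rng.permutation(np.arange(n) % K)

print(resolve_folds(), resolve_folds(cross_est=True), resolve_folds(True, K=5))  # 1 2 5
print(np.bincount(fold_ids(10, 3)).tolist())  # [4, 3, 3]
```

With K folds, the nuisance models used for cells in fold k are fitted on the other folds, so every prediction is out-of-sample.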

mask : array or None, shape (n, a)

Boolean mask indicating eligible cells for each treatment. It limits propensity-model fitting and final estimand computation.

usevar : str

Variance estimator for the AIPW pseudo-outcomes:

'unequal' (default, v0.0.6+): Welch variance s₀²/n₀ + s₁²/n₁ with Welch-Satterthwaite degrees of freedom; p-values use the t-distribution. Prefer this estimator when treatment and control sample sizes or effective sample sizes are meaningfully unbalanced, when arm-specific pseudo-outcome variances may differ, and for case-control, bulk, and donor-level pseudo-bulk analyses. Independence of the rows does not imply equal treatment-arm variances, and biological heterogeneity or imbalance can make pooled inference anti-conservative.
'pooled': pooled-variance estimator (s² + eps_var) / n. For a small, approximately balanced perturbation comparison, this estimator may provide better power when the independent sampling units and arm-specific pseudo-outcome variances are reasonably comparable. Treat it as an opt-in, empirically justified analysis, not as an automatic choice for every small study. Balanced sample counts alone are insufficient for case-control data, where biological heterogeneity commonly favors 'unequal'. Pooled inference can produce substantially smaller standard errors and many more discoveries.

There is no universal arm-size ratio at which the choice should switch. Inspect nominal and propensity-weighted effective sample sizes, arm-specific pseudo-outcome variability, and sensitivity of the discoveries. When those diagnostics are uncertain, retain 'unequal'.

'unequal' accommodates arm-specific variance but does not model within-donor or within-subject correlation. Repeated cells from the same biological unit should still be pseudo-bulked or analyzed with a cluster-aware method; changing usevar alone does not remove pseudoreplication.

Changed in version 0.0.6: Default changed from 'pooled' to 'unequal'. The 'unequal' formula was also corrected from (s₀²/n₀ + s₁²/n₁)/2 to the standard Welch form, which shrinks t-statistics by ≈ √2 relative to v0.0.5. Pass usevar='pooled' to recover pre-v0.0.6 behaviour.
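Below is a minimal sketch of the Welch form. It assumes s₀² and s₁² are the arm-specific sample variances of the pseudo-outcomes and that n₀ and n₁ are the arm sizes; eps_var is omitted. The sketch also reproduces the version 0.0.6 correction. With the same variance inputs, the corrected standard error is √2 times the old one, which halved the variance:

```python
import numpy as np

def welch_se_df(s0_sq, n0, s1_sq, n1):
    """Welch standard error and Welch-Satterthwaite degrees of freedom."""
    v0, v1 = s0_sq / n0, s1_sq / n1
    se = np.sqrt(v0 + v1)
    df = (v0 + v1) ** 2 / (v0 ** 2 / (n0 - 1) + v1 ** 2 / (n1 - 1))
    return se, df

# Made-up arm variances: a large control arm and a small, noisier treated arm.
s0_sq, n0 = 1.0, 200
s1_sq, n1 = 9.0, 20

se, df = welch_se_df(s0_sq, n0, s1_sq, n1)
old_se = np.sqrt((s0_sq / n0 + s1_sq / n1) / 2)  # pre-0.0.6 'unequal' form
print(f"se={se:.4f}, df={df:.1f}, se/old_se={se / old_se:.4f}")
```

Here the degrees of freedom come out near 19, close to the smaller arm's n₁ − 1, rather than n₀ + n₁ − 2.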

thres_min : float

Genes whose maximum counterfactual mean is below this threshold are excluded (reported as tau=0, padj=NaN).

thres_diff : float

Genes whose counterfactual means differ by less than this value are excluded.
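The sketch below illustrates the two exclusion rules. It is hypothetical and assumes thres_min is compared with the larger of the two counterfactual means, and thres_diff with their absolute difference:

```python
import numpy as np

# Made-up aggregate counterfactual means for four genes.
mean_control = np.array([0.005, 2.0, 1.000, 0.5])
mean_treated = np.array([0.008, 4.0, 1.004, 0.1])
thres_min, thres_diff = 0.01, 0.01

too_low = np.maximum(mean_control, mean_treated) < thres_min  # thres_min rule
too_close = np.abs(mean_treated - mean_control) < thres_diff  # thres_diff rule
tested = ~(too_low | too_close)
print(np.flatnonzero(tested).tolist())  # [1, 3]
```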

eps_var : float

Small constant added to per-arm variances to prevent division by zero.

fdx : bool

Whether to apply FDX control (P(FDP > fdx_c) < fdx_alpha).

fdx_alpha : float

Significance level for FDX control.

fdx_c : float

FDP threshold for FDX control.

verbose : bool

Print progress information.

backend : str

GLM backend: "auto" (default), "fast" (force crispyx), or "original" (force statsmodels).

ps_clip : tuple(float, float)

Bounds applied to propensity scores used by AIPW. Raw, unclipped scores remain available as estimation['pi_hat_raw'].
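Clipping bounds the inverse-propensity weights. The illustration below uses numpy.clip on made-up raw scores and assumes the bounds act as a simple elementwise clip:

```python
import numpy as np

pi_hat_raw = np.array([0.001, 0.3, 0.97, 0.999])  # made-up raw scores
lo, hi = 0.01, 0.99                                 # default ps_clip
pi_hat = np.clip(pi_hat_raw, lo, hi)

# Largest inverse weights after clipping: 1/lo for treated cells and
# 1/(1 - hi) for controls, both about 100 with the default bounds.
max_w_treated = (1.0 / pi_hat).max()
max_w_control = (1.0 / (1.0 - pi_hat)).max()
print(pi_hat.tolist())                                # [0.01, 0.3, 0.97, 0.99]
print(f"{max_w_treated:.1f} {max_w_control:.1f}")    # 100.0 100.0
```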

ps_class_weight : str, dict or None

Class weighting for the propensity model. 'balanced' remains the default to limit nuisance-model drift; pass None for calibrated probabilities.

**kwargs

Additional arguments forwarded to the GLM fitting functions.

Returns:

df_res (DataFrame): Test results with natural-log effect estimate tau and standard error std, together with their base-2 equivalents log2fc and log2fc_se. The fold change is treatment relative to control, and log2fc_se is a standard error (not a sample standard deviation). The result also contains inference columns, raw mean_control and mean_treated counterfactual means, and an estimable flag (plus trt for multiple treatments).

New in version 0.0.8: Added the log2fc and log2fc_se convenience columns. The original natural-log tau and std columns remain unchanged.
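The BH-adjusted p-values can be reproduced from the raw p-values with a standard Benjamini-Hochberg step-up adjustment, sketched below. Leaving NaN p-values outside the testing family (for example, genes reported with padj=NaN) is an assumption about how causarray handles them:

```python
import numpy as np

def bh_adjust(pvals):
    """Benjamini-Hochberg step-up adjustment; NaN entries stay NaN."""
    p = np.asarray(pvals, dtype=float)
    out = np.full(p.shape, np.nan)
    ok = ~np.isnan(p)
    q = p[ok]
    m = q.size
    order = np.argsort(q)
    # p_(i) * m / i for the sorted p-values.
    ranked = q[order] * m / np.arange(1, m + 1)
    # Running minimum from the largest p-value down keeps the result monotone.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adj = np.empty(m)
    adj[order] = np.minimum(ranked, 1.0)
    out[ok] = adj
    return out

padj = bh_adjust([0.01, 0.04, 0.03, np.nan, 0.20])
print(padj)
```

Here the four valid p-values adjust to 0.04, 0.0533, 0.0533 and 0.2. SciPy 1.11 and later provide the same adjustment for NaN-free input as scipy.stats.false_discovery_control(p, method='bh').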

causarray.DR_learner.LFC_batch(*args, **kwargs)

Deprecated alias for gcate_lfc_batch(). Use gcate_lfc_batch() instead; LFC_batch will be removed in a future release.

causarray.DR_learner.VIM(eta_est, X, id_covs, **kwargs)

Estimate variable importance measures (VIM) for heterogeneous treatment effects.

Decomposes treatment effect variance into components explained by each covariate using conditional average treatment effect (CATE) regression.

Parameters:

eta_est (array, shape (n, p)): Influence function values from LFC() or compute_causal_estimand().
X (array, shape (n, d)): Covariate matrix.
id_covs (int or array-like of int): Column indices of X to compute VIM for. An integer k is treated as range(k).

Returns:

estimation : dict

Dictionary with keys:

'CATE', 'CATE_lower', 'CATE_upper' (array, shape (n_covs, n, p)): Conditional average treatment effect and pointwise confidence band.
'VTE' (array, shape (p,)): Total variance of the treatment effect (marginal).
'CVTE' (array, shape (n_covs, p)): Conditional variance of the treatment effect given each covariate.
'VIM_mean' (array, shape (n_covs, p)): VIM point estimate CVTE / VTE - 1 for each covariate and gene.
'VIM_sd' (array, shape (n_covs, p)): Standard deviation of the VIM estimate.
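The id_covs convention and the VIM point estimate are both simple array operations. The sketch below uses made-up VTE and CVTE values; normalize_id_covs is a hypothetical helper:

```python
import numpy as np

def normalize_id_covs(id_covs):
    """An integer k selects columns range(k); otherwise use the given indices."""
    if isinstance(id_covs, (int, np.integer)):
        return list(range(id_covs))
    return [int(j) for j in id_covs]

# Made-up variance summaries for 2 covariates and 3 genes.
VTE = np.array([0.5, 1.0, 2.0])    # shape (p,)
CVTE = np.array([[0.6, 1.0, 3.0],
                 [0.5, 1.5, 2.0]])  # shape (n_covs, p)
VIM_mean = CVTE / VTE - 1           # broadcasts to (n_covs, p)

print(normalize_id_covs(2), normalize_id_covs([0, 3]))  # [0, 1] [0, 3]
print(VIM_mean.round(3).tolist())  # [[0.2, 0.0, 0.5], [0.0, 0.5, 0.0]]
```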

causarray.DR_learner.compute_causal_estimand(estimand, Y, W, A, W_A=None, family='nb', offset=False, Y_hat=None, pi_hat=None, mask=None, fdx=False, fdx_B=1000, fdx_alpha=0.05, fdx_c=0.1, verbose=False, random_state=0, backend: str = 'auto', K=1, ps_clip=(0.01, 0.99), ps_class_weight='balanced', **kwargs)

Estimate causal treatment effects using AIPW with a user-supplied estimand.

Parameters:

estimand (callable): Function that maps influence function values (etas, A) to (eta_est, tau_est, var_est[, df_eff]). See LFC() for an example implementation.
Y (array, shape (n, p)): Count matrix of outcomes.
W (array, shape (n, d)): Covariate matrix (including latent factors from GCATE).
A (array, shape (n, a)): Binary treatment indicator matrix.
W_A (array or None, shape (n, d_A)): Covariate matrix for the propensity model. If None, W is used.
family (str): GLM family for the outcome model: 'nb' (default) or 'poisson'.
offset (bool or array-like): Log-scale offset for the outcome model. True computes size factors automatically; False or None disables the offset.
Y_hat (array or None, shape (n, p, a, 2)): Pre-computed counterfactual predictions. When provided, cross-fitting is skipped.
pi_hat (array or None, shape (n, a)): Pre-computed propensity scores. When provided, propensity fitting is skipped.
K (int): Number of folds used for nuisance estimation. The default 1 preserves in-sample fitting.
ps_clip (tuple(float, float)): Bounds applied to propensity scores used by AIPW.
ps_class_weight (str, dict or None): Class weighting for the propensity model. 'balanced' preserves the established nuisance fit; pass None for calibrated probabilities.
mask (array or None, shape (n, a)): Boolean mask indicating eligible cells for each treatment. It limits propensity-model fitting and final estimand computation.
fdx (bool): Whether to apply FDX control (P(FDP > fdx_c) < fdx_alpha).
fdx_B (int): Number of bootstrap samples for FDX control.
fdx_alpha (float): Significance level for FDX control.
fdx_c (float): FDP threshold for FDX control.
backend (str): GLM backend: "auto" (default), "fast" (force crispyx), or "original" (force statsmodels).
verbose (bool): Print progress information.
**kwargs: Additional arguments forwarded to the GLM fitting functions.

Returns:

df_res (DataFrame): Test results produced by estimand. An estimand may optionally return a fifth element, a dictionary whose arrays are added to df_res as diagnostic columns.
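A custom estimand is a function with the documented return signature. The sketch below is hypothetical. It assumes etas has shape (n, p, 2), holding the control and treated AIPW pseudo-outcomes for a single treatment. That layout may not match what compute_causal_estimand() actually passes, so compare with LFC() before relying on it. The function returns an additive (difference-in-means) effect with a fifth diagnostics dictionary:

```python
import numpy as np

def ate_estimand(etas, A):
    """Hypothetical additive-effect estimand (difference of counterfactual means).

    Assumes etas has shape (n, p, 2): control (index 0) and treated (index 1)
    pseudo-outcomes for a single treatment. A is unused in this sketch.
    """
    eta_est = etas[..., 1] - etas[..., 0]       # per-cell pseudo-outcome differences
    n = eta_est.shape[0]
    tau_est = eta_est.mean(axis=0)              # per-gene effect estimate
    var_est = eta_est.var(axis=0, ddof=1) / n   # variance of that mean
    df_eff = np.full(tau_est.shape, n - 1.0)
    diagnostics = {"mean_control": etas[..., 0].mean(axis=0),
                   "mean_treated": etas[..., 1].mean(axis=0)}
    return eta_est, tau_est, var_est, df_eff, diagnostics

# Made-up pseudo-outcomes: 100 cells, 3 genes, control and treated slices.
rng = np.random.default_rng(1)
etas = rng.normal(loc=[1.0, 1.5], scale=1.0, size=(100, 3, 2))
eta_est, tau_est, var_est, df_eff, diag = ate_estimand(etas, A=None)
```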

causarray.DR_learner.gcate_lfc_batch(Y, X, A, r, W_A=None, batch_size=10, n_batches=None, max_cells=2000, n_ctrl=2000, family='nb', offset=True, warm_start_U=False, cache_path=None, random_state=0, verbose=False, gcate_kwargs=None, lfc_kwargs=None, **kwargs)

Batch-wise GCATE + doubly-robust LFC estimation.

Partitions perturbations into chunks of batch_size, runs fit_gcate_batch() to estimate per-batch latent confounders, then calls LFC() on each batch independently. All large intermediate arrays (res_1, res_2, Y_hat, pi_hat) are freed immediately after each batch so that peak memory is bounded by one batch’s worth of data regardless of the total number of perturbations.
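A back-of-the-envelope estimate shows why batching bounds memory. Assume Y_hat is a dense float64 array of shape (n, p, a, 2) and ignore the other intermediates. Fitted all at once, its size grows with the number of perturbations a. A single batch, by contrast, holds at most n_ctrl + max_cells cells and batch_size perturbations. The screen dimensions below are hypothetical:

```python
def y_hat_gib(n_cells, n_genes, n_treatments, bytes_per_value=8):
    """Size in GiB of a dense (n, p, a, 2) array; float64 by default."""
    return n_cells * n_genes * n_treatments * 2 * bytes_per_value / 2**30

# Hypothetical screen: 2,000 genes, 300 perturbations, 60,000 cells.
full = y_hat_gib(60_000, 2_000, 300)
# One batch: at most n_ctrl + max_cells = 4,000 cells and batch_size = 10 perturbations.
batch = y_hat_gib(2_000 + 2_000, 2_000, 10)
print(f"all at once: {full:.1f} GiB, one batch: {batch:.2f} GiB")
# all at once: 536.4 GiB, one batch: 1.19 GiB
```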

Results can optionally be cached to an HDF5 file (cache_path) so that interrupted runs can be resumed without re-processing completed batches.

Parameters:

Y : array-like or DataFrame, shape (n, p)

Count matrix.

X : array, shape (n, d)

Covariate matrix (intercept column should be included).

A : array-like or DataFrame, shape (n, a)

Binary treatment indicator matrix; control cells have all-zero rows.

r : int

Number of latent factors.

W_A : array or None, shape (n, d_A)

Propensity-score covariate matrix. If None, X is used.

batch_size : int

Perturbations per batch (default 10). Ignored when n_batches is set. Batches are sized evenly with numpy.array_split() so the last batch is never drastically smaller than the others.

n_batches : int or None

Total number of batches. When set, overrides batch_size, and perturbations are split as evenly as possible across exactly n_batches batches (e.g. n_batches=2 on a 29-perturbation dataset gives two batches of 15 and 14).
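The even splitting can be reproduced with numpy.array_split(). The text does not state how the batch count is derived from batch_size, so the sketch assumes ceil(n_perts / batch_size). split_perturbations is a hypothetical helper:

```python
import math
import numpy as np

def split_perturbations(n_perts, batch_size=10, n_batches=None):
    """Hypothetical helper; the batch count derived from batch_size is assumed."""
    if n_batches is None:
        n_batches = math.ceil(n_perts / batch_size)
    return np.array_split(np.arange(n_perts), n_batches)

print([len(b) for b in split_perturbations(29, n_batches=2)])    # [15, 14]
print([len(b) for b in split_perturbations(21, batch_size=10)])  # [7, 7, 7]
# Consecutive slices of 10 would leave a tiny last batch instead.
print([len(range(21)[i:i + 10]) for i in range(0, 21, 10)])      # [10, 10, 1]
```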

max_cells : int or None

Maximum number of perturbed cells per batch (default 2000). None means no cap. Control cells are added on top, so the actual batch size is at most n_ctrl + max_cells. The cap is rarely active because typical Perturb-seq datasets have only a few hundred cells per perturbation.

n_ctrl : int

Number of control cells in the fixed control subsample (default 2000).

family : str

GLM family (default 'nb').

offset : bool or array-like

Offset specification passed to fit_gcate_batch().

warm_start_U : bool

Passed to fit_gcate_batch().

cache_path : str or None

Path to an HDF5 file used for incremental caching. When set:

On entry, any already-computed batches are loaded from the store and their indices are skipped by fit_gcate_batch().
After each new batch, the result DataFrame is appended to the store under key /batch_{i:04d}.
On exit, all batches (cached + newly computed) are concatenated and returned.

This lets you resume an interrupted run by calling the function again with the same cache_path; completed batches are not re-run.
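Resuming amounts to skipping batch indices whose keys already exist in the store. The key pattern follows the /batch_{i:04d} format above, and the helpers are hypothetical. pandas.HDFStore.keys() reports keys with a leading slash, so they can be compared directly with this pattern:

```python
def batch_key(i):
    """Store key for batch i, following the documented /batch_{i:04d} pattern."""
    return f"/batch_{i:04d}"

def batches_to_run(n_batches, cached_keys):
    """Hypothetical helper: batch indices whose keys are not yet stored."""
    done = set(cached_keys)
    return [i for i in range(n_batches) if batch_key(i) not in done]

cached = ["/batch_0000", "/batch_0001"]  # e.g. a run interrupted during batch 2
print(batch_key(12))                     # /batch_0012
print(batches_to_run(4, cached))         # [2, 3]
```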

random_state : int

RNG seed.

verbose : bool

Print per-batch timing.

gcate_kwargs : dict or None

Extra keyword arguments forwarded to fit_gcate_batch() (and ultimately fit_gcate()). E.g.:

gcate_kwargs=dict(backend='fast',
                  kwargs_es_1=dict(max_iters=10, rel_tol=2e-4),
                  kwargs_es_2=dict(max_iters=10, rel_tol=2e-4))

lfc_kwargs : dict or None

Extra keyword arguments forwarded to LFC() (e.g. usevar, fdx, thres_min). Retain the default usevar='unequal' when arm sizes or effective sample sizes are meaningfully unbalanced, when arm-specific variability may differ, and for case-control, bulk, or pseudo-bulk analyses. For a small, approximately balanced perturbation comparison with comparable pseudo-outcome variability, lfc_kwargs=dict(usevar='pooled') may improve power. There is no universal balance threshold; compare the relevant diagnostics and retain 'unequal' when uncertain.

**kwargs

Additional arguments forwarded to both fit_gcate_batch() and LFC(). When a key collides with gcate_kwargs or lfc_kwargs, the stage-specific dict wins. This lets you scope a kwarg to one stage, e.g. gcate_kwargs=dict(backend='fast') paired with a top-level backend='original', which then takes effect only in LFC().
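The precedence rule behaves like a dictionary merge in which the stage-specific dict is applied last. The helper below is a hypothetical sketch of that rule:

```python
def stage_kwargs(shared, stage_specific=None):
    """Merge top-level **kwargs with a stage-specific dict; the latter wins."""
    return {**shared, **(stage_specific or {})}

shared = {"backend": "original"}                        # top-level keyword
gcate_call = stage_kwargs(shared, {"backend": "fast"})  # gcate_kwargs given
lfc_call = stage_kwargs(shared)                         # no lfc_kwargs given
print(gcate_call, lfc_call)  # {'backend': 'fast'} {'backend': 'original'}
```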

Returns:

df_res (DataFrame): Concatenated result from all batches. Includes natural-log tau and std columns, base-2 log2fc and log2fc_se columns, and a 'batch' column with the 0-based batch index. Older compatible caches containing only tau and std are upgraded in memory.

causarray.DR_estimation.estimate_propensity_scores(A, X_A, K=1, ps_model='logistic', mask=None, clip=None, random_state=0, verbose=False, class_weight='balanced', **kwargs)

Estimate per-treatment propensity scores.

Each treatment is compared with the shared all-zero control group. With K > 1, every returned score is predicted by a model that did not train on that cell. Logistic models use class_weight='balanced' by default, matching LFC() and historical causarray fits. Pass class_weight=None for calibrated treatment probabilities.
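The comparison sets implied by the shared-control design can be built directly from A. The sketch below uses a made-up indicator matrix. It assumes that cells receiving other perturbations are left out of treatment j's fit, as stated for summarize_propensity_scores(). Mask handling and fold assignment are omitted:

```python
import numpy as np

# Made-up indicator matrix: 6 cells x 2 treatments; all-zero rows are controls.
A = np.array([[0, 0],
              [1, 0],
              [0, 1],
              [0, 0],
              [1, 0],
              [0, 0]])
is_control = ~A.any(axis=1)

comparisons = {}
for j in range(A.shape[1]):
    # Treatment j versus the shared controls; other perturbations are left out.
    rows = np.flatnonzero(is_control | (A[:, j] == 1))
    comparisons[j] = (rows.tolist(), A[rows, j].tolist())
print(comparisons)
# {0: ([0, 1, 3, 4, 5], [0, 1, 0, 1, 0]), 1: ([0, 2, 3, 5], [0, 1, 0, 0])}
```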

Parameters:

A (array-like, shape (n,) or (n, a)): Binary treatment indicators. Rows containing only zeros are controls.
X_A (array-like, shape (n, d_A)): Covariates used by the propensity model, including an intercept column when fit_intercept=False.
K (int, optional): Number of folds. 1 fits and predicts on all eligible cells; values greater than one produce out-of-fold predictions.
ps_model ({'logistic', 'random_forest_cv', 'ensemble'}, optional): Propensity model.
mask (array-like or None, shape (n,) or (n, a)): Optional per-treatment eligibility mask for model fitting.
clip (tuple(float, float) or None, optional): Bounds applied after prediction. None returns raw probabilities.
random_state (int, optional): Random seed used for fold construction and supported estimators.
class_weight (str, dict or None, optional): Class weighting for logistic propensity estimation. The default 'balanced' matches LFC(); pass None for calibrated probabilities.

Returns:

pi_hat (ndarray, shape (n, a)): Estimated probabilities P(A_j=1 | X_A).

Diagnostics for treatment overlap and propensity-score quality.

causarray.diagnostics.plot_propensity_scores(A, pi_hat, treatments=None, treatment_names=None, overlap_bounds=(0.05, 0.95), bins=40, max_panels=4, axes=None)

Plot propensity distributions for treatment and control cells.

causarray.diagnostics.summarize_propensity_scores(A, pi_hat, treatment_names=None, overlap_bounds=(0.05, 0.95), clip_bounds=(0.01, 0.99), bins=40)

Summarize overlap and inverse-weight stability for each treatment.

Other perturbations are excluded from a treatment’s diagnostic comparison; each row compares that treatment with shared all-zero controls.
