Processing Module ⚙️

API reference for the memory::processing module, which provides core data processing functionality for single-cell RNA-seq analysis.

Normalization

normalize_expression

Scales expression data so that each row or column, depending on direction, sums to a target count.

pub fn normalize_expression(
    matrix: &IMArrayElement,
    expression_target: u32,
    direction: &Direction,
    precision: Option<Precision>
) -> anyhow::Result<()>

Parameters

matrix: Expression matrix to normalize
expression_target: Target sum for normalization (typically 10,000)
direction: Either Direction::ROW (normalize cells) or Direction::COLUMN (normalize genes)
precision: Optional floating-point precision (Single for f32, Double for f64)

Returns

anyhow::Result<()>: Success or error

Example

normalize_expression(&matrix, 10000, &Direction::ROW, Some(Precision::Single))?;
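As a rough illustration of what row-wise normalization to a target sum computes, the sketch below works on a plain dense matrix. The function name normalize_rows and the Vec<Vec<f64>> layout are assumptions made for this example; they are not part of the module, which operates on IMArrayElement.

```rust
/// Scales each row (cell) of a dense matrix so its entries sum to `target`.
/// Hypothetical helper for illustration only; not the library's implementation.
fn normalize_rows(matrix: &mut [Vec<f64>], target: f64) {
    for row in matrix.iter_mut() {
        let total: f64 = row.iter().sum();
        // Rows with no counts are left untouched to avoid dividing by zero.
        if total > 0.0 {
            let scale = target / total;
            for value in row.iter_mut() {
                *value *= scale;
            }
        }
    }
}
```

With a target of 10,000, a cell with counts [1, 3] becomes [2500, 7500]. Skipping all-zero rows is a choice made in this sketch, not documented behavior of normalize_expression.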

log1p_expression

Applies the log1p transformation, ln(1 + x), to every value in the data.

pub fn log1p_expression(
    matrix: &IMArrayElement,
    precision: Option<Precision>
) -> anyhow::Result<()>

Parameters

matrix: Expression matrix to transform
precision: Optional floating-point precision (Single for f32, Double for f64)

Returns

anyhow::Result<()>: Success or error

Example

log1p_expression(&matrix, None)?;
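The transformation itself is the standard ln(1 + x). A minimal sketch on a plain slice, using the standard library's f64::ln_1p, looks like this; log1p_in_place is a hypothetical name for illustration and says nothing about how the module stores or iterates its data.

```rust
/// Replaces each value x with ln(1 + x).
/// Hypothetical helper for illustration only.
fn log1p_in_place(values: &mut [f64]) {
    for v in values.iter_mut() {
        // ln_1p is more accurate than (1.0 + x).ln() when x is close to zero.
        *v = (*v).ln_1p();
    }
}
```

Zero counts stay at zero after the transform, which is why log1p is preferred over a plain logarithm for sparse count data.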

Filtering

mark_filter_cells

Creates a boolean mask indicating which cells pass all specified filtering criteria.

pub fn mark_filter_cells<I, T>(
    anndata: &IMAnnData,
    min_genes: Option<I>,
    max_genes: Option<I>,
    min_counts: Option<T>,
    max_counts: Option<T>,
    min_fraction: Option<T>,
    max_fraction: Option<T>
) -> anyhow::Result<Vec<bool>>
where
    I: PrimInt + Unsigned + Zero + AddAssign + Into<T>,
    T: Float + NumCast + AddAssign + Sum

Type Parameters

I: Integer type for counting genes (typically u32)
T: Floating-point type for counts and fractions (typically f64)

Parameters

anndata: Reference to AnnData object
min_genes: Minimum number of genes expressed required for a cell
max_genes: Maximum number of genes expressed allowed for a cell
min_counts: Minimum count total required for a cell
max_counts: Maximum count total allowed for a cell
min_fraction: Minimum fraction of total genes that must be expressed in a cell
max_fraction: Maximum fraction of total genes that can be expressed in a cell

Returns

anyhow::Result<Vec<bool>>: Boolean vector where true indicates cells that pass all filters

Example

let cell_mask = mark_filter_cells::<u32, f64>(
    &adata,
    Some(200),    // Minimum genes
    Some(5000),   // Maximum genes
    Some(500.0),  // Minimum counts
    None,         // No maximum counts
    None, None    // No fraction thresholds
)?;
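To show how such a mask could be derived, here is a sketch on a dense cells-by-genes matrix. The CellThresholds struct and cell_mask function are hypothetical, and treating every threshold as inclusive is an assumption; the fraction thresholds are omitted for brevity.

```rust
/// Hypothetical bundle of per-cell thresholds; `None` disables a check.
struct CellThresholds {
    min_genes: Option<u32>,
    max_genes: Option<u32>,
    min_counts: Option<f64>,
    max_counts: Option<f64>,
}

/// Returns one boolean per row (cell): true if the cell passes every active threshold.
fn cell_mask(matrix: &[Vec<f64>], t: &CellThresholds) -> Vec<bool> {
    matrix
        .iter()
        .map(|row| {
            // Number of genes with a non-zero count in this cell.
            let n_genes = row.iter().filter(|&&v| v > 0.0).count() as u32;
            // Total counts in this cell.
            let total: f64 = row.iter().sum();
            // Thresholds are treated as inclusive here (an assumption).
            t.min_genes.map_or(true, |m| n_genes >= m)
                && t.max_genes.map_or(true, |m| n_genes <= m)
                && t.min_counts.map_or(true, |m| total >= m)
                && t.max_counts.map_or(true, |m| total <= m)
        })
        .collect()
}
```

A cell is kept only when all enabled checks succeed, matching the "pass all specified filtering criteria" description above.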

mark_filter_genes

Creates a boolean mask indicating which genes pass all specified filtering criteria.

pub fn mark_filter_genes<I, T>(
    anndata: &IMAnnData,
    min_cells: Option<I>,
    max_cells: Option<I>,
    min_counts: Option<T>,
    max_counts: Option<T>,
    min_fraction: Option<T>,
    max_fraction: Option<T>
) -> anyhow::Result<Vec<bool>>
where
    I: PrimInt + Unsigned + Zero + AddAssign + Into<T>,
    T: Float + NumCast + AddAssign + Sum

Type Parameters

I: Integer type for counting cells (typically u32)
T: Floating-point type for counts and fractions (typically f64)

Parameters

anndata: Reference to AnnData object
min_cells: Minimum number of cells expressing a gene
max_cells: Maximum number of cells expressing a gene
min_counts: Minimum count total required for a gene
max_counts: Maximum count total allowed for a gene
min_fraction: Minimum fraction of total cells that must express a gene
max_fraction: Maximum fraction of total cells that can express a gene

Returns

anyhow::Result<Vec<bool>>: Boolean vector where true indicates genes that pass all filters

Example

let gene_mask = mark_filter_genes::<u32, f64>(
    &adata,
    Some(3),      // Expressed in at least 3 cells
    None,         // No maximum cells threshold
    Some(10.0),   // At least 10 total counts
    None,         // No maximum counts threshold
    Some(0.001),  // Expressed in at least 0.1% of cells
    None          // No maximum fraction threshold
)?;
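The gene-level filter is the column-wise counterpart of the cell filter. The sketch below illustrates the min_cells and min_fraction checks on a dense cells-by-genes matrix; gene_mask is a hypothetical name, the thresholds are assumed to be inclusive, and the remaining thresholds are left out to keep the example short.

```rust
/// Returns one boolean per column (gene): true if enough cells express the gene.
/// Hypothetical helper for illustration only.
fn gene_mask(
    matrix: &[Vec<f64>],
    min_cells: Option<usize>,
    min_fraction: Option<f64>,
) -> Vec<bool> {
    let n_cells = matrix.len();
    let n_genes = matrix.first().map_or(0, |row| row.len());
    (0..n_genes)
        .map(|g| {
            // Count the cells in which gene `g` has a non-zero value.
            let expressing = matrix.iter().filter(|row| row[g] > 0.0).count();
            // n_cells is at least 1 whenever this closure runs.
            let fraction = expressing as f64 / n_cells as f64;
            min_cells.map_or(true, |m| expressing >= m)
                && min_fraction.map_or(true, |f| fraction >= f)
        })
        .collect()
}
```

For example, a gene expressed in 3 of 4 cells has a fraction of 0.75 and passes a min_fraction of 0.6, while a gene expressed in 2 of 4 cells does not.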

Highly Variable Genes

compute_highly_variable_genes

Identifies highly variable genes from per-gene mean expression and dispersion, using the method selected by flavor.

pub fn compute_highly_variable_genes(
    adata: &IMAnnData,
    params: Option<HVGParams>
) -> anyhow::Result<()>

Parameters

adata: Reference to AnnData object
params: Optional HVGParams struct with the following fields:
- min_mean: Minimum mean expression (default: 0.0125)
- max_mean: Maximum mean expression (default: 3.0)
- min_dispersion: Minimum dispersion (default: 0.5)
- max_dispersion: Maximum dispersion (default: Infinity)
- n_bins: Number of bins for mean-variance relationship (default: 20)
- n_top_genes: Optional number of top variable genes to select
- flavor: Statistical method (FlavorType::Seurat, FlavorType::CellRanger, or FlavorType::SVR)
- span: Span parameter for trend fitting (default: 0.3)
- batch_key: Optional column name for batch correction

Returns

anyhow::Result<()>: Success or error

Side Effects

Adds columns to adata.var():

means: Mean expression per gene
dispersions: Dispersion values
dispersions_norm: Normalized dispersion values
highly_variable: Boolean indicating highly variable genes
dispersions_normalized_standardized: Standardized dispersion values

Example

// Default parameters
compute_highly_variable_genes(&adata, None)?;

// Custom parameters
let params = HVGParams {
    min_mean: 0.01,
    max_mean: 5.0,
    min_dispersion: 0.5,
    max_dispersion: f64::INFINITY,
    n_bins: 20,
    n_top_genes: Some(2000),
    flavor: FlavorType::Seurat,
    span: 0.3,
    batch_key: None,
};
compute_highly_variable_genes(&adata, Some(params))?;
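For intuition about the means and dispersions columns, the sketch below computes a per-gene mean and a variance-to-mean dispersion, then marks the top n genes by score. This assumes dispersion is defined as sample variance divided by mean, which is the common convention for Seurat-style selection but is not confirmed for this module; the binning and normalization steps are not shown, and both function names are hypothetical.

```rust
/// Returns (mean, dispersion) for one gene's expression values, where
/// dispersion = sample variance / mean (an assumed definition).
/// Returns None when fewer than two values are given or the mean is zero.
fn mean_and_dispersion(values: &[f64]) -> Option<(f64, f64)> {
    let n = values.len();
    if n < 2 {
        return None;
    }
    let mean = values.iter().sum::<f64>() / n as f64;
    if mean == 0.0 {
        return None;
    }
    let variance =
        values.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / (n as f64 - 1.0);
    Some((mean, variance / mean))
}

/// Marks the `n` highest-scoring genes as true, mimicking an n_top_genes selection.
fn top_n_mask(scores: &[f64], n: usize) -> Vec<bool> {
    let mut order: Vec<usize> = (0..scores.len()).collect();
    // Sort gene indices by score, highest first.
    order.sort_by(|&a, &b| scores[b].total_cmp(&scores[a]));
    let mut mask = vec![false; scores.len()];
    for &idx in order.iter().take(n) {
        mask[idx] = true;
    }
    mask
}
```

In practice the ranking would be applied to a normalized dispersion rather than the raw ratio, so that genes with different mean expression are comparable.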

Differential Expression

rank_gene_groups

Performs differential expression analysis between groups of cells.

pub fn rank_gene_groups(
    adata: &IMAnnData,
    groupby: &str,
    reference: Option<&str>,
    groups: Option<&[&str]>,
    key_added: Option<&str>,
    method: Option<TestMethod>,
    n_genes: Option<usize>,
    correction_method: CorrectionMethod,
    compute_logfoldchanges: Option<bool>,
    pseudocount: Option<f64>
) -> anyhow::Result<()>

Parameters

adata: Reference to AnnData object
groupby: Column name in obs containing group information
reference: Reference group name for comparison (None or "rest" uses all other cells as reference)
groups: Groups to test (None tests all groups)
key_added: Key for storing results (None or empty uses "rank_genes_groups")
method: Statistical test method (default: TestMethod::TTest(TTestType::Welch))
n_genes: Number of top genes to report (default: 100)
correction_method: Multiple testing correction method:
- CorrectionMethod::Bonferroni
- CorrectionMethod::BenjaminiHochberg
- CorrectionMethod::BenjaminiYekutieli
- CorrectionMethod::HolmBonferroni
- CorrectionMethod::Hochberg
- CorrectionMethod::StoreyQValue
compute_logfoldchanges: Whether to compute log fold changes (default: true)
pseudocount: Pseudocount for log fold change calculation (default: 1.0)
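To make the correction methods more concrete, here is a sketch of the standard Benjamini-Hochberg step-up adjustment on a plain slice of p-values. The function name benjamini_hochberg is hypothetical and the code illustrates the textbook procedure, not necessarily how CorrectionMethod::BenjaminiHochberg is implemented in this module.

```rust
/// Returns Benjamini-Hochberg adjusted p-values in the same order as the input.
/// Hypothetical helper for illustration only.
fn benjamini_hochberg(pvals: &[f64]) -> Vec<f64> {
    let m = pvals.len();
    // Indices of the p-values sorted in ascending order.
    let mut order: Vec<usize> = (0..m).collect();
    order.sort_by(|&a, &b| pvals[a].total_cmp(&pvals[b]));

    let mut adjusted = vec![0.0; m];
    let mut running_min = 1.0_f64;
    // Walk from the largest p-value down, scaling by m / rank and
    // enforcing monotonicity with a running minimum (also caps at 1.0).
    for (rank, &idx) in order.iter().enumerate().rev() {
        let scaled = pvals[idx] * m as f64 / (rank as f64 + 1.0);
        running_min = running_min.min(scaled);
        adjusted[idx] = running_min;
    }
    adjusted
}
```

For the p-values [0.01, 0.04, 0.03, 0.005], this yields adjusted values [0.02, 0.04, 0.04, 0.02].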

Returns

anyhow::Result<()>: Success or error

Side Effects

Adds results to adata.uns() under the specified key:

{key}_scores: Test statistics per group
{key}_pvals: P-values per group
{key}_pvals_adj: Adjusted p-values per group
{key}_logfoldchanges: Log fold changes per group
{key}_names: Gene names per group
{key}_params_reference: Reference group information
{key}_params_method: Method information
{key}_params_groupby: Group column information
{key}_groups: List of tested groups

Example

rank_gene_groups(
    &adata,
    "cell_type",                         // Column with group info
    Some("control"),                     // Reference group
    Some(&["type_a", "type_b"]),         // Groups to test
    Some("de_results"),                  // Key for storing results
    Some(TestMethod::TTest(TTestType::Welch)),
    Some(100),                           // Top genes to report
    CorrectionMethod::BenjaminiHochberg,
    Some(true),                          // Compute log fold changes
    Some(1.0)                            // Pseudocount
)?;
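For a single gene, the default Welch test and the log fold change reduce to a few lines of arithmetic. The sketch below uses the textbook Welch t-statistic and assumes a log2 fold change of pseudocount-shifted group means; the module's exact fold-change formula is not stated in this reference, and all function names here are hypothetical.

```rust
/// Returns (mean, sample variance) of a slice with at least two values.
fn mean_var(x: &[f64]) -> (f64, f64) {
    let n = x.len() as f64;
    let mean = x.iter().sum::<f64>() / n;
    let var = x.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / (n - 1.0);
    (mean, var)
}

/// Welch's t-statistic for two groups with possibly unequal variances.
/// Both slices must contain at least two values.
fn welch_t(group: &[f64], reference: &[f64]) -> f64 {
    let (mean_g, var_g) = mean_var(group);
    let (mean_r, var_r) = mean_var(reference);
    let standard_error =
        (var_g / group.len() as f64 + var_r / reference.len() as f64).sqrt();
    (mean_g - mean_r) / standard_error
}

/// Log2 fold change of group mean over reference mean, with a pseudocount
/// added to both means to avoid taking the logarithm of zero (assumed formula).
fn log2_fold_change(mean_group: f64, mean_reference: f64, pseudocount: f64) -> f64 {
    ((mean_group + pseudocount) / (mean_reference + pseudocount)).log2()
}
```

With a pseudocount of 1.0, group means of 3.0 and 1.0 give a log2 fold change of 1.0, meaning the shifted mean in the tested group is twice that of the reference.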
