
Processing Module ⚙️

API reference for the memory::processing module, which provides core data processing functionality for single-cell RNA-seq analysis.

Normalization

normalize_expression

Normalizes expression values in place so that each row (cell) or each column (gene), as selected by direction, sums to expression_target.

pub fn normalize_expression(
    matrix: &IMArrayElement,
    expression_target: u32,
    direction: &Direction,
    precision: Option<Precision>
) -> anyhow::Result<()>

Parameters

  • matrix: Expression matrix to normalize
  • expression_target: Target sum for normalization (typically 10,000)
  • direction: Either Direction::ROW (normalize each cell) or Direction::COLUMN (normalize each gene)
  • precision: Optional floating-point precision (Single for f32, Double for f64)

Returns

  • anyhow::Result<()>: Success or error

Example

normalize_expression(&matrix, 10000, &Direction::ROW, Some(Precision::Single))?;
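Conceptually, row-wise normalization rescales every cell so that its counts add up to the target. The sketch below shows that idea on a dense Vec<Vec<f64>> rather than the module's IMArrayElement; `normalize_rows` is a hypothetical helper, not the library's implementation, and leaving all-zero rows untouched is an assumption.

```rust
/// Hypothetical helper: scale each row (cell) of a dense matrix so it sums to `target`.
/// Rows whose total is zero are left unchanged to avoid dividing by zero (assumption).
fn normalize_rows(matrix: &mut [Vec<f64>], target: f64) {
    for row in matrix.iter_mut() {
        let total: f64 = row.iter().sum();
        if total > 0.0 {
            // One scale factor per row keeps relative expression within the cell intact.
            let scale = target / total;
            for value in row.iter_mut() {
                *value *= scale;
            }
        }
    }
}
```

Column-wise normalization would apply the same scaling per gene instead of per cell.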

log1p_expression

Applies the natural-log transformation ln(1 + x) (log1p) to every value of the expression matrix, in place.

pub fn log1p_expression(
    matrix: &IMArrayElement,
    precision: Option<Precision>
) -> anyhow::Result<()>

Parameters

  • matrix: Expression matrix to transform
  • precision: Optional floating-point precision (Single for f32, Double for f64)

Returns

  • anyhow::Result<()>: Success or error

Example

log1p_expression(&matrix, None)?;
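The transformation itself is element-wise. A minimal sketch on a dense Vec<Vec<f64>> (not the module's matrix type); `log1p_in_place` is a hypothetical helper that uses Rust's standard `f64::ln_1p`, which stays accurate for values close to zero where `(1.0 + x).ln()` loses precision.

```rust
/// Hypothetical helper: replace every value x with ln(1 + x), in place.
fn log1p_in_place(matrix: &mut [Vec<f64>]) {
    for row in matrix.iter_mut() {
        for value in row.iter_mut() {
            // f64::ln_1p computes ln(1 + x) without forming 1 + x explicitly.
            *value = (*value).ln_1p();
        }
    }
}
```

Zero counts stay zero after the transform, so the sparsity pattern of the matrix is preserved.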

Filtering

mark_filter_cells

Creates a boolean mask indicating which cells pass all specified filtering criteria.

pub fn mark_filter_cells<I, T>(
    anndata: &IMAnnData,
    min_genes: Option<I>,
    max_genes: Option<I>,
    min_counts: Option<T>,
    max_counts: Option<T>,
    min_fraction: Option<T>,
    max_fraction: Option<T>
) -> anyhow::Result<Vec<bool>>
where
    I: PrimInt + Unsigned + Zero + AddAssign + Into<T>,
    T: Float + NumCast + AddAssign + Sum

Type Parameters

  • I: Integer type for counting genes (typically u32)
  • T: Floating-point type for counts and fractions (typically f64)

Parameters

  • anndata: Reference to AnnData object
  • min_genes: Minimum number of expressed genes required for a cell
  • max_genes: Maximum number of expressed genes allowed for a cell
  • min_counts: Minimum total count required for a cell
  • max_counts: Maximum total count allowed for a cell
  • min_fraction: Minimum fraction of all genes that must be expressed in a cell
  • max_fraction: Maximum fraction of all genes that may be expressed in a cell

Returns

  • anyhow::Result<Vec<bool>>: Boolean vector where true indicates cells that pass all filters

Example

let cell_mask = mark_filter_cells::<u32, f64>(
    &adata,
    Some(200),   // Minimum genes
    Some(5000),  // Maximum genes
    Some(500.0), // Minimum counts
    None,        // No maximum counts
    None, None   // No fraction thresholds
)?;
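Each cell passes only if every threshold that is set holds. The sketch below reproduces that logic on a dense cells × genes Vec<Vec<f64>>; `CellThresholds` and `cell_mask` are hypothetical names, and treating the bounds as inclusive and "expressed" as "value > 0" are assumptions about the module's behavior.

```rust
/// Hypothetical mirror of mark_filter_cells' optional thresholds.
#[derive(Default)]
struct CellThresholds {
    min_genes: Option<u32>,
    max_genes: Option<u32>,
    min_counts: Option<f64>,
    max_counts: Option<f64>,
    min_fraction: Option<f64>,
    max_fraction: Option<f64>,
}

/// Rows are cells, columns are genes. A threshold set to None is not checked.
fn cell_mask(matrix: &[Vec<f64>], t: &CellThresholds) -> Vec<bool> {
    matrix
        .iter()
        .map(|row| {
            // Number of genes with a nonzero value in this cell.
            let n_genes = row.iter().filter(|&&v| v > 0.0).count() as u32;
            // Total counts in this cell.
            let counts: f64 = row.iter().sum();
            // Fraction of all genes that are expressed in this cell.
            let fraction = if row.is_empty() { 0.0 } else { n_genes as f64 / row.len() as f64 };
            t.min_genes.map_or(true, |m| n_genes >= m)
                && t.max_genes.map_or(true, |m| n_genes <= m)
                && t.min_counts.map_or(true, |m| counts >= m)
                && t.max_counts.map_or(true, |m| counts <= m)
                && t.min_fraction.map_or(true, |m| fraction >= m)
                && t.max_fraction.map_or(true, |m| fraction <= m)
        })
        .collect()
}
```

For example, with min_genes = 2 and min_counts = 4.0, a cell with two expressed genes but only three counts fails, because all active criteria must pass.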

mark_filter_genes

Creates a boolean mask indicating which genes pass all specified filtering criteria.

pub fn mark_filter_genes<I, T>(
    anndata: &IMAnnData,
    min_cells: Option<I>,
    max_cells: Option<I>,
    min_counts: Option<T>,
    max_counts: Option<T>,
    min_fraction: Option<T>,
    max_fraction: Option<T>
) -> anyhow::Result<Vec<bool>>
where
    I: PrimInt + Unsigned + Zero + AddAssign + Into<T>,
    T: Float + NumCast + AddAssign + Sum

Type Parameters

  • I: Integer type for counting cells (typically u32)
  • T: Floating-point type for counts and fractions (typically f64)

Parameters

  • anndata: Reference to AnnData object
  • min_cells: Minimum number of cells in which a gene must be expressed
  • max_cells: Maximum number of cells in which a gene may be expressed
  • min_counts: Minimum total count required for a gene
  • max_counts: Maximum total count allowed for a gene
  • min_fraction: Minimum fraction of all cells that must express a gene
  • max_fraction: Maximum fraction of all cells that may express a gene

Returns

  • anyhow::Result<Vec<bool>>: Boolean vector where true indicates genes that pass all filters

Example

let gene_mask = mark_filter_genes::<u32, f64>(
    &adata,
    Some(3),     // Expressed in at least 3 cells
    None,        // No maximum cells threshold
    Some(10.0),  // At least 10 total counts
    None,        // No maximum counts threshold
    Some(0.001), // Expressed in at least 0.1% of cells
    None         // No maximum fraction threshold
)?;
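Gene filtering works column-wise: per-gene statistics are accumulated over all cells and then compared with the thresholds. A minimal sketch on a rectangular, dense cells × genes Vec<Vec<f64>>, covering the three lower bounds used in the example above; `gene_stats` and `gene_mask` are hypothetical helpers, and inclusive comparisons are an assumption.

```rust
/// Per-gene (number of cells with a nonzero value, total counts).
/// Assumes every row has the same length as the first one.
fn gene_stats(matrix: &[Vec<f64>]) -> Vec<(u32, f64)> {
    let n_genes = matrix.first().map_or(0, |row| row.len());
    let mut stats = vec![(0u32, 0.0f64); n_genes];
    for row in matrix {
        for (g, &v) in row.iter().enumerate() {
            if v > 0.0 {
                stats[g].0 += 1; // one more cell expresses gene g
            }
            stats[g].1 += v; // accumulate total counts for gene g
        }
    }
    stats
}

/// Keep genes meeting min_cells, min_counts and min_fraction (inclusive).
fn gene_mask(matrix: &[Vec<f64>], min_cells: u32, min_counts: f64, min_fraction: f64) -> Vec<bool> {
    let n_cells = matrix.len() as f64;
    gene_stats(matrix)
        .into_iter()
        .map(|(cells, counts)| {
            let fraction = if n_cells > 0.0 { cells as f64 / n_cells } else { 0.0 };
            cells >= min_cells && counts >= min_counts && fraction >= min_fraction
        })
        .collect()
}
```

The upper bounds (max_cells, max_counts, max_fraction) would be checked the same way with `<=`.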

Highly Variable Genes

compute_highly_variable_genes

Identifies highly variable genes using the statistical method selected by flavor in HVGParams and records the results in adata.var().

pub fn compute_highly_variable_genes(
    adata: &IMAnnData,
    params: Option<HVGParams>
) -> anyhow::Result<()>

Parameters

  • adata: Reference to AnnData object
  • params: Optional HVGParams struct with the following fields:
    • min_mean: Minimum mean expression (default: 0.0125)
    • max_mean: Maximum mean expression (default: 3.0)
    • min_dispersion: Minimum dispersion (default: 0.5)
    • max_dispersion: Maximum dispersion (default: Infinity)
    • n_bins: Number of mean-expression bins used to normalize dispersions (default: 20)
    • n_top_genes: Optional number of top variable genes to select
    • flavor: Statistical method (FlavorType::Seurat, FlavorType::CellRanger, or FlavorType::SVR)
    • span: Span parameter for trend fitting (default: 0.3)
    • batch_key: Optional obs column name identifying batches, for batch-aware gene selection

Returns

  • anyhow::Result<()>: Success or error

Side Effects

Adds columns to adata.var():

  • means: Mean expression per gene
  • dispersions: Dispersion values
  • dispersions_norm: Normalized dispersion values
  • highly_variable: Boolean indicating highly variable genes
  • dispersions_normalized_standardized: Standardized dispersion values

Example

// Default parameters
compute_highly_variable_genes(&adata, None)?;

// Custom parameters
let params = HVGParams {
    min_mean: 0.01,
    max_mean: 5.0,
    min_dispersion: 0.5,
    max_dispersion: f64::INFINITY,
    n_bins: 20,
    n_top_genes: Some(2000),
    flavor: FlavorType::Seurat,
    span: 0.3,
    batch_key: None,
};
compute_highly_variable_genes(&adata, Some(params))?;
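In the Seurat flavor as popularized by Scanpy, the normalized dispersion of a gene is its dispersion (variance / mean) z-scored against the other genes that fall in the same bin of mean expression. The sketch below illustrates that binning idea on precomputed per-gene means and variances; `normalized_dispersions` is a hypothetical helper, it omits the log transforms the Seurat flavor applies, and whether this module's dispersions_norm follows exactly these steps is an assumption.

```rust
/// Hypothetical helper: z-score each gene's dispersion within equal-width bins of mean.
fn normalized_dispersions(means: &[f64], vars: &[f64], n_bins: usize) -> Vec<f64> {
    assert!(n_bins > 0, "n_bins must be positive");
    // Dispersion = variance / mean; genes with zero mean get 0 (simplification).
    let disp: Vec<f64> = means
        .iter()
        .zip(vars)
        .map(|(&m, &v)| if m > 0.0 { v / m } else { 0.0 })
        .collect();
    // Range of means, used to build equal-width bins.
    let (lo, hi) = means
        .iter()
        .fold((f64::INFINITY, f64::NEG_INFINITY), |(lo, hi), &m| (lo.min(m), hi.max(m)));
    let width = (hi - lo) / n_bins as f64;
    // Bin index in 0..n_bins for every gene; the maximum mean goes into the last bin.
    let bins: Vec<usize> = means
        .iter()
        .map(|&m| if width > 0.0 { (((m - lo) / width) as usize).min(n_bins - 1) } else { 0 })
        .collect();
    let mut out = vec![0.0; means.len()];
    for b in 0..n_bins {
        let idx: Vec<usize> = (0..means.len()).filter(|&i| bins[i] == b).collect();
        if idx.is_empty() {
            continue;
        }
        let n = idx.len() as f64;
        let mu = idx.iter().map(|&i| disp[i]).sum::<f64>() / n;
        // Sample standard deviation of dispersions in this bin.
        let sd = if idx.len() > 1 {
            (idx.iter().map(|&i| (disp[i] - mu).powi(2)).sum::<f64>() / (n - 1.0)).sqrt()
        } else {
            0.0
        };
        for &i in &idx {
            // Bins with a single gene or zero spread yield 0 here (simplification).
            out[i] = if sd > 0.0 { (disp[i] - mu) / sd } else { 0.0 };
        }
    }
    out
}
```

Under Scanpy's convention, a gene is then flagged highly variable when its mean lies between min_mean and max_mean and its normalized dispersion between min_dispersion and max_dispersion, or, when n_top_genes is set, when it ranks among the top n_top_genes by normalized dispersion.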

Differential Expression

rank_gene_groups

Ranks genes by differential expression for each tested group of cells, comparing the group against a reference group or against all remaining cells.

pub fn rank_gene_groups(
    adata: &IMAnnData,
    groupby: &str,
    reference: Option<&str>,
    groups: Option<&[&str]>,
    key_added: Option<&str>,
    method: Option<TestMethod>,
    n_genes: Option<usize>,
    correction_method: CorrectionMethod,
    compute_logfoldchanges: Option<bool>,
    pseudocount: Option<f64>
) -> anyhow::Result<()>

Parameters

  • adata: Reference to AnnData object
  • groupby: Column name in obs containing group information
  • reference: Reference group name for comparison (None or "rest" uses all other cells as reference)
  • groups: Groups to test (None tests all groups)
  • key_added: Key for storing results (None or empty uses "rank_genes_groups")
  • method: Statistical test method (default: TestMethod::TTest(TTestType::Welch))
  • n_genes: Number of top genes to report (default: 100)
  • correction_method: Multiple testing correction method:
    • CorrectionMethod::Bonferroni
    • CorrectionMethod::BenjaminiHochberg
    • CorrectionMethod::BenjaminiYekutieli
    • CorrectionMethod::HolmBonferroni
    • CorrectionMethod::Hochberg
    • CorrectionMethod::StoreyQValue
  • compute_logfoldchanges: Whether to compute log fold changes (default: true)
  • pseudocount: Pseudocount for log fold change calculation (default: 1.0)
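CorrectionMethod::BenjaminiHochberg presumably applies the standard Benjamini-Hochberg step-up procedure: rank the m p-values in ascending order, scale the p-value at rank i by m / i, then enforce monotonicity from the largest rank down and cap at 1. A minimal sketch of that procedure; `benjamini_hochberg` is a hypothetical helper, not this module's code, and it panics if any p-value is NaN.

```rust
/// Benjamini-Hochberg adjusted p-values, returned in the input order.
fn benjamini_hochberg(pvals: &[f64]) -> Vec<f64> {
    let m = pvals.len();
    let mut order: Vec<usize> = (0..m).collect();
    // Visit indices from the largest p-value to the smallest (NaN would panic here).
    order.sort_by(|&a, &b| pvals[b].partial_cmp(&pvals[a]).unwrap());
    let mut adjusted = vec![0.0; m];
    // Starting at 1.0 caps every adjusted value at 1.
    let mut running_min = 1.0_f64;
    for (k, &i) in order.iter().enumerate() {
        let rank = (m - k) as f64; // 1-based rank in ascending order
        // A running minimum keeps adjusted values monotone in the raw p-values.
        running_min = running_min.min(pvals[i] * m as f64 / rank);
        adjusted[i] = running_min;
    }
    adjusted
}
```

For p-values [0.01, 0.04, 0.03, 0.5], the 0.03 entry would scale to 0.06 but is pulled down to 0.0533 by the monotonicity step, matching the adjusted value of 0.04.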

Returns

  • anyhow::Result<()>: Success or error

Side Effects

Adds results to adata.uns() under the specified key:

  • {key}_scores: Test statistics per group
  • {key}_pvals: P-values per group
  • {key}_pvals_adj: Adjusted p-values per group
  • {key}_logfoldchanges: Log fold changes per group
  • {key}_names: Gene names per group
  • {key}_params_reference: Reference group information
  • {key}_params_method: Method information
  • {key}_params_groupby: Group column information
  • {key}_groups: List of tested groups

Example

rank_gene_groups(
    &adata,
    "cell_type",                 // Column with group info
    Some("control"),             // Reference group
    Some(&["type_a", "type_b"]), // Groups to test
    Some("de_results"),          // Key for storing results
    Some(TestMethod::TTest(TTestType::Welch)), // Welch's t-test
    Some(100),                   // Top genes to report
    CorrectionMethod::BenjaminiHochberg,
    Some(true),                  // Compute log fold changes
    Some(1.0)                    // Pseudocount
)?;
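With the default Welch's t-test, a gene's score for a group is presumably the Welch t statistic comparing the group's values with the reference cells' values. The sketch below computes that statistic and its Welch-Satterthwaite degrees of freedom for one gene; turning t into a p-value additionally needs a Student's t CDF and is omitted. `mean_var`, `welch_t`, and `log2_fold_change` are hypothetical helpers, and the fold-change formula log2((mean_group + pseudocount) / (mean_reference + pseudocount)) is an assumption about how pseudocount is used, not confirmed by this reference.

```rust
/// Mean and sample variance (n - 1 denominator); needs at least two values.
fn mean_var(x: &[f64]) -> (f64, f64) {
    let n = x.len() as f64;
    let mean = x.iter().sum::<f64>() / n;
    let var = x.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / (n - 1.0);
    (mean, var)
}

/// Welch's t statistic and Welch-Satterthwaite degrees of freedom for one gene.
fn welch_t(group: &[f64], reference: &[f64]) -> (f64, f64) {
    let (m1, v1) = mean_var(group);
    let (m2, v2) = mean_var(reference);
    let (n1, n2) = (group.len() as f64, reference.len() as f64);
    // Squared standard errors of each mean.
    let (s1, s2) = (v1 / n1, v2 / n2);
    let t = (m1 - m2) / (s1 + s2).sqrt();
    let df = (s1 + s2).powi(2) / (s1 * s1 / (n1 - 1.0) + s2 * s2 / (n2 - 1.0));
    (t, df)
}

/// Assumed log2 fold change of group vs. reference means with a pseudocount.
fn log2_fold_change(group: &[f64], reference: &[f64], pseudocount: f64) -> f64 {
    let (m1, _) = mean_var(group);
    let (m2, _) = mean_var(reference);
    ((m1 + pseudocount) / (m2 + pseudocount)).log2()
}
```

Welch's variant does not assume equal variances in the two groups, which suits comparisons between cell populations of very different sizes, such as one group against the rest.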