Processing Module ⚙️
API reference for the memory::processing module, which provides core data processing functionality for single-cell RNA-seq analysis.
Normalization
normalize_expression
Normalizes expression data so that each row (cell) or column (gene), as selected by direction, sums to the target count.
pub fn normalize_expression(
matrix: &IMArrayElement,
expression_target: u32,
direction: &Direction,
precision: Option<Precision>
) -> anyhow::Result<()>
Parameters
matrix: Expression matrix to normalize
expression_target: Target sum for normalization (typically 10,000)
direction: Either Direction::ROW (normalize each cell) or Direction::COLUMN (normalize each gene)
precision: Optional floating-point precision (Precision::Single for f32, Precision::Double for f64)
Returns
anyhow::Result<()>: Ok(()) on success, or an error
Example
normalize_expression(&matrix, 10000, &Direction::ROW, Some(Precision::Single))?;
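The scaling performed by normalize_expression can be illustrated on a plain dense matrix. This is a minimal sketch under stated assumptions, not the crate's implementation: it uses a rectangular row-major Vec<Vec<f64>> (cells × genes) in place of IMArrayElement, a hypothetical Axis enum in place of Direction, and it leaves all-zero rows or columns unchanged, which may differ from the library's behavior.

```rust
/// Hypothetical stand-in for the crate's Direction type (not the real API).
#[derive(Clone, Copy)]
enum Axis {
    Row,
    Column,
}

/// Scales each row (or column) of a dense, rectangular matrix so that it sums
/// to `target`. Rows/columns whose sum is zero are left unchanged to avoid
/// dividing by zero (an assumption made for this sketch).
fn normalize_total(matrix: &mut [Vec<f64>], target: f64, axis: Axis) {
    match axis {
        Axis::Row => {
            for row in matrix.iter_mut() {
                let total: f64 = row.iter().sum();
                if total > 0.0 {
                    let scale = target / total;
                    for x in row.iter_mut() {
                        *x *= scale;
                    }
                }
            }
        }
        Axis::Column => {
            let n_cols = matrix.first().map_or(0, |r| r.len());
            for j in 0..n_cols {
                // Sum column j across all rows, then rescale it in place.
                let total: f64 = matrix.iter().map(|r| r[j]).sum();
                if total > 0.0 {
                    let scale = target / total;
                    for row in matrix.iter_mut() {
                        row[j] *= scale;
                    }
                }
            }
        }
    }
}
```

For example, a cell with counts [1.0, 3.0] and a target of 10.0 becomes [2.5, 7.5].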
log1p_expression
Applies the log1p transformation, ln(1 + x), element-wise to the expression matrix in place.
pub fn log1p_expression(
matrix: &IMArrayElement,
precision: Option<Precision>
) -> anyhow::Result<()>
Parameters
matrix: Expression matrix to transform
precision: Optional floating-point precision (Precision::Single for f32, Precision::Double for f64)
Returns
anyhow::Result<()>: Ok(()) on success, or an error
Example
log1p_expression(&matrix, None)?;
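The transform itself is simple. A minimal sketch on a dense Vec<Vec<f64>> (assumed here in place of IMArrayElement; log1p_in_place is a hypothetical name) shows why the standard library's f64::ln_1p is preferable to computing (1.0 + x).ln() directly: it stays accurate for values of x close to zero.

```rust
/// Applies ln(1 + x) element-wise, in place, to a dense matrix.
/// f64::ln_1p avoids the precision loss of (1.0 + x).ln() for tiny x.
fn log1p_in_place(matrix: &mut [Vec<f64>]) {
    for row in matrix.iter_mut() {
        for x in row.iter_mut() {
            *x = (*x).ln_1p();
        }
    }
}
```

Because ln(1 + 0) = 0, zero entries stay zero, so the transform preserves the sparsity pattern of a sparse count matrix.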
Filtering
mark_filter_cells
Creates a boolean mask indicating which cells pass all specified filtering criteria.
pub fn mark_filter_cells<I, T>(
anndata: &IMAnnData,
min_genes: Option<I>,
max_genes: Option<I>,
min_counts: Option<T>,
max_counts: Option<T>,
min_fraction: Option<T>,
max_fraction: Option<T>
) -> anyhow::Result<Vec<bool>>
where
I: PrimInt + Unsigned + Zero + AddAssign + Into<T>,
T: Float + NumCast + AddAssign + Sum
Type Parameters
I: Integer type for counting genes (typically u32)
T: Floating-point type for counts and fractions (typically f64)
Parameters
anndata: Reference to the AnnData object
min_genes: Minimum number of expressed genes required for a cell
max_genes: Maximum number of expressed genes allowed for a cell
min_counts: Minimum total count required for a cell
max_counts: Maximum total count allowed for a cell
min_fraction: Minimum fraction of all genes that must be expressed in a cell
max_fraction: Maximum fraction of all genes that may be expressed in a cell
Returns
anyhow::Result<Vec<bool>>: Boolean vector where true indicates cells that pass all filters
Example
let cell_mask = mark_filter_cells::<u32, f64>(
&adata,
Some(200), // Minimum genes
Some(5000), // Maximum genes
Some(500.0), // Minimum counts
None, // No maximum counts
None, None // No fraction thresholds
)?;
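The per-cell criteria can be sketched as follows. This is a simplified illustration under stated assumptions, not the crate's code: the matrix is a dense cells × genes Vec<Vec<f64>>, a gene counts as expressed when its value is greater than zero, all bounds are inclusive, and the fraction is the number of expressed genes divided by the total number of genes. CellThresholds and cell_mask are hypothetical names.

```rust
/// Hypothetical threshold bundle (not the crate's API); `None` disables a bound,
/// mirroring the Option parameters of mark_filter_cells.
#[derive(Default)]
struct CellThresholds {
    min_genes: Option<u32>,
    max_genes: Option<u32>,
    min_counts: Option<f64>,
    max_counts: Option<f64>,
    min_fraction: Option<f64>,
    max_fraction: Option<f64>,
}

/// Returns one bool per row (cell): true if the cell satisfies every bound
/// that is set. Bounds are treated as inclusive (an assumption).
fn cell_mask(matrix: &[Vec<f64>], t: &CellThresholds) -> Vec<bool> {
    matrix
        .iter()
        .map(|row| {
            let n_vars = row.len() as f64;
            // Number of genes with a non-zero value in this cell.
            let n_genes = row.iter().filter(|&&x| x > 0.0).count() as u32;
            let counts: f64 = row.iter().sum();
            let fraction = if n_vars > 0.0 { n_genes as f64 / n_vars } else { 0.0 };
            t.min_genes.map_or(true, |m| n_genes >= m)
                && t.max_genes.map_or(true, |m| n_genes <= m)
                && t.min_counts.map_or(true, |m| counts >= m)
                && t.max_counts.map_or(true, |m| counts <= m)
                && t.min_fraction.map_or(true, |m| fraction >= m)
                && t.max_fraction.map_or(true, |m| fraction <= m)
        })
        .collect()
}
```

Default::default() yields a CellThresholds with every bound disabled, so only the fields of interest need to be set.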
mark_filter_genes
Creates a boolean mask indicating which genes pass all specified filtering criteria.
pub fn mark_filter_genes<I, T>(
anndata: &IMAnnData,
min_cells: Option<I>,
max_cells: Option<I>,
min_counts: Option<T>,
max_counts: Option<T>,
min_fraction: Option<T>,
max_fraction: Option<T>
) -> anyhow::Result<Vec<bool>>
where
I: PrimInt + Unsigned + Zero + AddAssign + Into<T>,
T: Float + NumCast + AddAssign + Sum
Type Parameters
I: Integer type for counting cells (typically u32)
T: Floating-point type for counts and fractions (typically f64)
Parameters
anndata: Reference to the AnnData object
min_cells: Minimum number of cells that must express a gene
max_cells: Maximum number of cells that may express a gene
min_counts: Minimum total count required for a gene
max_counts: Maximum total count allowed for a gene
min_fraction: Minimum fraction of all cells that must express a gene
max_fraction: Maximum fraction of all cells that may express a gene
Returns
anyhow::Result<Vec<bool>>: Boolean vector where true indicates genes that pass all filters
Example
let gene_mask = mark_filter_genes::<u32, f64>(
&adata,
Some(3), // Expressed in at least 3 cells
None, // No maximum cells threshold
Some(10.0), // At least 10 total counts
None, // No maximum counts threshold
Some(0.001), // Expressed in at least 0.1% of cells
None // No maximum fraction threshold
)?;
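Gene filtering applies the same kind of bounds per column. The sketch below, again assuming a dense cells × genes matrix, a non-zero value meaning "expressed", and inclusive bounds, first collects (cells expressing the gene, total counts) for every gene in a single row-major pass, which avoids strided column access, and then checks each optional (min, max) pair. gene_stats, in_range, and gene_mask are hypothetical helpers, not part of the crate.

```rust
/// Per-gene (number of cells with value > 0, total counts), computed in one
/// pass over the rows of a dense, rectangular cells x genes matrix.
fn gene_stats(matrix: &[Vec<f64>]) -> Vec<(u32, f64)> {
    let n_genes = matrix.first().map_or(0, |r| r.len());
    let mut stats = vec![(0u32, 0.0f64); n_genes];
    for row in matrix {
        for (j, &x) in row.iter().enumerate() {
            if x > 0.0 {
                stats[j].0 += 1;
            }
            stats[j].1 += x;
        }
    }
    stats
}

/// Checks an optional inclusive [min, max] range; `None` means "no bound".
fn in_range<T: PartialOrd + Copy>(v: T, min: Option<T>, max: Option<T>) -> bool {
    min.map_or(true, |m| v >= m) && max.map_or(true, |m| v <= m)
}

/// Hypothetical gene mask: true if the gene passes every bound that is set.
/// The fraction is the share of cells (rows) in which the gene is expressed.
fn gene_mask(
    matrix: &[Vec<f64>],
    cells: (Option<u32>, Option<u32>),
    counts: (Option<f64>, Option<f64>),
    fraction: (Option<f64>, Option<f64>),
) -> Vec<bool> {
    let n_cells = matrix.len() as f64;
    gene_stats(matrix)
        .into_iter()
        .map(|(n, total)| {
            let frac = if n_cells > 0.0 { n as f64 / n_cells } else { 0.0 };
            in_range(n, cells.0, cells.1)
                && in_range(total, counts.0, counts.1)
                && in_range(frac, fraction.0, fraction.1)
        })
        .collect()
}
```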
Highly Variable Genes
compute_highly_variable_genes
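Identifies highly variable genes from the relationship between each gene's mean expression and its variability, using the statistical method selected by params.flavor.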
pub fn compute_highly_variable_genes(
adata: &IMAnnData,
params: Option<HVGParams>
) -> anyhow::Result<()>
Parameters
adata: Reference to the AnnData object
params: Optional HVGParams struct with the following fields:
  min_mean: Minimum mean expression (default: 0.0125)
  max_mean: Maximum mean expression (default: 3.0)
  min_dispersion: Minimum dispersion (default: 0.5)
  max_dispersion: Maximum dispersion (default: Infinity)
  n_bins: Number of bins for the mean-variance relationship (default: 20)
  n_top_genes: Optional number of top variable genes to select
  flavor: Statistical method (FlavorType::Seurat, FlavorType::CellRanger, or FlavorType::SVR)
  span: Span parameter for trend fitting (default: 0.3)
  batch_key: Optional column name for batch correction
Returns
anyhow::Result<()>: Ok(()) on success, or an error
Side Effects
Adds the following columns to adata.var():
means: Mean expression per gene
dispersions: Dispersion values
dispersions_norm: Normalized dispersion values
highly_variable: Boolean indicating highly variable genes
dispersions_normalized_standardized: Standardized dispersion values
Example
// Default parameters
compute_highly_variable_genes(&adata, None)?;
// Custom parameters
let params = HVGParams {
min_mean: 0.01,
max_mean: 5.0,
min_dispersion: 0.5,
max_dispersion: f64::INFINITY,
n_bins: 20,
n_top_genes: Some(2000),
flavor: FlavorType::Seurat,
span: 0.3,
batch_key: None,
};
compute_highly_variable_genes(&adata, Some(params))?;
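The core of a Seurat-style selection can be sketched in three steps: compute each gene's mean and dispersion (variance / mean), z-score dispersions within n_bins equal-width bins of mean expression, and keep genes whose mean and normalized dispersion fall inside the thresholds. This is a simplified sketch, not the crate's implementation: Scanpy's Seurat flavor, for instance, also log-transforms the means and dispersions before binning, the handling of sparsely populated bins and inclusive bounds here are assumptions, and mean_dispersion, normalized_dispersions, and select_hvg are hypothetical names.

```rust
/// Mean and dispersion (sample variance / mean) of one gene across cells.
/// Requires at least two cells; dispersion is 0.0 when the mean is 0 (assumption).
fn mean_dispersion(values: &[f64]) -> (f64, f64) {
    let n = values.len() as f64;
    let mean = values.iter().sum::<f64>() / n;
    let var = values.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    let disp = if mean > 0.0 { var / mean } else { 0.0 };
    (mean, disp)
}

/// Z-scores each gene's dispersion against the genes in its mean bin.
/// Bins are `n_bins` equal-width intervals spanning [min(mean), max(mean)].
/// Simplification: genes in bins with < 2 members or zero spread get 0.0.
fn normalized_dispersions(means: &[f64], disps: &[f64], n_bins: usize) -> Vec<f64> {
    assert!(n_bins > 0, "n_bins must be at least 1");
    let lo = means.iter().cloned().fold(f64::INFINITY, f64::min);
    let hi = means.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let width = (hi - lo) / n_bins as f64;
    // The largest mean would land one past the end, so clamp to the last bin.
    let bin_of = |m: f64| -> usize {
        if width > 0.0 {
            (((m - lo) / width) as usize).min(n_bins - 1)
        } else {
            0
        }
    };
    let bins: Vec<usize> = means.iter().map(|&m| bin_of(m)).collect();
    let mut out = vec![0.0; means.len()];
    for b in 0..n_bins {
        let members: Vec<usize> = (0..means.len()).filter(|&i| bins[i] == b).collect();
        if members.len() < 2 {
            continue;
        }
        let k = members.len() as f64;
        let mu = members.iter().map(|&i| disps[i]).sum::<f64>() / k;
        let var = members.iter().map(|&i| (disps[i] - mu).powi(2)).sum::<f64>() / (k - 1.0);
        let sd = var.sqrt();
        if sd > 0.0 {
            for &i in &members {
                out[i] = (disps[i] - mu) / sd;
            }
        }
    }
    out
}

/// Flags genes whose mean and normalized dispersion lie inside the bounds
/// (treated as inclusive here, which is an assumption).
fn select_hvg(
    means: &[f64],
    norm_disp: &[f64],
    (min_mean, max_mean): (f64, f64),
    (min_disp, max_disp): (f64, f64),
) -> Vec<bool> {
    means
        .iter()
        .zip(norm_disp)
        .map(|(&m, &d)| m >= min_mean && m <= max_mean && d >= min_disp && d <= max_disp)
        .collect()
}
```

Binning by mean before standardizing is what lets a lowly expressed gene be judged against other lowly expressed genes rather than against the whole transcriptome.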
Differential Expression
rank_gene_groups
Performs differential expression analysis between groups of cells.
pub fn rank_gene_groups(
adata: &IMAnnData,
groupby: &str,
reference: Option<&str>,
groups: Option<&[&str]>,
key_added: Option<&str>,
method: Option<TestMethod>,
n_genes: Option<usize>,
correction_method: CorrectionMethod,
compute_logfoldchanges: Option<bool>,
pseudocount: Option<f64>
) -> anyhow::Result<()>
Parameters
adata: Reference to the AnnData object
groupby: Column name in obs containing the group labels
reference: Reference group for comparison (None or "rest" uses all other cells as the reference)
groups: Groups to test (None tests all groups)
key_added: Key for storing results (None or an empty string uses "rank_genes_groups")
method: Statistical test method (default: TestMethod::TTest(TTestType::Welch))
n_genes: Number of top genes to report (default: 100)
correction_method: Multiple-testing correction method, one of:
  CorrectionMethod::Bonferroni
  CorrectionMethod::BenjaminiHochberg
  CorrectionMethod::BenjaminiYekutieli
  CorrectionMethod::HolmBonferroni
  CorrectionMethod::Hochberg
  CorrectionMethod::StoreyQValue
compute_logfoldchanges: Whether to compute log fold changes (default: true)
pseudocount: Pseudocount for the log fold change calculation (default: 1.0)
Returns
anyhow::Result<()>: Ok(()) on success, or an error
Side Effects
Adds the following results to adata.uns() under the specified key:
{key}_scores: Test statistics per group
{key}_pvals: P-values per group
{key}_pvals_adj: Adjusted p-values per group
{key}_logfoldchanges: Log fold changes per group
{key}_names: Gene names per group
{key}_params_reference: Reference group information
{key}_params_method: Method information
{key}_params_groupby: Group column information
{key}_groups: List of tested groups
Example
rank_gene_groups(
&adata,
"cell_type", // Column with group info
Some("control"), // Reference group
Some(&["type_a", "type_b"]), // Groups to test
Some("de_results"), // Key for storing results
Some(TestMethod::TTest(TTestType::Welch)),
Some(100), // Top genes to report
CorrectionMethod::BenjaminiHochberg,
Some(true), // Compute log fold changes
Some(1.0) // Pseudocount
)?;
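Two of the building blocks named above can be sketched with the standard library alone: Welch's t statistic for one gene (group vs. reference, without assuming equal variances), as used by TTestType::Welch, and Benjamini-Hochberg adjustment of a vector of p-values, as selected by CorrectionMethod::BenjaminiHochberg. Converting t to a p-value additionally needs the Welch-Satterthwaite degrees of freedom and the Student t distribution, which are omitted here. welch_t and benjamini_hochberg are hypothetical names; this is an illustration, not the crate's implementation.

```rust
/// Welch's t statistic for one gene: group values vs. reference values.
/// Uses sample variances (n - 1 denominator); each side needs >= 2 cells.
fn welch_t(group: &[f64], reference: &[f64]) -> f64 {
    let stats = |v: &[f64]| {
        let n = v.len() as f64;
        let mean = v.iter().sum::<f64>() / n;
        let var = v.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
        (n, mean, var)
    };
    let (n1, m1, v1) = stats(group);
    let (n2, m2, v2) = stats(reference);
    (m1 - m2) / (v1 / n1 + v2 / n2).sqrt()
}

/// Benjamini-Hochberg adjusted p-values, returned in the input order.
/// adj_(i) = min over j >= i of (m / j) * p_(j), capped at 1.
/// Assumes no NaN p-values (partial_cmp would fail on NaN).
fn benjamini_hochberg(pvals: &[f64]) -> Vec<f64> {
    let m = pvals.len();
    let mut order: Vec<usize> = (0..m).collect();
    // Sort indices by ascending p-value.
    order.sort_by(|&a, &b| pvals[a].partial_cmp(&pvals[b]).unwrap());
    let mut adjusted = vec![0.0; m];
    let mut running_min = 1.0_f64;
    // Walk from the largest p-value down, keeping a running minimum so the
    // adjusted values are monotone in the ranking.
    for (rank0, &i) in order.iter().enumerate().rev() {
        let rank = (rank0 + 1) as f64;
        running_min = running_min.min(pvals[i] * m as f64 / rank);
        adjusted[i] = running_min;
    }
    adjusted
}
```

For p-values [0.01, 0.04, 0.03, 0.20], benjamini_hochberg returns approximately [0.04, 0.0533, 0.0533, 0.20]: the raw 0.03 would scale to 0.06, but the running minimum pulls it down to the value of the next-ranked p-value.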