Linear Algebra Operations
single-algebra provides robust linear algebra capabilities built on multiple backend options. These operations form the foundation for dimensionality reduction, matrix decomposition, and other advanced analytical techniques essential for working with high-dimensional data.
SVD (Singular Value Decomposition)
SVD is a fundamental matrix factorization that expresses a matrix A as the product U Σ V^T, where U and V have orthonormal columns and Σ is diagonal with non-negative singular values. single-algebra implements SVD with multiple backend options:
- LAPACK-based SVD: Leverages the industry-standard LAPACK library through the nalgebra-lapack crate with an OpenBLAS backend for high performance.
- Faer-based SVD: Uses the Faer library, a pure Rust implementation optimized for modern CPU architectures.
- Single-SVDLib: A specialized implementation for large sparse matrices, with support for both the Lanczos algorithm and randomized approaches.
All SVD implementations expose a consistent interface for retrieving:
- Left singular vectors (U)
- Singular values (Σ)
- Right singular vectors (V^T)
- Matrix reconstruction
// Example of using LAPACK-based SVD
let mut svd = crate::svd::lapack::SVD::new();
svd.compute(matrix.view())?;
// Access components
let u = svd.u().unwrap();
let s = svd.s().unwrap();
let vt = svd.vt().unwrap();
// Reconstruct the original matrix
let reconstructed = svd.reconstruct().unwrap();
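To make the roles of U, Σ, and V^T concrete, the following is a minimal, std-only sketch that recovers only the largest singular triplet of a small dense matrix by power iteration on A^T A. It is not how any of the backends above compute the decomposition, and the helper names (mat_vec, mat_t_vec, top_singular_triplet) are hypothetical.

```rust
/// Multiply a row-major matrix by a vector: y = A x.
fn mat_vec(a: &[Vec<f64>], x: &[f64]) -> Vec<f64> {
    a.iter()
        .map(|row| row.iter().zip(x).map(|(p, q)| p * q).sum::<f64>())
        .collect()
}

/// Multiply the transpose of a row-major matrix by a vector: x = A^T y.
fn mat_t_vec(a: &[Vec<f64>], y: &[f64]) -> Vec<f64> {
    let cols = a.first().map_or(0, |row| row.len());
    let mut out = vec![0.0; cols];
    for (row, &yi) in a.iter().zip(y) {
        for (o, &v) in out.iter_mut().zip(row) {
            *o += v * yi;
        }
    }
    out
}

/// Euclidean norm of a vector.
fn norm(x: &[f64]) -> f64 {
    x.iter().map(|v| v * v).sum::<f64>().sqrt()
}

/// Approximate (u, sigma, v) for the largest singular value of `a`.
/// Assumption: the uniform start vector is not orthogonal to the leading
/// right singular vector; if it is, the result is not meaningful.
fn top_singular_triplet(a: &[Vec<f64>], iters: usize) -> (Vec<f64>, f64, Vec<f64>) {
    let cols = a.first().map_or(0, |row| row.len());
    let mut v = vec![1.0 / (cols.max(1) as f64).sqrt(); cols];
    for _ in 0..iters {
        // One power-iteration step on A^T A, followed by normalization.
        let w = mat_t_vec(a, &mat_vec(a, &v));
        let n = norm(&w);
        if n == 0.0 {
            break;
        }
        v = w.iter().map(|x| x / n).collect();
    }
    // sigma = ||A v|| and u = A v / sigma.
    let av = mat_vec(a, &v);
    let sigma = norm(&av);
    let u = if sigma > 0.0 {
        av.iter().map(|x| x / sigma).collect()
    } else {
        av
    };
    (u, sigma, v)
}
```

Calling top_singular_triplet(&a, 100) on the diagonal matrix with entries 3 and 2 yields a sigma close to 3, and the outer product sigma * u * v^T is the best rank-1 approximation of the input.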
PCA (Principal Component Analysis)
PCA is implemented as a higher-level abstraction built on SVD, with multiple implementations optimized for different use cases:
Dense PCA
Full-matrix PCA implementation with comprehensive features:
// Create a PCA model with LapackSVD backend
let mut pca = PCABuilder::new(LapackSVD)
    .n_components(2)
    .center(true)
    .scale(false)
    .build();
// Fit the model to data
pca.fit(data.view())?;
// Transform data to the principal component space
let transformed = pca.transform(data.view())?;
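The center and scale options correspond to standard PCA preprocessing. The std-only sketch below shows one common convention: subtract each column's mean and, optionally, divide by its sample standard deviation (n - 1 denominator). The function name center_and_scale and the choice of denominator are assumptions for illustration; the library's own preprocessing may differ in detail.

```rust
/// Center each column to zero mean and, optionally, scale it to unit
/// sample standard deviation. `data` is row-major: one Vec per sample.
fn center_and_scale(data: &[Vec<f64>], center: bool, scale: bool) -> Vec<Vec<f64>> {
    let n = data.len();
    if n == 0 {
        return Vec::new();
    }
    let n_features = data[0].len();

    // Column means.
    let mut means = vec![0.0; n_features];
    for row in data {
        for (m, &x) in means.iter_mut().zip(row) {
            *m += x / n as f64;
        }
    }

    // Sample standard deviations; left at 1.0 when scaling is disabled,
    // when there are fewer than two samples, or when a column is constant.
    let mut stds = vec![1.0; n_features];
    if scale && n > 1 {
        for (j, s) in stds.iter_mut().enumerate() {
            let ss: f64 = data.iter().map(|row| (row[j] - means[j]).powi(2)).sum();
            let sd = (ss / (n as f64 - 1.0)).sqrt();
            if sd > 0.0 {
                *s = sd;
            }
        }
    }

    data.iter()
        .map(|row| {
            row.iter()
                .enumerate()
                .map(|(j, &x)| {
                    let shifted = if center { x - means[j] } else { x };
                    shifted / stds[j]
                })
                .collect()
        })
        .collect()
}
```

Centering matters because PCA via SVD assumes zero-mean columns; scaling is usually reserved for features measured in different units.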
Sparse PCA
Specialized version optimized for sparse matrices, with better memory efficiency for high-dimensional, sparse data:
// Create a Sparse PCA model
let mut sparse_pca = SparsePCABuilder::<f64>::new()
    .n_components(50)
    .center(true)
    .alpha(1.0)
    .svd_method(SVDMethod::Lanczos)
    .build();
// Fit and transform in one step
let transformed = sparse_pca.fit_transform(&sparse_matrix)?;
Masked Sparse PCA
Feature selection integrated into PCA computation for analyzing specific subsets of features:
// Create a boolean mask for feature selection
let feature_mask = vec![true, false, true, true, false]; // Only use 1st, 3rd, and 4th features
// Create a Masked Sparse PCA model
let mut masked_pca = MaskedSparsePCABuilder::<f64>::new()
    .n_components(2)
    .mask(feature_mask)
    .svd_method(SVDMethod::Random {
        n_oversamples: 10,
        n_power_iterations: 5,
        normalizer: PowerIterationNormalizer::QR,
    })
    .build();
// Fit and transform
let transformed = masked_pca.fit_transform(&sparse_matrix)?;
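The effect of a boolean feature mask can be illustrated on a small dense matrix. The sketch below keeps only the columns whose mask entry is true; select_features is a hypothetical helper, and the library presumably applies the mask inside its sparse computation rather than copying data as done here.

```rust
/// Keep only the columns of a row-major matrix whose mask entry is `true`.
/// Columns beyond the end of the mask are dropped.
fn select_features(rows: &[Vec<f64>], mask: &[bool]) -> Vec<Vec<f64>> {
    rows.iter()
        .map(|row| {
            row.iter()
                .zip(mask)
                .filter_map(|(&value, &keep)| if keep { Some(value) } else { None })
                .collect()
        })
        .collect()
}

/// Number of features that survive the mask.
fn n_selected(mask: &[bool]) -> usize {
    mask.iter().filter(|keep| **keep).count()
}
```

With the mask from the example above (true, false, true, true, false), a row of five values is reduced to its 1st, 3rd, and 4th entries, so n_components can be at most 3.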
PCA Features
All PCA implementations provide:
- Configurable Components: Control the number of principal components to extract
- Centering and Scaling: Options for data preprocessing
- Variance Analysis: Calculate explained variance ratios and cumulative explained variance
- Feature Importance: Determine the contribution of each feature to the principal components
The modular architecture allows for extension with different SVD implementations via the SVDImplementation trait, enabling easy switching between backends.
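The variance analysis mentioned above follows directly from the singular values of the centered data matrix: component i explains s_i^2 / (n - 1) of the variance. The sketch below is a std-only illustration with hypothetical function names. It assumes the full set of singular values is supplied; with a truncated decomposition the ratios should instead be taken against the total variance of the data.

```rust
/// Variance explained by each component, given the singular values of the
/// centered data matrix and the number of samples.
fn explained_variance(singular_values: &[f64], n_samples: usize) -> Vec<f64> {
    // Guard against n_samples < 2 so the denominator is never zero.
    let denom = (n_samples.max(2) - 1) as f64;
    singular_values.iter().map(|s| s * s / denom).collect()
}

/// Fraction of the total variance explained by each component.
/// Assumption: `singular_values` holds every singular value, not a subset.
fn explained_variance_ratio(singular_values: &[f64], n_samples: usize) -> Vec<f64> {
    let variances = explained_variance(singular_values, n_samples);
    let total: f64 = variances.iter().sum();
    if total == 0.0 {
        return vec![0.0; variances.len()];
    }
    variances.iter().map(|v| v / total).collect()
}

/// Running sum of the ratios, useful for choosing how many components to keep.
fn cumulative_explained_variance(ratios: &[f64]) -> Vec<f64> {
    ratios
        .iter()
        .scan(0.0, |acc, &r| {
            *acc += r;
            Some(*acc)
        })
        .collect()
}
```

For singular values 2, 1, 1 the ratios are 4/6, 1/6, and 1/6, so the first two components together explain 5/6 of the variance.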
SVD Method Selection
single-algebra supports multiple SVD computation methods:
// Lanczos algorithm - good for sparse matrices
let svd_method = SVDMethod::Lanczos;
// Randomized SVD - faster for very large matrices
let svd_method = SVDMethod::Random {
    n_oversamples: 10,                        // Additional dimensions for better accuracy
    n_power_iterations: 7,                    // Power iterations for enhanced accuracy
    normalizer: PowerIterationNormalizer::QR, // Normalization method
};
Matrix Operations
The library provides comprehensive operations for both sparse and dense matrices:
Matrix Similarity Measures
// Calculate cosine similarity between vectors
let cosine_sim = CosineSimilarity;
let similarity = cosine_sim.calculate(vector_a.view(), vector_b.view());
// Calculate Euclidean similarity
let euclidean_sim = EuclideanSimilarity::new(1.0);
let similarity = euclidean_sim.calculate(vector_a.view(), vector_b.view());
// Other similarity measures
let pearson = PearsonSimilarity;
let manhattan = ManhattanSimilarity::default();
let jaccard = JaccardSimilarity::new(0.001);
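For reference, two of these measures have simple closed forms. The std-only sketch below computes cosine similarity as the normalized dot product and Pearson similarity as the cosine of the mean-centered vectors. The free functions are hypothetical and unrelated to the library's similarity types; the meaning of the constructor arguments passed to the Euclidean and Jaccard measures above is not covered here.

```rust
/// Dot product of two equally sized slices.
fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Cosine similarity: dot(a, b) / (|a| * |b|).
/// Returns None for mismatched lengths or a zero-length vector norm.
fn cosine_similarity(a: &[f64], b: &[f64]) -> Option<f64> {
    if a.len() != b.len() {
        return None;
    }
    let (na, nb) = (dot(a, a).sqrt(), dot(b, b).sqrt());
    if na == 0.0 || nb == 0.0 {
        return None;
    }
    Some(dot(a, b) / (na * nb))
}

/// Pearson correlation: cosine similarity of the mean-centered vectors.
/// Returns None for empty input, mismatched lengths, or a constant vector.
fn pearson_similarity(a: &[f64], b: &[f64]) -> Option<f64> {
    if a.is_empty() || a.len() != b.len() {
        return None;
    }
    let mean = |x: &[f64]| x.iter().sum::<f64>() / x.len() as f64;
    let (ma, mb) = (mean(a), mean(b));
    let ca: Vec<f64> = a.iter().map(|x| x - ma).collect();
    let cb: Vec<f64> = b.iter().map(|x| x - mb).collect();
    cosine_similarity(&ca, &cb)
}
```

Both functions return values in the range -1 to 1; returning Option makes the undefined cases (zero norm, constant input) explicit instead of producing NaN.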
Sparse Matrix Utilities
Efficient operations optimized for sparse matrices:
// Count non-zero elements per column
let nnz_per_col: Vec<u32> = sparse_matrix.nonzero_col()?;
// Calculate column sums
let col_sums: Vec<f64> = sparse_matrix.sum_col()?;
// Calculate column variances
let variances: Vec<f64> = sparse_matrix.var_col::<u32, f64>()?;
// Find min/max values per column
let (min_vals, max_vals): (Vec<f64>, Vec<f64>) = sparse_matrix.min_max_col()?;
// Normalize columns to sum to 1.0
sparse_matrix.normalize(&col_sums, 1.0, &Direction::COLUMN)?;
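These column-wise reductions are cheap on a compressed sparse column (CSC) layout because each column's stored values are contiguous. The sketch below uses a hypothetical CscMatrix struct, not the sparse type the library operates on, and assumes the variance is the population variance over all rows, implicit zeros included; the library's convention may differ.

```rust
/// Minimal CSC matrix: column j owns data[indptr[j]..indptr[j + 1]],
/// and indices holds the row index of each stored value.
struct CscMatrix {
    n_rows: usize,
    indptr: Vec<usize>,
    indices: Vec<usize>,
    data: Vec<f64>,
}

impl CscMatrix {
    fn n_cols(&self) -> usize {
        self.indptr.len() - 1
    }

    /// Stored values of column j.
    fn col(&self, j: usize) -> &[f64] {
        &self.data[self.indptr[j]..self.indptr[j + 1]]
    }

    /// Value at (row, col); entries that are not stored are zero.
    fn get(&self, row: usize, col: usize) -> f64 {
        (self.indptr[col]..self.indptr[col + 1])
            .find(|&k| self.indices[k] == row)
            .map_or(0.0, |k| self.data[k])
    }

    /// Number of non-zero stored values per column.
    fn nonzero_col(&self) -> Vec<u32> {
        (0..self.n_cols())
            .map(|j| self.col(j).iter().filter(|v| **v != 0.0).count() as u32)
            .collect()
    }

    /// Sum of each column.
    fn sum_col(&self) -> Vec<f64> {
        (0..self.n_cols()).map(|j| self.col(j).iter().sum()).collect()
    }

    /// Population variance of each column, counting implicit zeros.
    fn var_col(&self) -> Vec<f64> {
        let n = self.n_rows as f64;
        (0..self.n_cols())
            .map(|j| {
                if self.n_rows == 0 {
                    return 0.0;
                }
                let mean = self.col(j).iter().sum::<f64>() / n;
                let mean_sq = self.col(j).iter().map(|v| v * v).sum::<f64>() / n;
                mean_sq - mean * mean
            })
            .collect()
    }

    /// Rescale every column with a non-zero sum so that it sums to `target`.
    fn normalize_cols(&mut self, target: f64) {
        let sums = self.sum_col();
        for (j, sum) in sums.iter().enumerate() {
            if *sum != 0.0 {
                for k in self.indptr[j]..self.indptr[j + 1] {
                    self.data[k] *= target / sum;
                }
            }
        }
    }
}
```

None of the reductions touch the implicit zeros, so their cost scales with the number of stored values rather than with the full matrix size.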
Performance Features
single-algebra includes numerous optimizations for high-performance computing:
- Parallel Processing: Multi-threaded implementations using Rayon for critical operations
- Chunk-wise Processing: Efficient data handling for better cache utilization
- Specialized Sparse Algorithms: Algorithms tailored to exploit sparsity patterns
- SIMD Acceleration: Optional SIMD support through the simba feature
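The combination of chunk-wise and parallel processing can be sketched with the standard library alone. The example below splits a slice into fixed-size chunks and sums each chunk on its own scoped thread. It swaps in std::thread::scope for Rayon so that it has no dependencies, and chunked_parallel_sum is a hypothetical name; a work-stealing pool such as Rayon's avoids the per-chunk thread spawn used here.

```rust
use std::thread;

/// Sum a slice by processing fixed-size chunks on separate scoped threads.
/// Illustration only: one thread is spawned per chunk, so `chunk_size`
/// should be large relative to the input in practice.
fn chunked_parallel_sum(values: &[f64], chunk_size: usize) -> f64 {
    // A chunk size of zero would make `chunks` panic.
    let chunk_size = chunk_size.max(1);
    thread::scope(|scope| {
        // Each worker borrows its chunk; the scope guarantees all workers
        // finish before `values` goes out of reach.
        let handles: Vec<_> = values
            .chunks(chunk_size)
            .map(|chunk| scope.spawn(move || chunk.iter().sum::<f64>()))
            .collect();
        // Combine the partial sums in chunk order.
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .sum::<f64>()
    })
}
```

Processing contiguous chunks keeps each worker on a cache-friendly stretch of memory, which is the same motivation behind the chunk-wise processing listed above.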
Integration with External Libraries
single-algebra provides seamless integration with multiple linear algebra ecosystems:
- nalgebra: Core integration for general-purpose linear algebra operations
- ndarray: Integration for n-dimensional array processing
- Faer: Modern, SIMD-optimized implementations
- BLAS/LAPACK: Industry-standard high-performance routines through OpenBLAS
Application Areas
These linear algebra operations serve as building blocks for:
- Dimensionality reduction in high-dimensional data (like single-cell RNA-seq)
- Feature extraction in machine learning pipelines
- Signal processing and data compression
- Network and graph analysis
- Statistical modeling and inference
The modular design of single-algebra allows users to select the most appropriate implementation for their specific use case, balancing accuracy, performance, and memory requirements.