Two-Tower Hybrid Embedding Networks for GRN Inference
Gene Regulatory Network Inference from Single-Cell Transcriptomics
A pure-Rust two-tower MLP that learns entity embeddings and cell-type expression profiles to predict transcription factor–gene interactions — 83% ensemble accuracy, CPU-trainable without any deep learning framework.
Abstract
Introduction
Inferring gene regulatory networks (GRNs) from single-cell RNA-seq data is a central challenge in computational biology. This paper proposes a two-tower multilayer perceptron that jointly learns entity embeddings for transcription factors and target genes alongside cell-type-specific expression profiles to predict regulatory interactions.
Methods
Separate encoder towers process 512-dimensional learnable embeddings and 11-dimensional cell-type expression profiles through three fully connected layers with batch normalization and dropout. Interaction scores are produced via temperature-scaled dot-product similarity (τ = 0.05). The system is implemented entirely in Rust without external deep learning dependencies.
Results
On a human brain dataset with 47,388 TF–gene pairs, the single model achieves 80.14% accuracy and AUROC of 0.844. A 5-model ensemble reaches 83.06% accuracy. Hyperparameter optimization alone contributed 77% of total improvement (+16.1 pp), with cross-seed variance (CV = 2.06%) confirming model robustness.
Discussion
A competing cross-attention model (GCAN) achieved 91% accuracy but only 0.692 F1-score due to poor recall, underperforming on balanced prediction metrics. This highlights a critical accuracy–F1 paradox where high accuracy masks poor generalization under class imbalance.
Conclusion
A carefully optimized standard MLP can achieve competitive GRN inference results (83%) while remaining CPU-trainable, challenging the assumption that complex architectures are necessary for strong performance on this task.
Introduction
Gene regulatory networks encode the interactions between transcription factors (TFs) and target genes that govern cellular identity and function. Reconstructing these networks from single-cell RNA-seq data is challenging due to the high dimensionality, sparsity, and cell-type heterogeneity of expression measurements.
Key Motivation
Prior work on GRN inference often relies on complex graph neural networks or attention mechanisms. This study investigates whether a well-tuned, simple MLP can close the gap — with the added benefit of full CPU trainability and no framework dependencies.
This paper presents a two-tower architecture that separately encodes TF and gene identities through learned embeddings, while a second pathway encodes cell-type-specific expression profiles. The towers are combined via temperature-scaled dot-product similarity to produce regulatory link predictions.
Methods
Dataset & Regulatory Priors
Human brain single-cell RNA-seq data was paired with regulatory ground truth from the DoRothEA and TRRUST databases, yielding 47,388 TF–gene pairs split 70/15/15 for training, validation, and testing.
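The 70/15/15 partition can be sketched as a simple deterministic index split. The shuffling and any stratification scheme are not specified above, so this is only illustrative; `split_indices` is a hypothetical helper, not the paper's code.

```rust
// Illustrative 70/15/15 split over pair indices. In practice the indices
// would be shuffled (and possibly stratified by TF) before partitioning.
fn split_indices(n: usize) -> (Vec<usize>, Vec<usize>, Vec<usize>) {
    let n_train = (n as f64 * 0.70).round() as usize;
    let n_val = (n as f64 * 0.15).round() as usize;
    let idx: Vec<usize> = (0..n).collect();
    let train = idx[..n_train].to_vec();
    let val = idx[n_train..n_train + n_val].to_vec();
    let test = idx[n_train + n_val..].to_vec(); // remainder goes to test
    (train, val, test)
}
```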
Two-Tower Architecture
Each tower consists of three fully connected layers with batch normalization and dropout. The entity tower processes 512-dimensional learnable embeddings for TFs and genes; the expression tower processes 11-dimensional cell-type mean expression profiles.
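A minimal sketch of one tower's forward pass, assuming plain fully connected layers with ReLU activations between them. Batch normalization and dropout from the description above are omitted for brevity, and the `Linear`/`Tower` types are illustrative, not the paper's actual implementation.

```rust
// One fully connected layer: y = Wx + b, stored row-major.
struct Linear {
    weights: Vec<Vec<f64>>, // [out_dim][in_dim]
    bias: Vec<f64>,
}

impl Linear {
    fn forward(&self, x: &[f64]) -> Vec<f64> {
        self.weights
            .iter()
            .zip(&self.bias)
            .map(|(row, b)| row.iter().zip(x).map(|(w, xi)| w * xi).sum::<f64>() + b)
            .collect()
    }
}

fn relu(x: Vec<f64>) -> Vec<f64> {
    x.into_iter().map(|v| v.max(0.0)).collect()
}

// A tower is a stack of linear layers; ReLU between layers,
// no activation on the final projection.
struct Tower {
    layers: Vec<Linear>,
}

impl Tower {
    fn forward(&self, input: &[f64]) -> Vec<f64> {
        let n = self.layers.len();
        let mut h = input.to_vec();
        for (i, layer) in self.layers.iter().enumerate() {
            h = layer.forward(&h);
            if i + 1 < n {
                h = relu(h);
            }
        }
        h
    }
}
```

Writing the forward pass by hand like this is what the no-framework constraint amounts to: every matrix multiply and activation is explicit code rather than a library call.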
Similarity Scoring & Training
Tower outputs are combined via temperature-scaled dot-product similarity (τ = 0.05) with a sigmoid activation. Training uses the Adam optimizer with L2 regularization (λ = 0.01), a learning rate of 5×10⁻³, and early stopping with 10-epoch patience.
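The scoring step can be sketched directly from the definition above: divide the raw dot product by the temperature before applying the sigmoid, which sharpens the decision boundary. The function name and embedding values here are illustrative.

```rust
fn sigmoid(z: f64) -> f64 {
    1.0 / (1.0 + (-z).exp())
}

/// Regulatory link probability for a (TF, gene) pair from the two
/// tower outputs, using temperature-scaled dot-product similarity.
fn interaction_score(tf_repr: &[f64], gene_repr: &[f64], temperature: f64) -> f64 {
    let dot: f64 = tf_repr.iter().zip(gene_repr).map(|(a, b)| a * b).sum();
    sigmoid(dot / temperature)
}
```

With τ = 0.05 a modest similarity of 1.0 already maps to a near-certain prediction, so the temperature effectively controls how confident the model is allowed to be.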
Ensemble & Evaluation
Five independently trained models are aggregated by averaging predicted probabilities. Cross-seed evaluation across five random initializations measures variance (CV = 2.06%). Metrics include accuracy, F1, AUROC, and a comparison against GCAN (cross-attention baseline).
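The ensemble step is plain probability averaging across the five models, which can be sketched as follows; the per-model probabilities in the test are stand-in values, not real predictions.

```rust
// Average per-pair predicted probabilities across models.
// `per_model_probs[m][i]` is model m's probability for pair i.
fn ensemble_average(per_model_probs: &[Vec<f64>]) -> Vec<f64> {
    let n_models = per_model_probs.len() as f64;
    let n_pairs = per_model_probs[0].len();
    (0..n_pairs)
        .map(|i| per_model_probs.iter().map(|m| m[i]).sum::<f64>() / n_models)
        .collect()
}
```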
Results
Performance is evaluated on 47,388 human brain TF–gene pairs across single and ensemble configurations.
Model Performance Comparison
Single model vs ensemble vs GCAN baseline on human brain scRNA-seq data.
GCAN's 91% accuracy masks a 0.692 F1-score driven by poor recall — the two-tower ensemble is more reliable on balanced prediction despite a lower headline accuracy.
Accuracy Attribution
Hyperparameter tuning alone contributed +16.1 percentage points (77% of total improvement). Ensemble aggregation added +2.9 pp and expression features +1.8 pp.
Discussion
The results challenge the common narrative that GRN inference requires increasingly complex architectures. A carefully tuned two-tower MLP, implemented without any deep learning framework, achieves competitive accuracy on a challenging human brain dataset.
The Accuracy–F1 Paradox
GCAN's 91% accuracy is inflated by biased recall. When evaluated on balanced metrics, the two-tower model outperforms it, illustrating why accuracy alone is insufficient for imbalanced biological datasets.
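The paradox can be made concrete with the standard confusion-matrix definitions. The counts used in the check below are synthetic, chosen only to make the effect visible, and do not reproduce GCAN's actual confusion matrix.

```rust
// Standard metrics from confusion-matrix counts.
// `fn_` is the false-negative count (`fn` is a Rust keyword).
fn accuracy(tp: f64, tn: f64, fp: f64, fn_: f64) -> f64 {
    (tp + tn) / (tp + tn + fp + fn_)
}

fn f1(tp: f64, fp: f64, fn_: f64) -> f64 {
    let precision = tp / (tp + fp);
    let recall = tp / (tp + fn_);
    2.0 * precision * recall / (precision + recall)
}
```

With mostly negative pairs, a model that misses three quarters of the true interactions can still post high accuracy, which is exactly the failure mode accuracy alone cannot detect.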
The pure-Rust implementation introduces a useful engineering constraint: no automatic differentiation, no GPU requirement, and full determinism. This forced explicit numerical decisions (temperature scaling, L2 regularization) that ultimately contributed to model stability, evidenced by a cross-seed CV of only 2.06%.
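The cross-seed robustness check reduces to the coefficient of variation (population standard deviation divided by the mean) over per-seed accuracies, which can be sketched as below. The input values in the check are placeholders, not the paper's actual seed results.

```rust
// Coefficient of variation: population std / mean.
fn coefficient_of_variation(values: &[f64]) -> f64 {
    let n = values.len() as f64;
    let mean = values.iter().sum::<f64>() / n;
    let var = values.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n;
    var.sqrt() / mean
}
```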
Conclusion
This study demonstrates that a standard, well-tuned MLP implemented in pure Rust can achieve 83% ensemble accuracy on GRN inference from single-cell RNA-seq data — competitive with significantly more complex architectures.
The dominant contribution of hyperparameter optimization over architectural complexity suggests that future work on GRN inference should prioritize thorough tuning before escalating model complexity. The CPU-trainable implementation also makes the approach accessible to researchers without GPU infrastructure.
The accuracy–F1 paradox observed in the GCAN comparison is a cautionary note for benchmarking in computational biology: headline accuracy on imbalanced datasets can be misleading, and balanced metrics like F1 and AUROC should be primary evaluation criteria.