Dual-Encoder vs Cross-Encoder for Transductive GRN Link Prediction
An Empirical Comparison Under Parameter-Matched Conditions
A rigorous parameter-matched comparison of dual-encoder and cross-encoder architectures for gene regulatory network link prediction — including ablation studies, pruning experiments, imbalance robustness, and cold-start evaluation.
Abstract
Introduction
This paper presents a rigorous, parameter-matched comparison of dual-encoder (two-tower) and cross-encoder neural architectures for predicting transcription factor–gene regulatory relationships from single-cell RNA-seq data. Both models contain exactly 5.58M parameters but differ fundamentally in how they process input pairs.
Methods
Both architectures are evaluated across four negative sampling regimes (1:1, 5:1, 10:1, 50:1), with ablation studies isolating individual architectural components, structured neuron pruning experiments testing sparsity tolerance, effective rank analysis measuring representational efficiency, and cold-start evaluation probing inductive generalization.
Results
The cross-encoder achieves AUROC 0.9025 versus 0.7941 for the dual-encoder under balanced training, a 10.84-point gap. The dual-encoder degrades severely with imbalance (a 13.6 pp AUROC drop at 10:1 and complete collapse at 50:1), while the cross-encoder remains stable (≈0.911 at 10:1). Ablation reveals that 95% of the gap stems from joint encoding itself, with only 5% from the element-wise product interaction term.
Discussion
Pruning experiments show that both architectures tolerate removal of ≥60% of neurons with no measurable post-pruning performance loss, suggesting substantial over-parameterization. Cold-start evaluation reveals that the dual-encoder's independent representations generalize slightly better to unseen entities, while the cross-encoder's joint processing is inherently transductive.
Conclusion
Under parameter-matched conditions, the cross-encoder dominates on transductive GRN link prediction. The dual-encoder's only advantage is modest cold-start generalization, which is insufficient to overcome its 10.84-point AUROC deficit on the primary task. Joint encoding is the decisive architectural factor.
Introduction
Dual-encoder models score interactions by computing a similarity function over independently encoded representations, enabling efficient large-scale retrieval. Cross-encoders process each pair jointly, allowing direct feature interaction, but require a full forward pass for every candidate pair, so embeddings cannot be precomputed for retrieval. For GRN link prediction, the choice between these paradigms has significant implications for both accuracy and scalability.
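The structural contrast can be made concrete with a toy sketch (illustrative only; the paper's actual encoders are deeper MLPs with sizes not given here). The dual-encoder's towers never see each other's input, whereas the cross-encoder's hidden units operate on the concatenated pair:

```python
import math

def encode(x, w):
    # toy one-layer encoder: ReLU(W x); real towers are deeper MLPs
    return [max(0.0, sum(wi * xi for wi, xi in zip(row, x))) for row in w]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def dual_score(tf_x, gene_x, w_tf, w_gene):
    # towers encode independently; interaction happens only at the
    # final similarity step
    return cosine(encode(tf_x, w_tf), encode(gene_x, w_gene))

def cross_score(tf_x, gene_x, w_joint, v_out):
    # the joint encoder sees the concatenated pair, so hidden units can
    # model TF-gene feature interactions directly
    h = encode(tf_x + gene_x, w_joint)
    return sum(vi * hi for vi, hi in zip(v_out, h))
```

The retrieval trade-off follows directly: `encode(gene_x, w_gene)` can be cached once per gene for the dual-encoder, while `cross_score` must be re-run for every candidate pair.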
Novelty of this Study
Prior comparisons between dual-encoder and cross-encoder architectures for biological link prediction have not controlled for parameter count, leaving open the question of whether performance gaps reflect architectural differences or capacity differences. This paper closes that gap.
This study enforces strict parameter matching at 5.58M parameters across both architectures and performs the most comprehensive evaluation of dual-encoder vs cross-encoder trade-offs for GRN inference to date, including ablation, pruning, effective rank analysis, and cold-start evaluation.
Methods
Parameter-Matched Architectures
Both models are sized to exactly 5.58M parameters. The dual-encoder scores interactions via cosine similarity of independent encodings; the cross-encoder processes concatenated TF–gene features through shared layers before scoring.
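Matching parameter counts amounts to choosing layer widths so that the totals agree. A minimal sketch of the bookkeeping, using hypothetical widths chosen only to illustrate the procedure (the paper's actual layer sizes are not given here):

```python
def mlp_params(widths):
    # dense layers with bias: for each consecutive (n_in, n_out) pair,
    # n_in * n_out weights plus n_out biases
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:]))

# hypothetical widths for illustration only
dual_tower = mlp_params([2000, 512, 256])      # one tower
dual_total = 2 * dual_tower                     # two independent towers, no extra head
cross_total = mlp_params([4000, 640, 256, 1])   # one joint encoder with scoring head
```

In practice one fixes one architecture's widths and solves for the other's hidden sizes until both totals land on the same parameter budget (5.58M in the paper).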
Imbalance Robustness Testing
Four negative-sampling ratios are evaluated: 1:1, 5:1, 10:1, and 50:1. This range spans from artificially balanced training to the heavily imbalanced conditions of real biology, where true regulatory interactions are rare relative to candidate TF–gene pairs.
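A common way to realize such regimes is rejection sampling of non-interacting pairs; a minimal sketch (assumed, since the paper does not specify its sampler):

```python
import random

def sample_negatives(tfs, genes, positives, ratio, seed=0):
    """Draw ratio * |positives| TF-gene pairs absent from the positive set.

    Assumes the pair universe is much larger than the requested sample;
    otherwise rejection sampling would stall.
    """
    rng = random.Random(seed)
    pos = set(positives)
    negatives = set()
    while len(negatives) < ratio * len(positives):
        pair = (rng.choice(tfs), rng.choice(genes))
        if pair not in pos:
            negatives.add(pair)
    return sorted(negatives)
```

At 50:1, every positive is diluted among fifty sampled negatives, which is what makes the regime a stress test of ranking quality rather than of a decision threshold.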
Ablation & Pruning Studies
Ablation systematically removes architectural components to isolate their contributions. Structured neuron pruning removes low-importance neurons across sparsity levels (10–90%) to identify each model's redundancy budget. Effective rank analysis measures intrinsic dimensionality of learned representations.
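One standard effective-rank definition (Roy and Vetterli's entropy-based formulation; the paper does not state which variant it uses, so this is an assumption) takes the exponential of the Shannon entropy of the normalized singular values of a representation matrix:

```python
import math

def effective_rank(singular_values):
    # exp of the Shannon entropy of the normalized singular-value
    # distribution; equals k when exactly k values are equal and nonzero
    total = sum(singular_values)
    probs = [s / total for s in singular_values if s > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)
```

A representation whose variance concentrates in a few directions scores close to 1 regardless of its nominal dimensionality, which is what makes the metric a useful complement to pruning experiments.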
Cold-Start & Classical Baselines
Cold-start evaluation holds out entities unseen during training to probe inductive generalization. Expression-only feature baselines (no learned embeddings) provide context for quantifying the contribution of neural components over simple feature matching.
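The split logic behind such an evaluation is simple: any pair touching a held-out entity is excluded from training. A sketch of one plausible implementation (the paper's exact protocol may differ):

```python
def cold_start_split(pairs, held_out):
    # any pair involving a held-out TF or gene goes to the cold-start
    # test set; training never sees those entities
    held = set(held_out)
    train = [(tf, g) for tf, g in pairs if tf not in held and g not in held]
    cold = [(tf, g) for tf, g in pairs if tf in held or g in held]
    return train, cold
```

Under this split, a dual-encoder can still embed a new entity from its expression features alone, while a cross-encoder has never trained on any pair containing it.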
Results
Under parameter-matched conditions, the cross-encoder outperforms the dual-encoder across every evaluation setting except cold-start generalization.
AUROC Across Imbalance Regimes
Both architectures are evaluated at 5.58M parameters on human brain scRNA-seq data.
The dual-encoder collapses entirely at 50:1 imbalance, while the cross-encoder remains stable — a critical advantage for real biological datasets where positive regulatory interactions are rare.
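It is worth noting why AUROC is the right metric for cross-regime comparison: it is the probability that a random positive outranks a random negative, so it is insensitive to the class ratio itself, and a drop under imbalance reflects genuinely degraded ranking rather than a shifted decision threshold. A minimal sketch of the pairwise (Mann–Whitney) formulation:

```python
def auroc(pos_scores, neg_scores):
    # probability that a random positive outranks a random negative,
    # counting ties as 0.5; independent of the positive:negative ratio
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))
```

This quadratic-time version is for exposition; production implementations use a rank-based O(n log n) computation.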
Ablation: What Drives the Gap?
Ablation reveals that 95% of the AUROC gap stems from joint encoding itself. The element-wise product interaction term within the cross-encoder accounts for only 5% — the decisive factor is whether inputs are processed jointly or independently.
Discussion
By matching parameters exactly at 5.58M, this study eliminates capacity as a confounding variable. The 10.84-point AUROC gap at balanced training — growing to >25 points at severe imbalance — is attributable to architectural choice alone.
Pruning Tolerance
Both architectures tolerate ≥60% structured neuron pruning with no measurable post-pruning performance loss, suggesting both are substantially over-parameterized at 5.58M for this dataset size. Leaner models deserve investigation.
The dual-encoder's only advantage is a modest improvement in cold-start generalization to unseen entities, a consequence of its independent representations. However, this advantage is practically limited by the transductive nature of GRN inference — in most scenarios, TF and gene identities are known at training time, making the cross-encoder's inductive limitation irrelevant.
The 50:1 collapse of the dual-encoder is a critical practical concern. Real regulatory databases have far more non-interacting TF–gene pairs than interacting ones, making the dual-encoder unsuitable for production GRN inference without significant modifications.
Conclusion
This paper provides the first parameter-matched empirical comparison of dual-encoder and cross-encoder architectures for transductive GRN link prediction. The work has been submitted to IEEE Transactions on Neural Networks and Learning Systems (TNNLS).
The results are unambiguous: for transductive GRN link prediction from single-cell RNA-seq data, cross-encoders outperform dual-encoders by a large margin (10.84 pp AUROC at balanced training, growing catastrophically under imbalance). The decisive factor is joint encoding, not any secondary architectural feature.
Future directions include investigating whether dual-encoders can be augmented with late interaction mechanisms to recover the AUROC gap while retaining their retrieval efficiency, and exploring whether the 60% pruning tolerance enables significant model compression without accuracy loss.
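One candidate late-interaction mechanism is ColBERT-style MaxSim (an assumption for illustration; the paper does not commit to a specific mechanism): each side emits several vectors instead of one, the towers stay independent, and interaction is deferred to a cheap final matching step, so gene-side embeddings remain precomputable:

```python
import math

def maxsim(tf_vecs, gene_vecs):
    # ColBERT-style late interaction: each TF-side vector matches its
    # best gene-side vector by cosine similarity, and the matches sum.
    # Both sides are still encoded independently, preserving retrieval.
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) *
                      math.sqrt(sum(b * b for b in v)))
    return sum(max(cos(q, d) for d in gene_vecs) for q in tf_vecs)
```

Whether such a mechanism recovers a meaningful share of the 10.84 pp gap on GRN data is exactly the open question posed above.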