Dual-Encoder vs Cross-Encoder for Transductive GRN Link Prediction
An Empirical Comparison Under Parameter-Matched Conditions
Two ways to predict which genes control which other genes, tested side by side with the same model size, so the contest is fair.
A rigorous parameter-matched comparison of dual-encoder and cross-encoder architectures for gene regulatory network link prediction: ablation studies, pruning experiments, imbalance robustness, and cold-start evaluation.
Concept Overview
Abstract
Introduction
This paper presents a rigorous, parameter-matched comparison of dual-encoder (two-tower) and cross-encoder neural architectures for predicting transcription factor–gene regulatory relationships from single-cell RNA-seq data. Both models contain exactly 5.58M parameters but differ fundamentally in how they process input pairs.
Cells contain master-switch genes (called transcription factors) that turn other genes on or off. Mapping who controls whom would help biologists understand disease and design drugs. We compare two AI designs on this prediction task. One reads each gene on its own, then checks if the pair matches. The other reads the pair together from the start. Both models have exactly 5.58M parameters (their internal "dials"), so any difference in skill comes from the design, not the size.
Methods
Both architectures are evaluated across four negative sampling regimes (1:1, 5:1, 10:1, 50:1), with ablation studies isolating individual architectural components, structured neuron pruning experiments testing sparsity tolerance, effective rank analysis measuring representational efficiency, and cold-start evaluation probing inductive generalization.
In real biology, most gene pairs are not connected. So we trained each model with different ratios of "non-pairs" to "real pairs": 1:1, 5:1, 10:1, 50:1. The last ratio is the most realistic and the hardest. We also turned off parts of each model one at a time to see which parts mattered, trimmed away neurons to see how much was redundant, and tested each model on genes it had never seen during training.
Results
The cross-encoder achieves AUROC 0.9025 vs 0.7941 for the dual-encoder under balanced training, a 10.84-point gap. The dual-encoder degrades severely with imbalance (AUROC −13.6 pp at 10:1, complete collapse at 50:1) while the cross-encoder remains stable (≈0.911 at 10:1). Ablation reveals 95% of the gap stems from joint encoding itself, with only 5% from the element-wise product interaction term.
The model that reads pairs together scored 0.9025 on the standard accuracy measure (AUROC, where 1.0 is perfect and 0.5 is a coin flip). The model that reads inputs separately scored 0.7941, a 10.84-point gap. When real-world imbalance was added, the separate-reader fell apart (−13.6 pp at 10:1, complete collapse at 50:1). The joint-reader held steady at ≈0.911 at 10:1. When we turned off parts of the winner, 95% of its advantage came from simply reading the pair together. Only 5% came from a specific math trick inside it.
Discussion
Pruning experiments show both architectures tolerate ≥60% neuron removal without post-hoc performance loss, suggesting substantial over-parameterization. Cold-start evaluation reveals that the dual-encoder's independent representations provide slightly better generalization to unseen entities, while the cross-encoder's joint processing is inherently transductive.
We could remove at least 60% of each model's internal neurons without hurting accuracy, which means both have a lot of unused capacity. On genes never seen during training, the separate-reader generalized a little better. The joint-reader, by design, is tied to the genes it met during training.
Conclusion
Under parameter-matched conditions, the cross-encoder dominates on transductive GRN link prediction. The dual-encoder's only advantage is modest cold-start generalization, which is insufficient to overcome its 10.84-point AUROC deficit on the primary task. Joint encoding is the decisive architectural factor.
With model sizes held equal, the joint-reader clearly wins at predicting gene-control links between genes it has already seen. The separate-reader's small edge on brand-new genes can't make up for its 10.84-point accuracy shortfall on the main task. The lesson: how the model looks at the pair matters more than how big it is.
Introduction
Dual-encoder models score interactions by computing a similarity function over independently encoded representations, so each entity is encoded once and the embeddings can be reused for efficient large-scale retrieval. Cross-encoders process pairs jointly, allowing direct feature interaction at the cost of one full forward pass per candidate pair. For GRN link prediction, the choice between these paradigms has significant implications for both accuracy and scalability.
There are two common ways to teach an AI to score whether two things go together. The "separate" approach reads each item alone, makes a numerical summary of it, then compares the summaries. It's fast because you can pre-compute the summaries. The "joint" approach reads both items together, which catches subtle interactions but is slower to run on huge databases. For predicting which gene controls which, this choice affects both how accurate the model is and how easily it scales.
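To make the scalability trade-off concrete, the sketch below counts the network forward passes each design would need to score every TF-gene pair. The function name and the entity counts are hypothetical, chosen only for illustration; they are not figures from the study.

```python
def encoder_calls(n_tfs: int, n_genes: int) -> dict:
    """Count forward passes needed to score every TF-gene pair."""
    return {
        # Dual-encoder: embed each entity once, then compare cached vectors.
        "dual_encoder": n_tfs + n_genes,
        # Cross-encoder: one full forward pass per candidate pair.
        "cross_encoder": n_tfs * n_genes,
    }

# Hypothetical sizes, not taken from the paper.
calls = encoder_calls(n_tfs=1_000, n_genes=20_000)
print(calls)
```

The dual-encoder still has to compare every pair of embeddings, but a cosine similarity is far cheaper than a pass through the network.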
Novelty of this Study
Prior comparisons between dual-encoder and cross-encoder architectures for biological link prediction have not controlled for parameter count, leaving open the question of whether performance gaps reflect architectural differences or capacity differences. This paper closes that gap.
Earlier studies compared these two designs without keeping their size the same. So when one won, no one could tell if it won because of its design or just because it was bigger. We hold both models to the exact same size, so we can finally answer the question cleanly.
This study enforces strict parameter matching at 5.58M parameters across both architectures and evaluates dual-encoder vs cross-encoder trade-offs for GRN inference across ablation, pruning, effective rank analysis, and cold-start evaluation.
Both models are fixed at exactly 5.58M parameters. From there, we run a thorough head-to-head test: turning off parts, trimming neurons, measuring how compactly each model represents information, and testing each model on genes it has never seen.
Methods
Parameter-Matched Architectures
Both models are sized to exactly 5.58M parameters. The dual-encoder scores interactions via cosine similarity of independent encodings; the cross-encoder processes concatenated TF–gene features through shared layers before scoring.
Both models have exactly 5.58M parameters. The separate-reader makes a numerical summary of each gene on its own, then checks if the two summaries look alike. The joint-reader glues the gene pair together first, then passes the combined input through its shared layers to decide if the two genes are linked.
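The two scoring paths can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the study's implementation: the layer sizes, the two-layer `mlp` stand-in, the random weights, and the sigmoid output are all hypothetical, no parameter matching is attempted, and the cross-encoder's element-wise product term is left out.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_HID = 16, 8  # hypothetical feature and hidden sizes

def init(shape):
    """Small random weights standing in for trained parameters."""
    return 0.1 * rng.normal(size=shape)

def mlp(x, w1, w2):
    """Two-layer ReLU network used as a stand-in encoder."""
    return np.maximum(x @ w1, 0.0) @ w2

def dual_encoder_score(tf, gene, tf_w, gene_w):
    """Encode each side on its own, then compare with cosine similarity."""
    a, b = mlp(tf, *tf_w), mlp(gene, *gene_w)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def cross_encoder_score(tf, gene, w):
    """Concatenate the pair first so every layer sees both inputs."""
    logit = mlp(np.concatenate([tf, gene]), *w)[0]
    return float(1.0 / (1.0 + np.exp(-logit)))

tf_w = (init((D_IN, D_HID)), init((D_HID, D_HID)))
gene_w = (init((D_IN, D_HID)), init((D_HID, D_HID)))
cross_w = (init((2 * D_IN, D_HID)), init((D_HID, 1)))

tf_x, gene_x = rng.normal(size=D_IN), rng.normal(size=D_IN)
s_dual = dual_encoder_score(tf_x, gene_x, tf_w, gene_w)
s_cross = cross_encoder_score(tf_x, gene_x, cross_w)
print(s_dual, s_cross)
```

The structural difference is where the two inputs first meet: only at the final similarity in the dual-encoder, but at the first layer in the cross-encoder.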
Imbalance Robustness Testing
Four negative sampling ratios are evaluated: 1:1, 5:1, 10:1, and 50:1. This range spans from artificially balanced training to highly realistic biological conditions where regulatory interactions are sparse.
In real biology, "non-pairs" vastly outnumber real gene-control pairs. We test four mixes: 1:1 (balanced, easy mode), 5:1, 10:1, and 50:1 (the realistic, hard case). This shows how each model holds up as the haystack gets bigger.
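One plausible way to build each training mix is rejection sampling over TF-gene index pairs, sketched below. The function name, the index-pair representation, and the toy sizes are assumptions for illustration; the source does not describe the sampler actually used.

```python
import numpy as np

def sample_negatives(positives, n_tfs, n_genes, ratio, seed=0):
    """Draw `ratio` non-edge TF-gene pairs per positive, avoiding known edges."""
    rng = np.random.default_rng(seed)
    known = set(positives)
    target = ratio * len(positives)
    if target > n_tfs * n_genes - len(known):
        raise ValueError("not enough non-edges for this ratio")
    negatives = set()
    while len(negatives) < target:
        pair = (int(rng.integers(n_tfs)), int(rng.integers(n_genes)))
        if pair not in known:  # reject pairs that are known regulatory links
            negatives.add(pair)
    return sorted(negatives)

# Toy positive set (hypothetical indices).
positives = [(0, 1), (0, 2), (1, 3), (2, 0)]
for ratio in (1, 5, 10, 50):
    negs = sample_negatives(positives, n_tfs=5, n_genes=100, ratio=ratio)
    print(f"{ratio}:1 -> {len(negs)} negatives")
```

Sampled "negatives" are really unlabelled pairs, so a few of them may be undiscovered true links.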
Ablation & Pruning Studies
Ablation systematically removes architectural components to isolate their contributions. Structured neuron pruning removes low-importance neurons across sparsity levels (10–90%) to identify each model's redundancy budget. Effective rank analysis measures intrinsic dimensionality of learned representations.
We turn off model parts one at a time to see which ones actually matter. We also trim away unimportant neurons in steps from 10% to 90% to find how much of the model is dead weight. A separate check measures how compactly each model packs information into its internal summaries.
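Effective rank has more than one definition, and the source does not say which is used. A common choice, assumed here, is the exponential of the Shannon entropy of the normalized singular values of the embedding matrix.

```python
import numpy as np

def effective_rank(embeddings: np.ndarray) -> float:
    """Entropy-based effective rank of a (n_items, dim) embedding matrix."""
    s = np.linalg.svd(embeddings, compute_uv=False)
    p = s / s.sum()   # singular values as a probability distribution
    p = p[p > 0]      # drop exact zeros so log stays finite
    return float(np.exp(-(p * np.log(p)).sum()))

full = effective_rank(np.eye(4))                             # all directions used equally
flat = effective_rank(np.outer(np.ones(5), np.ones(3)))      # a single direction
print(full, flat)
```

A value near the embedding dimension means the model spreads information across all directions; a value near 1 means almost everything lies along one direction.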
Cold-Start & Classical Baselines
Cold-start evaluation holds out entities unseen during training to probe inductive generalization. Expression-only feature baselines (no learned embeddings) provide context for quantifying the contribution of neural components over simple feature matching.
For the cold-start test, we hide some genes during training and ask the model to handle them only at test time. This checks if it has truly learned biology or just memorized. We also compare against a simple baseline that uses raw gene-activity numbers and no learned summaries, to see how much value the neural network actually adds.
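A cold-start split can be sketched as follows. The assumption here is that only target genes are held out and that every pair touching a held-out gene moves to the test set; the source says only that "entities" are held out, so the actual protocol may differ. The function name and the 20% fraction are hypothetical.

```python
import numpy as np

def cold_start_split(pairs, holdout_frac=0.2, seed=0):
    """Hold out a set of genes; any pair touching one goes to the test set."""
    rng = np.random.default_rng(seed)
    genes = sorted({g for _, g in pairs})
    n_hold = max(1, int(round(holdout_frac * len(genes))))
    held = set(rng.choice(genes, size=n_hold, replace=False).tolist())
    train = [p for p in pairs if p[1] not in held]
    test = [p for p in pairs if p[1] in held]
    return train, test, held

# Toy data: 3 TFs crossed with 10 genes (hypothetical indices).
pairs = [(t, g) for t in range(3) for g in range(10)]
train, test, held = cold_start_split(pairs)
print(len(train), len(test), sorted(held))
```

Because no held-out gene appears in training, a model can only score the test pairs from that gene's input features, not from anything it memorized about the gene.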
Results
Under parameter-matched conditions, the cross-encoder outperforms the dual-encoder across every evaluation setting except cold-start generalization.
With the two models held to the same size, the joint-reader wins in every test, except when handling genes it has never seen before.
AUROC Across Imbalance Regimes
Both architectures at 5.58M parameters on human brain scRNA-seq.
The dual-encoder collapses entirely at 50:1 imbalance, while the cross-encoder remains stable, a critical advantage for real biological datasets where positive regulatory interactions are rare.
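For readers unfamiliar with the metric, AUROC can be read as the probability that a randomly chosen true link is scored above a randomly chosen non-link, with ties counted as half. The sketch below computes it by direct pairwise comparison; it is an illustration of the metric on made-up scores, not the study's evaluation code.

```python
import numpy as np

def auroc(labels, scores) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]  # every positive against every negative
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 3 of 4 pairs ordered correctly
```

Because it depends only on ranking, AUROC computed this way does not change with the decision threshold.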
Ablation: What Drives the Gap?
Ablation reveals that 95% of the AUROC gap stems from joint encoding itself. The element-wise product interaction term within the cross-encoder accounts for only 5%. The decisive factor is whether inputs are processed jointly or independently.
When we turned off parts to see which mattered, 95% of the accuracy gap came down to just reading the gene pair together from the start. A specific math trick inside the joint-reader added only 5%. In short: the winning move is looking at both genes at once.
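As a worked check, the 95% and 5% shares can be turned into AUROC points. This assumes the percentages refer to the 10.84-point gap under balanced (1:1) training, which the text implies but does not state outright.

```python
gap_pp = (0.9025 - 0.7941) * 100   # reported balanced-training gap, in points

joint_encoding_pp = 0.95 * gap_pp  # share attributed to joint encoding
product_term_pp = 0.05 * gap_pp    # share attributed to the element-wise product

print(round(joint_encoding_pp, 2), round(product_term_pp, 2))
```

Under that assumption, joint encoding accounts for roughly 10.3 points and the product term for roughly half a point.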
Discussion
By matching parameters exactly at 5.58M, this study eliminates capacity as a confounding variable. The 10.84-point AUROC gap at balanced training (growing to >25 points at severe imbalance) is attributable to architectural choice alone.
Because both models have exactly 5.58M parameters, size can't be the reason one wins. The 10.84-point accuracy gap in easy mode (growing to over 25 points when the data gets realistic) is purely about how each model is designed.
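The gap figures can be reproduced from the reported numbers. One assumption is needed: that the dual-encoder's 13.6-point drop at 10:1 is measured from its balanced-training AUROC of 0.7941.

```python
cross_balanced, dual_balanced = 0.9025, 0.7941
gap_balanced_pp = (cross_balanced - dual_balanced) * 100

# Assumption: the -13.6 pp drop is relative to the balanced dual-encoder score.
dual_10to1 = dual_balanced - 0.136
cross_10to1 = 0.911                # reported as approximately 0.911
gap_10to1_pp = (cross_10to1 - dual_10to1) * 100

print(round(gap_balanced_pp, 2), round(gap_10to1_pp, 2))
```

On that reading, the gap already exceeds 25 points at 10:1, consistent with the ">25 points" figure.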
Pruning Tolerance
Both architectures tolerate ≥60% structured neuron pruning without post-hoc performance loss, suggesting both are substantially over-parameterized at 5.58M for the dataset size. Leaner models deserve investigation.
We could cut at least 60% of each model's neurons and accuracy didn't drop. That means both designs are carrying a lot of extra weight for this amount of data. Smaller, leaner versions are worth exploring.
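Structured neuron pruning removes whole hidden units rather than individual weights. The sketch below ranks units by the product of their incoming and outgoing weight norms; that importance score is an assumption, since the source says only that "low-importance" neurons are removed. The function name and layer sizes are hypothetical.

```python
import numpy as np

def prune_neurons(w_in, w_out, sparsity):
    """Drop the least important hidden units.

    w_in has shape (d_in, hidden); w_out has shape (hidden, d_out).
    """
    hidden = w_in.shape[1]
    n_keep = max(1, int(round(hidden * (1.0 - sparsity))))
    # Assumed importance: incoming weight norm times outgoing weight norm.
    importance = np.linalg.norm(w_in, axis=0) * np.linalg.norm(w_out, axis=1)
    keep = np.sort(np.argsort(importance)[-n_keep:])
    return w_in[:, keep], w_out[keep, :]

rng = np.random.default_rng(0)
w_in, w_out = rng.normal(size=(16, 10)), rng.normal(size=(10, 4))
p_in, p_out = prune_neurons(w_in, w_out, sparsity=0.6)
print(p_in.shape, p_out.shape)
```

Removing a unit shrinks both adjacent weight matrices, so the pruned layer is genuinely smaller rather than merely sparse.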
The dual-encoder's only advantage is a modest improvement in cold-start generalization to unseen entities, a consequence of its independent representations. However, this advantage is practically limited by the transductive nature of GRN inference. In most scenarios, TF and gene identities are known at training time, making the cross-encoder's inductive limitation irrelevant.
The separate-reader has one bright spot: it handles brand-new genes a bit better, because it builds an independent summary for each gene. But in real gene-network research, scientists usually already know which genes they're studying. So the joint-reader's weakness with unseen genes rarely shows up in practice.
The 50:1 collapse of the dual-encoder is a critical practical concern. Real regulatory databases have far more non-interacting TF–gene pairs than interacting ones, making the dual-encoder unsuitable for production GRN inference without significant modifications.
The separate-reader falling apart at 50:1 is a serious problem. Real biological databases are full of gene pairs that don't interact and only a few that do. Without major fixes, this design just isn't ready for real-world gene-network work.
Conclusion
This paper, submitted to IEEE Transactions on Neural Networks and Learning Systems (TNNLS), provides the first parameter-matched empirical comparison of dual-encoder and cross-encoder architectures for transductive GRN link prediction.
This is the first fair, same-size comparison of the two AI designs on the task of predicting gene-control links. It has been submitted to IEEE Transactions on Neural Networks and Learning Systems (TNNLS).
The results are unambiguous: for transductive GRN link prediction from single-cell RNA-seq data, cross-encoders outperform dual-encoders by a large margin (10.84 pp AUROC at balanced training, growing catastrophically under imbalance). The decisive factor is joint encoding, not any secondary architectural feature.
The verdict is clear. For predicting gene-control links from single-cell data, the joint-reader wins by a wide margin (10.84 points in easy mode, much more under realistic imbalance). What matters is reading both genes together. Smaller design choices are just details.
Future directions include investigating whether dual-encoders can be augmented with late interaction mechanisms to recover the AUROC gap while retaining their retrieval efficiency, and exploring whether the 60% pruning tolerance enables significant model compression without accuracy loss.
Next, we want to see if the separate-reader can close the gap by adding a small "compare step" at the end, keeping it fast but smarter. And since both models survive losing 60% of their neurons, we'll explore how much smaller and cheaper to run we can make them without losing accuracy.