GRASP: Graph-Routed Adaptive Spectral Partitioning
An Autonomous Topology-Aware Clustering System with Oracle-Guided Routing
It looks at your data, figures out what shape the groups make, then hands the job to the algorithm best suited for that shape.
GRASP is a parameter-free clustering pipeline that constructs a topological fingerprint of any dataset, estimates the natural cluster count automatically, and dispatches to the most suitable specialist algorithm — without requiring labels, validation data, or user-specified hyperparameters.
Concept Overview
Abstract
Problem
No single clustering algorithm is universally optimal — the No Free Lunch Theorem formalises this. The optimal choice depends critically on the geometric and statistical properties of the dataset, which are precisely unknown in unsupervised settings. Existing approaches either require user-specified hyperparameters or rely on heuristic ensemble methods.
When you ask a computer to find groups in unlabelled data, the answer depends on which algorithm you pick. No single one works best everywhere — the No Free Lunch Theorem proves this. The right choice depends on the shape and spread of your data, which you cannot see without labels. So today, people either tweak settings by hand or stack several algorithms together and hope for the best.
Approach
GRASP presents a parameter-free autonomous pipeline that identifies the input distribution's geometric structure, estimates k via normalised spectral gap, and delegates the final assignment to one of six specialists — KMeans++, TopSOM, NnSom, WardAgglom, KnnSpectral, or RandNystrom — chosen by an oracle that maps topological and Euclidean diagnostic signals to the algorithm most likely to achieve Bayes-optimal partitioning.
GRASP works on its own — no knobs to tune. First it sketches the shape of your data. Then it estimates k, the number of groups, using a math trick called the normalised spectral gap. Finally, it hands the data to one of six specialist algorithms — KMeans++, TopSOM, NnSom, WardAgglom, KnnSpectral, or RandNystrom. A built-in routing oracle reads diagnostic signals about the shape of the data and picks the specialist most likely to give the best possible answer.
Results
Evaluated on 26 benchmark datasets spanning non-convex shapes, SIPU S-sets and A-sets, UCI real-world data, and high-dimensional settings. GRASP achieves ARI 0.378 on interlocking rings against DBSCAN's ARI 0.000, passes all 49 unit tests, and processes 50,000 points within 140ms including full GNG training and Nyström embedding.
We tested GRASP on 26 datasets: weird shapes like rings and spirals, classic benchmarks (SIPU S-sets and A-sets), real-world data from UCI, and very high-dimensional inputs. On interlocking rings — a famously hard case — GRASP scored ARI 0.378 while DBSCAN scored 0.000. All 49 unit tests pass. It chews through 50,000 points in 140ms, training and embedding included.
Contribution
A topology-prior partitioning principle, geometry-adaptive oracle routing, and selective spectral deployment — three design principles that together produce an interpretable, parameter-free system whose oracle converges to the Bayes-optimal specialist as n → ∞.
Three ideas hold the system together: map the shape first, let the shape pick the algorithm, and only use heavy spectral math when the shape demands it. Together they make a transparent, knob-free system whose oracle is provably right as the dataset grows large (as n → ∞).
Introduction
The fundamental difficulty in clustering is that the optimal algorithm depends on geometric and statistical properties of the dataset that are precisely unknown in the unsupervised setting. In practice, practitioners execute several algorithms in succession, selecting among them via visual inspection or labelled validation data — neither of which is available in genuinely unsupervised analysis.
Clustering — sorting items into groups where similar things end up together — has one big catch. The best algorithm depends on the shape and spread of your data, but you cannot see either without labels. So people just run a few algorithms back to back and pick the answer that looks right, or compare it against labelled data. Neither option exists when your data is truly unlabelled.
The Normalised Spectral Gap
A topology-faithful graph makes cluster structure computationally accessible. The normalised spectral gap δk = λk+1/λk+2 is large when the graph decomposes cleanly into well-separated subgraphs — and small when cluster boundaries are topologically entangled. GRASP uses this gap to both estimate k and calibrate the oracle's routing decisions.
Once you build a graph that faithfully captures the shape of the data, finding the groups becomes a math problem. The normalised spectral gap δk = λk+1/λk+2 is a number that gets large when the groups are clean and far apart, and small when they tangle together. GRASP uses this number to guess how many groups there are and to decide which specialist algorithm to call next.
GRASP's three design principles separate it from existing methods. First, topology-prior partitioning constructs a topology map before committing to any partitioning strategy, at cost O(nm) independent of k. Second, geometry-adaptive routing selects the specialist on the basis of diagnostics extracted from the topology map — not heuristics external to the pipeline. Third, selective spectral deployment applies full spectral computation only when topological evidence indicates Euclidean methods are insufficient, keeping the typical cost at O(nm).
Three design choices set GRASP apart. First, map first, decide later: it builds a shape map of the data before picking how to split it — the cost is O(nm), which doesn't grow with the number of groups k. Second, let the map pick the tool: the algorithm is chosen using signals read from the map, not outside rules of thumb. Third, save the expensive math for when you need it: heavy spectral computation runs only when straightforward methods clearly won't work. The typical cost stays at O(nm).
Full spectral clustering incurs O(n²) memory for the kernel matrix and O(n³) for eigendecomposition — prohibitive at n ≥ 10,000. GRASP's Nyström extension with GNG landmarks reduces this to O(nmk) with data-proportional coverage, enabling spectral-quality clustering at linear cost.
Standard spectral clustering chokes on big datasets. It needs O(n²) memory and O(n³) time, so once you hit n ≥ 10,000 points it becomes impractical. GRASP sidesteps this with a Nyström approximation built on GNG landmarks — a smart way to learn from a representative slice instead of every point. Cost drops to O(nmk) and the landmarks cover the data evenly, so quality stays high.
Pipeline
Stage 1 — Convex-Bypass Probe
On a random subsample X̂ of size ns = 800, GRASP runs r = 2 independent KMeans++ restarts with k = khint, computes the best mean silhouette ŝ on the best labelling. If ŝ > 0.55, all n points are assigned by nearest-centroid and the pipeline terminates at cost O(nskd). This fast convex-bypass saves 16 of 26 benchmark datasets, completing in 2–22ms.
First, a quick sanity check on whether the easy answer works. GRASP grabs a random sample of ns = 800 points and runs KMeans++ twice with k = khint. It scores how clean the resulting groups look using a number called the silhouette ŝ. If ŝ > 0.55, the data is well-behaved and GRASP just assigns every point to its nearest group center and stops. Cost: O(nskd). This shortcut solves 16 of 26 benchmarks in 2–22ms.
Stage 2 — Growing Neural Gas Topology
GRASP trains a GNG on X using min(n, 8000) points for T = 50 epochs. GNG inserts new neurons at locations of maximal quantisation error, yielding a topology map whose connectivity emerges from data. The Competitive Hebbian Learning theorem guarantees the GNG edge set is a subgraph of the Delaunay triangulation of the neuron positions — providing a formal topological fidelity guarantee. Edges older than dmax are pruned every λ steps.
If the shortcut fails, GRASP trains a Growing Neural Gas (GNG) on up to 8000 points for T = 50 passes. Think of GNG as a small flexible mesh that grows toward your data, adding new points where it fits worst. The result is a shape map that mirrors the data's structure. The Competitive Hebbian Learning theorem proves the mesh's edges line up with the natural Delaunay structure of the data, so the map is faithful. Old, unused edges get trimmed every λ steps.
Stage 3 & 4 — Graph Laplacian & Eigengap k*
The bandwidth σ = median({‖wi−wj‖}) is estimated from GNG edge lengths. The sparse affinity matrix Aij = exp(−‖wi−wj‖²/2σ²) is built, and the normalised Laplacian L = I−D−1/2AD−1/2 is computed at constant cost in m. GRASP then estimates k* via the normalised gap δi = (λi+1−λi)/max(λk+1, ε), setting k* = arg max δi.
Now GRASP turns the shape map into a math object. It sets the bandwidth σ = median({‖wi−wj‖}) from the edge lengths of the mesh, builds an affinity matrix Aij = exp(−‖wi−wj‖²/2σ²) that says how connected each pair of mesh nodes is, and forms the normalised Laplacian L = I−D−1/2AD−1/2. Then it finds the natural number of groups k* by looking for the biggest jump in the spectrum: k* = arg max δi. No guessing required.
L = I − D−1/2AD−1/2, δi = (λi+1−λi) / max(λk+1, ε)
k* = arg maxi δi — robust to spectral scale by construction
Stage 5 — Nyström Embedding
Given GNG eigenvectors U ∈ ℝm×k*, each data point x is embedded as φ(x) = ∑i=1m K(x, wi) / √(Dx · di) Uij. Each row of φ(x) is ℓ²-normalised. Because GNG neurons concentrate in proportion to data density, they systematically cover all cluster regions, whereas random landmarks may miss thin or low-density structures. The Rust implementation distributes the loop across all cores via Rayon.
Next, GRASP places every data point into a new coordinate system that makes the groups easy to see. Using the GNG eigenvectors U ∈ ℝm×k*, each point x lands at φ(x) = ∑i=1m K(x, wi) / √(Dx · di) Uij, with rows ℓ²-normalised. The trick is using GNG mesh points as landmarks — they sit where the data sits, so thin or sparse regions never get missed (random landmarks would). The Rust implementation runs this in parallel across all CPU cores via Rayon.
Stage 6 — Six Specialist Sub-Algorithms
GRASP instantiates six specialists, each targeting a different geometric regime:
GRASP keeps six specialist algorithms on hand, each one good at a particular type of data shape:
- KMeans++ — O(nkd), on original features. Best for approximately Gaussian clusters.
- TopSOM — O(nm²), Laplacian → Eigengap → Nyström → KMeans. Full spectral pipeline on GNG topology.
- GNG-DenSom — O(nm), density-watershed segmentation on the GNG graph.
- WardAgglom — O(n², n ≤ 1000), Ward linkage on the Nyström embedding.
- KnnSpectral — Full k-NN spectral clustering on original features.
- RandNystrom — O(nm), Nyström with GNG landmarks, KMeans final step.
Stage 7 — Routing Oracle
The oracle maps three diagnostic signals to an algorithm choice: sts (TopoSOM silhouette), skm (KMeans silhouette), and spectral advantage Δ. The routing logic encodes four cases: Case 2.5 (high-confidence Gaussian, ρ > 0.70 and skm > 0.60 → KMeans); Case 2 (low ARI signals non-convex geometry → KnnSpectral/WardAgglom/TopoSOM); Cases 3a/3a′ (explicit spectral advantage → TopoSOM for large graphs); Case 3b (silhouette fallback → max(sts, skm, srn)).
The oracle decides who runs the job by reading three signals: how clean the TopoSOM grouping looks (sts), how clean the KMeans grouping looks (skm), and the spectral advantage Δ (whether spectral methods do meaningfully better than KMeans). It branches in four ways. Case 2.5: if the data looks like clean blobs (ρ > 0.70 and skm > 0.60), use KMeans. Case 2: if scores are low, the shape is probably weird — route to KnnSpectral, WardAgglom, or TopoSOM. Cases 3a/3a′: clear spectral advantage on large graphs → TopoSOM. Case 3b: when in doubt, pick the algorithm with the cleanest groups — max(sts, skm, srn).
Δ = sTopoSOM − sKMeans
Spectral advantage Δ > 0.20 with p > 0.05 routes to TopoSOM
Results
GRASP was evaluated on 26 benchmark datasets in six categories: non-convex shapes (moons, circles, spirals, interlocking rings, swiss_roll_2d, anisotropic, varied_density; n = 150–1000), SIPU S-sets s1–s4, A-sets a1–a3, UCI real-world (Iris, Wine, Breast Cancer, Digits), and scalability datasets at n = 50,000.
We threw 26 datasets at GRASP, grouped into six families. Tricky shapes: moons, circles, spirals, interlocking rings, swiss_roll_2d, anisotropic, varied_density (n = 150–1000). Standard academic benchmarks: SIPU S-sets s1–s4 and A-sets a1–a3. Real-world data from UCI: Iris, Wine, Breast Cancer, Digits. And a speed test at n = 50,000.
Interlocking Rings — GRASP ARI 0.378
On the interlocking rings benchmark where cluster boundaries are maximally topologically entangled: DBSCAN ARI 0.055, HDBSCAN ARI 0.055, KnnSpectral ARI 0.234 — GRASP achieves ARI 0.378 via TopoSOM routing, a 14× improvement over DBSCAN and 60% over KnnSpectral.
Picture two rings linked through each other like a chain — the toughest possible test for separating groups. DBSCAN scored ARI 0.055, HDBSCAN scored 0.055, and KnnSpectral managed 0.234. GRASP picked TopoSOM and hit ARI 0.378 — 14× better than DBSCAN and 60% better than KnnSpectral. (ARI is a score from 0 to 1 for how well the groups match the truth.)
Non-convex benchmarks show the clearest advantage: GNG correctly captures the ring and spiral topology, the Laplacian eigengap identifies k = 2 or k = 3, and the oracle routes to TopoSOM which recovers the true partition via spectral embedding. KMeans++ achieves mean ARI 0.188 on these datasets; GRASP achieves 0.376.
Weird shapes are where GRASP shines brightest. The GNG mesh captures the rings and spirals correctly. The eigengap correctly says there are 2 or 3 groups. The oracle picks TopoSOM, and TopoSOM uses its spectral embedding to find the right answer. Across these datasets, KMeans++ averages ARI 0.188; GRASP averages 0.376 — roughly twice as accurate.
A-sets and S-sets (overlapping Gaussian clusters): The convex-bypass probe correctly activates on S4 (tight clusters, silhouette > 0.55) and routes to KMeans on A1 (ρ = 0.862). On A3 (heavily overlapping, k = 50), GRASP routes to KMeans++ and achieves ARI 0.216 vs KMeans++ at 0.188.
A-sets and S-sets are made of overlapping blob-shaped groups. GRASP correctly takes the shortcut on S4 (tight clusters, silhouette > 0.55) and hands A1 to KMeans (ρ = 0.862). On A3, which is messy with k = 50 heavily overlapping groups, GRASP still routes to KMeans++ and scores ARI 0.216, slightly better than running KMeans++ alone at 0.188.
Scalability: GNG, Laplacian, and eigendecomposition stages collectively incur O(m³) operations, entirely independent of n. The sole n-dependent step is the Nyström extension O(nmk); at n = 50,000 this completes in ~140ms. Standard SC requires O(n²d) at n = 10,000 this is 1012 operations — GRASP's O(nm) is a 500× reduction at comparable quality.
How fast does it scale? The heavy math (GNG, Laplacian, eigendecomposition) costs O(m³) and doesn't grow with the dataset size n. Only one step actually touches every data point, and it costs O(nmk). At n = 50,000 the whole thing finishes in roughly 140ms. Standard spectral clustering would need O(n²d), which at just n = 10,000 is already 1012 operations. GRASP's O(nm) is about 500× less work for similar quality.
Theoretical Analysis
GRASP's three main theoretical results establish conditions under which each component achieves its design goal.
Three theorems pin down exactly when each part of GRASP is guaranteed to do its job correctly.
Theorem 2 — Spectral Cluster Recovery
If X is drawn from k distributions with compact support satisfying minj≠l d(Cj, Cl) ≥ Δ > 0, and the GNG mean edge length σ < Δ/2, then L has exactly k zero eigenvalues whose eigenvectors span a subspace in which the k cluster indicator vectors are mutually orthogonal.
In plain terms: if your data really does come from k separate groups that don't touch (minj≠l d(Cj, Cl) ≥ Δ > 0), and the GNG mesh is fine-grained enough that its edges are shorter than half the gap between groups (σ < Δ/2), then the math is guaranteed to find them. The Laplacian L has exactly k zero eigenvalues, and the corresponding eigenvectors point cleanly to each group.
Theorem 3 (Nyström error): For rank-m Nyström approximation K̃ with GNG landmarks, ‖K−K̃‖F ≤ λm+1(K) √(n−m). The eigenvector error for the leading k components is O(λk+1/(λk−λk+1)), which diminishes as the spectral gap increases — exactly the regime where GRASP routes to TopSOM.
Theorem 3 (how close the shortcut gets to the real answer): With GNG landmarks, the Nyström approximation K̃ differs from the true kernel by at most ‖K−K̃‖F ≤ λm+1(K) √(n−m). The error in the leading k directions is O(λk+1/(λk−λk+1)). The bigger the spectral gap, the smaller the error — and the big-gap case is exactly when GRASP routes to TopSOM.
Theorem 4 (Delaunay subgraph, Frötzke 2009): The GNG edge set is a subgraph of the Delaunay triangulation of the neuron positions. Hence any topologically separable cluster boundary present in the data's Delaunay structure is also represented in the GNG graph, bounding the Nyström approximation quality from below.
Theorem 4 (Delaunay subgraph, Frötzke 2009): The GNG mesh's edges are always part of the data's natural Delaunay structure — the canonical way to connect nearby points. So any boundary between groups that the Delaunay structure can see, the GNG mesh sees too. This sets a guaranteed floor on how good the Nyström approximation can be.
Proposition 5 (Oracle consistency): As n → ∞ with fixed cluster counts, the routing oracle selects the Bayes-optimal specialist with probability approaching 1. The oracle uses ARI estimate ρ, silhouette coefficients, and spectral advantage Δ — all of which converge by law of large numbers, and eigenvector stability of symmetric operators stabilises at a fixed label corresponding to the asymptotically optimal specialist.
Proposition 5 (the oracle picks right in the long run): As the dataset grows large (n → ∞) with a fixed number of groups, the oracle picks the truly best specialist with probability approaching 1. The signals it relies on — ρ, the silhouette scores, the spectral advantage Δ — all settle into stable values by the law of large numbers, and the eigenvectors lock onto the right specialist.
Conclusion
GRASP demonstrates that autonomous, parameter-free clustering is achievable through principled geometric diagnosis rather than heuristic ensemble voting. The three-principle architecture — topology prior, geometry-adaptive routing, selective spectral deployment — produces a system that is simultaneously faster than full spectral methods, more robust than density methods on complex topology, and more accurate than KMeans++ on non-convex structure.
GRASP shows you can build a clustering system that picks its own algorithm and tunes itself, by reading the geometry of the data rather than blindly stacking methods together. The three-part design — map first, route by shape, use heavy math only when needed — gives you something that is faster than full spectral methods, steadier than density methods on tangled shapes, and more accurate than KMeans++ on anything that isn't a clean blob.
The fast convex-bypass terminates 16 of 26 benchmarks in under 22ms. For the remaining 10, the full pipeline achieves state-of-the-art accuracy while processing 50,000 points within 140ms — scaling as O(nm) in n, independent of the O(m³) topology computation which depends only on the GNG size m.
The shortcut path finishes 16 of 26 benchmarks in under 22ms. For the other 10, the full pipeline reaches top-tier accuracy and still handles 50,000 points in 140ms. The cost grows as O(nm) with the dataset, and the heavier O(m³) topology step doesn't grow with n at all — only with the mesh size m.
Future directions include extending GRASP's routing oracle with learned diagnostic features via a small neural classifier trained on the topology signals, integrating SOM-TSK's multi-start seeding as a replacement for KMeans++ in the KMeans specialist, and evaluating GRASP on single-cell RNA-seq datasets where cluster topology is biologically meaningful and ground truth is available via cell-type markers.
Next steps: train a small neural classifier to read the topology signals and make the oracle even sharper, swap KMeans++ for SOM-TSK's multi-start seeding inside the KMeans specialist, and try GRASP on single-cell RNA-seq data — where the shape of the cluster carries real biological meaning and the answers can be checked against known cell-type markers.