Algorithm
The core of BasicSpanner implements the heuristic strategy devised by Marín & Hoyas (2009) to characterize basic networks, extended with a parallel execution layer.
Problem statement
Let \(G = (V, E)\) be an undirected graph and \(S \subseteq V\) a set of seeds. A basic network is an induced subgraph \(B = (V_B, E_B)\) with \(S \subseteq V_B \subseteq V\) such that, for every pair of seeds \(s_i, s_j \in S\):
\[ d_B(s_i, s_j) = d_G(s_i, s_j), \]
and \(|V_B|\) is minimal among all subgraphs with this property. The nodes in \(V_B \setminus S\) are called connectors.
Computing \(V_B\) exactly is intractable for realistic graphs, since all subsets of \(V \setminus S\) would in principle have to be inspected. The heuristic implemented in BasicSpanner converges to near-optimal basic networks in manageable time.
Heuristic strategy
The algorithm proceeds in the following stages:
- Initialization. Seeds are marked as basic units; their pairwise geodesic distances in \(G\) are computed and stored.
- Seed+n expansion. All internal nodes of at least one seed-to-seed geodesic are collected and classified as seed+1, seed+2, …, seed+n according to how far they lie from the seeds. Units that are strictly necessary to preserve at least one geodesic distance are promoted to basic units.
- Path evaluation. For each pair of seeds, alternative geodesic paths are evaluated. Paths that traverse basic units or frequently reused seed+n units are preferred; paths through sparsely reused units are candidates for elimination.
- Path pruning. Non-essential alternative paths are removed whenever doing so does not alter any seed-to-seed distance.
- Replicated execution. To mitigate convergence to suboptimal local minima, the heuristic is executed over multiple randomized orderings. Replicates can be:
- Permutations of a user-defined seed list, or
- independent batches of randomly drawn seeds (used to establish the null distribution of connector counts).
- Optional final pruning. A final pass removes redundant connectors from tied basic networks while preserving every seed-to-seed distance. Pruning is generally required to reach the optimal basic network.
Parallel execution
Each replicate is independent and is dispatched to a worker thread. A fixed-size thread pool consumes the replicate queue and accumulates the results. Accumulation of the connector-count distribution, real-time histogram updates and summary statistics are performed on a single accumulator thread to avoid contention.
Significance of observed connectors
Random seed batches provide a null distribution for the number of connectors. For a user-defined seed set, a one-sided comparison against this null indicates whether the seeds are more or less connected than expected by chance.
In practice, \(n = 100\) batches are typically sufficient to characterize the null; for larger analyses, \(n = 20\) already yields stable results.
Implementation
The main computational entry points live in the src/core directory:
BasicSpannerEngine— orchestrates the analysis pipeline.BasicNetworkFinder— computes a single basic network.ParallelBasicSpannerEngine/ParallelGraphAlgorithms— thread-pool based parallel execution.GraphAlgorithms/PaperAlgorithmHelpers— low-level graph routines (BFS, pruning, path enumeration).
See the API reference for a detailed breakdown.