Optimizer Configuration
Configurable parameters for the BADGERS evolutionary spacer optimization algorithm.
Overview
The optimizer exposes two configuration structs: EvolutionaryConfig controls the evolutionary algorithm's hyperparameters, and SiteExtractionConfig controls which MSA positions are eligible for optimization. Both provide defaults derived from the BADGERS reference implementation.
Evolutionary Parameters
Two presets are available via EvolutionaryConfig::multi_target_defaults() and EvolutionaryConfig::variant_id_defaults(). The correct preset is auto-selected based on the optimization mode.
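To make the preset mechanism concrete, here is a minimal Rust sketch of a struct that mirrors the defaults in the table below. The field types, the `OptimizationMode` enum, and the `for_mode` helper are hypothetical. They illustrate mode-based auto-selection and are not the optimizer's actual API.

```rust
/// Hypothetical optimization-mode enum; the real code may name or shape this differently.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OptimizationMode {
    MultiTarget,
    VariantId,
}

/// Sketch of the evolutionary hyperparameters; field types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct EvolutionaryConfig {
    pub population_size: usize,    // S
    pub beta: f64,                 // Boltzmann selection temperature
    pub replacement_fraction: f64, // j
    pub mutation_rate: f64,        // gamma
    pub budget: usize,
    pub local_search_depth: usize,
    pub local_search_top_k: usize,
    pub seed: Option<u64>,
}

impl EvolutionaryConfig {
    pub fn multi_target_defaults() -> Self {
        Self {
            population_size: 87,
            beta: 0.077,
            replacement_fraction: 0.795,
            mutation_rate: 0.003,
            budget: 2000,
            local_search_depth: 3,
            local_search_top_k: 5,
            seed: None,
        }
    }

    pub fn variant_id_defaults() -> Self {
        Self {
            population_size: 119,
            beta: 2.202,
            replacement_fraction: 0.893,
            mutation_rate: 0.029,
            // budget, local search settings, and seed match the multi-target preset
            ..Self::multi_target_defaults()
        }
    }

    /// Hypothetical helper mirroring the mode-based auto-selection.
    pub fn for_mode(mode: OptimizationMode) -> Self {
        match mode {
            OptimizationMode::MultiTarget => Self::multi_target_defaults(),
            OptimizationMode::VariantId => Self::variant_id_defaults(),
        }
    }
}
```

Within this sketch, a fixed seed can be layered on top of a preset with struct update syntax: `EvolutionaryConfig { seed: Some(42), ..EvolutionaryConfig::multi_target_defaults() }`.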
| Parameter | Multi-Target Default | Variant-ID Default | Description |
|---|---|---|---|
| population_size | 87 | 119 | Number of parent sequences in the population (S in paper) |
| beta | 0.077 | 2.202 | Boltzmann selection temperature. Smaller = more greedy, larger = more exploratory. |
| replacement_fraction | 0.795 | 0.893 | Fraction of the population replaced per generation (j in paper). floor(j × S) parents are sampled and mutated, where S is population_size. |
| mutation_rate | 0.003 | 0.029 | Per-position probability of random nucleotide substitution (gamma in paper) |
| budget | 2000 | 2000 | Maximum number of novel children to evaluate before stopping |
| local_search_depth | 3 | 3 | Max mismatch positions for combinatorial local search (0 = disabled) |
| local_search_top_k | 5 | 5 | Number of top spacers to apply local search to |
| seed | None | None | Optional RNG seed for deterministic reproducibility |
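The selection, replacement, and mutation parameters above can be illustrated with a short Rust sketch. It assumes the temperature form p_i ∝ exp(f_i / β), which matches the table's "smaller = more greedy" description. It also assumes that a substituted base always changes to one of the other three nucleotides. The helper names and the bundled SplitMix64 generator are illustrative only. They are not the optimizer's API, and the sketch does not cover the full generation loop.

```rust
/// Tiny deterministic PRNG (SplitMix64) so the sketch needs no external crates.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
    /// Uniform f64 in [0, 1) built from the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Selection probabilities, assuming p_i ∝ exp(f_i / beta).
fn boltzmann_probs(fitness: &[f64], beta: f64) -> Vec<f64> {
    let max_f = fitness.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    // Subtract the max before exponentiating for numerical stability.
    let weights: Vec<f64> = fitness.iter().map(|f| ((f - max_f) / beta).exp()).collect();
    let total: f64 = weights.iter().sum();
    weights.iter().map(|w| w / total).collect()
}

/// Number of parents sampled per generation: floor(j × S).
fn num_replaced(population_size: usize, replacement_fraction: f64) -> usize {
    (replacement_fraction * population_size as f64).floor() as usize
}

/// Draw one parent index from the selection distribution (inverse-CDF walk).
fn sample_index(probs: &[f64], rng: &mut SplitMix64) -> usize {
    let u = rng.next_f64();
    let mut acc = 0.0;
    for (i, p) in probs.iter().enumerate() {
        acc += p;
        if u < acc {
            return i;
        }
    }
    probs.len() - 1 // guard against floating-point round-off
}

/// Each position is substituted with probability gamma (assumed: always to a different base).
fn mutate(seq: &[u8], gamma: f64, rng: &mut SplitMix64) -> Vec<u8> {
    const BASES: [u8; 4] = [b'A', b'C', b'G', b'T'];
    seq.iter()
        .map(|&b| {
            if rng.next_f64() < gamma {
                let others: Vec<u8> = BASES.iter().copied().filter(|&x| x != b).collect();
                others[(rng.next_u64() % others.len() as u64) as usize]
            } else {
                b
            }
        })
        .collect()
}
```

With the multi-target preset (S = 87, j = 0.795), `num_replaced` gives 69 parents per generation. With the variant-ID preset (S = 119, j = 0.893), it gives 106.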
Site Extraction Configuration
SiteExtractionConfig controls how the MSA is scanned for eligible optimization sites.
| Parameter | Default | Description |
|---|---|---|
| spacer_length | 28 | Spacer sequence length in nucleotides (Cas13a) |
| context_nt | 10 | Flanking context nucleotides on each side (ADAPT model requirement) |
| min_valid_fraction | 0.80 | Minimum fraction of MSA sequences that must have valid ACGT-only windows at a position |
| window_ranges | None | Optional position ranges to restrict site extraction (set automatically by region discovery) |
Conservation Threshold
The min_valid_fraction parameter (range: 0.5 to 1.0, default: 0.80) sets the conservation threshold for site eligibility. Each candidate site is a window of 2 × context_nt + spacer_length nucleotides, which is 48 nt with the defaults. At the default of 0.80, a site is considered for optimization only if at least 80% of the MSA sequences have a valid, ACGT-only window at that position.
Lower values admit more divergent sites, which is useful for diverse pathogen sets. Higher values restrict optimization to highly conserved regions, where spacers are most likely to detect all variants.
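The eligibility rule can be sketched in Rust as follows. This is a minimal illustration and assumes three things. First, field types are guesses. Second, a sequence counts as valid only when the whole window fits inside it and contains only A, C, G, and T, so gaps and N make it invalid. Third, sites are indexed by the window's start column. The real extractor's handling of window_ranges, lowercase bases, and coordinate conventions is not shown here.

```rust
/// Sketch of the site-extraction settings; field types are assumptions
/// and window_ranges is omitted.
pub struct SiteExtractionConfig {
    pub spacer_length: usize,
    pub context_nt: usize,
    pub min_valid_fraction: f64,
}

impl Default for SiteExtractionConfig {
    fn default() -> Self {
        Self { spacer_length: 28, context_nt: 10, min_valid_fraction: 0.80 }
    }
}

impl SiteExtractionConfig {
    /// Total window length: 2 × context_nt + spacer_length.
    pub fn window_len(&self) -> usize {
        2 * self.context_nt + self.spacer_length
    }

    /// True if at least min_valid_fraction of the sequences have an ACGT-only
    /// window starting at `pos`; a window that runs off the end is invalid.
    pub fn is_eligible(&self, msa: &[&[u8]], pos: usize) -> bool {
        let w = self.window_len();
        let valid = msa
            .iter()
            .filter(|seq| {
                seq.get(pos..pos + w).map_or(false, |win| {
                    win.iter().all(|&b| matches!(b, b'A' | b'C' | b'G' | b'T'))
                })
            })
            .count();
        !msa.is_empty() && (valid as f64) >= self.min_valid_fraction * (msa.len() as f64)
    }
}
```

In this sketch, a site where three of four sequences are valid (75%) is rejected at the default threshold but would pass at 0.70.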
Automatic Region Discovery
Rather than optimizing every eligible site across the full alignment, the optimizer can auto-discover the most promising regions:
| Mode | Discovery Method | Metric |
|---|---|---|
| Multi-target detection | discover_conserved_regions() | Lowest positional Shannon entropy (most conserved stretches) |
| Variant identification | discover_discriminative_regions() | Highest Jensen–Shannon divergence between on/off partitions |
Both methods accept a region count (n) and width (region_width, default: 50 window positions). Discovered regions are injected into SiteExtractionConfig.window_ranges, restricting the optimizer to those regions only.
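The two discovery metrics can be sketched in Rust as shown here. The bodies of discover_conserved_regions() and discover_discriminative_regions() do not appear on this page, so every choice in this sketch is an assumption. It treats every symbol, gaps included, as its own category. It scores windows by the mean per-position value over region_width consecutive positions and picks non-overlapping windows greedily. The helper names are hypothetical.

```rust
use std::collections::HashMap;

/// Relative frequency of each symbol in one alignment column.
fn column_freqs(column: &[u8]) -> HashMap<u8, f64> {
    let mut counts: HashMap<u8, f64> = HashMap::new();
    for &b in column {
        *counts.entry(b).or_insert(0.0) += 1.0;
    }
    let n = column.len() as f64;
    counts.values_mut().for_each(|c| *c /= n);
    counts
}

/// Shannon entropy (bits) of a probability vector, skipping zero terms.
fn entropy_bits<I: IntoIterator<Item = f64>>(probs: I) -> f64 {
    probs.into_iter().filter(|&p| p > 0.0).map(|p| -p * p.log2()).sum()
}

/// Positional Shannon entropy of one column (lower = more conserved).
fn shannon_entropy(column: &[u8]) -> f64 {
    entropy_bits(column_freqs(column).values().copied())
}

/// Jensen-Shannon divergence (bits) between on-target and off-target columns:
/// JSD = H(M) - (H(P) + H(Q)) / 2, where M is the 50/50 mixture.
fn js_divergence(on: &[u8], off: &[u8]) -> f64 {
    let p = column_freqs(on);
    let q = column_freqs(off);
    let mut symbols: Vec<u8> = p.keys().chain(q.keys()).copied().collect();
    symbols.sort_unstable();
    symbols.dedup();
    let mix = symbols
        .iter()
        .map(|s| 0.5 * (p.get(s).unwrap_or(&0.0) + q.get(s).unwrap_or(&0.0)));
    entropy_bits(mix) - 0.5 * (entropy_bits(p.values().copied()) + entropy_bits(q.values().copied()))
}

/// Greedily pick up to `n` non-overlapping half-open windows [start, end) of
/// `width` positions, ranked by mean score (lowest first if `lower_is_better`).
fn pick_regions(scores: &[f64], width: usize, n: usize, lower_is_better: bool) -> Vec<(usize, usize)> {
    if n == 0 || width == 0 || scores.len() < width {
        return Vec::new();
    }
    let mut windows: Vec<(usize, f64)> = (0..=scores.len() - width)
        .map(|start| (start, scores[start..start + width].iter().sum::<f64>() / width as f64))
        .collect();
    windows.sort_by(|a, b| {
        let ord = a.1.partial_cmp(&b.1).unwrap();
        if lower_is_better { ord } else { ord.reverse() }
    });
    let mut picked: Vec<(usize, usize)> = Vec::new();
    for (start, _) in windows {
        let end = start + width;
        if picked.iter().all(|&(s, e)| end <= s || start >= e) {
            picked.push((start, end));
            if picked.len() == n {
                break;
            }
        }
    }
    picked
}
```

Under these assumptions, conserved-region discovery would call `pick_regions` on entropy scores with `lower_is_better = true`. Discriminative-region discovery would pass Jensen-Shannon scores with `lower_is_better = false`.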
Diversity Settings
Two builder parameters control post-evolution diversity filtering:
| Parameter | Default | Description |
|---|---|---|
| diversity_distance | 3 | Minimum pairwise Hamming distance between kept spacers. A spacer that differs from a higher-fitness spacer at fewer than diversity_distance positions is removed. |
| top_k_per_site | 5 | Maximum spacers retained per site after diversity filtering and truncation |
Setting diversity_distance to 0 disables Hamming-based deduplication entirely, keeping all population members (subject to top_k_per_site truncation).
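Here is a minimal Rust sketch of the diversity filter. It assumes a greedy pass in descending fitness order, where each candidate is compared only against spacers already kept. The function names and the `(sequence, fitness)` tuple representation are hypothetical.

```rust
/// Hamming distance between equal-length spacers.
fn hamming(a: &[u8], b: &[u8]) -> usize {
    a.iter().zip(b).filter(|(x, y)| x != y).count()
}

/// Visit candidates from highest to lowest fitness and keep a spacer only if it
/// differs from every already-kept spacer at >= diversity_distance positions,
/// then truncate to top_k_per_site. A distance of 0 keeps everything.
fn diversity_filter(
    mut candidates: Vec<(Vec<u8>, f64)>,
    diversity_distance: usize,
    top_k_per_site: usize,
) -> Vec<(Vec<u8>, f64)> {
    candidates.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    let mut kept: Vec<(Vec<u8>, f64)> = Vec::new();
    for (seq, fit) in candidates {
        if kept.iter().all(|(k, _)| hamming(k, &seq) >= diversity_distance) {
            kept.push((seq, fit));
            if kept.len() == top_k_per_site {
                break; // remaining candidates could not survive truncation
            }
        }
    }
    kept.truncate(top_k_per_site);
    kept
}
```

In this sketch, `AAAAAC` is dropped at the default distance of 3 because it is one mismatch away from a fitter `AAAAAA`. `CCCAAA`, at exactly three mismatches, is kept.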
Assay Score Weight Preset
Optimized spacers are scored with the for_optimizer() weight preset, which raises the ML activity weight to 0.35 (from the default 0.30) and lowers the heuristic quality weight to 0.05 (from 0.10). It also narrows the ML activity normalization range to (2.0, 4.0) to better separate top candidates, whose fitness values cluster in that band.
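The effect of the narrower normalization range can be seen in a small Rust sketch. It assumes linear min-max normalization clamped to [0, 1]. It also models only the two weights named above and omits every other assay-score component. The constant and function names are hypothetical and are not taken from the assay-score code.

```rust
/// Assumed form: linear min-max normalization, clamped to [0, 1].
fn normalize(x: f64, range: (f64, f64)) -> f64 {
    let (lo, hi) = range;
    ((x - lo) / (hi - lo)).clamp(0.0, 1.0)
}

// Hypothetical constants mirroring the for_optimizer() values stated above.
const OPT_ML_WEIGHT: f64 = 0.35;
const OPT_HEURISTIC_WEIGHT: f64 = 0.05;
const OPT_ML_RANGE: (f64, f64) = (2.0, 4.0);

/// Contribution of only the ML-activity and heuristic-quality terms;
/// other score components are intentionally left out of this sketch.
fn partial_score(ml_activity: f64, heuristic_quality: f64) -> f64 {
    OPT_ML_WEIGHT * normalize(ml_activity, OPT_ML_RANGE)
        + OPT_HEURISTIC_WEIGHT * heuristic_quality
}
```

Under this assumption, activity values of 3.0 and 3.5 map to 0.5 and 0.75, so a half-unit difference in that band becomes a quarter of the normalized scale.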
See the Assay Score page and the BADGERS Optimizer page for context on how this preset is applied.