BADGERS Optimizer
An evolutionary algorithm for generating optimized—and potentially novel—Cas13a spacer sequences from multiple sequence alignments.
Overview
Unlike standard spacer finding, which identifies and scores subsequences already present in the input, the BADGERS optimizer generates novel spacer sequences that may not exist in any natural sequence. It uses the ADAPT CNN models as a frozen fitness oracle and applies an evolutionary algorithm to maximize spacer activity across target sequence diversity.
The algorithm is based on Mantena et al., Nature Biotechnology 2024. Input is a multiple sequence alignment (MSA) of pathogen variants in FASTA format. All spacer sequences are 28 nt (Cas13a).
Two Objective Modes
The optimizer supports two distinct objectives, each with its own fitness function and default hyperparameters.
| Mode | Use Case | Fitness Objective |
|---|---|---|
| Multi-target detection | Detect all variants of a pathogen | Maximize frequency-weighted mean activity across all targets |
| Variant identification | Distinguish variant A from variant B | Maximize on-target activity while minimizing off-target activity (sigmoidal cost) |
Workflow
The optimizer processes each eligible site in the MSA through a five-step pipeline.
| Step | Operation | Output |
|---|---|---|
| 1. Extract sites | Slide a 48 nt window (10 nt flanking + 28 nt spacer + 10 nt flanking) across the MSA. Keep positions where ≥80% of sequences have valid ACGT-only windows. | Vec<GenomicSite> — one per eligible position |
| 2. Build fitness | Construct a MultiTargetFitness or VariantIdFitness evaluator wrapping the ADAPT predictor and target set for the site. | Fitness function for this site |
| 3. Evolve | Initialize population via Boltzmann sampling from seed sequences. Each generation: sample parents, mutate, replace worst. Repeat until evaluation budget is exhausted. Optional local search around top spacers. | OptimizationResult with ranked population |
| 4. Diversity filter | Greedily remove spacers within a Hamming distance threshold of a higher-fitness spacer. | Deduplicated spacer set |
| 5. Score & return | Convert each surviving spacer and its evolutionary fitness into a ScoredSpacerCandidate with full quality flags, assay score (using for_optimizer() weights), and tier classification. | SiteOptimResult per site, aggregated into OptimizerOutput |
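Step 1 of the table above can be pictured with a short sketch. It is not the optimizer's actual code: `SiteSketch`, `extract_sites`, and `min_valid` are hypothetical names, and the sketch assumes equal-length aligned sequences, a step of one column, and that gaps and ambiguity codes make a window invalid.

```rust
/// Hypothetical stand-in for GenomicSite, whose real fields are not shown here.
#[derive(Debug, PartialEq)]
pub struct SiteSketch {
    /// 0-based alignment column where the 48 nt window starts.
    pub start: usize,
    /// Fraction of sequences whose window is ACGT-only.
    pub valid_fraction: f64,
}

const FLANK: usize = 10;
const SPACER: usize = 28;
const WINDOW: usize = FLANK + SPACER + FLANK; // 48 nt

/// True if every byte is an unambiguous base (no gaps, no N).
fn is_valid_window(window: &[u8]) -> bool {
    window
        .iter()
        .all(|b| matches!(b.to_ascii_uppercase(), b'A' | b'C' | b'G' | b'T'))
}

/// Slide the window across the alignment and keep positions where at least
/// `min_valid` (e.g. 0.8) of the sequences have a valid window.
pub fn extract_sites(msa: &[&str], min_valid: f64) -> Vec<SiteSketch> {
    let len = match msa.first() {
        Some(s) => s.len(),
        None => return Vec::new(),
    };
    if len < WINDOW {
        return Vec::new();
    }
    let mut sites = Vec::new();
    for start in 0..=(len - WINDOW) {
        let valid = msa
            .iter()
            .filter(|s| s.len() >= start + WINDOW)
            .filter(|s| is_valid_window(&s.as_bytes()[start..start + WINDOW]))
            .count();
        let valid_fraction = valid as f64 / msa.len() as f64;
        if valid_fraction >= min_valid {
            sites.push(SiteSketch { start, valid_fraction });
        }
    }
    sites
}
```

Under these assumptions, the 28 nt spacer region of a kept site spans columns `start + 10` to `start + 38` (exclusive), with 10 nt of flanking context on either side.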
Fitness Functions
Multi-Target Detection
Maximizes expected Cas13a activity across all sequence variants. The fitness of a spacer is the frequency-weighted average of its combined activity against all unique targets:
fitness(spacer) = Σ(freq_t × combined_activity(spacer, target_t))
Where combined_activity = classify_prob × (regression + 4.0) − 4.0. This joint classification-regression score is the ADAPT model's native output format.
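The two formulas above translate almost directly into code. The following is a minimal sketch, assuming the classifier probability and regression output for each unique target have already been obtained from the predictor; `WeightedPrediction` and `multi_target_fitness` are illustrative names, not the optimizer's API.

```rust
/// Model outputs for one unique target, plus that target's frequency.
/// Hypothetical container used only for this sketch.
pub struct WeightedPrediction {
    /// Fraction of aligned sequences carrying this target.
    pub freq: f64,
    /// Classifier probability for the spacer-target pair.
    pub classify_prob: f64,
    /// Regression output for the spacer-target pair.
    pub regression: f64,
}

/// combined_activity = classify_prob * (regression + 4.0) - 4.0
pub fn combined_activity(classify_prob: f64, regression: f64) -> f64 {
    classify_prob * (regression + 4.0) - 4.0
}

/// Frequency-weighted sum of combined activity over all unique targets.
pub fn multi_target_fitness(preds: &[WeightedPrediction]) -> f64 {
    preds
        .iter()
        .map(|p| p.freq * combined_activity(p.classify_prob, p.regression))
        .sum()
}
```

Note that a classifier probability of 0 pins the combined score at −4.0 regardless of the regression output, so a target the classifier rejects outright pulls the weighted sum down in proportion to its frequency.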
After evolution, the optimizer also computes perc_highly_active for each top-k spacer: the frequency-weighted fraction of targets where the spacer is classified as "highly active" (both classification probability and regression score above their respective thresholds).
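As a rough illustration of that statistic, the sketch below counts a target as highly active only when both model outputs clear their thresholds. The threshold values are not given on this page, so they are passed in as the hypothetical parameters `min_prob` and `min_regression`; `TargetCall` is likewise an illustrative type, and dividing by the total frequency is an assumption that guards against frequencies that do not sum to 1.

```rust
/// Model outputs for one unique target (illustrative type for this sketch).
pub struct TargetCall {
    pub freq: f64,
    pub classify_prob: f64,
    pub regression: f64,
}

/// Frequency-weighted fraction of targets classified as highly active.
/// `min_prob` and `min_regression` are caller-supplied thresholds; the
/// optimizer's real threshold values are not stated in this document.
pub fn perc_highly_active(targets: &[TargetCall], min_prob: f64, min_regression: f64) -> f64 {
    let total: f64 = targets.iter().map(|t| t.freq).sum();
    if total <= 0.0 {
        return 0.0;
    }
    let active: f64 = targets
        .iter()
        .filter(|t| t.classify_prob > min_prob && t.regression > min_regression)
        .map(|t| t.freq)
        .sum();
    active / total
}
```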
Variant Identification
Maximizes activity against an on-target partition while minimizing activity against an off-target partition, using sigmoidal cost functions:
t2_cost = c / (1 + a × exp(k × (t2_activity − o)))
t1_cost = c − c / (1 + a × exp(k × (t1_activity − o)))
fitness = −(t2w × t2_cost + t1_cost)
| Hyperparameter | Default | Role |
|---|---|---|
| c | 1.0 | Sigmoid amplitude |
| a | 5.897 | Sigmoid scale factor |
| k | −2.858 | Sigmoid steepness |
| o | −2.511 | Sigmoid midpoint offset |
| t2w | 1.737 | Off-target cost weight |
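The cost functions and defaults above can be sketched as follows. This assumes t1 denotes the on-target partition and t2 the off-target partition, which matches the "Off-target cost weight" role of t2w; `SigmoidParams` and `variant_id_fitness` are illustrative names rather than the optimizer's own types.

```rust
/// Sigmoid hyperparameters with the documented defaults.
#[derive(Debug, Clone, Copy)]
pub struct SigmoidParams {
    pub c: f64,
    pub a: f64,
    pub k: f64,
    pub o: f64,
    pub t2w: f64,
}

impl Default for SigmoidParams {
    fn default() -> Self {
        Self { c: 1.0, a: 5.897, k: -2.858, o: -2.511, t2w: 1.737 }
    }
}

/// Shared sigmoid term: c / (1 + a * exp(k * (activity - o))).
/// With k < 0 it rises from 0 toward c as activity increases.
fn sigmoid(p: &SigmoidParams, activity: f64) -> f64 {
    p.c / (1.0 + p.a * (p.k * (activity - p.o)).exp())
}

/// fitness = -(t2w * t2_cost + t1_cost), where t1 is assumed to be the
/// on-target activity and t2 the off-target activity.
pub fn variant_id_fitness(p: &SigmoidParams, t1_activity: f64, t2_activity: f64) -> f64 {
    let t2_cost = sigmoid(p, t2_activity); // high off-target activity is costly
    let t1_cost = p.c - sigmoid(p, t1_activity); // low on-target activity is costly
    -(p.t2w * t2_cost + t1_cost)
}
```

Because both costs are non-negative, the fitness is never positive: it approaches 0 for a spacer that is strongly active on-target and inactive off-target, and approaches −(t2w + 1) × c in the opposite case.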
Diversity Filter
After evolution, a greedy Hamming distance filter ensures sequence diversity in the output. Spacers are iterated in descending fitness order; each spacer is kept only if its Hamming distance to every previously kept spacer exceeds the minimum distance threshold (default: 3).
Setting the minimum distance to 0 disables filtering entirely. After filtering, results are truncated to top_k_per_site (default: 5).
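The greedy pass described above might look like the sketch below. It assumes equal-length spacers represented as `(sequence, fitness)` pairs and treats a minimum distance of 0 as an explicit bypass; `diversity_filter` and its parameter names are hypothetical.

```rust
/// Number of mismatched positions between two equal-length sequences.
fn hamming(a: &str, b: &str) -> usize {
    a.bytes().zip(b.bytes()).filter(|(x, y)| x != y).count()
}

/// Greedy diversity filter over (sequence, fitness) pairs.
/// A candidate is kept only if it is more than `min_distance` mismatches
/// away from every spacer already kept; `min_distance == 0` keeps everything.
pub fn diversity_filter(
    mut scored: Vec<(String, f64)>,
    min_distance: usize,
    top_k: usize,
) -> Vec<(String, f64)> {
    // Highest fitness first, so a better spacer always wins over a near-duplicate.
    scored.sort_by(|l, r| r.1.total_cmp(&l.1));
    let mut kept: Vec<(String, f64)> = Vec::new();
    for cand in scored {
        let far_enough = min_distance == 0
            || kept.iter().all(|k| hamming(&k.0, &cand.0) > min_distance);
        if far_enough {
            kept.push(cand);
        }
    }
    // Truncate after filtering, mirroring top_k_per_site.
    kept.truncate(top_k);
    kept
}
```

Calling `diversity_filter(pool, 3, 5)` would reproduce the documented defaults. Sorting before filtering is what makes the pass greedy: the outcome depends on fitness order, not on the order in which the population was produced.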
Output
The optimizer produces an OptimizerOutput containing per-site results (SiteOptimResult). Each site result includes:
| Field | Description |
|---|---|
| spacers | Optimized spacers as ScoredSpacerCandidates with full quality flags and tier |
| shannon_entropy | Average Shannon entropy across the spacer region at this site |
| consensus_fitness | Fitness of the consensus seed spacer (baseline for improvement) |
| num_targets / num_valid_seqs | Unique targets and total valid sequences at the site |
| mean_on/off_target_activity | Weighted combined activity against each partition (variant-id only) |
| site_targets | Per-target sequences with frequencies and partition labels |
Cross-site convenience methods include best_spacer(), all_spacers_ranked(), and summary() for aggregated statistics (total spacers, novel count, best fitness, mean improvement over consensus).
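For readers unfamiliar with the shannon_entropy field listed above, the following sketch shows one plausible way to compute an average per-column entropy over the spacer region. The logarithm base (bits), the handling of gaps and ambiguity codes (ignored here), and the function names are all assumptions; the page does not specify them.

```rust
/// Shannon entropy (in bits) of one alignment column, counting only A/C/G/T.
fn column_entropy(column: &[u8]) -> f64 {
    let mut counts = [0usize; 4];
    for b in column {
        match b.to_ascii_uppercase() {
            b'A' => counts[0] += 1,
            b'C' => counts[1] += 1,
            b'G' => counts[2] += 1,
            b'T' => counts[3] += 1,
            _ => {} // gaps and ambiguity codes are ignored (assumption)
        }
    }
    let total: usize = counts.iter().sum();
    if total == 0 {
        return 0.0;
    }
    counts
        .iter()
        .filter(|&&n| n > 0)
        .map(|&n| {
            let p = n as f64 / total as f64;
            -p * p.log2()
        })
        .sum()
}

/// Mean column entropy across the spacer region of the aligned sequences.
pub fn mean_spacer_entropy(spacers: &[&str]) -> f64 {
    let len = spacers.iter().map(|s| s.len()).min().unwrap_or(0);
    if len == 0 {
        return 0.0;
    }
    let total: f64 = (0..len)
        .map(|i| {
            let column: Vec<u8> = spacers.iter().map(|s| s.as_bytes()[i]).collect();
            column_entropy(&column)
        })
        .sum();
    total / len as f64
}
```

Under this definition a perfectly conserved site scores 0, and a site where every column is an even four-way split scores 2 bits.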
Optimizer Weight Preset
Optimized spacers use the for_optimizer() assay score weight preset, which differs from the standard default in two component weights and in the ML activity range:
| Component | Default | Optimizer |
|---|---|---|
| ml_activity | 0.30 | 0.35 |
| heuristic_quality | 0.10 | 0.05 |
| ml_activity_range | (0.0, 4.0) | (2.0, 4.0) |
The narrower ML activity range of (2.0, 4.0) is used because optimizer fitness values (shifted by +4.0) cluster in that band. The default (0.0, 4.0) range would compress their spread, making it hard to differentiate top candidates. See the Assay Score page for the full weight breakdown.
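The compression effect can be seen with a small numeric sketch. It assumes the range is applied as a clamped min-max normalization, which this page does not confirm; `normalize_activity` is a hypothetical helper, not part of the assay score API.

```rust
/// Map a shifted activity value into [0, 1] using a (low, high) range.
/// Clamped min-max scaling is an assumption about how the range is applied.
pub fn normalize_activity(value: f64, range: (f64, f64)) -> f64 {
    let (lo, hi) = range;
    ((value - lo) / (hi - lo)).clamp(0.0, 1.0)
}

/// Difference in normalized score between two candidates under a given range.
pub fn normalized_gap(a: f64, b: f64, range: (f64, f64)) -> f64 {
    (normalize_activity(a, range) - normalize_activity(b, range)).abs()
}
```

With these assumptions, two candidates with shifted activities of 3.0 and 3.5 are 0.125 apart under the (0.0, 4.0) range but 0.25 apart under (2.0, 4.0), so the narrower range doubles the separation between them.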