MSA Guide Design
Automated pan-variant guide RNA design from Multiple Sequence Alignments, producing guides ranked by cross-strain coverage and activity.
Overview
MSA guide design is SPACER's end-to-end workflow for designing CRISPR guides that detect all known variants of a target. Instead of scoring individual guides one at a time, you provide a set of variant sequences (as an MSA or unaligned FASTA), and SPACER automatically finds candidate guides from the reference sequence, scores each against every variant, and returns guides ranked by their variant coverage.
The workflow consists of three stages:
- Find spacers in the reference (first) sequence of the MSA.
- Score each spacer against all variant sequences using the ML activity model, producing per-variant activity predictions.
- Rank by coverage: select guides that detect the most variants above the activity threshold, using the configured ranking strategy.
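The three stages can be sketched in a few lines of Python. This is an illustration only, not SPACER's implementation: `find_spacers`, `toy_activity`, and `design_guides` are hypothetical names, the spacer length is a free parameter, and `toy_activity` (fraction of matching positions) stands in for the ML activity model. The sketch also assumes a pre-aligned MSA, compares sequences in the same sense rather than modelling guide-target complementarity, and scores each spacer against every sequence after the reference.

```python
from typing import Callable


def find_spacers(reference: str, spacer_len: int) -> list[tuple[int, str]]:
    """Stage 1: enumerate gap-free windows of the reference sequence."""
    spacers = []
    for start in range(len(reference) - spacer_len + 1):
        window = reference[start:start + spacer_len]
        if "-" not in window:
            spacers.append((start, window))
    return spacers


def toy_activity(spacer: str, site: str) -> float:
    """Placeholder for the ML activity model: fraction of matching positions."""
    matches = sum(a == b for a, b in zip(spacer, site))
    return matches / len(spacer)


def design_guides(
    msa: list[str],
    spacer_len: int,
    activity_threshold: float,
    score: Callable[[str, str], float] = toy_activity,
) -> list[dict]:
    reference, variants = msa[0], msa[1:]
    guides = []
    for start, spacer in find_spacers(reference, spacer_len):
        # Stage 2: score the spacer against the same columns of each variant.
        scores = [score(spacer, v[start:start + spacer_len]) for v in variants]
        covered = sum(s >= activity_threshold for s in scores)
        guides.append({
            "start": start,
            "spacer": spacer,
            "coverage_fraction": covered / len(scores),
            "mean_activity": sum(scores) / len(scores),
        })
    # Stage 3: rank by coverage, breaking ties on mean activity (assumed order).
    guides.sort(key=lambda g: (g["coverage_fraction"], g["mean_activity"]),
                reverse=True)
    return guides
```

Calling `design_guides(["ACGTACGT", "ACGTACGA", "ACCTACGT"], spacer_len=4, activity_threshold=1.0)` returns five candidates, with the window starting at column 3 ranked first because it is the only one that matches both variants exactly.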
Input Format
The input is a FASTA file containing at least 2 sequences. SPACER supports two input modes:
| Mode | Detection | Behavior |
|---|---|---|
| Pre-aligned MSA | All sequences have equal length | Used directly; gap characters ('-') preserved |
| Unaligned sequences | Sequences differ in length | Auto-aligned with MAFFT before analysis |
The first sequence in the FASTA is treated as the reference. Candidate spacers are identified in this reference sequence, then scored against every other sequence in the alignment.
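The detection rule in the table can be expressed as a small check on sequence lengths. The following is a minimal sketch with hypothetical helper names (`parse_fasta`, `detect_input_mode`); the returned mode labels are illustrative, and the alignment step for unaligned input is not shown.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs, preserving order."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def detect_input_mode(records: list[tuple[str, str]]) -> str:
    """Equal lengths -> treat as a pre-aligned MSA; otherwise unaligned."""
    if len(records) < 2:
        raise ValueError("input must contain at least 2 sequences")
    lengths = {len(seq) for _, seq in records}
    return "pre_aligned" if len(lengths) == 1 else "unaligned"


aligned = ">ref\nACGT-ACGT\n>v1\nACGTTACGT\n"
records = parse_fasta(aligned)
reference_id, reference_seq = records[0]  # first record is the reference
mode = detect_input_mode(records)
```

Note that equal length is only a heuristic: unaligned sequences that happen to share a length would be classified as pre-aligned under this rule.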
Configuration
MSA guide design uses the same configuration as multi-target scoring, plus a site extraction parameter that controls which alignment columns are considered:
| Parameter | Default | Range | Description |
|---|---|---|---|
| activity_threshold | 0.0 (shifted) | [0, 4+] | Minimum activity for a variant to count as covered |
| min_coverage_fraction | 0.95 | [0.0, 1.0] | Minimum fraction of variants that must be covered |
| gap_handling | skip_gapped | — | Strategy for variants with gaps in the target region |
| max_gap_fraction | 0.0 | [0.0, 1.0] | Maximum gap ratio before a variant is skipped (0.0 = ADAPT compatibility) |
| ranking_strategy | coverage_first | — | Guide ranking: coverage_first, maximize_minimum, maximize_mean |
| signal_ratio_cutoff | None | [0.0, 1.0] | Optional signal-to-noise filter for coverage |
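For illustration, the parameters above can be collected into a validated configuration object. The class name `MsaDesignConfig` and the validation logic are hypothetical and are not SPACER's API; only the parameter names, defaults, ranges, and strategy names are taken from this page.

```python
from dataclasses import dataclass
from typing import Optional

GAP_STRATEGIES = {"skip_gapped", "include_in_denominator", "fill_with_n"}
RANKING_STRATEGIES = {"coverage_first", "maximize_minimum", "maximize_mean"}


@dataclass(frozen=True)
class MsaDesignConfig:
    """Hypothetical container mirroring the documented defaults."""
    activity_threshold: float = 0.0
    min_coverage_fraction: float = 0.95
    gap_handling: str = "skip_gapped"
    max_gap_fraction: float = 0.0
    ranking_strategy: str = "coverage_first"
    signal_ratio_cutoff: Optional[float] = None
    min_valid_fraction: float = 0.80  # site extraction parameter

    def __post_init__(self) -> None:
        if self.activity_threshold < 0:
            raise ValueError("activity_threshold must be >= 0")
        for name in ("min_coverage_fraction", "max_gap_fraction",
                     "min_valid_fraction"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0.0, 1.0]")
        cutoff = self.signal_ratio_cutoff
        if cutoff is not None and not 0.0 <= cutoff <= 1.0:
            raise ValueError("signal_ratio_cutoff must be in [0.0, 1.0]")
        if self.gap_handling not in GAP_STRATEGIES:
            raise ValueError(f"unknown gap_handling: {self.gap_handling}")
        if self.ranking_strategy not in RANKING_STRATEGIES:
            raise ValueError(f"unknown ranking_strategy: {self.ranking_strategy}")


default_config = MsaDesignConfig()
strict_config = MsaDesignConfig(min_coverage_fraction=1.0,
                                ranking_strategy="maximize_minimum")
```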
Conservation Threshold (min_valid_fraction)
When extracting candidate sites from an MSA, SPACER filters alignment columns by the fraction of sequences that have valid (non-gap) nucleotides at each position. The min_valid_fraction parameter controls this filter:
| Property | Value |
|---|---|
| Parameter | min_valid_fraction |
| Default | 0.80 (80%) |
| Range | 0.0–1.0 |
| Effect | Alignment columns where fewer than this fraction of sequences have valid nucleotides are excluded from site extraction |
A value of 0.80 means a site must have valid nucleotides in at least 80% of the input sequences to be considered. Increasing this value produces more conservative results by focusing on highly conserved regions; decreasing it allows sites with more variation to be evaluated.
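The column filter can be sketched as follows. The function names are hypothetical, and two details are assumptions rather than documented behavior: valid nucleotides are taken to be A, C, G, T, and U, and a candidate site is kept only if every column it spans passes the filter.

```python
VALID_NUCLEOTIDES = set("ACGTU")  # assumption: anything else counts as invalid


def valid_fraction_per_column(msa: list[str]) -> list[float]:
    """Fraction of sequences with a valid nucleotide at each alignment column."""
    n = len(msa)
    return [
        sum(seq[col].upper() in VALID_NUCLEOTIDES for seq in msa) / n
        for col in range(len(msa[0]))
    ]


def candidate_site_starts(
    msa: list[str], site_len: int, min_valid_fraction: float = 0.80
) -> list[int]:
    """Start columns of sites whose columns all meet min_valid_fraction."""
    keep = [f >= min_valid_fraction for f in valid_fraction_per_column(msa)]
    return [
        start
        for start in range(len(keep) - site_len + 1)
        if all(keep[start:start + site_len])
    ]


msa = ["ACGTAC", "AC-TAC", "A--TAC", "ACGTA-", "ACGTAC"]
default_sites = candidate_site_starts(msa, site_len=2)
relaxed_sites = candidate_site_starts(msa, site_len=2, min_valid_fraction=0.5)
```

In this toy alignment the third column has valid nucleotides in only 3 of 5 sequences (0.6), so sites overlapping it are dropped at the default threshold and return once the threshold is relaxed to 0.5.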
Lower min_valid_fraction to 0.5 to explore less-conserved regions. For stable targets, the default 0.80 is appropriate.
Output
Each guide in the output includes:
| Field | Description |
|---|---|
| coverage_fraction | Fraction of scorable variants above the activity threshold |
| strains_covered / strains_total | Absolute count of covered vs. total variants |
| mean_activity | Mean predicted activity across all scored variants |
| median_activity | Median activity (robust central tendency) |
| min_activity | Worst-case variant activity |
| max_activity | Best-case variant activity |
| std_activity | Standard deviation of activity scores |
| percentile_5 / percentile_95 | Robust worst-case and best-case bounds |
| low_activity_strains | IDs of variants that fell below the activity threshold |
| low_signal_variants | Count of variants above threshold but below signal ratio cutoff |
| variant_scores | Per-variant detail: activity, mismatch/gap counts, signal class |
Guides are returned sorted by their ranking score (descending). A meets_coverage flag indicates whether the guide satisfies the configured min_coverage_fraction.
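As a rough sketch of how these fields relate, the summary statistics for one guide can be derived from its per-variant scores, and guides can then be sorted. This is not SPACER's code. The assumptions are: a variant counts as covered when its score is at or above the threshold, the standard deviation is the population form, and each ranking strategy maps to the sort key shown (the exact ranking score is not specified here). Percentile and signal-ratio fields are omitted.

```python
import statistics


def summarize_guide(
    scores: dict[str, float],
    activity_threshold: float = 0.0,
    min_coverage_fraction: float = 0.95,
) -> dict:
    """Derive coverage and activity statistics from per-variant scores."""
    values = list(scores.values())
    low = [vid for vid, s in scores.items() if s < activity_threshold]
    covered = len(values) - len(low)
    coverage = covered / len(values)
    return {
        "coverage_fraction": coverage,
        "strains_covered": covered,
        "strains_total": len(values),
        "mean_activity": statistics.fmean(values),
        "median_activity": statistics.median(values),
        "min_activity": min(values),
        "max_activity": max(values),
        "std_activity": statistics.pstdev(values),  # population form (assumed)
        "low_activity_strains": low,
        "meets_coverage": coverage >= min_coverage_fraction,
    }


# Assumed interpretation of each ranking strategy as a sort key.
SORT_KEYS = {
    "coverage_first": lambda g: (g["coverage_fraction"], g["mean_activity"]),
    "maximize_minimum": lambda g: g["min_activity"],
    "maximize_mean": lambda g: g["mean_activity"],
}


def rank_guides(guides: list[dict], strategy: str = "coverage_first") -> list[dict]:
    """Return guides sorted best-first under the chosen strategy."""
    return sorted(guides, key=SORT_KEYS[strategy], reverse=True)


guide_a = summarize_guide({"v1": 1.5, "v2": 2.0, "v3": 3.0, "v4": 0.5},
                          activity_threshold=1.0)
guide_b = summarize_guide({"v1": 1.2, "v2": 1.2, "v3": 1.2, "v4": 1.2},
                          activity_threshold=1.0)
```

Here `guide_a` has the higher mean activity but misses one variant, while `guide_b` covers all four at lower activity, so the two guides swap places between `coverage_first` and `maximize_mean`.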
Coverage as an Assay Score Component
When MSA data is provided, the coverage fraction feeds directly into the composite assay score as the coverage component, with a default weight of 0.25. Variant coverage therefore accounts for 25% of the final guide ranking in the default weight preset, the second-highest weight after ML activity (0.30).
See Coverage & Specificity for details on how coverage integrates with the assay score, including weight rebalancing when specificity components are activated.
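To give a sense of scale, the sketch below estimates how much a difference in coverage moves the composite score. It assumes the composite is a plain weighted sum of components normalized to [0, 1], which this page does not confirm, and it uses only the default coverage weight stated above; the function name is hypothetical.

```python
COVERAGE_WEIGHT = 0.25  # default coverage weight stated above


def coverage_score_delta(coverage_a: float, coverage_b: float,
                         weight: float = COVERAGE_WEIGHT) -> float:
    """Composite-score gap between two guides that differ only in coverage,
    assuming a weighted-sum composite (an assumption, not documented)."""
    return weight * (coverage_a - coverage_b)


# A guide covering 100% of variants vs. one covering 80%, all else equal.
delta = coverage_score_delta(1.0, 0.80)
```

Under these assumptions, a 20-point difference in coverage shifts the composite score by about 0.05.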
Gap Handling Strategies
When extracting spacer regions from an MSA, some variants may have gaps (insertions or deletions) in the target region. The gap_handling parameter controls how these are treated:
| Strategy | Behavior | Coverage Effect |
|---|---|---|
| skip_gapped (default) | Skip variants with any gaps in the target region | Excluded from both numerator and denominator |
| include_in_denominator | Gapped variants are not scored but still count in the denominator | Each gapped variant lowers the coverage fraction |
| fill_with_n | Replace gaps with N and score anyway | All variants scored; gaps may reduce activity |
The default skip_gapped with max_gap_fraction = 0.0 matches the behavior of the original ADAPT Python implementation, which excludes any sequence with gaps in the target+context region from scoring.
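The effect of each strategy on the coverage fraction can be sketched as follows. This is an illustrative reading of the table, not SPACER's implementation: `coverage_with_gaps` is a hypothetical function, the `score` callable is a stand-in for the activity model, a variant is treated as gapped when its gap fraction in the target region exceeds max_gap_fraction, and any gaps remaining in a scored site are replaced with N.

```python
from typing import Callable


def coverage_with_gaps(
    sites: dict[str, str],
    score: Callable[[str], float],
    activity_threshold: float,
    gap_handling: str = "skip_gapped",
    max_gap_fraction: float = 0.0,
) -> float:
    """Coverage fraction over variant target sites under a gap strategy."""
    covered = 0
    denominator = 0
    for site in sites.values():
        gapped = site.count("-") / len(site) > max_gap_fraction
        if gapped and gap_handling == "skip_gapped":
            continue  # excluded from numerator and denominator
        if gapped and gap_handling == "include_in_denominator":
            denominator += 1  # counted, never scored
            continue
        # fill_with_n, or a site within the allowed gap fraction
        denominator += 1
        if score(site.replace("-", "N")) >= activity_threshold:
            covered += 1
    return covered / denominator if denominator else 0.0


def match_fraction(site: str, spacer: str = "ACGT") -> float:
    """Toy activity: fraction of positions matching a fixed spacer."""
    return sum(a == b for a, b in zip(spacer, site)) / len(spacer)


sites = {"v1": "ACGT", "v2": "ACGA", "v3": "AC-T", "v4": "ACGT"}
skip = coverage_with_gaps(sites, match_fraction, 1.0, "skip_gapped")
include = coverage_with_gaps(sites, match_fraction, 1.0, "include_in_denominator")
fill = coverage_with_gaps(sites, match_fraction, 0.75, "fill_with_n")
```

With a threshold of 1.0, the gapped variant `v3` leaves coverage at 2 of 3 under skip_gapped but lowers it to 2 of 4 under include_in_denominator. Under fill_with_n it is scored as `ACNT`, so whether it counts as covered depends on the threshold.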