Composite Scoring
How SPACER calculates the overall guide quality score from additive scoring components.
Overview
Every candidate guide RNA receives a composite score between 0 and 100. This score is computed as an additive sum of individual component adjustments applied to a fixed base score. The composite score determines the guide's tier classification and its rank relative to other candidates.
Scoring Formula
The composite score is the base score plus a signed point adjustment from each component, clamped to the valid range:
    composite = base_score
              + gc_adjustment
              + ml_adjustment
              - homopolymer_penalty
              + pfs_adjustment
              + structure_adjustment
Result is clamped to [0, 100].
Each component contributes a fixed number of points (positive or negative) to the final score. There is no weighting or normalization: the ranges below are the actual point values applied. When optional components (AI activity, RNA structure) are not enabled, their adjustments are simply 0.
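The formula above can be sketched in a few lines of Python. This is a minimal illustration, not SPACER's actual implementation: the function name and keyword arguments are hypothetical and simply mirror the terms in the formula, the default base of 40 comes from the components table, and the homopolymer penalty is assumed to be passed as a non-negative magnitude because the formula subtracts it.

```python
def composite_score(
    gc_adjustment: float = 0,
    ml_adjustment: float = 0,
    homopolymer_penalty: float = 0,
    pfs_adjustment: float = 0,
    structure_adjustment: float = 0,
    base_score: float = 40,
) -> float:
    """Add the component adjustments to the base score, then clamp to [0, 100].

    Hypothetical helper: disabled components keep their default of 0,
    matching the statement that their adjustments are simply 0.
    """
    raw = (
        base_score
        + gc_adjustment
        + ml_adjustment
        - homopolymer_penalty  # assumed to be a non-negative magnitude
        + pfs_adjustment
        + structure_adjustment
    )
    # Clamp so that stacked bonuses or penalties never leave the 0-100 scale.
    return max(0, min(100, raw))
```

For instance, `composite_score(gc_adjustment=10, ml_adjustment=40, pfs_adjustment=5, structure_adjustment=10)` sums to 105 before clamping and is returned as 100.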
Scoring Components
The composite score is built from a base score plus adjustments from each component:
| Component | Point Range | When Applied |
|---|---|---|
| Base score | 40 | Always (starting point for all candidates) |
| GC content | -10 to +10 | Always — graduated piecewise interpolation |
| AI activity | -50 to +40 | Only when AI prediction is enabled (EasyDesign for Cas12, ADAPT for Cas13) |
| Homopolymer | 0 to -10 | When longest run exceeds 3 consecutive identical bases |
| PFS | -5 to +5 | Cas13 only — protospacer flanking sequence preference |
| RNA structure | -10 to +10 | Only when structure prediction is enabled (ViennaRNA) |
has_poly_t flag separately. Some synthesis methods might tolerate poly-T without issue.
Component Details
Base Score (40 points)
All candidates start at 40 points. This provides a baseline that gives AI predictions room to differentiate guides (up to +40 headroom) while negative penalties can push poor candidates toward 0.
GC Content Adjustment (-10 to +10)
The GC adjustment uses graduated piecewise linear interpolation rather than a binary optimal/non-optimal check. The result is a continuous adjustment that ramps smoothly between penalty and bonus:
| GC Range | Adjustment | Behavior |
|---|---|---|
| 0–20% | -10 | Flat penalty (extreme AT bias) |
| 20–40% | -10 to +10 | Linear ramp toward optimal |
| 40–60% | +10 | Full bonus (optimal range) |
| 60–80% | +10 to -10 | Linear ramp away from optimal |
| 80–100% | -10 | Flat penalty (extreme GC bias) |
For example, a spacer with 30% GC content receives an adjustment of 0 (the midpoint of the ramp), while 50% GC receives the full +10 bonus.
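The piecewise table can be written as a short function. The sketch below is an assumption-laden illustration: the names `gc_percent` and `gc_adjustment` are hypothetical, and it returns an unrounded float because the text does not say how SPACER rounds values that fall on the ramps.

```python
def gc_percent(spacer: str) -> float:
    """Percentage of G and C bases in a spacer (hypothetical helper)."""
    if not spacer:
        raise ValueError("spacer must not be empty")
    bases = spacer.upper()
    return 100.0 * sum(base in "GC" for base in bases) / len(bases)


def gc_adjustment(gc: float) -> float:
    """Map GC content (0-100 %) to a point adjustment in [-10, +10]."""
    if not 0.0 <= gc <= 100.0:
        raise ValueError("GC content must be between 0 and 100")
    if gc <= 20.0 or gc >= 80.0:
        return -10.0                # flat penalty at either extreme
    if gc < 40.0:
        return -10.0 + (gc - 20.0)  # ramp up: 20 points over 20 percentage points
    if gc <= 60.0:
        return 10.0                 # full bonus in the optimal range
    return 10.0 - (gc - 60.0)       # ramp down, mirroring the ramp up
```

Because each ramp covers 20 points over 20 percentage points, the slope is exactly one point per percentage point, which is why 30% and 70% GC both land on 0.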
AI Activity Adjustment (-50 to +40)
When AI prediction is enabled, a piecewise formula maps the predicted activity score to a point adjustment:
| Activity Score | Adjustment | Interpretation |
|---|---|---|
| 0.0 (inactive) | -50 | Classifier rejected — effectively disqualifies |
| 0.5 | +5 | Weak activity |
| 1.0 | +10 | Low-moderate activity |
| 2.0 | +20 | Moderate activity (typical EasyDesign range) |
| 3.0 | +30 | Good activity (typical ADAPT range) |
| 4.0+ | +40 (max) | Strong activity (bonus capped) |
The formula for active guides is round(activity × 10), clamped to +40. Inactive guides (activity = 0.0) receive a flat -50 penalty since the classifier determined they are unlikely to have any on-target effect.
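As a rough sketch, the mapping described above fits in one small function. The name `ai_activity_adjustment` is hypothetical, and treating any non-positive value as inactive is an assumption (the text only defines inactivity as exactly 0.0).

```python
def ai_activity_adjustment(activity: float) -> int:
    """Convert a predicted activity score into a point adjustment.

    Assumption: values at or below 0.0 are treated as inactive.
    """
    if activity <= 0.0:
        return -50  # flat penalty for guides the classifier rejected
    # round(activity x 10), with the bonus capped at +40.
    return min(round(activity * 10), 40)
```

Note that Python's `round` resolves exact halves to the nearest even integer (2.5 becomes 2); whether SPACER breaks ties the same way is not stated here.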
When AI prediction is enabled, the AI activity score and the composite score serve different purposes and should be interpreted as complementary evaluation dimensions:
- AI activity score (0.0–4.0+) is the model's direct prediction of on-target cleavage efficacy. If your primary goal is maximizing predicted activity, sort and filter by this value. The guide with the highest AI activity score is the one the model predicts will perform best at its target site.
- Composite score (0–100) is a holistic quality metric that incorporates AI activity alongside sequence-composition factors (GC content, homopolymers, PFS, RNA structure). It reflects overall guide quality, not just predicted activity.
Critically, the AI models do not evaluate GC content, homopolymer runs, poly-T synthesis issues, PFS preferences, or RNA secondary structure — those are captured only by the composite score. A guide with the highest AI activity may still have suboptimal GC content or problematic homopolymer runs. Use the composite score as a second-pass quality filter to identify guides that are both highly active and have favorable sequence properties.
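One way to combine the two metrics is to rank by AI activity first and then keep only guides whose composite score clears a quality threshold. The sketch below assumes a hypothetical `Guide` record and a default cutoff of 60 (the lower bound of the Good tier); neither is part of SPACER's documented output format, and the example values are invented purely for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Guide:
    """Hypothetical record holding the two scores for one candidate."""
    name: str
    ai_activity: float
    composite: float


def shortlist(guides: List[Guide], min_composite: float = 60, top_n: int = 3) -> List[Guide]:
    """Rank by predicted activity, then drop guides below the composite cutoff."""
    by_activity = sorted(guides, key=lambda g: g.ai_activity, reverse=True)
    return [g for g in by_activity if g.composite >= min_composite][:top_n]


# Invented values: g1 has the highest activity but a weak composite score.
candidates = [
    Guide("g1", ai_activity=3.8, composite=58),
    Guide("g2", ai_activity=3.1, composite=81),
    Guide("g3", ai_activity=2.4, composite=74),
    Guide("g4", ai_activity=0.0, composite=0),
]
picked = [g.name for g in shortlist(candidates, top_n=2)]
```

Here `picked` contains g2 and g3: g1 is the most active candidate but is filtered out by its composite score, which is the situation the paragraph above warns about.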
Homopolymer Penalty (0 to -10)
Applied when the longest run of consecutive identical nucleotides exceeds 3. The penalty scales with run length:
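    penalty = min((run_length - 3) × 2.5, 10)
The table lists the resulting deduction. Its values match the formula with the fractional part dropped (2.5 is shown as -2, 7.5 as -7).
| Run Length | Penalty | Example |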
|---|---|---|
| 1–3 | 0 | AAA (no penalty) |
| 4 | -2 | AAAA |
| 5 | -5 | AAAAA |
| 6 | -7 | AAAAAA |
| 7+ | -10 (max) | AAAAAAA or longer |
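The run detection and penalty formula can be sketched as follows. Both function names are hypothetical. The penalty is returned as the unrounded magnitude given by the formula; truncating it with `int()` reproduces the 2 and 7 shown in the table, but whether SPACER truncates in exactly this way is an assumption.

```python
from itertools import groupby


def longest_homopolymer_run(spacer: str) -> int:
    """Length of the longest run of identical consecutive bases."""
    runs = (sum(1 for _ in group) for _, group in groupby(spacer.upper()))
    return max(runs, default=0)  # an empty sequence has no runs


def homopolymer_penalty(run_length: int) -> float:
    """Penalty magnitude: 2.5 points per base beyond a run of 3, capped at 10."""
    return min(max(run_length - 3, 0) * 2.5, 10.0)
```

For example, `longest_homopolymer_run("ACGTTTTTGCA")` finds the five-base T run, and `homopolymer_penalty(5)` gives 5.0, which is subtracted from the composite score.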
PFS Adjustment (Cas13 only, ±5)
For Cas13 enzymes, the Protospacer Flanking Sequence (PFS) at the 3' end of the target is evaluated:
- +5: Favorable PFS (e.g., non-G at 3' for LwaCas13a)
- -5: Unfavorable PFS
- 0: PFS not evaluated (Cas12, or Cas13 without flanking information)
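The three cases above could be expressed as the sketch below. It is deliberately simplified: the function name and arguments are hypothetical, and it applies the LwaCas13a example rule (non-G is favorable) to every Cas13 enzyme, which is an assumption since other orthologs may have different preferences.

```python
from typing import Optional


def pfs_adjustment(enzyme_family: str, pfs_base: Optional[str]) -> int:
    """Return +5, -5, or 0 for the base flanking the 3' end of the target.

    Assumption: the LwaCas13a-style rule (non-G favorable) is used for all
    Cas13 enzymes; real preferences may vary by ortholog.
    """
    if enzyme_family != "Cas13" or not pfs_base:
        return 0  # Cas12, or no flanking information available
    return 5 if pfs_base.upper() != "G" else -5
```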
RNA Structure Adjustment (±10)
When RNA secondary structure prediction is enabled (via ViennaRNA), the predicted folding of the target site adjusts the score based on MFE (minimum free energy) and seed region accessibility:
- +10: Minimal structure, high seed accessibility (favorable)
- 0: Average structure (neutral)
- -10: Strong structure, low seed accessibility (unfavorable)
Interpreting Scores
The composite score maps directly to the tier classification system:
| Score Range | Tier | Recommendation |
|---|---|---|
| 80–100 | Excellent | Strong candidates for experimental validation |
| 60–79 | Good | Viable candidates, likely to perform well |
| 40–59 | Fair | Usable but may have one or more weaknesses |
| 0–39 | Poor | Not recommended without additional validation |
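The tier table translates into a simple threshold lookup. The function name `tier_for_score` is hypothetical; the boundaries are taken directly from the table.

```python
def tier_for_score(score: float) -> str:
    """Map a composite score in [0, 100] to its tier label."""
    if not 0 <= score <= 100:
        raise ValueError("composite scores are clamped to [0, 100]")
    if score >= 80:
        return "Excellent"
    if score >= 60:
        return "Good"
    if score >= 40:
        return "Fair"
    return "Poor"
```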
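Without AI prediction enabled, the heuristic-only score spans 5–65: the maximum is 40 base + 10 GC + 5 PFS + 10 structure = 65, and the minimum is 40 base - 10 GC - 10 homopolymer - 5 PFS - 10 structure = 5. Heuristic-only guides therefore cannot reach the Excellent tier. AI activity predictions extend the range to the full 0–100, allowing excellent candidates to reach 90+ while inactive guides (-50) score 15 at most.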
Score Examples
| Scenario | Calculation | Score | Tier |
|---|---|---|---|
| Optimal heuristic-only | 40 base + 10 GC (50%) = 50 | 50 | Fair |
| Poor GC, long homopolymer | 40 base - 10 GC (15%) - 5 homopolymer (5-run) = 25 | 25 | Poor |
| Optimal + high AI activity | 40 base + 10 GC (50%) + 36 AI (3.6) = 86 | 86 | Excellent |
| Optimal GC, inactive AI | 40 base + 10 GC (50%) - 50 AI (0.0) = 0 | 0 | Poor |
| Full stack (Cas13) | 40 + 10 GC + 30 AI (3.0) + 5 PFS + 5 structure = 90 | 90 | Excellent |
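The rows above can be checked by summing the signed contributions and clamping the result. In this sketch the helper name and the dictionary layout are illustrative only; the point values are copied from the table.

```python
from typing import Iterable


def clamp_score(points: Iterable[float]) -> float:
    """Sum signed point contributions and clamp the total to [0, 100]."""
    return max(0, min(100, sum(points)))


# Signed contributions for each scenario, base score first.
EXAMPLES = {
    "Optimal heuristic-only": [40, +10],
    "Poor GC, long homopolymer": [40, -10, -5],
    "Optimal + high AI activity": [40, +10, +36],
    "Optimal GC, inactive AI": [40, +10, -50],
    "Full stack (Cas13)": [40, +10, +30, +5, +5],
}

for scenario, points in EXAMPLES.items():
    print(f"{scenario}: {clamp_score(points)}")
```

Running it prints 50, 25, 86, 0, and 90 for the five scenarios, matching the Score column.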