Composite Scoring

How SPACER calculates the overall guide quality score from additive scoring components.

Overview

Every candidate guide RNA receives a composite score between 0 and 100. This score is computed as an additive sum of individual component adjustments applied to a fixed base score. The composite score determines the guide's tier classification and its rank relative to other candidates.

Scoring Formula

The composite score is computed as a clamped additive sum of fixed-point component adjustments:

    composite = base_score
               + gc_adjustment
               + ml_adjustment
               - homopolymer_penalty
               + pfs_adjustment
               + structure_adjustment

Result is clamped to [0, 100].

Each component contributes a fixed number of points (positive or negative) to the final score. There is no weighting or normalization — the ranges below are the actual point values applied. When optional components (AI activity, RNA structure) are not enabled, their adjustments are simply 0.
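The formula above can be sketched in a few lines of Python. This is a minimal illustration, not SPACER's actual implementation: the function name and keyword arguments are hypothetical, chosen to mirror the terms in the formula, and `homopolymer_penalty` is assumed to be passed as a non-negative magnitude because the formula subtracts it.

```python
BASE_SCORE = 40  # fixed starting point for every candidate


def composite_score(gc_adjustment=0, ml_adjustment=0, homopolymer_penalty=0,
                    pfs_adjustment=0, structure_adjustment=0):
    """Add the component adjustments to the base score and clamp to [0, 100].

    Disabled optional components (AI activity, RNA structure) keep their
    default of 0, so they leave the score unchanged.
    """
    raw = (BASE_SCORE
           + gc_adjustment
           + ml_adjustment
           - homopolymer_penalty   # magnitude, subtracted as in the formula
           + pfs_adjustment
           + structure_adjustment)
    return max(0, min(100, raw))


# Heuristic-only guide with optimal GC content
print(composite_score(gc_adjustment=10))
# Same guide, rejected by the AI classifier
print(composite_score(gc_adjustment=10, ml_adjustment=-50))
```

Clamping matters only at the extremes: with every bonus at its maximum the raw sum is 105, which is reported as 100.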

Scoring Components

The composite score is built from a base score plus adjustments from each component:

| Component | Point Range | When Applied |
|---|---|---|
| Base score | 40 | Always (starting point for all candidates) |
| GC content | -10 to +10 | Always; graduated piecewise interpolation |
| AI activity | -50 to +40 | Only when AI prediction is enabled (EasyDesign for Cas12, ADAPT for Cas13) |
| Homopolymer | 0 to -10 | When longest run exceeds 3 consecutive identical bases |
| PFS | -5 to +5 | Cas13 only; protospacer flanking sequence preference |
| RNA structure | -10 to +10 | Only when structure prediction is enabled (ViennaRNA) |

Poly-T does NOT affect the score
Poly-T/U stretches (4+ consecutive T or U) are detected and flagged in quality flags, but they are intentionally excluded from the composite score. Poly-T affects crRNA synthesis (premature transcription termination), not CRISPR activity. Users concerned about synthesis can filter by the has_poly_t flag separately. Some synthesis methods might tolerate poly-T without issue.
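As a rough illustration of the flag described above, a run of four or more T or U bases can be detected with a regular expression. The function below is a hypothetical stand-in named after the `has_poly_t` flag; how SPACER computes the flag internally is not documented here. One simplifying assumption: a run that mixes T and U is also counted.

```python
import re

# Four or more consecutive T or U bases, in either case
POLY_T_PATTERN = re.compile(r"[TU]{4,}", re.IGNORECASE)


def has_poly_t(sequence: str) -> bool:
    """Return True if the sequence contains a run of 4+ T/U bases."""
    return POLY_T_PATTERN.search(sequence) is not None


print(has_poly_t("ACGTTTTACG"))  # run of four T
print(has_poly_t("ACGTTTACG"))   # run of three T only
```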

Component Details

Base Score (40 points)

All candidates start at 40 points. This provides a baseline that gives AI predictions room to differentiate guides (up to +40 headroom) while negative penalties can push poor candidates toward 0.

GC Content Adjustment (-10 to +10)

Uses graduated piecewise linear interpolation rather than a binary optimal/non-optimal check. This produces a continuous adjustment that smoothly ramps between penalty and bonus:

| GC Range | Adjustment | Behavior |
|---|---|---|
| 0–20% | -10 | Flat penalty (extreme AT bias) |
| 20–40% | -10 to +10 | Linear ramp toward optimal |
| 40–60% | +10 | Full bonus (optimal range) |
| 60–80% | +10 to -10 | Linear ramp away from optimal |
| 80–100% | -10 | Flat penalty (extreme GC bias) |

For example, a spacer with 30% GC content receives an adjustment of 0 (the midpoint of the ramp), while 50% GC receives the full +10 bonus.
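The piecewise ramp in the table can be written out as follows. This is a sketch under stated assumptions: the function names are hypothetical, GC content is taken as a percentage, and the result is left unrounded because the text does not say whether intermediate values on the ramp are rounded to whole points.

```python
def gc_percent(spacer: str) -> float:
    """Percentage of G and C bases in the spacer (0.0 for an empty string)."""
    seq = spacer.upper()
    if not seq:
        return 0.0
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def gc_adjustment(gc: float) -> float:
    """Map GC percentage to a point adjustment between -10 and +10."""
    if gc <= 20 or gc >= 80:
        return -10.0               # flat penalty at the extremes
    if gc < 40:
        return -10.0 + (gc - 20)   # ramp up: 20 points over 20 percentage points
    if gc <= 60:
        return 10.0                # full bonus in the optimal range
    return 10.0 - (gc - 60)        # ramp down: 20 points over 20 percentage points


for value in (15, 30, 50, 70, 85):
    print(value, gc_adjustment(value))
```

Each ramp spans 20 points over 20 percentage points, so the slope is exactly one point per percent of GC content.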

AI Activity Adjustment (-50 to +40)

When AI prediction is enabled, a piecewise formula maps the predicted activity score to a point adjustment:

| Activity Score | Adjustment | Interpretation |
|---|---|---|
| 0.0 (inactive) | -50 | Classifier rejected; effectively disqualifies |
| 0.5 | +5 | Weak activity |
| 1.0 | +10 | Low-moderate activity |
| 2.0 | +20 | Moderate activity (typical EasyDesign range) |
| 3.0 | +30 | Good activity (typical ADAPT range) |
| 4.0+ | +40 (max) | Strong activity (bonus capped) |

The formula for active guides is round(activity × 10), clamped to +40. Inactive guides (activity = 0.0) receive a flat -50 penalty since the classifier determined they are unlikely to have any on-target effect.
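A minimal sketch of this mapping is shown below. The function name is hypothetical, and two details are assumptions the text does not settle: values at or below 0.0 are treated as inactive, and ties are resolved by Python's built-in `round`, which rounds halves to the nearest even integer.

```python
def ai_activity_adjustment(activity: float) -> int:
    """Map a predicted activity score to a point adjustment (-50 to +40)."""
    if activity <= 0.0:
        return -50                       # classifier rejected the guide
    return min(round(activity * 10), 40)  # bonus capped at +40


for score in (0.0, 0.5, 1.0, 2.0, 3.0, 4.7):
    print(score, ai_activity_adjustment(score))
```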

AI activity vs. composite score

When AI prediction is enabled, the AI activity score and the composite score serve different purposes and should be interpreted as complementary evaluation dimensions:

  • AI activity score (0.0–4.0+) is the model's direct prediction of on-target cleavage efficacy. If your primary goal is maximizing predicted activity, sort and filter by this value. The guide with the highest AI activity score is the one the model predicts will perform best at its target site.
  • Composite score (0–100) is a holistic quality metric that incorporates AI activity alongside sequence-composition factors (GC content, homopolymers, PFS, RNA structure). It reflects overall guide quality, not just predicted activity.

Critically, the AI models do not evaluate GC content, homopolymer runs, PFS preferences, or RNA secondary structure; those are captured only by the composite score. Poly-T synthesis issues are covered by neither score and are reported only through the has_poly_t quality flag. A guide with the highest AI activity may still have suboptimal GC content or problematic homopolymer runs. Use the composite score as a second-pass quality filter to identify guides that are both highly active and have favorable sequence properties.
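One way to combine the two metrics is to filter on the composite score first and then sort by AI activity. The sketch below assumes the results have been loaded as a list of dictionaries; the field names `id`, `ai_activity`, and `composite`, the threshold of 60, and the three example records are all invented for illustration and do not describe SPACER's actual output format.

```python
def shortlist(guides, min_composite=60, exclude_poly_t=False):
    """Keep guides that pass the composite filter, ranked by AI activity."""
    kept = [
        g for g in guides
        if g["composite"] >= min_composite
        and not (exclude_poly_t and g["has_poly_t"])
    ]
    return sorted(kept, key=lambda g: g["ai_activity"], reverse=True)


# Made-up records for illustration only
candidates = [
    {"id": "g1", "ai_activity": 3.8, "composite": 58, "has_poly_t": False},
    {"id": "g2", "ai_activity": 3.1, "composite": 81, "has_poly_t": True},
    {"id": "g3", "ai_activity": 2.4, "composite": 74, "has_poly_t": False},
]

print([g["id"] for g in shortlist(candidates)])
print([g["id"] for g in shortlist(candidates, exclude_poly_t=True)])
```

Here the most active guide (g1) is dropped by the composite filter, which is the situation the paragraph above warns about.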

Homopolymer Penalty (0 to -10)

Applied when the longest run of consecutive identical nucleotides exceeds 3. The penalty scales with run length:

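    penalty = min((run_length - 3) × 2.5, 10)

| Run Length | Penalty | Example |
|---|---|---|
| 1–3 | 0 | AAA (no penalty) |
| 4 | -2 | AAAA |
| 5 | -5 | AAAAA |
| 6 | -7 | AAAAAA |
| 7+ | -10 (max) | AAAAAAA or longer |

The formula yields 2.5 and 7.5 for runs of 4 and 6. The table lists these as whole points (2 and 7), which is consistent with dropping the fractional part.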
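The penalty can be sketched as below. The function names are hypothetical. The result is truncated to a whole number so that it reproduces the tabulated values (2 and 7 for runs of 4 and 6); that truncation is an assumption inferred from the table, not a documented rule. The function returns the magnitude of the penalty, which the composite formula then subtracts.

```python
from itertools import groupby


def longest_run(sequence: str) -> int:
    """Length of the longest run of identical consecutive bases."""
    return max(
        (sum(1 for _ in group) for _, group in groupby(sequence.upper())),
        default=0,
    )


def homopolymer_penalty(sequence: str) -> int:
    """Penalty magnitude (0 to 10) based on the longest homopolymer run."""
    run = longest_run(sequence)
    if run <= 3:
        return 0
    # Truncation to int is assumed in order to match the table (2.5 -> 2, 7.5 -> 7)
    return min(int((run - 3) * 2.5), 10)


for seq in ("ACGAAAC", "ACGAAAAC", "AAAAA", "AAAAAA", "AAAAAAAA"):
    print(seq, longest_run(seq), homopolymer_penalty(seq))
```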

PFS Adjustment (Cas13 only, ±5)

For Cas13 enzymes, the Protospacer Flanking Sequence (PFS) at the 3' end of the target is evaluated:

  • +5: Favorable PFS (e.g., non-G at 3' for LwaCas13a)
  • -5: Unfavorable PFS
  • 0: PFS not evaluated (Cas12, or Cas13 without flanking information)
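The three cases above can be illustrated with a small function. This is a deliberately simplified sketch: it applies the LwaCas13a example (non-G is favorable) to every Cas13 enzyme, which is an assumption, since PFS preferences for other Cas13 variants are not listed here. The function and parameter names are hypothetical.

```python
from typing import Optional


def pfs_adjustment(enzyme_family: str, pfs_base: Optional[str]) -> int:
    """Return +5, -5, or 0 following the LwaCas13a non-G rule."""
    if enzyme_family != "Cas13" or not pfs_base:
        return 0   # Cas12, or Cas13 without flanking information
    # Assumption: G is the only unfavorable base, as in the LwaCas13a example
    return -5 if pfs_base.upper() == "G" else 5


print(pfs_adjustment("Cas13", "A"))
print(pfs_adjustment("Cas13", "G"))
print(pfs_adjustment("Cas12", "A"))
print(pfs_adjustment("Cas13", None))
```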

RNA Structure Adjustment (±10)

When RNA secondary structure prediction is enabled (via ViennaRNA), the predicted folding of the target site adjusts the score based on MFE (minimum free energy) and seed region accessibility:

  • +10: Minimal structure, high seed accessibility (favorable)
  • 0: Average structure (neutral)
  • -10: Strong structure, low seed accessibility (unfavorable)

Interpreting Scores

The composite score maps directly to the tier classification system:

| Score Range | Tier | Recommendation |
|---|---|---|
| 80–100 | Excellent | Strong candidates for experimental validation |
| 60–79 | Good | Viable candidates, likely to perform well |
| 40–59 | Fair | Usable but may have one or more weaknesses |
| 0–39 | Poor | Not recommended without additional validation |
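The tier boundaries translate directly into a threshold lookup. The function name is hypothetical. Comparing against lower bounds also gives a defined answer for a fractional score such as 79.5, although the text does not say whether composite scores are always whole numbers.

```python
def tier(score: float) -> str:
    """Classify a composite score (0 to 100) into its tier."""
    if score >= 80:
        return "Excellent"
    if score >= 60:
        return "Good"
    if score >= 40:
        return "Fair"
    return "Poor"


for value in (90, 79, 50, 25, 0):
    print(value, tier(value))
```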

Without AI prediction enabled, the component ranges above bound the heuristic-only score to 5–65 (base 40, GC ±10, homopolymer 0 to -10, PFS ±5, structure ±10); for Cas12, where PFS is not evaluated, the bounds are 10–60. AI activity predictions extend the range to the full 0–100, allowing excellent candidates to reach 90+ while inactive guides score 15 or lower.

Score Examples

| Scenario | Calculation | Score | Tier |
|---|---|---|---|
| Optimal heuristic-only | 40 base + 10 GC (50%) = 50 | 50 | Fair |
| Poor GC, long homopolymer | 40 base - 10 GC (15%) - 5 homopolymer (5-run) = 25 | 25 | Poor |
| Optimal + high AI activity | 40 base + 10 GC (50%) + 36 AI (3.6) = 86 | 86 | Excellent |
| Optimal GC, inactive AI | 40 base + 10 GC (50%) - 50 AI (0.0) = 0 | 0 | Poor |
| Full stack (Cas13) | 40 + 10 GC + 30 AI (3.0) + 5 PFS + 5 structure = 90 | 90 | Excellent |

Tip
The composite score is designed for ranking and comparison, not as an absolute prediction of guide performance. A score of 85 does not guarantee 85% cleavage efficiency — it means this guide has favorable properties across the evaluated criteria relative to other candidates.