Composite Scoring

How SPACER calculates the overall guide quality score from additive scoring components.

Overview

Every candidate guide RNA receives a composite score between 0 and 100. This score is computed as an additive sum of individual component adjustments applied to a fixed base score. The composite score determines the guide's tier classification and its rank relative to other candidates.

Scoring Formula

The composite score is the base score plus a fixed point adjustment from each component, with the total clamped to the valid range:

composite = base_score
           + gc_adjustment
           + ml_adjustment
           - homopolymer_penalty
           + pfs_adjustment
           + structure_adjustment

Result is clamped to [0, 100].

Each component contributes a fixed number of points (positive or negative) to the final score. There is no weighting or normalization — the ranges below are the actual point values applied. When optional components (AI activity, RNA structure) are not enabled, their adjustments are simply 0.
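The formula above can be sketched in a few lines of Python. This is a minimal illustration, not SPACER's actual implementation: the function name and keyword arguments are assumptions chosen to mirror the terms in the formula, and the homopolymer penalty is passed as a positive magnitude because the formula subtracts it.

```python
def composite_score(gc_adjustment=0, ml_adjustment=0, homopolymer_penalty=0,
                    pfs_adjustment=0, structure_adjustment=0, base_score=40):
    """Add the component adjustments to the base score and clamp to [0, 100].

    Hypothetical helper: components that are disabled simply keep their
    default of 0, as described in the text.
    """
    raw = (base_score
           + gc_adjustment
           + ml_adjustment
           - homopolymer_penalty   # positive magnitude, subtracted here
           + pfs_adjustment
           + structure_adjustment)
    return max(0, min(100, raw))


# Optimal GC plus an AI adjustment of +36
print(composite_score(gc_adjustment=10, ml_adjustment=36))   # 86
# Optimal GC, but the classifier marked the guide inactive
print(composite_score(gc_adjustment=10, ml_adjustment=-50))  # 0
```

Because there is no weighting, changing one component by N points always moves the unclamped total by exactly N points.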

Scoring Components

The composite score is built from a base score plus adjustments from each component:

Component	Point Range	When Applied
Base score	40	Always (starting point for all candidates)
GC content	-10 to +10	Always — graduated piecewise interpolation
AI activity	-50 to +40	Only when AI prediction is enabled (EasyDesign for Cas12, ADAPT for Cas13)
Homopolymer	0 to -10	When longest run exceeds 3 consecutive identical bases
PFS	-5 to +5	Cas13 only — protospacer flanking sequence preference
RNA structure	-10 to +10	Only when structure prediction is enabled (ViennaRNA)

Poly-T does NOT affect the score

Poly-T/U stretches (4+ consecutive T or U) are detected and flagged in quality flags, but they are intentionally excluded from the composite score. Poly-T affects crRNA synthesis (premature transcription termination), not CRISPR activity. Users concerned about synthesis can filter by the has_poly_t flag separately. Some synthesis methods might tolerate poly-T without issue.

Component Details

Base Score (40 points)

All candidates start at 40 points. This provides a baseline that gives AI predictions room to differentiate guides (up to +40 headroom) while negative penalties can push poor candidates toward 0.

GC Content Adjustment (-10 to +10)

The GC adjustment uses graduated piecewise linear interpolation rather than a binary optimal/non-optimal check. This produces a continuous adjustment that ramps smoothly between penalty and bonus:

GC Range	Adjustment	Behavior
0–20%	-10	Flat penalty (extreme AT bias)
20–40%	-10 to +10	Linear ramp toward optimal
40–60%	+10	Full bonus (optimal range)
60–80%	+10 to -10	Linear ramp away from optimal
80–100%	-10	Flat penalty (extreme GC bias)

For example, a spacer with 30% GC content receives an adjustment of 0 (the midpoint of the ramp), while 50% GC receives the full +10 bonus.
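The breakpoints in the table translate into a short piecewise function. The sketch below is an assumption about how the interpolation could be written, with a hypothetical function name. It returns an unrounded value because the text does not say whether the result is rounded.

```python
def gc_adjustment(gc_percent: float) -> float:
    """Map GC content (0-100 %) to an adjustment between -10 and +10.

    Each ramp covers 20 points over 20 percentage points of GC,
    so the slope is 1 point per percent.
    """
    if gc_percent <= 20 or gc_percent >= 80:
        return -10.0                           # flat penalty at the extremes
    if gc_percent < 40:
        return -10.0 + (gc_percent - 20)       # ramp up toward the optimal range
    if gc_percent <= 60:
        return 10.0                            # full bonus in the optimal range
    return 10.0 - (gc_percent - 60)            # ramp down away from optimal


print(gc_adjustment(30))  # 0.0
print(gc_adjustment(50))  # 10.0
```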

AI Activity Adjustment (-50 to +40)

When AI prediction is enabled, a piecewise formula maps the predicted activity score to a point adjustment:

Activity Score	Adjustment	Interpretation
0.0 (inactive)	-50	Classifier rejected — effectively disqualifies
0.5	+5	Weak activity
1.0	+10	Low-moderate activity
2.0	+20	Moderate activity (typical EasyDesign range)
3.0	+30	Good activity (typical ADAPT range)
4.0+	+40 (max)	Strong activity (bonus capped)

The formula for active guides is round(activity × 10), clamped to +40. Inactive guides (activity = 0.0) receive a flat -50 penalty since the classifier determined they are unlikely to have any on-target effect.
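A minimal sketch of this mapping follows. The function name is hypothetical, and treating any score at or below 0.0 as inactive is an assumption, since the text only describes the exact value 0.0.

```python
def ml_adjustment(activity: float) -> int:
    """Map a predicted activity score to a point adjustment.

    Assumption: any score <= 0.0 counts as "inactive".
    """
    if activity <= 0.0:
        return -50                       # flat penalty for inactive guides
    # Note: Python's round() rounds exact halves to the nearest even integer,
    # which may differ from the rounding mode the tool actually uses.
    return min(round(activity * 10), 40)  # bonus capped at +40


for score in (0.0, 0.5, 2.0, 3.0, 4.7):
    print(score, ml_adjustment(score))
```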

AI activity vs. composite score

When AI prediction is enabled, the AI activity score and the composite score serve different purposes and should be interpreted as complementary evaluation dimensions:

AI activity score (0.0–4.0+) is the model's direct prediction of on-target cleavage efficacy. If your primary goal is maximizing predicted activity, sort and filter by this value. The guide with the highest AI activity score is the one the model predicts will perform best at its target site.
Composite score (0–100) is a holistic quality metric that incorporates AI activity alongside sequence-composition factors (GC content, homopolymers, PFS, RNA structure). It reflects overall guide quality, not just predicted activity.

Critically, the AI models do not evaluate GC content, homopolymer runs, poly-T synthesis issues, PFS preferences, or RNA secondary structure — those are captured only by the composite score. A guide with the highest AI activity may still have suboptimal GC content or problematic homopolymer runs. Use the composite score as a second-pass quality filter to identify guides that are both highly active and have favorable sequence properties.
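One way to apply this two-pass approach to exported results is sketched below. The record layout (`id`, `ai_activity`, `composite`), the values, and the threshold of 60 are all made up for illustration; they are not SPACER's output format.

```python
# Illustrative records only; field names and values are hypothetical.
guides = [
    {"id": "g1", "ai_activity": 3.8, "composite": 55},
    {"id": "g2", "ai_activity": 3.4, "composite": 88},
    {"id": "g3", "ai_activity": 2.1, "composite": 71},
    {"id": "g4", "ai_activity": 0.0, "composite": 0},
]


def shortlist(guides, min_composite=60):
    """Rank by predicted activity, then keep guides that pass the composite filter."""
    by_activity = sorted(guides, key=lambda g: g["ai_activity"], reverse=True)
    return [g for g in by_activity if g["composite"] >= min_composite]


print([g["id"] for g in shortlist(guides)])  # ['g2', 'g3']
```

Here g1 has the highest predicted activity but is dropped by the second pass because its sequence properties pull its composite score below the threshold.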

Homopolymer Penalty (0 to -10)

Applied when the longest run of consecutive identical nucleotides exceeds 3. The penalty scales with run length:

penalty = min((run_length - 3) × 2.5, 10)

Run Length	Penalty	Example
1–3	0	AAA (no penalty)
4	-2	AAAA
5	-5	AAAAA
6	-7	AAAAAA
7+	-10 (max)	AAAAAAA or longer
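The penalty can be sketched as follows. The formula yields 2.5 and 7.5 for runs of 4 and 6, while the table lists -2 and -7, which suggests the fractional part is dropped. This sketch assumes truncation to match the table; the actual rounding rule is not stated. The function names are hypothetical, and the penalty is returned as a positive magnitude to be subtracted from the score.

```python
from itertools import groupby


def longest_run(seq: str) -> int:
    """Length of the longest stretch of consecutive identical bases."""
    return max((sum(1 for _ in group) for _, group in groupby(seq.upper())),
               default=0)


def homopolymer_penalty(seq: str) -> int:
    """Penalty magnitude (0-10) for the longest homopolymer run in seq."""
    run = longest_run(seq)
    if run <= 3:
        return 0
    # Assumption: truncate toward zero so 2.5 -> 2 and 7.5 -> 7, as in the table.
    return int(min((run - 3) * 2.5, 10))


print(homopolymer_penalty("ACGTAAAACGT"))  # 2  (run of 4)
print(homopolymer_penalty("ACGGGGGGTCA"))  # 7  (run of 6)
```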

PFS Adjustment (Cas13 only, ±5)

For Cas13 enzymes, the Protospacer Flanking Sequence (PFS) at the 3' end of the target is evaluated:

+5: Favorable PFS (e.g., non-G at 3' for LwaCas13a)
-5: Unfavorable PFS
0: PFS not evaluated (Cas12, or Cas13 without flanking information)
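The three cases can be sketched as a small function. This is an illustration only: it hard-codes the LwaCas13a example from the list above (non-G is favorable) as the sole rule, which is an assumption, and the function name and parameters are hypothetical. Other Cas13 orthologs may have different preferences.

```python
from typing import Optional


def pfs_adjustment(enzyme_family: str, flanking_base: Optional[str]) -> int:
    """Return +5, -5, or 0 for the PFS component.

    Assumption: only the non-G rule given as the LwaCas13a example is modeled.
    """
    if enzyme_family != "Cas13" or not flanking_base:
        return 0                      # Cas12, or no flanking information
    return 5 if flanking_base.upper() != "G" else -5


print(pfs_adjustment("Cas13", "A"))   # 5
print(pfs_adjustment("Cas13", "G"))   # -5
print(pfs_adjustment("Cas12", "A"))   # 0
```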

RNA Structure Adjustment (±10)

When RNA secondary structure prediction is enabled (via ViennaRNA), the predicted folding of the target site adjusts the score based on MFE (minimum free energy) and seed region accessibility:

+10: Minimal structure, high seed accessibility (favorable)
0: Average structure (neutral)
-10: Strong structure, low seed accessibility (unfavorable)

Interpreting Scores

The composite score maps directly to the tier classification system:

Score Range	Tier	Recommendation
80–100	Excellent	Strong candidates for experimental validation
60–79	Good	Viable candidates, likely to perform well
40–59	Fair	Usable but may have one or more weaknesses
0–39	Poor	Not recommended without additional validation
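The tier boundaries in the table amount to a simple threshold lookup, sketched here with a hypothetical function name:

```python
def tier(score: float) -> str:
    """Map a composite score (0-100) to its tier label."""
    if score >= 80:
        return "Excellent"
    if score >= 60:
        return "Good"
    if score >= 40:
        return "Fair"
    return "Poor"


for s in (86, 79, 50, 25):
    print(s, tier(s))
```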

Without AI prediction enabled, the heuristic-only score stays between 5 and 65 (base 40, ±10 GC, up to -10 homopolymer, ±5 PFS, ±10 structure). For Cas12, where PFS does not apply, the range is 10 to 60. AI activity predictions extend the range to the full 0–100, allowing excellent candidates to reach 90+ while inactive guides, which take the -50 penalty, cannot score above 15.

Score Examples

Scenario	Calculation	Score	Tier
Optimal heuristic-only	40 base + 10 GC (50%) = 50	50	Fair
Poor GC, long homopolymer	40 base - 10 GC (15%) - 5 homopolymer (5-run) = 25	25	Poor
Optimal + high AI activity	40 base + 10 GC (50%) + 36 AI (3.6) = 86	86	Excellent
Optimal GC, inactive AI	40 base + 10 GC (50%) - 50 AI (0.0) = 0	0	Poor
Full stack (Cas13)	40 + 10 GC + 30 AI (3.0) + 5 PFS + 5 structure = 90	90	Excellent

Tip

The composite score is designed for ranking and comparison, not as an absolute prediction of guide performance. A score of 85 does not guarantee 85% cleavage efficiency — it means this guide has favorable properties across the evaluated criteria relative to other candidates.
