GC Content Scoring
How SPACER evaluates the GC composition of candidate guide RNAs and applies penalty curves for extreme values.
Why GC Content Matters
GC content — the fraction of guanine (G) and cytosine (C) bases in a spacer — directly affects guide RNA thermodynamic stability and binding kinetics. G-C base pairs form three hydrogen bonds compared to two for A-T/A-U pairs, making GC-rich sequences bind more tightly.
- Too low (<30%): Weak target binding, poor specificity, increased off-target activity
- Too high (>70%): Overly stable secondary structures, reduced enzyme turnover, potential self-dimerization
- Optimal (40–60%): Balanced binding affinity with minimal secondary structure complications
Calculation
GC content is calculated as the percentage of bases in the spacer that are G or C:
GC% = (count_G + count_C) / spacer_length × 100
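The formula can be sketched in a few lines of Python. This is an illustrative sketch, not SPACER's actual implementation: the function name `gc_percent` is hypothetical, and it assumes the spacer is a plain string of A, C, G, and U/T characters in either case.

```python
def gc_percent(spacer: str) -> float:
    """Return the GC content of a spacer as a percentage (0-100).

    Hypothetical helper for illustration; not part of SPACER's documented API.
    """
    if not spacer:
        raise ValueError("spacer must not be empty")
    seq = spacer.upper()  # accept lowercase input
    gc_count = seq.count("G") + seq.count("C")
    return 100.0 * gc_count / len(seq)
```

Because only G and C are counted, the same function works for RNA (U) and DNA (T) spellings of a spacer.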
Example: AUGCCUAGGCUUAACGUUCA (20 nt)
G count: 4, C count: 5
GC% = 9/20 × 100 = 45%
Scoring Function
SPACER converts the raw GC percentage into a normalized score (0–1) using a trapezoidal penalty function. Guides within the ideal range score 1.0, and the score decreases linearly as GC content moves away from that range:
| GC Range | Score | Interpretation |
|---|---|---|
| 40–60% | 1.0 | Optimal — no penalty |
| 30–40% | 0.5–1.0 | Slightly AT-rich — minor penalty |
| 60–70% | 0.5–1.0 | Slightly GC-rich — minor penalty |
| 20–30% | 0.0–0.5 | Very AT-rich — significant penalty |
| 70–80% | 0.0–0.5 | Very GC-rich — significant penalty |
| <20% or >80% | 0.0 | Extreme — maximum penalty |
The penalty curve is symmetric around the 40–60% ideal window. The linear slope in the penalty regions ensures that small deviations from the ideal range are penalized proportionally rather than as a hard cutoff.
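The table's breakpoints (1.0 at 40% and 60%, 0.5 at 30% and 70%, 0.0 at 20% and 80%) are consistent with a single linear ramp on each side of the ideal window, losing 0.05 of score per percentage point. A minimal sketch under that assumption follows; the function name `gc_score` and the constants are hypothetical and inferred from the table, not taken from SPACER's source.

```python
IDEAL_LOW = 40.0    # lower edge of the ideal window (percent)
IDEAL_HIGH = 60.0   # upper edge of the ideal window (percent)
RAMP_WIDTH = 20.0   # assumed: score falls from 1.0 to 0.0 over 20 points


def gc_score(gc_pct: float) -> float:
    """Map a GC percentage to a trapezoidal score in [0, 1].

    Hypothetical sketch; breakpoints are inferred from the scoring table.
    """
    if IDEAL_LOW <= gc_pct <= IDEAL_HIGH:
        return 1.0
    # Distance (in percentage points) outside the ideal window.
    if gc_pct < IDEAL_LOW:
        deviation = IDEAL_LOW - gc_pct
    else:
        deviation = gc_pct - IDEAL_HIGH
    # Linear penalty, clamped at 0.0 for GC below 20% or above 80%.
    return max(0.0, 1.0 - deviation / RAMP_WIDTH)
```

Under these assumptions, a spacer at 35% GC would score 0.75, and one at 70% would score 0.5.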
Quality Flags
In addition to the continuous score, SPACER raises quality flags when GC content is particularly extreme:
| Flag | Condition | Meaning |
|---|---|---|
| LOW_GC | GC% < 30% | Spacer is very AT-rich; binding may be weak |
| HIGH_GC | GC% > 70% | Spacer is very GC-rich; secondary structures likely |
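The flag conditions in the table can be expressed as a simple threshold check. The sketch below is illustrative only: the function name `gc_flags` and the list-of-strings return type are assumptions, while the flag names and the strict `<` / `>` comparisons come from the table.

```python
from typing import List


def gc_flags(gc_pct: float) -> List[str]:
    """Return the quality flags raised for a given GC percentage.

    Hypothetical helper; thresholds follow the Quality Flags table.
    """
    flags: List[str] = []
    if gc_pct < 30.0:   # strictly below 30%: very AT-rich
        flags.append("LOW_GC")
    if gc_pct > 70.0:   # strictly above 70%: very GC-rich
        flags.append("HIGH_GC")
    return flags
```

With strict comparisons, a spacer at exactly 30% or 70% GC raises no flag, even though its score is already reduced to 0.5.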