Homopolymer Scoring

How SPACER detects and penalizes runs of consecutive identical nucleotides in guide RNA spacers.

What Are Homopolymers?

A homopolymer is a run of consecutive identical nucleotides within a sequence — for example, AAAA (poly-A), CCCC (poly-C), GGGG (poly-G), or TTTT/UUUU (poly-T/U). Homopolymer runs are problematic in guide RNA design for several reasons:

  • Synthesis errors: Oligo synthesis platforms have higher error rates in homopolymer regions, leading to insertions and deletions
  • Reduced specificity: Repetitive sequences are more likely to have off-target matches in complex genomes
  • Secondary structures: Poly-G runs can form G-quadruplexes; poly-C runs can form i-motifs
  • Enzyme inhibition: Certain homopolymer configurations interfere with Cas enzyme loading and activity

Detection Algorithm

SPACER scans each spacer sequence and identifies the longest run of any single nucleotide. The detection considers all four bases independently and reports the maximum run length found:

Spacer: A U G C C C C U A G G G U U A A C G U C
              ↑ ↑ ↑ ↑     ↑ ↑ ↑
              poly-C (4)  poly-G (3)

Longest homopolymer: 4 (CCCC)
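
The scan described above can be sketched in a few lines of Python. This is a minimal illustration, not SPACER's actual implementation: the function name `longest_homopolymer`, the upper-casing of the input, and the rule that the first run wins on ties are all assumptions made for the example.

```python
from itertools import groupby


def longest_homopolymer(spacer: str) -> tuple[str, int]:
    """Return (base, length) of the longest run of one repeated base.

    Hypothetical helper for illustration; not part of SPACER's API.
    Ties are resolved in favour of the run that appears first (assumption).
    """
    best_base, best_len = "", 0
    # groupby yields one group per maximal run of identical characters.
    for base, run in groupby(spacer.upper()):
        run_len = sum(1 for _ in run)
        if run_len > best_len:
            best_base, best_len = base, run_len
    return best_base, best_len


# The spacer from the diagram above, written without spaces.
print(longest_homopolymer("AUGCCCCUAGGGUUAACGUC"))  # ('C', 4)
```

A single pass over the sequence is enough because every base belongs to exactly one maximal run, so no per-base bookkeeping is needed.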

Scoring Function

The homopolymer score is based on the length of the longest run. Short runs (≤3 nt) are considered normal and receive no penalty. Penalties increase with run length:

Longest Run    Score    Interpretation
1–3 nt         1.0      Normal: no homopolymer concern
4 nt           0.5      Moderate: synthesis risk, minor activity impact
5 nt           0.25     Significant: likely synthesis issues, reduced activity
≥6 nt          0.0      Severe: strong recommendation against use

The penalty applies to the single longest run, not to the sum of all runs. A spacer with two 3-nt runs (e.g., AAA and GGG) receives no penalty, while a spacer with a single 4-nt run (AAAA) is penalized.
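
As a rough sketch, the score table above maps onto a simple step function. The names `longest_run` and `homopolymer_score` are hypothetical and chosen for this example. Returning 1.0 for an empty sequence is also an assumption, since the table starts at a run length of 1.

```python
from itertools import groupby


def longest_run(spacer: str) -> int:
    """Length of the longest run of identical characters (0 if empty)."""
    return max(
        (sum(1 for _ in run) for _, run in groupby(spacer.upper())),
        default=0,
    )


def homopolymer_score(run_length: int) -> float:
    """Map the longest run length to a score, following the table above."""
    if run_length <= 3:
        return 1.0   # normal, no penalty
    if run_length == 4:
        return 0.5   # moderate
    if run_length == 5:
        return 0.25  # significant
    return 0.0       # 6 or more: severe


# Two 3-nt runs (AAA and GGG): only the longest run counts, so no penalty.
print(homopolymer_score(longest_run("CAAAGUGGGC")))  # 1.0
# A single 4-nt run (AAAA) is penalized.
print(homopolymer_score(longest_run("CAAAAGUC")))    # 0.5
```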

Quality Flag

When the longest homopolymer run is 4 or more nucleotides, SPACER raises the HOMOPOLYMER quality flag:

Flag           Condition             Meaning
HOMOPOLYMER    Longest run ≥ 4 nt    Spacer contains a homopolymer that may affect synthesis or activity

Info
Poly-T/U runs (TTTT or UUUU) are also tracked separately by the poly-T scoring component, because they have an additional biological effect as transcription terminators. The homopolymer score still covers runs of every nucleotide type, including T/U.
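
The flag condition can be illustrated with a small check. This is only a sketch under assumptions: the function name `homopolymer_flags`, the constant name, and the choice to return flags as a list of strings are invented for the example. Only the flag name HOMOPOLYMER and the 4-nt threshold come from the description above.

```python
from itertools import groupby

# Threshold taken from the flag condition above (longest run >= 4 nt).
HOMOPOLYMER_FLAG_MIN_RUN = 4


def homopolymer_flags(spacer: str) -> list[str]:
    """Return ["HOMOPOLYMER"] if the longest run meets the threshold.

    Hypothetical helper; the list return type is an assumption.
    """
    longest = max(
        (sum(1 for _ in run) for _, run in groupby(spacer.upper())),
        default=0,
    )
    return ["HOMOPOLYMER"] if longest >= HOMOPOLYMER_FLAG_MIN_RUN else []


print(homopolymer_flags("AUGCCCCUAGGGUUAACGUC"))  # ['HOMOPOLYMER']
print(homopolymer_flags("AUGCAUGCAUGC"))          # []
```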

Base-Specific Risks

While all homopolymers are penalized equally in the scoring function, their biological risks differ:

Run Type    Primary Risk
Poly-A      Weak binding (A-U/A-T pairs have only two hydrogen bonds); synthesis slippage
Poly-C      Can form i-motif structures at slightly acidic pH; synthesis challenges
Poly-G      G-quadruplex formation; strong secondary structures; highest synthesis error rate
Poly-T/U    Acts as Pol III terminator signal; see Poly-T Scoring for dedicated handling

Tip
Poly-G runs are particularly problematic because G-quadruplexes are exceptionally stable and can prevent Cas enzyme loading entirely. If a guide is flagged for homopolymers, check whether the run is poly-G; these guides are the riskiest.
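
To see which base a flagged run is made of, one option is to list every run of four or more identical bases with a regular expression. This is an illustrative sketch rather than SPACER functionality: `flagged_runs` and `RUN_PATTERN` are hypothetical names, and the example spacer is made up.

```python
import re

# One base followed by at least three more copies of itself (a run of 4+).
RUN_PATTERN = re.compile(r"([ACGTU])\1{3,}")


def flagged_runs(spacer: str) -> list[tuple[str, int, int]]:
    """Return (base, start_index, length) for every run of 4+ identical bases."""
    return [
        (m.group(1), m.start(), len(m.group(0)))
        for m in RUN_PATTERN.finditer(spacer.upper())
    ]


runs = flagged_runs("AUGGGGGUACCCCUAGCAUG")
print(runs)  # [('G', 2, 5), ('C', 9, 4)]

# Poly-G runs deserve the closest look, per the tip above.
has_poly_g = any(base == "G" for base, _, _ in runs)
print(has_poly_g)  # True
```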