[security][LOW] guard injection regexes don't fold homoglyphs (Cyrillic look-alikes) #30

Description

@m1ngshum

Severity: LOW

Location

src/guard/patterns.ts:124 (PATTERN_BREAKERS) + signature matching

Summary

The normalization pipeline (NFKC plus a zero-width/bidi strip) correctly folds full-width Latin and removes invisible separators, but it does not map confusable homoglyphs, e.g. Cyrillic е (U+0435) for Latin e or Greek ο (U+03BF) for Latin o. NFKC leaves these code points unchanged, so ignоre previous instructions (with a Cyrillic о, U+043E) evades the injection signatures while remaining visually identical to a human/LLM reader.
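
A minimal reproduction of the gap, as a sketch only: `SIGNATURE` and `normalize` below are hypothetical stand-ins for the real signatures and pipeline in src/guard/patterns.ts, which are not shown in this issue. It contrasts a full-width payload, which NFKC folds, with a Cyrillic one, which it does not.

```typescript
// Hypothetical signature; the real ones live in src/guard/patterns.ts.
const SIGNATURE = /ignore previous instructions/i;

// Assumed shape of the current pipeline: NFKC, then strip zero-width/bidi controls.
function normalize(input: string): string {
  return input
    .normalize("NFKC")
    .replace(/[\u200B-\u200F\u202A-\u202E\u2060-\u2064\uFEFF]/g, "");
}

// "ignore" spelled with full-width Latin letters (U+FF49, U+FF47, ...).
const fullWidth = "\uFF49\uFF47\uFF4E\uFF4F\uFF52\uFF45 previous instructions";

// "ignore" with the Latin o swapped for Cyrillic о (U+043E).
const cyrillic = "ign\u043Ere previous instructions";

// NFKC maps full-width forms to ASCII, so the signature matches.
const fullWidthDetected = SIGNATURE.test(normalize(fullWidth));

// NFKC leaves U+043E untouched, so the signature misses.
const cyrillicDetected = SIGNATURE.test(normalize(cyrillic));

console.log({ fullWidthDetected, cyrillicDetected });
```

Under these assumptions the full-width variant is caught and the Cyrillic variant is not, which is the behavior this issue reports.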

Recommended fix

Add a confusable-fold step (Unicode confusables.txt / skeleton mapping, or a small targeted Latin-lookalike table) before regex matching. Scope can be limited to the script ranges most used for evasion (Cyrillic/Greek → Latin).
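
One way the targeted-table option might look, as a sketch rather than a proposed patch: `CONFUSABLES`, `foldConfusables`, `normalizeForMatching`, and `SIGNATURE` are hypothetical names, and the table is a small hand-picked subset, not the full Unicode confusables.txt data.

```typescript
// Hypothetical, deliberately small table: lowercase Cyrillic/Greek look-alikes -> Latin.
const CONFUSABLES: Record<string, string> = {
  "\u0430": "a", // Cyrillic а
  "\u0435": "e", // Cyrillic е
  "\u043E": "o", // Cyrillic о
  "\u0440": "p", // Cyrillic р
  "\u0441": "c", // Cyrillic с
  "\u0443": "y", // Cyrillic у
  "\u0445": "x", // Cyrillic х
  "\u0455": "s", // Cyrillic ѕ
  "\u0456": "i", // Cyrillic і
  "\u0458": "j", // Cyrillic ј
  "\u03BF": "o", // Greek ο
};

// Fold only characters in the Greek and Cyrillic blocks; everything else passes through.
function foldConfusables(input: string): string {
  return input.replace(/[\u0370-\u03FF\u0400-\u04FF]/g, (ch) => {
    const lower = ch.toLowerCase();
    const mapped = CONFUSABLES[lower];
    if (mapped === undefined) return ch; // no known Latin look-alike
    // Preserve case: an uppercase source character maps to the uppercase Latin letter.
    return ch === lower ? mapped : mapped.toUpperCase();
  });
}

// Assumed placement: after NFKC, before any signature regex runs.
function normalizeForMatching(input: string): string {
  return foldConfusables(input.normalize("NFKC"));
}

// Hypothetical signature standing in for the real injection patterns.
const SIGNATURE = /ignore previous instructions/i;

// Mixed evasion: Cyrillic о and е in "ignore", Greek ο in "instructions".
const evasive = "ign\u043Er\u0435 previous instructi\u03BFns";
const folded = normalizeForMatching(evasive);
const detected = SIGNATURE.test(folded);

console.log(folded, detected);
```

The folded string is probably best used only as the matching copy, with the original text passed downstream unchanged, since folding rewrites legitimate Cyrillic or Greek input into mixed-script strings.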

Acceptance criteria

  • An injection phrase using Cyrillic/Greek look-alikes is detected.
  • Fixture added to the guard eval suite.

Filed from a repo security review.
https://claude.ai/code/session_01XX9sT7kYs1ctQyY2SBg87t
