# `Unicode.String.Break.Line`
[🔗](https://github.com/elixir-unicode/unicode_string/blob/v2.1.0/lib/unicode/string/break/line.ex#L1)

Single-pass line-break implementation following UAX #14.

This is a pragmatic pair-table evaluator covering the rules used in
realistic prose: the LB1 resolution of ambiguous classes, mandatory
breaks (LB4–LB6), spaces (LB7–LB8a, LB18), combining marks (LB9–LB10),
word-joiner / glue / quotation behavior (LB11–LB12a, LB19), the LB13
cluster of close/postfix punctuation, the OP/CL pair (LB14–LB17),
the LB15c/LB15d numeric-prefix carve-out, the LB20a word-initial
hyphen rule, the LB21a Hebrew-letter trailing hyphen rule,
Brahmic / numeric / alphabetic continuations (LB22–LB30b), the
Hangul rules (LB26–LB27), Regional_Indicator parity (LB30a), and
emoji-modifier (LB30b).

Trailing space-runs are tracked via a small state vector rather than
re-scanned each step, so each character costs O(1).

## Limitations and known gaps

Line breaking (UAX #14) is by far the largest of the Unicode
segmentation algorithms, and ICU layers several locale-specific
tailorings on top of it. The following parts are not currently
implemented; on the conformance corpora they account for most of the
remaining failures:

* **CJK locale tailoring (loose / normal / strict).** ICU ships
  separate rule files (`line_loose_cj.txt`, `line_normal_cj.txt`,
  `line_strict_cj.txt`) that adjust break behaviour around CJK
  characters and small-kana / hyphen / iteration marks. Notably:

  - In *loose* mode `CJ` resolves to `ID` and break opportunities
    are introduced between Hiragana/Katakana characters.
  - In *normal* mode (the standard UAX default) `CJ` resolves to
    `NS`, which prevents most breaks within Japanese.
  - A break is permitted between `ID` and `HY` in CJK contexts
    (`ID ÷ HY`) to support Japanese hyphen usage like `あ‐1`.

  This module currently implements only the standard mode (`CJ →
  NS`). Several Japanese-locale cases in ICU's `rbbitst.txt`
  expect loose-mode behaviour and therefore differ.

* **LB15a / LB15b (Pi / Pf quotation).** Initial-quote and
  final-quote sub-classes of `QU` are treated as plain `QU`, so the
  east-asian-width-aware variants in LB15a/LB15b are only approximated.

* **LB28a (Brahmic clusters).** Indic conjunct clusters
  (`AK`/`AP`/`AS`/`VI`/`VF`) follow the default break rules rather
  than the Brahmic-specific cluster handling.

* **LB30 East-Asian-width sensitivity.** LB30 should distinguish
  between F/W/H and other East-Asian-width values when deciding
  whether `(AL|HL|NU) × OP` and `CP × (AL|HL|NU)` apply. This
  implementation applies the rule uniformly.
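
To make the LB30 gap concrete, here is a minimal sketch of what a width-aware check could look like. The module `Lb30Sketch`, the function `no_break?/3`, and the convention of passing the East_Asian_Width of the `OP` or `CP` character as an atom are all assumptions for illustration; none of them are part of this library.

```elixir
defmodule Lb30Sketch do
  # East_Asian_Width values that exempt OP/CP from LB30.
  @wide [:F, :W, :H]
  # Classes that LB30 keeps attached to the parenthesis.
  @alnum [:AL, :HL, :NU]

  # Returns true when LB30 forbids a break between `prev` and `next`.
  # `eaw` is the East_Asian_Width of the OP or CP character in the pair.
  def no_break?(prev, :OP, eaw) when prev in @alnum, do: eaw not in @wide
  def no_break?(:CP, next, eaw) when next in @alnum, do: eaw not in @wide
  # LB30 does not apply; later rules decide.
  def no_break?(_prev, _next, _eaw), do: false
end

# Narrow parenthesis after a letter: LB30 applies.
IO.inspect(Lb30Sketch.no_break?(:AL, :OP, :Na))
# Wide (fullwidth) parenthesis after a letter: LB30 does not apply.
IO.inspect(Lb30Sketch.no_break?(:AL, :OP, :W))
```

Applying the rule uniformly, as described above, corresponds to ignoring the `eaw` argument and always returning `true` for the first two clauses.
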

These gaps are tracked by the line-break conformance regression
tests in `test/line_break_conformance_test.exs`.

## State

* `effective_prev` — the previous non-CM/non-ZWJ class, after LB1
  resolution and LB9 (combining marks taking the class of their base).
* `prev_actual` — the immediately previous class, for LB5 (CR×LF).
* `space_run` — `:none`, `:after_op`, `:after_qu`, `:after_cl`,
  `:after_b2`, or `:after_zw`. Tracks the `X SP*` patterns required
  by LB14, LB15, LB16, LB17, and LB8.
* `ri_parity` — `:odd` / `:even` for LB30a.
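
The fields above can be advanced one class at a time without looking back over earlier characters. The following is a simplified sketch of that idea, not the module's actual internals: the module `LineStateSketch`, the struct layout, and `step/2` are hypothetical, classes are passed in as already-resolved atoms such as `:OP` or `:SP`, and the LB9 exceptions for marks after spaces and hard breaks are only partly modelled.

```elixir
defmodule LineStateSketch do
  # Hypothetical state vector with the four fields described above.
  defstruct effective_prev: nil, prev_actual: nil, space_run: :none, ri_parity: :even

  # Classes that start an `X SP*` context, mapped to their space_run tag.
  # Mapping CP to :after_cl is an assumption based on LB16 covering (CL | CP).
  @openers %{OP: :after_op, QU: :after_qu, CL: :after_cl, CP: :after_cl,
             B2: :after_b2, ZW: :after_zw}

  @marks [:CM, :ZWJ]

  # Advance the state by one resolved line-break class in constant time.
  def step(%__MODULE__{} = state, class) do
    %{state |
      prev_actual: class,
      effective_prev: effective_prev(state, class),
      space_run: space_run(state, class),
      ri_parity: ri_parity(state, class)}
  end

  # LB10: a mark with no base at the start of text is treated as AL.
  defp effective_prev(%{effective_prev: nil}, class) when class in @marks, do: :AL
  # LB9: a mark takes the class of its base, so the base stays in place.
  defp effective_prev(%{effective_prev: prev}, class) when class in @marks, do: prev
  defp effective_prev(_state, class), do: class

  # A space keeps the current run alive.
  defp space_run(%{space_run: run}, :SP), do: run
  # A mark that does not follow a space leaves the run unchanged.
  defp space_run(%{space_run: run, prev_actual: prev}, class)
       when class in @marks and prev != :SP,
       do: run
  # An opener starts a new run; any other class ends it.
  defp space_run(_state, class), do: Map.get(@openers, class, :none)

  # Each RI flips the parity; marks leave it alone; anything else resets it.
  defp ri_parity(%{ri_parity: :even}, :RI), do: :odd
  defp ri_parity(%{ri_parity: :odd}, :RI), do: :even
  defp ri_parity(%{ri_parity: parity}, class) when class in @marks, do: parity
  defp ri_parity(_state, _class), do: :even
end

state = Enum.reduce([:OP, :SP, :SP], %LineStateSketch{}, &LineStateSketch.step(&2, &1))
IO.inspect(state.space_run)
```

Because the `OP SP*` context is remembered in `space_run`, a rule such as LB14 can be decided from the state alone when the next class arrives.
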

# `break?`

```elixir
@spec break?(String.t(), String.t()) :: boolean()
```

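Boundary predicate: returns `true` if a line break is allowed between
the `before` and `after` strings, which are passed as two separate
arguments rather than as a tuple.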

# `next`

```elixir
@spec next(String.t()) :: {String.t(), String.t()} | nil
```

Returns `{first_segment, rest}` or `nil` for the empty string.
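
The `{segment, rest}` or `nil` return shape is the same contract that `Stream.unfold/2` expects from its callback, so a function shaped like `next/1` can drive a lazy stream of segments. The sketch below uses a hypothetical stand-in, `ToyBreaker.next/1`, that simply cuts after each run of spaces; it does not implement UAX #14 and only illustrates the return shape.

```elixir
defmodule ToyBreaker do
  # Hypothetical stand-in with the same return shape as `next/1`:
  # `{first_segment, rest}`, or `nil` for the empty string.
  def next(""), do: nil

  def next(string) do
    # Take everything up to and including the first run of spaces.
    [segment] = Regex.run(~r/\A[^ ]* */u, string)
    rest = String.replace_prefix(string, segment, "")
    {segment, rest}
  end
end

segments =
  "The quick fox"
  |> Stream.unfold(&ToyBreaker.next/1)
  |> Enum.to_list()

IO.inspect(segments)
```

Assuming the real function composes the same way, `Stream.unfold(string, &Unicode.String.Break.Line.next/1)` would yield the segments one at a time without building the whole list first.
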

# `split`

```elixir
@spec split(String.t()) :: [String.t()]
```

Splits `string` into line-break segments.
