# `Unicode.String.Break.Sentence`
[🔗](https://github.com/elixir-unicode/unicode_string/blob/v2.1.0/lib/unicode/string/break/sentence.ex#L1)

Single-pass DFA-style implementation of UAX #29 sentence break with
locale-specific class extensions and abbreviation suppressions.

## Background

The sentence-break algorithm differs from grapheme/word break in two
important ways:

* The **default** rule is *no break* (rule SB998, `Any × Any`). Sentence
  boundaries are emitted only by SB4 (`ParaSep ÷`) and by SB11
  (`SATerm Close* Sp* ParaSep? ÷`), and only when no earlier rule has
  suppressed the break.

* SB8 has unbounded forward look-ahead: after `ATerm Close* Sp*` it
  suppresses the break if a `Lower` letter is reached before any of
  `OLetter | Upper | ParaSep | SATerm`.
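
The SB8 look-ahead can be sketched as a scan over the property classes that follow `ATerm Close* Sp*`. This is a minimal illustration, not the module's actual code: the `SB8Sketch` module, the `suppress?/1` function, and the atom names for the classes are all assumptions made for this example.

```elixir
defmodule SB8Sketch do
  # Hypothetical atom names for the Sentence_Break classes that end the
  # SB8 scan without suppressing the break.
  @stoppers [:oletter, :upper, :para_sep, :sterm, :aterm]

  @doc """
  Takes the classes that follow `ATerm Close* Sp*` and returns `true`
  when SB8 suppresses the break.
  """
  # End of text reached without seeing a Lower: no suppression.
  def suppress?([]), do: false
  # A Lower reached before any stopper: SB8 applies.
  def suppress?([:lower | _rest]), do: true
  # A stopper reached first: SB8 does not apply.
  def suppress?([class | _rest]) when class in @stoppers, do: false
  # Anything else (digits, punctuation, Extend, Format, ...) is skipped.
  def suppress?([_other | rest]), do: suppress?(rest)
end
```

Because the scan only ends at a `Lower`, a stopper, or the end of the text, its length is not bounded by a fixed window.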

## Locale-specific class extensions

Some locales extend the standard Sentence_Break property classes.
CLDR's `el.xml`, for example, extends `$STerm` to include U+003B
(ASCII semicolon) so that Greek text like "γδ; Ε" breaks at the
semicolon. The walker accepts a `locale` argument and applies these
per-locale overrides via `classify/2`.
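
One way to picture a per-locale override is a small lookup table consulted before the default property lookup. The sketch below is an assumption-laden illustration: `ClassifySketch`, its argument order, the `@overrides` table, and the tiny `base_class/1` stand-in are hypothetical and do not reflect the real `classify/2` or the real Sentence_Break data.

```elixir
defmodule ClassifySketch do
  # Hypothetical override table: locale => %{codepoint => class}.
  # The Greek entry mirrors the `el.xml` example (U+003B treated as STerm).
  @overrides %{
    "el" => %{0x003B => :sterm}
  }

  @doc """
  Returns the class for `codepoint`, preferring a locale override
  over the default lookup.
  """
  def classify(codepoint, locale) do
    @overrides
    |> Map.get(to_string(locale), %{})
    |> Map.get(codepoint, base_class(codepoint))
  end

  # Tiny stand-in for the default property lookup, covering only a few
  # codepoints; everything else is lumped into :other.
  defp base_class(?.), do: :aterm
  defp base_class(c) when c in [?!, ??], do: :sterm
  defp base_class(?\s), do: :sp
  defp base_class(_), do: :other
end
```

With this shape, a locale that has no entry in the table falls straight through to the default classification.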

## State

The walker carries:

* `prev_actual` — the property of the immediately-previous codepoint
  (without the SB5 transparent skip). Needed for SB3 (`CR × LF`).

* `effective_prev` — the property of the previous *non-transparent*
  codepoint (Extend/Format are skipped per SB5).

* `before_aterm` — the effective property *before* the most recent
  `ATerm`, used by SB7 (`(Upper|Lower) ATerm × Upper`).

* `phase` — encodes how far we are through a potential
  sentence-terminating sequence `SATerm Close* Sp* ParaSep?`.
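
The fields above could be held in a struct and updated once per codepoint. The following is only a sketch under stated assumptions: `WalkerStateSketch`, its default values, and the class atoms are hypothetical, `phase` transitions are left out, and the special case where Extend/Format follow a `ParaSep` or the start of text is ignored.

```elixir
defmodule WalkerStateSketch do
  # Field names follow the list above; the defaults are assumptions.
  defstruct prev_actual: :sot, effective_prev: :sot, before_aterm: nil, phase: :none

  # Classes that SB5 treats as transparent.
  @transparent [:extend, :format]

  def new, do: %__MODULE__{}

  # SB5: a transparent codepoint only changes the raw previous class.
  def step(%__MODULE__{} = state, class) when class in @transparent do
    %{state | prev_actual: class}
  end

  # On an ATerm, remember what came before it so SB7 can be checked later.
  def step(%__MODULE__{} = state, :aterm) do
    %{state | prev_actual: :aterm, before_aterm: state.effective_prev, effective_prev: :aterm}
  end

  # Any other class becomes both the raw and the effective previous class.
  def step(%__MODULE__{} = state, class) do
    %{state | prev_actual: class, effective_prev: class}
  end
end
```

Feeding the classes `[:upper, :extend, :aterm, :format]` through `step/2` leaves `effective_prev` at `:aterm` and `before_aterm` at `:upper`, while `prev_actual` tracks the trailing `:format`.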

## Suppressions

Locale-specific suppressions (e.g. "Mr.", "Dr.") are applied as a
post-pass: when SB11 would fire after an ATerm-led sequence, the
walker compares the trailing fragment of the segment against the
suppression set and cancels the break on a longest-match.
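
The trailing-fragment comparison might look like the sketch below. It assumes the suppression set holds plain strings such as `"Mr."`; the `SuppressionSketch` module and its functions are hypothetical, and the sketch does not check that the match starts on a word boundary.

```elixir
defmodule SuppressionSketch do
  @doc """
  Returns the longest suppression that `segment` ends with
  (ignoring trailing whitespace), or `nil` when none matches.
  """
  def longest_match(segment, suppressions) do
    trimmed = String.trim_trailing(segment)

    suppressions
    |> Enum.filter(&String.ends_with?(trimmed, &1))
    |> Enum.max_by(&String.length/1, fn -> nil end)
  end

  @doc "Returns `true` when a candidate break after `segment` should be cancelled."
  def suppress?(segment, suppressions) do
    longest_match(segment, suppressions) != nil
  end
end
```

For example, with `MapSet.new(["Gen.", "Lt. Gen."])` the segment `"the Lt. Gen. "` matches both entries, and the longer `"Lt. Gen."` is the one returned.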

# `break?`

```elixir
@spec break?(String.t(), String.t(), atom() | binary(), MapSet.t()) :: boolean()
```

Boundary predicate. Returns `true` if there is a sentence break
between `string_before` and `string_after`.

When suppressing, the suppression check matches the trailing word
of `string_before`.

# `next`

```elixir
@spec next(String.t(), atom() | binary(), MapSet.t()) ::
  {String.t(), String.t()} | nil
```

Returns `{first_sentence, rest}` for `string`, or `nil` for empty input.

# `split`

```elixir
@spec split(String.t(), atom() | binary(), MapSet.t()) :: [String.t()]
```

Splits `string` into sentences.

