t = convert_patterns('''4m1p pu2t 5pute put3er
l1g4 lgo3 igo 2ith 4hm
hy3ph he2n hena4 hen5at ina n2at itio 2io'''.split())
test_eq(t.prefix_items('puter'),
        [('put', (0, 0, 2, 0)),
         ('pute', (5, 0, 0, 0, 0)),
         ('puter', (0, 0, 0, 3, 0, 0))])
01_pattern
Hyphenation patterns
TeX patterns look like 2a1ly4. A pattern consists of letters and digits. Each digit is a weight for a slot between two letters, before the first letter, or after the last letter:
| | a | | l | | y | |
|---|---|---|---|---|---|---|
| 2 | | 1 | | 0 | | 4 |
Missing numbers mean zero.
_cvt
_cvt (pattern:str)
| | Type | Details |
|---|---|---|
| pattern | str | pattern as read from the TeX patterns file |
| Returns | tuple | position i has the weight of the slot before character i |
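The conversion described above can be sketched roughly as follows. This is only an illustration, not the library's implementation: the name `parse_pattern` is hypothetical, the sketch assumes single-digit weights, and it returns the bare letters alongside the weights, which `_cvt` itself may not do.

```python
from typing import Tuple


def parse_pattern(pattern: str) -> Tuple[str, Tuple[int, ...]]:
    """Split a TeX-style pattern into its letters and its slot weights.

    Hypothetical helper for illustration; assumes every weight is one digit.
    """
    letters = []
    weights = [0]  # slot before the first letter; missing digits mean zero
    for ch in pattern:
        if ch.isdigit():
            # A digit sets the weight of the slot we are currently in.
            weights[-1] = int(ch)
        else:
            # A letter closes the current slot and opens the one after it.
            letters.append(ch)
            weights.append(0)
    return "".join(letters), tuple(weights)


print(parse_pattern("2a1ly4"))  # ('aly', (2, 1, 0, 4))
print(parse_pattern("pu2t"))    # ('put', (0, 0, 2, 0))
```

The weights tuple always has one more entry than there are letters, because there is a slot on each side of every letter.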
The following function turns many patterns into one trie.
convert_patterns
convert_patterns (patterns:collections.abc.Iterable[str])
| | Type | Details |
|---|---|---|
| patterns | Iterable | TeX style patterns |
| Returns | Trie | trie mapping matched substrings to weights |
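To make the idea of a trie keyed by pattern letters more concrete, here is a small self-contained sketch. It is a stand-in, not the Trie type the library returns: `PatternTrie`, `build_trie`, and `_split` are hypothetical names, and only a `prefix_items` lookup is modelled.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Weights = Tuple[int, ...]


def _split(pattern: str) -> Tuple[str, Weights]:
    """Hypothetical helper: separate letters from single-digit slot weights."""
    letters, weights = [], [0]
    for ch in pattern:
        if ch.isdigit():
            weights[-1] = int(ch)
        else:
            letters.append(ch)
            weights.append(0)
    return "".join(letters), tuple(weights)


class PatternTrie:
    """Minimal dict-based trie, assumed here only for illustration."""

    def __init__(self) -> None:
        self.children: Dict[str, "PatternTrie"] = {}
        self.weights: Optional[Weights] = None

    def insert(self, key: str, weights: Weights) -> None:
        node = self
        for ch in key:
            node = node.children.setdefault(ch, PatternTrie())
        node.weights = weights

    def prefix_items(self, text: str) -> List[Tuple[str, Weights]]:
        """Return (key, weights) for every stored key that is a prefix of text."""
        found = []
        node = self
        for i, ch in enumerate(text):
            child = node.children.get(ch)
            if child is None:
                break
            node = child
            if node.weights is not None:
                found.append((text[: i + 1], node.weights))
        return found


def build_trie(patterns: Iterable[str]) -> PatternTrie:
    trie = PatternTrie()
    for pattern in patterns:
        key, weights = _split(pattern)
        trie.insert(key, weights)
    return trie


trie = build_trie("4m1p pu2t 5pute put3er".split())
print(trie.prefix_items("puter"))
```

A single walk down the trie finds every pattern that starts at a given position in a word, which is why a prefix lookup is a natural fit here.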
TeX exceptions are simply words with hyphens where hyphenation should happen.
convert_exceptions
convert_exceptions (exceptions:collections.abc.Iterable[str])
| | Type | Details |
|---|---|---|
| exceptions | Iterable | TeX style exceptions: words with hyphens where hyphenation should happen |
| Returns | Mapping | mapping from word to word parts |
assert convert_exceptions(['saippua-kauppias', 'xyzzy']) == {
    'saippuakauppias': ('saippua', 'kauppias'),
    'xyzzy': ('xyzzy',)
}
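The mapping checked above could be produced by something as small as the sketch below. The name `split_exceptions` is hypothetical, and this is an assumption about the behaviour rather than the library's actual code: each hyphen marks a break point, and the key is the word with the hyphens removed.

```python
from typing import Dict, Iterable, Tuple


def split_exceptions(exceptions: Iterable[str]) -> Dict[str, Tuple[str, ...]]:
    """Map each unhyphenated word to the parts between its hyphens.

    Hypothetical stand-in for illustration only.
    """
    return {
        word.replace("-", ""): tuple(word.split("-"))
        for word in exceptions
    }


print(split_exceptions(["saippua-kauppias", "xyzzy"]))
# {'saippuakauppias': ('saippua', 'kauppias'), 'xyzzy': ('xyzzy',)}
```

A word without hyphens maps to a one-element tuple, which records that it must not be hyphenated at all.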