01_pattern

Hyphenation patterns

TeX patterns look like 2a1ly4: letters interleaved with digits. Each digit is the weight of a slot between two letters; there is also a slot before the first letter and one after the last letter:

 a l y
2 1 0 4

Missing numbers mean zero.
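The slot layout can be made concrete with a small parser. This is a minimal sketch, assuming every weight is a single digit (as in standard TeX pattern files) and that any other character is a letter. `parse_pattern` is a hypothetical helper and is not necessarily how `_cvt` is implemented.

```python
def parse_pattern(pattern: str) -> tuple[str, tuple[int, ...]]:
    """Split a TeX pattern like '2a1ly4' into its letters and slot weights.

    Hypothetical helper: the weights tuple has len(letters) + 1 entries.
    Entry i is the weight of the slot before letter i; the last entry is
    the slot after the final letter. Slots without a digit stay 0.
    """
    letters = []
    weights = [0]  # slot before the first letter
    for ch in pattern:
        if '0' <= ch <= '9':
            weights[-1] = int(ch)  # a digit sets the weight of the current slot
        else:
            letters.append(ch)
            weights.append(0)  # open the slot that follows this letter
    return ''.join(letters), tuple(weights)


print(parse_pattern('2a1ly4'))
# ('aly', (2, 1, 0, 4))
```

The same helper gives ('put', (0, 0, 2, 0)) for pu2t, which matches the weights the trie stores for that pattern in the convert_patterns example.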



_cvt

 _cvt (pattern:str)
           Type   Details
pattern    str    pattern as read from the TeX patterns file
Returns    tuple  position i has the weight of the slot before character i

The following function turns many patterns into one trie.



convert_patterns

 convert_patterns (patterns:collections.abc.Iterable[str])
            Type      Details
patterns    Iterable  TeX-style patterns
Returns     Trie      trie mapping each pattern's letters to its slot weights
t = convert_patterns('''4m1p pu2t 5pute put3er
l1g4 lgo3 igo 2ith 4hm
hy3ph he2n hena4 hen5at ina n2at itio 2io'''.split())
test_eq(t.prefix_items('puter'), 
       [('put', (0, 0, 2, 0)),
        ('pute', (5, 0, 0, 0, 0)),
        ('puter', (0, 0, 0, 3, 0, 0))])
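To illustrate what such a table is for, here is a sketch that uses a plain dict as a stand-in for the trie and then applies the weights the way Liang's hyphenation algorithm commonly does: every pattern matching anywhere in the word contributes its weights, each slot keeps the maximum, and an odd maximum marks a permitted break. `build_table`, `prefix_items` (a free function mirroring the trie method used in the test above) and `hyphenate` are hypothetical; this page does not say how the library itself applies the weights, and the '.' word-boundary padding is an assumption borrowed from TeX pattern files.

```python
from collections.abc import Iterable


def parse_pattern(pattern: str) -> tuple[str, tuple[int, ...]]:
    """Hypothetical helper: '2a1ly4' -> ('aly', (2, 1, 0, 4))."""
    letters, weights = [], [0]
    for ch in pattern:
        if '0' <= ch <= '9':
            weights[-1] = int(ch)
        else:
            letters.append(ch)
            weights.append(0)
    return ''.join(letters), tuple(weights)


def build_table(patterns: Iterable[str]) -> dict[str, tuple[int, ...]]:
    """Plain dict standing in for the trie: letters -> slot weights."""
    return dict(parse_pattern(p) for p in patterns)


def prefix_items(table: dict[str, tuple[int, ...]], s: str):
    """All (key, weights) pairs whose key is a prefix of s, shortest first."""
    return [(s[:n], table[s[:n]]) for n in range(1, len(s) + 1) if s[:n] in table]


def hyphenate(word: str, table: dict[str, tuple[int, ...]]) -> list[str]:
    """Liang-style hyphenation: max weight per slot, odd weight means break."""
    w = '.' + word + '.'  # '.' marks word boundaries in TeX pattern files
    points = [0] * (len(w) + 1)
    for i in range(len(w)):
        for _, weights in prefix_items(table, w[i:]):
            for j, v in enumerate(weights):
                points[i + j] = max(points[i + j], v)
    # Slot k of `word` is slot k + 1 of `w`; never break at either end.
    parts, start = [], 0
    for k in range(1, len(word)):
        if points[k + 1] % 2 == 1:
            parts.append(word[start:k])
            start = k
    parts.append(word[start:])
    return parts


table = build_table('''4m1p pu2t 5pute put3er
l1g4 lgo3 igo 2ith 4hm
hy3ph he2n hena4 hen5at ina n2at itio 2io'''.split())

print(prefix_items(table, 'puter'))
# [('put', (0, 0, 2, 0)), ('pute', (5, 0, 0, 0, 0)), ('puter', (0, 0, 0, 3, 0, 0))]
print(hyphenate('hyphenation', table))
# ['hy', 'phen', 'ation']
```

With the same patterns, hyphenate('computer', table) gives ['com', 'put', 'er'] and hyphenate('algorithm', table) gives ['al', 'go', 'rithm']. A trie replaces the repeated dict lookups in prefix_items with a single walk down the string, which is the reason to build one.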

TeX exceptions are words written with hyphens at exactly the points where they may be hyphenated.



convert_exceptions

 convert_exceptions (exceptions:collections.abc.Iterable[str])
              Type      Details
exceptions    Iterable  TeX-style exceptions: words with hyphens at the allowed breaks
Returns       Mapping   mapping from each unhyphenated word to its parts
assert convert_exceptions(['saippua-kauppias', 'xyzzy']) == {
    'saippuakauppias': ('saippua', 'kauppias'), 
    'xyzzy': ('xyzzy',)
}
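The exception format is simple enough that a sketch of the conversion fits in a few lines. This assumes hyphens are the only markup in an exception entry; `exceptions_table` is a hypothetical name and not necessarily how `convert_exceptions` is written.

```python
from collections.abc import Iterable


def exceptions_table(exceptions: Iterable[str]) -> dict[str, tuple[str, ...]]:
    """Hypothetical helper: map each unhyphenated word to its parts."""
    table = {}
    for entry in exceptions:
        parts = tuple(entry.split('-'))  # hyphens mark the only allowed breaks
        table[''.join(parts)] = parts    # key is the word with hyphens removed
    return table


print(exceptions_table(['saippua-kauppias', 'xyzzy']))
# {'saippuakauppias': ('saippua', 'kauppias'), 'xyzzy': ('xyzzy',)}
```

A hyphenator would typically consult this mapping first and fall back to the patterns only for words it does not contain, mirroring how TeX's \hyphenation entries override its patterns; that lookup order is an assumption of this sketch.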