02_hyphenation

Hyphenator

prerequisites

First a simple function to add hyphens at given positions:



add_hyphens

 add_hyphens (s:str, positions:collections.abc.Sequence[int],
              hyphen:str='-')
           Type      Default  Details
s          str                word to hyphenate
positions  Sequence           positions to insert hyphens (increasing order)
hyphen     str       -        hyphen character
Returns    str                word with hyphens

test_eq(add_hyphens('saippuakauppias', ()), 'saippuakauppias')
test_eq(add_hyphens('saippuakauppias', (7,)), 'saippua-kauppias')
test_eq(add_hyphens('saippuakauppias', (4, 7, 11)), 'saip-pua-kaup-pias')
test_eq(add_hyphens('', ()), '')
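
To make the behaviour concrete, here is a minimal sketch of how such a function could be written. The name add_hyphens_sketch is hypothetical and this is not necessarily how add_hyphens itself is implemented; it only assumes, as the table above states, that the positions are given in increasing order.

```python
from collections.abc import Sequence


def add_hyphens_sketch(s: str, positions: Sequence[int], hyphen: str = '-') -> str:
    """Insert `hyphen` into `s` before each index in `positions` (increasing order)."""
    pieces = []
    start = 0
    for pos in positions:
        pieces.append(s[start:pos])  # the chunk that ends at this hyphenation point
        start = pos
    pieces.append(s[start:])         # whatever remains after the last position
    return hyphen.join(pieces)


print(add_hyphens_sketch('saippuakauppias', (4, 7, 11)))  # saip-pua-kaup-pias
```

Cutting the word into chunks and joining them avoids the index shifting that would occur if hyphens were inserted into the string one at a time.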

Liang hyphenation algorithm

The following class implements the Liang hyphenation algorithm, given the patterns and exceptions. For each possible hyphenation slot, we take the maximum of the weights assigned by all matching patterns, and if that maximum is odd, we insert a hyphen. TeX has parameters called \lefthyphenmin and \righthyphenmin, with default values 2 and 3 respectively, meaning that a hyphen is forbidden if it leaves only one letter to its left or fewer than three letters to its right. The default patterns do produce such hyphens, so we must filter them out as well.
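
The rule described above can be sketched in a few lines. This is an illustration under stated assumptions, not the hyphenator code itself: the names parse_pattern and liang_positions are hypothetical, exceptions are ignored, the word is lowercased, and patterns are looked up by brute force over all substrings of the dot-padded word.

```python
import re


def parse_pattern(pattern: str) -> tuple[str, list[int]]:
    """Split a pattern such as 'hy3ph' into its letters and its slot weights."""
    letters = re.sub(r'\d', '', pattern)
    weights = [0] * (len(letters) + 1)  # one slot before, between and after the letters
    i = 0
    for ch in pattern:
        if ch.isdigit():
            weights[i] = int(ch)        # weight of the slot that has i letters to its left
        else:
            i += 1
    return letters, weights


def liang_positions(word, patterns, lefthyphenmin=2, righthyphenmin=3):
    """Return the indices in `word` before which a hyphen may be inserted."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = '.' + word.lower() + '.'   # dots mark the word boundaries
    slots = [0] * (len(padded) + 1)
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            weights = table.get(padded[start:end])
            if weights:
                for k, w in enumerate(weights):
                    # keep the maximum weight seen for each slot
                    slots[start + k] = max(slots[start + k], w)
    # slots[pos + 1] is the slot before word[pos]; odd means "hyphenate here"
    return [pos for pos in range(lefthyphenmin, len(word) - righthyphenmin + 1)
            if slots[pos + 1] % 2 == 1]


pats = '4m1p pu2t 5pute put3er'.split()
print(liang_positions('computer', pats, 1, 1))  # [3, 6], i.e. com-put-er
print(liang_positions('computer', pats))        # [3], the second hyphen is filtered out
```

With the default limits, the hyphen before the final "er" is dropped because it would leave only two letters to its right.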

hyphenator methods



hyphenator

 hyphenator
             (initializer:str|pathlib.Path|collections.abc.Iterable[str]|None,
              hyphen:str='-', lefthyphenmin:int=2, righthyphenmin:int=3,
              alphabet:str|None=None)

Hyphenates words

                Type                                                       Default  Details
initializer     str | pathlib.Path | collections.abc.Iterable[str] | None           filename of hyphen.tex, or an iterable of its lines, or None
hyphen          str                                                        -
lefthyphenmin   int                                                        2
righthyphenmin  int                                                        3
alphabet        str | None                                                 None     alphabet; None for ASCII default


hyphenator.hyphenate

 hyphenator.hyphenate (word:str)


hyphenator.add_exception

 hyphenator.add_exception (word:str, split:tuple[str,...]|None=None)
       Type                  Default  Details
word   str                            word to add, possibly with - characters to indicate hyphenation points
split  tuple[str, …] | None  None     how to split the word, or None to split at - characters


hyphenator.rm_exception

 hyphenator.rm_exception (word:str)
      Type  Details
word  str   word to make unexceptional, without hyphens

hyph = hyphenator(r'''
\patterns{
4m1p pu2t 5pute put3er
l1g4 lgo3 igo 2ith 4hm
hy3ph he2n hena4 hen5at ina n2at itio 2io
}
\hyphenation{
pro-gram
}
'''.splitlines(), lefthyphenmin=1, righthyphenmin=1)

assert hyph('computer') == 'com-put-er'
assert hyph('program') == 'pro-gram'
assert hyph('algorithm') == 'al-go-rithm'
assert hyph('hyphenation') == 'hy-phen-ation'

prefix exceptions

Finnish tends to form compound words, and it is unseemly to hyphenate them anywhere other than at the borders between their constituent subwords. Finnish also has a lot of declension, so it would be a fool’s errand to attempt to list every form of a single compound word. But the declension almost always happens at the end of the word, so we can add exceptions that depend only on a prefix of a word.

# mock Finnish rules made up for this example
import string  # needed for string.ascii_letters below

hyph = hyphenator(r'''
\patterns{
l1l n1p a1s se1ma a1na ä1a
}
'''.splitlines(), lefthyphenmin=1, righthyphenmin=1, alphabet=string.ascii_letters + 'åäöÅÄÖ')

words = 'sillanpää sillanpään sillanpäästä sillanpäänä sillanpääasema sillanpääasemana sillanpäät'.split()
test_eq([hyph(w) for w in words], 
        ['sil-lan-pää', 'sil-lan-pään', 'sil-lan-päästä', 'sil-lan-päänä',
         'sil-lan-pää-a-se-ma', 'sil-lan-pää-a-se-ma-na', 'sil-lan-päät'])
hyph.add_prefix_exception('sillan-pää')
hyph.add_prefix_exception('sillan-pää-asema')
test_eq([hyph(w) for w in words], 
        ['sillan-pää', 'sillan-pään', 'sillan-päästä', 'sillan-päänä',
         'sillan-pää-asema', 'sillan-pää-asemana', 'sillan-päät'])