test_eq(add_hyphens('saippuakauppias', ()), 'saippuakauppias')
test_eq(add_hyphens('saippuakauppias', (7,)), 'saippua-kauppias')
test_eq(add_hyphens('saippuakauppias', (4, 7, 11)), 'saip-pua-kaup-pias')
test_eq(add_hyphens('', ()), '')
02_hyphenation
prerequisites
First a simple function to add hyphens at given positions:
add_hyphens
add_hyphens (s:str, positions:collections.abc.Sequence[int], hyphen:str='-')
| | Type | Default | Details |
|---|---|---|---|
| s | str | | word to hyphenate |
| positions | Sequence | | positions to insert hyphens (increasing order) |
| hyphen | str | - | hyphen character |
| Returns | str | | word with hyphens |
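To make the signature above concrete, here is a minimal sketch of how such a function could be written. The name `add_hyphens_sketch` is hypothetical and the body is an assumption based only on the documented parameters and the tests on this page; it is not necessarily how the library implements `add_hyphens`.

```python
from collections.abc import Sequence


def add_hyphens_sketch(s: str, positions: Sequence[int], hyphen: str = '-') -> str:
    """Insert `hyphen` into `s` before each index in `positions`.

    Hypothetical stand-in for `add_hyphens`; assumes `positions` is in
    increasing order, as the parameter table requires.
    """
    pieces = []
    start = 0
    for pos in positions:
        # Cut the word at each hyphenation position.
        pieces.append(s[start:pos])
        start = pos
    # The tail after the last position (the whole word if there are none).
    pieces.append(s[start:])
    return hyphen.join(pieces)
```

Cutting the word into pieces and joining them avoids having to shift later positions after each insertion.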
Liang hyphenation algorithm
The following function implements the Liang hyphenation algorithm, given the patterns and exceptions. For each possible hyphenation slot, we take the maximum of all weights given by the patterns, and if the maximum is odd, we insert a hyphen. TeX has parameters called \lefthyphenmin and \righthyphenmin, with default values 2 and 3 (respectively), meaning that hyphens with only one letter to their left or only one or two to their right are forbidden. The default patterns produce such hyphens so we must also filter them out.
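The procedure described above can be sketched roughly as follows. This is an illustrative reconstruction, not the library's code: the names `parse_pattern` and `liang_positions` are hypothetical, exceptions are ignored, and the word is assumed to consist of letters that appear in the patterns.

```python
import re


def parse_pattern(pattern: str) -> tuple[str, list[int]]:
    """Split a pattern such as 'hy3ph' into ('hyph', [0, 0, 3, 0, 0])."""
    letters = re.sub(r'\d', '', pattern)
    weights = [0] * (len(letters) + 1)
    i = 0
    for ch in pattern:
        if ch.isdigit():
            weights[i] = int(ch)  # weight of the slot before letter i
        else:
            i += 1
    return letters, weights


def liang_positions(word: str, patterns: list[str],
                    lefthyphenmin: int = 2, righthyphenmin: int = 3) -> list[int]:
    """Return hyphenation positions in `word` (hypothetical helper)."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = f'.{word.lower()}.'  # '.' marks the word boundaries
    # slots[i] holds the maximum weight seen for the gap before padded[i]
    slots = [0] * (len(padded) + 1)
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            weights = table.get(padded[start:end])
            if weights is not None:
                for offset, w in enumerate(weights):
                    slots[start + offset] = max(slots[start + offset], w)
    # The gap before padded[i] sits at index i - 1 of the unpadded word.
    # Keep odd maxima that respect lefthyphenmin and righthyphenmin.
    return [i - 1 for i in range(len(padded) + 1)
            if slots[i] % 2 == 1
            and lefthyphenmin <= i - 1 <= len(word) - righthyphenmin]
```

With the toy patterns used in the example further down this page, `liang_positions('computer', patterns, 1, 1)` yields positions 3 and 6 (com-put-er), while the TeX defaults of 2 and 3 drop the second hyphen because only two letters would follow it.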
hyphenator methods
hyphenator
hyphenator (initializer:str|pathlib.Path|collections.abc.Iterable[str]|None, hyphen:str='-', lefthyphenmin:int=2, righthyphenmin:int=3, alphabet:str|None=None)
Hyphenates words
| | Type | Default | Details |
|---|---|---|---|
| initializer | str \| pathlib.Path \| collections.abc.Iterable[str] \| None | | filename of hyphen.tex, or an iterable of its lines, or None |
| hyphen | str | - | |
| lefthyphenmin | int | 2 | |
| righthyphenmin | int | 3 | |
| alphabet | str \| None | None | alphabet; None for ASCII default |
hyphenator.hyphenate
hyphenator.hyphenate (word:str)
hyphenator.add_exception
hyphenator.add_exception (word:str, split:tuple[str,...]|None=None)
| | Type | Default | Details |
|---|---|---|---|
| word | str | | word to add, possibly with - characters to indicate hyphenation points |
| split | tuple[str, ...] \| None | None | how to split the word, or None to split at - characters |
hyphenator.rm_exception
hyphenator.rm_exception (word:str)
| | Type | Details |
|---|---|---|
| word | str | word to make unexceptional, without hyphens |
hyph = hyphenator(r'''
\patterns{
4m1p pu2t 5pute put3er
l1g4 lgo3 igo 2ith 4hm
hy3ph he2n hena4 hen5at ina n2at itio 2io
}
\hyphenation{
pro-gram
}
'''.splitlines(), lefthyphenmin=1, righthyphenmin=1)
assert hyph('computer') == 'com-put-er'
assert hyph('program') == 'pro-gram'
assert hyph('algorithm') == 'al-go-rithm'
assert hyph('hyphenation') == 'hy-phen-ation'
prefix exceptions
Finnish tends to form compound words, and it is unseemly to hyphenate them at positions other than the borders between constituent subwords. Finnish also has a lot of declension so it would be a fool’s errand to attempt listing all forms of a single compound word. But the declension almost always happens at the end of the word, so we can add exceptions that depend only on a prefix of a word.
import string

# mock Finnish rules made up for this example
hyph = hyphenator(r'''
\patterns{
l1l n1p a1s se1ma a1na ä1a
}
'''.splitlines(), lefthyphenmin=1, righthyphenmin=1, alphabet=string.ascii_letters + 'åäöÅÄÖ')
words = 'sillanpää sillanpään sillanpäästä sillanpäänä sillanpääasema sillanpääasemana sillanpäät'.split()
test_eq([hyph(w) for w in words],
['sil-lan-pää', 'sil-lan-pään', 'sil-lan-päästä', 'sil-lan-päänä',
'sil-lan-pää-a-se-ma', 'sil-lan-pää-a-se-ma-na', 'sil-lan-päät'])
hyph.add_prefix_exception('sillan-pää')
hyph.add_prefix_exception('sillan-pää-asema')
test_eq([hyph(w) for w in words],
['sillan-pää', 'sillan-pään', 'sillan-päästä', 'sillan-päänä',
'sillan-pää-asema', 'sillan-pää-asemana', 'sillan-päät'])