[whatwg] Hyphenation

Øistein E. Andersen html5 at xn--istein-9xa.com
Thu Jan 11 16:15:33 PST 2007

On 11 Jan 2007, at 5:33PM, Håkon Wium Lie wrote:

> The term "hyphenation dictionary" is quite common, but I see your
> point. What would be a better name for the property?

>  hyphenation-pattern
>  hyphenation-list
>  hyphenation-resource

Liang's paper `Word Hy-phen-a-tion by Com-put-er', in which the concept
was first introduced, used the term `hyphenation patterns'. Unsurprisingly,
Liang's supervisor, Knuth, used the same term in the TeXbook, and this
expression seems to have become the generally accepted one amongst TeX users.

`Hyphenation dictionary' is also common, but this tends to mean something
slightly different. To exemplify, the first five lines of what I would call a
hyphenation dictionary look like this:
> a cap·pel·la
> a for·ti·o·ri
> a go·go
> a pos·te·ri·o·ri
> a pri·o·ri

[Interestingly, this particular dictionary contains multi-word expressions, but
most hyphenation engines, as well as spelling checkers, cannot take advantage of
these, as each word (according to some definition) is typically treated in isolation.]
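
As a rough illustration of what consulting such a dictionary amounts to, here is a
minimal Python sketch. The function name `load_dictionary` and the representation
(break offsets counted in characters of the unhyphenated entry) are my own
assumptions, not taken from any particular engine.

```python
def load_dictionary(lines):
    """Map each unhyphenated entry to the list of offsets where a break is allowed.

    Hypothetical helper: each input line is an entry such as 'a pri·o·ri',
    with '·' marking the permitted break points.
    """
    table = {}
    for line in lines:
        entry = line.strip()
        if not entry:
            continue
        breaks = []
        offset = 0  # number of real characters seen so far
        for ch in entry:
            if ch == "·":
                breaks.append(offset)
            else:
                offset += 1
        table[entry.replace("·", "")] = breaks
    return table


table = load_dictionary(["a cap·pel·la", "a pri·o·ri"])
print(table.get("a cappella"))  # whole expression: found
print(table.get("cappella"))    # single word looked up in isolation: not found
```

The second lookup shows why an engine that treats each word in isolation cannot
benefit from the multi-word entries.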

In contrast, the first five hyphenation patterns in TeX82 are the following:
> .ach4
> .ad4der
> .af1t
> .al3t
> .am5at

I think it is useful to keep the distinction and would suggest renaming the
property in question `hyphenation-patterns'. (TeX's exception dictionary
falls within this narrower definition of a hyphenation dictionary.)
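
To show how patterns differ from dictionary entries in practice, here is a minimal
Python sketch of how a Liang-style pattern is read and applied: digits are weights
on the gaps between letters, `.` marks a word boundary, the highest weight at each
gap wins, and an odd result permits a break. The function names are hypothetical,
and the sketch leaves out the minimum fragment lengths and the exception
dictionary that TeX also applies, so it is not a substitute for the real algorithm.

```python
import re


def parse_pattern(pattern):
    """Split a pattern such as '.af1t' into its letters and per-gap weights.

    weights[i] is the weight of the gap after the first i letters.
    """
    letters = re.sub(r"\d", "", pattern)
    weights = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            weights[pos] = int(ch)
        else:
            pos += 1
    return letters, weights


def hyphenate(word, patterns):
    """Insert '-' wherever the highest matching weight at a gap is odd."""
    padded = "." + word.lower() + "."
    scores = [0] * (len(padded) + 1)  # scores[k]: gap before padded[k]
    for pattern in patterns:
        letters, weights = parse_pattern(pattern)
        start = padded.find(letters)
        while start != -1:
            for i, weight in enumerate(weights):
                scores[start + i] = max(scores[start + i], weight)
            start = padded.find(letters, start + 1)
    pieces = []
    for k, ch in enumerate(word):
        # word[k] is padded[k + 1], so the gap before it is scores[k + 1].
        if k > 0 and scores[k + 1] % 2 == 1:
            pieces.append("-")
        pieces.append(ch)
    return "".join(pieces)


# Using only the single pattern '.af1t' quoted above:
print(hyphenate("after", [".af1t"]))  # af-ter
```

Unlike a dictionary entry, one pattern applies to every word that begins with
`aft`, which is why a comparatively small pattern file can cover a whole language.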

http://computing-dictionary.thefreedictionary.com/hyphenation says:
> HYPHENATION: Breaking words that extend beyond the right margin.
> Software hyphenates words by matching them against a hyphenation
> dictionary or by using a built-in set of rules, or both.

http://www.answers.com/topic/hyphenation-dictionary is more specific:
> HYPHENATION DICTIONARY: A word file with predefined hyphen locations.

gives a more generic definition:
> A file, usually in a word processing or desktop publishing program,
> which defines where hyphens will be placed for common words.

Google returns about 21,200 results for /hyphenation dictionar(y|ies)/ and
148,100 for /hyphenation patterns?/, so the latter should also be fairly common.

To me, a `hyphenation list' suggests something rather like a hyphenation
dictionary, whereas `hyphenation resource' probably should be reserved
for a more comprehensive source of hyphenation information — unless
the same property is supposed to be able to refer to different kinds
of hyphenation data.

>>>> [In TeX], hyphenation can [also] be indicated locally.
>>>> This is needed in order to hyphenate words like
>>>> rec-ord/re-cord and is the only level that deals with
>>>> spelling changes.

> &shy; is probably the best way to encode this. However, it can be done
> through CSS as well:

>    Don't wait for <span style="hyphenation-dictionary: rec-ord.dic">record
>    </span> companies, <span style="hyphenation-dictionary: re-cord.dic">
>    record</span> yourself.

Right, I did not get your point at first. This does indeed cover the first reason
to use explicit mark-up in TeX.

Concerning spelling changes, Petr Sojka's `Notes on Compound Word
Hyphenation in TeX' [1], section 3.2, describes how a minimally extended
version of the TeX algorithm can deal with irregular hyphenation without any
extraneous mark-up, i.e., without any unnecessary burden on the author.
Perhaps an idea for Prince7?

Anyway, the preliminary conclusion seems to be that a <hyph> element in HTML
is unnecessary, so this discussion should probably continue somewhere else.

[1] http://www.fi.muni.cz/usr/sojka/papers/tug95.pdf

Øistein E. Andersen
