[whatwg] Hyphenation
Øistein E. Andersen
html5 at xn--istein-9xa.com
Thu Jan 11 16:15:33 PST 2007
On 11 Jan 2007, at 5:33PM, Håkon Wium Lie wrote:
> The term "hyphenation dictionary" is quite common, but I see your
> point. What would be a better name for the property?
> hyphenation-pattern
> hyphenation-list
> hyphenation-resource
Liang's paper `Word Hy-phen-a-tion by Com-put-er', in which the concept
was first introduced, used the term `hyphenation patterns'. Unsurprisingly,
Liang's supervisor, Knuth, used the same term in the TeXbook, and this
expression seems to have become the generally accepted one amongst TeX users.
`Hyphenation dictionary' is also common, but this tends to mean something
slightly different. To exemplify, the first five lines of what I would call a
hyphenation dictionary look like this:
> a cap·pel·la
> a for·ti·o·ri
> a go·go
> a pos·te·ri·o·ri
> a pri·o·ri
[Interestingly, this particular dictionary contains multi-word expressions, but
most hyphenation engines, as well as spelling checkers, cannot take advantage of
these, as each word (according to some definition) is typically treated in isolation.]
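To make the bracketed remark concrete, here is a minimal Python sketch of a dictionary lookup. The function names and the use of the middle dot (U+00B7) as break marker are my own assumptions for illustration, not the format of any particular engine; the entries are the five lines quoted above.

```python
MIDDLE_DOT = "\u00b7"  # break marker used in the quoted dictionary lines

def load_dictionary(lines):
    """Map each unhyphenated entry to its hyphenated form."""
    table = {}
    for line in lines:
        entry = line.strip()
        if entry:
            table[entry.replace(MIDDLE_DOT, "")] = entry.replace(MIDDLE_DOT, "-")
    return table

def lookup_expression(text, table):
    """Look up the whole expression, multi-word entries included."""
    return table.get(text, text)

def lookup_per_word(text, table):
    """Look up each space-separated word in isolation (hypothetical engine)."""
    return " ".join(table.get(word, word) for word in text.split())

ENTRIES = [
    "a cap\u00b7pel\u00b7la",
    "a for\u00b7ti\u00b7o\u00b7ri",
    "a go\u00b7go",
    "a pos\u00b7te\u00b7ri\u00b7o\u00b7ri",
    "a pri\u00b7o\u00b7ri",
]
TABLE = load_dictionary(ENTRIES)
```

With this table, lookup_expression("a priori", TABLE) finds the entry, whereas lookup_per_word("a priori", TABLE) returns the text unchanged, since neither `a' nor `priori' is an entry on its own.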
In contrast, the first five hyphenation patterns in TeX82 are the following:
> .ach4
> .ad4der
> .af1t
> .al3t
> .am5at
I think it is useful to keep the distinction and would suggest renaming the
property in question to `hyphenation-patterns'. (TeX's exception dictionary
falls within this narrower definition of a hyphenation dictionary.)
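For readers unfamiliar with patterns, the following Python sketch shows roughly how they are applied: the word is wrapped in dots, every matching pattern contributes its digits at the positions between letters, the highest digit wins at each position, and odd values permit a break. This is a simplified illustration, not TeX's implementation. The pattern `d1d' is made up for the example, the min_left/min_right limits of 2 and 3 are an assumption meant to mirror TeX's usual defaults, and the exceptions argument stands in for an exception dictionary.

```python
import re

def parse_pattern(pattern):
    """Split a pattern such as '.ach4' into its letters and digit weights."""
    letters = re.sub(r"\d", "", pattern)
    weights = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            weights[pos] = int(ch)  # digit sits before the letter at index pos
        else:
            pos += 1
    return letters, weights

def hyphenate(word, patterns, exceptions=None, min_left=2, min_right=3):
    """Return word with '-' at permitted breaks (simplified sketch)."""
    exceptions = exceptions or {}
    if word.lower() in exceptions:        # exception dictionary takes priority
        return exceptions[word.lower()]
    dotted = "." + word.lower() + "."
    scores = [0] * (len(dotted) + 1)      # scores[i]: position before dotted[i]
    for letters, weights in map(parse_pattern, patterns):
        start = dotted.find(letters)
        while start != -1:
            for offset, weight in enumerate(weights):
                scores[start + offset] = max(scores[start + offset], weight)
            start = dotted.find(letters, start + 1)
    pieces, last = [], 0
    for k in range(min_left, len(word) - min_right + 1):
        if scores[k + 1] % 2 == 1:        # odd value: break allowed before word[k]
            pieces.append(word[last:k])
            last = k
    pieces.append(word[last:])
    return "-".join(pieces)

# The five TeX82 patterns quoted above, plus one hypothetical pattern 'd1d'.
PATTERNS = [".ach4", ".ad4der", ".af1t", ".al3t", ".am5at", "d1d"]
```

With these six patterns, hyphenate("ladder", PATTERNS) breaks between the two d's, while hyphenate("adder", PATTERNS) does not: the 4 in `.ad4der' outranks the 1 in `d1d' and, being even, forbids the break.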
http://computing-dictionary.thefreedictionary.com/hyphenation says:
> HYPHENATION: Breaking words that extend beyond the right margin.
> Software hyphenates words by matching them against a hyphenation
> dictionary or by using a built-in set of rules, or both.
http://www.answers.com/topic/hyphenation-dictionary is more specific:
> HYPHENATION DICTIONARY: A word file with predefined hyphen locations.
http://www.computeruser.com/resources/dictionary/definition.html?lookup=2188
gives a more generic definition:
> A file, usually in a word processing or desktop publishing program,
> which defines where hyphens will be placed for common words.
Google returns about 21,200 results for /hyphenation dictionar(y|ies)/ and
148,100 for /hyphenation patterns?/, so the latter should also be fairly common.
To me, a `hyphenation list' suggests something rather like a hyphenation
dictionary, whereas `hyphenation resource' probably should be reserved
for a more comprehensive source of hyphenation information — unless
the same property is supposed to be able to refer to different kinds
of hyphenation data.
>>>> [In TeX], hyphenation can [also] be indicated locally.
>>>> This is needed in order to hyphenate words like
>>>> rec-ord/re-cord and is the only level that deals with
>>>> spelling changes.
> is probably the best way to encode this. However, it can be done
> through CSS as well:
> Don't wait for <span style="hyphenation-dictionary: rec-ord.dic">record
> </span> companies, <span style="hyphenation-dictionary: re-cord.dic">
> record</span> yourself.
Right, I did not get your point at first. This does indeed cover the first reason
to use explicit mark-up in TeX.
Concerning spelling changes, Petr Sojka's `Notes on Compound Word
Hyphenation in TeX' [1], section 3.2, describes how a minimally extended
version of the TeX algorithm can deal with irregular hyphenation without any
extraneous mark-up, i.e., without any unnecessary burden on the author.
Perhaps an idea for Prince7?
Anyway, the preliminary conclusion seems to be that a <hyph> element in HTML
is unnecessary, so this discussion should probably continue somewhere else.
[1] http://www.fi.muni.cz/usr/sojka/papers/tug95.pdf
--
Øistein E. Andersen