[whatwg] several messages about HTML5 -- authors' tools
ddailey at zoominternet.net
Thu Feb 22 05:14:25 PST 2007
Interesting thread (including various sub-ravels thereof).
Suppose in a semantically charged, but markup-impoverished medium such as
the textual narrative (constituting the majority of the content of the web
as we know it), we seek to build the word processor that generates not only
the surface structure (the sentences and paragraphs) but the semantic
structure as well. How do we minimize the author's effort? Authors will not
want to write both their utterances and the translation of those utterances
into semantic tags -- it is simply too labor intensive (unless we care more
about form than substance and choose to purge ill-formed ideas from the human
Rather we may seek a word-processor that deduces semantics from authors'
expressions. Yeah, without the existence of full-blown AI (which has been a
while in coming, now), 40% (or so) of such deductions will be incorrect. But
suppose following the creation of a sentence, or a paragraph, or a larger
chunk of text, semantically enabled software were to pose to the author a
finite (and hopefully small) number of "deductive disambiguators"?
"Dear author, did you mean to imply that AI will indeed be arriving soon?"
"Dear author, who exactly does 'we' refer to in the above paragraph -- I'm
sorry, but I see no people mentioned before?"
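A toy sketch of one such disambiguator, under loud assumptions: the
function name, the tiny PEOPLE_WORDS lexicon, and the single "we" rule are
all hypothetical stand-ins for the parsers and thesauri a real tool would
need. It only illustrates the shape of the loop (read a chunk of text, then
hand the author a short list of questions).

```python
import re
from typing import List

# Hypothetical lexicon: words assumed to introduce a group of people.
PEOPLE_WORDS = {"authors", "people", "editors", "readers", "users"}


def disambiguators(paragraph: str) -> List[str]:
    """Return questions to pose to the author about an unanchored 'we'."""
    questions = []
    seen_people = False
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.?!])\s+", paragraph.strip()):
        words = {w.lower() for w in re.findall(r"[A-Za-z']+", sentence)}
        # Flag 'we' only if no earlier sentence mentioned any people.
        if "we" in words and not seen_people:
            questions.append(
                "Dear author, who exactly does 'we' refer to in: "
                f'"{sentence}"?'
            )
        if words & PEOPLE_WORDS:
            seen_people = True
    return questions
```

The author's answers would then be stored as semantic annotations alongside
the text; that storage step is left out of the sketch.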
Using a relatively simple inference engine (SFOL + set theory + predicate
calculus + arithmetic + time + causality + modal logic) coupled with
thesauri and parsers (all available client-side these days), and (most
importantly) the author's expert intervention, I rather suspect that the
40% (incorrect deductions) could be brought down to 8% with an additional
cost of 20% in authorial time investment. With current software that most
folks use, and requiring authors to generate their own semantics, I think we
might expect to achieve 5% spurious deduction with 400% additional
investment of authors' time. The cost-benefit ratio is just too high with
current desktop tools.
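To make the comparison concrete, here is the arithmetic on the guessed
figures above, as a small sketch. It assumes the same 40% baseline for both
scenarios, and the function name is made up for illustration; the output is
only as good as the guesses that go in.

```python
def reduction_per_effort(baseline_error, final_error, extra_time):
    """Percentage points of error removed per percent of extra author time."""
    return (baseline_error - final_error) / extra_time


# Assisted tool: 40% -> 8% incorrect deductions for 20% more author time.
assisted = reduction_per_effort(40, 8, 20)   # 1.6 points per percent
# Hand-written semantics: 40% -> 5% for 400% more author time.
manual = reduction_per_effort(40, 5, 400)    # 0.0875 points per percent

# Payoff per unit of author effort, assisted relative to manual (about 18x).
advantage = assisted / manual
```

On these numbers the assisted approach removes roughly eighteen times as
many incorrect deductions per unit of extra author time, at the price of a
somewhat higher residual error (8% versus 5%).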
In semantically impoverished (not in the evocative space it engenders, but
in the surface expression of its utterance) but markup-rich environments
such as SVG, the generation of a parallel semantic substrate is going to be
a lot more difficult, but maybe that's why we have things like sXBL: to
allow semantics to be imported from other disciplines.
That's one approach. Another is to build a semantic expression system for
which we abandon our native languages and agree to write in a semantic
shorthand (with lots of parentheses, by the way). For even one language, the
task of finding a minimal set of semantic primitives (from its monolingual
dictionary) is NP-complete, but if we seek such a shorthand to span the
space of human semantics, it may take longer to bring into existence than AI
itself. The different language families I have looked at probably share a
core semantics of only about 20% of the expressive space of any one language
by itself. The nice thing about such languages is that people from different
linguistic backgrounds can all read the same text; the hassle is that it's
hard to translate ordinary expressions into such languages.
----- Original Message -----
From: "Elliotte Harold" <elharo at metalab.unc.edu>
To: "Ian Hickson" <ian at hixie.ch>
Cc: <whatwg at lists.whatwg.org>; "Vlad Alexander (xhtml.com)"
<vlad.alexander at xhtml.com>
Sent: Wednesday, February 21, 2007 4:34 PM
Subject: Re: [whatwg] several messages about HTML5
> Ian Hickson wrote:
>> The original reason I got involved in this work is that I realised that
>> the human race has written literally billions of electronic documents,
>> but without ever actually saying how they should be processed.
> That's a feature, not a bug.
>> If, in a thousand years, someone found a trove of HTML documents and
>> decided they would write an HTML browser to view them, they couldn't do
>> it! Even with the existing HTML specs -- HTML4, SGML, DOM2 HTML, etc -- a
>> perfect implementation couldn't render the vast majority of documents as
>> they were originally intended.
> Authorial intent is a myth. Documents don't have to be rendered like the
> author intended, nor should we expect them to be. We don't read Homer
> like Homer intended, but we still read him, well more than a thousand
> years later. (For one thing Homer actually intended that people listen to
> the poems, not read them.)
> This is not to say that I don't think it's useful to define a standard
> tree structure for documents. It is useful. However the benefit of this
> exercise is not in maintaining authorial intent. That's tilting at
> windmills, and will never succeed no matter what we do.
> Elliotte Rusty Harold elharo at metalab.unc.edu
> Java I/O 2nd Edition Just Published!