[whatwg] Sentence structure

Ian Hickson ian at hixie.ch
Fri Jan 11 14:59:41 PST 2013


On Sat, 12 Jan 2013, Vipul S. Chawathe wrote:
> 
> I'm doing some related work that requires machine translation along the lines
> of exporting/importing HTML snippets. Human-language content boundaries are
> directly determined by the author's grammatical punctuation skills at the
> sentence level.

Sure, but if the author isn't competent enough to use punctuation, I think 
we're probably not going to be able to rely on them using <sentence> 
correctly either, at the end of the day.


> HTML is everything to-do tied-up with GUI web-browsers, so machine 
> translation, screen readers, & so forth are supported through other 
> "living" standards GRDDL XSLT RDFa that also work with HTML as one of 
> multiple possible host, however their relationship with XML 
> serialization as dependency for proper functioning might cause browser 
> engine makers to promote sticking to microdata, unless someday we get 
> Google SilverFlash.java Safari plug-in so that one size will fit all. As 
> HTML is host language in wide-spread use (my apologies for lacking 
> statistics that I compensate by deriving statements from common sense), 
> perhaps this is starting point for raising concerns that may be 
> redirected into other specs too. It's the only opening for those rare 
> use cases as the story of Emperor's New Clothes.
> Getting back to business, for larger content fragments there's the p 
> element. An immediate citation is search results cut-off abrupt 
> fragments in content preview. For improvising on such fragment indices 
> they've come up with schema.org vocab which I just had to remind here. 
> They've got provision to specialize from their general pre-defined 
> types, so Thing>WebPageElement can be used to get 
> Thing>WebPageElement>Paragraph>Sentence This can be expressed using 
> html5 microdata itemtype attribute as: <span itemscope="itemscope" 
> itemtype="http://www.schema.org/thing/webpage/webpageelement/paragraph/sentence">One
> whole sentence!</span> HTML5 without XML serialization will
> allow to skip ="itemscope" too! saves 12 characters, savings comparable 
> to those recommended by minifying. :-)

I'm sorry, but I've no idea what you're saying here.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


