[whatwg] Re: [css3-speech] Proposal: an aural box model
fantasai.lists at inkedblade.net
Thu Aug 5 02:40:21 PDT 2004
> "The pause is inserted between the element's content and any
> 'cue-before' or 'cue-after' content." 
> In my opinion there is a need to think about a new kind of *aural box
> model*, which (as far as I know) has not been defined yet, to be able
> to exactly understand how 'pause' works, how generated content is to be
> added with pseudo-elements, and to realize what we are missing in the
> Currently, a 'pause' is defined as "a pause or prosodic boundary to be
> observed before (or after) speaking an element's content". A 'cue' is
> defined as a sound to be "played before and/or after the element to
> delimit it". A 'pause' "is inserted between the element's content and
> any 'cue-before' or 'cue-after' content".
> This describes a model that can be rendered visually in the following way:
> cue-before . pause-before . <element> . pause-after . cue-after
> and can be compared to the visual box model, in that the 'cue' is
> the aural equivalent of 'border' and 'pause' is the equivalent of
> 'padding'.
> Defining an aural box also helps determine exactly where
> generated content would be added by any pseudo-element.
> The issue is that there is no aural equivalent to 'margin', i.e. there
> is no way to determine the interval of time between the 'cue-after' of
> an element and the 'cue-before' of the next element.
Are you suggesting we define pause-collapsing like margin collapsing? :)
IMO, the pause should really be outside the cue. If I'm pausing between
list elements, I would pause after the ending cue of one and before the
beginning cue of the next, and not so much between the cue and its content.
And pauses should collapse, because if I have markup like this:
I wouldn't want the pause-after of a
list item to *concatenate* with the pause-after of the list itself /and/
the pause-after of the section /and/ the pause-before of the next section
/and/ the pause-before of the title. I'd just want to pause for the maximum
of all of them. Unless, of course, one of them has a cue.
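To make the collapsing rule concrete, here is a minimal sketch in Python, not taken from any spec: adjacent pauses collapse to the longest one, and a cue in between stops the collapse. The function name collapse_pauses and the (kind, value) tuple representation are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple, Union

# Hypothetical event representation (an assumption, not from CSS Speech):
#   ("pause", seconds)  or  ("cue", sound_name)
Event = Tuple[str, Union[float, str]]


def collapse_pauses(events: List[Event]) -> List[Event]:
    """Merge each run of adjacent pauses into one pause of the maximum length.

    A cue breaks the run, so pauses on either side of a cue are kept separate.
    """
    collapsed: List[Event] = []
    for kind, value in events:
        if kind == "pause" and collapsed and collapsed[-1][0] == "pause":
            # Adjacent pauses: keep the longer one instead of adding them up.
            collapsed[-1] = ("pause", max(collapsed[-1][1], value))
        else:
            collapsed.append((kind, value))
    return collapsed
```

With pause-after values on a list item, the list, and the section, followed by pause-before values on the next section and its title, the whole run collapses to a single pause as long as the longest of them. If any of those elements carries a cue, the pauses before and after the cue stay separate.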
So, imho, the box model for aural CSS should be
where cue-padding pads the cue so it doesn't run up against the edge of the
content (the same way padding in visual CSS pads the border so it doesn't run
up against the content).
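A rough sketch of that ordering in Python, assuming the pause sits outside the cue and cue-padding sits between the cue and the content, as argued above. The function name aural_box, its parameters, and the tuple labels are made up for illustration and do not come from any spec. Using one symmetric cue_padding value for both sides is also an assumption.

```python
from typing import List, Optional, Tuple, Union

Segment = Tuple[str, Union[float, str]]


def aural_box(
    content: str,
    pause_before: float = 0.0,
    cue_before: Optional[str] = None,
    cue_padding: float = 0.0,
    cue_after: Optional[str] = None,
    pause_after: float = 0.0,
) -> List[Segment]:
    """Return the playback order for one element, outermost parts first."""
    segments: List[Segment] = []
    if pause_before > 0:
        segments.append(("pause", pause_before))          # outside the cue
    if cue_before is not None:
        segments.append(("cue", cue_before))
        if cue_padding > 0:
            segments.append(("cue-padding", cue_padding))  # keeps cue off the content
    segments.append(("content", content))
    if cue_after is not None:
        if cue_padding > 0:
            segments.append(("cue-padding", cue_padding))
        segments.append(("cue", cue_after))
    if pause_after > 0:
        segments.append(("pause", pause_after))           # outside the cue
    return segments
```

In this sketch cue-padding is only emitted when there is a cue to pad, the same way padding in the visual model only matters visibly when there is a border or background to separate from the content.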
I'd like to note, btw, that the correspondence between CSS Speech and SSML
need not become an absolute 1-1 syntactic mapping. It should be possible
to process CSS Speech + Markup to output SSML, but the properties and their
behavior need not be exactly the same. To constrain CSS like that is only
going to cause trouble, because the CSS model has its own constraints (in
the mechanism of the Cascade, for example). To be elegant, CSS sometimes
requires a design slightly different from that which works passingly well
in an XML language that embeds the presentational information directly in