[whatwg] Interpretation issue: can <section> be used for "extended paragraphs"?

Jukka K. Korpela jkorpela at cs.tut.fi
Wed Jun 15 00:09:12 PDT 2011


2011-06-14 10:32, Ian Hickson wrote:

> On Thu, 10 Mar 2011, Jukka K. Korpela wrote:
[...]
>> A sentence in the text may continue with list items, displayed e.g. as a
>> bulleted list. So the list breaks the paragraph as a block of text but
>> not logically - the list items are part of the sentence just as they
>> would be if they were just mentioned in the text, for example using 1)
>> numbers in the text, 2) letters in the text, or 3) no special notation.
>
> Indeed, but "block of text" is pretty much what a paragraph is -- it isn't
> a logical construct.

The word "paragraph" is ambiguous, as the current wording says indirectly 
but clearly: It defines "The p element represents a paragraph", but the 
word "paragraph" links to the following:

"The term paragraph as defined in this section is distinct from (though 
related to) the p element defined later. The paragraph concept defined 
here is used to describe how to interpret documents.

A paragraph is typically a run of phrasing content that forms a block of 
text with one or more sentences that discuss a particular topic, as in 
typography, but can also be used for more general thematic grouping. For 
instance, an address is also a paragraph, as is a part of a form, a 
byline, or a stanza in a poem."

So it says that p is a paragraph, linking to an explanation that says 
paragraph is different from p. The explanation mentions "the term 
paragraph as defined in this section", but it does not give a definition 
- a sentence that begins with "A paragraph is typically" is a prelude to 
a definition, not a definition.

I gather that "The p element represents a paragraph" more or less means 
"the p element denotes a block of text". Can you make this more 
explicit, please? This is very confusing even to people who regularly 
read specifications for breakfast. In the current wording, there are 
_two_ dimensions of vagueness of "paragraph": whether it is the 
classical concept of text that discusses one topic or the layout concept 
roughly corresponding to the old HTML block concept; and whether it is 
about explicitly marked-up elements (<p>...</p>) or more generally about 
constructs whose "paragraphness" might be inferred by some rules.

It would probably be best to dispense with the word "paragraph", as many 
people can't avoid thinking that paragraph is something logical, not the 
layout concept of a block of text. But unfortunately, in HTML heritage, 
the p element is not _purely_ a block of text. Beyond its name and old 
descriptions, it is associated with the logical concept of a paragraph 
because p elements have default top and bottom margins. So they 
differ from div elements. A div element containing only text can be 
characterized as a block of text, and so can a p element, but there's a 
difference.

Maybe something like the following might express this:

The p element represents a block of text so that consecutive p elements 
are regarded as topically distinct. The name p comes from "paragraph", 
and the p element typically corresponds to a paragraph in prose, i.e. a 
subdivision of text that deals with one point or gives the words of one 
speaker in a discussion. However, the p element is also used for other 
thematic grouping, for example for a byline, for a mailing address, for 
a label and associated field in a form, or for a stanza in a poem.

Visual browsers are expected to render p elements by default with empty 
lines before and after, caused by default top and bottom margins.

>> a) for styling purposes (you need a container element so that you can
>> specify, without clumsily using classes on both the P and the UL, e.g.
>> that vertical spacing be reduced or zero)
>
> <div> handles this case:
>
>     <div>This sentence
>      <ol>
>       <li>contains
>       <li>a list
>      </ol>
>     ...and is made of four paragraphs but can be styled as one since the
>     <div> element is used instead of <p>.</div>

But if this follows, for example, a table, then extra measures would be 
needed to create vertical spacing. Using the p element would make the 
spacing the default. Similarly for spacing after this construct. So it 
would be more robust to use <p>...</p> markup here. Or you would need to 
assign style properties to the div element, effectively making it 
formatted the same way as p elements normally are, in your document.
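
As a rough sketch of that last option, assuming the div in your example 
is given a class (the name "sentence" is made up here) and taking 1em as 
the vertical margin that browsers typically apply to p by default:

    /* hypothetical class on the div; mimics the usual default p margins */
    div.sentence { margin: 1em 0; }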

I don't think anonymous blocks of text are a good idea. There was a 
reason why they were frowned upon in HTML 4.01. After years of favoring 
<p>...</p> as a container, as opposed to the original idea that allowed 
<p> as an empty element indicating a paragraph break, it seems odd to give 
so many examples with "loose" text.

So I hope an example like the above but with <p>...</p> markup can be 
added, to answer the common question (which is often formulated in terms 
of a "list header", but it's really about something that starts as a 
paragraph and then moves to listing things down as a bulleted list).

Maybe an explanation like this might be added (perhaps even after the 
definition of p, as it really clarifies the concept):

Within a p element, only phrasing content (previously called 
"text-level" content) is allowed. This implies that it cannot contain a 
list element or a table element for example. A part of a document that 
discusses one topic is normally marked up as one p element, but if it 
contains lists for example, it needs to be marked up as one or more p 
elements intermixed with (not containing) one or more lists. The part 
may be marked up as a div element to group the elements for the purposes 
of styling and scripting, for example:

<div class="p">
<p>This is text, which may be just a list header (introduction to
the list) or a longer presentation.</p>
<ul>
   <li>an item</li>
   <li>another item</li>
</ul>
<p>Here we may have text that logically continues the discussion
of the topic.</p>
</div>
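
A style sheet could then use the wrapper, for instance to remove the 
vertical spacing inside the group while keeping the spacing around it. 
This is only a sketch; the 1em value is an assumption based on typical 
browser defaults for p, and the class name p is the one used in the 
markup above:

    /* no gaps between the text and the list inside the group */
    div.p > p, div.p > ul { margin-top: 0; margin-bottom: 0; }
    /* the group as a whole is spaced like a normal paragraph */
    div.p { margin: 1em 0; }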

* * *

I know this suggestion is long and raw, but I hope its basic content is 
something we can agree on. And I have no big problem with using div 
markup here, even though it somewhat goes against the spirit of modern HTML.

-- 
Yucca, http://www.cs.tut.fi/~jkorpela/


