[Imps] Reasonable limits on buffered values

Wed Jul 4 01:55:46 PDT 2007

On Thu, 28 Dec 2006, Henri Sivonen wrote:
>
> My primary strategy against denial of service attacks that target the 
> conformance checking service is to limit the number of bytes accepted as 
> input. This indirectly throttles everything that is proportional to the 
> size of input, which is OK for most stuff that has linear growth 
> behavior. (It doesn't address things like the billion laughs attack, 
> though.)
> 
> I have additionally placed arbitrary hard limits on the size of 
> particular buffers.

I recommend a simpler and broader strategy: limit the total CPU and memory 
usage of the process. After a certain level of CPU or memory usage, 
possibly monitored by a separate, higher priority thread, simply terminate 
the algorithm and explain that the system cannot handle the given 
document.
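
As a rough sketch of that idea (not code from any existing checker), the 
Python below runs a monitor on a separate thread that samples process CPU 
time and traced memory. Watchdog, ResourceLimitExceeded, check_document and 
the limit values are all hypothetical names chosen for illustration. Python 
cannot raise a thread's priority or kill another thread, so the monitor 
only sets a flag and the algorithm polls it through check().

```python
import threading
import time
import tracemalloc


class ResourceLimitExceeded(Exception):
    """Raised when the checker gives up on a document."""


class Watchdog:
    """Hypothetical monitor: samples CPU time and traced memory on a side thread."""

    def __init__(self, max_cpu_seconds, max_bytes, poll_interval=0.01):
        self.max_cpu_seconds = max_cpu_seconds
        self.max_bytes = max_bytes
        self.poll_interval = poll_interval
        self.reason = None
        self._tripped = threading.Event()
        self._done = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def __enter__(self):
        tracemalloc.start()
        self._cpu_start = time.process_time()
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._done.set()
        self._thread.join()
        tracemalloc.stop()
        return False  # never swallow the exception

    def _run(self):
        # Wake up every poll_interval until told to stop or a limit is hit.
        while not self._done.wait(self.poll_interval):
            cpu = time.process_time() - self._cpu_start
            current, _peak = tracemalloc.get_traced_memory()
            if cpu > self.max_cpu_seconds:
                self.reason = "CPU limit exceeded"
            elif current > self.max_bytes:
                self.reason = "memory limit exceeded"
            if self.reason:
                self._tripped.set()
                return

    def check(self):
        """Called by the algorithm at convenient points (e.g. once per token)."""
        if self._tripped.is_set():
            raise ResourceLimitExceeded(self.reason)


def check_document(chunks, watchdog):
    """Stand-in for the real algorithm: buffers something for every chunk."""
    kept = []
    for chunk in chunks:
        watchdog.check()
        kept.append(chunk * 2)
    return len(kept)
```

A caller would wrap the run in "with Watchdog(...) as wd:", catch 
ResourceLimitExceeded, and report that the system cannot handle the given 
document. The appeal of this design is that it covers every buffer and 
every algorithm at once, including ones nobody thought to limit.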

> I'm wondering if there's a best practice here. Is there data on how long 
> non-malicious attribute values legitimately appear on the Web?

I have seen (and created) multimegabyte attribute values. (Typically, 
data: URIs of one kind or another, but not always.)

> At least there can be only one attribute buffer being filled at a time. 
> Buffering of the textContent of <progress> and friends is potentially 
> worse than an attribute buffer, because you could use the leading 1 MB 
> of bytes to establish <progress> start tags (each creating a buffer for 
> content) and then use the trailing 1 MB to fill those buffers 
> simultaneously. Perhaps I should worry about those buffers instead. What 
> might be a reasonable strategy for securing those (short of writing the 
> associated algorithms as automata that don't need buffers)?

In that kind of case, I would recommend having one buffer for all decoded 
"text", and then having all text nodes and text buffers refer to start and 
end points in that buffer. This is also remarkably cheap in both CPU and 
memory; you only have to pay the cost of a single copy of the text 
content, regardless of the complexity of the data. It is also basically no 
overhead compared to having individual buffers, since you are still 
passing around strings. For mutable cases (e.g. to support scripting), you 
can use a copy-on-write scheme.
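
A minimal sketch of the shared-buffer idea, under stated assumptions: Span 
and collect_spans are hypothetical names, and parentheses stand in for 
start and end tags so the example stays short. Each span stores only two 
offsets into the one decoded string, so nested elements that all cover the 
same trailing text cost a few integers each rather than a copy each. A 
span takes a private copy only when it is written to.

```python
class Span:
    """A view onto [start, end) of a shared string; creating it copies no text."""

    __slots__ = ("_text", "start", "end", "_own")

    def __init__(self, text, start, end):
        self._text = text
        self.start = start
        self.end = end
        self._own = None  # private copy, made on first write only

    def __len__(self):
        if self._own is not None:
            return len(self._own)
        return self.end - self.start

    def __str__(self):
        # Materialising a Python string does copy; the span itself does not.
        if self._own is not None:
            return self._own
        return self._text[self.start:self.end]

    def append(self, more):
        # Copy-on-write: the shared text is never modified.
        self._own = str(self) + more


def collect_spans(text, open_mark="(", close_mark=")"):
    """Return one Span per balanced open/close pair, innermost first."""
    stack, spans = [], []
    for i, ch in enumerate(text):
        if ch == open_mark:
            stack.append(i + 1)  # content starts after the marker
        elif ch == close_mark and stack:
            spans.append(Span(text, stack.pop(), i))
    return spans
```

For the input "(a(b(c)))" this yields three spans whose contents are "c", 
"b(c)" and "a(b(c))", all backed by the same nine-character string. 
Appending to the first span changes only that span.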

> Is there data on how large legitimate HTML documents appear on the Web? 
> The current limit of 2 MB is based on rounding the size of the Web Apps 
> spec up.

I have seen infinitely long documents. Discounting those, I have seen 
documents of tens and hundreds of megabytes.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'