<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 TRANSITIONAL//EN">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=UTF-8">
On Thu, 2010-06-24 at 09:01 -0700, Tab Atkins Jr. wrote:
On Thu, Jun 24, 2010 at 8:20 AM, Benjamin M. Schwartz
<<A HREF="mailto:firstname.lastname@example.org">email@example.com</A>> wrote:
> On 06/24/2010 11:04 AM, Kornel Lesinski wrote:
>> If you mean "parsing" with regular expressions, then I think that's a bad practice and shouldn't be encouraged.
> Worldwide, regarding HTML, I'm sure there is 100 times more regular
> expression processing code than full-on lexing code. Most code that
> processes HTML is embedded in scripts, doing some small special-purpose
> operation. Those regular expressions aren't going away. Helping them
> break less is a noble cause.
Actually, if we could make regex-based "parsing" break more, it would
probably be a net positive for the world. Regexes are the source of
so many holes in "validation"-type scripts.
In any case, XML doesn't require > to be escaped in attribute values,
and HTML doesn't appear to either. In practice, > is used in
attribute values, so declaring it verboten wouldn't be helpful.
Just to point out: regexes aren't the problem, and blaming regular expressions themselves is as misguided as writing the dodgy regexes in the first place. The problem is badly written expressions, not the tool. The same argument comes up whenever regular expressions are suggested as a way to validate email addresses. It can be done, but many of the people who write those expressions don't think the problem through.<BR>
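To make the point about > in attribute values concrete, here is a rough sketch in Python. Both patterns and the sample string are my own illustration, not anything taken from the thread or from a spec. The first pattern assumes the first > ends the tag; the second allows quoted attribute values to contain >. Neither is a real HTML parser (comments, script blocks and malformed quoting are all ignored), so treat it only as a demonstration of how much the care taken over the expression matters.

```python
import re

# Naive pattern (illustrative): assumes the first ">" after "<" closes the tag.
NAIVE_TAG = re.compile(r"<[^>]*>")

# Quote-aware pattern (illustrative): a tag body is a run of either
# characters that are not ">" or a quote, or complete quoted strings,
# so a ">" inside a quoted attribute value does not end the tag.
QUOTE_AWARE_TAG = re.compile(r"""<(?:[^>"']|"[^"]*"|'[^']*')*>""")

# Hypothetical input with an unescaped ">" inside an attribute value.
sample = '<a title="x > y" href="/next">link</a>'


def first_tag(pattern, text):
    """Return the first substring matched by `pattern`, or None."""
    match = pattern.search(text)
    return match.group(0) if match else None


if __name__ == "__main__":
    print("naive:      ", first_tag(NAIVE_TAG, sample))
    print("quote-aware:", first_tag(QUOTE_AWARE_TAG, sample))
```

The naive pattern cuts the tag off at the > inside the title attribute, while the quote-aware one returns the whole opening tag. Same tool, different amount of thought.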
<TABLE CELLSPACING="0" CELLPADDING="0" WIDTH="100%">