[whatwg] Comments on the definition of a valid e-mail address

Aryeh Gregor Simetrical+w3c at gmail.com
Mon Aug 24 06:54:09 PDT 2009

On Mon, Aug 24, 2009 at 4:36 AM, Smylers <Smylers at stripey.com> wrote:
> It's too complicated for most developers to roll their own validation,
> but there are standard libraries available which get it right.

Standard libraries available for all major languages?  As far as I can
tell from a quick search, the PHP standard library contains no e-mail
validation routines before 5.2.0 -- which isn't yet reliably available
except to the small minority of website admins with root access to
their machines.  Moreover, the e-mail validation in 5.2.0
(filter_var()) seems to be wrong -- apparently it just uses, yes, a
regex.  ("Don't use PHP" is, obviously, not a useful response here.)

If it were practical for everyone to validate strictly according to
spec on both client and server side, that would be fine.  I assume it
was felt there was good reason not to do this in HTML 5.

> Forms on websites capturing users' e-mail addresses typically want just
> the address part, prompting for the human-readable name in a separate
> box, so I think HTML 5's <input type=email> not allowing the above is
> helpful.

It might be more helpful if they stripped the part outside the angle
brackets, but I agree that it's reasonable to just reject these.
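As a rough illustration of the stripping idea, here is a minimal Python sketch. It assumes the input looks like "Name <addr>"; extract_address is a hypothetical helper name, and the real work is done by the standard library's email.utils.parseaddr. This is not what HTML 5 specifies, only what "strip the part outside the angle brackets" could look like on the server side.

```python
from email.utils import parseaddr


def extract_address(raw: str) -> str:
    """Return only the address from input such as 'Name <user@example.com>'.

    Hypothetical helper: parseaddr splits the display name from the address,
    and the display name is simply discarded here.
    """
    _display_name, address = parseaddr(raw)
    return address
```

A bare address passes through unchanged, so the same helper could be applied to every submitted value before validation.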

> There may actually be several categories of oddly placed dots.  While
> the address in the form you give above works it may be, say, that those
> with repeated dots in the hostname part don't work.
>
> On the specific case of a . immediately before the @, I've seen that
> before: this Perl library module extends an RFC-compliant module to
> allow just that; its author admits ".@" breaks the RFCs but claims such
> breakage is useful in the real world, specifically when dealing with
> e-mail addresses for Japanese mobile phones:
>  http://search.cpan.org/perldoc?Email::Valid::Loose
> That somebody has found this to be a sufficiently widespread problem
> with standard Perl e-mail address validation to write and upload a
> module which 'fixes' this (and just that; it makes no other changes)
> suggests that people will find HTML 5's <input type=email> to be
> problematic in precisely the same way.

The breakdown of the 202 is as follows.

* Single trailing dot in domain part: 100 (prohibited by RFC but
plausibly deliverable)
* Single trailing dot in local part: 40 (prohibited by RFC but
plausibly deliverable)
* Valid address in angle brackets (with other junk around it): 21
(permitted by RFC, kind of, and plausibly deliverable)
* Multiple consecutive dots: 20 (prohibited by RFC but plausibly deliverable)
* No @: 9 (unlikely to be deliverable)
* Comment: 3 (permitted by RFC and plausibly deliverable)
* Miscellaneous: 9 (one containing [NO]@[SPAM], two with trailing >,
one in "quotes", one with single leading dot in local part, two with
single leading comma in local part, one with leading ": ", one with
leading "\")

Again, this excludes ~3000 that would be valid if [ \n\t] were
stripped.  Note that almost all of the hits look like real, working
e-mail addresses that did have mail successfully sent to them (as
opposed to a few that look like they were only confirmed because of a
bug).
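For anyone wanting to run a similar breakdown on their own data, the buckets above could be approximated with something like the following Python sketch. It is not the query actually used for these numbers: the function name classify, the order of the checks, and the simple substring tests are all assumptions, and it expects input that has already failed validation (a valid address would fall through to "miscellaneous").

```python
import re


def classify(raw: str) -> str:
    """Sort an already-rejected address into one of the rough buckets above."""
    # Strip the [ \n\t] characters mentioned above before looking further.
    addr = re.sub(r"[ \n\t]", "", raw)
    if "@" not in addr:
        return "no @"
    if "<" in addr and ">" in addr:
        return "angle brackets"
    if "(" in addr and ")" in addr:
        return "comment"
    # Split on the last "@" so the local part keeps any earlier "@".
    local, _, domain = addr.rpartition("@")
    if ".." in addr:
        return "multiple consecutive dots"
    if domain.endswith("."):
        return "trailing dot in domain part"
    if local.endswith("."):
        return "trailing dot in local part"
    return "miscellaneous"
```

The check order matters when an address has more than one problem; here the first match wins, which is a simplification.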

> So it may actually be that there isn't a general problem here of lots of
> real-world e-mail addresses which work but don't comply with the RFCs;
> it may simply be the one case of ".@"?

No, that was just the example I chose because I knew that person
personally, and so was able to confirm that the address actually
worked.  I can't use my database access at Wikipedia to spam people
just to see if their addresses work, so I can't confirm any of the
others directly.

> Users often mis-type e-mail addresses.  It seems useful to be able to
> trap as many typos as possible.  Many authors obviously believe this,
> given how many employ JavaScript validators.  If HTML 5 were overly
> permissive about <input type=email> then it's likely such authors would
> continue to use homegrown JavaScript solutions, which slightly defeats
> the purpose of HTML 5 introducing <input type=email>.

I agree, but if the only purpose is to catch typos, it doesn't seem
correct to completely prohibit submission.  At most, you should warn
the user.  Of course, this would be potentially complicated to do.
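To make the "warn, don't block" idea concrete, here is a small Python sketch. The pattern is a deliberately rough stand-in (dot-separated pieces on each side of a single "@"), not the HTML 5 or RFC definition, and typo_warning is a hypothetical name. The point is only that the check returns advice and never prevents submission.

```python
import re
from typing import Optional

# Rough stand-in pattern (an assumption, not the HTML 5 or RFC grammar):
# dot-separated pieces, "@", dot-separated pieces, with no empty piece.
LIKELY_OK = re.compile(r"[^@\s.]+(\.[^@\s.]+)*@[^@\s.]+(\.[^@\s.]+)*")


def typo_warning(addr: str) -> Optional[str]:
    """Return a warning to show the user, or None if nothing looks odd.

    The caller is expected to submit the form either way.
    """
    if LIKELY_OK.fullmatch(addr):
        return None
    return f"'{addr}' looks unusual. Please double-check it before submitting."
```

With this shape, an address like "user.@example.com" produces a warning but can still be submitted if the user confirms it is right.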
