<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=iso-8859-2">
<META content="MSHTML 6.00.3790.2666" name=GENERATOR>
<STYLE></STYLE>
</HEAD>
<BODY bgColor=#ffffff>
<DIV><FONT face=Arial size=2>Hello guys,</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>I am happy to find a list interested in the future of
the Web (HTML/CSS/W3C standards).</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>Until I get a feel for what is happening here, I
will mostly read and learn from your messages. But I have one problem
that I am sure you know how to handle (I hope this is not off-topic
here).</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>I have a web crawler that I use for personal
research. It crawls an entire site, finds all the links, creates a
sitemap, and gathers some statistics. After a while I felt that I could do more than
that, so I decided to make it parse the HTML code and extract some statistics
about tags. For the moment I have created an array with all HTML tags
(deprecated ones too), grouped by their structure type (block, inline, single;
that is what I call them). I am parsing the HTML code using regular expressions,
but when I searched the net, I saw many people saying: don't parse HTML using
regex.</FONT></DIV>
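<DIV><FONT face=Arial size=2>To make clear what kind of statistics I mean, here is
a minimal sketch in Python. It is not my crawler's code. It uses the standard
library's html.parser instead of regular expressions, and the names TAG_GROUPS,
group_of, TagStats and tag_stats, as well as the short tag lists, are only
assumptions for illustration.</FONT></DIV>

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical grouping that follows the "block, inline, single" idea.
# The tag lists are illustrative and far from complete.
TAG_GROUPS = {
    "block": {"div", "p", "table", "ul", "ol", "h1", "h2", "blockquote"},
    "inline": {"a", "b", "i", "font", "span", "em", "strong"},
    "single": {"br", "hr", "img", "meta", "input", "link"},
}


def group_of(tag):
    """Return the group name for a tag, or "other" if it is not listed."""
    for group, tags in TAG_GROUPS.items():
        if tag in tags:
            return group
    return "other"


class TagStats(HTMLParser):
    """Counts start tags per tag name and per group."""

    def __init__(self):
        super().__init__()
        self.by_tag = Counter()
        self.by_group = Counter()

    def handle_starttag(self, tag, attrs):
        # html.parser passes the tag name in lower case.
        self.by_tag[tag] += 1
        self.by_group[group_of(tag)] += 1


def tag_stats(html):
    """Parse an HTML string and return (per-tag counts, per-group counts)."""
    parser = TagStats()
    parser.feed(html)
    parser.close()
    return parser.by_tag, parser.by_group


sample = '<DIV><FONT face=Arial size=2>Hello<BR><A href="x">link</A></FONT></DIV>'
by_tag, by_group = tag_stats(sample)
print(dict(by_tag))
print(dict(by_group))
```

<DIV><FONT face=Arial size=2>The parser reports each start tag through a callback,
so upper-case tags and unquoted attributes like the ones in the sample do not
need special patterns.</FONT></DIV>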
<DIV><FONT face=Arial size=2>I studied a bit more and found the relation
between the HTML document and the DTD (Document Type Definition)
declaration. From what I have seen, browsers seem to rely on it (the public
ones are cached, and the custom ones are fetched before the HTML document is
parsed).</FONT></DIV>
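<DIV><FONT face=Arial size=2>For reference, this is a small sketch of how the
declaration itself can be read from a page, again with Python's html.parser. It
only extracts the DOCTYPE text and its public identifier. It does not fetch or
interpret the DTD, and the names DoctypeReader and public_identifier are
assumptions for illustration.</FONT></DIV>

```python
import re
from html.parser import HTMLParser


class DoctypeReader(HTMLParser):
    """Remembers the first DOCTYPE declaration found in the document."""

    def __init__(self):
        super().__init__()
        self.doctype = None

    def handle_decl(self, decl):
        # decl is the text between "<!" and ">", for example 'DOCTYPE html'.
        if self.doctype is None and decl.lower().startswith("doctype"):
            self.doctype = decl


def public_identifier(decl):
    """Return the quoted identifier after PUBLIC, or None if there is none."""
    match = re.search(r'PUBLIC\s+"([^"]*)"', decl, re.IGNORECASE)
    return match.group(1) if match else None


page = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">\n'
    "<HTML><BODY>Hi</BODY></HTML>"
)
reader = DoctypeReader()
reader.feed(page)
reader.close()
print(reader.doctype)
print(public_identifier(reader.doctype))
```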
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>Can you point me to some documentation that
explains how a browser parses HTML documents, or how it uses the
DTD to interpret tags and their attributes?</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial size=2>Another thing is that everyone recommended
using an existing library, but I want to slowly learn the whole parsing
process by myself, so I can understand all the principles.</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial size=2>Thanks a lot!</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial size=2>Best wishes,</FONT> </DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>--------------------------------------<BR><FONT
color=#00007b><B>Serban Gh. Ghita</B></FONT><BR>Project Manager<BR><BR>VERASYS
Intl.<BR><FONT color=#00007b><B>Web Dept.</B></FONT><BR>Bucuresti,
ROMANIA<BR>Tel: +40-21-201.67.62<BR>Fax:
+40-251-306.017<BR>GSM: +40-788-28.29.10<BR><B>email:</B> <A
href="mailto:serban.ghita@verasys.com">serban.ghita@verasys.com</A><BR>email: <A
href="mailto:zamolxe@php.net">zamolxe@php.net</A><BR><A
href="http://www.verasys.com">www.verasys.com</A> / <A
href="http://www.itpromo.ro">www.itpromo.ro</A> </FONT></DIV></BODY></HTML>