New homepage design of d-p-l.org is now live. <eom>

Sun Dec 18 07:09:17 PST 2011

On 17/12/2011 21:06, Michel Fortin wrote:
> On 2011-12-17 13:09:35 +0000, Stewart Gordon <smjg_1998 at yahoo.com> said:
<snip>
>> No, because in order to determine whether it's well-formed, one must know whether it's
>> meant to be in SGML-based HTML, HTML5 or XHTML.
>
> Perhaps it matters for validation if you don't say which spec to validate against, but
> validating against a spec doesn't always reflect reality either. There is no
> SGML-based-HTML-compliant parser used by a browser out there. Browsers have two parsers:
> one for HTML and one for XML (and sometimes the HTML parser behaves slightly differently in
> quirks mode, but that's not part of any spec).

But there is a subset of HTML that is likely to be parsed correctly by browsers' HTML 
parsers, and this subset is all the HTML you're likely to need to use most of the time. 
On the other hand, the interpretation of tag soup is undefined and liable to vary from 
browser to browser.  So validation certainly helps you out here.

> And whether a browser uses the HTML or the XML parser has nothing to do with the doctype
> at the top of the file: it depends on the MIME types given in the Content-Type HTTP header
> or the file extension if it is a local file. HTML 5 doesn't change that.
>
> Almost all web pages declared as XHTML out there are actually parsed using the HTML parser
> because they are served with the text/html content type and not application/xhtml+xml. A
> lot of them are not well-formed XML and wouldn't be viewable anyway if parsed according to
> their doctype.
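
For illustration, the selection rule described above could be sketched like this in Python. The function name parser_for and the exact list of XML types are my own assumptions, not taken from any browser's source; the point is only that the decision looks at the Content-Type value and never at the doctype.

```python
# Hypothetical sketch: choose a parser from the Content-Type header alone.
# The set of XML types below is an assumption made for this example.
XML_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}


def parser_for(content_type):
    """Return "html", "xml", or None for a Content-Type header value."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime == "text/html":
        return "html"
    if mime in XML_TYPES or mime.endswith("+xml"):
        return "xml"
    return None  # not a markup type that either parser would handle


# An XHTML page served as text/html still goes to the HTML parser.
print(parser_for("text/html; charset=utf-8"))
print(parser_for("application/xhtml+xml"))
```

Under this sketch, a page with an XHTML doctype served as text/html is routed to the HTML parser, which matches the situation described in the quoted paragraph.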

But does any pre-HTML5 spec stipulate that HTML parsers accept tag soup in the first 
place?  ISTM this is all down to a tendency of browser/engine authors to implement 
fallback for malformed HTML but not for malformed XML.
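
As a rough illustration of that asymmetry, here is a minimal Python sketch. It uses the standard library's html.parser, which is only a lenient tokenizer and not a browser's HTML parser, next to the strict XML parser in xml.etree.ElementTree. The sample markup and the helper names are made up for the example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Made-up tag soup: unclosed <p> and <b>, bare <br>, no root element.
SOUP = "<p>Unclosed <b>bold<p>Next <br> paragraph"


class TagCollector(HTMLParser):
    """Record every start tag the lenient HTML tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_tags(markup):
    """Feed markup to the lenient parser and return the start tags seen."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def is_well_formed_xml(markup):
    """Return True only if the strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(html_tags(SOUP))           # the lenient parser carries on regardless
print(is_well_formed_xml(SOUP))  # the XML parser refuses the same input
```

The lenient parser reports the tags without complaint, while the XML parser gives up at the first well-formedness error, which is the fallback-versus-no-fallback behaviour I mean.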

Stewart.