HTML Versus XHTML

HTML and XHTML are both markup languages in which web pages are written. HTML (through version 4.01) is defined as an application of SGML, while XHTML is a stricter reformulation of HTML as an application of XML. They are like two sides of the same coin: XHTML was derived from HTML so that web pages could conform to XML's rules.

What is XHTML?

  • XHTML stands for EXtensible HyperText Markup Language
  • XHTML is a stricter, more XML-based version of HTML
  • XHTML is HTML defined as an XML application
  • XHTML is supported by all major browsers

Why XHTML?

XML is a markup language in which every document must be marked up correctly (be “well-formed”). XHTML was developed to make HTML more extensible and easier to use alongside other XML-based data formats. In addition, browsers ignore errors in HTML pages and try to display a page even if its markup contains mistakes. XHTML, by contrast, comes with much stricter error handling.
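The difference in error handling can be illustrated with a minimal Python sketch that uses only the standard library. The helper names (`parses_as_xml`, `html_start_tags`) and the sample markup are made up for this illustration, and Python's `html.parser` only stands in for a lenient browser parser; it does not reproduce a browser's actual recovery rules.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Lenient HTML parser that simply records the start tags it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parses_as_xml(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup as well-formed."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def html_start_tags(markup: str) -> list:
    """Feed the markup to the lenient HTML parser and list its start tags."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


# Hypothetical sample: neither <p> is closed and <br> has no end tag.
SLOPPY = "<div><p>First<p>Second<br></div>"

print(parses_as_xml(SLOPPY))    # the XML parser rejects the document
print(html_start_tags(SLOPPY))  # the HTML parser carries on regardless
```

The XML parser stops at the first well-formedness error, while the HTML parser reports every tag it can find and leaves recovery to the caller.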

Comparison chart

Aspect | HTML | XHTML
Introduction (from Wikipedia) | HTML or HyperText Markup Language is the main markup language for creating web pages and other information that can be displayed in a web browser. | XHTML (Extensible HyperText Markup Language) is a family of XML markup languages that mirror or extend versions of the widely used Hypertext Markup Language (HTML), the language in which web pages are written.
Filename extension | .html, .htm | .xhtml, .xht, .xml, .html, .htm
Internet media type | text/html | application/xhtml+xml
Developed by | W3C & WHATWG | World Wide Web Consortium
Type of format | Document file format | Markup language
Extended from | SGML | XML, HTML
Stands for | HyperText Markup Language | Extensible HyperText Markup Language
Application | Application of Standard Generalized Markup Language (SGML). | Application of XML.
Function | Web pages are written in HTML. | Extended version of HTML that is stricter and XML-based.
Nature | Flexible framework requiring a lenient HTML-specific parser. | Restrictive XML application that needs to be parsed with a standard XML parser.
Origin | Proposed by Tim Berners-Lee in 1989 as part of the World Wide Web; first publicly described in 1991. | World Wide Web Consortium Recommendation in 2000.
Versions | HTML 2.0, HTML 3.2, HTML 4.0, HTML 4.01, HTML5. | XHTML 1.0, XHTML 1.1, XHTML 2.0 (draft, never finalized), XHTML5.

Overview of HTML and XHTML

HTML is the predominant markup language for web pages. It creates structured documents by denoting structural semantics for text such as headings, lists, links and quotes. It allows images and objects to be embedded and can be used to create interactive forms. It is written as tags surrounded by angle brackets, for example <html>. Scripts in languages like JavaScript can also be loaded.

XHTML is a family of XML languages which extend or mirror versions of HTML. It does not allow tags to be omitted or attributes to be minimized. XHTML requires an end tag for every start tag, and nested elements must be closed in the reverse of the order in which they were opened. For example, while <br> is valid in HTML, XHTML requires <br />.

Features of HTML vs XHTML documents

HTML documents are composed of elements that have three components: a pair of element tags (a start tag and an end tag), attributes given within the start tag, and the actual textual and graphic content. An HTML element is everything between and including the tags. (A tag is a keyword enclosed in angle brackets.)

An XHTML document has exactly one root element. All element and attribute names must be in lowercase, attribute values must be enclosed in quotation marks, and every element must be closed and properly nested for the document to be accepted. These rules are mandatory in XHTML, whereas HTML treats them as optional. The DOCTYPE declaration determines which rules a document must follow.

  • Aside from the different opening declarations for a document, the differences between an HTML 4.01 and XHTML 1.0 document—in each of the corresponding DTDs—are largely syntactic.
  • The underlying syntax of HTML allows many shortcuts that XHTML does not, such as elements with optional opening or closing tags, and even EMPTY elements, which must not have an end tag.
  • By contrast, XHTML requires every element to have both an opening tag and a closing tag. XHTML, however, also introduces a new shortcut: an element may be opened and closed within the same tag by including a slash before the end of the tag, like this: <br/>.
  • This shorthand, which is not used in the SGML declaration for HTML 4.01, may confuse earlier software unfamiliar with the convention. A fix for this is to include a space before the slash, like this: <br />.

XHTML vs HTML Specification

HTML and XHTML are closely related and can therefore be documented together. Both HTML 4.01 and XHTML 1.0 have three sub-specifications: strict, transitional (also called loose) and frameset. The different opening declarations of a document distinguish HTML from XHTML; the other differences are syntactic. HTML allows shortcuts such as elements with optional tags and empty elements without end tags, whereas XHTML is very strict about opening and closing tags. XHTML also uses XML's built-in language-defining attribute, xml:lang. A well-formed XHTML document satisfies all the syntax requirements of XML.

Note, though, that these differences apply only when an XHTML document is served as an application of XML; that is, with a MIME type of application/xhtml+xml, application/xml, or text/xml. An XHTML document served with a MIME type of text/html must be parsed and interpreted as HTML, so the HTML rules apply in this case.

This can be especially important when you’re serving XHTML documents as text/html. Unless you’re aware of the differences, you may create style sheets that won’t work as intended once the document is served as real XHTML, with a MIME type of application/xhtml+xml.

Where the terms “XHTML” and “XHTML document” appear in the remainder of this section, they refer to XHTML markup served with an XML MIME type. XHTML markup served as text/html is an HTML document as far as browsers are concerned.
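The rule described above can be sketched as a small Python helper that decides, from a Content-Type header value, which set of parsing rules applies. The function name `parsing_mode` and its return values are assumptions made for this illustration; real browsers apply additional content-sniffing rules that are not modeled here.

```python
# MIME types under which XHTML markup is treated as XML, as listed above.
XML_MIME_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}


def parsing_mode(content_type: str) -> str:
    """Return 'xml', 'html' or 'other' for a Content-Type header value."""
    # Drop parameters such as "; charset=utf-8" and normalise the case.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime in XML_MIME_TYPES:
        return "xml"   # XHTML rules: the document must be well-formed
    if mime == "text/html":
        return "html"  # HTML rules, even if the markup looks like XHTML
    return "other"


print(parsing_mode("application/xhtml+xml; charset=utf-8"))
print(parsing_mode("text/html"))
```

The markup itself plays no part in the decision: the same file is an XHTML document under one header and an HTML document under the other.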

How to migrate from HTML to XHTML

The W3C's compatibility guidelines recommend the following steps when migrating from HTML to XHTML (XHTML 1.0 documents):

  • Include both xml:lang and lang attributes on any element that specifies a language.
  • Use empty-element syntax only on elements specified as empty in HTML.
  • Include an extra space in empty-element tags: <br /> rather than <br/>.
  • Include explicit close tags for elements that can have content but are left empty: <p></p> rather than <p />.
  • Do not include the XML declaration.

If a document carefully follows the W3C’s compatibility guidelines, a user agent (web browser) should be able to interpret it equally well as HTML or XHTML.
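One of the migration steps, writing empty elements in the compatible <br /> form, can be sketched in Python with a regular expression. This is a rough illustration under stated assumptions: `close_empty_elements` is a hypothetical helper, the list of empty elements is only a subset, tag names are assumed to be lowercase already, and a regular expression is not a full HTML parser, so comments or scripts containing similar text would be rewritten too.

```python
import re

# An illustrative subset of the elements that HTML declares as EMPTY.
VOID_ELEMENTS = ("br", "hr", "img", "input", "meta", "link")

# Matches <br>, <br/>, <br /> and start tags with attributes such as <img ...>.
_VOID_TAG = re.compile(r"<(%s)\b([^>]*?)\s*/?>" % "|".join(VOID_ELEMENTS))


def close_empty_elements(markup: str) -> str:
    """Rewrite empty elements in the space-before-slash form, e.g. <br />."""
    return _VOID_TAG.sub(r"<\1\2 />", markup)


print(close_empty_elements('A break: <br> An image: <img src="happy.gif" alt="Happy face">'))
```

Running the helper on markup that already uses <br /> leaves it unchanged, so it can be applied repeatedly without harm.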

How to migrate from XHTML to HTML

To understand the subtle differences between HTML and XHTML, consider the transformation of a valid and well-formed XHTML 1.0 document into a valid HTML 4.01 document. This translation requires the following steps:

  • Specify the language for an element with a lang attribute rather than the XHTML xml:lang attribute. XHTML uses XML’s built-in language-defining attribute.
  • Remove the XML namespace (xmlns=URI). HTML has no facilities for namespaces.
  • Change the document type declaration from XHTML 1.0 to HTML 4.01.
  • If present, remove the XML declaration. (Typically this is: <?xml version="1.0" encoding="utf-8"?>).
  • Ensure that the document’s MIME type is set to text/html. For both HTML and XHTML, this comes from the HTTP Content-Type header sent by the server.
  • Change the XML empty-element syntax to an HTML style empty element (<br/> to <br>).
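The markup-level steps above can be sketched as a short Python function. It is a simplified illustration, not a robust converter: `xhtml_to_html` is a hypothetical name, the HTML 4.01 Strict DOCTYPE is chosen as an assumption about the target, and regular expressions will misfire on edge cases such as comments, CDATA sections or single-quoted attribute values. Setting the MIME type is a server configuration matter and is not covered.

```python
import re

# Assumed target: the HTML 4.01 Strict document type declaration.
HTML401_DOCTYPE = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd">'
)


def _fix_lang(match):
    """Keep an existing lang attribute, otherwise rename xml:lang to lang."""
    tag = match.group(0)
    if re.search(r'\slang="', tag):
        return re.sub(r'\s+xml:lang="[^"]*"', "", tag)
    return tag.replace("xml:lang=", "lang=")


def xhtml_to_html(source: str) -> str:
    # Remove the XML declaration, if present.
    out = re.sub(r"^\s*<\?xml[^>]*\?>\s*", "", source)
    # Change the document type declaration to HTML 4.01.
    out = re.sub(r"<!DOCTYPE[^>]*>", HTML401_DOCTYPE, out, count=1)
    # Remove the XML namespace; HTML has no facilities for namespaces.
    out = re.sub(r'\s+xmlns="[^"]*"', "", out)
    # Specify languages with lang rather than xml:lang.
    out = re.sub(r"<[a-z][^>]*>", _fix_lang, out)
    # Change XML empty-element syntax to HTML style (<br /> to <br>).
    out = re.sub(r"\s*/>", ">", out)
    return out


SAMPLE = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n'
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">\n'
    "<body><p>Hi<br /></p></body>\n"
    "</html>"
)

print(xhtml_to_html(SAMPLE))
```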

The Most Important Differences from HTML

  • <!DOCTYPE> is mandatory
  • The xmlns attribute in <html> is mandatory
  • <html>, <head>, <title>, and <body> are mandatory
  • Elements must always be properly nested
  • Elements must always be closed
  • Element names must always be in lowercase
  • Attribute names must always be in lowercase
  • Attribute values must always be quoted
  • Attribute minimization is forbidden

XHTML <!DOCTYPE> Is Mandatory

An XHTML document must have an XHTML <!DOCTYPE> declaration. The <html>, <head>, <title>, and <body> elements must also be present, and the xmlns attribute in <html> must specify the xml namespace for the document.

Example

Here is an XHTML document with a minimum of required tags: 

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>Title of document</title>
</head>
<body>

  some content here...

</body>
</html>

XHTML Elements Must be Properly Nested

In XHTML, elements must always be properly nested within each other, like this:

Correct:

<b><i>Some text</i></b>

Wrong:

<b><i>Some text</b></i>
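What "properly nested" means can be made concrete with a small stack-based check, sketched here in Python. The function `first_nesting_error` and its messages are invented for this illustration; it only tokenizes tags with a regular expression and is not a substitute for a real XML parser. The same check also flags elements that are never closed.

```python
import re

# Captures an optional leading slash, the element name, and an optional
# trailing slash (as in <br />).
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(/?)>")


def first_nesting_error(markup: str):
    """Return a description of the first nesting problem, or None."""
    open_elements = []  # stack of element names that are still open
    for closing, name, self_closing in TAG.findall(markup):
        if self_closing:
            continue  # <br /> opens and closes in a single tag
        if not closing:
            open_elements.append(name)
        elif not open_elements:
            return f"</{name}> has no matching start tag"
        elif open_elements[-1] != name:
            return f"expected </{open_elements[-1]}> but found </{name}>"
        else:
            open_elements.pop()
    if open_elements:
        return f"<{open_elements[-1]}> is never closed"
    return None


print(first_nesting_error("<b><i>Some text</i></b>"))
print(first_nesting_error("<b><i>Some text</b></i>"))
```

An end tag is accepted only if it matches the most recently opened element, which is exactly the rule that the "Wrong" example above breaks.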

XHTML Elements Must Always be Closed

In XHTML, elements must always be closed, like this:

Correct:

<p>This is a paragraph</p>
<p>This is another paragraph</p>

Wrong:

<p>This is a paragraph
<p>This is another paragraph

XHTML Empty Elements Must Always be Closed

In XHTML, empty elements must always be closed, like this:

Correct:

A break: <br />
A horizontal rule: <hr />
An image: <img src="happy.gif" alt="Happy face" />

Wrong:

A break: <br>
A horizontal rule: <hr>
An image: <img src="happy.gif" alt="Happy face">

XHTML Elements Must be in Lowercase

In XHTML, element names must always be in lowercase, like this:

Correct:

<body>
<p>This is a paragraph</p>
</body>

Wrong:

<BODY>
<P>This is a paragraph</P>
</BODY>

XHTML Attribute Names Must be in Lowercase

In XHTML, attribute names must always be in lowercase, like this:

Correct:

<a href="https://www.futurefundamentals.com/html/">Visit our HTML tutorial</a>

Wrong:

<a HREF="https://www.futurefundamentals.com/html/">Visit our HTML tutorial</a>

XHTML Attribute Minimization is Forbidden

In XHTML, attribute minimization is forbidden:

Correct:

<input type="checkbox" name="vehicle" value="car" checked="checked" />
<input type="text" name="lastname" disabled="disabled" />

Wrong:

<input type="checkbox" name="vehicle" value="car" checked />
<input type="text" name="lastname" disabled />
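Minimized attributes can be detected programmatically. The following Python sketch relies on the standard library's `html.parser`, which reports an attribute written without a value as having the value `None`. The class and function names are made up for this example.

```python
from html.parser import HTMLParser


class MinimizedAttributeFinder(HTMLParser):
    """Collects (element, attribute) pairs where the attribute has no value."""

    def __init__(self):
        super().__init__()
        self.minimized = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes None as the value of a minimized attribute.
        for name, value in attrs:
            if value is None:
                self.minimized.append((tag, name))


def find_minimized(markup: str) -> list:
    """Return every minimized attribute found in the markup."""
    finder = MinimizedAttributeFinder()
    finder.feed(markup)
    finder.close()
    return finder.minimized


print(find_minimized('<input type="checkbox" name="vehicle" value="car" checked />'))
print(find_minimized('<input type="text" name="lastname" disabled="disabled" />'))
```

An empty result means that every attribute in the markup was written out in full, as XHTML requires.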