Monday, September 23, 2013

XML Basics



XML

Acronym for Extensible Markup Language. An open standard for exchanging structured documents and data over the Internet. The World Wide Web Consortium (W3C) published the first working draft in November 1996, and XML 1.0 became a W3C Recommendation in February 1998.

What is XML?

  • XML stands for EXtensible Markup Language
  • XML is a markup language much like HTML
  • XML was designed to describe data
  • XML tags are not predefined. You must define your own tags
  • XML can use a Document Type Definition (DTD) or an XML Schema to describe the structure of the data
  • XML with a DTD or XML Schema is designed to be self-descriptive
  • XML is a W3C Recommendation

The Main Difference Between XML and HTML

XML was designed to carry data.
XML is not a replacement for HTML.
XML and HTML were designed with different goals:

XML was designed to describe data and to focus on what data is.
HTML was designed to display data and to focus on how data looks.

HTML is about displaying information, while XML is about describing information.

Features or Goals of XML

1. XML shall be straightforwardly usable over the Internet.
XML documents should be readily usable over the Internet. XML is not a programming language, and it was not designed only for stand-alone systems; it is meant to be exchanged across the Internet between a wide variety of sources.
2. XML shall support a wide variety of applications.
The beauty of XML is that it was intended to be used for as many things as possible. This flexibility can sometimes make it more difficult to understand, but ultimately, XML can be used to describe a Web page about flowers or a database of car parts or nearly anything you can imagine.
3. XML shall be compatible with SGML.
SGML, or Standard Generalized Markup Language, is the ISO standard (ISO 8879) on which XML, and thus XHTML, is based. XML was defined as a restricted form of SGML, so a document that conforms to XML is also a conforming SGML document.
4. It shall be easy to write programs which process XML documents.
XML was always intended to be easy to use and process. Because XML is human-readable text, programmers can more easily work out what the XML tags mean, which in turn makes it easier to write a program that processes those tags.
5. The number of optional features in XML is to be kept to the minimum.
Ideally, there would be zero optional features. Optional features cause problems because they are not guaranteed to be supported in any given situation. The more optional features a system has, the more combinations a program must handle, and so the more difficult the programming becomes.
6. XML documents should be human-legible and reasonably clear.
Rather than having an element named a3209zd, you would have an element named <first_name>. Someone reading your XML should be able to make an educated guess about what data is being tagged.
7. The XML design should be prepared quickly.
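This goal refers to the design of XML itself: the specification was meant to be completed quickly so that it could be put to use. The same idea carries over to your own documents, where it is better to spend time building the data than building an elaborate XML design.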
8. The design of XML shall be formal and concise.
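This goal also refers to the XML specification, which defines the language with a formal grammar and is kept short. In your own documents, include only as many elements as you need to be clear, no more and no less.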
9. XML documents shall be easy to create.
XML is intended not to require a special editor or tool. In fact, most XML documents can be edited in a plain text editor such as Notepad or TextEdit.
10. Terseness in XML markup is of minimal importance.
When you're creating XML element names, first_name is better than fname because it's clearer and more human-readable. While you do want to keep element names short, the shortness should not come at the expense of human-readability.
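Goals 4, 6, and 9 above can be illustrated with a short sketch. It uses Python's standard xml.etree.ElementTree module to create a small document and read a value back out. The person, first_name, and last_name element names and the sample values are assumptions made up for this illustration, following the naming advice in goal 6.

```python
import xml.etree.ElementTree as ET


def build_person(first_name, last_name):
    """Create a small XML document with descriptive element names."""
    # "person", "first_name" and "last_name" are hypothetical names
    # chosen for this sketch, not part of any standard vocabulary.
    person = ET.Element("person")
    ET.SubElement(person, "first_name").text = first_name
    ET.SubElement(person, "last_name").text = last_name
    # encoding="unicode" returns a str instead of bytes.
    return ET.tostring(person, encoding="unicode")


def read_first_name(xml_text):
    """Parse the document and return the text of <first_name>."""
    return ET.fromstring(xml_text).findtext("first_name")


document = build_person("Ada", "Lovelace")
print(document)
print(read_first_name(document))
```

Because the element names describe their contents, the reading code needs no extra documentation to show what it is looking for.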


Advantages of XML

  • XML uses human, not computer, language. XML is readable and understandable, even by novices, and no more difficult to code than HTML.
  • XML is compatible with Java™ and fully portable. Any application that can process XML can use your information, regardless of platform.
  • XML is extensible. Create your own tags, or use tags created by others, that use the natural language of your domain, have the attributes you need, and make sense to you and your users.

Disadvantages of XML

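  • More difficult, demanding, and precise than HTML: a document that is not well formed is rejected by the parser rather than repaired

  • Less direct browser support and fewer end-user applications than HTML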

  • Regarded as experimental and not yet settled in its early years (XML 1.0 has since become a stable W3C Recommendation)

Advantages of XML over HTML

  • By defining your own markup language, you can code documents more precisely

  • Markup reflects the structure and semantics of documents, which allows better searching and navigation

  • Tagging and content are kept separate from display

  • A single document can be used in many ways
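The last two points can be sketched in a few lines of Python. The same XML data is presented once as plain text and once as an HTML list, without changing the data itself. The FOREST and TREE elements are borrowed from the example document in the XML Structure section; the function names are assumptions made up for this illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# The data says what each item is, not how it should look.
FOREST = "<FOREST><TREE>Oak</TREE><TREE>Pine</TREE><TREE>Maple</TREE></FOREST>"


def tree_names(xml_text):
    """Return the text of every TREE element in the document."""
    return [tree.text for tree in ET.fromstring(xml_text).iter("TREE")]


def as_plain_text(xml_text):
    """One presentation: a comma-separated line of text."""
    return ", ".join(tree_names(xml_text))


def as_html_list(xml_text):
    """Another presentation: an HTML unordered list."""
    items = "".join(f"<li>{escape(name)}</li>" for name in tree_names(xml_text))
    return f"<ul>{items}</ul>"


print(as_plain_text(FOREST))
print(as_html_list(FOREST))
```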

XML Structure

This section describes the structure of an XML document, including its parts and the prolog, and gives a simple example document.

Document Parts

  • Prolog
  • Document Element (root element)

The Prolog

The prolog, roughly comparable to the head section in HTML, may include the following:
  • An XML declaration (optional) such as:
<?xml version="1.0"?>
  • A DTD or reference to one (optional). An example reference to an external DTD file:
<!DOCTYPE LANGLIST SYSTEM "langlist.dtd">
  • Processing instructions - An example processing instruction that causes style to be determined by a style sheet:
<?xml-stylesheet type="text/css" href="xmlstyle.css"?>
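As a rough sketch of how a program sees these prolog parts, the following uses Python's standard xml.dom.minidom module to parse a document that contains all three and report what was found. The LANGUAGE element and its content are assumptions added so that the document has a body; the parser used here does not fetch the external langlist.dtd file, so the DTD is only referenced, not validated against.

```python
from xml.dom import minidom

# The prolog lines are the ones shown above; the LANGUAGE element
# is a hypothetical body added for this sketch.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE LANGLIST SYSTEM "langlist.dtd">
<?xml-stylesheet type="text/css" href="xmlstyle.css"?>
<LANGLIST><LANGUAGE>English</LANGUAGE></LANGLIST>
"""


def describe_prolog(xml_text):
    """Return the prolog parts of a document as a dictionary."""
    doc = minidom.parseString(xml_text)
    instructions = [
        (node.target, node.data)
        for node in doc.childNodes
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
    ]
    return {
        "version": doc.version,                # from the XML declaration
        "doctype_name": doc.doctype.name,      # from <!DOCTYPE ...>
        "dtd_file": doc.doctype.systemId,      # external DTD reference
        "instructions": instructions,          # processing instructions
        "root": doc.documentElement.tagName,   # the document element
    }


prolog = describe_prolog(DOCUMENT)
for key, value in prolog.items():
    print(key, "=", value)
```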

An XML Document

A complete well-formed XML document may therefore look like this:
<?xml version="1.0"?>
<LAND>
   <FOREST>
      <TREE>Oak</TREE>
      <TREE>Pine</TREE>
      <TREE>Maple</TREE>
   </FOREST>
   <MEADOW>
      <GRASS>Bluegrass</GRASS>
      <GRASS>Fescue</GRASS>
      <GRASS>Rye</GRASS>
   </MEADOW>
</LAND>
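Because the document has a single root element with cleanly nested children, a program can walk it directly. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the function name children_by_parent is an assumption made up for this illustration.

```python
import xml.etree.ElementTree as ET

LAND = """<?xml version="1.0"?>
<LAND>
   <FOREST>
      <TREE>Oak</TREE>
      <TREE>Pine</TREE>
      <TREE>Maple</TREE>
   </FOREST>
   <MEADOW>
      <GRASS>Bluegrass</GRASS>
      <GRASS>Fescue</GRASS>
      <GRASS>Rye</GRASS>
   </MEADOW>
</LAND>"""


def children_by_parent(xml_text):
    """Map each child of the root element to the text of its own children."""
    root = ET.fromstring(xml_text)  # root is the LAND element
    return {section.tag: [child.text for child in section] for section in root}


print(children_by_parent(LAND))
```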
The document below is not an XML document because it does not follow the rules for a well-formed document. It has more than one top-level element, and a well-formed document must have exactly one.
<?xml version="1.0"?>
<FOREST>
   <TREE>Oak</TREE>
   <TREE>Pine</TREE>
   <TREE>Maple</TREE>
</FOREST>
<MEADOW>
   <GRASS>Bluegrass</GRASS>
   <GRASS>Fescue</GRASS>
   <GRASS>Rye</GRASS>
</MEADOW>
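The difference between the two documents can be checked with a parser. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module, which raises ParseError when a document is not well formed. The is_well_formed helper and the shortened sample strings are assumptions made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Shortened versions of the two documents above.
ONE_ROOT = """<?xml version="1.0"?>
<LAND>
   <FOREST><TREE>Oak</TREE></FOREST>
   <MEADOW><GRASS>Rye</GRASS></MEADOW>
</LAND>"""

TWO_ROOTS = """<?xml version="1.0"?>
<FOREST><TREE>Oak</TREE></FOREST>
<MEADOW><GRASS>Rye</GRASS></MEADOW>"""


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        # Raised for any well-formedness error, including a second
        # top-level element after the document element has closed.
        return False
    return True


print(is_well_formed(ONE_ROOT))
print(is_well_formed(TWO_ROOTS))
```

Note that this checks only well-formedness. It does not validate the document against a DTD or an XML Schema.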
