What Is XML?
· A subset of the Standard Generalized Markup Language (SGML), designed to make it easy to interchange structured docs over the Internet
· Defines how Internet Uniform Resource Locators can be used to identify component parts of XML data streams
· Document Type Definition (DTD): a formal model that defines the role of each element of text; not required in XML
· XML lets users bring multiple files together to form compound docs, specify where pictures go in the text, give processing control info to supporting programs, and add editorial comments
· A doc is composed of a series of entities; each one contains one or more logical elements, and each element can have attributes that describe how it is to be processed
· Use a DTD to define tag sets
· Some elements are placeholders (empty elements). In XML they are written with a single empty-element tag such as <logo/> rather than a separate end-tag. Usually used for graphics.
· An important attribute is the unique identifier, which allows a cross-reference between two points in a doc
· Text entity: commonly used text, declared in the DTD and referenced by name
· An XML file normally has three types of markup, the first two optional: processing instruction, document type declaration, document instance
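The pieces listed above (optional processing instruction, optional document type declaration, document instance, an empty element, a unique identifier, a text entity) can be seen together in one small doc. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. The memo vocabulary, the file names, and the entity name "co" are invented for illustration, and the parser is non-validating: it reads the internal DTD subset to expand the entity but does not check the doc against the declarations.

```python
import xml.etree.ElementTree as ET

# Hypothetical memo doc: XML declaration, processing instruction,
# document type declaration (internal DTD subset), then the document instance.
DOC = """<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="memo.css"?>
<!DOCTYPE memo [
  <!ELEMENT memo (to, body, logo)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
  <!ELEMENT logo EMPTY>
  <!ATTLIST memo id ID #REQUIRED>
  <!ATTLIST logo src CDATA #REQUIRED>
  <!ENTITY co "Example Corp">
]>
<memo id="m1">
  <to>All staff</to>
  <body>Welcome to &co;.</body>
  <logo src="logo.png"/>
</memo>"""

root = ET.fromstring(DOC)

memo_id = root.get("id")            # unique identifier attribute
body_text = root.find("body").text  # the text entity &co; is expanded by the parser
logo = root.find("logo")            # empty element: no text, no children

print(memo_id, "|", body_text, "|", logo.get("src"))
```

The empty element carries all of its information in attributes, which is why placeholders such as graphics are usually modeled this way.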
Survey of XML standards
· Builds on Unicode & DTD
· XML 1.1: first revision of XML; revises the treatment of characters in the XML specification so that it adapts more naturally to changes in the Unicode specification, and addresses normalization of characters
· Based on Standard Generalized Markup Language
· XML Catalogs: instructions for how an XML processor resolves XML entity identifiers into actual documents. Covers system identifiers (given by URIs) and public identifiers.
· Namespaces in XML: universal naming of elements and attributes in XML docs. Assigns vocabulary markers, e.g. if you want to embed XHTML.
· XML Base: associates XML elements with URIs to specify how relative URIs are resolved in relevant XML processing actions
· Canonical XML Version 1.0: standard method for generating a physical representation of an XML document, called the canonical form. Accounts for the variations that XML syntax allows without changing meaning.
· XML Path Language (XPath): syntax and data model for addressing parts of an XML document; a little language
· XPointer Framework: defines a language to refer to fragments of an XML doc
· XLink: generic framework for expressing links in XML docs. Harder than in HTML.
· RELAX NG: XML schema language used to define and limit XML vocabs. The original schema language is the DTD, but some people dislike it, while RELAX NG is simpler and more expressive.
· W3C XML Schema: another schema language for XML. The first part constrains the structure of the doc, the second constrains the contents of simple elements and attributes.
· Schematron: schema language that registers a collection of rules against which the XML doc is checked, rather than mapping out the entire tree structure
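The canonical-form idea from the list above can be illustrated with a short sketch. It assumes Python 3.8 or later, where xml.etree.ElementTree.canonicalize is available; note that this function implements Canonical XML 2.0 rather than the Version 1.0 named in the notes, so it is a stand-in that shows the same principle. The two input strings are invented examples that differ only in ways XML syntax permits (attribute order, quote style, empty-element form).

```python
import xml.etree.ElementTree as ET

# Two serializations of the same logical doc (hypothetical example data).
variant_a = """<doc b="2" a='1'><empty/></doc>"""
variant_b = """<doc a="1" b="2"><empty></empty></doc>"""

# canonicalize() returns the canonical form as a string when no output file is given.
canon_a = ET.canonicalize(variant_a)
canon_b = ET.canonicalize(variant_b)

print(canon_a)
print(canon_a == canon_b)
```

Comparing canonical forms rather than raw text is what makes operations such as digital signatures over XML practical, since harmless syntactic variation no longer changes the bytes being compared.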
Extending Your Markup
· XML docs look like HTML docs: each starts with a prolog and ends with exactly one element
· Single element can be viewed as root of doc, build off from there
· The DTD is declared in the XML doc’s prolog with the <!DOCTYPE> declaration
· Elements are nonterminal or terminal. Nonterminal elements contain subelements, grouped as sequences or choices. Terminal elements are declared as parsed character data (#PCDATA) or EMPTY. Elements can also be declared as ANY.
· Elements can have zero or more attributes, declared using the <!ATTLIST> declaration
· Character data (CDATA) is the most common data type for attributes. Other types include ID, IDREF, and IDREFS.
· Namespaces avoid name clashes and can be declared on any element. Good practice: declare all namespaces on the root element and use unique prefixes. Namespaces and DTDs don’t work well together.
· XLink describes how two docs can be linked
· XPointer enables addressing individual parts of an XML doc
· XPath used by XPointer to describe location paths
· A location path consists of one or more location steps
· XLink to link docs together, uses its own namespace
· XSL is two languages: a transformation language (XSLT) and a formatting language
· XSLT can transform XML into HTML, bypassing the formatting language
· XML is a family of languages
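Namespaces and location paths from the list above can be tried out with the limited XPath subset that Python's xml.etree.ElementTree supports. This is a sketch under stated assumptions: the library vocabulary, the urn:example:library namespace URI, and the "lib" prefix are hypothetical, and ElementTree covers only part of XPath 1.0, not the full language used by XPointer.

```python
import xml.etree.ElementTree as ET

# All namespaces are declared on the root element, each with its own prefix
# (the default namespace has none). The XHTML namespace URI is the real one.
DOC = """<library xmlns="urn:example:library"
         xmlns:h="http://www.w3.org/1999/xhtml">
  <book id="b1"><title>XML Basics</title><h:p>Intro <h:em>text</h:em></h:p></book>
  <book id="b2"><title>XPath Notes</title></book>
</library>"""

root = ET.fromstring(DOC)

# Prefixes used in the paths below; they only need to map to the right URIs.
NS = {"lib": "urn:example:library", "h": "http://www.w3.org/1999/xhtml"}

# Location path with two location steps: child book, then child title.
titles = [t.text for t in root.findall("lib:book/lib:title", NS)]

# A step with a predicate that selects by attribute value.
second = root.find("lib:book[@id='b2']", NS)

# Descendant search for an embedded XHTML element.
em = root.find(".//h:em", NS)

print(titles, second.get("id"), em.tag)
```

The parser stores each name together with its namespace URI, so the prefixes chosen in NS do not have to match the prefixes used in the doc itself.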
W3Schools XML
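· Greatest strength of XML Schemas is support for data types. Schemas are themselves written in XML syntax, so you don’t have to learn a new language. Schemas also secure data communication, because sender and receiver share the same expectations about the content.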
· Well-formed XML doc is a doc that conforms to the XML syntax rules
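· Schema elements are declared with either complex types or simple types
· The <schema> element is the root element of every XML Schema
· A simple element contains only text; it cannot contain other elements or have attributes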
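Two of the points above lend themselves to a quick check: a schema is itself an XML doc whose root is the <schema> element, and well-formedness only means the syntax rules are met. The sketch below uses Python's standard xml.etree.ElementTree. The note vocabulary and the helper name is_well_formed are hypothetical, and the code does not validate anything against the schema (ElementTree has no XML Schema validator); it only parses the schema as ordinary XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: "note" has a complex type, "to" and "sent" are simple elements.
XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="sent" type="xs:date"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

XS = "{http://www.w3.org/2001/XMLSchema}"  # namespace URI in ElementTree's tag notation

schema = ET.fromstring(XSD)  # parses like any other XML doc

# Simple elements are the declarations that carry a built-in data type directly.
simple_elements = {
    e.get("name"): e.get("type")
    for e in schema.iter(XS + "element")
    if e.get("type")
}

def is_well_formed(text):
    """Return True if text obeys the XML syntax rules (no validity check)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(schema.tag == XS + "schema", simple_elements)
print(is_well_formed("<note><to>Tove</to></note>"))
print(is_well_formed("<note><to>Tove</note>"))
```

A doc can be well-formed without being valid: the second call above passes even though the doc lacks the sent element that the sketch schema would require.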