sgml
#define sgml: \
I---------------------------------------------------\
I---------------------------------------------------\
I---------------------------------------------------\
I /$$$$$$ /$$$$$$ /$$ /$$ /$$ \
I /$$__ $$ /$$__ $$| $$$ /$$$| $$ \
I | $$ \__/| $$ \__/| $$$$ /$$$$| $$ \
I | $$$$$$ | $$ /$$$$| $$ $$/$$ $$| $$ \
I \____ $$| $$|_ $$| $$ $$$| $$| $$ \
I /$$ \ $$| $$ \ $$| $$\ $ | $$| $$ \
I | $$$$$$/| $$$$$$/| $$ \/ | $$| $$$$$$$$ \
I \______/ \______/ |__/ |__/|________/ \
I---------------------------------------------------\
I---------------------------------------------------\
I---------------------------------------------------I
• "Standard Generalized Markup Language"
• the forefather of many markup languages
• used for describing structured documents
• hierarchical by nature
• today mostly of historical significance; useful for explaining the relationship
between modern markup languages
{
┌──▶ numerous GUI specification languages
│
┌──▶ XML ──┴──▶ MathML
│ └───┐
SGML ───┤ └─┐
│ ├──▶ XHTML
└──▶ HTML ───┘
}
Tags:
<tag-name attributes*>
(contents)
</tag-name>
<!-- or -->
<tag-name />
• the smallest building block of a document
• tags are nested inside one another to represent hierarchy
• tags must be closed before their parents are closed;
overlapping (interleaved) elements are not allowed
• formally, an element is an instance of a tag
• however, "tag" is also the term for the syntax element
delimited by '<' and '>'
• the tag that marks the beginning of an element is called an opening tag
• the tag that marks the end of an element is called a closing tag
• paired elements consist of an opening tag and a closing tag and
optionally further markup between the two
• unpaired elements consist of a single self-closing tag;
they cannot have contents { e.g. a forced line break }
• the terms parent, child and sibling are used to express
the relationship between elements
<hello-world> My best greetings! </hello-world>
A A A
\-------------|------------------|-------- An opening tag \
\-----------------|-------- Text inside the element } A paired element.
\------- A closing tag /
<example />
A
|
An unpaired element.
<a> <!-- parent of elements 'b' and 'c' -->
<b/> <!-- child of 'a', sibling of 'c' -->
<c/> <!-- child of 'a', sibling of 'b' -->
</a>
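The nesting rule and the parent/child relationship can be checked with a parser. A minimal sketch, assuming Python's standard xml.etree.ElementTree module as a stand-in for a generic XML parser; the helper name is_well_formed is made up for this note:

```python
import xml.etree.ElementTree as ET

# Parse the example above: 'a' is the parent, 'b' and 'c' are its children.
root = ET.fromstring("<a><b/><c/></a>")
children = [child.tag for child in root]


def is_well_formed(markup: str) -> bool:
    """Return True if the parser accepts the markup (hypothetical helper)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# 'b' is closed before its parent 'a': accepted.
nested_ok = is_well_formed("<a><b></b></a>")
# 'a' is closed while 'b' is still open (overlapping tags): rejected.
overlapping_ok = is_well_formed("<a><b></a></b>")

print(children, nested_ok, overlapping_ok)
```

Iterating over an element yields its direct children in document order, so siblings are simply the other items of the same iteration.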
Attributes:
• live inside an opening tag
• can optionally have a value
• contain metadata about the element
• separated by spaces from the tag name and from other attributes
<hello-world id="HW"> </hello-world>
A
|
id attribute with the value HW,
(some variation of this is usually
supported to name elements)
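Reading attributes programmatically could look like the sketch below. It assumes Python's standard library: xml.etree.ElementTree for the XML case, where every attribute needs a quoted value, and html.parser for the more lenient HTML-style case, where a value is optional. The class name AttributeCollector and the 'hidden' attribute are made up for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# XML: attributes are available as a dict-like mapping on the element.
element = ET.fromstring('<hello-world id="HW"> </hello-world>')
element_id = element.get("id")            # value of the id attribute
fallback = element.get("lang", "n/a")     # default for a missing attribute


class AttributeCollector(HTMLParser):
    """Record (tag, attributes) for every opening tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None when omitted
        self.seen.append((tag, attrs))


# HTML-style markup: 'hidden' is an attribute without a value.
collector = AttributeCollector()
collector.feed('<hello-world id="HW" hidden> </hello-world>')

print(element_id, fallback, collector.seen)
```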
DTD:
https://www.htmlhelp.com/design/dtd/customdtd.html
• "Document Type Definition"
• tag definition language
• the tags of SGML are user defined, provided in the DTD format
• the HTML standards up to HTML 4.01 specify their structure with DTDs
XML:
• "Extensible Markup Language"
• provides the freedom of defining your own tags and attributes
• a stricter, simplified subset of SGML which is easier to parse
• Microsoft uses it extensively { WPF; MSBuild; }
• since the exact syntax of any XML document requires extra context,
there is not much to say about XML in a vacuum
XPATH:
• "XML Path Language"
• technology for locating information inside XML
• designed for XML, but applicable to any parsed tag tree; HTML tooling supports it too
• by design, similar to UNIX file paths,
where tags behave like folders;
this is a perfectly clear abstraction since both describe trees
• sometimes useful for web scraping
— examples:
• assume the following document
<data>
<employees>
<employee id="12">
<first-name>John</first-name>
<last-name>Doe</last-name>
</employee>
<employee id="4">
<first-name>Anon</first-name>
<last-name>Anonson</last-name>
</employee>
</employees>
</data>
• the tag names behave like folder names
• nodes are separated by '/'s
/data/employees
• if there are multiple matches inside the specified tag,
they behave like an array with 1-based indexing
# selects the employee with id 12
# NOTE: employee[0] does not exist
/data/employees/employee[1]
• one can also filter by the text of a child element
# selects the employee with id 4
/data/employees/employee[first-name = "Anon"]
• or by attribute
# selects the employee with id 4
/data/employees/employee[@id = "4"]
• or access the parent
# selects <employees>
/data/employees/employee[first-name = "Anon"]/..
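The queries above can be tried out with Python's standard xml.etree.ElementTree, as a rough sketch. Two assumptions to keep in mind: ElementTree implements only a subset of XPath and rejects absolute paths on an element, so each path is written relative to the root element 'data'; and the id values are quoted, since XML requires quoted attribute values.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<data>
  <employees>
    <employee id="12">
      <first-name>John</first-name>
      <last-name>Doe</last-name>
    </employee>
    <employee id="4">
      <first-name>Anon</first-name>
      <last-name>Anonson</last-name>
    </employee>
  </employees>
</data>"""

root = ET.fromstring(DOCUMENT)  # root is the <data> element

# /data/employees
employees = root.find("employees")

# /data/employees/employee[1]  (1-based position)
first = root.find("employees/employee[1]")

# /data/employees/employee[first-name = "Anon"]  (filter by child text)
by_name = root.find("employees/employee[first-name='Anon']")

# /data/employees/employee[@id = "4"]  (filter by attribute)
by_id = root.find("employees/employee[@id='4']")

# /data/employees/employee[first-name = "Anon"]/..  (step up to the parent)
parent = root.find("employees/employee[first-name='Anon']/..")

print(first.get("id"), by_name.get("id"), by_id.get("id"), parent.tag)
```

find() returns the first match or None; findall() with the same path returns every match as a list.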