sgml

#define sgml: \
I---------------------------------------------------\
I---------------------------------------------------\
I---------------------------------------------------\
I      /$$$$$$   /$$$$$$  /$$      /$$ /$$          \
I     /$$__  $$ /$$__  $$| $$$    /$$$| $$          \
I    | $$  \__/| $$  \__/| $$$$  /$$$$| $$          \
I    |  $$$$$$ | $$ /$$$$| $$ $$/$$ $$| $$          \
I     \____  $$| $$|_  $$| $$  $$$| $$| $$          \
I     /$$  \ $$| $$  \ $$| $$\  $ | $$| $$          \
I    |  $$$$$$/|  $$$$$$/| $$ \/  | $$| $$$$$$$$    \
I     \______/  \______/ |__/     |__/|________/    \
I---------------------------------------------------\
I---------------------------------------------------\
I---------------------------------------------------I

"Standard Generalized Markup Language"
    • the forefather of many markup languages
    • used for describing structured documents
    • hierarchical by nature
    • has only historical significance today; it is covered here to explain
      the relationship between modern markup languages
    {
        // Evolution --->
                           ┌──▶ numerous GUI specification languages
                           │
                ┌──▶ XML ──┴──▶ MathML
                │      └───┐
        SGML ───┤          └─┐
                │            ├──▶ XHTML
                └──▶ HTML ───┘
    }

Tags:
    <tag-name attributes*> (contents) </tag-name>
        <!-- or -->
    <tag-name />
    • the smallest building block of a document
    • tags are nested inside one another to represent hierarchy
    • tags must be closed before their parents are closed;
      overlapping (intersecting) structures are not allowed
    • formally, an element is an instance of a tag
    • however, "tag" is also the term used for the syntax element
      delimited by '<' and '>'
    • the tag that marks the beginning of an element is called an opening tag
    • the tag that marks the end of an element is called a closing tag
    • paired elements consist of an opening tag and a closing tag,
      and optionally further markup between the two
    • unpaired elements consist of a single self-closing tag;
      they cannot have contents { forced line break }
    • the terms parent, children and sibling are used to express
      the relationship between elements

    <hello-world>  My best greetings!  </hello-world>
          A                 A                 A
          \-----------------|-----------------|-------- An opening tag          \
                            \-----------------|-------- Text inside the element  } A paired element.
                                              \-------- A closing tag           /

    <example />
         A
         |
    An unpaired element.

    <a>        <!-- parent of elements 'b' and 'c' -->
      <b/>     <!-- child of 'a', sibling of 'c' -->
      <c/>     <!-- child of 'a', sibling of 'b' -->
    </a>

Attributes:
    • live inside an opening tag
    • can optionally have a value
    • contain metadata about the element
    • are separated by spaces from the tag name and from other attributes

    <hello-world id="HW"> </hello-world>
                 A
                 |
    id attribute with the value HW
    (some variation of this is usually supported to name elements)

DTD:
    https://www.htmlhelp.com/design/dtd/customdtd.html
    "Document Type Definition"
    • tag definition language
    • the tags of SGML are user defined; the definitions are provided in the DTD format
    • the standard HTML structures are specified in DTD

XML:
    "Extensible Markup Language"
    • provides the freedom of defining your own tags and attributes
    • a stricter alternative to SGML, which makes it easier to parse
    • Microsoft uses it extensively { WPF; MSBuild; }
    • since the exact syntax of any XML document requires extra context,
      there is not much to say about XML in a vacuum

XPATH:
    "XML Path Language"
    • technology for locating information inside XML
    • designed for XML; HTML tooling supports it too
    • by design, similar to UNIX file paths, where tags behave like folders;
      this is a natural abstraction, since both describe trees
    • sometimes useful for web scraping
    - examples:
        • assume the following document
            <data>
              <employees>
                <employee id="12">
                  <first-name>John</first-name>
                  <last-name>Doe</last-name>
                </employee>
                <employee id="4">
                  <first-name>Anon</first-name>
                  <last-name>Anonson</last-name>
                </employee>
              </employees>
            </data>
        • the tag names behave like folder names
        • nodes are separated by '/'s
            /data/employees
        • if there are multiple matches inside the specified tag,
          they behave like an array with 1-based indexing
            # selects <employee id="12">
            # NOTE: employee[0] does not exist
            /data/employees/employee[1]
        • one can also filter by the text value of a child tag
            # selects <employee id="4">
            /data/employees/employee[first-name = "Anon"]
        • or by attribute
            # selects <employee id="4">
            /data/employees/employee[@id = "4"]
        • or access the parent
            # selects <employees>
            /data/employees/employee[first-name = "Anon"]/..
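
The XPath examples above can be tried out with a short script. The following is a minimal sketch in Python, assuming the standard library module xml.etree.ElementTree, which implements only a subset of XPath. Two things differ from the notation above and are assumptions of this sketch, not part of XPath itself: the paths are written relative to the root element <data> (hence "./employees/..." instead of "/data/employees/..."), and the attribute values in the sample document are quoted, since XML requires that. The variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The sample document from the examples above (attribute values quoted).
DOCUMENT = """
<data>
  <employees>
    <employee id="12">
      <first-name>John</first-name>
      <last-name>Doe</last-name>
    </employee>
    <employee id="4">
      <first-name>Anon</first-name>
      <last-name>Anonson</last-name>
    </employee>
  </employees>
</data>
"""

# fromstring() returns the root element, which is <data> here.
root = ET.fromstring(DOCUMENT.strip())

# 1-based positional index: the first <employee> inside <employees>.
first = root.find("./employees/employee[1]")

# Filter by the text of a child element.
by_child_text = root.find("./employees/employee[first-name='Anon']")

# Filter by attribute value.
by_attribute = root.find("./employees/employee[@id='4']")

# Step back up to the parent of the matched element.
parent = root.find("./employees/employee[first-name='Anon']/..")

print(first.get("id"))          # id of the first employee
print(by_child_text.get("id"))  # id of the employee named Anon
print(by_attribute.get("id"))   # id of the employee selected by attribute
print(parent.tag)               # tag name of the parent element
```

For the full XPath language (absolute paths, functions, axes), a dedicated XPath engine would be needed; the subset above is enough to mirror the four queries shown.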