Network Working Group                                         P. Hoffman
Internet-Draft                                                     ICANN
Obsoletes: 2629 (if approved)                          December 21, 2015
Intended status: Informational
Expires: June 23, 2016

The 'XML2RFC' version 3 Vocabulary

Abstract

This document defines the "XML2RFC" version 3 vocabulary, an XML-based language used for writing RFCs and Internet-Drafts. It is heavily derived from the version 2 vocabulary that is also under discussion. This document obsoletes the v2 grammar described in RFC 2629 and its expected follow-up, draft-iab-xml2rfc.

Editorial Note (To be removed by RFC Editor)

Discussion of this draft takes place on the rfc-interest mailing list (rfc-interest@rfc-editor.org), which has its home page at <https://www.rfc-editor.org/mailman/listinfo/rfc-interest>.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as “work in progress”.

This Internet-Draft will expire on June 23, 2016.

Copyright Notice

Copyright (c) 2015 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.


1. Introduction

This document describes version 3 ("v3") of the "XML2RFC" vocabulary, an XML-based language ('Extensible Markup Language', [XML]) used for writing RFCs ([RFC7322]) and Internet-Drafts ([IDGUIDE]).

This document obsoletes the version 2 vocabulary ("v2") [XML2RFCv2], which contains the extended language definition. That document in turn obsoletes the original version ("v1") [RFC2629]. This document directly copies the material from [XML2RFCv2] where possible; as that document makes its way toward RFC publication, this document will incorporate as many of the changes as possible.

The v3 format will be used as part of the new RFC series described in [RFC6949]. The new format will be handled by one or more new tools for preparing the XML and converting it to other representations. Features of the expected tools are described in Appendix B. That section defines some terms used throughout this document, such as "prep tool" and "formatter".

Note that the vocabulary contains certain constructs that might not be used when generating the final text; however, they can provide useful data for other uses (such as index generation, populating a keyword database, or syntax checks).

In this document, the term "format" is used when describing types of documents, primarily XML and HTML. The term "representation" is used when talking about a specific instantiation of a format, such as an XML document or an HTML document that was created from an XML document.

1.1. Expected Updates to the Specification

Later versions of this specification are likely to contain non-interoperable changes, based on experience gained in implementing the RFC production center toolset. Revised documents capturing those changes will be published as the toolset is completed. Other implementers must not expect those later versions to remain backward-compatible with the details described in this document.

1.2. Design Criteria for the Changes in v3

The design criteria of the changes from v2 to v3 are:

  • The intention is that starting and editing a v3 document will be easier than for a v2 document.
  • There will be good v2-to-v3 conversion tools for when an author wants to change versions.
  • There are no current plans to make v3 XML the required submission format for drafts or RFCs. That might happen eventually, but it is likely to be years away.

There is a desire to keep as much of the v2 grammar as makes sense within the above design criteria and not to make gratuitous changes to the v2 grammar. Another way to say this is "we would rather encourage backward compatibility but not be constrained by it". Still, the goal of starting and editing a v3 document being easier than for a v2 document is more important than backwards compatibility with v2, given the latter two design criteria.

v3 is upwards compatible with v2, meaning that a v2 document is meant to be a valid v3 document as well. However, some features of v2 are deprecated in v3 in favor of new elements. Deprecated features are listed in Section 1.3.3, and are described in [XML2RFCv2].

1.3. Differences from v2 to v3

This is a hopefully complete list of all the technical changes between [XML2RFCv2] and this document.

1.3.1. New Elements in v3

  • Add <dl>, <ul>, and <ol> as new ways to make lists. This is a significant change from v2: the children of <ul> and <ol> are <li> elements (and the children of <dl> are <dt> and <dd> elements), not <t>. <li> either contains one or more <t> elements or contains the flowing text normally found in <t>. These lists are children of <section>s and other lists instead of <t>.
  • Add <strong>, <em>, <tt>, <sub>, and <sup> for character formatting.
  • Add <aside> for incidental text that will be indented when displayed.
  • Add <sourcecode> to differentiate from <artwork>.
  • Add <table>, <thead>, <tbody>, <tfoot>, <tr>, <td>, and <th> to give table functionality like that in HTML.
  • Add <boilerplate> to hold the automatically-generated boilerplate text.
  • Add <blockquote> to indicate a quotation in a paragraph-like format.
  • Add <name> to sections, notes, figures, and texttables to allow character formatting (fixed-width font) in their titles, and to allow references in the names.
  • Add <postalLine>, free text that represents one line of the address.
  • Add <displayreference> to allow display of more mnemonic anchor names for automatically included references.
  • Add <refcontent> to allow better control of text in a reference.
  • Add <referencegroup> to allow referencing multi-RFC documents such as STDs and BCPs.
  • Add <relref> to allow referencing specific sections or anchors in references.
  • Add <link> to point to a resource related to the RFC.
  • Add <br> to allow line breaks (but not blank lines) in the generated output for table cells.
  • Add <svg> to allow easy inclusion of SVG drawings in <artwork>.

1.3.2. New Attributes for Existing Elements

  • Add "sortRefs", "symRefs", "tocDepth", and "tocInclude" attributes to <rfc> to cover Processing Instructions (PIs) that were in v2 that are still needed in the grammar. Add "prepTime" to indicate the time that the XML went through a preparation step. Add "version" to indicate the version of XML2RFC vocabulary used in the document. Add "scripts" to indicate which scripts are needed to render the document. Add "expiresDate" to indicate when an Internet-Draft expires.
  • Add "ascii" attributes to <email>, <organization>, <street>, <city>, <region>, <country>, and <code>. Also add "asciiFullname", "asciiInitials", and "asciiSurname" to <author>. This allows an author to specify their information in their native scripts as the primary entry and still allow the ASCII-equivalent values to appear in the processed documents.
  • Add "anchor" attributes to many block elements to allow them to be linked with <relref> and <xref>.
  • Add the "section", "relative", and "sectionFormat" attributes to <xref>.
  • Add the "numbered" and "removeInRFC" attributes to <section>.
  • Add the "removeInRFC" attribute to <note>.
  • Add "pn" to <artwork>, <aside>, <blockquote>, <boilerplate>, <dt>, <figure>, <li>, <section>, <sourcecode>, <t>, and <table> to hold automatically generated numbers for items in a section that don't have their own numbering (namely figures and tables).
  • Add "display" to <cref> to indicate to tools whether or not to display the comment.
  • Add "keepWithNext" and "keepWithPrevious" to <t> as a hint to tools that do pagination that they should try to keep the paragraph with the next/previous element.

1.3.3. Elements and Attributes Deprecated from v2

Deprecated elements and attributes are legacy vocabulary from v2 that are supported for input to v3 tools. They are likely to be removed from those tools in the future. They are listed in Section 3 rather than in Section 2. See Appendix B for more information on tools and how they will handle deprecated features.

  • Deprecate <list> in favor of <dl>, <ul>, and <ol>.
  • Deprecate <spanx>; replace it with <strong>, <em>, and <tt>.
  • Deprecate <vspace> because the major use for it, creating pseudo-paragraph-breaks in lists, is now handled properly.
  • Deprecate <texttable>, <ttcol>, and <c>; replace them with the new table elements (<table> and the elements that can be contained within it).
  • Deprecate <facsimile> because it is rarely used.
  • Deprecate <format> because it is not useful and has caused surprise for authors in the past. If the goal is to provide a single URI (Uniform Resource Identifier) for a reference, use the "target" attribute on <reference> instead.
  • Deprecate <preamble> and <postamble> in favor of simply using <t> before or after the figure. This also deprecates the "align" attribute in <figure>.
  • Deprecate the "title" attribute in <section>, <note>, <figure>, <references>, and <texttable> in favor of the new <name>.
  • Deprecate the "alt" and "src" attributes in <figure> because they overlap with the attributes in <artwork>.
  • Deprecate the "xml:space" attribute in <artwork> because there was only one useful value. Deprecate the "height" and "width" attributes in both <artwork> and <figure> because they are not needed for the new output formats.
  • Deprecate the "pageno" attribute in <xref> because it was unused in v2. Deprecate the "none" value for the "format" attribute in <xref> because it makes no sense semantically.

1.3.4. Additional Changes from v2

  • Allow non-ASCII characters in the format; the characters that are actually allowed will be determined by the RFC Editor.
  • Allow <artwork> and <sourcecode> to be used on their own in <section> (no longer confine them to a figure).
  • Give more specifics of handling the "type" attribute in <artwork>.
  • Allow <strong>, <em>, <tt>, <eref>, and <xref> in <cref>.
  • Allow the sub-elements inside a <reference> to be in any order.
  • Turn off the auto-generation of anchors in <cref> because there is no use case for them that cannot be achieved in other ways.
  • Allow more than one <artwork>, or more than one <sourcecode>, in <figure>.
  • In <front>, make <date> optional.
  • In <postal>, allow the sub-elements to be in any order. Also allow the inclusion of the new <postalLine> instead of the older elements.
  • In <section>, restrict the names of the anchors that can be used on some types of sections.
  • Make <seriesInfo> a child of <front>, and deprecate it as a child of <reference>. This also deprecates some of the attributes from <rfc> and moves them into <seriesInfo>.
  • <t> now only contains non-block elements, so it no longer contains <figure> elements.
  • Do not generate the grammar from a DTD, but instead get it directly from the RELAX NG grammar [RNG].

1.4. Syntax Notation

The XML vocabulary here is defined in prose, based on the Relax NG schema ([RNC]) contained in Appendix C (specified in Relax NG Compact Notation, "RNC").

Note that the schema can be used for automated validity checks, but certain constraints are only described in prose (example: the conditionally required presence of the "abbrev" attribute).

2. Elements

The sections below describe all elements and their attributes.

Note that attributes not labeled "mandatory" are optional.

Many elements have an optional "anchor" attribute. In all cases, the value of the "anchor" attribute needs to be a valid XML "Name" (Section 2.3 of [XML]), additionally constrained to US-ASCII characters ([USASCII]). Thus, the character repertoire consists of "A-Z", "a-z", "0-9", "_", "-", ".", and ":", where "0-9", ".", and "-" are disallowed as the start character. Anchors are described in more detail in Appendix B.2.
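The anchor rule above amounts to a single pattern. The following Python sketch assumes that a tool only needs a yes/no answer. The names ANCHOR_RE and is_valid_anchor are illustrative and do not come from any existing tool:

```python
import re

# First character: ASCII letter, "_" or ":".
# Later characters may additionally be ASCII digits, "." or "-".
ANCHOR_RE = re.compile(r"[A-Za-z_:][A-Za-z0-9_.:-]*")


def is_valid_anchor(value: str) -> bool:
    """Return True if value is an XML Name restricted to US-ASCII."""
    return ANCHOR_RE.fullmatch(value) is not None


for candidate in ["sec-intro", "fig:1.a", "1intro", "-x", "caf\u00e9"]:
    print(candidate, is_valid_anchor(candidate))
```

fullmatch() is used instead of match() so that trailing characters, including a trailing newline, cause the check to fail.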

Tools interpreting the XML described here will collapse horizontal whitespace and line breaks to a single space (except inside <artwork> and <sourcecode>), and will trim leading and trailing whitespace.
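The whitespace rule can be sketched in Python as follows. This sketch assumes that text is processed one element at a time and that "whitespace" means the four XML whitespace characters. The draft does not say how other space characters are treated, and normalize_text is a hypothetical helper:

```python
import re

# XML whitespace: space, tab, carriage return, line feed.
# Other characters, such as U+00A0 NO-BREAK SPACE, are left alone.
XML_WHITESPACE = re.compile(r"[ \t\r\n]+")

# Elements whose text is kept exactly as written.
VERBATIM_ELEMENTS = frozenset({"artwork", "sourcecode"})


def normalize_text(text: str, element_name: str) -> str:
    """Collapse whitespace runs to one space and trim, except in verbatim elements."""
    if element_name in VERBATIM_ELEMENTS:
        return text
    return XML_WHITESPACE.sub(" ", text).strip(" ")


print(repr(normalize_text("  The packet\n\t  is dropped. ", "t")))
```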

Some of the elements have attributes that are not described in this section because those attributes are specific to the prep tool. People writing tools to process this format should read all of the appendices for a complete description of these attributes.

Every element in the v3 vocabulary can have an "xml:lang" attribute, an "xml:base" attribute, or both. The xml:lang attribute specifies the language used in the element. This is sometimes useful for renderers that display different fonts for ideographic characters used in China and Japan. The xml:base attribute is sometimes added to an XML file when doing XML-to-XML conversion where the base file has XInclude elements (see Appendix B.1).

2.1. <abstract>

Contains the abstract of the document. See [RFC7322] for more information on restrictions for the abstract.

This element appears as a child element of: <front> (Section 2.26).

In any order, but at least one of:

2.1.1. "anchor" attribute

Document-wide unique identifier for the abstract.

2.2. <address>

Provides address information for the author.

This element appears as a child element of: <author> (Section 2.7).

In this order:

  1. One optional <postal> element (Section 2.37)
  2. One optional <phone> element (Section 2.36)
  3. One optional <facsimile> element (Section 3.2)
  4. One optional <email> element (Section 2.23)
  5. One optional <uri> element (Section 2.64)

2.3. <annotation>

Provides additional prose augmenting a bibliographical reference. This text is intended to be shown after the rest of the generated reference text.

This element appears as a child element of: <reference> (Section 2.40).

In any order:

2.4. <area>

Provides information about the IETF area to which this document relates (currently not used when generating documents).

The value ought to be either the full name or the abbreviation of one of the IETF areas as listed on <http://www.ietf.org/iesg/area.html>. The list will be kept by the RFC Editor.

This element appears as a child element of: <front> (Section 2.26).

Content model: only text content.

2.5. <artwork>

This element allows the inclusion of "artwork" into the document. <artwork> provides full control of horizontal whitespace and line breaks, and thus is used for a variety of things, such as diagrams ("line art") and protocol unit diagrams.

Alternatively, the "src" attribute allows referencing an external graphics file, such as a vector drawing in SVG or a bitmap graphic file, using a URI. In this case, the textual content acts as fallback for output representations that do not support graphics; thus, it ought to contain either (1) a "line art" variant of the graphics or (2) prose that describes the included image in sufficient detail.

If the artwork includes "&" or "<" characters, or the string "]]>", those characters need to be encoded using escaping or CDATA block(s); see <sourcecode> for a fuller description of these solutions.
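The short Python sketch below compares the two encodings. It uses the standard library's xml.sax.saxutils.escape. The helper names and the sample artwork string are made up for illustration:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def as_escaped(text: str) -> str:
    """Replace "&", "<" and ">" with entity references ("]]>" becomes "]]&gt;")."""
    return escape(text)


def as_cdata(text: str) -> str:
    """Wrap text in a CDATA block, splitting any "]]>" across two CDATA sections."""
    # A literal "]]>" would end the section early, so close the section
    # after "]]" and open a new one before ">".
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


art = "if (a < b && c) { x[y[0]]> z; }"
for wrapped in (as_escaped(art), as_cdata(art)):
    doc = "<artwork>" + wrapped + "</artwork>"
    # Either encoding parses back to exactly the original text.
    assert ET.fromstring(doc).text == art
    print(doc)
```

Escaping keeps the content in one piece. CDATA keeps the artwork readable in the source, at the cost of the split-section trick whenever "]]>" occurs.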

In [XML2RFCv2], the <artwork> element was also used for source code and formal languages; in v3, this is now done with <sourcecode>.

There are at least five ways to include SVG in artwork in Internet-Drafts:

  • Inline, by including all of the SVG in the content of the element, such as: <artwork type="svg"><svg xmlns...">
  • Inline, but using XInclude (see Appendix B.1), such as: <artwork type="svg"><xi:include href=...>
  • As a data: URI, such as: <artwork type="svg" src="data:image/svg+xml,%3Csvg%20xmlns%3D%22http%3A%2F%2Fwww.w3...">
  • As a URI to an external entity, such as: <artwork type="svg" src="http://www.example.com/...">
  • As a local file, such as: <artwork type="svg" src="diagram12.svg">
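The data: URI form in the third bullet can be produced by percent-encoding the SVG text. The following is a minimal Python sketch. It assumes UTF-8 SVG input and uses percent-encoding rather than base64, although RFC 2397 allows either. svg_data_uri is a hypothetical helper name:

```python
from urllib.parse import quote, unquote


def svg_data_uri(svg_text: str) -> str:
    """Build a percent-encoded data: URI (RFC 2397) for an SVG drawing."""
    # safe="" makes quote() encode "/" as well, as in the example above.
    return "data:image/svg+xml," + quote(svg_text, safe="")


svg = '<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10"/>'
uri = svg_data_uri(svg)
print(f'<artwork type="svg" src="{uri}"/>')
# Decoding the payload recovers the original SVG text.
assert unquote(uri.split(",", 1)[1]) == svg
```

Because double quotes are encoded as %22, the resulting URI can be placed directly in a double-quoted "src" attribute.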

The use of SVG in Internet-Drafts and RFCs is covered in much more detail in [SVGforRFCs].

The above methods for inclusion of SVG art can also be used for including text artwork, but using a data: URI is probably confusing for text artwork.

Formatters that do pagination should attempt to keep artwork on a single page. This is to prevent artwork that is split across pages from looking like two separate pieces of artwork.

This element appears as a child element of: <aside> (Section 2.6), <blockquote> (Section 2.10), <dd> (Section 2.18), <figure> (Section 2.25), <li> (Section 2.29), <section> (Section 2.46), <td> (Section 2.56), and <th> (Section 2.58).

Either:

  • Text

Or:

2.5.1. "align" attribute

Controls whether the artwork appears left justified (default), centered, or right justified.

Allowed values:

  • "left" (default)
  • "center"
  • "right"

2.5.2. "alt" attribute

Alternative text description of the artwork (which is more than just a summary or caption). When the art comes from the "src" attribute, and the format of that artwork supports alternate text, the alternative text comes from the text of the artwork itself, not from this attribute. The contents of this attribute are important to readers who are visually impaired, as well as those reading on devices that cannot show the artwork well, or at all.

2.5.3. "anchor" attribute

Document-wide unique identifier for this artwork.

2.5.4. "height" attribute

Deprecated.

2.5.5. "name" attribute

A filename suitable for the contents (such as for extraction to a local file). This attribute can be helpful for other kinds of tools (such as automated syntax checkers which work by extracting the artwork). Note that the "name" attribute does not need to be unique for artwork elements in a document. If multiple artwork elements have the same name attribute, a processing tool might assume that the elements are all fragments of a single file, and the tool can collect those fragments for later processing. See Section 5 for a discussion of possible problems with the value of this attribute.
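A processing tool that treats same-named elements as fragments of one file might collect them as in this Python sketch. The sketch assumes that fragments are concatenated in document order with a newline between them, which the draft does not specify. collect_named_artwork is a hypothetical helper:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def collect_named_artwork(root: ET.Element) -> dict[str, str]:
    """Group <artwork> text by its "name" attribute, in document order."""
    fragments: dict[str, list[str]] = defaultdict(list)
    for art in root.iter("artwork"):
        name = art.get("name")
        if name:  # unnamed artwork is not extracted
            fragments[name].append(art.text or "")
    # Assumption: fragments are joined with a newline between them.
    return {name: "\n".join(parts) for name, parts in fragments.items()}


doc = ET.fromstring(
    "<rfc><middle><section>"
    '<artwork name="ladder.txt">part 1</artwork>'
    "<artwork>unrelated diagram</artwork>"
    '<artwork name="ladder.txt">part 2</artwork>'
    "</section></middle></rfc>"
)
for name, content in collect_named_artwork(doc).items():
    print(f"== {name} ==")
    print(content)
```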

2.5.6. "src" attribute

The URI reference of a graphics file ([RFC3986]), or the name of a file on the local disk. This can be a "data" URI ([RFC2397]) that contains the contents of the graphics file. Note that the inclusion of art with the "src" attribute depends on the capabilities of the processing tool reading the XML document. Tools need to be able to handle the file: URI, and should be able to handle http: and https: URIs as well. The prep tool will be able to handle reading the "src" attribute.

If no URI scheme is given in the attribute, the attribute is considered to be a local file name relative to the current directory. Processing tools must be careful to not accept dangerous values for the filename, particularly those that contain absolute references outside the current directory. Document creators should think hard before using relative URIs due to possible later problems if files move around on the disk. Also, documents should most likely use explicit URI schemes wherever possible.
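The check that processing tools are asked to perform might look like the following Python sketch. It assumes POSIX-style "/" separators and treats any value with a URI scheme as something other than a local file name. It rejects empty values, absolute paths, and paths that climb above the current directory. is_safe_local_src is a hypothetical name:

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def is_safe_local_src(value: str) -> bool:
    """Return True if a scheme-less "src" value stays inside the current directory."""
    if urlsplit(value).scheme:
        # file:, http:, https:, data: and other URIs are not bare local file names.
        return False
    path = PurePosixPath(value)
    if not value or path.is_absolute():
        return False
    depth = 0
    for part in path.parts:
        # ".." moves up one level; any other component moves down one level.
        depth += -1 if part == ".." else 1
        if depth < 0:
            return False  # climbed above the current directory
    return True


for src in ["diagram12.svg", "art/../diagram12.svg", "../secret.svg", "/etc/passwd"]:
    print(src, is_safe_local_src(src))
```

Tracking depth, rather than rejecting every "..", still allows harmless values such as "art/../diagram12.svg".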

In some cases, the prep tool may remove the "src" attribute after processing its value. See [PREPTOOL] for a description of this.

It is an error to have both a "src" attribute and content in the <artwork> element.

2.5.7. "type" attribute

Specifies the type of the artwork. The value of this attribute is free text with certain values designated as preferred.

The preferred values for <artwork> types are:

  • ascii-art
  • binary-art
  • call-flow
  • hex-dump
  • svg

The RFC Editor will maintain a complete list of the preferred values on its web site, and that list is expected to be updated over time. Thus, a consumer of v3 XML should not fail when it encounters an unexpected type or when no type is specified.
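The tolerant behavior asked of consumers can be sketched in Python. An unexpected or missing type is classified rather than rejected. The category names and artwork_type_category are illustrative assumptions. A real tool would presumably take the current list from the RFC Editor instead of hard-coding it:

```python
from typing import Optional

# Preferred values listed in this draft; the RFC Editor's list may grow.
PREFERRED_ARTWORK_TYPES = frozenset(
    {"ascii-art", "binary-art", "call-flow", "hex-dump", "svg"}
)


def artwork_type_category(type_attr: Optional[str]) -> str:
    """Classify a "type" value without ever raising an error."""
    value = (type_attr or "").strip()
    if not value:
        return "unspecified"
    if value in PREFERRED_ARTWORK_TYPES:
        return "preferred"
    return "other"  # free text is allowed, so unknown values are accepted


for t in ["svg", "plantuml-source", None]:
    print(t, artwork_type_category(t))
```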

2.5.8. "width" attribute

Deprecated.

2.5.9. "xml:space" attribute

Deprecated.

2.6. <aside>

This element is a container for content that is semantically less important or tangential to the content that surrounds it.

This element appears as a child element of: <section> (Section 2.46).

In any order:

2.6.1. "anchor" attribute

Document-wide unique identifier for this aside.

2.7. <author>

Provides information about a document's author. This is used both for the document itself (at the beginning of the document) and for referenced documents.

The <author> elements contained within the document's <front> element are used to fill the boilerplate, and also to generate the "Author's Address" section (see [RFC7322]).

Note that an "author" can also be just an organization (by not specifying any of the name attributes, but adding the <organization> child element).

Furthermore, the "role" attribute can be used to mark an author as "editor". This is reflected both on the front page and in the "Author's Address" section, as well as in bibliographical references. Note that this specification does not define a precise meaning for the term "editor".

See Section "Authors vs. Contributors" of [RFC7322] for more information.

This element appears as a child element of: <front> (Section 2.26).

In this order:

  1. One optional <organization> element (Section 2.35)
  2. One optional <address> element (Section 2.2)

2.7.1. "asciiFullname" attribute

The ASCII equivalent of the author's full name.

2.7.2. "asciiInitials" attribute

The ASCII equivalent of the author's initials.

2.7.3. "asciiSurname" attribute

The ASCII equivalent of the author's surname.

2.7.4. "fullname" attribute

The full name (used in the automatically generated "Author's Address" section).

2.7.5. "initials" attribute

An abbreviated variant of the given name(s), to be used in conjunction with the separately specified surname. It usually appears on the front page, in footers, and in references.

Some processors will post-process the value, for instance when it only contains a single letter (in which case they might add a trailing dot). Relying on this kind of post-processing can lead to results varying across formatters and thus ought to be avoided.

2.7.6. "role" attribute

Specifies the role the author had in creating the document.

Allowed values:

  • "editor"

2.7.7. "surname" attribute

The author's surname, to be used in conjunction with the separately specified initials. It usually appears on the front page, in footers, and in references.

2.8. <back>

Contains the "back" part of the document: the references and appendices. In <back>, <section> elements indicate appendices.

This element appears as a child element of: <rfc> (Section 2.45).

In this order:

  1. Optional <displayreference> elements (Section 2.19)
  2. Optional <references> elements (Section 2.42)
  3. Optional <section> elements (Section 2.46)

2.9. <bcp14>

Marks text that is a phrase defined in BCP 14, such as "MUST", "SHOULD NOT", and so on. When shown in some of the output representations, the text in this element might be highlighted. The use of this element is optional.

This element is only to be used around the actual phrase from BCP 14, not the full definition of a requirement. For example, it is correct to say "The packet <bcp14>MUST</bcp14> be dropped.", but it is not correct to say "<bcp14>The packet MUST be dropped.</bcp14>".

Content model: only text content.
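A linter could enforce the rule above with a check like the following Python sketch. It flags any <bcp14> element whose text is not exactly one key word. The function name is hypothetical. The check is case-sensitive because, per RFC 8174, only the uppercase forms carry the special meaning:

```python
import xml.etree.ElementTree as ET

# Key words as listed in the BCP 14 boilerplate of RFC 8174.
BCP14_PHRASES = frozenset({
    "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
    "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED",
    "MAY", "OPTIONAL",
})


def misused_bcp14(fragment: str) -> list[str]:
    """Return the contents of <bcp14> elements that are not a bare key word."""
    root = ET.fromstring(fragment)
    problems = []
    for elem in root.iter("bcp14"):
        # Collapse internal whitespace so "SHOULD\n NOT" matches "SHOULD NOT".
        text = " ".join((elem.text or "").split())
        if text not in BCP14_PHRASES:
            problems.append(text)
    return problems


print(misused_bcp14("<t><bcp14>The packet MUST be dropped.</bcp14></t>"))
```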

2.10. <blockquote>

Specifies that a block of text is a quotation.

This element appears as a child element of: <section> (Section 2.46).

Either:

Or:

2.10.1. "anchor" attribute

Document-wide unique identifier for this quotation.

2.10.2. "cite" attribute

The source of the citation. This must be a URI. If the quotedFrom attribute is given, this URI will be used by processing tools as the link for the text of that attribute.

2.10.3. "quotedFrom" attribute

Name of person or document the text in this element is quoted from. A formatter should render this as visible text at the end of the quotation.
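For an HTML formatter, the way the two attributes work together might be sketched as follows. The HTML markup chosen (a <cite> element wrapping a link) and the function name are assumptions for illustration, not output required by this document:

```python
from html import escape
from typing import Optional


def render_attribution(quoted_from: Optional[str], cite: Optional[str]) -> str:
    """Render quotedFrom as visible text, linked to cite when both are present."""
    if not quoted_from:
        return ""  # nothing visible to render; cite alone is not shown here
    label = escape(quoted_from)
    if cite:
        label = f'<a href="{escape(cite)}">{label}</a>'
    return f"<cite>{label}</cite>"


print(render_attribution("RFC 7322", "https://www.rfc-editor.org/info/rfc7322"))
```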

2.11. <boilerplate>

Holds the boilerplate text for the document. This section is filled in by the prep tool.

This element appears as a child element of: <front> (Section 2.26).

One or more <section> elements (Section 2.46)

2.12. <br>

Indicates that a line break should be inserted in the generated output by a formatting tool. Multiple successive instances of this element do not cause blank lines to appear in the output, so using more than one in a row is not useful.

This element appears as a child element of: <td> (Section 2.56), and <th> (Section 2.58).

Content model: this element does not have any contents.
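This Python sketch illustrates how a plain-text formatter might handle successive <br> elements. It assumes a cell that contains only text and <br> elements, and cell_lines is a hypothetical helper. Blank segments produced by repeated <br>s are simply dropped:

```python
import re
import xml.etree.ElementTree as ET

XML_WHITESPACE = re.compile(r"[ \t\r\n]+")


def cell_lines(cell: ET.Element) -> list[str]:
    """Split a table cell containing only text and <br> into output lines."""
    segments = [cell.text or ""]
    for child in cell:
        if child.tag != "br":
            raise ValueError("this sketch handles only text and <br> in cells")
        segments.append(child.tail or "")
    lines = [XML_WHITESPACE.sub(" ", s).strip(" ") for s in segments]
    # Empty segments come from successive <br> elements; dropping them
    # means repeated <br>s never produce blank lines.
    return [line for line in lines if line]


print(cell_lines(ET.fromstring("<td>first<br/><br/><br/>second</td>")))
```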

2.13. <city>

Gives the city name in a postal address.

This element appears as a child element of: <postal> (Section 2.37).

Content model: only text content.

2.13.1. "ascii" attribute

The ASCII equivalent of the city name.

2.14. <code>

Gives the postal region code.

This element appears as a child elemen