Internet Architecture Board (IAB) | T. Hansen, Editor |
Request for Comments: 7995 | AT&T Laboratories |
Category: Informational | L. Masinter |
ISSN: 2070-1721 | M. Hardy |
| Adobe |
| December 2016 |
This document discusses options and requirements for the PDF rendering of RFCs in the RFC Series, as outlined in RFC 6949. It also discusses the use of PDF for Internet-Drafts, and available or needed software tools for producing and working with PDF.¶
This document is not an Internet Standards Track specification; it is published for informational purposes.¶
This document is a product of the Internet Architecture Board (IAB) and represents information that the IAB has deemed valuable to provide for permanent record. It represents the consensus of the Internet Architecture Board (IAB). Documents approved for publication by the IAB are not a candidate for any level of Internet Standard; see Section 2 of RFC 7841.¶
Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc7995.¶
Copyright (c) 2016 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.¶
The RFC Series is evolving, as outlined in [RFC6949]. Future documents will use a canonical format, XML, with renderings in various formats, including PDF.¶
Because PDF has a wide range of capabilities and alternatives, not all PDFs are "equal". For example, visually similar documents could consist of scanned or rasterized images, or include text layout options, hyperlinks, embedded fonts, and digital signatures. (See [APP-PDF] for a history of PDF.)¶
This document explains some of the relevant options and makes recommendations, for both the RFC Series and Internet-Drafts.¶
The PDF format and the tools to manipulate it are not as well known as those for the other RFC formats, at least in the IETF community. This document discusses some of the processes for creating and working with PDFs, using both open source and commercial products.¶
The details described in this document are expected to change based on experience gained in implementing the new publication toolsets. Revised documents will be published capturing those changes as the toolsets are completed. Other implementers must not expect those changes to remain backwards-compatible with the details described in this document.¶
PDF [PDF] has gone through several revisions, primarily for the addition of features. PDF features have generally been added in a way that older viewers "fail gracefully", but even so, the older the PDF version produced, the more legacy viewers will support that version but the fewer features will be enabled.¶
As PDF has evolved a broad set of capabilities, additional standards for PDF files are applicable. These standards establish ground rules that are important for specific applications. For example, PDF/X was specifically designed for prepress digital data exchange, with careful attention to color management and printing instructions. The PDF/E standard was designed for engineering documents with dynamic workflows (where a document continues to be revised after publication) and allows interactive media (including animation and 3D).¶
Two further standards families are important to the RFC format: PDF/A for long-term preservation and PDF/UA [PDFUA] for user accessibility. The PDF/A family has sub-profiles (PDF/A-1, PDF/A-2 [PDFA2], PDF/A-3 [PDFA3]), each of which has conformance levels. These standards are in turn supported by various software libraries and tools.¶
Expressing the requirements for PDF renderings of RFCs in terms of these standards is effective and useful, and it makes the resulting PDF files usable in workflows that expect them.¶
Recommendations: ¶
This section lays out options and requirements for PDFs produced by the RFC Editor for RFCs. There are two subsections: Section 3.1 covers "visible" requirements related to how the PDF normally appears when it is viewed with a PDF viewer; Section 3.2 covers "invisible" options and requirements, which primarily affect the ability to process PDFs in other ways but do not ordinarily control the way the document appears. (Of course, a viewer UI might display processing capabilities, such as showing whether a document has been digitally signed.)¶
In many cases, the choice of PDF requirements is heavily influenced by the capabilities of available tools to create PDFs. Most of the discussion of tooling is to be found in Appendix C.¶
PDF supports rich visible layout of fixed-sized pages.¶
For a consistent "look" of RFCs and good style, the PDFs produced by the RFC Editor should have a clear, consistent, identifiable, and easy-to-read style. They should print well on the widest range of printers and should look good on displays of varying resolution.¶
PDF files are laid out for a particular size of page and margins. There are two paper sizes in common use: "US Letter" (8.5x11 inches, 216x279 mm, in popular use in North America) and "A4" (210x297 mm, 8.27x11.7 inches, standard for the rest of the world). Usually, PDF printing software is used in a "shrink to fit" mode where the printing is adjusted to fit the paper in the printer. There is some controversy, but the argument that A4 is an international standard is compelling. However, if the margins and header positioning are chosen appropriately, the document can be printed without any scaling.¶
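The last point above can be illustrated with some rough arithmetic. The sketch below computes the area shared by the two paper sizes, in PDF points (1/72 inch), and the extra per-side margin a page needs so that centered content also fits the other size. The function names and the assumption that content is centered are illustrative only; they are not taken from this document, and real margins must also allow for a printer's unprintable border.

```python
# Illustrative arithmetic only; sizes are in PDF points (1 point = 1/72 inch).
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def mm_to_points(mm):
    """Convert millimetres to PDF points."""
    return mm / MM_PER_INCH * POINTS_PER_INCH


US_LETTER = (8.5 * POINTS_PER_INCH, 11.0 * POINTS_PER_INCH)  # 612 x 792 points
A4 = (mm_to_points(210), mm_to_points(297))


def common_area(a, b):
    """Width and height of the largest rectangle that fits on both pages."""
    return (min(a[0], b[0]), min(a[1], b[1]))


def extra_margin(page, other):
    """Per-side (horizontal, vertical) margin needed on `page` so that
    content centered on it also fits on `other` without scaling."""
    return (
        max(0.0, (page[0] - other[0]) / 2),
        max(0.0, (page[1] - other[1]) / 2),
    )


width, height = common_area(US_LETTER, A4)
print("common area: %.1f x %.1f points" % (width, height))
print("extra margin on A4 (x, y): %.1f, %.1f" % extra_margin(A4, US_LETTER))
print("extra margin on Letter (x, y): %.1f, %.1f" % extra_margin(US_LETTER, A4))
```

A4 is the narrower sheet and US Letter the shorter one, so a layout that stays inside the common area constrains width on Letter and height on A4.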
One common feature of the Internet-Draft output formats is optional visible paragraph numbers, to aid in discussions. In the PDF, and thus in the printed rendition, it is possible to make paragraph numbers unobtrusive and even to impinge on the margins.¶
By its nature, PDF is paginated, so pagination issues must be considered. This is reflected in two areas: running headers and footers, and how text is laid out on a page for optimal reading.¶
Appendix B describes the process of creating a paged document from running text such that related material is present on the same page together and artifacts of pagination don't interfere with easy reading of the document.¶
Layout engines differ in the quality of the algorithms used to automate these processes. In some cases, the automated processes require some manual assistance to ensure, for example, that a text line intended as a heading is "kept" with the text for which it is a heading.¶
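The "keep a heading with its text" rule can be sketched as a small greedy paginator. This is a minimal illustration under simplifying assumptions (blocks are measured in whole lines and are never split across pages); the function and the block representation are hypothetical and do not describe any particular layout engine.

```python
def paginate(blocks, page_lines):
    """Greedy pagination with a keep-with-next rule for headings.

    blocks: list of (kind, height) tuples in document order, where kind is
            "heading" or "para" and height is a line count.
    page_lines: number of text lines available on a page.
    Returns a list of pages, each a list of block indexes.
    Assumption: blocks are not split, so an oversized block simply
    overflows the page it is placed on.
    """
    pages = [[]]
    used = 0
    for i, (kind, height) in enumerate(blocks):
        needed = height
        if kind == "heading" and i + 1 < len(blocks):
            # Keep-with-next: only place the heading here if the block
            # that follows it fits on the same page as well.
            needed += blocks[i + 1][1]
        if used + needed > page_lines and pages[-1]:
            pages.append([])
            used = 0
        pages[-1].append(i)
        used += height
    return pages


# A 40-line paragraph, then a heading and its 10-line paragraph, on
# 48-line pages: the heading moves to the second page with its text.
example = [("para", 40), ("heading", 1), ("para", 10)]
print(paginate(example, 48))
```

Without the keep-with-next check, the heading in this example would fit at the bottom of the first page and be stranded there, separated from the paragraph it introduces.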
Recommendations: ¶
A PDF may refer to a font by name, or it may use an embedded font. When a font is not embedded, a PDF viewer will attempt to locate a locally installed font of the same name. If it cannot find an exact match, it will find a "close match". If a close match is not available, it will fall back to something implementation dependent and usually undesirable.¶
In addition, the PDF/A standards mandate the embedding of fonts. Instead of using additional software to embed the fonts, the software generating the PDF files should produce PDF/A-conforming files directly, thus ensuring that all glyphs include Unicode mappings and embedded fonts from the outset.¶
If the HTML version of the document is being visually mimicked, the font(s) chosen should have both variable-width and constant-width components, as well as bold and italic representations.¶
The typefaces used by Internet-Drafts and by RFCs need not be identical.¶
Few fonts have glyphs for the entire repertoire of Unicode characters; for this purpose, the PDF generation tool may need a set of fonts and a way of choosing them. The RFC Editor is defining where Unicode characters may be used within RFCs [RFC7997].¶
Typefaces are typically licensed, and in many cases there is a fee for use by PDF creation tools; however, there is usually no fee for display or print of the embedded fonts.¶
Recommendations: ¶
When laying out running text, especially with a narrow page width and long words, layout processors for English text often have the option of either hyphenating words or using existing hyphens as a place to introduce line breaks. However, inserting line breaks mid-word can be harmful when the "word" is actually a sequence of characters representing a protocol element or protocol sequence.¶
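As a small illustration of the concern, the sketch below uses Python's standard textwrap module with both mid-word breaking and breaking at existing hyphens switched off, so that a token such as a header field name stays on one line even if that line then exceeds the target width. The helper name and the sample sentence are made up for this example; an actual PDF layout engine would expose its own equivalent controls.

```python
import textwrap


def wrap_protocol_safe(text, width):
    """Wrap text without breaking inside words or at existing hyphens.

    A token longer than `width` is placed on a line of its own and is
    allowed to overflow rather than being split.
    """
    wrapper = textwrap.TextWrapper(
        width=width,
        break_long_words=False,   # never split a long token mid-word
        break_on_hyphens=False,   # never break at an existing hyphen
    )
    return wrapper.wrap(text)


sample = "Send the Content-Transfer-Encoding header field"
for line in wrap_protocol_safe(sample, 20):
    print(line)
```

The trade-off is visible in the output: the protocol element is kept intact at the cost of one over-long line, which is usually preferable to a break that a reader could mistake for part of the protocol syntax.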
PDF supports hyperlinks to sections of the same document and also to sections of other documents.¶
The conversion to PDF can generate: ¶
Recommendations: ¶
There is some advantage to having the PDF files look like the text or HTML renderings of the same document. Even so, there are several options. The PDF ¶
Most of the choices used for the renderings per [RFC7992] and [RFC7993] are thus applicable. See those documents for specifics on the rendering of the specific XML elements. Some notes: ¶
PDF offers a number of features that improve the utility of PDF files in a variety of workflows, at the cost of extra effort in the xml2rfc conversion process; the trade-offs may be different for the RFC Editor production of RFCs and for Internet-Drafts.¶
The contents of a PDF file can be represented in many ways. The PDF file could be generated: ¶
All of these end up with essentially the same visual representation of the output. However, each level has trade-offs for auxiliary uses, such as searching or indexing, commenting and annotation, and accessibility (text-to-speech). Keeping the text in the content stream in its proper reading order supports all of these auxiliary uses.¶
In addition, the "role map" feature of PDF (Section 14.7.3 ("Structure Types") of [PDF]) would allow for the mapping of the logical tags found in the original XML into tags in the PDF.¶
Recommendations: ¶
PDF itself does not require the use of Unicode. Text is represented as a sequence of glyphs that can then be mapped to Unicode.¶
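The idea of mapping glyphs back to Unicode can be sketched with a simple lookup table. In a real PDF this role is played by a font's ToUnicode CMap; the glyph IDs and the table below are invented for an imaginary embedded font and are purely illustrative.

```python
# Hypothetical glyph IDs for an imaginary embedded font. A single glyph
# (here the "fi" ligature, ID 87) may map to more than one character.
TO_UNICODE = {
    5: "o",
    12: "f",
    31: "e",
    40: "c",
    87: "fi",
}

REPLACEMENT = "\ufffd"  # U+FFFD marks a glyph with no known mapping


def extract_text(glyph_ids, table):
    """Recover text from a glyph sequence using a glyph-to-Unicode table."""
    return "".join(table.get(glyph, REPLACEMENT) for glyph in glyph_ids)


print(extract_text([5, 12, 87, 40, 31], TO_UNICODE))
print(extract_text([5, 99], TO_UNICODE))
```

When a glyph has no mapping, searching, copying, and text-to-speech all degrade, which is why complete mappings matter for the auxiliary uses discussed earlier.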
Recommendations: ¶
The XML allows both ASCII art and SVG to be used for artwork.¶
Recommendations: ¶
Guidelines for the accessibility of PDF <http://www.w3.org/TR/WCAG20-TECHS/PDF1.html> recommend that images, formulas, and other non-text items provide textual alternatives, using the "/Alt" Tag in PDF to provide human-readable text that can be vocalized by text-to-speech technology.¶
Metadata encodes information about the document authors, the document series, date created, etc. Having this metadata within the PDF file allows it to be used by search engines, viewers, and other reuse tools. PDF supports embedded metadata in a variety of ways, including using the Extensible Metadata Platform (XMP) [XMP]. The RFC Editor maintains metadata about an RFC on its info page.¶
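As a rough sketch of what embedded metadata might look like, the example below builds a minimal XMP fragment carrying a Dublin Core title and creator list. The function name and the title string are placeholders, the choice of properties is an assumption rather than a recommendation of this document, and the surrounding xpacket wrapper that a complete XMP packet normally has is omitted.

```python
import xml.etree.ElementTree as ET

NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def q(prefix, name):
    """Return a namespace-qualified tag name in ElementTree notation."""
    return "{%s}%s" % (NS[prefix], name)


def build_xmp(title, creators):
    """Build a minimal XMP fragment (without the xpacket wrapper)."""
    xmpmeta = ET.Element(q("x", "xmpmeta"))
    rdf = ET.SubElement(xmpmeta, q("rdf", "RDF"))
    desc = ET.SubElement(rdf, q("rdf", "Description"), {q("rdf", "about"): ""})

    # dc:title is a language alternative; "x-default" marks the default text.
    alt = ET.SubElement(ET.SubElement(desc, q("dc", "title")), q("rdf", "Alt"))
    item = ET.SubElement(alt, q("rdf", "li"), {XML_LANG: "x-default"})
    item.text = title

    # dc:creator is an ordered list of names.
    seq = ET.SubElement(ET.SubElement(desc, q("dc", "creator")), q("rdf", "Seq"))
    for name in creators:
        ET.SubElement(seq, q("rdf", "li")).text = name

    return ET.tostring(xmpmeta, encoding="unicode")


# Placeholder title; the creator names are taken from this document's header.
print(build_xmp("Example title", ["T. Hansen", "L. Masinter", "M. Hardy"]))
```

Generating the metadata from the same XML source that drives the rest of the rendering keeps the embedded values consistent with the visible document.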
PDF supports an "outline" feature where sections of the document are marked; this could be used in addition to the table of contents as a navigation aid.¶
The section structure of an RFC can be mapped into the PDF elements for the document structure. This will allow the bookmark feature of PDF readers to be used to quickly access sections of the document.¶
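One way to picture this mapping is as a conversion from a flat list of numbered sections into a nested tree, which a PDF generator could then write out as outline (bookmark) entries. The sketch below is hypothetical: the data layout, function name, and section titles are placeholders, and nesting depth is inferred from the dots in each section number.

```python
def build_outline(sections):
    """Nest (number, title) pairs into an outline tree.

    sections: list of (number, title) tuples in document order,
              for example ("3.1", "Some title").
    Returns a list of top-level nodes; each node is a dict with
    "number", "title", and "children" keys.
    """
    root = []
    stack = []  # (depth, node) for the chain of currently open sections
    for number, title in sections:
        depth = number.count(".") + 1
        node = {"number": number, "title": title, "children": []}
        # Close any open sections at the same or a deeper level.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        parent = stack[-1][1]["children"] if stack else root
        parent.append(node)
        stack.append((depth, node))
    return root


# Placeholder titles for illustration only.
outline = build_outline([
    ("1", "First section"),
    ("3", "Third section"),
    ("3.1", "First subsection"),
    ("3.2", "Second subsection"),
    ("4", "Fourth section"),
])
for node in outline:
    print(node["number"], [child["number"] for child in node["children"]])
```

The same tree can drive both the visible table of contents and the outline, so the two navigation aids stay in step.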
PDF has the capability of including other files; the files may be labeled by both a media type and a role, the AFRelationship key [PDFA3]. In this way, the PDF file also acts as a container.¶
Embedded content may be compressed.¶
Many PDF viewers support the ability to view and extract embedded files, although this capability is not universal.¶
Embedding content in the PDF file allows the PDF to act as a complete package that can be transformed, archived, and digitally signed. (Some sample code illustrating how items can be attached to a PDF file and subsequently extracted can be found at <https://github.com/Aiybe/xmptest>.) Useful possibilities: ¶
Recommendations: ¶
The RFC Editor and staff are at times called to provide evidence that a particular RFC is the "original" and has not been modified; digital signatures can provide that verification. As signatures also apply to embedded content, embedding the XML source will provide a way of signing the source XML that was used to produce the PDF file as well.¶
PDF has supported digital signatures since PDF 1.2, and there are multiple methods and options available for signing PDF files. The method chosen for the signing of Internet-Drafts and RFCs will be determined by separate policy.¶
If PDF digital signatures are chosen, the authors suggest the following: ¶
The following security considerations apply:¶
Threats: ¶
Mitigations: ¶
NOTE: This section is meant as an overview to give some background.¶
The RFC Series has for a long time accepted PostScript renderings of RFCs, either in addition to or instead of the text renderings of those same RFCs. These have usually been produced when there was a complicated figure or mathematics within the document. For example, consider the figures and mathematics found in RFCs 1119 and 1142, and compare the figures found in the text version of RFC 3550 with those in the PostScript version. The RFC Editor has also provided a PDF rendering of RFCs. Usually, this has been a print of the text file that does not take advantage of any of the broader PDF functionality, unless there was a PostScript version of the RFC, which would then be used by the RFC Editor to generate the PDF.¶
In addition to PDFs generated and published by the RFC Editor, the IETF tools community has also long supported PDF for Internet-Drafts. Most RFCs start with Internet-Drafts, edited by individual authors. The Internet-Drafts submission tool at <https://datatracker.ietf.org/submit/> accepts PDF and PostScript files in addition to the (required) text submission and (currently optional) XML. If a PDF wasn't submitted for a particular version of an Internet-Draft, the tools would generate one from the PostScript, HTML, or text.¶
The process of creating a paged document from running text typically involves ensuring that related material is present on the same page together and that artifacts of pagination don't interfere with easy reading of the document. Typical high-quality layout processors do several things: ¶
This section discusses tools for viewing, comparing, creating, manipulating, and transforming PDF files, including those currently in use by the RFC Editor and Internet-Drafts, as well as outlining available PDF tools for various processes.¶
As with most file formats, PDF files are experienced through a reader or viewer of PDF files. For most of the common platforms in use (iOS, OS X, Windows, Android, ChromeOS, Kindle) and for most browsers (Edge, Safari, Chrome, Firefox), PDF viewing is built in. In addition, there are many PDF viewers available for download and installation.¶
Printing is one of the most important use cases for PDFs, and almost all viewers support it. Some printers also have direct PDF support.¶
Because the xml2rfc format is a unique format, software for converting XML source documents to the various formats will be needed, including PDF generation.¶
One promising direction is suggested in <http://greenbytes.de/tech/webdav/rfc2629xslt/rfc2629xslt.html#output.pdf.fop>: using XSLT (Extensible Stylesheet Language Transformations) to generate XSL-FO (XSL Formatting Objects); XSL-FO is then processed by a FOP (Formatting Objects Processor) such as Apache FOP.¶
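The XSLT-to-XSL-FO-to-PDF pipeline can be sketched as a single Apache FOP invocation, which applies a stylesheet to the XML source and renders the resulting formatting objects. The snippet below only builds the command line and does not run it. The file names are hypothetical, and it assumes the `fop` launcher script from an Apache FOP installation is on the PATH.

```python
import shlex


def fop_command(xml_source, xsl_fo_stylesheet, pdf_out):
    """Build an Apache FOP command line for XML + XSLT -> XSL-FO -> PDF.

    Assumption: the `fop` launcher script is installed and on the PATH.
    """
    return [
        "fop",
        "-xml", xml_source,          # XML source document
        "-xsl", xsl_fo_stylesheet,   # stylesheet producing XSL-FO
        "-pdf", pdf_out,             # render the result as PDF
    ]


# Hypothetical file names for illustration only.
cmd = fop_command("draft-example-00.xml", "to-fo.xslt", "draft-example-00.pdf")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Keeping the command as a list, rather than a single shell string, avoids quoting problems when file names contain spaces.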
Several libraries are also available for generating PDF signatures. The choice of library to use for xml2pdf will depend on many factors: programming language, quality of implementation, quality of PDF generated, support, cost, availability, and so forth.¶
Various typefaces are available that might satisfy the requirements of this document. Google's Noto typeface family <https://www.google.com/get/noto/> supports a significant subset of Unicode and includes fixed-width, serif, and sans-serif styles. Another potentially useful set of typefaces (without extensive Unicode support, however) includes: ¶
Another font that looks promising for its broad Unicode support is Skolar <https://www.rosettatype.com/Skolar>, but it requires licensing.
In addition to generating and viewing PDF, other categories of PDF tools are available and may be useful both during specification development and for published RFCs. These include tools for comparing two PDFs, checkers that could be used to validate the results of conversion, reviewing and commentary tools that attach annotations to PDF files, and digital signature creation and validation.¶
Validation of an arbitrary author-generated PDF file would be quite difficult; there are few PDF validation tools. However, if RFCs and Internet-Drafts are generated by conversion from XML via xml2rfc, then explicit validation of PDF and adherence to expected profiles would mainly be useful to ensure that xml2rfc has functioned properly.¶
The IAB members at the time this memo was approved were (in alphabetical order): ¶
The input of the following people is gratefully acknowledged: Nevil Brownlee (ISE), Brian Carpenter, Chris Dearlove, Martin Duerst, Heather Flanagan (RSE), Joe Hildebrand, Paul Hoffman, Duff Johnson, Ted Lemon, Sean Leonard, Henrik Levkowetz, Julian Reschke, Adam Roach, Leonard Rosenthol, Alice Russo, Robert Sparks, Andrew Sullivan, and Dave Thaler.¶