Applications Area Working Group | A. Melnikov |
Internet-Draft | Isode Limited |
Updates: 2046 (if approved) | J. Reschke |
Intended status: Standards Track | greenbytes |
Expires: November 10, 2012 | May 9, 2012 |
This document changes RFC 2046 rules regarding default charset parameter values for text/* media types to better align with common usage by existing clients and servers.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as “work in progress”.¶
This Internet-Draft will expire on November 10, 2012.¶
Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.¶
Discussion of this draft should take place on the Apps Area Working Group mailing list (apps-discuss@ietf.org), which is archived at <http://www.ietf.org/mail-archive/web/apps-discuss>.¶
RFC 2046 specified that the default value of the charset parameter (i.e., the value used when the parameter is not specified) is "US-ASCII" (Section 4.1.2 of [RFC2046]). RFC 2616 changed the default for use by HTTP (Hypertext Transfer Protocol) to "ISO-8859-1" (Section 3.7.1 of [RFC2616]). This encoding is not very common for new text/* media types, and a special rule in the HTTP specification adds confusion about which specification ([RFC2046] or [RFC2616]) is authoritative with regard to the default charset for text/* media types.¶
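As a non-normative illustration of why the two defaults conflict, the following minimal sketch decodes the same unlabeled payload under each assumed default. The helper name `decode_with_default` and the sample bytes are assumptions made for this example only.

```python
from typing import Optional


def decode_with_default(payload: bytes, default_charset: str) -> Optional[str]:
    """Decode payload under an assumed default charset; None if it does not fit."""
    try:
        return payload.decode(default_charset)
    except UnicodeDecodeError:
        return None


# "café" encoded as ISO-8859-1 and sent without any charset parameter.
payload = b"caf\xe9"

mime_view = decode_with_default(payload, "us-ascii")    # default per RFC 2046
http_view = decode_with_default(payload, "iso-8859-1")  # default per RFC 2616
```

Under the RFC 2046 default the payload is not decodable at all, while under the RFC 2616 default it decodes cleanly, so two conforming recipients can disagree about the same bytes.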
Many complex text subtypes such as text/html [RFC2854] and text/xml [RFC3023] have internal (to their format) means of describing the charset. Many existing User Agents ignore the "US-ASCII" default rule for at least text/html and text/xml.¶
This document changes RFC 2046 rules regarding default charset parameter values for text/* media types to better align with common usage by existing clients and servers. It does not change the defaults for any currently registered media type. ¶
Section 4.1.2 of [RFC2046] says:¶
The default character set, which must be assumed in the absence of a charset parameter, is US-ASCII.¶
As explained in the Introduction section, this rule is considered to be outdated, so this document replaces it with the following set of rules:¶
Each subtype of the "text" media type which uses the "charset" parameter can define its own default value for the "charset" parameter, including the absence of any default.¶
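In order to improve interoperability with deployed agents, "text/*" media type registrations SHOULD either¶
a. specify that the "charset" parameter is not used for the defined subtype, because the charset information is transported inside the payload (such as in "text/xml"), or¶
b. require explicit unconditional inclusion of the "charset" parameter, eliminating the need for a default value.¶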
In accordance with option (a), above, registrations for "text/*" media types that can transport charset information inside the corresponding payloads (such as "text/html" and "text/xml") SHOULD NOT specify the use of a "charset" parameter, nor any default value, in order to avoid conflicting interpretations should the charset parameter value and the value specified in the payload disagree.¶
New subtypes of the "text" media type, thus, SHOULD NOT define a default "charset" value. If there is a strong reason to do so despite this advice, they SHOULD use the "UTF-8" [RFC3629] charset as the default.¶
Regardless of what approach is chosen, all new text/* registrations MUST clearly specify how the charset is determined; relying on the default defined in Section 4.1.2 of [RFC2046] is no longer permitted. However, existing text/* registrations that fail to specify how the charset is determined still default to US-ASCII.¶
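The rules above can be sketched, non-normatively, as a lookup driven by the subtype alone. This is a minimal sketch under stated assumptions: the policy table, the sentinel names `IN_BAND` and `REQUIRED`, the `text/example-*` subtypes, and the function `effective_charset` are all hypothetical and are not defined by this document. Header parsing uses Python's standard `email.message.Message`.

```python
from email.message import Message
from typing import Optional

# Sentinels for two registration styles (hypothetical names).
IN_BAND = object()    # charset travels inside the payload; parameter is not used
REQUIRED = object()   # charset parameter must always be sent explicitly

# Hypothetical registry of new text/* subtypes, for illustration only.
NEW_SUBTYPE_POLICY = {
    "text/example-inband": IN_BAND,
    "text/example-explicit": REQUIRED,
    "text/example-utf8": "utf-8",   # a subtype that defines a default anyway
}


def effective_charset(content_type: str) -> Optional[str]:
    """Return the charset implied by a Content-Type value, or None when the
    payload itself has to be inspected."""
    msg = Message()
    msg["Content-Type"] = content_type
    media_type = msg.get_content_type()       # lower-cased "type/subtype"
    if not media_type.startswith("text/"):
        raise ValueError("not a text/* media type: " + media_type)

    explicit = msg.get_content_charset()      # lower-cased value, or None
    policy = NEW_SUBTYPE_POLICY.get(media_type)

    if policy is IN_BAND:
        return None                           # defer to in-band information
    if explicit is not None:
        return explicit                       # sender-specified value wins
    if policy is REQUIRED:
        raise ValueError(media_type + " requires an explicit charset parameter")
    if isinstance(policy, str):
        return policy                         # subtype-specific default
    # Existing registration that does not say how the charset is determined.
    return "us-ascii"
```

Note that the function takes no protocol argument: in this sketch the answer depends only on the subtype and on what the sender supplied.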
Specifications covering the "charset" parameter, and what default value, if any, is used, are subtype-specific, NOT protocol-specific. Protocols that use MIME, therefore, MUST NOT override default charset values for "text/*" media types to be different for their specific protocol. The protocol definitions MUST leave that to the subtype definitions.¶
Guessing of the charset parameter can lead to security issues such as content buffer overflows, denial of service, or bypass of filtering mechanisms. However, this document does not promote guessing; instead, it encourages use of charset information that is specified by the sender.¶
Conflicting information in-band vs out-of-band can also lead to similar security problems, and this document recommends the use of charset information which is more likely to be correct (for example, in-band over out-of-band).¶
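As a non-normative illustration of preferring in-band over out-of-band information, the sketch below reads the encoding declared in an XML declaration and falls back to the charset parameter only when none is declared. It is deliberately simplified: it assumes an ASCII-compatible encoding, handles only an optional UTF-8 byte order mark, and is not the full encoding detection procedure of the XML specification. The function names are hypothetical.

```python
import re
from typing import Optional

# Matches the encoding pseudo-attribute of an XML declaration at the start of the payload.
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def in_band_xml_charset(payload: bytes) -> Optional[str]:
    """Return the encoding named in the XML declaration, if there is one."""
    if payload.startswith(b"\xef\xbb\xbf"):   # skip a UTF-8 byte order mark
        payload = payload[3:]
    match = _XML_DECL.match(payload)
    return match.group(1).decode("ascii").lower() if match else None


def choose_charset(payload: bytes, out_of_band: Optional[str]) -> Optional[str]:
    """Prefer the in-band declaration; otherwise use the charset parameter."""
    in_band = in_band_xml_charset(payload)
    if in_band is not None:
        return in_band
    return out_of_band.lower() if out_of_band else None
```

When neither source supplies a value, the sketch returns `None` rather than guessing, in line with the preceding paragraph.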
This document asks IANA to update the "text" subregistry of the Media Types registry (<http://www.iana.org/assignments/media-types/text/>), to add the following preamble: "See [this RFC] for information about 'charset' parameter handling for text media types."¶
IANA is also asked to add this RFC to the list of references at the beginning of the Application for Media Type (<http://www.iana.org/cgi-bin/mediatypes.pl>).¶
Many thanks to Ned Freed and John Klensin for comments and ideas that motivated creation of this document, and to Carsten Bormann, Murray S. Kucherawy, Barry Leiba, and Henri Sivonen for feedback and text suggestions.¶