The "data" URL scheme

Xerox Palo Alto Research Center
3333 Coyote Hill Road
Palo Alto, CA 94034
masinter@parc.xerox.com

A new URL scheme, "data", is defined. It allows inclusion of small
data items as "immediate" data, as if they had been included
externally.
Some applications that use URLs also have a need to embed (small)
media type data directly inline. This document defines a new URL
scheme that would work like 'immediate addressing'. The URLs are of
the form:

    data:[<mediatype>][;base64],<data>

The <mediatype> is an Internet media type specification (with
optional parameters). The appearance of ";base64" means that the data
is encoded as base64. Without ";base64", the data (as a sequence of
octets) is represented using ASCII encoding for octets inside the
range of safe URL characters and using the standard %xx hex encoding
of URLs for octets outside that range. If <mediatype> is omitted, it
defaults to text/plain;charset=US-ASCII. As a shorthand,
"text/plain" can be omitted but the charset parameter supplied.
The "data:" URL scheme is only useful for short values. Note that
some applications that use URLs may impose a length limit; for
example, URLs embedded within <A> anchors in HTML have a length limit
determined by the SGML declaration for HTML [RFC1866]. The LITLEN
(1024) limits the number of characters which can appear in a single
attribute value literal, the ATTSPLEN (2100) limits the sum of all
lengths of all attribute value specifications which appear in a tag,
and the TAGLEN (2100) limits the overall length of a tag.
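
The limits quoted above translate into a rough size budget. The following is a minimal sketch, assuming the LITLEN value of 1024 from the text applies to the whole attribute value and that the payload is base64-encoded (4 characters per 3 octets). The function names are illustrative and not part of any specification.

```python
LITLEN = 1024  # attribute value literal limit quoted in the text


def fits_in_attribute(url: str, limit: int = LITLEN) -> bool:
    """Return True if the URL is no longer than the assumed literal limit."""
    return len(url) <= limit


def max_base64_payload(mediatype: str, limit: int = LITLEN) -> int:
    """Estimate how many raw octets fit once the URL prefix is accounted for."""
    prefix = f"data:{mediatype};base64,"
    room = max(limit - len(prefix), 0)
    # Base64 emits 4 characters for every (possibly padded) group of 3 octets.
    return room // 4 * 3
```

Under these assumptions an image/gif payload of at most 750 octets fits within a single attribute value.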
The "data" URL scheme has no relative URL forms.
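
    dataurl    := "data:" [ mediatype ] [ ";base64" ] "," data
    mediatype  := [ type "/" subtype ] *( ";" parameter )
    data       := *urlchar
    parameter  := attribute "=" value

where "urlchar" is imported from [RFC2396], and "type", "subtype",
"attribute" and "value" are the corresponding tokens from [RFC2045],
represented using URL escaped encoding of [RFC2396] as necessary.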
Attribute values in [RFC2045] are allowed to be either represented as
tokens or as quoted strings. However, within a "data" URL, the
"quoted-string" representation would be awkward, since the quote mark
is itself not a valid urlchar. For this reason, parameter values
should use the URL Escaped encoding instead of quoted string if the
parameter values contain any "tspecial".
The ";base64" extension is distinguishable from a content-type
parameter by the fact that it doesn't have a following "=" sign.
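
The rules described so far (the default media type, the charset shorthand, URL-escaped parameter values, and the trailing ";base64" marker with no "=" sign) can be sketched as a small parser. This is an illustration under stated assumptions, not a reference implementation: the name parse_data_url and the returned tuple shape are hypothetical, and error handling is minimal.

```python
import base64
from urllib.parse import unquote, unquote_to_bytes


def parse_data_url(url: str):
    """Split a "data" URL into (mediatype, params, octets)."""
    scheme, colon, rest = url.partition(":")
    if not colon or scheme.lower() != "data":
        raise ValueError("not a data URL")
    header, comma, data = rest.partition(",")
    if not comma:
        raise ValueError("missing ',' before the data")

    fields = header.split(";")
    # ";base64" is the last field and, unlike a parameter, has no "=" sign.
    is_base64 = len(fields) > 1 and fields[-1] == "base64"
    if is_base64:
        fields.pop()

    mediatype = fields[0]
    params = {}
    for field in fields[1:]:
        name, equals, value = field.partition("=")
        if not equals:
            raise ValueError(f"parameter without '=': {field!r}")
        # Parameter values use URL escaping rather than quoted strings.
        params[name.lower()] = unquote(value)

    if not mediatype:
        # Omitted type defaults to text/plain; charset defaults to US-ASCII
        # unless the shorthand form supplied one.
        mediatype = "text/plain"
        params.setdefault("charset", "US-ASCII")

    if is_base64:
        octets = base64.b64decode(unquote(data))
    else:
        octets = unquote_to_bytes(data)
    return mediatype, params, octets
```

For instance, parse_data_url("data:;charset=utf-8,x") yields the type text/plain with the supplied charset, which is the shorthand case described earlier.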
A data URL might be used for arbitrary types of data. The URL

    data:,A%20brief%20note

encodes the text/plain string "A brief note", which might be useful
in a footnote link.
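
Going the other way, a "data" URL can be produced from raw octets. The sketch below assumes Python's standard urllib.parse.quote is an acceptable stand-in for the %xx encoding (it escapes everything except letters, digits and "_.-~" when safe is empty). The helper name make_data_url is hypothetical.

```python
import base64
from urllib.parse import quote


def make_data_url(payload: bytes, mediatype: str = "", use_base64: bool = False) -> str:
    """Build a "data" URL; an empty mediatype relies on the text/plain default."""
    if use_base64:
        body = base64.b64encode(payload).decode("ascii")
        return f"data:{mediatype};base64,{body}"
    # %xx-escape every octet outside the unreserved set.
    body = quote(payload, safe="")
    return f"data:{mediatype},{body}"
```

Calling make_data_url(b"A brief note") gives the URL for the note above, and passing use_base64=True with a media type gives the base64 form.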
An HTML fragment containing an IMG element whose SRC attribute is a
base64-encoded "data" URL could be used for a small inline image in
an HTML document. (Such an embedded image is probably near the limit
of utility. For anything larger, data URLs are likely to be
inappropriate.)
A "data" URL's media type specification can include other
parameters; for example, one might specify a charset parameter.
A URL beginning with

    data:text/plain;charset=iso-8859-7,

followed by %xx-escaped octets can be used for a short sequence of
Greek characters.
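
To illustrate how the charset parameter affects interpretation, here is a minimal sketch that decodes a non-base64 textual "data" URL. The helper name decode_text and the sample octets %e1%e2%e3 (alpha, beta, gamma in ISO-8859-7) are assumptions chosen for illustration and do not come from this document.

```python
from urllib.parse import unquote_to_bytes


def decode_text(url: str) -> str:
    """Decode a non-base64 textual "data" URL using its charset parameter."""
    header, _, data = url.partition(",")
    charset = "US-ASCII"  # default when no charset is given
    for field in header.split(";")[1:]:
        name, _, value = field.partition("=")
        if name.lower() == "charset":
            charset = value
    # Undo the %xx escaping first, then interpret the octets in the charset.
    return unquote_to_bytes(data).decode(charset)


greek = decode_text("data:text/plain;charset=iso-8859-7,%e1%e2%e3")
```

Here greek holds the three Greek letters alpha, beta and gamma. The same octets read under a different charset would produce different characters, which is why the parameter matters.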
Some applications may use the "data" URL scheme in order to provide
setup parameters for other kinds of networking applications. For
example, one might create a media type

    application/vnd-xxx-query

whose content consists of a query string and a database identifier
for the "xxx" vendor's databases. A URL of the form:

    data:application/vnd-xxx-query,select_vcount,fcol_from_fieldtable/local

could then be used in a local application to launch the "helper" for
application/vnd-xxx-query and give it the immediate data included.
This idea was originally proposed in August 1995. Some versions of the
data URL scheme have been used in the definition of VRML, and a
version has appeared as part of a proposal for embedded data in HTML.
Various changes have been made, based on requests, to elide the media
type, pack the indication of the base64 encoding more tightly, and
eliminate "quoted printable" as an encoding since it would not easily
yield valid URLs without additional %xx encoding, which itself is
sufficient. The "data" URL scheme is in use in VRML, new applications
of HTML, and various commercial products. It is being used for object
parameters in Java and ActiveX applications.
Interpretation of the data within a "data" URL has the same security
considerations as any implementation of the given media type. An
application should not interpret the contents of a data URL which is
marked with a media type that has been disallowed for processing by
the application's configuration.
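
One way an application might apply this rule is to check the declared media type against its configuration before interpreting the data. The sketch below is illustrative only: ALLOWED_TYPES stands in for a hypothetical application configuration, and the function names are not from any real library.

```python
ALLOWED_TYPES = {"text/plain", "image/gif", "image/png"}  # hypothetical configuration


def declared_media_type(url: str) -> str:
    """Return the lowercased type/subtype of a "data" URL, applying the default."""
    header = url.partition(",")[0]
    declared = header[len("data:"):].split(";")[0].strip().lower()
    return declared or "text/plain"


def may_process(url: str, allowed=ALLOWED_TYPES) -> bool:
    """Refuse anything that is not a "data" URL with an allowed media type."""
    if not url.lower().startswith("data:"):
        return False
    return declared_media_type(url) in allowed
```

A check like this only inspects the declared type. It does not verify that the payload actually matches that type.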
Sites which use firewall proxies to disallow the retrieval of certain
media types (such as application script languages or types with known
security problems) will find it difficult to screen against the
inclusion of such types using the "data" URL scheme. However, they
should be aware of the threat and take whatever precautions are
considered necessary within their domain.
The effect of using long "data" URLs in applications is currently
unknown; some software packages may exhibit unreasonable behavior
when confronted with data that exceeds their allocated buffer size.

[RFC2396] Berners-Lee, T. (World Wide Web Consortium, timbl@w3.org),
          Fielding, R. (University of California, Irvine,
          fielding@ics.uci.edu), and Masinter, L. (Xerox PARC,
          masinter@parc.xerox.com), "Uniform Resource Identifiers
          (URI): Generic Syntax".

[RFC1866] Berners-Lee, T. (MIT Laboratory for Computer Science,
          timbl@w3.org) and Connolly, D. (MIT Laboratory for Computer
          Science, W3 Consortium, connolly@w3.org), "Hypertext Markup
          Language - 2.0".

[RFC2045] Freed, N. (Innosoft International, Inc., ned@innosoft.com)
          and Borenstein, N. (First Virtual Holdings, nsb@nsb.fv.com),
          "Multipurpose Internet Mail Extensions (MIME) Part One:
          Format of Internet Message Bodies".