rfc.uri - URI parsing and construction
Provides a set of functions to parse and construct Uniform Resource Identifiers as defined in RFC 2396 (RFC2396).
uri-parse, uri-scheme&specific, uri-decompose-hierarchical and
uri-decompose-authority are general URI parsers. These functions do not
decode URI encoding, since the parts to be decoded differ among
URI schemes. After parsing a URI, use uri-decode or uri-decode-string
(described below) to decode the parts.
uri-parse is the handiest procedure. It breaks the URI
into the following seven parts and returns them as multiple values,
in this order. If the URI doesn't have the corresponding
part, #f is returned for that part.
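URI scheme as a string (e.g. "mailto" in "mailto:foo@example.com").
User-info in the authority part (e.g. "anonymous" in ftp://anonymous@ftp.example.com/pub/foo).
Hostname in the authority part (e.g. "ftp.example.com" in ftp://anonymous@ftp.example.com/pub/foo).
Port number in the authority part, as an integer (e.g. 8080 in http://www.example.com:8080/).
Path part (e.g. "/index.html" in http://www.example.com/index.html).
Query part (e.g. "key=xyz&lang=en" in http://www.example.com/search?key=xyz&lang=en).
Fragment part (e.g. "section4" in http://www.example.com/document.html#section4).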
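The part list above can be exercised in one call. The following is a minimal sketch that binds the seven return values with receive; the URI is a made-up example, and the variable names are chosen here for illustration only.

```scheme
(use rfc.uri)

;; Break a URI into its seven parts in one call.
;; The URI is an illustrative example, not taken from a real site.
(receive (scheme user-info host port path query fragment)
    (uri-parse "http://www.example.com:8080/search?key=xyz&lang=en#section4")
  ;; Collect the values into a list so they can be inspected together.
  (list scheme user-info host port path query fragment))
;; ⇒ ("http" #f "www.example.com" 8080 "/search" "key=xyz&lang=en" "section4")
```

The URI has no user-info, so #f appears in that position, and the port comes back as a number rather than a string.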
The following procedures are finer-grained and break up URIs in stages.
uri-scheme&specific takes a URI uri, and
returns two values, its scheme part and its scheme-specific part.
If uri doesn't have a scheme part, #f is returned for it.
(uri-scheme&specific "mailto:sclaus@north.pole")
  ⇒ "mailto" and "sclaus@north.pole"
(uri-scheme&specific "/icons/new.gif")
  ⇒ #f and "/icons/new.gif"
If the URI scheme uses hierarchical notation, i.e.
“//authority/path?query#fragment”,
you can pass
the scheme-specific part to uri-decompose-hierarchical
and it returns four values, authority, path, query
and fragment.
(uri-decompose-hierarchical "//www.foo.com/about/company.html")
  ⇒ "www.foo.com", "/about/company.html", #f and #f
(uri-decompose-hierarchical "//zzz.org/search?key=%3fhelp")
  ⇒ "zzz.org", "/search", "key=%3fhelp" and #f
(uri-decompose-hierarchical "//jjj.jp/index.html#whatsnew")
  ⇒ "jjj.jp", "/index.html", #f and "whatsnew"
(uri-decompose-hierarchical "my@address")
  ⇒ #f, #f, #f and #f
Furthermore, you can parse the authority part of a
hierarchical URI with uri-decompose-authority.
It returns three values: userinfo, host and port.
(uri-decompose-authority "yyy.jp:8080")
  ⇒ #f, "yyy.jp" and "8080"
(uri-decompose-authority "mylogin@yyy.jp")
  ⇒ "mylogin", "yyy.jp" and #f
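The three staged procedures can be chained, each one consuming a value produced by the previous stage. The sketch below assumes a helper named decompose-in-stages, which is hypothetical and not part of rfc.uri, and a made-up URI.

```scheme
(use rfc.uri)

;; Hypothetical helper (not part of rfc.uri): runs the three stages in
;; order and returns the pieces as a list.
(define (decompose-in-stages uri)
  ;; Stage 1: split off the scheme.
  (receive (scheme specific) (uri-scheme&specific uri)
    ;; Stage 2: split the scheme-specific part.
    (receive (authority path query fragment)
        (uri-decompose-hierarchical specific)
      ;; Stage 3: split the authority, if there is one.
      (receive (userinfo host port)
          (if authority
            (uri-decompose-authority authority)
            (values #f #f #f))
        (list scheme userinfo host port path query fragment)))))

(decompose-in-stages "http://mylogin@yyy.jp:8080/search?key=%3fhelp#top")
;; ⇒ ("http" "mylogin" "yyy.jp" "8080" "/search" "key=%3fhelp" "top")
```

The authority is checked before the third stage because uri-decompose-hierarchical returns #f for it when the input is not hierarchical. The query is still %-escaped, and the port is a string at this stage.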
uri-compose composes a URI from the given components, which are passed as keyword arguments. Various combinations of components yield a valid URI; the following diagram shows the possible 'paths' of combinations:
        /-----------------specific-------------------\
        |                                            |
 scheme-+------authority-----+-+-------path*---------+-
        |                    | |                     |
        \-userinfo-host-port-/ \-path-query-fragment-/
Passing #f to a keyword argument is equivalent to
omitting that keyword argument.
This is particularly useful when passing on the results of
a parsed URI, where missing parts are #f.
If a component contains a character that is not appropriate
for that component, it must be properly escaped before
being passed to uri-compose.
Some examples:
(uri-compose :scheme "http" :host "foo.com" :port 80
             :path "/index.html" :fragment "top")
  ⇒ "http://foo.com:80/index.html#top"
(uri-compose :scheme "http" :host "foo.net"
             :path* "/cgi-bin/query.cgi?keyword=foo")
  ⇒ "http://foo.net/cgi-bin/query.cgi?keyword=foo"
(uri-compose :scheme "mailto" :specific "a@foo.org")
  ⇒ "mailto:a@foo.org"
(receive (authority path query fragment)
    (uri-decompose-hierarchical "//foo.jp/index.html#whatsnew")
  (uri-compose :authority authority :path path
               :query query :fragment fragment))
  ⇒ "//foo.jp/index.html#whatsnew"
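Because uri-compose does not escape its components, a caller usually encodes them first. The following sketch assumes a made-up file name and encodes a single path segment with uri-encode-string before composing; it also shows a #f keyword argument being ignored.

```scheme
(use rfc.uri)

;; Escape one path segment before composing. The file name is an
;; illustrative example. "/" is prepended after encoding so that it
;; stays a path separator instead of being escaped.
(let1 segment (uri-encode-string "my file.html")
  (uri-compose :scheme "http" :host "foo.com"
               :path (string-append "/" segment)))
;; ⇒ "http://foo.com/my%20file.html"

;; A #f component behaves as if the keyword were not given at all.
(uri-compose :scheme "http" :host "foo.com" :port #f :path "/")
;; ⇒ "http://foo.com/"
```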
uri-decode and uri-decode-string decode “URI encoding”, i.e. %-escapes.
uri-decode takes input from the current input port
and writes the decoded result to the current output port.
uri-decode-string takes input from string and
returns the decoded string.
If cgi-decode is true, + is also replaced with a space character.
You can pass the external character encoding to uri-decode-string
with the encoding keyword argument. When it is given,
the decoded octet sequence is assumed to be in the specified encoding
and is converted to Gauche's internal character encoding.
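A short sketch of the decoding procedures follows. The input strings are made up for illustration, and the port-based call assumes the standard with-input-from-string and with-output-to-string procedures to redirect the current ports.

```scheme
(use rfc.uri)

;; %-escapes are decoded; + is left alone unless cgi-decode is true.
(uri-decode-string "key%3dxyz")                ; ⇒ "key=xyz"
(uri-decode-string "a+b%21")                   ; ⇒ "a+b!"
(uri-decode-string "a+b%21" :cgi-decode #t)    ; ⇒ "a b!"

;; uri-decode reads the current input port and writes to the current
;; output port, so both are redirected to strings here.
(with-output-to-string
  (lambda ()
    (with-input-from-string "%7euser" uri-decode)))
;; ⇒ "~user"
```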
uri-encode and uri-encode-string encode unsafe characters with %-escapes.
uri-encode takes input from the current input port and writes the result to
the current output port. uri-encode-string takes input
from string and returns the encoded string.
By default, characters that are not specified as “unreserved” in
RFC3986 are escaped. You can pass a different character
set as the noescape argument to specify which characters are left unescaped.
For example, the older RFC2396 has several more “unreserved”
characters, and passing *rfc2396-unreserved-char-set* (see below)
prevents those characters from being escaped.
By default, multibyte characters are encoded as the octet stream of Gauche's
native multibyte representation. However, you can pass
the encoding keyword argument to uri-encode-string
to convert string to the specified character encoding before escaping.
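The effect of the noescape argument can be sketched as follows. The input strings are made up, and they were chosen so that every escape consists of digits only, which keeps the expected results independent of the letter case used for hexadecimal digits.

```scheme
(use rfc.uri)

;; Default: everything outside the RFC3986 "unreserved" set is escaped.
(uri-encode-string "a b&c")       ; ⇒ "a%20b%26c"
(uri-encode-string "it's(ok)!")   ; ⇒ "it%27s%28ok%29%21"

;; With the RFC2396 set, the characters ' ( ) and ! are left as they are.
(uri-encode-string "it's(ok)!" :noescape *rfc2396-unreserved-char-set*)
;; ⇒ "it's(ok)!"

;; Decoding reverses encoding.
(uri-decode-string (uri-encode-string "a b&c"))   ; ⇒ "a b&c"
```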
The constants *rfc2396-unreserved-char-set* and *rfc3986-unreserved-char-set*
are bound to character sets that represent the
“unreserved” characters defined in RFC2396 and RFC3986, respectively.
(See Character Set, and srfi-14 - Character-set library, for
operations on character sets.)
This document was generated by Shiro Kawai on October, 7 2008 using texi2html 1.78.