Uri#
Added in version 2.66.
- class Uri(*args, **kwargs)#
The GUri type and related functions can be used to parse URIs into
their components, and build valid URIs from individual components.
Since GUri only represents absolute URIs, all GUris will have a
URI scheme, so get_scheme will always return a non-NULL
answer. Likewise, by definition, all URIs have a path component, so
get_path will always return a non-NULL string (which may
be empty).
If the URI string has an ‘authority’ component (that is, if the scheme
is followed by :// rather than just :), then the GUri will contain a
hostname, and possibly a port and ‘userinfo’.
Additionally, depending on how the GUri was constructed/parsed (for example,
using the G_URI_FLAGS_HAS_PASSWORD and G_URI_FLAGS_HAS_AUTH_PARAMS flags),
the userinfo may be split out into a username, password, and
additional authorization-related parameters.
Normally, the components of a GUri will have all %-encoded
characters decoded. However, if you construct/parse a GUri with
G_URI_FLAGS_ENCODED, then the %-encoding will be preserved instead in
the userinfo, path, and query fields (and in the host field if also
created with G_URI_FLAGS_NON_DNS). In particular, this is necessary if
the URI may contain binary data or non-UTF-8 text, or if decoding
the components might change the interpretation of the URI.
For example, with the encoded flag:
g_autoptr(GUri) uri = g_uri_parse ("http://host/path?query=http%3A%2F%2Fhost%2Fpath%3Fparam%3Dvalue", G_URI_FLAGS_ENCODED, &err);
g_assert_cmpstr (g_uri_get_query (uri), ==, "query=http%3A%2F%2Fhost%2Fpath%3Fparam%3Dvalue");
While the default %-decoding behaviour would give:
g_autoptr(GUri) uri = g_uri_parse ("http://host/path?query=http%3A%2F%2Fhost%2Fpath%3Fparam%3Dvalue", G_URI_FLAGS_NONE, &err);
g_assert_cmpstr (g_uri_get_query (uri), ==, "query=http://host/path?param=value");
During decoding, if an invalid UTF-8 string is encountered, parsing will fail with an error indicating the bad string location:
g_autoptr(GUri) uri = g_uri_parse ("http://host/path?query=http%3A%2F%2Fhost%2Fpath%3Fbad%3D%00alue", G_URI_FLAGS_NONE, &err);
g_assert_error (err, G_URI_ERROR, G_URI_ERROR_BAD_QUERY);
You should pass G_URI_FLAGS_ENCODED or G_URI_FLAGS_ENCODED_QUERY if you
need to handle that case manually. In particular, if the query string
contains = characters that are %-encoded, you should let
parse_params decode the query, so that it is decoded only once.
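The hazard of decoding too early can be seen without GLib at all. The following is a minimal sketch that uses Python's standard urllib.parse module as a stand-in (an assumption made so that the snippet runs without PyGObject; it is not the GUri API and its parsing rules differ in detail). It contrasts decoding the whole query before splitting with splitting first and decoding each piece once.

```python
from urllib.parse import parse_qsl, unquote

# The query string from the examples above, still %-encoded.
query = "query=http%3A%2F%2Fhost%2Fpath%3Fparam%3Dvalue"

# Decoding the whole query first turns the encoded '=' into a literal one,
# so a later split on '=' yields three pieces instead of two.
decoded_first = unquote(query).split("=")

# Splitting first and decoding each piece once keeps one name and one value.
split_first = parse_qsl(query)

print(decoded_first)
print(split_first)
```

With GUri itself, the equivalent approach would be to keep the query encoded and hand it to parse_params.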
GUri is immutable once constructed, and can safely be accessed from
multiple threads. Its reference counting is atomic.
Note that the scope of GUri is to help manipulate URIs in various applications,
following RFC 3986. In particular, it doesn’t intend to cover web browser needs,
and doesn’t implement the WHATWG URL standard. No APIs are provided to help
prevent homograph attacks, so GUri is not suitable for formatting URIs for
display to the user for making security-sensitive decisions.
Relative and absolute URIs#
As defined in RFC 3986, the hierarchical nature of URIs means that they can either be ‘relative references’ (sometimes referred to as ‘relative URIs’) or ‘URIs’ (for clarity, ‘URIs’ are referred to in this documentation as ‘absolute URIs’ — although in contrast to RFC 3986, fragment identifiers are always allowed).
Relative references have one or more components of the URI missing. In
particular, they have no scheme. Any other component, such as hostname,
query, etc. may be missing, apart from a path, which has to be specified (but
may be empty). The path may be relative, starting with ./ rather than /.
For example, valid relative references include ./path?query,
/?query#fragment and //example.com.
Absolute URIs have a scheme specified. Any other components of the URI which
are missing are specified as explicitly unset in the URI, rather than being
resolved relative to a base URI using parse_relative.
For example, valid absolute URIs include file:///home/bob and
https://search.com?query=string.
A GUri instance is always an absolute URI. A string may be an absolute URI
or a relative reference; see the documentation for individual functions as to
what forms they accept.
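As a rough illustration of the distinction, the sketch below classifies the example strings from this section by whether they carry a scheme. It uses Python's standard urllib.parse as a stand-in, which is an assumption made to keep the snippet free of dependencies; has_scheme is a hypothetical helper, and urlsplit is more lenient than GUri's parsers.

```python
from urllib.parse import urlsplit


def has_scheme(uri_ref: str) -> bool:
    """Hypothetical helper: True if the string carries a scheme."""
    return urlsplit(uri_ref).scheme != ""


examples = [
    "./path?query",
    "/?query#fragment",
    "//example.com",
    "file:///home/bob",
    "https://search.com?query=string",
]

for ref in examples:
    kind = "absolute URI" if has_scheme(ref) else "relative reference"
    print(ref, "->", kind)
```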
Parsing URIs#
The most minimalist APIs for parsing URIs are split and
split_with_user. These split a URI into its component
parts, and return the parts; the difference between the two is that
split treats the ‘userinfo’ component of the URI as a
single element, while split_with_user can (depending on the
UriFlags you pass) treat it as containing a username, password,
and authentication parameters. Alternatively, split_network
can be used when you are only interested in the components that are
needed to initiate a network connection to the service (scheme,
host, and port).
parse is similar to split, but instead of
returning individual strings, it returns a GUri structure (and it requires
that the URI be an absolute URI).
resolve_relative and parse_relative allow
you to resolve a relative reference against a base URI.
resolve_relative takes two strings and returns a string,
and parse_relative takes a GUri and a string and returns a
GUri.
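To make the idea of resolution concrete, here is a small sketch of RFC 3986 style reference resolution on strings. It uses urljoin from Python's standard library as a stand-in for the strings-in, string-out shape of resolve_relative; this is an assumption for illustration only, the base URI and references are made up, and edge-case behaviour may differ from GLib.

```python
from urllib.parse import urljoin

# Hypothetical base URI used only for this illustration.
base = "http://example.com/a/b/c?old=1"

# A relative path is resolved against the directory of the base path.
resolved_path = urljoin(base, "../d?x=1")

# A network-path reference replaces the authority but keeps the scheme.
resolved_network = urljoin(base, "//other.org/z")

# A reference that is already absolute is returned unchanged.
resolved_absolute = urljoin(base, "https://example.net/")

print(resolved_path)
print(resolved_network)
print(resolved_absolute)
```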
All of the parsing functions take a UriFlags argument describing
exactly how to parse the URI; see the documentation for that type
for more details on the specific flags that you can pass. If you
need to choose different flags based on the type of URI, you can
use peek_scheme on the URI string to check the scheme
first, and use that to decide what flags to parse it with.
For example, you might want to use G_URI_PARAMS_WWW_FORM when parsing the
params for a web URI, so compare the result of peek_scheme
against http and https.
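The scheme check described above might look like the following sketch. It uses urlsplit from Python's standard library in place of peek_scheme (an assumption, so that the snippet runs without PyGObject), and wants_www_form is a hypothetical helper name. In real code the result would decide whether to pass G_URI_PARAMS_WWW_FORM when parsing the params.

```python
from urllib.parse import urlsplit


def wants_www_form(uri_string: str) -> bool:
    """Hypothetical helper: True when the scheme is http or https."""
    # urlsplit() lowercases the scheme, so the comparison is case-insensitive.
    return urlsplit(uri_string).scheme in ("http", "https")


print(wants_www_form("HTTPS://example.com/search?q=a+b"))
print(wants_www_form("mailto:bob@example.com"))
```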
Building URIs#
join and join_with_user can be used to construct
valid URI strings from a set of component strings. They are the
inverse of split and split_with_user.
Similarly, build and build_with_user can be
used to construct a GUri from a set of component strings.
As with the parsing functions, the building functions take a
UriFlags argument. In particular, it is important to keep in mind
whether the URI components you are using are already %-encoded. If so,
you must pass the G_URI_FLAGS_ENCODED flag.
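The reason this matters is that encoding an already-encoded component corrupts it. The sketch below shows the effect with quote from Python's standard library, used here as a stand-in for whatever encoding step the building functions apply (an assumption for illustration; the path is made up).

```python
from urllib.parse import quote

raw_path = "/docs/my file.txt"        # not yet %-encoded
encoded_path = quote(raw_path)        # encoded exactly once
double_encoded = quote(encoded_path)  # encoded again by mistake

print(encoded_path)
print(double_encoded)
```

In the double-encoded result the % of the original escape has itself been escaped, so the component no longer decodes back to the original path in one step.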
Note that Windows and Unix both define special rules for parsing
file:// URIs (involving non-UTF-8 character sets on Unix, and the
interpretation of path separators on Windows). GUri does not
implement these rules. Use filename_from_uri and
filename_to_uri if you want to properly convert between
file:// URIs and local filenames.
URI Equality#
Note that there is no g_uri_equal () function, because comparing
URIs usefully requires scheme-specific knowledge that GUri does
not have. GUri can help with normalization if you use the various
encoded UriFlags as well as G_URI_FLAGS_SCHEME_NORMALIZE;
however, it is not comprehensive.
For example, data:,foo and data:;base64,Zm9v resolve to the same
thing according to the data: URI specification, which GLib does not
handle.
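The data: example can be checked by hand. The sketch below applies the scheme-specific step (base64 decoding of the payload) using only Python's standard library; it is an illustration of why string comparison is not enough, not something GUri does.

```python
import base64

plain = "data:,foo"
encoded = "data:;base64,Zm9v"

# Scheme-specific knowledge: the part after the first comma is the payload,
# and the second URI declares that its payload is base64-encoded.
payload_plain = plain.split(",", 1)[1].encode("ascii")
payload_encoded = base64.b64decode(encoded.split(",", 1)[1])

print(plain == encoded)                  # the strings differ
print(payload_plain == payload_encoded)  # the payloads are the same bytes
```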
Methods#
- class Uri
- classmethod build(scheme: str, userinfo: str | None, host: str | None, port: int, path: str, query: str | None = None, fragment: str | None = None) Uri#
Creates a new Uri from the given components according to flags.
See also build_with_user(), which allows specifying the components of the “userinfo” separately.
Added in version 2.66.
- Parameters:
scheme – the URI scheme
userinfo – the userinfo component, or None
host – the host component, or None
port – the port, or -1
path – the path component
query – the query component, or None
fragment – the fragment, or None
- classmethod build_with_user(scheme: str, user: str | None, password: str | None, auth_params: str | None, host: str | None, port: int, path: str, query: str | None = None, fragment: str | None = None) Uri#
Creates a new Uri from the given components according to flags (HAS_PASSWORD is added unconditionally). The flags must be coherent with the passed values; in particular, use %-encoded values with ENCODED.
In contrast to build(), this allows specifying the components of the ‘userinfo’ field separately. Note that user must be non-None if either password or auth_params is non-None.
Added in version 2.66.
- Parameters:
scheme – the URI scheme
user – the user component of the userinfo, or None
password – the password component of the userinfo, or None
auth_params – the auth params of the userinfo, or None
host – the host component, or None
port – the port, or -1
path – the path component
query – the query component, or None
fragment – the fragment, or None
- classmethod escape_bytes(reserved_chars_allowed: str | None = None) str#
Escapes arbitrary data for use in a URI.
Normally all characters that are not ‘unreserved’ (i.e. ASCII alphanumerical characters plus dash, dot, underscore and tilde) are escaped. But if you specify characters in reserved_chars_allowed they are not escaped. This is useful for the ‘reserved’ characters in the URI specification, since those are allowed unescaped in some portions of a URI.
Though technically incorrect, this will also allow escaping nul bytes as %00.
Added in version 2.66.
- Parameters:
reserved_chars_allowed – a string of reserved characters that are allowed to be used, or None.
- classmethod escape_string(reserved_chars_allowed: str | None, allow_utf8: bool) str#
Escapes a string for use in a URI.
Normally all characters that are not “unreserved” (i.e. ASCII alphanumerical characters plus dash, dot, underscore and tilde) are escaped. But if you specify characters in reserved_chars_allowed they are not escaped. This is useful for the “reserved” characters in the URI specification, since those are allowed unescaped in some portions of a URI.
Added in version 2.16.
- Parameters:
reserved_chars_allowed – a string of reserved characters that are allowed to be used, or None.
allow_utf8 – True if the result can include UTF-8 characters.
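The effect of allowing some reserved characters through can be sketched with quote from Python's standard library, whose safe argument plays a role comparable to reserved_chars_allowed. This is a stand-in chosen so that the snippet runs without PyGObject; it is not the GLib function, and the input string is made up.

```python
from urllib.parse import quote

text = "a b/c~d?e"

# No reserved characters allowed: only unreserved characters stay as-is.
strict = quote(text, safe="")

# '/' and '?' explicitly allowed: they pass through unescaped.
relaxed = quote(text, safe="/?")

print(strict)
print(relaxed)
```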
- get_auth_params() str | None#
Gets uri's authentication parameters, which may contain %-encoding, depending on the flags with which uri was created. (If uri was not created with HAS_AUTH_PARAMS then this will be None.)
Depending on the URI scheme, parse_params() may be useful for further parsing this information.
Added in version 2.66.
- get_fragment() str | None#
Gets uri's fragment, which may contain %-encoding, depending on the flags with which uri was created.
Added in version 2.66.
- get_host() str | None#
Gets uri's host. This will never have %-encoded characters, unless it is non-UTF-8 (which can only be the case if uri was created with NON_DNS).
If uri contained an IPv6 address literal, this value will be just that address, without the brackets around it that are necessary in the string form of the URI. Note that in this case there may also be a scope ID attached to the address. For example, fe80::1234%em1 (or fe80::1234%25em1 if the string is still encoded).
Added in version 2.66.
- get_password() str | None#
Gets uri's password, which may contain %-encoding, depending on the flags with which uri was created. (If uri was not created with HAS_PASSWORD then this will be None.)
Added in version 2.66.
- get_path() str#
Gets uri's path, which may contain %-encoding, depending on the flags with which uri was created.
Added in version 2.66.
- get_query() str | None#
Gets uri's query, which may contain %-encoding, depending on the flags with which uri was created.
For queries consisting of a series of name=value parameters, UriParamsIter or parse_params() may be useful.
Added in version 2.66.
- get_scheme() str#
Gets uri's scheme. Note that this will always be all-lowercase, regardless of the string or strings that uri was created from.
Added in version 2.66.
- get_user() str | None#
Gets the ‘username’ component of uri's userinfo, which may contain %-encoding, depending on the flags with which uri was created. If uri was not created with HAS_PASSWORD or HAS_AUTH_PARAMS, this is the same as get_userinfo().
Added in version 2.66.
- get_userinfo() str | None#
Gets uri's userinfo, which may contain %-encoding, depending on the flags with which uri was created.
Added in version 2.66.
- classmethod is_valid(flags: UriFlags) bool#
Parses uri_string according to flags, to determine whether it is a valid absolute URI, i.e. it does not need to be resolved relative to another URI using parse_relative().
If it’s not a valid URI, an error is returned explaining how it’s invalid.
See split(), and the definition of UriFlags, for more information on the effect of flags.
Added in version 2.66.
- Parameters:
flags – flags for parsing uri_string
- classmethod join(scheme: str | None, userinfo: str | None, host: str | None, port: int, path: str, query: str | None = None, fragment: str | None = None) str#
Joins the given components together according to flags to create an absolute URI string. path may not be None (though it may be the empty string).
When host is present, path must either be empty or begin with a slash (/) character. When host is not present, path cannot begin with two slash characters (//). See RFC 3986, section 3.
See also join_with_user(), which allows specifying the components of the ‘userinfo’ separately.
HAS_PASSWORD and HAS_AUTH_PARAMS are ignored if set in flags.
Added in version 2.66.
- Parameters:
scheme – the URI scheme, or None
userinfo – the userinfo component, or None
host – the host component, or None
port – the port, or -1
path – the path component
query – the query component, or None
fragment – the fragment, or None
- classmethod join_with_user(scheme: str | None, user: str | None, password: str | None, auth_params: str | None, host: str | None, port: int, path: str, query: str | None = None, fragment: str | None = None) str#
Joins the given components together according to flags to create an absolute URI string. path may not be None (though it may be the empty string).
In contrast to join(), this allows specifying the components of the ‘userinfo’ separately. It otherwise behaves the same.
HAS_PASSWORD and HAS_AUTH_PARAMS are ignored if set in flags.
Added in version 2.66.
- Parameters:
scheme – the URI scheme, or None
user – the user component of the userinfo, or None
password – the password component of the userinfo, or None
auth_params – the auth params of the userinfo, or None
host – the host component, or None
port – the port, or -1
path – the path component
query – the query component, or None
fragment – the fragment, or None
- classmethod list_extract_uris() list[str]#
Splits a URI list conforming to the text/uri-list MIME type defined in RFC 2483 into individual URIs, discarding any comments. The URIs are not validated.
Added in version 2.6.
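The text/uri-list format is simple enough to illustrate in a few lines: one URI per line, with lines starting with # treated as comments. The following is a pure-Python sketch of that splitting under those assumptions; extract_uris_sketch is a hypothetical name, and this is not how GLib implements list_extract_uris.

```python
def extract_uris_sketch(uri_list: str) -> list[str]:
    """Hypothetical sketch: split a text/uri-list body into URIs."""
    uris = []
    for line in uri_list.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines; keep everything else as-is,
        # without validating it.
        if line and not line.startswith("#"):
            uris.append(line)
    return uris


sample = "# a comment\r\nfile:///home/bob/a.txt\r\nhttps://example.com/\r\n"
print(extract_uris_sketch(sample))
```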
- classmethod parse(flags: UriFlags) Uri#
Parses uri_string according to flags. If the result is not a valid absolute URI, it will be discarded, and an error returned.
Added in version 2.66.
- Parameters:
flags – flags describing how to parse uri_string
- classmethod parse_params(length: int, separators: str, flags: UriParamsFlags) dict[str, str]#
Many URI schemes include one or more attribute/value pairs as part of the URI value. This method can be used to parse them into a hash table. When an attribute has multiple occurrences, the last value is the final returned value. If you need to handle repeated attributes differently, use UriParamsIter.
The params string is assumed to still be %-encoded, but the returned values will be fully decoded. (Thus it is possible that the returned values may contain = or separators, if the value was encoded in the input.) Invalid %-encoding is treated as with the PARSE_RELAXED rules for parse(). (However, if params is the path or query string from a Uri that was parsed without PARSE_RELAXED and ENCODED, then you already know that it does not contain any invalid encoding.)
WWW_FORM is handled as documented for init().
If CASE_INSENSITIVE is passed to flags, attributes will be compared case-insensitively, so a params string attr=123&Attr=456 will only return a single attribute–value pair, Attr=456. Case will be preserved in the returned attributes.
If params cannot be parsed (for example, it contains two separators characters in a row), then error is set and None is returned.
Added in version 2.66.
- Parameters:
length – the length of params, or -1 if it is nul-terminated
separators – the separator byte character set between parameters (usually &, but sometimes ; or both &;). Note that this function works on bytes not characters, so it can’t be used to delimit UTF-8 strings for anything but ASCII characters. You may pass an empty set, in which case no splitting will occur.
flags – flags to modify the way the parameters are handled.
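The rules described for parse_params (split while still encoded, decode each piece once, last occurrence wins, optional case-insensitive matching) can be sketched in pure Python as follows. parse_params_sketch and its arguments are hypothetical and only approximate the documented behaviour; it is not the GLib implementation and ignores WWW_FORM and relaxed parsing.

```python
from urllib.parse import unquote


def parse_params_sketch(params: str, separators: str = "&",
                        case_insensitive: bool = False) -> dict[str, str]:
    """Hypothetical sketch of the parsing rules described above."""
    # Split on any separator character while the string is still %-encoded.
    pieces = [params]
    for sep in separators:
        pieces = [part for piece in pieces for part in piece.split(sep)]

    result: dict[str, str] = {}
    stored: dict[str, str] = {}  # folded name -> spelling currently stored
    for piece in pieces:
        if "=" not in piece:
            # Covers empty pieces, e.g. two separators in a row.
            raise ValueError(f"missing '=' in {piece!r}")
        name, value = piece.split("=", 1)
        name, value = unquote(name), unquote(value)  # decoded exactly once
        folded = name.casefold() if case_insensitive else name
        # A later occurrence replaces the earlier one; its spelling is kept.
        result.pop(stored.get(folded, name), None)
        result[name] = value
        stored[folded] = name
    return result


print(parse_params_sketch("a=1&b=2&a=3"))
print(parse_params_sketch("attr=123&Attr=456", case_insensitive=True))
print(parse_params_sketch("k=a%3Db"))
```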
- parse_relative(uri_ref: str, flags: UriFlags) Uri#
Parses uri_ref according to flags and, if it is a relative URI, resolves it relative to base_uri. If the result is not a valid absolute URI, it will be discarded, and an error returned.
Added in version 2.66.
- Parameters:
uri_ref – a string representing a relative or absolute URI
flags – flags describing how to parse uri_ref
- classmethod parse_scheme() str | None#
Gets the scheme portion of a URI string. RFC 3986 defines the URI syntax as:
URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
Common schemes include file, https, svn+ssh, etc.
Added in version 2.16.
- classmethod peek_scheme() str | None#
Gets the scheme portion of a URI string. RFC 3986 defines the URI syntax as:
URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
Common schemes include file, https, svn+ssh, etc.
Unlike parse_scheme(), the returned scheme is normalized to all-lowercase and does not need to be freed.
Added in version 2.66.
- classmethod resolve_relative(uri_ref: str, flags: UriFlags) str#
Parses uri_ref according to flags and, if it is a relative URI, resolves it relative to base_uri_string. If the result is not a valid absolute URI, it will be discarded, and an error returned.
(If base_uri_string is None, this just returns uri_ref, or None if uri_ref is invalid or not absolute.)
Added in version 2.66.
- Parameters:
uri_ref – a string representing a relative or absolute URI
flags – flags describing how to parse uri_ref
- classmethod split(flags: UriFlags) tuple[bool, str | None, str | None, str | None, int, str, str | None, str | None]#
Parses uri_ref (which can be an absolute or relative URI) according to flags, and returns the pieces. Any component that doesn’t appear in uri_ref will be returned as None (but note that all URIs always have a path component, though it may be the empty string).
If flags contains ENCODED, then %-encoded characters in uri_ref will remain encoded in the output strings. (If not, then all such characters will be decoded.) Note that decoding will only work if the URI components are ASCII or UTF-8, so you will need to use ENCODED if they are not.
Note that the HAS_PASSWORD and HAS_AUTH_PARAMS flags are ignored by split(), since it always returns only the full userinfo; use split_with_user() if you want it split up.
Added in version 2.66.
- Parameters:
flags – flags for parsing uri_ref
- classmethod split_network(flags: UriFlags) tuple[bool, str | None, str | None, int]#
Parses uri_string (which must be an absolute URI) according to flags, and returns the pieces relevant to connecting to a host. See the documentation for split() for more details; this is mostly a wrapper around that function with simpler arguments. However, it will return an error if uri_string is a relative URI, or does not contain a hostname component.
Added in version 2.66.
- Parameters:
flags – flags for parsing uri_string
- classmethod split_with_user(flags: UriFlags) tuple[bool, str | None, str | None, str | None, str | None, str | None, int, str, str | None, str | None]#
Parses uri_ref (which can be an absolute or relative URI) according to flags, and returns the pieces. Any component that doesn’t appear in uri_ref will be returned as None (but note that all URIs always have a path component, though it may be the empty string).
See split(), and the definition of UriFlags, for more information on the effect of flags. Note that password will only be parsed out if flags contains HAS_PASSWORD, and auth_params will only be parsed out if flags contains HAS_AUTH_PARAMS.
Added in version 2.66.
- Parameters:
flags – flags for parsing uri_ref
- to_string() str#
Returns a string representing uri.
This is not guaranteed to return a string which is identical to the string that uri was parsed from. However, if the source URI was syntactically correct (according to RFC 3986), and it was parsed with ENCODED, then to_string() is guaranteed to return a string which is at least semantically equivalent to the source URI (according to RFC 3986).
If uri might contain sensitive details, such as authentication parameters, or private data in its query string, and the returned string is going to be logged, then consider using to_string_partial() to redact parts.
Added in version 2.66.
- to_string_partial(flags: UriHideFlags) str#
Returns a string representing uri, subject to the options in flags. See to_string() and UriHideFlags for more details.
Added in version 2.66.
- Parameters:
flags – flags describing what parts of uri to hide
- classmethod unescape_bytes(length: int, illegal_characters: str | None = None) Bytes#
Unescapes a segment of an escaped string as binary data.
Note that in contrast to unescape_string(), this does allow nul bytes to appear in the output.
If any of the characters in illegal_characters appears as an escaped character in escaped_string, then that is an error and None will be returned. This is useful if you want to avoid, for instance, having a slash being expanded in an escaped path element, which might confuse pathname handling.
Added in version 2.66.
- Parameters:
length – the length (in bytes) of escaped_string to unescape, or -1 if it is nul-terminated.
illegal_characters – a string of illegal characters not to be allowed, or None.
- classmethod unescape_segment(escaped_string_end: str | None = None, illegal_characters: str | None = None) str | None#
Unescapes a segment of an escaped string.
If any of the characters in illegal_characters or the NUL character appears as an escaped character in escaped_string, then that is an error and None will be returned. This is useful if you want to avoid for instance having a slash being expanded in an escaped path element, which might confuse pathname handling.
Note: NUL byte is not accepted in the output, in contrast to unescape_bytes().
Added in version 2.16.
- Parameters:
escaped_string_end – Pointer to end of escaped_string, may be None
illegal_characters – An optional string of illegal characters not to be allowed, may be None
- classmethod unescape_string(illegal_characters: str | None = None) str | None#
Unescapes a whole escaped string.
If any of the characters in illegal_characters or the NUL character appears as an escaped character in escaped_string, then that is an error and None will be returned. This is useful if you want to avoid for instance having a slash being expanded in an escaped path element, which might confuse pathname handling.
Added in version 2.16.
- Parameters:
illegal_characters – a string of illegal characters not to be allowed, or None.
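The illegal_characters rule can be sketched in pure Python: scan the escapes, refuse any that decode to a forbidden character or to NUL, and only then decode. unescape_sketch is a hypothetical name, and the handling of stray % signs and invalid UTF-8 shown here is an assumption for illustration rather than a statement of GLib's exact behaviour.

```python
import re
from typing import Optional
from urllib.parse import unquote_to_bytes

_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def unescape_sketch(escaped: str, illegal_characters: str = "") -> Optional[str]:
    """Hypothetical sketch: None if a forbidden character was escaped."""
    forbidden = set(illegal_characters) | {"\x00"}  # NUL is always refused
    for match in _ESCAPE.finditer(escaped):
        if chr(int(match.group(1), 16)) in forbidden:
            return None
    # Assumption: a '%' not followed by two hex digits is treated as an error.
    if "%" in _ESCAPE.sub("", escaped):
        return None
    try:
        return unquote_to_bytes(escaped).decode("utf-8")
    except UnicodeDecodeError:
        return None


print(unescape_sketch("dir%2Ffile"))       # slash allowed
print(unescape_sketch("dir%2Ffile", "/"))  # slash forbidden
print(unescape_sketch("a%00b"))            # escaped NUL
```

Forbidding / in this way is what keeps an escaped path element from silently turning into two path elements after decoding.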