5.6. HTTP Cookies

This module contains classes for handling cookies, as defined by RFC6265, HTTP State Management Mechanism.

5.6.1. Client Scenarios

By default, Pyslet’s HTTP client does not support cookies. Adding support, if you want it, is done with the CookieStore class. All you need to do is create an instance and add it to the client before processing any requests:

import pyslet.http.client as http

client = http.Client()
cookie_store = http.cookie.CookieStore()
client.set_cookie_store(cookie_store)

Support for cookies is then transparently added to each request.

By default, the CookieStore object does not support domain cookies because it doesn’t know which domains are effectively top level domains (TLDs), so it treats all domains as effective TLDs. Domain cookies can’t be stored for TLDs as this would allow a website at www.exampleA.com to set or overwrite a cookie in the ‘com’ domain which would then be sent to www.exampleB.com. There are lots of reasons why this is a bad idea: websites could disrupt each other’s operation or, worse, compromise security and user privacy.

For most applications you can fix this by creating exceptions for domains you want your client to trust. For example, if you want to interact with www.example.com and www2.example.com you might want to allow domain cookies for example.com, knowing that the effective TLD in this case is simply ‘com’.

cookie_store.add_private_suffix('example.com')
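The risk described above follows from the domain-match rule in RFC6265. The following is a minimal standalone sketch of that rule, not Pyslet's implementation; the function names is_ip_address and domain_match are hypothetical and chosen only for illustration. It shows that a cookie scoped to 'com' would match every host under that suffix.

```python
import ipaddress


def is_ip_address(host):
    """Return True if host parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def domain_match(host, cookie_domain):
    """Simplified RFC6265 domain-match test (names are hypothetical)."""
    host = host.lower()
    cookie_domain = cookie_domain.lower()
    if host == cookie_domain:
        return True
    # A suffix match only counts on a label boundary, and never for
    # hosts that are IP addresses.
    return host.endswith("." + cookie_domain) and not is_ip_address(host)


print(domain_match("www.exampleb.com", "com"))         # True
print(domain_match("www.example.com", "example.com"))  # True
print(domain_match("badexample.com", "example.com"))   # False
```

Because the first call succeeds, a store that accepted a cookie for 'com' would send it to unrelated sites, which is why public suffixes have to be refused.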

If you want to emulate the behaviour of real browsers you will need to load a proper database of effective TLDs. For more information see CookieStore.fetch_public_suffix_list() and CookieStore.set_public_list(). Be warned, the public suffix list changes routinely and you’ll want to ensure you have the latest values loaded.

5.6.2. Web Application Scenarios

If you are writing a web application you may want to handle cookies directly by adding response headers explicitly to a response object provided by your web framework.

There are two classes for representing cookie definitions. You should use the stricter Section4Cookie when creating cookies as this follows the recommended syntax in the RFC and will catch problems such as attempting to set a cookie value containing a comma. Although user agents are supposed to cope with such values, some systems now reject cookies that do not adhere to the stricter section 4 definitions.

The following code creates a cookie called SID with a maximum lifespan of 15 minutes:

import pyslet.http.cookie as cookie

c = cookie.Section4Cookie("SID", "31d4d96e407aad42", max_age=15*60,
                          path="/", http_only=True, secure=True)
print(c)

It outputs the text required to set the Set-Cookie header:

SID=31d4d96e407aad42; Path=/; Max-Age=900; Secure; HttpOnly
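To make the structure of that header value concrete, here is a standalone sketch that assembles the same string by hand. It is not how Section4Cookie is implemented and performs none of its validation; the function name format_set_cookie is hypothetical.

```python
def format_set_cookie(name, value, path=None, max_age=None,
                      secure=False, http_only=False):
    """Build a Set-Cookie header value from a few common attributes.

    Hypothetical helper for illustration only: no syntax checks are made.
    """
    parts = ["%s=%s" % (name, value)]
    if path is not None:
        parts.append("Path=%s" % path)
    if max_age is not None:
        parts.append("Max-Age=%d" % max_age)
    if secure:
        parts.append("Secure")
    if http_only:
        parts.append("HttpOnly")
    # Attributes are separated by a semicolon and a single space
    return "; ".join(parts)


header = format_set_cookie("SID", "31d4d96e407aad42", path="/",
                           max_age=15 * 60, secure=True, http_only=True)
print(header)
```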

You may want to add additional attributes such as an expires time for backwards compatibility or a domain to allow the cookie to be sent to other websites in a shared domain. See Cookie for details.

5.6.3. Reference

class pyslet.http.cookie.Cookie(name, value, path=None, domain=None, expires=None, max_age=None, secure=False, http_only=False, extensions=None)

Bases: object

Represents the definition of a cookie

name
The name of the cookie
value
The value of the cookie
path (optional)
A string: the path of the cookie. If None then the ‘directory’ of the page that returned the cookie will be used by the client.
domain (optional)
A string: the domain of the cookie. If None then the host name of the server that returned the cookie will be used by the client and the cookie will be treated as ‘host only’.
expires (optional)
A TimePoint instance. If None then the cookie will be treated as a session cookie by the client.
max_age (optional)
An integer, the length of time before the cookie expires in seconds. Overrides the expires value. If None then the value of expires is used instead, if both are None then the cookie will be treated as a session cookie by the client.
secure (Default: False)
Whether or not the cookie should be exposed only over secure protocols, such as https.
http_only (Default: False)
Whether or not the cookie should be exposed only via the HTTP protocol. Recommended value: True!
extensions
A list of strings containing attribute extensions. The strings should be of the form name=value but this is not enforced.

Instances can be converted to strings using the builtin str function and the output that results is a valid Set-Cookie header value.
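The precedence between max_age and expires described in the parameter list can be sketched as follows. This is an illustrative standalone function, not Pyslet code; the name effective_expiry is hypothetical, and it assumes the expires value has already been converted to seconds since the epoch.

```python
import time


def effective_expiry(expires_time=None, max_age=None, now=None):
    """Return the expiry in seconds since the epoch, or None.

    None means the cookie is a session cookie.  expires_time is assumed
    to be an epoch value already (an assumption of this sketch).
    """
    if now is None:
        now = time.time()
    if max_age is not None:
        # Max-Age overrides Expires when both are present
        return now + max_age
    # Falls back to Expires, which may itself be None
    return expires_time


print(effective_expiry(expires_time=5000, max_age=900, now=1000))  # 1900
print(effective_expiry(expires_time=5000, now=1000))               # 5000
print(effective_expiry(now=1000))                                  # None
```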

name = None

the cookie’s name

value = None

the cookie’s value

path = None

the cookie’s path

domain = None

the cookie’s domain

secure = None

the cookie’s secure flag

http_only = None

the cookie’s httponly flag

creation_time = None

the creation time of the cookie, initialised to the current time as returned by the builtin time.time function.

access_time = None

the last access time of the cookie, initialised to the current time as returned by the builtin time.time function.

expires_time = None

the expiry time of the cookie, as an integer compatible with the value returned by time.time

max_age = None

the max_age value

expires = None

the expires value as passed to the constructor, this is preserved and is used when serialising the definition even if Max-Age is also in effect. Some older clients may not support Max-Age and they will look at the Expires time instead.

extensions = None

the list of extensions

classmethod from_str(src)

Creates a new instance from a src string

The string is parsed using the generous parsing rules of Section 5 of the specification. Returns a new instance.

is_persistent()

Returns True if this cookie has an expiry time, that is, if it is not a session cookie.

The expires time is calculated from either the max_age or expires attributes.

is_hostonly()

Returns True if this cookie is ‘host only’

In other words, it should only be sent to the host that set the cookie originally.

touch(now=None)

Updates the cookie’s last access time.

now (optional)
Time value to use. This can be in the past or the future and improves performance when updating multiple cookies simultaneously.

expired(now=None)

Returns True if the cookie has expired

now (optional)
Time value at which to test. This can be in the past or the future; it is provided largely to aid testing and to improve performance when a large number of cookies need to be tested sequentially.

class pyslet.http.cookie.Section4Cookie(*args, **kwargs)

Bases: pyslet.http.cookie.Cookie

Represents a strict cookie definition

The purpose of this class is to wrap Cookie to enforce more validation rules on the definition to ensure that the cookie adheres to section 4 syntax, and not just the broader section 5 syntax.

Names are checked for token validity, values are checked against the syntax for cookie-value and the attributes are checked against the other constraints in the specification.

The built-in str function will return a string that is valid against the section 4 syntax.

classmethod from_str(src)

Creates a new instance from a src string

Overridden to provide stricter parsing. This may still appear more generous than expected because the strict syntax allows an unrestricted set of attribute extensions so unrecognised attributes will often be recorded but not in any useful way.

5.6.3.1. Client Support

User agents that support cookies are obliged to keep a cookie store in which cookies can be saved and retrieved keyed on their domain, path and cookie name.

Pyslet’s approach is to provide an in-memory store with nodes defined for each domain (host) that a cookie has been associated with or which is the target of a public or private suffix rule. Nodes are also created for any implied parent domains and the result is a tree-like structure of dictionaries that can be quickly searched for each request.

class pyslet.http.cookie.CookieStore

Bases: object

An object that provides in-memory storage for cookies.

There are no initialisation options. By default, the cookie storage will refuse all ‘domain’ cookies. That is, cookies that have a domain attribute. If a domain cookie is received from a host that exactly matches its domain attribute then it is converted to a host-only cookie and is stored.

This behaviour can be changed by adding exclusions (in the form of calls to add_private_suffix()) or by loading in a new public suffix database using set_public_list().

Store a cookie.

url
A URI instance representing the resource that is setting the cookie.
c
A Cookie instance, typically parsed from a Set-Cookie header returned when requesting the resource at url.

If the cookie can’t be set then CookieError is raised. Reasons why a cookie might be refused are a mismatch between a domain attribute and the url, or an attempt to set a cookie in a public domain, such as ‘co.uk’.

search(url)

Searches for cookies that match a resource

url
A URI instance representing the resource that we want to find cookies for.

The return result is a sorted list of Cookie objects. The sort order is defined in the specification: longer paths are sorted first, otherwise older cookies are listed before newer ones.

Expired cookies are automatically removed from the repository and all cookies returned have their access time updated to the current time.
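The sort order described above can be illustrated with a short standalone sketch. StoredCookie and sort_for_request are hypothetical names used only here; they are not part of Pyslet.

```python
from collections import namedtuple

# Hypothetical stand-in for a stored cookie; not Pyslet's Cookie class
StoredCookie = namedtuple("StoredCookie", "name path creation_time")


def sort_for_request(cookies):
    """Longest path first; ties broken by earliest creation time."""
    return sorted(cookies, key=lambda c: (-len(c.path), c.creation_time))


cookies = [
    StoredCookie("a", "/", 300),
    StoredCookie("b", "/docs/api", 200),
    StoredCookie("c", "/", 100),
]
print([c.name for c in sort_for_request(cookies)])  # ['b', 'c', 'a']
```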

expire_cookies(now=None, dnode=None)

Expire stored cookies.

now (optional)
The time at which to expire the cookies, defaults to the current time. This can be used to expire cookies based on some past or future point.

Iterates through all stored cookies and removes any that have expired.

end_session(now=None, dnode=None)

Expire all session cookies.

now (optional)
The time at which to expire cookies. See expire_cookies() for details.

Iterates through all stored cookies and removes any session cookies in addition to any that have expired.

add_public_suffix(suffix)

Marks a domain suffix as being public.

suffix
A string: a public suffix, may contain wild-card characters to match any entire label, for example: “*.uk”, “*.tokyo.jp”, “com”

Once a domain suffix is marked as being public, future cookies will not be stored against that suffix (except in the unusual case where a cookie is ‘host only’ and the host name is a public suffix).

add_private_suffix(suffix)

Marks a domain suffix as being private.

suffix
A string: a private suffix, may contain wild-card characters to match any entire label, for example: “example.co.uk”, “*.tokyo.jp”, “com”

This method is required to override an existing public rule, thereby ensuring that future cookies can be stored against domains matching this suffix.

classmethod fetch_public_suffix_list(fpath, src='https://publicsuffix.org/list/effective_tld_names.dat', overwrite=False)

Fetches the public suffix list and saves to fpath

fpath
A local file path to save the file in
src
A string or URI instance pointing at the file to retrieve. It defaults to the data file https://publicsuffix.org/list/effective_tld_names.dat
overwrite (Default: False)
A flag to force an overwrite of an existing file at fpath. By default, if fpath already exists this method returns without doing anything.

set_public_list(black_list, tld_depth=1)

Loads a new public suffix list

black_list
A string containing a list of public suffixes in the format defined by: https://publicsuffix.org/list/
tld_depth (Default: 1)
The depth of domain that will be automatically treated as public. The default is 1, meaning that all top-level domains will be treated as public.

This method loads data from a public list using calls to add_public_suffix() and add_private_suffix(), the latter being for exclusion rules.

If you use the full list published by the Public Suffix List project it is safe to use the default tld_depth value of 1:

https://publicsuffix.org/list/effective_tld_names.dat

If you want to load a much smaller list then you should use a large value for tld_depth (255 for example) and document exclusions only. For example:

// Exclusion list
// Accept domain cookies for example.com, example.co.uk
!example.com
!example.co.uk
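
To show how text in this format breaks down into rules, here is a standalone sketch of a parser for it. It is not Pyslet's loader; the function name parse_suffix_list is hypothetical. It assumes the conventions of the public suffix list format: lines starting with // are comments, each rule is read up to the first whitespace, and a leading ! marks an exception (exclusion) rule.

```python
def parse_suffix_list(text):
    """Split suffix list text into (public_rules, exceptions) sets.

    Hypothetical helper for illustration only.
    """
    public_rules = set()
    exceptions = set()
    for line in text.splitlines():
        # Only the text up to the first whitespace is significant
        fields = line.split()
        if not fields or fields[0].startswith("//"):
            continue  # skip blank lines and comments
        rule = fields[0].lower()
        if rule.startswith("!"):
            exceptions.add(rule[1:])
        else:
            public_rules.add(rule)
    return public_rules, exceptions


sample = """// Exclusion list
// Accept domain cookies for example.com, example.co.uk
!example.com
!example.co.uk
"""
print(parse_suffix_list(sample))
```

For the sample above the set of public rules is empty and both entries are returned as exceptions.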
test_public_domain(domain_str)

Test if a domain is public

domain_str
A domain string, e.g., “www.example.com”

Returns True if this domain is marked as public, False otherwise.

get_registered_domain(domain_str, u_labels=False)

Returns the publicly registered portion of a domain

domain_str
A domain string, e.g., “www.example.com”
u_labels (Default: False)
Flag indicating whether or not to return unicode labels instead of encoded ASCII Labels.

Compares this domain against the database of public domains and returns the publicly registered part of the domain. For example, www.example.com would typically return example.com and www.example.co.uk would typically return example.co.uk.

If domain_str is already a publicly registered domain then it returns None. If domain_str is itself None, None is also returned.

Initially, all domains are marked as public so this function will always return None. It is intended for use after a public list has been loaded, such as the public suffix list (see set_public_list()).
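As a rough illustration of the lookup, the following standalone sketch finds the registered domain given sets of public rules and exception rules. It is a simplified reading of the public suffix algorithm rather than Pyslet's tree-based implementation, and it does not model tld_depth; the names public_suffix_length and registered_domain are hypothetical.

```python
def public_suffix_length(labels, public_rules, exceptions):
    """Return how many trailing labels form the public suffix."""
    # Exception rules take priority over every other rule
    for i in range(len(labels)):
        if ".".join(labels[i:]) in exceptions:
            return len(labels) - i - 1
    # Otherwise the longest matching rule wins, so try longest first
    for i in range(len(labels)):
        candidate = labels[i:]
        wildcard = ".".join(["*"] + candidate[1:])
        if ".".join(candidate) in public_rules or wildcard in public_rules:
            return len(candidate)
    # No rule matched: treat the top-level domain alone as public
    return 1


def registered_domain(domain_str, public_rules, exceptions=frozenset()):
    """Return the registered part of domain_str, or None."""
    if domain_str is None:
        return None
    labels = domain_str.lower().split(".")
    n = public_suffix_length(labels, public_rules, exceptions)
    if len(labels) <= n:
        return None  # the domain is itself a public suffix
    return ".".join(labels[-(n + 1):])


rules = {"com", "uk", "co.uk", "*.tokyo.jp"}
print(registered_domain("www.example.com", rules))    # example.com
print(registered_domain("www.example.co.uk", rules))  # example.co.uk
print(registered_domain("co.uk", rules))              # None
```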

check_public_suffix(domain_str, match_str)

See Public Suffix Test Data for details.

http://mxr.mozilla.org/mozilla-central/source/netwerk/test/unit/data/test_psl.txt?raw=1

Returns True if there is a match, False otherwise. Negative results are logged at ERROR level. Used for testing the public suffixes loaded with set_public_list().

5.6.3.2. Syntax

The following basic functions can be used to test characters against the syntax productions defined in the specification. In each case, if the argument is None then False is returned.

class pyslet.http.cookie.CookieParser(source)

Bases: pyslet.http.grammar.OctetParser

General purpose class for parsing RFC6265 productions

Unlike the basic syntax functions these methods allow a longer string, such as that received from an HTTP header, to be parsed into its component parts.

Methods follow inherited naming conventions: require_ methods raise a ValueError if the production is not matched whereas parse_ methods optionally parse a production if it is present and return None if not present.

Parses the set-cookie-string production

strict (Default: False)
Use the stricter section 4 syntax rules instead of the more permissive algorithm described in section 5.2

This is the format of the Set-Cookie header, it returns a Cookie instance or None if this cookie definition should be ignored.

Parses the value of a Cookie header.

strict (Default: False)
Indicates if stricter section 4 parsing is required.

Returns a dictionary of values, the keys are the names of the cookies in the cookie string and the values are either strings or, in the case of multiply defined names, sets of strings. We use sets as the specification makes it clear that you should not rely on the order of such definitions.
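The shape of that return value can be shown with a small standalone sketch. This is a loose parser written for illustration, not the CookieParser method itself; the function name parse_cookie_header is hypothetical.

```python
def parse_cookie_header(header_value):
    """Loosely parse a Cookie header value into a dictionary.

    Values are strings, or sets of strings for names defined more
    than once.  Hypothetical helper for illustration only.
    """
    result = {}
    for item in header_value.split(";"):
        name, sep, value = item.strip().partition("=")
        if not sep:
            continue  # ignore fragments with no '='
        if name not in result:
            result[name] = value
        elif isinstance(result[name], set):
            result[name].add(value)
        else:
            # Second definition: promote the string to a set
            result[name] = {result[name], value}
    return result


parsed = parse_cookie_header("a=1; b=2; a=3")
print(parsed["b"])          # 2
print(sorted(parsed["a"]))  # ['1', '3']
```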

require_name_value_pair()

Returns a (name, value) pair

Parsed according to the looser section 5 syntax so will allow almost anything as a name and value provided it has an ‘=’.

Returns a (name, value) pair parsed according to cookie-pair

Parsed according to the stricter section 4 syntax so will only accept valid tokens as names, the ‘=’ is required and the value must be parseable with require_cookie_value().

See: require_cookie_pair()

If not parsed returns (None, None) rather than just None.

Returns a cookie-value string.

Parsed according to the stricter section 4 syntax so will not allow whitespace, comma, semicolon or backslash characters and will only allow double-quotes when they are used to completely enclose the value, in which case the double-quotes are still considered to be part of the value string.

Parses a cookie-av string.

This production is effectively the production for extension-av in the stricter section 4 syntax. Effectively it returns everything up to but not including the next ‘;’ or CTL character.

It never returns None, if nothing is found it returns an empty string instead.

Parses the sane-cookie-date production.

This is the stricter syntax defined in section 4. The returned result is a FullDate instance.

Parses a date-token-list

This uses the weak section 5.1 syntax

It never returns None, if there are no tokens then it returns an empty list. Delimiters are always discarded.

Parses a date value

This uses the weak section 5.1 syntax and the algorithm described there. It absorbs almost all errors, returning None if this date value should be ignored, but warnings are logged to alert you to the failure. The implication of replacing a date with None in this syntax is typically that a cookie that is supposed to be persistent becomes session only. However, if this was an attempt to remove a cookie with a very early date then the failure could cause more problems.

If successful, it returns a FullDate instance.

5.6.3.2.1. Date and Time

pyslet.http.cookie.split_year(year_str)

Parses a year from a string

Uses the generous rules in section 5.1 and returns a year value, adjusted using the 2-digit year algorithm documented there.

If a year value can’t be found ValueError is raised.
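The 2-digit year adjustment in section 5.1 of RFC6265 maps values 70 to 99 onto 1970 to 1999 and values 0 to 69 onto 2000 to 2069. A standalone sketch of the rule follows; it is not Pyslet's code, and the name split_year_sketch is hypothetical.

```python
import re


def split_year_sketch(year_str):
    """Parse 2 to 4 leading digits as a year, applying the RFC6265
    two-digit adjustment.  Hypothetical helper for illustration."""
    # The digits may be followed by a non-digit and arbitrary text
    match = re.match(r"(\d{2,4})(?:\D.*)?$", year_str, re.DOTALL)
    if match is None:
        raise ValueError("no year found in %r" % year_str)
    year = int(match.group(1))
    if 70 <= year <= 99:
        year += 1900
    elif 0 <= year <= 69:
        year += 2000
    return year


print(split_year_sketch("94"))    # 1994
print(split_year_sketch("15"))    # 2015
print(split_year_sketch("2015"))  # 2015
```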

pyslet.http.cookie.split_month(month_str)

Parses a month from a string

Uses the generous rules in section 5.1 and returns a month value from 1 (January) to 12 (December).

If a month value can’t be found ValueError is raised.

pyslet.http.cookie.split_day_of_month(dom_str)

Parses a day-of-month from a string

Uses the generous rules in section 5.1 and returns a single integer or raises ValueError if a valid day of month can’t be found.

pyslet.http.cookie.split_time(time_str)

Parses a time from a string

Uses the generous rules in section 5.1 and returns a triple of hours, minutes, seconds. These values are unchecked!

If the time can’t be found ValueError is raised.
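The following standalone sketch shows what "unchecked" means in practice: the section 5.1 time production only requires three colon-separated fields of one or two digits each, so out-of-range values pass through. The name split_time_sketch is hypothetical and this is not Pyslet's implementation.

```python
import re

# Three fields of 1-2 digits, optionally followed by a non-digit and
# arbitrary trailing text, per the section 5.1 'time' production
TIME_RE = re.compile(r"(\d{1,2}):(\d{1,2}):(\d{1,2})(?:\D.*)?$", re.DOTALL)


def split_time_sketch(time_str):
    """Return (hours, minutes, seconds) without range checking."""
    match = TIME_RE.match(time_str)
    if match is None:
        raise ValueError("no time found in %r" % time_str)
    return tuple(int(g) for g in match.groups())


print(split_time_sketch("12:34:56"))  # (12, 34, 56)
print(split_time_sketch("99:99:99"))  # (99, 99, 99), no range check
```

A caller therefore has to validate the ranges itself before building a date from the result.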

5.6.3.2.2. Basic Syntax

pyslet.http.cookie.is_delimiter(c)

Tests a character against the production delimiter

This production is from the weaker section 5 syntax of RFC6265.

pyslet.http.cookie.is_non_delimiter(c)

Tests a character against the production non-delimiter

The result differs from using not is_delimiter only in the handling of None which will return False when passed to either function.
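The handling of None can be seen in this standalone sketch of the two tests, using the delimiter ranges from section 5.1.1 of RFC6265. The functions take single-character strings, which is an assumption of the sketch; Pyslet's own functions may expect a different character type. The _sketch names are hypothetical.

```python
def is_delimiter_sketch(c):
    """Test a single character against the RFC6265 delimiter ranges."""
    if c is None:
        return False
    code = ord(c)
    return (code == 0x09 or 0x20 <= code <= 0x2F or
            0x3B <= code <= 0x40 or 0x5B <= code <= 0x60 or
            0x7B <= code <= 0x7E)


def is_non_delimiter_sketch(c):
    """True for any character that is not a delimiter; False for None."""
    return c is not None and not is_delimiter_sketch(c)


print(is_delimiter_sketch(" "))       # True
print(is_delimiter_sketch(":"))       # False
print(is_delimiter_sketch(None))      # False
print(is_non_delimiter_sketch(None))  # False
```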

pyslet.http.cookie.is_non_digit(c)

Tests a character against the production non-digit.

Tests a character against the production cookie-octet.

5.6.3.2.3. Domain Name Syntax

pyslet.http.cookie.domain_in_domain(subdomain, domain)

Returns True if subdomain is a sub-domain of domain.

subdomain
A reversed list of strings returned by split_domain()
domain
A reversed list of strings as returned by split_domain()

For example:

>>> domain_in_domain(['com', 'example'],
...                  ['com', 'example', 'www'])
True

pyslet.http.cookie.split_domain(domain_str, allow_wildcard=False)

Splits a domain string

domain_str
A unicode string, or a UTF-8 encoded binary string.
allow_wildcard (Default: False)
Allows the use of a single ‘*’ character as a domain label for the purposes of parsing wildcard domain definitions.

Returns a list of lower cased ASCII labels, converting U-Labels to ACE form (xn--) in the process. For example:

>>> split_domain('example.COM')
['example', 'com']
>>> split_domain(u'\u98df\u72ee.com.cn')
['xn--85x722f', 'com', 'cn']

Raises ValueError if domain_str is not valid.

pyslet.http.cookie.is_ldh_label(label)

Tests a string against the definition of LDH label

LDH Label is defined in RFC5890 as being the classic label syntax defined in RFC1034 and updated in RFC1123. To cut a long story short the update in question is described as follows:

One aspect of host name syntax is hereby changed: the restriction on the first character is relaxed to allow either a letter or a digit.

Although not spelled out there this would make the updated syntax:

<label> ::= <let-dig> [ [ <ldh-str> ] <let-dig> ]
<ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>
<let-dig-hyp> ::= <let-dig> | "-"
<let-dig> ::= <letter> | <digit>

pyslet.http.cookie.is_rldh_label(label)

Tests a string against the definition of R-LDH label

As defined by RFC5890

Reserved LDH labels, known as “tagged domain names” in some other contexts, have the property that they contain “--” in the third and fourth characters but which otherwise conform to LDH label rules.

Non-Reserved LDH labels are the set of valid LDH labels that do not have “--” in the third and fourth positions.

Therefore you can test for a NR-LDH label simply by using the not operator.

pyslet.http.cookie.is_a_label(label)

Test a string against the definition of A-label.

As defined by RFC5890

In fact, this function currently only tests for being an XN-- label.

the class of labels that begin with the prefix “xn--” (case independent), but otherwise conform to the rules for LDH labels [is called “XN-labels”]...

The XN-labels that are valid Punycode output are known as “A-labels” if they also meet the other criteria for IDNA-validity

So bear in mind that (a) the remainder of the label may fail to decode properly when passed to the punycode algorithm and (b) even if it does decode it may result in a string that is not actually a valid U-Label.
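The three label tests in this section nest inside one another, which the following standalone sketch tries to make explicit. It follows the grammar and RFC5890 definitions quoted above rather than Pyslet's source, and the _sketch names are hypothetical. The 63 character limit is the label length limit from RFC5890.

```python
import re

# <let-dig> [ [ <ldh-str> ] <let-dig> ] from the grammar above
LDH_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_ldh_label_sketch(label):
    """Letters, digits and hyphens; no leading or trailing hyphen."""
    return len(label) <= 63 and LDH_RE.fullmatch(label) is not None


def is_rldh_label_sketch(label):
    """An LDH label with '--' in the third and fourth positions."""
    return is_ldh_label_sketch(label) and label[2:4] == "--"


def is_xn_label_sketch(label):
    """An R-LDH label whose prefix is 'xn' (case independent)."""
    return is_rldh_label_sketch(label) and label[:2].lower() == "xn"


print(is_ldh_label_sketch("3com"))          # True
print(is_ldh_label_sketch("-bad"))          # False
print(is_rldh_label_sketch("ab--cd"))       # True
print(is_xn_label_sketch("ab--cd"))         # False
print(is_xn_label_sketch("xn--85x722f"))    # True
```

As with is_a_label(), passing the XN test says nothing about whether the remainder of the label is valid Punycode.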

5.6.3.3. Exceptions

class pyslet.http.cookie.CookieError

Bases: exceptions.ValueError

Raised when an operation violates RFC6265 rules.