2.7.1. HTTP Client

2.7.1.1. Sending Requests

Here is a simple example of Pyslet’s HTTP support in action from the Python interpreter:

>>> import pyslet.http.client as http
>>> c = http.Client()
>>> r = http.ClientRequest('http://odata.pyslet.org')
>>> c.process_request(r)
>>> r.response.status
200
>>> print r.response.get_content_type()
text/html; charset=UTF-8
>>> print r.response.entity_body.getvalue()
<html>
<head><title>Pyslet Home</title></head>
<body>
<p><a href="http://qtimigration.googlecode.com/"><img src="logoc-large.png" width="1024"/></a></p>
</body>
</html>
>>> c.close()

In its simplest form, making an HTTP request takes three steps. First, create a Client object, which is responsible for sending requests and receiving responses. Second, create a ClientRequest object describing the request you want to make; for a simple GET request you only need to specify the URL. Third, instruct the Client to process the request by calling process_request(). Once this method returns you can examine the request’s associated response. The response’s entity body is written to a StringIO object by default.
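The three steps can be wrapped in a small helper. This is a minimal sketch rather than part of Pyslet: fetch is a hypothetical name, and the http argument is assumed to be the pyslet.http.client module, passed in so that the helper itself has no import-time dependency. It uses only the calls shown in the session above.

```python
def fetch(http, url):
    """Fetch url with a simple GET request and return (status, body).

    http is assumed to be the pyslet.http.client module, for example
    obtained with: import pyslet.http.client as http
    """
    client = http.Client()                 # step 1: create the client
    try:
        request = http.ClientRequest(url)  # step 2: describe the request
        client.process_request(request)    # step 3: send it and wait
        response = request.response
        return response.status, response.entity_body.getvalue()
    finally:
        # Close the client so that no connection is left open.
        client.close()
```

With Pyslet installed, a call such as fetch(http, 'http://odata.pyslet.org') would be expected to behave like the interpreter session above.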

The request and response objects are both derived from a basic HTTP Message class, which has methods for getting and setting headers. You can use the basic get_header() and set_header() methods to get and set headers as strings or, where provided, you can use special wrapper methods such as get_content_type() to get and set headers using special-purpose class objects that represent parsed forms of the expected value. In the case of Content-Type headers the result is a MediaType() object. Providing these special object types is one of the main ways in which Pyslet’s HTTP support differs from other clients. Exposing these structures lets you reuse HTTP concepts in other contexts, which is particularly useful when other technical specifications make normative references to them.

Here is a glimpse of what you can do with a parsed media type, continuing the above example:

>>> type = r.response.get_content_type()
>>> type
MediaType('text','html',{'charset': ('charset', 'UTF-8')})
>>> type.type
'text'
>>> type.subtype
'html'
>>> type['charset']
'UTF-8'
>>> type['name']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyslet/http/params.py", line 382, in __getitem__
    repr(key))
KeyError: "MediaType instance has no parameter 'name'"

There are lots of other special get_ and set_ methods on the Message, Request and Response objects.
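Because looking up a missing parameter raises KeyError, as the traceback above shows, optional parameters are most easily read through a small wrapper. The sketch below assumes only that behaviour; get_parameter is a hypothetical helper name, not a Pyslet function.

```python
def get_parameter(media_type, name, default=None):
    """Return the named parameter of media_type, or default if absent.

    Relies only on media_type[name] raising KeyError for a missing
    parameter, which is the behaviour shown for MediaType above.
    """
    try:
        return media_type[name]
    except KeyError:
        return default
```

For example, get_parameter(r.response.get_content_type(), 'charset', 'ISO-8859-1') would fall back to a default when the server sends no charset.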

2.7.1.2. Pipelining

One of the use cases that Pyslet’s HTTP client is designed to cover is reusing an HTTP connection to make multiple requests to the same host. The example above takes care to close the Client object when we’re done because otherwise it would leave the connection to the server open ready for another request.
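A sketch of how several requests might be sent from one thread is shown here. It uses only queue_request(), thread_loop() and the status attribute, all documented in the reference for this module; fetch_all is a hypothetical helper name. Whether the requests are actually pipelined over one connection depends on the rules described for the Client class.

```python
def fetch_all(client, requests):
    """Queue every request on client, then run until all have finished.

    Requests to the same host+port queued from one thread are expected
    to share a single connection.  Returns the list of status codes.
    """
    for request in requests:
        client.queue_request(request)
    # Drive all HTTP activity started by this thread to completion.
    client.thread_loop()
    return [request.status for request in requests]
```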

2.7.1.3. Reference

The client module imports the grammar, params, messages and auth modules and these can therefore be accessed using a single import in your code. For example:

import pyslet.http.client as http
type = http.params.MediaType('application', 'xml')

For more details of the objects exposed by those modules see pyslet.http.grammar, pyslet.http.params, pyslet.http.messages and pyslet.http.auth.

class pyslet.http.client.Client(max_connections=100, ca_certs=None, timeout=None)

Bases: pyslet.pep8.PEP8Compatibility, object

An HTTP client

Note

In Pyslet 0.4 and earlier the name HTTPRequestManager was used. This name is still available as an alias for Client.

The object manages the sending and receiving of HTTP/1.1 requests and responses respectively. There are a number of keyword arguments that can be used to set operational parameters:

max_connections
The maximum number of HTTP connections that may be open at any one time. The method queue_request() will block (or raise RequestManagerBusy) if an attempt to queue a request would cause this limit to be exceeded.
timeout
The maximum wait time on the connection. This is not a limit on the total time to receive a response but a limit on the time the client will wait with no activity on the connection before assuming that the server is no longer responding. Defaults to None, meaning no timeout.
ca_certs

The file name of a certificate file to use when checking SSL connections. For more information see http://docs.python.org/2.7/library/ssl.html

In practice, there seem to be serious limitations on SSL connections and certificate validation in Python distributions linked to earlier versions of the OpenSSL library (e.g., Python 2.6 installed by default on OS X and Windows).

Warning

By default, ca_certs is optional and can be passed as None. In this mode certificates will not be checked and your connections are not secure from man-in-the-middle attacks. In production use you should always specify a certificate file if you expect to use the object to make calls to https URLs.
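One way to avoid creating an unchecked client by accident is to refuse to build one without a certificate file. The following is a sketch only: make_https_client is a hypothetical helper, and http is assumed to be the pyslet.http.client module. The keyword arguments are the ones listed for Client above.

```python
import os


def make_https_client(http, ca_certs, timeout=30):
    """Create a Client that checks server certificates.

    Raises ValueError rather than silently falling back to the
    unchecked mode that ca_certs=None would select.
    """
    if not ca_certs or not os.path.isfile(ca_certs):
        raise ValueError("a readable CA certificate file is required")
    return http.Client(ca_certs=ca_certs, timeout=timeout)
```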

Although max_connections allows you to make multiple connections to the same host+port, the request manager imposes an additional restriction: each thread can make at most one connection to each host+port. If multiple requests are made to the same host+port from the same thread then they are queued and will be sent to the server over the same connection using HTTP/1.1 pipelining. The manager (mostly) takes care of the following restriction imposed by RFC2616:

Clients SHOULD NOT pipeline requests using non-idempotent methods or non-idempotent sequences of methods

In other words, a POST (or CONNECT) request will cause the pipeline to stall until all the responses have been received. Users should beware of non-idempotent sequences as these are not automatically detected by the manager. For example, a GET,PUT sequence on the same resource is not idempotent. Users should wait for the GET request to finish fetching the resource before queuing a PUT request that overwrites it.

In summary, to take advantage of multiple simultaneous connections to the same host+port you must use multiple threads.
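The multi-threaded pattern might look like the sketch below, in which one shared client is used from several worker threads so that each thread gets its own connection. fetch_in_threads is a hypothetical helper name; only process_request() and the status attribute come from this module.

```python
import threading


def fetch_in_threads(client, requests):
    """Process each request in its own thread using a shared client.

    Each worker calls process_request(), which runs the HTTP activity
    started by that thread, so requests to one host+port can use
    separate connections.  Returns the list of status codes.
    """
    def worker(request):
        client.process_request(request)

    threads = [threading.Thread(target=worker, args=(r,))
               for r in requests]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [r.status for r in requests]
```

The number of threads should stay within max_connections, otherwise queuing may block or raise RequestManagerBusy.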

ConnectionClass

alias of Connection

httpUserAgent = None

The default User-Agent string to use, defaults to a string derived from the installed version of Pyslet, e.g.:

pyslet 0.5.20140727 (http.client.Client)

queue_request(request, timeout=None)

Starts processing an HTTP request

request
A messages.Request object.
timeout

Number of seconds to wait for a free connection before timing out. A timeout raises RequestManagerBusy.

None means wait forever, 0 means don’t block.

The default implementation adds a User-Agent header from httpUserAgent if none has been specified already. You can override this method to add other headers appropriate for a specific context but you must pass this call on to this implementation for proper processing.
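An override of queue_request() along these lines could add default headers before passing the call on. This is a sketch under stated assumptions: with_default_headers is a hypothetical factory, get_header() is assumed to return None when the header is absent, and header names and values are assumed to be in whatever string form set_header() accepts.

```python
def with_default_headers(client_class, headers):
    """Return a subclass of client_class that adds default headers.

    client_class would normally be pyslet.http.client.Client; headers
    is a dictionary mapping header names to values.
    """
    class DefaultHeadersClient(client_class):

        def queue_request(self, request, timeout=None):
            for name, value in headers.items():
                # Assumption: get_header returns None if not yet set.
                if request.get_header(name) is None:
                    request.set_header(name, value)
            # Pass the call on for proper processing, as required.
            return super(DefaultHeadersClient, self).queue_request(
                request, timeout=timeout)

    return DefaultHeadersClient
```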

active_count()

Returns the total number of active connections.

thread_active_count()

Returns the total number of active connections associated with the current thread.

thread_task(timeout=None)

Processes all connections bound to the current thread then blocks for at most timeout (0 means don’t block) while waiting to send/receive data from any active sockets.

Each active connection receives one call to Connection.connection_task(). There are some situations where this method may still block even with timeout=0. For example, DNS name resolution and SSL handshaking. These may be improved in future.

Returns True if at least one connection is active, otherwise returns False.

thread_loop(timeout=60)

Repeatedly calls thread_task() until it returns False.
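If you need to do other work between polls, the same loop can be written by hand around thread_task(). The sketch below is illustrative; run_with_progress and on_tick are hypothetical names.

```python
def run_with_progress(client, on_tick, poll_interval=1.0):
    """Drive the current thread's connections, calling on_tick between polls.

    Follows the documented contract of thread_task(): keep calling it
    until it returns False.  Returns the number of times on_tick ran.
    """
    ticks = 0
    while client.thread_task(poll_interval):
        ticks += 1
        on_tick()  # e.g. update a progress display or check for cancel
    return ticks
```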

process_request(request, timeout=60)

Process a messages.Message object.

The request is queued and then thread_loop() is called to exhaust all HTTP activity initiated by the current thread.

idle_cleanup(max_inactive=15)

Cleans up any idle connections that have been inactive for more than max_inactive seconds.

active_cleanup(max_inactive=90)

Clean up active connections that have been inactive for more than max_inactive seconds.

This method can be called from any thread and can be used to remove connections that have been abandoned by their owning thread. This can happen if the owning thread stops calling thread_task() leaving some connections active.

Inactive connections are killed using Connection.kill() and then removed from the active list. Should the owning thread wake up and attempt to finish processing the requests a socket error or messages.HTTPException will be reported.
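Since active_cleanup() can be called from any thread, one option is a background housekeeping thread. The following is a sketch, not a Pyslet facility: start_housekeeping and its parameters are hypothetical names.

```python
import threading


def start_housekeeping(client, interval=30.0, max_inactive=90):
    """Call client.active_cleanup() every interval seconds.

    Returns (thread, stop_event); set the event to stop the thread.
    """
    stop_event = threading.Event()

    def loop():
        # Event.wait returns False on timeout and True once set.
        while not stop_event.wait(interval):
            client.active_cleanup(max_inactive)

    thread = threading.Thread(target=loop)
    thread.daemon = True  # do not keep the process alive on exit
    thread.start()
    return thread, stop_event
```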

close()

Closes all connections and sets the manager to a state where new connections cannot be created.

Active connections are killed, idle connections are closed.

add_credentials(credentials)

Adds a pyslet.http.auth.Credentials instance to this manager.

Credentials are used in response to challenges received in HTTP 401 responses.

remove_credentials(credentials)

Removes credentials from this manager.

credentials
A pyslet.http.auth.Credentials instance previously added with add_credentials().

If the credentials can’t be found then they are silently ignored as it is possible that two threads may independently call the method with the same credentials.
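When credentials should only apply for a limited time, the two methods pair naturally in a context manager. This is a minimal sketch; temporary_credentials is a hypothetical helper, and credentials is assumed to be a pyslet.http.auth.Credentials instance created elsewhere.

```python
import contextlib


@contextlib.contextmanager
def temporary_credentials(client, credentials):
    """Add credentials to client for the duration of a with block."""
    client.add_credentials(credentials)
    try:
        yield client
    finally:
        # Removal is safe even if another thread removed them first.
        client.remove_credentials(credentials)
```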

dnslookup(host, port)

Given a host name (string) and a port number, performs a DNS lookup using the native socket.getaddrinfo function. The resulting value is added to an internal DNS cache so that subsequent calls for the same host name and port do not use the network unnecessarily.

If you want to flush the cache you must do so manually using flush_dns().

flush_dns()

Flushes the DNS cache.
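The caching behaviour described for dnslookup() and flush_dns() can be illustrated with a stand-alone class. This is not Pyslet's implementation, only a sketch of the technique; DNSCache and the injectable resolver argument are hypothetical.

```python
import socket


class DNSCache(object):
    """Memoise address lookups keyed on (host, port)."""

    def __init__(self, resolver=socket.getaddrinfo):
        self._resolver = resolver
        self._cache = {}

    def lookup(self, host, port):
        key = (host, port)
        if key not in self._cache:
            # Only the first lookup for each key calls the resolver.
            self._cache[key] = self._resolver(host, port)
        return self._cache[key]

    def flush(self):
        """Forget all cached results; later lookups resolve again."""
        self._cache.clear()
```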

find_credentials(challenge)

Searches for credentials that match challenge

find_credentials_by_url(url)

Searches for credentials that match url

class pyslet.http.client.ClientRequest(url, method='GET', res_body=None, protocol=HTTPVersion(1, 1), auto_redirect=True, max_retries=3, **kwargs)

Bases: pyslet.http.messages.Request

Represents an HTTP request.

To make an HTTP request, create an instance of this class and then pass it to a Client instance using either Client.queue_request() or Client.process_request().

url
An absolute URI using either http or https schemes. A pyslet.rfc2396.URI instance or an object that can be passed to its constructor.

And the following keyword arguments:

method
A string. The HTTP method to use, defaults to “GET”
entity_body
A string or stream-like object containing the request body. Defaults to None meaning no message body. For stream-like objects the tell and seek methods must be supported to enable resending the request if required.
res_body
A stream-like object to write data to. Defaults to None, in which case the response body is returned as a string in the res_body attribute.
protocol
A params.HTTPVersion object, defaults to HTTPVersion(1,1)
auto_redirect
Whether or not the request will follow redirects, defaults to True.
max_retries
The maximum number of times to attempt to resend the request following an error on the connection or an unexpected hang-up. Defaults to 3. You should not use a value lower than 1 because it is always possible that the server has gracefully closed the socket and we don’t notice until we’ve sent the request and get 0 bytes back on recv. Although ‘normal’, this scenario counts as a retry.
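The keyword arguments above might be combined as in the following sketch. make_post and make_download are hypothetical helper names, and http is assumed to be the pyslet.http.client module; the keywords themselves are the ones just listed.

```python
def make_post(http, url, payload):
    """Build a POST request that sends payload as the entity body."""
    return http.ClientRequest(url, method="POST", entity_body=payload)


def make_download(http, url, sink):
    """Build a GET request that streams the response body into sink.

    sink is any stream-like object with a write method, such as a
    file opened in binary mode.  Redirects are not followed here.
    """
    return http.ClientRequest(url, res_body=sink, auto_redirect=False)
```

Either request can then be passed to Client.process_request() or Client.queue_request().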
manager = None

the Client object that is managing us

connection = None

the Connection object that is currently sending us

status = None

the status code received, 0 indicates a failed or unsent request

error = None

If status == 0, the error raised during processing

scheme = None

the scheme of the request (http or https)

hostname = None

the hostname of the origin server

port = None

the port on the origin server

url = None

the full URL of the requested resource

res_body = None

the response body received (only used if not streaming)

auto_redirect = None

whether or not auto redirection is in force for 3xx responses

max_retries = None

the maximum number of retries we’ll attempt

nretries = None

the number of retries we’ve had

response = None

the associated ClientResponse
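After processing, the status and error attributes can be used to tell a transport failure from an HTTP response. The helper below is a sketch; check_outcome and RequestFailed are hypothetical names, and a status of None is treated the same as 0 on the assumption that it means the request was never sent.

```python
class RequestFailed(Exception):
    """Raised by check_outcome when no HTTP response was received."""


def check_outcome(request):
    """Return the HTTP status code of request or raise RequestFailed.

    A status of 0 (or None, assumed to mean unprocessed) indicates a
    failed or unsent request; request.error then describes the cause.
    """
    if not request.status:
        raise RequestFailed("request failed or unsent: %r"
                            % (request.error,))
    return request.status
```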

set_url(url)

Sets the URL for this request

This method sets the Host header and the following local attributes: scheme, hostname, port and request_uri.

can_retry()

Called after each connection-related failure

For idempotent methods we lose a life and check that it’s not game over. For non-idempotent methods (e.g., POST) we always return False.
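The rule can be illustrated with a stand-alone class. This is not Pyslet's code: RetryBudget is a hypothetical name, the set of idempotent methods follows RFC 2616, and the exact point at which the real method reports game over is an assumption.

```python
# Methods that RFC 2616 section 9.1.2 treats as idempotent.
IDEMPOTENT_METHODS = frozenset(
    ["GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE"])


class RetryBudget(object):
    """Track how many retries remain for a request method."""

    def __init__(self, method, max_retries=3):
        self.method = method.upper()
        self.max_retries = max_retries
        self.nretries = 0

    def can_retry(self):
        if self.method not in IDEMPOTENT_METHODS:
            return False  # e.g. POST is never resent automatically
        self.nretries += 1  # lose a life
        # Assumed boundary: allow exactly max_retries retries.
        return self.nretries <= self.max_retries
```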

set_client(client)

Called when we are queued for processing.

client
a Client instance

set_connection(connection)

Called when we are assigned to a Connection

disconnect()

Called when the connection has finished sending us

This may be before or after the response is received and handled!

finished()

Called when we have a final response and have disconnected from the connection. There is no guarantee that the server got all of our data: it might even have returned a 2xx series code and then hung up before reading the data. Maybe it already had what it needed, maybe it thinks a 2xx response is more likely to make us go away. Whatever. The point is that you can’t be sure that all the data was transmitted just because you got here and the server says everything is OK.

class pyslet.http.client.ClientResponse(request, **kwargs)

Bases: pyslet.http.messages.Response

handle_headers()

Hook for response header processing.

This method is called when a set of response headers has been received from the server, before the associated data is received! After this call, recv will be called zero or more times until handle_message or handle_disconnect is called indicating the end of the response.

Override this method, for example, if you want to reject or invoke special processing for certain responses (e.g., based on size) before the data itself is received. To abort the response, close the connection using Connection.request_disconnect().

Override the finished() method instead to clean up and process the complete response normally.
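A size check of the kind an overridden handle_headers() might perform is sketched here. exceeds_limit is a hypothetical helper, not a Pyslet method. It assumes that get_header() returns None for a missing header and otherwise a string (or byte string) that int() can parse.

```python
def exceeds_limit(response, max_bytes):
    """Return True if the declared Content-Length is above max_bytes.

    Responses with no Content-Length, or an unparseable one, are not
    rejected because their size is unknown at this point.
    """
    value = response.get_header("Content-Length")
    if value is None:
        return False
    try:
        return int(value) > max_bytes
    except ValueError:
        return False
```

An override could call this helper and, when it returns True, abort the response as described above.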

handle_message()

Hook for normal completion of response

handle_disconnect(err)

Hook for abnormal completion of the response

Called when the server disconnects before we’ve completed reading the response. Note that if we are reading forever this may be expected behaviour and err may be None.

We pass this information on to the request.

2.7.1.4. Exceptions

class pyslet.http.client.RequestManagerBusy

Bases: pyslet.http.messages.HTTPException

The HTTP client is busy

Raised when attempting to queue a request and no connections become available within the specified timeout.
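A non-blocking attempt to queue a request could be written as follows. This is a sketch: try_queue is a hypothetical helper and http is assumed to be the pyslet.http.client module. The timeout of 0 follows the queue_request() description, where 0 means don't block.

```python
def try_queue(http, client, request):
    """Queue request without blocking; return True if it was accepted.

    Returns False when the client has no free connection, leaving the
    caller free to try again later.
    """
    try:
        client.queue_request(request, timeout=0)
    except http.RequestManagerBusy:
        return False
    return True
```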