5.1. HTTP Client
5.1.1. Sending Requests
Here is a simple example of Pyslet’s HTTP support in action from the Python interpreter:
>>> import pyslet.http.client as http
>>> c = http.Client()
>>> r = http.ClientRequest('http://odata.pyslet.org')
>>> c.process_request(r)
>>> r.response.status
200
>>> print r.response.get_content_type()
text/html; charset=UTF-8
>>> print r.response.entity_body.getvalue()
<html>
<head><title>Pyslet Home</title></head>
<body>
<p><a href="http://qtimigration.googlecode.com/"><img src="logoc-large.png" width="1024"/></a></p>
</body>
</html>
>>> c.close()
In its simplest form there are three steps required to make an HTTP request. First, create a Client object; its purpose is to send requests and receive responses. Second, create a ClientRequest object describing the request you want to make; for a simple GET request you only need to specify the URL. Third, instruct the Client to process the request by calling process_request(). Once this method returns you can examine the request’s associated response. By default, the response’s entity body is written to a StringIO object.
The request and response objects both derive from a basic HTTP Message class, which has methods for getting and setting headers. You can use the basic get_header() and set_header() methods to get and set headers as strings or, where provided, special wrapper methods such as get_content_type() that get and set headers using special-purpose class objects representing parsed forms of the expected value. In the case of Content-Type headers the result is a MediaType() object. Providing these special object types is one of the main reasons why Pyslet’s HTTP support is different from other clients. Because these structures are exposed, you can reuse HTTP concepts in other contexts, which is particularly useful when other technical specifications make normative references to them.
Here is a glimpse of what you can do with a parsed media type, continuing the above example:
>>> type = r.response.get_content_type()
>>> type
MediaType('text','html',{'charset': ('charset', 'UTF-8')})
>>> type.type
'text'
>>> type.subtype
'html'
>>> type['charset']
'UTF-8'
>>> type['name']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyslet/http/params.py", line 382, in __getitem__
    repr(key))
KeyError: "MediaType instance has no parameter 'name'"
>>>
There are lots of other special get_ and set_ methods on the Message, Request and Response objects.
5.1.2. Pipelining
One of the use cases that Pyslet’s HTTP client is designed to cover is reusing an HTTP connection to make multiple requests to the same host. The example above takes care to close the Client object when we’re done because otherwise it would leave the connection to the server open ready for another request.
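As a rough sketch of this use case, the helper below queues several requests for the same host from a single thread, lets the client work through them and then closes the client. It relies only on calls described in the reference section (queue_request(), thread_loop(), close() and the request’s status attribute). The name fetch_statuses is hypothetical, and the http parameter is assumed to be the module imported with import pyslet.http.client as http; it is passed in as an argument so that the function can also be tried against a stand-in object.

```python
def fetch_statuses(http, urls):
    """Fetch several URLs from one thread, reusing the client's connection.

    http is assumed to be the module imported with
    ``import pyslet.http.client as http`` (an assumption of this sketch).
    """
    client = http.Client()
    try:
        requests = [http.ClientRequest(url) for url in urls]
        for request in requests:
            # queue each request without waiting for its response
            client.queue_request(request)
        # drive all HTTP activity started by this thread to completion
        client.thread_loop()
        # status is 0 for a request that failed or was never sent
        return [request.status for request in requests]
    finally:
        # close the client so that no idle connection is left open
        client.close()
```

With Pyslet installed this might be called as fetch_statuses(http, [url1, url2]), where both URLs name resources on the same host.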
5.1.3. Reference
The client module imports the grammar, params, messages and auth modules and these can therefore be accessed using a single import in your code. For example:
import pyslet.http.client as http
type = http.params.MediaType('application', 'xml')
For more details of the objects exposed by those modules see pyslet.http.grammar, pyslet.http.params, pyslet.http.messages and pyslet.http.auth.
- class pyslet.http.client.Client(max_connections=100, ca_certs=None, timeout=None, max_inactive=None)
Bases: pyslet.pep8.PEP8Compatibility, object
An HTTP client
Note
In Pyslet 0.4 and earlier the name HTTPRequestManager was used; this name is still available as an alias for Client.
The object manages the sending and receiving of HTTP/1.1 requests and responses respectively. There are a number of keyword arguments that can be used to set operational parameters:
- max_connections
- The maximum number of HTTP connections that may be open at any one time. The method queue_request() will block (or raise RequestManagerBusy) if an attempt to queue a request would cause this limit to be exceeded.
- timeout
- The maximum wait time on the connection. This is not the same as a limit on the total time to receive a request but a limit on the time the client will wait with no activity on the connection before assuming that the server is no longer responding. Defaults to None, no timeout.
- max_inactive (None)
- The maximum time to keep a connection inactive before terminating it. By default, HTTP connections are kept open when the protocol allows. These idle connections are kept in a pool and can be reused by any thread. This is useful for web-service type use cases (for which Pyslet has been optimised) but it is poor practice to keep these connections open indefinitely and, in any case, most servers will hang up after a fairly short period of time.
If not None, this setting causes a cleanup thread to be created that calls the idle_cleanup() method periodically, passing this setting value as its argument.
- ca_certs
- The file name of a certificate file to use when checking SSL connections. For more information see http://docs.python.org/2.7/library/ssl.html
In practice, there seem to be serious limitations on SSL connections and certificate validation in Python distributions linked to earlier versions of the OpenSSL library (e.g., Python 2.6 installed by default on OS X and Windows).
Warning
By default, ca_certs is optional and can be passed as None. In this mode certificates will not be checked and your connections are not secure from man-in-the-middle attacks. In production use you should always specify a certificate file if you expect to use the object to make calls to https URLs.
Although max_connections allows you to make multiple connections to the same host+port, the request manager imposes an additional restriction: each thread can make at most one connection to each host+port. If multiple requests are made to the same host+port from the same thread then they are queued and will be sent to the server over the same connection using HTTP/1.1 pipelining. The manager (mostly) takes care of the following restriction imposed by RFC 2616:
Clients SHOULD NOT pipeline requests using non-idempotent methods or non-idempotent sequences of methods
In other words, a POST (or CONNECT) request will cause the pipeline to stall until all the responses have been received. Users should beware of non-idempotent sequences as these are not automatically detected by the manager. For example, a GET,PUT sequence on the same resource is not idempotent. Users should wait for the GET request to finish fetching the resource before queuing a PUT request that overwrites it.
In summary, to take advantage of multiple simultaneous connections to the same host+port you must use multiple threads.
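The following sketch illustrates that summary under a few assumptions: a single Client is shared between threads (as the description of the idle connection pool suggests is intended), each thread calls process_request() for its own request, and the http parameter stands for the module imported with import pyslet.http.client as http. The function name fetch_in_threads is hypothetical.

```python
import threading


def fetch_in_threads(http, urls, max_connections=4):
    """Process each URL in its own thread using one shared Client."""
    client = http.Client(max_connections=max_connections)
    requests = [http.ClientRequest(url) for url in urls]
    # one thread per request: each thread may hold its own connection
    # to the same host+port, so requests need not share a pipeline
    threads = [threading.Thread(target=client.process_request, args=(r,))
               for r in requests]
    try:
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    finally:
        # kill or close whatever connections remain
        client.close()
    # status is 0 for a request that failed or was never sent
    return [r.status for r in requests]
```

Compared with queuing every request from one thread, this trades pipelining over a single connection for several connections used in parallel, up to the max_connections limit.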
- ConnectionClass
alias of Connection
- httpUserAgent = None
The default User-Agent string to use, defaults to a string derived from the installed version of Pyslet, e.g.:
pyslet 0.5.20140727 (http.client.Client)
- classmethod get_server_certificate_chain(url, method=None, options=None)
Returns the certificate chain for an https URL
- url
- A URI instance. This must use the https scheme or ValueError will be raised.
- method (SSL.TLSv1_METHOD)
- The SSL method to use, one of the constants from the pyOpenSSL module.
- options (None)
- The SSL options to use, as defined by the pyOpenSSL module. For example, SSL.OP_NO_SSLv2.
This method requires pyOpenSSL to be installed; if it isn’t, a RuntimeError is raised.
The address and port are extracted from the URL and interrogated for the certificate chain. No validation is performed. The result is a string containing the concatenated PEM format certificate files. This string is equivalent to the output of the following UNIX command:
echo | openssl s_client -showcerts -connect host:port 2>&1 | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p'
The purpose of this method is to provide something like the ssh-style trust whereby you can download the chain the first time you connect, store it to a file and then use that file for the ca_certs argument for SSL validation in future.
If the site certificate changes to one that doesn’t validate to a certificate in the same chain then the SSL connection will fail.
As this method does no validation there is no protection against a man-in-the-middle attack when you use this method. You should only use this method when you trust the machine and connection you are using or when you have some other way to independently verify that the certificate chain is good.
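The ssh-style workflow described above might look something like the sketch below. It assumes that uri is already a URI instance using the https scheme, that the http parameter is the module imported with import pyslet.http.client as http, and that the chain is fetched over a connection you trust. The function name pin_certificate_chain and the path argument are hypothetical.

```python
def pin_certificate_chain(http, uri, path):
    """Save a server's certificate chain and return a Client that checks it."""
    # no validation happens here: only do this over a connection you trust
    pem = http.Client.get_server_certificate_chain(uri)
    # the chain is documented as a string; open the file in a mode that
    # matches whichever string type is actually returned
    mode = 'wb' if isinstance(pem, bytes) else 'w'
    with open(path, mode) as f:
        f.write(pem)
    # later https requests made with this client are validated against
    # the stored chain
    return http.Client(ca_certs=path)
```

On later runs the stored file can be passed straight to the Client constructor as ca_certs without fetching the chain again.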
- queue_request(request, timeout=None)
Starts processing an HTTP request
- request
- A messages.Request object.
- timeout
- Number of seconds to wait for a free connection before timing out. A timeout raises RequestManagerBusy. None means wait forever, 0 means don’t block.
The default implementation adds a User-Agent header from httpUserAgent if none has been specified already. You can override this method to add other headers appropriate for a specific context but you must pass this call on to this implementation for proper processing.
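As a small illustration of the timeout argument, the sketch below queues a request without blocking and reports whether a connection was available. The helper name try_queue is hypothetical and http is assumed to be the module imported with import pyslet.http.client as http.

```python
def try_queue(http, client, request):
    """Queue request without blocking; return False if the client is busy."""
    try:
        # timeout=0 means don't block while waiting for a free connection
        client.queue_request(request, timeout=0)
    except http.RequestManagerBusy:
        # every connection is in use; the caller can try again later
        return False
    return True
```

A caller that gets False back could, for example, let the client make some progress on its existing requests before trying again.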
- active_count()
Returns the total number of active connections.
- thread_active_count()
Returns the total number of active connections associated with the current thread.
- thread_task(timeout=None)
Processes all connections bound to the current thread then blocks for at most timeout (0 means don’t block) while waiting to send/receive data from any active sockets.
Each active connection receives one call to Connection.connection_task(). There are some situations where this method may still block even with timeout=0, for example, during DNS name resolution and SSL handshaking. These may be improved in future.
Returns True if at least one connection is active, otherwise returns False.
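Because thread_task() reports whether any connection is still active, it can be called in a loop that does other work between calls. The sketch below is one way to write such a loop; the helper name run_until_idle and the on_tick callback are hypothetical, and client is assumed to be a Client with requests already queued by the current thread.

```python
def run_until_idle(client, on_tick, timeout=1):
    """Drive the current thread's connections, calling on_tick between tasks."""
    ticks = 0
    # thread_task returns True while at least one connection is active
    while client.thread_task(timeout):
        # report how many of this thread's connections are still active
        on_tick(client.thread_active_count())
        ticks += 1
    return ticks
```

The short timeout bounds how long each call waits for socket activity, so the callback runs at least roughly once per timeout period while requests are in progress.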
- thread_loop(timeout=60)
Repeatedly calls thread_task() until it returns False.
- process_request(request, timeout=60)
Process a messages.Message object.
The request is queued and then thread_loop() is called to exhaust all HTTP activity initiated by the current thread.
- idle_cleanup(max_inactive=15)
Cleans up any idle connections that have been inactive for more than max_inactive seconds.
- active_cleanup(max_inactive=90)
Clean up active connections that have been inactive for more than max_inactive seconds.
This method can be called from any thread and can be used to remove connections that have been abandoned by their owning thread. This can happen if the owning thread stops calling thread_task() leaving some connections active.
Inactive connections are killed using Connection.kill() and then removed from the active list. Should the owning thread wake up and attempt to finish processing the requests a socket error or messages.HTTPException will be reported.
- close()
Closes all connections and sets the manager to a state where new connections cannot be created.
Active connections are killed, idle connections are closed.
- add_credentials(credentials)
Adds a pyslet.http.auth.Credentials instance to this manager.
Credentials are used in response to challenges received in HTTP 401 responses.
- remove_credentials(credentials)
Removes credentials from this manager.
- credentials
- A pyslet.http.auth.Credentials instance previously added with add_credentials().
If the credentials can’t be found then they are silently ignored as it is possible that two threads may independently call the method with the same credentials.
- dnslookup(host, port)
Given a host name (string) and a port number performs a DNS lookup using the native socket.getaddrinfo function. The resulting value is added to an internal dns cache so that subsequent calls for the same host name and port do not use the network unnecessarily.
If you want to flush the cache you must do so manually using flush_dns().
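The caching behaviour described here can be pictured with the toy class below. It is an independent sketch built on the standard library, not Pyslet’s implementation: the class name DNSCache, its method names and the injectable resolver argument are all assumptions made for illustration.

```python
import socket
import threading


class DNSCache(object):
    """A toy cache in front of socket.getaddrinfo (not Pyslet's own code)."""

    def __init__(self, resolver=socket.getaddrinfo):
        self._resolver = resolver
        self._cache = {}
        self._lock = threading.Lock()

    def lookup(self, host, port):
        key = (host, port)
        with self._lock:
            if key in self._cache:
                # cache hit: no network traffic needed
                return self._cache[key]
        # resolve outside the lock so a slow lookup does not block
        # other threads that only need cached entries
        result = self._resolver(host, port)
        with self._lock:
            self._cache[key] = result
        return result

    def flush(self):
        """Forget all cached results; the next lookup uses the network."""
        with self._lock:
            self._cache.clear()
```

As with the method documented above, entries in this sketch never expire on their own, so an address change is only noticed after an explicit flush.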
- flush_dns()
Flushes the DNS cache.
- find_credentials(challenge)
Searches for credentials that match challenge
- find_credentials_by_url(url)
Searches for credentials that match url
- class pyslet.http.client.ClientRequest(url, method='GET', res_body=None, protocol=<pyslet.http.params.HTTPVersion object>, auto_redirect=True, max_retries=3, min_retry_time=5, **kwargs)
Bases: pyslet.http.messages.Request
Represents an HTTP request.
To make an HTTP request, create an instance of this class and then pass it to a Client instance using either Client.queue_request() or Client.process_request().
- url
- An absolute URI using either http or https schemes. A pyslet.rfc2396.URI instance or an object that can be passed to its constructor.
And the following keyword arguments:
- method
- A string. The HTTP method to use, defaults to “GET”
- entity_body
- A string or stream-like object containing the request body. Defaults to None meaning no message body. For stream-like objects the tell and seek methods must be supported to enable resending the request if required.
- res_body
- A stream-like object to write data to. Defaults to None, in which case the response body is returned as a string in the res_body attribute.
- protocol
- A params.HTTPVersion object, defaults to HTTPVersion(1,1)
- auto_redirect
- Whether or not the request will follow redirects, defaults to True.
- max_retries
- The maximum number of times to attempt to resend the request following an error on the connection or an unexpected hang-up. Defaults to 3. You should not use a value lower than 1 because, when pipelining, it is always possible that the server has gracefully closed the socket and we won’t notice until we’ve sent the request and get 0 bytes back on recv. Although ‘normal’ this scenario counts as a retry.
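To show how these keyword arguments fit together, the sketch below builds a PUT request with a body and collects the response in a stream supplied by the caller. It assumes that an io.BytesIO object is acceptable as the stream-like res_body and that body is a string of bytes; the function name put_resource is hypothetical, and http again stands for the module imported with import pyslet.http.client as http.

```python
import io


def put_resource(http, client, url, body):
    """PUT body to url and return (status, response body)."""
    # collect the response in our own stream instead of the default
    sink = io.BytesIO()
    request = http.ClientRequest(
        url, method='PUT', entity_body=body, res_body=sink)
    # returns once all activity started by this thread has finished
    client.process_request(request)
    # status is 0 if the request failed or was never sent
    return request.status, sink.getvalue()
```

Passing a file object opened for writing as res_body would follow the same pattern for responses too large to hold in memory.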
- connection = None
the Connection object that is currently sending us
- status = None
the status code received, 0 indicates a failed or unsent request
- error = None
If status == 0, the error raised during processing
- scheme = None
the scheme of the request (http or https)
- hostname = None
the hostname of the origin server
- port = None
the port on the origin server
- url = None
the full URL of the requested resource
- res_body = None
the response body received (only used if not streaming)
- auto_redirect = None
whether or not auto redirection is in force for 3xx responses
- max_retries = None
the maximum number of retries we’ll attempt
- response = None
the associated ClientResponse
- send_pipe = None
the send pipe to use on upgraded connections
- recv_pipe = None
the recv pipe to use on upgraded connections
- set_url(url)
Sets the URL for this request
This method sets the Host header and the following local attributes: scheme, hostname, port and request_uri.
- can_retry()
Returns True if we can reconnect and retry this request
- connect(connection, send_pos)
Called when we are assigned to an HTTPConnection
- connection
- A Connection object
- send_pos
- The position of the sent bytes pointer after which this request has been (or at least has started to be) sent.
- disconnect(send_pos)
Called when the connection has finished sending us
This may be before or after the response is received and handled!
- send_pos
- The number of bytes sent on this connection before the disconnect. This value is compared with the value passed to connect() to determine if the request was actually sent to the server or abandoned without a byte being sent.
For idempotent methods we lose a life every time. For non-idempotent methods (e.g., POST) we do the same except that if we’ve been (at least partially) sent then we lose all lives to prevent “indeterminate results”.
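The “lives” rule can be modelled in a few lines. The class below is a toy model of the behaviour as described, not Pyslet’s code: the name RetryBudget, the use of max_retries as the starting number of lives and the list of idempotent methods (taken from RFC 2616) are assumptions made for illustration.

```python
# methods that RFC 2616 treats as idempotent
IDEMPOTENT = frozenset(["GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE"])


class RetryBudget(object):
    """Toy model of the retry 'lives' rule (not Pyslet's own code)."""

    def __init__(self, method, max_retries=3):
        self.method = method.upper()
        self.lives = max_retries
        self._start = None

    def connect(self, send_pos):
        # remember where the connection's sent-bytes pointer stood
        self._start = send_pos

    def disconnect(self, send_pos):
        # if the pointer moved, at least one byte of the request was sent
        sent = send_pos > self._start
        if sent and self.method not in IDEMPOTENT:
            # a partly sent non-idempotent request must never be resent
            self.lives = 0
        else:
            self.lives -= 1

    def can_retry(self):
        return self.lives > 0
```

In this model a POST that is disconnected before any byte is sent costs one life like any other request, whereas a POST that has started to be sent can no longer be retried.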
- finished()
Called when we have a final response and have disconnected from the connection.
There is no guarantee that the server got all of our data. It might even have returned a 2xx series code and then hung up before reading the data: maybe it already had what it needed, maybe it thinks a 2xx response is more likely to make us go away. The point is that you can’t be sure that all the data was transmitted just because you got here and the server says everything is OK.
- class pyslet.http.client.ClientResponse(request, **kwargs)
Bases: pyslet.http.messages.Response
- handle_headers()
Hook for response header processing.
This method is called when a set of response headers has been received from the server, before the associated data is received! After this call, recv will be called zero or more times until handle_message or handle_disconnect is called indicating the end of the response.
Override this method, for example, if you want to reject or invoke special processing for certain responses (e.g., based on size) before the data itself is received. To abort the response, close the connection using Connection.request_disconnect().
Override the Finished() method instead to clean up and process the complete response normally.
- handle_message()
Hook for normal completion of response
- handle_disconnect(err)
Hook for abnormal completion of the response
Called when the server disconnects before we’ve completed reading the response. Note that if we are reading forever this may be expected behaviour and err may be None.
We pass this information on to the request.
5.1.4. Exceptions
- class pyslet.http.client.RequestManagerBusy
Bases: pyslet.http.messages.HTTPException
The HTTP client is busy
Raised when attempting to queue a request and no connections become available within the specified timeout.