2.2. Python 2 Compatibility

The goal of Pyslet is to work using the same code in both Python 3 and Python 2. Pyslet was originally developed in very early versions of Python 2; it then became briefly dependent on Python 2.7 before settling down to target Python 2.6 and Python 2.7.

One approach to getting the code working with Python 3 would be to implement a compatibility module like six which helps code targeted at Python 2 to run more easily in Python 3. Unfortunately, the changes required are still extensive and so more significant transformation is required.

The purpose of this module is to group together the compatibility issues that specifically affect Pyslet. It provides definitions that make the intent of the Pyslet code clearer.

pyslet.py2.py2 = True

Unfortunately, sometimes you just need to know if you are running under Python 2. This flag provides a common way for version-specific code to check. (There are multiple ways of checking; this flag just makes it easier to find places in Pyslet where we care.)

pyslet.py2.suffix

In some cases you may want to use a suffix to differentiate something that relates specifically to Python 3 versus Python 2. This string takes the value ‘3’ when Python 3 is in use and is an empty string otherwise.

One example where Pyslet uses this is in the stem of a pickled file name, as such objects tend to be version-specific.
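As a rough illustration of how the flag and suffix might be used together, the sketch below defines local stand-ins based on sys.version_info (an assumption; Pyslet's own definitions may differ) and builds a version-specific pickle file name. The function cache_file_name and the .pck extension are hypothetical.

```python
import sys

# Local stand-ins for pyslet.py2.py2 and pyslet.py2.suffix (assumed
# definitions, written here so the example runs without Pyslet)
py2 = sys.version_info[0] < 3
suffix = '' if py2 else '3'


def cache_file_name(stem):
    """Builds a pickle file name that differs between Python 2 and 3."""
    return "%s%s.pck" % (stem, suffix)


print(cache_file_name("unicode_cache"))
```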

2.2.1. Text, Characters, Strings and Bytes

This is the main area where Pyslet has had to change. In most cases, Pyslet explicitly wants either Text or Binary data so the Python 3 handling of these concepts makes a lot of sense.

pyslet.py2.u8(arg)

A wrapper for string literals, obviating the need to use the ‘u’ prefix, which is not allowed in Python 3 prior to 3.3. The result is a unicode string in Python 2 and a str object in Python 3. The argument should be a binary string in UTF-8 format; it is not a simple replacement for ‘u’. There are other approaches to this problem, such as the u function defined by compatibility libraries like six. Use whichever strategy best suits your application.

u8 is forgiving if you accidentally pass a unicode string provided that string contains only ASCII characters. Recommended usage:

my_string = u8(b'hello')
my_string = u8('hello') # works for ASCII text
my_string = u8(u'hello') # wrong, but will work for ASCII text
my_string = u8(b'\xe8\x8b\xb1\xe5\x9b\xbd')
my_string = u8('\xe8\x8b\xb1\xe5\x9b\xbd') # raises ValueError
my_string = u8(u'\u82f1\u56fd') # raises ValueError
my_string = u8('\u82f1\u56fd') # raises ValueError in Python 3 only

The latter examples above resolve to the following two characters: “英国”.
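The behaviour listed above can be approximated in a few lines. The following is a Python 3 only sketch, assuming the rules are exactly as described; the name u8_sketch is hypothetical and this is not Pyslet's actual implementation.

```python
def u8_sketch(arg):
    """Python 3 only approximation of the u8 behaviour described above."""
    if isinstance(arg, bytes):
        # the recommended form: a UTF-8 encoded binary literal
        return arg.decode('utf-8')
    if isinstance(arg, str):
        # text is tolerated only when it is pure ASCII
        try:
            arg.encode('ascii')
        except UnicodeEncodeError:
            raise ValueError("use a binary literal for non-ASCII text")
        return arg
    raise TypeError("expected bytes or str")


print(u8_sketch(b'hello') == 'hello')
print(u8_sketch(b'\xe8\x8b\xb1\xe5\x9b\xbd') == '\u82f1\u56fd')
```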

In cases where you only want to encode characters from the ISO-8859-1 aka Latin-1 character set you may prefer to use the ul function instead.

pyslet.py2.ul(arg)

An alternative wrapper for string literals, similar to u8() but using the latin-1 codec. ul is a little more forgiving than u8:

my_string = ul(b'Caf\xe9')
my_string = ul('Caf\xe9') # works for Latin text
my_string = ul(u'Caf\xe9') # wrong, but will work for Latin text

Notice that unicode escapes for characters outside the first 256 are not allowed in either wrapper. If you want to use a wrapper that interprets strings like ‘\u82f1\u56fd’ in both major Python versions you should use a module like six, which will pass strings to the unicode_escape codec. The approach taken by Pyslet is deliberately different, but has the advantage of dealing with some awkward cases:

ul(b'\\user')

The u wrapper in six will throw an error for strings like this:

six.u('\\user')
Traceback (most recent call last):
    ...
UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in
    position 0-4: end of string in escape sequence
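To make the contrast concrete, here is a Python 3 only sketch. The name ul_sketch is hypothetical and only approximates the behaviour described for ul. It decodes with latin-1, so a backslash followed by ‘u’ is just two ordinary characters, whereas the unicode_escape codec rejects the same bytes.

```python
def ul_sketch(arg):
    """Python 3 only approximation of the ul behaviour described above."""
    if isinstance(arg, bytes):
        # latin-1 maps each byte to the code point with the same value
        return arg.decode('latin-1')
    if isinstance(arg, str):
        try:
            arg.encode('latin-1')
        except UnicodeEncodeError:
            raise ValueError("character outside Latin-1 in literal")
        return arg
    raise TypeError("expected bytes or str")


print(ul_sketch(b'\\user'))        # five characters, backslash included

try:
    b'\\user'.decode('unicode_escape')
except UnicodeDecodeError as err:
    print("unicode_escape failed:", err)
```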

Finally, given the increased overhead of calling a function when interpreting literals, consider moving literal definitions to module level where they appear in performance-critical functions:

CAFE = ul(b"Caf\xe9")

def at_cafe_1(location):
    return location == u"Caf\xe9"

def at_cafe_2(location):
    return location == CAFE

def at_cafe_3(location):
    return location == ul(b"Caf\xe9")

In a quick test with Python 2, using the execution time of version 1 as a benchmark, version 2 was approximately 1.1 times slower but version 3 was 19 times slower (the results from six.u are about 16 times slower). The same tests with Python 3 yield about 9 and 3 times slower for ul and six.u respectively.

Compatibility comes with a cost. If you only need to support Python 3.3 and higher (while retaining compatibility with Python 2) then you should use the first form and ignore these literal functions in performance-critical code. If you want more compatibility then define all string literals ahead of time, e.g., at module level.
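If you want to repeat this kind of measurement yourself, a sketch along the following lines may help. It uses the standard timeit module with a simplified Python 3 stand-in for ul (an assumption, not Pyslet's function); the helper time_call is hypothetical and the timings will vary with machine and interpreter.

```python
import timeit


def ul(arg):
    """Simplified stand-in for pyslet.py2.ul (Python 3 behaviour only)."""
    return arg.decode('latin-1') if isinstance(arg, bytes) else arg


CAFE = ul(b"Caf\xe9")


def at_cafe_2(location):
    # literal wrapped once, at module level
    return location == CAFE


def at_cafe_3(location):
    # literal wrapped again on every call
    return location == ul(b"Caf\xe9")


def time_call(func, number=100000):
    """Returns the total seconds taken by `number` calls of func."""
    return timeit.timeit(lambda: func("Caf\xe9"), number=number)


t2 = time_call(at_cafe_2)
t3 = time_call(at_cafe_3)
print("module-level constant: %.4fs, inline wrapper: %.4fs" % (t2, t3))
```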

2.2.1.1. Character Constants

These constants are provided to define common character strings (forcing the unicode type in Python 2).

pyslet.py2.uempty

The empty string.

pyslet.py2.uspace

Single space character, character(0x20).

2.2.1.2. Text Functions

pyslet.py2.is_string(arg)

Returns True if arg is either a character or binary string.

pyslet.py2.is_text(arg)

Returns True if arg is text and False otherwise. In Python 3 this is simply a test of whether arg is of type str but in Python 2 both str and unicode types return True. An example usage of this function is when checking arguments that may be either text or some other type of object.

pyslet.py2.force_text(arg)

Returns arg as text or raises TypeError. In Python 3 this simply checks that arg is of type str; in Python 2 this allows either string type but always returns a unicode string. No codec is used, so this has the side effect of ensuring that only ASCII-compatible str instances will be acceptable in Python 2.

pyslet.py2.to_text(arg)

Returns arg as text, converting it if necessary. In Python 2 this always returns a unicode string. In Python 3, this function is almost identical to the built-in str except that it takes binary data that can be interpreted as ascii and converts it to text. In other words:

to_text(b"hello") == "hello"

This holds in both Python 2 and Python 3, whereas the following is only true in Python 2:

str(b"hello") == "hello"

arg need not be a string. This function will cause an arbitrary object’s __str__ (or __unicode__ in Python 2) method to be evaluated.
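The difference from the built-in str can be seen in a short Python 3 sketch. The function to_text_sketch is a hypothetical approximation of the behaviour described above, not Pyslet's implementation.

```python
def to_text_sketch(arg):
    """Python 3 only approximation of the to_text behaviour described."""
    if isinstance(arg, str):
        return arg
    if isinstance(arg, bytes):
        # binary data is accepted only if it is valid ASCII
        return arg.decode('ascii')
    # any other object: fall back on its __str__ method
    return str(arg)


print(to_text_sketch(b"hello"))    # the text hello
print(str(b"hello"))               # the repr of the bytes object instead
print(to_text_sketch(42))
```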

pyslet.py2.is_ascii(arg)

Returns True if arg is of type str in both Python 2 and Python 3. The only difference is that in Python 3 unicode errors will be raised if arg contains non-ascii characters. If arg is not of str type then False is returned.

This function is used to check a value in situations where unicode is not expected in Python 2.

pyslet.py2.force_ascii(arg)

Returns arg as ascii text, converting it if necessary. The result is an object of type str in both Python 2 and Python 3. The difference is that in Python 2 unicode strings are accepted and forced to type str by encoding with the ‘ascii’ codec whereas in Python 3 bytes instances are accepted and forced to type str by decoding with the ‘ascii’ codec.

This function is not needed very often but in some cases Python interfaces required type str in Python 2 when the intention was to accept ASCII text rather than arbitrary bytes. When migrated to Python 3 these interfaces can be problematic as inputs may be generated as ASCII bytes rather than strings in Python 3, e.g., the output of base64 encoding.
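The base64 case can be sketched as follows for Python 3. The function force_ascii_sketch is a hypothetical approximation; in particular, the check applied to str input is an assumption rather than documented behaviour.

```python
import base64


def force_ascii_sketch(arg):
    """Python 3 only approximation of force_ascii as described above."""
    if isinstance(arg, bytes):
        return arg.decode('ascii')
    if isinstance(arg, str):
        # assumption: non-ASCII text is rejected with UnicodeEncodeError
        arg.encode('ascii')
        return arg
    raise TypeError("expected bytes or str")


# b64encode returns bytes in Python 3, but a header value needs text
token = base64.b64encode(b"user:password")
header = "Basic " + force_ascii_sketch(token)
print(header)
```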

class pyslet.py2.UnicodeMixin

Bases: object

Mixin class to handle string formatting

For classes that need to define a __unicode__ method of their own this class is used to ensure that the correct behaviour exists in Python versions 2 and 3.

The mixin class implements __str__ based on your existing (required) __unicode__ or (optional) __bytes__ implementation. In Python 2, the output of __unicode__ is encoded using the default system encoding if no __bytes__ implementation is provided. This may well generate errors, but that seems more appropriate as it will catch cases where the str function has been used instead of to_text().
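The idea can be sketched as below. UnicodeSketch is a hypothetical stand-in written from the description above, not Pyslet's code, and Place is an invented example class.

```python
import sys


class UnicodeSketch(object):
    """Illustrative stand-in for UnicodeMixin, not Pyslet's code."""

    if sys.version_info[0] < 3:
        def __str__(self):
            # Python 2: str() must return a binary string
            if hasattr(self, '__bytes__'):
                return self.__bytes__()
            return self.__unicode__().encode(sys.getdefaultencoding())
    else:
        def __str__(self):
            # Python 3: str() returns text directly
            return self.__unicode__()


class Place(UnicodeSketch):
    """Hypothetical class with a text representation."""

    def __init__(self, name):
        self.name = name

    def __unicode__(self):
        return self.name


print(str(Place("Pyslet")))
```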

pyslet.py2.is_unicode(arg)

Returns True if arg is unicode text and False otherwise. In Python 3 this is simply a test of whether arg is of type str but in Python 2 arg must be a unicode string. This is used in contexts where we want to discriminate between bytes and text in all Python versions.

pyslet.py2.character(codepoint)

Given an integer codepoint returns a single unicode character. You can also pass a single byte value (defined as the type returned by indexing a binary string). Bear in mind that in Python 2 this is a single-character string, not an integer. See byte() for how to create byte values dynamically.

pyslet.py2.join_characters(iterable)

Convenience function for concatenating an iterable of characters (or character strings). In Python 3 this is just:

''.join

In Python 2 it ensures the result is a unicode string.

2.2.1.3. Bytes

pyslet.py2.to_bytes(arg)

Returns arg as bytes, converting it if necessary. In Python 2 this always returns a plain string and is in fact just an alias for the builtin str. In Python 3, this function is more complex. If arg is an object with a __bytes__ attribute then this is called, otherwise the object is converted to a string (using str) and then encoded using the ‘ascii’ codec.

The behaviour of to_bytes in Python 3 may appear similar to the built in bytes function but there is an important exception:

x = 2
str(x) == '2'               # in python 2 and 3
bytes(x) == b'2'            # in python 2
bytes(x) == b'\x00\x00'     # in python 3
to_bytes(x) == b'2'         # in python 2 and 3

pyslet.py2.force_bytes(arg)

Given either a binary string or a character string, returns a binary string of bytes. If arg is a character string then it is encoded with the ‘ascii’ codec.

pyslet.py2.byte(value)

Given either an integer value in the range 0..255, a single-character binary string or a single character with Unicode codepoint in the range 0..255, returns a single byte representing that value. This is one of the main differences between Python 2 and 3. In Python 2 bytes are characters and in Python 3 they’re integers.

pyslet.py2.byte_value(b)

Given a value such as would be returned by byte() or by indexing a binary string, returns the corresponding integer value. In Python 3 this is a no-op but in Python 2 it maps to the builtin function ord.
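A Python 3 only sketch of these two functions might look like this. The names byte_sketch and byte_value_sketch are hypothetical, and the exceptions raised for bad input are assumptions.

```python
def byte_sketch(value):
    """Python 3 only approximation of byte(): always returns an int."""
    if isinstance(value, int):
        code = value
    elif isinstance(value, bytes) and len(value) == 1:
        code = value[0]
    elif isinstance(value, str) and len(value) == 1:
        code = ord(value)
    else:
        raise TypeError("expected int or single-character string")
    if not 0 <= code <= 255:
        raise ValueError("byte value out of range")
    return code


def byte_value_sketch(b):
    """In Python 3 a byte is already an integer, so nothing to do."""
    return b


data = b'ABC'
print(byte_sketch(0x41) == data[0])
print(byte_value_sketch(data[1]))
```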

pyslet.py2.join_bytes(arg)

Given an arg that iterates to yield bytes, returns a bytes object containing those bytes. It is important not to confuse this operation with the more common joining of binary strings. No function is provided for that as the following construct works as expected in both Python 2 and Python 3:

b''.join(bstr_list)

The usage of join_bytes can best be illustrated by the following two interpreter sessions.

Python 2.7.10:

>>> from pyslet.py2 import join_bytes
>>> join_bytes(list(b'abc'))
'abc'
>>> b''.join(list(b'abc'))
'abc'

Python 3.5.1:

>>> from pyslet.py2 import join_bytes
>>> join_bytes(list(b'abc'))
b'abc'
>>> b''.join(list(b'abc'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sequence item 0: expected a bytes-like object, int found

pyslet.py2.byte_to_bstr(arg)

Given a single byte, returns a bytes object of length 1 containing that byte. This is a more efficient way of writing:

join_bytes([arg])

In Python 2 this is a no-operation but in Python 3 it is effectively the same as the above.
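In Python 3 both operations can be expressed with the bytes constructor. The sketch below uses hypothetical names and is an approximation, not Pyslet's code; it also shows join_bytes-style processing of a generator of transformed byte values.

```python
def join_bytes_sketch(arg):
    """Python 3: builds a bytes object from an iterable of integers."""
    return bytes(arg)


def byte_to_bstr_sketch(arg):
    """Python 3: wraps a single integer byte in a bytes object."""
    return bytes([arg])


data = b'abc'
# clearing bit 0x20 converts lower-case ASCII letters to upper case
upper = join_bytes_sketch(b & 0xDF for b in data)
print(upper)
print(byte_to_bstr_sketch(data[0]))
```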

2.2.2. Printing to stdout

pyslet.py2.output(txt)

Simple function for writing to stdout

Not as sophisticated as Python 3’s print function but designed to be more of a companion to the built in input.

2.2.3. Numeric Definitions

pyslet.py2.long2()

Missing from Python 3, equivalent to the builtin int.

2.2.4. Iterable Fixes

Python 3 made a number of changes to the way objects are iterated.

pyslet.py2.range3(*args)

Uses Python 3 range semantics, maps to xrange in Python 2.

pyslet.py2.dict_keys(d)

Returns an iterable object representing the keys in the dictionary d.

pyslet.py2.dict_values(d)

Returns an iterable object representing the values in the dictionary d.
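One plausible way to define helpers like these is sketched below. This is an assumption about the general technique rather than a copy of Pyslet's definitions, and the Python 2 branch is only evaluated under Python 2.

```python
import sys

if sys.version_info[0] < 3:
    range3 = xrange  # noqa: this name only exists in Python 2

    def dict_keys(d):
        return d.iterkeys()

    def dict_values(d):
        return d.itervalues()
else:
    range3 = range

    def dict_keys(d):
        return d.keys()

    def dict_values(d):
        return d.values()


squares = dict((i, i * i) for i in range3(4))
print(sorted(dict_keys(squares)))
print(sorted(dict_values(squares)))
```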

2.2.5. Comparisons

class pyslet.py2.SortableMixin

Bases: object

Mixin class for handling comparisons

Utility class for helping provide comparisons that are compatible with Python 2 and Python 3. Classes must define a method sortkey() which returns a sortable key value representing the instance.

Derived classes may optionally override the classmethod otherkey() to provide an ordering against other object types.

This mixin then adds implementations for all of the comparison methods: __eq__, __ne__, __lt__, __le__, __gt__, __ge__.

sortkey()

Returns a value to use as a key for sorting.

By default returns NotImplemented. This value causes the comparison functions to also return NotImplemented.

otherkey(other)

Returns a value to use as a key for sorting

The difference between this method and sortkey() is that this method takes an arbitrary object and either returns the key to use when comparing with this instance or NotImplemented if the sorting is not supported.

You don’t have to override this implementation. By default it returns other.sortkey() if other is an instance of the same class as self, otherwise it returns NotImplemented.
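The pattern can be sketched as follows. SortableSketch is a hypothetical stand-in written from the description above (with otherkey as a plain method for simplicity), and Version is an invented example class; neither is Pyslet's code.

```python
import operator


class SortableSketch(object):
    """Illustrative stand-in for SortableMixin, not Pyslet's code."""

    def sortkey(self):
        return NotImplemented

    def otherkey(self, other):
        if isinstance(other, self.__class__):
            return other.sortkey()
        return NotImplemented

    def _compare(self, other, op):
        mine = self.sortkey()
        theirs = self.otherkey(other)
        if mine is NotImplemented or theirs is NotImplemented:
            return NotImplemented
        return op(mine, theirs)

    def __eq__(self, other):
        return self._compare(other, operator.eq)

    def __ne__(self, other):
        return self._compare(other, operator.ne)

    def __lt__(self, other):
        return self._compare(other, operator.lt)

    def __le__(self, other):
        return self._compare(other, operator.le)

    def __gt__(self, other):
        return self._compare(other, operator.gt)

    def __ge__(self, other):
        return self._compare(other, operator.ge)


class Version(SortableSketch):
    """Hypothetical example class ordered by (major, minor)."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def sortkey(self):
        return (self.major, self.minor)


versions = sorted([Version(1, 2), Version(0, 9), Version(1, 0)])
print([v.sortkey() for v in versions])
```

Because the comparison methods return NotImplemented for unsupported types, Python falls back on its default handling: equality with an unrelated object is simply False, while ordering against one raises TypeError in Python 3.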

class pyslet.py2.CmpMixin

Bases: object

Mixin class for handling comparisons

For compatibility with Python 2’s __cmp__ method this class defines an implementation of __eq__, __lt__, __le__, __gt__, __ge__ that are redirected to __cmp__. These are the minimum methods required for Python’s rich comparisons.

In Python 2 it also provides an implementation of __ne__ that simply inverts the result of __eq__. (This is not required in Python 3.)
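A minimal sketch of this redirection, assuming __cmp__ returns a negative, zero or positive integer in the classic Python 2 style, is shown below. CmpSketch and Priority are hypothetical names and this is not Pyslet's code.

```python
class CmpSketch(object):
    """Illustrative stand-in for CmpMixin, not Pyslet's code."""

    def __eq__(self, other):
        return self.__cmp__(other) == 0

    def __ne__(self, other):
        # needed in Python 2; Python 3 would derive this from __eq__
        return not self.__eq__(other)

    def __lt__(self, other):
        return self.__cmp__(other) < 0

    def __le__(self, other):
        return self.__cmp__(other) <= 0

    def __gt__(self, other):
        return self.__cmp__(other) > 0

    def __ge__(self, other):
        return self.__cmp__(other) >= 0


class Priority(CmpSketch):
    """Hypothetical class that only knows how to do three-way compares."""

    def __init__(self, level):
        self.level = level

    def __cmp__(self, other):
        # the cmp builtin is gone in Python 3, so compute the sign by hand
        return (self.level > other.level) - (self.level < other.level)


print(max([Priority(1), Priority(3), Priority(2)]).level)
```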

2.2.6. Misc Fixes

Imports the builtins module enabling you to import it from py2 instead of having to guess between __builtin__ (Python 2) and builtins (Python 3).

pyslet.py2.urlopen(*args, **kwargs)

Imported from urllib.request in Python 3, from urllib in Python 2.

pyslet.py2.urlencode(*args, **kwargs)

Imported from urllib.parse in Python 3, from urllib in Python 2.

pyslet.py2.urlquote(*args, **kwargs)

Imported from urllib.parse.quote in Python 3, from urllib.quote in Python 2.

pyslet.py2.parse_qs(*args, **kwargs)

Imported from urllib.parse in Python 3, from urlparse in Python 2.
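The same names can be obtained without Pyslet using a conditional import, as in the sketch below. It illustrates the general technique with the module locations listed above and is not necessarily how pyslet.py2 is written.

```python
try:
    # Python 3 locations
    import builtins
    from urllib.request import urlopen
    from urllib.parse import urlencode, parse_qs
    from urllib.parse import quote as urlquote
except ImportError:
    # Python 2 locations
    import __builtin__ as builtins
    from urllib import urlopen, urlencode
    from urllib import quote as urlquote
    from urlparse import parse_qs


query = urlencode({'q': 'a b'})
print(query)
print(parse_qs(query))
print(urlquote('a b'))
```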