6.2.5. XML: Schema Datatypes

This module implements some useful concepts drawn from http://www.w3.org/TR/xmlschema-2/

One of the main purposes of this module is to provide classes and functions for converting data between Python-native representations of the value-spaces defined by this specification and the lexical representations defined in the specification.

The result is typically a pair of x_from_str/x_to_str functions that are used to define custom attribute handling in classes that are derived from Element. For example:

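import pyslet.xml.xsdatatypes as xsi
from pyslet.xml.structures import Element

class MyElement(Element):
    XMLNAME = "MyElement"
    # (attribute name, decode function, encode function)
    XMLATTR_flag = ('flag', xsi.boolean_from_str, xsi.boolean_to_str)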

In this example, an element like this:

<MyElement flag="1">...</MyElement>

would cause the instance of MyElement representing this element to have its flag attribute set to the Python constant True instead of a string value. Also, when serializing the element instance, the flag attribute’s value would be converted to the canonical representation, which in this case is the string “true”. Finally, these functions raise ValueError when conversion fails, an error which the XML parser will escalate to an XML validation error (allowing the document to be rejected in strict parsing modes).

6.2.5.1. Namespace

The XML schema namespace is typically used with the prefix xsi.

pyslet.xml.xsdatatypes.XMLSCHEMA_NAMESPACE = 'http://www.w3.org/2001/XMLSchema-instance'

The namespace to use for XML schema elements.

6.2.5.2. Primitive Datatypes

XML schema’s boolean trivially maps to Python’s True/False

pyslet.xml.xsdatatypes.boolean_from_str(src)

Decodes a boolean value from src.

Returns the Python constants True or False. As a convenience, if src is None then None is returned.

pyslet.xml.xsdatatypes.boolean_to_str(src)

Encodes a boolean using the canonical lexical representation.

src
Anything that can be resolved to a boolean except None, which raises ValueError.
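The lexical space of xsd:boolean is the four strings "true", "false", "1" and "0", of which "true" and "false" are canonical. The following is a minimal sketch of the rules described above, written as hypothetical standalone functions (boolean_from_str_sketch and boolean_to_str_sketch are illustrative names, not pyslet's code). It assumes any whitespace collapsing has already been done by the caller:

```python
# Hypothetical sketch, not pyslet's implementation.

def boolean_from_str_sketch(src):
    """Decode an xsd:boolean lexical value into True or False."""
    if src is None:
        return None  # mirrors the documented convenience behaviour
    if src in ('true', '1'):
        return True
    if src in ('false', '0'):
        return False
    raise ValueError("not a valid xsd:boolean: %r" % src)


def boolean_to_str_sketch(value):
    """Encode a value using the canonical form 'true' or 'false'."""
    if value is None:
        raise ValueError("None cannot be encoded as xsd:boolean")
    return 'true' if value else 'false'
```

With the MyElement example above, decoding flag="1" gives True and re-encoding that value gives "true".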

The decimal, float and double types are represented by Python’s native float type, but the functions used to encode and decode them differ from native conversion in order to adhere more closely to the schema specification and to ensure that, by default, canonical lexical representations are used.

pyslet.xml.xsdatatypes.decimal_from_str(src)

Decodes a decimal from a string, returning a Python float value.

If src is not a valid lexical representation of a decimal value then ValueError is raised.
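The lexical space of xsd:decimal is narrower than what Python's float() accepts: it has no exponent, no INF or NaN, no underscores and no surrounding whitespace. A hypothetical sketch (decimal_from_str_sketch is an illustrative name) that validates against the schema's lexical pattern before converting:

```python
import re

# Hypothetical sketch, not pyslet's implementation.
# xsd:decimal lexical form: optional sign, then digits with an optional
# fraction; at least one digit must appear on one side of the point.
_DECIMAL_RE = re.compile(r'[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)')


def decimal_from_str_sketch(src):
    """Decode xsd:decimal text into a Python float."""
    if _DECIMAL_RE.fullmatch(src) is None:
        # float() alone would accept '1e3', 'NaN', '1_000' and ' 1.0'.
        raise ValueError("not a valid xsd:decimal: %r" % src)
    return float(src)
```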

pyslet.xml.xsdatatypes.decimal_to_str(value, digits=None, strip_zeros=True, **kws)

Encodes a decimal value into a string.

digits
You can control the maximum number of digits after the decimal point using digits, which must be greater than 0; None indicates no maximum.
strip_zeros (aka stripZeros)
This function always returns the canonical representation, which means that it will strip trailing zeros in the fractional part. To override this behaviour and return exactly digits decimal places, set strip_zeros to False.
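The interplay of digits and strip_zeros can be illustrated with Python's decimal module. This is a hypothetical sketch (decimal_to_str_sketch is an illustrative name); it assumes half-even rounding applied to the shortest round-tripping repr() of the float, which may differ from pyslet's own rounding:

```python
import math
from decimal import Decimal, ROUND_HALF_EVEN, localcontext

# Hypothetical sketch, not pyslet's implementation.


def decimal_to_str_sketch(value, digits=None, strip_zeros=True):
    """Encode a float as xsd:decimal text, never using an exponent."""
    if not math.isfinite(value):
        raise ValueError("xsd:decimal has no INF or NaN")
    if digits is not None and digits < 1:
        raise ValueError("digits must be greater than 0")
    d = Decimal(repr(value))  # shortest string that round-trips the float
    if digits is not None:
        with localcontext() as ctx:
            ctx.prec = 400 + digits  # ample precision for any finite double
            d = d.quantize(Decimal('1e-%d' % digits), rounding=ROUND_HALF_EVEN)
    text = format(d, 'f')  # fixed-point notation
    whole, _, frac = text.partition('.')
    if strip_zeros or not frac:
        # Canonical form keeps at least one digit after the point.
        frac = frac.rstrip('0') or '0'
    return whole + '.' + frac
```

For example, 2.0 with digits=2 encodes as "2.0", while the same call with strip_zeros=False gives "2.00".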
pyslet.xml.xsdatatypes.float_from_str(src)

Decodes a float value from a string, returning a Python float.

The precision of the Python float varies depending on the implementation. It typically exceeds the precision of the XML schema float. We make no attempt to reduce the precision to that of schema’s float, except that we return 0.0 or -0.0 for any value whose magnitude is smaller than the smallest non-zero float defined in the specification. (Note that XML schema’s float canonicalizes the representation of zero to remove this subtle distinction, but it can be useful to preserve it for numerical operations.) Likewise, if we are given a representation that is larger than any valid float we return one of the special float values INF or -INF as appropriate.
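A hypothetical sketch of this decoding rule (float_from_str_sketch is an illustrative name). The bounds come from XML Schema 1.0, which defines the float value space as m × 2^e with |m| < 2^24 and -149 ≤ e ≤ 104; everything else about the sketch is an assumption:

```python
import math
import re

# Hypothetical sketch, not pyslet's implementation.
# xsd:float lexical form (XML Schema 1.0): a decimal mantissa with an
# optional exponent, or one of the special strings INF, -INF and NaN.
_FLOAT_RE = re.compile(r'[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][+-]?[0-9]+)?')
_SPECIAL = {'INF': math.inf, '-INF': -math.inf, 'NaN': math.nan}

FLOAT_MAX = (2 ** 24 - 1) * 2.0 ** 104  # largest xsd:float
FLOAT_MIN = 2.0 ** -149                 # smallest non-zero xsd:float


def float_from_str_sketch(src):
    """Decode xsd:float text into a Python float."""
    if src in _SPECIAL:
        return _SPECIAL[src]
    if _FLOAT_RE.fullmatch(src) is None:
        raise ValueError("not a valid xsd:float: %r" % src)
    value = float(src)
    if abs(value) > FLOAT_MAX:
        return math.copysign(math.inf, value)  # too large: INF or -INF
    if abs(value) < FLOAT_MIN:
        return math.copysign(0.0, value)  # too small: 0.0 or -0.0
    return value  # otherwise full Python precision is kept
```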

pyslet.xml.xsdatatypes.float_to_str(value)

Encodes a Python float value as a string.

To reduce the chances of our output being rejected by external applications that are strictly bound to a 32-bit float representation we ensure that we don’t output values that exceed the bounds of float defined by XML schema.

Therefore, we convert values that are too large to INF and values that are too small to 0.0E0.
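A hypothetical sketch of this clamping followed by an exponent-form encoding (float_to_str_sketch is an illustrative name). The choice of nine significant digits, enough to identify any 32-bit float, is an assumption of the sketch rather than documented pyslet behaviour:

```python
import math

# Hypothetical sketch, not pyslet's implementation.
FLOAT_MAX = (2 ** 24 - 1) * 2.0 ** 104  # largest xsd:float
FLOAT_MIN = 2.0 ** -149                 # smallest non-zero xsd:float


def float_to_str_sketch(value):
    """Encode a Python float as xsd:float text."""
    if math.isnan(value):
        return 'NaN'
    if abs(value) > FLOAT_MAX:
        return 'INF' if value > 0 else '-INF'  # also covers +/-inf
    if abs(value) < FLOAT_MIN:
        return '0.0E0'  # zero (either sign) and underflowing values
    # Assumed: nine significant digits, then trailing zeros stripped.
    mantissa, exponent = ('%.8E' % value).split('E')
    whole, frac = mantissa.split('.')
    frac = frac.rstrip('0') or '0'
    return '%s.%sE%d' % (whole, frac, int(exponent))
```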

pyslet.xml.xsdatatypes.double_from_str(src)

Decodes a double value from a string, returning a Python float.

The precision of the Python float varies depending on the implementation. It may even exceed the precision of the XML schema double. The current implementation ignores this distinction.

pyslet.xml.xsdatatypes.double_to_str(value, digits=None, strip_zeros=True, **kws)

Encodes a double value returning a character string.

digits
Controls the number of digits after the decimal point in the mantissa. None indicates no maximum, in which case the precision of Python’s float is used to determine the appropriate number. You may pass the value 0, in which case no digits are given after the point and the point itself is omitted, but such values are not in their canonical form.
strip_zeros (aka stripZeros)
Determines whether or not trailing zeros are removed. If False then exactly digits digits will be displayed after the point. By default zeros are stripped (except that there is always one zero left after the decimal point).
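The effect of digits (including digits=0, which drops the point) and strip_zeros can be pictured with Python's %E formatting. In this hypothetical sketch (double_to_str_sketch is an illustrative name) the default uses 17 significant digits, which always round-trip a double; that default is an assumption, not necessarily pyslet's choice:

```python
import math

# Hypothetical sketch, not pyslet's implementation.


def double_to_str_sketch(value, digits=None, strip_zeros=True):
    """Encode a Python float as xsd:double text."""
    if math.isnan(value):
        return 'NaN'
    if math.isinf(value):
        return 'INF' if value > 0 else '-INF'
    if digits is None:
        digits = 16  # assumed: 1 + 16 = 17 significant digits
    mantissa, exponent = ('%.*E' % (digits, value)).split('E')
    if strip_zeros and '.' in mantissa:
        whole, frac = mantissa.split('.')
        mantissa = whole + '.' + (frac.rstrip('0') or '0')
    return '%sE%d' % (mantissa, int(exponent))
```

With digits=0, 1234.5 becomes "1E3" (no point, not canonical). Because the default keeps 17 digits, some values expose binary noise, e.g. 0.1 encodes as "1.0000000000000001E-1".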
class pyslet.xml.xsdatatypes.Duration(value=None)

Bases: pyslet.iso8601.Duration

Represents duration values.

Extends the basic iso duration class to include negative durations.

sign = None

an integer with the sign of the duration
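xsd:duration permits a leading minus sign, which plain ISO 8601 durations lack; the sign attribute records it. The hypothetical sketch below (parse_duration_sketch is an illustrative name) only shows how the sign can be split from the lexical form with a regular expression. It does not reproduce pyslet.iso8601.Duration or its attributes:

```python
import re

# Hypothetical sketch, not pyslet's implementation.
_DURATION_RE = re.compile(
    r'(?P<sign>-)?P'
    r'(?:(?P<years>[0-9]+)Y)?(?:(?P<months>[0-9]+)M)?(?:(?P<days>[0-9]+)D)?'
    r'(?:T(?:(?P<hours>[0-9]+)H)?(?:(?P<minutes>[0-9]+)M)?'
    r'(?:(?P<seconds>[0-9]+(?:\.[0-9]+)?)S)?)?')

_FIELDS = ('years', 'months', 'days', 'hours', 'minutes', 'seconds')


def parse_duration_sketch(src):
    """Split xsd:duration text into (sign, fields)."""
    m = _DURATION_RE.fullmatch(src)
    if m is None or src.endswith('T'):
        # 'T' must be followed by at least one time component.
        raise ValueError("not a valid xsd:duration: %r" % src)
    fields = {}
    for name in _FIELDS:
        text = m.group(name)
        if text is not None:
            fields[name] = float(text) if name == 'seconds' else int(text)
    if not fields:
        raise ValueError("at least one component is required: %r" % src)
    sign = -1 if m.group('sign') else 1
    return sign, fields
```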

dateTime values are represented by pyslet.iso8601.TimePoint instances. These functions are provided for convenience in custom attribute mappings.

pyslet.xml.xsdatatypes.datetime_from_str(src)

Returns a pyslet.iso8601.TimePoint instance.

pyslet.xml.xsdatatypes.datetime_to_str(value)

Returns the canonical lexical representation for dateTime.

value
An instance of pyslet.iso8601.TimePoint
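In XML Schema 1.0 the canonical dateTime form normalizes time-zoned values to UTC, marks them with "Z", and drops trailing zeros from fractional seconds. The hypothetical sketch below (datetime_to_str_sketch is an illustrative name) uses Python's standard datetime as a stand-in for pyslet.iso8601.TimePoint, so it illustrates the canonicalization rules only:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical sketch using datetime in place of TimePoint.


def datetime_to_str_sketch(value):
    """Return canonical xsd:dateTime text for a datetime."""
    suffix = ''
    if value.tzinfo is not None:
        # Time-zoned values are normalized to UTC and marked with 'Z'.
        value = value.astimezone(timezone.utc)
        suffix = 'Z'
    text = '%04d-%02d-%02dT%02d:%02d:%02d' % (
        value.year, value.month, value.day,
        value.hour, value.minute, value.second)
    if value.microsecond:
        # Fractional seconds without trailing zeros.
        text += ('.%06d' % value.microsecond).rstrip('0')
    return text + suffix


# Usage: 03:04:05.5 at UTC+01:00 is 02:04:05.5 UTC.
example = datetime(2024, 1, 2, 3, 4, 5, 500000,
                   tzinfo=timezone(timedelta(hours=1)))
print(datetime_to_str_sketch(example))  # 2024-01-02T02:04:05.5Z
```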

6.2.5.3. Derived Datatypes


Name represents XML Names; the native Python character string is used.

pyslet.xml.xsdatatypes.name_from_str(src)

Decodes a name from a string.

Returns the same string or raises ValueError if src does not match the XML production Name.
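A hypothetical sketch of the check (name_from_str_sketch is an illustrative name), assuming the NameStartChar and NameChar productions of XML 1.0 Fifth Edition; pyslet may follow a different edition of the XML specification:

```python
import re

# Hypothetical sketch, not pyslet's implementation.
# NameStartChar and NameChar from XML 1.0 (Fifth Edition).
_START = (':A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D'
          '\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF'
          '\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF')
_NAME_CHAR = _START + '\\-.0-9\u00B7\u0300-\u036F\u203F-\u2040'
_NAME_RE = re.compile('[%s][%s]*' % (_START, _NAME_CHAR))


def name_from_str_sketch(src):
    """Return src unchanged if it matches the XML Name production."""
    if _NAME_RE.fullmatch(src) is None:
        raise ValueError("not a valid XML Name: %r" % src)
    return src
```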

pyslet.xml.xsdatatypes.name_to_str(src)

Encodes a name.

A convenience function (equivalent to pyslet.py2.to_text()).

Integer is represented by the native Python integer.

pyslet.xml.xsdatatypes.integer_from_str(src)

Decodes an integer.

If src is not a valid lexical representation of an integer then ValueError is raised. This uses XML Schema’s lexical rules, which are slightly different from Python’s native conversion.

pyslet.xml.xsdatatypes.integer_to_str(value)

Encodes an integer value using the canonical lexical representation.
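To make the difference from native conversion concrete: Python's int() also accepts surrounding whitespace, underscores and non-ASCII decimal digits, none of which are in the xsd:integer lexical space. A hypothetical sketch of both directions (integer_from_str_sketch and integer_to_str_sketch are illustrative names):

```python
import re

# Hypothetical sketch, not pyslet's implementation.
_INTEGER_RE = re.compile(r'[+-]?[0-9]+')  # ASCII digits only, unlike \d


def integer_from_str_sketch(src):
    """Decode xsd:integer text; leading zeros and '+' are allowed."""
    if _INTEGER_RE.fullmatch(src) is None:
        # int() alone would accept ' 5 ', '1_000' and '\u0663'.
        raise ValueError("not a valid xsd:integer: %r" % src)
    return int(src)


def integer_to_str_sketch(value):
    """Canonical form: no '+' sign and no leading zeros."""
    return str(int(value))
```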

6.2.5.4. Constraining Facets

6.2.5.4.1. Enumeration

class pyslet.xml.xsdatatypes.Enumeration

Bases: pyslet.xml.xsdatatypes.EnumBase

Abstract class for defining enumerations

The class is not designed to be instantiated but to act as a method of defining constants to represent the values of an enumeration and for converting between those constants and the appropriate string representations.

The basic usage of this class is to derive a class from it with a single class member called ‘decode’ which is a mapping from canonical strings to simple integers.

Once defined, the class will be automatically populated with a reverse mapping dictionary (called encode) and the enumeration strings will be added as attributes of the class itself. For example:

from pyslet.xml.xsdatatypes import Enumeration

class Fruit(Enumeration):
    decode = {
        'Apple': 1,
        'Pear': 2,
        'Orange': 3}

Fruit.Apple == 1    # True thanks to metaclass

You can define additional mappings by providing a second dictionary called aliases that maps additional names onto existing values. The aliases dictionary is a mapping from strings onto the equivalent canonical strings:

class Vegetables(Enumeration):
    decode = {
        'Tomato': 1,
        'Potato': 2,
        'Courgette': 3}

    aliases = {
        'Zucchini': 'Courgette'}

Vegetables.Zucchini == 3       # True thanks to metaclass

You may also add the special key None to the aliases dictionary to define a default value for the enumeration. This is mapped to an attribute called DEFAULT:

class Staples(Enumeration):
    decode = {
        'Bread': 1,
        'Pasta': 2,
        'Rice': 3}

    aliases = {
        None: 'Bread'}

Staples.DEFAULT == 1        # True thanks to metaclass

DEFAULT = None

The DEFAULT value of the enumeration defaults to None
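The automatic population described above can be pictured as a metaclass that runs once for each subclass. The following is a hypothetical sketch (EnumMetaSketch and EnumerationSketch are illustrative names, not pyslet classes) covering encode, the name attributes, aliases, DEFAULT, and simplified from_str() and to_str() methods:

```python
# Hypothetical sketch, not pyslet's implementation.


class EnumMetaSketch(type):
    """Fill in encode, name attributes and DEFAULT for each subclass."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        decode = namespace.get('decode', {})
        # Reverse mapping: integer value -> canonical string.
        cls.encode = {value: key for key, value in decode.items()}
        for key, value in decode.items():
            setattr(cls, key, value)  # e.g. Fruit.Apple == 1
        cls.DEFAULT = None
        for alias, canonical in namespace.get('aliases', {}).items():
            if alias is None:
                cls.DEFAULT = decode[canonical]  # the None key sets DEFAULT
            else:
                setattr(cls, alias, decode[canonical])


class EnumerationSketch(metaclass=EnumMetaSketch):
    decode = {}
    aliases = {}

    @classmethod
    def from_str(cls, src):
        if src in cls.decode:
            return cls.decode[src]
        if src is not None and src in cls.aliases:
            return cls.decode[cls.aliases[src]]
        raise ValueError("%r is not a %s value" % (src, cls.__name__))

    @classmethod
    def to_str(cls, value):
        if value is None:
            value = cls.DEFAULT
            if value is None:
                return None
        return cls.encode[value]


class Vegetables(EnumerationSketch):
    decode = {'Tomato': 1, 'Potato': 2, 'Courgette': 3}
    aliases = {'Zucchini': 'Courgette', None: 'Tomato'}
```

Here Vegetables.Zucchini and Vegetables.Courgette are both 3, and Vegetables.to_str(None) returns "Tomato" because the None alias defines the default.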

classmethod from_str(src)

Decodes a string returning a value in this enumeration.

If no legal value can be decoded then ValueError is raised.

classmethod from_str_lower(src)

Decodes a string, converting it to lower case first.

Returns a value in this enumeration. If no legal value can be decoded then ValueError is raised.

classmethod from_str_upper(src)

Decodes a string, converting it to upper case first.

Returns a value in this enumeration. If no legal value can be decoded then ValueError is raised.

classmethod from_str_title(src)

Decodes a string, converting it to title case first.

Title case is defined as an initial upper case letter with all other letters lower case.

Returns a value in this enumeration. If no legal value can be decoded then ValueError is raised.

classmethod list_from_str(decoder, src)

Decodes a list of values

decoder
One of the from_str methods.
src
A space-separated string of values

The result is an ordered list of values (possibly containing duplicates).

Example usage:

Fruit.list_from_str(Fruit.from_str_title,
                    "apple orange pear")
# returns [Fruit.Apple, Fruit.Orange, Fruit.Pear]

classmethod dict_from_str(decoder, src)

Decodes a dictionary of values

decoder
One of the from_str methods
src
A space-separated string of values.

The result is a dictionary mapping the values found as keys onto the strings used to represent them. Duplicates are mapped to the first occurrence of the encoded value.

Example usage:

Fruit.dict_from_str(Fruit.from_str_title,
                    "Apple orange PEAR apple")
# returns {Fruit.Apple: 'Apple', Fruit.Orange: 'orange',
#          Fruit.Pear: 'PEAR'}

classmethod to_str(value)

Encodes one of the enumeration constants returning a string.

If value is None then the encoded default value is returned (if defined) or None.

classmethod list_to_str(value_list)

Encodes a list of enumeration constants

value_list
A list or iterable of integer values corresponding to enumeration constants.

Returns a space-separated string. If value_list is empty then an empty string is returned.

classmethod dict_to_str(value_dict, sort_keys=True, **kws)

Encodes a dictionary of enumeration constants

value_dict
A dictionary with integer keys corresponding to enumeration constant values.
sort_keys
Boolean indicating that the result should be sorted by constant value. (Defaults to True.)

Returns a space-separated string. If value_dict is empty then an empty string is returned. The values in the dictionary are ignored, the keys are used to obtain the canonical representation of each value. Extending the example given in dict_from_str():

Fruit.dict_to_str(
    {Fruit.Apple: 'Apple', Fruit.Orange: 'orange',
     Fruit.Pear: 'PEAR'})
# returns: "Apple Pear Orange"

The order of the values in the string is determined by the sort order of the enumeration constants (not their string representation). This ensures that equivalent dictionaries are always encoded to the same string. In the above example:

Fruit.Apple < Fruit.Pear < Fruit.Orange

If you have large lists then you can skip the sorting step by passing False for sort_keys to improve performance at the expense of an unpredictable encoding.

classmethod DecodeLowerValue(*args, **kwargs)

Deprecated equivalent to from_str_lower()

classmethod DecodeTitleValue(*args, **kwargs)

Deprecated equivalent to from_str_title()

classmethod DecodeUpperValue(*args, **kwargs)

Deprecated equivalent to from_str_upper()

classmethod DecodeValue(*args, **kwargs)

Deprecated equivalent to from_str()

classmethod DecodeValueDict(*args, **kwargs)

Deprecated equivalent to dict_from_str()

classmethod DecodeValueList(*args, **kwargs)

Deprecated equivalent to list_from_str()

classmethod EncodeValue(*args, **kwargs)

Deprecated equivalent to to_str()

classmethod EncodeValueDict(*args, **kwargs)

Deprecated equivalent to dict_to_str()

classmethod EncodeValueList(*args, **kwargs)

Deprecated equivalent to list_to_str()

class pyslet.xml.xsdatatypes.EnumerationNoCase

Bases: pyslet.xml.xsdatatypes.Enumeration

Convenience class that automatically adds lower-case aliases

On creation, the enumeration ensures that aliases equivalent to the lower-cased canonical strings are defined. Designed to be used in conjunction with from_str_lower() for case-insensitive matching of enumeration strings.
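A hypothetical sketch of the lower-case alias step and a from_str_lower()-style lookup, using plain dictionaries rather than an Enumeration subclass (add_lower_aliases and from_str_lower_sketch are illustrative names):

```python
# Hypothetical sketch, not pyslet's implementation.


def add_lower_aliases(decode, aliases=None):
    """Return aliases extended with lower-cased canonical strings."""
    result = dict(aliases or {})
    for name in decode:
        lower = name.lower()
        if lower != name and lower not in decode:
            result.setdefault(lower, name)
    return result


def from_str_lower_sketch(decode, aliases, src):
    """Lower-case src, then look it up as a canonical string or alias."""
    key = src.lower()
    if key in decode:
        return decode[key]
    if key in aliases:
        return decode[aliases[key]]
    raise ValueError("no enumeration value for %r" % src)
```

Only canonical strings gain lower-case aliases here, so an existing mixed-case alias such as 'Zucchini' is still not matched by 'ZUCCHINI'.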

6.2.5.4.2. WhiteSpace

pyslet.xml.xsdatatypes.white_space_replace(value)

Replaces tab, line feed and carriage return with space.

pyslet.xml.xsdatatypes.white_space_collapse(value)

Replaces all runs of white space with a single space. Also removes leading and trailing white space.
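XML Schema's whiteSpace facet only concerns four characters: tab, line feed, carriage return and space. A hypothetical sketch of both operations (the _sketch names are illustrative); note that Python's str.split() would be wrong here because it also treats characters such as the no-break space U+00A0 as whitespace:

```python
import re

# Hypothetical sketch, not pyslet's implementation.


def white_space_replace_sketch(value):
    """Map tab, line feed and carriage return to space."""
    return value.translate({0x9: 0x20, 0xA: 0x20, 0xD: 0x20})


def white_space_collapse_sketch(value):
    """Replace, then squeeze runs of spaces and trim both ends."""
    return re.sub(' +', ' ', white_space_replace_sketch(value)).strip(' ')
```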

6.2.5.5. Regular Expressions

Appendix F of the XML Schema datatypes specification defines a regular expression language. This language differs from the native Python regular expression language, but it is close enough to enable us to define a wrapper class which parses schema regular expressions and converts them to equivalent Python regular expressions.

class pyslet.xml.xsdatatypes.RegularExpression(src)

Bases: pyslet.py2.UnicodeMixin

A regular expression as defined by XML schema.

Regular expressions are constructed from character strings. Internally they are parsed and converted to Python regular expressions to speed up matching.

Warning: because the XML schema expression language contains concepts not supported by Python, the resulting Python regular expression may not be very readable.

src = None

the original source string

p = None

the compiled python regular expression

match(target)

Returns True if the expression matches target.
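Two of the differences the conversion must handle are easy to show: XML schema expressions are implicitly anchored at both ends, and ^ and $ are ordinary characters outside character classes. The hypothetical sketch below (translate_subset and SimpleSchemaRegex are illustrative names) handles only these two points; class subtraction and the schema-only escapes for name characters and Unicode blocks are not handled:

```python
import re

# Hypothetical sketch covering only a small subset of the conversion.


def translate_subset(src):
    """Escape ^ and $ outside character classes; copy the rest."""
    out = []
    in_class = False
    i = 0
    while i < len(src):
        c = src[i]
        if c == '\\' and i + 1 < len(src):
            out.append(src[i:i + 2])  # copy escapes through unchanged
            i += 2
            continue
        if c == '[':
            in_class = True  # nested class subtraction is not supported
        elif c == ']':
            in_class = False
        elif c in '^$' and not in_class:
            c = '\\' + c  # literal in XML schema, an anchor in Python
        out.append(c)
        i += 1
    return ''.join(out)


class SimpleSchemaRegex:
    def __init__(self, src):
        self.src = src  # the original source string
        self.p = re.compile(translate_subset(src))  # compiled Python regex

    def match(self, target):
        # Schema expressions must match the whole target string.
        return self.p.fullmatch(target) is not None
```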

For completeness we also document the parser we use to do the conversion; it draws heavily on the pyslet.unicode5.CharClass concept.

class pyslet.xml.xsdatatypes.RegularExpressionParser(source)

Bases: pyslet.unicode5.BasicParser

A custom parser for XML schema regular expressions.

The parser is initialised from a character string and always operates in text mode.

require_reg_exp()

Parses a regExp

Returns a unicode string representing the regular expression.

require_branch()

Parses branch

Returns a character string representing these pieces as a python regular expression.

require_piece()

Parses piece

Returns a character string representing this piece in python regular expression format.

require_quantifier()

Parses quantifier

Returns a tuple of (n, m).

Symbolic values are expanded to the appropriate pair. The second value may be None indicating unbounded.

require_quantity()

Parses quantity

Returns a tuple of (n, m) even if an exact quantity is given.

In other words, the exact quantity ‘n’ returns (n, n). The second value may be None indicating unbounded.
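A hypothetical sketch of the (n, m) convention used by require_quantifier() and require_quantity(), together with a conversion back to Python quantifier syntax (parse_quantifier_sketch and quantity_to_python_sketch are illustrative names, not parser methods):

```python
import re

# Hypothetical sketch, not the parser's implementation.
_QUANTITY_RE = re.compile(r'\{([0-9]+)(?:(,)([0-9]+)?)?\}')
_SYMBOLIC = {'?': (0, 1), '*': (0, None), '+': (1, None)}


def parse_quantifier_sketch(src):
    """Return (n, m); m is None when the upper bound is unbounded."""
    if src in _SYMBOLIC:
        return _SYMBOLIC[src]
    match = _QUANTITY_RE.fullmatch(src)
    if match is None:
        raise ValueError("not a quantifier: %r" % src)
    n = int(match.group(1))
    if match.group(2) is None:
        return (n, n)  # {n}: exact quantity
    if match.group(3) is None:
        return (n, None)  # {n,}: at least n
    m = int(match.group(3))
    if m < n:
        raise ValueError("upper bound below lower bound: %r" % src)
    return (n, m)  # {n,m}


def quantity_to_python_sketch(pair):
    """Render an (n, m) pair as a Python regular expression quantifier."""
    n, m = pair
    if m is None:
        return '{%d,}' % n
    if n == m:
        return '{%d}' % n
    return '{%d,%d}' % (n, m)
```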

require_quant_exact()

Parses QuantExact

Returns the integer value parsed.

require_atom()

Parses atom

Returns a unicode string representing this atom as a python regular expression.

is_char(c=None)

Parses Char

Returns either True or False depending on whether the_char satisfies the production Char.

The definition of this function is designed to be conservative with respect to the specification, which is clearly in error around production [10] because the prose and the BNF do not match. It appears that | was intended to be excluded but has been omitted from the prose, the reverse being true for the curly brackets.

require_char_class()

Parses a charClass.

require_char_class_expr()

Parses charClassExpr

require_char_group()

Parses charGroup.

This method also handles the case of a class subtraction directly to reduce the need for look-ahead. If you specifically want to parse a subtraction you can do this with require_char_class_sub().

require_pos_char_group()

Parses posCharGroup

require_neg_char_group()

Parses negCharGroup.

require_char_class_sub()

Parses charClassSub

This method is not normally used by the parser and is present for completeness. See require_char_group().

require_char_range()

Parses a charRange.

require_se_range()

Parses seRange.

require_char_or_esc()

Parses charOrEsc.

require_char_class_esc()

Parses charClassEsc.

Returns a CharClass instance.

require_single_char_esc()

Parses SingleCharEsc

Returns a single character.

require_cat_esc()

Parses catEsc.

require_compl_esc()

Parses complEsc.

require_char_prop()

Parses a charProp.

require_is_category()

Parses IsCategory.

require_is_block()

Parses IsBlock.

require_multi_char_esc()

Parses a MultiCharEsc.

require_wildcard_esc()

Parses ‘.’, the wildcard CharClass

6.2.5.6. Backwards Compatibility

pyslet.xml.xsdatatypes.DecodeBoolean(*args, **kwargs)

Deprecated equivalent to boolean_from_str()

pyslet.xml.xsdatatypes.EncodeBoolean(*args, **kwargs)

Deprecated equivalent to boolean_to_str()

pyslet.xml.xsdatatypes.DecodeDecimal(*args, **kwargs)

Deprecated equivalent to decimal_from_str()

pyslet.xml.xsdatatypes.EncodeDecimal(*args, **kwargs)

Deprecated equivalent to decimal_to_str()

pyslet.xml.xsdatatypes.DecodeFloat(*args, **kwargs)

Deprecated equivalent to float_from_str()

pyslet.xml.xsdatatypes.EncodeFloat(*args, **kwargs)

Deprecated equivalent to float_to_str()

pyslet.xml.xsdatatypes.DecodeDouble(*args, **kwargs)

Deprecated equivalent to double_from_str()

pyslet.xml.xsdatatypes.EncodeDouble(*args, **kwargs)

Deprecated equivalent to double_to_str()

pyslet.xml.xsdatatypes.DecodeDateTime(*args, **kwargs)

Deprecated equivalent to datetime_from_str()

pyslet.xml.xsdatatypes.EncodeDateTime(*args, **kwargs)

Deprecated equivalent to datetime_to_str()

pyslet.xml.xsdatatypes.DecodeName(*args, **kwargs)

Deprecated equivalent to name_from_str()

pyslet.xml.xsdatatypes.EncodeName(*args, **kwargs)

Deprecated equivalent to name_to_str()

pyslet.xml.xsdatatypes.DecodeInteger(*args, **kwargs)

Deprecated equivalent to integer_from_str()

pyslet.xml.xsdatatypes.EncodeInteger(*args, **kwargs)

Deprecated equivalent to integer_to_str()
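Each deprecated name above simply forwards to its replacement. A hypothetical sketch of how such an alias can be built with functools.wraps; whether pyslet itself emits a DeprecationWarning is not stated in this section, so the warning (and the names deprecated_alias and integer_to_str used here as a stand-in) are assumptions:

```python
import functools
import warnings

# Hypothetical sketch, not pyslet's implementation.


def deprecated_alias(new_func, old_name):
    """Return a wrapper that warns, then calls new_func unchanged."""
    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        warnings.warn("%s is deprecated, use %s instead"
                      % (old_name, new_func.__name__),
                      DeprecationWarning, stacklevel=2)
        return new_func(*args, **kwargs)
    wrapper.__name__ = old_name
    return wrapper


def integer_to_str(value):  # stand-in for the new-style function
    return str(int(value))


EncodeInteger = deprecated_alias(integer_to_str, 'EncodeInteger')
```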

pyslet.xml.xsdatatypes.make_enum(cls, default_value=None, **kws)

Deprecated function

This function is no longer required and does nothing unless default_value is passed in which case it adds the DEFAULT attribute to the Enumeration cls as if an alias had been declared for None (see Enumeration above for details).

pyslet.xml.xsdatatypes.MakeEnumeration(*args, **kwargs)

Deprecated equivalent to make_enum()

pyslet.xml.xsdatatypes.make_enum_aliases(cls, aliases)

Deprecated function

Supported for backwards compatibility. Instead, modify enum class definitions to include aliases as an attribute directly:

class MyEnum(Enumeration):
    decode = {
        # strings to ints mapping
        }
    aliases = {
        # aliases to strings mapping
        }

The new metaclass takes care of processing the aliases dictionary when the class is created.

pyslet.xml.xsdatatypes.MakeEnumerationAliases(*args, **kwargs)

Deprecated equivalent to make_enum_aliases()

pyslet.xml.xsdatatypes.make_lower_aliases(cls)

Deprecated function

Supported for backwards compatibility. Use new class EnumerationNoCase instead.

Warning: the new class only adds lower-case aliases for the canonical strings; any additional aliases (defined in the aliases dictionary attribute) must already be lower-case or be defined with both case variants.

pyslet.xml.xsdatatypes.MakeLowerAliases(*args, **kwargs)

Deprecated equivalent to make_lower_aliases()