6.4. XML Schema Datatypes

This module implements some useful concepts drawn from http://www.w3.org/TR/xmlschema-2/

One of the main purposes of this module is to provide classes and functions for converting data between Python-native representations of the value spaces defined by this specification and the lexical representations it defines.

The result is typically a pair of DecodeX and EncodeX functions that are used to define custom attribute handling in classes that are derived from xml20081126.structures.XMLElement. For example:

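from pyslet.xml20081126.structures import XMLElement
import pyslet.xsdatatypes20041028 as xsi

class MyElement(XMLElement):
    # the 'flag' attribute is decoded and encoded as an XML schema boolean
    XMLATTR_flag = ('flag', xsi.DecodeBoolean, xsi.EncodeBoolean)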

6.4.1. Primitive Datatypes

pyslet.xsdatatypes20041028.DecodeBoolean(src)

Decodes a boolean value from src.

Returns the Python constant True or False. As a convenience, if src is None then None is returned.

pyslet.xsdatatypes20041028.EncodeBoolean(src)

Encodes a boolean value using the canonical lexical representation.

src can be anything that can be resolved to a boolean except None, which raises ValueError.
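XML Schema defines the boolean lexical space as the four literals true, false, 1 and 0, with true and false as the canonical forms. The following is a minimal sketch of that mapping, assuming the input has already had its whitespace collapsed; decode_boolean and encode_boolean are hypothetical stand-ins for illustration, not pyslet's implementation.

```python
def decode_boolean(src):
    """Map an XML Schema boolean literal to True or False (None passes through)."""
    if src is None:
        return None
    if src in ("true", "1"):
        return True
    if src in ("false", "0"):
        return False
    raise ValueError("invalid boolean: %r" % src)


def encode_boolean(value):
    """Return the canonical literal 'true' or 'false' for any truthy/falsy value."""
    if value is None:
        raise ValueError("cannot encode None as a boolean")
    return "true" if value else "false"
```

Decoding accepts all four literals but encoding only ever produces the two canonical ones, so a round trip normalizes 1 and 0.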

pyslet.xsdatatypes20041028.DecodeDecimal(src)

Decodes a decimal value from a string, returning a Python float.

If src is not a valid lexical representation of a decimal value then ValueError is raised.
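The decimal lexical space is an optional sign followed by digits with an optional fractional part; exponents and the special values INF and NaN are not allowed. A sketch of the validation step, assuming an explicit regular expression is used because Python's float() is far more permissive (it accepts exponents, "inf" and underscores); decode_decimal is a hypothetical name.

```python
import re

# Optional sign, then digits with an optional fractional part.
# At least one digit must appear somewhere.
_DECIMAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")


def decode_decimal(src):
    if _DECIMAL.fullmatch(src) is None:
        raise ValueError("invalid decimal: %r" % src)
    return float(src)
```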

pyslet.xsdatatypes20041028.EncodeDecimal(value, digits=None, stripZeros=True)

Encodes a decimal value into a string.

You can control the maximum number of digits after the decimal point using digits, which must be greater than 0; None indicates no maximum. By default this function returns the canonical representation, which means that trailing zeros in the fractional part are stripped. To override this behaviour and return exactly digits decimal places, set stripZeros to False.
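The canonical decimal form always has at least one digit on each side of the point, so stripping trailing zeros must leave a single 0 behind (1.0, not 1.). The sketch below illustrates the digits/stripZeros behaviour under the assumption that, when digits is None, the shortest round-tripping repr() of the float is an acceptable source of digits; encode_decimal is a hypothetical name.

```python
import math
from decimal import Decimal


def encode_decimal(value, digits=None, strip_zeros=True):
    if not math.isfinite(value):
        raise ValueError("decimal has no representation for %r" % value)
    if digits is None:
        # repr() gives the shortest string that round-trips the float and
        # Decimal's 'f' format expands any exponent (1e-05 -> 0.00001).
        text = format(Decimal(repr(value)), "f")
    elif digits < 1:
        raise ValueError("digits must be greater than 0")
    else:
        text = "%.*f" % (digits, value)
    if "." not in text:
        text += ".0"
    if strip_zeros:
        # Drop trailing zeros but keep one digit after the point.
        whole, frac = text.split(".")
        text = whole + "." + (frac.rstrip("0") or "0")
    return text
```

For example, encode_decimal(2.5, digits=3) gives "2.5" while encode_decimal(2.5, digits=3, strip_zeros=False) gives "2.500".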

pyslet.xsdatatypes20041028.DecodeFloat(src)

Decodes a float value from a string, returning a Python float.

The precision of the Python float varies depending on the implementation, but it typically exceeds the precision of the XML schema float. We make no attempt to reduce the precision to that of the schema’s float except that we return 0.0 or -0.0 for any value that is smaller than the smallest possible float defined in the specification. (Note that XML schema’s float canonicalizes the representation of zero to remove this subtle distinction, but it can be useful to preserve it for numerical operations.) Likewise, if we are given a representation that is larger than any valid float we return one of the special float values INF or -INF as appropriate.
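XML Schema defines the float value space as m × 2^e, where |m| < 2^24 and -149 ≤ e ≤ 104, which fixes the largest and smallest magnitudes used for the clamping described above. A sketch of that behaviour; decode_float and the two bound constants are hypothetical names, and the exact comparison at the boundaries is an assumption.

```python
import math
import re

# Bounds of the XML Schema float value space: m * 2**e with
# |m| < 2**24 and -149 <= e <= 104.
FLOAT_MAX = (2**24 - 1) * 2.0**104
FLOAT_MIN = 2.0**-149

_FLOAT = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][+-]?[0-9]+)?")


def decode_float(src):
    # The special values are matched literally and case-sensitively.
    if src == "INF":
        return math.inf
    if src == "-INF":
        return -math.inf
    if src == "NaN":
        return math.nan
    if _FLOAT.fullmatch(src) is None:
        raise ValueError("invalid float: %r" % src)
    value = float(src)
    if abs(value) > FLOAT_MAX:
        # Too large for a schema float: INF or -INF.
        return math.copysign(math.inf, value)
    if abs(value) < FLOAT_MIN:
        # Too small: 0.0 or -0.0, preserving the sign.
        return math.copysign(0.0, value)
    return value
```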

pyslet.xsdatatypes20041028.EncodeFloat(value)

Encodes a Python float value as a string.

To reduce the chances of our output being rejected by external applications that are strictly bound to a 32-bit float representation we ensure that we don’t output values that exceed the bounds of float defined by XML schema.

Therefore, we convert values that are too large to INF and values that are too small to 0.0E0.
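A sketch of the clamping and of a canonical mantissa/exponent layout (one digit before the point, trailing zeros stripped but one kept, no + sign or leading zeros in the exponent). It assumes nine significant digits, which is enough to round-trip any 32-bit float; encode_float and the bound constants are hypothetical names, not pyslet's code.

```python
import math

# Bounds of the XML Schema float value space.
FLOAT_MAX = (2**24 - 1) * 2.0**104
FLOAT_MIN = 2.0**-149


def encode_float(value):
    if math.isnan(value):
        return "NaN"
    if value > FLOAT_MAX:
        return "INF"
    if value < -FLOAT_MAX:
        return "-INF"
    if abs(value) < FLOAT_MIN:
        return "0.0E0"
    # Nine significant digits: one before the point, eight after.
    mantissa, exponent = ("%.8E" % value).split("E")
    whole, frac = mantissa.split(".")
    # Strip trailing zeros (keeping one) and normalize the exponent.
    return "%s.%sE%d" % (whole, frac.rstrip("0") or "0", int(exponent))
```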

pyslet.xsdatatypes20041028.DecodeDouble(src)

Decodes a double value from a string, returning a Python float.

The precision of the Python float varies depending on the implementation. It may even exceed the precision of the XML schema double. The current implementation ignores this distinction.

pyslet.xsdatatypes20041028.EncodeDouble(value, digits=None, stripZeros=True)

Encodes a double value returning a unicode string.

digits controls the number of digits after the decimal point in the mantissa; None indicates no maximum, in which case the precision of Python’s float is used to determine the appropriate number. You may pass the value 0, in which case no digits are given after the point and the point itself is omitted, but such values are not in their canonical form.

stripZeros determines whether or not trailing zeros are removed; if False then exactly the number of digits given by digits is displayed after the point. By default zeros are stripped (except that one zero is always left after the decimal point).
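The interaction of digits and stripZeros can be sketched as follows. This assumes that "the precision of Python's float" is interpreted as the shortest digit string that round-trips through repr(); encode_double is a hypothetical name and the zero special case is an illustrative choice.

```python
import math
from decimal import Decimal


def encode_double(value, digits=None, strip_zeros=True):
    if math.isnan(value):
        return "NaN"
    if math.isinf(value):
        return "INF" if value > 0 else "-INF"
    if digits is None:
        if value == 0:
            return "0.0E0"
        # Shortest round-tripping digits, rewritten by Decimal in
        # scientific notation, e.g. 1234.5 -> '1.2345E+3'.
        text = format(Decimal(repr(value)), "E")
    else:
        text = "%.*E" % (digits, value)
    mantissa, exponent = text.split("E")
    if "." in mantissa:
        if strip_zeros:
            # Remove trailing zeros, always leaving one after the point.
            whole, frac = mantissa.split(".")
            mantissa = whole + "." + (frac.rstrip("0") or "0")
    elif digits is None:
        mantissa += ".0"  # e.g. '1E-1' becomes '1.0E-1'
    # digits == 0 leaves the mantissa without a point (non-canonical).
    return "%sE%d" % (mantissa, int(exponent))
```

With digits=3, 1200.0 encodes as "1.2E3" by default and as "1.200E3" when strip_zeros is False, while digits=0 gives the non-canonical "1E3".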

pyslet.xsdatatypes20041028.DecodeDateTime(src)

Decodes src, returning a pyslet.iso8601.TimePoint instance.

pyslet.xsdatatypes20041028.EncodeDateTime(value)

Returns the canonical lexical representation of a pyslet.iso8601.TimePoint instance.

6.4.2. Derived Datatypes

pyslet.xsdatatypes20041028.DecodeName(src)

Decodes a name from a string. Returns the same string or raises ValueError.

pyslet.xsdatatypes20041028.EncodeName(src)

A convenience function, returns src unchanged.
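A Name must match the XML Name production: a NameStartChar followed by any number of NameChars. The sketch below assumes the character ranges from XML 1.0 (Fifth Edition), productions [4] and [4a]; decode_name and encode_name are hypothetical stand-ins, not pyslet's code.

```python
import re

# NameStartChar and NameChar as defined by XML 1.0 (Fifth Edition).
_NAME_START = (
    ":A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    "\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
    "\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
# A leading '-' is treated literally inside a character class.
_NAME_CHAR = "-.0-9\u00B7\u0300-\u036F\u203F-\u2040" + _NAME_START
_NAME = re.compile("[%s][%s]*" % (_NAME_START, _NAME_CHAR))


def decode_name(src):
    if _NAME.fullmatch(src) is None:
        raise ValueError("invalid Name: %r" % src)
    return src


def encode_name(src):
    # Names need no conversion, so encoding is the identity.
    return src
```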

pyslet.xsdatatypes20041028.DecodeInteger(src)

Decodes an integer value from a string, returning a Python int or long value.

If src is not a valid lexical representation of an integer then ValueError is raised.

pyslet.xsdatatypes20041028.EncodeInteger(value)

Encodes an integer value using the canonical lexical representation.
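The integer lexical space is an optional sign followed by ASCII digits, and the canonical form has no + sign and no leading zeros. A sketch, assuming an explicit pattern check because Python's int() also accepts surrounding whitespace, underscores and non-ASCII digits; decode_integer and encode_integer are hypothetical names.

```python
import re

# Optional sign followed by one or more ASCII digits.
_INTEGER = re.compile(r"[+-]?[0-9]+")


def decode_integer(src):
    if _INTEGER.fullmatch(src) is None:
        raise ValueError("invalid integer: %r" % src)
    return int(src)


def encode_integer(value):
    # str() already yields the canonical form: no '+', no leading zeros.
    return str(int(value))
```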

6.4.3. Constraining Facets

6.4.3.1. Enumeration

class pyslet.xsdatatypes20041028.Enumeration

An abstract class designed to make generating enumeration types easier. The class is not designed to be instantiated but to act as a method of defining constants to represent the values of an enumeration.

The basic usage of this class is to derive a class from it with a single class member called ‘decode’ which is a mapping from canonical strings to simple integers. You then call the function MakeEnumeration() to complete the declaration, after which you can use the enumeration as if you had defined the constants as class members and call any of the following class methods to convert enumeration values to and from their string representations.

classmethod DecodeValue(src)

Decodes a string returning a value in this enumeration.

If no legal value can be decoded then ValueError is raised.

classmethod DecodeLowerValue(src)

Decodes a string, converting it to lower case first.

Returns a value in this enumeration. If no legal value can be decoded then ValueError is raised.

classmethod DecodeUpperValue(src)

Decodes a string, converting it to upper case first.

Returns a value in this enumeration. If no legal value can be decoded then ValueError is raised.

classmethod DecodeTitleValue(src)

Decodes a string, converting it to title case first.

Returns a value in this enumeration. If no legal value can be decoded then ValueError is raised.

classmethod DecodeValueList(decoder, src)

Decodes a space-separated string of values using decoder which must be one of the Decode*Value methods of the enumeration. The result is an ordered list of values (possibly containing duplicates).

Example usage:

fruit.DecodeValueList(fruit.DecodeLowerValue, "apples oranges pears")
# returns [ fruit.apples, fruit.oranges, fruit.pears ]

classmethod DecodeValueDict(decoder, src)

Decodes a space-separated string of values using decoder which must be one of the Decode*Value methods of the enumeration. The result is a dictionary mapping the values found as keys onto the strings used to represent them. Duplicates are mapped to the first occurrence of the encoded value.

Example usage:

fruit.DecodeValueDict(fruit.DecodeLowerValue,"Apples oranges PEARS")
# returns...
{ fruit.apples:'Apples', fruit.oranges:'oranges', fruit.pears:'PEARS' }

classmethod EncodeValue(value)

Encodes one of the enumeration constants returning a string.

If value is None then the encoded default value is returned (if defined) or None.

classmethod EncodeValueList(valueList)

Encodes a list of enumeration constants returning a space-separated string.

If valueList is empty then an empty string is returned.

classmethod EncodeValueDict(valueDict, sortKeys=True)

Encodes a dictionary of enumeration constants returning a space-separated string.

If valueDict is empty then an empty string is returned. Note that the canonical representation of each value is used. Extending the example given in DecodeValueDict():

fruit.EncodeValueDict(fruit.DecodeValueDict(fruit.DecodeLowerValue,
        "Apples oranges PEARS"))
# returns...
"apples oranges pears"

The order of the encoded values in the string is determined by the sort order of the enumeration constants. This ensures that equivalent dictionaries are always encoded to equivalent strings. In the above example:

fruit.apples < fruit.oranges and fruit.oranges < fruit.pears

If you have large lists then you can skip the sorting step by passing False for sortKeys to improve performance at the expense of a predictable encoding.
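The list and dictionary methods are thin layers over a single-value decoder. The sketch below reproduces the fruit examples with a plain dictionary; FRUIT_DECODE, decode_lower and the snake_case functions are hypothetical stand-ins that illustrate the described behaviour, not pyslet's API.

```python
# Hypothetical enumeration data: canonical strings -> sortable integers.
FRUIT_DECODE = {"apples": 1, "oranges": 2, "pears": 3}
FRUIT_ENCODE = dict((v, k) for k, v in FRUIT_DECODE.items())


def decode_lower(src):
    try:
        return FRUIT_DECODE[src.lower()]
    except KeyError:
        raise ValueError("can't decode %r" % src)


def decode_value_list(decoder, src):
    # Ordered result; duplicates are kept.
    return [decoder(s) for s in src.split()]


def decode_value_dict(decoder, src):
    result = {}
    for s in src.split():
        # setdefault keeps the string from the first occurrence of a value.
        result.setdefault(decoder(s), s)
    return result


def encode_value_dict(value_dict, sort_keys=True):
    keys = sorted(value_dict) if sort_keys else list(value_dict)
    # Canonical strings are emitted, not the strings stored as values.
    return " ".join(FRUIT_ENCODE[k] for k in keys)
```

Sorting on the integer constants is what makes the encoding independent of the order in which the values were originally found.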

pyslet.xsdatatypes20041028.MakeEnumeration(e, defaultValue=None)

Adds convenience attributes to the class ‘e’.

This function assumes that e has an attribute ‘decode’ that is a dictionary which maps strings onto enumeration values. This function creates the reverse mapping called ‘encode’ and also defines constant attribute values that are equivalent to the keys of decode and can be used in code in the form e.key.

If defaultValue is not None then it must be one of the strings in the decode dictionary. It is then used to set the DEFAULT value.
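To make the declaration pattern concrete, here is a simplified, self-contained stand-in for the Enumeration/MakeEnumeration pair. SimpleEnumeration, make_enumeration and Fruit are hypothetical names; the code illustrates the behaviour described above and is not pyslet's source.

```python
class SimpleEnumeration(object):
    """Hypothetical stand-in for Enumeration; subclasses supply 'decode'."""

    decode = {}

    @classmethod
    def decode_value(cls, src):
        if src not in cls.decode:
            raise ValueError("can't decode %r" % src)
        return cls.decode[src]

    @classmethod
    def encode_value(cls, value):
        # None selects the default value, if one was declared.
        if value is None:
            value = getattr(cls, "DEFAULT", None)
        return None if value is None else cls.encode[value]


def make_enumeration(e, default_value=None):
    """Hypothetical stand-in for MakeEnumeration."""
    # Reverse mapping from integer values back to canonical strings.
    e.encode = dict((v, k) for k, v in e.decode.items())
    # Expose each key as a class constant, e.g. Fruit.apples.
    for key, value in e.decode.items():
        setattr(e, key, value)
    if default_value is not None:
        e.DEFAULT = e.decode[default_value]


class Fruit(SimpleEnumeration):
    decode = {"apples": 1, "oranges": 2, "pears": 3}


make_enumeration(Fruit, default_value="apples")
```

After the call, Fruit.oranges is simply the integer 2, which keeps the constants cheap to compare and gives them a natural sort order.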

pyslet.xsdatatypes20041028.MakeEnumerationAliases(e, aliases)

Adds aliases from a dictionary, declaring additional convenience attributes.

This function assumes that MakeEnumeration() has already been used to complete the declaration of the enumeration. The aliases are added to the decode dictionary but, for obvious reasons, not to the encode dictionary.

pyslet.xsdatatypes20041028.MakeLowerAliases(e)

Adds aliases by converting all keys to lower case.

Assumes that MakeEnumeration() has already been used to complete the declaration of the enumeration. You must call this function to complete the declaration before relying on calls to Enumeration.DecodeLowerValue().
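A sketch of how aliases can extend decoding without disturbing encoding. It assumes the aliases dictionary maps each new alias onto an existing key; Colour, make_enumeration_aliases and make_lower_aliases are hypothetical names rather than pyslet's implementation.

```python
class Colour(object):
    # Hypothetical enumeration, already completed MakeEnumeration-style.
    decode = {"Red": 1, "Green": 2}
    encode = {1: "Red", 2: "Green"}
    Red = 1
    Green = 2


def make_enumeration_aliases(e, aliases):
    # Assumption: 'aliases' maps each new alias onto an existing key.
    # Each alias is added to decode and as a class constant; encode is
    # left untouched so every value still encodes to its canonical string.
    for alias, key in aliases.items():
        value = e.decode[key]
        e.decode[alias] = value
        setattr(e, alias, value)


def make_lower_aliases(e):
    # The alias mapping is built in full before decode is modified.
    make_enumeration_aliases(e, dict((k.lower(), k) for k in list(e.decode)))
```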

6.4.3.2. WhiteSpace

pyslet.xsdatatypes20041028.WhiteSpaceReplace(value)

Replaces each tab, line feed and carriage return with a space.

pyslet.xsdatatypes20041028.WhiteSpaceCollapse(value)

Replaces all runs of white space with a single space. Also removes leading and trailing white space.
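The two whitespace facets can be sketched in a few lines: replace maps each of the three control characters to a space one-for-one, and collapse applies replace before squeezing runs of spaces and trimming the ends. The function names are hypothetical.

```python
import re


def white_space_replace(value):
    # Each tab, line feed or carriage return becomes exactly one space.
    return value.translate({0x09: 0x20, 0x0A: 0x20, 0x0D: 0x20})


def white_space_collapse(value):
    # Replace first, then squeeze runs of spaces and trim both ends.
    return re.sub(" +", " ", white_space_replace(value)).strip(" ")
```

Note that replace preserves the length of the string ("a\r\nb" becomes "a  b"), whereas collapse does not.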

6.4.4. Regular Expressions

Appendix F of the XML Schema datatypes specification defines a regular expression language. This language differs from the native Python regular expression language but it is close enough to enable us to define a wrapper class which parses schema regular expressions and converts them to equivalent python regular expressions.

class pyslet.xsdatatypes20041028.RegularExpression(src)

Models a regular expression as defined by XML schema.

Regular expressions are constructed from unicode source strings. Internally they are parsed and converted to Python regular expressions to speed up matching. Warning: because the XML schema expression language contains concepts not supported by Python, the Python regular expression may not be very readable.

src = None

The original source string.

match(target)

A convenience function, returns True if the expression matches target.
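Two differences that any conversion must handle: XML schema expressions are implicitly anchored at both ends (^ and $ are ordinary characters outside a character class), and the wildcard . excludes carriage return as well as line feed. The hypothetical xsd_to_python function below handles only that subset, plus escapes that mean the same in both languages, and refuses anything else; it is a sketch, not pyslet's parser, which also handles category escapes, block escapes and class subtraction.

```python
import re

# XSD single-character escapes plus \d and \D, which mean the same
# thing to Python's re module for str patterns.
_SAFE_ESCAPES = set("nrt\\|.-^?*+{}()[]dD")


def xsd_to_python(src):
    """Translate a small subset of XSD regex syntax to a Python pattern."""
    out = []
    in_class = False
    i = 0
    while i < len(src):
        c = src[i]
        if c == "\\":
            if i + 1 >= len(src) or src[i + 1] not in _SAFE_ESCAPES:
                raise NotImplementedError("unsupported escape: %r" % src[i:i + 2])
            out.append(src[i:i + 2])
            i += 2
            continue
        if in_class:
            if c == "-" and src.startswith("[", i + 1):
                raise NotImplementedError("class subtraction is not handled")
            if c == "]":
                in_class = False
            out.append(c)
        elif c == "[":
            in_class = True
            out.append(c)
        elif c in "^$":
            # Not anchors in XSD: outside a class they are plain characters.
            out.append("\\" + c)
        elif c == ".":
            # The XSD wildcard excludes both line feed and carriage return.
            out.append("[^\\n\\r]")
        else:
            out.append(c)
        i += 1
    return "".join(out)


def xsd_match(src, target):
    # XSD expressions must match the whole target string.
    return re.fullmatch(xsd_to_python(src), target) is not None
```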

For completeness we also document the parser we use to do the conversion, it draws heavily on the pyslet.unicode5.CharClass concept.

class pyslet.xsdatatypes20041028.RegularExpressionParser(source)

Bases: pyslet.unicode5.BasicParser

A custom parser for XML schema regular expressions.

The parser is initialised from a source string, the string to be parsed.

ParseRegExp()

Returns a unicode string representing the regular expression.

ParseBranch()

Returns a unicode string representing this branch as a Python regular expression.

ParseQuantifier()

Returns a tuple of n,m.

Symbolic values are expanded to the appropriate pair. The second value may be None indicating unbounded.

ParseQuantity()

Returns a tuple of n,m even if an exact quantity is given.

In other words, the exact quantity ‘n’ returns n,n. The second value may be None, indicating unbounded.

ParseQuantExact()

Returns an integer.
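The (n, m) convention used by ParseQuantifier and ParseQuantity can be illustrated with a standalone function that parses a complete quantifier string. parse_quantifier is hypothetical and works on a whole string rather than on a parser's current position; rejecting {n,m} with m < n is an assumption about how out-of-order ranges are treated.

```python
import re

_QUANTITY = re.compile(r"\{([0-9]+)(,([0-9]*))?\}")


def parse_quantifier(src):
    """Return (n, m) for a quantifier; m is None when unbounded."""
    symbolic = {"?": (0, 1), "*": (0, None), "+": (1, None)}
    if src in symbolic:
        return symbolic[src]
    match = _QUANTITY.fullmatch(src)
    if match is None:
        raise ValueError("invalid quantifier: %r" % src)
    n = int(match.group(1))
    if match.group(2) is None:
        return n, n        # {n}: exact quantity
    if match.group(3) == "":
        return n, None     # {n,}: at least n
    m = int(match.group(3))
    if m < n:
        raise ValueError("quantity out of order: %r" % src)
    return n, m            # {n,m}
```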

ParseAtom()

Returns a unicode string representing this atom as a python regular expression.

IsChar(c=None)

The definition of this function is deliberately conservative with respect to the specification, which is clearly in error around production [10]: the prose and the BNF do not match. It appears that | was intended to be excluded in the prose but has been omitted, with the reverse being true for the curly brackets.
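The conservative reading amounts to excluding the union of both lists, so that neither | nor the curly brackets can appear unescaped as a normal character. A sketch of that test; is_normal_char and _META are hypothetical names.

```python
# Metacharacters from the BNF (. \ ? * + ( ) | [ ]) plus the curly
# brackets named in the prose: the union of the two lists.
_META = set(".\\?*+{}()|[]")


def is_normal_char(c):
    """Return True if c may appear unescaped as a normal character."""
    return len(c) == 1 and c not in _META
```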

ParseCharClass()

Returns a CharClass instance representing this class.

ParseCharClassExpr()

Returns a CharClass instance representing this class expression.

ParseCharGroup()

Returns a CharClass representing this group. This method also handles the case of a class subtraction directly to reduce the need for look-ahead. If you specifically want to parse a subtraction you can do this with ParseCharClassSub().

ParsePosCharGroup()

Returns a CharClass representing a positive range.

ParseNegCharGroup()

Returns a CharClass representing this range.

ParseCharClassSub()

Returns a CharClass representing this range. This method is not normally used by the parser and is present only for completeness; see ParseCharGroup().

ParseCharRange()

Returns a CharClass representing this range.

ParseSERange()

Returns a CharClass representing this range.

ParseCharOrEsc()

Returns a single unicode character.

ParseCharClassEsc()

Returns a CharClass instance representing one of the escape sequences.

ParseSingleCharEsc()

Returns a single unicode character parsed from a single char escape.

ParseCatEsc()

Returns a CharClass, parsing a category escape.

ParseComplEsc()

Returns a CharClass, parsing the complement of a category escape.

ParseCharProp()

Returns a CharClass, parsing an IsCategory or IsBlock.

ParseIsCategory()

Returns a CharClass corresponding to one of the character categories or raises an error.

ParseIsBlock()

Returns a CharClass corresponding to one of the Unicode blocks.

ParseMultiCharEsc()

Returns a CharClass corresponding to one of the multichar escapes, if parsed.

ParseWildcardEsc()

Returns a CharClass corresponding to the wildcard ‘.’ character if parsed.