6.2.5. XML: Schema Datatypes¶
This module implements some useful concepts drawn from http://www.w3.org/TR/xmlschema-2/
One of the main purposes of this module is to provide classes and functions for converting data between python-native representations of the value-spaces defined by this specification and the lexical representations defined in the specification.
The result is typically a pair of x_from_str/x_to_str functions that are used to define custom attribute handling in classes that are derived from Element. For example:
import pyslet.xml.xsdatatypes as xsi

class MyElement(XMLElement):
    XMLNAME = "MyElement"
    XMLATTR_flag = ('flag', xsi.boolean_from_str, xsi.boolean_to_str)
In this example, an element like this:
<MyElement flag="1">...</MyElement>
would cause the instance of MyElement representing this element to have its flag attribute set to the Python constant True instead of a string value. Also, when serializing the element instance, the flag attribute’s value would be converted to the canonical representation, which in this case would be the string “true”. Finally, these functions raise ValueError when conversion fails, an error which the XML parser will escalate to an XML validation error (allowing the document to be rejected in strict parsing modes).
6.2.5.1. Namespace¶
The XML schema namespace is typically used with the prefix xsi.
-
pyslet.xml.xsdatatypes.
XMLSCHEMA_NAMESPACE
= 'http://www.w3.org/2001/XMLSchema-instance'¶ The namespace to use for XML schema elements
6.2.5.2. Primitive Datatypes¶
XML schema’s boolean trivially maps to Python’s True/False
-
pyslet.xml.xsdatatypes.
boolean_from_str
(src)¶ Decodes a boolean value from src.
Returns python constants True or False. As a convenience, if src is None then None is returned.
-
pyslet.xml.xsdatatypes.
boolean_to_str
(src)¶ Encodes a boolean using the canonical lexical representation.
- src
- Anything that can be resolved to a boolean except None, which raises ValueError.
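The decode/encode contract described for the boolean functions can be illustrated with a standalone sketch. This is not the module's implementation: the names sketch_boolean_from_str and sketch_boolean_to_str are hypothetical, the accepted literals (true, false, 1, 0) are taken from the XML Schema specification, and the absence of whitespace handling is an assumption of the sketch.

```python
def sketch_boolean_from_str(src):
    """Decode an XML Schema boolean literal (hypothetical helper)."""
    if src is None:
        # Convenience behaviour described above: None passes through.
        return None
    if src in ("true", "1"):
        return True
    if src in ("false", "0"):
        return False
    raise ValueError("not a valid boolean literal: %r" % (src,))


def sketch_boolean_to_str(value):
    """Encode using the canonical literals "true" and "false"."""
    if value is None:
        raise ValueError("None cannot be encoded as a boolean")
    return "true" if value else "false"
```

Note that decoding "1" and re-encoding the result yields "true", which is the canonicalization effect described in the introduction.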
The decimal, float and double types are represented by Python’s native float type, but the functions used to encode them to, and decode them from, strings differ from native conversion in order to adhere more closely to the schema specification and to ensure that, by default, canonical lexical representations are used.
-
pyslet.xml.xsdatatypes.
decimal_from_str
(src)¶ Decodes a decimal from a string returning a python float value.
If string is not a valid lexical representation of a decimal value then ValueError is raised.
-
pyslet.xml.xsdatatypes.
decimal_to_str
(value, digits=None, strip_zeros=True, **kws)¶ Encodes a decimal value into a string.
- digits
- You can control the maximum number of digits after the decimal point using digits, which must be greater than 0; None indicates no maximum.
- strip_zeros (aka stripZeros)
- By default this function returns the canonical representation, which means that it will strip trailing zeros in the fractional part. To override this behaviour and return exactly digits decimal places set strip_zeros to False.
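The interaction between digits and strip_zeros can be sketched with the standard library's decimal module. The function sketch_decimal_to_str is hypothetical and only approximates the behaviour described above; it assumes finite inputs and relies on the XML Schema rule that a canonical decimal always has at least one digit on each side of the point.

```python
from decimal import Decimal


def sketch_decimal_to_str(value, digits=None, strip_zeros=True):
    """Format a finite number as an XML Schema style decimal (sketch)."""
    # repr() gives the shortest string that round-trips the float.
    number = Decimal(repr(float(value)))
    if digits is not None:
        # Round to at most `digits` places after the decimal point.
        number = number.quantize(Decimal(10) ** -digits)
    text = format(number, "f")  # plain notation, never an exponent
    if "." not in text:
        text += ".0"
    if strip_zeros:
        whole, frac = text.split(".")
        # Keep a single zero if the fraction would otherwise be empty.
        text = whole + "." + (frac.rstrip("0") or "0")
    return text
```

With strip_zeros left at True the value 1.5 is rendered as "1.5" even when digits is 2; passing strip_zeros=False gives "1.50".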
-
pyslet.xml.xsdatatypes.
float_from_str
(src)¶ Decodes a float value from a string returning a python float.
The precision of the Python float varies depending on the implementation. It typically exceeds the precision of the XML schema float. We make no attempt to reduce the precision to that of schema’s float except that we return 0.0 or -0.0 for any value that is smaller than the smallest possible float defined in the specification. (Note that XML schema’s float canonicalizes the representation of zero to remove this subtle distinction but it can be useful to preserve it for numerical operations.) Likewise, if we are given a representation that is larger than any valid float we return one of the special float values INF or -INF as appropriate.
-
pyslet.xml.xsdatatypes.
float_to_str
(value)¶ Encodes a python float value as a string.
To reduce the chances of our output being rejected by external applications that are strictly bound to a 32-bit float representation we ensure that we don’t output values that exceed the bounds of float defined by XML schema.
Therefore, we convert values that are too large to INF and values that are too small to 0.0E0.
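The range checks described for the float functions can be sketched as follows. The helper sketch_clamp_to_xsd_float and the two constants are hypothetical names. The bounds come from the XML Schema definition of float as m × 2^e with |m| < 2^24 and -149 ≤ e ≤ 104, and preserving the sign of an underflowing value is an assumption borrowed from the description of float_from_str.

```python
import math

# Largest and smallest positive magnitudes of an XML Schema float.
XSD_FLOAT_MAX = (2 ** 24 - 1) * 2.0 ** 104
XSD_FLOAT_MIN = 2.0 ** -149


def sketch_clamp_to_xsd_float(value):
    """Clamp a Python float to the range of an XML Schema float (sketch)."""
    if math.isnan(value) or math.isinf(value):
        return value
    magnitude = abs(value)
    if magnitude > XSD_FLOAT_MAX:
        # Too large: becomes INF or -INF.
        return math.copysign(float("inf"), value)
    if 0.0 < magnitude < XSD_FLOAT_MIN:
        # Too small: becomes a (signed) zero.
        return math.copysign(0.0, value)
    return value
```

The sketch leaves precision untouched; only values outside the representable range are altered.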
-
pyslet.xml.xsdatatypes.
double_from_str
(src)¶ Decodes a double value from a string returning a python float.
The precision of the python float varies depending on the implementation. It may even exceed the precision of the XML schema double. The current implementation ignores this distinction.
-
pyslet.xml.xsdatatypes.
double_to_str
(value, digits=None, strip_zeros=True, **kws)¶ Encodes a double value returning a character string.
- digits
- Controls the number of digits after the decimal point in the mantissa. None indicates no maximum, in which case the precision of Python’s float is used to determine the appropriate number. You may pass the value 0, in which case no digits are given after the point and the point itself is omitted, but such values are not in their canonical form.
- strip_zeros (aka stripZeros)
- Determines whether or not trailing zeros are removed. If False then exactly digits digits will be displayed after the point. By default zeros are stripped (except that there is always one zero left after the decimal point).
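A rough sketch of the mantissa/exponent formatting controlled by digits and strip_zeros is given below. The function sketch_double_to_str is hypothetical, uses a default of 6 digits instead of None for simplicity, and does not handle INF or NaN. It is intended only to show the shape of the output, not the module's algorithm.

```python
def sketch_double_to_str(value, digits=6, strip_zeros=True):
    """Format a finite float as mantissa, 'E' and exponent (sketch)."""
    # '%.*E' yields e.g. '1.500000E-02'; split into the two parts.
    mantissa, exponent = ("%.*E" % (digits, value)).split("E")
    if strip_zeros and "." in mantissa:
        whole, frac = mantissa.split(".")
        # Always leave one zero after the point.
        mantissa = whole + "." + (frac.rstrip("0") or "0")
    # int() removes the sign padding and leading zeros: '+02' -> 2.
    return mantissa + "E" + str(int(exponent))
```

Passing digits=0 produces a mantissa without a decimal point (for example "1E2"), which illustrates the non-canonical form mentioned in the description of digits.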
-
class
pyslet.xml.xsdatatypes.
Duration
(value=None)¶ Bases:
pyslet.iso8601.Duration
Represents duration values.
Extends the basic iso duration class to include negative durations.
-
sign
= None¶ an integer with the sign of the duration
-
dateTime values are represented by pyslet.iso8601.TimePoint
instances. These functions are provided for convenience in custom
attribute mappings.
-
pyslet.xml.xsdatatypes.
datetime_from_str
(src)¶ Returns a
pyslet.iso8601.TimePoint
instance.
-
pyslet.xml.xsdatatypes.
datetime_to_str
(value)¶ Returns the canonical lexical representation for dateTime
- value:
- An instance of
pyslet.iso8601.TimePoint
6.2.5.3. Derived Datatypes¶
Name represents XML Names; the native Python character string is used.
-
pyslet.xml.xsdatatypes.
name_from_str
(src)¶ Decodes a name from a string.
Returns the same string or raises ValueError if src does not match the XML production Name.
-
pyslet.xml.xsdatatypes.
name_to_str
(src)¶ Encodes a name
A convenience function (equivalent to pyslet.py2.to_text()).
Integer is represented by the native Python integer.
-
pyslet.xml.xsdatatypes.
integer_from_str
(src)¶ Decodes an integer
If string is not a valid lexical representation of an integer then ValueError is raised. This uses XML Schema’s lexical rules which are slightly different from Python’s native conversion.
-
pyslet.xml.xsdatatypes.
integer_to_str
(value)¶ Encodes an integer value using the canonical lexical representation.
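The difference between XML Schema's lexical rules and Python's native conversion can be illustrated with a small sketch. The names below are hypothetical and the regular expression reflects the schema's integer lexical form (an optional sign followed by decimal digits). Python's int() additionally accepts surrounding whitespace and, in recent versions, underscores between digits.

```python
import re

# Optional sign followed by one or more ASCII digits, nothing else.
_XSD_INTEGER = re.compile(r"[+-]?[0-9]+")


def sketch_integer_from_str(src):
    """Decode an integer using XML Schema's lexical rules (sketch)."""
    if not isinstance(src, str) or _XSD_INTEGER.fullmatch(src) is None:
        raise ValueError("not a valid integer literal: %r" % (src,))
    return int(src)


def sketch_integer_to_str(value):
    """Canonical form: no leading '+' and no leading zeros."""
    return str(int(value))
```

For example, "+007" decodes to 7 and is re-encoded as "7", whereas " 7 " and "1_000" are rejected by the sketch even though int() accepts them.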
6.2.5.4. Constraining Facets¶
6.2.5.4.1. Enumeration¶
-
class
pyslet.xml.xsdatatypes.
Enumeration
¶ Bases:
pyslet.xml.xsdatatypes.EnumBase
Abstract class for defining enumerations
The class is not designed to be instantiated but to act as a method of defining constants to represent the values of an enumeration and for converting between those constants and the appropriate string representations.
The basic usage of this class is to derive a class from it with a single class member called ‘decode’ which is a mapping from canonical strings to simple integers.
Once defined, the class will be automatically populated with a reverse mapping dictionary (called encode) and the enumeration strings will be added as attributes of the class itself. For example:
class Fruit(Enumeration):
    decode = {
        'Apple': 1,
        'Pear': 2,
        'Orange': 3}

Fruit.Apple == 1    # True thanks to metaclass
You can define additional mappings by providing a second dictionary called aliases that maps additional names onto existing values. The aliases dictionary is a mapping from strings onto the equivalent canonical string:
class Vegetables(Enumeration):
    decode = {
        'Tomato': 1,
        'Potato': 2,
        'Courgette': 3}
    aliases = {
        'Zucchini': 'Courgette'}

Vegetables.Zucchini == 3    # True thanks to metaclass
You may also add the special key None to the aliases dictionary to define a default value for the enumeration. This is mapped to an attribute called DEFAULT:
class Staples(Enumeration):
    decode = {
        'Bread': 1,
        'Pasta': 2,
        'Rice': 3}
    aliases = {
        None: 'Bread'}

Staples.DEFAULT == 1    # True thanks to metaclass
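For readers unfamiliar with metaclasses, the following standalone sketch shows one way the class population described above could work. SketchEnumMeta is a hypothetical name and this is not pyslet's metaclass; it is written for Python 3 and covers only the decode, aliases and DEFAULT behaviour.

```python
class SketchEnumMeta(type):
    """Hypothetical metaclass that populates an enumeration class."""

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        decode = dict(namespace.get("decode", {}))
        aliases = namespace.get("aliases", {})
        # Reverse mapping, built from the canonical strings only.
        cls.encode = {value: label for label, value in decode.items()}
        # Each canonical string becomes an attribute of the class.
        for label, value in decode.items():
            setattr(cls, label, value)
        cls.DEFAULT = None
        for alias, canonical in aliases.items():
            if alias is None:
                # The special key None defines the default value.
                cls.DEFAULT = decode[canonical]
            else:
                setattr(cls, alias, decode[canonical])
        return cls


class Vegetables(metaclass=SketchEnumMeta):
    decode = {"Tomato": 1, "Potato": 2, "Courgette": 3}
    aliases = {"Zucchini": "Courgette", None: "Tomato"}
```

Because the work happens when the class statement is executed, no separate set-up call is needed before Vegetables.Zucchini or Vegetables.DEFAULT can be used.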
-
DEFAULT
= None¶ The DEFAULT value of the enumeration defaults to None
-
classmethod
from_str
(src)¶ Decodes a string returning a value in this enumeration.
If no legal value can be decoded then ValueError is raised.
-
classmethod
from_str_lower
(src)¶ Decodes a string, converting it to lower case first.
Returns a value in this enumeration. If no legal value can be decoded then ValueError is raised.
-
classmethod
from_str_upper
(src)¶ Decodes a string, converting it to upper case first.
Returns a value in this enumeration. If no legal value can be decoded then ValueError is raised.
-
classmethod
from_str_title
(src)¶ Decodes a string, converting it to title case first.
Title case is defined as an initial upper case letter with all other letters lower case.
Returns a value in this enumeration. If no legal value can be decoded then ValueError is raised.
-
classmethod
list_from_str
(decoder, src)¶ Decodes a list of values
- decoder
- One of the from_str methods.
- src
- A space-separated string of values
The result is an ordered list of values (possibly containing duplicates).
Example usage:
Fruit.list_from_str(Fruit.from_str_title, "apple orange pear")
# returns [ Fruit.Apple, Fruit.Orange, Fruit.Pear ]
-
classmethod
dict_from_str
(decoder, src)¶ Decodes a dictionary of values
- decoder
- One of the from_str methods
- src
- A space-separated string of values.
The result is a dictionary mapping the values found as keys onto the strings used to represent them. Duplicates are mapped to the first occurrence of the encoded value.
Example usage:
Fruit.dict_from_str(Fruit.from_str_title, "Apple orange PEAR apple")
# returns {Fruit.Apple: 'Apple', Fruit.Orange: 'orange',
#          Fruit.Pear: 'PEAR' }
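The list and dictionary decoders can be sketched independently of the Enumeration class. All names below are hypothetical; DECODE stands in for the decode mapping of the Fruit example, and the title-case rule follows the definition given for from_str_title.

```python
DECODE = {"Apple": 1, "Pear": 2, "Orange": 3}


def sketch_from_str_title(src):
    """Decode after converting to title case (sketch)."""
    key = src.strip()
    key = key[:1].upper() + key[1:].lower()
    try:
        return DECODE[key]
    except KeyError:
        raise ValueError("no such enumeration value: %r" % (src,))


def sketch_list_from_str(decoder, src):
    """Ordered list of decoded values; duplicates are kept."""
    return [decoder(item) for item in src.split()]


def sketch_dict_from_str(decoder, src):
    """Map each decoded value to the first string that produced it."""
    result = {}
    for item in src.split():
        result.setdefault(decoder(item), item)
    return result
```

In the dictionary form the second occurrence of a value (such as the trailing "apple") does not overwrite the string recorded for the first.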
-
classmethod
to_str
(value)¶ Encodes one of the enumeration constants returning a string.
If value is None then the encoded default value is returned (if defined) or None.
-
classmethod
list_to_str
(value_list)¶ Encodes a list of enumeration constants
- value_list
- A list or iterable of integer values corresponding to enumeration constants.
Returns a space-separated string. If value_list is empty then an empty string is returned.
-
classmethod
dict_to_str
(value_dict, sort_keys=True, **kws)¶ Encodes a dictionary of enumeration constants
- value_dict
- A dictionary with integer keys corresponding to enumeration constant values.
- sort_keys
- Boolean indicating that the result should be sorted by constant value. (Defaults to True.)
Returns a space-separated string. If value_dict is empty then an empty string is returned. The values in the dictionary are ignored; the keys are used to obtain the canonical representation of each value. Extending the example given in dict_from_str():
Fruit.dict_to_str(
    {Fruit.Apple: 'Apple', Fruit.Orange: 'orange', Fruit.Pear: 'PEAR'})
# returns: "Apple Pear Orange"
The order of the values in the string is determined by the sort order of the enumeration constants (not their string representation). This ensures that equivalent dictionaries are always encoded to the same string. In the above example:
Fruit.Apple < Fruit.Pear < Fruit.Orange
If you have large lists then you can skip the sorting step by passing False for sort_keys to improve performance at the expense of an unpredictable encoding.
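The effect of sort_keys can be sketched as follows. ENCODE is a hypothetical stand-in for the reverse mapping of the Fruit example and sketch_dict_to_str is not the library's method.

```python
ENCODE = {1: "Apple", 2: "Pear", 3: "Orange"}


def sketch_dict_to_str(value_dict, sort_keys=True):
    """Encode the keys of value_dict as a space-separated string."""
    # Sorting by constant value makes the encoding deterministic.
    keys = sorted(value_dict) if sort_keys else list(value_dict)
    return " ".join(ENCODE[key] for key in keys)
```

Two dictionaries with the same keys inserted in a different order encode to the same string when sort_keys is True; with sort_keys set to False the output follows the dictionary's own iteration order.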
-
classmethod
DecodeLowerValue
(*args, **kwargs)¶ Deprecated equivalent to
from_str_lower()
-
classmethod
DecodeTitleValue
(*args, **kwargs)¶ Deprecated equivalent to
from_str_title()
-
classmethod
DecodeUpperValue
(*args, **kwargs)¶ Deprecated equivalent to
from_str_upper()
-
classmethod
DecodeValue
(*args, **kwargs)¶ Deprecated equivalent to
from_str()
-
classmethod
DecodeValueDict
(*args, **kwargs)¶ Deprecated equivalent to
dict_from_str()
-
classmethod
DecodeValueList
(*args, **kwargs)¶ Deprecated equivalent to
list_from_str()
-
classmethod
EncodeValueDict
(*args, **kwargs)¶ Deprecated equivalent to
dict_to_str()
-
classmethod
EncodeValueList
(*args, **kwargs)¶ Deprecated equivalent to
list_to_str()
-
-
class
pyslet.xml.xsdatatypes.
EnumerationNoCase
¶ Bases:
pyslet.xml.xsdatatypes.Enumeration
Convenience class that automatically adds lower-case aliases
On creation, the enumeration ensures that aliases equivalent to the lower-cased canonical strings are defined. Designed to be used in conjunction with
from_str_lower()
for case-insensitive matching of enumeration strings.
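The idea behind the automatic lower-case aliases can be sketched with a plain dictionary. The helper sketch_add_lower_aliases is hypothetical and works on a copy of the decode mapping instead of a class.

```python
def sketch_add_lower_aliases(decode):
    """Return a copy of decode with lower-cased keys added (sketch)."""
    result = dict(decode)
    for label, value in decode.items():
        # Existing entries win if a lower-cased key is already present.
        result.setdefault(label.lower(), value)
    return result


lookup = sketch_add_lower_aliases({"Apple": 1, "Pear": 2})
```

A caller that lower-cases its input before the lookup (as from_str_lower() is described as doing) will then match regardless of the case used in the source string.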
6.2.5.5. Regular Expressions¶
Appendix F of the XML Schema datatypes specification defines a regular expression language. This language differs from the native Python regular expression language but it is close enough to enable us to define a wrapper class which parses schema regular expressions and converts them to equivalent python regular expressions.
-
class
pyslet.xml.xsdatatypes.
RegularExpression
(src)¶ Bases:
pyslet.py2.UnicodeMixin
A regular expression as defined by XML schema.
Regular expressions are constructed from character strings. Internally they are parsed and converted to Python regular expressions to speed up matching.
Warning: because the XML schema expression language contains concepts not supported by Python the python regular expression may not be very readable.
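As an illustration of why the converted expressions can be hard to read, consider character class subtraction, which XML schema supports and Python's re module does not. The hand-written translation below is a sketch of one possible equivalent, not the output of this class. It uses a negative lookahead, and fullmatch reflects the fact that schema expressions are implicitly anchored at both ends.

```python
import re

# XML schema expression: [a-z-[aeiou]]+
# (one or more lower-case ASCII letters that are not vowels)
PYTHON_EQUIVALENT = re.compile(r"(?:(?![aeiou])[a-z])+")


def sketch_match(target):
    """True if the whole of target matches the expression (sketch)."""
    return PYTHON_EQUIVALENT.fullmatch(target) is not None
```

Here sketch_match("rhythm") succeeds while sketch_match("rhyme") fails because of the final vowel.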
-
src
= None¶ the original source string
-
p
= None¶ the compiled python regular expression
-
match
(target)¶ Returns True if the expression matches target.
-
For completeness we also document the parser we use to do the
conversion, it draws heavily on the
pyslet.unicode5.CharClass
concept.
-
class
pyslet.xml.xsdatatypes.
RegularExpressionParser
(source)¶ Bases:
pyslet.unicode5.BasicParser
A custom parser for XML schema regular expressions.
The parser is initialised from a character string and always operates in text mode.
-
require_reg_exp
()¶ Parses a regExp
Returns a unicode string representing the regular expression.
-
require_branch
()¶ Parses branch
Returns a character string representing these pieces as a python regular expression.
-
require_piece
()¶ Parses piece
Returns a character string representing this piece in python regular expression format.
-
require_quantifier
()¶ Parses quantifier
Returns a tuple of (n, m).
Symbolic values are expanded to the appropriate pair. The second value may be None indicating unbounded.
-
require_quantity
()¶ Parses quantity
Returns a tuple of (n, m) even if an exact quantity is given.
In other words, the exact quantity ‘n’ returns (n, n). The second value may be None indicating unbounded.
-
require_quant_exact
()¶ Parses quantExact
Returns the integer value parsed.
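The (n, m) pairs returned by the quantifier methods can be illustrated with a standalone sketch. The function sketch_parse_quantifier is hypothetical and operates on a complete quantifier string, whereas the parser consumes characters from its input. The symbolic expansions follow the usual meanings of ?, * and +.

```python
import re

# {n}, {n,} or {n,m}
_QUANTITY = re.compile(r"\{([0-9]+)(?:(,)([0-9]*))?\}")


def sketch_parse_quantifier(text):
    """Return (n, m) for a quantifier; m is None when unbounded."""
    symbolic = {"?": (0, 1), "*": (0, None), "+": (1, None)}
    if text in symbolic:
        return symbolic[text]
    match = _QUANTITY.fullmatch(text)
    if match is None:
        raise ValueError("not a quantifier: %r" % (text,))
    n = int(match.group(1))
    if match.group(2) is None:
        return (n, n)          # exact quantity, e.g. {3}
    if match.group(3) == "":
        return (n, None)       # open-ended range, e.g. {2,}
    return (n, int(match.group(3)))
```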
-
require_atom
()¶ Parses atom
Returns a unicode string representing this atom as a python regular expression.
-
is_char
(c=None)¶ Parses Char
Returns either True or False depending on whether
the_char
satisfies the production Char. The definition of this function is designed to be conservative with respect to the specification, which is clearly in error around production [10] as the prose and the BNF do not match. It appears that | was intended to be excluded in the prose but has been omitted; the reverse is true for the curly brackets.
-
require_char_class
()¶ Parses a charClass.
-
require_char_class_expr
()¶ Parses charClassExpr
-
require_char_group
()¶ Parses charGroup.
This method also handles the case of a class subtraction directly to reduce the need for look-ahead. If you specifically want to parse a subtraction you can do this with
require_char_class_sub()
.
-
require_pos_char_group
()¶ Parses posCharGroup
-
require_neg_char_group
()¶ Parses negCharGroup.
-
require_char_class_sub
()¶ Parses charClassSub
This method is not normally used by the parser and is present for completeness. See
require_char_group()
.
-
require_char_range
()¶ Parses a charRange.
-
require_se_range
()¶ Parses seRange.
-
require_char_or_esc
()¶ Parses charOrEsc.
-
require_char_class_esc
()¶ Parses charClassEsc.
Returns a CharClass instance.
-
require_single_char_esc
()¶ Parses SingleCharEsc
Returns a single character.
-
require_cat_esc
()¶ Parses catEsc.
-
require_compl_esc
()¶ Parses complEsc.
-
require_char_prop
()¶ Parses a charProp.
-
require_is_category
()¶ Parses IsCategory.
-
require_is_block
()¶ Parses IsBlock.
-
require_multi_char_esc
()¶ Parses a MultiCharEsc.
-
require_wildcard_esc
()¶ Parses ‘.’, the wildcard CharClass
-
6.2.5.6. Backwards Compatibility¶
-
pyslet.xml.xsdatatypes.
DecodeBoolean
(*args, **kwargs)¶ Deprecated equivalent to
boolean_from_str()
-
pyslet.xml.xsdatatypes.
EncodeBoolean
(*args, **kwargs)¶ Deprecated equivalent to
boolean_to_str()
-
pyslet.xml.xsdatatypes.
DecodeDecimal
(*args, **kwargs)¶ Deprecated equivalent to
decimal_from_str()
-
pyslet.xml.xsdatatypes.
EncodeDecimal
(*args, **kwargs)¶ Deprecated equivalent to
decimal_to_str()
-
pyslet.xml.xsdatatypes.
DecodeFloat
(*args, **kwargs)¶ Deprecated equivalent to
float_from_str()
-
pyslet.xml.xsdatatypes.
EncodeFloat
(*args, **kwargs)¶ Deprecated equivalent to
float_to_str()
-
pyslet.xml.xsdatatypes.
DecodeDouble
(*args, **kwargs)¶ Deprecated equivalent to
double_from_str()
-
pyslet.xml.xsdatatypes.
EncodeDouble
(*args, **kwargs)¶ Deprecated equivalent to
double_to_str()
-
pyslet.xml.xsdatatypes.
DecodeDateTime
(*args, **kwargs)¶ Deprecated equivalent to
datetime_from_str()
-
pyslet.xml.xsdatatypes.
EncodeDateTime
(*args, **kwargs)¶ Deprecated equivalent to
datetime_to_str()
-
pyslet.xml.xsdatatypes.
DecodeName
(*args, **kwargs)¶ Deprecated equivalent to
name_from_str()
-
pyslet.xml.xsdatatypes.
EncodeName
(*args, **kwargs)¶ Deprecated equivalent to
name_to_str()
-
pyslet.xml.xsdatatypes.
DecodeInteger
(*args, **kwargs)¶ Deprecated equivalent to
integer_from_str()
-
pyslet.xml.xsdatatypes.
EncodeInteger
(*args, **kwargs)¶ Deprecated equivalent to
integer_to_str()
-
pyslet.xml.xsdatatypes.
make_enum
(cls, default_value=None, **kws)¶ Deprecated function
This function is no longer required and does nothing unless default_value is passed in which case it adds the DEFAULT attribute to the Enumeration cls as if an alias had been declared for None (see
Enumeration
above for details).
-
pyslet.xml.xsdatatypes.
MakeEnumeration
(*args, **kwargs)¶ Deprecated equivalent to
make_enum()
-
pyslet.xml.xsdatatypes.
make_enum_aliases
(cls, aliases)¶ Deprecated function
Supported for backwards compatibility only. Modify enum class definitions to include aliases as an attribute directly:
class MyEnum(Enumeration):
    decode = {
        # strings to ints mapping
        }
    aliases = {
        # aliases to strings mapping
        }
The new metaclass takes care of processing the aliases dictionary when the class is created.
-
pyslet.xml.xsdatatypes.
MakeEnumerationAliases
(*args, **kwargs)¶ Deprecated equivalent to
make_enum_aliases()
-
pyslet.xml.xsdatatypes.
make_lower_aliases
(cls)¶ Deprecated function
Supported for backwards compatibility. Use new class
EnumerationNoCase
instead. Warning: the new class will only add lower-case aliases for the canonical strings; any additional aliases (defined in the aliases dictionary attribute) must already be lower-case or be defined with both case variants.
-
pyslet.xml.xsdatatypes.
MakeLowerAliases
(*args, **kwargs)¶ Deprecated equivalent to
make_lower_aliases()