6.2.2. XML: Reference¶

6.2.2.1. Documents and Elements¶

class pyslet.xml.structures.Node(parent=None)¶

Bases: pyslet.py2.UnicodeMixin, pyslet.pep8.MigratedClass

Base class for Element and Document shared attributes.

XML documents are defined hierarchically: each element has a parent, which is either another element or an XML document.

get_children()¶: Returns an iterator over this object’s children.

classmethod get_element_class(name)¶

Returns a class object for representing an element

name: a unicode string representing the element name.

The default implementation returns None - for elements this has the effect of deferring the call to the parent document (where this method is overridden to return Element).

This method is called immediately prior to add_child() and (when applicable) get_child_class().

The real purpose of this method is to allow an element class to directly control the way the name of a child element maps to the class used to represent it. You would normally override this method in the Document to map element names to classes, but in some cases you may want to tweak the mapping at the individual element level, for example, if the same element name is used for two different purposes in the same XML document. Although confusing, this is allowed in XML schema.

get_child_class(stag_class)¶

Supports custom content model handling

stag_class: The class of an element that is about to be created in the current context with add_child(), or the builtin str if data has been received in a context where only element content was expected.

This method is only called when the XMLParser.sgml_omittag option is in effect. It is called prior to add_child() and gives the context (the parent element or document) a chance to modify the child element that will be created or indicate the end of the current element through use of the OMITTAG feature of SGML.

It returns the class of an element whose start tag has been omitted from the document and should be added at this point, or None if stag_class implies the end of the current element and the end tag may be omitted.

Otherwise this method should return stag_class unchanged (the default implementation does this) indicating that the parser should proceed as normal. In the case of unexpected data this is treated as a validity error and handled according to the parser’s validity checking options.

Validation errors are dealt with by the parser or, where the model is encoded into the classes themselves, by add_child() and not by this method, which should never raise validation errors.

Although not necessary for true XML parsing, this method allows us to support the parsing of XML-like documents that omit tags, such as HTML. For example, suppose we have the following document:

<title>My Blank HTML Page</title>

The parser would recognise the start tag for <title> and then call this method (on the HTML document) passing the pyslet.html.Title class. For HTML documents, this method always returns the pyslet.html401.HTML class (ignoring stag_class completely). The result is that an HTML element is opened instead and the parser tries again, calling this method for the new HTML element. That does not accept Title either and returns the pyslet.html.Head class. Finally, a Head element is opened and that will accept Title as a child so it returns stag_class unchanged and the parser continues having inferred the omitted tags: <html> and <head>.
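
The inference loop described above can be sketched as follows. This is a standalone illustration that does not import pyslet: the classes HTMLDocument, HTML, Head and Title and the helper infer_omitted are hypothetical stand-ins that only mimic the get_child_class() protocol (using classmethods for brevity), and the None return value that signals the end of the current element is not modelled.

```python
# Standalone illustration: these classes only mimic the protocol described
# above and are not the real pyslet HTML classes.


class Title(object):
    @classmethod
    def get_child_class(cls, stag_class):
        return stag_class


class Head(object):
    @classmethod
    def get_child_class(cls, stag_class):
        # Head accepts Title, so the requested class is returned unchanged.
        return stag_class


class HTML(object):
    @classmethod
    def get_child_class(cls, stag_class):
        # HTML does not accept Title directly: a Head must be opened first.
        return stag_class if stag_class is Head else Head


class HTMLDocument(object):
    @classmethod
    def get_child_class(cls, stag_class):
        # The document always opens an HTML element, ignoring stag_class.
        return HTML


def infer_omitted(context, stag_class):
    """Returns the classes opened implicitly before stag_class."""
    opened = []
    while True:
        child_class = context.get_child_class(stag_class)
        if child_class is stag_class:
            # The context accepts the element: proceed as normal.
            return opened
        # An omitted start tag was inferred; open it and ask again.
        opened.append(child_class)
        context = child_class


omitted = infer_omitted(HTMLDocument, Title)
```

Here omitted lists the HTML and Head stand-ins, in that order, corresponding to the inferred <html> and <head> tags.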

add_child(child_class, name=None)¶

Returns a new child of the given class attached to this object.

child_class: A class (or callable) used to create a new instance of Element.
name: The name given to the element (by the caller). If no name is given then the default name for the child is used. When the child returned is an existing instance, name is ignored.

processing_instruction(target, instruction='')¶

Abstract method for handling processing instructions

By default, processing instructions are ignored.

get_base()¶

Returns the base URI for a node

Abstract method. When used on a Document, it returns the URI used to load the document, if known.

set_base(base)¶

Sets the base URI of a node.

base: A string suitable for setting xml:base or a pyslet.rfc2396.URI instance.

Abstract method. Changing the base affects the interpretation of all relative URIs in this node and its children.

get_lang()¶

Get the language of a node

Abstract method. When used on a Document, it gets the default language to use in the absence of an explicit xml:lang value.

set_lang(lang)¶

Set the language of a node

lang: A string suitable for setting the xml:lang attribute of an element.

Abstract method. When used on a Document, it sets a default language to use in the absence of an explicit xml:lang value.

get_space()¶

Gets the space policy of a node

Abstract method. When used on a Document, it gets the default space policy to use in the absence of an explicit xml:space value.

ChildElement(*args, **kwargs)¶: Deprecated equivalent to add_child()

GetBase(*args, **kwargs)¶: Deprecated equivalent to get_base()

GetChildClass(*args, **kwargs)¶: Deprecated equivalent to get_child_class()

GetChildren(*args, **kwargs)¶: Deprecated equivalent to get_children()

classmethod GetElementClass(*args, **kwargs)¶: Deprecated equivalent to get_element_class()

GetLang(*args, **kwargs)¶: Deprecated equivalent to get_lang()

GetSpace(*args, **kwargs)¶: Deprecated equivalent to get_space()

SetBase(*args, **kwargs)¶: Deprecated equivalent to set_base()

SetLang(*args, **kwargs)¶: Deprecated equivalent to set_lang()

class pyslet.xml.structures.Document(root=None, base_uri=None, req_manager=None, **kws)¶

Bases: pyslet.xml.structures.Node

Base class for all XML documents.

With no arguments, a new Document is created with no base URI or root element.

root

If root is a class object (descended from Element) it is used to create the root element of the document.

If root is an orphan instance of Element (i.e., it has no parent) it is used as the root element of the document and its Element.attach_to_doc() method is called.

base_uri (aka baseURI for backwards compatibility)

See set_base() for more information

req_manager (aka reqManager for backwards compatibility)

Sets the request manager object to use for future HTTP calls. Must be an instance of pyslet.http.client.Client.

base_uri = None¶: The base uri of the document (as an URI instance)

lang = None¶: The default language of the document (see set_lang()).

declaration = None¶: The XML declaration (or None if no XMLDeclaration is used)

dtd = None¶: The dtd associated with the document or None.

root = None¶: The root element or None if no root element has been created yet.

get_children()¶: Yields the root element

XMLParser(entity)¶

Creates a parser for this document

entity: The entity to parse the document from

The default implementation creates an instance of XMLParser.

This method allows some document classes to override the parser used to parse them. This method is only used when parsing existing document instances (see read() for more information).

Classes that override this method may still register themselves with register_doc_class(), but if they do then the default XMLParser object will be used, as automatic detection of document class is done by the parser itself based on the information in the prolog (and/or first element).

classmethod get_element_class(name)¶

Defaults to returning Element.

Derived classes override this method to enable the XML parser to create instances of custom classes based on the document context and element name.

add_child(child_class, name=None)¶

Creates the root element of the document.

If there is already a root element it is detached from the document first using Element.detach_from_doc().

Unlike Element.add_child() there are no model customization options. The root element is always found at root.

set_base(base_uri)¶

Sets the base_uri of the document to the given URI.

base_uri: An instance of pyslet.rfc2396.URI or an object that can be passed to its constructor.

Relative file paths are resolved relative to the current working directory immediately and the absolute URI is recorded as the document’s base_uri.

get_base()¶: Returns a string representation of the document’s base_uri.

get_lang()¶: Returns the default language for the document.

set_lang(lang)¶: Sets the default language for the document.

get_space()¶

Returns the default space policy for the document.

By default we return None, indicating that no policy is in force. Derived documents can override this behaviour to return either “preserve” or “default” to affect space handling.

validation_error(msg, element, data=None, aname=None)¶

Called when a validation error is triggered.

msg: contains a brief message suitable for describing the error in a log file.
element: the element in which the validation error occurred
data, aname: See Element.validation_error().

Prior to raising XMLValidityError this method logs a suitable message at WARN level.

register_element(element)¶

Registers an element’s ID

If the element has an ID attribute it is added to the internal ID table. If the ID already exists XMLIDClashError is raised.

unregister_element(element)¶

Removes an element’s ID

If the element has a uniquely defined ID it is removed from the internal ID table. Called prior to detaching the element from the document.

get_element_by_id(id)¶

Returns the element with a given ID

Returns None if the ID is not the ID of any element.
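
The ID table behaviour described by register_element(), unregister_element() and get_element_by_id() can be modelled with a plain dictionary. The sketch below is a standalone illustration, not the library's implementation: IDTable and IDClash are hypothetical names (IDClash stands in for XMLIDClashError), and IDs are passed explicitly rather than read from an element's ID attribute.

```python
class IDClash(Exception):
    """Stand-in for XMLIDClashError in this sketch."""


class IDTable(object):
    """Hypothetical model of a document's internal ID table."""

    def __init__(self):
        self._ids = {}

    def register(self, element_id, element):
        # A second registration of the same ID is an error.
        if element_id in self._ids:
            raise IDClash(element_id)
        self._ids[element_id] = element

    def unregister(self, element_id):
        # Removing an unknown ID is silently ignored in this sketch.
        self._ids.pop(element_id, None)

    def get(self, element_id):
        # None signals that no element has this ID.
        return self._ids.get(element_id)
```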

get_unique_id(base_str=None)¶

Generates a random element ID that is not yet defined

base_str: A suggested prefix (defaults to None).

read(src=None, **kws)¶

Reads this document, parsing it from a source stream.

With no arguments the document is read from the base_uri which must have been specified on construction or with a call to the set_base() method.

src (defaults to None): You can override the document’s base URI by passing a value for src which may be an instance of XMLEntity or a file-like object suitable for passing to read_from_stream().

read_from_stream(src)¶

Reads this document from a stream

src: Any object that can be passed to XMLEntity’s constructor.

If you need more control, for example over encodings, you can create the entity yourself and use read_from_entity() instead.

read_from_entity(e)¶

Reads this document from an entity

e: An XMLEntity instance.

The document is read from the current position in the entity.

create(dst=None, **kws)¶

Creates the Document.

Outputs the document as an XML stream.

dst (defaults to None): The stream is written to the base_uri by default but if the ‘dst’ argument is provided then it is written directly to there instead. dst can be any object that supports the writing of binary strings.

Currently only documents with file type baseURIs are supported. The file’s parent directories are created if required. The file is always written using UTF-8 as per the XML standard.

generate_xml(escape_function=<function escape_char_data>, tab='\t', encoding='UTF-8')¶

A generator that yields serialised XML

escape_function: The function that will be used to escape character data. The default is escape_char_data(). The alternate name escapeFunction is supported for backwards compatibility.
tab (defaults to ‘\t’): Whether or not indentation will be used is determined by the tab parameter. If it is empty then no pretty-printing is performed, otherwise elements are indented (where allowed by their defining classes) for ease of reading.
encoding (defaults to “UTF-8”): The name of the character encoding to put in the XML declaration.

Yields character strings, the first string being the XML declaration which always specifies the encoding UTF-8

write_xml(writer, escape_function=<function escape_char_data>, tab='\t')¶

Writes serialized XML to an output stream

writer: A file or file-like object operating in binary mode.

The other arguments follow the same pattern as generate_xml() which this method uses to create the output which is always UTF-8 encoded.
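
The relationship between the two methods (a generator of character strings feeding a binary writer) can be illustrated with the following standalone sketch. The functions generate_xml_sketch and write_xml_sketch are hypothetical stand-ins and the document content is invented for the example; nothing here calls pyslet.

```python
import io


def generate_xml_sketch():
    """Yields character strings, starting with the XML declaration."""
    yield '<?xml version="1.0" encoding="UTF-8"?>'
    yield '\n<greeting>caf\u00e9</greeting>'


def write_xml_sketch(writer):
    """Encodes each generated string as UTF-8 and writes the bytes."""
    for chunk in generate_xml_sketch():
        writer.write(chunk.encode("utf-8"))


buf = io.BytesIO()          # any binary-mode file-like object will do
write_xml_sketch(buf)
data = buf.getvalue()
```

A writer opened in text mode would reject the encoded bytes, which is why the writer argument is described as operating in binary mode.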

update(**kws)¶

Updates the Document.

Update outputs the document as an XML stream. The stream is written to the base_uri which must already exist! Currently only documents with file type baseURIs are supported.

diff_string(other_doc, before=10, after=5)¶

Compares XML documents

other_doc: Another Document instance to compare with.
before (default 10): Number of lines before the first difference to output
after (default 5): Number of lines after the first difference to output

The two documents are converted to character strings and then compared line by line until a difference is found. The result is suitable for logging or error reporting. Used mainly to make the output of unittests easier to understand.
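
The line-by-line comparison can be sketched as follows. This is a standalone illustration under stated assumptions: first_difference is a hypothetical helper operating on two character strings, and the report layout (the ">>>>>" and "<<<<<" markers) is invented for the example rather than taken from diff_string().

```python
def first_difference(text_a, text_b, before=10, after=5):
    """Returns a report around the first differing line, or None."""
    lines_a = text_a.splitlines()
    lines_b = text_b.splitlines()
    for i in range(max(len(lines_a), len(lines_b))):
        # A missing line (one text is shorter) is represented by None.
        a = lines_a[i] if i < len(lines_a) else None
        b = lines_b[i] if i < len(lines_b) else None
        if a != b:
            start = max(0, i - before)
            report = list(lines_a[start:i])          # context before
            report.append(">>>>> %r" % (a,))
            report.append("<<<<< %r" % (b,))
            report.extend(lines_a[i + 1:i + 1 + after])  # context after
            return "\n".join(report)
    return None
```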

Create(*args, **kwargs)¶: Deprecated equivalent to create()

DiffString(*args, **kwargs)¶: Deprecated equivalent to diff_string()

GenerateXML(*args, **kwargs)¶: Deprecated equivalent to generate_xml()

GetElementByID(*args, **kwargs)¶: Deprecated equivalent to get_element_by_id()

GetUniqueID(*args, **kwargs)¶: Deprecated equivalent to get_unique_id()

Read(*args, **kwargs)¶: Deprecated equivalent to read()

ReadFromEntity(*args, **kwargs)¶: Deprecated equivalent to read_from_entity()

ReadFromStream(*args, **kwargs)¶: Deprecated equivalent to read_from_stream()

RegisterElement(*args, **kwargs)¶: Deprecated equivalent to register_element()

UnregisterElement(*args, **kwargs)¶: Deprecated equivalent to unregister_element()

Update(*args, **kwargs)¶: Deprecated equivalent to update()

ValidationError(*args, **kwargs)¶: Deprecated equivalent to validation_error()

WriteXML(*args, **kwargs)¶: Deprecated equivalent to write_xml()

class pyslet.xml.structures.Element(parent, name=None)¶

Bases: pyslet.xml.structures.Node

Base class that represents all XML elements.

This class is usually used only as a default to represent elements with unknown content models or that require no special processing. The power of Pyslet’s XML package comes when different classes are derived from this one to represent the different (classes of) elements defined by an application. These derived classes will normally implement some form of custom serialisation behaviour (see below).

Although derived classes are free to implement a wide range of python protocols they must always return True in truth tests. An implementation of __bool__ (Python 2, __nonzero__) is provided that does this. This ensures that derived classes are free to implement __len__ but bear in mind that an instance of a derived class for which __len__ returns 0 must still evaluate to True.
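
The truth-testing rule can be illustrated with a standalone sketch. AlwaysTrue and Container are hypothetical classes that only mimic the rule described above; they are not the Element class itself.

```python
class AlwaysTrue(object):
    """Mimics the truth-testing rule, not the real Element class."""

    def __bool__(self):
        return True

    __nonzero__ = __bool__  # Python 2 spelling


class Container(AlwaysTrue):
    """A derived class that implements __len__."""

    def __init__(self):
        self.items = []

    def __len__(self):
        return len(self.items)


c = Container()
```

Without the explicit __bool__, Python would fall back to __len__ and an empty Container would evaluate to False. With it, a test such as "if element:" remains a safe way to distinguish an element from None.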

Elements compare equal if their names, attribute lists and canonical children all compare equal. No rich comparison methods are provided.

In addition to truth testing, custom attribute serialisation requires a custom implementation of __getattr__, see below for more details.

Elements are usually constructed by calling the parent element’s (or document’s) Node.add_child() method. When constructed directly, the constructor requires that the parent Node be passed as an argument. If you pass None then an orphan element is created (see attach_to_parent()).

Some aspects of the element’s XML serialisation behaviour are controlled by special class attributes that can be set on derived classes.

XMLNAME: The default name of the element the class represents.
XMLCONTENT: The default content model of the element; one of the ElementType constants.

You can customise attribute mappings using the following special class attributes.

ID: The name of the ID attribute if the element has a unique ID. With this class attribute set, ID handling is automatic (see set_id() and id below).

By default, attributes are simply stored as name/value character strings in an internal dictionary. It is often more useful to map XML attributes directly onto similarly named attributes of the instances that represent each element.

This mapping can be provided using class attributes of the form XMLATTR_aname where /aname/ is the name of the attribute as it would appear in the element’s tag. There are a number of forms of attribute mapping.

XMLATTR_aname=<string>

This form creates a simple mapping from the XML attribute ‘aname’ to a python attribute with a defined name. For example, you might want to create a mapping like this to avoid a python reserved word:

XMLATTR_class="style_class"

This allows XML elements like this:

<element class="x"/>

to be parsed into python objects that behave like this:

element.style_class == "x"     # True

If an instance is missing a python attribute corresponding to a defined XML attribute, or its value has been set to None, then the XML attribute is omitted from the element’s tag when generating XML output.

XMLATTR_aname=(<string>, decode_function, encode_function)

More complex attributes can be handled by setting XMLATTR_aname to a tuple. The first item is the python attribute name (as above); the decode_function is a simple callable that takes a string argument and returns the decoded value of the attribute and the encode_function performs the reverse transformation.

The encode/decode functions can be None to indicate a no-operation.

For example, you might want to create an integer attribute using something like:

<element apples="5"/>

# class attribute definition
XMLATTR_apples = ('n_apples', int, str)

# the resulting object behaves like this...
element.n_apples == 5    # True

XMLATTR_aname=(<string>, decode_function, encode_function, type)

When XML attribute values are parsed from tags the optional type component of the tuple descriptor can be used to indicate a multi-valued attribute. For example, you might want to use a multi-valued mapping for XML attributes defined using one of the plural forms, IDREFS, ENTITIES and NMTOKENS.

If the type value is not None then the XML attribute value is first split by white-space, as per the XML specification, and then the decode function is applied to each resulting component. The instance attribute is then set depending on the value of type:

list: The instance attribute becomes a list, for example:

<element primes="2 3 5 7"/>

# class attribute definition
XMLATTR_primes = ('primes', int, str, list)

# resulting object behaves like this...
element.primes == [2, 3, 5, 7]      # True

dict: The instance attribute becomes a dictionary mapping parsed values on to their frequency, for example:

<element fruit="apple pear orange pear"/>

# class attribute definition
XMLATTR_fruit = ('fruit', None, None, dict)

# resulting object behaves like this...
element.fruit == {'apple': 1, 'orange': 1, 'pear': 2}

In this case, the decode function (if given) must return a hashable object!

When serialising to XML the reverse transformations are performed using the encode functions and the type (plain, list or dict) of the attribute’s current value. The declared multi-valued type is ignored. For dictionary values the order of the output values may not be the same as the order originally read from the XML input.

Warning: Empty lists and dictionaries result in XML attribute values that are present but with empty strings. If you wish to omit these attributes in the output XML you must set the attribute value to None.
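
The decoding and encoding rules for the descriptor forms above can be sketched as follows. This is a standalone illustration that does not import pyslet: decode_attr and encode_attr are hypothetical helpers, and the sorted output order used for dictionary values is an assumption made for the example.

```python
def _identity(value):
    return value


def decode_attr(descriptor, value):
    """Returns (python_name, decoded_value) for an XML attribute string."""
    if not isinstance(descriptor, tuple):
        # Simple form: the descriptor is just the python attribute name.
        return descriptor, value
    pyname = descriptor[0]
    decode = descriptor[1] or _identity
    multi = descriptor[3] if len(descriptor) > 3 else None
    if multi is None:
        return pyname, decode(value)
    # Multi-valued: split on white space, then decode each component.
    items = [decode(s) for s in value.split()]
    if multi is dict:
        counts = {}
        for item in items:
            counts[item] = counts.get(item, 0) + 1
        return pyname, counts
    return pyname, items


def encode_attr(encode, value):
    """Serialises a value using the type of the value itself."""
    encode = encode or _identity
    if isinstance(value, dict):
        parts = []
        for item in sorted(value):  # assumed ordering, see lead-in
            parts.extend([encode(item)] * value[item])
        return " ".join(parts)
    if isinstance(value, list):
        return " ".join(encode(item) for item in value)
    return encode(value)
```

Note that encode_attr(str, []) produces an empty string rather than omitting the attribute, which mirrors the warning above.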

Some element specifications define large numbers of optional attributes and it is inconvenient to write constructors to initialise these members in each instance and possibly wasteful of memory if a document contains large numbers of such elements.

To obviate the need for optional attributes to be present in every instance an implementation of __getattr__ is provided that will ensure that element.aname returns None if ‘aname’ is the target of an attribute mapping rule, regardless of whether or not the attribute has actually been set for the instance.
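
A minimal standalone sketch of this fallback is shown below. MappedDefaults is a hypothetical class, not Element, and it scans the class attributes on every lookup for simplicity rather than building cached maps.

```python
class MappedDefaults(object):
    """Mimics the described __getattr__ rule; not the real Element class."""

    XMLATTR_class = "style_class"
    XMLATTR_apples = ("n_apples", int, str)

    def __getattr__(self, name):
        # Only called when normal attribute lookup has already failed.
        for cname in dir(type(self)):
            if not cname.startswith("XMLATTR_"):
                continue
            descriptor = getattr(type(self), cname)
            if isinstance(descriptor, tuple):
                target = descriptor[0]
            else:
                target = descriptor
            if target == name:
                # A mapped attribute that has not been set reads as None.
                return None
        raise AttributeError(name)


e = MappedDefaults()
unset_value = e.n_apples    # None: mapped but never set
e.n_apples = 5              # now stored on the instance as usual
```

Names that are not the target of any mapping still raise AttributeError, so ordinary typos are not masked.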

Implementation note: internally, the XMLATTR_* descriptors are parsed into two mappings the first time they are needed. The forward map maps XML attribute names onto tuples of:

(<python attribute name>, decode_function, type)

The reverse map maps python attribute names onto a tuple of:

(<xml attribute name>, encode_function)

XML attribute names may contain many characters that are not legal in Python syntax but automated attribute processing is still supported for these attributes even though the declaration cannot be written into the class definition. Use the builtin function setattr immediately after the class is defined, for example:

class MyElement(Element):
    pass

setattr(MyElement, 'XMLATTR_hyphen-attr', 'hyphen_attr')

XMLCONTENT = 2¶: We default to a mixed content model

set_xmlname(name)¶

Sets the name of this element

name: A character string.

You will not normally need to call this method; it is called automatically during child creation.

get_xmlname()¶

Returns the name of this element

In the default implementation this is a simple character string.

get_document()¶

Returns the document that contains the element.

If the element is an orphan, or is the descendant of an orphan, then None is returned.

set_id(id)¶

Sets the id of the element

The change is registered with the enclosing document. If the id is already taken then XMLIDClashError is raised.

classmethod mangle_aname(name)¶

Returns a mangled attribute name

A mangled attribute name is a simple name prefixed with “XMLATTR_”.

classmethod unmangle_aname(mname)¶

Returns an unmangled attribute name.

If mname is not a mangled name, None is returned. A mangled attribute name starts with “XMLATTR_”.
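
The two transformations are simple string operations. The sketch below writes them as standalone module-level functions for illustration (the real methods are classmethods of Element); PREFIX is a hypothetical constant name.

```python
PREFIX = "XMLATTR_"


def mangle_aname(name):
    """Returns the mangled form of a simple attribute name."""
    return PREFIX + name


def unmangle_aname(mname):
    """Returns the simple name, or None if mname is not mangled."""
    if mname.startswith(PREFIX):
        return mname[len(PREFIX):]
    return None
```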

get_attributes()¶

Returns a dict mapping attribute names onto values.

Each attribute value is represented as a character string. Derived classes MUST override this method if they define any custom attribute mappings.

The dictionary returned represents a copy of the information in the element and so may be modified by the caller.

set_attribute(name, value)¶

Sets the value of an attribute.

name: The name of the attribute to set
value: The value of the attribute (as a character string) or None to remove the attribute.

get_attribute(name)¶

Gets the value of a single attribute as a string.

If the element has no attribute with name then KeyError is raised.

This method searches the attribute mappings and will return attribute values obtained by encoding the associated objects according to the mapping.

is_valid_name(value)¶

Returns True if a character string is a valid NAME

This test can be done standalone using the module function of the same name (this implementation defaults to using that function). By checking validity in the context of an element derived classes may override this test.

This test is currently only used when checking IDs (see set_id()).

is_empty()¶

Whether this element must be empty.

If the class defines the XMLCONTENT attribute then the model is taken from there and this method returns True only if XMLCONTENT is ElementType.EMPTY.

Otherwise, the method defaults to False

is_mixed()¶

Whether or not the element may contain mixed content.

If the class defines the XMLCONTENT attribute then the model is taken from there and this method returns True only if XMLCONTENT is ElementType.MIXED.

Otherwise, the method defaults to True

get_children()¶

Returns an iterable of the element’s children.

This method iterates through the internal list of children only. Derived classes with custom models (i.e., those that define attributes to customise child element creation) MUST override this method.

Each child is either a character string or an instance of Element (or a derived class thereof). We do not represent comments, processing instructions or other meta-markup.

get_canonical_children()¶

Returns children with canonical white space

A wrapper for get_children() that returns an iterable of the element’s children canonicalized for white space as follows. We check the current setting of xml:space, returning the same list of children as get_children() if ‘preserve’ is in force. Otherwise we remove any leading space and collapse all others to a single space character.

get_or_add_child(child_class)¶

Returns the first child of type child_class

If there is no child of that class then a new child is added.

add_child(child_class, name=None)¶

Adds a new child of the given class attached to this element.

child_class: A class object (or callable) used to create a new instance.
name: The name given to the element (by the caller). If no name is given then the default name for the child is used. When the child returned is an existing instance, name is ignored.

By default, an instance of child_class is created and attached to the internal list of child elements.

Child creation can be customised to support a more natural mapping for structured elements as follows. Firstly, the name of child_class (not the element name) is looked up in the parent (self). If there is no match, the method resolution order is followed for child_class, looking up the names of each base in turn until a matching attribute is found. If there are no matches then the default handling is performed.

Otherwise, the behaviour is determined by the matching attribute as follows.

1. If the attribute is None then a new instance of child_class is created and assigned to the attribute.
2. If the attribute is a list then a new instance of child_class is created and appended to the attribute’s value.
3. Finally, if the attribute value is already an instance of child_class it is returned unchanged.
4. Deprecated: A method attribute is called either without arguments (if the method name matches the child_class exactly) or with the child_class itself passed as an argument. It must return the new child element.

In summary, a new child is created and attached to the element’s model unless the model supports a single element of the given child_class and the element already exists (as evidenced by an attribute with the name of child_class or one of its bases), in which case the existing instance is returned.
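
The name-matching rules can be sketched with a standalone model. Parent, Title, Item, SpecialItem and Other are hypothetical classes invented for the illustration; the deprecated method form is not modelled, and falling back to default handling for any other attribute type is an assumption of the sketch.

```python
_MISSING = object()


class Title(object):
    pass


class Item(object):
    pass


class SpecialItem(Item):
    pass


class Other(object):
    pass


class Parent(object):
    """Hypothetical parent with a custom content model."""

    def __init__(self):
        self.Title = None       # at most one Title child
        self.Item = []          # any number of Item children
        self.children = []      # default storage

    def add_child(self, child_class):
        # Try the class name first, then each base in MRO order.
        for klass in child_class.__mro__:
            current = getattr(self, klass.__name__, _MISSING)
            if current is _MISSING:
                continue
            if current is None:
                child = child_class()
                setattr(self, klass.__name__, child)
                return child
            if isinstance(current, list):
                child = child_class()
                current.append(child)
                return child
            if isinstance(current, child_class):
                return current      # existing single instance
            break                   # other forms are not modelled here
        # Default handling: append to the internal list of children.
        child = child_class()
        self.children.append(child)
        return child
```

Calling add_child(Title) twice on the same Parent returns the same instance both times, while add_child(SpecialItem) appends to the Item list because the match is found on the base class name.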

remove_child(child)¶

Removes a child from this element’s children.

child: An Element instance that must be a direct child. That is, one that would be yielded by get_children().

By default, we search the internal list of child elements.

For content model customisation we follow the same name matching conventions as for child creation (see add_child()). If a matching attribute is found then we process them as follows:

1. If the attribute’s value is child then it is set to None; if it is not child then XMLUnknownChild is raised.
2. If the attribute is a list then we remove child from the list. If child is not in the list XMLUnknownChild is raised.

If the attribute is None then we raise XMLUnknownChild.

find_children(child_class, child_list, max=None)¶

Finds children of a given class

Deprecated in favour of:

list(e.find_children_depth_first(child_class, False))

child_class: A class object derived from Element. May also be a tuple as per the definition of the builtin isinstance function in python.
child_list: A list. Matching children are appended to this.
max (defaults to None): Maximum number of children to match (None means no limit). This value is used to check against the length of child_list so any elements already present will count towards the total.

Nested matches are not included. In other words, if the model of child_class allows further elements of type child_class as children (directly or indirectly) then only the top-level match is returned. (Use find_children_depth_first() for a way to return recursive lists of matching children.)

The search is done depth first so children are returned in the logical order they would appear in the document.

find_children_breadth_first(child_class, sub_match=True, max_depth=1000, **kws)¶

Generates all children of a given class

child_class: A class object derived from Element. May also be a tuple as per the definition of the builtin isinstance function in python.
sub_match (defaults to True): Matching elements are also scanned for nested matches. If False, only the outer-most matching element is returned.
max_depth: Controls the maximum depth of the scan with level 1 indicating direct children only. It must be a positive integer and defaults to 1000.

Warning: to reduce memory requirements when searching large documents this method performs a two-pass scan of the element’s children, i.e., get_children() will be called twice.

Given that XML documents tend to be broader than they are deep, find_children_depth_first() is a better method to use for general purposes.

find_children_depth_first(child_class, sub_match=True, max_depth=1000, **kws)¶

Generates all children of a given class

child_class: A class object derived from Element. May also be a tuple as per the definition of the builtin isinstance function in python.
sub_match (defaults to True): Matching elements are also scanned for nested matches. If False, only the outer-most matching element is returned.
max_depth: Controls the maximum depth of the scan with level 1 indicating direct children only. It must be a positive integer and defaults to 1000.

Uses a depth-first scan of the element hierarchy rooted at the current element.
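
The effect of sub_match and max_depth can be illustrated with a standalone generator. Tree, Section, Para and find_depth_first are hypothetical names; the sketch only models elements that expose get_children() and treats any other child (such as a character string) as a leaf.

```python
class Tree(object):
    """Hypothetical element stand-in that stores its children in a list."""

    def __init__(self, *children):
        self.children = list(children)

    def get_children(self):
        return iter(self.children)


class Section(Tree):
    pass


class Para(Tree):
    pass


def find_depth_first(element, child_class, sub_match=True, max_depth=1000):
    """Yields matching descendants in document order."""
    if max_depth < 1:
        return
    for child in element.get_children():
        if isinstance(child, child_class):
            yield child
            if not sub_match:
                # Do not look inside a match for nested matches.
                continue
        if isinstance(child, Tree):
            for match in find_depth_first(child, child_class, sub_match,
                                          max_depth - 1):
                yield match


inner = Section(Para())
outer = Section("text", inner)
root = Tree(outer, Para())
```

With this tree, a default scan for Section yields outer and then inner, whereas passing sub_match=False or max_depth=1 yields outer only.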

find_parent(parent_class)¶

Finds the first parent of the given class.

parent_class: A class object descended from Element.

Traverses the hierarchy through parent elements until a matching parent is found or returns None.

attach_to_parent(parent)¶

Called to attach an orphan element to a parent.

This method is not normally needed: when creating XML elements you would normally call add_child() on the parent, which ensures that elements are created in the context of a parent node. The purpose of this method is to allow orphaned elements to be associated with a (new) parent, for example, after being detached from one element hierarchy and attached to another.

This method does not do any special handling of child elements; the caller takes responsibility for ensuring that this element will be returned by future calls to parent.get_children(). However, attach_to_doc() is called to ensure id registrations are made.

attach_to_doc(doc=None)¶

Called when the element is first attached to a document.

This method is not normally needed: when creating XML elements you would normally call add_child() on the parent, which ensures that elements are created in the context of a containing document. The purpose of this method is to allow orphaned elements to be associated with a parent (document) after creation, for example, after being detached from one element hierarchy and attached to another (possibly in a different document).

The default implementation ensures that any ID attributes belonging to this element or its descendants are registered.

detach_from_parent()¶

Called to detach an element from its parent

The result is that this element becomes an orphan.

This method does not do any special handling of child elements; the caller takes responsibility for ensuring that this element will no longer be returned by future calls to the (former) parent’s get_children() method.

We do call detach_from_doc() to ensure id registrations are removed and parent is set to None.

detach_from_doc(doc=None)¶

Called when an element is being detached from a document.

doc: The document the element is being detached from, if None then this is determined automatically. Provided as an optimisation for speed when detaching large parts of the element hierarchy.

The default implementation ensures that any ID attributes belonging to this element or its descendants are unregistered.

add_data(data)¶

Adds a character string to this element’s children.

This method raises a validation error if the element cannot take data children.

content_changed()¶

Notifies an element that its content has changed.

Called by the parser once the element’s attribute values and content have been parsed from the source. Can be used to trigger any internal validation required following manual changes to the element.

The default implementation tidies up the list of children reducing runs of data to a single unicode string to make future operations simpler and faster.
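The tidying step described above can be pictured with a small stand-alone sketch. This is not pyslet's own code: merge_data_runs is a hypothetical helper, and the children are modelled as a plain list in which strings stand for data and any other object stands for a child element.

```python
def merge_data_runs(children):
    """Collapse adjacent strings in a list of children into single strings."""
    merged = []
    for child in children:
        if isinstance(child, str) and merged and isinstance(merged[-1], str):
            # Extend the current run of data rather than starting a new item
            merged[-1] += child
        else:
            merged.append(child)
    return merged


marker = object()  # stands in for a child element
tidy = merge_data_runs(["Hello", ", ", "world", marker, "!"])
```

After the call, tidy holds one string, the marker object and a final string, so later code never has to deal with two data items in a row.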

generate_value(ignore_elements=False)¶

Generates strings representing the element’s content

A companion method to get_value() which is useful when handling elements that contain a large amount of data. For more information see get_value().

get_value(ignore_elements=False)¶

Returns a single object representing the element’s content.

ignore_elements: If True then any elements found in mixed content are ignored. If False then any child elements cause XMLMixedContentError to be raised.

The default implementation returns a character string and is only supported for elements where mixed content is permitted (is_mixed()). It uses generate_value() to iterate through the children.

If the element is empty an empty string is returned.

Derived classes may return more complex objects, such as values of basic python types or class instances that better represent the content of the element.

You can pass ignore_elements as True to override this behaviour in the unlikely event that you want:

<!-- elements like this... -->
<data>This is <em>the</em> value</data>

# to behave like this:
data.get_value(True) == "This is  value" 
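The same behaviour can be reproduced with the standard library as a rough analogy. The sketch below uses xml.etree.ElementTree rather than pyslet; text_value is a hypothetical helper, and ValueError stands in for XMLMixedContentError.

```python
import xml.etree.ElementTree as ET


def text_value(element, ignore_elements=False):
    """Join the character data of element, optionally skipping child elements."""
    children = list(element)
    if children and not ignore_elements:
        # Stand-in for the XMLMixedContentError described above
        raise ValueError("element contains child elements")
    parts = [element.text or ""]
    # In ElementTree, data that follows a child element is stored in its tail
    parts.extend(child.tail or "" for child in children)
    return "".join(parts)


data = ET.fromstring("<data>This is <em>the</em> value</data>")
value = text_value(data, ignore_elements=True)
```

Note the two spaces left in the result where the em element used to be; the content of the ignored element is dropped, not the white space around it.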

set_value(value)¶

Replaces the content of the element.

value: A character string used to replace the content of the element. Derived classes may support a wider range of value types, if the default implementation encounters anything other than a character string it attempts to convert it before setting the content.

The default implementation is only supported for elements where mixed content is permitted (see is_mixed()) and only affects the internally maintained list of children. Elements with more complex mixed models MUST override this method.

If value is None then the element becomes empty.

reset(reset_attrs=False)¶

Resets all children (and optionally attribute values).

reset_attrs

Whether or not to reset attribute values too.

Called by the default implementation of set_value() with reset_attrs=False, removes all children from the internally maintained list of children.

Called by the default implementation of add_child() with reset_attrs=True when an existing element instance is being recycled (obviating the constructor). The default implementation removes only unmapped attribute values. Mapped attribute values are not reset.

Derived classes should call this method if they override the implementation of set_value().

Derived classes with custom content models, i.e., those that provide a custom implementation for get_children(), must override this method and treat it as an event associated with parsing the start tag of the element. (This method is also a useful signal for resetting any state used for validating custom content models.)

Required children should be reset and optional children should be orphaned using detach_from_parent() and any references to them in instance attributes removed. Failure to override this method can result in the child elements accumulating from one read to the next.

validation_error(msg, data=None, aname=None)¶

Called when a validation error occurred in this element.

msg: Message suitable for logging and reporting the nature of the error.
data: The data that caused the error may be given in data.
aname: The attribute name may also be given indicating that the offending data was in an attribute of the element and not the element itself.

The default implementation simply calls the containing Document’s Document.validation_error() method. If the element is an orphan then XMLValidityError is raised directly with msg.

static sort_names(name_list)¶

Sorts names in a predictable order

name_list: A list of element or attribute names

The default implementation assumes that the names are strings or unicode strings so uses the default sort method.

deepcopy(parent=None)¶

Creates a deep copy of this element.

parent: The parent node to attach the new element to. If it is None then a new orphan element is created.

This method mimics the process of serialisation and deserialisation (without the need to generate markup). As a result, element attributes are serialised and deserialised to strings during the copy process.

get_base()¶: Returns the value of the xml:base attribute as a string.

set_base(base)¶

Sets the value of the xml:base attribute from a string.

Changing the base of an element affects the interpretation of all relative URIs in this element and its children.

resolve_base()¶

Returns the base of the current element.

The URI is calculated using any xml:base values of the element or its ancestors and ultimately relative to the base URI of the document itself.

If the element is not contained by a Document, or the document does not have a fully specified base_uri then the return result may be a relative path or even None, if no base information is available.

The return result is always None or a character string, such as would be obtained from the xml:base attribute.

resolve_uri(uriref)¶

Resolves a URI reference in the current context.

uriref: A pyslet.rfc2396.URI instance or a string that one can be parsed from.

The argument is resolved relative to the xml:base values of the element’s ancestors and ultimately relative to the document’s base. The result may still be a relative URI: there may be no base set, or the base may only be known in relative terms.

For example, if the Document was loaded from the URL:

http://www.example.com/images/catalog.xml

and e is an element in that document then:

e.resolve_uri('smiley.gif')

would return a URI instance representing the fully-specified URI:

http://www.example.com/images/smiley.gif
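The resolution rule is the standard one for relative references, so it can be checked with the standard library. The sketch below uses urllib.parse.urljoin on plain strings and is only an analogy; pyslet itself works with pyslet.rfc2396.URI instances.

```python
from urllib.parse import urljoin

base = "http://www.example.com/images/catalog.xml"

# A sibling of the document
resolved = urljoin(base, "smiley.gif")

# A reference that climbs out of the images directory
parent = urljoin(base, "../index.xml")

# With no base information the reference stays relative
unresolved = urljoin("", "smiley.gif")
```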

relative_uri(href)¶

Returns href expressed relative to the element’s base.

href: A pyslet.rfc2396.URI instance or a string that one can be parsed from.

If href is already a relative URI then it is converted to a fully specified URL by interpreting it as being the URI of a file expressed relative to the current working directory.

For example, if the Document was loaded from the URL:

http://www.example.com/images/catalog.xml

and e is an element in that document then:

e.relative_uri('http://www.example.com/images/smiley.gif')

would return a URI instance representing the relative URI:

'smiley.gif'

If the element does not have a fully-specified base URL then the result is a fully-specified URL itself.
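As a rough illustration of the idea, the following sketch computes a relative reference with the standard library. relative_to_base is a hypothetical helper that works on plain strings; it ignores query strings and fragments, and it does not reproduce pyslet's handling of an href that is already relative.

```python
import posixpath
from urllib.parse import urlsplit


def relative_to_base(href, base):
    """Express href relative to base when both share a scheme and host."""
    h, b = urlsplit(href), urlsplit(base)
    if (h.scheme, h.netloc) != (b.scheme, b.netloc):
        # No common base: the fully-specified URL is returned unchanged
        return href
    # Relative references are interpreted against the directory of the base
    base_dir = posixpath.dirname(b.path) or "/"
    return posixpath.relpath(h.path, base_dir)


base = "http://www.example.com/images/catalog.xml"
rel = relative_to_base("http://www.example.com/images/smiley.gif", base)
other = relative_to_base("http://cdn.example.org/smiley.gif", base)
```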

get_lang()¶: Returns the value of the xml:lang attribute as a string.

set_lang(lang)¶

Sets the value of the xml:lang attribute from a string.

See resolve_lang() for how to obtain the effective language of an element.

resolve_lang()¶

Returns the effective language for the current element.

The language is resolved using the xml:lang value of the element or its ancestors. If no xml:lang is in effect then None is returned.
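The lookup is a walk up the ancestor chain. The sketch below shows the idea with a hypothetical Node class that carries only a parent link and an xml:lang value; it is not the pyslet implementation.

```python
class Node:
    """Hypothetical stand-in for an element: a parent link and an xml:lang value."""

    def __init__(self, parent=None, lang=None):
        self.parent = parent
        self.lang = lang


def effective_lang(node):
    """Return the nearest xml:lang value on node or its ancestors, else None."""
    while node is not None:
        if node.lang is not None:
            return node.lang
        node = node.parent
    return None


root = Node(lang="en-GB")
para = Node(parent=root)             # inherits en-GB from root
quote = Node(parent=para, lang="fr")  # overrides the inherited value
```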

get_space()¶: Gets the value of the xml:space attribute

set_space(space)¶

Sets the xml:space attribute

space: A character string containing the new value or None to clear the attribute definition on this element.

resolve_space(space)¶

Returns the effective space policy for the current element.

The policy is resolved using the value returned by get_space() on this element or its ancestors. If no space policy is in effect then None is returned.

can_pretty_print()¶

True if this element’s content may be pretty-printed.

This method is used when formatting XML files to text streams. The output is also affected by the xml:space attribute. Derived classes can override the default behaviour.

The difference between this method and the xml:space attribute is that this method indicates if white space can be safely added to the output to improve formatting by inserting line feeds to break it over multiple lines and to insert spaces or tab characters to indent tags.

On the other hand, xml:space=‘preserve’ indicates that white space in the original document must not be taken away. It therefore makes sense that if get_space() returns ‘preserve’ we will return False. Derived classes may consider providing an implementation of get_space() that always returns ‘preserve’ and using the default implementation of this method.

This method will return False if one of the following is true:

the special attribute SGMLCDATA is present
the special content model attribute XMLCONTENT indicates that the element may contain mixed content (this is the default for generic instances of Element)
get_space() is set to ‘preserve’ (xml:space)
self.parent.can_pretty_print() returns False

Otherwise we return True.
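The four rules can be summarised in a few lines. The sketch below is a simplified model, not pyslet code: Item is a hypothetical class whose attributes stand in for the SGMLCDATA attribute, the XMLCONTENT content model and the value returned by get_space(), and a parent of None is assumed to allow pretty-printing.

```python
class Item:
    """Hypothetical stand-in for an element, holding only what the rules need."""

    def __init__(self, parent=None, sgml_cdata=False, mixed=True, space=None):
        self.parent = parent
        self.sgml_cdata = sgml_cdata  # models the special SGMLCDATA attribute
        self.mixed = mixed            # models XMLCONTENT allowing mixed content
        self.space = space            # models the value returned by get_space()


def may_pretty_print(item):
    """Apply the rules listed above; any one of them vetoes pretty-printing."""
    if item.sgml_cdata or item.mixed or item.space == "preserve":
        return False
    if item.parent is not None and not may_pretty_print(item.parent):
        return False
    return True


table = Item(mixed=False)
row = Item(parent=table, mixed=False)
pre = Item(mixed=False, space="preserve")
line = Item(parent=pre, mixed=False)
```

Here row may be pretty-printed, while line may not because its parent preserves white space. A default Item() models a generic element with mixed content and is therefore never pretty-printed.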

write_xml_attributes(attributes, escape_function=<function escape_char_data>, root=False, **kws)¶

Creates strings serialising the element’s attributes

attributes: A list of character strings
escape_function: The function that will be used to escape character data. The default is escape_char_data(). The alternate name escapeFunction is supported for backwards compatibility.
root: Indicates if this element should be treated as the root element. By default there is no special action required but derived classes may need to generate additional attributes, such as those that relate to the namespaces or schema used by the element.

The attributes are generated as strings of the form 'name="value"' with values escaped appropriately for serialised XML output. The attributes are always sorted into a predictable order (based on attribute name) to ensure that identical documents produce identical output.
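The output format can be illustrated with the standard library. In the sketch below format_attributes is a hypothetical helper and xml.sax.saxutils.escape plays the role of the escape function; pyslet's escape_char_data() may differ in detail.

```python
from xml.sax.saxutils import escape


def format_attributes(attrs):
    """Return sorted 'name="value"' strings for a dict of attribute values."""
    pairs = []
    for name in sorted(attrs):
        # Escape &, < and > and also the double quote used as the delimiter
        value = escape(attrs[name], {'"': "&quot;"})
        pairs.append('%s="%s"' % (name, value))
    return pairs


strings = format_attributes({"title": 'Tom & "Jerry"', "id": "x1"})
```

Sorting by name means the order in which attributes were set has no effect on the serialised form.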

generate_xml(escape_function=<function escape_char_data>, indent='', tab='\t', root=False, **kws)¶

A generator that yields serialised XML

escape_function

The function that will be used to escape character data. The default is escape_char_data(). The alternate name escapeFunction is supported for backwards compatibility.

indent (defaults to an empty string)

The string to use for passing any inherited indent, used in combination with the tab parameter for pretty printing. See below.

tab (defaults to ‘\t’)

Whether or not indentation will be used is determined by the tab parameter. If it is empty then no pretty-printing is performed for the element, otherwise the element will start with a line-feed followed by any inherited indent and finally followed by the content of tab. For example, if you prefer to have your XML serialised with a 4-space indent then pass tab='    '.

If the element is in a context where pretty printing is not allowed (see can_pretty_print()) then tab is ignored.

root (defaults to False)

Indicates if this is the root element of the document. See write_xml_attributes().

Yields character strings.

write_xml(writer, escape_function=<function escape_char_data>, indent='', tab='\t', root=False, **kws)¶

Writes serialized XML to an output stream

writer: A file or file-like object operating in binary mode.

The other arguments follow the same pattern as generate_xml() which this method uses to create the output which is always UTF-8 encoded.
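The relationship between the two methods amounts to encoding each generated string before writing it. The sketch below shows that pattern with a hypothetical write_strings helper and a stand-in generator; it does not call pyslet.

```python
import io


def write_strings(writer, strings):
    """Encode each character string as UTF-8 and write it to a binary stream."""
    for s in strings:
        writer.write(s.encode("utf-8"))


def fragments():
    # Stand-in for generate_xml(): yields character strings, not bytes
    yield "<note>"
    yield "cr\u00e8me br\u00fbl\u00e9e"
    yield "</note>"


out = io.BytesIO()
write_strings(out, fragments())
```

A stream opened in text mode would reject the encoded bytes, which is why the writer has to operate in binary mode.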

AddData(*args, **kwargs)¶: Deprecated equivalent to add_data()

AttachToDocument(*args, **kwargs)¶: Deprecated equivalent to attach_to_doc()

AttachToParent(*args, **kwargs)¶: Deprecated equivalent to attach_to_parent()

ContentChanged(*args, **kwargs)¶: Deprecated equivalent to content_changed()

Copy(*args, **kwargs)¶: Deprecated equivalent to deepcopy()

DeleteChild(*args, **kwargs)¶: Deprecated equivalent to remove_child()

DetachFromDocument(*args, **kwargs)¶: Deprecated equivalent to detach_from_doc()

DetachFromParent(*args, **kwargs)¶: Deprecated equivalent to detach_from_parent()

FindChildren(*args, **kwargs)¶: Deprecated equivalent to find_children()

FindChildrenBreadthFirst(*args, **kwargs)¶: Deprecated equivalent to find_children_breadth_first()

FindChildrenDepthFirst(*args, **kwargs)¶: Deprecated equivalent to find_children_depth_first()

FindParent(*args, **kwargs)¶: Deprecated equivalent to find_parent()

GenerateXML(*args, **kwargs)¶: Deprecated equivalent to generate_xml()

GetAttribute(*args, **kwargs)¶: Deprecated equivalent to get_attribute()

GetAttributes(*args, **kwargs)¶: Deprecated equivalent to get_attributes()

GetCanonicalChildren(*args, **kwargs)¶: Deprecated equivalent to get_canonical_children()

GetDocument(*args, **kwargs)¶: Deprecated equivalent to get_document()

GetValue(*args, **kwargs)¶: Deprecated equivalent to get_value()

GetXMLName(*args, **kwargs)¶: Deprecated equivalent to get_xmlname()

IsEmpty(*args, **kwargs)¶: Deprecated equivalent to is_empty()

IsMixed(*args, **kwargs)¶: Deprecated equivalent to is_mixed()

IsValidName(*args, **kwargs)¶: Deprecated equivalent to is_valid_name()

classmethod MangleAttributeName(*args, **kwargs)¶: Deprecated equivalent to mangle_aname()

PrettyPrint(*args, **kwargs)¶: Deprecated equivalent to can_pretty_print()

RelativeURI(*args, **kwargs)¶: Deprecated equivalent to relative_uri()

ResolveBase(*args, **kwargs)¶: Deprecated equivalent to resolve_base()

ResolveLang(*args, **kwargs)¶: Deprecated equivalent to resolve_lang()

ResolveURI(*args, **kwargs)¶: Deprecated equivalent to resolve_uri()

SetAttribute(*args, **kwargs)¶: Deprecated equivalent to set_attribute()

SetID(*args, **kwargs)¶: Deprecated equivalent to set_id()

SetSpace(*args, **kwargs)¶: Deprecated equivalent to set_space()

SetValue(*args, **kwargs)¶: Deprecated equivalent to set_value()

SetXMLName(*args, **kwargs)¶: Deprecated equivalent to set_xmlname()

static SortNames(*args, **kwargs)¶: Deprecated equivalent to sort_names()

classmethod UnmangleAttributeName(*args, **kwargs)¶: Deprecated equivalent to unmangle_aname()

ValidationError(*args, **kwargs)¶: Deprecated equivalent to validation_error()

WriteXML(*args, **kwargs)¶: Deprecated equivalent to write_xml()

WriteXMLAttributes(*args, **kwargs)¶: Deprecated equivalent to write_xml_attributes()

pyslet.xml.structures.map_class_elements(class_map, scope)¶

Adds element name -> class mappings to class_map

class_map: A dictionary that maps XML element names onto class objects that should be used to represent them.
scope: A dictionary, or an object containing a __dict__ attribute, that will be scanned for class objects to add to the mapping. This enables scope to be a module. The search is not recursive, to add class elements from imported modules you must call map_class_elements for each module.

Mappings are added for each class that is derived from Element that has an XMLNAME attribute defined. It is an error if a class is found with an XMLNAME that has already been mapped.
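The scanning behaviour described above can be sketched without pyslet. Everything in the block is an assumption made for illustration: Element is a stand-in base class, map_named_classes is a hypothetical re-implementation, and ValueError is used in place of whatever error pyslet raises for a duplicate name.

```python
class Element:
    """Stand-in for pyslet's Element base class (an assumption of this sketch)."""


class Title(Element):
    XMLNAME = "title"


class Para(Element):
    XMLNAME = "p"


def map_named_classes(class_map, scope):
    """Add XMLNAME -> class entries for Element subclasses found in scope."""
    if not isinstance(scope, dict):
        scope = vars(scope)  # e.g. a module: use its __dict__
    for obj in scope.values():
        if (isinstance(obj, type) and issubclass(obj, Element)
                and hasattr(obj, "XMLNAME")):
            if obj.XMLNAME in class_map:
                raise ValueError("duplicate XMLNAME: %s" % obj.XMLNAME)
            class_map[obj.XMLNAME] = obj


class_map = {}
map_named_classes(class_map, {"Title": Title, "Para": Para, "VERSION": 1})
```

Entries in scope that are not classes, such as VERSION here, are skipped. Scanning the same scope a second time triggers the duplicate check.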

6.2.2.1.1. Exceptions¶

class pyslet.xml.structures.XMLMissingResourceError¶

Bases: pyslet.xml.structures.XMLError

Raised when an entity cannot be found (e.g., missing file).

Also raised when an external entity reference is encountered but the opening of external entities is turned off.

class pyslet.xml.structures.XMLMissingLocationError¶

Bases: pyslet.xml.structures.XMLError

Raised on create, read or update when base_uri is None

class pyslet.xml.structures.XMLUnsupportedSchemeError¶

Bases: pyslet.xml.structures.XMLError

Document.base_uri has an unsupported scheme

Currently only file, http and https schemes are supported for open operations. For create and update operations, only file types are supported.

class pyslet.xml.structures.XMLUnexpectedHTTPResponse¶

Bases: pyslet.xml.structures.XMLError

Raised by Document.open_uri()

The message contains the response code and status message received from the server.

6.2.2.2. Prolog and Document Type Declaration¶

class pyslet.xml.structures.XMLDTD¶

Bases: pyslet.pep8.MigratedClass

An object that models a document type declaration.

The document type declaration acts as a container for the entity, element and attribute declarations used in a document.

name = None¶: The declared Name of the root element

parameter_entities = None¶: A dictionary of XMLParameterEntity instances keyed on entity name.

general_entities = None¶: A dictionary of XMLGeneralEntity instances keyed on entity name.

notations = None¶: A dictionary of XMLNotation instances keyed on notation name.

element_list = None¶: A dictionary of ElementType definitions keyed on the name of element.

attribute_lists = None¶: A dictionary of dictionaries, keyed on element name. Each of the resulting dictionaries is a dictionary of XMLAttributeDefinition keyed on attribute name.

declare_entity(entity)¶

Declares an entity in this document.

The same method is used for both general and parameter entities. The value of entity can be either an XMLGeneralEntity or an XMLParameterEntity instance.

get_parameter_entity(name)¶

Returns the parameter entity definition matching name.

Returns an instance of XMLParameterEntity. If no parameter entity has been declared with name then None is returned.

get_entity(name)¶

Returns the general entity definition matching name.

Returns an instance of XMLGeneralEntity. If no general entity has been declared with name then None is returned.

declare_notation(notation)¶

Declares a notation for this document.

The value of notation must be an XMLNotation instance.

get_notation(name)¶

Returns the notation declaration matching name.

name: The name of the notation to search for.

Returns an instance of XMLNotation. If no notation has been declared with name then None is returned.

declare_element_type(etype)¶

Declares an element type.

etype: An ElementType instance containing the element definition.

get_element_type(element_name)¶

Looks up an element type definition.

element_name: the name of the element type to look up

The method returns an instance of ElementType or None if no element with that name has been declared.

declare_attribute(element_name, attr_def)¶

Declares an attribute.

element_name: the name of the element type which should have this attribute applied
attr_def: An XMLAttributeDefinition instance describing the attribute being declared.

get_attribute_list(name)¶

Returns a dictionary of attribute definitions

name: The name of the element type to look up.

If there are no attributes declared for this element type, None is returned.

get_attribute_definition(element_name, attr_name)¶

Looks up an attribute definition.

element_name: the name of the element type in which to search
attr_name: the name of the attribute to search for.

The method returns an instance of XMLAttributeDefinition or None if no attribute matching this description has been declared.
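The lookup follows directly from the dictionary-of-dictionaries layout of attribute_lists. In the sketch below find_attribute is a hypothetical helper and plain strings stand in for XMLAttributeDefinition instances.

```python
# element name -> {attribute name -> definition}; strings stand in for
# XMLAttributeDefinition instances in this sketch
attribute_lists = {
    "img": {"src": "CDATA #REQUIRED", "alt": "CDATA #IMPLIED"},
}


def find_attribute(attribute_lists, element_name, attr_name):
    """Return the definition, or None if the element or attribute is unknown."""
    return attribute_lists.get(element_name, {}).get(attr_name)


src_def = find_attribute(attribute_lists, "img", "src")
missing = find_attribute(attribute_lists, "p", "class")
```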

GetAttributeList(*args, **kwargs)¶: Deprecated equivalent to get_attribute_list()

class pyslet.xml.structures.XMLDeclaration(version, encoding='UTF-8', standalone=False)¶

Bases: pyslet.xml.structures.XMLTextDeclaration

Represents a full XML declaration.

Unlike the parent class, XMLTextDeclaration, the version is required. standalone defaults to False as this is the assumed value if there is no standalone declaration.

standalone = None¶: Whether an XML document is standalone.

class pyslet.xml.structures.ElementType¶

Bases: object

Represents element type definitions.

EMPTY = 0¶: Content type constant for EMPTY

ANY = 1¶: Content type constant for ANY

MIXED = 2¶: Content type constant for mixed content

ELEMENT_CONTENT = 3¶: Content type constant for element content

SGMLCDATA = 4¶: Additional content type constant for SGML CDATA

entity = None¶: The entity in which this element was declared

name = None¶: The name of this element

content_type = None¶: The content type of this element, one of the constants defined above.

content_model = None¶: An XMLContentParticle instance which contains the element’s content model or None in the case of EMPTY or ANY declarations.

particle_map = None¶: A mapping used to validate the content model during parsing. It maps the name of the first child element found to a list of XMLNameParticle instances that can represent it in the content model. For more information see XMLNameParticle.particle_map.

build_model()¶: Builds internal structures to support model validation.

is_deterministic()¶

Tests if the content model is deterministic.

For degenerate cases (elements declared with ANY or EMPTY) the method always returns True.

class pyslet.xml.structures.XMLContentParticle¶

Bases: object

An object for representing content particles.

ZeroOrOne = 1¶: Occurrence constant for ‘?’

OneOrMore = 3¶: Occurrence constant for ‘+’

occurrence = None¶: One of the occurrence constants defined above.

build_particle_maps(exit_particles)¶

Abstract method that builds the particle maps for this node or its children.

For more information see XMLNameParticle.particle_map.

Although only name particles have particle maps this method is called for all particle types to allow the model to be built hierarchically from the root out to the terminal (name) nodes. exit_particles provides a mapping to all the following particles outside the part of the hierarchy rooted at the current node that are directly reachable from the particles inside.

seek_particles(pmap)¶

Adds all possible entry particles to pmap.

Abstract method. pmap is a mapping from element name to a list of XMLNameParticle instances.

Returns True if a required particle was added, False if all particles added are optional.

Like build_particle_maps(), this method is called for all particle types. The mappings requested represent all particles inside the part of the hierarchy rooted at the current node that are directly reachable from the preceding particles outside.

add_particles(src_map, pmap)¶

A utility method that adds particles from src_map to pmap.

Both maps are mappings from element name to a list of XMLNameParticle instances. All entries in src_map not currently in pmap are added.

is_deterministic(pmap)¶

A utility method for identifying deterministic particle maps.

A deterministic particle map is one in which each name maps uniquely to a single content particle. A non-deterministic particle map contains an ambiguity, for example ((b,d)|(b,e)). The particle map created by seek_particles() for the enclosing choice list would have two entries for ‘b’, one to map the first particle of the first sequence and one to the first particle of the second sequence.

Although non-deterministic content models are not allowed in SGML they are tolerated in XML and are only flagged as compatibility errors.
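The test itself is simple once the particle map exists. The sketch below models a particle map as a dictionary from element name to a list, with descriptive strings standing in for XMLNameParticle instances; map_is_deterministic is a hypothetical helper, not the pyslet method.

```python
def map_is_deterministic(pmap):
    """True if no element name maps to more than one particle."""
    return all(len(particles) <= 1 for particles in pmap.values())


# Entry particles of ((b,d)|(b,e)): both sequences can start with 'b'
ambiguous = {"b": ["b in (b,d)", "b in (b,e)"]}

# Entry particles of ((b,d)|(c,e)): each name identifies a single particle
unambiguous = {"b": ["b in (b,d)"], "c": ["c in (c,e)"]}
```

With the first map a parser that sees a b element cannot tell which branch of the choice it has entered without looking ahead, which is exactly the ambiguity described above.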

class pyslet.xml.structures.XMLNameParticle¶

Bases: pyslet.xml.structures.XMLContentParticle

Represents a content particle for a named element

name = None¶: the name of the element type that matches this particle

particle_map = None¶

Each XMLNameParticle has a particle map that maps the name of the ‘next’ element found in the content model to the list of possible XMLNameParticle instances that represent it in the content model.

The content model can be traversed using ContentParticleCursor.

class pyslet.xml.structures.XMLChoiceList¶

Bases: pyslet.xml.structures.XMLContentParticle

Represents a choice list of content particles in the grammar

class pyslet.xml.structures.XMLSequenceList¶

Bases: pyslet.xml.structures.XMLContentParticle

Represents a sequence list of content particles in the grammar

class pyslet.xml.structures.XMLAttributeDefinition¶

Bases: object

Represents an Attribute declaration

There is no special functionality provided by this class; instances hold the data members identified and the class defines a number of constants suitable for setting and testing them.

Constants are defined using CAPS, mixed case versions are provided only for backwards compatibility.

CDATA = 0¶: Type constant representing CDATA

ID = 1¶: Type constant representing ID

IDREF = 2¶: Type constant representing IDREF

IDREFS = 3¶: Type constant representing IDREFS

ENTITY = 4¶: Type constant representing ENTITY

ENTITIES = 5¶: Type constant representing ENTITIES

NMTOKEN = 6¶: Type constant representing NMTOKEN

NMTOKENS = 7¶: Type constant representing NMTOKENS

NOTATION = 8¶: Type constant representing NOTATION

ENUMERATION = 9¶: Type constant representing an enumeration, not defined as a keyword in the specification but representing declarations that match production [59], Enumeration.

IMPLIED = 0¶: Presence constant representing #IMPLIED

REQUIRED = 1¶: Presence constant representing #REQUIRED

FIXED = 2¶: Presence constant representing #FIXED

DEFAULT = 3¶: Presence constant representing a declared default value. Not defined as a keyword but represents a declaration with a default value defined in production [60].

entity = None¶: the entity in which this attribute was declared

name = None¶: the name of the attribute

type = None¶: One of the above type constants

values = None¶: An optional dictionary of values

defaultValue = None¶: An optional default value

6.2.2.3. Physical Structures¶

class pyslet.xml.structures.XMLEntity(src=None, encoding=None, req_manager=None, **kws)¶

Bases: pyslet.pep8.MigratedClass

Represents an XML entity.

This object serves two purposes: it acts both as the object used to store information about declared entities and as a parser for feeding unicode characters to the main XMLParser.

src

May be a character string, a binary string, an instance of pyslet.rfc2396.URI, an instance of pyslet.http.client.ClientResponse or any object that supports file-like behaviour (seek and read).

If provided, the corresponding open method is called immediately, see open_unicode(), open_string(), open_uri(), open_http_response() and open_file().

encoding

If src is not None then this value will be passed when opening the entity reader.

req_manager

If src is a URI, passed to open_uri()

XMLEntity objects act as context managers, hence it is possible to use:

with XMLEntity(src=URI.from_octets('mydata.xml')) as e:
    # process the entity here, will automatically close
    pass

location = None¶: the location of this entity (used as the base URI to resolve relative links). A pyslet.rfc2396.URI instance.

mimetype = None¶: The mime type of the entity, if known, or None otherwise. A pyslet.http.params.MediaType instance.

encoding = None¶: the encoding of the entity (text entities), e.g., ‘utf-8’

bom = None¶: Flag to indicate whether or not the byte order mark was detected. If detected the flag is set to True. An initial byte order mark is not reported in the_char or by the next_char() method.

the_char = None¶: The character at the current position in the entity

line_num = None¶: The current line number within the entity (first line is line 1)

line_pos = None¶: the current character position within the entity (first char is 1)

buff_text = None¶: used by XMLParser.push_entity()

chunk_size = 8192¶

Characters are read from the data_source in chunks.

The default chunk size is set from io.DEFAULT_BUFFER_SIZE, typically 8KB.

In fact, in some circumstances the entity reader starts more cautiously. If the entity reader expects to read an XML or Text declaration, which may have an encoding declaration then it reads one character at a time until the declaration is complete. This allows the reader to change to the encoding in the declaration without causing errors caused by reading too many characters using the wrong codec. See change_encoding() and keep_encoding() for more information.

get_name()¶

Returns a name to represent this entity

The name is intended for logs and error messages. It defaults to the location if set.

is_external()¶

Returns True if this is an external entity.

The default implementation returns True if location is not None, False otherwise.

open()¶

Opens the entity for reading.

The default implementation uses open_uri() to open the entity from location if available, otherwise it raises NotImplementedError.

is_open()¶: Returns True if the entity is open for reading.

open_unicode(src)¶: Opens the entity from a unicode string.

open_string(src, encoding=None)¶

Opens the entity from a binary string.

src: A binary string.
encoding: The optional encoding is used to convert the string to unicode and defaults to None - meaning that the auto-detection method will be applied.

The advantage of using this method instead of converting the string to unicode and calling open_unicode() is that this method creates a unicode reader object to parse the string instead of making a copy of it in memory.

open_file(src, encoding='utf-8')¶

Opens the entity from a file

src: An existing (open) binary file.

The optional encoding provides a hint as to the intended encoding of the data and defaults to UTF-8. Unlike other Open* methods we do not assume that the file is seekable; however, you may set encoding to None for a seekable file, thus invoking auto-detection of the encoding.

open_uri(src, encoding=None, req_manager=None, **kws)¶

Opens the entity from a URI.

src: A pyslet.rfc2396.URI instance of either file, http or https schemes.
encoding: The optional encoding provides a hint as to the intended encoding of the data and defaults to UTF-8. For http(s) resources this parameter is only used if the charset cannot be read successfully from the HTTP headers.
req_manager: The optional req_manager allows you to pass an existing instance of pyslet.http.client.Client for handling URI with http or https schemes. (reqManager is supported for backwards compatibility.)

open_http_response(src, encoding='utf-8')¶

Opens the entity from an HTTP response passed in src.

src: A pyslet.http.client.ClientResponse instance.
encoding: The optional encoding provides a hint as to the intended encoding of the data and defaults to UTF-8. This parameter is only used if the charset cannot be read successfully from the HTTP response headers.

reset()¶

Resets an open entity

The entity returns to the first character in the entity.

get_position_str()¶

A short string describing the current position.

For example, if the current character is pointing to character 6 of line 4 then it will return the string ‘Line 4.6’

next_char()¶

Advances to the next character in an open entity.

This method takes care of the End-of-Line handling rules for XML which force us to remove any CR characters and replace them with LF if they appear on their own or to silently drop them if they appear as part of a CR-LF combination.
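The effect of these rules on a whole string can be shown in one line. The sketch below is an illustration only; normalize_line_endings is a hypothetical helper, whereas the entity reader applies the same rules one character at a time.

```python
def normalize_line_endings(text):
    """Apply XML end-of-line handling: CR-LF and lone CR both become LF."""
    # Handle CR-LF pairs first so that their CR is not converted separately
    return text.replace("\r\n", "\n").replace("\r", "\n")


mixed = "first\r\nsecond\rthird\nfourth"
normalized = normalize_line_endings(mixed)
```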

auto_detect_encoding(src_file)¶

Auto-detects the character encoding

src_file: A file object. The object must support seek and blocking read operations. If src_file has been opened in text mode then no action is taken.

change_encoding(encoding)¶

Changes the character encoding used for this entity.

In many cases we can only guess at the encoding used in a file or other byte stream. However, XML has a mechanism for declaring the encoding as part of the XML or Text declaration. This declaration can typically be parsed even if the encoding has been guessed incorrectly initially. This method allows the XML parser to notify the entity that a new encoding has been declared and that future characters should be interpreted with this new encoding. (There are some situations where the request is ignored, such as when the encoding has already been detected to be UCS-2 or UCS-4 or when the source stream is not seekable.)

You can only change the encoding once. This method calls keep_encoding() once the encoding has been changed.

keep_encoding()¶

Fixes the character encoding used in the entity.

This entity parser starts in a cautious mode, parsing the entity one character at a time to avoid errors caused by buffering with the wrong encoding. This method should be called once the encoding is determined so that the entity parser can use its internal character buffer.

next_line()¶

Called when the entity reader detects a new line.

This method increases the internal line count and resets the character position to the beginning of the line. You will not normally need to call this directly as line handling is done automatically by next_char().

KeepEncoding(*args, **kwargs)¶: Deprecated equivalent to keep_encoding()

Open(*args, **kwargs)¶: Deprecated equivalent to open()

close()¶: Closes the entity.

class pyslet.xml.structures.XMLDeclaredEntity(name=None, definition=None)¶

Bases: pyslet.xml.structures.XMLEntity

Abstract class representing a declared entity.

name: An optional string used as the name of the entity
definition: The definition of the entity is either a string or an instance of XMLExternalID, depending on whether the entity is an internal or external entity respectively.

entity = None¶: the entity in which this entity was declared

name = None¶: the name passed to the constructor

definition = None¶: the definition passed to the constructor

get_name()¶

Human-readable name suitable for logging/error reporting.

Simply returns name

is_external()¶: Returns True if this is an external entity.

open()¶

Opens the entity for reading.

External entities must be parsed for text declarations before the replacement text is encountered. This requires a small amount of look-ahead which may result in some characters needing to be re-parsed. We pass this to future parsers using buff_text.

class pyslet.xml.structures.XMLGeneralEntity(name=None, definition=None, notation=None)¶

Bases: pyslet.xml.structures.XMLDeclaredEntity

Represents a general entity.

name: Optional name
definition: An optional definition
notation: An optional notation.

notation = None¶: the notation name for external unparsed entities

get_name()¶: Formats the name as a general entity reference.

class pyslet.xml.structures.XMLParameterEntity(name=None, definition=None)¶

Bases: pyslet.xml.structures.XMLDeclaredEntity

Represents a parameter entity.

name: An optional name
definition: An optional definition.

See base class for more information on the parameters.

open_as_pe()¶

Opens the parameter entity in the context of a DTD.

This special method implements the rule that the replacement text of a parameter entity, when included as a PE, must be enlarged by the attachment of a leading and trailing space.

get_name()¶: Formats the name as a parameter entity reference.

class pyslet.xml.structures.XMLExternalID(public=None, system=None)¶

Bases: object

Represents external references to entities.

public: An optional public identifier
system: An optional system identifier

One (or both) of the identifiers should be provided.

get_location(base=None)¶

Get an absolute URI for the external entity.

Returns a pyslet.rfc2396.URI resolved against base if applicable. If there is no system identifier then None is returned.
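
The resolution step can be sketched with the standard library. This is an illustration only: it uses urllib.parse.urljoin and plain strings as stand-ins for pyslet.rfc2396.URI, and the function name resolve_system_id is hypothetical.

```python
from urllib.parse import urljoin


def resolve_system_id(system, base=None):
    """Resolve a system identifier against an optional base URI (illustrative)."""
    if system is None:
        # No system identifier means there is no location to return.
        return None
    # A relative identifier is resolved against base; an absolute one is unchanged.
    return urljoin(base, system) if base else system


print(resolve_system_id("dtd/doc.dtd", "http://www.example.com/xml/index.xml"))
```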

class pyslet.xml.structures.XMLTextDeclaration(version='1.0', encoding='UTF-8')¶

Bases: object

Represents the text components of an XML declaration.

Both version and encoding are optional, though one or the other is required depending on the context in which the declaration will be used.

class pyslet.xml.structures.XMLNotation(name, external_id)¶

Bases: object

Represents an XML Notation defined in Section 4.7

name: The name of the notation
external_id: An XMLExternalID instance in which one of public or system must be provided.

name = None¶: the notation name

external_id = None¶: the external ID of the notation (an XMLExternalID instance)

6.2.2.4. Syntax¶

6.2.2.4.1. White Space Handling¶

pyslet.xml.structures.is_s(c)¶

Tests production [3] S

Optimized for speed as this function is called a lot by the parser.

pyslet.xml.structures.collapse_space(data, smode=True, stest=<function is_s>)¶

Returns data with all spaces collapsed to a single space.

smode: Determines the fate of any leading space. By default it is True and leading spaces are ignored, provided the string has some non-space characters.
stest: You can override the test of what constitutes a space by passing a function for stest. By default we use is_s(), and any value passed to stest should behave similarly.

Note on degenerate case: this function is intended to be called with non-empty strings and will never return an empty string. If there is no data then a single space is returned (regardless of smode).
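
The behaviour described above can be sketched as a small state machine. This is a standalone illustration written from the description, not the library's source; the names collapse_space_sketch and is_xml_space are hypothetical, and the handling of trailing space (kept as a single space here) is an assumption.

```python
def is_xml_space(c):
    """Stand-in for is_s(): the four white space characters of production [3] S."""
    return c in " \t\r\n"


def collapse_space_sketch(data, smode=True, stest=is_xml_space):
    """Collapse each run of space characters to a single space (illustrative)."""
    result = []
    for c in data:
        if stest(c):
            # Emit one space only at the start of a run; smode=True means
            # "already in a run", so leading space is skipped by default.
            if not smode:
                result.append(" ")
            smode = True
        else:
            result.append(c)
            smode = False
    # Degenerate case: never return an empty string.
    return "".join(result) if result else " "


print(repr(collapse_space_sketch("  hello \t\n world")))
print(repr(collapse_space_sketch("  hello", smode=False)))
```

Starting with smode=False treats the string as a continuation of earlier text, so a leading run of spaces is kept as a single space rather than dropped.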

6.2.2.4.2. Names¶

pyslet.xml.structures.is_name_start_char(c)¶: Tests if the character c matches production [4] NameStartChar.

pyslet.xml.structures.is_name_char(c)¶: Tests production [4a] NameChar

pyslet.xml.structures.is_valid_name(name)¶: Tests if name is a string matching production [5] Name

pyslet.xml.structures.is_reserved_name(name)¶

Tests if name is reserved

Names beginning with ‘xml’ are reserved for future standardization
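
A minimal sketch of such a test is shown below. It assumes the match is case-insensitive, as the XML specification reserves any name whose first three characters match 'xml' in any combination of case; whether is_reserved_name() behaves the same way is an assumption, and the function name is hypothetical.

```python
def is_reserved_name_sketch(name):
    """Return True if name starts with 'xml' in any combination of case."""
    # Slicing is safe for names shorter than three characters.
    return name[:3].lower() == "xml"


print(is_reserved_name_sketch("xml:lang"), is_reserved_name_sketch("example"))
```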

6.2.2.4.3. Character Data and Markup¶

pyslet.xml.structures.escape_char_data(src, quote=False)¶

Returns a unicode string with XML reserved characters escaped.

We also escape return characters to prevent them being ignored. If quote is True then the string is returned as a quoted attribute value.

pyslet.xml.structures.escape_char_data7(src, quote=False)¶

Escapes reserved and non-ASCII characters.

src: A character string
quote (defaults to False): When True, will surround the output in either single or double quotes (preferred) depending on the contents of src.

Characters outside the ASCII range are replaced with character references.
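
The escaping described above can be sketched as follows. This is a standalone illustration, not the library's implementation: the function name escape_ascii_sketch, the hexadecimal form of the character references and the exact quote-selection rule are all assumptions.

```python
def escape_ascii_sketch(src, quote=False):
    """Escape markup and non-ASCII characters in a string (illustrative)."""
    out = []
    for ch in src:
        if ch == "&":
            out.append("&amp;")
        elif ch == "<":
            out.append("&lt;")
        elif ch == ">":
            out.append("&gt;")
        elif ch == "\r":
            # Escape returns so that end-of-line handling cannot drop them.
            out.append("&#xD;")
        elif ord(ch) > 0x7F:
            # Replace anything outside ASCII with a character reference.
            out.append("&#x%X;" % ord(ch))
        else:
            out.append(ch)
    text = "".join(out)
    if not quote:
        return text
    # Prefer double quotes; use single quotes only when that avoids escaping.
    if '"' in text and "'" not in text:
        return "'" + text + "'"
    return '"' + text.replace('"', "&quot;") + '"'


print(escape_ascii_sketch("caf\u00e9 & <tea>"))
print(escape_ascii_sketch('say "hi"', quote=True))
```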

6.2.2.4.4. CDATA Sections¶

pyslet.xml.structures.CDATA_START = u'<![CDATA['¶: character string constant for “<![CDATA[”

pyslet.xml.structures.CDATA_END = u']]>'¶: character string constant for “]]>”

pyslet.xml.structures.escape_cdsect(src)¶

Wraps a string in a CDATA section

src: A character string of data

Returns a character string enclosed in <![CDATA[ ]]> with ]]> replaced by the clumsy sequence: ]]>]]><![CDATA[

Degenerate case: an empty string is returned as an empty string
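
The general idea of protecting an embedded terminator can be sketched as below. Note that this sketch uses a different, widely used technique from the replacement sequence quoted above: it splits each "]]>" across two adjacent CDATA sections. It is not the library's implementation, and the function name wrap_cdata_sketch is hypothetical.

```python
import xml.etree.ElementTree as ET

CDATA_START = "<![CDATA["
CDATA_END = "]]>"


def wrap_cdata_sketch(src):
    """Wrap src in CDATA, splitting any embedded ']]>' across two sections."""
    if not src:
        # Degenerate case: an empty string stays empty.
        return ""
    # End the section after ']]' and reopen a new one before '>'.
    safe = src.replace(CDATA_END, "]]" + CDATA_END + CDATA_START + ">")
    return CDATA_START + safe + CDATA_END


wrapped = wrap_cdata_sketch("if (a[b[0]]>c) & d")
print(wrapped)
# Round trip through a parser to confirm that the data survives intact.
print(ET.fromstring("<r>" + wrapped + "</r>").text)
```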

6.2.2.4.5. Exceptions¶

class pyslet.xml.structures.XMLError¶

Bases: exceptions.Exception

Base class for all exceptions raised by this module.

class pyslet.xml.structures.XMLValidityError¶

Bases: pyslet.xml.structures.XMLError

Base class for all validation errors

Raised when a document or content model violates a validity constraint. These errors can be generated by the parser (for example, when validating a document against a declared DTD) or by Elements themselves when content is encountered that does not fit the expected content model.

class pyslet.xml.structures.XMLIDClashError¶

Bases: pyslet.xml.structures.XMLValidityError

A validity error caused by two elements with the same ID

class pyslet.xml.structures.XMLIDValueError¶

Bases: pyslet.xml.structures.XMLValidityError

A validity error caused by an element with an invalid ID

The value of an ID attribute must match the production for Name.

class pyslet.xml.structures.DuplicateXMLNAME¶

Bases: pyslet.xml.structures.XMLError

Raised by map_class_elements()

Indicates an attempt to declare two classes with the same XML name.

class pyslet.xml.structures.XMLAttributeSetter¶