6.2.2. XML: Reference
6.2.2.1. Documents and Elements
-
class
pyslet.xml.structures.
Node
(parent=None)¶ Bases:
pyslet.py2.UnicodeMixin
,pyslet.pep8.MigratedClass
Base class for Element and Document shared attributes.
XML documents are defined hierarchically: each element has a parent, which is either another element or an XML document.
-
get_children
()¶ Returns an iterator over this object’s children.
-
classmethod
get_element_class
(name)¶ Returns a class object for representing an element
- name
- a unicode string representing the element name.
The default implementation returns None. For elements this has the effect of deferring the call to the parent document (where this method is overridden to return Element).
This method is called immediately prior to add_child() and (when applicable) get_child_class().
The real purpose of this method is to allow an element class to directly control the way the name of a child element maps to the class used to represent it. You would normally override this method in the Document to map element names to classes, but in some cases you may want to tweak the mapping at the individual element level, for example when the same element name is used for two different purposes in the same XML document. Although confusing, this is allowed in XML Schema.
-
get_child_class
(stag_class)¶ Supports custom content model handling
- stag_class
- The class of an element that is about to be created in the current context with add_child(), or the builtin str if data has been received in a context where only element content was expected.
This method is only called when the XMLParser.sgml_omittag option is in effect. It is called prior to add_child() and gives the context (the parent element or document) a chance to modify the child element that will be created or to indicate the end of the current element through use of the OMITTAG feature of SGML.
It returns the class of an element whose start tag has been omitted from the document and should be added at this point, or None if stag_class implies the end of the current element and the end tag may be omitted.
Otherwise this method should return stag_class unchanged (the default implementation does this), indicating that the parser should proceed as normal. In the case of unexpected data this is treated as a validity error and handled according to the parser’s validity checking options.
Validation errors are dealt with by the parser or, where the model is encoded into the classes themselves, by add_child(), and not by this method, which should never raise validation errors.
Although not necessary for true XML parsing this method allows us to support the parsing of XML-like documents that omit tags, such as HTML. For example, suppose we have the following document:
<title>My Blank HTML Page</title>
The parser would recognise the start tag for <title> and then call this method (on the HTML document) passing the pyslet.html.Title class. For HTML documents, this method always returns the pyslet.html401.HTML class (ignoring stag_class completely). The result is that an HTML element is opened instead and the parser tries again, calling this method for the new HTML element. That does not accept Title either and returns the pyslet.html.Head class. Finally, a Head element is opened; it accepts Title as a child, so it returns stag_class unchanged and the parser continues, having inferred the omitted tags: <html> and <head>.
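The inference loop described above can be sketched without Pyslet at all. The classes below are toy stand-ins (hypothetical names, not the real pyslet.html401 classes) and open_element is an assumed, simplified version of what a parser might do when the OMITTAG option is on.

```python
class Element:
    """Toy stand-in for an element; not pyslet's Element."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []

    def get_child_class(self, stag_class):
        # Default behaviour: accept the start tag unchanged.
        return stag_class

    def add_child(self, child_class):
        child = child_class(self)
        self.children.append(child)
        return child


class Title(Element):
    pass


class Head(Element):
    pass


class HTML(Element):
    def get_child_class(self, stag_class):
        # In this toy model <html> only accepts <head>, so anything
        # else implies an omitted <head> start tag.
        return stag_class if stag_class is Head else Head


class HTMLDocument(Element):
    def get_child_class(self, stag_class):
        # The document always opens <html>, ignoring stag_class.
        return HTML


def open_element(context, stag_class):
    """Open stag_class in context, inferring omitted start tags."""
    opened = []
    while True:
        child_class = context.get_child_class(stag_class)
        if child_class is None:
            # stag_class implies the end of the current element
            context = context.parent
            continue
        context = context.add_child(child_class)
        opened.append(child_class.__name__)
        if child_class is stag_class:
            return context, opened


doc = HTMLDocument()
title, opened = open_element(doc, Title)
print(opened)
```

The loop keeps asking the current context until the class it gets back is the one the parser started with, which is the point at which the real start tag can be processed normally.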
-
add_child
(child_class, name=None)¶ Returns a new child of the given class attached to this object.
- child_class
- A class (or callable) used to create a new instance of Element.
- name
- The name given to the element (by the caller). If no name is given then the default name for the child is used. When the child returned is an existing instance, name is ignored.
-
processing_instruction
(target, instruction='')¶ Abstract method for handling processing instructions
By default, processing instructions are ignored.
-
get_base
()¶ Returns the base URI for a node
Abstract method. When used on a Document it returns the URI used to load the document, if known.
-
set_base
(base)¶ Sets the base URI of a node.
- base
- A string suitable for setting xml:base or a pyslet.rfc2396.URI instance.
Abstract method. Changing the base affects the interpretation of all relative URIs in this node and its children.
-
get_lang
()¶ Get the language of a node
Abstract method. When used on a Document it gets the default language to use in the absence of an explicit xml:lang value.
-
set_lang
(lang)¶ Set the language of a node
- lang
- A string suitable for setting the xml:lang attribute of an element.
Abstract method. When used on a Document it sets a default language to use in the absence of an explicit xml:lang value.
-
get_space
()¶ Gets the space policy of a node
Abstract method. When used on a Document it gets the default space policy to use in the absence of an explicit xml:space value.
-
ChildElement
(*args, **kwargs)¶ Deprecated equivalent to
add_child()
-
GetBase
(*args, **kwargs)¶ Deprecated equivalent to
get_base()
-
GetChildClass
(*args, **kwargs)¶ Deprecated equivalent to
get_child_class()
-
GetChildren
(*args, **kwargs)¶ Deprecated equivalent to
get_children()
-
classmethod
GetElementClass
(*args, **kwargs)¶ Deprecated equivalent to
get_element_class()
-
GetLang
(*args, **kwargs)¶ Deprecated equivalent to
get_lang()
-
GetSpace
(*args, **kwargs)¶ Deprecated equivalent to
get_space()
-
SetBase
(*args, **kwargs)¶ Deprecated equivalent to
set_base()
-
SetLang
(*args, **kwargs)¶ Deprecated equivalent to
set_lang()
-
-
class
pyslet.xml.structures.
Document
(root=None, base_uri=None, req_manager=None, **kws)¶ Bases:
pyslet.xml.structures.Node
Base class for all XML documents.
With no arguments, a new Document is created with no base URI or root element.
- root
If root is a class object (descended from Element) it is used to create the root element of the document.
If root is an orphan instance of Element (i.e., it has no parent) it is used as the root element of the document and its Element.attach_to_doc() method is called.
- base_uri (aka baseURI for backwards compatibility)
- See set_base() for more information.
- req_manager (aka reqManager for backwards compatibility)
- Sets the request manager object to use for future HTTP calls. Must be an instance of pyslet.http.client.Client.
-
lang
= None¶ The default language of the document (see
set_lang()
).
-
declaration
= None¶ The XML declaration (or None if no XMLDeclaration is used)
-
dtd
= None¶ The dtd associated with the document or None.
-
root
= None¶ The root element or None if no root element has been created yet.
-
get_children
()¶ Yields the root element
-
XMLParser
(entity)¶ Creates a parser for this document
- entity
- The entity to parse the document from
The default implementation creates an instance of XMLParser.
This method allows some document classes to override the parser used to parse them. This method is only used when parsing existing document instances (see read() for more information).
Classes that override this method may still register themselves with register_doc_class() but if they do then the default XMLParser object will be used, as automatic detection of document class is done by the parser itself based on the information in the prolog (and/or first element).
-
classmethod
get_element_class
(name)¶ Defaults to returning Element.
Derived classes override this method to enable the XML parser to create instances of custom classes based on the document context and element name.
-
add_child
(child_class, name=None)¶ Creates the root element of the document.
If there is already a root element it is detached from the document first using Element.detach_from_doc().
Unlike Element.add_child() there are no model customization options. The root element is always found at root.
-
set_base
(base_uri)¶ Sets the base_uri of the document to the given URI.
- base_uri
- An instance of
pyslet.rfc2396.URI
or an object that can be passed to its constructor.
Relative file paths are resolved relative to the current working directory immediately and the absolute URI is recorded as the document’s base_uri.
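As a rough illustration of why the resolution is done immediately, the sketch below turns a relative file path into an absolute file URI using only the standard library. It is an assumption-laden stand-in: Pyslet uses its own pyslet.rfc2396.URI class, and file_path_to_base_uri is a hypothetical helper name.

```python
import os
from pathlib import Path


def file_path_to_base_uri(path):
    """Resolve a (possibly relative) file path to an absolute file URI."""
    # The path is made absolute now, against the current working
    # directory, so a later os.chdir() cannot change the recorded value.
    return Path(os.path.abspath(path)).as_uri()


base_uri = file_path_to_base_uri("docs/example.xml")
print(base_uri)
```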
-
get_base
()¶ Returns a string representation of the document’s base_uri.
-
get_lang
()¶ Returns the default language for the document.
-
set_lang
(lang)¶ Sets the default language for the document.
-
get_space
()¶ Returns the default space policy for the document.
By default we return None, indicating that no policy is in force. Derived documents can override this behaviour to return either “preserve” or “default” to affect space handling.
-
validation_error
(msg, element, data=None, aname=None)¶ Called when a validation error is triggered.
- msg
- contains a brief message suitable for describing the error in a log file.
- element
- the element in which the validation error occurred
- data, aname
- See
Element.validation_error()
.
Prior to raising
XMLValidityError
this method logs a suitable message at WARN level.
-
register_element
(element)¶ Registers an element’s ID
If the element has an ID attribute it is added to the internal ID table. If the ID already exists
XMLIDClashError
is raised.
-
unregister_element
(element)¶ Removes an element’s ID
If the element has a uniquely defined ID it is removed from the internal ID table. Called prior to detaching the element from the document.
-
get_element_by_id
(id)¶ Returns the element with a given ID
Returns None if the ID is not the ID of any element.
-
get_unique_id
(base_str=None)¶ Generates a random element ID that is not yet defined
- base_str
- A suggested prefix (defaults to None).
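The ID handling methods above amount to a small table keyed on ID. The following is a minimal sketch of that idea, not Pyslet's implementation: IDTable and IDClashError are hypothetical names, and the format of the generated IDs is an assumption.

```python
import random


class IDClashError(Exception):
    """Hypothetical stand-in for XMLIDClashError."""


class IDTable:
    """Toy ID table mapping ID strings onto elements."""

    def __init__(self):
        self._ids = {}

    def register(self, element_id, element):
        # A second registration of the same ID is an error.
        if element_id in self._ids:
            raise IDClashError(element_id)
        self._ids[element_id] = element

    def unregister(self, element_id):
        self._ids.pop(element_id, None)

    def get_element_by_id(self, element_id):
        # Returns None if the ID is unknown.
        return self._ids.get(element_id)

    def get_unique_id(self, base_str=None):
        # Keep drawing random suffixes until an unused ID is found.
        prefix = base_str if base_str else "id"
        while True:
            candidate = "%s-%06X" % (prefix, random.randrange(16 ** 6))
            if candidate not in self._ids:
                return candidate


table = IDTable()
intro = object()  # stands in for an element
table.register("intro", intro)
new_id = table.get_unique_id("sec")
print(new_id)
```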
-
read
(src=None, **kws)¶ Reads this document, parsing it from a source stream.
With no arguments the document is read from the base_uri, which must have been specified on construction or with a call to the set_base() method.
- src (defaults to None)
- You can override the document’s base URI by passing a value for src, which may be an instance of XMLEntity or a file-like object suitable for passing to read_from_stream().
-
read_from_stream
(src)¶ Reads this document from a stream
- src
- Any object that can be passed to XMLEntity’s constructor.
If you need more control, for example over encodings, you can create the entity yourself and use read_from_entity() instead.
-
read_from_entity
(e)¶ Reads this document from an entity
- e
- An
XMLEntity
instance.
The document is read from the current position in the entity.
-
create
(dst=None, **kws)¶ Creates the Document.
Outputs the document as an XML stream.
- dst (defaults to None)
- The stream is written to the base_uri by default, but if the ‘dst’ argument is provided then it is written there instead. dst can be any object that supports the writing of binary strings.
Currently only documents with file type baseURIs are supported. The file’s parent directories are created if required. The file is always written using UTF-8, as per the XML standard.
-
generate_xml
(escape_function=<function escape_char_data>, tab='\t', encoding='UTF-8')¶ A generator that yields serialised XML
- escape_function
- The function that will be used to escape character data. The default is escape_char_data(). The alternate name escapeFunction is supported for backwards compatibility.
- tab (defaults to ‘\t’)
- Whether or not indentation will be used is determined by the tab parameter. If it is empty then no pretty-printing is performed, otherwise elements are indented (where allowed by their defining classes) for ease of reading.
- encoding (defaults to “UTF-8”)
- The name of the character encoding to put in the XML declaration.
Yields character strings, the first string being the XML declaration which always specifies the encoding UTF-8
-
write_xml
(writer, escape_function=<function escape_char_data>, tab='\t')¶ Writes serialized XML to an output stream
- writer
- A file or file-like object operating in binary mode.
The other arguments follow the same pattern as generate_xml(), which this method uses to create the output. The output is always UTF-8 encoded.
-
update
(**kws)¶ Updates the Document.
Update outputs the document as an XML stream. The stream is written to the base_uri which must already exist! Currently only documents with file type baseURIs are supported.
-
diff_string
(other_doc, before=10, after=5)¶ Compares XML documents
- other_doc
- Another Document instance to compare with.
- before (default 10)
- Number of lines before the first difference to output
- after (default 5)
- Number of lines after the first difference to output
The two documents are converted to character strings and then compared line by line until a difference is found. The result is suitable for logging or error reporting. Used mainly to make the output of unittests easier to understand.
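The comparison described here is easy to picture with plain strings. The sketch below is an assumed re-creation of the idea, not Pyslet's code: first_difference is a hypothetical name and the report layout (the "<" and ">" markers) is invented for illustration.

```python
def first_difference(text_a, text_b, before=10, after=5):
    """Report the first differing line of two texts with some context."""
    lines_a = text_a.splitlines()
    lines_b = text_b.splitlines()
    # Walk both texts until a line differs or the shorter one runs out.
    i = 0
    limit = min(len(lines_a), len(lines_b))
    while i < limit and lines_a[i] == lines_b[i]:
        i += 1
    if i == len(lines_a) == len(lines_b):
        return "No differences"
    start = max(0, i - before)
    output = ["Difference at line %i" % (i + 1)]
    # Shared context before the difference
    output.extend("  " + line for line in lines_a[start:i])
    # The differing line plus trailing context from each text
    output.extend("< " + line for line in lines_a[i:i + after + 1])
    output.extend("> " + line for line in lines_b[i:i + after + 1])
    return "\n".join(output)


doc_a = "<a>\n<b/>\n</a>"
doc_b = "<a>\n<c/>\n</a>"
report = first_difference(doc_a, doc_b, before=1, after=0)
print(report)
```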
-
DiffString
(*args, **kwargs)¶ Deprecated equivalent to
diff_string()
-
GenerateXML
(*args, **kwargs)¶ Deprecated equivalent to
generate_xml()
-
GetElementByID
(*args, **kwargs)¶ Deprecated equivalent to
get_element_by_id()
-
GetUniqueID
(*args, **kwargs)¶ Deprecated equivalent to
get_unique_id()
-
ReadFromEntity
(*args, **kwargs)¶ Deprecated equivalent to
read_from_entity()
-
ReadFromStream
(*args, **kwargs)¶ Deprecated equivalent to
read_from_stream()
-
RegisterElement
(*args, **kwargs)¶ Deprecated equivalent to
register_element()
-
UnregisterElement
(*args, **kwargs)¶ Deprecated equivalent to
unregister_element()
-
ValidationError
(*args, **kwargs)¶ Deprecated equivalent to
validation_error()
-
WriteXML
(*args, **kwargs)¶ Deprecated equivalent to
write_xml()
-
class
pyslet.xml.structures.
Element
(parent, name=None)¶ Bases:
pyslet.xml.structures.Node
Base class that represents all XML elements.
This class is usually used only as a default to represent elements with unknown content models or that require no special processing. The power of Pyslet’s XML package comes when different classes are derived from this one to represent the different (classes of) elements defined by an application. These derived classes will normally provide some form of custom serialisation behaviour (see below).
Although derived classes are free to implement a wide range of python protocols they must always return True in truth tests. An implementation of __bool__ (Python 2, __nonzero__) is provided that does this. This ensures that derived classes are free to implement __len__ but bear in mind that an instance of a derived class for which __len__ returns 0 must still evaluate to True.
Elements compare equal if their names, attribute lists and canonical children all compare equal. No rich comparison methods are provided.
In addition to truth testing, custom attribute serialisation requires a custom implementation of __getattr__, see below for more details.
Elements are usually constructed by calling the parent element’s (or document’s) Node.add_child() method. When constructed directly, the constructor requires that the parent Node be passed as an argument. If you pass None then an orphan element is created (see attach_to_parent()).
Some aspects of the element’s XML serialisation behaviour are controlled by special class attributes that can be set on derived classes.
- XMLNAME
- The default name of the element the class represents.
- XMLCONTENT
- The default content model of the element; one of the
ElementType
constants.
You can customise attribute mappings using the following special class attributes.
- ID
- The name of the ID attribute if the element has a unique ID.
With this class attribute set, ID handling is automatic (see set_id() and id below).
By default, attributes are simply stored as name/value character strings in an internal dictionary. It is often more useful to map XML attributes directly onto similarly named attributes of the instances that represent each element.
This mapping can be provided using class attributes of the form XMLATTR_aname where aname is the name of the attribute as it would appear in the element’s tag. There are a number of forms of attribute mapping.
XMLATTR_aname=<string>
This form creates a simple mapping from the XML attribute ‘aname’ to a python attribute with a defined name. For example, you might want to create a mapping like this to avoid a python reserved word:
XMLATTR_class="style_class"
This allows XML elements like this:
<element class="x"/>
To be parsed into python objects that behave like this:
element.style_class=="x" # True
If an instance is missing a python attribute corresponding to a defined XML attribute, or its value has been set to None, then the XML attribute is omitted from the element’s tag when generating XML output.
XMLATTR_aname=(<string>, decode_function, encode_function)
More complex attributes can be handled by setting XMLATTR_aname to a tuple. The first item is the python attribute name (as above); the decode_function is a simple callable that takes a string argument and returns the decoded value of the attribute and the encode_function performs the reverse transformation.
The encode/decode functions can be None to indicate a no-operation.
For example, you might want to create an integer attribute using something like:
<!-- source XML -->
<element apples="5"/>
# class attribute definition
XMLATTR_apples = ('n_apples', int, str)
# the resulting object behaves like this...
element.n_apples == 5 # True
XMLATTR_aname=(<string>, decode_function, encode_function, type)
When XML attribute values are parsed from tags the optional type component of the tuple descriptor can be used to indicate a multi-valued attribute. For example, you might want to use a multi-valued mapping for XML attributes defined using one of the plural forms: IDREFS, ENTITIES and NMTOKENS.
If the type value is not None then the XML attribute value is first split by white-space, as per the XML specification, and then the decode function is applied to each resulting component. The instance attribute is then set depending on the value of type:
- list
The instance attribute becomes a list, for example:
<!-- source XML -->
<element primes="2 3 5 7"/>
# class attribute definition
XMLATTR_primes = ('primes', int, str, list)
# resulting object behaves like this...
element.primes == [2, 3, 5, 7] # True
- dict
The instance attribute becomes a dictionary mapping parsed values on to their frequency, for example:
<!-- source XML -->
<element fruit="apple pear orange pear"/>
# class attribute definition
XMLATTR_fruit = ('fruit', None, None, dict)
# resulting object behaves like this...
element.fruit == {'apple': 1, 'orange': 1, 'pear': 2}
In this case, the decode function (if given) must return a hashable object!
When serialising to XML the reverse transformations are performed using the encode functions and the type (plain, list or dict) of the attribute’s current value. The declared multi-valued type is ignored. For dictionary values the order of the output values may not be the same as the order originally read from the XML input.
Warning: Empty lists and dictionaries result in XML attribute values that are present but with empty strings. If you wish to omit these attributes in the output XML you must set the attribute value to None.
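The decode and encode rules for plain, list and dict mappings can be modelled with two small functions. This is a sketch under stated assumptions rather than Pyslet's code: decode_attribute and encode_attribute are hypothetical names, and repeating each dictionary key once per count when serialising is a guess at how frequencies are written back out.

```python
def _identity(value):
    # Used when the decode/encode function is None (a no-operation).
    return value


def decode_attribute(text, decode=None, multi=None):
    """Decode an attribute string using an XMLATTR-style rule."""
    decode = decode or _identity
    if multi is None:
        return decode(text)
    # str.split() is a close approximation to XML white space splitting.
    items = [decode(token) for token in text.split()]
    if multi is list:
        return items
    if multi is dict:
        # Map each parsed value onto its frequency (values must hash).
        counts = {}
        for item in items:
            counts[item] = counts.get(item, 0) + 1
        return counts
    raise ValueError("unsupported multi-valued type: %r" % (multi,))


def encode_attribute(value, encode=None):
    """Reverse transformation, driven by the current value's type."""
    encode = encode or _identity
    if isinstance(value, dict):
        tokens = []
        for key, count in value.items():
            tokens.extend([encode(key)] * count)  # assumed behaviour
        return " ".join(tokens)
    if isinstance(value, list):
        return " ".join(encode(item) for item in value)
    return encode(value)


print(decode_attribute("2 3 5 7", int, list))
print(decode_attribute("apple pear orange pear", None, dict))
print(encode_attribute([2, 3, 5, 7], str))
```

Note that an empty list encodes to an empty string here, which mirrors the warning above about attributes that are present but empty.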
Some element specifications define large numbers of optional attributes and it is inconvenient to write constructors to initialise these members in each instance and possibly wasteful of memory if a document contains large numbers of such elements.
To obviate the need for optional attributes to be present in every instance an implementation of __getattr__ is provided that will ensure that element.aname returns None if ‘aname’ is the target of an attribute mapping rule, regardless of whether or not the attribute has actually been set for the instance.
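The effect of such a __getattr__ can be shown with a toy base class. This is a simplified sketch, not Pyslet's implementation: MappedAttributes and Fruit are hypothetical classes, and the real code is described as using cached maps rather than scanning the class on every lookup.

```python
class MappedAttributes:
    """Toy base class; not pyslet's Element."""

    def __getattr__(self, name):
        # Python only calls __getattr__ when normal lookup has failed,
        # so attributes that have been set are never affected.
        for cls_attr in dir(type(self)):
            if cls_attr.startswith("XMLATTR_"):
                rule = getattr(type(self), cls_attr)
                target = rule if isinstance(rule, str) else rule[0]
                if target == name:
                    return None
        raise AttributeError(name)


class Fruit(MappedAttributes):
    XMLATTR_apples = ("n_apples", int, str)
    XMLATTR_class = "style_class"


fruit = Fruit()
print(fruit.n_apples)   # mapped but never set
fruit.n_apples = 5
print(fruit.n_apples)   # the instance attribute now wins
```

Names that are not the target of any mapping rule still raise AttributeError, so typos are not silently masked.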
Implementation note: internally, the XMLATTR_* descriptors are parsed into two mappings the first time they are needed. The forward map maps XML attribute names onto tuples of:
(<python attribute name>, decode_function, type)
The reverse map maps python attribute names onto a tuple of:
(<xml attribute name>, encode_function)
XML attribute names may contain many characters that are not legal in Python syntax but automated attribute processing is still supported for these attributes even though the declaration cannot be written into the class definition. Use the builtin function setattr immediately after the class is defined, for example:
class MyElement(Element):
    pass

setattr(MyElement, 'XMLATTR_hyphen-attr', 'hyphen_attr')
-
XMLCONTENT
= 2¶ We default to a mixed content model
-
set_xmlname
(name)¶ Sets the name of this element
- name
- A character string.
You will not normally need to call this method, it is called automatically during child creation.
-
get_xmlname
()¶ Returns the name of this element
In the default implementation this is a simple character string.
-
get_document
()¶ Returns the document that contains the element.
If the element is an orphan, or is the descendent of an orphan then None is returned.
-
set_id
(id)¶ Sets the id of the element
The change is registered with the enclosing document. If the id is already taken then
XMLIDClashError
is raised.
-
classmethod
mangle_aname
(name)¶ Returns a mangled attribute name
A mangled attribute name is a simple name prefixed with “XMLATTR_”.
-
classmethod
unmangle_aname
(mname)¶ Returns an unmangled attribute name.
If mname is not a mangled name, None is returned. A mangled attribute name starts with “XMLATTR_”.
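The mangling rule is simple enough to restate as two standalone functions. These are illustrative module-level equivalents written for this note, not the class methods themselves.

```python
PREFIX = "XMLATTR_"


def mangle_aname(name):
    """Return the mangled form of an XML attribute name."""
    return PREFIX + name


def unmangle_aname(mname):
    """Return the XML attribute name, or None if mname is not mangled."""
    if mname.startswith(PREFIX):
        return mname[len(PREFIX):]
    return None


print(mangle_aname("hyphen-attr"))
print(unmangle_aname("XMLATTR_class"))
print(unmangle_aname("style_class"))
```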
-
get_attributes
()¶ Returns a dict mapping attribute names onto values.
Each attribute value is represented as a character string. Derived classes MUST override this method if they define any custom attribute mappings.
The dictionary returned represents a copy of the information in the element and so may be modified by the caller.
-
set_attribute
(name, value)¶ Sets the value of an attribute.
- name
- The name of the attribute to set
- value
- The value of the attribute (as a character string) or None to remove the attribute.
-
get_attribute
(name)¶ Gets the value of a single attribute as a string.
If the element has no attribute with name then KeyError is raised.
This method searches the attribute mappings and will return attribute values obtained by encoding the associated objects according to the mapping.
-
is_valid_name
(value)¶ Returns True if a character string is a valid NAME
This test can be done standalone using the module function of the same name (this implementation defaults to using that function). By checking validity in the context of an element derived classes may override this test.
This test is currently only used when checking IDs (see set_id()).
-
is_empty
()¶ Whether this element must be empty.
If the class defines the XMLCONTENT attribute then the model is taken from there and this method returns True only if XMLCONTENT is ElementType.EMPTY.
Otherwise, the method defaults to False.
-
is_mixed
()¶ Whether or not the element may contain mixed content.
If the class defines the XMLCONTENT attribute then the model is taken from there and this method returns True only if XMLCONTENT is ElementType.MIXED.
Otherwise, the method defaults to True.
-
get_children
()¶ Returns an iterable of the element’s children.
This method iterates through the internal list of children only. Derived classes with custom models (i.e., those that define attributes to customise child element creation) MUST override this method.
Each child is either a character string or an instance of Element (or a derived class thereof). We do not represent comments, processing instructions or other meta-markup.
-
get_canonical_children
()¶ Returns children with canonical white space
A wrapper for get_children() that returns an iterable of the element’s children canonicalized for white space as follows. We check the current setting of xml:space, returning the same list of children as get_children() if ‘preserve’ is in force. Otherwise we remove any leading space and collapse all others to a single space character.
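For a single run of character data, a literal reading of that rule looks like the sketch below. It is an assumption-based illustration only: canonical_space is a hypothetical helper, it follows the wording above rather than Pyslet's source, and it does not model how space is handled across several children.

```python
import re

# XML white space is limited to space, tab, carriage return and line feed.
XML_SPACE = re.compile(r"[ \t\r\n]+")


def canonical_space(text, preserve=False):
    """Collapse white space in text unless xml:space is 'preserve'."""
    if preserve:
        return text
    # Collapse every run of white space to one space...
    collapsed = XML_SPACE.sub(" ", text)
    # ...then drop any space left at the start.
    return collapsed.lstrip(" ")


print(repr(canonical_space("  Hello \n\t world  ")))
print(repr(canonical_space("  Hello \n\t world  ", preserve=True)))
```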
-
get_or_add_child
(child_class)¶ Returns the first child of type child_class
If there is no child of that class then a new child is added.
-
add_child
(child_class, name=None)¶ Adds a new child of the given class attached to this element.
- child_class
- A class object (or callable) used to create a new instance.
- name
- The name given to the element (by the caller). If no name is given then the default name for the child is used. When the child returned is an existing instance, name is ignored.
By default, an instance of child_class is created and attached to the internal list of child elements.
Child creation can be customised to support a more natural mapping for structured elements as follows. Firstly, the name of child_class (not the element name) is looked up in the parent (self), if there is no match, the method resolution order is followed for child_class looking up the names of each base in turn until a matching attribute is found. If there are no matches then the default handling is performed.
Otherwise, the behaviour is determined by the matching attribute as follows.
1. If the attribute is None then a new instance of child_class is created and assigned to the attribute.
2. If the attribute is a list then a new instance of child_class is created and appended to the attribute’s value.
3. Finally, if the attribute value is already an instance of child_class it is returned unchanged.
4. Deprecated: A method attribute is called either without arguments (if the method name matches the child_class exactly) or with the child_class itself passed as an argument. It must return the new child element.
In summary, a new child is created and attached to the element’s model unless the model supports a single element of the given child_class and the element already exists (as evidenced by an attribute with the name of child_class or one of its bases), in which case the existing instance is returned.
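The attribute-driven rules can be sketched with a toy class hierarchy. Everything here is a hypothetical stand-in (Element, Book, Title, Author and Note are not Pyslet classes), and the deprecated method-based rule is left out.

```python
_MISSING = object()  # distinguishes "no attribute" from "attribute is None"


class Element:
    """Toy stand-in for an element; not pyslet's Element."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []

    def add_child(self, child_class):
        # Follow the method resolution order of child_class looking for
        # an attribute on self named after the class or one of its bases.
        for base in child_class.__mro__:
            current = getattr(self, base.__name__, _MISSING)
            if current is _MISSING:
                continue
            if current is None:
                child = child_class(self)
                setattr(self, base.__name__, child)
                return child
            if isinstance(current, list):
                child = child_class(self)
                current.append(child)
                return child
            if isinstance(current, child_class):
                return current  # existing single child
            raise TypeError("unexpected value for %s" % base.__name__)
        # No matching attribute: default handling
        child = child_class(self)
        self.children.append(child)
        return child


class Title(Element):
    pass


class Author(Element):
    pass


class Note(Element):
    pass


class Book(Element):
    def __init__(self, parent=None):
        super().__init__(parent)
        self.Title = None   # at most one Title
        self.Author = []    # any number of Authors


book = Book()
first_title = book.add_child(Title)
second_title = book.add_child(Title)
book.add_child(Author)
book.add_child(Author)
note = book.add_child(Note)
print(first_title is second_title, len(book.Author), len(book.children))
```

A class that customises its model this way would also need to override get_children(), since these children never reach the internal list.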
-
remove_child
(child)¶ Removes a child from this element’s children.
- child
- An Element instance that must be a direct child, that is, one that would be yielded by get_children().
By default, we search the internal list of child elements.
For content model customisation we follow the same name matching conventions as for child creation (see add_child()). If a matching attribute is found then we process it as follows:
1. If the attribute’s value is child then it is set to None; if it is not child then XMLUnknownChild is raised.
2. If the attribute is a list then we remove child from the list. If child is not in the list XMLUnknownChild is raised.
3. If the attribute is None then we raise XMLUnknownChild.
-
find_children
(child_class, child_list, max=None)¶ Finds children of a given class
Deprecated in favour of:
list(e.find_children_depth_first(child_class, False))
- child_class
- A class object derived from Element. May also be a tuple as per the definition of the builtin isinstance function in python.
- child_list
- A list. Matching children are appended to this.
- max (defaults to None)
- Maximum number of children to match (None means no limit). This value is used to check against the length of child_list so any elements already present will count towards the total.
Nested matches are not included. In other words, if the model of child_class allows further elements of type child_class as children (directly or indirectly) then only the top-level match is returned. (Use find_children_depth_first() for a way to return recursive lists of matching children.)
The search is done depth first so children are returned in the logical order they would appear in the document.
-
find_children_breadth_first
(child_class, sub_match=True, max_depth=1000, **kws)¶ Generates all children of a given class
- child_class
- A class object derived from Element. May also be a tuple as per the definition of the builtin isinstance function in python.
- sub_match (defaults to True)
- Matching elements are also scanned for nested matches. If False, only the outer-most matching element is returned.
- max_depth
- Controls the maximum depth of the scan with level 1 indicating direct children only. It must be a positive integer and defaults to 1000.
Warning: to reduce memory requirements when searching large documents this method performs a two-pass scan of the element’s children, i.e., get_children() will be called twice.
Given that XML documents tend to be broader than they are deep, find_children_depth_first() is a better method to use for general purposes.
-
find_children_depth_first
(child_class, sub_match=True, max_depth=1000, **kws)¶ Generates all children of a given class
- child_class
- A class object derived from Element. May also be a tuple as per the definition of the builtin isinstance function in python.
- sub_match (defaults to True)
- Matching elements are also scanned for nested matches. If False, only the outer-most matching element is returned.
- max_depth
- Controls the maximum depth of the scan with level 1 indicating direct children only. It must be a positive integer and defaults to 1000.
Uses a depth-first scan of the element hierarchy rooted at the current element.
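The difference in ordering between the two scans, and the effect of sub_match and max_depth, can be seen on a toy tree. This sketch uses hypothetical classes and a plain queue for the breadth-first case; it does not reproduce the two-pass strategy mentioned above.

```python
from collections import deque


class Element:
    """Toy tree node; not pyslet's Element."""

    def __init__(self, name, *children):
        self.name = name
        self.children = list(children)


class Section(Element):
    pass


class Para(Element):
    pass


def depth_first(element, child_class, sub_match=True, max_depth=1000):
    if max_depth < 1:
        return
    for child in element.children:
        if isinstance(child, child_class):
            yield child
            if not sub_match:
                continue  # do not look inside a match
        yield from depth_first(child, child_class, sub_match, max_depth - 1)


def breadth_first(element, child_class, sub_match=True, max_depth=1000):
    queue = deque([(element, 1)])
    while queue:
        parent, depth = queue.popleft()
        for child in parent.children:
            if isinstance(child, child_class):
                yield child
                if not sub_match:
                    continue
            if depth < max_depth:
                queue.append((child, depth + 1))


root = Element(
    "root",
    Section("s1", Para("p1"), Section("s1.1")),
    Section("s2"),
)
dfs = [e.name for e in depth_first(root, Section)]
bfs = [e.name for e in breadth_first(root, Section)]
print(dfs, bfs)
```

The depth-first order matches the order in which the elements would appear in the serialised document, which is usually what callers expect.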
-
find_parent
(parent_class)¶ Finds the first parent of the given class.
- parent_class
- A class object descended from
Element
.
Traverses the hierarchy through parent elements until a matching parent is found or returns None.
-
attach_to_parent
(parent)¶ Called to attach an orphan element to a parent.
This method is not normally needed; when creating XML elements you would normally call add_child() on the parent, which ensures that elements are created in the context of a parent node. The purpose of this method is to allow orphaned elements to be associated with a (new) parent. For example, after being detached from one element hierarchy and attached to another.
This method does not do any special handling of child elements; the caller takes responsibility for ensuring that this element will be returned by future calls to parent.get_children(). However, attach_to_doc() is called to ensure id registrations are made.
-
attach_to_doc
(doc=None)¶ Called when the element is first attached to a document.
This method is not normally needed; when creating XML elements you would normally call add_child() on the parent, which ensures that elements are created in the context of a containing document. The purpose of this method is to allow orphaned elements to be associated with a parent (document) after creation. For example, after being detached from one element hierarchy and attached to another (possibly in a different document).
The default implementation ensures that any ID attributes belonging to this element or its descendents are registered.
-
detach_from_parent
()¶ Called to detach an element from its parent
The result is that this element becomes an orphan.
This method does not do any special handling of child elements; the caller takes responsibility for ensuring that this element will no longer be returned by future calls to the (former) parent’s get_children() method.
We do call detach_from_doc() to ensure id registrations are removed, and parent is set to None.
-
detach_from_doc
(doc=None)¶ Called when an element is being detached from a document.
- doc
- The document the element is being detached from, if None then this is determined automatically. Provided as an optimisation for speed when detaching large parts of the element hierarchy.
The default implementation ensures that any ID attributes belonging to this element or its descendents are unregistered.
-
add_data
(data)¶ Adds a character string to this element’s children.
This method raises a validation error if the element cannot take data children.
-
content_changed
()¶ Notifies an element that its content has changed.
Called by the parser once the element’s attribute values and content have been parsed from the source. Can be used to trigger any internal validation required following manual changes to the element.
The default implementation tidies up the list of children reducing runs of data to a single unicode string to make future operations simpler and faster.
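The tidy-up step, merging adjacent runs of character data, can be sketched with itertools.groupby. The function name merge_data_runs is hypothetical and the list of children is a plain Python list standing in for the element's internal list.

```python
from itertools import groupby


def merge_data_runs(children):
    """Merge adjacent strings, leaving other children untouched."""
    merged = []
    for is_data, run in groupby(children, key=lambda c: isinstance(c, str)):
        if is_data:
            # A run of consecutive strings becomes one string.
            merged.append("".join(run))
        else:
            merged.extend(run)
    return merged


marker = object()  # stands in for a child element
tidy = merge_data_runs(["Hello", ", ", "world", marker, "!", "?"])
print(tidy)
```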
-
generate_value
(ignore_elements=False)¶ Generates strings representing the element’s content
A companion method to
get_value()
which is useful when handling elements that contain a large amount of data. For more information see get_value()
.
-
get_value
(ignore_elements=False)¶ Returns a single object representing the element’s content.
- ignore_elements
- If True then any elements found in mixed content are
ignored. If False then any child elements cause
XMLMixedContentError
to be raised.
The default implementation returns a character string and is only supported for elements where mixed content is permitted (
is_mixed()
). It uses
generate_value()
to iterate through the children.
If the element is empty an empty string is returned.
Derived classes may return more complex objects, such as values of basic python types or class instances that better represent the content of the element.
You can pass ignore_elements as True to override this behaviour in the unlikely event that you want:
<!-- elements like this... -->
<data>This is <em>the</em> value</data>
# to behave like this:
data.get_value(True) == "This is value"
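The relationship between generate_value() and get_value() can be sketched without pyslet. The Elem class and the MixedContentError exception below are hypothetical stand-ins, not the library's own classes. The sketch only shows the join-the-generated-strings idea and the effect of ignore_elements.

```python
class MixedContentError(Exception):
    """Stand-in for XMLMixedContentError (hypothetical)."""


class Elem:
    """Minimal stand-in for an element with mixed content."""

    def __init__(self, *children):
        self.children = list(children)

    def generate_value(self, ignore_elements=False):
        # yield each run of character data in document order
        for child in self.children:
            if isinstance(child, str):
                yield child
            elif not ignore_elements:
                raise MixedContentError("unexpected child element")

    def get_value(self, ignore_elements=False):
        # join the generated strings into a single value
        return "".join(self.generate_value(ignore_elements))


data = Elem("This is ", Elem("the"), " value")
value = data.get_value(True)
```

In this sketch the data on either side of the ignored element is simply concatenated, so both the space before it and the space after it survive in value. Calling data.get_value() without the flag raises the stand-in error.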
-
set_value
(value)¶ Replaces the content of the element.
- value
- A character string used to replace the content of the element. Derived classes may support a wider range of value types, if the default implementation encounters anything other than a character string it attempts to convert it before setting the content.
The default implementation is only supported for elements where mixed content is permitted (see
is_mixed()
) and only affects the internally maintained list of children. Elements with more complex mixed models MUST override this method.
If value is None then the element becomes empty.
-
reset
(reset_attrs=False)¶ Resets all children (and optionally attribute values).
- reset_attrs
- Whether or not to reset attribute values too.
Called by the default implementation of
set_value()
with reset_attrs=False, removes all children from the internally maintained list of children.
Called by the default implementation of
add_child()
with reset_attrs=True when an existing element instance is being recycled (obviating the constructor). The default implementation removes only unmapped attribute values. Mapped attribute values are not reset.
Derived classes should call this method if they override the implementation of
set_value()
.
Derived classes with custom content models, i.e., those that provide a custom implementation for
get_children()
, must override this method and treat it as an event associated with parsing the start tag of the element. (This method is also a useful signal for resetting any state used for validating custom content models.)
Required children should be reset and optional children should be orphaned using
detach_from_parent()
and any references to them in instance attributes removed. Failure to override this method can result in the child elements accumulating from one read to the next.
-
validation_error
(msg, data=None, aname=None)¶ Called when a validation error occurred in this element.
- msg
- Message suitable for logging and reporting the nature of the error.
- data
- The data that caused the error may be given in data.
- aname
- The attribute name may also be given indicating that the offending data was in an attribute of the element and not the element itself.
The default implementation simply calls the containing Document’s
Document.validation_error()
method. If the element is an orphan then XMLValidityError
is raised directly with msg.
-
static
sort_names
(name_list)¶ Sorts names in a predictable order
- name_list
- A list of element or attribute names
The default implementation assumes that the names are strings or unicode strings so uses the default sort method.
-
deepcopy
(parent=None)¶ Creates a deep copy of this element.
- parent
- The parent node to attach the new element to. If it is None then a new orphan element is created.
This method mimics the process of serialisation and deserialisation (without the need to generate markup). As a result, element attributes are serialised and deserialised to strings during the copy process.
-
get_base
()¶ Returns the value of the xml:base attribute as a string.
-
set_base
(base)¶ Sets the value of the xml:base attribute from a string.
Changing the base of an element affects the interpretation of all relative URIs in this element and its children.
-
resolve_base
()¶ Returns the base of the current element.
The URI is calculated using any xml:base values of the element or its ancestors and ultimately relative to the base URI of the document itself.
If the element is not contained by a Document, or the document does not have a fully specified base_uri then the return result may be a relative path or even None, if no base information is available.
The return result is always None or a character string, such as would be obtained from the xml:base attribute.
-
resolve_uri
(uriref)¶ Resolves a URI reference in the current context.
- uriref
- A
pyslet.rfc2396.URI
instance or a string that one can be parsed from.
The argument is resolved relative to the xml:base values of the element’s ancestors and ultimately relative to the document’s base. The result may still be a relative URI: there may be no base set or the base may only be known in relative terms.
For example, if the Document was loaded from the URL:
http://www.example.com/images/catalog.xml
and e is an element in that document then:
e.resolve_uri('smiley.gif')
would return a URI instance representing the fully-specified URI:
http://www.example.com/images/smiley.gif
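The same resolution rule can be tried out with the standard library alone. The sketch below uses urllib.parse.urljoin on plain strings rather than pyslet.rfc2396.URI instances, and the 'icons/' xml:base value is a made-up example.

```python
from urllib.parse import urljoin

# base URI of the document the element belongs to
doc_base = "http://www.example.com/images/catalog.xml"

# the element itself declares no xml:base, so the document base applies
resolved = urljoin(doc_base, "smiley.gif")

# an xml:base on an ancestor is itself resolved against the document
# base before being used ('icons/' is a made-up value)
ancestor_base = urljoin(doc_base, "icons/")
nested = urljoin(ancestor_base, "smiley.gif")
```

Here resolved matches the fully-specified URI shown above, while nested picks up the extra path segment contributed by the ancestor's xml:base.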
-
relative_uri
(href)¶ Returns href expressed relative to the element’s base.
- href
- A
pyslet.rfc2396.URI
instance or a string that one can be parsed from.
If href is already a relative URI then it is converted to a fully specified URL by interpreting it as being the URI of a file expressed relative to the current working directory.
For example, if the Document was loaded from the URL:
http://www.example.com/images/catalog.xml
and e is an element in that document then:
e.relative_uri('http://www.example.com/images/smiley.gif')
would return a URI instance representing the relative URI:
'smiley.gif'
If the element does not have a fully-specified base URL then the result is a fully-specified URL itself.
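As a rough illustration of the reverse operation, the following sketch computes a relative reference using only the standard library. The helper relative_to_base is hypothetical and handles only the simple case where both URLs share a scheme and host. It is not how pyslet implements relative_uri().

```python
import posixpath
from urllib.parse import urlsplit


def relative_to_base(href, base):
    """Express href relative to base (hypothetical helper).

    Only handles the simple case where both URLs share scheme and
    host; otherwise href is returned unchanged.
    """
    h, b = urlsplit(href), urlsplit(base)
    if (h.scheme, h.netloc) != (b.scheme, b.netloc):
        return href
    # relative references are resolved against the base's directory
    base_dir = posixpath.dirname(b.path)
    return posixpath.relpath(h.path, base_dir)


base = "http://www.example.com/images/catalog.xml"
rel = relative_to_base("http://www.example.com/images/smiley.gif", base)
up = relative_to_base("http://www.example.com/css/site.css", base)
```

With the catalog URL as base, rel is 'smiley.gif' and up is '../css/site.css'.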
-
get_lang
()¶ Returns the value of the xml:lang attribute as a string.
-
set_lang
(lang)¶ Sets the value of the xml:lang attribute from a string.
See
resolve_lang()
for how to obtain the effective language of an element.
-
resolve_lang
()¶ Returns the effective language for the current element.
The language is resolved using the xml:lang value of the element or its ancestors. If no xml:lang is in effect then None is returned.
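The inheritance rule can be sketched with a minimal stand-in class. The Node class below is hypothetical and only models a parent link and an optional language value. The same walk up the hierarchy applies to the other inherited xml: attributes.

```python
class Node:
    """Minimal stand-in for an element with a parent link."""

    def __init__(self, parent=None, lang=None):
        self.parent = parent
        self.lang = lang      # plays the role of the xml:lang attribute

    def resolve_lang(self):
        # walk up the hierarchy until a declared language is found
        node = self
        while node is not None:
            if node.lang is not None:
                return node.lang
            node = node.parent
        return None


root = Node(lang="en-GB")
para = Node(parent=root)
quote = Node(parent=para, lang="fr")
```

In this sketch para inherits 'en-GB' from root, quote overrides it with 'fr', and an orphan with no language set resolves to None.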
-
get_space
()¶ Gets the value of the xml:space attribute
-
set_space
(space)¶ Sets the xml:space attribute
- space
- A character string containing the new value or None to clear the attribute definition on this element.
-
resolve_space
(space)¶ Returns the effective space policy for the current element.
The policy is resolved using the value returned by
get_space()
on this element or its ancestors. If no space policy is in effect then None is returned.
-
can_pretty_print
()¶ True if this element’s content may be pretty-printed.
This method is used when formatting XML files to text streams. The output is also affected by the xml:space attribute. Derived classes can override the default behaviour.
The difference between this method and the xml:space attribute is that this method indicates if white space can be safely added to the output to improve formatting by inserting line feeds to break it over multiple lines and to insert spaces or tab characters to indent tags.
On the other hand, xml:space=’preserve’ indicates that white space in the original document must not be taken away. It therefore makes sense that if
get_space()
returns ‘preserve’ we will return False. Derived classes may consider providing an implementation of get_space that always returns ‘preserve’ and using the default implementation of this method.
This method will return False if one of the following is true:
- the special attribute SGMLCDATA is present
- the special content model attribute
XMLCONTENT
indicates that the element may contain mixed content (this is the default for generic instances of
Element
)
- get_space()
is set to ‘preserve’ (xml:space)
- self.parent.can_pretty_print() returns False
Otherwise we return True.
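The decision described above amounts to a short chain of tests. The function below is a hypothetical restatement of those rules with the inputs passed in explicitly. It is a sketch for reasoning about the outcome, not the library's implementation.

```python
def may_pretty_print(has_sgmlcdata, is_mixed, space, parent_allows):
    """Apply the four tests listed above (hypothetical helper).

    has_sgmlcdata  -- the special SGMLCDATA attribute is present
    is_mixed       -- the content model allows mixed content
    space          -- value reported for xml:space, or None
    parent_allows  -- result of the same test on the parent
    """
    if has_sgmlcdata or is_mixed:
        return False
    if space == "preserve":
        return False
    return parent_allows
```

For example, an element-content element with no xml:space setting inside a parent that allows pretty-printing gives True, while the same element under a parent that returns False gives False.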
-
write_xml_attributes
(attributes, escape_function=<function escape_char_data>, root=False, **kws)¶ Creates strings serialising the element’s attributes
- attributes
- A list of character strings
- escape_function
- The function that will be used to escape character data. The
default is
escape_char_data()
. The alternate name escapeFunction is supported for backwards compatibility.
- root
- Indicates if this element should be treated as the root element. By default there is no special action required but derived classes may need to generate additional attributes, such as those that relate to the namespaces or schema used by the element.
The attributes are generated as strings of the form ‘name=”value”’ with values escaped appropriately for serialised XML output. The attributes are always sorted into a predictable order (based on attribute name) to ensure that identical documents produce identical output.
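The output format can be illustrated with the standard library. The helper serialise_attributes below is hypothetical and uses xml.sax.saxutils.escape in place of pyslet's escape_char_data(). It takes a plain dictionary rather than an element.

```python
from xml.sax.saxutils import escape


def serialise_attributes(attrs):
    """Return a list of 'name="value"' strings sorted by name.

    Hypothetical helper; attrs is a plain dict of strings.
    """
    extra = {'"': "&quot;"}   # values are wrapped in double quotes
    return ['%s="%s"' % (name, escape(attrs[name], extra))
            for name in sorted(attrs)]


items = serialise_attributes({"title": 'Tom & "Jerry"', "id": "c1"})
```

Sorting by name means the id attribute is emitted before title regardless of the order in which they were set, which is what makes the output repeatable.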
-
generate_xml
(escape_function=<function escape_char_data>, indent='', tab='\t', root=False, **kws)¶ A generator that yields serialised XML
- escape_function
- The function that will be used to escape character data. The
default is
escape_char_data()
. The alternate name escapeFunction is supported for backwards compatibility.
- indent (defaults to an empty string)
- The string to use for passing any inherited indent, used in combination with the tab parameter for pretty printing. See below.
- tab (defaults to ‘\t’)
- Whether or not indentation will be used is determined by the tab parameter. If it is empty then no pretty-printing is performed for the element, otherwise the element will start with a line-feed followed by any inherited indent and finally followed by the content of tab. For example, if you prefer to have your XML serialised with a 4-space indent then pass tab='    '.
If the element is in a context where pretty printing is not allowed (see
can_pretty_print()
) then tab is ignored.
- root (defaults to False)
- Indicates if this is the root element of the document. See
write_xml_attributes()
.
Yields character strings.
-
write_xml
(writer, escape_function=<function escape_char_data>, indent='', tab='\t', root=False, **kws)¶ Writes serialized XML to an output stream
- writer
- A file or file-like object operating in binary mode.
The other arguments follow the same pattern as
generate_xml()
which this method uses to create the output which is always UTF-8 encoded.
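The division of labour between the generator and the writer can be sketched as follows. The functions generate_markup and write_markup are hypothetical stand-ins for generate_xml() and write_xml(). The point is only that character strings are encoded as UTF-8 before reaching the binary stream.

```python
import io


def generate_markup():
    """Stand-in for generate_xml(): yields character strings."""
    yield "<price currency=\"EUR\">"
    yield "9,99 \u20ac"
    yield "</price>"


def write_markup(writer):
    # encode each string as UTF-8 before writing to the binary stream
    for chunk in generate_markup():
        writer.write(chunk.encode("utf-8"))


out = io.BytesIO()
write_markup(out)
data = out.getvalue()
```

Because the writer receives bytes, a file passed to the real method should likewise be opened in binary mode, as noted for the writer argument.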
-
AddData
(*args, **kwargs)¶ Deprecated equivalent to
add_data()
-
AttachToDocument
(*args, **kwargs)¶ Deprecated equivalent to
attach_to_doc()
-
AttachToParent
(*args, **kwargs)¶ Deprecated equivalent to
attach_to_parent()
-
ContentChanged
(*args, **kwargs)¶ Deprecated equivalent to
content_changed()
-
Copy
(*args, **kwargs)¶ Deprecated equivalent to
deepcopy()
-
DeleteChild
(*args, **kwargs)¶ Deprecated equivalent to
remove_child()
-
DetachFromDocument
(*args, **kwargs)¶ Deprecated equivalent to
detach_from_doc()
-
DetachFromParent
(*args, **kwargs)¶ Deprecated equivalent to
detach_from_parent()
-
FindChildren
(*args, **kwargs)¶ Deprecated equivalent to
find_children()
-
FindChildrenBreadthFirst
(*args, **kwargs)¶ Deprecated equivalent to
find_children_breadth_first()
-
FindChildrenDepthFirst
(*args, **kwargs)¶ Deprecated equivalent to
find_children_depth_first()
-
FindParent
(*args, **kwargs)¶ Deprecated equivalent to
find_parent()
-
GenerateXML
(*args, **kwargs)¶ Deprecated equivalent to
generate_xml()
-
GetAttribute
(*args, **kwargs)¶ Deprecated equivalent to
get_attribute()
-
GetAttributes
(*args, **kwargs)¶ Deprecated equivalent to
get_attributes()
-
GetCanonicalChildren
(*args, **kwargs)¶ Deprecated equivalent to
get_canonical_children()
-
GetDocument
(*args, **kwargs)¶ Deprecated equivalent to
get_document()
-
GetValue
(*args, **kwargs)¶ Deprecated equivalent to
get_value()
-
GetXMLName
(*args, **kwargs)¶ Deprecated equivalent to
get_xmlname()
-
IsEmpty
(*args, **kwargs)¶ Deprecated equivalent to
is_empty()
-
IsMixed
(*args, **kwargs)¶ Deprecated equivalent to
is_mixed()
-
IsValidName
(*args, **kwargs)¶ Deprecated equivalent to
is_valid_name()
-
classmethod
MangleAttributeName
(*args, **kwargs)¶ Deprecated equivalent to
mangle_aname()
-
PrettyPrint
(*args, **kwargs)¶ Deprecated equivalent to
can_pretty_print()
-
RelativeURI
(*args, **kwargs)¶ Deprecated equivalent to
relative_uri()
-
ResolveBase
(*args, **kwargs)¶ Deprecated equivalent to
resolve_base()
-
ResolveLang
(*args, **kwargs)¶ Deprecated equivalent to
resolve_lang()
-
ResolveURI
(*args, **kwargs)¶ Deprecated equivalent to
resolve_uri()
-
SetAttribute
(*args, **kwargs)¶ Deprecated equivalent to
set_attribute()
-
SetSpace
(*args, **kwargs)¶ Deprecated equivalent to
set_space()
-
SetValue
(*args, **kwargs)¶ Deprecated equivalent to
set_value()
-
SetXMLName
(*args, **kwargs)¶ Deprecated equivalent to
set_xmlname()
-
static
SortNames
(*args, **kwargs)¶ Deprecated equivalent to
sort_names()
-
classmethod
UnmangleAttributeName
(*args, **kwargs)¶ Deprecated equivalent to
unmangle_aname()
-
ValidationError
(*args, **kwargs)¶ Deprecated equivalent to
validation_error()
-
WriteXML
(*args, **kwargs)¶ Deprecated equivalent to
write_xml()
-
WriteXMLAttributes
(*args, **kwargs)¶ Deprecated equivalent to
write_xml_attributes()
-
pyslet.xml.structures.
map_class_elements
(class_map, scope)¶ Adds element name -> class mappings to class_map
- class_map
- A dictionary that maps XML element names onto class objects that should be used to represent them.
- scope
- A dictionary, or an object containing a __dict__ attribute, that will be scanned for class objects to add to the mapping. This enables scope to be a module. The search is not recursive, to add class elements from imported modules you must call map_class_elements for each module.
Mappings are added for each class that is derived from
Element
that has an XMLNAME attribute defined. It is an error if a class is found with an XMLNAME that has already been mapped.
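The scanning behaviour can be sketched independently of pyslet. Everything in the block below is a stand-in: the Element base class, the map_named_classes helper and the choice of ValueError for a duplicate name are assumptions made for illustration.

```python
class Element:
    """Stand-in base class (not pyslet's Element)."""


class Title(Element):
    XMLNAME = "title"


class Para(Element):
    XMLNAME = "p"


class Helper(Element):
    pass    # no XMLNAME, so it is not mapped


def map_named_classes(class_map, scope, base=Element):
    """Add XMLNAME -> class entries found in scope to class_map."""
    names = scope if isinstance(scope, dict) else vars(scope)
    for obj in names.values():
        if not (isinstance(obj, type) and issubclass(obj, base)):
            continue
        # skip classes that have no XMLNAME attribute
        xmlname = getattr(obj, "XMLNAME", None)
        if xmlname is None:
            continue
        if xmlname in class_map:
            raise ValueError("duplicate mapping for %r" % xmlname)
        class_map[xmlname] = obj


class_map = {}
map_named_classes(class_map, {"Title": Title, "Para": Para, "Helper": Helper})
```

After the call class_map contains entries for 'title' and 'p' only. Passing a module instead of a dictionary works in the sketch because vars() returns the module's __dict__, which mirrors the description of scope above.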
6.2.2.1.1. Exceptions¶
-
class
pyslet.xml.structures.
XMLMissingResourceError
¶ Bases:
pyslet.xml.structures.XMLError
Raised when an entity cannot be found (e.g., missing file).
Also raised when an external entity reference is encountered but the opening of external entities is turned off.
-
class
pyslet.xml.structures.
XMLMissingLocationError
¶ Bases:
pyslet.xml.structures.XMLError
Raised on create, read or update when base_uri is None
-
class
pyslet.xml.structures.
XMLUnsupportedSchemeError
¶ Bases:
pyslet.xml.structures.XMLError
Document.base_uri
has an unsupported scheme.
Currently only file, http and https schemes are supported for open operations. For create and update operations, only file types are supported.
-
class
pyslet.xml.structures.
XMLUnexpectedHTTPResponse
¶ Bases:
pyslet.xml.structures.XMLError
Raised by
Document.open_uri()
The message contains the response code and status message received from the server.
6.2.2.2. Prolog and Document Type Declaration¶
-
class
pyslet.xml.structures.
XMLDTD
¶ Bases:
pyslet.pep8.MigratedClass
An object that models a document type declaration.
The document type declaration acts as a container for the entity, element and attribute declarations used in a document.
-
name
= None¶ The declared Name of the root element
-
parameter_entities
= None¶ A dictionary of XMLParameterEntity instances keyed on entity name.
-
general_entities
= None¶ A dictionary of XMLGeneralEntity instances keyed on entity name.
-
notations
= None¶ A dictionary of XMLNotation instances keyed on notation name.
-
element_list
= None¶ A dictionary of
ElementType
definitions keyed on element name.
-
attribute_lists
= None¶ A dictionary of dictionaries, keyed on element name. Each of the resulting dictionaries is a dictionary of
XMLAttributeDefinition
keyed on attribute name.
-
declare_entity
(entity)¶ Declares an entity in this document.
The same method is used for both general and parameter entities. The value of entity can be either an
XMLGeneralEntity
or an XMLParameterEntity
instance.
-
get_parameter_entity
(name)¶ Returns the parameter entity definition matching name.
Returns an instance of
XMLParameterEntity
. If no parameter entity has been declared with name then None is returned.
-
get_entity
(name)¶ Returns the general entity definition matching name.
Returns an instance of
XMLGeneralEntity
. If no general entity has been declared with name then None is returned.
-
declare_notation
(notation)¶ Declares a notation for this document.
The value of notation must be a
XMLNotation
instance.
-
get_notation
(name)¶ Returns the notation declaration matching name.
- name
- The name of the notation to search for.
Returns an instance of
XMLNotation
. If no notation has been declared with name then None is returned.
-
declare_element_type
(etype)¶ Declares an element type.
- etype
- An
ElementType
instance containing the element definition.
-
get_element_type
(element_name)¶ Looks up an element type definition.
- element_name
- the name of the element type to look up
The method returns an instance of
ElementType
or None if no element with that name has been declared.
-
declare_attribute
(element_name, attr_def)¶ Declares an attribute.
- element_name
- the name of the element type which should have this attribute applied
- attr_def
- An
XMLAttributeDefinition
instance describing the attribute being declared.
-
get_attribute_list
(name)¶ Returns a dictionary of attribute definitions
- name
- The name of the element type to look up.
If there are no attributes declared for this element type, None is returned.
-
get_attribute_definition
(element_name, attr_name)¶ Looks up an attribute definition.
- element_name
- the name of the element type in which to search
- attr_name
- the name of the attribute to search for.
The method returns an instance of
XMLAttributeDefinition
or None if no attribute matching this description has been declared.
-
GetAttributeList
(*args, **kwargs)¶ Deprecated equivalent to
get_attribute_list()
-
-
class
pyslet.xml.structures.
XMLDeclaration
(version, encoding='UTF-8', standalone=False)¶ Bases:
pyslet.xml.structures.XMLTextDeclaration
Represents a full XML declaration.
Unlike the parent class,
XMLTextDeclaration
, the version is required. standalone defaults to False as this is the assumed value if there is no standalone declaration.
-
standalone
= None¶ Whether an XML document is standalone.
-
-
class
pyslet.xml.structures.
ElementType
¶ Bases:
object
Represents element type definitions.
-
EMPTY
= 0¶ Content type constant for EMPTY
-
ANY
= 1¶ Content type constant for ANY
-
MIXED
= 2¶ Content type constant for mixed content
-
ELEMENT_CONTENT
= 3¶ Content type constant for element content
-
SGMLCDATA
= 4¶ Additional content type constant for SGML CDATA
-
entity
= None¶ The entity in which this element was declared
-
name
= None¶ The name of this element
-
content_type
= None¶ The content type of this element, one of the constants defined above.
-
content_model
= None¶ A
XMLContentParticle
instance which contains the element’s content model or None in the case of EMPTY or ANY declarations.
-
particle_map
= None¶ A mapping used to validate the content model during parsing. It maps the name of the first child element found to a list of
XMLNameParticle
instances that can represent it in the content model. For more information see XMLNameParticle.particle_map
.
-
build_model
()¶ Builds internal structures to support model validation.
-
is_deterministic
()¶ Tests if the content model is deterministic.
For degenerate cases (elements declared with ANY or EMPTY) the method always returns True.
-
-
class
pyslet.xml.structures.
XMLContentParticle
¶ Bases:
object
An object for representing content particles.
-
ZeroOrOne
= 1¶ Occurrence constant for ‘?’
-
OneOrMore
= 3¶ Occurrence constant for ‘+’
-
occurrence
= None¶ One of the occurrence constants defined above.
-
build_particle_maps
(exit_particles)¶ Abstract method that builds the particle maps for this node or its children.
For more information see
XMLNameParticle.particle_map
.
Although only name particles have particle maps this method is called for all particle types to allow the model to be built hierarchically from the root out to the terminal (name) nodes. exit_particles provides a mapping to all the following particles outside the part of the hierarchy rooted at the current node that are directly reachable from the particles inside.
-
seek_particles
(pmap)¶ Adds all possible entry particles to pmap.
Abstract method. pmap is a mapping from element name to a list of
XMLNameParticle
instances.
Returns True if a required particle was added, False if all particles added are optional.
Like
build_particle_maps()
, this method is called for all particle types. The mappings requested represent all particles inside the part of the hierarchy rooted at the current node that are directly reachable from the preceding particles outside.
-
add_particles
(src_map, pmap)¶ A utility method that adds particles from src_map to pmap.
Both maps are mappings from element name to a list of
XMLNameParticle
instances. All entries in src_map not currently in pmap are added.
-
is_deterministic
(pmap)¶ A utility method for identifying deterministic particle maps.
A deterministic particle map is one in which each name maps uniquely to a single content particle. A non-deterministic particle map contains an ambiguity, for example ((b,d)|(b,e)). The particle map created by
seek_particles()
for the enclosing choice list would have two entries for ‘b’, one to map the first particle of the first sequence and one to the first particle of the second sequence.
Although non-deterministic content models are not allowed in SGML they are tolerated in XML and are only flagged as compatibility errors.
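The ((b,d)|(b,e)) example can be reproduced with plain dictionaries. The two functions below are hypothetical module-level stand-ins for the utility methods of the same names, and strings are used in place of XMLNameParticle instances. The merge rule shown is one reading of "entries not currently in pmap".

```python
def add_particles(src_map, pmap):
    """Add particles from src_map that are not already in pmap."""
    for name, particles in src_map.items():
        targets = pmap.setdefault(name, [])
        for p in particles:
            if p not in targets:
                targets.append(p)


def is_deterministic(pmap):
    """True if every name maps to exactly one particle."""
    return all(len(particles) == 1 for particles in pmap.values())


# ((b,d)|(b,e)): each sequence contributes its first particle;
# strings are stand-ins for XMLNameParticle instances
first_seq = {"b": ["b in (b,d)"]}
second_seq = {"b": ["b in (b,e)"]}
pmap = {}
add_particles(first_seq, pmap)
add_particles(second_seq, pmap)
ambiguous = not is_deterministic(pmap)
```

The merged map has two candidates for 'b', so ambiguous is True. Replacing the second sequence with one that starts with a different name would leave one particle per name and the map would be deterministic.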
-
-
class
pyslet.xml.structures.
XMLNameParticle
¶ Bases:
pyslet.xml.structures.XMLContentParticle
Represents a content particle for a named element
-
name
= None¶ the name of the element type that matches this particle
-
particle_map
= None¶ Each
XMLNameParticle
has a particle map that maps the name of the ‘next’ element found in the content model to the list of possible
XMLNameParticle
instances that represent it in the content model.
The content model can be traversed using
ContentParticleCursor
.
-
-
class
pyslet.xml.structures.
XMLChoiceList
¶ Bases:
pyslet.xml.structures.XMLContentParticle
Represents a choice list of content particles in the grammar
-
class
pyslet.xml.structures.
XMLSequenceList
¶ Bases:
pyslet.xml.structures.XMLContentParticle
Represents a sequence list of content particles in the grammar
-
class
pyslet.xml.structures.
XMLAttributeDefinition
¶ Bases:
object
Represents an Attribute declaration
There is no special functionality provided by this class, instances hold the data members identified and the class defines a number of constants suitable for setting and testing them.
Constants are defined using CAPS, mixed case versions are provided only for backwards compatibility.
-
CDATA
= 0¶ Type constant representing CDATA
-
ID
= 1¶ Type constant representing ID
-
IDREF
= 2¶ Type constant representing IDREF
-
IDREFS
= 3¶ Type constant representing IDREFS
-
ENTITY
= 4¶ Type constant representing ENTITY
-
ENTITIES
= 5¶ Type constant representing ENTITIES
-
NMTOKEN
= 6¶ Type constant representing NMTOKEN
-
NMTOKENS
= 7¶ Type constant representing NMTOKENS
-
NOTATION
= 8¶ Type constant representing NOTATION
-
ENUMERATION
= 9¶ Type constant representing an enumeration, not defined as a keyword in the specification but representing declarations that match production [59], Enumeration.
-
IMPLIED
= 0¶ Presence constant representing #IMPLIED
-
REQUIRED
= 1¶ Presence constant representing #REQUIRED
-
FIXED
= 2¶ Presence constant representing #FIXED
-
DEFAULT
= 3¶ Presence constant representing a declared default value. Not defined as a keyword but represents a declaration with a default value defined in production [60].
-
entity
= None¶ the entity in which this attribute was declared
-
name
= None¶ the name of the attribute
-
type
= None¶ One of the above type constants
-
values
= None¶ An optional dictionary of values
-
defaultValue
= None¶ An optional default value
-
6.2.2.3. Physical Structures¶
-
class
pyslet.xml.structures.
XMLEntity
(src=None, encoding=None, req_manager=None, **kws)¶ Bases:
pyslet.pep8.MigratedClass
Represents an XML entity.
This object serves two purposes: it acts both as the object used to store information about declared entities and as a parser for feeding unicode characters to the main
XMLParser
.
- src
- May be a character string, a binary string, an instance of
pyslet.rfc2396.URI
, an instance of
pyslet.http.client.ClientResponse
or any object that supports file-like behaviour (seek and read).
If provided, the corresponding open method is called immediately, see
open_unicode()
, open_string()
, open_uri()
, open_http_response()
and open_file()
.
- encoding
- If src is not None then this value will be passed when opening the entity reader.
- req_manager
- If src is a URI, passed to
open_uri()
XMLEntity objects act as context managers, hence it is possible to use:
with XMLEntity(src=URI.from_octets('mydata.xml')) as e:
    # process the entity here, will automatically close
    pass
-
location
= None¶ the location of this entity (used as the base URI to resolve relative links). A
pyslet.rfc2396.URI
instance.
-
mimetype
= None¶ The mime type of the entity, if known, or None otherwise. A
pyslet.http.params.MediaType
instance.
-
encoding
= None¶ the encoding of the entity (text entities), e.g., ‘utf-8’
-
bom
= None¶ Flag to indicate whether or not the byte order mark was detected. If detected the flag is set to True. An initial byte order mark is not reported in
the_char
or by the next_char()
method.
-
the_char
= None¶ The character at the current position in the entity
-
line_num
= None¶ The current line number within the entity (first line is line 1)
-
line_pos
= None¶ the current character position within the entity (first char is 1)
-
buff_text
= None¶ used by
XMLParser.push_entity()
-
chunk_size
= 8192¶ Characters are read from the data_source in chunks.
The default chunk size is set from io.DEFAULT_BUFFER_SIZE, typically 8KB.
In fact, in some circumstances the entity reader starts more cautiously. If the entity reader expects to read an XML or Text declaration, which may have an encoding declaration, then it reads one character at a time until the declaration is complete. This allows the reader to change to the encoding in the declaration without the errors caused by reading too many characters using the wrong codec. See
change_encoding()
and keep_encoding()
for more information.
-
get_name
()¶ Returns a name to represent this entity
The name is intended for logs and error messages. It defaults to the location if set.
-
is_external
()¶ Returns True if this is an external entity.
The default implementation returns True if location is not None, False otherwise.
-
open
()¶ Opens the entity for reading.
The default implementation uses
open_uri()
to open the entity from location
if available, otherwise it raises NotImplementedError.
-
is_open
()¶ Returns True if the entity is open for reading.
-
open_unicode
(src)¶ Opens the entity from a unicode string.
-
open_string
(src, encoding=None)¶ Opens the entity from a binary string.
- src
- A binary string.
- encoding
- The optional encoding is used to convert the string to unicode and defaults to None - meaning that the auto-detection method will be applied.
The advantage of using this method instead of converting the string to unicode and calling
open_unicode()
is that this method creates a unicode reader object to parse the string instead of making a copy of it in memory.
-
open_file
(src, encoding='utf-8')¶ Opens the entity from a file
- src
- An existing (open) binary file.
The optional encoding provides a hint as to the intended encoding of the data and defaults to UTF-8. Unlike the other open methods, we do not assume that the file is seekable; however, you may set encoding to None for a seekable file, thus invoking auto-detection of the encoding.
-
open_uri
(src, encoding=None, req_manager=None, **kws)¶ Opens the entity from a URI.
- src
- A
pyslet.rfc2396.URI
instance of either file, http or https schemes.
- encoding
- The optional encoding provides a hint as to the intended encoding of the data and defaults to UTF-8. For http(s) resources this parameter is only used if the charset cannot be read successfully from the HTTP headers.
- req_manager
- The optional req_manager allows you to pass an existing
instance of
pyslet.http.client.Client
for handling URI with http or https schemes. (reqManager is supported for backwards compatibility.)
-
open_http_response
(src, encoding='utf-8')¶ Opens the entity from an HTTP response passed in src.
- src
- An
pyslet.http.client.ClientResponse
instance.
- encoding
- The optional encoding provides a hint as to the intended encoding of the data and defaults to UTF-8. This parameter is only used if the charset cannot be read successfully from the HTTP response headers.
-
reset
()¶ Resets an open entity
The entity returns to the first character in the entity.
-
get_position_str
()¶ A short string describing the current position.
For example, if the current character is pointing to character 6 of line 4 then it will return the string ‘Line 4.6’
-
next_char
()¶ Advances to the next character in an open entity.
This method takes care of the End-of-Line handling rules for XML which force us to remove any CR characters and replace them with LF if they appear on their own or to silently drop them if they appear as part of a CR-LF combination.
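The end-of-line rule can be sketched as a small generator over characters. The function normalise_eol is hypothetical and is not the entity reader's own code. It reports every CR as LF and then skips the LF of a CR-LF pair, which gives the same character stream as dropping the CR.

```python
def normalise_eol(chars):
    """Yield characters with CR-LF and lone CR both reported as LF."""
    previous_cr = False
    for c in chars:
        if c == "\r":
            # always report a CR as LF
            previous_cr = True
            yield "\n"
        elif c == "\n" and previous_cr:
            # second half of a CR-LF pair: already reported
            previous_cr = False
        else:
            previous_cr = False
            yield c


text = "".join(normalise_eol("one\r\ntwo\rthree\nfour\r\r\n"))
```

The input mixes all three conventions. In the result every line ends with a single LF, and the final CR followed by CR-LF becomes two LF characters.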
-
auto_detect_encoding
(src_file)¶ Auto-detects the character encoding
- src_file
- A file object. The object must support seek and blocking read operations. If src_file has been opened in text mode then no action is taken.
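One ingredient of encoding detection is checking for a byte order mark. The sketch below covers only that step, using constants from the standard codecs module. The helper sniff_bom is hypothetical, assumes a seekable binary file, and makes no claim about which other checks the real method performs.

```python
import codecs
import io

# UTF-32 marks are tested first: the UTF-32 LE mark starts with
# the same two bytes as the UTF-16 LE mark
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def sniff_bom(src_file):
    """Return (encoding, bom_length) or (None, 0), restoring the position."""
    start = src_file.tell()
    head = src_file.read(4)
    src_file.seek(start)
    for bom, encoding in BOMS:
        if head.startswith(bom):
            return encoding, len(bom)
    return None, 0


result = sniff_bom(io.BytesIO(codecs.BOM_UTF8 + b"<doc/>"))
```

The seek back to the starting position is why the file object must support seek: the bytes read for detection have to be read again when the entity is parsed.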
-
change_encoding
(encoding)¶ Changes the character encoding used for this entity.
In many cases we can only guess at the encoding used in a file or other byte stream. However, XML has a mechanism for declaring the encoding as part of the XML or Text declaration. This declaration can typically be parsed even if the encoding has been guessed incorrectly initially. This method allows the XML parser to notify the entity that a new encoding has been declared and that future characters should be interpreted with this new encoding. (There are some situations where the request is ignored, such as when the encoding has already been detected to be UCS-2 or UCS-4 or when the source stream is not seekable.)
You can only change the encoding once. This method calls
keep_encoding()
once the encoding has been changed.
-
keep_encoding
()¶ Fixes the character encoding used in the entity.
This entity parser starts in a cautious mode, parsing the entity one character at a time to avoid errors caused by buffering with the wrong encoding. This method should be called once the encoding is determined so that the entity parser can use its internal character buffer.
-
next_line
()¶ Called when the entity reader detects a new line.
This method increases the internal line count and resets the character position to the beginning of the line. You will not normally need to call this directly as line handling is done automatically by
next_char()
.
-
KeepEncoding
(*args, **kwargs)¶ Deprecated equivalent to
keep_encoding()
-
close
()¶ Closes the entity.
-
class
pyslet.xml.structures.
XMLDeclaredEntity
(name=None, definition=None)¶ Bases:
pyslet.xml.structures.XMLEntity
Abstract class representing a declared entity.
- name
- An optional string used as the name of the entity
- definition
- The definition of the entity is either a string or an instance of
XMLExternalID
, depending on whether the entity is an internal or external entity respectively.
-
entity
= None¶ the entity in which this entity was declared
-
name
= None¶ the name passed to the constructor
-
definition
= None¶ the definition passed to the constructor
-
get_name
()¶ Human-readable name suitable for logging/error reporting.
Simply returns name
-
is_external
()¶ Returns True if this is an external entity.
-
open
()¶ Opens the entity for reading.
External entities must be parsed for text declarations before the replacement text is encountered. This requires a small amount of look-ahead which may result in some characters needing to be re-parsed. We pass this to future parsers using
buff_text
.
-
class
pyslet.xml.structures.
XMLGeneralEntity
(name=None, definition=None, notation=None)¶ Bases:
pyslet.xml.structures.XMLDeclaredEntity
Represents a general entity.
- name
- Optional name
- definition
- An optional definition
- notation
- An optional notation.
-
notation
= None¶ the notation name for external unparsed entities
-
get_name
()¶ Formats the name as a general entity reference.
-
class
pyslet.xml.structures.
XMLParameterEntity
(name=None, definition=None)¶ Bases:
pyslet.xml.structures.XMLDeclaredEntity
Represents a parameter entity.
- name
- An optional name
- definition
- An optional definition.
See base class for more information on the parameters.
-
open_as_pe
()¶ Opens the parameter entity in the context of a DTD.
This special method implements the rule that the replacement text of a parameter entity, when included as a PE, must be enlarged by the attachment of a leading and trailing space.
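The padding rule is simple to state in code. The sketch below uses a hypothetical helper, pe_replacement_text, which is not part of pyslet, to show what the parser sees when a parameter entity is included as a PE.

```python
def pe_replacement_text(value):
    """Return replacement text as seen when included as a PE.

    Hypothetical helper: adds one leading and one trailing space.
    """
    return " " + value + " "


# With <!ENTITY % content "(a|b)">, a reference to %content; inside
# a declaration is read as if it were written ' (a|b) '.
padded = pe_replacement_text("(a|b)")
```

The extra spaces ensure that the replacement text cannot merge with the tokens on either side of the reference.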
-
get_name
()¶ Formats the name as a parameter entity reference.
-
class
pyslet.xml.structures.
XMLExternalID
(public=None, system=None)¶ Bases:
object
Represents external references to entities.
- public
- An optional public identifier
- system
- An optional system identifier
One (or both) of the identifiers should be provided.
-
get_location
(base=None)¶ Get an absolute URI for the external entity.
Returns a
pyslet.rfc2396.URI
resolved against base
if applicable. If there is no system identifier then None is returned.
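The resolution logic can be sketched with the standard library. This example deliberately substitutes urllib.parse.urljoin for pyslet.rfc2396.URI and returns a plain string rather than a URI instance; resolve_system_id is a hypothetical name used only here.

```python
from urllib.parse import urljoin


def resolve_system_id(system, base=None):
    """Resolve a system identifier against an optional base URI.

    Hypothetical stand-in that works on strings, not URI objects.
    """
    if system is None:
        # No system identifier: nothing to resolve.
        return None
    if base is None:
        return system
    return urljoin(base, system)
```

For example, resolving "chapter1.xml" against "http://www.example.com/book/index.xml" replaces the last path segment of the base, while an absolute system identifier is returned unchanged.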
-
class
pyslet.xml.structures.
XMLTextDeclaration
(version='1.0', encoding='UTF-8')¶ Bases:
object
Represents the text components of an XML declaration.
Both version and encoding are optional, though one or the other is required depending on the context in which the declaration will be used.
-
class
pyslet.xml.structures.
XMLNotation
(name, external_id)¶ Bases:
object
Represents an XML notation as defined in Section 4.7 of the XML specification.
- name
- The name of the notation
- external_id
- An
XMLExternalID
instance in which one of public or system must be provided.
-
name
= None¶ the notation name
-
external_id
= None¶ the external ID of the notation (an XMLExternalID instance)
6.2.2.4. Syntax¶
6.2.2.4.1. White Space Handling¶
-
pyslet.xml.structures.
is_s
(c)¶ Tests production [3] S
Optimized for speed as this function is called a lot by the parser.
-
pyslet.xml.structures.
collapse_space
(data, smode=True, stest=<function is_s>)¶ Returns data with all spaces collapsed to a single space.
- smode
- Determines the fate of any leading space. By default it is True and leading spaces are ignored, provided the string has some non-space characters.
- stest
- You can override the test of what constitutes a space by passing
a function for stest. By default we use
is_s()
and any value passed to stest should behave similarly.
Note on degenerate case: this function is intended to be called with non-empty strings and will never return an empty string. If there is no data then a single space is returned (regardless of smode).
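The behaviour described above can be sketched as a small state machine. This is an independent re-implementation written from the description, not pyslet's code; the names collapse and is_space are hypothetical stand-ins for collapse_space() and is_s(). The description only specifies what happens to leading space, so keeping a trailing run as a single space is an assumption of this sketch.

```python
def is_space(c):
    """Stand-in for is_s(): the four characters of production [3] S."""
    return c in " \t\r\n"


def collapse(data, smode=True, stest=is_space):
    """Collapse each run of space characters to a single space."""
    result = []
    for c in data:
        if stest(c):
            # Emit one space at the start of a run; smode=True means
            # we are already inside a run (or skipping leading space).
            if not smode:
                result.append(" ")
            smode = True
        else:
            result.append(c)
            smode = False
    # Degenerate case: never return an empty string.
    return "".join(result) if result else " "
```

Starting with smode set to True treats the beginning of the string as if a space had already been seen, which is why leading space disappears by default.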
6.2.2.4.2. Names¶
-
pyslet.xml.structures.
is_name_start_char
(c)¶ Tests if the character c matches production [4] NameStartChar.
-
pyslet.xml.structures.
is_name_char
(c)¶ Tests production [4a] NameChar
-
pyslet.xml.structures.
is_valid_name
(name)¶ Tests if name is a string matching production [5] Name
-
pyslet.xml.structures.
is_reserved_name
(name)¶ Tests if name is reserved
Names beginning with ‘xml’ are reserved for future standardization
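For readers without the specification to hand, the name productions can be written as a regular expression. The character ranges below are taken from productions [4] and [4a] of XML 1.0 (Fifth Edition); the functions looks_like_name and looks_reserved are hypothetical stand-ins, and the case-insensitive prefix test follows the XML specification rather than any confirmed detail of is_reserved_name().

```python
import re

# Production [4] NameStartChar
_NAME_START = (
    ":A-Z_a-z"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    "\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    "\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
# Additional characters allowed by production [4a] NameChar
_NAME_EXTRA = "\\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"

# Production [5] Name ::= NameStartChar (NameChar)*
_NAME_RE = re.compile(
    "[%s][%s%s]*" % (_NAME_START, _NAME_START, _NAME_EXTRA))


def looks_like_name(name):
    """Return True if name matches production [5] Name."""
    return _NAME_RE.fullmatch(name) is not None


def looks_reserved(name):
    """Return True if name starts with 'xml' in any letter case."""
    return name.lower().startswith("xml")
```

Note that a reserved name is still a valid name: "xml:lang" passes both tests.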
6.2.2.4.3. Character Data and Markup¶
-
pyslet.xml.structures.
escape_char_data
(src, quote=False)¶ Returns a unicode string with XML reserved characters escaped.
We also escape return characters to prevent them being ignored. If quote is True then the string is returned as a quoted attribute value.
-
pyslet.xml.structures.
escape_char_data7
(src, quote=False)¶ Escapes reserved and non-ASCII characters.
- src
- A character string
- quote (defaults to False)
- When True, will surround the output in either single or double quotes (preferred) depending on the contents of src.
Characters outside the ASCII range are replaced with character references.
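The two escaping functions differ only in their treatment of non-ASCII characters, which the following sketch models with a flag. It is written from the descriptions above and is not pyslet's implementation: escape_text, the ascii_only flag, the hexadecimal form of the character references and the exact rule for choosing quote characters are all assumptions.

```python
def escape_text(src, quote=False, ascii_only=False):
    """Escape XML reserved characters in src.

    Hypothetical helper: ascii_only=False loosely mirrors
    escape_char_data, ascii_only=True mirrors escape_char_data7.
    """
    out = []
    for c in src:
        if c == "&":
            out.append("&amp;")
        elif c == "<":
            out.append("&lt;")
        elif c == ">":
            out.append("&gt;")
        elif c == "\r":
            # A literal CR would be lost to end-of-line handling.
            out.append("&#xD;")
        elif ascii_only and ord(c) > 0x7F:
            out.append("&#x%X;" % ord(c))
        else:
            out.append(c)
    text = "".join(out)
    if not quote:
        return text
    # Prefer double quotes; fall back to single quotes only when
    # that avoids escaping, otherwise escape the double quotes.
    if '"' in text and "'" not in text:
        return "'" + text + "'"
    return '"' + text.replace('"', "&quot;") + '"'
```

Because the ampersand is handled character by character, the references produced for other characters are never escaped a second time.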
6.2.2.4.4. CDATA Sections¶
-
pyslet.xml.structures.
CDATA_START
= u'<![CDATA['¶ character string constant for “<![CDATA[”
-
pyslet.xml.structures.
CDATA_END
= u']]>'¶ character string constant for “]]>”
-
pyslet.xml.structures.
escape_cdsect
(src)¶ Wraps a string in a CDATA section
- src
- A character string of data
Returns a character string enclosed in <![CDATA[ ]]> with any ]]> in the data replaced by the clumsy sequence: ]]>]]&gt;<![CDATA[
Degenerate case: an empty string is returned as an empty string
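The wrapping rule can be sketched in a few lines. The helper wrap_cdata is hypothetical and written from the description rather than copied from pyslet. Each terminator found in the data closes the current section, is emitted as the escaped text ]]&gt; (a literal ]]> is not allowed in character data) and is followed by a new section.

```python
CDATA_START = "<![CDATA["
CDATA_END = "]]>"


def wrap_cdata(src):
    """Wrap src in a CDATA section, splitting around any ']]>'.

    Hypothetical helper for illustration only.
    """
    if not src:
        # Degenerate case: nothing to wrap.
        return ""
    # Close the section, write the terminator in escaped form, then
    # open a new section for the rest of the data.
    joiner = CDATA_END + "]]&gt;" + CDATA_START
    return CDATA_START + joiner.join(src.split(CDATA_END)) + CDATA_END
```

Markup characters other than the terminator pass through untouched, which is the point of using a CDATA section.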
6.2.2.4.5. Exceptions¶
-
class
pyslet.xml.structures.
XMLError
¶ Bases:
exceptions.Exception
Base class for all exceptions raised by this module.
-
class
pyslet.xml.structures.
XMLValidityError
¶ Bases:
pyslet.xml.structures.XMLError
Base class for all validation errors
Raised when a document or content model violates a validity constraint. These errors can be generated by the parser (for example, when validating a document against a declared DTD) or by Elements themselves when content is encountered that does not fit the expected content model.
-
class
pyslet.xml.structures.
XMLIDClashError
¶ Bases:
pyslet.xml.structures.XMLValidityError
A validity error caused by two elements with the same ID
-
class
pyslet.xml.structures.
XMLIDValueError
¶ Bases:
pyslet.xml.structures.XMLValidityError
A validity error caused by an element with an invalid ID
ID attribute values must satisfy the production for Name.
-
class
pyslet.xml.structures.
DuplicateXMLNAME
¶ Bases:
pyslet.xml.structures.XMLError
Raised by
map_class_elements()
Indicates an attempt to declare two classes with the same XML name.
-
class
pyslet.xml.structures.
XMLAttributeSetter
¶ Bases:
pyslet.xml.structures.XMLError
Raised when a badly formed attribute mapping is found.
-
class
pyslet.xml.structures.
XMLMixedContentError
¶ Bases:
pyslet.xml.structures.XMLError
Raised by
Element.get_value()
Indicates unexpected element children.
-
class
pyslet.xml.structures.
XMLParentError
¶ Bases:
pyslet.xml.structures.XMLError
Raised by
Element.attach_to_parent()
Indicates that the element was not an orphan.
-
class
pyslet.xml.structures.
XMLUnknownChild
¶ Bases:
pyslet.xml.structures.XMLError
Raised by
Element.remove_child()
Indicates that the child being removed was not found in the element’s content.
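The inheritance relationships listed in this section determine which except clauses catch which errors. The sketch below rebuilds the hierarchy with local stand-in classes so that it runs without pyslet; the class names and base classes are taken from the entries above, while classify is a hypothetical helper.

```python
# Local stand-ins that mirror the documented base classes.
class XMLError(Exception): pass
class XMLValidityError(XMLError): pass
class XMLIDClashError(XMLValidityError): pass
class XMLIDValueError(XMLValidityError): pass
class DuplicateXMLNAME(XMLError): pass
class XMLAttributeSetter(XMLError): pass
class XMLMixedContentError(XMLError): pass
class XMLParentError(XMLError): pass
class XMLUnknownChild(XMLError): pass


def classify(err):
    """Return a coarse category for an exception instance."""
    # Test the more specific base class first.
    if isinstance(err, XMLValidityError):
        return "validity"
    if isinstance(err, XMLError):
        return "xml"
    return "other"
```

In practice this means that catching XMLError handles every exception in this section, while catching XMLValidityError handles only the validity errors, including both ID errors.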