6.3. XML: Parsing XML Documents
This module exposes a number of internal functions typically defined privately
in XML parser implementations which make it easier to reuse concepts from XML in
other modules. For example, the IsNameStartChar() function tells you whether a character matches the production for NameStartChar in the XML standard.
-
class
pyslet.xml20081126.parser.
XMLParser
(entity)¶ Bases:
pyslet.pep8.PEP8Compatibility
An XMLParser object
- entity
- The
XMLEntity
to parse.
XMLParser objects are used to parse entities for the constructs defined by the numbered productions in the XML specification.
XMLParser has a number of optional attributes, all of which default to False. Attributes whose names start with ‘check’ increase the strictness of the parser. Setting any of the other parser flags to True results in a parser that is not a conforming XML processor.
-
DocumentClassTable
= {}¶ A dictionary mapping doctype parameters onto class objects.
For more information about how this is used see
get_document_class()
and RegisterDocumentClass()
.
-
RefModeInContent
= 1¶ Treat references as per “in Content” rules
-
RefModeInAttributeValue
= 2¶ Treat references as per “in Attribute Value” rules
-
RefModeAsAttributeValue
= 3¶ Treat references as per “as Attribute Value” rules
-
RefModeInEntityValue
= 4¶ Treat references as per “in EntityValue” rules
-
RefModeInDTD
= 5¶ Treat references as per “in DTD” rules
-
PredefinedEntities
= {'amp': '&', 'lt': '<', 'gt': '>', 'apos': "'", 'quot': '"'}¶ A mapping from the names of the predefined entities (lt, gt, amp, apos, quot) to their replacement characters.
-
checkValidity
= None¶ Checks XML validity constraints. If checkValidity is True and all other options are left at their default (False) setting, the parser will behave as a validating XML parser.
-
valid
= None¶ Flag indicating if the document is valid, only set if
checkValidity
is True
-
nonFatalErrors
= None¶ A list of non-fatal errors discovered during parsing, only populated if
checkValidity
is True
-
checkCompatibility
= None¶ checks XML compatibility constraints; will cause
checkValidity
to be set to True when parsing
-
checkAllErrors
= None¶ checks all constraints; will cause
checkValidity
and checkCompatibility
to be set to True when parsing.
-
raiseValidityErrors
= None¶ treats validity errors as fatal errors
-
dontCheckWellFormedness
= None¶ provides a loose parser for XML-like documents
-
unicodeCompatibility
= None¶
-
sgmlNamecaseGeneral
= None¶ option that simulates SGML’s NAMECASE GENERAL YES
-
sgmlNamecaseEntity
= None¶ option that simulates SGML’s NAMECASE ENTITY YES
-
sgmlOmittag
= None¶ option that simulates SGML’s OMITTAG YES
-
sgmlShorttag
= None¶ option that simulates SGML’s SHORTTAG YES
-
sgmlContent
= None¶ This option simulates some aspects of SGML content handling based on class attributes of the element being parsed.
Element classes with XMLCONTENT=XMLEmpty are treated as elements declared EMPTY: these elements are treated as if they were introduced with an empty element tag even if they weren’t, as per SGML’s rules. Note that this SGML feature “has nothing to do with markup minimization” (i.e.,
sgmlOmittag
).
-
refMode
= None¶ The current parser mode for interpreting references.
XML documents can contain five different types of reference: parameter entity, internal general entity, external parsed entity, (external) unparsed entity and character entity.
The rules for interpreting these references vary depending on the current mode of the parser, for example, in content a reference to an internal entity is replaced, but in the definition of an entity value it is not. This means that the behaviour of the
parse_reference()
method will differ depending on the mode.
The parser takes care of setting the mode automatically, but if you wish to use some of the parsing methods in isolation to parse fragments of XML documents then you will need to set refMode directly using one of the RefMode* family of constants defined above.
-
entity
= None¶ The current entity being parsed
-
the_char
= None¶ the current character; None indicates end of stream
-
declaration
= None¶ The declaration being parsed or None
-
dtd
= None¶ The document type declaration of the document being parsed. This member is initialised to None as well-formed XML documents are not required to have an associated dtd.
-
doc
= None¶ The document being parsed
-
docEntity
= None¶ The document entity
-
element
= None¶ The current element being parsed
-
elementType
= None¶ The element type of the current element
-
get_context
()¶ Returns the parser’s context
This is either the current element or the document if no element is being parsed.
-
next_char
()¶ Moves to the next character in the stream.
The current character can always be read from
the_char
. If there are no characters left in the current entity then entities are popped from an internal entity stack automatically.
-
buff_text
(unused_chars)¶ Buffers characters that have already been parsed.
- unused_chars
- A string of characters to be pushed back to the parser in the order in which they are to be parsed.
This method enables characters to be pushed back into the parser forcing them to be parsed next. The current character is saved and will be parsed (again) once the buffer is exhausted.
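The push-back behaviour described above can be illustrated with a small stand-alone reader. This is only a sketch of the idea, not Pyslet's implementation: the class name PushbackReader and its internals are assumptions made for illustration, and only the names next_char, buff_text and the_char are taken from this page.

```python
class PushbackReader:
    """Toy character reader with push-back (hypothetical, for illustration)."""

    def __init__(self, text):
        self._text = text
        self._pos = 0
        self._buffer = []  # pushed-back characters; the next one is last
        self.the_char = None
        self.next_char()

    def next_char(self):
        """Advance to the next character; None indicates end of stream."""
        if self._buffer:
            self.the_char = self._buffer.pop()
        elif self._pos < len(self._text):
            self.the_char = self._text[self._pos]
            self._pos += 1
        else:
            self.the_char = None

    def buff_text(self, unused_chars):
        """Push characters back so that they are parsed next, in order."""
        if not unused_chars:
            return
        # Save the current character so it is parsed again afterwards.
        if self.the_char is not None:
            self._buffer.append(self.the_char)
        # Store in reverse so that pop() yields them in parsing order.
        self._buffer.extend(reversed(unused_chars))
        self.next_char()
```

With this sketch, a reader positioned on "c" in "cd" that receives buff_text("ab") yields a, b, c, d in that order, matching the statement that the saved current character is parsed again once the buffer is exhausted.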
-
push_entity
(entity)¶ Starts parsing an entity
- entity
- An
XMLEntity
instance which is to be parsed.
the_char
is set to the current character in the entity’s stream. The current entity is pushed onto an internal stack and will be resumed when this entity has been parsed completely.
Note that in the degenerate case where the entity being pushed is empty (or is already positioned at the end of the file) then push_entity does nothing.
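The interaction between push_entity and next_char can be sketched with plain strings standing in for entities. Everything here is a simplified assumption for illustration (real entities are XMLEntity instances with their own streams); the sketch only shows the stack discipline: push saves the outer position, and running off the end of the inner text resumes the outer text at the character where it was interrupted.

```python
class EntityStackReader:
    """Toy reader using strings in place of entities (hypothetical)."""

    def __init__(self, text):
        self._stack = []  # saved (text, position) pairs of outer entities
        self._text = text
        self._pos = 0
        self.the_char = text[0] if text else None

    def push_entity(self, text):
        """Start reading text; the current entity is resumed afterwards."""
        if not text:
            return  # degenerate case: pushing an empty entity does nothing
        self._stack.append((self._text, self._pos))
        self._text = text
        self._pos = 0
        self.the_char = text[0]

    def next_char(self):
        """Advance one character, popping exhausted entities automatically."""
        self._pos += 1
        while self._pos >= len(self._text) and self._stack:
            # Resume the outer entity at its saved current character.
            self._text, self._pos = self._stack.pop()
        if self._pos < len(self._text):
            self.the_char = self._text[self._pos]
        else:
            self.the_char = None
```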
-
check_encoding
(entity, declared_encoding)¶ Checks the entity against the declared encoding
- entity
- An
XMLEntity
instance which is being parsed.
- declared_encoding
- A string containing the declared encoding in any declaration or None if there was no declared encoding in the entity.
-
get_external_entity
()¶ Returns the external entity currently being parsed.
If no external entity is being parsed then None is returned.
-
standalone
()¶ True if the document should be treated as standalone.
A document may be declared standalone or it may effectively be standalone due to the absence of a DTD, or the absence of an external DTD subset and parameter entity references.
-
declared_standalone
()¶ True if the current document was declared standalone.
-
well_formedness_error
(msg='well-formedness error', error_class=<class 'pyslet.xml20081126.structures.XMLWellFormedError'>)¶ Raises an XMLWellFormedError error.
- msg
- An optional message string
- error_class
- An optional error class which must be a class object derived from XMLWellFormedError.
Called by the parsing methods whenever a well-formedness constraint is violated.
The method raises an instance of error_class and does not return. This method can be overridden by derived parsers to implement more sophisticated error logging.
-
validity_error
(msg='validity error', error=<class 'pyslet.xml20081126.structures.XMLValidityError'>)¶ Called when the parser encounters a validity error.
- msg
- An optional message string
- error
- An optional error class or instance which must be a (class) object derived from XMLValidityError.
The behaviour varies depending on the setting of the
checkValidity
and raiseValidityErrors
options. The default (both False) causes validity errors to be ignored. When checking validity an error message is logged to nonFatalErrors
and valid
is set to False. Furthermore, if raiseValidityErrors
is True, error is raised (or a new instance of error is raised) and parsing terminates.
This method can be overridden by derived parsers to implement more sophisticated error logging.
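The flag handling described for validity_error can be summarised in a few lines. The following is a minimal sketch under stated assumptions: ErrorPolicy and ValidityError are hypothetical stand-ins, the attribute names are snake_case versions of the flags above, and the choices to initialise valid to True when checking and to ignore raise_validity_errors when check_validity is off are assumptions rather than documented behaviour.

```python
class ValidityError(Exception):
    """Stand-in for the library's validity error class (hypothetical)."""


class ErrorPolicy:
    """Toy model of the validity flags; not the XMLParser class itself."""

    def __init__(self, check_validity=False, raise_validity_errors=False):
        self.check_validity = check_validity
        self.raise_validity_errors = raise_validity_errors
        # Assumption: valid is only meaningful when validity is checked.
        self.valid = True if check_validity else None
        self.non_fatal_errors = []

    def validity_error(self, msg="validity error", error=ValidityError):
        if not self.check_validity:
            return  # default: validity errors are ignored
        self.non_fatal_errors.append(msg)
        self.valid = False
        if self.raise_validity_errors:
            # error may be an exception instance or an exception class.
            if isinstance(error, Exception):
                raise error
            raise error(msg)
```

A derived class could override validity_error to send messages to a logger before delegating to this behaviour, which is the kind of customisation the text above refers to.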
-
compatibility_error
(msg='compatibility error')¶ Called when the parser encounters a compatibility error.
- msg
- An optional message string
The behaviour varies depending on the setting of the
checkCompatibility
flag. The default (False) causes compatibility errors to be ignored. When checking compatibility an error message is logged to nonFatalErrors.
This method can be overridden by derived parsers to implement more sophisticated error logging.
-
processing_error
(msg='Processing error')¶ Called when the parser encounters a general processing error.
- msg
- An optional message string
The behaviour varies depending on the setting of the
checkAllErrors
flag. The default (False) causes processing errors to be ignored. When checking all errors an error message is logged to nonFatalErrors.
This method can be overridden by derived parsers to implement more sophisticated error logging.
-
parse_literal
(match)¶ Parses an optional literal string.
- match
- The literal string to match
Returns True if match is successfully parsed and False otherwise. There is no partial matching: if match is not found then the parser is left in its original position.
-
parse_required_literal
(match, production='Literal String')¶ Parses a required literal string.
- match
- The literal string to match
- production
- An optional string describing the context in which the literal was expected.
There is no return value. If the literal is not matched a well-formedness error is raised.
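The difference between the optional and required forms can be sketched over a plain string. This is an illustrative stand-in, not the library's code: LiteralParser, its pos attribute and the WellFormedError class are hypothetical names, and the error message format is an assumption.

```python
class WellFormedError(Exception):
    """Stand-in for a well-formedness error class (hypothetical)."""


class LiteralParser:
    """Toy parser over a string, tracking the current position in pos."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def parse_literal(self, match):
        """Parse an optional literal; no partial matching is performed."""
        if self.text.startswith(match, self.pos):
            self.pos += len(match)
            return True
        # On failure the position is left exactly where it was.
        return False

    def parse_required_literal(self, match, production="Literal String"):
        """Parse a required literal, raising an error if it is absent."""
        if not self.parse_literal(match):
            raise WellFormedError("Expected %s" % production)
```

The optional form suits look-ahead decisions (try one keyword, then another), while the required form suits points in a production where only one literal is legal.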
-
parse_decimal_digits
()¶ Parses a, possibly empty, string of decimal digits.
Decimal digits match [0-9]. Returns the parsed digits as a string or an empty string if no digits were matched.
-
parse_required_decimal_digits
(production='Digits')¶ Parses a required string of decimal digits.
- production
- An optional string describing the context in which the decimal digits were expected.
Decimal digits match [0-9]. Returns the parsed digits as a string.
-
parse_hex_digits
()¶ Parses a, possibly empty, string of hexadecimal digits
Hex digits match [0-9a-fA-F]. Returns the parsed digits as a string or an empty string if no digits were matched.
-
parse_required_hex_digits
(production='Hex Digits')¶ Parses a required string of hexadecimal digits.
- production
- An optional string describing the context in which the hexadecimal digits were expected.
Hex digits match [0-9a-fA-F]. Returns the parsed digits as a string.
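The four digit-parsing methods share one pattern: consume the longest run of characters from an alphabet and return it, possibly empty. A minimal sketch of that pattern over a plain string follows; the function parse_digits and its (text, pos) interface are assumptions for illustration and differ from the stream-based methods documented above.

```python
DECIMAL_DIGITS = "0123456789"
HEX_DIGITS = DECIMAL_DIGITS + "abcdefABCDEF"


def parse_digits(text, pos, alphabet=DECIMAL_DIGITS):
    """Return (digits, new_pos) for the run of digits starting at pos.

    The returned string is empty if no digit is found at pos, in which
    case new_pos equals pos.
    """
    end = pos
    while end < len(text) and text[end] in alphabet:
        end += 1
    return text[pos:end], end
```

A required variant would simply raise an error when the returned string is empty.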
-
parse_quote
(q=None)¶ Parses the quote character
- q
- An optional character to parse as if it were a quote. By default either one of ' or " is accepted.
Returns the character parsed or raises a well-formedness error.
-
parse_document
(doc=None)¶ [1] document: parses a Document.
- doc
- The
Document
instance that will be parsed. The declaration, dtd and elements are added to this document. If doc is None then a new instance is created using get_document_class()
to identify the correct class to use to represent the document based on information in the prolog or, if the prolog lacks a declaration, the root element.
This method returns the document that was parsed, an instance of
Document
.
-
get_document_class
(dtd)¶ Returns a class object suitable for this dtd
- dtd
- An
XMLDTD
instance
Returns a class object derived from
Document
suitable for representing a document with the given document type declaration.
In cases where no doctype declaration is made a dummy declaration is created based on the name of the root element. For example, if the root element is called “database” then the dtd is treated as if it was declared as follows:
<!DOCTYPE database>
This default implementation uses the following three pieces of information to locate a class registered with
RegisterDocumentClass()
: the PublicID, the SystemID and the name of the root element. If an exact match is not found then wildcard matches are attempted, ignoring the SystemID, PublicID and finally the root element in turn. If a document class still cannot be found then wildcard matches are tried matching only the PublicID, SystemID and root element in turn.
If no document class can be found,
Document
is returned.
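The lookup order described above can be written out as an ordered list of candidate keys. This is a sketch of one reading of that description, not the library's code: the registry layout, the function names register_document_class and find_document_class, and the use of None as the wildcard are all assumptions, and the real RegisterDocumentClass() signature is not shown in this section.

```python
class Document:
    """Stand-in for the default document class (hypothetical)."""


_registry = {}  # maps (root_name, public_id, system_id) to a class


def register_document_class(cls, root_name, public_id=None, system_id=None):
    """Register cls; None in any position acts as a wildcard."""
    _registry[(root_name, public_id, system_id)] = cls


def find_document_class(root_name, public_id, system_id):
    """Try an exact match, then progressively looser wildcard matches."""
    candidates = [
        (root_name, public_id, system_id),  # exact match
        (root_name, public_id, None),       # ignoring the SystemID
        (root_name, None, system_id),       # ignoring the PublicID
        (None, public_id, system_id),       # ignoring the root element
        (None, public_id, None),            # matching only the PublicID
        (None, None, system_id),            # matching only the SystemID
        (root_name, None, None),            # matching only the root element
    ]
    for key in candidates:
        cls = _registry.get(key)
        if cls is not None:
            return cls
    return Document  # fall back to the generic class
```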
-
is_s
()¶ Tests if the current character matches S
Returns a boolean value, True if S is matched.
By default calls
is_s()
. In Unicode compatibility mode the function maps the Unicode white space characters at code points U+2028 and U+2029 to line feed and space respectively.
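In XML 1.0 the production S matches one or more of the characters space, tab, carriage return and line feed. The test, together with the Unicode compatibility mapping described above, can be sketched as a pure function. The name match_s and its return convention are assumptions made for illustration; the method documented here returns a boolean and works on the parser's current character.

```python
# The four characters allowed by production [3] S.
XML_S = frozenset("\x20\x09\x0d\x0a")

# Mapping applied only in Unicode compatibility mode (per the text above).
UNICODE_S = {"\u2028": "\n", "\u2029": " "}


def match_s(c, unicode_compatibility=False):
    """Return the white space character that c stands for, or None.

    c may be None to represent the end of the stream.
    """
    if c is None:
        return None
    if c in XML_S:
        return c
    if unicode_compatibility:
        return UNICODE_S.get(c)
    return None
```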
-
parse_s
()¶ [3] S
Parses white space returning it as a string. If there is no white space at the current position then an empty string is returned.
The productions in the specification do not make explicit mention of parameter entity references; they are covered by the general statement that “Parameter entity references are recognized anywhere in the DTD...” In practice, this means that while parsing the DTD, anywhere that an S is permitted a parameter entity reference may also be recognized. This method implements this behaviour, recognizing parameter entity references within S when
refMode
is RefModeInDTD
.
-
parse_required_s
(production='[3] S')¶ [3] S: Parses required white space
- production
- An optional string describing the production being parsed. This allows more useful errors than simply ‘expected [3] S’ to be logged.
If there is no white space then a well-formedness error is raised.
-
parse_name
()¶ [5] Name
Parses an optional name. The name is returned as a unicode string. If no Name can be parsed then None is returned.
-
parse_required_name
(production='Name')¶ [5] Name
- production
- An optional string describing the production being parsed. This allows more useful errors than simply ‘expected [5] Name’ to be logged.
Parses a required Name, returning it as a string. If no name can be parsed then a well-formedness error is raised.
-
parse_names
()¶ [6] Names
This method returns a tuple of unicode strings. If no names can be parsed then None is returned.
-
parse_nmtoken
()¶ [7] Nmtoken
Returns a Nmtoken as a string or, if no Nmtoken can be parsed then None is returned.
-
parse_nmtokens
()¶ [8] Nmtokens
This method returns a tuple of unicode strings. If no tokens can be parsed then None is returned.
-
parse_entity_value
()¶ [9] EntityValue
Parses an EntityValue, returning it as a unicode string.
This method automatically expands other parameter entity references but does not expand general or character references.
-
parse_att_value
()¶ [10] AttValue
The value is returned without the surrounding quotes and with any references expanded.
The behaviour of this method is affected significantly by the setting of the
dontCheckWellFormedness
flag. When set, attribute values can be parsed without surrounding quotes. For compatibility with SGML these values should match one of the formal value types (e.g., Name) but this is not enforced so values like width=100% can be parsed without error.
-
parse_system_literal
()¶ [11] SystemLiteral
The value of the literal is returned as a string without the enclosing quotes.
-
parse_pubid_literal
()¶ [12] PubidLiteral
The value of the literal is returned as a string without the enclosing quotes.
-
parse_char_data
()¶ [14] CharData
Parses a run of character data. The method adds the parsed data to the current element. In the default parsing mode it returns None.
When the parser option
sgmlOmittag
is selected the method returns any parsed character data that could not be added to the current element due to a model violation. Note that in this SGML-like mode any S is treated as being in the current element as the violation doesn’t occur until the first non-S character (so any implied start tag is treated as being immediately prior to the first non-S).
-
parse_comment
(got_literal=False)¶ [15] Comment
- got_literal
- If True then the method assumes that the ‘<!--’ literal has already been parsed.
Returns the comment as a string.
-
parse_pi
(got_literal=False)¶ [16] PI: parses a processing instruction.
- got_literal
- If True the method assumes the ‘<?’ literal has already been parsed.
This method calls the
Node.ProcessingInstruction()
of the current element or of the document if no element has been parsed yet.
-
parse_pi_target
()¶ [17] PITarget
Parses a processing instruction target name, the name is returned.
-
parse_cdsect
(got_literal=False, cdend=u']]>')¶ [18] CDSect
- got_literal
- If True then the method assumes the initial literal has already been parsed. (By default, CDStart.)
- cdend
- Optional string. The literal used to signify the end of the CDATA section can be overridden by passing an alternative literal in cdend. Defaults to ‘]]>’
This method adds any parsed data to the current element, there is no return value.
-
parse_cdstart
()¶ [19] CDStart
Parses the literal that starts a CDATA section.
-
parse_cdata
(cdend=']]>')¶ [20] CData
Parses a run of CData up to but not including cdend.
This method adds any parsed data to the current element, there is no return value.
-
parse_cdend
()¶ [21] CDEnd
Parses the end of a CDATA section.
-
parse_prolog
()¶ [22] prolog
Parses the document prolog, including the XML declaration and dtd.
-
parse_xml_decl
(got_literal=False)¶ [23] XMLDecl
- got_literal
- If True the initial literal ‘<?xml’ is assumed to have already been parsed.
Returns an
XMLDeclaration
instance. Also, if an encoding is given in the declaration then the method changes the encoding of the current entity to match. For more information see ChangeEncoding()
.
-
parse_version_info
(got_literal=False)¶ [24] VersionInfo
- got_literal
- If True, the method assumes the initial white space and ‘version’ literal has been parsed already.
The version number is returned as a string.
-
parse_eq
(production='[25] Eq')¶ [25] Eq
- production
- An optional string describing the production being parsed. This allows more useful errors than simply ‘expected [25] Eq’ to be logged.
Parses an equal sign, optionally surrounded by white space
-
parse_version_num
()¶ [26] VersionNum
Parses the XML version number, returning it as a string, e.g., “1.0”.
-
parse_misc
()¶ [27] Misc
This method parses everything that matches the production Misc*
-
parse_doctypedecl
(got_literal=False)¶ [28] doctypedecl
- got_literal
- If True, the method assumes the initial ‘<!DOCTYPE’ literal has been parsed already.
This method creates a new instance of
XMLDTD
and assigns it to dtd; it also returns this instance as the result.
-
parse_decl_sep
()¶ [28a] DeclSep
Parses a declaration separator.
-
parse_int_subset
()¶ [28b] intSubset
Parses an internal subset.
-
parse_markup_decl
(got_literal=False)¶ [29] markupDecl
- got_literal
- If True, the method assumes the initial ‘<’ literal has been parsed already.
Returns True if a markupDecl was found, False otherwise.
-
parse_ext_subset
()¶ [30] extSubset
Parses an external subset
-
parse_ext_subset_decl
()¶ [31] extSubsetDecl
Parses declarations in the external subset.
-
check_pe_between_declarations
(check_entity)¶ [31] extSubsetDecl
- check_entity
- An
XMLEntity
object, the entity we should still be parsing.
Checks the well-formedness constraint on use of PEs between declarations.
-
parse_sd_decl
(got_literal=False)¶ [32] SDDecl
- got_literal
- If True, the method assumes the initial ‘standalone’ literal has been parsed already.
Returns True if the document should be treated as standalone; False otherwise.
-
parse_element
()¶ [39] element
The class used to represent the element is determined by calling the
get_element_class()
method of the current document. If there is no document yet then a new document is created automatically (see parse_document()
for more information).
The element is added as a child of the current element using
Node.ChildElement()
.
The method returns a boolean value:
- True
- the element was parsed normally
- False
- the element is not allowed in this context
The second case only occurs when the
sgmlOmittag
option is in use and it indicates that the content of the enclosing element has ended. The Tag is buffered so that it can be reparsed when the stack of nested parse_content()
and parse_element()
calls is unwound to the point where it is allowed by the context.
-
check_attributes
(name, attrs)¶ Checks attrs against the declarations for an element.
- name
- The name of the element
- attrs
- A dictionary of attributes
Adds any omitted defaults to the attribute list. Also, checks the validity of the attributes which may result in values being further normalized as per the rules for collapsing spaces in tokenized values.
-
match_xml_name
(element, name)¶ Tests if name is a possible name for element.
- element
- A
Element
instance.
- name
- The name of an end tag, as a string.
This method is used by the parser to determine if an end tag is the end tag of this element. It is provided as a separate method to allow it to be overridden by derived parsers.
The default implementation simply compares name with
GetXMLName()
-
check_expected_particle
(name)¶ Checks the validity of element name in the current context.
- name
- The name of the element encountered. An empty string for name indicates the enclosing end tag was found.
This method also maintains the position of a pointer into the element’s content model.
-
get_stag_class
(name, attrs=None)¶ [40] STag
- name
- The name of the element being started
- attrs
- A dictionary of attributes of the element being started
Returns information suitable for starting the element in the current context.
If there is no
Document
instance yet this method assumes that it is being called for the root element and selects an appropriate class based on the contents of the prolog and/or name.
When using the
sgmlOmittag
option name may be None indicating that the method should return information about the element implied by PCDATA in the current context (only called when an attempt to add data to the current context has already failed).
The result is a triple of:
- element_class
- the element class that this STag must introduce or None if this STag does not belong (directly or indirectly) in the current context
- element_name
- the name of the element (to pass to ChildElement) or None to use the default
- buff_flag
- True indicates an omitted tag and that the triggering STag (i.e., the STag with name name) should be buffered.
-
parse_stag
()¶ [40] STag, [44] EmptyElemTag
This method returns a tuple of (name, attrs, emptyFlag) where:
- name
- the name of the element parsed
- attrs
- a dictionary of attribute values keyed by attribute name
- emptyFlag
- a boolean; True indicates that the tag was an empty element tag.
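The shape of the (name, attrs, emptyFlag) result can be illustrated with a toy tag parser over a plain string. This sketch is not the library's parser: it uses an ASCII-only approximation of Name, does not expand references or normalize attribute values, and the function signature is an assumption for illustration.

```python
import re

_NAME = r"[A-Za-z_:][A-Za-z0-9_:.\-]*"  # ASCII-only approximation of Name
_S = r"[ \t\r\n]"
_STAG_OPEN = re.compile("<(" + _NAME + ")")
_ATTRIBUTE = re.compile(
    _S + "+(" + _NAME + ")" + _S + "*=" + _S + "*(\"[^<\"]*\"|'[^<']*')")
_STAG_CLOSE = re.compile(_S + "*(/?)>")


def parse_stag(text, pos=0):
    """Return (name, attrs, empty_flag) for the tag starting at pos."""
    m = _STAG_OPEN.match(text, pos)
    if m is None:
        raise ValueError("expected STag")
    name = m.group(1)
    pos = m.end()
    attrs = {}
    while True:
        m = _ATTRIBUTE.match(text, pos)
        if m is None:
            break
        # Strip the surrounding quotes; references are left unexpanded.
        attrs[m.group(1)] = m.group(2)[1:-1]
        pos = m.end()
    m = _STAG_CLOSE.match(text, pos)
    if m is None:
        raise ValueError("expected '>' or '/>'")
    # A trailing '/' marks an empty element tag.
    return name, attrs, m.group(1) == "/"
```

A caller distinguishes a start tag from an empty element tag by testing the third item of the result, which is why a separate method for empty element tags is not needed.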
-
parse_attribute
()¶ [41] Attribute
Returns a tuple of (name, value) where:
- name
- is the name of the attribute or None if
sgmlShorttag
is True and a short form attribute value was supplied.
- value
- the attribute value.
If
dontCheckWellFormedness
is set the parser uses a very generous form of parsing attribute values to accommodate common syntax errors.
-
parse_etag
(got_literal=False)¶ [42] ETag
- got_literal
- If True, the method assumes the initial ‘</’ literal has been parsed already.
The method returns the name of the end element parsed.
-
parse_content
()¶ [43] content
The method returns:
- True
- indicates that the content was parsed normally
- False
- indicates that the content contained data or markup not allowed in this context
The second case only occurs when the
sgmlOmittag
option is in use and it indicates that the enclosing element has ended (i.e., the element’s ETag has been omitted). See parse_element() for more information.
-
handle_data
(data, cdata=False)¶ [43] content
- data
- A string of data to be handled
- cdata
- If True data is treated as character data (even if it matches the production for S).
Data is handled by calling
AddData()
even if the data is optional white space.
-
unhandled_data
(data)¶ [43] content
- data
- A string of unhandled data
This method is only called when the
sgmlOmittag
option is in use. It processes data that occurs in a context where data is not allowed.
It returns a boolean result:
- True
- the data was consumed by a sub-element (with an omitted start tag)
- False
- the data has been buffered and indicates the end of the current content (an omitted end tag).
-
parse_empty_elem_tag
()¶ [44] EmptyElemTag
There is no method for parsing empty element tags alone.
This method raises NotImplementedError. Instead, you should call
parse_stag()
and examine the result. If the emptyFlag it returns is True then an empty element tag was parsed.
-
parse_element_decl
(got_literal=False)¶ [45] elementdecl
- got_literal
- If True, the method assumes that the ‘<!ELEMENT’ literal has already been parsed.
Declares the element type in the
dtd
, (if present). There is no return result.
-
parse_content_spec
(etype)¶ [46] contentspec
- etype
- An
ElementType
instance.
Sets the
contentType
and contentModel
attributes of etype, there is no return value.
-
parse_children
(got_literal=False, group_entity=None)¶ [47] children
- got_literal
- If True, the method assumes that the initial ‘(’ literal has already been parsed, including any following white space.
- group_entity
- An optional
XMLEntity
object. If got_literal is True then group_entity must be the entity in which the opening ‘(’ was parsed which started the choice group.
The method returns an instance of
XMLContentParticle
.
-
parse_cp
()¶ [48] cp
Returns an
XMLContentParticle
instance.
-
parse_choice
(first_child=None, group_entity=None)¶ [49] choice
- first_child
- An optional
XMLContentParticle
instance. If present the method assumes that the first particle and any following white space have already been parsed.
- group_entity
- An optional
XMLEntity
object. If first_child is given then group_entity must be the entity in which the opening ‘(’ was parsed which started the choice group.
Returns an
XMLChoiceList
instance.
-
parse_seq
(first_child=None, group_entity=None)¶ [50] seq
- first_child
- An optional
XMLContentParticle
instance. If present the method assumes that the first particle and any following white space have already been parsed. In this case, group_entity must be set to the entity which contained the opening ‘(’ literal.
- group_entity
- An optional
XMLEntity
object, see above.
Returns an
XMLSequenceList
instance.
-
parse_mixed
(got_literal=False, group_entity=None)¶ [51] Mixed
- got_literal
- If True, the method assumes that the #PCDATA literal has already been parsed. In this case, group_entity must be set to the entity which contained the opening ‘(’ literal.
- group_entity
- An optional
XMLEntity
object, see above.
Returns an instance of
XMLChoiceList
with occurrence ZeroOrMore
representing the list of elements that may appear in the mixed content model. If the mixed model contains #PCDATA only the choice list will be empty.
-
parse_attlist_decl
(got_literal=False)¶ [52] AttlistDecl
- got_literal
- If True, assumes that the leading ‘<!ATTLIST’ literal has already been parsed.
Declares the attributes in the
dtd
, (if present). There is no return result.
-
parse_att_def
(got_s=False)¶ [53] AttDef
- got_s
- If True, the method assumes that the leading S has already been parsed.
Returns an instance of
XMLAttributeDefinition
.
-
parse_att_type
(a)¶ [54] AttType
- a
- A required
XMLAttributeDefinition
instance.
This method sets the
type
and values
fields of a.
Note that, to avoid unnecessary look ahead, this method does not call
parse_string_type()
or parse_enumerated_type()
.
-
parse_string_type
(a)¶ [55] StringType
- a
- A required
XMLAttributeDefinition
instance.
This method sets the
type
and values
fields of a.
This method is provided for completeness. It is not called during normal parsing operations.
-
parse_tokenized_type
(a)¶ [56] TokenizedType
- a
- A required
XMLAttributeDefinition
instance.
-
parse_enumerated_type
(a)¶ [57] EnumeratedType
- a
- A required
XMLAttributeDefinition
instance.
This method sets the
type
and values
fields of a.
This method is provided for completeness. It is not called during normal parsing operations.
-
parse_notation_type
(got_literal=False)¶ [58] NotationType
- got_literal
- If True, assumes that the leading ‘NOTATION’ literal has already been parsed.
Returns a list of strings representing the names of the declared notations being referred to.
-
parse_enumeration
()¶ [59] Enumeration
Returns a dictionary of strings representing the tokens in the enumeration.
-
parse_default_decl
(a)¶ [60] DefaultDecl: parses an attribute’s default declaration.
- a
- A required
XMLAttributeDefinition
instance.
This method sets the
presence
and defaultValue
fields of a.
-
parse_conditional_sect
(got_literal_entity=None)¶ [61] conditionalSect
- got_literal_entity
- An optional
XMLEntity
object. If given, the method assumes that the initial literal ‘<![’ has already been parsed from that entity.
-
parse_include_sect
(got_literal_entity=None)¶ [62] includeSect:
- got_literal_entity
- An optional
XMLEntity
object. If given, the method assumes that the production, up to and including the keyword ‘INCLUDE’, has already been parsed and that the opening ‘<![’ literal was parsed from that entity.
There is no return value.
-
parse_ignore_sect
(got_literal_entity=None)¶ [63] ignoreSect
- got_literal_entity
- An optional
XMLEntity
object. If given, the method assumes that the production, up to and including the keyword ‘IGNORE’, has already been parsed and that the opening ‘<![’ literal was parsed from this entity.
There is no return value.
-
parse_ignore_sect_contents
()¶ [64] ignoreSectContents
Parses the contents of an ignored section. The method returns no data.
-
parse_ignore
()¶ [65] Ignore
Parses a run of characters in an ignored section. This method returns no data.
-
parse_char_ref
(got_literal=False)¶ [66] CharRef
- got_literal
- If True, assumes that the leading ‘&’ literal has already been parsed.
The method returns a unicode string containing the character referred to.
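Production [66] CharRef in XML 1.0 is '&#' [0-9]+ ';' or '&#x' [0-9a-fA-F]+ ';'. A minimal stand-alone sketch of decoding such a reference from a string is shown below. The function name and its (text, pos) interface are assumptions for illustration, and the sketch does not check that the resulting character is a legal XML Char, which a conforming parser must do.

```python
import re

# Decimal or hexadecimal character reference, per production [66].
_CHAR_REF = re.compile(r"&#(?:x([0-9a-fA-F]+)|([0-9]+));")


def parse_char_ref(text, pos=0):
    """Return (character, new_pos) for the CharRef starting at pos."""
    m = _CHAR_REF.match(text, pos)
    if m is None:
        raise ValueError("expected CharRef at position %d" % pos)
    hex_digits, dec_digits = m.groups()
    if hex_digits is not None:
        code = int(hex_digits, 16)
    else:
        code = int(dec_digits, 10)
    return chr(code), m.end()
```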
-
parse_reference
()¶ [67] Reference
This method returns any data parsed as a result of the reference. For a character reference this will be the character referred to. For a general entity the data returned will depend on the parsing context. For more information see
parse_entity_ref()
.
-
parse_entity_ref
(got_literal=False)¶ [68] EntityRef
- got_literal
- If True, assumes that the leading ‘&’ literal has already been parsed.
This method returns any data parsed as a result of the reference. For example, if this method is called in a context where entity references are bypassed then the string returned will be the literal characters parsed, e.g., “&ref;”.
If the entity reference is parsed successfully in a context where Entity references are recognized, the reference is looked up according to the rules for validating and non-validating parsers and, if required by the parsing mode, the entity is opened and pushed onto the parser so that parsing continues with the first character of the entity’s replacement text.
A special case is made for the predefined entities. When parsed in a context where entity references are recognized these entities are expanded immediately and the resulting character returned. For example, the entity &amp; returns the ‘&’ character instead of pushing an entity with replacement text ‘&#38;’.
Inclusion of an unescaped & is common so when we are not checking well-formedness we treat ‘&’ not followed by a name as if it were ‘&amp;’. Similarly we are generous about the missing ‘;’.
-
lookup_predefined_entity
(name)¶ Looks up pre-defined entities, e.g., “lt”
This method can be overridden by variant parsers to implement other pre-defined entity tables.
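The idea of a replaceable predefined-entity table can be sketched with two small classes. These are hypothetical stand-ins rather than XMLParser subclasses: the class names, the expand helper, the ASCII-only name pattern and the extra nbsp entry are all assumptions chosen for illustration. Only the method name lookup_predefined_entity and the five standard entities come from this page.

```python
import re

PREDEFINED = {"amp": "&", "lt": "<", "gt": ">", "apos": "'", "quot": '"'}

# ASCII-only approximation of an entity reference such as "&lt;".
_ENTITY_REF = re.compile(r"&([A-Za-z_:][A-Za-z0-9_:.\-]*);")


class MiniExpander:
    """Expands predefined entity references in a string (toy example)."""

    def lookup_predefined_entity(self, name):
        """Return the replacement character for name, or None."""
        return PREDEFINED.get(name)

    def expand(self, text):
        def replace(m):
            value = self.lookup_predefined_entity(m.group(1))
            # Unknown references are left untouched in this sketch.
            return m.group(0) if value is None else value
        return _ENTITY_REF.sub(replace, text)


class VariantExpander(MiniExpander):
    """Variant with one additional predefined entity (hypothetical)."""

    EXTRA = {"nbsp": "\u00a0"}

    def lookup_predefined_entity(self, name):
        value = self.EXTRA.get(name)
        if value is not None:
            return value
        return super().lookup_predefined_entity(name)
```

Because the substitution is a single pass, text produced by an expansion is not rescanned, so "&amp;lt;" becomes "&lt;" rather than "<".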
-
parse_pe_reference
(got_literal=False)¶ [69] PEReference
- got_literal
- If True, assumes that the initial ‘%’ literal has already been parsed.
This method returns any data parsed as a result of the reference. Normally this will be an empty string because the method is typically called in contexts where PEReferences are recognized. However, if this method is called in a context where PEReferences are not recognized the returned string will be the literal characters parsed, e.g., “%ref;”
If the parameter entity reference is parsed successfully in a context where PEReferences are recognized, the reference is looked up according to the rules for validating and non-validating parsers and, if required by the parsing mode, the entity is opened and pushed onto the parser so that parsing continues with the first character of the entity’s replacement text.
-
parse_entity_decl
(got_literal=False)¶ [70] EntityDecl
- got_literal
- If True, assumes that the literal ‘<!ENTITY’ has already been parsed.
Returns an instance of either
XMLGeneralEntity
or XMLParameterEntity
depending on the type of entity parsed.
-
parse_ge_decl
(got_literal=False)¶ [71] GEDecl
- got_literal
- If True, assumes that the literal ‘<!ENTITY’ and the required S has already been parsed.
Returns an instance of
XMLGeneralEntity
.
-
parse_pe_decl
(got_literal=False)¶ [72] PEDecl
- got_literal
- If True, assumes that the literal ‘<!ENTITY’ and the required S has already been parsed.
Returns an instance of
XMLParameterEntity
.
-
parse_entity_def
(ge)¶ [73] EntityDef
- ge
- The general entity being parsed, an
XMLGeneralEntity
instance.
This method sets the
definition
and notation
fields from the parsed entity definition.
-
parse_pe_def
(pe)¶ [74] PEDef
- pe
- The parameter entity being parsed, an
XMLParameterEntity
instance.
This method sets the
definition
field from the parsed parameter entity definition. There is no return value.
-
parse_external_id
(allow_public_only=False)¶ [75] ExternalID
- allow_public_only
An external ID must have a SYSTEM literal, and may have a PUBLIC identifier. If allow_public_only is True then the method will also allow an external identifier with a PUBLIC identifier but no SYSTEM literal. In this mode the parser behaves as it would when parsing the production:
(ExternalID | PublicID) S?
Returns an
XMLExternalID
instance.
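In XML 1.0, ExternalID is either 'SYSTEM' S SystemLiteral or 'PUBLIC' S PubidLiteral S SystemLiteral. The effect of allow_public_only can be sketched with a regular-expression-based function that returns a (public, system) pair. This is only an illustration: the function works on a plain string, returns a tuple instead of an XMLExternalID instance, and does not validate the characters allowed in a PubidLiteral.

```python
import re

_S = r"[ \t\r\n]+"
_LIT = r"(\"[^\"]*\"|'[^']*')"  # a quoted literal, quotes included
_SYSTEM = re.compile("SYSTEM" + _S + _LIT)
_PUBLIC = re.compile("PUBLIC" + _S + _LIT + "(?:" + _S + _LIT + ")?")


def parse_external_id(text, allow_public_only=False):
    """Return (public, system) parsed from text; either may be None."""
    m = _SYSTEM.match(text)
    if m:
        return None, m.group(1)[1:-1]
    m = _PUBLIC.match(text)
    if m:
        public = m.group(1)[1:-1]
        system = m.group(2)[1:-1] if m.group(2) else None
        if system is None and not allow_public_only:
            # Without the flag a SystemLiteral is mandatory.
            raise ValueError("expected SystemLiteral after PubidLiteral")
        return public, system
    raise ValueError("expected ExternalID")
```

The relaxed mode is what a notation declaration needs, since it may give a public identifier with no system literal.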
-
resolve_external_id
(external_id, entity=None)¶ [75] ExternalID: resolves an external ID, returning a URI.
- external_id
- An
XMLExternalID
instance.
- entity
- An optional
XMLEntity
instance. Can be used to force the resolution of relative URIs to be relative to the base of the given entity. If it is None then the currently open external entity (where available) is used instead.
Returns an instance of
pyslet.rfc2396.URI
or None if the external ID cannot be resolved.
The default implementation simply calls
get_location()
with the entity’s base URL and ignores the public ID. Derived parsers may recognize public identifiers and resolve accordingly.
-
parse_ndata_decl
(got_literal=False)¶ [76] NDataDecl
- got_literal
- If True, assumes that the literal ‘NDATA’ has already been parsed.
Returns the name of the notation used by the unparsed entity as a string without the preceding ‘NDATA’ literal.
-
parse_text_decl
(got_literal=False)¶ [77] TextDecl
- got_literal
- If True, assumes that the literal ‘<?xml’ has already been parsed.
Returns an
XMLTextDeclaration
instance.
-
parse_encoding_decl
(got_literal=False)¶ [80] EncodingDecl
- got_literal
- If True, assumes that the literal ‘encoding’ has already been parsed.
Returns the declaration name without the enclosing quotes.
-
parse_enc_name
()¶ [81] EncName
Returns the encoding name as a string or None if no valid encoding name start character was found.
-
parse_notation_decl
(got_literal=False)¶ [82] NotationDecl
- got_literal
- If True, assumes that the literal ‘<!NOTATION’ has already been parsed.
Declares the notation in the
dtd
, (if present). There is no return result.
-
parse_public_id
()¶ [83] PublicID
The literal string is returned without the PUBLIC prefix or the enclosing quotes.