2.3. XML: Parsing XML Documents

This module exposes a number of internal functions, typically defined privately in XML parser implementations, which make it easier to reuse concepts from XML in other modules. For example, the IsNameStartChar() function tells you if a character matches the production for NameStartChar in the XML standard.
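As an illustration of the kind of test such a function performs, the following is a minimal standalone sketch of production [4] NameStartChar from XML 1.0 (Fifth Edition). The name is_name_start_char and the range table are assumptions made for this example; this is not the module's own implementation.

```python
# Code point ranges taken from production [4] NameStartChar in
# XML 1.0 (Fifth Edition). The table and function names are hypothetical.
_NAME_START_RANGES = (
    (0x3A, 0x3A),      # ":"
    (0x41, 0x5A),      # A-Z
    (0x5F, 0x5F),      # "_"
    (0x61, 0x7A),      # a-z
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
)


def is_name_start_char(c):
    """Return True if the single character c may start an XML Name."""
    if c is None or len(c) != 1:
        return False
    code = ord(c)
    return any(lo <= code <= hi for lo, hi in _NAME_START_RANGES)
```

Digits and the hyphen are allowed inside a Name but not at its start, so the sketch rejects them.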

class pyslet.xml20081126.parser.XMLParser(entity)

Returns an XMLParser object constructed from the XMLEntity to parse.

XMLParser objects are used to parse entities for the constructs defined by the numbered productions in the XML specification.

XMLParser has a number of optional attributes, all of which default to False. Attributes with names starting with ‘check’ increase the strictness of the parser. Setting any of the other parser flags to True results in a parser that is no longer a conforming XML processor.

DocumentClassTable = {}

A dictionary mapping doctype parameters onto class objects.

For more information about how this is used see GetDocumentClass() and RegisterDocumentClass().

RefModeNone = 0

Default constant used for setting refMode

RefModeInContent = 1

Treat references as per “in Content” rules

RefModeInEntityValue = 4

Treat references as per “in EntityValue” rules

RefModeInDTD = 5

Treat references as per “in DTD” rules

PredefinedEntities = {'amp': '&', 'lt': '<', 'gt': '>', 'apos': "'", 'quot': '"'}

A mapping from the names of the predefined entities (lt, gt, amp, apos, quot) to their replacement characters.
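The following standalone sketch shows how a table of this shape can be used to expand the predefined entity references in a string. The function name expand_predefined and the regular expression are assumptions for illustration only; the parser itself resolves references while parsing rather than by string substitution.

```python
import re

# Same contents as the PredefinedEntities table documented above.
PREDEFINED = {'amp': '&', 'lt': '<', 'gt': '>', 'apos': "'", 'quot': '"'}


def expand_predefined(text):
    """Expand references to the five predefined entities in text.

    References to any other name are left untouched.
    """
    def replace(match):
        # Fall back to the original characters for unknown names.
        return PREDEFINED.get(match.group(1), match.group(0))

    return re.sub(r'&([A-Za-z]+);', replace, text)
```

Because the substitution makes a single pass, the output of one expansion is never rescanned for further references.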

checkValidity = None

checks XML validity constraints

If checkValidity is True, and all other options are left at their default (False) setting then the parser will behave as a validating XML parser.

valid = None

Flag indicating if the document is valid, only set if checkValidity is True.

nonFatalErrors = None

A list of non-fatal errors discovered during parsing, only populated if checkValidity is True.

checkCompatibility = None

checks XML compatibility constraints; will cause checkValidity to be set to True when parsing.

checkAllErrors = None

checks all constraints; will cause checkValidity and checkCompatibility to be set to True when parsing.

raiseValidityErrors = None

treats validity errors as fatal errors

unicodeCompatibility = None

See http://www.w3.org/TR/unicode-xml/

sgmlNamecaseGeneral = None

option that simulates SGML’s NAMECASE GENERAL YES

sgmlNamecaseEntity = None

option that simulates SGML’s NAMECASE ENTITY YES

sgmlOmittag = None

option that simulates SGML’s OMITTAG YES

sgmlShorttag = None

option that simulates SGML’s SHORTTAG YES

sgmlContent = None

This option simulates some aspects of SGML content handling based on class attributes of the element being parsed.

  • Element classes with XMLCONTENT=XMLEmpty are treated as elements declared EMPTY. These elements are treated as if they were introduced with an empty element tag even if they weren’t, as per SGML’s rules. Note that this SGML feature “has nothing to do with markup minimization” (i.e., sgmlOmittag).

refMode = None

The current parser mode for interpreting references.

XML documents can contain five different types of reference: parameter entity, internal general entity, external parsed entity, (external) unparsed entity and character entity.

The rules for interpreting these references vary depending on the current mode of the parser, for example, in content a reference to an internal entity is replaced, but in the definition of an entity value it is not. This means that the behaviour of the ParseReference() method will differ depending on the mode.

The parser takes care of setting the mode automatically but if you wish to use some of the parsing methods in isolation to parse fragments of XML documents, then you will need to set the refMode directly using one of the RefMode* family of constants defined above.
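The mode dependence described above can be pictured with a small standalone sketch. It covers only the two cases mentioned in the text (an internal general entity referenced in content and in an entity value). The function general_entity_reference and the declared dictionary are hypothetical and are not part of the parser's interface.

```python
# Values mirror the RefModeInContent and RefModeInEntityValue constants.
REF_MODE_IN_CONTENT = 1
REF_MODE_IN_ENTITY_VALUE = 4


def general_entity_reference(name, declared, ref_mode):
    """Return the text contributed by a reference to internal entity name.

    declared is a hypothetical dict mapping entity names to their
    replacement text.
    """
    if ref_mode == REF_MODE_IN_CONTENT:
        # In content the reference is replaced by the entity's text.
        return declared[name]
    if ref_mode == REF_MODE_IN_ENTITY_VALUE:
        # In an entity value the reference is kept as literal characters.
        return "&%s;" % name
    raise ValueError("mode %r is not covered by this sketch" % ref_mode)
```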

entity = None

The current entity being parsed

declaration = None

The declaration parsed or None.

dtd = None

The document type declaration of the document being parsed.

This member is initialised to None as well-formed XML documents are not required to have an associated dtd.

doc = None

The document being parsed.

docEntity = None

The document entity.

element = None

The current element being parsed.

elementType = None

The element type of the current element.

NextChar()

Moves to the next character in the stream.

The current character can always be read from the_char. If there are no characters left in the current entity then entities are popped from an internal entity stack automatically.

PushEntity(entity)

Starts parsing entity

the_char is set to the current character in the entity’s stream. The current entity is pushed onto an internal stack and will be resumed when this entity has been parsed completely.

Note that in the degenerate case where the entity being pushed is empty (or is already positioned at the end of the file) then PushEntity does nothing.
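The interaction between NextChar() and PushEntity() can be sketched with a toy class that works on plain strings instead of XMLEntity objects. The class EntityStackSketch and its method names are assumptions made for illustration; the real parser reads from entity streams and tracks considerably more state.

```python
class EntityStackSketch:
    """Toy model of a character cursor with a stack of suspended entities."""

    def __init__(self, text):
        self._stack = []   # suspended (text, position) pairs
        self._text = text
        self._pos = 0
        self.the_char = text[0] if text else None

    def push_entity(self, text):
        """Start reading text; the current entity is resumed afterwards."""
        if not text:
            return         # degenerate case: an empty entity is ignored
        self._stack.append((self._text, self._pos))
        self._text, self._pos = text, 0
        self.the_char = text[0]

    def next_char(self):
        """Advance one character, popping exhausted entities as needed."""
        self._pos += 1
        while self._pos >= len(self._text) and self._stack:
            # Resume the suspended entity at the character it had reached.
            self._text, self._pos = self._stack.pop()
        if self._pos < len(self._text):
            self.the_char = self._text[self._pos]
        else:
            self.the_char = None
```

In this model, pushing "XY" while positioned on the "b" of "ab" yields the characters X, Y and then b, after which the_char becomes None.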

CheckEncoding(entity, declaredEncoding)

Checks the entity against the declared encoding (if any) and the rules on entity encodings.

GetExternalEntity()

Returns the external entity currently being parsed.

If no external entity is being parsed then None is returned.

Standalone()

True if the document being parsed should be treated as standalone.

A document may be declared standalone or it may effectively be standalone due to the absence of a DTD, or the absence of an external DTD subset and parameter entity references.

DeclaredStandalone()

True if the current document was declared standalone.

WellFormednessError(msg='well-formedness error', errorClass=<class 'pyslet.xml20081126.structures.XMLWellFormedError'>)

Raises an XMLWellFormedError error.

Called by the parsing methods whenever a well-formedness constraint is violated. The method takes an optional message string, msg, and an optional error class, errorClass, which must be a class object derived from XMLWellFormedError.

The method raises an instance of errorClass and does not return. This method can be overridden by derived parsers to implement more sophisticated error logging.

ValidityError(msg='validity error', error=<class 'pyslet.xml20081126.structures.XMLValidityError'>)

Called when the parser encounters a validity error.

The method takes an optional message string, msg, and an optional error class or instance, error, which must be a (class) object derived from XMLValidityError.

The behaviour varies depending on the setting of the checkValidity and raiseValidityErrors options. The default (both False) causes validity errors to be ignored. When checking validity, an error message is logged to nonFatalErrors and valid is set to False. Furthermore, if raiseValidityErrors is True, error is raised (or a new instance of error is raised) and parsing terminates.

This method can be overridden by derived parsers to implement more sophisticated error logging.
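The flag handling described for ValidityError() can be summarised in a standalone sketch. The class and attribute names below are hypothetical, and the sketch assumes that errors are only raised while validity is being checked, which is one reading of the description above rather than a confirmed detail of the implementation.

```python
class ValidityErrorSketch(Exception):
    """Stand-in for XMLValidityError."""


class ErrorPolicySketch:
    """Toy model of the checkValidity / raiseValidityErrors options."""

    def __init__(self, check_validity=False, raise_validity_errors=False):
        self.check_validity = check_validity
        self.raise_validity_errors = raise_validity_errors
        self.non_fatal_errors = []
        # valid is only meaningful when validity is being checked.
        self.valid = True if check_validity else None

    def validity_error(self, msg="validity error", error=ValidityErrorSketch):
        if not self.check_validity:
            return                      # default: the error is ignored
        self.valid = False
        self.non_fatal_errors.append(msg)
        if self.raise_validity_errors:
            if isinstance(error, Exception):
                raise error             # an instance was supplied
            raise error(msg)            # a class was supplied
```

A derived class could override validity_error to route messages to a logger before deferring to this behaviour.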

CompatibilityError(msg='compatibility error')

Called when the parser encounters a compatibility error.

The method takes an optional message string, msg.

The behaviour varies depending on the setting of the checkCompatibility flag. The default (False) causes compatibility errors to be ignored. When checking compatibility an error message is logged to nonFatalErrors.

This method can be overridden by derived parsers to implement more sophisticated error logging.

ProcessingError(msg='Processing error')

Called when the parser encounters a general processing error.

The method takes an optional message string, msg and an optional error class or instance which must be a (class) object derived from XMLProcessingError.

The behaviour varies depending on the setting of the checkAllErrors flag. The default (False) causes processing errors to be ignored. When checking all errors an error message is logged to nonFatalErrors.

This method can be overridden by derived parsers to implement more sophisticated error logging.

ParseLiteral(match)

Parses a literal string, passed in match.

Returns True if match is successfully parsed and False otherwise. There is no partial matching, if match is not found then the parser is left in its original position.
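The all-or-nothing matching rule can be illustrated with a toy parser over a plain string. LiteralParserSketch is a hypothetical class written for this example; the real method works on the character stream of the current entity.

```python
class LiteralParserSketch:
    """Toy parser that matches literals against a string buffer."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def parse_literal(self, match):
        """Consume match if it occurs at the current position.

        On failure the position is left unchanged, so there is no
        partial match.
        """
        if self.text.startswith(match, self.pos):
            self.pos += len(match)
            return True
        return False
```

A caller can therefore try several alternative literals in sequence without saving and restoring the position itself.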

ParseRequiredLiteral(match, production='Literal String')

Parses a required literal string, raising a well-formedness error if it is not matched.

production is an optional string describing the context in which the literal was expected.

ParseDecimalDigits()

Parses a, possibly empty, string of decimal digits matching [0-9]*.

ParseRequiredDecimalDigits(production='Digits')

Parses a required string of decimal digits matching [0-9]+.

production is an optional string describing the context in which the digits were expected.

ParseHexDigits()

Parses a, possibly empty, string of hexadecimal digits matching [0-9a-fA-F]*.

ParseRequiredHexDigits(production='Hex Digits')

Parses a required string of hexadecimal digits matching [0-9a-fA-F]+.

production is an optional string describing the context in which the hexadecimal digits were expected.

ParseQuote(q=None)

Parses the quote character, q, or one of “’” or ‘”’ if q is None.

Returns the character parsed or raises a well-formedness error.

ParseDocument(doc=None)

[1] document: parses a Document.

doc is the Document instance that will be parsed. The declaration, dtd and elements are added to this document. If doc is None then a new instance is created using GetDocumentClass() to identify the correct class to use to represent the document based on information in the prolog or, if the prolog lacks a declaration, the root element.

This method returns the document that was parsed, an instance of Document.

GetDocumentClass(dtd)

Returns a class object derived from Document suitable for representing a document with the given document type declaration.

In cases where no doctype declaration is made a dummy declaration is created based on the name of the root element. For example, if the root element is called “database” then the dtd is treated as if it was declared as follows:

<!DOCTYPE database>

This default implementation uses the following three pieces of information to locate a class registered with RegisterDocumentClass(): the PublicID, the SystemID and the name of the root element. If an exact match is not found then wildcard matches are attempted, ignoring the SystemID, PublicID and finally the root element in turn. If a document class still cannot be found then wildcard matches are tried matching only the PublicID, SystemID and root element in turn.

If no document class can be found, Document is returned.
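The lookup order described above can be sketched as a sequence of dictionary probes. This standalone example assumes that the table is keyed on (PublicID, SystemID, root name) tuples and that None acts as the wildcard; both are assumptions made for illustration and may not match how DocumentClassTable is actually keyed.

```python
class Document:
    """Stand-in for the default document class."""


def get_document_class(table, public_id, system_id, root_name):
    """Return the best matching class from table, or Document."""
    candidates = [
        (public_id, system_id, root_name),   # exact match
        (public_id, None, root_name),        # ignore the SystemID
        (None, system_id, root_name),        # ignore the PublicID
        (public_id, system_id, None),        # ignore the root element
        (public_id, None, None),             # PublicID only
        (None, system_id, None),             # SystemID only
        (None, None, root_name),             # root element only
    ]
    for key in candidates:
        if key in table:
            return table[key]
    return Document
```

Registering a class under (None, None, "database") in this model would make it the fallback for any document whose root element is called database.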

IsS()

By default just calls the module level IsS()

In Unicode compatibility mode the function maps the unicode white space characters at code points 2028 and 2029 to line feed and space respectively.
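For reference, production [3] S admits only four characters. The sketch below tests for them and optionally accepts the two extra code points mentioned above. The function name is_s and its keyword argument are hypothetical, and the sketch only classifies characters; it does not perform the mapping to line feed and space.

```python
# The four characters admitted by production [3] S.
_XML_SPACE = ("\x20", "\x09", "\x0d", "\x0a")

# Line separator and paragraph separator (code points 2028 and 2029).
_UNICODE_SEPARATORS = ("\u2028", "\u2029")


def is_s(c, unicode_compatibility=False):
    """Return True if c counts as white space for parsing purposes."""
    if c in _XML_SPACE:
        return True
    return unicode_compatibility and c in _UNICODE_SEPARATORS
```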

ParseS()

[3] S: Parses white space from the stream matching the production for S.

If there is no white space at the current position then an empty string is returned.

The productions in the specification do not make explicit mention of parameter entity references, they are covered by the general statement that “Parameter entity references are recognized anywhere in the DTD...” In practice, this means that while parsing the DTD, anywhere that an S is permitted a parameter entity reference may also be recognized. This method implements this behaviour, recognizing parameter entity references within S when refMode is RefModeInDTD.

ParseRequiredS(production='[3] S')

[3] S: Parses required white space from the stream.

If there is no white space then a well-formedness error is raised. production is an optional string describing the context in which the space was expected.

ParseName()

[5] Name: parses a Name

The name is returned as a unicode string. If no Name can be parsed then None is returned.

ParseRequiredName(production='Name')

[5] Name: Parses a required Name.

If no name can be parsed then a well-formedness error is raised.

ParseNames()

[6] Names: parses a list of Names.

This method returns a tuple of unicode strings. If no names can be parsed then None is returned.

ParseNmtoken()

[7] Nmtoken: parses a single Nmtoken.

If no Nmtoken can be parsed then None is returned.

ParseNmtokens()

[8] Nmtokens: parses a list of Nmtokens.

This method returns a tuple of unicode strings. If no tokens can be parsed then None is returned.

ParseEntityValue()

[9] EntityValue: parses an EntityValue, returning it as a unicode string.

This method automatically expands other parameter entity references but does not expand general or character references.

ParseAttValue()

[10] AttValue: parses an attribute value.

The value is returned without the surrounding quotes and with any references expanded.

The behaviour of this method is affected significantly by the setting of the dontCheckWellFormedness flag. When set, attribute values can be parsed without surrounding quotes. For compatibility with SGML these values should match one of the formal value types (e.g., Name) but this is not enforced so values like width=100% can be parsed without error.

ParseSystemLiteral()

[11] SystemLiteral: Parses a literal value matching the production for SystemLiteral.

The value of the literal is returned as a string without the enclosing quotes.

ParsePubidLiteral()

[12] PubidLiteral: Parses a literal value matching the production for PubidLiteral.

The value of the literal is returned as a string without the enclosing quotes.

ParseCharData()

[14] CharData: parses a run of character data

The method adds the parsed data to the current element. In the default parsing mode it returns None.

When the parser option sgmlOmittag is selected the method returns any parsed character data that could not be added to the current element due to a model violation. Note that in this SGML-like mode any S is treated as being in the current element as the violation doesn’t occur until the first non-S character (so any implied start tag is treated as being immediately prior to the first non-S).

parse_comment(gotLiteral=False)

[15] Comment: parses a comment.

If gotLiteral is True then the method assumes that the ‘<!--’ literal has already been parsed.

ParsePI(gotLiteral=False)

[16] PI: parses a processing instruction.

This method calls the Node.ProcessingInstruction() of the current element or of the document if no element has been parsed yet.

If gotLiteral is True the method assumes the ‘<?’ literal has already been parsed.

ParsePITarget()

[17] PITarget: parses a processing instruction target name

ParseCDSect(gotLiteral=False, cdEnd=u']]>')

[18] CDSect: parses a CDATA section.

This method adds any parsed data to the current element.

If gotLiteral is True then the method assumes the initial literal has already been parsed. (By default, CDStart.) The literal used to signify the end of the CDATA section can be overridden by passing an alternative literal in cdEnd.

ParseCDStart()

[19] CDStart: parses the literal that starts a CDATA section.

ParseCData(cdEnd=']]>')

[20] CData: parses a run of CData up to but not including cdEnd.

This method adds any parsed data to the current element.

ParseCDEnd()

[21] CDEnd: parses the end of a CDATA section.

ParseProlog()

[22] prolog: parses the document prolog, including the XML declaration and dtd.

ParseXMLDecl(gotLiteral=False)

[23] XMLDecl: parses an XML declaration.

This method returns an XMLDeclaration instance. Also, if an encoding is given in the declaration then the method changes the encoding of the current entity to match. For more information see ChangeEncoding().

If gotLiteral is True the initial literal ‘<?xml’ is assumed to have already been parsed.

ParseVersionInfo(gotLiteral=False)

[24] VersionInfo: parses XML version number.

The version number is returned as a string. If gotLiteral is True then it is assumed that the preceding white space and ‘version’ literal have already been parsed.

ParseEq(production='[25] Eq')

[25] Eq: parses an equal sign, optionally surrounded by white space

ParseVersionNum()

[26] VersionNum: parses the XML version number, returns it as a string.

ParseMisc()

[27] Misc: parses multiple Misc items.

This method parses everything that matches the production Misc*

ParseDoctypedecl(gotLiteral=False)

[28] doctypedecl: parses a doctype declaration.

This method creates a new instance of XMLDTD and assigns it to dtd, it also returns this instance as the result.

If gotLiteral is True the method assumes that the initial literal ‘<!DOCTYPE’ has already been parsed.

ParseDeclSep()

[28a] DeclSep: parses a declaration separator.

ParseIntSubset()

[28b] intSubset: parses an internal subset.

ParseMarkupDecl(gotLiteral=False)

[29] markupDecl: parses a markup declaration.

Returns True if a markupDecl was found, False otherwise.

ParseExtSubset()

[30] extSubset: parses an external subset

ParseExtSubsetDecl()

[31] extSubsetDecl: parses declarations in the external subset.

CheckPEBetweenDeclarations(checkEntity)

[31] extSubsetDecl: checks the well-formedness constraint on use of PEs between declarations.

checkEntity is the entity we should still be in!

ParseSDDecl(gotLiteral=False)

[32] SDDecl: parses a standalone declaration

Returns True if the document should be treated as standalone; False otherwise.

ParseElement()

[39] element: parses an element, including its content.

The class used to represent the element is determined by calling the GetElementClass() method of the current document. If there is no document yet then a new document is created automatically (see ParseDocument() for more information).

The element is added as a child of the current element using Node.ChildElement().

The method returns:

  • True: indicates that an element was parsed normally
  • False: indicates that the element is not allowed in this context

The second case only occurs when the sgmlOmittag option is in use and it indicates that the content of the enclosing element has ended. The Tag is buffered so that it can be reparsed when the stack of nested ParseContent() and ParseElement() calls is unwound to the point where it is allowed by the context.

CheckAttributes(name, attrs)

Checks attrs against the declarations for element name.

This method will add any omitted defaults to the attribute list. Also, checking the validity of the attributes may result in values being further normalized as per the rules for collapsing spaces in tokenized values.
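The two effects mentioned above, supplying omitted defaults and collapsing spaces in tokenized values, can be sketched with plain dictionaries. The function names and the defaults argument are hypothetical; the real method works from the attribute definitions in the dtd.

```python
def normalize_tokenized(value):
    """Strip leading and trailing spaces and collapse runs of spaces."""
    return " ".join(token for token in value.split(" ") if token)


def apply_defaults(attrs, defaults):
    """Return a copy of attrs with any omitted defaults filled in.

    defaults is a hypothetical dict of attribute name -> default value.
    """
    result = dict(defaults)
    result.update(attrs)      # explicitly given values take precedence
    return result
```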

MatchXMLName(element, name)

Tests if name is a possible name for this element.

This method is used by the parser to determine if an end tag is the end tag of this element. It is provided as a separate method to allow it to be overridden by derived parsers.

CheckExpectedParticle(name)

Tests if <name> fits with the cursor and raises a validity error if not.

An empty string for name indicates the enclosing end tag was found.

The method updates the current cursor as appropriate.

GetSTagClass(name, attrs=None)

[40] STag: returns information suitable for starting element name with attributes attrs in the current context

If there is no Document instance yet this method assumes that it is being called for the root element and selects an appropriate class based on the contents of the prolog and/or name.

When using the sgmlOmittag option name may be None indicating that the method should return information about the element implied by PCDATA in the current context (only called when an attempt to add data to the current context has already failed).

The result is a triple of:

  • elementClass: the element class that this STag must introduce or None if this STag does not belong (directly or indirectly) in the current context
  • elementName: the name of the element (to pass to ChildElement) or None to use the default
  • buffFlag: True indicates an omitted tag and that the triggering STag (i.e., the STag with name name) should be buffered.

ParseSTag()

[40] STag, [44] EmptyElemTag: parses a start tag or an empty element tag.

This method returns a triple of name, attrs, emptyFlag where:

  • name is the name of the element parsed.
  • attrs is a dictionary of attribute values keyed by attribute name
  • emptyFlag is a boolean; True indicates that the tag was an empty element tag.

ParseAttribute()

[41] Attribute: parses an attribute

Returns name, value where:

  • name is the name of the attribute or None if sgmlShorttag is True and a short form attribute value was supplied.
  • value is the attribute value.

If dontCheckWellFormedness is set, the parser uses a very generous form of parsing for attribute values to accommodate common syntax errors.

ParseETag(gotLiteral=False)

[42] ETag: parses an end tag

If gotLiteral is True then the method assumes the initial ‘</’ literal has been parsed already.

The method returns the name of the end element parsed.

ParseContent()

[43] content: parses the content of an element.

The method returns:

  • True: indicates that the content was parsed normally
  • False: indicates that the content contained data or markup not allowed in this context

The second case only occurs when the sgmlOmittag option is in use and it indicates that the enclosing element has ended (i.e., the element’s ETag has been omitted). See ParseElement() for more information.

HandleData(data, cdata=False)

[43] content: handles character data in content.

When validating, the data is checked to see if it is optional white space. However, if cdata is True the data is treated as character data (even if it matches the production for S).

UnhandledData(data)

[43] content: manages unhandled data in content.

This method is only called when the sgmlOmittag option is in use. It processes data that occurs in a context where data is not allowed.

It returns a boolean result:

  • True: the data was consumed by a sub-element (with an omitted start tag)
  • False: the data has been buffered and indicates the end of the current content (an omitted end tag).

ParseEmptyElemTag()

[44] EmptyElemTag: there is no method for parsing empty element tags alone.

This method raises NotImplementedError. Instead, you should call ParseSTag() and examine the result. If the returned emptyFlag is True then an empty element tag was parsed.

ParseElementDecl(gotLiteral=False)

[45] elementdecl: parses an element declaration

If gotLiteral is True the method assumes that the ‘<!ELEMENT’ literal has already been parsed.

ParseContentSpec(eType)

[46] contentspec: parses the content specification for an element type

ParseChildren(gotLiteral=False, groupEntity=None)

[47] children: parses an element content model comprising children.

If gotLiteral is True the method assumes that the initial ‘(‘ literal has already been parsed, including any following white space.

The method returns an instance of XMLContentParticle.

ParseCP()

[48] cp: parses a content particle

ParseChoice(firstChild=None, groupEntity=None)

[49] choice: parses a choice group of content particles.

firstChild is an optional XMLContentParticle instance. If present the method assumes that the first particle and any following white space has already been parsed. If firstChild is given then groupEntity must be the entity in which the opening ‘(‘ was parsed which started the choice group.

ParseSeq(firstChild=None, groupEntity=None)

[50] seq: parses a sequence of content particles.

firstChild is an optional XMLContentParticle instance. If present the method assumes that the first particle and any following white space has already been parsed.

ParseMixed(gotLiteral=False, groupEntity=None)

[51] Mixed: parses a mixed content type.

If gotLiteral is True the method assumes that the #PCDATA literal has already been parsed. In this case, groupEntity must be set to the entity which contained the opening ‘(‘ literal.

Returns an instance of XMLChoiceList with occurrence ZeroOrMore representing the list of elements that may appear in the mixed content model. If the mixed model contains #PCDATA only then the choice list will be empty.

ParseAttlistDecl(gotLiteral=False)

[52] AttlistDecl: parses an attribute list definition.

If gotLiteral is True the method assumes that the ‘<!ATTLIST’ literal has already been parsed.

ParseAttDef(gotS=False)

[53] AttDef: parses an attribute definition.

If gotS is True the method assumes that the leading S has already been parsed.

Returns an instance of XMLAttributeDefinition.

ParseAttType(a)

[54] AttType: parses an attribute type.

a must be an XMLAttributeDefinition instance. This method sets the type and values fields of a.

Note that, to avoid unnecessary look ahead, this method does not call ParseStringType() or ParseEnumeratedType().

ParseStringType(a)

[55] StringType: parses an attribute’s string type.

This method is provided for completeness. It is not called during normal parsing operations.

a must be an XMLAttributeDefinition instance. This method sets the type and values fields of a.

ParseTokenizedType(a)

[56] TokenizedType: parses an attribute’s tokenized type.

a must be an XMLAttributeDefinition instance. This method sets the type and values fields of a.

ParseEnumeratedType(a)

[57] EnumeratedType: parses an attribute’s enumerated type.

This method is provided for completeness. It is not called during normal parsing operations.

a must be an XMLAttributeDefinition instance. This method sets the type and values fields of a.

ParseNotationType(gotLiteral=False)

[58] NotationType: parses a notation type.

If gotLiteral is True the method assumes that the leading ‘NOTATION’ literal has already been parsed.

Returns a list of strings representing the names of the declared notations being referred to.

ParseEnumeration()

[59] Enumeration: parses an enumeration.

Returns a dictionary of strings representing the tokens in the enumeration.

ParseDefaultDecl(a)

[60] DefaultDecl: parses an attribute’s default declaration.

a must be an XMLAttributeDefinition instance. This method sets the defaultValue fields of a.

ParseConditionalSect(gotLiteralEntity=None)

[61] conditionalSect: parses a conditional section.

If gotLiteralEntity is set to an XMLEntity object the method assumes that the initial literal ‘<![‘ has already been parsed from that entity.

ParseIncludeSect(gotLiteralEntity=None)

[62] includeSect: parses an included section.

If gotLiteralEntity is set to an XMLEntity object the method assumes that the production, up to and including the keyword ‘INCLUDE’ has already been parsed and that the opening ‘<![‘ literal was parsed from that entity.

ParseIgnoreSect(gotLiteralEntity=None)

[63] ignoreSect: parses an ignored section.

If gotLiteralEntity is set to an XMLEntity object the method assumes that the production, up to and including the keyword ‘IGNORE’ has already been parsed and that the opening ‘<![‘ literal was parsed from that entity.

ParseIgnoreSectContents()

[64] ignoreSectContents: parses the contents of an ignored section.

The method returns no data.

ParseIgnore()

[65] Ignore: parses a run of characters in an ignored section.

This method returns no data.

ParseCharRef(gotLiteral=False)

[66] CharRef: parses a character reference.

If gotLiteral is True the method assumes that the leading ‘&’ literal has already been parsed.

The method returns a unicode string containing the character referred to.
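Production [66] CharRef has a decimal and a hexadecimal form. The following standalone sketch decodes both from a string. The function parse_char_ref and its (character, end position) return value are assumptions for this example, and the sketch does not check that the result matches the Char production.

```python
import re

# '&#' [0-9]+ ';'  or  '&#x' [0-9a-fA-F]+ ';'
_CHAR_REF = re.compile(r'&#(?:x([0-9a-fA-F]+)|([0-9]+));')


def parse_char_ref(text, pos=0):
    """Decode a character reference starting at text[pos].

    Returns the character referred to and the index just past the ';'.
    """
    match = _CHAR_REF.match(text, pos)
    if match is None:
        raise ValueError("expected a character reference at %d" % pos)
    hex_digits, dec_digits = match.groups()
    if hex_digits is not None:
        code = int(hex_digits, 16)
    else:
        code = int(dec_digits)
    return chr(code), match.end()
```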

ParseReference()

[67] Reference: parses a reference.

This method returns any data parsed as a result of the reference. For a character reference this will be the character referred to. For a general entity the data returned will depend on the parsing context. For more information see ParseEntityRef().

ParseEntityRef(gotLiteral=False)

[68] EntityRef: parses a general entity reference.

If gotLiteral is True the method assumes that the leading ‘&’ literal has already been parsed.

This method returns any data parsed as a result of the reference. For example, if this method is called in a context where entity references are bypassed then the string returned will be the literal characters parsed, e.g., “&ref;”.

If the entity reference is parsed successfully in a context where Entity references are recognized, the reference is looked up according to the rules for validating and non-validating parsers and, if required by the parsing mode, the entity is opened and pushed onto the parser so that parsing continues with the first character of the entity’s replacement text.

A special case is made for the predefined entities. When parsed in a context where entity references are recognized these entities are expanded immediately and the resulting character returned. For example, the entity &amp; returns the ‘&’ character instead of pushing an entity with replacement text ‘&#38;’.

Inclusion of an unescaped & is common so when we are not checking well-formedness we treat ‘&’ not followed by a name as if it were ‘&amp;’. Similarly we are generous about the missing ‘;’.

LookupPredefinedEntity(name)

Utility function used to look up pre-defined entities, e.g., “lt”

This method can be overridden by variant parsers to implement other pre-defined entity tables.

ParsePEReference(gotLiteral=False)

[69] PEReference: parses a parameter entity reference.

If gotLiteral is True the method assumes that the initial ‘%’ literal has already been parsed.

This method returns any data parsed as a result of the reference. Normally this will be an empty string because the method is typically called in contexts where PEReferences are recognized. However, if this method is called in a context where PEReferences are not recognized the returned string will be the literal characters parsed, e.g., “%ref;”.

If the parameter entity reference is parsed successfully in a context where PEReferences are recognized, the reference is looked up according to the rules for validating and non-validating parsers and, if required by the parsing mode, the entity is opened and pushed onto the parser so that parsing continues with the first character of the entity’s replacement text.

ParseEntityDecl(gotLiteral=False)

[70] EntityDecl: parses an entity declaration.

Returns an instance of either XMLGeneralEntity or XMLParameterEntity depending on the type of entity parsed. If gotLiteral is True the method assumes that the leading ‘<!ENTITY’ literal has already been parsed.

ParseGEDecl(gotLiteral=False)

[71] GEDecl: parses a general entity declaration.

Returns an instance of XMLGeneralEntity. If gotLiteral is True the method assumes that the leading ‘<!ENTITY’ literal and the required S have already been parsed.

ParsePEDecl(gotLiteral=False)

[72] PEDecl: parses a parameter entity declaration.

Returns an instance of XMLParameterEntity. If gotLiteral is True the method assumes that the leading ‘<!ENTITY’ literal and the required S have already been parsed.

ParseEntityDef(ge)

[73] EntityDef: parses the definition of a general entity.

The general entity being parsed must be passed in ge. This method sets the definition and notation fields from the parsed entity definition.

ParsePEDef(pe)

[74] PEDef: parses a parameter entity definition.

The parameter entity being parsed must be passed in pe. This method sets the definition field from the parsed parameter entity definition.

ParseExternalID(allowPublicOnly=False)

[75] ExternalID: parses an external ID returning an XMLExternalID instance.

An external ID must have a SYSTEM literal, and may have a PUBLIC identifier. If allowPublicOnly is True then the method will also allow an external identifier with a PUBLIC identifier but no SYSTEM literal. In this mode the parser behaves as it would when parsing the production:

(ExternalID | PublicID) S?

ResolveExternalID(externalID, entity=None)

[75] ExternalID: resolves an external ID, returning a URI reference.

Returns an instance of pyslet.rfc2396.URI or None if the external ID cannot be resolved.

entity can be used to force the resolution of relative URI to be relative to the base of the given entity. If it is None then the currently open external entity (where available) is used instead.

The default implementation simply calls GetLocation() with the entity’s base URL and ignores the public ID. Derived parsers may recognize public identifiers and resolve accordingly.
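The idea of resolving a system identifier relative to an entity's base can be sketched with the standard library. This example uses urllib.parse.urljoin in place of pyslet.rfc2396.URI, and the function resolve_system_id is hypothetical; it ignores the public identifier just as the default implementation is described as doing.

```python
from urllib.parse import urljoin


def resolve_system_id(system_id, base_uri=None):
    """Resolve system_id against base_uri, returning a URI string.

    Returns None if there is no system identifier to resolve.
    """
    if system_id is None:
        return None
    if base_uri is None:
        return system_id      # nothing to resolve against
    return urljoin(base_uri, system_id)
```

An absolute system identifier is returned unchanged by urljoin, so only relative identifiers are affected by the base.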

ParseNDataDecl(gotLiteral=False)

[76] NDataDecl: parses an unparsed entity notation reference.

Returns the name of the notation used by the unparsed entity as a string without the preceding ‘NDATA’ literal.

ParseTextDecl(gotLiteral=False)

[77] TextDecl: parses a text declaration.

Returns an XMLTextDeclaration instance.

ParseEncodingDecl(gotLiteral=False)

[80] EncodingDecl: parses an encoding declaration

Returns the declaration name without the enclosing quotes. If gotLiteral is True then the method assumes that the literal ‘encoding’ has already been parsed.

ParseEncName()

[81] EncName: parses an encoding declaration name

Returns the encoding name as a string or None if no valid encoding name start character was found.
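Production [81] EncName requires a Latin letter followed by any number of letters, digits, periods, underscores or hyphens. A minimal standalone sketch of that rule follows. The function parse_enc_name is hypothetical and works on a string rather than on the parser's character stream.

```python
import re

# [A-Za-z] ([A-Za-z0-9._] | '-')*
_ENC_NAME = re.compile(r'[A-Za-z][A-Za-z0-9._-]*')


def parse_enc_name(text, pos=0):
    """Return the encoding name starting at text[pos], or None."""
    match = _ENC_NAME.match(text, pos)
    return match.group(0) if match else None
```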

ParseNotationDecl(gotLiteral=False)

[82] NotationDecl: Parses a notation declaration matching production NotationDecl

This method assumes that the literal ‘<!NOTATION’ has already been parsed. It declares the notation in the dtd.

ParsePublicID()

[83] PublicID: Parses a literal matching the production for PublicID.

The literal string is returned without the PUBLIC prefix or the enclosing quotes.