Using the Pull Parser

This API is experimental and subject to change.

  parser = PullParser.new( "<a>text<b att='val'/>txet</a>" )
  while parser.has_next?
    res = parser.pull
    puts res[1]['att'] if res.start_element? and res[0] == 'b'
  end

See the PullEvent class for information on the content of the results. The data is identical to the arguments passed for the various events to the StreamListener API.
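To get a feel for what each event carries, the following sketch records the type and first argument of every event for the sample document. It assumes the parser is loadable as REXML::Parsers::PullParser through require 'rexml/parsers/pullparser', and that PullParser#pull and PullEvent#event_type are available in the installed version; collect_events is a hypothetical helper name, not part of the library.

```ruby
require 'rexml/parsers/pullparser'

# Hypothetical helper: walks a document and records
# [event_type, first_argument] for each event it sees.
# Assumes PullParser#pull and PullEvent#event_type exist.
def collect_events(xml)
  parser = REXML::Parsers::PullParser.new(xml)
  events = []
  while parser.has_next?
    event = parser.pull
    events << [event.event_type, event[0]]
  end
  events
end

collect_events("<a>text<b att='val'/>txet</a>").each do |type, arg|
  puts "#{type}: #{arg.inspect}"
end
```

Note that the empty element b is expected to show up as a start event followed by an end event, mirroring what a StreamListener would receive.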

Notice that:

  parser = PullParser.new( "<a>BAD DOCUMENT" )
  while parser.has_next?
    res = parser.pull
    raise res[1] if res.error?
  end

Nat Price gave me some good ideas for the API.

Constants

LETTER = '[[:alpha:]]'

DIGIT = '[[:digit:]]'

COMBININGCHAR = ''

EXTENDER = ''

NCNAME_STR = "[#{LETTER}_:][-#{LETTER}#{DIGIT}._:#{COMBININGCHAR}#{EXTENDER}]*"

NAME_STR = "(?:(#{NCNAME_STR}):)?(#{NCNAME_STR})"

UNAME_STR = "(?:#{NCNAME_STR}:)?#{NCNAME_STR}"

NAMECHAR = '[\-\w\d\.:]'

NAME = "([\\w:]#{NAMECHAR}*)"

NMTOKEN = "(?:#{NAMECHAR})+"

NMTOKENS = "#{NMTOKEN}(\\s+#{NMTOKEN})*"

REFERENCE = "&(?:#{NAME};|#\\d+;|#x[0-9a-fA-F]+;)"

REFERENCE_RE = /#{REFERENCE}/
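REFERENCE_RE can be tried out on its own. The sketch below copies the relevant constants from this listing so that it runs without loading the library; find_references is a hypothetical helper. Because NAME contains a capture group, the code reads the whole match instead of relying on the values that scan would return.

```ruby
# Local copies of the constants listed above, so the sketch is self-contained.
NAMECHAR     = '[\-\w\d\.:]'
NAME         = "([\\w:]#{NAMECHAR}*)"
REFERENCE    = "&(?:#{NAME};|#\\d+;|#x[0-9a-fA-F]+;)"
REFERENCE_RE = /#{REFERENCE}/

# Hypothetical helper: returns every entity or character reference in text.
def find_references(text)
  refs = []
  # scan would yield only the NAME capture, so take the full match instead.
  text.scan(REFERENCE_RE) { refs << Regexp.last_match(0) }
  refs
end

p find_references("a &lt; b &#38; c &#x26; d & e")
# => ["&lt;", "&#38;", "&#x26;"]
```

The bare ampersand before "e" is not reported, since it is not followed by a name or a numeric code and a semicolon.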

DOCTYPE_START = /\A\s*

DOCTYPE_PATTERN = /\s*)/um

ATTRIBUTE_PATTERN = /\s*(#{NAME_STR})\s*=\s*(["'])(.*?)\4/um
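As an illustration of how ATTRIBUTE_PATTERN breaks an attribute string apart, the sketch below copies the constants it depends on and collects name/value pairs into a Hash. parse_attributes is a hypothetical helper. The group order used here (full name, prefix, local name, quote, value) is read off the pattern as listed: NAME_STR contributes two inner groups, which is why the closing quote is matched with \4.

```ruby
# Local copies of the constants listed above.
LETTER        = '[[:alpha:]]'
DIGIT         = '[[:digit:]]'
COMBININGCHAR = ''
EXTENDER      = ''
NCNAME_STR    = "[#{LETTER}_:][-#{LETTER}#{DIGIT}._:#{COMBININGCHAR}#{EXTENDER}]*"
NAME_STR      = "(?:(#{NCNAME_STR}):)?(#{NCNAME_STR})"
ATTRIBUTE_PATTERN = /\s*(#{NAME_STR})\s*=\s*(["'])(.*?)\4/um

# Hypothetical helper: turns the raw attribute part of a tag into a Hash.
def parse_attributes(raw)
  attrs = {}
  raw.scan(ATTRIBUTE_PATTERN) do |name, _prefix, _local, _quote, value|
    attrs[name] = value
  end
  attrs
end

p parse_attributes(%q{ att='val' xml:lang="en"})
```

Because the closing quote is a backreference to the opening one, a value such as "it's" in double quotes is kept intact.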

COMMENT_START = /\A/um

CDATA_START = /\A

CDATA_END = /^\s*\]\s*>/um

CDATA_PATTERN = //um

XMLDECL_START = /\A<\?xml\s/u;

XMLDECL_PATTERN = /<\?xml\s+(.*?)\?>/um

INSTRUCTION_START = /\A<\?/u

INSTRUCTION_PATTERN = /<\?(.*?)(\s+.*?)?\?>/um

TAG_MATCH = /^<((?>#{NAME_STR}))\s*((?>\s+#{UNAME_STR}\s*=\s*(["']).*?\5)*)\s*(\/)?>/um

CLOSE_MATCH = /^\s*<\/(#{NAME_STR})\s*>/um
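A small sketch of how TAG_MATCH and CLOSE_MATCH could be used to tell start tags, empty elements and end tags apart. The constants are copied from this listing; classify_tag and the symbols it returns are hypothetical names chosen for the example. The group numbers follow from the pattern as listed: group 1 is the tag name, group 4 the raw attribute text, and group 6 the optional trailing slash.

```ruby
# Local copies of the constants listed above.
LETTER        = '[[:alpha:]]'
DIGIT         = '[[:digit:]]'
COMBININGCHAR = ''
EXTENDER      = ''
NCNAME_STR    = "[#{LETTER}_:][-#{LETTER}#{DIGIT}._:#{COMBININGCHAR}#{EXTENDER}]*"
NAME_STR      = "(?:(#{NCNAME_STR}):)?(#{NCNAME_STR})"
UNAME_STR     = "(?:#{NCNAME_STR}:)?#{NCNAME_STR}"
TAG_MATCH     = /^<((?>#{NAME_STR}))\s*((?>\s+#{UNAME_STR}\s*=\s*(["']).*?\5)*)\s*(\/)?>/um
CLOSE_MATCH   = /^\s*<\/(#{NAME_STR})\s*>/um

# Hypothetical helper: returns [kind, name, raw_attributes] for one tag.
def classify_tag(markup)
  if (md = TAG_MATCH.match(markup))
    kind = md[6] ? :empty_element : :start_tag
    [kind, md[1], md[4]]
  elsif (md = CLOSE_MATCH.match(markup))
    [:end_tag, md[1], ""]
  else
    [:unknown, nil, ""]
  end
end

p classify_tag("<b att='val'/>")
p classify_tag("<a>")
p classify_tag("</a>")
```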

VERSION = /\bversion\s*=\s*["'](.*?)['"]/um

ENCODING = /\bencoding\s*=\s*["'](.*?)['"]/um

STANDALONE = /\bstandalone\s*=\s["'](.*?)['"]/um
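The declaration patterns can be combined to pick apart an XML declaration. The sketch below copies XMLDECL_PATTERN, VERSION, ENCODING and STANDALONE from this listing; parse_xml_decl is a hypothetical helper. One detail worth noticing: as listed, STANDALONE has \s rather than \s* after the equals sign, so it only matches when exactly one whitespace character follows the "=".

```ruby
# Local copies of the constants listed above.
XMLDECL_PATTERN = /<\?xml\s+(.*?)\?>/um
VERSION         = /\bversion\s*=\s*["'](.*?)['"]/um
ENCODING        = /\bencoding\s*=\s*["'](.*?)['"]/um
STANDALONE      = /\bstandalone\s*=\s["'](.*?)['"]/um

# Hypothetical helper: extracts the pseudo-attributes of an XML declaration.
# Returns nil when the text contains no declaration.
def parse_xml_decl(text)
  body = text[XMLDECL_PATTERN, 1]
  return nil unless body
  {
    version:    body[VERSION, 1],
    encoding:   body[ENCODING, 1],
    standalone: body[STANDALONE, 1]
  }
end

p parse_xml_decl(%q{<?xml version="1.0" encoding='UTF-8' standalone= "yes"?>})
p parse_xml_decl(%q{<?xml version="1.0" standalone="yes"?>})
```

In the second call the standalone value comes back as nil, because no whitespace follows the "=".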

ENTITY_START = /^\s*

IDENTITY = /^([!\*\w\-]+)(\s+#{NCNAME_STR})?(\s+["'](.*?)['"])?(\s+['"](.*?)["'])?/u

ELEMENTDECL_START = /^\s*

ELEMENTDECL_PATTERN = /^\s*(/um

SYSTEMENTITY = /^\s*(%.*?;)\s*$/um

ENUMERATION = "\\(\\s*#{NMTOKEN}(?:\\s*\\|\\s*#{NMTOKEN})*\\s*\\)"

NOTATIONTYPE = "NOTATION\\s+\\(\\s*#{NAME}(?:\\s*\\|\\s*#{NAME})*\\s*\\)"

ENUMERATEDTYPE = "(?:(?:#{NOTATIONTYPE})|(?:#{ENUMERATION}))"

ATTTYPE = "(CDATA|ID|IDREF|IDREFS|ENTITY|ENTITIES|NMTOKEN|NMTOKENS|#{ENUMERATEDTYPE})"

ATTVALUE = "(?:\"((?:[^<&\"]|#{REFERENCE})*)\")|(?:'((?:[^<&']|#{REFERENCE})*)')"

DEFAULTDECL = "(#REQUIRED|#IMPLIED|(?:(#FIXED\\s+)?#{ATTVALUE}))"

ATTDEF = "\\s+#{NAME}\\s+#{ATTTYPE}\\s+#{DEFAULTDECL}"

ATTDEF_RE = /#{ATTDEF}/
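To show what ATTDEF_RE picks out of an attribute-list declaration, the sketch below rebuilds it from copies of the constants in this listing and returns the name, type and default of each attribute definition. parse_attdefs is a hypothetical helper. It relies only on groups 1, 2 and 5 of the pattern (name, type, default declaration); the remaining groups come from nested constants and are ignored here.

```ruby
# Local copies of the constants listed above.
NAMECHAR       = '[\-\w\d\.:]'
NAME           = "([\\w:]#{NAMECHAR}*)"
NMTOKEN        = "(?:#{NAMECHAR})+"
REFERENCE      = "&(?:#{NAME};|#\\d+;|#x[0-9a-fA-F]+;)"
ENUMERATION    = "\\(\\s*#{NMTOKEN}(?:\\s*\\|\\s*#{NMTOKEN})*\\s*\\)"
NOTATIONTYPE   = "NOTATION\\s+\\(\\s*#{NAME}(?:\\s*\\|\\s*#{NAME})*\\s*\\)"
ENUMERATEDTYPE = "(?:(?:#{NOTATIONTYPE})|(?:#{ENUMERATION}))"
ATTTYPE        = "(CDATA|ID|IDREF|IDREFS|ENTITY|ENTITIES|NMTOKEN|NMTOKENS|#{ENUMERATEDTYPE})"
ATTVALUE       = "(?:\"((?:[^<&\"]|#{REFERENCE})*)\")|(?:'((?:[^<&']|#{REFERENCE})*)')"
DEFAULTDECL    = "(#REQUIRED|#IMPLIED|(?:(#FIXED\\s+)?#{ATTVALUE}))"
ATTDEF         = "\\s+#{NAME}\\s+#{ATTTYPE}\\s+#{DEFAULTDECL}"
ATTDEF_RE      = /#{ATTDEF}/

# Hypothetical helper: returns [name, type, default] for each definition.
def parse_attdefs(decl)
  decl.scan(ATTDEF_RE).map { |groups| groups.values_at(0, 1, 4) }
end

p parse_attdefs(%q{<!ATTLIST img src CDATA #REQUIRED align (left|right) "left">})
```

The element name "img" is skipped, because the text after it ("src") is not a valid attribute type.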

ATTLISTDECL_START = /^\s*

ATTLISTDECL_PATTERN = /^\s*/um

NOTATIONDECL_START = /^\s*

PUBLIC = /^\s*/um

SYSTEM = /^\s*/um

TEXT_PATTERN = /\A([^<]*)/um

PUBIDCHAR = "\x20\x0D\x0Aa-zA-Z0-9\\-()+,./:=?;!*@$_%#"

SYSTEMLITERAL = %Q{((?:"[^"]*")|(?:'[^']*'))}

PUBIDLITERAL = %Q{("[#{PUBIDCHAR}']*"|'[#{PUBIDCHAR}]*')}

EXTERNALID = "(?:(?:(SYSTEM)\\s+#{SYSTEMLITERAL})|(?:(PUBLIC)\\s+#{PUBIDLITERAL}\\s+#{SYSTEMLITERAL}))"
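EXTERNALID can be exercised the same way. The sketch below copies the constants it is built from and anchors the pattern with \A and \z, which is an assumption made for the example and not something the listing does. parse_external_id and unquote are hypothetical helpers. The captured literals include their surrounding quotes, so the helper strips them.

```ruby
# Local copies of the constants listed above.
PUBIDCHAR     = "\x20\x0D\x0Aa-zA-Z0-9\\-()+,./:=?;!*@$_%#"
SYSTEMLITERAL = %Q{((?:"[^"]*")|(?:'[^']*'))}
PUBIDLITERAL  = %Q{("[#{PUBIDCHAR}']*"|'[#{PUBIDCHAR}]*')}
EXTERNALID    = "(?:(?:(SYSTEM)\\s+#{SYSTEMLITERAL})|(?:(PUBLIC)\\s+#{PUBIDLITERAL}\\s+#{SYSTEMLITERAL}))"

# Anchored variant, assumed here so that only a complete identifier matches.
EXTERNALID_RE = /\A#{EXTERNALID}\z/

# Hypothetical helper: drops the first and last character (the quotes).
def unquote(literal)
  literal[1..-2]
end

# Hypothetical helper: returns a Hash describing the external identifier,
# or nil when the text is not a SYSTEM or PUBLIC identifier.
def parse_external_id(text)
  md = EXTERNALID_RE.match(text)
  return nil unless md
  if md[1]
    { kind: md[1], system: unquote(md[2]) }
  else
    { kind: md[3], public: unquote(md[4]), system: unquote(md[5]) }
  end
end

p parse_external_id(%q{SYSTEM "foo.dtd"})
p parse_external_id(%q{PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"})
```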

NDATADECL = "\\s+NDATA\\s+#{NAME}"

PEREFERENCE = "%#{NAME};"

ENTITYVALUE = %Q{((?:"(?:[^%&"]|#{PEREFERENCE}|#{REFERENCE})*")|(?:'([^%&']|#{PEREFERENCE}|#{REFERENCE})*'))}

PEDEF = "(?:#{ENTITYVALUE}|#{EXTERNALID})"

ENTITYDEF = "(?:#{ENTITYVALUE}|(?:#{EXTERNALID}(#{NDATADECL})?))"

PEDECL = ""

GEDECL = ""

ENTITYDECL = /\s*(?:#{GEDECL})|(?:#{PEDECL})/um

EREFERENCE = /&(?!#{NAME};)/

DEFAULT_ENTITIES = { 'gt' => [/&gt;/, '&gt;', '>', />/], 'lt' => [/&lt;/, '&lt;', '<', /</], 'quot' => [/&quot;/, '&quot;', '"', /"/], "apos" => [/&apos;/, "&apos;", "'", /'/] }

MISSING_ATTRIBUTE_QUOTES = /^<#{NAME_STR}\s+#{NAME_STR}\s*=\s*[^"']/um

Attributes

[R] source