yari-xml-parser

Fault-tolerant XML parser producing a typed, fully-located AST. Handles simple and complex tags, self-closing tags, comments, CDATA sections, DOCTYPE declarations, the XML prolog, and namespaces. When the parser encounters malformed input, it emits an error node and resumes; it never throws.

Installation

// Gradle (Groovy DSL)
implementation 'com.easyparsingapi:yari-xml-parser:VERSION'

// Maven
<dependency>
    <groupId>com.easyparsingapi</groupId>
    <artifactId>yari-xml-parser</artifactId>
    <version>VERSION</version>
</dependency>

XmlParser — Entry Points

All methods are static. XmlParser is not instantiable.

class XmlParser (static)
Method                             Returns          Description
parseUnit(String, XmlConfig)       AstResult<Xml>   Parse a raw XML string; returns the AST plus the full token list
parseUnit(List<Token>, XmlConfig)  AstResult<Xml>   Parse a pre-tokenised list; returns the AST plus the full token list

XmlConfig — Parser Configuration

class XmlConfig with builder
Method                      Returns    Description
static builder()            Builder    Start a fluent configuration
acceptUnclosedTag(boolean)  Builder    When true, accept tags that are not explicitly self-closed; when false, they produce an XmlError node
tagAsPlainText(String)      Builder    Tokenise the named tag's content as raw text instead of XML markup (also accepts a Markup or a Function<TagEntity, Boolean> predicate)
build()                     XmlConfig  Materialise the configuration
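
To illustrate what tokenising a tag's content as raw text means, here is a minimal, self-contained sketch. It is not the library's tokenizer: RawTextSketch, tokenize, and the token strings are hypothetical names chosen for this example, and the sketch only recognises an opening tag written without attributes.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: not yari-xml-parser code. All names are hypothetical.
final class RawTextSketch {

    /**
     * Splits input into coarse tokens. The content of <rawTag>...</rawTag>
     * is kept as one opaque TEXT token instead of being scanned for markup.
     * Assumption: the raw tag's opening form has no attributes.
     */
    static List<String> tokenize(String input, String rawTag) {
        List<String> tokens = new ArrayList<>();
        String open = "<" + rawTag + ">";
        String close = "</" + rawTag + ">";
        int i = 0;
        while (i < input.length()) {
            if (input.startsWith(open, i)) {
                // Raw mode: everything up to the matching close tag is text.
                int bodyStart = i + open.length();
                int end = input.indexOf(close, bodyStart);
                boolean terminated = end >= 0;
                if (!terminated) end = input.length();
                tokens.add("OPEN:" + rawTag);
                tokens.add("TEXT:" + input.substring(bodyStart, end));
                if (terminated) tokens.add("CLOSE:" + rawTag);
                i = terminated ? end + close.length() : end;
            } else if (input.charAt(i) == '<') {
                // Markup mode: a '<' starts markup that runs to the next '>'.
                int gt = input.indexOf('>', i);
                if (gt < 0) gt = input.length() - 1;
                tokens.add("MARKUP:" + input.substring(i, gt + 1));
                i = gt + 1;
            } else {
                // Plain text up to the next '<' or the end of input.
                int lt = input.indexOf('<', i);
                if (lt < 0) lt = input.length();
                tokens.add("TEXT:" + input.substring(i, lt));
                i = lt;
            }
        }
        return tokens;
    }
}
```

With the input <r><script>a < b</script></r> and "script" as the raw tag, the sketch yields a single TEXT token "a < b" for the body. Without raw mode, the bare < inside the body is mistaken for the start of markup, which is the situation tagAsPlainText is meant to avoid.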

Xml — Root AST Node

Implements AstUnit. Root of a parsed XML document.

class Xml implements AstUnit
Method                             Returns           Description
getNodes()                         List<XmlNode>     Top-level nodes (prolog, DOCTYPE, root element, comments…)
astComments()                      List<AstComment>
astCommentsOf(AstNode, Position…)  List<AstComment>

AST Nodes

Structural nodes implement XmlNode and carry a source location (XmlError extends AstError).

Tag structure

Tag structure (package com.easyparsingapi.yari.parser.xml.ast)
Node           Description
Xml            Root: complete XML document
Tag            Complete tag: <foo>…</foo>
TagHead        Tag opening: <foo attr="val">
TagBody        Tag body: content between <foo> and </foo>
TagFoot        Tag closing: </foo>
TagEmpty       Self-closing tag: <br/>, <img src="…"/>
TagSimple      Interface: a tag with a name and attributes (Tag, TagEmpty…)
TagComplex<B>  Interface: a tag with head, body and foot (extends TagSimple)
TagAttribute   Attribute: name="value"
TagName        Tag name, with optional namespace (extends Markup)

Other nodes

Other nodes (package com.easyparsingapi.yari.parser.xml.ast)
Node           Description
Text           Text content between tags
XmlComment     Comment: <!-- … -->
CData          CDATA section: <![CDATA[…]]>
Prolog         Prolog: <?xml version="1.0"?>
DocType        Declaration: <!DOCTYPE …>
Markup         Base class for tag/attribute identifiers
XmlIdentifier  Interface: a qualified XML identifier (with namespace)
XmlError       Error node (fault-tolerant parsing)

Error Recovery

The parser continues past malformed input by emitting an XmlError node and resuming. Recovery is always active; XmlConfig only tunes how lenient the parser is, for example whether unclosed tags are accepted:

Xml xml = XmlParser.parseUnit("""
    <root>
        <item>ok</item>
        <broken
        <item>still parsed</item>
    </root>
    """, XmlConfig.builder()
        .acceptUnclosedTag(true)
        .build()).unit();

xml.astStream()
   .filter(n -> n instanceof XmlError)
   .forEach(n -> System.out.println("error at " + n.getSourceLocation()));

The parser never throws. Every recoverable error produces an XmlError node with its source location and a human-readable failure message.
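
The emit-and-resume idea can be sketched independently of the library. The toy scanner below is an illustration only, not the yari-xml-parser implementation: RecoverySketch, Node, scan, and the kind strings are hypothetical names, and the recovery rule (a '<' with no '>' before the next '<' becomes an ERROR node, and scanning resumes at that next '<') is an assumption chosen for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a toy scanner, not the yari-xml-parser implementation.
final class RecoverySketch {

    /** A node with its kind, its source text, and its start offset in the input. */
    record Node(String kind, String text, int offset) {}

    /** Scans input into TAG, TEXT and ERROR nodes; never throws on malformed input. */
    static List<Node> scan(String input) {
        List<Node> nodes = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            // Position of the next '<' after the current character, or end of input.
            int nextLt = input.indexOf('<', i + 1);
            if (nextLt < 0) nextLt = input.length();

            if (input.charAt(i) != '<') {
                nodes.add(new Node("TEXT", input.substring(i, nextLt), i));
                i = nextLt;
                continue;
            }
            int gt = input.indexOf('>', i);
            if (gt >= 0 && gt < nextLt) {
                // Well-formed: the tag closes before any other '<' appears.
                nodes.add(new Node("TAG", input.substring(i, gt + 1), i));
                i = gt + 1;
            } else {
                // Malformed: record the broken span with its offset, then resume.
                nodes.add(new Node("ERROR", input.substring(i, nextLt), i));
                i = nextLt;
            }
        }
        return nodes;
    }
}
```

Calling RecoverySketch.scan("<a>ok</a><broken <b>x</b>") returns seven nodes, of which exactly one is an ERROR node (the text "<broken " at offset 9); the <b>x</b> that follows is still scanned normally. Keeping errors in the node list, rather than throwing, is what lets a caller filter them afterwards in the same way as the XmlError filter above.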

For the full code-level reference, see the README on GitHub and the Javadoc-annotated source under yari-xml-parser/src/main/java/.