yari-xml-parser
Fault-tolerant XML parser producing a typed, fully-located AST. Handles simple and complex tags, self-closing tags, comments, CDATA sections, DOCTYPE declarations, XML prolog, and namespaces. When the parser encounters malformed input, it emits an error node and resumes; it never throws.
Installation
Gradle (Groovy DSL):

```groovy
implementation 'com.easyparsingapi:yari-xml-parser:VERSION'
```

Maven:

```xml
<dependency>
    <groupId>com.easyparsingapi</groupId>
    <artifactId>yari-xml-parser</artifactId>
    <version>VERSION</version>
</dependency>
```
XmlParser — Entry Points
All methods are static. XmlParser is not instantiable.
| Method | Returns | Description |
|---|---|---|
| parseUnit(String, XmlConfig) | AstResult<Xml> | Parse a raw XML string; returns the AST plus the full token list |
| parseUnit(List<Token>, XmlConfig) | AstResult<Xml> | Parse a pre-tokenised list; returns the AST plus the full token list |
XmlConfig — Parser Configuration
| Method | Returns | Description |
|---|---|---|
| static builder() | Builder | Start a fluent configuration |
| acceptUnclosedTag(boolean) | Builder | When true, accept tags that are not explicitly self-closed; when false, they produce an XmlError node |
| tagAsPlainText(String) | Builder | Tokenise the named tag's content as raw text instead of XML markup (also accepts a Markup or a Function<TagEntity, Boolean> predicate) |
| build() | XmlConfig | Materialise the configuration |
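To make the effect of tagAsPlainText more concrete, here is a toy scanner that is independent of yari-xml-parser. Every name in it (PlainTextTagDemo, tokenize, openingTagName) is hypothetical, and the library's real tokeniser is certainly more involved. It only shows the idea: once a tag matched by the predicate is opened, everything up to its end tag is kept as one raw text token, so characters such as < inside it are not read as markup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Toy illustration only; none of these names belong to yari-xml-parser. */
final class PlainTextTagDemo {

    /** Splits input into coarse tokens: tags ("<...>") and text runs. */
    static List<String> tokenize(String input, Predicate<String> plainTextTag) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            if (input.charAt(i) == '<') {
                int close = input.indexOf('>', i);
                if (close < 0) {
                    // No closing '>': keep the remainder as a single token.
                    tokens.add(input.substring(i));
                    break;
                }
                String tag = input.substring(i, close + 1);
                tokens.add(tag);
                i = close + 1;
                String name = openingTagName(tag);
                if (name != null && plainTextTag.test(name)) {
                    // Raw mode: everything up to the matching end tag is one text token.
                    int end = input.indexOf("</" + name + ">", i);
                    if (end < 0) end = input.length();
                    if (end > i) tokens.add(input.substring(i, end));
                    i = end;
                }
            } else {
                int next = input.indexOf('<', i);
                if (next < 0) next = input.length();
                tokens.add(input.substring(i, next));
                i = next;
            }
        }
        return tokens;
    }

    /** Returns the name of an opening tag such as "<script type='x'>", or null otherwise. */
    private static String openingTagName(String tag) {
        if (tag.startsWith("</") || tag.startsWith("<!") || tag.startsWith("<?") || tag.endsWith("/>")) {
            return null;
        }
        int end = 1;
        while (end < tag.length() && !Character.isWhitespace(tag.charAt(end)) && tag.charAt(end) != '>') {
            end++;
        }
        return end > 1 ? tag.substring(1, end) : null;
    }
}
```

Calling `PlainTextTagDemo.tokenize(src, "script"::equals)` keeps the body of a script tag in one piece, whereas passing `name -> false` lets a stray < inside the body split it.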
Xml — Root AST Node
Implements AstUnit. Root of a parsed XML document.
| Method | Returns | Description |
|---|---|---|
| getNodes() | List<XmlNode> | Top-level nodes (prolog, DOCTYPE, root element, comments…) |
| astComments() | List<AstComment> | |
| astCommentsOf(AstNode, Position…) | List<AstComment> | |
AST Nodes
Structural nodes implement XmlNode and carry a source location (XmlError extends AstError).
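As a rough illustration of what a source location can carry, the sketch below converts a character offset into a line/column pair. The Location record and its of method are hypothetical stand-ins written for this example; they are not the library's actual location type, whose fields may differ.

```java
/** Hypothetical stand-in for a source location; not the library's actual type. */
record Location(int line, int column) {

    /** Converts a 0-based character offset into a 1-based line/column pair. */
    static Location of(String source, int offset) {
        if (offset < 0 || offset > source.length()) {
            throw new IndexOutOfBoundsException("offset " + offset);
        }
        int line = 1;
        int column = 1;
        for (int i = 0; i < offset; i++) {
            if (source.charAt(i) == '\n') {
                // A newline starts the next line and resets the column.
                line++;
                column = 1;
            } else {
                column++;
            }
        }
        return new Location(line, column);
    }
}
```

For instance, `Location.of("<a>\n  <b/>", 6)` points at the `<` of the nested tag on the second line.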
Tag structure
| Node | Description |
|---|---|
| Xml | Root: complete XML document |
| Tag | Complete tag: <foo>…</foo> |
| TagHead | Tag opening: <foo attr="val"> |
| TagBody | Tag body: content between <foo> and </foo> |
| TagFoot | Tag closing: </foo> |
| TagEmpty | Self-closing tag: <br/>, <img src="…"/> |
| TagSimple | Interface: a tag with a name and attributes (Tag, TagEmpty…) |
| TagComplex<B> | Interface: a tag with head, body and foot (extends TagSimple) |
| TagAttribute | Attribute: name="value" |
| TagName | Tag name, with optional namespace (extends Markup) |
Other nodes
| Node | Description |
|---|---|
| Text | Text content between tags |
| XmlComment | Comment: <!-- … --> |
| CData | CDATA section: <![CDATA[…]]> |
| Prolog | Prolog: <?xml version="1.0"?> |
| DocType | Declaration: <!DOCTYPE …> |
| Markup | Base class for tag/attribute identifiers |
| XmlIdentifier | Interface: a qualified XML identifier (with namespace) |
| XmlError | Error node (fault-tolerant parsing) |
Error Recovery
The parser continues past malformed input by emitting an XmlError node and resuming. Leniency is configured through XmlConfig, for example with acceptUnclosedTag(true):
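```java
// Parse a document that contains a malformed tag ("<broken" is never closed).
// unit() extracts the Xml root from the returned AstResult.
Xml xml = XmlParser.parseUnit("""
        <root>
          <item>ok</item>
          <broken
          <item>still parsed</item>
        </root>
        """, XmlConfig.builder()
        .acceptUnclosedTag(true)
        .build()).unit();

// Walk the AST and report the location of every error node.
xml.astStream()
        .filter(n -> n instanceof XmlError)
        .forEach(n -> System.out.println("error at " + n.getSourceLocation()));
```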
The parser never throws. Every recoverable error produces an XmlError node with its source location and a human-readable failure message.
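The emit-and-resume strategy can be sketched with a toy scanner. This is not the library's implementation: RecoveryDemo, Node and scan are hypothetical names, and the recovery rule used here (a tag that meets another < before its own > is malformed, and scanning resumes at that <) is an assumption chosen for brevity.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy illustration of "emit an error node and resume"; not the library's implementation. */
final class RecoveryDemo {

    /** A scanned piece of input: kind is "tag", "text" or "error". */
    record Node(String kind, String source, int offset) {}

    static List<Node> scan(String input) {
        List<Node> nodes = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            if (input.charAt(i) != '<') {
                // Plain text runs until the next '<' or the end of input.
                int next = input.indexOf('<', i);
                if (next < 0) next = input.length();
                nodes.add(new Node("text", input.substring(i, next), i));
                i = next;
                continue;
            }
            int close = input.indexOf('>', i + 1);
            int nextOpen = input.indexOf('<', i + 1);
            boolean malformed = close < 0 || (nextOpen >= 0 && nextOpen < close);
            if (malformed) {
                // Recovery: record the bad span as an error node, then resume
                // at the next '<' (or at end of input) instead of throwing.
                int resume = nextOpen >= 0 ? nextOpen : input.length();
                nodes.add(new Node("error", input.substring(i, resume), i));
                i = resume;
            } else {
                nodes.add(new Node("tag", input.substring(i, close + 1), i));
                i = close + 1;
            }
        }
        return nodes;
    }
}
```

Because errors are ordinary nodes that keep their offset, a caller can filter them out of the result after the fact, much like the astStream() filter shown earlier, and the nodes after the malformed span are still available.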
For the full code-level reference, see the README on GitHub and the Javadoc-annotated source under yari-xml-parser/src/main/java/.