Fast, resilient parsers: JavaScript · HTML · CSS · XML
And a powerful parser-combinator engine written in Java
Four distinct, fast and fault-tolerant parsers — one each for
JavaScript, HTML, CSS and XML — built on a shared parser-combinator engine
and producing a unified, fully-located AST.
Parse a full HTML page — with its embedded JavaScript and CSS — and walk the entire tree
in one pass. Every node carries exact source locations. Comments are first-class.
Engineered for throughput, the parsers handle large files and minified bundles quickly,
in a single linear pass.
And the core is open: use the same yari-parsec engine to build your own
fast, resilient parser for any language or DSL.
AI coding assistants and static-analysis tools are most useful when they can reason about
source code at a structural level — not as raw text, but as a typed, located tree.
Yari is designed to be that building block: parse any web source in one line,
walk the result with a typed API, query nodes by type, and read exact locations.
A clean, typed, fully-located AST gives an LLM a much stronger signal than raw text,
improving analyses, refactoring suggestions, and overall code comprehension.
Each AST node is fully serializable to JSON out of the box via Jackson — persist a parsed
tree, send it over the wire, or reload it without re-parsing.
The unified tree model means one analysis pipeline covers HTML structure, embedded scripts,
stylesheets, and standalone JS/CSS/XML files — all in the same shape.
A philosophy of resilience
Real-world web code is messy. HTML pages in the wild have unclosed tags, broken nesting,
and mixed casing. JavaScript in the wild has syntax errors and unusual patterns.
Yari was built around one constraint: never throw, never stop.
When the parser encounters input it cannot recognise, it emits an error node
into the AST and resumes parsing, so you still get the maximum number of recognisable
nodes from a broken or partially malformed source.
This degraded-parse behaviour is not an afterthought bolted onto each parser.
It is a first-class primitive of yari-parsec, the combinator engine underneath,
which makes it composable across every language in the framework.
Perfect for linting, static analysis, or any pipeline that must keep going on imperfect inputs.
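To make the "never throw, never stop" idea concrete, here is a minimal sketch of error-node recovery on a toy grammar (a comma-separated list of integers). It does not use Yari's API: the `Item`, `NumberItem`, `ErrorItem` and `ResilientListParser` names are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node types for illustration; these are not Yari's actual AST classes.
interface Item { int start(); int end(); }
record NumberItem(int value, int start, int end) implements Item {}
record ErrorItem(String text, int start, int end) implements Item {}

final class ResilientListParser {
    // Parses a comma-separated list of integers without ever throwing.
    // An item that is not a plain integer becomes an ErrorItem, and parsing
    // resumes right after the next comma.
    static List<Item> parse(String src) {
        List<Item> items = new ArrayList<>();
        int pos = 0;
        while (pos <= src.length()) {
            int comma = src.indexOf(',', pos);
            int end = (comma < 0) ? src.length() : comma;
            String raw = src.substring(pos, end);
            String text = raw.trim();
            // At most 9 digits, so Integer.parseInt cannot overflow.
            boolean isNumber = !text.isEmpty()
                    && text.length() <= 9
                    && text.chars().allMatch(Character::isDigit);
            if (isNumber) {
                items.add(new NumberItem(Integer.parseInt(text), pos, end));
            } else {
                items.add(new ErrorItem(raw, pos, end)); // record the problem, keep going
            }
            pos = end + 1; // skip the comma and continue with the next item
        }
        return items;
    }

    public static void main(String[] args) {
        // The malformed third item becomes an ErrorItem; the fourth is still parsed.
        for (Item item : parse("1, 2, x?, 4")) {
            System.out.println(item);
        }
    }
}
```

The point of the sketch is the shape of the result: one bad item costs one error node, and every item around it is still recovered with its offsets.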
Build your own parser
Yari is more than four language parsers. At its core is yari-parsec, a general
parser-combinator engine: compose small, well-typed Parser<T>
values to tokenize, build expression grammars with operator precedence, track source locations,
and recover from errors. It is the exact machinery the JavaScript, HTML, CSS and XML parsers
are themselves built on.
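For readers new to parser combinators, the sketch below shows the general technique of composing small typed parsers into larger ones. It is an assumption-laden illustration, not yari-parsec itself: `MiniParser`, `Parsed`, `takeWhile1` and the `map` / `or` / `many` combinators are hypothetical names chosen for this example, and the real `Parser<T>` may look quite different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.IntPredicate;

// A successful parse: the value produced and the offset where parsing stopped.
record Parsed<T>(T value, int next) {}

// Hypothetical minimal combinator interface; yari-parsec's real Parser<T> may differ.
@FunctionalInterface
interface MiniParser<T> {
    Optional<Parsed<T>> parse(String src, int pos);

    // Transforms the parsed value without consuming more input.
    default <R> MiniParser<R> map(Function<T, R> f) {
        return (src, pos) -> parse(src, pos).map(p -> new Parsed<>(f.apply(p.value()), p.next()));
    }

    // Tries this parser first and falls back to the alternative on failure.
    default MiniParser<T> or(MiniParser<T> other) {
        return (src, pos) -> {
            Optional<Parsed<T>> r = parse(src, pos);
            return r.isPresent() ? r : other.parse(src, pos);
        };
    }

    // Applies this parser zero or more times and collects the values.
    default MiniParser<List<T>> many() {
        return (src, pos) -> {
            List<T> out = new ArrayList<>();
            int at = pos;
            while (true) {
                Optional<Parsed<T>> r = parse(src, at);
                if (r.isEmpty() || r.get().next() == at) break; // stop on failure or no progress
                out.add(r.get().value());
                at = r.get().next();
            }
            return Optional.of(new Parsed<>(out, at));
        };
    }

    // Matches one or more characters satisfying the predicate.
    static MiniParser<String> takeWhile1(IntPredicate ok) {
        return (src, pos) -> {
            int end = pos;
            while (end < src.length() && ok.test(src.charAt(end))) end++;
            return end == pos
                    ? Optional.empty()
                    : Optional.of(new Parsed<>(src.substring(pos, end), end));
        };
    }
}

final class MiniParserDemo {
    static final MiniParser<String> WORD = MiniParser.takeWhile1(Character::isLetter);
    static final MiniParser<String> NUMBER = MiniParser.takeWhile1(Character::isDigit);
    static final MiniParser<String> SPACE = MiniParser.takeWhile1(Character::isWhitespace);

    // A tiny tokenizer built purely by composition: (word | number | space)*
    static final MiniParser<List<String>> TOKENS =
            WORD.map(w -> "word:" + w)
                .or(NUMBER.map(n -> "num:" + n))
                .or(SPACE.map(s -> "ws"))
                .many();

    public static void main(String[] args) {
        // prints [word:let, ws, word:x, num:1, ws, num:42]
        System.out.println(TOKENS.parse("let x1 42", 0).orElseThrow().value());
    }
}
```

Each combinator returns a new parser value, so grammars are assembled from ordinary expressions rather than generated code.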
Use it to implement your own fast, resilient parser for any language or DSL —
one linear pass, error nodes instead of exceptions, and an OperatorTable for
precedence and associativity. The Quick start walks through a
complete mini-language built entirely on the engine.
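As a rough picture of what a precedence and associativity table buys you, here is a small precedence-climbing sketch. It is not Yari's `OperatorTable`: the `Op` record, the `TABLE` map and the `PrecedenceSketch` class are hypothetical stand-ins, and the output is a parenthesised string rather than an AST to keep the example short.

```java
import java.util.Map;

// Hypothetical stand-in for an operator table entry: a precedence level and an
// associativity flag. Yari's own OperatorTable API may look different.
record Op(int precedence, boolean rightAssoc) {}

final class PrecedenceSketch {
    static final Map<Character, Op> TABLE = Map.of(
            '+', new Op(1, false),
            '-', new Op(1, false),
            '*', new Op(2, false),
            '/', new Op(2, false),
            '^', new Op(3, true));

    private final String src;
    private int pos = 0;

    private PrecedenceSketch(String src) { this.src = src.replace(" ", ""); }

    // Parses the input and renders the grouping as a fully parenthesised string.
    static String parse(String src) { return new PrecedenceSketch(src).expr(1); }

    // Precedence climbing: parse an operand, then fold in every operator whose
    // precedence is at least minPrec.
    private String expr(int minPrec) {
        String left = operand();
        while (pos < src.length()) {
            Op op = TABLE.get(src.charAt(pos));
            if (op == null || op.precedence() < minPrec) break;
            char symbol = src.charAt(pos++);
            // A left-associative operator requires a strictly higher level on its right.
            int nextMin = op.rightAssoc() ? op.precedence() : op.precedence() + 1;
            String right = expr(nextMin);
            left = "(" + left + " " + symbol + " " + right + ")";
        }
        return left;
    }

    // An operand is a run of letters or digits; anything else yields an error
    // marker instead of an exception.
    private String operand() {
        int start = pos;
        while (pos < src.length() && Character.isLetterOrDigit(src.charAt(pos))) pos++;
        return pos > start ? src.substring(start, pos) : "<error>";
    }

    public static void main(String[] args) {
        System.out.println(parse("1 + 2 * 3")); // prints (1 + (2 * 3))
        System.out.println(parse("8 - 3 - 2")); // prints ((8 - 3) - 2)
        System.out.println(parse("2 ^ 3 ^ 2")); // prints (2 ^ (3 ^ 2))
    }
}
```

Adding an operator to such a grammar means adding one table entry; the parsing loop itself does not change.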
Why Yari
A small set of focused, fast and resilient parsers that compose into a unified AST
without any external dependencies beyond the JVM.
One AST, all sub-languages
The HTML parser also parses the CSS and JavaScript embedded inside the HTML
(<script>, <style>, onclick, style attributes…)
and exposes everything as a single unified AST.
Stream over the nodes of a full web page — HTML, CSS and JS — in one pass.
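The idea of streaming over one tree that mixes several languages can be sketched as follows. `TreeNode` and its `language` / `kind` fields are hypothetical and only illustrate the shape of such a traversal; they are not Yari's node classes.

```java
import java.util.List;
import java.util.stream.Stream;

// Hypothetical unified node: every node, whatever its language, has the same shape.
record TreeNode(String language, String kind, List<TreeNode> children) {
    // Depth-first, pre-order stream of this node and all of its descendants.
    Stream<TreeNode> stream() {
        return Stream.concat(Stream.of(this), children.stream().flatMap(TreeNode::stream));
    }
}

final class TreeWalkDemo {
    // A hand-built stand-in for a parsed page with one script and one stylesheet.
    static TreeNode samplePage() {
        TreeNode function = new TreeNode("js", "FunctionDeclaration", List.of());
        TreeNode rule = new TreeNode("css", "Rule", List.of());
        TreeNode script = new TreeNode("html", "script", List.of(function));
        TreeNode style = new TreeNode("html", "style", List.of(rule));
        return new TreeNode("html", "document", List.of(script, style));
    }

    public static void main(String[] args) {
        // One traversal visits HTML, JavaScript and CSS nodes alike.
        samplePage().stream()
                .forEach(n -> System.out.println(n.language() + " " + n.kind()));
    }
}
```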
Degraded parsing — never throw
When the parser encounters input it cannot recognise, it emits an error node
and resumes, so you always get the maximum number of recognisable nodes.
Ideal for linting, static analysis, or any pipeline that must handle imperfect inputs.
Exact source locations
Every AST node carries its precise location in the source: offsets, lines, and columns.
Critical for diagnostics, refactoring tools, code-intel features, and any downstream
consumer that needs to point back at the original text.
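Offsets, lines and columns are related in a simple way, and the sketch below shows one common technique for deriving a line and column from an offset. The `Position` record and `LineIndex` class are hypothetical names for this example, and the 1-based line and column convention is an assumption, not a statement about how Yari numbers them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical location type; Yari's actual location class may expose different fields.
record Position(int offset, int line, int column) {}

final class LineIndex {
    private final int[] lineStarts; // offset of the first character of each line

    LineIndex(String src) {
        List<Integer> starts = new ArrayList<>();
        starts.add(0);
        for (int i = 0; i < src.length(); i++) {
            if (src.charAt(i) == '\n') starts.add(i + 1);
        }
        lineStarts = starts.stream().mapToInt(Integer::intValue).toArray();
    }

    // Converts a 0-based offset (assumed >= 0) into a 1-based line and column.
    Position at(int offset) {
        int idx = Arrays.binarySearch(lineStarts, offset);
        // Not found: binarySearch returns -(insertionPoint) - 1, and the line
        // containing the offset is the one just before the insertion point.
        if (idx < 0) idx = -idx - 2;
        return new Position(offset, idx + 1, offset - lineStarts[idx] + 1);
    }

    public static void main(String[] args) {
        LineIndex index = new LineIndex("ab\ncd\n\nx");
        System.out.println(index.at(4)); // prints Position[offset=4, line=2, column=2]
    }
}
```

Building the index is one pass over the source; each lookup afterwards is a binary search, which matters when a tool reports many diagnostics on a large file.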
Comments are first-class
Comments are parsed in every supported language and kept with their own source locations.
A dedicated API lets you look them up relative to any AST node (leading, trailing, inside…),
so you never lose the connection between code and its documentation.
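Once comments carry their own locations, relating them to a node is a matter of comparing spans. The sketch below is one simple way to do that; `Span`, `Comment`, `Placement` and `CommentLookup` are hypothetical names, and Yari's dedicated comment API may classify comments differently (for example, by also considering neighbouring nodes).

```java
import java.util.List;

// Hypothetical types for illustration: a half-open [start, end) source span.
record Span(int start, int end) {}
record Comment(String text, Span span) {}
enum Placement { LEADING, INSIDE, TRAILING, OVERLAPPING }

final class CommentLookup {
    // Classifies a comment purely by comparing its span with the node's span.
    static Placement classify(Comment c, Span node) {
        Span s = c.span();
        if (s.end() <= node.start()) return Placement.LEADING;
        if (s.start() >= node.end()) return Placement.TRAILING;
        if (s.start() >= node.start() && s.end() <= node.end()) return Placement.INSIDE;
        return Placement.OVERLAPPING;
    }

    // Returns the comments that have the requested placement relative to the node.
    static List<Comment> find(List<Comment> all, Span node, Placement wanted) {
        return all.stream().filter(c -> classify(c, node) == wanted).toList();
    }

    public static void main(String[] args) {
        // Spans for the source:  /* a */ f(/* b */); // c
        List<Comment> comments = List.of(
                new Comment("/* a */", new Span(0, 7)),
                new Comment("/* b */", new Span(10, 17)),
                new Comment("// c", new Span(20, 24)));
        Span call = new Span(8, 18); // the call expression f(...)
        for (Placement p : Placement.values()) {
            System.out.println(p + " " + find(comments, call, p));
        }
    }
}
```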
Simple one-liner API
Parsing a source is a single method call. The resulting AST is a plain typed tree
you can walk, query, or stream over.
Easy to plug into existing JVM tooling — no configuration, no setup.
JSON serialization
Every AST node is fully serializable and deserializable to JSON out of the box
(via Jackson). Persist a parsed tree, send it over the wire, or reload it later
without re-parsing the original source.
AI-friendly
A clean, typed, located AST gives an LLM a much stronger signal than raw source text.
Yari is designed to be a building block for AI-driven code understanding —
improving analyses, refactoring suggestions, and code comprehension.
Parser-combinator engine
The framework ships its own parser-combinator library (yari-parsec),
which makes degraded-parse behaviour composable across every language.
Error recovery is a first-class primitive of the combinators, not an afterthought.
Heavily tested
Each module ships with an extensive test suite covering as many edge cases as possible —
and it keeps growing as new patterns surface.
Contributions of untested real-world inputs are welcome.
The modules
Pick the modules you need. Each one is documented page-by-page with its full API.