Fast, resilient parsers: JavaScript · HTML · CSS · XML
and a powerful parser-combinator engine written in Java

Four distinct, fast, fault-tolerant parsers, one each for JavaScript, HTML, CSS and XML, built on a shared parser-combinator engine and producing a unified, fully located AST. Parse a full HTML page, including its embedded JavaScript and CSS, and walk the entire tree in one pass. Every node carries exact source locations. Comments are first-class. Engineered for throughput, the parsers handle large files and minified bundles in a single linear pass. And the core is open: use the same yari-parsec engine to build your own fast, resilient parser for any language or DSL.


Built for AI-driven code understanding

AI coding assistants and static-analysis tools are most useful when they can reason about source code at a structural level — not as raw text, but as a typed, located tree. Yari is designed to be that building block: parse any web source in one line, walk the result with a typed API, query nodes by type, and read exact locations. A clean, typed, fully-located AST gives an LLM a much stronger signal than raw text, improving analyses, refactoring suggestions, and overall code comprehension.

Each AST node is fully serializable to JSON out of the box via Jackson — persist a parsed tree, send it over the wire, or reload it without re-parsing. The unified tree model means one analysis pipeline covers HTML structure, embedded scripts, stylesheets, and standalone JS/CSS/XML files — all in the same shape.

A philosophy of resilience

Real-world web code is messy. HTML pages in the wild have unclosed tags, broken nesting, and mixed casing. JavaScript in the wild has syntax errors and unusual patterns. Yari was built around one constraint: never throw, never stop. When the parser encounters input it cannot recognise, it emits an error node into the AST and resumes parsing, so you still recover as many recognisable nodes as possible from a broken or partially malformed source.

This degraded-parse behaviour is not an afterthought bolted onto each parser. It is a first-class primitive of yari-parsec, the combinator engine underneath, which makes it composable across every language in the framework. Perfect for linting, static analysis, or any pipeline that must keep going on imperfect inputs.
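The idea of emitting an error node and resuming can be sketched in a few lines. The example below is not Yari code: `Item`, `NumberItem`, `ErrorItem` and `ResilientListParser` are hypothetical names invented for illustration, and the toy grammar (a comma-separated list of integers) stands in for a real language. It only shows the general shape of degraded parsing, where a malformed entry becomes a located node instead of an exception.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node types for illustration only; these are not Yari classes.
interface Item { int start(); int end(); }
record NumberItem(int value, int start, int end) implements Item {}
record ErrorItem(String text, int start, int end) implements Item {}

final class ResilientListParser {
    // Parses a comma-separated list of integers in one left-to-right pass.
    // A malformed entry becomes an ErrorItem and parsing resumes after the
    // next comma, so the method never throws on bad input.
    static List<Item> parse(String source) {
        List<Item> items = new ArrayList<>();
        int pos = 0;
        while (pos <= source.length()) {
            int comma = source.indexOf(',', pos);
            int end = comma < 0 ? source.length() : comma;
            String raw = source.substring(pos, end);
            String text = raw.trim();
            // Up to nine digits always fits in an int, so parseInt cannot fail here.
            if (text.matches("-?\\d{1,9}")) {
                items.add(new NumberItem(Integer.parseInt(text), pos, end));
            } else {
                items.add(new ErrorItem(raw, pos, end));
            }
            pos = end + 1; // skip the comma and keep going
        }
        return items;
    }
}
```

Calling `ResilientListParser.parse("1, x, 3")` yields three items: a number, an error node covering ` x`, and another number. The entries on either side of the bad one survive, which is the property the paragraph above describes.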

Build your own parser

Yari is more than four language parsers. At its core is yari-parsec, a general parser-combinator engine: compose small, well-typed Parser<T> values to tokenize, build expression grammars with operator precedence, track source locations, and recover from errors. It is the exact machinery the JavaScript, HTML, CSS and XML parsers are themselves built on.

Use it to implement your own fast, resilient parser for any language or DSL — one linear pass, error nodes instead of exceptions, and an OperatorTable for precedence and associativity. The Quick start walks through a complete mini-language built entirely on the engine.
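For readers new to parser combinators, here is a minimal, self-contained sketch of the technique. It does not use yari-parsec and makes no claim about its real API: `MiniParser`, `Result`, `Combinators`, `ch`, `number`, `chainLeft` and `expression` are all hypothetical names. Precedence is handled here by layering two left-associative chains, a classic approach; the engine's own OperatorTable may work quite differently.

```java
import java.util.Optional;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Hypothetical mini-engine for illustration; yari-parsec's real API may differ.
record Result<T>(T value, int next) {}

@FunctionalInterface
interface MiniParser<T> {
    // Returns the parsed value plus the next offset, or empty if nothing matched.
    Optional<Result<T>> parse(String input, int pos);

    // Transforms the parsed value without consuming more input.
    default <R> MiniParser<R> map(Function<T, R> f) {
        return (in, p) -> parse(in, p).map(r -> new Result<R>(f.apply(r.value()), r.next()));
    }

    // Tries this parser first and falls back to the other one at the same offset.
    default MiniParser<T> or(MiniParser<T> other) {
        return (in, p) -> {
            Optional<Result<T>> first = parse(in, p);
            return first.isPresent() ? first : other.parse(in, p);
        };
    }
}

final class Combinators {
    // Matches exactly one given character.
    static MiniParser<Character> ch(char c) {
        return (in, p) -> {
            if (p < in.length() && in.charAt(p) == c) {
                return Optional.of(new Result<Character>(c, p + 1));
            }
            return Optional.empty();
        };
    }

    // Matches one to nine decimal digits (nine digits always fit in an int).
    static MiniParser<Integer> number() {
        return (in, p) -> {
            int end = p;
            while (end < in.length() && Character.isDigit(in.charAt(end))) end++;
            if (end == p || end - p > 9) return Optional.empty();
            return Optional.of(new Result<Integer>(Integer.parseInt(in.substring(p, end)), end));
        };
    }

    // Left-associative chain: operand (op operand)*
    static <T> MiniParser<T> chainLeft(MiniParser<T> operand, MiniParser<BinaryOperator<T>> op) {
        return (in, p) -> {
            Optional<Result<T>> first = operand.parse(in, p);
            if (first.isEmpty()) return first;
            Result<T> acc = first.get();
            while (true) {
                Optional<Result<BinaryOperator<T>>> o = op.parse(in, acc.next());
                if (o.isEmpty()) return Optional.of(acc);
                Optional<Result<T>> rhs = operand.parse(in, o.get().next());
                if (rhs.isEmpty()) return Optional.of(acc); // stop before the dangling operator
                acc = new Result<T>(o.get().value().apply(acc.value(), rhs.get().value()),
                                    rhs.get().next());
            }
        };
    }

    // '*' binds tighter than '+' and '-' because terms are parsed first.
    static MiniParser<Integer> expression() {
        MiniParser<BinaryOperator<Integer>> times = ch('*').map(c -> (a, b) -> a * b);
        MiniParser<BinaryOperator<Integer>> plus = ch('+').map(c -> (a, b) -> a + b);
        MiniParser<BinaryOperator<Integer>> minus = ch('-').map(c -> (a, b) -> a - b);
        MiniParser<Integer> term = chainLeft(number(), times);
        return chainLeft(term, plus.or(minus));
    }
}
```

With these definitions, `Combinators.expression().parse("2+3*4", 0)` produces the value 14 and stops at offset 5. Each parser is an ordinary value, so larger grammars are built by composing smaller ones rather than by writing one monolithic routine.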

Why Yari

A small set of focused, fast and resilient parsers that compose into a unified AST, with no external dependencies beyond Jackson and SLF4J.

One AST, all sub-languages

The HTML parser also parses the CSS and JavaScript embedded inside the HTML (<script>, <style>, onclick, style attributes…) and exposes everything as a single unified AST. Stream over the nodes of a full web page — HTML, CSS and JS — in one pass.
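To make the "one tree, one pass" idea concrete, the following sketch models a unified tree with a single hypothetical node type and streams over it depth-first. `PageNode`, its `language` and `type` fields, and `UnifiedTreeDemo` are assumptions made for this example; Yari's actual AST classes and traversal API are documented in the module pages and may look different.

```java
import java.util.List;
import java.util.stream.Stream;

// Hypothetical unified node; not one of Yari's AST types.
record PageNode(String language, String type, List<PageNode> children) {
    // Depth-first, pre-order stream of this node and all of its descendants.
    Stream<PageNode> stream() {
        return Stream.concat(Stream.of(this), children.stream().flatMap(PageNode::stream));
    }
}

final class UnifiedTreeDemo {
    // A hand-built stand-in for a parsed page with one <script> and one <style>.
    static PageNode samplePage() {
        PageNode call = new PageNode("js", "CallExpression", List.of());
        PageNode script = new PageNode("html", "Element:script", List.of(call));
        PageNode rule = new PageNode("css", "Rule", List.of());
        PageNode style = new PageNode("html", "Element:style", List.of(rule));
        return new PageNode("html", "Document", List.of(script, style));
    }

    // Counts the nodes that belong to one sub-language, in a single traversal.
    static long count(PageNode root, String language) {
        return root.stream().filter(n -> n.language().equals(language)).count();
    }
}
```

Because every sub-language lives in the same tree shape, a single `stream()` call reaches the HTML elements, the JavaScript call and the CSS rule alike, and one filter is enough to select any of them.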

Degraded parsing — never throw

When the parser encounters input it cannot recognise, it emits an error node and resumes, so you always get the maximum number of recognisable nodes. Ideal for linting, static analysis, or any pipeline that must handle imperfect inputs.

Exact source locations

Every AST node carries its precise location in the source: offsets, lines, and columns. Critical for diagnostics, refactoring tools, code-intel features, and any downstream consumer that needs to point back at the original text.
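Offsets, lines and columns are related by simple arithmetic, which the sketch below illustrates. `Position` and `LineIndex` are hypothetical names, and the conventions chosen here (0-based offsets, 1-based lines and columns, `\n` as the only line terminator) are assumptions for the example rather than a statement of what Yari uses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical location type; Yari's own location classes may be shaped differently.
record Position(int offset, int line, int column) {}

final class LineIndex {
    // Offset of the first character of each line, in ascending order.
    private final List<Integer> lineStarts = new ArrayList<>();

    LineIndex(String source) {
        lineStarts.add(0);
        for (int i = 0; i < source.length(); i++) {
            if (source.charAt(i) == '\n') lineStarts.add(i + 1);
        }
    }

    // Converts a 0-based offset into a 1-based line and column.
    // Binary search finds the last line start that is <= offset.
    Position locate(int offset) {
        int lo = 0;
        int hi = lineStarts.size() - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (lineStarts.get(mid) <= offset) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return new Position(offset, lo + 1, offset - lineStarts.get(lo) + 1);
    }
}
```

For the source `"ab\ncd"`, `new LineIndex(source).locate(4)` returns offset 4, line 2, column 2, which points at the `d`. A diagnostic tool can store only offsets and derive the human-readable line and column on demand this way.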

Comments are first-class

Comments are parsed in every supported language and kept with their own source locations. A dedicated API lets you look them up relative to any AST node (leading, trailing, inside…), so you never lose the connection between code and its documentation.
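Once comments carry their own locations, relating them to a node is a matter of comparing spans. The sketch below shows one plausible way to do that; `Span`, `Comment` and `CommentLookup` are hypothetical, and the rules used (a leading comment is one that ends before the node with only whitespace in between) are an assumption, not a description of Yari's dedicated comment API.

```java
import java.util.List;

// Hypothetical types for illustration; end offsets are exclusive.
record Span(int start, int end) {}
record Comment(String text, Span span) {}

final class CommentLookup {
    // Comments that sit immediately before the node, separated only by whitespace.
    static List<Comment> leading(String source, List<Comment> comments, Span node) {
        return comments.stream()
                .filter(c -> c.span().end() <= node.start())
                .filter(c -> source.substring(c.span().end(), node.start()).isBlank())
                .toList();
    }

    // Comments whose span lies entirely within the node's span.
    static List<Comment> inside(List<Comment> comments, Span node) {
        return comments.stream()
                .filter(c -> c.span().start() >= node.start() && c.span().end() <= node.end())
                .toList();
    }
}
```

Given the source `// doc` followed by a newline and `foo(/* arg */);`, the line comment is reported as leading for the `foo(...)` statement and the block comment as inside it. Neither lookup needs to re-scan the source for comment syntax, because the locations were recorded at parse time.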

Simple one-liner API

Parsing a source is a single method call. The resulting AST is a plain typed tree you can walk, query, or stream over. Easy to plug into existing JVM tooling — no configuration, no setup.

JSON serialization

Every AST node is fully serializable and deserializable to JSON out of the box (via Jackson). Persist a parsed tree, send it over the wire, or reload it later without re-parsing the original source.

AI-friendly

A clean, typed, located AST gives an LLM a much stronger signal than raw source text. Yari is designed to be a building block for AI-driven code understanding — improving analyses, refactoring suggestions, and code comprehension.

Parser-combinator engine

The framework ships its own parser-combinator library (yari-parsec), which makes degraded-parse behaviour composable across every language. Error recovery is a first-class primitive of the combinators, not an afterthought.

Heavily tested

Each module ships with an extensive test suite covering as many edge cases as possible — and it keeps growing as new patterns surface. Contributions of untested real-world inputs are welcome.

The modules

Pick the modules you need. Each one is documented page-by-page with its full API.

yari-parsec

Parsec

Parser-combinator engine. Compose typed Parser<T> values, track source locations, and recover from errors. The foundation of every module.

yari-core

Core

Common interfaces shared by every parser: the AST contract, source location system, comment model, and base utilities.

yari-xml-parser

XML Parser

Fault-tolerant XML parser. Supports tags, self-closing tags, comments, CDATA, DOCTYPE, prolog, and namespaces.

yari-css-parser

CSS Parser

Fault-tolerant CSS parser. Supports at-rules, complex selectors (pseudo-classes, pseudo-elements, combinators, nth-patterns…) and property values.

yari-javascript-parser

JavaScript Parser

Fault-tolerant JavaScript parser. Supports ES6 modules, classes, async/await, destructuring, template literals, generators, and more.

yari-html-parser

HTML Parser

Extends the XML parser and additionally processes embedded JavaScript and CSS. Produces a unified AST where each sub-language is parsed into its own tree.

Why choose Yari

What the framework brings, compared to conventional approaches.

| Aspect | Conventional approach | With Yari |
| --- | --- | --- |
| Broken input | Exception thrown, parsing aborts | Error node inserted, parsing continues; maximum AST recovered |
| Multi-language page | Three separate parsers, three separate trees to juggle | One unified AST: HTML, embedded CSS and JS in one pass |
| Source locations | Often absent or limited to line numbers | Every node carries exact start/end offsets, lines and columns |
| Comments | Discarded during lexing | Preserved with location, queryable relative to any AST node |
| Serialization | Manual mapping or custom serializer | Full Jackson JSON serialization/deserialization out of the box |
| Error recovery | Ad-hoc per parser, hard to extend | First-class combinator primitive, composable across all parsers |
| AI compatibility | Raw text or opaque tokens | Typed, located AST: strong structural signal for LLMs and agents |
| Dependencies | Multiple external parser libraries, version conflicts | One framework, one shared engine; only Jackson and SLF4J |