Menhir

What is it?

Menhir is an LR(1) parser generator for OCaml: it compiles LR(1) grammars down to OCaml code.

Menhir replaces ocamlyacc. It can compile legacy ocamlyacc grammars, with a few caveats that are described in the reference manual (available in HTML and PDF).

How to get it?

Menhir is available through opam, OCaml's package manager.

To install it, run: opam install menhir

Menhir's source code is hosted in this repository (releases; changes).

How to get help?

There is a mailing list for announcements of new releases and discussion of problems, bugs, feature requests, and so on. Only subscribers can post.

Menhir has been designed and implemented by François Pottier and Yann Régis-Gianas.

What are the key features of Menhir?

Menhir has many features that make it superior to the traditional yacc-style parser generators that many people are familiar with.

  • Menhir is not restricted to LALR(1) grammars. It accepts LR(1) grammars, thus avoiding certain artificial conflicts. When a grammar lies outside this class, Menhir explains conflicts in terms of the grammar, not just in terms of the automaton. Menhir's explanations are believed to be understandable by mere humans.
  • Menhir allows the definition of a nonterminal symbol to be parameterized. A formal parameter can be instantiated with a terminal symbol, a nonterminal symbol, or an anonymous rule. A library of standard parameterized definitions, including options, sequences, and lists, is bundled with Menhir. EBNF syntax is supported: the modifiers ?, +, and * are sugar for options, nonempty lists, and arbitrary lists. Parameterized definitions are expanded away in a straightforward way.
  • Menhir's %inline keyword indicates that a nonterminal symbol should be replaced with its definition at every use site. This offers a second macro-expansion mechanism. Together, these two expansion mechanisms help write concise and elegant grammars, while avoiding LR(1) conflicts. In other words, they extend Menhir's expressive power far beyond LR(1), while retaining the attractive features of LR(1): determinism, performance, guaranteed unambiguity.
  • In --table mode only, Menhir supports incremental parsing. This means that the state of the parser can be saved at any point (at no cost) and that parsing can later be resumed from a saved state. Furthermore, Menhir offers an inspection API which allows the parser's current state and stack to be examined by the user. This opens the door to a variety of advanced uses, including error explanation, error recovery, context-dependent lexical analysis, and so on.
  • Menhir offers a set of tools for building a (complete, irredundant) set of invalid input sentences, mapping each such sentence to a hand-written error message, and maintaining this mapping as the grammar evolves. Thus, a generated parser can produce good syntax error messages.
  • Menhir has a Coq back-end, which produces parsers whose correctness and completeness with respect to the grammar can be verified by Coq.
  • Menhir offers an interpreter that helps debug grammars interactively.
  • Menhir allows grammar specifications to be split over multiple files. It also allows several grammars to share a single set of tokens.
  • Menhir produces reentrant parsers.
  • Menhir is able to produce parsers that are parameterized by OCaml modules.
  • Instead of referring to semantic values via positional keywords such as $1 and $2, Menhir allows semantic values to be explicitly named. In fact, Menhir now has fairly nice syntax for describing grammars.
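
To make several of these features concrete, here is a minimal sketch of a grammar for a small calculator. It is only an illustration, not taken from Menhir's distribution: the token names and the nonterminals main, statement, expr, binop, and parenthesized are assumptions chosen for this example. It shows a user-defined parameterized nonterminal, the EBNF modifier *, the %inline keyword, and named semantic values.

```ocaml
(* Hypothetical calculator grammar; all names below are illustrative. *)

%token <int> INT
%token PLUS TIMES LPAREN RPAREN SEMI EOF

(* Precedence declarations, from lowest to highest. *)
%left PLUS
%left TIMES

%start <int list> main

%%

(* "statement*" is EBNF sugar for list(statement), which comes from the
   standard library of parameterized definitions. *)
main:
| vs = statement* EOF
    { vs }

statement:
| v = expr SEMI
    { v }

(* A user-defined parameterized nonterminal. Its formal parameter X can
   be instantiated with any terminal or nonterminal symbol. *)
parenthesized(X):
| LPAREN x = X RPAREN
    { x }

(* Semantic values are named (i, e, e1, op, e2) rather than being
   referred to as $1, $2, and so on. *)
expr:
| i = INT
    { i }
| e = parenthesized(expr)
    { e }
| e1 = expr op = binop e2 = expr
    { op e1 e2 }

(* Because binop is marked %inline, it is replaced with its definition
   inside expr, so the precedence declarations above still apply to
   PLUS and TIMES and no conflict is reported. *)
%inline binop:
| PLUS  { ( + ) }
| TIMES { ( * ) }
```

Assuming the sketch is saved in a file named parser.mly (a hypothetical name), it can be compiled with the command menhir parser.mly. A lexer that produces these tokens is still needed in order to obtain a working parser.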