Discussion moving here from
AvoidExceptionsWheneverPossible.
- BTW, I find I went from recursive descent years ago, to yacc/lex, to AntlrTranslatorGenerator, back to hand-coded recursive descent. --JimPerry
- How come? I mean, I find recursive descent seductive - but I have a very strong feeling it's nowhere near as maintainable/efficient as a good parser generator coupled with a good grammar. OTOH, having done both within the last few years, I have no idea why I feel that. So ... how come? --PeterMerel
There's a benefit to having an explicit grammar, especially if you're implementing something with a standard one, but the fact is that parser generators require grammars twiddled for their own preferences, for instance yacc favors left-recursion for repetitive constructs. Also, they require you to intermix the
grammar with the implementation language in a way that is neither quite
natural to a
BackusNaurForm-reader nor a C++-or-whatever reader. The context in which the target-language fragments will occur in the generated parser is not apparent.
Yacc is notoriously hard to get to produce decent diagnostics, and existing
versions tend to have limitations like using fixed symbol names (this can be
worked around but is a pain). Also, since it builds trees bottom up, it can
be unfriendly to C++ object management.
So I looked into
AntlrTranslatorGenerator
which is more modern than yacc, and generates
readable recursive-descent code. It looked promising, but I found that to me
the grammar interspersed with C++ fragments was not really a whole lot more
readable or maintainable than the emitted parser, and certainly no more so than
an equivalent hand-coded parser. (Also, the ANTLR folks were/are putting all their efforts into a Java version). So I eventually dropped that path and opted for the hand-coded parser route.
This has seemed to pay off; it's much easier to make changes, there are fewer
grammar hacks and repetitions and so on. Perhaps it's what you're used to. It's certainly more convenient to have the parser right there in C++ like everything else rather than having to run it through a preprocessor (and in
our environment we found we needed to pick one vendor's yacc, do the
preprocessing on that particular platform, then ship around the preprocessed code).
Efficient? I don't know. My recollection is that it's the lexical analyzer
(token scanner) that dominates in parsing, but in any case I'm a strong believer
in
LazyOptimization and parser efficiency has even never crossed the radar screen in monitoring performance of our current systems. YMMV. --
JimPerry
As a followup on this, the desire for an explicit grammar does come up occasionally, so I recently wrote a utility to read parser code and generate a
BackusNaurForm grammar for the recognized language. It produces about as readable a BNF as could be written by hand, more so IMO than most yacc source (even if trimmed of action code). This approach has a certain satisfying quality related to
TheSourceCodeIsTheDesign.
The particular program I wrote, written in the
IconLanguage and primarily a vehicle for learning Icon, is specifically tailored to C++ and in a few minor ways to the particular context, but I think the general approach is pretty broadly applicable. --
JimPerry