See
XpForOptimizingCompilers.
The simplest optimizing compilers I've seen are in
PeterNorvig's
ParadigmsOfArtificialIntelligenceProgramming and
ChristianQueinnec's
LispInSmallPieces. They share several traits:
- They parasitize off of an existing language runtime.
- They compile to byte codes.
- They use an AbstractSyntaxTree to represent programs.
All of these compilers were written for instructional purposes, and all were built up from even simpler interpreters. Norvig does an especially spiffy job: he fits a compiler, an assembler, a peephole optimizer, a branch simplifier and a VM all into two chapters.
Furthermore, I suspect that any of these compilers could be built using XP and a bit of domain knowledge. Even better, you could improve them further without radical re-design--techniques are known for doing native code generation straight from an
AbstractSyntaxTree.
But what about a modern production compiler?
Modern production compilers generally don't rely on ASTs. Typically, they use something like (say) N-address code, possibly in
StaticSingleAssignmentForm. They use lots of funky algorithms. For an idea of how hairy it can get, compare the algorithms in
StevenMuchnick's
AdvancedCompilerDesignAndImplementation, or even
TheDragonBook (older), to the books above.
I don't think you could build one of these beasts without some prior design work. I don't think you could start out with one of the simple compilers above, and transform it into one of these without breaking a lot of test suites for several programmer-weeks. I could be wrong.
Or does XP classify this kind of broad-sweeping, hard-won, architectural knowledge as a
SystemMetaphor, as was suggested by
RalphJohnson? --
EricKidd
AssemblyLanguage fits here in 2 different ways:
To get really fast applications, the compiler will, of course, have to compile all the way to
MachineCode. (To make it easier to test if it's working right, many compilers have a compiler option to save a ".cod" (
MachineCode) file, which shows the raw hex
MachineCode bytes, the
AssemblyLanguage, and the high-level source.).
If I get to pick the source language, the simplest possible compiler for me to write is an assembler for
AssemblyLanguage.
If you get to pick the source language and
don't have to optimize, the simplest possible compiler is the null compiler that performs an identity transformation. Let the source language be the target runtime format. Done.
The simplest optimizing compiler I've written was 200 lines of C code, compiling
ForthLanguage to powerpc
MachineCode. Optimizations were limited, but included
TailCallOptimization,
ConstantFolding, inlining, and a generic
PeepholeOptimization mechanism that used something akin to sed's s/foo/bar/ to rewrite short segments of code. Oh, and it was hard
RealTime in the absense of
ForthMacros (although, in Forth, macros are rarely absent); the amount of work for a character in the source code was bounded. --
AdamBerger
ForthSimplicity has the complete source code for a (non-optimizing) compiler.
ColorForth also has a very small compiler, with limited optimizations.
ForthLanguage lends itself to simple compilers. It is more of a macro assembler for a virtual machine than a high-level language.
I remember that the compiler (to 68k) for the
FalseLanguage had only 1K size (written in
AssemblyLanguage). --
GunnarZarncke
The C compiler I wrote for the
BbcMicro while at Acornsoft ran in 20K - code and data. It barely deserved the title "optimizing", though. Paul Fellows also wrote a Pascal compiler for the same machine that also ran in 20K. --
PaulHudson
See also
StepsTowardTheReinventionOfProgramming
CategoryCompilers