The classic Unix tools for compiler construction.
Lex is a "tokenizer," helping to generate programs whose control flow is directed by instances of regular expressions in the input stream. It is often used to segment input in preparation for further parsing (as with Yacc).
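The regular-expression-driven segmentation described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Lex itself: the token names and patterns are hypothetical, and the matching rule differs (Lex prefers the longest match, while Python's `re` alternation takes the first alternative that matches).

```python
import re

# Hypothetical token rules for a tiny arithmetic language (not taken from any real Lex spec).
TOKEN_RULES = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/()=]"),
    ("SKIP", r"\s+"),
]

# Combine the rules into one pattern with a named group per rule.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_RULES))


def tokenize(text):
    """Yield (kind, lexeme) pairs; raise SyntaxError on a character no rule matches."""
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        pos = match.end()
        if match.lastgroup != "SKIP":  # whitespace is recognized but not reported
            yield match.lastgroup, match.group()
```

Calling `list(tokenize("x = 3 + 42"))` produces a flat token list that a parser can then consume.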
Yacc provides a more general parsing tool for describing the input to a computer program. The Yacc user specifies the grammar of the input along with code to be invoked as each structure in that grammar is recognized. Yacc turns that specification into a subroutine to process the input.
If you are writing a compiler, that processing involves generating assembly code, which is then assembled into object code. If you are writing an interpreter, the "code to be invoked" instead directly controls the flow of the user's application.
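The split between a grammar and the code invoked for each recognized structure can be illustrated with a small Python sketch. It is a hand-written recursive-descent parser, not the table-driven parser Yacc would generate, and the `Evaluate` and `EmitStackCode` classes, along with the stack-machine instruction names, are hypothetical. Swapping the action object turns the same grammar into either an interpreter or a code generator.

```python
import operator
import re

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


class Evaluate:
    """Interpreter-style actions: compute the value as each construct is recognized."""
    def number(self, n):
        return n

    def binop(self, op, left, right):
        return OPS[op](left, right)


class EmitStackCode:
    """Compiler-style actions: build instructions for a hypothetical stack machine."""
    NAMES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

    def number(self, n):
        return [f"PUSH {n}"]

    def binop(self, op, left, right):
        return left + right + [self.NAMES[op]]


def parse(text, actions):
    """Parse + - * / expressions with parentheses, calling `actions` per construct."""
    # Simplification: characters outside this pattern are silently ignored.
    tokens = re.findall(r"\d+|[+\-*/()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def factor():  # factor : NUMBER | '(' expr ')'
        tok = advance()
        if tok == "(":
            value = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok is not None and tok.isdigit():
            return actions.number(int(tok))
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():  # term : factor (('*' | '/') factor)*
        left = factor()
        while peek() in ("*", "/"):
            op = advance()
            left = actions.binop(op, left, factor())
        return left

    def expr():  # expr : term (('+' | '-') term)*
        left = term()
        while peek() in ("+", "-"):
            op = advance()
            left = actions.binop(op, left, term())
        return left

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"trailing input at token {peek()!r}")
    return result
```

With these definitions, `parse("2 + 3 * (4 - 1)", Evaluate())` returns `11`, while passing `EmitStackCode()` returns a list of `PUSH`/`SUB`/`MUL`/`ADD` instructions for the same input.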
An LALR(1) parser generator that claims to be faster and easier to program than Bison or Yacc.
Most of the work of the compiler is done on an intermediate representation called register transfer language. In this language, the instructions to be output are described, pretty much one by one, in an algebraic form that describes what the instruction does.
People frequently have the idea of using RTL stored as text in a file as an interface between a language front end and the bulk of GNU CC. This idea is not feasible. GNU CC was designed to use RTL internally only. Correct RTL for a given program is very dependent on the particular target machine. And the RTL does not contain all the information about the program.
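To give a feel for what an "algebraic form that describes what the instruction does" might look like, here is a toy Python sketch. It is not GNU CC's RTL: the nested-tuple layout and the operator names (`set`, `reg`, `plus`, `const_int`) are assumptions loosely modeled on a Lisp-like notation, and real RTL carries far more machine-specific detail.

```python
# Toy instruction descriptions as nested tuples: (operator, operand, ...).
# All operator names here are illustrative assumptions, not GNU CC's definitions.

def show(node):
    """Render a nested tuple as a parenthesized expression."""
    if isinstance(node, tuple):
        return "(" + " ".join(show(part) for part in node) + ")"
    return str(node)


def evaluate(node, regs):
    """Compute the value denoted by a source expression."""
    kind = node[0]
    if kind == "const_int":
        return node[1]
    if kind == "reg":
        return regs[node[1]]
    if kind == "plus":
        return evaluate(node[1], regs) + evaluate(node[2], regs)
    raise ValueError(f"unknown operator {kind!r}")


def execute(insn, regs):
    """Apply one (set (reg N) source) description to the register file `regs`."""
    op, dest, src = insn
    if op != "set" or dest[0] != "reg":
        raise ValueError("only (set (reg N) ...) is supported in this sketch")
    regs[dest[1]] = evaluate(src, regs)


# "Register 2 receives register 1 plus the constant 4."
insn = ("set", ("reg", 2), ("plus", ("reg", 1), ("const_int", 4)))
```

Here `show(insn)` gives `(set (reg 2) (plus (reg 1) (const_int 4)))`, and `execute(insn, regs)` updates a dictionary of register values accordingly.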
TENDRA / ANDF compilation tools
"The Dragon Book"
C-- is a C-like language designed as an intermediate target language for compilers. It eschews many of the syntactical complications that have caused ANSI C to get more complex over time.
ML-RISC - A framework for retargetable and optimizing compiler back ends.
MLRISC is a customizable optimizing back end written in Standard ML. It has been successfully retargeted to multiple architectures, notably IA-32, Alpha, PA-RISC, SPARC, PPC, and MIPS.
Available under the SML/NJ license.
ANTLR (ANother Tool for Language Recognition) is a parser framework for building recognizers, compilers, and translators in which the "actions" are written in C++ or Java.
bintrans is a dynamic binary translator. That means it runs programs on architectures they were not compiled for. It does this by dynamically translating the machine code of the programs to be run to machine code for the native architecture.
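The translate-on-demand idea can be sketched in Python as follows. This is not how bintrans is implemented: the guest instruction set (`LOAD`, `ADD`, `DJNZ`) is hypothetical, the "native code" is a Python closure rather than machine code, and translation happens per instruction, where a real translator would more plausibly work on larger blocks. What the sketch does show is the translation cache: each guest address is translated once and reused on later visits.

```python
def translate(instr):
    """Translate one guest instruction into a host callable returning the next pc."""
    op = instr[0]
    if op == "LOAD":  # LOAD reg, constant
        _, reg, value = instr

        def host(state, pc):
            state[reg] = value
            return pc + 1
    elif op == "ADD":  # ADD dst, src
        _, dst, src = instr

        def host(state, pc):
            state[dst] += state[src]
            return pc + 1
    elif op == "DJNZ":  # decrement reg, jump to target if the result is nonzero
        _, reg, target = instr

        def host(state, pc):
            state[reg] -= 1
            return target if state[reg] != 0 else pc + 1
    else:
        raise ValueError(f"unknown guest opcode {op!r}")
    return host


class Translator:
    def __init__(self, program):
        self.program = program
        self.cache = {}        # guest address -> translated host callable
        self.translations = 0  # how many instructions were actually translated

    def run(self, state):
        pc = 0
        while pc < len(self.program):
            host = self.cache.get(pc)
            if host is None:  # first visit: translate and remember
                host = translate(self.program[pc])
                self.cache[pc] = host
                self.translations += 1
            pc = host(state, pc)
        return state


# Guest program: a = 0; b = 5; n = 3; repeat { a += b } until --n == 0
PROGRAM = [
    ("LOAD", "a", 0),
    ("LOAD", "b", 5),
    ("LOAD", "n", 3),
    ("ADD", "a", "b"),
    ("DJNZ", "n", 3),
]
```

Running `Translator(PROGRAM).run({})` executes nine guest instructions but translates only five, because the loop body is served from the cache after its first pass.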
A parser generator for Python.