ZPORT – An I/O Port Language for the Zilog Z8 Microcontroller
Gareth Scott - 13 September 2003
embeddedcontrollers@yahoo.com
A small language called ZPORT has been written to translate General I/O port requests for the Zilog Z8 microcontroller into C code. A formal definition of the language is provided here. This paper shows how to use the Free Software Foundation compiler tools flex and bison (also known as Lex and Yacc) to build the compiler from the language definition. The Z8! Encore evaluation board and ZDS II (Developer Studio) were used to help develop this language.
If you’re a reader of Circuit Cellar then you’re probably already familiar with a high-level language like C, Java or Basic. And you know that the compiler for these languages just translates the code you’ve written into another language – usually one closer to the target machine. The main reason these languages were written was to provide a higher level of abstraction for programmers to use in solving problems. This abstraction not only allows more complex problems to be solved but also provides a greater degree of safety.
An example of this can be seen in the Standard Template Library for C++ where you can specify a type safe stack in just one line of code. That one line of code, however, is expanded by the compiler into many lines of code that take care of all the housekeeping tasks like allocating memory when adding elements to the stack. Errors are also caught early on at compile time because the STL only allows elements of the stack type to be used. The language developed here doesn’t approach the complexity of the STL, but it does provide the same benefits of greater abstraction and catching errors early in the development cycle.
The Language
The ZPORT language is similar to the AWK language [4] which uses a “pattern {action}” paradigm. Like AWK, this language is declarative, rather than procedural like C and Java. In a declarative language there are no functions to call. You just state what you would like to happen given a certain input and the language goes and figures out how to do it. From a logical standpoint you can just think of everything happening simultaneously.
In ZPORT, the pattern is composed of input lines and the action is composed of outputs. An example is shown in fig. 1.
inputs
{d3} outputs
{e0, e1, e2, e3, e4, e7, g0, !g1, !g2, !g3, !g4, !g5, !g6} /*
If button d3 is high (up) then set all LED cathode lines 'e?' low and anode lines 'g?' high. */ d3
{!e0, !e1, !e2, !e3, !e4, g0, g1, g2, g3, g4, g5, g6}
The language also
contains a header listing input and output lines. Since ZPORT is targeted to
the Z8 development board, the allowable I/O ports are A through H. And since
each port has 7 lines, ZPORT uses a port letter (upper or lower case) followed
by a number to designate a specific line. The sample program in fig. 1 says
that the input line is, port D line 3. The outputs are port E lines 0-4 and 7
and port G line 0-6.
After listing all the I/O lines, ZPORT allows as many pattern {action} statements as you like. The sample program in
fig. 1 has just one. It says that when D3 is high, set E0-4 low and G0-6 high.
In the Z8 Development Kit, D3 is one of the push buttons and E0-4 are the
cathodes of the LED matrix. G0-6 are the anodes. This statement has the effect
of turning on all the LED dots when the button is up.
For each pattern{action} statement in the program, ZPORT does 2
others things behind the scenes. First, it automatically issues a complement pattern{action}. In the sample program of fig. 1
the complement is:
!d3 {e0, e1, e2, e3, e4, !g0, !g1, !g2, !g3, !g4, !g5, !g6}. This has the effect of turning off all the
LED segments. Without this statement the LED’s would stay lit.
The other action ZPORT takes is to toggle the E7 line low and then high
after each pattern action. This has the effect of latching the data in the
action statement into the first LED on the development board. Of course, since
the language you’ll be developing will probably be targeted at a different
hardware configuration, you can control these actions. I’ll show you how all of
this is accomplished in the following sections.
There are several steps involved in developing the language that takes us
from the program in fig. 1 to running it on the Z8. Taken in small steps you’ll
see that each one is quite manageable. A brief overview of each step will be
given here and then more detail will be provided in the following sections.
There’s a common misconception that compilers are line based. In other
words, that they look at a line of code and then translate that into another
language closer to the target machine. However, separating code into lines is
just a convenience for humans. A C or Java program can be written in one long
line of code. The compiler doesn’t care because the first thing it does is
split the code up into a long line of words, or tokens. This is called lexical
analysis or scanning and is the job of flex.
Just as words by themselves have no meaning, the same applies to tokens.
They have to be combined together using a grammar just as we combine words in a
sentence. The location of these words gives them a syntactic meaning and value.
The job of figuring out the syntax of a language is that of bison. The output
of flex is bison’s input.
The grammar that can be handled by bison is called LALR(1). This is a
subset of a specific type of languages called context free languages. The
acronym LALR(1) stands for Look-Ahead-Left-Right(1). This means that we’ll read
the tokens in a left to right fashion and at any time will only look ahead 1
token to figure out which grammar rule applies. More complicated languages can
be handled by looking ahead 2 or more tokens. However, there’s a series of
theoretical arguments that once you start down the slippery slope of looking
ahead more than 1 token that you’ll end up looking at n-tokens. And looking
ahead a possibly infinite number of tokens makes the compiler infinitely harder
to write.
Both flex and bison produce C code as output so this paper
assumes you have access to a C compiler. The Microsoft Visual C++ v6.0 compiler
was used in this project. Unlike MSVC++ and the Zilog Developer Studio, flex
and bison are command line tools. This means that that are run from a DOS
window.
Lexical Analysis
Flex performs lexical analysis by using a table-driven finite state machine. This mechanism is actually capable of parsing a very simple kind of language called regular expressions and can be used in its own right for some languages. The technique of using a finite state machine is shown in fig. 2.
The
token definitions for flex are in a file called port.l. There is an
exhaustive explanation of this file in the Anatomy of the
lexical analysis file, port.l section. A typical example of the
definition of an I/O port line is shown below.
line [A-Ha-h][0-7]
This definition says that whenever flex encounters one upper or lower case letter that is from A to H followed by exactly one number from 0 to 7, then it will return a line token.
Syntactic Analysis
Arguably the most interesting, and complicated, part of defining a new language takes place in the definition of the grammar used by bison. Rather than show a toy language, a slightly simplified version of the ZPORT language is shown in fig. 3. For the complete specification of ZPORT please refer to the port.y file.
1) program
-> input output
pattern_action 2) input
-> INPUTS
LBRACE line_sequence RBRACE 3) output
-> OUTPUTS
LBRACE LINE RBRACE 4)
line_sequence -> LINE
COMMA LINE 5)
pattern_action -> expr
LBRACE action RBRACE 6)
action-> LINE 7) expr
-> expr
AND term 8) expr
-> term 9) term
-> LPAREN
expr RPAREN 10) term
-> LINE
The first production rule says that a program is composed of the following sequence: input output pattern_action. Each of these terms is referred to as a non-terminal symbol, which, by convention, is written in lower-case. Terminal symbols are generally in upper case and refer to the tokens returned by the lexical scanner.
Bison uses a technique called bottom-up or shift-reduce parsing. Shifting refers to pushing a terminal symbol onto bison’s stack. Reducing refers to replacing a series of terminal and non-terminal symbols with the left-hand side of a production rule. It may seem backwards, but bison tries to end up with just the one start symbol on the stack. In this grammar it’s the non-terminal called program. Note that if given a choice, bison will shift rather than reduce.
inputs {d3, f6} outputs {e0} d3 && f6 {e0} 1)
inputs -
shift 2)
inputs {
-
shift 3)
inputs {
d3 -
shift 4)
inputs {
d3 , -
shift 5)
inputs {
d3, f6 -
shift 6)
inputs {
line_sequence -
reduce using rule 4 7)
inputs {
line_sequence } -
shift 8)
input -
reduce using rule 2 9)
input outputs - shift 10)
input outputs { - shift 11)
input outputs { e0 - shift 12)
input outputs { e0 } - shift 13)
input output -
reduce using rule 3 14)
input output d3 -
shift 15)
input output term -
reduce using rule 10 15)
input output expr -
reduce using rule 8 16)
input output expr AND - shift 17)
input output expr AND f6 - shift 18)
input output expr AND term - reduce using rule 10 19)
input output expr -
reduce using rule 7 20)
input output expr { -
shift 21)
input output expr { e0 - shift 22)
input output expr { action - reduce using rule 6 23)
input output expr { action } - shift 24)
input output pattern_action - reduce using rule 5 24)
program - reduce using rule 1
Of course, the example in fig. 4 only shows that the program parsed correctly. No code was emitted to translate the program into C code. That is done by actions that are associated with a production rule in the grammar. These actions are found in the braces after the production rules in port.y. In ZPORT most code is just emitted directly in the production rule. However, the expression that constitutes the pattern part of pattern{action} can contain logical operators nested in parentheses. For this, ZPORT builds a tree structure and walks the tree when a pattern_action is reduced. An example of this can be seen in fig. 5.
Debugging
If you run into problems developing a language there are several things you can do. The first is to look at the error or warning messages given by flex and bison. These can be very helpful and usually refer to the line number where the error occurred. In particular, bison will let you know if there are ambiguities in your language. If there are, you’ll need to tell bison how to resolve them.
Use the –v option in bison to turn on the output file, port.output. This will show you what bison thinks your grammar is and also tell you when bison is shifting or reducing. It’s useful to bench test your code. This is the process of looking at your program file and following along with port.output to see if the states that you think bison enters is actually what happens. Bison will tell you the state it’s in and whether it is shifting or reducing when you use the –t option.
When you use the -t option in bison to turn on debugging you’ll also need to add yydebug = 1; to the main() function in the bison input file, port.y. This will allow you to step into the port.tab.c file generated by bison and compiled by your C compiler. You can generally assume that the flex and bison code is working correctly and fortunately don’t need to debug this part. However, you can now set breakpoints inside the code you supplied in the user section of port.y.
Free Software Foundation
Bison is distributed under the GNU (which is recursively defined as: “GNU’s Not Unix”) General Public License Agreement. It’s actually one of the more readable copyright notices that you’ll ever come across. GNU is part of the Free Software Foundation. If you’re not familiar with the concept of free software, the copyright is a good introduction to its principles. Here’s an excerpt from the copyright agreement:
When we speak of
free software, we are referring to freedom, not price.
The “free” refers to granting access to your source code, not necessarily giving it away at no charge. Another principle of free software is that no patents should be applied to software, unless those patents allow everyone to freely use the software.
The Free Software Foundation would obviously like to encourage you to make any software your build free too. However, you are certainly allowed to include the output generated by bison in non-free, proprietary software.
Even if you don’t agree in principal to free software, I think you’ll agree that you’re getting a good deal from flex and bison. They will save you the tedium and complexity of writing your own scanner and parser.
One of the founding members of the Free Software Foundation is Richard Stallman. You’ll find that he can articulate the nuances of Free Software much better than I have here. In addition to writing code, he’s also written on many areas of software development. If you read the manual pages [2, 3] for flex and bison you’ll find that Richard Stallman and a small army of contributors have made these tools the success they are today.
In particular, Robert Corbet wrote most of bison and Vern Paxson wrote flex.
Enhancements and Caveats
ZPORT does not handle parsing errors in detail. The compiler will tell you that there’s been a syntactic error, but it currently won’t show you the context, or even line number, of the error. Also, due to the large amount of code generated by the compiler the large model is used for compiling the C code in the Zilog Developer Studio.
In addition, there’s currently nothing to stop 2 different patterns trying to switch the same I/O line in different directions. This could be prevented by additional constraints added to the language but is not implemented here.
Conclusion
There is a lot of capability in flex and bison that hasn’t been covered here. However, this article should provide a basis to explore these free compiler construction tools. Hopefully you’ve seen that it’s not only possible but quite a lot of fun to write your own language. Even though building a compiler is not a trivial task, it’s well within the grasp of most programmers familiar with a high level language. The benefits of productivity and safety are an added bonus. If, in the course of your work, you find yourself repeatedly writing similar code, I hope that you’ll consider writing your own language instead.
Anatomy of the lexical analysis file, port.l
%option noyywrap
%{
#include
<stdio.h>
#include
<stdlib.h>
#include <string.h>
%}
line [A-Ha-h][0-7]
lparen [(]
rparen [)]
lbrace [{]
rbrace [}]
comma [,]
bang [!]
or "||"
and "&&"
%%
{line} {yylval.string
= strdup(yytext); return LINE;}
outputs {return
OUTPUTS;}
inputs {return
INPUTS;}
{lbrace} {return
LBRACE;}
{rbrace} {return RBRACE;}
{lparen} {return LPAREN;}
{rparen} {return RPAREN;}
{comma} {return COMMA;}
{bang} {yylval.string
= strdup(yytext); return BANG;}
{or} {return
OR;}
{and}
{return AND;}
%%
/*
extern int yylex();
int main( argc, argv )
int argc;
char *argv[];
{
yyin = fopen( "\\dev\\parser\\port1\\port.def",
"r" );
return
yylex();
}
*/
Anatomy of the syntactic analysis
file, port.y
%{
#include
<stdio.h>
#include
<malloc.h>
#include
<string.h>
#include
"compiler.h"
void
yyerror (char* s);
int yylex(void);
void
pushInputLine(char* line);
void
pushOutputLine(char* line);
void
printBoilerplate(void);
%}
%union
{
int number;
char* string;
node* pNode;
}
%token
INPUTS OUTPUTS LBRACE RBRACE COMMA BANG LPAREN RPAREN
%token
<string> LINE BANG
%token
<int> AND OR
%type <string> line_level
%type <string> line_sequence
line_level_sequence
%type <string> pattern_action
%type <pNode> expr term action
%start
program
%% /*
Grammar rules and actions */
program:
input output pattern_action_list
;
input: INPUTS LBRACE line_sequence RBRACE
;
output:
OUTPUTS LBRACE line_level_sequence RBRACE {printf("\nwhile
(TRUE) {");}
;
line_sequence: LINE line_sequence_remainder {pushInputLine($1);}
;
line_sequence_remainder: /* empty */
|
COMMA
line_sequence
;
line_level_sequence: line_level line_level_sequence_remainder {pushOutputLine($1);}
;
line_level_sequence_remainder: /* empty */
|
COMMA
line_level_sequence
;
line_level: LINE
|
BANG LINE {$$ = $1; $$ = strcat($$, $2);}
;
pattern_action_list:
/* empty */
|
pattern_action_list pattern_action
;
pattern_action:
expr LBRACE action RBRACE {doPatternAction($1,
$3);}
;
action:
action COMMA line_level {$$ =
addNodeOperatorAction($1, $3);}
|
line_level {$$ = addNodeActionId($1);}
;
expr:
expr AND term {$$ = addNodeOperator(AND,
$1, $3);}
|
expr OR term {$$ =
addNodeOperator(OR, $1, $3);}
|
term
;
term:
LPAREN expr RPAREN {$$ = $2;}
|
line_level {$$ = addNodeId($1);}
;
%% /*
Additional C code */
#include
"lex.yy.c"
void
yyerror (char* s)
{
printf (" %s\n", s);
}
main
()
{
/* Initialize IO boilerplate code.
*/
FILE* stream =
fopen("\\dev\\parser\\port1\\boilerplate.c", "r" );
if (stream == NULL) {
printf("Error
opening boilerplate code.\n");
} else {
#define BUF_LEN (256)
char buffer[BUF_LEN];
while (fgets(buffer,
BUF_LEN, stream) != NULL) {
printf("%s",
buffer);
}
fclose(stream);
}
// To turn on debugging, make sure
the next line is uncommented and
//
turn on the -t (also use -v -l) options in bison.exe.
// yydebug = 1;
yyin
= fopen("\\dev\\parser\\port1\\port.def", "r" );
yyparse ();
// The closing braces for 'main()'
and 'while (TRUE)'.
printf("\n} /* while */");
printf("\n} /* main
*/\n");
}
User
defined code omitted here for brevity. See the source code for the full
listing.
How all the parts fit together to
build a compiler.
1. A Compact Guide to Lex and Yacc, Thomas Niemann, epaperpress.com
2. Bison 1.24 Man page, Charles Donnelly and Richard Stallman, Free Software Foundation, ISBN-1-882114-30-2
3. Flex 2.5.2 Man page, Vern Paxson
4. The AWK Programming Language, Alfred V. Aho, Brian W. Kernighan, Peter J. Weinberger, 1988, Addison-Wesley, ISBN 0-201-07981-X
Gareth Scott started programming with the Altair 8080 in 1975. He has degrees in Computer Science and Electrical Engineering from Sonoma State University and Cleveland State University respectively. He worked at Autodesk for 10 years as a programmer on AutoCAD. His interests are in distributed microcontroller systems and industrial control. You may reach him at embeddedcontrollers@yahoo.com.