ZPORT – An I/O Port Language for the Zilog Z8 Microcontroller

Gareth Scott  - 13 September 2003

embeddedcontrollers@yahoo.com

 

Abstract

A small language called ZPORT has been written to translate General I/O port requests for the Zilog Z8 microcontroller into C code. A formal definition of the language is provided here. This paper shows how to use the Free Software Foundation compiler tools flex and bison (also known as Lex and Yacc) to build the compiler from the language definition. The Z8! Encore evaluation board and ZDS II (Developer Studio) were used to help develop this language.

 

Motivation

If you’re a reader of Circuit Cellar then you’re probably already familiar with a high-level language like C, Java or Basic. And you know that the compiler for these languages just translates the code you’ve written into another language – usually one closer to the target machine. The main reason these languages were written was to provide a higher level of abstraction for programmers to use in solving problems. This abstraction not only allows more complex problems to be solved but also provides a greater degree of safety.

 

An example of this can be seen in the Standard Template Library for C++ where you can specify a type safe stack in just one line of code. That one line of code, however, is expanded by the compiler into many lines of code that take care of all the housekeeping tasks like allocating memory when adding elements to the stack. Errors are also caught early on at compile time because the STL only allows elements of the stack type to be used. The language developed here doesn’t approach the complexity of the STL, but it does provide the same benefits of greater abstraction and catching errors early in the development cycle.

 

The Language

The ZPORT language is similar to the AWK language [4] which uses a “pattern {action}” paradigm.  Like AWK, this language is declarative, rather than procedural like C and Java.  In a declarative language there are no functions to call. You just state what you would like to happen given a certain input and the language goes and figures out how to do it. From a logical standpoint you can just think of everything happening simultaneously.

 

In ZPORT, the pattern is composed of input lines and the action is composed of outputs.  An example is shown in fig. 1.

 

inputs {d3}

outputs {e0, e1, e2, e3, e4, e7, g0, !g1, !g2, !g3, !g4, !g5, !g6}

 

/* If button d3 is high (up) then set all LED cathode lines 'e?'

    low and anode lines 'g?' high. */

d3 {!e0, !e1, !e2, !e3, !e4, g0, g1, g2, g3, g4, g5, g6}

 

 
 

 

 

 

 

 

 

 

 

 


Fig. 1 – A simple program written in ZPORT

 

 

The language also contains a header listing input and output lines. Since ZPORT is targeted to the Z8 development board, the allowable I/O ports are A through H. And since each port has 7 lines, ZPORT uses a port letter (upper or lower case) followed by a number to designate a specific line. The sample program in fig. 1 says that the input line is, port D line 3. The outputs are port E lines 0-4 and 7 and port G line 0-6.

 

After listing all the I/O lines, ZPORT allows as many pattern {action} statements as you like. The sample program in fig. 1 has just one. It says that when D3 is high, set E0-4 low and G0-6 high. In the Z8 Development Kit, D3 is one of the push buttons and E0-4 are the cathodes of the LED matrix. G0-6 are the anodes. This statement has the effect of turning on all the LED dots when the button is up.

 

For each pattern{action} statement in the program, ZPORT does 2 others things behind the scenes. First, it automatically issues a complement pattern{action}.  In the sample program of fig. 1 the complement is:

  !d3 {e0, e1, e2, e3, e4, !g0, !g1, !g2, !g3, !g4, !g5, !g6}. This has the effect of turning off all the LED segments. Without this statement the LED’s would stay lit.

 

The other action ZPORT takes is to toggle the E7 line low and then high after each pattern action. This has the effect of latching the data in the action statement into the first LED on the development board. Of course, since the language you’ll be developing will probably be targeted at a different hardware configuration, you can control these actions. I’ll show you how all of this is accomplished in the following sections.

 

The Big Picture

There are several steps involved in developing the language that takes us from the program in fig. 1 to running it on the Z8. Taken in small steps you’ll see that each one is quite manageable. A brief overview of each step will be given here and then more detail will be provided in the following sections.

 

There’s a common misconception that compilers are line based. In other words, that they look at a line of code and then translate that into another language closer to the target machine. However, separating code into lines is just a convenience for humans. A C or Java program can be written in one long line of code. The compiler doesn’t care because the first thing it does is split the code up into a long line of words, or tokens. This is called lexical analysis or scanning and is the job of flex.

 

Just as words by themselves have no meaning, the same applies to tokens. They have to be combined together using a grammar just as we combine words in a sentence. The location of these words gives them a syntactic meaning and value. The job of figuring out the syntax of a language is that of bison. The output of flex is bison’s input.

 

The grammar that can be handled by bison is called LALR(1). This is a subset of a specific type of languages called context free languages. The acronym LALR(1) stands for Look-Ahead-Left-Right(1). This means that we’ll read the tokens in a left to right fashion and at any time will only look ahead 1 token to figure out which grammar rule applies. More complicated languages can be handled by looking ahead 2 or more tokens. However, there’s a series of theoretical arguments that once you start down the slippery slope of looking ahead more than 1 token that you’ll end up looking at n-tokens. And looking ahead a possibly infinite number of tokens makes the compiler infinitely harder to write.

 

Both flex and bison produce C code as output so this paper assumes you have access to a C compiler. The Microsoft Visual C++ v6.0 compiler was used in this project. Unlike MSVC++ and the Zilog Developer Studio, flex and bison are command line tools. This means that that are run from a DOS window.

 

Lexical Analysis

Flex performs lexical analysis by using a table-driven finite state machine. This mechanism is actually capable of parsing a very simple kind of language called regular expressions and can be used in its own right for some languages. The technique of using a finite state machine is shown in fig. 2.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 


Fig. 2 – Parsing the word ‘CAT’ with a finite state machine.

 

The token definitions for flex are in a file called port.l. There is an exhaustive explanation of this file in the Anatomy of the lexical analysis file, port.l section. A typical example of the definition of an I/O port line is shown below.

line                              [A-Ha-h][0-7]

This definition says that whenever flex encounters one upper or lower case letter that is from A to H followed by exactly one number from 0 to 7, then it will return a line token.

 

Syntactic Analysis

Arguably the most interesting, and complicated, part of defining a new language takes place in the definition of the grammar used by bison. Rather than show a toy language, a slightly simplified version of the ZPORT language is shown in fig. 3. For the complete specification of ZPORT please refer to the port.y file.

 

1) program ->                     input output pattern_action

2) input ->                            INPUTS LBRACE line_sequence RBRACE

3) output ->                          OUTPUTS LBRACE LINE RBRACE

4) line_sequence ->          LINE COMMA LINE 

5) pattern_action ->          expr LBRACE action RBRACE

6) action->                           LINE

7) expr ->                             expr AND term

8) expr ->                             term

9) term ->                             LPAREN expr RPAREN

10) term ->                           LINE   

 

 
 

 

 

 

 

 

 

 

 

 

 

 

 

 


Fig. 3 – A simplified version of the ZPORT grammar in Bachus Naur Form.

 

 

 

 

 

 

 

The first production rule says that a program is composed of the following sequence: input output pattern_action. Each of these terms is referred to as a non-terminal symbol, which, by convention, is written in lower-case. Terminal symbols are generally in upper case and refer to the tokens returned by the lexical scanner.

 

Bison uses a technique called bottom-up or shift-reduce parsing. Shifting refers to pushing a terminal symbol onto bison’s stack. Reducing refers to replacing a series of terminal and non-terminal symbols with the left-hand side of a production rule. It may seem backwards, but bison tries to end up with just the one start symbol on the stack. In this grammar it’s the non-terminal called program. Note that if given a choice, bison will shift rather than reduce.

inputs {d3, f6}

outputs {e0}

d3 && f6 {e0}

 

1)     inputs                                                - shift

2)     inputs {                                              - shift

3)     inputs { d3                                        - shift

4)     inputs { d3 ,                                      - shift

5)     inputs { d3, f6                                  - shift

6)     inputs { line_sequence                - reduce using rule 4

7)     inputs { line_sequence }              - shift

8)     input                                                  - reduce using rule 2

9)     input outputs                                  - shift

10)  input outputs {                              - shift

11)  input outputs { e0                         - shift

12)  input outputs { e0 }                      - shift

13)  input output                                    - reduce using rule 3

14) input output d3                              - shift

15) input output term                           - reduce using rule 10

15) input output expr                           - reduce using rule 8

16) input output expr AND                 - shift

17) input output expr AND f6            - shift

18) input output expr AND term       - reduce using rule 10

19) input output expr                           - reduce using rule 7

20) input output expr {                        - shift

21) input output expr { e0                  - shift

22) input output expr { action           - reduce using rule 6

23) input output expr { action }        - shift

24) input output pattern_action       - reduce using rule 5

24) program                                          - reduce using rule 1

 

 

 

 

 

 
 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 


Fig. 4 – A ZPORT program and the bison stack that results from applying the grammar in fig. 3.

 

 

Of course, the example in fig. 4 only shows that the program parsed correctly. No code was emitted to translate the program into C code. That is done by actions that are associated with a production rule in the grammar. These actions are found in the braces after the production rules in port.y.  In ZPORT most code is just emitted directly in the production rule. However, the expression that constitutes the pattern part of pattern{action} can contain logical operators nested in parentheses. For this, ZPORT builds a tree structure and walks the tree when a pattern_action is reduced. An example of this can be seen in fig. 5.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 


Fig. 5 – The tree representation of the pattern expression: d3 && (f6 || f7)

 

Debugging

If you run into problems developing a language there are several things you can do. The first is to look at the error or warning messages given by flex and bison. These can be very helpful and usually refer to the line number where the error occurred. In particular, bison will let you know if there are ambiguities in your language. If there are, you’ll need to tell bison how to resolve them.

 

Use the –v option in bison to turn on the output file, port.output. This will show you what bison thinks your grammar is and also tell you when bison is shifting or reducing. It’s useful to bench test your code. This is the process of looking at your program file and following along with port.output to see if the states that you think bison enters is actually what happens. Bison will tell you the state it’s in and whether it is shifting or reducing when you use the –t option.

 

When you use the -t option in bison to turn on debugging you’ll also need to add yydebug = 1; to the main() function in the bison input file, port.y. This will allow you to step into the port.tab.c file generated by bison and compiled by your C compiler. You can generally assume that the flex and bison code is working correctly and fortunately don’t need to debug this part. However, you can now set breakpoints inside the code you supplied in the user section of port.y.

 

Free Software Foundation

Bison is distributed under the GNU (which is recursively defined as: “GNU’s Not Unix”) General Public License Agreement. It’s actually one of the more readable copyright notices that you’ll ever come across. GNU is part of the Free Software Foundation. If you’re not familiar with the concept of free software, the copyright is a good introduction to its principles. Here’s an excerpt from the copyright agreement:

 

  When we speak of free software, we are referring to freedom, not price.

 

The “free” refers to granting access to your source code, not necessarily giving it away at no charge. Another principle of free software is that no patents should be applied to software, unless those patents allow everyone to freely use the software.

 

The Free Software Foundation would obviously like to encourage you to make any software your build free too. However, you are certainly allowed to include the output generated by bison in non-free, proprietary software.

Even if you don’t agree in principal to free software, I think you’ll agree that you’re getting a good deal from flex and bison. They will save you the tedium and complexity of writing your own scanner and parser.

 

One of the founding members of the Free Software Foundation is Richard Stallman. You’ll find that he can articulate the nuances of Free Software much better than I have here. In addition to writing code, he’s also written on many areas of software development. If you read the manual pages [2, 3] for flex and bison you’ll find that Richard Stallman and a small army of contributors have made these tools the success they are today.

In particular, Robert Corbet wrote most of bison and Vern Paxson wrote flex.

 

Enhancements and Caveats

ZPORT does not handle parsing errors in detail. The compiler will tell you that there’s been a syntactic error, but it currently won’t show you the context, or even line number, of the error. Also, due to the large amount of code generated by the compiler the large model is used for compiling the C code in the Zilog Developer Studio.

In addition, there’s currently nothing to stop 2 different patterns trying to switch the same I/O line in different directions. This could be prevented by additional constraints added to the language but is not implemented here.

 

Conclusion

There is a lot of capability in flex and bison that hasn’t been covered here. However, this article should provide a basis to explore these free compiler construction tools. Hopefully you’ve seen that it’s not only possible but quite a lot of fun to write your own language. Even though building a compiler is not a trivial task, it’s well within the grasp of most programmers familiar with a high level language. The benefits of productivity and safety are an added bonus. If, in the course of your work, you find yourself repeatedly writing similar code, I hope that you’ll consider writing your own language instead.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Anatomy of the lexical analysis file, port.l

 

Rounded Rectangular Callout: We don’t want to start using another input file when we’re done with ‘port.def’ so we’ll say ‘noyywrap’.
 


%option noyywrap

 

%{

Rounded Rectangular Callout: Any include files used by the user code section below should be stated here.#include <stdio.h>

Rounded Rectangular Callout: A port line is defined by the port letter (‘a’ through ‘h’) and a number (0 through 7). A valid port line is ‘a3’ and ‘A3’ (the same line). An invalid port line is ‘i3’ or ‘a8’.#include <stdlib.h>

#include <string.h>

Rounded Rectangular Callout: The  token name.%}

 

line                              [A-Ha-h][0-7]

Rounded Rectangular Callout: A left brace token is just the ‘{‘ symbol.lparen                         [(]

rparen                         [)]

lbrace                         [{]

Rounded Rectangular Callout: An ‘or’ token is 2 vertical bars next to each other.rbrace                         [}]

Rounded Rectangular Callout: Separates the definitions section (above) and the rules section (below).comma                        [,]

bang                           [!]

or                                 "||"

Rounded Rectangular Callout: This refers to the definition of ‘line’ in the previous section.and                              "&&"

 

%%

{line}                {yylval.string = strdup(yytext); return LINE;}

outputs                        {return OUTPUTS;}

Rounded Rectangular Callout: The lexical analyzer returns terminal symbols to the syntactical analyzer. As the name implies, a terminal symbol is the end of the line where the syntactical analyzer needs to either shift (push on the syntax stack) or reduce (use a grammar rule to replace terminal and non-terminal characters with a non-terminal) the syntax.inputs             {return INPUTS;}

{lbrace}           {return LBRACE;}

{rbrace}           {return RBRACE;}

{lparen}           {return LPAREN;}

{rparen}           {return RPAREN;}

{comma}          {return COMMA;}

{bang}             {yylval.string = strdup(yytext); return BANG;}

{or}                   {return OR;}

{and}                {return AND;}

Rounded Rectangular Callout: Separates the rules section (above) and the user defined code section (below). 


%%

Rounded Rectangular Callout: When you use this file as input to ‘flex.exe’ it produces the file ‘lex.yy.c’ which contains the function ‘yylex()’ but not ‘main()’ – you need to supply that./*

extern int yylex();

Rounded Rectangular Callout: Change the path to match the location of your input file.
 


int main( argc, argv )

int argc;

char *argv[];

            {

             yyin = fopen( "\\dev\\parser\\port1\\port.def", "r" );

            return yylex();

            }

Rounded Rectangular Callout: Remove  this and the matching comment  above to test the lexical portion of your language. Normally, the syntactic analyzer (produced by ‘bison.exe’) will use this as input and you can just comment out this portion of the code.*/

 

 

 

 

 

 

 

 

 

 

 

Anatomy of the syntactic analysis file, port.y

Rounded Rectangular Callout: The C declarations used by the user code section at the end of this file are put here.
 


%{

#include <stdio.h>

#include <malloc.h>

#include <string.h>

#include "compiler.h"

 

void yyerror (char* s);

int  yylex(void);

void pushInputLine(char* line);

void pushOutputLine(char* line);

Rounded Rectangular Callout: The syntactic types of terminal and non-terminals can be one of these types: a number, string or pNode. If you omit this union then every terminal and non-terminal is considered a number. We define pNode in ‘compiler.h’ as follows:

typedef struct thisNode {
	operandType operand;	// &&, ||
	char   idValue[4];	// Only used when operand = enumId
	struct thisNode* pLeft;
	struct thisNode* pRight;
} node;
void printBoilerplate(void);

 

%}

 

%union {

            int number;

            char* string;

            node* pNode;

}

 

%token INPUTS OUTPUTS LBRACE RBRACE COMMA BANG LPAREN RPAREN

%token <string> LINE BANG

%token <int>   AND OR

%type  <string> line_level

%type  <string> line_sequence line_level_sequence

%type  <string> pattern_action

Rounded Rectangular Callout: This tells bison that every program in this language consists of the non-terminal syntactic value ‘program’. Note that ‘program’ is defined in the next section for grammar rules.%type  <pNode>  expr term action

Rounded Rectangular Callout: The grammar rules for the language you’re writing start in this section. This is arguably the most important part of the entire process. A badly written grammar will cause endless problems.
 


%start program

 

%% /* Grammar rules and actions */

program:         input output pattern_action_list

;

Rounded Rectangular Callout: This production rule says that a ‘program’ is composed of ‘input’ followed by ‘ouput’ and then by a ‘pattern_action_list’.  These are non-terminal symbols (lower case by convention) so we need to look to further rules to see what an ‘input’ etc. is really made of. 


input:   INPUTS LBRACE line_sequence RBRACE

;

 

output: OUTPUTS LBRACE line_level_sequence RBRACE          {printf("\nwhile (TRUE) {");}

;

Rounded Rectangular Callout: The code within the braces is executed when this production rule is reached. Without any code a default rule {$$ = $1;} is executed. This default rule says that the symbol on the left-hand side of the colon is set to the first symbol on the right-hand side. 


line_sequence:            LINE line_sequence_remainder          {pushInputLine($1);}

;

 

line_sequence_remainder:     /* empty */

                                                |

                                                COMMA line_sequence

;

 

line_level_sequence:  line_level line_level_sequence_remainder    {pushOutputLine($1);} 

;

Rounded Rectangular Callout: A ‘line_level_sequence_remainder’ can be empty …
 


line_level_sequence_remainder:       /* empty */

                                                            |

Rounded Rectangular Callout: … or it can be a ‘,’ followed by a ‘line_level_sequence’.                                                            COMMA line_level_sequence

Rounded Rectangular Callout: A production rule composed of terminal symbols from the lexical analyzer. Note that the second rule concatenates the ‘!’ with the port line so you can have lines like ‘!a3’ as well as ‘h2’.;

 

line_level:        LINE       

                        |

                        BANG LINE       {$$ = $1; $$ = strcat($$, $2);}   

Rounded Rectangular Callout: Notice how a ‘pattern_action_list’ is defined in terms of itself. This is a recursive definition and in particualar, it’s left-recursive since ‘pattern_action_list’ is on the left-hand side. This uses less stack space and is the recommended style in ‘bison’.;

 

pattern_action_list: /* empty */

|

pattern_action_list pattern_action    

;

 

pattern_action: expr LBRACE action RBRACE {doPatternAction($1, $3);}

;

Rounded Rectangular Callout: When we reach this production rule we’ve got a complete ‘pattern {action}’ sequence and call ‘doPatternAction()’ to output the expression and it’s action. 


action: action COMMA line_level      {$$ = addNodeOperatorAction($1, $3);}

            |

            line_level         {$$ = addNodeActionId($1);}

;

 

expr: expr AND term   {$$ = addNodeOperator(AND, $1, $3);}

  |

              expr OR term {$$ = addNodeOperator(OR, $1, $3);}

              |

              term

;

 

term: LPAREN expr RPAREN     {$$ = $2;}         

|

line_level         {$$ = addNodeId($1);}

;

Rounded Rectangular Callout: All the code below this %% marker is supplied by you, the language author. 

 


%% /* Additional C code */

 

Rounded Rectangular Callout: This is the output file generated by ‘flex.exe’ run on the ‘port.l’ input file. Rather than copy the entire file here we just use this C mechanism to include the file for compilation. 


#include "lex.yy.c"

 

void yyerror (char* s)

Rounded Rectangular Callout: Along with ‘main()’ you need to supply ‘yyerror()’. This function is called by the compiler if there is an error parsing the input language.{

            printf (" %s\n", s);

}

Rounded Rectangular Callout: This file contains all the Zilog specific code to setup I/O lines and then read or set the values of those lines.
 


main ()

{

            /* Initialize IO boilerplate code. */

            FILE* stream = fopen("\\dev\\parser\\port1\\boilerplate.c", "r" );

            if (stream == NULL) {

                        printf("Error opening boilerplate code.\n");

            } else {

                        #define BUF_LEN (256)

                        char buffer[BUF_LEN];

                        while (fgets(buffer, BUF_LEN, stream) != NULL) {

                                    printf("%s", buffer);

                        }

                        fclose(stream);

            }

 

            // To turn on debugging, make sure the next line is uncommented and

            //  turn on the -t (also use -v -l) options in bison.exe.

            // yydebug = 1;

yyin = fopen("\\dev\\parser\\port1\\port.def", "r" );

            yyparse ();

            // The closing braces for 'main()' and 'while (TRUE)'.

            printf("\n} /* while */");

            printf("\n} /* main */\n");

}

 

User defined code omitted here for brevity. See the source code for the full listing.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

How all the parts fit together to build a compiler.

 

 

 

Vertical Scroll: port.l
Token definitions
Flowchart: Alternate Process: flex.exe port.l
Vertical Scroll: lex.yy.c
Lexical analysis
Vertical Scroll: port.y
Grammar definitions
Vertical Scroll: port_tab.c
Lexical and syntactic analysis
Rounded Rectangle: bison.exe –v –l port.y
Flowchart: Alternate Process: C compiler
Rounded Rectangle: Zilog ZDS II Z8 Encore! compiler
Vertical Scroll: output.c
New language compiled to C code
Rounded Rectangle: porty.exe
Vertical Scroll: port.def
New language goes here
Explosion 2: output.lod
Can be fed to the Z8! Encore
 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 


Sources

1.   A Compact Guide to Lex and Yacc, Thomas Niemann, epaperpress.com

 

2. Bison 1.24 Man page, Charles Donnelly and Richard Stallman, Free Software Foundation, ISBN-1-882114-30-2

 

3. Flex 2.5.2 Man page, Vern Paxson

 

4. The AWK Programming Language, Alfred V. Aho, Brian W. Kernighan, Peter J. Weinberger, 1988, Addison-Wesley, ISBN 0-201-07981-X

 

About the author

Gareth Scott started programming with the Altair 8080 in 1975. He has degrees in Computer Science and Electrical Engineering from Sonoma State University and Cleveland State University respectively.  He worked at Autodesk for 10 years as a programmer on AutoCAD. His interests are in distributed microcontroller systems and industrial control. You may reach him at embeddedcontrollers@yahoo.com.