XTRAN Example — Translate Labeled Structures in HP (Digital, Compaq) VAX MACRO to C

One way to declare a structure in VAX MACRO is in the following form:

	<lbl1>:
	[<lbl>:] <opr> ...
	...
	<name>=.-<lbl1>		;OPTIONAL -- LENGTH OF STRUCTURE

where:

        <lbl1>  is any legal symbol in the source language
        <lbl>   is any legal symbol in the source language
        <opr>   is one of:  .ASCII, .BLK{B|W|L}, .BYTE,
                            .WORD, .LONG, .ADDRESS

An example of this usage:

STR:
MEMB1:	.BLKB	 3
	.ASCII	 /ABC/
	.ASCII	 ?DEF?
MEMB2:	.LONG	 2,3
	.BLKL	 1
STRSIZ	=	 .-STR

(Note the use of "fillers" — members with no names.)

Normally, XTRAN would try to translate this as a structure of unknown type named memb1 plus a long array named memb2, and it would be unable to translate the = statement because of the Program Counter reference.

We have created a set of XTRAN rules ("meta-code") to translate this kind of structure declaration, which we call a LBLSTR (labeled structure). These rules, which comprise just over 400 non-comment lines of meta-code, are input to XTRAN after the normal VAX MACRO to C translation rules we provide. They implement the following strategy:

Because we can't reliably determine the start and end of a LBLSTR from the original code, we require a marker at the start of each LBLSTR declaration. If the declaration doesn't end with an = statement, we also require a marker at its end. These markers are embedded in comments that XTRAN will delete automatically.
We post rules to be evaluated after reading and parsing all of our input, but before starting translation. These rules will scan all VAX MACRO code to identify LBLSTR struct declarations, based on the markers we require. When they find one, they will change the operator type of each statement of the LBLSTR (including the ending = if any) to "user data". This prevents XTRAN from trying to combine the LBLSTR's instructions, and also activates our special rules for these operators. They will also mark each such statement to suppress automatic passthrough of its labels and comments, since we will be handling them explicitly in our rules.
We post a translation rule for each of the statement operators that can legally belong to a LBLSTR, overriding that operator's normal rule:
- If it has been marked as "user data", meaning it is part of a LBLSTR:
  - If it is the first member of the LBLSTR, the rule will record its first (or only) label as the structure's name.
  - The rule will record information we'll need to declare it as a member of its LBLSTR structure. This includes its name, its data type, its dimension if any, its initial values if any, and its comments if any.
  - If the member has no name (or it's the first member and we used its name for the structure), the rule will generate filler<n> as the member name, where <n> starts from 1 for each LBLSTR.
  - The rule will generate no target code directly from the statement.
  - If the statement is marked as the end of a LBLSTR structure, the rule will use the previously recorded member information (including the information from this statement) to generate a C declaration and initialization of the structure.
- If the statement isn't marked as "user data", the rule will translate it normally.
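
The bookkeeping described in this rule can be pictured with the following C sketch. It is not XTRAN meta-code and is not taken from our rules; the types and functions (struct member, struct lblstr, record_member, emit_declaration) are hypothetical names chosen for illustration, and comments and initial values are left out to keep it short. It shows only the naming logic: the first member's label becomes the structure's name, unnamed members get generated filler names, and the declaration is produced once the end of the structure is reached.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_MEMBERS 16
#define NAME_LEN    32

/* Information recorded for one statement of a labeled structure. */
struct member {
    char name[NAME_LEN];  /* the statement's label, or a generated filler<n> */
    const char *type;     /* C type chosen for the operator, e.g. "char" */
    int dim;              /* array dimension, or 0 for a scalar */
};

/* Everything recorded so far for the structure being collected. */
struct lblstr {
    char name[NAME_LEN];  /* taken from the first member's label */
    struct member members[MAX_MEMBERS];
    int count;            /* members recorded so far */
    int fillers;          /* filler names generated so far */
};

/* Record one statement; label is NULL when the statement has none. */
static void record_member(struct lblstr *s, const char *label,
                          const char *type, int dim)
{
    struct member *m;

    assert(s->count < MAX_MEMBERS);
    m = &s->members[s->count];
    if (s->count == 0 && label != NULL) {
        /* The first member's label names the structure, not the member. */
        snprintf(s->name, sizeof s->name, "%s", label);
        label = NULL;
    }
    if (label == NULL)
        snprintf(m->name, sizeof m->name, "filler%d", ++s->fillers);
    else
        snprintf(m->name, sizeof m->name, "%s", label);
    m->type = type;
    m->dim = dim;
    s->count++;
}

/* At the end of the structure, write its C declaration into buf. */
static void emit_declaration(const struct lblstr *s, char *buf, size_t size)
{
    size_t used;
    int i;

    assert(size > 0);
    snprintf(buf, size, "struct %s_str {\n", s->name);
    for (i = 0; i < s->count; i++) {
        const struct member *m = &s->members[i];

        used = strlen(buf);
        if (m->dim > 0)
            snprintf(buf + used, size - used, "    %s %s[%d];\n",
                     m->type, m->name, m->dim);
        else
            snprintf(buf + used, size - used, "    %s %s;\n",
                     m->type, m->name);
    }
    used = strlen(buf);
    snprintf(buf + used, size - used, "};\n");
}
```

Starting from a zero-initialized struct lblstr, recording ("strct2", "long", 2), then (NULL, "char", 3), then ("memb4", "short", 5) and calling emit_declaration yields a declaration of struct strct2_str with members filler1, filler2 and memb4, in that order.
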
We post a translation rule for each = statement, overriding the normal rule for =:
- If it has been marked as "user data", meaning that it is the end of a LBLSTR structure, the rule will use the member information we previously recorded to generate a C declaration and initialization of the structure, and will then translate the = using sizeof().
- If the statement isn't marked as "user data", the rule will translate it normally.
Our rules will first create a declaration of the desired C structure with a struct tag, and then an allocation of that struct type, with the given name and with appropriate initial values.
Our rules will also tell XTRAN that the C structure is an allocation of the declared struct type, so that XTRAN will automatically qualify all member name references properly. The struct tag is required in order for this to work.
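
As a small hand-written illustration of the result this strategy aims for (not XTRAN output), consider a hypothetical two-statement LBLSTR. The labels REC, CNT and RECSIZ are invented for this sketch, and mapping .BYTE to char is our assumption here. The sketch shows the three pieces described above: a struct declaration with a tag, an allocation of that type with initial values, and the length equate expressed with sizeof().

```c
#include <assert.h>

/*
 * Hypothetical VAX MACRO input for this sketch:
 *
 *      ;(LBLSTR start marker)
 *      REC::   .BYTE   1
 *      CNT:    .BYTE   2,3
 *      RECSIZ  =       .-REC
 */

/* Step 1: declaration of the structure, with a struct tag. */
struct rec_str
    {
    char filler1;               /* first member; its label named the structure */
    char cnt[2];
    };

/* Step 2: allocation of that struct type, with initial values. */
struct rec_str rec =
    {
    1,                          /* filler1 */
    { 2, 3 }                    /* cnt */
    };

/* Step 3: the length equate becomes a sizeof() of the allocation. */
#define RECSIZ sizeof(rec)

/* A reference such as CNT+1 becomes a qualified member reference. */
static char count_at(int i)
{
    assert(i >= 0 && i < 2);
    return rec.cnt[i];
}
```

With only char members, compilers in practice add no padding, so RECSIZ here equals the 3 bytes that .-REC would give. Members of wider types can introduce alignment padding, in which case sizeof() may be larger than the assembler's byte count.
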

The following input to, and output from, XTRAN are untouched except for added commentary in the input and paraphrasing of the LBLSTR start and end markers.

Process Flowchart

[Flowchart of this process, with color-coded elements: blue for XTRAN versions (runnable programs), orange for XTRAN rules (text files), red for code.]

Input to XTRAN:

MB4DIM	=	5				;MB4DIM =      5

ARR:	.ASCII	/XYZ/				;ARR:   .ASCII /XYZ/
ARRSIZ	=	.-ARR				;ARRSIZ = .-ARR

(Note that the following struct is global. It ends with an equate, so we don't need to mark its end.)

;(LBLSTR start marker)
STRCT1::
	.ASCII	/ABC/<9>/DEF/			;       .ASCII /ABC/<9>/DEF/;
MEMB1:	.ASCII	/GHI/				;MEMB1: .ASCII /GHI/;
MEMB2:	.BLKW	1				;MEMB2: .BLKW  1
	.ASCII	/JKL/				;       .ASCII /JKL/;
MEMB3:	.ASCII	/YY/				;MEMB3: .ASCII /YY/;
	.LONG	2,3				;       .LONG  2,3
S1SIZ	=	.-STRCT1			;S1SIZ	=      .-STRCT1

WS1:	.LONG	3				;WS1:   .LONG  3

(Note that the following struct is local. It doesn't end with an equate, so we need to mark its end.)

;(LBLSTR start marker)
STRCT2: .LONG	2,5				;STRCT2:.LONG  2,5
	.ASCII	?XYZ?				;       .ASCII ?XYZ?
MEMB4:	.BLKW	MB4DIM				;MEMB4: .BLKW  MB4DIM
						;(LBLSTR end marker)

WS2:	.BLKW	10				;WS2:   .BLKW  10

CODE:	MOVAL	STRCT1,R3			;       MOVAL  STRCT1,R3
	MOVB	MEMB3+1,R4			;       MOVB   MEMB3+1,R4
	MOVW	MEMB4(R1),MEMB2			;       MOVW   MEMB4(R1),MEMB2
	.END

Output from XTRAN:

	extern long *r3;
	extern char r4;
	extern long r1;

#define MB4DIM 5				/*mb4dim =      5*/

	static char arr[4] = "XYZ";		/*arr:   .ascii /xyz/*/
#define ARRSIZ sizeof(arr)			/*arrsiz = .-arr*/

	struct strct1_str
	    {
	    char filler1[7];

						/*       .ascii /abc/<9>/def/;*/
	    char memb1[3];			/*memb1: .ascii /ghi/;*/
	    short memb2;			/*memb2: .blkw	1*/
	    char filler2[3];			/*       .ascii /jkl/;*/
	    char memb3[2];			/*memb3: .ascii /yy/;*/

	    long filler3[2];			/*       .long	2,3*/
	    };
	    
	struct strct1_str strct1 = 
	    {
	    "ABC\tDEF",				/*filler1*/
	    "GHI",				/*memb1*/
	    0,					/*memb2*/
	    "JKL",				/*filler2*/
	    "YY",				/*memb3*/
	    { 2, 3 }				/*filler3*/

	    };
#define S1SIZ sizeof(strct1)			/*s1siz	 =       .-strct1*/

	static long ws1 = 3;			/*ws1:   .long   3*/

	struct strct2_str
	    {
	    long filler1[2];

						/*strct2:.long	2,5*/
	    char filler2[3];			/*       .ascii ?xyz?*/
	    short memb4[MB4DIM];		/*memb4: .blkw  mb4dim*/
	    };



	    
	static struct strct2_str strct2 = 
	    {
	    { 2, 5 },				/*filler1*/
	    "XYZ",				/*filler2*/
	    { 0 }				/*memb4*/

	    };
	static short ws2[10];			/*ws2:   .blkw  10*/


code:
	r3 = (long *) &strct1;			/*       moval  strct1,r3*/
	r4 = strct1.memb3[1];			/*       movb   memb3+1,r4*/
	strct1.memb2 = *((short *) ((char *)
	  strct2.memb4 + r1));			/*       movw   memb4(r1),memb2*/
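
The casts in the last statement above reflect VAX displacement addressing: in MEMB4(R1), the contents of R1 are added to the address of MEMB4 as an unscaled byte offset, so the C code steps through the member as bytes before reading a 16-bit word. The following hand-written sketch (not XTRAN output) isolates that expression. The names demo_str, demo and load_word are hypothetical, and the bounds and alignment checks are our additions to keep the sketch well defined in C.

```c
#include <assert.h>
#include <stddef.h>

/* A cut-down stand-in for the structure holding memb4. */
struct demo_str
    {
    short memb4[5];
    };

static struct demo_str demo =
    {
    { 10, 20, 30, 40, 50 }      /* memb4 */
    };

/*
 * Equivalent of reading the word operand MEMB4(R1):
 * r1 is a byte offset from the start of memb4, not an element index.
 */
static short load_word(long r1)
{
    /* Stay inside the array and on a short boundary. */
    assert(r1 >= 0);
    assert((size_t) r1 + sizeof(short) <= sizeof demo.memb4);
    assert((size_t) r1 % sizeof(short) == 0);

    return *((short *) ((char *) demo.memb4 + r1));
}
```

A byte offset of 2 * sizeof(short) therefore selects the third element, memb4[2], which is why the translation cannot simply write memb4[r1].
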