XTRAN Example — Restructure C, C++, Java, or C#
Scenario 1: You have used XTRAN to translate some assembly code to C, C++, Java, or C#. Because it came from assembler, the result is not well-structured.
Scenario 2: You have inherited some C, C++, Java, or C# code as the result of an acquisition, and it isn't in the best of shape, being rife with "threaded code".
XTRAN to the rescue!
The following example uses a rule set written
in XTRAN's "meta-code" (rules language) that
implements a set of code structuring algorithms to eliminate goto
statements in C, C++, Java, or C# by imposing if
,
else
, and for
statements. The algorithms use
recursive statement and expression pattern matching and replacement
capabilities provided by XTRAN's meta-code, involving
"match" statement patterns and "replace" statement
patterns.
We have developed additional patterns and replacements to eliminate even
more goto
s by imposing while
and do
statements as well.
The rules used for this example comprise 437 code lines of XTRAN's rules language. They are not very specific to C, C++, Java, or C#; they can easily be adapted to work for any 3GL language. In fact, we also have versions for COBOL and PL/I.
How can such powerful and generalized code quality improvement be achieved in less than 500 lines of rules? Because there is so much capability already available as part of XTRAN's rules language. The rules used for this example take advantage of the following functionality provided by that rules language:
- Statement and expression pattern matching and replacement
- Text manipulation
- Text formatting
- Access to XTRAN's Internal Representation (XIR)
- XIR's language-independent properties
- Navigation in XIR
- Creating new meta-functions written in meta-code, which we call user meta-functions
In the patterns below, this
indicates C / C++ / Java / C#
elements and <this>
indicates meta
(rules language) elements. Also, the matching of each occurrence
of <label>
in each pattern is
constrained in the rules by a condition — the matching label must have
only one GOTO
to it.
Structure with "if"
To impose if
s, we use the following "match" and
"replace" patterns.
The first if
"match" pattern has the form:
if (<expr>) goto <label>; <stmts> <label>: <stmt>
where <expr>
matches any
conditional expression, <label>
matches any label, <stmts>
match
one or more statements containing no if
at the top level and no
labels, and <stmt>
matches any
statement. Note that the second occurrence of <label>
must be the same symbol as the first
occurrence in order for the pattern to match.
The second if
"match" pattern is the same as the
first, except that it allows if
statements at the top level of
<stmts>
.
The if
"replace" pattern has the form:
if (!<expr>) <stmts> <label>: <stmt>
where <expr>
, <stmts>
, <label>
, and <stmt>
are whatever matched them in the
"match" patterns.
Structure with "if" and "else"
To impose if
/else
s, we use the following
"match" and "replace" patterns.
The if
/else
"match" pattern has the
form:
if (<expr>) { <stmts1> goto <label>; } <stmts2> <label>: <stmt>
The if
/else
"replace" pattern has the
form:
if (<expr>) <stmts1> else <stmts2> <label>: <stmt>
where <expr>
, <stmts1>
, <stmts2>
, <label>
, and <stmt>
are whatever matched them in the
"match" pattern.
Structure with "for"
To impose for
s, we use two pattern and replacement sets —
one set for "up-counting" for
s and one set for
"down-counting" for
s.
"Up-counting" for
The "up-counting" for
"match" pattern has
the form:
<index> = <init>; <label>: <stmts>; <index> += <increm>; if (<index> < <limit>) goto <label>;
where <index>
matches the index
variable, <init>
matches its
initial value, <label>
matches any
label, <stmts>
matches one or more
statements with no labels, <increm>
matches the index increment value,
and <limit>
matches the index
limit value. Note that the second occurrence of <label>
must be the same symbol as the first
occurrence in order for the pattern to match.
The "up-counting" for
"replace" pattern has
the form:
for (<index> = <init>; <index> <= <limit>; <index> += <increm>) <stmts>;
where <index>
, <init>
, <limit>
, <increm>
, and <stmts>
are whatever matched them in the
"match" pattern.
"Down-counting" for
The "down-counting" for
"match" pattern has
the form:
<index> = <init>; <label>: <stmts>; <index> -= <decrem>; if (<index> >= <limit>) goto <label>;
where <index>
matches the index
variable, <init>
matches its
initial value, <label>
matches any
label, <stmts>
matches one or more
statements with no labels, <decrem>
matches the index decrement value,
and <limit>
matches the index
limit value. Note that the second occurrence of <label>
must be the same symbol as the first
occurrence in order for the pattern to match.
The "down-counting" for
"replace" pattern
has the form:
for (<index> = <init>; <index> > <limit>; <index> -= <decrem>) <stmts>;
where <index>
, <init>
, <limit>
, <decrem>
, and <stmts>
are whatever matched them in the
"match" pattern.
Strategy
For each "match" pattern, the meta-code repeatedly scans the code to be processed, using a facility built into XTRAN's rules language that visits each statement (recursively) once in each pass, looking for matches and replacing them with the corresponding "replace" patterns, until it can find no more matches to be replaced. The meta-code also tells XTRAN to delete any label that has no references, at the end of each pass, so if the structuring deletes all references to a label, the label will disappear. This has the effect that a successful match and replacement can trigger more structuring on the next pass.
The meta-code also:
- Simplifies to
+=
and-=
expressions where possible, before attempting pattern matches. For example,a = 5 + a
becomesa += 5
. In addition to improving the quality of the code, this allows us to minimize the expression permutations in our match patterns. - Simplifies
!<expr>
in the restructured code where possible; for example,!(a < b)
becomesa >= b
. - Cleans up any
{}
block that contains only one statement, by eliminating the{}
around the statement. - Eliminates any statement labels if structuring has eliminated
all
goto
s to them. Note that doing so may trigger further structuring on the next round. - Preserves all comments on all affected statements, including deleted
statements, and automatically annotates comments in cases where the
meaning of a conditional expression is reversed by structuring, inserting
(rvsd)
if the condition is reversed and removing it if the same condition is reversed back to its original form by additional structuring. - Reports, in the run log, the number of
goto
s eliminated by each algorithm.
The rules are not very specific to C / C++ / Java / C#; they can easily be adapted for any language. In fact, we also have versions for COBOL and PL/I.
Process Flowchart
Here is a flowchart for this process, in which the elements are color coded:
- BLUE for XTRAN versions (runnable programs)
- ORANGE for XTRAN rules (text files)
- RED for
code - PURPLE for text data files
C / C++ / Java / C# code before structuring:
void func(void) /*void func(void)*/ { long i, j; /*long i, j;*/ /* * The following should structure to nested "if"s: */ lbl0: i = 2; /*lbl0: i = 2;*/ if (i == 0 || i == 1) /*if (i == 0 || i == 1)*/ goto lbl2; /*goto lbl2;*/ i = 1; /*i = 1;*/ if (i > 1) /*if (i > 1)*/ goto lbl1; /*goto lbl1;*/ i = 2; /*i = 2;*/ lbl1: i = 3; /*lbl1: i = 3;*/ lbl2: i = 4; /*lbl2: i = 4;*/ i = 5; /*lbl3: i = 5;*/ /* * The following should structure to an "if/else": */ if (i == 0 || i == 1) /*if (i == 0 || i == 1)*/ { /*start of "if" blk*/ i = 6; /*i = 6;*/ i = 7; /*i = 7;*/ goto lbl3; /*goto lbl3;*/ } /*end of "if" blk*/ i = 8; /*i = 8;*/ lbl3: i = 9; /*lbl3: i = 9;*/ /* * The following should structure to an up-counting "for" with "i" as the * index: */ i = 10; /*i = 10;*/ lbl4: j = i + 1; /*lbl4: j = i + 1;*/ j = i + 2; /*j = i + 2;*/ i += 3; /*i += 3;*/ if (i < 10) /*if (i < 10)*/ goto lbl4; /*goto lbl4;*/ j = i + 3; /*j = i + 3;*/ /* * The following should structure to a down-counting "for" with "i" as the * index. Note that, by the time we're attempting this match, we have * collapsed "i = i - 2" to "i -= 2". */ i = 11; /*i = 11;*/ lbl5: j = i + 1; /*lbl5: j = i + 1;*/ j = i + 2; /*j = i + 2;*/ i = i - 2; /*i = i - 2;*/ if (i >= 1) /*if (i >= 1)*/ goto lbl5; /*goto lbl5;*/ j = i + 3; /*j = i + 3;*/ /* * The reference in the following "goto" will prevent XTRAN from deleting lbl0: */ if (i < 1) /*if (i < 1)*/ goto lbl0; /*goto lbl0;*/ return; /*return*/ }
C / C++ / Java / C# code after structuring by XTRAN:
void func(void) /*void func(void)*/ { long i, j; /*long i, j;*/ /* * The following should structure to nested "if"s: */ lbl0: i = 2; /*lbl0: i = 2;*/ if (i != 0 && i != 1) /*(rvsd) if (i == 0 || i == 1)*/ { /*goto lbl2;*/ i = 1; /*i = 1;*/ if (i <= 1) /*(rvsd) if (i > 1)*/ { /*goto lbl1;*/ i = 2; /*i = 2;*/ } i = 3; /*lbl1: i = 3;*/ } i = 4; /*lbl2: i = 4;*/ i = 5; /*lbl3: i = 5;*/ /* * The following should structure to an "if/else": */ if (i == 0 || i == 1) /*if (i == 0 || i == 1)*/ { i = 6; /*i = 6;*/ i = 7; /*i = 7;*/ } else /*goto lbl3;*/ { i = 8; /*i = 8;*/ } i = 9; /*lbl3: i = 9;*/ /* * The following should structure to an up-counting "for" with "i" as the * index: */ for (i = 10; i < 10; i += 3) /*i = 10;*/ { j = i + 1; /*lbl4: j = i + 1;*/ j = i + 2; /*j = i + 2;*/ /*i += 3;*/ /*if (i < 10)*/ /*goto lbl4;*/ } j = i + 3; /*j = i + 3;*/ /* * The following should structure to a down-counting "for" with "i" as the * index. Note that, by the time we're attempting this match, we have * collapsed "i = i - 2" to "i -= 2". */ for (i = 11; i >= 1; i -= 2) /*i = 11;*/ { j = i + 1; /*lbl5: j = i + 1;*/ j = i + 2; /*j = i + 2;*/ /*i = i - 2;*/ /*if (i >= 1)*/ /*goto lbl5;*/ } j = i + 3; /*j = i + 3;*/ /* * The reference in the following "goto" will prevent XTRAN from deleting lbl0: */ if (i < 1) /*if (i < 1)*/ goto lbl0; /*goto lbl0;*/ return; /*return*/ }
Excerpt from run log:
Structuring rules eliminated 5 of 6 "goto"s (83%): 2 by inserting "if"s 1 by inserting "if/else"s 2 by inserting "for"s