XTRAN Example — Restructure C, C++, Java, or C#

Scenario 1:  You have used XTRAN to translate some assembly code to C, C++, Java, or C#.  Because it came from assembler, the result is not well-structured.

Scenario 2:  You have inherited some C, C++, Java, or C# code as the result of an acquisition, and it isn't in the best of shape, being rife with "threaded code".

XTRAN to the rescue!

The following example uses a rule set written in XTRAN's "meta-code" (rules language) that implements a set of code structuring algorithms to eliminate goto statements in C, C++, Java, or C# by imposing if, else, and for statements.  The algorithms use recursive statement and expression pattern matching and replacement capabilities provided by XTRAN's meta-code, involving "match" statement patterns and "replace" statement patterns.

We have developed additional patterns and replacements to eliminate even more gotos by imposing while and do statements as well.

The rules used for this example comprise 437 code lines of XTRAN's rules language.  They are not very specific to C, C++, Java, or C#; they can easily be adapted to work for any 3GL language.  In fact, we also have versions for COBOL and PL/I.

How can such powerful and generalized code quality improvement be achieved in less than 500 lines of rules?  Because there is so much capability already available as part of XTRAN's rules language.  The rules used for this example take advantage of the following functionality provided by that rules language:

In the patterns below, this indicates C / C++ / Java / C# elements and <this> indicates meta (rules language) elements.  Also, the matching of each occurrence of <label> in each pattern is constrained in the rules by a condition — the matching label must have only one GOTO to it.

Structure with "if"

To impose ifs, we use the following "match" and "replace" patterns.

The first if "match" pattern has the form:

        if (<expr>)
            goto <label>;
        <stmts>
<label>:
        <stmt>

where <expr> matches any conditional expression, <label> matches any label, <stmts> match one or more statements containing no if at the top level and no labels, and <stmt> matches any statement.  Note that the second occurrence of <label> must be the same symbol as the first occurrence in order for the pattern to match.

The second if "match" pattern is the same as the first, except that it allows if statements at the top level of <stmts>.

The if "replace" pattern has the form:

        if (!<expr>)
            <stmts>
<label>:
        <stmt>

where <expr>, <stmts>, <label>, and <stmt> are whatever matched them in the "match" patterns.

Structure with "if" and "else"

To impose if/elses, we use the following "match" and "replace" patterns.

The if/else "match" pattern has the form:

        if (<expr>)
            {
            <stmts1>
            goto <label>;
            }
        <stmts2>
<label>:
        <stmt>

The if/else "replace" pattern has the form:

        if (<expr>)
            <stmts1>
        else
            <stmts2>
<label>:
        <stmt>

where <expr>, <stmts1>, <stmts2>, <label>, and <stmt> are whatever matched them in the "match" pattern.

Structure with "for"

To impose fors, we use two pattern and replacement sets — one set for "up-counting" fors and one set for "down-counting" fors.

"Up-counting" for

The "up-counting" for "match" pattern has the form:

        <index> = <init>;
<label>:
        <stmts>;
        <index> += <increm>;
        if (<index> < <limit>)
            goto <label>;

where <index> matches the index variable, <init> matches its initial value, <label> matches any label, <stmts> matches one or more statements with no labels, <increm> matches the index increment value, and <limit> matches the index limit value.  Note that the second occurrence of <label> must be the same symbol as the first occurrence in order for the pattern to match.

The "up-counting" for "replace" pattern has the form:

        for (<index> = <init>;
          <index> <= <limit>;
          <index> += <increm>)
            <stmts>;

where <index>, <init>, <limit>, <increm>, and <stmts> are whatever matched them in the "match" pattern.

"Down-counting" for

The "down-counting" for "match" pattern has the form:

        <index> = <init>;
<label>:
        <stmts>;
        <index> -= <decrem>;
        if (<index> >= <limit>)
            goto <label>;

where <index> matches the index variable, <init> matches its initial value, <label> matches any label, <stmts> matches one or more statements with no labels, <decrem> matches the index decrement value, and <limit> matches the index limit value.  Note that the second occurrence of <label> must be the same symbol as the first occurrence in order for the pattern to match.

The "down-counting" for "replace" pattern has the form:

        for (<index> = <init>;
          <index> > <limit>;
          <index> -= <decrem>)
            <stmts>;

where <index>, <init>, <limit>, <decrem>, and <stmts> are whatever matched them in the "match" pattern.

Strategy

For each "match" pattern, the meta-code repeatedly scans the code to be processed, using a facility built into XTRAN's rules language that visits each statement (recursively) once in each pass, looking for matches and replacing them with the corresponding "replace" patterns, until it can find no more matches to be replaced.  The meta-code also tells XTRAN to delete any label that has no references, at the end of each pass, so if the structuring deletes all references to a label, the label will disappear.  This has the effect that a successful match and replacement can trigger more structuring on the next pass.

The meta-code also:

The rules are not very specific to C / C++ / Java / C#; they can easily be adapted for any language.  In fact, we also have versions for COBOL and PL/I.



Process Flowchart

Here is a flowchart for this process, in which the elements are color coded:

data flowchart

C / C++ / Java / C# code before structuring:

void func(void)                                 /*void func(void)*/
{
        long i, j;                              /*long i, j;*/

/*
 * The following should structure to nested "if"s:
 */
lbl0:   i = 2;                                  /*lbl0:   i = 2;*/
        if (i == 0 || i == 1)                   /*if (i == 0 || i == 1)*/
            goto lbl2;                          /*goto lbl2;*/
        i = 1;                                  /*i = 1;*/
        if (i > 1)                              /*if (i > 1)*/
            goto lbl1;                          /*goto lbl1;*/
        i = 2;                                  /*i = 2;*/
lbl1:   i = 3;                                  /*lbl1: i = 3;*/
lbl2:   i = 4;                                  /*lbl2: i = 4;*/
        i = 5;                                  /*lbl3: i = 5;*/
/*
 * The following should structure to an "if/else":
 */
        if (i == 0 || i == 1)                   /*if (i == 0 || i == 1)*/
            {                                   /*start of "if" blk*/
            i = 6;                              /*i = 6;*/
            i = 7;                              /*i = 7;*/
            goto lbl3;                          /*goto lbl3;*/
            }                                   /*end of "if" blk*/
        i = 8;                                  /*i = 8;*/
lbl3:   i = 9;                                  /*lbl3: i = 9;*/
/*
 * The following should structure to an up-counting "for" with "i" as the
 * index:
 */
        i = 10;                                 /*i = 10;*/
lbl4:   j = i + 1;                              /*lbl4: j = i + 1;*/
        j = i + 2;                              /*j = i + 2;*/
        i += 3;                                 /*i += 3;*/
        if (i < 10)                             /*if (i < 10)*/
            goto lbl4;                          /*goto lbl4;*/
        j = i + 3;                              /*j = i + 3;*/
/*
 * The following should structure to a down-counting "for" with "i" as the
 * index.  Note that, by the time we're attempting this match, we have
 * collapsed "i = i - 2" to "i -= 2".
 */
        i = 11;                                 /*i = 11;*/
lbl5:   j = i + 1;                              /*lbl5: j = i + 1;*/
        j = i + 2;                              /*j = i + 2;*/
        i = i - 2;                              /*i = i - 2;*/
        if (i >= 1)                             /*if (i >= 1)*/
            goto lbl5;                          /*goto lbl5;*/
        j = i + 3;                              /*j = i + 3;*/
/*
 * The reference in the following "goto" will prevent XTRAN from deleting lbl0:
 */
        if (i < 1)                              /*if (i < 1)*/
            goto lbl0;                          /*goto lbl0;*/
        return;                                 /*return*/      
}


C / C++ / Java / C# code after structuring by XTRAN:

void func(void)                                 /*void func(void)*/
{
        long i, j;                              /*long i, j;*/
/*
 * The following should structure to nested "if"s:
 */
lbl0:
        i = 2;                                  /*lbl0:   i = 2;*/
        if (i != 0 && i != 1)                   /*(rvsd) if (i == 0 || i == 1)*/
            {                                   /*goto lbl2;*/
            i = 1;                              /*i = 1;*/
            if (i <= 1)                         /*(rvsd) if (i > 1)*/
                {                               /*goto lbl1;*/
                i = 2;                          /*i = 2;*/
                }
            i = 3;                              /*lbl1: i = 3;*/
            }
        i = 4;                                  /*lbl2: i = 4;*/
        i = 5;                                  /*lbl3: i = 5;*/
/*
 * The following should structure to an "if/else":
 */
        if (i == 0 || i == 1)                   /*if (i == 0 || i == 1)*/
            {
            i = 6;                              /*i = 6;*/
            i = 7;                              /*i = 7;*/
            }
        else                                    /*goto lbl3;*/
            {
            i = 8;                              /*i = 8;*/
            }
        i = 9;                                  /*lbl3: i = 9;*/
/*
 * The following should structure to an up-counting "for" with "i" as the
 * index:
 */
        for (i = 10; i < 10; i += 3)            /*i = 10;*/
            {
            j = i + 1;                          /*lbl4: j = i + 1;*/
            j = i + 2;                          /*j = i + 2;*/
                                                /*i += 3;*/
                                                /*if (i < 10)*/
                                                /*goto lbl4;*/
            }
        j = i + 3;                              /*j = i + 3;*/
/*
 * The following should structure to a down-counting "for" with "i" as the
 * index.  Note that, by the time we're attempting this match, we have
 * collapsed "i = i - 2" to "i -= 2".
 */
        for (i = 11; i >= 1; i -= 2)            /*i = 11;*/
            {
            j = i + 1;                          /*lbl5: j = i + 1;*/
            j = i + 2;                          /*j = i + 2;*/
                                                /*i = i - 2;*/
                                                /*if (i >= 1)*/
                                                /*goto lbl5;*/
            }
        j = i + 3;                              /*j = i + 3;*/
/*
 * The reference in the following "goto" will prevent XTRAN from deleting lbl0:
 */
        if (i < 1)                              /*if (i < 1)*/
            goto lbl0;                          /*goto lbl0;*/
        return;                                 /*return*/
}



Excerpt from run log:

Structuring rules eliminated 5 of 6 "goto"s (83%):
      2 by inserting "if"s
      1 by inserting "if/else"s
      2 by inserting "for"s