XTRAN Example — Analyze HTML Embedded Substitutions & Their Occurrences

XTRAN treats HTML as a computer language, in which each tag, each line or segment of nonmarkup text, and each end tag is a "statement", and each tag attribute is a "statement attribute".

The following example uses an XTRAN rules file comprising 93 non-comment lines of XTRAN's rules language ("meta-code") to analyze all embedded substitutions (such as &amp; and &nbsp;) in HTML nonmarkup text.

The HTML mining rules for this example can easily be enhanced to produce DSV output that can be interactively queried using existing XTRAN rules.
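The DSV layout such an enhancement would produce is not specified here. As a rough illustration only, the Python sketch below assumes one row per occurrence, with hypothetical column names (substitution, file, line) and a hypothetical "|" delimiter. The function name occurrences_to_dsv is likewise invented for this sketch and is not part of XTRAN.

```python
import csv
import io


def occurrences_to_dsv(occurrences, delimiter="|"):
    """Render (name, file, line) tuples as delimiter-separated text.

    The column names and the default delimiter are assumptions made
    for this sketch; they are not taken from XTRAN.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["substitution", "file", "line"])  # header row
    for name, filename, line in occurrences:
        writer.writerow([name, filename, line])
    return buffer.getvalue()
```

With one row per occurrence, questions such as "which lines use a given substitution" reduce to filtering on a single column.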

The following is an English paraphrase of the XTRAN rules used for this example.

    For each HTML "statement"
        If nonmarkup text line or segment
            For each embedded substitution in text
                Tally substitution name occurrence by name
                Record source file and line of this occurrence
    Sort substitution names
    For each substitution name seen, alphabetically
        Report substitution name with tally
        For each occurrence of this substitution name
            Report occurrence source file and line
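
For readers unfamiliar with XTRAN's rules language, the paraphrase above can be approximated in Python. This is only a sketch of the same logic using the standard library's html.parser module, not the XTRAN rules themselves. The names SubstitutionTally, report, and analyze are hypothetical, and the sketch assumes that "embedded substitutions" means named and numeric character references appearing in nonmarkup text.

```python
from collections import defaultdict
from html.parser import HTMLParser


class SubstitutionTally(HTMLParser):
    """Record each character reference found in nonmarkup text."""

    def __init__(self, filename):
        # convert_charrefs=False makes the parser report each reference
        # through a callback instead of decoding it into the text.
        super().__init__(convert_charrefs=False)
        self.filename = filename
        self.occurrences = defaultdict(list)  # name -> [(file, line), ...]

    def handle_entityref(self, name):
        # Named reference, e.g. &amp;
        self._tally_occurrence("&" + name + ";")

    def handle_charref(self, name):
        # Numeric reference, e.g. &#160;
        self._tally_occurrence("&#" + name + ";")

    def _tally_occurrence(self, text):
        line, _column = self.getpos()  # line numbers are 1-based
        self.occurrences[text].append((self.filename, line))


def report(occurrences):
    """List each name alphabetically with its tally and occurrences."""
    lines = []
    for name in sorted(occurrences):
        hits = occurrences[name]
        lines.append(f"{name}  {len(hits)}")
        for filename, line in hits:
            lines.append(f"    {filename}({line})")
    return "\n".join(lines)


def analyze(filename, html_text):
    """Tally the references in html_text and return the report text."""
    parser = SubstitutionTally(filename)
    parser.feed(html_text)
    parser.close()
    return report(parser.occurrences)
```

Calling analyze("page.html", text) returns a report with one line per substitution name and its tally, followed by an indented file(line) entry for each occurrence. References inside tag attribute values are decoded by the parser rather than reported, so only nonmarkup text is tallied, as in the paraphrase.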

How can such powerful and generalized code mining be automated in only 93 lines of XTRAN rules? Because so much capability is already built into XTRAN's rules language. These rules take advantage of the following functionality:

Text manipulation
Text formatting
Delimited list manipulation
"Per statement" recursive iterator
Access to XTRAN's Internal Representation (XIR)
Meta-variable pointers

Process Flowchart

Here is a flowchart for this process, in which the elements are color coded:

BLUE for XTRAN versions (runnable programs)
ORANGE for XTRAN rules (text files)
RED for code
PURPLE for text data files

Output from XTRAN

Running these rules on this HTML page generated the following XTRAN analysis output:

                        HTML Embedded Substitution Usage

&amp;  3
    htmsub.html(19)
    htmsub.html(57)
    htmsub.html(60)
&nbsp;  9
    htmsub.html(19)
    htmsub.html(19)
    htmsub.html(27)
    htmsub.html(30)
    htmsub.html(30)
    htmsub.html(49)
    htmsub.html(75)
    htmsub.html(75)
    htmsub.html(94)