XTRAN Example — Analyze HTML Embedded Substitutions & Their Occurrences
XTRAN treats HTML as a computer language, in which each tag, line or segment of nonmarkup text, or end tag is a "statement", and each tag attribute is a "statement attribute".
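To make the "statement" model concrete, here is a minimal Python sketch that flattens an HTML document into a list of statements, each carrying its attributes and source line. It uses Python's standard `html.parser` module as a stand-in; it is not XTRAN's parser or its internal representation, and the names `StatementCollector` and `html_statements` are hypothetical.

```python
from html.parser import HTMLParser


class StatementCollector(HTMLParser):
    """Collect tags, text segments, and end tags as a flat list of 'statements'."""

    def __init__(self):
        super().__init__()
        # Each entry is (kind, value, attributes, line number).
        self.statements = []

    def handle_starttag(self, tag, attrs):
        line, _ = self.getpos()
        # Tag attributes play the role of "statement attributes".
        self.statements.append(("tag", tag, dict(attrs), line))

    def handle_endtag(self, tag):
        line, _ = self.getpos()
        self.statements.append(("end_tag", tag, {}, line))

    def handle_data(self, data):
        # Skip whitespace-only runs between tags.
        if data.strip():
            line, _ = self.getpos()
            self.statements.append(("text", data.strip(), {}, line))


def html_statements(source):
    """Return the statements found in an HTML source string."""
    collector = StatementCollector()
    collector.feed(source)
    collector.close()
    return collector.statements
```

Calling `html_statements('<p class="x">Hello</p>')` yields one tag statement, one text statement, and one end-tag statement, in source order.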
The following example uses an XTRAN rules file comprising 93 non-comment lines of XTRAN's rules language ("meta-code") to analyze all in-line substitutions in HTML.
The HTML mining rules for this example can easily be enhanced to produce DSV output that can be interactively queried using existing XTRAN rules.
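As a rough illustration of what such DSV (delimiter-separated values) output might look like, the following Python sketch writes one row per occurrence. The column layout, the `|` delimiter, and the function name `occurrences_to_dsv` are assumptions made for this sketch; they are not XTRAN's actual DSV format.

```python
import csv
import io


def occurrences_to_dsv(occurrences, delimiter="|"):
    """Render {name: [(file, line), ...]} as delimiter-separated rows."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    # Hypothetical header row; a real consumer may expect different fields.
    writer.writerow(["substitution", "file", "line"])
    for name in sorted(occurrences):
        for filename, line in occurrences[name]:
            writer.writerow([name, filename, line])
    return buffer.getvalue()
```

One row per occurrence keeps the data easy to filter by name, file, or line in whatever tool consumes it.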
The following is an English paraphrase of the XTRAN rules used for this example.
For each HTML "statement"
    If nonmarkup text line or segment
        For each embedded substitution in text
            Tally substitution name occurrence by name
            Record source file and line of this occurrence
Sort substitution names
For each substitution name seen, alphabetically
    Report substitution name with tally
    For each occurrence of this substitution name
        Report occurrence source file and line
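The paraphrase above can be sketched in Python as follows. This is not the XTRAN rules file. It assumes that an "embedded substitution" is a named character reference such as `&amp;` appearing in nonmarkup text, and it relies on the standard `html.parser` module, which reports such references through `handle_entityref` when `convert_charrefs` is disabled. The names `SubstitutionTally`, `tally_substitutions`, and `report`, and the exact report layout, are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser


class SubstitutionTally(HTMLParser):
    """Record every named substitution found in text, with file and line."""

    def __init__(self, filename):
        # convert_charrefs=False makes the parser call handle_entityref
        # for references in text instead of silently decoding them.
        super().__init__(convert_charrefs=False)
        self.filename = filename
        self.occurrences = defaultdict(list)  # name -> [(file, line), ...]

    def handle_entityref(self, name):
        line, _ = self.getpos()
        self.occurrences[name].append((self.filename, line))


def tally_substitutions(filename, source):
    """Return {name: [(file, line), ...]} for one HTML source string."""
    parser = SubstitutionTally(filename)
    parser.feed(source)
    parser.close()
    return dict(parser.occurrences)


def report(occurrences):
    """Format the tally alphabetically by name, one occurrence per line."""
    lines = ["HTML Embedded Substitution Usage"]
    for name in sorted(occurrences):
        hits = occurrences[name]
        lines.append(f"&{name};  {len(hits)}")
        for filename, line in hits:
            lines.append(f"    {filename}({line})")
    return "\n".join(lines)
```

The length of each occurrence list serves as the tally, so no separate counter is needed. Sorting happens only at report time, which mirrors the two phases of the paraphrase.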
How can such powerful and generalized code mining be automated in only 93 code lines of XTRAN rules? Because there is so much capability already available as part of XTRAN's rules language. These rules take advantage of the following functionality:
- Text manipulation
- Text formatting
- Delimited list manipulation
- "Per statement" recursive iterator
- Access to XTRAN's Internal Representation (XIR)
- Meta-variable pointers
Process Flowchart
Here is a flowchart for this process, in which the elements are color coded:
- BLUE for XTRAN versions (runnable programs)
- ORANGE for XTRAN rules (text files)
- RED for code
- PURPLE for text data files
Output from XTRAN:
Running these rules on this HTML page generated the following XTRAN analysis output:
HTML Embedded Substitution Usage

& 3
    htmsub.html(19)
    htmsub.html(57)
    htmsub.html(60)
9
    htmsub.html(19)
    htmsub.html(19)
    htmsub.html(27)
    htmsub.html(30)
    htmsub.html(30)
    htmsub.html(49)
    htmsub.html(75)
    htmsub.html(75)
    htmsub.html(94)