XTRAN Example — Tally DSV Values by Field Name
Scenario — you want to tally the occurrences of field values in delimiter separated value (DSV) data.
XTRAN to the rescue!
The following example uses an XTRAN rules file comprising 86 non-comment lines of "meta-code" (XTRAN's rules language) to tally field values by field name in DSV data.
The rules took less than one hour to write and ½ hour to debug. (That's right, less than 1½ hours total!)
You specify to the rules, via environment variable values:
- The name of a DSV data file to process, starting with a field label row; empty lines and lines starting with a semicolon are ignored
- The name of the output file to create
- (optionally) The DSV delimiting character, defaulting to comma
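As a rough illustration of this kind of environment-driven configuration, here is a minimal Python sketch (not XTRAN meta-code). The variable names DSV_IN, DSV_OUT, and DSV_DELIM are hypothetical placeholders. The source does not give the names the actual rules read.

```python
import os


def read_settings(env=os.environ):
    """Return (input_path, output_path, delimiter) taken from environment variables.

    DSV_IN, DSV_OUT, and DSV_DELIM are assumed names used only for illustration.
    """
    in_path = env["DSV_IN"]              # required: DSV data file to process
    out_path = env["DSV_OUT"]            # required: output file to create
    delim = env.get("DSV_DELIM") or ","  # optional: unset or empty means comma
    if len(delim) != 1:
        raise ValueError("the DSV delimiter must be a single character")
    return in_path, out_path, delim
```

Passing the environment in as a mapping keeps the sketch easy to exercise without changing the real process environment.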
The rules output the resulting value tallies in the following DSV format (assuming the DSV delimiting character is defaulted to comma):
<label>,<value>,<tally>
where:
- <label>: Field's label, from the label row
- <value>: Field's value
- <tally>: Number of times <value> occurs for <label>
For instance, given the following DSV data input:
; Label row:
;
make,color,year
;
; Data:
;
Chrysler,red,2013
Cadillac,black,2015
Ford,blue,2015
Cadillac,white,2015
Ford,white,2013
Chrysler,black,2013
Ford,white,2014
The output will be
color,black,2
color,blue,1
color,red,1
color,white,3
make,Cadillac,2
make,Chrysler,2
make,Ford,3
year,2013,3
year,2014,1
year,2015,3
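To make the tallying logic concrete, here is a minimal Python sketch of the same behavior. It is not the XTRAN rules themselves, which are written in meta-code. The function name tally_dsv is hypothetical, and sorting the output by label and then by value is an assumption inferred from the sample output above.

```python
from collections import Counter


def tally_dsv(lines, delim=","):
    """Tally field values by field label in DSV text lines.

    Returns output lines of the form <label><delim><value><delim><tally>.
    """
    labels = None
    tallies = Counter()
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Empty lines and lines starting with a semicolon are ignored.
        if not line or line.startswith(";"):
            continue
        fields = line.split(delim)
        if labels is None:
            # The first line that is not ignored is the field label row.
            labels = fields
            continue
        # Count each (label, value) pair once per data row.
        for label, value in zip(labels, fields):
            tallies[(label, value)] += 1
    # Assumption: order the output by label, then by value.
    return [
        delim.join((label, value, str(count)))
        for (label, value), count in sorted(tallies.items())
    ]
```

Fed the sample input above, this sketch produces the same ten output lines. It splits on the delimiter only and does not handle quoted fields that contain the delimiter, which the source does not describe.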
How can such powerful and generalized data manipulation be automated in less than 1½ hours and only 86 code lines of XTRAN rules? Because there is so much capability already available as part of XTRAN's rules language. These rules take advantage of the following functionality:
- Text file input and output
- Text manipulation
- Text formatting
- Delimited list manipulation
- Environment variable manipulation
- Content-addressable data bases
- Creating new meta-functions written in meta-code, which we call user meta-functions
Process Flowchart
Here is a flowchart for this process, in which the elements are color coded:
- BLUE for XTRAN versions (runnable programs)
- ORANGE for XTRAN rules (text files)
- PURPLE for text data files