XTRAN Example — Enable Sorting of DSV Data
Scenario: you have some delimiter-separated value (DSV) data that you want to sort using a text sorting utility. Your task is complicated by the following issues:
- The delimited values are variable length.
- You want to sort some fields in ascending order and others in descending order.
- You want to sort some fields effectively right-justified (e.g. numeric data).
- You want to sort some fields case-sensitively and others case-insensitively.
XTRAN to the rescue!
The following example uses an XTRAN rules file comprising 300 non-comment lines of XTRAN's rules language ("meta-code") to prepare DSV data for sorting. The original rules took just over an hour to write and about an hour to debug. (That's right, just over two hours total!)
This is an example of XTRAN's ability to automate the manipulation of data as well as code.
You tell the rules, via environment variable values:
- Optionally, the data's DSV delimiting character, defaulting to comma.
- Optionally, the data file's comment character. If specified, the rules ignore any line that starts with it as a comment line. Note — comment lines will therefore not be in the output file.
- Which DSV fields are to be used for the sort, and in what order.
- For each sort field:
  - Whether to sort the field in ascending or descending order.
  - Optionally, whether to sort the field effectively right-justified within the maximum length seen in the data for the field. (It took only 30 minutes to add this feature to the XTRAN rules!)
  - Optionally, whether to sort the field case-sensitively (the default) or case-insensitively. (It took only 15 minutes to add this feature to the XTRAN rules!)
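To make the shape of these settings concrete, here is a minimal Python sketch of reading them from the environment. It is not XTRAN meta-code, and the variable names `DSV_DELIM`, `DSV_COMMENT`, and `DSV_SORT`, as well as the compact spec format, are invented for illustration; the actual names the XTRAN rules use are not given here.

```python
import os
import re
from dataclasses import dataclass


@dataclass
class SortField:
    number: int                  # 1-based DSV field number
    descending: bool = False     # ascending unless "d" is given
    right_justify: bool = False
    ignore_case: bool = False    # case-sensitive is the default


def read_config(env=os.environ):
    """Return (delimiter, comment_char, sort_fields) from hypothetical variables."""
    delim = env.get("DSV_DELIM", ",")          # delimiter defaults to comma
    comment = env.get("DSV_COMMENT") or None   # optional comment character
    fields = []
    # DSV_SORT is an invented spec such as "4dr,3ai,1d": field number,
    # a or d for direction, then optional r (right-justify) and i (ignore case).
    for item in env["DSV_SORT"].split(","):
        m = re.fullmatch(r"(\d+)([ad])(r?)(i?)", item.strip())
        if m is None:
            raise ValueError(f"bad sort field spec: {item!r}")
        fields.append(SortField(int(m[1]), m[2] == "d", m[3] == "r", m[4] == "i"))
    return delim, comment, fields
```

Under these assumed names, a spec of `4dr,3ai,1d` would mean: field 4 descending and right-justified, then field 3 ascending and case-insensitive, then field 1 descending.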
The rules then prepend, to each DSV data line, the fields on which to sort:
- In the proper sorting order.
- With each descending sort field's text inverted to achieve that result.
- Padded to each field's maximum length in the data file: with z characters if sorting the field descending, else with spaces; and at the beginning if the field is to be sorted right-justified, else at the end.
- Forced to lower case if sorting the field case-insensitively.
- With the added field copies separated from the original data using a unique text token, to facilitate their removal after sorting.
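The key-building steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the XTRAN rules themselves: the data is assumed to be printable ASCII, every line is assumed to have all the requested fields, "inverted" is assumed to mean mirroring each character within the printable ASCII range, and the separator token `<<SORTKEY>>` is invented. The source pads descending fields with z; this sketch pads with `~` instead, because that is what a space becomes under the inversion assumed here.

```python
SEPARATOR = "<<SORTKEY>>"   # hypothetical token, assumed absent from the data


def invert(text):
    """Mirror each printable ASCII character so that ascending order reverses."""
    return "".join(chr(0x20 + 0x7E - ord(c)) for c in text)


DESC_PAD = invert(" ")      # "~": pad character for descending fields


def prepare(lines, delim, comment, sort_fields):
    """Prefix each data line with fixed-width copies of its sort fields.

    sort_fields is a list of (field_number, descending, right_justify,
    ignore_case) tuples, in sort priority order; field numbers are 1-based.
    """
    # Comment lines are dropped, so they never reach the output.
    rows = [ln for ln in lines if not (comment and ln.startswith(comment))]
    split = [ln.split(delim) for ln in rows]
    # First pass: the longest value seen for each sort field.
    widths = {n: max((len(f[n - 1]) for f in split), default=0)
              for n, *_ in sort_fields}
    prepared = []
    for line, fields in zip(rows, split):
        key = []
        for n, desc, rjust, icase in sort_fields:
            text = fields[n - 1]
            if icase:
                text = text.lower()          # fold case before inverting
            if desc:
                text = invert(text)
            pad = DESC_PAD if desc else " "
            key.append(text.rjust(widths[n], pad) if rjust
                       else text.ljust(widths[n], pad))
        prepared.append("".join(key) + SEPARATOR + line)
    return prepared
```

Because every key field is padded to a fixed width, the concatenated prefix compares field by field under a plain character-by-character sort, and the separator always starts at the same column.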
You can then use a "dumb" sorting utility to do the sorting, followed by a pass through a text processing utility such as sed to remove the prepended sort field copies and separating token, restoring the data to its original form (but now sorted as requested).
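As a rough Python stand-in for the sort utility and the sed pass, the two steps might look like the sketch below. The function name, the sample lines, and the `|KEY|` token are made up for illustration; `sorted` compares strings by code point, which for ASCII data matches a byte-wise text sort, whereas a sort utility that applies locale collation rules may order punctuation differently.

```python
def sort_and_strip(prepared_lines, separator):
    """Sort key-prefixed lines as plain text, then drop the key and separator."""
    ordered = sorted(prepared_lines)   # stands in for the "dumb" text sort
    # partition() splits at the first occurrence of the separator, which is
    # where the prepended key ends; the original line is everything after it.
    # This assumes the separator never occurs inside the key itself.
    return [line.partition(separator)[2] for line in ordered]


if __name__ == "__main__":
    # Hypothetical prepared lines: a fixed-width key, the token, the original line
    sample = ["b  |KEY|second,b", "a  |KEY|first,a"]
    for line in sort_and_strip(sample, "|KEY|"):
        print(line)
```

The stripping step is the equivalent of a sed substitution that deletes everything up to and including the token on each line.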
We provide, with XTRAN, a BASH script to automate this process.
In the example below, we sort some DSV data:
- On the 4th field descending and right-justified, then
- On the 3rd field ascending and case-insensitive, then
- On the 1st field descending
The input to and output from XTRAN are untouched.
How can such powerful and generalized data manipulation be automated in about 3 hours and only 300 lines of rules? Because there is so much capability already available as part of XTRAN's rules language. The rules used for this example take advantage of the following functionality provided by that rules language:
- Text file input and output
- Text manipulation
- Delimited list manipulation
- Content-addressable data base
Process Flowchart
Here is a flowchart for this process, in which the elements are color coded:
- BLUE for XTRAN versions (runnable programs)
- ORANGE for XTRAN rules (text files)
- PURPLE for text data files
Input to XTRAN:
# dsvsrt.in -- Input file for demonstrating sorting of DSV data
#
def!abc12!RCD3!1,230!end3
abc!xyz1234!Rcd1!25!end1
abc!xyz1!rcd2!1,432!end2
xyz!xyz1!rcd2!1,230!end2
def!xyz1234!RCD1!25!end1
abc!xxyz!Rcd1!123,456!end3
#
# End of dsvsrt.in
Output from XTRAN, as processed by the sort utility and the text utility:
abc!xxyz!Rcd1!123,456!end3
abc!xyz1!rcd2!1,432!end2
xyz!xyz1!rcd2!1,230!end2
def!abc12!RCD3!1,230!end3
abc!xyz1234!Rcd1!25!end1
def!xyz1234!RCD1!25!end1