XTRAN Example — Enable Sorting of DSV Data
You have some delimiter-separated value (DSV) data that you want to sort using a text sorting utility. This task is complicated by the following issues:
- The delimited values are variable length.
- You want to sort some fields in ascending order and others in descending order.
- You want to sort some fields case-sensitive and others case-insensitive.
XTRAN to the rescue!
The following example uses an XTRAN rules file comprising 154 non-comment lines of "meta-code" (XTRAN's rules language) to prepare DSV data for sorting. The rules took just over an hour to write and about an hour to debug. (That's right, just over two hours total!)
This is an example of XTRAN's ability to automate the manipulation of data as well as code.
You tell the rules, via an environment variable value:
- Optionally, the DSV delimiting character, defaulting to comma
- Which DSV fields are to be used for the sort, and in what order
- For each sort field:
  - Whether to sort in ascending or descending order
  - Optionally, whether to sort case-sensitive (the default) or case-insensitive (it took only 15 minutes to add this feature to the XTRAN rules!)
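The syntax XTRAN's rules expect in that environment variable is not shown here, so the following Python sketch only illustrates the kind of information being conveyed. The variable name `DSV_SORT_SPEC` and the spec format (`3ai 1d`, with an optional `delim=` item) are hypothetical, invented for this illustration.

```python
import os
import re
from dataclasses import dataclass

# Hypothetical item syntax: field number, then "a" or "d", then optional "i".
ITEM = re.compile(r"(\d+)([ad])(i?)")


@dataclass
class SortField:
    number: int        # 1-based DSV field number
    descending: bool   # False means ascending
    ignore_case: bool  # False means case-sensitive (the default)


def parse_sort_spec(spec):
    """Parse a spec such as 'delim=; 3ai 1d' into (delimiter, [SortField, ...])."""
    delimiter = ","  # the delimiter defaults to comma
    fields = []
    for item in spec.split():
        if item.startswith("delim="):
            delimiter = item[len("delim="):]
            continue
        match = ITEM.fullmatch(item)
        if match is None:
            raise ValueError(f"bad sort field spec: {item!r}")
        fields.append(
            SortField(int(match.group(1)), match.group(2) == "d", match.group(3) == "i")
        )
    return delimiter, fields


if __name__ == "__main__":
    # DSV_SORT_SPEC is an assumed variable name, not XTRAN's actual one.
    print(parse_sort_spec(os.environ.get("DSV_SORT_SPEC", "3ai 1d")))
```

Under these assumptions, `3ai 1d` means "field 3 ascending and case-insensitive, then field 1 descending".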
The rules then prepend, to each DSV data line, the fields on which to sort:
- In the proper order
- Padded to the same length
- Forced to lower case if sorting the field case-insensitive
- With descending sort fields' text inverted to achieve that result
The rules separate the added field copies from the original data with a unique text token, to facilitate their removal after sorting.
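The preparation step described above can be sketched in Python. This is not the XTRAN rules file, only an illustration of the same idea under stated assumptions: the data is printable ASCII, "inverted" means mirroring each character's code within the printable range, and the separator `TOKEN` is a hypothetical value chosen so it cannot occur in the data or in a generated key.

```python
TOKEN = "\x1fKEY\x1f"  # hypothetical separator; assumed absent from the data


def invert(text):
    """Mirror printable ASCII (codes 32..126) so ascending order becomes descending."""
    return "".join(chr(158 - ord(ch)) for ch in text)  # 158 = 32 + 126


def decorate(lines, sort_fields, delimiter=","):
    """Prepend fixed-width sort keys to each line.

    sort_fields is a list of (field_number, descending, ignore_case) tuples,
    with 1-based field numbers, listed in sort priority order.
    """
    rows = [line.split(delimiter) for line in lines]
    # Pass 1: find the widest value of each sort field so copies can be padded.
    widths = [
        max((len(row[number - 1]) for row in rows), default=0)
        for number, _, _ in sort_fields
    ]
    decorated = []
    # Pass 2: build the key for each line and prepend it.
    for line, row in zip(lines, rows):
        key = ""
        for (number, descending, ignore_case), width in zip(sort_fields, widths):
            part = row[number - 1]
            if ignore_case:
                part = part.lower()
            part = part.ljust(width)  # pad before inverting so padding inverts too
            if descending:
                part = invert(part)
            key += part
        decorated.append(key + TOKEN + line)
    return decorated
```

Padding before inverting matters: a shorter descending value then carries inverted padding, which sorts after the inverted characters of a longer value with the same prefix.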
You can then use a "dumb" sorting utility to do the sorting, followed by a pass through a text processing utility such as sed to remove the prepended sort field copies and separating token, restoring the data to its original form (but now sorted as specified).
We provide, with XTRAN, a BASH script to automate this process.
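The supplied BASH script is not reproduced here. As a stand-in, this minimal Python sketch shows the two post-processing steps: a plain character-by-character sort of the prepared lines, then removal of everything up to and including the separator. The `TOKEN` value is hypothetical, and Python's `sorted` plays the role of the "dumb" sort utility.

```python
TOKEN = "\x1fKEY\x1f"  # hypothetical separator; must match the one used when preparing


def sort_and_strip(prepared_lines, token=TOKEN):
    """Sort prepared lines as plain text, then drop the key and separator."""
    # A plain string sort is enough: the keys are fixed width and already
    # encode direction and case handling.
    ordered = sorted(prepared_lines)
    # Keep only what follows the first occurrence of the separator.
    return [line.split(token, 1)[1] for line in ordered]
```

When using an external sort utility instead, it should compare lines byte by byte (for example, by running it with `LC_ALL=C`), since locale-aware collation would defeat the character inversion used for descending fields.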
How can such powerful and generalized data manipulation be automated in a little over 2 hours and only 154 lines of rules? Because there is so much capability already available as part of XTRAN's rules language. The rules used for this example take advantage of the following functionality provided by that rules language:
- Text file input and output
- Text manipulation
- Delimited list manipulation
- Content-addressable data base
In the example below, we sort some DSV data on the 3rd field ascending and case-insensitive, and then on the 1st field descending.
The input to and output from XTRAN are untouched.
Here is a flowchart for this process, in which the elements are color coded:
- BLUE for XTRAN versions (runnable programs)
- ORANGE for XTRAN rules (text files)
- PURPLE for text data files
Input to XTRAN:
def,abc12,rcd3,end3
abc,xyz1234,RCD1,end1
abc,xyz1,RCD2,end2
xyz,xyz1,rcd2,end2
def,xyz1234,Rcd1,end1
Output from XTRAN, as processed by the sort utility and the text utility:
def,xyz1234,Rcd1,end1
abc,xyz1234,RCD1,end1
xyz,xyz1,rcd2,end2
abc,xyz1,RCD2,end2
def,abc12,rcd3,end3
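As an independent cross-check of the ordering shown above, the same result can be obtained with a different technique: two passes of a stable in-memory sort, least significant key first. This Python sketch is not how the XTRAN rules work; it only confirms the expected order for this sample data.

```python
SAMPLE = [
    "def,abc12,rcd3,end3",
    "abc,xyz1234,RCD1,end1",
    "abc,xyz1,RCD2,end2",
    "xyz,xyz1,rcd2,end2",
    "def,xyz1234,Rcd1,end1",
]


def reference_sort(lines, delimiter=","):
    """Sort on field 3 ascending (case-insensitive), then field 1 descending."""
    records = [line.split(delimiter) for line in lines]
    # Python's sort is stable, so sort by the secondary key first...
    records.sort(key=lambda rec: rec[0], reverse=True)
    # ...then by the primary key; ties keep the secondary ordering.
    records.sort(key=lambda rec: rec[2].lower())
    return [delimiter.join(rec) for rec in records]


if __name__ == "__main__":
    for line in reference_sort(SAMPLE):
        print(line)
```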