XTRAN Example — Load DSV Data into XTRAN Data Base


Menu

Introduction

Scenario — you're using XTRAN to automate data analysis and transformation, and you have a mass of delimiter-separated-values (DSV) data that you want to load into an XTRAN in-memory content-addressable data base (associative array), where you can manipulate it more easily.

XTRAN itself to the rescue!

The following examples use an XTRAN rules file comprising 272 non-comment lines of XTRAN's rules language ("meta-code") to automate loading of DSV data into an XTRAN data base.  The rules took 1¼ hours to create and 2 hours to debug.   (That's right, only 3¼ hours total!)

The rules create a user meta-function you can call from your XTRAN rules to load DSV data into an XTRAN data base.

The meta-function assumes that a DSV data file's first non-comment line is a label row (a common practice with DSV files), and it uses those item labels as subscripts when it stores the data in the XTRAN data base.

In the meta-function call, you can specify (or default) the following:

When called, the user meta-function will create the named XTRAN data base if it doesn't exist, and will load the DSV data into it from the named file as follows, based on the DSV data's labels row and on which DSV item number (if any) has unique values:

1st
subscript 
2nd
subscript 
Cell contents
<dlmchsb>    DSV data file's item delimiter character (optional)
<labelsb>    DSV data file's column labels (optional)
<uniqv1>  <label1>  <uniqv1>'s value for <label1>, if any
<uniqv1>  <label2>  <uniqv1>'s value for <label2>, if any
<uniqv1>  <linesb>  Original DSV data line (optional)
<uniqv2>  <label1>  <uniqv2>'s value for <label1>, if any
<uniqv2>  <label2>  <uniqv2>'s value for <label2>, if any
<uniqv2>  <linesb>  Original DSV data line (optional)

where:

<uniqvn> is the value of the DSV column that has a unique value for each DSV row, or if that's not given, a unique 1-n integer
<labeln> is the label for a data item (other than the unique one if given)
<dlmchsb> (optional) is the text subscript with which to store the DSV file's item delimiter character in the data base
<labelsb> (optional) is the text subscript with which to store the DSV column labels in the data base
<linesb> (optional) is the (second) text subscript with which to store each original DSV data line in the data base

The meta-function evaluates to the number of DSV data records stored in the data base, or to -1 if something went wrong.

Once the DSV data are loaded into your XTRAN data base, your data manipulation rules can quickly retrieve a <labeln>'s value for a <uniqvn>, as well as a <uniqvn>'s original DSV data line if you requested that, with a simple call to XTRAN's "get a data base cell's contents" built-in meta-function.

If you tell the meta-function to store the DSV data file's item delimiter character and column labels in the data base, you can create XTRAN rules that process data base contents in a very general way.

How can such powerful and generalized data manipulation be automated in only 3¼ hours and only 272 code lines of XTRAN rules?  Because there is so much capability already available as part of XTRAN's rules language.  These rules take advantage of the following functionality:

The input to XTRAN is shown untouched.



Process Flowchart

Here is a flowchart for this process, in which the elements are color coded:

data flowchart

Example 1

For this example, we default the DSV delimiter character to comma and the DSV comment string to semicolon.  We also specify "NAME" as the label of the column that contains unique values.

DSV input data file:

; demd2d-1.dsv -- DSV data file for first example
;
; Label row:
;
NAME,GENDER,AGE
;
; Data:
;
Fred,M,43
Mary,F,19
George,M,55
;
; End of demd2d-1.dsv

Data base contents after import:

1st
subscript 
2nd
subscript 
Cell
contents
"Fred" "AGE" "43"
"Fred" "GENDER" "M"
"George" "AGE" "55"
"George" "GENDER" "M"
"Mary" "AGE" "19"
"Mary" "GENDER" "F"


Example 2

For this example, we specify semicolon as the DSV delimiter character and "??" as the DSV comment string.  We again specify "NAME" as the label of the column that contains unique values; note that it's in a different position from the previous example.

We also specify that the DSV data file's item delimiter character and column labels are to be stored in the data base using the subscripts "DLMCHR" and "LABELS".

We also specify that original DSV data lines are to be stored in the data base using the subscript "DSVLIN".

DSV input data file:

?? demd2d-2.dsv -- DSV data file for second example
??
?? Label row:
??
GENDER;AGE;NAME
??
M;;Fred
F;19;Mary
;55;George
??
?? End of demd2d-2.dsv

Note that "Fred" has no "AGE" given, and that "George" has no "GENDER" given.

Data base contents after import:

1st
subscript 
2nd
subscript 
Cell
contents
"DLMCHR"   ";"
"LABELS"   "GENDER;AGE;NAME"
"Fred" "AGE" (none)
"Fred" "DSVLIN" "M;;Fred"
"Fred" "GENDER" "M"
"George" "AGE" "55"
"George" "DSVLIN" ";55;George"
"George" "GENDER" (none)
"Mary" "AGE" "19"
"Mary" "DSVLIN" "F;19;Mary"
"Mary" "GENDER" "F"


Example 3

For this example, we default the DSV delimiter character to comma and the DSV comment string to semicolon.

We don't specify a "unique values" column label, so the XTRAN meta-function assigns a unique 1-n integer to each DSV data row and uses that as the first subscript.

DSV input data file:

; demd2d-3.dsv -- Third DSV data file for demd2d-x.xxc
; Revised 2018-10-03.2153 by Stephen F. Heffner (Win10 on PREC7520)
; Created 2018-10-03 by Stephen F. Heffner (Win10 on PREC7520)
;
; Label row:
;
GENDER,AGE,NAME
;
M,55
F,19,Mary
M,,George
;
; End of demd2d-3.dsv

Note that the first data line has no "NAME" value, and "George" has no "AGE" value.

Data base contents after import:

1st
subscript 
2nd
subscript 
Cell
contents
1 "AGE" "55"
1 "GENDER" "M"
1 "NAME" (none)
2 "AGE" "19"
2 "GENDER" "F"
2 "NAME" "Mary"
3 "AGE" (none)
3 "GENDER" "M"
3 "NAME" "George"