XTRAN Example — Load DSV Data into XTRAN Data Base

Introduction

Scenario — you're using XTRAN to automate data analysis and transformation, and you have a mass of delimiter-separated-values (DSV) data that you want to load into an XTRAN in-memory content-addressable data base (associative array), where you can manipulate it more easily.

XTRAN itself to the rescue!

The following examples use an XTRAN rules file comprising 272 non-comment lines of XTRAN's rules language ("meta-code") to automate loading of DSV data into an XTRAN data base. The rules took 1¼ hours to create and 2 hours to debug. (That's right, only 3¼ hours total!)

The rules create a user meta-function you can call from your XTRAN rules to load DSV data into an XTRAN data base.

The meta-function assumes that a DSV data file's first non-comment line is a label row (a common practice with DSV files), and it uses those item labels as subscripts when it stores the data in the XTRAN data base.

In the meta-function call, you can specify (or default) the following:

The name of the XTRAN data base into which the DSV data is to be loaded.
The file specification of the DSV data file to be loaded. In that file specification, embedded environment variable names will be replaced by their values.
(Optional) Whether or not to create the data base, defaulting to yes. If told not to create it and it exists, the meta-function adds the values in the DSV data file to the existing data base.
(Optional) That file's DSV delimiter character, defaulting to comma.
(Optional) That file's comment string, defaulting to semicolon. Any line starting with it is ignored as a comment.
(Optional) The label of the DSV item that has a unique value for each DSV row. If not given, the meta-function will generate a unique 1-n integer value to use for each DSV row.
(Optional) The data base subscript with which to store the DSV data file's item delimiter character in the data base.
(Optional) The data base subscript with which to store the DSV data file's column labels in the data base.
(Optional) The (second) data base subscript with which to store each original DSV data line in the data base.

When called, the user meta-function will create the named XTRAN data base if it doesn't exist, and will load the DSV data into it from the named file as follows, based on the DSV data's labels row and on which DSV item number (if any) has unique values:

1^st subscript	2^nd subscript	Cell contents
<dlmchsb>		DSV data file's item delimiter character (optional)
<labelsb>		DSV data file's column labels (optional)
<uniqv1>	<label1>	<uniqv1>'s value for <label1>, if any
<uniqv1>	<label2>	<uniqv1>'s value for <label2>, if any
…	…	…
<uniqv1>	<linesb>	Original DSV data line (optional)
<uniqv2>	<label1>	<uniqv2>'s value for <label1>, if any
<uniqv2>	<label2>	<uniqv2>'s value for <label2>, if any
…	…	…
<uniqv2>	<linesb>	Original DSV data line (optional)
…	…	…

where:

<uniqv_n>	is the value of the DSV column that has a unique value for each DSV row, or if that's not given, a unique 1-n integer
<label_n>	is the label for a data item (other than the unique one if given)
<dlmchsb>	(optional) is the text subscript with which to store the DSV file's item delimiter character in the data base
<labelsb>	(optional) is the text subscript with which to store the DSV column labels in the data base
<linesb>	(optional) is the (second) text subscript with which to store each original DSV data line in the data base

The meta-function evaluates to the number of DSV data records stored in the data base, or to -1 if something went wrong.

Once the DSV data are loaded into your XTRAN data base, your data manipulation rules can quickly retrieve a <label_n>'s value for a <uniqv_n>, as well as a <uniqv_n>'s original DSV data line if you requested that, with a simple call to XTRAN's "get a data base cell's contents" built-in meta-function.

If you tell the meta-function to store the DSV data file's item delimiter character and column labels in the data base, you can create XTRAN rules that process data base contents in a very general way.

How can such powerful and generalized data manipulation be automated in only 3¼ hours and only 272 code lines of XTRAN rules? Because there is so much capability already available as part of XTRAN's rules language. These rules take advantage of the following functionality:

Text file input and output
Text manipulation
Text formatting
Delimited list manipulation
Content-addressable data bases
Creating new meta-functions written in meta-code, which we call user meta-functions
Meta-variable and meta-function pointers

The input to XTRAN is shown untouched.

Process Flowchart

Here is a flowchart for this process, in which the elements are color coded:

BLUE for XTRAN versions (runnable programs)
ORANGE for XTRAN rules (text files)
PURPLE for text data files

Example 1

For this example, we default the DSV delimiter character to comma and the DSV comment string to semicolon. We also specify "NAME" as the label of the column that contains unique values.

DSV input data file:

; demd2d-1.dsv -- DSV data file for first example
;
; Label row:
;
NAME,GENDER,AGE
;
; Data:
;
Fred,M,43
Mary,F,19
George,M,55
;
; End of demd2d-1.dsv

Data base contents after import:

1^st subscript	2^nd subscript	Cell contents
`"Fred"`	`"AGE"`	`"43"`
`"Fred"`	`"GENDER"`	`"M"`
`"George"`	`"AGE"`	`"55"`
`"George"`	`"GENDER"`	`"M"`
`"Mary"`	`"AGE"`	`"19"`
`"Mary"`	`"GENDER"`	`"F"`

Example 2

For this example, we specify semicolon as the DSV delimiter character and "??" as the DSV comment string. We again specify "NAME" as the label of the column that contains unique values; note that it's in a different position from the previous example.

We also specify that the DSV data file's item delimiter character and column labels are to be stored in the data base using the subscripts "DLMCHR" and "LABELS".

We also specify that original DSV data lines are to be stored in the data base using the subscript "DSVLIN".

DSV input data file:

?? demd2d-2.dsv -- DSV data file for second example
??
?? Label row:
??
GENDER;AGE;NAME
??
M;;Fred
F;19;Mary
;55;George
??
?? End of demd2d-2.dsv

Note that "Fred" has no "AGE" given, and that "George" has no "GENDER" given.

Data base contents after import:

1^st subscript	2^nd subscript	Cell contents
`"DLMCHR"`		`";"`
`"LABELS"`		`"GENDER;AGE;NAME"`
`"Fred"`	`"AGE"`	(none)
`"Fred"`	`"DSVLIN"`	`"M;;Fred"`
`"Fred"`	`"GENDER"`	`"M"`
`"George"`	`"AGE"`	`"55"`
`"George"`	`"DSVLIN"`	`";55;George"`
`"George"`	`"GENDER"`	(none)
`"Mary"`	`"AGE"`	`"19"`
`"Mary"`	`"DSVLIN"`	`"F;19;Mary"`
`"Mary"`	`"GENDER"`	`"F"`

Example 3

For this example, we default the DSV delimiter character to comma and the DSV comment string to semicolon.

We don't specify a "unique values" column label, so the XTRAN meta-function assigns a unique 1-n integer to each DSV data row and uses that as the first subscript.

DSV input data file:

; demd2d-3.dsv -- Third DSV data file for demd2d-x.xxc
; Revised 2018-10-03.2153 by Stephen F. Heffner (Win10 on PREC7520)
; Created 2018-10-03 by Stephen F. Heffner (Win10 on PREC7520)
;
; Label row:
;
GENDER,AGE,NAME
;
M,55
F,19,Mary
M,,George
;
; End of demd2d-3.dsv

Note that the first data line has no "NAME" value, and "George" has no "AGE" value.

Data base contents after import:

1^st subscript	2^nd subscript	Cell contents
`1`	`"AGE"`	`"55"`
`1`	`"GENDER"`	`"M"`
`1`	`"NAME"`	(none)
`2`	`"AGE"`	`"19"`
`2`	`"GENDER"`	`"F"`
`2`	`"NAME"`	`"Mary"`
`3`	`"AGE"`	(none)
`3`	`"GENDER"`	`"M"`
`3`	`"NAME"`	`"George"`

XTRAN Example — Load DSV Data into XTRAN Data Base

Menu

Introduction

Process Flowchart

Example 1

DSV input data file:

Data base contents after import:

Example 2

DSV input data file:

Data base contents after import:

Example 3

DSV input data file:

Data base contents after import: