revUp | Issue 53

The Dawning of a Brand New Array
Bernard brings new brackets, greater flexibility

I love arrays. They are such a flexible data structure that can be used for storing a variety of information. But I have something to confess - I've always had a somewhat rocky relationship with Revolution arrays. The array in Revolution provides the basics of what an array should be but always seems to let me down when I really need him. Let me explain.

I love that I can use an array in Revolution to represent the properties of a file using a data structure that is self-documenting:
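
A minimal sketch of such a file array, reusing the "name" and "size" keys and the sample values from the examples later in this article (the variable theFileName is only there for illustration):

```livecode
## Each key names the property it stores
put "my file.txt" into theFile["name"]
put "1024" into theFile["size"]

## Read a property back by name
put theFile["name"] into theFileName
```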

I love that I can have a single variable that contains multiple elements of data that can be passed to, or returned from a handler.

But I'm less in love with the fact that arrays are not first-class citizens. This means I can't perform operations like sending the file array to another control:

send "StoreFile theFile" to button "FileCache"

My relationship with arrays is further stressed when I try to represent hierarchical structures in an array. Arrays in Revolution are one-dimensional. A dimension of an array is referenced using the '[' ']' expression and you can only use a single '[' ']' expression. This means an array element can only store a string value, not another array. While a one-dimensional array works fine for representing a file, it doesn't work so well for representing something like a folder.

A folder contains multiple files and each file has properties such as name, size and creation date. I could store information about the folder's files using a tab-delimited list, with the name in column one, the file size in column two and so on. This is similar to what the detailed form of the files function does, in fact, although that list is comma-delimited. But lists are not ideal for at least two reasons.

First, lists are not self-documenting. I have to know what file property each column in the list represents and I have to access each property using a vague syntax:

set the itemDelimiter to tab
put item 1 of line 3 of theFolderFiles into theFileName

I could make this slightly more readable if I defined a constant for each column (e.g. constant kFileNameCol = 1), but that requires defining the constants in every script that needs to access the file list.

Second, lists force you to take special care that no values in the list contain the list delimiters. If you are using a tab-delimited list, no values can contain tabs or returns. If they do, your column offsets will no longer be what you expect them to be, as another item has been added to the line.
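
To see how easily this goes wrong, here is a small sketch. The file name and size are made up for illustration, and the name deliberately contains a tab:

```livecode
## Hypothetical line from a tab-delimited file list: name, then size.
## The name itself contains a tab.
put "my" & tab & "file.txt" into theFileName
put theFileName & tab & "1024" into theLine

set the itemDelimiter to tab
## Column 2 should be the size, but the tab inside the name
## has pushed the size into column 3
put item 2 of theLine into theSize
```

After this runs, theSize holds "file.txt" rather than "1024".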

To overcome the shortcomings of lists I turn to the array. With arrays there is no need to worry about what type of data is stored in each element's key and the keys themselves document what is stored in them.

But how do I represent multiple files in a one-dimensional array? A common workaround is to mimic multiple dimensions using commas. For every additional dimension, a comma is inserted into the key. An array storing information on two files in a folder might look like this:
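
As a sketch, such an array could be built as follows. The name addresses.txt is taken from an example later in this article; the second file name and both sizes are placeholders chosen for illustration:

```livecode
## The engine joins the two parts into a single string key,
## so the first line stores its value under the key "1,name"
put "addresses.txt" into theFolderFiles[1,"name"]
put "1024" into theFolderFiles[1,"size"]
put "my file.txt" into theFolderFiles[2,"name"]
put "2048" into theFolderFiles[2,"size"]
```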

The problem with this approach is that the array does not truly represent the structure I am trying to store. In the example above the engine thinks the array is made up of four elements. If I ask the engine for the keys of the array I get:

put the keys of theFolderFiles


1,name
1,size
2,name
2,size

But the array should really be made up of two elements that in turn each have a name and size element. The following diagram illustrates this:

theFolderFiles
   1
      name
      size
   2
      name
      size

Why does this matter? In some cases it really doesn't. If you are working with two-dimensional arrays whose structure you always know, you can hard-code all of the functionality you need. In these cases the current engine design might only serve as an inconvenience. Once you start working with more than two dimensions, or if you are trying to write generic handlers that alter or interrogate arrays, things become trickier. Let's look at some examples.

To delete the entry for the second file in the array I have to individually delete each key that is prefixed with a "2":

delete variable theFolderFiles[2,"name"] 
delete variable theFolderFiles[2,"size"]

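What I would like to write instead is a single statement, but there is no key "2" in the array for it to act on:

delete variable theFolderFiles[2]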

The same holds true for adding and updating. I have to treat each dimension as an individual key in a monolithic one-dimensional array rather than as elements of an array that happens to be stored in the key of a parent array.

I can retrieve the keys in an array using the keys() function. Remember, however, that the array thinks that I am storing information for four separate elements. This makes it difficult to write a general repeat loop to iterate over the dimensions of an array.
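
A sketch of the kind of workaround this forces on me is shown below. It assumes a comma-keyed theFolderFiles array with made-up sample values, and the helper variables theSeen, theFileKeys and theNames are names invented for this example:

```livecode
## Sample comma-keyed array (values are placeholders)
put "addresses.txt" into theFolderFiles[1,"name"]
put "1024" into theFolderFiles[1,"size"]
put "my file.txt" into theFolderFiles[2,"name"]
put "2048" into theFolderFiles[2,"size"]

## Collect the unique first-dimension keys by hand.
## Item 1 of "2,name" is "2"; using it as a key collapses duplicates.
set the itemDelimiter to comma
repeat for each line theKey in the keys of theFolderFiles
   put true into theSeen[item 1 of theKey]
end repeat
put the keys of theSeen into theFileKeys
sort lines of theFileKeys numeric

## Only now can I loop over the files themselves
repeat for each line theFileKey in theFileKeys
   put theFolderFiles[theFileKey,"name"] & return after theNames
end repeat
```

The first loop exists only to rediscover a structure the engine never knew about, and every extra dimension needs another round of the same string slicing.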

Given our history together you can only imagine my excitement when I installed the beta of the next version of Revolution (referred to as "Bernard" at Revolution headquarters) and read the following in the engine change log:

Let's look at what updating an array to the status of "first-class value" means.

Knowing the intense hardships developers had to endure by living in a one-dimensional world, the kind engineers at Runtime Revolution have given us more square brackets. This means we can now use multiple '[' ']' expressions. Let's look at how we would store a folder's files using new arrays:
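
A minimal sketch of the two-file folder using the new syntax. The name addresses.txt comes from an example later in this article; the second file name and both sizes are placeholders chosen for illustration:

```livecode
## First bracket selects the file, second bracket selects the property
put "addresses.txt" into theFolderFiles[1]["name"]
put "1024" into theFolderFiles[1]["size"]
put "my file.txt" into theFolderFiles[2]["name"]
put "2048" into theFolderFiles[2]["size"]
```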

Notice that we now have two sets of square brackets. The numbers ([1] and [2]) make up the first dimension. Each number has a second dimension made up of the file properties (["name"] and ["size"]). Compared to the example using commas the syntax has only changed slightly. But the result is completely different. Now if I ask the engine for the keys of the array I get:

put the keys of theFolderFiles

1
2

You see, the engine now knows about the hierarchy of the array. With multi-dimensional arrays I can also ask the engine to tell me what the keys of theFolderFiles[1] are and it will respond:

put the keys of theFolderFiles[1]


name
size

So now you actually have an array stored in the key of another array, or in other words, a multi-dimensional array.

Our new arrays resolve the issues I mentioned previously relating to adding, updating and deleting keys as well as iterating through the multiple dimensions. Some examples of what you can do with the new multi-dimensional arrays are:

## Create array
put "my file.txt" into theFile["name"]
put "1024" into theFile["size"]

## Replace array stored in key "2" of theFolderFiles with
# theFile array
put theFile into theFolderFiles[2]

## Put the array stored in key "1" of theFolderFiles in another
# array
put theFolderFiles[1] into theFile

put theFile["name"]


"addresses.txt"

## Create array containing new file info
put "readme.txt" into theFile["name"]
put "2048" into theFile["size"]

## Determine the highest numbered key in dimension one
## then add 1 in order to get key where we will store new file
## the extents returns two items in a comma delimited list: 
## item 1 is the lowest number in the keys
## item 2 is the highest number in the keys
put (item 2 of the extents of theFolderFiles) + 1 into theNextKey
put theFile into theFolderFiles[theNextKey]

## Empty the file array stored in key "2"
put empty into theFolderFiles[2]

## Delete file array stored in key "2"
delete local theFolderFiles[2]

put item 2 of the extents of theFolderFiles into theHighestKey
repeat with i = 1 to theHighestKey
   put theFolderFiles[i] into theFile
   ## Do something ...
end repeat

Iterating sequentially through a numerically indexed array that may be missing numbers due to a delete operation:

## We may have deleted a key from dimension one so iterate
# using the keys
put the keys of theFolderFiles into theKeys
sort lines of theKeys numeric
repeat for each line theKey in theKeys
   put theFolderFiles[theKey] into theFile
   ## Do something ...
end repeat

The above examples illustrate the fact that Revolution now allows us to store arrays in the keys of another array.

Whenever I can work with a native data structure I find that it is easier to write and read code, as I am using native syntax rather than calls to some library. With these new arrays developers can begin to write generic handlers for manipulating multi-dimensional arrays. This means it is now easier to do things such as translate popular formats into a native Revolution data structure and back again.
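
As a rough sketch of what such a generic handler might look like, the function below walks an array of any depth and returns an indented text outline of it. The handler name ArrayToText and its parameters are invented for this example, and it assumes that an element with at least one key of its own is a nested array:

```livecode
## Hypothetical helper: returns an indented outline of pArray.
## pIndent is the prefix for the current level (pass empty at the top).
function ArrayToText pArray, pIndent
   local theText, theKeys
   put the keys of pArray into theKeys
   sort lines of theKeys
   repeat for each line theKey in theKeys
      if the keys of pArray[theKey] is not empty then
         ## The element is itself an array, so recurse one level deeper
         put pIndent & theKey & return after theText
         put ArrayToText(pArray[theKey], pIndent & tab) after theText
      else
         ## The element is a plain value
         put pIndent & theKey & ": " & pArray[theKey] & return after theText
      end if
   end repeat
   return theText
end ArrayToText
```

Calling put ArrayToText(theFolderFiles, empty) would list each file key followed by its indented properties. The handler never needs to know how many dimensions the array has, which is exactly what the comma-based workaround could not offer.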

Take XML for example. When your XML documents are not prohibitively large you no longer have to resort to using XML handlers to manipulate the XML data. You can simply convert the XML document to a Revolution array, manipulate the array, and then convert the array back into XML.

To illustrate Array <-> XML conversions I've put together a Revolution stack with some sample handlers. The handlers will help illustrate the flexibility of the new arrays. Note that you will need the latest Revolution beta in order to use the stack.

Needless to say I'm really excited about arrays getting an upgrade. I think our relationship is definitely going to improve.