Title

SRFI 25: Multi-dimensional Array Primitives

Author

Jussi Piitulainen

Status

This SRFI is currently in ``final'' status. To see an explanation of each status that a SRFI can hold, see here. You can access the discussion via the archive of the mailing list.

Draft: 2001/11/12-2002/01/11
Revised: 2002/02/03
Final: 2002/05/21

Abstract

A core set of procedures for creating and manipulating heterogeneous multidimensional arrays is proposed. The design is consistent with the rest of Scheme and independent of other container data types. It provides easy sharing of parts of an array as other arrays without copying, encouraging a declarative style of programming.

The specification is based on an original contribution by Alan Bawden in 1993.

Rationale

The proposed arrays encourage a natural declarative programming style. They allow sharing of most any rectangular part of an array through an affine index mapping, without copying. But imperative style is equally natural.

The design is consistent with the two indexed data structures of Scheme: vectors and strings. The design makes arrays a self-contained type. These statements are illustrated in the following paragraphs.

First, in the one-dimensional case, the arguments of the following relevant calls match exactly.

    (vector-set! v k o)
    (string-set! s k c)
    (array-set! a k o)

Likewise, make-array matches make-vector and make-string. An analogue to vector, string and list is provided, alleviating the lack of an external representation. Index bounds are specified as for substring, lower bound included and upper bound excluded.

Array shapes are specified as arrays. These can be made with a special procedure shape that does not have a shape argument. An array does not retain a dependence to the shape array. For example, mutation of a shape array is allowed.

Index mappings return multiple values as multiple values.

Array dimensions can begin at any index. In particular, the choice between 0 and 1 is left to the user. (Shapes and index objects are zero based, though.)

The ability to pack an index sequence in a vector is useful for implementing higher level operations. (The ability to pack it in a one-dimensional array lets one use, say, a row of a matrix as an index.)

It is not required that vectors not be arrays. It is not required that they be, either.

Specification

Arrays are heterogeneous data structures whose elements are indexed by integer sequences of fixed length. The length of a valid index sequence is the rank or the number of dimensions of an array. The shape of an array consists of bounds for each index.

The lower bound b and the upper bound e of a dimension are exact integers with (<= b e). A valid index along the dimension is an exact integer k that satisfies both (<= b k) and (< k e). The length of the array along the dimension is the difference (- e b). The size of an array is the product of the lengths of its dimensions.

A shape is specified as an even number of exact integers. These are alternately the lower and upper bounds for the dimensions of an array.

The following ten procedures should be implemented.

(array? obj)
Returns #t if obj is an array, otherwise returns #f.

Note: there is no reasonable way to implement this procedure accurately in R5RS; SRFI 9 (Defining Record Types) specifies a way, and many Scheme implementations provide something similar.

(make-array shape)
(make-array shape obj)
Returns a newly allocated array whose shape is given by shape. If obj is provided, then each element is initialized to it. Otherwise the initial contents of each element is unspecified. The array does not retain a dependence to shape.

(shape bound ...)
Returns a shape. The sequence bound ... must consist of an even number of exact integers that are pairwise not decreasing. Each pair gives the lower and upper bound of a dimension. If the shape is used to specify the dimensions of an array and bound ... is the sequence b0 e0 ... bk ek ... of n pairs of bounds, then a valid index to the array is any sequence j0 ... jk ... of n exact integers where each jk satisfies (<= bk jk) and (< jk ek).

The shape of a d-dimensional array is a d × 2 array where the element at k 0 contains the lower bound for an index along dimension k and the element at k 1 contains the corresponding upper bound, where k satisfies (<= 0 k) and (< k d).

(array shape obj ...)
Returns a new array whose shape is given by shape and the initial contents of the elements are obj ... in row major order. The array does not retain a dependence to shape.

(array-rank array)
Returns the number of dimensions of array.

    (array-rank
       (make-array (shape 1 2 3 4)))

Returns 2.

(array-start array k)
Returns the lower bound for the index along dimension k.

(array-end array k)
Returns the upper bound for the index along dimension k.

(array-ref array k ...)
(array-ref array index)
Returns the contents of the element of array at index k .... The sequence k ... must be a valid index to array. In the second form, index must be either a vector or a 0-based 1-dimensional array containing k ....

    (array-ref (array (shape 0 2 0 3)
                  'uno 'dos 'tres
                  'cuatro 'cinco 'seis)
       1 0)

Returns cuatro.

    (let ((a (array (shape 4 7 1 2) 3 1 4)))
       (list (array-ref a 4 1)
             (array-ref a (vector 5 1))
             (array-ref a (array (shape 0 2)
                             6 1))))

Returns (3 1 4).

(array-set! array k ... obj)
(array-set! array index obj)
Stores obj in the element of array at index k .... Returns an unspecified value. The sequence k ... must be a valid index to array. In the second form, index must be either a vector or a 0-based 1-dimensional array containing k ....

    (let ((a (make-array
                (shape 4 5 4 5 4 5))))
       (array-set! a 4 4 4 'huuhkaja)
       (array-ref a 4 4 4))

Returns huuhkaja.

(share-array array shape proc)
Returns a new array of shape shape that shares elements of array through proc. The procedure proc must implement an affine function that returns indices of array when given indices of the array returned by share-array. The array does not retain a dependence to shape.

    (define i_4
       (let* ((i (make-array
                    (shape 0 4 0 4)
                    0))
              (d (share-array i
                    (shape 0 4)
                    (lambda (k)
                       (values k k)))))
          (do ((k 0 (+ k 1)))
              ((= k 4))
             (array-set! d k 1))
          i))

Note: the affinity requirement for proc means that each value must be a sum of multiples of the arguments passed to proc, plus a constant.

Implementation note: arrays have to maintain an internal index mapping from indices k1 ... kd to a single index into a backing vector; the composition of this mapping and proc can be recognised as (+ n0 (* n1 k1) ... (* nd kd)) by setting each index in turn to 1 and others to 0, and all to 0 for the constant term; the composition can then be compiled away, together with any complexity that the user introduced in their procedure.

This document does not specify any external representation for arrays. This document does not specify when arrays are equal?. (Indeed, R5RS equal? will do the wrong thing.)

Examples

The reference implementation comes with a number of files that illustrate some ways to use the proposed system (and are very useful in testing an implementation; that is their origin).

A library arlib.scm that contains, among several other things, tabulate-array for a more useful initialization of a new array, an array-equal?, and a transpose that can permute the dimensions of an array any which way.
A test suite test.scm for array, and another test suite list.scm for arlib.
A rudimentary display procedure (play array) in play.scm, for playing around with the system.

Implementation

A portable reference implementation is provided. It uses a minimal error reporting mechanism that conforms to SRFI 23 (Error reporting mechanism). Type disjointness requires support from the host implementation, such as support for SRFI 9 (Defining Record Types). All names not defined in this proposal are in the prefix "array:", which serves as a module system.

You can get source for the reference implementation as a single file and stop reading. But there are variations. This single file represents arrays as procedures (so the type predicate is very approximate); it represents index mapping as vectors of coefficients; map recognition is not optimised for any number of dimensions as that would be redundant in this representation.

The real source comes in too many files. A working installation consists of the following parts, each in its own file.

a record type definition, either system specific for type disjointness, or portable as procedures, in a file as-*.scm, and
indexing operations to match the type, in a file ix-*.scm, and
an affine recogniser of one of three types, optimised up to some number of dimensions, in a file op-*.scm, and
the main source file array.scm.

Affine recognisers are made by a program opt.scm but one of each type is also available here, optimized for 0, 1, 2 and 3 dimensions. Choose one type: pick a recogniser with matching index procedures; load as-, ix- and op- and array.)

In the mbda type representation, index mappings are procedures that accept an optional argument. The matching access procedures apply the mapping to the arguments of array-ref and array-set!.
In the tter type representation, index mappings are pairs of procedures: one takes exactly the indices, the other takes indices and an object. The matching access procedures apply the first procedure to the argumets of array-ref and the second procedure to the arguments of array-set!.
In ctor representation, index mappings are coefficient vectors. The access procedures compute the sum of products of coefficients and indexes in a loop on the list.

Record implementations are available for generic Scheme (arrays are not disjoint from procedures), for SRFI 9 (Defining Record Types) (not tested), and for PLT Scheme (arrays belong to a struct type).

With the three files from above, the main source file should work in any Scheme implementation without need of modification.

Error checking in the implementation may be a tad expensive. The places where it occurs are cleanly separated from the surrounding code. (Sharing uses a check that is exponential in the number of dimensions. It is disabled above a threshold rank.)

Acknowledgements

The original concept comes from a message to the Usenet newsgroup comp.lang.scheme by Alan Bawden in 1993. A variant of that implementation by Richard Kelsey in the Scheme 48 system was also an influence. Apart from the origins, the main design goal has been consistency with the core types of Scheme.

Alan Bawden and Mark K. Gardner gave useful comments at an earlier attempt to make this specification public. (There was at least one other. Notes have gone missing.) SRFI feedback led to improved wording, hidden shapes, and two kinds of index objects.

The exact title of the proposal comes from a message titled "a process that might work" by William D. Clinger to the rrrs-authors mailing list in 1998. That appears to be a part of the past of the SRFI process.

Copyright

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Editor: Mike Sperber

Last modified: Tue May 28 18:46:09 MST 2002