From: Robert Bernecky
Date: 97/10/27 13:23
Newsgroups: comp.lang.apl
Subject: Re: What is APL data storage?
To: Larisa Cherkanskaya

Larisa Cherkanskaya wrote:
> I'm a computer science student. Part of my assignment is to design a
> storage management system for a heap of variable-size blocks, under the
> assumption that the heap is used only to store arrays of real numbers.
> This assumption is basically the one behind APL data storage.
> Can anybody please tell me where I can find the APL data description?
>
> My e-mail: akrats1@triton.towson.edu
>
> Thank you! Alex.

I am a designer and implementor of APL interpreters and compilers, so I
might be able to provide a few suggestions and comments:

a. The idea that APL is heavy on arrays of real numbers is a common myth,
but not true. Boolean, integer, and character arrays dominate the data
types; 0- and 1-element arrays represent more than half of the array sizes
encountered in real applications. Measurements of real applications, taken
over periods of weeks on large systems, bear out these statistics. I can
provide 3 or 4 independent citations of studies if you wish. However, that
is not truly relevant to your problem, which is fundamentally the issue of
array heap management.

b. The issue of arrays being real numbers only is a red herring. Bits is
bits, as they say. The only potential design advantage given by that
limitation is that you can elide type information, and perhaps gain some
performance by ensuring that data is aligned in memory in a manner
appropriate to your target machine.

c. I am familiar with several forms of storage management in APL systems.
Simply put, they are as follows:

1. Use the operating system calls for storage allocation and deallocation
(e.g., malloc/free). This generally presumes that data is not going to
have to move while allocated, and it may cause storage fragmentation to
the point of system failure if array sizes are highly variable and the
system executes for long periods of time. Operating system calls may also
be more expensive than either of the following techniques.

2. Allocate your own heap at startup time and manage it yourself. This is
generally faster than operating system calls, but it means that you get to
write and debug your own storage manager, and you also tend to be limited
in the heap size you can allocate: you have to guess how much dynamic
memory will be needed during the entire execution. (This is the APL
"workspace full" problem.) Methods of heap management of this sort are
many and varied, and I recommend you check a standard algorithms text for
them. An advantage of this scheme, besides performance, is that you can
move arrays during execution to defragment memory.

3. Use a fancy storage management system that is a hybrid of the two
above, perhaps combined with a buddy system, to reduce fragmentation and
the need to garbage collect/defragment storage.

d. Arrays tend to be stored in APL as something like this (call this an
m-entry, for memory entry). [By the way, schemes of this sort date back to
the late 1960s and the development of APL\360 at IBM.]

    backpointer
    length of the m-entry in bytes
    element count
    reference count
    array type
    array rank (# of dimensions)
    array shape
    array elements (variable length)
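In C, the descriptor might be sketched like this. The field names and
widths here are illustrative assumptions only, and the trailing
"struct hack" is just one way to handle the variable-length part; no
particular interpreter is being described:

    #include <stddef.h>

    typedef struct m_entry {
        struct m_entry **backpointer; /* where the forward pointer to this
                                         m-entry lives, so a compacter can
                                         patch it when the array moves   */
        size_t mbytes;                /* length of the m-entry in bytes  */
        size_t nelems;                /* element count: times reduction
                                         of the shape                    */
        int    refcount;              /* references to this m-entry      */
        int    type;                  /* boolean, integer, character,
                                         real, complex                   */
        int    rank;                  /* number of axes; entries in the
                                         shape vector                    */
        size_t shape[1];              /* rank entries, followed in memory
                                         by the row-major elements
                                         (pre-C99 "struct hack")         */
    } m_entry;

    /* The elements start right after the shape vector: */
    static void *elements(m_entry *e) { return (void *)(e->shape + e->rank); }

With this layout, an allocation asks the heap for sizeof(m_entry) +
(rank - 1) * sizeof(size_t) plus the element bytes, rounded up as needed
for alignment.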
The meanings of these items are:

backpointer: an index or address of the thing that points to this array.
The backpointer gives you the information you need to move the array
somewhere else in memory, perhaps for heap compaction. When you move the
m-entry, you use the backpointer to find the forward pointer to the
m-entry, and then adjust that forward pointer to point to the m-entry's
new location.

m-entry length: used by the storage compacter when the array is moved,
and by the storage deallocator when the array is deallocated.

element count: many APL primitives need to know the number of elements in
the array. For example, the "ravel" primitive (,), which turns any array
into a vector (list), has to compute the result shape. The element count,
formed as the times reduction of the array shape, is redundant, but it is
used often enough during execution that it makes sense to keep it around,
rather than redoing the multiplies on each primitive call.

reference count: the number of "things" that point to this m-entry. New
references to the m-entry increase the reference count; deleted
references decrease it. When the reference count goes to zero, the
m-entry is deallocated. This speeds up certain operations and makes them
take less memory. For example, if HERM is a big matrix, and you call a
function foo:

    foo HERM

then you can either copy all of HERM into a new array for use by foo, OR
you can increase the reference count for HERM and pass it directly into
foo. Anything that might want to alter foo's argument checks the
reference count: if it is 1, the changes can safely be made in place;
otherwise a copy has to be made first. (See the first sketch in the P.S.
below.) Reference counts are generally swell. In compiled APL, such as my
APEX compiler, we can perform static analysis to remove most of the
reference counting. This is difficult to do efficiently (or at all, for
that matter) in an interpreter.

array type: since APL supports many data types on overloaded primitives,
and since APL does not use declarations, the type is used to identify the
array as, e.g., boolean, integer, character, real, or complex. You don't
need this, as your arrays are all real.

array rank: the number of dimensions (axes) in the array; that is, the
number of elements in the shape vector.

array shape: the number of elements along each axis of the array. A
chessboard has shape 8 8. If you ravel it to turn it into a vector, the
result has shape ,64.

array elements: finally, the array elements themselves, stored in
row-major order. This ordering makes certain primitives faster, lets us
easily exploit identities and use the cache to advantage, and matches
most other languages' storage layouts (except FORTRAN's).

A good thing to do, now that I've told you this, is to decouple the array
data from the descriptor, keeping the array elements in one heap (or a
portion thereof) and the descriptors in another. This permits you to
share the same data among arrays of different rank or shape. For example,
that would let you store the chessboard and its ravelled list as the same
array.

Another good thing to do is to maintain an "array coordinate map" [see
Guibas & Wyatt, POPL 78] that extends this scheme so that common array
operations, such as transpose, reversal, drop, and row indexing, can be
done in fixed time, with no data movement, regardless of the size of the
arrays involved. (The second sketch in the P.S. shows the flavor of
this.)

There's more, but that should give you a start...

Bob
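P.S. Two sketches in C, in case code helps. First, the reference-count
discipline described above, using the m_entry from the earlier sketch.
The helper names, and the routines deallocate and copy_of, are
hypothetical, named here only for illustration:

    extern void     deallocate(m_entry *e);  /* hypothetical heap routines */
    extern m_entry *copy_of(m_entry *e);

    m_entry *ref(m_entry *e) {          /* new reference: bump the count */
        e->refcount++;
        return e;
    }

    void deref(m_entry *e) {            /* dropped reference             */
        if (--e->refcount == 0)
            deallocate(e);              /* last one out frees the array  */
    }

    /* A primitive that wants to update its argument in place: */
    m_entry *update_in_place(m_entry *e) {
        if (e->refcount != 1) {         /* shared: someone else sees it, */
            m_entry *c = copy_of(e);    /* so copy first                 */
            deref(e);
            e = c;
        }
        /* ... now safe to overwrite e's elements ... */
        return e;
    }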
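Second, the flavor of a coordinate map. If the descriptor carries an
offset and a per-axis stride into shared element storage, then reversal
(and likewise transpose, drop, and row indexing) becomes an edit to the
descriptor, with no data movement. This is my own minimal rendering of
the idea, not the Guibas & Wyatt scheme itself:

    #include <stddef.h>

    typedef struct {
        double *data;       /* element storage, possibly shared         */
        size_t  rank;
        size_t  shape[8];   /* extent of each axis (8: arbitrary cap)   */
        long    stride[8];  /* step, in elements, along each axis       */
        long    offset;     /* index of element 0 0 ... 0               */
    } view;

    /* Row-major indexing: the element at coordinates c[0..rank-1]. */
    double at(const view *v, const size_t *c) {
        long   ix = v->offset;
        size_t k;
        for (k = 0; k < v->rank; k++)
            ix += (long)c[k] * v->stride[k];
        return v->data[ix];
    }

    /* Reverse axis k in fixed time: move the origin, flip the stride. */
    void reverse_axis(view *v, size_t k) {
        v->offset += (long)(v->shape[k] - 1) * v->stride[k];
        v->stride[k] = -v->stride[k];
    }

Transpose is then just a permutation of the shape and stride entries, and
drop is an offset adjustment plus smaller shape entries; none of them
touch the data.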