From: Robert Bernecky
Date: 97/10/27 13:23
Newsgroups: comp.lang.apl
Subject: Re: What is APL data storage?
To: Larisa Cherkanskaya

Larisa Cherkanskaya wrote:
> I'm a computer science student. Part of my assignment is to design a
> storage management system for a heap of variable-size blocks, under the
> assumption that the heap is used only to store arrays of real numbers.
> This assumption is basically the one behind APL data storage.
> Can anybody please tell me where I can find the APL data description?
>
> My e-mail: akrats1@triton.towson.edu
>
> Thank you! Alex.

I am a designer and implementor of APL interpreters and compilers, so I
might be able to provide a few suggestions and comments:

a. The idea that APL is heavy on arrays of real numbers is a common myth,
but not true. Boolean, integer, and character arrays dominate the data
types; 0- and 1-element arrays represent more than half of the array sizes
encountered in real applications. Measurements of real applications, taken
over periods of weeks on large systems, bear out these statistics. I can
provide 3 or 4 independent citations of studies if you wish. However, that
is not truly relevant to your problem, which is fundamentally the issue of
array heap management.

b. The issue of arrays being real numbers only is a red herring. Bits is
bits, as they say. The only potential design advantage given by that
limitation is that you can elide type information, and perhaps gain some
performance by ensuring that data is aligned in memory in a manner
appropriate to your target machine.

c. I am familiar with several forms of storage management in APL systems.
Simply put, they are as follows:

1. Use the operating system calls for storage allocation and deallocation
(e.g., malloc/free). This generally presumes that data is not going to
have to move while allocated, and it may cause storage fragmentation to
the point of system failure if array sizes are highly variable and the
system executes for long periods of time. Operating system calls may also
be more expensive than either of the following techniques.

2. Allocate your own heap at startup time and manage it yourself. This is
generally faster than operating system calls, but it means that you get to
write and debug your own storage manager, and you also tend to be limited
in the heap size you can allocate: you have to guess how much dynamic
memory will be needed during the entire execution. (This is the APL
"workspace full" problem.) Methods of heap management of this sort are
many and varied, and I recommend you check a standard algorithms text for
them. An advantage of this scheme, besides performance, is that you can
move arrays during execution to defragment memory.

3. Use a fancy storage management system that is a hybrid of the two
above, perhaps combined with a buddy system, to reduce fragmentation and
the need to garbage collect/defragment storage.

d. Arrays tend to be stored in APL as something like this (call this an
m-entry, for memory entry). [By the way, schemes of this sort date back to
the late 1960s and the development of APL\360 at IBM.]

    backpointer
    length of the m-entry in bytes
    element count
    reference count
    array type
    array rank (# of dimensions)
    array shape
    array elements (variable length)
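In C, the descriptor might be sketched like this. The field names and
widths here are illustrative assumptions only, and the trailing
"struct hack" is just one way to handle the variable-length part; no
particular interpreter is being described:

    #include <stddef.h>

    typedef struct m_entry {
        struct m_entry **backpointer; /* where the forward pointer to this
                                         m-entry lives, so a compacter can
                                         patch it when the array moves   */
        size_t mbytes;                /* length of the m-entry in bytes  */
        size_t nelems;                /* element count: times reduction
                                         of the shape                    */
        int    refcount;              /* references to this m-entry      */
        int    type;                  /* boolean, integer, character,
                                         real, complex                   */
        int    rank;                  /* number of axes; entries in the
                                         shape vector                    */
        size_t shape[1];              /* rank entries, followed in memory
                                         by the row-major elements
                                         (pre-C99 "struct hack")         */
    } m_entry;

    /* The elements start right after the shape vector: */
    static void *elements(m_entry *e) { return (void *)(e->shape + e->rank); }

With this layout, an allocation asks the heap for sizeof(m_entry) +
(rank - 1) * sizeof(size_t) plus the element bytes, rounded up as needed
for alignment.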
The meanings of these items are:

backpointer: an index or address of the thing that points to this array.
The backpointer gives you the information you need to move the array
somewhere else in memory, perhaps for heap compaction. When you move the
m-entry, you use the backpointer to find the forward pointer to the
m-entry, and then adjust that forward pointer to point to the m-entry's
new location.

m-entry length: used by the storage compacter when the array is moved,
and by the storage deallocator when the array is deallocated.

element count: many APL primitives need to know the number of elements in
the array. For example, the "ravel" primitive (,), which turns any array
into a vector (list), has to compute the result shape. The element count,
formed as the times reduction of the array shape, is redundant, but it is
used often enough during execution that it makes sense to keep it around,
rather than redoing the multiplies on each primitive call.

reference count: the number of "things" that point to this m-entry. New
references to the m-entry increase the reference count; deleted
references decrease it. When the reference count goes to zero, the
m-entry is deallocated. This speeds up certain operations and makes them
take less memory. For example, if HERM is a big matrix, and you call a
function foo:

    foo HERM

then you can either copy all of HERM into a new array for use by foo, OR
you can increase the reference count for HERM and pass it directly into
foo. Anything that might want to alter foo's argument checks the
reference count: if it is 1, the changes can safely be made in place;
otherwise a copy has to be made first. (See the first sketch in the P.S.
below.) Reference counts are generally swell. In compiled APL, such as my
APEX compiler, we can perform static analysis to remove most of the
reference counting. This is difficult to do efficiently (or at all, for
that matter) in an interpreter.

array type: since APL supports many data types on overloaded primitives,
and since APL does not use declarations, the type is used to identify the
array as, e.g., boolean, integer, character, real, or complex. You don't
need this, as your arrays are all real.

array rank: the number of dimensions (axes) in the array; that is, the
number of elements in the shape vector.

array shape: the number of elements along each axis of the array. A
chessboard has shape 8 8. If you ravel it to turn it into a vector, the
result has shape ,64.

array elements: finally, the array elements themselves, stored in
row-major order. This ordering makes certain primitives faster, lets us
easily exploit identities and use the cache to advantage, and matches
most other languages' storage layouts (except FORTRAN's).

A good thing to do, now that I've told you this, is to decouple the array
data from the descriptor, keeping the array elements in one heap (or a
portion thereof) and the descriptors in another. This permits you to
share the same data among arrays of different rank or shape. For example,
that would let you store the chessboard and its ravelled list as the same
array.

Another good thing to do is to maintain an "array coordinate map" [see
Guibas & Wyatt, POPL 78] that extends this scheme so that common array
operations, such as transpose, reversal, drop, and row indexing, can be
done in fixed time, with no data movement, regardless of the size of the
arrays involved. (The second sketch in the P.S. shows the flavor of
this.)

There's more, but that should give you a start...

Bob
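P.S. Two sketches in C, in case code helps. First, the reference-count
discipline described above, using the m_entry from the earlier sketch.
The helper names, and the routines deallocate and copy_of, are
hypothetical, named here only for illustration:

    extern void     deallocate(m_entry *e);  /* hypothetical heap routines */
    extern m_entry *copy_of(m_entry *e);

    m_entry *ref(m_entry *e) {          /* new reference: bump the count */
        e->refcount++;
        return e;
    }

    void deref(m_entry *e) {            /* dropped reference             */
        if (--e->refcount == 0)
            deallocate(e);              /* last one out frees the array  */
    }

    /* A primitive that wants to update its argument in place: */
    m_entry *update_in_place(m_entry *e) {
        if (e->refcount != 1) {         /* shared: someone else sees it, */
            m_entry *c = copy_of(e);    /* so copy first                 */
            deref(e);
            e = c;
        }
        /* ... now safe to overwrite e's elements ... */
        return e;
    }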
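Second, the flavor of a coordinate map. If the descriptor carries an
offset and a per-axis stride into shared element storage, then reversal
(and likewise transpose, drop, and row indexing) becomes an edit to the
descriptor, with no data movement. This is my own minimal rendering of
the idea, not the Guibas & Wyatt scheme itself:

    #include <stddef.h>

    typedef struct {
        double *data;       /* element storage, possibly shared         */
        size_t  rank;
        size_t  shape[8];   /* extent of each axis (8: arbitrary cap)   */
        long    stride[8];  /* step, in elements, along each axis       */
        long    offset;     /* index of element 0 0 ... 0               */
    } view;

    /* Row-major indexing: the element at coordinates c[0..rank-1]. */
    double at(const view *v, const size_t *c) {
        long   ix = v->offset;
        size_t k;
        for (k = 0; k < v->rank; k++)
            ix += (long)c[k] * v->stride[k];
        return v->data[ix];
    }

    /* Reverse axis k in fixed time: move the origin, flip the stride. */
    void reverse_axis(view *v, size_t k) {
        v->offset += (long)(v->shape[k] - 1) * v->stride[k];
        v->stride[k] = -v->stride[k];
    }

Transpose is then just a permutation of the shape and stride entries, and
drop is an offset adjustment plus smaller shape entries; none of them
touch the data.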