Sprezzatura :: Making Databases Happen

Argument passing - Subroutines and Functions - Mike Pope

Jim Owens notes that "the C-esque use of local vs. global variables doesn't apply to R[/]BASIC". True, but misleading; there is in fact a notion of local, shared, and global variables - it's just that it doesn't work as it does with C.

To begin, look at these two versions of essentially the same user-defined function:

    * Version 1
    FUNCTION SPC(STRING, LENGTH, JUST)
     JUST = JUST : '#' : LENGTH
     STRING = FMT(STRING, JUST)
    RETURN STRING

    * Version 2
    FUNCTION SPC(STRING, LENGTH, JUST)
     TEMP.JUST = JUST : '#' : LENGTH
     RETURN.STRING = FMT(STRING, TEMP.JUST)
    RETURN RETURN.STRING

What's the difference? Both return a formatted string. But the second one does so without altering any of the data passed to it. The first one modifies two variables: STRING (by reformatting its contents), and JUST (by reassigning it with a full justification specification). This explains the results Mr Owens experienced with his problem.

In the former case, the values of these variables will be changed in the program that called this one. This is true because the variables are shared with the calling program, by virtue of being passed as arguments. In the latter example, by contrast, the contents of the variable TEMP.JUST are "invisible" to the calling program because it is a local variable - it is neither one of the variables used in the argument list, nor is it the explicit return value for the function.
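
As a sketch of the side effect, here is a hypothetical calling program that calls Version 1 twice. The names CUST.NAME, HOW and RESULT are invented for illustration, and the sketch assumes SPC has been compiled and cataloged so that it can be declared as a function:

```basic
    DECLARE FUNCTION SPC
    CUST.NAME = "SMITH"
    HOW = "L"
    * First call: Version 1 rewrites both variables in this program
    RESULT = SPC(CUST.NAME, 10, HOW)
    * HOW now holds "L#10", and CUST.NAME is padded to 10 characters
    PRINT HOW
    * Second call: HOW goes in as "L#10" and comes back as "L#10#10"
    RESULT = SPC(CUST.NAME, 10, HOW)
    PRINT HOW
```

With Version 2 in place of Version 1, HOW would still contain "L" after both calls.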

LOCAL, SHARED, COMMON GLOBAL AND GLOBAL VARIABLES

There are four types of data areas in R/BASIC, three of them shared to some degree, summarized as follows:

Global         @Variables. These may be globally readable (@USERNAME),
               globally settable (@USER0, @SENTENCE), or readable/settable
               for the current recursive (TCL) level (@LEVEL, @RECUR).

Common Global  Common areas established with a label, i.e., labelled common.
               Information in labelled common is available to all programs
               (main or subroutine/function) that declare the same label.
               Information is maintained for the entirety of the logon
               session unless explicitly nulled (although this does not
               de-allocate the labelled area).

Shared         Argument lists and explicit COMMON variables. This is a
               limited form of inheritance by which one program can share
               data with programs "below" it. Variables of these types can
               be shared between one main program and any subroutines or
               functions called by that main program, including subroutines
               that call further subroutines. A main program cannot,
               however, share arguments or common variables with another
               main program. Any change made to a shared variable in any of
               the interrelated programs is reflected for all subsequent
               access to that variable. There are differences in the way
               arguments and common variables are handled (See below).

Local          Variables in a program that are not one of the above, i.e.
               that are not @-variables, labelled common, common, or part of
               an argument list.

PASSING BY REFERENCE, PASSING BY VALUE

R/BASIC passes information in argument lists, common, labelled common, and @variables by reference. This means that the variables in the calling and called programs reference the same physical data (more accurately, the same area of memory), even if the variables in the two programs happen to have different names. By extension, this means that if a subroutine changes the value of a variable passed to it in an argument list or shared via a COMMON declaration, this change is reflected in the main program, because both programs are actually referring to the same information.

It is possible to pass by value, which means that the calling and called routines have separate copies of the data. This is done in R/BASIC by passing an expression (rather than a variable) in an argument list or common area (this doesn't work for @Variables). For example:

0001     CALL MYSUB(A:B, 2 + 3, "TEST")

All three arguments in the example are expressions, and are therefore passed by value. In other words, the subroutine will receive the values represented by the expressions. However, the subroutine cannot hand a changed value back through these arguments, as this calling syntax does not allow for a value to be returned using the argument list. (Functions can be called successfully in this manner, as they return a value "in place").
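
If a subroutine is known to alter its arguments, the same principle can be used to protect variables in the calling program: turn each variable into an expression before passing it. A minimal sketch, reusing the MYSUB call from above (the variable names NAME.IN and COUNT.IN are illustrative only):

```basic
    NAME.IN = "SMITH"
    COUNT.IN = 5
    * By reference: MYSUB can change NAME.IN and COUNT.IN
    CALL MYSUB(NAME.IN, COUNT.IN, "TEST")
    * By value: concatenating a null string or adding zero makes each
    * argument an expression, so MYSUB works on its own copies
    CALL MYSUB(NAME.IN : "", COUNT.IN + 0, "TEST")
```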

DESCRIPTERS

Underlying R/BASIC's use of variables is a fairly common operating system technology known as a descripter (or descriptor) table. Simply put, the descripter table is an array of all the variables currently in use anywhere in the system, including @variables (mostly), local variables, labelled common variables, and a couple of other lower-level types. If you like, you can think of a descripter table as a large bank of pigeonholes or postboxes into which information about the data in a variable can be stored.

Each variable in each program has one entry (usually) in the descripter table. Each entry in the descripter table is 10 bytes long, and the table as a whole can be thought of as a single huge variable (more accurately, a single segment), and is therefore limited to 64k. By deduction, the maximum number of variables that can be allocated in toto for one session is 64k/10 or 6,553. The system itself has grabbed about one thousand of these by the time the user takes control, however, so there are somewhat over 5,000 available at level 1 TCL.
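
The arithmetic behind the limit can be checked in a few lines. This is only a restatement of the figures given above (a 64k segment, 10 bytes per entry, about one thousand entries taken by the system); the variable names are invented:

```basic
    * 64k segment divided into 10-byte entries
    TABLE.BYTES = 64 * 1024
    ENTRY.BYTES = 10
    MAX.ENTRIES = INT(TABLE.BYTES / ENTRY.BYTES)
    * Prints 6553
    PRINT MAX.ENTRIES
    * Roughly what is left once the system has taken about 1,000
    PRINT MAX.ENTRIES - 1000
```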

ALLOCATION OF DESCRIPTERS

In the simplest scenario, when a program is loaded, the system allocates a descripter for every variable in the program. One of the items of data stored in a program's object code is the number of variables it requires, so this is known as soon as the object code is read. For example, if a program uses 5 local variables, the system allocates 5 descripter table entries when the program is loaded. When the program terminates, the descripters are de-allocated, and those slots in the table again become available.

When you call a subroutine or function, the system loads the code for the routine and allocates descripters for its variables as well (but see below). When the subroutine or function executes a RETURN statement, however, its descripters are not de-allocated. They remain in the table (as, in fact, its code remains in the program stack) until the main program that called them terminates with a STOP statement.

DIMENSIONED ARRAYS

One relatively common problem arises from attempting to dimension a particularly large array (for example, DIM ARRAY(100,100)). This results in the error "maximum number of errors exceded [sic]". As it happens, R/BASIC attempts to allocate one descripter table entry for each element in the dimensioned array (plus 2 extra). For example, the statement DIM ARRAY(100,100) attempts to allocate 10,002 descripters, considerably beyond the size of the table. The result is as noted. (This is not a problem for dynamic arrays because a dynamic array is treated as a single variable that just happens to have delimiters in it. As a note of interest, this explains why access to the 5,000th element of a dimensioned array is faster than access to the 5,000th element of a dynamic array. In the former case, the system can extract the data as if it were its own variable, which in effect it is. In the latter case, the system must scan through a single large string looking for the 4,999th and 5,000th delimiters.)
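
Applying the same rule (one descripter per element, plus 2), here is a sketch of the two kinds of array side by side. The names GRID and DYN are invented for illustration:

```basic
    * 50 x 50 elements + 2 = 2,502 descripters, well under the table limit
    DIM GRID(50, 50)
    MAT GRID = ""
    GRID(10, 20) = "X"
    * A dynamic array is one variable, so one descripter, however many
    * delimited elements it holds
    DYN = ""
    DYN<5000> = "X"
    PRINT GRID(10, 20)
    PRINT DYN<5000>
```

The dimensioned array buys fast element access at the cost of table entries; the dynamic array makes the opposite trade.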

ARGUMENTS

Shared variables passed as arguments are handled a bit differently. In the main program they are allocated as individual entries in the descripter table. In the called routines, however, a variable that is in the argument list is not simply allocated a new, independent descripter table entry. Shared variables in a subroutine do in fact get entries in the table, but the entries are not "normal" entries. Instead, they are pointers to the descripters that do contain the data for that variable. Thus a reference to an argument in a subroutine is looked up in the descripter table, which points it to another entry. The effect, as noted earlier, is that there is only one "real" descripter for a shared variable, and only one physical location for the data. All other references to it via argument lists point to this one entry.

Given the explanation of "passing by reference", it should also be clear how passing an expression in an argument list is different. In that case, the argument in the argument list has no relationship with an existing descripter, since it isn't a variable that is being passed, but an actual value - there is nothing to point to except a data value directly.

If a subroutine or function is called recursively, each new iteration sets up a new descripter for every variable (local or shared) in the subroutine or function. If the variable is part of an argument list, the descripter for that variable points to the descripter referenced by the calling routine, so that the descripter for an argument to a subroutine that has called itself twice contains a pointer to a pointer to a pointer to a variable. Fortunately, if you pass dimensioned arrays, the system is intelligent enough not to reallocate new descripters for each element. Instead, the argument contains a pointer to one of those extra array descripters mentioned earlier, one that contains information about the dimension of the array and other such salient data.

The fact that a new descripter is allocated in a recursively-called subroutine for each argument explains why an endless loop of recursive calls will sometimes abort with "out of memory" but other times with "too many variables". In the latter case, the first limit that was reached was the number of possible entries in the descripter table for the variables used by that routine.
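
A small, deliberately bounded recursion makes the pointer chain visible. In this sketch DESCEND is a hypothetical subroutine, assumed to be compiled and cataloged separately from the calling program:

```basic
    SUBROUTINE DESCEND(DEPTH)
    * At every level DEPTH leads back, pointer by pointer, to the
    * variable the outermost caller passed in
    DEPTH = DEPTH - 1
    IF DEPTH > 0 THEN CALL DESCEND(DEPTH)
    RETURN
```

The calling program passes a variable, so each level works on the caller's data:

```basic
    LEVELS = 3
    CALL DESCEND(LEVELS)
    * Prints 0: all three levels decremented this program's variable
    PRINT LEVELS
```

Remove the test on DEPTH and the routine recurses until one of the two limits described above is reached.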

COMMON AND LABELLED COMMON VARIABLES

Although common variables are often thought to be interchangeable with argument lists, they are handled differently in the descripter table. The declaration of a common area (labelled or "normal") alerts the compiler to tag the variables in question in the object code, and likewise alerts the program loader to allocate new descripters for those variables only if they are not already allocated. Instead of using a stack of indirect pointers as is done with arguments, the system resolves all references to a common variable by accessing the same descripter each time. This results in a slight improvement in efficiency for lookup, and, obviously, saves the extra descripter entries.
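
As an illustration, a main program and a subroutine can share a running total through labelled common rather than through the argument list. The label MYAPP and the subroutine ADD.AMOUNT are invented for this sketch, which assumes the usual COMMON /label/ form of the declaration:

```basic
    COMMON /MYAPP/ TOTAL
    TOTAL = 0
    * The amounts are literals, so they are passed by value
    CALL ADD.AMOUNT(5)
    CALL ADD.AMOUNT(7)
    * Prints 12
    PRINT TOTAL
```

The subroutine declares the same label and so resolves TOTAL to the same descripter:

```basic
    SUBROUTINE ADD.AMOUNT(AMOUNT)
    COMMON /MYAPP/ TOTAL
    TOTAL = TOTAL + AMOUNT
    RETURN
```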

@VARIABLES

@Variables are handled differently yet again. These are pre-defined in the compiler, so the system already knows something about their locations in the descripter table. A specific reference is therefore compiled right into the program, so that a new descripter is not allocated when the program that references them is loaded.

FUN WITH DESCRIPTERS

There isn't a great deal you can do with descripters directly from R/BASIC, but there are a few tricks. Three R/BASIC-accessible statements and functions deal specifically with descripters:

Transfer     When you assign a value to a variable, you allocate a
             descripter and place information into the descripter. If you
             assign the value of one variable to another (e.g. A = B), you
             allocate a second descripter, look up the value represented by
             the first one, copy it, and then assign it to the second
             descripter. The end result is two descripters, and more
             importantly, two copies of the data. The TRANSFER command
             short-circuits this process. Instead of copying the data and
             assigning it anew, the command simply copies the first
             descripter wholesale to the second one. This literally
             transfers one descripter to another, including information
             about the data it represents. This is a much faster operation
             than copying the data and assigning it a new descripter.
             However, because two descripters cannot point to the same data,
             the first one is reassigned a null value.

Unassigned   An external function (DECLARE it) available somewhere around
             release 1.1, the UNASSIGNED function tells you whether a
             variable has been assigned a value. You might guess how it
             works. It reads the descripter table entry for the variable you
             pass it, and reports back with true if the descripter contains
             the special value placed in there at allocation time (before
             any value is assigned to the variable).

Descripter   The best for last. The internal function DESCRIPTER hands back
             to you the actual descripter table entry for the variable you
             pass it. If you are enterprising (and are reasonably familiar
             with the ways of bit-level data handling) you can probably use
             this function to figure out how descripters are built and what
             they actually contain. Beyond that it has limited use, although
             it can be useful on odd occasions when debugging.
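
The first two of these can be tried in a few lines. This is a sketch only; the variable names are invented, and UNASSIGNED is declared as an external function as described above:

```basic
    DECLARE FUNCTION UNASSIGNED
    BIG = STR("X", 30000)
    * Hand the 30,000 characters to BIG.COPY without copying them
    TRANSFER BIG TO BIG.COPY
    * BIG.COPY now holds the data; BIG has been reset to null
    PRINT LEN(BIG.COPY)
    PRINT LEN(BIG)
    * NEVER.SET is deliberately given no value anywhere in this program
    IF UNASSIGNED(NEVER.SET) THEN PRINT "NEVER.SET is unassigned"
```

TRANSFER is the natural choice when the source variable is no longer needed, since it avoids holding two copies of a large string.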

(Volume 3, Issue 1, Pages 11-15)
