Task 3 -- Basics of subroutine handling

Consider the following "generic" assembly language code segment, which uses subroutine calls:

MyCode:	;	MyCode() subroutine
	code here that uses processor registers
A:	CALL MySub
B:	NOP
	code here that uses processor registers
C:	CALL MySub
D:	NOP
	code here that uses processor registers
	......
	......
	RETURN

MySub:	;	MySub() subroutine
	code here that is using the
	SAME processor registers 
	as did subroutine MyCode()
	RETURN

The first question to ponder is this: when the processor executes the instruction labeled C inside function "MyCode()", how does it know which instruction to execute next?


How does the processor know "where to go"?

The CALL instruction (labeled "C") has built into it the address associated with the instruction that starts the subroutine "MySub()". The logic controlling the execution of the CALL instruction ensures that this address is placed into the processor's program counter, or as that register is called on many Intel processors -- the instruction pointer. This forces the next instruction to be fetched to be the one starting the subroutine "MySub()". More details on the operations involved in subroutine calls can be found in HVZ.


How does the processor know what instruction to execute on returning?

CISC processors

On CISC processors, such as the 68K family, another component of implementing the CALL instruction is to store the address of the instruction following the CALL onto the processor's memory stack. This address is pulled off the stack into the program counter during the RETURN instruction. This forces the correct program flow to be re-established (instruction labeled "D" rather than instruction labeled "B"). More details can be found in HVZ.
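The CALL and RETURN behaviour described above can be sketched as a toy simulation. This is a minimal Python model, not 68K code: the addresses given to the labels A to D and to MySub, the assumed CALL instruction size, and the names `Cpu`, `call`, `ret` and `run_call_from` are all invented for illustration.

```python
# Toy model of a CISC-style CALL/RETURN using a memory stack.
# The addresses below are invented for illustration only.
LABELS = {"A": 0x100, "B": 0x104, "C": 0x120, "D": 0x124, "MySub": 0x200}
CALL_SIZE = 4  # assumed size in bytes of a CALL instruction


class Cpu:
    def __init__(self):
        self.pc = 0      # program counter
        self.stack = []  # stands in for the processor's memory stack

    def call(self, target):
        """CALL: push the address of the following instruction, then jump."""
        self.stack.append(self.pc + CALL_SIZE)
        self.pc = target

    def ret(self):
        """RETURN: pull the saved address off the stack into the PC."""
        self.pc = self.stack.pop()


def run_call_from(label):
    """Execute the CALL at `label`, then the RETURN inside MySub."""
    cpu = Cpu()
    cpu.pc = LABELS[label]
    cpu.call(LABELS["MySub"])
    entered_at = cpu.pc
    cpu.ret()
    return entered_at, cpu.pc


if __name__ == "__main__":
    for label in ("A", "C"):
        entered, resumed = run_call_from(label)
        print(f"CALL at {label}: entered {entered:#x}, resumed at {resumed:#x}")
```

With these made-up addresses, the call from A resumes at B and the call from C resumes at D, even though both calls enter MySub at the same address.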

RISC processors

RISC processors work the same way in requiring storage of a return address but handle the operation differently. Storing things to an external memory stack would slow the RISC processor pipeline. Instead of this slow memory access, the return address is stored internally (fast access) in a dedicated register.

There is a single special register, the "link register", on the PowerPC for storing the return address. On RISC processors with a register window (SPARC and 29K processors), the return address can be stored in one of a number of registers, each of which can play the same role as the link register. This approach has speed advantages when subroutines call other subroutines, as the contents of the link register need not be stored to, and later recovered from, the "slow" external memory stack.

The more recent RISC processors without the register window must still store the link register to memory during nested subroutine calls. They partially counterbalance these slow memory operations by placing some memory, the memory cache, right on the processor chip. Multiple memory operations on a RISC processor, even one with cache, are slow as they can break the fast RISC instruction pipeline. This effect is counterbalanced because the simple RISC instructions can be re-ordered by an "intelligent" compiler for "just-in-time" memory operations: the LOAD instructions are started far enough ahead of the instruction that needs the memory value that any slow memory operation will have completed by the time the value is needed. This is possible as many operations can occur in parallel on the more modern processors.
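The effect of starting a LOAD early can be illustrated with a very rough cycle count. The sketch below assumes a hypothetical in-order processor that issues one instruction per cycle and stalls until an instruction's source registers are ready; the three-cycle load latency, the register names and the function `total_cycles` are assumptions for illustration, not figures for any real processor.

```python
LOAD_LATENCY = 3  # assumed cycles before a loaded value can be used


def total_cycles(program):
    """Count cycles for a list of (op, dest, sources) tuples.

    Instructions issue in order, at most one per cycle, and an
    instruction waits until all of its source registers are ready.
    """
    ready = {}  # register name -> cycle at which its value is available
    cycle = 0
    for op, dest, sources in program:
        start = max([cycle] + [ready.get(src, 0) for src in sources])
        latency = LOAD_LATENCY if op == "LOAD" else 1
        ready[dest] = start + latency
        cycle = start + 1
    return cycle


# The loaded value is used immediately, so the pipeline stalls.
naive = [
    ("LOAD", "r1", []),
    ("ADD", "r2", ["r1"]),
    ("ADD", "r3", ["r4"]),
    ("ADD", "r5", ["r6"]),
]

# Same work, but independent instructions fill the load delay.
reordered = [
    ("LOAD", "r1", []),
    ("ADD", "r3", ["r4"]),
    ("ADD", "r5", ["r6"]),
    ("ADD", "r2", ["r1"]),
]

if __name__ == "__main__":
    print("naive:", total_cycles(naive), "cycles")
    print("reordered:", total_cycles(reordered), "cycles")
```

In this toy model the reordered sequence finishes in fewer cycles because the two independent ADD instructions run while the load is still in flight.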

During the RETURN instruction, the RISC processor logic retrieves the return address from the link register and places it into the program counter, forcing a jump to the instruction after the CALL. More information on subroutine RETURN instructions can be found in HVZ.
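A small simulation can show why a single link register is enough for a simple call but must be saved when subroutines are nested. This is a sketch only: the class `RiscCpu`, its method names and all the addresses are hypothetical, and real processors differ in the details.

```python
# Toy model of a RISC-style call that uses one link register (LR).
class RiscCpu:
    def __init__(self):
        self.pc = 0          # program counter
        self.lr = 0          # link register: holds one return address
        self.stack = []      # stands in for the "slow" external memory stack
        self.memory_ops = 0  # number of stack accesses performed

    def call(self, target, next_addr):
        """CALL: the return address goes into LR, not into memory."""
        self.lr = next_addr
        self.pc = target

    def ret(self):
        """RETURN: copy LR into the program counter."""
        self.pc = self.lr

    def save_lr(self):
        self.stack.append(self.lr)
        self.memory_ops += 1

    def restore_lr(self):
        self.lr = self.stack.pop()
        self.memory_ops += 1


def leaf_call():
    """One call to a subroutine that calls nothing else."""
    cpu = RiscCpu()
    cpu.call(0x200, next_addr=0x104)
    cpu.ret()
    return cpu.pc, cpu.memory_ops


def nested_call(save_link):
    """The subroutine at 0x200 itself calls a subroutine at 0x300."""
    cpu = RiscCpu()
    cpu.call(0x200, next_addr=0x104)  # caller -> outer subroutine
    if save_link:
        cpu.save_lr()                 # outer saves its return address
    cpu.call(0x300, next_addr=0x214)  # outer -> inner, overwrites LR
    cpu.ret()                         # inner returns to 0x214
    if save_link:
        cpu.restore_lr()              # outer recovers its return address
    cpu.ret()                         # outer returns
    return cpu.pc, cpu.memory_ops
```

In this model the leaf call needs no memory accesses at all. The nested call only gets back to the original caller (0x104) if the outer subroutine saves and restores the link register, at a cost of two memory operations; without the save, the final RETURN reuses the inner call's return address.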


What registers are available for use during the subroutine?

When the processor enters a subroutine, there is no indication of how the processor got to that point. The CALL instruction could, in principle, be from anywhere in your code: from "MyCode()", from a library function, or even from the subroutine "MySub()" itself (recursive calls). Therefore there is NO approach you can use to build a subroutine that takes into account all the possible ways that the processor registers might be used by the function that called the subroutine.

Instead, it is necessary to adopt some sort of "convention" about how the registers can be shared between the main code (the one calling the subroutine), the subroutine and the subroutine-that-the-subroutine calls. This convention must allow a balance of maximum convenience for programming and maximum speed of operation.

The simplest convention to adopt is to require the subroutine to save all registers it uses (on the memory stack) and restore the registers' original values on exit. However, this would mean a lot of slow memory stack operations for even the simplest subroutine.

The best approach is to have banks of additional registers that can be "swapped" in for use during subroutines. This is the method that is used on the Intel i960 processor and on the RISC SPARC and 29K "register windowed" processors. The old Z80 processor has just been released "in new form" with these register banks. (But what happens when you run out of the "extra registers"?)

With the 68K and PowerPC processors, a third approach is taken. To speed operations, and avoid saving all registers to slow external memory, some registers are specified as volatile, meaning that a subroutine may overwrite them without storing (and recovering) their values. There are four 68K volatile registers (D0, D1, A0 and A1) and eight PowerPC volatile registers (R3 to R10).

All other (non-volatile) registers, including the stack pointer, must be saved and restored to their original values if modified during the subroutine. In time-critical code, this can involve some specialized coding techniques.
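The volatile / non-volatile convention can be sketched as a toy model in which a subroutine saves only the non-volatile registers it is about to change. The 68K volatile set is the one quoted above; the function `run_subroutine`, the register values and the dictionary representation are assumptions made for illustration.

```python
VOLATILE = {"D0", "D1", "A0", "A1"}  # 68K volatile registers from the text


def run_subroutine(regs, writes):
    """Pretend subroutine that overwrites the registers named in `writes`.

    `regs` maps register name -> current value and is updated in place.
    `writes` maps register name -> value the subroutine body stores there.
    Returns the number of registers that had to be saved to the stack.
    """
    stack = []
    # Entry: save only the non-volatile registers that will be modified.
    to_save = sorted(name for name in writes if name not in VOLATILE)
    for name in to_save:
        stack.append(regs[name])
    # Body: the subroutine uses the registers freely.
    regs.update(writes)
    # Exit: restore the saved registers in reverse order.
    for name in reversed(to_save):
        regs[name] = stack.pop()
    return len(to_save)


if __name__ == "__main__":
    regs = {"D0": 1, "D1": 2, "D2": 3, "A0": 4, "A2": 5}
    saved = run_subroutine(regs, {"D0": 99, "D2": 77, "A2": 55})
    print(regs, "registers saved:", saved)
```

After the call, the caller finds D2 and A2 unchanged, while D0 holds whatever the subroutine left there. A caller that needs a volatile register's value after the call has to keep a copy itself.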

If you don't want to follow the suggested coding convention during your laboratories, then don't. However, do not just "use" whatever register you think might be available at this point in your program; plan a convention of your own. If you don't plan how the registers are used, you will end up wasting a lot of time during the "test and debug" phase of your work. Also, you should remember that there are a large number of predeveloped routines provided with the libraries included in the SDS start kit directories. You can also develop "C" subroutines (using the SDS "C" compiler) that can be linked to your code. All of these routines follow the coding convention I use.

NOTE:- These coding conventions have some interesting consequences (side-effects) of which you should be aware if you plan to call "C" generated subroutines or my subroutines. You may have already experienced the effect if you attempted to print things to the "screen" during the Laboratory 2 experiment with the 68K COFFEEPOT virtual device.



Last modified: July 22, 1996 01:22 PM by M. Smith.
Copyright -- M. R. Smith