When you use my Turk68 compiler to generate native 68K Mac applications,
you probably need to look at the library source code for library routines
inadequately documented in MOSAPI. Most of
it is pure T2, but a lot of the low-level routines use machine-code hacks
in the "C0DE(...)" pseudo-function (note that's a zero, not the
letter Oh), which must be imported from pkg Dangerous. This document
explains how to read the details of most of them, as well as I can remember,
and where to look for the specs.
The general form is a T2 expression or statement, which if it has a result type specified, you can assign to a variable of that type, or return from a function (same type), or use in any place where T2 expects an expression value of that type; if it has no result type, then you can only use it as a stand-alone followed by a semicolon:
C0DE([:optional result type,][variables and/or #code operators,]["optional C code"])
All parts are separated by commas. The optional result type comes first and is identified by a colon followed by a defined type name. The optional C code comes last and is plain text contained within double quotes (a T2 String literal); it is used by the Turk2C compiler to generate (that) code in C. It (usually) can also be read as an explanation of what this code does in 68K machine code.
The operators between these two ends can consist of variable names and numbers to be pushed (as values) onto the virtual stack the compiler uses in evaluating expressions, or of compiler directives. Each directive is the hash symbol '#' followed by a constant name or literal number, and tells the compiler to do something, possibly involving values on the virtual stack. Often I have added end-line comments giving pseudo-assembly language ops for what the operators generate. All of these are separated by commas.
I can never remember what-all these operators do except #3 (SysCall), so I look the others up in the Turk68.tag source code in the PutNib nonterminal (the easy way to find it is to search for the unique end string "{~PutNib}", which is one of those listed alphabetically in a group near the front), and in a few similarly named nonterminals it calls (for positive values less than 256), or else in the definitions near the front of the SysLibs.t2 file (for the few negative values, and values greater than 0x1FFFF), where you can mostly infer what they do from their names.
Compiler directives between 256 and 8K are interpreted as direct inline 68K-code halfword (2-byte) instruction operators and/or their immediate operands -- always 2 bytes each. If you need a value less than 256, add 0x10000 to it (the extra bit is stripped off). These values are usually coded in hex to match the opcodes in the 68K reference manual.
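The "add 0x10000" rule can be pictured with a small C sketch. It models only the stripping of the extra bit described above; the function name inline_halfword is hypothetical and is not taken from the Turk68 source.

```c
#include <stdint.h>

/* Hypothetical model: an inline-code directive is emitted as one 16-bit
   halfword, so any bit above bit 15 (such as the added 0x10000) is dropped. */
static uint16_t inline_halfword(uint32_t directive) {
    return (uint16_t)(directive & 0xFFFFu);
}
```

Under this assumption, writing #0x10010 would emit the halfword 0x0010, while a value such as #0xA02E passes through unchanged.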
#3, the System call, is always preceded by a number (no hash) pushed on the 68K stack as a selector, and before that, zero to five more parameters, usually names of variables or literal numbers. If there are parameters, they are popped off again by a single operator following the #3. The selector number (between -64 and +113) maps directly to the switch-case selector in the void SysCall(..) function in sKernel68.t2. There are gaps in the assignment, because the selectors were chosen at a time when the kernel did a lot more that is now done in SysLibs packages.
For an example, consider the two-halfword-add function:
int Add2i(int a, int b) {return C0DE(:int,a,b,#25,0xFFFF0000, // = AddPt
  #16,#22,#23,#25,0xFFFF0000,#16,#22,#17,0xFFFF,#16,#17,#17,
  // dup,F0, &,rot,swp,dup,F0,&,rot,+,0F,&,+,+
  "(((a&0xFFFF0000)+(b&0xFFFF0000)+((a+b)&0xFFFF)))");}~Add2i
The T2 function header [int Add2i(int a, int b) {return ] could have been the header of the equivalent (pure T2) function:
int Add2i(int a, int b) {
  return (a&-0x10000)+(b&-0x10000)+((a+b)&0xFFFF);}~Add2i
but by using compiler directive operators, the compiler can code this inline and both save the subroutine call overhead, and also do some optimization on the parameters.
First, the result is declared to be type int. Then both parameters, a and b, are pushed onto the virtual stack. If this is inlined, they would be loaded into registers. Then the compiler operation #25 DUP pushes onto the virtual stack a copy of the top number. Then the constant masking the high halfword is pushed and the #16 AND operator applies that to the (copy of the) b parameter -- the compiler optimizer would simply clear the low half of the register containing the copy of b -- then #22 ROT the top three items on the virtual stack are rotated top-to-bottom so the masked integer is now below the original two parameters. This is done in the compiler virtual stack; no native code is generated. This is followed by a #23 SWP which exchanges the top two items, leaving the a parameter on top, which is similarly masked and rotated down to the bottom. The remaining two copies of the original parameters a and b are now at the stack top; they are #17 added together and masked down to the low 16 bits, then two more adds combine that sum with the two (masked) high parts for a resulting sum, where the carry out of the low half does not propagate into the high half. The result is in a register, ready to use in whatever expression this function was called from.
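The stack walk described above can be replayed in a toy C model. This is only a sketch under stated assumptions: the VStack type and the op_* helpers are hypothetical names for illustration, the stack holds plain values rather than the compiler's register references, and nothing here reflects how Turk68 is actually implemented. The operator numbers in the comments follow the pseudo-assembly comment in the Add2i listing.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a virtual evaluation stack (illustrative names only). */
enum { STACK_MAX = 8 };
typedef struct { uint32_t v[STACK_MAX]; int n; } VStack;

static void push(VStack *s, uint32_t x) { assert(s->n < STACK_MAX); s->v[s->n++] = x; }
static uint32_t pop(VStack *s) { assert(s->n > 0); return s->v[--s->n]; }

/* #25 dup: push a copy of the top item */
static void op_dup(VStack *s) { uint32_t t = pop(s); push(s, t); push(s, t); }
/* #16 and: replace the top two items with their bitwise AND */
static void op_and(VStack *s) { uint32_t b = pop(s), a = pop(s); push(s, a & b); }
/* #17 add: replace the top two items with their sum (wraps at 32 bits) */
static void op_add(VStack *s) { uint32_t b = pop(s), a = pop(s); push(s, a + b); }
/* #23 swp: exchange the top two items */
static void op_swp(VStack *s) { uint32_t b = pop(s), a = pop(s); push(s, b); push(s, a); }
/* #22 rot: move the top item below the next two */
static void op_rot(VStack *s) {
    uint32_t c = pop(s), b = pop(s), a = pop(s);
    push(s, c); push(s, a); push(s, b);
}

/* Replays a,b,#25,0xFFFF0000,#16,#22,#23,#25,0xFFFF0000,#16,#22,#17,0xFFFF,#16,#17,#17 */
static uint32_t add2i_stack(uint32_t a, uint32_t b) {
    VStack s = { {0}, 0 };
    push(&s, a); push(&s, b);
    op_dup(&s); push(&s, 0xFFFF0000u); op_and(&s);  /* a, b, bHi      */
    op_rot(&s);                                     /* bHi, a, b      */
    op_swp(&s);                                     /* bHi, b, a      */
    op_dup(&s); push(&s, 0xFFFF0000u); op_and(&s);  /* bHi, b, a, aHi */
    op_rot(&s);                                     /* aHi, bHi, b, a */
    op_add(&s); push(&s, 0xFFFFu); op_and(&s);      /* aHi, bHi, lo   */
    op_add(&s); op_add(&s);                         /* aHi + bHi + lo */
    return pop(&s);
}
```

In this model, add2i_stack(0x0001FFFF, 0x00010001) leaves 0x00020000 on the stack: the low halves sum to 0x10000, and the carry is masked off instead of reaching the high half.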
The total cost in 68K machine code instructions is the two loads necessary to get a and b into their respective registers (they might already be there, the compiler knows), plus a couple of register copies, two half-register clears, three register-to-register adds, and one mask low, which I think is either a 3-halfword AND-immediate (which costs more to fetch from main memory), or else two single-halfword operations, one to clear a register, the second to transfer the low half of the value to it (which takes an extra stage in the pipeline, but in modern superscalar hardware probably executes in parallel). The 68K-to-PPC JIT compiler can reduce that some more (I don't know if Apple's did, but mine did).
Finally, you see in a quoted string, the C code (which is identical to T2 source code) that does the same thing.
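The quoted expression can be tried as ordinary C. The sketch below uses uint32_t instead of int so that the high-half addition wraps without signed-overflow concerns; that substitution, and the pack2 helper for building test values, are assumptions made for illustration and are not part of the library.

```c
#include <stdint.h>

/* Same expression as the quoted C string, on unsigned 32-bit values. */
static uint32_t add2i(uint32_t a, uint32_t b) {
    return (a & 0xFFFF0000u) + (b & 0xFFFF0000u) + ((a + b) & 0xFFFFu);
}

/* Hypothetical helper: pack two 16-bit halves into one 32-bit word. */
static uint32_t pack2(uint16_t hi, uint16_t lo) {
    return ((uint32_t)hi << 16) | lo;
}
```

For example, add2i(pack2(1, 0xFFFF), pack2(1, 1)) equals pack2(2, 0), whereas a plain 32-bit add of the same two words would give pack2(3, 0) because the low-half carry would spill into the high half.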
A second example calls the Mac Toolbox BlockMove function, which takes source and destination starting addresses in 68K registers A0 and A1, and the number of bytes to copy in D0:
void BlockMove(int frm, int too, int nby) {C0DE(frm,too,nby,#vWantA0,#vWantA1,
  #vPopD0,#vPopA1,#vPopA0,#64,#vDoneD0,#vDoneA0,#vDoneA1,#0xA02E);}
Notice first that there is no declared return type, as appropriate for a void function. We start with the same parameters (frm,too,nby) as integers loaded up from the calling environment, and pushed onto the virtual stack. Then we tell the compiler (#vWantA0, #vWantA1) that we want to reserve A0 and A1, so it will not try to use them to fetch the parameters (which might be undereferenced array elements or object instance variables, both of which require address and possibly data registers to access; the compiler will therefore use instead A2 and/or A3 as needed), then we use the predefined compiler operators (#vPopD0, #vPopA1, #vPopA0) to pop the virtual stack elements into those registers in the proper order, followed by the #64 compiler directive to devirtualize everything (force the registers to be loaded; there is nothing left on the virtual stack but these register references). Technically we should wait until after the actual A-trap call to release the registers, but the compiler has nothing left to do, so it's safe to release them now. The final compiler directive #0xA02E is the A-trap itself, which is coded inline.
If the data to be copied is a T2 array or string, then the two parameters are the array or string pointers themselves (a single LDA operation each), and the length value is probably evaluated in an expression, which might already be in a data register, or else partially evaluated on the virtual stack, which the #64 devirtualize operator will finish with a destination D0. The result is optimal 68K code, which JIT-compiles to near-optimal PPC code.
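The pop order can be pictured with a small C model. It is a sketch only: the Regs68k struct and both function names are hypothetical, and the standard memmove stands in for the BlockMove trap on the assumption that an overlap-safe byte copy is the behavior wanted.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the three registers the trap reads. */
typedef struct {
    const unsigned char *a0;  /* source address      */
    unsigned char       *a1;  /* destination address */
    size_t               d0;  /* byte count          */
} Regs68k;

/* The parameters are pushed as frm, too, nby, so they pop in reverse order. */
static Regs68k load_regs(const void *frm, void *too, size_t nby) {
    Regs68k r;
    r.d0 = nby;  /* #vPopD0: the top of the stack is the byte count */
    r.a1 = too;  /* #vPopA1: next is the destination                */
    r.a0 = frm;  /* #vPopA0: last is the source                     */
    return r;
}

/* Model of the whole call: load the registers, then do the copy. */
static void block_move_model(const void *frm, void *too, size_t nby) {
    Regs68k r = load_regs(frm, too, nby);
    memmove(r.a1, r.a0, r.d0);
}
```

With a buffer holding "abcdef", calling block_move_model(buf, buf + 2, 4) leaves "ababcd" in this model, since the four source bytes are copied over the overlapping destination without corruption.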
I think that covers an example of just about everything you might encounter in my native code hacks in the SysLibs library source.
Tom Pittman
2020 January 11