From owner-simulators  Tue Aug 15 13:29:35 1995
Received: from meitner.cs.washington.edu (meitner.cs.washington.edu [128.95.2.104]) by june.cs.washington.edu (8.6.12/7.2ju) with ESMTP id NAA18624 for <simulators@june.cs.washington.edu>; Tue, 15 Aug 1995 13:29:34 -0700
Received: from localhost (localhost [127.0.0.1]) by meitner.cs.washington.edu (8.6.12/7.2ws+) with SMTP id NAA30644 for <simulators@cs>; Tue, 15 Aug 1995 13:29:34 -0700
Message-Id: <199508152029.NAA30644@meitner.cs.washington.edu>
To: simulators@cs
Subject: Clarity MCode
Date: Tue, 15 Aug 1995 13:29:34 PDT
From: "		    pardo@cs.washington.edu" <pardo@meitner.cs.washington.edu>
X-Message-Id: simulators@cs.washington.edu, message #1995-08-005
X-Unsubscribe: e-mail `majordomo@cs.washington.edu', body `unsubscribe rtcg'

%A Brian T. Lewis
%A L. Peter Deutsch
%A Theodore C. Goldstein
%T Clarity MCode: A Retargetable Intermediate Representation for
Compilation
%J Proceedings of IR '95
%D January 1995
%C San Francisco, California, USA
%P 119-128
%W btlewis@eng.sun.com <brian.lewis@sun.com>
%X `Clarity' is a C++-related programming language with delegation
instead of inheritance and better control over safe and unsafe modules
(among other things).  For improved code portability, Clarity is
compiled to `MCode', a machine-independent IR.  The IR may be
interpreted or compiled to machine code lazily at runtime.  To support
systems programming, code can call between Clarity and C.
  Machine code generation is done by converting Clarity stack
operations (~bytecodes) to an AST and then using a tree-walk code
generator to emit machine code in one pass.  The resulting code is
similar in quality to static compilation at -O2 and about 2x slower than
the best-optimized C.  The raw interpreter is about 100x slower than
the best C.
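  [A rough, hypothetical C++ sketch of that one-pass scheme -- stack
bytecodes rebuilt into an AST, then a tree-walk emitter.  All names and
the pseudo-assembly output are invented; this is not the paper's code:]
    // Stack ops -> AST -> one-pass tree-walk emission (illustration only).
    #include <cstdio>
    #include <memory>
    #include <stack>
    #include <vector>
    struct AstNode {
        enum Kind { Const, Add, Mul } kind;
        int value = 0;                      // used when kind == Const
        std::unique_ptr<AstNode> lhs, rhs;  // used by binary operators
    };
    enum class StackOp { PushConst, Add, Mul };
    struct Bytecode { StackOp op; int imm; };
    // Rebuild an expression tree from stack-machine bytecodes.
    std::unique_ptr<AstNode> build_ast(const std::vector<Bytecode>& code) {
        std::stack<std::unique_ptr<AstNode>> s;
        for (const auto& bc : code) {
            auto n = std::make_unique<AstNode>();
            if (bc.op == StackOp::PushConst) {
                n->kind = AstNode::Const; n->value = bc.imm;
            } else {
                n->kind = (bc.op == StackOp::Add) ? AstNode::Add : AstNode::Mul;
                n->rhs = std::move(s.top()); s.pop();
                n->lhs = std::move(s.top()); s.pop();
            }
            s.push(std::move(n));
        }
        return std::move(s.top());
    }
    // One-pass tree walk: print pseudo-instructions, return result register.
    static int next_reg = 0;
    int gen(const AstNode& n) {
        if (n.kind == AstNode::Const) {
            int r = next_reg++;
            std::printf("  mov  r%d, #%d\n", r, n.value);
            return r;
        }
        int a = gen(*n.lhs), b = gen(*n.rhs);
        int r = next_reg++;
        std::printf("  %s  r%d, r%d, r%d\n",
                    n.kind == AstNode::Add ? "add" : "mul", r, a, b);
        return r;
    }
    int main() {
        // (2 + 3) * 4 expressed as stack bytecodes.
        std::vector<Bytecode> bc = {{StackOp::PushConst, 2},
                                    {StackOp::PushConst, 3}, {StackOp::Add, 0},
                                    {StackOp::PushConst, 4}, {StackOp::Mul, 0}};
        gen(*build_ast(bc));
    }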
  (pg 119)  Forwards compatibility: Code generated specifically for
one new SPARC implementation is 25\% faster than ``generic'' code
running on the same processor.  On-the-fly code generation lets you
take advantage of more machine-specific optimizations.  ``A runtime
code generator can also take advantage of the specific values used in a
program to generate machine code customized for those values.''
(doesn't cite anything).
  (pg 119, 126)  MCode is compiled to linkable .o modules.  These must
therefore include platform-dependent definitions of global variables and
procedures and descriptions of referenced symbols.  Each module thus uses ``entry
code [that] consists primarily of a three instruction "trampoline" that
redirects the call to the appropriate target procedure chosen by the
interpret/compile strategy module in the MCode runtime.  The SPARC
entry code also has three words used when atomically updating the
trampoline. ... [T]he contents of a Linkable MCode file are mostly
platform-independent.''  Pardo notes that MCode modules could use a
platform-dependent ``installer'' like S/38 and AS/400 that would
decorate a truly portable MCode module with e.g., the trampoline code
required for static linking (as well as ``default'' static code
implementations).  You would also want an uninstaller.  Pardo notes
that dynamically-linked MCode modules could be completely
machine-independent if you could convince the dynamic linker to call
out to the MCode runtime system whenever it was building a dynamic
link.
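  [A hypothetical C++ analogue of the trampoline idea: the real entry
code is three SPARC instructions patched in place, while here it is just
an atomically-updated function pointer, with invented names:]
    #include <atomic>
    #include <cstdio>
    using Fn = long (*)(long);
    long interpret_stub(long arg);   // falls into the MCode interpreter
    long compiled_body(long arg);    // produced later by the code generator
    // Per-procedure entry point, initially routed to the interpreter and
    // later redirected to compiled code by the interpret/compile strategy.
    struct Trampoline {
        std::atomic<Fn> target{interpret_stub};
        long operator()(long arg) { return target.load()(arg); }
    };
    Trampoline entry_foo;            // linkable entry point for procedure foo
    long interpret_stub(long arg) {
        std::puts("interpreting foo; strategy module decides to compile it");
        entry_foo.target.store(compiled_body);   // atomic redirection
        return arg + 1;
    }
    long compiled_body(long arg) { return arg + 1; }  // same semantics, native
    int main() {
        entry_foo(41);   // first call goes through the interpreter
        entry_foo(41);   // later calls reach compiled code directly
    }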
  (pg 122)  MCode similar to Oak/Java bytecodes but more abstract,
referring to platform-dependent type definitions.  MCode's control is
expressed as high-level instructions (BeginLoop ... EndLoop) while
Oak/Java uses jumps.  Lower level than Franz's LZ'd AST.
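  [Hypothetical sketch of the difference: structured MCode-style control
(BeginLoop ... EndLoop) lowers trivially to the jump-based form that
Oak/Java bytecodes use.  Only BeginLoop/EndLoop come from the paper; the
other opcode names are invented:]
    #include <cstdio>
    #include <stack>
    #include <string>
    #include <vector>
    int main() {
        // A `while' loop as structured ops -- nesting carries the control flow.
        std::vector<std::string> mcode = {"BeginLoop", "Load i", "Load n",
                                          "CmpLt", "ExitLoopIfFalse",
                                          "Inc i", "EndLoop"};
        std::stack<int> loop_heads;   // label numbers of enclosing loop heads
        int label = 0;
        for (const auto& op : mcode) {
            if (op == "BeginLoop") {
                std::printf("L%d:\n", label);
                loop_heads.push(label++);
            } else if (op == "ExitLoopIfFalse") {
                std::printf("  if_false goto L%d_exit\n", loop_heads.top());
            } else if (op == "EndLoop") {
                std::printf("  goto L%d\nL%d_exit:\n",
                            loop_heads.top(), loop_heads.top());
                loop_heads.pop();
            } else {
                std::printf("  %s\n", op.c_str());
            }
        }
    }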
  (pg 124)  MCode includes some optimization information: aliasing, and
a leaf/non-leaf flag for each procedure.  Variable lifetime information
may be added later.
  (pg 126)  Type safety will eventually be checked by a prelinker.
  (pg ??)  Code generated on a procedure-by-procedure basis.
  (pg 126-8)  The machine code generator is represented by two classes.
One keeps track of state during compilation; its machine-dependent part
takes care of, e.g., parameter passing conventions.  The other
represents machine code at the level of ``call op_add_real'' to
generate code when an ``add real'' is seen in the MCode instruction
stream.  (pg 120) Current design has strong roots in an earlier
system by Deutsch to quickly compile C bytecodes.
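  [A hypothetical C++ sketch of that two-class split.  Except for
op_add_real, every name here is invented, and the output is only
illustrative pseudo-assembly:]
    #include <cstdio>
    #include <string>
    // Machine-dependent compilation state: register bookkeeping, frame
    // layout, parameter passing conventions, and so on.
    class CompileState {
    public:
        std::string param_reg(int i) { return "%o" + std::to_string(i); }
        std::string alloc_temp()     { return "%l" + std::to_string(next_++); }
    private:
        int next_ = 0;
    };
    // Machine-code emitter: the MCode walker calls one method per operation,
    // e.g. op_add_real when an ``add real'' appears in the instruction stream.
    class MachineEmitter {
    public:
        explicit MachineEmitter(CompileState& st) : st_(st) {}
        std::string op_add_real(const std::string& a, const std::string& b) {
            std::string dst = st_.alloc_temp();
            std::printf("  faddd %s, %s, %s\n", a.c_str(), b.c_str(), dst.c_str());
            return dst;
        }
        void op_call(const std::string& callee, const std::string& arg0) {
            std::printf("  mov   %s, %s\n", arg0.c_str(), st_.param_reg(0).c_str());
            std::printf("  call  %s\n  nop\n", callee.c_str());
        }
    private:
        CompileState& st_;
    };
    int main() {
        CompileState st;
        MachineEmitter em(st);
        em.op_call("use_result", em.op_add_real("%f0", "%f2"));
    }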
  (pg 128)  The interpreter implements calls and returns as SPARC ABI
calls.  Does not require dynamically-generated closures (a la [Breuel,
Chase, DEC Alpha calling standards]) since they are provided by the
platform-dependent part of an MCode object module.  Does require
dynamically-generated sequences for calls to routines with aggregate
return, since the size of the return aggregate is encoded in
the I-stream following the call instruction (static code would require
one call sequence for each possible aggregate size).
  Brian says: supporting the SPARC ABI complicated the interpreter,
since various struct return policies use an `unimp' instruction after
the `call' instruction, so the interpreter must have the right
call/unimp sequences lying around.  Clarity code ``doesn't quite'' use
the SPARC ABI calling convention: it uses two global registers to hold
the `self'-like pointers used to implement delegation.
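  [A hypothetical sketch of why aggregate-return calls need generated
sequences: the SPARC ABI puts the struct size in an `unimp' word right
after the `call', so the interpreter keeps one call/unimp sequence per
size actually used.  The stubs are modeled here as strings, with
invented names:]
    #include <cstddef>
    #include <cstdio>
    #include <map>
    #include <string>
    std::map<std::size_t, std::string> stub_cache;
    // Return (building on first use) a call sequence for `size'-byte
    // aggregate return; static code would need one per possible size.
    const std::string& struct_return_stub(std::size_t size) {
        auto it = stub_cache.find(size);
        if (it == stub_cache.end()) {
            std::string seq = "  call  <target>\n"
                              "  nop              ! delay slot\n"
                              "  unimp " + std::to_string(size) +
                              "         ! aggregate size\n";
            it = stub_cache.emplace(size, seq).first;
        }
        return it->second;
    }
    int main() {
        std::printf("%s", struct_return_stub(12).c_str());  // built on demand
        std::printf("%s", struct_return_stub(12).c_str());  // reused afterward
    }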



[Ted Goldstein <tedg@clare.eng.sun.com> was good enough to take some
 time out to explain a few details of dynamic compilation of MCode
 bytecodes.  Here's a brief summary of his e-mail, though I may have
 introduced some errors in transliterating it:]

The external interface to the compiler works at either whole module
level (for static .o file generation) or one function at a time (for
in-core execution).  I'll focus on the latter.

The CGEN code generator was designed and built by L Peter Deutsch and
myself after experience with the Smalltalk-80 code generators built
for ParcPlace Systems.

The entire code generator is small--only 3750 executable statements.
It is heavily object-oriented, with about 180 classes and structs in
11543 lines.  Our experience is that OO code is efficient when it is
small and compact.  Of the 180 classes, 17 are machine dependent.

The first time any function in a module is executed, the entire module
has its external symbol tables unpickled.  A small type table is set
up for the bytecodes; as with a conventional compiler, this is the
compilation context for the function.

The next step is to symbolically execute the bytecodes.  The bytecodes
target a stack machine, but CGEN reconstructs SSA form.  CGEN does a
first-best-fit targeting (similar to PQCC's TNBIND), and code is
generated as soon as a target is known.  If CGEN gets into trouble (runs
out of registers and incurs too large a spill penalty), CGEN goes
back and regenerates the function, first doing a graph coloring of the
SSA form before targeting.  In practice, only large FORTRAN-ish code
uses regeneration; Clarity is like C++ so double-generation happens
rarely.
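[A very schematic C++ model of that regeneration strategy: try eager
first-best-fit targeting, and only if the estimated spill penalty is too
high, redo the function with graph coloring first.  The structure and the
cost numbers are invented:]

    #include <cstdio>
    #include <vector>

    struct SsaValue { int id; };                 // stand-in for an SSA value
    struct GenResult { bool ok; int spill_cost; };

    // Eager pass: assign targets first-come-first-served, count notional spills.
    GenResult generate_eagerly(const std::vector<SsaValue>& vals, int num_regs) {
        int spills = (int)vals.size() > num_regs ? (int)vals.size() - num_regs : 0;
        return { spills * 10 < 50, spills * 10 };  // arbitrary spill-cost threshold
    }

    void generate_with_coloring(const std::vector<SsaValue>& vals, int num_regs) {
        std::printf("coloring %zu SSA values into %d registers, then emitting\n",
                    vals.size(), num_regs);
    }

    void compile_function(const std::vector<SsaValue>& ssa, int num_regs) {
        GenResult r = generate_eagerly(ssa, num_regs);
        if (r.ok) {
            std::printf("eager pass kept (spill cost %d)\n", r.spill_cost);
        } else {
            std::printf("eager pass discarded (spill cost %d); regenerating\n",
                        r.spill_cost);
            generate_with_coloring(ssa, num_regs);
        }
    }

    int main() {
        compile_function(std::vector<SsaValue>(6, SsaValue{0}), 8);   // C++-ish: fits
        compile_function(std::vector<SsaValue>(40, SsaValue{0}), 8);  // FORTRAN-ish
    }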

Code generation always uses specific registers and machine
instructions.  Various local optimizations are done eagerly, including
delay slot filling, branch selection, and memory interleaving.
``Eager'' means that the optimization is triggered as soon as enough
consecutive instructions are known.  The code generator uses a
Smalltalk-style InstructionStream object that performs local
optimization.
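[A toy InstructionStream in that spirit: instructions are appended one at
a time and a local rule fires as soon as enough consecutive instructions
are known -- here, hoisting an independent instruction into a branch
delay slot.  The rule and all names are invented:]

    #include <cstdio>
    #include <string>
    #include <vector>

    class InstructionStream {
    public:
        void append(const std::string& insn) {
            buf_.push_back(insn);
            fill_delay_slot();   // "eager": run the rule immediately
        }
        void dump() const { for (auto& i : buf_) std::printf("  %s\n", i.c_str()); }
    private:
        void fill_delay_slot() {
            // Pattern: <independent insn>, <branch>, <nop>  ==>  swap insn into slot.
            size_t n = buf_.size();
            if (n >= 3 && buf_[n - 1] == "nop" && buf_[n - 2].rfind("b ", 0) == 0) {
                std::swap(buf_[n - 3], buf_[n - 2]);   // move insn after the branch
                buf_.pop_back();                       // delay slot is now useful
            }
        }
        std::vector<std::string> buf_;
    };

    int main() {
        InstructionStream s;
        s.append("add %l0, 1, %l0");
        s.append("b loop_head");
        s.append("nop");
        s.dump();   // prints the branch first, with the add in its delay slot
    }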

Some optimizations would not even be possible in conventional
compilers (for example, arithmetic on runtime constants, and address
computations more complicated than linkers support).
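[Hypothetical example of such a fold: the runtime code generator knows
the real address of a global at code-generation time, so it can collapse
address arithmetic into a single literal, where a static compiler could
only emit a simple relocation plus run-time arithmetic.  All names are
invented:]

    #include <cstddef>
    #include <cstdint>
    #include <cstdio>

    struct Entry { std::int64_t key; std::int64_t field; };
    Entry table[64];

    int main() {
        int i = 5;  // pretend the runtime has observed `i' to be constant
        // Static code: relocated base + scaled index + offset, done at run time.
        // A runtime generator can fold it all into one literal address:
        std::uintptr_t folded = reinterpret_cast<std::uintptr_t>(table)
                                + std::uintptr_t(i) * sizeof(Entry)
                                + offsetof(Entry, field);
        std::printf("emit: load [%#lx]\n", (unsigned long)folded);
    }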

Finally, instructions are copied out of the objects and onto the target
page.  Code that is referenced consecutively ends up close together in
memory, giving good locality of reference, which is important for OO code.
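
[A POSIX-specific sketch of that copy-out step: fragments of generated
code are packed consecutively into one executable region so procedures
that reference each other stay close in memory.  The bytes here are
dummies and nothing is executed; names and layout are invented:]

    #include <sys/mman.h>
    #include <cstddef>
    #include <cstdio>
    #include <cstring>
    #include <vector>

    int main() {
        const std::size_t page = 4096;
        // One target page, writable + executable (a real system would likely
        // write first, then flip protections, and flush the I-cache on SPARC).
        void* target = mmap(nullptr, page, PROT_READ | PROT_WRITE | PROT_EXEC,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (target == MAP_FAILED) return 1;

        std::vector<std::vector<unsigned char>> fragments = {
            {0x01, 0x02, 0x03}, {0x04, 0x05}, {0x06}};  // stand-ins for procedures

        unsigned char* out = static_cast<unsigned char*>(target);
        for (const auto& frag : fragments) {            // consecutive placement
            std::memcpy(out, frag.data(), frag.size());
            out += frag.size();
        }
        std::printf("packed %ld bytes of code consecutively at %p\n",
                    long(out - static_cast<unsigned char*>(target)), target);
        munmap(target, page);
    }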