Todo

A Brief Categorization

A list of tools, organized according to various interesting features. See also a listing of tools ordered alphabetically. Interesting things about the tools include:

Purpose of the tool
Supports buggy applications (that is: is the tool robust in the face of application errors?).
Supports dynamic instruction space modification (a.k.a. ``Dynamic Linking'', ``Runtime Code Generation'', or ``Self-Modifying Code'')
Supports multiple target processors
Supports multiple protection domains (address spaces)
Supports signals, exceptions and asynchronous events
Supports system-mode simulation or tracing
Implementation:
Timing simulation
Product status

Purpose Of The Tool

THIS CATEGORY NOT YET ORGANIZED, SEE THE SHADE PAPER.

atr: address tracing
db: debugging
otr: other tracing
sim: (instruction set) simulation
tb: tool building
Here, ``tool building'' is meant to encompass tools that are used to build other tools, for example, a tool that builds tracing tools is a tool-building tool, whereas a configurable cache simulator is not a tool-building tool. The usual distinction is that a tool-building tool can be extended [NG87, NG88] using a general-purpose programming language (e.g. C, C++, ...), whereas a configurable tool is programmed with a less-powerful language e.g. a list of cache size, line size, associativity, etc.

In addition, some tools are used for

os: operating system (OS) Emulation

Handles Application Bugs Robustly

THIS CATEGORY NOT YET ORGANIZED, SEE THE SHADE PAPER.

No: Application errors such as stores to random memory locations may cause the simulation or tracing tool to fail or produce spurious answers, or may cause the application program to fail in an unexpected (unintended) way or produce spurious answers.
Some: Certain kinds of errors are detected or serviced. For example, application errors may be constrained so that they can clobber application data in random ways but that they cannot cause the simulation or tracing tool to fail or produce erronious results.
Yes: Application errors are detected and handled in some predictable way.
Yes*: Turning on checking may slow execution.

Works with Self-Modifying Code

THIS CATEGORY NOT YET ORGANIZED, SEE THE SHADE PAPER.

No
Yes, but not all kinds
Yes

Multiple Processors

THIS CATEGORY NOT YET ORGANIZED, SEE THE SHADE PAPER.

No
Y1: multiplexes all target processors on a single host processor
Y=: same number of host and target processors (to be precise, should be a ``Y-'' category for several host processors per target processor).
Y+: can multiplex a large number of target processors onto a potentially smaller number of host processors

Support for Multiple Protection Domains

THIS CATEGORY NOT YET ORGANIZED, SEE THE SHADE PAPER.

No
Yes

Signals and Exceptions

THIS CATEGORY NOT YET ORGANIZED, SEE THE SHADE PAPER.

No
S: yes, but not all kinds. For example, a tracing tool might execute the traced program correctly but fail to trace signal handlers.
Yes

Support for System-Mode Code (Detail)

THIS CATEGORY NOT YET ORGANIZED, SEE THE SHADE PAPER.

d: device
u: user
s: system
Note: the system mode may be marked in parenthesis, e.g. (s), indicating that the host processor does not have a distinct system mode in hardware, but the tool is intended to work with (simulate, trace, etc.) operating system code.

Input Representation

THIS CATEGORY NOT YET ORGANIZED, SEE THE SHADE PAPER.

asm: assembly code
exe: executable code, no symbol table information
exe*: executable code, with symbol table information
hll: high-level language

Implementation: Decompilation Technology

``Decompilation technology'' here refers to the process of analyzing a (machine code) fragment and, through analysis, creating some higher-level information about the fragment. For simulation and tracing tools, decompilation is typically simpler than static program decompilation, in which the goal is to read a binary program and produce source code for it in some high-level language. Simulation and tracing ``has it easy'' in comparison because it is possible to get by with a lower-level representation and also to punt hard problems to the runtime, when more information is available.

Even so, executable machine code is difficult to simulate and trace efficiently (within 2 orders of magnitude of the performance of native execution) when using ``naive'' instruction-by-instruction translation, because lots of relevant information is unavailable statically. For example, every instruction is potentially a branch target; every word of memory is potentially used both as code and as data; every mutable word of memory is potentially executed, modified (at runtime), and then executed again; and so on.

Executable machine code is also inherently (target) machine-dependent and thus lexing and parsing the machine code is a source of potential portability problems. (Note that some tools use a high-level input, so that relatively little analysis is needed to determine the original programmers intent, at least at a level needed to simulate the program with modest efficiency.)

The following is a a list of tools and papers that show how to reduce the overhead of analyzing each instruction; how to reduce the number of times each instruction is analyzed; how to perform optimistic analysis and recover when it's wrong; and how to improve the abstraction of machine-dependent parts of the tool.

A short list:

A slightly longer list:

Implementation: Simulation Technology

The ``simulation technology'' is how the original machine instructions (or other source representation) gets translated into an executable representation that is suitable for simulation and/or tracing. Choices include:

ddi: Decode-and-dispatch interpretation: the input representation for an operation is fetched and decoded each time it is executed.
pdi: Predecode interpretation: the input form is translated into a form that is faster to decode; that form is then saved so that successive invocations (e.g. subsequent iterations of a loop) need only fetch and decode the ``fast'' form. Note that
- The translation may happen before program invocation, during startup, or incrementally during execution; and that the translated form may be discarded and regenerated.
- If the original instructions change, the translated form becomes incoherent with the original representation; a system that fails to update (invalidate) the translated form before it is then reexecuted will simulate the old instructions instead of the new ones. For some systems (e.g., those with hardware coherent instruction caches) such behavior is erronious.
tci: Threaded code interpretation: a particularly common and efficient form of predecode interpretation.
scc: Static cross-compilation: The input form is statically (before program execution) translated from the target instruction set to the host instruction set. Note that:
- All translation costs are paid statically, so runtime efficiency may be very good. In contrast, dynamic analysis and transformation costs are paid during simulation, and so it may be necessary to ``cut corners'' with dynamic translation in order to manage the runtime cost. Cutting corners may affect both the quality of analysis of the original program and the quality of code generation.
- Instructions that cannot be located statically or which do not exist until runtime cannot be translated statically.
- Historically, it is difficult to distinguish between memory words that are used for instructions and those that are used for data; translating data as instructions may cause errors.
- Translating to machine code allows the use of the host hardware's instruction fetch/decode/dispatch hardware to help simulate the target's.
- Translating to machine code makes it easier to translate clumps of host instructions; most dispatching between target instructions is thus eliminated.
dcc: Dynamic Cross Compilation: Host machine code is generated dynamically, as the program runs. Note that:
- Translating ``on demand'' eases the problem of determining what is code and what is data; a given word may even be used as both code and data.
- Translating to machine code is often more expensive than translating to other representations; both the cost of generating the machine code and the cost of executing it contribute to the overall execution time.
- Theoretical performance advantages from dynamic cross-compilation may be overwhelmed by the host's increased cache miss ratio due to dynamic cross-compilation's larger code sizes [Pittman 95].
aug: Augmentation: cross-compilation where the host and target are the same machine. Note that
- Augmentation is typically done statically.
- There is a fine line between having identical host and target machines (augmetnation) and having nearly-identical machines in which just a few features (e.g. memory references) are simulated, but in which the bulk of instruction sets and encodings are identical.
emu: Emulation: Where software simulation is sped up using hardware assistance. ``Hardware assistance'' might include special compatability modes but might also include careful use of page mappings. (See ``emulation''.)

Dynamic Compilation: Displaced Execution

Move an instruction from one place to another, but execute with the same host and target.

1951: EDSAC Debug
1987: Shadow

Dynamic Compilation: Cross-Compilation

Compile instruction sequences from a target machine to run on a host machine.

1984: ST-80
1987: CRISP
1987: Mimic
1988: SoftPC
1988: SELF
1991: Shade
1993: MINT
1994: Executor
1994: IMS.
1993: SimICS; in particular, ``Partial Translation''
1994: SimOS.

Hardware Emulation

1986: ATUM
1987: CRISP
1993: Migrant
1993: Tapeworm II
1993: WWT
1994: IMS

Interpreters

Simulation and tracing tools that perform execution using interpretation; the original executable code is neither preprocessed (augmentation or static cross-compilation) nor is it dynamically compiled to host code.

1986: Z80MU
1987: Cerberus
1988: g88
1990: Spa
1991: SimICS
1991: Dynascope
1992: Accelerator
1992: GNU Simulators
1992: SPIM
1993: Cygnus
1993: Dynascope-II
1993: Executor
1993: MINT
1993: WWT
1994: Dynascope-II
1994: Talisman (also known as ``mg88'').
1994: Kx10
1994: Mable
1994: Mime

Static Cross-Compilation

Statically cross-compile instruction sequences from a target machine to run on some host machine.

1983: dis+mod+run
1986: Moxie
1987: Cerberus
1992: Accelerator
1993: Vest and mx
1994: FlashPort
1994: FreePort Express
1994: Migrant
1994: Pixie-II

Static Augmentation

Augmentation-based tracing tools run host instructions native, but some instructions are simulated. For example, Proteus executes arithmetic and stack-relative memory reference instructions native, and simulates load and store instructions that may reference shared memory.

1983: Simon
1986: Pixie
1988: RPPT
1989: MPtrace
1989: Titan tracing
1989: TRAPEDS
1991: Proteus
1991: Tango Lite
1992: FAST
1992: OM
1992: Purify
1993: ATOM (based on OM)
1993: Hiprof (based on OM)
1993: qp/qpt
1993: Third Degree (based on OM)
1993: WWT
1994: IDtrace

Multiple Strategies

Some tools rely on having multiple strategies in order to achieve their desired functionality. For the purposes here, ``untraced native execution'' counts as a translator.

1951: EDSAC Debug (displaced execution, native execution)
1991: Dynascope (interpretation, native execution)
1992: Accelerator (static cross-compilation, interpretation)
1993: MINT (dynamic cross-compilation, interpretation)
1993: Vest and mx (static cross-compilation, interpretation)
1994: Executor (interpretation, dynamic cross-compilation)
1994: SimICS (interpretation, dynamic cross-compilation)
1995: FreePort Express (static cross-compilation, interpretation; uses Vest and mx technology)

Other

Some tools/papers not listed under other headings.

Match Between Host and Target

THIS CATEGORY NOT YET ORGANIZED.

Generally, the closer the match between the host and the target, the easier it is to write a simulator, and the better the efficiency. Possible mismatches include:

Byte or word size. For example, Kx10 simulates a machine with 36-bit words; it runs on machines with 32-bit and 64-bit words.
Numeric representation. For example, whether integers are sign-magnitude, one's complement, or two's complement. Or, for example, Vest, which simulates all VAX floating-point formats on a host machine that lacks some of the VAX formats.
Which instruction combinations cause exceptions, and how those exceptions are reported.
Synchronization and atomicity. In particular, the details may be messy where the target machine synchronizes implicitly and the host does so explicitly, since all target operations that might cause synchronization generally need to be treated as if they do.

Note that target support for self-modifying code may be treated as a special case of synchronization. For example, target machines with no caches or unified instruction and data caches will typically write instructions using ordinary store instructions. Therefore, all store instructions must be treated as potential code-modifying instructions.

For timing-accurate simulation (see Talisman and RSIM), some matches between the host and target can improve the efficiency, but many do not.

Timing Simulation

THIS CATEGORY NOT YET ORGANIZED.

Some instruction-set simulators also perform timing simulation. Timing is not strictly an element of timing simulation, but is often useful, since one major use for instruction set simulation is to collect information for predicting or analyzing performance. Important features of timing simulation include both the processor pipeline and the memory system (see Talisman and RSIM).

Product Status

THIS CATEGORY NOT YET ORGANIZED.

The status of tool

info: only information is available
nonprod: the tool is available but is not a product
product: the tool is a commercial product

From instruction-set simulation and tracing