Instruction-Level Simulation And Tracing

This is $Revision: 1.107 $, last updated $Date: 2004/06/08 17:36:44 $.
For an up-to-date version, please check


Places that are known to have dubious or absent information are marked with ???.

There is also a simulators mailing list. To subscribe, write to <>. For sample messages see here.

Quick Index

A Quick Overview of Instruction-Set Simulation and Tracing

The most important thing is what does it do? If you are building or using a simulator you need to be concerned at some level about the implementation. But first you need to figure out what you want it to do.
Why aren't you using the real thing? Do you want an accurate simulation? If yes, use real hardware. If no, make up the numbers. NullSIM. It's not as accurate, but it's cheaper and faster than any other simulation tool. It's the only universal simulator! Tired of configuring your simulator to do exactly what you want? Use NullSIM, with a familiar user interface and predictable results!
Instruction-set simulators can execute programs written or compiled for computers that do not yet exist, which no longer exist, or which are more expensive to purchase than to simulate. Simulators can also provide access to internal state that is invisible on the real hardware, can provide deterministic execution in the face of races, and can be used to ``stress test'' for situations that are hard to produce on real hardware.

Instruction-level tracing can provide detailed information about the behavior of programs; that information drives analyzers that analyze or predict behavior of various system components and which can, in turn, improve the design and implementation of everything from architectures to compilers to applications.

Although simulators and tracing tools appear to perform different tasks, in practice they do much the same work: both manipulate machine-level details, and both use similar implementation techniques.

This web page is a jumping-off point for lots of work related to instruction-level simulation and tracing. Please contribute! Please send comments, contributions, and suggestions to `'. If you'd like to help edit this page, there is lots that needs to be done; your help is appreciated.

This web page also lists a few OS emulation tools. Although these don't specifically fit the category of tools covered by this page, it's interesting to consider whether you could glue together a processor emulator and an OS emulator and wind up with a whole simulated system. To date, whole simulated systems are built as integrated tools, rather than being assembled modularly.


Some terminology: See also the Glossary.

A Brief Categorization

A list of tools, organized according to various interesting features. See also a listing of tools ordered alphabetically. Interesting things about the tools include:

Purpose Of The Tool

Simulation and tracing tools can perform a wide variety of tasks. Here are some common uses: In addition, some tools are used for

Handles Application Bugs Robustly

Works with Self-Modifying Code


Multiple Processors


Support for Multiple Protection Domains


Signals and Exceptions


Support for System-Mode Code


Input Representation


Implementation: Decompilation Technology

``Decompilation technology'' here refers to the process of analyzing a (machine code) fragment and, through analysis, creating some higher-level information about the fragment. For simulation and tracing tools, decompilation is typically simpler than static program decompilation, in which the goal is to read a binary program and produce source code for it in some high-level language. Simulation and tracing ``has it easy'' in comparison because it is possible to get by with a lower-level representation and also to punt hard problems to the runtime, when more information is available.

Even so, executable machine code is difficult to simulate and trace efficiently (within 2 orders of magnitude of the performance of native execution) when using ``naive'' instruction-by-instruction translation, because lots of relevant information is unavailable statically. For example, every instruction is potentially a branch target; every word of memory is potentially used both as code and as data; every mutable word of memory is potentially executed, modified (at runtime), and then executed again; and so on.

Executable machine code is also inherently (target) machine-dependent and thus lexing and parsing the machine code is a source of potential portability problems. (Note that some tools use a high-level input, so that relatively little analysis is needed to determine the original programmer's intent, at least at a level needed to simulate the program with modest efficiency.)

The following is a list of tools and papers that show how to reduce the overhead of analyzing each instruction; how to reduce the number of times each instruction is analyzed; how to perform optimistic analysis and recover when it's wrong; and how to improve the abstraction of machine-dependent parts of the tool.

A short list:

A slightly longer list:

Implementation: Simulation Technology

The ``simulation technology'' is how the original machine instructions (or other source representation) get translated into an executable representation that is suitable for simulation and/or tracing. Choices include:

Dynamic Compilation: Displaced Execution

Move an instruction from one place to another, but execute with the same host and target.

Dynamic Compilation: Cross-Compilation

Compile instruction sequences from a target machine to run on a host machine.

Hardware Emulation


Simulation and tracing tools that perform execution using interpretation; the original executable code is neither preprocessed (augmentation or static cross-compilation) nor is it dynamically compiled to host code.

Static Cross-Compilation

Statically cross-compile instruction sequences from a target machine to run on some host machine.

Static Augmentation

Augmentation-based tracing tools run host instructions native, but some instructions are simulated. For example, Proteus executes arithmetic and stack-relative memory reference instructions native, and simulates load and store instructions that may reference shared memory.

Multiple Strategies

Some tools rely on having multiple strategies in order to achieve their desired functionality. For the purposes here, ``untraced native execution'' counts as a translator.


Some tools/papers not listed under other headings.

Match Between Host and Target


Generally, the closer the match between the host and the target, the easier it is to write a simulator, and the better the efficiency. Possible mismatches include:

Note that target support for self-modifying code may be treated as a special case of synchronization. For example, target machines with no caches or unified instruction and data caches will typically write instructions using ordinary store instructions. Therefore, all store instructions must be treated as potential code-modifying instructions.

For timing-accurate simulation (see Talisman and RSIM), some matches between the host and target can improve the efficiency, but many do not.

Timing Simulation


Some instruction-set simulators also perform timing simulation. Timing is not strictly an element of instruction-set simulation, but is often useful, since one major use for instruction-set simulation is to collect information for predicting or analyzing performance. Important features of timing simulation include both the processor pipeline and the memory system (see Talisman and RSIM).


There are many ways to measure performance. Some common metrics include: Metrics that are more abstract have the advantage that they are typically simple to reason about and applicable across a variety of implementations. For example, host instructions may be counted relatively easily for each of a variety of target instructions, and the counts are relatively isolated from the structure of the caches and microarchitecture. Conversely, concrete metrics tend to more accurately reflect all related costs. For example, the effects of caches and microarchitectures are included.

It is worth noting that few reports give enough information about the measurement methodology in order to make a valid comparison. For example, if dilation is ``typically'' 20x, what is ``typical'', and what is the performance for ``non-typical'' workloads?

Product Status


The status of tool

An Alphabetical List of Tools

Just The Names

Longer Writeups

Longer writeups and cross-references. Some of the tools here have bibliographic entries, home pages or online papers, noted with ``See: ...''. Many are also described and referenced in the 1994 SIGMETRICS Shade paper, noted with ``See: Shade''.

See here for a list of tools.




Atari Emulators


The listed tools include:


Apple II Emulators


The listed tools include Apple II emulators:


Apple Macintosh Emulators


The listed tools include Macintosh emulators:




ATOM is built on top of OM.





BEaT (Binary Emulation and Translation)





See: bib cite, Shade

As of 1994, Cerberus was being actively used and updated by <>, who might be willing to provide information and/or code.

Commodore Emulators










Crusoe is an x86 emulator. It both interprets x86 instructions and also translates x86 instructions to a host VLIW instruction set; translations are cached for reuse. The host instruction set is not exported; only target instructions may be executed. A demonstration Crusoe executed both x86 and Java instructions.








A prototype/research vehicle for decompiling DOS EXE binary files. It uses digital signatures to determine library function calls and the original compiler.


DEC PDP-8 Simulators



DEC PDP-11 Simulators
















The EDSAC Debugger uses a tracing simulator that operates by: fetching the simulated instruction; decoding it to save trace information; checking to see if the instruction is a branch, and updating the simulated program counter if it is; else placing the instruction in the middle of the simulator loop and executing it directly; and then returning to the top of the simulator loop.
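The loop described above can be sketched on a toy machine. This is only an illustration and not EDSAC's actual order code: the `(opcode, operand)` instruction tuples, the opcode names, and the `run_traced` function are all assumptions made for the example. Python cannot place a target instruction in the middle of the simulator loop, so ``direct execution'' is stood in for by a table of host operations.

```python
# Toy tracing simulator in the style described for the EDSAC debugger.
# The instruction format and opcode names are hypothetical.

def run_traced(program, acc=0, max_steps=1000):
    """Run `program`, a list of (opcode, operand) tuples; return (acc, trace)."""
    # Non-branch instructions are "executed directly", here by a host lambda.
    direct = {
        "add": lambda a, n: a + n,
        "sub": lambda a, n: a - n,
    }
    pc = 0
    trace = []
    for _ in range(max_steps):
        if pc >= len(program):
            break
        opcode, operand = program[pc]          # fetch
        trace.append((pc, opcode, operand))    # decode and save trace info
        if opcode == "jump_if_pos":            # branch: update the simulated PC
            pc = operand if acc > 0 else pc + 1
        elif opcode == "halt":
            break
        else:                                  # otherwise execute directly
            acc = direct[opcode](acc, operand)
            pc += 1
    return acc, trace


program = [
    ("add", 3),          # 0: acc = 3
    ("sub", 1),          # 1: acc -= 1
    ("jump_if_pos", 1),  # 2: loop back to 1 while acc > 0
    ("halt", 0),         # 3
]
final_acc, trace = run_traced(program)
```

Each trace entry records the simulated program counter and the decoded instruction, which is the information a debugger of this kind would show to the user.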

As an aside, the 1951 paper on the EDSAC debugger contains a pretty complete description of a modern debugger...




EEL reads object files and executables and allows tools built on top of EEL to modify the machine code without needing to be concerned with details of the underlying architecture or operating system or with the consequences of adding or deleting code.

EEL appears as a C++ class. EEL is provided with an executable, which it analyzes, creating abstractions such as executable (whole program), routines, CFGs, instructions and snippets. A tool built on EEL then edits the executable by performing structured rewrites of the EEL constructs; EEL ensures that details of register allocation, branches, etc. are updated correctly in the final code.













FLEX-ES (formerly OPEN/370) provides a System/390 on a Pentium. It includes system-mode operation and runs 8 popular S/370 OSs. On a 2-processor Pentium-II/400MHz, it provides 7 to 8 MIPS on one processor and I/O functions on the other processor. They also sell installed systems (hardware/software turnkey systems).


FLEX-ES home page.

FreePort Express

??? FreePort Express is a tool for converting Sun SPARC binaries to DEC Alpha AXP binaries.

See: FreePort Express web page


g88 is a portable simulator that simulates both user and system-mode code. It uses threaded code to achieve performance on the order of a few tens of host instructions per simulated instruction.
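As a rough illustration of threaded code (not g88's implementation), the sketch below decodes each target instruction once into a direct reference to its handler, so the main loop dispatches without re-decoding. The toy instruction set and every name in it (`predecode`, `run`, the `op_*` handlers) are assumptions for the example.

```python
# Sketch of threaded-code simulation: decode once, then dispatch through
# pre-resolved handler references. All names here are hypothetical.

def op_addi(state, reg, imm):
    state["regs"][reg] += imm
    return state["pc"] + 1

def op_bnez(state, reg, target):
    return target if state["regs"][reg] != 0 else state["pc"] + 1

def op_halt(state, _a, _b):
    return -1  # a negative PC stops the loop

HANDLERS = {"addi": op_addi, "bnez": op_bnez, "halt": op_halt}

def predecode(program):
    """Translate (mnemonic, a, b) tuples into (handler, a, b) 'threads'."""
    return [(HANDLERS[name], a, b) for name, a, b in program]

def run(threaded, regs):
    state = {"regs": regs, "pc": 0}
    executed = 0
    while state["pc"] >= 0:
        handler, a, b = threaded[state["pc"]]   # no decoding in the hot loop
        state["pc"] = handler(state, a, b)
        executed += 1
    return state["regs"], executed

program = [
    ("addi", 1, 5),    # 0: r1 += 5
    ("addi", 0, -1),   # 1: r0 -= 1
    ("bnez", 0, 0),    # 2: loop to 0 while r0 != 0
    ("halt", 0, 0),    # 3
]
regs, executed = run(predecode(program), [3, 0])
```

The cost of the mnemonic lookup is paid once per static instruction in `predecode`, rather than once per executed instruction.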


g88 was written by Robert Bedichek.

GNU Simulators





Built on top of OM.








The Interpreter


``The Interpreter'' is a micro-architecture that is intended for a variety of uses including emulation of existing or hypothetical machines and program profiling. An emulator is written in microcode, and microinstructions executed from the microstore give both parallelism and fast execution.


More detailed review:


This review/summary by Pardo.























MPtrace statically augments parallel programs written for the i386-based Sequent Symmetry multiprocessor. The instrumented programs are then run to generate multiprocessor address traces.


MPtrace was written by David Keppel and Eric J. Koldinger under the supervision of Susan J. Eggers and Henry M. Levy.





New Jersey Machine Code Toolkit (NJMCT)

The New Jersey Machine Code Toolkit lets programmers decode and encode machine instructions symbolically, guided by machine specifications that describe mappings between symbolic and machine (binary) forms. It thus helps programmers write applications such as assemblers, disassemblers, linkers, run-time code generators, tracing tools, and other tools that consume or produce machine code.

Questions and comments can be sent to `'.





Partial Emulation


Virtual machines (VMs) provide greater flexibility and protection but require the ability to run one operating system (OS) under the control of another. In the absence of virtualization hardware, VMs are typically built by porting the OS to run in user mode, using a special kernel-level environment or as a system-level simulator. ``Partial Emulation'' or a ``Lightweight Virtual Machine'' is an augmentation-based approach to system-level simulation: directly execute most instructions, statically rewrite and virtualize those instructions which are ``tricky'' due to running in a VM environment. Compared to the other approaches, partial emulation offers fewer OS modifications than user-mode execution (user-mode Linux requires a machine description around 33,000 lines) and higher performance than a full (all instructions) simulator (Bochs is about 10x slower than native execution).

The implementation described here emulates all privileged instructions and some non-privileged instructions. One approach replaces each ``interesting'' instruction with illegal instruction traps. A second approach is to call emulation subroutines. ``Rewriting'' is done during compilation, and the current implementation requires OS source code [EY 03].

The approach here must: detect and emulate privileged and some non-privileged instructions; redirect system calls and page faults to the user-level OS; emulate an MMU; emulate devices.

The implementation with illegal instruction traps uses a companion process and debugger-type accesses to simulate interesting instructions. Otherwise, the user-level OS and its processes are executed in a single host process. The ``illegal instruction trap'' approach inserts an illegal instruction before each ``interesting'' instruction. The companion process then skips the illegal instruction, simulates the ``interesting'' instruction, then restarts the process. It is about 1,500 lines of C code. The ``procedure call'' approach is about 1,400 lines but is faster. There are still out-of-process traps due to e.g., MMU emulation (ala SimOS).

For IA-32, the ``interesting'' instructions are mov, push, and pop instructions that manipulate segment registers; call, jmp, and ret instructions that cross segment boundaries; iret; instructions that manipulate special registers; and instructions that read and write (privileged bits of) the flag register.
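The two rewriting strategies can be sketched on a toy instruction list. This is a simplified illustration, not the paper's rewriter: instructions are plain strings, the ``interesting'' test looks only at the mnemonic (a real rewriter must inspect operands, e.g. for segment registers), `ud2` is used here as one possible illegal instruction, and the `emulate_*` routine names are hypothetical.

```python
# Toy static rewriter for the two strategies described above.
# The mnemonic set is a simplified stand-in for the real "interesting" test.

INTERESTING = {"iret", "lcall", "ljmp", "lret", "pushf", "popf"}

def is_interesting(insn):
    return insn.split()[0] in INTERESTING

def rewrite_with_traps(code):
    """Insert an illegal instruction before each interesting instruction."""
    out = []
    for insn in code:
        if is_interesting(insn):
            out.append("ud2")  # assumption: any illegal opcode would do
        out.append(insn)
    return out

def rewrite_with_calls(code):
    """Replace each interesting instruction with a call to an emulation routine."""
    out = []
    for insn in code:
        if is_interesting(insn):
            out.append("call emulate_" + insn.split()[0])  # hypothetical name
        else:
            out.append(insn)
    return out

code = ["addl %ebx, %eax", "pushf", "movl %eax, (%esp)", "iret"]
trapped = rewrite_with_traps(code)
called = rewrite_with_calls(code)
```

In the trap variant the original instruction stays in place so the companion process can find and simulate it; in the call variant the emulation happens in-process, which is consistent with the speed difference reported for the two approaches.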

Not all host OSs have the right facilities to implement a partial emulator.

Some target OS changes were needed. For NetBSD, six address constants were changed to avoid host OS conflicts, and device drivers were removed. For FreeBSD, BIOS calls were also replaced with code that returned the needed values; had the authors tried to run the BIOS, the system would have needed to execute in virtual 8086 mode.

User-level execution speed was similar to native. For OS-intensive microbenchmarks, the ``illegal instruction trap'' implementation was at least 100x slower than native (non-virtual) execution and slower than Bochs. The ``procedure call'' approach was 3-5x faster, but a little slower than Bochs and still 10x slower than VMware, which was in turn 4x-10x slower than native. A test benchmark (patch) was 15x slower using illegal instruction traps and about 5x slower using procedure calls. For comparison, VMware was about 1.1x slower.

The paper proposes using a separate host process for each page table base register value in order to reduce overhead for MMU emulation.


Further reading: ``Running BSD Kernels as User Processes by Partial Emulation and Rewriting of Machine Instructions'' [EY 03].





















Simulates pipeline-level parallelism and memory system behavior.






Shade combines efficient instruction-set simulation with a flexible, extensible trace generation capability. Efficiency is achieved by dynamically compiling and caching code to simulate and trace the application program; the cost is as low as two instructions per simulated instruction. The user may control the extent of tracing in various ways; arbitrarily detailed application state information may be collected during the simulation, but tracing less translates directly into greater efficiency. Current Shade implementations run on SPARC systems and simulate the SPARC (Versions 8 and 9) and MIPS I instruction sets.
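The ``dynamically compile and cache'' idea can be sketched as follows. This is not Shade's code generator: the toy target instruction set (`add`, `loop`, `halt`), the use of generated Python source as the ``host code'', and the names `translate` and `run` are all assumptions for illustration.

```python
# Sketch: translate each block of target instructions into host code once,
# cache the result by target PC, and reuse it on later visits.

def translate(program, start):
    """Generate and compile a host function for the block starting at `start`."""
    lines = ["def block(state):"]
    pc = start
    while True:
        name, arg = program[pc]
        pc += 1
        if name == "add":
            lines.append(f"    state['acc'] += {arg}")
        elif name == "loop":           # decrement counter, branch if nonzero
            lines.append("    state['count'] -= 1")
            lines.append(f"    return {arg} if state['count'] != 0 else {pc}")
            break
        elif name == "halt":
            lines.append("    return None")
            break
        else:
            raise ValueError(f"unknown instruction {name!r}")
    namespace = {}
    exec("\n".join(lines), namespace)  # host-side "code generation"
    return namespace["block"]

def run(program, state):
    cache = {}                         # target PC -> translated block
    translations = 0
    pc = 0
    while pc is not None:
        if pc not in cache:            # translate only on a cache miss
            cache[pc] = translate(program, pc)
            translations += 1
        pc = cache[pc](state)          # each block returns the next target PC
    return translations

program = [("add", 2), ("add", 3), ("loop", 0), ("add", 100), ("halt", 0)]
state = {"acc": 0, "count": 4}
translations = run(program, state)
```

The loop body runs four times but is translated once; that reuse is where the low per-instruction cost comes from. Adding trace collection would mean emitting extra lines into each generated block, which is why tracing less is cheaper.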


Shade was written by Bob Cmelik, with help from David Keppel.


SimICS is a multiprocessor simulator. SimICS simulates both the user and system modes of 88000 and SPARC processors and is used for simulation, debugging, and prototyping.


SimICS should soon be available under license. Contact Peter Magnusson.

SimICS is a rewrite of gsim, which, in turn, was derived from g88. SimICS was written by Peter Magnusson, David Samuelsson, Bengt Werner and Henrik Forsberg.

Sinclair ZX Spectrum Emulators










SimOS emulates both user-mode and system-mode code for a MIPS-based multiprocessor. It uses a combination of direct-execution (some OS rewrites may be required) and dynamic cross-compilation (no rewrites needed) in order to emulate and, to some degree, instrument.




Sleipnir is an instruction-level simulator generator in the style of yacc. The configuration file is extended C, with special constructs to describe bit-level encodings and common code and support for generation of a threaded-code simulator.

For example, 0b_10ii0sss_s0iidddd specifies a 16-bit pattern with constant values which must match and named ``don't care'' fields i (split over two locations), s, and d. Sleipnir combines the various patterns to create an instruction decoder. Named fields are substituted in action rules for an instruction. For example, add 0b_10ii0sss_s0iidddd { GP(reg[$d]) = GP(reg[$s]) + $^c }. Here, ^ indicates sign-extension. Threaded-code dispatch is implied.
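To make the pattern notation concrete, the sketch below turns such a pattern into a mask/match pair and extracts the named fields, including a field split over two locations. It is only an illustration of the idea, not Sleipnir's generated decoder; `compile_pattern`, `decode`, and `sign_extend` are hypothetical helper names.

```python
# Sketch: compile a bit pattern such as 0b_10ii0sss_s0iidddd into a
# (mask, match) pair plus the bit positions of each named field.

def compile_pattern(pattern):
    bits = pattern[2:] if pattern.startswith("0b") else pattern
    bits = bits.replace("_", "")
    width = len(bits)
    mask = match = 0
    fields = {}                               # field name -> bit positions, MSB first
    for i, ch in enumerate(bits):
        pos = width - 1 - i                   # bit position, LSB = 0
        if ch in "01":                        # constant bit: must match
            mask |= 1 << pos
            match |= int(ch) << pos
        else:                                 # named "don't care" field bit
            fields.setdefault(ch, []).append(pos)
    return width, mask, match, fields

def decode(word, compiled):
    """Return the field values if `word` matches the pattern, else None."""
    _width, mask, match, fields = compiled
    if word & mask != match:
        return None
    values = {}
    for name, positions in fields.items():
        value = 0
        for pos in positions:                 # concatenate bits, MSB first
            value = (value << 1) | ((word >> pos) & 1)
        values[name] = value
    return values

def sign_extend(value, nbits):
    """Interpret an nbits-wide value as two's complement."""
    sign = 1 << (nbits - 1)
    return (value ^ sign) - sign

compiled = compile_pattern("0b_10ii0sss_s0iidddd")
fields = decode(0xA2B3, compiled)
```

For the example word `0xA2B3`, the four `i` bits are gathered from both locations into one value, and `sign_extend` shows what a sign-extending substitution would do with it.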

For simple machines, Sleipnir can generate cycle-accurate simulators. For more complex machines, it generates ISA-level simulators. Threaded-code simulators are typically weak at VLIW simulation and machines with some kinds of exposed latencies. Threaded-code simulators typically simulate one instruction entirely before starting the next, but with VLIW and exposed latencies, the effects of a single instruction are spread over the execution of several instructions. Sleipnir supports some kinds of exposed latencies by running an after() function after each instruction. Simulator code that creates values writes them into buffers, and code in after() can copy the values as needed to memory, the PC, and so on.
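One way to picture the after() mechanism is a pending-write buffer that is aged after every instruction. The sketch below is an assumption-laden toy, not Sleipnir output: the `load`/`mov` instructions, the `COMMIT_AFTER` constant, and the buffer layout are all invented for illustration.

```python
# Sketch of modeling an exposed load latency with an after() hook. A loaded
# value is buffered and committed at the end of the COMMIT_AFTER-th
# instruction, counting the load itself, so the instruction immediately
# following a load still sees the old register value.

COMMIT_AFTER = 2  # hypothetical latency

def run(program, memory):
    regs = [0] * 4
    pending = []                      # entries: [instructions_left, reg, value]

    def after():
        """Runs after every instruction: age the buffer, commit what is due."""
        for entry in pending:
            entry[0] -= 1
        for entry in [e for e in pending if e[0] == 0]:
            regs[entry[1]] = entry[2]
            pending.remove(entry)

    for op, dst, src in program:
        if op == "load":              # buffer the value instead of writing it
            pending.append([COMMIT_AFTER, dst, memory[src]])
        elif op == "mov":             # reads the architecturally visible value
            regs[dst] = regs[src]
        after()
    return regs

program = [
    ("load", 0, 0),   # r0 <- memory[0], visible only after the next instruction
    ("mov", 1, 0),    # still sees the old r0
    ("mov", 2, 0),    # sees the loaded value
]
regs = run(program, [42])
```

Each instruction handler still runs to completion before the next one starts, which keeps the threaded-code structure intact; only the commit of its result is deferred.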

Reported machine description (MD) sizes, speeds, and levels of accuracy include the following. ``Speed'' is based on a 250 MHz MIPS R10000-based machine.

    Architecture        MD lines   Sim. speed   Accuracy
    MIPS-I (integer)         700   5.1 MIPS     ISA
    M*Core                   970   6.4 MIPS     Cycle
    ARM/Thumb              2,812   3.6 MIPS     ISA
    TI C6201               5,231   3.4 MIPS     Cycle
    Lucent DSP1600         3,903   3.7 MIPS     Cycle

In Norse mythology, ``Sleipnir'' is an eight-legged horse that could travel over land and sea and through the air.



SoftPC is an 8086/80286 emulator which runs on a variety of host machines. The first version implemented an 8086 processor core using an interpreter. It provided device emulators for EGA/VGA and Hercules graphics, hard disks, floppies, and an interrupt controller.

In about 1986, Steve Chamberlain developed a dynamic cross-compiler for the Sun 3/260. The basic emulation structure is an array of bytes for simulated memory and an ``action'' array, which is a same-size array of bytes. There are then three arrays R, W, and X for reads, writes, and execution; each is subscripted by the ``action'' byte and contains a pointer to the corresponding read, write, or execute action. For example, a read of location 17 is implemented by reading a = action[17], then branching to R[a]. Similarly, executing location 17 is implemented by reading a = action[17], then branching to X[a]. The default action is that each instruction is interpreted.

Each branch invokes the translator. The translator (dynamic cross-compiler) generates a translation that starts at the last branch and goes through the current branch. SoftPC then records the current branch target, which is the starting place for the next branch's translation. SoftPC ``installs'' the translation by allocating a byte subscript a, then it fills in the action table with the value a and sets R[a] to act as a normal read; W[a] to invalidate the corresponding translation; and X[a] to point to the new translation. For each byte ``covered'' by the translation, the action table is set to a byte value that will invalidate the translation. For each translation, SoftPC also sets a back-pointer in a 256-entry table so that when a particular translation is being invalidated it is easy to find the location in the ``action'' table which currently uses that translation.
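The action-byte scheme can be sketched as below. This is a loose reconstruction for illustration and should not be read as SoftPC's actual data layout: the two reserved action values (`DEFAULT`, `COVERED`), the `spans` back-pointer table, and the stand-in ``translations'' (plain Python callables) are all assumptions, and overlapping translations are not handled.

```python
# Sketch: one action byte per memory byte selects the read, write, and
# execute handlers. Writing to a byte covered by a translation invalidates it.

MEM_SIZE = 64
DEFAULT, COVERED = 0, 1               # reserved action values (an assumption)

memory = bytearray(MEM_SIZE)
action = bytearray(MEM_SIZE)          # same size as memory, all DEFAULT
spans = {}                            # back-pointers: action value -> (start, end)

def read_plain(addr):
    return memory[addr]

def write_plain(addr, value):
    memory[addr] = value

def write_invalidating(addr, value):
    memory[addr] = value
    for a, (start, end) in list(spans.items()):
        if start <= addr < end:
            invalidate(a)

def interpret(addr):
    return ("interpret", addr)

R = {DEFAULT: read_plain, COVERED: read_plain}
W = {DEFAULT: write_plain, COVERED: write_invalidating}
X = {DEFAULT: interpret, COVERED: interpret}

def install(start, end, translation):
    a = next(v for v in range(2, 256) if v not in X)   # allocate an action byte
    spans[a] = (start, end)
    R[a], W[a], X[a] = read_plain, write_invalidating, translation
    action[start] = a                 # executing `start` runs the translation
    for addr in range(start + 1, end):
        action[addr] = COVERED        # writes here invalidate the translation
    return a

def invalidate(a):
    start, end = spans.pop(a)
    for addr in range(start, end):
        action[addr] = DEFAULT
    del R[a], W[a], X[a]

def read(addr):
    return R[action[addr]](addr)

def write(addr, value):
    W[action[addr]](addr, value)

def execute(addr):
    return X[action[addr]](addr)

a = install(16, 20, lambda addr: ("translated", addr))
before = execute(16)                  # dispatches to the installed translation
write(18, 0x90)                       # self-modifying store into covered code
after_write = execute(16)             # falls back to interpretation
```

The point of the design is that ordinary reads, writes, and instruction fetches all pay the same single table lookup, so detecting self-modifying code adds no extra check on the common path.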

There are thus a maximum of 256 translations at any time (actually 254 due to reserved byte values). The simulated system had up to 1MB RAM. In about 1988 Henry ??? extended the system to use the low bit of the address as part of the subscript, in order to expand the table to 512 translations. This was used in the first Apple Macintosh target of SoftPC.

SoftPC emulates many devices, including EGA, VGA, and Hercules video; disks, including floppies and hard disks; the interrupt controller; and so on. In about 1987, Steve Chamberlain implemented an 8087 (FP coprocessor) that was not a faithful 8087 (e.g., did not provide full 80-bit FP) but which provided sufficient accuracy to run common applications.

















An Atari ST emulator that runs on (at least) a Sun SPARC IPC under SunOS 4.1; it emulates an MC68000, RAM, ROM, Atari ST graphics, keyboard, BIOS, clock and maybe some other stuff. On a SPECint=13.8 machine it runs on average at half the speed of a real ST.


By: Marinos "nino" Yannikos.


T2 is a SPARCle/Fugu simulator that is implemented by dynamically cross-compiling SPARCle code to SPARC code. It simulates both user and system mode code and was used for doing program development before the arrival of SPARCle hardware.

The name T2 is short for ``Talisman-2''. Note that, despite the similarity in names, Talisman and T2 share little in implementation or core features: the former uses a threaded code implementation and provides timing simulation of an m88k, while the latter uses dynamic cross-compilation and provides fast simulation of a SPARCle.

Tango Lite




Talisman is a fast timing-accurate simulator for an 88000-based multiple-processor machine. Talisman provides both user-mode and system mode simulation and can boot OS kernels. Simulation is reasonably fast, on the order of a hundred instructions per simulated instruction. Talisman also does low-level timing simulation and typically produces estimated running times that are within a few percent of running times on real hardware. Note that e.g. turning off dynamic RAM refresh simulation makes the timing accuracy substantially worse!


Tapeworm II



Third Degree


Built on top of OM.


Titan tracing






VAX-11 RSX Emulator



Vest and mx



Windows x86


According to a Microsoft information release, "Windows x86" is a user-space x86 emulator with an OS interface to 32-bit Microsoft Windows (tm).

Windows on Windows (WOW)


According to a Microsoft information release, "Windows on Windows" is a user-space x86 emulator with an interface to 16-bit Microsoft Windows (tm).






Wine is a Microsoft Windows(tm) OS emulator for i*86 systems. Most of the application's code runs native, but calls to ``OS'' functions are transformed into calls into Unix/X. Some programs require enhanced mode device drivers and will (probably) never run under Wine. Wine is neither a processor emulator nor a tracing tool.








Z-80 Simulators




8051 Emulators


        - 2500 A.D.
        - Avocet Systems
          (also compilers and assemblers).
        - ChipTools
             on a 33 MHz 486 matches the speed of a 12 MHz 8051
        - Cybernetic Micro Systems
        - Dunfield Development Systems
             Low cost $50.00
             500,000+ instructions/second on 486/33
             Can interface to target system for physical I/O
             Includes PC hosted "on chip" debugger with identical user
        - HiTech Equipment Corp.
        - Iota Systems, Inc.
        - J & M Microtek, Inc.
        - Keil Electronics
        - Lear Com Company
        - Mandeno Granville Electronics, Ltd
        - Micro Computer Control Corporation
             Simulator/source code debugger ($79.95)
        - Microtek Research
        - Production Languages Corp.
        - PseudoCorp

    Emulators ($$$ - high, $$ - medium, $ - low priced)
        - Advanced Micro Solutions  $$
        - Advanced Microcomputer Systems, Inc.  $
        - American Automation  $$$  $$
        - Applied Microsystems  $$
        - ChipTools (front end for Nohau's emulator)
        - Cybernetic Micro Systems  $
        - Dunfield Development Systems $
             plans for pseudo-ice using Dallas DS5000/DS2250
             used together with their resident monitor and host debugger
        - HBI Limited  $
        - Hewlett-Packard  $$$
        - HiTech Equipment Corp.
        - Huntsville Microsystems  $$
        - Intel Corporation  $$$
        - Kontron Electronics  $$$
        - Mandeno Granville Electronics, Ltd
             full line covering everything from the Atmel flash to the
                Siemens powerhouse 80c517a
        - MetaLink Corporation  $$  $
        - Nohau Corporation  $$
        - Orion Instruments  $$$
        - Philips $
             DS-750 pseudo-ICE developed by Philips and CEIBO
             real-time emulation and simulator debug mode
             source-level debugging for C, PL/M, and assembler
             programs 8xC75x parts
             low cost - only $100
             DOS and Windows versions available
        - Signum Systems  $$
        - Sophia Systems  $$$
        - Zax Corporation
        - Zitek Corporation  $$$
(Contacts listed in FAQ below).



A glossary of some terms used here and in the cited works.

See also Terminology.




[ASH 86]
\bibitem{ASH:86} Anant Agarwal, Richard L. Sites and Mark Horowitz, ``ATUM: A New Technique for Capturing Address Traces Using Microcode,'' Proceedings of the 13th International Symposium on Computer Architecture (ISCA-13), June 1986, pp.~119-127.

[AS 92]
\bibitem{AS:92} Kristy Andrews and Duane Sand, ``Migrating a CISC Computer Family onto RISC via Object Code Translation,'' Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-V), October 1992, pp.~213-222.

[BL 94]
\bibitem{BL:94} Thomas Ball and James R. Larus, ``Optimally Profiling and Tracing Programs,'' ACM Transactions on Programming Languages and Systems, (16)2, May 1994.

[Baumann 86]
\bibitem{Baumann:86} Robert A. Baumann, ``Z80MU,'' Byte Magazine, October 1986, pp.~203-216.

[Jeremiassen 00]

\bibitem{Jeremiassen:00} Tor E. Jeremiassen, ``Sleipnir --- An Instruction-Level Simulator Generator,'' International Conference on Computer Design, pp.~23--31. IEEE, 2000.

[Bedichek 90]
\bibitem{Bedichek:90} Robert Bedichek, ``Some Efficient Architecture Simulation Techniques,'' Winter 1990 USENIX Conference, January 1990, pp.~53-63.

[Bedichek 94]
\bibitem{Bedichek:94} Robert Bedichek, ``The Meerkat Multicomputer: Tradeoffs in Multicomputer Architecture,'' Doctoral Dissertation, University of Washington Department of Computer Science and Engineering technical report 94-06-06, 1994.

[Bedichek 95]
\bibitem{Bedichek:95} Robert C. Bedichek, ``Talisman: Fast and Accurate Multicomputer Simulation,'' Proceedings of the 1995 ACM SIGMETRICS Conference on Modeling and Measurement of Computer Systems, 1995.

[BKLW 89]
\bibitem{BKLW:89} Anita Borg, R. E. Kessler, Georgia Lazana and David W. Wall, ``Long Address Traces from RISC Machines: Generation and Analysis,'' Digital Equipment Western Research Laboratory Research Report 89/14, (appears in shorter form as~\cite{BKW:90}) September 1989. Abstract/paper.

[BKW 90]
\bibitem{BKW:90} Anita Borg, R. E. Kessler and David W. Wall, ``Generation and Analysis of Very Long Address Traces,'' Proceedings of the 17th Annual Symposium on Computer Architecture (ISCA-17), May 1990, pp.~270-279.

[Boothe 92]
\bibitem{Boothe:92} Bob Boothe, ``Fast Accurate Simulation of Large Shared Memory Multiprocessors,'' technical report UCB/CSD 92/682, University of California, Berkeley, Computer Science Division, April 1992.

[BDCW 91]
\bibitem{BDCW:91} Eric A. Brewer, Chrysanthos N. Dellarocas, Adrian Colbrook and William E. Weihl, ``{\sc Proteus}: A High-Performance Parallel-Architecture Simulator,'' Massachusetts Institute of Technology technical report MIT/LCS/TR-516, 1991.

[BAD 87]
\bibitem{BAD:87} Eugene D. Brooks III, Timothy S. Axelrod and Gregory A. Darmohray, ``The Cerberus Multiprocessor,'' Lawrence Livermore National Laboratory technical report, Preprint UCRL-94914, 1987.

[Chamberlain 94]
\bibitem{Chamberlain:94} Steve Chamberlain, Personal communication, 1994.

[CUL 89]
\bibitem{CUL:89} Craig Chambers, David Ungar and Elgin Lee, ``An Efficient Implementation of {\sc Self}, a Dynamically-Typed Object-Oriented Language Based on Prototypes,'' OOPSLA '89 Proceedings, October 1989, pp.~49-70.

[CHRG 95]
\bibitem{CHRG:95} John Chapin, Steve Herrod, Mendel Rosenblum and Anoop Gupta, ``Memory System Performance of UNIX on CC-NUMA Multiprocessors,'' May 1995, pp.~1-13.

[CHKW 86]
\bibitem{CHKW:86} Fred Chow, A. M. Himelstein, Earl Killian and L. Weber, ``Engineering a RISC Compiler System,'' IEEE COMPCON, March 1986.

[CG 93]
\bibitem{CG:93} Cristina Cifuentes and K.J. Gough, ``A Methodology for Decompilation,'' In Proceedings of the XIX Conferencia Latinoamericana de Informatica, pp. 257-266, Buenos Aires, Argentina, August 1993. PostScript(tm) paper.

(Note: these papers may have moved to here.)

[CG 94]
\bibitem{CG:94} Cristina Cifuentes and K.J. Gough ``Decompilation of Binary Programs,'' Technical report 3/94, Queensland University of Technology, School of Computing Science, 1994. PostScript(tm) paper

(Note: these papers may have moved to here.)

[CG 95]
\bibitem{CG:95} C. Cifuentes and K. John Gough, ``Decompilation of Binary Programs,'' Software--Practice & Experience, July 1995. PostScript(tm) paper

Describes general techniques and a 80286/DOS to C converter.

[Cifuentes 93]
\bibitem{Cifuentes:93} C. Cifuentes, ``A Structuring Algorithm for Decompilation'', Proceedings of the XIX Conferencia Latinoamericana de Informatica, Aug 1993, Buenos Aires, pp. 267 - 276. PostScript(tm) paper

[Cifuentes 94a]
\bibitem{Cifuentes:94a} Cristina Cifuentes ``Interprocedural Data Flow Decompilation,'' Technical report 4/94, Queensland University of Technology, School of Computing Science, 1994. PostScript(tm) paper

(Note: these papers may have moved to here.)

[Cifuentes 94b]
\bibitem{Cifuentes:94b} Cristina Cifuentes ``Reverse Compilation Techniques,'' Doctoral dissertation, Queensland University of Technology, July 1994. PostScript(tm) paper (474MB).

[Cifuentes 94c]
\bibitem{Cifuentes:94c} C. Cifuentes, ``Structuring Decompiled Graphs,'' Technical Report 4/94, Queensland University of Technology, Faculty of Information Technology, April 1994. PostScript(tm)

[Cifuentes 95]
\bibitem{Cifuentes:95} C. Cifuentes, ``Interprocedural Data Flow Decompilation'', Journal of Programming Languages. In print, 1995. PostScript(tm) paper

[Cifuentes 95b]
\bibitem{Cifuentes:95b} C. Cifuentes, ``An Environment for the Reverse Engineering of Executable Programs''. To appear: Proceedings of the Asia-Pacific Software Engineering Conference (APSEC). IEEE. Brisbane, Australia. December 1995. PostScript(tm) paper

[Conte & Gimarc 95]
``Fast Simulation of Computer Architectures'', Thomas M. Conte and Charles E. Gimarc, Editors. Kluwer Academic Publishers, 1995. ISBN 0-7923-9593-X.

See here for ordering information.

%A Robert F. Cmelik
%A David R. Ditzel
%A Edmund J. Kelly
%A Colin B. Hunter
%A Douglas A. Laird
%A Malcolm John Wing
%A Gregorz B. Zyner
%T Combining Hardware and Software to Provide an Improved Microprocessor
%R United States Patent #US6031992

Available as of 2000/03 via


US06011908 01/04/2000 Gated store buffer for an advanced microprocessor Available as of 2000/03 via


US05958061 09/28/1999 Host microprocessor with apparatus for temporarily holding target processor state e Available as of 2000/03 via


[Cmelik 93a]
\bibitem{Cmelik:93a} Robert F. Cmelik, ``Introduction to Shade,'' Sun Microsystems Laboratories, Incorporated, February 1993.

[Cmelik 93b]
\bibitem{Cmelik:93b} Robert F. Cmelik, ``The Shade User's Manual,'' Sun Microsystems Laboratories, Incorporated, February 1993.

[Cmelik 93c]
\bibitem{Cmelik:93c} Robert F. Cmelik, ``SpixTools Introduction and User's Manual,'' Sun Microsystems Laboratories, Incorporated, technical report TR93-6, February 1993. Html pointer

[CK 93]
\bibitem{CK:93} Robert F. Cmelik, and David Keppel, ``Shade: A Fast Instruction-Set Simulator for Execution Profiling,'' Sun Microsystems Laboratories, Incorporated, and the University of Washington, technical report SMLI 93-12 and UWCSE 93-06-06, 1993. Html pointer, PostScript(tm) paper.

[CK 94]
\bibitem{CK:94} Robert F. Cmelik, and David Keppel, ``Shade: A Fast Instruction-Set Simulator for Execution Profiling,'' Proceedings of the 1994 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems May 1994, pp.~128-137. Html pointer, PostScript(tm) paper. [Link broken, please e-mail <> to get it fixed.]

[CK 95]
\bibitem{CK:95} Robert F. Cmelik, and David Keppel, ``Shade: A Fast Instruction-Set Simulator for Execution Profiling,'' Appears as Chapter~2 of ``[Conte & Gimarc 95]'', pp.~5-46.

[CMMJS 88]
\bibitem{CMMJS:88} R. C. Covington, S. Madala, V. Mehta, J. R. Jump and J. B. Sinclair, ``The Rice Parallel Processing Testbed,'' Proceedings of the 1988 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, 1988, pp.~4-11.

[DLHH 94]
\bibitem{DLHH:94} Peter Davies, Philippe LaCroute, John Heinlein and Mark Horowitz, ``Mable: A Technique for Efficient Machine Simulation,'' Quantum Effect Design, Incorporated, and Stanford University technical report CSL-TR-94-636, 1994.

[DGH 91]
\bibitem{DGH:91} Helen Davis, Stephen R. Goldschmidt and John Hennessy, ``Multiprocessor Simulation and Tracing Using Tango,'' Proceedings of the 1991 International Conference on Parallel Processing (ICPP, Vol. II, Software), August 1991, pp.~II 99-107.

[Deutsch 83]
\bibitem{Deutsch:83} Peter Deutsch, ``The Dorado Smalltalk-80 Implementation: Hardware Architecture's Impact on Software Architecture,'' Smalltalk-80: Bits of History, Words of Advice, 1983 Addison-Wesley pp.~113-126.

Review/summary by Pardo:

[DS 84]
\bibitem{DS:84} Peter Deutsch and Alan M. Schiffman, ``Efficient Implementation of the Smalltalk-80 System,'' 11th Annual Symposium on Principles of Programming Languages (POPL-11), January 1984, pp.~297-302.

[DM 87]
\bibitem{DM:87} David R. Ditzel and Hubert R. McLellan ``Branch Folding in the CRISP Microprocessor: Reducing Branch Delay to Zero,'' Proceedings of the 14th Annual International Symposium on Computer Architecture; Computer Architecture News, Volume 15, Number 2, June 1987, pp.~2-9.

[DMB 87]
\bibitem{DMB:87} David R. Ditzel, Hubert R. McLellan and Alan D. Berenbaum, ``The Hardware Architecture of the CRISP Microprocessor,'' Proceedings of the 14th Annual International Symposium on Computer Architecture; Computer Architecture News, Volume 15, Number 2, June 1987, pp.~309-319.

[EKKL 90]
\bibitem{EKKL:90} Susan J. Eggers, David Keppel, Eric J. Koldinger and Henry M. Levy, ``Techniques for Efficient Inline Tracing on a Shared-Memory Multiprocessor,'' Proceedings of the 1990 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, May 1990, pp.~37-47.

[ES 94]
\bibitem{ES:94} Alan Eustace and Amitabh Srivastava, ``ATOM: A Flexible Interface for Building High Performance Program Analysis Tools,'' Technical note TN-44, Digital Equipment Corporation Western Research Laboratory, July 1994. Html.

[ES 95]
\bibitem{ES:95} Alan Eustace and Amitabh Srivastava, ``ATOM: A Flexible Interface for Building High Performance Program Analysis Tools,'' Proceedings of the USENIX 1995 Technical Conference on UNIX and Advanced Computing Systems, New Orleans, Louisiana, January 16-20, 1995, pp. 303-314.

[EY 03]
\bibitem{EY:03} Hideki Eiraku and Yasushi Shinjo, ``Running BSD Kernels as User Processes by Partial Emulation and Rewriting of Machine Instructions'', Proceedings of BSDCon '03, San Mateo, CA, USA, 8-12 September 2003.

[Fujimoto 83]
\bibitem{Fujimoto:83} Richard M. Fujimoto, ``Simon: A Simulator of Multicomputer Networks'' technical report UCB/CSD 83/137, ERL, University of California, Berkeley, 1983.

[FC 88]
\bibitem{FC:88} Richard M. Fujimoto, and William B. Campbell, ``Efficient Instruction Level Simulation of Computers,'' Transactions of The Society for Computer Simulation 5(2), April 1988, pp.~109-123.

[FP 94]
\bibitem{FP:94} FlashPort product literature, AT&T Bell Laboratories, August 1994.

[FN 75]
\bibitem{FN:75} M. J. Flynn, C. Neuhauser, ``EMMY -- An Emulation System for User Microprogramming,'' National Computer Conference, 1975, pp.~85-89.

Review/summary by Pardo:

[Gill 51]
\bibitem{Gill:51} S. Gill, ``The Diagnosis Of Mistakes In Programmes on the EDSAC'' Proceedings of the Royal Society Series A Mathematical and Physical Sciences, 22 May 1951, (206)1087, pp.~538-554, Cambridge University Press London and New York.

The scanned article available via here. [Link broken, please e-mail <> to get it fixed.]

[GDB 94]
\bibitem{GDB:94} GNU debugger and simulator, Internet Universal Resource Locator {\mbox{\tt}}, GDB distribution, {{\tt sim}} subdirectory.

Note that (as of 1998) for each simulator included with GDB there is also a GCC target and a set of runtime libraries.

[GH 92]
\bibitem{GH:92} Stephen R. Goldschmidt and John L. Hennessy, ``The Accuracy of Trace-Driven Simulations of Multiprocessors,'' Stanford University Computer Systems Laboratory, technical report CSL-TR-92-546, September 1992.

[Granlund 94]
\bibitem{Granlund:94} Torbj\"{o}rn Granlund, ``The Cygnus Simulator Proposal,'' Cygnus Support, Mountain View, California, March 1994.

[Grossman 94]
\bibitem{Grossman:94} Stu Grossman, Personal communication, November 1994.

[Halfhill 94]
\bibitem{Halfhill:94} Tom R. Halfhill, ``Emulation: RISC's Secret Weapon,'' Byte, April 1994, pp.~119-130.

[Halfhill 00]
\bibitem{Halfhill:00} Tom R. Halfhill, ``Transmeta Breaks x86 Low-Power Barrier,'' Microprocessor Report, February 14, 2000.

Review/summary by Pardo:

[Haygood 1999]
%A Bill Haygood
%T Emulators and Emulation
%J Self (
%D 1999

Review/summary by Pardo:

[HJ 92]
\bibitem{HJ:92} Reed Hastings and Bob Joyce, ``Purify: Fast Detection of Memory Leaks and Access Errors,'' Proceedings of the Winter USENIX Conference, January 1992, pp.~125-136.

[HP 93]
\bibitem{HP:93} John Hennessy and David Patterson, ``Computer Organization and Design: The Hardware-Software Interface'' (Appendix A, by James R. Larus), Morgan Kaufmann, 1993.

[HCU 91]
\bibitem{HCU:91} Urs H\"{o}lzle, Craig Chambers and David Ungar, ``Optimizing Dynamically-Typed Object-Oriented Languages With Polymorphic Inline Caches,'' Proceedings of the European Conference on Object-Oriented Programming (ECOOP), July 1991, pp.~21-38.

[HU 94]
\bibitem{HU:94} Urs H\"{o}lzle and David Ungar, ``Optimizing Dynamically-Dispatched Calls with Run-Time Type Feedback,'' Proceedings of the 1994 ACM Conference on Programming Language Design and Implementation (PLDI), June, 1994, pp.~326-335.

[Hsu 89]
\bibitem{Hsu:89} Peter Hsu, ``Introduction to Shadow,'' Sun Microsystems, Incorporated, July 1989.

[IMS 94]
\bibitem{IMS:94} ``IMS Demonstrates x86 Emulation Chip,'' Microprocessor Report, 9 May 1994, pp.~5 and~15.

[Irlam 93]
\bibitem{Irlam:93} Gordon Irlam, Personal communication, February 1993.

[James 90]
\bibitem{James:90} David James, ``Multiplexed Busses: The Endian Wars Continue,'' IEEE Micro Magazine, June 1990, pp.~9-22.

[Johnston 79]
\bibitem{Johnston:79} Ronald L. Johnston, ``The Dynamic Incremental Compiler of APL{$\backslash$}3000,'' APL Quote Quad 9(4), Association for Computing Machinery (ACM), June 1979, pp.~82-87.

[KCW 98]
%A Edmund J. Kelly
%A Robert F. Cmelik
%A Malcolm John Wing
%T Memory Controller For A Microprocessor for Detecting A Failure Of Speculation On The Physical Nature Of A Component Being Addressed
%D 1998/11/03
%R United States Patent #05832205
Available as of 2000/03 via

[Keppel 91]
\bibitem{Keppel:91} David Keppel, ``A Portable Interface for On-The-Fly Instruction Space Modification,'' Proceedings of the 1991 Symposium on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IV), April 1991, pp.~86-95 (source code is also available via anonymous ftp from PostScript(tm) paper.

[KEH 91]
\bibitem{KEH:91} David Keppel, Susan J. Eggers and Robert R. Henry, ``A Case for Runtime Code Generation,'' University of Washington Computer Science and Engineering technical report UWCSE TR 91-11-04, November 1991. Html pointer [Link broken, please e-mail <> to get it fixed.]

[KEH 93]
\bibitem{KEH:93} David Keppel, Susan J. Eggers and Robert R. Henry, ``Evaluating Runtime-Compiled Value-Specific Optimizations,'' University of Washington Computer Science and Engineering technical report UWCSE TR 93-11-02, November 1993. Html pointer [Link broken, please e-mail <> to get it fixed.]

[Killian 94]
\bibitem{Killian:94} Earl Killian, Personal communication, February 1994.

[KKB 98]
%A Alex Klaiber
%A David Keppel
%A Robert Bedichek
%T Method and Apparatus for Correcting Errors in Computer Systems
%D 18 May 1999
%R United States Patent #05905855

Available as of 2000/03 via

[LOS 86]
\bibitem{LOS:86} T. G. Lang, J. T. O'Quin II and R. O. Simpson, ``Threaded Code Interpreter for Object Code,'' IBM Technical Disclosure Bulletin, 28(10), March 1986, pp.~4238-4241.

[Larus 93]
\bibitem{Larus:93} James R. Larus, ``Efficient Program Tracing,'' IEEE Computer 26(5), May 1993, pp.~52-61.

[LB 94]
\bibitem{LB:94} James R. Larus and Thomas Ball, ``Rewriting Executable Files to Measure Program Behavior,'' Software -- Practice and Experience 24(1), February 1994, pp.~197-218.

[LS 95]
\bibitem{LS:95} James R. Larus and Eric Schnarr, ``EEL: Machine-Independent Executable Editing,'' to appear: SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pp. 291-300, June 1995. PostScript(tm) paper, or an alternate site for the same paper.

[Magnusson 93a]
\bibitem{Magnusson:93a} Peter S. Magnusson, ``A Design For Efficient Simulation of a Multiprocessor,'' Proceedings of the First International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), La Jolla, California, January 1993, pp.~69-78. PostScript(tm) paper.

[Magnusson 93b]
\bibitem{Magnusson:93b} Peter S. Magnusson, ``Partial Translation,'' Swedish Institute for Computer Science technical report T93:05, 1993. PostScript(tm) paper.

[MS 94]
\bibitem{MS:94a} Peter S. Magnusson, and David Samuelsson, ``A Compact Intermediate Format for SimICS,'' Swedish Institute of Computer Science technical report R94:17, September 1994. PostScript(tm) paper.

[MW 94]
\bibitem{MW:94} Peter S. Magnusson, and Bengt Werner, ``Some Efficient Techniques for Simulating Memory,'' Swedish Institute of Computer Science technical report R94:16, September 1994. PostScript(tm) paper.

[MW 95]
\bibitem{MW:95} Peter S. Magnusson, and Bengt Werner, ``Efficient Memory Simulation in SimICS''. In 28th International Annual Simulation Symposium, Phoenix, AZ. April 1995.

[Matthews 94]
\bibitem{Matt:94} Clifford T. Matthews, ``680x0 emulation on x86 (ARDI's syn68k used in Executor)'' USENET \code{comp.emulators.misc} posting, 3 November, 1994. plain text document, plain text document.

[May 87]
\bibitem{May:87} Cathy May, ``Mimic: A Fast S/370 Simulator,'' Proceedings of the ACM SIGPLAN 1987 Symposium on Interpreters and Interpretive Techniques; SIGPLAN Notices 22(7), June 1987, pp.~1-13.

[Nielsen 91]
\bibitem{Nielsen:91} Robert D. Nielsen, ``DOS on the Dock,'' NeXTWorld, March/April 1991, pp.~50-51.

[NG 87]
\bibitem{NG:87} David Notkin and William G. Griswold, ``Enhancement through Extension: The Extension Interpreter,'' Proceedings of the ACM SIGPLAN '87 Symposium on Interpreters and Interpretive Techniques, June 1987, pp.~45-55.

[NG 88]
\bibitem{NG:88} David Notkin and William G. Griswold, ``Extension and Software Development,'' Proceedings of the 10th International Conference on Software Engineering, Singapore, April 1988, pp.~274-283.

[Kep 03]
\bibitem{Keppel:03} David Keppel, ``How to Detect Self-Modifying Code During Instruction-Set Simulation'', April, 2003. Available as of 2003/10 from ``'' (papers).

    [PM 94]
    \bibitem{PM:94} Jim Pierce and Trevor Mudge, ``IDtrace -- A Tracing Tool for i486 Simulation,'' Proceedings of the International Workshop on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), January 1994.

    [PRA 97]
    \bibitem{PRA:97} Vijay S. Pai, Parthasarathy Ranganathan, and Sarita V. Adve. "RSIM: An Execution-Driven Simulator for ILP-Based Shared-Memory Multiprocessors and Uniprocessors". In Proceedings of the Third Workshop on Computer Architecture Education. February 1997.

    [Patil 96]
    TITLE = "Efficient {P}rogram {M}onitoring {T}echniques",
    AUTHOR = "Harish Patil",
    INSTITUTION = "Computer Sciences department, University of Wisconsin",
    YEAR = 1996,
    MONTH = "July",
    TYPE = "{TR} 1320: Ph.D. Dissertation",
    ADDRESS = "Madison, Wisconsin",
    [Patil 97]
    TITLE = "Low-cost, Concurrent Checking of Pointer and Array Accesses in
    {C} Programs",
    JOURNAL = "Software - Practice and Experience",
    AUTHOR = "Harish Patil and Charles Fischer",
    VOLUME = 27,
    NUMBER = 1,
    YEAR = 1997,
    MONTH = "January",
    PAGES = "87-110",
    [PM 94]
    \bibitem{PM:94} Jim Pierce and Trevor Mudge, ``IDtrace -- A Tracing Tool for i486 Simulation,'' Technical report 203-94, University of Michigan, March 1994.

    PostScript(tm) paper

    [Pittman 87]
    \bibitem{Pittman:87} Thomas Pittman, ``Two-Level Hybrid Interpreter/Native Code Execution for Combined Space-Time Program Efficiency,'' Proceedings of the 1987 ACM SIGPLAN Symposium on Interpreters and Interpretive Techniques, June 1987, pp.~150-152.

    [Pittman 95]
    \bibitem{Pittman:95} Thomas Pittman, ``The RISC Penalty,'' IEEE Micro, December 1995, pp.~5, 76-80.

    Brief summary by Pardo: The paper analyzes the costs of RISC due to higher (instruction) cache miss rates. Demonstrated by comparing the inner loop code for an interpretive (ddi) processor emulator to the inner loop code for a dynamic cross-compiler. With perfect cache hit ratios, the former would take 61 cycles while the latter would take 18. However, due to cache miss costs, the ``18-cycle'' version took longer to run.
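For readers unfamiliar with the term, the inner loop of a decode-and-dispatch interpreter can be sketched as follows. This is only an illustration on a hypothetical four-opcode accumulator machine; the opcode names, the `run` function, and the `countdown` program are all made up here and are not taken from [Pittman 95]. The point it shows is that every target instruction pays for a fetch, a decode, and a dispatch each time it executes, whereas a dynamic cross-compiler would pay the decode cost once and then run translated host code.

```python
# Hypothetical toy target: each instruction is an (opcode, operand) pair.
LOADI, ADDI, JNZ, HALT = range(4)


def run(program, max_steps=10_000):
    """Interpret `program`; return (accumulator, instructions executed)."""
    acc = 0      # the only target register
    pc = 0       # target program counter (index into `program`)
    steps = 0
    while steps < max_steps:
        opcode, operand = program[pc]   # fetch and decode
        pc += 1
        steps += 1
        # Dispatch: this chain runs once per *executed* instruction,
        # which is the per-instruction overhead of interpretation.
        if opcode == LOADI:
            acc = operand
        elif opcode == ADDI:
            acc += operand
        elif opcode == JNZ:
            if acc != 0:
                pc = operand
        elif opcode == HALT:
            return acc, steps
        else:
            raise ValueError(f"unknown opcode {opcode}")
    raise RuntimeError("step limit exceeded")


# Count the accumulator down from 3 to 0, then halt.
countdown = [(LOADI, 3), (ADDI, -1), (JNZ, 1), (HALT, 0)]
print(run(countdown))
```

The loop body above is small, but it is executed for every target instruction, so its size and cache behavior dominate the cost of the emulator.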

    [RF 94a]
      author="Norman Ramsey and Mary F. Fernandez",
      title="The {New} {Jersey} Machine-Code Toolkit",
      institution="Department of Computer Science, Princeton University",

    PostScript(tm) paper, conference paper

    [RF 94b]
      author="Norman Ramsey and Mary F. Fernandez",
      title="{New} {Jersey} {Machine-Code} {Toolkit} Architecture Specifications",
      institution="Department of Computer Science, Princeton University",

    WWW page, PostScript(tm) paper.

    [RF 94c]
      author="Norman Ramsey and Mary F. Fernandez",
      title="{New} {Jersey} {Machine-Code} {Toolkit} Reference Manual",
      institution="Department of Computer Science, Princeton University",

    WWW page, PostScript(tm) paper.

    [RF 95]
    \bibitem{Ramsey:95} Norman Ramsey and Mary F. Fernandez, ``The {New} {Jersey} Machine-Code Toolkit,'' Proceedings of the Winter 1995 USENIX Conference, New Orleans, Louisiana, January, 1995, pp.~289-302.

    @inproceedings{ramsey:jersey, refereed=1,
      author="Norman Ramsey and Mary F. Fernandez",
      title="The {New} {Jersey} Machine-Code Toolkit",
      booktitle="Proceedings of the 1995 USENIX Technical Conference",
      address="New Orleans, LA",

    [RFD 72]
    \bibitem{RFD:72} E. W. Reigel, U. Faber, D. A. Fisher, ``The Interpreter -- A Microprogrammable Building Block System,'' Spring Joint Computer Conference, 1972, pp.~705-723.

    [RHLLLW 93]
    \bibitem{RHLLLW:93} Steven K. Reinhardt, Mark D. Hill, James R. Larus, Alvy. R. Lebeck, J. C. Lewis and David A. Wood, ``The Wisconsin Wind Tunnel: Virtual Prototyping of Parallel Computers,'' Proceedings of the 1993 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, June 1993, pp.~48-60.

    [Reuter 8X]
    \bibitem{Reuter:8X} Jim Reuter, ``Decomp,'' circa 1985. source code available from `', and sample inputs available from `'.

    [RHWG 95]
    \bibitem{RHWG:95} Mendel Rosenblum, Stephen A. Herrod, Emmett Witchel and Anoop Gupta, ``Complete Computer System Simulation: The SimOS Approach,'' IEEE Parallel and Distributed Technology: Systems and Applications, 3(4):34-43, Winter 1995.

    abstract, Compressed PostScript® (57 KB).

    [RW 94]
    \bibitem{RW:94} Mendel Rosenblum and Emmett Witchel, ``SimOS: A Platform for Complete Workload Studies,'' Personal communication (submitted for publication), November 1994.

    [SW 79]
    \bibitem{SW:79} H. J. Saal and Z. Weiss, ``A Software High Performance APL Interpreter,'' APL Quote Quad 9(4), June 1979, pp.~74-81.

    [Samuelsson 94]
    \bibitem{Samuelsson:94} David Samuelsson ``System Level Interpretation of the SPARC V8 Instruction Set Architecture,'' Research report 94:23, Swedish Institute of Computer Science, 1994.

    [Sathaye 94]
    \bibitem{Sath:94} Sumedh W. Sathaye, ``Mime: A Tool for Random Emulation and Feedback Trace Collection,'' Masters thesis, Department of Electrical and Computer Engineering, University of South Carolina, Columbia, South Carolina, 1994.

    [SE 93]
    \bibitem{SE:93} Gabriel M. Silberman and Kemal Ebcio\u{g}lu ``An Architectural Framework for Supporting Heterogeneous Instruction-Set Architectures,'' IEEE Computer, June 1993, pp.~39-56.

    [SCKMR 92]
    \bibitem{SCKMR:92} Richard L. Sites, Anton Chernoff, Matthew B. Kirk, Maurice P. Marks and Scott G. Robinson, ``Binary Translation,'' Digital Technical Journal Vol. 4 No. 4 Special Issue 1992. Html paper, PostScript(tm) paper.

    [SCKMR 93]
    \bibitem{SCKMR:93} Richard L. Sites, Anton Chernoff, Matthew B. Kirk, Maurice P. Marks and Scott G. Robinson, ``Binary Translation,'' Communications of The ACM (CACM) 36(2), February 1993, pp.~69-81.

    [Smith 91]
    \bibitem{Smith:91} M. D. Smith, ``Tracing With Pixie,'' Technical Report CSL-TR-91-497, Stanford University, Computer Systems Laboratory, November 1991. PostScript(tm).

    [Sosic 92]
    \bibitem{Sosic:92} Rok Sosi\v{c}, ``Dynascope: A Tool for Program Directing,'' Proceedings of the 1992 ACM Conference on Programming Language Design and Implementation (PLDI), June 1992, pp.~12-21.

    [Sosic 94]
    \bibitem{Sosic:94} Rok Sosi\v{c}, ``Design and Implementation of Dynascope, a Directing Platform for Compiled Programs,'' technical report CIT-94-7, School of Computing and Information Technology, Griffith University, 1994.

    [Sosic 94b]
    \bibitem{Sosic:94b} Rok Sosi\v{c}, ``The Dynascope Directing Server: Design and Implementation,'' Computing Systems, 8(2): 107-134, Spring 1994

    [SE 94a]
    \bibitem{SE:94a} Amitabh Srivastava, and Alan Eustace, ``ATOM: A System for Building Customized Program Analysis Tools,'' Research Report 94/2, March 1994, Digital Equipment Corporation Western Research Laboratory, March 1994.

    [SE 94b]
    \bibitem{SE:94b} Amitabh Srivastava, and Alan Eustace, ``ATOM: A System for Building Customized Program Analysis Tools,'' Proceedings of the 1994 ACM Conference on Programming Language Design and Implementation (PLDI), June 1994, pp.~196-205.

    [SW 92]
    \bibitem{SW:92} Amitabh Srivastava, and David W. Wall, ``A Practical System for Intermodule Code Optimization at Link-Time,'' Research Report 92/6, Digital Equipment Corporation Western Research Laboratory, December 1992.

    [SW 93]
    \bibitem{SW:93} Amitabh Srivastava, and David W. Wall, ``A Practical System for Intermodule Code Optimization at Link-Time,'' Journal of Programming Languages, March 1993.

    [SF 89]
    \bibitem{SF:89} Craig B. Stunkel and W. Kent Fuchs, ``{\sc Trapeds}: Producing Traces for Multicomputers via Execution Driven Simulation,'' ACM Performance Evaluation Review, May 1989, pp.~70-78.

    [SJF 91]
    \bibitem{SJF:91} Craig B. Stunkel, Bob Janssens and W. Kent Fuchs, ``Address Tracing for Parallel Machines,'' IEEE Computer 24(1), January 1991, pp.~31-38.

    [Tucker 65]

    %A S. G. Tucker
    %T Emulation of Large Systems
    %J Communications of the ACM (CACM)
    %V 8
    %N 12
    %D December 1965
    %P 753-761

    ABSTRACT: The conversion problem and a new technique called emulation are discussed. The technique of emulation is developed and includes sections on both the Central Processing Unit (CPU) and the Input/Output (I/O) unit. This general treatment is followed by three sections that describe in greater detail the implementation of compatibility features using the emulation techniques for the IBM 7074, 7080 and 7090 systems on the IBM System/360.

    Cited by [Wilkes 69].

    Pardo has a copy.

    [UNMS 94a]
    \bibitem{UNMS:94a} Richard Uhlig, David Nagle, Trevor Mudge and Stuart Sechrest, ``Tapeworm~II: A New Method for Measuring OS Effects on Memory Architecture Performance,'' Technical Report, University of Michigan, Electrical Engineering and Computer Science Department, May 1994.

    [UNMS 94b]
    \bibitem{UNMS:94b} Richard Uhlig, David Nagle, Trevor Mudge and Stuart Sechrest, ``Trap-driven Simulation with Tapeworm II,'' Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VI), San Jose, California, October 5-7, 1994.

    [Veenstra 93]
    \bibitem{Veenstra:93} Jack E. Veenstra, ``Mint Tutorial and User Manual,'' University of Rochester Computer Science Department, technical report 452, May 1993.

    [VF 94]
    \bibitem{VF:94} Jack E. Veenstra and Robert J. Fowler, ``{\sc Mint}: A Front End for Efficient Simulation of Shared-Memory Multiprocessors,'' Proceedings of the Second International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), January 1994, pp.~201-207.

    [Wilkes 69]
    \bibitem{Wilkes:69} Maurice V. Wilkes ``The Growth of Interest in Microprogramming: A Literature Survey,'' Computing Surveys 1(3):139-145, September 1969.


    [Wilner 72]
    \bibitem{Wilner:72} W. T. Wilner, ``Design of the Burroughs B1700,'' Fall Joint Computer Conference, 1972, pp.~489-497.


    [WK 99]
    %A Malcolm J. Wing
    %A Edmund C. Kelly
    %T Method And Apparatus for Aliasing Memory Data In An Advanced Microprocessor
    %D 20 July 1999
    %R United States Patent #05926832
    Available as of 2000/03 via

    Comments on Some Of The Papers

    Who's Who

    This really isn't who's who -- it's a list of some of the people who were easy to find and who seem to spend a lot of their time doing simulation and tracing.

    The Lists

    A list of names of some people doing simulation and tracing.

    Details About Who's Who

    See here for a list of names.

    Anant Agarwal


    Kristy Andrews

    Kristy Andrews (``'' as of 1999/09) is a co-developer of Accelerator.

    Thomas Ball

    Thomas Ball (` as of 1999) works on improving software production through domain-specific languages, automated program analysis, and software visualization. He has helped build tools such as qp/qpt and the Hot Path Browser.

    See also The Twelve Days of Christmas, Reverse-Engineered.

    Robert A. Baumann


    Robert Bedichek

    Robert Bedichek (`' as of 1999/07), wrote the g88 simulator while at Tektronix, Talisman while at the University of Washington, and T2 while at MIT.

    Robert Bedichek is interested in computer architecture and operating systems and has built Meerkat, a modestly-scalable multiple-processor machine. The lack of good systems analysis tools, however, keeps driving him back to tool-building.

    Anita Borg


    Bob Boothe


    Eric A. Brewer


    Steve Chamberlain

    Steve Chamberlain (`sac * pobox ; com', as of 1999/07) has written a series of amazing virtual machines including SoftPC and the GNU Simulators. He has also done a lot of work on BFD, GAS, GCC, GLD, etc. for a wide variety of machines.

    Lee Kiat Chia

    Lee Kiat Chia, ( as of 1995/06) is part of Purdue's Binary Emulation and Translation group.

    Cristina Cifuentes

    Cristina Cifuentes (``'' or ``'' both as of 1998) has studied decompilation extensively and wrote dcc. Cristina was previously at UTAS (here, `' as of 1994).

    Bob Cmelik

    Bob Cmelik (`' as of 1995/03 [Link broken, please e-mail <> to get it fixed.]), wrote the Spix static instrumentation tools and the Shade simulation and tracing tool while at Sun Microsystems, and helped to design and implement Crusoe at Transmeta.

    Thomas M. Conte

    Thomas M. Conte ( as of 2001/08/31) is one of the editors of [Conte & Gimarc 95].

    Don Eastlake

    Don Eastlake ( as of July 1995) wrote the instruction execution engine of 11SIM.

    Alan Eustace

    Alan Eustace (`' as of 1994) worked with Amitabh Srivastava to develop ATOM.

    U. Faber


    D. A. Fisher


    Richard M. Fujimoto

    Richard M. Fujimoto (`', as of 1994) has worked on several simulators, including dis+mod+run, Simon, and a variety of time-warp simulation systems.

    Torbjorn Granlund

    Torbjorn Granlund (`', as of 1994) has worked on simulators both at the Swedish Institute for Computer Science and at Cygnus.

    Note: the second ``o'' in ``Torbjorn'' should have an umlaut over it, but so far no umlaut appears here.

    Bill Haygood

    Bill Haygood ( as of July 1999) wrote portable PDP-8, Z-80, and LSI-11 simulators. His home page contains a short writeup [Haygood 1999] on computation/space tradeoffs (e.g., lookup tables for condition codes).
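As an illustration of the kind of computation/space tradeoff mentioned above, the sketch below precomputes condition codes for every possible 8-bit result, so that the simulator's hot path does one table lookup instead of several tests. It is an assumption-laden example, not code from [Haygood 1999]: the flag bit assignments (`FLAG_Z`, `FLAG_N`, `FLAG_P`) and the function names are hypothetical, and a real target defines its own flag layout.

```python
# Hypothetical flag bit assignments; a real target defines its own layout.
FLAG_Z = 0x01   # result is zero
FLAG_N = 0x02   # result is negative (bit 7 set)
FLAG_P = 0x04   # result has even parity


def compute_flags(result):
    """Compute the flags for an 8-bit result the slow way."""
    result &= 0xFF
    flags = 0
    if result == 0:
        flags |= FLAG_Z
    if result & 0x80:
        flags |= FLAG_N
    if bin(result).count("1") % 2 == 0:
        flags |= FLAG_P
    return flags


# Space spent once: 256 entries, one per possible 8-bit result.
FLAG_TABLE = [compute_flags(value) for value in range(256)]


def flags_for(result):
    """Time saved on every simulated instruction: a single lookup."""
    return FLAG_TABLE[result & 0xFF]
```

The table costs 256 entries of memory; whether that wins in practice depends on how the table interacts with the host's data cache, which is the tradeoff the writeup is about.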

    Tom R. Halfhill

    Tom R. Halfhill ( and as of March 2000) writes for Microprocessor Report and before that wrote for Byte and other technology magazines. He has been watching and writing about emulation for quite a while. Articles include [Halfhill 94], [Halfhill 94b], and [Halfhill 00].

    Steve Herrod

    Steve Herrod ( or as of January 2002) has been involved with Tango Lite; a paper called ``Memory System Performance of UNIX on CC-NUMA Multiprocessors'', a hardware, trace-based evaluation of IRIX on the Stanford DASH multiprocessor; SimOS; the Crusoe processor; and VMWare.

    Mark Horowitz


    Tor Jeremiassen

    (`tor*ti;com' as of 2003/10).

    R. E. Kessler


    James R. Larus

    James R. Larus (`larus * microsoft ; com' as of 2003/11) specializes in compiler- and architecture-related projects and has worked on EEL, SPIM, qp/qpt and WWT.

    Georgia Lazana


    Peter S. Magnusson

    Peter Magnusson (`psm * virtutech ; com' as of 2003/10) built SimICS and its predecessor, gsim while at the Swedish Institute for Computer Science. As of 2003/10 he is president and CEO of Virtutech.

    Cathy May

    Cathy May (may * watson ; ibm ; com) is author of Mimic, which performed dynamic translation of groups of blocks of target code to groups of blocks of host code.

    Vijay S. Pai

    Vijay S. Pai (vijaypai * rice ; edu as of 2003/11) was coauthor of RSIM at Rice.


    Pardo (`pardo * xsim ; com' as of 1999/03) helped with the design and implementation of MPtrace and the design of Shade, both while at the University of Washington. He was an original Crusoe architect and implementor.

    Pardo is most infamous for his shameless promotion of Run-Time Code Generation (also known as self-modifying code), and he also suffers from interests in compilers, computer architecture, operating systems, performance analysis, and a bunch of other stuff.

    Russell Quong

    Russell W. Quong (at Sun Microsystems as of 2002/10) directed Purdue's Binary Emulation and Translation group and also built very-large workload simulators at Sun.

    Norman Ramsey

    Norman Ramsey (`norman*eecs;purdue;edu' as of 2003) spends a lot of time trying to solve portability problems and is responsible for the New Jersey Machine Code Toolkit. He also has an ongoing interest in debuggers, interpreters, linkers, and so on.

    E. W. Reigel


    Steven K. Reinhardt

    Steven K. Reinhardt (`' as of 1994) spends a lot of time simulating multiple-processor machines. He's spent a lot of time working on WWT.

    Mendel Rosenblum

    Mendel Rosenblum (`' as of 1999, also probably `') has spent a lot of time both simulating multiple-processor machines and, lately, at VMWare, simulating uniprocessor nested virtual machines.

    Duane Sand

    Duane Sand (``'', as of 1999/09) designed and helped write Accelerator, used to migrate Tandem's application base and OS from their proprietary processor to a MIPS-based processor.

    Richard L. Sites


    M. D. Smith

    Michael D. Smith ( as of 1999/08) works on computer architectures and compilation for those architectures. Instruction-Set Simulator and tracing tool papers include Pixie.

    Rok Sosic

    Rok Sosic ( as of 1995/09) wrote Dynascope and Dynascope-II. Note: The `c' in Rok's name should have a `v'-shaped accent over it, but HTML doesn't seem to have that accent.

    Amitabh Srivastava

    Amitabh Srivastava (`' as of 1994) worked with David W. Wall to develop OM and with Alan Eustace to develop ATOM.

    Richard M. Stallman

    Richard M. Stallman ( as of July 1995) wrote the device emulation engine of 11SIM.

    Thai Wey Then

    Thai Wey Then (at Purdue as of 1995/06) is part of Purdue's Binary Emulation and Translation group.

    David Wall

    David Wall ( as of 95/08) has worked on several compiler tools that operate at or near link time, including Titan tracing and OM.

    Maurice V. Wilkes

    Maurice V. Wilkes is generally considered the inventor of microcode. Wilkes cites various authors who've proposed or used microcode to implement high-performance emulators.

    Wilkes is also one of the ``grandparents'' of computing. He was around the day that EDSAC became the world's first operational general-purpose programmable computer. He is credited with saying that they ``discovered'' debugging that very same day while attempting to execute a simple program for generating a table of prime numbers (see ``The Multics System'' by Elliot I. Organick, The MIT Press 1972, pg. 127).

    Emmett Witchel

    Emmett Witchel (`' as of 1995, `' as of 1994) worked on SimOS.

    Marinos "nino" Yannikos

    Marinos "nino" Yannikos ( is the author of STonX and helped with this web page.

    See Also: Related Work

    A variety of work that seems relevant and isn't folded in elsewhere. See also more general information discovery sites

    To Do: Work In Progress

    Selected Recent Changes


    This page was prepared by Pardo, based in large part on the SIGMETRICS '94 Shade paper, and thus with help from Bob Cmelik and the other people who helped with the Shade paper.

    The magic script that splits a single large HTML source file into various-size pages, for ease of browsing, was written by Marinos "nino" Yannikos.

    Many individual entries in the TODO section are contributions by readers too numerous to list. Many thanks, all!

    Copyright (c) 1999 by Pardo. All rights reserved.

    Please address comments and suggestions to `'.