  • Find and incorporate
    ssim: A Superscalar Simulator
    Mike Johnson
    M. D. Smith
    Stanford Univ.
    Pixie front ends in

  • Find and incorporate (based on work with `pixie' and `ssim', may tell about them?)
    Johnson, Mike: Superscalar microprocessor design
    Englewood Cliffs, NJ : Prentice Hall, 1991. - XXIV, 288 S. :
    graph. Darst.
    (Prentice-Hall series in innovative technology) Literaturverz. S. 273 - 278
    ISBN 0-13-875634-1

  • Find and incorporate (mostly results of tracing, but may discuss simulation and tracing):
      author =       {Jerome C. Huck and Michael J. Flynn},
      title =        {Analyzing Computer Architectures},
      publisher =    {IEEE Computer Society Press},
      year =         1989,
      address =      {Washington, DC}

  • Find out more about Robert Bedichek's T2 simulator.

  • Yaze Z80 and CP/M emulator. more info and source code.

  • WinDLX, MSWindows GUI for DLX. Also include information about DLX from [Hennessy & Patterson 93]

  • UAE Commodore Amiga hardware emulator (incomplete).

  • DEC FX!32 binary translation/emulation system for running Microsoft Windows applications.

  • Find and incorporate
    %A Max Copperman
    %A Jeff Thomas
    %T Poor Man's Watchpoints
    %J ACM SIGPLAN Notices
    %V 30
    %N 1
    %D January 1995
    %P 37-44
    Pardo has a copy. Executive summary: debugging tool; statically patches loads and stores with code to check for data breakpoints.

    Amusing story: The processor they were running on has load delay slots and does not have pipeline interlocks. Their tool replaces each load or store with several instructions; it patched a piece of user-mode code of the form

        load addr -> r5
        store r5 -> addr2

    Before patching, the store sat in the load delay slot, so the code saved the old value of r5 to addr2. After patching, extra instructions separated the load from the store, so it saved the newly loaded value. Technically, this code was broken already, because the same symptom could also have been caused by an interrupt or exception between the load and the store.
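    The effect can be reproduced with a minimal sketch of a machine that has one load delay slot and no interlocks. Everything here is an assumption made for illustration: the toy instruction tuples, the `run` function, and the use of a `nop` to stand in for the checking code the tool inserts. It is not the tool's actual patch sequence.

```python
def run(program, memory, regs):
    """Run a toy program on a machine with one load delay slot, no interlocks.

    A loaded value reaches its register only after the *next* instruction
    has executed, so that instruction still sees the old register value.
    """
    memory, regs = dict(memory), dict(regs)
    pending = None                      # (register, value) still in flight
    for op, src, dst in program:
        landing, pending = pending, None
        if op == "load":                # load src -> dst
            pending = (dst, memory[src])
        elif op == "store":             # store src -> dst
            memory[dst] = regs[src]
        elif op != "nop":
            raise ValueError("unknown op: " + op)
        if landing is not None:         # previous load completes now
            regs[landing[0]] = landing[1]
    if pending is not None:             # a load issued by the last instruction
        regs[pending[0]] = pending[1]
    return memory, regs


MEMORY = {"addr": 111, "addr2": 0}
REGS = {"r5": 999}

# Original code: the store sits in the load delay slot.
ORIGINAL = [("load", "addr", "r5"), ("store", "r5", "addr2")]

# Patched code: a nop stands in for the inserted watchpoint check.
PATCHED = [("load", "addr", "r5"), ("nop", None, None),
           ("store", "r5", "addr2")]
```

    Running `ORIGINAL` stores the old r5 (999) to addr2, while running `PATCHED` stores the loaded value (111), which is the behavior change described above.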

  • Find and incorporate information about Spike. Referenced in [Conte & Gimarc 95], Tom Conte says (paraphrased):

    ``Spike was built inside GNU GCC by Michael Golden and myself. It includes a lot of features that have appeared in ATOM, including the simulator with the benchmark into a single ``self-tracing'' binary. The instruction trace was based on an abstract machine model distilled from GCC's RTL; it had both a high-level and a low-level form. Spike is still in occasional use, but has never been released.''

  • Find and incorporate information about Reiser & Skudlarek's paper "Program Profiling Problems, and a Solution via Machine Language Rewriting", from ACM SIGPLAN Notices, V29, #1, January 1994. Pardo has a copy.

    Basic summary: Wanted to profile. -p/-pg code is larger and slower by enough to make it hard to justify profiling as the default. Assumes the entire source is available. For these and other reasons, wrote jprof, which operates by disassembly, analysis and rewriting. Discusses sampling errors, expected accuracy, stability, randomness, etc. Describes jprof: counters and stopwatches; subroutine call graph. Domain/OS on HP/Apollo using 68030. Discusses shared libraries. Can also use page-fault clock. 4-microsecond clocks. Some lessons/observations. Doesn't explain how program running time is affected by jprof.

  • Design tradeoffs between various 68k implementations (comp.arch posting).

  • More on decompilation of PC executables

  • Update the reference for Alvin R. "Alvy" Lebeck.

  • Review and include FX!32. March 5 1996 Microprocessor Report. Jim Turley, "Alpha Runs x86 Code with FX!32".

    Summary: DEC is running Win32 application binaries on Alpha by a new combination of interpreter and static translator. The static translator runs in the background, between the first and second executions of the application. It uses info collected by the interpreter during the 1st run, to reliably distinguish active code paths from r/o data and work out the effects of indirect jumps. Static analysis can't do this automatically on its own, for typical x86 binaries.
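    The interplay of interpreter and background translator can be illustrated with a toy sketch. All of it is assumed for illustration: the four-opcode machine, the `encode`, `interpret`, `translate` and `run_translated` names, and the profile layout. None of it reflects FX!32's real data structures or the x86 instruction set. The point is only that a run-time profile marks which words are code and where indirect jumps land, which a static scan of the image cannot tell.

```python
HALT, LOADI, ADDI, JMP_IND = 0, 1, 2, 3   # toy opcodes, not x86


def encode(op, arg=0):
    """Pack an instruction into one word; data words use the same space."""
    return op * 256 + arg


def step(image, pc, acc, op, arg):
    """Execute one decoded instruction; next pc is None after HALT."""
    if op == HALT:
        return None, acc
    if op == LOADI:
        return pc + 1, arg
    if op == ADDI:
        return pc + 1, acc + arg
    if op == JMP_IND:                     # jump through a table word
        return image[arg], acc
    raise ValueError("bad opcode at %d" % pc)


def interpret(image, profile):
    """First run: interpret, recording executed addresses and jump targets."""
    pc, acc = 0, 0
    while pc is not None:
        profile["code"].add(pc)
        op, arg = divmod(image[pc], 256)
        next_pc, acc = step(image, pc, acc, op, arg)
        if op == JMP_IND:
            profile["targets"].setdefault(pc, set()).add(next_pc)
        pc = next_pc
    return acc


def translate(image, profile):
    """Between runs: pre-decode only the words the profile proved are code."""
    code = set(profile["code"])
    for targets in profile["targets"].values():
        code |= targets
    return {addr: divmod(image[addr], 256) for addr in code}


def run_translated(image, translated):
    """Second run: use translations, decoding on the fly for unseen paths."""
    pc, acc, fallbacks = 0, 0, 0
    while pc is not None:
        if pc in translated:
            op, arg = translated[pc]
        else:
            fallbacks += 1
            op, arg = divmod(image[pc], 256)
        pc, acc = step(image, pc, acc, op, arg)
    return acc, fallbacks


IMAGE = [
    encode(LOADI, 5),     # 0: code
    encode(JMP_IND, 2),   # 1: code, jumps through the word at address 2
    4,                    # 2: read-only jump-table entry (decodes like HALT)
    encode(ADDI, 100),    # 3: never executed; could be data or dead code
    encode(ADDI, 7),      # 4: code, reached only via the indirect jump
    encode(HALT),         # 5: code
]
```

    After one interpreted run, the profile contains addresses 0, 1, 4 and 5 as code and records that the jump at address 1 went to address 4; the words at addresses 2 and 3 are left untranslated.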

  • Add info about Doug Kwan (author of YAE, an Apple ][ emulator) to "Who's who" section. Nino says: only freely available dynamic recompilation. (Dynamic recompilation for SPARC and MIPS). Information forwarded by Marinos Yannikos.

  • Find, read, and incorporate decompilation info (also cites a program verification dissertation):
    %A P. J. Brown
    %T Re-creation of Source Code from Reverse Polish Form
    %J Software \- Practice & Experience
    %V 2
    %N 3
    %P 275-278
    %D 1972
    Note: there's a slightly later SPE that has a follow-up article explaining how to do it faster/more efficiently.

  • From: (Frans F.J. Faase)
    Newsgroups: comp.compilers
    Subject: Re: Need decompiler for veryyy old code....
    Date: 29 Apr 1996 23:11:51 -0400
    Organization: University of Twente, Dept. of Computer Science
    Lines: 29
    Message-ID: <96-04-144@comp.compilers>
    References: <96-04-110@comp.compilers>
    Keywords: disassemble, IBM

    > Currently I am undertaking to modify some very old IBM code (at least
    > 20 years old. I believe that the code is either Assembler or Cobol.

    I do not know whether the following is of use for you, but I do maintain a WWW page about decompilation, which has some links to other resources as well. Maybe, you should contact Martin Ward or Tim Bull.

    Frans
    (P.S. Email to <> bounced with 451 error)

    --
    Frans J. Faase
    Information Systems Group, Department of Computer Science, University of Twente
    PO box 217, 7500 AE Enschede, The Netherlands
    Tel: +31-53-4894232   secr.: +31-53-4893690   Fax: +31-53-4892927

  • A Java runtime, which generates native code at runtime: Softway's Guava. Info from Jeremy Fitzhardinge.

  • Find, read, and summarize the following:
    %A Ariel Pashtan
    %T A Prolog Implementation of an Instruction-Level Processor Simulator
    %J Software \- Practice and Experience
    %V 17
    %N 5
    %P 309-318
    %D May 1987

  • Find, read and summarize "Augmint". According to Anthony-Trung Nguyen, it is based on MINT, understands the x86 instruction set, and runs on Intel x86 boxes with UNIX (Linux, Unixware, etc.) or Windows NT. It is described further on the web, and there was a paper in ICCD-96.

  • Find, read and summarize "Etch". Etch is an x86 Windows/NT tool for annotating x86 binaries, without source code.
  • Find, read and summarize "Etch".
    From: (Brad Chen)
    Newsgroups: comp.arch
    Subject: Windows x86 Address Traces Available
    Date: 7 Oct 1996 22:20:30 GMT
    Organization: Harvard University EECS
    Lines: 15
    Message-ID: <53bvne$>
    Keywords: Windows x86 address traces

    A collection of x86 memory reference traces from Win32 applications are now available from the following URL: The collection includes traces from both commercial and public-domain applications. The collection currently includes:

     - Perl
     - MPeg Play
     - Borland C++
     - Microsoft Visual C
     - Microsoft Word
    These traces were created using Etch, an instrumentation and optimization tool for Win32 executables. For more information on Etch see the above URL.


  • Add information on iprof. Here's a summary from Peter Kuhn:
    Peter Kuhn                                    voice: +49-89-289-23092
    Institute for Integrated Circuits (LIS)       fax1:  +49-89-289-28323
    Technical University of Munich                fax2:  +49-89-289-25304
    Arcisstr. 21, D-80290 Munich, Germany 
    The operation is: With gcc/g++ option -a (above version 2.6.3) you can produce a basic block statistics file (bb.out), which contains the number of times each basic block of the program is accessed during runtime. iprof processes this basic block statistics file and accesses the program's executable to summarize the machine instructions used for each basic block. So iprof doesn't make any modifications to gcc/g++ and is easily portable among gcc/g++ supported architectures. Currently binaries for LINUX 486, Pentium and Sparc Solaris are provided; ports to other architectures are straightforward.
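    The core bookkeeping described here, weighting each basic block's instructions by its execution count, can be sketched as follows. The function name `instruction_mix`, the block identifiers and the two dictionaries are assumptions for illustration; the sketch does not parse a real bb.out file or disassemble an executable, and it is not iprof's actual code.

```python
from collections import Counter


def instruction_mix(block_counts, block_instructions):
    """Combine per-block execution counts with per-block instruction lists.

    block_counts:       block id -> times the block ran (as from bb.out)
    block_instructions: block id -> mnemonics in the block (as from
                        disassembling the executable)
    Returns a Counter of mnemonic -> dynamic execution count.
    """
    mix = Counter()
    for block, count in block_counts.items():
        for mnemonic in block_instructions[block]:
            mix[mnemonic] += count
    return mix


# Hypothetical data: an entry block run once and a loop body run ten times.
COUNTS = {"B0": 1, "B1": 10}
INSTRUCTIONS = {
    "B0": ["mov", "cmp", "jne"],
    "B1": ["add", "cmp", "jne"],
}
```

    With the sample data, `instruction_mix(COUNTS, INSTRUCTIONS)` attributes 11 executions each to `cmp` and `jne`, 10 to `add` and 1 to `mov`.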

  • There are many ways to measure slowdown. Each has certain benefits and each has shortcomings. (I forget the details, but I'd definitely check out some of the early SimICS papers for a discussion of running times; Peter has more to say.)

  • Find and incorporate Harish Patil's dissertation on ``efficient program monitoring''. See the TR.
    From: Harish Patil 
    Newsgroups: comp.compilers
    Subject: Thesis available: Program Monitoring
    Date: 29 Jan 1997 11:21:02 -0500
    Organization: Compilers Central
    Lines: 59
    Message-ID: <97-01-223@comp.compilers>
    Reply-To: Harish Patil 
    Keywords: report, available, performance
    Hello everyone:
     I am glad to announce that my Ph.D. thesis, titled "Efficient Program
     Monitoring Techniques", is available on-line. This thesis was
     completed under the supervision of Prof. Charles Fischer at the
     department of Computer Sciences, University of Wisconsin --Madison.
     The thesis is available as technical report # 1320. Please check it
     out at the URL:
     An abstract of the thesis follows.

    Efficient Program Monitoring Techniques

    Programs need to be monitored for many reasons, including performance
    evaluation, correctness checking, and security. However, the cost of
    monitoring programs can be very high. This thesis contributes two
    techniques for reducing the high execution time overhead of program
    monitoring: 1) customization and 2) shadow processing. These
    techniques have been tested using a memory access monitoring system
    for C programs.
    "Customization" reduces the cost of monitoring programs by decoupling
    monitoring from original computation. A user program can be customized
    for any desired monitoring activity by deleting computation not
    relevant for monitoring. The customized program is smaller, easier to
    analyze, and almost always faster than the original program. It can be
    readily instrumented to perform the desired monitoring. We have
    explored the use of program slicing technology for customizing C
    programs. Customization can cut the overhead of memory access
    monitoring by up to half.
    "Shadow processing" hides the cost of on-line monitoring by using idle
    processors in multiprocessor workstations. A user program is
    partitioned into two run-time processes. One is the main process
    executing as usual, without any monitoring code. The other is a shadow
    process following the main process and performing the desired
    monitoring. One key issue in the use of shadow process is the degree
    to which the main process is burdened by the need to synchronize and
    communicate with the shadow process. We believe the overhead to the
    main process must be very modest to allow routine use of shadow
    processing for heavily-used production programs. We therefore limit
    the interaction between the two processes to communicating certain
    irreproducible values. In our experimental shadow processing system
    for memory access checking the overhead to the main process is very
    low - almost always less than 10%.  Further, since the shadow process
    avoids repeating some of the computations from the main program, it
    runs much faster than a single process performing both the computation
    and monitoring.
    Harish Patil:  Massachusetts Language Lab - Hewlett Packard
    Mail Stop CHR02DC, 300 Apollo Drive, Chelmsford MA 01824
    Phone: 508 436 5717  Fax: 508 436 5135
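
    The shadow-processing idea in the abstract can be illustrated with a minimal sketch. It is only an assumption-laden toy: the names `main_process` and `shadow_process`, the 4-slot buffer, and the in-process `deque` used as the channel are all invented here, and the two functions run one after the other rather than on separate processors as in the thesis. The main run does no checking and only forwards the one irreproducible value (its input); the shadow replays the address computation from that value and performs the bounds checks.

```python
from collections import deque

BUF_SIZE = 4          # hypothetical buffer size
MEMORY_SIZE = 8       # buffer plus neighboring memory it can overrun


def main_process(read_input, channel):
    """Unmonitored run: the only extra work is forwarding the input."""
    memory = [0] * MEMORY_SIZE
    n = read_input()              # irreproducible: comes from outside
    channel.append(n)             # communicate it to the shadow
    for i in range(n):
        memory[i] = i + 1         # unchecked store, may overrun the buffer
    return memory


def shadow_process(channel):
    """Shadow run: repeat only the address computation and check it."""
    n = channel.popleft()         # same input the main process saw
    violations = []
    for i in range(n):
        if not 0 <= i < BUF_SIZE: # memory access check done off the main path
            violations.append(i)
    return violations
```

    For an input of 6, the main run silently writes past the buffer into slots 4 and 5, and the shadow reports exactly those two indices without the main run having executed any checking code.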

    From instruction-set simulation and tracing