Whos-Pardo

To Do: Work In Progress

Other tracing work by the Computer Architecture folks at the University of Washington.
A history of computers, which includes information on simulators.
Larus's ``AE'' (abstract execution), including James R. Larus, ``Abstract Execution: A Technique for Efficiently Tracing Programs'', Software--Practice & Experience, 20(12):1241-1258, December 1990. UW CS TR 912.
Pure Software, United States Patent 5,193,180 March 1993.
Robert Wahbe, Steven Lucco, Thomas E. Anderson and Susan L. Graham ``Efficient Software-Based Fault Isolation,'' Proceedings of the Fourteenth ACM Symposium on Operating System Principles (SOSP), December 1993, pp.~203-216. PostScript(tm) paper
Robert Wahbe, Steven Lucco and Susan L. Graham, ``Practical Data Breakpoints: Design and Implementation,'' In Proceedings of the ACM SIGPLAN 1993 Conference on Programming Language Design and Implementation. SIGPLAN, ACM, 1993. Sigplan Notices, Volume 28, Number 6. PostScript(tm) paper
Robert Wahbe. Efficient Data Breakpoints. In ASPLOS V. ACM, 1992. PostScript(tm) paper
Mark Weiser, ``Program Slicing,'' IEEE Transactions on Software Engineering, SE-10(4):352-357, July 1984.
The glossary should have a ``short list'', and should define runtime, dynamic, speculation and rollback.
STonX not integrated.
Apple Macintosh emulators not integrated.
Apple II emulators not integrated.
DEC PDP-8 emulators not integrated.
DEC PDP-11 emulators not integrated.
SimCPM (see the bottom of the page) isn't listed.
xtrs isn't integrated.
MSX emulators aren't integrated.
Sinclair ZX Spectrum Emulators aren't integrated.
VAX-11 RSX Emulator isn't integrated.
Wabi isn't integrated.
8051 Emulators aren't integrated.
Emulator for disk drive processors
AXP simulators including Mannequin, ISP, AUD, and AUDI. Also in PostScript(tm)
Simulation of the DEC NVAX: text and PostScript(tm)
Intel 80960 Emulators. (this is just a list of mfg., but lists a few emulator mfg.)
Wine is not integrated.
DOSEMU (from Linux) isn't listed.
The section Implementation: Simulation Technology needs to be rewritten to make orthogonal static/dynamic and level of IR.
Need an entry for \bibitem{Rosin:69} R. F. Rosin, ``Contemporary Concepts of Microprogramming and Emulation,'' Computing Surveys, Vol. 1, No. 4, Dec. 1969, pp. 197-212. Cited by [RFD 72].
Need an entry for \bibitem{Tucker:65} S. G. Tucker, ``Emulation of large systems,'' Comm. ACM 8, 12 (Dec. 1965), 753-761. Cited by [Wilkes 69].
Need an entry for \bibitem{McCormack:65} M. A. McCormack, T. T. Schansman, and K. K. Womack, ``1401 compatibility feature on the IBM system/360 model 30,'' Comm. ACM 8, 12 (Dec. 1965), 773-776. Cited by [Wilkes 69].
Need an entry for \bibitem{Benjamin:65} R. I. Benjamin, ``The Spectra 70/45 emulator for the RCA 301,'' Comm. ACM 8, 12 (Dec. 1965), 748-752. Cited by [Wilkes 69].
Need an entry for \bibitem{Green:66} J. Green, ``Microprogramming, emulators and programming languages,'' Comm. ACM 9, 3 (Mar. 1966), 230-232. Cited by [Wilkes 69] and [RFD 72].
Need an entry for \bibitem{CN:66} C. R. Campbell, and D. A. Neilson, ``Microprogramming the Spectra 70/35,'' Datamation 12, 9 (1966), 64. Cited by [Wilkes 69].
[Wilkes 69] isn't yet incorporated fully into this document.
Need to merge interpreter types (e.g. ddi, tci, etc.) into glossary.
Cons up ``Who's Who'' entries for all the authors.
Need to read/incorporate Computer Structures: Readings and Examples, by Bell and Newell, especially Weber's paper comparing native issue rates of microcode and emulated ``native'' code. (Reference courtesy of Duane Sand).
\bibitem{Wilner:72} is not integrated.
[FN75] is not integrated. It should have a ``Tool'' listing and the review under the paper should be moved up to the tool listing.
Need to find/read/incorporate R. F. Rosin, ``Contemporary Concepts of Microprogramming and Emulation,'' Computing Surveys, Volume 1, Number 4, December 1969, pp.~197-212, found in [Tucker & Flynn, Dynamic Microprogramming: Processor Organization and Programming, 1971].
Need to find/incorporate papers from Techewb, including
- this
- this
- this
- this
- this
- this
- this
Try a search here using `emulation' as the search key.
Emulation under GNU HURD
Need to find/read/incorporate [HV 79]
\bibitem{HV:79} R. N. Horspool and N. Marovac, ``An Approach to the Problem of Detranslation of Computer Programs,'' The Computer Journal, 23(3):223-229, 1979.
C. Cifuentes says that it may not apply to e.g. x86 architectures; limits also mentioned in one of May's papers
Duane Sand notes that dynamic cross compilation -- anything using Runtime Code Generation (RTCG) -- has at least one advantage over some other forms of emulation:
It is less liable to cause legal troubles with copyright owners' rights to control all derivative works, because the RTCG's result is only a transient copy rather than a permanently stored codefile. RTCG-based emulation techniques are legal IFF the 1995 generation of chips' hardware-implemented transformations at icache load time are legal.
Need to read/categorize/etc. the paper ``A $\mu$ Simulator and $\mu$ Debugger for the CAL DATA 135'', Fredirck L. Ross, M. S. Thesis, Department of Computer Science, Southern Illinois University, July 1978. See Pardo and ask him to ask Ebeling for a copy.
Need to find/read/categorize/etc. the paper
```
%A Tom Thompson,
%T Building the Better Virtual CPU
%J Byte
%D August 1995
%P 149-150
```
which, Duane Sand says (paraphrasing):
Describes two variations of the 68K interpreter Apple used in the initial PowerMacs. Both variations identify frequently executed blocks of 68K code, compile them with trivial peephole optimizations into host RISC code, hold the code in a software-managed "cache" until it's full, then throw it all away and start over. One variation is used on Unix emulations of Apple, and the other variation is used on the 'Power Mac 9500', in combination with a modified interpreter with a smaller lookup table footprint than in the first generation PowerMacs. (The original interpreter used so much lookup table space that it ran poorly on the original PPC 603 chips, which held up Apple's plans for laptop PowerMacs for a year.) On PowerMacs that are able to run both old & new versions, the new version averages 20-30% speedup over the entire (nonnative) application.
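The ``compile hot blocks, hold them until the cache is full, then throw it all away'' policy described above can be sketched as a toy model. This is only an illustration under stated assumptions: the class name, the capacity and hot-threshold parameters, and the string standing in for host code are all hypothetical, and nothing here is taken from Apple's emulator.

```python
from typing import Dict


class FlushingCodeCache:
    """Toy model: translate frequently executed blocks, flush everything when full."""

    def __init__(self, capacity: int, hot_threshold: int) -> None:
        self.capacity = capacity            # max number of translated blocks held
        self.hot_threshold = hot_threshold  # executions before a block is translated
        self.exec_counts: Dict[int, int] = {}
        self.translated: Dict[int, str] = {}
        self.flushes = 0

    def _translate(self, block_addr: int) -> str:
        # Stand-in for compiling a target block into host code (hypothetical).
        return f"host-code-for-{block_addr:#x}"

    def execute(self, block_addr: int) -> str:
        """Return how this execution of the block was carried out."""
        if block_addr in self.translated:
            return "translated"
        count = self.exec_counts.get(block_addr, 0) + 1
        self.exec_counts[block_addr] = count
        if count >= self.hot_threshold:
            if len(self.translated) >= self.capacity:
                # Cache is full: discard every translation and start over.
                self.translated.clear()
                self.exec_counts.clear()
                self.flushes += 1
            self.translated[block_addr] = self._translate(block_addr)
        return "interpreted"


cache = FlushingCodeCache(capacity=2, hot_threshold=2)
history = [cache.execute(0x100) for _ in range(3)]
```

The appeal of flushing everything is that no per-block eviction bookkeeping is needed; the cost is that hot blocks must be re-identified and recompiled after each flush.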
The Retrocomputing Museum keeps track of emulators for retro-machines.
The Charles Babbage Institute has a link to a history of simulation site, but the link wasn't working when Pardo tried it. See perhaps their TR or the TR's author, Paul A. Fishwick. The TR itself is about general simulation, rather than ISA simulation (simulating instruction sets).
Integrate [PM 94].
Investigate/integrate Redo project, using decompilation for software maintenance and reverse engineering.
Investigate Slack: A New Performance Metric for Parallel Programs which has to do with profiling.

Find and incorporate

%A S. Graham
%T The Semi-Automatic Computer Conversion System (SACCS)
%J Presented at the ACM Reprogramming Conference
%C Princeton, New Jersey
%D June 1965
%W Referenced by [Gaines 65]

Find and incorporate

[Gaines 65]

%A R. Stockton Gaines
%T On the Translation of Machine Language Programs
%J Communications of the ACM (CACM)
%V 8
%N 12
%D December 1965
%P 736-741

Pardo has a copy.

Find and incorporate

[Dellert 65]

%A George T. Dellert, Jr.
%T A Use of Macros in Translation of Symbolic Assembly Language of One
Computer to Another
%J Communications of the ACM (CACM)
%V 8
%N 12
%D December 1965
%P 742-748

Pardo has a copy.

Find and incorporate

[Benjamin 65]

%A R. I. Benjamin
%T The Spectra 70/45 Emulator for the RCA 301
%J Communications of the ACM (CACM)
%V 8
%N 12
%D December 1965
%P 748-752

Pardo has a copy.

Find and incorporate

[Tucker 65]

%A S. G. Tucker
%T Emulation of Large Systems
%J Communications of the ACM (CACM)
%V 8
%N 12
%D December 1965
%P 753-761
%X ABSTRACT: The conversion problem and a new technique called
emulation are discussed.  The technique of emulation is developed and
includes sections on both the Central Processing Unit (CPU) and the
Input/Output (I/O) unit.  This general treatment is followed by three
sections that describe in greater detail the implementation of
compatibility features using the emulation techniques for the IBM
7074, 7080 and 7090 systems on the IBM System/360.

Pardo has a copy.

Find and incorporate

%A Thomas M. Olsen
%T Philco/IBM Translation at Problem-Oriented Symbolic and Binary Levels
%J Communications of the ACM (CACM)
%V 8
%N 12
%D December 1965
%P 762-768

Pardo has a copy.

Find and incorporate

%A Marvin Lowell Graham
%A Peter Zilahy Ingerman
%T An Assembly Language for Reprogramming
%J Communications of the ACM (CACM)
%V 8
%N 12
%D December 1965
%P 769-773

Pardo has a copy.

Find and incorporate

%A M. A. McCormack
%A T. T. Schansman
%A K. K. Womack
%T 1401 Compatibility Feature on the IBM System/360 Model 30
%J Communications of the ACM (CACM)
%V 8
%N 12
%D December 1965
%P 773-776

Pardo has a copy.

Find and incorporate

%A Donald M. Wilson
%A David J. Moss
%T CAT: A 7090-3600 Computer-Aided Translation
%J Communications of the ACM (CACM)
%V 8
%N 12
%D December 1965
%P 777-781

Pardo has a copy.

Find and incorporate

%A Mark I. Halpern
%T Machine Independence: Its Technology and Economics
%J Communications of the ACM (CACM)
%V 8
%N 12
%D December 1965
%P 782-785

Pardo has a copy.

Integrate FreePort Express
Find and integrate MIT AI Lab Technical reports about the ``11SIM'' PDP-11 simulator that ran on a PDP-6 and later a DEC KA-10. Written by Don Eastlake and Richard M. Stallman. See MIT TR e-mail.
Find and integrate OmniVM.
Write a PDP-1 emulator in Java, OmniVM or whatever.
One advantage of running a simulator on a segmented machine (such as the 80386) is that you can set aside a segment to use for application addresses. Then, bogus application addresses can't ``wrap'' to valid simulator addresses. The advantage is that you can get the benefits of full address mapping (e.g., g88, SimOS) without paying the translation overhead. Of course segmented machines are harder to simulate efficiently...
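The ``wrap'' problem can be illustrated with a small model. This is a sketch only: the memory size, region layout, and function names below are hypothetical, and the explicit limit check stands in for what segmentation hardware would do on every access at no cost to the simulator.

```python
HOST_MEMORY_SIZE = 64             # hypothetical tiny host address space
SIMULATOR_REGION = range(0, 16)   # host addresses holding the simulator's own data
APP_BASE, APP_LIMIT = 16, 48      # hypothetical segment: base address and length


class SegmentLimitError(Exception):
    """Models the fault a segmented machine raises on an out-of-range offset."""


def flat_host_address(app_addr: int) -> int:
    # No segment: base plus offset, with host address arithmetic wrapping around.
    return (APP_BASE + app_addr) % HOST_MEMORY_SIZE


def segmented_host_address(app_addr: int) -> int:
    # With a segment: the offset is checked against the limit before it is used.
    if not 0 <= app_addr < APP_LIMIT:
        raise SegmentLimitError(f"offset {app_addr} outside application segment")
    return APP_BASE + app_addr


bogus = 50                                # an application address past the limit
wrapped = flat_host_address(bogus)        # lands inside SIMULATOR_REGION
```

In the flat case the bogus address silently aliases simulator state; in the segmented case the same access faults, which is the protection a full address-mapping simulator would otherwise have to pay for in software.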
gsim
gsim is derived from g88 and a predecessor of SimICS. Get a straight history from Peter Magnusson and write up a separate entry.
Integrate this:

OPEN/370
2 MIPS System/370 on a 90MHz Pentium. Includes system-mode operation, runs 8 popular S/370 OS's. They also sell installed systems (hardware/software turnkey systems).
Categories:
- Purpose: cross-machine simulation
- Input representation: exe
- Detail: User, System, Device
- Multiple protection domains: No
- Multiple processors: No
- Signals and exceptions: Yes
- SMC OK: Yes
- Simulation technology: dynamic cross-compilation
  - Units of analysis: several target instructions at a time.
  - Units of translation: one target instruction at a time
- Tool is robust in the face of application bugs: Yes
- Status: commercially available product.
- Degree of possible mismatch between host and target:
  - byte order
  - floating-point numeric representation
  - instructions that cause exceptions
- Host machine instructions per target machine instruction: 3--hundreds, typically ~50.
OPEN/370 home page.
Integrate this:
[Emmerik 94] M van Emmerik, ``Signatures for Library Functions in Executable Files'', Technical Report 2/94, Queensland University of Technology, Faculty of Information Technology, April 1994. Submitted ... 1994. PostScript(tm)
Search outward from Marinos "nino" Yannikos's page on simulators.

Integrate this:

LazyFP (???)

%A David Wakeling
%T A Throw-away Compiler for a Lazy Functional Language
%J Proceedings of the Fuji International Workshop on Functional and
Logic Programming
%C Susono, Japan
%D July 1995
%P 203--216
%W David Wakeling <david@dcs.exeter.ac.uk>
``http://www.dcs.ex.ac.uk/~david''
%X Dynamic cross-compiler for a virtual machine used to run lazy
languages.

This file is easiest to edit and browse (sometimes) if it is one big file, but it (a) takes forever to load and (b) screws with some browsers. Ideally, it should be possible to build a script that takes the big thing as input and makes it into a bunch of little things. Doing it in a couple of passes would even be alright, so that intra-document links would get fixed up correctly. For example:
- Build a tool that generates one .html file for each header up to the next header. Do one file per <h2> or one file per <h2> and <h3> or <h2> <h3> <h4> (currently only nested to <h4>).
- Scan each HTML file for name="..." constructs and build an index of {"#name", "file#name"} tuples. It is easy to distinguish file-relative references from absolute names because the absolute ones all start with "http:" or "ftp:" or whatever, while the relative ones start with "Tool-" or "Bib-" or "Whos-" or whatever.
- Go back and edit every .html file, replacing each occurrence of href="#name" with href="file#name", as described by the index tuple file.
It's a simple Perl script, write it and get credit for helping out!
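The three passes above might look roughly like the following. This is a minimal sketch in Python rather than Perl, working on strings instead of files; the function names and the `part0.html`, `part1.html` naming scheme are hypothetical, it splits only at <h2>, and the regular expressions assume double-quoted attributes, so a real tool would likely want an HTML parser.

```python
import re
from typing import Dict, List

NAME_RE = re.compile(r'\bname="([^"]+)"', re.IGNORECASE)
HREF_RE = re.compile(r'href="#([^"]+)"', re.IGNORECASE)


def split_at_headers(html: str) -> List[str]:
    """Pass 1: cut the document just before every <h2>."""
    starts = [m.start() for m in re.finditer(r"<h2\b", html, re.IGNORECASE)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any text that precedes the first header
    bounds = starts + [len(html)]
    return [html[a:b] for a, b in zip(bounds, bounds[1:]) if html[a:b].strip()]


def build_index(pieces: Dict[str, str]) -> Dict[str, str]:
    """Pass 2: map each anchor name to the 'file#name' that now holds it."""
    index: Dict[str, str] = {}
    for filename, text in pieces.items():
        for name in NAME_RE.findall(text):
            index[name] = f"{filename}#{name}"
    return index


def rewrite_links(text: str, index: Dict[str, str]) -> str:
    """Pass 3: replace href="#name" with href="file#name" where the name is known."""
    def fix(match: "re.Match[str]") -> str:
        target = index.get(match.group(1))
        return f'href="{target}"' if target else match.group(0)
    return HREF_RE.sub(fix, text)


def split_document(html: str, stem: str = "part") -> Dict[str, str]:
    chunks = split_at_headers(html)
    pieces = {f"{stem}{i}.html": chunk for i, chunk in enumerate(chunks)}
    index = build_index(pieces)
    return {name: rewrite_links(text, index) for name, text in pieces.items()}


sample = (
    '<h2><a name="Tool-A">A</a></h2> see <a href="#Bib-B">B</a> '
    'and <a href="http://example.org/">elsewhere</a>'
    '<h2><a name="Bib-B">B</a></h2> back to <a href="#Tool-A">A</a>'
)
pages = split_document(sample)
```

Absolute links never match the `href="#..."` pattern, so they pass through untouched, which is the distinction between relative and absolute names noted in the list above.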
S21
A simulator for the MuP21 MISC (Minimal Instruction-Set Computer; c.f., Mutable Instruction-Set Computer) by Ultra Technology
- simulator
Arm Simulators
According to Michael Williams michael.williams@armltd.co.uk, simulators for the ARM6 and ARM7 are available as part of the ARM GNU toolchain, from ftp://ftp.cl.cam.ac.uk/arm/gnu/armul-1.0.tar.gz and maybe also VHDL/Verilog models from ARM partners, see http://www.arm.com/.
Ape 2600 -- Atari Simulator
An ongoing project to emulate an Atari 2600 on a generic machine.

Find and incorporate

[Rowson 94]
@InProceedings{Rowson:94,
  author =       "James A. Rowson",
  title =        "Hardware/Software Co-Simulation",
  booktitle =    "Proc.~of the 31st Design Automation Conference
(DAC~'94)",
  year =         "1994",
  organization = "ACM",
  address =      "San Diego, CA",
  OPTmonth =     "June",
  note =         "(Tutorial)",
  OPTannote =    ""
}

Find and incorporate

@InProceedings{Rogers:92,
  author =       "Anne Rogers and Kai Li",
  title =        "Software Support for Speculative Loads",
  pages =        "38-50",
  booktitle =    "Proc.~of the 5th International Conference on Architectural
                  Support for Programming Languages and Operating Systems",
  year =         "1992",
  month =        "October"
}

Evidently contains information about a cycle-level simulator. More in

@TechReport{Rogers:93,
  author =       "Anne Rogers and Scott Rosenberg",
  title =        "Cycle Level {SPIM}",
  institution =  "Department of Computer Science, Princeton
University",
  year =         1993,
  address =      "Princeton, NJ",
  month =        "October"
}

Find and incorporate

Tracing with Pixie
Michael D. Smith
Center for Integrated Systems
Stanford University
April 91

ssim: A Superscalar Simulator
Mike Johnson
AMD
M. D. Smith
Stanford Univ.

Pixie front ends in ftp://velox.stanford.edu/pub

Find and incorporate (based on work with `pixie' and `ssim', may tell about them?)

Johnson, Mike: Superscalar microprocessor design
Englewood Cliffs, NJ: Prentice Hall, 1991. xxiv, 288 pp., illustrations.
(Prentice-Hall series in innovative technology) Bibliography: pp. 273-278
ISBN 0-13-875634-1

Find and incorporate (mostly results of tracing, but may discuss simulation and tracing):

@Book{Huck:89,
  author =       {Jerome C. Huck and Michael J. Flynn},
  title =        {Analyzing Computer Architectures},
  publisher =    {IEEE Computer Society Press},
  year =         1989,
  address =      {Washington, DC}
}

Find out more about Robert Bedichek's T2 simulator.
Yaze Z80 and CP/M emulator. more info and source code.
WinDLX, MSWindows GUI for DLX. Also include information about DLX from [Hennessy & Patterson 93]
UAE Commodore Amiga hardware emulator (incomplete).
DEC FX!32 binary translation/emulation system for running Microsoft Windows applications.
Find and incorporate
```
%A Max Copperman
%A Jeff Thomas
%T Poor Man's Watchpoints
%J ACM SIGPLAN Notices
%V 30
%N 1
%D January 1995
%P 37-44
```
Pardo has a copy. Executive summary: debugging tool; statically patches loads and stores with code to check for data breakpoints.
Amusing story: The processor they were running on has load delay slots and does not have pipeline interlocks. Their tool replaces each load or store with several instructions; it patched a piece of user-mode code of the form
```
load addr -> r5
store r5 -> addr2
```
Before patching, the code saved the old value of r5 to addr2. After patching, it saved the new value. Technically, this code was broken already because the symptom could have also been exhibited by an interrupt or exception between the load and the store.
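The effect can be reproduced on a toy machine model. This is a sketch under loud assumptions: the instruction encoding is invented, the delay is exactly one slot, and a `nop` stands in for whatever checking code the tool actually inserted.

```python
from typing import Dict, List, Optional, Tuple

Instr = Tuple[str, str, str]


def run(program: List[Instr], regs: Dict[str, int], mem: Dict[str, int]) -> None:
    """Execute on a machine with one load delay slot and no pipeline interlocks."""
    pending: Optional[Tuple[str, int]] = None  # load result not yet written back
    for op, a, b in program:
        landing, pending = pending, None
        if op == "load":       # load mem[a] -> reg b; visible after the NEXT instruction
            pending = (b, mem[a])
        elif op == "store":    # store reg a -> mem[b]; reads the register as it is now
            mem[b] = regs[a]
        elif op == "nop":      # stand-in for inserted checking code (hypothetical)
            pass
        if landing is not None:
            regs[landing[0]] = landing[1]
    if pending is not None:
        regs[pending[0]] = pending[1]


original = [("load", "addr", "r5"), ("store", "r5", "addr2")]
patched = [("load", "addr", "r5"), ("nop", "", ""), ("store", "r5", "addr2")]

mem_before = {"addr": 7, "addr2": 0}
run(original, {"r5": 1}, mem_before)   # store sits in the delay slot

mem_after = {"addr": 7, "addr2": 0}
run(patched, {"r5": 1}, mem_after)     # store no longer sits in the delay slot
```

In the original sequence the store executes in the load's delay slot and writes the old r5; once anything is inserted between the two instructions the load has landed and the store writes the new value, which is the change in behavior described above.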
Find and incorporate information about Spike. Referenced in [Conte & Gimarc 95]. Tom Conte conte@eos.ncsu.edu says (paraphrased):
``Spike was built inside GNU GCC by Michael Golden and myself. It includes a lot of features that have appeared in ATOM, including the simulator with the benchmark into a single ``self-tracing'' binary. The instruction trace was based on an abstract machine model distilled from GCC's RTL; it had both a high-level and a low-level form. Spike is still in occasional use, but has never been released.''
Find and incorporate information about Reiser & Skudlarek's paper "Program Profiling Problems, and a Solution via Machine Language Rewriting", from ACM SIGPLAN Notices, V29, #1, January 1994. Pardo has a copy.
Basic summary: Wanted to profile. -p/-pg code is larger and slower by enough to make it hard to justify profiling as the default. Assumes the entire source is available. For these and other reasons, wrote jprof which operates with disassembly, analysis and rewriting. Discusses sampling errors, expected accuracy, stability, randomness, etc. Describes jprof: counters and stopwatches; subroutine call graph. Domain/OS on HP/Apollo using 68030. Discusses shared libraries. Can also use page-fault clock. 4-microsecond clocks. Some lessons/observations. Doesn't explain how program running time is affected by jprof.
Design tradeoffs between various 68k implementations (comp.arch posting).
More on decompilation of PC executables
Update the reference for Alvin R. "Alvy" Lebeck.
Review and include FX!32. March 5 1996 Microprocessor Report. Jim Turley, "Alpha Runs x86 Code with FX!32".
Summary: DEC is running Win32 application binaries on Alpha by a new combination of interpreter and static translator. The static translator runs in the background, between the first and second executions of the application. It uses info collected by the interpreter during the 1st run, to reliably distinguish active code paths from r/o data and work out the effects of indirect jumps. Static analysis can't do this automatically on its own, for typical x86 binaries.
Add info about Doug Kwan (author of YAE, an Apple ][ emulator) to "Who's who" section. Nino says: only freely available dynamic recompilation. (Dynamic recompilation for SPARC and MIPS). Information forwarded by Marinos Yannikos <nino@complang.tuwien.ac.at>.
Find, read, and incorporate decompilation info (also cites a program verification dissertation):
```
%A P. J. Brown
%T Re-creation of Source Code from Reverse Polish Form
%J Software \- Practice & Experience
%V 2
%N 3
%P 275-278
%D 1972
```
Note: there's a slightly later SPE that has a follow-up article explaining how to do it faster/more efficiently.
Xref: uw-beaver comp.compilers:10907
Path: uw-beaver!uhog.mit.edu!news.mathworks.com!newsfeed.internetmci.com!in2.uu.net!ivan.iecc.com!ivan.iecc.com!not-for-mail
From: faase@cs.utwente.nl (Frans F.J. Faase)
Newsgroups: comp.compilers
Subject: Re: Need decompiler for veryyy old code....
Date: 29 Apr 1996 23:11:51 -0400
Organization: University of Twente, Dept. of Computer Science
Lines: 29
Sender: johnl@iecc.com
Approved: compilers@ivan.iecc.com
Message-ID: <96-04-144@comp.compilers>
References: <96-04-110@comp.compilers>
NNTP-Posting-Host: localhost.iecc.com
Keywords: disassemble, IBM

> Currently I am undertaking to modify some very old IBM code (at least
> 20 years old. I believe that the code is either Assembler or Cobol.

I do not know whether the following is of use for you, but I do maintain a WWW page about decompilation, which has some links to other resources as well.

http://www.cs.utwente.nl/~faase/Ha/decompile.html

Maybe, you should contact Martin Ward <Martin.Ward@durham.ac.uk>:

http://www.dur.ac.uk/~dcs0mpw/

Or Tim Bull <tim.bull@durham.ac.uk>:

http://www.dur.ac.uk/~dcs1tmb/home.html

Frans

(P.S. Email to <PROCUNIERA@ucfv.bc.ca> bounced with 451 error)

--
Frans J. Faase
Information Systems Group, Tel: +31-53-4894232
Department of Computer Science, secr.: +31-53-4893690
University of Twente, Fax: +31-53-4892927
PO box 217, 7500 AE Enschede, The Netherlands, Email: faase@cs.utwente.nl
http://www.cs.utwente.nl/~faase/
--
Send compilers articles to compilers@iecc.com, meta-mail to compilers-request@iecc.com.
A Java runtime, which generates native code at runtime: Softway's Guava. Info from Jeremy Fitzhardinge (jeremy@suede.sw.oz.au)

Find, read, and summarize the following:

%A Ariel Pashtan
%T A Prolog Implementation of an Instruction-Level Processor Simulator
%J Software \- Practice and Experience
%V 17
%N 5
%P 309-318
%D May 1987

Find, read and summarize "Augmint". According to Anthony-Trung Nguyen <anguyen@csrd.uiuc.edu>, it is based on MINT, understands the x86 instruction set, and runs on Intel x86 boxes with UNIX (Linux, Unixware, etc.) or Windows NT. It is described further at http://www.csrd.uiuc.edu/iacoma/augmint.html and there was a paper in ICCD-96, available from ftp://ftp.csrd.uiuc.edu/pub/Projects/iacoma/aug.ps.
Find, read and summarize "Etch". See http://memsys.cs.washington.edu/memsys/html/etch.html. Etch is an x86 Windows/NT tool for annotating x86 binaries, without source code.
Find, read and summarize "Etch".
```
From: bchen@eecs.harvard.edu (Brad Chen)
Newsgroups: comp.arch
Subject: Windows x86 Address Traces Available
Date: 7 Oct 1996 22:20:30 GMT
Organization: Harvard University EECS
Lines: 15
Message-ID: <53bvne$5lb@necco.harvard.edu>
NNTP-Posting-Host: steward.harvard.edu
Keywords: Windows x86 address traces
```
A collection of x86 memory reference traces from Win32 applications are now available from the following URL: http://etch.eecs.harvard.edu/traces/index.html. The collection includes traces from both commercial and public-domain applications. The collection currently includes:
```
 - Perl
 - MPeg Play
 - Borland C++
 - Microsoft Visual C
 - Microsoft Word
```
These traces were created using Etch, an instrumentation and optimization tool for Win32 executables. For more information on Etch see the above URL.
(etch-info@cs.washington.edu)
Add information on iprof. Here's a summary from Peter Kuhn:
```
Peter Kuhn                                    voice: +49-89-289-23092
Institute for Integrated Circuits (LIS)       fax1:  +49-89-289-28323
Technical University of Munich                fax2:  +49-89-289-25304
Arcisstr. 21, D-80290 Munich, Germany 
email: P_Kuhn@lis.e-technik.tu-muenchen.de
http:  //www.lis.e-technik.tu-muenchen.de/people/kp.html
```
- portable to GNU gcc/g++ supported platforms, operating systems and processors
- detailed instrumentation of instruction usage
- no source code modification necessary
- no restrictions for the application programmer (only "-a" switch for gcc/g++ compilers)
- applicable to statically linked libraries
- minimal slow down of program execution time (about 5%)
- fast: no source recompilation necessary for repeated simulation runs
- less amount of trace data produced
- high reliability: no executable modification
- covered by GNU Public License
- available via anonymous ftp at: ftp://ftp.lis.e-technik.tu-muenchen.de/pub/iprof
The operation is: With gcc/g++ option -a (above version 2.6.3) you can produce a basic block statistics file (bb.out), which contains the number of times each basic block of the program is accessed during runtime. iprof processes this basic block statistics file and accesses the program's executable to summarize the machine instructions used for each basic block. So iprof doesn't make any modifications to the gcc/g++ and is easily portable among gcc/g++ supported architectures. Currently binaries for LINUX 486, Pentium and Sparc Solaris are provided, ports to other architectures are straightforward.
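The core computation described above, combining per-block execution counts with the instructions found in each block, can be sketched as follows. This is not iprof's code and does not parse a real bb.out file or executable; the block labels, mnemonics, and dictionary-based inputs are hypothetical stand-ins for that data.

```python
from collections import Counter
from typing import Dict, List


def instruction_usage(
    block_counts: Dict[str, int],
    block_instructions: Dict[str, List[str]],
) -> Counter:
    """Weight each block's static instruction list by its dynamic execution count."""
    totals: Counter = Counter()
    for block, count in block_counts.items():
        for mnemonic in block_instructions.get(block, []):
            totals[mnemonic] += count
    return totals


# Hypothetical inputs: counts as a profile might report them, and the
# instructions a disassembler might find in each basic block.
counts = {"B0": 1, "B1": 100}
disassembly = {
    "B0": ["mov", "call"],
    "B1": ["add", "cmp", "jne", "add"],
}
usage = instruction_usage(counts, disassembly)
```

Because the counts are per basic block rather than per instruction, the profile data stays small, and the same run can be re-analyzed without re-executing the program.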
There are many ways to measure slowdown. Each has certain benefits, each has shortcomings.
- Time to execute target code on simulated target vs. native target running time. This is particularly interesting if you are trying to determine relative performance for a commercial product such as SoftPC or if you're otherwise interested in real-time response. However, it ignores the implementation technology of the host machine. For example, a simulated Z-80 on a SPARC will be faster than a simulated SPARC on a Z-80, and performance may vary by 6X depending on which Z-80 you use.
- The time or number of host instructions to execute the workload vs. executing the workload native on the host tells you the most about simulation efficiency if the host and the target are the same machine. The numbers get less useful if the host and target are different; there are also differences if the simulator executes some part of the program "native" (e.g., system calls). For example, a workload compiled for the EDSAC (17-bit words) and then run on a MIPS is unlikely to be close to the performance of the workload compiled and run on the SPARC.
- Number of host instructions per target instruction captures more of the "simulation efficiency" without getting caught in the confusion of processor implementation technologies. However, it potentially does the least accurate job of predicting real-time performance, as it may be unduly hurt by real-world concerns such as the number of cache misses. For example, SimICS got faster when the IR got smaller but more complicated to decode. The number of host instructions increased, but the overall running time decreased.
- Multiprocessor performance is even harder to judge. For example, multiplexing target processors on a single host processor may induce TLB, cache and paging misses that lead to much worse performance. Conversely, I/O effects may be overlapped with simulation of other processors, reducing the effective overhead of simulation.
- Simulating more costs more; simulators such as Shade, FX!32, etc. are as fast as they are in part because some parts of the overall workload (e.g., OS code) are executed native on the host machine, rather than simulating all host OS code.
So what we see includes:
- You can't measure the running time of a workload on a target that does not yet or no longer exists.
- Anything that uses elapsed running times depends strongly on the implementation technology.
- The real-world performance does vary depending on the implementation technology.
- The host/target ratio fails to capture some significant effects, e.g., the SimICS example.
- Multiprocessor simulation may cause higher miss rates in the processor cache, TLB and paging memory. Conversely, simulation may be overlapped with computation.
- Running more of the application as host code improves the observed running time and host/target instruction ratio.
(I forget the details, but I'd definitely check out some of the early SimICS papers for a discussion of running times; Peter has more to say.)
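To make the difference between these measures concrete, here is a small sketch that computes three of them from the same measurements. All names and numbers are hypothetical; the two simulator runs are made up to mimic the shape of the SimICS example, where more host instructions coincided with less elapsed time.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SimRun:
    seconds: float            # elapsed time of the simulated workload on the host
    host_instructions: int    # host instructions executed by the simulator
    target_instructions: int  # target instructions simulated


def slowdown_metrics(
    run: SimRun, native_target_seconds: float, native_host_seconds: float
) -> Dict[str, float]:
    return {
        # Simulated time vs. the workload on a real target machine.
        "time_vs_native_target": run.seconds / native_target_seconds,
        # Simulated time vs. the workload compiled for and run on the host.
        "time_vs_native_host": run.seconds / native_host_seconds,
        # Host instructions per simulated target instruction.
        "host_per_target_instr": run.host_instructions / run.target_instructions,
    }


# Hypothetical measurements for two versions of one simulator.
big_ir = SimRun(seconds=50.0, host_instructions=4_000_000, target_instructions=100_000)
small_ir = SimRun(seconds=40.0, host_instructions=5_000_000, target_instructions=100_000)

before = slowdown_metrics(big_ir, native_target_seconds=2.0, native_host_seconds=1.0)
after = slowdown_metrics(small_ir, native_target_seconds=2.0, native_host_seconds=1.0)
```

With these made-up numbers the instruction ratio gets worse from `before` to `after` while both time-based ratios improve, so the measures rank the two versions differently.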

Find and incorporate Harish Patil's dissertation <patil@ch.hp.com> on ``efficient program monitoring''. See the TR. Or, try here.

From: Harish Patil 
Newsgroups: comp.compilers
Subject: Thesis available: Program Monitoring
Date: 29 Jan 1997 11:21:02 -0500
Organization: Compilers Central
Lines: 59
Sender: johnl@iecc.com
Approved: compilers@ivan.iecc.com
Message-ID: <97-01-223@comp.compilers>
Reply-To: Harish Patil 
NNTP-Posting-Host: ivan.iecc.com
Keywords: report, available, performance

Hello everyone:

 I am glad to announce that my Ph.D. thesis, titled "Efficient Program
 Monitoring Techniques", is available on-line. This thesis was
 completed under the supervision of Prof. Charles Fischer at the
 department of Computer Sciences, University of Wisconsin --Madison.
 The thesis is available as technical report # 1320. Please check it
 out at the URL:
 http://www.cs.wisc.edu/Dienst/UI/2.0/Describe/ncstrl.uwmadison%2fCS-TR-96-1320
 An abstract of the thesis follows.

Regards,

-Harish
	Efficient Program Monitoring Techniques
	---------------------------------------
Programs need to be monitored for many reasons, including performance
evaluation, correctness checking, and security. However, the cost of
monitoring programs can be very high. This thesis contributes two
techniques for reducing the high execution time overhead of program
monitoring: 1) customization and 2) shadow processing. These
techniques have been tested using a memory access monitoring system
for C programs.

"Customization" reduces the cost of monitoring programs by decoupling
monitoring from original computation. A user program can be customized
for any desired monitoring activity by deleting computation not
relevant for monitoring. The customized program is smaller, easier to
analyze, and almost always faster than the original program. It can be
readily instrumented to perform the desired monitoring. We have
explored the use of program slicing technology for customizing C
programs. Customization can cut the overhead of memory access
monitoring by up to half.

"Shadow processing" hides the cost of on-line monitoring by using idle
processors in multiprocessor workstations. A user program is
partitioned into two run-time processes. One is the main process
executing as usual, without any monitoring code. The other is a shadow
process following the main process and performing the desired
monitoring. One key issue in the use of shadow process is the degree
to which the main process is burdened by the need to synchronize and
communicate with the shadow process. We believe the overhead to the
main process must be very modest to allow routine use of shadow
processing for heavily-used production programs. We therefore limit
the interaction between the two processes to communicating certain
irreproducible values. In our experimental shadow processing system
for memory access checking the overhead to the main process is very
low - almost always less than 10%.  Further, since the shadow process
avoids repeating some of the computations from the main program, it
runs much faster than a single process performing both the computation
and monitoring.

==========================================================================
Harish Patil:  Massachusetts Language Lab - Hewlett Packard
Mail Stop CHR02DC, 300 Apollo Drive, Chelmsford MA 01824
Phone: 508 436 5717  Fax: 508 436 5135  Email: patil@apollo.hp.com

CGuard

Categories:

Purpose: debugging
Input representation: hll
Detail: User
Multiple protection domains: No
Multiple processors: No
Signals and exceptions: No
SMC OK: S (dynamically-linked libraries only)
Simulation technology: augmentation
Tool is robust in the face of application bugs: N
Status: information.

See:

Related to OM/ATOM/Hiprof: There are also derivative products, for example "Client Server News Issue 192 (G-2 Computer Intelligence Inc, 3 Maple Place, PO Box 7, Glen Head, New York 11545-9864, USA Telephone: 516-759-7025 Fax: 516-759-7028)" reports
CS192-24 TRACEPOINT NAMES ITS FIRST PRODUCT DEC spin-off Tracepoint Technology named its first product HiProf, as we suspected it would (CSN No 185), and described it as a graphical hierarchical profiler that will enable C++ developers to analyze the binaries of 32-bit x86 applications and figure out where modifications should be made. The first of a family, the tool is based on a patented Binary Code Instrumentation technology that displays a detailed analysis of an application's execution in Tracepoint's IDE. The company's core framework can handle executables and .dlls that have been generated by compiling software as well. Therefore, source code shouldn't have to be recompiled. The data can be viewed on a threads basis. HiProf is due out next month at $599 and runs on Win95 or NT 3.51 or later. It supports apps developed with VC++ 2.0 or above and Microsoft Developer Studio 4.03.

Newsgroups: comp.compilers
Subject: ANNOUNCE - Fast Code Coverage Tool
Date: 8 May 1997 21:27:24 -0400
Organization: Tracepoint/DIGITAL
Lines: 33
Sender: johnl@iecc.com
Approved: compilers@ivan.iecc.com
Message-ID: <97-05-111@comp.compilers>
Reply-To: jgarvin@scruznet.com
NNTP-Posting-Host: ivan.iecc.com
Keywords: testing, tools, available

ANNOUNCING - TestTrack, Fast Code Coverage Tool for 32-bit Windows Apps

TracePoint Technology has just opened the beta for TestTrack - an advanced code coverage tool that analyses test results and identifies areas in your code that have not been tested. Since TestTrack works on compiled and linked binary code (no source code or obj files required), there's no need for recompiling or preprocessing so the entire process is dramatically quicker than with past generation tools.

TestTrack analyzes and reports on coverage of several different types including; function coverage, class coverage, line coverage, branch coverage, multiple condition coverage, call-pair coverage and more. TestTrack allows you to selectively exclude portions of the code base, if desired, so you can analyze only those portions of an app that concern you.

A robust and intuitive GUI displays results in "live" pie charts or bar graphs that let you drill down into the code represented with just a mouse click, extensive reporting capabilities include the ability to publish reports in html, and a powerful merge function allows you to merge the results of several test runs for total coverage analysis. In addition, TestTrack identifies dead code in your app which is no longer used but which can slow performance and bloat program size.

An evaluation copy of the latest TestTrack beta is available for free download from TracePoint at www.tracepoint.com. TestTrack works on 32-bit apps generated with VC++ 2.x - 5.0. TracePoint is a recent spin-off of DIGITAL Equipment Corp, whose mission is to create and market advanced development tools for 32-bit Windows apps.
For further information on TracePoint visit our web site or call 888-688-2504.
From: dcpi-czar@pa.dec.com (Lance Berc)
Newsgroups: comp.arch,comp.sys.dec,comp.unix.osf.osf1,comp.compilers
Subject: New Alpha Performance Analysis Tools
Date: 20 Jun 1997 21:43:17 -0400
Organization: Digital Equipment Corporation, Systems Research Center
Lines: 33
Sender: johnl@iecc.com
Approved: compilers@ivan.iecc.com
Message-ID: <97-06-084@comp.compilers>
NNTP-Posting-Host: ivan.iecc.com
Keywords: tools, available

Version 2.2 of the DIGITAL Continuous Profiling Infrastructure, a set of performance tools for Digital Alpha systems running Digital Unix, is available for general use.

The Digital Continuous Profiling Infrastructure for Digital Alpha platforms permits continuous low-overhead profiling of entire systems, including the kernel, user programs, drivers, and shared libraries. The system is efficient enough that it can be left running all the time, allowing it to be used to drive online profile-based optimizations for production systems.

The Continuous Profiling Infrastructure maintains a database of profile information that is incrementally updated for every executable image that runs. A suite of profile analysis tools analyzes the profile information at various levels. At one extreme, the tools show what fraction of cpu cycles were spent executing the kernel and each user program. At the other extreme, the tools show how long a particular instruction stalls on average, e.g., because of a D-cache miss.

DCPI runs under Digital Unix V3.2 and V4.x, with a port to WindowsNT underway. It is free of charge. Further information, including papers and man pages, can be found at:

http://www.research.digital.com/SRC/dcpi

The system was developed at Digital's Systems Research Center and Western Research Laboratory, both in Palo Alto, California. A paper describing the system will appear at SOSP-16 in October.
--
Send compilers articles to compilers@iecc.com, meta-mail to compilers-request@iecc.com.
See http://www.research.digital.com/SRC/dcpi
SIS -- a SPARC V7 instruction set simulator, cycle accurate including parallel execution of IU and FPU and operand dependency stalls. Comments to Jiri Gaisler <jgais@wd.estec.esa.nl>.
From: el@compelcon.se (Erik Lundh)
Newsgroups: comp.compilers
Subject: Re: asm -> structured form
Date: 14 Jan 1998 14:28:38 -0500
Organization: Algonet/Tninet
Lines: 22
Sender: johnl@iecc.com
Approved: compilers@ivan.iecc.com
Message-ID: <98-01-055@comp.compilers>
References: <98-01-013@comp.compilers>
NNTP-Posting-Host: ivan.iecc.com
Keywords: disassemble, tools, comment

Have a look at Christina Cifuentes work with decompilers at http://www.it.uq.edu.au/groups/csm/dcc.html

Also, have a look at Frans Faase's excellent compilation of decompiler efforts at http://wwwis.cs.utwente.nl:8080/~faase/Ha/decompile.html (There is a disclaimer at the top of the page that Mr Faase has left the faculty and might be unable to maintain the page. But the last update is dated in december 1997... Hope he can keep it!)

Best Regards,
Erik Lundh
Compelcon AB
SWEDEN

Alexander Kjeldaas wrote:
[I'm impressed -- it does a better job of decompiling than anything I've seen elsewhere. It's still a far cry from the original source, but good enough to be a big help figuring out what a dusty old program does. -John]
--
Send compilers articles to compilers@iecc.com, meta-mail to compilers-request@iecc.com. Archives at http://www.iecc.com/compilers
Daisy, a VLIW + dynamic translator project at IBM.
Date: Sun, 22 Feb 1998 22:09:04 -0800
Message-Id: <199802230609.WAA08308@ncube>
From: Steve Herrod
To: simos-release@Crissy.Stanford.EDU
Subject: Announcing SimOS Release 2.0!
Content-Type: text
Content-Length: 1628
The SimOS team at Stanford University is pleased to announce the second release of our complete machine simulation environment. If you are receiving this email, then you have downloaded an earlier version of SimOS or were deemed "someone who may be interested". If you would like to be taken off this infrequently used list, send mail to "simos@cs.stanford.edu" and we'll take you off of it immediately.
For those of you who need a refresher, SimOS is a "complete machine simulator" in that it models the hardware of uniprocessor and multiprocessor computers in enough detail to boot and run commercial operating systems as well as applications designed for these operating systems. This includes databases, web servers, and other workloads that traditional simulation tools have trouble supporting. Furthermore, SimOS executes these workloads at high speeds and provides support for easily collecting detailed hardware and software performance information.
There have been substantial improvements and enhancements since the first SimOS release including:
* Support for the Digital Alpha architecture running the Digital Unix operating system.
* Support for the MIPS 64-bit architecture.
* More modular hardware simulator interfaces that simplify the process of adding new processor, memory system, and device models.
SimOS is available free of charge for the research community and runs on several different hardware platforms. For download information, research papers, a discussion group, and more, visit the new SimOS web site at:
http://simos.stanford.edu
Connectix VirtualPC simulates a complete PC system, including VGA, audio, and Ethernet hardware, and uses sophisticated dynamic translation to achieve reasonable speeds. It is not exactly clear how well that works, but its achieved speed seems to be between 25% and 80% of native speed, and Connectix claims ``up to twice as fast as the competition.'' See http://www.connectix.com/html/connectix_virtualpc.html and http://www.byte.com/art/9711/sec4/art4.htm
SoftWindows 5.0 is Insignia's competing product (``up to twice as fast as the competition''). RealPC, also by Insignia, is closer to VirtualPC in its design: it is a hardware-level emulator. See http://www.insignia.com.
Another interesting product is Inferno; see http://inferno.lucent.com. The site describes the Inferno operating system, which is available both in native and application form and is VM-based, using dynamic translation to achieve (allegedly) a slowdown of 1.5 to 2.5 times relative to native code. It is similar to TAOS in that it achieves application portability via a virtual machine.

Date:         Sun, 21 Jun 1998 10:50:58 -0400
Reply-To: History of Computing Issues 
Sender: History of Computing Issues 
From: Lee Wittenberg 
Subject:      SSEM Simulator
To: SHOTHC-L@SIVM.SI.EDU
X-UIDL: 2b629fc6064ad7c8f2c0919e41c76276

To coincide with the 50th Anniversary of the Small Scale Experimental Machine at Manchester, I am releasing the first "official" version of an SSEM simulator written in Java, and therefore (presumably) platform-independent. Source and binaries are available at

        ftp://samson.kean.edu/pub/leew/ssem/

Bochs, an Intel x86 emulator.
The New Mexico State University Parallel Trace Archive
The Paradyn system uses runtime (during execution) code generation (instrumentation).

   Method for verifying contiguity of a binary translated block of instructions
   by attaching a compare and/or branch instruction to predecessor block of
   instructions

				  Abstract

   A method for enabling a first block of instructions to verify whether the
   first block of instructions follows a second block of instructions in an
   order of execution. The method includes appending a compare instruction to
   the first block of instructions. The compare instruction compares a first
   value from the first block of instructions with a second value from the
   second block of instructions, which precedes the first block of instructions
   in the order of execution. The method further includes appending a branching
   instruction to the first block of instructions. The branching instruction is
   executed in response to the first value being unequal to the second value.
   The branching instruction, when executed, branches to an alternative look-up
   routine to obtain a block of instructions that follows the second block of
   instructions in the order of execution.


http://patents.uspto.gov/cgi-bin/ifetch4?INDEX+PATBIB-ALL+0+24884+0+6+20371+OF+1+1+1+PN%2f5721927


What is claimed is: 
    1. A computer-implemented method for enabling a first block of instructions to verify whether the first block of instructions
follows a second block of instructions in an order of execution the method comprising the steps of: 

     a) appending a compare instruction to the first block of instructions, the compare instruction when executed compares a
     first value from the first block of instructions with a second value from the second block of instructions, said second block
     of instructions preceding said first block of instructions in the order of execution; and 
     b) appending a branching instruction to the first block of instructions, said branching instruction is executed in response to
     the first value being unequal to the second value, said branching instruction, when executed, branches to an alternative
     look-up routine to obtain a block of instructions that follows the second block of instructions in the order of execution.
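
One way to read claim 1 above: when translated blocks are chained together, the successor (``first'') block starts with a check that it really is the block its predecessor meant to reach, and otherwise branches to a look-up routine. The Python sketch below models that idea only. The Block class, the dispatcher loop, and the choice of the source address as the compared value are assumptions made for illustration, not details taken from the patent.

```python
class Block:
    """A translated block; all field names here are hypothetical."""
    def __init__(self, source_pc, body):
        self.source_pc = source_pc  # "first value": the address this block translates
        self.body = body            # callable(state) -> source pc wanted next, or None
        self.chained = None         # successor guessed from an earlier execution

def run(table, entry_pc, state, max_blocks=100):
    """Execute chained blocks; return (trace of source pcs, number of slow look-ups)."""
    trace, slow_lookups = [], 0
    block = table[entry_pc]
    for _ in range(max_blocks):
        trace.append(block.source_pc)
        wanted_pc = block.body(state)   # "second value", left by the predecessor
        if wanted_pc is None:
            break
        succ = block.chained
        # Compare + branch performed on behalf of the successor: accept the
        # chained block only if it translates the address the predecessor wants.
        if succ is None or succ.source_pc != wanted_pc:
            succ = table[wanted_pc]     # alternative look-up routine
            slow_lookups += 1
            block.chained = succ        # remember the guess for next time
        block = succ
    return trace, slow_lookups
```

In this model a loop that branches back to itself pays for one look-up on its first iteration and one more when it finally exits; the iterations in between pass the compare and follow the chain directly.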



U.S. REFERENCES:   (No patents reference this one)

 Patent    Inventor            Issued    Title
 5167023   De Nicolas et al.   11/1992   Translating a dynamic transfer control instruction address in a simulated CPU processor

ABSTRACT:   The system and method of this invention simulates the flow of control of an application program targeted for a
specific instruction set of a specific processor by utilizing a simulator running on a second processing system having a second
processor with a different instruction set. The simulator reduces the number of translated instructions needed to simulate the
flow of control of the first processor instructions when translating the address of the next executable instruction resulting from a
dynamic transfer of control, i.e., resulting from a return instruction. The simulator compares the address that is loaded at run time
by the return instruction with the return address previously executed by that instruction. If the last return address matches, the
location of the return is the same. If the last return does not match, a translate look-aside buffer is used to determine the address.
If the translate look-aside buffer does not find the address, then a binary tree look up mechanism is used to determine the
address of the next instruction after a return. The performance of the simulator is enhanced by utilizing the easiest approaches
first in the chance that a translated instruction will result most efficiently.
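
The three-level look-up in the abstract above (last return address, then a translate look-aside buffer, then a binary tree) can be sketched roughly as follows. This is an illustration under assumptions: the class names are invented, a small direct-mapped dictionary stands in for the look-aside buffer, and a sorted list searched with bisect stands in for the binary tree.

```python
import bisect

class ReturnSite:
    """Per-return-instruction memory of the last source address and its translation."""
    def __init__(self):
        self.last_source = None
        self.last_target = None

class ReturnTranslator:
    def __init__(self, mapping, tlb_size=4):
        self.keys = sorted(mapping)                 # stand-in for the binary tree
        self.values = [mapping[k] for k in self.keys]
        self.tlb = {}                               # slot -> (source, target)
        self.tlb_size = tlb_size
        self.hits = {"last": 0, "tlb": 0, "tree": 0}

    def _tree_lookup(self, source):
        i = bisect.bisect_left(self.keys, source)
        if i == len(self.keys) or self.keys[i] != source:
            raise KeyError(source)
        return self.values[i]

    def translate(self, site, source):
        # Cheapest check first: same return address as last time?
        if site.last_source == source:
            self.hits["last"] += 1
            return site.last_target
        # Next: the look-aside buffer (direct-mapped in this sketch).
        slot = (source >> 2) % self.tlb_size
        entry = self.tlb.get(slot)
        if entry is not None and entry[0] == source:
            target = entry[1]
            self.hits["tlb"] += 1
        else:
            # Slowest path: the ordered search structure.
            target = self._tree_lookup(source)
            self.tlb[slot] = (source, target)
            self.hits["tree"] += 1
        site.last_source, site.last_target = source, target
        return target
```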

 5287490   Sites               2/1994    Identifying plausible variable length machine code of selecting address in numerical sequence, decoding code strings, and following execution transfer paths

ABSTRACT:   Information about the location of untranslated instructions in an original program is discovered during execution of a
partial translation of the program, and that information is used later during re-translation of the original program. Preferably the
information includes origin addresses of translated instructions and corresponding destination address of untranslated
instructions of execution transfers that occur during the execution of the partial translation. Preferably this feedback of
information from execution to re-translation is performed after each execution of the translated program so that virtually all of
the instructions in the original program will eventually be located and translated. To provide an indication of the fraction of the
code that has been translated, the program is scanned to find plausible code in the areas of memory that do not contain translated
code. The plausible code is identified by selecting addresses according to three different scanning modes and attempting to
decode variable-length instructions beginning at the selected addresses. The scanning modes include a first mode in which
addresses are selected in numerical sequence by a scan pointer, a second mode in which addresses are selected in
instruction-length sequence by an instruction decode pointer, and a third mode in which the selected addresses are destination
addresses of previously-decoded execution transfer instructions.

What is claimed is: 
    26. A method of operating a digital computer having an addressable memory, said addressable memory containing a computer
program, said computer program including instructions and data at respective address locations of said addressable memory,
each of said instructions consisting of contents of a variable number of contiguous ones of said address locations depending upon
an operation specified by said each of said instructions, said method identifying address locations of said addressable memory
that appear to contain said instructions of said computer program, said method comprising the steps of: 

     a) selecting program addresses in numerical sequence, and attempting to decode an instruction in said addressable
     memory at each program address until an initial instruction is decoded; and when said initial instruction is decoded, then 
     b) attempting to decode a string of instructions immediately following said initial instruction until an execution transfer
     instruction is decoded, and when an attempt to decode an instruction fails, continuing said selecting program addresses
     and said attempting to decode an instruction at each program address as set out in said step a), and when an execution
     transfer instruction is decoded, then 
     c) attempting to decode an instruction at a destination address of the decoded execution transfer instruction, and when
     the attempt to decode an instruction at the destination address of the decoded execution transfer instruction fails,
     continuing said selecting program addresses and said attempting to decode an instruction at each program address as set
     out in step a), and when the attempt to decode an instruction at the destination address of the decoded execution transfer
     instruction succeeds, then identifying, as said address locations of said addressable memory that appear to contain said
     instructions of said computer program, the address locations including said initial instruction and said string of instructions
     including said execution transfer instruction, 
     wherein some program addresses of said computer program are known to contain instructions, and wherein said step a)
     skips over the program addresses that are known to contain instructions, 
     wherein the decoding of an instruction is not permitted when an instruction being decoded partially overlaps program
     addresses known to contain an instruction, and 
     wherein said step a) skips over a program address containing a value that is included in a predefined set of values,
     regardless of whether an attempt to decode an instruction starting at the program address would be successful, wherein
     said set of values includes values that indicate instructions having a length of one program address location, said set of
     values includes opcodes of privileged instructions, and said set of values includes the value of zero, and 
     wherein said step a) skips over a program address that is the first address of a string of at least four printable ASCII
     alphanumeric characters.
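
The scan in claim 26 can be illustrated with a toy example. The sketch below assumes a made-up three-opcode variable-length instruction set (nothing like a real VAX or x86 decoder) and implements only part of the claim: the numerical scan, the instruction-length decode run up to a transfer, the check of the transfer's destination, and the skips for known addresses, zero values, and runs of four alphanumeric ASCII characters. All names are hypothetical.

```python
# Toy ISA (an assumption for illustration): opcode -> (length, is_transfer).
# A transfer's second byte is its absolute destination address.
OPCODES = {0x01: (1, False), 0x02: (2, False), 0x03: (2, True)}

def decode(mem, addr):
    """Return (length, target) or None; target is None for non-transfers."""
    if addr >= len(mem) or mem[addr] not in OPCODES:
        return None
    length, is_transfer = OPCODES[mem[addr]]
    if addr + length > len(mem):
        return None
    return length, (mem[addr + 1] if is_transfer else None)

def is_text(mem, addr, run=4):
    """True if `run` alphanumeric ASCII characters start at addr."""
    chunk = mem[addr:addr + run]
    return len(chunk) == run and all(b < 128 and chr(b).isalnum() for b in chunk)

def find_plausible_code(mem, known=frozenset()):
    plausible = set()
    scan = 0
    while scan < len(mem):
        # Step a: numerical scan, skipping known code, zeros, and text.
        if scan in known or mem[scan] == 0 or is_text(mem, scan):
            scan += 1
            continue
        # Step b: decode in instruction-length sequence until a transfer.
        pc, run_addrs, target = scan, [], None
        while True:
            d = decode(mem, pc)
            if d is None or any(a in known for a in range(pc, pc + d[0])):
                run_addrs = None        # decode failed or overlaps known code
                break
            length, target = d
            run_addrs.extend(range(pc, pc + length))
            pc += length
            if target is not None:
                break
        # Step c: the run is plausible only if the destination also decodes.
        if run_addrs is not None and decode(mem, target) is not None:
            plausible.update(run_addrs)
            scan = pc
        else:
            scan += 1
    return plausible
```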

 5560013   Scalzi et al.       9/1996    Method of using a target processor to execute programs of a source architecture that uses multiple address spaces

ABSTRACT:   A method of utilizing large virtual addressing in a target computer to implement an instruction set translator (IST)
for dynamically translating the machine language instructions of an alien source computer into a set of functionally equivalent
target computer machine language instructions, providing in the target machine, an execution environment for source machine
operating systems, application subsystems, and applications. The target system provides a unique pointer table in target virtual
address space that connects each source program instruction in the multiple source virtual address spaces to a target instruction
translation which emulates the function of that source instruction in the target system. The target system efficiently stores the
translated executable source programs by actually storing only one copy of any source program, regardless of the number of
source address spaces in which the source program exists. The target system efficiently manages dynamic changes in the
source machine storage, accommodating the nature of a preemptive, multitasking source operating system. The target system
preserves the security and data integrity for the source programs on a par with their security and data integrity obtainable when
executing in source processors (i.e. having the source architecture as their native architecture). The target computer execution
maintains source-architected logical separations between programs and data executing in different source address
spaces--without a need for the target system to be aware of the source virtual address spaces.

Having thus described our invention, what we claim as new and desire to secure by Letters patent is: 
    1. An emulation method for executing individual source instructions in a target processor to execute source programs
requiring source processor features not built into the target processor, comprising the steps of: 

     inputting instructions of a source processor program to an emulation target processor having significant excess virtual
     addressing capacity compared to a virtual addressing capacity required for a source processor to natively execute the
     source processor program, and supporting multiple source virtual address spaces in the operation of the source
     processor, 
     building a virtual ITM (instruction translation map) in a target virtual address space supported by the target processor, the
     virtual ITM containing an ITM entry for each source instruction addressable unit, each source instruction addressable
     unit beginning on a source storage instruction boundary, structuring each ITM entry for containing a translation address
     to a target translation program that executes a source instruction having a source address associated with the ITM entry,
     determining a ratio R by dividing the length of each ITM entry by the length of each source instruction addressable unit, 
     accessing an ITM entry for an executing source instruction by: 
         generating a source aggregate virtual address for the source instruction by combining the source address of the
         source instruction with a source address space identifier of a source virtual address space containing the
         instruction,
         multiplying the source aggregate virtual address by R to obtain a target virtual address component, and
         inserting the target virtual address component into a predetermined component location in a target virtual address
         to generate an ITM entry target virtual address for locating an ITM entry associated with the source instruction in
         order to obtain a one-to-one addressing relationship between ITM entry target virtual addresses and source
         instruction addresses.
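
The ITM addressing arithmetic in claim 1 above reduces to a shift, a multiply, and an OR. The sketch below works through it with assumed numbers: the 31-bit source address width, the 2-byte instruction boundary, the 8-byte ITM entry, and the position of the ITM in the target address space are all illustrative choices, not values from the patent.

```python
SOURCE_ADDR_BITS = 31   # assumed width of a source virtual address
SOURCE_UNIT = 2         # assumed: source instructions begin on 2-byte boundaries
ITM_ENTRY = 8           # assumed: each ITM entry is an 8-byte translation address
R = ITM_ENTRY // SOURCE_UNIT   # ratio of entry length to source unit length
ITM_BASE = 1 << 60      # assumed fixed high-order bits that select the ITM region

def itm_entry_address(asid, source_addr):
    """Target virtual address of the ITM entry for one source instruction."""
    # Combine the address-space identifier with the source address.
    aggregate = (asid << SOURCE_ADDR_BITS) | source_addr
    # Scale by R, then insert the result below the ITM region's fixed bits.
    return ITM_BASE | (aggregate * R)
```

With these numbers, consecutive source instruction boundaries map to consecutive 8-byte entries, and the same source address in two different address spaces maps to two different entries, which is the one-to-one relationship the claim asks for.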

 5619665   Emma                4/1997    Method and apparatus for the transparent emulation of an existing instruction-set architecture by an arbitrary underlying instruction-set architecture

ABSTRACT:   The invention provides means and methods for extending an instruction-set architecture without impacting the
software interface. This circumvents all software compatibility issues, and allows legacy software to benefit from new
architectural extensions without recompilation and reassembly. The means employed are a translation engine for translating
sequences of old architecture instructions into primary, new architecture instructions, and an extended instruction (EI) cache
memory for storing the translations. A processor requesting a sequence of instructions will look first to the EI-cache for a
translation, and if translations are unavailable, will look to a conventional cache memory for the sequence, and finally, if still
unavailable, will look to a main memory.

I claim: 
    1. A method for translating a series of one or more instructions of a first semantic type into one or more instructions of a
second semantic type, comprising the steps of: 

     providing a first memory; 
     providing a second memory; 
     translating a sequence of instructions of the first semantic type stored in the first memory into one or more primary
     instructions of the second semantic type and storing the instructions of the second type in the second memory; 
     upon a request from the processor for the sequence of instructions of the first semantic type: 
         providing the corresponding instructions of the second semantic type if available in the second memory;
         providing the sequence of instructions of the first semantic type if the corresponding instructions of the second
         semantic type are not available in the second memory.
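
The fetch order described in the abstract (EI-cache first, then the conventional cache, then main memory) might be modeled as below. Dictionaries keyed by the start address of a sequence stand in for the two caches and for memory, and the function names are invented for this sketch.

```python
def fetch(addr, ei_cache, icache, memory):
    """Return (kind, instructions) for the sequence starting at addr."""
    # First choice: a translation already stored in the EI-cache.
    if addr in ei_cache:
        return "translated", ei_cache[addr]
    # Otherwise supply the original sequence from the conventional cache,
    # filling that cache from main memory if it misses too.
    if addr not in icache:
        icache[addr] = memory[addr]
    return "original", icache[addr]

def translate_sequence(addr, memory, ei_cache, translate):
    """Translation engine: store a translated copy for later fetches."""
    ei_cache[addr] = [translate(insn) for insn in memory[addr]]
```

Because an untranslated sequence is still returned as-is, execution never has to wait for the translation engine in this model.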

[Others found.]

 4347565   Kareda et al.       8/1982    Address control system for software simulation 

ABSTRACT:   An address control system for software simulation in a virtual machine system having a virtual storage function.
When a simulator program is simulating an instruction of a program to be simulated, an address translation of an operand address
in the program to be simulated is achieved using a translation lookaside buffer, thereby greatly reducing the overhead for the
address translation during the simulator program execution. 
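
A simulator-side translation lookaside buffer of the kind the abstract describes could look roughly like this. The 4 KB page size, the direct-mapped organization, and the flat dictionary used as the simulated page table are assumptions for the sketch.

```python
PAGE_BITS = 12                      # assumed 4 KB pages
PAGE_MASK = (1 << PAGE_BITS) - 1

class SoftTLB:
    def __init__(self, page_table, entries=64):
        self.page_table = page_table      # simulated virtual page -> physical frame
        self.entries = entries
        self.slots = [None] * entries     # each slot holds (vpn, frame)
        self.walks = 0                    # slow page-table look-ups performed

    def translate(self, vaddr):
        vpn, offset = vaddr >> PAGE_BITS, vaddr & PAGE_MASK
        slot = vpn % self.entries
        cached = self.slots[slot]
        if cached is None or cached[0] != vpn:
            # Miss: do the expensive translation once and remember it.
            self.walks += 1
            cached = (vpn, self.page_table[vpn])
            self.slots[slot] = cached
        return (cached[1] << PAGE_BITS) | offset
```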

 4638423   Ballard             1/1987    Emulating computer 

ABSTRACT:   An apparatus and method is disclosed for providing an emulating computer. The present invention consists of a
computer having a storage area, processing unit, control circuits and translation circuit. The original instructions are first loaded
into the storage area. When the processor attempts to operate an instruction the control circuit loads a section of the instructions
into the translating circuit. These instructions are then translated and stored in a memory area of the translating circuit having the
address of the original instruction. The processor unit then accesses the storage area and retrieves the translated instruction. 

What is claimed is: 
    7. A method of emulating a computer comprising the steps of: 

     transmitting an instruction to a processing unit; 
     checking a cache memory for a translated instruction; 
     loading an instruction block into an instruction memory if said translated instruction is not in said cache memory; 
     translating an instruction of said instruction block providing a translated instruction; 
     storing said translated instruction in said cache memory; and 
     transmitting said translated instruction from said cache memory to said processing unit.
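
Claim 7 above amounts to translate-on-miss with a cache of translated instructions. A minimal software sketch follows, assuming a block size of four instructions and a caller-supplied translate function; the patent describes hardware, so this only mirrors the sequence of steps.

```python
BLOCK = 4   # assumed number of instructions loaded per miss

def make_emulator(program, translate):
    """Return (fetch, stats); fetch(addr) yields the translated instruction."""
    cache = {}                      # original address -> translated instruction
    stats = {"misses": 0}

    def fetch(addr):
        if addr not in cache:
            # Miss: load the enclosing block, translate it, and store the
            # results under the original instruction addresses.
            stats["misses"] += 1
            start = addr - addr % BLOCK
            for a, insn in enumerate(program[start:start + BLOCK], start):
                cache[a] = translate(insn)
        return cache[addr]

    return fetch, stats
```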

http://www.nwlink.com/~tigger/altair.html.
Find/write up a 1984 bib cite on a Bell Labs project to emulate the PDP-11. They implemented it in portable FORTRAN (minus some host-specific workarounds to handle random-access files for swapping the simulated memory). They were able to boot Unix straight from distribution tapes. The work was done in 1981-82, I believe. The intention was to simplify bootstrapping Unix on new hardware in environments that did not have an existing Unix machine. The objective sort of failed: by the time they got it working, Unix was so successful that few locations were that desperate. Their slowdown was about a factor of 120, and among other ideas they said that they could re-implement the interpreter kernel using threaded code for performance.
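
For readers unfamiliar with the threaded-code idea mentioned in the note above, the sketch below contrasts an interpreter that decodes each instruction word every time it executes with one that pre-decodes the program once into a list of handler references. The three-opcode accumulator machine is invented for illustration and has nothing to do with the PDP-11 or with the Bell Labs implementation; Python function references stand in for the routine addresses of classic threaded code.

```python
def op_load(state, arg):
    state["acc"] = arg

def op_add(state, arg):
    state["acc"] += arg

def op_halt(state, arg):
    state["running"] = False

# Toy encoding (an assumption): word = opcode << 8 | operand.
HANDLERS = {0: op_load, 1: op_add, 2: op_halt}

def run_decode_each_time(words):
    """Baseline: extract opcode and operand on every execution."""
    state = {"acc": 0, "running": True}
    pc = 0
    while state["running"]:
        word = words[pc]
        pc += 1
        HANDLERS[word >> 8](state, word & 0xFF)
    return state["acc"]

def thread(words):
    """Pre-decode once into (handler, operand) pairs."""
    return [(HANDLERS[w >> 8], w & 0xFF) for w in words]

def run_threaded(threaded):
    """Inner loop does no decoding, only calls through the stored references."""
    state = {"acc": 0, "running": True}
    pc = 0
    while state["running"]:
        handler, arg = threaded[pc]
        pc += 1
        handler(state, arg)
    return state["acc"]
```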
TeraGen emulating microcontroller. Following from an EE Times article by David Lammers, TeraGen architecture primes single engine for multiple instruction sets (01/25/99, 02:08:32 PM EDT).
- TeraGen Corp., Sunnyvale CA.
- Microcontroller.
- Translates multiple ISAs on the fly to ``POPs'' (primitive operations) for scheduling on a VLIW.
- TeraGen cofounder Don Sollers was a principal architect of the DSP architecture being brought to market by ZSP Corp., which uses conventional superscalar techniques to increase signal-processing throughput. Sollers earlier worked on processors at Digital and Sun and was principal architect of the Supersparc II.
- The key advantage ... will be its ability to execute the code from several different processing cores on one engine.
- The TeraGen engine is adapted to additional instruction sets by adding a ``small block of fast ROM'' to govern the translation of the new instructions into POPs.
- A ROM can also be set up to translate a set of hardware functions into POPs. The TeraGen engine could thus be configured to emulate peripheral functions as well as other processors.
- Instruction streams for all processors/emulated peripherals flow to the scheduler, where each is translated into POPs. The resulting streams of VLIW-like operations are then scheduled for execution.
- A key feature is ... ``a large, fast data cache used for register emulation. By allocating cache locations to represent each of the registers for each of the instruction sets it is emulating, the TeraGen engine apparently can blend POPs from different streams of instructions into a single flow. It thus can theoretically find opportunities for parallelism that would escape a conventional superscalar or VLIW architecture.''
- Put another way, the system uses a virtual register file in a cache to emulate the register file. Sollers said ``This is part of our secret sauce: The cache can be accessed as a register.''
- Sollers said ``We have the capability to manage and schedule operations within an RTOS. This approach would allow the control logic to run at the same speed as the data path, which is what real-world multiprocessing is all about.''
- Note that all emulated processors/devices get faster together as the TeraGen is made faster.
- CEO George Alexy was recruited from Cirrus Logic Inc. in mid-1998 to head up TeraGen.
- Alexy said two semiconductor companies have taken licenses, initially for 8-bit applications; they will reach silicon within the year.
- Will Strauss, principal at Forward Concepts (Tempe, Ariz.), says there may be legal questions about emulation.
- Strauss says most DSP designs use a Harvard architecture; TeraGen employs a unique register-file approach.
- Quoting from the above article, ``The ability to reuse code while combining a DSP and an MCU may be unique to TeraGen, Sollers said. The StarCore approach now being developed by Motorola and Lucent is working toward combining a DSP and an MCU on the same die, but Sollers claimed that the StarCore effort "will almost be forced to adopt a new ISA. In our approach, we allow people to use a familiar ISA. From a top-level perspective, what we are doing is allowing people to configure a system-on-chip through software. That is where the flexibility of this approach comes from."''
- TeraGen "breaks very complex tasks into primitives very quickly, to achieve an advantage that way. The POPs are long instructions -- a native instruction set that is dramatically different from what previous architectures have attempted. How we hierarchically establish our instructions is our inherent advantage."
- TeraGen has a staff of about 20 engineers.
- Analyst Strauss said TeraGen may quickly run into intellectual-property issues.
- TeraGen has attracted $9 million in investment capital from Sequoia Capital Partners and InterWest Partners.
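
The register-emulation idea in the notes above (cache locations standing in for the registers of each emulated instruction set, so that POPs from different streams can be blended) can be pictured with a toy model. Everything below is hypothetical: the two-instruction source language, the POP format, and the round-robin blending are invented for illustration and say nothing about how TeraGen's hardware actually works.

```python
import itertools

class RegisterCache:
    """Maps (instruction set, register name) to a private cache slot."""
    def __init__(self):
        self.slot_of = {}
        self.data = []

    def slot(self, isa, reg):
        key = (isa, reg)
        if key not in self.slot_of:
            self.slot_of[key] = len(self.data)
            self.data.append(0)
        return self.slot_of[key]

def to_pops(isa, program, cache):
    """Translate toy (op, dst, src) instructions into POPs over cache slots."""
    pops = []
    for op, dst, src in program:
        if op == "li":       # load immediate
            pops.append(("li", cache.slot(isa, dst), src))
        elif op == "add":    # dst += src
            pops.append(("add", cache.slot(isa, dst), cache.slot(isa, src)))
        else:
            raise ValueError(op)
    return pops

def interleave(*streams):
    """Blend POP streams round-robin, keeping each stream's internal order."""
    merged = []
    for group in itertools.zip_longest(*streams):
        merged.extend(p for p in group if p is not None)
    return merged

def execute(pops, cache):
    for op, dst, src in pops:
        if op == "li":
            cache.data[dst] = src
        else:
            cache.data[dst] += cache.data[src]
```

Because each stream's registers live in their own slots, a register named r0 in one emulated instruction set does not collide with r0 in another, even when the two POP streams are interleaved.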

DATE:  June 10, Thursday, 2:30
TITLE:  Jalapeno --- a new Java Virtual Machine for Servers

SPEAKER: Vivek Sarkar
         IBM T. J. Watson Research Center

ABSTRACT:

In this talk, we give an overview of the Jalapeno Java Virtual Machine
(JVM) research project at the IBM T. J. Watson Research Center.  The
goal of Jalapeno is to expand the frontier of JVM technologies for
server nodes --- especially in the areas of dynamic optimized
compilation and specialization, scalable exploitation of multiple
processors in SMPs, and the use of a JVM as a 7x24 application server.

The Jalapeno JVM has two key distinguishing features.  First, the
Jalapeno JVM takes a compile-only approach to program execution.
Instead of providing both an interpreter and a JIT/dynamic compiler,
it provides two dynamic compilers --- a quick non-optimizing
"baseline" compiler, and a slower production-strength optimizing
compiler.  Both compilers share the same interfaces with the rest of
the JVM, thus making it easy to mix execution of unoptimized methods
with optimized methods.  Second, the Jalapeno JVM is itself
implemented in Java!  This design choice brings with it several
advantages as well as technical challenges. The advantages include a
uniform memory space for JVM objects and application objects, and ease
of portability.  The key technical challenge is to overcome the large
performance penalties of executing Java code (compared to native code)
that has been the experience of current JVMs; if we succeed in doing
so, we will simultaneously improve the performance of our JVM as well
as of the applications running on our JVM.

The Jalapeno project was initiated in January 1998 and is still work
in progress.  This talk will highlight our design decisions and early
experiences in working towards our goal of building a high-performance
JVM for SMP servers.

Anything you know about that I haven't included and any bugs you find that I haven't fixed.

From instruction-set simulation and tracing