- [ASH 86]
-
\bibitem{ASH:86}
Anant Agarwal,
Richard L. Sites
and Mark Horowitz,
``ATUM: A New Technique for Capturing Address Traces Using Microcode,''
Proceedings of the 13th International Symposium on Computer
Architecture (ISCA-13),
June 1986,
pp.~119-127.
- [AS 92]
-
\bibitem{AS:92}
Kristy Andrews
and
Duane Sand,
``Migrating a CISC Computer Family onto RISC via Object Code Translation,''
Proceedings of the Fifth International Conference on Architectural
Support for Programming Languages and Operating Systems (ASPLOS-V),
October 1992,
pp.~213-222.
- [BL 94]
-
\bibitem{BL:94}
Thomas Ball,
and
James R. Larus,
``Optimally Profiling and Tracing Programs,''
ACM Transactions on Programming Languages and Systems,
(16)2,
May 1994,
- [Baumann 86]
-
\bibitem{Baumann:86}
Robert A. Baumann,
``Z80MU,''
Byte Magazine,
October 1986,
pp.~203-216.
- [Jeremiassen 00]
-
\bibitem{Jeremiassen:00}
Tor E. Jeremiassen,
``Sleipnir --- An Instruction-Level Simulator Generator,''
International Conference on Computer Design, pp.~23--31. IEEE, 2000.
- [Bedichek 90]
-
\bibitem{Bedichek:90}
Robert Bedichek,
``Some Efficient Architecture Simulation Techniques,''
Winter 1990 USENIX Conference,
January 1990,
pp.~53-63.
- [Bedichek 94]
-
\bibitem{Bedichek:94}
Robert Bedichek,
``The Meerkat Multicomputer: Tradeoffs in Multicomputer Architecture,''
Doctoral Dissertation,
University of Washington Department of Computer Science and Engineering
technical report 94-06-06, 1994.
- [Bedichek 95]
-
\bibitem{Bedichek:95}
@inproceedings(Bedichek:95,
author = "Robert C. Bedichek",
title = "Talisman: Fast and Accurate Multicomputer Simulation",
booktitle="Proceedings of the 1995 ACM SIGMETRICS Conference on
Modeling and Measurement of Computer Systems",
month=May,
year="1995",
pages="14--24"
)
- [BKLW 89]
-
\bibitem{BKLW:89}
Anita Borg,
R. E. Kessler,
Georgia Lazana
and
David W. Wall,
``Long Address Traces from RISC Machines: Generation and Analysis,''
Digital Equipment Western Research Laboratory Research Report 89/14,
(appears in shorter form as~\cite{BKW:90})
September 1989.
- [BKW 90]
-
\bibitem{BKW:90}
Anita Borg, R. E. Kessler and
David W. Wall,
``Generation and Analysis of Very Long Address Traces,''
Proceedings of the 17th Annual Symposium on Computer Architecture (ISCA-17),
May 1990,
pp.~270-279.
- [Boothe 92]
-
\bibitem{Boothe:92}
Bob Boothe,
``Fast Accurate Simulation of Large Shared Memory Multiprocessors,''
technical report UCB/CSD 92/682,
University of California, Berkeley, Computer Science Division,
April 1992.
- [BDCW 91]
-
\bibitem{BDCW:91}
Eric A. Brewer,
Chrysanthos N. Dellarocas, Adrian Colbrook and
William E. Weihl,
``{\sc Proteus}: A High-Performance Parallel-Architecture Simulator,''
Massachusetts Institute of Technology technical report
MIT/LCS/TR-516,
1991.
- [BAD 87]
-
\bibitem{BAD:87}
Eugene D. Brooks III, Timothy S. Axelrod and Gregory A. Darmohray,
``The Cerberus Multiprocessor,''
Lawrence Livermore National Laboratory technical report,
Preprint UCRL-94914,
1987.
- [Chamberlain 94]
-
\bibitem{Chamberlain:94}
Steve Chamberlain, Personal communication, 1994.
- [CUL 89]
-
\bibitem{CUL:89}
Craig Chambers, David Ungar and Elgin Lee,
``An Efficient Implementation of {\sc Self}, a Dynamically-Typed
Object-Oriented Language Based on Prototypes,''
OOPSLA '89 Proceedings,
October 1989,
pp.~49-70.
- [CHRG 95]
-
%A John Chapin
%A Steve Herrod
%A Mendel Rosenblum
%A Anoop Gupta
%T Memory System Performance of UNIX on CC-NUMA Multiprocessors
%J ACM SIGMETRICS '95
%P 1-13
%D May 1995
%W ftp://www-flash.stanford.edu/pub/hive/numa-os.ps
- [CHKW 86]
-
\bibitem{CHKW:86}
Fred Chow, A. M. Himelstein, Earl Killian and L. Weber,
``Engineering a RISC Compiler System,''
IEEE COMPCON,
March 1986.
- [CG 93]
-
\bibitem{CG:93}
Cristina Cifuentes
and K.J. Gough,
``A Methodology for Decompilation,''
In Proceedings of the XIX Conferencia Latinoamericana de Informatica,
pp.~257-266,
Buenos Aires, Argentina, August 1993.
- [CG 94]
-
\bibitem{CG:94}
Cristina Cifuentes
and
K.J. Gough,
``Decompilation of Binary Programs,''
Technical report 3/94,
Queensland University of Technology, School of Computing Science,
1994.
- [CG 95]
-
\bibitem{CG:95}
C. Cifuentes and K. John Gough,
``Decompilation of Binary Programs,''
Software -- Practice and Experience, July 1995.
Describes general techniques and an 80286/DOS to C converter.
- [Cifuentes 93]
-
\bibitem{Cifuentes:93}
C. Cifuentes,
``A Structuring Algorithm for Decompilation'', Proceedings of the XIX
Conferencia Latinoamericana de Informatica, Aug 1993, Buenos Aires,
pp.~267-276.
- [Cifuentes 94a]
-
\bibitem{Cifuentes:94a}
Cristina Cifuentes,
``Interprocedural Data Flow Decompilation,''
Technical report 4/94,
Queensland University of Technology, School of Computing Science, 1994.
- [Cifuentes 94b]
-
\bibitem{Cifuentes:94b}
Cristina Cifuentes,
``Reverse Compilation Techniques,''
Doctoral dissertation,
Queensland University of Technology,
July 1994.
- [Cifuentes 94c]
-
\bibitem{Cifuentes:94c}
C. Cifuentes,
``Structuring Decompiled Graphs,''
Technical Report 4/94, Queensland University of
Technology, Faculty of Information Technology, April 1994.
- [Cifuentes 95]
-
\bibitem{Cifuentes:95}
C. Cifuentes,
``Interprocedural Data Flow Decompilation'', Journal of Programming Languages.
In print, 1995.
- [Cifuentes 95b]
-
\bibitem{Cifuentes:95b}
C. Cifuentes,
``An Environment for the Reverse Engineering of Executable Programs''.
To appear: Proceedings of the Asia-Pacific Software Engineering
Conference (APSEC). IEEE. Brisbane, Australia. December 1995.
- [Conte & Gimarc 95]
-
``Fast Simulation of Computer Architectures'',
Thomas M. Conte and Charles E. Gimarc, Editors.
Kluwer Academic Publishers, 1995.
ISBN 0-7923-9593-X.
- [CDKHLWZ 00]
-
%A Robert F. Cmelik
%A David R. Ditzel
%A Edmund J. Kelly
%A Colin B. Hunter
%A Douglas A. Laird
%A Malcolm John Wing
%A Gregorz B. Zyner
%T Combining Hardware and Software to Provide an Improved Microprocessor
%R United States Patent #US6031992
Available as of 2000/03 via
http://www.patents.ibm.com/details?&pn=US06031992__
- [98]
-
US06011908
01/04/2000
Gated store buffer for an advanced microprocessor
Available as of 2000/03 via
- [98]
-
US05958061
09/28/1999
Host microprocessor with apparatus for temporarily holding target processor state
Available as of 2000/03 via
- [Cmelik 93a]
-
\bibitem{Cmelik:93a}
Robert F. Cmelik,
``Introduction to Shade,''
Sun Microsystems Laboratories, Incorporated,
February 1993.
- [Cmelik 93b]
-
\bibitem{Cmelik:93b}
Robert F. Cmelik,
``The Shade User's Manual,''
Sun Microsystems Laboratories, Incorporated,
February 1993.
- [Cmelik 93c]
-
\bibitem{Cmelik:93c}
Robert F. Cmelik,
``SpixTools Introduction and User's Manual,''
Sun Microsystems Laboratories, Incorporated,
technical report TR93-6,
February 1993.
- [CK 93]
-
\bibitem{CK:93}
Robert F. Cmelik,
and
David Keppel,
``Shade: A Fast Instruction-Set Simulator for Execution Profiling,''
Sun Microsystems Laboratories, Incorporated, and the University of
Washington,
technical report
SMLI 93-12
and UWCSE
93-06-06,
1993.
- [CK 94]
-
\bibitem{CK:94}
Robert F. Cmelik,
and
David Keppel,
``Shade: A Fast Instruction-Set Simulator for Execution Profiling,''
Proceedings of the 1994 ACM SIGMETRICS Conference
on Measurement and Modeling of Computer Systems,
May 1994,
pp.~128-137.
- [CK 95]
-
\bibitem{CK:95}
Robert F. Cmelik,
and
David Keppel,
``Shade: A Fast Instruction-Set Simulator for Execution Profiling,''
Appears as Chapter~2 of
``[Conte & Gimarc 95]'',
pp.~5-46.
- [CMMJS 88]
-
\bibitem{CMMJS:88}
R. C. Covington, S. Madala, V. Mehta, J. R. Jump and J. B. Sinclair,
``The Rice Parallel Processing Testbed,''
Proceedings of the 1988 ACM SIGMETRICS Conference on Measurement and
Modeling of Computer Systems,
1988,
pp.~4-11.
- [DLHH 94]
-
\bibitem{DLHH:94}
Peter Davies, Philippe LaCroute, John Heinlein and
Mark Horowitz,
``Mable: A Technique for Efficient Machine Simulation,''
Quantum Effect Design, Incorporated, and Stanford University
technical report CSL-TR-94-636,
1994.
- [DGH 91]
-
\bibitem{DGH:91}
Helen Davis, Stephen R. Goldschmidt and John Hennessy,
``Multiprocessor Simulation and Tracing Using Tango,''
Proceedings of the 1991 International Conference on Parallel
Processing (ICPP, Vol. II, Software),
August 1991,
pp.~II 99-107.
- [Deutsch 83]
-
\bibitem{Deutsch:83}
Peter Deutsch,
``The Dorado Smalltalk-80 Implementation: Hardware Architecture's
Impact on Software Architecture,''
Smalltalk-80: Bits of History, Words of Advice,
1983,
Addison-Wesley,
pp.~113-126.
Review/summary by Pardo:
- Describes a mostly-microcode implementation of the ST-80 VM.
Runs on a Xerox Dorado; fastest ST-80 implementation in its
day.
About 85-95% of the execution time is spent in the Dorado's
ST-80 microcode.
- [DS 84]
-
\bibitem{DS:84}
Peter Deutsch and Alan M. Schiffman,
``Efficient Implementation of the Smalltalk-80 System,''
11th Annual Symposium on Principles of Programming Languages (POPL-11),
January 1984,
pp.~297-302.
- [DM 87]
-
\bibitem{DM:87}
David R. Ditzel and
Hubert R. McLellan,
``Branch Folding in the CRISP Microprocessor: Reducing Branch Delay
to Zero,''
Proceedings of the 14th Annual International Symposium on Computer
Architecture; Computer Architecture News,
Volume 15, Number 2,
June 1987,
pp.~2-9.
- [DMB 87]
-
\bibitem{DMB:87}
David R. Ditzel,
Hubert R. McLellan
and
Alan D. Berenbaum,
``The Hardware Architecture of the CRISP Microprocessor,''
Proceedings of the 14th Annual International Symposium on Computer
Architecture; Computer Architecture News,
Volume 15, Number 2,
June 1987,
pp.~309-319.
- [EKKL 90]
-
\bibitem{EKKL:90}
Susan J. Eggers,
David Keppel,
Eric J. Koldinger and Henry M. Levy,
``Techniques for Efficient Inline Tracing on a Shared-Memory
Multiprocessor,''
Proceedings of the 1990 ACM SIGMETRICS Conference on Measurement and
Modeling of Computer Systems,
May 1990,
pp.~37-47.
- [ES 94]
-
\bibitem{ES:94}
Alan Eustace
and
Amitabh Srivastava,
``ATOM: A Flexible Interface for Building High Performance Program Analysis Tools,''
Technical note
TN-44,
Digital Equipment Corporation Western Research Laboratory,
July 1994.
- [ES 95]
-
\bibitem{ES:95}
Alan Eustace
and
Amitabh Srivastava,
``ATOM: A Flexible Interface for Building High Performance Program
Analysis Tools,''
Proceedings of the USENIX 1995
Technical Conference on UNIX and Advanced Computing Systems,
New Orleans, Louisiana,
January 16-20, 1995,
pp. 303-314.
- [EY 03]
-
\bibitem{EY:03}
Hideki Eiraku and Yasushi Shinjo,
``Running BSD Kernels as User Processes by Partial Emulation and Rewriting of Machine Instructions'',
Proceedings of BSDCon '03,
San Mateo, CA, USA,
8-12 September 2003.
- [Fujimoto 83]
-
\bibitem{Fujimoto:83}
Richard M. Fujimoto,
``Simon: A Simulator of Multicomputer Networks,''
technical report UCB/CSD 83/137,
ERL, University of California, Berkeley,
1983.
- [FC 88]
-
\bibitem{FC:88}
Richard M. Fujimoto,
and William B. Campbell,
``Efficient Instruction Level Simulation of Computers,''
Transactions of The Society for Computer Simulation 5(2),
April 1988,
pp.~109-123.
- [FP 94]
-
\bibitem{FP:94}
FlashPort product literature,
AT&T Bell Laboratories,
August 1994.
- [FN 75]
-
\bibitem{FN:75}
M. J. Flynn and
C. Neuhauser,
``EMMY -- An Emulation System for User Microprogramming,''
National Computer Conference,
1975,
pp.~85-89.
Review/summary by Pardo:
- EMMY is a user-microprogrammable machine designed to allow
easy microprogramming in order to emulate other machines. The goal
is to facilitate inter-architecture comparisons, analyze the
effectiveness of architectures and compilers (optimizations)
through ``software probes'', and to develop new architectures (in
order to solve the ``semantic gap'' problem).
- Features of special hardware include: field handling and selection,
shifting, extensive bit testing and flexible specification of data
paths (which they call ``residual control''; I don't understand the
last one).
- EMMY was designed with ``severe'' cost constraints. Goal is to
fit whole CPU on one PC board. It is 32-bit, 4,096 words (32-bit)
of microstore. There are 7 general-purpose registers and an eighth
status register that includes condition codes, memory busy, status
(halt/run/disable interrupts) and microinstruction PC.
- Each microinstruction has ``a high degree of parallelism'' for
a ``machine of this size'', by which I assume they mean something
that is ~vertically microcoded. Each 32-bit microinstruction is,
basically, a 2-wide VLIW, where one half performs logical and branch
operations and the other branch and memory operations.
- Implemented in TTL plus some MECL-10K (a Motorola ECL).
The CPU is on a 12" x 15" card with micromemory and console logic on
separate cards. The clock is 25ns; microstorage has 60ns to
access, with a 200ns cycle time; microinstructions are typically
executed every 200ns; they estimate it takes ~10 microseconds to
execute each simulated instruction.
- No discussion of multitasking/privilege/VM/etc.
- [Gill 51]
-
\bibitem{Gill:51}
S. Gill,
``The Diagnosis Of Mistakes In Programmes on the EDSAC,''
Proceedings of the Royal Society Series A Mathematical and Physical
Sciences,
22 May 1951,
(206)1087,
pp.~538-554,
Cambridge University Press
London and New York.
- [GDB 94]
-
\bibitem{GDB:94}
GNU debugger and simulator,
Internet Universal Resource Locator
{\mbox{\tt ftp://prep.ai.mit.edu/pub/gnu}},
GDB distribution, {{\tt sim}} subdirectory.
Note that (as of 1998) for each simulator included with GDB there is
also a GCC target and a set of runtime libraries.
- [GH 92]
-
\bibitem{GH:92}
Stephen R. Goldschmidt and John L. Hennessy,
``The Accuracy of Trace-Driven Simulations of Multiprocessors,''
Stanford University Computer Systems Laboratory,
technical report CSL-TR-92-546,
September 1992.
- [Granlund 94]
-
\bibitem{Granlund:94}
Torbj\"{o}rn Granlund,
``The Cygnus Simulator Proposal,''
Cygnus Support, Mountain View, California, March 1994.
- [Grossman 94]
-
\bibitem{Grossman:94}
Stu Grossman, Personal communication,
November 1994.
- [Halfhill 94]
-
\bibitem{Halfhill:94}
Tom R. Halfhill,
``Emulation: RISC's Secret Weapon,''
Byte, April 1994, pp.~119-130.
- [Halfhill 00]
-
\bibitem{Halfhill:00}
Tom R. Halfhill,
``Transmeta Breaks x86 Low-Power Barrier,''
Microprocessor Report, February 14, 2000.
Review/summary by Pardo:
- Basic architecture: a VLIW core plus software to translate x86 code
to native VLIW code at run time.
- Related tools include:
FWB's SoftWindows for the Macintosh and Unix,
Connectix's Virtual PC for the Macintosh,
FX!32 for Alpha,
Sun's HotSpot Java JIT.
- All are "emulators" -- optimizing and caching translated code is a
performance-enhancing technique that does not change the
fundamentals of what is going on.
- Crusoe has special hardware to assist emulation.
- Related tools include International Meta Systems (IMS) 3250,
a never-produced design to emulate x86, 68K and 6502 using
customizable microcode plus ...
(See uPR 5/6/92-03, ``Microcode Engine Offers Enhanced Emulation'').
- Crusoe hardware is not specifically for x86 emulation;
it is meant to boost the performance of any non-native executable.
For example, see Transmeta Java->VLIW demonstration.
Still, probably more than coincidence Crusoe chips
have 80-bit FP, partial register writes, etc.
- Crusoe's most important accomplishments:
- combining VLIW and emulation;
- HW/SW technology to vary processor voltage and
frequency adaptively depending on workload
(``LongRun'');
- new standard for low power among x86-compatible processors;
- sacrifices less performance to emulation than other
software translators -- Crusoe chips are slower than other
similar-frequency x86 processors, but that is because the core
is optimized for low power, not high performance.
- 700-MHz Crusoe is about 70% the speed of a 700-MHz Pentium III.
- Integrated PCI controller, DRAM controllers, and other
components of a traditional north bridge.
Gains the power and efficiency benefits of on-chip memory
controllers.
- Architectural features of little importance to SW developers
because nobody writes software for the native architecture.
- Different Crusoe models not compatible at the host level,
are compatible at the target level.
- Discussion of LongRun. [[Not relevant to this page except to note
that it is transparent. --pardo]]
- Overhead of emulation is highly variable.
Traditionally about 10:1.
Modern emulators use caching, optimized recompilation for 4:1.
- For mobile markets, x86 has historically been ousted by RISCs
due to lower power consumption of RISCs.
- x86 instruction decoding in software.
- Current Intel/AMD processors convert x86 to micro-ops,
while Cyrix and Centaur execute x86 directly.
Micro-op translation adds a pipe stage and has more
overhead (microcode call) for complex instructions.
- "Micro-op" conversion in software saves control logic and simplifies
chip design.
- Can fix many kinds of bugs in software.
- Emulation is at least as old as 1964, when IBM's S/360
provided emulation for IBM's older 1401s.
[[Arguably, [Gill 51].
See also [Tucker 65] and
[Wilkes 69] -- pardo]]
- What is new: emulation is not an alternative, it is the whole
strategy.
[[Arguably, microcode is "the whole strategy" for many
processors; but Crusoe uses dynamic compilation of microcode
and skips most hardware used in a traditional microcode
approach. --pardo]]
- HW features to assist CMS:
- register files, many with shadows;
- gated store buffer;
- 80-bit FP registers;
- per-instruction "commit" bit;
- alias hardware to allow memory reference reordering;
- MMU protection of translated memory;
- special caches for translation software
Shadow state lets Crusoe execute speculatively
and out of order.
On an exception, can roll back to most recent committed state
by a simple copy.
Preserves precise exception model.
- On boot, reserves fixed block (e.g., 16MB) for translator
and recompiled code.
- Interpretation requires at least 12 clock cycles per x86 instruction.
- Dynamically profile code and select it for translation,
maybe optimization.
- Granularity of translation is one or more basic blocks.
- Familiar optimizations: loop unrolling, common subexpression
elimination, loop-invariant code removal, ...
Some are x86-specific: skip redundant sets of x86 condition
codes.
Some are VLIW-specific: combine multiple x86 blocks into one
VLIW block.
- Code expansion
- by increasing the number of instructions to do the same work; or
- by translating compact x86 instructions to longer VLIW equivalents.
Example from [Transmeta 00]:
- 20 x86 instructions to 10 VLIW instructions but 23 "useful"
VLIW packets plus 7 NOP packets. Total 30 subinstructions, 50%
more than the original x86 code.
- Further expansion because VLIW packets are 32b, but
"typical" x86 instructions are 16-24b.
- I$ expansion may be 33% to 150%
- Reduces effective size of caches including translation
cache.
- If translation cache flushing is frequent, hard to amortize
cost of translation.
- Cost of extra RAM for translation cache.
No free lunch -- tradeoff to get lower power.
- It is a VLIW, efficiency depends on scheduling -- no dynamic
reordering hardware.
For comparison, the IA-64 group has been refining Multiflow's
VLIW compiler for years; Transmeta started from scratch, and the
compiler has to run in real time, not overnight.
- Factor in Crusoe's favor: monitor actual usage.
- At time of writing this article, all results based on Transmeta
claims, no independent results.
- Crusoe's technology defies benchmarking: too heavy on repetitive
loops, overestimates performance; conversely, low-repeat
benchmarks may underestimate real performance.
Battery tests, similar: unless they mirror real-life use
closely, they will not represent what average users can
expect.
- Emulator is no longer new technology: Java JIT compilers,
Windows emulators,
dozens of emulators including otherwise-dead machines like
Apple II, Atari 2600, Commodore 64.
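The profile-then-translate flow summarized in the review above (interpret first, count executions, then translate and cache the hot blocks) can be sketched roughly as follows. This is a toy illustration, not Transmeta's Code Morphing Software: the `ToyEmulator` class, the `HOT_THRESHOLD` value, and the representation of guest instructions as Python callables are all assumptions made for the example.

```python
from collections import Counter

HOT_THRESHOLD = 3  # hypothetical: interpreted runs before a block is translated


class ToyEmulator:
    """Interprets guest blocks, translating a block once it becomes hot."""

    def __init__(self, blocks):
        self.blocks = blocks            # block id -> list of callables(state)
        self.run_counts = Counter()     # dynamic profile: interpreted runs per block
        self.translation_cache = {}     # block id -> translated callable

    def translate(self, block_id):
        # Stand-in for real code generation: freeze the block's operations
        # into a single callable that later runs can invoke directly.
        ops = tuple(self.blocks[block_id])

        def translated(state):
            for op in ops:
                op(state)

        return translated

    def run_block(self, block_id, state):
        """Execute one block and report which path ran it."""
        cached = self.translation_cache.get(block_id)
        if cached is not None:
            cached(state)
            return "translated"
        for op in self.blocks[block_id]:    # slow path: interpret
            op(state)
        self.run_counts[block_id] += 1
        if self.run_counts[block_id] >= HOT_THRESHOLD:
            self.translation_cache[block_id] = self.translate(block_id)
        return "interpreted"


def increment(state):
    """Toy guest instruction: add one to the accumulator."""
    state["acc"] += 1


if __name__ == "__main__":
    emu = ToyEmulator({"loop": [increment, increment]})
    state = {"acc": 0}
    print([emu.run_block("loop", state) for _ in range(5)], state["acc"])
```

Clearing `translation_cache` in this sketch forces a block back through the interpreter and the translator, which is the amortization concern the review raises about frequent cache flushes.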
- [Haygood 1999]
-
%A Bill Haygood
%T Emulators and Emulation
%J Self (http://www.haygood.org/~bill/emul/index.html)
%D 1999
Review/summary by Pardo:
- Available at the URL above; a second copy is reprinted with permission.
See ``Bill Haygood'' for contact information.
- Briefly describes implementation techniques for
three simulators used to simulate the PDP-8/e,
Zilog Z80A, and DEC LSI-11.
- Implementation tradeoffs favor large lookup tables to
reduce computation.
Instruction handlers are dispatched using a table
indexed by opcode; condition codes on the Z-80
emulator are computed by indexing a 2^16-entry table by
both 8-bit operands.
- Details of finding out what the actual operations do.
- PDP-8a FP coprocessor: emulated using IEEE host hardware, which has
6 bits less precision and a smaller exponent range than the PDP-8a.
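The table-driven condition-code technique described above can be sketched as follows: a 2^16-entry table, indexed by both 8-bit operands, is filled once and then consulted on every emulated add. This is a minimal illustration, not Haygood's code. The names `add_flags`, `ADD_FLAGS`, and `emulate_add` are hypothetical, only the documented Z80 flag bits for an 8-bit ADD are modeled, and the incoming carry is ignored.

```python
# Documented Z80 flag bits; the undocumented bits 3 and 5 are left clear.
FLAG_S, FLAG_Z, FLAG_H, FLAG_PV, FLAG_C = 0x80, 0x40, 0x10, 0x04, 0x01


def add_flags(a, b):
    """Compute the flag byte for the 8-bit addition a + b the slow way."""
    total = a + b
    result = total & 0xFF
    flags = 0
    if result & 0x80:
        flags |= FLAG_S                     # sign: bit 7 of the result
    if result == 0:
        flags |= FLAG_Z                     # zero
    if (a & 0x0F) + (b & 0x0F) > 0x0F:
        flags |= FLAG_H                     # half carry out of bit 3
    if (~(a ^ b) & (a ^ result)) & 0x80:
        flags |= FLAG_PV                    # signed overflow
    if total > 0xFF:
        flags |= FLAG_C                     # carry out of bit 7
    return flags


# Built once at start-up: 65,536 one-byte entries indexed by (a << 8) | b.
ADD_FLAGS = bytes(add_flags(i >> 8, i & 0xFF) for i in range(1 << 16))


def emulate_add(a, b):
    """Return (result, flags) using a single table lookup for the flags."""
    return (a + b) & 0xFF, ADD_FLAGS[(a << 8) | b]


if __name__ == "__main__":
    result, flags = emulate_add(0x7F, 0x01)
    print(hex(result), hex(flags))
```

The trade-off is the one the review names: 64 KB of table per operation in exchange for removing the per-instruction flag arithmetic.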
- [HJ 92]
-
\bibitem{HJ:92}
Reed Hastings and Bob Joyce,
``Purify: Fast Detection of Memory Leaks and Access Errors,''
Proceedings of the Winter USENIX Conference,
January 1992,
pp.~125-136.
- [HP 93]
-
\bibitem{HP:93}
John Hennessy and David Patterson,
``Computer Organization and Design: The Hardware-Software
Interface''
(Appendix A, by
James R. Larus),
Morgan Kaufmann,
1993.
- [HCU 91]
-
\bibitem{HCU:91}
Urs H\"{o}lzle, Craig Chambers and David Ungar,
``Optimizing Dynamically-Typed Object-Oriented Languages With
Polymorphic Inline Caches,''
Proceedings of the European Conference on Object-Oriented
Programming (ECOOP),
July 1991,
pp.~21-38.
- [HU 94]
-
\bibitem{HU:94}
Urs H\"{o}lzle and David Ungar,
``Optimizing Dynamically-Dispatched Calls with
Run-Time Type Feedback,''
Proceedings of the 1994 ACM Conference on Programming Language
Design and Implementation (PLDI),
June, 1994,
pp.~326-335.
- [Hsu 89]
-
\bibitem{Hsu:89}
Peter Hsu,
``Introduction to Shadow,''
Sun Microsystems, Incorporated,
July 1989.
- [IMS 94]
-
\bibitem{IMS:94}
``IMS Demonstrates x86 Emulation Chip,''
Microprocessor Report,
9 May 1994,
pp.~5 and~15.
- [Irlam 93]
-
\bibitem{Irlam:93}
Gordon Irlam, Personal communication, February 1993.
- [James 90]
-
\bibitem{James:90}
David James,
``Multiplexed Busses: The Endian Wars Continue,''
IEEE Micro Magazine,
June 1990,
pp.~9-22.
- [Johnston 79]
-
\bibitem{Johnston:79}
Ronald L. Johnston,
``The Dynamic Incremental Compiler of APL{$\backslash$}3000,''
APL Quote Quad 9(4),
Association for Computing Machinery (ACM),
June 1979,
pp.~82-87.
- [KCW 98]
-
%A Edmund J. Kelly
%A Robert F. Cmelik
%A Malcolm John Wing
%T Memory Controller For A Microprocessor for Detecting A Failure Of Speculation On The Physical Nature Of A Component Being Addressed
%D 1998/11/03
%R United States Patent #05832205
Available as of 2000/03 via
http://www.patents.ibm.com/details?pn=US05832205__.
- [Keppel 91]
-
\bibitem{Keppel:91}
David Keppel,
``A Portable Interface for On-The-Fly Instruction Space Modification,''
Proceedings of the 1991 Symposium on Architectural Support for
Programming Languages and Operating Systems (ASPLOS-IV),
April 1991,
pp.~86-95
(source code is also available via anonymous ftp from
ftp://ftp.cs.washington.edu/pub/pardo/fly-1.1.tar.gz).
- [KEH 91]
-
\bibitem{KEH:91}
David Keppel,
Susan J. Eggers
and Robert R. Henry,
``A Case for Runtime Code Generation,''
University of Washington Computer Science and Engineering
technical report
UWCSE TR 91-11-04,
November 1991.
- [KEH 93]
-
\bibitem{KEH:93}
David Keppel,
Susan J. Eggers
and Robert R. Henry,
``Evaluating Runtime-Compiled Value-Specific Optimizations,''
University of Washington Computer Science and Engineering
technical report
UWCSE TR 93-11-02,
November 1993.
- [Killian 94]
-
\bibitem{Killian:94}
Earl Killian, Personal communication, February 1994.
- [KKB 98]
-
%A Alex Klaiber
%A David Keppel
%A Robert Bedichek
%T Method and Apparatus for Correcting Errors in Computer Systems
%D 18 May 1999
%R United States Patent #05905855
Available as of 2000/03 via
http://www.patents.ibm.com/details?pn=US05905855__
- [LOS 86]
-
\bibitem{LOS:86}
T. G. Lang, J. T. O'Quin II and R. O. Simpson,
``Threaded Code Interpreter for Object Code,''
IBM Technical Disclosure Bulletin,
28(10), March 1986, pp.~4238-4241.
- [Larus 93]
-
\bibitem{Larus:93}
James R. Larus,
``Efficient Program Tracing,''
IEEE Computer 26(5),
May 1993,
pp.~52-61.
- [LB 94]
-
\bibitem{LB:94}
James R. Larus and Thomas Ball,
``Rewriting Executable Files to Measure Program Behavior,''
Software -- Practice and Experience 24(1),
February 1994,
pp.~197-218.
- [LS 95]
-
\bibitem{LS:95}
James R. Larus,
and
Eric Schnarr,
``EEL: Machine-Independent Executable Editing,''
to appear: SIGPLAN Conference on Programming Language Design and
Implementation (PLDI),
pp. 291-300, June 1995.
- [Magnusson 93a]
-
\bibitem{Magnusson:93a}
Peter S. Magnusson,
``A Design For Efficient Simulation of a Multiprocessor,''
Proceedings of the First International Workshop on Modeling,
Analysis, and Simulation of Computer and Telecommunication
Systems (MASCOTS),
La Jolla, California,
January 1993,
pp.~69-78.
- [Magnusson 93b]
-
\bibitem{Magnusson:93b}
Peter S. Magnusson,
``Partial Translation,''
Swedish Institute of Computer Science
technical report T93:05,
1993.
- [MS 94]
-
\bibitem{MS:94a}
Peter S. Magnusson,
and David Samuelsson,
``A Compact Intermediate Format for SimICS,''
Swedish Institute of Computer Science
technical report R94:17,
September 1994.
- [MW 94]
-
\bibitem{MW:94}
Peter S. Magnusson,
and Bengt Werner,
``Some Efficient Techniques for Simulating Memory,''
Swedish Institute of Computer Science
technical report R94:16,
September 1994.
- [MW 95]
-
\bibitem{MW:95}
Peter S. Magnusson,
and Bengt Werner,
``Efficient Memory Simulation in SimICS''.
In 28th International Annual Simulation Symposium, Phoenix,
AZ. April 1995.
- [Matthews 94]
-
\bibitem{Matt:94}
Clifford T. Matthews,
``680x0 emulation on x86 (ARDI's syn68k used in Executor),''
USENET \code{comp.emulators.misc} posting,
3 November, 1994.
- [May 87]
-
\bibitem{May:87}
Cathy May,
``Mimic: A Fast S/370 Simulator,''
Proceedings of the ACM SIGPLAN 1987 Symposium on Interpreters and
Interpretive Techniques; SIGPLAN Notices 22(7),
June 1987,
pp.~1-13.
- [Nielsen 91]
-
\bibitem{Nielsen:91}
Robert D. Nielsen,
``DOS on the Dock,''
NeXTWorld,
March/April 1991,
pp.~50-51.
- [NG 87]
-
\bibitem{NG:87}
David Notkin
and William G. Griswold,
``Enhancement through Extension: The Extension Interpreter,''
Proceedings of the ACM SIGPLAN '87 Symposium on Interpreters and
Interpretive Techniques,
June 1987,
pp.~45-55.
- [NG 88]
-
\bibitem{NG:88}
David Notkin
and William G. Griswold,
``Extension and Software Development,''
Proceedings of the 10th International Conference on Software
Engineering,
Singapore,
April 1988,
pp.~274-283.
- [Kep 03]
-
\bibitem{Keppel:03}
David Keppel,
``How to Detect Self-Modifying Code During Instruction-Set Simulation'',
April, 2003.
Available as of 2003/10 from
``xsim.com'' (papers).
[PM 94]
\bibitem{PM:94}
Jim Pierce and Trevor Mudge,
``IDtrace -- A Tracing Tool for i486 Simulation,''
Proceedings of the International Workshop
on Modeling, Analysis and Simulation of
Computer and Telecommunication Systems
(MASCOTS),
January 1994.
[PRA 97]
\bibitem{PRA:97} Vijay S. Pai,
Parthasarathy Ranganathan, and
Sarita V. Adve, ``RSIM: An Execution-Driven Simulator for ILP-Based
Shared-Memory Multiprocessors and Uniprocessors,'' In Proceedings of
the Third Workshop on Computer Architecture Education, February 1997.
[Patil 96]
@TECHREPORT{Patil96,
TITLE = "Efficient {P}rogram {M}onitoring {T}echniques",
AUTHOR = "Harish Patil",
INSTITUTION = "Computer Sciences Department, University of Wisconsin",
YEAR = 1996,
MONTH = "July",
TYPE = "{TR} 1320: Ph.D. Dissertation",
ADDRESS = "Madison, Wisconsin",
}
[Patil 97]
@ARTICLE{patil97,
TITLE = "Low-cost, Concurrent Checking of Pointer and Array Accesses in
{C} Programs",
JOURNAL = "Software - Practice and Experience",
AUTHOR = "Harish Patil and Charles Fischer",
VOLUME = 27,
NUMBER = 1,
YEAR = 1997,
MONTH = "January",
PAGES = "87-110",
}
[PM 94b]
\bibitem{PM:94b}
Jim Pierce and Trevor Mudge,
``IDtrace -- A Tracing Tool for i486 Simulation,''
Technical report 203-94,
University of Michigan,
March 1994.
[Pittman 87]
\bibitem{Pittman:87}
Thomas Pittman,
``Two-Level Hybrid Interpreter/Native Code Execution for Combined
Space-Time Program Efficiency,''
Proceedings of the 1987 ACM SIGPLAN Symposium
on Interpreters and Interpretive Techniques,
June 1987,
pp.~150-152.
[Pittman 95]
\bibitem{Pittman:95}
Thomas Pittman,
``The RISC Penalty,''
IEEE Micro,
December 1995,
pp.~5, 76-80.
Brief summary by Pardo:
The paper analyzes the costs of RISC due to higher (instruction) cache
miss rates.
Demonstrated by comparing the inner loop code for an interpretive
(ddi)
processor emulator to the inner loop code for a
dynamic cross-compiler.
With perfect cache hit ratios, the former would take 61 cycles
while the latter would take 18.
However, due to cache miss costs, the ``18-cycle'' version
took longer to run.
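The comparison above reduces to a simple break-even condition. Writing $m_I$ and $m_C$ for the average cache-miss stall cycles added per pass through the interpreter's and the cross-compiler's inner loop (symbols introduced here for illustration, not taken from the paper), the nominally faster version loses when:

```latex
\[
18 + m_C > 61 + m_I
\quad\Longleftrightarrow\quad
m_C - m_I > 61 - 18 = 43 .
\]
```

That is, the dynamically compiled code runs slower once it suffers more than 43 additional stall cycles per inner-loop pass.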
[RF 94a]
@techreport{ramsey:tk-architecture,
author="Norman Ramsey and Mary F. Fernandez",
title="The {New} {Jersey} Machine-Code Toolkit",
number="TR-469-94",
institution="Department of Computer Science, Princeton University",
year="1994"
}
[RF 94b]
@techreport{ramsey:tk-architecture,
author="Norman Ramsey and Mary F. Fernandez",
title="{New} {Jersey} {Machine-Code} {Toolkit} Architecture Specifications",
number="TR-470-94",
institution="Department of Computer Science, Princeton University",
month="October",
year="1994"
}
[RF 94c]
@techreport{ramsey:tk-reference,
author="Norman Ramsey and Mary F. Fernandez",
title="{New} {Jersey} {Machine-Code} {Toolkit} Reference Manual",
number="TR-471-94",
institution="Department of Computer Science, Princeton University",
month="October",
year="1994"
}
[RF 95]
\bibitem{Ramsey:95}
Norman Ramsey
and
Mary F. Fernandez,
``The {New} {Jersey} Machine-Code Toolkit,''
Proceedings of the Winter 1995 USENIX Conference,
New Orleans, Louisiana,
January 1995, pp.~289-302.
@inproceedings{ramsey:jersey, refereed=1,
author="Norman Ramsey and Mary F. Fernandez",
title="The {New} {Jersey} Machine-Code Toolkit",
booktitle="Proceedings of the 1995 USENIX Technical Conference",
address="New Orleans, LA",
pages="289-302",
month=jan,
year="1995"
}
[RFD 72]
\bibitem{RFD:72}
E. W. Reigel,
U. Faber,
D. A. Fisher,
``The Interpreter -- A Microprogrammable Building Block System,''
Spring Joint Computer Conference,
1972, pp.~705-723.
[RHLLLW 93]
\bibitem{RHLLLW:93}
Steven K. Reinhardt,
Mark D. Hill,
James R. Larus,
Alvin R. Lebeck,
J. C. Lewis and
David A. Wood,
``The Wisconsin Wind Tunnel: Virtual Prototyping of Parallel Computers,''
Proceedings of the 1993 ACM SIGMETRICS Conference
on Measurement and Modeling of Computer Systems,
June 1993,
pp.~48-60.
[Reuter 8X]
\bibitem{Reuter:8X}
Jim Reuter,
``Decomp,''
circa 1985.
source code
available from
`ftp://ftp.cs.washington.edu/pub/decomp.tar.Z',
and
sample inputs
available from
`ftp://ftp.cs.washington.edu/pub/pardo/decomp-samples.tar.gz'.
[RHWG 95]
\bibitem{RHWG:95}
Mendel Rosenblum,
Stephen A. Herrod,
Emmett Witchel
and
Anoop Gupta,
``Complete Computer System Simulation: The SimOS Approach,''
IEEE Parallel and Distributed Technology: Systems and Applications,
3(4):34-43, Winter 1995.
abstract,
Compressed PostScript®
(57 KB).
[RW 94]
\bibitem{RW:94}
Mendel Rosenblum
and
Emmett Witchel,
``SimOS: A Platform for Complete Workload Studies,''
Personal communication (submitted for publication),
November 1994.
[SW 79]
\bibitem{SW:79}
H. J. Saal and Z. Weiss,
``A Software High Performance APL Interpreter,''
APL Quote Quad 9(4),
June 1979,
pp.~74-81.
[Samuelsson 94]
\bibitem{Samuelsson:94}
David Samuelsson,
``System Level Interpretation of the SPARC V8 Instruction
Set Architecture,''
Research report 94:23,
Swedish Institute of Computer Science,
1994.
[Sathaye 94]
\bibitem{Sath:94}
Sumedh W. Sathaye,
``Mime: A Tool for Random Emulation and Feedback Trace Collection,''
Master's thesis,
Department of Electrical and Computer Engineering,
University of South Carolina,
Columbia, South Carolina,
1994.
[SE 93]
\bibitem{SE:93}
Gabriel M. Silberman and Kemal Ebcio\u{g}lu,
``An Architectural Framework for Supporting Heterogeneous
Instruction-Set Architectures,''
IEEE Computer,
June 1993,
pp.~39-56.
[SCKMR 92]
\bibitem{SCKMR:92}
Richard L. Sites,
Anton Chernoff,
Matthew B. Kirk,
Maurice P. Marks and
Scott G. Robinson,
``Binary Translation,''
Digital Technical Journal
Vol. 4, No. 4, Special Issue, 1992.
Html paper,
PostScript(tm) paper.
[SCKMR 93]
\bibitem{SCKMR:93}
Richard L. Sites,
Anton Chernoff,
Matthew B. Kirk,
Maurice P. Marks and
Scott G. Robinson,
``Binary Translation,''
Communications of The ACM (CACM) 36(2),
February 1993,
pp.~69-81.
[Smith 91]
\bibitem{Smith:91}
M. D. Smith,
``Tracing With Pixie,''
Technical Report CSL-TR-91-497, Stanford University,
Computer Systems Laboratory, November 1991.
PostScript(tm).
[Sosic 92]
\bibitem{Sosic:92}
Rok Sosi\v{c},
``Dynascope: A Tool for Program Directing,''
Proceedings of the 1992 ACM Conference on Programming Language
Design and Implementation (PLDI),
June 1992,
pp.~12-21.
[Sosic 94]
\bibitem{Sosic:94}
Rok Sosi\v{c},
``Design and Implementation of Dynascope,
a Directing Platform for Compiled Programs,''
technical report CIT-94-7, School of Computing and Information
Technology, Griffith University, 1994.
[Sosic 94b]
\bibitem{Sosic:94b}
Rok Sosi\v{c},
``The Dynascope Directing Server: Design and Implementation,''
Computing Systems, 8(2):107-134, Spring 1994.
[SE 94a]
\bibitem{SE:94a}
Amitabh Srivastava,
and
Alan Eustace,
``ATOM: A System for Building Customized Program Analysis Tools,''
Research Report
94/2,
Digital Equipment Corporation Western Research Laboratory,
March 1994.
[SE 94b]
\bibitem{SE:94b}
Amitabh Srivastava,
and
Alan Eustace,
``ATOM: A System for Building Customized Program Analysis Tools,''
Proceedings of the 1994 ACM Conference on Programming Language
Design and Implementation (PLDI),
June 1994,
pp.~196-205.
[SW 92]
\bibitem{SW:92}
Amitabh Srivastava,
and
David W. Wall,
``A Practical System for Intermodule Code Optimization at Link-Time,''
Research Report
92/6,
Digital Equipment Corporation Western Research Laboratory,
December 1992.
[SW 93]
\bibitem{SW:93}
Amitabh Srivastava,
and
David W. Wall,
``A Practical System for Intermodule Code Optimization at Link-Time,''
Journal of Programming Languages,
March 1993.
[SF 89]
\bibitem{SF:89}
Craig B. Stunkel and W. Kent Fuchs,
``{\sc Trapeds}: Producing Traces for Multicomputers via Execution Driven
Simulation,''
ACM Performance Evaluation Review,
May 1989,
pp.~70-78.
[SJF 91]
\bibitem{SJF:91}
Craig B. Stunkel, Bob Janssens and W. Kent Fuchs,
``Address Tracing for Parallel Machines,''
IEEE Computer 24(1),
January 1991,
pp.~31-38.
[Tucker 65]
%A S. G. Tucker
%T Emulation of Large Systems
%J Communications of the ACM (CACM)
%V 8
%N 12
%D December 1965
%P 753-761
ABSTRACT: The conversion problem and a new technique called
emulation are discussed. The technique of emulation is developed and
includes sections on both the Central Processing Unit (CPU) and the
Input/Output (I/O) unit. This general treatment is followed by three
sections that describe in greater detail the implementation of
compatibility features using the emulation techniques for the IBM
7074, 7080 and 7090 systems on the IBM System/360.
Cited by [Wilkes 69].
Pardo has a copy.
[UNMS 94a]
\bibitem{UNMS:94a}
Richard Uhlig, David Nagle, Trevor Mudge and Stuart Sechrest,
``Tapeworm~II: A New Method for Measuring OS Effects on Memory
Architecture Performance,''
Technical Report, University of Michigan,
Electrical Engineering and Computer Science Department,
May 1994.
[UNMS 94b]
\bibitem{UNMS:94b}
Richard Uhlig, David Nagle, Trevor Mudge and Stuart Sechrest,
``Trap-driven Simulation with Tapeworm II,''
Sixth International Conference on Architectural Support for
Programming Languages and Operating Systems (ASPLOS-VI),
San Jose, California,
October 5-7, 1994.
[Veenstra 93]
\bibitem{Veenstra:93}
Jack E. Veenstra,
``Mint Tutorial and User Manual,''
University of Rochester Computer Science Department,
technical report 452,
May 1993.
[VF 94]
\bibitem{VF:94}
Jack E. Veenstra and Robert J. Fowler,
``{\sc Mint}: A Front End for Efficient Simulation of Shared-Memory
Multiprocessors,''
Proceedings of the Second International Workshop on Modeling,
Analysis, and Simulation of Computer and Telecommunication Systems
(MASCOTS),
January 1994,
pp.~201-207.
[Wilkes 69]
\bibitem{Wilkes:69}
Maurice V. Wilkes,
``The Growth of Interest in Microprogramming: A Literature Survey,''
Computing Surveys 1(3):139-145, September 1969.
Excerpts:
- (pg 142): ``1965 and 1966 saw papers on an entirely new
subject, namely
the emulation of one computer by another
[42-44].
Tucker [38] defined an emulator as a package that includes
special hardware and a complementary set of software routines.
An emulator runs five or even ten times as fast as a purely
software simulator.
Tucker goes on to discuss the design of emulators for large
systems.
It is only in very unusual circumstances that it is practicable
to write a microprogram that implements directly on the object
machine [[host --pardo]]
the instruction set of the subject machine
[[target --pardo]];
this is because of differences in word length, processor
structure, and so on.
Tucker recommends that in order to design an emulator, one
should first study a simulator and see in what areas it spends
most of its time.
This analysis will generally lead to the identification, as
candidates for microprogramming, of a group of special
instructions which are related not to specific instructions
of the subject machine but rather to problems common to many
such instructions.
The most important of these special instructions is likely to
be one that performs a similar function to the main loop in an
interpreter and sends control to an appropriate software
simulator for each instruction interpreted.
Another will probably be an instruction that performs a
conditional test in the way that it is performed on the
subject machine.
It may also be worthwhile adding special instructions to deal
with such instructions of the subject machine as are difficult
to simulate.
If this procedure is carried to the extreme,
the software simulation disappears altogether and we have a
full hardware feature.
Full hardware features are economically practicable only for
small machines (McCormack, et al [41]).
Sometimes the design of an emulator can be much simplified if
a small change or addition is made to the register
interconnection logic of the object machine; an example, cited
by Tucker, is the addition of a small amount of logic to the
IBM System 360/65 processor in order to facilitate the
emulation of overflow detection on IBM 7090 shifts.
Such additions (if made) can enable the efficiency of the
emulator as a whole to be improved to a useful extent.
Sometimes more substantial additions are worthwhile,
such as hardware registers intended to correspond to
particular registers on the subject machine.
By careful design of an emulator it is even possible to handle
correctly certain types of function that are time-dependent on
the subject machine.
McCormack, et al [41] give an example of a case in which
hardware additions to the object machine were necessary in
order to enable it, when running under the emulator, to handle
data at the rates required by certain peripheral devices.
It is generally found that, in order to accommodate an
emulator, it is necessary to provide a section to the
read-only memory approximately equal in size to the section
that holds the microprogram for the basic instruction set.
There is no doubt that emulators will be of great economic
importance to the computer industry in the future, and the
fact that they can be provided relatively easily on a
microprogrammed computer is an argument in favor of
microprogramming as a design method.
... Opler [45] has suggested the term firmware
[... and ...] suggests that firmware may take its place along with
software and hardware as one of the main commodities of the
computer field.''
- [38] Tucker, S. G. Emulation of large systems,
Comm. ACM 8, 12 (Dec. 1965), 753-761.
(See [Tucker 65].)
- [41] McCormack, M. A., Schansman, T. T., and Womack, K. K.
1401 compatibility feature on the IBM system/360 model 30.
Comm. ACM 8, 12 (Dec. 1965), 773-776.
- [42] Benjamin, R. I.
The Spectra 70/45 emulator for the RCA 301
Comm. ACM 8, 12 (Dec. 1965), 748-752.
- [43] Green, J.
Microprogramming, emulators and programming languages
Comm. ACM 9, 3 (Mar. 1966), 230-232.
- [44] Campbell, C. R., and Neilson, D. A.
Microprogramming the Spectra 70/35,
Datamation 12, 9 (1966), 64.
- [45] Opler, A.
Fourth generation software.
Datamation 13, 1 (1967), 22.
[Wilner 72]
\bibitem{Wilner:72}
W. T. Wilner,
``Design of the Burroughs B1700,''
Fall Joint Computer Conference, 1972, pp.~489-497.
Excerpts:
- The B1700 uses user-programmed microcode that is dynamically
swapped to and from main memory.
The goal is to support multitasking
in an environment where there are many macro-machines, each
tailored to a particular high-level language or machine
simulation.
- (pp. 489-490) The design rationale is that good performance is
achieved only if programs are executed by a machine that closely
resembles the language (the ``semantic gap'' argument). Includes
target machine emulation as a language.
- (pg. 491) The B1700 includes: bit-level memory addressability,
microcode (conceptually) executed from main memory to allow
interleaved program execution; 16-bit microinstructions; 2/4/6 MHz
micro-issue; 14-53 microseconds to reconfigure (swap instruction sets)
at 6MHz; compilers ``may not'' generate code for the microengine, thus
ensuring portability.
- (pg. 492) All memory accesses are via a transliteration processor
that maps variable-width bit-addressed references onto main memory.
- (pg. 493) Switching interpreters between instructions is useful not
only for context switches but also for switching e.g. between traced
and non-traced execution.
- (pg. 493) Microcode logically executes out of main memory, but may
actually use processor-local memory.
- (pg. 495) Running-time comparisons indicate that an RPG-II program
runs an order of magnitude faster than on an IBM System/3 that has a
similar lease rate (memory bound); another set of (banking) benchmarks
ran in 50% to 110% of the System/3 time (not clear if I/O bound, but
the B1700 had a slower card reader). Notes that B1700 supports
bit-variable addressing, segmented virtual memory, multitasking, ...
- Microarchitecture not clearly described. Note that it uses 16-bit
(vertical) microcode. Might actually be a two-level scheme as in
The Interpreter.
[WK 99]
%A Malcolm J. Wing
%A Edmund C. Kelly
%T Method And Apparatus for Aliasing Memory Data In An Advanced Microprocessor
%D 20 July 1999
%R United States Patent #05926832
Available as of 2000/03 via
http://www.patents.ibm.com/details?pn=US05926832__
From instruction-set simulation and tracing