\bibitem{HV:79} R. N. Horspool and N. Marovac, ``An Approach to the Problem of Detranslation of Computer Programs,'' The Computer Journal, 23(3):223--229, 1979.
C. Cifuentes says that it may not apply to, e.g., x86 architectures; its limits are also mentioned in one of May's papers.
It is less liable to cause legal troubles with copyright owners' rights to control all derivative works, because the RTCG's result is only a transient copy rather than a permanently stored codefile. RTCG-based emulation techniques are legal IFF the 1995 generation of chips' hardware-implemented transformations at icache load time are legal.
%A Tom Thompson %T Building the Better Virtual CPU %J Byte %D August 1995 %P 149-150
Duane Sand says (paraphrasing):
Describes two variations on the 68K interpreter that Apple used in the initial PowerMacs. Both variations identify frequently executed blocks of 68K code, compile them with trivial peephole optimizations into host RISC code, hold the code in a software-managed "cache" until it's full, then throw it all away and start over. One variation is used in Unix-hosted emulations of Apple systems, and the other is used on the 'Power Mac 9500', in combination with a modified interpreter that has a smaller lookup table footprint than the one in the first-generation PowerMacs. (The original interpreter used so much lookup table space that it ran poorly on the original PPC 603 chips, which held up Apple's plans for laptop PowerMacs for a year.) On PowerMacs that are able to run both old and new versions, the new version averages a 20-30% speedup over the entire (nonnative) application.
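The cache policy described above (translate hot blocks, keep them until the cache fills, then discard everything) can be sketched as follows. This is only an illustrative sketch, not Apple's implementation: the names `TranslationCache`, `run_block`, `hot_threshold`, and the byte-string stand-in for host code are all hypothetical.

```python
class TranslationCache:
    """Fixed-size, software-managed cache of translated blocks.

    When a new block does not fit, everything is discarded and the cache
    starts over, as in the scheme summarized above.
    """

    def __init__(self, capacity_bytes, hot_threshold):
        self.capacity = capacity_bytes
        self.hot_threshold = hot_threshold  # executions before a block counts as hot (assumed policy)
        self.used = 0
        self.blocks = {}    # guest block address -> translated host code
        self.counts = {}    # guest block address -> times seen untranslated
        self.flushes = 0

    def insert(self, guest_pc, host_code):
        if len(host_code) > self.capacity:
            raise ValueError("translated block larger than the whole cache")
        if self.used + len(host_code) > self.capacity:
            # Cache full: throw all translations away and start over.
            self.blocks.clear()
            self.used = 0
            self.flushes += 1
        self.blocks[guest_pc] = host_code
        self.used += len(host_code)


def run_block(cache, guest_pc, translate):
    """Report how the block at guest_pc would be executed this time."""
    if guest_pc in cache.blocks:
        return "native"                 # reuse the cached translation
    cache.counts[guest_pc] = cache.counts.get(guest_pc, 0) + 1
    if cache.counts[guest_pc] >= cache.hot_threshold:
        cache.insert(guest_pc, translate(guest_pc))
        return "translated"             # compiled now, native from the next visit
    return "interpreted"
```

Flushing wholesale avoids any per-block eviction bookkeeping, at the price of retranslating hot blocks after each flush.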
%A S. Graham %T The Semi-Automatic Computer Conversion System (SACCS) %J Presented at the ACM Reprogramming Conference %C Princeton, New Jersey %D June 1965 %W Referenced by [Gaines 65]
%A R. Stockton Gaines %T On the Translation of Machine Language Programs %J Communications of the ACM (CACM) %V 8 %N 12 %D December 1965 %P 736-741
%A George T. Dellert, Jr. %T A Use of Macros in Translation of Symbolic Assembly Language of One Computer to Another %J Communications of the ACM (CACM) %V 8 %N 12 %D December 1965 %P 742-748
%A R. I. Benjamin %T The Spectra 70/45 Emulator for the RCA 301 %J Communications of the ACM (CACM) %V 8 %N 12 %D December 1965 %P 748-752
Pardo has a copy.
%A Thomas M. Olsen %T Philco/IBM Translation at Problem-Oriented Symbolic and Binary Levels %J Communications of the ACM (CACM) %V 8 %N 12 %D December 1965 %P 762-768
Pardo has a copy.
%A Marvin Lowell Graham %A Peter Zilahy Ingerman %T An Assembly Language for Reprogramming %J Communications of the ACM (CACM) %V 8 %N 12 %D December 1965 %P 769-773
Pardo has a copy.
%A M. A. McCormack %A T. T. Schansman %A K. K. Womack %T 1401 Compatibility Feature on the IBM System/360 Model 30 %J Communications of the ACM (CACM) %V 8 %N 12 %D December 1965 %P 773-776
Pardo has a copy.
%A Donald M. Wilson %A David J. Moss %T CAT: A 7090-3600 Computer-Aided Translation %J Communications of the ACM (CACM) %V 8 %N 12 %D December 1965 %P 777-781
Pardo has a copy.
%A Mark I. Halpern %T Machine Independence: Its Technology and Economics %J Communications of the ACM (CACM) %V 8 %N 12 %D December 1965 %P 782-785
Pardo has a copy.
[Emmerik 94] M van Emmerik, ``Signatures for Library Functions in Executable Files'', Technical Report 2/94, Queensland University of Technology, Faculty of Information Technology, April 1994. Submitted ... 1994. PostScript(tm)
%A David Wakeling %T A Throw-away Compiler for a Lazy Functional Language %J Proceedings of the Fuji International Workshop on Functional and Logic Programming %C Susono, Japan %D July 1995 %P 203--216 %W David Wakeling <david@dcs.exeter.ac.uk> ``http://www.dcs.ex.ac.uk/~david'' %X Dynamic cross-compiler for a virtual machine used to run lazy languages.
[Rowson 94] @InProceedings{Rowson:94, author = "James A. Rowson", title = "Hardware/Software Co-Simulation", booktitle = "Proc.~of the 31st Design Automation Conference (DAC~'94)", year = "1994", organization = "ACM", address = "San Diego, CA", OPTmonth = "June", note = "(Tutorial)", OPTannote = "" }
@InProceedings{Rogers:92, author = "Anne Rogers and Kai Li", title = "Software Support for Speculative Loads", pages = "38-50", booktitle = "Proc.~of the 5th International Conference on Architectural Support for Programming Languages and Operating Systems", year = "1992", month = "October" }
Evidently contains information about a cycle-level simulator. More in
@TechReport{Rogers:93, author = "Anne Rogers and Scott Rosenberg", title = "Cycle Level {SPIM}", institution = "Department of Computer Science, Princeton University", year = 1993, address = "Princeton, NJ", month = "October" }
ssim: A Superscalar Simulator. Mike Johnson (AMD), M. D. Smith (Stanford Univ.). Pixie front ends in ftp://velox.stanford.edu/pub
Johnson, Mike: Superscalar Microprocessor Design. Englewood Cliffs, NJ: Prentice Hall, 1991. xxiv, 288 pp., illustrated. (Prentice-Hall series in innovative technology.) Bibliography: pp. 273-278. ISBN 0-13-875634-1
@Book{Huck:89, author = {Jerome C. Huck and Michael J. Flynn}, title = {Analyzing Computer Architectures}, publisher = {IEEE Computer Society Press}, year = 1989, address = {Washington, DC} }
%A Max Copperman %A Jeff Thomas %T Poor Man's Watchpoints %J ACM SIGPLAN Notices %V 30 %N 1 %D January 1995 %P 37-44
Pardo has a copy. Executive summary: debugging tool; statically patches loads and stores with code to check for data breakpoints.
Amusing story: The processor they were running on has load delay slots and does not have pipeline interlocks. Their tool replaces each load or store with several instructions; it patched a piece of user-mode code of the form
    load  addr -> r5
    store r5 -> addr2

Before patching, the code saved the old value of r5 to addr2. After patching, it saved the new value. Technically, this code was broken already, because the same symptom could also have been exhibited by an interrupt or exception between the load and the store.
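The effect in the story can be reproduced on a toy model. The sketch below assumes a hypothetical machine whose loads take effect one instruction late with no interlock; the `run` function, the tuple encoding of instructions, and the `check` pseudo-instruction (standing in for the inserted watchpoint code) are all made up for illustration and are not taken from the paper.

```python
def run(program, regs, mem):
    """Run a toy program on a machine with a one-instruction load delay.

    A loaded value reaches its register only after the *following*
    instruction has executed, and nothing stalls to wait for it.
    """
    regs, mem = dict(regs), dict(mem)
    pending = None                      # (register, value) still in flight
    for op, src, dst in program:
        arriving, pending = pending, None
        if op == "load":                # load mem[src] -> register dst
            pending = (dst, mem[src])
        elif op == "store":             # store register src -> mem[dst]
            mem[dst] = regs[src]
        elif op == "check":             # stand-in for inserted watchpoint code
            pass
        else:
            raise ValueError(f"unknown op {op!r}")
        if arriving is not None:        # delayed load result lands now
            regs[arriving[0]] = arriving[1]
    if pending is not None:             # drain a load issued by the last instruction
        regs[pending[0]] = pending[1]
    return regs, mem


original = [("load", "addr", "r5"), ("store", "r5", "addr2")]
patched = [("load", "addr", "r5"), ("check", "addr", None), ("store", "r5", "addr2")]
```

In `original` the store sits in the load's delay slot and therefore writes the old r5; in `patched` the extra instruction gives the load time to complete, so the store writes the new value.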
``Spike was built inside GNU GCC by Michael Golden and myself. It includes a lot of features that have appeared in ATOM, including the simulator with the benchmark into a single ``self-tracing'' binary. The instruction trace was based on an abstract machine model distilled from GCC's RTL; it had both a high-level and a low-level form. Spike is still in occasional use, but has never been released.''
Basic summary: Wanted to profile. -p/-pg code is larger and slower by enough to make it hard to justify profiling as the default. Assumes the entire source is available. For these and other reasons, wrote jprof which operates with disassembly, analysis and rewriting. Discusses sampling errors, expected accuracy, stability, randomness, etc. Describes jprof: counters and stopwatches; subroutine call graph. Domain/OS on HP/Apollo using 68030. Discusses shared libraries. Can also use page-fault clock. 4-microsecond clocks. Some lessons/observations. Doesn't explain how program running time is affected by jprof.
Summary: DEC is running Win32 application binaries on Alpha by a new combination of interpreter and static translator. The static translator runs in the background, between the first and second executions of the application. It uses info collected by the interpreter during the 1st run, to reliably distinguish active code paths from r/o data and work out the effects of indirect jumps. Static analysis can't do this automatically on its own, for typical x86 binaries.
%A P. J. Brown %T Re-creation of Source Code from Reverse Polish Form %J Software \- Practice & Experience %V 2 %N 3 %P 275-278 %D 1972
Note: there's a slightly later SPE that has a follow-up article explaining how to do it faster/more efficiently.
%A Ariel Pashtan %T A Prolog Implementation of an Instruction-Level Processor Simulator %J Software \- Practice and Experience %V 17 %N 5 %P 309-318 %D May 1987
From: bchen@eecs.harvard.edu (Brad Chen) Newsgroups: comp.arch Subject: Windows x86 Address Traces Available Date: 7 Oct 1996 22:20:30 GMT Organization: Harvard University EECS Lines: 15 Message-ID: <53bvne$5lb@necco.harvard.edu> NNTP-Posting-Host: steward.harvard.edu Keywords: Windows x86 address traces
A collection of x86 memory reference traces from Win32 applications are now available from the following URL: http://etch.eecs.harvard.edu/traces/index.html. The collection includes traces from both commercial and public-domain applications. The collection currently includes:
- Perl
- MPeg Play
- Borland C++
- Microsoft Visual C
- Microsoft Word

These traces were created using Etch, an instrumentation and optimization tool for Win32 executables. For more information on Etch see the above URL.
(etch-info@cs.washington.edu)
Peter Kuhn
Institute for Integrated Circuits (LIS)
Technical University of Munich
Arcisstr. 21, D-80290 Munich, Germany
voice: +49-89-289-23092
fax1: +49-89-289-28323
fax2: +49-89-289-25304
email: P_Kuhn@lis.e-technik.tu-muenchen.de
http://www.lis.e-technik.tu-muenchen.de/people/kp.html
From: Harish Patil Newsgroups: comp.compilers Subject: Thesis available: Program Monitoring Date: 29 Jan 1997 11:21:02 -0500 Organization: Compilers Central Lines: 59 Sender: johnl@iecc.com Approved: compilers@ivan.iecc.com Message-ID: <97-01-223@comp.compilers> Reply-To: Harish Patil NNTP-Posting-Host: ivan.iecc.com Keywords: report, available, performance

Hello everyone:

I am glad to announce that my Ph.D. thesis, titled "Efficient Program Monitoring Techniques", is available on-line. This thesis was completed under the supervision of Prof. Charles Fischer at the department of Computer Sciences, University of Wisconsin--Madison. The thesis is available as technical report # 1320. Please check it out at the URL:

http://www.cs.wisc.edu/Dienst/UI/2.0/Describe/ncstrl.uwmadison%2fCS-TR-96-1320

An abstract of the thesis follows.

Regards,
-Harish

Efficient Program Monitoring Techniques
---------------------------------------

Programs need to be monitored for many reasons, including performance evaluation, correctness checking, and security. However, the cost of monitoring programs can be very high. This thesis contributes two techniques for reducing the high execution time overhead of program monitoring: 1) customization and 2) shadow processing. These techniques have been tested using a memory access monitoring system for C programs.

"Customization" reduces the cost of monitoring programs by decoupling monitoring from original computation. A user program can be customized for any desired monitoring activity by deleting computation not relevant for monitoring. The customized program is smaller, easier to analyze, and almost always faster than the original program. It can be readily instrumented to perform the desired monitoring. We have explored the use of program slicing technology for customizing C programs. Customization can cut the overhead of memory access monitoring by up to half.

"Shadow processing" hides the cost of on-line monitoring by using idle processors in multiprocessor workstations. A user program is partitioned into two run-time processes. One is the main process executing as usual, without any monitoring code. The other is a shadow process following the main process and performing the desired monitoring. One key issue in the use of shadow process is the degree to which the main process is burdened by the need to synchronize and communicate with the shadow process. We believe the overhead to the main process must be very modest to allow routine use of shadow processing for heavily-used production programs. We therefore limit the interaction between the two processes to communicating certain irreproducible values. In our experimental shadow processing system for memory access checking the overhead to the main process is very low - almost always less than 10%. Further, since the shadow process avoids repeating some of the computations from the main program, it runs much faster than a single process performing both the computation and monitoring.

==========================================================================
Harish Patil: Massachusetts Language Lab - Hewlett Packard
Mail Stop CHR02DC, 300 Apollo Drive, Chelmsford MA 01824
Phone: 508 436 5717 Fax: 508 436 5135 Email: patil@apollo.hp.com
CS192-24 TRACEPOINT NAMES ITS FIRST PRODUCT

DEC spin-off Tracepoint Technology named its first product HiProf, as we suspected it would (CSN No 185), and described it as a graphical hierarchical profiler that will enable C++ developers to analyze the binaries of 32-bit x86 applications and figure out where modifications should be made. The first of a family, the tool is based on a patented Binary Code Instrumentation technology that displays a detailed analysis of an application's execution in Tracepoint's IDE. The company's core framework can handle executables and .dlls that have been generated by compiling software as well. Therefore, source code shouldn't have to be recompiled. The data can be viewed on a threads basis. HiProf is due out next month at $599 and runs on Win95 or NT 3.51 or later. It supports apps developed with VC++ 2.0 or above and Microsoft Developer Studio 4.03.
Newsgroups: comp.compilers Subject: ANNOUNCE - Fast Code Coverage Tool Date: 8 May 1997 21:27:24 -0400 Organization: Tracepoint/DIGITAL Lines: 33 Sender: johnl@iecc.com Approved: compilers@ivan.iecc.com Message-ID: <97-05-111@comp.compilers> Reply-To: jgarvin@scruznet.com NNTP-Posting-Host: ivan.iecc.com Keywords: testing, tools, available

ANNOUNCING - TestTrack, Fast Code Coverage Tool for 32-bit Windows Apps

TracePoint Technology has just opened the beta for TestTrack - an advanced code coverage tool that analyses test results and identifies areas in your code that have not been tested. Since TestTrack works on compiled and linked binary code (no source code or obj files required), there's no need for recompiling or preprocessing so the entire process is dramatically quicker than with past generation tools.

TestTrack analyzes and reports on coverage of several different types including; function coverage, class coverage, line coverage, branch coverage, multiple condition coverage, call-pair coverage and more. TestTrack allows you to selectively exclude portions of the code base, if desired, so you can analyze only those portions of an app that concern you.

A robust and intuitive GUI displays results in "live" pie charts or bar graphs that let you drill down into the code represented with just a mouse click, extensive reporting capabilities include the ability to publish reports in html, and a powerful merge function allows you to merge the results of several test runs for total coverage analysis. In addition, TestTrack identifies dead code in your app which is no longer used but which can slow performance and bloat program size.

An evaluation copy of the latest TestTrack beta is available for free download from TracePoint at www.tracepoint.com. TestTrack works on 32-bit apps generated with VC++ 2.x - 5.0.

TracePoint is a recent spin-off of DIGITAL Equipment Corp, whose mission is to create and market advanced development tools for 32-bit Windows apps.

For further information on TracePoint visit our web site or call 888-688-2504.
From: dcpi-czar@pa.dec.com (Lance Berc) Newsgroups: comp.arch,comp.sys.dec,comp.unix.osf.osf1,comp.compilers Subject: New Alpha Performance Analysis Tools Date: 20 Jun 1997 21:43:17 -0400 Organization: Digital Equipment Corporation, Systems Research Center Lines: 33 Sender: johnl@iecc.com Approved: compilers@ivan.iecc.com Message-ID: <97-06-084@comp.compilers> NNTP-Posting-Host: ivan.iecc.com Keywords: tools, available
Version 2.2 of the DIGITAL Continuous Profiling Infrastructure, a set of performance tools for Digital Alpha systems running Digital Unix, is available for general use.
The Digital Continuous Profiling Infrastructure for Digital Alpha platforms permits continuous low-overhead profiling of entire systems, including the kernel, user programs, drivers, and shared libraries. The system is efficient enough that it can be left running all the time, allowing it to be used to drive online profile-based optimizations for production systems.
The Continuous Profiling Infrastructure maintains a database of profile information that is incrementally updated for every executable image that runs. A suite of profile analysis tools analyzes the profile information at various levels. At one extreme, the tools show what fraction of cpu cycles were spent executing the kernel and each user program. At the other extreme, the tools show how long a particular instruction stalls on average, e.g., because of a D-cache miss.
DCPI runs under Digital Unix V3.2 and V4.x, with a port to WindowsNT underway. It is free of charge. Further information, including papers and man pages, can be found at: http://www.research.digital.com/SRC/dcpi

The system was developed at Digital's Systems Research Center and Western Research Laboratory, both in Palo Alto, California. A paper describing the system will appear at SOSP-16 in October.

-- Send compilers articles to compilers@iecc.com, meta-mail to compilers-request@iecc.com.

See http://www.research.digital.com/SRC/dcpi
From: el@compelcon.se (Erik Lundh) Newsgroups: comp.compilers Subject: Re: asm -> structured form Date: 14 Jan 1998 14:28:38 -0500 Organization: Algonet/Tninet Lines: 22 Sender: johnl@iecc.com Approved: compilers@ivan.iecc.com Message-ID: <98-01-055@comp.compilers> References: <98-01-013@comp.compilers> NNTP-Posting-Host: ivan.iecc.com Keywords: disassemble, tools, comment

Have a look at Christina Cifuentes work with decompilers at http://www.it.uq.edu.au/groups/csm/dcc.html

Also, have a look at Frans Faase's excellent compilation of decompiler efforts at http://wwwis.cs.utwente.nl:8080/~faase/Ha/decompile.html (There is a disclaimer at the top of the page that Mr Faase has left the faculty and might be unable to maintain the page. But the last update is dated in december 1997... Hope he can keep it!)

Best Regards,
Erik Lundh
Compelcon AB
SWEDEN

Alexander Kjeldaas wrote:

[I'm impressed -- it does a better job of decompiling than anything I've seen elsewhere. It's still a far cry from the original source, but good enough to be a big help figuring out what a dusty old program does. -John]
The SimOS team at Stanford University is pleased to announce the second release of our complete machine simulation environment. If you are receiving this email, then you have downloaded an earlier version of SimOS or were deemed "someone who may be interested". If you would like to be taken off this infrequently used list, send mail to "simos@cs.stanford.edu" and we'll take you off of it immediately.
For those of you who need a refresher, SimOS is a "complete machine simulator" in that it models the hardware of uniprocessor and multiprocessor computers in enough detail to boot and run commercial operating systems as well as applications designed for these operating systems. This includes databases, web servers, and other workloads that traditional simulation tools have trouble supporting. Furthermore, SimOS executes these workloads at high speeds and provides support for easily collecting detailed hardware and software performance information.
There have been substantial improvements and enhancements since the first SimOS release including:
* Support for the Digital Alpha architecture running the Digital Unix operating system.
* Support for the MIPS 64-bit architecture.
* More modular hardware simulator interfaces that simplify the process of adding new processor, memory system, and device models.
SimOS is available free of charge for the research community and runs on several different hardware platforms. For download information, research papers, a discussion group, and more, visit the new SimOS web site at:
Date: Sun, 21 Jun 1998 10:50:58 -0400 Reply-To: History of Computing Issues Sender: History of Computing Issues From: Lee Wittenberg Subject: SSEM Simulator To: SHOTHC-L@SIVM.SI.EDU

To coincide with the 50th Anniversary of the Small Scale Experimental Machine at Manchester, I am releasing the first "official" version of an SSEM simulator written in Java, and therefore (presumably) platform-independent. Source and binaries are available at
ftp://samson.kean.edu/pub/leew/ssem/
Method for verifying contiquity of a binary translated block of instructions by attaching a compare and/or branch instruction to predecessor block of instructions Abstract A method for enabling a first block of instructions to verify whether the first block of instructions follows a second block of instructions in an order of execution. The method includes appending a compare instruction to the first block of instructions. The compare instruction compares a first value from the first block of instructions with a second value from the second block of instructions, which precedes the first block of instructions in the order of execution. The method further includes appending a branching instruction to the first block of instructions. The branching instruction is executed in response to the first value being unequal to the second value. The branching instruction, when executed, branches to an alternative look-up routine to obtain a block of instructions that follows the second block of instructions in the order of execution. http://patents.uspto.gov/cgi-bin/ifetch4?INDEX+PATBIB-ALL+0+24884+0+6+20371+OF+1+1+1+PN%2f5721927 What is claimed is: 1. 
A computer-implemented method for enabling a first block of instructions to verify whether the first block of instructions follows a second block of instructions in an order of execution the method comprising the steps of: a) appending a compare instruction to the first block of instructions, the compare instruction when executed compares a first value from the first block of instructions with a second value from the second block of instructions, said second block of instructions preceding said first block of instructions in the order of execution; and b) appending a branching instruction to the first block of instructions, said branching instruction is executed in response to the first value being unequal to the second value, said branching instruction, when executed, branches to an alternative look-up routine to obtain a block of instructions that follows the second block of instructions in the order of execution. U.S. REFERENCES: (No patents reference this one) Patent Inventor Issued Title 5167023 De Nicolas et al. 11 /1992 Translating a dynamic transfer control instruction address in a simulated CPU processor ABSTRACT: The system and method of this invention simulates the flow of control of an application program targeted for a specific instruction set of a specific processor by utilizing a simulator running on a second processing system having a second processor with a different instruction set. The simulator reduces the number of translated instructions needed to simulate the flow of control of the first processor instructions when translating the address of the next executable instruction resulting from a dynamic transfer of control, i.e., resulting from a return instruction. The simulator compares the address that is loaded at run time by the return instruction with the return address previously executed by that instruction. If the last return address matches, the location of the return is the same. 
If the last return does not match, a translate look-aside buffer is used to determine the address. If the translate look-aside buffer does not find the address, then a binary tree look up mechanism is used to determine the address of the next instruction after a return. The performance of the simulator is enhanced by utilizing the easiest approaches first in the chance that a translated instruction will result most efficiently. 5287490 Sites 2 /1994 Identifying plausible variable length machine code of selecting address in numerical sequence, decoding code strings, and following execution transfer paths ABSTRACT: Information about the location of untranslated instructions in an original program is discovered during execution of a partial translation of the program, and that information is used later during re-translation of the original program. Preferably the information includes origin addresses of translated instructions and corresponding destination address of untranslated instructions of execution transfers that occur during the execution of the partial translation. Preferably this feedback of information from execution to re-translation is performed after each execution of the translated program so that virtually all of the instructions in the original program will eventually be located and translated. To provide an indication of the fraction of the code that has been translated, the program is scanned to find plausible code in the areas of memory that do not contain translated code. The plausible code is identified by selecting addresses according to three different scanning modes and attempting to decode variable-length instructions beginning at the selected addresses. 
The scanning modes include a first mode in which addresses are selected in numerical sequence by a scan pointer, a second mode in which addresses are selected in instruction-length sequence by an instruction decode pointer, and a third mode in which the selected addresses are destination addresses of previously-decoded execution transfer instructions.
What is claimed is: 26. A method of operating a digital computer having an addressable memory, said addressable memory containing a computer program, said computer program including instructions and data at respective address locations of said addressable memory, each of said instructions consisting of contents of a variable number of contiguous ones of said address locations depending upon an operation specified by said each of said instructions, said method identifying address locations of said addressable memory that appear to contain said instructions of said computer program, said method comprising the steps of: a) selecting program addresses in numerical sequence, and attempting to decode an instruction in said addressable memory at each program address until an initial instruction is decoded; and when said initial instruction is decoded, then b) attempting to decode a string of instructions immediately following said initial instruction until an execution transfer instruction is decoded, and when an attempt to decode an instruction fails, continuing said selecting program addresses and said attempting to decode an instruction at each program address as set out in said step a), and when an execution transfer instruction is decoded, then c) attempting to decode an instruction at a destination address of the decoded execution transfer instruction, and when the attempt to decode an instruction at the destination address of the decoded execution transfer instruction fails, continuing said selecting program addresses and said attempting to decode an instruction at each program address as set out in step a), and when 
the attempt to decode an instruction at the destination address of the decoded execution transfer instruction succeeds, then identifying, as said address locations of said addressable memory that appear to contain said instructions of said computer program, the address locations including said initial instruction and said string of instructions including said execution transfer instruction, wherein some program addresses of said computer program are known to contain instructions, and wherein said step a) skips over the program addresses that are known to contain instructions, wherein the decoding of an instruction is not permitted when an instruction being decoded partially overlaps program addresses known to contain an instruction, and wherein said step a) skips over a program address containing a value that is included in a predefined set of values, regardless of whether an attempt to decode an instruction starting at the program address would be successful, wherein said set of values includes values that indicate instructions having a length of one program address location, said set of values includes opcodes of privileged instructions, and said set of values includes the value of zero, and wherein said step a) skips over a program address that is the first address of a string of at least four printable ASCII alphanumeric characters.
5560013 Scalzi et al. 9 /1996 Method of using a target processor to execute programs of a source architecture that uses multiple address spaces ABSTRACT: A method of utilizing large virtual addressing in a target computer to implement an instruction set translator (IST) for dynamically translating the machine language instructions of an alien source computer into a set of functionally equivalent target computer machine language instructions, providing in the target machine, an execution environment for source machine operating systems, application subsystems, and applications.
The target system provides a unique pointer table in target virtual address space that connects each source program instruction in the multiple source virtual address spaces to a target instruction translation which emulates the function of that source instruction in the target system. The target system efficiently stores the translated executable source programs by actually storing only one copy of any source program, regardless of the number of source address spaces in which the source program exists. The target system efficiently manages dynamic changes in the source machine storage, accommodating the nature of a preemptive, multitasking source operating system. The target system preserves the security and data integrity for the source programs on a par with their security and data integrity obtainable when executing in source processors (i.e. having the source architecture as their native architecture). The target computer execution maintains source-architected logical separations between programs and data executing in different source address spaces--without a need for the target system to be aware of the source virtual address spaces. Having thus described our invention, what we claim as new and desire to secure by Letters patent is: 1. 
An emulation method for executing individual source instructions in a target processor to execute source programs requiring source processor features not built into the target processor, comprising the steps of: inputting instructions of a source processor program to an emulation target processor having significant excess virtual addressing capacity compared to a virtual addressing capacity required for a source processor to natively execute the source processor program, and supporting multiple source virtual address spaces in the operation of the source processor, building a virtual ITM (instruction translation map) in a target virtual address space supported by the target processor, the virtual ITM containing an ITM entry for each source instruction addressable unit, each source instruction addressable unit beginning on a source storage instruction boundary, structuring each ITM entry for containing a translation address to a target translation program that executes a source instruction having a source address associated with the ITM entry, determining a ratio R by dividing the length of each ITM entry by the length of each source instruction addressable unit, accessing an ITM entry for an executing source instruction by: generating a source aggregate virtual address for the source instruction by combining the source address of the source instruction with a source address space identifier of a source virtual address space containing the instruction, multiplying the source aggregate virtual address by R to obtain a target virtual address component, and inserting the target virtual address component into a predetermined component location in a target virtual address to generate an ITM entry target virtual address for locating an ITM entry associated with the source instruction in order to obtain a one-to-one addressing relationship between ITM entry target virtual addresses and source instruction addresses. 
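The ITM address arithmetic in the claim above can be sketched as follows. This is only an illustration of the stated steps (combine address-space identifier and source address, multiply by R, place the result in the target virtual address); the field widths, the entry and unit sizes, the ITM region base, and the function names are all assumptions chosen for the example, not values from the patent.

```python
SOURCE_ADDR_BITS = 31        # assumed width of a source virtual address
ITM_ENTRY_BYTES = 8          # assumed length of one ITM entry
SOURCE_UNIT_BYTES = 2        # assumed source instruction addressable unit
R = ITM_ENTRY_BYTES // SOURCE_UNIT_BYTES   # ratio R from the claim
ITM_REGION_BASE = 1 << 60    # assumed location of the ITM in target virtual space


def aggregate_address(asid, source_addr):
    """Combine an address-space identifier with a source address."""
    if not 0 <= source_addr < (1 << SOURCE_ADDR_BITS):
        raise ValueError("source address out of range")
    return (asid << SOURCE_ADDR_BITS) | source_addr


def itm_entry_address(asid, source_addr):
    """Target virtual address of the ITM entry for one source instruction."""
    if source_addr % SOURCE_UNIT_BYTES:
        raise ValueError("not on a source instruction boundary")
    component = aggregate_address(asid, source_addr) * R
    if component >= ITM_REGION_BASE:
        raise ValueError("aggregate address does not fit below the ITM base")
    # Insert the component into the low-order part of the target address.
    return ITM_REGION_BASE + component
```

Because the mapping is pure arithmetic, each source instruction boundary in each source address space gets its own entry address without any search, which is the one-to-one relationship the claim describes.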
5619665 Emma 4 /1997 Method and apparatus for the transparent emulation of an existing instruction-set architecture by an arbitrary underlying instruction-set architecture ABSTRACT: The invention provides means and methods for extending an instruction-set architecture without impacting the software interface. This circumvents all software compatibility issues, and allows legacy software to benefit from new architectural extensions without recompilation and reassembly. The means employed are a translation engine for translating sequences of old architecture instructions into primary, new architecture instructions, and an extended instruction (EI) cache memory for storing the translations. A processor requesting a sequence of instructions will look first to the EI-cache for a translation, and if translations are unavailable, will look to a conventional cache memory for the sequence, and finally, if still unavailable, will look to a main memory. I claim: 1. A method for translating a series of one or more instructions of a first semantic type into one or more instructions of a second semantic type, comprising the steps of: providing a first memory; providing a second memory; translating a sequence of instructions of the first semantic type stored in the first memory into one or more primary instructions of the second semantic type and storing the instructions of the second type in the second memory; upon a request from the processor for the sequence of instructions of the first semantic type: providing the corresponding instructions of the second semantic type if available in the second memory; providing the sequence of instructions of the first semantic type if the corresponding instructions of the second semantic type are not available in the second memory. [Others found.] 4347565 Kareda et al. 
8 /1982 Address control system for software simulation ABSTRACT: An address control system for software simulation in a virtual machine system having a virtual storage function. When a simulator program is simulating an instruction of a program to be simulated, an address translation of an operand address in the program to be simulated is achieved using a translation lookaside buffer, thereby greatly reducing the overhead for the address translation during the simulator program execution. 4638423 Ballard 1 /1987 Emulating computer ABSTRACT: An apparatus and method is disclosed for providing an emulating computer. The present invention consists of a computer having a storage area, processing unit, control circuits and translation circuit. The original instructions are first loaded into the storage area. When the processor attempts to operate an instruction the control circuit loads a section of the instructions into the translating circuit. These instructions are then translated and stored in a memory area of the translating circuit having the address of the original instruction. The processor unit then accesses the storage area and retrieves the translated instruction. What is claimed is: 7. A method of emulating a computer comprising the steps of: transmitting an instruction to a processing unit; checking a cache memory for a translated instruction; loading an instruction block into an instruction memory if said translated instruction is not in said cache memory; translating an instruction of said instruction block providing a translated instruction; storing said translated instruction in said cache memory; and transmitting said translated instruction from said cache memory to said processing unit.
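The fetch path in Ballard's claim 7 (check the cache, load a block on a miss, translate, store, deliver) can be sketched in software as below. This is a minimal illustration, not the patented circuit: the class name, the list-of-strings "instruction memory", the `translate` callback, and the choice to translate the whole block on a miss are assumptions made for the example.

```python
from typing import Callable, Dict, List


class TranslatingFetcher:
    """Hypothetical software model of a translate-on-miss instruction cache."""

    def __init__(self, source: List[str], translate: Callable[[str], str],
                 block_size: int = 4) -> None:
        self.source = source            # original instructions, indexed by address
        self.translate = translate      # maps one source instruction to its translation
        self.block_size = block_size    # instructions loaded per miss (assumed)
        self.cache: Dict[int, str] = {} # translated instructions, keyed by source address
        self.misses = 0

    def fetch(self, addr: int) -> str:
        if not 0 <= addr < len(self.source):
            raise IndexError("address outside source program")
        if addr not in self.cache:
            # Miss: load the block containing addr and translate each instruction in it.
            self.misses += 1
            start = addr - addr % self.block_size
            end = min(start + self.block_size, len(self.source))
            for a in range(start, end):
                self.cache[a] = self.translate(self.source[a])
        # Hit, or freshly filled: deliver the translated instruction.
        return self.cache[addr]
```

For example, with `TranslatingFetcher(["add", "sub", "mul", "div", "jmp"], str.upper)`, fetching address 1 misses once and fills addresses 0 through 3, so a later fetch of address 3 is served from the cache without another translation.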
DATE: June 10, Thursday, 2:30 TITLE: Jalapeno --- a new Java Virtual Machine for Servers SPEAKER: Vivek Sarkar IBM T. J. Watson Research Center ABSTRACT: In this talk, we give an overview of the Jalapeno Java Virtual Machine (JVM) research project at the IBM T. J. Watson Research Center. The goal of Jalapeno is to expand the frontier of JVM technologies for server nodes --- especially in the areas of dynamic optimized compilation and specialization, scalable exploitation of multiple processors in SMPs, and the use of a JVM as a 7x24 application server. The Jalapeno JVM has two key distinguishing features. First, the Jalapeno JVM takes a compile-only approach to program execution. Instead of providing both an interpreter and a JIT/dynamic compiler, it provides two dynamic compilers --- a quick non-optimizing "baseline" compiler, and a slower production-strength optimizing compiler. Both compilers share the same interfaces with the rest of the JVM, thus making it easy to mix execution of unoptimized methods with optimized methods. Second, the Jalapeno JVM is itself implemented in Java! This design choice brings with it several advantages as well as technical challenges. The advantages include a uniform memory space for JVM objects and application objects, and ease of portability. The key technical challenge is to overcome the large performance penalties of executing Java code (compared to native code) that has been the experience of current JVMs; if we succeed in doing so, we will simultaneously improve the performance of our JVM as well as of the applications running on our JVM. The Jalapeno project was initiated in January 1998 and is still work in progress. This talk will highlight our design decisions and early experiences in working towards our goal of building a high-performance JVM for SMP servers.
http://www.cs.utah.edu/projects/avalanche/avalanche-publications.html http://www.cs.utah.edu/projects/avalanche/paint.ps http://www.hensa.ac.uk/parallel/simulation/architectures/paint/paint.tar.Z
"What's visible about software is the effect it has on something else. If two thoroughly different programs have the same observable effects, you cannot tell which one has executed. If a given portion of a program has no observable effects, then you have no way of knowing if it is executing, if it has finished, if it got part way through and then stopped, or if it produced 'the right answer.' Programmers nearly always must rely on highly indirect measures to determine what happens when their programs execute. This is one reason why debugging is so difficult."[Digital Woes, Lauren Ruth Weiner, 1993, Addison-Wesley]
%A Alexander Klaiber %I Transmeta Corporation %T The Technology Behind Crusoe(tm) Processors %R From http://www.transmeta.com/pdf/white_papers/paper_aklaiber_19jan00.pdf as of 2002/08/19. %D 2000 White paper on Crusoe, emulation.
Kumar et al., Emulation Verification of the Motorola 68060, Proceedings, ICCD, 1995, pp. 150-158. Note et al., Rapid Prototyping of DSP Systems: Requirements and Solutions, 6th IEEE Int'l Wkshp on RSP, 1995, pp. 40-47. Tremblay et al., A Fast and Flexible Performance Simulator for Micro-Architecture Trade-off Analysis on Ultrasparc-1 '1995, pp 2. Rosenberg, J.M., Dictionary of Computers, Information Processing & Telecommunications, John Wiley & Sons, pp 382
ACM Transactions on Computer Systems (TOCS) Volume 15 , Issue 4 (November 1997)
Continuous profiling: where have all the cycles gone?
Authors Jennifer M. Anderson Digital Equipment Corp., Palo Alto, CA William E. Weihl Digital Equipment Corporation, Palo Alto, CA Lance M. Berc Digital Equipment Corp., Palo Alto, CA Jeffrey Dean Digital Equipment Corp., Palo Alto, CA Sanjay Ghemawat Digital Equipment Corp., Palo Alto, CA Monika R. Henzinger Digital Equipment Corp., Palo Alto, CA Shun-Tak A. Leung Digital Equipment Corporation, Palo Alto, CA Richard L. Sites Digital Equipment Corporation, Palo Alto, CA Mark T. Vandevoorde Digital Equipment Corporation, Palo Alto, CA Carl A. Waldspurger Digital Equipment Corporation, Palo Alto, CA
Publisher ACM Press New York, NY, USA Pages: 357 - 390 Periodical-Issue-Article Year of Publication: 1997 ISSN:0734-2071
ABSTRACT This article describes the Digital Continuous Profiling Infrastructure, a sampling-based profiling system designed to run continuously on production systems. The system supports multiprocessors, works on unmodified executables, and collects profiles for entire systems, including user programs, shared libraries, and the operating system kernel. Samples are collected at a high rate (over 5200 samples/sec. per 333MHz processor), yet with low overhead (1-3% slowdown for most workloads). Analysis tools supplied with the profiling system use the sample data to produce a precise and accurate accounting, down to the level of pipeline stalls incurred by individual instructions, of where time is being spent. When instructions incur stalls, the tools identify possible reasons, such as cache misses, branch mispredictions, and functional unit contention. The fine-grained instruction-level analysis guides users and automated optimizers to the causes of performance problems and provides important insights for fixing them.
Software Profiling for Hot Path Prediction: Less is More, Evelyn Duesterwald (duester@hpl.hp.com), Vasanth Bala (vas@hpl.hp.com), HPL 2000
Questions: Can Valgrind run itself? The -z trick suggests no. Probably no SSE/SSE2. Further explanation of nested system calls (how they arise) would be useful.
@article{rf-specifying-instructions:97, author="Norman Ramsey and Mary F. Fernandez", title="{S}pecifying {R}epresentations of {M}achine {I}nstructions", journal="ACM Transactions on Programming Languages and Systems", volume = "19", number = "3", pages = "492--524", month="May", year="1997" }
@techreport{larsson-sim-from-spec:97, author = "F. Larsson", title="{G}enerating {E}fficient {S}imulators from a {S}pecification {L}anguage", institution="Swedish Institute of Computer Science", year="1997" }
@article{pzrm-fast-320C54x:97, author="S. Pees and V. Zivojnovic and A. Ropers and H. Meyr", title="{F}ast {S}imulation of the {T}{I} {T}{M}{S}320{C}54x {D}{S}{P}", journal="International Conference on Signal Processing Applications and Technology", pages = "995-999", month="September", year="1997" }
``A simulator is a powerful tool for hardware as well as software development. However, implementing an efficient simulator by hand is a very labour intensive and error-prone task. This paper describes a tool for automatic generation of efficient instruction set architecture (ISA) simulators. A specification file describing the ISA is used as input to the tool. Besides a simulator, the tool also generates an assembler and a disassembler for the architecture. We present a method where statistics is used to identify frequently used instructions. Special versions of these instructions are then created by the tool in order to speed up the simulator. With this technique we have generated a SPARC V8 simulator which is more efficient than our hand-coded and hand-optimized one.''
Instruction-set simulators allow programmers a detailed level of insight into, and control over, the execution of a program, including parallel programs and operating systems. In principle, instruction set simulation can model any target computer and gather any statistic. Furthermore, such simulators are usually portable, independent of compiler tools, and deterministic, allowing bugs to be recreated or measurements repeated. Though often viewed as being too slow for use as a general programming tool, in the last several years their performance has improved considerably. We describe SIMICS, an instruction set simulator of SPARC-based multiprocessors developed at SICS, in its role as a general programming tool. We discuss some of the benefits of using a tool such as SIMICS to support various tasks in software engineering, including debugging, testing, analysis, and performance tuning. We present in some detail two test cases, where we've used SimICS to support analysis and performance tuning of two applications, Penny and EQNTOTT. This work resulted in improved parallelism in, and understanding of, Penny, as well as a performance improvement for EQNTOTT of over a magnitude. We also present some early work on analyzing SPARC/Linux, demonstrating the ability of tools like SimICS to analyze operating systems. (NOTE: A later version of this report was published in ILPS'97)