Flow-Based Function Customization in the Presence of Representation Pollution, by Allyn Dimock, Ian Westmacott, Robert Muller, Franklyn Turbak and Joe Wells. Submitted, November 16, 2000.
Abstract: The CIL compiler for core Standard ML compiles whole ML programs using a novel typed intermediate language that supports the generation of type-safe customized data representations. In this paper, we present empirical data comparing the relative efficacy of several different customization strategies for function representations. We develop a cost model to interpret dynamic counts of the operations required by each strategy. One of our results shows that a function representation strategy that makes customization decisions based on the presence or absence of a function's free variables, and that removes representation pollution by introducing multiway dispatch (what we call the selective sink splitting strategy), can produce better code than a defunctionalizing strategy similar to those in the literature.
[Compressed Postscript (144K)] [DVI (175K)]
Program Representation Size in an Intermediate Language with Intersection and Union Types, by Allyn Dimock, Ian Westmacott, Robert Muller, Franklyn Turbak, Joe Wells and Jeff Considine. To appear in the Proceedings of the Third International Workshop on Types in Compilation, Montreal, Canada, September 22, 2000.
Abstract: The CIL compiler for core Standard ML compiles whole programs using the CIL typed intermediate language with flow labels and intersection and union types. Flow labels embed flow information in the types and intersection and union types support precise polyvariant type and flow information, without the use of type-level abstraction or quantification.
Compile-time representations of CIL types and terms are potentially large compared to those for similar types and terms in systems based on quantified types. The listing-based nature of intersection and union types, together with flow label annotations on types, contribute to the size of CIL types. The CIL term representation duplicates portions of the program where intersection types are introduced and union types are eliminated. This duplication makes it easier to represent type information and to introduce multiple representation conventions, but incurs a compile-time space cost.
This paper presents empirical data on the compile-time space costs of using CIL. These costs can be made tractable using standard hash-consing techniques. Surprisingly, the duplicating nature of CIL has acceptable compile-time space performance in practice on the benchmarks and flow analyses that we have investigated. Increasing the precision of flow analysis can significantly reduce compile-time space costs. There is also preliminary evidence that the flow information encoded in CIL's type system can effectively guide data customization.
This paper supersedes an earlier version [DimockWestmacottMullerTurbakWellsConsidine00b] which appeared in an informal proceedings of the workshop.
[Compressed Postscript (231K)] [DVI (313K)]
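The "standard hash-consing techniques" the abstract credits for making compile-time space costs tractable can be sketched generically. The following Python fragment is an illustrative toy, not the CIL compiler's actual representation; the class names `Type`, `Base`, and `Arrow` are invented for the example:

```python
# Generic hash-consing sketch (illustrative only; not CIL compiler code).
# Structurally equal type terms are interned in a table, so the many
# duplicated annotations a flow-typed representation carries share storage.

class Type:
    _table = {}  # (class, args) -> canonical instance

    def __new__(cls, *args):
        key = (cls, args)
        if key not in Type._table:
            inst = super().__new__(cls)
            inst.args = args
            Type._table[key] = inst
        return Type._table[key]

class Base(Type):
    """A base type such as int."""

class Arrow(Type):
    """A function type (dom, cod, flow_labels); labels live on the arrow."""

int_t = Base("int")
t1 = Arrow(int_t, int_t, (1,))
t2 = Arrow(int_t, int_t, (1,))   # built independently...
assert t1 is t2                  # ...but physically shared after interning
assert t1 is not Arrow(int_t, int_t, (2,))  # distinct labels stay distinct
```

With interning, comparing two types for structural equality also degenerates to a pointer comparison, which helps the compiler as much as the space sharing does.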
Space Issues in Compiling with Intersection and Union Types, by Allyn Dimock, Ian Westmacott, Robert Muller, Franklyn Turbak, Joe Wells and Jeff Considine. In the Informal Proceedings of the Third International Workshop on Types in Compilation, Montreal, Canada, September 22, 2000. The Informal Proceedings were published as CMU Technical Report CMU-CS-00-161.
This preliminary version of the paper is superseded by [DimockWestmacottMullerTurbakWellsConsidine00a].
[Compressed Postscript (225K)] [DVI (278K)] [Bibtex Citation]
A Calculus with Polymorphic and Polyvariant Flow Types, by Joe Wells, Allyn Dimock, Robert Muller and Franklyn Turbak. To appear in the Journal of Functional Programming, Cambridge University Press.
Abstract: We present the CIL calculus, a typed lambda calculus which is intended to serve as the basis of an intermediate language for optimizing compilers for higher-order polymorphic programming languages. The language CIL has function, product, sum and recursive types. In addition, CIL has a novel formulation of intersection and union types and flow labels on both terms and types. The calculus is formulated to support the integration of polyvariant flow information in a polymorphically typed program representation. These flow types can guide a compiler in generating customized data representations in a strongly typed setting. We prove that the calculus satisfies confluence and subject reduction properties.
This paper supersedes [WellsDimockMullerTurbak97].
[Compressed Postscript (259K)] [DVI (262K)] [Bibtex Citation]
Sufficient Conditions for the Existence of Unique Minimal Fixpoints of Nonmonotonic Functions, by Yuli Zhou and Robert Muller. To appear in the Logic Journal of the IGPL, Oxford University Press.
Abstract: We prove several lattice theoretical fixpoint theorems based on the classical theorem of Knaster and Tarski. These theorems give sufficient conditions for a system of generally nonmonotonic functions on a complete lattice to define a unique minimal fixpoint. The primary objective of this paper is to develop a domain theoretic framework to study the semantics of logic programs as well as various rule-based systems where the rules define generally nonmonotonic functions on lattices.
This paper supersedes [ZhouMuller90].
[Compressed Postscript (K)] [DVI (90K)]
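For context, the classical Knaster-Tarski theorem that this paper (and its earlier conference version below) takes as its starting point can be stated as follows; this is the standard textbook formulation, not a quotation from the paper:

```latex
Let $(L, \sqsubseteq)$ be a complete lattice and let $f : L \to L$ be
monotone, i.e., $x \sqsubseteq y$ implies $f(x) \sqsubseteq f(y)$.
Then the set of fixpoints of $f$ is itself a complete lattice under
$\sqsubseteq$; in particular, $f$ has a least fixpoint, given by
\[
  \mathrm{lfp}(f) \;=\; \bigsqcap \{\, x \in L \mid f(x) \sqsubseteq x \,\}.
\]
```

The papers above ask when a generally nonmonotonic $f$, for which this hypothesis fails, still determines a unique minimal fixpoint.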
Two Applications of Standardization and Evaluation in Combinatory Reduction Systems, by Robert Muller and Joe Wells. Revised version under review.
Abstract: This paper presents two worked applications of a general framework that can be used to support reasoning about the operational equality relation defined by a programming language semantics. The framework, based on Combinatory Reduction Systems, facilitates the proof of standardization theorems for programming calculi. The importance of standardization theorems to programming language semantics was shown by Plotkin: standardization together with confluence guarantee that two terms equated in the calculus are semantically equal. The framework is applied to the call-by-value lambda calculus and to an untyped version of the CIL calculus. The latter is a basis for an intermediate language being used in a compiler.
This and the following paper are companion papers.
[Compressed Postscript (253K)] [DVI (229K)] [Bibtex Citation]
Standardization and Evaluation in Combinatory Reduction Systems, by Joe Wells and Robert Muller. Working paper.
Abstract: A rewrite system has standardization iff for any rewrite sequence there is an equivalent one which contracts the redexes in a standard order. Standardization is extremely useful for finding normalizing strategies and proving that a rewrite system for a programming language is sound with respect to the language's operational semantics.
Although for some rewrite systems the standard order can be simple, e.g., left-to-right or outermost-first, many systems need a more delicate order. There are abstract notions of standard order which always apply, but they require proofs (often quite difficult) that the rewrite system satisfies a number of axioms, and they provide little guidance for finding a concrete order that satisfies the abstract definition.
This paper gives a framework based on combinatory reduction systems (CRS's) which is general enough to handle many programming languages. If the CRS is orthogonal and fully extended and a good redex ordering can be found, then a standard order is obtained together with the standardization theorem. If the CRS also satisfies further criteria, then a good redex ordering is mechanically obtained from the rewrite rules. If the CRS is a constructor system and satisfies an additional requirement, then definitions of value and evaluation providing an operational semantics are automatically obtained together with a Plotkin/Wadsworth/Felleisen-style standardization theorem.
This and the preceding paper are companion papers.
Region Inference with Rank-2 Intersection Types and its Formalization in Isabelle, by Ian Westmacott, Robert Muller, Torben Amtoft and Joe Wells. Working paper.
Abstract: This working paper develops a Tofte-Talpin style region inference system for terms in a polymorphic programming language with rank-2 intersection types. This subsumes core ML. Region inference systems translate untyped programs into equivalent programs containing explicit storage management annotations that allocate and deallocate storage in a stack-like manner. We prove the soundness of our translation with respect to the operational semantics. Except where noted, all of the definitions, axioms, lemmas and theorems appearing in this paper have been formalized in Isabelle and all proofs have been mechanically verified.
Comments welcome!
Compiling with Polymorphic and Polyvariant Flow Types, by Franklyn Turbak, Allyn Dimock, Robert Muller and Joe Wells, in the Proceedings of the First International Workshop on Types in Compilation, Amsterdam, June, 1997.
Abstract: Optimizing compilers for function-oriented and object-oriented languages exploit type and flow information for efficient implementation. Although type and flow information (both control and data flow) are inseparably intertwined, compilers usually compute and represent them separately. This is partly because the usual polymorphic type systems rely on universal and existential quantifiers, which are difficult to combine with flow annotations.
In the Church Project, we are experimenting with intermediate languages that integrate type and flow information into a single flow type framework. This integration facilitates the preservation of flow and type information through program transformations. In this paper we describe CIL, an intermediate language supporting polymorphic types and polyvariant flow information, and describe its application to program optimization. We are experimenting with this intermediate language in a flow- and type-directed compiler for a functional language. The types of CIL encode flow information (1) via labels that approximate sources and sinks of values and (2) via intersection and union types, finitary versions of universal and existential types that support type polymorphism and (in combination with the labels) polyvariant flow analysis. Accurate flow types expose opportunities for a wide range of optimizing program transformations.
[Compressed Postscript (119K)] [DVI (67K)] [Bibtex Citation]
Strongly Typed Flow-Directed Representation Transformations, by Allyn Dimock, Robert Muller, Franklyn Turbak and Joe Wells, in the Proceedings of the International Conference on Functional Programming, Amsterdam, June, 1997.
Abstract: We present a new framework for transforming data representations in a strongly typed intermediate language. Our method allows both value producers (sources) and value consumers (sinks) to support multiple representations, automatically inserting any required code. Specialized representations can be easily chosen for particular source/sink pairs. The framework is based on several techniques detailed in the paper.
[Compressed Postscript (187K)] [Bibtex Citation]
A Typed Intermediate Language for Flow-Directed Compilation, by Joe Wells, Allyn Dimock, Robert Muller and Franklyn Turbak, in the Proceedings of TAPSOFT '97: Theory and Practice of Software Development, 7th International Joint Conference CAAP/FASE, Lille, France, April 1997.
Abstract: We present a typed intermediate language for optimizing compilers for function-oriented and polymorphically typed programming languages (e.g., ML). The language CIL is a typed lambda calculus with product, sum, intersection, and union types as well as function types annotated with flow labels. A novel formulation of intersection and union types supports encoding flow information in the typed program representation. This flow information can direct optimization.
This paper is superseded by [WellsDimockMullerTurbak99].
[Compressed Postscript (143K)] [Bibtex Citation]
A Staging Calculus and its Application to the Verification of Translators (Preliminary Report), in the Proceedings of the 21st Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, January 1994, pp. 389-396.
Abstract: We develop a calculus in which the computation steps required to execute a computer program can be separated into discrete stages. The calculus, denoted lambda-2, is embedded within the pure untyped lambda calculus. The main result of the paper is a characterization of sufficient conditions for the confluence of terms in the calculus. The condition can be taken as a correctness criterion for translators that perform reductions in one stage, leaving residual redexes over for subsequent computation stages. As an application of the theory, we verify the correctness of a macro expansion algorithm. The expansion algorithm is of some interest in its own right since it solves the problem of desired variable capture using only the familiar capture-avoiding substitutions.
[Compressed Postscript (92K)] [DVI (62K)] [Bibtex Citation]
Safe and Decidable Type Checking in an Object-Oriented Language , by Kim B. Bruce, Jon Crabtree, Allyn Dimock, Robert Muller, Thomas P. Murtagh, Robert van Gent, in the Proceedings of the ACM OOPSLA '93 Conference on Object-Oriented Programming Systems, Languages and Applications.
Abstract: Over the last several years, much interesting work has been done in modelling object-oriented programming languages in terms of extensions of the bounded second-order lambda calculus, $F_{\le}$. Unfortunately, it has recently been shown by Pierce (\cite{Pierce92}) that type checking $F_{\le}$ is undecidable. Moreover he showed that the undecidability arises in the seemingly simpler problem of determining whether one type is a subtype of another. In \cite{Bruce92,BrPOPL93}, the first author introduced a statically-typed, functional, object-oriented programming language, TOOPL, which supports classes, objects, methods, instance variables, subtypes, and inheritance. The semantics of TOOPL is based on $F_{\le}$, so the question arises whether type checking in this language is decidable. In this paper we show that type checking for TOOPLE, a minor variant of TOOPL (Typed Object-Oriented Programming Language), is decidable. The proof proceeds by showing that subtyping is decidable, that all terms of TOOPLE have minimum types (which are in fact computable), and then using these two results to show that type checking is decidable. Interestingly, conditional statements introduce significant problems which necessitated the computation of generalized least upper bounds of types. The interaction of the least upper bounds with the implicit recursion in object and class definitions and the contravariant nature of function spaces makes the computation of appropriate least upper bounds more subtle than might be expected. Our algorithm fails to be polynomial in the size of the term because the size of the type of a term can be exponential in the size of the term. Nevertheless, it performs well in practice.
This paper concentrates on the language without instance variables, though the results can be extended to the full language, at the cost of some added complexity.
[Compressed Postscript (155K)] [DVI (110K)] [Bibtex Citation]
M-LISP: A Representation Independent Dialect of LISP with Reduction Semantics, by Robert Muller, ACM Transactions on Programming Languages and Systems, Vol. 14, No. 4, October, 1992, pp. 589-616.
Abstract: In this paper we introduce M-LISP, a simple new dialect of LISP which is designed with an eye toward reconciling LISP's metalinguistic power with the structural style of operational semantics advocated by Plotkin [Plo75]. We begin by reviewing the original definition of LISP [Mc60] in an attempt to clarify the source of its metalinguistic power. We find that it arises from a problematic clause in this definition. We then define the abstract syntax and operational semantics of M-LISP, essentially a hybrid of M-expression LISP and Scheme. Next, we tie the operational semantics to the corresponding equational logic. As usual, provable equality in the logic implies operational equality. Having established this framework we then extend M-LISP with the metalinguistic eval and reify operators (the latter is a non-strict operator which converts its argument to its metalanguage representation). These operators encapsulate the metalinguistic representation conversions that occur globally in S-expression LISP. We show that the naive versions of these operators render LISP's equational logic inconsistent. On the positive side, we show that a naturally restricted form of the eval operator is confluent and therefore is a conservative extension of M-LISP. Unfortunately, we must weaken the logic considerably to obtain a consistent theory of reification.
This paper supersedes [Muller91].
[Compressed Postscript (149K)] [DVI (105K)] [Bibtex Citation]
Abstract Interpretation in Weak Powerdomains, by Robert Muller and Yuli Zhou, in the Proceedings of the ACM Conference on LISP and Functional Programming, June 1992.
Abstract: We introduce a class of semantic domains, weak powerdomains, that are intended to serve as value spaces for abstract interpretations in which safety is a concern. We apply them to the analysis of PCF programs. In the classical abstract interpretation approach, abstract domains are constructed explicitly and the abstract semantics is then related to the concrete semantics. In the approach presented here, abstract domains are derived directly from concrete domains. The conditions for deriving the domains are intended to be as general as possible while still guaranteeing that the derived domain has sufficient structure so that it can be used as a basis for computing correct information about the concrete semantics. We prove three main theorems, the last of which ensures the correctness of abstract interpretation of PCF programs given safe interpretations of the constants. This generalizes earlier results obtained for the special case of strictness analysis.
[Compressed Postscript (102K)] [DVI (58K)] [Bibtex Citation]
M-LISP: Its Natural Semantics and Equational Logic (Extended Abstract), by Robert Muller, in the Proceedings of the ACM SIGPLAN and IFIP Symposium on Partial Evaluation and Semantics-Based Program Manipulation, June 1991.
Abstract: The LISP evaluator is a virtual machine analog of the stored-program computer on which it executes --- it has universal power and dynamically constructed representations of programs can be converted by the eval operator into executable programs. In this paper we study the natural operational semantics and equational logic of a dialect of pure LISP and an extension which includes the eval operator and fexprs (i.e., non-strict functions whose arguments are passed by-representation). We begin by defining a natural operational semantics for the pure subset of M-LISP, a simple, representation-independent hybrid of McCarthy's original M-expression LISP and Scheme. We then establish the connection between the semantics and its equational logic in the usual way and prove that the logic is sound and consistent. With this as our setting we define the axioms and inference rules which give a deterministic operational semantics for the eval operator and fexprs. While the new constructions achieve the same ends as their S-expression LISP counterparts, they do so in somewhat different ways. The new versions encapsulate within the constructions the representation decoding which occurs globally in S-expression LISP. Turning to the equational logics we confirm the folklore that these constructs are in some sense unreasonable: as they are traditionally defined, neither construct is Church-Rosser (CR). On the positive side we show that a naturally restricted form of the eval operator is CR and therefore its equational theory is consistent.
This paper is superseded by [Muller92].
[Compressed Postscript (106K)] [DVI (67K)] [Bibtex Citation]
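The metalinguistic power that both M-LISP papers study can be illustrated with a tiny evaluator. This Python fragment is a generic illustration, not M-LISP and not the restricted eval analyzed in the papers: a program representation is ordinary data, built at run time, and an eval operator turns it into an executable function.

```python
def m_eval(expr, env):
    """Evaluate a tiny S-expression-style representation: numbers,
    variables, ('lambda', param, body) abstractions, and applications."""
    if isinstance(expr, (int, float)):
        return expr                      # literals evaluate to themselves
    if isinstance(expr, str):
        return env[expr]                 # variables look up the environment
    op, *rest = expr
    if op == "lambda":
        param, body = rest               # close over the current environment
        return lambda arg: m_eval(body, {**env, param: arg})
    fn = m_eval(op, env)                 # application: evaluate operator...
    return fn(*(m_eval(a, env) for a in rest))  # ...then operands

# The representation is ordinary data, constructed dynamically...
program = ("lambda", "x", ("add", "x", 1))
# ...and eval converts it into an executable program.
inc = m_eval(program, {"add": lambda a, b: a + b})
assert inc(41) == 42
```

The papers' negative results concern what happens to the equational theory when such an operator (and its inverse, reify) is added unrestricted; this sketch only shows the mechanism itself.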
Domain Theory for Nonmonotonic Functions, by Yuli Zhou and Robert Muller, Proceedings of the Second International Conference on Algebraic and Logic Programming, Nancy, France, October 1990.
Abstract: We prove several lattice theoretical fixpoint theorems based on the classical theorem of Knaster and Tarski. These theorems give sufficient conditions for a system of generally nonmonotonic functions on a complete lattice to define a unique minimal fixpoint. The primary objective of this paper is to develop a domain theoretic framework to study the semantics of logic programs as well as various rule-based systems where the rules define generally nonmonotonic functions on lattices.
This paper is superseded by [ZhouMuller98].
[Compressed Postscript (117K)] [DVI (69K)] [Bibtex Citation]
Proceedings of the First International Workshop on Types in Compilation (TIC97), Amsterdam, The Netherlands, June, 1997, published as Technical Report BCCS-97-03, Computer Science Department, Boston College.
Abstract: This is the informal proceedings of the TIC97 workshop.
Lecture Notes on Domain Theory, Harvard University CRCT Technical Report TR-06-92.
Abstract: What follows is a collection of lecture notes for a series of three lectures introducing Scott's {\em domain theory}. The lecture notes are not intended to serve as a primary reference but rather as a supplement to a more comprehensive treatment such as \cite{sc86} or \cite{st77}.
Abstract Semantics of First-Order Recursive Schemes, with Y. Zhou, Harvard University CRCT Technical Report TR-08-91.
Abstract: We develop a general framework for deriving abstract domains from concrete semantic domains in the context of first-order recursive schemes and prove several theorems which ensure the correctness (safety) of abstract computations. The abstract domains, which we call {\em Weak Hoare powerdomains}, subsume the roles of both the abstract domains and the collecting interpretations in the abstract interpretation literature.
Semantic Prototyping in M-LISP: A Representation Independent Dialect of LISP with Reduction Semantics, Harvard University CRCT Technical Report TR-05-90.
Abstract: In this paper we describe a new semantic metalanguage which simplifies prototyping of programming languages. The system integrates Paulson's semantic grammars within a new dialect of LISP, M-LISP, which has somewhat closer connections to the $\lambda$-calculus than other LISP dialects such as Scheme. The semantic grammars are expressed as attribute grammars. The generated parsers are M-LISP functions that can return denotational (i.e., higher-order) representations of abstract syntax. We illustrate the system with several examples and compare it to related systems.
Syntax Macros in M-LISP: A Representation Independent Dialect of LISP with Reduction Semantics, Harvard University CRCT Technical Report TR-04-90.
Abstract: In this paper we present an efficient algorithm for avoiding unintended name captures during syntax macro transcription in LISP. The algorithm is a hybrid of Kohlbecker's {\em Macro-by-Example} and {\em Hygienic} algorithms adapted for a representation-independent dialect of LISP. The adaptation yields a substantially different model of syntax macros than is found in S-expression LISP dialects. The most important difference is that $\lambda$-binding patterns become apparent when an abstraction is first (i.e., partially) transcribed in the syntax tree. This allows us to avoid a larger class of name captures than is possible in S-expression dialects such as Scheme where $\lambda$-binding patterns are not apparent until the tree is completely transcribed.
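The name-capture problem this report addresses can be shown in miniature. The following Python sketch is a plain gensym-renaming illustration, not Kohlbecker's Macro-by-Example/Hygienic hybrid described in the abstract: a macro-introduced binder is renamed to a fresh name so it cannot capture a user variable that happens to be spelled the same way.

```python
import itertools

_counter = itertools.count()

def gensym(base):
    """A fresh variable name that cannot collide with user-written names."""
    return f"{base}%{next(_counter)}"

def expand_or(a, b):
    """Expand (or a b) into (let ((t a)) (if t t b)) hygienically:
    the macro's temporary 't' is renamed before it binds anything."""
    t = gensym("t")
    return ("let", ((t, a),), ("if", t, t, b))

# A user expression that itself mentions a variable named 't':
expansion = expand_or(("f", "t"), "t")
_let, bindings, body = expansion
((tmp, _arg),) = bindings
assert tmp != "t"       # the macro-introduced binder is fresh...
assert body[3] == "t"   # ...so the user's 't' still means the user's 't'
```

A naive expansion using the literal name `t` would make the user's `t` in the else-branch refer to the macro's temporary, which is exactly the unintended capture the report's algorithm avoids.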