Dr. Okasaki spent three years at Columbia University as an Assistant Professor of Computer Science, where he taught courses in programming languages and advanced data structures. He has also worked as a visiting researcher at the University of Glasgow, and as a consultant for an Internet startup company, developing a compiler for their agent control language. His primary research interests are programming languages and algorithms. He is especially interested in the combination of these two areas, considering questions of how the details of a programming language affect the implementation and efficiency of algorithms.
Copyright Notice The documents contained in these pages are included by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder. Index by Topic
Alternatives to Two Classic Data Structures by Chris Okasaki. Symposium on Computer Science Education (SIGCSE), February 2005, pages 162-165.
Download: PDF, source code
Abstract: Red-black trees and leftist heaps are classic data structures that are commonly taught in Data Structures (CS2) and/or Algorithms (CS7) courses. This paper describes alternatives to these two data structures that may offer pedagogical advantages for typical students.
Functional Data Structures by Chris Okasaki. Chapter 40 of Handbook of Data Structures and Applications, October 2004, pages 40-1 to 40-17.
Abstract: An introduction to functional data structures for a non-functional programming audience. Examples are implemented in both Haskell and Java.
Flattening Combinators: Surviving Without Parentheses by Chris Okasaki. Journal of Functional Programming, 13(4):815-822, July 2003.
Download: PDF
Abstract: A combinator expression is flat if it can be written without parentheses, that is, if all applications nest to the left, never to the right. This note explores a simple method for flattening combinator expressions involving arbitrary combinators.
Fun with binary heap trees by Chris Okasaki. The Fun of Programming, March 2003, pages 1-16.
Abstract: Overview of functional data structures, using examples related to binary heap trees. Introduces maxiphobic heaps, a new kind of priority queue especially suitable for teaching.
Techniques for Embedding Postfix Languages in Haskell by Chris Okasaki. Haskell Workshop, October 2002, pages 105-113.
Download: PDF, source code
Abstract: One popular use for Haskell in recent years has been as a host language for domain-specific embedded languages. But how can one embed a postfix language in Haskell, given that Haskell only supports prefix and infix syntax? This paper describes several such embeddings for increasingly complex postfix languages.
Breadth-First Numbering: Lessons from a Small Exercise in Algorithm Design by Chris Okasaki. International Conference on Functional Programming, September 2000, pages 131-136.
Download: PDF
Abstract: Every programmer has blind spots. Breadth-first numbering is an interesting toy problem that exposes a blind spot common to many--perhaps most--functional programmers.
An Overview of Edison by Chris Okasaki. Electronic Notes in Theoretical Computer Science, 41(1), 2001. [Original version in Haskell Workshop, September 2000, pages 34-45.]
Download: PDF
Abstract: Edison is a library of functional data structures implemented in Haskell. It supports three main families of abstractions: sequences, collections (e.g., sets and priority queues), and associative collections (e.g., finite maps). This paper summarizes the design of Edison, with particular attention to how that design is influenced by details of Haskell.
Simple Confluently Persistent Catenable Lists by Haim Kaplan, Chris Okasaki, and Robert Endre Tarjan. SIAM Journal on Computing, 30(3):965-977, 2000. [Original version in Scandinavian Workshop on Algorithm Theory, July 1998, pages 119--130.]
Download: PDF
Abstract: We consider the problem of maintaining persistent lists subject to concatenation and to insertions and deletions at both ends. Updates to a persistent data structure are nondestructive--each operation produces a new list incorporating the change while keeping intact the list or lists to which it applies. Although general techniques exist for making data structures persistent, these techniques fail for structures that are subject to operations, such as catenation, that combine two or more versions. In this paper we develop a simple implementation of persistent double-ended queues with catenation that supports all deque operations in constant amortized time.
Workshop on Algorithmic Aspects of Advanced Programming Languages (WAAAPL'99), edited by Chris Okasaki. September 30, 1999.
Download: PDF
Description: Proceedings from the workshop.
From Fast Exponentiation to Square Matrices: An Adventure in Types by Chris Okasaki. International Conference on Functional Programming, September 1999, pages 28-35.
Download: PDF
Abstract: Square matrices serve as an interesting case study in functional programming. Common representations, such as lists of lists, are both inefficient--at least for access to individual elements--and error-prone, because the compiler cannot enforce ``squareness''. Switching to a typical balanced-tree representation solves the first problem, but not the second. We develop a representation that solves both problems: it offers logarithmic access to each individual element and it captures the shape invariants in the type, where they can be checked by the compiler. One interesting feature of our solution is that it translates the well-known fast exponentiation algorithm to the level of types. Our implementation also provides a stress test for today's advanced type systems--it uses nested types, polymorphic recursion, higher-order kinds, and rank-2 polymorphism.
Red-Black Trees in a Functional Setting by Chris Okasaki. Journal of Functional Programming, 9(4):471-477, July 1999.
Download: PDF
Abstract: We present an exceptionally simple and fast implementation of balanced binary search trees. Both the simplicity and the speed result from re-thinking important design decisions in a functional, rather than imperative, setting.
Book Review of Computability and Complexity: From a Programming Perpective by Neil D. Jones. Review by Chris Okasaki. Journal of Functional Programming, 9(4):481-482, July 1999.
Safe-for-Space Threads in Standard ML by Edoardo Biagioni, Ken Cline, Peter Lee, Chris Okasaki, and Chris Stone. Higher-Order and Symbolic Computation, 11(2):209-225, December 1998. [Original version in 2nd ACM SIGPLAN Workshop on Continuations, January 1997, pages 3:1-16.]
Download: PDF, source code
Abstract: Threads can be implemented easily in an extension of Standard ML by using first-class continuations, but the straightforward approaches for doing so lead to space leaks. We show how these space leaks arise and give a new implementation for threads that is safe for space.
Views for Standard ML by Chris Okasaki. Workshop on ML, September 1998, pages 14-23.
Download: PDF
Abstract: In Standard ML, as in many other languages, programmers are often confronted with an unpleasant choice between pattern matching and abstraction. Because pattern matching can only be performed on concrete datatypes, programmers must often sacrifice either the convenience of pattern matching or the engineering benefits of abstraction. Views relieve this tension by allowing pattern matching on abstract datatypes. We propose a modest extension of Standard ML with views and define its semantics via a source-to-source translation back into Standard ML without views. We claim no particular technical innovation; rather, we have attempted to engineer a solution that blends as seamlessly as possible with the rest of the language, including the module system and the stateful features of the language.
Fast Mergeable Integer Maps by Chris Okasaki and Andy Gill. Workshop on ML, September 1998, pages 77-86.
Download: PDF, source code
Abstract: Finite maps are ubiquitous in many applications, but perhaps nowhere more so than in compilers and other language processors. In these applications, three operations on finite maps dominate all others: looking up the value associated with a key, inserting a new binding, and merging two finite maps. Most implementations of finite maps in functional languages are based on balanced binary search trees, which perform well on the first two, but poorly on the third. We describe an implementation of finite maps with integer keys that performs well in practice on all three operations. This data structure is not new--indeed, it is thirty years old this year--but it deserves to be more widely known.
Purely Functional Data Structures by Chris Okasaki. Cambridge University Press, 1998.
This is a revised and expanded version of my thesis. New data structures include red-black trees, splay trees, pairing heaps, leftist heaps, tries, and Hood-Melville queues. I also provide numerous exercises, as well as Haskell translations of each of the major data structures.
Reviewed in Journal of Functional Programming (September 1999), Computing Reviews (March 1999), and Science of Computer Programming (November 1999).
Source code: Standard ML, Haskell, Markus Mottl's translation into OCAML
Even Higher-Order Functions for Parsing or Why Would Anyone Ever Want To Use a Sixth-Order Function? by Chris Okasaki. Journal of Functional Programming, 8(2):195-199, March 1998.
Download: PDF
Abstract: We illustrate the use of third-, fourth-, fifth-, and even sixth-order functions with examples taken from a combinator parsing library.
Three Algorithms on Braun Trees by Chris Okasaki. Journal of Functional Programming, 7(6):661-666, November 1997.
Download: PDF
Abstract: We explore algorithms for several problems involving a family of balanced trees known as Braun trees. In each case, the final algorithm achieves its efficiency by exploiting the particularly stringent notion of balance embodied in Braun trees.
Catenable Double-Ended Queues by Chris Okasaki. International Conference on Functional Programming, June 1997, pages 66-74.
Download: PDF
Abstract: Catenable double-ended queues are double-ended queues (deques) that support catenation (i.e., append) efficiently without sacrificing the efficiency of other operations. We present a purely functional implementation of catenable deques for which every operation, including catenation, takes O(1) amortized time. Kaplan and Tarjan have independently developed a much more complicated implementation of catenable deques that achieves similar worst-case bounds. The two designs are superficially similar, but differ in the underlying mechanism used to achieve efficiency in a persistent setting (i.e., when used in a non-single-threaded fashion). Their implementation uses a technique called recursive slowdown, while ours relies on the simpler mechanism of lazy evaluation.
Besides lazy evaluation, our implementation also exemplifies the use of two additional language features: polymorphic recursion and views. Neither is indispensable, but both significantly simplify the presentation.
Purely Functional Data Structures by Chris Okasaki. Ph.D. thesis. School of Computer Science, Carnegie Mellon University. September 1996.
Abstract: When a C programmer needs an efficient data structure for a particular problem, he or she can often simply look one up in any of a number of good textbooks or handbooks. Unfortunately, programmers in functional languages such as Standard ML or Haskell do not have this luxury. Although some data structures designed for imperative languages such as C can be quite easily adapted to a functional setting, most cannot, usually because they depend in crucial ways on assignments, which are disallowed, or at least discouraged, in functional languages. To address this imbalance, we describe several techniques for designing functional data structures, and numerous original data structures based on these techniques, including multiple variations of lists, queues, double-ended queues, and heaps, many supporting more exotic features such as random access or efficient catenation.
In addition, we expose the fundamental role of lazy evaluation in amortized functional data structures. Traditional methods of amortization break down when old versions of a data structure, not just the most recent, are available for further processing. This property is known as persistence, and is taken for granted in functional languages. On the surface, persistence and amortization appear to be incompatible, but we show how lazy evaluation can be used to resolve this conflict, yielding amortized data structures that are efficient even when used persistently. Turning this relationship between lazy evaluation and amortization around, the notion of amortization also provides the first practical techniques for analyzing the time requirements of non-trivial lazy programs.
Finally, our data structures offer numerous hints to programming language designers, illustrating the utility of combining strict and lazy evaluation in a single language, and providing non-trivial examples using polymorphic recursion and higher-order, recursive modules.
Functional Data Structures by Chris Okasaki. Advanced Functional Programming, August 1996, pages 131-158, LNCS 1129.
Download: PDF, source code.
Abstract: Efficient data structures have been studied extensively for over thirty years. When a C programmer needs an efficient data structure for a particular problem, he or she can often simply look one up in any of a number of good textbooks or handbooks. However, the same cannot be said for the ML or Haskell programmer. Although some imperative data structures can be adapted quite easily to a functional setting, most cannot.
In this lecture, we will examine how functional data structures differ from imperative data structures and explore efficient functional implementations of several common abstractions, including FIFO queues, catenable lists, and meldable priority queues. A running theme will be the use of lazy evaluation in efficient amortized data structures.
Optimal Purely Functional Priority Queues by Gerth Stølting Brodal and Chris Okasaki. Journal of Functional Programming, 6(6):839-857, November 1996.
Download: PDF, source code.
Abstract: Brodal recently introduced the first implementation of imperative priority queues to support findMin, insert, and meld in O(1) worst-case time, and deleteMin in O(log n) worst-case time. These bounds are asymptotically optimal among all comparison-based priority queues. In this paper, we adapt Brodal's data structure to a purely functional setting. In doing so, we both simplify the data structure and clarify its relationship to the binomial queues of Vuillemin, which support all four operations in O(log n) time. Specifically, we derive our implementation from binomial queues in three steps: first, we reduce the running time of insert to O(1) by eliminating the possibility of cascading links; second, we reduce the running time of findMin to O(1) by adding a global root to hold the minimum element; and finally, we reduce the running time of meld to O(1) by allowing priority queues to contain other priority queues. Each of these steps is expressed using ML-style functors. The last transformation, known as data-structural bootstrapping, is an interesting application of higher-order functors and recursive structures.
The Role of Lazy Evaluation in Amortized Data Structures by Chris Okasaki. International Conference on Functional Programming, May 1996, pages 62-72.
Download: PDF, source code
Abstract: Traditional techniques for designing and analyzing amortized data structures in an imperative setting are of limited use in a functional setting because they apply only to single-threaded data structures, yet functional data structures can be non-single-threaded. In earlier work, we showed how lazy evaluation supports functional amortized data structures and described a technique (the banker's method) for analyzing such data structures. In this paper, we present a new analysis technique (the physicist's method) and show how one can sometimes derive a worst-case data structure from an amortized data structure by appropriately scheduling the premature execution of delayed components. We use these techniques to develop new implementations of FIFO queues and binomial queues.
Amortization, Lazy Evaluation, and Persistence: Lists with Catenation via Lazy Linking by Chris Okasaki. IEEE Symposium on Foundations of Computer Science, October 1995, pages 646-654.
Download: PDF, source code
Abstract: Amortization has been underutilized in the design of persistent data structures, largely because traditional accounting schemes break down in a persistent setting. Such schemes depend on saving ``credits'' for future use, but a persistent data structure may have multiple ``futures'', each competing for the same credits. We describe how lazy evaluation can often remedy this problem, yielding persistent data structures with good amortized efficiency. In fact, such data structures can be implemented purely functionally in any functional language supporting lazy evaluation. As an example of this technique, we present a purely functional (and therefore persistent) implementation of lists that simultaneously support catenation and all other usual list primitives in constant amortized time. This data structure is much simpler than the only existing data structure with comparable bounds, the recently discovered catenable lists of Kaplan and Tarjan, which support all operations in constant worst-case time.
Simple and Efficient Purely Functional Queues and Deques by Chris Okasaki. Journal of Functional Programming, 5(4):583-592, October 1995.
Download: PDF, source code
Abstract: We present purely functional implementations of queues and double-ended queues (deques) requiring only O(1) time per operation in the worst case. Our algorithms are considerably simpler than previous designs with the same bounds. The inspiration for our approach is the incremental behavior of certain functions on lazy lists.
Purely Functional Random-Access Lists by Chris Okasaki. Functional Programming Languages and Computer Architecutre, June 1995, pages 86-95.
Download: PDF, source code
Abstract: We present a new data structure, called a random-access list, that supports array lookup and update operations in O(log n) time, while simultaneously providing O(1) time list operations (cons, head, tail). A closer analysis of the array operations improves the bound to O(min {i,log n}) in the worst case and O(log i) in the expected case, where i is the index of the desired element. Empirical evidence suggests that this data structure should be quite efficient in practice.
Call-by-need and Continuation-passing Style by Chris Okasaki, Peter Lee, and David Tarditi. Lisp and Symbolic Computation, 7(1):57-81, January 1994.
Download: PDF
Abstract: This paper examines the transformation of call-by-need $\lambda$ terms into continuation-passing style (CPS). It begins by presenting a simple transformation of call-by-need $\lambda$ terms into program graphs and a reducer for such graphs. From this, an informal derivation is carried out, resulting in a translation from $\lambda$ terms into self-reducing program graphs, where the graphs are represented as CPS terms involving storage operations. Though informal, the derivation proceeds in simple steps, and the resulting translation is taken to be our canonical CPS transformation for call-by-need $\lambda$ terms.
In order to define the CPS transformation more formally, two alternative presentations are given. The first takes the form of a continuation semantics for the call-by-need language. The second presentation follows Danvy and Hatcliff's two-stage decomposition of the call-by-name CPS transformation, resulting in a similar two-stage CPS transformation for call-by-need.
Finally, a number of practical matters are considered, including an improvement to eliminate the so-called administrative redexes, as well as to avoid unnecessary memoization and take advantage of strictness information. These improvements make it feasible to consider potential applications in compilers for call-by-need programming languages.
(This is a substantially revised version of our earlier CW'92 paper.)
Experience with a Course on Architectures for Software Systems by David Garlan, Mary Shaw, Chris Okasaki, Curtis Scott, and Roy Swonger. Conference on Software Engineering Education, October 1992, pages 23-43.
Abstract: As software systems grow in size and complexity their design problem extends beyond algorithms and data structures to issues of system design. This area receives little or no treatment in existing computer science curricula. Although courses about specific systems are usually available there is no systematic treatment of the organizations used to assemble components into systems. These issues - the software architecture level of software design - are the subject of a new course that we taught for the first time in Spring 1992. This paper describes the motivation for the course, the content and structure of the current version, and our plans for improving the next version.
Graph Reduction and Lazy Continuation-Passing Style by Chris Okasaki, Peter Lee, and David Tarditi. Workshop on Continuations, June 1992, pages 91-101.
(Superseded by our LASC'94 paper.)