Copyright Notice The documents contained in these pages are included by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.
Index by Topic
Abstract: Red-black trees and leftist heaps are classic data structures that are commonly taught in Data Structures (CS2) and/or Algorithms (CS7) courses. This paper describes alternatives to these two data structures that may offer pedagogical advantages for typical students.
Functional Data Structures by Chris Okasaki. Chapter 40 of
Handbook of Data Structures and Applications, October 2004, pages 40-1 to 40-17.
Abstract: An introduction to functional data structures for a non-functional programming audience. Examples are implemented in both Haskell and Java.
Abstract: A combinator expression is flat if it can be written without parentheses, that is, if all applications nest to the left, never to the right. This note explores a simple method for flattening combinator expressions involving arbitrary combinators.
Abstract: Overview of functional data structures, using examples related to binary heap trees. Introduces maxiphobic heaps, a new kind of priority queue especially suitable for teaching.
Abstract: One popular use for Haskell in recent years has been as a host language for domain-specific embedded languages. But how can one embed a postfix language in Haskell, given that Haskell only supports prefix and infix syntax? This paper describes several such embeddings for increasingly complex postfix languages.
Abstract: Every programmer has blind spots. Breadth-first numbering is an interesting toy problem that exposes a blind spot common to many--perhaps most--functional programmers.
An Overview of Edison by Chris Okasaki.
Electronic Notes in Theoretical Computer Science, 41(1), 2001. [Original version in
Haskell Workshop, September 2000, pages 34-45.]
Abstract: Edison is a library of functional data structures implemented in Haskell. It supports three main families of abstractions: sequences, collections (e.g., sets and priority queues), and associative collections (e.g., finite maps). This paper summarizes the design of Edison, with particular attention to how that design is influenced by details of Haskell.
Simple Confluently Persistent Catenable Lists by Haim Kaplan, Chris Okasaki, and Robert Endre Tarjan.
SIAM Journal on Computing, 30(3):965-977, 2000. [Original version in
Scandinavian Workshop on Algorithm Theory, July 1998, pages 119--130.]
Abstract: We consider the problem of maintaining persistent lists subject to concatenation and to insertions and deletions at both ends. Updates to a persistent data structure are nondestructive--each operation produces a new list incorporating the change while keeping intact the list or lists to which it applies. Although general techniques exist for making data structures persistent, these techniques fail for structures that are subject to operations, such as catenation, that combine two or more versions. In this paper we develop a simple implementation of persistent double-ended queues with catenation that supports all deque operations in constant amortized time.
Description: Proceedings from the workshop.
Abstract: Square matrices serve as an interesting case study in functional programming. Common representations, such as lists of lists, are both inefficient--at least for access to individual elements--and error-prone, because the compiler cannot enforce ``squareness''. Switching to a typical balanced-tree representation solves the first problem, but not the second. We develop a representation that solves both problems: it offers logarithmic access to each individual element and it captures the shape invariants in the type, where they can be checked by the compiler. One interesting feature of our solution is that it translates the well-known fast exponentiation algorithm to the level of types. Our implementation also provides a stress test for today's advanced type systems--it uses nested types, polymorphic recursion, higher-order kinds, and rank-2 polymorphism.
Abstract: We present an exceptionally simple and fast implementation of balanced binary search trees. Both the simplicity and the speed result from re-thinking important design decisions in a functional, rather than imperative, setting.
Safe-for-Space Threads in Standard ML by Edoardo Biagioni, Ken Cline, Peter Lee, Chris Okasaki, and Chris Stone.
Higher-Order and Symbolic Computation, 11(2):209-225, December 1998. [Original version in
2nd ACM SIGPLAN Workshop on Continuations, January 1997, pages 3:1-16.]
Abstract: Threads can be implemented easily in an extension of Standard ML by using first-class continuations, but the straightforward approaches for doing so lead to space leaks. We show how these space leaks arise and give a new implementation for threads that is safe for space.
Abstract: In Standard ML, as in many other languages, programmers are often confronted with an unpleasant choice between pattern matching and abstraction. Because pattern matching can only be performed on concrete datatypes, programmers must often sacrifice either the convenience of pattern matching or the engineering benefits of abstraction. Views relieve this tension by allowing pattern matching on abstract datatypes. We propose a modest extension of Standard ML with views and define its semantics via a source-to-source translation back into Standard ML without views. We claim no particular technical innovation; rather, we have attempted to engineer a solution that blends as seamlessly as possible with the rest of the language, including the module system and the stateful features of the language.
Abstract: Finite maps are ubiquitous in many applications, but perhaps nowhere more so than in compilers and other language processors. In these applications, three operations on finite maps dominate all others: looking up the value associated with a key, inserting a new binding, and merging two finite maps. Most implementations of finite maps in functional languages are based on balanced binary search trees, which perform well on the first two, but poorly on the third. We describe an implementation of finite maps with integer keys that performs well in practice on all three operations. This data structure is not new--indeed, it is thirty years old this year--but it deserves to be more widely known.
This is a revised and expanded version of my
thesis. New data structures include red-black trees, splay trees, pairing heaps, leftist heaps, tries, and Hood-Melville queues. I also provide numerous exercises, as well as Haskell translations of each of the major data structures.
Reviewed in Journal of Functional Programming (September 1999), Computing Reviews (March 1999), and Science of Computer Programming (November 1999).
Abstract: We illustrate the use of third-, fourth-, fifth-, and even sixth-order functions with examples taken from a combinator parsing library.
Abstract: We explore algorithms for several problems involving a family of balanced trees known as Braun trees. In each case, the final algorithm achieves its efficiency by exploiting the particularly stringent notion of balance embodied in Braun trees.
Abstract: Catenable double-ended queues are double-ended queues (deques) that support catenation (i.e., append) efficiently without sacrificing the efficiency of other operations. We present a purely functional implementation of catenable deques for which every operation, including catenation, takes O(1) amortized time. Kaplan and Tarjan have independently developed a much more complicated implementation of catenable deques that achieves similar worst-case bounds. The two designs are superficially similar, but differ in the underlying mechanism used to achieve efficiency in a persistent setting (i.e., when used in a non-single-threaded fashion). Their implementation uses a technique called recursive slowdown, while ours relies on the simpler mechanism of lazy evaluation.
Besides lazy evaluation, our implementation also exemplifies the use of two additional language features: polymorphic recursion and views. Neither is indispensable, but both significantly simplify the presentation.
Abstract: When a C programmer needs an efficient data structure for a particular problem, he or she can often simply look one up in any of a number of good textbooks or handbooks. Unfortunately, programmers in functional languages such as Standard ML or Haskell do not have this luxury. Although some data structures designed for imperative languages such as C can be quite easily adapted to a functional setting, most cannot, usually because they depend in crucial ways on assignments, which are disallowed, or at least discouraged, in functional languages. To address this imbalance, we describe several techniques for designing functional data structures, and numerous original data structures based on these techniques, including multiple variations of lists, queues, double-ended queues, and heaps, many supporting more exotic features such as random access or efficient catenation.
In addition, we expose the fundamental role of lazy evaluation in amortized functional data structures. Traditional methods of amortization break down when old versions of a data structure, not just the most recent, are available for further processing. This property is known as persistence, and is taken for granted in functional languages. On the surface, persistence and amortization appear to be incompatible, but we show how lazy evaluation can be used to resolve this conflict, yielding amortized data structures that are efficient even when used persistently. Turning this relationship between lazy evaluation and amortization around, the notion of amortization also provides the first practical techniques for analyzing the time requirements of non-trivial lazy programs.
Finally, our data structures offer numerous hints to programming language designers, illustrating the utility of combining strict and lazy evaluation in a single language, and providing non-trivial examples using polymorphic recursion and higher-order, recursive modules.
Abstract: Efficient data structures have been studied extensively for over thirty years. When a C programmer needs an efficient data structure for a particular problem, he or she can often simply look one up in any of a number of good textbooks or handbooks. However, the same cannot be said for the ML or Haskell programmer. Although some imperative data structures can be adapted quite easily to a functional setting, most cannot.
In this lecture, we will examine how functional data structures differ from imperative data structures and explore efficient functional implementations of several common abstractions, including FIFO queues, catenable lists, and meldable priority queues. A running theme will be the use of lazy evaluation in efficient amortized data structures.
Abstract: Brodal recently introduced the first implementation of imperative priority queues to support findMin, insert, and meld in O(1) worst-case time, and deleteMin in O(log n) worst-case time. These bounds are asymptotically optimal among all comparison-based priority queues. In this paper, we adapt Brodal's data structure to a purely functional setting. In doing so, we both simplify the data structure and clarify its relationship to the binomial queues of Vuillemin, which support all four operations in O(log n) time. Specifically, we derive our implementation from binomial queues in three steps: first, we reduce the running time of insert to O(1) by eliminating the possibility of cascading links; second, we reduce the running time of findMin to O(1) by adding a global root to hold the minimum element; and finally, we reduce the running time of meld to O(1) by allowing priority queues to contain other priority queues. Each of these steps is expressed using ML-style functors. The last transformation, known as data-structural bootstrapping, is an interesting application of higher-order functors and recursive structures.
Abstract: Traditional techniques for designing and analyzing amortized data structures in an imperative setting are of limited use in a functional setting because they apply only to single-threaded data structures, yet functional data structures can be non-single-threaded. In earlier work, we showed how lazy evaluation supports functional amortized data structures and described a technique (the banker's method) for analyzing such data structures. In this paper, we present a new analysis technique (the physicist's method) and show how one can sometimes derive a worst-case data structure from an amortized data structure by appropriately scheduling the premature execution of delayed components. We use these techniques to develop new implementations of FIFO queues and binomial queues.
Abstract: Amortization has been underutilized in the design of persistent data structures, largely because traditional accounting schemes break down in a persistent setting. Such schemes depend on saving ``credits'' for future use, but a persistent data structure may have multiple ``futures'', each competing for the same credits. We describe how lazy evaluation can often remedy this problem, yielding persistent data structures with good amortized efficiency. In fact, such data structures can be implemented purely functionally in any functional language supporting lazy evaluation. As an example of this technique, we present a purely functional (and therefore persistent) implementation of lists that simultaneously support catenation and all other usual list primitives in constant amortized time. This data structure is much simpler than the only existing data structure with comparable bounds, the recently discovered catenable lists of Kaplan and Tarjan, which support all operations in constant worst-case time.
Abstract: We present purely functional implementations of queues and double-ended queues (deques) requiring only O(1) time per operation in the worst case. Our algorithms are considerably simpler than previous designs with the same bounds. The inspiration for our approach is the incremental behavior of certain functions on lazy lists.
Abstract: We present a new data structure, called a random-access list, that supports array lookup and update operations in O(log n) time, while simultaneously providing O(1) time list operations (cons, head, tail). A closer analysis of the array operations improves the bound to O(min {i,log n}) in the worst case and O(log i) in the expected case, where i is the index of the desired element. Empirical evidence suggests that this data structure should be quite efficient in practice.
Abstract: This paper examines the transformation of call-by-need $\lambda$ terms into continuation-passing style (CPS). It begins by presenting a simple transformation of call-by-need $\lambda$ terms into program graphs and a reducer for such graphs. From this, an informal derivation is carried out, resulting in a translation from $\lambda$ terms into self-reducing program graphs, where the graphs are represented as CPS terms involving storage operations. Though informal, the derivation proceeds in simple steps, and the resulting translation is taken to be our canonical CPS transformation for call-by-need $\lambda$ terms.
In order to define the CPS transformation more formally, two alternative presentations are given. The first takes the form of a continuation semantics for the call-by-need language. The second presentation follows Danvy and Hatcliff's two-stage decomposition of the call-by-name CPS transformation, resulting in a similar two-stage CPS transformation for call-by-need.
Finally, a number of practical matters are considered, including an improvement to eliminate the so-called administrative redexes, as well as to avoid unnecessary memoization and take advantage of strictness information. These improvements make it feasible to consider potential applications in compilers for call-by-need programming languages.
(This is a substantially revised version of our earlier CW'92 paper.)
Abstract: As software systems grow in size and complexity their design problem extends beyond algorithms and data structures to issues of system design. This area receives little or no treatment in existing computer science curricula. Although courses about specific systems are usually available there is no systematic treatment of the organizations used to assemble components into systems. These issues - the software architecture level of software design - are the subject of a new course that we taught for the first time in Spring 1992. This paper describes the motivation for the course, the content and structure of the current version, and our plans for improving the next version.
?