Dryad was a research project at Microsoft Research for a general-purpose runtime for executing data-parallel applications. The research prototypes of the Dryad and DryadLINQ data-parallel processing frameworks are available in source form on GitHub.[1]
Overview
Microsoft made several preview releases of this technology available as add-ons to Windows HPC Server 2008 R2.
An application written for Dryad is modeled as a directed acyclic graph (DAG). The DAG defines the dataflow of the application, and the vertices of the graph define the operations that are to be performed on the data. These "computational vertices" are written using sequential constructs, devoid of any concurrency or mutual exclusion semantics. The Dryad runtime parallelizes the dataflow graph by distributing the computational vertices across various execution engines, which can be multiple processor cores on the same computer or different physical computers connected by a network, as in a cluster. Scheduling of the computational vertices on the available hardware is handled by the Dryad runtime, without any explicit intervention by the developer of the application or the administrator of the network. The flow of data from one computational vertex to another is implemented by communication "channels" between the vertices, which are physically realized as TCP/IP streams, shared memory, or temporary files. At runtime, a channel transports a finite number of structured items.
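The execution model described above can be illustrated with a small, single-process sketch. It is not Dryad code: the class name DataflowGraph, its methods, and the use of in-memory vectors of integers as channels are all assumptions made for illustration. Each vertex body is ordinary sequential code, and a simple scheduler runs a vertex only once all of its producers have finished, which is the ordering constraint a real runtime would also have to respect when dispatching vertices to different cores or machines.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical sketch, not the Dryad API: a dataflow graph whose vertices are
// sequential functions and whose channels are finite in-memory item lists.
class DataflowGraph {
public:
    using Items = std::vector<int>;
    // A vertex body receives one item list per incoming channel and returns
    // the items it emits. It contains no threads or locks.
    using Body = std::function<Items(const std::vector<Items>&)>;

    std::size_t add_vertex(Body body) {
        bodies_.push_back(std::move(body));
        predecessors_.emplace_back();
        successors_.emplace_back();
        return bodies_.size() - 1;
    }

    // Adds a channel carrying the output of `from` to `to`.
    void connect(std::size_t from, std::size_t to) {
        assert(from < bodies_.size() && to < bodies_.size());
        successors_[from].push_back(to);
        predecessors_[to].push_back(from);
    }

    // Runs each vertex exactly once, in an order where every producer runs
    // before its consumers, and returns the output of every vertex.
    // Simplification: a vertex sends the same items on all outgoing channels.
    std::vector<Items> run() const {
        const std::size_t n = bodies_.size();
        std::vector<std::size_t> pending(n);
        std::queue<std::size_t> ready;
        for (std::size_t v = 0; v < n; ++v) {
            pending[v] = predecessors_[v].size();
            if (pending[v] == 0) ready.push(v);
        }
        std::vector<Items> outputs(n);
        std::size_t executed = 0;
        while (!ready.empty()) {
            const std::size_t v = ready.front();
            ready.pop();
            std::vector<Items> inputs;
            for (std::size_t p : predecessors_[v]) inputs.push_back(outputs[p]);
            outputs[v] = bodies_[v](inputs);
            ++executed;
            for (std::size_t s : successors_[v]) {
                if (--pending[s] == 0) ready.push(s);
            }
        }
        // Vertices left unexecuted can only be waiting on a cycle.
        if (executed != n) throw std::logic_error("graph contains a cycle");
        return outputs;
    }

private:
    std::vector<Body> bodies_;
    std::vector<std::vector<std::size_t>> predecessors_;
    std::vector<std::vector<std::size_t>> successors_;
};
```

In this sketch the ready queue is drained by a single thread. Because a vertex only becomes ready after its inputs are complete, the same bookkeeping would allow all vertices in the ready queue to be handed to separate workers without changing any vertex body.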
Dryad defines a domain-specific language, implemented as a C++ library, that is used to create and model a Dryad execution graph. Computational vertices are written using standard C++ constructs. To make them accessible to the Dryad runtime, they must be encapsulated in a class that inherits from the GraphNode base class. The graph is defined by adding edges; edges are added by using a composition operator (defined by Dryad) that connects two graphs (or two nodes of a graph) with an edge. Managed code wrappers for the Dryad API can also be written.
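How a composition operator can build a graph from smaller graphs is sketched below. The sketch does not use the Dryad headers: the Graph struct, the single helper, and the index-based representation are hypothetical. The operator spellings (^ for replication, >= for pointwise connection, >> for all-to-all connection) follow the notation described in the Dryad paper listed under Further reading, but their behavior here is a simplified assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical graph value, not the Dryad type: vertex names, directed edges,
// and the indices of the vertices that act as the graph's inputs and outputs.
struct Graph {
    std::vector<std::string> vertices;
    std::vector<std::pair<std::size_t, std::size_t>> edges;
    std::vector<std::size_t> inputs;
    std::vector<std::size_t> outputs;
};

// A graph with one vertex that is both its input and its output.
Graph single(std::string name) {
    Graph g;
    g.vertices.push_back(std::move(name));
    g.inputs = {0};
    g.outputs = {0};
    return g;
}

// Copies b's vertices and edges into a; returns the offset added to b's indices.
static std::size_t append(Graph& a, const Graph& b) {
    const std::size_t offset = a.vertices.size();
    a.vertices.insert(a.vertices.end(), b.vertices.begin(), b.vertices.end());
    for (const auto& e : b.edges) a.edges.emplace_back(e.first + offset, e.second + offset);
    return offset;
}

// Replication: n disconnected copies of g placed side by side.
Graph operator^(const Graph& g, std::size_t n) {
    Graph result;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t offset = append(result, g);
        for (std::size_t in : g.inputs) result.inputs.push_back(in + offset);
        for (std::size_t out : g.outputs) result.outputs.push_back(out + offset);
    }
    return result;
}

// Pointwise composition: the i-th output of a feeds the i-th input of b,
// wrapping around on the shorter side if the counts differ.
Graph operator>=(const Graph& a, const Graph& b) {
    Graph result = a;
    const std::size_t offset = append(result, b);
    const bool empty = a.outputs.empty() || b.inputs.empty();
    const std::size_t n = empty ? 0 : std::max(a.outputs.size(), b.inputs.size());
    for (std::size_t i = 0; i < n; ++i) {
        result.edges.emplace_back(a.outputs[i % a.outputs.size()],
                                  b.inputs[i % b.inputs.size()] + offset);
    }
    result.outputs.clear();
    for (std::size_t out : b.outputs) result.outputs.push_back(out + offset);
    return result;
}

// Complete bipartite composition: every output of a feeds every input of b.
Graph operator>>(const Graph& a, const Graph& b) {
    Graph result = a;
    const std::size_t offset = append(result, b);
    for (std::size_t out : a.outputs) {
        for (std::size_t in : b.inputs) result.edges.emplace_back(out, in + offset);
    }
    result.outputs.clear();
    for (std::size_t out : b.outputs) result.outputs.push_back(out + offset);
    return result;
}
```

With these definitions, an expression such as ((single("read") ^ 3) >= (single("sort") ^ 3)) >> single("merge") describes three read vertices each feeding its own sort vertex, with all three sort vertices feeding one merge vertex. The parentheses are required because C++ gives ^ lower precedence than >= and >>.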
Several high-level language compilers use Dryad as a runtime; examples include SCOPE (Structured Computations Optimized for Parallel Execution) and DryadLINQ.[2]
In October 2011, Microsoft discontinued active development on Dryad, shifting focus to the Apache Hadoop framework.[3][4][5]
References
- [1] "MicrosoftResearch/Dryad: This is a research prototype of the Dryad and DryadLINQ data-parallel processing frameworks running on Hadoop YARN". GitHub.
- [2] "DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language" (PDF). Microsoft Research. Retrieved 2009-01-21.
- [3] Patee, Don. "Announcing the Windows Azure HPC Scheduler and HPC Pack 2008 R2 Service Pack 3 releases!". Microsoft. Retrieved 2013-05-31.
- [4] Foley, Mary Joe. "Microsoft drops Dryad; puts its big-data bets on Hadoop". ZDNet. Retrieved 2013-05-31.
- [5] Henschen, Doug. "Microsoft Ditches Dryad, Focuses On Hadoop". Information Week. Retrieved 2013-05-31.
Further reading
- "Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks" (PDF). Microsoft Research. Retrieved 2007-12-04.
- "SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets" (PDF). Microsoft Research. Retrieved 2009-01-21.