Paradigm | Dataflow |
---|---|
First appeared | 2001 |
Platform | Platform-independent |
Filename extensions | .cal, .xdf |
Major implementations | Open RVC-CAL Compiler, OpenDF framework |
CAL (the Cal Actor Language) is a high-level programming language[1] for writing dataflow actors, which are stateful operators that transform input streams of data objects (tokens) into output streams. CAL has been compiled to a variety of target platforms, including single-core processors, multicore processors, and programmable hardware. It has been used in several application areas, including video processing, compression, and cryptography. The MPEG Reconfigurable Video Coding (RVC)[2] working group has adopted CAL as part of its standardization efforts.
History and Introduction
The CAL Actor Language was developed in 2001 as part of the Ptolemy II project at the University of California, Berkeley. CAL is a dataflow language geared towards a variety of application domains, such as multimedia processing, control systems, and network processing.
A common reason for choosing dataflow is that the goal is an efficient parallel implementation, which would be difficult or impossible to achieve using a sequential programming language. Sequential programs are notoriously difficult to parallelize in general, so efficient parallel implementations usually require significant guidance from the user. A CAL dataflow program provides simple, understandable, and powerful abstractions that allow the specification of as much or as little parallelism as is required, enabling tools to produce sophisticated implementations that exploit the concurrent structure of a computation.
When programming in dataflow, the programmer is typically constructing a concurrent description of a computational system, which is different from a common sequential program. Rather than being concerned with the step-by-step execution of an algorithm, a dataflow programmer builds a system of asynchronously communicating entities called actors. Much of the programming effort is directed toward finding a good factoring of the problem into actors, and toward engineering appropriate communication patterns among those actors.
CAL features
The structure of actors
Actors perform their computation in a sequence of steps called firings. In each of those steps:
- 1. the actor may consume tokens from its input ports,
- 2. it may modify its internal state,
- 3. it may produce tokens at its output ports.
Consequently, describing an actor involves describing its interface to the outside (the ports), the structure of its internal state, and the steps it can perform: what these steps do (in terms of token production and consumption, and the update of the actor state), and how to pick the step that the actor will perform next. This section discusses some of the constructs in the CAL language that deal with these issues.
Actions describe what happens during a step that an actor takes. In fact, it is accurate to say that a step consists of executing an action. Recall that when an actor takes a step, it may consume input tokens and produce output tokens. The input side of an action is described by its input patterns, and the output side by its output expressions.
Input patterns do the following:
- They define the number of tokens (for each port) that will be consumed when the action is executed (fired).
- They declare the variable symbols by which tokens consumed by an action firing will be referred to within the action.
- They define a firing condition for the action, i.e. a condition that must be met for the action to be able to fire.
The output side of an action is a little simpler: the output expressions define the number and values of the output tokens that will be produced on each output port by each firing of the action. It is permissible to omit the explicit naming of the port that an input pattern or output expression applies to if an action provides as many input patterns as there are input ports, or as many output expressions as there are output ports. In such a case, the patterns or expressions are matched by position against the port declarations.
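The roles of input patterns, guards, and output expressions can be sketched in Python. This is not CAL syntax and not how any CAL tool is implemented; the `Action` class, its field names, and the example ports `A`, `B`, and `Out` are hypothetical and chosen only for illustration. State updates are left out to keep the sketch short.

```python
from collections import deque


class Action:
    """Hypothetical model of a single action (illustration only)."""

    def __init__(self, consumes, guard, produce):
        self.consumes = consumes  # input pattern: port name -> tokens consumed
        self.guard = guard        # predicate over the tokens the pattern binds
        self.produce = produce    # output expressions: bound tokens -> {port: [tokens]}

    def enabled(self, inputs):
        # Firing condition: enough tokens on every port, and the guard holds.
        for port, n in self.consumes.items():
            if len(inputs[port]) < n:
                return False
        peeked = {port: list(inputs[port])[:n] for port, n in self.consumes.items()}
        return self.guard(peeked)

    def fire(self, inputs, outputs):
        # Consume the tokens named by the input pattern, then emit the outputs.
        bound = {port: [inputs[port].popleft() for _ in range(n)]
                 for port, n in self.consumes.items()}
        for port, tokens in self.produce(bound).items():
            outputs[port].extend(tokens)


# An action that reads one token from A and one from B and writes their sum.
add = Action(
    consumes={"A": 1, "B": 1},
    guard=lambda t: True,
    produce=lambda t: {"Out": [t["A"][0] + t["B"][0]]},
)

inputs = {"A": deque([1, 2, 3]), "B": deque([10, 20])}
outputs = {"Out": deque()}
while add.enabled(inputs):
    add.fire(inputs, outputs)
```

After the loop, the third token on `A` is still queued, because the input pattern needs one token on each port before the action can fire.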
One way of thinking about an actor is as an operator on streams of data: sequences of tokens enter it on its input ports, and sequences of tokens leave it on its output ports. This view is often useful when discussing the operation of an actor.
Actors can have parameters. These act as constants during the actor's execution and are given concrete values when the actor is instantiated as part of an actor network. The main purpose of actor parameters is to allow programmers to specify families of related actors without having to duplicate a lot of code.
Non-determinism
A non-deterministic actor is one that, for the same input sequences, allows more than one run and more than one possible output. Non-determinism can be very powerful when used appropriately, but it can also be a very troublesome source of errors. A particular concern is that non-determinism might be introduced into an actor inadvertently, i.e. the author thinks the actor is deterministic even though it isn't. One of the key design goals of the CAL language was to allow the description of non-deterministic actors, while at the same time permitting tools to identify possible sources of non-determinism, so that they can warn the user about them.
A key consequence of a non-deterministic actor like NDMerge is that during an actual execution, its output may depend on the timing of its input. If both its input queues are empty and NDMerge is waiting for input, then whichever input the next token arrives at may be the one that is copied next to the output. Consequently, the scheduling of activities in the actor network, or the relative speeds of the actors feeding into an actor like NDMerge, may affect the output of the system. This may occasionally be desirable, and at other times it may not. In any event, it is a property that one needs to be aware of.
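The timing dependence can be illustrated with a small Python model. It assumes that NDMerge has two input ports (called `A` and `B` here) and copies each token to its single output as soon as it arrives; the function name `nd_merge` and the list of arrival events are hypothetical stand-ins for a real execution.

```python
from collections import deque


def nd_merge(arrivals):
    """Copy tokens to the output in the order in which they arrive.

    `arrivals` is a list of (port, token) pairs in arrival order. It is a
    hypothetical stand-in for the timing of an actual execution.
    """
    queues = {"A": deque(), "B": deque()}
    out = []
    for port, token in arrivals:
        queues[port].append(token)
        # Only one queue is non-empty here, so only one action can fire.
        out.append(queues[port].popleft())
    return out


# The same streams, A = [1, 2] and B = [10, 20], under two different timings.
run1 = nd_merge([("A", 1), ("A", 2), ("B", 10), ("B", 20)])
run2 = nd_merge([("B", 10), ("A", 1), ("B", 20), ("A", 2)])
```

Both runs preserve the order of tokens within each input stream, but they interleave the two streams differently, so the output stream differs.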
One way to look at non-determinism of the kind that makes an actor dependent on the precise timing of token arrivals is that such an actor only appears to be non-deterministic if we look at it as an operator on streams, because that view abstracts from the temporal properties of the execution, and thus purposefully removes information that is used to determine the sequence in which actions fire. From the perspective of the CAL language, this is not entirely accurate, but even so, it is easy to write non-deterministic actors that would not be deterministic even if we knew everything about the timing of the tokens and the actor implementation. An actor in which two actions are enabled by the same input token but send it to different output ports is one example.
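A minimal Python sketch of an actor of this kind follows. It assumes an actor with one input port and two output ports, `X` and `Y`, where two actions both match every input token; the name `nd_split` and the `choose` callback are hypothetical. The callback stands in for a choice that nothing in the actor description determines.

```python
def nd_split(tokens, choose):
    """Model an actor with two actions that are always both enabled.

    One action copies the token to X, the other copies it to Y. `choose`
    is a hypothetical stand-in for the unconstrained choice between them.
    """
    x, y = [], []
    for v in tokens:
        if choose(v) == "X":
            x.append(v)
        else:
            y.append(v)
    return x, y


# Two equally valid runs on the same input stream with identical timing.
run1 = nd_split([1, 2, 3], lambda v: "X")
run2 = nd_split([1, 2, 3], lambda v: "Y" if v % 2 == 0 else "X")
```

Knowing when each token arrives does not help to predict which of the two results is produced, because the choice between the two actions is not tied to the input at all.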
Guarded actions
The guard clause of an action contains a number of expressions that all need to be true in order for the action to be fireable. Consider an actor with one input port and two output ports, P and N, whose two actions are guarded. For the first action to be fireable, the incoming token needs to be greater than or equal to zero, in which case it will be sent to output P. Otherwise that action cannot fire. Conversely, for the second action to be fireable, the token needs to be less than zero, in which case it is sent to output N.
The guard conditions of this actor are exhaustive: every token satisfies at least one of them, so the actor can always fire. A variant whose guards do not cover zero (SplitDead) runs into trouble if it ever encounters a zero token, because none of its actions will be able to fire on it. It is not illegal to write actors that terminate on some input, and in fact it may be important to have a few of those in some systems. But it is a pitfall that one needs to be aware of. Secondly, the guard conditions are disjoint in addition to being exhaustive: no token satisfies both, so the choice of action is never ambiguous.
Finally, note that guard conditions can "peek" at the incoming tokens without actually consuming them. If the guards happen to be false or the action is not fired for some other reason, and if the token is not consumed by another action, then it remains where it is and will be available for the next firing. (Or it will remain there forever, as in the case of the zero token in front of SplitDead, which is never removed because the actor is dead.)
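The effect of guards that peek at a token, and of guards that do not cover every token, can be sketched in Python. The helper `run_split` is hypothetical. The strict guard `a > 0` used for the SplitDead variant is an assumption, chosen only because it leaves zero uncovered.

```python
from collections import deque


def run_split(tokens, guards):
    """Fire guarded actions until none of them is enabled.

    `guards` is a list of (predicate, output port) pairs, one per action.
    """
    queue = deque(tokens)
    out = {"P": [], "N": []}
    while queue:
        head = queue[0]  # peek: the guard inspects the token without consuming it
        enabled = [port for pred, port in guards if pred(head)]
        if not enabled:
            break  # no action can fire, so the token stays in the queue
        out[enabled[0]].append(queue.popleft())
    return out, list(queue)


# Exhaustive and disjoint guards: every integer token enables exactly one action.
split = [(lambda a: a >= 0, "P"), (lambda a: a < 0, "N")]
# Assumed guards for the dead variant: zero enables neither action.
split_dead = [(lambda a: a > 0, "P"), (lambda a: a < 0, "N")]

ok = run_split([1, -2, 0, 3], split)
dead = run_split([1, -2, 0, 3], split_dead)
```

In the second run, the zero token and everything behind it remain in the input queue, which is the situation described for SplitDead.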
The Select actor is another example of the use of guarded actions. It is similar to the NDMerge actor in the sense that it merges two streams (the ones arriving at its A and B input ports). However, it does so according to the (Boolean) values of the tokens arriving at its S input port.
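A rough Python model of this behaviour is given here. It assumes that a true token on S selects port A and a false token selects port B; that polarity, and the function name `select_step`, are assumptions made for the sketch.

```python
from collections import deque


def select_step(s, a, b, out):
    """Try to fire one action of the modelled Select actor.

    Returns True if an action fired. The S token and the data token are
    consumed together in a single step.
    """
    if not s:
        return False
    source = a if s[0] else b  # the guard peeks at the S token
    if not source:
        return False           # no matching data token: S is left untouched
    s.popleft()
    out.append(source.popleft())
    return True


s, a, b, out = deque([True, False, False]), deque([1]), deque([10]), []
while select_step(s, a, b, out):
    pass
```

The last S token is not consumed, because no data token is available on the port it selects.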
Actors with state
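In all the actors so far, nothing an action firing did would in any way affect subsequent firings of actions of the same actor. Using state variables, action firings can leave information behind for subsequent firings of either the same or a different action of the same actor. The way the Select actor is written, the selection of the next input token and the actual copying of the token to the output is one atomic step. The IterSelect actor uses a state variable to separate the two: one action reads the S token and records the selection in the state, and a subsequent action copies the selected data token to the output.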
Note that Select and IterSelect are almost, but not entirely, equivalent. First of all, IterSelect makes twice as many steps in order to process the same number of tokens. Secondly, it actually reads, and therefore consumes, the S input token, irrespective of whether a matching data token is available on A or B.
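The difference can be made concrete with a Python model of a state-based selector. The class below is a sketch, not the original actor: it assumes that state 0 means "read S next", state 1 means "copy from A", and state 2 means "copy from B", and that a true S token selects A.

```python
from collections import deque


class IterSelect:
    """Hypothetical model of a selector that uses a state variable."""

    def __init__(self):
        self.state = 0  # 0: read S next, 1: copy from A, 2: copy from B

    def step(self, s, a, b, out):
        """Try to fire one action; return True if one fired."""
        if self.state == 0 and s:
            # Consumes the S token whether or not a data token is waiting.
            self.state = 1 if s.popleft() else 2
            return True
        if self.state == 1 and a:
            out.append(a.popleft())
            self.state = 0
            return True
        if self.state == 2 and b:
            out.append(b.popleft())
            self.state = 0
            return True
        return False


s, a, b, out = deque([True, False]), deque([1]), deque(), []
actor = IterSelect()
steps = 0
while actor.step(s, a, b, out):
    steps += 1
```

Copying one token took two steps, and the second S token was consumed even though port B is empty. The actor is now waiting in state 2 for a token on B.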
Schedules
The IterSelect actor of the previous section illustrated the use of state to control the selection of actions. This is an extremely common thing to do in practice, and the CAL language provides special syntax for this purpose in the form of schedules. Conceptually, one can think of schedules as codifying a particular pattern of using a state variable—they do not add anything to the language in terms of expressiveness. The rationale for using schedules is twofold:
- They are usually easier to use and less error prone than using a state variable and many guards and assignments.
- Tools can use the information encoded in a schedule more easily, and thus recognize regularities in the actor that might help them to produce more efficient code, or perform other analyses that help in implementation and design.
A schedule is written as a finite state machine: it names an initial state and lists the state transitions. Each state transition consists of three parts: the original state, a list of action tags, and the following state. An action tag is a label attached to an action so that the schedule can refer to it.
One thing worth noting when IterSelect is rewritten with a schedule is that the number of actions has increased: instead of the original three, the new version now has four actions. The reason is that an action can no longer directly assign the successor state, as it did in the original, where, depending on the value of the token read, the variable state would be assigned either the value 1 or 2. In the version with a schedule, that state modification is implicit in the structure of the state machine, and it happens depending on which action fires. Accordingly, the condition that checks the value of the token has moved from within the body of the action to the guards of the two actions tagged readT and readF.
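The three-part structure of a transition can be sketched as a Python lookup table. The tags readT and readF come from the text. The state names `init`, `waitA`, and `waitB` and the tags `copyA` and `copyB` are assumptions, as is the convention that a true token selects port A.

```python
from collections import deque

# Each entry: (original state, action tag) -> following state.
SCHEDULE = {
    ("init", "readT"): "waitA",
    ("init", "readF"): "waitB",
    ("waitA", "copyA"): "init",
    ("waitB", "copyB"): "init",
}

# Guards and input requirements of the four actions.
ENABLED = {
    "readT": lambda s, a, b: bool(s) and s[0],
    "readF": lambda s, a, b: bool(s) and not s[0],
    "copyA": lambda s, a, b: bool(a),
    "copyB": lambda s, a, b: bool(b),
}

# What each action does when it fires. None of them assigns the state.
FIRE = {
    "readT": lambda s, a, b, out: s.popleft(),
    "readF": lambda s, a, b, out: s.popleft(),
    "copyA": lambda s, a, b, out: out.append(a.popleft()),
    "copyB": lambda s, a, b, out: out.append(b.popleft()),
}


def run(s, a, b):
    state, out, trace = "init", [], []
    while True:
        candidates = [(tag, nxt) for (st, tag), nxt in SCHEDULE.items()
                      if st == state and ENABLED[tag](s, a, b)]
        if not candidates:
            return out, trace, state
        tag, state = candidates[0]  # the successor state comes from the schedule
        FIRE[tag](s, a, b, out)
        trace.append(tag)


result = run(deque([True, False]), deque([1]), deque([10]))
```

The value test on the S token lives in the guards of `readT` and `readF`, and the table alone decides which state follows each firing.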
Priorities
Consider an actor with two input ports and one action for each of them. As long as input is available on only one of its input ports, everything is unambiguous. But, just like NDMerge, as soon as input is available on both input ports, it could fire either of its two actions, and there is nothing in that actor specification which would predispose it to choose one over the other.
Suppose we want one of the actions to take precedence whenever both could fire. None of the language constructs so far would allow us to do this. Unlike schedules, which can be regarded as syntactic sugar because they can be reduced to existing elements of the language (state variables, guards, and assignments), this situation does in fact require a true extension: action priorities. The basic idea is to add a number of inequalities that relate actions with respect to their firing precedence.
Just as in the case of schedules, we use action tags to identify actions that we want to refer to later on, this time within the priority inequality. For this actor, the priority block contains only one such inequality, relating the action tagged config to the one tagged process, giving the former priority over the latter. Of course, even this version is still very much timing-dependent. In this case, that need not be a problem, and in fact is probably a requirement for this actor to perform its function. But in general, it is important to understand that priorities, especially when used as in the previous example, need to be well understood to yield the correct results. Especially when information about the timing of the communication within the network is vague, it is probably best to think of them as strong implementation directives.
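The effect of a single priority inequality can be modelled in Python as follows. The tags config and process come from the text. Everything else is assumed for illustration: the config action stores a new value in a state variable `c`, the process action multiplies each data token by `c`, and the function and parameter names are hypothetical.

```python
from collections import deque

# Earlier tags take precedence: config > process.
PRIORITY = ["config", "process"]


def run(config, data, c=1):
    """Fire enabled actions until none is left, honouring PRIORITY."""
    out = []
    while True:
        enabled = []
        if config:
            enabled.append("config")
        if data:
            enabled.append("process")
        if not enabled:
            return out
        tag = min(enabled, key=PRIORITY.index)  # highest-priority enabled action
        if tag == "config":
            c = config.popleft()                # update the actor state
        else:
            out.append(data.popleft() * c)      # assumed computation


with_config = run(deque([10]), deque([1, 2]))
without_config = run(deque(), deque([1, 2]))
```

When both queues hold tokens, the config action always fires first, so the new setting is applied before any waiting data token is processed. Whether a config token is already present at that moment still depends on timing.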
Statements and expressions
The previous section focused primarily on those constructs in CAL that are related to actor-specific concepts: token input and output, actions, controlling the action selection, and so forth. This section discusses the more "pedestrian" parts of CAL, the statements and expressions used to manipulate data objects and express (sequential) algorithms. This part of the language is similar to what can be found in many procedural programming languages (such as C, Pascal, Java, Ada), so we will focus on areas that might be slightly different in CAL.
Expressions
Unlike languages such as C, CAL makes a strong distinction between statements and expressions. They have very distinct roles, very distinct meanings, and they can never be used interchangeably. An expression in CAL is a piece of code whose sole purpose is to compute a value. We also say that an expression has a value, or that it evaluates to a value. For most expressions, the value that they evaluate to will depend on the values of one or more variables at the time when the expression is evaluated. Since variable values may change over time, the same expression may have different values when evaluated at different points in time.
Atomic expressions
Probably the most fundamental expressions are constants. Another group of basic expressions are variable references. Syntactically, a variable is any sequence of letters and digits. One important property of expressions is that they are guaranteed not to change variables (we also say they have no side effects)—consequently, within an expression, multiple references to the same variable will always yield the same result.
Simple composite expressions
CAL provides operators of two kinds to build expressions: unary and binary. A unary operator in CAL is always a prefix operator, i.e. it appears before its single operand. A binary operator occurs between its two operands.
Statements
In some ways, statements in CAL are just the opposite of expressions: they do not have a "return value", but they can change the values of variables. Indeed, changing the values of variables is the whole point of statements. Statements are executed in strict sequential order, and unless otherwise specified, the execution of statements proceeds in the order in which they appear in the program text, which means that any variable changes produced by a statement may affect the execution of subsequent statements.
Control flow
As in most other programming languages, there are constructs to control the order in which the statements within a program are executed. In a foreach loop, the part that directly follows the foreach keyword is a generator, much like those in list comprehensions.
Action
- Input patterns: declaring variables
- Guard: specifying enabling conditions
- Output expressions: computing output tokens
- Body: modifying the actor state
Supporting tools
OpenDF framework
Open RVC-CAL compiler
References
- [1] CAL Language Report: Specification of the CAL actor language, Johan Eker and Jörn W. Janneck, Technical Memorandum No. UCB/ERL M03/48, University of California, Berkeley, CA, 94720, USA, December 1, 2003
- [2] Overview of the MPEG Reconfigurable Video Coding Framework, Shuvra S. Bhattacharyya, Johan Eker, Jörn W. Janneck, Christophe Lucarz, Marco Mattavelli, Mickaël Raulet, Journal of Signal Processing Systems, 2009, Springer