Orca 0.5.3-dev0
Orca is a workflow management solution similar in nature to Airflow and Luigi, but built specifically for microservices and with data streaming in mind. It attempts to provide a sensible way to define a data-driven model workflow through a combination of YAML and Python.
Installation
The easiest way to install orca is with pip:
pip install amanzi.orca
To verify that the install succeeded, run the version command:
orca version
Quickstart
Create a file called orca.yml:
apiVersion: '1.0'
version: '0.1'
name: 'quickstart example'
job:
  - task: hello_world
    python: print('Hello World!')
After installing, invoke the CLI by running orca:
$ orca
Usage: orca [OPTIONS] COMMAND [ARGS]...

Options:
  -v, --verbosity LVL  Either CRITICAL, ERROR, WARNING, INFO or DEBUG
  --help               Show this message and exit.

Commands:
  run       Run an orca workflow.
  todot     Create a graphviz dot file from an orca workflow.
  validate  Validate an orca workflow.
  version   Print the orca version.
Execute the hello world example:
$ orca run orca.yml
Hello World!
Concepts
This section describes the core concepts you need to understand to use orca.
What is an Orca Workflow?
An orca workflow is a YAML document describing the actions that make up the workflow. Here is a simple "hello world" example:
apiVersion: '1.0'
name: 'hello-workflow'
version: 0.1
job:
  - task: hello
    python: print('Hello World!')
The document above describes a job object: an array of the tasks to perform as steps in the workflow.
A task is an action that can be taken during the execution of the workflow. In this instance we are using what's called a python task. The following tasks are available for use in your workflows:
- python
- bash
- http
- csip
You can read more about tasks at their dedicated page.
Orca also has the ability to define variables for reuse throughout the workflow. These variables can be shared among tasks and passed in as inputs. Variables are namespaced under the var YAML object, and any reference to a variable must begin with the var. prefix (for example, var.name).
A side note about strings: as you may notice in the example below, most strings in our YAML documents are not quoted. The one exception is the var section of the document: if a variable is a string, you must quote it!
apiVersion: '1.0'
name: 'variable example'
version: '0.1'
var:
  name: 'Susie'
job:
  - task: hello
    python: |
      msg = 'Hello {0}'.format(person_to_greet)
      print(msg)
    inputs:
      person_to_greet: var.name
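To make the data flow in this example concrete, here is a minimal Python sketch of how a reference such as var.name could be resolved and handed to a python task as an input. It is an illustration only and not orca's implementation; the helper names resolve_inputs and run_python_task are hypothetical.

```python
import contextlib
import io

# Hypothetical stand-in for the `var` section of the workflow above.
variables = {"name": "Susie"}


def resolve_inputs(inputs, variables):
    """Replace each 'var.<key>' reference with the value of that variable."""
    resolved = {}
    for input_name, reference in inputs.items():
        if not reference.startswith("var."):
            raise ValueError(f"not a variable reference: {reference!r}")
        resolved[input_name] = variables[reference[len("var."):]]
    return resolved


def run_python_task(source, inputs, variables):
    """Execute a task's Python source with its resolved inputs in scope."""
    namespace = resolve_inputs(inputs, variables)
    buffer = io.StringIO()
    # Capture what the task prints so the caller can inspect it.
    with contextlib.redirect_stdout(buffer):
        exec(source, namespace)
    return buffer.getvalue()


source = "msg = 'Hello {0}'.format(person_to_greet)\nprint(msg)"
output = run_python_task(source, {"person_to_greet": "var.name"}, variables)
print(output, end="")  # Hello Susie
```

The point of the sketch is the indirection: the task body only ever sees person_to_greet, while the workflow decides which variable feeds it.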
Another component of an orca workflow is the control structure. Control structures allow control-flow operations to be performed on a subset of tasks by writing YAML that describes the nature of the control flow.
There are four kinds of control structures in orca:
- if control
- for control
- switch control
- fork control
If structures
# Example of an if condition object
if: 5 > 0
do:
  - task: cond_task_1
    python: print('hello condition!')
if structures contain two required top-level properties: if, which describes the condition, and do, the list of tasks to execute.
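In plain Python terms, an if structure behaves roughly like an if statement wrapped around a list of tasks. The sketch below assumes that the condition is a Python expression and models tasks as callables; run_if is a hypothetical helper, not part of orca.

```python
def run_if(condition, do, scope=None):
    """Run every task in `do`, in order, when `condition` evaluates truthy."""
    # Assumption: the condition is a Python expression evaluated with eval().
    if not eval(condition, {}, dict(scope or {})):
        return []
    return [task() for task in do]


results = run_if("5 > 0", [lambda: "hello condition!"])
print(results)  # ['hello condition!']

skipped = run_if("5 < 0", [lambda: "never runs"])
print(skipped)  # []
```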
For structures
for: i, range(0,10)
do:
  - task: print_i
    python: print(counter)
    inputs:
      counter: i
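for control structures also contain two required properties: for and do.
The value of the for property must be written as var, expression, as in i, range(0,10) above: i names the loop variable and range(0,10) is the expression that supplies its values.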
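The var, expression form can be read as a loop header. The following sketch assumes that the value is split at the first comma and that the expression is ordinary Python producing an iterable; parse_for and run_for are hypothetical names used for illustration, not orca functions.

```python
def parse_for(spec):
    """Split 'var, expression' at the first comma into its two parts."""
    var_name, _, expression = spec.partition(",")
    var_name, expression = var_name.strip(), expression.strip()
    if not var_name.isidentifier() or not expression:
        raise ValueError(f"expected 'var, expression', got {spec!r}")
    return var_name, expression


def run_for(spec, task):
    """Call `task` once per item, exposing the item under the loop variable."""
    var_name, expression = parse_for(spec)
    # Assumption: the expression is plain Python that yields an iterable.
    return [task({var_name: item}) for item in eval(expression)]


print(parse_for("i, range(0,10)"))  # ('i', 'range(0,10)')

seen = run_for("i, range(0,3)", lambda scope: scope["i"])
print(seen)  # [0, 1, 2]
```

Splitting only at the first comma matters here, because the expression itself (range(0,10)) contains a comma.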
Switch structures
switch: 1
1:
  - task: start5_1
    python: var.afile
  - task: start5_2
    python: var.afile
2:
  - task: start1_1
    python: var.afile
  - task: start1_2
    python: var.afile
default:
  - task: default
    python: var.afile
switch statements require both a switch and a default property. The default property is the set of tasks to run when the switch condition does not resolve to a case. The switch property value can be a variable that is evaluated at the time the switch block is reached.
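The case selection described above amounts to a dictionary lookup with a fallback. This is a minimal sketch under that reading, with task lists reduced to task names; select_case is a hypothetical helper and does not reflect how orca is implemented.

```python
def select_case(switch_value, cases):
    """Return the task list for `switch_value`, or the default list."""
    # Mirror the rule that a switch structure must define `default`.
    if "default" not in cases:
        raise KeyError("a switch structure requires a 'default' property")
    return cases.get(switch_value, cases["default"])


cases = {
    1: ["start5_1", "start5_2"],
    2: ["start1_1", "start1_2"],
    "default": ["default"],
}

print(select_case(1, cases))   # ['start5_1', 'start5_2']
print(select_case(99, cases))  # ['default']
```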
Fork structures
fork:
  # first group of tasks
  - - task: start1
      python: var.afile
    - task: start4
      python: var.afile
  # second group
  - - task: start2
      python: var.afile
  # third group
  - - task: start3
      python: var.afile
A fork structure is an array of task lists. Each list in the topmost array is executed in parallel.
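The shape of a fork (parallel across groups, sequential within each group) can be sketched with Python's standard thread pool. How orca actually schedules the groups is not stated here, so the use of threads is an assumption, and run_group and run_fork are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def run_group(tasks):
    """Run one task list sequentially and collect the results in order."""
    return [task() for task in tasks]


def run_fork(groups):
    """Run each task list on its own thread and wait for all of them."""
    with ThreadPoolExecutor(max_workers=len(groups)) as pool:
        # map() returns results in the order of `groups`, not completion order.
        return list(pool.map(run_group, groups))


groups = [
    [lambda: "start1", lambda: "start4"],  # first group of tasks
    [lambda: "start2"],                    # second group
    [lambda: "start3"],                    # third group
]

fork_results = run_fork(groups)
print(fork_results)  # [['start1', 'start4'], ['start2'], ['start3']]
```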
Examples
Here is a git repository of examples for using orca
Commands
Orca currently provides a concise set of commands for running workflows.
run
The run command executes the workflow defined in the YAML configuration file. Its options configure ledgers, to which the results of each task in the workflow are published. The available ledgers are JSON, MongoDB, and Kafka.
Usage: orca run [OPTIONS] FILE [ARGS]...

  Run an orca workflow.

Options:
  --ledger-json PATH   file ledger.
  --ledger-mongo TEXT  mongodb ledger, TEXT format "<host[:port]>/<db>/<col>".
  --ledger-kafka TEXT  kafka ledger, TEXT format "<host[:port]>/topic".
  --help               Show this message and exit.
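The TEXT format for the MongoDB ledger packs a host, an optional port, a database and a collection into one string. The sketch below shows one way such a value could be split apart; it is not orca's own parser, and parse_mongo_ledger, MongoLedgerTarget and the sample value localhost:27017/orca/ledger are all hypothetical.

```python
from typing import NamedTuple, Optional


class MongoLedgerTarget(NamedTuple):
    host: str
    port: Optional[int]
    db: str
    col: str


def parse_mongo_ledger(text):
    """Parse '<host[:port]>/<db>/<col>' into its components."""
    parts = text.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected '<host[:port]>/<db>/<col>', got {text!r}")
    address, db, col = parts
    host, sep, port = address.partition(":")
    # A colon must be followed by a numeric port; the port itself is optional.
    if not host or (sep and not port.isdigit()):
        raise ValueError(f"invalid host[:port]: {address!r}")
    return MongoLedgerTarget(host, int(port) if sep else None, db, col)


with_port = parse_mongo_ledger("localhost:27017/orca/ledger")
without_port = parse_mongo_ledger("localhost/orca/ledger")
print(with_port)
print(without_port)
```

The Kafka format "<host[:port]>/topic" has the same address part followed by a single segment, so the same approach would apply with two parts instead of three.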
todot
Converts the YAML workflow definition into a Graphviz dot file.
version
Prints the current orca version.
$ orca version
0.5.2