The FlowProject

This chapter describes how to set up a complete workflow by implementing a FlowProject.

Setup and Interface

To implement a more automated workflow, we can subclass a FlowProject:

# project.py
from flow import FlowProject

class MyProject(FlowProject):
    pass

if __name__ == '__main__':
    MyProject().main()

Tip

You can generate boilerplate templates like the one above with the $ flow init command. Several templates are available via the -t/--template option.

Executing this script on the command line will give us access to this project’s specific command line interface:

$ python project.py
usage: project.py [-h] {status,next,run,script,submit} ...

Note

You can have multiple implementations of FlowProject that all operate on the same signac project! This may be useful, for example, if you want to implement two very distinct workflows that operate on the same data space. Simply put those in different modules, e.g., project_a.py and project_b.py.

Classification

The FlowProject uses a classify() method to generate labels for a job. A label is a short text string that essentially represents a condition. Following the last chapter’s example, we could implement a greeted label like this:

# project.py
from flow import FlowProject
from flow import staticlabel

class MyProject(FlowProject):

    @staticlabel()
    def greeted(job):
        return job.isfile('hello.txt')
# ...

Using the staticlabel decorator turns the greeted() function into a label function, which will be evaluated for our classification. We can check that by executing the hello operation for a few jobs and then looking at the project’s status:

$ python operations.py hello 0d32 2e6
hello 0d32543f785d3459f27b8746f2053824
hello 2e6ba580a9975cf0c01cb3c3f373a412
$ python project.py status --detailed
Status project 'MyProject':
Total # of jobs: 10

label    progress
-------  ----------
greeted  |########--------------------------------| 20.00%

Detailed view:
job_id                            S      next_op  labels
--------------------------------  ---  ---------  --------
0d32543f785d3459f27b8746f2053824  U               greeted
14fb5d016557165019abaac200785048  U
2af7905ebe91ada597a8d4bb91a1c0fc  U
2e6ba580a9975cf0c01cb3c3f373a412  U               greeted
42b7b4f2921788ea14dac5566e6f06d0  U
751c7156cca734e22d1c70e5d3c5a27f  U
81ee11f5f9eb97a84b6fc934d4335d3d  U
9bfd29df07674bc4aa960cf661b5acd2  U
9f8a8e5ba8c70c774d410a9107e2a32b  U
b1d43cd340a6b095b41ad645446b6800  U

Abbreviations used:
S: status
U: unknown
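
To make the idea of classification more concrete, here is a minimal sketch of how label functions might be evaluated for a set of jobs. It does not use the flow package at all: FakeJob, its in-memory file set, and the standalone classify() helper are hypothetical stand-ins introduced only for illustration, so the real implementation may well differ.

```python
class FakeJob:
    """Hypothetical stand-in for a job; keeps its file names in memory."""

    def __init__(self, job_id, files=()):
        self._id = job_id
        self._files = set(files)

    def isfile(self, filename):
        # Mimics the job.isfile() call used by the greeted label above.
        return filename in self._files


def greeted(job):
    """Label function: true once hello.txt exists for the job."""
    return job.isfile('hello.txt')


def classify(job, label_functions):
    """Yield the name of every label function whose condition holds."""
    for func in label_functions:
        if func(job):
            yield func.__name__


# Two made-up jobs: one that has been greeted and one that has not.
jobs = [FakeJob('job-a', files=['hello.txt']), FakeJob('job-b')]
labels = {job._id: list(classify(job, [greeted])) for job in jobs}
print(labels)
```

Each label is simply the name of a condition that currently holds, which is what the labels column in the status view reports.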

Determine the next-operation

Next, we should tell the project that the hello() operation is to be executed whenever the greeted condition is not met. We achieve this by adding the operation to the project:

class MyProject(FlowProject):

    def __init__(self, *args, **kwargs):
        super(MyProject, self).__init__(*args, **kwargs)

        self.add_operation(
            name='hello',
            cmd='python operations.py hello {job._id}',
            post=[MyProject.greeted])

Let’s go through the individual arguments of the add_operation() method:

The name argument is arbitrary, but must be unique among all operations that are part of the project’s workflow. It simply helps us to identify the operation without needing to look at the full command.

The cmd argument determines how to execute the particular operation; ideally, it should be a function of job. We can construct the cmd by using formatting fields, as shown above. Any attribute of the job instance may be used, including state point values (e.g. job.sp.a) or the workspace directory (job.ws). The command is later evaluated like this: cmd.format(job=job).
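
The following short sketch shows how such format fields resolve with plain Python string formatting. The job object here is a hypothetical stand-in built from SimpleNamespace, and its id, state point, and workspace path are made-up values; a real job provides these attributes itself.

```python
from types import SimpleNamespace

# Hypothetical stand-in for a job with made-up attribute values.
job = SimpleNamespace(
    _id='abc123',
    sp=SimpleNamespace(a=4),
    ws='/path/to/workspace/abc123',
)

# The same kind of template that is passed as the cmd argument.
cmd = 'python operations.py hello {job._id}'
resolved = cmd.format(job=job)
print(resolved)

# Nested attributes, such as state point values, resolve the same way.
detailed = 'echo a={job.sp.a} ws={job.ws}'.format(job=job)
print(detailed)
```

Because str.format() supports attribute access inside replacement fields, no extra code is needed to pull values out of the job.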

Alternatively, we can define a function that returns a command or script, e.g.:

# ...
    self.add_operation(
        name='hello',
        cmd=lambda job: "python operations.py hello {}".format(job),
        post=[MyProject.greeted])
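
As a rough illustration of how both forms of cmd could be turned into a concrete command, consider the sketch below. The resolve_cmd() helper and the FakeJob class are hypothetical and not part of the flow package. The sketch also assumes that formatting a job yields its id, as the lambda in the example above implies.

```python
class FakeJob:
    """Hypothetical stand-in for a job whose string form is its id."""

    def __init__(self, job_id):
        self._id = job_id

    def __str__(self):
        return self._id


def resolve_cmd(cmd, job):
    """Return the command for job from either a callable or a template."""
    if callable(cmd):
        # Function form: the callable receives the job and returns the command.
        return cmd(job)
    # String form: format fields are filled in from the job's attributes.
    return cmd.format(job=job)


job = FakeJob('abc123')
from_template = resolve_cmd('python operations.py hello {job._id}', job)
from_function = resolve_cmd(
    lambda job: 'python operations.py hello {}'.format(job), job)
print(from_template)
print(from_function)
```

The function form is mainly useful when the command depends on logic that cannot be expressed with format fields alone.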

Finally, the post argument is a list of unary condition functions, i.e., functions that take a job as their only argument.

Definition:

A specific operation is eligible for execution whenever all pre-conditions (pre) are met and at least one of the post-conditions (post) is not met.

In this case, the hello operation will only be executed when greeted() returns False; we can check that again by looking at the status:

$ python project.py status --detailed
Status project 'MyProject':
Total # of jobs: 10

label    progress
-------  -------------------------------------------------
greeted  |########--------------------------------| 20.00%

Detailed view:
job_id                            S    next_op    labels
--------------------------------  ---  ---------  --------
0d32543f785d3459f27b8746f2053824  U               greeted
14fb5d016557165019abaac200785048  U !  hello
2af7905ebe91ada597a8d4bb91a1c0fc  U !  hello
2e6ba580a9975cf0c01cb3c3f373a412  U               greeted
42b7b4f2921788ea14dac5566e6f06d0  U !  hello
751c7156cca734e22d1c70e5d3c5a27f  U !  hello
81ee11f5f9eb97a84b6fc934d4335d3d  U !  hello
9bfd29df07674bc4aa960cf661b5acd2  U !  hello
9f8a8e5ba8c70c774d410a9107e2a32b  U !  hello
b1d43cd340a6b095b41ad645446b6800  U !  hello

Abbreviations used:
!: requires_attention
S: status
U: unknown
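
The eligibility rule from the definition above can be written down in a few lines of plain Python. This is only a sketch of the stated rule, not the library's code: is_eligible() is a hypothetical helper, and jobs are modelled as simple dictionaries holding a set of file names.

```python
def is_eligible(job, pre=(), post=()):
    """Apply the rule: all pre-conditions met and some post-condition unmet."""
    all_pre_met = all(condition(job) for condition in pre)
    any_post_unmet = any(not condition(job) for condition in post)
    return all_pre_met and any_post_unmet


def greeted(job):
    """Condition function: true once hello.txt exists for the job."""
    return 'hello.txt' in job['files']


# Hypothetical jobs, modelled as dictionaries for the sake of the sketch.
pending_job = {'files': set()}
finished_job = {'files': {'hello.txt'}}

print(is_eligible(pending_job, post=[greeted]))
print(is_eligible(finished_job, post=[greeted]))
```

Read literally, the rule means that an operation without any post-conditions is never eligible in this sketch, because there is no unmet post-condition to trigger it.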

Running project operations

Similar to the run() interface earlier, we can execute all pending operations with the python project.py run command:

$ python project.py run
hello 42b7b4f2921788ea14dac5566e6f06d0
hello 2af7905ebe91ada597a8d4bb91a1c0fc
hello 14fb5d016557165019abaac200785048
hello 751c7156cca734e22d1c70e5d3c5a27f
hello 9bfd29df07674bc4aa960cf661b5acd2
hello 81ee11f5f9eb97a84b6fc934d4335d3d
hello 9f8a8e5ba8c70c774d410a9107e2a32b
hello b1d43cd340a6b095b41ad645446b6800

Again, the execution is automatically parallelized.

Let’s remove a few random hello.txt files to regain pending operations:

$ rm workspace/2af7905ebe91ada597a8d4bb91a1c0fc/hello.txt
$ rm workspace/9bfd29df07674bc4aa960cf661b5acd2/hello.txt

Generating Execution Scripts

Using the script command, we can generate an operation execution script based on the pending operations, which might look like this:

$ python project.py script
---- BEGIN SCRIPT ----

set -u
set -e
cd /Users/johndoe/my_project

# Statepoint:
#
# {{
#   "a": 4
# }}
python operations.py hello 2af7905ebe91ada597a8d4bb91a1c0fc &

wait
---- END SCRIPT ----


---- BEGIN SCRIPT ----

set -u
set -e
cd /Users/johndoe/my_project

# Statepoint:
#
# {{
#   "a": 0
# }}
python operations.py hello 9bfd29df07674bc4aa960cf661b5acd2 &

wait
---- END SCRIPT ----

These scripts can be used for the execution of operations directly, or they could be submitted to a cluster environment for remote execution. For more information about how to submit operations for execution to a cluster environment, see the Cluster Submission chapter.
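
To illustrate the structure of the scripts shown above, the sketch below assembles a comparable script from a project root and a list of commands. The render_script() helper is hypothetical and only imitates the visible layout (shell options, a cd into the project directory, backgrounded commands, and a final wait). It omits the state point comments and makes no claim about how the script command is implemented.

```python
def render_script(root, commands):
    """Build a shell script that runs all commands in the background."""
    lines = [
        'set -u',                # treat unset variables as errors
        'set -e',                # abort when a command fails
        'cd {}'.format(root),    # run from the project root directory
        '',
    ]
    for command in commands:
        # A trailing ampersand starts each command in the background.
        lines.append('{} &'.format(command))
    # wait blocks until all background commands have finished.
    lines.extend(['', 'wait'])
    return '\n'.join(lines)


script = render_script(
    '/Users/johndoe/my_project',
    ['python operations.py hello 2af7905ebe91ada597a8d4bb91a1c0fc'],
)
print(script)
```

Passing several commands to a single call would place them in one script, where they run concurrently until the final wait.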

Full Demonstration

The screencast below is a complete demonstration of all steps:

Check out the next chapter for a guide on how to submit operations to a cluster environment.