API¶

Workflow management based on the signac framework.

The signac-flow package provides the basic infrastructure to easily configure and implement a workflow to operate on a signac data space.

class flow.FlowProject(config=None, environment=None)¶

Bases: signac.contrib.project.Project

A signac project class specialized for workflow management.

This class provides a command line interface for the definition, execution, and submission of workflows based on condition and operation functions.

This is a typical example of how to use this class:

@FlowProject.operation
def hello(job):
    print('hello', job)

FlowProject().main()

Parameters:	config (A signac config object) – A signac configuration; defaults to the configuration loaded from the environment.

add_operation(name, cmd, pre=None, post=None, **kwargs)¶

Add an operation to the workflow.

This method will add an instance of FlowOperation to the operations-dict of this project.

See also: run()

New in version 0.6.

Parameters:

operations (Sequence of instances of JobOperation) – The operations to execute (optional).
pretend (bool) – Do not actually execute the operations, but show which command would have been used.
np (int) – The number of processors to use for each operation.
timeout (int) – An optional timeout for each operation, in seconds, after which execution will be cancelled. Use -1 to indicate no timeout (the default).
progress – Show a progress bar during execution.
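The pretend and timeout parameters described above can be pictured with a small standalone sketch. It is not the signac-flow implementation: the function name execute is hypothetical, and operations are assumed to be plain shell command strings rather than JobOperation instances.

```python
import subprocess


def execute(commands, pretend=False, timeout=-1):
    """Run shell commands in order and return the commands that were handled.

    Hypothetical helper: `commands` are plain strings, not JobOperation objects.
    """
    # Follow the documented convention that -1 means "no timeout".
    limit = None if timeout is None or timeout < 0 else timeout
    handled = []
    for cmd in commands:
        if pretend:
            # Only show the command that would have been used.
            print(cmd)
        else:
            # Raises subprocess.TimeoutExpired if the limit is exceeded.
            subprocess.run(cmd, shell=True, check=True, timeout=limit)
        handled.append(cmd)
    return handled
```

Calling execute(["echo hello"], pretend=True) prints the command without running it, which is a cheap way to check what a workflow would do.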

scheduler_jobs(scheduler)¶

Fetch jobs from the scheduler.

This function will fetch all scheduler jobs from the scheduler and also expand bundled jobs automatically.

However, this function will not automatically filter scheduler jobs which are not associated with this project.

Parameters:	scheduler (Scheduler) – The scheduler instance.
Yields:	All scheduler jobs fetched from the scheduler instance.

script(operations, parallel=False, template='script.sh', show_template_help=False)¶

Generate a run script to execute given operations.

Parameters:

operations (Sequence of instances of JobOperation) – The operations to execute.
parallel (bool) – Execute all operations in parallel (default is False).
template (str) – The name of the template to use to generate the script.
show_template_help (bool) – Show help related to the templating system and then exit.

submit(bundle_size=1, jobs=None, names=None, num=None, parallel=False, force=False, walltime=None, env=None, **kwargs)¶

Submit function for the project’s main submit interface.

Changed in version 0.6.

Parameters:

bundle_size (int) – Specify the number of operations to be bundled into one submission, defaults to 1.
jobs (Sequence of instances of Job) – Only submit operations associated with the provided jobs. Defaults to all jobs.
names (Sequence of str) – Only submit operations with any of the given names, defaults to all names.
num (int) – Limit the total number of submitted operations, defaults to no limit.
parallel (bool) – Execute all bundled operations in parallel. Has no effect without bundling.
force (bool) – Ignore all warnings or checks during submission, just submit.
walltime – Specify the walltime in hours or as instance of datetime.timedelta.
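As a rough illustration of how bundle_size and num interact, the following sketch groups a sequence of operations into bundles. The helper make_bundles is hypothetical, and the assumption that the num limit is applied before bundling is mine, not something the reference states.

```python
from itertools import islice


def make_bundles(operations, bundle_size=1, num=None):
    """Yield lists of at most `bundle_size` operations (hypothetical helper)."""
    ops = iter(operations)
    if num is not None:
        # Assumption: the total limit is applied before grouping into bundles.
        ops = islice(ops, num)
    while True:
        bundle = list(islice(ops, bundle_size))
        if not bundle:
            return
        yield bundle


bundles = list(make_bundles(["op1", "op2", "op3", "op4", "op5"], bundle_size=2))
limited = list(make_bundles(["op1", "op2", "op3", "op4", "op5"], bundle_size=2, num=3))
```

With bundle_size=1 every operation forms its own bundle, which matches the note that parallel has no effect without bundling.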

submit_operations(operations, _id=None, env=None, parallel=False, flags=None, force=False, template='script.sh', pretend=False, show_template_help=False, **kwargs)¶

Submit a sequence of operations to the scheduler.

Changed in version 0.6.

Parameters:

operations (A sequence of instances of JobOperation) – The operations to submit.
_id (str) – The _id to be used for this submission.
parallel (bool) – Execute all bundled operations in parallel.
flags (list) – Additional options to be forwarded to the scheduler.
force (bool) – Ignore all warnings or checks during submission, just submit.
template (str) – The name of the template file to be used to generate the submission script.
pretend (bool) – Do not actually submit, but only print the submission script to screen. Useful for testing the submission workflow.
kwargs – Additional keyword arguments to be forwarded to the scheduler.

Returns:

Return the submission status after successful submission or None.

classmethod update_aliases(aliases)¶: Update the ALIASES table for this class.

update_stati(scheduler, jobs=None, file=sys.stderr, pool=None, ignore_errors=False)¶: This function has been removed as of version 0.6.

classmethod write_human_readable_statepoint(script, job)¶: Write statepoint of job in human-readable format to script.

Deprecated since version 0.6: Users should migrate to the new templating system.

write_script(script, operations, background=False, **kwargs)¶

Write a script for the execution of operations.

Deprecated since version 0.6: Users should migrate to the new templating system.

By default, this function will generate a script with the following components:

write_script_header(script)
write_script_operations(script, operations, background=background)
write_script_footer(script)

Parameters:

script – The script to write the commands to.
operations (A sequence of JobOperation) – The operations to be written to the script.
background (bool) – Whether operations should be executed in the background; useful to parallelize execution.
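The header, operations, footer composition can be sketched with standalone functions that write to any file-like object. This is only an illustration: the functions below are stand-ins that reuse the documented names, operations are assumed to be plain command strings, and the exact header, footer, and backgrounding syntax are assumptions.

```python
import io


def write_script_header(script):
    # Assumed header; the real header depends on the environment.
    script.write("#!/bin/bash\n")


def write_script_operations(script, operations, background=False):
    for cmd in operations:
        # Assumption: background execution appends "&" to each command.
        script.write(cmd + (" &" if background else "") + "\n")
    if background:
        # Wait for all background commands before the script ends.
        script.write("wait\n")


def write_script_footer(script):
    script.write("# end of script\n")


def write_script(script, operations, background=False):
    write_script_header(script)
    write_script_operations(script, operations, background=background)
    write_script_footer(script)


buffer = io.StringIO()
write_script(buffer, ["echo a", "echo b"], background=True)
text = buffer.getvalue()
```

Splitting the script into three writers keeps each part individually replaceable, which is presumably why the deprecated interface exposed them separately.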

write_script_footer(script, **kwargs)¶: Write the script footer for the execution script.

Deprecated since version 0.6: Users should migrate to the new templating system.

write_script_header(script, **kwargs)¶: Write the script header for the execution script.

Deprecated since version 0.6: Users should migrate to the new templating system.

write_script_operations(script, operations, background=False, **kwargs)¶: Write the commands for the execution of operations as part of a script.

Deprecated since version 0.6: Users should migrate to the new templating system.

class flow.JobOperation(name, job, cmd, directives=None, np=None)¶

Bases: object

This class represents the information needed to execute one operation for one job.

An operation function in this context is a shell command, which should be a function of one and only one signac job.

Note

This class is used by the FlowProject class for the execution and submission process and should not be instantiated by users themselves.

Changed in version 0.6.

Parameters:

name (str) – The name of this JobOperation instance. The name is arbitrary, but helps to concisely identify the operation in various contexts.
job (signac.Job) – The job instance associated with this operation.
cmd (str) – The command that executes this operation.
directives (dict) – A dictionary of additional parameters that provide instructions on how to execute this operation, e.g., specifically required resources.

get_id(index=0)¶: Return a name, which identifies this job-operation.

get_status()¶: Retrieve the operation’s last known status.

set_status(value)¶: Store the operation’s status.

class flow.label(name=None)¶

Bases: object

Decorate a FlowProject class function as a label function. For example:

class MyProject(FlowProject):

    @label()
    def foo(self, job):
        return True

class flow.classlabel(name=None)¶

Bases: flow.labels.label

A label decorator for classmethods.

This decorator implies “classmethod”!

class flow.staticlabel(name=None)¶

Bases: flow.labels.label

A label decorator for staticmethods.

This decorator implies “staticmethod”!

flow.cmd(func)¶

Specifies that func returns a shell command.

If this function is an operation function defined by FlowProject, it will be interpreted to return a shell command, instead of executing the function itself.

For example:

@FlowProject.operation
@flow.cmd
def hello(job):
    return "echo {job._id}"
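The placeholder in the returned string suggests that the command is formatted with the job before execution. A minimal sketch of that idea, assuming str.format-style substitution and using a SimpleNamespace as a stand-in for a signac job (both are assumptions, and render_command is a hypothetical helper):

```python
from types import SimpleNamespace


def hello(job):
    # Returns a command template rather than doing the work itself.
    return "echo {job._id}"


def render_command(func, job):
    """Fill the template returned by `func` with attributes of `job`."""
    return func(job).format(job=job)


# Stand-in object; a real signac job would provide the _id attribute.
fake_job = SimpleNamespace(_id="abc123")
command = render_command(hello, fake_job)
```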

class flow.directives(**kwargs)¶

Bases: object

Decorator for operation functions to provide additional execution directives.

Directives can for example be used to provide information about required resources such as the number of processes required for execution of parallelized operations.
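One common way to implement such a decorator is to attach the keyword arguments to the function object so that a scheduler can read them later. The sketch below shows that pattern only; the name with_directives and the attribute _directives are hypothetical and not taken from signac-flow.

```python
def with_directives(**kwargs):
    """Attach execution hints to an operation function (hypothetical sketch)."""

    def decorator(func):
        # Merge with directives set by an earlier application of the decorator.
        merged = dict(getattr(func, "_directives", {}))
        merged.update(kwargs)
        func._directives = merged
        return func

    return decorator


@with_directives(np=4)
def simulate(job):
    return "simulated {}".format(job)
```

The decorated function still behaves as before; only the extra attribute is added.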

flow.run(parser=None)¶

Access to the “run” interface of an operations module.

Executing this function within a module will start a command line interface that can be used to execute operations defined within the same module. All top-level unary functions will be interpreted as executable operation functions.

For example, if we have a module as such:

# operations.py

def hello(job):
    print('hello', job)

if __name__ == '__main__':
    import flow
    flow.run()

Then we can execute the hello operation for all jobs from the command line like this:

$ python operations.py hello

Note

You can control the degree of parallelization with the --np argument.

For more information, see:

$ python operations.py --help
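To make "top-level unary functions" concrete, here is a sketch of how such functions could be discovered with the inspect module. It is an illustration under assumptions, not the logic used by flow.run(): the helper find_unary_functions is hypothetical, and skipping names that start with an underscore is my own choice.

```python
import inspect
import types


def find_unary_functions(module):
    """Return {name: function} for public one-argument functions in `module`."""
    positional = (
        inspect.Parameter.POSITIONAL_ONLY,
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
    )
    found = {}
    for name, obj in vars(module).items():
        if name.startswith("_") or not inspect.isfunction(obj):
            continue
        if obj.__module__ != module.__name__:
            continue  # skip functions imported from elsewhere
        params = list(inspect.signature(obj).parameters.values())
        if len(params) == 1 and params[0].kind in positional:
            found[name] = obj
    return found


# Build a throwaway module to demonstrate the discovery.
demo = types.ModuleType("operations")
exec(
    "def hello(job):\n    return 'hello ' + str(job)\n"
    "def helper(a, b):\n    return a + b\n",
    demo.__dict__,
)
operations = find_unary_functions(demo)
```

Here hello qualifies as an operation, while helper is ignored because it takes two arguments.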

flow.init(alias=None, template=None, root=None, out=None)¶: Initialize a templated FlowProject module.

flow.redirect_log(job, filename='run.log', formatter=None, logger=None)¶

Redirect all messages logged via the logging interface to the given file.

Parameters:

job (signac.Project.Job) – An instance of a signac job.
formatter – The logging formatter to use; a default formatter is used if this argument is not provided.
logger (logging.Logger) – The instance of logger to which the new file log handler is added. Defaults to the default logger returned by logging.getLogger() if this argument is not provided.
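The behavior described above can be approximated with the standard logging module alone. In this sketch, redirect_log_to is a hypothetical helper that takes a directory path in place of a job, on the assumption that the log file belongs in the job's workspace; the default format string is also an assumption.

```python
import logging
import os
import tempfile


def redirect_log_to(workspace, filename="run.log", formatter=None, logger=None):
    """Add a file handler writing to workspace/filename and return it."""
    if formatter is None:
        formatter = logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    if logger is None:
        logger = logging.getLogger()
    handler = logging.FileHandler(os.path.join(workspace, filename))
    handler.setFormatter(formatter)
    logger.addHandler(handler)
    return handler


def demo():
    """Log one message into a temporary directory and return the file contents."""
    log = logging.getLogger("redirect_demo")
    log.setLevel(logging.INFO)
    with tempfile.TemporaryDirectory() as workspace:
        handler = redirect_log_to(workspace, logger=log)
        log.info("operation started")
        # Detach and close the handler so the file is flushed and released.
        log.removeHandler(handler)
        handler.close()
        with open(os.path.join(workspace, "run.log")) as file:
            return file.read()
```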

flow.get_environment(test=False, import_configured=True)¶

Attempt to detect the present environment.

This function iterates through all defined ComputeEnvironment classes in reverse order of definition and returns the first environment class whose is_present() method returns True.

Parameters:	test (bool) – Return the TestEnvironment
Returns:	The detected environment class.
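The reverse-order lookup can be sketched with a small class registry. This is an illustration of the technique only: the registry mechanism, the example environment classes, and the function name detect_environment are assumptions, and only the names ComputeEnvironment, TestEnvironment, and is_present() come from the description above.

```python
class ComputeEnvironment:
    """Base class; subclasses are recorded in order of definition."""

    registry = []

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        ComputeEnvironment.registry.append(cls)

    @classmethod
    def is_present(cls):
        return False


class TestEnvironment(ComputeEnvironment):
    pass


class GenericEnvironment(ComputeEnvironment):  # hypothetical fallback
    @classmethod
    def is_present(cls):
        return True


class ClusterEnvironment(ComputeEnvironment):  # hypothetical, defined last
    @classmethod
    def is_present(cls):
        return True


def detect_environment(test=False):
    if test:
        return TestEnvironment
    # Later definitions take precedence over earlier ones.
    for env in reversed(ComputeEnvironment.registry):
        if env.is_present():
            return env
    return None
```

Iterating in reverse lets a more specific environment defined later override a generic fallback defined earlier.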

Parameters:	job (`Job`) – The signac job handle.
Returns:	An instance of JobOperation to execute next or None, if no operation is eligible.
Return type:	JobOperation or NoneType

Parameters:	job (`Job`) – The signac job handle.
Yields:	All instances of JobOperation that the job is eligible for.

Parameters:	scheduler_jobs – An iterable of scheduler job instances.
Returns:	A nested dictionary (job_id, op_name, scheduler jobs)