Cluster Submission

While it is always possible to manually submit scripts like the one shown in the previous chapter to a cluster, using the flow interface allows us to keep track of submitted operations, for example to prevent the resubmission of active operations.

In addition, signac-flow may utilize environment profiles to adjust the submission process to your local environment. This is because different cluster environments offer different resources and require slightly different options for submission. While the basic options are kept as similar as possible, the submit interface is slightly adapted to the local environment. You can check the available options with the python project.py submit --help command.

The submit interface

In general, we submit operations through the primary interface of the FlowProject. If we have a project.py module (as shown earlier), which looks something like this:

# project.py
from flow import FlowProject

class Project(FlowProject):

    def __init__(self, *args, **kwargs):
        super(Project, self).__init__(*args, **kwargs)

if __name__ == '__main__':
    Project().main()

Then we can submit operations from the command line with the following command:

$ python project.py submit

Note

From here on, we will abbreviate the $ python project.py submit command as $ <project> submit, since the module may be named differently.

In many cases you will need to provide additional arguments to the scheduler, such as your account name, the required walltime, and other information about requested resources. Some of these options can be specified through the native interface, which means that flow knows about them and shows them when you execute submit --help.

However, you can always forward any arguments directly to the scheduler command as positional arguments. For example, if we wanted to specify an account name with a torque scheduler, we could use the following command:

$ <project> submit -- -l A:my_account_name

Everything after the two dashes -- will not be interpreted by the submit interface, but directly forwarded to the scheduler as is.

Note

Unless you have one of the supported schedulers installed, you will not be able to submit any operations on your computer; however, you will be able to run some test commands in order to debug the process as well as possible. On the other hand, if you are in one of the natively supported high-performance supercomputing environments (e.g., XSEDE), you may take advantage of configuration profiles specifically tailored to those environments.

Submitting Operations

The submission process consists of the following steps:

  1. Gathering all operations eligible for submission.
  2. Generating scripts to execute those operations.
  3. Submitting those scripts to the scheduler.

The first step is largely determined by your project workflow. You can see which operations would be submitted by looking at the output of $ <project> status --detailed. You may further reduce the operations to be submitted by selecting specific jobs (-j), specific operations (-o), or by limiting the total number of operations to be submitted (-n). For example, the following command would submit up to 5 hello operations:

$ <project> submit -o hello -n 5

By default, all operations are eligible for submission; however, you can overload the FlowProject.eligible_for_submission() method to customize this behavior.
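To illustrate the idea, here is a minimal standalone sketch of such a custom eligibility rule. Note that this is not the real flow API: the 'ready' document flag is an assumption made for this example, and in practice the check would go into an overloaded eligible_for_submission() method of your FlowProject subclass.

```python
# Standalone sketch (not the real flow API): only consider an operation
# eligible when the job's document contains a 'ready' flag.
# The 'ready' key is a hypothetical convention for this example.
def eligible_for_submission(job_document):
    # Jobs without the flag are skipped during submission.
    return bool(job_document.get('ready', False))

print(eligible_for_submission({'ready': True}))   # True
print(eligible_for_submission({}))                # False
```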

The scripts for submission are generated by the FlowProject.write_script() method. This method itself calls the following methods:

write_script_header(script)
write_script_operations(script, operations, ...)
write_script_footer(script)

This means that by default, each script contains one header at the beginning, one footer at the end, and the commands for each operation in between. To customize the generation of scripts, it is recommended to overload any of these three methods, or the write_script() method itself.
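To illustrate how these pieces fit together, here is a simplified, standalone sketch of the assembly process. The function names mirror flow's hooks, but the signatures are simplified: the real methods are called on the FlowProject and receive additional arguments.

```python
import io

# Simplified stand-ins for the three write_script_* hooks.
def write_script_header(script):
    # The real header would contain scheduler directives (walltime, nodes, ...).
    script.write('#!/bin/bash\n')

def write_script_operations(script, commands):
    # One command per operation, written between header and footer.
    for cmd in commands:
        script.write(cmd + '\n')

def write_script_footer(script):
    script.write('# end of script\n')

script = io.StringIO()
write_script_header(script)
write_script_operations(script, ['cd /path/to/job; foo input.txt'])
write_script_footer(script)
print(script.getvalue())
```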

Tip

Use the script command to debug the generation of execution scripts.

Parallelization

When submitting operations to the cluster, signac-flow assumes that each operation requires one processor and generates a script requesting resources accordingly.

When you execute parallelized operations, you need to specify that as part of the operation. For example, assume that we want to execute a program called foo, which automatically parallelizes onto 24 cores. Then we would need to specify the operation like this:

class MyProject(FlowProject):

    def __init__(self, *args, **kwargs):
        super(MyProject, self).__init__(*args, **kwargs)
        self.add_operation(
            name='foo',                         # name of the operation
            cmd='cd {job.ws}; foo input.txt',   # the execution command
            np=24,                              # foo requires 24 cores
        )
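The cmd argument is a template that is formatted with the job before execution. Conceptually, the expansion works as in the following sketch, where FakeJob is a stand-in for a real signac job:

```python
# The command template from the example above; {job.ws} expands
# to the job's workspace directory.
cmd = 'cd {job.ws}; foo input.txt'

class FakeJob:
    # Stand-in for a real signac job; 'ws' is the workspace path
    # (the path shown here is made up for illustration).
    ws = '/path/to/workspace/abc123'

print(cmd.format(job=FakeJob()))
# -> cd /path/to/workspace/abc123; foo input.txt
```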

If you are using MPI for parallelization, you may need to prefix your command accordingly:

cmd='cd {job.ws}; mpirun -np 24 foo input.txt'

Different environments use different MPI commands. You can use the environment-specific MPI command like this:

from flow import get_environment

# ...
    env = get_environment()

    self.add_operation(
        name='foo',
        cmd='cd {job.ws}; ' + env.mpi_cmd('foo input.txt', np=24),
        np=24,
    )

Tip

Both the cmd argument and the np argument may be callables, which means that you can specify the command as well as the number of processors as a function of the job!

Here is an example using lambda-expressions:

self.add_operation(
    name='foo',
    cmd=lambda job: env.mpi_cmd("foo input.txt", np=job.sp.a),
    np=lambda job: job.sp.a)

Operation Bundling

By default all operations will be submitted as separate cluster jobs. This is usually the best model for clusters that scale well with the size of your operations. However, you may choose to bundle multiple operations into one submission using the --bundle option, e.g., if you need to run multiple processes in parallel to fully utilize one node.

For example, the following command will bundle up to 5 operations into a single cluster job:

$ <project> submit --bundle 5

These 5 operations will be executed in parallel, which means that the resources required for this cluster job will be the sum of the resources required for each operation. Without an argument, the --bundle option bundles all operations into a single cluster job.

Finally, if you have many small operations, you can bundle them into a single cluster job with the --serial option. In this mode, all bundled operations are executed in serial, and the required resources are determined by the operation that requires the most resources.
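For example, the following command would bundle all eligible operations into one cluster job and execute them one after another (the exact combination of flags may vary between signac-flow versions; check submit --help):

$ <project> submit --bundle --serial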

Managing Environments

The signac-flow package attempts to detect your local environment and adjusts the options provided by the submit interface accordingly. You can check which environment you are using by looking at the output of submit --help.

For more information, see the next chapter.