API

The complete signac application programming interface (API).

Module contents

The signac framework aids in the management of large and heterogeneous data spaces.

It provides a simple and robust data model to create a well-defined indexable storage layout for data and metadata. This makes it easier to operate on large data spaces, streamlines post-processing and analysis and makes data collectively accessible.

class signac.Project(config=None)

Bases: object

The handle on a signac project.

Application developers should usually not need to instantiate this class directly; use contrib.get_project() instead.

build_job_search_index(index, include=None, hash_=None)

Build a job search index.

Parameters:
  • index (list) – A document index.
  • include (Mapping) – A mapping of keys that shall be included (True) or excluded (False).
Returns:

A job search index based on the provided index.

Return type:

JobSearchIndex

build_job_statepoint_index(exclude_const=False, index=None)

Build a statepoint index to identify jobs with specific parameters.

This method yields unordered key-value pairs. Each key is a complete statepoint path encoded in JSON, and each value is the set of ids of all jobs that share that path, e.g.:

>>> project.open_job({'a': 0, 'b': {'c': 'const'}}).init()
>>> project.open_job({'a': 1, 'b': {'c': 'const'}}).init()
>>> for k, v in project.build_job_statepoint_index():
...     print(k, v)
...
["a", 1] {'b7568fa73881d27cbf24bf58d226d80e'}
["a", 0] {'54b61a7adbe004b30b39aa399d04f483'}
["b", "c", "const"] {'b7568fa73881d27cbf24bf58d226d80e', '54b61a7adbe004b30b39aa399d04f483'}
Parameters:
  • exclude_const (bool) – Exclude entries that are shared by all jobs that are part of the index.
  • index – A document index.
Yields:

Key-value pairs of JSON-encoded statepoint parameters and a set of corresponding job ids.
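
The index layout described above can be illustrated without signac itself. The following is a minimal sketch, not the library's implementation: calc_id, flatten and statepoint_index are hypothetical helpers, and the MD5-of-sorted-JSON job id is an assumption made only so that the example produces stable ids.

```python
import hashlib
import json
from collections import defaultdict


def calc_id(statepoint):
    """Hypothetical job id: MD5 of the key-sorted JSON encoding (an assumption)."""
    blob = json.dumps(statepoint, sort_keys=True)
    return hashlib.md5(blob.encode("utf-8")).hexdigest()


def flatten(mapping, prefix=()):
    """Yield one (key, ..., value) tuple per leaf of a nested mapping."""
    for key, value in mapping.items():
        if isinstance(value, dict):
            yield from flatten(value, prefix + (key,))
        else:
            yield prefix + (key, value)


def statepoint_index(statepoints, exclude_const=False):
    """Yield (JSON-encoded statepoint path, set of job ids) pairs."""
    index = defaultdict(set)
    all_ids = set()
    for sp in statepoints:
        job_id = calc_id(sp)
        all_ids.add(job_id)
        for path in flatten(sp):
            index[json.dumps(list(path))].add(job_id)
    for key, ids in index.items():
        # An entry shared by every job does not distinguish any of them.
        if exclude_const and ids == all_ids:
            continue
        yield key, ids


statepoints = [
    {"a": 0, "b": {"c": "const"}},
    {"a": 1, "b": {"c": "const"}},
]
index = dict(statepoint_index(statepoints))
varying = dict(statepoint_index(statepoints, exclude_const=True))
```

With exclude_const=True, the entry for the shared value of b.c is dropped, so only the two entries for a remain.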

config

The project’s configuration.

create_access_module(formats=None, crawlername=None, filename=None, master=True, depth=1)

Create the access module for indexing.

This method generates the access module containing indexing directives for master crawlers.

Parameters:
  • formats (dict) – The format definitions as mapping.
  • crawlername (str) – Specify a name for the crawler class. Defaults to a name based on the project’s name.
  • filename (str) – The name of the access module file. Defaults to the standard name and should usually not be changed.
  • master (bool) – If True, will add master crawler execution commands to the bottom of the file.
  • depth (int) – Specifies the depth of the master crawler definitions (if master is True). Defaults to 1 to reduce the crawling depth of the master crawler. A value of 0 means no limit.
create_linked_view(job_ids=None, prefix=None, force=False, index=None)

Create a persistent linked view of the selected data space.

This method determines unique paths for each job based on the job’s statepoint and creates symbolic links to the associated workspace directories. This is useful for browsing through the data space in a human-readable manner.

Assuming that the parameter space is

  • a=0, b=0
  • a=1, b=0
  • a=2, b=0
  • ...,

where b does not vary over all statepoints, this method will create the following symbolic links within the specified view prefix:

view/a/0/job -> /path/to/workspace/7f9fb369851609ce9cb91404549393f3
view/a/1/job -> /path/to/workspace/017d53deb17a290d8b0d2ae02fa8bd9d
...

Note

To keep each view path as compact as possible, b is ignored because it does not vary over the selected data space.
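
The path scheme described above can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: view_paths is a hypothetical helper, it handles flat statepoints only, and it computes the link names without creating any symbolic links.

```python
def view_paths(statepoints, prefix="view"):
    """Map each job id to a view path built from the varying parameters only.

    `statepoints` maps job id -> flat statepoint dict; nested statepoints
    are not handled in this sketch.
    """
    keys = sorted({key for sp in statepoints.values() for key in sp})
    # Keep only the keys that take more than one distinct value.
    varying = [
        key for key in keys
        if len({repr(sp.get(key)) for sp in statepoints.values()}) > 1
    ]
    paths = {}
    for job_id, sp in statepoints.items():
        parts = [prefix]
        for key in varying:
            parts += [key, str(sp.get(key))]
        parts.append("job")
        paths[job_id] = "/".join(parts)
    return paths


statepoints = {
    "7f9fb369851609ce9cb91404549393f3": {"a": 0, "b": 0},
    "017d53deb17a290d8b0d2ae02fa8bd9d": {"a": 1, "b": 0},
}
paths = view_paths(statepoints)
```

Because b has the same value in every statepoint, it does not appear in any of the computed paths.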

create_view(filter=None, prefix='view')

Create a view of the workspace.

Warning

This method is deprecated. Please use create_linked_view() instead.

This method gathers all varying statepoint parameters and creates symbolic links to the workspace directories. This is useful for browsing through the workspace in a human-readable manner.

Let’s assume the parameter space is

  • a=0, b=0
  • a=1, b=0
  • a=2, b=0
  • ...,

where b does not vary over all statepoints.

Calling this method will generate the following symbolic links within the specified view directory:

view/a/0 -> /path/to/workspace/7f9fb369851609ce9cb91404549393f3
view/a/1 -> /path/to/workspace/017d53deb17a290d8b0d2ae02fa8bd9d
...

Note

As b does not vary over the whole parameter space, it is not part of the view URL. This maximizes the compactness of each view URL.

Parameters:
  • filter (mapping) – If not None, create view only for jobs matching filter.
  • prefix – Specifies where to create the links.
dump_statepoints(statepoints)

Dump the statepoints and associated job ids.

Equivalent to:

{project.open_job(sp).get_id(): sp for sp in statepoints}
Parameters: statepoints (iterable) – A list of statepoints.
Returns: A mapping, where the key is the job id and the value is the statepoint.
Return type: dict
find_job_documents(filter=None)

Find all job documents in the project’s workspace.

This method iterates through all jobs or all jobs matching the filter and yields each job’s document as a dict. Each dict additionally contains a field ‘statepoint’, with the job’s statepoint and a field ‘_id’, which is the job’s id.

Parameters: filter (mapping) – If not None, only find job documents matching filter.
Yields: Instances of dict.
Raises: KeyError – If the job document already contains the fields ‘_id’ or ‘statepoint’.
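
The way each yielded dict combines the job document with the two extra fields can be sketched as below. The helper with_job_metadata and the sample id are hypothetical; only the field names ‘_id’ and ‘statepoint’ and the KeyError on a clash come from the description above.

```python
def with_job_metadata(job_id, statepoint, document):
    """Return a copy of `document` extended with '_id' and 'statepoint'."""
    for reserved in ("_id", "statepoint"):
        if reserved in document:
            # Mirrors the documented KeyError for clashing field names.
            raise KeyError(reserved)
    merged = dict(document)
    merged["_id"] = job_id
    merged["statepoint"] = statepoint
    return merged


doc = with_job_metadata("abc123", {"a": 0}, {"energy": -1.5})
```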
find_job_ids(filter=None, doc_filter=None, index=None)

Find the job_ids of all jobs matching the filters.

Each optional filter argument must be a JSON-serializable Mapping of key-value pairs.

Note

Providing a pre-calculated index may vastly increase the performance of this function.

Parameters:
  • filter (Mapping) – A mapping of key-value pairs that all indexed job statepoints are compared against.
  • doc_filter – A mapping of key-value pairs that all indexed job documents are compared against.
  • index – A document index.
Yields:

The ids of all indexed jobs matching both filters.

Raises:
  • TypeError – If the filters are not JSON serializable.
  • ValueError – If the filters are invalid.
  • RuntimeError – If the filters are not supported by the index.
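
How the two filters narrow down an index can be sketched in plain Python. This is an assumption-laden illustration, not signac's matching logic: matches and find_ids are hypothetical helpers, a filter is treated as a (possibly nested) subset test, and each index entry is assumed to follow the layout described for find_job_documents(), with the job document fields at the top level next to ‘_id’ and ‘statepoint’.

```python
import json


def matches(filter, document):
    """Return True if every key-value pair in `filter` is found in `document`."""
    for key, expected in filter.items():
        if key not in document:
            return False
        actual = document[key]
        if isinstance(expected, dict) and isinstance(actual, dict):
            if not matches(expected, actual):
                return False
        elif actual != expected:
            return False
    return True


def find_ids(index, filter=None, doc_filter=None):
    """Yield the '_id' of each index entry that satisfies both filters."""
    for f in (filter, doc_filter):
        if f is not None:
            json.dumps(f)  # raises TypeError if not JSON serializable
    for entry in index:
        if filter and not matches(filter, entry["statepoint"]):
            continue
        if doc_filter and not matches(doc_filter, entry):
            continue
        yield entry["_id"]


index = [
    {"_id": "job-0", "statepoint": {"a": 0, "b": {"c": "const"}}, "done": True},
    {"_id": "job-1", "statepoint": {"a": 1, "b": {"c": "const"}}, "done": False},
]
all_const = list(find_ids(index, filter={"b": {"c": "const"}}))
finished = list(find_ids(index, doc_filter={"done": True}))
```

Building the index list once and passing it to several queries is the kind of reuse that the note about pre-calculated indexes refers to.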
find_jobs(filter=None, doc_filter=None, index=None)

Find all jobs in the project’s workspace.

Each optional filter argument must be a JSON-serializable Mapping of key-value pairs.

Note

Providing a pre-calculated index may vastly increase the performance of this function.

Parameters:
  • filter (Mapping) – A mapping of key-value pairs that all indexed job statepoints are compared against.
  • doc_filter – A mapping of key-value pairs that all indexed job documents are compared against.
Yields:

Instances of Job

Raises:
  • TypeError – If the filters are not JSON serializable.
  • ValueError – If the filters are invalid.
  • RuntimeError – If the filters are not supported by the index.
find_statepoints(filter=None, doc_filter=None, index=None, skip_errors=False)

Find all statepoints in the project’s workspace.

Parameters:
  • filter (mapping) – If not None, only yield statepoints matching the filter.
  • skip_errors (bool) – Show, but otherwise ignore errors while iterating over the workspace. Use this argument to repair a corrupted workspace.
Yields:

statepoints as dict

find_variable_parameters(statepoints=None)

Find all parameters which vary over the data space.

Warning

This method is deprecated. Please see build_job_statepoint_index() for an alternative method.

This method attempts to detect all parameters that vary over the parameter space. The parameter sets are listed in decreasing order of data subspace size.

Warning

This method does not detect linear dependencies within the state points. Linear dependencies should generally be avoided.

Parameters: statepoints (Iterable of parameter mappings) – The statepoints to consider. Defaults to all state points within the data space.
Returns: A hierarchical list of variable parameters.
Return type: list
get_id()

Get the project identifier.

Returns: The project id.
Return type: str
Raises: LookupError – If no project id could be determined.
classmethod get_project(root=None)

Find a project configuration and return the associated project.

Parameters: root (str) – The project root directory. If no root directory is given, the next project found within or above the current working directory is returned.
Returns: The project handle.
Raises: LookupError – If no project configuration can be found.
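
The "within or above the current working directory" lookup can be pictured as a walk up the directory tree. The sketch below is an illustration only: find_project_root is a hypothetical helper, and the configuration file name signac.rc is an assumption that this section does not state.

```python
import os
import tempfile


def find_project_root(start, config_name="signac.rc"):
    """Walk upward from `start` until a directory contains `config_name`."""
    current = os.path.abspath(start)
    while True:
        if os.path.isfile(os.path.join(current, config_name)):
            return current
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            raise LookupError("no project configuration found above " + start)
        current = parent


with tempfile.TemporaryDirectory() as tmp:
    root = os.path.realpath(tmp)
    # Place an (empty) configuration file in the would-be project root.
    open(os.path.join(root, "signac.rc"), "w").close()
    nested = os.path.join(root, "analysis", "plots")
    os.makedirs(nested)
    found = find_project_root(nested)
    found_matches = (found == root)
```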
get_statepoint(jobid, fn=None)

Get the statepoint associated with a job id.

The statepoint is retrieved from the workspace or from the statepoints file if the former attempt fails.

Parameters:
  • jobid (str) – A job id to get the statepoint for.
  • fn (str) – The filename of the file containing the statepoints, defaults to FN_STATEPOINTS.
Returns:

The statepoint.

Return type:

dict

Raises:

KeyError – If the statepoint associated with jobid could not be found.

See also dump_statepoints().

index(formats=None, depth=0, skip_errors=False, include_job_document=True)

Generate an index of the project’s workspace.

This generator function indexes every file in the project’s workspace up to the specified depth. The job document, if it exists, is always indexed; other files need to be specified with the formats argument.

from signac.contrib.formats import TextFile
for doc in project.index({r'.*\.txt': TextFile}):
    print(doc)
Parameters:
  • formats (dict) – The format definitions as mapping.
  • depth (int) – Specifies the crawling depth. A value of 0 (default) means no limit.
  • skip_errors (bool) – Skip all errors which occur during indexing. This is useful when trying to repair a broken workspace.
  • include_job_document (bool) – Include the contents of job documents.
Yields:

index documents

classmethod init_project(name, root=None, workspace=None, make_dir=True)

Initialize a project with the given name.

It is safe to call this function multiple times with the same arguments. However, a RuntimeError is raised if an existing project configuration conflicts with the provided initialization parameters.

Parameters:
  • name (str) – The name of the project to initialize.
  • root (str) – The root directory for the project. Defaults to the current working directory.
  • workspace (str) – The workspace directory for the project. Defaults to $project_root/workspace.
  • make_dir (bool) – Create the project root directory, if it does not exist yet.
Returns:

The project handle of the initialized project.

Raises:

RuntimeError – If the project root path already contains a conflicting project configuration.
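
The repeat-call behaviour described above (safe with identical arguments, RuntimeError on a conflict) can be sketched with a toy configuration file. Everything here is hypothetical: this stand-in init_project, the project.json file name and the JSON layout are assumptions for illustration and do not reflect how signac stores its configuration.

```python
import json
import os
import tempfile


def init_project(name, root):
    """Toy initializer: create the config once, accept identical repeat calls."""
    os.makedirs(root, exist_ok=True)
    config_path = os.path.join(root, "project.json")  # hypothetical file name
    if os.path.exists(config_path):
        with open(config_path) as file:
            existing = json.load(file)
        if existing["project"] != name:
            raise RuntimeError(
                "conflicting project configuration in " + root)
        return existing
    config = {"project": name}
    with open(config_path, "w") as file:
        json.dump(config, file)
    return config


with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "my_project")
    first = init_project("MyProject", root)
    second = init_project("MyProject", root)  # same arguments: no error
    try:
        init_project("OtherProject", root)
        conflict_detected = False
    except RuntimeError:
        conflict_detected = True
```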

num_jobs()
open_job(statepoint=None, id=None)

Get a job handle associated with a statepoint.

This method returns the job instance associated with the given statepoint or job id. Opening a job by statepoint never fails. Opening a job by id requires a lookup of the statepoint from the job id, which may fail if the job was not previously initialized.

Parameters:
  • statepoint (mapping) – The job’s unique set of parameters.
  • id (str) – The job id.
Returns:

The job instance.

Return type:

signac.contrib.job.Job

Raises:

KeyError – If the attempt to open the job by id fails.
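
The difference between the two ways of opening a job can be sketched with an in-memory stand-in. MiniProject, init_job and calc_id are hypothetical names, and deriving the id as an MD5 hash of the sorted JSON encoding is an assumption; the point is only that a statepoint determines its id directly, while an id needs a record of a prior initialization.

```python
import hashlib
import json


def calc_id(statepoint):
    """Hypothetical id function: MD5 of the key-sorted JSON encoding."""
    blob = json.dumps(statepoint, sort_keys=True)
    return hashlib.md5(blob.encode("utf-8")).hexdigest()


class MiniProject:
    """Toy stand-in for a project that remembers initialized jobs in memory."""

    def __init__(self):
        self._statepoints = {}  # job id -> statepoint of initialized jobs

    def init_job(self, statepoint):
        job_id = calc_id(statepoint)
        self._statepoints[job_id] = statepoint
        return job_id

    def open_job(self, statepoint=None, id=None):
        if statepoint is not None:
            # The id is derived from the statepoint, so no lookup is needed.
            return calc_id(statepoint), statepoint
        # Opening by id depends on a prior initialization; a missing id
        # surfaces as KeyError.
        return id, self._statepoints[id]


project = MiniProject()
known_id = project.init_job({"a": 0})
by_statepoint = project.open_job(statepoint={"a": 1})  # never initialized
by_id = project.open_job(id=known_id)
```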

read_statepoints(fn=None)

Read all statepoints from a file.

Parameters: fn (str) – The filename of the file containing the statepoints, defaults to FN_STATEPOINTS.

See also dump_statepoints() and write_statepoints().

repair()

Attempt to repair a corrupted workspace.

reset_statepoint(job, new_statepoint)

Reset the statepoint of job.

Danger

Use this function with caution! Resetting a job’s statepoint may sometimes be necessary, but can lead to incoherent data spaces. If you only want to extend your statepoint, consider using update_statepoint() instead.

Parameters:
  • job (Job) – The job that should be reset to a new state point.
  • new_statepoint (mapping) – The job’s new unique set of parameters.
Raises:

RuntimeError – If a job associated with the new unique set of parameters already exists in the workspace.

root_directory()

Returns the project’s root directory.

update_statepoint(job, update, overwrite=False)

Update the statepoint of job.

Warning

While appending to a job’s statepoint is generally safe, modifying existing parameters may lead to data inconsistency. Use the overwrite argument with caution!

Parameters:
  • job (Job) – The job whose statepoint shall be updated.
  • update (mapping) – A mapping used for the statepoint update.
  • overwrite – Set to True to allow this update to overwrite parameters that are currently part of the job’s statepoint. Use with caution!
Raises:
  • KeyError – If the update contains keys that are already part of the job’s statepoint.
  • RuntimeError – If a job associated with the new unique set of parameters already exists in the workspace.
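
The two failure modes listed above can be sketched on plain dicts. This is one literal reading of the Raises entries, not the library's code: updated_statepoint is a hypothetical helper and the existing argument stands in for the statepoints already present in the workspace.

```python
def updated_statepoint(statepoint, update, overwrite=False, existing=()):
    """Return the merged statepoint or raise as described in the Raises list."""
    if not overwrite:
        clashes = sorted(key for key in update if key in statepoint)
        if clashes:
            # Appending is safe; touching existing keys needs overwrite=True.
            raise KeyError(clashes[0])
    merged = dict(statepoint)
    merged.update(update)
    if merged in list(existing):
        raise RuntimeError("a job with this statepoint already exists")
    return merged


extended = updated_statepoint({"a": 0}, {"b": 1})
overwritten = updated_statepoint({"a": 0}, {"a": 1}, overwrite=True)
```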
workspace()

Returns the project’s workspace directory.

The workspace defaults to project_root/workspace. Configure this directory with the ‘workspace_dir’ attribute. If the specified directory is a relative path, it is resolved relative to the project’s root directory.

Note

The configuration will respect environment variables, such as $HOME.
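
The resolution rules described above (default location, relative paths anchored at the project root, environment variables expanded) can be sketched with the standard library. resolve_workspace is a hypothetical helper and the example paths are made up; the library's own resolution may differ in detail.

```python
import os


def resolve_workspace(project_root, workspace_dir="workspace"):
    """Expand environment variables and anchor relative paths at the root."""
    expanded = os.path.expandvars(workspace_dir)
    if not os.path.isabs(expanded):
        expanded = os.path.join(project_root, expanded)
    return os.path.normpath(expanded)


default = resolve_workspace("/data/project")
relative = resolve_workspace("/data/project", "scratch/ws")
```

An absolute value, or one that becomes absolute after expanding a variable such as $HOME, is returned without the project root prepended.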

write_statepoints(statepoints=None, fn=None, indent=2)

Dump statepoints to a file.

If the file already contains statepoints, all new statepoints will be appended, while the old ones are preserved.

Parameters:
  • statepoints (iterable) – A list of statepoints, defaults to all statepoints which are defined in the workspace.
  • fn (str) – The filename of the file containing the statepoints, defaults to FN_STATEPOINTS.
  • indent (int) – Specify the indentation of the JSON file.

See also dump_statepoints().
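
The append-and-preserve behaviour can be sketched with a JSON file that maps job ids to statepoints, matching the mapping returned by dump_statepoints(). This is an illustration under assumptions: the stand-in write_statepoints below, the calc_id helper with its MD5-based id, and the file name statepoints.json are all hypothetical.

```python
import hashlib
import json
import os
import tempfile


def calc_id(statepoint):
    """Hypothetical id function: MD5 of the key-sorted JSON encoding."""
    blob = json.dumps(statepoint, sort_keys=True)
    return hashlib.md5(blob.encode("utf-8")).hexdigest()


def write_statepoints(statepoints, fn, indent=2):
    """Merge new statepoints into the file, keeping entries already stored."""
    try:
        with open(fn) as file:
            stored = json.load(file)
    except FileNotFoundError:
        stored = {}
    for sp in statepoints:
        stored[calc_id(sp)] = sp
    with open(fn, "w") as file:
        json.dump(stored, file, indent=indent)
    return stored


with tempfile.TemporaryDirectory() as tmp:
    fn = os.path.join(tmp, "statepoints.json")  # hypothetical file name
    write_statepoints([{"a": 0}], fn)
    stored = write_statepoints([{"a": 1}], fn)  # the first entry survives
```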

signac.get_project(root=None)

Find a project configuration and return the associated project.

Parameters: root (str) – The project root directory. If no root directory is given, the next project found within or above the current working directory is returned.
Returns: The project handle.
Return type: Project
Raises: LookupError – If no project configuration can be found.
signac.init_project(name, root=None, workspace=None, make_dir=True)

Initialize a project with the given name.

It is safe to call this function multiple times with the same arguments. However, a RuntimeError is raised if an existing project configuration conflicts with the provided initialization parameters.

Parameters:
  • name (str) – The name of the project to initialize.
  • root (str) – The root directory for the project. Defaults to the current working directory.
  • workspace (str) – The workspace directory for the project. Defaults to $project_root/workspace.
  • make_dir (bool) – Create the project root directory, if it does not exist yet.
Returns:

The project handle of the initialized project.

Return type:

Project

Raises:

RuntimeError – If the project root path already contains a conflicting project configuration.

signac.get_database(name, hostname=None, config=None)

Get a database handle.

The database handle is an instance of Database, which provides access to the document collections within one database.

db = signac.db.get_database('MyDatabase')
docs = db.my_collection.find()

Please note that a collection that does not exist at the point of access will be created automatically.

Parameters:
  • name (str) – The name of the database to get.
  • hostname (str) – The name of the configured host. Defaults to the first configured host, or the host specified by default_host.
  • config (common.config.Config) – The config object to retrieve the host configuration from. Defaults to the global configuration.
Returns:

The database handle.

Return type:

pymongo.database.Database

signac.fetch(doc, mode='r', sources=None, ignore_linked_mirrors=False)

Fetch all data associated with this document.

The sources argument is either a list of filesystem-like objects or a list of file system configurations or a mix of both.

See contrib.filesystems.filesystems_from_config() for details.

Parameters:
  • doc (mapping) – A document which is part of an index.
  • mode – Mode to use for file opening.
  • sources – An optional set of sources to fetch files from.
  • ignore_linked_mirrors – Ignore all mirror information in the document’s link attribute.
Yields:

Data associated with this document in the specified format.

signac.fetch_one(doc, *args, **kwargs)

Fetch data associated with this document.

Unlike fetch(), this function returns only the first file associated with doc and ignores all others. This function returns None if no file is associated with the document.

Parameters: doc (mapping) – A document which is part of an index.
Returns: Data associated with this document or None.
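
The relationship between fetch() and fetch_one() can be sketched with a plain generator. fake_fetch and first_or_none are hypothetical stand-ins, and the ‘files’ field is invented for the example; only the "first result or None" behaviour is taken from the description above.

```python
def fake_fetch(doc):
    """Stand-in generator that yields every item associated with `doc`."""
    yield from doc.get("files", [])


def first_or_none(results):
    """Return the first item of an iterable, or None if it is empty."""
    return next(iter(results), None)


first = first_or_none(fake_fetch({"files": ["a.txt", "b.txt"]}))
nothing = first_or_none(fake_fetch({}))
```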