API Reference¶
This is the API for the signac (core) application.
The Project¶
Attributes

- check() – Check the project's workspace for corruption.
- clone() – Clone job into this project.
- config – Get project's configuration.
- create_linked_view() – Create or update a persistent linked view of the selected data space.
- detect_schema() – Detect the project's state point schema.
- data – Get data associated with this project.
- doc – Get document associated with this project.
- document – Get document associated with this project.
- export_to() – Export all jobs to a target location, such as a directory or a (compressed) archive file.
- find_jobs() – Find all jobs in the project's workspace.
- fn() – Prepend a filename with the project path.
- groupby() – Group jobs according to one or more state point or document parameters.
- import_from() – Import the data space located at origin into this project.
- isfile() – Check if a filename exists in the project path.
- min_len_unique_id() – Determine the minimum length required for a job id to be unique.
- open_job() – Get a job handle associated with a state point.
- path – The path to the project directory.
- repair() – Attempt to repair the workspace after it got corrupted.
- stores – Get HDF5 stores associated with this project.
- sync() – Synchronize this project with the other project.
- update_cache() – Update the persistent state point cache.
- workspace – The project's workspace directory.
- class signac.Project(path=None)¶
Bases:
object
The handle on a signac project.
A Project may only be constructed in a directory that is already a signac project, i.e. a directory in which init_project() has already been run. To search upwards in the folder hierarchy until a project is found, instead invoke get_project() or Project.get_project().
- Parameters:
path (str, optional) – The project directory. By default, the current working directory (Default value = None).
- FN_CACHE = '.signac/statepoint_cache.json.gz'¶
The default filename for the state point cache file.
- FN_DOCUMENT = 'signac_project_document.json'¶
The project’s document filename.
- KEY_DATA = 'signac_data'¶
The project’s datastore key.
- check()¶
Check the project’s workspace for corruption.
- Raises:
signac.errors.JobsCorruptedError – When one or more jobs are identified as corrupted.
- clone(job, copytree=None)¶
Clone job into this project.
Create an identical copy of job within this project.
See signac clone for the command line equivalent.
- Parameters:
job (Job) – The job to copy into this project.
copytree (callable, optional) – The function used for copying directory tree structures. Uses shutil.copytree() if None (Default value = None). The function requires that the target is a directory.
- Returns:
The job instance corresponding to the copied job.
- Return type:
Job
- Raises:
DestinationExistsError – If a job with the same id is already initialized within this project.
- property config¶
Get project’s configuration.
The configuration is immutable once the Project is constructed. To modify a project configuration, use the command line or edit the configuration file directly.
See signac config for related command line tools.
- Returns:
Dictionary containing project’s configuration.
- Return type:
_ProjectConfig
- create_linked_view(prefix=None, job_ids=None, path=None)¶
Create or update a persistent linked view of the selected data space.
Similar to export_to(), this function expands the data space for the selected jobs, but instead of copying data will create symbolic links to the individual job directories. This is primarily useful for browsing through the data space using a file browser with human-interpretable directory paths.
By default, the paths of the view will be based on variable state point keys as part of the implicit schema of the selected jobs that we create the view for. For example, creating a linked view for a data space with schema
>>> print(project.detect_schema())
{
 'foo': 'int([0, 1, 2, ..., 8, 9], 10)',
}
by calling project.create_linked_view('my_view') will look similar to:
my_view/foo/0/job -> workspace/b8fcc6b8f99c56509eb65568922e88b8
my_view/foo/1/job -> workspace/b6cd26b873ae3624653c9268deff4485
...
It is possible to control the paths using the path argument, which behaves in the exact same manner as the equivalent argument for export_to().
Note
The behavior of this function is almost equivalent to project.export_to('my_view', copytree=os.symlink), with the major difference that view hierarchies are actually updated, meaning that invalid links are automatically removed.
See signac view for the command line equivalent.
- Parameters:
prefix (str, optional) – The path where the linked view will be created or updated (Default value = None).
job_ids (iterable, optional) – If None (the default), create the view for the complete data space, otherwise only for this iterable of job ids.
path (str or callable, optional) – The path (function) used to structure the linked data space (Default value = None).
- Returns:
A dictionary that maps the source directory paths to the linked directory paths.
- Return type:
dict
- property data¶
Get data associated with this project.
This property should be used for large array-like data, which can’t be stored efficiently in the project document. For examples and usage, see Centralized Project Data.
Equivalent to:
return project.stores['signac_data']
See also
H5Store
Usage examples.
- Returns:
An HDF5-backed datastore.
- Return type:
H5Store
- detect_schema(exclude_const=False, subset=None)¶
Detect the project’s state point schema.
See signac schema for the command line equivalent.
- Parameters:
exclude_const (bool, optional) – Exclude all state point keys that are shared by all jobs within this project (Default value = False).
subset (sequence[Job or str], optional) – A sequence of jobs or job ids specifying a subset over which the state point schema should be detected (Default value = None).
- Returns:
The detected project schema.
- Return type:
ProjectSchema
- property doc¶
Get document associated with this project.
Alias for document.
- Returns:
The project document. Supports attribute-based access to dict keys.
- Return type:
MutableMapping
- property document¶
Get document associated with this project.
- Returns:
The project document. Supports attribute-based access to dict keys.
- Return type:
MutableMapping
- export_to(target, path=None, copytree=None)¶
Export all jobs to a target location, such as a directory or a (compressed) archive file.
Use this function in combination with find_jobs() to export only a select number of jobs, for example:
project.find_jobs({'foo': 0}).export_to('foo_0.tar')
The path argument enables users to control how exactly the exported data space is to be expanded. By default, the path function will be based on the implicit schema of the exported jobs. For example, exporting jobs that all differ by a state point key foo with project.export_to('data/'), the exported directory structure could look like this:
data/foo/0
data/foo/1
...
That would be equivalent to specifying path=lambda job: os.path.join('foo', job.sp.foo).
Instead of a function, we can also provide a string, where fields for state point keys are automatically formatted. For example, the following two path arguments are equivalent: "foo/{foo}" and "foo/{job.sp.foo}".
Any attribute of job can be used as a field here, so job.doc.bar, job.id, and job.ws can also be used as path fields.
A special {{auto}} field allows us to expand the path automatically with state point keys that have not been specified explicitly. So, for example, one can provide path="foo/{foo}/{{auto}}" to specify that the path shall begin with foo/{foo}/, but is then automatically expanded with all other state point key-value pairs. How key-value pairs are concatenated can be controlled via the format specifier; for example, path="{{auto:_}}" will generate a structure such as:
data/foo_0
data/foo_1
...
Finally, providing path=False is equivalent to path="{job.id}".
See also
import_from() – Previously exported or non-signac data spaces can be imported.
- signac export:
See signac export for the command line equivalent.
- Parameters:
target (str) – A path to a directory to export to. The target must not already exist. Besides directories, possible targets are tar files (.tar), gzipped tar files (.tar.gz), zip files (.zip), bzip2-compressed files (.bz2), and xz-compressed files (.xz).
path (str or callable, optional) – The path (function) used to structure the exported data space. This argument must either be a callable which returns a path (str) as a function of job, a string where fields are replaced using the job state point dictionary, or False, which means that we just use the job id as path. Defaults to the equivalent of {{auto}}.
copytree (callable, optional) – The function used for copying directory tree structures. Uses shutil.copytree() if None (Default value = None). The function requires that the target is a directory.
- Returns:
A dict that maps the source directory paths to the target directory paths.
- Return type:
dict
- find_jobs(filter=None)¶
Find all jobs in the project’s workspace.
The filter argument must be a JSON-serializable Mapping of key-value pairs. The filter argument can search against both job state points and job documents. See https://docs.signac.io/en/latest/query.html#query-namespaces for a description of supported queries.
See signac find for the command line equivalent.
Tip
To find a single job given a state point, use open_job with O(1) cost.
Tip
To find many groups of jobs, use your own code to loop through the project once and build multiple matching lists.
Warning
find_jobs costs O(N) each time it is called. It applies the filter to every job in the workspace.
- Parameters:
filter (Mapping, optional) – A mapping of key-value pairs used for the query (Default value = None).
- Returns:
JobsCursor of jobs matching the provided filter.
- Return type:
JobsCursor
- Raises:
TypeError – If the filters are not JSON serializable.
ValueError – If the filters are invalid.
- fn(filename)¶
Prepend a filename with the project path.
- classmethod get_job(path=None)¶
Find a Job in or above the current working directory (or provided path).
- Parameters:
path (str, optional) – The starting point to search for a job. If None, the current working directory is used (Default value = None).
- Returns:
The first job found in or above the provided path.
- Return type:
Job
- Raises:
LookupError – If a job cannot be found.
- classmethod get_project(path=None, search=True, **kwargs)¶
Find a project configuration and return the associated project.
- Parameters:
path (str, optional) – The starting point to search for a project. If None, the current working directory is used (Default value = None).
search (bool, optional) – If True, search for project configurations inside and above the specified path, otherwise only return a project in the specified path (Default value = True).
**kwargs – Optional keyword arguments that are forwarded to the Project class constructor.
- Returns:
An instance of Project.
- Return type:
Project
- Raises:
LookupError – If no project configuration can be found.
- groupby(key=None, default=None)¶
Group jobs according to one or more state point or document parameters.
Prepend the key with ‘sp.’ or ‘doc.’ to specify the query namespace. If no prefix is specified, group by state point key.
This method can be called on any JobsCursor, such as the one returned by find_jobs() or by iterating over a project.
Examples
# Group jobs by state point parameter 'a'.
for key, group in project.groupby('a'):
    print(key, list(group))

# Group jobs by document value 'a'.
for key, group in project.groupby('doc.a'):
    print(key, list(group))

# Group jobs by job.sp['a'] and job.document['b'].
for key, group in project.groupby(('a', 'doc.b')):
    print(key, list(group))

# Find jobs where job.sp['a'] is 1 and group them
# by job.sp['b'] and job.sp['c'].
for key, group in project.find_jobs({'a': 1}).groupby(('b', 'c')):
    print(key, list(group))

# Group by job.sp['d'] and job.document['count'] using a lambda.
for key, group in project.groupby(
    lambda job: (job.sp['d'], job.document['count'])
):
    print(key, list(group))
If key is None, jobs are grouped by id, placing one job into each group.
If default is None, only jobs with the key defined will be grouped. Jobs without the key will be filtered out and not included in any group.
- Parameters:
key (str, iterable, or callable, optional) – The grouping key(s) passed as a string, iterable of strings, or a callable that will be passed one argument, the job (Default value = None).
default (object, optional) – A default value to be used when a given key is not present. The value must be sortable and is only used if not None (Default value = None).
- Yields:
key – Key identifying this group.
group (iterable of Jobs) – Iterable of Job instances matching this group.
- import_from(origin=None, schema=None, sync=None, copytree=None)¶
Import the data space located at origin into this project.
This function will walk through the data space located at origin and will try to identify data space paths that can be imported as a job workspace into this project.
The schema argument expects a function that takes a path argument and returns a state point dictionary. A default function is used when no argument is provided. The default schema function will simply look for state point files – usually named signac_statepoint.json – and then import all data located within that path into the job workspace corresponding to the specified state point.
Alternatively, the schema argument may be a string that is converted into a schema function. For example, providing foo/{foo:int} as the schema argument means that all directories under foo/ will be imported and their names will be interpreted as the value for foo within the state point.
Tip
Use copytree=os.replace or copytree=shutil.move to move data spaces on import instead of copying them.
Warning: Imports can fail due to conflicts. Moving data instead of copying may therefore lead to inconsistent states, and users are advised to apply caution.
See also
export_to() – Export the project data space.
- signac import:
See signac import for the command line equivalent.
- Parameters:
origin (str, optional) – The path to the data space origin, which is to be imported. This may be a path to a directory, a zip file, or a tarball archive (Default value = None).
schema (callable, optional) – An optional schema function, which is either a string or a function that accepts a path as its first and only argument and returns the corresponding state point as dict. (Default value = None).
sync (bool or dict, optional) – If True, the project will be synchronized with the imported data space. If a dict of keyword arguments is provided, the arguments will be used for sync() (Default value = None).
copytree (callable, optional) – The function used for copying directory tree structures. Uses shutil.copytree() if None (Default value = None). The function requires that the target is a directory.
- Returns:
A dict that maps the source directory paths to the target directory paths.
- Return type:
dict
- classmethod init_project(path=None)¶
Initialize a project in the provided directory.
It is safe to call this function multiple times with the same arguments. However, a RuntimeError is raised if an existing project configuration would conflict with the provided initialization parameters.
See signac init for the command line equivalent.
- isfile(filename)¶
Check if a filename exists in the project path.
- min_len_unique_id()¶
Determine the minimum length required for a job id to be unique.
This method’s runtime scales with the number of jobs in the workspace.
- Returns:
Minimum string length of a unique job identifier.
- Return type:
int
- open_job(statepoint=None, id=None)¶
Get a job handle associated with a state point.
This method returns the job instance associated with the given state point or job id. Opening a job by a valid state point never fails. Opening a job by id requires a lookup of the state point from the job id, which may fail if the job was not previously initialized.
- Parameters:
statepoint (dict, optional) – The job's unique set of state point parameters (Default value = None).
id (str, optional) – The job id (Default value = None).
- Returns:
The job instance.
- Return type:
Job
- Raises:
KeyError – If the attempt to open the job by id fails.
LookupError – If the attempt to open the job by an abbreviated id returns more than one match.
- repair(job_ids=None)¶
Attempt to repair the workspace after it got corrupted.
This method will attempt to repair lost or corrupted job state point files using a state point cache.
- Parameters:
job_ids (iterable[str], optional) – An iterable of job ids that should get repaired. Defaults to all jobs.
- Raises:
signac.errors.JobsCorruptedError – When one or more corrupted jobs could not be repaired.
- property stores¶
Get HDF5 stores associated with this project.
Use this property to access an HDF5 file within the project directory using the H5Store dict-like interface.
This is an example for accessing an HDF5 file called 'my_data.h5' within the project directory:
project.stores['my_data']['array'] = np.random.random((32, 4))
This is equivalent to:
H5Store(project.fn('my_data.h5'))['array'] = np.random.random((32, 4))
Both project.stores and the H5Store itself support attribute access. The above example could therefore also be expressed as:
project.stores.my_data.array = np.random.random((32, 4))
- Returns:
The HDF5 store manager for this project.
- Return type:
H5StoreManager
- sync(other, strategy=None, exclude=None, doc_sync=None, selection=None, **kwargs)¶
Synchronize this project with the other project.
Try to clone all jobs from the other project to this project. If a job is already part of this project, try to synchronize the job using the optionally specified strategies.
See signac sync for the command line equivalent.
- Parameters:
other (Project) – The other project to synchronize this project with.
strategy (callable, optional) – A synchronization strategy for file conflicts. If no strategy is provided, a SyncConflict exception will be raised upon conflict (Default value = None).
exclude (str, optional) – A filename exclude pattern. All files matching this pattern will be excluded from synchronization (Default value = None).
doc_sync (attribute or callable from DocSync, optional) – A synchronization strategy for document keys. If this argument is None, by default no keys will be synchronized upon conflict (Default value = None).
selection (sequence of Job or job ids (str), optional) – Only synchronize the given selection of jobs (Default value = None).
**kwargs – This method also accepts the same keyword arguments as the sync_projects() function.
- Raises:
DocumentSyncConflict – If there are conflicting keys within the project or job documents that cannot be resolved with the given strategy or if there is no strategy provided.
FileSyncConflict – If there are differing files that cannot be resolved with the given strategy or if no strategy is provided.
SchemaSyncConflict – If the check_schema argument is True and the detected state point schemas of this and the other project differ.
- temporary_project(dir=None)¶
Context manager for the initialization of a temporary project.
The temporary project is by default created within the parent project's workspace to ensure that they share the same file system. This is an example of how this method can be used for the import and synchronization of external data spaces.
with project.temporary_project() as tmp_project:
    tmp_project.import_from('/data')
    project.sync(tmp_project)
- to_dataframe(*args, **kwargs)¶
Export the project metadata to a pandas DataFrame.
The arguments to this function are forwarded to JobsCursor.to_dataframe().
- Parameters:
*args – Forwarded to JobsCursor.to_dataframe().
**kwargs – Forwarded to JobsCursor.to_dataframe().
- Return type:
DataFrame
- update_cache()¶
Update the persistent state point cache.
This function updates a persistent state point cache, which is stored in the project directory. Most data space operations, including iteration and filtering or selection are expected to be significantly faster after calling this function, especially for large data spaces.
The JobsCursor class¶
Attributes

- export_to() – Export all jobs to a target location, such as a directory or a (zipped) archive file.
- groupby() – Group jobs according to one or more state point or document parameters.
- to_dataframe() – Convert the selection of jobs to a pandas DataFrame.
- class signac.project.JobsCursor(project, filter=None)¶
Bases:
object
An iterator over a search query result.
Application developers should not directly instantiate this class, but use find_jobs() instead.
Enables simple iteration and grouping operations.
Warning
JobsCursor caches the jobs that match the filter. Call Project.find_jobs again to update the search after making changes to jobs or the workspace that would change the result of the search.
- Parameters:
project (Project) – Project handle.
filter (Mapping) – A mapping of key-value pairs used for the query (Default value = None).
- export_to(target, path=None, copytree=None)¶
Export all jobs to a target location, such as a directory or a (zipped) archive file.
See also
export_to() – For full details on how to use this function.
- Parameters:
target (str) – A path to a directory or archive file to export to.
path (str or callable) – The path (function) used to structure the exported data space (Default value = None).
copytree (callable, optional) – The function used for copying directory tree structures. Uses shutil.copytree() if None (Default value = None). The function requires that the target is a directory.
- Returns:
A dictionary that maps the source directory paths to the target directory paths.
- Return type:
dict
- groupby(key=None, default=None)¶
Group jobs according to one or more state point or document parameters.
Prepend the key with ‘sp.’ or ‘doc.’ to specify the query namespace. If no prefix is specified, group by state point key.
This method can be called on any JobsCursor, such as the one returned by find_jobs() or by iterating over a project.
Examples
# Group jobs by state point parameter 'a'.
for key, group in project.groupby('a'):
    print(key, list(group))

# Group jobs by document value 'a'.
for key, group in project.groupby('doc.a'):
    print(key, list(group))

# Group jobs by job.sp['a'] and job.document['b'].
for key, group in project.groupby(('a', 'doc.b')):
    print(key, list(group))

# Find jobs where job.sp['a'] is 1 and group them
# by job.sp['b'] and job.sp['c'].
for key, group in project.find_jobs({'a': 1}).groupby(('b', 'c')):
    print(key, list(group))

# Group by job.sp['d'] and job.document['count'] using a lambda.
for key, group in project.groupby(
    lambda job: (job.sp['d'], job.document['count'])
):
    print(key, list(group))
If key is None, jobs are grouped by id, placing one job into each group.
If default is None, only jobs with the key defined will be grouped. Jobs without the key will be filtered out and not included in any group.
- Parameters:
key (str, iterable, or callable, optional) – The grouping key(s) passed as a string, iterable of strings, or a callable that will be passed one argument, the job (Default value = None).
default (object, optional) – A default value to be used when a given key is not present. The value must be sortable and is only used if not None (Default value = None).
- Yields:
key – Key identifying this group.
group (iterable of Jobs) – Iterable of Job instances matching this group.
- to_dataframe(sp_prefix='sp.', doc_prefix='doc.', usecols=None, flatten=False)¶
Convert the selection of jobs to a pandas DataFrame.
This function exports the job metadata to a pandas.DataFrame. All state point and document keys are prefixed by default to be able to distinguish them.
- Parameters:
sp_prefix (str, optional) – Prefix state point keys with the given string. Defaults to “sp.”.
doc_prefix (str, optional) – Prefix document keys with the given string. Defaults to “doc.”.
usecols (list-like or callable, optional) – Used to select a subset of columns. If list-like, it must contain strings corresponding to the column names that should be included, for example ['sp.a', 'doc.notes']. If callable, the column will be included if the function called on the column name returns True, for example lambda x: 'sp.' in x. Defaults to None, which uses all columns from the state point and document. Note that this filter is applied after the doc and sp prefixes are added to the column names.
flatten (bool, optional) – Whether nested state point or document keys should be flattened. If True, {'a': {'b': 'c'}} becomes a column named a.b with value c. If False, it becomes a column named a with value {'b': 'c'}. Defaults to False.
- Returns:
A pandas DataFrame with all job metadata.
- Return type:
DataFrame
The Job class¶
Attributes

- cached_statepoint – Get a copy of the job's state point as a read-only mapping.
- clear() – Remove all job data, but not the job itself.
- close() – Close the job and switch to the previous working directory.
- data – Get data associated with this job.
- doc – Alias for document.
- document – Get document associated with this job.
- fn() – Prepend a filename with the job path.
- id – Get the unique identifier for the job's state point.
- init() – Initialize the job's workspace directory.
- isfile() – Check if a filename exists in the job directory.
- move() – Move this job to project.
- open() – Enter the job's workspace directory.
- path – The path to the job directory.
- project – Get the project that contains this job.
- remove() – Remove the job's workspace including the job document.
- reset() – Remove all job data, but not the job itself.
- sp – Alias for statepoint.
- statepoint – Get or set the job's state point.
- stores – Get HDF5 stores associated with this job.
- sync() – Perform a one-way synchronization of this job with the other job.
- update_statepoint() – Change the state point of this job while preserving job data.
- class signac.job.Job(project, statepoint=None, id_=None, directory_known=False)¶
Bases:
object
The job instance is a handle to the data of a unique state point.
Application developers should not directly instantiate this class, but use open_job() instead.
Jobs can be opened by statepoint or id_. If both values are provided, it is the user's responsibility to ensure that the values correspond. Set directory_known to True when the job directory is known to exist; this skips some expensive isdir checks.
- Parameters:
- FN_DOCUMENT = 'signac_job_document.json'¶
The job’s document filename.
- FN_STATE_POINT = 'signac_statepoint.json'¶
The job’s state point filename.
The job state point is a human-readable file containing the job’s state point that is stored in each job’s workspace directory.
- KEY_DATA = 'signac_data'¶
The job’s datastore key.
- property cached_statepoint¶
Get a copy of the job’s state point as a read-only mapping.
cached_statepoint uses the state point cache to provide fast access to the job's state point for reading.
Note
Create and update the state point cache by calling project.update_cache or running signac update-cache on the command line.
See also
Use statepoint to modify the job's state point.
- Returns:
Returns the job’s state point.
- Return type:
Mapping
- clear()¶
Remove all job data, but not the job itself.
This function will do nothing if the job was not previously initialized.
See signac rm -c for the command line equivalent.
- close()¶
Close the job and switch to the previous working directory.
- property data¶
Get data associated with this job.
This property should be used for large array-like data, which can’t be stored efficiently in the job document. For examples and usage, see Job Data Storage.
Equivalent to:
return job.stores['signac_data']
- Returns:
An HDF5-backed datastore.
- Return type:
H5Store
- property doc¶
Alias for document.
Warning
Even deep copies of doc will modify the same file, so changes will still effectively be persisted between deep copies. If you need a deep copy that will not modify the underlying persistent JSON file, use the call operator to get an equivalent plain dictionary: job.doc().
See signac document for the command line equivalent.
- Returns:
The job document handle. Supports attribute-based access to dict keys.
- Return type:
MutableMapping
- property document¶
Get document associated with this job.
Warning
Even deep copies of document will modify the same file, so changes will still effectively be persisted between deep copies. If you need a deep copy that will not modify the underlying persistent JSON file, use the call operator to get an equivalent plain dictionary: job.document(). For more information, see JSONDict.
See signac document for the command line equivalent.
- Returns:
The job document handle. Supports attribute-based access to dict keys.
- Return type:
MutableMapping
- fn(filename)¶
Prepend a filename with the job path.
- property id¶
Get the unique identifier for the job’s state point.
- Returns:
The job id.
- Return type:
str
- init(force=False, validate_statepoint=True)¶
Initialize the job’s workspace directory.
This function will do nothing if the directory and the job state point already exist and the state point is valid.
Returns the calling job.
See signac job -c for the command line equivalent.
- Parameters:
force (bool, optional) – Overwrite any existing state point files, e.g., to repair them if they got corrupted (Default value = False).
validate_statepoint (bool, optional) – When True (the default), load the job state point and ensure that it matches the id. When False, exit early when the job directory exists.
- Returns:
The job handle.
- Return type:
Job
- Raises:
OSError – If the workspace directory cannot be created or any other I/O error occurs when attempting to save the state point file.
JobsCorruptedError – If the job state point on disk is corrupted.
- isfile(filename)¶
Check if a filename exists in the job directory.
- move(project)¶
Move this job to project.
This function will attempt to move this instance of job from its original project to a different project.
See signac move for the command line equivalent.
- Parameters:
project (Project) – The project to move this job to.
- open()¶
Enter the job’s workspace directory.
You can use the Job class as a context manager:
with project.open_job(my_statepoint) as job:
    # Manipulate your job data
    pass
Opening the context will switch into the job's workspace; leaving it will switch back to the previous working directory.
- property path¶
The path to the job directory.
See signac job -w for the command line equivalent.
- Type:
str
- property project¶
Get the project that contains this job.
- Returns:
Returns the project containing this job.
- Return type:
Project
- remove()¶
Remove the job’s workspace including the job document.
This function will do nothing if the workspace directory does not exist.
See signac rm for the command line equivalent.
- reset()¶
Remove all job data, but not the job itself.
This function will initialize the job if it was not previously initialized.
- property sp¶
Alias for statepoint.
- property statepoint¶
Get or set the job’s state point.
Setting the state point to a different value will change the job id.
For more information, see Modifying the State Point.
Tip
Use cached_statepoint for fast read-only access to the state point.
Warning
The state point object behaves like a dictionary in most cases, but because it persists changes to the filesystem, making a copy requires explicitly converting it to a dict. If you need a modifiable copy that will not modify the underlying JSON file, you can access a dict copy of the state point by calling it, e.g. sp_dict = job.statepoint() instead of sp = job.statepoint. For more information, see JSONAttrDict.
.See signac statepoint for the command line equivalent.
Danger
Use this function with caution! Resetting a job’s state point may sometimes be necessary, but can possibly lead to incoherent data spaces.
- Returns:
Returns the job’s state point. Supports attribute-based access to dict keys.
- Return type:
MutableMapping
- property stores¶
Get HDF5 stores associated with this job.
Use this property to access an HDF5 file within the job’s workspace directory using the
H5Store
dict-like interface.This is an example for accessing an HDF5 file called ‘my_data.h5’ within the job’s workspace:
job.stores['my_data']['array'] = np.random((32, 4))
This is equivalent to:
H5Store(job.fn('my_data.h5'))['array'] = np.random((32, 4))
Both the
stores
and the
H5Store
itself support attribute access. The above example could therefore also be expressed as:
job.stores.my_data.array = np.random.rand(32, 4)
- Returns:
The HDF5-Store manager for this job.
- Return type:
- sync(other, strategy=None, exclude=None, doc_sync=None, **kwargs)¶
Perform a one-way synchronization of this job with the other job.
By default, this method will synchronize all files and document data from the other job to this job until a synchronization conflict occurs. There are two different kinds of synchronization conflicts:
The two jobs have files with the same name, but different content.
The two jobs have documents that share keys, but those keys are associated with different values.
A file conflict can be resolved by providing a ‘FileSync’ strategy or by excluding files from the synchronization. An unresolvable conflict raises a
FileSyncConflict
exception.
A document synchronization conflict can be resolved by providing a doc_sync function that takes the source and the destination document as the first and second arguments.
- Parameters:
other (Job) – The other job to synchronize from.
strategy (callable, optional) – A synchronization strategy for file conflicts. If no strategy is provided, a
SyncConflict
exception will be raised upon conflict (Default value = None).
exclude (str, optional) – A filename exclude pattern. All files matching this pattern will be excluded from synchronization (Default value = None).
doc_sync (attribute or callable from
DocSync
, optional) – A synchronization strategy for document keys. If this argument is None, by default no keys will be synchronized upon conflict (Default value = None).
dry_run (bool, optional) – If True, do not actually perform the synchronization.
**kwargs – Extra keyword arguments will be forwarded to the
sync_jobs()
function which actually executes the synchronization operation.
- Raises:
FileSyncConflict – In case that a file synchronization results in a conflict.
- update_statepoint(update, overwrite=False)¶
Change the state point of this job while preserving job data.
By default, this method will not change existing parameters of the state point of the job.
This method will change the job id if the state point has been altered.
For more information, see Modifying the State Point.
Warning
While appending to a job’s state point is generally safe, modifying existing parameters may lead to data inconsistency. Use the
overwrite
argument with caution!
- Parameters:
update (dict) – A mapping used for the state point update.
overwrite (bool, optional) – If False, an error will be raised if the update modifies the values of existing keys in the state point. If True, any existing keys will be overwritten in the same way as
dict.update()
. Use with caution! (Default value = False).
- Raises:
KeyError – If the update contains keys which are already part of the job’s state point and
overwrite
is False.
DestinationExistsError – If a job associated with the new state point is already initialized.
OSError – If the move failed due to an unknown system related error.
The JSONDict¶
This class implements the interface for the job’s statepoint
and document
attributes, but can also be used on its own.
- signac.JSONDict¶
alias of
BufferedJSONAttrDict
The H5Store¶
This class implements the interface to the job’s data
attribute, but can also be used on its own.
- class signac.H5Store(filename, **kwargs)¶
An HDF5-backed container for storing array-like and dictionary-like data.
The H5Store is a
MutableMapping
and therefore behaves similar to adict
, but all data is stored persistently in the associated HDF5 file on disk.
Supported types include:
built-in types (int, float, str, bool, NoneType, array)
numpy arrays
pandas data frames (requires pandas and pytables)
mappings with values that are supported types
Values can be accessed as attributes (
h5s.foo
) or via key index (
h5s['foo']
).
Examples
>>> from signac import H5Store
>>> with H5Store('file.h5') as h5s:
...     h5s['foo'] = 'bar'
...     assert 'foo' in h5s
...     assert h5s.foo == 'bar'
...     assert h5s['foo'] == 'bar'
The H5Store can be used as a context manager to ensure that the underlying file is opened; however, most built-in types (excluding arrays) can be read and stored without explicitly opening the file. To access arrays (reading or writing), the file must always be opened!
To open a file in read-only mode, use the
open()
method with
mode='r'
:
>>> with H5Store('file.h5').open(mode='r') as h5s:
...     pass
- Parameters:
- clear()¶
Remove all data from this store.
Danger
All data will be removed; this action cannot be reversed!
- close()¶
Close the underlying HDF5 file.
- property file¶
Access the underlying instance of
h5py.File
.This property exposes the underlying
h5py.File
object, enabling use of functions such as
h5py.Group.create_dataset()
or
h5py.Group.require_dataset()
.
Note
The store must be open to access this property!
- Returns:
The
h5py.File
object that this store is operating on.
- Return type:
- Raises:
H5StoreClosedError – If the store is closed.
- property filename¶
Return the H5Store filename.
- flush()¶
Flush the underlying HDF5 file.
- get(k[, d]) → D[k] if k in D, else d. d defaults to None. ¶
- items() → a set-like object providing a view on D's items ¶
- keys() → a set-like object providing a view on D's keys ¶
- property mode¶
Return the default opening mode of this H5Store.
- open(mode=None)¶
Open the underlying HDF5 file.
- pop(k[, d]) → v, remove specified key and return the corresponding value. ¶
If key is not found, d is returned if given, otherwise KeyError is raised.
- popitem() → (k, v), remove and return some (key, value) pair ¶
as a 2-tuple; but raise KeyError if D is empty.
- setdefault(key, value)¶
Set a value for a key if that key is not already set.
- update([E, ]**F) → None. Update D from mapping/iterable E and F. ¶
If E present and has a .keys() method, does: for k in E: D[k] = E[k]
If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v
In either case, this is followed by: for k, v in F.items(): D[k] = v
- values() → an object providing a view on D's values ¶
The H5StoreManager¶
This class implements the interface to the job’s stores
attribute, but can also be used on its own.
- class signac.H5StoreManager(prefix)¶
Bases:
_DictManager
Helper class to manage multiple instances of
H5Store
within a directory.- Parameters:
prefix (str) – The directory prefix shared by all files managed by this class.
Examples
Assuming that the
stores/
directory exists:
>>> stores = H5StoreManager('stores/')
>>> stores.data
<H5Store(filename=stores/data.h5)>
>>> stores.data.foo = True
>>> dict(stores.data)
{'foo': True}
- keys()¶
Return an iterable of keys.
- property prefix¶
Return the prefix.
Top-level functions¶
The signac framework aids in the management of large and heterogeneous data spaces.
It provides a simple and robust data model to create a well-defined, indexable storage layout for data and metadata. This makes it easier to operate on large data spaces, streamlines post-processing and analysis, and makes data collectively accessible.
- signac.TemporaryProject(cls=None, **kwargs)¶
Context manager for the generation of a temporary project.
This is a factory function that creates a Project within a temporary directory and must be used as a context manager, for example:
with TemporaryProject() as tmp_project:
    tmp_project.import_from('/data')
- signac.buffered(buffer_capacity=None)¶
Enter context to buffer all operations for this backend.
- Parameters:
buffer_capacity (int) – The capacity of the buffer to use within this context (resets after the context is exited).
- signac.diff_jobs(*jobs)¶
Find differences among a list of jobs’ state points.
The resulting diff is a dictionary where the keys are job ids and the values are each job’s state point minus the intersection of all provided jobs’ state points. The comparison is performed over the combined set of keys and values.
See signac diff for the command line equivalent.
- Parameters:
*jobs (sequence[
Job
]) – Sequence of jobs to diff.
- Returns:
A dictionary where the keys are job ids and values are the unique parts of that job’s state point.
- Return type:
Examples
>>> import signac
>>> project = signac.init_project()
>>> job1 = project.open_job({'constant': 42, 'diff1': 0, 'diff2': 1}).init()
>>> job2 = project.open_job({'constant': 42, 'diff1': 1, 'diff2': 1}).init()
>>> job3 = project.open_job({'constant': 42, 'diff1': 2, 'diff2': 2}).init()
>>> print(job1)
c4af2b26f1fd256d70799ad3ce3bdad0
>>> print(job2)
b96b21fada698f8934d58359c72755c0
>>> print(job3)
e4289419d2b0e57e4852d44a09f167c0
>>> signac.diff_jobs(job1, job2, job3)
{'c4af2b26f1fd256d70799ad3ce3bdad0': {'diff2': 1, 'diff1': 0}, 'b96b21fada698f8934d58359c72755c0': {'diff2': 1, 'diff1': 1}, 'e4289419d2b0e57e4852d44a09f167c0': {'diff2': 2, 'diff1': 2}}
>>> signac.diff_jobs(*project)
{'c4af2b26f1fd256d70799ad3ce3bdad0': {'diff2': 1, 'diff1': 0}, 'b96b21fada698f8934d58359c72755c0': {'diff2': 1, 'diff1': 1}, 'e4289419d2b0e57e4852d44a09f167c0': {'diff2': 2, 'diff1': 2}}
- signac.get_buffer_capacity()¶
Get the current buffer capacity.
- Returns:
The amount of data that can be stored before a flush is triggered in the appropriate units for a particular buffering implementation.
- Return type:
- signac.get_current_buffer_size()¶
Get the total amount of data currently stored in the buffer.
- Returns:
The size of all data contained in the buffer in the appropriate units for a particular buffering implementation.
- Return type:
- signac.get_job(path=None)¶
Find a Job in or above the provided path (or the current working directory).
- Parameters:
path (str, optional) – The starting point to search for a job. If None, the current working directory is used (Default value = None).
- Returns:
The first job found in or above the provided path.
- Return type:
- Raises:
LookupError – If a job cannot be found.
Examples
When the current directory is a job directory:
>>> signac.get_job()
signac.job.Job(project=..., statepoint={...})
- signac.get_project(path=None, search=True, **kwargs)¶
Find a project configuration and return the associated project.
- Parameters:
path (str, optional) – The starting point to search for a project. If None, the current working directory is used (Default value = None).
search (bool, optional) – If True, search for project configurations inside and above the specified path, otherwise only return a project in the specified path (Default value = True).
**kwargs – Optional keyword arguments that are forwarded to
get_project()
.
- Returns:
An instance of
Project
.- Return type:
- Raises:
LookupError – If no project configuration can be found.
- signac.init_project(path=None)¶
Initialize a project.
It is safe to call this function multiple times with the same arguments. However, a RuntimeError is raised if an existing project configuration would conflict with the provided initialization parameters.
- Parameters:
path (str, optional) – The directory for the project. Defaults to the current working directory.
- Returns:
The initialized project instance.
- Return type:
- Raises:
RuntimeError – If the project path already contains a conflicting project configuration.
- signac.is_buffered()¶
Check if this backend is currently buffered.
Submodules¶
signac.sync module¶
Synchronization of jobs and projects.
Jobs may be synchronized by copying all data from the source job to the destination job. This means all files are copied and the documents are synchronized. Conflicts, i.e. cases where both jobs contain conflicting data, may be resolved with a user-defined strategy.
The synchronization of projects is in essence the synchronization of all jobs in the destination project with those in the source project, together with the synchronization of the project document. If a specific job does not yet exist at the destination, it is simply cloned; otherwise it is synchronized.
A sync strategy is a function (or functor) that takes the source job,
the destination job, and the name of the conflicting file
as arguments and returns a Boolean decision on whether to overwrite the
file. There are some default strategies defined within this module as
part of the FileSync
class. These are the default strategies:
always – Always overwrite on conflict.
never – Never overwrite on conflict.
update – Overwrite when the modification time of the source file is newer.
Ask – Ask the user interactively about each conflicting filename.
For example, to synchronize two projects resolving conflicts by modification time, use:
dest_project.sync(source_project, strategy=sync.FileSync.update)
Unlike files, which are always either overwritten as a whole or not, documents
can be synchronized at a finer granularity with a sync function. Such a function (or
functor) takes the source and the destination document as arguments and performs
the synchronization. Users are encouraged to implement their own sync functions,
but there are a few default functions implemented as part of the DocSync
class:
NO_SYNC – Do not perform any synchronization.
COPY – Apply the same strategy used to resolve file conflicts.
update – Equivalent to dst.update(src).
ByKey – Synchronize the source document key by key, more information below.
This is how we could synchronize two jobs, where the documents are synchronized with a simple update function:
dst_job.sync(src_job, doc_sync=sync.DocSync.update)
The DocSync.ByKey
functor attempts to synchronize the destination document
with the source document without overwriting any data. That means this function
behaves similarly to update()
for a non-intersecting set of keys,
but will additionally preserve nested mappings without overwriting values. Any
key conflict, that is, a key that is present in both documents but has
differing data, will raise a DocumentSyncConflict
exception.
The user may explicitly decide to overwrite certain keys by providing a “key-strategy”,
which is a function that takes the conflicting key as argument and returns a
Boolean decision on whether to overwrite that specific key. For example, to sync
two jobs where conflicting keys should only be overwritten if they contain the
term ‘foo’, we could execute:
dst_job.sync(src_job, doc_sync=sync.DocSync.ByKey(lambda key: 'foo' in key))
This means that all documents are synchronized ‘key-by-key’ and only conflicting keys that
contain the word “foo” will be overwritten; any other conflict will raise a
DocumentSyncConflict
exception. A key-strategy may also be
a regular expression, so the synchronization above could also be achieved with:
dst_job.sync(src_job, doc_sync=sync.DocSync.ByKey('foo'))
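The key-by-key semantics can be sketched in plain Python. The following is a simplified, flat model of the ByKey behavior (the real implementation also recurses into nested mappings); by_key_merge and its arguments are illustrative names, not part of signac:

```python
def by_key_merge(src, dst, key_strategy=None):
    """Merge src into dst key by key; raise on unresolved conflicts."""
    for key, value in src.items():
        if key in dst and dst[key] != value:
            # Conflicting key: overwrite only if the key-strategy approves.
            if key_strategy is not None and key_strategy(key):
                dst[key] = value
            else:
                raise ValueError(f"document sync conflict on key: {key!r}")
        else:
            dst[key] = value
    return dst


# Non-conflicting keys merge freely:
assert by_key_merge({"b": 2}, {"a": 1}) == {"a": 1, "b": 2}

# Conflicting keys are overwritten only when the key-strategy says so:
assert by_key_merge({"foo_x": 5}, {"foo_x": 1}, lambda k: "foo" in k) == {"foo_x": 5}
```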
- class signac.sync.DocSync¶
Bases:
object
Collection of document synchronization functions.
- COPY = 'copy'¶
Copy (and potentially overwrite) documents like any other file.
- NO_SYNC = False¶
Do not synchronize documents.
- static update(src, dst)¶
Perform a simple update.
- class signac.sync.FileSync¶
Bases:
object
Collection of file synchronization strategies.
- class Ask¶
Bases:
object
Resolve sync conflicts by asking whether a file should be overwritten interactively.
- static always(src, dst, fn)¶
Resolve sync conflicts by always overwriting.
- classmethod keys()¶
Return keys.
- static never(src, dst, fn)¶
Resolve sync conflicts by never overwriting.
- static update(src, dst, fn)¶
Resolve sync conflicts based on newest modified timestamp.
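Because a strategy is just a callable with the signature strategy(src, dst, filepath) returning a Boolean, custom strategies are easy to write. The name overwrite_logs_only below is illustrative, not part of signac:

```python
import re


def overwrite_logs_only(src, dst, filepath):
    """Custom file-sync strategy sketch: overwrite only '*.log' files
    on conflict; any other conflicting file is left unresolved."""
    return bool(re.search(r"\.log$", filepath))


# The strategy is plain Python and can be exercised directly:
assert overwrite_logs_only(None, None, "run.log") is True
assert overwrite_logs_only(None, None, "data.h5") is False
```

Such a strategy would then be passed as, e.g., dst_job.sync(src_job, strategy=overwrite_logs_only).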
- signac.sync.sync_jobs(src, dst, strategy=None, exclude=None, doc_sync=None, recursive=False, follow_symlinks=True, preserve_permissions=False, preserve_times=False, preserve_owner=False, preserve_group=False, deep=False, dry_run=False)¶
Synchronize the dst job with the src job.
By default, this method will synchronize all files and document data of the dst job with the src job until a synchronization conflict occurs. There are two different kinds of synchronization conflicts:
The two jobs have files with the same name, but different content.
The two jobs have documents that share keys, but those keys are mapped to different values.
A file conflict can be resolved by providing a ‘FileSync’ strategy or by excluding files from the synchronization. An unresolvable conflict raises a
FileSyncConflict
exception.
A document synchronization conflict can be resolved by providing a doc_sync function that takes the source and the destination document as the first and second arguments.
- Parameters:
src (
Job
) – The src job; data will be copied from this job’s workspace.
dst (
Job
) – The dst job; data will be copied to this job’s workspace.
strategy (callable, optional) – A synchronization strategy for file conflicts. The strategy should be a callable with signature
strategy(src, dst, filepath)
where
src
and
dst
are the source and destination instances of
Job
and
filepath
is the filepath relative to the job’s workspace path. If no strategy is provided, a
errors.SyncConflict
exception will be raised upon conflict. (Default value = None)
exclude (str, optional) – A filename exclusion pattern. All files matching this pattern will be excluded from the synchronization process. (Default value = None)
doc_sync (attribute or callable from
DocSync
, optional) – A synchronization strategy for document keys. The default is to use a safe key-by-key strategy that will not overwrite any values on conflict, but instead raises a
DocumentSyncConflict
exception.
recursive (bool, optional) – Recursively synchronize sub-directories encountered within the job workspace directories. (Default value = False)
follow_symlinks (bool, optional) – Follow and copy the target of symbolic links. (Default value = True)
preserve_permissions (bool, optional) – Preserve file permissions (Default value = False)
preserve_times (bool, optional) – Preserve file modification times (Default value = False)
preserve_owner (bool, optional) – Preserve file owner (Default value = False)
preserve_group (bool, optional) – Preserve file group ownership (Default value = False)
dry_run (bool, optional) – If True, do not actually perform any synchronization operations. (Default value = False)
deep (bool, optional) – (Default value = False)
- signac.sync.sync_projects(source, destination, strategy=None, exclude=None, doc_sync=None, selection=None, check_schema=True, recursive=False, follow_symlinks=True, preserve_permissions=False, preserve_times=False, preserve_owner=False, preserve_group=False, deep=False, dry_run=False, parallel=False, collect_stats=False)¶
Synchronize the destination project with the source project.
Try to clone all jobs from the source to the destination. If the destination job already exists, try to synchronize the job using the optionally specified strategy.
- Parameters:
source (Project) – The project representing the source for synchronization.
destination (Project) – The project that is modified during synchronization.
strategy (callable, optional) – A synchronization strategy for file conflicts. The strategy should be a callable with signature
strategy(src, dst, filepath)
where
src
and
dst
are the source and destination instances of
Project
and
filepath
is the filepath relative to the project path. If no strategy is provided, a
errors.SyncConflict
exception will be raised upon conflict. (Default value = None)
exclude (str, optional) – A filename exclusion pattern. All files matching this pattern will be excluded from the synchronization process. (Default value = None)
doc_sync (attribute or callable from
DocSync
) – A synchronization strategy for document keys. The default is to use a safe key-by-key strategy that will not overwrite any values on conflict, but instead raises a
DocumentSyncConflict
exception.
selection (sequence of
Job
or job ids (str), optional) – Only synchronize the given selection of jobs. (Default value = None)
check_schema (bool, optional) – If True, only synchronize if this and the other project have a matching state point schema. See also:
detect_schema()
. (Default value = True)
recursive (bool, optional) – Recursively synchronize sub-directories encountered within the job workspace directories. (Default value = False)
follow_symlinks (bool, optional) – Follow and copy the target of symbolic links. (Default value = True)
preserve_permissions (bool, optional) – Preserve file permissions (Default value = False)
preserve_times (bool, optional) – Preserve file modification times (Default value = False)
preserve_owner (bool, optional) – Preserve file owner (Default value = False)
preserve_group (bool, optional) – Preserve file group ownership (Default value = False)
dry_run (bool, optional) – If True, do not actually perform the synchronization operation, just log what would happen theoretically. Useful to test synchronization strategies without the risk of data loss. (Default value = False)
deep (bool, optional) – (Default value = False)
parallel (bool, optional) – (Default value = False)
collect_stats (bool, optional) – (Default value = False)
- Returns:
Returns stats if
collect_stats
isTrue
, elseNone
.- Return type:
NoneType or
FileTransferStats
- Raises:
DocumentSyncConflict – If there are conflicting keys within the project or job documents that cannot be resolved with the given strategy or if there is no strategy provided.
FileSyncConflict – If there are differing files that cannot be resolved with the given strategy or if no strategy is provided.
SchemaSyncConflict – In case that the check_schema argument is True and the detected state point schema of this and the other project differ.
signac.errors module¶
Errors raised by signac.
- exception signac.errors.ConfigError¶
Bases:
Error
,RuntimeError
Error with parsing or reading a configuration file.
- exception signac.errors.DestinationExistsError(destination)¶
Bases:
Error
,RuntimeError
The destination for a move or copy operation already exists.
- Parameters:
destination (str) – The destination causing the error.
- exception signac.errors.DocumentSyncConflict(keys)¶
Bases:
SyncConflict
Raised when a synchronization operation fails due to a document conflict.
- keys¶
The keys that caused the conflict.
- exception signac.errors.FileSyncConflict(filename)¶
Bases:
SyncConflict
Raised when a synchronization operation fails due to a file conflict.
- filename¶
The filename of the file that caused the conflict.
- exception signac.errors.H5StoreAlreadyOpenError¶
Indicates that the underlying HDF5 file is already open.
- exception signac.errors.H5StoreClosedError¶
Bases:
Error
,RuntimeError
Raised when trying to access a closed HDF5 file.
- exception signac.errors.IncompatibleSchemaVersion¶
Bases:
Error
The project’s schema version is incompatible with this version of signac.
- exception signac.errors.InvalidKeyError¶
Bases:
ValueError
Raised when a user uses a non-conforming key.
- exception signac.errors.JobsCorruptedError(job_ids)¶
Bases:
Error
,RuntimeError
The state point file of one or more jobs cannot be opened or is corrupted.
- Parameters:
job_ids – The job id(s) of the corrupted job(s).
- exception signac.errors.KeyTypeError¶
Bases:
TypeError
Raised when a user uses a key of invalid type.
- exception signac.errors.SchemaSyncConflict(schema_src, schema_dst)¶
Bases:
SyncConflict
Raised when a synchronization operation fails due to schema differences.
- exception signac.errors.StatepointParsingError¶
Bases:
Error
,RuntimeError
Indicates an error that occurred while trying to identify a state point.
- exception signac.errors.SyncConflict¶
Bases:
Error
,RuntimeError
Raised when a synchronization operation fails.