Changelog

The signac package follows semantic versioning.

Version 1.0

Highlights

  • Native integration of HDF5 files via the H5Store and H5StoreManager classes, which are exposed through the job.data and project.data properties (H5Store) and the job.stores and project.stores properties (H5StoreManager).
  • The new signac.get_job() function makes it easier to obtain a Job instance, either by calling it from within a job’s workspace directory or by passing the path to that directory directly. This is especially useful for interactive work or when accessing jobs outside of the current project.
  • Simplified export of project and job data to pandas DataFrames via the to_dataframe() function.
  • Projects and job search results have a rich HTML representation in Jupyter Notebooks.
  • Support for compressed Collection files.

[1.1.0] – 2019-05-19

Added

  • Add command line options --sp and --doc for signac find that allow users to display key-value pairs of the state point and document in combination with the job id (#97, #146).
  • Improve the representation (return value of repr()) of instances of H5Group and SyncedAttrDict.

Fixed

  • Searches for whole numbers now match all numerically equal values, regardless of whether they are stored as integers or as decimals, e.g., a search for 1 also matches 1.0 (#169).
  • Passing an instance of dict to H5Store.setdefault() now returns an instance of H5Group instead of a dict (#180).
  • Fix error with storing numpy arrays and scalars in a synced dictionary (e.g., job.statepoint, job.document) (#184).
  • Fix ResourceWarning caused by unclosed instances of Collection (#186).
  • Fix issue with using the get_project() function with a relative path and search=False (#191).
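The idea behind the #169 fix can be sketched with the standard library alone. This minimal sketch assumes an index keyed by the JSON text of each value, where 1 and 1.0 would otherwise end up under different keys ("1" vs. "1.0"); normalize_value, index_key, build_index, and find are hypothetical helpers, not signac's internal implementation.

```python
import json
from collections import defaultdict


def normalize_value(value):
    """Represent floats with an integral value (e.g. 1.0) as the equivalent int."""
    if isinstance(value, float) and value.is_integer():
        return int(value)
    return value


def index_key(value):
    # Hypothetical: the index is keyed by the JSON text of a value, so 1 and 1.0
    # would map to "1" and "1.0" unless the value is normalized first.
    return json.dumps(normalize_value(value))


def build_index(docs, key):
    """Map the index key of doc[key] to the set of ids of matching documents."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        if key in doc:
            index[index_key(doc[key])].add(doc_id)
    return index


def find(index, value):
    """Return the ids of all documents whose value numerically equals `value`."""
    return index.get(index_key(value), set())


docs = {"a": {"N": 1}, "b": {"N": 1.0}, "c": {"N": 1.5}}
index = build_index(docs, "N")
print(sorted(find(index, 1)))    # ['a', 'b']
print(sorted(find(index, 1.0)))  # ['a', 'b']
```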

Removed

  • Support for Python version 3.4 (no longer tested).

[1.0.0] – 2019-02-28

Added

  • Official support for Python 3.7.
  • The H5Store and H5StoreManager classes, which are useful for storing (numerical) array-like data with an HDF5 backend. These classes are exposed within the root namespace.
  • The job.data and project.data properties, which provide an instance of H5Store to access numerical data within the job workspace and project root directory, respectively.
  • The job.stores and project.stores properties, which provide an instance of H5StoreManager to manage multiple instances of H5Store for storing numerical array-like data within the job workspace and project root directory, respectively.
  • The signac.get_job() function and the signac.Project.get_job() method, which allow users to get a job handle by switching into or providing the job’s workspace directory.
  • The job variable is automatically set when opening a signac shell from within a job’s workspace directory.
  • Add the signac shell -c option, which allows Python commands to be passed directly for execution within the shell.
  • Automatic casting of numpy arrays to lists when storing them within a JSONDict, e.g., a job.statepoint or job.document.
  • Enable the Collection class to manage collections stored in compressed files (gzip, zip, etc.).
  • Enable deletion of JSONDict keys through the attribute interface, e.g., del job.doc.foo.
  • Pretty HTML representation of instances of Project and JobsCursor targeted at Jupyter Notebooks (requires pandas; automatically enabled when installed).
  • The to_dataframe() function to export the job state point and document data of a Project or a JobsCursor, e.g., the result of Project.find_jobs(), as a pandas.DataFrame (requires pandas).

Changed

  • Dots (.) are no longer allowed in JSONDict and Collection keys (previously deprecated).
  • The JSONDict class is exposed in the root namespace; it is useful for storing text-serializable data with a JSON backend, similar to job.statepoint or job.document.
  • The Job.init() method returns the job, which allows one-line job creation and initialization.
  • The signac.get_project() function has a new search argument; when True (the default), signac searches for a project within and above the specified root directory, not only within the root directory itself. Calling the function without arguments behaves as before.
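The search behavior of signac.get_project() can be sketched as a walk from the given root directory up through its parents until a project marker is found. This sketch assumes a project is identified by a configuration file in its root directory; the marker name signac.rc and the find_project_root helper are assumptions for illustration, not signac's actual lookup code.

```python
import os
import tempfile


def find_project_root(root, marker="signac.rc", search=True):
    """Return the directory containing `marker`, or None if none is found.

    With search=False only `root` itself is checked; with search=True the
    parent directories of `root` are checked as well.
    """
    current = os.path.abspath(root)
    while True:
        if os.path.isfile(os.path.join(current, marker)):
            return current
        parent = os.path.dirname(current)
        if not search or parent == current:  # stop at the filesystem root
            return None
        current = parent


with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "signac.rc"), "w").close()
    nested = os.path.join(tmp, "workspace", "abc")
    os.makedirs(nested)
    print(find_project_root(nested) == os.path.abspath(tmp))  # True
    print(find_project_root(nested, search=False))  # None
```

Stopping when os.path.dirname() returns its own input is a portable way to detect the filesystem root.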

Fixed

  • Fix Collection.update() behavior such that existing documents with an identical primary key are updated. Previously, a KeyError would be raised.
  • Fix issue where Job.move() would raise a confusing DestinationExists exception when trying to move jobs across devices or file systems.
  • Fix issue that caused failures when the python-rapidjson package is installed. The python-rapidjson package is used as the primary JSON-backend when installed.
  • Fix issue where schemas with multiple keys would be subset incorrectly if the list of jobs or state points was provided as an iterator rather than a sequence.

Removed

  • The obsolete and deprecated core.search_engine module has been removed.
  • The previously deprecated Project.find_statepoints() and Project.find_job_documents() functions have been removed.
  • The Project.find_jobs() method no longer accepts the obsolete index argument.

Version 0.9

Highlights

  • Adds persistent state point index caching, which speeds up all functions that require indexing, for example the $ signac find command.
  • Adds the $ signac sync tool for synchronization of multiple signac projects.
  • Adds the $ signac schema function for the automatic detection of the implicit schema of a signac project.
  • Adds the $near operator to match numbers up to a specified precision.
  • Adds functions for the import and export of data spaces.
  • Adds functions for the management of data on the project level, as opposed to the job level.

[0.9.5] – 2019-01-31

Fixed

  • Ensure that the next() function can be called for a JobsIterator, e.g., project.find().
  • Fix pickling issue that occurs when a _SyncedDict (job.statepoint, job.document, etc.) contains a list.
  • Fix issue with the readline module that would cause signac shell to fail on Windows operating systems.

[0.9.4] – 2018-10-24

Added

  • Adds the $ signac import command and the Project.import_from() method for importing data spaces, such as a directory, a tarball, or a zip file, into a project workspace.
  • Adds the $ signac export command and the Project.export_to() method for the export of project workspaces to an external location, such as a directory, a tarball, or a zip file.
  • Adds functionality for the rapid initialization of temporary projects with the signac.TemporaryProject context manager.
  • Adds the signac.Project.temporary_project() context manager which creates a temporary project within the root project workspace.
  • Add signac to the default namespace when invoking signac shell.
  • Add an option to specify a custom view path for the $ signac view command and the Project.create_linked_view() method.
  • Iterables of documents used to construct a Collection no longer require an _id field.

Changed

  • The default path for linked views has been adjusted to match the one used for data exports.

Fixed

  • Fix issue where differently typed integer values stored within a Collection under the same key would not be indexed correctly. This affected the $type operator in those cases and led the project schema detection algorithm to report incorrect types for integer values.
  • Fix issue where jobs whose state point was changed (migrated) but which were not initialized were not properly updated.
  • Fix issue where changes to lists within a synchronized dictionary, for example a state point or document, would not be saved.
  • Fix non-deterministic issue occurring on network file systems when trying to open jobs where the user has no write access to the job workspace directory.

[0.9.3] – 2018-06-14

Added

  • Add the $near operator to express queries for numerical values that match up to a certain precision.
  • Add the $ signac shell subcommand to directly launch a Python interpreter within a project directory.

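The semantics of a $near query can be approximated with math.isclose. In this sketch the argument is either a reference number or a list [reference, rel_tol, abs_tol] with the trailing entries optional; that argument layout and the default tolerances (those of math.isclose) are assumptions here, and near and filter_docs are hypothetical helpers.

```python
import math


def near(value, argument):
    """Hypothetical $near evaluation modeled on math.isclose.

    `argument` is either a reference number or a list
    [reference], [reference, rel_tol], or [reference, rel_tol, abs_tol].
    """
    if isinstance(argument, (int, float)):
        argument = [argument]
    reference, *tolerances = argument
    rel_tol = tolerances[0] if len(tolerances) > 0 else 1e-9
    abs_tol = tolerances[1] if len(tolerances) > 1 else 0.0
    return math.isclose(value, reference, rel_tol=rel_tol, abs_tol=abs_tol)


def filter_docs(docs, key, argument):
    """Return the documents whose value under `key` is near the reference."""
    return [doc for doc in docs if key in doc and near(doc[key], argument)]


docs = [{"T": 1.0}, {"T": 1.05}, {"T": 1.2}]
print(filter_docs(docs, "T", [1.0, 0.1]))  # [{'T': 1.0}, {'T': 1.05}]
```

An absolute tolerance matters when comparing against zero, where a purely relative tolerance never matches anything but zero itself.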
Fixed

  • Fix issue where a job instance would not be properly updated after more than one state point reset.

[0.9.2] – 2017-12-18

Added

  • Add provisional feature (persistent state point caching); calling the Project.update_cache() method will generate and store a persistent state point cache in the project root directory, which will increase the speed of many project iteration, search, and selection operations.
  • Add Project.check() method which checks for workspace corruption, but does not make any attempt to repair it.
  • The Project.repair() method attempts to repair jobs that have been corrupted by manually renaming the job’s workspace directory.
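The kind of corruption Project.check() reports can be sketched as follows, assuming each job directory is named after a hash of its state point and stores that state point in a manifest file. The MD5-of-sorted-JSON id scheme mirrors how signac derives job ids, but the manifest filename signac_statepoint.json and the calc_id, check_workspace, and write_job helpers are simplified, hypothetical stand-ins rather than the library's code.

```python
import hashlib
import json
import os
import tempfile

MANIFEST = "signac_statepoint.json"  # assumed manifest filename


def calc_id(statepoint):
    """Hash the canonical (key-sorted) JSON encoding of a state point."""
    blob = json.dumps(statepoint, sort_keys=True)
    return hashlib.md5(blob.encode()).hexdigest()


def check_workspace(workspace):
    """Return names of job directories whose name does not match their state point.

    Like a read-only check, this only reports problems and repairs nothing.
    """
    corrupted = []
    for name in sorted(os.listdir(workspace)):
        with open(os.path.join(workspace, name, MANIFEST)) as file:
            statepoint = json.load(file)
        if calc_id(statepoint) != name:
            corrupted.append(name)
    return corrupted


def write_job(workspace, statepoint, dirname=None):
    """Create a job directory, optionally under a wrong name, for the demo."""
    job_dir = os.path.join(workspace, dirname or calc_id(statepoint))
    os.makedirs(job_dir)
    with open(os.path.join(job_dir, MANIFEST), "w") as file:
        json.dump(statepoint, file)


with tempfile.TemporaryDirectory() as ws:
    write_job(ws, {"a": 1})
    write_job(ws, {"a": 2}, dirname="renamed-by-hand")
    print(check_workspace(ws))  # ['renamed-by-hand']
```

A repair step would then rename each reported directory to calc_id() of its stored state point.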

Changed

  • Enable the write_concern flag for the job.document.
  • Allow omitting the authentication mechanism from the MongoDB host configuration.

Fixed

  • Fix critical issue in the JSONDict implementation that would previously overwrite the underlying file when attempting to store values that are not JSON serializable.
  • Fix issue where the Project.export() function would ignore the update argument when exporting to a MongoDB collection.

[0.9.1] – 2017-11-07

Fixed

  • Fix critical issue in the SyncedAttrDict implementation that would previously overwrite the underlying file if the first operation was a __setitem__() operation.

[0.9.0] – 2017-10-28

Added

  • Introduction of $ signac sync, Project.sync(), and Job.sync() for the simplified and fine-grained synchronization of multiple project data spaces.
  • Introduction of $ signac schema and Project.detect_schema() for the automatic detection of the implicit and semi-structured state point schema of a project data space.
  • Simplified aggregation of jobs over projects and Project.find_jobs() results with the Project.groupby() function.
  • Support for project-centralized data with the Project.document attribute and the Project.fn() method for the wrapping of filenames within the project root directory.
  • Added the Job.clear() and the Job.reset() methods to clear or reset a job’s workspace data.
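The grouping idea behind Project.groupby() can be illustrated with plain dictionaries standing in for job state points. group_statepoints is a hypothetical helper built on itertools.groupby, which only merges consecutive items and therefore needs its input sorted by the grouping key; how signac itself treats jobs that lack the key is not modeled here (they are simply skipped).

```python
from itertools import groupby


def group_statepoints(statepoints, key):
    """Yield (value, members) pairs for all state points that define `key`."""
    selected = [sp for sp in statepoints if key in sp]
    # itertools.groupby only merges adjacent items, so sort by the key first.
    selected.sort(key=lambda sp: sp[key])
    for value, members in groupby(selected, key=lambda sp: sp[key]):
        yield value, list(members)


statepoints = [{"T": 2, "p": 1}, {"T": 1, "p": 1}, {"T": 2, "p": 2}, {"p": 3}]
for value, members in group_statepoints(statepoints, "T"):
    print(value, members)
# 1 [{'T': 1, 'p': 1}]
# 2 [{'T': 2, 'p': 1}, {'T': 2, 'p': 2}]
```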

Changed

  • Both Job.statepoint and Job.document now use the same underlying data structure and provide the exact same API (a copy is obtained by calling the object with (), and keys can be accessed as attributes).
  • The Collection class uses an internal counter instead of UUIDs for the generation of primary keys (resulting in improved performance).
  • Major performance improvements (faster Collection, improved caching).
  • Overhaul of the reference documentation.

Version 0.8

Highlights

  • Adds boolean and arithmetic operators to search queries.
  • Major revision of the indexing system.
  • Adds $ signac document command line function.
  • Adds the signac.Collection class for the management of persistent document collections.

[0.8.7] – 2017-10-05

Fixed

  • Fix an issue where the creation of linked views was non-deterministic in some cases.
  • Fix an issue where the creation of linked views would fail when the project contains jobs with state points that have lists as values.

[0.8.6] – 2017-08-25

Fixed

  • Fix Collection append truncation issue (see issue #66).

[0.8.5] – 2017-06-07

Changed

  • The job ids in the $ signac find --show view are no longer enclosed by quotation marks.

Fixed

  • Fix compatibility issue that broke the $ signac find --view and all --pretty commands on Python 2.7.
  • Fix issue where view directories would be incomplete in combination with heterogeneous state point schemas.

[0.8.4] – 2017-05-19

Added

  • All search queries on project and collection objects support various operators including: $and, $or, $gt, $gte, $lt, $lte, $eq, $ne, $exists, $regex, $where, $in, $nin, and $type.
  • The $ signac find command supports a simple filter syntax, where key-value pairs can be provided as individual arguments.
  • The $ signac find command is extended by a --show option to display the state point and the document contents directly. The contents are truncated to an adjustable depth to reduce output noise.
  • The $ signac view command has an additional filter option to select a sub data space directly without needing to pipe job ids.
  • The new $ signac document command can be used to display a job’s document directly.
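To make the semantics of these query operators concrete, here is a small, hypothetical evaluator for a subset of them, applied to flat documents. It follows the MongoDB-style query conventions that signac's filter syntax resembles, but it is not signac's implementation: it ignores nested keys, $regex, $where, and $type, and treats a missing key as never satisfying a comparison.

```python
import operator

# Comparison operators supported by this sketch; unknown operators raise KeyError.
COMPARISONS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}


def matches(doc, query):
    """Return True if the flat document `doc` satisfies every clause of `query`."""
    for key, condition in query.items():
        if key == "$and":
            if not all(matches(doc, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(doc, sub) for sub in condition):
                return False
        elif isinstance(condition, dict):
            for op, argument in condition.items():
                if op == "$exists":
                    if (key in doc) != argument:
                        return False
                # Simplification: a missing key never satisfies a comparison.
                elif key not in doc or not COMPARISONS[op](doc[key], argument):
                    return False
        elif key not in doc or doc[key] != condition:  # plain equality match
            return False
    return True


docs = [{"a": 1, "b": "x"}, {"a": 5}, {"a": 10, "b": "y"}]
query = {"$or": [{"a": {"$lt": 2}}, {"a": {"$gte": 10}}], "b": {"$exists": True}}
print([doc["a"] for doc in docs if matches(doc, query)])  # [1, 10]
```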

Changed

  • Minor performance improvements.

[0.8.3] – 2017-05-10

Changed

  • Raise ExportError when updating with an empty index.

Fixed

  • Fix command line logic issue with $ signac config host.
  • Fix bug where Collection.replace_one() would ignore the upsert argument under specific conditions.

[0.8.2] – 2017-04-19

Fixed

  • Fixes a TypeError which occurred under specific conditions when calling Collection.find() with nested filter arguments.

[0.8.1] – 2017-04-17

Fixed

  • Fixes wide-spread typo (indeces -> indexes).

[0.8.0] – 2017-04-16

Overall, a major simplification of how indexes are generated and how index collections are managed and searched without an external database.

Added

  • Introduction of the Collection class for the management of document collections, such as indexes in memory and on disk.
  • Generation of file indexes directly via the signac.index_files() function.
  • Generation of master indexes directly via the signac.index() function and the $ signac index command.
  • The API of signac_access.py files has been simplified, including the possibility to use a blank file for a minimal configuration.
  • A minimal access module can be created with the $ signac project --access command, in addition to the Project.create_access_module() method.
  • Updating existing index collections has been simplified: when the export() function is called with update=True, stale documents (whose associated file or state point no longer exists) are automatically identified and removed.
  • Added the Job.ws attribute as a shortcut for Job.workspace().
  • The Job.sp interface has a get() function, which can be used to specify a default value in case the requested key is not part of the state point.

Changed (breaking API)

  • The $ signac index command generates a master index instead of a project index. To generate a project index from the command line use $ signac project --index instead.
  • The SignacProjectCrawler class expects the project’s root directory as first argument, not the workspace directory.
  • The get_crawlers() function defined within a signac_access.py access module is expected to yield crawler instances directly, not a mapping of crawler ids and instances.
  • The simplification of the signac_access.py module API is reflected in a reduction of arguments to the Project.create_access_module() method.

Changed (non-breaking)

  • The RegexFileCrawler, SignacProjectCrawler and MasterCrawler classes were moved into the root namespace.
  • If a MasterCrawler object is instantiated with the raise_on_error argument set to True, any errors encountered during crawling are raised instead of ignored and skipped; this simplifies the debugging of erroneous access modules.
  • Improved error message for invalid configuration files.
  • Better error messages for invalid $ signac find queries.
  • Check a host configuration on the command line via $ signac host --test.
  • A MongoDB database host configuration defaults to none when no authentication method is explicitly specified.
  • Using the --debug option in combination with $ signac index will show the traceback of errors encountered during indexing instead of ignoring them.
  • Instances of Job are hashable, making it possible, for instance, to use them as dict keys.
  • The representation of Job instances via repr() can serve as a copy constructor command.
  • The project interface implementation performs all non-trivial search operations on an internally managed index collection, which improves performance and simplifies the code base.

Deprecated

  • The DocumentSearchEngine class has been deprecated, its functionality is now provided by the Collection class.

Fixed

  • An issue related to exporting documents to MongoDB collections via pymongo in combination with Python 2.7 has been fixed.

Version 0.7

Highlights

  • Add support for Python 3.6, PyPy and PyPy3.
  • Make any instance of Project behave like an iterable (for job in project).
  • Introduction of the Job.sp attribute to access state point variables.
  • Revision of the linked view function, which now allows the update of previous views.
  • Support for searching by job document keys on the command line.
  • Add functions for moving and cloning jobs.
  • Add functions for changing a job’s state point.
  • Enable opening of jobs by abbreviated id.

[0.7.1] – 2017-01-09

Added

  • When the python-rapidjson package is installed, it will be used for JSON encoding/decoding (experimental).

Changed

  • All job move-related methods raise DestinationExistsError in case of destination conflicts.
  • Optimized $ signac find command.

Fixed

  • Fixed bug in $ signac statepoint.
  • Suppress ‘broken pipe error’ message when using $ signac find for example in combination with $ head.

[0.7.0] – 2017-01-04

Added

  • Add support for Python version 3.6.
  • Add support for PyPy and PyPy3.
  • Simplified iteration over project data spaces.
  • An existing linked view can be updated by executing the view command again.
  • Add attribute interface for the access and modification of job state points: Job.sp.
  • Add function for moving and copying of jobs between projects.
  • All project-related iterators support the len() function.
  • Enable iteration over all jobs with: for job in project:.
  • Make len(project) an alias for project.num_jobs().
  • Add support for the in operator to determine whether a job is initialized within a project.
  • Add Job.sp attribute to access and modify a job’s state point.
  • The Project.open_job() method accepts abbreviated job ids.
  • Add Project.min_len_unique_id() method to determine the minimum length of job ids to be unique within the project’s data space.
  • Add Job.move() method to move jobs between projects.
  • Add Project.clone() method to copy jobs between projects.
  • Add $ signac move and $ signac clone command line functions.
  • Add Job.reset_statepoint() method to reset a job’s state point.
  • Add Job.update_statepoint() method to update a job’s state point.
  • Add a Job.FN_DOCUMENT constant, which defines the default filename of the job document file.
  • The $ signac find command accepts a -d/--doc-filter option to filter by job document contents.
  • Add the Project.create_linked_view() method as replacement for the previously deprecated Project.create_view() method.
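Opening jobs by abbreviated id and Project.min_len_unique_id() both rest on prefix uniqueness. The sketch below shows one way to compute the shortest unambiguous prefix length and to expand a prefix; min_len_unique_id and resolve_id here are hypothetical stand-ins operating on a plain list of shortened, made-up ids (real job ids are 32-character hex strings), and the choice of LookupError is an assumption.

```python
def min_len_unique_id(ids):
    """Return the shortest prefix length that keeps all (distinct) ids distinct."""
    ids = set(ids)
    if not ids:
        return 0
    longest = max(len(job_id) for job_id in ids)
    for length in range(1, longest + 1):
        if len({job_id[:length] for job_id in ids}) == len(ids):
            return length
    return longest  # not reached: full-length ids are distinct by construction


def resolve_id(ids, prefix):
    """Expand an abbreviated id; raise LookupError if it is unknown or ambiguous."""
    candidates = [job_id for job_id in ids if job_id.startswith(prefix)]
    if len(candidates) != 1:
        raise LookupError(f"{len(candidates)} ids match the prefix {prefix!r}")
    return candidates[0]


ids = ["3f2a9c", "3f7b10", "a91e0d"]  # shortened, made-up ids
print(min_len_unique_id(ids))  # 3
print(resolve_id(ids, "3f7"))  # 3f7b10
```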

Changed

  • Linked views use relative paths.
  • The Guide documentation chapter has been renamed to Reference and generally overhauled.
  • The Quick Reference documentation chapter has been extended.

Fixed

  • Fix error when using an instance of Job after calling Job.remove().
  • A project created in one of the standard config directories (such as the home directory) no longer takes precedence over project configurations in or above the current working directory.

Removed

  • The signac-gui component has been removed.
  • The Project.create_linked_view() force argument is removed.
  • The Project.find_variable_parameters() method has been removed.

Version 0.6

Highlights

  • General revision of the indexing and export system.
  • General consolidation including the removal of the conversion framework.

[0.6.2] – 2016-12-15

Added

  • Add instructions on how to acknowledge signac in publications to the documentation.
  • Add cite module for the auto-generation of formatted references and BibTeX entries.

Removed

  • Remove SSL authentication support.

[0.6.1] – 2016-11-26

Changed

  • The Project.create_view() method triggers a DeprecationWarning instead of a PendingDeprecationWarning.
  • The Project.find_variable_parameters() method triggers a DeprecationWarning instead of a PendingDeprecationWarning.

Fixed

  • Make package more robust against PySide import errors.
  • Fix Project.__repr__ method.
  • Fix critical bug in the fs.GridFS class, which rendered it unusable.
  • Fix issue in indexing.fetch() previously resulting in local paths being ignored.
  • Fix error in the signac.__all__ namespace directive.

[0.6.0] – 2016-11-18

Added

  • Add the export_to_mirror() function for mirroring files.
  • Introduction of the signac.fs namespace to simplify the configuration of mirror filesystems.
  • Add errors module to root namespace. Many exceptions raised inherit from the base exception types defined within that module, making it easier to catch signac-related errors.
  • Add the export_one() function for the export of a single index document; simplifies the implementation of custom export functions.
  • Opening an instance of Job with the open_job() method multiple times and entering a job context recursively is now well-defined behavior: opening a job pushes the current working directory onto a stack, and closing it switches back into the directory on top of the stack.
  • The return type of Project.open_job() can be configured to make it easier to specialize projects with custom job types.
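The directory-stack behavior described for opening jobs recursively can be sketched as a re-entrant context manager. DirectoryContext is a hypothetical class for illustration, not signac's Job implementation: every entry pushes the current working directory onto a stack, and every exit pops the top entry and switches back into it.

```python
import os
import tempfile


class DirectoryContext:
    """Hypothetical re-entrant context that switches into `path`.

    Each entry pushes the current working directory onto a stack; each exit
    pops the top entry and switches back into it.
    """

    def __init__(self, path):
        self.path = path
        self._stack = []

    def __enter__(self):
        self._stack.append(os.getcwd())
        os.chdir(self.path)
        return self

    def __exit__(self, *exc_info):
        os.chdir(self._stack.pop())
        return False  # do not suppress exceptions


start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    job = DirectoryContext(tmp)
    with job:
        with job:  # entering the same context again is well-defined
            pass
        print(os.path.samefile(os.getcwd(), tmp))  # True
print(os.getcwd() == start)  # True
```

Because each exit restores exactly the directory that the matching entry saved, nested and repeated entries unwind in the correct order.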

Changed

  • The MasterCrawler logic has been simplified; its primary function is the compilation of index documents from slave crawlers, while all export logic, including data mirroring, is now provided by the signac.export() function.
  • Each index document is now uniquely coupled with only one file or data object, which is why signac.fetch() replaces signac.fetch_one(); the latter has been deprecated and is currently an alias of the former.
  • The signac.fetch() function always returns a file-like object, regardless of format definition.
  • The format argument in the crawler define() function is now optional and has well-defined behavior for str types. Defining a format with a str constant rather than a file-like object type is encouraged.
  • The TextFile file-like object class definition in the formats module has been replaced with a constant of type str.
  • The signac.export() function automatically delegates to specialized implementations such as export_pymongo() and is more robust against errors, such as broken connections.
  • The export_pymongo() function makes multiple automatic restart attempts when encountering errors.
  • Documentation: The tutorial is now based on signac-examples Jupyter notebooks.
  • The contrib.crawler module has been renamed to contrib.indexing to better reflect the semantic context.
  • The signac.export() function now implements the logic for data linking and mirroring.
  • Provide a default argument for the --indent option of the $ signac statepoint command.
  • Log, but do not re-raise, exceptions during MasterCrawler execution, making the compilation of master indexes more robust against errors.
  • The object representation of Job and Project instances is simplified.
  • The warning verbosity has been reduced when importing modules with optional dependencies.

Removed

  • All modules related to the stale conversion framework feature have been removed resulting in a removal of the optional networkx dependency.
  • Multiple modules related to the conversion framework feature have been removed, including: contrib.formats_network, contrib.conversion, and contrib.adapters.

Fixed

  • Opening instances of Job with the Job.open() method multiple times, equivalently entering the job context recursively, does not cause an error anymore, but instead the behavior is well-defined.

Version 0.5

[0.5.0] – 2016-08-31

Added

  • New function: signac.init_project() simplifies project initialization within Python.
  • Added optional root argument to signac.get_project() to simplify getting a project handle outside of the current working directory.
  • Added optional argument to signac.get_project() to allow fetching of projects outside of the current working directory.
  • Added two class factory methods to Project: get_project() and init_project().

Changed

  • The performance of project indexing and crawling has been improved.

Version 0.4

[0.4.0] – 2016-08-05

Added

  • The performance of find operations can be greatly improved by using pre-generated job indexes.
  • New top-level commands: $ signac find, $ signac index, $ signac statepoint, and $ signac view.
  • New method: Project.create_linked_view().
  • New method: Project.build_job_statepoint_index().
  • New method: Project.build_job_search_index().
  • The Project.find_jobs() method allows filtering by job document.

Changed

  • The SignacProjectCrawler indexes all jobs, not only those with non-empty job documents.
  • The signac.fetch_one() function returns None if no associated object can be fetched.
  • The tutorial is restructured into multiple parts.
  • Instructions for installation are separated from the guide.

Removed

  • Remove previously deprecated crawl keyword argument in index export functions.
  • Remove previously deprecated function common.config.write_config().

Version 0.3

[0.3.0] – 2016-06-23

Added

  • Add contributing agreement and guidelines.

Changed

  • Change license from MIT to BSD 3-clause license.

Version 0.2

[0.2.9] – 2016-06-06

Added

  • Addition of the signac config command line API.
  • Password updates are encrypted with bcrypt when passlib is installed.
  • The user is prompted to enter missing credentials (username/password) if they are not stored in the configuration.
  • The $ signac config tool provides the --update-pw argument, which allows users to update their own password.
  • Added MIT license; in addition, all source code files contain a short licensing header.

Changed

  • Improved documentation on how to configure signac.
  • The OSI classifiers are updated, including an upgrade of the development status to ‘4 - beta’.

Fixed

  • Nested job state points can no longer get corrupted. This bug occurred when trying to operate on nested state point mappings.

Deprecated

  • Deprecated pymongo versions 2.x are no longer supported.

[0.2.8] – 2016-04-18

Added

  • Project is now in the root namespace.
  • Add index() method to Project.
  • Add create_access_module() method to Project.
  • Add find_variable_parameters() method to Project.
  • Add fn() method to Job, which prepends the job’s workspace path to a filename.
  • The documentation contains a comprehensive tutorial.

Changed

  • The crawl() function yields only the index documents and not a tuple of (_id, doc).
  • export() and export_pymongo() expect the index documents as first argument, not a crawler instance. The old API is still supported, but will trigger a DeprecationWarning.

[0.2.7] – 2016-02-29

Added

  • Add job.isfile() method.

Changed

  • Optimize project.find_statepoints() and project.repair() functions.

[0.2.6] – 2016-02-20

Added

  • Add job.reset_statepoint() and job.update_statepoint().
  • Add job.remove() function.

Changed

  • Sanitize filter argument in all project.find_*() methods.
  • The job.statepoint() function accurately represents saved statepoints, e.g. tuples are represented as lists, as there is no difference between tuples and lists in JSON.
  • signac-gui does not block on database operations.
  • signac-gui allows reload of databases and collections of connected hosts.
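Why tuples come back as lists is visible with the standard json module alone: JSON has a single array type, so a tuple survives a round trip through a JSON file only as a list. The statepoint dictionary below is a made-up example.

```python
import json

# JSON has a single array type, so a tuple is written as an array and read back as a list.
statepoint = {"L": (10, 20), "T": 1.0}
stored = json.loads(json.dumps(statepoint))
print(stored)  # {'L': [10, 20], 'T': 1.0}
print(stored == statepoint)  # False, because (10, 20) != [10, 20]
```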

Fixed

  • RegexFileCrawler define() class function only acts upon the actual specialization and not globally on all RegexFileCrawler classes.
  • signac-gui does not crash when replica sets are configured.

[0.2.5] – 2016-02-10

Added

  • Added signac.get_project(), signac.get_database(), signac.fetch() and signac.fetch_one() to top-level namespace.
  • Added basic shell commands, see $ signac --help.
  • Allow opening of jobs by id: project.open_job(id='abc123...').
  • Mirror data while crawling.
  • Use extra sources for fetch() and fetch_one().
  • Add file system handler: LocalFS, handler for local file system.
  • Add file system handler: GridFS, handler for MongoDB GridFS file system.
  • Crawler tags, to control which crawlers are used for a specific index.
  • Allow explicit job workspace creation with job.init().
  • Forwarding of pymongo host configuration via signac configuration.

Changed

  • Major reorganization of the documentation, split into: Overview, Guide, Quick Reference and API.
  • Documentation: Add notes for system administrators about advanced indexing.
  • Warn about outdated pymongo versions.
  • Set zip_safe flag to true in setup.py.
  • Remove dependency on six module, by adding it to the common subpackage.

Deprecated

Fixed

  • Fixed bug caused by a hard import of pymongo (issue #24).
  • Crawler issues with malformed documents.

[0.2.4] – 2016-01-11

Added

  • Implement Project.repair() function for projects with corrupted workspaces.
  • Allow environment variables in workspace path definition.
  • Check and fix config permission errors.

Changed

  • Increase robustness of job manifest file creation.

Fixed

  • Fix project crawler deep directory issue (hotfix).

[0.2.3] – 2015-12-09

Fixed

  • Fix a few bugs related to project views.

[0.2.2] – 2015-11-30

Fixed

  • Fix SignacProjectCrawler super() bug.

[0.2.1] – 2015-11-29

Added

  • Add support for Python 2.7.
  • Add signac-gui (early alpha).
  • Allow specification of relative and default workspace paths.
  • Add the ability to create project views.
  • Add Project.find_*() functions to search the workspace.
  • Add function to write and read state point hash tables.

[0.2.0] – 2015-11-05

  • Major consolidation of the package.
  • Remove all hard dependencies except six.