Query API

As briefly described in Finding jobs, the find_jobs() method provides much more powerful search functionality beyond simple selection of jobs with specific state points. One of the key features of signac is the possibility to search the project workspace to select desired subsets as needed.

It is possible to use search expressions directly on the command line, for example in combination with the $ signac find command. In this case, filter arguments are expected to be provided as valid JSON expressions. Notably, JSON (Javascript’s) Booleans are true and false as opposed to Python’s True and False.

Query Namespaces

Filter keys are namespaced by the type of the key. This means that any filter can be used to simultaneously search for keys in both the job state point and the job document. Namespaces are identified by prefixing filter keys with the appropriate prefixes. Currently, the following prefixes are recognized:

For example, to select all jobs whose state point key a has the value “foo” and document key b has the value “bar”, you would use:

project.find_jobs({"sp.a": "foo", "doc.b": "bar"})

The default prefix is sp, so any filter key that does not have a recognized prefix will be matched against job state points. This means that the following query is equivalent to the one above:

project.find_jobs({"a": "foo", "doc.b": "bar"})

Basic Expressions

Filter arguments are a mapping of expressions, where a single expression consists of a key-value pair. All selected documents will match these expressions.

The simplest expression is an exact match. For example, in order to select all jobs whose state point key a has the value 42, you would use the following expression: {"a": 42} as follows:

project.find_jobs({"a": 42})
signac find '{"a": 42}'

Select All

If you want to select the complete data set, don’t provide any filter argument at all. The default argument of None or an empty expression {} will select all jobs or documents. As was previously demonstrated, iterating over all jobs in a project can be accomplished directly.

for job in project:
    pass

On the command line, signac find without an arguments will return a list of all jobs.

signac find

Simple Selection

To select documents by one or more specific key-value pairs, provide these directly as filter arguments. For example, assuming that we have a list of documents with values N, kT, and p, as such:

1: {"N": 1000, "kT": 1.0, "p": 1}
2: {"N": 1000, "kT": 1.2, "p": 2}
3: {"N": 1000, "kT": 1.3, "p": 3}
...

We can select the 2nd document with {'p': 2}, but also {'N': 1000, 'p': 2} or any other matching combination.

Nested Keys

To match nested keys, avoid nesting the filter arguments, but instead use the .-operator. For example, if the documents shown in the example above were all nested like this:

1: {"statepoint": {"N": 1000, "kT": 1.0, "p": 1}}
2: {"statepoint": {"N": 1000, "kT": 1.2, "p": 2}}
3: {"statepoint": {"N": 1000, "kT": 1.3, "p": 3}}
...

Then we would use {'statepoint.p': 2} instead of {'statepoint': {'p': 2}} as filter argument. This is not only easier to read, but also increases compatibility with MongoDB database systems.

Operator Expressions

In addition to simple exact value matching, signac also provides operator-expressions to execute more complicated search queries.

Arithmetic Expressions

If we wanted to match all documents where p is greater than 2, we would use the following filter argument:

project.find_jobs({"p": {"$gt": 2}})
signac find '{"p": {"$gt": 2}}'

Note that we have replaced the value for p with the expression {'$gt': 2} to select all jobs with p values greater than 2. Here is a complete list of all available arithmetic operators:

  • $eq: equal to

  • $ne: not equal to

  • $gt: greater than

  • $gte: greater or equal to

  • $lt: less than

  • $lte: less than or equal to

Near Operator

The $near operator is used to find jobs with state point parameters that are near a value, where floating point precision may make it difficult to match the exact value. The behavior of $near matches that of Python’s math.isclose function. The “reference” value and tolerances are passed in as a list in the order [reference, [relative_tolerance, [absolute_tolerance]]], where the inner []s denote optional values. Note that default values are relative_tolerance = 1e-09 and absolute_tolerance = 0.

signac find theta.\$near 0.6  # easier than typing 0.600000001
signac find '{"p.$near": [100, 0.05]}'  # p within 5% of 100
signac find '{"p.$near": [100, 0.05, 2]}'  # abs(p-100)/max(p, 100) < 0.05 or abs(p-100) < 2

Logical Operators

There are three supported logical operators: $and, $or, and $not. The first two are unique in that they involve combinations of other query operators. To query with one of these two logical expression, we construct a mapping with the logical operator as the key and a list of expressions as the value. As usual, the $and operator matches documents where all the expressions are true, while the $or expression matches if documents satisfy any of the provided expressions. For example, to find all documents where p is greater than 2 or kT=1.0, we could use the following:

project.find_jobs({"$or": [{"p": {"$gt": 2}}, {"kT": 1.0}]})
signac find '{"$or": [{"p": {"$gt": 2}}, {"kT": 1.0}]}'

Logical expressions may be nested but cannot be the value of a key-value expression.

For the $not operator, we again construct a mapping with the operator as the key, but the value is a single expression rather than a list of expressions. For example, to find all jobs where a parameter a is not close to zero, we could use the following:

project.find_jobs({"$not": {"a": {"$near": 0}}})

Exists Operator

Warning

Boolean expressions are written differently on the command line because the command line takes JSON formatting.

If you want to check for the existence of a specific key but do not care about its actual value, use the $exists-operator. For example, this expression will return all documents that have a key p regardless of its value. Likewise, using False as argument would return all documents that have no key with the given name.

project.find_jobs({"p": {"$exists": True}})

On the command line, this expression must be valid JSON encapsulated in single quotes:

signac find '{"p": {"$exists": true}}'

Array Operator

This operator may be used to determine whether specific keys have values, that are in ($in), or not in ($nin) a given array, e.g.:

project.find_jobs({"p": {"$in": [1, 2, 3]}})
signac find '{"p": {"$in": [1, 2, 3]}}'

This would return all documents where the value for p is either 1, 2, or 3. The usage of $nin is analogous and will return all documents where the value is not in the given array.

Regular Expression Operator

This operator may be used to search for documents where the value of type str matches a given regular expression. For example, to match all documents where the value for protocol contains the string “assembly”, we could use:

project.find_jobs({"protocol": {"$regex": "assembly"}})
signac find '{"protocol": {"$regex": "assembly"}}'

This operator internally applies the re.search() function and will never match if the value is not of type str.

To negate a regular expression use a negative lookaround, e.g., to match all state points where the protocol does not contain the word “assembly”, you would use:

project.find_jobs({"protocol": {"$regex": r"^(?!.*assembly).*$"}})

Tip

Use the Regex101 app to develop and test your regular expressions.

Type Operator

This operator may be used to search for documents where the value is of a specific type. For example, to match all documents, where the value of the key N is of integer-type, we would use:

project.find_jobs({"N": {"$type": "int"}})

Other supported types include float, str, bool, list, and null.

signac find '{"N": {"$type": "int"}}'

Where Operator

This operator allows us to apply a custom function to each value and select based on its return value. For example, instead of using the regex-operator, as shown above, we could write the following expression:

project.find_jobs({"protocol": {"$where": 'lambda x: "assembly" in x'}})
signac find '{"protocol": {"$where": "lambda x: \"assembly\" in x"}}'

Simplified Syntax on the Command Line

For simple filters, you can use a simplified syntax instead of writing JSON. For example, instead of signac find '{"p": 2}', you can type signac find p 2.

A simplified expression consists of key-value pairs in alternation. The first argument will then be interpreted as the first key, the second argument as the first value, the third argument as the second key, and so on. If you provide an odd number of arguments, the last value will default to {'$exists': true}.

Querying via operator is supported using the .-operator. Finally, you can use /<regex>/ instead of {'$regex': '<regex>'} for regular expressions.

The following table shows simplified signac find expressions on the left, full query syntax in JSON in the middle, and full query syntax in Python the right.

simplified JSON

full JSON (in '' for command line)

Python

p

'{"p": {'$exists': true}}'

{'p': {'$exists': True}}

p 2

'{"p": 2}'

{'p': 2}

p 2 kT

'{"p": 2, "kT": {"$exists": true}}'

{'p': 2, 'kT': {'$exists': True}}

p 2 kT.$gte 1.0

'{"p": 2, "kT.$gte": 1.0}' or '{"p": 2, "kT": {"$gte": 1.0}}'

{'p': 2, 'kT': {'$gte': 1.0}}

protocol /assembly/

'{"protocol": {"$regex": "assembly"}}'

{'protocol': {'$regex': 'assembly'}}

Important

The $ character used in operator-expressions must be escaped in many terminals, that means for example instead of $ signac find p.$gt 2, you would need to write $ signac find p.\$gt 2.