See also
- @collate in the Ruffus Manual
- Use of add_inputs(…) | inputs(…) in the Ruffus Manual
- Decorators for more decorators
@collate( input, filter, replace_inputs | add_inputs, output, [extras,…] )
- Purpose:
Use filter to identify common sets of inputs which are to be grouped or collated together:
All inputs that generate identical output and extras under the formatter or regex (regular expression) filter are collated into one job.
This variant of @collate allows additional inputs or dependencies to be added dynamically to the task, with optional string substitution.
add_inputs nests the original input parameters in a list before adding additional dependencies.
inputs replaces the original input parameters wholesale.
This is a many to fewer operation.
Only out of date jobs (comparing input and output files) will be re-run.
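The grouping rule described above can be sketched in plain Python, without Ruffus. This is only an emulation under stated assumptions, not Ruffus's implementation: `collate_sketch` and its `replace` flag are hypothetical names, the file names are made up, and substitution is simplified to `Match.expand` on the matched name.

```python
import re
from collections import defaultdict


def collate_sketch(file_names, pattern, extra_template, output_template, replace=False):
    """Hypothetical helper (not part of Ruffus): group inputs by generated output name."""
    regex = re.compile(pattern)
    jobs = defaultdict(list)
    for name in sorted(file_names):
        match = regex.search(name)
        if match is None:
            continue  # this sketch skips names the filter does not match
        output = match.expand(output_template)  # e.g. r"\1.merged"
        extra = match.expand(extra_template)    # e.g. r"\1.schema"
        if replace:
            # inputs(...) behaviour: the original input is dropped
            if extra not in jobs[output]:
                jobs[output].append(extra)
        else:
            # add_inputs(...) behaviour: the original input is nested with the extra one
            jobs[output].append([name, extra])
    return dict(jobs)


files = ["b.txt", "a.txt", "c.csv"]
added = collate_sketch(files, r".+\.(.+)$", r"\1.schema", r"\1.merged")
replaced = collate_sketch(files, r".+\.(.+)$", r"\1.schema", r"\1.merged", replace=True)
print(added)
print(replaced)
```

Each key of the returned dictionary stands for one job: every input that maps to the same output name ends up in the same job, which is what makes this a many-to-fewer operation.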
Example of add_inputs
regex(r".+\.(.+)$"), r"\1.summary" creates a separate summary file for each suffix. But we also add date of birth data for each species:

    from ruffus import collate, regex, add_inputs

    animal_files = "tuna.fish", "shark.fish", "dog.mammals", "cat.mammals"

    # summarise by file suffix:
    @collate(animal_files, regex(r".+\.(.+)$"), add_inputs(r"\1.date_of_birth"), r'\1.summary')
    def summarize(infiles, summary_file):
        pass

This results in the following equivalent function calls:

    summarize([ ["shark.fish",  "fish.date_of_birth"   ],
                ["tuna.fish",   "fish.date_of_birth"   ] ], "fish.summary")
    summarize([ ["cat.mammals", "mammals.date_of_birth"],
                ["dog.mammals", "mammals.date_of_birth"] ], "mammals.summary")

Example of inputs
Using inputs(...) will summarise only the dates of birth for each species group:

    from ruffus import collate, regex, inputs

    animal_files = "tuna.fish", "shark.fish", "dog.mammals", "cat.mammals"

    # summarise by file suffix:
    @collate(animal_files, regex(r".+\.(.+)$"), inputs(r"\1.date_of_birth"), r'\1.summary')
    def summarize(infiles, summary_file):
        pass

This results in the following equivalent function calls:

    summarize(["fish.date_of_birth"   ], "fish.summary")
    summarize(["mammals.date_of_birth"], "mammals.summary")

Parameters:
- input = tasks_or_file_names
- can be a:
- Task / list of tasks.
- File names are taken from the output of the specified task(s)
- (Nested) list of file name strings (as in the example above).
- File names containing *[]? will be expanded as a glob.
- E.g.: "a.*" => "a.1", "a.2"
- filter = matching_regex
- is a Python regular expression string, which must be wrapped in a regex indicator object. See the Python regular expression (re) documentation for details of regular expression syntax.
- filter = matching_formatter
- a formatter indicator object, optionally containing a Python regular expression (re).
- add_inputs = add_inputs(…) or replace_inputs = inputs(…)
Specifies the resulting input(s) to each job.
Positional parameters must be disambiguated by wrapping the values in inputs(…) or add_inputs(…).
Named parameters can be passed the values directly.
Takes:
- Task / list of tasks.
- File names are taken from the output of the specified task(s)
- (Nested) list of file name strings.
- Strings will be subject to substitution. File names containing *[]? will be expanded as a glob. E.g.: "a.*" => "a.1", "a.2"
- output = output
- Specifies the resulting output file name(s).
- extras = extras
Any extra parameters are passed verbatim to the task function.
If you are using named parameters, these can be passed as a list, i.e. extras= [...]
See @collate for more straightforward ways to use collate.
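The glob expansion mentioned for input and add_inputs / inputs follows shell-style wildcard rules. A minimal sketch with Python's standard glob module, assuming Ruffus expands these patterns in a comparable way; `expand_globs` is a hypothetical helper and the files are created in a temporary directory purely for illustration:

```python
import glob
import os
import tempfile


def expand_globs(file_names):
    """Hypothetical helper: expand names containing *, [ or ?; keep other names unchanged."""
    expanded = []
    for name in file_names:
        if any(ch in name for ch in "*?["):
            # sort so the result does not depend on directory listing order
            expanded.extend(sorted(glob.glob(name)))
        else:
            expanded.append(name)
    return expanded


with tempfile.TemporaryDirectory() as tmp:
    # create three empty files; only two of them match "a.*"
    for file_name in ("a.1", "a.2", "b.1"):
        open(os.path.join(tmp, file_name), "w").close()
    result = expand_globs([os.path.join(tmp, "a.*"), "fixed.name"])
    expanded_names = [os.path.basename(path) for path in result]

print(expanded_names)
```

Names without wildcard characters are passed through unchanged, whether or not the file exists yet; only patterns are matched against the file system.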