Creating combinations of files
Sometimes you need to create all the possible combinations of a set of files that you have as file streams.
For example, say that you have two file streams:
[a.txt b.txt]
[1.txt 2.txt 3.txt]
... and you want to process all of the combinations of these two sets of files. So in other words, what you want is:
[a.txt a.txt a.txt b.txt b.txt b.txt]
[1.txt 2.txt 3.txt 1.txt 2.txt 3.txt]
This is something you can accomplish with the FileCombinator component, available in SciPipe 0.9.1 and later.
Example
Given that you have a set of files:
letterfile_a.txt
letterfile_b.txt
numberfile_1.txt
numberfile_2.txt
numberfile_3.txt
... and you want to create all combinations of the letter*
files and the number*
files, you can do it as follows:
package main
import (
"github.com/scipipe/scipipe"
"github.com/scipipe/scipipe/components"
)
func main() {
wf := scipipe.NewWorkflow("wf", 4)
letterGlobber := components.NewFileGlobber(wf, "letter_globber", "letterfile_*.txt")
numberGlobber := components.NewFileGlobber(wf, "number_globber", "numberfile_*.txt")
fileCombiner := components.NewFileCombinator(wf, "file_combiner")
fileCombiner.In("letters").From(letterGlobber.Out())
fileCombiner.In("numbers").From(numberGlobber.Out())
catenator := wf.NewProc("catenator", "cat {i:letters} {i:numbers} > {o:combined}")
catenator.In("letters").From(fileCombiner.Out("letters"))
catenator.In("numbers").From(fileCombiner.Out("numbers"))
catenator.SetOut("combined", "{i:letters|basename|%.txt}.{i:numbers|basename|%.txt}.combined.txt")
wf.Run()
}
Note that when accessing an in-port on the FileCombinator with the In(PORTNAME)
method, this port
will be created automatically, together with a corresponding out-port which can be accessed with the
same name, Out(PORTNAME)
, as can be seen when we connect the fileCombinator to the catenator process
further down in the code.
The program above, if put in a .go
file and run with go run file.go
, will generate the following
files (excluding the accompanying .audit.json files):
letterfile_b.txt
letterfile_a.txt
numberfile_3.txt
numberfile_2.txt
numberfile_1.txt
letterfile_a.numberfile_2.combined.txt
letterfile_a.numberfile_1.combined.txt
letterfile_a.numberfile_3.combined.txt
letterfile_b.numberfile_2.combined.txt
letterfile_b.numberfile_1.combined.txt
letterfile_b.numberfile_3.combined.txt
As you can see, all the combinations of the