Chunker
Overview
Chunker is a simple command-line utility that takes an input database file and divides it into smaller pieces of similar size. Each piece can then be used as the dbase file in a separate ROCS run. This divide-and-conquer approach is an alternative way to spread a single dbase over multiple CPUs without using MPI.
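The divide-and-conquer idea can be sketched as a small Python driver that launches one external process per chunk file. This is only an illustration: `run_per_chunk` and `make_command` are hypothetical names, and the command run for each chunk is supplied by the caller, so no ROCS options are assumed here.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_per_chunk(chunk_files, make_command, max_workers=4):
    """Run one external command per chunk file and return the exit codes.

    make_command is a caller-supplied function (hypothetical, not part of
    Chunker) that maps a chunk filename to an argv list.
    """
    def run_one(path):
        # Each worker thread simply waits on its own child process.
        return subprocess.run(make_command(path)).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of chunk_files in the results.
        return list(pool.map(run_one, chunk_files))


if __name__ == "__main__":
    # Stand-in command: the Python interpreter echoes the chunk name.
    # Replace the lambda with the real per-chunk invocation for your setup.
    chunks = ["bar0000001.oeb.gz", "bar0000002.oeb.gz"]
    echo = "import sys; print(sys.argv[1])"
    print(run_per_chunk(chunks, lambda p: [sys.executable, "-c", echo, p]))
```

Threads are sufficient here because the real work happens in the child processes; `max_workers` should roughly match the number of CPUs you want to occupy.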
Example Commands
To break input.oeb.gz into 5 chunks, each with the same number of molecules:
prompt> chunker -in input.oeb.gz -base bar -nchunks 5
This creates:
bar0000001.oeb.gz
bar0000002.oeb.gz
bar0000003.oeb.gz
bar0000004.oeb.gz
bar0000005.oeb.gz
To break input.oeb.gz into chunks, each with 1000 multi-conformer molecules:
prompt> chunker -in input.oeb.gz -base foo -chunksize 1000
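When scripting around Chunker it can help to predict the output names in advance. The sketch below reproduces the pattern seen in the `-nchunks 5` listing above. The seven-digit width and the reuse of the `.oeb.gz` extension are inferred from that listing rather than documented guarantees, and `expected_chunk_names` is a hypothetical helper.

```python
def expected_chunk_names(base, nchunks, ext=".oeb.gz", width=7):
    """List the output names for nchunks chunks, numbered from 1.

    width=7 and ext are assumptions inferred from the example listing
    (bar0000001.oeb.gz); adjust them if your output differs.
    """
    return [f"{base}{i:0{width}d}{ext}" for i in range(1, nchunks + 1)]


for name in expected_chunk_names("bar", 5):
    print(name)
```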
Command Line Help
A description of the command line interface can be obtained by executing Chunker with the --help option.
prompt> chunker --help
This generates the following output:
Help functions:
chunker --help simple : Get a list of simple parameters
chunker --help all : Get a complete list of parameters
chunker --help defaults : List the defaults for all parameters
chunker --help <parameter> : Get detailed help on a parameter
chunker --help html : Create an html help file for this program
chunker --help versions : List the toolkits and versions used in the application
Required Parameters
-in <filename>
    Name of input file to chunk.
-base <NAME>
    Base name of output files. Output files will be sequentially numbered.
Exactly one of the following two options must also be given:
-nchunks N
    Create N new files, each with an equal number of conformers or molecules. Chunker reads through the entire file once to count the conformers or molecules, then creates the new files. The -countConfs flag selects whether conformers or molecules are counted. N must be a positive integer.
-chunksize M
    Create new files, each containing M molecules. M must be a positive integer.
Note
Only one of -nchunks or -chunksize can be used.
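A wrapper script can enforce the rule that exactly one of -nchunks and -chunksize is given before Chunker is ever launched. The following is a minimal sketch that uses only the options described above; `build_chunker_command` is a hypothetical helper, and it assumes the executable is on the path as `chunker`.

```python
def build_chunker_command(in_file, base, nchunks=None, chunksize=None):
    """Return an argv list for chunker, enforcing exactly one split mode."""
    # Reject both "neither given" and "both given".
    if (nchunks is None) == (chunksize is None):
        raise ValueError("specify exactly one of nchunks or chunksize")
    if nchunks is not None:
        flag, value = "-nchunks", nchunks
    else:
        flag, value = "-chunksize", chunksize
    # Both N and M must be positive integers.
    if not isinstance(value, int) or value <= 0:
        raise ValueError(f"{flag} must be a positive integer")
    return ["chunker", "-in", in_file, "-base", base, flag, str(value)]


print(" ".join(build_chunker_command("input.oeb.gz", "foo", chunksize=1000)))
```

The returned list can be passed directly to `subprocess.run`.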
Optional Parameters
-countConfs
    If -countConfs is set to true, the file is split to give approximately equal numbers of conformers in each chunk. The split always occurs at the end of a molecule, so all the conformers of each molecule are kept together. If -countConfs is set to false, the file is split to give equal numbers of molecules in each chunk. [default = true]
-pad_zeros
    Pad the sequence numbers in the output filenames with leading zeroes, which helps keep the files in order when sorting by filename. [default = true]
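To make the -countConfs behaviour concrete, here is one way a conformer-balanced split at molecule boundaries could work. It is an illustrative greedy sketch, not Chunker's actual algorithm, and `split_by_conformers` is a hypothetical function that operates on a plain list of per-molecule conformer counts.

```python
def split_by_conformers(conf_counts, nchunks):
    """Group molecule indices into contiguous chunks of similar conformer totals.

    conf_counts[i] is the number of conformers of molecule i. Molecules are
    never divided, so chunk totals are only approximately equal.
    """
    total = sum(conf_counts)
    chunks, current, seen = [], [], 0
    for index, count in enumerate(conf_counts):
        current.append(index)
        seen += count
        # Ideal cumulative conformer count at the end of the current chunk.
        boundary = total * (len(chunks) + 1) / nchunks
        # Close the chunk at this molecule boundary; the last chunk takes the rest.
        if seen >= boundary and len(chunks) < nchunks - 1:
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks


print(split_by_conformers([10, 1, 1, 10, 2, 8, 4], 2))
# [[0, 1, 2, 3], [4, 5, 6]]
```

In this example the 36 conformers end up as 22 and 14 rather than 18 and 18, because the cut can only fall between molecules.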