Polymorph Search with IEFF Crystal Force Field (Part II of CSP Protocol: Generation and Filtering)

This Floe is the second part of the Crystal Structure Prediction (CSP) protocol developed by OpenEye. Its goal is to predict the most stable crystal geometries, starting from a conformational ensemble. The energy function is the Intermolecular Energy Force Field (IEFF).

For each conformer in the ensemble, the electrostatic multipoles required by IEFF are evaluated. The Floe then samples over the list of space groups, generating crystal geometries and optimizing them with the IEFF force field. The lowest-energy structures are deduplicated and form the main result of this workflow.

The table below lists the 20 most frequent space groups with their respective frequencies (data taken from the spacegroup frequencies Table). Data are provided for general and for chiral space groups.

Space group    Frequency    Chiral space group    Frequency
14             35.1         19                    47.9
2              19.3         4                     30.1
19             9.01         1                     5.16
15             7.16         5                     4.32
4              5.66         18                    2.50
61             3.78         92                    1.40
62             1.54         20                    1.06
33             1.53         146                   0.787
9              0.999        96                    0.675
1              0.970        76                    0.627
60             0.897        152                   0.590
5              0.813        144                   0.431
29             0.725        173                   0.399
11             0.675        198                   0.367
12             0.515        169                   0.324
13             0.507        145                   0.324
148            0.485        154                   0.282
18             0.470        78                    0.271
7              0.367        170                   0.229
56             0.354        155                   0.229
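To illustrate how tabulated frequencies like these could guide sampling over space groups, here is a minimal Python sketch that draws space group numbers with probability proportional to their frequency. It is an assumption for illustration only: the source does not state how the Floe weights or selects space groups, and the names `SPACE_GROUP_FREQUENCY`, `normalized_weights`, and `sample_space_groups` are hypothetical. Only the ten most frequent general space groups from the table are included.

```python
import random

# Frequencies of the ten most common general space groups, copied from the
# table above. Keys are space group numbers.
SPACE_GROUP_FREQUENCY = {
    14: 35.1, 2: 19.3, 19: 9.01, 15: 7.16, 4: 5.66,
    61: 3.78, 62: 1.54, 33: 1.53, 9: 0.999, 1: 0.970,
}


def normalized_weights(frequencies):
    """Rescale the frequencies so that they sum to 1."""
    total = sum(frequencies.values())
    return {group: freq / total for group, freq in frequencies.items()}


def sample_space_groups(frequencies, n_samples, seed=0):
    """Draw space group numbers with probability proportional to frequency.

    Hypothetical helper: frequency-weighted sampling is an assumption here,
    not a documented behavior of the Floe.
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    groups = list(frequencies)
    weights = [frequencies[group] for group in groups]
    return rng.choices(groups, weights=weights, k=n_samples)


if __name__ == "__main__":
    print(normalized_weights(SPACE_GROUP_FREQUENCY))
    print(sample_space_groups(SPACE_GROUP_FREQUENCY, n_samples=10))
```

Weighting by frequency concentrates effort on the space groups most likely to host the observed polymorph, at the cost of rarely visiting uncommon ones; a uniform scan over a fixed list is the obvious alternative.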

Promoted Parameters

  • unique_confs (dataset_out) : Resulting dataset with all unique conformers in top IEFF crystal structures after rigid packing.
    Default: unique_confs
  • in (data_source) : Dataset with input molecules on which crystal polymorph predictions need to be performed.
  • failure (dataset_out) : Dataset containing records of failed jobs from the three stages of computation: QM multipoles, IEFF, or crystal visualization.
    Default: failure
  • out (dataset_out) : Resulting dataset with the lowest-energy, deduplicated crystal structures (in CIF format) predicted with the IEFF Crystal Force Field.
    Default: top_structures
  • qm_mults (dataset_out) : Resulting dataset with the computed QM Multipoles. Storing it is useful if the random packing stage needs to be redone without recomputing the QM Multipoles.
    Default: qm_mults

Extra Required Parameters

  • Temp Collection Name (collection_sink) : Name for the created collections.
    Default: IEFF Temp Crystal Packings Collection
  • Switch (boolean) : This parameter controls whether records are sent to the ‘true’ or ‘false’ port.
    Default: True
  • Collection Name (collection_sink) : Name for the created collections.
    Default: IEFF Crystal Packings Collection
  • Output Shard Format (string) : The format of the data that shards will contain.
    Default: oedb
    Choices: oedb, ism.gz, oez, oeb.gz, oeb
  • Records per shard parameter (integer) : Number of records in each shard. For optimal performance, the parameters ‘Size of the batch’ (batch_size), ‘Parallel Group Item Count’ (item_count), and ‘Records per shard parameter’ (records_per_shard) need to satisfy: records_per_shard = batch_size * item_count
    Default: 50
  • Random Packing Switch (boolean) : Controls whether random packing of the monomer in the crystal is performed or skipped.
    Default: True
  • Output Shard Format (string) : The format of the data that shards will contain.
    Default: oedb
    Choices: oedb, ism.gz, oez, oeb.gz, oeb
  • records_per_shard (integer) : The target number of records in a shard. A value of 0 means each shard is filled up to the max_shard_bytes limit.
    Default: 10
  • Hit List Size (integer) : The desired size of the hit list.
    Default: 1 Min: 1
  • Energy tag for global minimum (Field Type: Float) : Field holding the lattice energy that is used to find the global minimum.
    Default: IEFF Lattice Full Energy (kcal/mol)
  • Switch (boolean) : This parameter controls whether records are sent to the ‘true’ or ‘false’ port.
    Default: True
  • QM Multipoles Switch (boolean) : Controls whether QM Multipoles are computed or this step is skipped.
    Default: True
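
The relation records_per_shard = batch_size * item_count given for ‘Records per shard parameter’ can be checked before launching a job. The following is a minimal Python sketch of that arithmetic; the function names are hypothetical, and the batch_size and item_count values in the example are chosen only to reproduce the default of 50, not taken from the Floe.

```python
def optimal_records_per_shard(batch_size, item_count):
    """Return the shard size satisfying records_per_shard = batch_size * item_count."""
    if batch_size < 1 or item_count < 1:
        raise ValueError("batch_size and item_count must be positive integers")
    return batch_size * item_count


def shard_settings_consistent(records_per_shard, batch_size, item_count):
    """Check whether the three parameters satisfy the recommended relation."""
    return records_per_shard == batch_size * item_count


if __name__ == "__main__":
    # Illustrative values only: 10 * 5 matches the default of 50 records per shard.
    print(optimal_records_per_shard(batch_size=10, item_count=5))
    print(shard_settings_consistent(50, batch_size=10, item_count=5))
```

If either batch_size or item_count is changed, recomputing records_per_shard with the first function keeps the three settings aligned.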