MaaS Database Preparation
Prerequisite
The steps below assume that the MaaS client (maascli) has been installed as described in Requirements.
Input files
MaaS prep requires one or two input files, depending on whether you want to include conformers in the database.
Note
At the scale of 1B compounds, conformers are not currently supported.
The primary input file should be a SMILES (.ism) file containing the molecules of interest. There are a few requirements:
Titles must be included. MaaS uses titles for many cross-referencing features.
Titles must be unique. If more than one entry has the same title, all subsequent molecules with that title will be omitted from the database.
Molecules should be represented as registered, i.e. as chemists expect to see them. If a molecule is registered without stereochemistry, DO NOT enumerate its stereoisomers in this file. Stereo enumeration should instead be done on the input to Omega when generating conformers (see Conformers below).
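Before running prep, it can be worth checking the SMILES file against the first two requirements. The following is a minimal Python sketch, assuming the common .ism layout of one SMILES string per line followed by whitespace and the title; `check_ism_titles` is a hypothetical helper, not part of maascli.

```python
"""Sketch: check a SMILES (.ism) file for missing and duplicated titles.

Assumption: each line is "SMILES<whitespace>TITLE"; the title is taken to be
everything after the first run of whitespace.
"""
import sys
from collections import Counter
from typing import Iterable, List, Tuple


def check_ism_titles(lines: Iterable[str]) -> Tuple[List[int], List[str]]:
    """Return (line numbers without a title, titles that occur more than once)."""
    missing: List[int] = []
    titles: List[str] = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(None, 1)  # separate the SMILES from the rest of the line
        if len(parts) < 2 or not parts[1].strip():
            missing.append(lineno)
        else:
            titles.append(parts[1].strip())
    counts = Counter(titles)
    duplicates = sorted(title for title, n in counts.items() if n > 1)
    return missing, duplicates


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        missing, duplicates = check_ism_titles(fh)
    print(f"{len(missing)} line(s) without a title, first few: {missing[:10]}")
    print(f"{len(duplicates)} duplicated title(s), first few: {duplicates[:10]}")
```

Run it as `python check_ism.py emolecules.ism`. Any duplicated title it reports would, per the rule above, cause every molecule after the first with that title to be left out of the database.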
Conformers
If conformers are to be included, they should be created with the same titles as the corresponding molecules in the SMILES file. Omega requires undefined stereocenters to be enumerated before conformers can be created; this is done with the Flipper app (see the Flipper documentation).
As Flipper enumerates stereoisomers, each one retains the title of its input molecule. This means that all the stereoisomers in the conformer file can be associated with the matching racemic molecule in the SMILES file.
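The title-based association described above can be illustrated with a short Python sketch. It assumes the titles have already been read from the .ism and conformer files (for example with the OpenEye toolkits; that step is not shown) and that each entry in the conformer file is one multi-conformer stereoisomer. `match_conformer_titles` is a hypothetical helper for illustration only.

```python
"""Sketch: relate conformer-file titles back to SMILES-file titles.

Assumptions: titles are already extracted from both files, and each
conformer-file entry is one stereoisomer carrying its parent's title.
"""
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def match_conformer_titles(
    smiles_titles: Iterable[str], conformer_titles: Iterable[str]
) -> Tuple[Dict[str, int], List[str], List[str]]:
    """Return (stereoisomer entries per SMILES title,
    conformer titles with no SMILES match,
    SMILES titles with no conformer entries)."""
    smiles_titles = list(smiles_titles)
    known = set(smiles_titles)
    counts: Counter = Counter()
    unmatched: List[str] = []
    for title in conformer_titles:
        if title in known:
            counts[title] += 1  # another stereoisomer of this registered molecule
        else:
            unmatched.append(title)  # title not present in the SMILES file
    without_confs = [t for t in smiles_titles if t not in counts]
    return dict(counts), unmatched, without_confs
```

A molecule with n undefined stereocenters can expand to at most 2^n stereoisomers, so counts above 1 are expected; unmatched conformer titles usually indicate that titles were changed between the two files.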
Running prep
maascli database prep has a number of options, but in most cases the defaults are designed to give the best results. (For very large databases such as Enamine REAL, see the notes below.)
$ maascli database prep --help
Usage: maascli database prep [OPTIONS] ID TITLE INPUT MAASDB
  Prepare a new database for maas
  ID     : Unique ID, string with no spaces or special chars
  TITLE  : Title for this database, visible in the UI
  INPUT  : Input SMILES filename
  MAASDB : Output .maasdb filename
Options:
  --version VERSION               Version string to label database
  --confs CONFS                   OEZ or OEB file with conformers for same
                                  molecules as INPUT
  --sss [all|mdl|smarts]          Which fast substructure files to create.
  --fps [all|circular|circularvs|tree|treevs|path|pathvs]
                                  Which FP types to create.
  --numbits [512|1024|2048|4096]  Number of bits for fingerprint creation
  --tautomerType [reasonable|none]
                                  Tautomer normalization method.
  --progress [bar|stderr|log|none]
                                  Progress output style: [bar, stderr, log,
                                  none] [default: bar]
  --progressDelta FLOAT           Delta time for progress output. (for styles
                                  stderr and log) [default: 60.0]
  --force                         Overwrite existing files.
There are four required parameters: ID, TITLE, INPUT and MAASDB. ID must be unique; it is the identifier used to access this database through the API. TITLE is shown in the UI, so it can be as descriptive as needed; put it in quotes if it contains any spaces or special characters. If --version is not provided, the version is set to a date stamp for the current day (e.g. 2020-08-17).
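When prep is scripted, the four positional parameters and optional flags can be assembled programmatically. This is a minimal Python sketch that uses only options listed in the --help output above; `build_prep_command`, `run_prep` and `default_version` are hypothetical helpers, not part of maascli. Two interpretations are assumptions: the ID rule "no spaces or special chars" is read as letters, digits and underscores only, and `default_version` merely mirrors the documented YYYY-MM-DD date stamp.

```python
"""Sketch: assemble and run a `maascli database prep` command from Python.

Assumptions: ID is restricted to letters, digits and underscores (one reading
of "no spaces or special chars"); the date stamp format follows the
documented example 2020-08-17.
"""
import re
import shlex
import subprocess
from datetime import date
from typing import List, Optional

# Assumed interpretation of "string with no spaces or special chars".
_ID_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def default_version(today: Optional[date] = None) -> str:
    """Return a date stamp in the same form as the documented default."""
    return (today if today is not None else date.today()).isoformat()


def build_prep_command(db_id: str, title: str, input_ism: str, maasdb: str,
                       confs: Optional[str] = None,
                       version: Optional[str] = None,
                       force: bool = False) -> List[str]:
    """Return an argv list for maascli database prep."""
    if not _ID_PATTERN.fullmatch(db_id):
        raise ValueError(f"invalid database ID: {db_id!r}")
    # TITLE is one list element, so spaces in it need no shell quoting here.
    cmd = ["maascli", "database", "prep", db_id, title, input_ism, maasdb]
    if confs is not None:
        cmd += ["--confs", confs]
    if version is not None:
        cmd += ["--version", version]
    if force:
        cmd.append("--force")
    return cmd


def run_prep(cmd: List[str]) -> None:
    """Echo the equivalent shell command, then run it; raises on non-zero exit."""
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Passing an argv list to `subprocess.run` sidesteps the quoting advice for TITLE, since no shell parses the arguments; `shlex.join` is only used to print a copy-pasteable equivalent.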
The example session below also uses the --confs flag to include the .oez file of conformers.
$ ls
emolecules.ism emolecules_confs.oez
$ maascli database prep emolecules "eMolecules" emolecules.ism emolecules.maasdb --confs emolecules_confs.oez
maas prep
ID: emolecules
Title: eMolecules
Input: /Users/bob/dev/git/maas-server/foo/emolecules.ism
Output: /Users/bob/dev/git/maas-server/foo/emolecules.maasdb
Protomer canonicalization: reasonable
Confs: /Users/bob/dev/git/maas-server/foo/emolecules_confs.oez
Including FP types: (4096 bits)
Path
Circular
Tree
PathVS
TreeVS
CircularVS
Including Substructure search types:
MDL SubSearch Screen
SMARTS SubSearch Screen
Creating title map [============================================================] 100.00% 00:00:00
Creating [============================================================] 100.00% 00:00:01
Writing circular [============================================================] 100.00% 00:00:00
Writing circularvs [============================================================] 100.00% 00:00:00
Writing path [============================================================] 100.00% 00:00:00
Writing pathvs [============================================================] 100.00% 00:00:00
Writing tree [============================================================] 100.00% 00:00:00
Writing treevs [============================================================] 100.00% 00:00:00
Writing SMARTS [============================================================] 100.00% 00:00:00
Writing MDL [============================================================] 100.00% 00:00:00
$ ls
emolecules.ism emolecules.maasdb-pathvs.fpbin emolecules.maasdb-tree.fpbin
emolecules.maasdb emolecules.maasdb-sss-MDL.oeb emolecules.maasdb-treevs.fpbin
emolecules.maasdb-circular.fpbin emolecules.maasdb-sss-MDL.oeb.idx emolecules.maasdb.json
emolecules.maasdb-circularvs.fpbin emolecules.maasdb-sss-SMARTS.oeb emolecules_confs.oez
emolecules.maasdb-path.fpbin emolecules.maasdb-sss-SMARTS.oeb.idx maas_prep.log
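The listing above shows what a run with the default fingerprint and substructure settings leaves next to the .maasdb file. A quick completeness check after prep could compare a directory against that set. This is a minimal Python sketch; the suffix lists are copied from the example listing, are assumed to shrink if --fps or --sss is restricted, and leave out maas_prep.log since it is a log rather than part of the database. `expected_outputs` and `missing_outputs` are hypothetical helpers.

```python
"""Sketch: check that a prep run produced the files seen in the example listing.

Assumption: default --fps/--sss settings; trim FP_TYPES or SSS_TYPES if
those options were restricted.
"""
from pathlib import Path
from typing import List

# Suffixes copied from the example `ls` output.
FP_TYPES = ["circular", "circularvs", "path", "pathvs", "tree", "treevs"]
SSS_TYPES = ["MDL", "SMARTS"]


def expected_outputs(maasdb: str) -> List[str]:
    """File names written alongside MAASDB in the example listing."""
    names = [maasdb, f"{maasdb}.json"]
    names += [f"{maasdb}-{fp}.fpbin" for fp in FP_TYPES]
    for sss in SSS_TYPES:
        names += [f"{maasdb}-sss-{sss}.oeb", f"{maasdb}-sss-{sss}.oeb.idx"]
    return names


def missing_outputs(maasdb: str, directory: str = ".") -> List[str]:
    """Return the expected file names that do not exist in `directory`."""
    base = Path(directory)
    return [name for name in expected_outputs(maasdb) if not (base / name).exists()]
```

Run from the directory shown above, `missing_outputs("emolecules.maasdb")` should return an empty list; any names it returns point to an incomplete or interrupted run.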