To illustrate the use of the domain index we will consider user ARNACHM1_TEST, table NCI_OPEN with column SMILES containing structure. The test data must be loaded prior to running the commands in this section. The commands should be executed from the command line prompt after logging in to SQL*Plus as the ARNACHM1_TEST user.
To perform a substructure search use the substructure(<column_name>, <query>, <max_hits>) operator, where <query> is a SMARTS pattern or MDL query block and <max_hits> is the maximum number of hits. Set <max_hits> to -1 to retrieve all rows that match. In the case where <query> is a SMARTS pattern, <query> must be a varchar2 string. In the case where <query> is an MDL query block, <query> must be a CLOB. The selection of apprpriate data types through a programming API is demonstrated in the examples section. An example SMARTS query:
select id from nci_open where c$arnachm1.substructure(smiles, '[#6]c2nc1ccccc1o2', -1) = 1;
The mdl_substructure operator may be used to perform a substructure search using an MDL query block represented by a string, although the operator has been deprecated. The varchar2 data type has a limit of 4000 characters, and as such may not be safe to use in some instances. It is preferable to use the substructure operator with the MDL query block represented by a CLOB data type. The query can use ‘|’ in place of newlines when executing queries pl/sql, although this is unnecessary in java or python.
select id from nci_open where c$arnachm1.mdl_substructure(SMILES, 'Cc2nc1ccccc1o2 JME 2002.05 Tue Mar 17 15:39:01 PDT 2009 10 11 V2000 5.9792 1.4000 0.0000 C 0.0000 2.1000 0.0000 C 0.0000 0.7000 0.0000 C 1.2124 2.8000 0.0000 C 1.2124 0.0000 0.0000 C 3.7564 2.5326 0.0000 N 3.7564 0.2674 0.0000 O 4.5792 1.4000 0.0000 C 2.4249 2.1000 0.0000 C 2.4249 0.7000 0.0000 C 1 8 1 0 2 3 2 0 2 4 1 0 3 5 1 0 4 9 2 0 5 10 2 0 6 8 2 0 6 9 1 0 7 8 1 0 7 10 1 0 9 10 1 0 M END ', 10000) = 1;
For exact match use the exactMatch(<column_name>, <query>, <max_hits>) where <query> is a smiles structure. For example,
select id from nci_open where c$arnachm1.exactMatch(smiles, 'O=C1C(=CC(=O)C=C1)C', -1) = 1;
The similarity(<column_name>, <query>, <min_similarity>, <max_hits>) operator retrieves all rows that have Tanimoto similarity of at least <min_similarity> to the query structure.
select id from nci_open where c$arnachm1.similarity(smiles, 'CCCc1ccc(cc1)S(=O)(=O)Nc2cc(on2)C', 0.80, -1) = 1;
An ancillary operator smililarityScore is provided to retrieve similarity scores:
select c$arnachm1.similarityScore(1), id from nci_open where c$arnachm1.similarity(smiles, 'CCCc1ccc(cc1)S(=O)(=O)Nc2cc(on2)C', 0.60, -1, 1) = 1;
The integer argument in the ancillary operator must match the extra integer argument in the similarity function (this is an Oracle requirement). In this case matching ROWIDs are returned directly in a PL/SQL table (from tableIndexSubstructSqlFilter) and are cast to a SQL table using the table operator.