OpenEye-drconvert API Reference
Conversion to Records
- class drconvert.MolFileConverter(path: str, options: Optional[drconvert.drconvert.MolConversionOptions] = None, display_name: Optional[str] = None)
Convert a molecular file to OERecords
- __iter__() → Generator[openeye.oechem.OEMolRecord, None, None]
Iterator of OERecords
- class drconvert.MolConversionOptions(isomeric_conf_test=False, schema_limit=0, clear_mol_data=True, unique_values_limit=25, sample_percent=100.0, generic_tags=[], mol_title_field='', smiles_field_name='Original SMILES', schema=None, error_field=None)
This class contains options for conversion from molecules to records.
Constructor arguments (and their defaults) are:
isomeric_conf_test (False): If True, consecutive input molecules with matching graphs are stored as conformers in a multiconformer molecule.
schema_limit (0): The maximum number of input rows to sample. The default is to sample all rows.
clear_mol_data (True): If True, molecule data (SD and generic) is not copied to each conformer from the parent molecule.
unique_values_limit (25): The number of unique values will be counted up to this limit. If the number of unique values in a field is below this limit, metadata will be added to the field indicating that the data is categorical.
sample_percent (100.0): If specified, this percentage of values are sampled and data types are perceived from that sample. The default is to examine all values.
generic_tags ([]): If provided, generic data is transferred from the input molecule to the output record for the specified tags.
mol_title_field (“”): If provided, the titles of molecules are extracted into a separate field.
smiles_field_name (“Original SMILES”): The name of a field to retain the original SMILES strings from the input file, before conversion to molecules.
schema (None): If specified, the provided OERecord defines a schema that will strictly control the interpretation of input values. The field types and metadata will be transferred to the output records.
error_field (None): If this value is provided, it designates a destination field on the output records to show data parsing errors.
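The `isomeric_conf_test` option hinges on the word "consecutive": only adjacent input molecules with matching graphs are merged into one multiconformer molecule. The sketch below illustrates that rule with `itertools.groupby`, assuming plain strings (such as canonical isomeric SMILES) as stand-ins for the graph comparison drconvert performs on real molecules. `group_consecutive_conformers` is a hypothetical helper, not part of the API.

```python
from itertools import groupby
from typing import Iterable, List, Tuple

# Hypothetical stand-in for an input molecule: (graph_key, conformer_label).
# drconvert compares real molecule graphs; here the key is just a string.
Molecule = Tuple[str, str]


def group_consecutive_conformers(
    mols: Iterable[Molecule],
) -> List[Tuple[str, List[str]]]:
    """Merge runs of adjacent molecules that share a graph key.

    A graph that reappears later in the stream, after a different graph,
    starts a new group instead of joining the earlier one.
    """
    groups = []
    for key, run in groupby(mols, key=lambda m: m[0]):
        groups.append((key, [conf for _, conf in run]))
    return groups


stream = [
    ("CCO", "conf1"),
    ("CCO", "conf2"),
    ("c1ccccc1", "conf1"),
    ("CCO", "conf3"),  # same graph as the first run, but not adjacent to it
]
for key, confs in group_consecutive_conformers(stream):
    print(key, confs)
```

The last ethanol entry ends up as a separate molecule, which is why sorting or grouping the input file beforehand matters when this option is enabled.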
- class drconvert.CSVConverter(path: str, options: Optional[drconvert.drconvert.CSVConversionOptions] = None, display_name: Optional[str] = None)
Convert a CSV file to OERecords
- __iter__() → Generator[openeye.oechem.OEMolRecord, None, None]
Iterator of OERecords
- class drconvert.CSVConversionOptions(schema_limit=0, unique_values_limit=25, delimiter=None, sample_percent=100.0, smiles_field_name='Original SMILES', schema=None, error_field=None)
This class contains options for converting .csv files to records.
Constructor arguments (and their defaults) are:
schema_limit (0): The maximum number of input rows to sample. The default is to sample all rows.
unique_values_limit (25): The number of unique values will be counted up to this limit. If the number of unique values in a field is below this limit, metadata will be added to the field indicating that the data is categorical.
delimiter (None): The character to use as the CSV delimiter. By default, the delimiter is detected automatically.
sample_percent (100.0): If specified, this percentage of values are sampled and data types are perceived from that sample. The default is to examine all values.
smiles_field_name (“Original SMILES”): The name of a field to retain the original SMILES strings from the input file, before conversion to molecules.
schema (None): If specified, the provided OERecord defines a schema that will strictly control the interpretation of input values. The field types and metadata will be transferred to the output records.
error_field (None): If this value is provided, it designates a destination field on the output records to show data parsing errors.
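When `delimiter` is left as None, the delimiter is detected automatically. How drconvert performs that detection is not documented here, so the sketch below uses the standard library's `csv.Sniffer` as an assumed stand-in, restricted to a few common separators. `read_rows` is a hypothetical helper.

```python
import csv
import io
from typing import List, Optional, Tuple


def read_rows(text: str, delimiter: Optional[str] = None) -> Tuple[str, List[List[str]]]:
    """Parse CSV text, sniffing the delimiter when none is given.

    Returns the delimiter actually used together with the parsed rows.
    """
    if delimiter is None:
        # Inspect only the start of the text and consider common separators.
        dialect = csv.Sniffer().sniff(text[:4096], delimiters=",\t;|")
        delimiter = dialect.delimiter
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    return delimiter, rows


tab_text = "SMILES\tName\tpIC50\nCCO\tethanol\t5.2\nc1ccccc1\tbenzene\t4.8\n"
sep, rows = read_rows(tab_text)
print(repr(sep), rows[0])
```

Passing an explicit `delimiter` skips detection entirely, which is the safer choice for files whose first lines are unrepresentative.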
- class drconvert.OEDUConverter(path: str, options: Optional[drconvert.drconvert.MolConversionOptions] = None, display_name: Optional[str] = None)
Convert an OEDesignUnit (.oedu) file to OERecords
- __iter__() → Generator[openeye.oechem.OEMolRecord, None, None]
Iterator of OERecords
- class drconvert.SmilesFileConverter(path: str, options: Optional[drconvert.drconvert.MolConversionOptions] = None, display_name: Optional[str] = None)
Convert a SMILES file to OERecords
- __iter__() → Generator[openeye.oechem.OEMolRecord, None, None]
Iterator of OERecords
- drconvert.get_converter(path: Union[str, pathlib.Path], display_name: Optional[str] = None) → Union[drconvert.drconvert.CSVConverter, drconvert.drconvert.OEDBConverter, drconvert.drconvert.OEDUConverter, drconvert.drconvert.SmilesFileConverter, drconvert.drconvert.MolFileConverter]
Returns the appropriate converter for a specified file.
- Parameters
path (str | Path) – The path to the file to be converted
display_name (str or None) – The display name for the file
- Returns
An instance of a converter class
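`get_converter` chooses a converter class based on the file it is given. Which file types map to which class is not documented here, so the extension table in this sketch is an assumption made purely for illustration. `pick_converter_name` is a hypothetical helper that returns class names rather than constructing converters.

```python
from pathlib import Path
from typing import Union

# ASSUMED mapping for illustration only; drconvert's real dispatch rules
# are not documented in this reference.
_EXTENSION_TO_CONVERTER = {
    ".csv": "CSVConverter",
    ".oedb": "OEDBConverter",
    ".oedu": "OEDUConverter",
    ".smi": "SmilesFileConverter",
    ".ism": "SmilesFileConverter",
}


def pick_converter_name(path: Union[str, Path]) -> str:
    """Return the name of the converter class a path would dispatch to."""
    suffix = Path(path).suffix.lower()
    # Unrecognised suffixes fall back to the generic molecule-file reader.
    return _EXTENSION_TO_CONVERTER.get(suffix, "MolFileConverter")


print(pick_converter_name("assays/results.CSV"))
print(pick_converter_name(Path("library.sdf")))
```

With the real API, the returned converter is consumed by iteration, for example `for record in drconvert.get_converter(path): ...`, since each converter's `__iter__` yields OEMolRecord objects.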
Conversion from Records
- class drconvert.RecordConvertToCSV(path: str, delimiter: str = ',', include_mol_title: bool = True)
Convert a record file to CSV format
- __iter__() → Generator[str, None, None]
Iterator of CSV lines.
Returns the header as the first item, followed by the rows of the CSV.
- Returns
Iterator of CSV rows as strings
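The header-then-rows contract of `RecordConvertToCSV.__iter__` can be reproduced with the standard `csv` module. This sketch assumes plain dicts in place of records and a hypothetical `Title` column standing in for whatever column `include_mol_title` controls; `iter_csv_lines` is not part of drconvert.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List


def iter_csv_lines(records: Iterable[Dict[str, object]], fields: List[str],
                   delimiter: str = ",",
                   include_mol_title: bool = True) -> Iterator[str]:
    """Yield the header line first, then one CSV line per record."""
    columns = (["Title"] if include_mol_title else []) + fields
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")

    def emit(row: List[object]) -> str:
        # Write one row into the shared buffer, return its text, then reset.
        writer.writerow(row)
        line = buf.getvalue()
        buf.seek(0)
        buf.truncate(0)
        return line

    yield emit(columns)
    for rec in records:
        # Missing fields become empty cells.
        yield emit([rec.get(col, "") for col in columns])


records = [{"Title": "aspirin", "MW": 180.16}, {"Title": "ethanol", "MW": 46.07}]
for line in iter_csv_lines(records, ["MW"]):
    print(line, end="")
```

Because it is a generator, lines can be streamed to a file or HTTP response without building the whole CSV in memory.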
- class drconvert.RecordConvertToMols(path: str)
Convert a record file to OEMols
- __iter__() → Generator[openeye.oechem.OEMol, None, None]
Iterator of OEMols
- class drconvert.RecordConvertToOEDU(path: str)
Convert a record file to OEDU
- __iter__() → Generator[openeye.oechem.OEDesignUnit, None, None]
Iterator of OEDUs
- class drconvert.ArchiveConverter(path: str, chunk_size: int = 16777216)
Convert a tarball or zip file into OERecords
- __iter__() → Generator[Union[drconvert.drconvert.CSVConverter, drconvert.drconvert.OEDBConverter, drconvert.drconvert.OEDUConverter, drconvert.drconvert.SmilesFileConverter, drconvert.drconvert.MolFileConverter], None, None]
Iterator of converter instances (the same classes returned by get_converter); iterate each converter to obtain its OERecords
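`ArchiveConverter` takes a `chunk_size` defaulting to 16777216 bytes (16 MiB). How drconvert uses that value is not documented here; this sketch assumes it bounds the read size while extracting archive members, and shows the pattern for zip files using `shutil.copyfileobj`. `iter_extracted_members` is hypothetical and yields temporary file paths where the real class yields converters.

```python
import io
import os
import shutil
import tempfile
import zipfile
from typing import Iterator

# The documented default chunk_size: 16777216 bytes = 16 MiB.
DEFAULT_CHUNK_SIZE = 16 * 1024 * 1024


def iter_extracted_members(archive: zipfile.ZipFile,
                           chunk_size: int = DEFAULT_CHUNK_SIZE) -> Iterator[str]:
    """Copy each file member to a temporary path and yield that path.

    shutil.copyfileobj reads at most chunk_size bytes at a time, so memory
    use per member stays bounded. The yielded paths exist only while the
    generator is open; the temporary directory is removed afterwards.
    """
    with tempfile.TemporaryDirectory() as tmp:
        for index, info in enumerate(archive.infolist()):
            if info.is_dir():
                continue
            # Prefix with the member index so equal basenames cannot collide
            # and directory components inside the archive are ignored.
            name = f"{index}_{os.path.basename(info.filename)}"
            target = os.path.join(tmp, name)
            with archive.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst, chunk_size)
            # A real archive converter would choose a converter for this
            # file here (e.g. via get_converter); the sketch yields the path.
            yield target


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.smi", "CCO ethanol\n")
    zf.writestr("sub/b.csv", "SMILES,Name\nCCO,ethanol\n")

buf.seek(0)
with zipfile.ZipFile(buf) as zf:
    for path in iter_extracted_members(zf, chunk_size=4):
        print(os.path.basename(path))
```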
- drconvert.record_to_mol(record)
Convert a single record to an OEMol.
- Parameters
strict – Fail if a field with data on the record cannot be added as generic data.
fallback_to_non_primary – If True, the first molecule field encountered is used when a primary molecule field is not present on the record.
strip_images – If True, fields containing images are replaced with the string ‘<Image>’.
- Returns
An OEMol with the record data attached as generic data
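The documented `strict` and `strip_images` options can be pictured on a plain dict of field values. In this sketch the `Image` marker type, the set of storable types, and silently skipping unstorable fields when `strict` is False are all assumptions; only the ‘<Image>’ placeholder string comes from the description above. `fields_to_generic_data` is hypothetical.

```python
from typing import Any, Dict


class Image(bytes):
    """Hypothetical marker type for image-valued fields."""


# ASSUMPTION: only these value types can be attached as generic data.
_STORABLE = (str, int, float, bool)


def fields_to_generic_data(fields: Dict[str, Any], strict: bool = False,
                           strip_images: bool = False) -> Dict[str, Any]:
    """Mimic the strict / strip_images rules on a plain dict of fields."""
    out: Dict[str, Any] = {}
    for name, value in fields.items():
        if isinstance(value, Image):
            # Replace image data with the documented placeholder string.
            out[name] = "<Image>" if strip_images else value
        elif isinstance(value, _STORABLE):
            out[name] = value
        elif strict:
            raise ValueError(f"field {name!r} cannot be stored as generic data")
        # Non-strict: the unstorable field is skipped (an assumption).
    return out


fields = {"Name": "aspirin", "MW": 180.16,
          "Depiction": Image(b"\x89PNG"), "Extra": [1, 2]}
print(fields_to_generic_data(fields, strip_images=True))
```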
- drconvert.record_to_du(record)
Convert a single record with a design unit to an OEDesignUnit.
Only the design unit on the record is converted; other field data is not preserved, so the conversion is not round-trippable.
Conversion to Alternate Formats
DRConvert also supports formats other than OpenEye’s own. The currently supported formats are pandas DataFrames and Apache Parquet.
These formats require extra libraries that DRConvert does not depend on. To use them, install pandas>=0.25.0,<0.26.0 and pyarrow>=0.14.1,<0.15.0.
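Because pandas and pyarrow are optional, a caller may want to check for them before using the modules below. This sketch shows one common pattern for that check; `require` and `in_pinned_range` are hypothetical helpers, not part of DRConvert, and the simple numeric comparison ignores pre-release tags (the `packaging` library's `Version` class handles full PEP 440 versions).

```python
import importlib
import re
from types import ModuleType
from typing import Tuple


def _version_tuple(version: str) -> Tuple[int, ...]:
    """Leading numeric parts padded to three, e.g. "0.25" -> (0, 25, 0)."""
    nums = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        nums.append(int(match.group()))
    nums += [0] * (3 - len(nums))
    return tuple(nums)


def in_pinned_range(version: str, lower: str, upper: str) -> bool:
    """True if lower <= version < upper, as in ">=0.25.0,<0.26.0"."""
    return _version_tuple(lower) <= _version_tuple(version) < _version_tuple(upper)


def require(module_name: str) -> ModuleType:
    """Import an optional dependency, or raise ImportError with a hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is required for DRConvert's alternate formats; "
            f"please install it"
        ) from exc


print(in_pinned_range("0.25.3", "0.25.0", "0.26.0"))
```

In practice one would call `require("pandas")` and then test `in_pinned_range(pandas.__version__, "0.25.0", "0.26.0")` before importing the DRConvert modules.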
Pandas
- drconvert.pandas.read_record_file_to_dataframe(path: str, serializable: bool = False) → Union[pandas.core.series.Series, pandas.core.frame.DataFrame]
Reads a record file into a pandas DataFrame
Requires pandas; if it is not installed, importing this module raises an ImportError.
- Parameters
path (string) – Path to record file
serializable (bool) – If True, keep all non-POD types as bytes. This is required to serialize the DataFrame in some cases.
- Returns
A pandas DataFrame representation of the record file
- drconvert.pandas.record_to_dataframe(record: openeye.oechem.OEMolRecord, serializable: bool = False) → pandas.core.frame.DataFrame
Converts an OERecord to a pandas DataFrame. Requires pandas; if it is not installed, importing this module raises an ImportError.
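The `serializable` flag keeps non-POD values as bytes. The sketch below applies that rule to a dict of field values and produces a one-row column mapping. Using `pickle` for the byte encoding is an assumption, since the encoding DRConvert uses is not documented here, and `record_to_row` is a hypothetical helper.

```python
import pickle
from typing import Any, Dict

# Plain-old-data types that can sit directly in a DataFrame cell.
_POD = (str, int, float, bool, type(None))


def record_to_row(fields: Dict[str, Any], serializable: bool = False) -> Dict[str, Any]:
    """Build a one-row column mapping from a record's field values.

    With serializable=True every non-POD value becomes bytes; pickle is an
    ASSUMED stand-in for whatever encoding DRConvert actually applies.
    """
    row: Dict[str, Any] = {}
    for name, value in fields.items():
        if serializable and not isinstance(value, _POD) and not isinstance(value, bytes):
            row[name] = pickle.dumps(value)
        else:
            row[name] = value
    return row


row = record_to_row({"Name": "aspirin", "MW": 180.16, "Tags": {"a", "b"}},
                    serializable=True)
print(type(row["Tags"]).__name__)
```

With pandas installed, `pandas.DataFrame([row])` turns such a mapping into a single-row DataFrame.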
Parquet
- drconvert.parquet.parquet_to_dataframe(path)
Returns a DataFrame read from a Parquet file
Requires pandas and pyarrow; if either is not installed, importing this module raises an ImportError.
- Parameters
path (string) – Path to parquet file
- Returns
DataFrame
- drconvert.parquet.record_file_to_parquet(path, output_path, compression='snappy')
Reads a record file and writes out a Parquet file
Requires pandas and pyarrow; if either is not installed, importing this module raises an ImportError.
- Parameters
path (string) – Path to record file
output_path (string) – Path to write parquet file
compression (string) – Compression codec; valid options are ‘snappy’, ‘gzip’, ‘brotli’, or None. The default is ‘snappy’.
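Since only four `compression` values are listed, a caller can validate the argument up front so that a typo fails immediately, before any conversion work starts. `check_compression` below is a hypothetical caller-side helper, not part of DRConvert.

```python
from typing import Optional

# The compression values documented for record_file_to_parquet.
VALID_COMPRESSION = {"snappy", "gzip", "brotli", None}


def check_compression(compression: Optional[str] = "snappy") -> Optional[str]:
    """Return the compression argument unchanged, or raise ValueError."""
    if compression not in VALID_COMPRESSION:
        raise ValueError(
            f"unsupported compression {compression!r}; "
            "expected 'snappy', 'gzip', 'brotli', or None"
        )
    return compression


print(check_compression("gzip"))
```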