OpenEye-drconvert API Reference

Conversion to Records

class drconvert.MolFileConverter(path: str, options: MolConversionOptions | None = None, display_name: str | None = None)

Convert a molecular file to OERecords

__iter__() → Generator[OEMolRecord, None, None]: Iterator of OERecords
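A minimal usage sketch, assuming the package is importable as drconvert, as the class paths above suggest. The helper names take and preview_mol_file and the file name in the trailing comment are hypothetical. Because a converter is simply iterated, standard iterator tools apply to it:

```python
from itertools import islice


def take(converter, n):
    """Return at most n items from any iterable converter without reading the rest."""
    return list(islice(converter, n))


def preview_mol_file(path, n=5):
    """Convert a molecule file and return its first n records.

    The import is deferred so this module loads even when
    OpenEye-drconvert is not installed.
    """
    import drconvert

    # Store consecutive molecules with matching graphs as conformers.
    options = drconvert.MolConversionOptions(isomeric_conf_test=True)
    converter = drconvert.MolFileConverter(path, options=options)
    return take(converter, n)


# Usage (requires drconvert and a real input file):
#     for record in preview_mol_file("ligands.sdf"):
#         print(record)
```

Only the first n records are pulled from the generator, which keeps a preview cheap for large input files.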

class drconvert.MolConversionOptions(isomeric_conf_test=False, schema_limit=0, clear_mol_data=True, unique_values_limit=25, sample_percent=100.0, generic_tags=[], mol_title_field='', smiles_field_name='Original SMILES', schema=None, error_field=None)

This class contains options for conversion from molecules to records.

Constructor arguments (and their defaults) are:

isomeric_conf_test (False): If True, consecutive input molecules with matching graphs are stored as conformers in a multiconformer molecule.

schema_limit (0): The maximum number of input rows to sample. The default is to sample all rows.

clear_mol_data (True): If True, molecule data (SD and generic) is not copied to each conformer from the parent molecule.

unique_values_limit (25): The number of unique values will be counted up to this limit. If the number of unique values in a field is below this limit, metadata will be added to the field indicating that the data is categorical.

sample_percent (100.0): If specified, this percentage of values is sampled and data types are perceived from that sample. The default is to examine all values.

generic_tags ([]): If provided, generic data is transferred from the input molecule to the output record for the specified tags.

mol_title_field (“”): If provided, the titles of molecules are extracted into a separate field.

smiles_field_name (“Original SMILES”): The name of a field to retain the original SMILES strings from the input file, before conversion to molecules.

schema (None): If specified, the provided OERecord defines a schema that will strictly control the interpretation of input values. The field types and metadata will be transferred to the output records.

error_field (None): If this value is provided, it designates a destination field on the output records to show data parsing errors.
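The unique_values_limit rule can be illustrated in plain Python. This is only a sketch of the behavior described above, not drconvert's implementation; the function name looks_categorical is hypothetical.

```python
def looks_categorical(values, unique_values_limit=25):
    """Return True if the number of distinct values stays below the limit.

    Counting stops as soon as the limit is reached, so a field with many
    distinct values is never scanned in full.
    """
    seen = set()
    for value in values:
        seen.add(value)
        if len(seen) >= unique_values_limit:
            return False
    return len(seen) < unique_values_limit
```

Stopping at the limit bounds the memory used per field, which is presumably why the option is described as counting "up to this limit".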

class drconvert.CSVConverter(path: str, options: CSVConversionOptions | None = None, display_name: str | None = None)

Convert a CSV file to OERecords

__iter__() → Generator[OEMolRecord, None, None]: Iterator of OERecords

class drconvert.CSVConversionOptions(schema_limit=0, unique_values_limit=25, delimiter=None, sample_percent=100.0, smiles_field_name='Original SMILES', schema=None, error_field=None)

This class contains options for converting .csv files to records.

Constructor arguments (and their defaults) are:

schema_limit (0): The maximum number of input rows to sample. The default is to sample all rows.

unique_values_limit (25): The number of unique values will be counted up to this limit. If the number of unique values in a field is below this limit, metadata will be added to the field indicating that the data is categorical.

delimiter (None): What character to use for the CSV delimiter. By default, the delimiter is automatically detected.

sample_percent (100.0): If specified, this percentage of values is sampled and data types are perceived from that sample. The default is to examine all values.

smiles_field_name (“Original SMILES”): The name of a field to retain the original SMILES strings from the input file, before conversion to molecules.

schema (None): If specified, the provided OERecord defines a schema that will strictly control the interpretation of input values. The field types and metadata will be transferred to the output records.

error_field (None): If this value is provided, it designates a destination field on the output records to show data parsing errors.
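A short sketch of passing an explicit delimiter, assuming the package is importable as drconvert. The helpers delimiter_for and open_csv_converter are hypothetical, as is the choice to treat .tsv and .tab files as tab-separated. Any other extension leaves delimiter at None so that automatic detection applies.

```python
from pathlib import Path


def delimiter_for(path):
    """Pick a tab delimiter for .tsv/.tab files; None requests auto-detection."""
    suffix = Path(path).suffix.lower()
    return "\t" if suffix in (".tsv", ".tab") else None


def open_csv_converter(path):
    """Build a CSVConverter for path (deferred import; requires drconvert)."""
    import drconvert

    options = drconvert.CSVConversionOptions(delimiter=delimiter_for(path))
    return drconvert.CSVConverter(path, options=options)


# Usage (requires drconvert and a real input file):
#     for record in open_csv_converter("assay.tsv"):
#         print(record)
```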

class drconvert.OEDUConverter(path: str, options: MolConversionOptions | None = None, display_name: str = None)

Convert an OEDesignUnit (.oedu) file to OERecords

__iter__() → Generator[OEMolRecord, None, None]: Iterator of OERecords

class drconvert.SmilesFileConverter(path: str, options: MolConversionOptions | None = None, display_name: str | None = None)

Convert a SMILES file to OERecords

__iter__() → Generator[OEMolRecord, None, None]: Iterator of OERecords

Returns the appropriate converter for a specified file.

Parameters:

path (str | Path) – The path to the file to be converted
display_name (str or None) – The display name for the file

Returns:

An instance of a converter class

Conversion from Records

class drconvert.RecordConvertToCSV(path: str, delimiter: str = ',', include_mol_title: bool = True)

Convert a record file to CSV format

__iter__() → Generator[str, None, None]

Iterator of CSV lines.

Returns header as the first item, followed by the rows of the CSV

Returns:

Iterator of CSV rows as strings
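A sketch of streaming the yielded lines to disk. The reference does not say whether each line already ends with a newline, so this example normalizes the endings; that normalization and the helper names with_newlines and export_csv are assumptions.

```python
def with_newlines(lines):
    """Yield each CSV line with exactly one trailing newline."""
    for line in lines:
        yield line.rstrip("\r\n") + "\n"


def export_csv(record_path, out_path, delimiter=","):
    """Write a record file to out_path as CSV (deferred import; requires drconvert)."""
    import drconvert

    converter = drconvert.RecordConvertToCSV(record_path, delimiter=delimiter)
    with open(out_path, "w", encoding="utf-8", newline="") as handle:
        # The first line written is the header, followed by the data rows.
        handle.writelines(with_newlines(converter))
```

Because both the converter and with_newlines are generators, the file is written row by row rather than being built in memory.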

class drconvert.RecordConvertToMols(path: str)

Convert a record file to OEMols

__iter__() → Generator[OEMol, None, None]: Iterator of OEMols

class drconvert.RecordConvertToOEDU(path: str)

Convert a record file to OEDU

__iter__() → Generator[OEDesignUnit, None, None]: Iterator of OEDUs

class drconvert.ArchiveConverter(path: str, chunk_size: int = 16777216)

Convert a tarball or zip file into OERecords

__iter__() → Generator[CSVConverter | OEDBConverter | OEDUConverter | SmilesFileConverter | MolFileConverter | ShapeQueryConverter, None, None]: Iterator of converters, each of which yields the OERecords for one file in the archive
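Going by the return annotation above, each item produced by an ArchiveConverter is itself a converter, so reading every record takes two levels of iteration. A minimal sketch, with the helper names iter_archive_records and count_archive_records being hypothetical:

```python
from itertools import chain


def iter_archive_records(archive):
    """Flatten an iterable of converters into a single stream of records."""
    return chain.from_iterable(archive)


def count_archive_records(path):
    """Count all records in an archive (deferred import; requires drconvert)."""
    import drconvert

    archive = drconvert.ArchiveConverter(path)
    return sum(1 for _ in iter_archive_records(archive))
```

chain.from_iterable is lazy, so each inner converter is consumed only when the stream reaches it.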

drconvert.record_to_mol(record)

Convert a single record to an OEMol.

Parameters:

strict – Fail if a field with data on the record cannot be added as generic data.
fallback_to_non_primary – If True, then the first molecule field encountered will be used if a primary molecule field is not present on the record.
strip_images – If True, fields containing images will be replaced with the string ‘<Image>’.

Returns:

An OEMol with the record data attached as generic data.

drconvert.record_to_du(record)

Convert a single record with a designunit to an OEDesignUnit.

This function converts only the design unit on the record and does not preserve other field data, so the conversion cannot be round-tripped.

Conversion to Alternate Formats

DRConvert supports conversion to formats other than OpenEye’s own. The currently supported formats are pandas DataFrames and Apache Parquet.

These formats require extra libraries that DRConvert does not depend on. To use them, install pandas>=0.25.0,<0.26.0 and pyarrow>=0.14.1,<0.15.0.

Pandas

drconvert.pandas.read_record_file_to_dataframe(path: str, serializable: bool = False) → Series | DataFrame

Reads a record file into a pandas DataFrame

Requires pandas; if it is not installed, importing will raise an ImportError

Parameters:

path (string) – Path to record file
serializable (bool) – If True, keep all non-POD types as bytes. This is required to serialize the DataFrame in some cases

Returns:

A pandas DataFrame representation of the record file

drconvert.pandas.record_to_dataframe(record: OEMolRecord, serializable: bool = False) → DataFrame

Converts an OERecord to a pandas DataFrame.

Requires pandas; if it is not installed, importing will raise an ImportError

Parameters:

record (OERecord) – An OERecord
serializable (bool) – If True, keep all non-POD types as bytes. This is required to serialize the DataFrame in some cases

Returns:

A pandas DataFrame representation of the record
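Since importing the pandas support raises an ImportError when pandas is missing, callers may want to guard the import. A sketch of one such guard follows; the helper names load_optional and record_file_to_frame are hypothetical, and raising RuntimeError is a design choice of this example, not part of drconvert.

```python
import importlib


def load_optional(module_name):
    """Import and return a module, or return None if it cannot be imported."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def record_file_to_frame(path, serializable=False):
    """Read a record file into a DataFrame, with a clearer error if support is missing."""
    dr_pandas = load_optional("drconvert.pandas")
    if dr_pandas is None:
        raise RuntimeError("drconvert.pandas is unavailable; install drconvert and pandas")
    return dr_pandas.read_record_file_to_dataframe(path, serializable=serializable)
```

Pass serializable=True if the resulting DataFrame will be pickled or otherwise serialized, because non-POD values are then kept as bytes.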

Parquet

drconvert.parquet.parquet_to_dataframe(path)

Returns a DataFrame read from a Parquet file

Requires pandas and pyarrow; if either is not installed, importing will raise an ImportError

Parameters:

path (string) – Path to Parquet file

Returns:

DataFrame

drconvert.parquet.record_file_to_parquet(path, output_path, compression='snappy')

Reads a record file and writes out a Parquet file

Requires pandas and pyarrow; if either is not installed, importing will raise an ImportError

Parameters:

path (string) – Path to record file
output_path (string) – Path to write parquet file
compression (string) – Compression format, valid options are ‘snappy’, ‘gzip’, ‘brotli’, None. Default is ‘snappy’
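A sketch that checks the compression argument against the options listed above before starting a conversion. The helper names check_compression and export_parquet are hypothetical, and the up-front check is an addition of this example; drconvert may perform its own validation.

```python
VALID_COMPRESSION = ("snappy", "gzip", "brotli", None)


def check_compression(compression):
    """Return compression unchanged if it is a documented option, else raise ValueError."""
    if compression not in VALID_COMPRESSION:
        raise ValueError(f"unsupported compression: {compression!r}")
    return compression


def export_parquet(path, output_path, compression="snappy"):
    """Convert a record file to Parquet (deferred import; requires drconvert, pandas, pyarrow)."""
    from drconvert.parquet import record_file_to_parquet

    record_file_to_parquet(path, output_path, compression=check_compression(compression))
```

Validating first means a mistyped option fails immediately rather than after the record file has been read.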