OpenEye-drconvert API Reference
Conversion to Records
- class drconvert.MolFileConverter(path: str, options: MolConversionOptions | None = None, display_name: str | None = None)
Convert a molecule file to OERecords.
- __iter__() -> Generator[OEMolRecord, None, None]
Iterator of OERecords
- class drconvert.MolConversionOptions(isomeric_conf_test=False, schema_limit=0, clear_mol_data=True, unique_values_limit=25, sample_percent=100.0, generic_tags=[], mol_title_field='', smiles_field_name='Original SMILES', schema=None, error_field=None)
This class contains options for conversion from molecules to records.
Constructor arguments (and their defaults) are:
isomeric_conf_test (False): If True, consecutive input molecules with matching graphs are stored as conformers in a multiconformer molecule.
schema_limit (0): The maximum number of input rows to sample. The default is to sample all rows.
clear_mol_data (True): If True, molecule data (SD and generic) is not copied to each conformer from the parent molecule.
unique_values_limit (25): The number of unique values will be counted up to this limit. If the number of unique values in a field is below this limit, metadata will be added to the field indicating that the data is categorical.
sample_percent (100.0): The percentage of values to sample when perceiving data types. The default is to examine all values.
generic_tags ([]): If provided, generic data is transferred from the input molecule to the output record for the specified tags.
mol_title_field (''): If provided, the titles of molecules are extracted into a separate field.
smiles_field_name ('Original SMILES'): The name of a field to retain the original SMILES strings from the input file, before conversion to molecules.
schema (None): If specified, the provided OERecord defines a schema that will strictly control the interpretation of input values. The field types and metadata will be transferred to the output records.
error_field (None): If this value is provided, it designates a destination field on the output records to show data parsing errors.
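The options above might be combined as in the following minimal sketch. The helper names (`mol_option_kwargs`, `iter_mol_records`) and their parameters are hypothetical and not part of drconvert; only the documented `isomeric_conf_test`, `mol_title_field`, and `generic_tags` arguments are passed through. The drconvert import is deferred so the helpers can be defined without the package installed.

```python
from itertools import islice


def mol_option_kwargs(as_conformers=False, title_field="", tags=()):
    """Build keyword arguments for MolConversionOptions (hypothetical helper).

    Arguments left at their documented defaults are omitted, so the
    library's own defaults stay in effect.
    """
    kwargs = {}
    if as_conformers:
        kwargs["isomeric_conf_test"] = True
    if title_field:
        kwargs["mol_title_field"] = title_field
    if tags:
        kwargs["generic_tags"] = list(tags)
    return kwargs


def iter_mol_records(path, limit=None, **option_kwargs):
    """Yield records from a molecule file, stopping after `limit` if given."""
    # Deferred import: this line needs the drconvert package installed.
    from drconvert import MolConversionOptions, MolFileConverter

    options = MolConversionOptions(**option_kwargs)
    converter = MolFileConverter(path, options=options)
    # islice with limit=None simply iterates the whole converter.
    yield from islice(converter, limit)
```

A call such as `iter_mol_records("ligands.sdf", limit=10, **mol_option_kwargs(title_field="Title"))` would then yield at most ten records; the file name here is a placeholder.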
- class drconvert.CSVConverter(path: str, options: CSVConversionOptions | None = None, display_name: str | None = None)
Convert a CSV file to OERecords
- __iter__() -> Generator[OEMolRecord, None, None]
Iterator of OERecords
- class drconvert.CSVConversionOptions(schema_limit=0, unique_values_limit=25, delimiter=None, sample_percent=100.0, smiles_field_name='Original SMILES', schema=None, error_field=None)
This class contains options for converting .csv files to records.
Constructor arguments (and their defaults) are:
schema_limit (0): The maximum number of input rows to sample. The default is to sample all rows.
unique_values_limit (25): The number of unique values will be counted up to this limit. If the number of unique values in a field is below this limit, metadata will be added to the field indicating that the data is categorical.
delimiter (None): What character to use for the CSV delimiter. By default, the delimiter is automatically detected.
sample_percent (100.0): The percentage of values to sample when perceiving data types. The default is to examine all values.
smiles_field_name ('Original SMILES'): The name of a field to retain the original SMILES strings from the input file, before conversion to molecules.
schema (None): If specified, the provided OERecord defines a schema that will strictly control the interpretation of input values. The field types and metadata will be transferred to the output records.
error_field (None): If this value is provided, it designates a destination field on the output records to show data parsing errors.
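As a hedged sketch of the CSV options in use, the helpers below pass an explicit delimiter and schema limit only when the caller sets them, so automatic delimiter detection remains the default. The helper names are hypothetical, and the single-character check is an assumption drawn from the wording "what character to use", not a documented rule.

```python
def csv_option_kwargs(delimiter=None, schema_limit=0):
    """Build keyword arguments for CSVConversionOptions (hypothetical helper)."""
    kwargs = {}
    if delimiter is not None:
        # Assumption: the delimiter is a single character such as "," or "\t".
        if len(delimiter) != 1:
            raise ValueError("delimiter must be a single character")
        kwargs["delimiter"] = delimiter
    if schema_limit:
        kwargs["schema_limit"] = schema_limit
    return kwargs


def iter_csv_records(path, delimiter=None, schema_limit=0):
    """Return an iterator over the records converted from a CSV file."""
    # Deferred import: this line needs the drconvert package installed.
    from drconvert import CSVConversionOptions, CSVConverter

    options = CSVConversionOptions(**csv_option_kwargs(delimiter, schema_limit))
    return iter(CSVConverter(path, options=options))
```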
- class drconvert.OEDUConverter(path: str, options: MolConversionOptions | None = None, display_name: str | None = None)
Convert an OEDesignUnit oedu file to OERecords
- __iter__() -> Generator[OEMolRecord, None, None]
Iterator of OERecords
- class drconvert.SmilesFileConverter(path: str, options: MolConversionOptions | None = None, display_name: str | None = None)
Convert a SMILES file to OERecords
- __iter__() -> Generator[OEMolRecord, None, None]
Iterator of OERecords
- drconvert.get_converter(path: str | Path, display_name: str | None = None) -> CSVConverter | OEDBConverter | OEDUConverter | SmilesFileConverter | MolFileConverter | ShapeQueryConverter
Returns the appropriate converter for a specified file.
- Parameters:
path (str | Path) – The path to the file to be converted
display_name (str | None) – The display name for the file
- Returns:
An instance of a converter class
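A minimal sketch of calling `get_converter` follows. Falling back to the file name as the display name is a choice made for this example, not documented behavior, and `default_display_name` and `open_converter` are hypothetical helpers.

```python
from pathlib import Path


def default_display_name(path):
    """Use the final path component as a display name (example convention)."""
    return Path(path).name


def open_converter(path, display_name=None):
    """Return the converter that drconvert selects for `path`."""
    # Deferred import: this line needs the drconvert package installed.
    from drconvert import get_converter

    path = Path(path)
    name = display_name or default_display_name(path)
    return get_converter(path, display_name=name)
```

Because every converter class documented above is iterable, the result can be consumed with a plain `for record in open_converter(...)` loop regardless of the input format.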
Conversion from Records
- class drconvert.RecordConvertToCSV(path: str, delimiter: str = ',', include_mol_title: bool = True)
Convert a record file to CSV format
- __iter__() -> Generator[str, None, None]
Iterator of CSV lines.
Yields the header as the first item, followed by the rows of the CSV.
- Returns:
Iterator of CSV Rows as strings
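Since the iterator yields the header and rows as strings, writing a CSV file can be sketched as below. Whether each yielded line already ends with a newline is not stated above, so the hypothetical `with_newlines` helper handles both cases; `record_file_to_csv` is likewise an illustrative name.

```python
def with_newlines(lines):
    """Yield each line terminated by exactly one trailing newline if missing."""
    for line in lines:
        yield line if line.endswith("\n") else line + "\n"


def record_file_to_csv(record_path, csv_path, delimiter=","):
    """Write a record file out as CSV and return the number of lines written."""
    # Deferred import: this line needs the drconvert package installed.
    from drconvert import RecordConvertToCSV

    rows = RecordConvertToCSV(record_path, delimiter=delimiter)
    written = 0
    # newline="" prevents the platform from translating line endings.
    with open(csv_path, "w", encoding="utf-8", newline="") as handle:
        for line in with_newlines(rows):
            handle.write(line)
            written += 1
    return written  # includes the header line
```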
- class drconvert.RecordConvertToMols(path: str)
Convert a record file to OEMols
- __iter__() -> Generator[OEMol, None, None]
Iterator of OEMols
- class drconvert.RecordConvertToOEDU(path: str)
Convert a record file to OEDU
- __iter__() -> Generator[OEDesignUnit, None, None]
Iterator of OEDUs
- class drconvert.ArchiveConverter(path: str, chunk_size: int = 16777216)
Convert a tarball or zip file into OERecords
- __iter__() -> Generator[CSVConverter | OEDBConverter | OEDUConverter | SmilesFileConverter | MolFileConverter | ShapeQueryConverter, None, None]
Iterator of converters, each of which iterates over OERecords
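The return annotation of `ArchiveConverter.__iter__` lists converter classes rather than records. Assuming that each yielded item is itself an iterable converter, which the annotation suggests but the text does not confirm, the records of a whole archive could be flattened as in this sketch. Both helper names are hypothetical.

```python
from itertools import chain


def flatten_converters(converters):
    """Chain the items of several iterables into one lazy stream."""
    return chain.from_iterable(converters)


def iter_archive_records(path):
    """Iterate over every record produced from the files in an archive."""
    # Deferred import: this line needs the drconvert package installed.
    from drconvert import ArchiveConverter

    # Assumption: each item yielded by ArchiveConverter is a converter
    # that can be iterated to obtain records.
    return flatten_converters(ArchiveConverter(path))
```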
- drconvert.record_to_mol(record)
Convert a single record to an OEMol.
- Parameters:
strict – Fail if a field with data on the record cannot be added as generic data.
fallback_to_non_primary – If True, then the first molecule field encountered will be used if a primary molecule field is not present on the record.
strip_images – If True, fields containing images will be replaced with the string '<Image>'.
- Returns:
An OEMol with the record data attached as generic data.
- drconvert.record_to_du(record)
Convert a single record with a designunit to an OEDesignUnit.
This function converts only the design unit on the record and does not preserve other field data, so the conversion is not round-trippable.
Conversion to Alternate Formats
DRConvert supports alternatives to OpenEye's formats. The currently supported formats are pandas DataFrames and Apache Parquet.
These formats require extra libraries that DRConvert does not depend on. To use them, install pandas>=0.25.0,<0.26.0 and pyarrow>=0.14.1,<0.15.0.
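Because these libraries are optional, it can be useful to check for them before importing the pandas or parquet helpers. The sketch below uses only the standard library; `missing_optional_dependencies` is a hypothetical helper, and it checks that each module is importable, not that its version falls in the range given above.

```python
from importlib.util import find_spec


def missing_optional_dependencies(modules=("pandas", "pyarrow")):
    """Return the names of top-level modules that cannot be found."""
    # find_spec returns None for a top-level module that is not installed,
    # and it does not import the module itself.
    return [name for name in modules if find_spec(name) is None]
```

An empty list means both libraries were found; any names returned indicate what still needs to be installed.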
Pandas
Copyright (C) 2023 Cadence Design Systems, Inc. (Cadence)
- drconvert.pandas.read_record_file_to_dataframe(path: str, serializable: bool = False) -> Series | DataFrame
Reads a record file into a pandas DataFrame.
Requires pandas; if it is not installed, importing will raise an ImportError.
- Parameters:
path (string) – Path to record file
serializable (bool) – If True, keep all non-POD types as bytes. This is required to serialize the DataFrame in some cases.
- Returns:
A pandas DataFrame representation of the record file
- drconvert.pandas.record_to_dataframe(record: OEMolRecord, serializable: bool = False) -> DataFrame
Converts an OERecord to a pandas DataFrame.
Requires pandas; if it is not installed, importing will raise an ImportError.
Parquet
- drconvert.parquet.parquet_to_dataframe(path)
Returns a DataFrame read from a parquet file.
Requires pandas and pyarrow; if they are not installed, importing will raise an ImportError.
- Parameters:
path (string) – Path to parquet file
- Returns:
DataFrame
- drconvert.parquet.record_file_to_parquet(path, output_path, compression='snappy')
Reads a record file and writes out a parquet file.
Requires pandas and pyarrow; if they are not installed, importing will raise an ImportError.
- Parameters:
path (string) – Path to record file
output_path (string) – Path to write parquet file
compression (string) – Compression format. Valid options are 'snappy', 'gzip', 'brotli', and None. The default is 'snappy'.
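A minimal sketch of exporting a record file to parquet, checking the compression value against the options listed above before calling the library. The `check_compression` and `export_parquet` helpers are hypothetical; how drconvert itself reacts to an unsupported value is not documented here.

```python
VALID_COMPRESSION = ("snappy", "gzip", "brotli", None)


def check_compression(compression):
    """Return `compression` unchanged if it is one of the documented options."""
    if compression not in VALID_COMPRESSION:
        raise ValueError(f"unsupported compression: {compression!r}")
    return compression


def export_parquet(path, output_path, compression="snappy"):
    """Convert a record file to a parquet file with a validated compression."""
    # Deferred import: this line needs drconvert, pandas, and pyarrow installed.
    from drconvert.parquet import record_file_to_parquet

    record_file_to_parquet(
        path, output_path, compression=check_compression(compression)
    )
```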