Frequently Asked Questions

When should hardware requirements be changed?

The memory, disk space, and thread count parameters strongly affect how efficiently QM calculations run. The defaults of 14 GB memory, 25 GB disk space, and 8 CPUs are conservative for the default methods and basis sets (HF/6-31G for geometry optimizations and B3LYP/6-31G* for single point energies) applied to most drug-like molecules (up to 50 heavy atoms). If you run calculations with a non-default method or basis set, we recommend performing smaller-scale benchmark calculations to confirm that your resource settings are sufficient.

How should hardware metrics be checked in QM calculations?

By default, the memory, disk space, and CPU usage metrics are turned on for all three Cubes that use Gaussian software (Run Gaussian Input Directories, Gaussian Single Point Energy, and Gaussian Geometry Optimization). You can therefore see how much memory and disk space a Cube actually used during a Floe by following these steps:

  1. Go to the Floe tab, click on Jobs, then select your running or finished Floe.

  2. Zoom in on the Gaussian calculation Cubes in the Floe (see "Why are there multiple serial cubes instead of a parallel cube?" below).

  3. Make sure to select a Cube that had one or more records pass through it.

  4. Click on the ellipsis.

  5. Select Cube Metrics.

  6. A graph will appear showing how each Cube metric changes over time.

    1. You can deactivate a metric you do not want to see by clicking on its name in the top right.

    2. Moving the cursor over the graph shows the value of each line, which is helpful because the metrics have different units. Memory and disk space are shown in MB, and CPU is shown as a percentage of the requested CPUs (for example, 100 means that all 8 of the 8 requested CPUs are in use).

Figure: Gaussian Floe Metrics
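As a rough aid for reading the metrics graph, the unit conversions described in the steps above can be sketched in Python. The function names are hypothetical, and the sketch assumes 1 GB = 1024 MB, which this document does not state.

```python
def cpus_in_use(cpu_percent, requested_cpus=8):
    """Convert the CPU metric (percent of requested CPUs) into a CPU count."""
    return cpu_percent / 100.0 * requested_cpus


def fraction_of_request(used_mb, requested_gb):
    """Fraction of a requested resource in use (assumes 1 GB = 1024 MB)."""
    return used_mb / (requested_gb * 1024.0)


if __name__ == "__main__":
    # With the default request of 8 CPUs, a reading of 100 means all 8 are busy.
    print(cpus_in_use(100))
    # A memory reading of 7168 MB against the 14 GB default request.
    print(fraction_of_request(7168, 14))
```

If a benchmark run stays well below 1.0 for memory and disk space, the requested resources are probably sufficient; values close to 1.0 suggest raising the request before scaling up.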

How much will a Floe cost?

This is a difficult question to answer. The computational cost of a QM calculation depends on both the size of your molecule and the method and basis set you have chosen. When changing the method or basis set, or moving to much larger molecules (>50 heavy atoms), it is best to run a small benchmark calculation while monitoring the Cube metrics on Orion.

What method and basis set should be used?

There is no universal right answer to this question. Before making changes, consider whether the default method and basis set are appropriate, which elements and ions are included in your system, and the type of calculation being performed.

The defaults chosen for each Floe are intended to be affordable and reasonably accurate for most neutral drug-like molecules. HF/6-31G is the default for geometry optimizations and B3LYP/6-31G* is the default for single point energies. Both use relatively small basis sets. If your calculations include charged species or elements heavier than krypton, a larger basis set may be required. Any time you change the defaults, monitor the Cube metrics to make sure the memory, disk space, and thread count settings are sufficient. Consult the Gaussian documentation for more information on DFT methods and basis sets.

Remember to think about the calculation being performed before changing the default method and basis set. If you need minimized geometries at a high level of theory, it can be more efficient to start with a lower level of theory. For example, if starting with conformers from Omega, which uses a force field to set bond lengths and angles, a geometry optimization at the default level of theory should be performed first. Then perform a second optimization at the higher level of theory.

How can Gaussian log files be used to understand failures?

A variety of errors can cause Gaussian input files to fail. The Gaussian QM Run Input Files Failed Calculation tutorial demonstrates the two ways calculations can fail: (1) an input file is rejected by Orion, or (2) the Gaussian calculation itself fails.

Failures in the first category are relatively easy to understand. The hardware requirements in the Gaussian input file must agree with the resources requested on Orion. In this case, the Failure Report should provide enough information to show how to change the input file so that it succeeds. It is best practice not to specify thread count, memory, or disk space in an input file. Instead, use the Orion Hardware Requirements to meet your needs.
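Before uploading, you can check whether an input file sets its own hardware limits. The following is a minimal Python sketch, not the validation Orion performs. It assumes the settings appear as the Link 0 commands %mem, %nproc, or %nprocshared, or as the MaxDisk route keyword. The function name and sample input are hypothetical.

```python
import re

# Link 0 commands for memory and threads, plus the MaxDisk route keyword.
HARDWARE_PATTERN = re.compile(
    r"^\s*%(mem|nproc|nprocshared)\s*=|maxdisk\s*=", re.IGNORECASE
)


def find_hardware_settings(input_text):
    """Return the lines of a Gaussian input file that set hardware limits."""
    return [
        line.strip()
        for line in input_text.splitlines()
        if HARDWARE_PATTERN.search(line)
    ]


# Illustrative input only; the coordinates are approximate.
SAMPLE = """%chk=water.chk
%mem=16GB
%nprocshared=8
# B3LYP/6-31G* MaxDisk=25GB

water single point

0 1
O  0.000  0.000  0.117
H  0.000  0.757 -0.471
H  0.000 -0.757 -0.471

"""

if __name__ == "__main__":
    for line in find_hardware_settings(SAMPLE):
        print("Consider removing:", line)
```

Any line reported here is a candidate for removal, so that the Orion Hardware Requirements are the only place where resources are set.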

Understanding Gaussian calculation failures can be more complex. The best option is to download the resulting archive files from a calculation. The archive contains a sub-directory for each calculation, and the log files in it show where in the calculation Gaussian failed. For help understanding these log files, consult the Gaussian documentation or see Gaussian's instructions for technical support. In addition, there are many freely available resources about common Gaussian errors, including those in the Compute Canada Documentation and Zhe Wang's Blog.
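Once the archive is downloaded and extracted, a first pass over the log files can be automated. The Python sketch below relies on the termination messages that Gaussian writes to its log ("Normal termination" or "Error termination"). The function names are hypothetical, and the sketch assumes the log files use the .log extension.

```python
from pathlib import Path


def classify_log(log_text):
    """Classify a Gaussian log by its termination message."""
    if "Error termination" in log_text:
        return "error"
    if "Normal termination" in log_text:
        return "normal"
    # No termination message: the job may have been stopped or cancelled.
    return "incomplete"


def last_lines(log_text, count=5):
    """Return the last non-empty lines, where the error details usually are."""
    lines = [line for line in log_text.splitlines() if line.strip()]
    return lines[-count:]


def summarize_logs(root):
    """Map each .log file under root to its status and closing lines."""
    root = Path(root)
    summary = {}
    for path in sorted(root.rglob("*.log")):
        text = path.read_text(errors="replace")
        key = path.relative_to(root).as_posix()
        summary[key] = (classify_log(text), last_lines(text))
    return summary


if __name__ == "__main__":
    # Point this at the directory where the output archive was extracted.
    for name, (status, tail) in summarize_logs(".").items():
        print(name, status)
        if status != "normal":
            print("\n".join(tail))
```

The closing lines of a failed log are a good starting point when searching the Gaussian documentation for the cause.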

Why are there multiple serial cubes instead of a parallel cube?

In all three Floes in this package, there are many copies of the Cubes that perform the Gaussian calculation. Orion has a variety of Cube types. Parallel Cubes request resources on AWS on demand, based on the number of inputs. While this should be ideal for scaling up any calculation, Parallel Cubes also have some important limitations. The primary concern for Gaussian calculations is that Parallel Cubes have a strict 12-hour time limit, and calculations that exceed it are cancelled. Given the flexibility of calculations performed with Gaussian input files, and the variety of molecule and basis set sizes in the other Floes, this time limit needed to be avoided. Instead, all Gaussian Floes use 10 copies of a serial Cube. Serial Cubes do not have time limits, and the copies allow for some parallelization across multiple calculations in the same job. To reduce wall clock time, start multiple jobs to allow for more parallelization.
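The effect of starting multiple jobs can be estimated with a short calculation. The Python sketch below is an idealized estimate, not a description of Orion's scheduling. It assumes that every calculation takes the same time and that each serial Cube copy runs one calculation at a time. The function name is hypothetical.

```python
import math


def estimated_wall_hours(n_calcs, hours_per_calc, n_jobs=1, cubes_per_job=10):
    """Idealized wall clock estimate with calculations run in equal batches."""
    slots = n_jobs * cubes_per_job  # calculations that can run at once
    batches = math.ceil(n_calcs / slots)
    return batches * hours_per_calc


if __name__ == "__main__":
    # 100 calculations of 2 hours each, in one job and split across four jobs.
    print(estimated_wall_hours(100, 2.0))
    print(estimated_wall_hours(100, 2.0, n_jobs=4))
```

Real calculations vary in length, so treat the result as a rough guide when deciding how many jobs to start.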

How do you create datasets on Orion?

Datasets on Orion can be created by uploading any standard molecule file (mol2, sdf, xyz, etc.). These files are automatically converted to Datasets with one record per molecule. If you upload a file type that has coordinates but no connectivity information (e.g. XYZ), make sure to inspect the uploaded molecules. When these files are converted to molecules on Orion, connectivity and bond order are perceived from atom distances, which can lead to unexpected atomic formal charges or implicit hydrogens. At this time, molecules with implicit hydrogens will fail in Gaussian calculations on Orion.

Datasets can also be created from SMILES files or by sketching a molecule on Orion. In these cases, the molecules do not have coordinates and cannot be used with the Gaussian Floes. However, if you also have access to the small-molecule-discovery-suite, there are many Floes that can help generate conformers, such as OMEGA - 3D Conformer Ensemble Generation.

How do you rerun a Gaussian calculation from a checkpoint file?

Checkpoint files are only generated if they are specified in the Gaussian input file with the Link 0 entry %chk. If you have completed a calculation with a checkpoint file, that file can be used as part of the input for a future calculation. The only requirement is that the checkpoint file is placed in the same subdirectory as the input file that uses it before the files are archived into a tar or zip file for the calculation. For more information about checkpoint files, refer to the Gaussian documentation for input files.
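When preparing a restart, you can check the directory layout before archiving. The Python sketch below looks for the %chk entry in each .com or .gjf file and reports any referenced checkpoint file that is not in the same subdirectory. The function name is hypothetical. The sketch compares the file name exactly as written in the input file, and it is only meaningful for restarts, since a first run creates its checkpoint file during the calculation.

```python
import re
from pathlib import Path

CHK_PATTERN = re.compile(r"^\s*%chk\s*=\s*(\S+)", re.IGNORECASE | re.MULTILINE)
INPUT_EXTENSIONS = {".com", ".gjf"}


def missing_checkpoints(root):
    """Map each input file to a referenced checkpoint missing from its directory."""
    root = Path(root)
    missing = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in INPUT_EXTENSIONS:
            continue
        match = CHK_PATTERN.search(path.read_text(errors="replace"))
        if match and not (path.parent / match.group(1)).is_file():
            missing[path.relative_to(root).as_posix()] = match.group(1)
    return missing


if __name__ == "__main__":
    # Point this at the directory you are about to archive.
    for input_file, chk_name in missing_checkpoints(".").items():
        print(f"{input_file}: {chk_name} is not in the same subdirectory")
```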

Can GPUs be used with Gaussian on Orion?

The short answer: no.

This decision was made by weighing the cost increase and availability of GPUs on Orion against the expected speedup. For now, the trade-off does not seem to be worth it. This may change in future releases. If you need GPUs for your Gaussian calculations, please contact support@eyesopen.com.

How should files for the Run Gaussian Input File Floe be created and organized?

The Gaussian QM Run Input Files Floe looks for Gaussian input files with .com or .gjf extensions. This Floe can also take tar or zip files with multiple input directories, and it can accept multiple files of any type as input. Uncompressed tar files can be processed, as can archives compressed with zip, gzip, bzip2, or lzma. If you have multiple .com or .gjf Gaussian files, you can keep all the files in the main directory or organize them into sub-directories in any way before archiving.

When there are multiple input files in the same directory, a new subdirectory is created for each file (with the same name as the file). If duplicate files or directories are found when parsing all input to the Floe, an integer is added to the end of the sub-directory name to reduce the risk of lost data. However, this could cause confusion when analyzing the output, so it is best practice to make sure all Gaussian input files have unique names, even if multiple archives are passed to the same Floe.
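The naming advice above can be checked before an archive is built. The Python sketch below collects the .com and .gjf files under a directory, refuses to continue if two of them share a name, and otherwise writes a gzip-compressed tar file that preserves the sub-directory layout. The function names are hypothetical. Comparing names without their extensions is an assumption made to be on the safe side, not a documented rule of the Floe.

```python
import tarfile
from collections import Counter
from pathlib import Path

INPUT_EXTENSIONS = {".com", ".gjf"}


def find_inputs(root):
    """Return all Gaussian input files under root, in sorted order."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in INPUT_EXTENSIONS
    )


def duplicate_stems(paths):
    """Return file names (without extension) that occur more than once."""
    counts = Counter(p.stem for p in paths)
    return sorted(stem for stem, n in counts.items() if n > 1)


def archive_inputs(root, archive_path):
    """Write the input files to a .tar.gz archive, keeping sub-directories."""
    root = Path(root)
    inputs = find_inputs(root)
    dupes = duplicate_stems(inputs)
    if dupes:
        raise ValueError(f"Input file names are not unique: {dupes}")
    names = [p.relative_to(root).as_posix() for p in inputs]
    with tarfile.open(archive_path, "w:gz") as tar:
        for path, name in zip(inputs, names):
            tar.add(path, arcname=name)
    return names


if __name__ == "__main__":
    example = [Path("a/water.com"), Path("b/water.gjf"), Path("b/methane.com")]
    print(duplicate_stems(example))
```

Writing the archive outside the input directory keeps it out of any later archive built from the same directory.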

The output from all calculations in one job is saved to a single output file. The output file must be a tar file, with or without compression. The type of tar file is determined by the extension. If an unsupported extension is provided, a tar file without compression is created instead (for example, if the output file is named Gaussian_output.zip, Gaussian_output.tar is created).
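The fallback rule for output names can be illustrated with a short Python sketch. The list of supported extensions below is an assumption made for illustration, since this document does not list the extensions the Floe recognizes. The function name is hypothetical, and the modes are those used by Python's tarfile module.

```python
from pathlib import Path

# Assumed extensions (not taken from the Floe) and their tarfile write modes.
TAR_MODES = {
    ".tar": "w",
    ".tar.gz": "w:gz",
    ".tgz": "w:gz",
    ".tar.bz2": "w:bz2",
    ".tar.xz": "w:xz",
}


def resolve_output_name(name):
    """Return (file name, tarfile mode), falling back to an uncompressed tar."""
    lower = name.lower()
    # Check longer extensions first so ".tar.gz" is not mistaken for ".gz".
    for ext in sorted(TAR_MODES, key=len, reverse=True):
        if lower.endswith(ext):
            return name, TAR_MODES[ext]
    # Unsupported extension: swap it for ".tar" and write without compression.
    return str(Path(name).with_suffix(".tar")), "w"


if __name__ == "__main__":
    print(resolve_output_name("Gaussian_output.tar.gz"))
    print(resolve_output_name("Gaussian_output.zip"))
```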

Why did the Gaussian input file change after running on Orion?

When Gaussian files are run on Orion, they are validated in several ways, and in one case the file may be changed. Any references to memory, thread count, or disk space in the input file are compared to the hardware requirements requested for the Floe. If the input file specifies more memory, disk space, or threads than were requested, the calculation will fail.

There is only one situation in which an input file is changed: if it specifies a path for any of the Gaussian output files, those paths are reset. Cubes on Orion have limited write access, so output file paths are reset so that the files are written to the directory where the calculation runs. This means those files will be included in the output.

What should you know before running a large calculation with a Gaussian input file?

The ability to run a Gaussian calculation with any input file allows for flexible calculations. However, the default hardware requirements for this Floe were set to the same levels as for the Single Point Energy and Geometry Optimization Floes. That is, they should be sufficient for DFT calculations on most drug molecules (<50 heavy atoms) with relatively small basis sets (e.g. B3LYP/6-31G*). When running calculations with larger molecules or higher levels of theory, keep a close eye on the metrics on Orion.

It is also best practice to always save your checkpoint file by adding the line %chk=[filename].chk to your input file. These files save the state of your system and are useful for restarting a calculation when necessary.
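If you generate input files with a script, the checkpoint line can be added automatically. The Python sketch below prepends a %chk line when an input file does not already have one. The function name and sample input are hypothetical, and the coordinates are only illustrative.

```python
import re

CHK_LINE = re.compile(r"^\s*%chk\s*=", re.IGNORECASE | re.MULTILINE)


def ensure_checkpoint(input_text, chk_name):
    """Return the input text with a %chk line, adding one if it is missing."""
    if CHK_LINE.search(input_text):
        return input_text  # keep the existing checkpoint setting
    # Link 0 commands such as %chk go before the route section.
    return f"%chk={chk_name}\n{input_text}"


# Illustrative geometry optimization input at the default level of theory.
SAMPLE_INPUT = """# HF/6-31G Opt

water optimization

0 1
O  0.000  0.000  0.117
H  0.000  0.757 -0.471
H  0.000 -0.757 -0.471

"""

if __name__ == "__main__":
    print(ensure_checkpoint(SAMPLE_INPUT, "water.chk"))
```

Using a checkpoint name that matches the input file name makes it easier to pair the two files when analyzing the output archive.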