                      MMFF94 Validation Suite
                      ------------------------

                 Revised and extended, November 1998

                        Dr. Thomas A. Halgren
                        Merck and Co., Inc.
                        Building 50SW-100
                        P.O. Box 2000
                        Rahway, NJ 07065

                        Phone:  732-594-7735
                        Fax:    732-594-4224
                        E-Mail: halgren@merck.com

The test molecules for this validation suite consist of 698 structures derived 
from the Cambridge Structural Database maintained by the Cambridge 
Crystallographic Data Centre (which graciously gave permission for their use), 
plus 63 additional structures for small molecules and ions, for a total of 761. 
The native CSD structures were modified by assigning single and multiple bonds, 
affixing formal ionic charges where appropriate, and adding hydrogens to 
complete the valence.  The resultant structures were minimized to an rms gradient 
of 0.000001 kcal/mol/A on the MMFF94 energy surface, and were then 
systematically distorted and re-minimized, and then distorted and re-minimized 
again.  The distortion/re-minimization steps were taken to reduce the likelihood 
that any final conformation represents a very shallow local minimum on the 
MMFF94 surface, as a molecular-mechanics optimizer might conceivably convert 
such a conformation to a different local minimum and falsely imply a problem 
with the implementation of MMFF94 being tested.

The validation suite was constructed to test all entries in the MMFF*.PAR 
parameter files as well as all default-parameter and empirical-rule procedures.
The last 8 of the 63 additional structures mentioned above are new entries in this
updated suite.  Named ERULE_01 through ERULE_08, these structures are fragments 
of CSD structures that have been chosen to more fully test the MMFF94 empirical-
rule parameter generation procedures than did the original members of the 
suite (see below).  The MMFF94 parameter files can be accessed via an Internet 
browser at http://journals.wiley.com (select "Journal of Computational 
Chemistry", then "Supplementary Material", then "Volume 17", then the hyperlink 
for page 490) or at ftp://ftp.wiley.com/public/journals/jcc/suppmat/17/490.  The parameter 
files can also be accessed by ftp at anonymous@ftp.wiley.com; cd to 
public/journals/jcc/suppmat/17/490.

The additional ERULE structures represent one reason for issuing this updated 
validation suite.  A second is to correct the MMFF94 results for eight members 
of the original suite -- namely, for structures CEWYIM30, DAKCEX, FAPLUD, 
GIGCEE, KEPKIZ, SAKGUG, TAPJUP, and VEWZOM. Errors in the original MMFF94 atom 
typing for these structures were discovered when individuals attempting to 
implement MMFF94 de novo encountered discrepancies with the posted results. 
Most such discrepancies had to do with the rules for assignment of aromaticity 
in nested ring systems.  

In addition to input molecular structure files and auxiliary data, the
validation suite provides output files from computer runs made using Merck's 
OPTIMOL molecular-mechanics program and BatchMin 5.5 from Columbia University. 

Note: some files are quite large. Before downloading, you may want to check the 
sizes listed at the end of this document.  You may want to retrieve the 
compressed tar archive of these files, MMFF94.tar.gz (6.07 MBytes), and
unpack it by giving the following UNIX command:

                     gunzip -c MMFF94.tar.gz | tar xvof -


                            Structure Input Files
                            ---------------------

The following files comprise the input molecular structure data:

                           MMFF94_dative.mol2
                           MMFF94_hypervalent.mol2
                           MMFF94.mmd (hypervalent representation only)


Two formats are provided: "mol2", from Tripos, and "mmd", the designation used 
at Merck for BatchMin "dat" files.  We chose these file formats because they 
are in fairly widespread use and because they allow explicit single and 
multiple bonds to be designated. Unlike file formats more commonly used at 
Merck, these formats are limited in that they cannot specify formal-charge 
information. For this updated suite, however, this information has been 
included in other files described below.

For the convenience of the user, the mol2 files are presented in two versions.  
One of these -- MMFF94_dative.mol2 -- uses dative bonding in tetracoordinate 
sulfur and phosphorus compounds.  This representation, for example, treats a 
sulfonamide as having four single bonds to a +2 sulfur, two of which come from 
formally negative terminal oxygen atoms.  This is the native representation for 
OPTIMOL, the host program for MMFF.  In contrast, the native BatchMin 
representation features two double bonds from formally neutral oxygen atoms to 
a formally neutral sulfur, for a (hypervalent) total of six bonds to sulfur; 
correspondingly, "hypervalent" phosphorus compounds have a total of five 
bonds to phosphorus.  This hypervalent bonding pattern is used in the
MMFF94_hypervalent.mol2 and MMFF94.mmd files in the validation suite. Note: the 
atom types in the mol2 files (which were generated by a file conversion 
procedure developed at Merck) in some cases differ from authentic SYBYL atom 
types, and therefore should not be relied upon.
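
The formal charges implied by the two representations can be checked with the 
usual electron-counting rule.  The Python sketch below is illustrative only: it 
is not part of the suite, and the function and variable names are hypothetical.  
It applies the rule to the sulfur and to one terminal oxygen of a sulfonamide 
and confirms that the SO2 unit carries no net charge in either representation.

```python
def formal_charge(valence_electrons, lone_pair_electrons, bond_order_sum):
    """Formal charge = valence electrons - lone-pair electrons - sum of bond orders."""
    return valence_electrons - lone_pair_electrons - bond_order_sum

# Sulfur and one terminal oxygen of a sulfonamide, R-SO2-NR2.
dative = {
    "S": formal_charge(6, 0, 4),  # four single bonds to sulfur
    "O": formal_charge(6, 6, 1),  # one single bond, three lone pairs
}
hypervalent = {
    "S": formal_charge(6, 0, 6),  # two single bonds plus two double bonds
    "O": formal_charge(6, 4, 2),  # one double bond, two lone pairs
}

def so2_group_charge(representation):
    """Net formal charge of the sulfur plus its two terminal oxygens."""
    return representation["S"] + 2 * representation["O"]

print(dative, so2_group_charge(dative))
print(hypervalent, so2_group_charge(hypervalent))
```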


                           Output Data Files
                           -----------------

Results of the MMFF94 calculations are contained in the following three files:

                           MMFF94.energies
                           MMFF94_bmin.log
                           MMFF94_opti.log

The MMFF94.energies file contains records that list the molecule name, the 
total MMFF94 energy computed by OPTIMOL, and the BatchMin 5.5 energy.  The 
BatchMin calculations used a locally modified version of the mmff_setup 
co-process, enhanced to handle the full range of hypervalent -> dative bonding 
conversions encountered in the suite.  Some cases were not properly 
accommodated in the distributed BatchMin 5.5 and 6.0 code, but all should be 
properly handled beginning with BatchMin 6.5.  (These internal bonding 
conversions are needed because the mmff_setup code, which was derived from 
OPTIMOL, assumes dative bonding.)  In all cases, no 
cutoffs on nonbonded interactions were employed and a unit dielectric constant 
was used. As comment records in the MMFF94.energies file indicate, the OPTIMOL 
and BatchMin total energies agree to within 0.0001 kcal/mol in all but 15 
instances;  the largest difference is about 0.0035 kcal/mol.  The 15 cases are 
ones in which a positive or negative formal charge is shared among three atoms 
of the same MMFF atom type (e.g., the three nitrogens of a guanidinium group); 
the single-precision division by 3 in the BatchMin run produces a less precise 
final partial atomic charge and a less accurate total MMFF94 energy.
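
The size of this rounding effect can be illustrated with a short Python sketch.  
It is not a reproduction of the BatchMin arithmetic; it only shows how a unit 
charge shared among three atoms differs when the division is rounded to single 
precision rather than carried out in double precision.  The helper names are 
hypothetical.

```python
import struct

def to_float32(x):
    """Round a double-precision Python float to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def shared_charge(total_charge, n_atoms, single_precision=False):
    """Formal charge per atom when total_charge is shared equally among n_atoms."""
    if single_precision:
        return to_float32(to_float32(total_charge) / to_float32(n_atoms))
    return total_charge / n_atoms

q64 = shared_charge(1.0, 3)                         # e.g. one guanidinium nitrogen
q32 = shared_charge(1.0, 3, single_precision=True)
charge_error = abs(q64 - q32)
print(q64, q32, charge_error)
```

A charge shared between two atoms (for example, -0.5 on each carboxylate 
oxygen) is exactly representable in single precision, which is consistent with 
the discrepancies being confined to the three-atom cases.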

The MMFF94_bmin.log file contains BatchMin 5.5 output, obtained on an SGI R10000 
processor, for single-point energy calculations on input structures read from 
the MMFF94.mmd file.  This log file partitions the total energy into components 
such as bond stretching, angle-bending, torsion, van der Waals, and 
electrostatic.  It provides the next level of information beyond the simple 
compilation of total energies found in the MMFF94.energies file.

Finally, the MMFF94_opti.log file contains the output from an OPTIMOL run that 
employed as input an internal Merck-format data file, MMFF94.ffd, that contains 
a superset of the information provided in the file MMFF94_dative.mol2 (which 
was created from it).  This log file provides by far the greatest amount of 
validation information.  For each molecule, it begins with information about 
the atom typing (when rings are present) and lists any invocations of the 
empirical-rule generation procedures. An initial "list" section then gives the 
symbolic and numeric MMFF94 types for each atom, together with the MMFF94 
formal atomic charge (fractional, rather than integral, when carboxylate anions, 
guanidinium cations, etc., are present, but usually zero) and partial atomic 
charge (also provided in the input data files).  Next, the total energy and the 
energy components (bond stretching, ...) are listed.  Also shown is the total 
rms gradient (kcal/mol/A).  This quantity is typically small, as befits an 
energy-minimized structure, but is not zero because the stored coordinates have 
too little numerical precision.  Finally, the "analyze" section exhaustively 
lists all interactions of a given type (i.e., all bond-stretching interactions, 
all angle-bending interactions, ...), and reports both the force-field 
parameters and the "strain energy" for the interaction.  The notation should be 
obvious for the most part, but it should be noted that the listed "FF CLASS" 
indices are the quantities called "bond-type index", "angle-type index", etc., 
in the 1996 J. Comput. Chem. papers (see References).  For nonbonded 
interactions, only pair-wise terms for which the van der Waals repulsion energy 
is at least 0.01 kcal/mol are listed.  Each nonbond output line includes the 
separate vdW attraction and repulsion components, the Coulombic interaction 
energy, and the Buffered 14-7 R* and Eps parameters produced by the MMFF 
combination rules; this data should be more than sufficient to validate an 
implementation of the MMFF94 nonbonded potential. One cautionary note: eqs.(3) 
and (4) in the fifth MMFF paper were typeset incorrectly; their counterparts in 
the first four MMFF papers, however, are correct.  The OPTIMOL run was also 
made on an R10000 processor.
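
As a starting point for checking individual nonbond lines, the Python sketch 
below evaluates the Buffered 14-7 and buffered Coulomb terms for one atom pair, 
given R* and Eps as listed in the log.  The functional forms and constants 
(1.07/0.07, 1.12/0.12, 332.0716, and a 0.05 A electrostatic buffer) are written 
down here as assumptions and should be checked against reference 1.  The split 
into repulsive and attractive parts is likewise an assumed partitioning, not 
necessarily the one OPTIMOL prints, and any scaling of 1,4 interactions is 
omitted.

```python
def buffered_14_7(r, r_star, eps):
    """Return (repulsion, attraction, total) in kcal/mol for a distance r in Angstroms.

    Assumed form: E = eps * (1.07 R*/(R + 0.07 R*))**7
                        * (1.12 R***7 / (R**7 + 0.12 R***7) - 2)
    """
    scale = eps * (1.07 * r_star / (r + 0.07 * r_star)) ** 7
    repulsion = scale * 1.12 * r_star ** 7 / (r ** 7 + 0.12 * r_star ** 7)
    attraction = -2.0 * scale
    return repulsion, attraction, repulsion + attraction

def buffered_coulomb(r, qi, qj, dielectric=1.0, delta=0.05):
    """Coulomb energy in kcal/mol for a constant dielectric (assumed buffered form)."""
    return 332.0716 * qi * qj / (dielectric * (r + delta))

# Sanity check with hypothetical parameters: at R = R* the total is -Eps.
rep, att, total = buffered_14_7(3.5, 3.5, 0.1)
print(rep, att, total)
```

With this form the well depth at R = R* equals Eps exactly, which gives a quick 
test that an implementation has the buffering constants in the right places.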


                          Auxiliary Files
                          ---------------

The following files provide additional information:

                           MMFF94.titles
                           MMFF94.changed-or-new_results
                           MMFF94.empirical_rule_parameters
                           MMFF94.dative_molecules
                           MMFF94.fc_dative
                           MMFF94.fc_hypervalent
                           
The MMFF94.titles file gives short titles for all of the molecules in the suite. 
The MMFF94.changed-or-new_results file lists the new MMFF94 energies for the 
eight molecules for which there have been changes in the MMFF94 atom-type 
assignments.  For reference, this file also lists the previously obtained 
MMFF94 energies; it should be noted that the new energies reflect new MMFF94-
optimized geometries as well as new atom types and parameter assignments.  This 
file also lists the MMFF94 energies for the eight added "empirical rule" 
structures.  The MMFF94.empirical_rule_parameters file lists structures for 
which parameters generated from MMFF94 empirical rules are required and 
specifies the interactions involved.  This file shows that only structures 
CEWYIM30, KEPKIZ, and OHMW1 from the original suite required such parameter 
generation.  (It should be noted, however, that only the first instance of the 
generation of a given parameter is reflected in the file; such generated 
parameters are added to the internal database used by OPTIMOL or BatchMin, and 
therefore are no longer "missing" if later structures in the suite request 
them.)  Next, the MMFF94.dative_molecules file lists the names of the molecules 
(129 in number) for which the mol2 files provide contrasting dative and 
hypervalent structures. (For the mmd file, which always uses the hypervalent 
representation, the molecule names begin in column 11 of the header cards, 
immediately following the left square bracket.)  Finally, for the sake of 
completeness the MMFF94.fc_dative and MMFF94.fc_hypervalent files specify the 
formal ionic charges used in these representations; as indicated previously, 
this information is not preserved in the "mol2" input files, though in most 
cases it is implicit in the MacroModel atom types (some of which represent 
Merck extensions) listed in the "mmd" file.


                          Recommendation and Request
                          --------------------------

To validate an MMFF94 implementation, it would certainly make sense to choose a 
subset of the suite, to convert the mol2 or mmd input data to another format if 
necessary, and then to begin by computing and comparing total energies to those 
listed in the MMFF94.energies file; if and when differences are found, the 
component energies can then be compared to those listed in the MMFF94_bmin.log 
or MMFF94_opti.log files.  Examination of the detailed interaction listings in 
the OPTIMOL log file might then be needed to diagnose a problem.  Ultimately, 
the entire validation suite should be checked.  It is the implementer's choice 
as to whether to use a dative- or hypervalent-bonding representation for 
affected compounds, or to support both formats.  
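
The first step of this procedure, comparing total energies, might be sketched 
in Python as follows.  The sketch assumes the reference and computed energies 
have already been read into dictionaries keyed by molecule name; no parser is 
shown because the record layout of MMFF94.energies is not described here.  The 
function name and the default tolerance of 0.0001 kcal/mol (borrowed from the 
OPTIMOL/BatchMin agreement quoted above) are assumptions, and the energies in 
the example are made-up numbers, not values from the suite.

```python
def compare_energies(reference, computed, tol=1.0e-4):
    """Return {name: (ref, calc, diff)} for molecules that are missing from
    `computed` (calc and diff are None) or differ by more than tol kcal/mol."""
    problems = {}
    for name, e_ref in reference.items():
        e_calc = computed.get(name)
        if e_calc is None:
            problems[name] = (e_ref, None, None)
        elif abs(e_calc - e_ref) > tol:
            problems[name] = (e_ref, e_calc, e_calc - e_ref)
    return problems

# Made-up energies for illustration only.
reference = {"DAKCEX": 10.0, "FAPLUD": -5.25, "GIGCEE": 3.0}
computed = {"DAKCEX": 10.00005, "FAPLUD": -5.20}

problems = compare_energies(reference, computed)
for name, (e_ref, e_calc, diff) in sorted(problems.items()):
    print(name, e_ref, e_calc, diff)
```

Molecules flagged in this way are the ones whose component energies are worth 
comparing against the BatchMin or OPTIMOL log files.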

We have two requests.  The first is that any implementation of MMFF94 be
identified simply as "MMFF94", and that the name "Merck" not be used in product 
literature or in any other way.  This is a trademarking issue that our lawyers 
understand better than I; they are quite adamant about it.

The second request is that any implementation of MMFF94 be explicitly
characterized by its authors as to whether it is: (1) partial, or (2) complete.  
An implementation should not be labeled "complete" unless it is applicable to
all 761 molecules in the test suite and produces total and component energies 
that match those posted here to within numerical precision.  For a partial 
implementation, published descriptions and product literature should state the 
degree to which the implementation is applicable to the molecules in the 
validation suite and the degree to which it produces authentic results for 
those members of the suite to which it is applicable; a clear statement should 
also be made as to whether or not the MMFF94 functional form has been fully 
implemented, as well as whether or not the MMFF94 "step-down" equivalencing 
protocol for default parameter assignment is fully utilized and whether or not 
the MMFF94 empirical-rule procedures for parameter generation are faithfully 
employed.


                                Restrictions
                                ------------

While a legal agreement authorizes the posting of this public-access validation 
suite, it prohibits Merck from providing assistance in the development,
testing, and implementation of MMFF to any third-party commercial software 
development company other than academic developers of software.  As a matter of 
courtesy, however, we would appreciate hearing from parties that implement 
MMFF94 as to how they characterize the completeness and accuracy of their 
implementation of MMFF94 and as to whether they find discrepancies they believe 
may reflect errors in the posted results.

                                References
                                ----------

1.  Thomas A. Halgren, J. Comput. Chem., 17, 490-519 (1996). 
2.  Thomas A. Halgren, J. Comput. Chem., 17, 520-552 (1996).
3.  Thomas A. Halgren, J. Comput. Chem., 17, 553-586 (1996).
4.  Thomas A. Halgren and Robert B. Nachbar, J. Comput. Chem., 17, 587-615 
    (1996).
5.  Thomas A. Halgren, J. Comput. Chem., 17, 616-641 (1996).
6.  Thomas A. Halgren, J. Comput. Chem., submitted.
7.  Thomas A. Halgren, J. Comput. Chem., submitted.

Paper 6 describes the derivation and performance of the MMFF94s variant of
MMFF94.  This variant and the rationale for it are briefly described in 
papers 1, 3, and 4.  A companion MMFF94s validation suite will be posted when 
this manuscript is published and the MMFF94s parameters have passed into the 
public domain.

Paper 7 compares the abilities of MMFF94, MMFF94s, CFF95, CVFF, MSI CHARMm, 
AMBER*, OPLS*, MM2*, and MM3* (1) to reproduce experimental and theoretical
values for conformational energies, and (2) to produce reasonable values and 
trends for intermolecular-interaction energies and geometries in hydrogen-
bonded complexes.  Some results are also presented for CHARMM 22.

As of November 1998, papers 6 and 7 have been reviewed favorably, and it is 
anticipated that each will be published in J. Comput. Chem. when revised and 
shortened versions have been resubmitted.  The input data used in evaluating 
force fields in paper 7 are being posted elsewhere on the CCL archives in the 
hope that this information will help others to test additional force fields.


                               File Sizes
                               ----------

 File name                  Size in Bytes
------------------------------------------
 MMFF94.changed-or-new_results         858
 MMFF94.dative_molecules             1,080
 MMFF94.empirical_rule_parameters    2,453
 MMFF94.energies                    31,499
 MMFF94.fc_dative                   32,991
 MMFF94.fc_hypervalent              20,575
 MMFF94.mmd                      2,371,742
 MMFF94.tar.gz                   6,069,796
 MMFF94.titles                      53,537
 MMFF94_bmin.log                 1,181,426
 MMFF94_dative.mol2              1,653,121
 MMFF94_hypervalent.mol2         1,653,121
 MMFF94_opti.log                24,855,731
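
After downloading, the sizes above can be checked mechanically.  The Python 
sketch below is one way to do so; the byte counts are copied from the table, 
while the function name and the directory argument are hypothetical.

```python
import os

# Expected sizes in bytes, copied from the table above.
EXPECTED_SIZES = {
    "MMFF94.changed-or-new_results": 858,
    "MMFF94.dative_molecules": 1080,
    "MMFF94.empirical_rule_parameters": 2453,
    "MMFF94.energies": 31499,
    "MMFF94.fc_dative": 32991,
    "MMFF94.fc_hypervalent": 20575,
    "MMFF94.mmd": 2371742,
    "MMFF94.tar.gz": 6069796,
    "MMFF94.titles": 53537,
    "MMFF94_bmin.log": 1181426,
    "MMFF94_dative.mol2": 1653121,
    "MMFF94_hypervalent.mol2": 1653121,
    "MMFF94_opti.log": 24855731,
}

def check_sizes(directory, expected=EXPECTED_SIZES):
    """Return {name: (expected, actual)} for files that are missing
    (actual is None) or whose size differs from the expected value."""
    mismatches = {}
    for name, size in expected.items():
        path = os.path.join(directory, name)
        actual = os.path.getsize(path) if os.path.isfile(path) else None
        if actual != size:
            mismatches[name] = (size, actual)
    return mismatches

if __name__ == "__main__":
    for name, (want, got) in sorted(check_sizes(".").items()):
        print(name, "expected", want, "found", got)
```
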
Modified: Wed Nov 25 01:23:00 1998 GMT