                        MMFF94 Validation Suite
                        -----------------------

                        Dr. Thomas A. Halgren
                        Merck and Co., Inc.
                        Building 50SW-100
                        P.O. Box 2000
                        Rahway, NJ 07065

                        Phone:  732-594-7735
                        Fax:    732-594-4224
                        E-Mail: halgren@merck.com

The test molecules for this validation suite consist of 698 structures derived 
from the Cambridge Structural Database maintained by the Cambridge 
Crystallographic Data Centre (which graciously gave permission for their use), 
plus 55 additional structures for small molecules and ions.  The native CSD 
structures were modified by assigning single and multiple bonds, affixing 
formal ionic charges where appropriate, and adding hydrogens to complete the 
valence.  The resultant structures were minimized to an rms gradient of 0.000001 
kcal/mol/A on the MMFF94 energy surface, and were then systematically distorted 
and re-minimized, and then distorted and re-minimized again.  The distortion/
re-minimization steps were taken to reduce the likelihood that any final 
conformation represents a very shallow local minimum on the MMFF94 surface, as 
a molecular-mechanics optimizer might conceivably convert such a conformation 
to a different local minimum and falsely imply a problem with the implementation 
of MMFF94 being tested.

The validation suite was constructed to test all entries in the MMFF*.PAR 
parameter files as well as all default-parameter and empirical-rule procedures. 
The MMFF94 parameter files can be accessed via an Internet browser at 
http://journals.wiley.com (select "Journal of Computational Chemistry", then 
"Supplementary Material", then "Volume 17", then the hyperlink for page 490) 
or at ftp://ftp.wiley.com/public/journals/jcc/suppmat/17/490.  The parameter 
files can also be accessed by ftp at anonymous@ftp.wiley.com; cd to 
public/journals/jcc/suppmat/17/490.

In addition to input molecular structure files, this test suite provides output 
files from computer runs made using Merck's OPTIMOL molecular-mechanics program 
and BatchMin version 5.5 from Columbia University. 


                               Input Data
                               ----------

Input structure files are provided in two formats: "mol2", from Tripos, and 
"mmd", the designation used at Merck for BatchMin "dat" files.  We chose these 
file formats because they are in fairly widespread use and because they allow 
explicit single and multiple bonds to be designated.  For the convenience of 
the user, the mol2 files are presented in two versions.  One of these  -- 
MMFF94_dative.mol2 -- uses dative bonding at sulfur in sulfonamides and similar 
compounds.  This representation treats a sulfonamide as having four single 
bonds to a tetracoordinate +2 sulfur, two of which come from formally negative 
terminal oxygen atoms.  This is the native representation for OPTIMOL, the host 
program for MMFF.  In contrast, the native BatchMin representation features two 
double bonds from formally neutral oxygen atoms to a formally neutral sulfur, 
for a (hypervalent) total of six bonds to sulfur.  This hypervalent bonding 
pattern is used in the MMFF94_hypervalent.mol2 and MMFF94.mmd files in the test 
suite.    
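
The two sulfonamide representations can be compared with a little bookkeeping. 
The following Python sketch is illustrative only: the atom labels, the 
dictionary layout, and the helper names are assumptions made for this example 
and do not reflect the mol2 or mmd record formats.  It records the sulfonyl 
core of a generic sulfonamide both ways and checks that the bond-order sum at 
sulfur is four in the dative form and six in the hypervalent form, while the 
net charge is zero in both.

```python
# Hypothetical bookkeeping for the sulfonyl core of a sulfonamide R-SO2-NR'2.
# Bonds are (atom, atom, bond order); labels and layout are assumptions.
DATIVE = {
    "charges": {"S": +2, "O1": -1, "O2": -1, "N": 0, "C": 0},
    "bonds": [("S", "O1", 1), ("S", "O2", 1), ("S", "N", 1), ("S", "C", 1)],
}
HYPERVALENT = {
    "charges": {"S": 0, "O1": 0, "O2": 0, "N": 0, "C": 0},
    "bonds": [("S", "O1", 2), ("S", "O2", 2), ("S", "N", 1), ("S", "C", 1)],
}


def bond_order_sum(rep, atom):
    """Sum the bond orders of all bonds that involve the given atom."""
    return sum(order for a, b, order in rep["bonds"] if atom in (a, b))


def net_charge(rep):
    """Sum the formal charges listed for the fragment."""
    return sum(rep["charges"].values())


for label, rep in (("dative", DATIVE), ("hypervalent", HYPERVALENT)):
    print(label, "bonds to S:", bond_order_sum(rep, "S"),
          "net charge:", net_charge(rep))
```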

Thus, the following files comprise the input molecular structure data:

			MMFF94_dative.mol2
			MMFF94_hypervalent.mol2
			MMFF94.mmd  (hypervalent representation only)

In addition, an MMFF94.dative_molecules file is included that lists the names 
of the molecules (129 in number) for which the mol2 files provide contrasting 
dative and hypervalent structures.  (For the mmd file, the molecule names begin 
in column 11 of the header cards, immediately following the left square 
bracket.)  Finally, an MMFF94.titles file gives short titles for all of the 
molecules in the test set.

                               Output Data
                               -----------

The MMFF94.energies file contains records that list the molecule name, the 
total MMFF94 energy computed by OPTIMOL, and the BatchMin 5.5 energy.  It 
should be noted that the BatchMin calculations used a locally modified version 
of the mmff_setup co-process in which mmff_setup was enhanced to handle the 
full range of hypervalent -> dative bonding conversions encountered in the test 
suite; some cases were not properly accommodated in the distributed BatchMin 5.5 
and 6.0 code, but all should be properly handled beginning with BatchMin 6.5 
(these internal bonding conversions are needed because the mmff_setup code, 
which was derived from OPTIMOL, assumes dative bonding). In all cases, no 
cutoffs on nonbonded interactions were employed and a unit dielectric constant 
was used. As comment records in the MMFF94.energies file indicate, the OPTIMOL 
and BatchMin total energies agree to within 0.0001 kcal/mol in all but 15 
instances;  the largest difference is about 0.0035 kcal/mol.  The 15 cases are 
ones in which a positive or negative formal charge is shared among three atoms 
of the same MMFF atom type (e.g., the three nitrogens of a guanidinium group); 
the single-precision division by 3 in the BatchMin run produces a less precise 
final partial atomic charge and a less accurate total MMFF94 energy.
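
The effect of dividing a formal charge by 3 in single precision can be seen 
with a short Python sketch.  It only illustrates the rounding involved; it is 
not taken from BatchMin or OPTIMOL, and the helper name to_float32 is an 
assumption made for this example.  Python floats are double precision, so the 
single-precision result is emulated by rounding through a 4-byte IEEE-754 
value.

```python
import struct


def to_float32(x):
    """Round a double-precision Python float to the nearest IEEE-754 single."""
    return struct.unpack("f", struct.pack("f", x))[0]


formal_charge = 1.0  # e.g. the +1 charge shared by three guanidinium nitrogens

share_double = formal_charge / 3.0
# Dividing in double precision and rounding once to single gives the same
# value as a correctly rounded single-precision division of these operands.
share_single = to_float32(to_float32(formal_charge) / to_float32(3.0))

print("double precision share:", repr(share_double))
print("single precision share:", repr(share_single))
print("difference per atom   :", abs(share_single - share_double))
```

The per-atom difference is on the order of 1e-8 charge units.  It enters every 
electrostatic pair term that involves the affected atoms, which is consistent 
with small discrepancies appearing in the total energy for these molecules.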

The MMFF94_bmin.log file contains BatchMin 5.5 output, obtained on an SGI R10000 
processor, for single-point energy calculations on input structures read from 
the MMFF94.mmd file.  This log file partitions the total energy into components 
such as bond stretching, angle-bending, torsion, van der Waals, and 
electrostatic.  It provides the next level of information beyond the simple 
compilation of total energies found in the MMFF94.energies file.

Finally, the MMFF94_opti.log file contains the output from an OPTIMOL run that 
employed as input an internal Merck-format data file, MMFF94.ffd, that contains 
the same information as does the file MMFF94_dative.mol2 (indeed, the mol2 file 
was created from it).  This log file provides by far the greatest amount of 
validation information.  For each molecule, an initial "list" section gives the 
symbolic and numeric MMFF94 types for each atom, together with the MMFF94 
formal charge (usually zero) and partial atomic charge (the last of which is 
also provided in the input data files).  Next, the total energy and the energy 
components (bond stretching, ...) are listed.  Also shown is the total rms 
gradient (kcal/mol/A).  This quantity is typically small, as befits an energy-
minimized structure, but is not zero because the stored coordinates have too 
little numerical precision.  Finally, the "analyze" section exhaustively lists 
all interactions of a given type (i.e., all bond-stretching interactions, all 
angle-bending interactions, ...), and reports both the force-field parameters 
and the "strain energy" for the interaction.  The notation should be obvious 
for the most part, but it should be noted that the listed "FF CLASS" indices 
are the quantities called "bond-type index", "angle-type index", etc., in the 
1996 J. Comput. Chem. papers (see References).  For nonbonded interactions, only 
pair-wise terms for which the van der Waals repulsion energy is at least 
0.01 kcal/mol are listed.  Each nonbond output line includes the separate vdW 
attraction and repulsion components, the Coulombic interaction energy, and the 
Buffered 14-7 R* and Eps parameters produced by the MMFF combination rules; 
this data should be more than sufficient to validate an implementation of the 
MMFF94 nonbonded potential. One cautionary note: eqs.(3) and (4) in the fifth 
MMFF paper were typeset incorrectly; their counterparts in the first four MMFF 
papers, however, are correct.  The OPTIMOL run was made on an R4000 processor.
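
As an aid to checking individual nonbond lines, the following Python sketch 
evaluates a single pair-wise van der Waals term from R* and Eps.  It assumes 
the Buffered 14-7 form as given in reference 1, and it should be checked 
against the papers before being relied upon.  The function name and the way 
the total is split into "repulsion" and "attraction" parts are assumptions 
made for this example; the log file may partition the term differently.

```python
def buffered_14_7(r, r_star, eps):
    """Return (repulsion, attraction, total) in kcal/mol for one atom pair.

    r      -- interatomic distance in angstroms
    r_star -- pair minimum-energy separation R*_ij in angstroms
    eps    -- pair well depth Eps_ij in kcal/mol
    """
    if r <= 0.0 or r_star <= 0.0:
        raise ValueError("r and r_star must be positive")
    # Buffered distance factor, raised to the 7th power.
    buffer7 = (1.07 * r_star / (r + 0.07 * r_star)) ** 7
    # Assumed split: the 1.12.../(...) piece is repulsive, the -2 piece attractive.
    repulsion = eps * buffer7 * (1.12 * r_star**7 / (r**7 + 0.12 * r_star**7))
    attraction = -2.0 * eps * buffer7
    return repulsion, attraction, repulsion + attraction


# Made-up parameters, for illustration only.
rep, att, total = buffered_14_7(r=3.5, r_star=3.5, eps=0.1)
print(rep, att, total)
```

A quick sanity check on this form is that the total equals -Eps when the 
distance equals R*, since both buffered factors then reduce to unity.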


                          Recommendation and Request
                          --------------------------

To validate an MMFF94 implementation, it would certainly make sense to choose a 
subset of the test suite, to convert the mol2 or mmd input data to another 
format if necessary, and then to begin by computing and comparing total 
energies to those listed in the MMFF94.energies file; if and when differences 
are found, the component energies can then be compared to those listed in 
the MMFF94_bmin.log or MMFF94_opti.log files.  Examination of the detailed 
interaction listings in the OPTIMOL log file might then be needed to diagnose a 
problem.  Ultimately, the entire test set should be checked.  It is the 
implementer's choice as to whether to use a dative- or hypervalent-bonding 
representation for affected compounds, or to support both formats.  
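
The first step of this procedure, comparing total energies, might be sketched 
in Python as follows.  This is only a sketch under stated assumptions: it 
assumes that each data record in MMFF94.energies is a whitespace-separated 
molecule name followed by the OPTIMOL and BatchMin energies, and it treats any 
line that does not fit that pattern as a comment record.  The function names, 
the sample records, and the use of 0.0001 kcal/mol as the tolerance (borrowed 
from the OPTIMOL/BatchMin agreement quoted above) are likewise assumptions.

```python
def parse_energy_records(lines):
    """Yield (name, optimol_energy, batchmin_energy) from text lines.

    Assumed layout: name, then two numbers, separated by whitespace.
    Lines that do not fit are skipped as comment records.
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        try:
            yield fields[0], float(fields[1]), float(fields[2])
        except ValueError:
            continue


def find_mismatches(reference, computed, tol=1e-4):
    """Return (name, reference_energy, computed_energy) for each disagreement.

    reference -- iterable of (name, optimol_energy, batchmin_energy)
    computed  -- dict mapping molecule name to the energy being validated
    Molecules absent from `computed` are ignored, so a subset can be tested.
    """
    mismatches = []
    for name, e_ref, _e_bmin in reference:
        if name in computed and abs(computed[name] - e_ref) > tol:
            mismatches.append((name, e_ref, computed[name]))
    return mismatches


# Made-up records and energies, for illustration only.
sample = ["* comment record", "MOL_A  12.34567  12.34569", "MOL_B  -3.21000  -3.21000"]
my_energies = {"MOL_A": 12.34570, "MOL_B": -3.20000}
for name, e_ref, e_new in find_mismatches(parse_energy_records(sample), my_energies):
    print(name, "reference", e_ref, "computed", e_new)
```

Molecules flagged in this way are the ones for which the component energies 
and, if needed, the detailed interaction listings are worth examining.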

We have two requests.  The first is that any implementation of MMFF94 be
identified simply as "MMFF94", and that the name "Merck" not be used in product 
literature or in any other way.  This is a trademarking issue that our lawyers 
understand better than I; they are quite adamant about it.

The second request is that any implementation of MMFF94 be explicitly
characterized by its authors as to whether it is: (1) complete, or (2) partial.  
An implementation should not be labeled "complete" unless it is applicable to
all 753 molecules in the test suite and produces total and component energies 
that match those posted here to within numerical precision.  For a partial 
implementation, published descriptions and product literature should state the 
degree to which the implementation is applicable to the molecules in the test 
suite and the degree to which it produces authentic results for those members 
of the test-suite to which it is applicable; a clear statement should also be 
made as to whether or not the MMFF94 functional form has been fully implemented, 
as well as whether or not the MMFF94 "step-down" equivalencing protocol for 
default parameter assignment is fully utilized and whether or not the MMFF94 
empirical-rule procedures for parameter generation are faithfully employed.


                                Restrictions
                                ------------

While a legal agreement permits the posting of this public-access validation 
suite, it prohibits Merck from providing assistance in the development,
testing, and implementation of MMFF to any third-party commercial software 
development company other than academic developers of software.  As a matter of 
courtesy, however, we would appreciate hearing from parties that implement 
MMFF94 as to how they characterize the completeness and accuracy of their 
implementation of MMFF94.

                                References
                                ----------

1.  Thomas A. Halgren, J. Comput. Chem., 17, 490-519 (1996). 
2.  Thomas A. Halgren, J. Comput. Chem., 17, 520-552 (1996).
3.  Thomas A. Halgren, J. Comput. Chem., 17, 553-586 (1996).
4.  Thomas A. Halgren and Robert B. Nachbar, J. Comput. Chem., 17, 587-615 
    (1996).
5.  Thomas A. Halgren, J. Comput. Chem., 17, 616-641 (1996).
6.  Thomas A. Halgren, J. Comput. Chem., submitted (May, 1998).
7.  Thomas A. Halgren, J. Comput. Chem., submitted (May, 1998).

Paper 6 describes the derivation and performance of the MMFF94s variant of
MMFF94.  This variant and the rationale for it are briefly described in 
papers 1, 3, and 4.  A companion MMFF94s validation suite will be provided when 
this manuscript is published (whether in J. Comput. Chem. or elsewhere) and the 
MMFF94s parameters have passed into the public domain.

Paper 7 compares the abilities of MMFF94, MMFF94s, CFF95, CVFF, MSI CHARMm, 
AMBER*, OPLS*, MM2*, and MM3* (1) to reproduce experimental and theoretical
values for conformational energies, and (2) to produce realistic values and 
trends for intermolecular-interaction energies and geometries in hydrogen-
bonded complexes.  Some results are also presented for CHARMM 22.
Modified: Thu Jun 4 16:00:00 1998 GMT