From noy@einstein.sc.mahidol.ac.th  Sat Sep  7 05:53:43 1996
Received: from einstein.sc.mahidol.ac.th  for noy@einstein.sc.mahidol.ac.th
	by www.ccl.net (8.7.5/950822.1) id FAA12141; Sat, 7 Sep 1996 05:31:27 -0400 (EDT)
Received: (from noy@localhost) by einstein.sc.mahidol.ac.th (8.6.12/8.6.9) id QAA06336 for chemistry@www.ccl.net; Sat, 7 Sep 1996 16:23:14 +0700
From: "Dr. Teerakiat Kerdcharoen" <noy@einstein.sc.mahidol.ac.th>
Message-Id: <199609070923.QAA06336@einstein.sc.mahidol.ac.th>
Subject: Colloque Weyl symposium
To: chemistry@www.ccl.net
Date: Sat, 7 Sep 1996 16:23:14 +0700 (GMT+0700)
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit


Dear all,
	I would like to ask whether some of you know about
the symposium on "Metal in Ammonia" that held every two years.
Could you please, if you know, direct me to the information
about this symposium Colloque Weyl on where and when this symposim
will be held for the year 1997 so that my colleague will prepare
to participate this meeting.
sincerely,
Teerakiat

From john@ibg.uu.se  Sat Sep  7 06:29:11 1996
Received: from nucleus.ibg.uu.se  for john@ibg.uu.se
	by www.ccl.net (8.7.5/950822.1) id FAA12145; Sat, 7 Sep 1996 05:43:42 -0400 (EDT)
Received: from localhost by nucleus.ibg.uu.se via SMTP (940816.SGI.8.6.9/940406.SGI.AUTO)
	for <chemistry@www.ccl.net> id LAA22712; Sat, 7 Sep 1996 11:48:15 +0200
Date: Sat, 7 Sep 1996 11:48:13 +0200 (MDT)
From: John Marelius <john@ibg.uu.se>
To: chemistry@www.ccl.net
Subject: New square root algorithm in molecular dynamics
Message-ID: <Pine.SGI.3.95.960907114305.22606D-100000@nucleus.ibg.uu.se>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII


Here is a copy of Terje Mathisen's news posting describing his square root
algorithm to which I referred in my recent posting to the list.

John Marelius
---------------------------------------------------------------------------
Terje Mathisen's article:

John Marelius wrote:
> 
> This posting summarises the results of the discussion on square-root
> calculations in molecular dynamics (MD) applications. Using a novel
> algorithm by Terje Mathisen and general optimisations of the code,
> performance has increased by up to 55%, depending on the simulated system.

[snip]
 
> One person who became interested in the problem was Terje Mathisen, he
> started thinking and coding and has overwhelmed me with a stream of
> ever-improving codes for fast and accurate functions to calculate
> sqrt(1/x).
> 
> Please refer to his posting, describing the details of the algorithm!

OK, here's the story John promised I'd write:

When John made his post with a the inner loop of his simulation code,
one thing was obvious: He had two expensive operations in his inner
loop: One square root, and one division. The rest was all floating point
muls and adds, with no compares or branches.

After a while I realized that if the division and square root steps
could be combined, it would be much easier to speed his code up.

My first attempt separated the exponent and the mantissa part of the
input value, keeping the least significant bit of the exponent together
with the leading bits of the mantissa, and then used linear
interpolation with a fairly large lookup table (32KB, 4K double prec
entries).

This version was very fast, clocking in at about 20 cycles, but gave
only about 32 significant bits in the result.

I also tried higher-order interpolation functions:

A quadratic curve gave another 7-8 bits, but this doubled the processing
time.

The final solution was to use a lookup table for range reduction,
looking up 1/x, and multiplying by it, leaving a number that's very
close to 1.0.

I then used a third-order polynomial on the reduced-range value, before
multiplying the result with the product of two other table lookups: One
for the exponent and another for sqrt(1/x), i.e. the square root of the
values in the first table.

After John tested the code and found that the table-driven version gave
_exactly_ the same 8-digit results for all atoms during the first 50
simulation steps, I decided that we could probably reduce the table
sizes a bit.

At this time I finally figured out how to convert my (truncated) Taylor
polynomial into a Chebychev version, which effectively recovered all the
precision lost by reducing the lookup tables.

I also decided that by making the sqrt(1/x) table single precision, the
double precision (1/x) table would contain the exact squares of the
first table, which would both reduce the table size and increase the
accuracy.

After inspecting the final code, it was obvious that it suffered from
several pipeline stalls, due to direct linear dependencies. By writing a
version that calculated two inverse square roots in parallel,
inv_sqrt2(x, y), most of these losses could be recovered.

The final version needs a good scheduling compiler, but on an Alpha it
runs about 2.5 times faster than the compiler-generated code for
sqrt(1.0/x), leading to the 55% faster results for John's simulations.

Here's the working code:

/* Table-driven sqrt(1/x) functions
   (c) Terje Mathisen 1996

   Use freely as long as my copyright is retained

   Gives approximately 49 significant bits (15 decimal digits).
*/

#include <math.h>

#ifdef BIG_ENDIAN
  #define HIGH_WORD 0
#else
  #define HIGH_WORD 1
#endif

#define MANTISSA_BITS 20
#define MANTISSA_MASK ((1 << MANTISSA_BITS) - 1)
#define EXP_BITS  11
#define EXP_SIZE  (1 << EXP_BITS)
#define EXP_BIAS  1023
#define EXP_ZERO  (EXP_BIAS << MANTISSA_BITS)

/* Here I define the size of the mantissa lookup tables: */
#define MANT_BITS 10
#define MANT_SIZE (1 << MANT_BITS)
#define MANT_SHIFT (MANTISSA_BITS - MANT_BITS)
#define MANT_ROUND (1 << (MANT_SHIFT - 1))

static double exp_root[EXP_SIZE];   /* sqrt(1.0/(2^exp)) */
static float root_x[MANT_SIZE+1];   /* sqrt(1/x)         */
static double inv_x[MANT_SIZE+1];   /* root_x^2          */

/* Taylor terms: */
#define t0 (1.0)
#define t1 (-1.0/2.0)
#define t2 (3.0/8.0)
#define t3 (-5.0/16.0)
#define t4 (35.0/128.0)
#define t5 (-63.0/256.0)

#ifdef TAYLOR_SERIES
  #define c0 t0
  #define c1 t1
  #define c2 t2
  #define c3 t3
#else
  /* These Chebychev terms have been calculated while assuming
     that the 5th and higher orders of the Taylor series can be
     disregarded.
     They must be re-calculated if MANT_BITS is reduced below 10!
  */
  #define c0 (t0 -
t4/(8.0*2.0*16.0*MANT_SIZE*MANT_SIZE*MANT_SIZE*MANT_SIZE))
  #define c1 t1
  #define c2 (t2 + t4/(4.0*MANT_SIZE*MANT_SIZE))
  #define c3 t3
#endif

double inv_sqrt(double x)
{
    double dx, dx2, scale, poly;
    long *plx = (long *) &x;
    long exp, mant;

    exp = plx[HIGH_WORD];
    mant = exp & MANTISSA_MASK;
    plx[HIGH_WORD] = mant + EXP_ZERO;          /* Force exponent == 0 */
    mant = (mant + MANT_ROUND) >> MANT_SHIFT;  /* 11-bit rounded
mantissa */
    exp >>= MANTISSA_BITS;                     /* 11-bit exponent */

    dx = x * inv_x[mant] - 1.0;
    scale = exp_root[exp] * root_x[mant];     /* scale = product of
exponent and mantissa term */

    dx2 = dx*dx;
    poly = c0 + dx2*c2 + dx*(c1 + dx2*c3);
    return poly * scale;
}

double inv_sqrt2(double x, double *y)
{
    double dx, dx2, scalex, polyx;
    double dy, dy2, scaley, polyy;
    long *plx = (long *) &x;
    long *ply = (long *) y;
    long expx, mantx;
    long expy, manty;

    expx = plx[HIGH_WORD];
    expy = ply[HIGH_WORD];
    
    mantx = expx & MANTISSA_MASK;
    manty = expy & MANTISSA_MASK;
    
    plx[HIGH_WORD] = mantx + EXP_ZERO;   /* Force exponent == 0 */
    ply[HIGH_WORD] = manty + EXP_ZERO;   /* Force exponent == 0 */
    
    mantx = (mantx + MANT_ROUND) >> MANT_SHIFT;  /* rounded mantissa */
    manty = (manty + MANT_ROUND) >> MANT_SHIFT;  /* rounded mantissa */
    
    expx >>= MANTISSA_BITS;                      /* 11-bit exponent */
    expy >>= MANTISSA_BITS;                      /* 11-bit exponent */

    dx = x * inv_x[mantx] - 1.0;
    dy = *y * inv_x[manty] - 1.0;
    
    scalex = exp_root[expx] * root_x[mantx]; /* scale = product of
exponent and mantissa term */
    scaley = exp_root[expy] * root_x[manty]; /* scale = product of
exponent and mantissa term */

    dx2 = dx*dx;
    dy2 = dy*dy;
    
    polyx = c0 + dx2*c2 + dx*(c1 + dx2*c3);
    polyy = c0 + dy2*c2 + dy*(c1 + dy2*c3);

    polyx *= scalex;
    polyy *= scaley;

    *y = polyy;
    return polyx;
}

void init_tables(void)
{
    double x, root, dx;
    int i;

    /* First initialize the exp_root table: */
    root = sqrt(0.5); x = 1.0;
    for (i = 1023; i < 2047; i+=2) {
        exp_root[i] = x;
        exp_root[i+1] = x*root;
        x *= 0.5;
    }
    exp_root[2047] = 0.0;               /* sqrt(1.0/Inf) = 0.0 */
    
    x = 2.0;
    for (i = 1022; i > 0; i-=2) {
        exp_root[i] = x*root;
        exp_root[i-1] = x;
        x *= 2.0;
    }
    exp_root[0] = 0.0;
    ((long *) &exp_root[0])[HIGH_WORD] = 0x7ff00000;  /* Infinity */

    /* Next, do the two mantissa-based tables:
       Bug workaround: I must first generate the single prec table,
       and then the double prec inv_x[] table!
       Otherwise, the compiler (Watcom) refuses to use the rounded
       (f) value.
    */
    dx = 1.0 / MANT_SIZE;
    for (i = 0; i <= MANT_SIZE; i++) {
        x = 1.0 + dx * i;
        root_x[i] = sqrt(1.0/x);
    }
    for (i = 0; i <= MANT_SIZE; i++) {
        inv_x[i] = root_x[i]*root_x[i];
    }
}


-- 
- <Terje.Mathisen@hda.hydro.com>
Using self-discipline, see http://www.eiffel.com/discipline
"almost all programming can be viewed as an exercise in caching"


From D.van.der.Spoel@chem.rug.nl  Sat Sep  7 07:53:42 1996
Received: from mailhost.rug.nl  for D.van.der.Spoel@chem.rug.nl
	by www.ccl.net (8.7.5/950822.1) id HAA12367; Sat, 7 Sep 1996 07:21:17 -0400 (EDT)
Received: from rugch4.chem.rug.nl by mailhost.rug.nl with SMTP (PP);
          Sat, 7 Sep 1996 13:19:54 +0200
Received: from rugmd2 by rugch4.chem.rug.nl (NAA08759);
          Sat, 7 Sep 1996 13:19:52 +0200
Received: by rugmd2 (940816.SGI.8.6.9/940406.SGI)	for chemistry@www.ccl.net 
          id NAA20687; Sat, 7 Sep 1996 13:19:49 +0200
From: D.van.der.Spoel@chem.rug.nl (David van der Spoel)
Message-Id: <199609071119.NAA20687@rugmd2>
Subject: SQRT in MD
To: chemistry@www.ccl.net
Date: Sat, 7 Sep 1996 13:19:49 +0200 (MDT)
Reply-To: D.van.der.Spoel@chem.rug.nl
X-Mailer: ELM [version 2.4 PL25]
Content-Type: text


John Marelius <john@ibg.uu.se> wrote:

:The core of the problem is that square root calculations and also
:divisions take up to two orders of magnitude longer than multiplications 
:and additions to execute on most computers.

:Approaches by me and others involving integer-math, look-up tables,
:Taylor polynominals or Newton-Raphson iterations ended up being
:either slower or less accurate than the standard function.

The GROMACS MD code (http://rugmd0.chem.rug.nl/~gmx) includes
a newton rhapson iteration method for the 1.0/sqrt(x) function
that takes 5 floating point instructions, rather than 100
(two orders of magnitude ??) This enables our code
to vectorize or be pipelined on machines (compilers) that can not
inline library functions (e.g. the Portland Compiler on our 
parallel Intel-i860 box).

A description of the algorithm we use can be
found in the GROMACS User Manual, which is freely available from
our WWW page. In most cases it is considerably faster to use the
special instructions for 1.0/sqrt(x), e.g. the SGI R8000 chip has
such an instruction and the compiler detects 1.0/sqrt(x) constructs
in the code. Consequently the inner loop of our coulomb routine
(that calculates 1 coulomb interaction and forces, total 27 floating
point operations) executes in 12 clock cycles on a R8000, leading to
an inner loop performance of 168 MFlops, slightly more than
half the theoretical maximum of 300 (at 75 MHz). 

Regards,

David van der Spoel
---------------------------------------------------------
EMAIL:	spoel@chem.rug.nl
WWW:	http://rugmd0.chem.rug.nl/~spoel
PHONE:	31-50-3634327	FAX: 31-50-3634800
MAIL:	Nijenborgh 4, 9747 AG Groningen, The Netherlands.
---------------------------------------------------------


From tp@elptrs7.rug.ac.be  Sat Sep  7 10:53:44 1996
Received: from elptrs7.rug.ac.be  for tp@elptrs7.rug.ac.be
	by www.ccl.net (8.7.5/950822.1) id JAA12766; Sat, 7 Sep 1996 09:56:07 -0400 (EDT)
Received: by elptrs7.rug.ac.be (AIX 4.1/UCB 5.64/4.03)
          id AA26232; Sat, 7 Sep 1996 15:57:09 +0200
Date: Sat, 7 Sep 1996 15:57:08 +0200 (DFT)
From: "Park, Tae-Yun" <tp@elptrs7.rug.ac.be>
To: Computational Chemistry List <chemistry@www.ccl.net>
Subject: geometry optimization in GAMESS
Message-Id: <Pine.A32.3.91.960907154631.24388A-100000@elptrs7.rug.ac.be>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII


Dear all,

I have a question on geometry optimization in GAMESS.

After the geometry optimization in GAMESS is finished,
hessian calculation is run automatically to check if 
it's a minimum geometry or not and to get the values
of the normal vibrational frequencies.

Sometimes, I've got 1 imaginary vibration frequency, 
and rarely, there are 2 imaginary vibration frequencies.
These cases clearly means that I do not have a minimized
geometry.

My question is, what should I do if I have a geometry
contains imaginary frequecy?  Do I have to optimize again
with lower value of OPTTOL(default in GAMESS is 0.0001)
in STATPT group, or start with different initial geometry 
which has different configuration?  Or, try to use other
value of maximum/minimu trusted radius??

Any suggestion/advice/information will be greatly appreciated.

Many thanks to all the resposes in advence.  



				Sincerely,

				     Park, TAE-YUN    
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
State University of Ghent
Laboratorium voor Petrochemische Techniek
Krijgslaan 281, Blok S5  
9000 Gent, Belgium	  
TEL:+(32)-0(9)-264-4527
FAX:+(32)-0(9)-264-4999
e-mail: tp@elptrs7.rug.ac.be
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


From raman@bioc01.uthscsa.edu  Sat Sep  7 15:53:46 1996
Received: from elzip.uthscsa.edu  for raman@bioc01.uthscsa.edu
	by www.ccl.net (8.7.5/950822.1) id PAA13885; Sat, 7 Sep 1996 15:20:42 -0400 (EDT)
Received: from bioc01.uthscsa.edu by uthscsa.edu (PMDF V5.0-5 #13300)
 id <01I9781M1JR4004GMH@uthscsa.edu> for chemistry@www.ccl.net; Sat,
 07 Sep 1996 14:20:29 -0500 (CDT)
Received: from bioc01.uthscsa.edu by uthscsa.edu (PMDF V5.0-5 #13300)
 id <01I9781HPMC0003V4C@uthscsa.edu> for chemistry@www.ccl.net; Sat,
 07 Sep 1996 14:20:22 -0500 (CDT)
Received: by bioc01.uthscsa.edu (4.1/SMI-4.1) id AA10089; Sat,
 07 Sep 1996 14:19:00 -0500 (CDT)
Date: Sat, 07 Sep 1996 14:18:56 -0500 (CDT)
From: raman@bioc01.uthscsa.edu (C.S.RAMAN)
Subject: Thiazole, oxazole etc.,
To: chemistry@www.ccl.net
Message-id: <9609071919.AA10089@bioc01.uthscsa.edu>
X-Mailer: ELM [version 2.4 PL3]
Content-type: text
Content-transfer-encoding: 7BIT


Hello CCL:

I would like to hear from anyone who has done computations on thiazole,
oxazole and their derivatives.  Alternatively, any citations on this
subject would be equally helpful.

Cheers
-raman
-- 
   _/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/
   _/                                                                  _/
   _/                       C.S.RAMAN                                  _/
   _/               Department of Biochemistry                         _/
   _/        University of Texas Health Science Center                 _/
   _/                 7703 Floyd Curl Drive                            _/
   _/              San Antonio, TX 78284-7760                          _/
   _/                          USA                                     _/
   _/                                                                  _/
   _/                Tel:     (210) 614-0839                           _/
   _/                Fax:     (210) 567-2490                           _/
   _/             E-mail:  raman@bioc01.uthscsa.edu                    _/
   _/                                                                  _/
   _/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/  
   _/                                                                  _/
   _/   "The real problem in speech is not precise language.           _/ 
   _/    The problem is clear language."   --Richard Feynman           _/
   _/                                                                  _/          
   _/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/


