From: "Dr. Peter Burger"
Subject: Re: CCL:Athlon vs Intel performance
To: Matthias.Mann%!at!%chemie.tu-dresden.de (Matthias Mann)
Cc: CHEMISTRY /at\ccl.net
Date: Wed, 29 Mar 2000 12:52:22 +0200 (CEST)

Hi,

just my 0.02 cents' worth.

The numbers I have show that at the same clock frequency the Athlon is roughly 0-10% faster than the PIII on my benchmarks (ADF, Turbomole, Jaguar). Things are different for Dmol, which makes heavy use of the BLAS-3 routine dgemm. dgemm is substantially faster on the Athlon using the fabulous ATLAS library (kudos to Clint Whaley!), e.g.

    860 MFlops at 800 MHz for the Athlon
    825 MFlops at 750 MHz for the Athlon

compared to

    440 MFlops at 550 MHz on a PIII (still Katmai core)

for dgemm-based matrix multiplications (independent of the matrix size). Based on figures by Intel for their new Math Kernel Library, the performance of a 733 MHz Coppermine is below 600 MFlops. For Dmol this helps a lot and gives the Athlon a 30% performance edge.

But I guess most vendors that use BLAS-3 libs will provide only one version of their commercial software, which is quite likely to be based on the Intel library. For Gaussian & Gamess etc.
you can obviously choose yourself, but then it depends on how much your code relies on BLAS-3. For Jaguar, there was a tremendous speedup in their new Linux version 4.0 (a factor of 2 over 3.5), which to the best of my knowledge is at least in part due to the use of dgemm.

I will shortly post an updated version of my Linux quantum chemistry software comparison, which includes new results for Jaguar 4.0 and Dmol on the Athlon/Linux.

The Athlon still has a lot more potential and needs better compilers to support its second FP pipeline (not present in the PIII) and a faster memory bus. I heard Compaq is working on an Athlon version of their DVF NT Fortran compiler, and supposedly they already see a good speedup (taken from c't, a German computer magazine). Supposedly it is due in 3 months.

The memory bus is already slightly improved with the advent of the new KX133 chipset supporting 133 MHz SDRAM. It ought to get better still through the use of DDR-RAM-aware chipsets in the second half of this year. Nevertheless, STREAM memory bandwidth benchmarks went up from 500 MB/sec on the 100 MHz bus to 600-700 MB/sec on the 133 MHz memory bus in the new KX133-based boards.

Also, one can expect a lot of good things from the new Thunderbird core, which has the L2 cache on-die running at the core clock speed and which should be out soon. At least Intel's Coppermine with 256 K on-die L2 cache seems to have a 20% performance edge over the Katmai core with 512 K L2 cache running at 1/2 clock rate.

Regards,
Peter

P.S. May I ask users of the Portland Group (PGI) Linux pgf77/90 Fortran compiler (e.g. Gaussian G98 users!) to request Athlon optimizations from PGI, so that PGI notices that there definitely is a market.

-------------------------------------------
Peter Burger
Anorg.-chem. Institut
Winterthurerstr. 190
Universitaet Zuerich
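
The dgemm figures quoted in the message are easier to compare once they are normalized by clock speed. The following is only a rough sketch of that arithmetic: the function and variable names are made up for illustration, and the only inputs are the three MFlops/MHz pairs given above. Dividing MFlops by MHz gives floating-point operations per clock cycle.

```python
# dgemm figures as quoted in the message: (CPU, clock in MHz, MFlops).
# The names below are illustrative only, not taken from any benchmark tool.
FIGURES = [
    ("Athlon", 800, 860),
    ("Athlon", 750, 825),
    ("PIII Katmai", 550, 440),
]


def flops_per_cycle(mflops, mhz):
    """MFlops divided by MHz: floating-point operations per clock cycle."""
    return mflops / mhz


def per_clock_table(figures):
    """Return (cpu, mhz, flops per cycle) for each quoted figure."""
    return [
        (cpu, mhz, round(flops_per_cycle(mflops, mhz), 3))
        for cpu, mhz, mflops in figures
    ]


if __name__ == "__main__":
    for cpu, mhz, fpc in per_clock_table(FIGURES):
        print(f"{cpu:12s} {mhz:4d} MHz  {fpc:.3f} flops/cycle")
```

With the quoted numbers this gives about 1.08 and 1.10 flops per cycle for the Athlon against 0.80 for the Katmai PIII, i.e. a per-clock dgemm advantage of roughly 34-38%. That is a library-level number; the 30% edge mentioned for Dmol is an application-level figure and is not derived from this ratio.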