Re: CCL:Athlon vs Intel performance



Hi,
 just my two cents' worth.
 The numbers I have show that at the same clock frequency the Athlon is
 roughly 0-10% faster than the PIII on my benchmarks (ADF,
 Turbomole, Jaguar). Things are different for Dmol, which makes heavy use
 of the BLAS-3 routine dgemm. That routine is substantially faster on the
 Athlon with the fabulous ATLAS library (kudos to Clint Whaley!), e.g.
 860 MFlops at 800 MHz for the Athlon compared to
 825 MFlops at 750 MHz for the Athlon and
 440 MFlops at 550 MHz on a PIII (still Katmai core)
 for dgemm-based matrix multiplications (independent of the matrix size).
 Based on figures from Intel for their new Math Kernel Library, the
 performance of a 733 MHz Coppermine is below
 600 MFlops.
 For Dmol this helps a lot and gives the Athlon a 30% performance
 edge. But I guess most vendors that use BLAS-3 libs will provide only
 one version of their commercial software, which is quite likely to be based
 on the Intel library.
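 To compare the dgemm figures quoted above independently of clock speed,
 one can divide MFlops by MHz to get flops per clock cycle. The following
 is only a small sketch of that arithmetic; the labels and function names
 are made up for illustration, and the inputs are just the three measured
 numbers from this post.

```python
# dgemm throughput figures quoted in this post: (label, clock in MHz, MFlops).
# The labels are illustrative names, not official product designations.
DGEMM_RESULTS = [
    ("Athlon 800 MHz", 800, 860),
    ("Athlon 750 MHz", 750, 825),
    ("PIII 550 MHz (Katmai)", 550, 440),
]


def flops_per_cycle(clock_mhz, mflops):
    """MFlops divided by MHz gives floating-point operations per clock cycle."""
    return mflops / clock_mhz


def per_clock_table(results):
    """Map each label to its flops-per-cycle value, rounded to 3 decimals."""
    return {
        label: round(flops_per_cycle(mhz, mflops), 3)
        for label, mhz, mflops in results
    }


def relative_advantage(a, b):
    """Fractional advantage of a over b (0.25 means 25% faster per clock)."""
    return a / b - 1.0


if __name__ == "__main__":
    table = per_clock_table(DGEMM_RESULTS)
    for label, value in table.items():
        print(f"{label:24s} {value:.3f} flops/cycle")
    edge = relative_advantage(
        table["Athlon 800 MHz"], table["PIII 550 MHz (Katmai)"]
    )
    print(f"Athlon 800 vs PIII 550, per clock: {edge:.1%}")
```

 The Coppermine figure is left out on purpose, since "below 600 MFlops" at
 733 MHz is only an upper bound and not a measurement.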
 For Gaussian, Gamess etc. you can obviously choose yourself, but then
 it depends on how much your code relies on BLAS-3. For Jaguar, there was
 a tremendous speedup in the new 4.0 Linux version (a factor of 2 over
 3.5), which to the best of my knowledge is at least in part due to the
 use of dgemm. I will shortly post an updated version of my Linux
 quantum chemistry software comparison, which includes new results
 for Jaguar 4.0 and Dmol on the Athlon/Linux.
 The Athlon still has a lot more potential and needs better compilers
 to support its second FP pipeline (not present in the PIII) and a
 faster memory bus. I heard that Compaq is working on an Athlon version
 of their DVF NT Fortran compiler and that they supposedly already see
 good speedups (taken from c't, a German computer magazine). It is
 reportedly due in 3 months.
 The memory bus has already improved slightly with the advent of the new
 KX133 chipset, which supports 133 MHz SDRAM. It ought to get better still
 with DDR-RAM-aware chipsets in the second half of this
 year. Even so, STREAM memory bandwidth benchmarks went up from
 500 MB/sec on the 100 MHz bus to 600-700 MB/sec on the 133 MHz memory bus
 in the new KX133-based boards.
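 To put the STREAM numbers in perspective, reading both as MB/sec, one
 can compare them with the theoretical peak of the memory bus. The sketch
 below assumes the usual 64-bit (8-byte) SDRAM data path, which is my
 assumption and not something measured here; the function names are
 likewise only illustrative.

```python
# Assumption: a standard 64-bit (8-byte) wide SDRAM data path,
# transferring once per bus clock (single data rate).
BUS_WIDTH_BYTES = 8


def peak_bandwidth_mb_s(bus_mhz, width_bytes=BUS_WIDTH_BYTES):
    """Theoretical peak in MB/sec: bus clock (MHz) times bytes per transfer."""
    return bus_mhz * width_bytes


def efficiency(measured_mb_s, bus_mhz):
    """Fraction of the theoretical peak that a measured figure reaches."""
    return measured_mb_s / peak_bandwidth_mb_s(bus_mhz)


if __name__ == "__main__":
    # STREAM figures quoted in this post: (bus clock in MHz, measured MB/sec).
    for bus_mhz, measured in [(100, 500), (133, 600), (133, 700)]:
        peak = peak_bandwidth_mb_s(bus_mhz)
        share = efficiency(measured, bus_mhz)
        print(f"{bus_mhz} MHz bus: {measured} of {peak} MB/sec ({share:.0%})")
```

 Under that assumption the measured bandwidth stays well below the
 theoretical peak on both buses, so the gain from the faster bus comes
 mainly from the higher clock.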
 Also, one can expect a lot of good things from the new Thunderbird core,
 which has the L2 cache on-die running at the core clock speed and which
 should be out soon. At least Intel's Coppermine with 256 K on-die L2 cache
 seems to have a 20% performance edge over the Katmai core with 512 K
 L2 cache running at 1/2 the clock rate.
 Regards,
 Peter
 P.S. May I ask users of the Portland Group (PGI) Linux pgf77/90 Fortran
 compiler (e.g. Gaussian G98 users!) to ask PGI for Athlon
 optimizations, so that PGI notices that there definitely is a market.
 -------------------------------------------
 Peter Burger
 Anorg.-chem. Institut
 Winterthurerstr. 190
 Universitaet Zuerich