Re: CCL:Athlon vs Intel performance
Hi,
just my two cents' worth.
The numbers I have show that at the same clock frequency the Athlon is
roughly 0-10% faster than the PIII on my benchmarks (ADF,
Turbomole, Jaguar). Things are different for Dmol, which makes heavy use
of Blas-3 dgemm, which is substantially faster on the Athlon using the
fabulous Atlas library (kudos to Clint Whaley!), e.g.
860 MFlops at 800 MHz on the Athlon, compared to
825 MFlops at 750 MHz on the Athlon and
440 MFlops at 550 MHz on a PIII (still Katmai core)
for dgemm-based matrix multiplications (independent of the matrix size).
Based on figures from Intel for their new math kernel library, the
performance of a 733 MHz Coppermine is below 600 MFlops.
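To compare the chips clock for clock, the dgemm figures above can be divided by the clock frequency. The following is only a small sketch of that arithmetic; the labels and the helper name flops_per_cycle are made up for illustration, and the Coppermine value is an upper bound because Intel's figure is only quoted as "below 600 MFlops".

```python
# dgemm figures quoted in the post: (label, MFlops, clock in MHz)
results = [
    ("Athlon 800", 860.0, 800.0),
    ("Athlon 750", 825.0, 750.0),
    ("PIII Katmai 550", 440.0, 550.0),
]


def flops_per_cycle(mflops, mhz):
    """Sustained floating-point operations per clock cycle."""
    return mflops / mhz


# Normalise every measurement by its clock frequency.
per_cycle = {label: flops_per_cycle(mflops, mhz) for label, mflops, mhz in results}

for label, value in per_cycle.items():
    print(f"{label:16s} {value:.3f} flops/cycle")

# Intel's number is only an upper bound ("below 600 MFlops" at 733 MHz).
coppermine_upper_bound = flops_per_cycle(600.0, 733.0)
print(f"Coppermine 733   < {coppermine_upper_bound:.3f} flops/cycle")
```

On these numbers the Athlon sustains slightly more than one flop per cycle in dgemm, while the Katmai PIII stays at 0.8.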
For Dmol this helps a lot and gives the Athlon a 30% performance
edge. But I guess most vendors that use Blas-3 libs will provide only
one version of their commercial software, which is quite likely to be based
on the Intel library.
For Gaussian, Gamess etc. you can obviously choose the library yourself,
but then it depends on how heavily the code relies on Blas-3. For Jaguar,
there was a tremendous speedup in their new Linux 4.0 version (a factor of
2 over 3.5), which to the best of my knowledge is at least in part due to
the use of dgemm. I will shortly post an updated version of my Linux
quantum chemistry software comparison, which includes new results
for Jaguar 4.0 and Dmol on the Athlon/Linux.
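How much a faster dgemm helps a given program depends on the share of run time spent in dgemm. A rough way to estimate this is Amdahl's law; the sketch below assumes that everything outside dgemm runs at unchanged speed, and the fractions and the 1.5x dgemm speedup are hypothetical values, not measurements.

```python
def overall_speedup(dgemm_fraction, dgemm_speedup):
    """Amdahl's law: only the dgemm share of the run time gets faster."""
    if not 0.0 <= dgemm_fraction <= 1.0:
        raise ValueError("dgemm_fraction must be between 0 and 1")
    if dgemm_speedup <= 0.0:
        raise ValueError("dgemm_speedup must be positive")
    return 1.0 / ((1.0 - dgemm_fraction) + dgemm_fraction / dgemm_speedup)


# Hypothetical example: dgemm itself runs 1.5x faster; vary its share of run time.
for fraction in (0.2, 0.5, 0.8):
    gain = overall_speedup(fraction, 1.5)
    print(f"{fraction:.0%} of run time in dgemm: {gain:.2f}x overall")
```

A code that spends little time in Blas-3 will see almost nothing of the dgemm advantage, whatever library it is linked against.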
The Athlon still has a lot more potential; it needs better compilers
that support its second FP pipeline (not present in the PIII) and a
faster memory bus. I heard that Compaq is working on an Athlon version
of their DVF NT Fortran compiler and supposedly already sees a good
speedup (according to c't, a German computer magazine). It is
supposedly due in 3 months.
The memory bus has already improved slightly with the advent of the new
KX133 chipset, which supports 133 MHz SDRAM. It ought to get better still
with DDR-RAM-aware chipsets in the second half of this
year. In any case, STREAM memory bandwidth benchmarks went up from
500 MB/sec on the 100 MHz bus to 600-700 MB/sec on the 133 MHz memory bus
in the new KX133-based boards.
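To put the bandwidth figures above in perspective, they can be compared with the theoretical peak of the memory bus. This is a back-of-the-envelope sketch only; it assumes a 64-bit (8-byte) wide SDRAM data bus with one transfer per clock, and two transfers per clock for DDR. The function names are chosen for illustration.

```python
BUS_WIDTH_BYTES = 8  # assumption: 64-bit wide SDRAM data bus


def peak_bandwidth_mb_s(bus_mhz, transfers_per_clock=1):
    """Theoretical peak in MB/sec: clock (MHz) x bus width x transfers per clock."""
    return bus_mhz * BUS_WIDTH_BYTES * transfers_per_clock


def efficiency(measured_mb_s, bus_mhz, transfers_per_clock=1):
    """Fraction of the theoretical peak that a measured bandwidth reaches."""
    return measured_mb_s / peak_bandwidth_mb_s(bus_mhz, transfers_per_clock)


# Measured figures quoted in the post: (bus clock in MHz, MB/sec)
measurements = [(100, 500), (133, 600), (133, 700)]

for bus_mhz, measured in measurements:
    peak = peak_bandwidth_mb_s(bus_mhz)
    share = efficiency(measured, bus_mhz)
    print(f"{bus_mhz} MHz bus: {measured} of {peak} MB/sec peak ({share:.0%})")

# DDR moves data on both clock edges, which doubles the theoretical peak.
print(f"DDR at 133 MHz: {peak_bandwidth_mb_s(133, transfers_per_clock=2)} MB/sec peak")
```

Under these assumptions the measured numbers are well below the theoretical peak on both buses, so there is still headroom beyond the raw bus clock.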
Also, one can expect a lot of good things from the new Thunderbird core,
which has the L2 cache on-die running at the core clock speed and
should be out soon. At least Intel's Coppermine with 256 K on-die L2 cache
seems to have a 20% performance edge over the Katmai core with 512 K
L2 cache running at 1/2 the clock rate.
Regards,
Peter
P.S. May I ask users of the Portland Group (PGI) Linux pgf77/90 Fortran
compiler (e.g. Gaussian G98 users!) to ask PGI for Athlon
optimizations, so that PGI notices that there definitely is a market.
-------------------------------------------
Peter Burger
Anorg.-chem. Institut
Winterthurerstr. 190
Universitaet Zuerich