Athlon optimized BLAS routines



 From: Emil Briggs <briggs \\at// tick.physics.ncsu.edu>
 I've written a set of Athlon-optimized double-precision
 BLAS1 routines in assembly. The source can be found at
 http://nemo.physics.ncsu.edu/~briggs/blas_src_v0.11_tar.gz
 The routines are in an early stage of development, but they do
 pass the Level 1 BLAS testing routines from netlib and they
 work correctly in the applications I've tested. That's not
 a guarantee that they are correct in all cases, however, so
 use with caution. The routines use the Athlon memory prefetch
 instructions, which really boost performance for large data
 sets. (You'll get an illegal instruction exception on Pentium III
 systems.)
 I'm still tuning, optimizing and debugging but these are already
 much faster on an Athlon than any other set of BLAS1 libs that
 I know of available for Linux.
 Regards
 Emil
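 The prefetch idea mentioned above can be illustrated with a small C
 sketch. This is not the assembly from the tarball: it is a plain
 unit-stride DAXPY loop that uses the GCC/Clang builtin
 __builtin_prefetch as a portable hint. The name daxpy_prefetch, the
 PREFETCH_AHEAD distance and the 64-byte line size are assumptions made
 for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tuning value: how many doubles ahead of the current
   element to request. A real routine would tune this per CPU. */
#define PREFETCH_AHEAD 64

/* Assumed cache line size of 64 bytes, i.e. 8 doubles per line. */
#define DOUBLES_PER_LINE 8

/* y[i] += alpha * x[i] for unit-stride vectors (the incx = incy = 1
   case of DAXPY). The prefetch calls are only hints; the result is
   the same with or without them. */
void daxpy_prefetch(size_t n, double alpha, const double *x, double *y)
{
    assert(n == 0 || (x != NULL && y != NULL));

    for (size_t i = 0; i < n; i++) {
        /* Once per assumed cache line, hint the data we will need soon.
           The bounds check keeps the address expression inside the arrays. */
        if (i % DOUBLES_PER_LINE == 0 && i + PREFETCH_AHEAD < n) {
            __builtin_prefetch(&x[i + PREFETCH_AHEAD], 0, 0); /* will be read */
            __builtin_prefetch(&y[i + PREFETCH_AHEAD], 1, 0); /* will be written */
        }
        y[i] += alpha * x[i];
    }
}
```

 Whether a compiler turns these hints into instructions that a given
 CPU accepts depends on the target flags, so a build like this should
 be checked on the machine it is meant for.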
 ------------------------------------------------------------------
 Benchmark results on a 500 MHz Athlon system. Best time of 20 iterations.
 Intel optimized libraries are version lsblasppro1.1o_08.99.a courtesy
 of Greg Henry.
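 For readers who want to repeat a "best of 20 iterations" measurement,
 here is a minimal C sketch of such a harness. It is not the benchmark
 program used for the numbers in this post; best_time_seconds and
 count_call are hypothetical names, and clock() from the C standard
 library is assumed to be a good enough timer for the workload.

```c
#include <assert.h>
#include <time.h>

/* Run fn(ctx) `iterations` times and return the smallest elapsed
   processor time in seconds, as reported by clock(). */
double best_time_seconds(void (*fn)(void *), void *ctx, int iterations)
{
    assert(fn != NULL && iterations > 0);

    double best = -1.0; /* sentinel: no measurement taken yet */
    for (int i = 0; i < iterations; i++) {
        clock_t start = clock();
        fn(ctx);
        clock_t stop = clock();

        double elapsed = (double)(stop - start) / CLOCKS_PER_SEC;
        if (best < 0.0 || elapsed < best)
            best = elapsed;
    }
    return best;
}

/* Placeholder workload: increments an int counter so the number of
   calls can be checked. Replace with a wrapper around the routine
   being timed. */
void count_call(void *ctx)
{
    (*(int *)ctx)++;
}
```

 Taking the minimum rather than the mean discards runs that were
 disturbed by other activity on the machine, which is the usual reason
 for reporting a best time.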
 DAXPY
 **********************************************************
            N        Athlon libs         Intel optimized
         4096           411.49           373.48
        16384           109.60            82.35
        65536            57.49            49.29
       262144            52.66            43.67
      1048576            51.31            43.66
 DDOT
 **********************************************************
            N        Athlon libs         Intel optimized
         4096           587.35           164.40
        16384           155.30            72.18
        65536            80.81            42.39
       262144            75.72            28.76
      1048576            74.79            26.16
 DSCAL
 **********************************************************
            N        Athlon libs         Intel optimized
         4096           293.67           164.40
        16384            91.50            72.18
        65536            58.36            42.39
       262144            44.73            28.76
      1048576            42.25            26.16
 DNRM2
 **********************************************************
            N        Athlon libs         Intel optimized
         4096           630.45           545.39
        16384           394.94           338.10
        65536           182.55           116.71
       262144           156.92            72.56
      1048576           153.80            65.74
 DCOPY
 **********************************************************
            N        Athlon libs         Intel optimized
         4096          4398.05           2421.83
        16384           888.85            515.11
        65536           539.97            324.83
       262144           449.60            308.21
      1048576           445.05            298.25
 -------------------------------------------------------------------