Re: HP vs. IBM



bernhold (- at -) qtp.ufl.edu writes:
 >It would be nice to also know the hardware configurations of the
 >machines (amt. of memory, type of disks, etc.) since this can influence
 >the performance as well.
 SS1: 24 Mbytes of 80-ns DRAM memory.
      scratch on a tmpfs file system spanning
         a Wren VI CDC 94191-766 and a Fujitsu M2263 drive.***
      20 MHz Sparc-based CPU with a 20 MHz Weitek 3170-based FPU.
 SS2: 24 Mbytes of 80-ns DRAM memory.
      internal SUN 207 Mb 3.5'' SCSI drives
      scratch on a Wren VI CDC 94191-766
      40 MHz Sparc-based CPU with a 40 MHz TI TMS390C602A-based FPU.
 Decstation 5000: 24 Mbytes of DRAM memory.
      scratch on a Wren VI CDC 94191-766
      25 MHz MIPS-based CPU
 HP/720: 32 Mbytes of 80-ns DRAM memory
      two internal Quantum 210 Mb 3.5'' SCSI drives
      scratch on external Fujitsu 1.4 Gb drive.
      50 MHz PA RISC 1.1 CPU
 I hope this helps. I don't have information on the HP/730 or IBM Model
 530's disk/memory configurations; they were on evaluation here a few months
 ago and aren't available to me now.
 ***I don't recommend this configuration for Sparcstations doing
 ab-initio work; a bug in the 4.1.1 tmpfs file system sometimes
 put large jobs into an uninterruptible disk wait for very long periods.
 (The 100174-01 OS patches haven't helped.) I now use ordinary 4.2 file
 systems tuned for very large scratch files.
 Here are some posts (clipped from comp.arch a few months ago) that
 may be of interest:
 ----- Begin Included Message -----
 Article: 4642 of comp.arch
 Xref: news.larc.nasa.gov comp.sys.hp:2596 comp.sys.apollo:2720 comp.arch:4642 comp.benchmarks:466
 Path: news.larc.nasa.gov!uakari.primate.wisc.edu!zaphod.mps.ohio-state.edu!rpi!uupsi!sunic!kuling!irf
 From: irf (- at -) kuling.UUCP (Bo Thide')
 Newsgroups: comp.sys.hp,comp.sys.apollo,comp.arch,comp.benchmarks
 Subject: Snakebytes (long -- and poisonous?).
 Message-ID: <1998 (- at -) kuling.UUCP>
 Date: 27 Mar 91 00:48:19 GMT
 Sender: news (- at -) kuling.UUCP
 Reply-To: irf (- at -) kuling.DoCS.UU.SE (Bo Thide')
 Organization: Dept. of Computer Systems, Uppsala University, Sweden
 Lines: 95
 Now that the Snakes (HP9000/700 series HP-PA 1.1 RISC workstations) are let
 loose, the official HP info has become available.  Some of this info follows.
 There are three models, the desktop (114mm*508mm*470mm) 720 (Cobra) and
 730 (King Cobra) and the deskside (610mm*220mm*595mm) 750 (Coral). They
 come initially with HP-UX 8.01 to be upgraded to HP-UX 8.05 in June. Later
 OSF/1 will be available.
 Clock: 50 MHz (720) or 66 MHz (730, 750)
 Cache: 128 kB instr/256 kB data (720, 730), 256 kB instr/256 kB data (750).
 Interfaces: SCSI-II, EISA, LAN, RS-232 (to 460.8 kbaud), HP-HIL, Centronics.
             HP-IB optional (via EISA!).
 Monitors: 72 Hz, 19" 1280x1024 8-bit grayscale (GRX) or 8+8 color planes (CRX).
 Software: X11R4, OSF/Motif1.2 (not 1.1!), VUE, NCS, NFS, 4.3BSD TCP/IP, ARPA.
 Languages: C, C++, Pascal, FORTRAN, ANSI C, Assembler.  FORTRAN compiler
 	   with "+800" option for series 800 compatibility. Series 800
 	   binaries run on series 700 machines.
 Performance (with HP-UX 8.05) and comparison with other workstations:
 -----------------------------------------------------------------------------
                             SPEC        Khorner-       Linp2P  x11-  Dhry-
                         mark int  fp    stones   MIPS  MFLOPS  perf  stone2.0
 -----------------------------------------------------------------------------
 HP9000/730,750 G/CRX    72.2 51.0 91.0  143974   76    22.9    10460  114680
 HP9000/720 G/CRX        55.5 39.0 70.2  119213   57    17.2     8244   87000
 IBM 6000/550            54.3 34.5 73.5   n/a     56    23       n/a    n/a
 IBM 6000/320            24.6 16.3 32.4   54661   29.5   8.5     1520   45250
 Sun SPARCstation 2GX    21.0 20.2 21.5   27142   28.5   4.2     n/a    35590
 DECstation 5000/200PXGT 18.5 19.0 18.5   26456   24.2   3.7     3256   38760
 DECstation 3100         11.3 11.8 10.9   15285   14.9   1.6     1702   23470
 Sun SPARCstation IPC    11.8 12.4 11.4   13329   15.7   1.7     n/a    22830
 -----------------------------------------------------------------------------
 Linp2P = Linpack Double precision, 100*100 FORTRAN BLAS, rolled.
 x11perf = geometric mean of the x11perf1.2 component tests (excluding 1
 	  and 500 pixel tests).
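A quick consistency check on the table above, as a sketch only. It assumes the "SPEC mark" column is the geometric mean over the 4 integer and 6 floating-point programs of the 1989 SPEC suite; the post does not say how the column was computed, so the 4/6 weighting and the helper name `specmark` are assumptions. Under that assumption the overall figure can be recomputed from the int and fp columns:

```python
# Sketch: recompute the overall SPECmark from the int and fp columns.
# Assumption (not stated in the post): overall = geometric mean over
# 4 integer + 6 floating-point benchmarks, as in the 1989 SPEC suite.

def specmark(spec_int, spec_fp, n_int=4, n_fp=6):
    """Weighted geometric mean of the integer and fp sub-ratings."""
    total = n_int + n_fp
    return (spec_int ** n_int * spec_fp ** n_fp) ** (1.0 / total)

# (int, fp, quoted overall) copied from the table above
rows = {
    "HP9000/730,750": (51.0, 91.0, 72.2),
    "HP9000/720": (39.0, 70.2, 55.5),
    "IBM 6000/550": (34.5, 73.5, 54.3),
}

for name, (spec_int, spec_fp, quoted) in rows.items():
    recomputed = specmark(spec_int, spec_fp)
    print(f"{name:16s} recomputed {recomputed:5.1f}  quoted {quoted}")
```

If a recomputed value differs from the quoted one by more than rounding, either the weighting assumption or a table entry is off for that row.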
 Selected x11perf Tests:
 -----------------------------------------------------------------------------
 			         10 pixel  10*10   TR      create & map
 			Dots     lines     rects   text    subwins (50 kids)
 -----------------------------------------------------------------------------
 HP9000/730,750 G/CRX    1630000  911000    278000  273000  6000
 HP9000/720 G/CRX        1260000  874000    272000  245000  4500
 DECstation 5000/200PXGT  370000  455000    256000   90900  1750
 Sun SPARCstation 2GX     101100  147000     83500   49000  1050
 -----------------------------------------------------------------------------
 Graphics Performance:
 -----------------------------------------------------------------------------
                           2D floating       3D floating pt
 		    	  pt vectors/s      vectors/s (peak)
 -----------------------------------------------------------------------------
 HP9000/730,750 G/CRX      1120000           1150000
 HP9000/720 G/CRX          1120000           1150000
 DECstation 5000/200PXGT    300000            300000
 Sun SPARCstation 2GX       450000            240000
 -----------------------------------------------------------------------------
 Sequential Disk Access Rates:
 -----------------------------------------------------------------------------
                                        Read (kB/s)       Write (kB/s)
 -----------------------------------------------------------------------------
 HP9000/700, 1*210MByte disk            1120              1140
 HP9000/700, 1*420MByte disk            1520              1510
 HP9000/700, 2*210MByte disk            2070              1800
 HP9000/700, 2*420MByte disk            2460              2140
 Sun SPARCstation 2, 207MByte disk       744               794
 -----------------------------------------------------------------------------
 ANSYS SP-3 results (smaller = better):
 -----------------------------------------------------------------------------
                             CPU seconds
 -----------------------------------------------------------------------------
 Cray 2                       27
 HP9000/730,750 G/CRX         49
 DEC VAX9000                  65
 HP9000/720 G/CRX             66
 IBM 6000/540                 68
 IBM 6000/320                107
 DECstation 5000             145
 Sun SPARCstation 2          225
 Sun SPARCstation 1+         311
 -----------------------------------------------------------------------------
 HP numbers were measured with series 800 compiler code. No series 700
 specific optimizations used.
 >From news.larc.nasa.gov!elroy.jpl.nasa.gov!sdd.hp.com!hplabs!hpda!hpcuhb!hpcuhe!linley Wed Mar 27 10:03:36 EST 1991
 Article: 4644 of comp.arch
 Path: news.larc.nasa.gov!elroy.jpl.nasa.gov!sdd.hp.com!hplabs!hpda!hpcuhb!hpcuhe!linley
 From: linley (- at -) hpcuhe.cup.hp.com (Linley Gwennap)
 Newsgroups: comp.arch
 Subject: Re: Snake
 Message-ID: <32580006 (- at -) hpcuhe.cup.hp.com>
 Date: 26 Mar 91 22:35:14 GMT
 References: <69465 (- at -) brunix.UUCP>
 Organization: PA-RISC Marketing Central
 Lines: 104
 Due to popular demand, here is an article comparing the new Snakes CPU to
 IBM's "America" chip (used in the RS/6000 series).  I have deleted the
 section on America.  I would be happy to post more info if this is useful.
 						--Linley Gwennap
 						  Hewlett-Packard
 HP SNAKES CPU
 HP's high-performance chip set consists of the "Snakes" CPU chip and a
 floating point coprocessor ("FPC") jointly developed with Texas
 Instruments[1].  These are the first chips to implement the PA-RISC 1.1
 architecture.  They use a traditional RISC approach to achieve
 industry-leading performance of 72 SPECmarks with a 66 MHz clock.
 PA-RISC 1.1, an extension to the original PA-RISC architecture, includes
 several new instructions, many of which accelerate graphics operations[2].
 A multiply-and-add instruction (as in IBM's POWER) is included.  In
 addition, the page size was doubled to 4 KB to reduce the TLB miss rate, and
 eight "shadow" registers were added to provide quick context switching for
 the TLB miss handler.
 The CPU contains all integer  instruction  processing,  cache  control  and
 memory  management  functions.   All  cache  memory is included in external
 SRAMs connected directly to the CPU.  Snakes has a 64-bit path  to  the  D-
 cache,  just  like  the  R4000.   Both  the I- and D-caches can be accessed
 simultaneously, resulting in a total cache bandwidth of 792 MB  per  second
 (peak).   The  FPC implements all floating point instructions.  It receives
 instructions and data from the caches at the same time as the CPU, and  du-
 plicates parts of the CPU's instruction pipeline, eliminating the penalties
 often incurred by separate CPU and FPC chips.  Snakes is designed  to  work
 with a variety of memory and I/O interfaces.
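The 792 MB/s peak figure can be reproduced with simple arithmetic. This is a sketch under stated assumptions: a 66 MHz clock (the maximum frequency quoted in the article), the stated 64-bit D-cache path, and a 32-bit I-cache path. The I-cache width is an assumption (one 32-bit instruction per cycle); the article does not state it.

```python
# Sketch: peak cache bandwidth when both caches transfer every cycle.
CLOCK_MHZ = 66       # maximum clock quoted in the article
D_CACHE_BITS = 64    # stated in the article
I_CACHE_BITS = 32    # ASSUMPTION: one 32-bit instruction fetched per cycle

def peak_mb_per_s(width_bits, clock_mhz):
    """Bytes moved per cycle times millions of cycles per second."""
    return width_bits // 8 * clock_mhz

d_cache = peak_mb_per_s(D_CACHE_BITS, CLOCK_MHZ)
i_cache = peak_mb_per_s(I_CACHE_BITS, CLOCK_MHZ)
total = d_cache + i_cache

print(f"D-cache {d_cache} MB/s + I-cache {i_cache} MB/s = {total} MB/s")
```

With these widths the sum matches the quoted total, which suggests (but does not prove) that the 32-bit I-cache assumption is what the article had in mind.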
 The CPU uses a five-stage pipeline to reduce cycle time.  The penalties  in
 this  pipeline  have been minimized.  For example, conditional branches are
 executed with no delay if their outcome is predicted  correctly,  and  with
 only  a  single  cycle penalty otherwise.  The branch prediction algorithm,
 more advanced than America's, predicts forward branches to be  untaken  and
 backward  branches  taken, thus optimizing for loops. The load penalty is a
 maximum of one cycle and the store penalty a maximum of two;  these  penal-
 ties can usually be avoided by the compiler. All other integer instructions
 (except a few rare system control functions) are always executed in a  sin-
 gle  cycle.   This uncomplicated design is reflected by a simple, efficient
 compiler.
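The static prediction rule described above (backward branches predicted taken, forward branches untaken) can be sketched as follows. This is an illustrative model, not HP's implementation; the addresses and function names are hypothetical, and treating a branch to itself as "backward" is an assumption.

```python
# Sketch of the static branch prediction rule from the paragraph above.
# Penalties follow the article: 0 cycles if predicted correctly, 1 otherwise.

def predict_taken(branch_pc, target_pc):
    """Backward targets are assumed to close a loop, so predict taken.
    ASSUMPTION: a branch to its own address counts as backward."""
    return target_pc <= branch_pc

def branch_penalty(branch_pc, target_pc, actually_taken):
    """Cycle penalty for one executed conditional branch."""
    return 0 if predict_taken(branch_pc, target_pc) == actually_taken else 1

def loop_penalty(iterations, branch_pc=0x120, target_pc=0x100):
    """Total penalty for a loop-closing backward branch that is taken on
    every iteration except the last (iterations must be >= 1)."""
    outcomes = [True] * (iterations - 1) + [False]
    return sum(branch_penalty(branch_pc, target_pc, t) for t in outcomes)

print("penalty cycles for a 10-iteration loop:", loop_penalty(10))
```

In this model a loop pays the one-cycle penalty only once, on exit, regardless of the iteration count, which is the sense in which the rule "optimizes for loops".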
 Although Snakes is not superscalar, PA-RISC instructions such as ADD AND
 BRANCH, MOVE AND BRANCH and COMPARE AND BRANCH allow an amount of
 parallelism similar to America's for integer-only applications; in fact,
 the ratio of Integer SPECmarks to MHz for Snakes (65/66) actually exceeds
 America's (35/42).
 FPC is a full 64-bit implementation.  It contains  two  parallel  execution
 units:   the ALU (addition, conversion) and the MPY unit (multiply, divide,
 square root).  Each unit can start a new operation on every other cycle, so
 FPC  can  accept one floating point instruction per cycle provided that ALU
 and MPY instructions are alternated.
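The issue rule above can be illustrated with a toy model. It assumes in-order issue of at most one floating point instruction per cycle, with each unit unable to start another operation until two cycles after its last start; result latencies and data dependencies are ignored. The function name and the model itself are a sketch, not a description of the actual hardware.

```python
# Toy model of FPC instruction issue: two units (ALU, MPY), each able to
# start a new operation every other cycle, in-order issue, one per cycle.

def issue_cycles(ops):
    """Return the number of cycles needed to issue all ops in order."""
    next_free = {"ALU": 0, "MPY": 0}  # earliest cycle each unit can start
    cycle = 0                         # earliest cycle for the next issue
    last_start = -1
    for op in ops:
        start = max(cycle, next_free[op])
        next_free[op] = start + 2     # unit is busy for the following cycle
        cycle = start + 1             # at most one issue per cycle
        last_start = start
    return last_start + 1

alternating = ["ALU", "MPY"] * 4
alu_only = ["ALU"] * 8

print("alternating:", issue_cycles(alternating), "cycles for 8 instructions")
print("ALU only:   ", issue_cycles(alu_only), "cycles for 8 instructions")
```

In this model the alternating stream sustains one instruction per cycle, while a stream that uses only one unit runs at roughly half that rate.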
 The external caches are direct mapped and are protected by  parity,  making
 them  slightly less robust than America's ECC cache.  Cache coherency flags
 are included to facilitate multiprocessor operation.  A write-back protocol
 is used to reduce writes to main memory.  Although Snakes does not
 implement America's complex "critical word first" algorithm on cache misses,
 it will begin processing as soon as the critical word is obtained, reducing
 the miss penalty by as much as seven cycles.  Snakes supports a wide
 variety  of  off-the-shelf  SRAMs  and can be configured with anywhere from
 8 KB to 3 MB of external cache.  At  its  maximum  operating  frequency  of
 66 MHz, it requires 12 ns SRAMs.
 The I- and D-TLBs are fully associative and contain 96 entries each.  In
 addition, each TLB implements four variable size "block" entries capable of
 mapping up to 16 MB each, which can be used for large portions of the
 operating system and/or graphics frame buffers.  The memory system supports
 48 bits (256 terabytes) of virtual address space and 32 bits  (4 gigabytes)
 of  real address space.  (This is a subset of the full 64-bit virtual space
 allowed by PA-RISC).  Two addressing modes support 1 GB or 4 GB  data  seg-
 ments, significantly larger than America's segments.
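The sizes quoted in this paragraph can be checked with a few lines of arithmetic. The sketch below uses binary units (1 KB = 1024 bytes) and the 4 KB page size mentioned earlier in the article; "reach" here just means entries times mapped size, and the variable names are mine.

```python
# Sketch: TLB reach and address-space sizes from the figures in the article.
KB, MB, GB, TB = 2**10, 2**20, 2**30, 2**40

PAGE_SIZE = 4 * KB        # PA-RISC 1.1 page size (from the article)
TLB_ENTRIES = 96          # per TLB
BLOCK_ENTRIES = 4         # per TLB
BLOCK_MAX = 16 * MB       # maximum mapping per block entry

page_reach = TLB_ENTRIES * PAGE_SIZE      # memory covered by page entries
block_reach = BLOCK_ENTRIES * BLOCK_MAX   # memory covered by block entries
virtual_space = 2**48
real_space = 2**32

print("page-entry reach :", page_reach // KB, "KB")
print("block-entry reach:", block_reach // MB, "MB")
print("virtual space    :", virtual_space // TB, "TB")
print("real space       :", real_space // GB, "GB")
```

The gap between the page-entry reach and the block-entry reach shows why the block entries are useful for large, long-lived mappings such as the kernel or a frame buffer.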
 A separate bus provides access to memory, I/O and,  if  desired,  graphics.
 This bus is a synchronous, dedicated interface with a peak transfer rate of
 264 MB per second, about one-half the speed  of  America's  memory  system.
 The bus bandwidth is limited by its width of 32 bits, but a wider bus would
 have required a larger, more expensive package.  Snakes's cache miss penal-
 ty,  measured  in cycles, is much higher than America's, due to the shorter
 clock cycle time. Snakes compensates for these penalties  by  allowing  for
 large  external caches to reduce the miss rate; the performance numbers for
 Snakes assume a 128 KB instruction cache and 256 KB data cache.
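Two points in this paragraph lend themselves to a small worked example: the 264 MB/s bus figure, and why a shorter clock cycle raises the miss penalty when counted in cycles. The sketch assumes the bus runs at the 66 MHz CPU clock and uses a made-up 200 ns memory latency purely for illustration; the 42 MHz value for America is taken from the "(35/42)" ratio quoted earlier in the article.

```python
# Sketch: bus peak rate, and a fixed memory latency expressed in cycles.
BUS_WIDTH_BITS = 32
SNAKES_MHZ = 66
AMERICA_MHZ = 42                 # from the "(35/42)" ratio in the article
HYPOTHETICAL_LATENCY_NS = 200    # ASSUMPTION: illustrative value only

def bus_peak_mb_per_s(width_bits, clock_mhz):
    """One bus-width transfer per cycle (assumes bus clock == CPU clock)."""
    return width_bits // 8 * clock_mhz

def miss_cycles(latency_ns, clock_mhz):
    """Whole cycles spent waiting: ceil(latency_ns * clock_mhz / 1000)."""
    return -(-latency_ns * clock_mhz // 1000)

print("bus peak:", bus_peak_mb_per_s(BUS_WIDTH_BITS, SNAKES_MHZ), "MB/s")
print("Snakes  :", miss_cycles(HYPOTHETICAL_LATENCY_NS, SNAKES_MHZ), "cycles")
print("America :", miss_cycles(HYPOTHETICAL_LATENCY_NS, AMERICA_MHZ), "cycles")
```

The same latency in nanoseconds costs more cycles on the faster clock, which is the effect the article says the large external caches are meant to offset.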
 The CPU is fabricated in HP's CMOS-26 process (a  1.0 micron,  three  metal
 layer  process)  and  packaged in a 408-pin PGA.  FPC is fabricated in TI's
 0.8 micron CMOS process and placed in  a  207-pin  PGA.   These  PGAs  were
 custom-designed  to  allow  high  frequency operation with wide CMOS buses.
 The CPU contains about 577,000 transistors, while FPC  uses  640,000.   For
 lower-cost  systems,  the  chip set is designed to run at frequencies below
 66 MHz, allowing lower-speed SRAMs to be used.  FPC can also be  eliminated
 to further reduce costs.
 REFERENCES AND NOTES
 [1] "CMOS PA-RISC Processor for a New Family of Workstations" by
 M. Forsyth, S. Mangelsdorf, E. DeLano, C. Gleason and J. Yetter, COMPCON
 Spring 91 Digest of Technical Papers, February 1991.
 [2] "Architecture and Compiler Enhancements for PA-RISC Workstations" by
 D. Odnert, R. Hansen, M. Dadoo and M. Laventhal, COMPCON Spring 91 Digest
 of Technical Papers, February 1991.
 ----- End Included Message -----
 -----
 Fred Senese, MS 234 (804) 864-4777 | senese (- at -) schug.larc.nasa.gov (128.155.22.47)
 Speaking from (but not for) NASA-LaRC, Hampton VA 23665-5225
 Subliminal Message: 1. Anonymously ftp to schug.larc.nasa.gov
                     2. cd ~/resume ; get resume.[ps|tex|ascii].
                     3. Hire me.