Re: HP vs. IBM
bernhold (- at -) qtp.ufl.edu writes:
>It would be nice to also know the hardware configurations of the
>machines (amt. of memory, type of disks, etc.) since this can influence
>the performance as well.
SS1:             24 Mbytes of 80-ns DRAM memory.
                 scratch on a tmpfs file system spanning
                 a Wren VI CDC 94191-766 and a Fujitsu M2263 drive.***
                 20 MHz SPARC-based CPU with a 20 MHz Weitek 3170-based FPU.
SS2:             24 Mbytes of 80-ns DRAM memory.
                 internal SUN 207 MB 3.5'' SCSI drives
                 scratch on a Wren VI CDC 94191-766
                 40 MHz SPARC-based CPU with a 40 MHz TI TMS390C602A-based FPU.
DECstation 5000: 24 Mbytes of DRAM memory.
                 scratch on a Wren VI CDC 94191-766
                 25 MHz MIPS-based CPU
HP/720:          32 Mbytes of 80-ns DRAM memory
                 two internal Quantum 210 MB 3.5'' SCSI drives
                 scratch on external Fujitsu 1.4 GB drive.
                 50 MHz PA-RISC 1.1 CPU
I hope this helps. I don't have information on the HP/730 or IBM Model
530's disk/memory configurations; they were on evaluation here a few months
ago and aren't available to me now.
***I don't recommend this configuration for Sparcstations doing
ab-initio work; a bug in the 4.1.1 tmpfs file system sometimes
put large jobs into an uninterruptible disk wait for very long periods.
(The 100174-01 OS patches haven't helped.) I now use ordinary 4.2 file
systems tuned for very large scratch files.
Here are some posts (clipped from comp.arch a few months ago) that
may be of interest:
----- Begin Included Message -----
Article: 4642 of comp.arch
Xref: news.larc.nasa.gov comp.sys.hp:2596 comp.sys.apollo:2720 comp.arch:4642 comp.benchmarks:466
Path: news.larc.nasa.gov!uakari.primate.wisc.edu!zaphod.mps.ohio-state.edu!rpi!uupsi!sunic!kuling!irf
From: irf (- at -) kuling.UUCP (Bo Thide')
Newsgroups: comp.sys.hp,comp.sys.apollo,comp.arch,comp.benchmarks
Subject: Snakebytes (long -- and poisonous?).
Message-ID: <1998 (- at -) kuling.UUCP>
Date: 27 Mar 91 00:48:19 GMT
Sender: news (- at -) kuling.UUCP
Reply-To: irf (- at -) kuling.DoCS.UU.SE (Bo Thide')
Organization: Dept. of Computer Systems, Uppsala University, Sweden
Lines: 95
Now that the Snakes (HP9000/700 series HP-PA 1.1 RISC workstations) are let
loose, the official HP info has become available. Some of this info follows.
There are three models, the desktop (114mm*508mm*470mm) 720 (Cobra) and
730 (King Cobra) and the deskside (610mm*220mm*595mm) 750 (Coral). They
come initially with HP-UX 8.01 to be upgraded to HP-UX 8.05 in June. Later
OSF/1 will be available.
Clock:      50 MHz (720) or 66 MHz (730, 750)
Cache:      128 kB instr/256 kB data (720, 730), 256 kB instr/256 kB data (750).
Interfaces: SCSI-II, EISA, LAN, RS-232 (to 460.8 kbaud), HP-HIL, Centronics.
            HP-IB optional (via EISA!).
Monitors:   72 Hz, 19" 1280x1024 8-bit grayscale (GRX) or 8+8 color planes
            (CRX).
Software:   X11R4, OSF/Motif1.2 (not 1.1!), VUE, NCS, NFS, 4.3BSD TCP/IP, ARPA.
Languages:  C, C++, Pascal, FORTRAN, ANSI C, Assembler. FORTRAN compiler
            with "+800" option for series 800 compatibility. Series 800
            binaries run on series 700 machines.
Performance (with HP-UX 8.05) and comparison with other workstations:
------------------------------------------------------------------------------
                         SPEC             Khorner-       Linp2P  x11-    Dhry-
                         mark   int    fp   stones  MIPS MFLOPS  perf stone2.0
------------------------------------------------------------------------------
HP9000/730,750 G/CRX     72.2  51.0  91.0   143974    76   22.9 10460   114680
HP9000/720 G/CRX         55.5  39.0  70.2   119213    57   17.2  8244    87000
IBM 6000/550             54.3  34.5  73.5      n/a    56     23   n/a      n/a
IBM 6000/320             24.6  16.3  32.4    54661  29.5    8.5  1520    45250
Sun SPARCstation 2GX     21.0  20.2  21.5    27142  28.5    4.2   n/a    35590
DECstation 5000/200PXGT  18.5  19.0  18.5    26456  24.2    3.7  3256    38760
DECstation 3100          11.3  11.8  10.9    15285  14.9    1.6  1702    23470
Sun SPARCstation IPC     11.8  12.4  11.4    13329  15.7    1.7   n/a    22830
------------------------------------------------------------------------------
Linp2P = Linpack Double precision, 100*100 FORTRAN BLAS, rolled.
x11perf = geometric mean of the x11perf1.2 component tests (excluding 1
and 500 pixel tests).
Selected x11perf Tests:
-----------------------------------------------------------------------------
                                  10 pixel   10*10      TR       create & map
                            Dots     lines   rects    text  subwins (50 kids)
-----------------------------------------------------------------------------
HP9000/730,750 G/CRX     1630000    911000  278000  273000               6000
HP9000/720 G/CRX         1260000    874000  272000  245000               4500
DECstation 5000/200PXGT   370000    455000  256000   90900               1750
Sun SPARCstation 2GX      101100    147000   83500   49000               1050
-----------------------------------------------------------------------------
Graphics Performance:
-----------------------------------------------------------------------------
                              2D floating      3D floating pt
                             pt vectors/s    vectors/s (peak)
-----------------------------------------------------------------------------
HP9000/730,750 G/CRX              1120000             1150000
HP9000/720 G/CRX                  1120000             1150000
DECstation 5000/200PXGT            300000              300000
Sun SPARCstation 2GX               450000              240000
-----------------------------------------------------------------------------
Sequential Disk Access Rates:
-----------------------------------------------------------------------------
                                    Read (kB/s)  Write (kB/s)
-----------------------------------------------------------------------------
HP9000/700, 1*210MByte disk                1120          1140
HP9000/700, 1*420MByte disk                1520          1510
HP9000/700, 2*210MByte disk                2070          1800
HP9000/700, 2*420MByte disk                2460          2140
Sun SPARCstation 2, 207MByte disk           744           794
-----------------------------------------------------------------------------
ANSYS SP-3 results (smaller = better):
-----------------------------------------------------------------------------
                          CPU seconds
-----------------------------------------------------------------------------
Cray 2                             27
HP9000/730,750 G/CRX               49
DEC VAX9000                        65
HP9000/720 G/CRX                   66
IBM 6000/540                       68
DECstation 5000                   145
IBM 6000/320                      107
Sun SPARCstation 1+               311
Sun SPARCstation 2                225
-----------------------------------------------------------------------------
HP numbers were measured with series 800 compiler code. No series 700
specific optimizations used.
>From news.larc.nasa.gov!elroy.jpl.nasa.gov!sdd.hp.com!hplabs!hpda!hpcuhb!hpcuhe!linley Wed Mar 27 10:03:36 EST 1991
Article: 4644 of comp.arch
Path: news.larc.nasa.gov!elroy.jpl.nasa.gov!sdd.hp.com!hplabs!hpda!hpcuhb!hpcuhe!linley
From: linley (- at -) hpcuhe.cup.hp.com (Linley Gwennap)
Newsgroups: comp.arch
Subject: Re: Snake
Message-ID: <32580006 (- at -) hpcuhe.cup.hp.com>
Date: 26 Mar 91 22:35:14 GMT
References: <69465 (- at -) brunix.UUCP>
Organization: PA-RISC Marketing Central
Lines: 104
Due to popular demand, here is an article comparing the new Snakes CPU to
IBM's "America" chip (used in the RS/6000 series). I have deleted the
section on America. I would be happy to post more info if this is useful.
--Linley Gwennap
Hewlett-Packard
HP SNAKES CPU
HP's high-performance chip set consists of the "Snakes" CPU chip and a
floating point coprocessor ("FPC") jointly developed with Texas
Instruments[1]. These are the first chips to implement the PA-RISC 1.1
architecture. They use a traditional RISC approach to achieve
industry-leading performance of 72 SPECmarks with a 66 MHz clock.
PA-RISC 1.1, an extension to the original PA-RISC architecture, includes
several new instructions, many of which accelerate graphics operations[2].
A multiply-and-add instruction (as in IBM's POWER) is included. In
addition, the page size was doubled to 4 KB to reduce the TLB miss rate,
and eight "shadow" registers were added to provide quick context switching
for the TLB miss handler.
The CPU contains all integer instruction processing, cache control and
memory management functions. All cache memory is included in external
SRAMs connected directly to the CPU. Snakes has a 64-bit path to the D-
cache, just like the R4000. Both the I- and D-caches can be accessed
simultaneously, resulting in a total cache bandwidth of 792 MB per second
(peak). The FPC implements all floating point instructions. It receives
instructions and data from the caches at the same time as the CPU, and du-
plicates parts of the CPU's instruction pipeline, eliminating the penalties
often incurred by separate CPU and FPC chips. Snakes is designed to work
with a variety of memory and I/O interfaces.
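The 792 MB per second peak can be reproduced with a quick calculation. The
Python sketch below is only a sanity check, not anything from HP. It assumes
one access per cache per 66 MHz clock cycle, the 64-bit D-cache path stated
above, and a 32-bit I-cache path. The I-cache width is not given in the
article; 32 bits is an assumption that makes the total come out to 792. The
function name is made up for illustration.

```python
def peak_mb_per_s(clock_mhz, path_bits):
    """Peak rate in MB/s (1 MB = 10**6 bytes), one access per clock cycle."""
    return clock_mhz * path_bits / 8

CLOCK_MHZ = 66      # from the article
D_PATH_BITS = 64    # from the article
I_PATH_BITS = 32    # assumption: not stated in the article

d_cache = peak_mb_per_s(CLOCK_MHZ, D_PATH_BITS)
i_cache = peak_mb_per_s(CLOCK_MHZ, I_PATH_BITS)
total = d_cache + i_cache

print(d_cache, i_cache, total)
```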
The CPU uses a five-stage pipeline to reduce cycle time. The penalties in
this pipeline have been minimized. For example, conditional branches are
executed with no delay if their outcome is predicted correctly, and with
only a single cycle penalty otherwise. The branch prediction algorithm,
more advanced than America's, predicts forward branches to be untaken and
backward branches taken, thus optimizing for loops. The load penalty is a
maximum of one cycle and the store penalty a maximum of two; these penal-
ties can usually be avoided by the compiler. All other integer instructions
(except a few rare system control functions) are always executed in a sin-
gle cycle. This uncomplicated design is reflected by a simple, efficient
compiler.
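The static prediction rule described above (forward branches predicted
untaken, backward branches predicted taken) can be sketched in a few lines of
Python. This is a toy model, not HP's logic: the function names are invented,
a branch to its own address is arbitrarily treated as backward, and the only
cost modeled is the single-cycle penalty the article quotes for a wrong
prediction.

```python
def predict_taken(branch_pc, target_pc):
    """Backward (or self) targets are predicted taken, forward ones untaken."""
    return target_pc <= branch_pc

def branch_penalty(branch_pc, target_pc, actually_taken):
    """0 cycles if the prediction was right, 1 cycle otherwise."""
    return 0 if predict_taken(branch_pc, target_pc) == actually_taken else 1

def loop_penalty(iterations, branch_pc=0x140, target_pc=0x100):
    """Total penalty for a loop-closing backward branch.

    The branch is taken on every iteration except the last one.
    The addresses are arbitrary example values.
    """
    total = 0
    for i in range(iterations):
        taken = i < iterations - 1
        total += branch_penalty(branch_pc, target_pc, taken)
    return total

print(loop_penalty(10))
```

Under these assumptions a ten-iteration loop pays the penalty only once, on
the final fall-through, which is the sense in which the rule optimizes for
loops.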
Although Snakes is not superscalar, PA-RISC instructions such as ADD AND
BRANCH, MOVE AND BRANCH and COMPARE AND BRANCH allow a similar amount of
parallelism as America for integer-only applications; in fact, the ratio of
Integer SPECmarks to MHz for Snakes (65/66) actually exceeds America's
(35/42).
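For readers who want the two ratios as decimals, here is the arithmetic in
Python, using only the figures quoted in the paragraph above (the variable
names are mine):

```python
# Integer SPECmarks and clock rates exactly as quoted in the article.
snakes_int, snakes_mhz = 65, 66
america_int, america_mhz = 35, 42

snakes_ratio = snakes_int / snakes_mhz
america_ratio = america_int / america_mhz

print(round(snakes_ratio, 3), round(america_ratio, 3))
```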
FPC is a full 64-bit implementation. It contains two parallel execution
units: the ALU (addition, conversion) and the MPY unit (multiply, divide,
square root). Each unit can start a new operation on every other cycle, so
FPC can accept one floating point instruction per cycle provided that ALU
and MPY instructions are alternated.
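The alternation requirement can be illustrated with a small issue model in
Python. It is a sketch under stated assumptions, not a description of the
real FPC: instructions issue strictly in order, at most one per cycle, and
each unit can start a new operation only every other cycle. Result latency
and data dependencies are ignored, and the function name is hypothetical.

```python
def cycles_to_issue(units):
    """Cycles needed to issue a sequence of "ALU"/"MPY" instructions in order."""
    next_free = {"ALU": 0, "MPY": 0}   # earliest cycle each unit can start
    cycle = 0                          # earliest cycle for the next issue
    for unit in units:
        start = max(cycle, next_free[unit])
        next_free[unit] = start + 2    # unit accepts a new op every other cycle
        cycle = start + 1              # at most one issue per cycle
    return cycle

alternating = ["ALU", "MPY"] * 4
alu_only = ["ALU"] * 8

print(cycles_to_issue(alternating), cycles_to_issue(alu_only))
```

In this model eight alternating instructions issue in eight cycles, while
eight back-to-back ALU instructions take nearly twice as long.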
The external caches are direct mapped and are protected by parity, making
them slightly less robust than America's ECC cache. Cache coherency flags
are included to facilitate multiprocessor operation. A write-back protocol
is used to reduce writes to main memory. Although Snakes does not
implement America's complex "critical word first" algorithm on cache
misses, it will begin processing as soon as the critical word is obtained,
reducing the miss penalty by as much as seven cycles. Snakes supports a
wide variety of off-the-shelf SRAMs and can be configured with anywhere
from 8 KB to 3 MB of external cache. At its maximum operating frequency of
66 MHz, it requires 12 ns SRAMs.
The I- and D-TLBs are fully associative and contain 96 entries each. In
addition, each TLB implements four variable size "block" entries capable
of mapping up to 16 MB each, which can be used for large portions of the
operating system and/or graphics frame buffers. The memory system supports
48 bits (256 terabytes) of virtual address space and 32 bits (4 gigabytes)
of real address space. (This is a subset of the full 64-bit virtual space
allowed by PA-RISC). Two addressing modes support 1 GB or 4 GB data
segments, significantly larger than America's segments.
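The sizes in this paragraph follow from powers of two, as the Python sketch
below shows. It uses binary units (1 KB = 2**10 bytes) and the 4 KB page
size given earlier in the article. The "reach" figures, meaning how much
memory each kind of TLB entry can map at once, are my own derived numbers
and do not appear in the article.

```python
KB, MB, GB, TB = 2**10, 2**20, 2**30, 2**40

virtual_tb = 2**48 // TB        # 48-bit virtual address space, in terabytes
real_gb = 2**32 // GB           # 32-bit real address space, in gigabytes

PAGE_BYTES = 4 * KB             # page size from the article
TLB_ENTRIES = 96                # per TLB, from the article
BLOCK_ENTRIES = 4               # per TLB, from the article
BLOCK_MAX_BYTES = 16 * MB       # maximum size of one block entry

page_reach_kb = TLB_ENTRIES * PAGE_BYTES // KB           # derived
block_reach_mb = BLOCK_ENTRIES * BLOCK_MAX_BYTES // MB   # derived

print(virtual_tb, real_gb, page_reach_kb, block_reach_mb)
```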
A separate bus provides access to memory, I/O and, if desired, graphics.
This bus is a synchronous, dedicated interface with a peak transfer rate of
264 MB per second, about one-half the speed of America's memory system.
The bus bandwidth is limited by its width of 32 bits, but a wider bus would
have required a larger, more expensive package. Snakes's cache miss penal-
ty, measured in cycles, is much higher than America's, due to the shorter
clock cycle time. Snakes compensates for these penalties by allowing for
large external caches to reduce the miss rate; the performance numbers for
Snakes assume a 128 KB instruction cache and 256 KB data cache.
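The quoted bus figures are consistent with each other, which a short Python
check makes visible. It assumes the bus is clocked at the CPU's 66 MHz and
moves one word per cycle at peak; the article says the bus is synchronous
but does not state the clock explicitly, so treat that as an assumption.

```python
PEAK_MB_PER_S = 264     # from the article (1 MB = 10**6 bytes)
BUS_CLOCK_MHZ = 66      # assumption: bus runs at the CPU clock

bytes_per_cycle = PEAK_MB_PER_S / BUS_CLOCK_MHZ
bus_width_bits = bytes_per_cycle * 8

print(bytes_per_cycle, bus_width_bits)
```

Four bytes per cycle matches the 32-bit bus width given in the paragraph.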
The CPU is fabricated in HP's CMOS-26 process (a 1.0 micron, three metal
layer process) and packaged in a 408-pin PGA. FPC is fabricated in TI's
0.8 micron CMOS process and placed in a 207-pin PGA. These PGAs were
custom-designed to allow high frequency operation with wide CMOS buses.
The CPU contains about 577,000 transistors, while FPC uses 640,000. For
lower-cost systems, the chip set is designed to run at frequencies below
66 MHz, allowing lower-speed SRAMs to be used. FPC can also be eliminated
to further reduce costs.
REFERENCES AND NOTES
[1] "CMOS PA-RISC Processor for a New Family of Workstations" by
M. Forsyth, S. Mangelsdorf, E. DeLano, C. Gleason and J. Yetter, COMPCON
Spring 91 Digest of Technical Papers, February 1991.
[2] "Architecture and Compiler Enhancements for PA-RISC Workstations" by
D. Odnert, R. Hansen, M. Dadoo and M. Laventhal, COMPCON Spring 91 Digest
of Technical Papers, February 1991.
----- End Included Message -----
-----
Fred Senese, MS 234 (804) 864-4777 | senese (- at -) schug.larc.nasa.gov
(128.155.22.47)
Speaking from (but not for) NASA-LaRC, Hampton VA 23665-5225
Subliminal Message: 1. Anonymously ftp to schug.larc.nasa.gov
2. cd ~/resume ; get resume.[ps|tex|ascii].
3. Hire me.