Regarding which measurement to use in a benchmark, I feel you should measure what you are really interested in, namely "how long does it take to run?", i.e. wall clock time. CPU time adds up the time spent on each individual CPU, so it isn't a very useful measure of the performance increase a parallel computer gives you. Wall clock time also takes into account any paging, swapping, and I/O the program causes, which are important practical issues.
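As a rough illustration (a sketch, not a benchmarking harness), Python's standard library exposes both clocks: `time.perf_counter()` measures elapsed wall clock time, while `time.process_time()` returns the user plus system CPU time of the current process, summed over all its threads. The helper names `measure`, `waits_on_io`, and `busy_loop` are hypothetical, and `time.sleep` is only a stand-in for blocking I/O or paging. Time the process spends waiting shows up in the wall clock figure but barely at all in the CPU figure.

```python
import time
from typing import Callable, Tuple


def measure(fn: Callable[[], None]) -> Tuple[float, float]:
    """Return (wall_seconds, cpu_seconds) for a single call of fn."""
    wall_start = time.perf_counter()   # wall clock: real elapsed time
    cpu_start = time.process_time()    # CPU time: user + system time of this process, all threads
    fn()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


def waits_on_io() -> None:
    # Hypothetical stand-in for blocking I/O or paging: the process is idle,
    # but the user is still waiting for it to finish.
    time.sleep(0.2)


def busy_loop() -> None:
    # Hypothetical CPU-bound workload that keeps one core busy.
    total = 0
    for i in range(2_000_000):
        total += i * i


if __name__ == "__main__":
    for name, fn in [("sleep", waits_on_io), ("busy loop", busy_loop)]:
        wall, cpu = measure(fn)
        print(f"{name:10s} wall={wall:.3f}s cpu={cpu:.3f}s")
```

For the sleeping case, wall time comes out at roughly 0.2 s while CPU time stays near zero. For a multithreaded program whose threads really run in parallel, `process_time()` can exceed the wall clock time, because it sums the time spent on every CPU.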