CCL: Calculations on a DRBL cluster



 Sent to CCL by: Alexander Martins Silva [alex.msilva#%#uol.com.br]
         Hi CCLers,
 
I have installed a small Linux cluster with the DRBL program (drbl.sourceforge.net). I can run small parallel jobs with MPI and even short GAMESS calculations. However, parallel jobs stop without any error message when they require several hours of calculation. The same thing happens even when I start serial jobs on each node, and the same job can stop at different points. What is happening? How can I solve this? I'm using a Mandrake 10/Athlon XP 2600 server machine with RAID 1, 10 P4 clients, and a Planet 24-port gigabit switch.
             Any suggestion or advice is welcome.
                        Thanks,
                     Alexander.