CCL: Gamess in parallel
- From: Alexander Martins Silva <alex.msilva###uol.com.br>
- Subject: CCL: Gamess in parallel
- Date: Tue, 18 Oct 2005 21:52:54 +0000
Hi,
I'm trying to run the latest release of GAMESS on a Linux
cluster under Mandrake 10.0. The compilation went fine and all tests
passed. Both rsh and ssh (without password) are working fine, and the
non-root user can access any node of the cluster from the server.
However, I can't execute a parallel job. I get the following generic
error message:
> /usr/local/gamess/ddikick.x /usr/local/gamess/gamess.01.x exam01.inp -ddi 2 2 node01 node02 -scr <scratch>
Distributed Data Interface kickoff program.
Initiating 2 compute processes on 2 nodes to run the following command:
/usr/local/gamess/gamess.01.x exam01.inp
TCP connect error: ECONNREFUSED.
TCP connect error: ECONNREFUSED.
TCP: Connect failed. at1 -> at1101.ime.eb.br:35389.
A fatal error occurred on DDI Process 0.
TCP: Connect failed. at1 -> at1101.ime.eb.br:35389.
A fatal error occurred on DDI Process 2.
TCP connect error: ECONNREFUSED.
TCP: Connect failed. at1102 -> at1101.ime.eb.br:35389.
A fatal error occurred on DDI Process 1.
TCP connect error: ECONNREFUSED.
TCP: Connect failed. at1102 -> at1101.ime.eb.br:35389.
A fatal error occurred on DDI Process 3.
ddikick.x: Timed out while waiting for DDI processes to check in.
ddikick.x: Fatal error detected.
The error is most likely to be in the application, so check for
input errors, disk space, memory needs, application bugs, etc.
ddikick.x will now clean up all processes, and exit...
ddikick.x: Sending kill signal to DDI processes.
ddikick.x: Execution terminated due to error(s).
What's the problem? How can I fix it?
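
One way to narrow this down outside GAMESS is a plain TCP test between the
nodes. ECONNREFUSED usually means the target address was reached but nothing
was accepting connections on that port, so it is worth checking which address
each node name resolves to and whether an arbitrary port is reachable at all.
The sketch below is only an illustration, not part of GAMESS or DDI: the
function name probe and the script name probe.py are made up, and it assumes
Python 3 is installed on the nodes.

```python
import socket
import sys


def probe(host, port, timeout=3.0):
    """Resolve host, try one TCP connect, and return a short status string."""
    try:
        # Show which address the name maps to on this machine.
        addr = socket.gethostbyname(host)
    except socket.gaierror as exc:
        return f"{host}: name resolution failed ({exc})"
    try:
        with socket.create_connection((addr, port), timeout=timeout):
            return f"{host} ({addr}):{port} accepted the connection"
    except ConnectionRefusedError:
        # Same condition that DDI reports as ECONNREFUSED.
        return f"{host} ({addr}):{port} refused the connection (ECONNREFUSED)"
    except OSError as exc:
        # Timeouts, unreachable networks, and similar failures.
        return f"{host} ({addr}):{port} failed: {exc}"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python3 probe.py <host> <port>
    print(probe(sys.argv[1], int(sys.argv[2])))
```

The port in the log above appears to be chosen at run time, so probing it
after the job has died says little. Probing a port that is known to be
listening on the target node (for example the ssh port, since ssh works),
from each of the other nodes, at least shows the resolved address and whether
TCP connections get through. If the printed address is a loopback address or
otherwise not the one expected, name resolution on that node is the first
thing to look at.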
Thanks in advance,
Alexander.