MPIClusterErrors
Common errors when using MPI on the cluster
- What does "p4_error: semget failed for setnum: 0" mean?
This means that the master node has reached its maximum number of allowed semaphores, so the program you are trying to run cannot allocate a new semaphore for inter-process communication. This can happen when somebody has been testing software that does not exit properly, leaving semaphores and shared memory segments allocated.
If you own the leftover semaphores, you can remove them by running the following two commands:
$ /opt/mpich/gnu/sbin/cleanipcs
$ cluster-fork /opt/mpich/gnu/sbin/cleanipcs
(It does not matter whether you use the intel or gnu version here; the scripts are identical.)
Other users may also have filled up the semaphore table. In that case, either they or root will need to clean it.
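To find out who owns the leftover semaphores, the output of `ipcs -s` can be summarized per owner. The following is a minimal sketch, not part of the MPICH tools: the function name `sems_by_owner` is made up for illustration, and it assumes the usual Linux column layout of `ipcs -s` (key, semid, owner, perms, nsems).

```shell
#!/bin/sh
# Count semaphore arrays per owner from `ipcs -s` output read on stdin.
# Assumption: data lines start with a hex key (0x...) and the owner is
# in the third column, as printed by ipcs on Linux.
sems_by_owner() {
    awk '$1 ~ /^0x/ { count[$3]++ }
         END { for (u in count) print u, count[u] }' | sort
}
```

On the master node it could be used as `ipcs -s | sems_by_owner`. If your own user name is not in the list, running cleanipcs yourself will not help, and the listed owners or root have to clean up.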
- What does "p4_error: net_recv failed for fd = 7, errno = : 104" mean?
On Linux, errno 104 is ECONNRESET (connection reset by peer): the connection to a process on one of the child nodes was dropped. Here, this means that one of the child nodes has reached its maximum number of allowed semaphores, so the program you are trying to run cannot allocate a new semaphore for inter-process communication with this node. This can happen when somebody has been testing software that does not exit properly, leaving semaphores and shared memory segments allocated.
If you own the leftover semaphores, you can remove them by running the following command:
$ cluster-fork /opt/mpich/gnu/sbin/cleanipcs
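To preview what a cleanup would remove on a node before running it, the semaphore arrays owned by one user can be turned into a list of `ipcrm` commands. This is only a sketch under the same assumption about the `ipcs -s` column layout; `ipcrm_cmds` is a hypothetical helper, not the cleanipcs script itself, and it only prints the commands rather than running them.

```shell
#!/bin/sh
# Print (do not run) one `ipcrm -s <semid>` command for every semaphore
# array owned by the given user. Reads `ipcs -s` output on stdin.
# Assumption: columns are key, semid, owner, perms, nsems.
ipcrm_cmds() {
    owner=$1
    awk -v owner="$owner" \
        '$1 ~ /^0x/ && $3 == owner { print "ipcrm -s " $2 }'
}
```

For example, `ipcs -s | ipcrm_cmds "$USER"` lists the removals for your own account on the current node. Note that some versions of ipcs shorten long owner names, so a long user name may not match exactly.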