Distinguish between STARTED and CONNECTED states for InfiniBand nodes
Currently, when an InfiniBand node starts, ib_start blocks until a connection request arrives and the connection is fully established. The node is then in the state STARTED. As soon as the remote server disconnects, ib_stop is called and the node destroys all IBV and RDMACM components, leaving it in the state STOPPED. Because it is stopped, it no longer accepts new connections.
A solution to this problem is to introduce a new state, CONNECTED. In the proposed implementation, ib_start initializes all structs needed to accept a connection, starts a separate thread that monitors a global event channel, and sets the node's state to STARTED. When a connection request appears on the event channel, the thread establishes the connection and sets the state to CONNECTED. If the remote server then disconnects, not all components are destroyed, and the node returns to the state STARTED (not STOPPED). A remote host that tries to connect afterwards can therefore have its connection request accepted.
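The proposed design can be sketched as follows, under clearly stated assumptions: the RDMACM event channel is replaced by a simulated FIFO guarded by a mutex and condition variable, and every name here (model_ib_start, model_ib_stop, post_event, handle_event, EV_SHUTDOWN) is hypothetical. In a real node the monitor thread would presumably block in rdma_get_cm_event() on the event channel instead of reading this queue. The sketch only illustrates the state transitions STARTED → CONNECTED → STARTED and the fact that full teardown happens in stop, not on disconnect.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical states and events; not the real node API. */
enum ib_state { IB_STOPPED, IB_STARTED, IB_CONNECTED };
enum ib_event { EV_CONNECT_REQUEST, EV_DISCONNECTED, EV_SHUTDOWN };

#define QUEUE_CAP 16

struct ib_node {
    enum ib_state state;
    int connections_accepted;
    enum ib_event queue[QUEUE_CAP]; /* simulated event channel (FIFO) */
    size_t head, tail;
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
    pthread_t monitor;
};

/* Proposed transitions; called by the monitor with n->lock held. */
static void handle_event(struct ib_node *n, enum ib_event ev)
{
    switch (ev) {
    case EV_CONNECT_REQUEST:
        if (n->state == IB_STARTED) { /* listening: accept the peer */
            n->state = IB_CONNECTED;
            n->connections_accepted++;
        }
        break;
    case EV_DISCONNECTED:
        if (n->state == IB_CONNECTED) /* keep listening resources alive */
            n->state = IB_STARTED;    /* back to STARTED, not STOPPED */
        break;
    case EV_SHUTDOWN:
        break; /* handled by the monitor loop */
    }
}

/* Monitor thread: consume events in order until EV_SHUTDOWN. */
static void *monitor_thread(void *arg)
{
    struct ib_node *n = arg;
    for (;;) {
        pthread_mutex_lock(&n->lock);
        while (n->head == n->tail)
            pthread_cond_wait(&n->nonempty, &n->lock);
        enum ib_event ev = n->queue[n->head % QUEUE_CAP];
        n->head++;
        handle_event(n, ev);
        pthread_mutex_unlock(&n->lock);
        if (ev == EV_SHUTDOWN)
            return NULL;
    }
}

/* Simulates an event arriving on the channel. */
static void post_event(struct ib_node *n, enum ib_event ev)
{
    pthread_mutex_lock(&n->lock);
    assert(n->tail - n->head < QUEUE_CAP);
    n->queue[n->tail % QUEUE_CAP] = ev;
    n->tail++;
    pthread_cond_signal(&n->nonempty);
    pthread_mutex_unlock(&n->lock);
}

/* Proposed ib_start: set up, spawn the monitor, return without blocking. */
static int model_ib_start(struct ib_node *n)
{
    assert(n != NULL);
    n->head = n->tail = 0;
    n->connections_accepted = 0;
    pthread_mutex_init(&n->lock, NULL);
    pthread_cond_init(&n->nonempty, NULL);
    /* Real code would create the event channel and start listening here. */
    n->state = IB_STARTED;
    return pthread_create(&n->monitor, NULL, monitor_thread, n);
}

/* Proposed ib_stop: the only place where everything is torn down.
 * Returns the state the node was in when the monitor exited. */
static enum ib_state model_ib_stop(struct ib_node *n)
{
    post_event(n, EV_SHUTDOWN);
    pthread_join(n->monitor, NULL);
    enum ib_state last = n->state; /* safe: monitor has exited */
    n->state = IB_STOPPED;
    pthread_cond_destroy(&n->nonempty);
    pthread_mutex_destroy(&n->lock);
    return last;
}
```

With this model, the sequence connect, disconnect, connect leaves the node CONNECTED with two accepted connections; only model_ib_stop moves it to STOPPED. Keeping the listening setup alive across disconnects is what lets the second request succeed.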