VILLASnode issue #152

Closed
Open
Created Jun 30, 2018 by Dennis Potter@dennis.potter

Distinguish between started- and connected state for Infiniband nodes

Currently, the Infiniband node blocks in ib_start until a connection request arrives and the connection is fully established. The node is then in the state STARTED. As soon as the remote server disconnects, ib_stop is called and the Infiniband node destroys all IBV and RDMACM components. The state is now STOPPED, and a stopped node no longer accepts new connections.

A solution to this problem is to introduce a new state: CONNECTED. In the proposed implementation, ib_start initializes all structs needed to accept a connection, starts a separate thread that monitors a global event channel, and sets the state of the node to STARTED. When a connection request appears in the event channel, the thread establishes the connection and sets the state to CONNECTED. When the remote server later disconnects, only the connection-specific components are destroyed and the node returns to the state STARTED (rather than STOPPED). If a remote host then tries to connect, the node can accept the new connection request.
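
The proposed state transitions can be sketched as a small state machine. This is only an illustration of the idea, not the actual VILLASnode code: the names `ib_sketch_start`, `ib_sketch_handle_event`, `ib_sketch_stop`, the `struct ib_node` fields, and both enums are hypothetical. The two event values stand in for whatever the monitoring thread would read from the event channel, and no IBV or RDMACM calls are made.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node states; CONNECTED is the state proposed in this issue. */
enum ib_state { IB_STATE_STOPPED, IB_STATE_STARTED, IB_STATE_CONNECTED };

/* Hypothetical stand-ins for events read from the global event channel. */
enum ib_event { IB_EVENT_CONNECT_REQUEST, IB_EVENT_DISCONNECTED };

/* Hypothetical, heavily reduced node structure. */
struct ib_node {
    enum ib_state state;
    int listening;   /* 1 while the components needed to accept exist */
    int connections; /* number of connections accepted so far */
};

/* Non-blocking start: prepare to accept connections and return at once.
 * A real implementation would also spawn the monitoring thread here. */
void ib_sketch_start(struct ib_node *n)
{
    assert(n != NULL);
    n->listening = 1;
    n->connections = 0;
    n->state = IB_STATE_STARTED;
}

/* Called by the monitoring thread for every event.
 * Returns 1 if the event caused a transition, 0 if it was ignored. */
int ib_sketch_handle_event(struct ib_node *n, enum ib_event ev)
{
    assert(n != NULL);
    switch (ev) {
    case IB_EVENT_CONNECT_REQUEST:
        if (n->state != IB_STATE_STARTED)
            return 0; /* stopped, or already connected */
        n->connections++;
        n->state = IB_STATE_CONNECTED;
        return 1;
    case IB_EVENT_DISCONNECTED:
        if (n->state != IB_STATE_CONNECTED)
            return 0;
        /* Drop only the connection; keep the listening components
         * so the node falls back to STARTED instead of STOPPED. */
        n->state = IB_STATE_STARTED;
        return 1;
    }
    return 0;
}

/* Explicit stop: only here is everything torn down. */
void ib_sketch_stop(struct ib_node *n)
{
    assert(n != NULL);
    n->listening = 0;
    n->state = IB_STATE_STOPPED;
}
```

In this sketch a disconnect leaves `listening` set, so a second connection request after a disconnect is accepted, which is the behavior the current implementation lacks. Only an explicit call to `ib_sketch_stop` moves the node to STOPPED.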
