Cluster System Management (CSM)
(Open Source Release 2018 with Quarterly Updates)
CSM helps you manage a high-performance computing (HPC) cluster. It offers a suite of tools for maintaining the cluster: discovery and management of system resources, database integration (PostgreSQL), job launch support (workload management APIs), node diagnostics (diag APIs and scripts), RAS events and actions, infrastructure health checks, and Python bindings for the C APIs. CSM is a component of the open source IBM product CAST, which stands for Cluster Administration and Storage Tools.
As a Software Engineer at IBM, my main role was CSM API team lead. I led the team that designed and developed all of CSM's C-based APIs. These APIs let system programs, as well as external programs, communicate with the CSM suite. We worked within our own team, across other teams, and with external contractors to complete this component. My secondary roles on the CSM team included UFM integration, inventory collection, and tools programming.
View the open source project on GitHub here: CAST