mpi_jm

mpi_jm Objectives:

Efficiently run a large set of jobs of bounded size in a large allocation of nodes.
Solve node fragmentation function. Repeated launch and completion of jobs leaves available nodes spread out with poor interconnect performance.
Automate binding of node resources, picking the right cores to match GPU use, setting affinity for openMP threads.
Support overlay of jobs using distinct resources on the same nodes, i.e. GPU vs CPU jobs.
Jobs support pre and post actions that can be used to chain computations with simple dependencies.
Customizable collection of workload with python based frontend.
Dynamic dependencies - jobs can wait for set of prerequisite conditions like completion of a job with neighboring parameters.
Python based runtime estimation. Can implement machine learning model configured on prior run data.
Dynamic reconfiguration of jobs to compute inside allocation time.