Configuring Slurm¶
Slurm categorizes system usage in terms of trackable resources (TRES). Keystone uses these TRES values to enforce allocation limits in the Slurm scheduler. The total billable TRES for a given Slurm job is calculated as a weighted sum of usage \(\left ( U \right )\), scaled by administrator-defined billing weights \(\left ( W \right )\) :
Keystone interfaces with Slurm to automatically enforce per-cluster limits on a group's total allowed Billable Usage.
These limits are enforced using the GrpTresMins=billing=[LIMIT]
setting.
Once a group reaches its allocation limit, additional Slurm jobs will be prevented from running on the target cluster.
Keystone is agnostic to most Slurm settings and requires minimal modification to an existing cluster. However, certain fairshare features are incompatible with Keystone's accounting model and must be disabled. The steps below outline the configuration required for integration with Slurm.
Enable Resource Tracking¶
To impose usage limits, Keystone requires the utilized resources to be represented as a TRES in Slurm. Tracking is enabled by default for common resources such as CPU, memory, and energy. Administrators may extend this list to include additional resource types, such as GPUs.
The AccountingStorageTRES
setting is used to configure which TRES values are stored in the Slurm database.
Refer to the official Slurm documentation for more details.
Example: Tracking GPU usage
To enable tracking for GPU resources:
AccountingStorageTRES=gres/gpu
Example: Tracking GPU and IOP
To track GPU and a license-based resource named iop1
:
AccountingStorageTRES=gres/gpu,license/iop1
Disable Usage Decay¶
Slurm defaults to using the multifactor priority plugin to schedule jobs. To verify this, inspect the PriorityType setting:
scontrol show config | grep PriorityType
When using the multifactor plugin, the PriorityDecayHalfLife
and PriorityUsageResetPeriod
settings need to be disabled.
These settings cause Slurm to reduce recorded usage over time, which interferes with Keystone's accounting calculations.
PriorityDecayHalfLife=00:00:00
PriorityUsageResetPeriod=NONE
Important
Disabling these settings may affect your Slurm fairshare policy.
Administrators are strongly encouraged to review their fairshare policy settings.
Configure Charging Rates¶
TRES billing weights default to zero and must be explicitly defined using the TRESBillingWeights
option.
Weights are set per partition and can be expressed in a variety of units.
See the Slurm documentation for full details.
Example: Billing for CPU
To only charge users for CPU usage:
PartitionName=partition_name TRESBillingWeights="CPU=1.0"
Example: Billing for CPU and GPU
To charge GPU usage at twice the rate of CPU usage:
PartitionName=partition_name TRESBillingWeights="CPU=1.0,GRES/gpu=2.0"
To ensure accurate usage calculations, enable the MAX_TRES
flag in the PriorityFlags
setting.
This ensures the billable TRES includes node-local resources (e.g. CPU, memory, GPU) as well as global TRES (e.g. licenses).
PriorityFlags=MAX_TRES