Documentation
Overview
Package compute contains code for accessing compute resources from many different cluster types, including AWS, Google Cloud, and HPC-style cluster schedulers.
Index
Constants
This section is empty.
Variables
This section is empty.
Functions
This section is empty.
Types
type HPCBackend
type HPCBackend struct {
	Name      string
	SubmitCmd string
	CancelCmd string
	Template  string
	Conf      *config.Config
	Event     events.Writer
	Database  tes.ReadOnlyServer
	// ExtractID is responsible for extracting the backend's task ID from the
	// output returned by SubmitCmd.
	ExtractID func(string) string
	// MapStates takes a list of backend-specific IDs and calls out to the backend
	// (via squeue, qstat, condor_q, etc.) to get each task's current state. These
	// states are mapped to TES states along with an optional reason for the mapping.
	// The Reconcile function can then use the response to update the task states
	// and write system logs describing errors reported by the backend.
	MapStates     func([]string) ([]*HPCTaskState, error)
	ReconcileRate time.Duration
	Log           *logger.Logger
	events.Backend
}
HPCBackend represents an HPC cluster scheduler backend, such as HTCondor, Slurm, or Grid Engine.
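To illustrate what the ExtractID and MapStates callbacks do, here is a minimal sketch for Slurm. It assumes that `sbatch` prints `Submitted batch job <id>` on success and that job states come from a query such as `squeue -h -o "%i %T"` (one `<jobid> <STATE>` pair per line). The names `taskState`, `extractSlurmID`, and `mapSlurmStates` are hypothetical, `taskState` is only a stand-in for `HPCTaskState` (whose fields are not documented here), and the Slurm-to-TES mapping is an illustrative assumption. A real MapStates would also run the query command itself; this sketch covers only the parsing.

```go
package main

import (
	"fmt"
	"strings"
)

// taskState is a hypothetical stand-in for compute.HPCTaskState.
type taskState struct {
	ID       string
	TESState string
	Reason   string
}

// extractSlurmID pulls the job ID out of sbatch output such as
// "Submitted batch job 12345". It returns "" if the output does not match.
func extractSlurmID(out string) string {
	const prefix = "Submitted batch job "
	out = strings.TrimSpace(out)
	if !strings.HasPrefix(out, prefix) {
		return ""
	}
	return strings.TrimSpace(strings.TrimPrefix(out, prefix))
}

// mapSlurmStates parses "<jobid> <STATE>" lines into TES-style states.
// The mapping below is an assumption chosen for illustration.
func mapSlurmStates(output string) []taskState {
	var states []taskState
	for _, line := range strings.Split(strings.TrimSpace(output), "\n") {
		fields := strings.Fields(line)
		if len(fields) != 2 {
			continue // skip blank or malformed lines
		}
		id, state := fields[0], fields[1]
		ts := taskState{ID: id}
		switch state {
		case "PENDING":
			ts.TESState = "QUEUED"
		case "RUNNING":
			ts.TESState = "RUNNING"
		case "COMPLETED":
			ts.TESState = "COMPLETE"
		case "CANCELLED":
			ts.TESState = "CANCELED"
		case "FAILED", "NODE_FAIL", "TIMEOUT", "OUT_OF_MEMORY", "BOOT_FAIL":
			ts.TESState = "SYSTEM_ERROR"
			ts.Reason = "slurm reported " + state
		default:
			ts.TESState = "UNKNOWN"
		}
		states = append(states, ts)
	}
	return states
}

func main() {
	fmt.Println(extractSlurmID("Submitted batch job 12345\n"))
	for _, s := range mapSlurmStates("12345 PENDING\n12346 NODE_FAIL\n12347 COMPLETED") {
		fmt.Printf("%s %s %q\n", s.ID, s.TESState, s.Reason)
	}
}
```

Keeping the reason string alongside the TES state lets Reconcile report why a task was marked as failed, not just that it was.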
func (*HPCBackend) Cancel
func (b *HPCBackend) Cancel(ctx context.Context, taskID string) error
Cancel cancels a task via "qdel", "condor_rm", "scancel", etc.
func (*HPCBackend) Close
func (b *HPCBackend) Close()
func (*HPCBackend) Reconcile
func (b *HPCBackend) Reconcile(ctx context.Context)
Reconcile loops through tasks and compares each task's state in Funnel's database against the state reported by the backend (Slurm, HTCondor, Grid Engine, etc.). This allows the backend to report system errors that prevented the worker process from running.
Currently this handles a narrow set of cases:
|---------------------|-----------------|--------------------|
| Funnel State        | Backend State   | Reconciled State   |
|---------------------|-----------------|--------------------|
| QUEUED              | FAILED          | SYSTEM_ERROR       |
| QUEUED              | QUEUED/PENDING* | SYSTEM_ERROR       |
| INITIALIZING        | FAILED          | SYSTEM_ERROR       |
| RUNNING             | FAILED          | SYSTEM_ERROR       |
In this context a "FAILED" state is being used as a generic term that captures one or more terminal states for the backend.
*QUEUED/PENDING: this covers the case where a task is stuck in the scheduler's queue because its resource request can never be fulfilled.
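The reconciliation table can be read as a pure decision function. This is an illustrative sketch, not Funnel's implementation: states are plain strings, "FAILED" stands for any terminal backend failure as described above, and `unschedulable` is a hypothetical flag that a MapStates callback might set when the scheduler reports that a queued task's resource request can never be satisfied.

```go
package main

import "fmt"

// reconcile returns the state Funnel should record for a task and whether
// the stored state must change, following the Funnel/backend state table.
func reconcile(funnelState, backendState string, unschedulable bool) (string, bool) {
	switch funnelState {
	case "QUEUED":
		if backendState == "FAILED" {
			return "SYSTEM_ERROR", true
		}
		// Stuck in the scheduler queue with a request that can never run.
		if (backendState == "QUEUED" || backendState == "PENDING") && unschedulable {
			return "SYSTEM_ERROR", true
		}
	case "INITIALIZING", "RUNNING":
		if backendState == "FAILED" {
			return "SYSTEM_ERROR", true
		}
	}
	// Every other combination is left untouched.
	return funnelState, false
}

func main() {
	cases := []struct {
		funnel, backend string
		unschedulable   bool
	}{
		{"QUEUED", "FAILED", false},
		{"QUEUED", "PENDING", false},
		{"QUEUED", "PENDING", true},
		{"RUNNING", "FAILED", false},
		{"COMPLETE", "FAILED", false},
	}
	for _, c := range cases {
		state, changed := reconcile(c.funnel, c.backend, c.unschedulable)
		fmt.Printf("%-8s %-7s unschedulable=%-5v -> %s (changed=%v)\n",
			c.funnel, c.backend, c.unschedulable, state, changed)
	}
}
```

Returning a `changed` flag keeps the decision separate from the side effects (writing a state-change event and a system log), which a periodic loop driven by ReconcileRate would perform only when the flag is true.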
func (*HPCBackend) WriteEvent
WriteEvent writes an event to the compute backend. Currently, only TASK_CREATED is handled, which calls Submit.
Directories
| Path | Synopsis |
|---|---|
| | Package aws_batch contains code for accessing compute resources via AWS Batch. |
| | Package gcp_batch contains code for accessing compute resources via Google Batch. |
| | Package gridengine contains code for accessing compute resources via Open Grid Engine. |
| | Package htcondor contains code for accessing compute resources via HTCondor. |
| | Package kubernetes contains code for accessing compute resources via the Kubernetes v1 Batch API. |
| | Package local contains code for accessing compute resources via the local computer, for Funnel development and debugging. |
| | Package noop contains a compute backend that does nothing, for testing purposes. |
| | Package pbs contains code for accessing compute resources via PBS/Torque. |
| | Code generated by mockery v1.0.0. |
| | Package slurm contains code for accessing compute resources via Slurm. |