Documentation
¶
Overview ¶
Package dsort provides distributed massively parallel resharding for very large datasets.
- Copyright (c) 2018-2023, NVIDIA CORPORATION. All rights reserved.
Package dsort provides distributed massively parallel resharding for very large datasets.
- Copyright (c) 2018-2023, NVIDIA CORPORATION. All rights reserved.
Package dsort provides distributed massively parallel resharding for very large datasets.
- Copyright (c) 2018-2023, NVIDIA CORPORATION. All rights reserved.
Package dsort provides distributed massively parallel resharding for very large datasets.
- Copyright (c) 2018-2022, NVIDIA CORPORATION. All rights reserved. *
Package dsort provides distributed massively parallel resharding for very large datasets.
- Copyright (c) 2018-2023, NVIDIA CORPORATION. All rights reserved.
Package dsort provides distributed massively parallel resharding for very large datasets.
- Copyright (c) 2018-2023, NVIDIA CORPORATION. All rights reserved.
Package dsort provides distributed massively parallel resharding for very large datasets.
- Copyright (c) 2018-2023, NVIDIA CORPORATION. All rights reserved.
Package dsort provides distributed massively parallel resharding for very large datasets.
- Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
Package dsort provides distributed massively parallel resharding for very large datasets.
- Copyright (c) 2018-2022, NVIDIA CORPORATION. All rights reserved.
Package dsort provides distributed massively parallel resharding for very large datasets.
- Copyright (c) 2018-2023, NVIDIA CORPORATION. All rights reserved.
Package dsort provides distributed massively parallel resharding for very large datasets.
- Copyright (c) 2018-2023, NVIDIA CORPORATION. All rights reserved.
Package dsort provides distributed massively parallel resharding for very large datasets.
- Copyright (c) 2018-2023, NVIDIA CORPORATION. All rights reserved.
Package dsort provides distributed massively parallel resharding for very large datasets.
- Copyright (c) 2018-2021, NVIDIA CORPORATION. All rights reserved.
msgp -file <path to dsort/manager_types.go> -tests=false -marshal=false -unexported Code generated by the command above; see docs/msgp.md. DO NOT EDIT.
Package dsort provides APIs for distributed archive file shuffling.
- Copyright (c) 2018-2021, NVIDIA CORPORATION. All rights reserved.
Package dsort provides distributed massively parallel resharding for very large datasets.
- Copyright (c) 2018-2023, NVIDIA CORPORATION. All rights reserved.
Package dsort provides APIs for distributed archive file shuffling.
- Copyright (c) 2018-2023, NVIDIA CORPORATION. All rights reserved.
Package dsort provides distributed massively parallel resharding for very large datasets.
- Copyright (c) 2018-2023, NVIDIA CORPORATION. All rights reserved.
Index ¶
- Constants
- func PabortHandler(w http.ResponseWriter, r *http.Request)
- func PgetHandler(w http.ResponseWriter, r *http.Request)
- func Pinit(si cluster.Node)
- func PremoveHandler(w http.ResponseWriter, r *http.Request)
- func PstartHandler(w http.ResponseWriter, r *http.Request, parsc *ParsedReq)
- func TargetHandler(w http.ResponseWriter, r *http.Request)
- func Tinit(t cluster.Target, stats stats.Tracker, db kvdb.Driver)
- type Algorithm
- type CreationPhaseMetadata
- type JobInfo
- type LocalExtraction
- type Manager
- type ManagerGroup
- func (mg *ManagerGroup) AbortAll(err error)
- func (mg *ManagerGroup) Add(managerUUID string) (*Manager, error)
- func (mg *ManagerGroup) Get(managerUUID string, inclArchived bool) (*Manager, bool)
- func (mg *ManagerGroup) List(descRegex *regexp.Regexp, onlyActive bool) []JobInfo
- func (mg *ManagerGroup) Remove(managerUUID string) error
- type MetaSorting
- type Metrics
- type ParsedReq
- type PhaseInfo
- type RemoteResponse
- type RequestSpec
- type ShardCreation
- type TimeStats
Constants ¶
const ( Alphanumeric = "alphanumeric" // string comparison (decreasing or increasing) None = "none" // none (used for resharding) MD5 = "md5" // compare md5(name) Shuffle = "shuffle" // random shuffle (use with the same seed to reproduce) Content = "content" // extract (int, string, float) from a given file, and compare )
const ( ExtractionPhase = "extraction" SortingPhase = "sorting" CreationPhase = "creation" )
const (
DSorterGeneralType = "dsort_general"
)
const (
DSorterMemType = "dsort_mem"
)
const DefaultExt = archive.ExtTar // default shard extension/format/MIME when spec's input_extension is empty
const PrefixJobID = "srt-"
Variables ¶
This section is empty.
Functions ¶
func PabortHandler ¶ added in v1.3.19
func PabortHandler(w http.ResponseWriter, r *http.Request)
DELETE /v1/sort/abort
func PgetHandler ¶ added in v1.3.19
func PgetHandler(w http.ResponseWriter, r *http.Request)
GET /v1/sort
func PremoveHandler ¶ added in v1.3.19
func PremoveHandler(w http.ResponseWriter, r *http.Request)
DELETE /v1/sort
func PstartHandler ¶ added in v1.3.19
func PstartHandler(w http.ResponseWriter, r *http.Request, parsc *ParsedReq)
POST /v1/sort
Types ¶
type Algorithm ¶ added in v1.3.19
type Algorithm struct {
// one of the `algorithms` above
Kind string `json:"kind"`
// used with two sorting alg-s: Alphanumeric and Content
Decreasing bool `json:"decreasing"`
// when sort is a random shuffle
Seed string `json:"seed"`
// usage: exclusively for Content sorting
// e.g.: ".cls" containing sorting key for each record (sample) - see next
// NOTE: not to confuse with shards "input_extension"
Ext string `json:"extension"`
// ditto: Content only
// `shard.contentKeyTypes` enum values: {"int", "string", "float" }
ContentKeyType string `json:"content_key_type"`
}
type CreationPhaseMetadata ¶
type CreationPhaseMetadata struct {
Shards []*shard.Shard `msg:"shards"`
SendOrder map[string]*shard.Shard `msg:"send_order"`
}
func (*CreationPhaseMetadata) DecodeMsg ¶
func (z *CreationPhaseMetadata) DecodeMsg(dc *msgp.Reader) (err error)
DecodeMsg implements msgp.Decodable
func (*CreationPhaseMetadata) EncodeMsg ¶
func (z *CreationPhaseMetadata) EncodeMsg(en *msgp.Writer) (err error)
EncodeMsg implements msgp.Encodable
func (*CreationPhaseMetadata) Msgsize ¶
func (z *CreationPhaseMetadata) Msgsize() (s int)
Msgsize returns an upper bound estimate of the number of bytes occupied by the serialized message
type JobInfo ¶
type JobInfo struct {
ID string `json:"id"` // job ID == xact ID (aka managerUUID)
SrcBck cmn.Bck `json:"src-bck"`
DstBck cmn.Bck `json:"dst-bck"`
StartedTime time.Time `json:"started_time,omitempty"`
FinishTime time.Time `json:"finish_time,omitempty"`
ExtractedDuration time.Duration `json:"started_meta_sorting,omitempty"`
SortingDuration time.Duration `json:"started_shard_creation,omitempty"`
CreationDuration time.Duration `json:"finished_shard_creation,omitempty"`
Objs int64 `json:"loc-objs,string"` // locally processed
Bytes int64 `json:"loc-bytes,string"` //
Metrics *Metrics
Aborted bool `json:"aborted"`
Archived bool `json:"archived"`
}
JobInfo is a struct that contains stats that represent the DSort run in a list
func (*JobInfo) IsFinished ¶
type LocalExtraction ¶
type LocalExtraction struct {
PhaseInfo
// TotalCnt is the number of shards DSort has to process in total.
TotalCnt int64 `json:"total_count,string"`
// ExtractedCnt is the cumulative number of extracted shards. In the
// end, this should be roughly equal to TotalCnt/#Targets.
ExtractedCnt int64 `json:"extracted_count,string"`
// ExtractedSize is uncompressed size of extracted shards.
ExtractedSize int64 `json:"extracted_size,string"`
// ExtractedRecordCnt - number of records extracted from all shards.
ExtractedRecordCnt int64 `json:"extracted_record_count,string"`
// ExtractedToDiskCnt describes number of shards extracted to the disk. To
// compute the number shards extracted to memory just subtract it from
// ExtractedCnt.
ExtractedToDiskCnt int64 `json:"extracted_to_disk_count,string"`
// ExtractedToDiskSize - uncompressed size of shards extracted to disk.
ExtractedToDiskSize int64 `json:"extracted_to_disk_size,string"`
}
LocalExtraction contains metrics for first phase of DSort.
type Manager ¶
type Manager struct {
// tagged fields are the only fields persisted once dsort finishes
ManagerUUID string `json:"manager_uuid"`
Metrics *Metrics `json:"metrics"`
Pars *parsedReqSpec `json:"pars"`
// contains filtered or unexported fields
}
Manager maintains all the state required for a single run of a distributed archive file shuffle.
func (*Manager) ListenSmapChanged ¶
func (m *Manager) ListenSmapChanged()
type ManagerGroup ¶
type ManagerGroup struct {
// contains filtered or unexported fields
}
ManagerGroup abstracts multiple dsort managers into single struct.
var Managers *ManagerGroup
func NewManagerGroup ¶
func NewManagerGroup(db kvdb.Driver, skipHk bool) *ManagerGroup
NewManagerGroup returns new, initialized manager group.
func (*ManagerGroup) AbortAll ¶
func (mg *ManagerGroup) AbortAll(err error)
func (*ManagerGroup) Add ¶
func (mg *ManagerGroup) Add(managerUUID string) (*Manager, error)
Add new, non-initialized manager with given managerUUID to manager group. Returned manager is locked, it's caller responsibility to unlock it. Returns error when manager with specified managerUUID already exists.
func (*ManagerGroup) Get ¶
func (mg *ManagerGroup) Get(managerUUID string, inclArchived bool) (*Manager, bool)
Get gets manager with given mangerUUID. When manager with given uuid does not exist and user requested persisted lookup, it looks for it in persistent storage and returns it if found. Returns false if does not exist, true otherwise.
func (*ManagerGroup) List ¶
func (mg *ManagerGroup) List(descRegex *regexp.Regexp, onlyActive bool) []JobInfo
func (*ManagerGroup) Remove ¶
func (mg *ManagerGroup) Remove(managerUUID string) error
Remove the managerUUID from history. Used for reducing clutter. Fails if process hasn't been cleaned up.
type MetaSorting ¶
type MetaSorting struct {
PhaseInfo
// SentStats - time statistics about records sent to another target
SentStats *TimeStats `json:"sent_stats,omitempty"`
// RecvStats - time statistics about records receivied from another target
RecvStats *TimeStats `json:"recv_stats,omitempty"`
}
MetaSorting contains metrics for second phase of DSort.
type Metrics ¶
type Metrics struct {
Extraction *LocalExtraction `json:"local_extraction,omitempty"`
Sorting *MetaSorting `json:"meta_sorting,omitempty"`
Creation *ShardCreation `json:"shard_creation,omitempty"`
// job description
Description string `json:"description,omitempty"`
// warnings during the run
Warnings []string `json:"warnings,omitempty"`
// errors, if any
Errors []string `json:"errors,omitempty"`
// has been aborted
Aborted atomic.Bool `json:"aborted,omitempty"`
// has been archived to persistent storage
Archived atomic.Bool `json:"archived,omitempty"`
}
Metrics is general struct which contains all stats about DSort run.
func (*Metrics) ElapsedTime ¶
type PhaseInfo ¶
type PhaseInfo struct {
Start time.Time `json:"started_time"`
End time.Time `json:"end_time"`
// Elapsed time (in seconds) from start to given point of time or end when
// phase has finished.
Elapsed time.Duration `json:"elapsed"`
// Running specifies if phase is in progress.
Running bool `json:"running"`
// Finished specifies if phase has finished. If running and finished is
// false this means that the phase did not have started yet.
Finished bool `json:"finished"`
// contains filtered or unexported fields
}
PhaseInfo contains general stats and state for given phase. It is base struct which is extended by actual phases structs.
type RemoteResponse ¶
func (*RemoteResponse) DecodeMsg ¶
func (z *RemoteResponse) DecodeMsg(dc *msgp.Reader) (err error)
DecodeMsg implements msgp.Decodable
func (*RemoteResponse) EncodeMsg ¶
func (z *RemoteResponse) EncodeMsg(en *msgp.Writer) (err error)
EncodeMsg implements msgp.Encodable
func (*RemoteResponse) Msgsize ¶
func (z *RemoteResponse) Msgsize() (s int)
Msgsize returns an upper bound estimate of the number of bytes occupied by the serialized message
type RequestSpec ¶
type RequestSpec struct {
// Required
InputBck cmn.Bck `json:"input_bck" yaml:"input_bck"`
InputFormat apc.ListRange `json:"input_format" yaml:"input_format"`
OutputFormat string `json:"output_format" yaml:"output_format"`
OutputShardSize string `json:"output_shard_size" yaml:"output_shard_size"`
// Desirable
InputExtension string `json:"input_extension" yaml:"input_extension"`
// Optional
// Default: InputExtension
OutputExtension string `json:"output_extension" yaml:"output_extension"`
// Default: ""
Description string `json:"description" yaml:"description"`
// Default: same as `bck` field
OutputBck cmn.Bck `json:"output_bck" yaml:"output_bck"`
// Default: alphanumeric, increasing
Algorithm Algorithm `json:"algorithm" yaml:"algorithm"`
// Default: ""
OrderFileURL string `json:"order_file" yaml:"order_file"`
// Default: "\t"
OrderFileSep string `json:"order_file_sep" yaml:"order_file_sep"`
// Default: "80%"
MaxMemUsage string `json:"max_mem_usage" yaml:"max_mem_usage"`
// Default: calcMaxLimit()
ExtractConcMaxLimit int `json:"extract_concurrency_max_limit" yaml:"extract_concurrency_max_limit"`
// Default: calcMaxLimit()
CreateConcMaxLimit int `json:"create_concurrency_max_limit" yaml:"create_concurrency_max_limit"`
// debug
DSorterType string `json:"dsorter_type"`
DryRun bool `json:"dry_run"` // Default: false
Config cmn.DSortConf
}
RequestSpec defines the user specification for requests to the endpoint /v1/sort.
func (*RequestSpec) ParseCtx ¶ added in v1.3.19
func (rs *RequestSpec) ParseCtx() (*ParsedReq, error)
type ShardCreation ¶
type ShardCreation struct {
PhaseInfo
// ToCreate - number of shards that to be created in this phase.
ToCreate int64 `json:"to_create,string"`
// CreatedCnt the number of shards that have been so far created.
// Should match ToCreate when phase finishes.
CreatedCnt int64 `json:"created_count,string"`
// MovedShardCnt specifies the number of shards that have migrated from this
// to another target. Applies only when dealing with compressed
// data. Sometimes, rather than creating at the destination, it is faster
// to create a shard on a specific target and send it over (to the destination).
MovedShardCnt int64 `json:"moved_shard_count,string"`
// RequestStats - time statistics: requests to other targets.
RequestStats *TimeStats `json:"req_stats,omitempty"`
// ResponseStats - time statistics: responses to other targets.
ResponseStats *TimeStats `json:"resp_stats,omitempty"`
}
ShardCreation contains metrics for third and last phase of DSort.
type TimeStats ¶
type TimeStats struct {
// Total contains total number of milliseconds spend on
// specific task.
Total int64 `json:"total_ms,string"`
// Count contains number of time specific task was triggered.
Count int64 `json:"count,string"`
MinMs int64 `json:"min_ms,string"`
MaxMs int64 `json:"max_ms,string"`
AvgMs int64 `json:"avg_ms,string"`
}
TimeStats contains statistics about time spent on specific task. It calculates min, max and avg times.
Source Files
¶
Directories
¶
| Path | Synopsis |
|---|---|
|
Package ct provides additional dsort-specific content types
|
Package ct provides additional dsort-specific content types |
|
Package shard provides Extract(shard), Create(shard), and associated methods across all suppported archival formats (see cmn/archive/mime.go)
|
Package shard provides Extract(shard), Create(shard), and associated methods across all suppported archival formats (see cmn/archive/mime.go) |