Documentation ¶
Overview ¶
Package tests contains the E2E test implementations for xmtpd.
Index ¶
- type ChaosAttestationFaultTest
- type ChaosBandwidthThrottleTest
- type ChaosCompoundFaultTest
- type ChaosConnectionResetTest
- type ChaosLatencyTest
- type ChaosNetworkPartitionTest
- type ChaosNodeDownTest
- type GatewayScaleTest
- type MultiPayerTest
- type PayerLifecycleTest
- type RateRegistryChangeTest
- type SettlementVerificationTest
- type SmokeTest
- type StuckStateDetectionTest
- type SustainedLoadTest
- type SyncVerificationTest
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type ChaosAttestationFaultTest ¶
type ChaosAttestationFaultTest struct{}
ChaosAttestationFaultTest verifies payer report attestation behavior when a node crashes mid-cycle. With 3 nodes, quorum = (3/2)+1 = 2 attestations.
The test runs in three phases:
- Start 3 nodes, generate traffic, verify the pipeline works normally (at least one report gets created).
- Stop one node (leaving 2 alive). Since quorum is 2 and 2 nodes remain, attestation should still succeed. Verify that reports progress past AttestationPending even with a dead node.
- Restart the stopped node and verify it rejoins the cluster. The cluster should continue processing reports.
This tests the resilience of the attestation quorum mechanism under real node failures (not just network partitions).
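The quorum rule above can be sketched in Go (the function name is illustrative, not part of xmtpd; note the integer division):

```go
package main

import "fmt"

// attestationQuorum returns the majority quorum for n nodes,
// following the (n/2)+1 rule described above (integer division).
func attestationQuorum(n int) int { return n/2 + 1 }

func main() {
	for _, n := range []int{3, 4, 5} {
		fmt.Printf("nodes=%d quorum=%d\n", n, attestationQuorum(n))
	}
}
```

With 3 nodes this gives a quorum of 2, which is why attestation still succeeds in phase two with one node stopped.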
func NewChaosAttestationFaultTest ¶
func NewChaosAttestationFaultTest() *ChaosAttestationFaultTest
func (*ChaosAttestationFaultTest) Description ¶
func (t *ChaosAttestationFaultTest) Description() string
func (*ChaosAttestationFaultTest) Name ¶
func (t *ChaosAttestationFaultTest) Name() string
func (*ChaosAttestationFaultTest) Run ¶
func (t *ChaosAttestationFaultTest) Run(ctx context.Context, env *types.Environment) error
type ChaosBandwidthThrottleTest ¶
type ChaosBandwidthThrottleTest struct{}
ChaosBandwidthThrottleTest verifies graceful degradation under severe bandwidth constraints. It applies aggressive throttling to ALL nodes simultaneously, then verifies:
- Envelopes still get through (just slowly)
- No nodes crash or become permanently stuck
- After throttling is removed, the system returns to normal throughput
This simulates a scenario like a cloud provider network incident where all inter-node bandwidth is severely constrained.
func NewChaosBandwidthThrottleTest ¶
func NewChaosBandwidthThrottleTest() *ChaosBandwidthThrottleTest
func (*ChaosBandwidthThrottleTest) Description ¶
func (t *ChaosBandwidthThrottleTest) Description() string
func (*ChaosBandwidthThrottleTest) Name ¶
func (t *ChaosBandwidthThrottleTest) Name() string
func (*ChaosBandwidthThrottleTest) Run ¶
func (t *ChaosBandwidthThrottleTest) Run(ctx context.Context, env *types.Environment) error
type ChaosCompoundFaultTest ¶
type ChaosCompoundFaultTest struct{}
ChaosCompoundFaultTest applies multiple different faults across different nodes simultaneously to stress the system's ability to handle compound failure scenarios. Specifically:
- Node 200: high latency (1000ms) — simulates a geographically distant node
- Node 300: bandwidth throttle (10 KB/s) — simulates a congested link
While these faults are active, traffic is generated and the test verifies:
- Traffic still flows to the unaffected node (100)
- Both degraded nodes eventually receive envelopes (even if slowly)
- After all faults are removed, the cluster converges
func NewChaosCompoundFaultTest ¶
func NewChaosCompoundFaultTest() *ChaosCompoundFaultTest
func (*ChaosCompoundFaultTest) Description ¶
func (t *ChaosCompoundFaultTest) Description() string
func (*ChaosCompoundFaultTest) Name ¶
func (t *ChaosCompoundFaultTest) Name() string
func (*ChaosCompoundFaultTest) Run ¶
func (t *ChaosCompoundFaultTest) Run(ctx context.Context, env *types.Environment) error
type ChaosConnectionResetTest ¶
type ChaosConnectionResetTest struct{}
ChaosConnectionResetTest injects TCP connection resets (RST) on a node while traffic is flowing and verifies:
- Traffic generation continues despite RSTs (the client retries or the remaining nodes still accept traffic)
- After RSTs are removed, the affected node recovers and catches up
- All nodes eventually converge on envelope count
This is a more aggressive fault than latency — it actively kills TCP connections, forcing reconnection and retry logic to engage.
func NewChaosConnectionResetTest ¶
func NewChaosConnectionResetTest() *ChaosConnectionResetTest
func (*ChaosConnectionResetTest) Description ¶
func (t *ChaosConnectionResetTest) Description() string
func (*ChaosConnectionResetTest) Name ¶
func (t *ChaosConnectionResetTest) Name() string
func (*ChaosConnectionResetTest) Run ¶
func (t *ChaosConnectionResetTest) Run(ctx context.Context, env *types.Environment) error
type ChaosLatencyTest ¶
type ChaosLatencyTest struct{}
ChaosLatencyTest injects network latency into a node while traffic is flowing and verifies:
- Traffic still flows to the affected node (latency doesn't prevent delivery)
- Envelopes replicate across all nodes (including the latency-affected one)
- After latency is removed, the system returns to normal
func NewChaosLatencyTest ¶
func NewChaosLatencyTest() *ChaosLatencyTest
func (*ChaosLatencyTest) Description ¶
func (t *ChaosLatencyTest) Description() string
func (*ChaosLatencyTest) Name ¶
func (t *ChaosLatencyTest) Name() string
func (*ChaosLatencyTest) Run ¶
func (t *ChaosLatencyTest) Run(ctx context.Context, env *types.Environment) error
type ChaosNetworkPartitionTest ¶
type ChaosNetworkPartitionTest struct{}
ChaosNetworkPartitionTest verifies that the system correctly handles a network partition where a node becomes unreachable by other nodes.
Architecture context: Toxiproxy sits between nodes in the Docker network. Each node's on-chain HTTP address points to its toxiproxy proxy. When we add a timeout=0 toxic (black hole) on node-300's proxy:
- Other nodes CANNOT sync FROM node-300 (they connect via the proxy)
- Node-300 CAN still sync FROM others (it connects to THEIR proxies)
- Client publishes bypass toxiproxy (direct port mapping to host)
The test:
- Partitions node-300 by black-holing its proxy
- Publishes envelopes to node-300 (direct, bypasses proxy)
- Verifies other nodes do NOT receive node-300's envelopes (they can't sync)
- Removes the partition
- Verifies other nodes catch up by syncing node-300's envelopes
func NewChaosNetworkPartitionTest ¶
func NewChaosNetworkPartitionTest() *ChaosNetworkPartitionTest
func (*ChaosNetworkPartitionTest) Description ¶
func (t *ChaosNetworkPartitionTest) Description() string
func (*ChaosNetworkPartitionTest) Name ¶
func (t *ChaosNetworkPartitionTest) Name() string
func (*ChaosNetworkPartitionTest) Run ¶
func (t *ChaosNetworkPartitionTest) Run(ctx context.Context, env *types.Environment) error
type ChaosNodeDownTest ¶
type ChaosNodeDownTest struct{}
ChaosNodeDownTest stops a node while traffic is flowing, restarts it, and verifies it catches up with the rest of the cluster.
func NewChaosNodeDownTest ¶
func NewChaosNodeDownTest() *ChaosNodeDownTest
func (*ChaosNodeDownTest) Description ¶
func (t *ChaosNodeDownTest) Description() string
func (*ChaosNodeDownTest) Name ¶
func (t *ChaosNodeDownTest) Name() string
func (*ChaosNodeDownTest) Run ¶
func (t *ChaosNodeDownTest) Run(ctx context.Context, env *types.Environment) error
type GatewayScaleTest ¶
type GatewayScaleTest struct{}
GatewayScaleTest verifies that gateways can be dynamically added and removed, without errors, while traffic is being generated.
func NewGatewayScaleTest ¶
func NewGatewayScaleTest() *GatewayScaleTest
func (*GatewayScaleTest) Description ¶
func (t *GatewayScaleTest) Description() string
func (*GatewayScaleTest) Name ¶
func (t *GatewayScaleTest) Name() string
func (*GatewayScaleTest) Run ¶
func (t *GatewayScaleTest) Run(ctx context.Context, env *types.Environment) error
type MultiPayerTest ¶
type MultiPayerTest struct{}
MultiPayerTest verifies that traffic from distinct payer addresses is correctly attributed in the database. It creates multiple clients with different payer keys targeting the same node, generates traffic from each, and asserts that per-payer usage records are created with the correct addresses.
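The per-payer attribution assertion can be sketched as a simple aggregation over illustrative usage rows (the UsageRecord type and payer names are stand-ins, not the actual database schema):

```go
package main

import "fmt"

// UsageRecord is an illustrative stand-in for a per-payer usage row.
type UsageRecord struct {
	PayerAddress string
	Envelopes    int
}

// usageByPayer totals envelope counts per payer address, mirroring the
// per-payer attribution check described above.
func usageByPayer(records []UsageRecord) map[string]int {
	totals := make(map[string]int)
	for _, r := range records {
		totals[r.PayerAddress] += r.Envelopes
	}
	return totals
}

func main() {
	records := []UsageRecord{
		{"payerA", 10},
		{"payerB", 5},
		{"payerA", 3},
	}
	fmt.Println(usageByPayer(records)) // payerA: 13, payerB: 5
}
```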
func NewMultiPayerTest ¶
func NewMultiPayerTest() *MultiPayerTest
func (*MultiPayerTest) Description ¶
func (t *MultiPayerTest) Description() string
func (*MultiPayerTest) Name ¶
func (t *MultiPayerTest) Name() string
func (*MultiPayerTest) Run ¶
func (t *MultiPayerTest) Run(ctx context.Context, env *types.Environment) error
type PayerLifecycleTest ¶
type PayerLifecycleTest struct{}
PayerLifecycleTest generates traffic and verifies the full payer report lifecycle: creation -> attestation -> submission -> settlement -> excess transfer -> claim -> withdraw -> payer withdrawal.
Worker scheduling:
Generator (workerID=1), Submitter (workerID=2), and Settlement (workerID=3) fire at minute offsets 0, +5, and +10 within a 60-minute cycle, based on a Knuth hash of the node ID. Attestation polls every 10s (interval configured via an environment variable).
The worst-case wait is the generator's first fire (~60 minutes). Total test runtime: ~90 minutes.
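The offset scheduling described above can be sketched as follows (the multiplicative constant 2654435761 is the classic Knuth hash constant and is an assumption here; the actual xmtpd hash may differ):

```go
package main

import "fmt"

// knuthHash is a Knuth multiplicative hash. The constant is the
// classic choice and an assumption; xmtpd's exact hash may differ.
func knuthHash(nodeID uint32) uint32 {
	return nodeID * 2654435761
}

// workerOffsets sketches the scheduling above: the generator fires at a
// base minute derived from the node ID, and the submitter and
// settlement workers fire 5 and 10 minutes later in the 60-minute cycle.
func workerOffsets(nodeID uint32) (gen, sub, settle uint32) {
	base := knuthHash(nodeID) % 60
	return base, (base + 5) % 60, (base + 10) % 60
}

func main() {
	g, s, st := workerOffsets(100)
	fmt.Printf("generator=%d submitter=%d settlement=%d\n", g, s, st)
}
```

This spacing is why the worst case is waiting ~60 minutes for the generator's first fire, after which the downstream workers follow within minutes.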
func NewPayerLifecycleTest ¶
func NewPayerLifecycleTest() *PayerLifecycleTest
func (*PayerLifecycleTest) Description ¶
func (t *PayerLifecycleTest) Description() string
func (*PayerLifecycleTest) Name ¶
func (t *PayerLifecycleTest) Name() string
func (*PayerLifecycleTest) Run ¶
func (t *PayerLifecycleTest) Run(ctx context.Context, env *types.Environment) error
type RateRegistryChangeTest ¶
type RateRegistryChangeTest struct{}
RateRegistryChangeTest verifies that the system correctly handles rate registry changes mid-operation. It starts a cluster, adds rates to the on-chain registry, generates traffic, and verifies that payer reports are still created and settled using the updated rates.
Nodes are configured with a short rate registry refresh interval (10s) so they pick up changes quickly.
func NewRateRegistryChangeTest ¶
func NewRateRegistryChangeTest() *RateRegistryChangeTest
func (*RateRegistryChangeTest) Description ¶
func (t *RateRegistryChangeTest) Description() string
func (*RateRegistryChangeTest) Name ¶
func (t *RateRegistryChangeTest) Name() string
func (*RateRegistryChangeTest) Run ¶
func (t *RateRegistryChangeTest) Run(ctx context.Context, env *types.Environment) error
type SettlementVerificationTest ¶
type SettlementVerificationTest struct{}
SettlementVerificationTest verifies end-to-end settlement by cross-referencing database state against on-chain contract state. After traffic generates payer reports that settle, the test reads the on-chain report via the PayerReportManager contract and asserts that sequence IDs, settlement status, and node IDs match the database records.
func NewSettlementVerificationTest ¶
func NewSettlementVerificationTest() *SettlementVerificationTest
func (*SettlementVerificationTest) Description ¶
func (t *SettlementVerificationTest) Description() string
func (*SettlementVerificationTest) Name ¶
func (t *SettlementVerificationTest) Name() string
func (*SettlementVerificationTest) Run ¶
func (t *SettlementVerificationTest) Run(ctx context.Context, env *types.Environment) error
type SmokeTest ¶
type SmokeTest struct{}
SmokeTest verifies basic cluster functionality: starts nodes and a gateway, publishes envelopes, and checks they replicate to all nodes.
func NewSmokeTest ¶
func NewSmokeTest() *SmokeTest
func (*SmokeTest) Description ¶
func (t *SmokeTest) Description() string
func (*SmokeTest) Name ¶
func (t *SmokeTest) Name() string
func (*SmokeTest) Run ¶
func (t *SmokeTest) Run(ctx context.Context, env *types.Environment) error
type StuckStateDetectionTest ¶
type StuckStateDetectionTest struct{}
StuckStateDetectionTest verifies that the system can be observed entering a stuck state when attestation quorum is lost. It starts with 4 nodes, waits for the payer report pipeline to function normally, then removes enough nodes from the canonical network that the remaining nodes cannot reach the 2/3 quorum required for attestation.
The test then verifies:
- New reports are still generated (generator runs on each node independently)
- Reports remain stuck in AttestationPending (quorum unreachable)
- After restoring the canonical network, reports resume normal progression
This demonstrates the observability needed for stuck state alerting.
func NewStuckStateDetectionTest ¶
func NewStuckStateDetectionTest() *StuckStateDetectionTest
func (*StuckStateDetectionTest) Description ¶
func (t *StuckStateDetectionTest) Description() string
func (*StuckStateDetectionTest) Name ¶
func (t *StuckStateDetectionTest) Name() string
func (*StuckStateDetectionTest) Run ¶
func (t *StuckStateDetectionTest) Run(ctx context.Context, env *types.Environment) error
type SustainedLoadTest ¶
type SustainedLoadTest struct{}
SustainedLoadTest generates a realistic workload for 30 minutes, then waits for payer reports to settle. It verifies that the system handles sustained traffic without reports getting stuck or lost.
func NewSustainedLoadTest ¶
func NewSustainedLoadTest() *SustainedLoadTest
func (*SustainedLoadTest) Description ¶
func (t *SustainedLoadTest) Description() string
func (*SustainedLoadTest) Name ¶
func (t *SustainedLoadTest) Name() string
func (*SustainedLoadTest) Run ¶
func (t *SustainedLoadTest) Run(ctx context.Context, env *types.Environment) error
type SyncVerificationTest ¶
type SyncVerificationTest struct{}
SyncVerificationTest verifies that all nodes in a cluster agree on the latest sequence IDs for each originator. After publishing envelopes and allowing time for replication, it compares vector clocks across all nodes to ensure consistency.
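The cross-node consistency check can be sketched as a vector clock comparison (the types and names here are illustrative, not the xmtpd API):

```go
package main

import (
	"fmt"
	"reflect"
)

// VectorClock maps originator node ID -> latest sequence ID observed.
type VectorClock map[uint32]uint64

// clocksConverged reports whether every node observed the same latest
// sequence ID for every originator. The real test would populate one
// VectorClock per node from that node's API.
func clocksConverged(clocks []VectorClock) bool {
	for i := 1; i < len(clocks); i++ {
		if !reflect.DeepEqual(clocks[0], clocks[i]) {
			return false
		}
	}
	return true
}

func main() {
	a := VectorClock{100: 42, 200: 17}
	b := VectorClock{100: 42, 200: 17}
	c := VectorClock{100: 42, 200: 16} // still catching up on originator 200
	fmt.Println(clocksConverged([]VectorClock{a, b}))    // true
	fmt.Println(clocksConverged([]VectorClock{a, b, c})) // false
}
```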
func NewSyncVerificationTest ¶
func NewSyncVerificationTest() *SyncVerificationTest
func (*SyncVerificationTest) Description ¶
func (t *SyncVerificationTest) Description() string
func (*SyncVerificationTest) Name ¶
func (t *SyncVerificationTest) Name() string
func (*SyncVerificationTest) Run ¶
func (t *SyncVerificationTest) Run(ctx context.Context, env *types.Environment) error
Source Files ¶
- chaos_attestation_fault.go
- chaos_bandwidth_throttle.go
- chaos_compound_fault.go
- chaos_connection_reset.go
- chaos_latency.go
- chaos_network_partition.go
- chaos_node_down.go
- gateway_scale.go
- multi_payer.go
- payer_lifecycle.go
- rate_registry_change.go
- settlement_verification.go
- smoke.go
- stuck_state_detection.go
- sustained_load.go
- sync_verification.go