# ML Abuse Detection Plugin

## Overview
The ML Abuse Detection Plugin adds machine learning-based abuse detection to the Hockeypuck HKP keyserver. It combines behavioral analysis, anomaly detection, and AI-generated content detection to identify and block abuse such as bot attacks, automated scripts, AI-generated spam, and prompt injection attempts.
## Key Features

### Advanced Detection Capabilities
- Behavioral Anomaly Detection: Uses Isolation Forest algorithm to detect abnormal request patterns
- LLM/AI Content Detection: Identifies AI-generated content and prompt injection attempts
- Real-time Learning: Adapts to new attack patterns through online learning
- Entropy Analysis: Measures randomness in user behavior patterns
- Session Profiling: Tracks comprehensive session-level behavioral metrics

### Machine Learning Models
- Isolation Forest Algorithm: Detects outliers in multi-dimensional behavioral space
- Perplexity Analysis: Identifies synthetic text based on language model predictions
- Pattern Recognition: Recognizes known attack signatures and behavioral patterns
- Adaptive Thresholds: Self-adjusting detection thresholds based on traffic patterns
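This README does not specify how the thresholds adapt; as a minimal sketch, assuming an exponentially weighted moving average (EWMA) over recent anomaly scores, a self-adjusting cutoff could look like the following (all names and constants are hypothetical, not the plugin's API):

```go
package main

import "fmt"

// AdaptiveThreshold is a hypothetical sketch of a self-adjusting cutoff:
// it tracks an EWMA of observed anomaly scores and flags requests that
// score well above the recent baseline.
type AdaptiveThreshold struct {
	baseline float64 // EWMA of recent anomaly scores
	alpha    float64 // smoothing factor, e.g. 0.05
	margin   float64 // how far above baseline counts as anomalous
	floor    float64 // never drop below the configured anomalyThreshold
}

// Observe folds a new score into the baseline and reports whether the
// score should be treated as anomalous under the current threshold.
func (t *AdaptiveThreshold) Observe(score float64) bool {
	t.baseline = (1-t.alpha)*t.baseline + t.alpha*score
	threshold := t.baseline + t.margin
	if threshold < t.floor {
		threshold = t.floor
	}
	return score >= threshold
}

func main() {
	t := &AdaptiveThreshold{baseline: 0.3, alpha: 0.05, margin: 0.4, floor: 0.85}
	fmt.Println(t.Observe(0.32)) // typical traffic: false
	fmt.Println(t.Observe(0.95)) // clear outlier: true
}
```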
### Intelligence Coordination
- Event Integration: Coordinates with rate limiting and other security plugins
- Header Intelligence: Adds ML scores to HTTP headers for downstream processing
- Escalation Logic: Can trigger extended bans for persistent abusers
- Real-time Metrics: Comprehensive statistics and performance monitoring
## Configuration

### Basic Configuration
```toml
[plugins.ml-abuse-detector]
enabled = true
modelPath = "/var/lib/hockeypuck/ml-models/anomaly.model"
anomalyThreshold = 0.85
behaviorWindowSize = 100
updateInterval = "5m"
llmDetection = true
syntheticThreshold = 0.75
maxMemoryMB = 256
enableRealtimeUpdate = true
```
### Advanced Configuration
```toml
[plugins.ml-abuse-detector]
enabled = true
modelPath = "/var/lib/hockeypuck/ml-models/anomaly.model"
anomalyThreshold = 0.85
behaviorWindowSize = 100
updateInterval = "5m"
llmDetection = true
syntheticThreshold = 0.75
maxMemoryMB = 256
enableRealtimeUpdate = true

# Custom anomaly type thresholds
[plugins.ml-abuse-detector.thresholds]
bot_regular = 0.8
bot_random = 0.85
rapid_requests = 0.75
crawler = 0.9
user_agent_rotation = 0.85
high_errors = 0.7

# Model training settings
[plugins.ml-abuse-detector.training]
minDataPoints = 10
updateBatchSize = 100
retrainPercentage = 0.1
normalDataRatio = 0.5

# LLM detection tuning
[plugins.ml-abuse-detector.llm]
perplexityWeight = 0.4
patternWeight = 0.4
countWeight = 0.2
minTextLength = 50
maxTextLength = 1048576
```
### Configuration Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `enabled` | boolean | `true` | Enable/disable the plugin |
| `modelPath` | string | `""` | Path to the ML model file |
| `anomalyThreshold` | float | `0.85` | Score above this triggers blocking (0.0-1.0) |
| `behaviorWindowSize` | int | `100` | Number of requests to analyze for behavior |
| `updateInterval` | string | `"5m"` | How often to update ML models |
| `llmDetection` | boolean | `true` | Enable AI/LLM content detection |
| `syntheticThreshold` | float | `0.75` | Threshold for AI-generated content |
| `maxMemoryMB` | int | `256` | Maximum memory usage in MB |
| `enableRealtimeUpdate` | boolean | `true` | Enable online learning and model updates |
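For orientation, these parameters could map onto a Go configuration struct roughly as follows; the struct, field names, and TOML tags are illustrative assumptions, not the plugin's actual types:

```go
package main

import (
	"fmt"
	"time"
)

// Config is an illustrative mapping of the parameters above; field and
// tag names are assumptions, not the plugin's actual definition.
type Config struct {
	Enabled              bool    `toml:"enabled"`
	ModelPath            string  `toml:"modelPath"`
	AnomalyThreshold     float64 `toml:"anomalyThreshold"`
	BehaviorWindowSize   int     `toml:"behaviorWindowSize"`
	UpdateInterval       string  `toml:"updateInterval"` // parsed with time.ParseDuration
	LLMDetection         bool    `toml:"llmDetection"`
	SyntheticThreshold   float64 `toml:"syntheticThreshold"`
	MaxMemoryMB          int     `toml:"maxMemoryMB"`
	EnableRealtimeUpdate bool    `toml:"enableRealtimeUpdate"`
}

func main() {
	cfg := Config{Enabled: true, AnomalyThreshold: 0.85, UpdateInterval: "5m"}
	d, err := time.ParseDuration(cfg.UpdateInterval)
	if err != nil {
		panic(err)
	}
	fmt.Printf("update every %s, block above %.2f\n", d, cfg.AnomalyThreshold)
}
```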
## API Endpoints

Implementation Status: ✅ All ML Abuse Detection endpoints listed below are currently implemented in the codebase.

### Status and Monitoring
#### GET /api/ml/status
Get ML abuse detection system status.
Response:
```json
{
  "plugin": "ml-abuse-detector",
  "version": "1.0.0",
  "enabled": true,
  "threshold": 0.85,
  "llm_detection": true,
  "metrics": {
    "total_requests": 145823,
    "blocked_requests": 1247,
    "block_rate": 0.00855,
    "avg_anomaly_score": 0.324
  }
}
```
#### GET /api/ml/metrics
Get comprehensive ML detection metrics.
Response:
```json
{
  "total_requests": 145823,
  "blocked_requests": 1247,
  "block_rate": 0.00855,
  "anomaly_detections": {
    "bot_regular": 423,
    "rapid_requests": 312,
    "crawler": 189,
    "user_agent_rotation": 156,
    "high_errors": 98,
    "general_anomaly": 69
  },
  "llm_detections": 234,
  "injection_attempts": 45,
  "avg_anomaly_score": 0.324,
  "avg_synthetic_score": 0.287,
  "hourly_stats": [
    {
      "hour": 14,
      "requests": 8234,
      "blocked": 67,
      "anomalies_detected": 45,
      "llm_detected": 12,
      "avg_anomaly_score": 0.342
    }
  ],
  "uptime": "6h23m45s"
}
```
#### POST /api/ml/analyze
Analyze specific client or content for abuse patterns.
Request:
```json
{
  "client_ip": "192.168.1.100",
  "text": "Optional text content to analyze for AI generation"
}
```
Response:
```json
{
  "client_ip": "192.168.1.100",
  "anomaly_score": 0.923,
  "anomaly_type": "bot_regular",
  "reasons": [
    "Timing patterns too regular",
    "Low entropy in request intervals",
    "Suspicious user agent rotation"
  ],
  "confidence": 0.89,
  "llm_analysis": {
    "is_ai_generated": true,
    "perplexity": 2.34,
    "synthetic_score": 0.812,
    "prompt_injection": false,
    "token_patterns": ["formal_language", "repetitive_structure"]
  }
}
```
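A minimal Go client for the analyze endpoint might look like the sketch below; the base URL is an assumption and should be replaced with whatever address your server listens on:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// analyzeRequest and analyzeResponse mirror the JSON shown above; only
// the fields used in this sketch are declared.
type analyzeRequest struct {
	ClientIP string `json:"client_ip"`
	Text     string `json:"text,omitempty"`
}

type analyzeResponse struct {
	AnomalyScore float64  `json:"anomaly_score"`
	AnomalyType  string   `json:"anomaly_type"`
	Reasons      []string `json:"reasons"`
}

func main() {
	body, _ := json.Marshal(analyzeRequest{ClientIP: "192.168.1.100"})
	// The base URL is an assumption; adjust to your deployment.
	resp, err := http.Post("http://localhost:11371/api/ml/analyze", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out analyzeResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	fmt.Printf("192.168.1.100 scored %.3f (%s)\n", out.AnomalyScore, out.AnomalyType)
}
```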
## Detection Capabilities

### Anomaly Types
| Type | Description | Detection Indicators |
|---|---|---|
| `bot_regular` | Too-regular timing patterns | Timing entropy < 0.2, consistent intervals |
| `bot_random` | Artificially random patterns | Timing entropy > 0.9, suspicious randomness |
| `rapid_requests` | Inhuman request speed | Average interval < 0.5s, burst patterns |
| `user_agent_rotation` | Suspicious UA switching | >3 different user agents in session |
| `crawler` | Aggressive crawling behavior | >50 unique paths, systematic access |
| `high_errors` | Excessive error generation | Error rate > 30%, repeated failures |
| `general_anomaly` | Other abnormal patterns | High overall anomaly score |
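Combined with the per-type thresholds from the advanced configuration, a blocking decision could plausibly be made as sketched below; the `shouldBlock` helper and the fallback to the global threshold are assumptions for illustration:

```go
package main

import "fmt"

// defaultThresholds mirrors the [plugins.ml-abuse-detector.thresholds]
// example above; anomaly types without a specific entry fall back to
// the global anomalyThreshold.
var defaultThresholds = map[string]float64{
	"bot_regular":         0.80,
	"bot_random":          0.85,
	"rapid_requests":      0.75,
	"crawler":             0.90,
	"user_agent_rotation": 0.85,
	"high_errors":         0.70,
}

const globalThreshold = 0.85

// shouldBlock is a hypothetical helper: it compares a score against the
// threshold configured for its anomaly type.
func shouldBlock(anomalyType string, score float64) bool {
	threshold, ok := defaultThresholds[anomalyType]
	if !ok {
		threshold = globalThreshold
	}
	return score >= threshold
}

func main() {
	fmt.Println(shouldBlock("rapid_requests", 0.78)) // true: above 0.75
	fmt.Println(shouldBlock("crawler", 0.78))        // false: below 0.90
}
```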
### Behavioral Analysis Metrics

#### Session Pattern Analysis
- Session Duration: Total time of user session
- Request Count: Number of requests in session
- Unique Paths: Number of distinct endpoints accessed
- Error Rate: Percentage of requests resulting in errors
- Bytes Transferred: Total data transfer volume
- Key Operation Ratio: Ratio of key operations to total requests
#### Entropy Metrics
- Timing Entropy: Randomness in request timing patterns
- Path Entropy: Randomness in path access patterns
- Parameter Entropy: Randomness in request parameters
- Overall Score: Composite entropy assessment
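As an illustration of the timing-entropy metric above, the sketch below computes a normalized Shannon entropy over bucketed inter-request intervals; the bucket width and normalization are assumptions rather than the plugin's exact formula. Values near 0 correspond to the machine-like regularity flagged as `bot_regular`, values near 1 to the near-uniform randomness flagged as `bot_random`.

```go
package main

import (
	"fmt"
	"math"
)

// timingEntropy buckets inter-request intervals (seconds) and returns a
// Shannon entropy normalized to [0, 1]. The 0.5s bucket width is an
// illustrative choice, not the plugin's actual parameter.
func timingEntropy(intervals []float64) float64 {
	if len(intervals) < 2 {
		return 0
	}
	counts := map[int]int{}
	for _, iv := range intervals {
		counts[int(iv/0.5)]++
	}
	if len(counts) == 1 {
		return 0 // perfectly regular timing
	}
	var h float64
	for _, c := range counts {
		p := float64(c) / float64(len(intervals))
		h -= p * math.Log2(p)
	}
	// Normalize by the maximum possible entropy for this many buckets.
	return h / math.Log2(float64(len(counts)))
}

func main() {
	fmt.Printf("bot-like:   %.2f\n", timingEntropy([]float64{1.0, 1.0, 1.0, 1.0, 1.0, 1.0}))
	fmt.Printf("human-like: %.2f\n", timingEntropy([]float64{0.4, 3.1, 0.9, 7.2, 1.6, 2.3}))
}
```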
## LLM/AI Detection

### Detection Patterns
The plugin identifies AI-generated content through:
- Perplexity Analysis: Low perplexity indicates overly predictable text
- Token Patterns: Specific phrases and structures common in AI text
- Formal Language: Excessive use of formal or academic language
- Repetitive Structure: Consistent sentence and paragraph patterns
- Prompt Injection: Attempts to manipulate AI systems
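The `perplexityWeight`, `patternWeight`, and `countWeight` settings in the advanced configuration suggest a weighted blend of these signals. A sketch of such a blend is shown below; the sub-scores are placeholders and the real scoring logic is internal to the plugin:

```go
package main

import "fmt"

// syntheticScore is a hypothetical blend of three sub-scores in [0, 1],
// weighted like the perplexityWeight / patternWeight / countWeight settings.
func syntheticScore(perplexityScore, patternScore, countScore float64) float64 {
	const (
		perplexityWeight = 0.4
		patternWeight    = 0.4
		countWeight      = 0.2
	)
	return perplexityWeight*perplexityScore +
		patternWeight*patternScore +
		countWeight*countScore
}

func main() {
	// Example: low perplexity (very predictable text), many AI-typical
	// phrases, moderate repetition count.
	score := syntheticScore(0.9, 0.8, 0.5)
	fmt.Printf("synthetic score %.2f, flagged: %v\n", score, score >= 0.75) // syntheticThreshold
}
```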
### AI Content Indicators
- Unnaturally perfect grammar and syntax
- Overuse of transitional phrases
- Lack of personal anecdotes or informal language
- Consistent tone without emotional variation
- Technical accuracy combined with generic explanations
## Response Headers
The plugin adds intelligence headers for coordination with other security plugins:
```
X-ML-Anomaly-Score: 0.923
X-ML-Anomaly-Type: bot_regular
X-ML-LLM-Detected: true
X-ML-Synthetic-Score: 0.812
```
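A downstream handler could consume these headers roughly as in the following net/http sketch, assuming the scores are attached to the request before later middleware runs; this is generic illustration code, not part of the plugin:

```go
package main

import (
	"log"
	"net/http"
	"strconv"
)

// mlAware is a generic middleware sketch: it inspects the ML headers
// (assumed here to be set by an earlier handler) and logs high-scoring
// clients before passing the request on.
func mlAware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		score, err := strconv.ParseFloat(r.Header.Get("X-ML-Anomaly-Score"), 64)
		if err == nil && score >= 0.85 {
			log.Printf("high anomaly score %.3f (%s) from %s",
				score, r.Header.Get("X-ML-Anomaly-Type"), r.RemoteAddr)
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	log.Fatal(http.ListenAndServe(":8080", mlAware(mux)))
}
```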
## Integration with Other Plugins

### Rate Limiting Integration
- Event Subscription: Listens to `ratelimit.violation` events
- Profile Updates: Uses violation data to improve behavioral profiles
- Escalation: Can trigger extended bans for persistent abusers
- Shared Intelligence: Provides ML scores for informed rate limiting decisions
### Zero Trust Integration
- Risk Assessment: ML scores contribute to Zero Trust risk calculations
- Behavioral Analytics: Shared behavior data improves trust scoring
- Authentication Triggers: High anomaly scores can trigger step-up authentication
## Event System Integration

### Published Events
- `ml.abuse.detected`: When abuse is detected and blocked
- `ml.abuse.escalate`: When an extended ban is recommended

### Subscribed Events
- `ratelimit.violation`: Rate limiting violations for profile updates
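The plugin framework's event API is not documented here, so the sketch below only illustrates the publish/subscribe flow with hypothetical `Bus` and `Event` types; only the topic names come from this README:

```go
package main

import "fmt"

// Event and Bus are hypothetical stand-ins for the plugin framework's
// event system.
type Event struct {
	Topic   string
	Payload map[string]interface{}
}

type Bus struct{ handlers map[string][]func(Event) }

func NewBus() *Bus { return &Bus{handlers: map[string][]func(Event){}} }

func (b *Bus) Subscribe(topic string, fn func(Event)) {
	b.handlers[topic] = append(b.handlers[topic], fn)
}

func (b *Bus) Publish(e Event) {
	for _, fn := range b.handlers[e.Topic] {
		fn(e)
	}
}

func main() {
	bus := NewBus()

	// The ML plugin consumes rate-limit violations to enrich its profiles...
	bus.Subscribe("ratelimit.violation", func(e Event) {
		fmt.Println("updating behavior profile for", e.Payload["client_ip"])
	})
	// ...and other plugins can react when it detects abuse.
	bus.Subscribe("ml.abuse.detected", func(e Event) {
		fmt.Println("abuse detected:", e.Payload["anomaly_type"])
	})

	bus.Publish(Event{Topic: "ratelimit.violation", Payload: map[string]interface{}{"client_ip": "192.168.1.100"}})
	bus.Publish(Event{Topic: "ml.abuse.detected", Payload: map[string]interface{}{"anomaly_type": "rapid_requests"}})
}
```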
## Performance Considerations

### Resource Usage
- Memory Usage: 50-100MB for models and behavior profiles
- CPU Impact: <5% overhead for typical traffic loads
- Latency: <10ms processing time per request
- Model Updates: Background process, doesn't block requests
- Disk Usage: Model files typically 10-50MB
### Scaling Recommendations

#### Small Deployments (<1000 req/min)
- Default configuration suitable
- 2-4 CPU cores recommended
- 4GB RAM minimum
#### Medium Deployments (1000-10000 req/min)
- Consider Redis backend for shared profiles
- 4-8 CPU cores recommended
- 8-16GB RAM for model caching
#### Large Deployments (>10000 req/min)
- Distribute ML processing across instances
- Dedicated ML inference servers
- 8+ CPU cores, 16-32GB RAM per instance
## Best Practices

### Configuration Tuning
- Start Conservative: Begin with higher thresholds (0.9+) and lower gradually
- Monitor False Positives: Track legitimate traffic patterns
- Adjust by Traffic Type: Different thresholds for different endpoint types
- Regular Model Updates: Keep models current with traffic patterns
### Operational Guidelines
- Gradual Rollout: Enable monitoring before enforcement
- Baseline Establishment: Allow learning period before strict enforcement
- Regular Review: Periodically review blocked requests for accuracy
- Performance Monitoring: Track resource usage and processing times
### Security Configuration
- Model Protection: Secure model files with appropriate permissions
- Log Analysis: Regular review of detection patterns and effectiveness
- Threshold Adjustment: Adapt thresholds based on threat landscape
- Backup Strategy: Regular backups of trained models and configurations
## Troubleshooting

### High False Positive Rate
Symptoms:
- Legitimate users being blocked
- High block rate in metrics
- User complaints about access issues
Solutions:
- Increase `anomalyThreshold` (e.g., from 0.85 to 0.9)
- Increase `behaviorWindowSize` for more data points
- Review whitelisted paths for legitimate automation
- Check whether a CDN or proxy is affecting behavioral analysis
### Model Performance Issues
Symptoms:
- High CPU usage
- Slow response times
- Memory leaks
Solutions:
- Reduce `behaviorWindowSize` if memory is constrained
- Disable `enableRealtimeUpdate` to prevent model growth
- Increase `updateInterval` to reduce update frequency
- Monitor cleanup intervals and adjust as needed
### Detection Accuracy Problems
Symptoms:
- Missing obvious bot traffic
- Inconsistent detection results
- Poor LLM detection accuracy
Solutions:
- Lower detection thresholds gradually
- Increase training data through longer observation periods
- Verify model file integrity and version
- Check for sufficient diverse training data
### Integration Issues
Symptoms:
- Events not being received
- Headers not appearing
- Plugin conflicts
Solutions:
- Verify plugin load order (ML should load after rate limiting)
- Check event subscription setup
- Review middleware registration
- Monitor plugin communication logs
## Monitoring and Alerting

### Key Metrics to Monitor
- Detection Rates: Block rate, false positive rate
- Performance: Processing latency, memory usage
- Model Health: Accuracy metrics, update success rate
- System Load: CPU usage, goroutine counts
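If these metrics are exported to Prometheus (an assumption; this README does not mandate a metrics backend), registration could look roughly like this, with metric names chosen only to match the alert conditions below:

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Hypothetical gauges and histograms for the metrics listed above; the
// names are illustrative, not ones exposed by the plugin itself.
var (
	blockRate = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "ml_block_rate", Help: "Fraction of requests blocked by ML detection.",
	})
	processingLatency = prometheus.NewHistogram(prometheus.HistogramOpts{
		Name: "ml_processing_seconds", Help: "Per-request ML processing latency.",
	})
	memoryMB = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "ml_memory_mb", Help: "Memory used by models and behavior profiles.",
	})
)

func main() {
	prometheus.MustRegister(blockRate, processingLatency, memoryMB)

	// Example values; in practice these would be updated by the plugin.
	blockRate.Set(0.00855)
	processingLatency.Observe(0.004)
	memoryMB.Set(87)

	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9102", nil))
}
```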
### Recommended Alerts
```yaml
alerts:
  - name: high_ml_block_rate
    condition: ml_block_rate > 0.1
    severity: warning
  - name: ml_processing_latency
    condition: ml_avg_latency > 50ms
    severity: warning
  - name: ml_memory_usage
    condition: ml_memory_mb > 400
    severity: critical
  - name: ml_model_update_failure
    condition: ml_update_failures > 3
    severity: critical
```
## Security Considerations

### Threat Model
The plugin addresses:
- Automated Bot Attacks: Detection through behavioral analysis
- AI-Generated Spam: LLM detection capabilities
- Prompt Injection: Specific detection for AI manipulation attempts
- Coordinated Attacks: Pattern recognition across multiple clients
- Evasion Attempts: Adaptive learning to counter sophisticated attackers
### Privacy Considerations
- Data Minimization: Only essential behavioral metrics are stored
- Anonymization: Client profiles use hashed identifiers where possible
- Retention Limits: Automatic cleanup of old behavioral data
- GDPR Compliance: Support for data deletion requests
## Version History
- v1.0.0: Initial release with core ML abuse detection features
  - Isolation Forest anomaly detection
  - Basic LLM content detection
  - Event-driven integration with rate limiting
  - Real-time learning capabilities
  - tomb.Tomb integration for reliable goroutine management
  - Comprehensive metrics and monitoring