README
¶
NNTP Analyze Tool
A powerful command-line tool for analyzing NNTP newsgroups, providing detailed statistics about article distribution, sizes, dates, and caching performance.
Features
- Comprehensive Group Analysis: Analyze individual newsgroups or all groups at once
- Article Size Distribution: Detailed breakdown of article sizes across predefined ranges
- Date Range Analysis: Filter and analyze articles by date ranges
- Caching System: Intelligent caching for improved performance on subsequent analyses
- Multiple Export Formats: Export results in JSON or CSV format
- Batch Processing: Efficient processing of large newsgroups with configurable limits
Installation
go build -o nntp-analyze ./cmd/nntp-analyze
Usage
Basic Analysis
Analyze a single newsgroup:
./nntp-analyze -host news.example.com -port 119 -group alt.binaries.movies
Analyze All Groups
Use the special $all flag to analyze all groups in the database:
./nntp-analyze -host news.example.com -port 119 -group '$all'
Advanced Options
./nntp-analyze [OPTIONS]
Required:
-host string NNTP server hostname
-port int NNTP server port (default: 119)
-group string Newsgroup name to analyze (use '$all' for all groups)
Authentication:
-username string NNTP username (if required)
-password string NNTP password (if required)
-ssl Use SSL/TLS connection
Analysis Options:
-force Force refresh of cached data
-max-articles int Maximum number of articles to analyze (0 = unlimited)
-start-date string Start date for analysis (YYYY-MM-DD format)
-end-date string End date for analysis (YYYY-MM-DD format)
-timeout int Connection timeout in seconds (default: 30)
Cache Management:
-clear-cache Clear cached data for the specified group
-cache-stats Show statistics for cached data only
-validate-cache Validate cache file integrity
Export Options:
-export string Export format: 'json' or 'csv'
Examples:
# Analyze with date filtering
./nntp-analyze -host news.server.com -group alt.test -start-date 2024-01-01 -end-date 2024-12-31
# Force refresh and limit analysis
./nntp-analyze -host news.server.com -group alt.test -force -max-articles 10000
# Export results to JSON
./nntp-analyze -host news.server.com -group alt.test -export json
# Analyze with SSL
./nntp-analyze -host ssl.news.server.com -port 563 -ssl -group alt.test
Output Format
The analyzer provides detailed statistics including:
Basic Statistics
- Group name and provider information
- Total article count and size
- Article number range (first to last)
- Date range (oldest to newest articles)
- Time span and articles per day
- Cache status and performance
Article Size Distribution
Articles are categorized into the following size ranges:
- < 4K: Small text posts, short messages
- 4K - 16K: Medium text posts, small attachments
- 16K - 32K: Large text posts, small binaries
- 32K - 64K: Medium binaries, multi-part text
- 64K - 128K: Large binaries, compressed files
- 128K - 256K: Very large binaries
- 256K - 512K: Huge binaries, large media files
- > 512K: Extremely large binaries
Each category shows:
- Absolute count with comma-separated formatting
- Percentage of total articles
- Human-readable size information
Example Output
=== Analysis Results ===
Group: alt.binaries.movies
Provider: news.example.com
Total Articles: 15,432
Total Bytes: 2.3 GB
Article Range: 1001 - 16432
Date Range: 2024-01-15 to 2024-12-28
Time Span: 348.0 days
Articles per Day: 44.3
Cached Articles: 15,432
Cache Exists: true
Analyzed At: 2024-12-28 14:30:25
=== Article Size Distribution ===
Total Articles: 15,432
Total Size: 2.3 GB
Average Size: 156.2 KB
< 4K : 1,245 ( 8.1%)
4K - 16K : 2,891 (18.7%)
16K - 32K : 3,567 (23.1%)
32K - 64K : 4,123 (26.7%)
64K - 128K : 2,234 (14.5%)
128K - 256K : 892 ( 5.8%)
256K - 512K : 345 ( 2.2%)
> 512K : 135 ( 0.9%)
Caching System
The analyzer uses an intelligent caching system to improve performance:
- Cache Location:
data/cache/{provider}/{sanitized_group_name}.overview - Cache Format: Tab-separated XOVER data for fast parsing
- Automatic Creation: Cache is created during first analysis
- Incremental Updates: Future analyses can reuse cached data
- Cache Validation: Built-in integrity checking
Cache Management Commands
# Clear cache for a specific group
./nntp-analyze -host news.server.com -group alt.test -clear-cache
# Show cache statistics without re-analyzing
./nntp-analyze -host news.server.com -group alt.test -cache-stats
# Validate cache file integrity
./nntp-analyze -host news.server.com -group alt.test -validate-cache
Date Filtering
The analyzer supports flexible date filtering:
# Analyze articles from a specific date range
./nntp-analyze -host news.server.com -group alt.test \
-start-date 2024-01-01 -end-date 2024-12-31
# Analyze articles from a specific date onwards
./nntp-analyze -host news.server.com -group alt.test \
-start-date 2024-06-01
# Analyze articles up to a specific date
./nntp-analyze -host news.server.com -group alt.test \
-end-date 2024-12-31
Export Formats
JSON Export
./nntp-analyze -host news.server.com -group alt.test -export json
Produces structured JSON output suitable for integration with other tools.
CSV Export
./nntp-analyze -host news.server.com -group alt.test -export csv
Produces comma-separated values suitable for spreadsheet applications.
Performance Considerations
- Large Groups: Analysis is limited to 10,000 articles by default for performance
- Batch Processing: Articles are processed in 10,000-article batches
- Connection Pooling: Uses efficient connection pooling for NNTP operations
- Memory Usage: Streaming processing keeps memory usage low
- Caching: Significantly reduces analysis time for repeated operations
Error Handling
The analyzer handles various error conditions gracefully:
- Network Issues: Automatic retry and timeout handling
- Invalid Dates: Malformed date headers are logged but don't stop analysis
- Missing Articles: Gaps in article sequences are handled transparently
- Cache Corruption: Automatic cache validation and rebuilding
Troubleshooting
Common Issues
- Connection Refused: Check host, port, and firewall settings
- Authentication Failed: Verify username and password
- Group Not Found: Ensure the newsgroup exists and is accessible
- Cache Issues: Use
-clear-cacheto rebuild corrupted cache files
Debug Information
The analyzer provides detailed logging during operation:
- Connection status and authentication
- Batch processing progress
- Date parsing warnings for malformed headers
- Cache operations and statistics
Integration
The analyzer can be integrated into larger workflows:
- Monitoring: Regular analysis of newsgroup activity
- Reporting: Automated generation of usage statistics
- Data Pipeline: CSV/JSON export for further processing
- Alerting: Detect unusual patterns in newsgroup activity
Contributing
When contributing to the analyzer:
- Test with various newsgroup types (text, binary, mixed)
- Ensure backward compatibility with existing cache files
- Add appropriate error handling for new features
- Update this README for new functionality
License
This tool is part of the go-pugleaf project. See the main project license for details.