presidioanonymizer

package
v0.2.13
Published: Apr 19, 2025 License: Apache-2.0 Imports: 8 Imported by: 0

README


Presidio Anonymizer Plugin

Overview

This plugin integrates with Microsoft's Presidio to analyze and anonymize sensitive data in your fields. The plugin uses two Presidio services:

  • Analyzer API for detecting PII entities
  • Anonymizer API for anonymizing detected entities

The Presidio Anonymizer is the module that replaces detected PII text entities with the values you configure.

Microsoft's Presidio demo

Configuration

presidio_anonymizer:
    anonymize_url: http://localhost:8080/anonymize
    analyzer_url: http://localhost:8080/analyze
    language: en
    hash_type: md5    # Optional, used for hash operator
    encrypt_key: ""    # Optional, used for encrypt operator
    anonymizer_rules:
      - type: EMAIL_ADDRESS
        operator: mask
        masking_char: "*"
        chars_to_mask: 4
      - type: PERSON
        operator: replace
        new_value: "[REDACTED]"
      - type: PHONE_NUMBER
        operator: hash
      - type: CREDIT_CARD
        operator: encrypt

Configuration Parameters

  • anonymize_url: Required. The URL of your Presidio Anonymizer API endpoint.
  • analyzer_url: Required. The URL of your Presidio Analyzer API endpoint.
  • language: Optional. Language for the analyzer (default: "en").
  • hash_type: Optional. Hash algorithm for "hash" operator (e.g., "md5", "sha256").
  • encrypt_key: Optional. Encryption key for "encrypt" operator.
  • anonymizer_rules: List of anonymization rules that will be applied to detected entities.

Each rule contains:

  • type: The type of PII to detect (e.g., "PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER")
  • operator: The anonymization operation. Supported values:
    • mask: Mask the value with a character
    • replace: Replace with a new value
    • hash: Hash the value using specified algorithm
    • encrypt: Encrypt the value using provided key
  • masking_char: Used with "mask" operator - the character to use for masking
  • chars_to_mask: Used with "mask" operator - number of characters to mask
  • new_value: Used with "replace" operator - the value to replace the detected PII with
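
The operator-specific fields above lend themselves to a check before any request is sent. The following is a minimal sketch, not the plugin's own validation: validateRule is a hypothetical helper, AnonymizerRule is re-declared locally so the example compiles on its own, and treating a missing encrypt key as an error is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// AnonymizerRule mirrors the exported type of this package; it is
// re-declared here only so the sketch is self-contained.
type AnonymizerRule struct {
	Type        string
	Operator    string
	NewValue    string
	MaskingChar string
	CharsToMask int
}

// validateRule is a hypothetical helper (not part of the package API).
// It checks that a rule carries the fields its operator uses.
func validateRule(r AnonymizerRule, encryptKey string) error {
	if r.Type == "" {
		return errors.New("rule is missing type")
	}
	switch r.Operator {
	case "mask":
		if r.MaskingChar == "" || r.CharsToMask <= 0 {
			return fmt.Errorf("%s: mask needs masking_char and chars_to_mask", r.Type)
		}
	case "replace":
		if r.NewValue == "" {
			return fmt.Errorf("%s: replace needs new_value", r.Type)
		}
	case "hash":
		// hash_type is optional in the configuration, so nothing to check.
	case "encrypt":
		// Assumption: an empty key is treated as a configuration error.
		if encryptKey == "" {
			return fmt.Errorf("%s: encrypt needs encrypt_key", r.Type)
		}
	default:
		return fmt.Errorf("%s: unknown operator %q", r.Type, r.Operator)
	}
	return nil
}

func main() {
	rules := []AnonymizerRule{
		{Type: "EMAIL_ADDRESS", Operator: "mask", MaskingChar: "*", CharsToMask: 4},
		{Type: "PERSON", Operator: "replace"}, // new_value is missing
	}
	for _, r := range rules {
		fmt.Println(r.Type, validateRule(r, ""))
	}
}
```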

Example

Input:

{
  "email": "john.doe@example.com",
  "name": "John Doe",
  "phone": "+1-555-123-4567",
  "description": "Contact John Doe at john.doe@example.com or +1-555-123-4567"
}

Output:

{
  "email": "****.doe@example.com",
  "name": "John Doe",
  "phone": "+1-555-123-4567",
  "description": "Contact <PERSON> at ****.doe@example.com or +<IN_PAN>4567"
}
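
The masked email in the output corresponds to replacing the first four characters with the masking character. In the plugin this is performed by the Presidio service, not by local code; the sketch below only illustrates the effect. maskChars is a hypothetical function, and its fromEnd parameter is modelled on the FromEnd field of PresidioAnonymizer.

```go
package main

import (
	"fmt"
	"strings"
)

// maskChars is a hypothetical local illustration of the "mask" operator.
// It replaces charsToMask characters with maskingChar, counting from the
// start of s, or from the end when fromEnd is true.
func maskChars(s, maskingChar string, charsToMask int, fromEnd bool) string {
	runes := []rune(s)
	n := charsToMask
	if n > len(runes) {
		n = len(runes)
	}
	if n <= 0 {
		return s
	}
	mask := strings.Repeat(maskingChar, n)
	if fromEnd {
		return string(runes[:len(runes)-n]) + mask
	}
	return mask + string(runes[n:])
}

func main() {
	fmt.Println(maskChars("john.doe@example.com", "*", 4, false))
	// prints "****.doe@example.com"
}
```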

Notes

  1. The plugin first uses Presidio Analyzer to detect PII entities in the text
  2. Then it applies the configured anonymization rules to the detected entities
  3. If no PII is detected, the original data is returned unchanged
  4. Each anonymization operator requires specific parameters:
    • mask: requires masking_char and chars_to_mask
    • replace: requires new_value
    • hash: uses global hash_type configuration
    • encrypt: uses global encrypt_key configuration
  5. The anonymization is applied to all detected entities of the specified type in the text
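
The order of operations in notes 1 to 3 can be sketched as follows. This is an illustration under stated assumptions, not the plugin's source: entity, analyzeFunc, anonymizeFunc and processText are hypothetical names, and the two function types stand in for the calls to the Analyzer and Anonymizer services.

```go
package main

import "fmt"

// entity is a hypothetical stand-in for one analyzer finding.
type entity struct {
	Type       string
	Start, End int
}

// analyzeFunc and anonymizeFunc are hypothetical stand-ins for the two
// Presidio calls, so the flow can be shown without a running service.
type analyzeFunc func(text string) ([]entity, error)
type anonymizeFunc func(text string, found []entity) (string, error)

// processText analyzes first, returns the text unchanged when nothing is
// detected, and otherwise hands the findings to the anonymizer.
func processText(text string, analyze analyzeFunc, anonymize anonymizeFunc) (string, error) {
	found, err := analyze(text)
	if err != nil {
		return "", fmt.Errorf("analyze: %w", err)
	}
	if len(found) == 0 {
		return text, nil
	}
	out, err := anonymize(text, found)
	if err != nil {
		return "", fmt.Errorf("anonymize: %w", err)
	}
	return out, nil
}

func main() {
	// Stub analyzer that never finds anything.
	none := func(string) ([]entity, error) { return nil, nil }
	// Stub anonymizer that must not be reached when nothing is detected.
	fail := func(string, []entity) (string, error) {
		return "", fmt.Errorf("unexpected call")
	}
	out, err := processText("no PII here", none, fail)
	fmt.Println(out, err)
}
```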

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func New

func New(config Config) (plugins.Interceptor, error)

Types

type AnonymizerRule

type AnonymizerRule struct {
	// Type of PII to detect (e.g. "PERSON", "PHONE_NUMBER", etc.)
	Type string `json:"type" yaml:"type"`
	// Operator defines the anonymization operation ("mask", "replace", "hash", "encrypt")
	Operator string `json:"operator" yaml:"operator"`
	// NewValue is used with "replace" operator
	NewValue string `json:"new_value,omitempty" yaml:"new_value,omitempty"`
	// MaskingChar is used with "mask" operator
	MaskingChar string `json:"masking_char,omitempty" yaml:"masking_char,omitempty"`
	// CharsToMask is used with "mask" operator
	CharsToMask int `json:"chars_to_mask,omitempty" yaml:"chars_to_mask,omitempty"`
}

AnonymizerRule defines how to anonymize a specific type of PII

type Config

type Config struct {
	// AnonymizeURL is the URL of the Presidio Anonymizer API
	AnonymizeURL string `json:"anonymize_url" yaml:"anonymize_url"`
	// AnalyzerURL is the URL of the Presidio Analyzer API
	AnalyzerURL string `json:"analyzer_url" yaml:"analyzer_url"`
	// Language is the language used for analysis (default: "en")
	Language string `json:"language" yaml:"language"`
	// HashType is the hash algorithm used for hash operator (e.g., "md5", "sha256")
	HashType string `json:"hash_type" yaml:"hash_type"`
	// EncryptKey is the key used for encrypt operator
	EncryptKey string `json:"encrypt_key" yaml:"encrypt_key"`
	// AnonymizerRules defines the anonymization rules that apply to detected entities
	AnonymizerRules []AnonymizerRule `json:"anonymizer_rules" yaml:"anonymizer_rules"`
}

Config represents the configuration for the Presidio Anonymizer plugin
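
Because Config carries JSON tags, it can be decoded from a JSON document as well as from YAML. The sketch below re-declares the two types locally so it is self-contained; loadConfig is a hypothetical helper, and applying the documented default language "en" at decode time is an assumption about where that default is set.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local copies of the exported types, kept only so the sketch compiles alone.
type AnonymizerRule struct {
	Type        string `json:"type"`
	Operator    string `json:"operator"`
	NewValue    string `json:"new_value,omitempty"`
	MaskingChar string `json:"masking_char,omitempty"`
	CharsToMask int    `json:"chars_to_mask,omitempty"`
}

type Config struct {
	AnonymizeURL    string           `json:"anonymize_url"`
	AnalyzerURL     string           `json:"analyzer_url"`
	Language        string           `json:"language"`
	HashType        string           `json:"hash_type"`
	EncryptKey      string           `json:"encrypt_key"`
	AnonymizerRules []AnonymizerRule `json:"anonymizer_rules"`
}

// loadConfig is a hypothetical helper: it decodes raw JSON into Config and
// fills in the documented default language when none is given.
func loadConfig(raw []byte) (Config, error) {
	var cfg Config
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return Config{}, fmt.Errorf("decode config: %w", err)
	}
	if cfg.Language == "" {
		cfg.Language = "en"
	}
	return cfg, nil
}

func main() {
	raw := []byte(`{
	  "anonymize_url": "http://localhost:8080/anonymize",
	  "analyzer_url": "http://localhost:8080/analyze",
	  "anonymizer_rules": [
	    {"type": "PERSON", "operator": "replace", "new_value": "[REDACTED]"}
	  ]
	}`)
	cfg, err := loadConfig(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(cfg.Language, len(cfg.AnonymizerRules))
}
```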

func (Config) Doc

func (c Config) Doc() string

func (Config) Tag

func (c Config) Tag() string

type Plugin

type Plugin struct {
	// contains filtered or unexported fields
}

func (*Plugin) Doc

func (p *Plugin) Doc() string

func (*Plugin) Process

func (p *Plugin) Process(data map[string]any, _ map[string][]string) (processed map[string]any, skipped bool)

type PresidioAnonymizer

type PresidioAnonymizer struct {
	Type        string `json:"type"`
	NewValue    string `json:"new_value,omitempty"`
	MaskingChar string `json:"masking_char,omitempty"`
	CharsToMask int    `json:"chars_to_mask,omitempty"`
	HashType    string `json:"hash_type,omitempty"`
	CryptoKey   string `json:"key,omitempty"`
	FromEnd     bool   `json:"from_end"`
}

PresidioAnonymizer represents configuration for anonymization
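
One plausible way to derive these values from the configured rules is sketched below. It is an assumption, not the plugin's source: buildAnonymizers is a hypothetical helper, and it assumes that Type holds the operator name (as the "type" field does in Presidio's REST API) and that the map is keyed by entity type.

```go
package main

import "fmt"

// Local copies of the exported types, kept only so the sketch compiles alone.
type AnonymizerRule struct {
	Type        string
	Operator    string
	NewValue    string
	MaskingChar string
	CharsToMask int
}

type PresidioAnonymizer struct {
	Type        string `json:"type"`
	NewValue    string `json:"new_value,omitempty"`
	MaskingChar string `json:"masking_char,omitempty"`
	CharsToMask int    `json:"chars_to_mask,omitempty"`
	HashType    string `json:"hash_type,omitempty"`
	CryptoKey   string `json:"key,omitempty"`
	FromEnd     bool   `json:"from_end"`
}

// buildAnonymizers is a hypothetical helper. It keys the result by entity
// type and copies only the fields that the rule's operator uses.
func buildAnonymizers(rules []AnonymizerRule, hashType, encryptKey string) map[string]PresidioAnonymizer {
	out := make(map[string]PresidioAnonymizer, len(rules))
	for _, r := range rules {
		a := PresidioAnonymizer{Type: r.Operator}
		switch r.Operator {
		case "mask":
			a.MaskingChar = r.MaskingChar
			a.CharsToMask = r.CharsToMask
		case "replace":
			a.NewValue = r.NewValue
		case "hash":
			a.HashType = hashType // global hash_type setting
		case "encrypt":
			a.CryptoKey = encryptKey // global encrypt_key setting
		}
		out[r.Type] = a
	}
	return out
}

func main() {
	rules := []AnonymizerRule{
		{Type: "EMAIL_ADDRESS", Operator: "mask", MaskingChar: "*", CharsToMask: 4},
		{Type: "PHONE_NUMBER", Operator: "hash"},
	}
	fmt.Printf("%+v\n", buildAnonymizers(rules, "md5", ""))
}
```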

type PresidioClient

type PresidioClient struct {
	// contains filtered or unexported fields
}

PresidioClient handles communication with Presidio API endpoints

func NewPresidioClient

func NewPresidioClient(analyzerURL, anonymizerURL string) *PresidioClient

NewPresidioClient creates a new instance of PresidioClient

func (*PresidioClient) Analyze

func (c *PresidioClient) Analyze(text string, templates []analyzeTemplate, language string) ([]analyzeResult, error)

Analyze sends request to Presidio Analyzer API
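
For orientation, a request of this kind can be exercised against a stand-in server. The sketch below is not this package's implementation: analyzeRequest, analyzeResult, analyze and newFakeAnalyzer are hypothetical names, the field set follows the "text"/"language" request and "entity_type"/"start"/"end"/"score" response used by Presidio's Analyzer REST API, and the server is a local fake that returns a fixed finding.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// analyzeRequest and analyzeResult are hypothetical local types; the
// package's own unexported types may differ.
type analyzeRequest struct {
	Text     string `json:"text"`
	Language string `json:"language"`
}

type analyzeResult struct {
	EntityType string  `json:"entity_type"`
	Start      int     `json:"start"`
	End        int     `json:"end"`
	Score      float64 `json:"score"`
}

// analyze posts the text to url and decodes the list of findings.
func analyze(client *http.Client, url, text, language string) ([]analyzeResult, error) {
	body, err := json.Marshal(analyzeRequest{Text: text, Language: language})
	if err != nil {
		return nil, err
	}
	resp, err := client.Post(url, "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("analyzer returned %s", resp.Status)
	}
	var results []analyzeResult
	if err := json.NewDecoder(resp.Body).Decode(&results); err != nil {
		return nil, err
	}
	return results, nil
}

// newFakeAnalyzer starts a local stand-in that always reports one PERSON.
func newFakeAnalyzer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `[{"entity_type":"PERSON","start":8,"end":16,"score":0.85}]`)
	}))
}

func main() {
	srv := newFakeAnalyzer()
	defer srv.Close()
	results, err := analyze(srv.Client(), srv.URL, "Contact John Doe", "en")
	fmt.Println(results, err)
}
```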

func (*PresidioClient) Anonymize

func (c *PresidioClient) Anonymize(text string, anonymizers map[string]PresidioAnonymizer, analyzerResults []analyzeResult) (string, error)

Anonymize sends request to Presidio Anonymizer API
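
The JSON tags on PresidioAnonymizer match the operator fields of Presidio's Anonymizer REST API, so a request body can be assembled as in the sketch below. anonymizeRequest, analyzerResult and buildAnonymizeBody are hypothetical names introduced for illustration; the body this package actually sends may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PresidioAnonymizer is copied from the package documentation.
type PresidioAnonymizer struct {
	Type        string `json:"type"`
	NewValue    string `json:"new_value,omitempty"`
	MaskingChar string `json:"masking_char,omitempty"`
	CharsToMask int    `json:"chars_to_mask,omitempty"`
	HashType    string `json:"hash_type,omitempty"`
	CryptoKey   string `json:"key,omitempty"`
	FromEnd     bool   `json:"from_end"`
}

// analyzerResult is a hypothetical type describing one detected entity.
type analyzerResult struct {
	EntityType string  `json:"entity_type"`
	Start      int     `json:"start"`
	End        int     `json:"end"`
	Score      float64 `json:"score"`
}

// anonymizeRequest is a hypothetical type for the request body.
type anonymizeRequest struct {
	Text            string                        `json:"text"`
	Anonymizers     map[string]PresidioAnonymizer `json:"anonymizers"`
	AnalyzerResults []analyzerResult              `json:"analyzer_results"`
}

// buildAnonymizeBody marshals the text, the per-entity operators and the
// analyzer findings into a single JSON document.
func buildAnonymizeBody(text string, anonymizers map[string]PresidioAnonymizer, results []analyzerResult) ([]byte, error) {
	return json.Marshal(anonymizeRequest{
		Text:            text,
		Anonymizers:     anonymizers,
		AnalyzerResults: results,
	})
}

func main() {
	body, err := buildAnonymizeBody(
		"Contact John Doe",
		map[string]PresidioAnonymizer{"PERSON": {Type: "replace", NewValue: "[REDACTED]"}},
		[]analyzerResult{{EntityType: "PERSON", Start: 8, End: 16, Score: 0.85}},
	)
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(string(body))
}
```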
