Detection Methods

Overview

KafkaCode uses multiple detection methods to identify privacy issues, secrets, and compliance violations in your source code.

Detection Categories

Secrets Detection

API keys, tokens, credentials

PII Detection

Personally identifiable information

Compliance Checks

GDPR, CCPA requirements

Context Analysis

AI-powered semantic analysis

Secrets Detection

Critical Level Secrets

AWS Access Keys

Pattern: AKIA[0-9A-Z]{16}

Example:

# ❌ Bad: Hardcoded AWS key
aws_access_key = "AKIAIOSFODNN7EXAMPLE"

# ✅ Good: Use environment variables
import os
aws_access_key = os.getenv('AWS_ACCESS_KEY_ID')

Severity: Critical (100 points)
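
As a rough illustration of how a pattern-based check like this could work, the sketch below applies the regular expression quoted above to a line of source text. The function name and the added `\b` word boundaries are assumptions made for the example, not KafkaCode's actual implementation.

```python
import re

# Pattern quoted in this section; the \b anchors are an added assumption so
# the match does not start or end inside a longer alphanumeric run.
AWS_ACCESS_KEY_RE = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def find_aws_access_keys(source: str) -> list[str]:
    """Return every substring that looks like an AWS access key ID."""
    return AWS_ACCESS_KEY_RE.findall(source)


bad_line = 'aws_access_key = "AKIAIOSFODNN7EXAMPLE"'
good_line = "aws_access_key = os.getenv('AWS_ACCESS_KEY_ID')"

print(find_aws_access_keys(bad_line))
print(find_aws_access_keys(good_line))
```

The hardcoded line yields one hit, while the environment-variable line yields none because no literal key appears in the source.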

Private Keys

Pattern: -----BEGIN (RSA |EC )?PRIVATE KEY-----

Example:

// ❌ Bad: Embedded private key
const privateKey = `-----BEGIN RSA PRIVATE KEY-----
MIIEpAIBAAKCAQEA...
-----END RSA PRIVATE KEY-----`;

// ✅ Good: Load from secure file
const fs = require('fs');
const privateKey = fs.readFileSync('/secure/path/key.pem');

Severity: Critical (100 points)
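
The header pattern above can be exercised with a few lines of Python. This is only a sketch under the assumption that a simple regex search is performed; `contains_private_key` is a hypothetical helper name.

```python
import re

# Pattern quoted in this section: matches PKCS#8, RSA, and EC private key headers.
PRIVATE_KEY_HEADER_RE = re.compile(r"-----BEGIN (RSA |EC )?PRIVATE KEY-----")


def contains_private_key(source: str) -> bool:
    """Return True if the text contains a PEM private key header."""
    return PRIVATE_KEY_HEADER_RE.search(source) is not None


print(contains_private_key("-----BEGIN RSA PRIVATE KEY-----\nMIIEpAIBAAKCAQEA..."))
print(contains_private_key("-----BEGIN PUBLIC KEY-----..."))
```

Note that the pattern as written would not match other header variants, such as `-----BEGIN OPENSSH PRIVATE KEY-----`; whether KafkaCode covers those separately is not stated here.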

Stripe API Keys

Pattern: sk_live_[0-9a-zA-Z]{24}

Example:

// ❌ Bad: Hardcoded Stripe key
const stripe = require('stripe')('sk_live_51H...');

// ✅ Good: Use environment variable
const stripe = require('stripe')(process.env.STRIPE_SECRET_KEY);

Severity: Critical (100 points)

Database Credentials

Pattern: Password/credentials in connection strings

Example:

# ❌ Bad: Hardcoded database password
DATABASE_URL = "postgresql://user:password123@localhost/db"

# ✅ Good: Use environment variables
import os
DATABASE_URL = os.getenv('DATABASE_URL')

Severity: Critical (100 points)
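
The section does not give an exact pattern for connection strings, so the following is a hypothetical sketch: it looks for the `scheme://user:password@host` shape and reports the embedded password. The regex and the function name are assumptions for illustration only.

```python
import re

# Hypothetical pattern (not taken from KafkaCode): scheme://user:password@...
CONNECTION_STRING_RE = re.compile(
    r"\b[a-zA-Z][a-zA-Z0-9+.-]*://"   # URI scheme such as postgresql://
    r"(?P<user>[^\s:/@'\"]+):"        # user name
    r"(?P<password>[^\s/@'\"]+)@"     # non-empty password before the host
)


def find_embedded_passwords(source: str) -> list[str]:
    """Return the password part of every credential-bearing connection string."""
    return [m.group("password") for m in CONNECTION_STRING_RE.finditer(source)]


print(find_embedded_passwords('DATABASE_URL = "postgresql://user:password123@localhost/db"'))
print(find_embedded_passwords('API_URL = "http://localhost:8080/health"'))
```

A URL with a port but no credentials, such as the second sample, is not reported because nothing between the colon and the next slash is followed by `@`.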

High Level Secrets

OAuth Tokens

Pattern: GitHub, GitLab, and other OAuth tokens

Example:

// ❌ Bad: Hardcoded OAuth token
const token = 'ghp_1234567890abcdef';

// ✅ Good: Use secure storage
const token = await getSecureToken();

Severity: High (50 points)

JWT Secrets

Pattern: jwt_secret, JWT_SECRET assignments

Example:

# ❌ Bad: Hardcoded JWT secret
JWT_SECRET = "mysecretkey123"

# ✅ Good: Use environment variable
import os
JWT_SECRET = os.getenv('JWT_SECRET')

Severity: High (50 points)

API Keys

Pattern: Generic API key patterns

Example:

// ❌ Bad: Hardcoded API key
const apiKey = 'api_key_1234567890abcdef';

// ✅ Good: Load from config
const apiKey = config.get('API_KEY');

Severity: High (50 points)

PII Detection

Medium Level PII

Email Addresses

Pattern: [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}

Example:

// ❌ Bad: Hardcoded email
const adminEmail = "[email protected]";

// ✅ Good: Use configuration
const adminEmail = config.get('ADMIN_EMAIL');

Severity: Medium (10 points)

GDPR Consideration: Email addresses are PII under GDPR

Phone Numbers

Pattern: Various international formats

Example:

# ❌ Bad: Hardcoded phone number
support_phone = "+1-555-123-4567"

# ✅ Good: Use configuration
support_phone = config.SUPPORT_PHONE

Severity: Medium (10 points)

CCPA Consideration: Phone numbers are personal information

Social Security Numbers

Pattern: \d{3}-\d{2}-\d{4}

Example:

// ❌ Bad: SSN in code
const testSSN = "123-45-6789";

// ✅ Good: Use mock data service
const testSSN = mockDataService.generateFakeSSN();

Severity: Critical (100 points)
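
The bare `\d{3}-\d{2}-\d{4}` shape also matches many strings that can never be real SSNs. One possible refinement, shown here as an assumption rather than as KafkaCode's behavior, is to reject number ranges the Social Security Administration does not issue (area 000, 666, or 900-999; group 00; serial 0000).

```python
import re

# Sketch: same shape as the pattern above, with lookaheads that exclude
# number ranges that are never issued.
SSN_RE = re.compile(
    r"\b(?!000|666|9\d{2})\d{3}"  # area: not 000, 666, or 900-999
    r"-(?!00)\d{2}"               # group: not 00
    r"-(?!0000)\d{4}\b"           # serial: not 0000
)


def find_ssns(source: str) -> list[str]:
    """Return substrings that have the shape of an issuable SSN."""
    return SSN_RE.findall(source)


print(find_ssns('const testSSN = "123-45-6789";'))
print(find_ssns('const placeholder = "000-12-3456";'))
```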

Credit Card Numbers

Pattern: Luhn algorithm validated sequences

Example:

# ❌ Bad: Test credit card in code
test_card = "4111111111111111"

# ✅ Good: Use test mode tokens
test_card = "tok_visa"  # Stripe's documented test-mode token for a Visa card

Severity: Critical (100 points)
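
For readers unfamiliar with the Luhn check mentioned above, here is a minimal sketch. The candidate regex (13 to 19 consecutive digits, no separators) is an assumption for the example; the checksum itself is the standard Luhn algorithm.

```python
import re

# Assumption: candidates are runs of 13 to 19 digits without spaces or dashes.
CARD_CANDIDATE_RE = re.compile(r"\b\d{13,19}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    checksum = 0
    # Walk the digits right to left, doubling every second one.
    for index, ch in enumerate(reversed(number)):
        digit = int(ch)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        checksum += digit
    return checksum % 10 == 0


def find_card_numbers(source: str) -> list[str]:
    """Return digit runs that look like card numbers and pass the Luhn check."""
    return [c for c in CARD_CANDIDATE_RE.findall(source) if luhn_valid(c)]


print(find_card_numbers('test_card = "4111111111111111"'))
print(find_card_numbers('order_id = "4111111111111112"'))
```

The checksum filters out most arbitrary digit runs: only about one in ten random sequences of the right length passes.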

Low Level PII

IP Addresses

Pattern: \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b

Example:

// ⚠️ Review: IP address in code
const serverIP = "192.168.1.100";

// ✅ Better: Use DNS names
const serverHost = "api.company.com";

Severity: Low (1 point)
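
Taken on its own, the pattern above also accepts strings such as `999.12.400.1`, which are not valid IPv4 addresses. The sketch below pairs the regex with Python's standard `ipaddress` module as a second-stage filter; this two-step approach is an assumption, not a description of KafkaCode's internals.

```python
import ipaddress
import re

# Pattern quoted in this section.
IPV4_CANDIDATE_RE = re.compile(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b")


def find_ipv4_addresses(source: str) -> list[str]:
    """Return regex candidates that also parse as valid IPv4 addresses."""
    hits = []
    for candidate in IPV4_CANDIDATE_RE.findall(source):
        try:
            ipaddress.IPv4Address(candidate)
        except ipaddress.AddressValueError:
            continue  # e.g. an octet above 255
        hits.append(candidate)
    return hits


print(find_ipv4_addresses('const serverIP = "192.168.1.100";'))
print(find_ipv4_addresses('const build = "999.12.400.1";'))
```

Four-part version strings with small numbers (for example `1.2.3.4`) still pass this filter, which may be one reason the finding carries only Low severity.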

URLs with Sensitive Paths

Pattern: URLs containing /api/, /admin/, /secret/

Example:

# ⚠️ Review: Sensitive URL in code
admin_url = "https://api.company.com/admin/users"

# ✅ Better: Use route constants
admin_url = f"{BASE_URL}/{ADMIN_ROUTES.users}"

Severity: Low (1 point)
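
A check for the three path fragments listed above could look roughly like this. The URL regex and helper name are assumptions; the sketch inspects only the path component, so a host name such as `api.company.com` does not trigger a hit by itself.

```python
import re
from urllib.parse import urlparse

# Assumption: URLs are found with a simple http(s) regex that stops at
# whitespace, quotes, or angle brackets.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")
SENSITIVE_FRAGMENTS = ("/api/", "/admin/", "/secret/")  # from this section


def find_sensitive_urls(source: str) -> list[str]:
    """Return URLs whose path contains one of the sensitive fragments."""
    hits = []
    for url in URL_RE.findall(source):
        path = urlparse(url).path
        if any(fragment in path for fragment in SENSITIVE_FRAGMENTS):
            hits.append(url)
    return hits


print(find_sensitive_urls('admin_url = "https://api.company.com/admin/users"'))
print(find_sensitive_urls('docs_url = "https://api.company.com/docs/intro"'))
```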

High Entropy Strings

KafkaCode detects strings with high randomness that might be secrets:

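// Calculate Shannon entropy in bits per character
function calculateEntropy(str) {
  // Spread into code points so the total matches what for...of counts
  const chars = [...str];

  // Count how often each character occurs
  const freq = {};
  for (const c of chars) {
    freq[c] = (freq[c] || 0) + 1;
  }

  // Sum -p * log2(p) over the distinct characters
  let entropy = 0;
  for (const c in freq) {
    const p = freq[c] / chars.length;
    entropy -= p * Math.log2(p);
  }

  return entropy;
}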

Thresholds:

Entropy > 4.5 and length > 16: Potential secret
Entropy > 5.0 and length > 24: Likely secret

Example:

# High entropy string detected (30 distinct characters)
secret = "x7K9mP2nQ8vL4wR6tY3zA1bC5dE0fG"  # Entropy: ≈ 4.9

# Lower entropy, likely not a secret
message = "hello world welcome back"  # Entropy: ≈ 3.5
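
To experiment with the thresholds, the same Shannon entropy calculation can be written in Python and combined with the two rules listed under Thresholds. The `classify` function and its return labels are hypothetical names chosen for this sketch.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def classify(text: str) -> str:
    """Apply the two thresholds from this section (labels are illustrative)."""
    entropy = shannon_entropy(text)
    if entropy > 5.0 and len(text) > 24:
        return "likely secret"
    if entropy > 4.5 and len(text) > 16:
        return "potential secret"
    return "not flagged"


for sample in ("x7K9mP2nQ8vL4wR6tY3zA1bC5dE0fG", "hello world welcome back"):
    print(f"{sample!r}: {shannon_entropy(sample):.2f} bits -> {classify(sample)}")
```

Because the entropy of a string can never exceed log2 of its length, a string needs at least 33 characters before it can cross the 5.0 threshold, so the first rule only fires on fairly long values.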

Sensitive Keywords

Detection of sensitive data based on variable naming:

Keywords are grouped into three tiers: Critical, High, and Medium. Critical keywords:

// These trigger CRITICAL alerts
const password = "secret123";
const privateKey = "...";
const secret = "...";
const token = "...";
const credential = "...";
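
A name-based check of this kind might be sketched as follows. The keyword tuple mirrors the variable names in the example above; the assignment regex and function name are assumptions, and real tooling would more likely work on a parsed syntax tree than on raw text.

```python
import re

# Keywords taken from the example above (lowercased for comparison).
CRITICAL_KEYWORDS = ("password", "privatekey", "private_key", "secret", "token", "credential")

# Hypothetical pattern: identifier, then = or :, then a quoted literal.
ASSIGNMENT_RE = re.compile(
    r"(?P<name>[A-Za-z_][A-Za-z0-9_]*)\s*[:=]\s*(?P<quote>[\"'`])(?P<value>.*?)(?P=quote)"
)


def find_sensitive_assignments(source: str) -> list[str]:
    """Return names of variables that contain a critical keyword and are
    assigned a string literal."""
    hits = []
    for match in ASSIGNMENT_RE.finditer(source):
        name = match.group("name")
        if any(keyword in name.lower() for keyword in CRITICAL_KEYWORDS):
            hits.append(name)
    return hits


print(find_sensitive_assignments('const password = "secret123";'))
print(find_sensitive_assignments("const token = await getSecureToken();"))
```

Only literal assignments are reported, so the second line, which calls a function, produces no hit. Substring matching is deliberately crude here: a name like `tokenizer` would also be flagged.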

Context-Aware Detection

The AI analyzer understands code context:

Example 1: Configuration vs Hardcoded

// ✅ Good: Configuration object
const config = {
  email: process.env.ADMIN_EMAIL,
  apiKey: process.env.API_KEY
};

// ❌ Bad: Hardcoded values
const config = {
  email: "[email protected]",
  apiKey: "1234567890abcdef"
};

The AI recognizes that hardcoded values are problematic while env vars are acceptable.

Example 2: Test Data vs Real Data

# ✅ Good: Clearly marked test data
TEST_EMAIL = "[email protected]"

# ❌ Bad: Looks like real data
admin_email = "[email protected]"

The AI understands context and reduces false positives for test data.

Example 3: Public vs Private

// ✅ Public info is okay
const publicKey = "-----BEGIN PUBLIC KEY-----...";

// ❌ Private key is critical
const privateKey = "-----BEGIN PRIVATE KEY-----...";

Compliance-Specific Detection

GDPR Compliance

Personal Data

Name, email, phone
IP addresses
Location data
Cookies with PII

Special Categories

Health data
Biometric data
Genetic data
Religious/political views

CCPA Compliance

Personal Information

Contact information
Financial information
Purchase history
Browsing history

Identifiers

Device IDs
IP addresses
Cookie IDs
Account usernames

False Positive Reduction

KafkaCode uses several techniques to reduce false positives:

Context Analysis

AI understands if a value is a placeholder, test data, or real credential

Assignment Context

Only flags sensitive keywords when they’re being assigned values

Environment Variable Detection

Recognizes when values come from env vars or config files

Comment Analysis

Understands # TODO or # FIXME comments that mention sensitive data
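
As one illustration of the environment-variable technique, a scanner could inspect the right-hand side of an assignment and skip it when the value is looked up at runtime. The list of lookup idioms below is an assumption drawn from the "Good" examples on this page, not an exhaustive or official list.

```python
import re

# Hypothetical set of idioms that indicate the value is loaded at runtime.
EXTERNAL_LOOKUP_RE = re.compile(
    r"^\s*(?:await\s+)?(?:"
    r"os\.getenv\(|os\.environ\[|process\.env\.|config\.get\("
    r")"
)


def is_external_lookup(rhs: str) -> bool:
    """Return True if the right-hand side reads from env vars or config."""
    return EXTERNAL_LOOKUP_RE.match(rhs) is not None


for rhs in ("os.getenv('JWT_SECRET')", "process.env.STRIPE_SECRET_KEY", '"mysecretkey123"'):
    print(f"{rhs}: {'skip' if is_external_lookup(rhs) else 'inspect further'}")
```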

Best Practices

Do's

✅ Use environment variables for all secrets
✅ Store credentials in secure vaults (AWS Secrets Manager, etc.)
✅ Keep .env files out of version control by listing them in .gitignore
✅ Rotate secrets regularly
✅ Use different secrets for dev/staging/prod

Don'ts

❌ Never commit secrets to version control
❌ Don’t hardcode API keys or passwords
❌ Don’t store PII unnecessarily
❌ Don’t log sensitive information
❌ Don’t share secrets in plain text

Next Steps

Privacy Grading

Understand how grades are calculated

Interpreting Results

Learn to read scan reports

Custom Patterns

Add your own detection rules

Examples

See real-world examples
