
Overview

KafkaCode uses multiple detection methods to identify privacy issues, secrets, and compliance violations in your source code.

Detection Categories

Secrets Detection

API keys, tokens, credentials

PII Detection

Personally identifiable information

Compliance Checks

GDPR, CCPA requirements

Context Analysis

AI-powered semantic analysis

Secrets Detection

Critical Level Secrets

Pattern: AKIA[0-9A-Z]{16}
Example:
# ❌ Bad: Hardcoded AWS key
aws_access_key = "AKIAIOSFODNN7EXAMPLE"

# ✅ Good: Use environment variables
import os
aws_access_key = os.getenv('AWS_ACCESS_KEY_ID')
Severity: Critical (100 points)
Pattern: -----BEGIN (RSA |EC )?PRIVATE KEY-----
Example:
// ❌ Bad: Embedded private key
const privateKey = `-----BEGIN RSA PRIVATE KEY-----
MIIEpAIBAAKCAQEA...
-----END RSA PRIVATE KEY-----`;

// ✅ Good: Load from a secure file at runtime
const fs = require('fs');
const privateKey = fs.readFileSync('/secure/path/key.pem', 'utf8');
Severity: Critical (100 points)
Pattern: sk_live_[0-9a-zA-Z]{24}
Example:
// ❌ Bad: Hardcoded Stripe key
const stripe = require('stripe')('sk_live_51H...');

// ✅ Good: Use environment variable
const stripe = require('stripe')(process.env.STRIPE_SECRET_KEY);
Severity: Critical (100 points)
Pattern: Passwords or credentials in connection strings
Example:
# ❌ Bad: Hardcoded database password
DATABASE_URL = "postgresql://user:password123@localhost/db"

# ✅ Good: Use environment variables
import os
DATABASE_URL = os.getenv('DATABASE_URL')
Severity: Critical (100 points)
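The regex-based entries above can be illustrated with a minimal sketch. It assumes a simple line-by-line scan; the rule names and the scan_for_secrets function are hypothetical, and only the patterns, severities, and point values are taken from the entries above. How KafkaCode applies these patterns internally is not documented here.

```python
import re

# Patterns, severities, and points copied from the entries above.
# The rule names are hypothetical labels used only in this sketch.
SECRET_RULES = [
    ("aws_access_key", re.compile(r"AKIA[0-9A-Z]{16}"), "Critical", 100),
    ("private_key", re.compile(r"-----BEGIN (RSA |EC )?PRIVATE KEY-----"), "Critical", 100),
    ("stripe_live_key", re.compile(r"sk_live_[0-9a-zA-Z]{24}"), "Critical", 100),
]


def scan_for_secrets(source: str) -> list[dict]:
    """Return one finding per rule that matches a line (1-based line numbers)."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for name, pattern, severity, points in SECRET_RULES:
            if pattern.search(line):
                findings.append(
                    {"rule": name, "line": line_no, "severity": severity, "points": points}
                )
    return findings


sample = (
    'aws_access_key = "AKIAIOSFODNN7EXAMPLE"\n'
    'aws_access_key = os.getenv("AWS_ACCESS_KEY_ID")\n'
)
findings = scan_for_secrets(sample)
for finding in findings:
    print(finding)
```

Only the first line of the sample is reported, because the second reads the key from the environment and contains no literal that matches a pattern.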

High Level Secrets

Pattern: GitHub, GitLab, and other OAuth tokens
Example:
// ❌ Bad: Hardcoded OAuth token
const token = 'ghp_1234567890abcdef';

// ✅ Good: Use secure storage
const token = await getSecureToken();
Severity: High (50 points)
Pattern: jwt_secret, JWT_SECRET assignments
Example:
# ❌ Bad: Hardcoded JWT secret
JWT_SECRET = "mysecretkey123"

# ✅ Good: Use environment variable
import os
JWT_SECRET = os.getenv('JWT_SECRET')
Severity: High (50 points)
Pattern: Generic API key patterns
Example:
// ❌ Bad: Hardcoded API key
const apiKey = 'api_key_1234567890abcdef';

// ✅ Good: Load from config
const apiKey = config.get('API_KEY');
Severity: High (50 points)

PII Detection

Medium Level PII

Pattern: [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
Example:
// ❌ Bad: Hardcoded email
const adminEmail = "admin@example.com";

// ✅ Good: Use configuration
const adminEmail = config.get('ADMIN_EMAIL');
Severity: Medium (10 points)
GDPR Consideration: Email addresses are PII under GDPR
Pattern: Various international formats
Example:
# ❌ Bad: Hardcoded phone number
support_phone = "+1-555-123-4567"

# ✅ Good: Use configuration
support_phone = config.SUPPORT_PHONE
Severity: Medium (10 points)
CCPA Consideration: Phone numbers are personal information
Pattern: \d{3}-\d{2}-\d{4}
Example:
// ❌ Bad: SSN in code
const testSSN = "123-45-6789";

// ✅ Good: Use mock data service
const testSSN = mockDataService.generateFakeSSN();
Severity: Critical (100 points)
Pattern: Luhn algorithm validated sequences
Example:
# ❌ Bad: Test credit card in code
test_card = "4111111111111111"

# ✅ Good: Use a Stripe test-mode token instead of a raw card number
test_card = "tok_visa"
Severity: Critical (100 points)
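The credit card entry above relies on Luhn validation. The following is a minimal sketch of the standard Luhn checksum, not KafkaCode's own implementation; the function name luhn_valid and the 13-digit minimum length are assumptions made for illustration.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    # Assumption for this sketch: shorter digit runs are not treated as card numbers.
    if len(digits) < 13:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_valid("4111111111111111"))
print(luhn_valid("4111111111111112"))
```

The first call checks the test card number from the example above, which passes the checksum; changing its last digit makes the check fail.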

Low Level PII

Pattern: \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b
Example:
// ⚠️ Review: IP address in code
const serverIP = "192.168.1.100";

// ✅ Better: Use DNS names
const serverHost = "api.company.com";
Severity: Low (1 point)
Pattern: URLs containing /api/, /admin/, /secret/
Example:
# ⚠️ Review: Sensitive URL in code
admin_url = "https://api.company.com/admin/users"

# ✅ Better: Use route constants
admin_url = f"{BASE_URL}/{ADMIN_ROUTES.users}"
Severity: Low (1 point)
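The IP address pattern above also matches strings such as 999.300.1.1, which are not valid addresses. One possible refinement, sketched here as an assumption rather than documented KafkaCode behavior, is to confirm each regex candidate with Python's standard ipaddress module. The function name find_ipv4 is hypothetical.

```python
import ipaddress
import re

# Same pattern as the IP address entry above.
IP_CANDIDATE = re.compile(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b")


def find_ipv4(text: str) -> list[str]:
    """Return regex candidates that are also valid IPv4 addresses."""
    results = []
    for candidate in IP_CANDIDATE.findall(text):
        try:
            ipaddress.IPv4Address(candidate)
        except ValueError:
            # Out-of-range octets (for example 999) are rejected here.
            continue
        results.append(candidate)
    return results


print(find_ipv4('serverIP = "192.168.1.100"; build = "999.300.1.1"'))
```

Four-part version strings such as 1.2.3.4 still pass this check, which is consistent with the Low severity and the "Review" label used for this category.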

High Entropy Strings

KafkaCode detects strings with high randomness that might be secrets:
// Calculate Shannon entropy in bits per character
function calculateEntropy(str) {
  const freq = {};
  for (let c of str) {
    freq[c] = (freq[c] || 0) + 1;
  }

  let entropy = 0;
  for (let c in freq) {
    const p = freq[c] / str.length;
    entropy -= p * Math.log2(p);
  }

  return entropy;
}
Thresholds:
  • Entropy > 4.5 and length > 16: Potential secret
  • Entropy > 5.0 and length > 24: Likely secret
Example:
# High entropy string detected
secret = "x7K9mP2nQ8vL4wR6tY3zA1bC5dE0fG"  # Entropy: ~4.9

# Lower entropy, likely not a secret
message = "hello world welcome back"  # Entropy: ~3.5
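To show how the thresholds combine with the entropy calculation, here is a Python sketch of the same Shannon entropy function together with a classifier. The names shannon_entropy and classify and the returned labels are hypothetical; only the threshold values come from the list above.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    length = len(text)
    counts = Counter(text)
    return -sum((n / length) * math.log2(n / length) for n in counts.values())


def classify(text: str) -> str:
    """Apply the two thresholds listed above, strictest first."""
    entropy = shannon_entropy(text)
    if entropy > 5.0 and len(text) > 24:
        return "likely secret"
    if entropy > 4.5 and len(text) > 16:
        return "potential secret"
    return "not flagged"


for value in ("x7K9mP2nQ8vL4wR6tY3zA1bC5dE0fG", "hello world welcome back"):
    print(f"{classify(value)}: entropy={shannon_entropy(value):.2f}")
```

Because the entropy of a string cannot exceed log2 of the number of distinct characters it contains, a string needs at least 33 distinct characters to cross the 5.0 threshold.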

Sensitive Keywords

Detection of sensitive data based on variable naming:
  • Critical Keywords
  • High Keywords
  • Medium Keywords
// These trigger CRITICAL alerts
const password = "secret123";
const privateKey = "...";
const secret = "...";
const token = "...";
const credential = "...";
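Keyword-based detection of this kind can be sketched as a check on the variable name in an assignment whose right-hand side is a quoted literal. This is an illustrative assumption, not KafkaCode's actual matcher; the keyword list mirrors the example above, and the regex and the flag_keyword_assignments function are hypothetical.

```python
import re

# Keywords taken from the CRITICAL example above, lowercased for comparison.
CRITICAL_KEYWORDS = ("password", "privatekey", "secret", "token", "credential")

# A name, then "=", then an opening quote: only string-literal assignments match.
LITERAL_ASSIGNMENT = re.compile(r"\b(?P<name>[A-Za-z_]\w*)\s*=\s*['\"`]")


def flag_keyword_assignments(source: str) -> list[tuple[int, str]]:
    """Return (line number, variable name) for keyword-named literal assignments."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        match = LITERAL_ASSIGNMENT.search(line)
        if not match:
            continue
        # Normalize snake_case and camelCase names before comparing.
        normalized = match.group("name").lower().replace("_", "")
        if any(keyword in normalized for keyword in CRITICAL_KEYWORDS):
            hits.append((line_no, match.group("name")))
    return hits


sample = (
    'const password = "secret123";\n'
    'const apiHost = "example.com";\n'
    "const token = process.env.TOKEN;\n"
)
print(flag_keyword_assignments(sample))
```

Only the first line is flagged: the second has a non-sensitive name, and the third assigns from the environment rather than from a literal.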

Context-Aware Detection

The AI analyzer understands code context:

Example 1: Configuration vs Hardcoded

// ✅ Good: Configuration object
const config = {
  email: process.env.ADMIN_EMAIL,
  apiKey: process.env.API_KEY
};

// ❌ Bad: Hardcoded values
const config = {
  email: "admin@example.com",
  apiKey: "1234567890abcdef"
};
The AI recognizes that hardcoded values are problematic while env vars are acceptable.

Example 2: Test Data vs Real Data

# ✅ Good: Clearly marked test data
TEST_EMAIL = "test@example.com"

# ❌ Bad: Looks like real data
admin_email = "jane.smith@company.com"
The AI understands context and reduces false positives for test data.

Example 3: Public vs Private

// ✅ Public info is okay
const publicKey = "-----BEGIN PUBLIC KEY-----...";

// ❌ Private key is critical
const privateKey = "-----BEGIN PRIVATE KEY-----...";

Compliance-Specific Detection

GDPR Compliance

Personal Data

  • Name, email, phone
  • IP addresses
  • Location data
  • Cookies with PII

Special Categories

  • Health data
  • Biometric data
  • Genetic data
  • Religious/political views

CCPA Compliance

Personal Information

  • Contact information
  • Financial information
  • Purchase history
  • Browsing history

Identifiers

  • Device IDs
  • IP addresses
  • Cookie IDs
  • Account usernames

False Positive Reduction

KafkaCode uses several techniques to reduce false positives:
  1. Context Analysis: AI understands if a value is a placeholder, test data, or real credential
  2. Assignment Context: Only flags sensitive keywords when they’re being assigned values
  3. Environment Variable Detection: Recognizes when values come from env vars or config files
  4. Comment Analysis: Understands # TODO or # FIXME comments that mention sensitive data
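The assignment-context and environment-variable techniques can be approximated for Python sources with the standard ast module. This is a rough sketch under stated assumptions, not KafkaCode's analyzer; the keyword list and the hardcoded_sensitive_assignments function are hypothetical.

```python
import ast

# Hypothetical keyword list used only for this sketch.
SENSITIVE = ("password", "secret", "token", "credential", "key")


def hardcoded_sensitive_assignments(source: str) -> list[tuple[int, str]]:
    """Return (line number, name) for sensitive names assigned a string literal."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue
        # Only string literals count as hardcoded; calls such as
        # os.getenv(...) produce ast.Call nodes and are skipped.
        value = node.value
        if not (isinstance(value, ast.Constant) and isinstance(value.value, str)):
            continue
        for target in node.targets:
            if isinstance(target, ast.Name) and any(
                keyword in target.id.lower() for keyword in SENSITIVE
            ):
                findings.append((node.lineno, target.id))
    return findings


sample = (
    "import os\n"
    'JWT_SECRET = "mysecretkey123"\n'
    'DB_PASSWORD = os.getenv("DB_PASSWORD")\n'
    "# TODO: rotate the password\n"
)
print(hardcoded_sensitive_assignments(sample))
```

Parsing into a syntax tree means comments never reach the check, so the TODO line that mentions a password is not flagged, and the os.getenv call is recognized as a non-literal value.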

Best Practices

  • ✅ Use environment variables for all secrets
  • ✅ Store credentials in secure vaults (AWS Secrets Manager, etc.)
  • ✅ Use .env files with .gitignore
  • ✅ Rotate secrets regularly
  • ✅ Use different secrets for dev/staging/prod
  • ❌ Never commit secrets to version control
  • ❌ Don’t hardcode API keys or passwords
  • ❌ Don’t store PII unnecessarily
  • ❌ Don’t log sensitive information
  • ❌ Don’t share secrets in plain text
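As one way to apply the first practice, a small helper can read a secret from the environment and fail loudly when it is missing, so a misconfigured deployment does not silently run with an empty value. The helper name require_env and the variable name used in the demo are hypothetical.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Demo with a hypothetical variable name; real deployments would set this
# outside the code, for example via a .env file excluded by .gitignore.
os.environ["KAFKACODE_DEMO_SECRET"] = "demo-value"
print(require_env("KAFKACODE_DEMO_SECRET"))
```

Raising at startup keeps the failure close to its cause, which is usually easier to diagnose than an authentication error later in the request path.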
