
Donna - Autonomous Invoice Fraud Detection & Verification

Donna is an AI-powered system that automatically monitors Gmail inboxes for invoice and billing emails, detects potential fraud, and verifies suspicious invoices by making intelligent phone calls to the companies that issued them.

🎯 What Donna Does

Donna protects individuals and businesses from invoice fraud by:

  1. Monitoring Gmail - Automatically scans incoming emails for invoices, bills, and receipts
  2. Fraud Detection - Uses AI to analyze domain legitimacy, company information, and billing patterns
  3. Online Verification - Searches Google to verify company details (phone, address, website)
  4. Intelligent Calling - Makes automated phone calls via ElevenLabs AI to verify suspicious invoices
  5. Comprehensive Logging - Records all decisions and verification attempts for audit trails

🚀 Key Features

Email Processing

  • Gmail Integration - OAuth-based access to the user's Gmail inbox
  • Intelligent Filtering - Identifies invoice, bill, and receipt emails using AI classification
  • Attachment Parsing - Extracts data from PDF invoices and attachments
  • Real-time Monitoring - Gmail push notifications via Pub/Sub for instant processing

Fraud Detection

  • Domain Analysis - Checks for suspicious domains, typosquatting, and homograph attacks
  • Company Verification - Validates senders against a whitelisted company database
  • Google Search Integration - Finds and verifies company information online
  • Confidence Scoring - Assigns confidence levels to verification results

Automated Verification

  • AI Voice Agent - ElevenLabs conversational AI makes verification calls
  • Dynamic Context - Injects user and invoice details into call scripts
  • Twilio Integration - Reliable phone call delivery and recording
  • Call Transcripts - Maintains records of all verification conversations

Dashboard & Monitoring

  • Next.js Web App - Modern React-based user interface
  • Real-time Updates - Live fraud detection results
  • Audit Logs - Complete history of all verification decisions
  • Company Profiles - Visual display of verified billers with logos

🏗️ Architecture

┌─────────────────────────────────────────────────────────────┐
│                         User's Gmail                         │
│                    (Invoices & Bills)                        │
└────────────────────────────┬────────────────────────────────┘
                             │
                 ┌───────────▼───────────┐
                 │      Gmail Watch      │
                 │ (Push Notifications)  │
                 └───────────┬───────────┘
                             │
┌────────────────────────────▼────────────────────────────────┐
│                     FastAPI Backend                          │
│  ┌──────────────────────────────────────────────────────┐  │
│  │  Email Processing                                     │  │
│  │  - Invoice Extraction (Gemini AI)                    │  │
│  │  - Attachment Parsing (PDF, images)                  │  │
│  │  - Biller Profile Extraction                         │  │
│  └──────────────────────────────────────────────────────┘  │
│  ┌──────────────────────────────────────────────────────┐  │
│  │  Fraud Detection Engine                              │  │
│  │  - Domain Legitimacy Checker                         │  │
│  │  - Company Database Verification                     │  │
│  │  - Google Search Integration                         │  │
│  │  - ML-based Email Classification                     │  │
│  └──────────────────────────────────────────────────────┘  │
│  ┌──────────────────────────────────────────────────────┐  │
│  │  Verification Agent                                   │  │
│  │  - ElevenLabs AI Agent                               │  │
│  │  - Twilio Call Orchestration                         │  │
│  │  - Dynamic Variable Injection                        │  │
│  └──────────────────────────────────────────────────────┘  │
└────────────────────────────┬────────────────────────────────┘
                             │
┌────────────────────────────▼────────────────────────────────┐
│                      Supabase Database                       │
│  - User Profiles                                            │
│  - Company Whitelist                                        │
│  - Fraud Detection Logs                                     │
│  - OAuth Tokens                                             │
│  - Gmail Watch Subscriptions                                │
└─────────────────────────────────────────────────────────────┘
                             │
┌────────────────────────────▼────────────────────────────────┐
│                       Next.js Frontend                       │
│  - User Dashboard                                           │
│  - Company Profiles View                                    │
│  - Fraud Alert Monitoring                                   │
│  - OAuth Authentication                                     │
└─────────────────────────────────────────────────────────────┘

🛠️ Technology Stack

Backend (FastAPI)

  • FastAPI - Modern Python web framework
  • Pydantic - Data validation and settings management
  • Supabase - PostgreSQL database and authentication
  • Google APIs - Gmail, Google Custom Search, Gemini AI
  • ElevenLabs - Conversational AI for phone calls
  • Twilio - Phone call infrastructure
  • scikit-learn - Machine learning for email classification

Frontend (Next.js)

  • Next.js 15 - React framework with App Router
  • TypeScript - Type-safe development
  • Tailwind CSS - Utility-first styling
  • Radix UI - Accessible component primitives
  • Supabase SSR - Server-side rendering with Supabase
  • Recharts - Data visualization

Infrastructure

  • Supabase - Database, Auth, and Real-time subscriptions
  • Google Cloud - Gmail API, Pub/Sub, Search API
  • ElevenLabs - AI voice agent platform
  • Twilio - Telephony infrastructure

📋 Prerequisites

  • Python 3.10+
  • Node.js 18+
  • Supabase account
  • Google Cloud Platform account (with Gmail API and Custom Search enabled)
  • ElevenLabs account (for AI calling)
  • Twilio account (for phone infrastructure)

🔧 Installation & Setup

1. Clone the Repository

git clone https://github.com/DeepExtrema/Donna.git
cd Donna

2. Backend Setup

cd api
pip install -r requirements.txt

Create a .env file in the api directory:

# Supabase
SUPABASE_URL=your_supabase_url
SUPABASE_SERVICE_KEY=your_supabase_service_key

# API Authentication
API_TOKEN=your_api_token

# Google OAuth
GOOGLE_CLIENT_ID=your_google_client_id
GOOGLE_CLIENT_SECRET=your_google_client_secret

# Google Custom Search
GOOGLE_SEARCH_API_KEY=your_google_search_api_key
GOOGLE_SEARCH_ENGINE_ID=your_search_engine_id

# Gemini AI
GEMINI_API_KEY=your_gemini_api_key

# ElevenLabs
ELEVENLABS_API_KEY=your_elevenlabs_api_key

# Twilio
TWILIO_ACCOUNT_SID=your_twilio_account_sid
TWILIO_AUTH_TOKEN=your_twilio_auth_token
TWILIO_PHONE_NUMBER=your_twilio_phone_number

Optional environment variables (read by specific services, which fall back to the defaults shown below):

# ElevenLabs Agent Configuration (used by conversational router)
ELEVENLABS_AGENT_ID=agent_2601k6rm4bjae2z9amfm5w1y6aps  # Default agent ID
ELEVENLABS_PHONE_NUMBER_ID=phnum_4801k6sa89eqfpnsfjsxbr40phen  # Default phone ID

3. Frontend Setup

# From the repository root (use cd ../webapp if you are still in api)
cd webapp
npm install

Create a .env.local file in the webapp directory:

NEXT_PUBLIC_SUPABASE_URL=your_supabase_url
NEXT_PUBLIC_SUPABASE_ANON_KEY=your_supabase_anon_key
NEXT_PUBLIC_API_URL=http://localhost:8000

4. Database Setup

Run the Supabase migrations that create the following tables:

  • profiles - User profiles with company information
  • companies - Whitelisted company database
  • email_fraud_logs - Fraud detection audit logs
  • gmail_watch_subscriptions - Gmail push notification subscriptions

🚀 Running the Application

Start Backend

cd api
uvicorn main:app --reload --host 0.0.0.0 --port 8000

The API will be available at http://localhost:8000

API Documentation: http://localhost:8000/docs

Start Frontend (in a second terminal, from the repository root)

cd webapp
npm run dev

The web app will be available at http://localhost:3000

📚 API Endpoints

Health & Authentication

  • GET /health - Health check
  • GET / - Root endpoint

OAuth & Gmail

  • POST /oauth/store - Store OAuth tokens for a user
  • POST /oauth/webhook/supabase - Supabase OAuth webhook handler
  • POST /emails/fetch - Fetch user's invoice emails
  • POST /gmail/watch/setup - Set up Gmail push notifications for a user
  • POST /pubsub/gmail/push - Gmail push notification webhook from Google Pub/Sub

Fraud Detection

  • POST /fraud/analyze - Analyze a single email for fraud
  • POST /fraud/analyze-batch - Batch analyze multiple emails
  • POST /fraud/verify-online - Verify company online via Google Search
  • POST /fraud/analyze-domain - Analyze domain legitimacy

Phone Verification

  • POST /call/conversational - Initiate AI verification call
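Protected endpoints require the API_TOKEN. The exact header scheme and request schema live in app/auth/authentication.py and app/models/schemas.py, which are not shown here, so the sketch below assumes a Bearer token and a plain JSON email body. `build_analyze_request` and the body fields are hypothetical; only the `/fraud/analyze` path and the default `http://localhost:8000` base URL come from this README.

```python
import json
import urllib.request

API_URL = "http://localhost:8000"  # default from "Running the Application"


def build_analyze_request(email: dict, api_token: str,
                          base_url: str = API_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /fraud/analyze request.

    Assumption: the token travels as "Authorization: Bearer <token>".
    """
    return urllib.request.Request(
        url=f"{base_url}/fraud/analyze",
        data=json.dumps(email).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To actually send it against a running backend: `with urllib.request.urlopen(req) as resp: result = json.load(resp)`.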

🎯 How It Works

1. Email Monitoring

# User authenticates with Gmail OAuth
# Backend subscribes to Gmail push notifications
# New invoice emails trigger instant processing
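When Gmail detects a change, Google Pub/Sub POSTs to /pubsub/gmail/push with a JSON body whose `message.data` field is base64-encoded; for Gmail watch notifications that payload is a small JSON object carrying `emailAddress` and `historyId`. The sketch below decodes it. `parse_gmail_push` is a hypothetical helper, not necessarily how app/routers/pubsub.py is written.

```python
import base64
import json


def parse_gmail_push(body: dict) -> tuple[str, int]:
    """Extract (emailAddress, historyId) from a Pub/Sub push request body.

    Pub/Sub wraps the message as {"message": {"data": <base64>, ...}, "subscription": ...}.
    """
    raw = base64.b64decode(body["message"]["data"])
    notification = json.loads(raw)
    # historyId may arrive as a string; normalise it to int for comparisons.
    return notification["emailAddress"], int(notification["historyId"])
```

The notification itself does not contain the new message, so a handler would typically follow up with Gmail's `users.history.list`, starting from the last stored history ID, to find which emails arrived.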

2. Fraud Detection Pipeline

Incoming Email
    ↓
AI Classification (Bill/Receipt/Other)
    ↓
Domain Legitimacy Check
    ↓
Company Database Verification
    ↓
[Not Found] → Google Search
    ↓
[Phone Found + Medium Confidence] → AI Phone Call
    ↓
Decision: LEGIT / FRAUD / CALL / PENDING
    ↓
Log to Database + Notify User
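To make the branching concrete, here is a hypothetical `decide` function that walks the checks in the order above and applies the thresholds from the Confidence Scoring section. The boolean inputs are assumptions about what each earlier stage reports; the project's real fraud router may combine signals differently.

```python
from enum import Enum


class Decision(str, Enum):
    LEGIT = "legit"
    FRAUD = "fraud"
    CALL = "call"
    PENDING = "pending"


def decide(domain_suspicious: bool, in_company_db: bool,
           phone_found: bool, online_confidence: float) -> Decision:
    """Route one email through the pipeline checks (hypothetical helper)."""
    if domain_suspicious:
        return Decision.FRAUD    # failed the domain legitimacy check
    if in_company_db:
        return Decision.LEGIT    # whitelisted company
    if online_confidence >= 0.8:
        return Decision.LEGIT    # high-confidence Google Search match
    if phone_found and online_confidence >= 0.5:
        return Decision.CALL     # medium confidence: verify by phone
    return Decision.PENDING      # not enough data: human review
```

Checking the domain first means a typosquatted sender is flagged even if its display name matches a whitelisted company.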

3. AI Verification Call

When a suspicious invoice is detected:

  1. Google Search finds the company's phone number
  2. ElevenLabs Agent is configured with:
    • Company name and contact info
    • User's details (from profiles table)
    • Invoice information (amount, date, etc.)
  3. Call is initiated via Twilio
  4. Conversation is recorded and transcribed
  5. Result is logged for audit
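Step 2 amounts to passing per-call dynamic variables to the agent. The sketch below only assembles a request body; the field layout (`agent_id`, `agent_phone_number_id`, `to_number`, `conversation_initiation_client_data.dynamic_variables`) reflects our understanding of ElevenLabs' Twilio outbound-call endpoint and should be checked against the current ElevenLabs docs. The variable names inside `dynamic_variables` and the helper name are hypothetical; they must match placeholders in your agent's prompt.

```python
def build_outbound_call_payload(agent_id: str, phone_number_id: str, to_number: str,
                                user: dict, company: dict, invoice: dict) -> dict:
    """Assemble an outbound-call body with per-call context (hypothetical helper)."""
    return {
        "agent_id": agent_id,
        "agent_phone_number_id": phone_number_id,
        "to_number": to_number,  # the number found via Google Search, E.164 format
        "conversation_initiation_client_data": {
            "dynamic_variables": {
                # User details (from the profiles table)
                "user_name": user["full_name"],
                "user_company": user.get("company_name", ""),
                # Company and invoice details (from the email)
                "company_name": company["name"],
                "invoice_number": invoice["number"],
                "invoice_amount": f"{invoice['amount']:.2f}",
                "invoice_date": invoice["date"],
                "sender_email": invoice["sender_email"],
            }
        },
    }


payload = build_outbound_call_payload(
    "agent_123", "phnum_456", "+15555550100",
    user={"full_name": "John Smith", "company_name": "Acme Corp"},
    company={"name": "Example Billing Co"},
    invoice={"number": "12345", "amount": 150.5, "date": "October 5",
             "sender_email": "billing@example.com"},
)
```

All IDs, numbers, and addresses above are placeholders. The body would then be POSTed with the `xi-api-key` header set to ELEVENLABS_API_KEY.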

Example Call Script

Donna: "Hi, this is Donna calling on behalf of John Smith from Acme Corp.
        I'm helping them verify an invoice email they received from your
        company at [sender's email address]. Is this the right department?"

Agent: "Yes, this is billing."

Donna: "Great! John received invoice #12345 for $150.50 dated October 5th. 
        Can you confirm this invoice was sent by your company?"

[Verification continues...]

🔐 Security & Privacy

Data Protection

  • OAuth 2.0 - Secure Gmail access with user consent
  • Token Encryption - Refresh tokens stored securely in Supabase
  • PII Minimization - Only necessary data is stored
  • Audit Logging - Complete trail of all verification activities

Compliance

  • GDPR Compliant - User data handling and retention policies
  • Call Recording Consent - Disclosure at start of every call
  • Data Retention - Configurable retention periods
  • No Payment Data - No credit card or banking information stored

API Security

  • API Token Authentication - Required for all protected endpoints
  • CORS Protection - Restricted origins
  • Rate Limiting - Protection against abuse

📊 Fraud Detection Logic

Verification Status Types

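| Status  | Meaning                                                      | Action                     |
| ------- | ------------------------------------------------------------ | -------------------------- |
| legit   | Company verified in database or high-confidence online match | ✅ Safe to pay             |
| fraud   | Suspicious domain or failed verification                     | ⛔ Block payment           |
| call    | Phone verification initiated                                 | 📞 Waiting for call result |
| pending | Insufficient data for decision                               | ⏳ Human review needed     |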

Confidence Scoring

  • ≥ 0.8 - High confidence (phone + address + email match)
  • ≥ 0.5 and < 0.8 - Medium confidence (phone found, triggers call)
  • < 0.5 - Low confidence (insufficient data, marked pending)
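The bands above do not say how a score is computed. One way to get behaviour consistent with them is a weighted sum over matched fields. The weights below are purely hypothetical, chosen so that a phone match alone lands in the medium band and only phone + address + email reaches the high band; powers of two keep the float sums exact at the boundaries.

```python
# Hypothetical weights, not the project's values.
WEIGHTS = {"phone": 0.5, "address": 0.25, "email": 0.125}


def score_match(phone: bool, address: bool, email: bool) -> float:
    """Sum the weights of the fields that matched the online search result."""
    return (float(phone) * WEIGHTS["phone"]
            + float(address) * WEIGHTS["address"]
            + float(email) * WEIGHTS["email"])


def confidence_band(score: float) -> str:
    """Map a score onto the bands listed above."""
    if score >= 0.8:
        return "high"
    if score >= 0.5:
        return "medium"
    return "low"
```

With these weights, no combination without a phone match exceeds 0.375, so "medium" always implies a phone number is available to call.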

🧪 Testing

Test Fraud Detection

cd api
python test_fraud_pipeline.py

Test ElevenLabs Integration

python test_integration.py "Shopify"

Test Real Phone Call

python test_real_call.py

Test Company Verification

python test_company_verification.py

📁 Project Structure

Donna/
├── api/                          # FastAPI Backend
│   ├── app/
│   │   ├── routers/             # API routes
│   │   │   ├── emails.py        # Email fetching endpoints
│   │   │   ├── fraud.py         # Fraud detection endpoints
│   │   │   ├── oauth.py         # OAuth handlers
│   │   │   ├── gmail_watch.py   # Gmail push subscriptions
│   │   │   └── pubsub.py        # Pub/Sub webhooks
│   │   ├── services/            # Business logic
│   │   │   ├── gmail_service.py
│   │   │   ├── invoice_extractor.py
│   │   │   ├── eleven_agent.py  # ElevenLabs AI calling
│   │   │   ├── google_search_service.py
│   │   │   ├── fraud_logger.py
│   │   │   └── biller_extraction.py
│   │   ├── database/            # Database clients
│   │   │   ├── supabase_client.py
│   │   │   ├── companies.py
│   │   │   └── gmail_watch.py
│   │   ├── auth/                # Authentication
│   │   │   └── authentication.py
│   │   ├── models/              # Pydantic models
│   │   │   └── schemas.py
│   │   └── config.py            # Configuration
│   ├── ml/                      # Machine learning
│   │   ├── email_classifier.py  # Email type classification
│   │   └── domain_checker.py    # Domain legitimacy
│   ├── main.py                  # FastAPI app entry point
│   ├── requirements.txt         # Python dependencies
│   └── test_*.py               # Test scripts
├── webapp/                       # Next.js Frontend
│   ├── src/
│   │   ├── app/
│   │   │   ├── dashboard/       # Main dashboard
│   │   │   ├── api/             # API routes
│   │   │   └── utils/           # Utilities
│   │   ├── components/          # React components
│   │   │   └── ui/              # UI primitives
│   │   └── lib/                 # Libraries
│   ├── package.json
│   └── next.config.ts
└── README.md                     # This file

🔍 Key Components

Email Classifier (ml/email_classifier.py)

Uses scikit-learn to classify emails as:

  • Invoice/Bill
  • Receipt
  • Other
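The actual model in ml/email_classifier.py is not shown here. As an illustration of a common scikit-learn approach, the sketch below chains a TF-IDF vectorizer with multinomial Naive Bayes; the training sentences and label strings are toy placeholders, and a real classifier needs a properly labeled corpus.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data for illustration only.
train_texts = [
    "Invoice #1042, amount due $250.00 by March 1",
    "Your bill is ready, payment due by the 15th",
    "Receipt for your payment of $19.99, thank you for your order",
    "Payment received, here is your receipt",
    "Team lunch on Friday, see you there",
    "Weekly newsletter: product updates and tips",
]
train_labels = ["invoice", "invoice", "receipt", "receipt", "other", "other"]

# TF-IDF turns text into weighted word counts; Naive Bayes scores each class.
classifier = make_pipeline(TfidfVectorizer(lowercase=True), MultinomialNB())
classifier.fit(train_texts, train_labels)


def classify(subject_and_body: str) -> str:
    """Return one of 'invoice', 'receipt', or 'other'."""
    return str(classifier.predict([subject_and_body])[0])
```

Usage: `classify("Invoice #77: amount due $40.00")`. On a corpus this small, predictions are not reliable; the point is the pipeline shape.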

Domain Checker (ml/domain_checker.py)

Domain analysis covering:

  • Typosquatting detection
  • Homograph attack detection
  • Domain reputation checking
  • Company database matching

ElevenLabs Agent (app/services/eleven_agent.py)

Manages AI verification calls with:

  • Dynamic variable injection
  • User context from profiles
  • Invoice details from emails
  • Call recording and transcription

Google Search Service (app/services/google_search_service.py)

Searches for company information:

  • Phone numbers
  • Addresses
  • Email addresses
  • Website URLs
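The Google Custom Search JSON API is queried at `https://www.googleapis.com/customsearch/v1` with `key`, `cx`, and `q` parameters, and returns results under `items`, each with a `snippet`. The sketch below builds such a URL from GOOGLE_SEARCH_API_KEY and GOOGLE_SEARCH_ENGINE_ID and pulls phone-like strings out of snippets. The query wording, the North-American-only regex, and the helper names are assumptions, not the project's google_search_service.py.

```python
import re
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://www.googleapis.com/customsearch/v1"

# Loose North American pattern, e.g. (888) 555-0100 or 1-888-555-0100 (an assumption).
PHONE_RE = re.compile(r"(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")


def build_search_url(api_key: str, engine_id: str, company: str) -> str:
    query = f"{company} billing phone number"
    return f"{SEARCH_ENDPOINT}?{urlencode({'key': api_key, 'cx': engine_id, 'q': query})}"


def extract_phones(search_response: dict) -> list[str]:
    """Collect unique phone-like strings from result snippets, in order of appearance."""
    seen: list[str] = []
    for item in search_response.get("items", []):
        for match in PHONE_RE.findall(item.get("snippet", "")):
            if match not in seen:
                seen.append(match)
    return seen
```

The URL can be fetched with `urllib.request.urlopen` and the body parsed with `json.load` before calling `extract_phones`. Numbers found this way would still feed into confidence scoring rather than being trusted outright.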

🛣️ Roadmap

Near-term

  • Inbound verification (vendor calls back on verified number)
  • Call result parsing and analysis
  • Multi-language support for calls
  • Enhanced ML models for fraud detection
  • Webhook for call completion notifications

Medium-term

  • Risk-based routing (more checks for first-time vendors)
  • Admin UI for policy tuning
  • Call scheduling (business hours only)
  • Voice biometrics (privacy-vetted)
  • International phone number support

Long-term

  • Integration with payment systems
  • Automated payment approval/rejection
  • Vendor identity graph
  • Historical risk modeling
  • Mobile app

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

This project is proprietary software. All rights reserved.

📞 Support

For issues or questions:

  1. Check the documentation in /api/INTEGRATION_GUIDE.md
  2. Review test scripts in /api/test_*.py
  3. Open an issue on GitHub

🙏 Acknowledgments

  • ElevenLabs - For powerful conversational AI
  • Google Cloud - For Gmail API and Search API
  • Supabase - For database and authentication infrastructure
  • Twilio - For reliable telephony infrastructure

Built with ❤️ to protect against invoice fraud
