Yxniy/Sentinel

SENTINEL: Distributed Financial Anomaly & Surveillance Engine

🎥 Live System Demo

Demo.mp4

📡 Mission Overview

Sentinel is a distributed data intelligence platform designed to ingest high-velocity financial streams, resolve semantic entities, and autonomously detect operational anomalies.

Unlike traditional rule-based monitoring, Sentinel uses unsupervised learning (Isolation Forests) and geospatial kinetic analysis to identify threats that fixed rules miss (e.g., "Impossible Travel," "Behavioral Drift") in real time. Detected anomalies trigger context-aware interventions via an Agentic Supervisor.

🏗 High-Level Architecture

The system follows a strict Medallion Architecture (Bronze -> Silver -> Gold) to ensure data lineage and auditability.

graph TD
    A[Ingest: Plaid Stream] -->|Raw JSON| B[(Bronze: Data Lake)]
    B -->|PySpark: Entity Resolution| C[(Silver: Structured Delta)]
    C -->|PySpark: Kinetic Velocity & ML| D[(Gold: Anomaly Features)]
    D -->|Trigger| E[Agent Supervisor]
    E <-->|RAG Query| F[(Vector Memory: FAISS)]
    E -->|Operational Output| G[Intervention Logic]

⚡ Core Capabilities

1. Entity Resolution (The "Ontology" Layer)

  • Challenge: Raw transaction logs are unstructured and noisy (e.g., SQ *JOES, UBR* PENDING).
  • Solution: Implemented a Semantic Vector Pipeline using OpenAI Embeddings (text-embedding-3-small) to map dirty inputs to a canonical ontology.
  • Result: Achieved 92% entity matching accuracy on zero-shot examples without regex hardcoding.
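
The matching step described above can be sketched as a nearest-neighbor lookup by cosine similarity. This is a minimal illustration, not the project's pipeline: `toy_embed` is a character-trigram stand-in for `text-embedding-3-small` so the example runs offline, and the `CANONICAL` list, the `min_score` cutoff, and the reading of `SQ *JOES` as "Joe's Coffee" are all assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical canonical ontology; the real one is not shown in this README.
CANONICAL = ["Joe's Coffee", "Uber", "Amazon", "Starbucks"]


def toy_embed(text: str) -> Counter:
    """Stand-in for an embedding model: counts of character trigrams."""
    cleaned = "".join(ch.lower() for ch in text if ch.isalnum())
    return Counter(cleaned[i:i + 3] for i in range(len(cleaned) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def resolve(raw: str, min_score: float = 0.2):
    """Return (canonical_name or None, score) for a raw transaction string."""
    query = toy_embed(raw)
    score, best = max((cosine(query, toy_embed(name)), name) for name in CANONICAL)
    # Below the (assumed) cutoff, report no match instead of a weak guess.
    return (best if score >= min_score else None), score


print(resolve("SQ *JOES"))
print(resolve("AMAZON MKTP US*2K4"))
```

A trigram stand-in cannot match abbreviations such as `UBR* PENDING` to "Uber" because the two strings share no trigrams. That is the kind of case a learned embedding is meant to cover.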

2. Kinetic Velocity Detection (The "Defense" Layer)

  • Challenge: Detecting physical security breaches in financial patterns.
  • Solution: Engineered a PySpark pipeline to calculate Haversine Distance between sequential transaction nodes.
  • Logic: Flags events where the implied velocity between consecutive transactions exceeds 800 km/h (Impossible Travel), which can indicate compromised credentials or physical threats.
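
The velocity check can be illustrated in plain Python. This is a sketch under stated assumptions rather than the PySpark job itself: the function names, the `(lat, lon, timestamp)` tuple layout, and the handling of zero time gaps are hypothetical. In Spark, the same arithmetic would typically be applied to each transaction and its predecessor within a per-user window.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius
MAX_SPEED_KMH = 800.0        # threshold quoted in this README


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    # Clamp to guard against floating-point values slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def is_impossible_travel(prev, curr):
    """prev and curr are (lat, lon, datetime) tuples for consecutive transactions."""
    distance = haversine_km(prev[0], prev[1], curr[0], curr[1])
    hours = (curr[2] - prev[2]).total_seconds() / 3600
    if hours <= 0:
        # Assumption: simultaneous events in different places count as impossible.
        return distance > 0
    return distance / hours > MAX_SPEED_KMH


new_york = (40.7128, -74.0060, datetime(2024, 1, 1, 12, 0))
london = (51.5074, -0.1278, datetime(2024, 1, 1, 13, 0))
print(is_impossible_travel(new_york, london))  # roughly 5,570 km in one hour
```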

3. Unsupervised Anomaly Detection

  • Challenge: Static thresholds (e.g., amount > $500) fail to capture behavioral nuances.
  • Solution: Deployed K-Means Clustering and Isolation Forests to model "Normal Pattern of Life."
  • Outcome: System autonomously flags deviations (e.g., 3 AM spending bursts) based on distance from cluster centroids.
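
The centroid-distance idea can be sketched with a small pure-Python K-Means. It covers only the clustering half of the approach; Isolation Forest scoring is not shown. The `(hour, amount)` features, the sample history, the deterministic initialization, and the mean-plus-three-standard-deviations cutoff are all assumptions for illustration, and real features would normally be scaled first.

```python
import math


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm with a deterministic start (evenly spaced sample points)."""
    centroids = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members; keep it if empty.
        centroids = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids


def distance_to_nearest(point, centroids):
    return min(math.dist(point, c) for c in centroids)


def fit_threshold(points, centroids, factor=3.0):
    """Assumed cutoff: mean + factor * std of training distances."""
    dists = [distance_to_nearest(p, centroids) for p in points]
    mean = sum(dists) / len(dists)
    std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
    return mean + factor * std


# Hypothetical history of (hour_of_day, amount_usd): lunches and evening meals.
history = [(12, 14), (12, 15), (13, 16), (12, 13), (13, 15), (11, 14),
           (19, 40), (20, 42), (19, 38), (18, 41), (20, 39), (19, 43)]
centroids = kmeans(history, k=2)
threshold = fit_threshold(history, centroids)


def is_anomalous(point):
    return distance_to_nearest(point, centroids) > threshold


print(is_anomalous((3, 250)))  # a 3 AM spending burst
print(is_anomalous((12, 15)))  # a typical lunch
```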

4. Agentic RAG Intervention

  • Challenge: Alerts that lack context contribute to alert fatigue.
  • Solution: A LangGraph Supervisor retrieves historical user context ("User vowed to avoid nightclubs") from a FAISS Vector Store before generating alerts.
  • Effect: Transforms raw data alerts into actionable, context-aware intelligence.
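
The retrieve-then-alert flow can be sketched without external services. This is not the LangGraph or FAISS implementation. An in-memory list stands in for the vector store, bag-of-words vectors stand in for embeddings, and a template string stands in for the LLM call. The `MEMORY` notes other than the nightclub example, and all function names, are hypothetical.

```python
import math
from collections import Counter

# Hypothetical user notes; only the nightclub note is quoted in this README.
MEMORY = [
    "User vowed to avoid nightclubs",
    "User is saving for a house deposit",
    "User travels to Berlin for work every month",
]


def bag_of_words(text: str) -> Counter:
    """Stand-in for an embedding: lowercase word counts."""
    return Counter(word.strip(".,").lower() for word in text.split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, k: int = 1):
    """Return the k stored notes most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(MEMORY, key=lambda note: cosine(q, bag_of_words(note)), reverse=True)
    return ranked[:k]


def build_alert(event: str) -> str:
    """Attach retrieved context to a raw anomaly description."""
    context = retrieve(event)
    return f"ALERT: {event}. Context: {'; '.join(context)}"


print(build_alert("3 AM charges at two nightclubs"))
```

In a full system, the retrieved notes would be passed to the supervisor's prompt rather than concatenated into a fixed template.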

🛠 Tech Stack

  • Compute: Apache Spark (PySpark) for distributed ETL.
  • Intelligence: OpenAI GPT-4o, LangGraph, FAISS (Vector Store).
  • Orchestration: Azure Data Factory (Simulated).
  • Backend: FastAPI (Async Python 3.10+).

🚀 Deployment

# 1. Initialize the Knowledge Base
python etl/resolve_entities.py

# 2. Train the Anomaly Models
python etl/train_anomaly.py

# 3. Start the Surveillance Node
uvicorn backend.main:app --reload

Built as a simulation of Palantir Foundry / Gotham architecture capabilities.
