Sentinel is a distributed data intelligence platform designed to ingest high-velocity financial streams, resolve semantic entities, and autonomously detect operational anomalies.
Unlike traditional rule-based monitoring, Sentinel uses unsupervised learning (Isolation Forests) and geospatial kinetic analysis to identify non-linear threats (e.g., "Impossible Travel," "Behavioral Drift") in real time, triggering context-aware interventions via an Agentic Supervisor.
The system follows a strict Medallion Architecture (Bronze -> Silver -> Gold) to ensure data lineage and auditability.
```mermaid
graph TD
    A[Ingest: Plaid Stream] -->|Raw JSON| B[(Bronze: Data Lake)]
    B -->|PySpark: Entity Resolution| C[(Silver: Structured Delta)]
    C -->|PySpark: Kinetic Velocity & ML| D[(Gold: Anomaly Features)]
    D -->|Trigger| E[Agent Supervisor]
    E <-->|RAG Query| F[(Vector Memory: FAISS)]
    E -->|Operational Output| G[Intervention Logic]
```
- Challenge: Raw transaction logs are unstructured and noisy (e.g., `SQ *JOES`, `UBR* PENDING`).
- Solution: Implemented a Semantic Vector Pipeline using OpenAI Embeddings (`text-embedding-3-small`) to map dirty inputs to a canonical ontology.
- Result: Achieved 92% entity matching accuracy on zero-shot examples without regex hardcoding.
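
The matching step could look roughly like the sketch below: embed the raw string, embed each canonical name, and pick the nearest one by cosine similarity. `toy_embed` is a hypothetical stand-in (hashed character trigrams) so the example runs offline; in the pipeline described above that role is played by `text-embedding-3-small`. The canonical merchant list and the function names are also made up for illustration.

```python
import math
import zlib

DIM = 256  # size of the toy vector space (arbitrary choice)


def toy_embed(text: str) -> list[float]:
    """Stand-in for a real embedding model: hashed character trigrams, L2-normalized."""
    # Lowercase and replace punctuation with spaces so "UBR*" and "ubr" look alike.
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    padded = "  " + " ".join(cleaned.split()) + "  "
    vec = [0.0] * DIM
    for i in range(len(padded) - 2):
        # crc32 is deterministic across runs, unlike the built-in hash() for str.
        bucket = zlib.crc32(padded[i:i + 3].encode("utf-8")) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))


def resolve_entity(raw: str, canonical: list[str], embed=toy_embed) -> tuple[str, float]:
    """Return the canonical name closest to `raw`, with its similarity score."""
    raw_vec = embed(raw)
    score, best = max((cosine(raw_vec, embed(name)), name) for name in canonical)
    return best, score


# Hypothetical canonical ontology, for illustration only.
CANONICAL = ["Joe's Coffee", "Uber", "Starbucks"]
print(resolve_entity("UBR* PENDING", CANONICAL))
```

With a real embedding model the canonical vectors would normally be computed once and cached, and a minimum-similarity cutoff would route low-confidence matches to review instead of forcing a label.
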
- Challenge: Detecting physical security breaches in financial patterns.
- Solution: Engineered a PySpark pipeline to calculate Haversine Distance between sequential transaction nodes.
- Logic: Flags events where `Velocity > 800km/h` (Impossible Travel), simulating compromised credentials or physical threats.
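
The velocity check can be illustrated without Spark. The sketch below is plain Python rather than the PySpark pipeline itself: it computes the Haversine distance between two consecutive transactions and compares the implied speed with the 800 km/h threshold. The function names, the `(lat, lon, timestamp)` tuple layout, and the handling of non-positive time gaps are assumptions made for the example.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0   # mean Earth radius
MAX_SPEED_KMH = 800.0      # threshold quoted in the text


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def is_impossible_travel(prev, curr, max_speed_kmh: float = MAX_SPEED_KMH) -> bool:
    """prev and curr are (lat, lon, datetime) tuples for consecutive transactions."""
    dist_km = haversine_km(prev[0], prev[1], curr[0], curr[1])
    hours = (curr[2] - prev[2]).total_seconds() / 3600.0
    if hours <= 0:
        # Assumption: simultaneous or out-of-order events at different places are flagged.
        return dist_km > 0
    return dist_km / hours > max_speed_kmh


new_york = (40.7128, -74.0060, datetime(2024, 1, 1, 12, 0))
london = (51.5074, -0.1278, datetime(2024, 1, 1, 13, 0))
print(is_impossible_travel(new_york, london))  # one hour apart across the Atlantic
```

In a Spark job the same arithmetic would typically be applied per user over a window ordered by timestamp, pairing each row with the previous one.
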
- Challenge: Static thresholds (e.g., amount > $500) fail to capture behavioral nuances.
- Solution: Deployed K-Means Clustering and Isolation Forests to model "Normal Pattern of Life."
- Outcome: System autonomously flags deviations (e.g., 3 AM spending bursts) based on distance from cluster centroids.
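
As a rough illustration of the centroid-distance scoring only, the sketch below assumes the K-Means centroids have already been fitted and that features are scaled to comparable ranges. It does not implement Isolation Forests. The feature layout (scaled hour of day, scaled amount), the centroid values, and the threshold are all hypothetical.

```python
import math

Point = tuple[float, ...]


def nearest_centroid_distance(point: Point, centroids: list[Point]) -> float:
    """Euclidean distance from a point to its closest cluster centroid."""
    return min(math.dist(point, c) for c in centroids)


def flag_deviations(points: list[Point], centroids: list[Point], threshold: float) -> list[Point]:
    """Return the points that lie farther than `threshold` from every centroid."""
    return [p for p in points if nearest_centroid_distance(p, centroids) > threshold]


# Hypothetical, pre-fitted centroids in (scaled_hour, scaled_amount) space.
centroids = [(0.5, 0.2), (0.8, 0.3)]
transactions = [
    (0.52, 0.22),  # close to the first centroid: looks like normal behaviour
    (0.10, 0.90),  # far from both centroids: e.g. a large late-night purchase
]
print(flag_deviations(transactions, centroids, threshold=0.3))
```

Choosing the threshold is the delicate part; one common option is a high percentile of the training-set distances rather than a fixed constant.
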
- Challenge: Alerts lack context (Alert Fatigue).
- Solution: A LangGraph Supervisor retrieves historical user context ("User vowed to avoid nightclubs") from a FAISS Vector Store before generating alerts.
- Effect: Transforms raw data alerts into actionable, context-aware intelligence.
- Compute: Apache Spark (PySpark) for distributed ETL.
- Intelligence: OpenAI GPT-4o, LangGraph, FAISS (Vector Store).
- Orchestration: Azure Data Factory (Simulated).
- Backend: FastAPI (Async Python 3.10+).
```bash
# 1. Initialize the Knowledge Base
python etl/resolve_entities.py
# 2. Train the Anomaly Models
python etl/train_anomaly.py
# 3. Start the Surveillance Node
uvicorn backend.main:app --reload
```

Built as a simulation of Palantir Foundry / Gotham architecture capabilities.