Field | Value/Description
Project Name | UnDone Backend System Design
Company | UnDone
Document Version | 1
Date | 14 Jun 25
Prepared By | Disha
Contact Email | [email protected]
Overview | Backend design deliverables for Digital Footprint Discovery Service.
Reference Docs | UD-001, UD-003, SysBE-001
Guiding Principles | Security, Modularity, Scalability, Compliance, Clarity
Document ID | SysBE-003
Version | Date | Author | Changes Made
Draft | 14/06/2025 | Disha | Initial draft created for Digital Footprint Discovery Service
1 | 16/06/2025 | Disha | -
2 | 16/06/2025 | Disha | DFD/SFD Diagrams
Reviewer Name | Changes Made - Reviewer
- | -
Ram | DFD, Sequence Diagram, Database Schema, API Specs
Section | Description/Guidance for Devs | Owner (Initial) | Version
Service Purpose | Manages OSINT orchestration, risk scoring, and data/artifact classification and storage | Disha | Draft
Technology Stack | Sherlock, Maigret, whois, Shodan, HaveIBeenPwned API, requests, BeautifulSoup, Scrapy, Custom Rule Engine, scikit-learn, FastAPI, PostgreSQL, JWT, Redis | Disha | Draft
Interactions | API Gateway, OSINT Data Sources, Task Queues, Report Generator, PostgreSQL, Cache Layer, Notification Service, Authentication Service | Disha | Draft
Key Responsibilities | OSINT orchestration, scan jobs, risk scoring, data classification, artifact storage | Disha | Draft
Comments: *Tentative - subject to changes.
Flow Name | Description/Guidance for Devs | Owner (Initial) | Version
Scan Initiation | User → API Gateway → Discovery Service → External Integration Gateway → Data Sources | Disha | 1
Scan Completion | Discovery Service → Store FootprintItems, Artifacts, Scores → Update ScanJob status | Disha | 1
Results Fetch | User → API Gateway → Discovery Service → DB → User (results, classification, risk) | Disha | 1
Results Fetch | User → API Gateway → Discovery Service → Aggregated Data → User | Disha | 1
Comments | Diagram Link
Changes in the whole flow | https://drive.google.com/file/d/1JgGcnYWabEoVV9K2hm02EbqZ9L7U_yqN/view?usp=sharing
Changes in the whole flow | https://drive.google.com/file/d/1JgGcnYWabEoVV9K2hm02EbqZ9L7U_yqN/view?usp=sharing
Changes in the whole flow | https://drive.google.com/file/d/1GuW0kbGMHIvJsukidr2ae9adzicXLbxD/view?usp=sharing
Changes in the whole flow | https://drive.google.com/file/d/1GuW0kbGMHIvJsukidr2ae9adzicXLbxD/view?usp=sharing
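The Scan Initiation and Scan Completion flows imply a small lifecycle for the ScanJob status field. The sketch below is one possible way to guard that lifecycle. It uses the four status values this document lists for ScanJob (queued, in_progress, completed, failed); the transition table itself and the names ScanStatus, ALLOWED_TRANSITIONS, and advance are assumptions for illustration, since the document does not say which moves are legal.

```python
from enum import Enum


class ScanStatus(str, Enum):
    QUEUED = "queued"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumed transition table; the design document does not define it.
ALLOWED_TRANSITIONS = {
    ScanStatus.QUEUED: {ScanStatus.IN_PROGRESS, ScanStatus.FAILED},
    ScanStatus.IN_PROGRESS: {ScanStatus.COMPLETED, ScanStatus.FAILED},
    ScanStatus.COMPLETED: set(),  # terminal
    ScanStatus.FAILED: set(),     # terminal
}


def advance(current: ScanStatus, target: ScanStatus) -> ScanStatus:
    """Return the new status, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Under these assumptions, the Scan Completion flow would call advance(ScanStatus.IN_PROGRESS, ScanStatus.COMPLETED) only after FootprintItems, Artifacts, and Scores have been stored.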
Scenario | Steps Outline (for Devs) | Owner (Initial) | Version
Scan Request | Client → API GW → Discovery Service (create ScanJob) → Queue → Collector/Worker → External Gateway → Store Results | Disha | 1
Results Retrieval | Client → API GW → Discovery Service → Query DB (FootprintItems, Scores, Artifacts) → API GW → Client | Disha | 1
Risk Scoring | On new data: Discovery Service → Risk Scoring Engine → Update FootprintItem with risk_score | Disha | 1
Data Classification | On new data: Discovery Service → Classification Engine → Tag FootprintItem (removable, poisonable, severity, etc.) | Disha | 1
Comments | Diagram Link
Whole Flow Changed | https://drive.google.com/file/d/1OftU8d8WKoCg3llzRAQztFDS8Hewps2j/view?usp=sharing
Whole Flow Changed | https://drive.google.com/file/d/1OftU8d8WKoCg3llzRAQztFDS8Hewps2j/view?usp=sharing
Whole Flow Changed | https://drive.google.com/file/d/1iRVIIQyWRK2LcOjOJmqnaeHpbv1rj17f/view?usp=sharing
Whole Flow Changed | https://drive.google.com/file/d/1iRVIIQyWRK2LcOjOJmqnaeHpbv1rj17f/view?usp=sharing
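The Scan Request, Risk Scoring, and Data Classification scenarios can be read together as one pipeline: a job goes onto a queue, a worker collects findings, and each new finding is scored and tagged before it is stored. The following is a minimal in-memory sketch of that ordering only. Everything specific in it is hypothetical: the weights in DATA_TYPE_WEIGHTS, the severity thresholds, the rule that data brokers are "removable", and fake_external_gateway, which stands in for the External Gateway. The real engines are not specified in this document.

```python
import queue
import uuid
from dataclasses import dataclass


@dataclass
class FootprintItem:
    source_name: str
    source_category: str
    data_types_found: list
    risk_score: int = 0
    severity_level: str = "Low"
    classification: str = "info_only"


# Hypothetical weights; the real Risk Scoring Engine is not specified here.
DATA_TYPE_WEIGHTS = {"email": 20, "phone": 30, "address": 40}


def score(item: FootprintItem) -> None:
    """Set risk_score (capped at 100) and severity_level from the data types found."""
    raw = sum(DATA_TYPE_WEIGHTS.get(t, 10) for t in item.data_types_found)
    item.risk_score = min(raw, 100)
    if item.risk_score >= 70:      # assumed threshold
        item.severity_level = "High"
    elif item.risk_score >= 40:    # assumed threshold
        item.severity_level = "Medium"
    else:
        item.severity_level = "Low"


def classify(item: FootprintItem) -> None:
    """Tag the item; treating data brokers as removable is an assumption."""
    if item.source_category == "data_broker":
        item.classification = "removable"
    else:
        item.classification = "info_only"


def fake_external_gateway(target: dict) -> list:
    """Stand-in for the External Gateway; returns one canned finding per target."""
    return [FootprintItem("ExampleBroker", "data_broker", ["email", "address", "phone"])]


def worker(jobs: queue.Queue, store: dict) -> None:
    """Drain the queue: collect, score, classify, then store results per scan."""
    while not jobs.empty():
        scan_id, targets = jobs.get()
        items = [item for target in targets for item in fake_external_gateway(target)]
        for item in items:
            score(item)
            classify(item)
        store[scan_id] = items
        jobs.task_done()


jobs = queue.Queue()
store = {}
scan_id = str(uuid.uuid4())
jobs.put((scan_id, [{"type": "email", "value": "user@example.com"}]))
worker(jobs, store)
```

In a deployment the queue would presumably be an external task queue and the store would be PostgreSQL; the in-process versions here only show the order of the steps.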
Table Name | Field Name | Data Type | Constraints | Description
ScanJob | scan_id | UUID | PK | Unique scan job identifier
ScanJob | user_id | UUID | FK (User) | User who requested the scan
ScanJob | status | VARCHAR | NOT NULL | queued, in_progress, completed, failed
ScanJob | scan_type | VARCHAR | NOT NULL | initial, continuous, adhoc
ScanJob | target_identifiers | JSONB | NOT NULL | List of targets (email, phone, etc.)
ScanJob | created_at | TIMESTAMP | DEFAULT now() | Scan creation time
ScanJob | completed_at | TIMESTAMP | NULLABLE | Scan completion time
FootprintItem | item_id | UUID | PK | Unique item identifier
FootprintItem | scan_id | UUID | FK (ScanJob) | Scan reference
FootprintItem | user_id | UUID | FK (User) | User reference
FootprintItem | source_url | VARCHAR | NOT NULL | Source of data
FootprintItem | source_name | VARCHAR | NOT NULL | Name of data source
FootprintItem | source_category | VARCHAR | NOT NULL | e.g., social_media, data_broker
FootprintItem | data_types_found | VARCHAR[] | NOT NULL | Types of data found (email, address, etc.)
FootprintItem | snippet | TEXT | - | Data sample
FootprintItem | severity_level | VARCHAR | NOT NULL | High, Medium, Low
FootprintItem | classification | VARCHAR | NOT NULL | removable, poisonable, info_only
FootprintItem | risk_score | INTEGER | NOT NULL | Calculated risk score
FootprintItem | discovered_at | TIMESTAMP | DEFAULT now() | When found
FootprintItem | status | VARCHAR | NOT NULL | active, removal_pending, removed, etc.
Artifact | artifact_id | UUID | PK | Unique artifact identifier
Artifact | item_id | UUID | FK (FootprintItem) | Associated footprint item
Artifact | artifact_type | VARCHAR | NOT NULL | screenshot, html, pdf, etc.
Artifact | storage_url | VARCHAR | NOT NULL | Storage location
Artifact | created_at | TIMESTAMP | DEFAULT now() | Artifact creation time
DataSource | datasource_id | UUID | PK | Unique data source identifier
DataSource | name | VARCHAR | NOT NULL | Data source name
DataSource | category | VARCHAR | NOT NULL | Source category
DataSource | website_url | VARCHAR | - | Source website
DataSource | discovery_method | VARCHAR | NOT NULL | API, Scrape, etc.
DataSource | removal_method | VARCHAR | NOT NULL | API, Form, Email, etc.
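To show how the ScanJob constraints might carry over into application code, here is a minimal sketch of an in-memory model that mirrors the table: generated UUID primary key, DEFAULT now() for created_at, a nullable completed_at, and checks on the enumerated VARCHAR values. The default status of "queued" and the rule that target_identifiers must be non-empty are assumptions; the schema only says NOT NULL.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

SCAN_STATUSES = {"queued", "in_progress", "completed", "failed"}
SCAN_TYPES = {"initial", "continuous", "adhoc"}


def _now() -> datetime:
    """Application-side stand-in for the column's DEFAULT now()."""
    return datetime.now(timezone.utc)


@dataclass
class ScanJob:
    user_id: uuid.UUID                                       # FK (User)
    scan_type: str                                           # NOT NULL
    target_identifiers: list                                 # JSONB, NOT NULL
    status: str = "queued"                                   # NOT NULL; default is an assumption
    scan_id: uuid.UUID = field(default_factory=uuid.uuid4)   # PK
    created_at: datetime = field(default_factory=_now)       # DEFAULT now()
    completed_at: Optional[datetime] = None                  # NULLABLE

    def __post_init__(self) -> None:
        if self.status not in SCAN_STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")
        if self.scan_type not in SCAN_TYPES:
            raise ValueError(f"unknown scan_type: {self.scan_type!r}")
        if not self.target_identifiers:
            # Assumption: an empty target list is rejected, not just NULL.
            raise ValueError("target_identifiers must not be empty")


job = ScanJob(
    user_id=uuid.uuid4(),
    scan_type="initial",
    target_identifiers=[{"type": "email", "value": "user@example.com"}],
)
```

The same value sets could instead be enforced in PostgreSQL with CHECK constraints or enum types; which layer owns the validation is left open by the schema above.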
Endpoint | Method | Auth Required | Description
/api/v1/scans | POST | Yes | Initiate new scan job
/api/v1/scans/{scan_id} | GET | Yes | Get scan job status and summary
/api/v1/scans/{scan_id}/results | GET | Yes | Get scan results, filterable by severity/type
/api/v1/users/me/footprint/dashboard | GET | Yes | Get aggregated dashboard data for user
/api/v1/artifacts/{artifact_id} | GET | Yes | Download or view artifact (e.g., screenshot)
Request Body Example | Response Example | Owner (Initial) | Version
{ "target_identifiers": [{ "type": "email", "value": "[email protected]" }], "scan_type": "initial" } | { "scan_id": "uuid", "status": "queued" } | Disha | 1
- | { "scan_id": "uuid", "status": "completed", ... } | Disha | 1
- | { "results": [ ... ], "pagination": { ... } } | Disha | 1
- | { "summary": { ... }, "trends": [ ... ] } | Disha | 1
- | File stream or { "storage_url": "..." } | Disha | 1
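As an illustration of the POST /api/v1/scans contract, the sketch below validates a request body and produces the documented { "scan_id", "status": "queued" } response. It is framework-free on purpose (FastAPI is listed in the stack, but no handler code is given in this document). The function name create_scan, the HTTP status codes 202 and 422, the "username" identifier type, and the regular expressions are all assumptions; only the "email" type and the three scan_type values come from the document.

```python
import re
import uuid

# Assumed identifier types and formats; the spec only shows "email".
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "username": re.compile(r"^[A-Za-z0-9_.-]{1,64}$"),
}
SCAN_TYPES = {"initial", "continuous", "adhoc"}


def create_scan(body: dict) -> tuple:
    """Validate a POST /api/v1/scans body and return (http_status, response)."""
    targets = body.get("target_identifiers")
    if not isinstance(targets, list) or not targets:
        return 422, {"error": "target_identifiers must be a non-empty list"}
    for target in targets:
        if not isinstance(target, dict):
            return 422, {"error": "each target must be an object"}
        kind = target.get("type")
        value = target.get("value")
        pattern = PATTERNS.get(kind) if isinstance(kind, str) else None
        if pattern is None or not isinstance(value, str) or not pattern.match(value):
            # The rejected value is deliberately not echoed back.
            return 422, {"error": "invalid or unsupported target identifier"}
    if body.get("scan_type") not in SCAN_TYPES:
        return 422, {"error": "invalid scan_type"}
    # Persisting the ScanJob and enqueueing it would happen here.
    return 202, {"scan_id": str(uuid.uuid4()), "status": "queued"}


status, response = create_scan(
    {
        "target_identifiers": [{"type": "email", "value": "user@example.com"}],
        "scan_type": "initial",
    }
)
```

Returning an accepted-style status with "queued" matches the asynchronous flow, where the client later polls GET /api/v1/scans/{scan_id} for completion.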
Security Area | Practice/Control | Details/Guidance for Devs | Owner (Initial) | Version
API Access | Authentication & Authorization | Use OAuth2 / JWT tokens. Ensure role-based access: only authorized users can initiate discovery/correlation tasks. | Disha | Draft
Rate Limiting | API Throttling | Implement per-user and per-IP rate limiting to prevent abuse (e.g., 60 req/min). | Disha | Draft
Input Validation | Sanitize Inputs | Validate input_type and input_value against strict formats (email, IP, username). Prevent injection attacks. | Disha | Draft
Data Privacy | Mask Sensitive Data in Logs | Never log full emails, usernames, or tokens. Use redaction or masking in all logs. | Disha | Draft
Data Retention | Expiry & Cleanup | Automatically delete or archive older discovery/correlation tasks after 30/90 days. | Disha | Draft
Data Encryption | Encryption at Rest & Transit | Use HTTPS/TLS for all external comms. Use AES-256 for sensitive data at rest. | Disha | Draft
Service Communication | Internal Service Authentication | Use signed tokens or mTLS between internal services (e.g., Discovery ↔ Correlation). | Disha | Draft
Discovery Limits | Scope Restriction | Enforce scope boundaries to prevent crawling entire domains unintentionally. | Disha | Draft
Third-party APIs | Usage Risk Assessment | Validate and audit all third-party APIs used for footprint discovery. Rotate keys. | Disha | Draft
Cache Protection | Secure Cache Storage | Avoid storing PII in Redis without encryption or TTL. Use secure channels. | Disha | Draft
Graph Exposure | Access Control for Visual Graphs | Ensure only the user who initiated the correlation can access their visualized graph data. | Disha | Draft
Audit Logging | Activity Monitoring | Log all discovery/correlation actions with user ID, timestamp, IP. Integrate with SIEM. | Disha | Draft
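The Rate Limiting control (per-user and per-IP, e.g., 60 req/min) could be approached with a sliding-window counter. The sketch below is an in-memory, single-process illustration only; the class name SlidingWindowLimiter, the key format, and the injectable clock are assumptions, and a multi-instance deployment would presumably keep the counters in a shared store such as the Redis cache already in the stack.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each key."""

    def __init__(self, limit: int = 60, window: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(limit=60, window=60.0)


def request_allowed(user_id: str, client_ip: str) -> bool:
    """Apply the limit per user and per IP; both checks must pass."""
    user_ok = limiter.allow(f"user:{user_id}")
    ip_ok = limiter.allow(f"ip:{client_ip}")
    return user_ok and ip_ok
```

Note that request_allowed records a hit against both keys even when one of them is already over the limit; whether a rejected request should count toward the other key is a policy choice this document does not settle.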