SemantIQ-M-Benchmarks is an open-source, reproducible framework for evaluating the cognitive and semantic capabilities of multimodal AI models. It provides a unified CLI and Web UI for running rigorous benchmarks across three domains: Reasoning (SMF), Human-AI Cognitive Symmetry (HACS), and Vision.
- SMF (Semantic Maturity Framework): Evaluate abstract reasoning, bias resilience, and knowledge boundaries.
- HACS (Human-AI Cognitive Symmetry): Measure how well models align with human cognitive patterns.
- Vision: Test Text-to-Image generation for attribute binding, spatial reasoning, and consistency.
- Reproducible: Git-hashed datasets and deterministic scoring make your results verifiable (a minimal sketch follows this list).
- Privacy-First: Run everything locally. Your data stays on your machine.
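To make the reproducibility point concrete, here is a minimal sketch of how a run's exact inputs can be pinned with plain Git; the dataset path is a placeholder, and the framework may record such hashes for you automatically.

# Pin the inputs behind a run (data/smf/items.jsonl is a hypothetical path)
git rev-parse HEAD                     # commit hash of the current checkout
git hash-object data/smf/items.jsonl   # content hash of one dataset file

Storing both hashes alongside your scores lets anyone later confirm they re-ran against the same inputs.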
- Python ≥ 3.11
- Node.js ≥ 18 (for Web UI)
- Git
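A quick way to confirm the toolchain is available before cloning (a sketch; exact invocations vary by system, e.g. python3 on some Linux/Mac setups):

python --version   # expect 3.11 or newer
node --version     # expect 18 or newer; only needed for the Web UI
git --version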
git clone https://github.com/kaveh8866/SemantIQ.git
cd SemantIQ
# Setup Backend
python -m venv venv
# Windows: .\venv\Scripts\activate
# Linux/Mac: source venv/bin/activate
pip install -e ".[dev]"
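# Optional sanity check: confirm the editable install is visible in this venv.
# The distribution name 'semantiq' is an assumption based on the CLI name below.
pip show semantiq
semantiq --help   # assumes the CLI exposes the usual --help flag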
# Setup Frontend (Optional)
cd webapp && npm install && cd ..

# Run the 'code_writer' benchmark using a dummy model (no API key needed)
semantiq pipeline run code_writer_v1 --provider dummy --model test-model
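# Optional sketch: sweep several models with the same command.
# Model names below are placeholders; only the dummy provider shown above
# is guaranteed to run without credentials.
for model in test-model test-model-2; do
  semantiq pipeline run code_writer_v1 --provider dummy --model "$model"
done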
semantiq ui serve
# Open http://localhost:5173

We welcome contributions! Please see CONTRIBUTING.md and our Code of Conduct.
This project is licensed under the Apache License 2.0; see the LICENSE file for details. Data artifacts are licensed under CC BY 4.0.
If you use this framework, please cite:
@software{semantiq_benchmarks,
  author  = {{SemantIQ Research Team}},
  title   = {SemantIQ-M-Benchmarks: A Unified Framework for Multimodal AI Evaluation},
  year    = {2025},
  url     = {https://github.com/kaveh8866/SemantIQ},
  version = {0.1.0}
}