Zendesk Ticket Summarizer

A terminal-based application that fetches Zendesk support tickets, summarizes each one with an LLM (Azure OpenAI GPT-4o or Google Gemini), and runs POD categorization, Diagnostics gap analysis, or both to surface product insights.

Features

Ticket Fetching & Synthesis

- Fetches complete ticket data from Zendesk (subject, description, all comments, custom fields)
- Uses an LLM to synthesize:
  - Issue reported (one-liner)
  - Root cause (one-liner, cross-validated against the support agent's root cause)
  - Summary (3-4 line paragraph with key turning points)
  - Resolution (one-liner)
- Structured outputs via Pydantic schemas: JSON is enforced at generation time (no fragile regex parsing)
- Comment threads are formatted with structural markers such as `[Comment 3/12 | Agent | Day 3]`
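
As an illustration of the structural markers, here is a minimal sketch of how a comment thread might be labeled before it is sent to the LLM. The `Comment` class, the `format_thread` function, and the rule that Day 1 is the calendar day the ticket was created are assumptions made for this example, not the project's actual code.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Comment:
    # Hypothetical shape of one ticket comment.
    author_role: str      # e.g. "Agent" or "Customer"
    created_at: datetime  # when the comment was posted
    body: str             # plain-text comment body


def format_thread(comments: list[Comment], ticket_created: datetime) -> str:
    """Prefix each comment with a [Comment i/N | Role | Day D] marker."""
    total = len(comments)
    blocks = []
    for index, comment in enumerate(comments, start=1):
        # Assumption: Day 1 is the calendar day the ticket was created.
        day = (comment.created_at.date() - ticket_created.date()).days + 1
        marker = f"[Comment {index}/{total} | {comment.author_role} | Day {day}]"
        blocks.append(f"{marker}\n{comment.body}")
    return "\n\n".join(blocks)


created = datetime(2025, 1, 6, 9, 0)
thread = format_thread(
    [
        Comment("Customer", datetime(2025, 1, 6, 9, 0), "Flow does not start."),
        Comment("Agent", datetime(2025, 1, 8, 14, 30), "Selector updated; please retest."),
    ],
    created,
)
print(thread)
```

Markers like these give the model explicit position, speaker, and timing cues, which should make turning points in a long thread easier to locate.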

POD Categorization

- Categorizes tickets into 13 PODs using LLM-based analysis
- Includes 3 few-shot worked examples for ambiguous edge cases (WFE vs Guidance, etc.)
- Binary confidence scoring ("confident" vs "not confident")
- Suggests alternative PODs when a ticket is ambiguous

Diagnostics Gap Analysis

- Analyzes whether Whatfix's "Diagnostics" feature was used in troubleshooting
- Evaluates whether Diagnostics could have helped, split into Triage (identification) and Fix (resolution)
- Gap area taxonomy with 11 triage and 10 fix predefined categories
- Anti-hallucination rules with an evidence citation requirement
- Ternary assessment ("yes", "no", "maybe") with confidence scoring
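
One way to picture the evidence citation requirement is a post-check that rejects an assessment whose quoted evidence does not appear in the ticket. The sketch below uses a standard-library dataclass as a stand-in; the project's real schemas are Pydantic models, and the field names, the `validate_assessment` helper, and the choice to require evidence only for "yes" and "maybe" are assumptions for illustration.

```python
from dataclasses import dataclass

ALLOWED_ASSESSMENTS = {"yes", "no", "maybe"}


@dataclass
class DiagnosticsAssessment:
    # Hypothetical fields; the actual schema may differ.
    could_have_helped: str  # ternary: "yes", "no", or "maybe"
    evidence_quote: str     # text the model cites from the ticket


def validate_assessment(result: DiagnosticsAssessment, ticket_text: str) -> list[str]:
    """Return a list of problems; an empty list means the result passes."""
    problems = []
    if result.could_have_helped not in ALLOWED_ASSESSMENTS:
        problems.append(f"unknown assessment: {result.could_have_helped!r}")
    # Assumption: only non-"no" answers must be backed by a quote.
    if result.could_have_helped != "no":
        quote = result.evidence_quote.strip()
        if not quote:
            problems.append("missing evidence quote")
        elif quote.lower() not in ticket_text.lower():
            problems.append("evidence quote not found in ticket text")
    return problems


ticket = "Agent: the element selector failed after the customer's UI release."
grounded = DiagnosticsAssessment("yes", "element selector failed")
invented = DiagnosticsAssessment("yes", "customer ran Diagnostics twice")
print(validate_assessment(grounded, ticket))
print(validate_assessment(invented, ticket))
```

A check like this catches quotes the model made up, though it cannot tell whether a genuine quote actually supports the conclusion.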

Multi-Model LLM Support

- Azure OpenAI GPT-4o (primary, enterprise): 10 concurrent requests, no delays
- Google Gemini (secondary, free tier): model configurable via .env
- Provider-aware rate limiting and concurrency
- A shared rate limiter keeps "both" analysis mode from doubling the request rate
- Native structured outputs on both providers
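
To show why sharing one limiter matters in "both" mode, here is a rough sketch built on `asyncio.Semaphore`. The `SharedRateLimiter` class, `fake_llm_call`, and the parameter names are hypothetical; the real implementation may be structured differently. The point is that POD and Diagnostics tasks pass through the same object, so together they stay within one concurrency cap and one minimum delay.

```python
import asyncio
import functools
import time


class SharedRateLimiter:
    """Caps concurrency and spaces out call starts for everyone sharing one instance."""

    def __init__(self, max_concurrent: int, min_delay: float) -> None:
        self._semaphore = asyncio.Semaphore(max_concurrent)
        self._lock = asyncio.Lock()
        self._min_delay = min_delay
        self._next_start = 0.0  # earliest monotonic time the next call may start

    async def run(self, make_call):
        async with self._semaphore:
            # The lock serializes call starts so the delay applies globally.
            async with self._lock:
                wait = self._next_start - time.monotonic()
                if wait > 0:
                    await asyncio.sleep(wait)
                self._next_start = time.monotonic() + self._min_delay
            return await make_call()


async def fake_llm_call(label: str) -> str:
    await asyncio.sleep(0)  # stands in for a real provider request
    return label


async def demo() -> list[str]:
    # One limiter instance is handed to both analyses.
    limiter = SharedRateLimiter(max_concurrent=2, min_delay=0.01)
    labels = ["pod-1", "diagnostics-1", "pod-2"]
    calls = [limiter.run(functools.partial(fake_llm_call, label)) for label in labels]
    return await asyncio.gather(*calls)


results = asyncio.run(demo())
print(results)
```

If each analysis created its own limiter instead, the two would each run at the full allowed rate and the combined traffic could exceed the provider's limit.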

Prerequisites

- Python 3.12+
- Zendesk account with API access
- At least one LLM provider:
  - Azure OpenAI (recommended for 200+ tickets)
  - Google Gemini API key (free tier, good for small samples)

Installation

Clone the repository:

git clone https://github.com/R-eehan/ticket-summarizer.git
cd ticket-summarizer

Set up conda environment:

conda create -n ticket-summarizer python=3.12 -y
conda activate ticket-summarizer
pip install -r requirements.txt

Configure environment variables:

cp .env.example .env

Edit .env with your credentials:

Required (Zendesk):

ZENDESK_API_KEY=your_zendesk_api_token
ZENDESK_SUBDOMAIN=whatfix
ZENDESK_EMAIL=your_email@example.com

Azure OpenAI (recommended):

AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_API_KEY=your_azure_api_key
AZURE_OPENAI_DEPLOYMENT_NAME=gpt-4o
AZURE_OPENAI_API_VERSION=2024-10-21

Gemini (optional, for quality comparison):

GEMINI_API_KEY=your_gemini_api_key
GEMINI_MODEL=gemini-2.5-flash
# Options: gemini-2.5-flash (stable), gemini-3.1-flash-lite-preview (newest free tier)

Note: You can run Azure-only without setting GEMINI_API_KEY.

Usage

python main.py --input <csv_path> --analysis-type <pod|diagnostics|both> [--model-provider <gemini|azure>]

Examples

# Diagnostics analysis with Azure (recommended for bulk)
python main.py --input tickets.csv --analysis-type diagnostics --model-provider azure

# POD categorization with Azure
python main.py --input tickets.csv --analysis-type pod --model-provider azure

# Both analyses in parallel
python main.py --input tickets.csv --analysis-type both --model-provider azure

# Small sample with Gemini (free tier, quality comparison)
python main.py --input test_5_tickets.csv --analysis-type diagnostics --model-provider gemini

CLI Parameters

| Parameter | Required | Options | Description |
|---|---|---|---|
| `--input` | Yes | File path | CSV with ticket IDs |
| `--analysis-type` | Yes | `pod`, `diagnostics`, `both` | Analysis mode |
| `--model-provider` | No | `gemini` (default), `azure` | LLM provider |
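
For reference, the parameters above could be declared with `argparse` roughly as follows. This is a sketch that mirrors the table; the `build_parser` function is a name chosen for this example, and the actual parser in main.py may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the three CLI parameters described in the table."""
    parser = argparse.ArgumentParser(prog="main.py", description="Zendesk Ticket Summarizer")
    parser.add_argument("--input", required=True, help="CSV with ticket IDs")
    parser.add_argument(
        "--analysis-type",
        required=True,
        choices=["pod", "diagnostics", "both"],
        help="Analysis mode",
    )
    parser.add_argument(
        "--model-provider",
        choices=["gemini", "azure"],
        default="gemini",
        help="LLM provider",
    )
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--input", "tickets.csv", "--analysis-type", "both"])
print(args.input, args.analysis_type, args.model_provider)
```

Because `--model-provider` defaults to `gemini`, bulk runs need `--model-provider azure` passed explicitly.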

Provider Comparison

| Factor | Azure OpenAI (Recommended) | Gemini Free Tier |
|---|---|---|
| Concurrency | 10 concurrent (configurable) | 1 sequential |
| Speed | ~15-20 min for 500 tickets | ~60+ min for 500 tickets |
| Rate Limits | 300+ RPM (Tier 1) | 10-15 RPM |
| Daily Limit | Unlimited (pay-per-use) | 250-1000 RPD |
| Best For | Production runs (200-500 tickets) | Quality comparison (30-50 tickets) |
| Cost | Enterprise pricing | Free |

Input CSV Format

Two formats are auto-detected:

# Format 1: Serial No + Ticket ID
Serial No,Ticket ID
1,78788
2,78969

# Format 2: Zendesk Tickets ID only
Zendesk Tickets ID
78788
78969
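
Auto-detection can be as simple as looking for a known ticket ID column in the header row. The sketch below is one possible approach using the standard `csv` module; `read_ticket_ids` and `TICKET_ID_HEADERS` are hypothetical names, and the project's real detection logic may handle more cases.

```python
import csv
import io

# Header names taken from the two formats shown above.
TICKET_ID_HEADERS = ("Ticket ID", "Zendesk Tickets ID")


def read_ticket_ids(csv_text: str) -> list[str]:
    """Return ticket IDs from either supported CSV layout."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("CSV is empty")
    header = [cell.strip() for cell in rows[0]]
    for name in TICKET_ID_HEADERS:
        if name in header:
            column = header.index(name)
            break
    else:
        raise ValueError(f"No ticket ID column found in header: {header}")
    ids = []
    for row in rows[1:]:
        # Skip blank lines and rows that are too short to hold an ID.
        if len(row) > column and row[column].strip():
            ids.append(row[column].strip())
    return ids


format_1 = "Serial No,Ticket ID\n1,78788\n2,78969\n"
format_2 = "Zendesk Tickets ID\n78788\n78969\n"
print(read_ticket_ids(format_1))
print(read_ticket_ids(format_2))
```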

Output

Generates timestamped JSON + CSV files:

- POD mode: output_pod_YYYYMMDD_HHMMSS.json + .csv
- Diagnostics mode: output_diagnostics_YYYYMMDD_HHMMSS.json + .csv
- Both mode: both file pairs, generated in parallel

CSV is designed for Excel pivot table analysis. JSON contains full structured data.
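
The naming pattern above can be reproduced with `strftime`. This sketch is only an illustration of the pattern; `output_paths` is a hypothetical helper, not a function from the project.

```python
from datetime import datetime


def output_paths(mode: str, now: datetime) -> list[str]:
    """Build the JSON and CSV file names for one run."""
    stamp = now.strftime("%Y%m%d_%H%M%S")
    # "both" expands to one file pair per analysis.
    modes = ["pod", "diagnostics"] if mode == "both" else [mode]
    return [f"output_{m}_{stamp}.{ext}" for m in modes for ext in ("json", "csv")]


run_time = datetime(2025, 3, 14, 9, 5, 7)
print(output_paths("pod", run_time))
print(output_paths("both", run_time))
```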

Configuration

Key settings in config.py and .env:

| Setting | Default | Description |
|---|---|---|
| `AZURE_MAX_CONCURRENT` | 10 | Concurrent Azure API calls |
| `AZURE_REQUEST_DELAY` | 0 | Seconds between Azure calls |
| `GEMINI_MAX_CONCURRENT` | 1 | Concurrent Gemini API calls |
| `GEMINI_REQUEST_DELAY` | 7 | Seconds between Gemini calls (10 RPM limit) |
| `GEMINI_MODEL` | `gemini-2.5-flash` | Gemini model identifier |
| `AZURE_OPENAI_API_VERSION` | `2024-10-21` | Azure API version |
| `ZENDESK_MAX_CONCURRENT` | 10 | Concurrent Zendesk API calls |

All rate limiting settings are overridable via .env.
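
A common way to make defaults overridable is to read each setting from the environment and fall back to the built-in value. The sketch below assumes the .env file has already been loaded into the process environment (for example by a dotenv loader); `DEFAULTS` and `load_rate_limits` are hypothetical names, and config.py may do this differently.

```python
import os
from collections.abc import Mapping

# Defaults copied from the settings table; delays are floats so that
# fractional overrides such as "4.5" are accepted.
DEFAULTS = {
    "AZURE_MAX_CONCURRENT": 10,
    "AZURE_REQUEST_DELAY": 0.0,
    "GEMINI_MAX_CONCURRENT": 1,
    "GEMINI_REQUEST_DELAY": 7.0,
    "ZENDESK_MAX_CONCURRENT": 10,
}


def load_rate_limits(environ: Mapping[str, str] = os.environ) -> dict[str, float]:
    """Return rate-limit settings, preferring environment values over defaults."""
    settings = {}
    for name, default in DEFAULTS.items():
        raw = environ.get(name, "").strip()
        # Cast the override to the same type as the default (int or float).
        settings[name] = type(default)(raw) if raw else default
    return settings


overrides = {"GEMINI_REQUEST_DELAY": "4.5", "AZURE_MAX_CONCURRENT": "5"}
limits = load_rate_limits(overrides)
print(limits["GEMINI_REQUEST_DELAY"], limits["AZURE_MAX_CONCURRENT"])
```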

Architecture

main.py                  CLI orchestrator, shared semaphore
  |
config.py                Configuration, prompts, rate limits
schemas.py               Pydantic schemas for structured LLM outputs
  |
llm_provider.py          Factory pattern: Azure + Gemini providers
  |                      Native structured outputs on both
  |
fetcher.py               Zendesk API client (async, 10 concurrent)
synthesizer.py           Ticket synthesis (structured output)
categorizer.py           POD categorization (structured output, few-shot)
diagnostics_analyzer.py  Diagnostics gap analysis (structured output)
csv_exporter.py          CSV export for pivot tables
utils.py                 Logging, HTML stripping, validation
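
The factory pattern noted for llm_provider.py can be pictured along these lines. Every class and function name here is hypothetical, and the `generate` methods are stubs that make no network calls; the sketch only shows how callers can select a provider by name without knowing the concrete class.

```python
from abc import ABC, abstractmethod


class LLMProvider(ABC):
    """Common interface that the rest of the pipeline would depend on."""

    @abstractmethod
    def generate(self, prompt: str) -> str:
        ...


class AzureProvider(LLMProvider):
    def generate(self, prompt: str) -> str:
        return f"[azure stub] {prompt}"  # a real provider would call the API here


class GeminiProvider(LLMProvider):
    def generate(self, prompt: str) -> str:
        return f"[gemini stub] {prompt}"  # a real provider would call the API here


# Registry keyed by the --model-provider values.
_PROVIDERS = {"azure": AzureProvider, "gemini": GeminiProvider}


def create_provider(name: str) -> LLMProvider:
    """Return a provider instance for the given name."""
    try:
        return _PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"Unknown provider: {name!r}") from None


provider = create_provider("gemini")
print(provider.generate("Summarize ticket 78788"))
```

With this shape, supporting another provider would mean adding one class and one registry entry, leaving the synthesizer and analyzers unchanged.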

Troubleshooting

| Error | Solution |
|---|---|
| `max_tokens unsupported` | Update `.env`: `AZURE_OPENAI_API_VERSION=2024-10-21` |
| `ZENDESK_API_KEY not set` | Add credentials to `.env` |
| `AZURE_OPENAI_ENDPOINT not set` | Add all 4 Azure vars to `.env` |
| Rate limiting (429) | Use Azure for bulk; reduce concurrency in `.env` |
| Ticket not found | Verify ticket IDs exist and you have Zendesk access |

Check logs/app_YYYYMMDD.log for detailed debug info.
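
To spot "not set" errors before starting a run, a quick check like the following can list which variables are missing. The variable names come from the Installation section; the `missing_vars` helper and the `REQUIRED_VARS` grouping are assumptions for this sketch and are not part of the project.

```python
import os
from collections.abc import Mapping

REQUIRED_VARS = {
    "zendesk": ["ZENDESK_API_KEY", "ZENDESK_SUBDOMAIN", "ZENDESK_EMAIL"],
    "azure": [
        "AZURE_OPENAI_ENDPOINT",
        "AZURE_OPENAI_API_KEY",
        "AZURE_OPENAI_DEPLOYMENT_NAME",
        "AZURE_OPENAI_API_VERSION",
    ],
    "gemini": ["GEMINI_API_KEY"],
}


def missing_vars(provider: str, environ: Mapping[str, str] = os.environ) -> list[str]:
    """Return variables that are unset or blank for Zendesk plus the chosen provider."""
    names = REQUIRED_VARS["zendesk"] + REQUIRED_VARS[provider]
    return [name for name in names if not environ.get(name, "").strip()]


example_env = {
    "ZENDESK_API_KEY": "token",
    "ZENDESK_SUBDOMAIN": "whatfix",
    "ZENDESK_EMAIL": "your_email@example.com",
    "AZURE_OPENAI_ENDPOINT": "https://your-resource.openai.azure.com/",
}
print(missing_vars("azure", example_env))
```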

License

Internal use only - Whatfix
