- [2024/11] I'm Spartacus, No, I'm Spartacus: Measuring and Understanding LLM Identity Confusion
- [2024/10] Good Parenting is all you need -- Multi-agentic LLM Hallucination Mitigation
- [2024/10] Large Language Models are Easily Confused: A Quantitative Metric, Security Implications and Typological Analysis
- [2024/10] Ingest-And-Ground: Dispelling Hallucinations from Continually-Pretrained LLMs with RAG
- [2024/09] Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models
- [2024/08] A Glitch in the Matrix? Locating and Detecting Language Model Grounding with Fakepedia
- [2024/06] Banishing LLM Hallucinations Requires Rethinking Generalization
- [2024/05] Detecting Hallucinations in Large Language Model Generation: A Token Probability Approach
- [2024/05] Unveiling Hallucination in Text, Image, Video, and Audio Foundation Models: A Comprehensive Review
- [2024/05] Mitigating Hallucinations in Large Language Models via Self-Refinement-Enhanced Knowledge Retrieval
- [2024/05] Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
- [2024/04] Mitigating LLM Hallucinations via Conformal Abstention
- [2024/04] Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback
- [2024/04] Uncertainty-Based Abstention in LLMs Improves Safety and Reduces Hallucinations
- [2024/04] Constructing Benchmarks and Interventions for Combating Hallucinations in LLMs
- [2024/04] Reducing hallucination in structured outputs via Retrieval-Augmented Generation
- [2024/04] PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics
- [2024/03] Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding
- [2024/03] HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models
- [2024/03] The Human Factor in Detecting Errors of Large Language Models: A Systematic Literature Review and Future Research Directions
- [2024/03] Enhancing LLM Factual Accuracy with RAG to Counter Hallucinations: A Case Study on Domain-Specific Queries in Private Knowledge-Bases
- [2024/03] Detecting Hallucination and Coverage Errors in Retrieval Augmented Generation for Controversial Topics
- [2024/03] Truth-Aware Context Selection: Mitigating the Hallucinations of Large Language Models Being Misled by Untruthful Contexts
- [2024/03] Unsupervised Real-Time Hallucination Detection based on the Internal States of Large Language Models
- [2024/03] Tell me the truth: A system to measure the trustworthiness of Large Language Models
- [2024/03] ERBench: An Entity-Relationship based Automatically Verifiable Hallucination Benchmark for Large Language Models
- [2024/03] HaluEval-Wild: Evaluating Hallucinations of Language Models in the Wild
- [2024/03] Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification
- [2024/03] In Search of Truth: An Interrogation Approach to Hallucination Detection
- [2024/03] DiaHalu: A Dialogue-level Hallucination Evaluation Benchmark for Large Language Models
- [2024/02] Reducing Hallucinations in Entity Abstract Summarization with Facts-Template Decomposition
- [2024/02] TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space
- [2024/02] Seeing is Believing: Mitigating Hallucination in Large Vision-Language Models via CLIP-Guided Decoding
- [2024/02] Measuring and Reducing LLM Hallucination without Gold-Standard Answers via Expertise-Weighting
- [2024/02] Comparing Hallucination Detection Metrics for Multilingual Generation
- [2024/02] Can We Verify Step by Step for Incorrect Answer Detection?
- [2024/02] Strong hallucinations from negation and how to fix them
- [2024/02] Retrieve Only When It Needs: Adaptive Retrieval Augmentation for Hallucination Mitigation in Large Language Models
- [2024/02] Mitigating Object Hallucination in Large Vision-Language Models via Classifier-Free Guidance
- [2024/02] Can LLMs Produce Faithful Explanations For Fact-checking? Towards Faithful Explainable Fact-Checking via Multi-Agent Debate
- [2024/02] Understanding the Effects of Iterative Prompting on Truthfulness
- [2024/02] Is it Possible to Edit Large Language Models Robustly?
- [2024/02] C-RAG: Certified Generation Risks for Retrieval-Augmented Language Models
- [2024/02] A Survey on Hallucination in Large Vision-Language Models
- [2024/01] Hallucination is Inevitable: An Innate Limitation of Large Language Models
- [2024/01] Mitigating Hallucinations of Large Language Models via Knowledge Consistent Alignment
- [2024/01] Large Language Models are Null-Shot Learners
- [2024/01] Model Editing Can Hurt General Abilities of Large Language Models
- [2024/01] Relying on the Unreliable: The Impact of Language Models' Reluctance to Express Uncertainty
- [2024/01] Seven Failure Points When Engineering a Retrieval Augmented Generation System
- [2023/12] DelucionQA: Detecting Hallucinations in Domain-specific Question Answering
- [2023/12] Improving Factual Error Correction by Learning to Inject Factual Errors
- [2023/12] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
- [2023/12] The Earth is Flat because...: Investigating LLMs' Belief towards Misinformation via Persuasive Conversation
- [2023/11] A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions
- [2023/11] Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models
- [2023/11] Deficiency of Large Language Models in Finance: An Empirical Examination of Hallucination
- [2023/11] Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus
- [2023/11] Holistic Analysis of Hallucination in GPT-4V(ision): Bias and Interference Challenges
- [2023/11] Mitigating Large Language Model Hallucinations via Autonomous Knowledge Graph-based Retrofitting
- [2023/11] UHGEval: Benchmarking the Hallucination of Chinese Large Language Models via Unconstrained Generation
- [2023/11] When Large Language Models contradict humans? Large Language Models' Sycophantic Behaviour
- [2023/11] Calibrated Language Models Must Hallucinate
- [2023/10] Explainable Claim Verification via Knowledge-Grounded Reasoning with Large Language Models
- [2023/10] Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity
- [2023/09] Analyzing and Mitigating Object Hallucination in Large Vision-Language Models
- [2023/09] Beyond task performance: evaluating and reducing the flaws of large multimodal models with in-context-learning
- [2023/09] BTR: Binary Token Representations for Efficient Retrieval Augmented Language Models
- [2023/09] Chain-of-Knowledge: Grounding Large Language Models via Dynamic Knowledge Adapting over Heterogeneous Sources
- [2023/09] Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding
- [2023/09] Compressing LLMs: The Truth is Rarely Pure and Never Simple
- [2023/09] Conformal Language Modeling
- [2023/09] CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing
- [2023/09] Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation
- [2023/09] Do Large Language Models Know about Facts?
- [2023/09] Ferret: Refer and Ground Anything Anywhere at Any Granularity
- [2023/09] Fine-Tuning Language Models for Factuality
- [2023/09] INSIDE: LLMs' Internal States Retain the Power of Hallucination Detection
- [2023/09] Lightweight Language Model Calibration for Open-ended Question Answering with Varied Answer Lengths
- [2023/09] MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework
- [2023/09] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
- [2023/09] RAPPER: Reinforced Rationale-Prompted Paradigm for Natural Language Explanation in Visual Question Answering
- [2023/09] Reasoning on Graphs: Faithful and Interpretable Large Language Model Reasoning
- [2023/09] Self-Contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation
- [2023/09] Supervised Knowledge Makes Large Language Models Better In-context Learners
- [2023/09] Teaching Language Models to Hallucinate Less with Synthetic Tasks
- [2023/09] Teaching Large Language Models to Self-Debug
- [2023/09] The Reasonableness Behind Unreasonable Translation Capability of Large Language Model
- [2023/09] Think-on-Graph: Deep and Responsible Reasoning of Large Language Model on Knowledge Graph
- [2023/09] Unveiling and Manipulating Prompt Influence in Large Language Models
- [2023/09] Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models
- [2023/08] Simple synthetic data reduces sycophancy in large language models
- [2023/07] A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation
- [2023/07] Shifting Attention to Relevance: Towards the Uncertainty Estimation of Large Language Models
- [2023/06] Explore, Establish, Exploit: Red Teaming Language Models from Scratch
- [2023/06] Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
- [2023/05] Fact-Checking Complex Claims with Program-Guided Reasoning
- [2023/05] HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models
- [2023/05] Improving Factuality and Reasoning in Language Models through Multiagent Debate
- [2023/05] Knowledge of Knowledge: Exploring Known-Unknowns Uncertainty with Large Language Models
- [2023/05] Mitigating Language Model Hallucination with Interactive Question-Knowledge Alignment
- [2023/05] Sources of Hallucination by Large Language Models on Inference Tasks
- [2023/05] Trusting Your Evidence: Hallucinate Less with Context-aware Decoding
- [2023/04] In ChatGPT We Trust? Measuring and Characterizing the Reliability of ChatGPT
- [2023/03] SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
- [2023/02] A Categorical Archive of ChatGPT Failures
- [2023/02] Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback
- [2022/02] Locating and Editing Factual Associations in GPT
- [2022/02] Survey of Hallucination in Natural Language Generation