Quantifying LLM Reliability Across Risk Scenarios for Trustworthy Enterprise AI
SINGAPORE, March 10, 2026 /PRNewswire/ -- Appier today announced new research advancing the reliability of Agentic AI systems. Appier's AI research team continues to focus on frontier topics in Agentic AI and Large Language Models (LLMs), exploring forward-looking technical challenges that expand the impact of its research and development efforts and push the boundaries of marketing technology innovation.
In its latest paper, "Answer, Refuse, or Guess? Investigating Risk-Aware Decision Making in Language Models," the team introduces a systematic evaluation framework to measure how language models make decisions under different risk conditions. The approach significantly improves model reliability in high-risk scenarios through a novel methodological design.
The research addresses a key challenge in deploying Agentic AI in enterprise environments: ensuring that autonomous AI decisions are trustworthy. The findings further strengthen Appier's technological leadership in AI while contributing practical insights for the broader Agentic AI ecosystem.
As enterprises move from AI copilots toward autonomous AI agents, reliability has become a critical barrier to adoption. According to a 2025 McKinsey survey, 62% of organizations have already begun experimenting with AI agents, yet inaccuracy remains the most commonly cited risk in enterprise AI adoption.
As an AI-native Agentic AI-as-a-Service (AaaS) company, Appier continues to translate cutting-edge research into enterprise-ready methodologies and product capabilities. This study specifically addresses two major enterprise concerns: AI hallucinations and decision reliability. To tackle this challenge, the research introduces a Risk-Aware Decision-Making framework that converts LLM decisions across varying risk conditions into quantifiable metrics, providing a stronger governance foundation for enterprise AI deployment.
Turning Risk-Aware Strategies into Quantifiable Metrics
Traditional LLM evaluations focus primarily on whether an answer is correct. In enterprise environments, however, the cost of being wrong and the value of refusing to answer differ significantly. The study introduces structured risk parameters—including rewards for correct answers, penalties for incorrect responses, and costs for refusal—to simulate different risk scenarios. Under this framework, models must evaluate their capability, confidence level, and risk conditions before deciding whether to answer, refuse, or guess. Decision quality is then measured by whether the model maximizes expected reward, providing a more realistic assessment of strategic decision-making.
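The decision rule described above can be sketched in a few lines. This is a minimal illustration of expected-reward maximization under the framework's risk parameters; the function name, parameter names, and the example values are illustrative assumptions, not details taken from the paper.

```python
def best_action(confidence: float, reward: float, penalty: float,
                refusal_cost: float) -> str:
    """Choose the action that maximizes expected reward.

    confidence:   the model's estimated probability its answer is correct
    reward:       payoff for a correct answer
    penalty:      payoff (typically negative) for an incorrect answer
    refusal_cost: payoff (typically <= 0) for refusing to answer
    """
    # Expected reward of answering: weighted payoff of being right vs. wrong.
    expected_answer = confidence * reward + (1 - confidence) * penalty
    # Answer only if that beats the known payoff of refusing.
    return "answer" if expected_answer >= refusal_cost else "refuse"

# High-risk scenario: wrong answers are heavily penalized, so a moderately
# confident model should refuse.
print(best_action(confidence=0.6, reward=1.0, penalty=-5.0, refusal_cost=0.0))
# -> refuse

# Low-risk scenario: guessing has little downside, so answering dominates.
print(best_action(confidence=0.6, reward=1.0, penalty=-0.5, refusal_cost=0.0))
# -> answer
```

The same confidence level leads to opposite optimal actions under different risk conditions, which is why the study measures decision quality against expected reward rather than raw accuracy.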
Key Finding: Strategic Imbalance in Existing Models
Using the Risk-Aware Decision-Making framework, the research finds that many leading LLMs display strategic imbalance across risk scenarios. In high-risk settings, models often over-guess despite potential negative consequences. In low-risk scenarios, they may become overly conservative and refuse to answer too frequently. This inconsistency limits both the autonomy and safety of AI systems in enterprise environments. The study suggests the issue is not purely knowledge-related but stems from the model's difficulty in integrating multiple capabilities into a stable decision strategy.
Skill Decomposition Enables More Rational Decisions
To address this challenge, the research proposes a Skill Decomposition approach, breaking decision-making into three steps:
- Task Execution — solving the task to generate an initial answer
- Confidence Estimation — evaluating confidence in that answer
- Expected-Value Reasoning — reasoning about outcomes under risk conditions
This structured reasoning process enables models to determine whether answering or refusing yields the best outcome. The approach allows models to better integrate multiple capabilities and produce more rational and stable decisions in high-risk environments, offering a practical path toward more reliable enterprise AI systems.
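The three steps above can be sketched as a simple pipeline. The step functions below are stand-ins for actual LLM calls, and their names and stubbed outputs are illustrative assumptions rather than Appier's implementation; the sketch only shows how the decomposed skills compose into one decision.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    action: str            # "answer" or "refuse"
    answer: Optional[str]  # the candidate answer, if given


def execute_task(question: str) -> str:
    """Step 1 (Task Execution): produce a candidate answer.
    Stubbed here; in practice this would be an LLM call."""
    return "candidate answer"


def estimate_confidence(question: str, answer: str) -> float:
    """Step 2 (Confidence Estimation): estimate the probability the
    candidate answer is correct. Stubbed here; in practice this could
    use verbalized confidence or token-level probabilities."""
    return 0.7


def expected_value_reasoning(confidence: float, reward: float,
                             penalty: float, refusal_cost: float) -> str:
    """Step 3 (Expected-Value Reasoning): answer only when the expected
    reward of answering is at least the payoff of refusing."""
    ev_answer = confidence * reward + (1 - confidence) * penalty
    return "answer" if ev_answer >= refusal_cost else "refuse"


def decide(question: str, reward: float, penalty: float,
           refusal_cost: float) -> Decision:
    """Compose the three skills into a single risk-aware decision."""
    answer = execute_task(question)
    conf = estimate_confidence(question, answer)
    action = expected_value_reasoning(conf, reward, penalty, refusal_cost)
    return Decision(action, answer if action == "answer" else None)
```

Separating the steps means each capability can be evaluated and improved independently, rather than relying on the model to blend them implicitly in a single response.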
"For Agentic AI to operate in critical enterprise workflows, the key is not only making AI smarter, but making its autonomous decisions more reliable," said Chih-Han Yu, CEO and Co-founder of Appier. "Appier has built its products around AI and continuously invested in world-class research. By turning LLM risk awareness into a quantifiable methodology, this research strengthens the foundation for trustworthy enterprise AI and helps accelerate the real-world adoption of Agentic AI and translate it into scalable business value and ROI."
The research findings have been further integrated into Appier's Agentic AI-powered platforms, including Ad Cloud, Personalization Cloud, and Data Cloud, helping enterprises advance autonomous workflows in a more reliable and trustworthy way.
Looking ahead, Appier will continue leveraging its strong AI research capabilities, proprietary data assets, and deep industry expertise to advance Agentic AI innovation and support enterprises in building more efficient and trustworthy AI-driven operations.
About Appier
Appier (TSE: 4180) is an AI-native Agentic AI-as-a-Service (AaaS) company that empowers business decision-making with cutting-edge AdTech and MarTech solutions. Founded in 2012 with the vision of "Making AI Easy by making software intelligent," Appier endeavors to help businesses turn AI into ROI with its Ad Cloud, Personalization Cloud, and Data Cloud solutions. Appier now has 17 offices across APAC, the US, and EMEA, and is listed on the Tokyo Stock Exchange. Visit www.appier.com for more company information, and visit ir.appier.com/en/ for more IR information.
For media queries, please email pr@appier.com
Logo - https://mma.prnewswire.com/media/2688236/Appier_white_logo__blue_background_Logo.jpg