• Homepage
  • News
  • eSports
  • PC
  • Playstation
  • Xbox
  • Nintendo
  • Reviews
  • Cosplay
What's Hot

Gambling groups betting big on Pennsylvania primaries

June 2, 2026

Martin Scorsese becomes the latest — and most unlikely — Hollywood voice for AI

June 2, 2026

OpenAI Codex expands to enterprise with Sites, plugins, non-dev users

June 2, 2026
Facebook Twitter Instagram
  • Contact
  • Terms & Conditions
  • Privacy Policy
Facebook Twitter Instagram
Gaming MasterGaming Master
Subscribe
  • Homepage
  • News
  • eSports
  • PC
  • Playstation
  • Xbox
  • Nintendo
  • Reviews
  • Cosplay
Gaming MasterGaming Master
Home»Uncategorized»New Microsoft tool lets devs spin up AI behavior tests using text descriptions
Uncategorized

New Microsoft tool lets devs spin up AI behavior tests using text descriptions

By June 2, 2026No Comments3 Mins Read
Facebook Twitter Pinterest LinkedIn Tumblr Email
Share
Facebook Twitter LinkedIn Pinterest Email


AI researchers and labs have advanced by leaps and bounds in evaluating AI models for everything from safety and compliance to sycophancy and alignment. But it appears companies and developers are faced with a new, specific need: making sure that their AI system behaves as intended for their specific product or service.

In a bid to make that testing process simpler, Microsoft on Tuesday took the wraps off ASSERT, short for Adaptive Spec-driven Scoring for Evaluation and Regression Testing.

The open-source framework, Microsoft says, makes evaluating application-specific AI behavior easy by using AI to turn high-level, natural-language descriptions of goals, policies, or intended behaviors into thorough, scored tests that can be investigated.

ASSERT takes plain-language descriptions of an AI model’s expected behavior and policies, turns them into a structured set of acceptable and unacceptable behaviors, generates problem scenarios and test cases, runs them against the target system, and scores the results. It can also record the paths the AI system takes, including intermediate actions and tool calls, so developers can inspect where failures happen.

Devs can provide system context, tools, and constraints, too, if they want to further customize what the evaluations cover.

For example, a developer could specify that a document research AI agent shouldn’t send emails to people outside the company, limit confidential information to C-level executives, and provide concise summaries with prior context in mind. ASSERT will use those rules to generate test cases that check whether the system follows those rules on an ongoing basis.

Image Credits:Microsoft

The framework, according to Microsoft, fills a gap that broader, more general evaluations cannot when AI models are intended to behave in a manner that is shaped by an application or product’s context, policies, and tools.

See also  Amazon announces Prime Day sales date and it's happening a tad earlier this year

“One of the things we’ve learned is that evaluations are absolutely critical to making good decisions,” said Sarah Bird, chief product officer of Responsible AI at Microsoft. “Because if you don’t understand the behavior of the AI system, it’s really hard to know if it’s meeting your organization’s bar […] What we found is that if you really want to have a trustworthy system, you should evaluate many more dimensions that are application-specific.”

Bird said ASSERT can be used to evaluate systems when they’re being built, after deployment, and even for continuous monitoring.

The release comes amidst a gradual but broader shift in the AI industry. As models grow more capable, researchers are focusing on repeatable testing and regression checks, with Stanford’s HELM, MLCommons’ AILuminate, and evaluation groups like METR rolling out benchmarks to measure how models behave under different conditions.

When you purchase through links in our articles, we may earn a small commission. This doesn’t affect our editorial independence.



Source link

behavior descriptions devs lets Microsoft spin tests Text tool
Share. Facebook Twitter Pinterest LinkedIn Tumblr Email

Related Posts

Martin Scorsese becomes the latest — and most unlikely — Hollywood voice for AI

June 2, 2026

OpenAI Codex expands to enterprise with Sites, plugins, non-dev users

June 2, 2026

Audible just launched a rewards program that pays you back for listening every day

June 2, 2026
Add A Comment

Leave A Reply Cancel Reply

Our Picks

Gambling groups betting big on Pennsylvania primaries

June 2, 2026

Martin Scorsese becomes the latest — and most unlikely — Hollywood voice for AI

June 2, 2026

OpenAI Codex expands to enterprise with Sites, plugins, non-dev users

June 2, 2026

Audible just launched a rewards program that pays you back for listening every day

June 2, 2026
  • Facebook
  • Twitter
  • Pinterest
  • Instagram
  • YouTube
  • Vimeo
Don't Miss
eSports

Gambling groups betting big on Pennsylvania primaries

By adminJune 2, 20260

Gambling interest groups are spending big in campaigns ahead of the Pennsylvania primaries this month. …

Martin Scorsese becomes the latest — and most unlikely — Hollywood voice for AI

June 2, 2026

OpenAI Codex expands to enterprise with Sites, plugins, non-dev users

June 2, 2026

Audible just launched a rewards program that pays you back for listening every day

June 2, 2026

Subscribe to Updates

Get the latest creative news from SmartMag about art & design.

About Us
About Us

Targeted Gaming delivers the best and most comprehensive video game and entertainment coverage, including news, reviews, trailers, walkthroughs, and guides for PS4, Xbox One, Nintendo Switch, PC, and More.

We're accepting new partnerships right now.

Latest Posts

Gambling groups betting big on Pennsylvania primaries

June 2, 2026

Martin Scorsese becomes the latest — and most unlikely — Hollywood voice for AI

June 2, 2026

OpenAI Codex expands to enterprise with Sites, plugins, non-dev users

June 2, 2026
Sponsors

Type above and press Enter to search. Press Esc to cancel.