Mission Statement

The mission of the SPEC RG AIware Working Group is to provide a forum to advance the transparent, reproducible, and trustworthy evaluation of AI models and the software they power through rigorous benchmarking, experimental analysis, and the development of standardized tools and processes. By fostering collaboration across academia and industry, the group aims to build meaningful assessment frameworks that capture the performance, behavior, reliability, and usability of AIware in a rapidly evolving technological landscape.

General Topics of Interest

Topics of interest include, but are not limited to, the following:

  • Novel benchmarking and evaluation technologies for AI models and the software systems they power
  • Innovative techniques for tracing, monitoring, and analyzing the behavior of AIware (a tracing sketch follows this list)
  • Standardized processes, best practices, and lessons learned for developing and evaluating AIware
  • Quantitative and qualitative studies on both open-source and closed-source AIware systems
  • Open-source tools, infrastructure, and datasets that support AIware evaluation and analysis
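
As a concrete illustration of the tracing and monitoring topic above, the following Python sketch wraps an arbitrary model-invocation callable so that every call is appended as one JSON line to a local log. This is a minimal sketch, not a group deliverable: the helper name traced, the log path, and the record schema (prompt, response, latency) are illustrative assumptions.

    import functools
    import json
    import time

    def traced(model_call, log_path="aiware_trace.jsonl"):
        # Wrap any model-invocation callable so each call is recorded as
        # one JSON line for later behavioral analysis. The schema here is
        # a hypothetical placeholder, not an agreed trace format.
        @functools.wraps(model_call)
        def wrapper(prompt, **kwargs):
            start = time.monotonic()
            response = model_call(prompt, **kwargs)
            record = {
                "prompt": prompt,
                "response": str(response),
                "latency_s": round(time.monotonic() - start, 4),
            }
            with open(log_path, "a", encoding="utf-8") as log:
                log.write(json.dumps(record) + "\n")
            return response
        return wrapper

Wrapping a client call, e.g. generate = traced(my_model.generate) where my_model is any object exposing a generate(prompt) method, leaves call sites unchanged while producing a JSONL trace suitable for offline analysis.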

Current Activities

  • Survey existing AIware-related benchmarks and evaluation methodologies to identify key gaps and challenges
  • Develop a taxonomy of evaluation dimensions relevant to AIware (see the sketch after this list)
  • Select representative scenarios for pilot studies (e.g., SWE-bench, KernelBench, EffiBench)
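
To make the taxonomy activity more tangible, here is a minimal Python sketch of how evaluation dimensions might be encoded and aggregated across pilot-study scenarios. The dimensions mirror those named in the mission statement (performance, behavior, reliability, usability); the class names, the [0, 1] scoring scale, and the averaging rule are hypothetical placeholders pending the group's actual taxonomy.

    from dataclasses import dataclass
    from enum import Enum

    class Dimension(Enum):
        # Dimensions named in the mission statement; the final taxonomy
        # is still under development, so this enumeration is illustrative.
        PERFORMANCE = "performance"
        BEHAVIOR = "behavior"
        RELIABILITY = "reliability"
        USABILITY = "usability"

    @dataclass
    class ScenarioResult:
        # One pilot-study scenario (e.g., a single SWE-bench task) scored
        # per dimension; the [0, 1] scale is an assumed convention.
        scenario_id: str
        scores: dict[Dimension, float]

    def aggregate(results: list[ScenarioResult]) -> dict[Dimension, float]:
        # Average each dimension over the scenarios that report it.
        totals: dict[Dimension, float] = {}
        counts: dict[Dimension, int] = {}
        for result in results:
            for dim, score in result.scores.items():
                totals[dim] = totals.get(dim, 0.0) + score
                counts[dim] = counts.get(dim, 0) + 1
        return {dim: totals[dim] / counts[dim] for dim in totals}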

Future Activities

  • Release the enhanced benchmarks from the pilot studies as open-source projects
  • Identify common pitfalls and challenges in benchmarking and evaluating AIware, and propose mitigation strategies
  • Provide trusted open-source evaluation platforms for AI models and the systems they power (e.g., sandboxed environments; see the sketch after this list)
  • Collaborate with regulatory and standards organizations to develop formal evaluation guidelines
  • Periodically report on the progress, risks, and social impact of AIware through faithful and meaningful benchmark evaluations
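
As a rough indication of what a sandboxed evaluation environment could look like, the sketch below runs model-generated Python in a separate process with a hard wall-clock limit. It is deliberately minimal and assumes a POSIX host; a trusted platform would layer container- or VM-level isolation, filesystem and network restrictions, and resource accounting on top of this process boundary.

    import os
    import subprocess
    import sys
    import tempfile

    def run_sandboxed(code, timeout_s=10):
        # Execute untrusted, model-generated Python in a child process
        # with a hard timeout. Process isolation alone is NOT a full
        # sandbox; it only bounds runtime and captures output.
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as src:
            src.write(code)
            path = src.name
        try:
            proc = subprocess.run(
                [sys.executable, path],
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
            return proc.returncode, proc.stdout, proc.stderr
        except subprocess.TimeoutExpired:
            # A runaway submission is killed and reported, not propagated.
            return None, "", "killed after %ds" % timeout_s
        finally:
            os.unlink(path)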