About the RG AIware Working Group
Recent innovations in AI and AI-powered software systems, collectively referred to as AIware, have democratized access to a wide range of advanced capabilities. Tasks that once required domain expertise, such as software development or writing poetry, can now be performed by the average user. However, evaluating AIware presents unique challenges that go beyond those of traditional software. These include issues of reproducibility, potential data contamination, and the lack of reliable evaluation oracles; as a result, many evaluations still depend on labor-intensive human judgment rather than automated metrics.
The AIware Working Group within the Standard Performance Evaluation Corporation (SPEC) Research Group (RG) is taking a comprehensive approach to benchmarking, evaluation, and experimental analysis of AI models and the software systems they power. The group’s scope spans both academic and industrial contexts, focusing not only on performance but also on the behavior, reliability, and usability of AIware.
The group aims to develop processes, benchmarks, and evaluation tools to better understand these software systems through both quantitative and qualitative means. This includes standardized metrics, scenario-driven evaluations, and hybrid assessments that leverage both human and AI-assisted judgment. Ultimately, the RG AIware Working Group seeks to advance research and promote best practices for transparent, reproducible, responsible, and trustworthy evaluation in the rapidly evolving field of AIware.
Its membership currently includes representatives from BEZNext, Concordia University, King’s College London, Hewlett Packard Enterprise, Rochester Institute of Technology, University of L’Aquila, University of Waterloo, Wuhan University, York University, and Zhejiang University.