Large language models (LLMs) are gaining increasing popularity in both academia and industry, owing to their unprecedented performance in various applications.
As LLMs continue to play a vital role in both research and daily use, their evaluation becomes increasingly critical, not only at the task level but also at the societal level, to better understand their potential risks. Over the past few years, significant efforts have been made to examine LLMs from various perspectives.
This article presents a comprehensive set of resources to help you understand LLM evaluation, organized around three key questions: what to evaluate, where to evaluate, and how to evaluate.