On December 6th, under the guidance of the Office of the Financial Work Committee of the Shanghai Municipal Committee of the Communist Party of China and the Shanghai Municipal Economic and Information Technology Commission, the Shanghai Artificial Intelligence Industry Association and the Shanghai Financial Industry Federation jointly released the first national group standard focused on financial business capabilities: the "Guidelines for Application Evaluation of Financial Large Models" (hereinafter referred to as the "Evaluation Guidelines"). At the same time, Shanghai Kupas Technology Co., Ltd. released a multidimensional financial large model evaluation dataset (2024 version).

As one of the organizations involved in the group standard, the financial large language model research team led by Professor Zhang Liwen of the School of Statistics and Data Science and the Ding Shui Hu Advanced Financial Institute at Shanghai University of Finance and Economics took an active part in drafting and revising the "Evaluation Guidelines." The university drew on the disciplinary strengths of the School of Statistics and Data Science, the Institute of Data Science and Statistics, the School of Finance, and the Ding Shui Hu Advanced Financial Institute, applying its expertise in fintech, data analysis, artificial intelligence, and other fields to provide professional and intellectual support through interdisciplinary collaboration. This involvement reflects the university's active role in promoting industry standardization and regulation, and shows how it integrates education, scientific research, and social service to support industry development and technological progress.
According to reports, the "Evaluation Guidelines" center on financial business and are oriented toward the application of models in financial institutions. Across five dimensions (foundational model capabilities, financial security and value alignment, financial risk control, financial professional cognition, and financial business expansion assistance), they set out 185 indicator requirements and establish a capability evaluation framework for large models in the financial sector.
For foundational model capabilities, the guidelines define 11 unimodal indicator requirements, including text classification and information extraction, and 6 multimodal indicator requirements, including image-text retrieval and video Q&A, focusing on the model's basic understanding and inference functions. For financial security and value alignment, the guidelines set 9 indicator requirements, including content compliance, cultural values, and ethical values, centered on the model's reliability, interpretability, and privacy protection. For financial risk control, the guidelines propose 19 indicator requirements, including interest rate risk, exchange rate risk, and public opinion risk, focusing on risk prevention and control in the model's practical applications. For financial professional cognition, the guidelines summarize 23 indicator requirements, including accounting and the preparation and analysis of financial statements, centered on the model's basic financial knowledge and information interpretation. For financial business expansion assistance, the guidelines identify 28 business scenarios across five fields (banking, funds, insurance, securities, and trusts), covering 117 indicator requirements, including loan and deposit business and quantitative trading. The "Evaluation Guidelines" also detail the relevant evaluation content, provide corresponding evaluation methods and tools, and give assessment grading standards in an appendix.
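The per-dimension counts above can be checked against the stated total of 185 indicator requirements. The following is a minimal sketch of that tally; the dictionary keys and function names are illustrative labels chosen for this example, not identifiers from the standard, and only the numbers come from the text.

```python
# Indicator counts per capability dimension, as stated in the text.
# Keys are hypothetical labels for illustration, not names from the standard.
INDICATOR_COUNTS = {
    "foundational_model_capabilities": {"unimodal": 11, "multimodal": 6},
    "financial_security_and_value_alignment": {"all": 9},
    "financial_risk_control": {"all": 19},
    "financial_professional_cognition": {"all": 23},
    "financial_business_expansion_assistance": {"all": 117},
}


def dimension_totals(counts):
    """Sum the sub-counts of each dimension into one number per dimension."""
    return {name: sum(parts.values()) for name, parts in counts.items()}


def grand_total(counts):
    """Sum the indicator requirements across all dimensions."""
    return sum(dimension_totals(counts).values())


if __name__ == "__main__":
    for name, n in dimension_totals(INDICATOR_COUNTS).items():
        print(f"{name}: {n}")
    print("total:", grand_total(INDICATOR_COUNTS))
```

Run as a script, this prints one subtotal per dimension followed by the overall total, which matches the 185 indicator requirements given for the framework.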
At the same time, the Shanghai Artificial Intelligence Laboratory, in collaboration with Shanghai University of Finance and Economics and Shanghai Kupas Technology Co., Ltd., released the "Financial Large Model Application Evaluation Report (2024)" (hereinafter referred to as the "Report"). The evaluation focuses on the core business needs of the financial industry and the adaptability of large models to financial scenarios. It assesses the financial professional capabilities of 20 mainstream large models from 14 institutions in key application scenarios across banking, securities, insurance, and funds.

The results show that the evaluated models perform very well in financial security and value alignment, reflecting the industry's strong emphasis on key compliance and ethical issues. However, the models still fall short in foundational model capabilities, financial professional cognition, and especially multimodal processing, and their performance in complex financial business scenarios urgently needs improvement. By overall score, the evaluated models from Anthropic, Step Star / Financial Leap Star, and Alibaba rank in the top three.
Going forward, financial evaluation reports will be published once or twice a year to give the industry objective evaluation results and directional guidance, helping financial technology become more intelligent, specialized, and refined.


