FLAME: Enhancing Functional Coverage in Processor Verification via Large Language Models

¹Tianjin University
²Huawei Noah's Ark Lab
*Corresponding Author

Abstract

Processor functional verification plays a crucial role in ensuring the quality of processor designs. Traditional techniques such as Constrained Random Verification (CRV) struggle to achieve high coverage of functional points because of the vast instruction space of processors. While LLM-based techniques show potential, merely instructing LLMs has notable limitations, especially for functional points that require deep semantic understanding. To tackle these challenges, we propose a novel technique, FLAME, which leverages Retrieval-Augmented Generation (RAG), Chain-of-Thought (CoT), and a functional-coverage-guided feedback mechanism. FLAME establishes semantic mappings between functional points and instructions, enabling the iterative generation of valid and effective test cases. Evaluation on four widely used open-source processor designs shows that FLAME outperforms typical and state-of-the-art baselines, improving functional coverage by 34.25% to 220% on average while reducing the time required to reach the same functional coverage by up to 86.13%. Moreover, ablation analysis highlights the vital role of each component in the framework's overall effectiveness. This work demonstrates the effectiveness of our LLM-based technique FLAME in enhancing processor functional verification.

FLAME Framework

The figure above gives an overview of FLAME, our technique for processor functional verification, which aims to cover more functional points automatically. FLAME has three parts: knowledge base construction, LLM-assisted test generation, and functional-coverage-guided feedback. First, FLAME collects extensive processor-design-related information to build a comprehensive knowledge base that supplies essential background. Next, it uses RAG to retrieve information related to the target functional points from this knowledge base and generates high-quality test cases following our Documents-Instructions-Programs CoT. Finally, a functional-coverage-guided feedback mechanism provides the previously generated test cases and their coverage results to the LLM as a reference for iterative generation. Because LLM calls are costly, we do not apply FLAME to functional points that are easy to cover in practice, specifically those that widely used CRV methods already cover efficiently, following an existing study. In other words, FLAME focuses on the functional points that are bottlenecks, which keeps it cost-effective.
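To make the three stages concrete, the following is a minimal sketch of a retrieve, prompt, simulate, and feed-back loop. It is not FLAME's actual implementation: the names `Attempt`, `retrieve`, `build_prompt`, and `coverage_guided_generation` are hypothetical, the word-overlap retrieval is a toy stand-in for a real RAG retriever, and the `generate` (LLM call) and `simulate` (processor simulation with coverage collection) callables are assumed to be supplied by the user.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    test_case: str       # generated program text
    covered: frozenset   # functional points the simulator reported as hit


def retrieve(knowledge_base, functional_point, top_k=2):
    """Toy retrieval: rank documents by word overlap with the functional point.

    A real RAG setup would likely use embeddings; this is only a placeholder.
    """
    query = set(functional_point.lower().replace("_", " ").split())

    def overlap(text):
        return len(query & set(text.lower().split()))

    return sorted(knowledge_base, key=overlap, reverse=True)[:top_k]


def build_prompt(functional_point, documents, history):
    """Assemble a prompt from retrieved documents and earlier failed attempts."""
    lines = [f"Target functional point: {functional_point}", "Relevant documents:"]
    lines += [f"- {doc}" for doc in documents]
    lines.append("Reason step by step: documents -> instructions -> program.")
    if history:
        lines.append("Previous attempts (did not cover the target):")
        lines += [f"- {a.test_case} (covered: {sorted(a.covered)})" for a in history]
    return "\n".join(lines)


def coverage_guided_generation(targets, knowledge_base, generate, simulate,
                               max_rounds=3):
    """Iterate until every target is covered or the round budget is spent.

    generate(prompt) -> test case text      (assumed LLM wrapper)
    simulate(test_case) -> iterable of covered functional points (assumed)
    """
    uncovered = set(targets)
    history = {fp: [] for fp in targets}
    for _ in range(max_rounds):
        if not uncovered:
            break
        for fp in sorted(uncovered):
            if fp not in uncovered:  # already hit by an earlier test this round
                continue
            prompt = build_prompt(fp, retrieve(knowledge_base, fp), history[fp])
            test_case = generate(prompt)
            covered = frozenset(simulate(test_case))
            history[fp].append(Attempt(test_case, covered))
            uncovered -= covered
    return uncovered, history
```

In this sketch, the feedback step is simply the `history` list that is appended to the next prompt for the same functional point, so each round sees what earlier test cases did and did not cover. Restricting `targets` to the points that CRV fails to cover would mirror the cost-saving choice described above.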

Experiments

RQ1: How do different LLMs and test case formats perform in FLAME?

RQ2: How does FLAME perform compared to existing test case generation techniques in processor functional verification?

RQ3: How does each key component of FLAME contribute to the overall effectiveness?

Conclusion

We propose a novel LLM-based test generation framework, FLAME, to cover more functional points in processor verification automatically. By leveraging RAG, CoT, and a functional-coverage-guided feedback mechanism, FLAME establishes semantic mappings between functional points and instructions, enabling the iterative generation of valid and effective test cases. Evaluation on four widely used processor designs demonstrates that FLAME surpasses baselines in functional coverage improvement while significantly reducing the time required to reach the same functional coverage. Additionally, ablation analysis underscores the critical contribution of each component to the overall effectiveness. Future work will expand the evaluation to a broader range of processor designs and explore more efficient LLM strategies.
