CNN Plus Accelerator IP

AI Acceleration on Low-Power FPGAs

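Customizable convolutional neural network (CNN) IP: CNN Plus IP is a flexible accelerator IP that draws on the parallel processing capability, distributed memory, and DSP resources of Lattice FPGAs to make ultra-low-power AI easy to implement.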

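Configurable usage modes: Two implementation modes are provided, Compact (low power) and high performance. Compact mode is a low-power processing mode that uses the FPGA's local memory. High-performance mode is optimized for implementing larger networks.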

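Easy to implement: Models trained with common machine learning frameworks such as TensorFlow can be compiled with the Lattice Neural Network Compiler tool and then implemented in hardware through the CNN Plus Accelerator IP.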

Features

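  • Executes a series of computations for each command sequence generated by the Lattice Neural Network Compiler tool
  • Configurable resource usage to balance power and performance
  • Runs in either Compact (low-power) mode or high-performance mode
  • Uses internal and external memory resources and manages access to them to optimize performance
  • Configurable neural network weight bit width (16-bit, 8-bit, or 1-bit)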
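
As a rough illustration of why the configurable weight bit width matters on a small FPGA, the sketch below estimates the raw storage needed for a network's weights at each of the three listed widths. The helper name and the 100,000-weight example are hypothetical, and the estimate assumes dense packing with no alignment or padding overhead, which the actual compiler output may not match.

```python
import math

# Weight widths listed in the feature list above.
SUPPORTED_WEIGHT_BITS = (16, 8, 1)


def weight_storage_bytes(num_weights: int, bits: int) -> int:
    """Raw bytes for num_weights weights at the given width.

    Hypothetical helper: assumes dense packing, no padding or alignment.
    """
    if bits not in SUPPORTED_WEIGHT_BITS:
        raise ValueError(f"unsupported weight width: {bits}")
    return math.ceil(num_weights * bits / 8)


if __name__ == "__main__":
    # Illustrative network size only; not taken from the product page.
    for bits in SUPPORTED_WEIGHT_BITS:
        print(f"{bits}-bit weights: {weight_storage_bytes(100_000, bits)} bytes")
```

Under these assumptions, moving from 16-bit to 8-bit weights halves the raw weight storage, and 1-bit weights need one sixteenth of the 16-bit figure.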

Block Diagram

Functional Block Diagram of CNN Plus Accelerator (Compact CNN Type)

Resource Utilization

LFCPNX-100-9BBG484I
| Configuration (Note 3) | clk_i, aclk_i Fmax in MHz (Note 2) | Registers | LUTs | LRAMs (Note 4) | EBRs | Logical DSP: MULT9, MULT18 | Logical DSP: REG18, PREADD9 |
|---|---|---|---|---|---|---|---|
| Default | 121.448, 127.081 | 2566 | 3626 | 2 | 13 | 13, 1 | 13, 13 |
| Scratch Pad Memory Size=2K, Others=Default | 119.104, 125.078 | 2578 | 3639 | 2 | 15 | 13, 1 | 13, 13 |
| Scratch Pad Memory Size=4K, Others=Default | 116.469, 124.270 | 2582 | 3650 | 2 | 19 | 13, 1 | 13, 13 |
| Scratch Pad Memory Size=8K, Others=Default | 122.160, 130.022 | 2590 | 3657 | 2 | 27 | 13, 1 | 13, 13 |
| Scratch Pad Memory Mode=OCTA, Others=Default | 120.685, 127.275 | 2710 | 3844 | 2 | 15 | 13, 1 | 13, 13 |
| Scratch Pad Memory Mode=OCTA, Scratch Pad Memory Size=2K, Others=Default | 123.183, 129.149 | 2714 | 3840 | 2 | 19 | 13, 1 | 13, 13 |
| Scratch Pad Memory Mode=OCTA, Scratch Pad Memory Size=4K, Others=Default | 122.085, 123.047 | 2722 | 3858 | 2 | 27 | 13, 1 | 13, 13 |
| Memory Type=SINGLE_LRAM, Others=Default | 130.005, 130.463 | 2565 | 3608 | 1 | 13 | 13, 1 | 13, 13 |
| Memory Type=QUAD_LRAM, Others=Default | 116.564, 122.011 | 2573 | 3677 | 4 | 13 | 13, 1 | 13, 13 |
| Machine Learning Type=OPTIMIZED_CNN, Others=Default | 113.869, 129.601 | 5470 | 7226 | 2 | 17 | 48, 4 | 48, 48 |
| Machine Learning Type=OPTIMIZED_CNN, Scratch Pad Memory Size=2K, Others=Default | 112.803, 123.686 | 5475 | 7240 | 2 | 21 | 48, 4 | 48, 48 |
| Machine Learning Type=OPTIMIZED_CNN, Scratch Pad Memory Size=4K, Others=Default | 114.863, 124.409 | 5486 | 7265 | 2 | 29 | 48, 4 | 48, 48 |
| Machine Learning Type=OPTIMIZED_CNN, Scratch Pad Memory Size=8K, Others=Default | 113.353, 122.234 | 5490 | 7279 | 2 | 45 | 48, 4 | 48, 48 |
| Machine Learning Type=OPTIMIZED_CNN, Line Buffer Size=1024, Others=Default | 115.473, 126.662 | 5475 | 7250 | 2 | 21 | 48, 4 | 48, 48 |
| Machine Learning Type=OPTIMIZED_CNN, Line Buffer Size=2048, Others=Default | 118.161, 122.609 | 5477 | 7266 | 2 | 30 | 48, 4 | |
| Machine Learning Type=OPTIMIZED_CNN, Scratch Pad Memory Size=8K, Maximum Burst Length=256, Others=Default | 119.446, 126.263 | 5491 | 7279 | 2 | 45 | 48, 4 | 48, 48 |
| Machine Learning Type=OPTIMIZED_CNN, Convolution Engine=DUAL_CONV, Others=Default | 113.225, 124.657 | 7394 | 9585 | 2 | 21 | 84, 4 | 84, 84 |
| Machine Learning Type=OPTIMIZED_CNN, Convolution Engine=QUAD_CONV, Others=Default | 117.233, 124.008 | 11023 | 14216 | 2 | 29 | 156, 4 | 156, 156 |
| Machine Learning Type=OPTIMIZED_CNN, LRAM Enable Output Register=Checked, Others=Default | 147.842, 131.027 | 5475 | 7252 | 2 | 17 | 48, 4 | 48, 48 |
| Embedded Mode=Checked, Others=Default | 127.259, N/A | 1737 | 2474 | 2 | 5 | 13, 1 | 13, 13 |
| Embedded Mode=Checked, Machine Learning Type=OPTIMIZED_CNN, Others=Default | 120.715, N/A | 4673 | 6109 | 2 | 9 | 48, 4 | 48, 48 |
| Embedded Mode=Checked, Line Buffer Size=1024, Machine Learning Type=OPTIMIZED_CNN, Others=Default | 114.116, N/A | 4677 | 6112 | 2 | 13 | 48, 4 | 48, 48 |
| Embedded Mode=Checked, Machine Learning Type=OPTIMIZED_CNN, Convolution Engine=DUAL_CONV, Others=Default | 118.991, N/A | 6596 | 8463 | 2 | 13 | 84, 4 | 84, 84 |
| Machine Learning Type=EXTENDED_CNN, Others=Default | 134.120, 124.750 | 5798 | 8049 | 2 | 20 | 48, 4 | 48, 48 |
| Machine Learning Type=EXTENDED_CNN, Scratch Pad Memory Size=2K, Others=Default | 126.406, 130.497 | 5803 | 8051 | 2 | 24 | 48, 4 | 48, 48 |
| Machine Learning Type=EXTENDED_CNN, Scratch Pad Memory Size=4K, Others=Default | 150.466, 114.403 | 5811 | 8059 | 2 | 32 | 48, 4 | 48, 48 |
| Machine Learning Type=EXTENDED_CNN, Scratch Pad Memory Size=8K, Others=Default | 132.732, 127.926 | 5815 | 8041 | 2 | 48 | 48, 4 | 48, 48 |
| Machine Learning Type=EXTENDED_CNN, Maximum Argument Size=8192, Others=Default | 143.947, 128.287 | 5798 | 8051 | 2 | 22 | 48, 4 | 48, 48 |
| Machine Learning Type=EXTENDED_CNN, Convolution Engine=DUAL_CONV, Others=Default | 144.071, 127.665 | 7722 | 10404 | 2 | 24 | 84, 4 | 84, 84 |
| Machine Learning Type=EXTENDED_CNN, Convolution Engine=QUAD_CONV, Others=Default | 146.585, 135.208 | 11352 | 15002 | 2 | 32 | 156, 4 | 156, 156 |
| Embedded Mode=Checked, Machine Learning Type=EXTENDED_CNN, Others=Default | 134.88, N/A | 5000 | 6929 | 2 | 12 | 48, 4 | 48, 48 |

Notes:
1. Performance may vary when using a different software version or targeting a different device density or speed grade.
2. Fmax is measured on an FPGA design that contains only the CNN Plus Accelerator IP Core. These values may be lower when user logic is added to the FPGA design.
3. The K value in “Scratch Pad Memory Size=*K” is equivalent to 1024 entries × 2 bytes. For example, 4K is equal to 8 kB of scratch pad memory.
4. The OPTIMIZED_CNN implementation uses considerably more EBRs because it duplicates the EBRs in the convolution scratch storage to enable parallel processing. In addition, some duplicated submodules have their own EBRs: CONV_EU (1 EBR per unit) and POOL (1 EBR shared by 2 units).
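
The conversion in Note 3 can be sketched as a small calculation. The function and constant names below are hypothetical, and the sketch treats 1 kB as 1024 bytes so that it matches the note's own example (4K equals 8 kB).

```python
# Per Note 3: one "K" of scratch pad is 1024 entries, each entry is 2 bytes.
ENTRIES_PER_K = 1024
BYTES_PER_ENTRY = 2


def scratch_pad_bytes(size_k: int) -> int:
    """Scratch pad capacity in bytes for a 'Scratch Pad Memory Size=<size_k>K' setting."""
    if size_k <= 0:
        raise ValueError("size_k must be positive")
    return size_k * ENTRIES_PER_K * BYTES_PER_ENTRY


if __name__ == "__main__":
    # Sizes that appear in the resource utilization table.
    for size_k in (2, 4, 8):
        total = scratch_pad_bytes(size_k)
        print(f"{size_k}K -> {total} bytes ({total // 1024} kB)")
```

For the 4K setting this gives 8192 bytes, which agrees with the 8 kB figure quoted in Note 3.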

Ordering Information

| Product Family | Ordering Part Number | Description |
|---|---|---|
| CrossLink-NX | CNNPLUS-ACCEL-CNX-U | Single Design License |
| CrossLink-NX | CNNPLUS-ACCEL-CNX-UT | Site License |

Documentation

Quick Reference

| Title | Number | Version | Date | Format | File Size |
|---|---|---|---|---|---|
| CNN Plus Accelerator IP Core - User Guide | FPGA-IPUG-02115 | 1.5 | 12/5/2023 | PDF | 853.8 KB |