CNN Plus Accelerator IP

AI Acceleration with Low-Power FPGAs

Customizable convolutional neural network (CNN) IP — The CNN Plus IP is a flexible accelerator IP that leverages the parallel processing capability, distributed memory, and DSP resources of Lattice FPGAs to deliver ultra-low-power AI.

Configurable usage modes — Two implementation modes are offered: low-power (Compact) and high-performance. The low-power mode uses the FPGA's on-chip memory for low-power processing, while the high-performance mode is optimized for larger network implementations.

Easy to implement — Models trained with common machine learning frameworks such as TensorFlow can be compiled with the Lattice Neural Network Compiler tool and then implemented in hardware through the CNN Plus Accelerator IP.

Features

  • Executes a series of computations for each command sequence generated by the Lattice Neural Network Compiler tool
  • Configurable resource usage to balance power and performance
  • Runs in either low-power (Compact) mode or high-performance mode
  • Uses both internal and external memory resources, managing access to optimize performance
  • Configurable neural network weight bit widths (16-bit, 8-bit, 1-bit)
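The configurable weight bit widths above trade accuracy for storage and bandwidth. As a rough illustration only, the sketch below applies symmetric uniform quantization to a weight vector at 16, 8, and 1 bit; the IP's actual quantization scheme is defined by the Lattice Neural Network Compiler, not by this code, and the function name is hypothetical.

```python
import numpy as np

def quantize_weights(w, bits):
    """Illustrative symmetric uniform quantization of weights to a bit width.

    This is a generic sketch, not the Lattice compiler's actual scheme.
    """
    if bits == 1:
        # Binary weights: keep only the sign (+1 / -1).
        return np.where(w >= 0, 1.0, -1.0)
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for 8-bit, 32767 for 16-bit
    m = float(np.max(np.abs(w)))
    scale = m / qmax if m > 0 else 1.0    # map the largest weight to qmax
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale                      # dequantized values for comparison

w = np.array([0.50, -0.25, 0.10, -0.05])
for bits in (16, 8, 1):
    print(bits, quantize_weights(w, bits))
```

At 16 bits the round trip is nearly lossless; at 1 bit only the sign of each weight survives, which is why binarized networks need the smallest memory footprint.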

Block Diagrams

CNN Plus IP low-power mode block diagram

CNN Plus IP high-performance mode block diagram

Performance and Resource Utilization

CrossLink-NX Performance and Resource Utilization

Configuration³ | clk_i, aclk_i Fmax (MHz)² | Slice Registers | LUTs | LRAMs | EBRs⁴ | Logical DSP (MULT9, MULT18) | Logical DSP (MULT18, PREADD9)
Default | 116.401, 118.652 | 2855 | 3673 | 2 | 12 | 13, 1 | 13, 13
Scratch Pad Memory Size=4K, Others=Default | 119.962, 118.259 | 2890 | 3689 | 2 | 15 | 13, 1 | 13, 13
Scratch Pad Memory Size=8K, Others=Default | 121.832, 116.009 | 2898 | 3685 | 2 | 19 | 13, 1 | 13, 13
Scratch Pad Memory Size=16K, Others=Default | 118.751, 113.598 | 2880 | 3703 | 2 | 27 | 13, 1 | 13, 13
Memory Type=SINGLE_LRAM, Others=Default | 115.062, 113.404 | 2869 | 3631 | 1 | 12 | 13, 1 | 13, 13
Machine Learning Type=OPTIMIZED_CNN | 123.609, 113.662 | 5687 | 7693 | 2 | 17 | 48, 4 | 48, 48
Machine Learning Type=OPTIMIZED_CNN, Scratch Pad Memory Size=2K, Others=Default | 117.564, 109.158 | 5695 | 7717 | 2 | 21 | 48, 4 | 48, 48
Machine Learning Type=OPTIMIZED_CNN, Scratch Pad Memory Size=4K, Others=Default | 124.239, 118.092 | 5709 | 7711 | 2 | 29 | 48, 4 | 48, 48
Machine Learning Type=OPTIMIZED_CNN, Scratch Pad Memory Size=8K, Others=Default | 120.963, 112.133 | 5707 | 7706 | 2 | 45 | 48, 4 | 48, 48
Machine Learning Type=OPTIMIZED_CNN, Scratch Pad Memory Size=8K, Maximum Burst Length=256, Others=Default | 123.289, 120.875 | 5709 | 7722 | 2 | 45 | 48, 4 | 48, 48

1. Performance may vary when using a different software version or targeting a different device density or speed grade.
2. Fmax is generated when the FPGA design contains only the CNN Plus Accelerator IP Core. These values may be reduced when user logic is added to the FPGA design.
3. The K value in "Scratch Pad Memory Size=*K" is equivalent to 1024 entries x 2 bytes. For example, 4K is equal to 8 kB of scratch pad memory.
4. The OPTIMIZED_CNN implementation uses significantly more EBRs because it duplicates the EBRs in the convolution scratch storage to enable parallel processing. In addition, some duplicated submodules have their own EBRs: CONV_EU (1 EBR per unit) and POOL (1 EBR shared by 2 units).
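The scratch pad sizing rule from footnote 3 can be checked with a few lines of arithmetic; the helper function name below is ours, not part of any Lattice tool.

```python
# Scratch pad memory sizing per footnote 3: a setting of "NK" means
# N x 1024 entries, each entry 2 bytes wide.
ENTRY_BYTES = 2

def scratch_pad_bytes(k_entries):
    """Return the scratch pad size in bytes for an 'NK' setting of N entries (K)."""
    return k_entries * 1024 * ENTRY_BYTES

for k in (2, 4, 8, 16):
    print(f"{k}K -> {scratch_pad_bytes(k) // 1024} kB")
```

For example, the 4K setting yields 4 x 1024 x 2 = 8192 bytes (8 kB), matching the footnote's example.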

Ordering Information

Product Family | Part Number | Description
CrossLink-NX | CNNPLUS-ACCEL-CNX-U | Single design license
CrossLink-NX | CNNPLUS-ACCEL-CNX-UT | Site license

Documentation

Quick Reference
Title | Number | Version | Date | Format | File Size
CNN Plus Accelerator IP User Guide | FPGA-IPUG-02115 | 1.0 | 5/21/2020 | |

