White Paper

ORSPI4 Field-Programmable System-on-a-Chip Solves Design Challenges for 10 Gbps Line Cards

Sidhartha Mohanty and Fred Koons
Lattice Semiconductor Corporation
October 2003

The long-awaited convergence of telecommunication and data networks is finally poised to happen in the next few quarters. The convergence will not be centered on a common protocol or occur at a single Open Systems Interconnection (OSI) layer. Rather, it will form around a common line rate: 10 Gbits/second (Gbps).

Venerable Ethernet has scaled from a 10 Mbps standard in the 1980s to the 10 Gbps deployments seen today. Fibre Channel emerged in the early 1990s as the protocol of choice in storage networking applications. Rather than fading away, Fibre Channel has risen like the Phoenix and is also soaring to 10 Gbps. Built for circuit-switched networks, SONET/SDH has steadily migrated into the core of the network, and now into the metropolitan area, moving from OC-3 to OC-192 in less than a decade.

**System-Packet Interface (SPI4.2)**

Resource-strapped equipment vendors must develop strategies that reduce time-to-market while maximizing the reuse of hardware and software. SPI4.2 (System-Packet Interface, Level 4, Phase 2) is a recent system-level interface standard that enables the development of flexible, scalable systems for a converged data and telecommunications infrastructure. Published in 2001 by the Optical Internetworking Forum (OIF), the SPI4.2 standard allows the transmission of multiple protocols at variable, high-speed data rates, including Packet-over-SONET/SDH (POS), OC-192, Ethernet, Fast Ethernet, Gigabit Ethernet, 10 Gigabit Ethernet, and 10 Gigabit Fibre Channel SAN. SPI4.2 frees designers from spending valuable resources on proprietary ASIC-based or specialized network processor interfaces to support different data rates and services. The benefit is a common, standards-based interface that facilitates interconnection between various devices.

Designed for packet transfer between a MAC device and a network processor or switch fabric, the SPI4.2 interface supports the aggregate bandwidths required by ATM and Packet-over-SONET/SDH (POS) applications. SPI4.2 provides a common interface for 10 Gbps Wide Area Network (WAN), Local Area Network (LAN), Metro Area Network (MAN), and Storage Area Network (SAN) technologies, and it is ideal for systems that aggregate low-data-rate channels into a single 10 Gbps uplink for long-haul or backbone transmission. Key attributes of the SPI4.2 Interface Specification include:

- Point-to-point connection (for example, between single MAC and single NPU device)
- Minimum 9.952 Gbps Transmit/Receive data paths (separate and independent)
  - 16-bit-wide datapath
  - Source-synchronous double-edge clocking with a 311 MHz minimum operating frequency
  - 622 Mbps minimum data rate per line, 700 Mbps typical data rate for static alignment, 850 Mbps typical data rate for dynamic receiver alignment
  - LVDS I/Os
- Static alignment or Dynamic Alignment (optional) per receive pin
- Support for 256 ports:
  - Suitable for STS-1 granularity in SONET/SDH applications – 192 ports
  - Fast Ethernet granularity in Ethernet applications – 100 ports
- Transmit/Receive FIFO Status Interface, which passes flow control information back to the transmitting device on a continuous basis.
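
The data-path figures in the list above follow from simple arithmetic: 16 data lines, each carrying one bit on every clock edge. The short Python sketch below reproduces the minimum-rate numbers; the function and constant names are illustrative and are not part of the SPI4.2 specification.

```python
DATA_LINES = 16  # SPI4.2 data-path width (one LVDS pair per bit)


def spi42_rates(clock_mhz, lines=DATA_LINES):
    """Return (per-line rate in Mbps, aggregate rate in Gbps) for one direction.

    Double-edge clocking transfers one bit per line on each clock edge,
    so the per-line rate is twice the clock frequency.
    """
    per_line_mbps = 2 * clock_mhz
    aggregate_gbps = per_line_mbps * lines / 1000
    return per_line_mbps, aggregate_gbps


per_line, aggregate = spi42_rates(311)
print(f"{per_line:.0f} Mbps per line, {aggregate:.3f} Gbps aggregate")
```

Calling the same function with a higher clock shows how much headroom a faster interface has over the 10 Gbps payload.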

Figure 1 illustrates the basic architecture for a SPI4.2-based system. Some of the benefits to system designers are obvious:

- Protocol-independence allows for the simultaneous transfer of multiple protocols
- Allows for variable packet sizes
- Simplified interface to a variety of PHY devices preserves system-core card design
- Low pin count (compared to other interfaces like CSIX)
- Use of LVDS I/O reduces power requirements and improves signal integrity
- In-band control further minimizes pin count and lowers overall design cost
- Per-port flow control allows protocols to be mixed at full rate without wasting bandwidth left unused by a given port
- Very low overhead and simplified control mechanisms
- Optional dynamic bit de-skewing allows for easier board design by permitting longer trace lengths and compensating for multiple connectors in the signal path

**Figure 1 — SPI4.2 Block Diagram**

The SPI4.2 interface provides a standards-based interface (OIF) for inter-chip communication. A number of designers are using the SPI4.2 protocol at a lower rate (one-quarter data rate and clock, 2.5 Gbps) for OC-48 applications. This gives the board a migration path to the next-generation 10 Gbps interface, which uses a faster clock and a higher data rate. The Lattice ORSPI4 FPSC supports both quarter-rate and full-rate operation on the same device.

**Layout Constraints**

Traditional parallel architectures employ a central synchronous clock distribution scheme that enables data to be clocked in and out of target devices. Source-synchronous timing schemes transmit both the clock and data from a common driver. Since SPI4.2 is a source-synchronous parallel interface, it can be susceptible to signal skew caused by varying layout constraints. Connector reflections can also cause significant misalignment of signals at the receiving device. Figure 2 shows three common board scenarios: chip-to-chip with matched trace lengths; point-to-point through connectors with varying trace lengths; and chip-to-chip with varying trace lengths.

SPI4.2 defines optional static and dynamic timing modes to handle these cases. A static alignment scheme allows for shifting the clock phase by a fixed amount. This is useful when trace lengths can be precisely matched. For the more common case where trace lengths vary, SPI4.2 employs a de-skew technique that relies on a built-in training sequence with user-selectable repetition rate and duration. Referred to as Dynamic Alignment, this timing mode effectively eliminates phase errors due to PCB traces of unequal lengths by continuously monitoring the data and adjusting the phase of the clock to align with it. Figure 3 illustrates the effect of dynamic alignment. This continuous operation essentially negates changes in jitter or skew due to temperature and process variations or other factors that can wreak havoc on a high-speed system.

Two key elements are required on the SPI4.2 Receiver block to handle Dynamic Alignment:

- Clock Phase Recovery: The receiver block needs to evaluate multiple phases of the clock against the incoming data to ensure that the data is reliably sampled. An alternative approach is to evaluate multiple phases of the data against a single clock, which requires a potentially long delay chain to create the different phases.
- Dynamic Data Line De-skew: There are expected differences in arrival times of each data line due to board layout. Since different clocks are used to sample each data line, a de-skew function is needed on the board to compensate for the +/- bit of relative skew between the lines.

The ORSPI4 FPSC handles these functions using an analog CDR-like function on the receiver. The receiver block evaluates the data over 16 phases of the clock, ensuring that the data is reliably sampled. This approach provides a much more reliable sampling point.
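
One way to picture multi-phase sampling is a toy model in which each of the 16 clock phases is marked good or bad according to whether it recovered a training pattern correctly, and the receiver then picks the phase in the middle of the longest error-free run. The Python sketch below is purely conceptual: the ORSPI4 performs this with analog CDR-like circuitry whose internals are not described here, and the function name and the boolean-list input are assumptions made for illustration.

```python
def pick_sampling_phase(phase_ok: list[bool]) -> int:
    """Return the index at the centre of the longest circular run of good phases.

    phase_ok[i] is True when sampling with clock phase i recovered the
    training pattern without error. The phases span one bit period and
    wrap around, so the run is searched circularly.
    """
    n = len(phase_ok)
    if not any(phase_ok):
        raise ValueError("no clock phase sampled the data reliably")
    if all(phase_ok):
        return 0  # every phase works; any choice is as good as another
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    # Walk the circle twice so a run that wraps past the last phase is seen whole.
    for i in range(2 * n):
        if phase_ok[i % n]:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    return (best_start + (best_len - 1) // 2) % n


# Phases 3..12 sample cleanly; the phases near the bit edges do not.
eye = [False] * 3 + [True] * 10 + [False] * 3
print(pick_sampling_phase(eye))  # 7
```

With more phases available, the chosen sampling point can sit closer to the true centre of the data eye, which is the intuition behind using 16 phases rather than 4 or 8.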

**XAUI — The MAC-to-PHY Interface**

Another interface essential for 10 Gbps communications systems is the MAC-to-PHY interface. XAUI (X stands for 10 Gbps and AUI means Attachment Unit Interface) is an Ethernet-only specification (IEEE 802.3ae), specifically defined for MAC-to-PHY (XENPAK, X2, or XPAK optics modules) connections in 10 Gbps Ethernet systems. Another similar interface is 10 Gbps Fibre Channel XAUI for connection to 10 Gbps Fibre Channel MAC devices. There is rapidly growing interest in using XAUI as a 10 Gbps backplane interface for a number of reasons:

- XAUI is a low pin count, self-clocked serial bus directly evolved from Gigabit Ethernet
- By arranging four serial lanes, the 4-bit XAUI interface supports the ten-times data throughput required by 10 Gigabit Ethernet
- The XAUI employs the same robust 8B/10B transmission code of Gigabit Ethernet to provide a high level of signal integrity across the common FR-4 backplanes
- Inherently low EMI (Electro-Magnetic Interference) due to its self-clocked nature, plus compensation for multi-bit bus skew, allowing significantly longer chip-to-chip distances
- Built-in error detection and fault isolation capabilities
- Low power consumption
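
The lane arithmetic behind XAUI can be checked in a few lines. The sketch below assumes the 3.125 Gbaud per-lane signalling rate that IEEE 802.3ae defines for XAUI; the constant and function names are illustrative.

```python
LANES = 4
LANE_RATE_GBPS = 3.125     # per-lane line rate, including 8B/10B overhead
CODE_EFFICIENCY = 8 / 10   # 8B/10B carries 8 data bits in every 10 line bits


def payload_gbps(lanes, lane_rate_gbps):
    """Usable data rate after removing 8B/10B coding overhead."""
    return lanes * lane_rate_gbps * CODE_EFFICIENCY


print(f"{payload_gbps(LANES, LANE_RATE_GBPS):.1f} Gbps of payload over {LANES} lanes")
```

The 25% line-rate overhead is the price paid for the DC balance and embedded clock that make the self-clocked serial lanes possible.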

XAUI and SPI4.2 are complementary interfaces: XAUI sits at the OSI Layer 1–Layer 2 boundary, while SPI4.2 resides in the MAC layer. Interoperable interfaces like XAUI and SPI4.2 allow architects to focus their resources on the real value-added portions of a communication system, namely the hardware and software at higher layers.

**Figure 4 — SPI4.2 around the MAC, XAUI on the backplane**

**Packet Buffering**

Packet buffering is another of the tough design challenges in packet switching/routing systems, especially at 10 Gbps. Memory management systems and external memory reads and writes have to be fast enough to sustain the 10 Gbps line rate so that they do not become a bottleneck. If a line card is to support a 100% speedup to the switch fabric, then buffer memory must support at least 24 Gbps total bandwidth (read/write): 10 Gbps of data in each direction plus 2 Gbps of overhead in each direction.

To determine the per-pin bit rate requirement at the memory interface, divide the 24 Gbps requirement by the memory width. The clock speed required to achieve this bit rate depends on the memory type. For a double data rate (DDR) interface, the clock rate is half the per-pin bit rate. For example, if the memory width is 32 bits, the per-pin bit rate requirement is 24 Gbps/32 = 750 Mbps. If the memory were DDR, the required clock rate would be 375 MHz. Since few memories will run this fast, either the memory width must be expanded to lower the clock rate to 200 MHz, which is possible with DDR DRAM families like FCRAM and RLDRAM, or separate read and write buses must be used, which is possible with QDR SRAM devices. QDR SRAM devices are typically the best memory devices for high-speed 10 Gbps applications because of the higher peak bandwidths they provide through the separate read and write buses, and because they reduce the empty cycles spent on pre-charge and activation.
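
The sizing procedure can be expressed as a short calculation. This sketch uses the 24 Gbps requirement and the 32-bit width from the text; the 64-bit width is an additional illustrative assumption showing how widening the bus lowers the required clock.

```python
def memory_interface(total_gbps, width_bits, ddr=True):
    """Return (per-pin rate in Mbps, required clock in MHz) for a shared bus."""
    per_pin_mbps = total_gbps * 1000 / width_bits
    clock_mhz = per_pin_mbps / 2 if ddr else per_pin_mbps
    return per_pin_mbps, clock_mhz


REQUIRED_GBPS = 10 + 10 + 2 + 2  # data plus overhead, both directions

for width in (32, 64):
    pin_rate, clock = memory_interface(REQUIRED_GBPS, width)
    print(f"{width}-bit DDR: {pin_rate:.1f} Mbps per pin, {clock:.1f} MHz clock")
```

Doubling the width to 64 bits brings the DDR clock under the 200 MHz mark, at the cost of twice as many data pins.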

**ORSPI4 Field-Programmable System-on-a-Chip**

Lattice Semiconductor has developed a next-generation FPSC intended for high-speed data transmission. Built on the Series 4 re-configurable embedded System-on-a-Chip (SoC) architecture, the ORSPI4 FPSC contains two SPI4.2 interface blocks, a high-speed QDR II SRAM memory controller, four channels of 600 Mbps to 3.7 Gbps SERDES with 8b/10b encoding/decoding, and over 16,000 Logic Elements, all on a single chip. This is truly the world’s first SoC targeted at Line and Switch card applications for high-speed communications systems. The high-speed blocks are embedded on the ASIC side for higher performance, lower power, and higher density. The FPGA gates are available for the user’s interface functions and proprietary logic specific to the user’s cards. The interface to the FPGA side is a low-speed FIFO interface, which is easy to use from the logic implemented in the FPGA.

The SPI4.2 blocks provide dual 10 Gbps Physical to Link Layer interfaces in conformance with the OIF-SPI4-02.0 specification. Each block provides a bi-directional interface with an aggregate bandwidth of 14.4 Gbps in each direction. This is achieved by using 16 LVDS pairs each for Rx and Tx, operating at a data rate of 900 Mbps with a 450 MHz DDR clock. Both static and dynamic alignment are supported at the receive interface. Dynamic alignment is used to compensate for bit-to-bit skew at higher data rates, where it becomes difficult to meet tight setup/hold requirements. The ORSPI4 supports dynamic bit-by-bit de-skewing over 16 phases of the clock, whereas typical FPGA implementations support a maximum of 4 or 8 phases. The finer phase resolution gives the ORSPI4 a better data sampling point at the high-speed 10 Gbps interface. The ORSPI4 also supports the quarter-rate mode for OC-48 (2.5 Gbps, one-quarter data rate and clock) applications using the SPI4.2 interface and protocols.

DIP-4 and DIP-2 parity generation and checking are supported in the device, embedded on the ASIC side. Data buffering of 8 Kbytes each for transmit and receive is provided by embedded Dual-Port RAM in each SPI4.2 core. The FIFOs enable smooth clock transfer between the high-speed SPI4.2 interface and the low-speed FPGA, allowing clock domain transfer to up to four (4) FPGA clock domains per SPI4.2 interface. The ORSPI4 also contains internal 1K-deep main and shadow calendars supporting scheduling of up to 256 ports. The main calendar is the active calendar that is currently in use. The shadow calendar is the backup calendar that can be updated and/or changed. This facilitates dynamic hit-less bandwidth provisioning as defined by Appendix G of the OIF specification. The entire calendar provisioning scheme is embedded on the ASIC side, thus reducing the complexity for the FPGA. The Transmit and Receive Status FIFOs can also store flow control information for up to 256 ports, the maximum specified in the SPI4.2 specification.

Unlike other SPI4.2 implementations for FPGAs, the ORSPI4 FPSC embeds all the high-speed functions in an ASIC core, allowing the programmable gates to be used for important time-to-market bridging functions. Additionally, embedding these functions within an ASIC core guarantees performance and interoperability. For instance, dynamic alignment can be very challenging in terms of power and speed when using regular FPGA I/Os. The ORSPI4 FPSC embeds complex CDR-like analog functions, high-performance PLLs, and other associated logic to perform dynamic alignment in the ASIC block. In addition, a complex high-speed core like SPI4.2 can be notoriously difficult to place and route in FPGA gates. The ASIC implementation provides a big advantage both in speed and in total power consumption. Typical FPGA implementations can consume upwards of 10 W for one SPI4.2 interface. In comparison, the ORSPI4 dissipates less than 2 W per SPI4.2 interface at 900 Mbps operation, less than one-fifth the power of competitive SPI4.2 FPGA implementations. That’s a big advantage for power-hungry 10 Gbps line cards.
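
The main/shadow calendar arrangement described above is essentially double buffering: a new schedule is written to the inactive copy and then made active in one step, so the scheduler never sees a half-updated calendar. The Python sketch below is a conceptual model only. The class, its method names, and the swap trigger are assumptions made for illustration and do not reflect the ORSPI4 register interface or the exact procedure in Appendix G.

```python
class Calendar:
    """Toy model of a main/shadow port calendar (illustrative only)."""

    MAX_ENTRIES = 1024  # "1K deep" calendar
    MAX_PORTS = 256     # port limit stated for SPI4.2

    def __init__(self, entries):
        self._main = self._validated(entries)
        self._shadow = list(self._main)

    def _validated(self, entries):
        entries = list(entries)
        if not 1 <= len(entries) <= self.MAX_ENTRIES:
            raise ValueError("calendar length out of range")
        if any(not 0 <= port < self.MAX_PORTS for port in entries):
            raise ValueError("port number out of range")
        return entries

    def load_shadow(self, entries):
        """Stage a new schedule without disturbing the active one."""
        self._shadow = self._validated(entries)

    def swap(self):
        """Make the staged schedule active in a single step."""
        self._main, self._shadow = self._shadow, self._main

    def port_for_slot(self, slot):
        """Port associated with the given slot of the active calendar."""
        return self._main[slot % len(self._main)]


cal = Calendar([0, 1, 0, 2])    # port 0 holds half of the slots
cal.load_shadow([0, 1, 2, 3])   # re-provision: equal share for four ports
before = [cal.port_for_slot(s) for s in range(4)]
cal.swap()
after = [cal.port_for_slot(s) for s in range(4)]
print(before, after)
```

Because the active list is never edited in place, traffic scheduled from it is unaffected until the swap, which is what makes the re-provisioning hit-less in this model.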

In order to provide wire-speed packet processing, the ORSPI4 also contains an independent Memory Controller Block that provides data buffering between the FPGA logic and external memory and supports a throughput of greater than 20 Gbps. Data is transferred to and from memory through two sets of 36-bit unidirectional data lines (one read, one write) operating at 200 MHz DDR. A set of 72 data signals is available to transfer data across the core-FPGA interface, allowing the system to utilize the bandwidth available with second-generation Quad Data Rate (QDR-II) SRAMs. Of the 72 data signals, 8 can be used either for parity or for data. For some applications, a second memory controller can be added in the FPGA gates to provide two independent line-rate buffers.
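
As a rough check on the throughput figure, the following sketch multiplies out the bus parameters given above. Treating 4 of every 36 lines as parity is an assumption made for illustration, scaled from the statement that 8 of the 72 core-side signals may carry either parity or data.

```python
def qdr_bandwidth_gbps(bits_per_bus, clock_mhz, buses=2):
    """Total bandwidth of `buses` unidirectional DDR buses, in Gbps."""
    transfers_per_second = 2 * clock_mhz * 1e6  # DDR: two transfers per clock
    return buses * bits_per_bus * transfers_per_second / 1e9


all_lines = qdr_bandwidth_gbps(36, 200)
data_only = qdr_bandwidth_gbps(32, 200)
print(f"{all_lines:.1f} Gbps using all 36 lines per bus")
print(f"{data_only:.1f} Gbps if 4 lines per bus carry parity")
```

Either way the read and write buses together exceed 20 Gbps, consistent with the figure quoted for the Memory Controller Block.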

The SPI4.2 interface blocks in the ORSPI4 FPSC contain industry-best features:

<table>
<thead>
<tr>
<th>Feature</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>Multiple SPI4.2 interface cores</td>
<td>• Two independent full-featured OIF-compliant SPI4.2 interfaces for &gt;20 Gbps bandwidth<br>• Supports quarter-rate mode for 2.5 Gbps operation</td>
</tr>
<tr>
<td>Data alignment</td>
<td>• Supports both static and dynamic alignment schemes<br>• Supports dynamic bit de-skew over 16 phases of clock</td>
</tr>
<tr>
<td>Parity generation and checking</td>
<td>DIP-2 and DIP-4 parity generation and checking embedded on the ASIC side</td>
</tr>
<tr>
<td>Calendar support</td>
<td>Embedded 1K-deep main and shadow calendars, supporting scheduling of up to 256 ports and hit-less bandwidth provisioning (Appendix G)</td>
</tr>
<tr>
<td>Flow control flexibility</td>
<td>An embedded set of write and read port descriptor memories allows a flexible flow control interface to be instituted per SPI4.2 port</td>
</tr>
<tr>
<td>Signal integrity</td>
<td>Dedicated LVDS drivers and receivers with a Center Tap included increase performance and reduce jitter</td>
</tr>
<tr>
<td>Low power</td>
<td>Less than 2 W of power for each SPI4.2 interface at 900 Mbps operation with dynamic alignment</td>
</tr>
<tr>
<td>Packet buffering</td>
<td>Embedded high-speed QDR II SRAM memory controller for interface to external QDR II SRAM for line-rate packet buffering</td>
</tr>
<tr>
<td>User design interface</td>
<td>User-friendly FIFO interface to FPGA logic for clock domain transfer and ease of design</td>
</tr>
</tbody>
</table>

The high-speed SERDES block supports four serial links, each operating at up to 3.7 Gbps (2.96 Gbps data rate with 8b/10b encoding and decoding), to provide four full-duplex synchronous interfaces with built-in Rx Clock and Data Recovery (CDR) and transmitter pre-emphasis. The SERDES block is identical to that in the ORT82G5 and ORT42G5 FPSCs, supports embedded 8b/10b encoding/decoding, and implements link state machines for both 10 Gbps Ethernet and Fibre Channel. The state machines are IEEE P802.3ae/D4.01 XAUI-based and also support FC (ANSI X3.230: 1994) link synchronization. The SERDES in the ORSPI4 FPSC delivers industry-best performance with the following features:

<table>
<thead>
<tr>
<th>Feature</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>Widest range of Programmable Data Rates</td>
<td>4 channels from 0.6 Gbps all the way up to industry-leading performance at 3.7 Gbps</td>
</tr>
<tr>
<td>Multiple Standards Compliance</td>
<td>• XAUI-based at 40" of FR-4<br>• Fibre Channel (1G, 2G), XAUI-Ethernet (10G) and XAUI-FC (10G)</td>
</tr>
<tr>
<td>Rx Jitter Tolerance</td>
<td>0.75 UIp-p typical, 0.65 UIp-p worst case, superior to XAUI and Fibre Channel specifications (@ 3.125 Gbps)</td>
</tr>
<tr>
<td>Tx Total Jitter</td>
<td>0.17 UIp-p typical, 0.24 UIp-p worst case, superior to XAUI and Fibre Channel specifications (@ 3.125 Gbps)</td>
</tr>
<tr>
<td>Low Power per SERDES Channel</td>
<td>225 mW worst case, including I/O buffers @ 3.125 Gbps</td>
</tr>
<tr>
<td>Fast Locking Times</td>
<td>Bit Realignment 300 nanoseconds (938 bit times @ 3.125 Gbps) nominal</td>
</tr>
<tr>
<td>Transmitter Output (CML)</td>
<td>• Full-amplitude mode: 0.6 V p-p minimum<br>• Half-amplitude mode: 0.3 V p-p minimum</td>
</tr>
<tr>
<td>Demonstrated Drive Length</td>
<td>40 inches of FR-4 backplane</td>
</tr>
</tbody>
</table>

The ORSPI4 FPSC also contains a semi-dedicated microprocessor interface, a 32-bit internal system bus (with 4 bits of parity), and built-in system registers that act as the control and status center for the device. This interface includes many maskable interrupts from the SPI4.2, SERDES, and memory controller blocks, together with the ability to configure the FPGA device.

Figure 6 below shows various system applications for the ORSPI4 FPSC. The ORSPI4 can be used to provide multiple midplane SPI4.2 interfaces on 10 Gbps line cards, interfacing to 10 Gbps Framers, MACs, Traffic Managers, or Network Processors. The device also provides a SERDES-based XAUI backplane interface for a 10 G Ethernet line card. The embedded memory controller is used for external packet buffering at line rates. An additional high-speed memory controller can be implemented on the FPGA side for external packet buffering for the second 10 Gbps SPI4.2 port. The ORSPI4 FPSC can also be used to bridge between multiple 2.5 Gbps line cards (PL3, U3, Any-Phy) and a 10 Gbps line card (SPI4.2).

**Figure 6 — Applications Examples for the ORSPI4**

**Figure 6A**
Quad 2.5 Gbps to 10 Gbps bridging solution. Implements the interface functions to bridge between legacy 2.5 Gbps applications and 10 Gbps applications.

**Figure 6B**
SPI4.2 to SERDES backplane. Provides midplane (SPI4.2) and backplane (SERDES) interfaces.

**Figure 6C**
10 Gbps Packet Processing Applications. Provides a multiple SPI4.2 interface bridge on a 10 Gbps line card. Provides SPI4.2 interfaces between a 10 Gbps Framer (or MAC) and a 10 Gbps Network Processor, with an external QDR II SRAM memory interface for packet buffering.

**Figure 6D**
SPI4.2 interfaces to 10 G Ethernet MACs. Translation from the SPI4.2 interface to the XGMII interface on a 10 GE MAC.

**Figure 6E**
20 Gbps Packet Processing Applications. Provides multiple (2) SPI4.2 interfaces on a 20 Gbps line card with two memory controllers for line-rate buffering using external memory devices.

**Summary**

The ORSPI4 FPSC is the industry’s most comprehensive programmable 10 Gbps/2.5 Gbps SoC bridge platform. It combines all the capabilities needed to bring a line or switch card to market quickly:

- Dual low-power SPI4.2 interfaces with embedded static or dynamic alignment and dynamic bit de-skew;
- Quad XAUI/Fibre Channel SERDES capable of line speeds up to 3.7 Gbps;
- Embedded memory controller for external QDR II SRAM packet buffering to ensure wire-speed operation;
- Over 16K FPGA Logic Elements to handle a variety of bridging functions.