I was recently in the market for a new FPGA to start building my upcoming projects on.
Given the scale of these projects, a Xilinx UltraScale+ FPGA of the Virtex family would be perfect, but a Kintex series FPGA will be sufficient for early prototyping. Since I did not want to part with the eye-watering amount of money required for a Vivado Enterprise edition license, my choice was effectively narrowed to the FPGA chips available under the WebPack version of Vivado.
Xilinx supported boards per Vivado edition
Unsurprisingly, Xilinx is well aware of how top of the range the Virtex series is, and doesn’t offer any Virtex UltraScale+ chips with the WebPack license. That said, they do offer support for two very respectable Kintex UltraScale+ FPGA models, the XCKU3P and the XCKU5P.
Xilinx product guide, overview for the Kintex UltraScale+ series
These two chips are far from being small hobbyist toys: the smaller XCKU3P already boasts more than 162K LUTs and 16 GTY transceivers capable, depending on the physical constraints imposed by the chip packaging, of operating at up to 32.75Gb/s.
Now that the chip selection has been narrowed down I set out to look for a dev board.
My requirements for the board were that it featured:
at least 2 SFP+ or 1 QSFP connector
a JTAG interface
a PCIe interface at least x8 wide
As to where to get the board from, my options were:
1. Design the board myself
2. Get the AXKU5 or AXKU3 from Alinx
3. See what I could unearth on the second hand market
Although option 1 could have been very interesting, designing a dev board with both a high speed PCIe and ethernet interface was not the goal of today’s project.
As for option 2, Alinx is a newer vendor that is still building up its credibility in the West. Their technical documentation is a bit sparse, but the feedback seems to be positive, with no major issues being reported. Most importantly, Alinx provides very fairly priced development boards in the 900 to 1050 dollar range (+150$ for the HPC FMC SFP+ extension board). Although these are not cheap by any metric, compared to the competition’s price point they are the best value.
Option 2 was coming up ahead until I stumbled upon this ebay listing :
eBay listing for a decommissioned Alibaba Cloud accelerator FPGA
For 200$ this board featured a XCKU3P-FFVB676, 2 SFP+ connectors and a x8 PCIe interface. On the flip side, it came with no documentation whatsoever, no guarantee that it worked, and only the faint promise in the listing that there was a JTAG interface. A sane person would likely have dismissed this as an interesting internet oddity, a remnant of what happens when a generation of accelerator cards gets phased out in favor of the next, or maybe just an expensive paperweight.
But I like a challenge, and the appeal of unlocking the 200$ Kintex UltraScale+ development board was too great to ignore.
As such, I aim for this article to become the documentation paving the way through this mirage.
The debugger challenge #
Xilinx’s UG908 Programming and Debugging User Guide (Appendix D) specifies their blessed JTAG probe ecosystem for FPGA configuration and debug. Rather than dropping $100+ on yet another proprietary dongle that’ll collect dust after the project ends, I’m exploring alternatives. The obvious tradeoff: abandoning Xilinx’s toolchain means losing ILA integration. However, the ILA fundamentally just captures samples and streams them via JTAG USER registers, so there’s nothing preventing us from building our own logic analyzer with equivalent functionality and a custom host interface.
Enter OpenOCD. While primarily targeting ARM/RISC-V SoCs, it maintains an impressive database of supported probe hardware and provides granular control over JTAG operations. More importantly, it natively supports SVF (Serial Vector Format), a vendor-neutral format for describing JTAG operations that Vivado can export.
The documentation landscape is admittedly sparse for anything beyond 7-series FPGAs, and the most recent OpenOCD documentation I could unearth was focused on Zynq ARM core debugging rather than fabric configuration. But the fundamentals remain sound: JTAG is JTAG, SVF is standardized, and the boundary scan architecture hasn’t fundamentally changed.
The approach should be straightforward: generate SVF from Vivado, feed it through OpenOCD with a commodity JTAG adapter, and validate the configuration. Worst case, we’ll need to patch some adapter-specific quirks or boundary scan chain register addresses. Time to find out if this theory holds up in practice.
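As a rough sketch of what the last step of that flow could look like from a script, the snippet below assembles an OpenOCD command line that initialises the adapter, replays an SVF file and exits. The `-f` and `-c` options and the `init`, `svf` and `shutdown` commands are standard OpenOCD; the file names are placeholders, and which adapter and scan chain configuration fits a given probe and board is an assumption left open here.

```python
import shlex


def openocd_svf_command(adapter_cfg, svf_path, extra_cfgs=()):
    """Build the argv for an OpenOCD run that replays one SVF file.

    adapter_cfg, extra_cfgs and svf_path are placeholder paths. Which
    interface config fits a given probe, and how the scan chain is
    declared, are assumptions left to the caller.
    Paths containing spaces would need Tcl quoting, not handled here.
    """
    argv = ["openocd", "-f", adapter_cfg]
    for cfg in extra_cfgs:
        argv += ["-f", cfg]
    # init brings up the adapter and the scan chain, svf replays the
    # vectors, shutdown exits OpenOCD once the replay has finished.
    argv += ["-c", "init", "-c", f"svf {svf_path}", "-c", "shutdown"]
    return argv


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(shlex.join(openocd_svf_command("my_adapter.cfg", "design.svf")))
```

Building the argument list separately from running it keeps the command easy to log and to paste into a terminal when something goes wrong.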
The plan #
So, to recap, the current plan is to buy a second-hand hardware accelerator off eBay at a too-good-to-be-true price, and try to configure it with an unofficial probe using open source software without any clear official support.
If, like me, you have been around the block a few times, you are probably asking the obvious question: what could go wrong? The answer is: many things.
As such, we need a plan for approaching this. The goal of this plan is to outline incremental steps that will build upon themselves with the end goal of being able to use this as a dev board.
1 - Confirming the board works #
First order of business will be to confirm the board is showing signs of working as intended.
There is a high probability that the flash wasn’t wiped before this board was sold off, so the previous bitstream should still be in the flash. Given this board was used as an accelerator, we should be able to use that to confirm the board is working, by checking either that the board presents itself as a PCIe endpoint or that the SFPs are sending the Ethernet PHY idle sequence.
2 - Connecting a debugger to it #
The next step is going to be to try and connect the debugger. The eBay listing advertised there is a JTAG interface, but the picture is grainy enough that where that JTAG is and what pins are available is unclear.
Additionally, we have no indication of what devices are daisy chained together onto the JTAG scan chain. This is an essential question for flashing over JTAG, so it will need to be figured out.
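One way to work out what sits on the scan chain is to read each device's 32-bit IDCODE and split it into its standard IEEE 1149.1 fields. The sketch below shows that split; the example value is only illustrative and was not read from this board.

```python
from typing import NamedTuple


class IdCode(NamedTuple):
    version: int       # bits 31..28
    part_number: int   # bits 27..12
    manufacturer: int  # bits 11..1, JEDEC manufacturer identity


def decode_idcode(raw: int) -> IdCode:
    """Split a 32-bit JTAG IDCODE into its IEEE 1149.1 fields."""
    if raw & 1 == 0:
        # Bit 0 of an IDCODE is always 1; a 0 means the device is in
        # BYPASS or the chain was read with the wrong alignment.
        raise ValueError("bit 0 of a valid IDCODE must be 1")
    return IdCode(
        version=(raw >> 28) & 0xF,
        part_number=(raw >> 12) & 0xFFFF,
        manufacturer=(raw >> 1) & 0x7FF,
    )


if __name__ == "__main__":
    # Illustrative value only, not the IDCODE of the board in this article.
    fields = decode_idcode(0x0362D093)
    print(f"version={fields.version:#x} "
          f"part={fields.part_number:#06x} "
          f"manufacturer={fields.manufacturer:#05x}")
```

A manufacturer field of 0x049 is the code reported by Xilinx devices, which makes it a quick sanity check that the TAP we are talking to is the FPGA and not something else on the chain.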
At this point, it would also be strategic to do some more probing into the FPGA via JTAG. Xilinx FPGAs expose a handful of useful system registers accessible over JTAG. The best known of these interfaces is the SYSMON, which allows us, among other things, to get real-time temperature and voltage readings from inside the chip. Although openOCD doesn’t have SYSMON support out of the box, it would be worthwhile to build it, in order to:
Familiarise myself with openOCD scripting, which might come in handy when building my ILA replacement down the line
Have an easy side channel to monitor FPGA operating parameters
Make a contribution to openOCD, as it has support for interfacing with XADC but not SYSMON
3 - Figuring out the Pinout #
The hardest part will be figuring out the FPGA’s pinout and my clock sources. The questions that need answering are :
what external clock sources do I have, what are their frequencies and which pins are they connected to
which transceivers are the SFPs connected to
which transceivers is the PCIe connected to
4 - Writing a bitstream #
For now I will be focusing on writing temporary configurations over JTAG to the CCLs rather than re-writing the flash.
The plan is to try either writing the bitstream directly through openOCD’s virtex2 + pld drivers, or replaying the SVF generated by Vivado.
Since I believe a low iteration time is paramount to project velocity and getting big things done, I also want to automate the whole Vivado flow, from the RTL to the SVF generation.
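A minimal sketch of how such automation could be kicked off: Vivado can run a Tcl script non-interactively with `-mode batch -source`, and anything after `-tclargs` is passed to the script. The script name `build.tcl` and its arguments below are hypothetical placeholders, not the actual flow used later.

```python
import shlex


def vivado_batch_command(tcl_script, tcl_args=()):
    """Build the argv for a non-interactive Vivado run of one Tcl script.

    tcl_script and tcl_args are placeholders: what the script does with
    its arguments is entirely up to the (hypothetical) script.
    """
    argv = ["vivado", "-mode", "batch", "-source", tcl_script]
    if tcl_args:
        # Everything after -tclargs is handed to the script as argv.
        argv += ["-tclargs", *tcl_args]
    return argv


if __name__ == "__main__":
    # Hypothetical script and arguments, for illustration only.
    cmd = vivado_batch_command("build.tcl", ["top", "out/design.svf"])
    print(shlex.join(cmd))
```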
Simple enough ?
Liveness test #
A few days later my prize arrived via express mail.
My prized Kintex UltraScale+ FPGA board also known as the decommissioned Alibaba cloud accelerator. Jammed transceiver now safely removed.
Unexpectedly it even came with a free 25G SFP28 Huawei transceiver rated for a 300m distance and a single 1m long OS2 fiber patch cable. This was likely not intentional as the transceiver was jammed in the SFP cage, but it was still very generous of them to include the fiber patch cable.
Free additional SFP28-25G-1310nm-300m-SM Huawei transceiver, and 1m long OS2 patch cable
The board also came with a travel case and half of a PCIe to USB adapter and a 12V power supply that one could use to power the board as a standalone device. Although this standalone configuration will not be of any use to me, for those looking to develop just networking interfaces without any PCIe interface, this could come in handy.
Overall the board looked a little worn, but both the transceiver cages and PCIe connectors didn’t look to be damaged.
Standalone configuration #
Before real testing could start, I first did a small power-up test using the PCIe to USB adapter that the seller provided. Judging by the LEDs and the heat dissipated by the FPGA, the board seemed to be powering up, at least at a surface level (pun intended).
PCIe interface #
As a reminder, this next section relies on the flash not having been wiped and still containing the previous user’s design.
Since I didn’t want to directly plug mystery hardware into my prized build server, I decided to use a Raspberry Pi 5 as my sacrificial test device and got myself an external PCIe adapter.
It just so happens that the latest Raspberry Pi, the Pi 5, features an external PCIe Gen 2.0 x1 interface. Our FPGA can handle up to PCIe Gen 3.0 and the board has a x8 wide interface, but since the PCIe standard is backwards compatible and a link can train down to fewer lanes, connecting our FPGA to this Raspberry Pi will work.
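To put numbers on that downgrade, here is a small sketch of the arithmetic. It models negotiation simply as "highest generation and widest width both sides support", which glosses over the real link training process, and it reports line bandwidth after encoding overhead only, before any protocol overhead.

```python
# Per-lane raw rate in GT/s and line-encoding efficiency per generation.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),     # 8b/10b encoding
    2: (5.0, 8 / 10),     # 8b/10b encoding
    3: (8.0, 128 / 130),  # 128b/130b encoding
}


def negotiate_link(host, device):
    """Each side is (max_generation, max_width); return the common link."""
    return min(host[0], device[0]), min(host[1], device[1])


def usable_gbps(generation, width):
    """Line bandwidth in Gb/s after encoding overhead, per direction."""
    rate, efficiency = PCIE_GENERATIONS[generation]
    return rate * efficiency * width


if __name__ == "__main__":
    pi5 = (2, 1)    # Gen 2.0 x1, as stated above
    board = (3, 8)  # Gen 3.0 x8, as stated above
    gen, width = negotiate_link(pi5, board)
    print(f"negotiated: Gen {gen} x{width}, {usable_gbps(gen, width):.1f} Gb/s")
    print(f"native:     Gen 3 x8, {usable_gbps(3, 8):.1f} Gb/s")
```

The Gen 2 x1 link is a small fraction of what the board can do natively, but that is more than enough for a liveness test.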
FPGA board connected to the Raspberry Pi 5 via the PCIe to PCIe x1 adapter
After both the Raspberry and the FPGA were booted, I SSHed into my rpi and started looking for the PCIe enumeration sequence logged from the Linux PCIe core subsystem.
dmesg log :
[ 0.388790] pci 0000:00:00.0: [14e4:2712] type 01 class 0x060400
[ 0.388817] pci 0000:00:00.0: PME# supported from D0 D3hot
[ 0.389752] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 0.495733] brcm-pcie 1000110000.pcie: link up, 5.0 GT/s PCIe x1 (!SSC)
[ 0.495759] pci 0000:01:00.0: [dabc:1017] type 00 class 0x020000
Background information #
Since most people might not be intimately familiar with PCIe terminology, allow me to quickly document what is going on here.
0000:00:00.0 : is the identifier of a specific PCIe device as seen by the kernel; it reads as domain:bus:device.function.
[14e4:2712] : is the device’s [vendor id:device id]. Vendor ids are assigned by the PCI standards body (PCI-SIG) to hardware vendors. Vendors are then free to define their own device ids.
The full list of official vendor id’s and released device id can be found : https://admin.pci-ids.ucw.cz/read/PC/14e4 or in the linux kernel code : https://github.com/torvalds/linux/blob/7aac71907bdea16e2754a782b9d9155449a9d49d/include/linux/pci_ids.h#L160-L3256
type 01 : PCIe has two types of devices: bridges, which allow multiple downstream devices to connect to an upstream device, and endpoints, which are the leaves. Bridges are of type 01 and endpoints of type 00.
class 0x060400 : is the PCIe device class; it categorizes the kind of function the device performs. It uses the following format: 0x[Base Class (8 bits)][Sub Class (8 bits)][Programming Interface (8 bits)] (note: the sub class field might be unused).
A list of class and sub class identifiers can be found: https://admin.pci-ids.ucw.cz/read/PD or again in the linux codebase : https://github.com/torvalds/linux/blob/7aac71907bdea16e2754a782b9d9155449a9d49d/include/linux/pci_ids.h#L15-L158
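To make the field layout concrete, here is a short sketch that splits an address of the form domain:bus:device.function and a 24-bit class code into their parts, using the values from the log above. The function and type names are my own, chosen for illustration.

```python
import re
from typing import NamedTuple


class PciAddress(NamedTuple):
    domain: int
    bus: int
    device: int
    function: int


_BDF = re.compile(
    r"^([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$"
)


def parse_address(text: str) -> PciAddress:
    """Parse 'dddd:bb:dd.f' (all fields hexadecimal) into integers."""
    match = _BDF.match(text)
    if match is None:
        raise ValueError(f"not a PCI address: {text!r}")
    return PciAddress(*(int(group, 16) for group in match.groups()))


def split_class(code: int):
    """Return (base class, sub class, programming interface)."""
    return (code >> 16) & 0xFF, (code >> 8) & 0xFF, code & 0xFF


if __name__ == "__main__":
    print(parse_address("0000:01:00.0"))
    for code in (0x060400, 0x020000):
        base, sub, prog_if = split_class(code)
        print(f"{code:#08x}: base={base:#04x} sub={sub:#04x} prog-if={prog_if:#04x}")
```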
Dmesg log #
The two most interesting lines of the dmesg log are :
[ 0.388790] pci 0000:00:00.0: [14e4:2712] type 01 class 0x060400
[ 0.495759] pci 0000:01:00.0: [dabc:1017] type 00 class 0x020000
Firstly, the PCIe subsystem logs that at 0000:00:00.0 it has discovered a Broadcom BCM2712 PCIe Bridge (vendor id 14e4, device id 2712). This bridge (type 01) has class 0x0604xx, which tells us it is a PCI-to-PCI bridge, meaning it essentially provides a downstream PCIe bus for endpoint devices or additional bridges.
The subsystem then discovers a second device at 0000:01:00.0. This is an endpoint (type 00), and class 0x020000 tells us it is an Ethernet network controller.
Of note, dabc doesn’t correspond to a known vendor id. When designing a PCIe interface in hardware, these are parameters we can configure. Additionally, among the different ways Linux identifies which driver to load for a PCIe device, the vendor id and device id can be used for matching. Supposing we are implementing custom logic, in order to prevent any bug where the wrong driver might be loaded, it is best to use a separate vendor id. This also helps identify your custom accelerator at a glance and load your custom driver.
As such, it is not surprising to see an unknown vendor id appear for an FPGA. Combined with the Ethernet networking device class, this is a strong hint that this is our board.
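Spotting a custom vendor id at a glance can also be scripted. On Linux, each PCI device has a directory under /sys/bus/pci/devices containing `vendor` and `device` files with the ids in hexadecimal. The sketch below scans that tree for a given vendor id; the function name and the optional `root` parameter (there so the scan can be pointed at a test directory) are my own.

```python
from pathlib import Path


def find_pci_devices(vendor_id, root="/sys/bus/pci/devices"):
    """Return [(address, device_id)] for every device with this vendor id."""
    base = Path(root)
    if not base.is_dir():
        # Not on Linux, or sysfs is not mounted: nothing to report.
        return []
    matches = []
    for entry in sorted(base.iterdir()):
        vendor_file = entry / "vendor"
        device_file = entry / "device"
        if not (vendor_file.is_file() and device_file.is_file()):
            continue
        # The files hold values such as "0xdabc\n".
        if int(vendor_file.read_text().strip(), 16) == vendor_id:
            device_id = int(device_file.read_text().strip(), 16)
            matches.append((entry.name, device_id))
    return matches


if __name__ == "__main__":
    # 0xdabc is the vendor id seen in the dmesg log above.
    for address, device_id in find_pci_devices(0xDABC):
        print(f"{address}: device id {device_id:#06x}")
```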
Full PCIe device status #
The dmesg logs have already given us a good indication that our FPGA board and its PCIe interface are working, but to confirm with certainty that the device with vendor id dabc is our FPGA we now turn to lspci. lspci -vvv is the most verbose output and gives us a full overview of the detected PCIe devices’ capabilities and current configurations.
Broadcom bridge:
0000:00:00.0 PCI bridge: Broadcom Inc. and subsidiaries BCM2712 PCIe Bridge (rev 21) (prog-if 00 [Normal decode])
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [48] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [ac] Express (v2) Root Port (Slot-), MSI 00
		DevCap: MaxPayload 512 bytes, PhantFunc 0
			ExtTag- RBE+
		DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr+ NoSnoop+
			MaxPayload 512 bytes, MaxReadReq 512 bytes
		DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
		LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <2us, L1 <4us
			ClockPM+ Surprise- LLActRep- BwNot+ ASPMOptComp+
		LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta: Speed 5GT/s, Width x1
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt+
		RootCap: CRSVisible+
		RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible+
		RootSta: PME ReqID 0000, PMEStatus- PMEPending-
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+
			10BitTagComp- 10BitTagReq- OBFF Via WAKE#, ExtFmt- EETLPPrefix-
			EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			FRS- LN System CLS Not Supported, TPHComp- ExtTPHComp- ARIFwd+
			AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled, ARIFwd-
			AtomicOpsCtl: ReqEn- EgressBlck-
		LnkCap2: Supported Link Speeds: 2.5-5GT/s, Crosslink- Retimer- 2Retimers- DRS+
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
			Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
			EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
			Retimer- 2Retimers- CrosslinkRes: unsupported, DRS-
			DownstreamComp: Link Up - Present
	Capabilities: [100 v1] Advanced Error Reporting
		UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
		CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
		RootCmd: CERptEn+ NFERptEn+ FERptEn+
		RootSta: CERcvd- MultCERcvd- UERcvd- MultUERcvd-
			FirstFatal- NonFatalMsg- FatalMsg- IntMsg 0
		ErrorSrc: ERR_COR: 0000 ERR_FATAL/NONFATAL: 0000
	Capabilities: [160 v1] Virtual Channel
		Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb: Fixed- WRR32- WRR64- WRR128-
		Ctrl: ArbSelect=Fixed
		Status: InProgress-
		VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
			Status: NegoPending- InProgress-
	Capabilities: [180 v1] Vendor Specific Information: ID=0000 Rev=0 Len=028 <?>
	Capabilities: [240 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
			PortCommonModeRestoreTime=8us PortTPowerOnTime=10us
		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
			T_CommonMode=1us LTR1.2_Threshold=0ns
		L1SubCtl2: T_PwrOn=10us
	Capabilities: [300 v1] Secondary PCI Express
		LnkCtl3: LnkEquIntrruptEn- PerformEqu-
		LaneErrStat: 0
	Kernel driver in use: pcieport
FPGA board:
0000:01:00.0 Ethernet controller: Device dabc:1017
	Subsystem: Red Hat, Inc. Device a001
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- -tclargs