The price of DDR5 memory is setting new highs these days as demand badly outstrips supply. In a bid to save money, Meta is recovering legacy DDR4 memory from used servers and installing it in new machines. To do so, it uses its in-house Vistara ASIC, which connects the old memory modules to its latest servers running AMD EPYC 'Turin' processors that support only DDR5 memory.
Interestingly, Meta is not the only company developing such a solution. Panmnesia, a startup from South Korea, has developed an off-the-shelf CXL controller and switch that enables servers to attach considerably larger memory pools without increasing latency, which differentiates Panmnesia’s solution from competing CXL offerings.
Custom ASIC enables DDR4 memory to work with new servers
Vistara is Meta’s first-gen custom CXL memory expander ASIC designed to attach outdated DDR4 memory to modern servers. The chip implements a CXL 2.0 Type-3 memory expander over a PCIe 5.0 x16 interface and bridges standard DDR4 RDIMMs to host processors. Each ASIC supports two independent 72-bit DDR4 memory channels and can provide up to 256 GB of capacity using 64 GB DIMMs. At present, Meta deploys 128 GB per ASIC using 32 GB DDR4 modules recovered from decommissioned servers.
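The capacity figures above follow from simple arithmetic. The short Python sketch below reproduces them under one assumption the article does not state: that each of the two channels holds two DIMMs, which is the layout that makes both quoted figures work out. The constant and function names are illustrative only.

```python
# Back-of-the-envelope capacity check for one Vistara ASIC.
# ASSUMPTION: two DIMMs per channel (not stated in the article; inferred
# from 256 GB / 64 GB = 4 DIMMs spread across 2 channels).
CHANNELS_PER_ASIC = 2
DIMMS_PER_CHANNEL = 2  # assumed


def asic_capacity_gb(dimm_size_gb: int) -> int:
    """Total capacity behind one ASIC for a given DIMM size, in GB."""
    return CHANNELS_PER_ASIC * DIMMS_PER_CHANNEL * dimm_size_gb


max_capacity_gb = asic_capacity_gb(64)       # maximum configuration
deployed_capacity_gb = asic_capacity_gb(32)  # Meta's current deployment

print(max_capacity_gb, deployed_capacity_gb)
```

With 64 GB modules this gives the 256 GB maximum, and with the recovered 32 GB modules it gives the 128 GB per ASIC that Meta currently deploys.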
(Image credit: Meta)
Meta deploys Vistara in its MemServer platform, where two ASICs connect to a single 158-core AMD Turin processor over PCIe 5.0 x8 links. Each server combines 768 GB of DDR5-6400 local memory with 256 GB of CXL-attached DDR4-2400, which expands memory capacity to 1 TB. The software stack transparently exposes CXL memory as a separate NUMA node and enables Linux to migrate cold pages to the slower DDR4 tier (with 76 GB/s of bandwidth) and retain frequently accessed data in local DDR5 (with 614 GB/s of bandwidth).
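The bandwidth and capacity figures for the two tiers can be reproduced from the DIMM speeds. The sketch below is a rough check rather than Meta's own calculation. It assumes 64 data bits (8 bytes) per transfer on each channel, with the remaining bits of a 72-bit channel carrying ECC, and 12 DDR5 channels on the host socket; neither assumption is stated in the article. The function name is hypothetical.

```python
# Peak theoretical bandwidth = transfer rate (MT/s) x 8 bytes x channels.
# ASSUMPTIONS (not stated in the article): 64 data bits per channel and
# 12 DDR5 channels on the host socket.
BYTES_PER_TRANSFER = 8


def peak_bandwidth_gbs(mega_transfers_per_s: int, channels: int) -> float:
    """Peak bandwidth in GB/s (decimal) for a set of memory channels."""
    return mega_transfers_per_s * BYTES_PER_TRANSFER * channels / 1000


ddr5_gbs = peak_bandwidth_gbs(6400, channels=12)     # local DDR5-6400
ddr4_gbs = peak_bandwidth_gbs(2400, channels=2 * 2)  # 2 ASICs x 2 channels

total_capacity_gb = 768 + 2 * 128  # local DDR5 + two ASICs at 128 GB each

print(f"DDR5: {ddr5_gbs:.1f} GB/s, DDR4 via CXL: {ddr4_gbs:.1f} GB/s")
print(f"Total capacity: {total_capacity_gb} GB")
```

Under these assumptions the results are 614.4 GB/s for local DDR5 and 76.8 GB/s for the CXL-attached DDR4, in line with the quoted 614 GB/s and 76 GB/s, and the total of 1,024 GB corresponds to the 1 TB figure.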
The ASIC integrates three RISC-V processor cores that handle secure boot, device initialization, firmware management, and health monitoring. Meta claims it has optimized its CXL controller and memory pipeline to reduce protocol overhead, minimize queuing delays, and lower idle round-trip latency to around 50 ns. The chip also incorporates advanced reliability features, including Reed-Solomon two-symbol error correction and x4 chip-kill support.