Dr. Onur Mutlu is a renowned computer scientist and Professor at ETH Zurich whose pioneering research in computer architecture, memory systems, and hardware security has shaped industry standards and influenced technologies used by billions worldwide.
In this article, he shares his insights on advancing memory-centric computing, exploring the challenges, breakthroughs, and future possibilities of next-generation computer memory systems.
With advancements in memory-centric (i.e., processing-in-memory) architectures, what challenges do you foresee in integrating these systems into mainstream computing?
Memory-centric computing (or processing-in-memory) is another fascinating research area where we can change the paradigm of how we do computing. Moving data between memory and processor consumes orders of magnitude more energy than computation. Many results from real systems show that most (e.g., >90%) of the energy spent executing major AI models comes from data movement and memory access, not computation performed on the data. Unfortunately, our existing processor-centric design paradigm is the major fundamental cause of significant data movement across the entire system. All data needs to be processed in computation units, which today are very far away from memory, storage, and sensing units. If we want to make computing truly efficient and high performance (and also robust and sustainable), we should minimize data movement. We can do so by placing computation capability near and inside memory structures (especially high-density memories like DRAM and flash), and thus greatly (i.e., by orders of magnitude) enhance energy efficiency, performance, robustness, security, and sustainability of almost all computing systems we build.
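To make the data-movement point concrete, here is a back-of-envelope sketch in Python. The per-operation energy numbers are illustrative order-of-magnitude assumptions (roughly in line with widely cited circuit-level estimates), not measurements from any specific system.

```python
# Back-of-envelope: energy of moving data vs. computing on it.
# Illustrative per-operation energies in picojoules; actual values
# depend heavily on technology node and system design (ASSUMED numbers).
ENERGY_PJ = {
    "fp32_add": 1.0,        # ~1 pJ for a 32-bit floating-point add
    "dram_read_32b": 640.0, # ~640 pJ to fetch 32 bits from off-chip DRAM
}

def energy_breakdown(n_values: int) -> dict:
    """Energy to fetch n 32-bit values from DRAM and add each one once."""
    move = n_values * ENERGY_PJ["dram_read_32b"]
    compute = n_values * ENERGY_PJ["fp32_add"]
    return {
        "movement_pj": move,
        "compute_pj": compute,
        "movement_share": move / (move + compute),
    }

stats = energy_breakdown(1_000_000)
print(f"Data movement share: {stats['movement_share']:.1%}")
```

Even with generous assumptions for the compute side, data movement dominates the energy budget, which is exactly the imbalance that placing computation near or inside memory attacks.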
We have been doing research in this area for at least 15 years. We have written many papers and designed many new promising processing-in-memory techniques, advocating for a paradigm shift and demonstrating large energy and performance benefits with processing-in-memory. Processing data near, and inside, the places where it resides or is produced simply makes sense from a fundamental, first-principles standpoint. Three important works that overview the area, written 12 years apart, are the following:
Onur Mutlu, Ataberk Olgun, and İsmail Emir Yüksel, “Memory-Centric Computing: Solving Computing’s Memory Problem,” Invited Paper in Proceedings of the 17th IEEE International Memory Workshop (IMW), Monterey, CA, USA, May 2025.
Onur Mutlu, “Memory Scaling: A Systems Architecture Perspective,” Proceedings of the 5th International Memory Workshop (IMW), Monterey, CA, May 2013.
Onur Mutlu, Saugata Ghose, Juan Gomez-Luna, Rachata Ausavarungnirun, Mohammad Sadrosadati, and Geraldo F. Oliveira, “A Modern Primer on Processing in Memory,” Invited Book Chapter in Emerging Computing: From Devices to Systems – Looking Beyond Moore and Von Neumann, Springer, July 2022 (updated February 2025).
Of course, changing computing hardware is a tough task, and changing the computing paradigm is an even more difficult one. It comes with many ramifications and challenges across the entire computing stack, since processor-centric thinking and decisions (we call this the processor-centric mindset) are ingrained in essentially every part of the computing stack, spanning circuits to algorithms (all parts of hardware and software design), including computing theory and computing education. For example, when we educate our students, all of our courses immediately ingrain processor-centric thinking in bright minds: the CPU we design is separate from memory, with a huge dichotomy between them, and memory cannot compute; as a result, data needs to be moved from memory to the CPU for the system to work. We build on this foundation that everyone accepts almost as “dogma” (and definitely as “business as usual”). As a result, our design decisions across the entire computing stack, as well as business models in computing, are optimized for processor-centric computing. It is therefore not easy to disrupt the status quo. And, I believe, this is the most difficult part: how do we change the processor-centric mindset and the business models built around it?
Despite this inherent and longstanding difficulty, much progress has been made in the past 10-15 years to enable the adoption of processing-in-memory (PIM), thanks to my group’s research and the collective efforts of the research community. Many recent studies show that memory-centric computing can greatly improve system performance & energy efficiency, and can benefit system robustness. Major industrial vendors and startup companies have recently introduced memory (DRAM) chips with sophisticated computation capabilities to accelerate data-intensive workloads, including AI workloads. We have been working heavily with one of those startup companies (UPMEM) to develop a software stack (e.g., programming models, compilers, benchmark suites, application case studies, system software, security infrastructure) for real PIM hardware (e.g., our IEEE Access 2022, SIGMETRICS 2022, ISPASS 2023, PACT 2023, PACT 2024, SIGMETRICS 2025 papers). We have also demonstrated, fascinatingly, that commercial off-the-shelf DRAM chips that anyone can buy are capable of performing bulk bitwise computation operations by exploiting their fundamental operational properties, without changing the DRAM chip or its interface! This is fascinating because DRAM chips are not designed for such a purpose, yet by violating the timing parameters of such chips, we showed that one can perform general-purpose and functionally-complete bulk bitwise operations (see our HPCA 2024 and IEDM 2024 papers, and the PiDRAM infrastructure; links below). Now, imagine what one could do if they were to shift their mindset and design the DRAM chip to actually robustly perform computation operations (as opposed to doing nothing in this regard).
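The bulk bitwise capability mentioned above can be illustrated with a purely functional sketch: simultaneously activating multiple DRAM rows makes each bitline settle to the majority of the connected cells, and fixing one operand row to all zeros or all ones turns that majority into AND or OR. The Python below models only this logic; it does not model the analog charge sharing or the timing-parameter violations used on real chips.

```python
# Functional sketch of majority-based bulk bitwise operations, the principle
# behind in-DRAM computation: a 3-input majority across three "rows" yields
# AND and OR when one row is pinned to a constant.

def maj3(a: list, b: list, c: list) -> list:
    """Bitwise 3-input majority across three same-length 'DRAM rows'."""
    return [1 if (x + y + z) >= 2 else 0 for x, y, z in zip(a, b, c)]

def bulk_and(a, b):
    return maj3(a, b, [0] * len(a))  # MAJ(A, B, 0) = A AND B

def bulk_or(a, b):
    return maj3(a, b, [1] * len(a))  # MAJ(A, B, 1) = A OR B

row_a = [1, 0, 1, 1]
row_b = [1, 1, 0, 1]
print(bulk_and(row_a, row_b))  # [1, 0, 0, 1]
print(bulk_or(row_a, row_b))   # [1, 1, 1, 1]
```

With a NOT operation added on top (real designs obtain it at the sense amplifiers), such primitives become functionally complete, which is what makes the experimental results on off-the-shelf chips so striking.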
We believe a major challenge in the adoption of PIM lies in the development of a software stack that can easily and effectively take advantage of the underlying memory-centric hardware, without significantly burdening the programmers. We have made significant progress, yet there is much more that remains to be done in both research and development. The field is super exciting and there are many game-changing ideas to be discovered and invented! Going forward, both hardware and software stack should be revisited and designed carefully and efficiently to take advantage of memory-centric computing.
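As a rough illustration of what such a software stack must hide from programmers, here is a hypothetical host-side offload model in Python. The `PIMUnit` class and `pim_sum` function are invented for this sketch and do not correspond to any real PIM SDK; the point is only that data is scattered once, computed on in place, and only small results cross the memory bus.

```python
# Hypothetical sketch of a PIM-style programming model: the host partitions
# data across compute-capable memory banks, each bank computes locally, and
# only per-bank partial results move back to the host.

class PIMUnit:
    """Models one compute-capable memory bank (invented for illustration)."""
    def __init__(self, data):
        self.data = data          # data resident in this bank

    def local_sum(self) -> int:
        # Computation happens where the data lives; no bulk data movement.
        return sum(self.data)

def pim_sum(values, n_units=4):
    # Scatter: assign elements to banks round-robin (simplistic placement).
    shards = [values[i::n_units] for i in range(n_units)]
    units = [PIMUnit(s) for s in shards]
    # Gather: only n_units partial sums cross the bus, not all the data.
    return sum(u.local_sum() for u in units)

print(pim_sum(list(range(100))))  # 4950
```

The hard research problems hide inside the simplifications here: data placement, load balancing, coherence with the host, and making all of this transparent to ordinary programs.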
For interested folks, I would recommend following our open and freely available tutorials and workshops on memory-centric computing. We have also developed open and freely available courses that cover PIM and memory-centric designs and ideas, and we keep updating these courses with every new offering. Some useful links follow:
Ismail Emir Yuksel, Yahya Can Tugrul, Ataberk Olgun, F. Nisa Bostanci, A. Giray Yaglikci, Geraldo F. Oliveira, Haocong Luo, Juan Gomez-Luna, Mohammad Sadrosadati, and Onur Mutlu, “Functionally-Complete Boolean Logic in Real DRAM Chips: Experimental Characterization and Analysis,” Proceedings of the 30th International Symposium on High-Performance Computer Architecture (HPCA), Edinburgh, UK, March 2024.
Ataberk Olgun, Juan Gomez Luna, Konstantinos Kanellopoulos, Behzad Salami, Hasan Hassan, Oguz Ergin, and Onur Mutlu, “PiDRAM: A Holistic End-to-end FPGA-based Framework for Processing-in-DRAM,” ACM Transactions on Architecture and Code Optimization (TACO), March 2023.
Onur Mutlu, Ataberk Olgun, İsmail Emir Yüksel, and Geraldo F. Oliveira, “Memory-Centric Computing: Recent Advances in Processing-in-DRAM,” Invited Paper in Proceedings of the 70th Annual IEEE International Electron Devices Meeting (IEDM), San Francisco, CA, USA, December 2024.
Geraldo F. Oliveira, Juan Gómez-Luna, and Onur Mutlu, “Processing-in-Memory Tutorials: Experiences from Past Two Years and Thoughts Looking Forward,” 22 January 2025.
Onur Mutlu’s Computer Architecture Course at ETH Zurich (Fall 2024): Course Page; YouTube Playlist
Processing in Memory Course at ETH Zurich (Spring 2023): Course Page; YouTube Playlist
Memory-Centric Computing Systems Tutorials and Workshops
Given the increasing complexity of memory systems, what methodologies do you employ to ensure the reliability and efficiency of your designs?
Memory (including storage) system research is quite exciting, as the memory system is the major bottleneck for energy and performance in modern systems. Yet, to do truly impactful and high-quality scientific research in this area, one needs to be ready to develop many different types of infrastructure at different levels of the computing stack and to rigorously validate that infrastructure. In my group, we develop *a lot* of infrastructure to advance the state of the art. We have been doing so since 2009, when my group was formed at Carnegie Mellon University, and we intensified these efforts when we moved to ETH Zurich. And, we open source almost all of the infrastructure we develop, enabling others to also do great work in memory systems. Many academic researchers, as well as major companies, use these infrastructures. Both academic and industrial researchers and engineers provide us with feedback on them, enabling us to continually improve and validate our evaluation methods.
We use a multi-pronged approach to develop cutting-edge infrastructure to both understand and model existing and future memory systems, and test new ideas. This approach has at least five components at different levels:
1. First, we develop real chip testing infrastructures (e.g., SoftMC, DRAM Bender, EasyDRAM, which are built on FPGA-based memory controllers we have developed) so that we can experimentally understand the characteristics and capabilities of existing chips under many different operating conditions (e.g., a wide range of temperatures, voltage levels, and latencies) and workloads (e.g., access patterns of AI workloads or security attacks). We have been developing memory testing infrastructures for DRAM and NAND flash memory since 2010-2011. These infrastructures have enabled major discoveries about the robustness and computation capabilities of real DRAM chips by us (like RowHammer, RowPress, Variable Read Disturbance, functionally-complete computation capability in DRAM, data copying capability in real DRAM chips, true random number generation capability in real DRAM chips, and many more) as well as by other researchers. Similarly, we have developed NAND flash memory testing infrastructures that led to major discoveries in understanding the latency and error characteristics and the computation capability of flash memory chips (e.g., Flash-Cosmos). Such infrastructures are invaluable for making new discoveries and testing innovative ideas on real chips, and we continue to expand our “memory discovery” lab, which enables concurrent testing of hundreds of memory modules. Our major infrastructures are in use by industry and academia and are open source at https://github.com/CMU-SAFARI/DRAM-Bender and https://github.com/CMU-SAFARI/SoftMC.
2. Based on our deep understanding of real DRAM chips (developed using real results from real chips), we develop circuit-level modeling infrastructures. These infrastructures enable us to accurately examine the effects of circuit-level changes on the power, performance, and area characteristics of innovative DRAM and flash memory architectures. We open source these infrastructures as well. An example: https://github.com/CMU-SAFARI/CLRDRAM.
3. To evaluate the benefits of new architectural and system-level ideas that cannot be evaluated on existing systems (e.g., processing in memory, new DRAM architectures), we develop high-level simulation infrastructures that model the memory and the storage system (and oftentimes the entire processing system as well). Since 2006, we have developed many such simulators. A very popular DRAM simulator, Ramulator (and now Ramulator 2.0; https://github.com/CMU-SAFARI/ramulator2) is widely used in both industry and academia, with significant investments made to it by multiple companies as well as research groups. We keep expanding Ramulator and making it easy to use. On the flash memory side, we have released the MQSim simulation infrastructure, which enables studies in cutting-edge flash and non-volatile memory SSDs (https://github.com/CMU-SAFARI/MQSim). More recently, we have released an exciting simulation framework called Virtuoso, which can enable fast and accurate evaluation of innovative ideas that span both hardware and software, especially in virtual memory systems (https://github.com/CMU-SAFARI/Virtuoso). We also continue to develop and release simulation infrastructures for evaluating prefetching and processing in memory mechanisms (e.g., Pythia https://github.com/CMU-SAFARI/Pythia, DAMOV https://github.com/CMU-SAFARI/DAMOV).
4. To demonstrate the end-to-end benefits of new ideas on existing hardware (i.e., real memory chips), we develop many prototyping infrastructures. PiDRAM (https://github.com/CMU-SAFARI/PiDRAM), a platform for evaluating the end-to-end performance of processing-in-DRAM techniques, is one such example. Sibyl (https://github.com/CMU-SAFARI/Sibyl) is another system, in which we demonstrate the performance of reinforcement-learning-based data placement on real SSDs. Validation is critical for us, and we validate our ideas and infrastructures on real systems as much as possible.
5. We develop many benchmarks, especially to evaluate new paradigms like memory-centric computing systems (e.g., the PrIM benchmarks https://github.com/CMU-SAFARI/prim-benchmarks and the DAMOV workloads https://github.com/CMU-SAFARI/DAMOV). These have been instrumental in setting the stage for many developments in processing in memory.
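The real-chip characterization loops described in item 1 can be sketched conceptually: program a data pattern, tighten a timing parameter such as the activation latency tRCD, read the data back, and count errors. In the hypothetical Python below, `read_back` and its error model are invented stand-ins for real tester calls (e.g., in DRAM Bender) and are not the actual API.

```python
# Hypothetical sketch of a chip-testing sweep over one operating parameter.
import random

def read_back(row, trcd_ns):
    """Toy stand-in for a tester readout: shorter tRCD means a higher
    chance of bit flips (invented error model, for illustration only)."""
    fail_p = max(0.0, (10.0 - trcd_ns) / 10.0)
    return [b if random.random() > fail_p else 1 - b for b in row]

def sweep_trcd(pattern, trcd_values_ns):
    """Count read errors for each activation-latency (tRCD) setting."""
    results = {}
    for trcd in trcd_values_ns:
        observed = read_back(pattern, trcd)
        results[trcd] = sum(o != e for o, e in zip(observed, pattern))
    return results

pattern = [1, 0] * 512  # alternating data pattern written to one row
print(sweep_trcd(pattern, [15.0, 10.0, 5.0, 2.5]))
```

Real campaigns sweep many such parameters (temperature, voltage, access patterns) across hundreds of modules, which is why purpose-built, validated infrastructure matters so much.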
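Similarly, a core behavior that cycle-level DRAM simulators such as Ramulator capture (item 3) can be boiled down to a tiny model: an access to the currently open row (a row-buffer hit) is much cheaper than one that must precharge and activate a new row. The timing values below are illustrative assumptions, not taken from any datasheet.

```python
# Minimal sketch of row-buffer state tracking, the kind of timing behavior
# a DRAM simulator models in far more detail.

class DRAMBankModel:
    T_CAS = 15       # column access latency (cycles), illustrative
    T_RP_RCD = 30    # precharge + activate latency (cycles), illustrative

    def __init__(self):
        self.open_row = None  # no row open initially

    def access(self, row: int) -> int:
        """Return the latency of accessing `row`, updating bank state."""
        if row == self.open_row:
            return self.T_CAS              # row-buffer hit
        self.open_row = row
        return self.T_RP_RCD + self.T_CAS  # row-buffer miss

bank = DRAMBankModel()
trace = [3, 3, 3, 7, 7, 3]  # row addresses of successive requests
latencies = [bank.access(r) for r in trace]
print(latencies)  # first access to each newly opened row is slower
```

A full simulator layers many constraints (refresh, bank-level parallelism, command scheduling) on top of this, but the hit/miss asymmetry is the essential idea.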
I would recommend that everyone doing serious research invest in infrastructure development, both real-system and simulation based. When solving difficult systems problems, doing so enables leading-edge research, new discoveries, and new methods of inventing and evaluating ideas. It requires hard work and extensive time investment, and comes with many “engineering” challenges, but in my opinion, it pays off greatly in validating ideas and having high impact. RowHammer, RowPress, functionally-complete DRAM, Flash-Cosmos, Variable Read Disturbance, etc. are all examples of our discoveries that would not have been possible without a large investment in infrastructure. For example, our DRAM testing infrastructure took 1.5 years to build, but, fascinatingly, we (and others) are still discovering new phenomena with it.
We open source all our simulators and infrastructures for the community to use and benefit from. This also enables independent validation and easier reproducibility, which are very important in science. The open-source lectures we provide also help educate the broader community. I am proud to see many of our open-source infrastructures broadly used in academia and industry (having real impact on both long-term research and short-term products, as well as on broader education at a global scale), and four of our works have received best/distinguished artifact awards at top computer architecture and systems venues (ISCA, MICRO, HPCA, PPoPP).
Dr. Onur Mutlu is an ACM and IEEE Fellow and award-winning Professor at ETH Zurich whose groundbreaking work in energy-efficient computer architecture and memory systems drives advances in the hardware powering today’s computing world.