UCSD Workshop Recognizes Leading Research on the Potential of New Memory Technologies
San Diego, Calif., April 3, 2018 -- The University of California San Diego hosted the 9th Annual Non-Volatile Memories Workshop (NVMW 2018) on March 11 to 13 here on campus. Over 200 scientists and engineers from around the world gathered to discuss the latest innovations in non-volatile computer memories and how they will be used to power applications ranging from “big data” to machine learning.
For the first time, the workshop presented awards recognizing outstanding research in the area of non-volatile memory technologies.
The NVMW Memorable Paper Award is given to a student paper published in the last two years that is of exceptional quality and is expected to have substantial impact on the fields of study related to non-volatile memories. The NVMW gave two of these awards this year: One in the area of system architectures and applications and another in the area of devices, coding, and information theory.
A third award, the NVMW Persistent Impact Prize (presented this year by Marvell) recognizes a paper published at least five years prior that has had exceptional impact on the fields of study related to non-volatile memories. The award committee interprets both “impact” and “non-volatile memories” broadly.
All three awards include a $1000 cash prize and a customized, 3D-printed medal. Details on the awards are available at the NVMW website.
“We created these awards to recognize the amazing work that researchers are doing to fully realize the potential of these new memory technologies,” said Dr. Steven Swanson, one of the workshop’s organizers. “The NVMW draws the best work from around the world in this area, so it is uniquely positioned to identify and recognize outstanding contributions.”
The Persistent Impact Prize
From left: UC San Diego computer science professor Steven Swanson, Haris Volos from HP Enterprise and UC San Diego electrical and computer engineering professor Paul Siegel.
This year, the Persistent Impact Prize went to Haris Volos (HP Enterprise) and Mike Swift (University of Wisconsin, Madison) for their paper “Mnemosyne: Lightweight Persistent Memory,” originally published in 2011 at the ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). Marvell sponsored the Persistent Impact Prize.
The award committee made the award with the following citation:
Mnemosyne is awarded the 2018 Persistent Impact Prize in recognition of its contributions to the foundations of programming with non-volatile main memory. It was among the first to identify the need to address persistent memory as part of the programming model; propose lightweight transactions to support persistence; and handle consistency under failure. The problems it described and the solutions it proposed have influenced the design of most (if not all) of persistent main memory systems that followed it.
Haris Volos, Andres Jaan Tack, and Michael M. Swift. 2011. Mnemosyne: lightweight persistent memory. In Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems (ASPLOS XVI). ACM, New York, NY, USA, 91-104. http://dx.doi.org/10.1145/1950365.1950379
Since the dawn of computing, computer applications have organized their data between two tiers: memory and storage. This organization has been largely motivated by the different performance and data-retention characteristics of memory and storage technologies. As memory is faster than storage, applications typically load their data into memory during processing. Memory, however, is also volatile, meaning that it retains its contents when powered on but loses its contents when power is interrupted. To work around this limitation, applications typically store their data in a persistent storage technology, such as a hard disk or solid-state drive, for long-term retention. Document editing is a familiar example: a word processor loads a document into memory for fast interactive editing, and periodically saves the document on disk in the form of a file for long-term retention; in the unfortunate event of a power interruption or other failure, any unsaved edits are lost.
Advances in solid-state physics over the last decade have given rise to a new breed of persistent memory technologies that provide persistent storage at memory-like performance, blurring the line between memory and storage. Applications can now keep a single image of data in persistent memory that they can access and update in-place quickly without the need to load from and save to a file on a slow disk for long-term persistence. Although this removes the danger of data loss in the event of a power interruption or other failure, it raises the risk of data corruption. Going back to the document-editing example, the word processor must ensure that a failure while editing a document does not result in a document where some parts correspond to a previous version of the document and other parts correspond to a newer version. Worse, the document may be corrupted and become unusable. Programmers using persistent memory thus need tools to ensure that they can safely transform data between correct versions, without risk of corruption if a failure happens during the transformation.
Mnemosyne was among the first systems to identify the above opportunity and challenges with developing applications targeting new persistent memory technologies. Mnemosyne provides programmers with the necessary tools and mechanisms to write correct and high-performance applications that update persistent data in-place without risking corruption under failure using transactions. Mnemosyne’s influence has been significant as it stimulated further inter-disciplinary research on persistent memory spanning several areas of computer science, including computer architecture, operating systems, data management systems, programming languages, and security; and provided the groundwork for nearly all of the persistent memory systems that have followed. Its design was an inspiration to the Storage Networking Industry Association (SNIA) persistent memory programming model.
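The corruption risk described above, and the way a transaction avoids it, can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: it is not Mnemosyne's interface or implementation, all names (PersistentStore, SimulatedCrash, demo) are hypothetical, a Python dictionary stands in for persistent memory, and a simple redo log stands in for whatever logging a real system uses. A real system would also have to control the order in which writes reach the persistent medium, which this toy ignores.

```python
class SimulatedCrash(Exception):
    """Raised to model a power failure in the middle of an update."""


class PersistentStore:
    """Toy stand-in for persistent memory: all fields survive a crash."""

    def __init__(self, initial):
        self.data = dict(initial)   # the in-place image of the data
        self.log = []               # redo log entries: (key, new_value)
        self.committed = False      # commit marker for the logged update

    def unsafe_update(self, updates, crash_after=None):
        """Write in place with no log; a crash can leave a mix of versions."""
        for i, (key, value) in enumerate(updates.items()):
            if crash_after is not None and i == crash_after:
                raise SimulatedCrash()
            self.data[key] = value

    def transactional_update(self, updates, crash_after=None,
                             crash_before_commit=False):
        """Log every new value, mark the commit, then apply in place."""
        self.log = list(updates.items())
        if crash_before_commit:
            raise SimulatedCrash()
        self.committed = True       # commit point: the log is complete
        self._apply_log(crash_after)

    def _apply_log(self, crash_after=None):
        for i, (key, value) in enumerate(self.log):
            if crash_after is not None and i == crash_after:
                raise SimulatedCrash()
            self.data[key] = value
        self.log, self.committed = [], False

    def recover(self):
        """After restart: replay a committed log, discard an uncommitted one."""
        if self.committed:
            self._apply_log()
        else:
            self.log = []


def demo():
    old = {"title": "v1", "body": "v1"}
    new = {"title": "v2", "body": "v2"}
    results = {}

    # No transaction: the crash leaves half of the new version in place.
    torn = PersistentStore(old)
    try:
        torn.unsafe_update(new, crash_after=1)
    except SimulatedCrash:
        pass
    results["unsafe"] = torn.data

    # Crash after the commit point: recovery replays the log.
    safe = PersistentStore(old)
    try:
        safe.transactional_update(new, crash_after=1)
    except SimulatedCrash:
        pass
    safe.recover()
    results["after_commit"] = safe.data

    # Crash before the commit point: recovery keeps the old version.
    early = PersistentStore(old)
    try:
        early.transactional_update(new, crash_before_commit=True)
    except SimulatedCrash:
        pass
    early.recover()
    results["before_commit"] = early.data
    return results


if __name__ == "__main__":
    for name, state in demo().items():
        print(name, state)
```

In this toy, the update without a log ends up with a new title and an old body, which is the mixed-version document described above, while the logged update ends up entirely in the new version or entirely in the old one depending on whether the crash came after or before the commit marker was set.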
The Memorable Paper Award
For each of the award’s two categories, the NVMW recognized one winner and five finalists.
In System Architecture and Application
From left: UC San Diego computer science professor Steven Swanson, Ana Klimovic of Stanford University, and UC San Diego electrical and computer engineering professor Paul Siegel.
Ana Klimovic, Heiner Litz, and Christos Kozyrakis. 2017. “ReFlex: Remote Flash ≈ Local Flash”. In Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '17). ACM, New York, NY, USA, 345-359. DOI: https://doi.org/10.1145/3037697.3037732
Internet companies such as Facebook and Google host trillions of messages, photos, and videos for their users. Hence, they need storage systems that are massive in scale, fast to access, and cost effective. Scale is achieved by hosting internet services in datacenters with thousands of machines, each contributing its local storage to the global data pool. Speed is achieved by selectively replacing slow hard disks in machines with Flash storage devices that can serve data accesses with 100x lower latency and 10,000x higher throughput.
However, Flash makes it difficult to build a cost-effective storage system. Flash devices are typically underutilized in terms of capacity and throughput due to the imbalance in the compute and storage requirements of the internet services running on each machine. In the past, datacenter operators dealt with the same challenge for disks by allowing services running on each machine to allocate storage over the network on any disk with spare capacity and bandwidth in the datacenter. Remote (over the network) access to disks allows us to utilize all available capacity and throughput. Past efforts to implement similar remote access systems for Flash devices have run into significant challenges. Network protocol processing at the throughput of Flash devices requires a large number of processor cores and adds overheads that cancel out the latency advantages of using Flash. Moreover, when two remote machines access the same Flash device, interference between the two access streams can lead to unpredictable performance degradation.
To address these challenges, we developed ReFlex. ReFlex enables high-performance access to remote Flash storage with minimal compute resources and provides predictable performance for multiple services sharing a Flash device over the network. Using a single processing core, the system can process up to 850,000 requests per second, which is 11x more than a traditional Linux network storage system. ReFlex makes remote Flash look like local Flash to applications, making it easy for a service running on a particular machine to use spare Flash capacity and bandwidth on other machines in the datacenter. To provide predictable performance when multiple remote machines access the same Flash device, ReFlex uses a novel scheduler to process incoming requests in an interference-aware manner.
ReFlex is having an increasing impact in industry and, in collaboration with IBM Research, has been integrated into the Apache Crail distributed storage system. This integration allows popular data analytics frameworks to leverage ReFlex to improve their resource efficiency while maintaining high, predictable performance. ReFlex is also being ported to a system on chip (SoC) platform by Broadcom Limited. ReFlex is open-source software and available at: https://www.github.com/stanford-mast/reflex.
- Mohsen Imani, Saransh Gupta, and Tajana Rosing. 2017. Ultra-Efficient Processing In-Memory for Data Intensive Applications. In Proceedings of the 54th Annual Design Automation Conference 2017 (DAC '17). ACM, New York, NY, USA, Article 6
- S. Yang, C. Schoeny and L. Dolecek, "Order-optimal permutation codes in the generalized cayley metric," 2017 IEEE Information Theory Workshop (ITW), Kaohsiung, 2017, pp. 234-238.
- W. Wen, L. Zhao, Y. Zhang and J. Yang, "Speeding up crossbar resistive memory by exploiting in-memory data patterns," 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), Irvine, CA, 2017, pp. 261-267.
- Ahmed Hareedy, Homa Esfahanizadeh, Lara Dolecek, "High performance non-binary spatially-coupled codes for flash memories", Information Theory Workshop (ITW) 2017 IEEE, pp. 229-233, 2017.
- Yi Liu, Pengfei Huang, Alexander W. Bergman, Paul H. Siegel, “Optimal Data Shaping Code Design,” based upon "Performance of Optimal Data Shaping Codes," in Proceedings of IEEE International Symposium on Information Theory, Aachen, Germany, June 25-30, 2017, pp. 1003-1007
- Pengfei Huang, Yi Liu, Xiaojie Zhang, Paul H. Siegel, Erich F. Haratsch, “Syndrome-Coupled Rate-Compatible Error-Correcting Codes for Flash Memories,” based upon “Syndrome-Coupled Rate-Compatible Error-Correcting Codes," in Proceedings of IEEE Information Theory Workshop, Kaohsiung, Taiwan, November 6-10, 2017, pp. 454-458.
In Devices and Information Theory
From left: UC San Diego computer science professor Steven Swanson, Ahmed Hareedy from UCLA and UC San Diego electrical and computer engineering professor Paul Siegel.
Ahmed Hareedy, Homa Esfahanizadeh, Lara Dolecek, "High performance non-binary spatially-coupled codes for flash memories", Information Theory Workshop (ITW) 2017 IEEE, pp. 229-233, 2017. http://ieeexplore.ieee.org/document/8277940/
In order to meet the demands of data-hungry applications, modern data storage devices are expected to become increasingly dense. This is a challenging endeavor, and storage engineers are continuously trying to provide novel technologies. However, these new technologies are typically associated with an increase in the number, sources, and types of errors. For example, in Flash memories, programming errors and inter-cell interference are sources of errors which are exacerbated by increasing the density of the device. The same applies for grid misalignments, read-head flying-height variations, and inter-track interference in hard disk drives. This fact makes the goal of ensuring highly-reliable, dense storage devices a formidable challenge. Our research focuses on providing novel and efficient error-correcting coding schemes that are capable of overcoming this challenge. In particular, through informed exploitation of the underlying channel characteristics of the storage device being studied, we provide frameworks for systematically generating error-correcting codes (ECCs), with mathematical guarantees, that offer performance improvements in orders of magnitude relative to the prior state of the art. These frameworks are based on mathematical tools drawn from coding theory and information theory, and rely on advanced techniques from probability theory, linear algebra, graph theory, optimization, and combinatorics.
In this work, we focus on the design and optimization of a particular class of ECCs, namely spatially-coupled codes, for practical storage channels. Spatially-coupled codes have a compact graphical representation, and are known to provide complexity/latency gains over other codes. Additionally, spatially-coupled codes have enhanced theoretical properties. However, there is significant room for improving the finite-length performance of these codes for different applications. We recently discovered that the nature of the error-prone structures in the graph of a code critically depends on the channel underlying the storage device for which the code is used. Here, we focus on practical Flash channels. We propose a three-stage code design approach that aims at minimizing the number of these error-prone structures in the graph of the designed spatially-coupled codes. To perform this minimization, our approach optimizes a particular set of code design parameters at each of the three stages. Codes designed using the proposed approach achieve over two orders of magnitude and over 200% raw bit error rate performance gains compared to known techniques in the literature. We are currently working on extending the proposed approach to design high performance spatially-coupled codes for magnetic recording devices. This approach, along with our previously developed methods, constitutes a comprehensive ECC toolbox for a variety of modern storage and memory systems, including multi-dimensional storage systems.
Ana Klimovic, Heiner Litz, and Christos Kozyrakis from Stanford won the inaugural Memorable Paper Award in systems, architecture, and applications for their paper “ReFlex: Remote Flash ≈ Local Flash,” while Ahmed Hareedy, Homa Esfahanizadeh, and Lara Dolecek from UCLA won the Memorable Paper Award in ECC and devices for “A Three-Stage Approach for Designing Non-Binary Spatially-Coupled Codes for Flash Memories.” Toshiba Memory sponsored the Memorable Paper awards.
- Youngjin Kwon, Henrique Fingler, Tyler Hunt, Simon Peter, Emmett Witchel, and Thomas Anderson. 2017. Strata: A Cross Media File System. In Proceedings of the 26th Symposium on Operating Systems Principles (SOSP '17). ACM, New York, NY, USA, 460-477.
- Jian Xu, Lu Zhang, Amirsaman Memaripour, Akshatha Gangadharaiah, Amit Borase, Tamires Brito Da Silva, Steven Swanson, and Andy Rudoff. 2017. NOVA-Fortis: A Fault-Tolerant Non-Volatile Main Memory File System. In Proceedings of the 26th Symposium on Operating Systems Principles (SOSP '17). ACM, New York, NY, USA, 478-496.
- Matias Bjørling, Javier Gonzalez, and Philippe Bonnet. 2017. LightNVM: The Linux Open-Channel SSD Subsystem. In Proceedings of the USENIX Conference on File and Storage Technologies, (FAST), Santa Clara, CA, USA. 359--374.
- A. Joshi, V. Nagarajan, S. Viglas, and M. Cintra, “ATOM: Atomic Durability in Non-volatile Memory through Hardware Logging,” in HPCA, 2017
- Ismail Oukid, Daniel Booss, Adrien Lespinasse, Wolfgang Lehner, Thomas Willhalm, and Grégoire Gomes. 2017. Memory management techniques for large-scale persistent-main-memory systems. Proc. VLDB Endow. 10, 11 (August 2017), 1166-1177
- Ana Klimovic, Heiner Litz, and Christos Kozyrakis. 2017. “ReFlex: Remote Flash ≈ Local Flash”. In Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '17). ACM, New York, NY, USA, 345-359.
The workshop also included over forty technical presentations, two keynotes, and a tutorial. The first keynote speaker, Dr. Bianca Schroeder from the University of Toronto, described her work revealing the real-world reliability of flash memory devices in large-scale data centers. In the second keynote, Amit Golandar, Technical Director at NetApp, presented an overview of how software support for persistent memory has evolved over the last decade.
The workshop began on Sunday with a half-day tutorial on “Caches for the Persistent Memory and Flash Era,” presented by Dr. Irfan Ahmad (CloudPhysics) and Dr. Ymir Vigfusson (Emory University).
This year, the workshop enjoyed the support of IBM, NetApp, Toshiba Memory, Facebook, Intel, Micron, Marvell, Samsung, Huawei, Western Digital, VMWare, Seagate, Broadcom, CNEXLabs, and SNIA.
About the Non-Volatile Memories Workshop
The Non-Volatile Memories Workshop is the world’s premier venue for research into how to use non-volatile memory technology to improve the performance, reliability, and efficiency of computing systems. It was founded in 2010 by Dr. Paul Siegel and Dr. Steven Swanson of the University of California, San Diego’s Jacobs School of Engineering. The workshop is a co-production of the Center for Memory and Recording Research (http://cmrr.ucsd.edu) and the Non-Volatile Systems Laboratory (http://nvsl.ucsd.edu) at UC San Diego. More information, including a detailed program, is available at http://nvmw.ucsd.edu. Questions about the workshop should be directed to firstname.lastname@example.org.