Workshop 2

PPAC Analysis and System-Technology Co-Optimization for 3D Memory-on-Logic IC, Many-Core SOC and AI Computing Applications

Organizers: Rongmei Chen (imec) and Geert Van der Plas (imec)

This workshop presents the state of the art in 3D IC technology, design techniques and applications. Fine-pitch (<1µm) 3D interconnects based on wafer-to-wafer, face-to-face hybrid bonding will be reviewed, and their application to various 3D IC designs will be introduced. 3D partitioning at various cache levels, for applications ranging from mobile SOC to high performance SOC, will be evaluated and traded off for system power, performance, area and cost (PPAC), considering the impact of the 3D interconnect technologies. The purpose of this workshop is to provide a comprehensive view of 3D IC design, PPAC analysis, and optimization, from technology and circuit up to system applications. We will cover various studies related to 3D stacking, ranging from high-performance 3D CPU physical design and novel 3D microarchitecture exploration to many-core 3D system design, supported by power delivery and thermal analysis. We will supplement these explorations with hardware measurements from a state-of-the-art 3D demonstrator consisting of a cache coherent interconnect mesh implemented in 3D using sub-10µm 3D pitch and face-to-face hybrid wafer bonding technology on a 12nm FinFET process node. 3D opportunities for AI computing will also be discussed.

Live Session: June 13, 7:00 AM-9:00 AM (JST)

About Rongmei Chen

Rongmei Chen obtained his Bachelor's and Ph.D. degrees (the latter with an excellent Ph.D. thesis honor) from Tsinghua University, China, in 2012 and 2017 respectively. He received the "Ten Best Progress in Radiation Physics Field of China in 2015-2017" award for his innovative contribution to the field of radiation effects in ICs. He was a post-doc at LIRMM-CNRS/Montpellier University, France, from 2017 to 2019. He joined imec, Leuven, Belgium, in 2019. He has been working on 3D IC design and exploration, SRAM design, CNT electronics and IC radiation effects. He is currently a research scientist at imec and is also a Marie Curie scholar. He has published more than 30 papers in journals and conferences including TED, TNS, NSREC, RADECS and IEDM. He has served as a TPC member for several conferences such as VLSI-SOC and EDTM.

About Geert Van der Plas

Geert Van der Plas obtained his Ph.D. degree from the Katholieke Universiteit Leuven, Belgium, in 2001. He joined imec, Belgium, in 2003. He has been working on energy-efficient data converters, power/signal integrity and 3D integration technologies. He is currently program manager in the 3D system integration program, addressing system scaling using advanced 3D (TSV) and packaging (FO-WLP) technology for high performance, mobile and IoT applications. His interests are in characterization, modeling, system exploration and design enablement of 3D integration technologies.

1. Time Performance Improvement by Agile Design and 3D Integration, Tadahiro Kuroda, The University of Tokyo

With the shift to a knowledge-intensive society, where data replaces materials as resources and value is provided by knowledge-based services instead of things, the role of the IC is changing: it is not only a means to deliver increasingly sophisticated AI processing that converts data to knowledge, but also a key component in building society’s infrastructure in the form of communication networks and base stations. As a result, development time and power performance are emerging as key IC technical challenges. This presentation describes the use of an agile design methodology enabled by high-capacity SRAM and silicon compiler technology, as well as 3D IC integration, to improve IC time-performance. Furthermore, a cost and performance analysis and a comparison with current solutions, especially in base station applications, will be presented. The target is a 10-fold improvement in both time and performance, in addition to the democratization of IC design.

2. Future of HBM Packaging Technology, Kangwook Lee, SK hynix

The era of IoT (Internet of Things) and IoE (Internet of Everything) is placing emphasis on big data analytics and cloud computing. Artificial intelligence, 5G communications and the contactless ("untact") culture sparked by COVID-19 are key driving forces behind this paradigm shift in our lives. The demand for larger capacity and higher bandwidth in memory products is well aligned with this trend, and such memory has been widely adopted in high performance computing (HPC) and server applications, graphics processing units, network accelerators, and so on. SK hynix first opened up the high bandwidth memory (HBM) market for high performance graphics cards in 2014, implementing innovative packaging technologies such as chip-to-wafer stacking using thermo-compression bonding and non-conductive films, molded wafer handling, and testing of multi-die stacked wafers. To meet market requirements for HBM, efforts to increase capacity and memory bandwidth have continued. The number of stacked dies increases from 4 in HBM1 to 8 in HBM2/2E, and will increase to 12 in HBM3 and 16 in HBM4. Both die thickness and bond line thickness should be 50~60% lower in a 12-high stack than in the first version of HBM1. Higher capacity is driving new technologies for thin wafer handling, highly reliable and productive stacking, and gap-filling materials and processes. As power and bandwidth requirements increase, we are facing quite tough challenges in thermal management. The thermal dissipation ability of an HBM cube is a key factor determining product and system performance as memory bandwidth and capacity increase. Thus, innovation in packaging materials and processes, together with structure and design improvements, is required to achieve high thermal dissipation ability. Increasing the number of thermal dummy bumps while adopting highly conductive gap-fill material, or a gap-fill-free bond structure such as hybrid bonding, can be good solutions for lowering thermal resistance.
In addition, the variety of possible HBM applications may lead not only to improvements in HBM design and packaging technology but also to innovations at the package component level. A larger interposer size, the adoption of an organic substrate or RDL interposer in place of a silicon interposer, and heterogeneous stacking of logic and memory may be further features of future HBM. In this presentation, we describe several technical achievements of the past few years as a leading company in HBM products and introduce future directions for developing key HBM technologies.

3. High-Performance AI Computing and Opportunities for 3D, Mustafa Badaroglu, Qualcomm

We are living in a connected world with access to vast amounts of data, relying on energy-efficient, performant computation. This has become more challenging under the requirement of seamless interaction between instant data and big data. Instant data generation requires low-power performant devices that can generate the data instantly. Big data requires abundant computing, communication bandwidth, and memory resources to generate the requested service and information. This trend in computing and data movement necessitates a close interaction between memory and computing, particularly for AI applications. In this talk we will describe the sustainability challenges of single-chip AI solutions and shed light on the opportunities brought by 2.5D/3D integration for high-performance AI.

4. Architecture, Physical Design and 3D Technology Co-Optimization for Next Generation High Performance Energy Efficient Systems, Brian Cline, Arm

In this talk, we present a compilation of 3D-IC design-related studies done at Arm over the last few years. We begin by discussing the technology landscape around 3D-ICs, including the advanced packaging integration roadmap and the 2D scaling roadmap that it complements. Then we cover the design implications, challenges, and solutions associated with “Macro” level 3D-IC design concepts, as well as higher density, “Micro” level concepts, spanning from 3D partitioning to multi-tier timing and clock design, to power delivery and thermal management. Through this work, we hope to show both the potential benefits afforded by 3D-IC design techniques and the challenges that remain, in order to promote a lively workshop discussion on how to create a healthy and robust 3D-IC production ecosystem.

5. 3D Partitioning Strategies for Memory-on-Logic Designs for Many Core SoCs, Dragomir Milojevic, Universite Libre de Bruxelles

In this presentation we will address the challenges of CMOS scaling for many-core SoC designs and propose 3D system integration to address them. A quick overview of different 3D technology enablers will be presented, focusing on their geometrical and electrical properties. A cross section of the stack using IMEC iN3 technology, Face-to-Face hybrid bonding with fine pitch (<1µm) and nano-TSVs (0.09µm) to escape the stack will be shown. We then cover the 3D Die-by-Die place & route environment and propose a methodology for 3D timing optimisation. The above technology assumptions and design flow have been applied to a Memory-on-Logic implementation of an open-source, many-core, densely connected system (56 RISC-V cores) where all L1 SRAM memory macros have been moved to the top die. Two different partitioning strategies to handle 3D nets with FanOut > 1 will be presented: one where the split occurs on the top die, and the other where the split occurs on the bottom die. A detailed Performance, Power and Area comparison of both 3D partitioning strategies with respect to 2D shows significant performance benefits of 3D technology for densely connected SoCs.

6. Entering a New Dimension with 3D-IC Design - EDA Perspective, Vinay Patwardhan, CADENCE

With the evolution of advanced packaging technologies and popularity of multi-chiplet flows, EDA tools have to adapt to be more IC centric than PCB centric. There are system design level challenges and traditional ‘planning and implementation’ challenges that have to be mitigated through algorithmic expansion. This presents an opportunity to expand some of Cadence’s powerful planning, implementation and signoff technologies into the 3rd dimension. Cadence’s comprehensive 3D-IC solution to address all the needs of system design, planning and system level signoff analysis will be presented in this session. A couple of examples of new innovative flows for automation of 3D-IC exploration & power-thermal analysis capabilities will be presented.

7. 3D Technology: The Enabler for Advanced Digital Applications, Francois Andrieu, CEA-Leti

Today, the integrated circuit roadmap is not driven by CMOS scaling alone. 3D technologies are the other key technology booster for improving the Performance Power Area Cost (PPAC) tradeoff at the system level. In particular, 3D makes it possible to improve the data bandwidth and/or the heterogeneous integration and/or the form factor between different sub-circuits. As usual, no universal 3D solution exists; rather, there are many 3D techniques suited to different applications. For high performance applications whose performance is limited by memory access, 3D can bring high data bandwidth through µ-bumps, TSVs and active interposers, whereas for compute-bound applications it can boost computing performance through heterogeneous integration of different accelerators. To illustrate this statement, a 220GOPS 96-core processor with 6 chiplets 3D-stacked on an active interposer, offering 0.6ns/mm latency and 3Tbit/s/mm² inter-chiplet interconnects, will be highlighted, as well as the capability to co-integrate chiplets with bare dice in a multi-chip module for heterogeneous and scalable high performance compute nodes. For imaging applications driven by pixel size reduction and form factor, hybrid bonding (also called Cu bonding) is the straightforward solution. It even enables a sub-µm 3D contact pitch with wafer-to-wafer bonding. Finally, to increase the 3D contact density even further and to reduce the stack height, 3D-sequential integration is the ultimate solution. In this case, the 3D contact pitch is limited by the lithography alignment between the different tiers integrated sequentially on a given wafer. Recent results will be presented that open the way to high performance transistors and circuits integrated by 3D-sequential techniques. For all of these 3D solutions, the most cost-effective system is obtained with chip/tier specialization and a dedicated technology per chip/tier.

8. Tackling the Memory Wall via 3D Memory Partitioning: A System Level Perspective, Manu Perumkunnil, imec

For the most advanced CMOS nodes, because of diminishing power, performance and cost returns, dimensional scaling must now be complemented by Design and System Technology Co-optimization (DTCO and STCO) strategies, so that SoC bottlenecks can be mitigated. Performance scaling for SoCs has plateaued over the last decade because dimensional technology scaling has not been able to mitigate the ‘Memory Wall’ problem. This problem will only be exacerbated as Machine Learning, Artificial Intelligence, Vision, and other data-dominated application domains come to the forefront and as SRAM Power Performance and Area (PPA) gains saturate at advanced nodes. To this end, it has become increasingly clear that 3-D integration and partitioning schemes can offer potential routes to continue scaling and tackle the Memory Wall via DTCO and STCO innovations. This can be achieved by enabling higher integration density and heterogeneous (and novel) technology integration, and by extending the number of functions per 3-D chip. The 3-D integration scene has attracted a lot of interest over the past decade, with demonstrations as well as, more recently, commercial products such as chiplets (AMD Rome, Milan in EPYC products) and 3D stacking (Intel Lakefield). In this talk, I will give an overview of the need for 3D on-chip cache-like memories in the highly heterogeneous advanced SoCs of the future, based on today's commercial references such as the Apple A12 and Kirin 980, via a top-down approach. Following this, based on cross-layer exploration and system level simulations, we show that 3D partitioning can help tackle the Memory Wall by facilitating large caches and increased bandwidth.