2019 Symposium on VLSI Circuits

Short Course 1

CMOS Technology Enablers for Pushing the Limits of Semiconductors: Materials to Packaging [Suzaku I]

Monday, June 10, 8:30-16:50

Organizers: M. Tada, NEC Corp.
N. Ramaswamy, Micron Technology Inc.

Breaking the Limitations of FinFET Scaling, M. Liu, Intel Corp.
Emerging Interconnect Technologies for Nanoelectronics, K. Saraswat, Stanford Univ.
Advanced Process Technologies Required for Future Scaling and Devices, R. Clark, TEL
DTCO in 2019: The Precious Metal Stack and the Route to Better Designs, B. Cline, ARM Ltd.
3D Integration for More Moore and More than Moore, C.-H. Tung, TSMC
Recent STT-MRAM Technology: From Lab to Fab, Y. Song, Samsung
Emerging Logic Devices for Future Computing, S. Salahuddin, Univ. of California, Berkeley
Overview in Three-dimensionally Arrayed Flash Memory Technology, R. Katsumata, Toshiba Memory Corp.

Short Course 2

Advanced 5G Circuits, Systems and Applications [Suzaku II]

Monday, June 10, 8:30-16:50

Organizers: H.-J. Song, POSTECH
A. Loke, Qualcomm Inc.

5G Real and Future, T. Nakamura, NTT Docomo
mmWave RFIC Technologies for 5G Infrastructure Applications, S.-G. Yang, Samsung
Design Challenges and Solutions of LO Generation for 5G Mobile Systems, J.-H. Choi, UNIST
Acoustic Filter for 5G Smartphones, H. Nakamura, Skyworks
Substrate Material and Packaging Technology for 5G Millimeter Wave Communication, K. Sudo, Murata
Beamforming Circuits, Systems, and Operations for 5G MIMO Systems, H. Wang, Georgia Institute of Tech.
Built-In Test and Calibration of Phased Arrays, B. Floyd, NCSU

Short Course 3

Opportunities and Challenges at the Intersection of Security and AI [Shunju II, III]

Monday, June 10, 8:30-16:50

Organizers: M. Hashimoto, Osaka Univ.
X. Zhang, IBM
K. Maekawa, Renesas Electronics Corp.
N. Ramaswamy, Micron Technology Inc.

Introduction to Artificial Intelligence & Security, R. Aitken, ARM Ltd.
Mobile Deep Learning Processors: Turning Challenges into Opportunities, H.-J. Yoo, KAIST
AI Computing Architectures and Hardware, J. Burns, IBM
Nonvolatile Circuit for AI Edge Applications, M.-F. Chang, National Tsing-Hua Univ.
RRAM Fabric for In-memory Computing and Neuromorphic Computing Applications, W. Lu, Univ. of Michigan
Circuit Design Resistant to Side Channel Attacks, N. Homma, Tohoku Univ.
Energy-efficient Circuits for Cryptography and Entropy Generation, S. Mathew, Intel Corp.
Introduction to Electromagnetic Information Security, Y. Hayashi, Nara Institute of Science and Technology

The field that has transformed the world, and which we gather annually to celebrate, is undergoing a metamorphosis. Economics no longer points inexorably down Moore’s curve: price per gate has leveled off or is rising. The leading-edge nodes have become the territory of the very few companies that dare to use them. Simultaneously, the number of startups has shrunk by orders of magnitude. So where are we going? Together with you, a panel of experts from academia, industry associations, and companies ranging from start-ups to established firms will attempt to provide some insights into our future.

SESSION 1
Joint Opening and Plenary Session 1 [Shunju I, II, III]
Tuesday, June 11, 8:00-10:00

8:00- Joint Welcome and Opening Remarks
M. Masahara, AIST
M. Ikeda, The Univ. of Tokyo
C.-P. Chang, Applied Materials, Inc.
K. Chang, Xilinx Inc

8:40- Plenary
Chairpersons: K. Takeuchi, Chuo Univ.
T. Palacios, MIT

C1-1 - 8:40 (Plenary)
Virtual Cyborg: Beyond Human Limits, M. Inami, The Univ. of Tokyo, Japan

Social revolutions have been accompanied by changes in how we view the body. If we regard the information revolution as the establishment of a virtual society alongside the real society, it is necessary to design and establish a new view of the body, the “JIZAI body (Virtual Cyborg)”, which can adapt freely to changes in social structure.

In this talk, we discuss the basic knowledge of body editing needed to construct the JIZAI body (Virtual Cyborg) based on VR, AR and robotics. Superhuman Sports: Applying Human Augmentation to Physical Exercise.

This talk will also present Superhuman Sports, a form of “Human-Computer Integration” that overcomes the somatic and spatial limitations of humanity by merging technology with the body. In Japan, official home of the 2020 Olympics and Paralympics, we hope to create a future of sports where everyone, strong or weak, young or old, non-disabled or disabled, can play and enjoy playing without being disadvantaged.

T1-1 - 9:20 (Plenary)
Managing Moore’s Inflection: DARPA’s Electronics Resurgence Initiative, W. Chappell, DARPA, USA

In June 2017, the DARPA Microsystems Technology Office (MTO) announced the Electronics Resurgence Initiative (ERI), an effort of upwards of $1.5 billion, to ensure far-reaching improvements in electronics performance well beyond the limits of traditional scaling. The gains that came as electronics technology sprinted forward according to Moore’s Law were not guaranteed but realized through ingenuity and close collaboration between commercial industry, academia, and government. The present moment, beyond his law, is where Gordon Moore had true prescience. ERI is building on the long tradition of successful partnerships to foster the environment needed for the next wave of U.S. and allied semiconductor innovation.

SESSION 2
Advanced Wireless [Suzaku II]
Tuesday, June 11, 10:30-12:35

Chairpersons: H.-J. Song, POSTECH
E. Janssen, NXP Semiconductors

C2-1 - 10:30
A 76- to 81-GHz, 0.6° rms Phase Error Multi-channel Transmitter with a Novel Phase Detector and Compensation Technique,
T. Fujibayashi and Y. Takeda, Asahi Kasei Microdevices, Japan

A precisely phase-controlled transmitter operating from 76 to 81 GHz for automotive radar applications is presented. To achieve accurate phase control, a novel phase detector using 3rd-order distortion is used to compensate the transmitter phase error. The multi-channel transmitter using this detector achieves less than 0.6° root-mean-square (RMS) phase error over the 76- to 81-GHz frequency range. Since the proposed phase detector does not rely on the other TX channels, it is easy to extend the number of channels. The proposed transmitter is implemented in 65-nm CMOS technology. The phase detector consumes 1.8mW per channel.

C2-2 - 10:55
426-GHz Imaging Pixel Integrating a Transmitter and a Coherent Receiver with an Area of 380x470 μm² in 65-nm CMOS,
Y. Zhu*, P. R. Byreddy*, K. K. O* and W. Choi*, **, *The Univ. of Texas at Dallas and **Oklahoma State Univ., USA

A 426-GHz imaging pixel integrating a transmitter and a coherent receiver using the three oscillators for 3-push within an area of 380x470 μm² is demonstrated. The TX power is -11.3 dBm (EIRP) and sensitivity is -89.6 dBm for 1-kHz noise bandwidth. The sensitivity is the lowest among imaging pixels operating above 0.3 THz. The pixel consumes 52 mW from a 1.3 V VDD. The pixel can be used with a reflector with 47 dB gain to form a camera-like reflection mode image for an object 5 m away.

C2-3 - 11:20
A 1-5GHz Direct-Digital RF Modulator with an Embedded Time-Approximation Filter Achieving -43dB EVM at 1024 QAM,
S. Su and M. S.-W. Chen, Univ. of Southern California, USA

This paper presents a 1-5GHz direct-digital RF modulator with an embedded time-approximation filter to suppress the out-of-band (OOB) noise floor. The proposed time-approximation filter technique approximates an FIR impulse response in the time domain via a modulated LO waveform, leading to an equivalent RF bandpass filtering during the frequency up-conversion process. The silicon prototype achieves a peak output power of 23 dBm at 1 GHz over the 0.9-5.2GHz band with -43/-42dB EVM for a 10/20-MHz 1024/256 QAM signal at 2.4 GHz. By inserting a notch in the time-approximation filter, the OOB noise floor achieves < -158 dBc/Hz NSD at 100MHz frequency offset with peak stopband noise rejection of > 50dB.

C2-4 - 11:45
A 26-42 GHz Broadband, Back-off efficient and VSWR Tolerant CMOS Power Amplifier Architecture for 5G Applications,
C. Chappidi and K. Sengupta, Princeton Univ., USA

Future mm-Wave transmitter front-ends will need to operate in an electromagnetically complex environment, remaining resistant to near-field antenna perturbations (VSWR events) while operating across multiple mmWave frequency bands (28/37/39/42 GHz) with high efficiency and linearity under spectrally efficient modulation. This is particularly difficult since these parameters (bandwidth, linearity, efficiency, and VSWR tolerance) trade off strongly with each other in a PA. In this paper, we present a PA architecture that exploits mutual load pulling through a multi-port network in a nonlinear fashion to achieve VSWR tolerance while demonstrating Doherty-like operation across 26-42 GHz. The PA designed in 65-nm bulk CMOS generates $P_{sat}>19$ dBm with $PAE_{peak}>20\%$ across all the bands and up to 4.84x enhancement in PAE at 9.6 dB back-off. The PA demonstrates strong tolerance to VSWR events up to 4:1 load circle and supports 64-QAM OFDM modulation with 8 Gbps across 28-40GHz.

C2-5 - 12:10
A Time Domain Artificial Intelligence Radar for Hand Gesture Recognition Using 33-GHz Direct Sampling,
J. Park, SG. Lee, H. Koh, C. Kim and T. W. Kim, Yonsei Univ., Korea

This research developed a time-domain Artificial Intelligence radar using a direct sampling technique at up to 33 GS/s. It can recognize both static and dynamic hand gestures by learning the unique impulse signal that comes back from the target. The algorithm achieves recognition rates of 93.2% and 90.5% on sets of static and dynamic gestures, respectively.

SESSION 3
High Performance Computing [Suzaku I]

Tuesday, June 11, 10:30-12:10

Chairpersons: J. Chang, TSMC
Z. Zhang, Univ. of Michigan

C3-1 - 10:30

A dual-chiplet Chip-on-Wafer-on-Substrate (CoWoS®) was implemented in 7nm 15M process. Each SoC chiplet has four Arm® Cortex®-A72 processors operating at 4GHz. The on-die interconnect mesh bus operates above 4GHz at 2mm distance. The inter-chiplet connection features a scalable, 0.56pJ/bit power efficiency, 1.6Tb/s/mm² bandwidth density, and 0.3V Low-voltage-In-Package-INterCONnect (LIPINCON™) interface achieving 8Gb/s/pin and 320Gb/s bandwidth. Silicon test-chip measurements validate the processor, on-die interconnects and inter-chiplet interface performance. The built-in eye-scan feature shows the inter-chiplet connection achieves 244mV eye-height and 69% UI eye-width.

C3-2 - 10:55

This paper presents a 16nm 496-core RISC-V network-on-chip (NoC). The mesh achieves 1.4GHz at 0.98V, yielding a peak of 695 Giga RISC-V instructions/s (GRVIS) and a record 812,350 CoreMark benchmark score. The main feature is the NoC architecture, which uses only 1881µm² per router node, enables highly scalable and dense compute, and provides up to 361 Tbps of aggregate bandwidth.
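The peak-throughput figure in the abstract above can be sanity-checked with a few lines of arithmetic. This is only a sketch: it assumes each core retires at most one instruction per cycle, which the abstract does not state, and the variable names are illustrative.

```python
# Back-of-envelope check of the quoted peak throughput.
# Assumption (not stated in the abstract): one instruction per core per cycle.
cores = 496
clock_hz = 1.4e9          # 1.4 GHz at 0.98 V, as quoted
coremark_total = 812_350  # quoted aggregate CoreMark score

peak_grvis = cores * clock_hz / 1e9            # Giga RISC-V instructions/s
coremark_per_core = coremark_total / cores     # average score per core
coremark_per_mhz = coremark_per_core / (clock_hz / 1e6)

print(f"peak throughput   : {peak_grvis:.1f} GRVIS")
print(f"CoreMark per core : {coremark_per_core:.0f}")
print(f"CoreMark/MHz/core : {coremark_per_mhz:.2f}")
```

Under that assumption the product comes out near 694 GRVIS, in line with the roughly 695 GRVIS quoted; the small difference is presumably rounding of the clock frequency.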

C3-3 - 11:20
A 250mV, 0.063J/GHash Bitcoin Mining Engine in 14nm CMOS Featuring Dual-Vcc SHA256 Datapath and 3-Phase Latch Based Clocking, V. Suresh, S. Satpathy, R. Kumar, M. Anders, H. Kaul, A. Agarwal, S. Hsu, R. Krishnamurthy, V. De and S. Mathew, Intel Corp., USA

A 0.15mm² Bitcoin mining engine is fabricated in 14nm CMOS with highest-reported energy-efficiency of 0.063J/GHash at 250mV, 25C. Fully-unrolled SHA256 datapath with Bitcoin-specific look-ahead/deferred digest optimizations and 3-cycle distributed scheduler provide 31/56% digest/scheduler delay reductions, resulting in 10% higher energy-efficiency with dual-Vcc operation. 3-phase latch-based clocking with stretchable non-overlapping clocks eliminates all min-delay paths, reducing total sequential power consumption by 50%. Robust mining operation over a wide supply range of 230-900mV is demonstrated, with 10-760MHash/s throughput measured at 100C.

C3-4 - 11:45

This paper presents a 25mm² SoC in 16nm FinFET technology targeting flexible acceleration of compute intensive kernels in DNN, DSP and security algorithms. The SoC includes an always-on sub-system, a dual-core Arm A53 CPU cluster, an embedded FPGA array, and a quad-core cache-coherent accelerator cluster. Measurement results demonstrate the following observations: 1) moving DSP/cryptography kernels from A53 to eFPGA increases energy efficiency between 5.5x and 28.9x, 2) the use of cache coherency for datapath accelerators increases throughput by 2.94x, and 3) accelerator flexibility-efficiency (GOPS/W) range on spans more than 50x, with 3.1x (+SIMD), 16.5x (eFPGA), 54.5x (CCA) compared to the dual-core CPU baseline on comparable tasks. The energy per inference on MobileNet CNN shows a peak improvement of 47.6x.

SESSION 4
Advanced Frequency Generators [Suzaku III]

Tuesday, June 11, 14:00-15:40

Chairpersons: J. Lee, National Taiwan Univ.
D. Griffith, Texas Instruments

C4-1 - 14:00

This paper presents an injection-locked PLL that employs an RC pulse generator and injection timing calibration to enhance the jitter and reference spur performance. An ultra-low power oscillator is designed to reduce the overall power consumption of the PLL. The chip is fabricated in 65nm CMOS technology, occupying an area of 0.25mm². The proposed ILPLL achieves 70fs rms integrated jitter and -66dBc reference spur, while consuming 0.2mW, which translates into -270dB FoM at 2.4GHz output frequency.
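The FoM in the abstract above appears to follow the jitter-power figure of merit commonly used for PLLs, FoM = 10·log10[(σ/1 s)² · (P/1 mW)]. The sketch below assumes that definition (the abstract does not spell it out) and uses a hypothetical helper name, `jitter_fom_db`.

```python
import math

def jitter_fom_db(jitter_rms_s: float, power_w: float) -> float:
    """Jitter-power FoM in dB: 10*log10((sigma / 1 s)^2 * (P / 1 mW)).

    Assumed definition; lower (more negative) is better.
    """
    return 10 * math.log10((jitter_rms_s / 1.0) ** 2 * (power_w / 1e-3))

# Values quoted in the abstract: 70 fs rms integrated jitter, 0.2 mW.
fom = jitter_fom_db(70e-15, 0.2e-3)
print(f"FoM = {fom:.1f} dB")
```

With the quoted 70 fs and 0.2 mW this evaluates to about -270.1 dB, consistent with the stated -270dB.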
A 270-GHz Fully-Integrated Frequency Synthesizer in 65nm CMOS, X. Liu and H. C. Luong, The Hong Kong Univ. of Science and Technology, Hong Kong

A fully-integrated sub-THz frequency synthesizer is proposed leveraging an RF sub-sampling PLL (SS-PLL) cascaded with an ILFM-based mm-Wave LO generation chain and a sub-THz mixer for frequency extension. Third-harmonic and fourth-harmonic extraction enhancement methods are proposed for the ILFMx3 and ILFMx4, respectively. A distributed biased technique is proposed to improve the linearity of the magnetic tuning sub-THz ILFMx6. In addition, a frequency tracking loop (FTL) with frequency and amplitude calibration is proposed for the ILFMs. The 65nm CMOS prototype measures a locking range from 61.2-to-100.8GHz, 122.4-to-136.8GHz, and 198.5-to-273.6GHz, phase noise from -79.3dBc/Hz to -95.4dBc/Hz at 1-MHz offset, an integrated jitter from 124fs to 159fs, and an output power of -11dBm and DC-to-RF efficiency of 0.16% at a carrier of 211.4GHz.

A microwatt-class always-on sensor fusion engine is fabricated in 14nm CMOS and occupies 0.024mm$^2$. Single-instruction matrix operations, complex fixed-point SIMD instructions with inline shift/permute, programmable power gates, and AOI clocked circuits achieve 19% clock power reduction with 100mV improved $V_{th}$.

We have prototyped a microcontroller (MCU) that employs crystalline In-Ga-Zn oxide transistors having an extremely low off current below 10$^{-21}$ A. The IGZO-based MCU can retain data during power gating in both its processing unit and memory, and there is an integrated voltage regulator that can store the reference voltage. The MCU is prototyped with the combination of 60-nm IGZO (in BEOL) and 110-nm Si CMOS processes. It has a standby power of 880 nW, a system backup time of 21 ns and a system wakeup time of 4.69 μs. The MCU that employs IGZO technology can be applied to devices which require low power consumption as well as fast wakeup.

A 2.2μW 600kHz Frequency-Locked Relaxation Oscillator with 0.046%/V Voltage and 48.69ppm/°C Temperature Stability for IoT Sensor Node Applications, X. Meng, Hong Kong Univ. of Science and Technology, Hong Kong

This brief presents a 600kHz relaxation oscillator with 2.2μW power consumption. A low-power frequency-locked loop (FLL) structure is proposed to increase the frequency immunity against process, voltage and temperature (PVT) variations. A front-end regulator is proposed to further enhance the voltage stability with limited power overhead. A current-injection scheme is proposed to compensate the temperature variation from the on-chip poly-resistors. Fabricated in 0.18μm technology, this oscillator can operate from an unregulated supply of 1.1V to 3.3V with 0.046%/V voltage stability, and the measured temperature stability is 48.69ppm/°C from -45°C to 125°C. The measured Jitter$_{rms}$ is 1.56ns with 120k hits.

A 138fs$_{rms}$-Integrated-Jitter and -249dB-FoM Clock Multiplier with -51dBc Spur Using a Digital Spur Calibration Technique in 28-nm CMOS, Y.-A. Li and A. Niknejad, Univ. of California, Berkeley, USA

A 3-GHz 8 clock multiplier has been proposed with a jitter performance that is insensitive to frequency drift without a continuous frequency tracking loop (FTL). With the proposed digital calibration techniques, the spurs can be effectively suppressed down to -51dBc. Fabricated in 28-nm CMOS technology, this prototype presents an integrated jitter of 138fs$_{rms}$ while consuming 6.5mW from 1-V/0.8-V supplies and achieves -249dB FoM.

Catena: A 0.5-V Sub-0.4-mW 16-Core Spatial Array Accelerator for Mobile and Embedded Computing, J. P. Cerqueira*, T. J. Repetti*, Y. Fu**, S. Priyadarshi***, M. A. Kim* and M. Seok*, *Columbia Univ. and **Qualcomm, Inc., USA

We present Catena, a programmable 16-core spatial array accelerator supporting workloads for mobile and embedded devices. Deeply scaling supply voltage of such parallel processors could save energy, but alone results in limited savings, as it magnifies the energy waste of underutilized hardware. Therefore, we design Catena with novel circuit and architecture techniques to minimize such energy waste. Thanks to the proposed techniques, the 65-nm CMOS prototype achieves state-of-the-art energy efficiencies across multiple workloads.

SESSION 6

Physical Sensors [Suzaku I]

Tuesday, June 11, 14:00-15:40

Chairpersons: T. Tokuda, Nara Institute of Science and Technology
K. Makinwa, Delft Univ. of Technology

C6-1 - 14:00


This paper presents a low power, reconfigurable, high dynamic range (DR), light-to-digital converter (LDC) for wearable PPG/NIRS recording. The LDC converts light into the time domain with a dual-slope mode integrator, followed by a counter-based, time-to-digital converter. This architecture merges the functionalities of a conventional transimpedance amplifier and ADC, while quantization in time domain significantly improves the DR. The inherent low pulse repetition frequency (PRF) of LDC also reduces the LED power. Furthermore, the DR of the LDC can be easily reconfigured by re-programming the counting step size or the PRF of the LEDs, allowing optimal power consumption for different DR scenarios. The IC achieves a maximum DR of 119dB while only consuming 196μW (including 2X LEDs). The IC is validated with PPG and NIRS tests, using photodiodes (PDs) and silicon photomultipliers (SiPMs) respectively.

C6-2 - 14:25

A 0.02mm² 100dB-DR Impedance Monitoring IC with PWM-Dual GRO Architecture, H. Han, W. Choi and Y. Chae, Yonsei Univ., Korea

This paper presents an impedance monitoring IC that achieves small area and wide DR. The stimulated signal is encoded by using pulse width modulation (PWM) and complex impedance can be measured by in- (I) and quadrature- (Q) phase outputs through dual phase demodulation. The two-level I/Q signals drive two gated-ring-oscillator (GRO) based ADCs, thus eliminating the distortion of GROs. Fabricated in 0.11μm CMOS, the prototype IC occupies only 0.02mm². It achieves a wide DR of 100dB and a resolution of 19.21Ω rms at 1MΩ resistance in a conversion time of 5ms, and consumes 152.3μW. It corresponds to state-of-the-art resolution FoM of 14.6pJ/step.

C6-3 - 14:50


This paper presents a fully integrated CMOS multi-modal cellular sensor/stimulator array with 21952 multi-modal pixels, 1568 simultaneous parallel readout channels, 16 μm x 16 μm pixel pitch for single-cell resolution, and 3.6 mm x 1.6 mm tissue-level field-of-view (FoV), achieving high-resolution multi-parametric cellular potential/impedance/optical imaging for holistic cellular characterization and cell-based assays. Moreover, the array system reports the first on-chip true 4-point impedance sensing scheme with 16 parallel impedance sensing channels, which enables precise cellular impedance measurements with aggressively scaled electrodes and large electrode-electrolyte interfacial impedance. The chip also supports concurrent 16-channel 5-bit reconfigurable current-mode cell stimulation. The chip is implemented in a 130 nm low-cost standard CMOS process. Extracellular potentials (700 μV-1.5 mV) from on-chip cultured neonatal rat ventricular myocytes (NRVMs) are successfully measured. With on-chip cultured cardiac fibroblasts, full-chip high-resolution optical images and 4-point impedance mapping precisely capture cell distribution, growth, proliferation, and surface adhesion.

C6-4 - 15:15


An integrated 11-bit 2-D CMOS stress sensor is presented with 66dB of dynamic range, measuring -100 to 360MPa, and <1LSB error over temperature from 5°C to 90°C. N-Well-based primary elements enable accurate sensing of stress magnitude and angle, and allow repeatable error compensation.

SESSION 7
Data Converter Techniques [Suzaku III]

Tuesday, June 11, 16:00-18:05

Chairpersons: C.-C. Liu, MediaTek Inc.
E. Martens, imec

C7-1 - 16:00
A 75.8dB-SNDR Pipeline SAR ADC with 2nd-order Interstage Gain Error Shaping, C.-K. Hsu and N. Sun, The Univ. of Texas at Austin, USA

This paper presents a low-cost gain error shaping (GES) technique that can substantially suppress the in-band interstage gain error in pipeline ADCs. It works for both closed-loop and open-loop amplification. A prototype ADC with the proposed 2nd-order GES technique in 40nm CMOS achieves 75.8dB SNDR over 12.5MHz BW while operating at 100MS/s and consuming 1.54mW. It achieves 174.9dB Schreier FoM. The GES-related hardware occupies less than 2% of the core area.
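The Schreier figure of merit quoted above can be reproduced from the other numbers in the abstract. The sketch below uses the usual definition, FoM_S = SNDR + 10·log10(BW / P), with an illustrative function name.

```python
import math

def schreier_fom_db(sndr_db: float, bandwidth_hz: float, power_w: float) -> float:
    """Schreier FoM (SNDR-based): SNDR + 10*log10(BW / P), in dB."""
    return sndr_db + 10 * math.log10(bandwidth_hz / power_w)

# Values quoted in the abstract: 75.8 dB SNDR, 12.5 MHz BW, 1.54 mW.
fom_s = schreier_fom_db(75.8, 12.5e6, 1.54e-3)
print(f"Schreier FoM = {fom_s:.1f} dB")
```

The result rounds to 174.9 dB, matching the abstract.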

C7-2 - 16:25

This work presents a time-multiplexing SAR ADC to support up to 5-lead ECG monitoring with >100dB SNDR per readout channel. Its noise and linearity performance are enhanced by a combination of dual-reference architecture and mismatch error shaping (MES) technique without using amplifiers or calibration, resulting in a >106dB SFDR and 109.4dB DR within 250Hz bandwidth (FoM_SDR=178.9dB). The ECG analog front-end (AFE), including 3 DC-coupled instrumentation amplifiers (IAs) and 1 ADC, occupies only 0.48mm² in 55nm CMOS. Each ECG channel achieves 1μVrms (0.5-250Hz) input-referred noise at a low IA gain of 6V/V with a 667mVpp-diff linear input range.

C7-3 - 16:50

This work proposes a dual-residue pipelined-SAR ADC that generates two residue signals from a single amplifier, which eliminates the need for gain-matching calibration. A capacitive interpolating SAR conversion technique is also proposed for the second stage for power efficiency. A prototype ADC fabricated in 40nm CMOS occupies an active area of 0.026 mm² and achieves an SNDR of 62.1 dB at Nyquist and 67.1 dB SFDR under a 0.9 V supply.

C7-4 - 17:15
A 0.2 - 8 MS/s Flexible SAR ADC Achieving 0.35 - 2.5 fJ/Conv-Step and Using Self-Quenched Dynamic Bias Comparator, H. Bindra*, A.-J. Annema*, S. M. Louwsma**, and B. Nauta*, **Teledyne DALSA Corp., The Netherlands

A 10b flexible SAR ADC is presented incorporating a self-quenched dynamic bias comparator and a self-triggered asynchronous delay line. The ADC is fabricated in 65nm CMOS, occupies 0.04mm² and has an ENOB > 9bit and SFDR > 66dB for sampling rates from 0.2 to 8MS/s at supply voltages respectively from 0.7V to 1.3V with a Walden FOM from 0.35 to 2.5fJ/conv-step.

C7-5 - 17:40
A 29mW 5GS/s Time-Interleaved SAR ADC Achieving 48.5dB SNDR with Fully-Digital Timing-Skew Calibration Based on Digital-Mixing, G. Mingqiang*, M. Jiaji*, S. Sai-Weng*, W. Hegong**, and M. P. Rui*, ***, *Univ. of Macau, Macau, **Univ. of Texas at Austin, USA and ***Instituto Superior Técnico / Universidade de Lisboa, Portugal

This paper presents a 5GS/s 16-way Time-Interleaved SAR ADC in 28nm CMOS, proposing a fully-digital background timing-skew calibration based on digital mixing, without adding any extra analog circuits. We implement the sub-channel SAR with a splitting-combined monotonic switching procedure. The prototype ADC achieves 48.5dB SNDR at Nyquist rate, while the power consumption is 29mW leading to a Walden FOM of 26.7fJ/conv-step.
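The Walden figure of merit quoted above can be checked from the stated power, sample rate and SNDR. The sketch below uses the standard definitions ENOB = (SNDR - 1.76) / 6.02 and FoM_W = P / (2^ENOB · f_s); the function name is illustrative.

```python
def walden_fom_fj(power_w: float, sample_rate_hz: float, sndr_db: float) -> float:
    """Walden FoM in fJ per conversion step: P / (2**ENOB * fs)."""
    enob = (sndr_db - 1.76) / 6.02  # effective number of bits
    return power_w / (2 ** enob * sample_rate_hz) * 1e15

# Values quoted in the abstract: 29 mW, 5 GS/s, 48.5 dB SNDR at Nyquist.
fom_w = walden_fom_fj(29e-3, 5e9, 48.5)
print(f"Walden FoM = {fom_w:.1f} fJ/conv-step")
```

This evaluates to roughly 26.7 fJ/conv-step, in agreement with the abstract.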

SESSION 8
Low-Power Wireless [Suzaku II]

Tuesday, June 11, 16:00-18:05

Chairpersons: K. Okada, Tokyo Institute of Technology
A. Zolfaghari, Broadcom Corp.

C8-1 - 16:00

A pre-802.11ba wake-up radio (WUR) receiver and digital baseband (DBB) are presented, monolithically integrated as a technology demonstrator within an 802.11a/b/g/n/ac Wi-Fi transceiver. Occupying <0.105mm², the WUR receiver consumes only 340µW, with a measured sensitivity of -92.4dBm, and can tolerate a -56dBm, 25MHz offset, Wi-Fi blocker with 3dB desensitization. The WUR receiver operates when the Wi-Fi system is in sleep mode and turns on the Wi-Fi radio upon receiving an 802.11ba, D1.0 compliant, wake-up packet.

C8-2 - 16:25
A 3.8 mW Sub-Sampling Direct RF-to-Digital Converter for Polar Receiver Achieving 1.94 Gb/s Data Rate with 1024-APSK Modulation, H. Wang, Z. Su, H. Zhao, Y. Wang and F. Dai, Auburn Univ., USA

This paper presents a direct RF-to-digital converter (RDC) for polar RX. It consists of a pair of TDCs, an ADC, and a precise sampling position control system. Unlike conventional direct-RF sampling receivers, the RDC samples the input RF signal at baseband rate. It is capable of directly digitizing the phase and amplitude of the received modulated RF signals. It is compatible with a variety of modulations and has advantages of relaxed system requirements on phase noise and linearity when APSK is used. The RDC achieves a max rate of 1.94 Gb/s with 1024-APSK at a carrier of 6 GHz, consuming only 3.8mW.

C8-3 - 16:50

In this paper we present a highly miniaturized Bluetooth Low Energy (BLE) broadcaster suitable for volume-constrained ingestible/wearable wireless sensors. The proposed module includes a 65nm CMOS chip co-packaged with a thin-film bulk acoustic resonator (FBAR) serving as the frequency reference. All necessary electronics are integrated in a volume of 1.53 mm³ (1.6 mm x 1.6 mm x 0.6 mm), the smallest reported to date in the literature. The PLL-free transceiver architecture is realized using a directly modulated low-power (1.2 mA) FBAR oscillator feeding a class D power amplifier with integrated matching network. The FBAR oscillator’s short startup time and low power result in a total TX energy of 2.37 μJ to wake up from sleep, transmit a full BLE advertising packet at 0 dBm, and go back to sleep.

C8-4 - 17:15

This work presents a 33 nW wake-up receiver with -106dBm sensitivity at 428 MHz. Within-bit duty cycling allows RF gain at nano-watt DC power levels providing 26 dB sensitivity improvement over prior art at iso-power. An RF MEMS filter and an automatic gain and offset control loop suppress noise and reject interference. The receiver can be digitally tuned across DC power, latency, and sensitivity to provide flexible functionality from indoor short range to outdoor long-range applications.

C8-5 - 17:40

We present an 802.15.4 compatible transceiver that operates without any off-chip frequency reference. With integrated Cortex-M0, the chip can also transmit BLE beacons with only three external connections (power, ground, and antenna). The RF transmitter operates with >10% system efficiency at -10 dBm output power from a regulated supply. The entire chip, including the microprocessor, can operate below 1 mW peak power when transmitting. The analog receiver power consumption is 1.03 mW from a 1.5V battery.

SESSION 9
High-Density I/Os [Suzaku I]

Chairpersons: C. P. Yue, Hong Kong Univ. of Science and Technology
A. Loke, Qualcomm Inc.

C9-1 - 16:00

A 1.02pJ/b USR link carrying 416.67 Gb/s/mm die edge (500Gb/s aggregated data rate) in 16nm FinFET, while occupying 2.4mm², is presented. To enable dense routing over conventional package material, a modified correlated NRZ signaling with low sensitivity to ISI, Xtalk, and common-mode noise has been developed. A matched CTLE/slicer topology has been employed to enhance robustness of the receiver over PVT. A very wideband Rx PLL tracks the majority of Tx jitter, resulting in significant power saving by relaxing Tx design constraints.

C9-2 - 16:25

This paper presents a data (DQ) receiver for HBM3 with a self-tracking loop that tracks a phase skew between DQ and data strobe (DQS) due to a voltage or thermal drift. The self-tracking loop achieves low power and small area by utilizing an analog-assisted baud-rate phase detector. The proposed pulse-to-charge (PC) phase detector (PD) converts the phase skew to a voltage difference and detects the phase skew from the voltage difference. An offset calibration scheme that can compensate for a mismatch of the PD is also proposed. The proposed calibration scheme operates without any additional sensing circuits by taking advantage of the write training of HBM. Fabricated in 65 nm CMOS, the DQ receiver shows a power efficiency of 370 fJ/b at 4.8 Gb/s and occupies 0.0056 mm². The experimental results show that the DQ receiver operates without any performance degradation under a ±10% supply variation.
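To put the quoted power efficiency in absolute terms, energy per bit multiplied by data rate gives the power drawn at that rate. The sketch below applies this to the figures in the abstract above; the helper name is hypothetical and the result is derived here rather than reported by the authors.

```python
def link_power_w(energy_per_bit_j: float, data_rate_bps: float) -> float:
    """Power implied by an energy-per-bit figure at a given data rate."""
    return energy_per_bit_j * data_rate_bps

# Values quoted in the abstract: 370 fJ/b at 4.8 Gb/s.
dq_power_w = link_power_w(370e-15, 4.8e9)
print(f"DQ receiver power = {dq_power_w * 1e3:.3f} mW")
```

For the quoted 370 fJ/b at 4.8 Gb/s this works out to about 1.78 mW per DQ receiver.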

C9-3 - 16:50
An 8nm All-Digital 7.3Gb/s/pin LPDDR5 PHY with an Approximate Delay Compensation Scheme, K. Chae, Samsung Semiconductor, Inc., Korea

An all-digital 7.3Gb/s/pin LPDDR5 PHY is presented. A non-interruptive approximate delay compensation scheme is proposed to enhance tolerance to voltage variation without any memory access black-out. Thus, seamlessly maintained DQ-centering improves access valid-window-margin under supply noise without performance penalty. In addition to that, the proposed scheme enables direct DVFS switching due to the voltage variation tolerance with minimized performance penalty. The LPDDR5 PHY in an 8nm technology demonstrated 6.4Gb/s/pin with 0.31UI at 640mV and 7.3Gb/s/pin with 0.25UI at 790mV, respectively. The voltage variation tolerance is measured up to 70mV without memory access black-out.

Circuits Evening Panel Discussion
Technology We Will See Coming Out of the Tokyo Olympics and Beyond [Suzaku I]
Tuesday, June 11, 20:00-21:30

Organizers: M. Natsui, Tohoku Univ.
K. Okada, Tokyo Institute of Technology
S. Ho, MediaTek Inc.

Moderator: K. Hamashita, AKM

Panelists: J. Jensen, Intel Corp.
M. Pate, Google
M. Mizuno, NEC Corp.
Y. Kimura, Fujitsu Ltd.
Y. Kato, DENSO Corp.
P. O'Connor, Microsoft

The upcoming Olympic Games in Tokyo will feature not only the world’s best athletes, but also the world’s newest technologies. Many new and exciting technologies will be previewed for the world to see, including 5G, IoT, AI, autonomous vehicles, AR/VR, sensors, and security. The panel will feature technologists who will give us a look behind these new technologies to the innovative circuits that make them possible. (Note that this panel is not affiliated with the Tokyo Olympics)

SESSION 10
Remarks, Awards and Plenary Session 2 [Shunju I, II, III]
Wednesday, June 12, 8:00-10:00

8:00-
Remarks and Award Ceremony
M. Masahara, AIST
M. Ikeda, The Univ. of Tokyo
C.-P. Chang, Applied Materials, Inc.
K. Chang, Xilinx Inc.

C10-1 - 8:40 (Plenary)
Computational and Technology Directions for Augmented Reality Systems, S. Rabii, Facebook Inc., USA

Augmented reality (AR) is a set of technologies that will fundamentally change the way we interact with our environment. It represents a merging of the physical and the digital worlds into a rich, context aware and accessible user interface delivered through a socially acceptable form factor such as eyeglasses. One of the biggest challenges in realizing a comprehensive AR experience is managing power consumption to ensure both adequate battery life and a physically comfortable thermal envelope. This presentation reviews advanced concepts in minimizing power in data transfer across components, leveraging highly efficient accelerators while maintaining programmability, and the potential of emerging nonvolatile memories for low power computing.

T7-1 - 9:20 (Plenary)
Si Platform for Developing Spin-Based Quantum Computing, S. Tarucha, The Univ. of Tokyo, Center for Emergent Materials Science, RIKEN, Japan

To date, basic techniques for implementing spin-based quantum computing have been developed using quantum dots, including single- and two-qubit gates, initialization and readout. But improving the operation fidelity as well as increasing the qubit number is still a challenge in realizing fault-tolerant quantum computing. Electron spins confined to Si quantum dots have a long decoherence time and the physical area for implementing a qubit is very small, smaller than 0.1 mm². We have developed a fast gating technique for the Si quantum dots that, thanks to the weak decoherence, operates the qubits with high fidelity. I will first discuss the spin dephasing measured for Si quantum dots and how to suppress it to raise the gate fidelity well above the threshold of fault-tolerant computation. I will then review the current research and development to scale up the qubit system, including integration technologies of the quantum processor and cryo-electronics to improve the performance of the large-scale quantum circuit.

SESSION 11
SRAM and DRAM [Suzaku III]
Wednesday, June 12, 10:30-12:35

C11-1 - 10:30

We present a high-performance 6T SRAM architecture equipped with low-power features of late cancel, left-right enable, input-gating, and power-gating. Measurements show that these SRAMs can support CPUs running at 4GHz while offering dynamic power savings of 17% and 6% for the caches and the system respectively and up to 21X static power system savings for the low-power implementation.

C11-2 - 10:55

A 5Gb/s/pin 16Gb LPDDR4/4X reconfigurable SDRAM with a self-mode detection scheme, a voltage-high keeper (VHK) for un-terminated load and a prediction-based fast-tracking ZQ algorithm is implemented in a 10nm class (2nd generation) DRAM process. By providing a reconfigurable LVSTL with a mode detection scheme to support two different DRAM interface standards (LPDDR4/4X) depending on the I/O supply voltage (V_DDQ), the proposed design can maintain system compatibility and longevity with the legacy controller and the PHY structure. The VHK for LPDDR4 enables 3.2Gb/s operation in the un-terminated load, similar to LPDDR4X, by alleviating the inter-symbol interference (ISI) through the controlled leakage current. In ZQ calibration, the proposed ZQ algorithm achieves fast ZQ code searching, reducing the calibration time by 30% under PVT variation. Moreover, an internal ZQ calibration (IZQC) is newly adopted to minimize the variation of the driver strength under PVT variation.

C11-3 - 11:20

A new breed of Form-Factor-Driven DRAMs offers 80% lower standby power and > 50% IO signal reduction vs. Capacity-Driven commodity DRAM. Command/address/data are multiplexed onto 16 pins and combined with a Serial Control Pin in a Single-Edge-Pinout-Floorplan, providing bus efficiency >98%. Major SOC-DRAM subsystem cost savings are enabled via die size, packaging and PCB area savings using this RPCA. A 100x speedup of array fills using a new Group Write circuit further reduces test cost.
C11-4 - 11:45
Area-Efficient and Variation-Tolerant In-Memory BNN Computing Using 6T SRAM Array, J. Kim, J. Koo, T. Kim, Y. Kim, H. Kim, S. Yoo and J.-J. Kim, POSTECH, Korea

We introduce an SRAM-based binary neural network (BNN) hardware which, for the first time, uses a single 6T SRAM cell for XNOR operation. The cell is 45% smaller than the previous 8T bitcell for XNOR operation. We also propose an in-memory calibration and batch normalization to achieve more reliable operation in the presence of process variation.

C11-5 - 12:10

This work presents a 65nm CMOS speech recognition processor, named Thinker-IM, which employs 16 computing-in-memory (SRAM-CIM) macros for binarized recurrent neural network (RNN) computation. Its major contributions are: 1) A novel digital-CIM mixed architecture that runs an output-weight dual stationary (OWDS) dataflow, reducing memory accesses by 85.7%; 2) Multi-bit XNOR SRAM-CIM macros and corresponding CIM-aware weight adaptation that reduce energy consumption by 9.9% on average; 3) Predictive early batch-normalization (BN) and binarization units (PBUs) that reduce RNN computations by up to 28.3%. Measured results show a processing speed of 127.3us/Inference and over 90.2% accuracy, while achieving a neural energy efficiency of 5.1pJ/Neuron, which is 2.8x better than the state of the art.

SESSION 12
LDOs for High Performance Digital [Suzaku II]

Wednesday, June 12, 10:30-12:35

Chairpersons: M. Hamada, Keio Univ.
C. Sandner, Infineon Technologies AG

C12-1 - 10:30

A variation-adaptive computational digital low-dropout regulator (DLO) is presented that uses an event-driven computational controller (CC) to compute the required number of power gates to regulate the output voltage for any load/reference transient. The self-calibrated CC ensures a 2-asynchronous-event-cycle settling time independent of the load/V_REF range. Measurements of a testchip in 22nm CMOS demonstrate >20X faster settling time and >6X lower droop magnitude than a conventional linear controller (LC) based LDO.

C12-2 - 10:55

A 7nm all-digital leakage-current-supply (LCS) circuit tracks core leakage across process and temperature variations and controls PFET block-head switches to supply the slow-changing leakage current while a high-bandwidth analog low-dropout (LDO) voltage regulator supplies the fast-changing dynamic current. By decreasing the LDO maximum current demand, silicon measurements demonstrate a 70mV (44%) reduction in the minimum dropout voltage, resulting in a wider voltage range of LDO usage for core power savings of 14-22%.

C12-3 - 11:20
A 0.5-1V Input Event-Driven Multiple Digital Low-Dropout-Regulator System for Supporting a Large Digital Load, S. J. Kim*, D. Kim*, Y. Pu**, C. Shi** and M. Seok*, *Columbia Univ. and **Qualcomm, Inc., USA

Recent digital low-dropout regulators have demonstrated competitive load regulation performance for a digital load even with a low input voltage. However, few existing regulator designs have investigated supporting a spatially large load with realistic grid parasitics. This paper presents a system consisting of nine digital low-dropout regulators based on event-driven control for better supporting such a load. At 0.5V (1V) input, our prototype improves the load regulation FoM by 3.9X (9.1X) and current density by 8.7X (2.8X) over the prior state of the art.

C12-4 - 11:45
0.5V-V_IN, 0.29ps-Transient-FOM, and Sub-2mV-Accuracy Adaptive-Sampling Digital LDO Using Single-VCO-Based Edge-Racing Time Quantizer, J. Lee, J. Bang, Y. Lim and J. Choi, Ulsan National Institute of Science and Technology, Korea

This work presents a digital LDO using a single-VCO-based edge-racing (SVER) time quantizer to achieve fast transient response and high accuracy concurrently. As the SVER scales the sampling frequency dynamically according to the magnitude of the error in the output voltage, the transient response can be improved without increasing the power consumption in the steady state. Since the SVER uses a single VCO, the output accuracy remains high in the presence of local mismatches. In measurement, this LDO achieved a 0.29 ps-transient FOM and a sub-2 mV accuracy under a 0.5-V supply.

This paper presents a low-dropout (LDO) regulator that can supply up to 300mA output current with high power supply rejection (PSR). The proposed BGR-recursive LDO design, with PSR-boosting feedforward embedded in the error amplifier, improves the PSR while consuming a low quiescent current of < 50μA. Among the state-of-the-art LDOs with an external capacitor, the proposed chip fabricated in 0.5-μm CMOS achieves the highest PSR of 102-to-80dB in the frequency range from 100Hz to 0.1MHz with a current efficiency of 99.98% and shows the best FoM of 11ps in the transient response performance. The BGR-recursive LDO design also achieves a line regulation of 0.003%/V.

SESSION 13
High-Speed DACs and Analog Techniques [Suzaku I]
Wednesday, June 12, 10:30-12:35

C13-1 - 10:30
A 0.07mm² 210mW Single-1.1V-Supply 14-bit 10GS/s DAC with Concentric Parallelogram Routing and Output Impedance Compensation, H.-Y. Huang and T.-H. Kuo, National Cheng Kung Univ., Taiwan

A DAC with small-size non-cascoded current cells is proposed to achieve small area, low power, high linearity, and wide bandwidth. The proposed concentric parallelogram routing (CPR) reduces mismatch and timing skew among cells. In addition, the proposed output impedance compensation (OIC) remedies the insufficient output impedance of the non-cascoded current cells. The DAC, implemented in a 28nm CMOS process, achieves > 64dB SFDR over the entire Nyquist bandwidth at 10GS/s while consuming 210mW from a single 1.1V supply. Compared with other state-of-the-art CMOS DACs with resolutions higher than 10bit and Nyquist bandwidths over 3.4GHz, this DAC has an active area of only 0.07mm², less than 1/12 of the others, and the best performance for a commonly-used figure-of-merit (FoM).

C13-2 - 10:55
A 6b 28GS/s 4-channel Time-interleaved Current-Steering DAC with Background Clock Phase Calibration, W.-C. Kim, D.-S. Jo, Y.-J. Roh, Y.-D. Kim and S.-T. Ryu, KAIST, Korea

This paper presents a four-channel time-interleaved high-speed current-steering DAC with a proposed two-stage analog multiplexer (MUX). Optimum switching times of the cascaded MUX and the sub-DACs are guaranteed by background clock phase calibration with a proposed maximum-overlap-based phase detector. A 6b 28GS/s prototype DAC fabricated in 40nm CMOS achieves a SFDR of 34.6dB at a Nyquist input and consumes 103mW under dual supply voltages of 1.1V and 1.6V.

C13-3 - 11:20
An Energy-Efficient Comparator with Dynamic Floating Inverter Pre-Amplifier, X. Tang, B. Kasap, L. Shen, X. Yang, W. Shi and N. Sun, The Univ. of Texas at Austin, USA

This paper presents an energy-efficient comparator with a novel dynamic pre-amplifier. By using an inverter-based input pair powered by a floating reservoir capacitor, the pre-amp realizes both current reuse and dynamic bias, thereby significantly boosting g_m/I_o and reducing noise. Moreover, it greatly reduces the influence of the input common-mode voltage on the comparator performance, including noise, offset, and delay. A prototype comparator in 180nm achieves 46uV input-referred noise while consuming only 1pJ per comparison under a 1.2V supply. This represents a >7X energy efficiency boost compared to a Strong-Arm latch. To the authors' best knowledge, it achieves the highest reported energy efficiency.

C13-4 - 11:45
A 31 pW-to-113 nW Hybrid BJT and CMOS Voltage Reference with 3.6% ±3σ-Inaccuracy from 0 °C to 170 °C for Low-Power High-Temperature Systems, I. Lee and D. Blaauw, Univ. of Michigan, USA

This paper proposes a low-power voltage reference generating 736 mV from 0 °C to 170 °C for low-power high-temperature systems. Using subthreshold current, a BJT diode develops a process-insensitive complementary-to-absolute-temperature voltage, and stacked CMOS transistors compensate the temperature sensitivity by adding a proportional-to-absolute-temperature voltage. To maintain a reference voltage at high temperature, the circuit is designed considering pwell-to-deep nwell diode leakage. 76 samples from 3 different wafers, fabricated in a 160 nm process, show a ±3σ inaccuracy of 3.6% from 0 °C to 170 °C without any trimming. It consumes 31 pW at 27 °C and 113 nW at 170 °C from 0.9 V.

C13-5 - 12:10
A 0.6-V Tail-Less Inverter Stacking Amplifier with 0.96 PEF, L. Shen, A. Mukherjee, S. Li, X. Tang, N. Lu and N. Sun, The Univ. of Texas at Austin, USA

This paper presents a highly power-efficient instrumentation amplifier. It adopts an inverter stacking amplifier (ISA) based 1st-stage that realizes 4x current reuse, thereby greatly reducing the supply current. To boost the power efficiency and enable robust operation under a 0.6V supply, the tail current sources are removed. A high CMRR of 84dB is maintained by combining chopping, closed-loop biasing, and inherent high impedance degeneration. A 3-stage topology with a class-AB last-stage realizes high loop gain and power-efficient dominant-pole compensation. A prototype tail-less ISA in 180nm achieves 1.38uV rms noise within 8-kHz BW, while consuming only 2.7uW. This leads to a power efficiency factor (PEF) of 0.96. To the authors’ best knowledge, it is the best reported PEF to date.
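The quoted PEF can be reproduced from the noise, bandwidth, power and supply figures in the abstract. The sketch below uses the widely used definitions NEF = V_rms · sqrt(2·I_tot / (π·U_T·4kT·BW)) and PEF = NEF²·V_DD. The abstract does not state which definition or temperature the authors used, so the formula choice, the 300 K temperature, the use of power divided by supply voltage as the total current, and all function names are assumptions.

```python
import math

K_BOLTZMANN = 1.380649e-23     # J/K
Q_ELECTRON = 1.602176634e-19   # C


def noise_efficiency_factor(v_rms, i_total, bandwidth_hz, temp_k=300.0):
    """NEF for input-referred rms noise v_rms (V) and supply current i_total (A)."""
    thermal_voltage = K_BOLTZMANN * temp_k / Q_ELECTRON
    denom = math.pi * thermal_voltage * 4.0 * K_BOLTZMANN * temp_k * bandwidth_hz
    return v_rms * math.sqrt(2.0 * i_total / denom)


def power_efficiency_factor(v_rms, power_w, vdd, bandwidth_hz, temp_k=300.0):
    """PEF = NEF^2 * VDD, taking the total current as power / VDD (assumption)."""
    i_total = power_w / vdd
    nef = noise_efficiency_factor(v_rms, i_total, bandwidth_hz, temp_k)
    return nef ** 2 * vdd


# Figures from the abstract: 1.38 uV rms, 2.7 uW, 0.6 V supply, 8 kHz bandwidth.
pef = power_efficiency_factor(1.38e-6, 2.7e-6, 0.6, 8e3)
print(f"PEF at an assumed 300 K: {pef:.2f}")
```

Under these assumptions the result lands close to the quoted 0.96; a different assumed temperature shifts it slightly.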

Technology / Circuits Joint Focus Session 1

New Computing [Suzaku III]

Wednesday, June 12, 14:00-15:40

Chairpersons: M. Yamaoka, Hitachi, Ltd.
A. Wang, Psikick

JFS1-1 - 14:00 (Invited)
A Cloud-Ready Scalable Annealing Processor for Solving Large-Scale Combinatorial Optimization Problems, M. Hayashi, T. Takeomoto, C. Yoshimura and M. Yamaoka, Hitachi, Ltd., Japan

JFS1-2 - 14:25

A Sparse Matrix-Matrix multiplication (SpMM) accelerator with 48 heterogeneous cores and a reconfigurable memory hierarchy is fabricated in 40 nm CMOS. On-chip memories are reconfigured as scratchpad or cache and interconnected with synthesizable coalescing crossbars for efficient memory access in each phase of the algorithm. The 2.0 mm x 2.6 mm chip exhibits 12.6x (8.4x) energy efficiency gain, 11.7x (77.6x) off-chip bandwidth efficiency gain and 17.1x (36.9x) compute density gain against a high-end CPU (GPU) across a diverse set of synthetic and real-world power-law graph based sparse matrices.

JFS1-3 - 14:50
Spoken Vowel Classification Using Synchronization of Phase Transition Nano-Oscillators, S. Dutta, A. Khanna, W. Chakraborty, J. Gomez, S. Joshi and S. Datta, Univ. of Notre Dame, USA

The paradigm of biologically-inspired computing endows the components of a neural network with dynamical functionality, such as self-oscillations, and harnesses emergent physical phenomena like synchronization, to learn and classify complex temporal patterns. In this work, we exploit the synchronization dynamics of a network of ultra-compact, low power Vanadium dioxide (VO_2) based insulator-to-metal phase-transition nano-oscillators (IMT-NO) to classify complex temporal patterns for speech discrimination. We successfully train a network of four capacitively coupled IMT-NOs to recognize spoken vowels by tuning their oscillation frequencies electrically according to a real-time learning rule and achieve high recognition rates of 90.5% for spoken vowels. Such energy-efficient, compact hardware with a small number of functional elements is a promising technology option for edge artificial intelligence.

JFS1-4 - 15:15

In this paper, we present an integrated circuit which supports Full-HD photorealistic refocusing. In contrast to conventional single-image blurring, it provides a physically-correct bokeh effect by rendering and then averaging hundreds of novel views from five images taken from different perspectives. To address the huge requirement of DRAM bandwidth and computing power, we adopt a block-based multi-rate framework and further propose two techniques: four-direction view generation and highly-parallel view rendering. The former provides a compact system architecture to save 32% of SRAM area and 92% of DRAM bandwidth without noticeable quality degradation. The latter efficiently generates 5.4G novel pixels per second to provide high-quality refocusing. This chip is fabricated in a 40nm CMOS process, and the core area is 3.61 mm². It consumes 250mW when operating at 200MHz and 0.9V to support Full-HD photorealistic refocusing up to 40 fps.

SESSION 14

PLL Techniques [Suzaku II]

Wednesday, June 12, 14:00-15:40

Chairpersons: Y. Bando, Socionext Inc.
M. S-W. Chen, Univ. of Southern California

C14-1 - 14:00
A 0.25-0.4V, Sub-0.11mW/GHz, 0.15-1.6GHz PLL Using an Offset Dual-Path Loop Architecture with Dynamic Charge Pumps, Z. Zhang, G. Zhu and C. P. Yue, Hong Kong Univ. of Science and Technology, China

This paper presents an ultra-low-voltage PLL (ULVPLL) with minimum supply voltage at 0.25V. An offset dual-path loop architecture is proposed to relax the current matching requirement in the charge pump (CP) and to mitigate the CP design challenge at such low supply voltage. Two dynamic CP circuits are introduced to lower the design complexity and power consumption. Implemented in 40nm CMOS, the 0.15-1.6GHz ULVPLL is capable of operating under a 0.25-0.4V supply voltage while achieving sub-0.11mW/GHz power efficiency. Measured spur level is -58.3dBc at 0.1GHz offset from 1.6GHz output (under 0.4V supply) and -48.5dBc at 12.5MHz offset from 200MHz output (under 0.25V supply).
C14-2 - 14:25
A Reference Oversampling Digital Phase-Locked Loop with −240 dB FOM and −80 dBc Reference Spur, J.-H. Seol*, **, D. Sylvester*, D. Blaauw**, and J. Taekwang***, *Univ. of Michigan, USA, **Samsung Electronics Co., Ltd., Korea and ***ETH Zürich (Eidgenössische Technische Hochschule Zürich), Switzerland

This paper proposes a reference oversampling phase-locked loop that simultaneously suppresses in-band noise and oscillator noise while maintaining a low reference spur. The proposed phase locked loop achieves -240.3 dB Figure of Merit (FOM) and -80 dBc reference spur. The integrated jitter is 508 fs rms, and the power consumption is 3.6 mW at 2 GHz output clock frequency.

C14-3 - 14:50
A 2.2-GHz 3.2-mW DTC-free Sampling ΔΣ Fractional-N PLL with -110 dBc/Hz In-Band Phase Noise and -246dB FOM and -83dBc Reference Spur, J. Tao and C.-H. Heng, National Univ. of Singapore, Singapore

This paper presents the first sampling ΔΣ fractional-N (frac-N) PLL without the digital-to-time converter (DTC), whose design is challenging and requires complex calibration. It employs a linear slope generator (LSG) to output a linear waveform and this linearization enables the sampling phase detector (SPD) to handle larger phase step from the phase interpolator (PI). This DTC-free 2.2-GHz PLL achieves in-band phase noise of -110 dBc/Hz, -246-dB FOM and -83 dBc reference spur while consuming only 3.2 mW power.

C14-4 - 15:15
A 387.6fs Integrated Jitter and -80dBc Reference Spurs Ring Based PLL with Track-and-Hold Charge Pump and Automatic Loop Gain Control in 7nm FinFET CMOS, C.-T. Ko, TSMC, Taiwan

This paper presents a phase-locked loop that employs a track-and-hold charge pump and automatic loop gain control to enhance the jitter and spur performance against PVT variations. The chip is fabricated in 7nm FinFET technology. The proposed track-and-hold charge pump achieves <-115dBc/Hz in-band noise and consumes 53μW from a 0.9V supply. The ring-based phase-locked loop achieves 387.6fs rms integrated jitter and -80dBc reference spurs, and consumes 5.9mW from a 0.9V supply at 4GHz. This translates to an FOM of -240.5dB.
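The FOM quoted here is consistent with the commonly used jitter-power figure of merit for PLLs, FOM = 10·log10((σ_rms / 1 s)² · (P / 1 mW)). The abstract does not spell out the definition, so the formula choice and the function name below are assumptions; the sketch simply plugs in the quoted jitter and power.

```python
import math


def pll_jitter_fom_db(jitter_rms_s: float, power_mw: float) -> float:
    """Jitter-power FOM in dB: 10*log10((jitter / 1 s)^2 * (power / 1 mW))."""
    return 10.0 * math.log10(jitter_rms_s ** 2 * power_mw)


# Figures from the abstract: 387.6 fs rms integrated jitter, 5.9 mW at 4 GHz.
fom_db = pll_jitter_fom_db(387.6e-15, 5.9)
print(f"FOM = {fom_db:.1f} dB")
```

With these inputs the result rounds to the quoted -240.5dB.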

SESSION 15
DC-DC Converters [Suzaku I]
Wednesday, June 12, 14:00-15:40

Chairpersons: Y. Woo, Silicon Works
H. Lam, Analog Devices, Inc.

C15-1 - 14:00 (Invited)
A 48 V Input 0.75 V Output DC-DC Converter Power Block for HPC Systems and Datacenters, T. Takken*, A. Ferencz*, C.-S. Wu**, L. McAuliffe*, T. Jia*** and X. Zhang*, *IBM T. J. Watson Research Center, USA, **The Univ. of Tokyo, Japan and ***Northwestern Univ., USA

C15-2 - 14:25
A Two-Phase 2MHz DSD GaN Power Converter with Master-Slave AO2T Control for Direct 48V/1V DC-DC Conversion, D. Yan, X. Ke and D. B. Ma, The Univ. of Texas at Dallas, USA

This paper reports a GaN power converter that achieves direct 48V/1V DC-DC voltage conversion with a two-phase DSD architecture at 2MHz, pushing the minimum duty ratio to a record low level of 2.1%. The AO2T control with an elastic ON-time modulator leads to significant improvement in transient response and voltage droop performance compared to prior art. A master phase mirror enables adaptive master-slave phase operation, accomplishing automatic phase current balancing for improved reliability. The converter achieves a peak efficiency of 85.4%, with an active die of 1.46mm² on a 180nm HV BCD process.

C15-3 - 14:50
A 10-MHz 14.3W/mm² DAB Hysteretic Control Power Converter Achieving 2.5W/247ns Full Load Power Flipping and above 80% Efficiency in 99.9% Power Range for 5G IoTs, K. Wei*, B. Lee** and D. B. Ma*, *The Univ. of Texas at Dallas and **Texas Instruments Inc., USA

A double adaptive bound (DAB) hysteretic control power converter is designed for 5G IoTs, which require nanosecond power load flipping and high efficiency across the full power range. In response to a 1A/3ns load step-up/step-down, it achieves a 1% tsettle of 247ns/387ns, thanks to the DAB control. This is 6 times faster than the best prior art on 0.18um CMOS. A synchronized DCR offset cancellation scheme improves VO regulation accuracy by 10 times. As power scales from full to ultra-light load, the controller self-reconfigures to remove redundant controller loss and facilitate adaptive system power delivery. It achieves >80% efficiency over 99.9% of the 2.5W full power range. The highly efficient design leads to the highest reported chip power density of 14.3W/mm².
C15-4 - 15:15

The right-half-plane zero can be eliminated in the proposed buck-boost (BB) converter to achieve fast transients for Internet-of-Things applications. The pseudo-boost mode in the BB converter eliminates one power switch in the current path and ensures that the continuous inductor current is half of the conventional design value to achieve 97.46% peak efficiency. Besides, the output voltage ripple is reduced to 7mV. By inserting an additional phase, a smooth transition between the buck and pseudo-boost modes ensures a voltage drop of less than 15mV. The slope-based transient enhancement circuit shortens the transient response to 9μs for a load variation of 400 mA.

Technology / Circuits Joint Focus Session 2
IoT & Sensor [Suzaku III]

Wednesday, June 12, 16:00-18:05

Chairpersons: M. Hashimoto, Osaka Univ.
D. Markovic, Univ. of California, Los Angeles

JFS2-1 - 16:00
Integrated Power Management and Microcontroller for Ultra-Wide Power Adaptation Down to nW, L. Lin, S. Jain and M. Alioto, National Univ. of Singapore, Singapore

This paper presents a power management unit (PMU) driving a microcontroller, and controlling a power knob that enables adaptation to the sensed power availability over an ultra-wide range, well beyond voltage scaling. Conventional battery-powered operation is augmented with pure harvesting. Wide power adaptation is enabled by comparator delay self-biasing and zero-current switching scheme shared among all power modes with single-cycle convergence.

JFS2-2 - 16:25

This paper presents a 10mm³ Internet-of-Tiny-Things (IoT²) system that measures light dose using custom photovoltaic cells and a light-dose-to-digital converter (LDDC). The LDDC nulls diode leakage for temperature stability and creates headroom without power overhead by using dual forward-biased photovoltaic cells. It also adaptively updates the current mirror ratio and accumulation weighting factor for a low, near-constant power consumption. The system can operate energy-autonomously at >500lx light level. The LDDC achieves a 3-sigma inaccuracy of ±3.8% and σ/μ of 2.4% across a wide light intensity range from 10lx to 300klx while consuming only 35 - 339nW.

JFS2-3 - 16:50

Ppm-level hydrogen and ammonia in air were recognized by low-power, integrated sensors consisting of catalytic metal nanosheets. The thermal energy necessary for catalytic reactions was supplied by Joule heating, not by external heaters. The thermal-aware design of the sensors reduces the power consumption to 0.14 mW. The low-power and small-area properties enable large-scale, on-chip integration of molecular sensors, which will be useful in the IoT era. A sensor array was successfully connected to a platform with wireless connectivity.

JFS2-4 - 17:15

We demonstrate a record-high-performance monolithic trantenna (transistor-antenna) for plasmonic terahertz (THz) detection, fabricated in a 65-nm CMOS foundry process. By applying ultimate structural asymmetry between source and drain on a ring FET with source diameter (dS) scaling from 30 to 0.38 micrometer, we obtained a 180-times enhanced photoresponse (∆u) in on-chip THz measurement. Through free-space THz imaging experiments, the conductive drain region of the ring FET itself showed a frequency sensitivity with a resonance frequency of 0.12 THz in the 0.09 ~ 0.2 THz range and polarization-independent imaging results as an isotropic circular antenna. The highly scalable, feeding-line-free monolithic trantenna enables a high-performance THz detector with a responsivity of 8.8 kV/W and an NEP of 3.36 pW/Hz^0.5 at the target frequency.

JFS2-5 - 17:40
TBD

SESSION 16
Speciality I/Os [Suzaku II]

Wednesday, June 12, 16:00-18:05

Chairpersons: Y. Tomita, Fujitsu Laboratories Ltd.
J. Proesel, IBM

C16-1 - 16:00

This work presents an Electro-Absorption Modulator (EAM) based single-mode 50Gbps NRZ optical link in 16nm FinFET. The TX uses T-coil based over-peaking to improve modulation efficiency and relax TIA’s bandwidth and noise requirement. The RX uses a power efficient 3-stage TIA with T-coils to improve BW. The link sensitivity is -10.9dBm OMA at BER<10^-12 and it consumes 4.31pJ per bit at 50Gbps with 2dB link margin. To the best of the authors' knowledge, this is the fastest reported integrated optical link using a CMOS technology.

C16-2 - 16:25
A Laser-forwarded Coherent 10Gb/s BPSK Transceiver Using Monolithic Microring Resonators in 45nm SOI CMOS, N. Mehta, S. Lin, B. Yin, S. Moazeni and V. Stojanovic, Univ. of California, Berkeley, USA

This paper demonstrates the first fully integrated coherent binary-phase-shift-keying (BPSK) link using microring resonator (MRR) with forwarded laser LO signal. It is enabled by integration of silicon-photonic blocks like optical DAC based modulator, 3-dB coupler, and MRR-based balanced photodetector (PD) in a monolithic zero-change 45nm SOI CMOS. The link operates at 10 Gb/s with transmitter driver consuming 40fJ/bit and receiver with OMA sensitivity of -15.1dBm consuming 450fJ/bit. The laser-forwarded BPSK link improves the laser power budget by ~6dB compared to direct detection NRZ link with same components.

C16-3 - 16:50

This paper presents a referenceless digital clock and data recovery (CDR) with an unlimited frequency detection capability that is extended from a multi-phase oversampling scheme. With minimal hardware overhead, the proposed CDR exhibits robust frequency acquisition regardless of its initial condition. The CDR achieves a capture range from 4Gb/s to 20Gb/s, which is limited only by the operating frequency of the oscillator. The measured frequency behaviors for various data rates demonstrate that frequency acquisition is possible at any initial frequency and the worst-case acquisition time is 25μs with a PRBS31 pattern. The CDR fabricated in 65nm CMOS consumes 37.3mW at 20Gb/s and occupies 0.045mm². Compared with state-of-the-art works, this design achieves the widest capture range and the highest power efficiency.

C16-4 - 17:15
A 0.87 V 12.5 Gb/s Clock-Path Feedback Equalization Receiver with Unfixed Tap Weighting Property in 65 nm CMOS, D. Lee, KAIST, Korea

This paper presents a clock-path feedback equalization receiver with unfixed tap weighting property. In the proposed receiver, the equalization operation is performed through the clock path so that the feedback loop delay is improved. Moreover, the feedback weight changes depending on the amount of inter-symbol interference (ISI), so that a single tap compensates for a high channel loss. Fabricated in 65 nm CMOS, the receiver achieves a power efficiency of 0.376 mW/Gbps at a data rate of 12.5 Gb/s from a 0.87 V supply. A BER < 10^-12 for an eye width of 0.16 UI was verified over a 19 dB PCB channel loss. The figure of merit (FoM) is 0.0198 mW/Gbps/dB and the receiver occupies 0.00294 mm².
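The FoM quoted in this abstract follows from dividing the power efficiency by the channel loss, and the total receiver power follows from the power efficiency and the data rate. The sketch below is a simple consistency check on the quoted figures; the function names are hypothetical, and the FoM definition (mW/Gbps divided by dB of loss) is inferred from the stated units.

```python
def receiver_power_mw(efficiency_mw_per_gbps: float, rate_gbps: float) -> float:
    """Total receiver power implied by the power efficiency and data rate."""
    return efficiency_mw_per_gbps * rate_gbps


def loss_normalized_fom(efficiency_mw_per_gbps: float, loss_db: float) -> float:
    """Power efficiency normalized to channel loss, in mW/Gbps/dB."""
    return efficiency_mw_per_gbps / loss_db


# Figures from the abstract: 0.376 mW/Gbps at 12.5 Gb/s over a 19 dB channel.
rx_power_mw = receiver_power_mw(0.376, 12.5)
rx_fom = loss_normalized_fom(0.376, 19.0)
print(f"implied receiver power: {rx_power_mw:.2f} mW")
print(f"FoM: {rx_fom:.4f} mW/Gbps/dB")
```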

C16-5 - 17:40

This paper presents a 65nm CMOS 1.62-to-10.8Gb/s video interface receiver with fully adaptive equalizers incorporating a CTLE and a 2-tap DFE. The sign-sign least-mean-squares (SSLMS) algorithm is used not only for the DFE but also for the CTLE adaptation to reduce power consumption and extra hardware. An uneven data level is proposed for the optimum locking of the DFE and CTLE adaptation in the presence of a pre-cursor. The vertical eye margin is improved by 24% over a 34dB-loss channel with the proposed data level. The receiver achieves a BER of 10^-12 over a 34dB-loss channel, occupies 0.174mm², and consumes 37.2mW at 10.8Gb/s.

C17-1 - 16:00

To expand the range of IoT applications, ultra-low active energy operation is expected of edge devices. In particular, read energy reduction in embedded Flash (eFlash) is strongly required to enable real-time sensing with the limited energy generated by energy harvesting (EH). In this work, a 1.5MB 2T-MONOS eFlash macro is fabricated in 65nm SOTB technology, using low-energy sense amplifier and data transmission circuit techniques which enhance the advantages of SOTB devices. The proposed eFlash achieves 0.22 pJ/bit read energy with 64MHz read access, which is low enough to utilize EH technologies as energy sources.

C17-2 - 16:25

The paper proposes a BEOL PCM-based e-NVM solution integrated in a 28nm FD-SOI CMOS technology, giving the best performance in terms of area, access time and temperature range. The integration of a 4Mb PCM in an automotive grade (Tj up to 165°C) microcontroller chip is presented here, exhibiting a robust solution satisfying all criteria of the demanding automotive environment. The GeSbTe material used for the PCM [1] has been tuned to reach the 165°C compliance and 10 years data retention. 28nm has been determined as the optimal node to exploit PCM embedded within FD-SOI CMOS technology [2], also considering the limited number of process steps related to the storage element integration. The technology also offers the full-feature 5V devices required for automotive applications. Using the body bias of the FD-SOI, the quiescent leakage both in the circuitry and in the unselected bits inside the memory array is controlled, allowing the functionality to be optimized.

C17-3 - 16:50
Liquid Silicon: A Nonvolatile Fully Programmable Processing-In-Memory Processor with Monolithically Integrated ReRAM for Big Data/Machine Learning Applications, Y. Zha*, E. Nowak** and J. Li*, *Univ. of Wisconsin-Madison, USA and **CEA-LETI, France

A nonvolatile fully programmable processing-in-memory (PIM) processor named Liquid Silicon (L-Si) is demonstrated, which combines the superior programmability of general-purpose computing devices (e.g. FPGA) and the high power efficiency of domain-specific accelerators. Besides general computing applications, L-Si is particularly well suited for AI/machine learning and big data applications, which not only pose high computational/memory demands but also evolve rapidly. L-Si is fabricated by monolithically integrating HfO2 resistive RAM on top of commercial 130nm Si CMOS. Our measurements confirm that the fabricated chip operates reliably at a low voltage of 650 mV. It achieves 80.9 TOPS/W in performing neural network inferences and 480 GOPS/W in performing content-based similarity search (a key big data application) at a nominal supply voltage of 1.2V, showing >3x and ~100x power efficiency improvement over the state-of-the-art domain-specific CMOS/RRAM-based accelerators. Additionally, it outperforms the latest nonvolatile FPGA in energy efficiency by ~3x in general compute-intensive applications.

C17-4 - 17:15
The Demonstration of Gate Dielectric-Fuse 4kb OTP Memory Feasible for Embedded Applications in High-K Metal-gate CMOS Generations and Beyond, E. R. Hsieh**, C. W. Chang**, C. C. Chuang**, H. W. Chen*** and S. Chung*, *National Chiao Tung Univ., Taiwan, **Stanford Univ., USA and ***United Microelectronics Corp., Taiwan

A 4kb One Time Programming (OTP) memory macro, implemented with a new breakdown mechanism named dielectric fuse (dFuse) breakdown, has been realized on a foundry pure-logic 28nm HKMG CMOS platform. The feature size of a unit cell is 1.5T per 7.5F². The experimental results show that the dFuse macro exhibits a high programming (PGM) speed of 100ns at 4V, a read time shorter than 10ns at 0.75V, and excellent data retention under one-month baking at 150°C. More importantly, the program voltage depends only weakly on the ambient temperature, making it suitable for automotive applications. This OTP is also expected to scale to advanced nodes such as FinFET and to provide an ideal and reliable storage solution for the IoT and 5G era.

C17-5 - 17:40

This paper presents an embedded Flash system based on 28nm SG-MONOS technology for automotive applications. It contains the world’s largest 24MB code Flash memories and achieves 240MHz random read access at Tj of 170°C and -40°C. The peak current for programming in over-the-air software update (OTA) is reduced by 55%. A high-speed program mode with 6.5MB/s is implemented for shorter test time. The system realizes robust and fast software switching of ~1ms in OTA.

SESSION 18

Sensors for Object Detection and Recognition [Suzaku III]

Thursday, June 13, 8:30-10:10

Chairpersons: Y. Oike, Sony Semiconductor Solutions Corp.
L. Sibeud, CEA-LETI

C18-1 - 8:30

This paper presents a 640x640 fully dynamic CMOS image sensor for always-on object recognition. A pixel output is sampled with a dynamic source follower (SF) into a parasitic column capacitor, which is read out by a dynamic single-slope (SS) ADC based on a dynamic bias comparator and an energy-efficient two-step counter. The sensor, implemented in 0.11μm CMOS, achieves 0.3% peak non-linearity, 6.8e⁻rms read noise (RN) and 67dB DR. Its power consumption is only 2.1mW at 44fps and is further reduced to 260μW at 15fps in the sub-sampled 320x320 mode. This work achieves a state-of-the-art energy efficiency FoM of 0.7e⁻·nJ.

C18-2 - 8:55
A 132 by 104 10μm-Pixel 250μW 1kefps Dynamic Vision Sensor with Pixel-Parallel Noise and Spatial Redundancy Suppression, C. Li*, L. Longinotti*, F. Corradi**, and T. Delbruck***, *iniVation AG, **iniLabs GmbH and ***Univ. of Zurich, Switzerland

This paper reports a 132 by 104 dynamic vision sensor (DVS) with 10μm pixel in a 65nm logic process and a synchronous address-event representation (SAER) readout capable of 180Meps throughput. The SAER architecture allows adjustable event frame rate control and supports pre-readout pixel-parallel noise and spatial redundancy suppression. The chip consumes 250μW with 100kefps running at 1k event frames per second (efps), 3-5 times more power efficient than the prior art using normalized power metrics. The chip is aimed at low-power IoT and real-time high-speed smart vision applications.

C18-3 - 9:20

We report an ear recognition technique based on a capacitance sensing circuit that can enable bezel-less smartphones. The chip is fabricated in 130nm technology. The fundamental functionality of the pseudo high-voltage driving analog front end (AFE) is demonstrated. We also discuss the detection algorithm, which consists of weak and strong classifiers. The measurement results show the feasibility of replacing an existing proximity sensor, with a detection rate of 83%.

C18-4 - 9:45

This paper presents an ultrasound receiver ASIC in 180nm CMOS that enables element-level digitization of echo signals in miniature 3D ultrasound probes. It is the first to integrate an analog front-end and a 10-b Nyquist ADC within the 150μm element pitch of a 5-MHz 2D transducer array. To achieve this, a hybrid SAR-shared-single-slope architecture is proposed in which the ramp generator is shared within each 2x2 subarray. The ASIC consumes 1.54mW per element and has been successfully demonstrated in an acoustic imaging experiment.

SESSION 19

Continuous-Time ADCs [Suzaku II]

Thursday, June 13, 8:30-10:10

Chairpersons: M. Fukazawa, Renesas Electronics Corp.
S. Ho, MediaTek Inc.

C19-1 - 8:30

This paper presents a continuous-time (CT) zoom ADC for use in audio applications. Compared to previous zoom ADCs, its input impedance is mainly resistive, making it much easier to drive while maintaining high energy efficiency. The prototype is fabricated in a 0.16 μm CMOS process, occupies 0.27 mm² and achieves 108.5 dB DR, 108.1 dB SNR, 106.4 dB SNDR in a 20 kHz BW, while consuming 618 μW. This results in a state-of-the-art Schreier FoM of 183.6 dB.

A 24mW Chopped CTΔΣM Achieving 103.5dB SNDR and 107.5dB DR in a 250kHz Bandwidth, R. Theertham, P. Koottala, S. Billa and S. Pavan, Indian Institute of Technology Madras, India

We present a CTΔΣM which uses a virtual-ground-switched resistor DAC to achieve low distortion by reducing the effects of inter-symbol interference (ISI), and parasitic resistance in the reference path. 1/f noise is reduced by chopping the first stage of the input OTA. Chopping artifacts and clock jitter sensitivity are reduced by using a 3-stage OTA, and an 8-tap FIR feedback DAC. Fabricated in 180 nm CMOS, the prototype modulator operates at 32 MS/s and achieves 103.5/107.5 dB SNDR/DR in a 250 kHz bandwidth while consuming 24 mW. The Schreier FoM is 174dB.

A 71.4dB SNDR 30MHz BW Continuous-Time Delta-Sigma Modulator Using a Time-Interleaved Noise-Shaping Quantizer in 12-nm CMOS, C.-H. Weng, T.-A. Wei, H.-Y. Hsieh, S.-H. Wu and T.-Y. Wang, MediaTek Inc., Taiwan

This work presents a continuous-time delta-sigma modulator (CTDSM) using a time-interleaved noise-shaping quantizer targeted for wireless communication system application. A quantization error duplication method enables the SAR-based quantizer to implement noise-shaping and operate at an 832MHz sampling frequency concurrently. Through the use of a CRFB loop filter topology and the noise-shaping quantizer, the proposed CTDSM achieves 71.4 dB SNDR in 30-MHz BW without STF peaking. The FoMs and FoMw are 171 dB and 17.6 fJ/conv.-step, respectively.

A 3.2mW SAR-assisted CTΔΣ ADC with 77.5dB SNDR and 40MHz BW in 28nm CMOS, P. Cenci*, M. Bolatkale**, R. Rutten**, M. Ganzertl***, G. Lassche****, K. A. Makinwa* and L. Breems**, *Delft Univ. of Technology, **NXP Semiconductors N.V. and ***Catena Microelectronics, The Netherlands

This paper presents a SAR-assisted continuous-time delta-sigma (CTΔΣ) ADC, which combines the energy efficiency of SAR ADCs with the relaxed driving requirements of CTΔΣ ADCs, as well as similar anti-alias filtering. When clocked at 2.4GHz, the ADC achieves 77.5dB SNDR in 40MHz BW. It consumes 3.2mW, resulting in a state-of-the-art Walden FoM of 6.5fJ/cs and a Schreier FoM of 178.5dB.
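The figures of merit quoted in this session can be reproduced from the bandwidth, power and SNDR/DR numbers in the abstracts. The sketch below assumes the commonly used definitions, Schreier FoM = SNDR (or DR) + 10·log10(BW/P) and Walden FoM = P / (2^ENOB · 2·BW) with ENOB = (SNDR − 1.76)/6.02. The abstracts do not spell these formulas out, and the function names are illustrative.

```python
import math


def schreier_fom_db(sndr_or_dr_db, bandwidth_hz, power_w):
    """Schreier figure of merit in dB (assumed standard definition)."""
    return sndr_or_dr_db + 10.0 * math.log10(bandwidth_hz / power_w)


def walden_fom_j_per_step(sndr_db, bandwidth_hz, power_w):
    """Walden figure of merit in joules per conversion step."""
    enob = (sndr_db - 1.76) / 6.02                 # effective number of bits
    return power_w / (2.0 ** enob * 2.0 * bandwidth_hz)


if __name__ == "__main__":
    # SAR-assisted CT delta-sigma ADC: 77.5 dB SNDR, 40 MHz BW, 3.2 mW
    print(schreier_fom_db(77.5, 40e6, 3.2e-3))
    print(walden_fom_j_per_step(77.5, 40e6, 3.2e-3) * 1e15, "fJ/conv.-step")
    # CT zoom ADC: 108.5 dB DR, 20 kHz BW, 618 uW
    print(schreier_fom_db(108.5, 20e3, 618e-6))
    # Chopped CT delta-sigma modulator: 103.5 dB SNDR, 250 kHz BW, 24 mW
    print(schreier_fom_db(103.5, 250e3, 24e-3))
```

With these definitions, the 183.6 dB quoted for the zoom ADC is reproduced when DR is used, while the 174 dB and 178.5 dB values are reproduced with SNDR, so the two variants should not be compared directly.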

SESSION 20

Accelerators for Security and Coding [Suzaku I]

Thursday, June 13, 8:30-10:10

Chairpersons: N. Miura, Kobe Univ.
R. Aitken, ARM Ltd.

A 4900μm² 839Mbps Side-Channel Attack Resistant AES-128 in 14nm CMOS with Heterogeneous Sboxes, Linear Masked MixColumns and Dual-Rail Key Addition, R. Kumar, V. Suresh, M. Kar, S. Satpathy, M. Anders, H. Kaul, A. Agarwal, S. Hsu, G. Chen, R. Krishnamurthy, V. De and S. Mathew, Intel Corp., USA

A 4900μm² side-channel attack (SCA) resistant AES accelerator in 14nm CMOS achieves 1200x higher minimum-time-to-disclosure (MTD) over an unprotected AES. Randomized byte-order shuffling using heterogeneous Sboxes, linear masked MixColumns and dual-rail key addition enable 9.2x lower correlation between current traces and HD/HW power models. The accelerator achieves 839Mbps throughput (0.7% performance overhead vs unprotected AES) with no CPA attack detected after 12 million encryptions.


An energy-efficient AES hardware accelerator based on a 2-Sbox 8-bit datapath is fabricated in 28nm CMOS for IoT and mobile SoC applications. It achieves the smallest encryption cycle count for 8b-AES, 113 cycles, through 100% utilization of the two Sboxes and a rearranged data-byte processing order. It also minimizes intermediate data registers (InterReg) from 256b to only 40b by eliminating the ShiftRow and MixColumn registers. Together with a glitch-reduced composite-field Sbox design, it achieves best-in-class efficiency of 257-923 Gbps/W and a throughput of 28-991Mbps at 0.41/0.9V, with voltage scalable down to near-threshold.

A 1.4GHz 20.5Gbps GZIP Decompression Accelerator in 14nm CMOS Featuring Dual-Path Out-of-Order Speculative Huffman Decoder and Multi-Write Enabled Register File Array, S. Satpathy, V. Suresh, R. Kumar, M. Anders, H. Kaul, A. Agarwal, S. Hsu, R. Krishnamurthy, V. De, S. Mathew, V. Gopal and J. Guilford, Intel Corp., USA

A 33,464μm² GZIP decompression accelerator is fabricated in 14nm CMOS, achieving industry-leading 20.5Gbps throughput. The design features an out-of-order speculative Huffman decoder to break the fundamental serial dependency, resulting in 69% higher decode throughput. The hybrid dual-path decoder provides 2.3x higher performance, with a multi-write enabled register-file array increasing decompression throughput by up to 41%. The arithmetic-architecture-circuit co-optimized design operates at 1.4GHz at 750mV, 25°C with peak measured energy-efficiency of 1.86pJ per code at 280mV, 2.7x higher than previously reported implementations.

**C20-4 - 9:45**

A 3.25Gb/s, 13.2pJ/b, 0.64mm² Configurable Successive-Cancellation List Polar Decoder Using Split-Tree Architecture in 40nm CMOS, Y. Tao, S.-G. Cho and Z. Zhang, Univ. of Michigan, USA

A 0.64mm² configurable successive-cancellation list polar decoder is designed in 40nm CMOS for 5G wireless applications. The decoding tree is split into 4 subtrees that are decoded by 4 sub-decoders in parallel to improve throughput and cut latency by 4x. To maximize utilization, 8 frames are interleaved and decoded simultaneously to increase throughput by another 8x to 3.25Gb/s for code lengths up to 1024b. Dynamic clock gating reduces the peak power dissipation to 42.8mW at 0.9V, or 13.2pJ/b. Scaling the supply voltage to 450mV reduces the energy further to 8.21pJ/b.

---

**Technology / Circuits Joint Focus Session 3**

**Technology and System for AI [Shunju II, III]**

Thursday, June 13, 8:30-10:10

Chairpersons: H. Wu, Tsinghua Univ.
G. Yeric, ARM Ltd.

**JFS3-1 - 8:30 (Invited)**

Considerations of Integrating Computing-In-Memory and Processing-In-Sensor into Convolutional Neural Network Accelerators for Low-Power Edge Devices, K.-T. Tang, National Tsing Hua Univ., Taiwan

**JFS3-2 - 8:55 (Invited)**

Computational Memory-Based Inference and Training of Deep Neural Networks, A. Sebastian, IBM Corp., Switzerland

**JFS3-3 - 9:20**

A Ternary Based Bit Scalable, 8.80 TOPS/W CNN Accelerator with Many-Core Processing-in-Memory Architecture with 896K Synapses/mm², S. Okumura, M. Yabuchi, K. Hijioka and K. Nose, Renesas Electronics Corp., Japan

A Processing-In-Memory (PIM) accelerator with ternary SRAM is proposed for low-power, large-scale deep neural network (DNN) processing. The accelerator consists of Ternary Neural Arithmetic Memory (TNAM), which is capable of bit-scalable MAC (multiply and accumulate) operation in accordance with the target accuracy and power limit. ADC-less readout circuits that reduce analog-digital conversion power and a system-level variation avoidance technique utilizing features of TNAM are also proposed. A test chip with large-scale PIM is fabricated and successfully operates convolutional neural networks (CNNs) at 8.8TOPS/W, with the highest accuracy and area density among recent SRAM-type PIMs.

**JFS3-4 - 9:45**

Energy-Efficient Continual Learning in Hybrid Supervised-Unsupervised Neural Networks with PCM Synapses, S. Bianchi*, I. Muñoz-Martin*, G. Pedretti*, O. Melnic*, S. Ambrogio** and D. Ielmini*, *Politecnico di Milano, Italy and **IBM Research, USA

Artificial neural networks (ANNs) can outperform the human ability of object recognition by supervised training of synaptic parameters with large datasets. In contrast to the human brain, however, ANNs cannot continually learn, i.e. acquire new information without catastrophically forgetting previous knowledge. To solve this issue, we present a novel hybrid neural network based on CMOS logic and phase change memory (PCM) synapses, mixing a supervised convolutional neural network (CNN) with bio-inspired unsupervised learning and neuronal redundancy. We demonstrate high classification accuracy on the MNIST and CIFAR10 datasets (98% and 85%, respectively) and energy-efficient continual learning of up to 30% of non-trained classes with 83% average accuracy.

---

**SESSION 21**

**Time of Flight (ToF) 3D and Time-Resolved Sensor [Suzaku III]**

Thursday, June 13, 10:30-12:35

Chairpersons: Y. Hirose, Panasonic Corp.
N. Dutton, ST Microelectronics

**C21-1 - 10:30 (Invited)**

Automotive LiDAR Technology, M. E. Warren, TriLumina Corporation, USA

LiDAR is an optical analog of radar providing high spatial-resolution range information. It is an essential part of the sensor suite for ADAS (Advanced Driver Assistance Systems), and ultimately, autonomous vehicles. Many competing LiDAR designs are being developed by established companies and startup ventures. Although there are no standards, performance and cost expectations for automotive LiDAR are consistent across the automotive industry. Why are there so many different competing designs? We can look at the system requirements and organize the design options around a few key technologies.

C21-2 - 10:55
A 64x64 APD-Based ToF Image Sensor with Background Light Suppression Up to 200 klx Using In-Pixel Auto-Zeroing and Chopping, B. Park, I. Park, W. Choi and Y. C. Chae, Yonsei Univ., Korea

This paper presents a time-of-flight (ToF) image sensor for outdoor applications. The sensor employs a gain-modulated avalanche photodiode (APD) that achieves high modulation frequency. The suppression capability of background light is greatly improved up to 200 klx by using a combination of in-pixel auto-zeroing and chopping. A 64x64 APD-based ToF sensor is fabricated in a 0.11μm CMOS. It achieves depth ranges from 0.5 to 2 m with 25MHz modulation and from 2 to 20 m with 1.56MHz modulation. For both ranges, it achieves a non-linearity below 0.8% and a precision below 3.4% at a 3D frame rate of 96fps.
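As a rough cross-check of the two depth ranges quoted above, the sketch below evaluates the standard unambiguous range of a phase-based (indirect) ToF measurement, c / (2·f_mod). Applying this relation to the sensor is an assumption: the abstract gives the modulation frequencies and depth ranges but does not say how the ranges were chosen. The function name is illustrative.

```python
C_M_PER_S = 299_792_458.0   # speed of light in vacuum


def unambiguous_range_m(f_mod_hz):
    """Largest distance before the measured phase wraps around (assumed model)."""
    return C_M_PER_S / (2.0 * f_mod_hz)


if __name__ == "__main__":
    for f_mod in (25e6, 1.56e6):
        print(f"{f_mod / 1e6:.2f} MHz -> {unambiguous_range_m(f_mod):.2f} m")
```

Under this model, 25MHz wraps at about 6 m and 1.56MHz at about 96 m, so both reported ranges (0.5 to 2 m and 2 to 20 m) sit well inside the unambiguous interval.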

C21-3 - 11:20

A 640x480 indirect Time-of-Flight (ToF) CMOS image sensor has been designed with a 4-tap 7-μm global-shutter pixel in a 65-nm back-side illumination (BSI) process. With a novel 4-tap pixel structure, we achieve a motion-artifact-free depth map. Column fixed-pattern phase noise (FPPN) is reduced by alternating the clock delay propagation path in the photo-gate driver. As a result, motion artifacts and column FPPN are not noticeable in the depth map. The proposed ToF sensor shows depth noise of less than 0.62% with a 940-nm illuminator over a working distance of up to 400 cm, and consumes 197 mW for VGA, which is 0.64 μW/pixel.

C21-4 - 11:45

An ultra-compact 1.4mmx1.4mm, 128x120 SPAD image sensor with a 5-wire interface is designed for time-resolved fluorescence microendoscopy. Dynamic range is extended by noiseless frame summation in SRAM attaining 126dB time resolved imaging at 15fps with 390ps gating resolution. The sensor SoC is implemented in STMicroelectronics 40nm/90nm 3D-stacked BSI CMOS process with 8μm pixels and 45% fill factor.

C21-5 - 12:10

We present the first integrated coherent LiDAR system with experimental ranging demonstrations operating within the eye-safe 1550nm band. Leveraging a unique wafer-scale 3D integration platform which includes customizable silicon photonics and nanoscale CMOS, our system seamlessly combines a high-sensitivity optical coherent detection front-end, a large-scale optical phased array for beamforming, and CMOS electronics in a single chip. Our prototype, fabricated entirely in a 300mm wafer facility, shows that low-cost manufacturing of high-performing solid-state LiDAR is indeed possible, which in turn may enable extensive adoption of LiDARs in consumer products, such as self-driving cars, drones, and robots.

SESSION 22
High-Speed PAM4 Transceivers [Suzaku II]
Thursday, June 13, 10:30-12:35

Chairpersons: H. Katsurai, NTT Device Innovation Center
B. Casper, Intel Corp.

C22-1 - 10:30
112 Gb/s PAM4 ADC Based SERDES Receiver for Long-Reach Channels in 10nm Process, Y. Krupnik and A. Cohen, Intel Corp., Israel

A 112 Gb/s PAM4 ADC-based SERDES receiver is implemented in Intel's 10 nm FinFET process. The receiver consists of a low-noise analog front end (AFE), a 64-way time-interleaved analog-to-digital converter (ADC) and a clock/data recovery (CDR) loop utilizing a 7GHz digitally controlled oscillator (DCO). The receiver supports long-reach channels (-35 dB at Nyquist) with a pre-forward error correction bit error rate (BER) < 1e-6, making it compatible with existing and projected Reed-Solomon FEC.

C22-2 - 10:55

This paper presents a 64Gb/s, 2.29pJ/b PAM-4 optical transmitter (TX) utilizing a VCSEL. To improve the power efficiency, the TX adopts a quarter-rate architecture consisting of a quadrature clock generator and a 4:1 MUX. By employing an asymmetric push-pull FFE, high-speed PAM-4 signaling based on a VCSEL can be achieved. It is fabricated in a 65nm CMOS technology, occupying an active area of 0.278mm².

This 56Gb/s PAM-4 transceiver leverages the high logic density provided by the 7nm FinFET technology through rigorous application of digital design styles. The usage of analog transceiver elements with less favorable scaling is minimized by an All-Digital PLL, an SST transmitter and a receiver based on a 28GS/s 8b 32-way time-interleaved ADC and DSP engine. Receiver analog signal processing is limited to a minimal, but highly linear programmable gain and peaking stage with 48dB SDR. The ADC’s ENOB measures 5.5b, enabled by the linearity of the front-end. To support long reach channels, extensive filtering is provided by a digital, fully adaptive 20-tap FFE, 1-tap DFE equalizer and a Mueller-Muller CDR. The system achieves a raw 1e-7 BER with a -33dB insertion loss channel. With a 500mW receiver, a 90mW transmitter, and 0.31mm² area per lane, the transceiver combines power efficiency with significant area reduction.

A 56Gb/s PAM-4 Receiver with Voltage Pre-Shift CTLE and 10-Tap DFE of Tap-1 Speculation in 7nm FinFET, W.-C. Chen, S.-C. Yang, Y.-N. Shih, W.-H. Huang, C.-C. Tsai and C.-H. Hsieh, TSMC, Taiwan

A 56Gb/s PAM-4 wireline receiver test chip is demonstrated in 7nm FinFET. Equalization is achieved with a four-stage continuous-time linear equalizer (CTLE) and a half-rate 10-tap decision feedback equalizer (DFE) with a speculative first tap. The proposed voltage pre-shift scheme adds a programmable offset on top of the differential data signal to alleviate front-end nonlinearity. The receiver achieves BER <1E-8 at optimal timing pre-FEC and 0.2UI at 1E-6 BER over 25dB insertion loss at 14GHz. The test chip consumes 450mW under 1.0V/1.2V power supplies, giving a FoM of 0.321pJ/bit/dB. The active area is 0.352mm².
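The loss-normalized figure of merit quoted above follows directly from the power, data rate and insertion loss in the abstract. The short sketch below shows the arithmetic, assuming the FoM is defined as power divided by bit rate and by channel loss in dB; the function names are illustrative.

```python
def energy_per_bit_pj(power_w, bit_rate_bps):
    """Energy per transferred bit in picojoules."""
    return power_w / bit_rate_bps * 1e12


def energy_per_bit_per_db(power_w, bit_rate_bps, loss_db):
    """Energy per bit normalized to the compensated channel loss (pJ/bit/dB)."""
    return energy_per_bit_pj(power_w, bit_rate_bps) / loss_db


if __name__ == "__main__":
    # 450 mW at 56 Gb/s over a 25 dB insertion-loss channel
    print(energy_per_bit_pj(0.45, 56e9))            # about 8.04 pJ/bit
    print(energy_per_bit_per_db(0.45, 56e9, 25.0))  # about 0.321 pJ/bit/dB
```

Normalizing by loss lets receivers that equalize very different channels be compared, at the cost of hiding the absolute energy per bit, which is about 8 pJ/bit here.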

A 52-Gb/s Sub-1pJ/bit PAM4 Receiver in 40-nm CMOS for Low-Power Interconnects, C. Wang, Z. Guang, Z. Zhang and C. P. Yue, Hong Kong Univ. of Science and Technology, Hong Kong

This paper presents a source-synchronous PAM4 receiver that adopts a quarter-rate topology to achieve good bit efficiency and uses a voltage-controlled delay line (VCDL) in the reference path of a phase-locked loop (PLL) to recover clock and data. With linear quarter-rate samplers, the input signal equalized by a two-stage continuous-time linear equalizer (CTLE) is further equalized by a 1-tap feed-forward equalizer (FFE) embedded in the sampler, and then processed by the following power-efficient dynamic latch and CMOS logic. With the VCDL adjusted by a bang-bang phase detector (BBPD) and a charge pump (CP), the output clocks of the four-stage ring oscillator (RO) based PLL have equal phase spacing and track the input data accordingly. The 40-nm CMOS receiver IC achieves error-free operation at 52 Gb/s with a superior bit efficiency of 0.92 pJ/b while compensating for 7.3-dB channel loss at 13 GHz.

We propose a field-free switching SOT-MRAM concept that is integration friendly and allows for separate optimization of the field component and SOT/MTJ stack properties. We demonstrate it on a 300 mm wafer, using CMOS-compatible processes, and we show that device performances are similar to our standard SOT/MTJ cells: reliable sub-ns switching with low writing power across the 300mm wafer. Our concept/design opens a new area for MRAM (SOT, STT and VCMA) technology development.

Organizers:

Developing Visual Systems for Entertainment and Art, Y. Hanai, Rhizomatiks, Japan

Rhizomatiks Research is our division dedicated to exploring new possibilities in the realms of technical and artistic expression. Focusing on media art, data art, and other R&D-intensive projects, our team strives to deliver cutting-edge solutions that have not yet been seen on a global stage. Rhizomatiks Research is accountable for all steps of a project, from hardware/software development up through operation. Additionally, we study the relationship between people and technology, and collaborate on projects with a myriad of creators. In this presentation, we’ll introduce our past projects which mainly utilized vision technologies such as AR/VR.

SESSION 23

Biomedical Circuits and Systems [Suzaku III]

Thursday, June 13, 14:00-15:40

Chairpersons: Y. P. Xu, National Univ. of Singapore
A. Arbabian, Stanford Univ.

C23-1 - 14:00
A Multimodal Multichannel Neural Activity Readout IC with 0.7μW/Channel Ca²⁺-Probe-Based Fluorescence Recording and Electrical Recording, T. Lee*, J.-H. Park**, J.-H. Cha**, N. Chou***, D. Jang*, J.-H. Kim***, I.-J. Cho***, S.-J. Kim** and M. Je*, *KAIST, **Ulsan National Institute of Science and Technology, ***Korea Institute of Science and Technology (KIST) and ****Ewha Womans Univ., Korea

This paper presents a multimodal multichannel neural activity readout IC which can perform not only the electrical recording (ER) but also the fluorescence recording (FR) of neural activity for the cell-type-specific study of heterogeneous neuronal cell populations. The time-based FR circuit senses Ca²⁺ concentration using Ca²⁺ probes while the ER circuit acquires action potentials (APs) and local field potentials (LFPs). The IC is fabricated in 0.18μm CMOS. The FR circuit achieves a recording range of 81dB (75pA to 860nA) and consumes the power of 0.7μW/Ch. The ER circuit achieves the input-referred noise (IRN) of 2.7μVrms over the bandwidth (BW) of 10kHz, while consuming the power of 4.9μW/Ch. The in-vitro measurement is performed for recording Ca²⁺ concentration and electrical neural signals.

C23-2 - 14:25

A galvanically-coupled body-channel communication (GC-BCC) transceiver (TRX) is proposed for bionic arms, offering robust communication and human-body safety. The GC-BCC mitigates the influence from the environmental changes and disturbances. A simple termination at the RX input widens the channel bandwidth (BW), enabling 100Mb/s communication. The implantable TX guarantees the user’s safety by employing a current-regulating channel driver, a charge-balancing scheme, and a biphasic waveform generated by bipolar RZ (BRZ) encoding. The TRX IC fabricated in 0.18μm CMOS, achieves a low bit-error rate (BER) of 10⁻⁸ with excellent TX and RX energy efficiencies of 4.75pJ/b and 26.8pJ/b, respectively.

C23-3 - 14:50
A 143nW Glucose-Monitoring Smart Contact Lens IC with a Dual-Mode Transmitter for Wireless-Powered Backscattering and RF-Radiated Transmission Using a Single Loop Antenna, C. Jeon, J. Koo, K. Lee, S.-K. Hahn, B. Kim, H.-I. Park and J.-Y. Sim, POSTECH, Korea

This paper presents a smart contact lens (SCL) controller IC with a high-precision current sensor interface and dual-mode wireless telemetry, in which a single power-oscillator-based circuit with an external loop antenna supports both LSK and RF data transmission. Implemented in 180nm CMOS, the IC achieves a dynamic conversion range of 89 dB while dissipating 143 nW and is verified in a glucose-sensing SCL system.

This paper presents a direct-digitalization front-end (FE) for wearable bio-signal recording. The FE is built with a 2nd-order hybrid-CTDT ΔΣ modulator, taking the benefits of oversampling and noise shaping. The ΔΣ topology removes electrode DC offset and shapes signals as well as motion artifacts at the input by adding a Σ-stage in the feedback loop, while the Σ-stage recovers the bio-signals by quantizing the difference of the consecutive samples. To meet the noise and input-impedance requirements of a bio-potential interface, a capacitively-coupled chopper amplifier serves as an input stage and also as an active adder. An asynchronous 5-bit differential-difference SAR quantizer combines the functionalities of a coarse ADC and a passive adder in a traditional ΔΣ loop, leading to a compact output stage. The prototype IC achieves a peak SNR of 105.6dB and a DR of 108.3dB with a maximum linear input range of 720mVpp.

SESSION 24

AI Accelerators [Suzaku I]

Thursday, June 13, 14:00-15:40

Chairpersons: M. Natsui, Tohoku Univ.
C. Tokunaga, Intel Corp.

C24-1 - 14:00

This work presents a scalable deep neural network (DNN) accelerator consisting of 36 chips connected in a mesh network on a multi-chip-module (MCM) using ground-referenced signaling (GRS). While previous accelerators fabricated on a single monolithic die are limited to specific network sizes, the proposed architecture enables flexible scaling for efficient inference on a wide range of DNNs, from mobile to data center domains. The 16nm prototype achieves 1.29 TOPS/mm², 0.11 pJ/op energy efficiency, and 4.01 TOPS peak performance for a 1-chip system, and 127.8 peak TOPS and 2615 images/s ResNet-50 inference for a 36-chip system.

C24-2 - 14:25
A Full HD 60 fps CNN Super Resolution Processor with Selective Caching based Layer Fusion for Mobile Devices, J. Lee, D. Shin, J. Lee, J. Lee, S. Kang and H.-J. Yoo, KAIST, Korea

A high-throughput CNN super resolution (SR) processor is proposed for memory efficient SR processing. It has three key features: 1) selective caching based layer fusion to minimize external memory access (EMA), 2) memory compaction scheme for smaller on-chip memory footprint, and 3) cyclic ring core architecture to increase the throughput with improved core utilization. As a result, the implemented processor achieves 60 frames-per-second throughput in generating full HD images.

C24-3 - 14:50
A 1.32 TOPS/W Energy Efficient Deep Neural Network Learning Processor with Direct Feedback Alignment based Heterogeneous Core Architecture, D. Han, J. Lee, J. Lee and H.-J. Yoo, KAIST, Korea

An energy efficient deep neural network (DNN) learning processor is proposed using direct feedback alignment (DFA). The proposed processor achieves 2.2x faster learning speed compared with the previous learning processors by the pipelined DFA (PDFA). In order to enhance the energy efficiency by 38.7%, the heterogeneous learning core (LC) architecture is optimized with the 11-stage pipeline data-path. Furthermore, direct error propagation core (DEPC) utilizes random number generators (RNG) to remove external memory access (EMA) caused by error propagation (EP) and improve the energy efficiency by 19.9%. The proposed PDFA based learning processor is evaluated on the object tracking (OT) application, and as a result, it shows 34.4 frames-per-second (FPS) throughput with 1.32 TOPS/W energy efficiency.

C24-4 - 15:15

A Sparse Neural Acceleration Processor (SNAP) is designed to exploit unstructured sparsity in deep neural networks (DNNs). SNAP uses parallel associative search to discover input pairs to maintain an average 75% hardware utilization. SNAP's two-level partial sum reduce eliminates access contention and cuts the writestream traffic by 22x. Through diagonal and row configurations of PE arrays, SNAP supports any CONV and FC layers. A 2.4mm² 16nm SNAP test chip is measured to achieve a peak effectual efficiency of 21.55TOPS/W (16b) at 0.55V and 260MHz for CONV layers with 10% weight and activation density. Operating on pruned ResNet-50, SNAP achieves 90.98fps at 0.80V and 480MHz, dissipating 348mW.

SESSION 25

Biosensors [Suzaku III]

Thursday, June 13, 16:00-17:40

Chairpersons: M. Je, KAIST
C. Lopez, imec

C25-1 - 16:00

*Univ. of Michigan, USA and **Eidgenössische Technische Hochschule Zürich, Switzerland

This paper presents a 1.7×4.1×2 mm³ pH sensor that is a fully integrated, stand-alone and implantable system. Instead of a bulky cm-size Ag/AgCl electrode, we use a mm-size integrated platinum electrode and differential sensing with an ISFET and REFET pair to compensate for unstable fluid potential. We also propose a drift compensation technique in which the leakage from the source and drain through the gate oxide is canceled, reducing drift by >100x.

C25-2 - 16:25

An Aptamer-based Electrochemical-Sensing Implant for Continuous Therapeutic-Drug Monitoring in vivo, J.-C. Chien*, P. L. Mage**, H. T. Soh* and A. Arbabian*, *Stanford Univ. and **BD Bioscience, USA

This work presents the first fully wireless implant system capable of continuous monitoring of therapeutic drugs in vivo. Electrochemical readout using square-wave voltammetry (SWV) is employed to measure changes in the drug concentration using redox-labeled structure-switching aptamers. Ultrasound (US) powering and data transmission are employed in the implant for miniaturization, large tissue depth, and high available power. We demonstrate continuous and real-time detection in human whole blood. Implemented in 65-nm CMOS, the entire implant system operates at 6.64 mW, and measures 140mm³ and 0.24g.

C25-3 - 16:50

A 114GHz Biosensor with Integrated Dielectrophoresis for Single Cell Characterization, A. Ameri*, L. Zhang*, A. Gharia*, A. M. Niknejad* and M. Anwar**, *Univ. of California, Berkeley and **Univ. of California, San Francisco, USA

A 114GHz permittivity biosensor for characterization of single biological cells is demonstrated. Integrated high-voltage (5.4V) dielectrophoresis (DEP) for precise sample positioning enhances the sensitivity. The sensor detects a 0.73% change in permittivity in a 1 kHz BW and is capable of identifying cells in their different stages of division as well as differentiating various cell lines.

C25-4 - 17:15

A Sub-pA Current Sensing Front-End for Transient Induced Molecular Spectroscopy, D. Ying, P.-W. Chen, C. Tseng, Y.-H. Lo and D. A. Hall, Univ. of California, San Diego, USA

We report an 8-channel array of low-noise (30.3fA/√Hz) current sensing front-ends with on-chip sensors for label-free, restriction-free biosensing. The analog front-end (AFE) consists of a 4th-order continuous-time delta-sigma modulator (DSM) that achieves 123fA sensitivity and 139dB cross-scale dynamic range over a 10 Hz bandwidth while consuming 50μW and occupying 0.11mm² per channel. A digital IIR filter and a tri-level pulse-width-modulated current-steering DAC are used to realize the equivalent performance of a multi-bit DSM in an area- and power-efficient manner. This platform was used to observe protein-ligand interactions in real time.

SESSION 26

Power Management & Energy Harvester [Suzaku II]

Thursday, June 13, 16:00-17:40

Chairpersons: K. Kanda, Fujitsu Laboratories Ltd.
P. Mercier, Univ. of California, San Diego

C26-1 - 16:00


A fully-integrated wireless charger that realizes voltage rectification, voltage regulation and CC-CV charging in a single power stage is proposed to achieve high efficiency, low cost and small volume. A bootstrapping technique is also proposed to integrate the bootstrap capacitors on-chip. The charger was designed in a standard 0.35μm CMOS process with a die area of 8mm², and the measured peak efficiencies reach 92.3% and 91.4% when the charging currents are 1A and 1.5A, respectively.

This paper proposes a stacked DLDO array with three stacked groups to improve security and efficiency, consuming one third of the input current of the prior art. Security is improved by two mechanisms. First, the AES engine can be one of the POLs hidden in the deeper levels, which minimizes the disturbance from the AES to the input current. Second, the digital balanced interleave control (DBIC) receives random sources from an internal leakage current frequency generator (LCFG) to generate a random noise current that further hides the current interference caused by the AES. With the DBIC and LCFG techniques, the correlation between the input current and the AES current is as low as 0.006, which is 150 times lower than that of a conventional DLDO.


This work presents an integrated maximum-power-point tracking (MPPT) algorithm and its implementation for the high-performance parallel-synchronized-switch harvesting-on-inductor (SSHI) rectifier, which uses the Perturb and Observe (P&O) method and a proposed power monitor for output power evaluation. Fabricated in 130nm, this piezoelectric energy-harvesting system implements a 417% FOM rectifier with 97% tracking efficiency MPPT, which makes it the first work demonstrating a parallel-SSHI rectifier and high tracking-efficiency MPPT simultaneously.
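For readers unfamiliar with the Perturb and Observe method named in the abstract above, the following is a minimal generic sketch in Python. It is not the authors' circuit or power monitor: the function names, the step size, and the toy parabolic power curve are all assumptions made for illustration only.

```python
def perturb_and_observe(measure_power, v_start, step, iterations):
    """Generic P&O hill climbing: nudge the operating voltage, keep the
    direction while the observed power rises, reverse it when power falls."""
    v = v_start
    direction = 1                      # +1 raises the voltage, -1 lowers it
    p_prev = measure_power(v)
    for _ in range(iterations):
        v += direction * step          # perturb
        p = measure_power(v)           # observe
        if p < p_prev:                 # power dropped: we stepped past the peak
            direction = -direction
        p_prev = p
    return v


def toy_power_curve(v):
    """Hypothetical harvester output power with a single peak at v = 1.0.
    This is an illustrative stand-in, not a piezoelectric model."""
    return v * (2.0 - v)


v_final = perturb_and_observe(toy_power_curve, v_start=0.2, step=0.05,
                              iterations=100)
print(f"operating point after tracking: {v_final:.2f}")
```

In steady state the operating point oscillates within about one step of the peak, so the step size trades tracking speed against steady-state ripple.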

A Bidirectional High-Voltage Dual-Input Buck Converter for Triboelectric Energy-Harvesting Interface Achieving 70.72% End-to-End Efficiency, I. Park, J. Maeng, M. Shim, J. Jeong and C. Kim, Korea Univ., Korea

A bidirectional high-voltage dual-input buck converter and a fully integrated maximum power point tracker for a triboelectric energy-harvesting system are proposed. The proposed MPP tracker carries out the fractional open-circuit voltage method without any external resistor or reference voltage for voltage down conversion. The proposed buck converter regulates two high DC input voltages of up to 70 V from a triboelectric nanogenerator with a single shared inductor. By reducing the capacitance at the switching node, the power conversion efficiency is improved by 19% at similar input power. The maximum end-to-end efficiency is 70.72%, which is 21.15% higher than prior work.

Friday Forum

Enabling Technologies for Autonomous Driving [Suzaku I, II, III]

Organizers:  T. Tanaka, Tohoku Univ.
K. Benaissa, Texas Instruments Inc.
Y. Oike, Sony Corp.
R. Kapusta, Analog Devices, Inc.

Moderator:  K. Nakamura, Analog Devices, Inc.

Envisioning Smart Mobility Society in the Connected Future, T. Imai, Toyota Info Technology Center

The automotive industry is changing faster today than it has in 100 years, and we must reconsider what our society and customers expect from us as automotive companies. It is not only a shift from a car manufacturing and sales company to a mobility company but also a convergence of electrification, connectivity and artificial intelligence. With these exciting advances, it is our mission to provide a new mobility society.

This session covers two topics: (1) the current state of vehicle connectivity, including connected vehicles in Japan and how to utilize big data, and (2) our vision of the smart mobility society of the future, which is the key to realizing seamless and comfortable transportation through connected vehicles with the Vehicle Control Interface and the Mobility Service Platform (MSPF).

Autonomous Driver Assistance Functions, K. Khouri, NXP Semiconductors

TBD

Electronics Technologies Evolve Automobiles!?, N. Kawahara, DENSO Corp.

Automotive electronics have been evolving and creating new control systems to realize safer and more eco-friendly vehicles. Many automotive functions are changing from mechanical to electronic control. With this change in control systems, the number of electronic parts such as sensors, electronic circuits, and actuators has been increasing drastically, and this trend will continue as automobiles evolve. MEMS technologies, along with packaging, electronic circuit, and software technologies, will become more important in future vehicles equipped with many advanced sensors. Undoubtedly, the control system becomes more advanced with each improvement in sensing speed or sensor accuracy. In this presentation, the future trends of automobiles and electronics will be discussed.

Human vision is the most essential sensor for driving a vehicle. In place of human eyes, the CMOS image sensor is the best sensing device for recognizing objects and the environment around the vehicle. Image sensors are also used in various other use cases, such as driver and passenger monitoring in the vehicle cabin. These use cases require some special functionalities and specifications. In this session, the requirements for automotive image sensors, such as high dynamic range, flicker mitigation and low noise, will be discussed. The last part covers the key technologies for utilizing image sensors, such as image recognition and computer vision.

The Advent of the GPU in AI/Supercomputing and its Application to Autonomous Driving, T. Baji, NVIDIA Corp.

In the good old days, CPU performance increased almost 1.5 times per year thanks to Moore's Law. By 2010, however, leakage current and overly complex CPU architectures had slowed this rate to 1.1 times per year. The GPU, which is dedicated to parallel processing, has meanwhile continued to improve its performance at a rate of 1.5 times per year, and even as Moore's Law ends, it continues to improve through built-in accelerators. The GPU is now the most widely used accelerator in AI and supercomputing. The GPU architecture is also applied in Xavier, the most advanced autonomous driving SoC. This talk introduces the GPU technologies that realize this high performance, the autonomous driving platform based on the GPU and the Xavier SoC, and the end-to-end system solution that enables functional safety and reliability.

Fleet Autonomous Vehicles for Ride-Hailing Service, TBD

TBD