=>
(Broken and defective. Almost half of the chips imported from China to Russia are unusable), Oct 17, 2022
Electrical goods and other home appliances had already increased by 18.9% YoY in Sep.
"gray" import
"A Study on the Impact of Instruction Set Architectures on Processor's Performance"
MSc Thesis, Aug 2017
gem5 simulator
x86-64: Haswell (OoO), Atom (IO)
ARMv8: A15 (OoO), A8 (IO)
Alpha: 21264 (OoO), 21164 (IO)
WDDD 2017, Jun 25 2017
=>
@LightOnIO
Appliance, a Photonic Co-processor sets a new pathway for Transformative AI / HPC computing, Mar 3, 2021
The First Commercial Photonic Computing in your own DC
50 TOPS/W
PCIe Over Fiber, Samtec, IEEE ComSoc SCV, Feb 14 2018 PDF
Samtec
FireFly Micro Flyover System
Apr 2017
PCI Express Over 100 M Of Optical Cable, Nov 30 2017
Ref
=>
"A case for scoped persist barriers in GPU", AMD Research, GPGPU 2018, Feb 25 2018
HW support for persist barriers
The only additional HW is Non-Volatile-Write FIFOs
AMD Radeon Pro SSG?
Datasheet
HBCC?
=>
"Inside Rosetta: The Engine Behind Cray's Slingshot Exascale-Era Interconnect", Feb 9, 2020
200 Gb using four 56G PAM4 Channels
The Real Magic Is In The Congestion Control
GPCNeT, SC19
Slingshot, CUG 2019
"A Superscalar Out-of-Order x86 Soft Processor for FPGA", PhD Thesis, Nov 2017
Design of the microarchitecture and circuits of a two-issue (superscalar) out-of-order x86 FPGA soft processor
To boot most 32-bit x86 OS unmodified
=>
"Integrating NVIDIA Deep Learning Accelerator (NVDLA) with RISC-V SoC on FireSim", 2nd WS on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications (EMC^2), Feb 17, 2019
Slides
Paper/Github
=>
"Integrating NVIDIA Deep Learning Accelerator (NVDLA) with RISC-V SoC on FireSim", 2nd WS on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications (EMC^2), Feb 17, 2019
FireSim
=>
AMD: We’re Using an Optimized TSMC 5nm Process, Jan 10, 2022
"While not explicitly stating that the need to be leading edge is no longer critical,"
Stacked dies for ML, Appl, Dec 2021 (Dec 2020)
Packaging
=>
"So, You Think You Can Design A 20 Exaflops Supercomputer?", Jun 30, 2022
🇺🇸 DoE RFI: Advanced Computing Ecosystems
2025 - 2030
20 - 60 MW
can solve scientific problems 5-10x faster
🇯🇵 Next-Generation Computing Infrastructure (FugakuNEXT)
=>
"LeFlow: Enabling Flexible FPGA High-Level Synthesis of Tensorflow Deep Neural Networks", arXiv, Jul 14, 2018
Int WS on FPGAs for Software Programmers (FSP 2018), Aug 31, 2018
"Large-Scale HPC systems based on Heterogeneous multicore processors",
Toshikazu Ebisuzaki (RIKEN), 戎崎俊一 (理研), Int Symp: New Horizons of Computational Science w/ Heterogeneous Many-Core Processors, Feb 27 2018
PEZY-SC2
Gyoukou
=>
"Machine Learning in Python: Main developments and technology trends in data science, machine learning, and artificial intelligence", .., NVIDIA, arXiv, Feb 12, 2020
223 references
GPUDirect Storage
RAPIDS
=>
"GPUDirect Storage: Transfer Data Directly to GPU Memory, Alleviating IO Bottlenecks", NVIDIA SC19, Nov 20, 2019
Video
Slides
RAPIDS, Sep 2019
GPUDirect Storage, Blog, Aug 2019
=>
"Performance Optimization of SU3_Bench on Xeon and Programmable Integrated Unified Memory Architecture (PIUMA)", Intel and LBNL, arXiv, Mar 22, 2021
MILC Lattice Quantum ChromoDynamics
PIUMA [5], Oct 2020
=>
"Rusty’s legacy: The role of MPI in modern AI", @thoefler, Symposium in Honor of Rusty Lusk, Aug 14, 2023
Data Movement is Most Important!
Rusty Lusk Memorial Tribute, Dec 2022
30 Years of MPI
25 Years
=>
AArch64-Explore: Exploration of Apple CPUs, @handleym99
Vol 1: CPU Core (205 pp)
Vol 2: Load Store Unit, Cache, & Memory (215 pp)
Vol 3: Patent Exploration (116 pp)
Truly excellent
My Apple patent tweets
=>
Intel Creates Neuromorphic Research Community to Advance ‘Loihi’ Test Chip, Mar 1 2018
Test board
As expected, it is being held at Intel Hillsboro, so…
Loihi: A Neuromorphic Manycore Processor ..., IEEE Micro Jan/Feb 2018
=>
A Domain-Specific TPU Supercomputer for Training DNNs, N. Jouppi, Google, UW-Madison Virtual Comp Arch Seminar, Oct 27, 2020
1:05:01
64 pp
Hot Chips 2020
CACM, Jul 2020
Patent Appl
=>
"Novel Transformer Model Based Clustering Method for Standard Cell Design Automation", NVIDIA, ISPD 2024
Best Paper
Chia-Tung Ho
ChipNeMo, Haoxing (Mark) Ren, Chiplet Summit 2024
=>
"ChipNeMo – LLM for Chip Design", Haoxing (Mark) Ren, NVIDIA, Pre-Conference Tutorial G: AI in Chiplet Design, Chiplet Summit 2024, Feb 6
H. Ren, Jan 2, 2023
Bill Dally, Nov 9, 2023
ChipNeMo, Oct 30, 2023
The official press release from Intel is out!
"Jim Keller Joins Intel to Lead Silicon Engineering", Intel, Apr 26 2018
Senior Vice President
" encompasses system-on-chip (SoC) development and integration"
"Tesla's Autopilot Chief Keller Steps Down After Two Years", Bloomberg, Apr 26 2018 12:17 JST, Updated 13:18 JST
"AMD CPU hero Jim Keller leaving Tesla to join Intel", TweakTown, Apr 26 2018 JST
"According to my industry sources"
"How to Save Money in Your Small Data Center", LBNL, Aug 20, 2018 PDF
1W saving at server => 2.50 W saving
CoE for Energy Efficiency in DCs, LBNL
DCs, FEMP Training
Better Buildings
=>
Monolithic Integration of Photonics and Electronics for High-Capacity Co-Packaged Optical Engines, G. Röll, CTO, RANOVUS, EPIC Meeting on Photonics at the Final Frontier @ ESA, Sep 14 2022
GF, May 19
Mar 3
=>
Ranovus
Co-Packaged Optics using a Xilinx Versal ACAP and Ranovus Odin 800Gbps CPO 2.0, Mar 3, 2022
Monolithic 100G Optical I/O Cores for Next-Gen DC based on GF Fotonix, Mar 7
=>
[Slides] PULP: An Open Hardware Platform,
The story so far,
Workshop, HPCA 2018, Feb 25 2018
57 + 38 + 43 + 19 pages
Intro
PULP family tree
RISC-V
Accelerators
HERO
Programming
PULP Platform (Parallel Ultra Low Power Platform)
=>
Apr 15, 2022
Russian Gov has prepared a Preliminary Concept for a New National Project in Microelectronics (WG: Apr 22)
3.19T Rubles, by 2030
2020: 90 nm
2030: 28 nm
Univ
Mikron: New US Sanctions, Mar 31
=>
"A Process Independent Power Optimised Register File Architecture", Jun 30, 2021, Tony Stansfield, CTO, sureCore
"how low-power SRAM technology can be adapted to deliver low-power register files."
=>
"Gorgon: Accelerating Machine Learning from Relational Data", ..., Kunle Olukotun, Stanford, ISCA 2020, PDF
Unified data analysis CGRA for In-DB ML
Plasticine
"Democratizing AI", Kunle Olukotun, Nov 2019
=>
Next Silicon
Pioneering a radically new approach to HPC architecture
Oct 2020 (Aug 2017)
Appl, Feb 2019 (Aug 2017)
Oct 2020 (Sep 2017)
Sep 2021 (Feb 2021)
=>
"An Open Source FPGA-Optimized Out-of-Order RISC-V Soft Processor", @BlazeDare23, FPT 2019, PDF
Best paper candidate
up to 2.5x Dhrystone MIPS
using 60% fewer registers and 64% fewer LUTs
SystemVerilog
=>
"A Bibliography of Publications on Floating-Point Arithmetic", Feb 19 2018, Version 3.524
ftp://ftp.math.utah.edu/pub/tex/bib/fparith.pdf
1098 pages
[1] John Colson, 1726
[2] Charles Babbage, 1837
....
[6317] Neil Burgess, Javier Bruguera, & Florent de Dinechin, editors. 2017
=>
"Regional Autonomous Heterogeneous Many-core Processor for High-Performance Computing", NUDT, Patent Application, Mar 2022 (Nov 2021)
"MT-3000: a Heterogeneous Multi-zone Processor for HPC", NUDT, May 24, 2022, CCF Trans on HPC
.
@mhiramat
@rioriost
The video of the Jan 31, 2018 talk at Stanford University by Jon Masters (Red Hat), "Exploiting modern microarchitectures: Meltdown, Spectre, and other hardware attacks", has been published
=>
"A Deep Dive Into NEC's Aurora Vector Engine", Nov 22 2017
MPI operations to be directly executed between the VEs w/o copying
VE can access Xeon DDR4 via DMA
6x HBM2
1.2TB/s
VecLen: 256 x 64bit
2.45TF @ 1.6GHz
STREAM/node, price
HPL/node, price
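The peak and bandwidth figures in this entry can be sanity-checked with a little arithmetic. The sketch below is a rough back-of-the-envelope check, not NEC's own derivation. The core count (8) and the per-core throughput (32 lanes x 3 FMA pipes x 2 FLOPs per FMA) are assumptions that do not appear in these notes; the clock, HBM2 count, total bandwidth, and vector length are taken from the list above.

```python
# Back-of-the-envelope check of the Aurora Vector Engine (VE) figures.
# ASSUMPTIONS (not stated in the notes): 8 cores per VE, and
# 32 lanes x 3 FMA pipes x 2 FLOPs per FMA = 192 FLOPs per core per cycle.

ASSUMED_CORES = 8
ASSUMED_FLOPS_PER_CORE_CYCLE = 32 * 3 * 2

# Values quoted in the notes.
CLOCK_GHZ = 1.6
HBM2_STACKS = 6
TOTAL_BANDWIDTH_TBS = 1.2
VECTOR_ELEMENTS = 256
ELEMENT_BITS = 64


def peak_tflops(cores: int, flops_per_cycle: int, clock_ghz: float) -> float:
    """Peak double-precision TFLOPS = cores x FLOPs/cycle x GHz / 1000."""
    return cores * flops_per_cycle * clock_ghz / 1000.0


def per_stack_bandwidth_gbs(total_tbs: float, stacks: int) -> float:
    """Average bandwidth per HBM2 stack in GB/s."""
    return total_tbs * 1000.0 / stacks


def vector_register_bytes(elements: int, element_bits: int) -> int:
    """Size of one vector register in bytes."""
    return elements * element_bits // 8


if __name__ == "__main__":
    peak = peak_tflops(ASSUMED_CORES, ASSUMED_FLOPS_PER_CORE_CYCLE, CLOCK_GHZ)
    stack_bw = per_stack_bandwidth_gbs(TOTAL_BANDWIDTH_TBS, HBM2_STACKS)
    reg_bytes = vector_register_bytes(VECTOR_ELEMENTS, ELEMENT_BITS)
    print(f"peak: {peak:.4f} TFLOPS")
    print(f"per-stack bandwidth: {stack_bw:.1f} GB/s")
    print(f"vector register: {reg_bytes} bytes")
    print(f"bytes per FLOP: {TOTAL_BANDWIDTH_TBS / peak:.2f}")
```

Under these assumptions the peak works out to 8 x 192 x 1.6 = 2457.6 GFLOPS, which agrees with the quoted 2.45TF if that figure is truncated rather than rounded.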
=>
"5th Gen Intel Xeon Scalable Emerald Rapids Resets Servers by Intel", Dec 14, 2023
Drop-in compatible w/ 4th
Up to 64 cores
Up to 320 MB Shared LLC (3x)
Up to 5600 MT/s
CXL 1.1 Type 3 memory
Intel AI Everywhere
Core Ultra (70 pp)
=>
Intel AI Everywhere, Dec 14, 2023
Keynote
(75 pp)
Replay
Gaudi3, arriving on schedule next year
Next-Gen Products
5th Gen Xeon, Nov 8
Core Ultra (70 pp)
=>
"3-D coarse-grained reconfigurable array using multi-pole NEM relays for programmable routing", Akash Levy, .., Priyanka Raina, Stanford, Integration, Oct 13/27, 2022
Post-layout simulation of a hybrid CMOS-NEMS CGRA PE in 40nm
=>
(TSMC's Kaohsiung (高雄) Plant's latest plan finalized to introduce 2nm process), 2023-08-08 (08-09)
Latest construction plan released
Nanosheet: Mass-produced in 2025
Backside solution: 2H/2025, 2026
2nm: Hsinchu Science Park (竹科) & Central Taiwan Science Park (中科) + Kaohsiung (高雄)
Aug 8
=>
"TSMC, Bosch, Infineon, and NXP Establish Joint Venture to Bring Advanced Semiconductor Manufacturing to Europe", Aug 8, 2023
ESMC in Dresden
28/22, 16/12
TSMC: 70%, €3,499.93M
TSMC Board of Directors Meeting Resolutions, Aug 8
Arrival of the first IBM System/360 in Japan, Tokai Bank
from "The 360 Revolution", 2004
Unloaded from the ship is IBM System/360 for the first customer in Japan (from IBM Japan's 50-year history book)
Six weeks to go until the 60th anniversary of the IBM System/360 announcement.
#mainframe60
=>
"eGPU: A 750 MHz Class Soft GPGPU for FPGA", Martin Langhammer (Intel), George Constantinides (Imperial), arXiv, Jul 17, 2023
Intel Agilex AGFB014R24A1E1V FPGA
SM machine with 512 threads
FFT & QRD
=>
Fault-Tolerance for High Performance and Distributed Computing: Theory and Practice, SC17 Tutorial, Nov 12 2017 [2/2]
Practical Session Slides
109 pp
Web
ULFM 2.0 release, Nov 3 2017
=>
"Apple commits $430 billion in US investments over five years", Apr 26, 2021
Original five-year goal of $350 billion set in 2018
Next-generation silicon development and 5G innovation
Creating American Jobs with Manufacturers and Suppliers Nationwide
=>
"How are Microchips Made?", Branch Education, May 17, 2024 (27:47)
Truly excellent, highly recommended!
Note: there are many other educational videos as well
Engineering and Science Concepts Illuminated with Videos of Accurate Models and Visualized Physics.
"A Linear Algebra Compiler for Small Problem Sizes", PhD Thesis, 2017
LGen generates code using two levels of math DSLs, performing tiling, loop fusion, and vectorization at a high level of abstraction
LGen: A Basic Linear Algebra Compiler
=>
High-Performance Spaceflight Computing, NASA
Mar 24, 2017
Boeing
Project Overview, Nov 2018
Middleware, Flight SW WS
Overview, Dec 2018
Update, Dec 2019
"The Connected, Automated Vehicle: Meeting the Challenges of Car 2.0", Keynote, IEEE Standards Association (IEEE-SA) Ethernet & IP @ Automotive Technology Day, Nov 2017 PDF
NVIDIA/Intel
Micron, Jun 2018
=>
"Leveraging Advanced Manufacturing to Address Challenges in the Automotive Memory Market.",
Brett Debenham, Sr. Director of Test Probe Central Engineering, Micron,
Keynote, Semiconductor Wafer Test Workshop 2018, Jun 3, 2018 PDF
=>
"BranchScope: A New Side-Channel Attack on Directional Branch Predictor", D. Evtyushkin, et al, ASPLOS 2018, Mar 28 2018 PDF
"Jump over ASLR: Attacking Branch Predictors to Bypass ASLR", MICRO 2016
=>
"Securing Semiconductor Supply Chains", CSET Policy Brief, Jan 2021
71 pp
CSET Issue Brief, Oct 2020
CRS, Oct 2020
SIA Webinar, Nov 2020
SkyWater, Jan 2021
=>
"Dongarra Elected as National Academy of Sciences Member", May 5, 2023
“in recognition of their distinguished and continuing achievements in original research.”
National Academy of Sciences, May 2