Public HC29 (2017)

Thanks to all the attendees, sponsors, volunteers, the Flint Center, and committee members for making HC29 another great year. Attendees, please find the program slides and all video links here.

If you did not attend HC29, all slides and videos will be made available in mid-December 2017.

Flint Center for the Performing Arts, Cupertino, California, Sunday-Tuesday, August 20-22, 2017.

At A Glance | Tutorials | Conf. Day1 | Conf. Day2 | Posters
  • Sunday 8/20: Tutorials
    • 8:00 AM – 9:00 AM: Breakfast
    • 9:00 AM – 12:20 PM: Tutorial 1: P4 for Software Defined Networks: Language and Hardware Implementation
    • 12:20 PM – 1:35 PM: Lunch
    • 1:35 PM – 4:30 PM: Tutorial 2: Building Autonomous Vehicles with NVIDIA’s DRIVE Platform
    • 4:30 PM – 6:00 PM: Reception
  • Monday 8/21: Conference Day 1
    • 7:30 AM – 9:15 AM: Breakfast
    • 9:15 AM – 9:30 AM: Introduction
    • 9:30 AM – 10:00 AM: GPU & Gaming
    • 10:00 AM – 10:30 AM: Break (Eclipse Viewing)
    • 10:30 AM – 11:30 AM: GPU & Gaming (cont)
    • 11:30 AM – 12:30 PM: IOT/Embedded
    • 12:30 PM – 1:45 PM: Lunch
    • 1:45 PM – 2:45 PM: Keynote 1: Direct Human/Machine Interface…
    • 2:45 PM – 3:45 PM: Automotive
    • 3:45 PM – 4:15 PM: Break
    • 4:15 PM – 6:15 PM: Processors
    • 6:15 PM – 7:15 PM: Reception (Wine & Snacks)
  • Tuesday 8/22: Conference Day 2
    • 7:15 AM – 8:15 AM: Breakfast
    • 8:15 AM – 10:15 AM: FPGA
    • 10:15 AM – 10:45 AM: Break
    • 10:45 AM – 11:45 AM: Neural Net
    • 11:45 AM – 12:45 PM: Keynote 2: Advances in AI…
    • 12:45 PM – 2:00 PM: Lunch
    • 2:00 PM – 3:30 PM: Neural Net (cont)
    • 3:30 PM – 4:30 PM: Architecture
    • 4:30 PM – 5:00 PM: Break
    • 5:00 PM – 7:00 PM: Server
    • 7:00 PM – 7:15 PM: Closing Remarks


Tutorials

| Sun 8/20 | Tutorial | Title | Presenter | Affiliation |
| --- | --- | --- | --- | --- |
| 8:00 AM | Breakfast | | | |

Tutorial 1: P4 for Software Defined Networks: Language and Hardware Implementation

The flexibility offered by Software Defined Networks (SDNs) is appealing to network operators, since it allows the customization of network behavior for application-specific needs. In SDNs, flexibility comes at the cost of running the network on general-purpose hardware, which gives up the performance that specialization can deliver. A recent trend in the industry is to use specialized hardware (ASICs and FPGAs) to combine the programmability and flexibility of SDN with the execution efficiency of hardware. Programmable networking hardware allows both enhancing legacy protocols (e.g., adding monitoring to traditional L2/L3 switching) and developing new protocols at a much faster pace.

To support programmability for network devices, P4 has been developed as a new programming language for describing how network packets should be processed on a variety of targets ranging from general-purpose CPUs to NPUs, FPGAs, and custom ASICs. P4 was designed with three goals in mind: (i) protocol independence: devices should not “bake in” specific protocols; (ii) field reconfigurability: programmers should be able to modify the behavior of devices after they have been deployed; and (iii) portability: programs should not be tied to specific hardware targets. P4 is the first widely adopted domain-specific language for packet processing. Several vendors have developed FPGA-based implementations and, with the arrival of Tofino, there is already at least one domain-specific processor optimized as a compiler target for P4 programs – a processor with a small instruction set that can process one header per cycle. The P4 community has created – and continues to maintain and develop – the language specification, a set of open-source tools (compilers, debuggers, code analyzers, libraries, software P4 switches, etc.), and sample P4 programs, all with the goal of making it easy for P4 users to quickly and correctly author new data-plane behaviors. Specialized backend compilers and optimizers for vendor targets are built upon the open-source framework. With P4 and the open-source tools, it is easy to prototype new ideas for networking hardware and applications: targeting new hardware only requires augmenting the compiler with support for it.

The goal of the tutorial is to introduce attendees to the domain of specialized hardware and programming tools for SDN. We will discuss the basic operations in networking and how they have influenced the design of the P4 language and of specialized, programmable networking hardware. We will show how the design goals of the P4 language are met through examples of programs that can run on a variety of architectures. We will provide several examples of applications (e.g., monitoring) that are enabled only by the combination of P4 and programmable hardware. We expect that, at the end of the tutorial, attendees will be familiar with the application domain and with several implementations from different vendors that demonstrate various trade-offs. We aim to encourage researchers to consider programmable networking devices as an area where domain-specific specialization is already moving into commercial devices, and to contribute to the development of the P4-based ecosystem.

| Sun 8/20 | Tutorial | Title | Presenter | Affiliation |
| --- | --- | --- | --- | --- |
| 9:00 AM | T1 | Background on Software Defined Networking | Johann Tonsing | Netronome |
| 9:20 AM | T1 | P4 Language and Applications, Part 1 & 2 | Jeongkeun Lee; Robert Halstead | Barefoot Networks; Xilinx |
| 10:20 AM | Break | | | |
| 10:40 AM | T1 | Overview of the P4 tools | Andy Fingerhut | Cisco |
| 11:10 AM | T1 | P4 Hardware Implementations | Jeongkeun Lee; Johann Tonsing; Robert Halstead | Barefoot Networks; Netronome; Xilinx |
| 12:10 PM | T1 | Future Directions: Research Problems, Getting Involved, and Resources | Andy Fingerhut | Cisco |
| 12:20 PM | Lunch | | | |

Tutorial 2: Building Autonomous Vehicles with NVIDIA’s DRIVE Platform

It is not hard to imagine a day in the near future when explaining the concept of driving will be like explaining, today, the concept of using a cassette player to play music. The rise of autonomous vehicles is paving the way to that future, making our roads safer and, in doing so, redefining transportation as we know it.

NVIDIA’s DRIVE platform provides the foundation and building blocks for autonomous vehicle development, along with a path to production. The hardware’s rich sensor input capability enables data acquisition for use cases such as neural network training and HD map generation and updates. The provided software accelerates the development of autonomous vehicle technologies such as perception, localization, and path planning. The platform provides a seamless transition for algorithms and tools created and tested on the development platform onto products that demand the quality and safety certification supplied by our Tier 1 partners.

We will cover the advancements in hardware that facilitate optimized computation for self-driving tasks, including the detection of objects and obstacles around the vehicle. Until recently, the perception of the world around the vehicle was primarily derived from handcrafted computer vision algorithms. While these handcrafted algorithms were adequate for basic ADAS, it is simply impossible to manually write code for every possible scenario an autonomous vehicle (AV) might encounter. AVs require a new computing model: deep learning. Deep learning algorithms help address the issues of robustness, accuracy, and scalability, and they have become more relevant through advancements in the availability of big data, compute power, and accelerated frameworks.

The goal of this tutorial is to provide an overview of the autonomous vehicle landscape through NVIDIA’s platform and to highlight how deep neural networks are changing the autonomous vehicle landscape.

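| Sun 8/20 | Tutorial | Title | Presenter | Affiliation |
| --- | --- | --- | --- | --- |
| 1:35 PM | T2 | An Overview of NVIDIA’s Autonomous Vehicles Platform | Pradeep Kumar Gupta | NVIDIA |
| 2:50 PM | Break | | | |
| 3:15 PM | T2 | Deep Neural Networks – Changing the Autonomous Vehicles Landscape | Dennis Lui | NVIDIA |
| 4:30 PM | Reception | | | |
| 6:00 PM | End of Reception | | | |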

Conference Day1

| Mon 8/21 | Session | Title | Presenter | Affiliation |
| --- | --- | --- | --- | --- |
| 7:30 AM | Breakfast | | | |
| 9:15 AM | Introduction | | | |
| 9:30 AM | GPU and Gaming | The Xbox One X Scorpio Engine | John Sell | Microsoft |
| 10:00 AM | Eclipse Viewing Break | | | |
| 10:30 AM | GPU and Gaming (cont) | AMD’s Radeon Next Generation GPU | Michael Mantor & Ben Sander | AMD |
| | | NVIDIA’s Volta GPU: Programmability and Performance for GPU Computing | Jack Choquette | NVIDIA |
| 11:30 AM | IOT/Embedded | SiFive Freedom SoCs: Industry’s First Open-Source RISC-V Chips | Yunsup Lee | SiFive |
| | | Self-timed ARM M3 Microcontroller for Energy Harvested Applications | David Baker | ETA Compute |
| 12:30 PM | Lunch | | | |
| 1:45 PM | Keynote 1 | The Direct Human/Machine Interface and hints of a General Artificial Intelligence | Dr. Phillip Alvelda | Former DARPA PM |
| | | Abstract: Dr. Alvelda will speak about the latest and future developments in Brain-Machine Interface, and how new discoveries and interdisciplinary work in neuroscience are driving new extensions to information theory and computing architectures. | | |
| 2:45 PM | Automotive | R-Car Gen3: Computing Platform for Autonomous Driving Era | Mitsuhiko Igarashi & Kazuki Fukuoka | Renesas Electronics Corporation |
| | | Localization for Next Generation Autonomous Vehicles | Fergus Noble | Swift Navigation |
| 3:45 PM | Break | | | |
| 4:15 PM | Processors | XPU: A programmable FPGA Accelerator for diverse workloads | Jian Ouyang | Baidu |
| | | Knights Mill: Intel Xeon Phi Processor for Machine Learning | Jesus Corbal (Lead Presenter), Nawab Ali, Dennis Bradford, Sundaram Chinthamani, Ken Janik, Adhiraj Hassan | Intel |
| | | Celerity: An Open Source RISC-V Tiered Accelerator Fabric | Scott Davidson; Khalid Al-Hawaj; Austin Rovinski | UC San Diego; Cornell; U. Michigan |
| | | Graph Streaming Processor (GSP) A Next-Generation Computing Architecture | Val Cook | ThinCI |
| 6:15 PM | Reception | | | |
| 7:15 PM | End of Reception | | | |

Conference Day2

| Tue 8/22 | Session | Title | Presenter | Affiliation |
| --- | --- | --- | --- | --- |
| 7:15 AM | Breakfast | | | |
| 8:15 AM | FPGA | Xilinx RFSoC: Monolithic Integration of RF Data Converters with All Programmable SoC in 16nm FinFET for Digital-RF Communications | Brendan Farley | Xilinx |
| | | Stratix 10: Intel’s 14nm Heterogeneous FPGA System-in-Package (SiP) Platform | Sergey Shumarayev | Altera/Intel |
| | | Xilinx 16nm Datacenter Device Family with In-Package HBM and CCIX Interconnect | Gaurav Singh & Sagheer Ahmad | Xilinx |
| | | FPGA Accelerated Computing Using AWS F1 Instances | David Pellerin | Amazon |
| 10:15 AM | Break | | | |
| 10:45 AM | Neural Net 1 | A Dataflow Processing Chip for Training Deep Neural Networks | Chris Nicol | Wave Computing |
| | | Accelerating Persistent Neural Networks at Datacenter Scale | Eric Chung & Jeremy Fowers | Microsoft |
| 11:45 AM | Keynote 2 | Recent Advances in Artificial Intelligence via Machine Learning and the Implications for Computer System Design | Jeff Dean | Google |
| 12:45 PM | Lunch | | | |
| 2:00 PM | Neural Net 2 | DNN ENGINE: A 16nm Sub-uJ Deep Neural Network Inference Accelerator for the Embedded Masses | Paul Whatmough | Harvard University/ARM Research |
| | | DNPU: An Energy-Efficient Deep Neural Network Processor with On-Chip Stereo Matching | Dongjoo Shin & Hoi-Jun Yoo | KAIST |
| | | Evaluation of the Tensor Processing Unit: A Deep Neural Network Accelerator for the Datacenter | Cliff Young | Google |
| 3:30 PM | Architecture | A 400Gbps Multi-Core Network Processor | James Markevitch & Srinivasa Malladi | Cisco |
| | | ARM DynamIQ: Intelligent Solutions using Cluster Based Multi-Processing | Peter Greenhalgh | ARM |
| 4:30 PM | Break | | | |
| 5:00 PM | Server | The Next Generation IBM Z Systems Processor | Christian Jacobi & Anthony Saporito | IBM |
| | | The Next Generation AMD Enterprise Server Product Architecture | Kevin Lepak | AMD |
| | | The New Intel® Xeon® Processor Scalable Family (Formerly Skylake-SP) | Akhilesh Kumar | Intel |
| | | Qualcomm Centriq 2400 Processor | Thomas Speier & Barry Wolford | Qualcomm |
| 7:00 PM | Closing Remarks | | | |
| 7:15 PM | End of Conference | | | |


Posters

| Title | Presenter |
| --- | --- |
| Using Texture Compression Hardware for Neural Network Inference | Hardik Sharma, Tom Olson and Alex Chalfin (Georgia Institute of Technology and ARM) |
| SoundTracing: Real-time Sound Propagation Hardware Accelerator | Dukki Hong, Tae-Hyung Lee, Woonam Chung, Jinseok Hur, Yejong Joo, Juwon Yun, Imjae Hwang and Woo-Chan Park (Sejong University) |
| A Memory-Efficient Persistent Key-value Store on eNVM SSDs | Arup De and Zvonimir Bandic (Western Digital) |
| Accelerating Big Data Workloads with FPGAs | Balavinayagam Samynathan, Shahrzad Mirkhani, Weiwei Chen, John Davis, Maysam Lavasani and Behnam Robatmili (Bigstream) |
| Loom: A Precision Exploiting Neural Network Accelerator | Sayeh Sharify (University of Toronto) |
| EPIPHANY-V: A TFLOPS scale 16nm 1024-core 64-bit RISC Array Processor | Andreas Olofsson (Adapteva, Inc.) |
| Fully-Integrated Surround Vision and Mirror Replacement SoC for ADAS/Automated Driving | Mihir Mody, Piyali Piyali, Rajat Sagar, Gregory Shurtz, Abhinay Armstrong, Yashwant Dutt, Kedar Chitnis, Peter Labaziewicz, Jason Jones and Manoj Koul (Texas Instruments) |
| GRVI Phalanx On Xilinx Virtex UltraScale+: A 1680-core, 26 MB RISC-V FPGA Parallel Processor Overlay | Jan Gray (Gray Research LLC) |