Aaron Wang

Hello, I'm Aaron, a computer engineering student at the University of Waterloo. I enjoy working on and learning about system-level software, performance engineering, and machine learning.

Currently, I am an intern at NVIDIA on the PyTorch Core team. I work on improving the performance of the framework on NVIDIA hardware at a system level.

Previously, I interned at CentML (now NVIDIA) as a Machine Learning Systems Engineer, where I helped optimize LLM inference. I worked on features such as speculative decoding, multi-node inference, model compilation, and more! I was also a research intern at Huawei on the AI network infrastructure team, where I helped research communication optimization within AI systems, especially on the distributed side. I co-authored a paper (currently under review!) on collective communication scheduling algorithms in GPU clusters.

On the side, I also help lead and write drone software at WARG. We build anything software-related to help drones fly, from computer vision models to full-stack ground station software. In my free time, I enjoy sampling audio gear, critiquing ramen restaurants, and playing video games.

Check out my blog!