Research

My research focuses on designing efficient abstractions for computer software by optimizing across the stack.

Programming Language Implementation and Its Architectural Support

My current research focuses on architectural support for automatic memory management (also known as garbage collection, or GC) in managed languages. In my PhD research, I am designing a novel hardware architecture inside a RISC-V SoC to accelerate GC. My honours project investigated using hardware transactional memory for concurrent copying GC (ISMM’21).

Programming language implementation is the main theme of my research. Since 2017, I have been deeply involved in developing the next generation of the MMTk memory management framework. I also previously worked on the Mu micro virtual machine, developing an RPython JIT compiler on top of Mu. I contribute to open-source language implementations, including Chapel, JikesRVM, and OpenJDK.

Performance Analysis and Optimization

My research on building high-performance, low-level computer systems is underpinned by flexible performance analysis tools and sound evaluation methodology: “if you can’t measure it, you can’t improve it.” I currently work on incorporating tracing technologies (such as eBPF) into managed language runtimes (MPLR’23), revealing optimization opportunities missed by sampling and logging. Previously, my distillation methodology (ISPASS’22) exposed the substantial overheads incurred by production garbage collectors. I have applied this expertise to industrial-strength systems during my internships at Microsoft Research, Twitter, and Google.

I am a strong believer in reproducible science and serve on the artifact evaluation committees of top-tier conferences. I also help maintain the DaCapo benchmark suite (ASPLOS’25).

Computer Systems and Systems for ML

I have a broad interest in computer systems, including operating systems, cyber security, and high-performance computing. During my internship at Microsoft Research, I used program synthesis to accelerate distributed ML training. By exploiting accelerator topologies in the data center and generating efficient implementations of parallel programming primitives, our synthesized implementations deliver up to 2.2× speedup over hand-optimized vendor libraries. Our paper received the Best Paper Award at PPoPP’21.