About Me

I am a Deep Learning Researcher and Engineer with a passion for using AI to improve everyday lives. I firmly believe that for Deep Learning to have a large-scale impact across the globe, its models need to run on devices with small form factors - from mobile phones to microcontrollers. Bringing Deep Learning to these devices will truly democratize the technology by letting people with limited connectivity benefit from it. It also allows data to be processed locally, so people can enjoy those benefits without compromising their right to privacy.

Currently, I am a Senior Research Engineer with the ML Research Lab at Arm, working on enabling efficient execution of Deep Learning (DL) workloads on small devices. My work has led to multiple patent filings, publications at various venues, and product improvements. Specifically, I have explored model compression techniques (structured matrices, tensor decomposition, quantization and pruning), faster inference libraries, applied ML for faster inference, and CNN hardware accelerators. The work on the CNN hardware accelerator was the first effort within Arm to explore a new product for the DNN market in the embedded domain, and it was one of the catalysts that led to the creation of Arm's NN Accelerator products. Finally, I have led efforts to benchmark ML workloads on Arm platforms to isolate performance bottlenecks, both as a consultant to the newly formed team developing Arm's CNN hardware offering and as Arm Research's representative in the TinyML performance working group (an academia and industry consortium).

Previously, I worked as a performance architect for indirect branch predictors at AMD. The predictor showed significant benefits and is part of the AMD Server Processors that are about to hit the market (in 2020). I have also worked as a verification and design engineer for memory controllers, an H.264 video encoder/decoder, and a neural network accelerator at Texas Instruments.

I earned my master's degree in Computer Science at the University of Wisconsin-Madison. My research focused on predicting the GPU speedup of an application using decision trees and on accelerator development for classical computer vision applications.