Publications

You can also find my articles on Google Scholar.

Publications

Under Review

SnapStream: Efficient Long Sequence Decoding on Dataflow Accelerators [ Paper Link ]
Distributed LLMs and Multimodal Large Language Models: A Survey on Advances, Challenges, and Future Directions

Technical Report

EvaByte: Efficient Byte-level Language Models at Scale [ Blog Link ]
SubgoalXL: Subgoal-based Expert Learning for Theorem Proving [Paper Link]
Composition of Experts: A Modular Compound AI System Leveraging Large Language Models [ Paper Link ]

Published Papers and Book Chapters

Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models [ Paper Link ]
ICLR 2026
Cross-Family Speculative Prefill: Training-Free Long-Context Compression with Small Draft Models
[ Paper Link ][Workshop Link ]
ICLR 2026 Workshop on AI with Recursive Self-Improvement
Test-Time Adaptation via Many-Shot Prompting: Benefits, Limits, and Pitfalls
[ Paper Link ][Workshop Link ]
ICLR 2026 Workshop on AI with Recursive Self-Improvement
The Limits of Long-Context Reasoning in Automated Bug Fixing
[ Paper Link ][Workshop Link ]
ICLR 2026 Workshop on I Can't Believe It's Not Better: Where Large Language Models Need to Improve
Synthetic Document Question Answering in Hungarian
[ Paper Link ][Workshop Link ]
CVPR 2025 Workshop on Vision Language Models For All: Building Geo-Diverse and Culturally Aware Vision-Language Models (VLMs-4-All)
Training Domain Draft Models for Speculative Decoding: Best Practices and Insights
[ Paper Link ][Workshop Link]
ICLR 2025 Workshop on Scalable Optimization for Efficient and Adaptive Foundation Models (SCOPE)
LLMs Know What to Drop: Self-Attention Guided KV Cache Eviction for Efficient Long-Context Inference
[ Paper Link ][Workshop Link]
ICLR 2025 Workshop on Sparsity in LLMs (SLLM) (SLLM)
SambaLingo: Teaching Large Language Models New Languages
[Paper Link][Workshop Link]
4th Multilingual Representation Learning (MRL) Workshop at 2024 Conference on Empirical Methods in Natural Language Processing
Constructing Domain-Specific Evaluation Sets for LLM-as-a-judge
[ Paper Link ][ Workshop Link]
Won the best paper award
Workshop on Customizable NLP (CustomNLP4U) at 2024 Conference on Empirical Methods in Natural Language Processing
Bloom: A 176b-parameter open-access multilingual language model
( Journal of Machine Learning Research (JMLR), 2024 (to appear) )
SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts ( MICRO 2024, (To appear))
57th IEEE/ACM International Symposium on Microarchitecture
[ Conference ][Paper]
Composition of Experts on the SN40L Reconfigurable Dataflow Unit (IEEE Micro Magazine, (To appear))
[ Journal ][Paper]
Efficiently adapting pretrained language models to new languages (2023 ENLSP-NeurIPS III Workshop)
Zoltan Csaki, Pian Pawakapan, Urmish Thakker, Qiantong Xu
The Third NeurIPS Workshop on Efficient Natural Language and Speech Processing 202
[ Workshop ][ Paper]
Training Large Language Models efficiently with Sparsity and Dataflow ( SNN Workshop at ICLR 2023 )
Venkat Srinivasan, Darshan Gandhi, Urmish Thakker, Raghu Prabhakar
ICLR 2023 Workshop on Sparsity in Neural Networks
Links [ Workshop ][ Paper][ Poster]
Federated Learning for Resource-Constrained IoT Devices: Panoramas and State-of-the-art, Book Name: Federated and Transfer Learning, Publisher: Springer, Oct 2022 [ Link ]
PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts
ACL Demos Track 2022
[Arxiv Link]
Multitask Prompt Tuning Enables Zero-Shot Task Generalization ( ICLR 2022 )
Paper - https://arxiv.org/abs/2110.08207
Model - https://huggingface.co/bigscience/T0pp
Github link to dataset - https://github.com/bigscience-workshop/promptsource/
The Tenth International Conference on Learning Representations, April 2022
MLPerf Tiny Benchmark
Colby Banbury, Vijay Janapa Reddi, Peter Torelli, Jeremy Holleman, Nat Jeffries, Csaba Kiraly, Pietro Montino, David Kanter, Sebastian Ahmed, Danilo Pau, Urmish Thakker , Antonio Torrini, Peter Warden, Jay Cordaro, Giuseppe Di Guglielmo, Javier Duarte, Stephen Gibellini, Videet Parekh, Honson Tran, Nhan Tran, Niu Wenxu, Xu Xuesong ( NeurIPS 2021 )
[arxiv link][Openreview Link]
Thirty-fifth Conference on Neural Information Processing Systems, Dec 2021
MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers ( MLSys 2021 )
Colby Banbury, Chuteng Zhou, Igor Fedorov, Ramon Matas Navarro, Urmish Thakkar, Dibakar Gope, Vijay Janapa Reddi, Matthew Mattina, Paul N. Whatmough
[Arxiv][MLSys Link]
Fourth Conference on Machine Learning and Systems, April 2021
Doping: A Technique for Extreme Compression of LSTM Models using Sparse Additive Matrices( MLSys 2021 )
Urmish Thakker , Paul N. Whatmough, Zhi-Gang Liu, Matthew Mattina, Jesse Beu
[ MLSys Link ][ Arxiv Link]
Fourth Conference on Machine Learning and Systems, April 2021
Compressing RNNs to Kilobyte Budget for IoT Devices Using Kronecker Products ( JETC 2021 )
Urmish Thakker, Jesse Beu, Dibakar Gope, Chu Zhou, Igor Fedorov, Ganesh Dasika and Matthew Mattina
To appear, ACM Journal on Emerging Technologies in Computing Systems, 2021
[Arxiv Link][ACM Link]
Federated Learning for Resource-Constrained IoT Devices: Panoramas and State-of-the-art
Ahmed Imteaj, Urmish Thakker, Shiqiang Wang, Jian Li and M. Hadi Amini
( IOTJ 2021 ) [Arxiv] [ IEEE Link ]
IEEE Internet of Things Journal, July 2021
Rank and Run-time aware compression of NLP Applications ( SustaiNLP-EMNLP 2020 )
Urmish Thakker, Jesse Beu, Dibakar Gope, Ganesh Dasika and Matthew Mattina
First Workshop on Simple and Efficient Natural Language Processing at The Conference on Empirical Methods in Natural Language Processing (EMNLP), Nov 2020
Links [Workshop][ Arxiv Paper ][ACL Link]
Pushing the Envelope of Dynamic Spatial Gating technologies ( AIChallengeIoT 2020 )
Xueqin Huang, Urmish Thakker , Dibakar Gope, Jesse Beu
2nd International Workshop on Challenges in Artificial Intelligence and Machine Learning for Internet of Things at ACM SenSys, Nov 2020
Links [Workshop][ACM Link ]
Understanding the Impact of Dynamic Channel Pruning on Conditionally Parameterized Convolutions ( AIChallengeIoT 2020 )
Ravi Raju, Dibakar Gope, Urmish Thakker , Jesse Beu
2nd International Workshop on Challenges in Artificial Intelligence and Machine Learning for Internet of Things at ACM SenSys, Nov 2020
Links [Workshop][ ACM Link ]
Ternary MobileNets via Per-Layer Hybrid Filter Banks ( Joint Workshop on Efficient Deep Learning in Computer Vision )
Dibakar Gope, Jesse Beu, Urmish Thakker , Matthew Mattina
Joint Workshop on Efficient Deep Learning in Computer Vision at Conference on Computer Vision and Pattern Recognition (CVPR), June 2020
Links [Workshop][Paper]
Pushing the limits of RNN Compression (NeurIPS-EMC2 2019)
Urmish Thakker, Igor Fedorov, Jesse Beu, Dibakar Gope, Chu Zhou, Ganesh Dasika and Matthew Mattina
5th Workshop on Energy Efficient Machine Learning and Cognitive Computing, Co-located with the 33rd Conference on Neural Information Processing Systems (NeurIPS), Dec. 2019.
Links [Workshop][Arxiv Paper][ IEEE Link]
Skipping RNN State Updates without Retraining the Original Model* (SenSys-ML 2019)
Jin Tao, Urmish Thakker, Ganesh Dasika, Jesse Beu
1st Workshop on Machine Learning on Edge in Sensor Systems (Sensys-ML), Co-located with 17th ACM Conference on Embedded Networked Sensor Systems (SenSys 2019), Nov. 2019
Links [Workshop][Paper]
*Won the best paper award
Run-Time Efficient RNN Compression for Inference on Edge Device (ISCA-EMC2 2019)
Urmish Thakker, Jesse Beu, Dibakar Gope, Ganesh Dasika and Matthew Mattina
4th Workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications (EMC2), Co-located with the 46th Int. Symp on Computer Architecture (ISCA), Jun. 2019.
Links [Workshop][Paper]

Peer Reviewed Workshop Papers
Doping: A Technique for Extreme Compression of LSTM Models using Sparse Additive Matrices ( SNN Workshop 2021 )
Urmish Thakker, Paul Whatmough, Zhi-Gang Liu, Matthew Mattina, Jesse Beu
Sparsity in Neural Networks: Advancing Understanding and Practice, July 2021
Links [Workshop]
Doped Structured Matrices for Extreme Compression of LSTM Models ( SustaiNLP-EMNLP 2020 )
Urmish Thakker, Paul Whatmough, Zhi-Gang Liu, Matthew Mattina, Jesse Beu
First Workshop on Simple and Efficient Natural Language Processing at The Conference on Empirical Methods in Natural Language Processing (EMNLP), Nov 2020
Links [Workshop]
Benchmarking TinyML Systems: Challenges and Direction* (Benchmarking Machine Learning Workloads on Emerging Hardware Workshop)
Colby Banbury , Vijay Janapa Reddi , Will Fu , Max Lam , Amin Fazel , Jeremy Holleman , Xinyuan Huang , Robert Hurtado , David Kanter , Anton Lokhmotov , David Patterson , Danilo Pau , Jeff Sieracki , Jae-Sun Seo , Urmish Thakkar, Marian Verhelst , Poonam Yadav
First International Workshop on Benchmarking Machine Learning Workloads on Emerging Hardware at Third Conference on Machine Learning and Systems (MLSys), March 2020
*As part of the TinyML Performance Working Group
Links [Workshop][Paper]
Compressing Language Models using Doped Kronecker Products (On-device Intelligence Workshop)
Urmish Thakker , Paul Whatmough, Matthew Mattina, Jesse Beu
On-device Intelligence Workshop at Third Conference on Machine Learning and Systems (MLSys), March 2020
Links [Workshop][Paper][Video]
A Static Analysis-based Cross-Architecture Performance Prediction Using Machine Learning (ISCA-AIDArc 2019)
Newsha Ardalani, Urmish Thakker, Aws Albarghouthi, Karu Sankaralingam
2nd International Workshop on AI-assisted Design for Architecture co-located with 46th Int. Symposium on Computer Architecture (ISCA), Jun. 2019
Links [Workshop][Paper]
Measuring scheduling efficiency of RNNs for NLP applications (ISPASS-Fasthpath 2019)
Urmish Thakker, Ganesh Dasika, Jesse Beu, Matthew Mattina
6th edition of International Workshop on Performance Analysis of Machine Learning Systems (Fastpath) co-located with IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), March 2019.
Links [Workshop][Paper]

Extended Abstracts/ Posters
Composition of Experts: A Compound AI Systems Approach to build Large Language Models
Compound AI Systems Workshop 2024 at Data + AI Summit [Link to Workshop]
Hardware Aware Dynamic Inference Technologies (tinyML 2021)
tinyML Summit 2021 [Link to Summit][Recorded Talk]
Improving accuracy of neural networks compressed using fixed structures via doping (tinyML 2020)
Urmish Thakker , Ganesh Dasika, Paul Whatmough, Matthew Mattina, Jesse Beu
tinyML Summit 2020 [Link to Summit][Poster]
Aggressive Compression of MobileNets Using Hybrid Ternary Layers (tinyML 2020)
Dibakar Gope, Jesse Beu, Urmish Thakker , and Matthew Mattina
tinyML Summit 2020 [Link to Summit][Poster]
RNN Compression using Hybrid Matrix Decomposition (tinyML 2019)
Urmish Thakker, Ganesh Dasika, Jesse Beu, Dibakar Gope, and Matthew Mattina
tinyML Summit, Mar. 2019.
Links [Link to Summit][Poster]

Urmish Thakker

Publications

Publications

Under Review

Technical Report

Published Papers and Book Chapters

Peer Reviewed Workshop Papers

Extended Abstracts/ Posters