publications | Rohith Pudari

2026

TOSEM
When Tools Overlook Domain Knowledge: An Empirical Study of Refactoring in Scientific Software

Rohith Pudari, Ahmed Musa Awon, Neil Ernst, and Shurui Zhou

ACM Trans. Softw. Eng. Methodol., Jun 2026

Just Accepted


Refactoring is a critical process for improving code quality, but anecdotal evidence has shown that refactoring in scientific software (Sci-SW) is not always feasible. The inherently exploratory nature of Sci-SW development, characterized by evolving requirements and limited adoption of traditional software engineering practices, could present significant challenges to refactoring. However, there is no systematic study exploring refactoring practices in scientific open-source software (Sci-OSS). To bridge this gap, we evaluate how effectively three state-of-the-art refactoring detection tools, RefDiff (C), RefactoringMiner (Java), and PyRef (Python), detect refactorings in Sci-OSS. Our findings reveal that these tools have significant limitations, detecting fewer refactorings in Sci-OSS than in non-scientific OSS (Non-Sci-OSS). Through a mixed-method approach, we identified that 67.54% of undetected refactorings in Sci-OSS require domain knowledge. To complement our analysis of the refactoring changes, we conducted surveys with 47 practitioners experienced in refactoring Sci-OSS and 14 follow-up interviews to gain deeper insights into the associated challenges. Our results revealed seven novel challenges for Sci-OSS refactoring, including a domain knowledge gap. These findings emphasize the necessity for specialized tools and strategies to support refactoring in Sci-OSS effectively.
@article{pudari2026refactoring,
  author    = {Pudari, Rohith and Musa Awon, Ahmed and Ernst, Neil and Zhou, Shurui},
  title     = {When Tools Overlook Domain Knowledge: An Empirical Study of Refactoring in Scientific Software},
  year      = {2026},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
  issn      = {1049-331X},
  doi       = {10.1145/3821432},
  note      = {Just Accepted},
  journal   = {ACM Trans. Softw. Eng. Methodol.},
  month     = jun,
  keywords  = {Refactoring detection, Research software engineering, Tool limitations, Domain-specific refactoring},
}

2023

ICSME
Aligning Documentation and Q&A Forum through Constrained Decoding with Weak Supervision

Rohith Pudari, Shiyuan Zhou, Iftekhar Ahmed, Zhuyun Dai, and Shurui Zhou

In 2023 IEEE International Conference on Software Maintenance and Evolution (ICSME), Oct 2023


Stack Overflow (SO) is a widely used question-and-answer (Q&A) forum dedicated to software development. It plays a supplementary role to official documentation (DOC for short) by offering practical examples and resolving uncertainties. However, the process of simultaneously consulting both the documentation and SO posts can be challenging and time-consuming due to their disconnected nature. In this study, we propose DOSA, a novel approach to automatically align SO and DOC, which injects domain-specific knowledge about the DOC structure into large language models (LLMs) through weak supervision and constrained decoding, thereby enhancing knowledge retrieval and streamlining task completion during software development. Our preliminary experiments find that DOSA outperforms various widely used baselines, showing the promise of using generative retrieval models to perform low-resource software engineering tasks.
@inproceedings{pudari2023dosa,
  author    = {Pudari, Rohith and Zhou, Shiyuan and Ahmed, Iftekhar and Dai, Zhuyun and Zhou, Shurui},
  booktitle = {2023 IEEE International Conference on Software Maintenance and Evolution (ICSME)},
  title     = {Aligning Documentation and Q\&A Forum through Constrained Decoding with Weak Supervision},
  year      = {2023},
  volume    = {},
  number    = {},
  pages     = {346--351},
  keywords  = {},
  doi       = {10.1109/ICSME58846.2023.00043},
  issn      = {2576-3148},
  month     = oct,
}
ARXIV
From Copilot to Pilot: Towards AI Supported Software Development

Rohith Pudari and Neil A. Ernst

2023


AI-supported programming has arrived, as shown by the introduction and successes of large language models for code, such as Copilot/Codex (GitHub/OpenAI) and AlphaCode (DeepMind). Above-average human performance on programming challenges is now possible. However, software engineering is much more than solving programming contests. Moving beyond code completion to AI-supported software engineering will require an AI system that can, among other things, understand how to avoid code smells, follow language idioms, and eventually (maybe!) propose rational software designs. In this study, we explore the current limitations of AI-supported code completion tools like Copilot and offer a simple taxonomy for classifying such tools. We first perform an exploratory study on Copilot's code suggestions for language idioms and code smells. In most of our test scenarios, Copilot neither follows language idioms nor avoids code smells. We then conduct additional investigation to determine the current boundaries of AI-supported code completion tools like Copilot by introducing a taxonomy of software abstraction hierarchies, in which "basic programming functionality" such as code compilation and syntax checking is at the least abstract level, and software architecture analysis and design are at the most abstract level. We conclude with a discussion of the challenges that future AI-supported code completion tools must overcome to reach the design level of abstraction in our taxonomy.
@misc{pudari2023copilot,
  title         = {From Copilot to Pilot: Towards AI Supported Software Development},
  author        = {Pudari, Rohith and Ernst, Neil A.},
  year          = {2023},
  archiveprefix = {arXiv},
  primaryclass  = {cs.SE},
}