publications
Publications by category in reverse chronological order.
2026
- When Tools Overlook Domain Knowledge: An Empirical Study of Refactoring in Scientific Software
Rohith Pudari, Neil A. Ernst, and Shurui Zhou
ACM Transactions on Software Engineering and Methodology (TOSEM), Jun 2026 (Just Accepted)
Refactoring is a critical process for improving code quality, but anecdotal evidence has shown that refactoring in scientific software (Sci-SW) is not always feasible. The inherently exploratory nature of Sci-SW development, characterized by evolving requirements and limited adoption of traditional software engineering practices, could present significant challenges to refactoring. However, there is no systematic study exploring refactoring practices in Sci-OSS. To bridge this gap, we explore the effectiveness of three state-of-the-art refactoring detection tools, RefDiff (C), RefactoringMiner (Java), and PyRef (Python), in detecting refactorings in scientific open-source software (Sci-OSS). Our findings reveal that these tools have significant limitations, detecting fewer refactorings in Sci-OSS than in non-scientific OSS (Non-Sci-OSS). Through a mixed-method approach, we identified that 67.54% of undetected refactorings in Sci-OSS require domain knowledge. To complement our analysis of the refactoring changes, we conducted surveys with 47 practitioners experienced in refactoring Sci-OSS and 14 follow-up interviews to gain deeper insights into the associated challenges. Our results revealed seven novel challenges for Sci-OSS refactoring, including a domain knowledge gap. These findings emphasize the necessity for specialized tools and strategies to support refactoring in Sci-OSS effectively.
2023
- ICSME
Aligning Documentation and Q&A Forum through Constrained Decoding with Weak Supervision
Rohith Pudari, Shiyuan Zhou, Iftekhar Ahmed, Zhuyun Dai, and Shurui Zhou
In 2023 IEEE International Conference on Software Maintenance and Evolution (ICSME), Oct 2023
Stack Overflow (SO) is a widely used question-and-answer (Q&A) forum dedicated to software development. It plays a supplementary role to official documentation (DOC for short) by offering practical examples and resolving uncertainties. However, the process of simultaneously consulting both the documentation and SO posts can be challenging and time-consuming due to their disconnected nature. In this study, we propose DOSA, a novel approach to automatically align SO and DOC, which injects domain-specific knowledge about the DOC structure into large language models (LLMs) through weak supervision and constrained decoding, thereby enhancing knowledge retrieval and streamlining task completion during the software development procedure. Our preliminary experiments find that DOSA outperforms various widely used baselines, showing the promise of using generative retrieval models to perform low-resource software engineering tasks.
@inproceedings{pudari2023dosa,
  author = {Pudari, Rohith and Zhou, Shiyuan and Ahmed, Iftekhar and Dai, Zhuyun and Zhou, Shurui},
  booktitle = {2023 IEEE International Conference on Software Maintenance and Evolution (ICSME)},
  title = {Aligning Documentation and Q&A Forum through Constrained Decoding with Weak Supervision},
  year = {2023},
  volume = {},
  number = {},
  pages = {346-351},
  keywords = {},
  doi = {10.1109/ICSME58846.2023.00043},
  issn = {2576-3148},
  month = oct,
}
- ARXIV
From Copilot to Pilot: Towards AI Supported Software Development
Rohith Pudari and Neil A. Ernst
2023
AI-supported programming has arrived, as shown by the introduction and successes of large language models for code, such as Copilot/Codex (GitHub/OpenAI) and AlphaCode (DeepMind). Above human average performance on programming challenges is now possible. However, software engineering is much more than solving programming contests. Moving beyond code completion to AI-supported software engineering will require an AI system that can, among other things, understand how to avoid code smells, follow language idioms, and eventually (maybe!) propose rational software designs. In this study, we explore the current limitations of AI-supported code completion tools like Copilot and offer a simple taxonomy for understanding the classification of AI-supported code completion tools in this space. We first perform an exploratory study on Copilot’s code suggestions for language idioms and code smells. Copilot fails to follow language idioms and avoid code smells in most of our test scenarios. We then conduct additional investigation to determine the current boundaries of AI-supported code completion tools like Copilot by introducing a taxonomy of software abstraction hierarchies where ‘basic programming functionality’ such as code compilation and syntax checking is at the least abstract level, and software architecture analysis and design are at the most abstract level. We conclude by providing a discussion on challenges for future development of AI-supported code completion tools to reach the design level of abstraction in our taxonomy.
@misc{pudari2023copilot,
  title = {From Copilot to Pilot: Towards AI Supported Software Development},
  author = {Pudari, Rohith and Ernst, Neil A.},
  year = {2023},
  archiveprefix = {arXiv},
  primaryclass = {cs.SE},
}