Copilot to pilot

This text is based on the defense presentation of my Master’s thesis, given at the University of Victoria on 23 August 2022. The thesis, titled “AI Supported Software Development: Moving Beyond Code Completion”, is available here.

Full text: AI-Supported Software Development: From Copilot to Pilot

Introduction

The rise of AI-supported programming tools like GitHub’s Copilot marks a significant step in software development. Leveraging OpenAI’s Codex, Copilot assists developers by predicting and suggesting code snippets based on context. This tool represents a broader trend towards AI systems that can enhance software development productivity. However, the journey from AI-supported code completion to AI-driven software engineering involves overcoming several limitations. This blog post summarizes a research paper that investigates these limitations and proposes a taxonomy for understanding the boundaries and potential of AI-supported code completion tools.

Background

Copilot is a pioneering AI tool integrated within IDEs to suggest code based on a few lines of comments or code. It uses a large language model trained on vast amounts of code from public repositories. While Copilot excels at understanding context and semantics, software development requires more than just coding. It involves following best practices, avoiding code smells, and making rational design decisions.

Study Design

The research explored Copilot’s ability to follow language idioms and avoid code smells. One main research question, split into two sub-questions, was addressed:

RQ-1: What are the current boundaries of AI-supported code completion tools?
- RQ-1.1: How do AI-supported code completion tools manage programming idioms?
- RQ-1.2: How do AI-supported code completion tools manage to suggest non-smelly code?

To answer these questions, the study focused on Python idioms and JavaScript best practices. The evaluation involved testing Copilot’s suggestions against established coding standards and best practices.
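To make "idiomatic" concrete, here is a minimal sketch of the kind of contrast the study looks at. The two functions are hypothetical illustrations written for this post, not scenarios taken from the study's test set: both return the same result, but only the second uses the form that Python style guides generally prefer.

```python
# Hypothetical illustration of a Pythonic idiom; not taken from the study's test set.


def squares_non_idiomatic(numbers):
    """Index-based loop with manual appends: correct, but not idiomatic."""
    result = []
    for i in range(len(numbers)):
        result.append(numbers[i] ** 2)
    return result


def squares_idiomatic(numbers):
    """List comprehension: the more idiomatic way to build the same list."""
    return [n ** 2 for n in numbers]


if __name__ == "__main__":
    data = [1, 2, 3, 4]
    # Both versions are functionally equivalent; only the style differs.
    print(squares_non_idiomatic(data) == squares_idiomatic(data))
```

A tool that is only judged on correctness would treat both versions as equally good, which is why idioms need to be evaluated separately.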

Results

The findings revealed significant limitations in Copilot’s ability to follow idioms and avoid code smells:

Pythonic Idioms: Out of 25 tested idioms, Copilot suggested the idiomatic approach as the top suggestion in only 2 instances. In 8 cases, the idiomatic way appeared within the top 10 suggestions, but in the majority of cases (15 out of 25), the idiomatic approach was absent.
JavaScript Code Smells: Copilot suggested the best practice as the top suggestion in only 3 out of 25 instances. In 5 cases, the best practice was among the top 10 suggestions, but in 17 scenarios, it was not present at all.

These results indicate that current AI-supported code completion tools are not yet capable of consistently suggesting idiomatic or best-practice code, even though these practices are widely used in public repositories.
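The three outcome categories used above (top suggestion, within the top 10, absent) can be sketched as a small classification routine, followed by a tally of the reported counts. The function names, the `is_preferred` predicate, and the category labels are assumptions made for illustration; only the counts come from the results above.

```python
def classify(suggestions, is_preferred, top_n=10):
    """Classify one scenario by where the preferred form first appears.

    `suggestions` is a ranked list of strings (best first) and
    `is_preferred` is a caller-supplied predicate; both are hypothetical.
    """
    for rank, suggestion in enumerate(suggestions[:top_n]):
        if is_preferred(suggestion):
            return "top" if rank == 0 else "top_n"
    return "absent"


# Counts as reported in the results above (25 scenarios per language).
REPORTED = {
    "Python idioms": {"top": 2, "top_n": 8, "absent": 15},
    "JavaScript best practices": {"top": 3, "top_n": 5, "absent": 17},
}


def shares(counts):
    """Convert raw counts into fractions of all scenarios."""
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


if __name__ == "__main__":
    for name, counts in REPORTED.items():
        summary = ", ".join(f"{k}: {v:.0%}" for k, v in shares(counts).items())
        print(f"{name}: {summary}")
```

Read this way, the preferred form was missing from the top 10 suggestions in 60% of the Python scenarios and 68% of the JavaScript scenarios.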

Taxonomy of Software Abstractions

The study proposes a taxonomy to understand the capabilities and limitations of AI-supported code completion tools. The taxonomy orders software abstractions from least to most abstract, with the design level covering both module-level and system-level design:

Syntax Level: Ensuring suggested code is syntactically correct without compilation errors.
Correctness Level: Providing solutions that are not necessarily optimal but solve the given programming task.
Paradigms and Idioms Level: Using common paradigms and language idioms in code suggestions.
Code Smells Level: Avoiding common code smells and suggesting the most optimized version of the code.
Design Level: Supporting module-level and system-level design choices, including test cases, continuous integration, and adherence to coding style guidelines.
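One way to picture the ordering of the levels listed above is as an ordered enumeration. This is only a sketch: the class name, the numeric values, and the helper function are assumptions introduced here, not part of the thesis.

```python
from enum import IntEnum


class AbstractionLevel(IntEnum):
    """Levels listed above, ordered from least to most abstract.

    The numeric values are an assumption used only to express the ordering.
    """

    SYNTAX = 1
    CORRECTNESS = 2
    PARADIGMS_AND_IDIOMS = 3
    CODE_SMELLS = 4
    DESIGN = 5  # covers module-level and system-level design


def levels_still_open(highest_reliable):
    """Return the levels above the highest one a tool handles reliably."""
    return [level for level in AbstractionLevel if level > highest_reliable]


if __name__ == "__main__":
    # Assumption for illustration: a tool that is reliable up to correctness.
    for level in levels_still_open(AbstractionLevel.CORRECTNESS):
        print(level.name)
```

Treating the levels as ordered makes it easy to state where a given tool currently sits and which levels remain out of reach.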

Discussion

The transition from code completion to AI-supported software engineering is challenging. Gathering sufficient training data, updating models to reflect evolving coding practices, and enabling multi-file input are crucial steps. Current AI tools like Copilot perform well at lower levels of abstraction but struggle with higher-level design tasks. Effective AI-supported code completion tools have the potential to significantly increase software development productivity, but achieving this requires overcoming these challenges.

Implications

For Practitioners:

Pre-training the LLM: Enhancing training data quality by including verified sources and filtering out known flaws.
Code Completion Time: Integrating code review tools to improve suggestion quality and adopting active learning approaches to learn a user’s context.
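As a rough sketch of what a code-review step at completion time could look like, the example below filters and reorders candidate suggestions before they reach the user. It is not how Copilot works and not a method from the thesis: the function names are hypothetical, and the single check (flagging `for i in range(len(x))` loops) stands in for a real linter or review tool. It relies only on Python's standard `ast` module.

```python
import ast


def is_syntactically_valid(code):
    """Syntax-level check: does the suggestion parse as Python?"""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False


def uses_range_len_loop(code):
    """Flag `for i in range(len(x))`, a commonly cited non-idiomatic loop."""
    for node in ast.walk(ast.parse(code)):
        if not (isinstance(node, ast.For) and isinstance(node.iter, ast.Call)):
            continue
        call = node.iter
        if (
            isinstance(call.func, ast.Name)
            and call.func.id == "range"
            and len(call.args) == 1
            and isinstance(call.args[0], ast.Call)
            and isinstance(call.args[0].func, ast.Name)
            and call.args[0].func.id == "len"
        ):
            return True
    return False


def rerank(suggestions):
    """Drop unparsable suggestions, then move flagged ones to the end.

    `sorted` is stable, so the model's original order is otherwise kept.
    """
    valid = [s for s in suggestions if is_syntactically_valid(s)]
    return sorted(valid, key=uses_range_len_loop)


if __name__ == "__main__":
    candidates = [
        "for i in range(len(xs)):\n    print(xs[i])",  # parses, but flagged
        "for x in xs print(x)",  # syntax error, dropped
        "for x in xs:\n    print(x)",  # parses and passes the check
    ]
    for candidate in rerank(candidates):
        print(candidate, end="\n\n")
```

The design choice here is to reorder rather than discard flagged suggestions, so the user still sees them if nothing better is available.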

For Researchers:

Moving Beyond Tokens: Developing models that work at the code block or file level, leveraging recent advances like the chain of thought process.
Design Patterns and Architectural Tactics: Training models to understand and suggest appropriate design choices.

Developing models that understand and suggest appropriate design choices, including design patterns and architectural tactics, is essential.

Conclusion

AI-supported programming tools like Copilot represent a promising step towards more productive and efficient software development. However, significant challenges remain in moving beyond code completion to AI-supported software engineering. Addressing these challenges requires further research, better training data, and advancements in AI capabilities. The vision of AI-driven software engineering, where AI assists in complex design tasks, is achievable but requires continued innovation and development.

AI-supported programming has arrived, as shown by the introduction and successes of large language models for code, such as Copilot/Codex (GitHub/OpenAI) and AlphaCode (DeepMind). Performance above the human average on programming challenges is now possible. However, software engineering is much more than solving programming contests. Moving beyond code completion to AI-supported software engineering will require an AI system that can, among other things, understand how to avoid code smells, follow language idioms, and eventually (maybe!) propose rational software designs. In this study, we explore the current limitations of AI-supported code completion tools like Copilot and offer a simple taxonomy for understanding the classification of AI-supported code completion tools in this space. We first perform an exploratory study on Copilot’s code suggestions for language idioms and code smells. In most of our test scenarios, Copilot neither follows language idioms nor avoids code smells. We then conduct additional investigation to determine the current boundaries of AI-supported code completion tools like Copilot by introducing a taxonomy of software abstraction hierarchies, where ‘basic programming functionality’ such as code compilation and syntax checking is at the least abstract level and software architecture analysis and design are at the most abstract level. We conclude by discussing the challenges that future AI-supported code completion tools must overcome to reach the design level of abstraction in our taxonomy.
