Literate Development: AI-Enhanced Software Engineering
Practical guidance for software architects and team leads on integrating LLMs for more efficient, reliable, and human-centered software development.
The software development community is investing heavily in LLM-powered tools for code generation, and as the underlying models improve, so does the sophistication of the tooling. Yet despite these advances, developers keep voicing familiar concerns, and many still struggle to realize substantial benefits from incorporating LLMs into their daily work.
As of this writing, fully autonomous coding agents still leave much to be desired. In their daily workflows, developers currently utilize AI in three primary ways:
As an autocomplete tool.
As a research assistant (via chat interfaces).
In a copilot or pair-programming mode with an LLM.
While autocomplete and research assistance have gained traction and generally positive feedback, the copilot mode, in which developers prompt LLMs to implement features or fix bugs across multiple code locations, has received mixed reviews. Novice developers often struggle to articulate their requests effectively, while experienced developers frequently find the generated code to be of subpar quality: it fails to follow their preferred style and coding standards, or misuses external dependencies.
I contend that addressing these issues requires a shift in development practices, rather than solely focusing on improving tools. This article proposes a set of principles and techniques for restructuring the software development process to effectively leverage contemporary LLMs. I term this approach "Literate Development," drawing an analogy to Donald Knuth's concept of Literate Programming, which also emphasizes the importance of documentation.
Literate Development: Principles
Literate Development is based on the following core principles:
Document First
In conventional development, code is often regarded as the primary artifact. However, code alone rarely captures the surrounding context and developer intent that LLMs need in order to generate accurate and relevant modifications or extensions.
Literate Development therefore posits that comprehensive documentation should serve as the central, authoritative source of information. Furthermore, with LLMs, documentation generation is becoming increasingly efficient, transforming it from a potential liability into a valuable asset that enhances AI's capabilities.
Link Code, Documentation, and Tests
To ensure consistency of generated code, Literate Development advocates establishing clear and explicit links between code, documentation, and tests wherever possible. This creates a network of interconnected artifacts that helps ground LLMs and facilitates the automatic validation of their output.
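As a minimal sketch of what such links can look like in practice, cross-references embedded in docstrings are already enough for an LLM (and a simple validation script) to follow. The file paths and naming conventions below are assumptions made for illustration, not a prescribed standard.

```python
# payments.py: sketch of code, docs, and tests linked via docstring cross-references.
# The paths below (docs/specs/refunds.md, tests/test_refunds.py) are assumed names
# used only for illustration.

def issue_refund(order_id: str, amount_cents: int) -> bool:
    """Issue a refund for an order.

    Spec:  docs/specs/refunds.md#partial-refunds
    Tests: tests/test_refunds.py::test_partial_refund_preserves_balance
    """
    raise NotImplementedError


# tests/test_refunds.py: the test links back to both the spec and the code under test.
def test_partial_refund_preserves_balance() -> None:
    """Covers docs/specs/refunds.md#partial-refunds via payments.issue_refund."""
    ...
```

Because these links are plain text, they can be included in prompts verbatim and checked by a linter or CI job that verifies every referenced spec section and test actually exists.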
Prompt Iteratively
Recognizing the current limitations of LLMs, this principle emphasizes an iterative approach. LLMs should not be asked to alter or produce all linked elements (code, documentation, and tests) at once. Instead, developers should make incremental updates, focusing on one element at a time and validating it against the others, either automatically or manually as needed.
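A minimal sketch of this principle, assuming a hypothetical ask_llm helper (stand in your team's actual client) and a pytest-based suite: only the implementation is regenerated, while the untouched spec and tests act as the validation gate.

```python
# Sketch: regenerate ONE artifact (the implementation) and validate it against the
# artifacts that were deliberately left unchanged (the spec and the tests).
import subprocess
from pathlib import Path


def ask_llm(prompt: str) -> str:
    """Hypothetical placeholder for your team's LLM client; not a real API."""
    raise NotImplementedError("wire this to your LLM provider")


spec = Path("docs/specs/refunds.md").read_text(encoding="utf-8")
tests = Path("tests/test_refunds.py").read_text(encoding="utf-8")

new_code = ask_llm(
    "Rewrite payments.py so that the tests pass and the spec is satisfied. "
    "Do not propose changes to the spec or the tests.\n\n"
    f"SPEC:\n{spec}\n\nTESTS:\n{tests}"
)
Path("payments.py").write_text(new_code, encoding="utf-8")

# The unchanged tests are the validation step for the regenerated code.
subprocess.run(["pytest", "tests/test_refunds.py"], check=True)
```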
Type and Test
In Literate Development, automated tests, type information, code style linting, and other code validation tools and methods remain crucial. They provide important signals to LLMs, guiding code generation and helping to ensure correctness.
In traditional development, a common argument against test-first practices is the time spent writing and maintaining test suites. However, this concern is somewhat mitigated by current LLMs, which are quite proficient at generating test code.
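For example, a type signature and a test together form a compact contract that both constrains generation and makes the result mechanically checkable. The function and amounts below are illustrative, and the body is deliberately left unimplemented, since producing it against this contract is exactly the task handed to the LLM.

```python
# Sketch: the signature and the test are written (or generated and reviewed) first;
# mypy and pytest then give any generated implementation an automatic pass/fail signal.
from decimal import Decimal


def split_bill(total: Decimal, people: int) -> list[Decimal]:
    """Split `total` across `people`, distributing any remainder fairly."""
    raise NotImplementedError  # implementation to be generated against this contract


def test_split_bill_preserves_total() -> None:
    shares = split_bill(Decimal("10.00"), 3)
    assert len(shares) == 3
    assert sum(shares) == Decimal("10.00")
```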
Literate Development: Techniques
Regardless of the specific software development methodology employed, Literate Development seeks to ensure that all artifacts (documents, diagrams, code, etc.) produced in each phase of the software development life cycle possess the following characteristics:
Accessibility to LLMs (e.g., stored in the same repository as the code), so that the complete project context is available to them.
Utilization as context for creating artifacts in subsequent phases, establishing a one-way (causal) dependency between phases; a simple guard for this is sketched after this list.
Sufficiency as context for producing subsequent artifacts. For instance, design artifacts should not contain implicit requirements that are absent from the requirements documentation.
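To keep the dependency between phases one-way, a lightweight check in CI can enforce the direction of references. The directory layout (docs/requirements/ and docs/design/ in the same repository) and the rule itself are assumptions made for this sketch.

```python
# Sketch: flag requirements documents that reference design artifacts, keeping the
# dependency between phases one-way (design may cite requirements, not vice versa).
from pathlib import Path


def find_reverse_references(repo_root: str = ".") -> list[str]:
    violations = []
    for doc in Path(repo_root, "docs", "requirements").rglob("*.md"):
        text = doc.read_text(encoding="utf-8")
        if "docs/design/" in text:
            violations.append(f"{doc}: requirements should not reference design artifacts")
    return violations


if __name__ == "__main__":
    for violation in find_reverse_references():
        print(violation)
```

Running such a check in CI keeps the direction of dependencies honest as the set of artifacts grows.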
Elaborative Feedback Technique
The initial phase of development aims to produce documentation artifacts that clearly articulate the "why" and "what" of the project. This involves transforming ideas and unstructured information into a set of concrete documents that can serve as the foundational context for the project.
A valuable technique in this regard is to elicit elaborative feedback from LLMs:
Describe the project idea to the LLM, providing all available unstructured information as context.
Instruct the LLM to progressively question the idea to develop a comprehensive understanding.
Once sufficient context has been established through this dialogue, direct the LLM to compile the resulting artifacts.
Manually validate and finalize the artifacts.
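The four steps above can be sketched as a simple loop. The ask_llm helper below is a hypothetical stand-in for a chat-style client, not a real library, and the compiled artifacts named here (a project charter and a vision document) are illustrative.

```python
# Sketch of the elaborative feedback loop: the LLM questions the idea until the
# human decides enough context has accumulated, then compiles the artifacts.
def ask_llm(messages: list[dict[str, str]]) -> str:
    """Hypothetical placeholder for a chat-style LLM call; not a real API."""
    raise NotImplementedError("wire this to your LLM provider")


def elaborate(idea: str, raw_notes: str) -> str:
    messages = [
        {"role": "system", "content": "Question the project idea one topic at a time "
                                      "until you fully understand it. Do not write documents yet."},
        {"role": "user", "content": f"IDEA:\n{idea}\n\nNOTES:\n{raw_notes}"},
    ]
    while True:
        question = ask_llm(messages)
        messages.append({"role": "assistant", "content": question})
        answer = input(f"{question}\n> ")        # a human answers each clarifying question
        if answer.strip().lower() == "done":     # the human decides when context is sufficient
            break
        messages.append({"role": "user", "content": answer})

    messages.append({"role": "user", "content":
                     "Compile a project charter and a vision document from this dialogue."})
    return ask_llm(messages)                      # the result still requires manual validation
```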
This technique can be applied recursively throughout the initial phases. For example, during the initiation and planning phases, it can be used to produce a project charter, a vision document, or a release plan. These artifacts can then serve as context in the subsequent requirements analysis phase to generate user stories, use cases, and other relevant documentation. Similarly, in the design phase, all preceding artifacts can provide the necessary context for generating high- and low-level design documents, data models, feature specifications, and technical specifications.
Iterative Pair-Programming
Providing code-generating LLMs with access to documentation artifacts from the initial phases as context significantly improves the quality of their output.
For novice developers, simply incorporating this documentation as context into their prompts, even if those prompts are not ideally structured, will yield substantially better results. For experienced engineers, this practice helps ensure that LLMs produce consistent results and avoid common pitfalls.
During the development phase, an iterative approach to LLM-assisted code generation is recommended. For new code, this involves the following steps:
Capture ongoing decisions and research in (Architectural) Decision Records.
Use design documents and decision records as context to generate tests and interfaces.
Generate code that satisfies the defined tests and interfaces.
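As a sketch of steps 2 and 3, the interface and a contract check are produced first from the design documents and decision records, and only then is an implementation generated to satisfy them. All names and document paths below are illustrative assumptions.

```python
# Sketch: interface and contract test come first; the implementation is generated last.
from typing import Protocol


class RefundGateway(Protocol):
    """Interface derived from docs/design/payments.md and docs/adr/0007-refund-provider.md
    (illustrative paths)."""

    def refund(self, order_id: str, amount_cents: int) -> bool:
        """Return True if the provider accepted the refund request."""
        ...


def check_refund_contract(gateway: RefundGateway) -> None:
    """Contract check derived from the design doc: non-positive amounts must be rejected."""
    assert gateway.refund("order-42", -100) is False


# Step 3 then prompts the LLM for a concrete class that makes check_refund_contract
# pass, without editing the interface or the contract check above.
```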
When modifying existing functionality, follow this iterative process, which is designed to prevent LLMs from masking faulty code by generating updated tests or documentation:
Update the relevant documentation.
Prompt the LLM to update the tests according to the documentation.
Prompt the LLM to update the code without changing either the documentation or the tests.
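One way to make these constraints mechanical, rather than relying on prompt wording alone, is to scope each step to a single class of artifact and reject any proposed edit outside that scope. The directory layout and helper below are assumptions made for the sketch.

```python
# Sketch: each step may touch exactly one class of artifact; anything else is rejected
# before review, so faulty code cannot be masked by rewritten tests or documentation.
STEPS = [
    ("Revise the refund spec to cover partial refunds.", ("docs/",)),
    ("Update the tests to match the revised spec; do not touch docs or src.", ("tests/",)),
    ("Update the code so the tests pass; do not touch docs or tests.", ("src/",)),
]


def assert_edits_in_scope(edited_files: list[str], allowed_prefixes: tuple[str, ...]) -> None:
    """Reject LLM-proposed edits that fall outside the current step's scope."""
    for path in edited_files:
        if not any(path.startswith(prefix) for prefix in allowed_prefixes):
            raise ValueError(f"{path} is outside the allowed scope {allowed_prefixes}")


# Example: the second step proposes edits; only changes under tests/ are accepted.
assert_edits_in_scope(["tests/test_refunds.py"], STEPS[1][1])
```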
The practices and techniques I've outlined in this article provide a framework for more effectively integrating LLMs into the software development process. I believe this is the single most impactful change to the process that architects or team leads can make to level up their team’s performance. By emphasizing documentation as the primary source of truth, establishing explicit linkages between development artifacts, and employing iterative, human-guided prompting, the whole team can harness the power of LLMs while mitigating their current limitations.