LLM-Assisted Code Review: What We've Learned After 12 Months

Code Review, LLM, Quality

Twelve months ago, we began integrating large language models into our code review process. Not as a replacement for human reviewers, but as a first pass that catches the mechanical issues before a senior engineer spends their time on it. After reviewing thousands of pull requests with LLM assistance, we have a clear picture of where these tools add genuine value and where they fall short.

The experiment started with a simple hypothesis: if an LLM can identify common code issues (null pointer risks, missing error handling, inconsistent naming, security anti-patterns), then human reviewers can focus on architecture, design patterns, and business logic. That hypothesis proved correct, but the reality is more nuanced than we expected.

LLMs catch the bugs humans overlook. Humans catch the design flaws LLMs cannot see. Together, they are formidable.

What LLMs Are Good At (and What They Are Not)

LLMs excel at pattern recognition in code. They reliably identify missing input validation, SQL injection vulnerabilities, unhandled promise rejections, and resource leaks. They are also surprisingly good at spotting inconsistencies with project conventions, provided you include those conventions in the prompt context. On a typical pull request, the LLM catches two to three issues that would otherwise require human attention.

Where LLMs struggle is with anything that requires understanding the broader system context. They cannot tell you that a function duplicates logic that already exists in another module. They cannot evaluate whether an architectural decision is appropriate for the system's scale. They miss subtle concurrency issues that require understanding the runtime environment. And they occasionally hallucinate problems that do not exist, which is why human oversight remains essential.
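To make the "mechanical issue" category concrete, here is a toy sketch of one check in that class: flagging SQL built via string interpolation, a pattern an LLM reviewer tends to catch reliably. The regex and helper are illustrative inventions, not our production tooling, and a real model-based review is far more flexible than this single rule:

```python
import re

# Toy illustration of one mechanical check: SQL assembled with an
# f-string passed straight to execute()/query() is an injection risk.
SQL_INTERPOLATION = re.compile(r'(execute|query)\(\s*f["\']')

def flag_sql_interpolation(diff_lines):
    """Return (line_number, line) pairs that interpolate values into SQL."""
    return [(i, line) for i, line in enumerate(diff_lines, start=1)
            if SQL_INTERPOLATION.search(line)]

findings = flag_sql_interpolation([
    'cursor.execute(f"SELECT * FROM users WHERE id = {user_id}")',      # flagged
    'cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))',  # safe
])
```

The point of the sketch is the division of labor: checks like this are cheap and deterministic, so delegating them (whether to a linter or an LLM) frees the human reviewer for questions no pattern match can answer.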

Our process now works in three stages. The developer submits a pull request. An automated LLM review runs within minutes, posting comments directly on the diff. The developer addresses any valid findings. Then a human reviewer examines the changes, already freed from having to catch the mechanical issues. This has reduced our average review time by about 25% while improving the consistency of feedback across the team.
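The first stage of that flow can be sketched in a few lines. Everything here is a hypothetical stand-in rather than a real API: `call_llm` represents whatever model client you use, and the `path:line:message` response format is an assumed convention for parsing findings out of the model's reply:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    path: str
    line: int
    message: str

def llm_review(diff: str, conventions: str, call_llm) -> list[Finding]:
    """Stage 1: automated first pass, run as a CI step on every PR.

    `call_llm` is injected so the pipeline stays model-agnostic.
    """
    prompt = (
        "You are a code reviewer. Apply these project conventions:\n"
        f"{conventions}\n\n"
        "Review this diff and list issues as path:line:message, "
        "one per line.\n\n" + diff
    )
    findings = []
    for raw in call_llm(prompt).splitlines():
        # Assumed response convention: "path:line:message".
        path, line, message = raw.split(":", 2)
        findings.append(Finding(path, int(line), message.strip()))
    return findings

# Usage with a stubbed model; a real pipeline would post these findings
# as review comments on the diff before a human reviewer looks at it.
stub = lambda prompt: "app.py:42: missing error handling on the network call"
comments = llm_review("diff --git a/app.py ...", "Use snake_case.", stub)
```

Injecting the model client as a parameter is also what makes the step testable in CI: you can run the whole pipeline against a stub without spending tokens.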

  • Run LLM review as an automated CI step, not a manual process
  • Include your coding standards and project conventions in the prompt context
  • Track false positive rates and adjust sensitivity accordingly
  • Never let LLM approval substitute for human review on critical paths
  • Use LLM suggestions as a teaching tool for junior developers
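The "critical paths" rule in particular is easy to enforce mechanically. Below is a minimal sketch of that gate; the path globs are hypothetical examples, not a recommended list:

```python
from fnmatch import fnmatch

# Hypothetical examples of sensitive areas; substitute your own.
CRITICAL_PATHS = ["auth/*", "payments/*", "*/migrations/*"]

def human_review_required(changed_files, llm_approved: bool) -> bool:
    """LLM approval alone is never sufficient on critical paths."""
    touches_critical = any(
        fnmatch(path, pattern)
        for path in changed_files
        for pattern in CRITICAL_PATHS
    )
    return touches_critical or not llm_approved

print(human_review_required(["auth/login.py"], llm_approved=True))   # True
print(human_review_required(["docs/readme.md"], llm_approved=True))  # False
```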

The key to making LLM-assisted review work is calibration. You need to tune the system to your codebase, your conventions, and your risk tolerance. A system that produces too many false positives will be ignored. One that misses real issues provides false confidence. We spent the first two months adjusting our prompts and filtering rules, and we continue to refine them as the models improve.
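One way to make that calibration loop concrete: have reviewers triage each LLM finding as real or spurious, track the false-positive rate, and move a minimum-confidence threshold in response. The thresholds and step sizes below are illustrative assumptions, not tuned values from our system:

```python
def false_positive_rate(triaged):
    """triaged: list of (finding_id, was_real_issue) pairs from reviewers."""
    if not triaged:
        return 0.0
    spurious = sum(1 for _, was_real in triaged if not was_real)
    return spurious / len(triaged)

def adjust_min_confidence(current: float, fp_rate: float) -> float:
    # Too noisy: surface only higher-confidence findings next cycle.
    if fp_rate > 0.30:
        return min(current + 0.05, 0.95)
    # Very quiet: loosen slightly so real issues are not filtered out.
    if fp_rate < 0.10:
        return max(current - 0.05, 0.50)
    return current

# Two of three triaged findings were spurious, so tighten the filter.
rate = false_positive_rate([("f1", True), ("f2", False), ("f3", False)])
threshold = adjust_min_confidence(0.70, rate)
```

The direction of each adjustment mirrors the trade-off in the paragraph above: noise erodes trust, while over-filtering produces false confidence.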
