
Automate Compliance:
How to Reduce MISRA C Issues with AI
The first question every technical leader asks when someone mentions AI-assisted development for safety-critical software is the same: Can I certify code that an LLM wrote?
The honest answer is: not directly, not yet, and probably not alone. But that is the wrong frame. The right question is whether AI can reliably move violations left, by catching and correcting problems during development instead of during final static analysis or, worse, during an audit.
This post walks through how shaide, a development assistant purpose-built for regulated software, drives MISRA violations toward zero by combining retrieval-augmented generation, static analysis tool integration, and team knowledge management.
Are LLMs ready for safety-critical work?
The MISRA C and MISRA C++ standards together define over 400 rules for safety-critical embedded development.
Enforcing them on an active codebase is not a one-time activity; it is an ongoing cost on every line of code written and every engineer onboarded. Teams typically maintain dedicated MISRA specialists, run static analysis in CI, and spend a meaningful fraction of review cycles on conformance issues.
A 2025 IEEE Access study by Umer et al. benchmarked five major LLMs against MISRA C++, running each through a set of representative tasks and checking the output with PC-lint Plus. The results were sobering: GPT-4 produced 18 violations per run, Microsoft Copilot 25, Google Gemini 32, and Meta AI 67. The most common offenders were Rule 5-0-4 (implicit type conversions) and Rule 7-1-1 (missing const qualifiers), accounting for a combined 71 violations across all models in the study.
The tempting conclusion is that LLMs are not ready for safety-critical work, that they create more problems than they solve. But the 2025 Parasoft study by Woloszyn and Raszka tells a more nuanced story.
What the Research Actually Shows
Woloszyn and Raszka ran a controlled before-and-after experiment across four models (DeepSeek Chat v3.1, GPT-4.1, GPT-o1-mini, and o3) using 26 C++ tasks from the aider polyglot benchmark, 20 runs per model per condition, 2,080 benchmark attempts in total. The baseline confirmed the Umer findings: models produce 23–29 violations per 1,000 lines of code with no special guidance.
Their intervention was deliberately minimal: they distilled the MISRA C++ documentation into a compact set of targeted, actionable instructions and prepended them to each prompt. The results:
Top-3 targeted rules: 41–67% violation reduction
Top-5 targeted rules: up to 83% violation reduction
The key architectural insight from this study is that the LLM's context, not the LLM itself, is the bottleneck. Given the right information at the right time, models that failed at baseline compliance can be made dramatically more reliable. The follow-on engineering question is: how do you deliver that context reliably, at scale, for every function a developer writes?
Both studies focus on MISRA C++, but the underlying dynamic applies equally to MISRA C:2012 codebases.
Why You Cannot Just Paste the Standard Into a System Prompt
The naive approach, embedding the entire MISRA standard in every request, fails for several reasons that are familiar to anyone who has built production RAG systems:
Context dilution. Dumping 200 rules into a prompt means most of them are irrelevant to the code being written. Irrelevant context does not just fail to help, it actively degrades output quality.
Rule selection requires code understanding. The five most relevant rules for a floating-point filter function are different from the five most relevant rules for a state machine. Static rule lists cannot adapt.
Teams have project-specific deviations. MISRA permits documented exceptions. A team may have a reasoned deviation for a specific rule in specific contexts. Those exceptions need to be part of the context, but they live in internal documents.
Static analysis feedback must close the loop. Improving prompt context gets you to 83% violation reduction. Getting to zero requires static analysis output to re-enter the generation loop so the model can correct the remaining issues.
These four problems define the architecture that shaide implements. By building up a shared knowledge base of rules and deviations and wiring static analysis into its tool-calling loop, shaide pushes generated code toward full MISRA compliance.
A Concrete Example: Fixing Violations in PX4 Autopilot with cppcheck
PX4 Autopilot is open-source flight-control software used in real drones and unmanned vehicles. Its codebase is a mix of C and C++, and it contains real MISRA violations in production code.
The example here comes from the Sagetech MXS transponder driver (ADS-B collision-avoidance hardware) at:
This is C code, so the relevant standard is MISRA C:2012, which is what cppcheck's open-source MISRA addon implements. For C++ code, commercial tools such as PC-lint Plus, LDRA, and QA-C support MISRA C++:2008 and MISRA C++:2023 with the same workflow.
Two files illustrate the most common violation categories the Umer and Woloszyn studies identified: type conversion issues and qualifier gaps.
calcChecksum.c - Rule 10.4: Mixed essential types in arithmetic
Rule 10.4: Both operands of an arithmetic operator shall have the same essential type category (line 21)
len has essential type unsigned (uint8_t). The literal 1 has essential type signed (int). Before the subtraction executes, C's implicit integer promotion converts len to int (a signed type) and then performs len - 1 as a signed operation. If len is ever 0, the result is -1, which wraps to 255 before the comparison, causing the loop to iterate 255 times over unowned memory. MISRA C:2012 Rule 10.4 forces you to make the intent explicit: either len - 1U (unsigned literal) or an explicit cast.
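The driver source is not reproduced in this post, so here is a minimal sketch of the pattern under discussion; the function and variable names are illustrative, not the actual PX4 code:

```c
#include <stdint.h>

/* Sketch of the violating pattern (illustrative names, not the PX4
 * source). `len` has essential type unsigned, the literal 1 is signed,
 * so `len - 1` is computed as a signed int after integer promotion;
 * storing it back into a uint8_t wraps -1 to 255 when len == 0. */
uint8_t checksum_noncompliant(const uint8_t *buf, uint8_t len)
{
    uint8_t sum = 0U;
    uint8_t last = len - 1;               /* MISRA C:2012 Rule 10.4 */
    for (uint8_t i = 0U; i < last; i++) {
        sum += buf[i];
    }
    return sum;
}

/* Compliant form: the unsigned literal 1U keeps both operands in the
 * unsigned essential type category, and the explicit guard removes
 * the len == 0 wraparound path entirely. */
uint8_t checksum_compliant(const uint8_t *buf, uint8_t len)
{
    uint8_t sum = 0U;
    if (len > 0U) {
        for (uint8_t i = 0U; i < (uint8_t)(len - 1U); i++) {
            sum += buf[i];
        }
    }
    return sum;
}
```

Both versions behave identically for non-zero lengths; the difference only matters on the len == 0 edge case, which is exactly the case the essential type rules are designed to surface.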
float2Buf.c - Rule 19.2: Union used for type punning
Rule 19.2: The union keyword should not be used (lines 20–23)
Reading conversion.bytes after writing conversion.val is type punning through a union. The C standard leaves the resulting value implementation-defined at best, and the same pattern is undefined behaviour in C++; memcpy is the only fully portable way to reinterpret an object representation. MISRA C:2012 Rule 19.2 prohibits unions entirely for this reason.
Rule 10.4: Signed/unsigned mismatch in loop condition (line 27)
FLOAT_SIZE is const uint16_t (unsigned essential type) and i is int (signed). Same class of violation as in calcChecksum.c: the comparison mixes essential type categories without an explicit conversion.
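The pattern in float2Buf.c can be sketched as follows, again with illustrative names rather than the actual driver code:

```c
#include <stdint.h>

static const uint16_t FLOAT_SIZE = (uint16_t)sizeof(float);  /* illustrative */

/* Sketch of the violating pattern (illustrative names). The union
 * reinterprets the float's object representation (Rule 19.2), and the
 * signed loop index `i` is compared against the unsigned FLOAT_SIZE
 * without an explicit conversion (Rule 10.4). */
void float2Buf_noncompliant(uint8_t *buf, float val)
{
    union {
        float   val;
        uint8_t bytes[sizeof(float)];
    } conversion;                            /* Rule 19.2: union keyword */

    conversion.val = val;
    for (int i = 0; i < FLOAT_SIZE; i++) {   /* Rule 10.4 */
        buf[i] = conversion.bytes[i];
    }
}
```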
Running cppcheck with the MISRA Addon
The output identifies all violations by rule and location:
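A representative invocation looks like the following; the file paths and line/column numbers are illustrative, and without a licensed rule-text file cppcheck prints a generic message plus the rule identifier:

```shell
# Assumes cppcheck with the bundled MISRA addon; paths are illustrative
# for the two files discussed above.
cppcheck --addon=misra --std=c11 calcChecksum.c float2Buf.c

# Representative output (abridged):
# calcChecksum.c:21:15: style: misra violation (use --rule-texts=<file> to get proper output) [misra-c2012-10.4]
# float2Buf.c:20:5: style: misra violation (use --rule-texts=<file> to get proper output) [misra-c2012-19.2]
# float2Buf.c:27:23: style: misra violation (use --rule-texts=<file> to get proper output) [misra-c2012-10.4]
```

The `[misra-c2012-X.Y]` identifier at the end of each line is what the agent parses to map a finding back to a rule and a source location.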
The same workflow runs with commercial tools. LDRA, QA-C, and PC-lint Plus all produce equivalent violation reports, and additionally support MISRA C++:2008 and MISRA C++:2023 for C++ code. The shaide integration is tool-agnostic: the agent reads the static analyzer's stdout, parses the violation lines, and maps them back to source locations. Which tool is used is a project configuration choice.
How shaide Closes the Loop
When a developer asks shaide to review the transponder SDK for MISRA compliance, this is what happens:
Context retrieval: The shaide server queries for MISRA C:2012 rules semantically relevant to checksum arithmetic and byte-level serialisation. Rules 10.4 and 19.2 rank in the top results. If the team has uploaded a deviation rationale document for their project (for instance, documenting why a project-level exception to Rule 19.2 was assessed and rejected), it is included in the retrieved context.
Initial generation: The LLM generates proposed corrected versions of both functions, armed with the retrieved rule guidance.
Static analysis: The agent invokes cppcheck (or the tool of choice). No human intervention.
Violation feedback: The agent reads the output. If violations remain, they re-enter the prompt as structured feedback.
Iteration: The agent regenerates the affected lines and reruns the checker. The loop continues until the analyzer reports zero violations, or until the agent escalates a violation it cannot resolve without an architectural decision.
Result: The developer sees a clean diff with an optional traceability comment referencing the MISRA rules applied.
The corrected versions look like this:
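Sketched with the same illustrative names as before (the real driver signatures may differ), the compliant versions are:

```c
#include <stdint.h>
#include <string.h>

static const uint16_t FLOAT_SIZE = (uint16_t)sizeof(float);  /* illustrative */

/* Rule 10.4 fix: the unsigned literal 1U keeps both operands of the
 * subtraction in the unsigned essential type category, and the guard
 * removes the len == 0 wraparound path entirely. */
uint8_t calcChecksum(const uint8_t *buf, uint8_t len)
{
    uint8_t sum = 0U;
    if (len > 0U) {
        for (uint8_t i = 0U; i < (uint8_t)(len - 1U); i++) {
            sum += buf[i];
        }
    }
    return sum;
}

/* Rule 19.2 fix: memcpy is the well-defined way to reinterpret an
 * object representation, so the union goes away. Rule 10.4 fix: the
 * loop index is now unsigned, matching FLOAT_SIZE's essential type. */
void float2Buf(uint8_t *buf, float val)
{
    uint8_t bytes[sizeof(float)];
    (void)memcpy(bytes, &val, sizeof(float));
    for (uint16_t i = 0U; i < FLOAT_SIZE; i++) {
        buf[i] = bytes[i];
    }
}
```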
New code generation
shaide implements new functionality with the same method: it iterates between generation and static analysis until the generated code contains no remaining rule violations.
What This Means for Technical Leaders
Shifting left has compounding returns. A MISRA violation caught by the agent during authoring costs seconds to fix. The same violation caught in CI costs a developer context-switch and a re-review. Caught in a formal audit, it costs a deviation analysis, a rationale document, and potentially a schedule slip. The feedback loop shaide creates does not eliminate the need for final static analysis, but it dramatically reduces the volume of issues that reach it.
Onboarding no longer requires a MISRA expert in the room. Junior engineers writing their first safety-critical module get the same results as a senior engineer who has spent a decade with the standard. The guardrails are in the tool, not in the person sitting next to them.
Institutional knowledge becomes infrastructure. The project-specific deviation rationales, architectural constraints, and team conventions are encoded into the development environment. When context changes (e.g. a new platform, a new safety manager, a new regulatory revision), the knowledge center is updated once and benefits every developer immediately.
Limitations
Human review remains mandatory. The agent's loop converges on a static-analysis-clean result, but static analysis is not equivalence checking. A qualified safety engineer must still review generated code for logical and system-level correctness and for certification-level requirements traceability.
Model behaviour is not deterministic. The agent loop is designed to handle non-convergence by escalating to the developer rather than silently accepting a non-compliant result. But the loop does not guarantee first-pass compliance on every generation.
The Practical Takeaway
The gap between "LLMs sometimes produce compliant-ish code" and "LLMs are a reliable part of a compliance workflow" is not a model capability gap. It is an architecture gap. The research shows that targeted context reduces violations by 80% or more. Closing the loop with static analysis tooling closes most of the remainder. Encoding team-specific knowledge captures what no published standard can.
That architecture exists today. For teams spending significant time on MISRA compliance, the question is no longer whether AI can help; it is whether your current toolchain delivers the context the model needs to do it reliably.