Iterating Past EDR: An AI-Assisted Approach to Detection Bypass Development (Part 1)
Introduction
Modern endpoint detection and response (EDR) solutions no longer rely on a single detection engine. A typical EDR layers static machine learning, memory scanning, behavioral rule evaluation, and call stack analysis on top of one another, with the engines sharing telemetry and correlating events that look benign in isolation. A payload that defeats one engine can still be flagged by another, or by a behavioral rule that ties together signals from several engines at once.
For red team operators developing custom offensive tooling, this layering presents a difficult development problem. Implementing any single evasion technique is rarely the hard part. The hard part is figuring out which combinations of techniques produce a stable, undetected configuration against a specific solution, and understanding why certain combinations succeed where others fail. Manual iteration through this space is slow and prone to dead ends, and the difficulty grows non-linearly as more techniques are layered on.
This is the first of a two-part series describing a workflow that addresses that problem by incorporating an AI coding assistant as a reasoning partner throughout the detection bypass development process. The role of the assistant is not to generate exploits or write shellcode. Its role is to accelerate the analytical step that sits between observing a detection and implementing a targeted bypass. This is accomplished by ingesting the actual rule logic, mapping its conditions to the loader’s code paths, and identifying every location where a fix needs to land.
This part covers the methodology and walks through three categories of detection that were resolved using the iterative loop. Part 2 covers the final and most instructive detection, the approaches that did not work, and a prevent-mode validation that revealed something unexpected about the depth at which the bypass actually operates.
The Iterative Bypass Loop
The workflow itself is straightforward and not novel in concept. A loader is executed against a target host with the EDR product installed, and the resulting alerts are observed. For each rule that fires, the rule’s source code is retrieved from the vendor’s public repository and analyzed to identify the specific conditions that triggered the detection. A targeted change is implemented in the loader to defeat those conditions. The build is redeployed and re-executed, and the process repeats.

Figure: The bypass development loop. Each cycle is informed by analysis of the actual detection rule logic. The analytical steps (rule analysis and hypothesis generation) are the steps that benefit most from AI assistance.
An AI coding assistant changes the speed and depth of the analytical step. A behavioral rule expressed in EQL or KQL combines event matching, field thresholds, array searches across call stack frames, and process ancestry checks into queries that are dense and tedious to parse manually. Mapping those queries back to the API calls and code paths that produce the matched events requires both familiarity with Win32 internals and knowledge of the loader’s full codebase. An assistant with access to both can perform this mapping in a single pass, identifying triggering conditions and flagging every call site that requires modification. The cycle compresses from days of manual iteration to focused sessions during which each detection is resolved methodically.
This approach is broadly applicable because several major security vendors publish their detection logic openly. Elastic’s endpoint behavioral rules live in the protections-artifacts repository, Microsoft publishes Sentinel detection content as KQL, Splunk publishes ESCU content as SPL, and the SigmaHQ project provides vendor-agnostic rules that convert across platforms. For target products in this set, rule analysis is a matter of reading published source code rather than reverse engineering closed binaries.
Working Through the Detection Layers
Initial execution of the loader against a fresh installation of Elastic Defend 9.3.1 produced forty alerts across ten distinct rule categories. These alerts represented multiple independent detection engines firing on different aspects of the same execution. They sorted naturally into three groups, each addressed by a different class of fix.

Figure: The forty alerts produced on initial execution, broken down by rule and detection layer.
Static Machine Learning
Elastic’s ML model scored the binary as malicious before execution, producing two Malware Detection Alert entries. Investigation traced the score to the C runtime library produced by the MinGW cross-compiler. MinGW is the dominant toolchain in public offensive tooling, and the CRT initialization code, section layout, and import patterns it produces are consistent across virtually every public loader, shellcode runner, and implant builder available today. The model appears to score heavily on these toolchain-level characteristics. Switching the build to use Zig as the cross-compiler, which produces substantially different runtime and linking artifacts from the same C source, was enough for the binary to score as clean.
Memory Scanning
Eleven Unbacked Shellcode from Unsigned Module alerts and four Shellcode Injection alerts indicated that periodic memory scans were finding executable code in private, unbacked memory regions. The corresponding rules combine three conditions: private memory type, executable protection, and the absence of a backing file on disk. Each condition is benign in isolation, but their combination is a strong signal of injected code. The fix was to execute from memory backed by SEC_IMAGE mappings of legitimate system DLLs, a technique known as module stomping. With the payload running inside memory that appeared to belong to setupapi.dll, the unbacked-memory conditions no longer applied.
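Seen from the detection side, the three-condition combination can be modeled as a simple predicate. The following is a minimal sketch and not Elastic’s actual rule logic: the `Region` type and its field names are hypothetical, while the numeric constants are the standard Windows memory type and page protection values.

```python
from dataclasses import dataclass
from typing import Optional

# Standard Windows memory type and page protection values (winnt.h).
MEM_PRIVATE = 0x20000
MEM_IMAGE = 0x1000000
PAGE_READWRITE = 0x04
PAGE_EXECUTE = 0x10
PAGE_EXECUTE_READ = 0x20
PAGE_EXECUTE_READWRITE = 0x40
PAGE_EXECUTE_WRITECOPY = 0x80

EXECUTABLE_MASK = (
    PAGE_EXECUTE | PAGE_EXECUTE_READ | PAGE_EXECUTE_READWRITE | PAGE_EXECUTE_WRITECOPY
)


@dataclass
class Region:
    """Hypothetical summary of one memory region as a scanner might record it."""
    mem_type: int                # MEM_PRIVATE, MEM_IMAGE, ...
    protect: int                 # PAGE_* protection flags
    backing_file: Optional[str]  # path of the mapped file, or None if unbacked


def is_unbacked_executable(region: Region) -> bool:
    """True only when all three conditions hold at once."""
    is_private = region.mem_type == MEM_PRIVATE
    is_executable = bool(region.protect & EXECUTABLE_MASK)
    has_no_file = region.backing_file is None
    return is_private and is_executable and has_no_file
```

A private read-write heap allocation or an executable image-backed region fails the predicate on its own. Only the conjunction matches, which is why each condition looks benign in isolation.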
Module stomping introduced its own follow-up detections. Four Network Activity from Modified Module alerts and two Network Activity from Stomped Module alerts appeared, but the connection between the technique and the detection was not immediately obvious from the rule names alone. The assistant traced the alerts to their rule definitions, identified that the EDR was comparing the in-memory contents of the stomped DLL against the on-disk file, and connected the detection to the copy-on-write mechanism that produces divergent pages after the .text section is overwritten. This is the natural consequence of overwriting a mapped section, but seeing it through the rule logic made the fix clear. The fix was to restore the original DLL bytes after the payload exited, by creating a fresh SEC_IMAGE mapping of the same DLL and copying its .text content back over the stomped pages. This technique was published by Diego Capriotti (naksyn) in June 2023 under the name ModuleShifting, with a reference implementation on GitHub.
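The in-memory versus on-disk comparison described above can be illustrated with a page-granular diff. This is a simplified sketch of the idea rather than the EDR’s implementation: `divergent_pages` is a hypothetical helper, both inputs are assumed to be the same section already laid out identically, and a real comparison would also have to account for relocations and other legitimate loader-applied changes.

```python
from typing import List

PAGE_SIZE = 4096  # assumed page size for the illustration


def divergent_pages(on_disk: bytes, in_memory: bytes, page_size: int = PAGE_SIZE) -> List[int]:
    """Return the indexes of pages whose contents differ between the two views."""
    length = max(len(on_disk), len(in_memory))
    changed = []
    for offset in range(0, length, page_size):
        disk_page = on_disk[offset:offset + page_size]
        mem_page = in_memory[offset:offset + page_size]
        if disk_page != mem_page:
            changed.append(offset // page_size)
    return changed
```

Any non-empty result means that at least one page of the mapped image has been written since it was loaded. That is the divergence that copy-on-write produces once a mapped section is modified.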
Call Stack Analysis
Five API Call via Jump ROP Gadget alerts originated from a JMP-based call stack spoofing technique used during sleep cycles. The corresponding rule detects ROP gadget patterns in thread call stacks. Switching to a threadpool-based sleep technique resolved this immediately, because threadpool worker threads originate inside ntdll.dll itself and produce call stacks with only standard system frames.
A related alert proved more involved. The Image Hollow from Unusual Stack rule fires when a thread executing inside a stomped module was created via CreateThread, producing a call stack with the loader’s frames visible above the standard system frames. Resolving this required dispatching the payload via the Windows threadpool API rather than CreateThread. The threadpool worker thread originates inside ntdll!TppWorkerThread, and the call stack contains only standard system frames with no unbacked entries for the rule to match.
Implementing this fix surfaced an unrelated bug in the loader’s API resolver. The first attempted threadpool dispatch produced an access violation (exception code 0xC0000005, access type 0x8, which indicates an attempt to execute non-executable memory) at an address that should have been a function entry point. The crash had no obvious connection to the threadpool changes and could have sent debugging in several wrong directions.
The assistant worked through the diagnosis in a series of steps. It examined the faulting address and observed that the memory region was readable but not executable, which ruled out a code corruption issue. It then noted that the bytes at the faulting address were valid ASCII text, not machine code, and recognized the pattern as an export forwarder string. From there it identified the root cause: the threadpool APIs in kernel32.dll on Windows 11 are forwarded exports. The kernel32 export table contains pointers to strings such as api-ms-win-core-threadpool-l1-2-0.CreateThreadpoolWork rather than function addresses. The loader’s PEB-walk resolver was returning these string pointers as if they were code, and the first call into one produced the access violation. The assistant identified the specific check that was missing in the resolver (whether the function RVA falls within the export directory bounds, which indicates a forwarder) and proposed the fix: detect the forwarder condition and fall back to GetProcAddress to resolve through the chain. Once this was in place, threadpool dispatch worked correctly and the unusual stack alert was eliminated.
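The forwarder condition follows directly from the PE format: an export whose function RVA points inside the export directory itself is a forwarder string, not code. The sketch below shows that bounds check and the parsing of a forwarder string in isolation. The function names are hypothetical and the RVA values in the usage note are made up for illustration. It is not the loader’s resolver.

```python
from typing import Tuple


def is_forwarded_export(func_rva: int, export_dir_rva: int, export_dir_size: int) -> bool:
    """An export is a forwarder when its RVA falls within the export directory bounds."""
    return export_dir_rva <= func_rva < export_dir_rva + export_dir_size


def split_forwarder(forwarder: str) -> Tuple[str, str]:
    """Split a forwarder string of the form 'module.symbol' into its two parts."""
    module, sep, symbol = forwarder.rpartition(".")
    if not sep or not module or not symbol:
        raise ValueError(f"not a forwarder string: {forwarder!r}")
    return module, symbol
```

For example, with an export directory at RVA 0x99000 of size 0x2000, an entry with RVA 0x9A120 is classified as a forwarder, and `split_forwarder("api-ms-win-core-threadpool-l1-2-0.CreateThreadpoolWork")` yields the target module and symbol that still have to be resolved.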
This is a representative example of the kind of problem that benefits most from the AI-assisted workflow. The bug was not conceptually difficult, but diagnosing it required connecting a low-level access violation to a PE format detail (export forwarders) that is not commonly encountered during development. The assistant’s familiarity with the PE export directory structure and its ability to reason about the faulting address in that context compressed what could have been hours of manual investigation into a single analytical pass.
What Comes Next
These layers of fixes resolved all but one of the initial forty alerts. The alert that remained was Suspicious System Module Image Hollowing.
In Part 2, we examine this final alert in detail, read the actual rule source to find an exploitable threshold, walk through the approaches that did not work, and validate the result against the EDR’s prevent-mode enforcement layer. The validation produces a finding about how deeply the bypass operates within the product’s detection stack that changes the interpretation of the entire result.
