Current issue : #71 | Release date : 2024-08-19 | Editor : Phrack Staff
                             ==Phrack Inc.==

                Volume 0x10, Issue 0x47, Phile #0x0F of 0x11

|=-----------------------------------------------------------------------=|
|=------------------=[ Evasion by De-optimization ]=---------------------=|
|=-----------------------------------------------------------------------=|
|=---------------------------=[ Ege BALCI ]=-----------------------------=|
|=-----------------------------------------------------------------------=|

--[ Table of Contents

1 - Intro
2 - Current A.V. Evasion Challenges
3 - Prior Work
4 - Transforming Machine Code
5 - Transform Gadgets
    5.1 - Arithmetic Partitioning
    5.2 - Logical Inverse
    5.3 - Logical Partitioning
    5.4 - Offset Mutation
    5.5 - Register Swapping
6 - Address Realignments
7 - Known Limitations
8 - Conclusion
9 - References

--[ 1 - Intro

Bypassing security products is a very important part of many offensive 
security engagements. The majority of the current AV evasion techniques 
used in various evasion tools, such as packers, encoders, and obfuscators,
are heavily dependent on the use of self-modifying code running on RWE 
memory regions. Considering the current state of security products, such 
evasion attempts are easily detected by memory analysis tools such as 
Moneta[1] and Pe-sieve[2]. This study introduces a new approach to code 
obfuscation with the use of machine code de-optimization. 

In this study, we will delve into how we can use certain mathematical 
approaches, such as arithmetic partitioning, logical inverse, polynomial
transformation, and logical partitioning, to design a generic formula 
that can be applied to many machine code instructions for transforming / 
mutating or de-optimizing the bytes of the target program without 
creating any recognizable patterns. Another objective of this study is 
to give a step-by-step guide for implementing a machine code de-optimizer 
tool that uses these methods to bypass pattern-based detection. 

Such tools have already been implemented, and will be shared[3] as an 
open-source project with references to this article.

--[ 2 - Current A.V. Evasion Challenges

The term "security product" represents a wide variety of software 
nowadays. The inner workings of such software may differ, but the main 
purpose is always to detect some type of malicious asset by recognizing 
certain patterns. Such indicators can originate from a piece of code, 
data, a behavior log, or even from the entropy of the program. The main 
goal of this study is to introduce a new way of obfuscating malicious 
code. So, we will only focus on evading malicious CODE patterns.

Arguably, the most popular way of hiding malicious code is by using 
encoders. Encoding a piece of machine code can be considered the most 
primitive way of eliminating malicious code patterns. Because of the 
current security standards, most encoders are not very effective at 
bypassing security products on their own. Instead, they're being used 
in more complex evasion software such as packers, crypters, obfuscators, 
command-and-control, and exploitation frameworks. Most encoders work by 
encoding the malicious code by applying some arithmetic or logical 
operations to the bytes. After such transformation, encoder software 
needs to add a decoding routine at the beginning of the encoded payload 
to fix the bytes before executing the actual malicious code.

                       +-----------+
                       |  decoder  |  <--- Decodes the original payload.
  +-----------+        +-----------+
  |           |        |           |
  | malicious |        |           |
  |   code    |  ===>  |  enc-code |
  |           |        |           |
  |           |        |           |
  +-----------+        +-----------+

There are three critical issues with this approach. The first issue is 
the effectiveness of the encoding algorithm. If the malicious code is 
being encoded with just a single-byte logical/arithmetic operation, the 
security products can also detect the encoded form of the malicious 
code.[4] In order to make the malicious code unrecognizable, the encoder 
needs to use more complex encoding algorithms, such as primitive 
cryptographic ciphers or at least multi-byte encoding schemes. 
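As a quick illustration of why single-byte encodings fail (a hypothetical Python sketch, not taken from any particular product): since XOR with one key byte is trivially reversible, a scanner that knows a plaintext pattern of the payload can simply brute-force all 256 keys.

```python
# Sketch: defeating a single-byte XOR encoder by brute force.
def xor_encode(payload, key):
    return bytes(b ^ key for b in payload)

def find_xor_key(blob, known_pattern):
    """Try all 256 keys; return the key if the pattern appears."""
    for key in range(256):
        if known_pattern in xor_encode(blob, key):  # XOR is its own inverse
            return key
    return None

payload = b"\xfc\x48\x83\xe4\xf0\xe8"   # a well-known shellcode prologue
encoded = xor_encode(payload, 0x42)
assert find_xor_key(encoded, b"\x48\x83\xe4\xf0") == 0x42
```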

Naturally, such complex encoding algorithms require larger decoder 
routines. This brings us to the second critical issue, which is the 
visibility of the decoder routine. Most security products can easily 
detect the existence of such code at the beginning of an encoded blob by 
using static detection rules[5]. Some new-generation encoders try to 
obfuscate and hide the decoder code by also encoding the decoder routine, 
reducing its size, and obfuscating it with garbage instructions.[6] Such 
designs produce promising results, but they do not solve the final and 
most crucial issue. 

In the end, all of the encoder software produces a piece of self-modifying
code. This means the resulting payload must be running inside a 
Read-Write-Execute (RWE) memory region. Such a scenario can easily be 
considered a suspicious indicator.

--[ 3 - Prior Work

To solve the aforementioned issues, security researchers created several 
machine code obfuscators.[7][8] The purpose of these obfuscators is to 
transform certain machine code instructions with certain rules for 
bypassing security products without the need for self-modifying code. 

These obfuscators analyze the instructions of the given binary and mutate 
them to some other instruction(s). The following lines contain a simple 
example. The original instruction loads the 0xDEAD value into the RCX 
register;

        lea rcx, [0xDEAD] ------+->  lea rcx, [1CE54]
                                +->  sub rcx, EFA7

The obfuscator program creates the above instruction sequence. After 
executing two of the newly generated instructions, the resulting value 
inside RCX will be the same. The assembled bytes of the newly generated 
sequence are different enough to bypass static detection rules. But there 
is one problem here. The final SUB instruction of the generated sequence 
modifies the condition flags. 

Depending on the code, this could affect the control flow of the program. 
To avoid such conditions, these obfuscator programs also add save/restore 
instructions to the generated sequence to preserve all the register values 
including the condition flags. So, the final output of the program will be;

        pushf ; --------------> Save condition flags to stack.
        lea rcx, [1CE54]
        sub rcx, EFA7
        popf  ; --------------> Restore condition flags from the stack.      

If you analyze the final instruction sequence, you will realize that such 
an order of instructions is very uncommon, and wouldn't be generated by 
a compiler under normal circumstances. This means that this method of 
obfuscating instructions can itself be detected by security products 
with static detection rules. 

The following simple Yara rule can be used for detecting all the 
sequences generated by the LEA transform gadget of the Alcatraz 
binary obfuscator.

        rule Alcatraz_LEA_Transform {
            strings:
                // pushf       > 66 9c
                // lea ?, [?]  > 48 8d ?? ?? ?? ?? ??
                // sub ?, ?    > 48 81 ?? ?? ?? ?? ??
                // popf        > 66 9d
                $lea_transform = { 66 9c 48 8d ?? ?? ?? ?? ?? 48 81 ?? ?? ?? ?? ?? 66 9d }
            condition:
                $lea_transform
        } 

This rule can easily be used by security products because it has a very 
low false-positive rate due to the unusual order of the generated 
instruction sequence. Another shortcoming of currently available machine 
code obfuscators is using specific transform logic for specific 
instructions. There are around 1503 instructions available for the 
current Intel x86 instruction set. Designing specific transform gadgets 
for each of these instructions can be considered challenging and 
time-consuming. One of the main goals of this study is to create 
math-based generic algorithms that can be applied to many instructions 
at once.

--[ 4 - Transforming Machine Code

To effectively transform x86 machine code, we need a deep understanding 
of x86 architecture and the instruction set. In the x86 architecture, 
instructions can take up to five operands. We can divide the types of 
operands into three main categories. An operand can either be a register, 
immediate, or memory. These operand types have their subcategories, but 
for the sake of simplicity, we can skip this for now. 

Depending on the mnemonic, an x86 instruction can have all the 
combinations of the mentioned operand types. As such, there are many ways 
to transform x86 instructions. All the compiler toolchains do this all the 
time during the optimization phase[9]. 

These optimizations can be as complex as reducing a long loop condition 
to a brief branch operation or re-encoding a single instruction to a much 
shorter sequence of bytes, effectively making the program smaller and 
faster. The following example shows that there are multiple ways to encode 
certain instructions in x86 architecture.

        add al,10h --------------------> \x04\x10
        add al,10h --------------------> \x80\xC0\x10
        
        adc al,0DCh -------------------> \x14\xDC
        adc al,0DCh -------------------> \x80\xD0\xDC

        sub al,0A0h -------------------> \x2C\xA0
        sub al,0A0h -------------------> \x80\xE8\xA0
        
        sub eax,19930520h -------------> \x2D\x20\x05\x93\x19
        sub eax,19930520h -------------> \x81\xE8\x20\x05\x93\x19
        
        sbb al,0Ch --------------------> \x1C\x0C
        sbb al,0Ch --------------------> \x80\xD8\x0C
        
        sbb rax,221133h ---------------> \x48\x1D\x33\x11\x22\x00
        sbb rax,221133h ---------------> \x48\x81\xD8\x33\x11\x22\x00

Each of the instruction pairs does the exact same thing, with different 
corresponding bytes. This is usually done automatically by the compiler 
toolchains. Unfortunately, this type of shortening can only be applied 
to certain instructions. Also, once you analyze the produced bytes, 
you'll see that only the beginning part (the opcode) of the bytes 
changes; the rest of the bytes, representing the operand values, stay 
exactly the same. This is not good in terms of detection, because most 
detection rules focus on the operand values. These are the reasons why 
we need to focus on more complex ways of de-optimization.  

Most modern compiler toolchains, such as LLVM, convert the code into 
intermediate representations (IR) for applying complex optimizations. 
IR languages make it very easy to manipulate the code and apply certain 
simplifications independent from the target platform. At first glance, 
using LLVM sounds very logical for achieving our objective in this study;
it already has various engines, libraries, and tooling built into the 
framework for code manipulation. Unfortunately, this is not the case. 

After getting into the endless rabbit hole of LLVM's inner workings, you 
realize that IR-based optimizations are leaving behind certain patterns 
in the code[10]. When the code is transformed into IR, whether from source 
code or binary lifting[11], you lose control of individual instructions. 
Because IR-based optimizations mainly focus on simplifying and shortening 
well-structured functions instead of raw pieces of code, it makes it hard 
to eradicate certain patterns. Maybe highly skilled LLVM wizards can hack 
their way around these limitations, but we will go with manual disassembly 
using the iced_x86[12] Rust disassembly library in this study. It will 
help us thoroughly analyze the binary code and give us enough control 
over the individual instructions.

Since our primary objective is to evade security products, while 
de-optimizing the instructions we also need to be sure that the 
generated instruction sequences are commonly produced by regular 
compiler toolchains. This way, our obfuscated code can blend in with 
the benign code, and rule-based detection will not be possible against 
our transform gadgets.

In order to determine how common the generated instructions are, we can 
write specific Yara rules for our transform gadgets, and run the rules on 
a large dataset. For this study, a ~300 GB dataset consisting of executable 
sections of various well-known benign EXE, ELF, SO, and DLL files has been 
curated. We will simply run our Yara rules on this dataset and check the 
false positive rate.

--[ 5 - Transform Gadgets

Now, we need a way of transforming individual instructions, while 
maintaining the overall functionality of the program. In order to 
achieve this, we will take advantage of basic math and number theory. 

Most instructions in the x86 instruction set can be mapped to equivalent 
mathematical operators. For example, the "ADD" instruction can be directly 
translated to the addition operator "+". The following table shows various 
translation examples:

        MOV, PUSH, POP, LEA ---> =
              CMP, SUB, SBB ---> -
                   ADD, ADC ---> +
                  IMUL, MUL ---> *
                  IDIV, DIV ---> /
                  TEST, AND ---> &
                         OR ---> |
                        XOR ---> ^
                        SHL ---> <<
                        SHR ---> >>
                        NOT ---> '

With this approach, we can easily represent basic x86 instructions as 
mathematical equations. For example, "MOV EAX, 0x01" can be represented 
as "x = 1". A bit more complex example could be;

        MOV ECX,8      ) -------------> z = 8       )
        SHL EAX,2      ) -------------> 4x          )
        SHL EBX,1      ) -------------> 2y          ) ---> ((4x+2y+8)**2)
        ADD EAX,EBX    ) -------------> 4x+2y       )
        ADD EAX,ECX    ) -------------> 4x+2y+8     )
        IMUL EAX,EAX   ) -------------> (4x+2y+8)^2 )

When dealing with code sequences that only contain operations of addition, 
subtraction, multiplication, and positive-integer powers of variables, 
formed expressions can be transformed using polynomial transformation 
tricks. Similar "Data-flow optimization" tricks are being used by compiler 
toolchains during code optimizations[9], but we can also leverage the same 
principles for infinitely expanding the expressions. In the case of this 
example, the above expression can be extended to:

        (16x^2 + 16xy + 64x + 4y^2 + 32y + 64)

When this expression is transformed back into assembly code, you'll see 
that multiple instructions are changed, new ones are added, and some 
disappear. The only problem for us is that some instructions stay exactly 
the same, which may still trigger detection. In order to prevent this, we 
need to use other mathematical methods on a more individual level. 
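The equivalence of the two forms is easy to sanity-check numerically; a small Python sketch (x and y standing in for the initial EAX and EBX values, as above):

```python
# Verify that the expanded polynomial computes the same value as the
# original sequence MOV ECX,8; SHL EAX,2; SHL EBX,1; ADD EAX,EBX;
# ADD EAX,ECX; IMUL EAX,EAX  ==>  ((4x + 2y + 8)**2)
def original(x, y):
    return (4 * x + 2 * y + 8) ** 2

def expanded(x, y):
    return 16 * x**2 + 16 * x * y + 64 * x + 4 * y**2 + 32 * y + 64

# Exhaustively compare over a small range of inputs.
for x in range(-50, 50):
    for y in range(-50, 50):
        assert original(x, y) == expanded(x, y)
```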

In the following sections, we'll analyze five different transform gadgets 
that will be targeting specific instruction groups.

--[ 5.1 - Arithmetic Partitioning

Our first transform gadget will target all the arithmetic instructions 
with an immediate type operand, such as MOV, ADD, SUB, PUSH, POP, etc. 

Consider the following example; "ADD EAX, 0x10" 

This simple ADD instruction can be considered the addition (+) operator 
in the expression "X + 16". This expression can be infinitely extended 
using the arithmetic partitioning method, such as:

        (X + 16) = (X + 5 - 4 + 2 + 13) 

When we encounter such instructions, we can simply randomize the 
immediate value and add an extra instruction for fixing it. Based on 
the randomly generated immediate value, we need to choose between the 
original mnemonic, or the arithmetic inverse of it. 

In order to keep the generated code under the radar, only one level of 
partitioning (additional fix instruction) will suffice. Applying many 
arithmetic operations to a single destination operand might create a 
recognizable pattern. Here are some other examples:

        mov edi,0C000008Eh  ---+->  mov edi,0C738EE04h
                               +->  sub edi,738ED76h
        
        add al,10h ------------+->  add al,0D8h
                               +->  sub al,0C8h
        
        sub esi,0A0h ----------+->  sub esi,5062F20Ch
                               +->  add esi,5062F16Ch

        push 0AABBh -----------+->  push 7F08C11Dh
                               +->  sub dword ptr [esp],7F081662h

Upon testing how frequent the generated code sequences are on our sample 
dataset, we see that ~38% of the compiler-generated sections contain such 
instruction sequences. This means that almost one of every three compiled 
binary files contains these instruction sequences, making them very hard 
to distinguish from benign code.
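The partitioning logic itself is only a few lines of code. The following Python sketch (hypothetical helper, modeling 32-bit wrap-around explicitly; the real tool operates on iced_x86 instruction objects) picks a random first immediate and derives the fix-up instruction:

```python
import random

MASK32 = 0xFFFFFFFF

def partition_add(imm):
    """Split `ADD r, imm` into two instructions with random immediates.

    Returns (first_imm, fix_mnemonic, fix_imm) such that applying the
    pair modulo 2**32 is equivalent to adding `imm`. Only one level of
    partitioning, as suggested in the article.
    """
    r = random.randrange(0, 1 << 32)
    diff = (imm - r) & MASK32
    # Choose ADD or its arithmetic inverse SUB for the fix-up instruction.
    if diff <= MASK32 // 2:
        return r, "add", diff            # r + diff == imm (mod 2**32)
    return r, "sub", (-diff) & MASK32    # r - (2**32 - diff) == imm

# Check: simulate the register effect of both forms.
imm = 0x10
first, mnem, fix = partition_add(imm)
x = 0xDEADBEEF
v = (x + first) & MASK32
v = (v + fix) & MASK32 if mnem == "add" else (v - fix) & MASK32
assert v == (x + imm) & MASK32
```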

--[ 5.2 - Logical Inverse

This transform gadget will target half of the logical operation 
instructions with an immediate operand such as AND, OR, or XOR. 

Consider the following example; "XOR R10, 0x10" This simple XOR 
instruction can be written as "X ^ 16". This expression can be 
transformed using the properties of the logical inverse, such as;

        (X ^ 16) = (X' ^ 16') = (X' ^ -17) 

Once we encounter such instructions, we can simply transform the 
instructions by taking the inverse of the immediate value and adding 
an additional NOT instruction for taking the inverse of the destination 
operand. The same logic can also be applied to other logical operators. 

"AND AL, 0x10" instructions can be expressed as "X & 16". Using the same 
logical inverse trick, we can transform this expression into;

        (X & 16) = (X' | 16')' = (X' | -17)'  

For the case of AND and OR mnemonics, the destination operand needs to 
be restored with an additional NOT instruction at the end. Here are some 
other examples: 

        xor r10d,49656E69h  ---+->  not r10d
                               +->  xor r10d,0B69A9196h

                               +->  not al
        and al,1 --------------+->  or al,0FEh
                               +->  not al

                               +->  not edx
        or edx,300h -----------+->  and edx,0FFFFFCFFh
                               +->  not edx

As mentioned earlier in this article, pattern-based detection rules 
written for detecting malicious "code" mostly target the immediate 
values on the instructions. So, using this simple logical inverse 
trick will sufficiently mutate the immediate value without creating 
any recognizable patterns. 

After testing the frequency of the generated code sequence, we see 
that ~10% of the compiler-generated sections contain such instruction 
sequences. This is high enough that any detection rule for this 
specific transform won't be used by AV vendors due to the potential 
for high false positives.
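These identities can be verified directly; a short Python sketch over 32-bit values (note that for XOR the two NOTs cancel, so no trailing NOT is needed, while AND/OR require one):

```python
# Sketch: verify the logical-inverse identities on 32-bit values.
MASK32 = 0xFFFFFFFF

def bnot(v):
    """Bitwise NOT on a 32-bit register value."""
    return ~v & MASK32

x, imm = 0x12345678, 0x49656E69

# XOR: the two NOTs cancel, so no trailing NOT is needed.
assert x ^ imm == bnot(x) ^ bnot(imm)

# AND/OR: De Morgan's laws, so the destination is NOT-ed back afterwards.
assert x & imm == bnot(bnot(x) | bnot(imm))
assert x | imm == bnot(bnot(x) & bnot(imm))

# The mutated immediate from the article's XOR example above.
assert bnot(0x49656E69) == 0xB69A9196
```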

--[ 5.3 - Logical Partitioning

This transform gadget will target the remaining half of the logical 
operation instructions with an immediate operand such as ROL, ROR, 
SHL, or SHR. In the case of shift instructions, we can split the 
shift operation into two parts. 

Consider the following example; "SHL AL, 0x05".

This instruction can be split into "SHL AL, 0x2" and "SHL AL, 0x3". The 
resulting AL value and the condition flags will always be the same. In 
the case of rotate instructions, there is a simpler way to mutate the 
immediate value. 

The destination operand of these logical operations is either a register, 
or a memory with a defined size. Based on the destination operand size, 
the rotate count can be changed accordingly. 

Consider the following example: "ROL AL, 0x01" 

This instruction will rotate the bits of the AL register once to the left. 
Since AL is an 8-bit register, the "ROL AL, 0x09" instruction will have 
the exact same effect. Rotate transforms are very effective for keeping the 
mutated code size low since we don't need extra instructions. 

Here are some other examples:

        shr rbx,10h -------------+->  shr rbx,8
                                 +->  shr rbx,8
        
        shl qword ptr [ecx],20h -+->  shl qword ptr [ecx],10h
                                 +->  shl qword ptr [ecx],10h
        
        ror eax,0Ah --------------->  ror eax,4Ah

        rol rcx,31h --------------->  rol rcx,0B1h

These transforms modify the condition flags the exact same way as the 
original instruction, and thus can be used safely without any additional 
save/restore instructions. Since the transformed code is very small, 
writing an effective Yara rule becomes quite hard. After testing the 
frequency of the generated code sequences, we see that ~59% of the 
compiler-generated sections contain such instruction sequences.
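A Python sketch of both tricks on an 8-bit operand (hypothetical helpers; the rotate count wrap models x86's reduction of the count modulo the operand size):

```python
# Sketch: shift-count splitting and rotate-count mutation, modeled on
# the 8-bit AL register.
MASK8 = 0xFF

def shl8(v, n):
    """SHL on an 8-bit value; shifted-out bits are discarded."""
    return (v << n) & MASK8

def rol8(v, n):
    """ROL on an 8-bit value; the count wraps at the operand size."""
    n %= 8
    return ((v << n) | (v >> (8 - n))) & MASK8

v = 0b10110101
# SHL AL,5  ==  SHL AL,2 ; SHL AL,3
assert shl8(v, 5) == shl8(shl8(v, 2), 3)
# ROL AL,1  ==  ROL AL,9  (count wraps at the 8-bit operand size)
assert rol8(v, 1) == rol8(v, 9)
```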

--[ 5.4 - Offset Mutation

This transform gadget will target all the instructions with a memory-type 
operand. For a better understanding of the memory operand type, let's 
deconstruct the memory addressing logic of the x86 instruction set. 

Any instruction with a memory operand needs to define a memory location 
represented inside square brackets. This form of representation may 
contain a segment prefix register, a base register, an index register 
with a positive scale vector, and a positive or negative displacement 
offset. Consider the following instruction:

        MOV ECX, CS:[EAX+EBX*8+0x100]
                 |   |   |   | |
                 |   |   |   | +---> Displacement Offset
                 |   |   |   +-----> Scale Vector
                 |   |   +---------> Index Register
                 |   +-------------> Base Register
                 +-----------------> Segment Register

A valid memory operand can contain any combination of these fields. If it 
only contains a large (the same size as the bitness) displacement offset, 
then it can be called an absolute address. Our Offset Mutation Transform 
gadget will specifically target memory operands with a base register. We 
will be using basic arithmetic partitioning tricks on the memory 
displacement value of the operand. 

The "MOV RAX, [RAX+0x10]" instruction moves 16 bytes from the [RAX+0x10] 
memory location onto itself. Such move operations are very common because 
of operations like referencing a pointer. For mutating the memory operand 
values, we can simply manipulate the contents of the RAX register. 

Adding a simple ADD/SUB instruction with RAX before the original 
instruction will enable us to mutate the displacement offset. 

Here are some examples:

        mov rax,[rax] -------+->  add rax,705EBC8Dh
                             +->  mov rax,[rax-705EBC8Dh]
        
        mov rax,[rax+10h] ---+->  sub rax,20DA86AAh
                             +->  mov rax,[rax+20DA86BAh]
        
        lea rcx,[rcx] -------+->  add rcx,0D5F14ECh
                             +->  lea rcx,[rcx-0D5F14ECh]

In each of these example cases, the destination operand is the base 
register inside the memory operand (pointer dereferencing). For the other 
cases, we need additional instructions at the end for preserving the 
base register contents. Here are some other examples:

                                    +->  add rax,4F037035h
        mov [rax],edi --------------+->  mov [rax-4F037035h],edi
                                    +->  sub rax,4F037035h

                                    +->  add rbx,34A92BDh
        mov rcx,[rbx+28h] ----------+->  mov rcx,[rbx-34A9295h]
                                    +->  sub rbx,34A92BDh
        
                                    +->  sub rbp,2841821Ch
        mov dword ptr [rbp+40h],1 --+->  mov dword ptr [rbp+2841825Ch],1
                                    +->  add rbp,2841821Ch
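The displacement arithmetic behind these examples can be checked with a quick Python sketch (hypothetical helper name `mutate`; 64-bit wrap-around is modeled with a mask):

```python
import random

# Sketch of the offset mutation arithmetic: bias the base register by a
# random delta, then adjust the displacement so the effective address
# is unchanged.
MASK64 = (1 << 64) - 1

def mutate(disp, delta):
    """Displacement to use after an `ADD base, delta` prologue."""
    return (disp - delta) & MASK64

base, disp = 0x7FFE12340000, 0x28
delta = random.randrange(1, 1 << 31)

ea_original = (base + disp) & MASK64
ea_mutated = ((base + delta) + mutate(disp, delta)) & MASK64
assert ea_mutated == ea_original

# The article's example: delta 0x34A92BD turns [rbx+28h] into [rbx-34A9295h].
assert (0x34A92BD - 0x28) == 0x34A9295
```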

The offset mutation transform can be applied to any instruction with a 
memory operand. Unfortunately, this transform may affect the condition 
flags. 

In such a scenario, instead of adding extra save/restore instructions, 
we can check if the manipulated condition flags are actually affecting 
the control flow of the application by tracing the next instructions. 

If the manipulated condition flags are being overwritten by another 
instruction, we can safely use this transform. Due to the massive scope 
of this transform gadget, it becomes quite hard to write an effective 
Yara rule. We can easily consider the instruction mutated by this 
transform to be common, and undetectable.

--[ 5.5 - Register Swapping

This transform gadget will target all the instructions with a register-
type operand, which can be considered a very large scope. This may be 
the most basic but still effective transformation in our arsenal. 

After the immediate and memory operand types, the register is the third 
most common operand type that is being targeted by detection rules. Our 
goal is to replace the register used in an instruction with another,
same-sized register using the XCHG instruction. 

Consider the "XOR RAX,0x10" instruction. We can change the RAX register 
with any other 64-bit register by exchanging the value before and after 
the original instruction. Here are some examples:

                            +->  xchg rax,rcx
        xor rax,10h --------+->  xor rcx,10h
                            +->  xchg rax,rcx
        
                            +->  xchg rbx,rsi
        and rbx,31h --------+->  and rsi,31h
                            +->  xchg rbx,rsi

                            +->  xchg rdx,rdi
        mov rdx,rax --------+->  mov rdi,rax
                            +->  xchg rdx,rdi

This transform does not modify any of the condition flags, and can be 
used safely without any additional save/restore instructions. 

The generated sequence of instructions may seem uncommon, but due to the 
scope of this transform and the small size of the exchange instructions, 
the generated sequence of bytes is found to be very frequent in our sample 
data set. After testing the frequency of the generated code sequences, we 
see that ~92% of the compiler-generated sections contain such instruction 
sequences.
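The register-swap wrapper is easiest to see as a value shuffle; a minimal Python sketch modeling register contents as a dictionary:

```python
# Sketch: the XCHG wrapper leaves every register holding its original
# value while the operation itself runs on a substitute register.
regs = {"rax": 0xDEADBEEF, "rcx": 0x1337}

def xchg(r, a, b):
    r[a], r[b] = r[b], r[a]

expected = dict(regs)
expected["rax"] ^= 0x10              # original: XOR RAX,0x10

xchg(regs, "rax", "rcx")
regs["rcx"] ^= 0x10                  # rewritten: XOR RCX,0x10
xchg(regs, "rax", "rcx")

assert regs == expected              # identical end state, different bytes
```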

--[ 6 - Address Realignments

After using any of these transform gadgets, an obvious outcome will be the 
increased code size due to the additional number of instructions. This 
will cause misalignments in the branch operations and relative memory 
addresses. While de-optimizing each instruction, we need to be aware 
of how much the original instruction size is increased so that we can 
calculate a delta value for aligning each of the branch operations. 

This may sound complex, simply because it is :) Handling such address 
calculations is easy when you have the source code of the program. But 
if you only have an already-compiled binary, address alignment becomes 
a bit tricky. We will not dive into the line-by-line implementation of 
post-de-optimization address realignment; the only thing to keep in mind 
is double-checking branch instructions after the alignment. 

There is a case where modified branch instructions (conditional jumps) 
increase in size if they are modified to branch into much further 
addresses. This specific issue causes a recursive misalignment and 
requires a realignment after each fix on branch targets.
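A toy Python sketch of the fixpoint loop described above (hypothetical model: only short-to-long promotion of conditional branches is handled, with made-up 2-byte rel8 and 6-byte rel32 encodings):

```python
# Toy model of post-transform branch realignment. Branches start with
# the short (rel8) encoding and are promoted to the long (rel32)
# encoding when their displacement no longer fits in a signed byte.
# A promotion grows the code, which can push *other* branch targets
# out of range, so we iterate until a fixpoint is reached.
SHORT, LONG = 2, 6

def realign(sizes, branches):
    """sizes: per-instruction byte sizes (mutated in place).
    branches: {instruction index: target instruction index}."""
    changed = True
    while changed:
        changed = False
        addrs, off = [], 0
        for s in sizes:              # recompute addresses from current sizes
            addrs.append(off)
            off += s
        addrs.append(off)            # one-past-the-end, for end targets
        for i, t in branches.items():
            rel = addrs[t] - (addrs[i] + sizes[i])
            if sizes[i] == SHORT and not (-128 <= rel <= 127):
                sizes[i] = LONG      # promotion; may invalidate other branches
                changed = True
    return sizes

# A branch jumping over 200 one-byte instructions must be promoted...
far = realign([SHORT] + [1] * 200 + [1], {0: 201})
assert far[0] == LONG
# ...while a branch over 10 bytes keeps its short encoding.
near = realign([SHORT] + [1] * 10 + [1], {0: 11})
assert near[0] == SHORT
```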

--[ 7 - Known Limitations

There are some known limitations while using these transform gadgets. 

The first and most obvious one is the limited scope of supported 
instruction types. There are some instruction types that cannot be 
transformed with the mentioned gadgets. Instructions with no operands 
are one of them. Such instructions are very hard to transform since 
they do not have any operands to mutate. The only thing we can do is 
relocate them somewhere else in the code. 

This is not a very big problem because the frequency of unsupported 
instructions is very low. In order to find out how frequently an 
instruction is being generated by compilers, we can calculate frequency 
statistics on our previously mentioned sample data set. The following 
list contains the frequency statistics of each x86 instruction.

        1.  %33.5 MOV
        2.  %9.2  JCC (All conditional jumps)
        3.  %6.4  CALL
        4.  %5.5  LEA
        5.  %4.9  CMP
        6.  %3.9  ADD
        7.  %3.7  TEST
        8.  %3.5  JMP
        9.  %3.3  PUSH
        10. %3.0  POP
        11. %2.7  NOP
        12. %2.2  XOR
        13. %1.7  SUB
        14. %1.5  INT3
        15. %1.1  MOVZX
        16. %1.0  AND
        17. %1.0  RET
        18. %0.6  SHL
        19. %0.5  OR
        20. %0.5  SHR
        -   %11.3 <OTHER>

Similar instruction frequency studies[13] on the x86 instruction set 
have been conducted on different sample sets, and their results closely 
parallel the list above. The instruction frequency list 
shows that only around 5% of the instructions are not supported by our 
transform gadgets. 
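Such per-mnemonic statistics are straightforward to tally once a disassembly listing is available; a small Python sketch (hypothetical input format, folding all conditional jumps into a single JCC bucket as in the table above):

```python
from collections import Counter

def mnemonic_stats(listing):
    """Tally mnemonic frequency (percent) from disassembly lines such as
    'mov eax, 1'. All conditional jumps are folded into one JCC bucket."""
    jcc = {"je", "jne", "jz", "jnz", "jg", "jge", "jl", "jle", "ja", "jb"}
    counts = Counter()
    for line in listing:
        m = line.split()[0].lower()
        counts["jcc" if m in jcc else m] += 1
    total = sum(counts.values())
    return {m: 100.0 * c / total for m, c in counts.most_common()}

stats = mnemonic_stats(["mov eax, 1", "mov ebx, 2", "je 0x10", "ret"])
assert stats["mov"] == 50.0
```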

As can be seen on the list, the most commonly used instructions are 
simple load/store, arithmetic, logic, and branch instructions. This 
means that, if implemented properly, the previously explained transform 
gadgets are able to transform ~95% of the instructions in compiler-
generated programs. 

This can be considered more than enough to bypass rule-based detection 
mechanisms. Another known limitation is self-modifying code. If the code 
is overwriting itself, our transform gadgets will probably break the code. 

Some code may also use branch instructions with dynamically calculated 
branch targets; in such cases, address realignment becomes impossible 
without code emulation. Luckily for us, such code is not very commonly 
produced by compiler toolchains. Another rare condition is 
overlapping instructions. Under certain circumstances, compiler toolchains 
generate instructions that can be executed differently when branched into 
the middle of the instruction. Consider the following example:

        0000: B8 00 03 C1 BB  mov eax, 0xBBC10300
        0005: B9 00 00 00 05  mov ecx, 0x05000000
        000A: 03 C1           add eax, ecx
        000C: EB F4           jmp $-10
        000E: 03 C3           add eax, ebx
        0010: C3              ret

The JMP instruction will land on the third byte of the five-byte MOV 
instruction at address 0000. It will create a completely new instruction 
stream with a new alignment. This situation is very hard to detect without 
some code emulation. 

Another thing to consider is code with data inside. This is also a very 
rare condition, but in certain circumstances, code can contain strings of 
data. The most common scenario for such a condition is string operations 
in shellcodes. It is very hard to differentiate between code and data when 
there are no structured file formats or code sections. 

Under such circumstances, our de-optimizer tool may treat data as code and 
corrupt it by trying to apply transforms; but this can be avoided to some 
extent during disassembly. Instead of using linear sweep[14] disassembly, 
control flow tracing with a depth-first search[14] approach can be used 
to skip data bytes inside the code.
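The difference between the two strategies can be sketched in a few lines of Python (a toy control-flow map instead of a real disassembler; `trace_code` is a hypothetical helper):

```python
# Sketch: recursive-descent (depth-first) traversal over a toy control
# flow map, marking only reachable offsets as code so that embedded data
# bytes are never transformed. `cfg` maps each instruction offset to
# (size, list_of_successor_offsets) -- a stand-in for what a real
# disassembler such as iced_x86 would provide.
def trace_code(cfg, entry):
    code, stack = set(), [entry]
    while stack:
        off = stack.pop()
        if off in code or off not in cfg:
            continue
        code.add(off)
        stack.extend(cfg[off][1])    # follow branch and fall-through edges
    return code

# Offset 0: jmp 10 (skips an 8-byte inline string at offsets 2..9);
# offset 10: ret. A linear sweep would misdecode the string bytes.
cfg = {0: (2, [10]), 10: (1, [])}
assert trace_code(cfg, 0) == {0, 10}   # the data bytes are skipped
```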

--[ 8 - Conclusion

In this article, we have underlined real-life evasion challenges commonly 
encountered by security professionals, and introduced several alternative 
ways of solving these challenges by de-optimizing individual x86 
instructions. The known limitations of these methods are proven not to 
be a critical obstacle to the objective of this study. 

These de-optimization methods have been found to be highly effective for 
eliminating any pattern in machine code. A POC de-optimizer tool[3] has 
been developed during this study to test the effectiveness of these 
de-optimization methods. The tests were conducted by de-optimizing all 
the available Metasploit[15] shellcodes and checking the detection rates 
via multiple memory-based scanners and online analysis platforms. 

The test results show that these de-optimization methods are highly 
effective against pattern-based detection while avoiding the use of 
self-modifying code (RWE memory). Of course, as in every study 
on evasion, the real results will emerge over time after the release of 
this open-source POC tool.

--[ 9 - References
    - [1] https://github.com/forrest-orr/moneta
    - [2] https://github.com/hasherezade/pe-sieve
    - [3] https://github.com/EgeBalci/deoptimizer
    - [4] https://github.com/hasherezade/pe-sieve/blob/603ea39612d7eb81545734c63dd1b4e7a36fd729/params_info/pe_sieve_params_info.cpp#L179
    - [5] https://www.mandiant.com/resources/blog/shikata-ga-nai-encoder-still-going-strong
    - [6] https://github.com/EgeBalci/sgn
    - [7] https://github.com/zeroSteiner/crimson-forge
    - [8] https://github.com/weak1337/Alcatraz
    - [9] https://en.wikipedia.org/wiki/Optimizing_compiler
    - [10] https://monkbai.github.io/files/sp-22.pdf
    - [11] https://github.com/lifting-bits/mcsema
    - [12] https://docs.rs/iced-x86/latest/iced_x86/
    - [13] https://www.strchr.com/x86_machine_code_statistics
    - [14] http://infoscience.epfl.ch/record/167546/files/thesis.pdf
    - [15] https://github.com/rapid7/metasploit-framework

|=[ EOF ]=---------------------------------------------------------------=|