Current issue : #71 | Release date : 2024-08-19 | Editor : Phrack Staff
                             ==Phrack Inc.==

                Volume 0x10, Issue 0x47, Phile #0x0F of 0x11

|=-----------------------------------------------------------------------=|
|=------------------=[ Evasion by De-optimization ]=---------------------=|
|=-----------------------------------------------------------------------=|
|=---------------------------=[ Ege BALCI ]=-----------------------------=|
|=-----------------------------------------------------------------------=|

--[ Table of Contents

1 - Intro
2 - Current A.V. Evasion Challenges
3 - Prior Work
4 - Transforming Machine Code
5 - Transform Gadgets
    5.1 - Arithmetic Partitioning
    5.2 - Logical Inverse
    5.3 - Logical Partitioning
    5.4 - Offset Mutation
    5.5 - Register Swapping
6 - Address Realignments
7 - Known Limitations
8 - Conclusion
9 - References

--[ 1 - Intro

Bypassing security products is a very important part of many offensive 
security engagements. The majority of the current AV evasion techniques 
used in various evasion tools, such as packers, encoders, and obfuscators,
are heavily dependent on the use of self-modifying code running on RWE 
memory regions. Considering the current state of security products, such 
evasion attempts are easily detected by memory analysis tools such as 
Moneta[1] and Pe-sieve[2]. This study introduces a new approach to code 
obfuscation with the use of machine code de-optimization. 

In this study, we will delve into how we can use certain mathematical 
approaches, such as arithmetic partitioning, logical inverse, polynomial
transformation, and logical partitioning, to design a generic formula 
that can be applied to many machine code instructions for transforming / 
mutating or de-optimizing the bytes of the target program without 
creating any recognizable patterns. Another objective of this study is 
to give a step-by-step guide for implementing a machine code de-optimizer 
tool that uses these methods to bypass pattern-based detection. 

Such tools have already been implemented, and will be shared[3] as an 
open-source project with references to this article.

--[ 2 - Current A.V. Evasion Challenges

The term "security product" represents a wide variety of software 
nowadays. The inner workings of such software may differ, but the main 
purpose is always to detect some type of malicious asset by recognizing 
certain patterns. Such indicators can originate from a piece of code, 
data, a behavior log, or even from the entropy of the program. The main 
goal of this study is to introduce a new way of obfuscating malicious 
code. So, we will only focus on evading malicious CODE patterns.

Arguably, the most popular way of hiding malicious code is by using 
encoders. Encoding a piece of machine code can be considered the most 
primitive way of eliminating malicious code patterns. Because of the 
current security standards, most encoders are not very effective at 
bypassing security products on their own. Instead, they're being used 
in more complex evasion software such as packers, crypters, obfuscators, 
command-and-control, and exploitation frameworks. Most encoders work by 
encoding the malicious code by applying some arithmetic or logical 
operations to the bytes. After such transformation, encoder software 
needs to add a decoding routine at the beginning of the encoded payload 
to fix the bytes before executing the actual malicious code.

                       +-----------+
                       |  decoder  |  <--- Decodes the original payload.
  +-----------+        +-----------+
  |           |        |           |
  | malicious |        |           |
  |   code    |  ===>  |  enc-code |
  |           |        |           |
  |           |        |           |
  +-----------+        +-----------+

There are three critical issues with this approach. The first issue is 
the effectiveness of the encoding algorithm. If the malicious code is 
being encoded with just a single-byte logical/arithmetic operation, the 
security products can also detect the encoded form of the malicious 
code.[4] In order to make the malicious code unrecognizable, the encoder 
needs to use more complex encoding algorithms, such as primitive 
cryptographic ciphers or at least multi-byte encoding schemes. 
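As a quick illustration of why single-byte encodings fail (a hypothetical Python sketch, not taken from any particular product): since XOR with one key byte is trivially reversible, a scanner that knows a plaintext pattern of the payload can simply brute-force all 256 keys.

```python
# Sketch: defeating a single-byte XOR encoder by brute force.
def xor_encode(payload, key):
    return bytes(b ^ key for b in payload)

def find_xor_key(blob, known_pattern):
    """Try all 256 keys; return the key if the pattern appears."""
    for key in range(256):
        if known_pattern in xor_encode(blob, key):  # XOR is its own inverse
            return key
    return None

payload = b"\xfc\x48\x83\xe4\xf0\xe8"   # a well-known shellcode prologue
encoded = xor_encode(payload, 0x42)
assert find_xor_key(encoded, b"\x48\x83\xe4\xf0") == 0x42
```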

Naturally, such complex encoding algorithms require larger decoder 
routines. This brings us to the second critical issue, which is the 
visibility of the decoder routine. Most security products can easily 
detect the existence of such code at the beginning of an encoded blob by 
using static detection rules[5]. Some new-generation encoders try to 
obfuscate and hide the decoder code by also encoding the decoder routine, 
reducing its size, and obfuscating it with garbage instructions.[6] Such 
designs produce promising results, but they do not solve the final and 
most crucial issue. 

In the end, all of the encoder software produces a piece of self-modifying
code. This means the resulting payload must be running inside a 
Read-Write-Execute (RWE) memory region. Such a scenario can easily be 
considered a suspicious indicator.

--[ 3 - Prior Work

To solve the aforementioned issues, security researchers created several 
machine code obfuscators.[7][8] The purpose of these obfuscators is to 
transform certain machine code instructions with certain rules for 
bypassing security products without the need for self-modifying code. 

These obfuscators analyze the instructions of the given binary and mutate 
them to some other instruction(s). The following lines contain a simple 
example. The original instruction loads the 0xDEAD value into the RCX 
register;

        lea rcx, [0xDEAD] ------+->  lea rcx, [1CE54]
                                +->  sub rcx, EFA7

The obfuscator program creates the above instruction sequence. After 
executing two of the newly generated instructions, the resulting value 
inside RCX will be the same. The assembled bytes of the newly generated 
sequence are different enough to bypass static detection rules. But there 
is one problem here. The final SUB instruction of the generated sequence 
modifies the condition flags. 

Depending on the code, this could affect the control flow of the program. 
To avoid such conditions, these obfuscator programs also add save/restore 
instructions to the generated sequence to preserve all the register values 
including the condition flags. So, the final output of the program will be;

        pushf ; --------------> Save condition flags to stack.
        lea rcx, [1CE54]
        sub rcx, EFA7
        popf  ; --------------> Restore condition flags from the stack.      

If you analyze the final instruction sequence, you will realize that such 
an order of instructions is very uncommon, and wouldn't be generated by 
a compiler under normal circumstances. This means that this method of 
obfuscating instructions can itself be detected by security products 
with static detection rules. 

The following simple Yara rule can be used for detecting all the 
sequences generated by the LEA transform gadget of the Alcatraz 
binary obfuscator.

        rule Alcatraz_LEA_Transform {
            strings:
                // pushf       > 66 9c
                // lea ?, [?]  > 48 8d ?? ?? ?? ?? ??
                // sub ?, ?    > 48 81 ?? ?? ?? ?? ??
                // popf        > 66 9d
                $lea_transform = { 66 9c 48 8d ?? ?? ?? ?? ?? 48 81 ?? ?? ?? ?? ?? 66 9d }
            condition:
                $lea_transform
        } 

This rule can easily be used by security products because it has a very 
low false-positive rate due to the unusual order of the generated 
instruction sequence. Another shortcoming of currently available machine 
code obfuscators is using specific transform logic for specific 
instructions. There are around 1503 instructions available for the 
current Intel x86 instruction set. Designing specific transform gadgets 
for each of these instructions can be considered challenging and 
time-consuming. One of the main goals of this study is to create 
math-based generic algorithms that can be applied to many instructions 
at once.

--[ 4 - Transforming Machine Code

To effectively transform x86 machine code, we need a deep understanding 
of x86 architecture and the instruction set. In the x86 architecture, 
instructions can take up to five operands. We can divide the types of 
operands into three main categories. An operand can either be a register, 
immediate, or memory. These operand types have their subcategories, but 
for the sake of simplicity, we can skip this for now. 

Depending on the mnemonic, an x86 instruction can have all the 
combinations of the mentioned operand types. As such, there are many ways 
to transform x86 instructions. All the compiler toolchains do this all the 
time during the optimization phase[9]. 

These optimizations can be as complex as reducing a long loop condition 
to a brief branch operation or re-encoding a single instruction to a much 
shorter sequence of bytes, effectively making the program smaller and 
faster. The following example shows that there are multiple ways to encode 
certain instructions in x86 architecture.

        add al,10h --------------------> \x04\x10
        add al,10h --------------------> \x80\xC0\x10
        
        adc al,0DCh -------------------> \x14\xDC
        adc al,0DCh -------------------> \x80\xD0\xDC

        sub al,0A0h -------------------> \x2C\xA0
        sub al,0A0h -------------------> \x80\xE8\xA0
        
        sub eax,19930520h -------------> \x2D\x20\x05\x93\x19
        sub eax,19930520h -------------> \x81\xE8\x20\x05\x93\x19
        
        sbb al,0Ch --------------------> \x1C\x0C
        sbb al,0Ch --------------------> \x80\xD8\x0C
        
        sbb rax,221133h ---------------> \x48\x1D\x33\x11\x22\x00
        sbb rax,221133h ---------------> \x48\x81\xD8\x33\x11\x22\x00

Each of the instruction pairs does the exact same thing, with different 
corresponding bytes. This is usually done automatically by the compiler 
toolchains. Unfortunately, this type of shortening can only be applied 
to certain instructions. Also, once you analyze the produced bytes, 
you'll see that only the beginning part (the opcode) of the bytes 
changes; the rest of the bytes, representing the operand values, stay 
exactly the same. This is not good in terms of detection, because most 
detection rules focus on the operand values. These are the reasons why 
we need to focus on more complex ways of de-optimization.  

Most modern compiler toolchains, such as LLVM, convert the code into 
intermediate representations (IR) for applying complex optimizations. 
IR languages make it very easy to manipulate the code and apply certain 
simplifications independent from the target platform. At first glance, 
using LLVM sounds very logical for achieving our objective in this study;
it already has various engines, libraries, and tooling built into the 
framework for code manipulation. Unfortunately, this is not the case. 

After getting into the endless rabbit hole of LLVM's inner workings, you 
realize that IR-based optimizations are leaving behind certain patterns 
in the code[10]. When the code is transformed into IR, whether from source 
code or binary lifting[11], you lose control of individual instructions. 
Because IR-based optimizations mainly focus on simplifying and shortening 
well-structured functions instead of raw pieces of code, it makes it hard 
to eradicate certain patterns. Maybe highly skilled LLVM wizards can hack 
their way around these limitations, but we will go with manual disassembly 
using the iced_x86[12] Rust disassembly library in this study. It will 
help us thoroughly analyze the binary code and give us enough control 
over the individual instructions.

Since our primary objective is to evade security products, while 
de-optimizing the instructions we also need to be sure that the 
generated instruction sequences are commonly produced by regular 
compiler toolchains. This way, our obfuscated code can blend in with 
the benign code, and rule-based detection will not be possible against 
our transform gadgets.

In order to determine how common the generated instructions are, we can 
write specific Yara rules for our transform gadgets, and run the rules on 
a large dataset. For this study, a ~300 GB dataset consisting of executable 
sections of various well-known benign EXE, ELF, SO, and DLL files has been 
curated. We will simply run our Yara rules on this dataset and check the 
false positive rate.

--[ 5 - Transform Gadgets

Now, we need a way of transforming individual instructions, while 
maintaining the overall functionality of the program. In order to 
achieve this, we will take advantage of basic math and number theory. 

Most instructions in the x86 instruction set can be mapped to equivalent 
mathematical operators. For example, the "ADD" instruction can be directly 
translated to the addition operator "+". The following table shows various 
translation examples:

        MOV, PUSH, POP, LEA ---> =
              CMP, SUB, SBB ---> -
                   ADD, ADC ---> +
                  IMUL, MUL ---> *
                  IDIV, DIV ---> /
                  TEST, AND ---> &
                         OR ---> |
                        XOR ---> ^
                        SHL ---> <<
                        SHR ---> >>
                        NOT ---> '

With this approach, we can easily represent basic x86 instructions as 
mathematical equations. For example, "MOV EAX, 0x01" can be represented 
as "x = 1". A bit more complex example could be;

        MOV ECX,8      ) -------------> z = 8       )
        SHL EAX,2      ) -------------> 4x          )
        SHL EBX,1      ) -------------> 2y          ) ---> ((4x+2y+8)**2)
        ADD EAX,EBX    ) -------------> 4x+2y       )
        ADD EAX,ECX    ) -------------> 4x+2y+8     )
        IMUL EAX,EAX   ) -------------> (4x+2y+8)^2 )

When dealing with code sequences that only contain operations of addition, 
subtraction, multiplication, and positive-integer powers of variables, 
formed expressions can be transformed using polynomial transformation 
tricks. Similar "Data-flow optimization" tricks are being used by compiler 
toolchains during code optimizations[9], but we can also leverage the same 
principles for infinitely expanding the expressions. In the case of this 
example, the above expression can be extended to:

        (16x^2 + 16xy + 64x + 4y^2 + 32y + 64)

When this expression is transformed back into assembly code, you'll see 
that multiple instructions are changed, new ones are added, and some 
disappear. The only problem for us is that some instructions stay exactly 
the same, which may still trigger detection. In order to prevent this, we 
need to use other mathematical methods on a more individual level. 
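The equivalence of the two forms is easy to sanity-check numerically; a small Python sketch (x and y standing in for the initial EAX and EBX values, as above):

```python
# Verify that the expanded polynomial computes the same value as the
# original sequence MOV ECX,8; SHL EAX,2; SHL EBX,1; ADD EAX,EBX;
# ADD EAX,ECX; IMUL EAX,EAX  ==>  ((4x + 2y + 8)**2)
def original(x, y):
    return (4 * x + 2 * y + 8) ** 2

def expanded(x, y):
    return 16 * x**2 + 16 * x * y + 64 * x + 4 * y**2 + 32 * y + 64

# Exhaustively compare over a small range of inputs.
for x in range(-50, 50):
    for y in range(-50, 50):
        assert original(x, y) == expanded(x, y)
```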

In the following sections, we'll analyze five different transform gadgets 
that will be targeting specific instruction groups.

--[ 5.1 - Arithmetic Partitioning

Our first transform gadget will target all the arithmetic instructions 
with an immediate type operand, such as MOV, ADD, SUB, PUSH, POP, etc. 

Consider the following example; "ADD EAX, 0x10" 

This simple ADD instruction can be considered the addition (+) operator 
in the expression "X + 16". This expression can be infinitely extended 
using the arithmetic partitioning method, such as:

        (X + 16) = (X + 5 - 4 + 2 + 13) 

When we encounter such instructions, we can simply randomize the 
immediate value and add an extra instruction for fixing it. Based on 
the randomly generated immediate value, we need to choose between the 
original mnemonic, or the arithmetic inverse of it. 

In order to keep the generated code under the radar, only one level of 
partitioning (additional fix instruction) will suffice. Applying many 
arithmetic operations to a single destination operand might create a 
recognizable pattern. Here are some other examples:

        mov edi,0C000008Eh  ---+->  mov edi,0C738EE04h
                               +->  sub edi,738ED76h
        
        add al,10h ------------+->  add al,0D8h
                               +->  sub al,0C8h
        
        sub esi,0A0h ----------+->  sub esi,5062F20Ch
                               +->  add esi,5062F16Ch

        push 0AABBh -----------+->  push 7F08C11Dh
                               +->  sub dword ptr [esp],7F081662h

Upon testing how frequent the generated code sequences are on our sample 
dataset, we see that ~38% of the compiler-generated sections contain such 
instruction sequences. This means that almost one of every three compiled 
binary files contains these instruction sequences, making them very hard 
to distinguish from benign code.
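The partitioning logic itself is only a few lines of code. The following Python sketch (hypothetical helper, modeling 32-bit wrap-around explicitly; the real tool operates on iced_x86 instruction objects) picks a random first immediate and derives the fix-up instruction:

```python
import random

MASK32 = 0xFFFFFFFF

def partition_add(imm):
    """Split `ADD r, imm` into two instructions with random immediates.

    Returns (first_imm, fix_mnemonic, fix_imm) such that applying the
    pair modulo 2**32 is equivalent to adding `imm`. Only one level of
    partitioning, as suggested in the article.
    """
    r = random.randrange(0, 1 << 32)
    diff = (imm - r) & MASK32
    # Choose ADD or its arithmetic inverse SUB for the fix-up instruction.
    if diff <= MASK32 // 2:
        return r, "add", diff            # r + diff == imm (mod 2**32)
    return r, "sub", (-diff) & MASK32    # r - (2**32 - diff) == imm

# Check: simulate the register effect of both forms.
imm = 0x10
first, mnem, fix = partition_add(imm)
x = 0xDEADBEEF
v = (x + first) & MASK32
v = (v + fix) & MASK32 if mnem == "add" else (v - fix) & MASK32
assert v == (x + imm) & MASK32
```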

--[ 5.2 - Logical Inverse

This transform gadget will target half of the logical operation 
instructions with an immediate operand such as AND, OR, or XOR. 

Consider the following example; "XOR R10, 0x10" This simple XOR 
instruction can be written as "X ^ 16". This expression can be 
transformed using the properties of the logical inverse, such as;

        (X ^ 16) = (X' ^ 16') = (X' ^ -17) 

Once we encounter such instructions, we can simply transform the 
instructions by taking the inverse of the immediate value and adding 
an additional NOT instruction for taking the inverse of the destination 
operand. The same logic can also be applied to other logical operators. 

"AND AL, 0x10" instructions can be expressed as "X & 16". Using the same 
logical inverse trick, we can transform this expression into;

        (X & 16) = (X' | 16')' = (X' | -17)'  

For the case of AND and OR mnemonics, the destination operand needs to 
be restored with an additional NOT instruction at the end. Here are some 
other examples: 

        xor r10d,49656E69h  ---+->  not r10d
                               +->  xor r10d,0B69A9196h

                               +->  not al
        and al,1 --------------+->  or al,0FEh
                               +->  not al

                               +->  not edx
        or edx,300h -----------+->  and edx,0FFFFFCFFh
                               +->  not edx

As mentioned earlier in this article, pattern-based detection rules 
written for detecting malicious "code" mostly target the immediate 
values on the instructions. So, using this simple logical inverse 
trick will sufficiently mutate the immediate value without creating 
any recognizable patterns. 

After testing the frequency of the generated code sequence, we see 
that ~10% of the compiler-generated sections contain such instruction 
sequences. This is high enough that any detection rule for this 
specific transform won't be used by AV vendors due to the potential 
for high false positives.
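These identities can be verified directly; a short Python sketch over 32-bit values (note that for XOR the two NOTs cancel, so no trailing NOT is needed, while AND/OR require one):

```python
# Sketch: verify the logical-inverse identities on 32-bit values.
MASK32 = 0xFFFFFFFF

def bnot(v):
    """Bitwise NOT on a 32-bit register value."""
    return ~v & MASK32

x, imm = 0x12345678, 0x49656E69

# XOR: the two NOTs cancel, so no trailing NOT is needed.
assert x ^ imm == bnot(x) ^ bnot(imm)

# AND/OR: De Morgan's laws, so the destination is NOT-ed back afterwards.
assert x & imm == bnot(bnot(x) | bnot(imm))
assert x | imm == bnot(bnot(x) & bnot(imm))

# The mutated immediate from the article's XOR example above.
assert bnot(0x49656E69) == 0xB69A9196
```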

--[ 5.3 - Logical Partitioning

This transform gadget will target the remaining half of the logical 
operation instructions with an immediate operand such as ROL, ROR, 
SHL, or SHR. In the case of shift instructions, we can split the 
shift operation into two parts. 

Consider the following example; "SHL AL, 0x05".

This instruction can be split into "SHL AL, 0x2" and "SHL AL, 0x3". The 
resulting AL value and the condition flags will always be the same. In 
the case of rotate instructions, there is a simpler way to mutate the 
immediate value. 

The destination operand of these logical operations is either a register, 
or a memory with a defined size. Based on the destination operand size, 
the rotate count can be changed accordingly. 

Consider the following example: "ROL AL, 0x01" 

This instruction will rotate the bits of the AL register once to the left. 
Since AL is an 8-bit register, the "ROL AL, 0x09" instruction will have 
the exact same effect. Rotate transforms are very effective for keeping the 
mutated code size low since we don't need extra instructions. 

Here are some other examples:

        shr rbx,10h -------------+->  shr rbx,8
                                 +->  shr rbx,8
        
        shl qword ptr [ecx],20h -+->  shl qword ptr [ecx],10h
                                 +->  shl qword ptr [ecx],10h
        
        ror eax,0Ah --------------->  ror eax,4Ah

        rol rcx,31h --------------->  rol rcx,0B1h

These transforms modify the condition flags the exact same way as the 
original instruction, and thus can be used safely without any additional 
save/restore instructions. Since the transformed code is very small, 
writing an effective Yara rule becomes quite hard. After testing the 
frequency of the generated code sequences, we see that ~59% of the 
compiler-generated sections contain such instruction sequences.
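A Python sketch of both tricks on an 8-bit operand (hypothetical helpers; the rotate count wrap models x86's reduction of the count modulo the operand size):

```python
# Sketch: shift-count splitting and rotate-count mutation, modeled on
# the 8-bit AL register.
MASK8 = 0xFF

def shl8(v, n):
    """SHL on an 8-bit value; shifted-out bits are discarded."""
    return (v << n) & MASK8

def rol8(v, n):
    """ROL on an 8-bit value; the count wraps at the operand size."""
    n %= 8
    return ((v << n) | (v >> (8 - n))) & MASK8

v = 0b10110101
# SHL AL,5  ==  SHL AL,2 ; SHL AL,3
assert shl8(v, 5) == shl8(shl8(v, 2), 3)
# ROL AL,1  ==  ROL AL,9  (count wraps at the 8-bit operand size)
assert rol8(v, 1) == rol8(v, 9)
```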

--[ 5.4 - Offset Mutation

This transform gadget will target all the instructions with a memory-type 
operand. For a better understanding of the memory operand type, let's 
deconstruct the memory addressing logic of the x86 instruction set. 

Any instruction with a memory operand needs to define a memory location 
represented inside square brackets. This form of representation may 
contain a segment prefix register, a base register, an index register 
with a positive scale vector, and a positive or negative displacement 
offset. Consider the following instruction:

        MOV ECX, CS:[EAX+EBX*8+0x100]
                 |   |   |   | |
                 |   |   |   | +---> Displacement Offset
                 |   |   |   +-----> Scale Vector
                 |   |   +---------> Index Register
                 |   +-------------> Base Register
                 +-----------------> Segment Register

A valid memory operand can contain any combination of these fields. If it 
only contains a large (the same size as the bitness) displacement offset, 
then it can be called an absolute address. Our Offset Mutation Transform 
gadget will specifically target memory operands with a base register. We 
will be using basic arithmetic partitioning tricks on the memory 
displacement value of the operand. 

The "MOV RAX, [RAX+0x10]" instruction moves 16 bytes from the [RAX+0x10] 
memory location onto itself. Such move operations are very common because 
of operations like referencing a pointer. For mutating the memory operand 
values, we can simply manipulate the contents of the RAX register. 

Adding a simple ADD/SUB instruction with RAX before the original 
instruction will enable us to mutate the displacement offset. 

Here are some examples:

        mov rax,[rax] -------+->  add rax,705EBC8Dh
                             +->  mov rax,[rax-705EBC8Dh]
        
        mov rax,[rax+10h] ---+->  sub rax,20DA86AAh
                             +->  mov rax,[rax+20DA86BAh]
        
        lea rcx,[rcx] -------+->  add rcx,0D5F14ECh
                             +->  lea rcx,[rcx-0D5F14ECh]

In each of these example cases, the destination operand is the base 
register inside the memory operand (pointer dereferencing). For the other 
cases, we need additional instructions at the end for preserving the 
base register contents. Here are some other examples:

                                    +->  add rax,4F037035h
        mov [rax],edi --------------+->  mov [rax-4F037035h],edi
                                    +->  sub rax,4F037035h

                                    +->  add rbx,34A92BDh
        mov rcx,[rbx+28h] ----------+->  mov rcx,[rbx-34A9295h]
                                    +->  sub rbx,34A92BDh
        
                                    +->  sub rbp,2841821Ch
        mov dword ptr [rbp+40h],1 --+->  mov dword ptr [rbp+2841825Ch],1
                                    +->  add rbp,2841821Ch
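The displacement arithmetic behind these examples can be checked with a quick Python sketch (hypothetical helper name `mutate`; 64-bit wrap-around is modeled with a mask):

```python
import random

# Sketch of the offset mutation arithmetic: bias the base register by a
# random delta, then adjust the displacement so the effective address
# is unchanged.
MASK64 = (1 << 64) - 1

def mutate(disp, delta):
    """Displacement to use after an `ADD base, delta` prologue."""
    return (disp - delta) & MASK64

base, disp = 0x7FFE12340000, 0x28
delta = random.randrange(1, 1 << 31)

ea_original = (base + disp) & MASK64
ea_mutated = ((base + delta) + mutate(disp, delta)) & MASK64
assert ea_mutated == ea_original

# The article's example: delta 0x34A92BD turns [rbx+28h] into [rbx-34A9295h].
assert (0x34A92BD - 0x28) == 0x34A9295
```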

The offset mutation transform can be applied to any instruction with a 
memory operand. Unfortunately, this transform may affect the condition 
flags. 

In such a scenario, instead of adding extra save/restore instructions, 
we can check if the manipulated condition flags are actually affecting 
the control flow of the application by tracing the next instructions. 

If the manipulated condition flags are being overwritten by another 
instruction, we can safely use this transform. Due to the massive scope 
of this transform gadget, it becomes quite hard to write an effective 
Yara rule. We can easily consider the instruction mutated by this 
transform to be common, and undetectable.

--[ 5.5 - Register Swapping

This transform gadget will target all the instructions with a register-
type operand, which can be considered a very large scope. This may be 
the most basic but still effective transformation in our arsenal. 

After the immediate and memory operand types, the register is the third 
most common operand type that is being targeted by detection rules. Our 
goal is to replace the register used in an instruction with another,
same-sized register using the XCHG instruction. 

Consider the "XOR RAX,0x10" instruction. We can change the RAX register 
with any other 64-bit register by exchanging the value before and after 
the original instruction. Here are some examples:

                            +->  xchg rax,rcx
        xor rax,10h --------+->  xor rcx,10h
                            +->  xchg rax,rcx
        
                            +->  xchg rbx,rsi
        and rbx,31h --------+->  and rsi,31h
                            +->  xchg rbx,rsi

                            +->  xchg rdx,rdi
        mov rdx,rax --------+->  mov rdi,rax
                            +->  xchg rdx,rdi

This transform does not modify any of the condition flags, and can be 
used safely without any additional save/restore instructions. 

The generated sequence of instructions may seem uncommon, but due to the 
scope of this transform and the small size of the exchange instructions, 
the generated sequence of bytes is found to be very frequent in our sample 
data set. After testing the frequency of the generated code sequences, we 
see that ~92% of the compiler-generated sections contain such instruction 
sequences.
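The register-swap wrapper is easiest to see as a value shuffle; a minimal Python sketch modeling register contents as a dictionary:

```python
# Sketch: the XCHG wrapper leaves every register holding its original
# value while the operation itself runs on a substitute register.
regs = {"rax": 0xDEADBEEF, "rcx": 0x1337}

def xchg(r, a, b):
    r[a], r[b] = r[b], r[a]

expected = dict(regs)
expected["rax"] ^= 0x10              # original: XOR RAX,0x10

xchg(regs, "rax", "rcx")
regs["rcx"] ^= 0x10                  # rewritten: XOR RCX,0x10
xchg(regs, "rax", "rcx")

assert regs == expected              # identical end state, different bytes
```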

--[ 6 - Address Realignments

After using any of these transform gadgets, an obvious outcome will be the 
increased code size due to the additional number of instructions. This 
will cause misalignments in the branch operations and relative memory 
addresses. While de-optimizing each instruction, we need to be aware 
of how much the original instruction size is increased so that we can 
calculate a delta value for aligning each of the branch operations. 

This may sound complex, simply because it is :) Handling such address 
calculations is easy when you have the source code of the program. But 
if you only have an already-compiled binary, address alignment becomes 
a bit tricky. We will not dive into the line-by-line implementation of 
post-de-optimization address realignment; the only thing to keep in mind 
is double-checking branch instructions after the alignment. 

There is a case where modified branch instructions (conditional jumps) 
increase in size if they are modified to branch into much further 
addresses. This specific issue causes a recursive misalignment and 
requires a realignment after each fix on branch targets.
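A toy Python sketch of the fixpoint loop described above (hypothetical model: only short-to-long promotion of conditional branches is handled, with made-up 2-byte rel8 and 6-byte rel32 encodings):

```python
# Toy model of post-transform branch realignment. Branches start with
# the short (rel8) encoding and are promoted to the long (rel32)
# encoding when their displacement no longer fits in a signed byte.
# A promotion grows the code, which can push *other* branch targets
# out of range, so we iterate until a fixpoint is reached.
SHORT, LONG = 2, 6

def realign(sizes, branches):
    """sizes: per-instruction byte sizes (mutated in place).
    branches: {instruction index: target instruction index}."""
    changed = True
    while changed:
        changed = False
        addrs, off = [], 0
        for s in sizes:              # recompute addresses from current sizes
            addrs.append(off)
            off += s
        addrs.append(off)            # one-past-the-end, for end targets
        for i, t in branches.items():
            rel = addrs[t] - (addrs[i] + sizes[i])
            if sizes[i] == SHORT and not (-128 <= rel <= 127):
                sizes[i] = LONG      # promotion; may invalidate other branches
                changed = True
    return sizes

# A branch jumping over 200 one-byte instructions must be promoted...
far = realign([SHORT] + [1] * 200 + [1], {0: 201})
assert far[0] == LONG
# ...while a branch over 10 bytes keeps its short encoding.
near = realign([SHORT] + [1] * 10 + [1], {0: 11})
assert near[0] == SHORT
```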

--[ 7 - Known Limitations

There are some known limitations while using these transform gadgets. 

The first and most obvious one is the limited scope of supported 
instruction types. There are some instruction types that cannot be 
transformed with the mentioned gadgets. Instructions with no operands 
are one of them. Such instructions are very hard to transform since 
they do not have any operands to mutate. The only thing we can do is 
relocate them somewhere else in the code. 

This is not a very big problem because the frequency of unsupported 
instructions is very low. In order to find out how frequently an 
instruction is being generated by compilers, we can calculate frequency 
statistics on our previously mentioned sample data set. The following 
list contains the frequency statistics of each x86 instruction.

        1.  %33.5 MOV
        2.  %9.2  JCC (All conditional jumps)
        3.  %6.4  CALL
        4.  %5.5  LEA
        5.  %4.9  CMP
        6.  %3.9  ADD
        7.  %3.7  TEST
        8.  %3.5  JMP
        9.  %3.3  PUSH
        10. %3.0  POP
        11. %2.7  NOP
        12. %2.2  XOR
        13. %1.7  SUB
        14. %1.5  INT3
        15. %1.1  MOVZX
        16. %1.0  AND
        17. %1.0  RET
        18. %0.6  SHL
        19. %0.5  OR
        20. %0.5  SHR
        -   %11.3 <OTHER>

Similar instruction frequency studies[13] on the x86 instruction set 
have been conducted on different sample sets, and their results closely 
parallel the list above. The instruction frequency list 
shows that only around 5% of the instructions are not supported by our 
transform gadgets. 
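Such per-mnemonic statistics are straightforward to tally once a disassembly listing is available; a small Python sketch (hypothetical input format, folding all conditional jumps into a single JCC bucket as in the table above):

```python
from collections import Counter

def mnemonic_stats(listing):
    """Tally mnemonic frequency (percent) from disassembly lines such as
    'mov eax, 1'. All conditional jumps are folded into one JCC bucket."""
    jcc = {"je", "jne", "jz", "jnz", "jg", "jge", "jl", "jle", "ja", "jb"}
    counts = Counter()
    for line in listing:
        m = line.split()[0].lower()
        counts["jcc" if m in jcc else m] += 1
    total = sum(counts.values())
    return {m: 100.0 * c / total for m, c in counts.most_common()}

stats = mnemonic_stats(["mov eax, 1", "mov ebx, 2", "je 0x10", "ret"])
assert stats["mov"] == 50.0
```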

As can be seen on the list, the most commonly used instructions are 
simple load/store, arithmetic, logic, and branch instructions. This 
means that, if implemented properly, the previously explained transform 
gadgets are able to transform ~95% of the instructions in compiler-
generated programs. 

This can be considered more than enough to bypass rule-based detection 
mechanisms. Another known limitation is self-modifying code. If the code 
is overwriting itself, our transform gadgets will probably break the code. 

Some code may also use branch instructions with dynamically calculated 
branch targets; in such cases, address realignment becomes impossible 
without code emulation. Luckily for us, such code is not very commonly 
produced by compiler toolchains. Another rare condition is 
overlapping instructions. Under certain circumstances, compiler toolchains 
generate instructions that can be executed differently when branched into 
the middle of the instruction. Consider the following example:

        0000: B8 00 03 C1 BB  mov eax, 0xBBC10300
        0005: B9 00 00 00 05  mov ecx, 0x05000000
        000A: 03 C1           add eax, ecx
        000C: EB F4           jmp $-10
        000E: 03 C3           add eax, ebx
        0010: C3              ret

The JMP instruction will land on the third byte of the five-byte MOV 
instruction at address 0000. It will create a completely new instruction 
stream with a new alignment. This situation is very hard to detect without 
some code emulation. 

Another thing to consider is code with data inside. This is also a very 
rare condition, but in certain circumstances, code can contain strings of 
data. The most common scenario for such a condition is string operations 
in shellcodes. It is very hard to differentiate between code and data when 
there are no structured file formats or code sections. 

Under such circumstances, our de-optimizer tool may treat data as code and 
corrupt it by trying to apply transforms; but this can be avoided to some 
extent during disassembly. Instead of using linear sweep[14] disassembly, 
control flow tracing with a depth-first search[14] approach can be used 
to skip data bytes inside the code.
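The difference between the two strategies can be sketched in a few lines of Python (a toy control-flow map instead of a real disassembler; `trace_code` is a hypothetical helper):

```python
# Sketch: recursive-descent (depth-first) traversal over a toy control
# flow map, marking only reachable offsets as code so that embedded data
# bytes are never transformed. `cfg` maps each instruction offset to
# (size, list_of_successor_offsets) -- a stand-in for what a real
# disassembler such as iced_x86 would provide.
def trace_code(cfg, entry):
    code, stack = set(), [entry]
    while stack:
        off = stack.pop()
        if off in code or off not in cfg:
            continue
        code.add(off)
        stack.extend(cfg[off][1])    # follow branch and fall-through edges
    return code

# Offset 0: jmp 10 (skips an 8-byte inline string at offsets 2..9);
# offset 10: ret. A linear sweep would misdecode the string bytes.
cfg = {0: (2, [10]), 10: (1, [])}
assert trace_code(cfg, 0) == {0, 10}   # the data bytes are skipped
```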

--[ 8 - Conclusion

In this article, we have underlined real-life evasion challenges commonly 
encountered by security professionals, and introduced several alternative 
ways of solving these challenges by de-optimizing individual x86 
instructions. The known limitations of these methods are proven not to 
be a critical obstacle to the objective of this study. 

These de-optimization methods have been found to be highly effective for 
eliminating any pattern in machine code. A POC de-optimizer tool[3] has 
been developed during this study to test the effectiveness of these 
de-optimization methods. The tests were conducted by de-optimizing all 
the available Metasploit[15] shellcodes and checking the detection rates 
via multiple memory-based scanners and online analysis platforms. 

The test results show that these de-optimization methods are highly 
effective against pattern-based detection while avoiding the use of 
self-modifying code (RWE memory). Of course, as in every study 
on evasion, the real results will emerge over time after the release of 
this open-source POC tool.

--[ 9 - References
    - [1] https://github.com/forrest-orr/moneta
    - [2] https://github.com/hasherezade/pe-sieve
    - [3] https://github.com/EgeBalci/deoptimizer
    - [4] https://github.com/hasherezade/pe-sieve/blob/603ea39612d7eb81545734c63dd1b4e7a36fd729/params_info/pe_sieve_params_info.cpp#L179
    - [5] https://www.mandiant.com/resources/blog/shikata-ga-nai-encoder-still-going-strong
    - [6] https://github.com/EgeBalci/sgn
    - [7] https://github.com/zeroSteiner/crimson-forge
    - [8] https://github.com/weak1337/Alcatraz
    - [9] https://en.wikipedia.org/wiki/Optimizing_compiler
    - [10] https://monkbai.github.io/files/sp-22.pdf
    - [11] https://github.com/lifting-bits/mcsema
    - [12] https://docs.rs/iced-x86/latest/iced_x86/
    - [13] https://www.strchr.com/x86_machine_code_statistics
    - [14] http://infoscience.epfl.ch/record/167546/files/thesis.pdf
    - [15] https://github.com/rapid7/metasploit-framework

|=[ EOF ]=---------------------------------------------------------------=|