                             ==Phrack Inc.==

               Volume 0x0d, Issue 0x42, Phile #0x0D of 0x11

|=----------------------------------------------------------------------=|
|=---------=[ Hacking the Cell Broadband Engine Architecture ]=---------=|
|=-------------------=[ SPE software exploitation ]=--------------------=|
|=----------------------------------------------------------------------=|
|=--------------=[ By BSDaemon ]=----------=|
|=--------------=[ ]=----------=|
|=----------------------------------------------------------------------=|

"There are two ways of constructing a software design. One way is to make
 it so simple that there are obviously no deficiencies. And the other way
 is to make it so complicated that there are no obvious deficiencies"
                                                        - C.A.R. Hoare

------[  Index

  1 - Introduction
    1.1 - Paper structure

  2 - Cell Broadband Engine Architecture
    2.1 - What is Cell
    2.2 - Cell History
      2.2.1 - Problems it solves
      2.2.2 - Basic Design Concept
      2.2.3 - Architecture Components
      2.2.4 - Processor Components
    2.3 - Debugging Cell
      2.3.1 - Linux on Cell
      2.3.2 - Extensions to Linux
        2.3.2.1 - User-mode
        2.3.2.2 - Kernel-mode
      2.3.3 - Debugging the SPE
    2.4 - Software Development for Linux on Cell
      2.4.1 - PPE/SPE hello world
      2.4.2 - Standard Library Calls from SPE
      2.4.3 - Communication Mechanisms
      2.4.4 - Memory Flow Control (MFC) Commands
      2.4.5 - Direct Memory Access (DMA) Commands
        2.4.5.1 - Get/Put Commands
        2.4.5.2 - Resources
        2.4.5.3 - SPE 2 SPE Communication

  3 - Exploiting Software Vulnerabilities on Cell SPE
    3.1 - Memory Overflows
      3.1.1 - SPE memory layout
      3.1.2 - SPE assembly basics
        3.1.2.1 - Registers
        3.1.2.2 - Local Storage Addressing Mode
        3.1.2.3 - External Devices
        3.1.2.4 - Instruction Set
      3.1.3 - Exploiting software vulnerabilities in SPE
        3.1.3.1 - Avoiding Null Bytes
      3.1.4 - Finding software vulnerabilities on SPE

  4 - Future and other uses

  5 - Acknowledgements

  6 - References

  7 - Notes on SDK/Simulator Environment

  8 - Sources

------[ 1 - Introduction

This article is all about the Cell Broadband
Engine Architecture [1], new hardware designed by a joint effort between
Sony [2], Toshiba [3] and IBM [4]. Accordingly, many architecture details
will be explained, along with the many ways development differs on this
platform.

The biggest difference between this article and others released on the
subject is the focus on exploiting the architecture itself rather than on
using the powerful processor resources to break code [5]. It also focuses
on what sets the architecture apart, which means the SPU (synergistic
processor unit) and not the core (PPU - power processor unit) [6]. The
core is a slightly modified Power processor, which means all shellcodes
for Linux on Power will also work on the core, with only small differences
in code allocation and the like.

It's important to mention that everything about Cell here focuses on the
Playstation3 hardware, since it's cheap and widely deployed, but there are
also big machines built with this processor [7], including the #1 in the
list of supercomputers [8].

---[ 1.1 - Paper structure

The idea of this paper is to complete the studies about Cell, putting
together all the information needed to do security research focused on
software exploitation for this architecture.

For that, the paper has been structured in two important portions:

Chapter 2 is all about the Cell architecture and how to develop for it.
It includes many samples and explains the modifications made to Linux in
order to get the best from this architecture. It also gives the knowledge
needed to go further into software exploitation for this arch.

Chapter 3 focuses on the exploitation of the SPU processor, showing the
simple memory layout it has and how to write a shellcode for the purpose
of gaining control over an application running inside the SPU.
------[ 2 - Cell Broadband Engine Architecture

From the IBM Research [9]:
"The Cell Architecture grew from a challenge posed by Sony and Toshiba to
provide power-efficient and cost-effective high-performance processing for
a wide range of applications, including the most demanding consumer
appliance: game consoles. Cell - also known as the Cell Broadband Engine
Architecture (CBEA) - is an innovative solution whose design was based on
the analysis of a broad range of workloads in areas such as cryptography,
graphics transform and lighting, physics, fast-Fourier transforms (FFT),
matrix operations, and scientific workloads. As an example of innovation
that ensures the clients' success, a team from IBM Research joined forces
with teams from IBM Systems Technology Group, Sony and Toshiba, to lead
the development of a novel architecture that represents a breakthrough in
performance for consumer applications. IBM Research participated
throughout the entire development of the architecture, its implementation
and its software enablement, ensuring the timely and efficient application
of novel ideas and technology into a product that solves real challenges."

It's impossible not to get excited by this. Such a 'powerful' and
versatile architecture, completely different from what we usually see, is
an amazing thing to research for software vulnerabilities. Also, since
it's supposed to be widely deployed, there will be a huge number of new
vulnerabilities coming in the near future. I wanted to exploit those
vulnerabilities.

---[ 2.1 - What is Cell

As must already be clear to the reader, I'm not talking about phones here.
Cell is a new architecture, which came to solve some of the current
problems in the computer industry.

It's compatible with a well-known architecture, the Power Architecture,
keeping most of its advantages and solving most of its problems (if you
cannot wait to know which problems, go to section 2.2.1).
---[ 2.2 - Cell History

The focus of this section is just to give the reader a timeline view,
without going into detail.

The architecture was born from a joint effort between IBM, Sony and
Toshiba, formed in 2000. They opened a design center in March 2001, based
in Austin, Texas (USA). In the spring of 2004, a single Cell BE became
operational. In the summer of the same year, a 2-way SMP version was
released.

The first technical disclosures came only in February 2005, with the
simulator [10] and open-source SDK [11] (more on that later) being
released in November of the same year. In the same month, Mercury started
to sell Cell (yeah, sell Cell sounds funny) machines.

Cell Blades were announced by IBM in February of 2006. SDK 1.1 was
released in July of the same year, with many improvements. The latest
version is 3.1.

---[ 2.2.1 - Problems it solves

Computer technology has been evolving over the years, but always
struggling against some barriers. Those barriers cannot physically be
bypassed, and that's why processor clocks stopped growing and the focus
moved to multi-core architectures.

Basically we have three big walls (barriers) to speed growth:

  - Power wall
    Related to the limits of CMOS technology and the hard limit on
    acceptable system power

  - Memory wall
    DRAM latency is high when compared to the processor frequency, and
    many improvements exist just to try to avoid it

  - Frequency wall
    Diminishing returns from deeper pipelines

For a new architecture to work and be widely deployed, it was also
important to preserve the existing investments in software development.
Cell accomplishes that by being compatible with the 64-bit Power
Architecture, and attacks the walls in the following ways:

  - Non-homogeneous coherent multi-processor and high design frequency at
    a low operating voltage with advanced power management attack the
    'power wall'.
  - Streaming DMA architecture and three-level memory model (main
    storage, local storage and register files) attack the 'memory wall'.

  - Non-homogeneous coherent multi-processor, highly-optimized
    implementation and large shared register files with software
    controlled branching to allow deeper pipelines attack the 'frequency
    wall'.

It has been developed to support any OS, which means it supports real-time
operating systems as well as non-real-time operating systems.

---[ 2.2.2 - Basic Design Concept

The basic concept behind Cell is its asymmetric multi-core design. That
permits a powerful design, but of course requires specifically developed
applications to get the most out of the architecture.

Knowing that, it becomes clear that understanding the new component, which
is called SPU (synergistic processor unit) or SPE (synergistic processor
element), proves to be essential - see the next section for a better
understanding of the differences between SPU and SPE.

---[ 2.2.3 - Architecture Components

In Cell what we have is a core processor, called Power Processor Element
(PPE), which controls tasks, and synergistic processor elements (SPEs) for
data-intensive processing.

The SPE consists of the synergistic processor unit (SPU), which is a
processor itself, and the memory flow control (MFC), responsible for data
movement and synchronization, as well as for the interface with the
high-performance element interconnect bus (EIB).

Communication with the EIB is done at 16B/cycle, which means that each
SPU is connected at that speed to the bus, which supports 96B/cycle in
total.

Refer to the picture architecture-components.jpg in the images directory
of the attached file for a visual of the above explanation.

---[ 2.2.4 - Processor Components

As said, the Power Processor Element (PPE) is the core processor which
controls tasks (scheduling). It is a general purpose 64-bit RISC processor
(Power architecture).
It's 2-way hardware multithreaded, with 32KB L1 instruction and data
caches and a 512KB L2 cache. It has support for real-time operations, like
locking the L2 cache and the TLB (it also supports both hardware-managed
and software-managed TLB). It has bandwidth and resource reservation and
mediated interrupts. It's also connected to the EIB using a 16B/cycle
channel (figure processor-components.jpg).

The EIB itself supports four 16-byte data rings with simultaneous
transfers per ring (it will be clarified later). This bus supports over
100 simultaneous transactions, achieving more than 25.6 Gbytes/sec in each
direction on each bus data port.

On the other side, the synergistic processor element is a simple RISC
user-mode architecture supporting dual-issue VMX-like, graphics SP-float
and IEEE DP-float. It's important to note that the SPE itself has
dedicated resources: a unified register file of 128 x 128-bit registers
and 256KB of local storage. Each SPE has a dedicated DMA engine,
supporting 16 requests.

The memory management of this architecture simplifies its use, with the
local storage of the SPE being aliased into the PPE system memory (figure
processor-components2.jpg).

The MFC in the SPE acts as the MMU, providing control over the SPE DMA
accesses. It's compatible with the PowerPC Virtual Memory layout and is
software controllable using PPE MMIO.

DMA accesses support transfers of 1, 2, 4, 8 or n*16 bytes, with a maximum
of 16 KB per transfer, and there are two different queues for DMA
commands: Proxy & SPU (more on this later).

The EIB is also connected to a broadband interface controller (BIC). The
purpose of this controller is to provide external connectivity for
devices. It supports two configurable interfaces (60 GB/s) with a
configurable number of bytes, coherent (BIF) and/or I/O (IOIFx) protocols,
using two virtual channels per interface, and multiple system
configurations.
The memory interface controller (MIC) is also connected to the EIB and is
a dual XDR controller (25.6 GB/s) with ECC and suspended DRAM support
(figure processor-components3.jpg).

Two more components are still missing: the internal interrupt controller
(IIC) and the I/O Bus Master Translation (IOT) (figure
processor-components4.jpg).

The IIC handles the SPE interrupts as well as the external interrupts and
the interrupts coming from the coherent interconnect and from IOIF0 and
IOIF1. It is also responsible for the interrupt priority level control and
for the interrupt generation ports for IPI. Note that the IIC is
duplicated for each PPE hardware thread.

The IOT translates bus addresses to system real addresses, supporting
two-level translations:
  - I/O segments (256 MB)
  - I/O pages (4K, 64K, 1M, 16M bytes)

Also interesting are the per-page I/O device identifier for LPAR use
(blades) and the IOST/IOPT caches managed by software and hardware.

---[ 2.3 - Debugging Cell

As the bus is a high-speed circuit, it's really difficult to debug the
architecture and see what is going on.

For that reason, and also to make it easier to develop software for Cell,
IBM Research developed a Cell simulator [10] in which you may run Linux
and install the software development kit [11].

The Brazilian team of the IBM Linux Technology Center developed an Eclipse
plugin that works as an IDE for the debugger and SDK. Putting it all
together, it is possible to have the toolkit installed on a Linux machine,
running the frontends for the simulator and for the SDK.

The debugging interface is much better using these frontends. Anyway, it's
important to notice that they are just frontends for the normal and
well-known Linux tools with extended support for the Cell processor (GDB
and GCC).

---[ 2.3.1 - Linux on Cell

Linux on Cell is an open-source git branch and is provided in the PowerPC
64 kernel line.
It started in 2.6.15 and is evolving to support many new features, like
the scheduling improvements for the SPUs (SPE contexts can now be
preempted, and my big friend Andre Detsch, who reviewed this article, was
one of the biggest contributors to making this code stable).

To Linux it added a heterogeneous lwp/thread model, with a new SPE thread
model (really similar to the pthreads library, as we will see later)
supporting user-mode direct and indirect SPE access and fully preemptive
SPE context management. For that, spe_ptrace() was created and support for
it was added to GDB, along with spe_schedule() for thread to physical SPE
assignment (it is no longer FIFO - run until completion).

As a note, the SPE threads share their address space with the parent PPE
process (through DMA), with demand paging for SPE accesses and a hardware
page table shared with the PPE.

An implementation detail is the PPE proxy thread allocated for each SPE to
provide a single namespace for both PPE and SPE and to assist in
SPE-initiated C99 and POSIX library services.

All the event, error and signal handling for SPEs is done by the parent
PPE thread.

The ELF objects for SPE are wrapped into PPE objects with an extended GLD.

---[ 2.3.2 - Extensions to Linux

Here I'll try to provide some details about Linux running on Cell
hardware. The base hardware used for this reference is a Playstation 3,
which has 8 SPUs, but one is reserved for redundancy and another one is
used by the hypervisor for a custom OS (in this case, Linux).

All the details are valid for any Linux on Cell and we will take a
top-down approach.

---[ 2.3.2.1 - User-mode

Cell supports both 32-bit and 64-bit Power applications, as well as 32-bit
and 64-bit Cell workloads. It has different programming modes, like RPC,
device subsystems and direct/indirect access.

As already said, it has heterogeneous threads: single SPU, SPU groups and
shared memory support.

It runs on top of an SPE management runtime library, available in 32 and
64 bits.
This library interacts with the SPUFS filesystem (/spu/thread#/) in the
following ways:

  * Open, close, read, write the files:
      - mem            Access to the local storage
      - regs           Access to the 128 registers of 128 bits each
      - mbox           SPE to PPE mailbox
      - ibox           SPE to PPE interrupt mailbox
      - xbox_stat      Get the mailbox status
      - signal1        Signal notification access
      - signal2        Signal notification access
      - signalx_type   Signal type
      - npc            Read/write SPE next program counter (for debugging)
      - fpcr           SPE floating point control/status register
      - decr           SPE decrementer
      - decr_status    SPE decrementer status
      - spu_tag_mask   Access tag query mask
      - event_mask     Access SPE event mask
      - srr0           Access SPE state restore register 0

  * Open, close, mmap the files:
      - mem            Program state access to the local storage
      - signal1        Direct application access to signal 1
      - signal2        Direct application access to signal 2
      - cntl           Direct application access to SPE controls, DMA
                       queues and mailboxes

The library also provides SPE task control system calls (to interact with
the SPE system calls implemented in kernel-mode), which are:

  - sys_spu_create_thread  Allocates an SPE task/context and creates a
                           directory in SPUFS
  - sys_spu_run            Activates an SPU task/context on a physical SPE
                           and blocks in the kernel as a proxy thread to
                           handle the events already mentioned

Some functions provided by the library are related to the management of
the SPE tasks, like spe create group, create thread, get/set affinity,
get/set context, get event, get group, get ls, get ps area, get threads,
get/set priority, get policy, set group defaults, group max, kill/wait,
open/close image, write signal, read in_mbox, write out_mbox, read mbox
status.

Besides the standard 32-bit and 64-bit PowerPC ELF (binary) interpreters,
an SPE object loader is provided, responsible for understanding the
extension to the normal objects already mentioned and for initiating the
loading of the SPE threads.
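To make the SPUFS file interface listed above a bit more concrete, here is
a minimal sketch in C of how a PPE-side tool could read one 32-bit message
from a context's mbox file with plain open/read calls. The function name
spufs_read_mbox and the idea of passing the context directory as a
parameter are assumptions made for this illustration (the runtime library
normally hides these details), and the sketch assumes each read returns
one 4-byte message:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Read one 32-bit mailbox message from "<ctx_dir>/mbox".
 * ctx_dir is the SPUFS directory of the SPE context (hypothetical
 * parameter; the real name depends on how the context was created).
 * Returns 0 on success, -1 on failure with errno set.
 */
int spufs_read_mbox(const char *ctx_dir, uint32_t *msg)
{
    char path[256];
    int n, fd, saved;
    ssize_t got;

    assert(ctx_dir != NULL && msg != NULL);

    n = snprintf(path, sizeof(path), "%s/mbox", ctx_dir);
    if (n < 0 || (size_t)n >= sizeof(path)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    /* Non-blocking open: an empty mailbox should not hang the caller. */
    fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd < 0)
        return -1;

    /* Mailbox messages are 32 bits wide, so ask for exactly 4 bytes. */
    got = read(fd, msg, sizeof(*msg));
    saved = errno;
    close(fd);

    if (got != (ssize_t)sizeof(*msg)) {
        errno = (got < 0) ? saved : EIO;
        return -1;
    }
    return 0;
}
```

A caller would pass the context directory and check the return value
before trusting the message, since a failed read leaves *msg unspecified.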
Going down, we have glibc and other GNU libraries, both supporting 32 and
64 bits.

---[ 2.3.2.2 - Kernel-mode

The next layer is the normal system-call interface, where we have the SPU
management framework (through special files in spufs) and modifications to
the exec* interface, in a 64-bit kernel.

This modification is done through a special misc binary format, called the
SPU object loader extension.

Of course there are other kernel extensions: the SPUFS filesystem, which
provides the management interface, and the SPU allocation, scheduling and
dispatch code.

Also, we have the Cell BE architecture specific code, supporting multi and
large pages, SPE event & fault handling, IIC and IOMMU.

Everything is controlled by a hypervisor, since Linux is what is called a
custom OS when running on Playstation3 hardware (the hypervisor is
responsible for protecting the 'secret key' of the hardware, and knowing
how to exploit SPU vulnerabilities, plus some fuzzing of the hypervisor,
may be the knowledge needed to break the game copy protection on this
hardware).

---[ 2.3.3 - Debugging the SPE

The SDK for Linux on Cell provides good resources for debugging and for
better understanding what is going on.

It's important to note the environment variables that control the
behaviour of the system. So, if you set SPU_INFO, for example, the SPE
runtime library will print messages when loading an SPE ELF executable
(see below).
---------- begin output ----------
# export SPU_INFO=1
# ./test
Loading SPE program: ./test
SPU LS Entry Addr : XXX
---------- end output ----------

And it will also print messages before starting up a new SPE thread, like:

---------- begin output ----------
Starting SPE thread 0x..., to attach debugger use: spu-gdb -p XXX
---------- end output ----------

When planning to use spu-gdb to debug an SPU thread, it's important to
remember the SPU_DEBUG_START environment variable, which includes
everything provided by SPU_INFO and also stops the thread until a debugger
is attached or a signal is received.

Since each SPU register can hold multiple fixed (or floating) point values
of different sizes, GDB exposes each one as a data structure that can be
accessed in different formats. So, by specifying the field in the data
structure, we can update it using different sizes as well:

---------- begin output ----------
(gdb) ptype $r70
type = union __gdb_builtin_type_vec128 {
    int128_t uint128;
    float v4_float[4];
    int32_t v4_int32[4];
    int16_t v8_int16[8];
    int8_t v16_int8[16];
}
(gdb) p $r70.uint128
$1 = 0x00018ff000018ff000018ff000018ff0
(gdb) set $r70.v4_int32[2]=0xdeadbeef
(gdb) p $r70.uint128
$2 = 0x00018ff000018ff0deadbeef00018ff0
---------- end output ----------

To let you better understand when the SPU code starts executing, and to
follow it, gdb also includes an interesting option:

---------- begin output ----------
(gdb) set spu stop-on-load
(gdb) run
...
(gdb) info registers
---------- end output ----------

Another important piece of information for debugging your code is
understanding the internal sizes and being prepared for overlapping.
Useful information can be obtained using the following code fragment
inside your SPU program (careful: it does not free the allocated memory).
--- code ---

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Symbols provided by the linker */
extern int _etext;
extern int _edata;
extern int _end;

void meminfo(void)
{
        printf("\n&_etext: %p", (void *) &_etext);
        printf("\n&_edata: %p", (void *) &_edata);
        printf("\n&_end: %p", (void *) &_end);
        printf("\nsbrk(0): %p", sbrk(0));
        printf("\nmalloc(1024): %p", malloc(1024));
        printf("\nsbrk(0): %p", sbrk(0));
}

--- end code ---

And of course you can also play with the GCC and LD arguments to have more
debugging info:

--- code ---

# vi Makefile
CFLAGS += -g
LDFLAGS += -Wl,-Map,map_filename.map

--- end code ---

---[ 2.4 - Software Development for Linux on Cell

In this chapter I will introduce the inner workings of Cell development,
giving the basic knowledge necessary to better understand the further
chapters.

---[ 2.4.1 - PPE/SPE hello world

Every program in Cell that uses the SPEs needs to have at least two source
files: one for the PPE and another one for the SPE.

Following is a simple code to run on the SPE (it's also in the attached
tar file):

--- code ---

#include <stdio.h>

int main(unsigned long long speid, unsigned long long argp,
         unsigned long long envp)
{
        printf("\nHello World!\n");
        return 0;
}

--- end code ---

The Makefile for this code will look like:

--- code ---

PROGRAM_spu = hello_spu
LIBRARY_embed = hello_spu.a
IMPORTS = $(SDKLIB_spu)/libc.a
include $(TOP)/make.footer

--- end code ---

Of course it looks like any normal code. The PPE, as already explained, is
responsible for creating the new thread and allocating it on an SPE:

--- code ---

#include <stdio.h>
#include <libspe.h>

extern spe_program_handle_t hello_spu;

int main(void)
{
        speid_t speid;
        int status;

        speid = spe_create_thread(0, &hello_spu, NULL, NULL, -1, 0);
        spe_wait(speid, &status, 1);
        return 0;
}

--- end code ---

With the following Makefile:

--- code ---

DIRS = spu
PROGRAM_ppu = hello_ppu
IMPORTS = ../spu/hello_spu.a -lspe
include $(TOP)/make.footer

--- end code ---

The reader will notice that the speid in the PPE program will be the same
value as the speid in the main of the SPE.
Also, the arguments passed to spe_create_thread() are the ones received by
the SPE program when running (argp and envp equal NULL in our sample).

It's important to remember that, when compiled, this program will generate
a binary in the spu directory, called hello_spu, and another one in the
root directory of this example, called hello_ppu, which CONTAINS the
embedded hello_spu.

---[ 2.4.2 - Standard Library Calls from SPE

When the SPE program needs to use any standard library call, like for
example printf or exit, it has to call back to the PPE main thread. It
uses a simple stop-and-signal assembly instruction with standardized
argument values (important to remember, since it's needed in shellcodes
for SPE). That value is returned from the ioctl call and the user thread
must react to it. This means copying the arguments from the SPE local
storage, executing the library call and then calling ioctl again.

The instruction according to the manual:
"stop u14 - Stop and signal. Execution is stopped, the current address is
written to the SPU NPC register, the value u14 is written to the SPU
status register, and an interrupt is sent to the PPU."

This is a disassembly output of the hello_spu program:

---------- begin output ----------
# spu-gdb ./hello_spu
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you
are welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "--host=powerpc64-unknown-linux-gnu
--target=spu"...
(gdb) disassemble main
Dump of assembler code for function main:
0x00000170 :    ila     $3,0x340 <.rodata>
0x00000174 :    stqd    $0,16($1)
0x00000178 :    nop     $127
0x0000017c :    stqd    $1,-32($1)
0x00000180 :    ai      $1,$1,-32
0x00000184 :    brsl    $0,0x1a0 # 1a0
0x00000188 :    ai      $1,$1,32 # 20
0x0000018c :    fsmbi   $3,0
0x00000190 :    lqd     $0,16($1)
0x00000194 :    bi      $0
0x00000198 :    stop
0x0000019c :    stop
End of assembler dump.
(gdb)
---------- end output ----------

---[ 2.4.3 - Communication Mechanisms

The architecture offers three main communication mechanisms:

  - DMA
    Used to move data and instructions between main storage and local
    storage. SPEs rely on asynchronous DMA transfers to hide memory
    latency and transfer overhead by moving information in parallel with
    SPU computation.

  - Mailbox
    Used for control communications between an SPE and the PPE or other
    devices. Mailboxes hold 32-bit messages. Each SPE has two mailboxes
    for sending messages and one mailbox for receiving messages.

  - Signal Notification
    Used for control communications from the PPE or other devices. Signal
    notification (also known as signalling) uses 32-bit registers that can
    be configured for one-sender-to-one-receiver signalling or
    many-senders-to-one-receiver signalling.

All three are controlled and implemented by the SPE MFC, and their
importance is related to the way the vulnerable program will receive its
input.

---[ 2.4.4 - Memory Flow Control (MFC) Commands

This is the main mechanism for the SPE to access main storage and maintain
synchronization with other processors and devices in the system.

MFC commands can be issued either by the SPE itself, or by the processor
and other devices, as follows:

  - Code running on the SPU issues an MFC command by executing a series of
    writes and/or reads using channel instructions.

  - Code running on the PPU or any other device issues an MFC command by
    performing a series of stores and/or loads to memory-mapped I/O (MMIO)
    registers in the MFC.
The MFC commands are then queued in one of these independent queues:

  - MFC SPU Command Queue - For channel-initiated commands by the
    associated SPU
  - MFC Proxy Command Queue - For MMIO-initiated commands by the PPE or
    other devices.

---[ 2.4.5 - Direct Memory Access (DMA) Commands

The MFC commands that transfer data are referred to as DMA commands. The
transfer direction for DMA commands is based on the SPE point of view:

  - Into an SPE (from main storage to the local storage) -> get
  - Out of an SPE (from local storage to the main storage) -> put

---[ 2.4.5.1 - Get/Put Commands

DMA get from the main memory to the local storage:
  (void) mfc_get (volatile void *ls, uint64_t ea, uint32_t size,
                  uint32_t tag, uint32_t tid, uint32_t rid)

DMA put into the main memory from the local storage:
  (void) mfc_put (volatile void *ls, uint64_t ea, uint32_t size,
                  uint32_t tag, uint32_t tid, uint32_t rid)

To guarantee the synchronization of the writes to the main memory, there
are the options:

  - mfc_putf: the 'f' means fenced, that is, all commands issued earlier
    within the same tag group must finish first, while later ones may
    complete before it
  - mfc_putb: the 'b' here means barrier, that is, the barrier command and
    all commands issued thereafter are NOT executed until all previously
    issued commands in the same tag group have been performed

---[ 2.4.5.2 - Resources

For DMA operations the system uses transfers with variable sizes (1, 2, 4,
8 and n*16 bytes, n being an integer, of course). There is a maximum of
16 KB per DMA transfer, and 128-byte alignment offers better performance.

The DMA queues are defined per SPU, with a 16-element queue for
SPU-initiated requests and an 8-element queue for PPU-initiated requests.
SPU-initiated requests always have higher priority.

To differentiate the DMA commands, each one receives a tag, a 5-bit
identifier. The same identifier can be applied to multiple commands, since
it's used for polling status or waiting on the completion of the DMA
commands.
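The size and tag rules above are easy to get wrong when building DMA
requests by hand, so here is a minimal sketch in portable C that checks
them. The helper names (dma_size_is_valid, dma_tag_mask,
dma_transfers_needed) are made up for this illustration and are not part
of the SDK; the sketch only encodes the limits stated in this section (1,
2, 4, 8 or n*16 bytes, 16 KB maximum, 5-bit tag):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DMA_MAX_SIZE (16u * 1024u) /* 16 KB limit per DMA transfer */
#define DMA_NUM_TAGS 32u           /* 5-bit tag identifier: 0..31  */

/* True when size is 1, 2, 4, 8 or a multiple of 16 no larger than 16 KB. */
bool dma_size_is_valid(uint32_t size)
{
    if (size == 1 || size == 2 || size == 4 || size == 8)
        return true;
    return size != 0 && size % 16 == 0 && size <= DMA_MAX_SIZE;
}

/*
 * Bit to set in the tag mask when waiting on a tag group.
 * Returns 0 when the tag does not fit in 5 bits. The shift is done on an
 * unsigned value so that tag 31 does not overflow a signed int.
 */
uint32_t dma_tag_mask(uint32_t tag)
{
    if (tag >= DMA_NUM_TAGS)
        return 0;
    return (uint32_t)1 << tag;
}

/* How many transfers are needed to move total bytes (a multiple of 16). */
uint32_t dma_transfers_needed(uint32_t total)
{
    assert(total % 16 == 0);
    return total / DMA_MAX_SIZE + (total % DMA_MAX_SIZE != 0);
}
```

For example, a 16400-byte buffer is not a valid single transfer and has to
be split in two, while tag 31 maps to the mask 0x80000000.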
A great feature provided is DMA lists, where a single DMA command can
cause the execution of a list of transfer requests (held in local
storage). Lists implement scatter-gather functions and may contain up to
2K transfer requests.

---[ 2.4.5.3 - SPE 2 SPE Communication

An address in another SPE's local storage is represented as a 32-bit
effective address (global address).

An SPE issuing a DMA command needs a pointer to the other SPE's local
storage. The PPE code can obtain the effective address of an SPE's local
storage:

--- code ---

#include <libspe.h>

speid_t speid;
void *spe_ls_addr;

spe_ls_addr = spe_get_ls(speid);

--- end code ---

This permits the PPE to give the SPEs each other's local addresses and
control the communications. Vulnerabilities may arise no matter what the
communication flow is, even without involving the PPE itself.

Following is a simple DMA demo program between PPE and SPE (see the
attached file for the complete version) - this program will send an
address in the PPE to the SPE through DMA:

--- PPE code ---

information_sent is[1] __attribute__ ((aligned (128)));
spe_gid_t gid;
int *pointer = (int *) malloc(128);

gid = spe_create_group(SCHED_OTHER, 0, 1);

if (spe_group_max(gid) < 1) {
        printf("\nOps, there is no free SPE to run it...\n");
        exit(EXIT_FAILURE);
}

is[0].addr = (unsigned int) pointer;

/* Create the SPE thread */
speid = spe_create_thread(gid, &hello_dma,
                          (unsigned long long *) &is[0], NULL, -1, 0);

/* Wait for the SPE to complete */
spe_wait(speid, &status, 0);

/* Best practice: Issue a sync before ending - This is good for us ;) */
__asm__ __volatile__ ("sync" : : : "memory");

--- end code ---

--- SPE code ---

information_sent is __attribute__ ((aligned (128)));

int main(unsigned long long speid, unsigned long long argp,
         unsigned long long envp)
{
        /* Where:
                is -> Address in local storage to place the data
                argp -> Main memory address
                sizeof(is) -> Number of bytes to read
                31 -> Associated tag to this DMA (from 0 to 31)
                0 -> Not useful here (just when using
                     caching)
                0 -> Not useful here (just when using caching)
        */
        mfc_get(&is, argp, sizeof(is), 31, 0, 0);

        /* The tag mask is always 1 left-shifted by the value of your tag */
        mfc_write_tag_mask(1U << 31);

        /* Issue the DMA and wait until completion */
        mfc_read_tag_status_all();

        return 0;
}

--- end code ---

And now between two SPEs (for the complete code, please refer to the
attached sources):

--- PPE code ---

speid_t speid[2];

speid[0] = spe_create_thread(0, &dma_spe1, NULL, NULL, -1, 0);
speid[1] = spe_create_thread(0, &dma_spe2, NULL, NULL, -1, 0);

for (i = 0; i < 2; i++)
        local_store[i] = spe_get_ls(speid[i]); /* Get local storage
                                                  address */

for (i = 0; i < 2; i++)
        spe_kill(speid[i], SIGKILL); /* Send SIGKILL to the SPE threads */

--- end code ---

--- SPE code ---

/* Write something to the PPE */
spu_write_out_mbox(buffer);

/* Read something from the PPE */
pointer = spu_read_in_mbox();

/* DMA interface */
mfc_get(buffer, pointer, size, tag, 0, 0);
wait_on_mask(1<

                             0x3FFFF
 SPU ABI Reserved Usage
------------------------  |   Stack grows from the
 Runtime Stack            |   higher addresses to
------------------------  |   the lower addresses.
 Global Data              |
------------------------  \/
 .Text
------------------------ -> 0x00000

To test your application, it's really useful to run the 'size' tool on it:

---------- begin output ----------
# size hello_spu
   text    data     bss     dec     hex filename
   1346     928      32    2306     902 hello_spu
---------- end output ----------

---[ 3.1.2 - SPE assembly basics

To develop a shellcode, it's important to understand how the SPE assembly
differs from PowerPC.

The SPE uses a RISC-based assembly, which means there is a small set of
instructions, and everything in the SPE runs in user-mode (there is no
kernel-mode for the SPE). That said, we need to remember there are no
system calls; instead there are the PPE calls (stop instructions).

It is also a big endian architecture (keep that in mind while reading the
following sections).
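Since the SPE is big endian while a payload is often assembled on a
little endian host, it helps to never rely on the host byte order when
writing 32-bit words into a buffer. The following is a minimal sketch in
portable C; the helper names (put_be32, get_be32, payload_append_word) are
hypothetical and exist only for this illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit word in big endian byte order, whatever the host is. */
void put_be32(uint8_t *dst, uint32_t word)
{
    dst[0] = (uint8_t)(word >> 24);
    dst[1] = (uint8_t)(word >> 16);
    dst[2] = (uint8_t)(word >> 8);
    dst[3] = (uint8_t)word;
}

/* Read back a 32-bit big endian word. */
uint32_t get_be32(const uint8_t *src)
{
    return ((uint32_t)src[0] << 24) | ((uint32_t)src[1] << 16) |
           ((uint32_t)src[2] << 8)  |  (uint32_t)src[3];
}

/*
 * Append one word to a payload buffer of capacity cap that already holds
 * len bytes. Returns the new length, or 0 when there is no room left.
 */
size_t payload_append_word(uint8_t *buf, size_t len, size_t cap,
                           uint32_t word)
{
    assert(buf != NULL);
    if (len > cap || cap - len < 4)
        return 0;
    put_be32(buf + len, word);
    return len + 4;
}
```

With this, the word 0x00018ff0 always lands in the buffer as the bytes
00 01 8f f0, which is the order the SPE expects to find in local storage.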
This architecture provides many ways to avoid branches in the code for
maximum efficiency. Since that's not a real concern while exploiting
software, I'll skip it, and I'll also skip the SIMD instructions. For more
information on that, refer to the SPU Instruction Set Architecture
document [12].

---[ 3.1.2.1 - Registers

I already explained a little about the way the architecture works, so in
this section I'll just cover the available register set and how to use it.

The SPE does not define a condition register, so the comparison operations
set results that are either all 0 bits (false) or all 1 bits (true), with
the same width as the operands being tested. These results are used for
bitwise masking, instruction selection or conditional branching.

As on any other platform, there are general purpose registers and special
purpose registers in the SPE:

  - General Purpose Registers (0-127)
    Used in different ways by the instructions. In the second word of R1
    you have information about the amount of free space in the stack (the
    room between the end of the heap and the start of the stack).

  - Special Purpose Registers
    The SPE also supports 128 special-purpose registers.
    Some interesting ones:
      * SRR0 - Save and Restore Register 0 - Holds the address used by the
        interrupt return (iret) instruction
      * LR - Link Register - All branch instructions that set the link
        register will force the address of the next instruction to be
        loaded into this register
      * CTR - Count Register - Usually used to hold a loop counter (like
        the loop instruction and the %ecx register in the intel x86
        architecture)
      * CR - Condition Register - Used to perform conditional comparisons

To move data between Special Purpose Registers and General Purpose
Registers we have the instructions:
      * mtspr (move to special purpose register)
      * mfspr (move from special purpose register)


---[ 3.1.2.2 - Local Storage Addressing Mode

To address data in the Local Storage, the instructions use the following
structure:

    Instruction_Opcode    l10_field    RA_field    RT_field
          8-bit            10-bit       7-bit       7-bit

Where:
  The signed value of the l10 field is extended with 4 zero bits on the
  right and then added to the preferred slot of RA, forcing the 4 rightmost
  bits of the sum to zero. Then the 16 bytes at that local storage address
  are placed in the register named by the RT field. From the architecture
  point of view, the preferred slot is the leftmost 4 bytes (not bits) of
  the register.

It is important to note that the IBM convention specifies that:
    l10 means a 10-bit immediate value
    RA  means a general purpose register to be used as source/destination
    RT  means a general purpose register to be used as destination (target)

Knowing that makes it easier to understand why the Local Storage Address
Space is limited to 4 GB. The actual size of the Local Storage can be
obtained from the LSLR (local storage limit register). All effective
addresses are ANDed with the value in the LSLR before being used.


---[ 3.1.2.3 - External Devices

The SPU can send/receive data to/from external devices using the channel
interface.
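As a short aside before going into the channel details, the local storage
address computation from section 3.1.2.2 can be sketched in C. This is only
an illustration under stated assumptions: the function name spu_dform_lsa()
is hypothetical, and the LSLR value 0x3FFFF assumes the 256 KB local store
whose top address appears in the memory layout.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the lqd/stqd local storage address computation.
 *   i10     : raw 10-bit immediate field taken from the instruction
 *   ra_pref : preferred slot (leftmost word) of register RA
 *   lslr    : local storage limit register (0x3FFFF for 256 KB) */
uint32_t spu_dform_lsa(uint32_t i10, uint32_t ra_pref, uint32_t lslr)
{
    assert(i10 <= 0x3FFu);        /* the field is only 10 bits wide */

    /* Sign-extend the 10-bit field, then append four zero bits. */
    int32_t disp = (int32_t)i10;
    if (disp & 0x200)
        disp -= 0x400;
    disp *= 16;

    /* Add to RA (wrapping at 32 bits), apply the limit register and
     * force the four rightmost bits to zero. */
    uint32_t sum = ra_pref + (uint32_t)disp;
    return sum & lslr & ~(uint32_t)0xF;
}
```

For "lqd $0, 16($1)" the immediate field holds 1 (16 divided by 16). Note
that an access just past the top of the local store wraps to address 0
because of the AND with the LSLR, which is the wrap-around behaviour
mentioned in the exploitation section.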
The channel instructions use quadwords (128 bits) to transfer data between
the general purpose registers and the channel device (which supports 128
channels).


---[ 3.1.2.4 - Instruction Set

Here are some useful instructions to be used while developing a shellcode
for the SPE.

Instruction / Operands / Description / Sample
-------------------------------------------------------------------------
lqd (load quadword)
    Operands:    rt,symbol(ra)
    Description: loads a value (16 bytes) from the Local Storage address
                 pointed to by RA into the general purpose register RT
    Sample:      lqd $0, 16($1)

stqd (store quadword)
    Operands:    rt,symbol(ra)
    Description: the contents of the register RT are stored at the local
                 storage address pointed to by RA
    Sample:      stqd $0, 16($1)

ilh (immediate load halfword)
    Operands:    rt,symbol
    Description: the value of l16 is placed in register RT
    Sample:      ilh $0, 0x1a0

il (immediate load word)
    Operands:    rt,symbol
    Description: the value of l16 is expanded to 32 bits by replicating
                 the leftmost bit and then written to RT
    Sample:      il $0, 0x1a0

nop (no operation)
    Operands:    rt
    Description: this instruction uses a false RT and nothing is changed
    Sample:      nop $127

ila (immediate load address)
    Operands:    rt,symbol
    Description: the value of l18 is placed in the rightmost 18 bits of RT
                 (the remaining bits of RT are zeroed)
    Sample:      ila $3, 0x340

a (add word)
    Operands:    rt,ra,rb
    Description: the operand in register RA is added to the operand in
                 register RB and the result is written to RT
    Sample:      a $0, $1, $2

ai (add word immediate)
    Operands:    rt,ra,value
    Description: the value (l10 field) is added to the operand in RA and
                 the result is written to RT
    Sample:      ai $1, $1, -32

brsl (branch relative and set link)
    Operands:    rt,symbol
    Description: execution proceeds to the target instruction and a link
                 register is set (the symbol is of l16 type and is
                 extended to the right with two 0 bits). The address of
                 the current instruction is added to the symbol address
                 for the branch. The address of the next instruction is
                 written to the preferred slot of the RT register.
    Sample:      brsl $0, 0x1a0

fsmbi (form select mask for bytes immediate)
    Operands:    rt,symbol
    Description: the symbol is a l16 value used to create a mask in the
                 register RT by copying each bit eight times. Bits in the
                 operand are related to bytes in the result in a
                 left-to-right correspondence.
    Sample:      fsmbi $3, 0

bi (branch indirect)
    Operands:    ra
    Description: execution proceeds to the address in the preferred slot
                 of RA. The two rightmost bits of RA are ignored (supposed
                 to be zero). There are two flags, D and E, to disable and
                 enable interrupts.
    Sample:      bi $0


---[ 3.1.3 - Exploiting Software Vulnerabilities in SPE

First of all it's important to make it even clearer that it is impossible
to, for example, force the SPE process to execute a new command (a.k.a.
execve() shellcodes).

The same holds for network-based library functions and others: as already
explained, we need the PPE to proxy those for us.

So this opens two new paths:

  - Create a PPE shellcode, to be used while exploiting PPE software
    vulnerabilities, that will spawn a proxy for commands received by the
    SPE and will create a SPE thread to do all the job
        -> This is pure PPC shellcode and this article already discussed
           everything needed to achieve that. In the attached sources you
           have samples in the directory cell-ppe/ [16].

  - Create a vulnerability-specific code for the SPE that will print out
    internal program information related to the exploited SPE. This is
    especially interesting and difficult because:
        * You need to remember that the SPE uses an instruction cache, so
          sometimes, if you overflow just a small amount of bytes, it will
          be especially difficult to get them executed
        * If you use the wrap-around characteristics of the memory layout
          of the SPE, you will probably also overwrite the information you
          are interested in.

On the other hand, it's important to say that all the information will
always be in the same place (or, easier to understand: there is no ASLR in
the SPE). Running the attached samples (especially the SPE-SPE
communication one, because it prints the pointer addresses) will make this
clear to the reader.


---[ 3.1.3.1 - Avoiding Null Bytes

It is important to avoid null bytes, so we cannot use the NOP instruction
in our shellcode.
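A quick way to check whether an instruction is usable is to scan its
encoding for zero bytes. The sketch below does that in C. The function name
first_null_byte() is hypothetical, and the three instruction encodings are
my own reading of the SPU instruction formats, so treat them as assumptions
and compare them against the spu-objdump output of your toolchain:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the offset of the first 0x00 byte in buf, or -1 if none. */
long first_null_byte(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        if (buf[i] == 0x00)
            return (long)i;
    return -1;
}

/* Assumed big endian encodings (not taken from the article). */
const uint8_t insn_nop_127[4]  = { 0x40, 0x20, 0x00, 0x7f }; /* nop $127      */
const uint8_t insn_ori_1_0[4]  = { 0x04, 0x00, 0x00, 0x81 }; /* ori $1,$1,0   */
const uint8_t insn_or_1_1_1[4] = { 0x08, 0x20, 0x40, 0x81 }; /* or  $1,$1,$1  */
```

Running the scan over a complete payload before sending it catches any
instruction whose immediate or register fields happen to produce a zero
byte.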
Avoiding NOP creates a problem, since the ori instruction will also
generate null bytes if used with 0 as an argument (e.g: ori $1, $1, 0).

A good replacement is the or instruction (e.g: or $1, $1, $1) or the usage
of multiple instructions (which will reduce the probability of your return
address landing on a valid spot).


---[ 3.1.4 - Finding software vulnerabilities on SPE

The simulator provided by IBM has a feature that monitors selected
addresses or regions of the Local Store for read and write accesses. This
feature can help identify stack overflow conditions.

It is invoked from the simulator command window as follows:

    enable_stack_checking [spu_number] [spu_executable_filename]

This procedure uses the nm system utility to determine the area of the
Local Storage that will contain the program code and creates trigger
functions to trap writes by the SPU into this region.

It is important to notice that this approach only looks for writes to the
text and static data, not to the heap.

Of course the same approach used by this feature could be used to help the
creation of a fuzzer using TCL scripts based on the one provided.


------[ 4 - Future and other uses

I can't foresee the future, but this kind of architecture is becoming more
and more common and will open a wide range of new vulnerabilities.

The complexity behind this kind of asymmetric multi-threaded architecture
is even higher than in the normal ones. The lack of memory protection will
also help attackers subvert those systems.

The main processor being based on an already well-known architecture
(powerpc) also helps the dissemination of malicious code.
Many other researchers are doing stuff using Cell:
  - Nick Breese presented the Crackstation project at BlackHat [5].
    Basically he used the SIMD capabilities and big registers provided by
    the architecture to crack passwords
  - IBM researchers released a study about the usage of the Cell SPU as a
    Garbage Collector Co-processor [14]
  - Maybe there are JTAG-based interfaces on the Cell machines to try to
    use RiscWatch [15]
  - Unfortunately SPU access is controlled by the PPE, so running
    integrity protection mechanisms from the SPU seems infeasible
        -> Anyway, I wrote a network traffic analyzer using Cell as the
           base architecture.


------[ 5 - Acknowledgments

A lot of people helped me along the way in the research that resulted in
something fun to publish; you all know who you are.

Special thanks to the Phrack Staff for the great review of the article,
giving a lot of important insights about how to better structure it and
giving real value to it.

I always need to thank Filipe Balestra, my research partner, for sharing
with me his ideas, feedback, comments and experience, which improved the
article and the samples a lot.

I'll never forget to say thanks to my research team and friends at RISE
Security (http://www.risesecurity.org) for always keeping me motivated to
study completely new things.

Be sure that the unix-asm [16] project will be updated soon with all the
stuff shown here and many more types of shellcode for the architecture.
Also, of course, the updates will be available for Metasploit.

Big thanks to the Cell Kernel guru, Andre Detsch, for sharing with me his
ideas and discussing the internals of the Linux implementation for Cell.

Thanks to the conference organizers who invited me to talk about Cell
Software Exploitation: even after many people had already talked about
Cell, they trusted that my talk was not about brute-forcing (yeah, a lot of
fun in completely different cultures).

To my girlfriend, who waited for me (alone, I suppose) during these
travels.
It's impossible not to say thanks to COSEINC, for letting me keep doing
this research using important company time.


------[ 6 - References

[1] Cell Broadband Engine Architecture, v1.01 October 2006
    http://cell.scei.co.jp/pdf/CBE_Architecture_v101.pdf

[2] Sony Computer Entertainment
    http://www.sony.com

[3] Toshiba Corporation
    http://www.toshiba.com

[4] IBM Corporation
    http://www.ibm.com

[5] Breese, Nick; "Crackstation"; Black Hat Europe 2008
    http://www.blackhat.com/presentations/bh-europe08/Bresse/Presentation/bh-eu-08-breese.pdf

[6] IBM Power Architecture
    http://www-03.ibm.com/chips/power/

[7] IBM Bladecenter QS21
    http://www.ibm.com/systems/bladecenter/hardware/servers/qs21/index.html

[8] IBM Roadrunner Supercomputer
    http://en.wikipedia.org/wiki/IBM_Roadrunner

[9] The cell project at IBM Research
    http://www.research.ibm.com/cell/

[10] Cell Simulator
     http://www.alphaworks.ibm.com/tech/cellsystemsim

[11] Cell resource center at developerWorks (SDK download)
     http://www-128.ibm.com/developerworks/power/cell/

[12] Synergistic Processor Unit Instruction Set Architecture v1.2
     http://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/76CA6C7304210F3987257060006F2C44/$file/SPU_ISA_v1.2_27Jan2007_pub.pdf

[13] Moore, H.D; "Mac OS X PPC Shellcode Tricks"; Uninformed Magazine 2005
     http://www.uninformed.org/?v=1&a=1&t=txt

[14] Cher, Chen-Yong; Gschwind, Michael; "Cell GC: Using the Cell
     Synergistic Processor as a Garbage Collector Coprocessor"; 2008
     http://www.research.ibm.com/cell/papers/2008_vee_cellgc_slides.pdf

[15] RISCWatch Debugger
     http://www.ibm.com/chips/techlib/techlib.nsf/products/RISCWatch_Debugger

[16] Carvalho, Ramon de; "Cell PPE Shellcodes"; RISE Security
     http://www.risesecurity.org/papers/lopbuffer.pdf

Others:

  PowerPC User Instruction Set Architecture, Book I, v2.02 January 2005
  http://moss.csc.ncsu.edu/~mueller/cluster/ps3/SDK3.0/docs/arch/PPC_Vers202_Book1_public.pdf

  PowerPC Virtual Environment Architecture, Book II, v2.02 January 2005
  http://moss.csc.ncsu.edu/~mueller/cluster/ps3/SDK3.0/docs/arch/PPC_Vers202_Book2_public.pdf

  PowerPC Operating Environment Architecture, Book III, v2.02 January 2005
  http://moss.csc.ncsu.edu/~mueller/cluster/ps3/SDK3.0/docs/arch/PPC_Vers202_Book3_public.pdf

  Cell developer's corner at power.org
  http://www.power.org/resources/devcorner/cellcorner/

  Linux info at the Barcelona Supercomputing Center website
  http://www.bsc.es/projects/deepcomputing/linuxoncell


------[ 7 - Notes on SDK/Simulator Environment

There are some pictures of the simulator and SDK running in the attached
file: images/cell-sim1.jpg and images/cell-sim2.jpg

To install the SDK/Simulator, do:

  - Download the Cell SDK ISO image from the IBM alphaWorks website.

  - Mount the disk image on the mount directory:
      mount -o loop CellSDK.iso /mnt/phrack

  - Change directory to /mnt/phrack/software:

  - Install the SDK by using the following command and answer any prompts:
      ./cellsdk install

To start the simulator:
      cd /opt/IBM/systemsim-cell/run/cell/linux
      ../run_gui

Click on the 'go' button to start the simulated system.

To copy files to the simulated system (run this inside it):
      callthru source /home/bsdaemon/Phrack/hello_ppu > hello_ppu

Then give the correct permissions and execute:
      chmod +x hello_ppu
      ./hello_ppu


------[ 8 - Sources [cell_samples.tgz]

Attached are all the samples used in this article, to be compiled on Linux
running on a Cell machine.
Further updates will be available in the RISE Security website at: http://www.risesecurity.org For the author's public key: http://www.kernelhacking.com/rodrigo/docs/public.txt begin 644 cell_samples.tgz M'XL(`#EF&$H``^P]:W/;.)+Y:E7-?T`EF5G;D62^*2>9U#E./.-:QW'9SLW- M;;9<%`G9O%"DAJ3\F*O[[]<`2`E\2K)H64Z`LBV+`!J-;C38W0`:-O:\B\@: MCCP<[3Q[F"1!,G6=?D+*?]+_95F1#,F`/\HS258467F&]`?")Y/&46R%"#T+ M@R"N*S_@_OCR^9'P?S\UW5#@7*RJDJFX/\J4@G_Z=^+T6CD'Y__.=@MMH_U@=.?A08PV][>0 M(DF[Y.'&:>"$[F6`3L=]-[30^]#R[0!MOC_[8.%AX&\A*+7Q-F2E+OHL>]L/ MSD[V/FVC$$?8"NVKKAU$V/5M^!R^:VWOM%JM%_#5&SL8O8UBQPVZ5^^X1Y[; MCT:8/&OAVQB'/H*O%Z,PN`RMX<65Y3L>OH@1&Z71:/RFU7+]&`TMU]^\#EQG MJ_6_K0WR!!@;CZ,WK0VH[SI0A7Y"TIO6_[4>F\$S4HG\ M?[*^X8'KX<;:F"7_BF:F\B_!W`_RKYFJ)N1_%>G#X>D9>DU$8=S:/SC:^XU^ MZURVCCZP;Z_@VQ]>N_/)&K7[D4,EOSNT1JV3T\^_G>Y](B\*4F7RUIAD`$CT MZU1.6X>?3CZ?GJ>M[4PRNA;J>"!(K70*>+EY_OED:V<(X[`[`++C<-VEZ.FF M$ODGS&FTC07T/WC_FT3^9=D0^M\J4@7_I\+9@`XX8_Z7)45)^"^;F@'E9$-1 M93'_KR*MG_XW]EUXFGT&#T`MS#ZS^_AB.+!S!4=C\I#!3/5'H@Q>X!B^O3OY2K3"3!D"FBM#OA;* M^`Y7PG>R^5$__+8I;:4EDJ_9,D/+\P)[$P1(FQ3DG\T)\4DIK"(UFBKF_T9M M@%GZOVRF\[]FFII$['](8OY?1:I2UA-;@)D"A^]/]T[_O,##/G;X4EUKHM3_ M"EK[V8=_0E&2L;4#T[4-V4*E7_.4D7_/M;$?X6Y\V^A0GRG_H/,1^=<54]85 MXO_3-$GX_U:2'D/_>^P^BS1-&?FWHN%#K`$N9/\KY/VOF*8L[/]5I`+_P?RZ MA<^.W%6[4F<4W.!PV3$Q/_]50U4,:O_K@O\K27/QWW-]]E^G[SM18'^S`P<; M6O=LOC9FO/]U0R/ZORP314`EZW^*0MR`XOV_@K2S_1-YUZ.7A\YK5,EGI$E$ M+>AU9+DCFTA27FOZ:U7Z;Q1:H`B@EP0&@U,-HH..7']\BT[HD-I__W$/'>/X M)@B_H0B'USA$I"B#0O014"JN8MHJ.J6M@!FQ;X77EG<5H/^T/`^CM[3Y_PC= M"'0->QRZ\5TW""_?3=$YOW(CP`ETD_`.P;^#$&,4!8/XQ@KQ&W07C)%M^:"K M.&X4AVY_'&/DQLCRG9T@9""&@>,.[LC3L>\`EO$51F"_#",4#.B7WXZ_H",< M02?0;]C'H>6ADW$?=&E6_X@IUG@%4*'/-Z[GH3Y&XP@/QEZ;`8'BZ(_#\]\_?SE'>\=_HC_V M3D_WCL__?`/%XZL` M_TDZ^AD[_3\O+Y[&,7H3.,4YHGU"TC M_(3F`X`Z#("T#HXMUXLX0OP)K(\`5\]!5]8UAB%@8_<:,+5@'([N%F6N%_B7 MM/=0:TKA-\@=(#^(V^@&!B>@'A39S@!->=]&AZ`LMY$N0SG+_P82A40^7Y-_,C!?L[*WP$R2?@Y5N3WYP_(\UTORE-WV MQ<7^WE':P@M$`.$X_6Y;DY*]=D>7Y5?R)JFU5<@V6;;"9T>Q,Y[@T.E!EES( 
M(7`KF@A,%QJVD#2)*2 M;U8Q"`VGI.J[OE-"*#TAE)%"9R7HP")I&Z85?CS]=G(:$?MO[,5D4OD;AT&$ M;JY@2H!IQ&$CT'/'D`>3S]B.DRF*@8(FM#;\E0WRMT?^2&EF^KF3HD"@)$BJ M;>EVVL<@9"."9B2YDN8H*4O2:H2-6H99',!,#DL]@*G'NKC3:KUP\,#U052.S\\^[_]S[\.'4QGI M6MES!>F]_'.RC(N0H;1:]A5,B)P(_^O?OZ(T[0#^6]/56D@DP#HCGLO^)T0@QRF$1V:1\J:6+"D)M@$`!#844JE6&G+BJK5T637F"4[S&RO'_8` MQ*X'HLP&HG*R@[-`DNZ`["BS94?5ZV0'3,9RX2FCB:*.1N(!J/64:N`$)H8]4!XFFA2'4VT M:L(V0A,5!IO93P'E,+&[Z;N8.#>J@6@P(?4D!@3S([9_B5^1_]]RBM`K6>J] M*R,LC%@+I%@A2LYN5LFA3I$0YFGR4]<=BHF2=,?A,?&Q]ZJ`B:J48@*5[1XA M*F!D\"Q.G"VAFG]]EA`6B.K@$G5K*CO`8^:BJ09"!!"S<:+H%0((,#J:F2<+ M-Y\`D'X*)#N?,'=H1F^3>'HJ@6BS9K:JV3$'I%Y3JGKM/,3,E@J@9-<( MH"Y5:TH-"&#JHFJ]$1NZGG!:T/\W\>(NY\0@,:I2V[1[#_KNO_6]H M3>W_D51=D?/VOZ:*\]\K23/L?\+F)3P`M+KP`0@?@/`!/*0/8"IN^;TZW[V" M;&CO*H`(!;D(9-#C%.3=#!!GHB#W9NBV!(A1"T2=#>2[59`?^X6^8%IB__>< MIW]GG_^%:2VO_^FF(?2_5:2*\[\9/M_G]&\6P)Q;P,7A7W'X5QS^7QWCO.BO)( MCY',F%*5[?4L/"\O\8>$))+VRC>H!9MQG&"6\;>Q@[1(G5XW5GUQM M\-SJ8YQ:'4^SR\=$Y1GU[_O4JDA-I07MO\$#Q'_2#,DLQ'_25&'_K2)5V'^# MY>,_Y4&4V(`'KN\D>IJP_X3])^R_];7_!A7!GP8/$/QIT&#PITLR\*H-/6IQR<7X.OE8.?-$D>(C[TSU<4[5E%,U MM*CQH-_9,Q85F/E%Y2R[OZ M>YHWM?7U*37(M$7`MFD14PE+?]4Y49V,Z82 M:Z=@*LDBR(\PEY;8_]/4^@\H^X7S/YHB]/^5I`K]/\/G^VC_60!U6X"$TB^4 M?J'TKZ72SPEQE6@WHO`7=9`?6@.90_]H3+F8]_T_Y>;B[YA9^W\-,[__5S)E M\?Y?28%=O]-*8CNOV,XKMO,N$*$E6+H\=961YBIK-(^Z'%*2A M%NK1,ZHDSY1*\B*6"9-\+G/BVX%,2X_9+;*^E4D]03['!UX MO'/E25:N--^57.DD*U>![V"N`LO*E>=[G2O/LG+E>5+DRG-9A(_8A]=:4QZC M^^__;>[^'UDS"NN_NKC_9R5I]O[?^WJ`\B#$'F#A#A+NH*?L#LI(=+6D-^(2 MRL!;]/_>+`)('(.X`$NL'8OW@Z=X!Q$<6$7<`+7\'T*#/ MW6-2'D-#K@BBD072JP7"+-&90.JC>3"C5=P!M"WN`!)W`&6!K,T=0$2*E7HI MKHKLPP&Q9D4'4N:;E+1Z(,8CS"?B#B!Q!U`5)HO<`61H_"U`CWD'$,%D>@N0 MB'OVH\4]$W<`B;1\NO_Y_\;B_YJZ5/#_&9*(_[N25.[_R[!Y<>]?MOH9OR$W3C-7*!-C6[[5JSNS?3[.9] M@17N&7.F>Z:Q6[B)=Z6F.\9<-PDO?7]O8S>3+\UB-;$,:XP8-F)KC9C4O*SV M2LH5=T-D%7;BGJG&A+AX2@=^$0C%I-2,#1QS^3= M@>OHGJ&>6KN.Q=13VZL=;)8*0'9+UA;2F!5L'Y:TNTGB550`40A-=DN:=*[,B"$)G+]4HDVJVA*W:ERXK` 
M*.E,'@A6F7>RTN]5>&_EN[-6?B^Y5^OWHD19TN_%LWC7*&7QBOQ>/":R(HF[ MKW_<"[&$WZO)M*#_IY\)]]7,_B]9+=G_I0G_STI2N?\GQ^;%/4!Y`#7[O\#B MO@:C6[B!A!M(N(&:<0.]Y]Q`>B]Q`V5$,G=3M")V3W5QY'T%A-U?!3]"P+ZF1W5S4^VG4>C^5U>VALNN!*$]U#U7I M]*A6S(YK*<6-3=2-`%E#%E=Y*QB+\W[E=61Q\S0Q:FF2WRNWKC0Q]=J7ESSS MY=7,_D/,G-Q6G1]6>5)^V!7L/\RH[J]D[?'V'^8P,<7^0['_4/AAGW"ZO_^W MN?M?#:40_ULWA/]W):DB_E.&S_>)_I0%,)_[5X1^$J&?1.BGM0S]Q(ESE9`W M$O:)@[A609^^Z^M?27C,IQSQ:=K'THA/)8&=:MA8>:.FHBQXD6LA1-1R%[D2 M+C5Z1Q()A$0"E\Z^%VLNPMVK4PR#AKO%XJN6!Z2J0H35>1AQDU-Q6R)B5J,7 M0HF[Y_;63_#SG_I8OX3X^49I[_NM_^GSP`<09,;/X1FW_6X@Q8 M(923.`,FSH`UO##=V%ZFI=?9O^J3TDEJI]3S= MV$#73?4I-1Y%GFU_%.5&]G-+/*R M=@J+O+)8Y,V/S^]^D;]HZ-_ERG_$UDM%^]&%/\7[L#!`P0H@&:2 M7G//:24>DRJ(F5GN[/\G:)-W,$A8K\5@30V M43<"9`U9+&*_B=AO54!$[+Q_UK6J=/1#K$58V2AZ"XB M>_[08.S3*$$D8I#K,$_@V0GU!)X!/.+2'1)G'I2]0S[&#G;0T,=C0,VUXXCX M>/T@1M9UX#KH^,O148W;@QY;K3]VG5AB[ML?>9$BP:C\AS4@JZV/%<_QM#"&@7 MXR%9SO.BI($/@?\/VE?/^ONN37J,0,@[5D1]E#B*2:$6S*L;77P;X]!'.`S] M@,S'-YABQAR=4#KQ,;)\LIA/24^KLF\;(1YYEHTOAK@%#V,`"!^6YU[Z2&FU MIKFO"?CDZW23`MD1FO8S'070;,JG#MV[``,"7W91,,)^&]E>0'SBS.=*$'&] MJ_'&2Z6-)/3V+>IM;'#M2-.64NC^>-@'(I.:4?R7L_%2;:..;&Q+'579?!F- MMBKJ)_4`UY$5`M9`MPCPHX2BH_D2QVS?A1O"2)L4XEK2TI9DHY&6(@Q#T\DU MQ6""8+&AV@^N*R#&[A`3N`X=+3"6*=@`7NO_0T2%-C!"FQ)%M9V@W$Z(U.UV MMTAKWE_AQDNC3;WJP.0HLBYQ:\-R*5FA(.OQIO1*6:BWC*\6)9ETJQY`HGP% M]-+.@_8!\PTK&F&/JF=D%-!?TKB6UOCF)P*=5ODEW1%#*]LWP!J]C33:,0!V M-1[T-R:@H',O=0J*#>JDD[1JII^JDO)9H=]2<'>^W:)L.0^@__$8Q(U,8\!. 
M_Y)"Z;N1Q]I3"4$=JO6DPY*2[AZC)6EI')%60GQ)PFZ%2"4+5/AVA&VR&F39 M=A`ZI``!M/?^D!'SRONK?^=NI(24%=K[S$R0C.F0%J*/-C9(1YC&]E+:2*20 MXD`*PYS0I_3S-DC`-ZG%#YC7!%@PVI!N%5F26=V]&^L;>P&A41BUU;NLU]WSXPDA`0FBV4GGMZ4%XVD>?4\NG_3W3(@73?P<=:X@[HL MT2.7_Y;X+TSXMCD.-N*_9@'_ZW5-\3\7RN0__*X-ME/&&OU/JS?T&/^;J/^9 MNK+_R87^X?7&:$1HO[5/C]OMPS?']LL_VSNA.4$R?:<8W$Q=?(,'<65_[10+ M<]#/^EPC"AASA^[('0>V#\I."\7\-OR04KVXZ;-"@?&0KHFW42IKA6@/;OU" M3.N!K!M9T1R='J(&,"-#BD0^9/(P=;H7,*8N6V$^)*^!2-!Q.I!!63=^XX8/ M")G)RGR34H)]-?=;<0O'^V;4'='2_#\%00NZVMUB&>OFOVZ:(?ZCF?5P_5?S M_^[I[/S#F_/#4WLZG1<.&`R`XM')>1MFYP&"!,634S0D@NL#5D-H$(>(#?^O M.6QWZ$_=HK3X_[7\\<-9Y<4(1D^M!UWESG[6&?-ST=+\1\!JY-H$6]2"ZVV, M^?7S7^[_IJ73_&_J#37_0;*^(A\['7X9LK'QWLRD$(#M'8 MKNN@3>UD1#OQR4ODO-_E;>WBLIC7KOZX9H?[7P(4?]3^U_N=#H]3FWP M>\R1M MLI0[SJP_3;WQC+GC+],*Z:(Q_8V-F&T[@7!WL&U6+A/H#.^B]E>I5$C#*T#[ M,"1)^>FH*LI`U772*X\J56;J5:;!/_'L"_[5D\#ILRLOH)US^-6Y\9G.\/QT MUQ]XO2!24_$Y:%#'ZZ%3!*FI6!H=5]EPTQXY_N>ROK]OZE$!YR""T0GB5\>C MHYA0XX6-&29JI/)B7BBP458PD(.Y;V.T%)'7=`8=W2N7I`I]P)YJZB=$I=_[>,`:R5_QM6N/Y;IL[M/PRU_N=!0O]OXWY? 
MV",$`'\6WY^\/#\\_Z_MCJ[<;NQ&S8EC`K^6VT?OX%&\4WD!BW\'[LNEOZ`P M@8=/2_,?-^DMR7V2ULQ_P]`B^0^_!0/R?[.AXO_D0MS_-S_S+W)5C"3&SL#M M@L`82X)%!`2FQ30I;,:2R,I@Z2DN?L:DST71LU@4!DY0`AHY]F?.R!Z`I#4$ M84RN<*UB))ZBY1B(E<4"OM#W0,YB\+=%UW1%_X=K?(/+7U7VK.L$3HLE1-$+ M_3)+&I7"*+<+&;@S%^5*,ED`86[28<*Z`;+$K$&,*V-YSRKB/KTOWGXEC.C& MK'UVS/JSR7S*S6=Z4![4G1U0X[FMG4WWR^U7;X^/[`\?WQZ?5[6J7JFP@P.R MEZ.F,]834DRAJ;"G6%25V("F/ZB!A'QA"UW$ M4ADC2LO@3':O8!:H=Y2%'O94#FM-#K%_#V[0Z)"A*=,?-+A@7/LC&-&V_64R M=`*0''%XE_"!$MNC_THC6#)F-R7*)-0XBM^41'"_E-S_=X'M]X?_-4V0!@C_ M,Q7^EPNE\G_+-D`;\-\P+#S_:>B6IOB?!ZWD_Y9L@-;I_X9N)?C?U#6E_^=" MTO[GK7UX='0>V?W(ZRCE7^ES\,ME.,V0:Q@D0`W9TX`$A&0/!WY/F=,`_Z4W/"K.A/S=E982RD9OX] M4>K\%YS.Z?RW7C>-30OPBEJDN!(GH'&OPB$V80 MF^:]GCM#R(3+2I`D@"K7C[`F_A0[$'"@3;`B@H)5J\(R0*,(,XJ_Q/0UJ)DH M,:Q$LE!Q<,YK6]F\?&,M:C=R1[X;B#;3T3Q'/PO`^\[TAH5W2FWI5E>KU2A3 M0J\^S&="9IW/A-N>;,SN[RBF0I-DM\=O8C3$#V#H2)1 MO%M@>:N@/)V#>0++6POE+=917UU'8TMU-#:K(W'D"/4+Z78HG3*A(!Y`$<:& M,V2D4A"WT!RBC).$>?OAY(&+Y\\K4$Z,84(-\63#^VY@#WW9']ZEA#-?0X:N MTQD08"Z:L:8@[`P$0N/F&>A*P":]'@Q.Z1B>K'KAZP!#LI=_P>H@K&I/YH$] MNII M+60^DE(/Y)I$R+%GGF#L3)\*(_L3)Y!0N;>$EJ>5%V/X&P\!>G*YA\NZ&3R1%/(O\5DQ8P_PVOQ-K5/WKP[>?]^>V<-<@RCWS"+ M(K]^*CWQ/Y5H_/+^?J0'$ZGR_Y9]`-;J_Z'_7].H-TS$_ZRZ.O_/A>[*_A]D MQ`[N1,.;*@]5(7+!HL9%M^L%42`4MLM&WIB"-W=Q1W-XO!.T@+R:]WO>-2SB MCV0ZYDZI\W_+/@#KYG]T_M.P-`OU_[H)2X(Z_\F!,OE/P-UVRMC@_,_4=([_ M&HK_N=!J_A-P^[=1P#7SWP#6A_M_@_P_FNK[?SG1:H^."'B[G4^(0-\2_A^I M+B#R6!'U$WLRYHX0U^B+G^(@<5UIL51?A_&-\'58--G;*>)YY"*(MA,#T3J^ M.(^,I'5S5RU#(%[5QJ=Y([Y*V.Z9`8Y5YX2NCH)@F%FU+/8+[)G*W@,Q!DG55AJC'2ZZ?C)"E,6U:B7B1.1$TYA M8<1()D6U7$!E%AJY5`G^PJN!,^[S5@EV!9,)?VNQNUX+ M-')+"0Y`)?'#*:$;SW0>@9J\^SL"D*S2K^_HBR8Y]')&*&X M`*&KR92"S/%7I[-)!^$DBC>'*CLTB.='B$(9;3:_*?>A'Y!6[__;,018J_]K MH?^/!<(`[O_-IJ;V_SQ(GO'[>,:_1V"?L;/H_0/;D[B!9_CB\+^0X?VSG$@" M1.SP/\4M2"T;]T:KYK]^G_J?LO_,A5;S/__U7[=TOOZK^-^Y4,KZKV>M_[I: M_W\Z6CW_B>EWCO\8C4:(_YB61OA/7=E_YD(*__EA\1_MCO&?'Q/^H>`I,2PG M!%CZ,:NQ9(G?#;0D&+(:(OVO 
M*?BO@?IOD?]?4\E_N1!%^]WCP7Y?O7Y_^(:N=OL+SC^0PH<&7!3C&J-,CX<* MYIF]"&^H4,$/F3+G_Q9M@#98_W6K0?[?9EW9?^1"*_F_I3U@W?JO6T:X_AOT M_;^FI?"_?"AK-1=[`6T%BVA@[*E8,+@UL>#4FO]`:>7\CQC]MS#`=?/?C.(_ M:G43\3^K;JGX#[G0@_/_C,(09R".J5Z5$H9YBR.6G0UF3N?S+\*7\/$Y=6Q` MF?,_%/CO'/_7&O5(_JL;W/X#Q$`U_W.@!S?_(__OE0[6X=:T%*X1EX,H'.-R MK$:1D.T.&V:=XA/+*JTH1!Y;BI&G_W!>9`OSGY99.H?8:@BXC?2_ILG]/YI* M_\N#LOB_S1!`WX'_:>K[7_F0PO\>-V7-_VVZ`&ZV_C=5_,<<:17_M[4'?`?^ MU]24_UC!RGDFPAT=AM'3HH]0 MRTS]RAY[,N7_T'Q,UE[^,.0/DT<5FKEN63Z#)<<3C&2"N0Q/9,W_+<)_&^%_ E3?K^2[.AJ?._7.C!S7^%_RE2I$B1(D6*%-TI_1]!W`^D`+@!```` ` end --------[ EOF