BSides London 2015 Challenge – Toxic PDF

Introduction

I attended BSides London and, after it was done and over, I found out they had challenges! So I found some time to play with one of them, toxic_pdf! It has been a while since my last PDF challenge, so let’s give this a go!

This challenge is also themed, but more on this in a bit!

Ghastly grim and ancient raven wandering from the nightly shore –
Tell me what thy lordly name is on the Night’s Plutonian shore!’
Quoth the raven, `Nevermore.’

The Challenge

So the challenge is a PDF file called crackme.pdf:

/toxic_pdf $ file crackme.pdf
  crackme.pdf: PDF document, version 1.7
/toxic_pdf $ md5sum crackme.pdf 
  9a8e90fb547d8fd3c865ed74782af600  crackme.pdf

And the PDF looks like this:

Figure 1: PDF file for the challenge.

So, we’re knocking on a door and must know a password in order to get in. But how is the password checked? Could it be JavaScript?

PDFiD 0.2.1 crackme.pdf
 PDF Header: %PDF-1.7
 obj                   25
 endobj                25
 stream                23
 endstream             23
 xref                   0
 trailer                0
 startxref              1
 /Page                  1
 /Encrypt               0
 /ObjStm                1
 /JS                    0
 /JavaScript            0
 /AA                    0
 /OpenAction            0
 /AcroForm              1
 /JBIG2Decode           0
 /RichMedia             0
 /Launch                0
 /EmbeddedFile          0
 /XFA                   0
 /Colors > 2^24         0

Aha, I see. There’s a Page object (the PDF is a page long) and there is an AcroForm (Adobe’s equivalent of forms for PDF files), presumably there to handle our password. I know that we are thinking the same thing right now: 25 objects, a form, and no JavaScript. Right, yeah, seems legit.

Let’s keep digging! There are two approaches we could take now: either use qpdf to normalize the PDF and dump everything in the clear (and qpdf is good at this!), or manually inspect the PDF and go through a lot of junk. Let’s go through a lot of junk.

Investigating

The alert reader will have spotted it by now: there is an /ObjStm being used there! That is an object stream and (Quoth The PDF Spec):

An object stream is a stream object in which a sequence of indirect objects may be stored, as an alternative to their being stored at the outermost file level.

So basically an object stream has other objects embedded inside it. If we open the object in vim, we’ll get:

obj 913 0
 Type: /ObjStm
 Referencing: 
 Contains stream
  <<  
    /Type /ObjStm
    /N 21
    /First 163 
    /Length 1065
    /Filter /FlateDecode
  >>
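
To make the /N and /First entries a bit more concrete, here is a minimal sketch of how an object stream is laid out once inflated: /N pairs of "object number, offset" come first, and the objects themselves start at byte /First. Everything in it is an assumption for illustration: the two sample objects are made up (only the numbers 880 and 904 echo this write-up), buildObjStm and parseObjStm are my own hypothetical helpers, and it uses Node's zlib module rather than anything from the challenge.

```javascript
const zlib = require('zlib');

// Hypothetical sample objects, NOT the real contents of crackme.pdf.
const sample = [
  { num: 880, text: '<< /S /JavaScript /JS (app.alert(1)) >>' },
  { num: 904, text: '<< /AA << /K 880 0 R >> >>' },
];

// Build the decoded payload: N pairs of "number offset", then the objects.
function buildObjStm(objects) {
  let body = '';
  const pairs = [];
  for (const o of objects) {
    pairs.push(o.num + ' ' + body.length); // offset is relative to /First
    body += o.text + '\n';
  }
  const header = pairs.join(' ') + '\n';
  return { n: objects.length, first: header.length, data: header + body };
}

// Split a decoded object stream using its /N and /First values.
function parseObjStm(decoded, n, first) {
  const ints = decoded.slice(0, first).trim().split(/\s+/).map(Number);
  const out = [];
  for (let i = 0; i < n; i++) {
    const start = first + ints[2 * i + 1];
    const end = i + 1 < n ? first + ints[2 * i + 3] : decoded.length;
    out.push({ num: ints[2 * i], text: decoded.slice(start, end).trim() });
  }
  return out;
}

const built = buildObjStm(sample);
const compressed = zlib.deflateSync(Buffer.from(built.data, 'latin1'));

// What a naive keyword scan over the raw (still compressed) bytes would see.
const seenInRawBytes = compressed.toString('latin1').includes('/JS');

const decoded = zlib.inflateSync(compressed).toString('latin1');
const objects = parseObjStm(decoded, built.n, built.first);

console.log('keyword visible before inflating:', seenInRawBytes);
console.log(objects);
```

The point of the seenInRawBytes check is that a keyword counter working on the raw file only sees deflated bytes, which would explain a /JS count of 0 even when JavaScript is present.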

Right, we’re getting somewhere! The reason we missed some objects on our first pass is probably that they were compressed inside this stream and not readily visible (but still usable once the PDF is opened). Dumping this object with pdf-parser:

Figure 2: Analysing the ObjStm definition.

Aha! We were correct! It looks like objects 880 and 904 contain JavaScript, and some additional actions are defined as well (the /AA entries)! Extracting the first object found in the definition, 904, we get:

function vv56(h3m, n7) {return h3m + n7; } 
function mz821a(hu4, v9) {return v9[hu4] } 
function mmu7d(kjj, y7, y8){var kmn6 = y7 % kjj.length; 
var s = ""; 
while (s.length < kjj.length){s += mz821a(kmn6, kjj); kmn6 = (kmn6 + y8) % kjj.length; } return s; } 
var hnam4 = mmu7d("49%68%DC%79%C9%29%D8%DD%E9%19%4D%D9%18%9D%29%BD%19%B9%AC%3D%F9%C8%F8%09%8C%E8%E9%99%C8%FA%E9%78%9A%99%89%48%79%99%38%19%DD%C8%E8%89%D8%C9%7D%49%29%18%88%38%78%F8%18%9D%6D%C9%DD%99%E9%9D%E9%49%89%9D%9D%89%88%08%9D%38%E8%89%F9%FB%99%8C%D9%C8%48%89%39%DD%A9%8D%E8%6B%99%5D%18%88%D8%E8%1D%F8%39%39%2D%D8%39%3D%0D%DD%DD%EB%F9%59%AD%09%88%69%E8%E9%FC%38%8D%DD%C8%DD%EA%C9%19%19%F9%78%09%A9%FD%B8%C9%A8%DD%89%49%DD%DD%DD%DB%49%49%CB%D9%E9%FC%E8%B9%1C%6D%FC%D9%D9%9D%DD%A9%C9%1D%CD%A9%0A%59%28%DD%E8%29%E9%5D%49%3D%F8%D8%CC%48%29%99%48%28%4C%78%DA%", 199154, 198821); 
var as6z = mmu7d("%ED%B9%C9%08%38%E8%29%89%89%68%A9%28%DD%2C%8D%9C%AB%8B%89%9D%FD%F9%E9%ED%58%F8%ED%8D%D9%59%19%49%C9%BD%39%19%1D%9B%9u%B8%4D%B8%C9%E9%8D%DD%D9%09%F9%F8%B9%B8%BC%49%D9%C9%A9%4D%6D%99%D9%DD%88%C9%28%A9%A8%DB%39%48%6C%ED%28%F8%38%E8%19%4D%59%5D%49%4D%3D%99%3C%89%79%98%C9%5D%BD%3D%49%88%8D%5D%6D%C9%98%C9%B9%09%49%D9%09%D95E9%F8%98%99%49%F9%3D%F8%3D%FD%5D%89%69%8D%9D%D8%99%38%3D%49%4B%99%DC%F9%F8%89%E9%19%C8%59%99%9D%59%59%88%E9%89%18%A9%1D%3D%98%D8%4D%F8%49%49%DD%DD%F9%39%F9%38", 334776, 334478);
function mns51() {if (this.secret === undefined) {return mmu7d("mfoCr", 3436, 3438); } return  mmu7d("omfrB", 527, 528); } 
function alopre7(no, uv) {var yg1; yg1 = mns51(); return mz821a(yg1 + no, uv); } 
function enx(u7b, i8uy) {var coded = ""; var f = alopre7(mmu7d("ohCerda",  5748, 5752), String); for (var i = 0x39+3+~-~-~57; i < mz821a(mmu7d("elhtgn", 5071, 5075), u7b); i++) {coded += f( u7b[mmu7d("dtaoAhCecr", 8828, 8827)](i) ^ i8uy); } return coded; } 
function ty32(){return mz821a(mmu7d(vv56("a", "v") + "el",  2470, 2471), mz821a(mmu7d("ratteg", 4898, 4901), event)); } 
var my6 = ty32(); 
function dsm33() {return my6(mmu7d("nuepacse", 6169, 6175)); } 
var h7 = dsm33(); 
my6(enx(h7(as6z+hnam4), 0xFD));

Obfuscated JavaScript, who would have thought… Let’s analyse it! One can see various strings that are anagrams of standard JavaScript identifiers (‘nuepacse’ is an anagram of ‘unescape’, ‘elhtgn’ is ‘length’, and so on)! They are all passed to the mmu7d function along with two numbers, so presumably this function is responsible for restoring the strings to their original order. Another interesting one is ‘var i = 0x39+3+~-~-~57’, which is really ‘var i = 0’! A final point to notice, before we present the cleaned-up JavaScript, is that the function mz821a is used like mz821a(object_property, object), as in mz821a(mmu7d(‘elhtgn’, 5071, 5075), u7b), which translates to u7b[‘length’] or simply u7b.length. I took the following steps to simplify the script:

  • De-obfuscate all strings that use the function mmu7d
  • Simplify arithmetic expressions
  • Simplify all occurrences of the mz821a and vv56 functions
  • Substitute any expression with a simpler, more readable equivalent
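
The first two steps can be sketched roughly like this. It is not the quick script mentioned in this post, just a readable rewrite of mmu7d (unshuffle is my own name for it) applied to the scrambled strings from the listing, plus the arithmetic expression from the loop header.

```javascript
// Readable rewrite of mmu7d: start at (start mod length) and keep
// hopping forward by `step` positions, wrapping around the string.
function unshuffle(scrambled, start, step) {
  var index = start % scrambled.length;
  var result = '';
  while (result.length < scrambled.length) {
    result += scrambled[index];
    index = (index + step) % scrambled.length;
  }
  return result;
}

var recovered = [
  unshuffle('nuepacse', 6169, 6175),   // unescape
  unshuffle('elhtgn', 5071, 5075),     // length
  unshuffle('dtaoAhCecr', 8828, 8827), // charCodeAt
  unshuffle('ratteg', 4898, 4901),     // target
  unshuffle('a' + 'v' + 'el', 2470, 2471), // eval (vv56 is just string concatenation)
  // mns51() + "ohCerda", taking the branch where this.secret is undefined
  unshuffle('mfoCr', 3436, 3438) + unshuffle('ohCerda', 5748, 5752) // fromCharCode
];
console.log(recovered.join(' '));

// The loop start: ~57 is -58, negated 58, ~58 is -59, negated 59, ~59 is -60,
// so the whole thing is 57 + 3 - 60.
var loopStart = 0x39 + 3 + ~-~-~57;
console.log(loopStart); // 0
```

Put together, ty32() resolves to event.target.eval and dsm33() to a call that hands back unescape, which is how the last line of the obfuscated script ends up evaluating the decrypted payload.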

I did the above by hacking together a quick script and feeding it into the Rhino JavaScript engine. I couldn’t just feed it the original script because it’s untrusted code and because it has some Adobe API-specific references (the event object is one of them). The de-obfuscated code looks like this:

var code_part_2 = '%89%88%91%9C%89%94%92%93%8E%DC%DD%A4%92%88%DD%95%9C%8B%98%DD%8E%92%91%8B%98%99%DD%89%95%98%DD%8D%88%87%87%91%98%DF%D1%DD%9E%A9%94%89%91%98%C7%DD%DF%BC%9E%9E%98%8E%8E%DD%9A%8F%9C%93%89%98%99%DF%D1%DD%93%B4%9E%92%93%C7%DD%CE%80%D4%C6%80%DD%98%91%8E%98%DD%86%9C%8D%8D%D3%9C%91%98%8F%89%D5%86%9E%B0%8E%9A%C7%DD%DF%B4%93%8B%9C%91%94%99%DD%8D%9C%8E%8E%8A%92%8F%99%DC%DD%A9%8F%84%DD%9C%9A%9C%94%93%DF%D1%DD%9E%A9%94%89%91%98%C7%DD%DF%AA%8F%92%93%9A%DD%8D%9C%8E%8E%8A%92%8F%99%DF%D1%DD%93%B4%9E%92%93%C7%DD%CD%80%D4%C6%80%80%8F%9C%8B%98%93%D5%D4%C6';
var code_part_1 = '%9B%88%93%9E%89%94%92%93%DD%8F%9C%8B%98%93%D5%D4%DD%86%94%9B%DD%D5%89%95%94%8E%D3%9A%98%89%BB%94%98%91%99%D5%DF%8D%9C%8E%8E%DF%D4%D3%8B%9C%91%88%98%DD%C0%C0%DD%89%95%94%8E%D3%9A%98%89%BB%94%98%91%99%D5%DF%u2598%DF%D4%D3%8B%9C%91%88%98%DD%D6%DD%DF%9C%93%89%94%8E%89%8F%94%93%9A%DF%D4%DD%86%89%95%94%8E%D3%9A%98%89%BB%94%98%91%99%D5%DF%99%92%92%8F%CF%DF%D4%D3%95%94%99%99%98%93%DD%C0%DD%9B%9C%91%8E%98%C6%DD%9C%8D%8D%D3%9C%91%98%8F%89%D5%86%9E%B0%8E%9A%C7%DD%DF%BE%92%93%9A%8F%9C';

function decrypt_string_with_byte(code, key) {
    var coded = '';
    for (var i = 0; i < code.length ; i++) {
        coded += String.fromCharCode( code.charCodeAt(i) ^ key);
    }
    return coded;
}

eval(decrypt_string_with_byte(unescape(code_part_1+code_part_2), 0xFD));
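
Since this is untrusted code, one option is to never call eval at all and just print what would have been evaluated. Here is a minimal sketch of that idea on the first 16 escaped bytes of code_part_1 only. The names decryptStringWithByte and findKey are mine, and findKey assumes the payload starts with the word "function", which is a guess that happens to fit here.

```javascript
// Same single-byte XOR as the de-obfuscated listing, under a camelCase name.
function decryptStringWithByte(code, key) {
  var out = '';
  for (var i = 0; i < code.length; i++) {
    out += String.fromCharCode(code.charCodeAt(i) ^ key);
  }
  return out;
}

// Try every possible byte and keep the first key whose output starts with the crib.
function findKey(cipher, crib) {
  for (var key = 0; key < 256; key++) {
    if (decryptStringWithByte(cipher, key).indexOf(crib) === 0) {
      return key;
    }
  }
  return -1;
}

// First 16 escaped bytes of code_part_1, copied from the listing.
var prefix = unescape('%9B%88%93%9E%89%94%92%93%DD%8F%9C%8B%98%93%D5%D4');

var key = findKey(prefix, 'function');
var peek = decryptStringWithByte(prefix, key);

console.log(key.toString(16)); // fd
console.log(peek);             // function raven()
```

With only 256 candidates, a single-byte XOR key falls out immediately even if the 0xFD constant had been hidden better.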

But what is the result of that really complex nested call which is being evaluated, I hear you ask. It’s the following:

//decrypt_string_with_byte(unescape(code_part_1+code_part_2), 0xFD);
function raven() {
    if (this.getField('pass').value == this.getField('╥').value + 'antistring') {
        this.getField('door2').hidden = false;
        app.alert({
            cMsg: 'Congratulations! You have solved the puzzle',
            cTitle: 'Access granted',
            nIcon: 3
        });
    } else {
        app.alert({
            cMsg: 'Invalid password! Try again',
            cTitle: 'Wrong password',
            nIcon: 0
        });
    }
}
raven();

Aha! We have a serious lead now, as we uncovered the following code fragment:

if (this.getField('pass').value == this.getField('╥').value + 'antistring')

Presumably we input our password in the field ‘pass’ and it gets checked against the field ‘╥’. But this is a bit strange! The ‘╥’ character is not on my keyboard, but it turns out to be the Unicode character U+2565. So the pass is stored in a field called U+2565 and it has the string ‘antistring’ appended to it. But where is this field? I couldn’t find it in the obfuscated PDF, so it’s de-obfuscation round two o’clock!
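
To poke at the comparison outside of a PDF viewer, a tiny stand-in for the document object is enough. This is only a sketch: makeDoc is a hypothetical mock of Acrobat's getField, and the field values are placeholders, not the real ones.

```javascript
// Hypothetical mock of the Acrobat document object: getField returns
// whatever object was registered under that field name.
function makeDoc(fields) {
  return {
    getField: function (name) {
      return fields[name];
    }
  };
}

// The condition from raven(), with `this` replaced by an explicit doc argument.
function passwordMatches(doc) {
  return doc.getField('pass').value == doc.getField('\u2565').value + 'antistring';
}

// Placeholder values: 'SECRET' stands in for whatever the hidden field holds.
var fields = {
  pass: { value: 'guess' },
  '\u2565': { value: 'SECRET' }
};
var doc = makeDoc(fields);

var before = passwordMatches(doc);
fields.pass.value = 'SECRET' + 'antistring';
var after = passwordMatches(doc);

console.log(before, after); // false true
```

So whatever sits in the ‘╥’ field is not the password by itself: the suffix has to be typed as well.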

Beautify, Round Two

Referencing the PDF specification, and specifically section 12.7.3, we get some valuable information about interactive form fields:

Each field in a document’s interactive form shall be defined by a field dictionary, which shall be an indirect object. The field dictionaries may be organized hierarchically into one or more tree structures. Many field attributes are inheritable, meaning that if they are not explicitly specified for a given field, their values are taken from those of its parent in the field hierarchy.

So we’re looking for a dictionary, to begin with (elements enclosed in << … >>). But can we do better? Apparently so!

The T entry in the field dictionary (see Table 220) holds a text string defining the field’s partial field name. The fully qualified field name is not explicitly defined but shall be constructed from the partial field names of the field and all of its ancestors. For a field with no parent, the partial and fully qualified names are the same. For a field that is the child of another field, the fully qualified name shall be formed by appending the child field’s partial name to the parent’s fully qualified name, separated by a PERIOD (2Eh).
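
As a rough illustration of that naming rule, here is a sketch that walks up a chain of parents and joins the partial names with periods. The field objects and their names are hypothetical in-memory stand-ins (only T and Parent are modelled), not a real PDF parser.

```javascript
// Collect partial names from the field up to the root, then join with '.'.
function fullyQualifiedName(field) {
  var parts = [];
  for (var f = field; f; f = f.Parent) {
    if (f.T !== undefined) {
      parts.unshift(f.T);
    }
  }
  return parts.join('.');
}

// Hypothetical three-level hierarchy.
var form = { T: 'form' };
var login = { T: 'login', Parent: form };
var pass = { T: 'pass', Parent: login };

var nested = fullyQualifiedName(pass);
var root = fullyQualifiedName(form);

console.log(nested); // form.login.pass
console.log(root);   // form
```

For a field with no parent the partial name is already the full name, which is why looking for plain /T entries is a reasonable way to find a field that getField('pass') would resolve.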

Let’s find all /T elements! I couldn’t find them in the original PDF, as it was obfuscated to oblivion, but we can normalize and grep it with the following incantation:

toxic_pdf $ qpdf --stream-data=uncompress crackme.pdf crackme_uncompressed.pdf
toxic_pdf $ grep -a -o -e "/T .*\? " crackme_uncompressed.pdf
    /T (msg) /Type /Annot /V (Knock! Knock! Anybody there?)
    /T (pass) /Type /Annot /V (123)
    /T <feff2565> /Type /Annot /V (9053d91a70acfd6614f0243caac70ce2)
    /T (door) /TU (Click to open the door!) /Type /Annot
    /T (door2) /Type /Annot
    /T (366) /Type /Annot

A couple of interesting points here. We can see the welcoming message and also the tooltip message. There’s also a suspicious-looking field with the name feff2565 and a long, random-looking value (values are indicated by the /V key) that looks perfect for being an obscure and unguessable password. The feff prefix is the UTF-16BE byte order mark, so the name is really the single character U+2565, which is ‘╥’. Could this string be part of the elusive password?
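
For the curious, decoding a hex-string field name by hand is short. A minimal sketch, where decodePdfHexString is my own helper: it handles the UTF-16BE case (leading FE FF) and, as a simplifying assumption, treats everything else as one byte per character rather than proper PDFDocEncoding.

```javascript
// Decode a PDF hex string such as <feff2565> into a JavaScript string.
function decodePdfHexString(hex) {
  var clean = hex.replace(/[<>\s]/g, '');
  if (clean.length % 2 === 1) {
    clean += '0'; // a missing final hex digit is assumed to be 0
  }
  var bytes = [];
  for (var i = 0; i < clean.length; i += 2) {
    bytes.push(parseInt(clean.slice(i, i + 2), 16));
  }
  if (bytes[0] === 0xFE && bytes[1] === 0xFF) {
    // UTF-16BE: skip the byte order mark, then combine byte pairs.
    var text = '';
    for (var j = 2; j + 1 < bytes.length; j += 2) {
      text += String.fromCharCode((bytes[j] << 8) | bytes[j + 1]);
    }
    return text;
  }
  // Simplification: one byte per character.
  return String.fromCharCode.apply(null, bytes);
}

var fieldName = decodePdfHexString('<feff2565>');
var xored = (fieldName.charCodeAt(0) ^ 0xFD).toString(16);

console.log(fieldName === '\u2565'); // true
console.log(xored);                  // 2598
```

The second value is a nice cross-check: U+2565 XORed with the 0xFD key gives 0x2598, which is exactly the %u2598 escape sitting in code_part_1.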

Solution

It turns out that 9053d91a70acfd6614f0243caac70ce2antistring is the password we are looking for:

Figure 3: The door is opened.

This challenge was fun! I hadn’t messed around with PDFs in some time, and this, coupled with the de-obfuscation exercises, gave me a chance to read some more of the mammoth PDF spec! I only wish I had known about it earlier so I could have solved it before BSides 🙂

P.S.

It dawned on me that we didn’t analyse the second JavaScript snippet we found, the one hinting more towards Mr. Poe. The second JS snippet looks like this:

var pluto = "its%13%05WUA%7DQ%17%025%07D_A@FQ@Q%13%0C%07%15%06Y%10%02%07G%12%0B%10I%11%0DR%08%1A%13XS%5EV%16J%11%15X%10%09%0A%02%01%16SID%0A_%5B@%13%00%0E%05%06%19%100%0A_%04N%1B%0FS%00%1A%0C%07%15%19DGZ%07RBA%5E%5E%02%0F%13%00S%16%5CA%0AD%5BD_%06A%0D%02NU%11%16%12%0E%08T%0C%1D%17%00%10%1E%13P_%5B%1FDZ%5D%04AU%13C%15%10D__S%15%10SZWC%08%0F%17RW%06%17%12%0C%0F%1A%00%03%01%1E%08%1A%0EV%5E%19%13%05LE%0EZQ%15%0A%05DPYCYFQQ@Z%0C%0F%12O%17X%0A%01V%04%00T%08%1D%10%16%0C%0D%08@%10ZQ%0E%5CR%15D%1Eki%25%0BXQCU%12E%5EUG%0A%0E%0F%10%1B%10%01%1C%12%0F%01%03I%0A%1B%07N%1C%02%19QYA%01XU%18%17V%00%0E%0F%08_WC%14%11YF%5C%13%17%09%04CUQ%10%0CQA%1A%06%00%10%1F%01I%0F%09%5D%10%5E%5D%0BN%11%09XGA%17%09DRSEQ%05D%12@%5B%06%0C@i%3Dd%0B%04%5C%0AN%0D%06%06T%14%06%1CGI%5CTJ%0DWVAV%5E%05C%15%01S%16H%5B%13%10S@%13%212%08%07RCB";

function whisper(lenore, tap) {
    var nap = "";
    for (i=0; i<lenore.length;i++) {
        var a = lenore.charCodeAt(i);
        var b = a ^ tap.charCodeAt(i % tap.length);
        
        nap = nap+String.fromCharCode(b);
    }
    return nap;
}

this.getField("366").value = whisper(unescape(pluto), event.value);

Hmm, it would look like this is another decryption function that behaves similarly to the enx function of the first obfuscated snippet, although it would appear that this one uses a variable length key (as opposed to a single byte). Unless I’ve missed something en route to here, I now only know of one possible key to open the door with. Turns out that it works here as well, and the text for variable pluto is:

PDF and JavaScript are often abused by attackers to hide exploit code. Some of their tricks include multiple layers of encryption, clever strings and integer manipulation, automatic form actions, hidden and decoy objects.

Congratulations, by now you’re already familiar with the basic tricks and know how to detect them!

Thank you for playing and see you at BSides!
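
A quick way to convince yourself that the door password really is the key: XOR is its own inverse, so the same function both decrypts with the key and recovers the key from a plaintext guess. A minimal sketch on just the first eight bytes of pluto, with whisper copied from the snippet (plus a var on the loop counter):

```javascript
// Repeating-key XOR, as in the challenge's whisper().
function whisper(lenore, tap) {
  var nap = '';
  for (var i = 0; i < lenore.length; i++) {
    var a = lenore.charCodeAt(i);
    var b = a ^ tap.charCodeAt(i % tap.length);
    nap = nap + String.fromCharCode(b);
  }
  return nap;
}

// First eight bytes of pluto and the first eight characters of the password.
var cipherPrefix = unescape('its%13%05WUA');
var keyPrefix = '9053d91a';

var plainPrefix = whisper(cipherPrefix, keyPrefix);
var recoveredKey = whisper(cipherPrefix, 'PDF and ');

console.log(JSON.stringify(plainPrefix)); // "PDF and "
console.log(recoveredKey);                // 9053d91a
```

The second call is the known-plaintext direction: anyone who guessed that the hidden text starts with "PDF and " would get the first eight characters of the key for free.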

Pretty cool huh? Thank you to whoever made this challenge, it was fun 🙂
