–[ Introduction ]
Hello all! It has been some time since my last post, so I thought I’d write about something fun that I came across!
Return of The Penguin is the first reversing challenge from B-Sides London 2016, brought to us by the creator of last year’s Toxic PDF. In this challenge we are given an unknown binary and a number of guiding questions to answer!
–[ The Challenge ]
The unknown binary is a mystery to us (yet!), but judging by the guiding questions it should be an executable, as opposed to a different file-format, and it is very likely to contain malware-like features such as C&C communication, anti-VM or anti-disassembly tricks and so on.
The questions we are meant to answer as part of our analysis are:
- First, how can the application be analysed statically?
- So the creator took his time to remove any clues, but what about dynamic analysis?
- You dare to run the sample – in your isolated VM of course. What happens with the process?
- What is the purpose of the main function?
- Are there any external domains contacted by the sample?
–[ The Binary ]
It will surprise none of the readers that our first order of business is to identify what we are dealing with. In particular, I’d like to know at least the target OS and architecture! So, without further ado:
-> % file elf
elf: ELF LSB executable, x86-64, version 1 (SYSV), unknown class 3
This is a good start, but there’s something off about this. This “unknown class 3” message should be investigated! Running hexdump on this binary also hints towards a 64-bit architecture:
00000230 |......../lib64/l|
00000240 |d-linux-x86-64.s|
00000250 |o.2.............|
00000260 |GNU.............|
Right, so let’s assume that we’re dealing with a 64-bit ELF binary. Binaries have headers! What are this binary’s headers?
$ readelf --headers elf
ELF Header:
  Magic:   7f 45 4c 46 03 01 01 00 00 00 00 00 00 00 00 00
  Class:                             <unknown: 3>
  ...
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  ....
readelf: Warning: possibly corrupt ELF file header - it has a non-zero section header offset, but no section headers

There are no program headers in this file.
So it turns out that we were correct: it’s a 64-bit ELF! Why are the headers corrupted, though? And in particular, what is this unknown class field? Consulting the ELF specification listed under Resources (p. 1-5, 1-6) tells us that the class can be 0, 1 or 2, but not 3! So this is likely the first trick meant to stop us from analyzing the file!
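To make the class check concrete, here is a minimal Python sketch that decodes the class byte from the 16 identification bytes readelf printed above. The helper name `describe_class` is my own invention for illustration; the class values come from the ELF specification.

```python
ELF_MAGIC = b"\x7fELF"
EI_CLASS = 4  # index of the class byte inside e_ident

# Class values defined by the ELF specification.
ELF_CLASSES = {0: "ELFCLASSNONE", 1: "ELFCLASS32", 2: "ELFCLASS64"}


def describe_class(e_ident: bytes) -> str:
    """Return a readable name for the EI_CLASS byte of an ELF identification block."""
    if len(e_ident) < 16 or e_ident[:4] != ELF_MAGIC:
        raise ValueError("not an ELF identification block")
    value = e_ident[EI_CLASS]
    return ELF_CLASSES.get(value, f"<unknown: {value}>")


# The 16 magic bytes reported by readelf for the challenge binary.
sample_ident = bytes.fromhex("7f454c46030101000000000000000000")
print(describe_class(sample_ident))  # prints "<unknown: 3>"
```

Anything outside 0, 1 and 2 falls through to the "unknown" branch, which mirrors what readelf reports for this file.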
Indeed, running the makes it crash, but gdb won’t run it and ltrace also fails with a similar message. strace, on the other hand, does run it but hangs on a ptrace syscall! So, we are likely up against a number of anti-debugging and anti-disassembly tricks.
–[ Anti-Everything tricks ]
At this point I believe we can answer the first question: how can the application be analysed statically? The first trick used serves a double purpose, as it is meant to prevent both disassembly (or other static analysis) and debugging (with gdb at least).
The technique used is an anti-disassembly method meant to throw off parsers and disassemblers that assume correct data and structure. With an unknown class in the ELF header, most parsers (including objdump, readelf and IDA) are unable to parse the ELF file. We should edit this value (the 5th byte from the top of the file, i.e. offset 4, the EI_CLASS field) to be 2, indicating a 64-bit object.
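A hex editor does the job just fine, but the patch can also be scripted. The following is a small sketch rather than the exact steps I took: `patch_class` and `patch_file` are hypothetical helper names, and the file names `elf` and `elf_mod` are simply the ones used in this post.

```python
import shutil

EI_CLASS = 4      # offset of the class byte (the 5th byte of the file)
ELFCLASS64 = 2    # value the ELF specification assigns to 64-bit objects


def patch_class(data: bytes, new_class: int = ELFCLASS64) -> bytes:
    """Return a copy of an ELF image with its EI_CLASS byte overwritten."""
    if data[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic")
    patched = bytearray(data)
    patched[EI_CLASS] = new_class
    return bytes(patched)


def patch_file(src: str, dst: str) -> None:
    """Write a patched copy of src to dst, keeping the permission bits."""
    with open(src, "rb") as handle:
        data = handle.read()
    with open(dst, "wb") as handle:
        handle.write(patch_class(data))
    shutil.copymode(src, dst)


# Example usage (assumes the challenge binary sits in the current directory):
# patch_file("elf", "elf_mod")
```

Working on a copy keeps the original sample untouched, which is handy if a later trick turns out to checksum the file.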
Having patched the binary, we can now successfully run objdump and readelf, as well as load it in a debugger or IDA. Dumping the symbols, we can start speculating about the binary’s capabilities:
$ readelf -s elf_mod

Symbol table '.dynsym' contains 17 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     1: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND strlen@GLIBC_2.2.5 (2)
     3: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND system@GLIBC_2.2.5 (2)
     4: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND snprintf@GLIBC_2.2.5 (2)
     5: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND memset@GLIBC_2.2.5 (2)
     7: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND srand@GLIBC_2.2.5 (2)
     9: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND prctl@GLIBC_2.2.5 (2)
    11: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND ptrace@GLIBC_2.2.5 (2)
    12: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND mprotect@GLIBC_2.2.5 (2)
    13: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND getpagesize@GLIBC_2.2.5 (2)
    14: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND sleep@GLIBC_2.2.5 (2)
    15: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND fork@GLIBC_2.2.5 (2)
A number of interesting external functions are referenced, and some of them are often used in malware. In particular, the ptrace-sleep combination is used in order to make the process sleep forever if a debugger is attached.
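For readers who haven't met the ptrace-sleep trick before, here is a rough Python model of the logic. It is only a sketch under assumptions: I am assuming the classic PTRACE_TRACEME self-check (a process can have only one tracer, so the request fails when a debugger is already attached) and a one-second sleep interval. The names `libc_ptrace_traceme`, `anti_debug_check` and the `max_naps` parameter are hypothetical and exist purely so the loop can be demonstrated without hanging.

```python
import ctypes
import time

PTRACE_TRACEME = 0  # request number from <sys/ptrace.h> on Linux


def libc_ptrace_traceme() -> int:
    """Call ptrace(PTRACE_TRACEME) via libc (Linux only); -1 means a tracer is attached."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.ptrace.restype = ctypes.c_long
    return libc.ptrace(PTRACE_TRACEME, 0, 0, 0)


def anti_debug_check(ptrace_traceme=libc_ptrace_traceme, sleep=time.sleep, max_naps=None):
    """Sleep in a loop when the TRACEME request fails; return whether a tracer was seen.

    max_naps is an illustrative escape hatch; with None the loop never ends,
    which is the behaviour the real sample appears to aim for.
    """
    if ptrace_traceme() != -1:
        return False  # request succeeded, so nobody is tracing us
    naps = 0
    while max_naps is None or naps < max_naps:
        sleep(1)
        naps += 1
    return True


# Simulate "debugger attached" with a stub instead of touching the real ptrace.
print(anti_debug_check(lambda: -1, lambda seconds: None, max_naps=3))  # prints "True"
```

The stubs keep the demonstration harmless: calling the real `libc_ptrace_traceme` would mark the current process as traced by its parent.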
The other noticeable combination of functions is the getpagesize-mprotect-memset. My personal experience with these functions is from self-modifying malware, where the getpagesize function is used to calculate the number of pages required by the malware and mprotect to mark the malware’s memory segment as RWX, followed by the malware decrypting itself and then executing the newly created code. Let’s see what this one does!
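The reason getpagesize tends to show up next to mprotect is that mprotect only accepts page-aligned start addresses. A minimal sketch of the usual arithmetic follows; the function name `page_span` and the example address and length are hypothetical, not values taken from this binary.

```python
import mmap
from typing import Tuple


def page_span(start: int, length: int, page_size: int = mmap.PAGESIZE) -> Tuple[int, int]:
    """Return (aligned_start, aligned_length) covering the bytes [start, start + length)."""
    mask = ~(page_size - 1)                      # clears the offset-within-page bits
    aligned_start = start & mask                 # round start down to a page boundary
    aligned_end = (start + length + page_size - 1) & mask  # round end up
    return aligned_start, aligned_end - aligned_start


# Hypothetical code region: 0x150 bytes starting at 0x400b6d, with 4 KiB pages.
base, size = page_span(0x400B6D, 0x150, 4096)
print(hex(base), hex(size))  # prints "0x400000 0x1000"
```

The resulting pair is what would be handed to mprotect together with the read, write and execute protection flags.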
–[ IDA and GDB ]
Running this sample from the command line and under gdb produces two very different results. In particular, under gdb there is no output at all: the process sleeps continuously and only terminates if we manually send it a SIGTERM signal! Let’s momentarily turn to IDA.
Since this is a stripped binary, we’re going to have to do a lot of renaming! Rather than tracing from the main function to find any anti-debugging, I decided to go for the easier option of locating references to ptrace. Doing this led to the function at 0x400ADF, a function hidden in the .init_array (relevant post)! Lo and behold, our anti-debugging function!
Bypassing this anti-debugging trick would simply involve us breaking on test eax, eax and setting eax to 0! Looking at another promising function found in the .init_array, we find this:
There are a number of things to note in this figure, and I hope that my comments and renaming make sense! Firstly, the main function wasn’t among the binary’s symbols, since the binary is stripped; instead, I located it using the technique I describe here.
Secondly, it looks like our assumptions check out! This function seems to be morphing main’s code. In particular, the basic block labeled as decryption_loop is doing a lot of indexing into main, followed by an XOR from some memory location (byte_6020C0) and writing the result back in main’s memory region, replacing the byte that it just read.
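In Python terms, the loop appears to be doing something along these lines. This is a sketch under assumptions: I am modelling the data at byte_6020C0 as a repeating key, but the actual key length and indexing are not established in this post, and the names `xor_decrypt`, `code` and `key` are mine.

```python
def xor_decrypt(code: bytes, key: bytes) -> bytes:
    """XOR every byte of code with a repeating key and return the result."""
    if not key:
        raise ValueError("key must not be empty")
    out = bytearray(code)
    for index in range(len(out)):
        # Read a byte, XOR it with the matching key byte, write it back in place.
        out[index] ^= key[index % len(key)]
    return bytes(out)


# XOR is its own inverse, so applying the same key twice restores the input.
scrambled = xor_decrypt(b"\x55\x48\x89\xe5", b"\x41\x42")
assert xor_decrypt(scrambled, b"\x41\x42") == b"\x55\x48\x89\xe5"
```

Because the operation is symmetric, dumping the key bytes from the binary would be enough to decrypt main offline without ever running the sample.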
At this point I think we can answer a number of other questions as well.
So the creator took his time to remove any clues, but what about dynamic analysis? Well, he did leave the .comment section in, but no other meaningful strings! Patching the class field in the headers allows us to load the binary in a debugger, and patching the infinite sleep allows us to finally run it there!
What is the purpose of the main function? At this point it looks like the real main is stored in an encrypted form, since its code is being modified before it runs! We will answer this in a future post!
–[ Next ]
Stay tuned for part two of this challenge 🙂
–[ Resources ]
[1] Executable and Linkable Format (ELF) – http://www.skyfree.org/linux/references/ELF_Format.pdf
[2] Beginners Guide to Basic Linux Anti Anti Debugging Techniques – https://www.gironsec.com/code/Linux_Anti_Debug.pdf
[3] Anti-Ptrace – http://www.julioauto.com/rants/anti_ptrace.htm
[4] In the lands of corrupted Elves – https://www.blackhat.com/docs/us-14/materials/arsenal/us-14-Hernandez-Melkor-Slides.pdf