For security testing purposes only

Document files and PDFs as a vehicle for delivering exploits are a common feature of publicly disclosed targeted attacks, whereas non-targeted incidents generally involve links to web pages hosting malicious code. Here I’m focussing on what appears to be the very first stage of a typical targeted attack (after the recce and intel gathering).

Technical Background

An operating system must be capable of identifying different file types in order to know which application should process each one – when the user clicks on a .doc file, the OS just knows to open it with LibreOffice or Microsoft Word. What the user then has is a running application and a file loaded into memory – the process either linking to the file’s data structure or loading it into its address space.

The other concept to understand is the role of file extensions. We give a Word document the .doc extension purely to identify it as such, and so the desktop GUI gives it the appropriate icon. Remove the file extension and the OS would identify its true file type. This is one feature that enables a baddy to trick a human user, and usually it works because most of us interact with computers through a GUI. So how does the OS know what a file actually is, without an extension?

Headers and File Internals

File types are actually defined by an initial sequence of bytes, sometimes referred to as the ‘file header’ or ‘magic number’. They can be seen by running the following command on a given file:
$hexdump -n 50 (filename)
In fact, this is how digital forensic software can determine whether images are being hidden using a false extension. In our case, the technique can be used to find whether malicious code is masquerading as a document or image. With executables, the first 125 bytes seem to be an identifier, as a consequence of having a standard data structure and multiple headers.
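The same check can be sketched in a few lines of Python. This is only an illustration: the signature table below is a small sample I've picked, not a complete list, and the function names identify() and identify_file() are made up for the example.

```python
# Illustrative table of well-known leading byte sequences (not exhaustive).
MAGIC_NUMBERS = {
    b"%PDF-": "PDF document",
    b"MZ": "DOS/Windows executable",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"\xff\xd8\xff": "JPEG image",
    b"PK\x03\x04": "ZIP container (also .docx, .odt)",
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "OLE2 container (legacy .doc, .xls)",
    b"\x7fELF": "ELF executable",
}


def identify(data: bytes) -> str:
    """Return a description based on the leading bytes, ignoring any file name."""
    for magic, description in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return description
    return "unknown"


def identify_file(path: str, length: int = 50) -> str:
    """Read the first `length` bytes of a file (as hexdump -n 50 does) and identify it."""
    with open(path, "rb") as handle:
        return identify(handle.read(length))
```

Calling identify_file() on a file with a misleading extension would report the type implied by its content rather than its name, which is the comparison a forensic tool makes.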

When the file is opened, the OS determines which program/application should handle it by reading the first several bytes, and then initialises a process for that.

A Little Experiment

What we know so far is that it should be possible, in theory, to turn something like cmd.exe into a 'PDF' by changing the header bytes. The first thing to determine is what the file header is for a valid PDF, by opening two separate documents and isolating the initial bytes that are common to both files – any file with those bytes must be a PDF, right? I did this earlier in the command line, but GHex gives a different output for some reason.

The next step is to open cmd.exe in GHex, prefix its contents with the PDF header bytes and give it a name such as ‘testploit’ (without an extension).

And there we go: our edited cmd.exe is now disguised as a PDF even on closer inspection. The file effectively should become a launcher for a Windows command prompt. It even passes itself off as a valid document when viewed in the properties window or scanned by VirusTotal.
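From the defender's side, a header-only check is clearly not enough. Below is a rough sketch of two extra tests: whether the file has any PDF structure beyond the header, and whether a Windows executable image is sitting inside it. The function names are hypothetical and the heuristics are my own assumptions, not how any particular scanner works.

```python
import re


def looks_like_real_pdf(data: bytes) -> bool:
    """Check for structural markers beyond the '%PDF-' header (heuristic only)."""
    has_header = data.startswith(b"%PDF-")
    has_object = re.search(rb"\d+\s+\d+\s+obj", data) is not None
    # A well-formed PDF ends with a startxref entry and an %%EOF marker.
    has_trailer = b"startxref" in data and b"%%EOF" in data[-1024:]
    return has_header and has_object and has_trailer


def find_embedded_pe(data: bytes) -> int:
    """Return the offset of the first plausible PE image, or -1 if none is found."""
    pos = data.find(b"MZ")
    while pos != -1:
        if pos + 0x40 <= len(data):
            # The DOS header stores the offset of the 'PE\0\0' signature at 0x3C.
            e_lfanew = int.from_bytes(data[pos + 0x3C:pos + 0x40], "little")
            signature = pos + e_lfanew
            if data[signature:signature + 4] == b"PE\x00\x00":
                return pos
        pos = data.find(b"MZ", pos + 1)
    return -1
```

A file like the one built above would be expected to fail looks_like_real_pdf() while find_embedded_pe() reports an offset just after the prefixed header bytes.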

There’s a much faster way, using Metasploit to create malicious PDFs complete with exploits for Adobe Reader.

Embedded Functions

What I’ve described so far is pretty amateur – a recipient would know something’s up if an actual document fails to materialise, plus it’s obvious if an .exe program is launched. In targeted attacks both the email and the attachment would be carefully tailored, to ensure that both are convincing and innocuous enough not to raise any suspicion.
It doesn’t even have to be that targeted – a fake brochure emailed to someone who attended a major marketing event (such as InfoSecurity Europe) would work, or perhaps a 'mislaid' USB drive containing a PDF with an interesting filename, and I'm guessing that most people don't habitually update their versions of Adobe Reader. The recipient would open the doc, hit the delete button and think nothing of it, by which time the payload would have done its job.
I created a basic PDF document, then used the $strings command to view its structure.

The /OpenAction string looks most promising. According to Tim Xia at Websense, this field can be used to cause a JavaScript action to run when the file is opened, JavaScript exploits being associated with a ‘heap spraying’ technique that could provide a way around Microsoft’s Address Space Layout Randomisation. The presence of JavaScript doesn’t necessarily mean there’s actually an exploit, though.
In the Websense analysis, ‘this.(function)’ was placed in the /OpenAction field, with ‘>(function)’ being a call to an object elsewhere in the file. I reckon both could be inserted into a PDF using a hex editor, using the same method I used for changing the file header bytes. The function could be anything – perhaps an exploit for a buffer overflow vulnerability within any of Adobe Reader’s functions, with a payload to fetch a malware installer.
The exploit creators went a couple of steps further, encoding the function and compressing it with zlib, but they still needed to reference it in the /OpenAction field.
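Since the reference in /OpenAction has to be there, it is a reasonable thing to triage on. Here is a minimal sketch that counts a handful of names worth a second look. The list of names is my own selection, the function names are hypothetical, and (as noted above) a hit doesn't necessarily mean there's an exploit. It also won't see names hidden inside compressed object streams.

```python
import re

# Names that trigger or carry active content; an illustrative selection.
SUSPICIOUS_NAMES = [b"/OpenAction", b"/AA", b"/JavaScript", b"/JS", b"/Launch", b"/EmbeddedFile"]


def decode_name_escapes(data: bytes) -> bytes:
    """Undo #xx hex escapes, which PDF permits inside names (e.g. /Open#41ction)."""
    return re.sub(rb"#([0-9A-Fa-f]{2})", lambda m: bytes([int(m.group(1), 16)]), data)


def count_suspicious_names(data: bytes) -> dict:
    """Count occurrences of each suspicious name in the raw bytes of a PDF."""
    normalised = decode_name_escapes(data)
    counts = {}
    for name in SUSPICIOUS_NAMES:
        # The lookahead stops a short name matching the start of a longer one.
        pattern = re.escape(name) + rb"(?![A-Za-z0-9])"
        counts[name.decode()] = len(re.findall(pattern, normalised))
    return counts
```

Running count_suspicious_names() over the bytes of a document gives a quick indication of whether anything is set to run on open, before reaching for a hex editor.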

Solutions

Of course, a policy of ‘don’t click shit!’ is always the first countermeasure that comes to mind, but if a hundred employees of a given organisation were sent a malicious attachment, it’s guaranteed that several of them will open it. Only one successful attempt is needed. I’d also argue that anyone could be made to open a malware-infected document if enough effort went into crafting the attack.

A security plan must take into account that people will open whatever attachments are mailed to them. Security then relies on: 1) Patching and exploit prevention, 2) Malware detection, 3) Preventing traffic between malware and a C&C server, 4) Detection and incident response.

Windows 7 and 8 users are in a relatively good position, as Microsoft works on the assumption that code vulnerabilities will always slip through the net, and decided to mitigate them with things like ASLR and SafeSEH. There are ways around these, but they present an obstacle to getting an exploit to run. Patching Adobe Reader should also be effective, depending on whether the attackers are limited to stock exploits.
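Whether a given executable or DLL has opted in to ASLR is recorded in its PE header, so it can be checked without running anything. The sketch below reads the DllCharacteristics field; the function names are hypothetical, and I've included the NX (DEP) flag as an extra even though it isn't discussed above. SafeSEH isn't covered here, as that information lives in the load configuration directory rather than in this field.

```python
import struct

DYNAMIC_BASE = 0x0040  # IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE: image opts in to ASLR
NX_COMPAT = 0x0100     # IMAGE_DLLCHARACTERISTICS_NX_COMPAT: image opts in to DEP


def dll_characteristics(image: bytes) -> int:
    """Return the DllCharacteristics flags from a PE image held in memory."""
    if image[:2] != b"MZ":
        raise ValueError("not a PE image: missing MZ header")
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("not a PE image: missing PE signature")
    # Skip the 4-byte signature and the 20-byte COFF file header.
    optional_header = pe_offset + 4 + 20
    # DllCharacteristics sits at offset 70 in both PE32 and PE32+ optional headers.
    (flags,) = struct.unpack_from("<H", image, optional_header + 70)
    return flags


def mitigations(image: bytes) -> dict:
    """Summarise which opt-in mitigations the image declares."""
    flags = dll_characteristics(image)
    return {"aslr": bool(flags & DYNAMIC_BASE), "nx": bool(flags & NX_COMPAT)}
```

Passing the bytes of a PDF reader's main executable and its plugins to mitigations() would show which modules, if any, were built without ASLR support.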
The Hong Kong CERT have recommended the use of alternative applications for reading PDFs and Microsoft Office documents, the idea being that users would be unaffected by exploits for Adobe/Microsoft. While it’s a good strategy in the short term, it’s more of a delaying tactic against an APT, and alternative applications would become vectors should they become popular.

Exploiting the Adobe PDF Reader

Several factors make Adobe Reader an attractive target for getting malicious code to run on a target machine. The first is that the application has many buffers that can be populated by loading a document. Adobe Reader can also be thought of as an interpreter, executing whatever valid code might be contained within a document, using functions that potentially have vulnerabilities. The biggest factor is that the software is common to most desktop computers, giving the largest number of potential victims, a problem that’s exacerbated by web browsers that automatically load PDFs in a browser plugin after fetching them from web servers. The following two examples are from the exploits I found in the CRIMEPACK, Blackhole, Eleonore and Phoenix crimeware kits.

Collab.getIcon()

Discovered (or publicly disclosed) in March 2009, the Collab.getIcon() method/function vulnerability appears to be specific to Adobe Reader, and the exploit must be implemented as a JavaScript call to this function. According to the advisories the exploit is a typical stack-based buffer overflow through a malformed call to getIcon(), and this allows arbitrary code execution – a typical way of changing the Instruction Pointer value to the address of some malicious code. An example of this and a copy of the vulnerable application (for Windows users) are available from the Offensive Security exploit database (number 9579). The exploit is also available as a Metasploit module.

We’re looking for two things within a malicious PDF: something that causes an exception, and a payload that executes when the exception occurs. So, if we run the strings utility on an example PDF from SecurityFocus… where is the exploit? Where is the getIcon() request for that matter? The best place to start is by looking at the file’s structure and layout. PDFs are self-referencing, that is, each section is an object marked by an object number and a generation number, such as 1 0, 2 0, 3 0, etc. The contents of each object can also contain a reference to another object. In the SecurityFocus PDF, one section, 3 0, references some JavaScript in another section, 7 0 R.
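Following these references by eye gets tedious, so here is a rough sketch that maps each object to the objects it points at. It's a regex-based approximation under my own assumptions (a real parser would use the cross-reference table), and reference_map() is a hypothetical name.

```python
import re

# 'N G obj ... endobj' delimits an indirect object; 'N G R' is a reference to one.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)
REF_RE = re.compile(rb"(\d+)\s+(\d+)\s+R\b")


def reference_map(data: bytes) -> dict:
    """Map (object number, generation) to the list of objects it references."""
    graph = {}
    for match in OBJ_RE.finditer(data):
        obj_id = (int(match.group(1)), int(match.group(2)))
        # Only scan the dictionary part; stream data is binary and would add noise.
        body = match.group(3).split(b"stream", 1)[0]
        graph[obj_id] = [(int(num), int(gen)) for num, gen in REF_RE.findall(body)]
    return graph
```

Starting from whichever object holds /OpenAction and walking the map leads to the object containing the script.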



By the way, I’m using the SecurityFocus example because the references within the actual crimeware PDF are also obfuscated. Looking further through the code, at the referenced object, some random characters are found. This, I believe, is the exploit and payload.

However, it’s unintelligible because the content of that section is compressed and obfuscated, which enables the malicious code to get past various intrusion detection/prevention methods. For a while it would also have made reverse-engineering tricky because the tools were less readily available. Compressed content is indicated by the ‘/Filter /FlateDecode’ string.
To uncompress/decode this section, I used qpdf. Since this doesn’t work on the SecurityFocus sample, I ran qpdf on the actual crimeware PDF instead:
$qpdf --stream-data=uncompress geticon.pdf 3rdattempt.pdf
The output file contains the unobfuscated exploit and its payload, and the following section is instantly recognisable as an array buffering the payload. Of course, the payload is unreadable as the strings utility is attempting to convert hex shellcodes into ASCII text. To get those, the PDF must be run through a hex editor or something like Bokken.
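FlateDecode is zlib/deflate compression, so where qpdf isn't to hand the streams can also be inflated with Python's standard library. This is a minimal sketch assuming each stream uses only the Flate filter; inflate_streams() is a hypothetical name, and the regex is a shortcut rather than a proper parser.

```python
import re
import zlib

# Capture everything between the 'stream' and 'endstream' keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(data: bytes) -> list:
    """Return the content of every stream, inflated where it is zlib data."""
    results = []
    for match in STREAM_RE.finditer(data):
        raw = match.group(1)
        try:
            # decompressobj tolerates the trailing end-of-line before 'endstream'.
            results.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            # Not zlib data (uncompressed or another filter): keep the raw bytes.
            results.append(raw)
    return results
```

The inflated output is where script text becomes readable; the shellcode itself still needs a hex view, as mentioned above.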

Unfortunately the shellcode is OS-specific and I wasn’t using a Windows machine, so I didn’t analyse it further. What we do already know is the payload results in the installation of a banking Trojan. The payload was buffered as an array for the exploit code itself.

util.printf()

Problems with the printf() function in the C programming language are well-known. They aren’t necessarily caused by the developers of Adobe Reader; instead it’s a vulnerability native to the C language, where the function doesn’t check the stack boundaries. Here the vulnerability might be in the C code underlying the JavaScript interpreter.
A full description and an exploit attempt are published on the CORE Security site, and that works by overwriting the Structured Exception Handler address with that of another location where the shellcode is placed. Again, the malicious code is only executed with the privileges of whoever’s running the PDF reader.
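Once a script has been pulled out of a PDF, both of the calls discussed here are easy to look for. The sketch below flags them by name. The function name is hypothetical, the CVE identifier for util.printf() is the one I understand it to be tracked under rather than something taken from the references below, and any script that builds its function names at runtime would slip straight past this.

```python
import re

# JavaScript API calls tied to the two vulnerabilities discussed in this post.
KNOWN_CALLS = {
    "Collab.getIcon": "CVE-2009-0927",
    "util.printf": "CVE-2008-2992",  # assumption: identifier not listed in the references
}


def flag_known_calls(script: str) -> dict:
    """Return the known-vulnerable calls that appear in a decoded script."""
    hits = {}
    for call, advisory in KNOWN_CALLS.items():
        # Allow whitespace around the dot and before the opening bracket.
        pattern = re.escape(call).replace(r"\.", r"\s*\.\s*") + r"\s*\("
        if re.search(pattern, script):
            hits[call] = advisory
    return hits
```

A hit only says the function is called, not that the call is malformed, so it's a prompt for closer inspection rather than a verdict.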

Conclusion

As both exploits were found in three crimeware kits, it’s obvious their authors targeted something common to most desktop computers – Adobe Reader. The versions of CRIMEPACK, Blackhole and Eleonore being examined were all created around the same period, so they were either sold by the same group, or the exploits were proven the most effective for circulation among the crimeware authors.

What’s the worst that can happen? Both exploits have already been out there for five years, and other vulnerabilities like them have been found since, so this post only gives a taste of what to expect in a crimeware kit.
The impact of a successful exploit here depends on the credentials the Adobe Reader application is running under. If it’s a standard user account with limited privileges, the exploit would lead to only that account being initially compromised, although privilege escalation is always possible afterwards. The latter is unlikely, though, as the crimeware has obviously been developed to automate things as much as possible, and the attacker would have many compromised admin accounts. If the user has admin privileges at the time the application is exploited, the payload has full control of the system.
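To make the point about credentials concrete, here is a small sketch of how a process can tell which of those two situations it is in. The function name is hypothetical; on Windows it relies on the shell32 IsUserAnAdmin() call, which Microsoft documents as deprecated but which still works for a quick check.

```python
import ctypes
import os


def running_as_admin() -> bool:
    """Report whether the current process has administrative privileges."""
    if os.name == "nt":
        # Non-zero when the process token belongs to the Administrators group.
        return bool(ctypes.windll.shell32.IsUserAnAdmin())
    # On Unix-like systems, an effective UID of 0 means root.
    return os.geteuid() == 0
```

Anything launched by an exploited application inherits the same answer, which is why opening documents from a standard account limits the initial damage.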
As can be demonstrated using Metasploit, the payload could be anything, including a reverse shell for remote access or code that fetches a malware installer. The people behind the crimeware were counting on some victims being logged into the admin account, or on being able to escalate their privileges after the account was compromised.

References

CHANDEL, R. 2010. Hacking Articles. Hack Remote PC with Adobe Collab.getIcon() Buffer Overflow. [WWW]. www.hackingarticles.in/hack-remote-pc-with-adobe-collab-geticon-buffer-overflow/. 24th April 2017.

CORE SECURITY. 2008. Adobe Reader Javascript Printf Buffer Overflow. [WWW]. www.coresecurity.com/content/adobe-reader-buffer-overflow. 24th April 2017.

KREBS, B. 2010. Krebs on Security. A Peek Inside the ‘Eleonore’ Browser Exploit Kit. [WWW]. http://krebsonsecurity.com/tag/pdf-collab-geticon/. 24th April 2017.

MITRE. 2009. Common Vulnerabilities and Exposures. CVE-2009-0927. [WWW]. http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2009-0927. 24th April 2017.

SECURITYFOCUS. 2010. Adobe Acrobat and Reader Collab 'getIcon()' JavaScript Method Remote Code Execution Vulnerability. [PDF]. www.securityfocus.com/bid/34169/exploit. 24th April 2017.