
Analysis of the FBI Tor Malware

Gareth Owen, University of Portsmouth

Files

Background

The Tor network is an anonymising network that allows people to browse the web and access other services without being traced. Part of this network is the so-called 'darknet': servers accessible only through Tor, which host a variety of services from forums to e-mail. Whilst many of these services are innocent and aimed at those concerned about human rights abuses, the anonymity naturally attracts those with criminal intent, such as the distribution of child pornography. Because Tor hides it, law enforcement agencies cannot then trace a user's original IP address through the network.

In 2013, a piece of malware was found embedded in Freedom Hosting's darknet server that would exploit a security hole in a particular web browser and execute code on the user's computer. This code gathered some information about the user, sent it to a server in Virginia and then crashed - it showed none of the overtly malicious behaviour that is so characteristic of malware. It was therefore theorised that the FBI, who have offices in Virginia and who have 'form' for writing malware, may have authored it - this now appears to be true. UPDATE: Confirmed authored by FBI with codename EgotisticalGiraffe.

Reverse Engineering the Code

The exploit

The exploit is written in JavaScript and exploits a known bug in a specific version of Firefox (the one prebundled with Tor). The exploit is highly obfuscated, but a quick scan through it reveals a long string of hex characters with the call opcode visible in the first few bytes (a jump or call will often be near the start of shellcode, so knowing the opcodes for these makes shellcode easy to identify). I won't analyse the exploit here, but we will look at the shellcode. Firstly, let's set the scene with some basic shellcode principles.

Position Independent Code

Shellcode faces unique challenges to successful execution because it is injected directly into a process rather than launched by the Windows Loader. Hence, the shellcode has no idea where it is located in memory and, crucially, does not know where the standard Windows API functions are located (the Windows Loader normally tells an application this).

Hence, we have to use a series of tricks to get this information. The FBI malware uses a very common trick to find its memory location:

call start
start:
pop ebp

The call instruction transfers execution to the start label, but it also pushes a return address onto the stack (so that we could return from the call later). Because the call sits immediately before the label, that return address is the address of start itself. Here we abuse this and steal the address from the stack, popping it into the ebp register. We now know where the start label is in memory and can use it as a base for accessing the data associated with our shellcode.

Locating Windows APIs

Because it is the Windows Loader that normally gives a program the locations of the Windows API functions, we don't have the luxury of knowing them when we're operating as shellcode. A common trick for finding API functions is to look at the Thread Information Block pointed to by the fs segment register. We can parse this structure to locate the DLLs that were loaded with our host program, and then walk through each DLL's exports until we find the desired function. Naturally this is tedious, so the FBI shellcode uses a function resolver written by Stephen Fewer that is included in the Metasploit Framework. It works as follows:

push arguments
...
push FUNCTIONHASH
call <Stephen'sResolver>

The function hash is generated by applying a simple hashing algorithm to the name of the function we want to call. It's not intended to obfuscate the code (although it has that effect) but merely to allow us to specify functions with a 32-bit dword rather than a long string (space is often limited in shellcode). Thankfully, we can calculate the hashes ourselves, or use a lookup table someone else has generated.
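To make the hashing step concrete, here is a minimal sketch of the rotate-right-by-13 scheme as I understand it from Metasploit's resolver. It is not code lifted from the malware. The function names (ror32, ror13_sum, api_hash) are my own, and the detail that the module name is hashed as upper-cased UTF-16 with its terminator is an assumption based on the Metasploit source. Published tables list 0x0726774C for kernel32.dll!LoadLibraryA, which makes a convenient cross-check.

```python
def ror32(value: int, bits: int) -> int:
    """Rotate a 32-bit value right by `bits` positions."""
    value &= 0xFFFFFFFF
    return ((value >> bits) | (value << (32 - bits))) & 0xFFFFFFFF


def ror13_sum(data: bytes) -> int:
    """For each byte: rotate the running hash right by 13, then add the byte."""
    h = 0
    for b in data:
        h = (ror32(h, 13) + b) & 0xFFFFFFFF
    return h


def api_hash(module: str, function: str) -> int:
    """Combine the module-name hash and the function-name hash (hypothetical helper)."""
    # Assumption: module name is upper-cased UTF-16LE, including its null terminator.
    module_bytes = (module.upper() + "\x00").encode("utf-16-le")
    # Assumption: function name is ASCII as exported, including its null terminator.
    function_bytes = (function + "\x00").encode("ascii")
    return (ror13_sum(module_bytes) + ror13_sum(function_bytes)) & 0xFFFFFFFF


if __name__ == "__main__":
    for module, function in [("kernel32.dll", "LoadLibraryA"), ("ws2_32.dll", "connect")]:
        print(f"{module}!{function} -> 0x{api_hash(module, function):08X}")
```

Running the script prints one dword per function, which can then be compared against the constants pushed before each call to the resolver.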

Startup

If we disassemble the start of the shellcode this is what we get:

Once we have worked out that the ebp register points to Fewer's API resolver, we realise the long hex numbers before the calls are actually hashes of Windows API function names. So, if we look those up in the tables, follow some of the data around, look up the function calls, and generally add lots of comments, this is what we get:

The code performs a sanity check to make sure the shellcode is safe to run by checking that an HTTP request header begins with GET. It then uses the Windows API LoadLibrary() call to load two DLLs: ws2_32.dll (the Windows Sockets library, for internet comms) and iphlpapi.dll (the IP helper library).

Connecting to an HTTP server

After the requisite libraries are loaded, the shellcode then does the following:

Again, we go through the same steps: the hash refers to a Windows API function, connect(). We also see that the data at [ebp+0x2e1] is passed as a parameter to connect() - we know from the manual that this is a sockaddr structure. Since ebp holds our own location in memory (ebp = 0x7), adding the offset locates the data at 0x7 + 0x2E1 = 0x2E8.

So, before we analyse the sockaddr, let's add some comments to the code, name some memory offsets and see what we get:

The eax register contains the return value from the connect() call; if this is zero (according to the Microsoft manuals) then the connect() succeeded. But where are we connecting to? If we now add some comments to our earlier hexdump, based on what the Microsoft manuals say about sockaddr, we get:
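For readers who want to decode such a structure by hand, here is a small sketch of how the first eight bytes of a sockaddr_in break down. The example bytes are reconstructed for illustration from the IP address quoted in this article plus an assumed port of 80 (plain HTTP). They are not copied from the hexdump, and parse_sockaddr_in is a hypothetical helper name.

```python
import socket
import struct


def parse_sockaddr_in(raw: bytes):
    """Decode the first 8 bytes of a Windows sockaddr_in structure."""
    (family,) = struct.unpack_from("<H", raw, 0)  # sin_family: little-endian host order
    (port,) = struct.unpack_from(">H", raw, 2)    # sin_port: big-endian network order
    address = socket.inet_ntoa(raw[4:8])          # sin_addr: four bytes, network order
    return family, port, address


# Illustrative bytes only: AF_INET (2), assumed port 80, 65.222.202.54, zero padding.
example = bytes.fromhex("02000050" "41deca36") + bytes(8)

if __name__ == "__main__":
    print(parse_sockaddr_in(example))
```

The point to notice is the mixed byte order: the address family is stored little-endian, while the port and address are stored in network (big-endian) order.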

A quick whois of the IP address gives us very little information:

Gathering User Information

Next, the malware tries to get the Windows hostname - typically this will be the name of the Windows machine. It might be helpful in identifying the suspect and confirming that investigators have got the right person once an arrest is made.

Next, it resolves that hostname to the IP addresses that the computer is using.

It then abuses the SendARP() function, which is usually used for discovering the MAC addresses of other computers, to find our own MAC address instead. There are 'proper' ways to do this, but given the limited space available to shellcode this does the job. The MAC address ties the user to a particular network card, which investigators may be able to trace through the supply chain.

Finally, it constructs the HTTP request, putting the MAC address in the Cookie: header and the user's hostname in the Host: header, and sends it with send() as a GET request to http://65.222.202.54/05cea4de-951d-4037-bf8f-f69055b279bb. The significance of the hex string in the path is unknown; it may have been chosen arbitrarily or may link a user to a particular access to the server (to completely connect the dots in court).
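As a rough illustration of the request described above, the sketch below assembles a GET request with the hostname in Host: and the MAC address in an ID cookie. It only approximates the shape of the request. The HTTP version, the hex encoding of the MAC and the absence of other headers are my assumptions, and build_request is a hypothetical helper, not something recovered from the shellcode.

```python
def build_request(hostname: str, mac: bytes, path: str) -> bytes:
    """Assemble a GET request carrying the hostname and MAC address."""
    mac_text = mac.hex().upper()       # assumed encoding: plain upper-case hex
    lines = [
        f"GET {path} HTTP/1.1",        # HTTP version is an assumption
        f"Host: {hostname}",           # the victim's Windows hostname, not the server's
        f"Cookie: ID={mac_text}",      # MAC address of the local network card
        "",                            # blank line terminates the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


if __name__ == "__main__":
    request = build_request(
        "EXAMPLE-PC",                              # made-up hostname
        bytes.fromhex("001122aabbcc"),             # made-up MAC address
        "/05cea4de-951d-4037-bf8f-f69055b279bb",   # path quoted in the article
    )
    print(request.decode("ascii"))
```

Note that the Host: header is being misused here: it normally names the server being contacted, whereas the shellcode fills it with the victim's own machine name.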

The final stage

The sole purpose of the final stage of the shellcode is to execute more shellcode located at the end of this shellcode. It goes about this in a rather roundabout way and I'm not really sure why - maybe it's a crude attempt at obfuscation.

So, here's how it does it. First it uses some string length operations to find some code embedded in the otherwise all-data section. That code works out the location of the end of our HTTP request string, skips through all the nulls to the no-op instruction at the end of the shellcode, and then jumps there. What's there? Who knows! I'm told there's more shellcode (that's not that important) but I haven't yet had time to debug the exploit and obtain it.

Construct header, then jump to code at end of header.

Search through nulls at end of http header and jump to final no-op.

Running the code

So far, I've analysed the code in an entirely static way - largely for completeness. It's much quicker to analyse it by running it, and doing so also allows us to confirm that our analysis is correct. In this case, the malware doesn't do anything nasty so we're safe to run it outside of a VM. So let's run it and observe the exact data sent to the FBI. Because shellcode is not an exe file, I need a shellcode launcher to run it - they're quick to write yourself, needing only to allocate some memory, load the shellcode and jump to it. Here's mine - it'll automatically breakpoint before calling the shellcode.

We then launch the debugger and step through the code just until the connect() call. We have to point the shellcode at a machine other than the FBI server because that server is down. So, I'll point it at 192.168.0.254 on port 77 and then run netcat on that machine to capture the output. Here's the modified sockaddr struct from earlier - patched live while the code was paused.
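If you want to work out the replacement bytes before patching them in the debugger, a sketch like the following produces a sockaddr_in for the test address used here. pack_sockaddr_in is a hypothetical helper of mine, and the 16-byte layout with zero padding follows the standard sockaddr_in definition rather than anything specific to this shellcode.

```python
import socket
import struct


def pack_sockaddr_in(address: str, port: int) -> bytes:
    """Build a 16-byte sockaddr_in suitable for patching into memory."""
    return (
        struct.pack("<H", 2)          # sin_family = AF_INET (2), little-endian
        + struct.pack(">H", port)     # sin_port in network byte order
        + socket.inet_aton(address)   # sin_addr in network byte order
        + bytes(8)                    # sin_zero padding
    )


if __name__ == "__main__":
    # Test machine and port used in this article.
    print(pack_sockaddr_in("192.168.0.254", 77).hex())
```

Only the port and address bytes differ from the original structure, so in practice six bytes need changing in the debugger's memory view.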

We then continue stepping through the code right up to the send() call and let it execute. The netcat terminal prints everything it receives, so we can see exactly what would have been sent to the FBI. You can see that the ID cookie contains my MAC address (blurred out of paranoia) and the Host header contains the name of my desktop.

Finally, we step through to the final stage - the no-op at the end, which presumably leads on to more shellcode that I've yet to extract. You can also see in the hexdump window the constructed HTTP request that was sent.

Conclusions

The malware phones home with identifying information from the user's computer and then crashes the Firefox browser. In terms of sophistication it's nothing special: other than the exploit, it uses no obfuscation and no tricks that aren't already widely known.