64-bit Linux Shellcode

Writing shellcode isn’t fundamentally different from writing ordinary assembly. If you can get an assembly routine to run on a given architecture, it’s usually not difficult to convert it to runnable shellcode. In fact, before going through the motions presented in this tutorial, I had never written any shellcode. From a low-level programming perspective, I found the challenges and limitations presented by shellcoding to be interesting, so I gave it a shot.

Most of the shellcode on the internet right now is targeted toward 32-bit POSIX-compliant computers, and it usually will not work on 64-bit systems without modification because the syscall numbers, the registers used for arguments and the instruction used to enter the kernel all differ. This post outlines an efficient process for writing, testing and finalizing shellcode specifically targeted to 64-bit Linux systems. This process could serve as a template for porting shellcode between different architectures.

Writing 64-bit shellcode

  1. Exit shellcode (sys_exit)
  2. Testing shellcode with stub.c
  3. “Hello World!” shellcode
  4. Actual, shell-spawning shellcode

This guide assumes you have nasm, nasm2shell (tarball here) and a basic understanding of assembly and syscalls. If you don’t, get started here. If you’ve written shellcode before, you may want to skip the sys_exit section. The files referenced in this post are in a zip file here. The zipfile also includes some bonus material. If you like this guide, be sure to check out the guide to rolling your own 64-bit reverse TCP shellcode.

sys_exit(return_code)

The sys_exit routine is one of the simplest fully complete programs. After being invoked, it exits (that’s it). To call sys_exit, place the correct syscall code in the RAX register and the exit code in the RDI register, then make a syscall:

If you’re on a 64-bit Linux machine, the sys_exit syscall on your system is probably 60. But what if you’re not on a 64-bit Linux system, or you’re trying to port this to another *nix platform? On POSIX systems, you can find the exit() syscall in the unistd.h header file that corresponds to your architecture. unistd.h can be found in the include/asm folder, generally in /usr/include/asm. If compilation is being done for a 64-bit architecture, unistd.h includes unistd_64.h, which defines all of the kernel syscalls. It defines sys_exit and its aliases as follows:

So the exit syscall is 60. unistd_32.h, which defines syscalls for 32-bit architectures, covers many of the same syscalls as its 64-bit counterpart, but the numbers often differ. There, sys_exit is 1:
#define __NR_exit 1
Syscall 1 on 64-bit architectures is sys_write, so it’s important to double-check when porting assembly from one architecture to another!

The sys_exit shellcode

To turn assembly into shellcode, compile it to binary and convert the binary into a properly-escaped hexadecimal string. Traditionally, people converted binary to shellcode using objdump -D or a hex editor, but nasm2shell makes the process much simpler:

Voila. The routine for sys_exit can also be compiled from nasm and linked into a runnable ELF executable with ld if you would like to test it:

echo $? is the shell idiom for printing the previous program’s exit code. If you change the value placed in RDI before the exit syscall, it will show up as your exit code.

Nulls

You may have noticed that our shellcode contains null bytes. If you’ve had any experience with C or shellcoding before, you probably know shellcode usually can’t contain nulls: it is often delivered through string-handling functions, which treat a null byte as the end of the string and stop copying there. We need to change our program so the machine code doesn’t contain any \x00 bytes. In this case, it’s pretty easy:

No nulls. This shellcode is ready to be tested so you can submit it to Packetstorm.

Testing shellcode with stub.c

To double-check that a shellcode will run in the context of an actual executable, copy and paste it into the following c stub file:

This stub is an updated version of the classic shellcode test stub, with one key difference: In the new stub, the shellcode is #defined at compile-time so it can be placed directly into the main routine by gcc’s preprocessor. This is necessary because over time, Linux and GCC have become much more cautious about which sections of an executable file can contain executable code (as opposed to non-executable variables). The traditional version of the program won’t work on newer versions of Linux:

The classic shellcode c stub will generate a segfault on newer systems because the shellcode[] character array is stored in a data section of the ELF file that is mapped as non-executable. When the program casts the array to a function pointer and calls it, the CPU refuses to execute from that page and the program crashes.

“Hello World!” 64-bit shellcode

Printing a string is a little more complicated than populating registers and issuing a syscall. sys_write needs the address of a string in order to print it. In an executable, the address of the string is predetermined and we can easily refer to it in our routine. Since we don’t know where our string will be located in memory when the shellcode runs, we’ll need to figure out the string’s address at runtime before we can print it with sys_write:

There are two traditional ways to do this. We can push the string itself onto the stack and pass the stack pointer to sys_write (an unconventional use of the stack, but more on this later) or we can jump to the address before the string and issue a call instruction. When a call instruction is executed, it pushes the subsequent address onto the stack before jumping to a new location. If we jump to a call instruction situated right before our string, it will put the address of our string on top of the stack before jumping to our sys_write routine. We can then access the string address by popping it off the stack and feeding it to sys_write like so:

As you can see, the first instruction jumps to a call, which puts the string address on top of the stack before jumping to run. run then pops the string address, prints it and exits.

You may have noticed that the code snippet includes “BITS 64” and “GLOBAL _start”. These NASM directives allow us to compile the routine as an executable for testing:

The same file can also be dumped straight to shellcode if it works:

Pasting the shellcode into stub.c should produce the same output.

RIP relative addressing: the new method

Jumping to a call or pushing values onto the stack are the traditional ways to get addresses in shellcode. The advent of 64-bit processors also brought RIP relative addressing, which allows us to access memory relative to the instruction pointer, RIP. By nature, RIP relative addressing is position-independent and therefore well suited for shellcode.

RIP-relative addressing was a natural choice for printing “Hello World,” but when I tried it, it generated nulls, so I shelved it because work came up. Then reader Ca0s made a comment suggesting I place the string such that its RIP offset would be negative. Sure enough, the two’s-complement encoding of the negative offset took care of all the nulls. Ca0s’s solution is here, and my adaptation is very similar:

The routine is straightforward. It jumps over the message and gets the address of message relative to RIP. The “rel” keyword explicitly designates RIP relative addressing, which can also be turned on globally by adding the NASM directive “DEFAULT REL” to the top of your file. It’s important to make sure that relative addressing is actually being used. If NASM interpreted the lea rsi, [message] instruction in its default absolute addressing mode, it would code the absolute address of message into the shellcode, which would cause a segfault or unpredictable behavior.

Despite having only 13 instructions, at 58 bytes, the RIP-relative shellcode is a little heavier than the other options. The disconnect between instruction count and compiled program length is a characteristic of CISC architectures, where instructions are encoded with variable lengths.

Actual, shell-spawning 64-bit shellcode

The “Hello World!” and sys_exit examples may be instructive, but they don’t have much practical use. We want a shellcode that actually produces a shell. The following c program demonstrates the functionality we want:

execve is a c function that wraps a system call of the same name. It takes three arguments: a string that represents the path to an executable file, a pointer to an array of argument strings and a pointer to an array of environment variable strings in key=value format:
int execve(const char *filename, char *const argv[], char *const envp[]);
You’re not supposed to pass null arguments to execve but I always have, and it’s always worked fine for me. It simplifies writing shellcode and the result is more compact. Before we write shellcode to run execve, remember that many shells (bash, for example) drop elevated privileges on startup when the effective user ID differs from the real one. For obvious reasons, running new processes with the lowest-necessary permissions is an advisable practice.

We want to be sure our execve process is started with the highest-possible privileges, so we run setreuid first. setreuid sets the real and effective userid for a process. If we drop our shellcode into a process running as suid root, we want our shell to run as suid root, too! setreuid takes two arguments, the real userid and the effective userid. The function prototype looks like this:
int setreuid(uid_t ruid, uid_t euid);
unistd_64.h shows that the syscall for setreuid is 113, so the assembly routine for setreuid will look like this:

Now we can run execve. The string argument for execve is treated like the argument for “Hello World!” in sys_write, except instead of getting an explicitly-defined string length, execve expects the string to be null-terminated. Since we can’t include the actual null byte in the shellcode, we need to get creative. The result looks like this:

As in the “Hello World!” example, we jump to a call before our string, which puts the string’s address on top of the stack. Then we zero out RAX and add 59 (the syscall for execve). The integer 59 is only one byte long, which means that the other 7 bytes of RAX are all zero, including AH (59 only occupies AL). Byte-addressed registers like AH and AL are still available on 64-bit systems, and they can come in handy when you only need to access one byte. Since AH contains a null byte, we move AH onto the end of our “/bin/sh” string so it takes the place of N. Our shell string is now null-terminated, and we’re ready to run execve.

Testing

If you have a sharp eye, you may have noticed that the above code snippet puts our executable code in the .data section. If you thought, “Isn’t executable code supposed to be in the .text section?” you thought correctly. However, the .text section of an ELF file is usually read-only. If the linker put our code in the .text section, the program would crash when we try to modify our /bin/shN string to add a null-terminator. Since code in the .data section is r/w and executable if necessary, the program runs and modifies itself without a problem. Try it out:

Sure enough, the shellcode spawns a shell. For the same reasons that led us to the .data section hack, putting the shellcode into a c stub and running it is a little trickier. You can make it work if you change the flags in the ELF file’s headers to allow r/w/x in the correct sections, but the actual process is left as an exercise for the reader. If you want to try it or you need to change the headers of an ELF file in the future, I highly recommend HT Editor. It’s a great program for viewing and editing the innards of many different executable file formats.

Our finished 64-bit Linux shellcode looks like this, and weighs in at 49 bytes. Not too shabby. If you don’t need the setreuid feature, you can remove it to make the shellcode even smaller.

Bonus

Jumping to a call before a string is one way to get a string’s location in memory. Another way to get a string’s address is to push the actual string onto the stack:

64-bit registers can hold eight bytes, or eight ASCII characters. This is just enough to contain the string “/bin/sh” and an extra, unimportant character (here, 0xff). Since Intel stores bytes in little-endian order, we put “/bin/sh”, 0xff into the register backward, like this: 0xff, hs/nib/. Since we need to put a null byte in the position of 0xff, we shift the string left 8 bits (one character, the width of 0xff) and then shift it back to the right 8 bits. This effectively replaces 0xff with nulls. Then we push our null-terminated string onto the top of the stack and pass the stack pointer to execve as the address of our string.

The push method doesn’t contain any jumps or calls, in case that matters to you. Although the resulting shellcode has fewer instructions, it is three bytes larger. If either of these things matters to you in practice, please send me an email because I would love to hear why.

RIP-relative shellcode

In the same way that RIP-relative addressing simplified our “Hello World!” shellcode, RIP-relative addressing can be used in our shell-spawning routine:

Once again, the first instruction jumps over our data (the shell string), runs through the routine and then gets the address of the shell string using the lea Rd, [rel data] instruction, which loads the RIP relative address of data into register Rd. After the address is established, the execve syscall is issued and the program executes.

At 50 bytes, the RIP-relative shellcode is a close second in terms of size and best-yet in terms of instruction count, with only 13 instructions.

Files

All of the files for this post are in this zip file. The zip contains a bunch of DVD extras-style bonus material, including the source code for the 32-bit counterparts of the 64-bit routines and c stubs for testing.

Calling home

If you made it this far, check out the guide to rolling your own 64-bit reverse TCP shellcode.

Further Reading

Packetstorm’s Shellcode Archive

Shell-storm.org’s Archive

5 thoughts on “64-bit Linux Shellcode”

  1. On 64 bits you can use RIP relative addressing, so you don’t have to do the jmp call trick or push data into the stack.

    • This is true. RIP addressing is ideal but a bunch of work came up so I shelved the RIP version until I could figure out how to make it work without nulls. FWIW, the functional but null-producing version looks like this right now:
      ; sys_write(stdout, message, length)
      add rax, 1 ; sys_write
      add rdi, 1 ; stdout
      lea rsi, [rel message] ; message address
      add rdx, 14 ; message string length
      syscall
