Instructions
- Instructions are the building blocks of assembly programs.
- In x86 assembly, an instruction has two components: a mnemonic and operands.
- The mnemonic is a word that identifies the instruction to execute, such as mov, which moves data.
- Operands identify the information used by the instruction, such as registers or data:
  - An instruction has zero, one, or more operands (arguments).
- Operands can be:
  - A register
  - A memory location
  - An immediate value (e.g., 0x6453).
Mnemonic | Destination Operand | Source Operand |
mov | ecx | 0x42 |
Operation Codes (OpCodes) and Endianness
- Each instruction corresponds to an operation code (opcode) that tells the CPU which operation the program wants to perform.
- Disassemblers translate operation codes into human-readable instructions.
Instruction | mov ecx, | 0x42 |
OpCode | B9 | 42 00 00 00 |
- So the opcode bytes B9 42 00 00 00 translate to mov ecx, 0x42, or in plain English: move the value 0x42 into the ecx register.
- The x86 architecture uses the little-endian format, so the bytes 42 00 00 00 are read as the 32-bit value 0x00000042.
- Endianness describes whether the most significant byte (big-endian) or the least significant byte (little-endian) of a larger data item is stored first, at the lowest address.
- During network communication, malware must convert between the two, because network data uses big-endian and x86 programs use little-endian.
- This means that the IP address 127.0.0.1 is represented as 0x7F000001 in big-endian format. The first octet, 127, converts to hex as follows:
127 / 16 | = | 7.9375 |
0.9375 x 16 | = | 15 (F in hex) |
7 / 16 | = | 0.4375 |
0.4375 x 16 | = | 7 |
Answer | = | 7F |
127.0.0.1 | = | 0x7F000001 |
Therefore 127.0.0.1 is represented as 0x7F000001 over the network and as 0x0100007F in little-endian format (locally in memory).
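To make the byte ordering concrete, here is a minimal Python sketch. It is not a disassembler: the helper names decode_mov_imm32 and ip_as_integers are made up for illustration, and the decoder only handles the 5-byte "mov r32, imm32" form (opcodes B8 to BF), which is an assumption that covers the B9 example above and nothing else.

```python
import socket

# Register order used by the B8..BF "mov r32, imm32" opcodes.
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_imm32(code):
    """Decode one 5-byte 'mov r32, imm32' instruction (hypothetical helper)."""
    opcode = code[0]
    if len(code) != 5 or not 0xB8 <= opcode <= 0xBF:
        raise ValueError("not a 5-byte mov r32, imm32 encoding")
    # The immediate is stored least significant byte first (little-endian).
    imm = int.from_bytes(code[1:5], "little")
    return "mov {}, {:#x}".format(REGS[opcode - 0xB8], imm)


def ip_as_integers(ip):
    """Return the address read as (big-endian, little-endian) 32-bit integers."""
    raw = socket.inet_aton(ip)  # four bytes in network (big-endian) order
    return int.from_bytes(raw, "big"), int.from_bytes(raw, "little")
```

Calling decode_mov_imm32(bytes.fromhex("B942000000")) gives the mov ecx, 0x42 text from the table, and ip_as_integers("127.0.0.1") gives the two values quoted above for the network and in-memory views.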
Operands
Operands identify the data used by an instruction. There are three types of operands:
- Immediate operands are fixed values, such as 0x42.
- Register operands refer to registers, such as ecx.
- Memory address operands refer to a memory address that contains the value of interest, typically denoted by a value, register, or equation between square brackets, such as [eax].
Registers
- Registers are small storage areas available to the CPU, whose contents can be accessed more quickly than storage available elsewhere.
- x86 processors have a collection of registers available for use as temporary storage or workspace. The most common registers fall into four categories:
  - General registers, used by the CPU during execution.
  - Segment registers, used to track sections of memory.
  - Status flags, used to make decisions.
  - Instruction pointers, used to keep track of the next instruction to execute.
- We will use registers to track arguments, variables, and function return values.
General Registers | Segment Registers | Status Registers | Instruction Registers |
EAX (AX, AH, AL) | CS | EFLAGS | EIP |
EBX (BX, BH, BL) | SS | | |
ECX (CX, CH, CL) | DS | | |
EDX (DX, DH, DL) | ES | | |
EBP (BP) | FS | | |
ESP (SP) | GS | | |
ESI (SI) | | | |
General purpose registers
General registers typically store data or memory addresses. All general registers are 32 bits in size and can be referenced as either 32 or 16 bits in assembly code.
These registers are often used in a consistent fashion throughout a program. Using registers in a consistent way is known as a convention, and knowing the compiler's conventions lets analysts examine code more quickly and accurately.
- EAX: Used for additions and multiplications, and stores return values.
- EBX: Often set to a commonly used value (such as 0) throughout a function to speed up calculations.
- ECX: Used as a function parameter or a loop counter.
- EDX: Paired with EAX for multiplication and division. EDX references the full 32-bit register and DX references its lower 16 bits.
- EBP: Used to reference arguments and local variables.
- ESP: Points to the last item pushed onto the stack.
- ESI/EDI: Used by memory transfer instructions.
EAX, EBX, ECX, and EDX can also be referenced as 8-bit values using the lowest 8 bits or the second set of 8 bits. For example, AL references the lowest 8 bits of the EAX register and AH references the second 8 bits (bits 8 to 15).
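The relationship between EAX, AX, AH, and AL is just masking and shifting. The sketch below models it in Python; sub_registers and write_al are hypothetical helper names used only for illustration.

```python
def sub_registers(eax):
    """Return the 32-, 16-, and 8-bit views of a value held in EAX."""
    eax &= 0xFFFFFFFF
    return {
        "EAX": eax,
        "AX": eax & 0xFFFF,       # lower 16 bits
        "AH": (eax >> 8) & 0xFF,  # second set of 8 bits (bits 8 to 15)
        "AL": eax & 0xFF,         # lowest 8 bits
    }


def write_al(eax, value):
    """Model 'mov al, value': only the lowest 8 bits of EAX change."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)
```

For example, with EAX = 0x12345678 the views are AX = 0x5678, AH = 0x56, and AL = 0x78.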
Segment (special Use) registers
These registers hold flags and track program execution.
- EIP: Points to the next instruction to execute.
- EFLAGS: Its bits represent the outcome of computations and control CPU operations.
- Segment registers:
  - CS: Code segment
  - DS: Data segment
  - SS: Stack segment
Flags
The EFLAGS register is the status register in the x86 architecture. It is 32 bits in size and its individual bits are flags. During execution, each flag is either set (1) or cleared (0) to control CPU operations or to indicate the result of a CPU operation. Four flags are of particular interest to a malware reverse engineer:
- ZF: The zero flag is set when the result of an operation is zero and cleared when the result is not zero.
- CF: The carry flag is set when the result of an operation is too large or too small for the destination operand; otherwise it is cleared.
- SF: The sign flag is set when the result of an operation is negative and cleared when the result is positive. Put differently, it is set when the most significant bit of the result is set after an arithmetic operation.
- TF: The trap flag is used for debugging. When this flag is set, the x86 processor executes only one instruction at a time.
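As a rough illustration of how ZF, CF, and SF follow from a result, here is a small Python model of a 32-bit addition. The function name add32 is made up for this sketch, and it only tracks the three flags discussed above.

```python
MASK32 = 0xFFFFFFFF


def add32(dst, src):
    """Model a 32-bit 'add dst, src' and return (result, flags)."""
    total = (dst & MASK32) + (src & MASK32)
    result = total & MASK32  # what actually fits in the destination
    flags = {
        "ZF": int(result == 0),     # result is zero
        "CF": int(total > MASK32),  # unsigned result did not fit in 32 bits
        "SF": (result >> 31) & 1,   # copy of the most significant bit
    }
    return result, flags
```

Adding 1 to 0xFFFFFFFF wraps to 0 and sets both ZF and CF, while adding 1 to 0x7FFFFFFF sets only SF because the most significant bit of the result becomes 1.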
Instruction Register (EIP)
EIP is also known as the instruction pointer or program counter. It is a register that contains the memory address of the next instruction to be executed by a program. It basically tells the processor what to do next. When EIP points to a memory address that does not contain legitimate code, it is known as a corrupted EIP, and the program typically crashes.
Attackers often attempt to gain control of EIP through exploits. When they succeed, they control what code the CPU executes.
Simple Instructions (Intel)
mov – used to copy data from one location to another. It can read and write both registers and memory (RAM).
mov destination, source
When an operand is surrounded by square brackets [], it is treated as a memory reference to data.
For example:
mov eax, [ebx]
This means: copy the data at the memory address held in ebx into eax.
An instruction similar to mov is lea (load effective address), written lea destination, source. Instead of reading the data at the source address, lea stores the address itself in the destination.
Instruction | Description |
mov eax, ebx | Copy the contents of ebx into eax |
mov eax, 0x42 | Copy the value 0x42 into eax |
mov eax, [0x4037c4] | Copy the 4 bytes at memory location 0x4037c4 into eax |
mov eax, [ebx + esi * 4] | Copy the 4 bytes at the memory location given by the equation ebx + esi * 4 into the eax register |
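The bracketed forms in the table can be modelled in a few lines of Python. This is only a sketch: memory is assumed to be a dict mapping addresses to byte values, and effective_address, mov_from_memory, and lea are illustrative names rather than any real API.

```python
def effective_address(base, index=0, scale=1, disp=0):
    """Compute base + index * scale + disp, wrapped to 32 bits."""
    return (base + index * scale + disp) & 0xFFFFFFFF


def mov_from_memory(memory, address):
    """Model 'mov r32, [address]': read 4 bytes, least significant byte first."""
    data = bytes(memory[address + i] for i in range(4))
    return int.from_bytes(data, "little")


def lea(base, index=0, scale=1, disp=0):
    """Model 'lea r32, [...]': return the address itself, with no memory read."""
    return effective_address(base, index, scale, disp)
```

With ebx = 0x1000 and esi = 2, the expression ebx + esi * 4 evaluates to 0x1008. mov reads the four bytes stored there, whereas lea simply yields 0x1008.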
Arithmetic
Addition – add destination, source
Subtraction – sub destination, source
The subtraction operation modifies two important flags:
- ZF, set if result is zero
- CF, set if destination is less than the value subtracted.
Increment register by one – inc edx
Decrement register by one – dec ecx
Multiplication and division
Multiplication and division both act on a predefined register. The instruction is simply the mnemonic plus the value that the register will be multiplied or divided by. The value must be a register or memory operand; mul and div do not accept an immediate.
mul value
div value
The register on which mul or div acts is often set up many instructions earlier, so you might need to search back through the program to find it.
The multiplication instruction always multiplies EAX by a value. Therefore, EAX must be set up appropriately before the multiplication occurs. The result is stored as a 64-bit value across two registers, EDX and EAX.
EDX stores the most significant 32 bits of the result and EAX stores the least significant 32 bits.
The division instruction does the same as the multiplication, except in the opposite direction:
- It divides the 64-bit value held across EDX and EAX by value.
- EAX and EDX must be set up appropriately before the division.
- The result of the division is stored in EAX and the remainder is stored in EDX.
- mov ecx, 0x50 followed by mul ecx multiplies EAX by 0x50 and stores the result in EDX:EAX.
- mov ecx, 0x75 followed by div ecx divides EDX:EAX by 0x75 and stores the result in EAX and the remainder in EDX.
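The EDX:EAX pairing is easier to see with numbers. The following Python sketch models unsigned 32-bit mul and div; mul32 and div32 are hypothetical names, and the error handling only approximates the divide error a real CPU would raise.

```python
MASK32 = 0xFFFFFFFF


def mul32(eax, value):
    """Model unsigned 'mul value': return (EDX, EAX) holding the 64-bit product."""
    product = (eax & MASK32) * (value & MASK32)
    return product >> 32, product & MASK32


def div32(edx, eax, value):
    """Model unsigned 'div value': return (EAX, EDX) = (quotient, remainder)."""
    value &= MASK32
    if value == 0:
        raise ZeroDivisionError("divide by zero")
    dividend = ((edx & MASK32) << 32) | (eax & MASK32)
    quotient, remainder = divmod(dividend, value)
    if quotient > MASK32:
        # Assumption: stands in for the CPU fault when the quotient
        # does not fit in EAX.
        raise OverflowError("quotient does not fit in EAX")
    return quotient, remainder
```

Multiplying 0xFFFFFFFF by 2 gives EDX = 1 and EAX = 0xFFFFFFFE, which shows why EDX cannot be ignored when the product exceeds 32 bits.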
Logical operators
OR, AND, and XOR operate in the same way as addition and subtraction. They perform the specified operation between the source and destination operands and store the result in the destination.
Shifting Registers
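- shr (shift right) – shifts the bits in the destination operand to the right by the number of bits specified in the count operand.
- shl (shift left) – shifts the bits in the destination operand to the left by the number of bits specified in the count operand.
- Bits shifted beyond the destination boundary are first shifted into the CF flag, and zero bits are filled in during the shift.
The rotation instructions, rotate right (ror) and rotate left (rol), work like shifts, except that the bits that fall off one end are appended to the other end:
- ror – rotate right: the least significant bit is rotated into the most significant position.
- rol – rotate left: the opposite of ror; the most significant bit is rotated into the least significant position.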
Instruction | Description |
xor eax, eax | Clear EAX |
or eax, 0x7575 | Logical OR of EAX with 0x7575 |
mov eax, 0xA / shl eax, 2 | Shift the EAX register to the left by 2 bits; these two instructions result in EAX = 0x28, because 1010 (0xA) shifted 2 bits to the left is 101000 (0x28). |
mov bl, 0xA / ror bl, 2 | Rotate the BL register to the right by 2 bits; these two instructions result in BL = 10000010, because 00001010 rotated 2 bits to the right is 10000010. |
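The shift and rotate rows above can be checked with a short Python model. This is a simplified sketch: the function names mirror the mnemonics, the bits parameter is an assumption used to pick the register width, and the CF flag is not tracked.

```python
def shl(value, count, bits=32):
    """Shift left; bits pushed past the top of the register are dropped."""
    mask = (1 << bits) - 1
    return (value << count) & mask


def shr(value, count, bits=32):
    """Shift right; zero bits are filled in from the left."""
    mask = (1 << bits) - 1
    return (value & mask) >> count


def ror(value, count, bits=32):
    """Rotate right; bits leaving the low end re-enter at the high end."""
    mask = (1 << bits) - 1
    count %= bits
    value &= mask
    return ((value >> count) | (value << (bits - count))) & mask


def rol(value, count, bits=32):
    """Rotate left; bits leaving the high end re-enter at the low end."""
    mask = (1 << bits) - 1
    count %= bits
    value &= mask
    return ((value << count) | (value >> (bits - count))) & mask
```

Here shl(0xA, 2) returns 0x28 and ror(0xA, 2, bits=8) returns 0b10000010, matching the EAX and BL examples in the table.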
During malware analysis, if you see a function that contains only the instructions XOR, OR, AND, SHL, SHR, or ROL, repeated in a seemingly random order, you have likely encountered an encryption or compression function. The recommendation is not to get too caught up in analysing it unless it is absolutely necessary; instead, mark it as an encryption/compression function and move on.
NOP (No operation)
The NOP instruction does nothing; execution simply proceeds to the next instruction. Malware analysts use this instruction during code analysis to bypass malware defensive mechanisms. We will investigate such techniques in the upcoming parts of this series.
- nop is an alias for xchg (exchange) eax, eax. Since it exchanges a register with itself, it effectively does nothing.
- The opcode for the no operation (nop) instruction is 0x90.
The Stack
- Memory for functions, local variables, and flow control is stored on the stack, a data structure characterized by pushing items onto it and popping items off it.
- It uses a Last In, First Out (LIFO) mechanism.
- The x86 architecture uses the ESP and EBP registers to support the stack mechanism.
- ESP is the stack pointer and typically contains a memory address that points to the top of the stack.
- The value of this register changes as items are pushed onto or popped off the stack.
- EBP is the base pointer, which stays constant within a given function so that the program can use it as a placeholder to keep track of the location of local variables and parameters.
- The stack instructions include:
  - push
  - pop
  - call
  - leave
  - enter
  - ret (return)
- The stack is allocated top-down in memory: the highest addresses are allocated and used first. As values are pushed onto the stack, smaller addresses are used.
- The stack is used for short-term storage only and frequently stores local variables, parameters, and return addresses.
- The stack's primary use is managing the data exchanged between function calls.
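The way ESP moves can be sketched with a toy model. The class name Stack32 and the starting address are assumptions for illustration; the point is only that push moves ESP to a lower address and pop moves it back up.

```python
class Stack32:
    """Toy model of a 32-bit stack that grows toward lower addresses."""

    def __init__(self, top=0x1000):
        self.esp = top    # stack pointer
        self.memory = {}  # address -> 32-bit value

    def push(self, value):
        self.esp -= 4     # make room first: the stack grows downward
        self.memory[self.esp] = value & 0xFFFFFFFF

    def pop(self):
        value = self.memory[self.esp]  # read the item ESP points at
        self.esp += 4                  # then release the slot
        return value
```

Pushing two values onto a stack that starts at 0x1000 leaves ESP at 0x0FF8, and popping returns them in reverse order (last in, first out).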
Function Calls
Many functions contain a prologue, which is a few lines of code at the start of the function. The prologue prepares the stack and registers for use within the function. An epilogue at the end of the function restores the stack and registers to their state before the function was called.
The most common implementation of function calls is:
- Arguments are placed onto the stack using push instructions.
- The function is called using call memory_location. This causes the return address (the current content of the EIP register, which is the address of the instruction after the call) to be pushed onto the stack. This address is used to return to the calling code when the function finishes.
- In the function prologue, EBP (the base pointer) is pushed onto the stack and space is allocated on the stack for local variables.
- Pushing EBP saves it for the calling function.
- In the epilogue, the stack is restored. ESP is adjusted to free the local variables and EBP is restored so that the calling function can address its variables properly. The leave instruction can be used as an epilogue because it sets ESP equal to EBP and pops EBP off the stack.
- The function returns by executing the ret instruction. This pops the return address off the stack and into EIP, so that the program continues executing from where the original call was made.
- The stack is adjusted to remove the arguments that were sent, unless they will be used again later.
The x86 architecture provides additional instructions for pushing and popping, the most popular of which are pusha and pushad.
These two instructions push all the registers onto the stack and are commonly paired with popa and popad, which pop all the registers off the stack.
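The steps above can be traced with a small Python simulation. It is a sketch under stated assumptions: arguments are pushed last-to-first and the caller removes them afterwards (a cdecl-style convention, which the text does not specify), and the addresses and the name simulate_call are made up for illustration.

```python
def simulate_call(args, local_bytes, esp=0x1000, ebp=0x2000,
                  return_address=0x401005):
    """Trace ESP/EBP through a call, prologue, epilogue, and ret."""
    memory = {}

    def push(value):
        nonlocal esp
        esp -= 4
        memory[esp] = value

    def pop():
        nonlocal esp
        value = memory[esp]
        esp += 4
        return value

    for arg in reversed(args):   # assumption: last argument is pushed first
        push(arg)
    push(return_address)         # call: save where to resume afterwards
    push(ebp)                    # prologue: push ebp (save caller's base pointer)
    ebp = esp                    # prologue: mov ebp, esp
    esp -= local_bytes           # prologue: reserve space for local variables
    first_arg = memory[ebp + 8]  # arguments sit above saved EBP and return address
    esp = ebp                    # leave: mov esp, ebp (frees the locals)
    ebp = pop()                  # leave: pop ebp (restores caller's base pointer)
    eip = pop()                  # ret: pop the return address into EIP
    esp += 4 * len(args)         # assumption: caller removes the arguments
    return {"first_arg": first_arg, "eip": eip, "esp": esp, "ebp": ebp}
```

After simulate_call([0x11, 0x22], 8), ESP and EBP are back at their starting values and EIP holds the return address, which is the state the epilogue and ret are meant to restore.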
- pusha pushes the 16-bit registers onto the stack in the following order:
  - AX
  - CX
  - DX
  - BX
  - SP
  - BP
  - SI
  - DI
- pushad pushes the 32-bit registers onto the stack in the following order:
  - EAX
  - ECX
  - EDX
  - EBX
  - ESP
  - EBP
  - ESI
  - EDI
- These instructions are normally encountered in shellcode when someone wants to save the current state of the registers to the stack so that they can be restored later. Compilers rarely use these instructions, so seeing them often indicates hand-coded assembly and/or shellcode.
Conditional Instruction
The two most popular conditional instructions are test and cmp (compare). test performs a bitwise AND of its operands and cmp performs a subtraction; both discard the result and only set flags. ZF is typically the flag of interest after a test instruction.
- Testing a register against itself is often used to check for NULL values:
  - test eax, eax
- cmp is used only to set flags, chiefly CF and ZF:
  - cmp destination, source
  - destination = source: ZF is set (1) and CF is cleared (0).
  - destination < source: ZF is cleared (0) and CF is set (1).
  - destination > source: ZF is cleared (0) and CF is cleared (0).
Branching
Branching controls the flow of a program. The most popular branching instruction is the jump (jmp) instruction.
jmp location
This causes the next instruction executed to be the one at the location specified by jmp, which is known as an unconditional jump.
Conditional jumps are used because assembly has no if statement. A conditional jump uses the flags to determine whether to jump or to proceed to the next instruction. There are more than 30 conditional jumps, but only a small set of them is commonly encountered. They fall into three groups:
- Conditional jumps used on signed data, for arithmetic operations
- Conditional jumps used on unsigned data, for logical operations
- Conditional jumps that have special uses and check the values of the flags
Ref: https://www.tutorialspoint.com/assembly_programming/assembly_conditions.htm
REP instruction
Rep instructions are used to manipulate data buffers. They are also called string instructions. The buffers are usually arrays of bytes, but the elements can also be words or double words. The most common data buffer instructions are:
- movsx
- cmpsx
- stosx
- scasx
Where x = b, w, or d for byte, word, or double word respectively.
The ESI and EDI registers are used for these operations.
- ESI is the source index register.
- EDI is the destination index register.
- ECX is used as a counting variable.
- These instructions require a prefix to operate on data lengths greater than 1.
- On its own, movsb moves only a single byte and does not use the ECX register. Combined with a rep prefix, it moves a sequence of bytes from one location to another.
- rep: repeat until ECX = 0
- repe/repz: repeat until ECX = 0 or ZF = 0
- repne/repnz: repeat until ECX = 0 or ZF = 1
With a rep prefix, each repetition increments the ESI and EDI offsets and decrements the ECX register.
The DF (direction flag) is rarely flipped in a compiled C program; in shellcode, however, the direction is often flipped so that data is stored in the reverse direction.
cmpsb is used to compare two sequences of bytes.
scasb is used to search for a single value in a sequence of bytes.
stosb is used to store values at the location specified by EDI.
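To show how the prefixes drive these instructions, here is a minimal Python model of rep movsb and repne scasb. The buffer is assumed to be a bytearray indexed by ESI and EDI, and the function names are made up for this sketch.

```python
def rep_movsb(memory, esi, edi, ecx, df=0):
    """Model 'rep movsb': copy ECX bytes from [ESI] to [EDI]."""
    step = -1 if df else 1  # DF = 1 walks the buffers backwards
    while ecx != 0:
        memory[edi] = memory[esi]
        esi += step
        edi += step
        ecx -= 1
    return esi, edi, ecx


def repne_scasb(memory, edi, ecx, al):
    """Model 'repne scasb': scan from [EDI] for the byte in AL."""
    zf = 0
    while ecx != 0:
        zf = int(memory[edi] == al)  # compare AL with the current byte
        edi += 1                     # assumes DF = 0 (forward scan)
        ecx -= 1
        if zf:                       # repne stops once ZF = 1
            break
    return edi, ecx, zf
```

Scanning for a zero byte with a large ECX is a common way to find the end of a C string, because the amount by which ECX decreased is the number of bytes examined.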
C main Method and offset
A standard C program has two arguments for the main function, typically as follows:
- int main(int argc, char** argv)
- The parameters argc and argv are determined at runtime.
- argc is an integer that contains the number of arguments on the command line, including the program name.
- argv is a pointer to an array of strings that contain the command-line arguments.
Example:
filetestprogram.exe -r filename.txt
The above command line results in the following argc and argv when the program runs:
argc = 3
argv[0] = filetestprogram.exe
argv[1] = -r
argv[2] = filename.txt
The C code for this:
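#include <string.h>
#include <windows.h>

int main(int argc, char* argv[])
{
    /* Expect exactly three arguments: program name, "-r", and a file name. */
    if (argc != 3)
    {
        return 0;
    }
    /* If the first argument is "-r", delete the named file. */
    if (strncmp(argv[1], "-r", 2) == 0)
    {
        DeleteFileA(argv[2]);
    }
    return 0;
}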
The assembly code for the above C program is:
004113CE cmp [ebp + argc], 3
004113D2 jz short loc_4113D8
004113D4 xor eax, eax
004113D6 jmp short loc_411414
004113D8 mov esi, esp
004113DA push 2 ; MaxCount
004113DC push offset Str2 ; "-r"
004113E1 mov eax, [ebp + argv]
004113E4 mov ecx, [eax + 4]
004113E7 push ecx ; Str1
004113E8 call strncmp
004113F8 test eax, eax
004113FA jnz short loc_411412
004113FC mov esi, esp
004113FE mov eax, [ebp + argv]
00411401 mov ecx, [eax + 8]
00411404 push ecx ; lpFileName
00411405 call DeleteFileA