CSAW CTF 2020 : Cuba Write-Up
Hi
This is my write-up for the Cuba challenge from CSAW CTF 2020.
This challenge is a CUDA program wrapped in a Windows executable. CUDA is NVIDIA's parallel computing platform and programming model for running general-purpose, high-performance code on GPUs.
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
Luckily, there is a public SDK for it that includes a disassembler:
https://docs.nvidia.com/cuda/cuda-binary-utilities/index.html
Using a tool called cuobjdump, we can extract the PTX assembly code:
To extract ptx text from a host binary, use the following command:
cuobjdump -ptx <host binary>
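If you want to script this step, here is a minimal Python sketch that shells out to the same command. The function name `dump_ptx` and its `tool` parameter are my own placeholders, and it assumes cuobjdump is on your PATH.

```python
import shutil
import subprocess


def dump_ptx(host_binary, tool="cuobjdump"):
    """Return the PTX text embedded in host_binary (hypothetical helper)."""
    # Fail early with a clear error if the tool is not installed.
    if shutil.which(tool) is None:
        raise FileNotFoundError(f"{tool} not found on PATH")
    # Same invocation as above: cuobjdump -ptx <host binary>
    result = subprocess.run(
        [tool, "-ptx", host_binary],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Usage (the file name is only an example):
# ptx_text = dump_ptx("challenge.exe")
```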
After reversing the output, we can see that the kernel is a simple XOR loop: each byte of the input is XORed with its index and compared against a ciphered copy of the flag stored on the stack.
.global .align 1 .b8 $CORRECT[18] = {67, 79, 82, 82, 69, 67, 84, 32, 80, 65, 83, 83, 87, 79, 82, 68, 33, 0}; // ASCII "CORRECT PASSWORD!"
.global .align 1 .b8 $WRONG[27] = {87, 82, 79, 78, 71, 32, 80, 65, 83, 83, 87, 79, 82, 68, 44, 32, 84, 82, 89, 32, 65, 71, 65, 73, 78, 33, 0}; // ASCII "WRONG PASSWORD, TRY AGAIN!"
...
// store each XORed byte of the flag on the stack
mov.u16 %rs1, 99;
.loc 1 14 18;
st.u8 [%SP+30], %rs1;
mov.u16 %rs2, 103;
st.u8 [%SP+29], %rs2;
mov.u16 %rs3, 104;
st.u8 [%SP+28], %rs3;
mov.u16 %rs4, 122;
st.u8 [%SP+27], %rs4;
mov.u16 %rs5, 41;
st.u8 [%SP+26], %rs5;
mov.u16 %rs6, 113;
...
LOOP:
.loc 1 20 5;
mov.u32 %r4, %r21; // %r4 = i
mov.u32 %r3, %r20;
ld.u32 %r11, [%rd2]; // %r11 = FLAG_SIZE
setp.lt.s32 %p3, %r4, %r11; // %p3 = i < %r11
not.pred %p4, %p3; // %p4 = %p3 == 0
@%p4 bra FINAL_CMP; // if %p4 : jmp
bra.uni UNCIPHER;
UNCIPHER:
.loc 1 22 9;
cvt.s64.s32 %rd10, %r4; // %rd10 = (s64) i
add.s64 %rd11, %rd1, %rd10; // %rd11 = &INPUT[i]
ld.u8 %rs25, [%rd11]; // %rs25 = INPUT[i]
cvt.u32.u16 %r14, %rs25; // zero-extend to 32 bits
cvt.s32.s8 %r15, %r14; // %r15 = (signed char) INPUT[i]
xor.b32 %r16, %r15, %r4; // %r16 = %r15 ^ i
cvt.s64.s32 %rd12, %r4; // %rd12 = (s64) i
add.u64 %rd13, %SP, 0; // %rd13 = &FLAG
add.s64 %rd14, %rd13, %rd12; // %rd14 = &FLAG[i]
ld.u8 %rs26, [%rd14]; // %rs26 = FLAG[i]
cvt.u32.u16 %r17, %rs26; // zero-extend to 32 bits
cvt.s32.s8 %r18, %r17; // %r18 = (signed char) FLAG[i]
setp.eq.s32 %p7, %r16, %r18; // %p7 = %r16 == %r18
not.pred %p8, %p7; // %p8 = %p7 == 0 (bytes differ)
mov.u32 %r22, %r3;
@%p8 bra BB6_6; // if %p8 : jmp
...
bra.uni LOOP;
FINAL_CMP:
.loc 1 26 5;
setp.eq.s32 %p5, %r3, 31; // %p5 = %r3 == 31
not.pred %p6, %p5; // %p6 = %p5 == 0
@%p6 bra WRONG_PASS;
bra.uni CORRECT_PASS;
WRONG_PASS:
.loc 1 30 9;
mov.u64 %rd4, $WRONG;
cvta.global.u64 %rd5, %rd4;
mov.u64 %rd6, 0;
CORRECT_PASS:
.loc 1 27 9;
mov.u64 %rd7, $CORRECT;
cvta.global.u64 %rd8, %rd7;
mov.u64 %rd9, 0;
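Copying the ciphered bytes out of the PTX by hand is error-prone, so here is a rough Python sketch that collects the `mov.u16` / `st.u8` pairs automatically. The regexes and the name `collect_stack_bytes` are my own, and it assumes every byte is written with exactly the pattern shown at the top of the listing. The sample only contains the five stores quoted above.

```python
import re

# Patterns for the two instructions used to build the array on the stack.
MOV = re.compile(r"mov\.u16\s+(%rs\d+),\s*(\d+);")
STORE = re.compile(r"st\.u8\s+\[%SP(?:\+(\d+))?\],\s*(%rs\d+);")


def collect_stack_bytes(ptx_text):
    """Map each stack offset to the immediate value stored there."""
    regs = {}   # register name -> last immediate loaded into it
    stack = {}  # stack offset -> byte value
    for line in ptx_text.splitlines():
        line = line.strip()
        mov = MOV.match(line)
        if mov:
            regs[mov.group(1)] = int(mov.group(2))
            continue
        store = STORE.match(line)
        if store and store.group(2) in regs:
            offset = int(store.group(1) or 0)  # a bare [%SP] means offset 0
            stack[offset] = regs[store.group(2)]
    return stack


sample = """
mov.u16 %rs1, 99;
.loc 1 14 18;
st.u8 [%SP+30], %rs1;
mov.u16 %rs2, 103;
st.u8 [%SP+29], %rs2;
mov.u16 %rs3, 104;
st.u8 [%SP+28], %rs3;
mov.u16 %rs4, 122;
st.u8 [%SP+27], %rs4;
mov.u16 %rs5, 41;
st.u8 [%SP+26], %rs5;
"""

stack = collect_stack_bytes(sample)
print([stack[k] for k in sorted(stack)])  # -> [41, 122, 104, 103, 99]
```

Sorting by offset gives the bytes in index order, which is the order the XOR loop walks through them.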
Since XOR is its own inverse, XORing each stored byte with its index again recovers the flag. The decryption script:
xorred = [102, 109, 99, 100, 127, 104, 53, 52, 124, 86, 103, 56, 83, 100, 96, 80, 114, 125, 123, 99, 103, 74, 120, 72, 123, 113, 41, 122, 104, 103, 99]
for i in range(len(xorred)):
    xorred[i] = xorred[i] ^ i  # undo the XOR with the index
print(bytes(xorred).decode())  # print the flag as text
And here is the flag: flag{m33t_m3_in_blips_n_ch3atz}
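To double-check the result, here is a small Python model of the kernel's check as I understand it from the PTX. The names `check`, `CIPHER` and `FLAG_SIZE` are mine, the length guard is my addition, and I am assuming the elided part of the loop increments the counter in `%r3` on each matching byte.

```python
FLAG_SIZE = 31

# Ciphered flag bytes, in index order.
CIPHER = [102, 109, 99, 100, 127, 104, 53, 52, 124, 86, 103, 56, 83, 100,
          96, 80, 114, 125, 123, 99, 103, 74, 120, 72, 123, 113, 41, 122,
          104, 103, 99]


def check(password):
    """Return True if password passes the modeled XOR check."""
    data = password.encode()
    matches = 0  # plays the role of %r3 (assumed match counter)
    for i in range(FLAG_SIZE):
        # UNCIPHER block: compare INPUT[i] ^ i against FLAG[i].
        if i < len(data) and (data[i] ^ i) == CIPHER[i]:
            matches += 1
    # FINAL_CMP block: all 31 bytes must match.
    return matches == 31


print(check("flag{m33t_m3_in_blips_n_ch3atz}"))  # -> True
print(check("flag{wrong}"))                      # -> False
```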
~r0da