Sandboxing Untrusted Python

(gist.github.com)

67 points | by mavdol04 1 day ago

11 comments

  • corv 1 day ago
    The gist dismisses sandbox-2 as “might as well use Docker or VMs” but IMO that misses what makes it interesting. The PyPy sandbox isn’t just isolation, it’s syscall interception with a controller in the loop.

    I’ve been building on that foundation: script runs in sandbox, all commands and file writes get captured, human-in-the-loop reviews the diff before anything executes. It’s not adversarial (block/contain) but collaborative (show intent, ask permission).

    Different tradeoff than WASM or containers: lighter than VMs, cross-platform, and the user sees exactly what the agent wants to do before approving.

    WIP, currently porting to PyPy 3.8 to unlock macOS arm64 support: https://github.com/corv89/shannot
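To make the "capture, show the diff, then ask permission" flow above concrete, here is a minimal sketch. It is not how shannot is implemented; `PendingChanges` and its methods are hypothetical names, and the "filesystem" is an in-memory dict so the example stays self-contained.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class PendingChanges:
    """Collects intended file writes instead of performing them (hypothetical helper)."""
    files: dict = field(default_factory=dict)   # current contents: path -> text
    writes: dict = field(default_factory=dict)  # proposed contents: path -> text

    def write(self, path: str, text: str) -> None:
        # Record the intent only; the current state is left untouched.
        self.writes[path] = text

    def diff(self) -> str:
        # Render every proposed write as a unified diff for a human reviewer.
        chunks = []
        for path, new in sorted(self.writes.items()):
            old = self.files.get(path, "")
            chunks.extend(difflib.unified_diff(
                old.splitlines(keepends=True),
                new.splitlines(keepends=True),
                fromfile=f"a/{path}",
                tofile=f"b/{path}",
            ))
        return "".join(chunks)

    def apply(self, approved: bool) -> bool:
        # Commit the captured writes only if the reviewer approved them.
        if approved:
            self.files.update(self.writes)
        self.writes.clear()
        return approved


pending = PendingChanges(files={"notes.txt": "hello\n"})
pending.write("notes.txt", "hello\nworld\n")
print(pending.diff())  # what the reviewer would see before approving
```

A real controller would populate `writes` from intercepted syscalls rather than from explicit `write()` calls; the review-then-apply step would look much the same.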

  • petters 1 day ago
    > Older alternatives like sandbox-2 exist, but they provide isolation near the OS level, not the language level. At that point we might as well use Docker or VMs.

    No, no, Docker is not a sandbox for untrusted code.

    • senko 1 day ago
      What if I told you that, back in the day, we were letting thousands of untrusted, unruly, mischievous people execute arbitrary code on the same machine, and somehow, the world didn't end?

      We live in a bizarre world where somehow "you need a hypervisor to be secure" and "to install this random piece of software, run curl | sudo bash" can live next to each other and both be treated seriously.

    • neoCrimeLabs 1 day ago
      It depends on your threat model, but generally speaking I would not trust default container runtimes for a true sandbox.

      The kata-containers [1] runtime takes a container and runs it as a virtual host. It works with Docker, podman, k8s, etc.

      It's a way to get the convenience of a container, but benefits of a virtual host.

      This is not the be-all and end-all (there are more options), but it is a convenient one that is better than typical containers.

      [1] - https://katacontainers.io/

    • maple3142 1 day ago
      I don't think it is generally possible to escape from a Docker container in its default configuration (e.g. `docker run --rm -it alpine:3 sh`) if you have a reasonably up-to-date kernel from your distro. AFAIK a lot of kernel LPEs use features like unprivileged user ns and io_uring, which are not available in containers by default, and truly unprivileged kernel LPEs seem to be sufficiently rare.
      • staticassertion 1 day ago
        The kernel policy is that any distro that isn't using a rolling release kernel is unpatched and vulnerable, so "reasonably up-to-date" is going to lean heavily on what you consider "reasonable".

        LPEs abound - unprivileged user ns was a whole gateway that was closed, io_uring was hot for a while, eBPF is another great target, and I'm sure more and more will be found every year, as has been the case. Seccomp, unprivileged containers, etc. make a huge difference in stomping out a lot of the attack surface; you can decide how comfortable you are with that, though.

        • gruez 1 day ago
          >The kernel policy is that any distro that isn't using a rolling release kernel is unpatched and vulnerable, so "reasonably up-to-date" is going to lean heavily on what you consider "reasonable".

          I would expect major distributions to have embargoed CVE access specifically to prevent this issue.

          • staticassertion 1 day ago
            Nope, that is not the case. For one thing, upstream doesn't issue CVEs and doesn't really care about CVEs or consider them valid. For another, they forbid or severely limit embargos.
    • mavdol04 1 day ago
      You're right, Docker isn't a sandbox for untrusted code. I mentioned it because I've seen teams default to using it for isolating their agents on larger servers. So I made sure to clarify in the article that it's not secure for that purpose.
      • ottah 18 hours ago
        It depends on the task, and the risk of isolation failure. Docker can be sufficient if inputs are from trusted sources and network egress is reasonably limited.
    • ashishb 1 day ago
      Show me how you will escape a docker sandbox.
      • neoCrimeLabs 1 day ago
        This is a well understood and well documented subject. Do your own research.

        Start here to help give you ideas for what to research:

        https://linuxsecurity.com/features/what-is-a-container-escap...

        • ashishb 1 day ago
          > This is a well understood and well documented subject. Do your own research.

          Anything including GNU/Linux kernel can be broken with such security vulnerabilities.

          This is not a weakness in the design of containers. `npm install`, on the other hand, is broken by design (due to post-install scripts).

          • neoCrimeLabs 1 day ago
            > This is not a weakness in the design of containers.

            Partially correct.

            Many container escapes are also because the security of the underlying host, container runtime, or container itself was poorly or inconsistently implemented. This creates gaps that allow escapes from the container. There is a much larger potential for mistakes, creating a much larger attack surface. This is in addition to kernel vulnerabilities.

            While you can implement effective hardening across all the layers, the potential for misconfiguration is still there, therefore there is still a large attack surface.

            While a virtual host can be escaped from, the attack surface is much smaller, leaving less room for potential escapes.

            This is why containers are considered riskier for a sandbox than a virtual host. Which one you use, and why, really should depend on your use case and threat model.

            Sad to say, a disappointing number of people don't put much hardening into their container environments, including production k8s clusters. So it's much easier to say that a virtual host is better for sandboxing than containers, because people are less likely to get it wrong.

            • ashishb 1 day ago
              > Many container escapes are also because the security of the underlying host, container runtime, or container itself was poorly or inconsistently implemented.

              Sure, so running `npm install` inside the container is no worse than `npm install` on my machine. And in most cases, it is much better.

              • neoCrimeLabs 1 day ago
                Containers provide more isolation than going without. That was never in debate in our conversation.
        • coppsilgold 1 day ago
          Escaping a properly set up container is a kernel 0day. Due to how large the kernel attack surface is, such 0days are generally believed to exist. Unless you are a high value target, a container sandbox will likely be sufficient for your needs. If cloud service providers discounted this possibility then a 0day could be burned to attack them at scale.

          Also, you can use the runsc (gVisor) runtime for Docker. If you are careful not to expose vulnerable protocols to the container, there will be nothing escaping it with that runtime.

          • neoCrimeLabs 19 hours ago
            You start with the assumption of "properly set up container". Also I believe you are oversimplifying the attack surface.

            A container escape can be caused by combinations of breakdowns in several layers:

            - Kernel implementation - aka, a bug. It's rare, but it happens

            - Kernel compile time options selected - This has become more rare, but it can happen

            - Host OS misconfiguration - Can be a contributing factor to enabling escapes

            - Container runtime vulnerability - A vulnerability in the runtime itself

            - Container runtime misconfiguration - Was the runtime configured properly?

            - Individual container runtime misconfiguration - Was the individual container configured to run securely?

            - Individual Container build - what's in the container, and can be leveraged to attack the host

            - Running container attack surface - What's the running container's attack surface

            The last two are included to be complete, but in the case of the original article running untrusted python code makes them irrelevant in this circumstance.

            My point is that you must consider the system as a whole to assess its overall attack surface and risk of compromise. There is a lot more that can go wrong to enable a container escape than you implied.

            There are some people who are knowledgeable enough to ensure their containers are hardened at every level of the attack surface. Even then, how many are diligent enough to apply that attention to detail every time? How many automate their configurations?

            Most default configurations are not hardened as a compromise to enable usability. Most people who build containers do not consider hardening every possible attack surface. Many don't even know the basics. Most companies don't do a good job hardening their shared container environments - often as a compromise to be "faster".

            So yeah, a properly set up container is hard to escape.

            Not all containers are set up properly - I'd argue most are not.

          • eyberg 1 day ago
            > Escaping a properly set up container is a kernel 0day.

            No, it is not. In fact, many of the container escapes we see are because of bugs in the container runtimes themselves, which can be quite different in their various implementations. CVE-2025-31133 was published 2? months ago and had nothing at all to do with the kernel - just like many container escapes don't.

            • coppsilgold 1 day ago
              If a runtime is vulnerable then it didn't "set up a container properly".

              Containers are a kernel technology for isolating and restricting resources for a process and its descendants. Once set up correctly, any escape is a kernel 0day.

              For anyone who wants to understand what a container is I would recommend bubblewrap: https://github.com/containers/bubblewrap This is also what flatpak happens to use.

              It should not take long to realize that you can set it up in ways that are secure and ways which allow the process inside to reach out in undesired ways. As runtimes go, it's as simple as it gets.

            • theamk 1 day ago
              Note CVE-2025-31133 requires one of: (1) persistent container (2) attacker-controlled image. That means that as long as you always use "docker run" on known images (as opposed to "docker start"), you cannot be exploited via that bug even if the service itself is compromised.

              I am not saying that you should never update the OS, but a lot of those container escapes have severe restrictions and may not apply to your specific config.

        • theamk 1 day ago
          Note this lists 3 vulnerabilities as an example: CVE-2016-5195 (Dirty COW), CVE-2019-5736 (host runc override) and CVE-2022-0185 (io_uring escape)

          Out of those, only first one is actually exploitable in common setups.

          CVE-2019-5736 requires either attacker-controlled image or "docker exec". This is not likely to be the case in the "untrusted python" use case, nor in many docker setups.

          CVE-2022-0185 is blocked by seccomp filter in default installs, so as long as you don't give your containers --privileged flags, you are OK. (And if you do give this flag, the escape is trivial without any vulnerabilities)

        • quotemstr 1 day ago
          This kind of response isn't helpful. He's right to ask about the motivations for the claim that containers in general are "not a sandbox" when the design of containers/namespaces/etc. looks like it should support using these things to make a sandbox. He's right to be confused!

          If you look at the interface contract, both containers and VMs ought to be about equally secure! Nobody is an idiot for reading about the two concepts and arriving at this conclusion.

          What you should have written is something about your belief that the inter-container, intra-kernel attack surface is larger than the intra-hypervisor, inter-kernel attack surface and so it's less likely that someone will screw up implementing a hypervisor so as to open a security hole. I wouldn't agree with this position, but it would at least be defensible.

          Instead, you pulled out the tired old "educate yourself" trope. You compounded the error with the weaselly "are considered" passive-voice construction that lets you present the superior security of VMs as a law of nature instead of your personal opinion.

          In general, there's a lot of alpha in questioning supposedly established "facts" presented this way.

        • ranger_danger 1 day ago
          The burden of proof lies with the person making empirically unfalsifiable claims.
      • staticassertion 1 day ago
        Exploit the Linux kernel underneath it (not the only way, just the obvious one). Docker is a security boundary but it is not suitable for "I'm running arbitrary code".

        That is to say, Docker is typically a security win because you get things like seccomp and user/DAC isolation "for free". That's great. That's a win. Typically exploitation requires a way to get execution in the environment plus a privilege escalation. The combination of those two things may be considered sufficient.

        It is not sufficient for "I'm explicitly giving an attacker execution rights in this environment" because you remove the cost of "get execution in the environment" and the full burden is on the kernel, which is not very expensive to exploit.

        • ashishb 1 day ago
          > Exploit the Linux kernel underneath it (not the only way, just the obvious one). Docker is a security boundary but it is not suitable for "I'm running arbitrary code".

          Docker is better for running arbitrary code compared to the direct `npm install <random-package>` that's common these days.

          I moved to a Dockerized sandbox[1], and I feel much better now against such malicious packages.

            1 - https://github.com/ashishb/amazing-sandbox
          • staticassertion 1 day ago
            It's better than nothing, obviously. But I don't consider `npm install <random-package>` to be equivalent to "RCE as a service", although it's somewhat close. I definitely wouldn't recommend `npm install <actually a random package>`, even in Docker.

            I also implemented `insanitybit/cargo-sandbox` using Docker but that doesn't mean I think `insanitybit/cargo-sandbox` is a sufficient barrier to arbitrary code execution, which is why I also had a hardened `cargo add` that looked for typosquatting of package names, and why I think package manager security in general needs to be improved.

            You can and should feel better about running commands like that in a container, as I said - seccomp and DAC are security boundaries. I wouldn't say "you should feel good enough to run an open SSH server and publish it for anyone to use".
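The typosquatting check mentioned above can be illustrated with a small sketch. This is not the logic `insanitybit/cargo-sandbox` used; it is a generic edit-similarity check built on the standard library, and the `POPULAR` list, the function name `possible_typosquat`, and the 0.8 cutoff are all assumptions made for the example.

```python
import difflib

# Hypothetical allowlist; a real tool would load the registry's most-used names.
POPULAR = ["requests", "numpy", "pandas", "cryptography"]


def possible_typosquat(name, known=POPULAR, cutoff=0.8):
    """Return the well-known package that `name` closely resembles, or None."""
    if name in known:
        return None  # exact match: this is the real package
    # get_close_matches ranks candidates by SequenceMatcher similarity ratio.
    close = difflib.get_close_matches(name, known, n=1, cutoff=cutoff)
    return close[0] if close else None


for candidate in ["requests", "reqeusts", "flask"]:
    print(candidate, "->", possible_typosquat(candidate))
```

A check like this only catches look-alike names. It does nothing about a legitimate package that has been compromised, which is why it complements a sandbox rather than replacing one.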

            • ashishb 1 day ago
              > definitely wouldn't recommend `npm install <actually a random package>`, even in Docker.

              That's not the main attack vector. The attack vector is some random dependency that is used by a lot of popular packages, which you `npm install` indirectly.

              • staticassertion 1 day ago
                That doesn't change what I said. It definitely doesn't change what I said about docker as a security boundary.

                Again, it's great to run `npm` in a container. I do that too because it's the lowest effort solution I have available.

            • quotemstr 1 day ago
              > `npm install <random-package>` to be equivalent to "RCE as a service"

              It is literally that. When you write "npm install foo", npm will proceed to install the package called "foo" and then run its installation scripts. It's as if you'd run curl | bash. That npm install script can do literally anything your shell in your terminal can do.

              It's not "somewhat close" to RCE. It is literally, exactly, fully, completely RCE delivered as a god damn service to which you connect over the internet.

              • staticassertion 1 day ago
                I'm familiar with how build scripts work. As mentioned, I built insanitybit/cargo-sandbox exactly to deal with malicious build scripts.

                The reason I consider it different from "I'm opening SSH to the public, anyone can run a shell" is because the attack typically has to either be through a random package, which significantly reduces exposure, or through a compromised package, which requires an additional attack. Basically, somewhere along the way, something else had to go wrong if `npm install <x>` gives an attacker code execution, whereas "I'm giving a shell to the public" involves nothing else going wrong.

                Running a command yourself that may include code you don't expect is not, to me, the same as arbitrary code execution. It often implies it but I don't consider those to be identical.

                You can disagree with whether or not this meaningfully changes things (I don't feel strongly about it), but then I'd just point to "I don't think it's a sufficient barrier for either threat model but it's still an improvement".

                That isn't to downplay the situation at all. Once again,

                > that doesn't mean I think `insanitybit/cargo-sandbox` is a sufficient barrier to arbitrary code execution, which is why I also had a hardened `cargo add` that looked for typosquatting of package names, and why I think package manager security in general needs to be improved.

    • s_ting765 1 day ago
      Docker provides some host isolation which can be used effectively as a sandbox. It's not designed for security (though it does have some reasonable defaults), but it does give you options to layer on security modules like AppArmor and seccomp very easily.
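As a rough illustration of layering those options on, the sketch below assembles (but does not run) a `docker run` command line with tighter-than-default settings. The flags are standard `docker run` options; the function name `hardened_docker_cmd`, the resource limits, the uid, and the example image are assumptions chosen for the example, and none of this closes the kernel attack surface discussed elsewhere in the thread.

```python
def hardened_docker_cmd(image, command, seccomp_profile=None, apparmor_profile=None):
    """Build a `docker run` argv with reduced privileges (illustrative defaults)."""
    argv = [
        "docker", "run", "--rm",
        "--network=none",                        # no network access at all
        "--cap-drop=ALL",                        # drop every Linux capability
        "--security-opt", "no-new-privileges",   # block setuid-style escalation
        "--read-only",                           # read-only root filesystem
        "--pids-limit", "64",                    # assumed limit
        "--memory", "256m",                      # assumed limit
        "--user", "65534:65534",                 # unprivileged uid:gid (assumed)
    ]
    if seccomp_profile:
        # Replace Docker's default seccomp profile with a custom one.
        argv += ["--security-opt", f"seccomp={seccomp_profile}"]
    if apparmor_profile:
        argv += ["--security-opt", f"apparmor={apparmor_profile}"]
    return argv + [image] + list(command)


cmd = hardened_docker_cmd("python:3-slim", ["python", "-c", "print('hi')"])
print(" ".join(cmd))
```

The list can be handed to `subprocess.run(cmd)` on a machine with Docker installed. Note that `--privileged` never appears: as pointed out above, it makes escape trivial.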
  • amluto 1 day ago
    The example is:

        @task(name="analyze_data", compute="MEDIUM", ram="512MB", timeout="30s", max_retries=1)
        def analyze_data(dataset: list) -> dict:
            # Your code runs safely in a Wasm sandbox
            return {"processed": len(dataset), "status": "complete"}
    
    This is fundamentally awkward in a language with as absurdly flexible a type system as Python. What if that list parameter contains objects that implement __getattr__? What if the output dict has an overridden __getattr__?

    Even defining semantics seems awkward, especially if one wants those semantics to simultaneously make sense and have any sort of clear security properties.

    edit: a quick look at the source suggests that the output is deserialized JSON regardless of what the type signature says. That’s certainly one solution.

    • mavdol04 1 day ago
      Yep, exactly.

      We stick to JSON to make sure we pass data, not behavior. It avoids all that complexity.
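A small sketch of why a JSON boundary passes "data, not behavior". It does not reproduce the project's own serialization code; `Sneaky` and `cross_boundary` are hypothetical names, and the round trip through `json.dumps`/`json.loads` stands in for whatever crosses between host and sandbox.

```python
import json


class Sneaky(dict):
    """A dict subclass that carries behavior along with its data."""

    def __getattr__(self, name):
        return f"intercepted {name}"


def cross_boundary(value):
    # Serialize to JSON text and parse it back, as a host/sandbox boundary might.
    return json.loads(json.dumps(value))


original = Sneaky(processed=3, status="complete")
result = cross_boundary(original)

# The payload survives, but the subclass and its __getattr__ do not.
print(type(original).__name__, "->", type(result).__name__)

# Values with no JSON representation are rejected instead of smuggled through.
try:
    cross_boundary([object()])
except TypeError as exc:
    print("rejected:", exc)
```

The trade-off is that the type annotations on the task become documentation rather than a contract: whatever comes back is built only from JSON's own types.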

  • loeg 1 day ago
    > Python doesn't have a built-in way to run untrusted code safely. Multiple attempts have been made, but none really succeeded.

    Long, long ago, there was "repy"[1][2]. (This is definitely included in the "none succeeded" bucket, FWIW.)

    [1]: https://github.com/SeattleTestbed/repy_v2

    [2]: https://dl.acm.org/doi/10.1145/1866307.1866332

  • Alifatisk 21 hours ago
    > The thing is, Python dominates AI/ML, especially the AI agents space. We're moving from deterministic systems to probabilistic ones, where executing untrusted code is becoming common.

    This is so true

  • bArray 1 day ago
    I have been thinking about this myself, but am still not convinced about how to run untrusted Python code. I'm not convinced that the right solution is to run the code as WebAssembly [1].

    I have been looking towards some kind of quick-start qemu option as a possibility, but the project will take a while.

    [1] https://github.com/mavdol/capsule

    • mavdol04 1 day ago
      I see what you mean, but I think there is room for both approaches.

      If we want to isolate untrusted code at a very fine-grained level (like just a specific function), VMs can feel a bit heavy due to the overhead, complexity, etc.

      • quotemstr 1 day ago
        What you really want to do is decouple the sandbox specification annotations from the sandbox implementation backend, yes?
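One way to read the decoupling suggested above, as a minimal sketch: the decorator only records a declarative spec, and a separate backend object decides how to enforce it. Every name here (`SandboxSpec`, `Backend`, `InProcessBackend`, `task`) is hypothetical and unrelated to the capsule API, and the placeholder backend provides no isolation at all.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass(frozen=True)
class SandboxSpec:
    """Declarative limits; says nothing about how they are enforced."""
    ram_mb: int
    timeout_s: float


class Backend(Protocol):
    def run(self, spec: SandboxSpec, func: Callable, *args): ...


class InProcessBackend:
    """Placeholder backend for tests: ignores the spec and isolates nothing."""

    def run(self, spec, func, *args):
        return func(*args)


def task(spec: SandboxSpec, backend: Backend):
    # The decorator attaches the spec and defers execution to the backend.
    def decorate(func):
        def wrapper(*args):
            return backend.run(spec, func, *args)
        wrapper.spec = spec
        return wrapper
    return decorate


@task(SandboxSpec(ram_mb=512, timeout_s=30), backend=InProcessBackend())
def analyze_data(dataset: list) -> dict:
    return {"processed": len(dataset), "status": "complete"}


print(analyze_data([1, 2, 3]), analyze_data.spec)
```

Swapping in a Wasm, gVisor, or microVM backend would then mean writing another class with the same `run` method, without touching the annotated functions.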
    • regenschutz 1 day ago
      What's the problem with WASM? It's a mature target, and was created primarily, if not solely, for running untrusted native code.
  • cmacleod4 20 hours ago
    As with most Python problems, the solution is to switch to Tcl - https://www.tcl-lang.org/man/tcl9.0/TclCmd/interp.html#M44 :-)
    • graemep 19 hours ago
      There is a lot to like about TCL but it does not have the huge ecosystem.
  • incognito124 1 day ago
    Sharing my friend's startup for sandboxed code execution:

    https://judge0.com/

  • ptspts 1 day ago
    Neither the article nor the README explains how it works.

    How does it work? Which WASM runtime does it use? Does it use a Python interpreter compiled to WASM?

  • maxloh 1 day ago
    Edit: never mind, I read it wrong.

    ---

    That is not safe at all. You could always hijack builtin functions within untrusted code.

      def untrusted_function():
          original_map = map
      
          def noisy_map(func, *iterables):
              print(f"--- Log: map() called on {func.__name__} ---")
              return original_map(func, *iterables)
      
          globals()['map'] = noisy_map
    • mavdol04 1 day ago
      Actually, since it runs inside a WASM sandbox, even if the untrusted code overwrites built-ins like map or modifies globals(), it only affects its own isolated memory space. It cannot escape the WASM container or affect the host system.
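The "own isolated memory space" point can be illustrated without a Wasm runtime. The sketch below uses a separate interpreter process purely as a stand-in: an ordinary child process is NOT a security sandbox and offers none of Wasm's guarantees; it only shows that replacing a built-in in one memory space leaves another untouched. `UNTRUSTED` and `run_isolated` are names invented for the example.

```python
import subprocess
import sys

# Snippet that hijacks a built-in, in the spirit of the example above.
UNTRUSTED = """
import builtins
builtins.map = lambda *args: "hijacked"
print(map(str, [1, 2]))
"""


def run_isolated(source):
    # Run the snippet in a fresh interpreter process with its own memory.
    done = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True, text=True, timeout=30,
    )
    return done.stdout.strip()


child_output = run_isolated(UNTRUSTED)

# The host interpreter's map() is unaffected by what the child did to its copy.
host_map_intact = list(map(str, [1, 2])) == ["1", "2"]
print(child_output, host_map_intact)
```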
    • fud101 1 day ago
      it blows my mind how people call Perl ugly but yet this monstrosity is ok. Python being 'human' readable has got to be the biggest scam ever perpetrated against language design.
  • staticassertion 1 day ago
    Seems fine to me. I think you're going to take a huge performance hit by putting CPython into wasm. gVisor is mentioned as having a performance penalty but I'm extremely doubtful of that penalty (which is really on IO, which I expect to not be a huge deal for these workloads) being anywhere near the penalty of wasm.