Code Agent Sample: Lua Dungeon Crawler
1. Sample Overview
What Is This Sample?
This sample demonstrates a code agent — an LLM that generates executable code in response to natural language commands. Unlike tool-calling agents that select from predefined functions, code agents write arbitrary programs, enabling more flexible and creative behavior.
The sample is a text-based dungeon crawler where the player controls movement directly (WASD), but commands an AI companion using natural language. When the player types a command like “go get the sword,” the LLM generates Lua code that implements that behavior. The AI companion then executes this code each game tick until the task is complete.
NOTE: This sample uses a CUDA-based backend and therefore will not work on non-NVIDIA hardware. Refer to the accompanying documentation for how to switch backends.
Required Models
The code agent sample requires the following models:
| Plugin | Model Name | GUID |
|---|---|---|
| nvigi.plugin.gpt.ggml.* | Qwen3 8B Instruct | 545F7EC2-4C29-499B-8FC8-61720DF3C626 |
See the top-level documentation that shipped with your development pack for information on how to download these models.
NOTE: Other code-capable models may work. Models with strong instruction-following and code generation abilities tend to perform best.
How to Run
Because of the SDK layout, built components land under the _artifacts tree. For the SDK to run an app like this sample, all of the plugin DLLs and the executable must be in the same directory. We do this by copying the DLLs and EXE into the bin\x64 directory and running the app from within that directory, so all plugins are available.
From Command Line:
Open a command prompt in `<SDK_ROOT>`
Run:
bin\x64\nvigi.codeagentlua.exe data/nvigi.models data/nvigi.test/nvigi.codeagentlua/codeagent_prompt_lua.txt
In Debugger:
Edit project settings for `nvigi/samples/nvigi.codeagentlua`
Set “Command” to `<SDK_ROOT>\bin\x64\nvigi.codeagentlua.exe`
Set “Command Arguments” to `data/nvigi.models data/nvigi.test/nvigi.codeagentlua/codeagent_prompt_lua.txt`
Set “Working Directory” to `<SDK_ROOT>/bin/x64`
Build, and run `copy_sdk_binaries.bat <cfg>` after each build
Basic Gameplay
w/a/s/d — Move the player (P) up/left/down/right
Enter — Type a command for the AI companion (A)
q — Quit the game
Objective: Navigate the dungeon, collect weapons, and defeat all monsters.
Entities:
P — Player (you)
A — AI, your companion
G — Gorgon (defeated with sword)
B — Bat (defeated with bow + arrow)
^ — Sword
> — Bow
/ — Arrow
Combat: Walking adjacent to a monster triggers combat. You need the right weapon or you take damage.
Example Commands
Commands that tend to work well:
“Follow me”
“Go get the sword”
“Pick up the bow, then pick up an arrow, then kill a bat”
“Explore the dungeon”
“Stay here”
“How many monsters are in the dungeon?”
The AI will generate Lua code that runs each game tick, moving one space at a time toward completing the goal.
Depending on the LLM, simpler commands may be required. For instance, with GPT-5 you can likely say “kill all of the bats” and it will reason out that it needs to get the bow, then the arrows, then go kill all of the bats. With smaller local LLMs, that same command will likely send the AI to attack the bats unarmed. Sometimes SLMs even need compound tasks broken into multiple simple tasks. “Go get the sword, then kill the gorgon” might work, but it will succeed more often as “Go get the sword”, wait for that task to complete, then follow it with “Go kill the gorgon”. Experiment with it and see what works.
Whenever you run a command, you can look at the file located in <WORKING_DIR>/ai_func_out.txt to see the function that was generated.
2. Code Agents vs Tool-Calling Agents
Modern LLM applications often need to interact with external systems — querying databases, calling APIs, or controlling game characters. There are two primary approaches: tool-calling agents and code agents.
What Is a Tool-Calling Agent?
A tool-calling agent works by having the LLM select from a predefined set of functions. The application provides a schema describing available tools (name, parameters, descriptions), and the LLM outputs structured JSON indicating which tool to call and with what arguments.
User: "What's the weather in Austin?"
LLM Output: { "tool": "get_weather", "args": { "city": "Austin" } }
Application: Calls get_weather("Austin"), returns "72°F, sunny"
LLM: "The weather in Austin is 72°F and sunny."
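The application side of this exchange is a simple table lookup and dispatch. A minimal Lua sketch (the tool table, function names, and decoded call are illustrative; in practice the JSON would come from the LLM and be decoded first):

```lua
-- Registry of predefined tools the LLM may select from (illustrative names)
local tools = {
  get_weather = function(args)
    -- stand-in for a real weather API call
    return "72°F, sunny in " .. args.city
  end,
}

-- The LLM's structured output, already decoded from JSON into a table
local call = { tool = "get_weather", args = { city = "Austin" } }

-- Dispatch: only functions present in the registry can ever run
local fn = tools[call.tool]
assert(fn, "LLM requested an unknown tool")
local result = fn(call.args)
```

Because the registry is fixed, anything outside it is simply unreachable — which is exactly the predictability (and the inflexibility) discussed below.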
Advantages:
Predictable — Only predefined functions can be called
Easy to secure — Each tool is explicitly implemented and validated
Structured output — JSON parsing is well-understood
Lower risk — The LLM cannot execute arbitrary logic
Disadvantages:
Limited flexibility — Complex behaviors require many predefined tools
Combinatorial explosion — As capabilities grow, tool count explodes
No novel solutions — The LLM cannot invent behaviors you didn’t anticipate
Slow — Each function call is likely multiple inference calls to the LLM
No logic — We rely on the LLM for all logic, even though code is superior at exact solutions
Verbose prompts — The JSON specifications make for large, hard-to-read function declarations
What Is a Code Agent?
A code agent generates executable code rather than selecting from predefined functions. Given a natural language request, the LLM writes a program that implements the desired behavior. The application then executes that code in a sandboxed environment.
User: "Go get the sword, then come back to me."
LLM Output:
function update_func(player, ai, monsters, items)
if not ai:has_item("sword") then
for _, item in ipairs(items) do
if item.name == "sword" then
local path = ai:find_path_to(item.position)
if path and #path > 0 then
ai:move(path[1])
return "Moving toward sword"
end
end
end
else
local path = ai:find_path_to(player.position)
if path and #path > 0 then
ai:move(path[1])
return "Returning to player"
end
end
return "Done"
end
Application: Executes this Lua function each game tick
Advantages:
Highly flexible — Can express any logic the language supports
Compositional — Complex behaviors emerge from simple primitives
Novel solutions — LLM can invent behaviors not explicitly designed
Fewer APIs — Expose primitives, let the LLM compose them
More readable prompts — Common scripting function declarations are the documentation
Smaller prompts — Writing code allows the system prompt to be more concise and faster to load
Faster — One inference call can generate code that executes for the duration of the task
Disadvantages:
Security risk — Arbitrary code execution is inherently dangerous
Unpredictable — Generated code may have bugs or unexpected behavior
Harder to validate — Cannot easily verify correctness before execution
Requires sandboxing — Must carefully restrict what the code can do
When to Use Each
| Scenario | Recommended Approach |
|---|---|
| Limited, well-defined actions | Tool-calling |
| High-security requirements | Tool-calling |
| Simple request-response patterns | Tool-calling |
| Complex, multi-step behaviors | Code agent |
| Creative or exploratory tasks | Code agent |
| Game AI or robotics control | Code agent |
| When you can sandbox effectively | Code agent |
Hybrid approaches are also possible: use tool-calling for high-risk operations (payments, deletions) and code agents for low-risk, creative tasks (game AI, data analysis).
This sample demonstrates a code agent because controlling a game character requires flexible, multi-step logic that would be tedious to express as individual tool calls. The character needs to pathfind, make decisions based on game state, and adapt to changing circumstances — all of which are naturally expressed as code.
3. Security Considerations
Executing LLM-generated code is inherently risky. Before evaluating scripting languages, we must understand the threats we need to mitigate.
The Risk of Code Agents
Unlike tool-calling agents where the application controls exactly what functions can be invoked, code agents execute arbitrary programs. The LLM might generate code that:
Accidentally or intentionally accesses the filesystem, network, or system resources
Consumes unbounded memory, crashing the application
Enters infinite loops, hanging the application
Exploits language features to escape the sandbox
Corrupts application state in unexpected ways
These risks exist even when the LLM is “well-intentioned” — bugs in generated code can trigger any of these behaviors. A robust code agent must defend against both malicious and accidental misuse.
Threat Model
We identified six categories of threats that a code agent sandbox must address:
Dangerous Function Access
Threat: Generated code calls functions that access the filesystem, execute shell commands, load arbitrary modules, or interact with the network.
Examples (pseudocode):
`execute_shell("rm -rf /")` — Run shell commands
`read_file("/etc/passwd")` — Read sensitive files
`import("network_library")` — Load dangerous modules
`eval(arbitrary_code_string)` — Execute arbitrary code
Mitigation: The sandbox must either not load dangerous libraries, or selectively remove dangerous functions after loading.
Memory Exhaustion
Threat: Generated code allocates unbounded memory, exhausting system resources and crashing the application.
Examples (pseudocode):
`while true: list.append(large_string)` — Infinite allocation
`s = ""; for i in 1..billion: s = s + "x"` — String concatenation bomb
Mitigation: The runtime must enforce a memory limit and fail gracefully when exceeded.
Stack Overflow
Threat: Generated code uses deep or infinite recursion, overflowing the call stack.
Examples (pseudocode):
`function f(): f()` — Infinite recursion
Mutually recursive functions that never terminate
Mitigation: The runtime must track call depth and abort execution when a threshold is exceeded.
Infinite Loops / Hangs
Threat: Generated code enters an infinite loop, hanging the application and preventing further user interaction.
Examples (pseudocode):
`while true: pass` — Infinite loop with no I/O
`for i in 1..infinity: pass` — Extremely long loop
Mitigation: The runtime must enforce a time limit or instruction count limit and abort long-running code.
Prototype/Metatable Manipulation
Threat: Generated code manipulates object prototypes or metatables to bypass sandbox restrictions or corrupt internal data structures. Many scripting languages allow customizing how objects behave through prototype chains, metatables, or similar mechanisms.
Examples (pseudocode):
`set_prototype(entity, malicious_handler)` — Override entity behavior
`get_prototype(entity).write_handler = null` — Disable write protection
Mitigation: Remove or restrict access to prototype/metatable manipulation functions.
Game State Corruption
Threat: Generated code directly modifies application state in ways that break game logic or cause crashes.
Examples (pseudocode):
`player.health = -999` — Invalid health value
`monster.position = null` — Break rendering/pathfinding
`ai.items = "not a list"` — Type corruption
Mitigation: Expose application state through controlled interfaces that validate or reject invalid modifications.
Summary
The six categories this sample has addressed are:
| Threat | Required Mitigation |
|---|---|
| Dangerous functions | Selective library loading, function removal |
| Memory exhaustion | Custom allocator with limits |
| Stack overflow | Call depth tracking |
| Infinite loops | Timeout / instruction counting |
| Metatable manipulation | Remove metatable functions |
| State corruption | Controlled state access, validation |
This list is not exhaustive and can expand or contract depending on your use case and the language you choose to have the AI write in.
The next section evaluates scripting languages against these requirements.
4. Scripting Engine Evaluation
Choosing a scripting language for a code agent requires balancing several concerns: security (can we sandbox it?), embeddability (can we integrate it into C++?), LLM familiarity (will models generate correct code?), and runtime characteristics (performance, memory footprint).
Candidates Considered
We evaluated three scripting languages:
Python
Pros:
LLMs are extremely proficient at generating Python code
Rich standard library
Familiar to most developers
Excellent documentation and community
Cons:
Difficult to sandbox securely — the standard library has many escape hatches
Larger than desired runtime footprint (~10-20MB+ for embedded Python)
Larger than desired compile time footprint (~150MB+ for embedded Python)
Complex C++ integration (reference counting, GIL, etc.)
Hard to limit memory and CPU usage reliably
Python’s sandboxing challenges are well-documented. Even “restricted” Python environments have historically been bypassed. For a code agent where untrusted LLM-generated code runs, this is a significant concern.
If you do consider Python, pay special attention to these areas:
If you are still using the GIL (Python <= 3.13), threaded implementations become difficult, and it can become impossible to cleanly restart the Python interpreter.
Subinterpreters can be restarted, but at the cost of lost memory.
Subprocesses can be restarted cleanly and handle memory loss properly, but can be difficult to debug.
ChaiScript
Pros:
Designed specifically for C++ embedding
Header-only library, easy to integrate
Clean syntax, similar to JavaScript
No external dependencies
Cons:
LLMs have limited training data on ChaiScript — generated code often has syntax errors
Smaller community and less documentation
Sandboxing support is limited
Less battle-tested than alternatives
ChaiScript’s obscurity was a significant problem. Models frequently hallucinated non-existent functions or used incorrect syntax, requiring extensive prompt engineering and retry logic.
If you do consider ChaiScript, pay special attention to these areas:
ChaiScript has limited ability to properly sandbox memory usage.
Special care must be taken with how variable scoping works when using ChaiScript in a threaded environment.
Lua
Pros:
Designed for embedding from the start
Tiny footprint (~200KB)
Small compile time footprint (a few MB)
Excellent sandboxing support — can selectively load libraries and remove functions
Custom allocator support for memory limiting
Debug hooks for timeout/recursion control
LLMs generate reasonable Lua code (widely used in game modding, WoW, Roblox, etc.)
Battle-tested in thousands of game engines
Cons:
Syntax can be unfamiliar (1-indexed arrays, `~=` for not-equal, `:` vs `.`)
Smaller standard library than Python
LLMs occasionally make syntax errors (though fewer than ChaiScript)
Why Lua Won
Lua was selected for this sample because it best balances our requirements:
| Requirement | Python | ChaiScript | Lua |
|---|---|---|---|
| Sandboxing | ❌ Difficult | ⚠️ Limited | ✅ Excellent |
| Memory limiting | ❌ Hard | ❌ No | ✅ Custom allocator |
| Timeout control | ❌ Complex | ❌ No | ✅ Debug hooks |
| LLM code quality | ✅ Excellent | ❌ Poor | ⚠️ Good |
| Embedding ease | ⚠️ Complex | ✅ Easy | ✅ Easy |
| Runtime size | ❌ Large | ✅ Small | ✅ Tiny |
While Python would produce better LLM-generated code, it cannot be safely sandboxed. ChaiScript embeds easily but is difficult to effectively secure and LLMs struggle with its syntax. Lua provides the best combination: reasonable LLM output quality with excellent security controls.
The occasional Lua syntax errors from LLMs can be mitigated with retry logic (see Section 6), making it a practical choice for production code agents.
5. How Lua Addresses Each Threat
This section explains how the sample implements each security mitigation using Lua’s features. The relevant code is in lua_bindings.cpp.
Selective Library Loading
Unlike luaL_openlibs() which loads everything, we selectively load only safe libraries:
// Load safe libraries
luaL_requiref(L, "_G", luaopen_base, 1); // Basic functions
luaL_requiref(L, LUA_TABLIBNAME, luaopen_table, 1); // Table manipulation
luaL_requiref(L, LUA_STRLIBNAME, luaopen_string, 1); // String functions
luaL_requiref(L, LUA_MATHLIBNAME, luaopen_math, 1); // Math functions
luaL_requiref(L, LUA_UTF8LIBNAME, luaopen_utf8, 1); // UTF-8 support
luaL_requiref(L, LUA_OSLIBNAME, luaopen_os, 1); // OS (then sanitized)
// NOT loading:
// - io: File I/O
// - debug: Can break sandbox
// - package: Can load arbitrary modules
// - coroutine: Could complicate timeout handling
By never loading dangerous libraries, those functions simply don’t exist — there’s nothing to exploit.
Dangerous Function Removal
Even safe libraries contain dangerous functions. After loading, we remove them:
// Remove dangerous os functions (keep time, date, difftime, clock)
removeFromLib(L, "os", "execute"); // Shell commands
removeFromLib(L, "os", "exit"); // Terminate program
removeFromLib(L, "os", "remove"); // Delete files
removeFromLib(L, "os", "rename"); // Move files
removeFromLib(L, "os", "getenv"); // Environment variables
// Remove dangerous base functions
removeGlobal(L, "dofile"); // Execute Lua file
removeGlobal(L, "loadfile"); // Load Lua file
removeGlobal(L, "load"); // Execute arbitrary strings
removeGlobal(L, "require"); // Module loading
The removeFromLib and removeGlobal helpers simply set these to nil, making them undefined.
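From the Lua side, removal really is just assigning nil. A pure-Lua sketch of the same idea (the sample performs this from C++ via the Lua C API; the `sandbox_os` table here is illustrative):

```lua
-- Setting a library entry to nil makes the function undefined
local function removeFromLib(lib, name)
  lib[name] = nil
end

-- A stand-in "os" table before sanitizing (illustrative)
local sandbox_os = {
  time = os.time,
  clock = os.clock,
  execute = function() end, -- pretend dangerous function
}

removeFromLib(sandbox_os, "execute")

assert(sandbox_os.execute == nil) -- dangerous function no longer exists
assert(sandbox_os.time ~= nil)    -- benign functions survive
```

Any sandboxed code that later calls `os.execute(...)` fails with "attempt to call a nil value" rather than running a shell command.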
Custom Memory Allocator
Lua allows replacing its memory allocator via lua_newstate(). We provide a custom allocator that tracks usage and enforces a limit:
static void* luaLimitedAlloc(void* ud, void* ptr, size_t osize, size_t nsize)
{
    LuaMemoryTracker* tracker = static_cast<LuaMemoryTracker*>(ud);
    // Free request: Lua's allocator contract requires freeing and returning null
    if (nsize == 0)
    {
        free(ptr);
        if (ptr) tracker->currentUsage -= osize;
        return nullptr;
    }
    // Calculate new usage (osize is only meaningful when ptr is non-null)
    size_t newUsage = tracker->currentUsage + nsize - (ptr ? osize : 0);
    // Refuse allocation if it would exceed limit
    if (newUsage > tracker->limit)
    {
        return nullptr; // Lua throws out-of-memory error
    }
    // Perform allocation and track usage
    void* newPtr = realloc(ptr, nsize);
    if (newPtr) tracker->currentUsage = newUsage;
    return newPtr;
}
// Create Lua state with 100MB limit
lua_State* L = lua_newstate(luaLimitedAlloc, tracker);
When the limit is reached, Lua receives a null pointer and throws a recoverable out-of-memory error.
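The bookkeeping the allocator performs can be sketched in plain Lua (illustrative only — the real tracker lives in C++; `would_allow` is a made-up name for the accept/refuse decision):

```lua
-- Track total usage against a hard limit, refusing allocations that exceed it
local tracker = { usage = 0, limit = 100 * 1024 * 1024 } -- 100MB

local function would_allow(osize, nsize)
  local new_usage = tracker.usage + nsize - osize
  if new_usage > tracker.limit then
    return false -- refused: Lua would see a null pointer and raise OOM
  end
  tracker.usage = new_usage
  return true
end

assert(would_allow(0, 1024))                  -- small allocation: accepted
assert(not would_allow(0, 200 * 1024 * 1024)) -- over the limit: refused
```

Note that frees (nsize of 0) always succeed and reduce `usage`, so the sandbox recovers headroom as Lua's garbage collector runs.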
Debug Hooks for Timeout and Recursion
Lua’s debug hooks let us intercept execution at key points. We use three hook types:
static void luaTimeoutHook(lua_State* L, lua_Debug* ar)
{
    // Track call depth for recursion limit
    if (ar->event == LUA_HOOKCALL || ar->event == LUA_HOOKTAILCALL)
    {
        g_luaCallDepth++;
        if (g_luaCallDepth > LUA_MAX_CALL_DEPTH) // 200
        {
            luaL_error(L, "Stack overflow (recursion depth exceeded)");
        }
    }
    else if (ar->event == LUA_HOOKRET)
    {
        g_luaCallDepth--;
    }
    // Check timeout every N instructions
    if (ar->event == LUA_HOOKCOUNT)
    {
        auto now = std::chrono::steady_clock::now();
        auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
            now - g_luaStartTime).count();
        if (elapsed > LUA_TIMEOUT_MS) // 1000ms
        {
            luaL_error(L, "Execution timed out");
        }
    }
}
// Install hook before calling Lua code
lua_sethook(L, luaTimeoutHook, LUA_MASKCALL | LUA_MASKRET | LUA_MASKCOUNT, 1000);
This catches both infinite loops (via timeout) and infinite recursion (via call depth tracking).
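The same idea can be tried in plain Lua with `debug.sethook`, using an instruction budget instead of wall-clock time (illustrative — the sample installs its hook from C++, and the `debug` library itself is never exposed to sandboxed code):

```lua
-- Abort runaway code after a fixed instruction budget
local budget = 100000

debug.sethook(function()
  budget = budget - 1000
  if budget <= 0 then
    debug.sethook() -- uninstall before aborting
    error("Execution timed out")
  end
end, "", 1000) -- fire every 1000 VM instructions

-- An infinite loop is cut short by the hook instead of hanging
local ok, err = pcall(function()
  while true do end
end)

assert(not ok and err:match("timed out"))
```

Errors raised inside a hook propagate into the running code, so a `pcall` at the host boundary catches them cleanly.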
Metatable Protection
Metatables control how Lua objects behave. If code could modify our Entity metatable, it could bypass protections. We remove the metatable functions entirely:
removeGlobal(L, "getmetatable"); // Can't inspect metatables
removeGlobal(L, "setmetatable"); // Can't modify metatables
removeGlobal(L, "rawget"); // Can't bypass __index
removeGlobal(L, "rawset"); // Can't bypass __newindex
removeGlobal(L, "rawequal"); // Can't bypass __eq
removeGlobal(L, "rawlen"); // Can't bypass __len
Without these functions, code can only interact with entities through our controlled __index and __newindex metamethods.
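To see why the raw* functions must go, consider this pure-Lua demonstration of `rawset` bypassing a `__newindex` guard (the guarded table is illustrative):

```lua
-- A table whose metatable blocks all writes
local guarded = setmetatable({}, {
  __newindex = function() error("read-only") end,
})

-- Normal writes go through __newindex and are rejected
local ok = pcall(function() guarded.health = -999 end)
assert(not ok)

-- rawset ignores __newindex entirely, defeating the protection
rawset(guarded, "health", -999)
assert(guarded.health == -999) -- the guard was bypassed
```

Inside the sandbox, `rawset` is nil, so the second path simply does not exist.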
Entity Field Write Protection
Our __newindex metamethod controls what happens when code writes to entity fields. We block writes to built-in fields while allowing custom fields for AI state:
static int entity_newindex(lua_State* L)
{
Entity* e = checkEntity(L, 1);
const char* key = luaL_checkstring(L, 2);
// Block writes to built-in fields
if (strcmp(key, "name") == 0 || strcmp(key, "position") == 0 ||
strcmp(key, "health") == 0 || /* ... */)
{
return luaL_error(L, "Cannot modify '%s' - use methods instead", key);
}
// Allow custom fields (stored in shadow table)
getEntityCustomFields(L, e);
lua_pushstring(L, key);
lua_pushvalue(L, 3);
lua_settable(L, -3);
return 0;
}
This prevents monster.health = 0 (cheating) while allowing ai.my_target = monster (state tracking).
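The same write policy can be expressed in pure Lua, which can be handy for prototyping the rules before wiring them into C++ (the `builtin` field list and `makeEntityProxy` helper are illustrative, not the sample's API):

```lua
-- Fields that belong to the engine and must not be written directly
local builtin = { name = true, position = true, health = true }

-- Proxy: reads fall through to engine data, writes to builtins are rejected,
-- writes to anything else land in a per-entity custom-field table
local function makeEntityProxy(data, custom)
  return setmetatable({}, {
    __index = function(_, k)
      if custom[k] ~= nil then return custom[k] end
      return data[k]
    end,
    __newindex = function(_, k, v)
      if builtin[k] then
        error("Cannot modify '" .. k .. "' - use methods instead")
      end
      custom[k] = v
    end,
  })
end

local monster = makeEntityProxy({ name = "gorgon", health = 10 }, {})

assert(not pcall(function() monster.health = 0 end)) -- cheating blocked
monster.my_tag = "target"                            -- custom AI state allowed
assert(monster.my_tag == "target")
assert(monster.health == 10)                         -- engine data intact
```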
Summary
| Threat | Lua Feature Used | Implementation |
|---|---|---|
| Dangerous functions | Selective loading | `luaL_requiref` for safe libraries only |
| Dangerous functions | Function removal | `removeFromLib` / `removeGlobal` set entries to nil |
| Memory exhaustion | Custom allocator | `luaLimitedAlloc` via `lua_newstate` |
| Stack overflow | Debug hooks | Call depth tracking in `luaTimeoutHook` |
| Infinite loops | Debug hooks | Timeout check on `LUA_HOOKCOUNT` |
| Metatable abuse | Function removal | Remove `getmetatable`, `setmetatable`, raw* functions |
| State corruption | Metamethods | `__index` / `__newindex` validation |
6. Design Decisions for LLM Success
Security is necessary but not sufficient — the LLM must also generate correct code. This section covers design decisions that improve code generation success rates.
API Design: Entity Methods vs Global Functions
We expose functionality as both entity methods and global functions, but the system prompt emphasizes methods:
-- Method syntax (preferred, documented in prompt)
ai:move("w")
ai:has_item("sword")
local path = ai:find_path_to(player.position)
-- Global function syntax (also works, but not documented)
move_entity(ai, "w")
has_item(ai, "sword")
local path = find_path_astar(ai.position, player.position, entities)
Why methods work better:
More idiomatic — LLMs trained on Lua (game mods, Roblox, etc.) see method syntax frequently
Cleaner code — `ai:move("w")` vs `move_entity(ai, "w")` is more readable
Fewer parameters — Methods automatically use the entity and global maze, reducing chances for error
Better autocomplete patterns — LLMs predict `ai:` then the method name more reliably
We still register the global functions as a fallback — if the LLM accidentally generates move_entity(ai, "w"), it works. Defense in depth for correctness.
Prompt Engineering
The system prompt (codeagent_prompt_lua.txt) is carefully structured to prevent common LLM errors:
1. Explicit examples for ambiguous cases:
NOTE: These methods take POSITIONS, not strings or names!
WRONG: ai:find_path_to("sword") -- strings don't work!
WRONG: ai:distance_to(monster.weakness) -- that's a string!
RIGHT: ai:find_path_to(item.position) -- use .position
Without this, LLMs frequently hallucinate that find_path_to("sword") will find the sword by name.
2. Clear item location rules:
Items exist in ONE of two places (never both):
1) ON THE GROUND: Found in the global "items" table
2) IN AN INVENTORY: Found in entity.items (use entity:has_item() to check)
LLMs often confuse ground items with inventory items without explicit guidance.
3. Function signature with parameter types:
entity:find_path_to(position)
--[[
Parameters:
position: A position table {row, col} - use entity.position for entities
Returns:
table: Array of direction strings, or empty {} if unreachable
]]
Documenting return types and parameter formats reduces hallucinations.
4. Lua-specific reminders:
- Lua tables are 1-indexed! Use: for i, item in ipairs(array) do ... end
- Method call: entity:method() (passes entity as first arg)
- Check empty table: if #path > 0 then ... end
LLMs trained primarily on Python/JavaScript often forget Lua’s 1-indexing.
Error Handling and Retry
LLM-generated code can fail on the first attempt. We implement a retry loop that feeds errors back to the model in an attempt to autocorrect.
for (int attempt = 0; attempt <= MAX_RETRIES && !success; attempt++)
{
if (attempt == 0)
{
fullPrompt = "Write an update_func that satisfies: \"" + prompt + "\"";
}
else
{
// Include the failed code and error message
fullPrompt = "Your previous code:\n" + code +
"\nfailed with this error:\n" + lastError +
"\n\nGenerate corrected code.";
}
code = llmCreateAIFunc(fullPrompt);
// Try to compile
if (luaL_dostring(L, code.c_str()) != LUA_OK)
{
lastError = lua_tostring(L, -1);
continue; // Retry with error context
}
// Try runtime
std::string result = callLuaAIFunc(L, ...);
if (result.starts_with("Function failed"))
{
lastError = result;
continue; // Retry with error context
}
success = true;
}
This catches both compile-time errors (syntax mistakes) and runtime errors (nil indexing, type errors), giving the LLM a chance to self-correct.
Reference Semantics with Userdata
We pass C++ entities to Lua as userdata pointers, not copies:
// Push entity as pointer (reference semantics)
static void pushEntity(lua_State* L, Entity* entity)
{
Entity** udata = (Entity**)lua_newuserdata(L, sizeof(Entity*));
*udata = entity;
luaL_getmetatable(L, ENTITY_META);
lua_setmetatable(L, -2);
}
Why this matters:
Changes persist — When Lua code calls `ai:move("w")`, the C++ entity’s position actually changes. No sync-back needed.
No stale data — Reading `monster.health` always returns the current value
Efficient — No copying of entity data back and forth
The alternative (copying entities to Lua tables) would require syncing changes back to C++ after every Lua call, which is error-prone and inefficient.
Shadow Tables for Custom Fields
The AI needs to store persistent state across function calls (e.g., “Tell me the number of unique positions I have occupied?”). We support custom fields on entities using a shadow table:
// Shadow table: maps entity pointers to their custom fields
static void getEntityCustomFields(lua_State* L, Entity* entity)
{
pushShadowTable(L); // Global table in registry
lua_pushlightuserdata(L, entity); // Use pointer as key
lua_gettable(L, -2);
if (lua_isnil(L, -1))
{
// Create new table for this entity's custom fields
lua_newtable(L);
lua_pushlightuserdata(L, entity);
lua_pushvalue(L, -2);
lua_settable(L, -4); // shadow[entity] = {}
}
lua_remove(L, -2);
}
This allows:
-- AI can store custom state for tracking
player.visited_positions = player.visited_positions or {}
local pos_key = player.position[1] .. "," .. player.position[2]
player.visited_positions[pos_key] = true
-- Count unique positions
local count = 0
for _ in pairs(player.visited_positions) do count = count + 1 end
return "You have visited " .. count .. " unique positions"
-- Built-in fields still work (read-only)
local pos = player.position -- reads from C++
The shadow table uses weak keys, so when an entity is garbage collected, its custom fields are automatically cleaned up.
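The weak-key behavior is easy to observe in pure Lua (illustrative sketch; the sample keeps its shadow table in the C registry):

```lua
-- A shadow table with weak keys: entries disappear once the key
-- (the entity) is no longer referenced anywhere else
local shadow = setmetatable({}, { __mode = "k" })

local entity = {}
shadow[entity] = { my_target = "bat" }
assert(shadow[entity].my_target == "bat")

-- Drop the last reference to the entity and collect
entity = nil
collectgarbage("collect")
collectgarbage("collect")

-- The entity's custom fields were cleaned up automatically
local count = 0
for _ in pairs(shadow) do count = count + 1 end
assert(count == 0)
```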
Caveat: While this feature works correctly, current SLMs rarely generate code that uses it effectively. Stateful tasks like “track unique positions visited” require the model to correctly initialize state, update it each call, and avoid resetting it with local. In practice, SLM-generated code often gets this wrong, especially with smaller models. Consider this an available capability rather than a reliable feature for LLM-generated code; future models or cloud models may be far better at using such techniques.
Summary
| Decision | Problem Solved | Implementation |
|---|---|---|
| Method syntax | Cleaner code, better LLM predictions | Methods on the entity metatable (`ai:move("w")`) |
| Explicit examples in prompt | Prevent common hallucinations | “WRONG/RIGHT” examples |
| Error feedback retry | Self-correction of mistakes | Loop with error context |
| Userdata pointers | Changes persist, no sync needed | `pushEntity` stores an `Entity*` as userdata |
| Shadow tables | Custom state without corrupting entities | Registry table with weak keys |
7. Alternative Architecture — Curated Function Libraries
While this sample demonstrates runtime code generation, some applications may require stricter control over what code can execute. An alternative approach combines LLM code generation during development with human curation for production.
The Hybrid Approach
Instead of generating code at runtime, you can:
Development Phase: Have your team interact with the code agent during development, making the kinds of requests end users would make. The LLM generates functions as usual, but each generated function is cached and logged.
Curation Phase: Developers review the generated functions, validating and approving the ones that work correctly. Over time, you build a library of vetted, production-ready functions.
Production Phase: When an end user makes a request, use semantic matching (embeddings, similarity search) to find a pre-approved function that matches their intent. If a match exists, use it. If not, either:
Strict mode: Return an error (“I don’t know how to do that yet”)
Fallback mode: Generate code at runtime (with all the sandboxing protections)
Benefits
Human oversight — Every function that runs in production has been reviewed
Predictable behavior — Users get tested, validated code paths
Reduced latency — No LLM inference needed for common requests
Security confidence — No runtime code generation in strict mode
Continuous improvement — New requests in fallback mode become candidates for curation
Implementation Considerations
Function storage: Store approved functions with metadata (original prompt, function code, semantic embedding of the request).
Semantic matching: Use an embedding model to convert user requests to vectors, then find the nearest approved function. Set a similarity threshold — below it, consider the request “unmatched.”
Parameterization: Some functions may need light parameterization (e.g., “go get the sword” vs “go get the bow”). Consider whether exact matches are sufficient or if you need template-based functions.
Fallback policy: Decide whether unmatched requests should fail gracefully or trigger runtime generation. This is a security/flexibility tradeoff.
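The matching step can be sketched in a few lines. This hedged Lua sketch assumes requests have already been embedded as vectors (the two-dimensional embeddings, the `approved` entries, and the threshold value are all illustrative):

```lua
-- Cosine similarity between two equal-length vectors
local function cosine(a, b)
  local dot, na, nb = 0, 0, 0
  for i = 1, #a do
    dot = dot + a[i] * b[i]
    na = na + a[i] * a[i]
    nb = nb + b[i] * b[i]
  end
  return dot / (math.sqrt(na) * math.sqrt(nb))
end

-- Library of curated, human-approved functions with stored embeddings
local approved = {
  { prompt = "go get the sword", embedding = {1, 0}, code = "..." },
  { prompt = "follow me",        embedding = {0, 1}, code = "..." },
}

-- Return the best match at or above the threshold, or nil if unmatched
local function findMatch(queryEmbedding, threshold)
  local best, bestScore = nil, threshold
  for _, fn in ipairs(approved) do
    local score = cosine(queryEmbedding, fn.embedding)
    if score >= bestScore then best, bestScore = fn, score end
  end
  return best
end

assert(findMatch({0.9, 0.1}, 0.8).prompt == "go get the sword")
assert(findMatch({0.7, 0.7}, 0.99) == nil) -- below threshold: unmatched
```

An unmatched result is where the fallback policy applies: error out in strict mode, or fall through to runtime generation.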
When to Use This Approach
| Scenario | Recommendation |
|---|---|
| High-security production environment | Strict mode (no runtime generation) |
| Internal tools with trusted users | Fallback mode acceptable |
| Games with predictable command patterns | Curated library works well |
| Open-ended creative applications | Runtime generation may be necessary |
This approach lets you capture the benefits of LLM code generation during development while maintaining tight control over what runs in production.
8. Conclusion
Code agents offer a powerful alternative to tool-calling agents for tasks that require flexible, multi-step logic. By having the LLM generate executable code rather than selecting from predefined functions, we can build AI companions that adapt to novel situations and compose behaviors in ways we didn’t explicitly anticipate.
However, this power comes with responsibility. Executing LLM-generated code requires careful sandboxing to prevent:
Dangerous function access (file I/O, shell execution)
Resource exhaustion (memory, CPU, stack)
Sandbox escapes (metatable manipulation)
Application state corruption
Lua proved to be an excellent choice for this sample, offering:
Selective library loading and function removal
Custom memory allocators for hard limits
Debug hooks for timeout and recursion control
A syntax that LLMs can generate with reasonable accuracy
Beyond security, we found that API design and prompt engineering significantly impact LLM success rates. Method syntax, explicit WRONG/RIGHT examples, and error-feedback retry loops all contribute to more reliable code generation.
This sample demonstrates that code agents are practical today, but require thoughtful engineering. The techniques shown here — sandboxing, API design, prompt engineering, and graceful error handling — provide a foundation for building code agents in your own applications.
Key Takeaways:
Code agents vs tool-calling — Choose based on task complexity and security requirements
Language choice matters — Prioritize sandboxing capability, then LLM familiarity
Defense in depth — Multiple layers of protection for each threat category
Design for LLM success — API design and prompts are as important as the runtime
Expect iteration — Error feedback and retry loops improve success rates significantly
We encourage you to experiment with this sample, try different commands, and explore how the AI companion responds. The code is designed to be readable and modifiable — use it as a starting point for your own code agent implementations.