🪷 tsh

The shell for long-running agents. Smart output limiting. Loop memory recovery. 99% context savings on repeat reads.

tsh$ cat large_module.py   # loop 1: 1,142 lines → 180 (85% saved)
tsh$ cat large_module.py   # loop 2: structure only → 50 (99% saved)
tsh$ cat large_module.py   # loop 3: structure only → 50 (99% saved)
claude + tsh: total context ~280 tokens (99% saved)
claude + bash: total context ~28,500 tokens (0% saved)

🫧 Three modes, one binary

SHELL

Interactive

Run tsh bare. Full POSIX shell with all bash builtins, powered by brush-core. Every command's stdout is safety-filtered.

-c

Command

Run tsh -c 'cmd'. Executes through the shell engine, filters output, exits with the command's status.

PIPE

Script

Pipe a script via stdin. tsh handles encoding (UTF-16LE on Windows), runs it through brush-core, filters output.

🪷 Loop memory recovery

The biggest context waste isn't the first read — it's the second, third, and fourth. After compaction, the agent re-reads the same file and dumps the whole thing again. tsh tracks this and stops it.

| Read | What the agent sees | Savings |
|---|---|---|
| 1st read | Head 100 lines + structural skeleton + tail 20 + footer | ~70–90% |
| 2nd read (same file) | Structure only — def/class/ERROR/WARN lines + footer | ~99% |
| N-th read | Same as 2nd — structure only + count indicator | ~99% |

Powered by SessionTracker — an ExecutionObserver on brush-core that watches every file-reading command (cat, head, tail, less, bat) and counts reads per file path across the session. No configuration needed. Fully automatic.
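The counting logic can be sketched in Python. This is an illustrative analogue, not the shipped code: the real SessionTracker is a Rust ExecutionObserver inside the shell, and the names below are invented for the sketch.

```python
from collections import Counter

# Commands tsh treats as file reads; flag arguments (e.g. `head -n 50`)
# are not handled in this simplified sketch.
READ_COMMANDS = {"cat", "head", "tail", "less", "bat"}

class ReadTracker:
    def __init__(self):
        self.reads = Counter()

    def observe(self, argv):
        """Call once per executed command; returns updated read counts per file."""
        if not argv or argv[0] not in READ_COMMANDS:
            return {}
        files = [a for a in argv[1:] if not a.startswith("-")]
        for f in files:
            self.reads[f] += 1
        return {f: self.reads[f] for f in files}

tracker = ReadTracker()
tracker.observe(["cat", "large_module.py"])  # 1st read: head + skeleton + tail
count = tracker.observe(["cat", "large_module.py"])["large_module.py"]
# count == 2: structure-only mode from here on
```

Keying on the file path means the savings survive however the agent phrases the command, since `cat x.py` and `tail x.py` both bump the same counter.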

Real benchmark: two reads of server_log.txt (2,000 lines)

| Metric | Without tsh | With tsh | Savings |
|---|---|---|---|
| Tokens sent to agent | 53,623 | ~3,369 | 94% |
| Lines sent to agent | ~4,000 | 158 | 96% |
| Found both needles (ERROR + FATAL) | Yes | Yes | No loss |

🪻 Why an agent should use tsh

An LLM agent running shell commands needs a shell. Most reach for bash. But bash dumps everything into the agent's context window — 48,000 lines of logs, raw binary, credentials. tsh manages what actually reaches the agent's memory.

🪷 It's a real shell

Pipes, redirects, heredocs, command substitution, process substitution, loops, traps, job control. Powered by brush-core — a Rust implementation of bash.

🫧 Smart output limiting

Large outputs are automatically reduced to their structural skeleton — function signatures, class defs, imports, errors. First 100 lines + structural extraction + last 20. No LLM call, just fast pattern matching.
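A minimal sketch of the head + skeleton + tail scheme. The patterns and function below are illustrative, not the shipped filter:

```python
import re

# Illustrative structural patterns: signatures, imports, errors.
STRUCTURAL = re.compile(
    r"^\s*(def |class |import |from |fn |pub fn |ERROR|WARN|Traceback)"
)

def limit_output(lines, head=100, tail=20):
    """Keep the first `head` lines, structural lines from the middle,
    and the last `tail` lines; short outputs pass through untouched."""
    if len(lines) <= head + tail:
        return lines
    skeleton = [l for l in lines[head:-tail] if STRUCTURAL.match(l)]
    return lines[:head] + skeleton + lines[-tail:]

big = ["request handled"] * 500 + ["def handler(event):"] + ["request handled"] * 500
out = limit_output(big)  # 100 head + 1 structural line + 20 tail = 121 lines
```

Because it is pure pattern matching over lines, the reduction costs no LLM call and runs in a single pass.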

🪻 Internal pipes untouched

When you run grep foo | sort | uniq, those internal pipes run at OS speed. Only the final output to the agent gets routed through the safety layer. Zero overhead on intermediate work.

🪷 Pluggable & configurable

Tune via env vars: TSH_HEAD_LINES, TSH_TAIL_LINES, TSH_MAX_LINES. Or set TSH_NO_LIMIT=1 for full pass-through. The Python filter is pluggable — swap in your own logic.
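A pluggable filter might pick up these knobs like so. The head/tail defaults follow the documented 100/20 split; the TSH_MAX_LINES default here is an illustrative assumption:

```python
import os

def load_config():
    """Read tsh's tuning knobs the way a pluggable Python filter might.
    head/tail defaults follow the documented 100/20 split; the
    TSH_MAX_LINES default of 500 is an illustrative assumption."""
    return {
        "head": int(os.environ.get("TSH_HEAD_LINES", "100")),
        "tail": int(os.environ.get("TSH_TAIL_LINES", "20")),
        "max_lines": int(os.environ.get("TSH_MAX_LINES", "500")),
        "no_limit": os.environ.get("TSH_NO_LIMIT") == "1",
    }
```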

🫧 Windows native

Auto-detects PowerShell's UTF-16LE encoding — with BOM, without BOM, or plain UTF-8. Agents on Windows need zero special configuration.

🪻 Multilingual structural awareness

Knows the structural patterns of Python, Rust, JavaScript, TypeScript, Java, C/C++, Go, and log formats. Extracts what matters — def, class, pub fn, ERROR, Traceback — elides the rest.
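In sketch form, per-language structural matching reduces to a pattern table. These regexes are an assumed, simplified set, not tsh's exact patterns:

```python
import re

# Illustrative per-language structural patterns (an assumed set, not tsh's exact one).
PATTERNS = {
    "python":     r"^\s*(def |class |import |from |@)",
    "rust":       r"^\s*(pub\s+)?(fn|struct|enum|trait|impl|mod|use)\b",
    "javascript": r"^\s*(function\b|class\b|import\b|export\b|const\s+\w+\s*=)",
    "go":         r"^\s*(func|type|import|package)\b",
    "log":        r"\b(ERROR|WARN|FATAL)\b|^Traceback",
}

def is_structural(line, lang):
    """True if the line matches the structural pattern for `lang`."""
    return re.search(PATTERNS[lang], line) is not None
```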

🫧 Try tsh in your browser

A live shell. Connect a local folder to browse real files.

tsh — token shell
Token Shell (tsh) 0.1.0
Type commands below. Use "mount" to browse your connected folder.

🪷 Quickstart

macOS / Linux / WSL

curl -fsSL https://raw.githubusercontent.com/maceip/tsh/main/install.sh | sh

Windows (PowerShell)

irm https://raw.githubusercontent.com/maceip/tsh/main/install.ps1 | iex

Then open a new terminal and run:

tsh

Usage

# Interactive shell (safety-filtered)
tsh

# Run a command
tsh -c 'cat /var/log/syslog | tail -50'

# Disable safety for debugging
tsh --no-safety -c 'echo "raw output"'

# Pipe a script
echo 'whoami && df -h' | tsh

Build from source

git clone https://github.com/maceip/tsh && cd tsh
cargo build --release
./target/release/tsh

Docker

docker compose up -d
docker compose run --rm tsh

🫧 Agent reference

Structured for LLM agents and automation tools. Exact types, defaults, and behavior for every flag, mode, and output.

Mode detection

tsh selects its mode automatically. No mode flag exists.

| Condition | Mode | Behavior |
|---|---|---|
| -c "cmd" provided | Command | Runs the string through brush-core. Stdout is safety-filtered. Exits with the command's status code. |
| stdin is a pipe (not a terminal) | Script | Reads all stdin bytes as a script (handles UTF-16LE on Windows). Runs through brush-core. Output filtered. |
| stdin is a terminal | Shell | REPL with tsh$ prompt. Full POSIX interactive shell (brush-core with bash-mode builtins). All stdout safety-filtered. |

Priority: -c is checked first. Then pipe detection. Shell is the fallback.
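The priority order can be sketched as follows. This is an illustrative Python mirror of the documented rules; tsh implements them natively in Rust:

```python
import sys

def detect_mode(argv):
    """Mirror tsh's documented mode priority: -c first,
    then pipe detection, then the interactive shell as fallback."""
    if "-c" in argv or "--command" in argv:
        return "command"
    if not sys.stdin.isatty():  # stdin is a pipe, not a terminal
        return "script"
    return "shell"
```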

CLI flags

| Flag | Type | Default | Description |
|---|---|---|---|
| -c / --command | String | None | Execute this command string and exit. Like bash -c. |
| --no-safety | Boolean | false | Disable the safety filter. Output passes through unfiltered. For debugging. |

Agent invocation patterns:

# Run commands with safety filtering
tsh -c 'ls -la /tmp && df -h'
tsh -c 'cat .env | head -10'
tsh -c 'grep -rn "password" config/'

# Pipe a script
echo 'whoami && env' | tsh

# Debug mode (no filtering)
tsh --no-safety -c 'echo "raw output"'

# Interactive
tsh

Output routing

How tsh handles command output: text stdout is routed through the safety filter; binary output (detected by a null byte in the first chunk) bypasses the filter and streams straight to the terminal; internal pipes between commands run at OS speed; stderr passes through unfiltered.

Shell capabilities

Full POSIX shell via brush-core: pipes, redirects, heredocs, command substitution, process substitution, loops, traps, job control.

Error handling

Errors go to stderr (unfiltered). Exit code 1 on failure.

| Condition | stderr message | Resolution |
|---|---|---|
| Empty stdin in pipe mode | No input via stdin. | Ensure data is piped. |
| Invalid encoding (Windows) | Not valid UTF-8 or UTF-16LE. | PowerShell: $OutputEncoding = [System.Text.Encoding]::UTF8 |
| Safety filter not found | [tsh] safety filter not found. Pass-through mode. | Ensure python/safety_filter.py exists and Python 3 is on PATH. |

Pipeline patterns

# Safe command execution — output is filtered
tsh -c 'find . -name "*.py" -exec grep -l "import os" {} +'
tsh -c 'cat /var/log/auth.log | grep "Failed password" | tail -20'

# Chain commands — internal pipes at OS speed, only final output filtered
tsh -c 'ps aux | grep python | awk "{print \$2, \$11}" | sort'

# Script via stdin
echo 'for f in *.log; do wc -l "$f"; done | sort -rn' | tsh

# Debug — bypass safety
tsh --no-safety -c 'env | sort'

🪷 Performance

🫧 Routing overhead

Pipe routing adds single-digit milliseconds. Internal pipes between commands are pure OS speed — untouched by the router.

🪻 Async I/O

Tokio async runtime. Non-blocking pipe reads. safety subprocess spawns once and stays warm for the entire session.

🪷 Binary bypass

First chunk scanned for null bytes. Binary output skips safety entirely — straight to terminal. No serialization overhead.
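The null-byte check is simple enough to show directly. An illustrative sketch:

```python
def is_binary(first_chunk: bytes) -> bool:
    """A null byte in the first chunk marks the stream as binary.
    Binary streams skip the safety filter and go straight to the terminal."""
    return b"\x00" in first_chunk

is_binary(b"\x7fELF\x02\x01\x01\x00")  # → True: ELF header, bypasses the filter
is_binary(b"hello world\n")            # → False: text, routed through the filter
```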

🫧 Startup

Native Rust binary. Sub-millisecond to shell ready. Python safety process spawns in parallel with first command.

🫧 LangExtract — companion extraction engine

tsh ships with langextract-host, a zero-copy chunked extraction library for routing documents through local LLMs. Run it independently via cargo run -p langextract-host.

The pipeline:

1. 📄 Document → chunk_text(): 24KB slices, 1KB overlap, zero-copy, whitespace-safe.
2. Chunks stream as JSON lines into python/shim.py, which redacts inputs, calls the LLM (Ollama / vLLM), tags outputs, and returns JSON.
3. Results are aggregated into AnnotatedDocument[]: extraction_class, char_interval, attributes, alignment.

Chunking
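The chunking scheme (24KB slices, 1KB overlap, snapped to whitespace) can be sketched in Python. This is illustrative only; the shipped langextract-host chunker is zero-copy Rust:

```python
def chunk_text(text, size=24_576, overlap=1_024):
    """Split text into overlapping chunks, snapping boundaries to whitespace
    so no word is cut in half. Illustrative sketch, not the Rust implementation."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + size, len(text))
        if end < len(text):
            ws = text.rfind(" ", start, end)
            if ws > start:
                end = ws  # snap back to the last whitespace in the window
        chunks.append(text[start:end])
        if end == len(text):
            break
        nxt = end - overlap  # neighbouring chunks share `overlap` characters
        start = nxt if nxt > start else end
    return chunks
```

The overlap gives the extractor context across chunk boundaries, so an entity straddling a slice edge still appears whole in at least one chunk.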

Environment variables (LangExtract)

| Variable | Default | Description |
|---|---|---|
| OPENAI_API_BASE | http://localhost:11434/v1 | LLM endpoint. |
| OPENAI_API_KEY | local-poc-key | API key. |
| LLM_MODEL_ID | llama3 | Model identifier. |
| TSH_MODEL_DIR | platform cache dir | Model download location. |

🪻 Architecture

crates/
  tsh/                      # Shell binary — pipe routing, safety integration, brush-core
  langextract-host/         # Extraction engine — zero-copy chunker, async streaming
  tsh-model-manager/        # Model lifecycle — download, cache, SHA-256 verify
python/
  safety_filter.py          # Safety output filter — redaction, PII masking (pluggable)
  shim.py                   # LangExtract shim — input mutations, LLM routing, output tagging
xtask/                      # CI — 42+ bash tests, 50+ Windows tests, platform-aware

Shell engine: brush-core — a Rust implementation of bash. tsh injects custom file descriptors so stdout/stderr flow through OS pipes into the async routing layer.

Safety boundary: The Python safety filter is a long-running subprocess. tsh writes text to its stdin and reads sanitized output from its stdout. Binary data bypasses it entirely. The filter is kill-on-drop.
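A drop-in replacement for the filter only needs to read stdin and write sanitized lines to stdout. A minimal skeleton, where the redaction rule is an illustrative assumption rather than the shipped filter's logic:

```python
import re
import sys

# Illustrative redaction rule: mask values assigned to secret-looking keys.
# The shipped safety_filter.py applies its own redaction and PII masking.
SECRET = re.compile(r"(?i)(password|token|api[_-]?key)\s*=\s*\S+")

def sanitize(line: str) -> str:
    return SECRET.sub(lambda m: m.group(0).split("=")[0] + "=[REDACTED]", line)

def run(stream_in, stream_out):
    """Filter loop: tsh writes to our stdin, reads sanitized text from our stdout."""
    for line in stream_in:
        stream_out.write(sanitize(line))
        stream_out.flush()  # tsh reads incrementally; don't buffer

# As python/safety_filter.py, this would be driven by: run(sys.stdin, sys.stdout)
```

Flushing per line matters: tsh streams output to the agent as it arrives, so a buffered filter would stall the session.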

References

The smart output limiter is informed by the following research on context compression for LLM agents.

  [1] Jha, Erdogan, Kim, Keutzer, Gholami. "Characterizing Prompt Compression Methods for Long Context Inference." ICML 2024. Extractive compression achieves up to 10x compression with minimal accuracy loss.
  [2] Lindenbauer & Slinko. "Simple Observation Masking Is as Efficient as LLM Summarization for Agent Context Management." NeurIPS DL4Code Workshop, Dec 2025. Halves cost vs. LLM summarization.
  [3] Tree-sitter code skeletonization (Repomix / Aider, 2024–2025). Parse code, return signatures + imports, strip bodies. ~70% token reduction.
  [4] Zhang, Zhao et al. "cAST: AST-Based Code Chunking." EMNLP 2025 Findings. Recursively breaks large AST nodes into semantically coherent chunks. +4.3 Recall@5.
  [5] Chirkova et al. "Provence: Context Pruning for RAG." ICLR 2025. Question-aware sentence pruning, plug-and-play for any LLM.
  [6] Jiang et al. "LLMLingua-2." ACL 2024. BERT-level encoder for token classification via data distillation. 3x–6x faster, up to 20x compression.
  [7] Li, Liu, Su, Collier. "Prompt Compression for LLMs: A Survey." NAACL 2025 (Oral). Comprehensive taxonomy of hard vs. soft prompt compression.
  [8] Kang et al. "ACON: Agent Context Optimization." arXiv, Oct 2025. Gradient-free compression guideline optimization. 26–54% memory reduction, 95%+ accuracy.