How to connect WebUI and Cline to Telegram Cocoon
Background
It’s surprising that there’s almost no practical information about Telegram Cocoon beyond what’s on GitHub and the official website. Various media outlets have plenty of general coverage about the network launch, but almost nothing about real user experience.
I decided to spend a bit of time and figure out what’s actually going on in the network, how it works, and, most importantly, whether I, as a developer, can use it today. So in this article I’ll look at Cocoon from a developer’s perspective: how to install it and how to use it.
Telegram Cocoon is a decentralized AI inference network: Cocoon’s infrastructure is used to run ready-made open-source models. GPU owners provide compute through Cocoon Worker, and developers who want to integrate AI into their apps connect through Cocoon Client. Both the client and the workers connect to Cocoon Proxy, which handles load balancing - routing requests from the client to workers. The client pays for requests in Toncoin, and workers receive rewards minus the proxy fee, which is currently around 5%.
In theory, everything is perfect. In practice, as we'll see below, the Cocoon network currently runs only two models, Qwen3-32B and Seed-X-PPO-7B, on about four workers. Judging by their purpose, Seed-X-PPO-7B is used mainly for translation, while Qwen3-32B, as the more general-purpose model, is used for text extraction. That is not enough for Cocoon to serve as a full replacement for OpenAI models in scenarios such as Cline and WebUI: yes, everything runs and can be used, but in practice even the free tiers of ChatGPT, Gemini, and others will be more productive.
At the same time, if Cocoon starts adding strong open-source models quickly and at a low cost, its openness could allow it to compete with proprietary solutions. For now, however, it feels like the system is primarily being used by Telegram itself to cover its internal needs.
What are the advantages of Cocoon? First, it is claimed that no one can spy on what a user is discussing with a model except the client owner. Second, it is difficult to block an individual user. Third, it is a decentralized platform that, in theory, should provide access to any open-source models at a low cost. Fourth, from a developer’s perspective, it’s important that the client exposes an OpenAI-like API: you can plug it into familiar tools with almost no changes, just by switching the base URL. And fifth, there are no subscriptions—you pay in TON only for actual requests.
Practical part
What we want to end up with looks like this:
WebUI/Cline -> Cocoon Client (OpenAI API) -> Proxy -> Workers (GPU) -> Model
We’ll install Cocoon Client and connect an open-source chat, Open WebUI, to it so we can “chat” with the model while Cocoon handles the responses in the background. Then we’ll hook Cocoon up to the Cline agent in Visual Studio Code and try to write a working program.
Running Cocoon Client
Let’s start by installing Cocoon Client. The Cocoon project uses Intel TDX (Confidential Computing), and this is supported out of the box on Ubuntu 25.10, so for simplicity I’ll be using that. At the same time, it doesn’t really matter whether the server’s CPU actually supports TDX or not—in our scenario this isn’t critical. For those who are interested, here’s Canonical’s repository about TDX: https://github.com/canonical/tdx
Out of the box, the client didn't start for me and kept crashing with TEE/TDX-related errors. Most VPS providers don't support proper TEE attestation, so I run the client with those checks disabled. Open WebUI also refused to work at first, until I added the application/json headers to the client's responses.
First, we’ll install all the dependencies, not only for Cocoon, but also for Open WebUI and Cline.
sudo apt-get update && sudo apt-get install -y \
zlib1g-dev \
libjemalloc-dev \
libssl-dev \
liblz4-dev \
libsodium-dev \
libreadline-dev \
apache2-utils \
autoconf \
automake \
libtool \
pkg-config
Next, we clone the project:
git clone --recursive https://github.com/TelegramMessenger/cocoon.git
As mentioned earlier, I wrote a small patch to make the Cocoon Client work properly with WebUI/Cline and to allow it to run even if Confidential Computing (TDX/TEE) is not supported:
https://gist.github.com/raiym/d5e916e915cb3e146d3b46d4a50344f8
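Save it as cocoon.patch in the repository root (the file name is just my choice here). A dry run first confirms that the patch still applies cleanly to the current tree:
cd cocoon
git apply --check cocoon.patch && echo "patch applies cleanly"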
Apply the patch.
git apply cocoon.patch
Download the network config from the official project website:
curl -o spec/mainnet-full-ton-config.json https://cocoon.org/resources/mainnet.cocoon.global.config.json
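A quick sanity check that the download produced valid JSON rather than an error page (python3 ships with Ubuntu, so no extra packages are needed):
python3 -m json.tool spec/mainnet-full-ton-config.json > /dev/null && echo "config looks valid"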
Next, we create a client-specific config in the project root called client.conf:
[node]
type = client
owner_address = UQAKPq2DV...HX4YpjBd
node_wallet_key = N4Y/5.../2ryRygu/6c=
root_contract_address = EQCns7bYSp0igFvS1wpb5wsZjCKCV19MD5AVzI4EyxsnU73k
ton_config = spec/mainnet-full-ton-config.json
instance = 0
Note that owner_address should be your own - this is the TON wallet that controls the client (though after creating it, I didn’t use it again).
Next, node_wallet_key is just 32 random bytes encoded in base64; it can be anything. This random value is used to derive the client’s wallet address on TON.
head -c 32 /dev/urandom | base64
N4Y/5.../2ryRygu/6c=
Next, root_contract_address is the address of the Cocoon root contract on the TON network. It stores the network configuration—for example, the list of proxies and economic parameters (token prices, multipliers, and so on). At the time of writing, there is effectively a single proxy in the network at IP 91.108.4.11 (possibly an Anycast IP) with two ports: :5222 and :8888 - one used by workers and the other by clients.
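If connectivity is in doubt, it's worth checking that both proxy ports are reachable before starting the client. A small sketch; which of the two ports serves clients and which serves workers isn't documented, so both are probed:
# nc comes from the netcat-openbsd package if it isn't already installed
nc -vz 91.108.4.11 5222
nc -vz 91.108.4.11 8888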
After that, we run the launcher; the script will handle compilation for us:
COCOON_ROUTER_POLICY=any COCOON_SKIP_TDX_USERCLAIMS=1 COCOON_SKIP_PROXY_HASH=1 COCOON_CLIENT_VERBOSITY=3 ./scripts/cocoon-launch client.conf
Next time, we can skip the compilation step:
COCOON_ROUTER_POLICY=any COCOON_SKIP_TDX_USERCLAIMS=1 COCOON_SKIP_PROXY_HASH=1 COCOON_CLIENT_VERBOSITY=3 ./scripts/cocoon-launch client.conf --skip-build
After startup, the logs will show a TON wallet address that needs to be funded in order to access the models. I topped it up with 30 TON. Out of that, 15 TON goes into a deposit, which, as far as I understand, can be refunded if it isn’t fully used (I didn’t test this).
[CLIENT] **[ 1][t 2][2026-01-06 12:11:40.671298504][BaseRunner.cpp:1048][!client] ACTION REQUIRED: BALANCE ON CONTRACT UQAf6e2X3wxaK...N7rHHBzKMEfs IS TOO LOW: MINIMUM 2100000000 CURRENT -1
After funding the wallet, you should see something like this in the logs:
[CLIENT] [ 3][t 6][2026-02-01 09:49:00.527275537][TonlibWrapper.cpp:64][!TonlibClientWrapper] TonLib is synced
[CLIENT] [ 2][t 6][2026-02-01 09:49:00.679363092][TonlibClient.cpp:846][!GetAccountState] Unknown code hash: WwfGvQw0c1h036NUjNmgHHi+Hg/fINDF6+N3djlsVAA=
[CLIENT] [ 3][t 6][2026-02-01 09:49:00.679504904][BaseRunner.cpp:663][!client] got root contract state with ts=1769939333
[CLIENT] [ 3][t 6][2026-02-01 09:49:00.679579734][RootContractConfig.cpp:316][!client] parse root contract state: owner=UQDnlslXI2RtI1WhLmtelkb4CVQGxr8E_xSIjl0Hg79jNk6V unique_id=239 is_test=NO proxy_hashes_size=1 registered_proxies_count=1 last_proxy_seqno=3 workers_hashes_count=3 price_per_token=20 worker_fee_per_token=19 version=76 16 min_proxy_stake=15000000000 min_client_stake=15000000000 prompt_tokens_price_multiplier=10000 cached_tokens_price_multiplier=1000 completion_tokens_price_multiplier=80000 reasoning_tokens_price_multiplier=80000
[CLIENT] [ 3][t 2][2026-02-01 09:49:01.660136137][TonlibWrapper.cpp:66][!TonlibClientWrapper] TonLib is syncing: 56789955/56789955
[CLIENT] [ 3][t 2][2026-02-01 09:49:02.140429635][TonlibWrapper.cpp:64][!TonlibClientWrapper] TonLib is synced
After startup, the client starts listening on port 10000 and exposes a set of APIs. For example:
127.0.0.1:10000/v1/models returns the list of available models, the number of workers per model, and the current load (how many requests are currently running on each worker). This is where you can clearly see that only two models are available on the network:
{
"object": "list",
"data": [
{
"id": "ByteDance-Seed/Seed-X-PPO-7B",
"object": "model",
"created": 0,
"owned_by": "?",
"workers": [
{
"coefficient": 1000,
"running_requests": 4,
"max_running_requests": 60
},
...
{
"id": "Qwen/Qwen3-32B",
"object": "model",
"created": 0,
"owned_by": "?",
"workers": [
{
"coefficient": 1000,
"running_requests": 7,
"max_running_requests": 60
},
Besides that, the following endpoints are available (there are more):
- /v1/models
- /v1/chat/completions
- /v1/completions
- /stats
- /jsonstats
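Since the client speaks the OpenAI wire format, these endpoints can be exercised directly with curl. A minimal sketch, assuming the default port 10000 and the standard OpenAI request body for /v1/chat/completions:
# list the models (the same call whose output is shown above)
curl -s http://127.0.0.1:10000/v1/models
# minimal chat completion; the model id must match one returned by /v1/models
curl -s http://127.0.0.1:10000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen3-32B", "messages": [{"role": "user", "content": "Say hello in one sentence."}]}'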
Next, for convenience, I ran Cocoon Client via systemd so it would keep running in the background:
# sudo systemctl cat cocoon-client
# /etc/systemd/system/cocoon-client.service
[Unit]
Description=Cocoon Client
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
WorkingDirectory=/root/cocoon
Environment=COCOON_ROUTER_POLICY=any
Environment=COCOON_SKIP_TDX_USERCLAIMS=1
Environment=COCOON_SKIP_PROXY_HASH=1
Environment=COCOON_CLIENT_VERBOSITY=3
ExecStartPre=/usr/bin/install -d -m 0755 /var/log/cocoon
ExecStart=/root/cocoon/scripts/cocoon-launch /root/cocoon/client.conf --skip-build
Restart=always
RestartSec=5
StandardOutput=append:/var/log/cocoon/client.log
StandardError=append:/var/log/cocoon/client.log
[Install]
WantedBy=multi-user.target
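With the unit file in place, the usual systemd steps start the client and let us follow the log file configured above:
sudo systemctl daemon-reload
sudo systemctl enable --now cocoon-client
sudo systemctl status cocoon-client --no-pager
tail -f /var/log/cocoon/client.log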
Connecting Cocoon Client to Open WebUI
Now we’ll install and run Open WebUI. We’ll point it at the Cocoon client on port 10000, and run the WebUI itself, for example, on port 8282:
sudo docker run -d --name openwebui --restart unless-stopped \
--add-host=host.docker.internal:host-gateway \
-p 8282:8080 \
-e OPENAI_API_BASE_URL="http://host.docker.internal:10000/v1" \
-e OPENAI_API_BASE="http://host.docker.internal:10000/v1" \
-e OPENAI_API_KEY="none" \
ghcr.io/open-webui/open-webui:main
Open WebUI will talk to the Cocoon client directly. Open it via the server's IP and port, set up the admin account, and, as a test, ask the model something, for example about Habr.
Technically, WebUI does work with Cocoon; we've confirmed that. But the real question is: where are you, @deniskin?
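If WebUI doesn't list any models, the quickest check is from the host: make sure the Cocoon client still responds, then look at the container logs (ports as configured above):
# first make sure the client itself responds on the host
curl -s http://127.0.0.1:10000/v1/models | head -c 300
# Open WebUI's own log usually shows connection errors to the OpenAI-compatible backend
sudo docker logs --tail 50 openwebui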
Connecting Cocoon Client to the Cline plugin
Since port 10000 shouldn't be exposed publicly, we'll put an nginx proxy in front of it on a different port and add a simple Bearer token check, so only someone who knows the secret can use the API. We'll need this to connect Cline.
# /etc/nginx/conf.d/cocoon-cline.conf
server {
listen 8181;
location / {
if ($http_authorization != "Bearer XXXXXXXXX") { return 401; }
proxy_pass http://127.0.0.1:10000;
proxy_http_version 1.1;
proxy_buffering off;
proxy_set_header Host $host;
proxy_set_header X-Forwarded-For $remote_addr;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_set_header Connection "";
}
}
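After placing this file in /etc/nginx/conf.d/, reload nginx and check that requests without the token are rejected while requests with it go through (replace XXXXXXXXX with your secret):
sudo nginx -t && sudo systemctl reload nginx
# should print 401
curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:8181/v1/models
# should return the model list
curl -s -H "Authorization: Bearer XXXXXXXXX" http://127.0.0.1:8181/v1/models | head -c 300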
Now the Cocoon API is available on port 8181 and protected by a token. Next, in Cline's settings in VS Code, point it at your server: use its address and port 8181 as the base URL, the token as the API key, and a model name from the /v1/models response, for example Qwen/Qwen3-32B.
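For reference, the values I ended up entering (a sketch; exact field names may differ between Cline versions, and the provider type is assumed to be the OpenAI-compatible one):
# Cline -> Settings -> API Provider: OpenAI Compatible (assumed; naming may vary)
# Base URL : http://<your-server-ip>:8181/v1   # if Cline reports 404s, try it without the /v1 suffix
# API Key  : XXXXXXXXX                         # the Bearer token from the nginx config above
# Model ID : Qwen/Qwen3-32B                    # any id returned by /v1/models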
Next, we ask Cline to write a simple Rust program that fetches the TON price from the Bitfinex exchange based on the TypeScript example from the official documentation. After a few attempts, Cline and Qwen finally produced working code. But for day-to-day work, in my opinion, this model still isn’t good enough.
Results
We were able to run Cocoon in chat mode via Open WebUI and connect it to Cline. However, only two models are available: Qwen3-32B and Seed-X-PPO-7B. Seed appears to be used by Telegram mainly for translation, while Qwen is used for extracting key information from text. There are no other models available on the network right now.
From a regular user or developer perspective, it only really makes sense to look at Qwen, but I wasn’t very impressed with the outcome: for day-to-day work, Qwen3-32B isn’t suitable either as a chat model or for coding. Cocoon’s current state feels more like a working prototype than a mature platform. And at the moment, Cocoon only supports text.
Conclusions
If Cocoon eventually adds more strong models, attracts more workers, and becomes cheaper than the rest of the industry, then we as users stand to gain a lot. It would be enough to buy a few dozen TON and use Cocoon as much as you want: a wide range of models, no censorship, no hard limits (other than your wallet), and no user-level blocking. It could be used both in chat mode and for programming. Video and image generation would be a very welcome addition.
Going further, as the platform matures, it could potentially attract truly large companies that need cheap inference. For example, I think using Cocoon for video generation integrated with ComfyUI could be quite promising.
That said, there are a lot of caveats. The platform launched on December 1, 2025, but the number of GPUs on the network can still be counted on one hand. The entire network generates only about 250 Toncoin in revenue per day, even though two months have already passed since launch. This creates the impression that Telegram simply doesn’t have enough resources to actively develop Cocoon—despite how much could already have been done. There’s a real risk that Telegram will end up using and developing the platform mainly for its own internal needs due to limited resources.
For real growth, Cocoon needs a dedicated team focused on identifying the most popular use cases, integrating them quickly, and explaining how to use them. I see a niche here: Open WebUI + a code agent + ComfyUI. If Cocoon can make this setup convenient and cheaper, it could become a serious competitor. In that case, the platform would gain steady traffic—but only if it’s continuously updated.
Marketing also matters. Money needs to be spent on building a community—Reddit, Discord—so a core group of enthusiasts can form around the platform.
Right now, only server-grade GPUs like the H200 are supported, and such a server costs $35,000 or more, since Confidential Compute is essentially limited to server hardware. Ideally, Cocoon would run not only on Intel, but also on AMD, Qualcomm, and Apple—and not only with NVIDIA GPUs, but also AMD, Huawei, and others. In other words, Cocoon should aim to become the Android of the AI world.
Cocoon can already be used today, but for now it’s more of an infrastructure prototype for Telegram than a true developer platform.
| What we expected | What we actually have now |
|---|---|
| Cutting-edge models | 2 relatively old models |
| Scale (1,000+ workers) | roughly 16 workers |
| OpenAI replacement | no |
| Privacy | claimed, but depends on hardware |
| Revenue covers GPU costs | the entire Cocoon network earns ~250 TON/day |
| Video generation support | no |
| Voice and image support | no |
Links
Official Telegram Cocoon website: https://cocoon.org
Official GitHub repository: https://github.com/TelegramMessenger/cocoon
Real-time information about the Cocoon network: https://getaruai.com/#/cocoon
Open WebUI: https://github.com/open-webui/open-webui
Cline: https://github.com/cline/cline
ComfyUI (video and image generation): https://github.com/Comfy-Org/ComfyUI