TorchForge RL Pipelines Now Operable on Together AI’s Cloud

Jessie A Ellis
Dec 04, 2025 17:54

Together AI introduces TorchForge RL pipelines on its cloud platform, enhancing distributed training and sandboxed environments with a BlackJack training demo.

TorchForge reinforcement learning (RL) pipelines now run on Together AI’s Instant Clusters, with support for distributed training, tool execution, and sandboxed environments, according to together.ai. An open-source BlackJack training demo shows the setup in practice.

The AI Native Cloud: Foundation for Next-Gen RL

Building flexible, scalable reinforcement learning systems requires compute frameworks and tooling that work well together. Modern RL pipelines go well beyond a basic training loop: they rely on distributed rollouts, high-throughput inference, and coordinated use of CPU and GPU resources.

The PyTorch stack, including TorchForge and Monarch, now runs distributed training on Together Instant Clusters. These clusters provide:

Low-latency GPU communication: Utilizing InfiniBand/NVLink topologies for efficient RDMA-based data transfers and distributed actor messaging.
Consistent cluster bring-up: Preconfigured with drivers, NCCL, CUDA, and the GPU operator, enabling PyTorch distributed jobs to run without manual setup.
Heterogeneous RL workload scheduling: GPU-optimized nodes for policy replicas and trainers, alongside CPU-optimized nodes for environment and tool execution.

Together AI’s clusters are well suited to RL frameworks that combine GPU-bound model computation with CPU-bound environment workloads.

Advanced Tool Integration and Demonstration

A significant portion of RL workloads involves executing tools, running code, or interacting with sandboxed environments. Together AI’s platform natively supports these requirements through:

Together CodeSandbox: MicroVM environments tailored for tool-use, coding tasks, and simulations.
Together Code Interpreter: Facilitates fast, isolated Python execution suitable for unit-test-based reward functions or code-evaluation tasks.

Both CodeSandbox and Code Interpreter integrate with OpenEnv and TorchForge environment services, allowing rollout workers to utilize these tools during training.
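A unit-test-based reward function of the kind mentioned above can be sketched in a few lines of Python. This is an illustration only: the function name `unit_test_reward` is hypothetical, and a local subprocess stands in for an isolated execution service, since the article does not describe the Code Interpreter API.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def unit_test_reward(candidate_code: str, test_code: str, timeout_s: float = 5.0) -> float:
    """Return 1.0 if the candidate code passes the tests, otherwise 0.0.

    Assumption: a local subprocess is used as a stand-in for a sandboxed
    interpreter. A real pipeline would send the script to an isolated service.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "run_tests.py"
        # The tests are plain assert statements appended after the candidate.
        script.write_text(candidate_code + "\n\n" + test_code + "\n")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # Code that hangs earns no reward.
            return 0.0
    # A failed assertion or any other error gives a non-zero exit code.
    return 1.0 if result.returncode == 0 else 0.0


if __name__ == "__main__":
    tests = "assert add(2, 3) == 5"
    print(unit_test_reward("def add(a, b):\n    return a + b", tests))
    print(unit_test_reward("def add(a, b):\n    return a - b", tests))
```

The binary pass/fail reward keeps the sketch simple; a graded reward based on the fraction of tests passed is a common alternative.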

BlackJack Training Demo

Together AI has released a demonstration of a TorchForge RL pipeline running on its Instant Clusters and interacting with an OpenEnv environment hosted on Together CodeSandbox. The demo, adapted from a Meta reference implementation, trains a Qwen 1.5B model to play BlackJack using GRPO. The RL pipeline combines a vLLM policy server, a BlackJack environment, a reference model, an off-policy replay buffer, and a TorchTitan trainer. These components are connected through Monarch’s actor mesh and use TorchStore for weight synchronization.
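For readers unfamiliar with GRPO, its central step is to score each sampled completion relative to the other completions drawn for the same prompt, rather than against a learned value function. The sketch below shows that normalization in plain Python. It is not code from the demo: the function name, the use of the population standard deviation, and the `eps` term are assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of rollouts for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    Assumption: population standard deviation; implementations differ here.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every rollout earns the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four BlackJack hands played from the same prompt: +1 for a win, -1 for a loss.
    print(group_relative_advantages([1.0, -1.0, 1.0, -1.0]))
```

When every rollout in a group receives the same reward, all advantages are zero and that group contributes no gradient signal.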

The OpenEnv GRPO BlackJack repository includes Kubernetes manifests and setup scripts. Deployment and training are started with kubectl commands, and users can experiment with model configurations and GRPO hyperparameters.

Additionally, a standalone integration wraps Together’s Code Interpreter as an OpenEnv environment, enabling RL agents to interact with the Interpreter like any other environment. This integration allows RL pipelines to be applied to diverse tasks such as coding and mathematical reasoning.
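The idea of wrapping a code executor as an environment can be illustrated with a small reset/step interface. Every name below (`CodeExecutionEnv`, `StepResult`, `local_execute`) is hypothetical and is not taken from OpenEnv or from Together’s SDK. The local executor is an unsafe stand-in for a sandboxed interpreter and should only be run on trusted code.

```python
import contextlib
import io
from dataclasses import dataclass
from typing import Callable


@dataclass
class StepResult:
    observation: str
    reward: float
    done: bool


class CodeExecutionEnv:
    """Hypothetical single-step environment: the action is a code string."""

    def __init__(self, execute: Callable[[str], str], expected_output: str):
        self._execute = execute  # injected executor, e.g. a sandbox client
        self._expected = expected_output.strip()

    def reset(self) -> str:
        # The initial observation is the task description.
        return f"Write code that prints: {self._expected}"

    def step(self, action: str) -> StepResult:
        output = self._execute(action).strip()
        reward = 1.0 if output == self._expected else 0.0
        # One attempt per episode, so the episode ends after a single step.
        return StepResult(observation=output, reward=reward, done=True)


def local_execute(code: str) -> str:
    """Unsafe local stand-in for a sandbox: runs code and captures stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})
    return buffer.getvalue()


if __name__ == "__main__":
    env = CodeExecutionEnv(local_execute, expected_output="42")
    print(env.reset())
    print(env.step("print(6 * 7)"))
```

Passing the executor in as a callable means the same environment logic could be pointed at a remote sandbox without changing the reward code.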

The demonstrations show that multi-component RL training can run on the Together AI Cloud, setting the stage for a flexible, open RL framework in the PyTorch ecosystem that scales on the same platform.


Source: https://blockchain.news/news/torchforge-rl-pipelines-operable-together-ai-cloud