Thinking Machines Lab has moved its Tinker training API to general availability and added three key capabilities: support for the Kimi K2 Thinking reasoning model, OpenAI compatible sampling, and image input via the Qwen3-VL vision language models. For AI engineers, this turns Tinker into a practical way to improve frontier models without building distributed training infrastructure.
What does Tinker actually do?
Tinker is a training API focused on fine tuning large language models that hides the heavy lifting of distributed training. You write a simple Python loop that runs on a CPU-only machine, defining the data or RL environment, the loss, and the training logic. The Tinker service maps that loop onto a cluster of GPUs and performs the exact computation you specify.
The API exposes a small set of primitives: forward_backward to compute gradients, optim_step to update the weights, sample to generate outputs, and calls to save and load training state. This keeps the training logic explicit for users who want to implement supervised learning, reinforcement learning, or preference optimization but do not want to manage GPU failures and scheduling.
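To make the shape of that loop concrete, here is a minimal sketch. The primitive names come from the paragraph above, but the client construction, AdamParams, and the argument names are assumptions rather than Tinker's documented signatures:

import tinker

# Sketch only: forward_backward and optim_step are the primitives named above;
# create_lora_training_client, AdamParams, and loss_fn are assumed names.
service_client = tinker.ServiceClient()
training_client = service_client.create_lora_training_client(
    base_model="Qwen/Qwen3-30B-A3B",  # any model from the Tinker lineup
)

training_batches = []  # batches of examples you prepare on the CPU side

for batch in training_batches:
    fwd_bwd = training_client.forward_backward(batch, loss_fn="cross_entropy")
    optim = training_client.optim_step(tinker.AdamParams(learning_rate=1e-4))
    fwd_bwd.result()  # futures resolve once the GPU cluster finishes the step
    optim.result()

Because the loop itself is ordinary Python, swapping the loss or the data pipeline is a local code change rather than an infrastructure change.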
Tinker uses Low-Rank Adaptation (LoRA) instead of full fine tuning for all supported models. LoRA trains small adapter matrices on top of frozen base weights, which reduces memory and makes it practical to run repeated experiments on large mixture-of-experts models in the same cluster.
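For readers less familiar with LoRA, here is an illustrative PyTorch layer (not Tinker code) showing the idea: the frozen base weight stays untouched and only two small low-rank matrices are trained.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Illustrative LoRA layer: y = base(x) + (alpha / r) * B(A(x))."""
    def __init__(self, base: nn.Linear, r: int = 16, alpha: int = 32):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # freeze the base weights
        self.lora_a = nn.Linear(base.in_features, r, bias=False)
        self.lora_b = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # adapter starts as a no-op
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

base = nn.Linear(4096, 4096)
lora_layer = LoRALinear(base, r=16)
trainable = sum(p.numel() for p in lora_layer.parameters() if p.requires_grad)
print(trainable)  # 131072 trainable adapter weights vs ~16.8M in the frozen base layer

The small trainable footprint is what makes it cheap to keep many concurrent fine tuning experiments on a single large base model.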
General availability and Kimi K2 Thinking
The major change in the December 2025 update is that Tinker no longer has a waiting list. Anyone can sign up, view the current model lineup and pricing, and run the cookbook examples directly.
On the model side, users can now fine tune moonshotai/Kimi-K2-Thinking on Tinker. Kimi K2 Thinking is a reasoning model with approximately 1 trillion total parameters in a mixture-of-experts architecture. It is designed for long chains of thought and heavy tool use, and is currently the largest model in the Tinker catalogue.
In the Tinker model lineup, Kimi K2 Thinking appears as a reasoning MoE model, alongside a mix of Qwen3 dense and MoE variants, Llama-3 generation models, and DeepSeek-V3.1. Reasoning models generate an internal chain of thought before the visible answer, while instruction models focus on latency and direct responses.
OpenAI compatible sampling during training
Tinker already had a native sampling interface: a SamplingClient builds prompts as ModelInput token ids, passes SamplingParams, and calls sample to get a future that resolves to the generated outputs.
The new release adds a second path that mirrors the OpenAI completions interface. A model checkpoint on Tinker can be referenced via a URI like:
response = openai_client.completions.create(
    model="tinker://0034d8c9-0a88-52a9-b2b7-bce7cb1e6fef:train:0/sampler_weights/000080",
    prompt="The capital of France is",
    max_tokens=20,
    temperature=0.0,
    stop=["\n"],
)
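For context, openai_client in the snippet above is just a standard OpenAI SDK client pointed at an inference endpoint that understands tinker:// model URIs. A hedged sketch of the setup, where the base URL and key handling are placeholders rather than Tinker's documented values:

from openai import OpenAI

openai_client = OpenAI(
    base_url="https://<your-tinker-inference-endpoint>/v1",  # placeholder, not a real endpoint
    api_key="<your-tinker-api-key>",
)

The practical benefit is that existing OpenAI style tooling and evaluation harnesses can sample from a checkpoint mid-training without a separate deployment step.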
Vision input with Qwen3-VL on Tinker
The second key capability is image input. Tinker now exposes two Qwen3-VL vision language models, Qwen/Qwen3-VL-30B-A3B-Instruct and Qwen/Qwen3-VL-235B-A22B-Instruct. They are listed as vision MoE models in the Tinker model lineup and are available for training and sampling through the same API surface.
To send an image to a model, you create a ModelInput that combines an ImageChunk with text chunks. The research blog uses the following minimal example:
model_input = tinker.ModelInput(chunks=[
    tinker.types.ImageChunk(data=image_data, format="png"),
    tinker.types.EncodedTextChunk(tokens=tokenizer.encode("What is this?")),
])
Here image_data is raw bytes and format identifies the encoding, for example png or jpeg. You can use the same representation for supervised learning and RL fine tuning, which keeps multimodal pipelines consistent at the API level. Vision inputs are fully supported in Tinker's LoRA training setup.
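As a small illustration of the "raw bytes" point, the image data can come straight from a file on disk. The file path and prompt below are made up, while the chunk types follow the example above:

# Hypothetical file path; image_data is just the raw encoded bytes of the file.
with open("pets/abyssinian_001.jpg", "rb") as f:
    image_data = f.read()

model_input = tinker.ModelInput(chunks=[
    tinker.types.ImageChunk(data=image_data, format="jpeg"),
    tinker.types.EncodedTextChunk(tokens=tokenizer.encode("What breed is this cat?")),
])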

Qwen3-VL vs DINOv2 on Image Classification
To show what the new vision path can do, the Tinker team fine-tuned Qwen3-VL-235B-A22B-Instruct as an image classifier. They used 4 standard datasets:
- Caltech-101
- Stanford Cars
- Oxford Flowers
- Oxford Pets
Because Qwen3-VL is a language model with visual input, classification is modeled as text generation. The model receives an image and generates the class name as a text sequence.
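Framing classification as generation means the predicted label has to be read back out of the generated text. A simple illustrative post-processing step (not from the blog) might look like this:

# Illustrative only: match the generated string against the dataset's label names.
def predict_label(generated_text: str, class_names: list[str]) -> str:
    text = generated_text.strip().lower()
    for name in class_names:
        if name.lower() in text:
            return name
    return "unknown"  # fall back when the output matches no label

print(predict_label("This looks like a Bombay cat.", ["Abyssinian", "Bombay", "Birman"]))
# -> "Bombay"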
As a baseline, they used the DINOv2 base model. DINOv2 is a self-supervised vision transformer that encodes images into embeddings and is often used as a backbone for vision tasks. For this experiment, a classification head is attached on top of DINOv2 to predict a distribution over the N labels in each dataset.
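A representative version of such a baseline, assuming the public DINOv2 checkpoint from torch.hub and a plain linear head (the blog's exact training recipe is not specified here):

import torch
import torch.nn as nn

# Load a DINOv2 backbone; dinov2_vitb14 is the public base-sized checkpoint.
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14")
num_classes = 101                        # e.g. Caltech-101
head = nn.Linear(backbone.embed_dim, num_classes)

images = torch.randn(4, 3, 224, 224)     # dummy batch of preprocessed images
with torch.no_grad():
    features = backbone(images)          # [4, embed_dim] image embeddings
logits = head(features)                  # scores over the N labels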
Both Qwen3-VL-235B-A22B-Instruct and the DINOv2 baseline are trained with LoRA adapters within Tinker. The focus is data efficiency: the experiment starts with only 1 sample per class and gradually increases the number of labeled examples per class, measuring classification accuracy for each setting.
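The few-shot protocol can be sketched as a simple per-class subsampling step (function and variable names here are illustrative, not from the blog):

import random
from collections import defaultdict

def subsample_per_class(examples, k, seed=0):
    """Keep at most k labeled (image, label) examples per class."""
    by_class = defaultdict(list)
    for image, label in examples:
        by_class[label].append((image, label))
    rng = random.Random(seed)
    subset = []
    for label, items in by_class.items():
        subset.extend(rng.sample(items, min(k, len(items))))
    return subset

# e.g. train_1shot = subsample_per_class(train_examples, k=1)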
Key takeaways
- Tinker is now generally available, so anyone can sign up and fine tune open weight LLMs through a Python training loop, while Tinker handles the distributed training backend.
- The platform supports Kimi K2 Thinking, Moonshot AI's 1 trillion parameter mixture-of-experts reasoning model, highlighting it as a fine-tunable reasoning model in the Tinker lineup.
- Tinker adds an OpenAI compatible inference interface, which lets you sample from training checkpoints referenced by tinker://… model URIs through standard OpenAI style clients and tooling.
- Vision input is enabled through the Qwen3-VL models, Qwen3-VL 30B and Qwen3-VL 235B, so developers can build multimodal training pipelines that combine ImageChunk inputs with text using the same LoRA based API.
- Thinking Machines shows that Qwen3-VL 235B, fine-tuned on Tinker, achieves stronger few-shot image classification performance than the DINOv2 base baseline on datasets such as Caltech-101, Stanford Cars, Oxford Flowers, and Oxford Pets, highlighting the data efficiency of large vision language models.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of Marktechpost, an Artificial Intelligence media platform known for its in-depth coverage of Machine Learning and Deep Learning news that is technically robust and easily understood by a wide audience. The platform boasts over 2 million monthly views, reflecting its popularity among readers.