LLMRouter is an open-source routing library from the U Lab at the University of Illinois Urbana-Champaign that treats model selection as a first-class systems problem. It sits between applications and a pool of LLMs and chooses a model for each query based on task complexity, quality goals, and cost, all exposed through a unified Python API and CLI. The project ships with over 16 routing models, a data-generation pipeline spanning 11 benchmarks, and a plugin system for custom routers.
Router Families and Supported Models
LLMRouter organizes routing algorithms into four families: Single-Round Routers, Multi-Round Routers, Personalized Routers, and Agentic Routers. Single-round routers include `knnrouter`, `svmrouter`, `mlprouter`, `mfrouter`, `elorouter`, `routerdc`, `automix`, `hybrid_llm`, `graphrouter`, and `causallm_router`, plus the baselines `smallest_llm` and `largest_llm`. These models implement strategies such as k-nearest neighbors, support vector machines, multilayer perceptrons, matrix factorization, Elo rating, dual contrastive learning, automatic model blending, and graph-based routing.
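The k-nearest-neighbor strategy behind a router like `knnrouter` can be illustrated with a minimal sketch. This is not LLMRouter's actual API; the embeddings, model names, and performance numbers below are invented for illustration. The idea: embed the incoming query, find the most similar training queries, and pick the candidate model with the best average observed performance among those neighbors.

```python
import numpy as np

def knn_route(query_emb, train_embs, train_records, k=3):
    """Pick a model by best average performance among the k nearest queries."""
    # Cosine similarity between the query and every training-query embedding.
    sims = train_embs @ query_emb / (
        np.linalg.norm(train_embs, axis=1) * np.linalg.norm(query_emb)
    )
    neighbors = np.argsort(-sims)[:k]
    # Average observed performance per candidate model over those neighbors.
    scores = {}
    for i in neighbors:
        for model, perf in train_records[i].items():
            scores.setdefault(model, []).append(perf)
    return max(scores, key=lambda m: np.mean(scores[m]))

# Toy data: 4 training queries, each with per-model performance observations.
train_embs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
train_records = [
    {"gpt-small": 0.9, "gpt-large": 0.95},
    {"gpt-small": 0.8, "gpt-large": 0.90},
    {"gpt-small": 0.3, "gpt-large": 0.90},
    {"gpt-small": 0.2, "gpt-large": 0.85},
]
print(knn_route(np.array([0.95, 0.05]), train_embs, train_records, k=2))
```

The same neighbor-lookup skeleton generalizes to the other similarity-based routers; what changes is the learned scoring function applied on top.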
Multi-round routing is exposed through `router_r1`, a pre-trained instance of Router-R1 integrated into LLMRouter. Router-R1 formulates multi-LLM routing and aggregation as a sequential decision process in which the router itself is an LLM that alternates between internal reasoning steps and external model calls. It is trained with reinforcement learning using a rule-based reward that balances format, outcome, and cost. In LLMRouter, `router_r1` is available as an optional installation target, tested against the pinned dependencies `vllm==0.6.3` and `torch==2.4.0`.
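The shape of such a rule-based reward, balancing format, outcome, and cost, can be sketched as follows. This is a simplified illustration: the gating behavior, weights, and exact shaping used by Router-R1 are not reproduced here.

```python
def rule_based_reward(format_ok, outcome_correct, cost_tokens,
                      cost_weight=1e-4):
    """Toy Router-R1-style reward: format gate, outcome bonus, cost penalty.

    The constants are illustrative, not the actual training values.
    """
    if not format_ok:
        # Malformed trajectories earn no credit regardless of the answer.
        return -1.0
    outcome = 1.0 if outcome_correct else 0.0
    # Penalize total tokens spent across routed calls.
    return outcome - cost_weight * cost_tokens

print(rule_based_reward(True, True, 1200))   # correct, but 1200 tokens spent
print(rule_based_reward(True, False, 300))   # well-formed but wrong
print(rule_based_reward(False, True, 100))   # format violation dominates
```

A reward of this shape pushes the policy toward answers that are both correct and cheap, which is the performance-cost tradeoff the article describes.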
Personalized routing is handled by `gmtrouter`, described as a graph-based personalized router with user-preference learning. GMTRouter represents multi-turn user-LLM interactions as a heterogeneous graph over users, queries, responses, and models. It runs a message-passing architecture on this graph to infer user-specific routing preferences from few-shot interaction data, and experiments show accuracy and AUC gains over non-personalized baselines.
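A minimal view of GMTRouter's core idea, inferring a user's model preference by pooling signal over a user/query/response/model graph, might look like the toy below. Plain mean-aggregation stands in for the learned message-passing network, and all users, models, and scores are invented.

```python
from collections import defaultdict

# Toy heterogeneous interaction graph: each record links a user, one of
# their queries, the model that answered, and the observed performance.
interactions = [
    ("alice", "q1", "model-a", 0.90),
    ("alice", "q2", "model-a", 0.80),
    ("alice", "q3", "model-b", 0.40),
    ("bob",   "q4", "model-b", 0.95),
    ("bob",   "q5", "model-a", 0.50),
]

def user_preferences(interactions):
    """One mean-aggregation step: pool performance into (user, model) edges."""
    pooled = defaultdict(list)
    for user, _query, model, perf in interactions:
        pooled[(user, model)].append(perf)
    return {key: sum(v) / len(v) for key, v in pooled.items()}

def route_for_user(user, prefs):
    """Route a new query from `user` to their highest-preference model."""
    candidates = {m: s for (u, m), s in prefs.items() if u == user}
    return max(candidates, key=candidates.get)

prefs = user_preferences(interactions)
print(route_for_user("alice", prefs))  # alice's history favors model-a
print(route_for_user("bob", prefs))    # bob's history favors model-b
```

The real system replaces the mean with learned message-passing over typed nodes and edges, which is what lets it generalize from few-shot histories rather than raw per-user averages.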
Agentic routers in LLMRouter extend routing to multi-step reasoning workflows. `knnmultiroundrouter` applies k-nearest-neighbor logic to multi-turn traces and is intended for complex tasks. `llmmultiroundrouter` exposes an LLM-based agentic router that performs multi-step routing without a training loop of its own. These agentic routers share the same configuration and data formats as the other router families and can be swapped in via a single CLI flag.
Data Generation Pipeline for Routing Datasets
LLMRouter comes with a complete data-generation pipeline that transforms standard benchmarks and LLM outputs into routing datasets. The pipeline supports 11 benchmarks: Natural QA, Trivia QA, MMLU, GPQA, MBPP, HumanEval, GSM8K, CommonsenseQA, MATH, OpenBookQA, and ARC Challenge. It runs in three distinct stages. First, `data_generation.py` extracts queries and ground-truth labels and writes train and test JSONL splits. Second, `generate_llm_embeddings.py` creates embeddings for the candidate LLMs from their metadata. Third, `api_calling_evaluation.py` calls the LLM APIs, evaluates the responses, and fuses the scores with the embeddings into routing records.
The pipeline outputs query files, an LLM-embeddings JSON file, a query-embeddings tensor, and routing-data JSONL files. Each routing entry contains fields such as `task_name`, `query`, `ground_truth`, `metric`, `model_name`, `response`, `performance`, `embedding_id`, and `token_num`. Configuration is handled entirely through YAML, so engineers can point the scripts at new datasets and candidate-model lists without modifying code.
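Based on the field list above, a single routing-data JSONL line might look like the following. The field names come from the article; every value is invented for illustration.

```python
import json

# Hypothetical routing record using the fields named above; values are made up.
record = {
    "task_name": "gsm8k",
    "query": "Natalia sold 48 clips in April and half as many in May...",
    "ground_truth": "72",
    "metric": "exact_match",
    "model_name": "llama-3-8b",
    "response": "The answer is 72.",
    "performance": 1.0,          # score from the evaluation stage
    "embedding_id": 17,          # index into the query-embeddings tensor
    "token_num": 128,            # tokens consumed by the model call
}
# JSONL stores one such JSON object per line.
line = json.dumps(record)
print(line)
```

A router trained on files of such records learns, per query, which candidate model delivers the best performance at what token cost.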
Chat Interface and Plugin System
For interactive use, `llmrouter chat` launches a Gradio-based chat frontend on top of any router and configuration. The server can bind to a custom host and port and expose a public sharing link. Query modes control how routing sees the conversation: `current_only` uses only the latest user message, `full_context` concatenates the dialogue history, and `retrieval` augments the query with the top-k most similar historical queries. The UI visualizes model choices in real time and is driven by the same router configuration used for batch inference.
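The three query modes amount to different ways of building the text the router scores. A rough sketch, not the library's implementation; the retrieval step here uses naive word overlap as a stand-in for embedding similarity:

```python
def build_routing_query(mode, history, latest, retrieve_top_k=None):
    """Assemble the text a router scores, under one of three context modes."""
    if mode == "current_only":
        return latest
    if mode == "full_context":
        # Concatenate the whole dialogue history plus the new message.
        return "\n".join(history + [latest])
    if mode == "retrieval":
        # Stand-in for embedding retrieval: rank history by shared words.
        latest_words = set(latest.lower().split())
        ranked = sorted(
            history,
            key=lambda h: len(latest_words & set(h.lower().split())),
            reverse=True,
        )
        return "\n".join(ranked[:retrieve_top_k] + [latest])
    raise ValueError(f"unknown mode: {mode}")

history = ["How do I sort a list in Python?", "What is a lambda?"]
print(build_routing_query("current_only", history, "Sort a dict by value?"))
print(build_routing_query("retrieval", history, "Sort a dict by value?", 1))
```

The tradeoff is latency and token cost versus context: `current_only` is cheapest, `full_context` gives the router the most signal, and `retrieval` sits in between.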
LLMRouter also offers a plugin system for custom routers. New routers live under the `custom_routers` directory, subclass `MetaRouter`, and implement `route_single` and `route_batch`. Configuration files under that directory define data paths, hyperparameters, and optional default API endpoints. Plugin discovery scans the project's `custom_routers` folder, a `~/.llmrouter/plugins` directory, and any additional paths listed in the `LLMROUTER_PLUGINS` environment variable. Example custom routers include `randomrouter`, which selects a model at random, and `thresholdrouter`, a trained router that estimates query difficulty.
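The class name `MetaRouter` and the `route_single`/`route_batch` methods come from the article; the base-class body, constructor, and method signatures below are assumptions for illustration, with a stand-in base class so the sketch is self-contained.

```python
class MetaRouter:
    """Stand-in for LLMRouter's MetaRouter base class (real signatures may differ)."""

    def route_single(self, query):
        raise NotImplementedError

    def route_batch(self, queries):
        # Default batch behavior: route each query independently.
        return [self.route_single(q) for q in queries]

class LengthRouter(MetaRouter):
    """Hypothetical custom router: long queries go to the larger model."""

    def __init__(self, small="gpt-small", large="gpt-large", cutoff=50):
        self.small, self.large, self.cutoff = small, large, cutoff

    def route_single(self, query):
        return self.large if len(query) > self.cutoff else self.small

router = LengthRouter()
print(router.route_single("What is 2 + 2?"))
print(router.route_batch(["hi", "x" * 80]))
```

In the real plugin system, a router like this would be dropped into `custom_routers` with a YAML file alongside it, and discovered through the search paths described above.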
Key Takeaways
- Routing as a first-class abstraction: LLMRouter is an open-source routing layer from UIUC that sits between applications and a heterogeneous LLM pool, centralizing model selection as a cost- and quality-aware prediction task rather than an ad-hoc script.
- Four router families covering more than 16 algorithms: The library standardizes over 16 routers into four families, Single-Round, Multi-Round, Personalized, and Agentic, including `knnrouter`, `graphrouter`, `routerdc`, `router_r1`, and `gmtrouter`, all exposed through a unified configuration and CLI.
- Multi-round RL routing through Router-R1: `router_r1` integrates the Router-R1 framework, in which an LLM router interleaves internal "think" steps with external "route" calls and is trained with a rule-based reward combining format, outcome, and cost to optimize the performance-cost tradeoff.
- Graph-based personalization with GMTRouter: `gmtrouter` models users, queries, responses, and LLMs as nodes in a heterogeneous graph and uses message passing to learn user-specific routing preferences from few-shot histories, achieving approximately a 21% accuracy gain and substantial AUC improvements over strong baselines.
- End-to-end pipeline and extensibility: LLMRouter provides a benchmark-driven data pipeline, a CLI for training and inference, a Gradio chat UI, centralized API-key handling, and a `MetaRouter` plugin system that lets teams register custom routers while reusing the same routing datasets and infrastructure.
Check out the GitHub repo for the code and technical details.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of Marktechpost, an Artificial Intelligence media platform known for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable to a wide audience. The platform boasts over 2 million monthly views, reflecting its popularity among readers.