# ThinkMesh

ThinkMesh is a Python library for running diverse reasoning paths in parallel, scoring them with internal confidence signals, reallocating compute to promising branches, and fusing outcomes with verifiers and reducers. It works with offline Hugging Face Transformers and vLLM/TGI, and with hosted APIs.

Note: ThinkMesh is still in its early development phase, and breaking changes can occur.

## Highlights

- Parallel reasoning with DeepConf‑style confidence gating and budget reallocation
- Offline‑first with Transformers; optional vLLM/TGI for server‑side batching
- Hosted adapters for OpenAI and Anthropic
- Async execution with dynamic micro‑batches
- Reducers (majority/judge) and pluggable verifiers (regex/numeric/custom)
- Caching, metrics, and JSON traces

## Install

```bash
git clone https://github.com/martianlantern/thinkmesh.git
cd thinkmesh
pip install -e ".[dev,transformers]"
```

## Quickstart: Offline DeepConf

```python
from thinkmesh import think, ThinkConfig, ModelSpec, StrategySpec

cfg = ThinkConfig(
    model=ModelSpec(
        backend="transformers",
        model_name="Qwen2.5-7B-Instruct",
        max_tokens=256,
        temperature=0.7,
        seed=42,
        extra={"device": "cuda:0"},
    ),
    strategy=StrategySpec(
        name="deepconf",
        parallel=8,
        max_steps=2,
        deepconf={"k": 5, "tau_low": -1.25, "tau_ent": 2.2, "realloc_top_p": 0.4},
    ),
    reducer={"name": "majority"},
    budgets={"wall_clock_s": 20, "tokens": 4000},
)

ans = think("Show that the product of any three consecutive integers is divisible by 3.", cfg)
print(ans.content, ans.confidence)
```

## Quickstart: OpenAI Self‑Consistency

```python
import os

os.environ["OPENAI_API_KEY"] = "sk-..."

from thinkmesh import think, ThinkConfig, ModelSpec, StrategySpec

cfg = ThinkConfig(
    model=ModelSpec(backend="openai", model_name="gpt-4o-mini", max_tokens=256, temperature=0.6),
    strategy=StrategySpec(name="self_consistency", parallel=6, max_steps=1),
    reducer={"name": "majority"},
    budgets={"wall_clock_s": 15, "tokens": 3000},
)

print(think("List three creative uses for a paperclip.", cfg).content)
```

## CLI

```bash
thinkmesh think -m Qwen2.5-7B-Instruct --backend transformers --strategy deepconf "What is 37*43?"
```

## Examples

### Debate Strategy (hosted)

```python
from thinkmesh import think, ThinkConfig, ModelSpec, StrategySpec

cfg = ThinkConfig(
    model=ModelSpec(backend="openai", model_name="gpt-4o-mini", max_tokens=256, temperature=0.7),
    strategy=StrategySpec(name="debate", parallel=4, max_steps=2, debate={"rounds": 2}),
    reducer={"name": "judge"},
    budgets={"wall_clock_s": 25, "tokens": 5000},
)

print(think("Argue whether every even integer > 2 is the sum of two primes.", cfg).content)
```

### vLLM Local Server

This example assumes an OpenAI‑compatible vLLM server is already listening at `http://localhost:8000/v1` (recent vLLM versions can start one with `vllm serve <model>`).

```python
from thinkmesh import think, ThinkConfig, ModelSpec, StrategySpec

cfg = ThinkConfig(
    model=ModelSpec(
        backend="vllm",
        model_name="Qwen2.5-7B-Instruct",
        max_tokens=256,
        temperature=0.7,
        extra={"base_url": "http://localhost:8000/v1", "api_key": "sk-"},
    ),
    strategy=StrategySpec(name="deepconf", parallel=8, max_steps=2, deepconf={"k": 5}),
    reducer={"name": "majority"},
    budgets={"wall_clock_s": 20, "tokens": 4000},
)

print(think("Give a constructive proof for the Pigeonhole Principle on a simple case.", cfg).content)
```

### Custom Verifier

```python
from thinkmesh import think, ThinkConfig, ModelSpec, StrategySpec

cfg = ThinkConfig(
    model=ModelSpec(backend="transformers", model_name="Qwen2.5-7B-Instruct", max_tokens=128),
    strategy=StrategySpec(name="self_consistency", parallel=5, max_steps=1),
    reducer={"name": "majority"},
    verifier={"type": "regex", "pattern": r"Final Answer\s*:\s*.+$"},
    budgets={"wall_clock_s": 10, "tokens": 1500},
)

print(think("Answer with 'Final Answer: ' for 19*21.", cfg).content)
```

### Tree of Thought (offline)

```python
from thinkmesh import think, ThinkConfig, ModelSpec, StrategySpec

cfg = ThinkConfig(
    model=ModelSpec(backend="transformers", model_name="Qwen2.5-7B-Instruct", max_tokens=192),
    strategy=StrategySpec(name="tree", parallel=6, max_steps=2, tree={"branches": 3, "depth": 2}),
    reducer={"name": "majority"},
    budgets={"wall_clock_s": 20, "tokens": 3500},
)

print(think("Sketch a plan to prove that sqrt(2) is irrational.", cfg).content)
```

## Traces, Metrics, Caching

Traces are emitted as JSON graphs inside the returned structure. Prometheus metrics and OpenTelemetry spans can be enabled via config extras. A local disk cache deduplicates repeated generations by hashing the adapter, model, prompt, and params.

## Extending

- Implement a new backend by providing a `Thinker.generate` method that returns token text and optional token logprobs (see the sketch after this list)
- Add a new strategy by wiring a function in `thinkmesh/strategies` and registering it by name
- Add reducers/verifiers under `thinkmesh/reduce`
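To make the first bullet concrete, here is a minimal sketch of the shape a backend could take. Only the requirement that `generate` return token text and optional token logprobs comes from the list above; the class name, `Generation` type, and exact signature below are hypothetical, so check the ThinkMesh source for the actual interface.

```python
# Hypothetical sketch only: EchoThinker, Generation, and this generate()
# signature are illustrative names, not ThinkMesh's real API.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Generation:
    text: str                               # decoded token text
    logprobs: Optional[List[float]] = None  # optional per-token logprobs


class EchoThinker:
    """Toy backend: 'generates' by echoing the tail of each prompt."""

    def generate(self, prompts: List[str], max_tokens: int = 64) -> List[Generation]:
        outs: List[Generation] = []
        for p in prompts:
            text = p[-max_tokens:]  # stand-in for a real decoding loop
            # Fake uniform logprobs; a real backend should surface the
            # engine's per-token scores so confidence gating has a signal.
            outs.append(Generation(text=text, logprobs=[0.0] * len(text.split())))
        return outs


if __name__ == "__main__":
    print(EchoThinker().generate(["What is 37*43?"], max_tokens=16)[0])
```

Per-token logprobs matter because confidence-gated strategies such as DeepConf score branches with them; since they are optional, backends that cannot provide them should still run, presumably leaving those strategies with less signal to gate on.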
## License

MIT

## References

```bibtex
@misc{deepconf2025,
  title        = {DeepConf: Deep Think with Confidence},
  year         = {2025},
  howpublished = {\url{https://jiaweizzhao.github.io/deepconf/}}
}

@misc{wang2022selfconsistency,
  title         = {Self-Consistency Improves Chain-of-Thought Reasoning in Language Models},
  author        = {Wang, Xuezhi and Wei, Jason and others},
  year          = {2022},
  eprint        = {2203.11171},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL}
}

@misc{yao2023tree,
  title         = {Tree of Thoughts: Deliberate Problem Solving with Large Language Models},
  author        = {Yao, Shunyu and others},
  year          = {2023},
  eprint        = {2305.10601},
  archivePrefix = {arXiv},
  primaryClass  = {cs.AI}
}
```

## Citation

If you use this library in your work, please cite:
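A canonical entry is not yet provided; until one lands in the repository, a minimal placeholder inferred from the repository metadata (name and GitHub handle from the clone URL) could be:

```bibtex
@misc{thinkmesh,
  title        = {ThinkMesh},
  author       = {martianlantern},
  howpublished = {\url{https://github.com/martianlantern/thinkmesh}}
}
```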