Day 0 Support: Claude Opus 4.5 (+ Advanced Features)
This guide covers Anthropic's latest model, Claude Opus 4.5, and the advanced features now available in LiteLLM: Tool Search, Programmatic Tool Calling, Tool Input Examples, and the Effort parameter.
| Feature | Supported Models |
|---|---|
| Tool Search | Claude Opus 4.5, Sonnet 4.5 |
| Programmatic Tool Calling | Claude Opus 4.5, Sonnet 4.5 |
| Input Examples | Claude Opus 4.5, Sonnet 4.5 |
| Effort Parameter | Claude Opus 4.5 only |
Supported Providers: Anthropic, Bedrock, Vertex AI.
Usage
- LiteLLM Python SDK
- LiteLLM Proxy
import os
from litellm import completion
# set env - replace with your Anthropic API key
os.environ["ANTHROPIC_API_KEY"] = "your-api-key"
messages = [{"role": "user", "content": "Hey! how's it going?"}]
## OPENAI /chat/completions API format
response = completion(model="claude-opus-4-5-20251101", messages=messages)
print(response)
1. Setup config.yaml
model_list:
- model_name: claude-4 ### RECEIVED MODEL NAME ###
litellm_params: # all params accepted by litellm.completion() - https://docs.litellm.ai/docs/completion/input
model: claude-opus-4-5-20251101 ### MODEL NAME sent to `litellm.completion()` ###
api_key: "os.environ/ANTHROPIC_API_KEY" # does os.getenv("ANTHROPIC_API_KEY")
2. Start the proxy
litellm --config /path/to/config.yaml
3. Test it!
- OpenAI Chat Completions
- Anthropic /v1/messages API
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $LITELLM_KEY" \
--data ' {
"model": "claude-4",
"messages": [
{
"role": "user",
"content": "what llm are you"
}
]
}
'
curl --location 'http://0.0.0.0:4000/v1/messages' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $LITELLM_KEY" \
--data ' {
"model": "claude-4",
"max_tokens": 1024,
"messages": [
{
"role": "user",
"content": "what llm are you"
}
]
}
'
Usage - Bedrock
LiteLLM uses the boto3 library to authenticate with Bedrock.
For additional authentication methods, see the Bedrock documentation.
- LiteLLM Python SDK
- LiteLLM Proxy
import os
from litellm import completion
os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""
## OPENAI /chat/completions API format
response = completion(
model="bedrock/us.anthropic.claude-opus-4-5-20251101-v1:0",
messages=[{ "content": "Hello, how are you?","role": "user"}]
)
1. Setup config.yaml
model_list:
- model_name: claude-4 ### RECEIVED MODEL NAME ###
litellm_params: # all params accepted by litellm.completion() - https://docs.litellm.ai/docs/completion/input
model: bedrock/us.anthropic.claude-opus-4-5-20251101-v1:0 ### MODEL NAME sent to `litellm.completion()` ###
aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
aws_region_name: os.environ/AWS_REGION_NAME
2. Start the proxy
litellm --config /path/to/config.yaml
3. Test it!
- OpenAI Chat Completions
- Anthropic /v1/messages API
- Bedrock /invoke API
- Bedrock /converse API
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $LITELLM_KEY" \
--data ' {
"model": "claude-4",
"messages": [
{
"role": "user",
"content": "what llm are you"
}
]
}
'
curl --location 'http://0.0.0.0:4000/v1/messages' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $LITELLM_KEY" \
--data ' {
"model": "claude-4",
"max_tokens": 1024,
"messages": [
{
"role": "user",
"content": "what llm are you"
}
]
}
'
curl --location 'http://0.0.0.0:4000/bedrock/model/claude-4/invoke' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $LITELLM_KEY" \
--data ' {
"max_tokens": 1024,
"messages": [{"role": "user", "content": "Hello, how are you?"}]
}'
curl --location 'http://0.0.0.0:4000/bedrock/model/claude-4/converse' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $LITELLM_KEY" \
--data ' {
"messages": [{"role": "user", "content": "Hello, how are you?"}]
}'
Usage - Vertex AI
- LiteLLM Python SDK
- LiteLLM Proxy
from litellm import completion
import json
## GET CREDENTIALS
## Option 1: run `gcloud auth application-default login` to add vertex credentials to your env
## Option 2: load credentials from a service account JSON file (below)
file_path = 'path/to/vertex_ai_service_account.json'
# Load the JSON file
with open(file_path, 'r') as file:
vertex_credentials = json.load(file)
# Convert to JSON string
vertex_credentials_json = json.dumps(vertex_credentials)
## COMPLETION CALL
response = completion(
model="vertex_ai/claude-opus-4-5@20251101",
messages=[{ "content": "Hello, how are you?","role": "user"}],
vertex_credentials=vertex_credentials_json,
vertex_project="your-project-id",
vertex_location="us-east5"
)
1. Setup config.yaml
model_list:
- model_name: claude-4 ### RECEIVED MODEL NAME ###
litellm_params:
model: vertex_ai/claude-opus-4-5@20251101
vertex_credentials: "/path/to/service_account.json"
vertex_project: "your-project-id"
vertex_location: "us-east5"
2. Start the proxy
litellm --config /path/to/config.yaml
3. Test it!
- OpenAI Chat Completions
- Anthropic /v1/messages API
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $LITELLM_KEY" \
--data ' {
"model": "claude-4",
"messages": [
{
"role": "user",
"content": "what llm are you"
}
]
}
'
curl --location 'http://0.0.0.0:4000/v1/messages' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $LITELLM_KEY" \
--data ' {
"model": "claude-4",
"max_tokens": 1024,
"messages": [
{
"role": "user",
"content": "what llm are you"
}
]
}
'
Tool Search
Tool search lets Claude work with thousands of tools by discovering and loading them on demand, instead of loading every tool definition into the context window upfront.
Usage Example
- LiteLLM Python SDK
- LiteLLM Proxy
import litellm
import os
# Configure your API key
os.environ["ANTHROPIC_API_KEY"] = "your-api-key"
# Define your tools with defer_loading
tools = [
# Tool search tool (regex variant)
{
"type": "tool_search_tool_regex_20251119",
"name": "tool_search_tool_regex"
},
# Deferred tools - loaded on-demand
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather in a given location. Returns temperature and conditions.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA"
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "Temperature unit"
}
},
"required": ["location"]
}
},
"defer_loading": True # Load on-demand
},
{
"type": "function",
"function": {
"name": "search_files",
"description": "Search through files in the workspace using keywords",
"parameters": {
"type": "object",
"properties": {
"query": {"type": "string"},
"file_types": {
"type": "array",
"items": {"type": "string"}
}
},
"required": ["query"]
}
},
"defer_loading": True
},
{
"type": "function",
"function": {
"name": "query_database",
"description": "Execute SQL queries against the database",
"parameters": {
"type": "object",
"properties": {
"sql": {"type": "string"}
},
"required": ["sql"]
}
},
"defer_loading": True
}
]
# Make a request - Claude will search for and use relevant tools
response = litellm.completion(
model="anthropic/claude-opus-4-5-20251101",
messages=[{
"role": "user",
"content": "What's the weather like in San Francisco?"
}],
tools=tools
)
print("Claude's response:", response.choices[0].message.content)
print("Tool calls:", response.choices[0].message.tool_calls)
# Check tool search usage
if hasattr(response.usage, 'server_tool_use'):
print(f"Tool searches performed: {response.usage.server_tool_use.tool_search_requests}")
- Setup config.yaml
model_list:
- model_name: claude-4
litellm_params:
model: anthropic/claude-opus-4-5-20251101
api_key: os.environ/ANTHROPIC_API_KEY
- Start the proxy
litellm --config /path/to/config.yaml
- Test it!
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $LITELLM_KEY" \
--data ' {
"model": "claude-4",
"messages": [{
"role": "user",
"content": "What is the weather like in San Francisco?"
}],
"tools": [
{
"type": "tool_search_tool_regex_20251119",
"name": "tool_search_tool_regex"
},
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather in a given location. Returns temperature and conditions.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA"
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "Temperature unit"
}
},
"required": ["location"]
}
},
"defer_loading": true
},
{
"type": "function",
"function": {
"name": "search_files",
"description": "Search through files in the workspace using keywords",
"parameters": {
"type": "object",
"properties": {
"query": {"type": "string"},
"file_types": {
"type": "array",
"items": {"type": "string"}
}
},
"required": ["query"]
}
},
"defer_loading": true
},
{
"type": "function",
"function": {
"name": "query_database",
"description": "Execute SQL queries against the database",
"parameters": {
"type": "object",
"properties": {
"sql": {"type": "string"}
},
"required": ["sql"]
}
},
"defer_loading": true
}
]
}
'
BM25 Variant (Natural Language Search)
For natural language queries instead of regex patterns:
tools = [
{
"type": "tool_search_tool_bm25_20251119", # Natural language variant
"name": "tool_search_tool_bm25"
},
# ... your deferred tools
]
Programmatic Tool Calling
Programmatic tool calling lets Claude write code that invokes your tools from a code execution environment, cutting round trips on multi-step workflows. Learn more
- LiteLLM Python SDK
- LiteLLM Proxy
import litellm
import json
# Define tools that can be called programmatically
tools = [
# Code execution tool (required for programmatic calling)
{
"type": "code_execution_20250825",
"name": "code_execution"
},
# Tool that can be called from code
{
"type": "function",
"function": {
"name": "query_database",
"description": "Execute a SQL query against the sales database. Returns a list of rows as JSON objects.",
"parameters": {
"type": "object",
"properties": {
"sql": {
"type": "string",
"description": "SQL query to execute"
}
},
"required": ["sql"]
}
},
"allowed_callers": ["code_execution_20250825"] # Enable programmatic calling
}
]
# First request
response = litellm.completion(
model="anthropic/claude-sonnet-4-5-20250929",
messages=[{
"role": "user",
"content": "Query sales data for West, East, and Central regions, then tell me which had the highest revenue"
}],
tools=tools
)
print("Claude's response:", response.choices[0].message)
# Handle tool calls
messages = [
{"role": "user", "content": "Query sales data for West, East, and Central regions, then tell me which had the highest revenue"},
{"role": "assistant", "content": response.choices[0].message.content, "tool_calls": response.choices[0].message.tool_calls}
]
# Process each tool call
for tool_call in response.choices[0].message.tool_calls:
# Check if it's a programmatic call
if hasattr(tool_call, 'caller') and tool_call.caller:
print(f"Programmatic call to {tool_call.function.name}")
print(f"Called from: {tool_call.caller}")
# Simulate tool execution
if tool_call.function.name == "query_database":
args = json.loads(tool_call.function.arguments)
# Simulate database query
result = json.dumps([
{"region": "West", "revenue": 150000},
{"region": "East", "revenue": 180000},
{"region": "Central", "revenue": 120000}
])
        # Return the result in OpenAI tool-message format
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": result
        })
# Get final response
final_response = litellm.completion(
model="anthropic/claude-sonnet-4-5-20250929",
messages=messages,
tools=tools
)
print("\nFinal answer:", final_response.choices[0].message.content)
- Setup config.yaml
model_list:
- model_name: claude-4
litellm_params:
model: anthropic/claude-opus-4-5-20251101
api_key: os.environ/ANTHROPIC_API_KEY
- Start the proxy
litellm --config /path/to/config.yaml
- Test it!
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $LITELLM_KEY" \
--data ' {
"model": "claude-4",
"messages": [{
"role": "user",
"content": "Query sales data for West, East, and Central regions, then tell me which had the highest revenue"
}],
"tools": [
{
"type": "code_execution_20250825",
"name": "code_execution"
},
{
"type": "function",
"function": {
"name": "query_database",
"description": "Execute a SQL query against the sales database. Returns a list of rows as JSON objects.",
"parameters": {
"type": "object",
"properties": {
"sql": {
"type": "string",
"description": "SQL query to execute"
}
},
"required": ["sql"]
}
},
"allowed_callers": ["code_execution_20250825"]
}
]
}
'
Tool Input Examples
Input examples let you provide Claude with concrete examples of valid tool inputs, improving accuracy on complex schemas. Learn more
- LiteLLM Python SDK
- LiteLLM Proxy
import litellm
tools = [
{
"type": "function",
"function": {
"name": "create_calendar_event",
"description": "Create a new calendar event with attendees and reminders",
"parameters": {
"type": "object",
"properties": {
"title": {"type": "string"},
"start_time": {
"type": "string",
"description": "ISO 8601 format: YYYY-MM-DDTHH:MM:SS"
},
"duration_minutes": {"type": "integer"},
"attendees": {
"type": "array",
"items": {
"type": "object",
"properties": {
"email": {"type": "string"},
"optional": {"type": "boolean"}
}
}
},
"reminders": {
"type": "array",
"items": {
"type": "object",
"properties": {
"minutes_before": {"type": "integer"},
"method": {"type": "string", "enum": ["email", "popup"]}
}
}
}
},
"required": ["title", "start_time", "duration_minutes"]
}
},
# Provide concrete examples
"input_examples": [
{
"title": "Team Standup",
"start_time": "2025-01-15T09:00:00",
"duration_minutes": 30,
"attendees": [
{"email": "alice@company.com", "optional": False},
{"email": "bob@company.com", "optional": False}
],
"reminders": [
{"minutes_before": 15, "method": "popup"}
]
},
{
"title": "Lunch Break",
"start_time": "2025-01-15T12:00:00",
"duration_minutes": 60
# Demonstrates optional fields can be omitted
}
]
}
]
response = litellm.completion(
model="anthropic/claude-sonnet-4-5-20250929",
messages=[{
"role": "user",
"content": "Schedule a team meeting for tomorrow at 2pm for 45 minutes with john@company.com and sarah@company.com"
}],
tools=tools
)
print("Tool call:", response.choices[0].message.tool_calls[0].function.arguments)
- Setup config.yaml
model_list:
- model_name: claude-4
litellm_params:
model: anthropic/claude-opus-4-5-20251101
api_key: os.environ/ANTHROPIC_API_KEY
- Start the proxy
litellm --config /path/to/config.yaml
- Test it!
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $LITELLM_KEY" \
--data ' {
"model": "claude-4",
"messages": [{
"role": "user",
"content": "Schedule a team meeting for tomorrow at 2pm for 45 minutes with john@company.com and sarah@company.com"
}],
"tools": [
{
"type": "function",
"function": {
"name": "create_calendar_event",
"description": "Create a new calendar event with attendees and reminders",
"parameters": {
"type": "object",
"properties": {
"title": {"type": "string"},
"start_time": {
"type": "string",
"description": "ISO 8601 format: YYYY-MM-DDTHH:MM:SS"
},
"duration_minutes": {"type": "integer"},
"attendees": {
"type": "array",
"items": {
"type": "object",
"properties": {
"email": {"type": "string"},
"optional": {"type": "boolean"}
}
}
},
"reminders": {
"type": "array",
"items": {
"type": "object",
"properties": {
"minutes_before": {"type": "integer"},
"method": {"type": "string", "enum": ["email", "popup"]}
}
}
}
},
"required": ["title", "start_time", "duration_minutes"]
}
},
"input_examples": [
{
"title": "Team Standup",
"start_time": "2025-01-15T09:00:00",
"duration_minutes": 30,
"attendees": [
{"email": "alice@company.com", "optional": false},
{"email": "bob@company.com", "optional": false}
],
"reminders": [
{"minutes_before": 15, "method": "popup"}
]
},
{
"title": "Lunch Break",
"start_time": "2025-01-15T12:00:00",
"duration_minutes": 60
}
]
}
]
}
'
Effort Parameter: Control Token Usage
Controls how much effort the model puts into its response, via output_config={"effort": ..}.
Soon, we will map OpenAI's reasoning_effort parameter to this.
Supported values for effort: "high" (default), "medium", "low".
Usage Example
- LiteLLM Python SDK
- LiteLLM Proxy
import litellm
message = "Analyze the trade-offs between microservices and monolithic architectures"
# High effort (default) - Maximum capability
response_high = litellm.completion(
model="anthropic/claude-opus-4-5-20251101",
messages=[{"role": "user", "content": message}],
output_config={"effort": "high"}
)
print("High effort response:")
print(response_high.choices[0].message.content)
print(f"Tokens used: {response_high.usage.completion_tokens}\n")
# Medium effort - Balanced approach
response_medium = litellm.completion(
model="anthropic/claude-opus-4-5-20251101",
messages=[{"role": "user", "content": message}],
output_config={"effort": "medium"}
)
print("Medium effort response:")
print(response_medium.choices[0].message.content)
print(f"Tokens used: {response_medium.usage.completion_tokens}\n")
# Low effort - Maximum efficiency
response_low = litellm.completion(
model="anthropic/claude-opus-4-5-20251101",
messages=[{"role": "user", "content": message}],
output_config={"effort": "low"}
)
print("Low effort response:")
print(response_low.choices[0].message.content)
print(f"Tokens used: {response_low.usage.completion_tokens}\n")
# Compare token usage
print("Token Comparison:")
print(f"High: {response_high.usage.completion_tokens} tokens")
print(f"Medium: {response_medium.usage.completion_tokens} tokens")
print(f"Low: {response_low.usage.completion_tokens} tokens")
- Setup config.yaml
model_list:
- model_name: claude-4
litellm_params:
model: anthropic/claude-opus-4-5-20251101
api_key: os.environ/ANTHROPIC_API_KEY
- Start the proxy
litellm --config /path/to/config.yaml
- Test it!
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $LITELLM_KEY" \
--data ' {
"model": "claude-4",
"messages": [{
"role": "user",
"content": "Analyze the trade-offs between microservices and monolithic architectures"
}],
"output_config": {
"effort": "high"
}
}
'
Cost Tracking: Monitor Tool Search Usage
Understanding Tool Search Costs
Tool search operations are tracked separately in the usage object, under server_tool_use.tool_search_requests, so you can monitor and optimize costs.
Anthropic charges $0.0001 per tool search request.
Tracking Example
- LiteLLM Python SDK
- LiteLLM Proxy
import litellm
tools = [
{
"type": "tool_search_tool_regex_20251119",
"name": "tool_search_tool_regex"
},
# ... 100 deferred tools
]
response = litellm.completion(
model="anthropic/claude-sonnet-4-5-20250929",
messages=[{
"role": "user",
"content": "Find and use the weather tool for San Francisco"
}],
tools=tools
)
# Standard token usage
print("Token Usage:")
print(f" Input tokens: {response.usage.prompt_tokens}")
print(f" Output tokens: {response.usage.completion_tokens}")
print(f" Total tokens: {response.usage.total_tokens}")
# Tool search specific usage
if hasattr(response.usage, 'server_tool_use') and response.usage.server_tool_use:
print(f"\nTool Search Usage:")
print(f" Search requests: {response.usage.server_tool_use.tool_search_requests}")
# Calculate cost (example pricing)
input_cost = response.usage.prompt_tokens * 0.000003 # $3 per 1M tokens
output_cost = response.usage.completion_tokens * 0.000015 # $15 per 1M tokens
search_cost = response.usage.server_tool_use.tool_search_requests * 0.0001 # Example
total_cost = input_cost + output_cost + search_cost
print(f"\nCost Breakdown:")
print(f" Input tokens: ${input_cost:.6f}")
print(f" Output tokens: ${output_cost:.6f}")
print(f" Tool searches: ${search_cost:.6f}")
print(f" Total: ${total_cost:.6f}")
- Setup config.yaml
model_list:
- model_name: claude-4
litellm_params:
model: anthropic/claude-opus-4-5-20251101
api_key: os.environ/ANTHROPIC_API_KEY
- Start the proxy
litellm --config /path/to/config.yaml
- Test it!
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $LITELLM_KEY" \
--data ' {
"model": "claude-4",
"messages": [{
"role": "user",
"content": "Find and use the weather tool for San Francisco"
}],
"tools": [
{
"type": "tool_search_tool_regex_20251119",
"name": "tool_search_tool_regex"
}
]
}
'
Expected Response:
{
...,
"usage": {
...,
"server_tool_use": {
"tool_search_requests": 1
}
}
}
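The server_tool_use block in a proxy response can be turned into a dollar figure with a small helper. This is a hypothetical sketch, not part of LiteLLM, applying the $0.0001-per-search rate quoted above:

```python
def tool_search_cost(usage: dict, price_per_search: float = 0.0001) -> float:
    """Compute tool-search cost from a /chat/completions usage payload."""
    # server_tool_use may be absent or null when no searches were performed
    server_tool_use = usage.get("server_tool_use") or {}
    return server_tool_use.get("tool_search_requests", 0) * price_per_search

usage = {"prompt_tokens": 500, "completion_tokens": 120,
         "server_tool_use": {"tool_search_requests": 2}}
print(f"${tool_search_cost(usage):.4f}")  # → $0.0002
```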
Cost Optimization Tips
- Keep frequently used tools non-deferred (3-5 tools)
- Use tool search for large catalogs (10+ tools)
- Monitor search requests to identify optimization opportunities
- Combine with effort parameter for maximum efficiency
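The first two tips can be combined into a small catalog-splitting step. split_tool_catalog below is a hypothetical helper, not part of LiteLLM: it keeps the first few function tools loaded upfront and marks the rest with defer_loading so the tool search tool can discover them on demand.

```python
def split_tool_catalog(tools: list[dict], core_count: int = 4) -> list[dict]:
    """Keep the first `core_count` function tools non-deferred; defer the rest.

    Server tools (e.g. the tool search tool itself) pass through untouched.
    """
    prepared = []
    seen_functions = 0
    for tool in tools:
        tool = dict(tool)  # shallow copy; don't mutate the caller's definitions
        if tool.get("type") == "function":
            if seen_functions >= core_count:
                tool["defer_loading"] = True
            seen_functions += 1
        prepared.append(tool)
    return prepared

search_tool = {"type": "tool_search_tool_regex_20251119",
               "name": "tool_search_tool_regex"}
catalog = [search_tool] + [
    {"type": "function", "function": {"name": f"tool_{i}"}} for i in range(10)
]
prepared = split_tool_catalog(catalog, core_count=3)
print(sum(1 for t in prepared if t.get("defer_loading")))  # → 7
```

The prepared list can then be passed as `tools=` to litellm.completion alongside the tool search tool, as in the examples above.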
Combining Features
The Power of Integration
These features work together seamlessly. Here's a real-world example combining all of them:
- LiteLLM Python SDK
- LiteLLM Proxy
import litellm
import json
# Large tool catalog with search, programmatic calling, and examples
tools = [
# Enable tool search
{
"type": "tool_search_tool_regex_20251119",
"name": "tool_search_tool_regex"
},
# Enable programmatic calling
{
"type": "code_execution_20250825",
"name": "code_execution"
},
# Database tool with all features
{
"type": "function",
"function": {
"name": "query_database",
"description": "Execute SQL queries against the analytics database. Returns JSON array of results.",
"parameters": {
"type": "object",
"properties": {
"sql": {
"type": "string",
"description": "SQL SELECT statement"
},
"limit": {
"type": "integer",
"description": "Maximum rows to return"
}
},
"required": ["sql"]
}
},
"defer_loading": True, # Tool search
"allowed_callers": ["code_execution_20250825"], # Programmatic calling
"input_examples": [ # Input examples
{
"sql": "SELECT region, SUM(revenue) as total FROM sales GROUP BY region",
"limit": 100
}
]
},
# ... 50 more tools with defer_loading
]
# Make request with effort control
response = litellm.completion(
model="anthropic/claude-opus-4-5-20251101",
messages=[{
"role": "user",
"content": "Analyze sales by region for the last quarter and identify top performers"
}],
tools=tools,
output_config={"effort": "medium"} # Balanced efficiency
)
# Track comprehensive usage
print("Complete Usage Metrics:")
print(f" Input tokens: {response.usage.prompt_tokens}")
print(f" Output tokens: {response.usage.completion_tokens}")
print(f" Total tokens: {response.usage.total_tokens}")
if hasattr(response.usage, 'server_tool_use') and response.usage.server_tool_use:
print(f" Tool searches: {response.usage.server_tool_use.tool_search_requests}")
print(f"\nResponse: {response.choices[0].message.content}")
- Setup config.yaml
model_list:
- model_name: claude-4
litellm_params:
model: anthropic/claude-opus-4-5-20251101
api_key: os.environ/ANTHROPIC_API_KEY
- Start the proxy
litellm --config /path/to/config.yaml
- Test it!
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $LITELLM_KEY" \
--data ' {
"model": "claude-4",
"messages": [{
"role": "user",
"content": "Analyze sales by region for the last quarter and identify top performers"
}],
"tools": [
{
"type": "tool_search_tool_regex_20251119",
"name": "tool_search_tool_regex"
}
],
"output_config": {
"effort": "medium"
}
}
'
Expected Response:
{
...,
"usage": {
...,
"server_tool_use": {
"tool_search_requests": 1
}
}
}
Real-World Benefits
This combination enables:
- Massive scale - Handle 1000+ tools efficiently
- Low latency - Programmatic calling reduces round trips
- High accuracy - Input examples ensure correct tool usage
- Cost control - Effort parameter optimizes token spend
- Full visibility - Track all usage metrics

