One API for LLM completions, knowledge retrieval, document processing, and notifications. Multi-client billing, model switching, and MCP-compatible discovery — built for agents.
Requests flow from any client through auth and metering, to the service layer, and out to LLM inference. Every call is tracked, costed, and auditable.
Each endpoint is also discoverable as an MCP tool via /mcp/manifest.
Send a real request and see the response with full usage metering.
// Example curl:
curl -X POST https://api.imbila.ai/v1/completions \
  -H "Authorization: Bearer sk-imbila-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
Real-time view of API activity, cost, and performance across all clients.
| Time | Client | Endpoint | Model | Tokens | Cost | Latency | Status |
|---|---|---|---|---|---|---|---|

Enter your API key and click refresh to load live data, or send a request from the playground.
Request by model ID or by capability. The gateway picks the cheapest model matching your capability requirement.
| Model | Provider | Cost/1K In (ZAR) | Cost/1K Out (ZAR) | Max Tokens | Capabilities |
|---|---|---|---|---|---|
| llama-3.1-8b | Meta | R0.04 | R0.06 | 2,048 | chat, code, summarise, fast, cheap |
| llama-3.3-70b | Meta | R0.16 | R0.22 | 4,096 | chat, code, summarise, smart |
| mistral-7b | Mistral | R0.04 | R0.06 | 2,048 | chat, code, fast, cheap |
| gemma-7b | Google | R0.04 | R0.06 | 2,048 | chat, summarise, fast, cheap |
| qwen-1.5-14b | Alibaba | R0.08 | R0.11 | 2,048 | chat, code, smart |
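Capability-based routing can be sketched as a filter-then-minimise over the catalogue above. This is a hypothetical illustration of the "cheapest model matching your capability" rule, not the gateway's actual implementation; the model list mirrors the table, using the per-1K input cost as the tiebreaker.

```javascript
// Catalogue from the pricing table above (cost = ZAR per 1K input tokens).
const models = [
  { id: "llama-3.1-8b",  costInPer1K: 0.04, capabilities: ["chat", "code", "summarise", "fast", "cheap"] },
  { id: "llama-3.3-70b", costInPer1K: 0.16, capabilities: ["chat", "code", "summarise", "smart"] },
  { id: "mistral-7b",    costInPer1K: 0.04, capabilities: ["chat", "code", "fast", "cheap"] },
  { id: "gemma-7b",      costInPer1K: 0.04, capabilities: ["chat", "summarise", "fast", "cheap"] },
  { id: "qwen-1.5-14b",  costInPer1K: 0.08, capabilities: ["chat", "code", "smart"] },
];

// Pick the cheapest model that advertises the requested capability.
function pickCheapest(capability) {
  const candidates = models.filter((m) => m.capabilities.includes(capability));
  if (candidates.length === 0) throw new Error(`no model offers "${capability}"`);
  return candidates.reduce((a, b) => (b.costInPer1K < a.costInPer1K ? b : a));
}

console.log(pickCheapest("summarise").id); // llama-3.1-8b (R0.04, cheapest match)
```

A real router would likely also weigh output cost, context length, and availability; the sketch keeps only the single rule the text states.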
Any MCP-compatible agent can discover these services automatically.
// MCP manifest endpoint
GET https://api.imbila.ai/mcp/manifest
// Returns tool definitions that any MCP client can consume:
{
"name": "imbila-api",
"tools": [
{ "name": "completions", "description": "LLM completion with model switching..." },
{ "name": "knowledge_query", "description": "Search knowledge base..." },
{ "name": "summarise", "description": "Summarise text or URL..." },
{ "name": "notify", "description": "Send email notification..." },
{ "name": "usage_report", "description": "Query usage statistics..." }
],
"auth": { "type": "bearer" }
}
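To show how an agent might consume that manifest, here is a minimal sketch that indexes the tool definitions by name. The manifest object is copied inline from the example above; how a given MCP client actually stores discovered tools is an assumption.

```javascript
// Manifest shape as returned by GET /mcp/manifest (copied from the example).
const manifest = {
  name: "imbila-api",
  tools: [
    { name: "completions", description: "LLM completion with model switching..." },
    { name: "knowledge_query", description: "Search knowledge base..." },
    { name: "summarise", description: "Summarise text or URL..." },
    { name: "notify", description: "Send email notification..." },
    { name: "usage_report", description: "Query usage statistics..." },
  ],
  auth: { type: "bearer" },
};

// Build a name → tool lookup table for dispatching calls.
const toolIndex = Object.fromEntries(manifest.tools.map((t) => [t.name, t]));

console.log(Object.keys(toolIndex));
// ["completions", "knowledge_query", "summarise", "notify", "usage_report"]
```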
// A2A Agent Card (for agent-to-agent discovery)
{
"name": "Imbila API Gateway",
"url": "https://api.imbila.ai",
"capabilities": ["completions", "knowledge", "summarise", "notify"],
"protocols": ["MCP", "A2A", "REST"]
}// Example: Agent calls completions via MCP
const result = await mcpClient.callTool("completions", {
messages: [{ role: "user", content: "Explain MCP in one paragraph" }],
capability: "smart"
});
// The gateway:
// 1. Authenticates via API key
// 2. Resolves "smart" → llama-3.3-70b
// 3. Runs inference on Workers AI
// 4. Logs: client_id, model, tokens, cost, latency
// 5. Returns response with usage metadata
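Step 4's cost accounting can be sketched from the pricing table: cost is input and output tokens each multiplied by their per-1K ZAR rate. The `meter` function and its field names are hypothetical; the source does not specify how the gateway computes or rounds the logged figure.

```javascript
// Hypothetical metering sketch: cost = (tokens / 1000) × per-1K rate,
// summed over input and output, rounded to 4 decimal places for the log.
function meter({ inputTokens, outputTokens, inRatePer1K, outRatePer1K }) {
  const cost =
    (inputTokens / 1000) * inRatePer1K +
    (outputTokens / 1000) * outRatePer1K;
  return { inputTokens, outputTokens, costZar: Number(cost.toFixed(4)) };
}

// llama-3.3-70b rates from the pricing table: R0.16 in, R0.22 out per 1K tokens.
const entry = meter({
  inputTokens: 500,
  outputTokens: 1000,
  inRatePer1K: 0.16,
  outRatePer1K: 0.22,
});

console.log(entry.costZar); // 0.3
```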