LiteLLM AI Proxy

There is a need for a central place to manage access to AI API endpoints. Additionally, there is a need to monitor usage and cost per team and project. LiteLLM is an open-source tool that addresses both of these needs.
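Because the LiteLLM proxy exposes an OpenAI-compatible API, applications reach the models through the proxy using a team- or project-scoped virtual key, which is what allows usage and spend to be attributed. Below is a minimal sketch of an application call; the proxy URL, virtual key, and model alias are placeholder assumptions, not actual deployment values.

```python
# Minimal sketch: calling the LiteLLM proxy with the standard OpenAI client.
# The base_url, api_key (virtual key), and model alias are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://litellm.example.internal/v1",  # hypothetical proxy endpoint
    api_key="sk-team-a-virtual-key",                 # team-scoped virtual key issued by LiteLLM
)

response = client.chat.completions.create(
    model="gpt-4o",  # model alias defined in the proxy's model list
    messages=[{"role": "user", "content": "Hello from a proxied app"}],
)
print(response.choices[0].message.content)
```

Because every request passes through the proxy, the spend recorded against the virtual key is what drives the per-team/per-project cost reporting described above.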

Overview

Platform Engineering is in the early phases of evaluating this product and hopes to have a proof of concept (POC) in place by the end of 2024.

Below is a diagram of the high-level goal:

```mermaid
graph TD
    A(Apps & Users) --> B(LiteLLM)

    subgraph LiteLLM_Service [LiteLLM Service]
        B --> H{Check Budget}
        H -->|Budget Available| C{Route Request}
        H -->|Budget Exceeded| I(Request Rejected)
    end

    C -->|Azure| D(AI Inference on Azure)
    C -->|Databricks| E(AI Inference on Databricks)
    C -->|Kubernetes| F(AI Inference on Local K8s Cluster)
    C -->|Other Clouds| G(Other Cloud Inference)

    style B fill:#D9E6F2,stroke:#333,stroke-width:2px,color:#000;
    style H fill:#F4A261,stroke:#333,stroke-width:2px,color:#000;
    style I fill:#F94144,stroke:#333,stroke-width:2px,color:#000;
    style C fill:#A8DADC,stroke:#333,stroke-width:2px,color:#000;
    style E fill:#E9C46A,stroke:#333,stroke-width:1px,color:#000;
    style D fill:#E9C46A,stroke:#333,stroke-width:1px,color:#000;
    style F fill:#E9C46A,stroke:#333,stroke-width:1px,color:#000;
    style G fill:#E9C46A,stroke:#333,stroke-width:1px,color:#000;
```
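As a rough sketch of the "Route Request" step, the snippet below uses LiteLLM's Python Router to map a single model alias onto two backends (an Azure OpenAI deployment and a model served from the local K8s cluster). All endpoints, keys, model names, and the API version are placeholder assumptions; in the actual proxy deployment the equivalent routing and the budget checks would live in the proxy's configuration rather than in application code.

```python
# Sketch of the "Route Request" step using LiteLLM's Python Router.
# Endpoints, keys, model names, and the API version are placeholders only.
from litellm import Router

router = Router(
    model_list=[
        {   # deployment on Azure OpenAI
            "model_name": "chat-default",
            "litellm_params": {
                "model": "azure/gpt-4o",
                "api_base": "https://example-azure-openai.openai.azure.com",
                "api_key": "AZURE_KEY_PLACEHOLDER",
                "api_version": "2024-02-01",  # placeholder Azure API version
            },
        },
        {   # OpenAI-compatible model served from the local Kubernetes cluster
            "model_name": "chat-default",
            "litellm_params": {
                "model": "openai/llama-3-8b",
                "api_base": "http://inference.k8s.example.internal/v1",
                "api_key": "LOCAL_KEY_PLACEHOLDER",
            },
        },
    ],
)

# The caller only knows the alias; the router picks one of the deployments behind it.
response = router.completion(
    model="chat-default",
    messages=[{"role": "user", "content": "ping"}],
)
print(response.choices[0].message.content)
```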

Updated on August 7, 2025