Context
Currently, the system lacks visibility into individual user token usage. A flat subscription model risks significant cost overruns due to usage disparities between user groups (e.g., standard undergraduate use vs. heavy usage by PhD students and developers).
We urgently need to implement per-user cost tracking to distinguish heavy users from light users, paving the way for future tiered pricing or quota limits.
Goals
- Short-term: Accurately track and record input/output token consumption per user.
- Long-term: Establish a unified AI Gateway (LiteLLM) to handle automatic pricing, cost calculation, and user-defined API keys.
- Legacy Debt: Address the billing visibility and stability issues within the current Python-based MCP (Model Context Protocol) service.
Proposed Solutions
Option A: MVP (Quick Implementation)
- Direct Logging: Parse the `usage` field from the LLM API response in the backend and write token counts directly to the database.
- Retroactive Calculation: Implement a Python cronjob that reads historical chat logs and calculates past token consumption using an offline tokenizer.
- Note: Embedding costs are negligible and can be ignored or calculated separately.
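The direct-logging path above could be sketched as follows. This is a minimal illustration, not the actual implementation: the `token_usage` table, the `record_usage` helper, and the response shape (an OpenAI-style `usage` object with `prompt_tokens`/`completion_tokens`) are all assumptions to be adapted to the real schema and API.

```python
import sqlite3

# Hypothetical per-user usage table; adjust columns to the real schema.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IF NOT EXISTS token_usage (
        user_id           TEXT    NOT NULL,
        prompt_tokens     INTEGER NOT NULL,
        completion_tokens INTEGER NOT NULL,
        recorded_at       TEXT    DEFAULT CURRENT_TIMESTAMP
    )
""")

def record_usage(conn, user_id, response):
    """Parse the `usage` field from an LLM API response and persist the counts."""
    usage = response.get("usage") or {}
    conn.execute(
        "INSERT INTO token_usage (user_id, prompt_tokens, completion_tokens)"
        " VALUES (?, ?, ?)",
        (user_id, usage.get("prompt_tokens", 0), usage.get("completion_tokens", 0)),
    )
    conn.commit()

# Example response fragment in the shape many chat-completion APIs return.
response = {"usage": {"prompt_tokens": 120, "completion_tokens": 45, "total_tokens": 165}}
record_usage(conn, "user-42", response)

total = conn.execute(
    "SELECT SUM(prompt_tokens + completion_tokens) FROM token_usage WHERE user_id = ?",
    ("user-42",),
).fetchone()[0]
print(total)  # 165
```

Writing raw token counts (rather than precomputed costs) keeps the table model-agnostic; pricing can then be applied later by the Gateway or a reporting job.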
Discussion
- Streaming Costs: If the Gateway cannot return real-time costs during streaming, should we implement a token counter in the Worker layer as a fallback?
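One possible shape for the Worker-layer fallback: buffer the streamed deltas and count tokens once the stream completes. The `count_tokens` function below is a deliberately crude whitespace approximation so the sketch stays dependency-free; a real fallback would use the model's own tokenizer (e.g., tiktoken for OpenAI models) for accurate counts.

```python
def count_tokens(text: str) -> int:
    # Placeholder: whitespace split, NOT a real tokenizer. Swap in the
    # model-specific tokenizer before using this for billing.
    return len(text.split())

def consume_stream(chunks):
    """Accumulate streamed text deltas, then return (full_text, token_count)."""
    parts = []
    for chunk in chunks:  # each chunk is one text delta from the streaming API
        parts.append(chunk)
    full_text = "".join(parts)
    return full_text, count_tokens(full_text)

# Simulated delta stream, as a Gateway that omits usage data might emit.
text, tokens = consume_stream(["Hello ", "world, ", "this is ", "a stream."])
print(tokens)  # 6
```

Counting after the stream ends avoids per-chunk tokenizer calls, at the cost of holding the full response in memory until completion.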