
Implement Granular Token Billing and User Usage Tracking #58

@Junyi-99


Context

The system currently has no visibility into per-user token usage. Under a flat subscription model, large usage disparities between user groups (e.g., typical undergraduate use vs. heavy use by PhD students and developers) risk significant cost overruns.
We urgently need to implement "Put User Cost" tracking to distinguish heavy users from light users, paving the way for future tiered pricing or quota limits.

Goals

  1. Short-term: Accurately track and record input/output token consumption per user.
  2. Long-term: Establish a unified AI Gateway (LiteLLM) to handle automatic pricing, cost calculation, and user-defined API keys.
  3. Legacy Debt: Address the billing visibility and stability issues within the current Python-based MCP (Model Context Protocol) service.

Proposed Solutions

Option A: MVP (Quick Implementation)

  • Direct Logging: Parse the usage field from the LLM API response in the backend and write token counts directly to the database.
  • Retroactive Calculation: Implement a Python cron job that reads historical chat logs and estimates past token consumption per user with an offline tokenizer.
  • Note: Embedding costs are negligible and can be ignored or calculated separately.
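
A minimal sketch of how Option A could look, not a final design. It assumes the LLM response is a dict whose `usage` field carries OpenAI-style `prompt_tokens` and `completion_tokens` keys, and that historical chat logs can be read as dicts with `user_id`, `prompt`, and `completion`. The table name `token_usage`, the function names, and the `count_tokens` callable (a stand-in for whichever offline tokenizer we pick) are all hypothetical. SQLite is used only to keep the example self-contained.

```python
import sqlite3
from typing import Callable, Iterable, List, Tuple

# Hypothetical schema: one row per LLM call, tagged by where the count came from.
SCHEMA = """
CREATE TABLE IF NOT EXISTS token_usage (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id       TEXT    NOT NULL,
    model         TEXT    NOT NULL,
    input_tokens  INTEGER NOT NULL,
    output_tokens INTEGER NOT NULL,
    source        TEXT    NOT NULL,
    created_at    TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""
INSERT = (
    "INSERT INTO token_usage (user_id, model, input_tokens, output_tokens, source) "
    "VALUES (?, ?, ?, ?, ?)"
)


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute(SCHEMA)
    conn.commit()


def record_usage(conn: sqlite3.Connection, user_id: str, response: dict) -> bool:
    """Direct logging: store the token counts reported in the API response.

    Returns False when the response has no usage field (nothing is written).
    """
    usage = response.get("usage")
    if not usage:
        return False
    row = (
        user_id,
        response.get("model", "unknown"),
        int(usage.get("prompt_tokens", 0)),      # assumed key name
        int(usage.get("completion_tokens", 0)),  # assumed key name
        "api",
    )
    conn.execute(INSERT, row)
    conn.commit()
    return True


def backfill_usage(
    conn: sqlite3.Connection,
    chat_logs: Iterable[dict],
    count_tokens: Callable[[str], int],
    model: str = "unknown",
) -> int:
    """Retroactive calculation: estimate counts for old logs with an offline tokenizer."""
    rows = [
        (log["user_id"], model, count_tokens(log["prompt"]),
         count_tokens(log["completion"]), "backfill")
        for log in chat_logs
    ]
    conn.executemany(INSERT, rows)
    conn.commit()
    return len(rows)


def usage_by_user(conn: sqlite3.Connection) -> List[Tuple[str, int, int]]:
    """Total input/output tokens per user, ordered by user_id."""
    return conn.execute(
        "SELECT user_id, SUM(input_tokens), SUM(output_tokens) "
        "FROM token_usage GROUP BY user_id ORDER BY user_id"
    ).fetchall()
```

Keeping a `source` column separates exact counts reported by the API from backfilled estimates, so the two can be audited or recomputed independently.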

Discussion

  • Streaming Costs: If the Gateway cannot return real-time costs during streaming, should we implement a token counter in the Worker layer as a fallback?
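
To make the fallback question concrete, here is a rough sketch of what a Worker-layer counter might do. It prefers usage reported by the provider and only estimates when none arrives. The chunk shape (`delta` for text, optional `usage` on a later chunk), the function names, and the 4-characters-per-token heuristic are all assumptions for illustration; a real tokenizer would replace `rough_count`.

```python
from typing import Callable, Dict, Iterable, List, Optional


def rough_count(text: str) -> int:
    """Crude placeholder estimate (about 4 characters per token), not a real tokenizer."""
    return (len(text) + 3) // 4


def consume_stream(
    chunks: Iterable[dict],
    prompt: str,
    count_tokens: Callable[[str], int] = rough_count,
) -> Dict[str, object]:
    """Drain a stream and return token counts for the request.

    If any chunk carries a usage dict, trust it; otherwise fall back to
    counting the prompt and the concatenated streamed text locally.
    """
    parts: List[str] = []
    reported: Optional[dict] = None
    for chunk in chunks:
        text = chunk.get("delta")       # hypothetical field for streamed text
        if text:
            parts.append(text)
        if chunk.get("usage"):          # hypothetical field for provider usage
            reported = chunk["usage"]

    if reported is not None:
        return {
            "input_tokens": int(reported["prompt_tokens"]),
            "output_tokens": int(reported["completion_tokens"]),
            "estimated": False,
        }
    return {
        "input_tokens": count_tokens(prompt),
        "output_tokens": count_tokens("".join(parts)),
        "estimated": True,
    }
```

The `estimated` flag lets downstream billing tell exact counts from fallback estimates, which matters if estimates are later reconciled against Gateway data.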

Status: Design / Spec