Skip to content

DUAL OTLP Producers Guide


DUAL Mode Architecture

+------------------------------------------+
|   Flexible GraphRAG Application          |
|                                          |
|  +---------------+  +------------------+ |
|  | OpenInference |  |     OpenLIT      | |
|  |   (traces)    |  | (traces+metrics) | |
|  +------+--------+  +------+-----------+ |
+---------+---------------- +--------------+
          |                 |
          +--------+--------+
          +----------------+
          | OTEL Collector |
          |  (OTLP 4318)   |
          +--------+-------+
                   |
        +----------+----------+
        ↓                     ↓
   +---------+         +------------+
   |  Jaeger |         | Prometheus |
   |(traces) |         | (metrics)  |
   +---------+         +------+-----+
                       +-------------+
                       |   Grafana   |
                       +-------------+

Benefits: - OpenInference: Detailed LlamaIndex + LangChain traces - OpenLIT: Token metrics, costs, VectorDB metrics - Custom metrics: Graph extraction, retrieval - No conflicts — they coexist perfectly


Quick Setup (3 minutes)

Step 1: Install DUAL mode packages

uv pip install -e ".[observability-dual]"

Step 2: Enable DUAL Mode

Option A: Environment Variable (Recommended)

# In .env
OBSERVABILITY_BACKEND=both

Option B: Docker Compose

services:
  flexible-graphrag:
    environment:
      - OBSERVABILITY_BACKEND=both

Option C: Code (if you call setup directly)

setup_observability(backend="both")

Step 3: Restart Application

# Your app will automatically initialize both backends
# Check logs for:
# Setting up observability with backend: both
# OpenLIT active - token metrics enabled!
# OpenInference LlamaIndex instrumentation enabled
# OpenInference LangChain instrumentation enabled
# Observability setup complete - DUAL MODE

That's it!


What You Get

From OpenLIT (Automatic)

# Token metrics (GUARANTEED!)
gen_ai_usage_input_tokens_total{gen_ai_request_model="gpt-4"}
gen_ai_usage_output_tokens_total{gen_ai_request_model="gpt-4"}

# Request counts by model
gen_ai_total_requests{gen_ai_request_model="gpt-4"}

# Cost tracking (bonus!)
gen_ai_total_cost{gen_ai_request_model="gpt-4"}

# Latency histograms
gen_ai_request_duration_seconds_bucket{gen_ai_request_model="gpt-4"}

# VectorDB metrics (bonus!)
db_total_requests{db_system="neo4j"}

From OpenInference (LlamaIndex + LangChain traces)

# Detailed traces in Jaeger
- All LlamaIndex operations
- All LangChain operations
- Rich span attributes
- Token counts as attributes

From Your Custom Metrics (Still Work!)

# Graph extraction
rag_graph_entities_extracted_total
rag_graph_relations_extracted_total

# Search/retrieval
rag_retrieval_latency_ms
rag_retrieval_num_documents

# LLM operations
llm_generation_latency_ms
llm_requests_total

Grafana Dashboard Updates

Update "Tokens Generated/sec" Panel

Before (showing 0):

sum(rate(llm_tokens_generated_total[5m]))

After (works!):

# Prompt + Completion tokens/sec from OpenLIT
sum(rate(gen_ai_usage_input_tokens_total[5m])) + 
sum(rate(gen_ai_usage_output_tokens_total[5m]))

Add New Panels (Bonus!)

Cost per Hour:

sum(rate(gen_ai_total_cost[1h])) * 3600

Requests by Model:

sum by (gen_ai_request_model) (rate(gen_ai_total_requests[5m]))

VectorDB Operations:

sum by (db_system) (rate(db_total_requests[5m]))

Or Import Pre-Built Dashboard

Visit: https://docs.openlit.io/latest/sdk/destinations/prometheus-jaeger#3-import-the-pre-built-dashboard


OTEL Collector (No Changes Needed)

Your existing config works perfectly! Both OpenInference and OpenLIT send to the same OTLP endpoint.

receivers:
  otlp:
    protocols:
      http:
        endpoint: "0.0.0.0:4318"  # Both send here!
      grpc:
        endpoint: "0.0.0.0:4317"

# Spanmetrics still works but now optional
# OpenLIT sends token metrics directly!

service:
  pipelines:
    traces:
      receivers: [otlp]  # Receives from both!
      exporters: [otlp/jaeger, spanmetrics]

    metrics:
      receivers: [otlp, spanmetrics]  # Receives from both + spanmetrics!
      exporters: [prometheus]

Verification Steps

1. Check Application Logs

# Should see:
Setting up observability with backend: both
Initializing OpenLIT as OTLP producer...
OpenLIT active - token metrics enabled!
Initializing OpenInference as additional OTLP producer...
OpenInference LlamaIndex instrumentation enabled
OpenInference LangChain instrumentation enabled
Custom RAG metrics initialized
Observability setup complete - DUAL MODE
   OpenLIT -> Token metrics, costs, VectorDB metrics
   OpenInference -> Detailed traces (LlamaIndex + LangChain)
   Custom metrics -> Graph extraction, retrieval, etc.

2. Check Prometheus Metrics

Visit http://localhost:9090 and search for:

OpenLIT metrics: - gen_ai_usage_input_tokens_total - gen_ai_usage_output_tokens_total - gen_ai_total_requests - gen_ai_total_cost

Custom metrics: - rag_graph_entities_extracted_total - rag_graph_relations_extracted_total

Spanmetrics (from OpenInference): - calls_total - duration_milliseconds_bucket

3. Check Jaeger Traces

Visit http://localhost:16686

Should see traces from: - OpenInference (llama_index. and langchain. spans) - OpenLIT (gen_ai.* spans)

Both coexist.


FAQ

Q: Won't this create duplicate data?

A: No! They send complementary data: - OpenLIT: Focuses on LLM-specific metrics (tokens, costs) - OpenInference: Focuses on detailed traces - No conflicts - different metric names and purposes

Q: Is this inefficient?

A: Negligible overhead: - Both are lightweight OpenTelemetry producers - OTLP protocol is efficient - Collector handles deduplication if needed - Benefits far outweigh minimal overhead

Q: Can I disable one later?

A: Yes! Just change the env var:

OBSERVABILITY_BACKEND=openinference  # OpenInference only
OBSERVABILITY_BACKEND=openlit        # OpenLIT only
OBSERVABILITY_BACKEND=both           # Both

Q: Do I still need spanmetrics?

A: Optional now! OpenLIT already provides token metrics directly.

But keep it for now - it provides additional call count and latency metrics that complement the data.


Comparison: Single vs DUAL Mode

Metric/Feature OpenInference Only OpenLIT Only DUAL Mode
Token metrics Via spanmetrics Direct Direct
LlamaIndex traces Rich Basic Rich
LangChain traces Rich Basic Rich
Cost tracking No Yes Yes
VectorDB metrics No Yes Yes
Custom metrics Yes Yes Yes
Setup complexity Low Low Low

Real-World Benefits

Scenario 1: Debugging Performance Issues

  • OpenInference traces: See exactly where slowdown occurs
  • OpenLIT metrics: Confirm if it's LLM latency or token volume
  • Custom metrics: Check if graph extraction is the bottleneck

Scenario 2: Cost Optimization

  • OpenLIT costs: Track spend by model in Grafana
  • OpenLIT tokens: Identify expensive queries
  • OpenInference traces: Find unnecessary LLM calls in code

Scenario 3: Production Monitoring

  • OpenLIT dashboard: High-level LLM health (tokens/sec, cost/hour, errors)
  • Custom metrics: RAG-specific KPIs (entities extracted, retrieval latency)
  • OpenInference traces: Deep dive when issues occur

Migration Path

If You're Currently Using OpenInference

# Add OpenLIT (1.41.2+ no longer downgrades openai)
uv pip install "openlit>=1.41.2"
export OBSERVABILITY_BACKEND=both
# Restart - instant token metrics!

If You're Currently Using OpenLIT

# Add OpenInference for LlamaIndex + LangChain traces
uv pip install openinference-instrumentation-llama-index openinference-instrumentation-langchain
export OBSERVABILITY_BACKEND=both
# Restart - get richer traces!

If You're Starting Fresh

uv pip install -e ".[observability-dual]"
export OBSERVABILITY_BACKEND=both
# Perfect from day 1!

References


Summary

DUAL mode combines OpenInference traces (LlamaIndex + LangChain) with OpenLIT token/cost metrics. OpenLIT 1.41.2+ no longer downgrades openai.

DUAL mode provides:

  1. Token metrics (from OpenLIT)
  2. Rich traces (from OpenInference)
  3. Cost tracking (from OpenLIT)
  4. Custom metrics (unchanged)
  5. VectorDB operation metrics (from OpenLIT)

To enable:

uv pip install -e ".[observability-dual]"
export OBSERVABILITY_BACKEND=both
# Restart app — check Prometheus for gen_ai_usage_*_tokens_total


Status: DUAL mode implemented and available.