The official @tallyify/sdk package wraps any Promise-based AI call, extracts usage when the provider returns it, and sends aggregated spend metrics to Tallyify.
The Tallyify SDK does not read, store, or transmit provider API keys. Your OpenAI, Anthropic, Google, DeepSeek, or Mistral keys remain inside your application. The SDK only sends aggregated usage metrics: provider, model, token counts, latency, cost estimate, and optional metadata.
Provider adapters are explicit wrappers around known SDK methods. They do not monkey-patch global fetch, http, https, axios, or provider clients outside the object you wrap.
Available adapters: trackOpenAI, trackAnthropic, trackGemini, trackDeepSeek, trackMistral, trackGroq, trackAzureOpenAI, and trackBedrock. The generic tracker.track() API remains the most portable option for custom clients and unsupported methods.
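To make the "explicit wrapper" idea concrete, here is a minimal sketch of how a single method on a single object can be instrumented without touching globals. It is not the SDK's adapter code: `wrapMethod` and `CallRecord` are hypothetical names, and the real adapters may be structured quite differently.

```typescript
// Hypothetical sketch: NOT the real @tallyify/sdk adapter implementation.
type CallRecord = { method: string; latencyMs: number; ok: boolean };

// Returns a copy of `client` whose `method` is timed and recorded.
// The original client and all globals (fetch, http, ...) are left untouched.
function wrapMethod<T extends object>(
  client: T,
  method: keyof T & string,
  record: (r: CallRecord) => void
): T {
  const original: unknown = client[method];
  if (typeof original !== "function") {
    throw new TypeError(`${method} is not a function`);
  }
  // Shallow copy that keeps the prototype and own properties of the client.
  const copy = Object.create(
    Object.getPrototypeOf(client),
    Object.getOwnPropertyDescriptors(client)
  );
  copy[method] = async (...args: unknown[]) => {
    const start = Date.now();
    try {
      const result = await original.apply(client, args);
      record({ method, latencyMs: Date.now() - start, ok: true });
      return result;
    } catch (err) {
      record({ method, latencyMs: Date.now() - start, ok: false });
      throw err; // failures are recorded, then surfaced unchanged
    }
  };
  return copy as T;
}
```

Only calls made through the returned object are recorded. Code that still holds the original client is unaffected, which is the property the adapters above are described as having.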
Streaming responses are tracked automatically. When the wrapped call returns a stream, the SDK hands you back a pass-through async iterable: you consume chunks exactly as before, while the SDK accumulates token usage as the stream is read and dispatches the telemetry event when it ends. This works for both the final-usage-chunk pattern (OpenAI) and the cumulative pattern (Anthropic), via the generic track() and every adapter.
streaming.ts
// Works with the generic tracker AND every adapter; no extra config.
const stream = await tracker.track(
  openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [{ role: 'user', content: 'Write a haiku about the sea' }],
    stream: true,
    stream_options: { include_usage: true } // OpenAI: required for token counts
  }),
  { provider: 'openai', model: 'gpt-4o-mini' }
);

// Consume chunks exactly as you normally would:
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}

// Usage is accumulated from the chunks; the telemetry event (tokens + cost)
// is dispatched automatically once the stream finishes.
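For readers curious how a pass-through async iterable can collect usage without changing what the caller sees, the following is a rough sketch. It is an illustration only: `passThrough`, `Chunk`, and `Usage` are hypothetical names and shapes, not the SDK's internals or any provider's actual chunk format.

```typescript
// Hypothetical sketch of a pass-through stream wrapper; names are illustrative.
type Usage = { inputTokens: number; outputTokens: number };
type Chunk = { text?: string; usage?: Partial<Usage> };

async function* passThrough(
  source: AsyncIterable<Chunk>,
  onDone: (usage: Usage) => void
): AsyncGenerator<Chunk> {
  const usage: Usage = { inputTokens: 0, outputTokens: 0 };
  try {
    for await (const chunk of source) {
      // Keep the latest reported value. This covers a single final usage chunk
      // as well as cumulative counts repeated on several chunks.
      if (chunk.usage?.inputTokens !== undefined) {
        usage.inputTokens = chunk.usage.inputTokens;
      }
      if (chunk.usage?.outputTokens !== undefined) {
        usage.outputTokens = chunk.usage.outputTokens;
      }
      yield chunk; // the caller receives every chunk unchanged
    }
  } finally {
    // Runs when the stream ends, errors, or the consumer stops early.
    onDone(usage);
  }
}
```

Taking the most recent count rather than summing is what lets one code path handle both reporting styles in this sketch; a provider that reports per-chunk deltas would need summing instead.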
Live Cost Estimation
By default the SDK fetches live prices from the Tallyify catalog (cached per process) so cost_estimate is accurate for every model — not just the small built-in fallback table. It is best-effort: if the catalog is unreachable or the key lacks the pricing:read scope, it silently falls back to bundled prices. Set dynamicPricing: false to disable it.
dynamic-pricing.ts
const tracker = new Tracker({
  apiKey: process.env.TALLYIFY_API_KEY!,
  dynamicPricing: true // default: pull live catalog prices for accurate cost estimates
});
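The fallback behaviour described above can be pictured with a small sketch. Everything here is assumed for illustration: `estimateCost`, the `Price` shape, and the placeholder prices are hypothetical and do not reflect the Tallyify catalog or the SDK's bundled table.

```typescript
// Hypothetical sketch: prices and names are placeholders, not Tallyify's catalog.
type Price = { inputPerMTok: number; outputPerMTok: number }; // USD per 1M tokens
type PriceTable = Record<string, Price | undefined>;

// Stand-in for a small bundled fallback table.
const bundled: PriceTable = {
  "example-model": { inputPerMTok: 1, outputPerMTok: 2 },
};

function estimateCost(
  model: string,
  inputTokens: number,
  outputTokens: number,
  live: PriceTable | null, // null when the live catalog could not be fetched
  fallback: PriceTable
): number | null {
  // Prefer the live price; otherwise fall back to the bundled one.
  const price = live?.[model] ?? fallback[model];
  if (!price) return null; // unknown model: no estimate rather than a guess
  return (
    (inputTokens * price.inputPerMTok + outputTokens * price.outputPerMTok) /
    1_000_000
  );
}
```

For example, `estimateCost("example-model", 1_000_000, 500_000, null, bundled)` uses the bundled placeholder prices because no live table is available.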
Budget Guardrails
Add a budget to flag — or hard-stop — runaway spend. Because a call's cost is only known after it returns, this is a post-hoc tripwire: with enforce: true the SDK throws TallyifyBudgetError once a per-call or cumulative limit is crossed, so an overspend can't pass silently; otherwise it logs a warning and tags the event.
budgets.ts
import { Tracker, TallyifyBudgetError } from '@tallyify/sdk';

const tracker = new Tracker({
  apiKey: process.env.TALLYIFY_API_KEY!,
  budget: {
    maxCostPerCall: 0.50, // USD: flag a single call above this
    maxTotalCost: 100, // USD: cumulative cap for this Tracker instance
    enforce: true, // throw instead of only warning
    onExceeded: (info) => console.error('Budget exceeded', info)
  }
});

try {
  await tracker.track(providerPromise, { provider: 'openai', model: 'gpt-4o' });
} catch (err) {
  if (err instanceof TallyifyBudgetError) {
    // overspend tripwire: info has { kind, limit, cost, totalCost, provider, model }
  }
}
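The "post-hoc tripwire" idea can be sketched in a few lines. This is an assumption-laden illustration, not the SDK's implementation: `createMeter`, `Breach`, and the `kind` values `"per_call"` and `"total"` are hypothetical, and the order in which the two limits are checked is a choice made for this example.

```typescript
// Hypothetical sketch of a post-hoc budget check; not the SDK's internals.
type Budget = { maxCostPerCall?: number; maxTotalCost?: number; enforce?: boolean };
type Breach = {
  kind: "per_call" | "total";
  limit: number;
  cost: number;
  totalCost: number;
};

function createMeter(budget: Budget) {
  let totalCost = 0;
  // Call this after a request returns, once its cost is known.
  return function record(cost: number): Breach | null {
    totalCost += cost; // the money is already spent; we can only react
    let breach: Breach | null = null;
    if (budget.maxCostPerCall !== undefined && cost > budget.maxCostPerCall) {
      breach = { kind: "per_call", limit: budget.maxCostPerCall, cost, totalCost };
    } else if (budget.maxTotalCost !== undefined && totalCost > budget.maxTotalCost) {
      breach = { kind: "total", limit: budget.maxTotalCost, cost, totalCost };
    }
    if (breach && budget.enforce) {
      // Hard stop: surface the overspend to the caller.
      throw Object.assign(new Error(`Budget exceeded (${breach.kind})`), { info: breach });
    }
    return breach; // non-enforcing mode: the caller can log or tag the event
  };
}
```

Note that the cost is added to the running total before any limit is checked, so the call that crosses a limit is itself counted. That is what makes this a tripwire rather than a pre-flight check.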
Retry Tracking
trackWithRetries() retries a call factory with exponential backoff and records every attempt, including the failures. The successful event carries a retry_count so you can see how much retries are costing you.
retries.ts
// Pass a factory (not a promise) so each attempt is a fresh call.
const response = await tracker.trackWithRetries(
  () => openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [{ role: 'user', content: 'Hello!' }]
  }),
  {
    provider: 'openai',
    model: 'gpt-4o-mini',
    retries: 3, // max attempts (default 3)
    retryDelayMs: 250 // exponential backoff base
  }
);

// Each attempt is tracked; events carry retry_count so you can see retry cost.
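To show why a factory is needed and what "exponential backoff" means in practice, here is a minimal sketch. It makes several assumptions: `withRetries`, `maxAttempts`, `baseDelayMs`, and `Attempt` are hypothetical names, and the doubling schedule is one common choice that may not match the SDK's actual timing or option semantics.

```typescript
// Hypothetical sketch; the SDK's real backoff schedule and options may differ.
type Attempt = { attempt: number; ok: boolean };

async function withRetries<T>(
  factory: () => Promise<T>,
  opts: { maxAttempts: number; baseDelayMs: number },
  onAttempt: (a: Attempt) => void
): Promise<T> {
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    try {
      // Calling the factory starts a fresh request. A promise that has
      // already settled could not be re-run.
      const result = await factory();
      onAttempt({ attempt, ok: true });
      return result;
    } catch (err) {
      lastError = err;
      onAttempt({ attempt, ok: false }); // failed attempts are recorded too
      if (attempt < opts.maxAttempts - 1) {
        // Delay doubles each time: base, 2x base, 4x base, ...
        const delayMs = opts.baseDelayMs * 2 ** attempt;
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError; // every attempt failed
}
```

In this sketch the index of the successful attempt plays the role that retry_count plays in the events described above: it tells you how many failed calls preceded the one that worked.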
The ingestion API rejects provider-key-looking strings, and deprecated externalKeyHint values are ignored. The SDK only ever sends aggregated usage metrics — never a real provider key.