LLM agents and production systems at GriffinAI

End-to-end architecture and delivery of production agentic LLM products — Transaction Execution Agent, Cardano Proposal Examiner, and multi-agent ops.

Role: Senior AI/ML Engineer

Stack: Python, Docker, CI/CD, AWS, containerized services, Slack/Telegram integrations

Outcomes

  • Transaction Execution Agent — user-facing transaction assistant with bounded transaction-preparation workflows; signing stayed with the user
  • ~2x faster average response times in TEA, with materially lower token cost from routing and call-structure optimization
  • Cardano Proposal Examiner — built v1 in ~2 weeks; governance knowledge graph and graph-of-thought–style structured reasoning; increased transparency and control
  • Supporting multi-agent communication and ops-reporting layer, plus production deployment patterns for secure cloud integrations and external channels

Flagship case study

At GriffinAI, the hard part was not just model output quality. It was turning risky, user-facing and ops-facing workflows into systems that were explicit, measurable, and safe enough to operate in production.

TL;DR

  • TEA is the clearest public proof story: a transaction assistant where the system helped users reach a transaction-ready path, while final signing stayed with the user.
  • The core architectural shift was moving from more implicit provider-managed behavior toward explicit application-managed workflow control.
  • I optimized for bounded, inspectable production loops rather than demo-style intelligence.

Context

GriffinAI worked in crypto / web3, but the more durable engineering lesson was broader: production AI systems under ambiguity and risk need clear trust boundaries, explicit state, controlled tool use, and observability around the whole workflow.
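One concrete piece of "observability around the whole workflow" can be sketched as a logging wrapper on every tool call. This is an illustrative pattern only; the function names and the toy fee logic are placeholders, not the production implementation.

```python
import json
import time
from functools import wraps

# Hypothetical observability wrapper: every tool call in the workflow is
# logged with its name, arguments, and latency, so the loop can be
# inspected after the fact instead of trusted blindly.
def observed(tool_fn):
    @wraps(tool_fn)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        result = tool_fn(*args, **kwargs)
        record = {
            "tool": tool_fn.__name__,
            "args": repr(args),
            "latency_ms": round((time.monotonic() - start) * 1000, 1),
        }
        print(json.dumps(record))  # stand-in for a real log sink
        return result
    return wrapper

@observed
def estimate_fee(amount: int) -> int:
    # Toy fee logic, for illustration only.
    return max(1, amount // 100)
```

The point is structural: the application owns the record of what happened, independent of whatever the model says happened.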

The strongest secondary shipped proof asset is Cardano Proposal Examiner, where transparency and inspectability mattered more than raw generation quality.

Problem

For TEA, the challenge was not simply answering user questions. It was helping users move through a high-risk conversational workflow without letting the model implicitly own truth, validation, identity/context, or execution-adjacent decisions.

For Proposal Examiner, the problem was different: make structured reasoning inspectable enough that users could follow the path from evidence to conclusion, rather than treating generation quality alone as success.
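The "follow the path from evidence to conclusion" requirement can be sketched as a reasoning trace where every claim keeps explicit links back to its evidence. The node shape and section identifiers below are assumptions for illustration, not the Proposal Examiner data model.

```python
from dataclasses import dataclass, field

# Hypothetical node in a graph-of-thought-style reasoning trace: each
# conclusion records the evidence it used and the sub-claims it rests on,
# so a reviewer can walk the path rather than trusting a blob of text.
@dataclass
class ReasoningNode:
    claim: str
    evidence_ids: list[str] = field(default_factory=list)  # e.g. proposal sections
    supports: list["ReasoningNode"] = field(default_factory=list)

def trace(node: ReasoningNode, depth: int = 0) -> list[str]:
    """Flatten the graph into an inspectable evidence-to-conclusion listing."""
    lines = [f"{'  ' * depth}{node.claim}  <- {node.evidence_ids}"]
    for child in node.supports:
        lines.extend(trace(child, depth + 1))
    return lines
```

With this shape, "inspectable" becomes a property of the data structure: a conclusion with no evidence links is visibly unsupported.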

What I owned

  • End-to-end architecture and delivery of production LLM agent products, bridging research, engineering, and product.
  • The TEA orchestration layer end to end: routing, scenario logic, tool contracts, structured outputs, state-handling decisions, and the migration toward more explicit workflow control.
  • Proposal Examiner v1 from scratch, shaped around structured reasoning, transparency, and control.
  • Supporting multi-agent communication and ops-reporting patterns, plus production deployment for secure cloud integrations and external channels.

Architecture shift

The strongest architecture story here is TEA. The system started closer to a provider-heavy Assistants-style setup, where too much orchestration stayed implicit. That was useful for early feasibility, but it made workflow complexity harder to own.

The later design pushed routing, tool exposure, state handling, structured outputs, and product contracts into a more explicit application-managed workflow. The model stayed responsible for interpretation and workflow guidance, while truth, validation, identity/context, and execution-adjacent boundaries stayed outside the model.
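The shape of that explicit workflow control can be sketched as a small state machine where the application, not the model, decides which tools are exposed at each stage and which transitions are legal. Stage names, tool names, and transitions below are hypothetical, chosen to illustrate the pattern rather than describe TEA's internals.

```python
from dataclasses import dataclass, field
from enum import Enum, auto

# Hypothetical stages of a transaction-preparation workflow.
class Stage(Enum):
    INTENT = auto()
    VALIDATE = auto()
    PREPARE = auto()
    CONFIRM = auto()  # signing stays with the user; the agent never executes

# Only the tools relevant to the current stage are exposed to the model,
# so orchestration lives in application code, not in the provider.
TOOLS_BY_STAGE = {
    Stage.INTENT: ["resolve_asset", "lookup_balance"],
    Stage.VALIDATE: ["validate_address", "estimate_fee"],
    Stage.PREPARE: ["build_unsigned_tx"],
    Stage.CONFIRM: [],  # no tools: the model only summarizes for review
}

@dataclass
class WorkflowState:
    stage: Stage = Stage.INTENT
    facts: dict = field(default_factory=dict)  # validated, app-owned truth

def allowed_tools(state: WorkflowState) -> list[str]:
    """The application, not the model, decides the tool surface."""
    return TOOLS_BY_STAGE[state.stage]

def advance(state: WorkflowState, tool: str, result: dict) -> WorkflowState:
    """Explicit transitions: a tool result moves the workflow forward
    only along paths the application has defined."""
    if tool not in allowed_tools(state):
        raise PermissionError(f"{tool} not exposed in stage {state.stage.name}")
    state.facts[tool] = result
    if state.stage is Stage.INTENT and tool == "lookup_balance":
        state.stage = Stage.VALIDATE
    elif state.stage is Stage.VALIDATE and tool == "estimate_fee":
        state.stage = Stage.PREPARE
    elif state.stage is Stage.PREPARE and tool == "build_unsigned_tx":
        state.stage = Stage.CONFIRM
    return state
```

The key property is that a misbehaving model cannot call a tool the current stage does not expose, and cannot skip validation on the way to preparation.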

Trade-offs

  • I gave up some free-form conversational smoothness in exchange for stronger control, traceability, predictability, and safer behavior.
  • The early architecture favored prototype speed; the later architecture favored workflow ownership and clearer contracts.
  • In high-risk flows, I consistently preferred bounded, inspectable behavior over surface magic.

Public-safe outcomes

  • TEA average response time improved by roughly 2x.
  • Token cost dropped materially through routing and LLM call-structure optimization.
  • Proposal Examiner shipped publicly as v1 in ~2 weeks.
  • The overall systems became more predictable and inspectable, which mattered as much as raw speed.
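The routing part of the cost and latency win can be sketched as intent-based model selection: bounded, well-understood intents go to a small fast model, and only ambiguous requests reach the large one. Intent and model names here are placeholders, not the production routing table.

```python
# Hypothetical routing sketch: cheap, well-bounded intents go to a small
# model; the large model is reserved for ambiguous requests. This is the
# kind of call-structure change that reduces both latency and token cost.
SIMPLE_INTENTS = {"balance", "status", "help"}

def pick_model(intent: str) -> str:
    # Model identifiers are placeholders, not the actual configuration.
    return "small-fast-model" if intent in SIMPLE_INTENTS else "large-model"
```

A router like this also makes cost measurable per intent class, which is what turns "materially lower token cost" into a trackable number.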

Failure modes

The hardest TEA failure was a semantically wrong but operationally valid transaction preparation. That was the real trust problem: not obviously broken output, but plausible output with the wrong meaning.

Other important failure themes were context bloat, orchestration drift, broad tool-surface confusion, state drift, structured-output brittleness, and incomplete or incorrect tool sequencing.
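A defense against both structured-output brittleness and the "operationally valid but semantically wrong" failure can be sketched as strict parsing plus a semantic round-trip check against the user's already-validated intent. The field names and shapes below are assumptions for illustration, not TEA's actual schema.

```python
import json

# Assumed minimal shape of a prepared transaction, for illustration only.
REQUIRED = {"recipient", "amount", "asset"}

def check_preparation(raw_model_output: str, user_intent: dict) -> dict:
    """Accept a model-prepared transaction only if it is well-formed AND
    matches the intent the application already validated."""
    prepared = json.loads(raw_model_output)  # may raise: caller can retry
    missing = REQUIRED - prepared.keys()
    if missing:
        raise ValueError(f"structured output missing fields: {missing}")
    # Semantic check: well-formed is not enough; the values must agree
    # with the app-owned record of what the user asked for.
    for key in REQUIRED:
        if prepared[key] != user_intent[key]:
            raise ValueError(
                f"prepared {key}={prepared[key]!r} contradicts "
                f"user intent {user_intent[key]!r}"
            )
    return prepared
```

The semantic comparison is the important half: schema validation alone would have accepted exactly the plausible-but-wrong preparations described above.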

For Proposal Examiner, the relevant risk shifted from action safety toward transparency, reasoning quality, and evidence quality.

What shipped vs. what stayed internal

Shipped / public-safe

  • TEA as a production user-facing transaction assistant with bounded transaction-preparation workflows; signing stayed with the user.
  • Proposal Examiner as a public v1.
  • Containerized services and secure cloud integrations around these systems.

Internal / deliberately omitted

  • Deeper backend implementation details, internal observability, and some TEA workflow internals.
  • Deeper performance details beyond the approved public-safe ranges.
  • More detailed architecture and performance characteristics of the market/social recommendation direction.