Orchestrating Real-World Workflows with AI

Before we begin: Regulation of AI-powered, socially critical workflows is just getting started. This is a simple demo that does address some compliance concerns, such as audit trails. But clearly, AI in regulated environments is a big topic! The sort that can keep us busy for years!

By now, anyone who’s spent any time with LLMs might be wondering what the fuss is about. Generative AI isn’t deterministic! Can it really be used for real-world financial services workflows like credit limit increases?

An interesting thing about AI-enabled workflows is that they sometimes mirror how humans would achieve the same task. We don’t trust one person to make credit decisions. Let’s model that using LLMs by orchestrating them.

For those who’ve worked in Operations, workflow and orchestration are second nature. In fact, we’ll use that very useful tool in the computer-science shed, a State Machine, to help us orchestrate a workflow that evaluates Credit Limit Increase Requests. Of course, this simple demo will have a very simple state machine, but state machines work really well at all levels of complexity!

This pattern is called the Multi-Agent State Machine. It can be useful to break down a high-stakes decision into simpler, more auditable steps.

Let’s walk through this very unrealistic, but very demo-friendly Credit Increase Decider AI Orchestrator to show how this approach brings rigor and transparency to automated lending.

Also, we’re not throwing away our specialized ML models! LLMs are not the only way to do AI, and we’ll continue to use the right tool for the job.

A Single Prompt Isn’t Enough

If we ask a large language model (LLM), “Should we give this customer a $500 credit increase?” we might get a decent answer. But “decent” isn’t good enough in a regulated environment. We need to ensure:

  • Deterministic checks (e.g., mathematical limits) are never skipped.
  • Every decision is grounded in a specific, citable policy.
  • Every decision is independently reviewed before it becomes final.

A single prompt struggles to handle all these constraints simultaneously. By moving to a State Machine (see orchestrator.py), we treat the decision as a pipeline where each stage has a specific, isolated responsibility.
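Conceptually, the pipeline looks something like this. It's a minimal, illustrative sketch: the state names and the `run_pipeline` helper are made up for this post, not lifted from orchestrator.py.

```python
from enum import Enum, auto

class State(Enum):
    """Illustrative stage names; the real orchestrator may differ."""
    HARD_RULES = auto()
    POLICY_COMPLIANCE = auto()
    UNDERWRITING = auto()
    RISK_REVIEW = auto()

def run_pipeline(request: dict) -> str:
    """Advance through each stage; any stage can short-circuit the pipeline."""
    state = State.HARD_RULES
    while True:
        if state is State.HARD_RULES:
            if request["months_active"] < 6:
                return "REJECTED"           # deterministic check, no LLM needed
            state = State.POLICY_COMPLIANCE
        elif state is State.POLICY_COMPLIANCE:
            state = State.UNDERWRITING      # in the demo, an LLM cites a policy here
        elif state is State.UNDERWRITING:
            state = State.RISK_REVIEW       # a second LLM critiques the first
        elif state is State.RISK_REVIEW:
            return "APPROVED"               # or PENDING_HUMAN_REVIEW / REJECTED
```

The point isn't the `while` loop; it's that each stage has exactly one responsibility and a well-defined hand-off to the next.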

Deterministic Rules

tl;dr It’s not AI, it’s Python!

Before even engaging with our LLM, we run deterministic “Hard Rules”. In this demo, if a customer hasn’t been with us for six months or is asking for more than a 50% increase, we don’t need an LLM to tell us “no”.

def _passes_hard_rules(self) -> bool:
    increase_pct = self.data.requested_increase / self.data.current_limit
    if increase_pct > 0.5:
        self.add_log("FAIL", f"Requested increase ({increase_pct:.1%}) exceeds 50% max limit")
        return False
    if self.data.months_active < 6:
        self.add_log("FAIL", f"Account age ({self.data.months_active} months) below 6-month minimum")
        return False
    return True

We’re using code for what it’s good at. We save the AI tokens for the qualitative “grey areas” only after our rules say it’s okay to proceed.

Policy Compliance

Now comes an interesting part — policy compliance. We have an “auditor” (an LLM) whose job it is to read our organization-specific rules around credit limit increases. Once the math passes, we move to POLICY_COMPLIANCE. This agent acts as a digital compliance officer. We feed it key portions of our internal lending rulebook, and ask it to find the most relevant rule that applies to the current customer’s request, bearing their customer details in mind.

In this demo, we have only 3 policies in config.py. However, increasingly large context windows make it easier to provide longer, more complex policy documents to the auditor and still expect a good response. And of course, we expect the auditor to return JSON in a specific format. If it doesn't, the process cannot continue and returns a "system error".
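That validation step can be as simple as the sketch below. The schema keys (`policy_id`, `quote`, `compliant`) are assumptions for illustration, not the demo's exact contract:

```python
import json

REQUIRED_KEYS = {"policy_id", "quote", "compliant"}  # illustrative schema

def parse_auditor_response(raw: str) -> dict:
    """Validate the auditor's JSON; any deviation becomes a system error
    rather than a silent pass."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return {"status": "SYSTEM_ERROR", "detail": "auditor returned non-JSON"}
    if not REQUIRED_KEYS.issubset(payload):
        missing = sorted(REQUIRED_KEYS - payload.keys())
        return {"status": "SYSTEM_ERROR", "detail": f"missing keys: {missing}"}
    return {"status": "OK", **payload}
```

Failing loudly here is a feature: a malformed auditor response halts the workflow instead of flowing downstream as a half-formed decision.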

This mitigates the "hallucination" problem, albeit in our little demo system. If a decision is challenged, our audit logs point exactly to the policy ID and the specific quote the Auditor used to justify its stance.

But what if we have a real system with, say, 500 rules? Sure, you can fit the first five Harry Potter books into the best models’ context windows these days, but now we have increased risk of hallucination. There are several strategies to resolve this, from vector databases to filtering out irrelevant rules deterministically. However, these are beyond the scope of this demo! But if you’re interested in learning more, Build a RAG Agent with LangChain is a great place to start.

The Underwriter and the Risk Officer

We now use a pair of agents who will function as adversaries (of sorts) to assess the credit limit increase request.

  • The Underwriter assesses the customer’s stated reason (e.g., “Home renovation”). It looks for risk signals but tends to be optimistic.
  • The Risk Officer audits the Underwriter’s assessment and reasoning. Its role isn’t to always say “no”, but to ensure the “Yes” is backed by evidence.

In orchestrator.py, we manage this hand-off:

# The Risk Officer critiques the Underwriter
risk_officer_response = self._call_gemini_json(
    RISK_OFFICER_SYSTEM_PROMPT.format(
        reason=self.data.reason,
        decision=underwriter_response["assessment"],
        reasoning=underwriter_response["internal_reasoning"]
    )
)

Human in the Loop

We don’t treat every AI output as a final verdict. We use a Confidence Score from the Risk Officer to determine the routing. This is our “safety valve”. Again, in orchestrator.py:

confidence = risk_officer_response.get("confidence_score", 0)
if 0.5 <= confidence <= 0.8:
    final_result = "PENDING_HUMAN_REVIEW"
    self.add_log("PENDING", "Risk Officer disagreement triggered human review.")
elif confidence < 0.5:
    final_result = "REJECTED"
else:
    final_result = "APPROVED"  # high confidence (> 0.8): straight through

By explicitly defining a state that brings a human into the process, we ensure that our staff only spend time on the most ambiguous cases, while the clear-cut requests are handled quickly and automatically.

Transparency

Compliance and regulation demand transparency. We have code that reads all the LLM output and produces customer communications as well as an audit trail. For our demo, we generate the trail as a readable Markdown document. Even in this simple form, it's a useful record for compliance teams, auditors, and regulators.
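Generating that Markdown can be very plain code. A sketch, assuming a log of (stage, detail) pairs; the function name and fields are illustrative, not the demo's actual API:

```python
from datetime import datetime, timezone

def render_audit_trail(request_id: str, log: list) -> str:
    """Render the decision log as a Markdown document with one row per
    pipeline event, so every step is visible in order."""
    lines = [
        f"# Audit Trail: {request_id}",
        f"Generated: {datetime.now(timezone.utc).isoformat()}",
        "",
        "| Stage | Detail |",
        "| --- | --- |",
    ]
    for stage, detail in log:
        lines.append(f"| {stage} | {detail} |")
    return "\n".join(lines)
```

Because every stage appends to the same log, the finished document reads as a chronological account of the decision, including the policy the Auditor cited.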

Final Thoughts

Even though this is allegedly an "AI Powered Demo", a good amount of human thought went into defining its constraints and rules. People familiar with Operations Tech will undoubtedly see echoes of previous work. Here, we defined the 'hard rules' and the search strategies, and treated the LLM as a specialized service for specific "fuzzy" tasks within the workflow.

Where possible, we used good old Python code and a State Machine to define the workflow structure. LLMs were used in a very specific way to drive understanding, with full knowledge that they can fail and clear signals when they do. How this workflow handles system errors is not covered here, but I'm sure you can imagine the possibilities.

Of course, this system can make mistakes! Traditional ML-based systems make mistakes too, and they've been productionized with due governance and oversight. Humans make mistakes too! From an engineering standpoint, ensuring as much determinism as possible, plus testing and review, is key to delivering these systems for real.

If you want to have a play around with the code, it’s at github.com/pdvcs/ai-orchestrator-demo.