AWS Community Day Turkiye 2026

Building an AI Agent
That Detects Financial Crime
on AWS

opensource.finance

9 May 2026 · Istanbul

02 // THE BIND

Every team is stuck
between two failures.

Support chatbot · code review assistant · KYC bot · refund agent · document summarizer · incident postmortem writer. Same trap.

Failure 1

Add AI badly

Hallucinated reply to a real customer
Compliance breach buried in your audit log
Viral screenshot Monday morning
Lawyer email Tuesday

Failure 2

Don't add AI

Your competitor ships their "AI-powered" version
Your CTO asks why you didn't
Your roadmap reads like 2022
Recruits ask if you're "using LLMs"

There's a third option.

03 // REFRAME

Use AI to explain.
Not to decide.

The agent is not the judge

04 // WHO

I helped build fraud detection
used in 29 countries.

Yusuf Goksu. Platform engineer, Cambridge. Among the initial engineers on Tazama, funded by the Gates Foundation, now at the Linux Foundation. AWS Community Builder since 2022.

tazama.org

05 // THE GAP

Tazama needed a platform team
and on-prem servers.
I saw a gap.

Osprey: open-source transaction monitoring. Go, single binary, 60-second deploy. Apache 2.0. The version I wish had existed.

opensource.finance

06 // THE PROBLEM

A suspicious pattern
never looks suspicious alone.

# Seven transfers, 24 hours, one customer
tx_1042  09:14  TRY 198,500   -> acct_391
tx_1043  10:21  TRY 199,200   -> acct_412
tx_1044  11:08  TRY 197,800   -> acct_391
tx_1045  13:55  TRY 198,000   -> acct_507
tx_1046  17:02  TRY 199,500   -> acct_412
tx_1047  19:41  TRY 196,400   -> acct_507
tx_1048  22:18  TRY 195,800   -> acct_391

# Each one: under the TRY 200,000 declaration threshold.
# Together: structuring. A crime in itself.

MASAK requires banks to record source and purpose for any transfer of TRY 200,000 or more (effective 1 Jan 2026). Splitting amounts to stay under this threshold is structuring -- prosecutable on its own, regardless of where the funds come from.
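The aggregation that makes the pattern visible can be sketched in a few lines. This is a minimal illustration, not Osprey's implementation: the `MIN_TRANSFERS` cut-off, the placeholder date, and the function name `detect_structuring` are assumptions; the transfers and the threshold are the ones listed above.

```python
from datetime import datetime, timedelta

THRESHOLD_TRY = 200_000        # declaration threshold quoted above
WINDOW = timedelta(hours=24)
MIN_TRANSFERS = 3              # assumption: minimum near-misses to call it a pattern

# The seven transfers listed above: (id, time, amount in TRY, counterparty).
TRANSFERS = [
    ("tx_1042", "09:14", 198_500, "acct_391"),
    ("tx_1043", "10:21", 199_200, "acct_412"),
    ("tx_1044", "11:08", 197_800, "acct_391"),
    ("tx_1045", "13:55", 198_000, "acct_507"),
    ("tx_1046", "17:02", 199_500, "acct_412"),
    ("tx_1047", "19:41", 196_400, "acct_507"),
    ("tx_1048", "22:18", 195_800, "acct_391"),
]


def detect_structuring(transfers, day="2026-01-05"):  # the date is a placeholder
    """Return a summary of the first 24h window that looks like structuring, else None."""
    below = sorted(
        (datetime.strptime(f"{day} {hhmm}", "%Y-%m-%d %H:%M"), tx_id, amount, dest)
        for tx_id, hhmm, amount, dest in transfers
        if amount < THRESHOLD_TRY
    )
    for i, first in enumerate(below):
        # Every sub-threshold transfer within 24 hours of this one.
        window = [row for row in below[i:] if row[0] - first[0] <= WINDOW]
        total = sum(row[2] for row in window)
        if len(window) >= MIN_TRANSFERS and total >= THRESHOLD_TRY:
            return {
                "tx_ids": [row[1] for row in window],
                "total_try": total,
                "counterparties": sorted({row[3] for row in window}),
            }
    return None


result = detect_structuring(TRANSFERS)
print(result["total_try"], result["counterparties"])
```

No single row trips the threshold; the window does, with a combined total of TRY 1,385,200 across three counterparties.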

07 // THE WORK

The real system has four jobs

1. Detect

Detect the pattern

Deterministic rules. Same input, same output. Replayable.

2. Explain

Show the evidence

Which rules fired, on which events, with what context.

3. Audit

Preserve the trail

Immutable record. Regulators will ask.

4. Escalate

Hand off to a human

The model drafts. The analyst decides. Always.

08 // BOUNDARIES

What AI should not do

Never

Decide guilt

Non-deterministic systems cannot own regulated decisions.

Never

File reports alone

Suspicious activity reports are legal documents.

Never

Override rules

If a rule fired, the agent explains it. It does not dismiss it.

Never

Reason as evidence

Evidence must point to data, not to model output.

09 // BOUNDARIES

What AI can safely do

Yes

Cross-correlate

Turn alert data into prose a human can read.

Yes

Filter noise

Pull transactions, profiles, graphs. Look up, do not invent.

Yes

Draft narratives

Who, what, when, where, why, how. From facts, reviewed by a person.

Yes

Say "I don't know"

"Source of funds: not in available data." That is a useful answer.

10 // ARCHITECTURE

The pattern at 10,000 feet

EventBridge: transaction events in

→

Lambda / ECS: deterministic detection

→

DynamoDB + S3: evidence + audit store

→

Step Functions: investigation workflow

→

Bedrock Agent: tool calls + draft

→

Human Review: approve, edit, reject

Bedrock Guardrails on every model call · CloudTrail on every state change · IAM per tenant
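The investigation workflow stage might look like the Step Functions definition below, built here as a Python dict. It is a sketch under assumptions: the state names, the Lambda function names, and the queue URL are placeholders, and the human-review step uses the task-token callback pattern so the workflow pauses until an analyst responds.

```python
import json


def lambda_task(function_name, next_state):
    """A Task state that invokes a Lambda function and passes its payload on."""
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::lambda:invoke",
        "Parameters": {"FunctionName": function_name, "Payload.$": "$"},
        "OutputPath": "$.Payload",
        "Next": next_state,
    }


# Function names and the queue URL are hypothetical placeholders.
WORKFLOW = {
    "Comment": "Investigation workflow sketch: evidence, draft, human decision",
    "StartAt": "GatherEvidence",
    "States": {
        "GatherEvidence": lambda_task("gather-evidence", "DraftNarrative"),
        "DraftNarrative": lambda_task("draft-narrative", "HumanReview"),
        "HumanReview": {
            "Type": "Task",
            # Pauses the execution until the analyst's tool returns the task token.
            "Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
            "Parameters": {
                "QueueUrl": "https://sqs.eu-west-1.amazonaws.com/111122223333/analyst-review",
                "MessageBody": {"draft.$": "$", "taskToken.$": "$$.Task.Token"},
            },
            "End": True,
        },
    },
}


def state_order(workflow):
    """Walk the Next pointers from StartAt to the terminal state."""
    order, name = [], workflow["StartAt"]
    while name:
        order.append(name)
        name = workflow["States"][name].get("Next")
    return order


print(json.dumps(state_order(WORKFLOW)))
```

The point of the shape is that the last state is always a person: the model's draft cannot move the alert forward on its own.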

11 // DETECTION

Detection belongs in
deterministic tools

# CEL rule: a single transfer just under the MASAK threshold (a structuring signal)
expression: |
  amount >= 180000.0 &&
  amount < 200000.0 &&
  currency == "TRY"

# Same input, same output. Forever.
# Replayable. Auditable. Deterministic.
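What "replayable" buys you can be shown with a Python mirror of the expression above. This is an illustration only: Osprey evaluates CEL, not Python, and the event shape, the function names, and the fingerprinting step are assumptions made for the example.

```python
import hashlib
import json


def near_threshold(event):
    """Python mirror of the CEL expression above."""
    return (
        event["amount"] >= 180000.0
        and event["amount"] < 200000.0
        and event["currency"] == "TRY"
    )


def replay(events):
    """Evaluate the rule over a recorded event log and fingerprint the outcome."""
    decisions = [[event["id"], near_threshold(event)] for event in events]
    digest = hashlib.sha256(json.dumps(decisions).encode()).hexdigest()
    return decisions, digest


# Hypothetical recorded events; the first amount comes from the earlier slide.
LOG = [
    {"id": "tx_1042", "amount": 198500.0, "currency": "TRY"},
    {"id": "tx_2001", "amount": 200000.0, "currency": "TRY"},
    {"id": "tx_2002", "amount": 198500.0, "currency": "EUR"},
]

first_run = replay(LOG)
second_run = replay(LOG)
print(first_run[0], first_run[1] == second_run[1])
```

Replaying the same log months later yields the same decisions and the same digest, which is the property an auditor can check without trusting anyone's memory.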

12 // THE AGENT

The agent gets tools,
not authority.

# Bedrock action group: the entire tool surface
tools:
  - get_alert(alert_id)
  - get_transactions(customer_id, window)
  - get_rule_explanation(rule_id)
  - get_counterparty_graph(customer_id, depth)
  - draft_narrative(alert_id, evidence)

# Five functions. That's the agent's whole world.
# A small tool surface is a safety feature.
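One way to enforce that surface in code is an allowlist in front of the handlers. The sketch below is hypothetical: `dispatch`, `ToolNotAllowed`, and the stub handler are invented for illustration and are not the Bedrock action group API; only the tool and argument names come from the list above.

```python
# Tool names and argument names are taken from the list above.
TOOL_SURFACE = {
    "get_alert": ("alert_id",),
    "get_transactions": ("customer_id", "window"),
    "get_rule_explanation": ("rule_id",),
    "get_counterparty_graph": ("customer_id", "depth"),
    "draft_narrative": ("alert_id", "evidence"),
}


class ToolNotAllowed(Exception):
    """Raised when the model asks for a tool outside the declared surface."""


def dispatch(handlers, tool_name, **arguments):
    """Run a tool call only if the name and the argument names match the surface."""
    if tool_name not in TOOL_SURFACE:
        raise ToolNotAllowed(tool_name)
    expected = set(TOOL_SURFACE[tool_name])
    if set(arguments) != expected:
        raise TypeError(f"{tool_name} expects exactly {sorted(expected)}")
    return handlers[tool_name](**arguments)


# Hypothetical stub handler, standing in for a real lookup.
handlers = {
    "get_alert": lambda alert_id: {"alert_id": alert_id, "rules": ["threshold_split"]},
}

print(dispatch(handlers, "get_alert", alert_id="al_1"))
```

A request such as `file_report` never reaches a handler, because no handler for it can be registered in the first place.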

13 // NARRATOR

Why not just call Claude?

Frontier APIs send your data off-premises, can give different answers to the same prompt, and bill per token. A small fine-tuned model is private, consistent, and cheap.

Base model

Qwen3-4B-Instruct-2507

4-bit quantised. Decoder-only. Bedrock-supported architecture (Qwen3ForCausalLM).

Method

LoRA · Unsloth + TRL

16.5M trainable / 4.0B base (0.41%). ~70 min on a T4 GPU (g4dn.xlarge equivalent).

Coverage

12 rules · 6 typologies

FATF typologies: structuring, layering, smurfing, PEP exposure, shell companies, geographic risk.

Performance

Test perplexity 2.65

Trained on 3,000 synthetic samples. No real customer data. Ships on Ollama and HuggingFace.

ollama.com/josephgoksu/osprey-narrator · huggingface.co/josephgoksu/osprey-narrator-v0.1 · Apache 2.0

14 // FINE-TUNING

Small model, narrow task,
cheap to train.

# LoRA config (Unsloth + HuggingFace TRL)
base_model       = "Qwen/Qwen3-4B-Instruct-2507"   # 4-bit quantised
lora_rank        = 8
lora_alpha       = 16
target_modules   = ["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"]
trainable_params = 16_500_000      # 0.41% of 4.0B base
max_seq_length   = 4096

# Training run
dataset          = 3_000 synthetic alert -> narrative pairs
epochs           = 1
batch_size       = 2          # effective 8 with grad accumulation
learning_rate    = 2e-5
total_steps      = 375
duration         = ~70 min on a single NVIDIA T4 (16 GB)
final_test_loss  = 0.976      # perplexity 2.65

# AWS path: same recipe runs on g4dn.xlarge (T4) or g5.xlarge (A10G)
# For Bedrock Custom Model Import: merge LoRA, export safetensors, S3.

Unsloth gives ~2x training speedup over plain HF Transformers on the same GPU. Synthetic-only training: no real customer or transaction data was used.
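The headline numbers in the config can be cross-checked with a few lines of arithmetic. The layer dimensions below are assumptions taken from the public Qwen3-4B model config, not from the slides; the rank, target modules, dataset size, effective batch size, and test loss are the ones shown above.

```python
import math

# Assumed Qwen3-4B dimensions (from the public model config, not from the slides).
HIDDEN, INTERMEDIATE, LAYERS = 2560, 9728, 36
Q_OUT = 32 * 128    # attention heads * head_dim
KV_OUT = 8 * 128    # key/value heads * head_dim
RANK = 8            # lora_rank from the config above

# (in_features, out_features) for each targeted projection.
SHAPES = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

# LoRA adds A (rank x in) and B (out x rank) for every targeted matrix.
per_layer = sum(RANK * (n_in + n_out) for n_in, n_out in SHAPES.values())
trainable = per_layer * LAYERS

steps = 3_000 // 8              # samples / effective batch size, one epoch
perplexity = math.exp(0.976)    # perplexity = exp(cross-entropy loss)

print(trainable, f"{100 * trainable / 4.0e9:.2f}%", steps, round(perplexity, 2))
```

Under those assumed dimensions the count comes to 16,515,072 adapter parameters, in line with the 16.5M and 0.41% quoted above, and the step count and perplexity follow directly from the dataset size and test loss.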

15 // INFERENCE

Same weights. Two runtimes.

Ollama when you control the machine. Bedrock CMI when you want AWS to manage it.

Local · dev

Ollama

Runs on a laptop. Free. Starts instantly. Good for development, demos, on-prem.

Production · AWS

Bedrock Custom Model Import

Hosted by AWS. Scales to zero when idle. Pay only when used. IAM and audit built in.

16 // PROOF

I edited this talk
on the plane.

No wifi. No frontier API. Just a small open model running on the laptop in my lap.

10% battery left
0 internet
100% local inference
Qwen3 32B on Ollama

Different model from the Narrator (this one is general-purpose, 32B). Same setup: open weights, Ollama, no API.

17 // EVIDENCE

The evidence store
is the product.

DynamoDB

Live state

Active alerts, transaction state, customer profiles. Hot reads, low latency.

S3

Immutable archive

Every event, every rule firing, every model call. Object Lock.

CloudTrail

Control plane

Who deployed which rule, when, with which IAM role.

The product is not the model. The product is the audit trail that lets a regulator believe you.
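The idea of a trail that cannot be quietly rewritten can be illustrated with a hash chain. To be clear, this is a different technique from S3 Object Lock, swapped in because it runs anywhere: each record commits to the one before it, so any later edit is detectable. The record layout and function names are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first record


def _digest(prev_hash, event):
    body = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode()).hexdigest()


def append(log, event):
    """Append an event, chaining it to the hash of the previous record."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)})


def verify(log):
    """Recompute every hash; False if any record was altered or reordered."""
    prev_hash = GENESIS
    for record in log:
        if record["prev"] != prev_hash or record["hash"] != _digest(prev_hash, record["event"]):
            return False
        prev_hash = record["hash"]
    return True


trail = []
append(trail, {"type": "rule_fired", "rule": "threshold_split", "tx": "tx_1042"})
append(trail, {"type": "model_call", "tool": "draft_narrative", "alert": "al_1"})
print(verify(trail))

trail[0]["event"]["rule"] = "none"   # simulate tampering with history
print(verify(trail))
```

In the architecture described here the storage layer provides the immutability; the sketch only shows what "a regulator can check it" means in practice.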

18 // GUARDRAILS

Guardrails are
not a checkbox.

Input

Block prompt injection

Guardrails evaluate input before the model sees it.

Output

Block policy violations

The model cannot claim guilt or invent unfired rules.

PII

Mask sensitive data

Account numbers, IDs, addresses. Once configured, always applied.

Scope

Constrain the domain

Only this alert, only this evidence. No legal or financial advice.
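The output policy can also be backed by a plain deterministic check after the model call. The sketch below is not the Bedrock Guardrails API; it is a hypothetical post-check, and the rule catalogue and the guilt phrases are invented for illustration.

```python
import re

# Hypothetical rule catalogue; only the first two names appear in this talk.
KNOWN_RULES = {"threshold_split", "rapid_movement", "pep_exposure"}

# Assumed phrases that would count as the model asserting guilt.
GUILT_CLAIM = re.compile(
    r"\b(is guilty|committed fraud|is laundering money)\b", re.IGNORECASE
)


def check_draft(draft, fired_rules):
    """Return a list of policy violations found in a model draft."""
    violations = []
    if GUILT_CLAIM.search(draft):
        violations.append("claims guilt")
    for rule in sorted(KNOWN_RULES - set(fired_rules)):
        if rule in draft:
            violations.append(f"cites unfired rule: {rule}")
    return violations


fired = ["threshold_split", "rapid_movement"]
good = "Rules threshold_split and rapid_movement fired. Source of funds: not in available data."
bad = "The customer is guilty. pep_exposure also fired."

print(check_draft(good, fired))
print(check_draft(bad, fired))
```

A draft with any violation goes back for regeneration or straight to the analyst with the violation attached; it is never shown as clean.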

19 // DEMO

One transaction. One narrative.

INPUT:  TRY 198,500 transfer to acct_391
RULES FIRED:  threshold_split, rapid_movement

DRAFT (analyst review required):
"Customer cust_77 made seven transfers in 24 hours,
each below the TRY 200,000 MASAK declaration threshold.
Three recurring counterparties. Total: TRY 1,385,200.

Source of funds: not in available data.
Next step: analyst review."
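The "say I don't know" convention in that draft can be pinned down with a deterministic template. This is a stand-in for illustration, not the Narrator model: the `facts` dictionary and its field names are assumptions, and the values are the ones from the demo.

```python
MISSING = "not in available data"


def draft_narrative(facts):
    """Fill a fixed template from looked-up facts; never guess a missing one."""

    def get(name):
        value = facts.get(name)
        return MISSING if value is None else value

    total = facts.get("total_try")
    total_text = MISSING if total is None else f"TRY {total:,}"
    rules = ", ".join(facts.get("rules_fired") or [MISSING])
    return "\n".join(
        [
            "DRAFT (analyst review required):",
            f"Customer {get('customer_id')} made {get('transfer_count')} transfers in 24 hours.",
            f"Total: {total_text}.",
            f"Rules fired: {rules}.",
            f"Source of funds: {get('source_of_funds')}.",
            "Next step: analyst review.",
        ]
    )


facts = {
    "customer_id": "cust_77",
    "transfer_count": 7,
    "total_try": 1_385_200,
    "rules_fired": ["threshold_split", "rapid_movement"],
    # source_of_funds is deliberately absent: nothing in the evidence states it.
}
print(draft_narrative(facts))
```

Every sentence traces back to a field that was looked up, and an absent field produces an explicit gap rather than a plausible guess.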

20 // ISOLATION

Tenant isolation matters
more than the model.

EventBridge

Bus per tenant

source = "fintech.${tenant}"

Lambda

Tenant ID required

Every invocation carries tenant_id.

DynamoDB

Partition prefix

Partition key starts with tenant_id.

S3

Bucket prefix

Per-tenant prefix scoped by IAM.

Bedrock

Tenant in context

Agent invocation includes tenant.

Guardrails

Per-tenant policy

Different tenants, different rules.

A leaked alert across tenants is not a bug. It is a contract violation. Isolate at every layer.
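Deriving every identifier from the tenant ID in one place keeps the layers consistent. The helpers below are a sketch under assumptions: the tenant ID format, the `#` separator, the `tenants/` prefix, and all function names are hypothetical; only the `fintech.${tenant}` source pattern comes from the slide.

```python
import re

TENANT_ID = re.compile(r"[a-z0-9-]{1,32}")   # assumed tenant id format


def checked(tenant_id):
    """Reject tenant ids that could smuggle a separator into a key or prefix."""
    if not TENANT_ID.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return tenant_id


def event_source(tenant_id):
    return f"fintech.{checked(tenant_id)}"


def partition_key(tenant_id, kind, item_id):
    return f"{checked(tenant_id)}#{kind}#{item_id}"


def s3_prefix(tenant_id):
    return f"tenants/{checked(tenant_id)}/"


def require_tenant(tenant_id, key):
    """Refuse to touch a DynamoDB key that belongs to another tenant."""
    if not key.startswith(f"{checked(tenant_id)}#"):
        raise PermissionError(f"{key!r} is outside tenant {tenant_id!r}")
    return key


key = partition_key("acme", "alert", "al_1")
print(event_source("acme"), key, s3_prefix("acme"))
```

These checks complement IAM scoping rather than replace it: the application refuses cross-tenant keys, and the policy makes sure a bug in the application cannot read them anyway.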

21 // LESSONS

I added these eventually.
I'd add them first.

Five things I added late. You should add them first.

Step Functions

I bolted orchestration on at v0.4. Start with the workflow on day one, even if it has one step.

Bedrock Custom Model Import

Shipped on Ollama first. CMI gets you IAM, audit, and scale-to-zero from the first deploy.

DynamoDB Streams

Retrofitted as event-driven memory. Stream into Bedrock for context recall from day one.

VPC Endpoints

Configure before the first NAT Gateway invoice arrives. Mine arrived first.

Guardrails as architecture

Designed in, not bolted on. Treat them as a hard requirement, not a launch-week task.

22 // PRINCIPLES

The non-negotiables

Evidence over output

The agent's draft is not evidence. The data behind it is.

Replay over trust

Same input, same result. Always.

Human review over automation

Non-deterministic systems cannot own regulated decisions.

Audit over speed

Every state transition logged. Forever.

23 // PATTERN

Steal the shape. Bring your own domain.

Same shape works for your problem too. Swap three things and the architecture stays the same.

Your rules

CEL · github.com/google/cel-spec

Google's Common Expression Language. Sandboxed, embeddable, language-agnostic. Write detection rules for any domain and Osprey runs them.

Your narrator

Same 70-min recipe

Small open model + LoRA + ~3K synthetic samples. Swap the prose: clinical notes, runbooks, refund letters, support replies.

Your deployment

Same AWS shape

EventBridge → Lambda → DynamoDB/S3 → Bedrock CMI → human review. Different events, same audit trail.

Caveat: regulated decisions still need a human in the loop. The pattern travels. The responsibility doesn't.

24 // OPEN SOURCE

If you want to go deeper.

Engine

Osprey

Go binary, Apache 2.0, CEL rules, 60-second deploy. Fork it, write your own rules.

Model

Osprey Narrator

Qwen3-4B + LoRA. Q4_K_M GGUF on Ollama, safetensors for Bedrock CMI.

Reference

Tazama

Linux Foundation real-time fraud platform. Digital Public Good.

github.com/opensource-finance

25 // Q&A

First evidence, then narrative.

The model drafts. The analyst decides.

josephgoksu.com · opensource.finance · github.com/josephgoksu

Thank you
