AWS Community Day Turkiye 2026

Building an AI Agent
That Detects Financial Crime
on AWS

opensource.finance
9 May 2026 · Istanbul
02 // THE BIND

Every team is stuck
between two failures.

Support chatbot · code review assistant · KYC bot · refund agent · document summarizer · incident postmortem writer. Same trap.

Failure 1

Add AI badly

  • Hallucinated reply to a real customer
  • Compliance breach buried in your audit log
  • Viral screenshot Monday morning
  • Lawyer email Tuesday

Failure 2

Don't add AI

  • Your competitor ships their "AI-powered" version
  • Your CTO asks why you didn't
  • Your roadmap reads like 2022
  • Candidates ask if you're "using LLMs"

There's a third option.

03 // REFRAME

Use AI to explain.
Not to decide.

The agent is not the judge
04 // WHO

I helped build fraud detection
used in 29 countries.

Yusuf Goksu. Platform engineer, Cambridge. One of the initial engineers on Tazama, a project funded by the Gates Foundation and now hosted by the Linux Foundation. AWS Community Builder since 2022.

tazama.org
05 // THE GAP

Tazama needed a platform team
and on-prem servers.
I saw a gap.

Osprey: open-source transaction monitoring. Go, single binary, 60-second deploy. Apache 2.0. The version I wish had existed.

opensource.finance
06 // THE PROBLEM

A suspicious pattern
never looks suspicious alone.

# Seven transfers, 24 hours, one customer
tx_1042  09:14  TRY 198,500   -> acct_391
tx_1043  10:21  TRY 199,200   -> acct_412
tx_1044  11:08  TRY 197,800   -> acct_391
tx_1045  13:55  TRY 198,000   -> acct_507
tx_1046  17:02  TRY 199,500   -> acct_412
tx_1047  19:41  TRY 196,400   -> acct_507
tx_1048  22:18  TRY 195,800   -> acct_391

# Each one: under the TRY 200,000 declaration threshold.
# Together: structuring. A crime in itself.

MASAK requires banks to record the source and purpose of any transfer of TRY 200,000 or more (effective 1 Jan 2026). Splitting a transfer into smaller amounts to stay under this threshold is structuring: prosecutable on its own, regardless of where the funds come from.
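Structuring only becomes visible in aggregate. The following is a minimal Python sketch of a deterministic 24-hour window check over the seven transfers above. The `Transfer` record, the 180,000 "near threshold" floor, and the three-transfer minimum are hypothetical parameters chosen for illustration. They are not MASAK's or Osprey's actual values.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta

THRESHOLD = 200_000           # TRY declaration threshold
NEAR_FLOOR = 180_000          # hypothetical "just under the threshold" floor
MIN_COUNT = 3                 # hypothetical: near-threshold transfers needed
WINDOW = timedelta(hours=24)


@dataclass(frozen=True)
class Transfer:               # hypothetical record shape
    tx_id: str
    at: datetime
    amount: int               # whole TRY
    counterparty: str


def is_near_threshold(t: Transfer) -> bool:
    return NEAR_FLOOR <= t.amount < THRESHOLD


def find_structuring(transfers: list[Transfer]) -> list[Transfer] | None:
    """Largest set of near-threshold transfers inside any 24h window,
    if it has at least MIN_COUNT members. Pure function: replayable."""
    near = sorted((t for t in transfers if is_near_threshold(t)), key=lambda t: t.at)
    best = None
    for i, start in enumerate(near):
        window = [t for t in near[i:] if t.at - start.at < WINDOW]
        if len(window) >= MIN_COUNT and (best is None or len(window) > len(best)):
            best = window
    return best


def _t(tx_id: str, hhmm: str, amount: int, cp: str) -> Transfer:
    h, m = map(int, hhmm.split(":"))
    return Transfer(tx_id, datetime(2026, 1, 5, h, m), amount, cp)  # arbitrary date


SAMPLE = [
    _t("tx_1042", "09:14", 198_500, "acct_391"),
    _t("tx_1043", "10:21", 199_200, "acct_412"),
    _t("tx_1044", "11:08", 197_800, "acct_391"),
    _t("tx_1045", "13:55", 198_000, "acct_507"),
    _t("tx_1046", "17:02", 199_500, "acct_412"),
    _t("tx_1047", "19:41", 196_400, "acct_507"),
    _t("tx_1048", "22:18", 195_800, "acct_391"),
]

hit = find_structuring(SAMPLE)
print(len(hit), sum(t.amount for t in hit))   # 7 1385200
```

No single row trips anything. The window does.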

07 // THE WORK

The real system has four jobs

1. Detect

Detect the pattern

Deterministic rules. Same input, same output. Replayable.

2. Explain

Show the evidence

Which rules fired, on which events, with what context.

3. Audit

Preserve the trail

Immutable record. Regulators will ask.

4. Escalate

Hand off to a human

The model drafts. The analyst decides. Always.

08 // BOUNDARIES

What AI should not do

Never

Decide guilt

Non-deterministic systems cannot own regulated decisions.

Never

File reports alone

Suspicious activity reports are legal documents.

Never

Override rules

If a rule fired, the agent explains it. It does not dismiss it.

Never

Cite its own reasoning as evidence

Evidence must point to data, not to model output.

09 // BOUNDARIES

What AI can safely do

Yes

Cross-correlate

Pull transactions, profiles, graphs. Look up, do not invent.

Yes

Filter noise

Turn alert data into prose a human can read.

Yes

Draft narratives

Who, what, when, where, why, how. From facts, reviewed by a person.

Yes

Say "I don't know"

"Source of funds: not in available data." That is a useful answer.
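One way to make "I don't know" the default is to render every narrative field from looked-up data and fall back to a fixed phrase when a lookup returns nothing. The guess never gets made. This is a minimal sketch, and the field names and the `render_facts` helper are hypothetical.

```python
from __future__ import annotations

UNKNOWN = "not in available data"

# Hypothetical narrative fields, in the order they are rendered.
FIELDS = ("customer", "pattern", "total", "source_of_funds")


def render_facts(facts: dict[str, str | None]) -> str:
    """One line per field. A missing or empty value renders as UNKNOWN;
    nothing is inferred to fill the gap."""
    lines = []
    for field in FIELDS:
        value = facts.get(field)
        label = field.replace("_", " ").capitalize()
        lines.append(f"{label}: {value or UNKNOWN}.")
    return "\n".join(lines)


print(render_facts({"customer": "cust_77", "total": "TRY 1,385,200"}))
# Customer: cust_77.
# Pattern: not in available data.
# Total: TRY 1,385,200.
# Source of funds: not in available data.
```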

10 // ARCHITECTURE

The pattern at 10,000 feet

EventBridge: transaction events in
Lambda / ECS: deterministic detection
DynamoDB + S3: evidence + audit store
Step Functions: investigation workflow
Bedrock Agent: tool calls + draft
Human Review: approve, edit, reject

Bedrock Guardrails on every model call · CloudTrail on every state change · IAM per tenant

11 // DETECTION

Detection belongs in
deterministic tools

# CEL rule: structuring under the MASAK threshold
expression: |
  amount >= 180000.0 &&
  amount < 200000.0 &&
  currency == "TRY"

# Same input, same output. Forever.
# Replayable. Auditable. Deterministic.
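The sketch below shows why the rule replays cleanly. It re-expresses the same predicate as a pure Python function, which is a hand transliteration for illustration and not a CEL evaluator. It then replays the predicate over a few recorded events and hashes the results into a digest that an auditor could recompute. The event shape, the demo transaction IDs, and `replay_digest` are hypothetical.

```python
import hashlib
import json


def near_threshold_try(event: dict) -> bool:
    # Mirrors: amount >= 180000.0 && amount < 200000.0 && currency == "TRY"
    return (
        event["amount"] >= 180000.0
        and event["amount"] < 200000.0
        and event["currency"] == "TRY"
    )


def replay_digest(events: list) -> str:
    """Evaluate the rule over recorded events and hash the (tx_id, fired)
    pairs. Same events in, same digest out."""
    results = [(e["tx_id"], near_threshold_try(e)) for e in events]
    payload = json.dumps(results, separators=(",", ":"))
    return hashlib.sha256(payload.encode()).hexdigest()


EVENTS = [
    {"tx_id": "tx_1042", "amount": 198500.0, "currency": "TRY"},    # fires
    {"tx_id": "tx_demo_1", "amount": 200000.0, "currency": "TRY"},  # at threshold
    {"tx_id": "tx_demo_2", "amount": 150000.0, "currency": "TRY"},  # too small
    {"tx_id": "tx_demo_3", "amount": 198500.0, "currency": "USD"},  # wrong currency
]

print([near_threshold_try(e) for e in EVENTS])   # [True, False, False, False]
```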
12 // THE AGENT

The agent gets tools,
not authority.

# Bedrock action group: the entire tool surface
tools:
  - get_alert(alert_id)
  - get_transactions(customer_id, window)
  - get_rule_explanation(rule_id)
  - get_counterparty_graph(customer_id, depth)
  - draft_narrative(alert_id, evidence)

# Five functions. That's the agent's whole world.
# A small tool surface is a safety feature.
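On AWS, an action group is typically backed by a Lambda function, and that is where the allowlist can live. This is a minimal sketch that assumes the function-details event and response shape documented for Bedrock Agents action groups (`actionGroup`, `function`, `parameters` in; `functionResponse.responseBody.TEXT.body` out). Only two of the five tools are wired up. The stub data stands in for real DynamoDB/S3 lookups and is hypothetical.

```python
import json

# Hypothetical stand-ins for real evidence-store lookups.
_RULES = {"threshold_split": "amount >= 180000 and < 200000, currency TRY"}
_ALERTS = {"alert_1": {"customer_id": "cust_77", "rules": ["threshold_split"]}}


def get_alert(alert_id: str) -> dict:
    return _ALERTS.get(alert_id, {})


def get_rule_explanation(rule_id: str) -> dict:
    return {"rule_id": rule_id, "explanation": _RULES.get(rule_id)}


# The allowlist: any function not named here is refused, not attempted.
TOOLS = {
    "get_alert": get_alert,
    "get_rule_explanation": get_rule_explanation,
}


def lambda_handler(event, context):
    name = event.get("function")
    params = {p["name"]: p["value"] for p in event.get("parameters", [])}
    tool = TOOLS.get(name)
    if tool is None:
        body = json.dumps({"error": f"tool not allowed: {name}"})
    else:
        try:
            body = json.dumps(tool(**params))
        except TypeError:  # e.g. wrong or missing parameters
            body = json.dumps({"error": f"bad parameters for {name}"})
    return {
        "messageVersion": event.get("messageVersion", "1.0"),
        "response": {
            "actionGroup": event.get("actionGroup"),
            "function": name,
            "functionResponse": {"responseBody": {"TEXT": {"body": body}}},
        },
    }
```

Refusing unknown functions in code means a prompt-injected "file_sar" call fails closed, whatever the model was persuaded to ask for.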
13 // NARRATOR

Why not just call Claude?

Frontier APIs send your data out, can give different answers to the same prompt, and cost money per token. A small fine-tuned model is private, consistent, and cheap.

Base model

Qwen3-4B-Instruct-2507

4-bit quantised. Decoder-only. Bedrock-supported architecture (Qwen3ForCausalLM).

Method

LoRA · Unsloth + TRL

16.5M trainable / 4.0B base (0.41%). ~70 min on a T4 GPU (g4dn.xlarge equivalent).

Coverage

12 rules · 6 typologies

FATF typologies: structuring, layering, smurfing, PEP exposure, shell companies, geographic risk.

Performance

Test perplexity 2.65

Trained on 3,000 synthetic samples. No real customer data. Ships on Ollama and Hugging Face.

ollama.com/josephgoksu/osprey-narrator · huggingface.co/josephgoksu/osprey-narrator-v0.1 · Apache 2.0

14 // FINE-TUNING

Small model, narrow task,
cheap to train.

# LoRA config (Unsloth + Hugging Face TRL)
base_model       = "Qwen/Qwen3-4B-Instruct-2507"   # loaded 4-bit quantised
lora_rank        = 8
lora_alpha       = 16
target_modules   = ["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"]
trainable_params = 16_500_000      # ~0.41% of the 4.0B base
max_seq_length   = 4096

# Training run
dataset_size     = 3_000           # synthetic alert -> narrative pairs
epochs           = 1
batch_size       = 2               # per device
grad_accum_steps = 4               # effective batch size 2 x 4 = 8
learning_rate    = 2e-5
total_steps      = 375             # 3_000 samples / 8 per step x 1 epoch
duration_min     = 70              # approx., single NVIDIA T4 (16 GB)
final_test_loss  = 0.976           # perplexity = exp(0.976) ≈ 2.65

# AWS path: same recipe runs on g4dn.xlarge (T4) or g5.xlarge (A10G).
# For Bedrock Custom Model Import: merge the LoRA adapter into the base
# weights, export safetensors, upload to S3.

Unsloth gives ~2x training speedup over plain HF Transformers on the same GPU. Synthetic-only training: no real customer or transaction data was used.
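The 16.5M figure can be checked by hand. LoRA of rank r on a linear layer with input width d_in and output width d_out adds r x (d_in + d_out) parameters. The sketch below assumes the published Qwen3-4B configuration: hidden size 2560, intermediate size 9728, 36 layers, and 32 query heads and 8 key/value heads of dimension 128. Check these against the model's `config.json` before relying on them. The same lines also confirm the step count and the loss-to-perplexity conversion.

```python
import math

HIDDEN, INTER, LAYERS = 2560, 9728, 36
Q_OUT = 32 * 128      # query heads x head_dim
KV_OUT = 8 * 128      # key/value heads x head_dim (grouped-query attention)
RANK = 8

# (d_in, d_out) for each LoRA target module in one decoder layer
MODULES = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, INTER),
    "up_proj": (HIDDEN, INTER),
    "down_proj": (INTER, HIDDEN),
}

per_layer = sum(RANK * (d_in + d_out) for d_in, d_out in MODULES.values())
trainable = per_layer * LAYERS
print(f"trainable LoRA params: {trainable:,}")          # 16,515,072
print(f"share of 4.0B base: {trainable / 4.0e9:.2%}")   # 0.41%

steps = math.ceil(3_000 / (2 * 4)) * 1   # samples / effective batch x epochs
print(f"optimizer steps: {steps}")        # 375
print(f"test perplexity: {math.exp(0.976):.2f}")  # 2.65
```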

15 // INFERENCE

Same weights. Two runtimes.

Ollama when you control the machine. Bedrock CMI when you want AWS to manage it.

Local · dev

Ollama

Runs on a laptop. Free. Starts instantly. Good for development, demos, on-prem.

Production · AWS

Bedrock Custom Model Import

Hosted by AWS. Scales to zero when idle. Pay only when used. IAM and audit built in.
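In development, the Narrator is one HTTP call away. This minimal sketch targets Ollama's local REST endpoint (`POST /api/generate` on the default port 11434, non-streaming). The model tag comes from the Ollama link earlier in the deck. The prompt wording is a hypothetical placeholder, not the Narrator's training template. In production, this function would be replaced by a `bedrock-runtime` `invoke_model` call against the imported model's ARN. The request body for that call depends on the imported architecture, so it is not shown.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"   # Ollama's default local port
MODEL = "josephgoksu/osprey-narrator"                 # tag from the ollama.com link


def build_request(alert_facts: dict) -> urllib.request.Request:
    payload = {
        "model": MODEL,
        # Hypothetical prompt; the real Narrator may expect a specific template.
        "prompt": "Draft an analyst narrative from these facts only:\n"
        + json.dumps(alert_facts, sort_keys=True),
        "stream": False,                   # return one JSON object, not a stream
        "options": {"temperature": 0},     # reduce run-to-run variation
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def draft_narrative(alert_facts: dict) -> str:
    """Requires a running Ollama with the model pulled."""
    with urllib.request.urlopen(build_request(alert_facts), timeout=120) as resp:
        return json.loads(resp.read())["response"]
```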

16 // PROOF

I edited this talk
on the plane.

No wifi. No frontier API. Just a small open model running on the laptop in my lap.

  • 10% battery left
  • 0 internet
  • 100% local inference
  • Qwen3 32B on Ollama

Different model from the Narrator (this one is general-purpose, 32B). Same setup: open weights, Ollama, no API.

(Photo: laptop on a plane running a local model.)
17 // EVIDENCE

The evidence store
is the product.

DynamoDB

Live state

Active alerts, transaction state, customer profiles. Hot reads, low latency.

S3

Immutable archive

Every event, every rule firing, every model call. Object Lock.

CloudTrail

Control plane

Who deployed which rule, when, with which IAM role.

The product is not the model. The product is the audit trail that lets a regulator believe you.
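Keeping archive writes in one small function means every event lands the same way. This sketch only builds the keyword arguments for boto3's `s3.put_object`, using its Object Lock and checksum parameters. It assumes the bucket was created with Object Lock enabled. The bucket name, key layout, and seven-year retention are hypothetical choices, not a regulatory figure.

```python
from __future__ import annotations

import hashlib
import json
from datetime import datetime, timedelta, timezone

BUCKET = "osprey-evidence-archive"    # hypothetical; created with Object Lock enabled
RETENTION = timedelta(days=7 * 365)   # hypothetical retention period


def archive_put_kwargs(tenant_id: str, kind: str, record: dict,
                       now: datetime | None = None) -> dict:
    """Arguments for s3.put_object(**kwargs). In COMPLIANCE mode no user,
    including root, can delete the object version or shorten its retention."""
    if now is None:
        now = datetime.now(timezone.utc)
    body = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    digest = hashlib.sha256(body).hexdigest()
    return {
        "Bucket": BUCKET,
        # Content-addressed key under a per-tenant, per-day prefix
        "Key": f"{tenant_id}/{kind}/{now:%Y/%m/%d}/{digest}.json",
        "Body": body,
        "ContentType": "application/json",
        "ChecksumAlgorithm": "SHA256",
        "ObjectLockMode": "COMPLIANCE",
        "ObjectLockRetainUntilDate": now + RETENTION,
    }

# Usage (requires AWS credentials):
#   import boto3
#   boto3.client("s3").put_object(**archive_put_kwargs(
#       "acme-bank", "rule_firing", {"tx_id": "tx_1042", "rule": "threshold_split"}))
```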

18 // GUARDRAILS

Guardrails are
not a checkbox.

Input

Block prompt injection

Guardrails evaluate input before the model sees it.

Output

Block policy violations

The model cannot claim guilt or cite rules that did not fire.

PII

Mask sensitive data

Account numbers, IDs, addresses. Once configured, always applied.

Scope

Constrain the domain

Only this alert, only this evidence. No legal or financial advice.
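Bedrock Guardrails cover the policy layer. A cheap deterministic check in your own code can add a second barrier before a draft reaches the analyst. The sketch below is such an application-side check, not the Guardrails API. It assumes a hypothetical convention in which the narrator cites rules as `[rule:<id>]`, so any citation of an invented or unfired rule is caught. The phrase list is also hypothetical.

```python
from __future__ import annotations

import re

RULE_CITE = re.compile(r"\[rule:([a-z0-9_]+)\]")   # hypothetical citation format
GUILT_PHRASES = ("is guilty", "committed fraud", "is laundering")  # hypothetical


def check_draft(draft: str, fired_rules: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the draft may go to review."""
    problems = []
    cited = set(RULE_CITE.findall(draft))
    for rule in sorted(cited - fired_rules):
        problems.append(f"cites rule that did not fire: {rule}")
    lowered = draft.lower()
    for phrase in GUILT_PHRASES:
        if phrase in lowered:
            problems.append(f"asserts guilt: '{phrase}'")
    return problems
```

A non-empty result blocks the draft and is logged. It never triggers a silent rewrite.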

19 // DEMO

One transaction. One narrative.

INPUT:  TRY 198,500 transfer to acct_391
RULES FIRED:  threshold_split, rapid_movement

DRAFT (analyst review required):
"Customer cust_77 made seven transfers in 24 hours,
each below the TRY 200,000 MASAK declaration threshold.
Three recurring counterparties. Total: TRY 1,385,200.

Source of funds: not in available data.
Next step: analyst review."
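Every number in that draft should come from a lookup, not from the model. This sketch computes the facts deterministically from the seven transfers on the earlier slide, so the narrator only has to phrase them. The `facts_for` helper and its output keys are hypothetical.

```python
from collections import Counter

# (tx_id, amount in TRY, counterparty) from the seven-transfer example
TRANSFERS = [
    ("tx_1042", 198_500, "acct_391"),
    ("tx_1043", 199_200, "acct_412"),
    ("tx_1044", 197_800, "acct_391"),
    ("tx_1045", 198_000, "acct_507"),
    ("tx_1046", 199_500, "acct_412"),
    ("tx_1047", 196_400, "acct_507"),
    ("tx_1048", 195_800, "acct_391"),
]
THRESHOLD = 200_000


def facts_for(customer_id: str, transfers: list) -> dict:
    counts = Counter(cp for _, _, cp in transfers)
    return {
        "customer": customer_id,
        "transfer_count": len(transfers),
        "all_below_threshold": all(amount < THRESHOLD for _, amount, _ in transfers),
        "recurring_counterparties": sorted(cp for cp, n in counts.items() if n > 1),
        "total_try": sum(amount for _, amount, _ in transfers),
        "evidence": [tx_id for tx_id, _, _ in transfers],   # points at data, not prose
    }


facts = facts_for("cust_77", TRANSFERS)
print(f"{facts['transfer_count']} transfers, total TRY {facts['total_try']:,}")
# 7 transfers, total TRY 1,385,200
print(facts["recurring_counterparties"])
# ['acct_391', 'acct_412', 'acct_507']
```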
20 // ISOLATION

Tenant isolation matters
more than the model.

EventBridge

Bus per tenant

source = "fintech.${tenant}"

Lambda

Tenant ID required

Every invocation carries tenant_id.

DynamoDB

Partition prefix

Partition key starts with tenant_id.

S3

Bucket prefix

Per-tenant prefix scoped by IAM.

Bedrock

Tenant in context

Agent invocation includes tenant.

Guardrails

Per-tenant policy

Different tenants, different rules.

A leaked alert across tenants is not a bug. It is a contract violation. Isolate at every layer.
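"Isolate at every layer" is harder to get wrong when every tenant-scoped name is derived in one place and mismatches are refused. In the sketch below, the `fintech.` source prefix follows the slide. The key layouts, the `TenantContext` type, and the tenant-ID format are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical tenant-ID format. Excluding '#' and '/' means one tenant's
# ID can never masquerade as a prefix of another tenant's keys.
_TENANT_ID = re.compile(r"[a-z0-9-]{3,32}")


@dataclass(frozen=True)
class TenantContext:
    tenant_id: str

    def __post_init__(self):
        if not _TENANT_ID.fullmatch(self.tenant_id):
            raise ValueError(f"invalid tenant_id: {self.tenant_id!r}")

    def event_source(self) -> str:              # EventBridge "source" field
        return f"fintech.{self.tenant_id}"

    def alert_pk(self, alert_id: str) -> str:   # DynamoDB partition key
        return f"{self.tenant_id}#ALERT#{alert_id}"

    def s3_prefix(self) -> str:                 # S3 prefix, scoped by IAM
        return f"{self.tenant_id}/"

    def assert_owns(self, pk: str) -> None:
        """Refuse any item whose partition key belongs to another tenant."""
        if not pk.startswith(f"{self.tenant_id}#"):
            raise PermissionError(f"cross-tenant access: {pk!r}")
```

The `#` separator matters. Without it, tenant `acme` would pass a prefix check on `acme-bank`'s items.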

21 // LESSONS

I added these eventually.
I'd add them first.

Five things I added late. You should add them first.

01

Step Functions

I bolted orchestration on at v0.4. Start with the workflow on day one, even if it has one step.

02

Bedrock Custom Model Import

Shipped on Ollama first. CMI gets you IAM, audit, and scale-to-zero from the first deploy.

03

DynamoDB Streams

Retrofitted as event-driven memory. Stream into Bedrock for context recall from day one.

04

VPC Endpoints

Configure before the first NAT Gateway invoice arrives. Mine arrived first.

05

Guardrails as architecture

Designed in, not bolted on. Treat them as a hard requirement, not a launch-week task.
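The first lesson, a workflow with one step, costs very little to write on day one. This sketch uses Python to emit an Amazon States Language definition with a single Task state. The state invokes a detection Lambda through the `arn:aws:states:::lambda:invoke` integration. The function ARN and state names are hypothetical.

```python
import json

DETECT_FN = "arn:aws:lambda:eu-central-1:123456789012:function:osprey-detect"  # hypothetical


def one_step_workflow() -> dict:
    return {
        "Comment": "Day-one workflow: detection only",
        "StartAt": "Detect",
        "States": {
            "Detect": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": DETECT_FN, "Payload.$": "$"},
                "OutputPath": "$.Payload",   # keep only the function's return value
                "End": True,
            }
        },
    }


print(json.dumps(one_step_workflow(), indent=2))
```

The JSON string would be passed as the `definition` to Step Functions' `CreateStateMachine`. Adding investigation and human-review steps later then means adding states, not re-plumbing every caller.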

22 // PRINCIPLES

The non-negotiables

1

Evidence over output

The agent's draft is not evidence. The data behind it is.

2

Replay over trust

Same input, same result. Always.

3

Human review over automation

Non-deterministic systems cannot own regulated decisions.

4

Audit over speed

Every state transition logged. Forever.

23 // PATTERN

Steal the shape. Bring your own domain.

Same shape works for your problem too. Swap three things and the architecture stays the same.

Your rules

CEL · github.com/google/cel-spec

Google's Common Expression Language. Sandboxed, embeddable, language-agnostic. Write detection rules for any domain; Osprey runs them.

Your narrator

Same 70-min recipe

Small open model + LoRA + ~3K synthetic samples. Swap the prose: clinical notes, runbooks, refund letters, support replies.

Your deployment

Same AWS shape

EventBridge → Lambda → DynamoDB/S3 → Bedrock CMI → human review. Different events, same audit trail.

Caveat: regulated decisions still need a human in the loop. The pattern travels — the responsibility doesn't.

24 // OPEN SOURCE

If you want to go deeper.

Engine

Osprey

Go binary, Apache 2.0, CEL rules, 60-second deploy. Fork it, write your own rules.

Model

Osprey Narrator

Qwen3-4B + LoRA. Q4_K_M GGUF on Ollama, safetensors for Bedrock CMI.

Reference

Tazama

Linux Foundation real-time fraud platform. Digital Public Good.

github.com/opensource-finance
25 // Q&A

First evidence, then narrative.

First evidence, then narrative. The model drafts. The analyst decides.

josephgoksu.com · opensource.finance · github.com/josephgoksu

Thank you