A Fintech Dilemma: Elegance vs. Performance

How would you solve this?

We’ve been thinking about the “elegant vs. performant” trade-offs we make in fintech, specifically when it comes to transaction categorisation. It’s a classic problem, but we want to tap into the collective experience of our network for a wider range of perspectives.

We’ve put together a simplified version of a real-world logic challenge below. It’s part logic, part code, and a little bit of a treasure hunt.

The Setup

Imagine you have transactions like:

  • “WOOLWORTHS SYDNEY 1234”
  • “UBER TRIP HELP.UBER.COM”
  • “PAYROLL ACME PTY LTD”
  • “WOOLWORTHS METRO 5678”

Before matching, all transactions are normalised using the following rules:

  • Fold case (matching is case-insensitive)
  • Remove punctuation
  • Collapse whitespace
  • Strip trailing numeric identifiers
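
As a minimal sketch, the four normalisation rules above could be applied in sequence like this. The function name `normalise` is hypothetical, and two details are assumptions the rules leave open: punctuation is replaced by a space rather than deleted (so “HELP.UBER.COM” becomes three tokens), and a “trailing numeric identifier” is taken to mean one or more all-digit tokens at the end of the string.

```python
import re
import string

# Assumption: punctuation becomes a space rather than being deleted outright,
# so "HELP.UBER.COM" splits into separate tokens instead of fusing into one.
_PUNCT_TO_SPACE = str.maketrans({ch: " " for ch in string.punctuation})


def normalise(description: str) -> str:
    """Apply the four normalisation rules in order (hypothetical helper)."""
    text = description.upper()                  # fold case
    text = text.translate(_PUNCT_TO_SPACE)      # remove punctuation
    text = re.sub(r"\s+", " ", text).strip()    # collapse whitespace
    # Assumption: trailing identifiers are whole all-digit tokens at the end.
    text = re.sub(r"(?: \d+)+$", "", text)      # strip trailing numeric identifiers
    return text


if __name__ == "__main__":
    for raw in [
        "WOOLWORTHS SYDNEY 1234",
        "UBER TRIP HELP.UBER.COM",
        "PAYROLL ACME PTY LTD",
        "WOOLWORTHS METRO 5678",
    ]:
        print(repr(raw), "->", repr(normalise(raw)))
```

Running the rules against both the transactions and the rule patterns keeps the two sides comparable, which matters more than the exact choices made for each step.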

And categorisation rules like:

  1. “WOOLWORTHS” → Groceries (Priority 1)
  2. “WOOLWORTHS METRO” → Convenience (Priority 2)
  3. “UBER” → Transport (Priority 1)
  4. “PAYROLL” → Income (Priority 1)

The Logic Puzzle

How would you build a matcher that satisfies the following requirements?

  1. Highest priority wins
  2. If priorities are tied → longest matching pattern wins
  3. If still tied → result must be 100% deterministic
  4. No match → “Uncategorised”
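
One possible reading of these four requirements, as a brute-force baseline rather than a recommended design, is sketched below. Everything named here (`Rule`, `categorise`, `higher_number_wins`) is hypothetical. The sketch assumes a pattern matches when its tokens appear as a whole-word run inside the normalised transaction, that “longest” means character length, and that the final tie-breaker is the lexicographic order of the pattern and then the category. The puzzle does not say whether “Priority 2” outranks “Priority 1”, so that direction is left as a flag.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Rule:
    pattern: str    # assumed to be normalised already, e.g. "WOOLWORTHS METRO"
    category: str
    priority: int


RULES = [
    Rule("WOOLWORTHS", "Groceries", 1),
    Rule("WOOLWORTHS METRO", "Convenience", 2),
    Rule("UBER", "Transport", 1),
    Rule("PAYROLL", "Income", 1),
]


def _matches(rule: Rule, text: str) -> bool:
    # Assumption: whole-token containment, so "UBER" does not match "UBERX".
    return f" {rule.pattern} " in f" {text} "


def categorise(text: str, rules: Sequence[Rule] = RULES,
               higher_number_wins: bool = True) -> str:
    """Return the winning category for an already-normalised transaction."""
    candidates = [r for r in rules if _matches(r, text)]
    if not candidates:
        return "Uncategorised"                      # requirement 4
    sign = -1 if higher_number_wins else 1
    best = min(
        candidates,
        key=lambda r: (
            sign * r.priority,                      # requirement 1: priority
            -len(r.pattern),                        # requirement 2: longest pattern
            r.pattern, r.category,                  # requirement 3: fixed total order
        ),
    )
    return best.category


if __name__ == "__main__":
    for text in ["WOOLWORTHS SYDNEY", "UBER TRIP HELP UBER COM",
                 "PAYROLL ACME PTY LTD", "WOOLWORTHS METRO", "NETFLIX"]:
        print(text, "->", categorise(text))
```

Because the sort key is a total order over the rules, the result does not depend on the order in which rules were loaded. With the sample rules, only “WOOLWORTHS METRO” changes category when `higher_number_wins` is flipped.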

The Hidden Outcome

Once categorised, take the first letter of each category in order, concatenate them, and Base64 encode the string. If your logic is sound, that string decodes to a valid email address.
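
The encoding step on its own can be sketched as follows. The helper name `encode_initials` is hypothetical, and the sketch assumes “in order” means the order in which the transactions were listed. The example uses placeholder categories rather than the puzzle’s own, so it gives nothing away.

```python
import base64
from typing import Iterable


def encode_initials(categories: Iterable[str]) -> str:
    """Concatenate the first letter of each category and Base64 encode it."""
    # Assumption: categories arrive in the same order as the transactions.
    initials = "".join(category[0] for category in categories if category)
    return base64.b64encode(initials.encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    # Placeholder categories, not the puzzle's answer.
    print(encode_initials(["Alpha", "Beta"]))
```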

What we’re really curious about

We’re less interested in just “the answer” and more in how you think about building this in a real system:

Determinism

  • How do you guarantee consistent outputs across runs?
  • What is your final tie-breaker and why?

Performance

  • What matching strategy would you use at this scale?
  • How do you avoid brute-force comparisons?
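
One way to avoid comparing every transaction against every rule, offered as a sketch and not the only option, is to index the patterns by their first token so that only rules sharing a token with the transaction are ever examined. The names `build_index` and `candidate_patterns` are hypothetical, and the sketch assumes whole-token matching on already-normalised text. A token trie or an Aho-Corasick automaton would be the natural next step for much larger rule sets.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Index = Dict[str, List[Tuple[str, ...]]]


def build_index(patterns: Iterable[str]) -> Index:
    """Group tokenised patterns by their first token (built once, up front)."""
    index: Index = defaultdict(list)
    for pattern in patterns:
        tokens = tuple(pattern.split())
        if tokens:
            index[tokens[0]].append(tokens)
    return dict(index)


def candidate_patterns(text: str, index: Index) -> List[str]:
    """Return every pattern whose tokens occur as a consecutive run in text."""
    tokens = text.split()
    found = set()
    for i, token in enumerate(tokens):
        # Only patterns starting with this token are compared at all.
        for pattern in index.get(token, ()):
            if tuple(tokens[i:i + len(pattern)]) == pattern:
                found.add(" ".join(pattern))
    return sorted(found)  # sorted so the output order is deterministic


if __name__ == "__main__":
    idx = build_index(["WOOLWORTHS", "WOOLWORTHS METRO", "UBER", "PAYROLL"])
    print(candidate_patterns("WOOLWORTHS METRO", idx))
    print(candidate_patterns("UBER TRIP HELP UBER COM", idx))
```

The work per transaction then scales with the number of tokens and the rules that share a first token with it, not with the total rule count. Priority and length tie-breaking would be applied afterwards to the small candidate list.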

Design Trade-offs

  • Where do you sit between readability and performance?
  • Would you precompile rules? How?

Scalability

How would your system handle:

  • 10× more rules?
  • Real-time categorisation?
  • Rule updates without downtime?

Observability

  • How would you explain why a transaction was categorised a certain way?
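
A lightweight way to make a decision explainable, sketched here under assumptions, is to return the full ranked candidate list alongside the winner instead of just the category. The names `Rule` and `explain` are hypothetical. The sketch assumes that a larger priority number wins, that matching is whole-token containment, and that the last tie-breaker is the pattern’s lexicographic order.

```python
from typing import Any, Dict, NamedTuple, Sequence


class Rule(NamedTuple):
    pattern: str
    category: str
    priority: int


def explain(text: str, rules: Sequence[Rule]) -> Dict[str, Any]:
    """Categorise normalised text and report every rule that was considered."""
    padded = f" {text} "
    candidates = [r for r in rules if f" {r.pattern} " in padded]
    # Assumption: larger priority number wins, then longer pattern,
    # then lexicographic pattern order as the deterministic tie-breaker.
    ranked = sorted(candidates,
                    key=lambda r: (-r.priority, -len(r.pattern), r.pattern))
    winner = ranked[0] if ranked else None
    return {
        "input": text,
        "category": winner.category if winner else "Uncategorised",
        "winner": winner._asdict() if winner else None,
        "candidates": [r._asdict() for r in ranked],  # best first
    }


if __name__ == "__main__":
    sample_rules = [
        Rule("WOOLWORTHS", "Groceries", 1),
        Rule("WOOLWORTHS METRO", "Convenience", 2),
    ]
    print(explain("WOOLWORTHS METRO", sample_rules))
```

Logging this structure, or storing it with the transaction, lets someone see both which rule won and which rules lost to it.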

Have a crack and send your approach to the decoded email. We’re curious to see the different ways people think this through. We’ll happily summarise the various approaches we see and share them in a follow-up post.

So, what is your go-to strategy for high-volume pattern matching?