The Commonwealth Bank has provided more detail of the data points and language models it is using to detect financial abuse in transaction descriptions.
The bank’s AI Labs team has published a research paper on arXiv [pdf] that describes the “multi-step approach” and invites input from “the wider research community” to improve on the current method.
In its research paper, CBA describes in detail the way it has tackled the problem with machine learning.
At a transaction level, the bank is looking at “specifics” such as dollar amount and frequency, as well as some “simple text” analysis of the free-text field, where it looks at variables such as “length of the transaction description, upper/lower/mixed case flags, number of words, length of the longest word in the transaction description, [and] does the message contains special characters/numbers”.
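The “simple text” variables the paper lists can be sketched as a small feature-extraction function. This is an illustrative reconstruction, not CBA’s code; the function and field names are assumptions based on the quoted list.

```python
import re

def describe_text(desc: str) -> dict:
    """Derive simple structural features from a free-text transaction
    description, mirroring the variables quoted from the paper: length,
    upper/lower/mixed case flags, word counts, longest word, and
    presence of special characters or numbers. Names are illustrative."""
    words = desc.split()
    return {
        "length": len(desc),
        "is_upper": desc.isupper(),
        "is_lower": desc.islower(),
        "is_mixed": not desc.isupper() and not desc.islower(),
        "num_words": len(words),
        "longest_word": max((len(w) for w in words), default=0),
        # Anything that is not a letter or whitespace, i.e. digits
        # and special characters
        "has_special_or_digit": bool(re.search(r"[^A-Za-z\s]", desc)),
    }
```

Features like these are cheap to compute at scale and can separate ordinary payment references from message-like descriptions before any language model is applied.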
The bank also uses three pre-trained language models to detect “emotion, toxicity and sentiment” in the descriptions.
From there, the bank aggregates its findings up to a “relationship” level between an abuser and a potential victim.
If the abuser has more than one victim, the model flags this as “two [or more] distinct relationships of high risk”.
The bank also checks whether the potential victim has replied or not.
This is all fed into a random forest model that ultimately classifies each relationship as “highly abusive or non-abusive”.
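The roll-up from individual transactions to relationship-level features, and the multi-victim flag, might look something like the following sketch. The feature names, the reply flag, and the risk threshold are illustrative assumptions; the article does not specify the model’s exact inputs.

```python
from collections import defaultdict

def aggregate_relationships(transactions):
    """Roll per-transaction risk scores up to the (sender, recipient)
    'relationship' level, as the paper describes. Feature names are
    illustrative; each transaction dict is assumed to carry a
    per-message risk_score and an is_reply flag."""
    rel = defaultdict(lambda: {"n_messages": 0, "max_risk": 0.0,
                               "victim_replied": False})
    for t in transactions:
        r = rel[(t["sender"], t["recipient"])]
        r["n_messages"] += 1
        r["max_risk"] = max(r["max_risk"], t["risk_score"])
        r["victim_replied"] |= t["is_reply"]
    return dict(rel)

def multi_victim_senders(rel, threshold=0.8):
    """Flag senders with two or more distinct high-risk relationships
    (the threshold here is a placeholder, not CBA's)."""
    counts = defaultdict(int)
    for (sender, _), feats in rel.items():
        if feats["max_risk"] >= threshold:
            counts[sender] += 1
    return {s for s, n in counts.items() if n >= 2}
```

Relationship-level features like these would then be the inputs to the final random forest classifier, rather than the raw transaction records themselves.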
“The final model is already fully operational in the bank,” the AI Labs team wrote.
“To increase the model’s robustness, we regularly retrain it when the sent cases are verified from the customer vulnerability specialists.”
The AI Labs researchers said that owing to the “novelty” of the problem, there was little the bank could leverage or lean on in crafting a technical response to the messages.
The bank sought broader input on what it has put together so far, and offered the model up for adoption by other institutions.
The bank also flagged further improvements being pursued under its own steam.
“There are a range of potential improvements we are currently working on and aim to publish in future work,” the labs team wrote.
“Some examples of potential improvements are: better foreign language coverage, [and] use of several months [of] conversation history to detect long-term abuse”.