  • Open access
  • Published: 18 February 2021

Neural mechanisms of credit card spending

  • Sachin Banker 1 , 2 ,
  • Derek Dunfield 2 ,
  • Alex Huang 2 &
  • Drazen Prelec 2 , 3 , 4 , 5  

Scientific Reports volume 11, Article number: 4070 (2021)

  • Human behaviour
  • Neuroscience

Credit cards have often been blamed for consumer overspending and for the growth in household debt. Indeed, laboratory studies of purchase behavior have shown that credit cards can facilitate spending in ways that are difficult to justify on purely financial grounds. However, the psychological mechanisms behind this spending facilitation effect remain conjectural. A leading hypothesis is that credit cards reduce the pain of payment and so ‘release the brakes’ that hold expenditures in check. Alternatively, credit cards could provide a ‘step on the gas,’ increasing motivation to spend. Here we present the first evidence of differences in brain activation in the presence of real credit and cash purchase opportunities. In an fMRI shopping task, participants purchased items tailored to their interests, either by using a personal credit card or their own cash. Credit card purchases were associated with strong activation in the striatum, which coincided with onset of the credit card cue and was not related to product price. In contrast, reward network activation weakly predicted cash purchases, and only among relatively cheaper items. The presence of reward network activation differences highlights the potential neural impact of novel payment instruments in stimulating spending—these fundamental reward mechanisms could be exploited by new payment methods as we transition to a purely cashless society.

Introduction

Since their introduction in the 1960s, credit cards have gradually replaced cash and check transactions as the default payment method for consumer purchases, and are now the fastest growing method in the United States 1 . In the future credit cards may find themselves overtaken by digital wallets and other devices. From an economic perspective, it is not surprising that technological changes in payment transactions have some impact on macroeconomic variables, notably on U.S. household debt, which has been steadily rising over the last two decades 2 , 3 . This historical debt increase may be, in part, a rational household response to new lines of credit and to the other benefits of credit cards, in terms of convenience, security, and reward points.

However, evidence is accumulating that suggests credit cards take advantage of cognitive biases and other psychological mechanisms. Many, if not most consumers overestimate their future ability to repay and are surprised by the high interest charges when these come due 4 , 5 , 6 . Empirical studies show that shoppers with credit cards are willing to spend more on items 7 , 8 , check out with bigger baskets 9 , focus on and remember more product benefits rather than costs 10 , 11 , and make more indulgent and unplanned purchase choices 12 , 13 .

Do credit cards then serve to “release the brakes” on spending or instead act to “step on the gas”? Prior evidence indicates that, in fact, both mechanisms may be involved, such that spending facilitation effects are likely to be driven by a combination of these processes. For instance, most relevant to this paper are reports that mere exposure to credit card logos can stimulate spending 14 , 15 , 16 . As first argued by Feinberg 14 , spending facilitation via mere exposure implicates classical conditioning mechanisms and cue-triggered cravings associated with addiction 17 , 18 . A salient cue can often trigger a motivational urge to pursue its associated reward, such as the pleasure associated with consumption, in this way fueling greater spending.

Yet, recent literature has focused greater attention upon an alternative mechanism derived from the mental accounting literature. That is, credit cards may instead weaken brakes on spending by lessening the pain associated with making payments. This “pain-of-payment” hypothesis was originally proposed in a metaphorical sense 19 , 20 ; however, more literal interpretations have also taken root recently. With credit card purchases, the act of payment is temporally removed from the act of acquisition, and is further decoupled when multiple transactions, perhaps spread over many months, are represented as a single consolidated balance. This dissociation of purchasing from payment may put costs out of mind and reduce the influence of price on product purchase decisions.

Understanding the brain mechanisms that are responsible for these effects is important, as they are not likely to be confined to credit cards only. By tapping these mechanisms, any new payment technology can disturb old expenditure patterns in ways that people fail to anticipate, and may come to regret.

In this exploratory study, we provide the first evidence of differences in brain activation in the presence of real credit and cash purchase opportunities, presented in an fMRI shopping task. Participants used their own personal credit card or cash funds to make real purchases of products while we simultaneously observed brain activity. Our study focuses on the purchase of everyday products with cash and credit at relatively small dollar values, similar to those examined within prior literature on payment methods. We find that activation in the classical reward networks (the striatum) differentiates credit card purchases from non-purchases, and, importantly, bears little relation to price. In contrast, activation in these same networks is a weak predictor of cash purchases, but interacts with price to predict purchases of cheaper instead of more expensive items. Activation in the insula, a brain region previously linked to pain-of-paying 21 , 22 , 23 , 24 , does not differentiate credit from cash purchases in our study.

As we discuss in the conclusion, we cannot rule out that a reduction in pain-of-paying is responsible for credit card overspending at higher dollar amounts than those used in the study. However, our results suggest that classical cue-conditioning and the resulting sensitization of neural reward networks may have a separate role in motivating credit card purchases. Even if credit cards do “release the brakes” on spending, as argued by mental accounting, it appears that they could also help to “step on the gas.”

To facilitate comparisons with previous results, our study builds on the established SHOP (“Save Holdings Or Purchase”) fMRI paradigm 21 , 22 , 25 . In the task, participants make a series of purchase decisions for products offered at a steep discount relative to market price. A trial begins with a screenshot of a product that the participant has not seen previously in the study, followed by the product price, and concludes with a “buy” versus “no-buy” decision screen. Neural signals in the task have been shown to dissociate reward-related from price-related decision processing 21 . For this reason, it is a natural protocol for assessing competing hypotheses about credit card purchase facilitation mechanisms.

In the SHOP task, a decision to buy is marked by three neural signals (see Fig.  1 for SHOP trial structure; Fig.  2 top panel for activation pattern in the original SHOP study). Two signals come from the classic dopaminergic reward network involving the striatum and the ventromedial prefrontal cortex (VMPFC). Striatal activation is a leading predictor of purchase, appearing during the product and price screens, but losing significance by the decision point. Activation in the VMPFC predicts purchase during the price presentation and decision points, and also correlates with post-scanner estimates of consumer surplus (defined as the difference between stated willingness-to-pay for a product and its price). Thus, VMPFC activity has been interpreted as a net-value signal within a range of decision making contexts 21 , 22 , 25 , 26 , 27 , 28 .

Figure 1

Shopping task trial structure in the current study. Participants viewed the product for 4 s, the payment method for 4 s, the price for 4 s, and then made a choice to purchase within 4 s. Post-decisional periods consisted of a confirmation, 4 s, and a pay response, 4 s. Purchase trial shown; if not purchased, the confirmation indicated “basket unchanged” and the pay screen indicated “no payment necessary.” Intertrial interval jittered 2–8 s. This study added the Method, Confirm, and Pay phases to the original SHOP paradigm.

Figure 2

Comparison with Knutson et al. 21 . Neural activation time courses in the striatum, VMPFC, and rAIC distinguishing purchase (black) from non-purchase (grey), y-axis labeled with percent signal change. Above: Fig. 2 from Knutson et al. 21 . Below: time courses from the current study collapsed across payment methods. Phases: * = product, M = method, $ = price, ? = choice, C = confirm, P = pay.

A separate neural indicator of product purchase is reduced activity in the right anterior insula cortex (rAIC), when the price appears 21 , 22 , 25 . Because the rAIC has been previously implicated in the processing of negative emotions and pain 29 , 30 , 31 , 32 , 33 , 34 , its activation in the SHOP task has been interpreted as evidence consistent with a “pain-of-paying” caused by high price, acting as a brake on spending 21 , 22 . Paying for products has been thought to elicit an affective pain experience associated with activation in the anterior portion of the right insular cortex, in contrast to the posterior portion of the right insular cortex which has been linked to representation of physical pain experiences 23 .

Behavioral findings

The independent variables, product price and payment method, had the expected effects on purchase behavior in the fMRI shopping task. A hierarchical logistic regression predicting purchase decisions yielded parameters on price ( b = −0.334, se = 0.116, p = 0.004), payment method ( b = −0.036, se = 0.099, p = 0.715), and their interaction ( b = 0.251, se = 0.120, p = 0.037) in the anticipated direction. This interaction follows predictions based on a prior test conducted in a similar context 35 . Consistent with previous empirical studies 7 , 8 , participants were more willing to purchase higher-price items with credit rather than with cash, and thus they spent more overall when using a credit card (average basket = $87.41, SD = 61) rather than cash ($84.19, SD = 51). These behavioral findings supported the notion that credit cards facilitate purchasing behavior, and our analysis presented below focuses primarily on the associated neural activation evidence.
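As a reading aid, the regression described above can be written out in a generic form. This is only a sketch: the random-effects structure (here a participant intercept u_j), the scaling of price, and the coding of payment method are not stated in this section and are assumptions made for illustration.

```latex
\operatorname{logit}\Pr(\mathrm{buy}_{ij}=1)
  = \beta_0 + u_j
  + \beta_1\,\mathrm{price}_{ij}
  + \beta_2\,\mathrm{method}_{ij}
  + \beta_3\,(\mathrm{price}_{ij}\times\mathrm{method}_{ij}),
\qquad
\hat\beta_1 = -0.334,\quad \hat\beta_2 = -0.036,\quad \hat\beta_3 = 0.251 .
```

If, purely for illustration, payment method were dummy-coded with cash = 0 and credit = 1, the implied price slope would be about −0.334 for cash and −0.334 + 0.251 = −0.083 for credit, which matches the direction of the reported finding that higher prices deter credit purchases less than cash purchases. Under a different coding scheme these per-method slopes would take different values.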

Neural activation

The current fMRI design follows the approach in the original SHOP article 21 . For comparison, Fig.  2 displays the activation time course in the current study alongside the Knutson et al. 21 results. Collapsing across payment methods, the time course in each region of interest (ROI) tracks the original results to a remarkable degree. The key regions of interest—striatum, VMPFC, and rAIC—are shown graphically within Fig.  3 . While we do not consider the current findings to be an exact replication of the original SHOP results, the neural activation patterns from the earlier study provide a benchmark reference, as discussed below.

Figure 3

Regions of interest examined within the current study. Ventromedial prefrontal cortex shown in green and striatum shown in blue, from Bartra et al. 26 meta-analysis; right anterior insular cortex (rAIC) shown in red, from Kelly et al. 34 parcellation analysis; MNI x = − 6, y = 10, z = − 6.

Figure 4 breaks apart the time courses by payment method, and shows that the reward network differential buy signal is clearly present with credit card purchases, but is negligible with cash purchases. Logistic regressions of the purchase decision on the ROI signal change, payment method, and their interaction confirm that credit purchases were associated with greater differential striatal activation, beginning with the payment method screen and extending up until the decision screen (shown in the bottom panel of Fig. 4 ). The same pattern holds directionally, but not significantly, for the VMPFC. However, if the neural signal in each ROI is collapsed across buy and no-buy decisions, there is no significant difference between credit card and cash trials at any time point, suggesting that presentation of the credit card stimulus per se does not affect brain activity in the target ROIs.

Figure 4

Above: ROI signal intensity time courses illustrating purchase (black) versus non-purchase (grey). Below: Buy decision regressed on ROI signal intensity, payment method, and interaction at each TR. Red indicates a negative coefficient. Parameter significance denoted by *** p  < .001, ** p  < .01, * p  < .05, ^ p  < .10. Phases: * = product, M = method, $ = price, ? = choice, C = confirm, P = pay.

Looking at the cash trials only, the reward signals are weaker predictors of purchases than in the original SHOP task 21 , even though the earlier study also required cash payments. However, in Knutson et al. 21 , participants tapped their experimental endowment—money they did not have before the study—potentially creating a house-money effect. In contrast, participants in the current study paid out-of-pocket with the $50 in cash they brought to the experiment.

As evident in Fig. 2 , collapsing across payment methods in the current study reveals activation time courses that track the Knutson et al. 21 results. This appears to be primarily due to purchase decisions using a credit card, not cash. Accordingly, comparing the current findings to past SHOP results indicates that credit card purchase decisions resemble house-money purchases. Thus, one interpretation of these findings is that when shopping with a credit card, individuals act as if drawing on an endowment (from the financial institution backing the card).

An additional analysis shows that prices modulate the association of neural signals and the decision to purchase. The y-axis in Fig. 5 displays the differential purchasing signal. That is, we take the average ROI activation on purchase trials and the average ROI activation on non-purchase trials, and plot the difference between these means; this is plotted separately for high-price items and for low-price items, when using cash and when using credit (see Figure S2 in Supplementary Information for further information). Accordingly, points plotted at the zero line indicate that neural activation did not differ between purchase and non-purchase decisions on average. Points plotted above the zero line instead indicate that purchases were associated with greater activation in the ROI relative to non-purchase decisions (and conversely for points below the zero line). The significance levels in the table in Fig. 5 come from logistic regressions of the buy decision on ROI signal change and its interaction with item price (a continuous variable).
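The differential purchasing signal described above is a difference of two group means. The sketch below shows that computation for a single time point; the function name, the trial dictionary fields, and the example values are all hypothetical and are not taken from the study's data or analysis code.

```python
from collections import defaultdict
from statistics import mean


def differential_signal(trials):
    """Mean ROI signal on purchase trials minus mean on non-purchase trials,
    keyed by (payment method, price level). Field names are hypothetical."""
    groups = defaultdict(lambda: {True: [], False: []})
    for t in trials:
        groups[(t["method"], t["price_level"])][t["bought"]].append(t["signal"])
    result = {}
    for key, by_choice in groups.items():
        # A difference is only defined when both choices occur in the cell.
        if by_choice[True] and by_choice[False]:
            result[key] = mean(by_choice[True]) - mean(by_choice[False])
    return result


# Made-up values for illustration only (not data from the study).
example_trials = [
    {"method": "credit", "price_level": "high", "bought": True, "signal": 0.30},
    {"method": "credit", "price_level": "high", "bought": True, "signal": 0.10},
    {"method": "credit", "price_level": "high", "bought": False, "signal": 0.00},
    {"method": "cash", "price_level": "high", "bought": True, "signal": 0.05},
    {"method": "cash", "price_level": "high", "bought": False, "signal": 0.05},
]
```

With these made-up values, the credit/high-price cell lies above the zero line and the cash/high-price cell lies on it, which is how points in such a plot would be read.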

Figure 5

Above: y-axis plots the difference between average purchase and average non-purchase ROI signal intensity, for high-price (black) and low-price (grey) items by payment method. Below: Buy decision regressed on ROI signal intensity and the interaction between price (continuous) and ROI signal intensity at each TR, separately for credit and for cash. Red indicates a negative coefficient. Parameter significance denoted by *** p  < .001, ** p  < .01, * p  < .05, ^ p  < .10.

Focusing on decisions using cash, positive reward-related ROI activation in the striatum was associated with purchasing only among lower-priced items, and this differential purchasing signal is near zero for higher-priced items (see the left panel in Fig.  5 ). Confirmed by the interactions in the regression analysis, buying items with cash has a price-dependent neural signature that is clearest in the striatum. In contrast, the neural signature associated with credit purchases is not price-contingent, and is instead reflected by differential activation in reward-related ROIs, regardless of the price. Regression analyses that directly compare the differential sensitivity to price when using cash and credit are reported within the Supplementary Information; these findings suggest that credit cards reduce sensitivity to price information via heightened striatal activation, exhibited during the periods in which product price is presented to participants.

Although a single experiment is rarely definitive with respect to behavior outside of the lab, the results reported here provide clear clues about the neural mechanisms that differentiate credit card from cash purchases and that may be implicated in credit card overspending.

A leading hypothesis within recent literature is that credit cards facilitate purchasing by diminishing a pain-of-payment that would otherwise keep spending in check. The intuition behind it is that card transactions “decouple” (disassociate) payments from consumption 19 , 20 . The decoupling occurs because the payment is delayed, can be postponed repeatedly, and the actual repayment date may be ambiguous if diverse expenditures are lumped into a rolling balance. Decoupling of payments from consumption allows people to keep the cost of the item “out of mind,” creating a kind of analgesic at the moment of purchase.

We do not find neural evidence for this explanation, at least if pain is defined as a physical sensation and insula activity treated as its neural marker, as has been suggested in the past 21 , 22 , 23 . Although insular activation does differentiate purchase from non-purchase decisions, it does so only after the decision point, and does not clearly interact with either payment method or item price (Figs.  4 , 5 ). Insular activation seems to reflect simple product rejection in our study, perhaps similar to the rejection of bad offers in economic games 36 , 37 , 38 . Yet, our evidence is consistent with the more metaphorical interpretation of the pain-of-payment account. That is, while we did not observe credit cards to influence pain processing networks in the brain, our evidence did indicate that price information failed to have any modulating influence on neural mechanisms associated with credit card purchases (i.e., costs were out of mind).

At the same time, there are a number of important constraints within the current study that offer worthy directions for further exploration. For instance, it is possible that spending cash could elicit stronger negative affective responses at higher price levels than those examined within the current study. Some interesting exploratory research suggests that observing others make cash payments at higher price levels is associated with increased activation in the insula 24 . As applied within prior literature, our study design also mimics typical retail shopping environments in which participants add items to their basket and subsequently check out (rather than parting with money at the moment the purchase decision is made), which may diminish the salience of cash payments. Furthermore, in conveying the payment method to participants, we used an icon that included both Visa and Mastercard logos; additional research could help to clarify the role of brand logos in eliciting spending facilitation effects. As participants in this study had reasonable levels of financial literacy, additional research focusing on consumers with lower, or higher, levels of financial literacy and experience would be valuable to pursue. Additionally, while our study aimed to stick closely to prior SHOP tasks, more highly powered designs could offer greater insight into the role of the insula.

Taken altogether, the hypothesis that gains most support from the current evidence is that the reward network—the striatum in particular—has been chronically sensitized by prior experience with credit cards. In line with cue-triggered accounts of cravings, exposure to conditioned credit card cues may trigger sensitivity to rewards 14 , 17 , 18 , 39 , 40 . Such sensitization would show up in a reward anticipation increase following onset of the credit card logo in expectation of an imminent buy decision, a pattern that we indeed observe within striatal activity. Under this hypothesis, credit card cues may in part activate the pursuit of rewarding products rather than merely alleviating the pain associated with paying for them.

The difference in reward network activation between credit and cash conditions is notable in light of the small prices and modest behavioral effects. Self-reports taken after the shopping task suggest that participants were largely unaware of the influence of payment methods on their decisions, disagreeing with statements that they were more impulsive and less price-conscious when shopping with credit cards (see Supplementary Information). The differences in reward-related neural purchasing signals observed between payment methods do not appear to reflect inconveniences in using cash itself; indeed, prior SHOP studies examining cash purchases 21 documented similar reward-related neural purchasing signals, so long as participants were spending house money from an experimental endowment. Further research could help to clarify the extent to which consumers consider shopping with a credit card to be akin to spending house money. We do find that the impact of credit cards on behavior (purchase likelihood) and neural activity increases with price. Extrapolating from this price-related trend, one might expect greater credit card effects for big-ticket items in an actual marketplace.

Although recent literature largely interprets credit card facilitation of spending through a pain-of-payment lens, a considerable body of existing behavioral evidence is consistent with a cue-triggered account. Findings that credit card cues serve to heighten attention and memory toward the positive elements and away from the negative elements of product stimuli 10 , 11 , 41 fall in line with conditioning processes that have long been understood at both psychological and neurobiological levels 42 , 43 . Exposure to credit card logos has also been shown to increase the willingness to pay for items even when people pay with cash 14 , 15 , 16 , consistent with the idea that credit cards can serve as cues that trigger spending behavior. Moreover, while traditional mental accounting theories suggest that credit cards lessen pain-of-paying for all types of products, people are in fact more inclined to purchase vice products when shopping with credit cards 12 , as is suggested by an account in which credit cards prompt the pursuit of products that satisfy cue-triggered cravings. A conditioned spending response can lead individuals to become more attuned to consumption cues and also raise the marginal utility of consumption 39 , 40 .

Behavioral economic models with expectation-based reference points could potentially accommodate our findings and allow analytical extrapolation from the lab to the marketplace. The general idea in these models is that experience with a transaction instrument generates expectations, which then serve as a reference point 39 , 40 , 44 . If the expectations are to purchase, then failing to purchase becomes a loss relative to the reference point. Such models have explained addictive behavior in the past; however, the expectation formation could be localized to a combination of card, product category, and physical environment (e.g., retail or online). In principle, any distinct transaction method (cash, credit, check, or digital wallet) could stamp in its own unique set of “local preferences” as the consumer accumulates experience.

It is notable that the neural mechanisms involved in facilitating credit card spending share similarities to neural mechanisms that have in the past been implicated in addictive behaviors. Specifically, our evidence indicates that credit card cues led to reward network sensitization in the striatum, a distinguishing feature of cue-triggered mechanisms that has emerged in studies of chemical addiction to substances 17 , 18 , 45 . While we certainly do not claim that consumers are “addicted” to credit cards, an appreciation of the overlapping physical substrates may offer insights into important individual differences in vulnerabilities to more extreme forms of credit card overspending. For example, the genetic factors involved in dopaminergic reward network function that have been linked to drug addiction 46 could also contribute to greater risk of credit card abuse, due to the underlying role they play in learning and conditioning processes.

Credit cards are now an established instrument, but similar neural effects may arise with any disruptive payment technology. New payment methods and digital currencies can sensitize reward networks in unexpected ways, removing the financial guardrails created by old purchasing habits and routines. Many new payment technologies have the ability to strengthen reinforcement mechanisms through the use of unique sounds heard when acquiring an item, visual notifications received on mobile devices, and even haptic stimuli that can simultaneously provide physical feedback. Such multisensory stimuli 47 can drive speedier conditioning in a way that could very quickly begin to impact consumer purchasing processes. Payment methods that are integrated within mobile devices could also exploit prior conditioning with the device and fuel more unrestrained purchasing behavior 48 .

This is a cautionary message for the consumer finance and payment industries, as well as for economic welfare analysis based on revealed preference. If neural mechanisms operate under the radar, one cannot assume that technical improvements in payment methods will make all consumers better off. Our study does not discuss consumer protection and related policy issues, but underlines the importance of keeping policy eyes open to neuroscience evidence as it comes in. Because novel payment methods have the potential to take advantage of the neurobiological processes that drive purchase behavior, developing guardrails to prevent misuse may enable consumers to fully benefit from advancements in payment technology.

Although payment methods are involved in every consumer purchase decision, the underlying mechanisms through which they operate have not been well understood. The current findings highlight considerable differences in brain mechanisms responsible for the influence of payment methods on purchasing decisions, and expose important consumer vulnerabilities that will require attention as payment methods rapidly evolve. Ultimately, each of the many billions of consumer financial transactions that occur across the world each year is made by an individual who shares the neural mechanisms studied here.

Participants

A total of twenty-eight participants (ages 20–54; age M = 28.7, SD = 10.6; 18 women) completed the study. One participant was excluded from the analysis due to excessive head motion during the scan (more than 3 mm). The experimental procedures were approved by the MIT Institutional Review Board and were performed in accordance with relevant guidelines and regulations. All participants provided informed consent. Participants were compensated at least $75 for their time and received payment 1–2 weeks after the study.

Median participants in the study had a childhood household income between $75,000 and $100,000, current household income between $25,000 and $44,999, and reported saving 5–10% of their current income. Median participants were also college graduates, and 77% of participants reported not having experienced extended unemployment in the past 2 years. Participants were also asked to respond to financial knowledge questions 49 probing their understanding of credit ratings and investments. On average, participants correctly answered 73%, or 11 of the 15 financial knowledge questions ( SD = 2.4).

Our experimental design approach inherits heavily from prior publications adopting the SHOP paradigm 21 , 22 , 25 . To facilitate comparisons with benchmark SHOP studies, we retained the basic trial structure and added a payment method screen and two payment review screens (Fig.  1 ). The payment method screen (cash or credit card) was inserted between the product presentation and price screens. This sequencing was informed by results of a study showing that payment method matters if presented together with price information, but does not matter at the final checkout stage, after the consumer has presumably formed the intention to purchase 35 . The placement of the payment method prior to the price phase enabled us to examine whether the payment method modulated price-related or reward-related neural signals during the price differential computation. The trial ended with separate confirmation and checkout screens that required endorsement responses, giving participants a chance to “reflect on” but not change their decision, simulating the experience of receiving a receipt after a purchase. These additional stages were included to mimic the full sequence of a retail shopping experience and facilitate observation of post-decisional hedonics.

Each participant arrived at the study with their personal credit card and at least $50 in cash. Participants were told that they would be shopping within the lab’s experimental store, and that any purchases using cash or credit would be made through the lab at the end of the study. Therefore, any payments for purchases would come from a participant’s out-of-pocket funds rather than experimental endowments as in prior SHOP studies 21 . All products were offered at prices well below the minimum $50 cash on hand, with a median product price offer of $5.40 ( M = $6.39, SD = $3.73, min = $1.50, max = $18.00). Similar to prior SHOP studies, these offered prices were at a fixed 70% discount relative to actual retail price (i.e., corresponding to retail prices between $5 and $60). Participants were required to bring at least $50 in cash to the study in order to minimize differential liquidity constraints; that is, participants never had to reject an item simply because they lacked sufficient cash, as every price offer was below the $50 that participants had on hand. The prices examined within this study are at the high end in relation to previous literature applying the SHOP paradigm 21 , 22 , 25 and behavioral research on payment method effects 41 , 41 , 50 . Yet, as we discuss within the conclusion, it is possible that other mechanisms could be at play when studying big-ticket items at prices higher than those examined in the current study.
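The relationship between retail and offered prices stated above is simple arithmetic, and a short sketch makes the reported range easy to check. The function name and the rounding to cents are assumptions made for illustration; only the 70% discount and the dollar figures come from the text.

```python
DISCOUNT = 0.70  # fixed discount relative to retail price, as stated in the text


def offered_price(retail_price):
    """Price offered in the task for a given retail price, rounded to cents
    (the rounding rule is an assumption)."""
    return round(retail_price * (1 - DISCOUNT), 2)


# Retail prices of $5 and $60 map to the reported minimum and maximum offers.
lowest_offer = offered_price(5.00)
highest_offer = offered_price(60.00)
```

Under the same rule, the reported median offer of $5.40 would correspond to a retail price of $18.00.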

To increase interest and simulate a typical retail experience, each participant faced a tailored set of product offerings. We populated a database of over 22,000 top selling items, drawing on product information from Amazon. An independent online sample then rated which categories they perceived to be most appealing, which reduced the database to approximately 4000 items, covering a wide range of categories, including beauty, kitchen, books, etc. Prior to entering the scanner, participants selected and rated the desirability of 42 categories from the lab’s experimental store on a 7-point scale. Products in personally more desirable categories were more likely to be offered in the fMRI shopping task.

The scanning task involved three shopping “runs,” with 28 trials each, or 84 in total. Within a trial, participants indicated whether they would buy a specific product at a stated price. If so, the product was added to the participant’s “shopping basket.” No products were repeated. Each product had a 50% chance of being offered for purchase with credit or with cash, pseudorandomly determined such that each payment method constituted half of the trials. At the end of the task, one product was randomly selected. If it was in the basket, the participant was asked to pay for the product at the stated price. Participants paid using their own personal credit card or out-of-pocket cash, as specified in the product offer. Regardless of payment method, items were shipped to participants by mail within 2–3 days of the study.
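The balanced assignment described above can be sketched as follows. This is a minimal illustration, assuming that balancing was done across the full 84-trial session (the text does not say whether it was also balanced within each run); the function names and the use of a seed are hypothetical.

```python
import random


def assign_payment_methods(n_trials=84, seed=None):
    """Shuffle an exactly balanced list of 'credit' and 'cash' labels.

    Assumption: balance holds over the whole session, not necessarily per run.
    """
    if n_trials % 2 != 0:
        raise ValueError("n_trials must be even for an exact 50/50 split")
    methods = ["credit"] * (n_trials // 2) + ["cash"] * (n_trials // 2)
    rng = random.Random(seed)  # seed is only here to make the sketch repeatable
    rng.shuffle(methods)
    return methods


def split_into_runs(methods, trials_per_run=28):
    """Cut the trial list into consecutive runs of equal length."""
    return [methods[i:i + trials_per_run]
            for i in range(0, len(methods), trials_per_run)]


methods = assign_payment_methods(seed=0)
runs = split_into_runs(methods)
```

Shuffling a pre-balanced list, rather than drawing each trial independently with probability 0.5, is what guarantees that each payment method makes up exactly half of the trials.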

Each 24 s trial consisted of six 4 s periods, followed by a jittered 2–8 s intertrial interval (see Fig.  1 for an illustration). Participants viewed a product in period 1; the payment method was introduced in period 2 with a cash or credit icon, and the price in period 3. Participants signaled their decision to buy or not to buy in period 4. Following a buy decision, the participants saw a 4 s confirmation screen stating “this item has been added to your basket,” and a 4 s payment screen that required them to press a button to “commit to pay.” Following a no buy decision, the confirmation screen indicated “basket unchanged” and the payment screen required participants to press a button to acknowledge “no payment necessary.”

After exiting the scanner, participants reported their willingness to pay for each product shown in the scanner task by completing a separate incentive compatible auction procedure 51 and also completed several psychological scales. Post-scan measures were not recorded for one participant due to a technical error.

fMRI acquisition

All participants were right handed, native English speakers, with no history of neurological disorders. Participants were verified to have no magnetically reactive matter present in or on the body prior to scanning. All scans were performed using a 3 T Siemens Magnetom Tim Trio MRI System with a phase-array 32-channel head coil (Siemens Medical, Erlangen, Germany). Structural scans were acquired using a three-dimensional T1-weighted multi-echo MP-RAGE pulse sequence (TR = 2530 ms; TE = 1.64 ms, 3.5 ms, 5.36 ms, 7.22 ms; flip angle = 7°; slices = 176; thickness = 1 mm; matrix = 256 × 256). Task-based functional scans were collected using T2* weighted EPI sequence images sensitive to blood oxygen level-dependent (BOLD) contrast (TR = 2000 ms; TE = 30 ms; flip angle = 90°; slices = 32; thickness = 3 mm; matrix = 64 × 64). Analyses were conducted using the FMRIB Software Library, FSL, version 6.00 52 .

Behavioral analysis

To model the effects of price and payment method on purchase decisions, we conducted a hierarchical logistic regression in which purchase decision was predicted by price, payment method, their interaction, and demographic controls. The hierarchical model included random slopes for the price × payment method interaction and participant-level random effects, following prior work 35 . Price was a continuous, z-normalized regressor, normed at the participant-level price distribution. The demographic variables (age, marital status, education level, and amount of savings) controlled for differences in shopping behavior across participants.

ROI analysis

Region of interest analyses examined activity in a priori determined focal brain areas selected based on past observations that have isolated neural purchasing signals, as described above 21 , 22 , 25 . To specify the precise regions for analysis, we applied masks from meta-analyses of the striatum and VMPFC (see Fig. 9 within Bartra et al. 26 for brain maps depicting these regions), as well as the rAIC 34 , k = 2, cluster 2. Notably, the striatum contains the nucleus accumbens, an ROI referred to in past research 21 , 22 , 25 . See Fig.  3 for a graphical display of the ROIs.

These meta-analytically determined brain regions match the ROIs examined in prior SHOP experiments while offering interpretive advantages through the application of sample-independent functional definitions rather than sample-dependent anatomical definitions. Furthermore, automated ROI selection served to minimize potential experimenter bias associated with the manual adjustment of ROI coordinates for individual participants. ROIs for the ventral striatum and VMPFC were generated based on a five-way conjunction analysis identifying regions of the brain carrying a monotonic, modality-independent subjective value signal on the basis of thousands of independent brain scans 26 . The right anterior insula ROI was determined by applying a task-evoked coactivation-based parcellation analysis with hundreds of independent scans 34 . Whole brain contrast analyses verified that striatum activation was associated with product preference, VMPFC activation was associated with choice, and right anterior insula activation was associated with higher prices within our sample (see Supplementary Information).

Prior findings in the SHOP paradigm established that differential neural purchasing signals emerge during the price and choice phases 21 , 22 , 25 . Thus, we anticipated that the payment method would impact these neural purchasing signals at the price and choice phases, following presentation of the payment method. We focus our analysis and interpretation on these stages of the time course in which payment methods were predicted to modulate neural purchasing signals (in addition to the payment method phase), but we also provide results at all other stages of the time course for the reader’s reference (that is, including stages prior to the presentation of payment method itself and stages after participants already recorded a purchase decision). Our goal was to understand how payment method influenced the previously identified ROIs when making purchase decisions. In order to present these effects intuitively, we report the results of logistic regressions conducted separately for each ROI and at each acquisition point. The figures report parameter significance from logistic regression results without corrections; please note that the key interaction effects in the striatum remain significant after Bonferroni corrections.
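For readers unfamiliar with the correction mentioned above, a minimal sketch of a Bonferroni adjustment follows. The p-values are invented for illustration and the number of comparisons is an assumption; neither is taken from the study.

```python
def bonferroni(p_values, alpha=0.05):
    """Return Bonferroni-adjusted p-values and reject decisions.

    Each p-value is multiplied by the number of tests (capped at 1.0);
    a test is rejected when its adjusted p-value is at most alpha.
    """
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    reject = [p_adj <= alpha for p_adj in adjusted]
    return adjusted, reject


# Hypothetical uncorrected p-values for three tests (not from the paper).
adjusted, reject = bonferroni([0.001, 0.02, 0.5])
```

In this made-up example only the first test survives correction, which illustrates why an effect reported as significant without correction may or may not remain so afterwards.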

Specifically, within each region of interest, we analyzed the relationship between signal change and purchasing behavior at each acquisition point (TR). Following prior literature applying the SHOP paradigm 21 , 22 , 25 , time courses were lagged by 4 s to compensate for the delay in the hemodynamic response; the time courses depicted in the figures reflect this 4 s lag. To identify the differential purchase signal associated with credit versus cash purchases, we first conducted logistic regressions of the purchase decision on the ROI signal change, payment method, and their interaction at each acquisition point (results shown in Fig.  4 ). In particular, for each ROI and acquisition point, we fit the following regression equation: \(Buy=logit({b}_{0}+{b}_{1}\,*\,ROIactivation+{b}_{2}\,*\,PaymentMethod+{b}_{3}\,*\,ROIactivation\,*\,PaymentMethod)\) ; Buy corresponds to the decision to purchase (Buy = 1, NoBuy = 0), ROIactivation refers to the activation in the particular ROI at the acquisition point on the trial, and PaymentMethod refers to the contrast-coded treatment (Credit = 1, Cash = − 1).
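To make the lag and the contrast coding concrete, here is a minimal sketch. It assumes the regression is the usual logistic model, with the linear predictor passed through the inverse-logit function, and it uses the 2 s functional TR from the acquisition section. All coefficient values and function names are hypothetical.

```python
import math

TR_SECONDS = 2.0   # functional TR reported in the acquisition section
LAG_SECONDS = 4.0  # hemodynamic lag reported in the text


def lagged_index(event_time_s, tr=TR_SECONDS, lag=LAG_SECONDS):
    """Index of the volume acquired `lag` seconds after an event onset."""
    return int((event_time_s + lag) // tr)


def buy_probability(roi, payment, b0, b1, b2, b3):
    """P(Buy = 1) for the interaction model; payment is +1 (credit) or -1 (cash)."""
    eta = b0 + b1 * roi + b2 * payment + b3 * roi * payment
    return 1.0 / (1.0 + math.exp(-eta))


def roi_slope(payment, b1, b3):
    """Slope of the linear predictor with respect to ROI activation."""
    return b1 + b3 * payment


# Hypothetical coefficients, for illustration only.
b1, b3 = 0.2, 0.3
credit_slope = roi_slope(+1, b1, b3)  # b1 + b3
cash_slope = roi_slope(-1, b1, b3)    # b1 - b3
```

With the +1/−1 coding, the ROI slope is b1 + b3 for credit trials and b1 − b3 for cash trials, so the interaction coefficient b3 equals half the difference between the two slopes.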

We next evaluated the relationship between ROI activity and purchase behavior by conducting logistic regressions of the purchase decision on the ROI signal change and its interaction with price (a continuous, z-normalized variable; results shown in Fig.  5 ). Specifically, for the price interaction analysis in Fig.  5 we fit the following regression equation: \(Buy=logit({b}_{0}+{b}_{1}\,*\,ROIactivation+{b}_{2}\,*\,Price+{b}_{3}\,*\,ROIactivation\,*\,Price)\) . These analyses allowed us to directly examine the effects of payment method on previously identified ROIs involved in making purchase decisions.

Notably, all regression results apply price as a continuous, z-normalized regressor, normed based on the participant-level price distribution. Participant price distributions had minimum offer prices that ranged from $1.50 to $1.96 across participants and maximum price values that ranged from $12.78 to $18.00. “High-price” and “low-price” categories were included for graphical displays only (i.e., Fig.  5 ) and were defined relative to the median of each participant’s price distribution; binary price variables were not used as regressors in any significance tests. Further details regarding whole brain analyses as well as additional participant characteristics are provided within the Supplementary Information.
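The participant-level normalization and the display-only median split can be sketched as below. This is an illustration under assumptions: the text does not state whether the population or sample standard deviation was used, nor how prices exactly at the median were binned, so both choices here are placeholders.

```python
from statistics import mean, median, pstdev


def z_normalize(prices):
    """z-score prices within one participant (population SD is an assumption)."""
    mu = mean(prices)
    sigma = pstdev(prices)
    if sigma == 0:
        raise ValueError("prices must vary to be z-normalized")
    return [(p - mu) / sigma for p in prices]


def median_split(prices):
    """Label prices 'high' or 'low' relative to the participant's median.

    Assumption: prices equal to the median are labelled 'low'.
    Used for graphical display only, not as a regressor.
    """
    med = median(prices)
    return ["high" if p > med else "low" for p in prices]


# Hypothetical price offers for one participant.
prices = [1.50, 5.40, 18.00]
z_prices = z_normalize(prices)
labels = median_split(prices)
```

Normalizing within each participant means a z-score of zero corresponds to that participant's own mean offer price, rather than to a price that is average across the whole sample.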

Federal Reserve System. The Federal Reserve Payments Study (2019).

Consumer Financial Protection Bureau. The Consumer Credit Card Market (2019).

New York Federal Reserve Bank. Quarterly Report on Household Debt and Credit, 2019Q4 (2019).

Ausubel, L. M. The failure of competition in the credit card market. Am. Econ. Rev. 81 , 50–81 (1991).

Stango, V. & Zinman, J. What do consumers really pay on their checking and credit card accounts? Explicit, implicit, and avoidable costs. Am. Econ. Rev. 99 , 424–429 (2009).

Heidhues, P. & Koszegi, B. Exploiting naivete about self-control in the credit market. Am. Econ. Rev. 100 , 2279–2303 (2010).

Prelec, D. & Simester, D. Always leave home without it: a further investigation of the credit-card effect on willingness to pay. Mark. Lett. 12 , 5–12 (2001).

Soman, D. The effect of payment transparency on consumption: quasi-experiments from the field. Mark. Lett. 14 , 173–183 (2003).

Hirschman, E. C. Differences in consumer purchase behavior by credit card payment system. J. Consum. Res. 6 , 58–66 (1979).

Chatterjee, P. & Rose, R. L. Do payment mechanisms change the way consumers perceive products?. J. Consum. Res. 38 , 1129–1139 (2012).

Soman, D. Effects of payment mechanism on spending behavior: the role of rehearsal and immediacy of payments. J. Consum. Res. 27 , 460–474 (2001).

Thomas, M., Desai, K. K. & Seenivasan, S. How credit card payments increase unhealthy food purchases: visceral regulation of vices. J. Consum. Res. 38 , 126–139 (2011).

Inman, J. J., Winer, R. S. & Ferraro, R. The interplay among category characteristics, customer characteristics, and customer activities on in-store decision making. J. Mark. 73 , 19–29 (2009).

Feinberg, R. A. Credit cards as spending facilitating stimuli: a conditioning interpretation. J. Consum. Res. 13 , 348–356 (1986).

McCall, M. & Belmont, H. J. Credit card insignia and restaurant tipping: evidence for an associative link. J. Appl. Psychol. 81 , 609 (1996).

Raghubir, P. & Srivastava, J. Monopoly money: the effect of payment coupling and form on spending behavior. J. Exp. Psychol. Appl. 14 , 213 (2008).

Berridge, K. & Aldridge, J. W. Decision utility, incentive salience, and cue-triggered ‘wanting’. In Oxford Series in Social Cognition and Social Neuroscience (2009).

Wyvell, C. L. & Berridge, K. C. Incentive sensitization by previous amphetamine exposure: increased cue-triggered “wanting” for sucrose reward. J. Neurosci. 21 , 7831–7840 (2001).

Prelec, D. & Loewenstein, G. The red and the black: Mental accounting of savings and debt. Mark. Sci. 17 , 4–28 (1998).

Thaler, R. H. Mental accounting matters. J. Behav. Decis. Mak. 12 , 183 (1999).

Knutson, B., Rick, S., Wimmer, G. E., Prelec, D. & Loewenstein, G. Neural predictors of purchases. Neuron 53 , 147–156 (2007).

Knutson, B. et al. Neural antecedents of the endowment effect. Neuron 58 , 814–822 (2008).

Mazar, N., Plassmann, H., Robitaille, N. & Lindner, A. Pain of Paying? A Metaphor Gone Literal: Evidence from Neural and Behavioral Science. SSRN working paper (2016).

Ceravolo, M. G., Fabri, M., Fattobene, L., Polonara, G. & Raggetti, G. Cash, card or smartphone: the neural correlates of payment methods. Front. Neurosci. 13 , 1188 (2019).

Karmarkar, U. R., Shiv, B. & Knutson, B. Cost conscious? The neural and behavioral impact of price primacy on decision making. J. Mark. Res. 52 , 467–481 (2015).

Bartra, O., McGuire, J. T. & Kable, J. W. The valuation system: a coordinate-based meta-analysis of BOLD fMRI experiments examining neural correlates of subjective value. Neuroimage 76 , 412–427 (2013).

Levy, D. J. & Glimcher, P. W. The root of all value: a neural common currency for choice. Curr. Opin. Neurobiol. 22 , 1027–1038 (2012).

Knutson, B. & Karmarkar, U. Appetite, consumption, and choice in the human brain. Interdiscip. Sci. Consum. 163 (2014).

Calder, A. J., Lawrence, A. D. & Young, A. W. Neuropsychology of fear and loathing. Nat. Rev. Neurosci. 2 , 352–363 (2001).

Coghill, R. C., Sang, C. N., Maisog, J. M. & Iadarola, M. J. Pain intensity processing within the human brain: a bilateral, distributed mechanism. J. Neurophysiol. 82 , 1934–1943 (1999).

Coghill, R. C. et al. Distributed processing of pain and vibration by the human brain. J. Neurosci. 14 , 4095–4108 (1994).

Paulus, M. P. & Stein, M. B. An insular view of anxiety. Biol. Psychiatry 60 , 383–387 (2006).

Critchley, H. D., Wiens, S., Rotshtein, P., Öhman, A. & Dolan, R. J. Neural systems supporting interoceptive awareness. Nat. Neurosci. 7 , 189–195 (2004).

Kelly, C. et al. A convergent functional architecture of the insula emerges across imaging modalities. Neuroimage 61 , 1129–1142 (2012).

Dunfield, D. & Prelec, D. Committing to Plastic: The Effect of Credit Cards on Purchase Intention. SSRN working paper (2013).

Tabibnia, G., Satpute, A. B. & Lieberman, M. D. The sunny side of fairness: preference for fairness activates reward circuitry (and disregarding unfairness activates self-control circuitry). Psychol. Sci. 19 , 339–347 (2008).

Sanfey, A. G., Rilling, J. K., Aronson, J. A., Nystrom, L. E. & Cohen, J. D. The neural basis of economic decision-making in the ultimatum game. Science 300 , 1755–1758 (2003).

Ruff, C. C. & Fehr, E. The neurobiology of rewards and values in social decision making. Nat. Rev. Neurosci. 15 , 549 (2014).

Bernheim, B. D. & Rangel, A. Addiction and cue-triggered decision processes. Am. Econ. Rev. 94 , 1558–1590 (2004).

Laibson, D. A cue-theory of consumption. Q. J. Econ. 116 , 81–119 (2001).

Park, J., Lee, C. & Thomas, M. Why do cashless payments increase unhealthy consumption? The decision-risk inattention hypothesis. J. Assoc. Consum. Res. 38 , 126–139 (2020).

Eichenbaum, H. & Cohen, N. J. From conditioning to conscious recollection: Memory systems of the brain (2001).

Grossberg, S. Processing of expected and unexpected events during conditioning and attention: a psychophysiological theory. Psychol. Rev. 89 , 529 (1982).

Köszegi, B. & Rabin, M. A model of reference-dependent preferences. Q. J. Econ. 121 , 1133–1165 (2006).

Peciña, S. & Berridge, K. C. Dopamine or opioid stimulation of nucleus accumbens similarly amplify cue-triggered ‘wanting’ for reward: entire core and medial shell mapped as substrates for PIT enhancement. Eur. J. Neurosci. 37 , 1529–1540 (2013).

Le Foll, B., Gallo, A., Le Strat, Y., Lu, L. & Gorwood, P. Genetics of dopamine receptors and drug addiction: a comprehensive review. Behav. Pharmacol. 20 , 1–17 (2009).

Shams, L. & Seitz, A. R. Benefits of multisensory learning. Trends Cogn. Sci. 12 , 411–417 (2008).

De-Sola Gutiérrez, J., Rodríguez de Fonseca, F. & Rubio, G. Cell-phone addiction: a review. Front. Psychiatry 7 , 175 (2016).

Perry, V. G. Is ignorance bliss? Consumer accuracy in judgments about credit ratings. J. Consum. Aff. 42 , 189–205 (2008).

Shah, A. M., Eisenkraft, N., Bettman, J. R. & Chartrand, T. L. ‘Paper or plastic?’: how we pay influences post-transaction connection. J. Consum. Res. ucv056 (2015).

Becker, G. M., DeGroot, M. H. & Marschak, J. Measuring utility by a single-response sequential method. Behav. Sci. 9 , 226–232 (1964).

Jenkinson, M., Beckmann, C. F., Behrens, T. E. J., Woolrich, M. W. & Smith, S. M. FSL. NeuroImage 62 , 782–790 (2012).

Acknowledgements

The research was funded by the MIT Sloan School of Management, through the MIT Sloan Neuroeconomics Lab. Derek Dunfield was supported by the MIT Intelligence Initiative and the National Science and Engineering Council of Canada. The authors gratefully acknowledge the Athinoula A. Martinos Imaging Center at the McGovern Institute for Brain Research, MIT for their support in data collection, and constructive manuscript comments by three anonymous referees and Danica Mijovic-Prelec.

Author information

Authors and affiliations.

Eccles School of Business, University of Utah, Salt Lake City, UT, 84112, USA

  • Sachin Banker

MIT Sloan Neuroeconomics Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 02142, USA

Sachin Banker, Derek Dunfield, Alex Huang & Drazen Prelec

Sloan School of Management, Massachusetts Institute of Technology, Cambridge, MA, 02142, USA

Drazen Prelec

Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, 02142, USA

Department of Economics, Massachusetts Institute of Technology, Cambridge, MA, 02142, USA

Contributions

S.B., D.D., A.H., and D.P. contributed to the design and implementation of the study. S.B. and A.H. collected and analyzed the neural data. S.B. and D.P. drafted the manuscript.

Corresponding author

Correspondence to Sachin Banker .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Banker, S., Dunfield, D., Huang, A. et al. Neural mechanisms of credit card spending. Sci Rep 11 , 4070 (2021). https://doi.org/10.1038/s41598-021-83488-3

Received : 30 July 2020

Accepted : 04 February 2021

Published : 18 February 2021

DOI : https://doi.org/10.1038/s41598-021-83488-3

  • Open access
  • Published: 19 May 2021

Modelling customers credit card behaviour using bidirectional LSTM neural networks

  • Maher Ala’raj   ORCID: orcid.org/0000-0001-9315-0670 1 ,
  • Maysam F. Abbod 2 &
  • Munir Majdalawieh 1  

Journal of Big Data volume  8 , Article number:  69 ( 2021 ) Cite this article

With the rapid growth of consumer credit and the huge amount of financial data, developing effective credit scoring models is crucial. Researchers have developed complex credit scoring models using statistical and artificial intelligence (AI) techniques to help banks and financial institutions support their financial decisions. Neural networks are among the most widely used techniques in finance and business applications. Thus, the main aim of this paper is to help bank management score credit card clients using machine learning, by modelling and predicting consumer behaviour with respect to two aspects: the probability of single and of consecutive missed payments for credit card customers. The proposed model is based on the bidirectional Long-Short Term Memory (LSTM) model and gives the probability of a missed payment during the next month for each customer. The model was trained on a real credit card dataset, and the customer behavioural scores are analysed using classical measures such as accuracy, Area Under the Curve, Brier score, Kolmogorov–Smirnov test, and H-measure. Calibration analysis of the LSTM model scores showed that they can be considered as probabilities of missed payments. The LSTM model was compared to four traditional machine learning algorithms: support vector machine, random forest, multi-layer perceptron neural network, and logistic regression. Experimental results show that, compared with traditional methods, the consumer credit scoring method based on the LSTM neural network significantly improves consumer credit scoring.

Introduction

For many financial institutions such as banks, credit lending products such as credit cards, personal loans and mortgages are at the center of their business, and proper lending yields large gains. As a result, it is important for financial institutions and banks to acquire new customers and to retain profitable ones. Banks have built up extensive customer databases over the years, which can be used to analyze a bank’s performance and make forward-looking business decisions. Customers do not all behave the same way financially; therefore, customers who meet certain profitability requirements should be treated differently. Based on their repayment and purchasing behaviour, customers exhibiting such behaviour can be offered greater incentives and rewards [ 1 ]. To distinguish good customers from bad ones, banks need credit scoring and behavioural scoring. Credit scoring has been defined as the means of analyzing the likelihood that an applicant will default on their repayments [ 2 ]. Anderson [ 3 ] defined the term by dividing it into two parts: the first is ‘credit’, which means to buy an item and pay afterwards, and the second is ‘scoring’, which is similar to the method used for credit cards.

There are two major kinds of credit scoring: application credit scoring, where a score is used to decide on a new credit application; and behavioural scoring, where the score is used to manage existing customers after they have been given a loan. According to Liu [ 4 ], banks use behavioural scoring to guide their lending decisions in credit limit management strategies; managing debt collection and recovery; retaining future profitable customers; predicting accounts likely to close or settle early; offering new financial products and interest rates; managing dormant accounts; optimizing telemarketing operations; and predicting fraudulent activity [ 3 , 5 , 6 , 7 , 8 ], as well as the number of at-risk payments and the future risk of payment [ 9 ].

Furthermore, Lim and Sohn [ 10 ] have emphasized the benefits of having multifaceted models that predict when customers will fail to pay or repay debts, as follows: (1) Calculating the profitability over a customer’s lifetime and doing profit scoring; (2) Making available to the bank an average of default levels over time, which is beneficial for debt provisioning; (3) Assisting in arriving at the terms of the loan; and (4) Adapting more to changing economic conditions. Banks usually try to estimate a borrower’s creditworthiness and the probability that a customer will miss a payment in general, and consecutive payments in particular [ 11 ]. These models help the bank to act quickly against any risk that results in unfavorable behaviour by borrowers [ 12 ].

This paper focuses on behavioural scoring. According to Hsieh [ 13 ], behavioural scoring is used to examine the behaviour of existing customers, considering their attitudinal variables, and to estimate their payment behaviour or credit status. Behavioural scoring lets lenders consistently monitor the changing behaviour or features of customers and helps direct customer-level decision making.

Motivations

The primary source of credit card risk for banks is client default, which is the failure to repay a debt on a loan or security. A default can happen when a borrower cannot make timely payments, misses payments, or avoids or stops making payments. In the case of credit cards, no assets secure the debt, but the lender still has legal recourse in the event of default. Credit card companies typically allow a few months before an account goes into default. However, if after 6 months or more there have been no instalments, the account will be charged off, meaning the lender takes a loss on the account [ 14 ]. Consecutive missed payments for credit card debt are an early sign of customer bankruptcy. Following the Basel II convention, consumer credit default is commonly defined as delinquency beyond a period of 90 days [ 15 ]. Therefore, the research of this paper is motivated by the necessity of automatically scoring the customer’s behaviour on repayments to make risk decisions, and the use of credit card scores to make necessary financial security decisions. Using such scores, banks can classify customers into “risk groups”, which could help to detect potential bankruptcy early and block the customer’s card in time to limit losses. Hence, the task of estimating the missed payment probability for clients who already have one or more missed payments turns out to be important for bank management.

The main drawback of existing automatic scoring solutions lies in the necessity for bank management to manually extract features from raw transactional data. This process is subjective and can lead to the loss of information in the data. In contrast, an LSTM extracts features internally, in a way that is hidden from outside observers.

The main aim of this paper is to help bank management in scoring credit card clients using machine learning techniques. The main contributions and objectives of this paper, based on the above motivations, are:

Introduce a deep learning neural network architecture based on Long-Short Term Memory (LSTM) bidirectional neural networks as a method of customer behaviour score estimation.

Demonstrate the feasibility of the LSTM model by testing it on a real credit card dataset and comparing it with other classifiers.

The developed LSTM model is compared to four classical machine learning algorithms: Support Vector Machine (SVM), Random Forest (RF), Bagged Neural Network (NN), and Logistic Regression (LOGR). The paper discusses the importance of performing a detailed comparison procedure, and shows that the LSTM model achieves high accuracy and best serves the users’ interest.

The remainder of the paper is organized as follows: Section “ Machine learning approaches in behavioural scoring ” gives a preview of the relevant literature on machine learning models in credit and behavioural scoring. Section “ Methodology ” describes the proposed methodology that is used in this paper. Section “ Experimental design ” explains the experimental setup, whereas Section “ Results and discussion ” presents the experimental results and analysis. Finally, in Section “ Conclusion ”, conclusions are drawn, and future work prospects are discussed.

Machine learning approaches in behavioural scoring

Credit scoring has become a widely investigated subject among researchers and in the financial industry [ 16 ], with numerous models having been proposed and built using statistical approaches, for example, LOGR [ 6 ] and Linear Discriminant Analysis (LDA) [ 17 , 18 ]. Because of the financial crisis, the Basel Committee on Banking Supervision required all banks to apply rigorous credit assessment models when granting a loan to an individual customer or a company. Accordingly, research has shown that Artificial Intelligence (AI) techniques (e.g., neural networks, SVM, and RF) can be a good alternative to statistical approaches in building credit scoring models [ 19 , 20 , 21 ].

Behavioural scoring applies characteristics of customers’ ongoing behaviour to predict whether they are prone to default during a specific outcome period. Often the outcome period and fixed performance period are subjectively selected, which causes instability in the prediction-making process.

Most papers in the literature have centred on behavioural scoring with respect to customer loans [ 22 , 23 , 24 ]. However, behavioural scoring of clients’ credit card payments has not been adequately investigated. Behavioural scoring models help to analyse the purchasing behaviour of existing customers [ 25 ]. Only a few works have studied the mining of bank databases from the viewpoint of customer behavioural scoring [ 26 ]. To address this, Hsieh et al. [ 27 ] used a Taiwanese bank credit card dataset to demonstrate the effectiveness of behavioural scoring. The authors used three commonly discussed data mining techniques: LDA, SVM, and Back Propagation Neural Networks (BPNN).

In recent years, the volume of loan and credit card transaction data has grown significantly. Therefore, it is often difficult to use traditional mathematical and statistical models for such problems. To construct behavioural scoring models, practitioners must consider a few significant issues, such as the extent of the dataset to model, the planning horizon, and the drivers of unwanted behaviour [ 6 ]. The literature does not contain solid guidance on how best to answer these questions.

One approach is to apply feature selection to features generated from raw transactional data. Feature selection has been used in credit scoring problems [ 28 ]. More generally, feature selection is important in knowledge discovery in databases (KDD), with applications including colorectal cancer case phenotypes [ 29 ], breast cancer identification [ 30 ], household poverty [ 31 ], and air pollution [ 32 ]. Meanwhile, an extended version of SVM-DHGLM increased accuracy, precision, and recall for feature selection and classification [ 33 ].

Hence, this paper explores some of the issues affecting the design of a behavioural scoring model using machine learning, by investigating performance on a large dataset of credit card transactions.

Pereira [ 34 ] examined the behaviour of credit card holders depending on whether they make payments involving large amounts of money. In Alborzi and Khanbabaei [ 35 ], a new hybrid model of behavioural scoring and credit scoring based on data mining and neural network techniques is introduced for both banking and marketing purposes. Bastani et al. [ 36 ] suggested a two-stage scoring approach using wide and deep learning that integrates credit scoring and profit scoring. Stage 1 was designed to identify non-default loans, which were then passed to stage 2 for probability prediction; wide and deep learning was used to build the predictive models in both stages to achieve both memorization and generalization. Akkoç [ 37 ] proposed a three-stage hybrid Adaptive Neuro Fuzzy Inference System credit scoring model, which is based on statistical and neuro-fuzzy techniques. Addo et al. [ 38 ] built binary classifiers based on machine and deep learning models on real data to predict loan-default probability. Gui [ 39 ] applied multiple machine learning algorithms to analyse the default payment of credit cards. Based on user operation behaviour data from the P2P lending industry, Wang [ 21 ] proposed a consumer credit scoring method based on an attention-mechanism LSTM.

Considering the relevant literature and to the best of our knowledge, there are no studies which apply LSTM neural networks to the task of predicting consecutive missed payments and defaults for customers’ credit cards. For example, in [ 21 ] an LSTM neural network was used, but the application differs from the field of this research; in Heryadi and Warnars [ 40 ] and Graves et al. [ 41 ] various neural network architectures were used, but the research topic was credit card fraud detection, which is different from ours. We also show that the scores of the model can be treated as probabilities, which is a significant result.

This paper contributes to the literature on credit and behavioural scoring, since the application of LSTM neural networks to missed payment analysis with concurrent use of customer information has not been studied previously.

Methodology

Recurrent and LSTM neural networks

Recurrent neural networks (RNNs) are a special class of supervised machine learning models. They are made of a sequence of cells with hidden states which have non-linear dynamics. RNNs are used mostly with time series data, for example, speech recognition [ 42 ], unsupervised anomaly detection [ 43 ], and automated translation [ 44 ]. LSTM is also used in economics to forecast time series data as an alternative to the ARIMA model [ 45 ]. As transactional data in credit cards has a temporal nature, it is advisable to use RNNs instead of other types such as fully connected or convolutional neural networks.

In a recurrent neural network, connections between cells form directed cycles. Each cell contains a hidden state, which is updated on each iteration using its previous values. Such a structure creates an internal network state and works as a memory. The RNN equations are:

$$s_{t} = f\left( {Ux_{t} + Ws_{t - 1} } \right),\quad h_{t} = g\left( {Vs_{t} } \right)$$ (1)

where \(x\) is an input vector, \(s\) is a hidden vector of RNN layer values, \(h\) is an output vector of RNN layer values, \(U\) is a weight matrix of the input layer to the hidden layer, \(V\) is a weight matrix of the hidden layer to the output layer, \(W\) is a weight matrix for the previous time point to the current time point of the hidden layer, and \(g\) and \(f\) are activation functions for output and hidden layers respectively. The structure of a standard RNN model is shown in Fig.  1 .

figure 1

RNN model structure [ 21 ]

In Fig. 1, the work of one RNN cell is illustrated. We feed the time series signal X to the cell element by element. The vector X can be an input vector or the output of another RNN cell from the previous layer. The RNN cell holds its state \(s\) . At each iteration \(t\) , the state \(s_{t}\) and output \(h_{t}\) are calculated by Eq. ( 1 ). Because of their architecture, RNNs can [ 21 ]:

Recognize patterns, characteristics, and dependencies in sequential and time series data;

Store, remember, and process past complex signals for long time periods;

Map an input sequence to the output sequence at the current timestep and predict the sequence in the next timestep; and

Replicate any target dynamics after the training process, even with adjusted accuracy.
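The recurrence described above can be illustrated with a minimal sketch in plain Python. It assumes the update \(s_{t} = f(Ux_{t} + Ws_{t - 1})\), \(h_{t} = g(Vs_{t})\) implied by the symbol definitions, with \(f\) = tanh and \(g\) = identity chosen only for illustration; the helper names and toy weights are hypothetical and not taken from the study.

```python
import math

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rnn_forward(xs, U, W, V, s0):
    """Run a simple RNN over the sequence xs.

    Assumed recurrence: s_t = tanh(U x_t + W s_{t-1}), h_t = V s_t.
    """
    s, outputs = s0, []
    for x in xs:
        pre = [a + b for a, b in zip(matvec(U, x), matvec(W, s))]
        s = [math.tanh(p) for p in pre]   # hidden state update
        outputs.append(matvec(V, s))      # output at this time step
    return outputs, s

# Toy example: 1 input feature, 2 hidden units, 1 output, 6 time steps.
U = [[0.5], [-0.3]]
W = [[0.1, 0.2], [0.0, 0.4]]
V = [[1.0, -1.0]]
xs = [[0.0], [1.0], [0.0], [2.0], [0.0], [0.0]]
outputs, final_state = rnn_forward(xs, U, W, V, s0=[0.0, 0.0])
```

Because the hidden state is carried from one step to the next, the outputs after the non-zero inputs remain non-zero even when the later inputs are zero, which is the memory effect described in the text.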

However, there are issues with learning long-term dependencies. Because RNN is prone to vanishing gradients during training, it is difficult to learn long-term dependencies [ 46 , 47 ]. To solve this problem, Hochreiter and Schmidhuber [ 48 ] have proposed an LSTM based on RNN. As with RNNs, LSTM predictions are always conditioned by the experience of the network’s inputs. Its distinguishing feature is the existence of special units called memory blocks in the recurrent hidden layer, which perform like accumulators of the state information. Every memory block has memory cells with self-connections, which store the temporal network state, and special multiplicative units called gates, which can control the stream of information. These cells and gates allow the LSTM to trap the gradient in the cell (also known as constant error carousels) and prevent it from vanishing. The gate activation functions are sigmoid, thus output value ranges from 0 to 1, and denotes how much information can be allowed to pass outside. The structure of a single LSTM cell is shown in Fig.  2 .

figure 2

LSTM model structure [ 21 ]

As seen in Fig. 2, an LSTM cell consists of three gates: an input gate, which controls how much new information is stored in the cell state; an output gate, which controls how much of the cell state is sent to the next cell; and a forget gate, which controls how much information is removed [ 49 , 50 ]. Two of these gates contain internal states. On each iteration \(t\) , the LSTM cell uses the previous values of the memory state \(C_{t - 1}\) and the output vector \(h_{t - 1}\) to calculate their next values. The output of each gate is post-processed using activation functions. The shape of the activation function is important and can significantly affect the efficiency of the neural network [ 43 ].

By default, the activation function of the recurrent gates is a sigmoid function [ 48 ], which is a non-linear activation function that is used mostly in feedforward neural networks. It is a bounded, monotonically increasing, differentiable real function, defined for all real input values, as given by the following sigmoid function equation:

$$\sigma \left( x \right) = \frac{1}{{1 + e^{ - x} }}$$ (2)

The sigmoid function is applied to the output layers of the deep learning architectures in binary classification problems, modelling logistic regression tasks as well as other neural network domains. However, the sigmoid activation function suffers major drawbacks which include sharp damp gradients during back propagation from deeper hidden layers to the input layers, gradient saturation, slow convergence, and non-zero-centred output, thereby causing the gradient updates to propagate in different directions [ 28 ].

The hyperbolic tangent function is the default activation function for an LSTM cell’s candidate vector and output [ 48 ]. The hyperbolic tangent function, tanh, is a smooth antisymmetric function with values in the range (− 1, 1). The output of the tanh function is given by:

$$\tanh \left( x \right) = \frac{{e^{x} - e^{ - x} }}{{e^{x} + e^{ - x} }}$$ (3)

The main advantage provided by tanh is that it produces zero-centred output, thereby aiding the back-propagation process. The detailed procedure of an LSTM cell is explained as follows:

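On the first step, the LSTM decides which information to forget. For this purpose, the information of the previous memory state is processed through the forget gate \(f_{t}\) :

$$f_{t} = \sigma \left( {W_{f} \cdot \left[ {h_{t - 1} ,x_{t} } \right] + b_{f} } \right)$$ (4)

On the second step, input gates \(i_{t}\) decide which information should be updated, and the tanh layer updates the candidate vector \(\tilde{C}_{t}\) :

$$i_{t} = \sigma \left( {W_{i} \cdot \left[ {h_{t - 1} ,x_{t} } \right] + b_{i} } \right),\quad \tilde{C}_{t} = \tanh \left( {W_{C} \cdot \left[ {h_{t - 1} ,x_{t} } \right] + b_{C} } \right)$$ (5)

On the next step, memory states \(C_{t}\) are updated as a combination of the two parts above:

$$C_{t} = f_{t} \odot C_{t - 1} + i_{t} \odot \tilde{C}_{t}$$ (6)

Finally, output gates \(o_{t}\) are used for controlling the output \(h_{t}\) :

$$o_{t} = \sigma \left( {W_{o} \cdot \left[ {h_{t - 1} ,x_{t} } \right] + b_{o} } \right),\quad h_{t} = o_{t} \odot \tanh \left( {C_{t} } \right)$$ (7)

where \(\left[ {h_{t - 1} ,x_{t} } \right]\) denotes the concatenation of the previous output and the current input, and \(\odot\) denotes element-wise multiplication.

Therefore, each LSTM layer is characterized by [ 48 ]:

matrix \(W_{f}\) and vector \(b_{f}\) , which are parameters of the forget gate;

matrices \(W_{i}\) , \(W_{C}\) and vectors \(b_{i}\) , \(b_{C}\) , which are parameters of the input gate and its tanh layer; and

matrix \(W_{o}\) and vector \(b_{o}\) , which are parameters of the output gate.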
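The four steps above can be sketched as a single update function in plain Python. This is a minimal illustration under the standard LSTM formulation; the parameter names `W_i` and `b_i` for the input gate, the dictionary layout, and the toy values are assumptions made for the example rather than details of the study.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """Compute W v + b for a matrix W (list of rows) and vectors v, b."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM update; p holds weights W_f, W_i, W_C, W_o and biases b_*."""
    z = h_prev + x_t  # list concatenation, i.e. [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(p["W_f"], z, p["b_f"])]      # forget gate
    i_t = [sigmoid(a) for a in affine(p["W_i"], z, p["b_i"])]      # input gate
    c_hat = [math.tanh(a) for a in affine(p["W_C"], z, p["b_C"])]  # candidate
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    o_t = [sigmoid(a) for a in affine(p["W_o"], z, p["b_o"])]      # output gate
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t

# Toy setup: 1 hidden unit, 1 input feature, all weights and biases zero,
# so every gate equals sigmoid(0) = 0.5 and the candidate is tanh(0) = 0.
zero_w = [[0.0, 0.0]]
params = {k: zero_w for k in ("W_f", "W_i", "W_C", "W_o")}
params.update({"b_f": [0.0], "b_i": [0.0], "b_C": [0.0], "b_o": [0.0]})
h1, c1 = lstm_step([1.0], [0.0], [2.0], params)
```

With these toy values half of the previous memory state is kept (2.0 becomes 1.0) and nothing new is written, which shows how the forget and input gates scale the two terms of the memory update.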

To increase the performance and learning speed of LSTM neural networks, bidirectional LSTM neural networks were proposed [ 51 ]. According to Schuster and Paliwal [ 51 ], bidirectional LSTMs are an extension of traditional LSTMs that can improve model performance on sequence classification problems. In problems where all time steps of the input sequence are available, bidirectional LSTMs train two LSTMs instead of one on the input sequence: the first on the input sequence as-is and the second on a reversed copy of it. This can provide additional context to the network and result in faster and even fuller learning on the problem.

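According to Fig. 3, the forward layer output sequence, \(\mathop{h}\limits^{\rightarrow} \) , is iteratively calculated using inputs in a positive sequence from time t  =  0 to time t  =  T , while the backward layer output sequence, \(\mathop{h}\limits^{\leftarrow} \) , is calculated using the reversed inputs from time t  =  T to t  =  0 . Both the forward and backward layer outputs are calculated by using the standard LSTM updating equations, Eqs. ( 2 – 7 ). The Bidirectional LSTM layer generates an output vector, \(Y_{t}\) , in which each element is calculated by using the following equation:

$$y_{t} = \sigma \left( {\mathop{h}\limits^{\rightarrow} _{t} ,\mathop{h}\limits^{\leftarrow} _{t} } \right)$$ (8)

where \(\sigma\) here denotes the function used to combine the two output sequences, for example concatenation, summation, averaging, or multiplication.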

figure 3

Bidirectional LSTM architecture [ 53 ]

One more extension of stacked LSTM neural networks is the “Attention” mechanism. The attention mechanism in a deep learning model simulates the attention of the human brain. When people observe images, they do not carefully look at every pixel of the image. Instead, they focus their attention selectively on some important parts of the image, ignoring other unimportant parts. Initially, the attention mechanism was developed for automatic translation tasks [ 52 ], but its usage was later extended to image recognition and classification problems.

Proposed model

Even though the LSTM neural network principles are already well studied, choosing the architecture is often up to the researcher [ 21 , 45 , 53 ]. This includes choosing the number and type of layers, number of cells in each layer, activation functions, etc. In order to use the LSTM architecture in the behavioural scoring task, it must be modified to make it possible to use not only transactional data but also other customer data (age, salary, country of origin, etc.).

The neural network architecture is usually chosen with respect to the data used for training. It is therefore important to exploit the spatial structure and order of the input data so that an efficient model with a low number of parameters (weights) can be built. For temporal input, RNNs and LSTM neural networks are usually used. However, an LSTM network alone is not applicable to mixed temporal and non-temporal data. One solution is to feed the non-temporal data into dense layers on top of the LSTM, but in this case the non-temporal features are used only in the final stage of the model.

The Attention layer accepts an optional query input, which is used as a context for the temporal input. We use the non-temporal data as the query input to this layer to add context about a customer’s good or bad payment behaviour. Hence, such a layer is able to distinguish the financial behaviour of customers while taking into account their educational and marital status, as well as gender and age.

As can be seen from Fig. 4, the first two layers are bidirectional LSTM layers and the next layer is Attention. The two last layers operate on the concatenation of the Attention layer output and the non-temporal client data. The last layer consists of only one neuron.

figure 4

The data processing flow in LSTM Neural Network model

Table 1 shows the hyperparameters for the developed models. As can be seen, the model for monthly purchase estimation is more complex than the one for missed payment prediction. This can be explained by the fact that, in general, regression problems are more complex than classification ones. The number of neurons in each layer was selected using grid search, while the activation functions were adopted from the most commonly used ones in similar research [ 21 , 40 , 53 ].

Time window parameter is important, but it belongs to the input data rather than model, so it will be defined in Section “ Data description ”.

Experimental design

The aim of the LSTM model is to automate credit card behaviour scoring for customers as well as to trigger an early alert for credit card default. The framework of the proposed model is presented in Fig.  5 . The workflow presented will let us fully investigate the model performance to make reliable conclusions.

figure 5

Model framework

The proposed framework consists of several steps. Firstly, the dataset is pre-processed and formatted for use by the Bidirectional LSTM classifier. Next, fivefold cross-validation is used to obtain predictions for all customers in the dataset. Then the performance measures are calculated for different groups of customers that are of financial interest to banks (banks are especially interested in customers with an unsatisfactory payment history). To assess the performance of the model, it is compared to benchmark models using various performance measures. Results are discussed in the final section.

There are only a few open-source transactional datasets that can be used to test the efficiency of the proposed model. The majority of datasets are either non-temporal or come from a different field of research. To verify the practicality and effectiveness of the proposed LSTM model, we use a public (Footnote 1) real credit card dataset that was used in Bahdanau et al. [ 52 ] and can be easily converted to temporal form.

Dataset description

The dataset used in this paper is a public non-transactional credit card dataset that reflects customers’ default payments in Taiwan [ 54 ]. It has been widely used in validating credit and behavioural scoring models [ 55 , 56 , 57 ], as well as deep learning models [ 58 , 59 ]. Usually, banks do not disclose transactional databases in raw form, and thus the majority of openly accessible datasets are in processed form. Hence, we used this dataset because it is the only publicly available dataset which can be converted into temporal form (customer payment statistics for each month rather than aggregated values).

The size of the data set is 30,000 records, which is large enough to test the efficiency of the proposed model. The number of non-default payments is 23,364, while the number of default payments is 6636 (proportion of default payments in dataset is 22%). There were no missing values in dataset.

In the dataset the following 23 variables are used as explanatory:

X1: Amount of the given credit, which includes both the individual consumer credit and his/her family (supplementary) credit.

X2: Gender (1 = male; 2 = female).

X3: Education (1 = graduate school; 2 = university; 3 = high school; 4 = others).

X4: Marital status (1 = married; 2 = single; 3 = others).

X5: Age (year).

X6–X11: History of past payment. Tracked payment records are denoted from September to April 2005 by X6–X11, respectively. The measurement scale for the repayment status is: − 1 = pay duly; 1 = payment delay for 1 month; 2 = payment delay for 2 months; ...; 8 = payment delay for 8 months; 9 = payment delay for 9 months and above.

X12–X17: Amount of bill statement. The amount of bill statement is denoted from September to April 2005 by X12–X17, respectively.

X18–X23: Amount of previous payment (NT dollar). X18 = amount paid in September 2005; X19 = amount paid in August 2005; ...; X23 = amount paid in April 2005.

The variables can be divided into two groups: numerical and categorical. The examples of the first are: X1 (amount of given credits), X5 (age), X6–X11 (history of past payment), etc. The second group contains such variables: X2 (gender), X3 (education), X4 (marital status).

Dataset pre-processing and partitioning

Before being fed into the neural network, the dataset was split into two parts: temporal data and non-temporal data. Columns X6–X23, the temporal data that reflect customer behaviour over time, were reshaped into a three-dimensional array of shape (number of customers, number of months, number of features). According to the dataset description, for each customer we have information about their payment behaviour during the 6 previous months. Therefore, the second dimension of the array is equal to six. The number of temporal features available for each customer is equal to three, namely:

Payment delay by the end of each past month;

Amount of bill statement by the end of each past month;

Amount of the payment in each month.
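The reshaping step can be sketched as follows. This is a minimal illustration in plain Python with nested lists standing in for the three-dimensional array; the function names, the dictionary-based record format, and the choice to order months from oldest to newest are assumptions of the sketch, not details confirmed by the study.

```python
def to_temporal(record):
    """Turn one customer's columns X6..X23 into a (6 months, 3 features) list.

    record maps column names "X6".."X23" to numbers. X6, X12 and X18 refer to
    the most recent month, so the months are reversed to run oldest to newest
    (this chronological ordering is an assumption of the sketch).
    """
    months = []
    for m in range(6):
        delay = record[f"X{6 + m}"]    # repayment status
        bill = record[f"X{12 + m}"]    # amount of bill statement
        paid = record[f"X{18 + m}"]    # amount of previous payment
        months.append([delay, bill, paid])
    return months[::-1]

def build_tensor(records):
    """Stack customers into shape (n_customers, 6, 3) as nested lists."""
    return [to_temporal(r) for r in records]

# Hypothetical customer whose column Xk simply holds the value k.
customer = {f"X{k}": float(k) for k in range(6, 24)}
tensor = build_tensor([customer, customer])
shape = (len(tensor), len(tensor[0]), len(tensor[0][0]))
```

Each inner row holds the three temporal features for one month, so the second dimension equals six and the third equals three, as described above.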

Non-temporal categorical data were converted into binary indicator variables; thus, for each customer there are eight non-temporal features:

Amount of given credit.

Gender.

Education: graduate school.

Education: university.

Education: high school.

Education: others.

Marital status.

Age.

To properly test the performance of the model, we use fivefold cross-validation as the partitioning technique. All customers were randomly split into five groups, and each group serves as the testing set exactly once.
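A minimal sketch of such a partition in plain Python is given below. The function names and the fixed random seed are hypothetical choices for the example; the study does not state how the random split was generated.

```python
import random

def five_fold_indices(n_customers, k=5, seed=0):
    """Randomly assign customer indices to k folds of near-equal size."""
    idx = list(range(n_customers))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def train_test_splits(n_customers, k=5, seed=0):
    """Yield (train, test) index lists; each fold is the test set once."""
    folds = five_fold_indices(n_customers, k, seed)
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

# 30,000 customers, as in the dataset described above.
splits = list(train_test_splits(30000))
```

Since every customer appears in exactly one test fold, concatenating the test predictions of the five models gives one out-of-sample prediction per customer.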

As it can be seen, most of the information in the dataset is stored in temporal features of past credit card activity and payments. On the other hand, non-temporal features are too general and are, in fact, categorical features. That is why without using temporal features it is impossible to predict future missed payment probability.

The Attention mechanism is used to provide a context: age and gender provide such context for the temporal financial information. This means that similar payment behaviour for young and old customers can lead to different payment outcomes (e.g., young customers may forget or skip a payment in some month and have a bad payment history, but they would pay eventually).

Benchmark models development

To measure how well the proposed approach performs, its results are compared to five benchmark models, namely GB, BNN, RF, SVM and LOGR. The latter model is the industry standard for developing credit scoring models [ 60 , 61 ]. However, [ 61 ] states that it is beneficial to compare a new method with the standard one as well as with other established techniques. MLP, RF, and SVM have been used in several studies as benchmark models [ 62 ]. The theoretical backgrounds of the models are described in the following sections.

Gradient boosting

Gradient Boosting (GB) machines are a family of powerful machine learning techniques that have shown considerable success in a wide range of practical applications. They are highly customizable to the particular needs of the application, for example by being learned with respect to different loss functions. The fundamental idea of boosting is to add new models to the ensemble sequentially. At each iteration, a new weak base-learner model is trained with respect to the error of the whole ensemble learnt so far [ 63 ].

Bagging neural network

Neural Networks (NN) are machine learning frameworks inspired by the structure of the biological neuron [ 64 ]. They are designed to mimic the human brain’s ability to discover complex relationships between inputs and outputs [ 65 ]. One of the best-known NN designs is the multi-layer perceptron (MLP), which comprises one input layer, at least one hidden layer, and one output layer. As per [ 66 ], the key issues to be addressed in building NNs are their topology, structure, and learning algorithm. The most used MLP topology for credit scoring is the three-layer feedforward back-propagation network. Consider the input of a credit scoring training set \(x = \left\{ {x_{1} , x_{2} , \ldots , x_{n} } \right\}\) ; the MLP model works in one direction, starting from feeding the data \(x\) to the input layer ( \(x\) includes the customer’s attributes or characteristics). These inputs are then sent to a hidden layer through links, or synapses, each associated with a random initial weight. The hidden layer processes what it has received from the input layer and applies an activation function to it. The result serves as a weighted input to the output layer, which further processes the weighted inputs and applies the activation function, leading to a final decision [ 67 ]. In recent years ensemble models have become more popular, so instead of a single NN, a Bagging NN with 10 neural networks is used.

Support vector machines

An SVM is another powerful machine learning method used in classification and credit scoring problems. SVMs are used for binary classification to find the best separation that splits the input data into two classes (good and bad credit). SVMs were first proposed by Cortes and Vapnik [ 68 ], adopting the form of a linear classifier. The primary distinction of the SVM model from a linear one is the use of a function that maps the data into a higher dimensional space. To achieve this, linear, polynomial, radial basis, and sigmoid kernel functions were suggested. An SVM maps non-linear data of two classes to a high-dimensional feature space, where a linear model is then used to separate the classes. The linear model in the new feature space corresponds to a non-linear decision boundary in the original space. Consequently, the SVM builds an optimal line or hyperplane that separates the two classes in that space with the maximum margin. SVMs are widely used in credit scoring and other fields owing to the method’s exceptional results [ 69 , 70 ].

Random forests

A random forest (RF), as proposed by Breiman [ 71 ], is considered an advanced decision tree (DT) technique that consists of a large number of trees. The trees are created by generating n subsets from the core dataset, with a tree built on each subset using randomly selected variables, hence the name “random forest”. After all the DTs are generated and trained, the final decision class is determined by voting: the class chosen by most trees is selected as the final output class of the RF.

Logistic regression

Logistic Regression (LOGR) has been considered until now to be the industry standard for credit scoring model development [ 68 ]. It is a broadly used statistical technique that is popular for solving classification and regression problems. LOGR is used to model a binary outcome variable, usually characterized by 0 or 1 (good and bad loans). The LOGR formula is expressed in Atiya and Parlos [ 19 ].

Performance measure metrics

To validate the proposed model and to reach a reliable conclusion on the predictive accuracy of the proposed method, five performance measures are used, specifically: (1) accuracy, (2) Area Under the Curve (AUC), (3) H-measure, (4) Kolmogorov–Smirnov (KS) chart, and (5) Brier’s score. These are chosen because they are popular in credit scoring and they give a comprehensive view of all facets of model performance. The accuracy stands for the proportion of correctly classified good and bad loans, which measures the predictive power of the model. As such, this is a standard that measures the discriminating ability of the model [ 68 ]. The accuracy can be defined as the percentage of correctly classified instances:

$$Accuracy = \frac{TP + TN}{{TP + FP + FN + TN}}$$ (9)

where TP, FN, FP and TN represent the number of true positives, false negatives, false positives and true negatives, respectively.

AUC is a tool used in binary classification analysis to determine which of the models predicts the classes best. It is evaluated as the area under the ROC curve of the classifier. According to Hand [ 72 ], the AUC can be used to estimate the model’s performance without any prior information about the error costs. However, it implicitly assumes different cost distributions for different classifiers, depending on their actual score distributions, which prevents classifiers from being compared effectively. As a result, Hand [ 72 ] proposed the H-measure as an alternative to AUC for measuring classification performance, which uses the same cost distribution for all classifiers, independent of their scores. In other words, this measure uses a single threshold distribution for all classifiers.
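As an illustration of what the AUC measures, the sketch below computes it through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, which equals the area under the ROC curve. The function name and the toy labels and scores are made up for the example.

```python
def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as one half. The pairwise loop is O(n_pos * n_neg), which is
    acceptable for a small illustration but not for large datasets.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

# Toy data: 1 = missed payment, 0 = payment made on time.
y = [0, 0, 1, 1]
s = [0.1, 0.4, 0.35, 0.8]
example_auc = auc(y, s)
```

In this toy case three of the four positive/negative pairs are ordered correctly, so the AUC is 0.75; a value of 0.5 corresponds to random ranking.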

The KS statistic was originally formulated as a hypothesis test of how well a distribution fits the data. In binary classification problems, it has been used as a divergence metric for assessing the classifier’s discriminant power by measuring the distance that its score produces between the cumulative distribution functions of the two data classes [ 73 ].
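A minimal sketch of the KS statistic for a binary classifier is shown below: the largest vertical gap between the empirical cumulative distribution functions of the scores in the two classes. The function name and the toy data are assumptions made for illustration.

```python
def ks_statistic(y_true, scores):
    """Largest gap between the empirical score CDFs of the two classes."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    best = 0.0
    for t in sorted(set(scores)):
        cdf_pos = sum(1 for s in pos if s <= t) / len(pos)
        cdf_neg = sum(1 for s in neg if s <= t) / len(neg)
        best = max(best, abs(cdf_neg - cdf_pos))
    return best

# Made-up labels (1 = missed payment) and scores, for illustration only.
labels = [0, 0, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.6, 0.3, 0.7, 0.9]
ks = ks_statistic(labels, scores)
```

A value of 1 means the two score distributions do not overlap at all, while a value near 0 means the scores of the two classes are distributed almost identically.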

Lastly, the Brier score, which is also known as the mean squared error [ 74 ], measures the accuracy of the classifier’s probability predictions by taking the mean squared difference between the predicted probabilities and the actual outcomes. In other words, it shows the average squared error of the probability forecasts. The main difference between the Brier score and accuracy is that the former directly takes the probabilities into account, while accuracy transforms these probabilities into zero or one based on a predetermined threshold or cut-off score. The lower the Brier score, the better the classifier performance. The most common formulation of the Brier score is:

$$BS = \frac{1}{N}\sum\limits_{t = 1}^{N} {\left( {f_{t} - \sigma_{t} } \right)^{2} }$$ (10)

in which \(f_{t}\) is the probability that was forecast, \(\sigma_{t}\) the actual outcome of the event at instance t (zero if it does not happen and one if it does happen) and N is the number of forecasting instances.
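The definition can be checked on a small made-up example. The sketch below computes the mean squared difference between the forecast probabilities \(f_{t}\) and the outcomes \(\sigma_{t}\), and contrasts it with accuracy at a 0.5 cut-off; the function name and the numbers are hypothetical.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and outcomes."""
    n = len(forecasts)
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / n

# Made-up forecasts of a missed payment and the observed outcomes (1 = missed).
forecasts = [0.9, 0.2, 0.6, 0.1]
outcomes = [1, 0, 0, 0]
bs = brier_score(forecasts, outcomes)

# Accuracy only sees the thresholded predictions, not how confident they were.
predicted = [1 if f >= 0.5 else 0 for f in forecasts]
accuracy = sum(p == o for p, o in zip(predicted, outcomes)) / len(outcomes)
```

Here the single confident mistake (0.6 for a payment that was made) contributes 0.36 of the total squared error of 0.42, which gives a Brier score of 0.105, whereas accuracy records it only as one error in four.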

To check whether a model’s behavioural score can be interpreted as the likelihood of a missed payment, calibration curves are used. Also known as reliability diagrams, they can be applied to classifiers which predict a probability of the respective class. Reliability diagrams offer a diagnostic to check whether the scores are trustworthy: a prediction is considered trustworthy if the event happens with an observed relative frequency consistent with the forecast value [ 75 ]. A calibration curve is built by sorting the output scores of the classifier. In particular, the forecasts are apportioned into a fixed number of bins along the x-axis. The number of cases of the dominant class is then counted for each bin and normalized by the bin size to give the relative observed frequency. The results are then plotted as a line plot. If the classifier is forecasting accurately, the fraction of dominant-class cases and the mean probability assigned to the dominant class in each bin are expected to be close to one another. If it is not, these two values diverge. The point positions on the curve relative to the diagonal help to interpret the forecasts, for example:

Below the diagonal: the model has over-forecast; the probabilities are too large.

Above the diagonal: the model has under-forecast; the probabilities are too small.
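The binning procedure behind a calibration curve can be sketched as follows. This is a minimal version with equal-width bins; the function name, the number of bins, and the toy data are assumptions made for illustration.

```python
def calibration_curve(y_true, probs, n_bins=5):
    """Return (mean predicted probability, observed frequency) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, probs):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes to the last bin
        bins[b].append((p, y))
    curve = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            freq = sum(y for _, y in members) / len(members)
            curve.append((mean_p, freq))
    return curve

# Made-up scores: the four customers scored 0.9 miss a payment only 3 times in 4.
y_true = [0, 0, 1, 1, 1, 0]
probs = [0.1, 0.1, 0.9, 0.9, 0.9, 0.9]
curve = calibration_curve(y_true, probs)
```

Both points of this toy curve lie below the diagonal (observed frequencies of 0 and 0.75 against mean forecasts of 0.1 and 0.9), so by the rule above the model over-forecasts in both bins.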

Statistical significance tests

As indicated by Witten et al. [ 76 ], it is not adequate to demonstrate that one model achieves better results than another, because of the different performance measures or splitting techniques used. For a complete performance evaluation, it is appropriate to carry out hypothesis testing to confirm that the experimental differences in performance are statistically significant and not just due to random splitting effects. Selecting the right test depends on factors such as the number of datasets and the number of classifiers to be compared.

According to Demšar [ 77 ], statistical tests can be parametric (e.g., paired t -test) and non-parametric (e.g., Wilcoxon, McNemar). However, the author recommended non-parametric tests over parametric ones, as the latter can be conceptually unsuitable and statistically unsafe. Non-parametric tests may be more applicable and safer than parametric tests since they do not presume the normality of data or homogeneity of variance [ 77 ]. Accordingly, in this study, the McNemar test is adopted to compare the performance of all the models measured on a single dataset [ 78 ]. According to Kavzoglu [ 79 ], the McNemar test investigates the statistical significance of the differences in classifiers’ performances. The test is a chi-square ( \(\chi^{2}\) ) test for goodness of fit, comparing the distribution of counts expected under the null hypothesis to the observed counts. It is applied to a 2 × 2 contingency table, the cells of which include the number of cases correctly and incorrectly classified by both models and the number of samples classified correctly by only one model.

The aim of the McNemar test is to check the null hypothesis, which says that neither of the two models performs better than the other. The alternative hypothesis asserts that the performances of the two models are not equal. The McNemar statistic is as illustrated in Eq. ( 11 ):

where \(n_{ij}\) indicates the number of cases misclassified by model \(i\) but classified correctly by model \(j\) , and \(n_{ji}\) indicates the number of cases misclassified by model \(j\) but not by model \(i\) .

The computed statistic is treated as a value from the \(\chi^{2}\) distribution with 1 degree of freedom. Based on this assumption, the p-value is calculated. If this p-value is larger than the predefined significance level α, then we fail to reject the null hypothesis. Otherwise, we reject the null hypothesis and accept the alternative hypothesis. For example, if the value of the test statistic is greater than 3.84, then (according to the \(\chi^{2}\) table at the 95% confidence level) it can be stated that the two methods differ in their performances. In other words, the difference in performance between the methods \(i\) and \(j\) is said to be statistically significant [ 78 , 79 ].
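The decision rule can be illustrated with a short sketch in plain Python. It assumes the uncorrected form of the statistic, \((n_{ij} - n_{ji})^{2} /(n_{ij} + n_{ji})\); whether the study applies a continuity correction is not known here, so this choice, the function name, and the counts are assumptions. The p-value uses the exact tail of the \(\chi^{2}\) distribution with 1 degree of freedom, which equals `erfc(sqrt(x / 2))`.

```python
import math

def mcnemar(correct_i, correct_j):
    """McNemar test from per-case correctness flags of two models.

    n_ij counts cases misclassified by model i but not by model j, and n_ji
    the reverse. The uncorrected statistic is used (an assumption).
    """
    n_ij = sum(1 for a, b in zip(correct_i, correct_j) if not a and b)
    n_ji = sum(1 for a, b in zip(correct_i, correct_j) if a and not b)
    if n_ij + n_ji == 0:
        return 0.0, 1.0  # the models never disagree
    stat = (n_ij - n_ji) ** 2 / (n_ij + n_ji)
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square tail, 1 d.o.f.
    return stat, p_value

# Made-up outcome: 30 cases only model j gets right, 10 only model i gets right,
# 50 both get right, 10 both get wrong.
correct_i = [False] * 30 + [True] * 10 + [True] * 50 + [False] * 10
correct_j = [True] * 30 + [False] * 10 + [True] * 50 + [False] * 10
chi2_stat, p_val = mcnemar(correct_i, correct_j)
reject_null = p_val < 0.05
```

In this example the statistic is 10.0, which exceeds 3.84, so the p-value falls below 0.05 and the null hypothesis of equal performance is rejected.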

Results and discussion

In this section, the results of the proposed LSTM model are presented along with comparisons to the benchmark classifiers. The model is validated over the above-described dataset across five performance measure metrics. In addition, several tables and figures regarding the proposed model results and comparison to traditional models are provided and discussed. All the experiments for this study were performed using Python 3.8 (x64) on a PC with an AMD 8-core Ryzen™ 7 3700X 3.6–4.4 GHz processor and 32 GB RAM, running the Microsoft Windows 10 operating system.

To outline the discrimination power of the Bidirectional LSTM model, performance measures are calculated not only for all active customers, but also for different subsets of them:

Customers with one missed payment during the last 2 months generally have a low risk of default, but the recent missed payment is a reason to look at this group more closely.

Customers with a missed payment during the last month are a subset of the first group. Whilst one payment can be missed by chance, this group needs to be examined to distinguish riskier customers from the others.

Customers with two consecutive missed payments form a group in which most customers might have financial problems, because it is unlikely that a customer simply forgets to pay for more than 1 month.

Customers with three consecutive missed payments are those on the verge of default. For this group, a fourth missed payment is equivalent to default, so a Bidirectional LSTM prediction of the fourth missed payment is a prediction of default.

As a next step, the model was compared with five classical classifiers: Gradient Boosting, Bagging Neural Network, Logistic Regression, SVM, and Random Forest. Comparisons were made not only using the performance measures but also using the statistical McNemar test.

The LSTM model provides probability of missed payment for next month for each customer based on previous 6 months, and it does not use future data to predict past.

Bidirectional LSTM model results

To prove that the results obtained on the testing set are sound and to make the results of Bidirectional LSTM significant, different measures need to be evaluated, each of which reflect different aspects of the model performance:

Accuracy is the simplest method of evaluating the model preciseness. It does not consider any misclassification loss and simply displays the proportion of correctly classified missed payments for the default score threshold, which is equal to 0.5.

Sensitivity measures the proportion of missed payments that are correctly identified.

Specificity measures the proportion of payments made on time that are correctly identified.

The balanced accuracy is used in binary and multiclass classification problems to deal with imbalanced datasets. It is defined as the average of recall obtained on each class. For binary classification problems, balanced accuracy is evaluated using Eq. ( 12 ):

$$Balanced\;accuracy = \frac{Sensitivity + Specificity}{2}$$ (12)

AUC tells us how the model will perform for different selected thresholds.

Brier score reflects the discriminatory power of the model (i.e., how certain the model is about the customer’s predicted missed payment).

KS reflects the maximum difference between the fraction of correctly classified customers who missed a payment and the fraction of incorrectly classified customers who did not miss a payment. A high value tells us that the model correctly identifies not only the presence of a missed payment but also its absence.

H-measure is an integral measure over all misclassification costs. A high H-measure value tells us that, regardless of the actual cost of misclassification, the total loss of the model is low.
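The threshold-based and score-based measures listed above can be sketched in a few lines. This is a minimal illustration, not the authors' evaluation code: the function name, the label encoding (1 = missed payment, 0 = paid on time) and the toy inputs are assumptions, and the H-measure is omitted because it requires a cost distribution that is not given here.

```python
import numpy as np

def classification_measures(y_true, scores, threshold=0.5):
    """Compute accuracy, sensitivity, specificity, balanced accuracy, Brier score and KS.

    y_true: 1 = missed payment, 0 = paid on time (assumed encoding).
    scores: predicted probability of a missed payment.
    """
    y_true = np.asarray(y_true, dtype=int)
    scores = np.asarray(scores, dtype=float)
    y_pred = (scores >= threshold).astype(int)

    # Confusion-matrix counts at the chosen threshold
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))

    sensitivity = tp / (tp + fn)  # recall on the missed-payment class
    specificity = tn / (tn + fp)  # recall on the paid-on-time class

    # KS: largest gap between true positive rate and false positive rate
    # over all candidate thresholds
    ks = 0.0
    for t in np.unique(scores):
        tpr = np.mean(scores[y_true == 1] >= t)
        fpr = np.mean(scores[y_true == 0] >= t)
        ks = max(ks, float(tpr - fpr))

    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Average of the recall obtained on each class
        "balanced_accuracy": (sensitivity + specificity) / 2,
        # Mean squared difference between score and outcome
        "brier": float(np.mean((scores - y_true) ** 2)),
        "ks": ks,
    }
```

The sketch assumes both classes are present in `y_true`; otherwise sensitivity or specificity is undefined.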

As shown in Table 2, the correctness of the LSTM model predictions is given in the “Accuracy” column. Performance measures for the customers with three or more consecutive missed payments are much lower than for the other groups. This could be explained by the fact that some proportion of customers drastically change their behaviour when facing the risk of bankruptcy and trial: based on their past behaviour they should have a fourth missed payment, but pressure from the bank forces them to pay. The table shows that the model considers consumers with past payment problems to be more prone to them in the future. The classifier accuracy is lower than for the transactional dataset, which can be explained by the initial data pre-processing, which might lead to information loss.

Sensitivity (the ability of the model to identify missed payments) is around 40% for the first two groups of customers and rises to 90% for the last one. On the other hand, specificity for all groups except the last one is more than 90%. This tells us that if the model identifies a customer as “low risk”, bank management should not worry about future payments from that customer. As mentioned earlier, the higher the AUC value, the better the classifier is at distinguishing between classes. The proposed model shows similar prediction ability on all subsets of active customers except the last one. For all subsets except the last, the AUC is higher than 77%, which indicates good classifier separability. The lower the Brier score, the better the classifier performs; an increase can be seen in the Brier score for the customers with three missed payments. The higher the Kolmogorov–Smirnov statistic, the better the discriminative power of the model. As mentioned before, for all subsets except the last one, this value is sufficiently high to indicate good discriminative ability.

As mentioned before, the H-measure is a measure of the misclassification loss, and it depends on the relative proportion of objects belonging to each class. The influence of the differing numbers of customers with missed payments can be seen from the table. Generally, the higher the H-measure, the better the classifier performs over different misclassification costs. For all subsets of customers investigated, this value is good enough.

As can be seen from Fig. 6, the AUC value for the Bidirectional LSTM model is high for all customers as well as for the specific risk groups except the last one (with three consecutive missed payments), despite the fact that the proportion of missed payments for all customers and for customers with missed payments differs greatly (see Table 2). The shape of the ROC curve is round for all customer groups. The highest AUC is for the customers with two consecutive missed payments. Banks can use this group to put pressure on such customers early and prevent a third missed payment.

Figure 6. ROC and AUC values for different customer groups

Figure 7 represents the behaviour score distribution for different customer groups along with the observed behaviour. The splitting into ten buckets along the x-axis was based on the customer missed payment prediction. Thus, it is expected that the number of customers without a missed payment will decrease along the axis, while the number of customers with a missed payment will increase, and the histogram reflects this tendency. Whilst there are, of course, misclassified clients in every group, their percentage is significantly lower than that of correctly classified clients. Thus, the proposed model can be considered reliable.

Figure 7. Distribution of scores for different customer groups

Figure 8 compares how well the probabilistic predictions of the Bidirectional LSTM for the different client groups are calibrated, using 10 bins. The calibration curve for all clients is the best calibrated of the groups. It fits the diagonal almost perfectly, which means that missed payment scores can be considered as probabilities. The only group whose curve is far from the central line is customers with three or more consecutive missed payments. This curve shows a small over-forecast for low scores, but in general it also lies close to the other curves.

Figure 8. Calibration curves for different customer groups
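A calibration curve with ten bins, as used for this figure, can be computed as in the following sketch. It is an illustration under assumptions (equal-width bins on [0, 1], hypothetical function name), not the authors' plotting code: scores are grouped into bins, and the mean predicted score in each bin is compared with the observed fraction of missed payments.

```python
import numpy as np

def calibration_curve_bins(y_true, scores, n_bins=10):
    """Return (mean predicted score, observed missed-payment rate) per non-empty bin."""
    y_true = np.asarray(y_true, dtype=float)
    scores = np.asarray(scores, dtype=float)

    # Equal-width bin edges on [0, 1]; only the interior edges are needed for digitize,
    # which also places a score of exactly 1.0 into the last bin
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_index = np.digitize(scores, edges[1:-1])

    mean_predicted, observed_rate = [], []
    for b in range(n_bins):
        in_bin = bin_index == b
        if not np.any(in_bin):
            continue  # skip empty bins
        mean_predicted.append(float(scores[in_bin].mean()))
        observed_rate.append(float(y_true[in_bin].mean()))
    return np.array(mean_predicted), np.array(observed_rate)
```

For a well-calibrated model the two returned arrays are close to each other, so the points lie near the diagonal of the calibration plot.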

Benchmark model results and comparison

To verify the strength and discriminative power of the proposed model, its performance was compared to five benchmark models, namely GB, BNN, RF, SVM and LOGR. As none of the benchmark classifiers accepts temporal input, all temporal data was flattened before being fed into each classifier. The comparison results are shown in Table 3, which gives the performance measures for the different classifiers on the same input data.
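Flattening temporal input for the benchmark classifiers can be illustrated as follows. The array layout (customers × months × features) and the function name are assumptions made for this sketch; the source does not describe the exact layout used.

```python
import numpy as np

def flatten_temporal(x):
    """Reshape (customers, months, features) into (customers, months * features)."""
    x = np.asarray(x)
    n_customers, n_months, n_features = x.shape
    # Each customer's monthly feature vectors are concatenated month by month
    return x.reshape(n_customers, n_months * n_features)
```

For example, a hypothetical array of shape (2, 6, 3) becomes a matrix of shape (2, 18), which a classifier such as logistic regression or a random forest can accept as ordinary tabular input.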

At first sight, the correctness of the predictions is similar and high for all the models. The model closest in performance to the proposed Bidirectional LSTM is the GB model. The performance of all classifiers is so close because of the lower dimensionality of the input data: for each customer there are only 23 features instead of the hundreds of transactions in the previous data set, so simple classifiers have fewer problems extracting useful information from the feature space. Threshold changes can improve classifier accuracy; maximum accuracy is achieved by applying the optimal threshold. As can be seen, there is a slight increase for all classifiers when the optimal threshold is applied, but the highest value still belongs to the proposed model. The Bidirectional LSTM and GB have similar KS values, which are slightly higher than the corresponding values for the other classifiers. A similar pattern can be observed for the H-measure. The Brier score for the Bidirectional LSTM model is the lowest, which supports the quality of this model.
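The effect of moving the decision threshold away from 0.5 can be sketched with a simple search over candidate thresholds. This is an illustrative sketch with a hypothetical function name; how the authors selected the optimal threshold is not stated in the text, and in practice the threshold would be chosen on validation data rather than on the test set.

```python
import numpy as np

def best_accuracy_threshold(y_true, scores):
    """Return the (threshold, accuracy) pair that maximises accuracy on the given data."""
    y_true = np.asarray(y_true, dtype=int)
    scores = np.asarray(scores, dtype=float)

    # Start from the default threshold of 0.5
    best_t = 0.5
    best_acc = float(np.mean((scores >= best_t).astype(int) == y_true))

    # Candidates: every observed score, plus infinity (predict "no missed payment" for all)
    candidates = np.append(np.unique(scores), np.inf)
    for t in candidates:
        acc = float(np.mean((scores >= t).astype(int) == y_true))
        if acc > best_acc:
            best_t, best_acc = float(t), acc
    return best_t, best_acc
```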

The proposed model has the highest sensitivity of all the models. Its specificity is equal to 95%. Although some other classifiers, such as Random Forest, SVM and LOGR, have higher specificity, their sensitivity is much lower. That is why the balanced accuracy of the Bidirectional LSTM classifier is the highest among the classifiers. Therefore, from Fig. 9 it can be concluded that the performance of the Bidirectional LSTM is the best among all considered classifiers. The worst AUC value is from the SVM classifier (especially in the second part of the plot), which means that it is acceptable to use it to increase the True Positive Rate value.

Figure 9. ROC and AUC values for all classifiers

For such a complex problem, even half a percentage point of increase in the accuracy or AUC of a classifier leads to a significant decrease in bank losses due to missed payments and customer bankruptcies. That is why we consider the results significant.

Figure 10 compares how well the probabilistic predictions of the different classifiers are calibrated, using a calibration curve with ten bins. The plot shows that there are two almost perfectly calibrated classifiers: Bidirectional LSTM and Bagging NN. That is why the scores of these classifiers can be used as probabilities. The RF and LOGR classifiers have the worst curves. To make sure that the differences in performance measures are statistically significant and are not caused by chance, the McNemar test is used.

Figure 10. Calibration curves for all classifiers

Table 4 presents the results of applying the McNemar test for pairwise comparison of the LSTM model and the other classifiers. The McNemar test uses the same significance threshold as for the transactional data set, α = 0.05. According to the results, every classifier pair shows a statistically significant performance difference. The classifier closest to the Bidirectional LSTM is GB, with a p-value of \(3.62 \times 10^{-5}\). The results of the McNemar test, combined with the previously mentioned performance measures, indicate that all the traditional classifiers show worse prediction ability than the Bidirectional LSTM model.
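The pairwise McNemar comparison can be sketched as below. This is a minimal illustration that assumes the continuity-corrected chi-square form of the test; the text does not state which variant the authors used, and the function name is hypothetical. The test looks only at the customers on which the two classifiers disagree.

```python
import math
import numpy as np

def mcnemar_test(y_true, pred_a, pred_b):
    """Continuity-corrected McNemar test for two classifiers on the same test set.

    Returns (chi-square statistic, p-value) with one degree of freedom.
    """
    y_true = np.asarray(y_true)
    a_correct = np.asarray(pred_a) == y_true
    b_correct = np.asarray(pred_b) == y_true

    # Discordant pairs: exactly one of the two classifiers is correct
    n_a_only = int(np.sum(a_correct & ~b_correct))
    n_b_only = int(np.sum(~a_correct & b_correct))

    if n_a_only + n_b_only == 0:
        return 0.0, 1.0  # the classifiers never disagree on correctness

    stat = (abs(n_a_only - n_b_only) - 1) ** 2 / (n_a_only + n_b_only)
    # Survival function of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

A p-value below the chosen significance level (α = 0.05 in the text) indicates that the difference between the two classifiers' error patterns is unlikely to be due to chance.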

As a last step, we provide an auxiliary table with the standard deviations of the most important measures across all folds. The last column gives the training time of each model.

As we can see from Table 5, the LSTM neural network has the lowest standard deviations for accuracy and AUC, and the third lowest for the Brier score. Although the LSTM is a more complex model than the others, its training time is comparable because it utilizes the GPU of the testing PC, which makes training much faster than for CPU-driven models.

The LSTM model can deal with real-time data; in addition, evaluation of the LSTM model is very fast (seconds of computational time).

The paper emphasizes the importance of credit card scoring for assessing and decreasing bank losses. A detailed comparison procedure showed that the LSTM model gives the highest accuracy in predicting late fees and missed payments, which makes it the best choice for banks’ interests. In this paper, a Bidirectional LSTM model was presented and validated on a non-transactional open dataset.

To demonstrate the effectiveness of the proposed model, it was compared to five traditional classification models. The following performance measures were used for the comparison: accuracy, AUC, H-measure, Kolmogorov–Smirnov test, Brier score, calibration curves, and the McNemar test. On the Taiwanese bank credit card dataset, the model has 82.4% accuracy, whilst the best of the other models has 81.8%. This may not seem like much; however, in the banking business even a 1% difference in the prediction of bad credit card behaviour makes a huge difference in terms of bank losses.

All measures indicate that the Bidirectional LSTM model outperforms the others. Therefore, it can be concluded that the Bidirectional LSTM performs statistically better than the other classifiers. Its calibration curve shows that the output of the model can be considered as the probability of default without any additional improvements.

Banks can use the outcome of the model not only as a binary output (whether a customer will have a missed payment in the next month), but can also make use of the score of each client.

The LSTM gives the probability that a user will be insolvent in the next month. It is up to management to set thresholds above which the bank moves this user into a high- or medium-risk group, with corresponding consequences for the user (decreasing the credit card limit, blocking the card, etc.).

In other words, the scores provided by the LSTM model can be used to group customers into different risk groups. Thus, the bank can use a different security and service level for each of these risk groups. Moreover, such scores can be used as missed payment probabilities, so bank management can calculate the potential losses of each customer and even of credit portfolios. This will allow management to efficiently assess financial risks and make bold financial decisions.

In future work, the model will be tested on other datasets, both transactional and non-transactional in nature, to confirm its efficiency. Moreover, the proposed model will be extended to customer credit scoring for consumer loans.
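The use of scores for risk grouping and loss estimation described above can be sketched as follows. The threshold values (0.3 and 0.6), the loss-given-default parameter and both function names are hypothetical choices made for illustration; the text leaves the thresholds to bank management.

```python
import numpy as np

def assign_risk_groups(scores, medium=0.3, high=0.6):
    """Map missed-payment probabilities to 'low', 'medium' or 'high' risk labels."""
    scores = np.asarray(scores, dtype=float)
    return np.where(scores >= high, "high",
                    np.where(scores >= medium, "medium", "low"))

def expected_loss(scores, exposures, loss_given_default=1.0):
    """Portfolio expected loss: sum of probability * exposure * loss given default."""
    scores = np.asarray(scores, dtype=float)
    exposures = np.asarray(exposures, dtype=float)
    return float(np.sum(scores * exposures * loss_given_default))
```

Treating the scores as probabilities in this way relies on the model being well calibrated, which is the property the calibration curves are meant to check.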

Availability of data and materials

The dataset used for supporting the conclusions of this paper is available from the public data repository at http://archive.ics.uci.edu/ml/index.php .

The dataset is available at https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients .

Abbreviations

LSTM: Long short-term memory

NN: Neural networks

BNN: Bagged neural network

RNN: Recurrent neural networks

ROC: Receiver operating curve

AUC: Area under the curve

KS: Kolmogorov–Smirnov

Dyché J. The CRM handbook: a business guide to customer relationship management. Boston: Addison-Wesley Longman Publishing; 2001.

Google Scholar  

Hand DJ, Henley WE. Statistical classification methods in consumer credit scoring: a review. J R Stat Soc Ser A. 1997;160(3):523–41. https://doi.org/10.1111/j.1467-985x.1997.00078x .

Article   Google Scholar  

Anderson R. The credit scoring toolkit: theory and practice for retail credit risk management and decision automation. Oxford: Oxford University Press; 2007.

Liu Y. New issues in credit scoring application. Institut für Wirtschaftsinformatik, Abteilung Wirtschaftsinformatik II, Georg-August-Universität, Göttingen. 2001.

Bensic M, Sarlija N, Zekic-Susac M. Modelling small-business credit scoring by using logistic regression, neural networks and decision trees. Intell Syst Acc Fin Manag. 2005;13(3):133–50. https://doi.org/10.1002/isaf.261 .

Kennedy K, Mac Namee B, Delany SJ, O’Sullivan M, Watson N. A window of opportunity: Assessing behavioural scoring. Expert Syst Appl. 2013;40(4):1372–80. https://doi.org/10.1016/j.eswa.2012.08.052 .

Malik M, Thomas LC. Modelling credit risk of portfolio of consumer loans. J Oper Res Soc. 2010;61(3):411–20. https://doi.org/10.1057/jors.2009.123 .

McNab H, Wynn A. Principles and practice of consumer credit risk management. Ottawa: CIB Publishing; 2000.

So MMC, Thomas LC. Modelling the profitability of credit cards by Markov decision processes. Eur J Oper Res. 2011;212(1):123–30. https://doi.org/10.1016/j.ejor.2011.01.023 .

Baesens B, Van Gestel T, Stepanova M, Van den Poel D, Vanthienen J. Neural network survival analysis for personal loan data. J Oper Res Soc. 2005;56(9):1089–98. https://doi.org/10.1057/palgrave.jors.2601990 .

Article   MATH   Google Scholar  

Lim MK, Sohn SY. Cluster-based dynamic scoring model. Expert Syst Appl. 2007;32(2):427–31. https://doi.org/10.1016/j.eswa.2005.12.006 .

Sarlija N, Bensic M, Zekic-Susac M. Comparison procedure of predicting the time to default in behavioural scoring. Expert Syst Appl. 2009;36(5):8778–8. https://doi.org/10.1016/j.eswa.2008.11.042 .

Hsieh N-C. An integrated data mining and behavioural scoring model for analyzing bank customers. Expert Syst Appl. 2004;27(4):623–33. https://doi.org/10.1016/j.eswa.2004.06.007 .

Bertola G, Disney R, Grant C. The economics of consumer credit. Cambridge: MIT Press; 2008.

Kim H, Cho H, Ryu D. An empirical study on credit card loan delinquency. Econ Syst. 2018. https://doi.org/10.1016/j.ecosys.2017.11.003 .

Kumar PR, Ravi V. Bankruptcy prediction in banks and firms via statistical and intelligent techniques: a review. Eur J Oper Res. 2007;180(1):1–28. https://doi.org/10.1016/j.ejor.2006.08.043 .

Baesens B, Van Gestel T, Viaene S, Stepanova M, Suykens J, Vanthienen J. Benchmarking state-of-the-art classification algorithms for credit scoring. J Oper Res Soc. 2003;54(6):627–35. https://doi.org/10.1057/palgrave.jors.2601545 .

Mylonakis J, Diacogiannis G. Evaluating the likelihood of using linear discriminant analysis as a commercial bank card owners credit scoring model. Int Bus Res. 2010. https://doi.org/10.5539/ibr.v3n2p9 .

Atiya AF, Parlos AG. New results on recurrent network training: unifying the algorithms and accelerating convergence. IEEE Trans Neural Netw. 2000;11(3):697–709. https://doi.org/10.1109/72.846741 .

Bellotti T, Crook J. Credit scoring with macroeconomic variables using survival analysis. J Oper Res Soc. 2009;60(12):1699–707. https://doi.org/10.1057/jors.2008.130 .

Wang C, Han D, Liu Q, Luo S. A deep learning approach for credit scoring of peer-to-peer lending using attention mechanism LSTM. IEEE Access. 2019;7:2161–8. https://doi.org/10.1109/access.2018.2887138 .

Thomas LC. A survey of credit and behavioural scoring: forecasting financial risk of lending to consumers. Int J Forecast. 2000;16(2):149–72. https://doi.org/10.1016/s0169-2070(00)00034-0 .

Thomas LC, Ho J, Scherer W. Time will tell: behavioural scoring and the dynamics of consumer credit assessment. IMA J Manag Math. 2001;12(1):89–103. https://doi.org/10.1093/imaman/12.1.89 .

Louzada F, Ara A, Fernandes GB. Classification methods applied to credit scoring: systematic review and overall comparison. Surv Oper Res Manag Sci. 2016;21(2):117–34. https://doi.org/10.1016/j.sorms.2016.10.001 .

Article   MathSciNet   Google Scholar  

Setiono R, Thong JYL, Yap C-S. Symbolic rule extraction from neural networks. Inf Manag. 1998;34(2):91–101. https://doi.org/10.1016/s0378-7206(98)00048-2 .

Sharda R, Wilson RL. Neural network experiments in business failures predication: a review of predictive performance issues. Int J Comput Intell Organ. 1996. https://doi.org/10.1109/hicss.1993.284245 .

Hsieh H-I, Lee T-P, Lee T-S. Data mining in building behavioural scoring models. 2010. https://doi.org/10.1109/cise.2010.5677005 .

Ha S, Nguyen H-N. Credit scoring with a feature selection approach based deep learning. MATEC Web Conf. 2016;54:5004. https://doi.org/10.1051/matecconf/20165405004 .

Cenggoro TW, Mahesworo B, Budiarto A, Baurley J, Suparyanto T, Pardamean B. Features importance in classification models for colorectal cancer cases phenotype in Indonesia. Procedia Comput Sci. 2019;157:313–20. https://doi.org/10.1016/j.procs.2019.08.172 .

Hassan MR, Hossain MM, Begg RK, Ramamohanarao K, Morsi Y. Breast-cancer identification using HMM-fuzzy approach. Comput Biol Med. 2010;40(3):240–51. https://doi.org/10.1016/j.compbiomed.2009.11.003 .

Sani NS, Abdul Rahman M, Bakar A, Sahran S, Sarim H. Machine learning approach for bottom 40 percent households (B40) poverty classification. Int J Adv Sci Eng Inf Technol. 2018;8:1698. https://doi.org/10.18517/ijaseit.8.4-2.6829 .

De Vito S, Piga M, Martinotto L, Di Francia G. CO, NO 2 and NO x urban pollution monitoring with on-field calibrated electronic nose by automatic bayesian regularization. Sensors Actuators B Chem. 2009;143(1):182–91. https://doi.org/10.1016/j.snb.2009.08.041 .

Caraka R, Lee Y, Chen R, Toharudin T. Using hierarchical likelihood towards support vector machine: theory and its application. IEEE Access. 2020. https://doi.org/10.1109/ACCESS.2020.3033796 .

Pereira S. Modelling credit card customer behaviour. Work project presented as a partial requirement for Degree of Master of Statistics and Information Management, with a specialization in Information Analysis and Management. 2019.

Alborzi M, Khanbabaei M. Using data mining and neural networks techniques to propose a new hybrid customer behaviour analysis and credit scoring model in banking services based on a developed RFM analysis method. Int J Bus Inf Syst. 2016;23(1):1. https://doi.org/10.1504/ijbis.2016.078020 .

Bastani K, Asgari E, Namavari H. Wide and deep learning for peer-to-peer lending. Expert Syst Appl. 2019;134:209–24. https://doi.org/10.1016/j.eswa.2019.05.042 .

Akkoç S. An empirical comparison of conventional techniques, neural networks and the three stage hybrid Adaptive Neuro Fuzzy Inference System (ANFIS) model for credit scoring analysis: The case of Turkish credit card data. Eur J Oper Res. 2012;222(1):168–78. https://doi.org/10.1016/j.ejor.2012.04.009 .

Addo P, Guegan D, Hassani B. Credit risk analysis using machine and deep learning models. Risks. 2018;6(2):38. https://doi.org/10.3390/risks6020038 .

Gui L. Application of machine learning algorithms in predicting credit card default payment, University of California. 2019.

Heryadi Y, Warnars HL. Spits Warnars, Learning temporal representation of transaction amount for fraudulent transaction recognition using CNN, stacked LSTM, and CNN-LSTM. 2017.

Jurgovsky J, et al. Sequence classification for credit-card fraud detection. Expert Syst Appl. 2018. https://doi.org/10.1016/j.eswa.2018.01.037 .

Graves A, Mohamed A, Hinton G. Speech recognition with deep recurrent neural networks. In: 2013 IEEE International conference on acoustics, speech and signal processing. 2013; p. 6645–9. https://doi.org/10.1109/ICASSP.2013.6638947 .

Malhotra P, Vig L, Shroff G, Agarwal P. Long short-term memory networks for anomaly detection in time series. 2015.

Sutskever I, Vinyals O, Le QV. Sequence to sequence learning with neural networks. Adv Neural Inf Process Syst. 2014;4.

Siami-Namini S, Namin AS. Forecasting economics and financial time series: ARIMA vs. LSTM. 2018.

Bengio Y, Frasconi P, Simard P. The problem of learning long-term dependencies in recurrent networks. In: IEEE international conference on neural networks. 1993. p. 1183–8.

Srinivasan K, Cherukuri AK, Vincent DR, Garg A, Chen BY. Chen, an efficient implementation of artificial neural networks with K-fold cross-validation for process optimization. J Internet Technol. 2019;20:1213–25. https://doi.org/10.3966/160792642019072004020 .

Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9:1735–80. https://doi.org/10.1162/neco.1997.9.8.1735 .

Lai CY, Chen RC, Caraka RE. Prediction average stock price market using LSTM. 2019.

Toharudin T, Pontoh R, Caraka R, Zahroh S, Lee Y, Chen R. Employing long short-term memory and facebook prophet model in air temperature forecasting. Commun Stat Simul Comput. 2021. https://doi.org/10.1080/03610918.2020.1854302 .

Schuster M, Paliwal K. Bidirectional recurrent neural networks. IEEE Trans Signal Proces. 1997;45:2673–81. https://doi.org/10.1109/78.650093 .

Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. arXiv:1409.3215 . 2014.

Cui Z, Ke R, Pu Z, Wang Y. Deep Bidirectional and unidirectional LSTM recurrent neural network for network-wide traffic speed prediction. 2018.

Yeh I, Lien C. The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Syst Appl. 2009;36:2473–80.

Zhang W, Yang D, Zhang S, Ablanedo-Rosas JH, Wu X, Lou Y. A novel multi-stage ensemble model with enhanced outlier adaptation for credit scoring. Expert Syst Appl. 2021;165:113872.

Tripathi D, Edla DR, Bablani A, Shukla AK, Reddy BR. Experimental analysis of machine learning methods for credit score classification. Prog Artif Intell. 2021;15:1–27.

Jadhav S, He H, Jenkins K. Information gain directed genetic algorithm wrapper feature selection for credit rating. Appl Soft Comput. 2018;1(69):541–53.

Hamori S, Kawai M, Kume T, Murakami Y, Watanabe C. Ensemble learning or deep learning? Application to default risk analysis. J Risk Financial Manag. 2018;11(1):12. https://doi.org/10.3390/jrfm11010012 .

Shen F, Zhao X, Kou G, Alsaadi FE. A new deep learning ensemble credit risk evaluation model with an improved synthetic minority oversampling technique. Appl Soft Comput. 2021;98:106852.

Bellotti T, Crook J. Modelling and predicting loss given default for credit cards. Work Pap Quant Financ Risk Manag Cent. 2007.

Lessmann S, Baesens B, Seow H-V, Thomas L. Benchmarking state-of-the-art classification algorithms for credit scoring: an update of research. Eur J Oper Res. 2015. https://doi.org/10.1016/j.ejor.2015.05.030 .

Bhatia S, Sharma P, Burman R, Hazari S, Hande R. Credit scoring using machine learning techniques. Int J Comput Appl. 2017;161:1–4.

Natekin A, Knoll A. Gradient boosting machines, a tutorial. Front Neurorobot. 2013;7:21. https://doi.org/10.3389/fnbot.2013.00021 .

Haykin SS. Neural networks and learning machines. 3rd ed. Pearson Education: Upper Saddle River; 2009.

Bhattacharyya S, Maulik U. Soft computing for image and multimedia data processing. Berlin: Springer; 2013.

Book   Google Scholar  

Angelini E, Tollo G, Roli A. A neural network approach for credit risk evaluation. Q Rev Econ Financ. 2008;48:733–55. https://doi.org/10.1016/j.qref.2007.04.001 .

Malhotra R, Malhotra DK. Evaluating consumer loans using neural networks. Omega. 2003;31:83–96. https://doi.org/10.2139/ssrn.314396 .

Cortes C, Vapnik V. Support-vector networks. Mach Learn. 1995;20(3):273–97. https://doi.org/10.1007/BF00994018 .

Lahsasna A, Ainon R, Wah T. Credit scoring models using soft computing methods: a survey. Int Arab J Inf Technol. 2010;7:115–23.

Huang C-L, Chen M-C, Wang C-J. Credit scoring with a data mining approach based on support vector machines. Expert Syst Appl. 2007;33(4):847–56. https://doi.org/10.1016/j.eswa.2006.07.007 .

Breiman L. Random forests. Mach Learn. 2001;45(1):5–32. https://doi.org/10.1023/A:1010933404324 .

Hand DJ. Measuring classifier performance: a coherent alternative to the area under the ROC curve. Mach Learn. 2009;77(1):103–23. https://doi.org/10.1007/s10994-009-5119-5 .

Adeodato PJ, Melo SB. On the equivalence between Kolmogorov–Smirnov and ROC curve metrics for binary classification. 2016.

Brier GW. Verification of forecasts expressed in terms of probability. Mon Weather Rev. 1950;78:1–3.

Bröcker J, Smith L. Increasing the reliability of reliability diagrams. Weather Forecast. 2007. https://doi.org/10.1175/WAF993.1 .

Witten IH, Frank EF, Hall MA. Credibility: evaluating what’s been learned. In: Witten IH, Frank E, Hall MA, editors. Data mining: practical machine learning tools and techniques. 3rd ed. Boston: Morgan Kaufmann; 2011. p. 147–87.

Chapter   Google Scholar  

Demšar J. Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res. 2006;7:1–30.

MathSciNet   MATH   Google Scholar  

Dietterich TG. Approximate statistical tests for comparing supervised classification learning algorithms. Neural Comput. 1998;10(7):1895–923. https://doi.org/10.1162/089976698300017197 .

Kavzoglu T. Object-oriented random forest for high resolution land cover mapping using quickbird-2 imagery. In: Samui P, Sekhar S, Balas VE, editors. Handbook of neural computation. Cambridge: Academic Press; 2017. p. 607–19.

Download references

Acknowledgements

Not applicable.

This work was supported by the Office of Research, Zayed University under Grant Number R20053.

Author information

Authors and affiliations.

Department of Information Systems, College of Technological Innovation, Zayed University, 19282, Dubai, United Arab Emirates

Maher Ala’raj & Munir Majdalawieh

Department of Electronic and Computer Engineering, College of Engineering, Design and Physical Sciences, Brunel University London, Kingston Lane, Uxbridge, UB8 3PH, UK

Maysam F. Abbod


Contributions

MA designed and carried out experiments and data analysis and drafted the manuscript. MM and MA participated in research coordination and checked, read and approved the final manuscript. All authors contributed in revising the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Maher Ala’raj .

Ethics declarations

Ethics approval and consent to participate, consent for publication, competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Ala’raj, M., Abbod, M.F. & Majdalawieh, M. Modelling customers credit card behaviour using bidirectional LSTM neural networks. J Big Data 8 , 69 (2021). https://doi.org/10.1186/s40537-021-00461-7


Received : 26 February 2021

Accepted : 03 May 2021

Published : 19 May 2021

DOI : https://doi.org/10.1186/s40537-021-00461-7


  • Behavioural scoring
  • Bidirectional LSTM
  • Classification


Consumers and credit cards: A review of the empirical literature

Cliff A Robb

Research in the area of consumer credit card attitude and behavior has provided an abundance of literature in the business, psychology, and public policy fields. Beginning in the 1960s, the work revolved around descriptive characteristics and evolved as scholars probed deeper by investigating relationships between credit cards and psychological constructs, and the need for consumer policy. While the scope of credit card research has broadened, there is a need to pause and reflect on what we actually know about the phenomenon, given its proclivity in society. This paper identifies the empirical research conducted over the past four decades in order to provide insights and recommendations for additional research. A total of 537 refereed journal articles from 8 databases were reviewed and evaluated within specific parameters related to credit cards, with a final working sample of 103 journal articles published between 1969 and 2012. Emerging trends are identified and suggestions for future research are provided.

Related Papers




Review of Machine Learning Approach on Credit Card Fraud Detection

  • Review Article
  • Open access
  • Published: 05 May 2022
  • Volume 2 , pages 55–68, ( 2022 )


  • Rejwan Bin Sulaiman   ORCID: orcid.org/0000-0002-3037-7808 1 ,
  • Vitaly Schetinin 1 &
  • Paul Sant 1  

34k Accesses

50 Citations


The massive usage of credit cards has been accompanied by an escalation of fraud. Credit cards have driven the growth of online business and made e-payment systems easier to use. Machine learning (ML) methods are being adopted on a large scale to detect and prevent fraud, and ML algorithms play an essential role in analysing customer data. In this research article, we conduct a comparative analysis of the literature on ML techniques for credit card fraud detection (CCFD) and data confidentiality. Finally, we propose a hybrid solution that uses an artificial neural network (ANN) in a federated learning framework. It has been observed to be an effective solution for achieving higher accuracy in CCFD while ensuring privacy.


1 Introduction

In the twenty-first century, most financial institutions have increasingly made business facilities available to the public through internet banking. E-payment methods play an imperative role in today's competitive financial society. They have made purchasing goods and services very convenient. Financial institutions often provide customers with cards that let them shop without carrying cash. Like debit cards, credit cards benefit consumers, and they also offer protection for purchased goods that might be damaged, lost or even stolen. Customers are required to verify the transaction with the merchant before carrying out any transaction using their credit card.

According to statistics, Visa [ 50 ] and Mastercard [ 51 ] issued 2287 million total credit cards during 2020 (4th quarter) worldwide (Figs.  1 and 2 ).

Fig. 1 Number of Mastercard credit cards issued worldwide [ 51 ]

Fig. 2 Number of Visa credit cards issued worldwide [ 50 ]

Visa issued 1131 million cards, whereas Mastercard issued 1156 million cards worldwide. These statistics show how easy and popular card-based transactions have become for end-users. Because such a massive portion of global transactions falls into this category, fraudsters target this group of people, and social engineering of cardholders is sometimes easy.

Despite the several benefits that credit cards provide to consumers, they are also associated with problems such as security and fraud. Credit card fraud is a challenge that banks and financial institutions face. It occurs when unauthorised individuals use credit cards to obtain money or property by fraudulent means. Credit card information is susceptible to theft via unsecured online platforms and web pages. It can also be obtained through identity theft schemes. Fraudsters can access users' credit and debit card numbers illegitimately, without their consent or knowledge.

According to “UK Finance” [ 27 ], fraudulent activities associated with credit and debit cards have proven to be one of the major causes of financial losses in the finance industry. With the advancement of technology, card fraud has become a big threat that leads to massive financial losses globally. Therefore, it is imperative to carry out credit card fraud detection to reduce financial losses.

Machine learning is effective in distinguishing fraudulent transactions from legitimate ones. One of the main challenges associated with detection techniques is the barrier to exchanging ideas related to fraud detection. According to “UK Finance”, credit and debit card fraud reported in the U.K. was worth £574.2 million in 2020 [ 27 ].

In recent years, work on credit card fraud detection has increased tremendously, drawing the attention of many scholars and researchers [ 22 ]. This paper reviews and evaluates various aspects of credit and debit card fraud detection. It examines the techniques used to detect fraudulent credit card transactions and finally proposes a better technique for credit card fraud detection. Researchers are trying to overcome methodological barriers that limit the real-time application of ML. ML has been applied in many other domains, such as abnormal pattern detection [ 28 , 29 , 30 , 31 , 32 , 33 , 34 , 35 ], biometric identification [ 36 , 37 ], diabetes prediction [ 38 , 39 ], happiness prediction [ 40 ], water quality prediction [ 41 ], accident prevention at Heathrow [ 42 ], timely diagnosis of bone diseases [ 43 ] and prediction of informational efficiency using deep neural networks [ 49 ]. Despite these limitations, researchers are working to harness the power of ML to detect fraud.

1.1 Motivation

CCFD involves quite complex procedures and techniques for developing an effective detection system. The following are some of the problems in CCFD that I identified in the literature review; they motivated me to propose an effective solution.

Credit card transactions are substantial in number and heterogeneous. Users use credit cards for various purposes across geographical locations and currencies, which means that fraudulent transactions are widely diverse [ 10 ]. This problem has motivated me to devise a solution that can help detect fraudulent transactions irrespective of geographical location. Fraud detection is also a multi-objective task. Banks and financial institutions need to give their users a good experience and service at all times. It is therefore challenging to use customer datasets for experimental purposes while ensuring service availability and privacy. To address this challenge, I introduce a federated learning framework for data privacy assurance.

The diversity of fraudulent transactions and the imbalance of datasets are also big challenges in CCFD [ 22 ]. Obtaining real-time datasets of credit card transactions is difficult. Banks and financial institutions do not expose their customers' data because of GDPR, which makes it hard for researchers to gather datasets for credit card fraud detection. My motivation is to help research communities and data scientists working in the financial sector to devise a system that overcomes the difficulty of obtaining big data for an effective machine learning model.

2 Literature Review

It is imperative for any banking or financial institution that issues credit and debit cards to put in place effective measures to detect fraudulent transactions. Notable methods identified for detecting credit card fraud include RF, ANN, SVM, k-nearest neighbors and other techniques that take a hybrid or privacy-preserving approach to data privacy.

We will discuss in brief all the approaches mentioned above.

2.1 Random Forest (R.F.)

Random forest is an ML algorithm constructed from decision tree (DT) algorithms and is commonly used to solve regression and classification problems. It helps in predicting output with high accuracy on large datasets. The random forest technique combines several classifiers to provide a solution to intricate problems: it predicts by averaging the outputs of its individual trees or, for classification, by taking their majority vote. An increase in the number of trees tends to increase the precision of the outcome. The random forest method helps in eradicating various limitations of a single decision tree [ 8 ]. It also reduces overfitting to the dataset, thus increasing precision. A forest contains several decision trees, each of which acts as a weak learner; together, they form a strong learner. The RF technique is fast and effective in handling large and unbalanced datasets. However, random forest has limitations in training across the range of datasets, especially in regression problems.
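The idea that many weak learners can vote their way to a strong one can be sketched in a few lines. The example below is a minimal illustration, not the implementation used in the cited studies. It assumes a single hypothetical feature (the transaction amount), uses one-threshold "stumps" in place of full decision trees, and trains each stump on a bootstrap sample before taking a majority vote.

```python
import random
from collections import Counter

def fit_stump(rows):
    """Fit a one-threshold rule (a weak learner) on (amount, label) rows."""
    best = None
    for threshold, _ in rows:
        # The rule predicts fraud (1) when the amount exceeds the threshold.
        errors = sum((amount > threshold) != bool(label) for amount, label in rows)
        if best is None or errors < best[0]:
            best = (errors, threshold)
    return best[1]

def fit_forest(rows, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(rows) for _ in rows]) for _ in range(n_trees)]

def predict(forest, amount):
    """Majority vote over the individual stumps."""
    votes = Counter(int(amount > threshold) for threshold in forest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: small amounts are genuine (0), large ones fraud (1).
data = [(12, 0), (25, 0), (40, 0), (55, 0), (60, 0),
        (900, 1), (1200, 1), (1500, 1)]
forest = fit_forest(data)
```

Calling `predict(forest, 1300)` returns the class chosen by most stumps. A real random forest would also sample features at each split and grow deeper trees.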

Various traditional algorithms have been used, such as logistic regression (L.R.), C4.5, and R.F. Logistic regression describes the association between a binary dependent variable and the independent variables. C4.5 is a DT classifier commonly used in data mining to generate decisions from the data provided. Traditionally, these algorithms were combined with threshold optimization (T) and Bayes minimum risk (M.R.) classifiers to classify fraudulent transactions by altering the probability threshold. T and M.R. improve prediction accuracy and reduce the overall cost involved [ 11 ]. Logistic regression performs well in regression problems because, unlike the decision tree, it tolerates model overfitting. However, real-time scenarios rarely present linear problems, and real-time CCFD datasets are nonlinear. Therefore, logistic regression is not a suitable choice for CCFD.
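A minimum-risk decision of the kind described above can be illustrated with a short sketch. It assumes a classifier has already produced a fraud probability for a transaction. The two cost figures (the loss from a missed fraud and the cost of wrongly flagging a genuine payment) are hypothetical and are not taken from [ 11 ].

```python
def min_risk_decision(p_fraud, cost_false_negative, cost_false_positive):
    """Choose the action with the lower expected cost."""
    risk_if_passed = p_fraud * cost_false_negative           # fraud slips through
    risk_if_flagged = (1.0 - p_fraud) * cost_false_positive  # genuine payment blocked
    return "flag" if risk_if_flagged < risk_if_passed else "pass"

def equivalent_threshold(cost_false_negative, cost_false_positive):
    """Probability cut-off implied by the two costs."""
    return cost_false_positive / (cost_false_positive + cost_false_negative)

# Hypothetical costs: a missed fraud loses the transaction amount,
# while a false alarm costs a fixed handling fee of 5.
large_payment = min_risk_decision(0.10, cost_false_negative=500.0, cost_false_positive=5.0)
small_payment = min_risk_decision(0.10, cost_false_negative=20.0, cost_false_positive=5.0)
```

With the same 10% fraud probability, the large payment is flagged and the small one is passed, because the decision threshold moves with the costs rather than staying fixed at 0.5.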

Olena et al. proposed a hybrid approach for credit card fraud detection that uses random forest and isolation forest to identify anomalous transactions [ 15 ]. The proposed model is based on two primary sub-systems. The first performs anomaly detection using unsupervised learning. The second interprets the anomaly type and is based on supervised learning. The primary concern of the proposed work is data speed, and the hybrid model works effectively on real-time data [ 15 , 16 ]. The system was evaluated by identifying the users' geolocation while they performed transactions. This hybrid model is determined not by the anomaly level but by the anomaly type. The anomaly-based system detects fraud on the basis of geolocation. However, the work does not address privacy and confidentiality, even though real-time data is involved in detecting fraudulent transactions. The researchers did not mention any hashing or encryption methods used to keep users' data from being exposed. Therefore, there is a need to ensure the data confidentiality of credit card users when their data is used for research. The researchers also did not mention how to handle geolocation spoofing. Our contribution focuses on using geolocation and time features to detect fraud, combining ANN with a federated learning approach to ensure data confidentiality.

Although random forest algorithms are quite effective at prediction in regression problems, they have various limitations when it comes to real-time CCFD. They can perform well on lab-based datasets where limited data is available. In real-time scenarios, however, random forest algorithms are slower: the training process is slow, and predictions take longer. Effective CCFD on real-life datasets needs a large volume of data, and random forest algorithms lack the capability to train on such datasets and make predictions efficiently.

2.2 Artificial Neural Network (ANN) Method

ANN is an ML algorithm that functions similarly to the human brain. Typically, ANN methods are of two types: supervised and unsupervised. The unsupervised neural network is widely used in detecting fraud cases since it has an accuracy rate of 95% [ 4 ]. It attempts to find similar patterns between the current cardholder's transactions and those found in earlier transactions. If the details of the current transactions correlate with those of previous transactions, a fraud case is likely to be detected [ 4 ]. ANN methods are highly fault tolerant. For instance, output is still generated even when one or more cells are corrupted. Due to its high speed and effective processing capabilities, ANN can be considered an effective solution for CCFD.

The author used three stages in detecting fraud: user verification, a fuzzy clustering algorithm, and an ANN classification phase to differentiate between legitimate and suspicious transactions. This technique achieved an accuracy rate of 93.90%, with 6.10% of transactions classified incorrectly [ 7 ]. Although ANN combined with clustering performs well in detecting fraudulent transactions, the author did not consider the appropriate structure of the ANN, which requires progressive trial and error.

An artificial neural network trained using a simulated annealing algorithm is effective in identifying fraudulent credit card transactions. The simulated annealing algorithm optimizes performance by finding the most suitable configuration of weights in the neural network [ 10 ].
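To make the weight search concrete, the sketch below applies simulated annealing to the two weights of a single sigmoid neuron, which stands in here for a full network. It is an illustration under stated assumptions rather than the procedure of [ 10 ]: the toy data, the Gaussian step size, the linear cooling schedule and the squared-error loss are all hypothetical choices.

```python
import math
import random

def neuron_loss(weights, rows):
    """Mean squared error of one sigmoid neuron on (features, label) rows."""
    total = 0.0
    for features, label in rows:
        z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
        z = max(-60.0, min(60.0, z))  # keep exp() in a safe range
        total += (1.0 / (1.0 + math.exp(-z)) - label) ** 2
    return total / len(rows)

def anneal(rows, n_weights, steps=2000, start_temp=1.0, seed=0):
    """Search for low-loss weights by accepting some uphill moves early on."""
    rng = random.Random(seed)
    current = [0.0] * n_weights
    current_loss = neuron_loss(current, rows)
    best, best_loss = current[:], current_loss
    for step in range(steps):
        temp = start_temp * (1.0 - step / steps) + 1e-9  # linear cooling
        candidate = [w + rng.gauss(0.0, 0.5) for w in current]
        candidate_loss = neuron_loss(candidate, rows)
        delta = candidate_loss - current_loss
        # Improvements are always accepted; worse moves are accepted with
        # probability exp(-delta / temp), which shrinks as temp falls.
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            current, current_loss = candidate, candidate_loss
            if current_loss < best_loss:
                best, best_loss = current[:], current_loss
    return best, best_loss

# Hypothetical toy rows: one scaled feature, label 1 marks fraud.
toy_rows = [([0.1], 0), ([0.2], 0), ([0.3], 0), ([0.8], 1), ([0.9], 1)]
best_weights, best_loss = anneal(toy_rows, n_weights=2)
```

Accepting occasional worse moves while the temperature is high is what lets the search leave poor regions of weight space; as the temperature falls, it behaves more and more like a plain greedy search.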

Saurabh et al. proposed a model based on an artificial neural network (ANN) [ 17 ] and backpropagation for credit card fraud detection [ 17 , 18 ]. The procedure takes the customers' dataset, i.e., name, transaction ID and time. The author used 80% of the data for training and the remaining 20% for testing and validation. The proposed model gave a significant outcome for the detection of fraudulent transactions in real-time data. For evaluation, the authors used the confusion matrix, recall, accuracy, and precision. The experiment achieved an accuracy of 99.96%, an improvement over the previous model on real-time data. Although it produced good results, this work lacks a solution to the threat to data posed by a researcher or even an individual bank employee during training and research. A solution is therefore required that fulfils the criteria for data confidentiality and integrity of bank credit card transactions. The authors did not mention data confidentiality for attributes used in training, such as name, age and gender. Therefore, our proposed work will use a federated learning model to ensure data privacy when training for credit card fraud detection.
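The evaluation measures named above all derive from the four cells of a confusion matrix. The following sketch shows the arithmetic on a hypothetical held-out fold of 1,000 transactions. The labels and predictions are invented for illustration and do not reproduce the results of [ 17 , 18 ].

```python
def confusion_counts(y_true, y_pred):
    """Count the four confusion-matrix cells; label 1 means fraud."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def summarise(y_true, y_pred):
    """Accuracy, precision and recall from the confusion matrix."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical fold: 10 frauds among 1,000 transactions.
# The model catches 8 frauds, misses 2 and raises 5 false alarms.
y_true = [1] * 10 + [0] * 990
y_pred = [1] * 8 + [0] * 2 + [1] * 5 + [0] * 985
report = summarise(y_true, y_pred)
```

In this example accuracy is 0.993, while recall is 0.8 and precision is about 0.62, which is why a high accuracy figure on fraud data is usually read alongside the other two.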

Data mining techniques such as the DT, MLP, and CFLANN are widely used to determine patterns from previous transactions. These models are often compared on two datasets. The multilayer perceptron (MLP) model has an accuracy of 88.95% on the Australian credit card dataset [class distribution: class 2 (+): 307 (44.5%); class 1: 383 (55.5%)] and 78.50% on the German credit card dataset [ 24 ], which indicates that the MLP performs differently on different datasets. The MLP may not be very effective in CCFD because its large number of parameters produces a highly dense structure that ultimately results in redundancy and performance inefficiency. The author did not highlight this concern, which is essential to consider before using the MLP in real time.

ANN is an effective algorithm for CCFD [ 4 , 7 , 10 ]. The literature shows that it performs well when used in conjunction with various functions and algorithms, although those functions have their individual shortcomings. The use of ANN in CCFD is promising due to its capability to accommodate a larger volume of data and its distributed memory structure.

2.3 Support Vector Machine (SVM)

SVM is used for classification and regression analysis of various problems. In this approach, researchers often analyze the patterns in which customers use credit cards. The paying patterns of the customers are collected from the datasets, and the support vector machine technique classifies them as either fraudulent or non-fraudulent transactions. The SVM method is effective and provides accurate results when fewer features from the dataset are used [ 5 ]. However, problems arise with larger datasets (over 100,000 in size). SVM is therefore ineffective for real-time CCFD, where datasets are large.
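As an illustration of how an SVM separates the two classes, the sketch below trains a linear soft-margin SVM by subgradient descent on the hinge loss. This is a simplified stand-in: the cited work [ 5 ] does not specify its solver here, and the two scaled features, the learning rate and the regularisation strength are hypothetical.

```python
def train_linear_svm(rows, lr=0.1, reg=0.01, epochs=200):
    """Subgradient descent on the regularised hinge loss; labels are -1 or +1."""
    w = [0.0] * len(rows[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in rows:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score < 1:  # inside the margin or misclassified
                w = [wi - lr * (reg * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:  # correct with enough margin: only the regulariser acts
                w = [wi - lr * reg * wi for wi in w]
    return w, b

def classify(model, x):
    """Return +1 (fraudulent) or -1 (non-fraudulent)."""
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical paying patterns: (scaled amount, scaled frequency), label.
patterns = [((0.10, 0.20), -1), ((0.20, 0.10), -1), ((0.15, 0.30), -1),
            ((0.90, 0.80), 1), ((0.85, 0.90), 1), ((0.95, 0.85), 1)]
model = train_linear_svm(patterns)
```

Every training pass visits each record, and kernel SVMs additionally compare records with one another, which is one reason the approach becomes costly on datasets of the size mentioned above.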

Rtayli et al. proposed a hybrid method for credit card fraud risk (CCR) in high-dimensional data that combines a random forest classifier (RFC) [ 27 ] with SVM [ 26 , 27 ]. The idea was inspired by feature selection for fraudulent transactions in large imbalanced datasets, where fraud transactions are few in number and difficult to detect. To evaluate the model, the author used accuracy, recall and area under the curve.

The SVM-based model with RFC achieved an accuracy of 95% and reduced false-positive transactions, improving sensitivity to 87%, which resulted in better fraud detection on massive and imbalanced datasets [ 26 , 27 ]. This model also improved classification performance. Although the method produced efficient output for fraud detection using classification features, it limits transaction privacy when computing the evaluation metrics of accuracy and recall. Therefore, to address the privacy concern, we use a federated learning model that trains on data locally, combined with an artificial neural network. In addition, RFC is slow when dealing with large datasets.

2.4 K-Nearest Neighbour (KNN)

KNN is a supervised ML method used for classification and regression analysis. It helps improve detection and decrease the false-alarm rate. It uses a supervised technique to establish the presence of fraudulent activity in credit card transactions [ 14 ]. The KNN fraud detection technique requires two estimates: the correlation between transactions and the distance between transaction occurrences in the data. The KNN technique is suitable for detecting fraudulent activity at transaction time. By over-sampling and separating data, it can be used to determine anomalies in the targets. It can therefore be considered for CCFD under memory limitations, as it can assist in CCFD while using little memory and computation power. It is a faster approach for any number of datasets. Compared with other anomaly-based techniques, KNN achieves higher accuracy and efficiency [ 12 ].
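The distance-based vote at the heart of KNN fits in a few lines. The sketch below is illustrative only: the two scaled features (amount and hour of day), the labelled history and the choice of k = 3 are hypothetical and are not taken from [ 12 ] or [ 14 ].

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Label a transaction by majority vote among its k nearest neighbours."""
    neighbours = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical labelled history: ((scaled amount, scaled hour), is_fraud).
history = [((0.10, 0.20), 0), ((0.20, 0.10), 0), ((0.15, 0.30), 0),
           ((0.90, 0.80), 1), ((0.85, 0.90), 1), ((0.95, 0.85), 1)]

suspicious = knn_predict(history, (0.88, 0.82))
ordinary = knn_predict(history, (0.12, 0.18))
```

There is no training step: every prediction scans and sorts the stored history, so memory use and latency grow with the number of stored transactions.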

It is widely used to identify similar patterns in previous transactions carried out by the cardholder. The commonly used machine learning algorithms include LR, Naïve Bayes and KNN. KNN has an accuracy rate of 97.69% in detecting fraudulent credit card transactions [ 13 ]. It proved efficient with respect to all the metrics used, as it did not record any false positives during classification. Another study using KNN achieved 72% accuracy for CCFD [ 12 ].

Although the authors conducted progressive tests while utilizing KNN, it is critical to note the algorithm's limitations. KNN is a memory-intensive algorithm that scales up non-essential data characteristics. It likewise falls short in the experiments cited above. When the algorithm is fed a large amount of data, its performance degrades. As a result, these constraints affect the accuracy and recall metrics in the CCFD process.

2.5 Hybrid Approach

Traditional CCFD procedures are now being replaced by ML techniques, which have resulted in higher efficiency. One research team proposed a method for loan fraud detection that applies ML to credit card transactions [ 44 ]. The method was tested using the Extreme Gradient Boosting (XGBoost) algorithm together with other data mining procedures and produced optimal results in CCFD. The work retained the valuable information in the data without requiring prior knowledge of it.

To achieve the research targets, the authors used a hybrid technique of supervised and unsupervised ML algorithms. In this procedure, PK-XGBoost and XGBoost were used, and PK-XGBoost performed better than plain XGBoost [ 45 , 46 ]. The performance metrics indicate higher efficiency in detecting fraud while ensuring user privacy. However, owing to the high number of credit card transactions, this approach has limitations in terms of privacy assurance. Also, XGBoost overfits the dataset in some cases; to avoid this, various parameters need to be tuned together to attain adequate accuracy.

Researchers have used a hybrid method for CCFD that combines random forest with isolation forest to identify anomalous transactions [ 47 ]. The method comprises two parts. The first performs anomaly-based detection using unsupervised learning, and the second interprets the detected anomalies using supervised learning. The proposed method is designed for high-speed data on real-life datasets [ 15 , 16 ]. The system was evaluated on identifying the user's geolocation. This technique is not centred on the anomaly level; instead, it is the anomaly type that defines it. Although it helped detect fraudulent transactions on the basis of geolocation, data confidentiality and privacy could be compromised. The model should be evaluated while ensuring the confidentiality of the data. Therefore, a model is required that provides data confidentiality while achieving higher accuracy in CCFD on more extensive datasets.

2.6 Privacy-Preserving Techniques

In the ML approach, training on datasets is essential, and for practical training, ML algorithms should be provided with a large volume of data. Various studies have used credit card data in a privacy-preserving manner. One experiment combined a supervised ML approach with blockchain technology. It was run on Ethereum with 300 thousand accounts. The results showed that altering the parameters changes the values of precision and recall. It was also observed that the use of blockchain could be a threat because it is a decentralised technology [ 53 ]. On the other hand, blockchain technology is one of the effective ways of ensuring data privacy due to its decentralised nature. For real-world CCFD, however, decentralised technology has various limitations, including scalability issues and the need to maintain data in wallets. It is also processor-intensive and consumes more energy; hence it is expensive, and it is not standardised globally. Therefore, blockchain may not be the right choice for CCFD in banks and financial institutions.

The use of data for experimental purposes should comply with GDPR. Research has been conducted using gossip learning and federated learning. It was observed that gossip learning is ineffective because it lacks a central control system, whereas F.L. performed better because of its semi-decentralised nature [ 52 , 54 ].

Credit card data is imbalanced and skewed. Financial institutions are not allowed to share their credit card data due to privacy concerns and GDPR. To address this issue, experiments were conducted using federated learning. In this method, the data was trained locally at the participants, i.e., banks and financial institutions. The results showed that F.L. can resolve the privacy issue because the data is not shared with the central aggregation server; instead, only the trained model is shared [ 55 ]. This is an ideal situation in which the data is secured in terms of privacy and confidentiality. F.L. is a cyclic process in which the model is trained locally on the clients' devices and the models from the individual clients are averaged together. In this way, anomaly-based fraudulent transactions are learnt from the respective clients, and an effective ML model is trained.
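The cyclic train-locally-then-average process can be sketched as follows. This is a minimal illustration under stated assumptions, not the system of [ 55 ]: a two-weight logistic model stands in for the neural network, the two banks and their rows are hypothetical, and the server weights each client's model by its local sample count, which is the usual federated averaging convention.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def local_update(weights, rows, lr=0.5, epochs=5):
    """Train on one bank's private rows; only the weights leave the bank."""
    w0, w1 = weights
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in rows:
            err = sigmoid(w0 + w1 * x) - y
            g0 += err
            g1 += err * x
        w0 -= lr * g0 / len(rows)
        w1 -= lr * g1 / len(rows)
    return [w0, w1]

def federated_average(updates, sizes):
    """Server step: average client weights, weighted by local sample count."""
    total = sum(sizes)
    return [sum(u[i] * n for u, n in zip(updates, sizes)) / total
            for i in range(len(updates[0]))]

def train(clients, rounds=100):
    """Repeat the cycle: broadcast weights, train locally, aggregate."""
    global_weights = [0.0, 0.0]
    for _ in range(rounds):
        updates = [local_update(global_weights, rows) for rows in clients]
        global_weights = federated_average(updates, [len(r) for r in clients])
    return global_weights

# Hypothetical private datasets: (scaled amount, is_fraud) per bank.
bank_a = [(0.10, 0), (0.20, 0), (0.90, 1)]
bank_b = [(0.15, 0), (0.30, 0), (0.80, 1), (0.95, 1)]
model = train([bank_a, bank_b])
```

The server function never receives `bank_a` or `bank_b`, only the weight lists returned by `local_update`, which is the property that keeps raw transactions inside each institution.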

2.7 Blockchain Technology

Various applications based on blockchain technology have attracted considerable public attention. This is because blockchain goes beyond the limits of central servers such as those of banks and other institutions. Instead, it provides a decentralised approach in which user behaviour depends on the nature of the blockchain technology. Malicious software can cause fraud in blockchain transactions. Michal et al. (2019) proposed a supervised machine learning approach for blockchain technology [ 56 ]. The authors applied this technique to the Ethereum blockchain. The experiment was performed on 300 thousand accounts, and the results were compared across random forest, SVM and XGBoost [ 57 ]. They concluded that the various transaction parameters alter the values of precision and recall. They also suggested that blockchain is a self-maintained technology and that reliance on it could be a potential threat, especially in the finance sector. Therefore, our research takes a more practical approach based on federated learning, which is semi-decentralised and ensures efficiency and privacy at the same time.

2.7.1 Why Not Blockchain?

Machine learning approaches are life-changing and continuously evolving in our daily life to make things more comfortable around us. The main hurdle in ML is diversified and complex training data. Crowdsourcing is one technique for collecting data on a central server, but it has limitations concerning data privacy [ 53 ]. Blockchain is an emerging technology that can provide a decentralised platform with enhanced data security [ 57 ]. It could therefore be considered as a medium for securely exchanging data for CCFD among banks and financial institutions. However, several drawbacks and limitations make this technology less efficient for exchanging data. Furthermore, exchanging data raises privacy concerns under GDPR. The following are some of the disadvantages of blockchain technology for CCFD:

  • The process slows down if there are too many users in a network.
  • Due to the consensus method used in blockchain, it is harder to scale the data.
  • It requires higher energy usage.
  • Blockchain sometimes shows inefficiency in its operation.
  • Users must maintain their data in wallets.
  • The technology is costly.
  • It is not standardised.

The issues mentioned above discourage researchers and academic institutions from adopting blockchain technology for CCFD. Our proposed research addresses this issue using the semi-decentralised technique of federated learning. It would provide higher efficiency because the participants train their models locally (preserving security), resulting in faster processing than blockchain technology and higher data scalability (Table 1 ).

3 Classification Imbalance Problem

In credit card fraud detection, data imbalance is one of the challenging problems that researchers have tried to study. Training a machine learning algorithm on such data can lead to misclassification because of the ratio of genuine transactions to fraudulent transactions (Fig.  3 ).

Fig. 3 Ratio of imbalance in the dataset used in most of the research in our table: 284,807 transactions are genuine, whereas 492 are fraudulent
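The class ratio in Fig. 3 explains why plain accuracy is a weak yardstick for CCFD. The sketch below takes the two counts quoted in the caption at face value and scores a trivial model that labels every transaction as genuine. The balanced classification rate (BCR) is computed as the mean of the true-positive and true-negative rates, which is the common definition and is assumed here, and the Matthews correlation coefficient (MCC) is set to 0 when its denominator vanishes, a widely used convention.

```python
import math

def imbalance_report(tp, tn, fp, fn):
    """Return accuracy, BCR and MCC for one confusion matrix."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    tpr = tp / (tp + fn) if tp + fn else 0.0  # sensitivity
    tnr = tn / (tn + fp) if tn + fp else 0.0  # specificity
    bcr = 0.5 * (tpr + tnr)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, bcr, mcc

# Counts as quoted in the caption of Fig. 3.
GENUINE, FRAUD = 284_807, 492

# A model that never raises an alarm: every fraud becomes a false negative.
baseline = imbalance_report(tp=0, tn=GENUINE, fp=0, fn=FRAUD)
```

The do-nothing model scores an accuracy of about 99.83% while its BCR is 0.5 and its MCC is 0, so only the last two reveal that no fraud was caught.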

Pre-processing is one technique for handling imbalanced data: fraudulent transactions are oversampled and legitimate transactions are under-sampled, which enlarges the fraud class and shrinks the legitimate class relative to the original dataset. The performance of ML algorithms increased after oversampling with the synthetic minority oversampling technique (SMOTE) [10]. Balanced classification rate (BCR) and Matthews correlation coefficient (MCC) are two metrics suited to evaluating classifiers under class imbalance, and it was observed that the fraud miner achieves higher accuracy. SMOTE has drawbacks, including noise and a probability of overlap between classes that can result in overfitting; nevertheless, in the experiment in [19], SMOTE achieved 2–4% better accuracy than other classification methods. Adaptive synthetic (ADASYN) and Ranked Minority Oversampling in Boosting (RAMO) methods were proposed afterwards; however, they caused classification issues as the number of iterations increased, and researchers have suggested that an ensemble classifier could perform better than a single classifier on imbalanced datasets.
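The oversampling idea behind SMOTE can be sketched in a few lines: each synthetic fraud sample is placed on the line segment between an existing fraud sample and one of its nearest fraud neighbours. The code below is a simplified illustration under stated assumptions, not the reference SMOTE implementation; the function name `smote_like`, the parameter defaults and the sample values are all hypothetical.

```python
import random

def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Hypothetical two-feature fraud samples (illustrative values only).
fraud = [(1.0, 2.0), (1.2, 1.9), (0.8, 2.2), (1.1, 2.4)]
new_points = smote_like(fraud, n_new=6)
print(len(fraud) + len(new_points), "fraud samples after oversampling")
```

Because every synthetic point lies between two real fraud samples, the method can blur the class boundary when fraud and genuine samples overlap, which is the overfitting risk noted above. Oversampling of this kind is normally applied to the training portion only.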

4 Model Design

The centralised approach is one of the most commonly adopted methods for credit card fraud detection. A fraud detection system (FDS) becomes inefficient when only limited datasets are available and the detection period is limited. Banks and other financial centres cannot share their data on a central server due to GDPR. Users’ privacy can still be compromised even if an "anonymised" dataset is held locally on servers, as it could be reverse-engineered. To cope with this challenge, we use FL in our research model: it allows real-time data to be used for training locally on edge devices, and the trained model is shared centrally among all other banks and research centres, which can effectively enhance the accuracy of fraud detection.

Secondly, in our research model, we will use the ANN algorithm in combination with federated learning to obtain better evaluation metrics on clients’ data and achieve higher accuracy. Furthermore, this hybrid approach plays an essential role in preserving the privacy of users' data.

4.1 Proposed Model with Federated Learning and ANN

In our FL model, the following steps are involved in training the model until all participants achieve the full transition:

Client selection

Based on the eligibility criteria, the server selects the participating clients.

Broadcasting

In this stage, the chosen clients download our model, which will be an artificial neural network model.

Computation phase

In this stage, all participating devices compute a model update by executing the program provided by the server.

Aggregation

In this stage, the server aggregates the updates from the devices.

Model update

In this stage, the server applies the aggregated client updates to the shared model.
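The five stages above can be sketched as a single training round. This is a minimal illustration under stated assumptions: a two-parameter linear model stands in for the ANN, the eligibility criterion is assumed to be a minimum number of local samples, and all names (`select_clients`, `local_update`, `aggregate`, `federated_round`) and data values are hypothetical.

```python
def select_clients(clients, min_samples):
    """Client selection: keep clients that meet the (assumed) eligibility criterion."""
    return [c for c in clients if len(c["data"]) >= min_samples]

def local_update(global_weights, data, lr=0.01):
    """Computation phase: one gradient step of least squares y ~ w*x + b on local data."""
    w, b = global_weights
    n = len(data)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
    return (w - lr * grad_w, b - lr * grad_b)

def aggregate(updates):
    """Aggregation: element-wise mean of the weights returned by the clients."""
    return tuple(sum(u[i] for u in updates) / len(updates)
                 for i in range(len(updates[0])))

def federated_round(global_weights, clients, min_samples=2):
    chosen = select_clients(clients, min_samples)       # client selection
    received = [global_weights for _ in chosen]         # broadcasting
    updates = [local_update(g, c["data"])               # computation phase
               for g, c in zip(received, chosen)]
    return aggregate(updates)                           # aggregation + model update

# Hypothetical clients; raw data never leaves the client dictionaries.
clients = [
    {"name": "A", "data": [(1.0, 2.0), (2.0, 4.0)]},
    {"name": "B", "data": [(3.0, 6.0), (4.0, 8.0)]},
    {"name": "C", "data": [(5.0, 10.0)]},  # too few samples, not selected
]
new_global = federated_round((0.0, 0.0), clients)
print(new_global)
```

Only the weight tuples cross the client boundary here; the server never reads `c["data"]` outside the client-side `local_update` call, which mirrors the privacy argument made for FL.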

Model Outline

The proposed model of federated learning with ANN can be divided into three phases that follow one another; once the last phase is completed, the cycle repeats. We start with step 1 as follows:

This step involves the distribution of our model (ANN) from the central server to the respective correspondent banks or financial institutions; the model is displayed as a "Black Brain" in Fig. 4. Once an individual bank receives the model, it starts training it locally with its available datasets. The training process is illustrated below, where the trained models are differentiated by colour for each bank (A purple, B blue, C green and D red). The digit "1" marks the first phase, in which our model is sent to the banks.

Fig. 4 Step 1 of the proposed model

Once step 1 is complete, step 2 starts, in which the banks send their trained models to the central server of the federated learning model. On the server, the models from the respective banks are combined to form an "upgraded model", as illustrated in Fig. 5.

Fig. 5 Step 2 of the proposed model

Step 3 is the last step of our proposed model: the "upgraded model" (formed by the mean average of all corresponding trained models from the different banks) is sent to each individual bank. On receiving the model, each bank trains it locally as in step 1. Once training is completed, the model is sent back to the server. The process is repeated cyclically until the expected outcome is achieved (Fig. 6).

Fig. 6 Step 3 of the proposed model

Cycle Repetition

After completion of step 3, the process continues by sending the trained model to the server, as explained in the first step. Again, the server takes the mean average of the models from all banks and sends it to the individual banks. According to our hypothesis, this repeated training process can ensure higher accuracy in CCFD. The overall process is represented in Fig. 7.

figure 7
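The repeated cycle of local training and mean averaging can be sketched as a loop over rounds. This is an illustration under stated assumptions rather than the authors' implementation: a one-feature logistic regression stands in for the ANN, the four banks hold made-up, balanced toy data, the number of rounds is fixed at ten, and the names `train_locally`, `mean_average` and `accuracy` are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_locally(weights, data, epochs=5, lr=0.5):
    """Local training at one bank: a few gradient-descent epochs of logistic regression."""
    w, b = weights
    for _ in range(epochs):
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in data) / len(data)
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in data) / len(data)
        w, b = w - lr * grad_w, b - lr * grad_b
    return (w, b)

def mean_average(models):
    """Server side: unweighted element-wise mean of the banks' trained models."""
    return tuple(sum(m[i] for m in models) / len(models) for i in range(2))

def accuracy(weights, data):
    w, b = weights
    hits = sum((sigmoid(w * x + b) >= 0.5) == (y == 1) for x, y in data)
    return hits / len(data)

# Toy local datasets of (feature, label) pairs, label 1 = fraud. Values are made up.
banks = {
    "A": [(0.5, 1), (-0.5, 0), (1.0, 1), (-1.0, 0)],
    "B": [(2.0, 1), (-2.0, 0)],
    "C": [(1.5, 1), (-1.5, 0), (3.0, 1), (-3.0, 0)],
    "D": [(0.8, 1), (-0.8, 0)],
}
all_data = [row for rows in banks.values() for row in rows]

global_model = (0.0, 0.0)
history = []
for round_no in range(1, 11):
    # The upgraded model goes out, each bank trains it locally, and the results come back.
    local_models = [train_locally(global_model, rows) for rows in banks.values()]
    global_model = mean_average(local_models)
    history.append(accuracy(global_model, all_data))

print(global_model, history[-1])
```

The `all_data` list exists only to score the toy example; in the scheme described above no party would hold the pooled data, and each bank would evaluate the shared model on its own records.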

The model is shared collaboratively by banks and other research centres, while the data is kept locally in their own databases. Only the trained model is shared among participants, not the actual data. The central model is trained mutually by all participants, resulting in better classification than an individual model trained locally. In simple words, patterns are learnt locally at each client, and these learnt patterns are aggregated in the central server, so the central model is trained from the mutual inputs of all participants. This central model is shared back to all participants, and fraud detection is performed accordingly. By performing the steps mentioned above, FL can significantly enhance fraud detection accuracy while preserving the privacy of customers' data, handling the data in accordance with GDPR.

5 Proposed Method

In this review paper, we found that supervised learning is common practice among researchers. SVM, KNN, Naïve Bayes, logistic regression and DT models are widely used. We also see that a hybrid approach gives better performance than a single algorithm or classifier. As can be observed from the experiments on CCFD discussed in the previous section, different ML models have proven effective; however, due to data imbalance and heterogeneity, CCFD remains challenging, and models are unable to yield higher accuracy. Data imbalance and heterogeneity could be better handled with a higher volume of data, and because real-time fraudulent patterns are observed constantly, the model can be updated with the relevant feature variables. The use of real-time datasets involves privacy issues, as banks and financial institutions are obliged to follow GDPR rules. Our proposed solution suggests a privacy-preserving approach to using the datasets for effective ML model training. The flow chart of the proposed solution follows; it proceeds through each step as shown and eventually performs an iterative process.

Figure 8 shows our proposed methodology, which follows a number of steps from beginning to end. The whole dataset is split into training, validation and testing sets with percentages of 75%, 15% and 15%, respectively. A machine learning algorithm is applied to the training data. In our proposed topology, we use the FL framework for model training. In this architecture, the model is sent from the FL central server to the local servers, which comprise local devices. The model sent to the local devices is trained separately on each, and the trained models are eventually sent back to the FL server and aggregated. This process is repeated to keep the model updated with the latest patterns. In this framework, only the trained model from the local devices is shared with the FL server, and the data remains secured locally on the devices. Once the model is trained, its performance can be evaluated on the testing and validation data. The model trained on real-time transaction data can then be used effectively for CCFD.

Fig. 8 Our proposed methodology. Transaction data is pre-processed and split, and the ML model is trained within the FL framework, which provides data privacy and supports effective analysis of model performance
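The data-splitting step can be sketched as a stratified split, which divides each class separately so that the rare fraud class appears in every part. This is a minimal sketch under stated assumptions: the function name `stratified_split` and the label list are hypothetical, and the percentages used below (70/15/15) are an assumption chosen so that the three parts sum to 100%; substitute the intended values as needed.

```python
import random

def stratified_split(labels, percents=(70, 15, 15), seed=0):
    """Return (train, val, test) index lists, splitting each class separately
    so that every part keeps roughly the same fraud ratio."""
    if sum(percents) != 100:
        raise ValueError("split percentages must sum to 100")
    rng = random.Random(seed)
    train_idx, val_idx, test_idx = [], [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = len(idx) * percents[0] // 100
        n_val = len(idx) * percents[1] // 100
        train_idx.extend(idx[:n_train])
        val_idx.extend(idx[n_train:n_train + n_val])
        test_idx.extend(idx[n_train + n_val:])  # remainder goes to the test part
    return train_idx, val_idx, test_idx

# Hypothetical labels: 980 genuine (0) and 20 fraudulent (1) transactions.
labels = [0] * 980 + [1] * 20
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

In the federated setting each bank would run such a split on its own local data, since the records are never pooled on the server.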

Our proposed solution uses a federated learning framework in which banks and financial institutions collaborate to train the ML model. In this collaboration, the model is trained locally by each participant, and the trained models are combined centrally without the data. The mean average of the trained models is sent back to the participants repeatedly for further training, so that the model keeps learning new patterns from the variety of data. The data is not shared; only the trained model is combined centrally. This follows the data privacy concept, where the data is secured (not shared) while the ML model is still trained from the datasets. Experiments show that deep learning algorithms have produced effective outcomes in CCFD. Our proposed solution outlines the use of an artificial neural network with FL, which can enable model training on larger-scale real-time datasets where privacy is ensured, and the trained model holds promise for effective CCFD. Although work has been done on ANN for CCFD, it is based on lab-based datasets. Our proposed solution is novel in that it uses a hybrid approach based on real-time data in a privacy-preserving manner: ANN for effective detection, and federated learning to provide the data privacy framework.

6 Conclusion

This review paper explores the various techniques used for CCFD. The analysis shows that ML techniques are an effective way to enhance the accuracy of CCFD. However, large datasets are needed to train the model and avoid the issue of data imbalance. Real-time datasets can provide a greater variety of data, but privacy remains an issue. With our proposed method, real-time datasets can be used to train the model in a privacy-preserving manner. A federated learning framework with ANN can enhance the capability of the ML model to detect fraudulent transactions. The proposed hybrid approach can change how CCFD is performed by utilising real-life datasets, and open a new horizon in the banking and finance industry. It can help financial institutions and banks to utilise real-time datasets through mutual collaboration, giving a collective benefit in developing an effective system for CCFD. Although the proposed method is effective for CCFD using real-time datasets in a privacy-preserving way, it has limitations when it comes to real-life deployment. All banks and financial institutions have their own rules and regulations, and they are quite strict about them. Adopting the proposed method will be challenging, as every bank and financial institution has its own limitations, and they rely on their internal resources rather than on a centralised approach. Although data is not shared centrally, the trained model still learns patterns that could possibly be decoded by hackers. Therefore, with these limitations in mind, work remains to be done to gain the confidence of banks and financial institutions to adopt this technology.

Availability of Data and Material

Not applicable.

References

Lucas Y, Portier P-E, Laporte L, et al. Multiple perspectives HMM-based feature engineering for credit card fraud detection. In: ACM, 2019. p. 1359–1361.

Duman E, Elikucuk I. Solving credit card fraud detection problem by the new metaheuristics migrating birds optimization. Berlin: Springer; 2013.

Botchey FE, Qin Z, Hughes-Lartey K. Mobile money fraud prediction—a cross-case analysis on the efficiency of support vector machines, gradient boosted decision trees, and Naïve Bayes algorithms. Information. 2020;11:383. https://doi.org/10.3390/info11080383 .

Ogwueleka FN. Data mining application in credit card fraud detection system. J Eng Sci Technol. 2011;6:311–22.

Sriram Sasank JVV, Sahith GR, Abhinav K, Belwal M. Credit card fraud detection using various classification and sampling techniques: a comparative study. In: IEEE, 2019. p. 1713–1718.

Ojugo AA, Nwankwo O. Spectral-cluster solution for credit-card fraud detection using a genetic algorithm trained modular deep learning neural network. JINAV J Inf Vis. 2021;2:15–24. https://doi.org/10.35877/454RI.jinav274 .

Majhi SK, Bhatachharya S, Pradhan R, Biswal S. Fuzzy clustering using SALP swarm algorithm for automobile insurance fraud detection. J Intell Fuzzy Syst. 2019;36:2333–44. https://doi.org/10.3233/JIFS-169944 .

Darwish SM. An intelligent credit card fraud detection approach based on semantic fusion of two classifiers. Soft Comput. 2019;24:1243–53. https://doi.org/10.1007/s00500-019-03958-9 .

Sobanadevi V, Ravi G. Handling data imbalance using a heterogeneous bagging-based stacked ensemble (HBSE) for credit card fraud detection. Singapore: Springer; 2020.

Li C, Ding N, Dong H, Zhai Y. Application of credit card fraud detection based on CS-SVM. Int J Mach Learn Comput 2021;11(1).

Olowookere TA, Adewale OS. A framework for detecting credit card fraud with cost-sensitive meta-learning ensemble approach. Sci Afr. 2020;8:e00464. https://doi.org/10.1016/j.sciaf.2020.e00464 .

Itoo F, Meenakshi SS. Comparison and analysis of logistic regression, Naïve Bayes and KNN machine learning algorithms for credit card fraud detection. Int J Inf Technol. 2020;13:1503–11. https://doi.org/10.1007/s41870-020-00430-y .

Awoyemi JO, Adetunmbi AO, Oluwadare SA. Credit card fraud detection using machine learning techniques: a comparative analysis. In: IEEE, 2017. p. 1–9.

Alam MN, Podder P, Bharati S, Mondal MRH. Effective machine learning approaches for credit card fraud detection. Cham: Springer; 2021.

Vynokurova O, Peleshko D, Bondarenko O, Ilyasov V, Serzhantov V, Peleshko M. Hybrid machine learning system for solving fraud detection tasks. In: 2020 IEEE third international conference on data stream mining & processing (DSMP), IEEE; 2020. p. 1–5.

Rai AK, Dwivedi RK. Fraud detection in credit card data using unsupervised machine learning based scheme. In: IEEE, 2020. p. 421–426.

Dubey SC, Mundhe KS, Kadam AA. Credit card fraud detection using artificial neural network and back propagation. In: 2020 4th international conference on intelligent computing and control systems (ICICCS). IEEE; 2020. p. 268–273.

Patidar R, Sharma L. Credit card fraud detection using neural network. Int J Soft Comput Eng (IJSCE), 2011;1(32–38).

Dhankhad S, Mohammed E, Far B. Supervised machine learning algorithms for credit card fraudulent transaction detection: a comparative study. In: IEEE, 2018. p. 122–125.

Puh M, Brkic L. Detecting credit card fraud using selected machine learning algorithms. In: Croatian Society MIPRO, 2019. p. 1250–1255.

Varmedja D, Karanovic M, Sladojevic S, et al. Credit card fraud detection—machine learning methods. In: IEEE, 2019. p. 1–5.

Zhu H, Liu G, Zhou M, Xie Y, Abusorrah A, Kang Q. Optimizing weighted extreme learning machines for imbalanced classification and application to credit card fraud detection. Neurocomputing. 2020;407:50–62. https://doi.org/10.1016/j.neucom.2020.04.078 .

Jemima Jebaseeli T, Venkatesan R, Ramalakshmi K. Fraud detection for credit card transactions using random forest algorithm. Singapore: Springer; 2020.

Dighe D, Patil S, Kokate S. Detection of credit card fraud transactions using machine learning algorithms and neural networks: a comparative study. In: IEEE, 2018. p. 1–6.

Mishra MK, Dash R. A comparative study of Chebyshev functional link artificial neural network, multi-layer perceptron and decision tree for credit card fraud detection. In: IEEE, 2014. p. 228–233.

Rtayli N, Enneya N. Selection features and support vector machine for credit card risk identification. Procedia Manuf. 2020;46:941–8.

Xuan S, Liu G, Li Z, Zheng L, Wang S, Jiang C. Random forest for credit card fraud detection. In: 2018 IEEE 15th international conference on networking, sensing and control (ICNSC). IEEE; 2018, p. 1–6.

Worobec K. The definitive overview of payment industry fraud. In: Ukfinance.org.uk. 2021. https://www.ukfinance.org.uk/system/files/Fraud%20The%20Facts%202021-%20FINAL.pdf .

Jakaite L, Schetinin V, Maple C. Bayesian assessment of newborn brain maturity from two-channel sleep electroencephalograms. Comput Math Methods Med. 2012;2012:629654–7. https://doi.org/10.1155/2012/629654 .

Jakaite L, Schetinin V, Maple C, Schult J. Bayesian decision trees for EEG assessment of newborn brain maturity. In: The 10th annual workshop on computational intelligence UKCI 2010. 2010. https://doi.org/10.1109/UKCI.2010.5625584

Jakaite L, Schetinin V, Schult J. Feature extraction from electroencephalograms for Bayesian assessment of newborn brain maturity. In: Proceedings of the 24th IEEE international symposium on computer-based medical systems. 2011. https://doi.org/10.1109/CBMS.2011.5999109

Jakaite L, Schetinin V, Schult J. Feature extraction from electroencephalograms for Bayesian assessment of newborn brain maturity. In: 24th International symposium on computer-based medical systems (CBMS), 2011. p. 1–6. https://doi.org/10.1109/CBMS.2011.5999109

Nyah N, Jakaite L, Schetinin V, Sant P, Aggoun A. Evolving polynomial neural networks for detecting abnormal patterns. In: 2016 IEEE 8th international conference on intelligent systems (I.S.), 2016. p. 74–80. https://doi.org/10.1109/IS.2016.7737403 .

Nyah N, Jakaite L, Schetinin V, Sant P, Aggoun A. Learning polynomial neural networks of a near-optimal connectivity for detecting abnormal patterns in biometric data. In: 2016 SAI computing conference (SAI), 2016. p. 409–413. https://doi.org/10.1109/SAI.2016.7556014 .

Schetinin V, Jakaite L. Classification of newborn EEG maturity with Bayesian averaging over decision trees. Expert Syst Appl. 2012;39(10):9340–7. https://doi.org/10.1016/j.eswa.2012.02.184 .

Schetinin V, Jakaite L. Extraction of features from sleep EEG for Bayesian assessment of brain development. PLoS ONE. 2017;12(3):1–13. https://doi.org/10.1371/journal.pone.0174027 .

Schetinin V, Jakaite L, Nyah N, Novakovic D, Krzanowski W. Feature extraction with GMDH-type neural networks for EEG-based person identification. Int J Neural Syst. 2018. https://doi.org/10.1142/S0129065717500642 .

Hassan MM, Billah MAM, Rahman MM, Zaman S, Shakil MMH, Angon JH. Early predictive analytics in healthcare for diabetes prediction using machine learning approach. In: 2021 12th international conference on computing communication and networking technologies (ICCCNT). IEEE; 2021. p. 01–05.

Hassan MM, Peya ZJ, Mollick S, Billah MAM, Shakil MMH, Dulla AU. Diabetes prediction in healthcare at early stage using machine learning approach. In: 2021 12th international conference on computing communication and networking technologies (ICCCNT). IEEE; 2021. p. 01–05.

Kong M, Li L, Wu R, Tao X. An empirical study of learning based happiness prediction approaches. Hum Centric Intell Syst. 2021;1(1–2):18.

Hassan M, Akter L, Rahman M, Zaman S, Hasib K, Jahan N, Smrity R, Farhana J, Raihan M, Mollick S. Efficient prediction of water quality index (WQI) using machine learning algorithms. Hum Centric Intell Syst. 2021;1(3–4):86.

Schetinin V, Jakaite L, Krzanowski WJ. Prediction of survival probabilities with Bayesian decision trees. Expert Syst Appl. 2013;40(14):5466–76. https://doi.org/10.1016/j.eswa.2013.04.009 .

Schetinin V, Jakaite L, Krzanowski W. Bayesian learning of models for estimating uncertainty in alert systems: application to air traffic conflict avoidance. Integr Comput Aided Eng. 2018;26:1–17. https://doi.org/10.3233/ICA-180567 .

Jakaite L, Schetinin V, Hladuvka J, Minaev S, Ambia A, Krzanowski W. Deep learning for early detection of pathological changes in X-ray bone microstructures: case of osteoarthritis. Sci Rep. 2021. https://doi.org/10.1038/s41598-021-81786-4 .

Wen H, Huang F. Personal loan fraud detection based on hybrid supervised and unsupervised learning. In: 2020 5th IEEE international conference on big data analytics (ICBDA). IEEE; 2020. p. 339–343.

Li W, Lin S, Qian X, et al. An evidence theory-based validation method for models with multivariate outputs and uncertainty. SIMULATION. 2021;97:821–34. https://doi.org/10.1177/00375497211022814 .

Zięba M, Tomczak SK, Tomczak JM. Ensemble boosted trees with synthetic features generation in application to bankruptcy prediction. Expert Syst Appl. 2016;58:93–101. https://doi.org/10.1016/j.eswa.2016.04.001 .

Vynokurova O, Peleshko D, Bondarenko O, et al. Hybrid machine learning system for solving fraud detection tasks. In: IEEE, 2020. p. 1–5.

Rejwan BS, Schetinin V. Deep neural-network prediction for study of informational efficiency. In: Arai K, editor. Intelligent systems and applications. IntelliSys 2021. Lecture notes in networks and systems, vol. 295. Cham: Springer; 2022. https://doi.org/10.1007/978-3-030-82196-8_34 .

Visa credit cards in circulation 2020|Statista. In: Statista. 2021. https://www.statista.com/statistics/618115/number-of-visa-credit-cards-worldwide-by-region/ .

Mastercard: credit cards in circulation 2021|Statista. In: Statista. 2021. https://www.statista.com/statistics/618137/number-of-mastercard-credit-cards-worldwide-by-region/ . Accessed 24 Nov 2021.

Hegedűs I, Danner G, Jelasity M. Decentralized learning works: an empirical comparison of gossip learning and federated learning. J Parallel Distrib Comput. 2021;148:109–24. https://doi.org/10.1016/j.jpdc.2020.10.006 .

Ostapowicz M, Żbikowski K. Detecting fraudulent accounts on blockchain: a supervised approach. Cham: Springer; 2019.

Danner G, Berta Á, Hegedűs I, Jelasity M. Robust fully distributed minibatch gradient descent with privacy preservation. Secur Commun Netw. 2018;2018:1–15. https://doi.org/10.1155/2018/6728020 .

Yang W, Zhang Y, Ye K, et al. FFD: a federated learning based method for credit card fraud detection. Cham: Springer; 2019.

Ostapowicz M, Żbikowski K. Detecting fraudulent accounts on blockchain: a supervised approach. In: International conference on web information systems engineering. Springer, Cham; 2020. p. 18–31.

Carneiro N, Figueira G, Costa M. A data mining based system for credit-card fraud detection in e-tail. Dec Support Syst. 2017;95:91–101. https://doi.org/10.1016/j.dss.2017.01.002 .

Author information

Authors and affiliations.

University of Bedfordshire, Luton, UK

Rejwan Bin Sulaiman, Vitaly Schetinin & Paul Sant

Contributions

RBS significantly contributed to the conceptual parts of the paper's contribution to the knowledge. VS and PS assisted with report improvement and review, as well as providing guidance on manuscript drafting.

Corresponding author

Correspondence to Rejwan Bin Sulaiman.

Ethics declarations

Conflict of interest.

The authors declare that they have no competing interests.

Ethics Approval

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Bin Sulaiman, R., Schetinin, V. & Sant, P. Review of Machine Learning Approach on Credit Card Fraud Detection. Hum-Cent Intell Syst 2 , 55–68 (2022). https://doi.org/10.1007/s44230-022-00004-0

Received : 25 November 2021

Accepted : 28 March 2022

Published : 05 May 2022

Issue Date : June 2022

DOI : https://doi.org/10.1007/s44230-022-00004-0

  • Artificial neural network (ANN)
  • Credit card fraud
  • Federated learning
  • Random forest (RF) method
  • Support vector machine (SVM)
  • Privacy-preserving

PeerJ Comput Sci. PMCID: PMC10280638

A systematic review of literature on credit card cyber fraud detection using machine and deep learning

Eyad Abdel Latif Marazqah Btoush, Xujuan Zhou, Raj Gururajan, Ka Ching Chan, Rohan Genrich, Prema Sankaran

1 School of Business, University of Southern Queensland, Toowoomba, QLD, Australia

2 School of Computing, SRM Institute of Science and Technology, Chennai, India

3 School of Management, Presidency University, Bangalore, India

Associated Data

The following information was supplied regarding data availability:

This is a literature review.

The increasing spread of cyberattacks and crimes makes cyber security a top priority in the banking industry. Credit card cyber fraud is a major security risk worldwide. Conventional anomaly detection and rule-based techniques are two of the most commonly used approaches for detecting cyber fraud; however, they are the most time-consuming, resource-intensive, and inaccurate. Machine learning is one of the techniques gaining popularity and playing a significant role in this field. This study examines and synthesizes previous studies on credit card cyber fraud detection, focusing specifically on machine learning/deep learning approaches. In our review, we identified 181 research articles published from 2019 to 2021. For the benefit of researchers, a review of machine learning/deep learning techniques and their relevance in credit card cyber fraud detection is presented. Our review provides direction for choosing the most suitable techniques. It also discusses the major problems, gaps, and limits in detecting credit card cyber fraud and recommends research directions for the future. This comprehensive review enables researchers and the banking industry to conduct innovation projects for cyber fraud detection.

Introduction

The banking industry has been profoundly impacted by the evolution of information technology (IT). Credit card and online net banking transactions, which currently make up the majority of banking system transactions, all present additional vulnerabilities (Jiang & Broby, 2021). Hackers have increasingly targeted banks, which hold enormous quantities of client data. Therefore, banks have been at the forefront of cyber security for business. In the past thirteen years, the cyber security industry has expanded fast. The market is predicted to be valued at $170.4 billion in 2022 (Morgan, 2019). In the next three years, the cost of cybercrime is expected to rise by 15% every year, finally exceeding $10.5 trillion USD each year by 2025 (Morgan, 2020).

In the banking industry, cyber fraud using credit cards is a significant concern that costs billions of dollars annually. The banking industry has made strengthening cyber security protection a priority. Multiple systems have been developed for monitoring and identifying credit card cyber fraud. However, because of the constantly evolving nature of threats, the banking industry must be equipped with the most modern and effective cyber fraud management technologies (Btoush et al., 2021).

The acceptance of credit cards and other forms of online payment has exploded in recent years, resulting in an increase in credit card cyber fraud. There are several forms of credit card cyber fraud. The first type is the actual theft of a credit card. The second type is the theft of confidential credit card details. Further fraud is committed when credit card information is entered during an online transaction without the cardholder’s permission (Al Smadi & Min, 2020; Trivedi et al., 2020).

The detection of cyber fraud in credit cards is a challenging task that has attracted the interest of academics working in the field of machine learning (ML). Datasets associated with credit cards are significantly skewed, and a great number of algorithms are unable to discriminate items from minority classes when working with such datasets. To be efficient, the systems used to identify cyber fraud need to react swiftly. Another important concern is the way in which new methods of attack influence the conditional distribution of the data over time (Benchaji, Douzi & El Ouahidi, 2021). According to Al Rubaie (2021), a number of challenges need to be addressed for credit card cyber fraud detection. These challenges include massive volumes of data that are unbalanced or incorrectly categorised, frequent changes in the type of transaction, and real-time detection.

As current technology progresses, credit card cyber fraud is also developing rapidly, making cyber fraud detection a crucial area. Conventional techniques for resolving this problem are no longer sufficient. In the conventional approach, domain experts in cyber fraud compose algorithms that are governed by strict rules. In addition, a proactive strategy must be used to combat cyber fraud. Every industry is attempting to employ ML-based solutions due to their popularity, speed, and effectiveness (Priya & Saradha, 2021). ML and DL methods have been shown to be effective in this field. In particular, DL has garnered the most attention and had the most success in combating cyber threats recently. Its ability to minimize overfitting and discover underlying fraud tendencies, as well as its capacity to handle massive datasets, makes it particularly useful in this field. In the past few years, DL techniques have been applied to recognize new fraudulent patterns and enable systems to respond flexibly to complex data patterns. In this review, we focus on the latest research from 2019–2021 in order to provide the most up-to-date and relevant information on the topic, because DL’s popularity increased during this period.

While numerous cyber fraud detection techniques are available, no fraud detection system has yet been able to deliver both high efficiency and high accuracy. It is therefore necessary to provide researchers and the banking industry with an overview of the state of the art in cyber fraud detection and an analysis of the most recent studies in this field, to support innovation projects in cyber fraud detection. To achieve this goal, this review provides a detailed analysis of ML/DL techniques and their function in credit card cyber fraud detection, and offers recommendations for selecting the most suitable techniques for detecting cyber fraud. The study also covers research trends, gaps, future directions, and limitations in detecting credit card cyber fraud.

This review focuses mostly on identifying the ML/DL techniques used to detect credit card cyber fraud. Moreover, we aim to analyse the gaps and trends in this field. Over the past few years, only a few review articles have been published on detecting credit card cyber fraud. This review looks at the detection of card fraud from the standpoint of cybersecurity, applies ML/DL techniques, and approaches the topic from a financial standpoint. Furthermore, unlike other reviews, which also include conference articles, ours includes only recent journal articles.

The aim of this review is to provide researchers with an overview of the state of the art in cyber fraud detection and an analysis of the most recent studies in this field. This review will assist researchers in selecting high-performance ML/DL algorithms and datasets to consider when attempting to detect cyber fraud. To answer the four research questions, we used the search string to search six digital libraries. This resulted in a total of 2,094 articles, all of which are journal articles. In addition, we used the snowballing strategy to integrate relevant articles missed by the automated search. Through careful review of the retrieved articles, we narrowed down our collection and found the most relevant answers to our four research questions. As a result, 181 articles were chosen for further study.

We describe our search, study selection, data extraction procedures, and overall research methodology in “Survey Methodology” of this article. In “Result and Analysis”, we present the findings and answers to our research questions. In “Conclusions”, we conclude the study by discussing its findings.

Survey Methodology

The review investigates the present status of research on detecting cyber fraud in credit cards and addresses our research questions. The methodology begins with a description of the data sources, the search strategy, the inclusion and exclusion criteria, and the number of research articles selected from the different databases.

Research questions

This review attempts to summarise and analyse the ML and DL credit card cyber fraud detection algorithms from 2019 to 2021. The following research questions (RQs) are therefore posed:

RQ1: What ML/DL techniques are utilised in detection of credit card cyber fraud? This question aims to specify the ML/DL techniques that have been applied.

RQ2: What percentage of credit card cyber fraud detection articles discussed supervised, unsupervised, or semi-supervised techniques? This question seeks to determine the proportion of research articles that employ supervised, unsupervised, and semi-supervised credit card cyber fraud detection techniques.

RQ3: What are the estimated overall performance and outcomes of ML/DL models? This question focuses on ML/DL model performance estimation and model results.

RQ4: What are the research trends, gaps, and potential future directions for cyber fraud detection in credit cards? This question aims to uncover research trends, gaps in the existing literature, and future directions of credit card cyber fraud research.

Data sources and research strategy

After determining the research questions, we constructed the search strategy as follows:

  • The main search terms are determined by the research questions.
  • Boolean operators (AND and OR) are used to restrict search results.
  • The search terms used for this review relate to detecting cyber fraud in credit cards and to the ML/DL techniques used for fraud detection.

The methodology incorporates the following electronic literature databases in order to obtain a comprehensive and broad coverage of the literature and to maximise the probability of discovering highly relevant articles:

  • Google Scholar
  • ACM
  • IEEE Xplore
  • SpringerLink
  • Web of Science
  • Scopus

To locate the most relevant articles, particular keywords were formulated into a search string. This string was divided into search units, which were combined using Boolean operators. All of the resources mentioned have keyword-based search engines. We selected the following search string to retrieve the most relevant studies:

((AI OR “artificial intelligence” OR DL OR “deep learning” OR ML OR “machine learning”) AND (“Credit card fraud” OR “card fraud” OR “card-fraud” OR “credit-fraud” OR “card cyber fraud” OR “transaction fraud” OR “payment fraud” OR “fraud detec*” OR “bank* fraud” OR “financ* fraud”)).

We included “artificial intelligence” OR “deep learning” OR “machine learning” so that we could find studies that used any of these techniques. Additionally, we included the terms “credit card fraud” OR “card fraud” OR “card-fraud” OR “credit-fraud” OR “card cyber fraud” OR “transaction fraud” OR “payment fraud” OR “fraud detec*” OR “bank* fraud” OR “financ* fraud” to capture any fraud-related content so that we would not miss any relevant articles.
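
The way the search string is assembled from two groups of terms can be sketched as follows. This is a minimal illustration, assuming the terms are simply joined with OR inside each group and the two groups are joined with AND; the helper name `or_group` is hypothetical, and straight quotes stand in for the typographic quotes printed above.

```python
# Term groups copied from the search string in the text.
TECHNIQUE_TERMS = [
    "AI", "artificial intelligence", "DL", "deep learning",
    "ML", "machine learning",
]
FRAUD_TERMS = [
    "Credit card fraud", "card fraud", "card-fraud", "credit-fraud",
    "card cyber fraud", "transaction fraud", "payment fraud",
    "fraud detec*", "bank* fraud", "financ* fraud",
]


def or_group(terms):
    """Join terms with OR, quoting anything that is not a single plain word."""
    parts = [t if t.isalnum() else f'"{t}"' for t in terms]
    return "(" + " OR ".join(parts) + ")"


# A study must match at least one term from each group.
search_string = f"({or_group(TECHNIQUE_TERMS)} AND {or_group(FRAUD_TERMS)})"
print(search_string)
```

Keeping the two groups as separate lists makes it straightforward to adapt the query to each library's own syntax.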

We searched for the above string in six digital libraries. The search string was edited and converted into an appropriate search query for each library. Table 1 provides the detailed search queries. We limited our review to journal articles, excluding conference articles, books, and other publications. Our search was conducted in December 2021 and covered the years 2019 to 2021. A total of 2,094 items were retrieved from the digital libraries. Table 2 shows the distribution of the items across the libraries. We identified 365 duplicate articles. After eliminating the duplicates, we continued the selection process with the remaining 1,729 articles. In addition to the automatic searches of digital libraries, a snowballing mechanism was also used.

Table 1: Search queries used for each digital library.
Digital library | Query
Google Scholar | ((AI OR “artificial intelligence” OR DL OR “deep learning” OR ML OR “machine learning”) AND (“Credit card fraud” OR “card fraud” OR “card-fraud” OR “credit-fraud” OR “card cyber fraud” OR “transaction fraud” OR “payment fraud” OR “fraud detec*” OR “bank* fraud” OR “financ* fraud”)).
ACM | ((All: AI) OR (All: “artificial intelligence”) OR (All: DL) OR (All: “deep learning”) OR (All: ML) OR (All: “machine learning”)) AND ((All: “credit card fraud”) OR (All: “card fraud”) OR (All: “card-fraud”) OR (All: “credit-fraud”) OR (All: “card cyber fraud”) OR (All: “transaction fraud”) OR (All: “payment fraud”) OR (All: “fraud detec*”) OR (All: “bank* fraud”) OR (All: “financ* fraud”)) AND (Publication date: (01/01/2019 TO 12/31/2021))
IEEE Xplore | ((AI OR “artificial intelligence” OR DL OR “deep learning” OR ML OR “machine learning”) AND (“Credit card fraud” OR “card fraud” OR “card-fraud” OR “credit-fraud” OR “card cyber fraud” OR “transaction fraud” OR “payment fraud” OR “fraud detec*” OR “bank* fraud” OR “financ* fraud”)). Filters applied: Journals 2019–2021.
SpringerLink | 39 Result(s) for ‘((AI OR “artificial intelligence” OR DL OR “deep learning” OR ML OR “machine learning”) AND (“Credit card fraud” OR “card fraud” OR “card-fraud” OR “credit-fraud” OR “card cyber fraud” OR “transaction fraud” OR “payment fraud” OR “fraud detec*” OR “bank* fraud” OR “financ* fraud”))’ within article 2019–2021.
Web of Science | ((AI OR “artificial intelligence” OR DL OR “deep learning” OR ML OR “machine learning”) AND (“Credit card fraud” OR “card fraud” OR “card-fraud” OR “credit-fraud” OR “card cyber fraud” OR “transaction fraud” OR “payment fraud” OR “fraud detec*” OR “bank* fraud” OR “financ* fraud”)). Refined by: publication years: 2019 or 2020 or 2021. Document types: Articles. Languages: English.
Scopus | TITLE-ABS-KEY (((AI OR “artificial intelligence” OR DL OR “deep learning” OR ML OR “machine learning”) AND (“Credit card fraud” OR “card fraud” OR “card-fraud” OR “credit-fraud” OR “card cyber fraud” OR “transaction fraud” OR “payment fraud” OR “fraud detec*” OR “bank* fraud” OR “financ* fraud”))) AND (LIMIT-TO (PUBYEAR, 2021) OR LIMIT-TO (PUBYEAR, 2020) OR LIMIT-TO (PUBYEAR, 2019)) AND (LIMIT-TO (DOCTYPE, “AR”)).

Table 2: Distribution of retrieved articles across the digital libraries.
No. | Database | Web address | Retrieved articles
1 | Google Scholar | | 1,418
2 | SpringerLink | | 39
3 | Scopus | | 292
4 | IEEE Xplore | | 76
5 | Web of Science | | 233
6 | ACM | | 36
Total retrieved articles: 2,094
Number of duplicates: 365
Number of articles after removing duplicates: 1,729
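
The totals reported for the six libraries can be checked with a few lines of arithmetic. The sketch below also shows one simple way to flag duplicates, by comparing normalised titles; that matching rule and the sample titles are assumptions for illustration, since the text does not state how duplicates were matched.

```python
# Counts copied from the per-library figures reported above.
retrieved = {
    "Google Scholar": 1418,
    "SpringerLink": 39,
    "Scopus": 292,
    "IEEE Xplore": 76,
    "Web of Science": 233,
    "ACM": 36,
}
duplicates = 365

total = sum(retrieved.values())
unique = total - duplicates
print(total, unique)  # 2094 1729


def normalise(title):
    """Lower-case and keep only letters/digits so trivial variants compare equal."""
    return "".join(ch for ch in title.lower() if ch.isalnum())


def deduplicate(titles):
    """Keep the first occurrence of each normalised title (assumed matching rule)."""
    seen, kept = set(), []
    for title in titles:
        key = normalise(title)
        if key not in seen:
            seen.add(key)
            kept.append(title)
    return kept


# Made-up sample: the first two entries differ only in case and punctuation.
sample = [
    "Credit card fraud detection using random forest.",
    "Credit Card Fraud Detection Using Random Forest",
    "Credit card fraud detection using AdaBoost.",
]
print(len(deduplicate(sample)))  # 2
```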

Study selection

We executed the above search strategy during December 2021 and identified 2,094 articles. After removing duplicates (365 articles), the titles and abstracts of 1,729 unique citations were screened for eligibility and relevance. If a study’s relevance could not be verified due to insufficient abstract information or the absence of an abstract, the citation was assigned for full-text review. We thus reviewed the full text of 281 studies. Disagreements on the included studies were resolved through discussion and consensus. The selected articles were filtered to ensure that only relevant studies were included in our review. The articles were exported to EndNote, grouped by database, and then exported to the literature review management software Rayyan ( Ouzzani et al., 2016 ) to facilitate the screening and selection process. To initiate the filtering and selection processes, duplicate articles gathered from multiple digital resources were eliminated. Then, using the inclusion and exclusion criteria, we removed the irrelevant articles. Using quality evaluation processes, we included only the qualified articles that offer the most effective answers to our study objectives. Using the references of the collected articles, we searched for further related publications. Figure 1 displays the article selection process. The inclusion and exclusion criteria used for this review are detailed in Table 3 . After the filtration process was completed, 181 articles were retained for this study.

[Figure 1: Article selection process.]

Table 3: Inclusion and exclusion criteria.
Inclusion criteria | Exclusion criteria
Include journal articles only | Exclude conference articles, book chapters, and other publications
Include articles about credit card cyber fraud detection | Exclude articles not related to detecting cyber fraud in credit cards
Include articles that used ML/DL | Exclude articles that did not use ML/DL
Include articles published in 2019, 2020, and 2021 | Exclude articles published before 2019 or after 2021
Include articles in the English language | Exclude publications in languages other than English
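
The inclusion and exclusion criteria amount to a conjunction of simple checks, which can be sketched as a filter function. The `Record` fields and the sample records below are hypothetical: the review does not define a record schema, and the two boolean fields stand for a reviewer's judgement made from the title and abstract.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical schema; the review does not define one.
    title: str
    doc_type: str           # e.g. "journal", "conference", "book chapter"
    year: int
    language: str
    about_card_fraud: bool  # reviewer's judgement from title/abstract
    uses_ml_dl: bool        # reviewer's judgement from title/abstract


def is_eligible(rec: Record) -> bool:
    """Return True only if every inclusion criterion holds."""
    return (
        rec.doc_type == "journal"
        and rec.about_card_fraud
        and rec.uses_ml_dl
        and 2019 <= rec.year <= 2021
        and rec.language == "English"
    )


# Made-up records for illustration.
candidates = [
    Record("Example journal study", "journal", 2020, "English", True, True),
    Record("Example conference study", "conference", 2020, "English", True, True),
    Record("Example older study", "journal", 2018, "English", True, True),
    Record("Example rule-based study", "journal", 2021, "English", True, False),
]
included = [rec.title for rec in candidates if is_eligible(rec)]
print(included)  # ['Example journal study']
```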

Data extraction

This process analyses the final selection of articles in order to collect the data required to answer the four research questions. Table 4 displays our data extraction form. The final column of Table 4 gives the reason for extracting the corresponding data. We answered RQ1 and RQ2 using information regarding techniques and datasets, which we also used to group studies with comparable datasets and techniques. Extracting each article’s discussion and findings helped us estimate the overall performance of the approaches and answer RQ3. By extracting the articles’ objectives and conclusions, we were able to recognise trends, conduct a gap analysis, determine future research, and provide a response to RQ4. To identify the gaps and define the direction that future research should take, we conducted a summary analysis on the basis of the articles’ objectives and conclusions.

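Table 4: Data extraction form.
Strategy | Category | Description | Purpose
Automatic extraction | Title of article | The article’s title | Additional information
Automatic extraction | Authors of article | The author’s name |
Automatic extraction | Article year | The year of publication |
Automatic extraction | Article type | Journal |
Manual extraction | Objectives | Study objectives | RQ4
Manual extraction | Conclusion | Outcomes of study | RQ4
Manual extraction | Techniques | ML/DL technique utilised to support objectives | RQ1 and RQ2
Manual extraction | Discussion and result | Outcomes | RQ3
Manual extraction | Algorithm type | ML, DL, or mix | RQ1 and RQ2
Manual extraction | Dataset | Dataset used in article | RQ1
Manual extraction | Future work | Gaps, trends, and future work | RQ4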

Result and analysis

Distribution of chosen articles throughout the years

To explore the most recent techniques described in journals in this field, limits were placed on publication years. Our review selected articles published from 2019 to 2021. Fig. 2 shows the distribution of articles by year of publication. Since our study was completed in December 2021, articles published after December 2021 were not included.

[Figure 2: Distribution of articles by year of publication.]

Publication type

In this review, we evaluated only journal publications. Table A1 displays the selected research articles published during the observation period.

Table A1: Selected research articles.
Article ID | Article title | Type | Year | Reference
A1 | Comparative analysis of back-propagation neural network and k-means clustering algorithm in fraud detection in online credit card transaction. | Journal | 2019 |
A2 | Credit card fraud detection using machine learning classification algorithms over highly imbalanced data. | Journal | 2020 |
A3 | Hybrid CNN-BILSTM-Attention based identification and prevention system for banking transactions. | Journal | 2021 |
A4 | Identify theft detection using machine learning. | Journal | 2021 |
A5 | Hidden Markov model application for credit card fraud detection systems. | Journal | 2020 |
A6 | Enhanced SMOTE & fast random forest techniques for credit card fraud detection. | Journal | 2020 |
A7 | Fraud identification of credit card using ML techniques. | Journal | 2020 |
A8 | Improvement in credit card fraud detection using ensemble classification technique and user data. | Journal | 2021 |
A9 | Credit card fraud detection integrated account and transaction sub modules. | Journal | 2021 |
A10 | Credit card fraud detection using autoencoder model in unbalanced datasets. | Journal | 2019 |
A11 | Fraud detection in credit card using logistic regression. | Journal | 2020 |
A12 | A financial fraud detection model based on LSTM deep learning technique. | Journal | 2020 |
A13 | Comparative study of machine learning algorithms and correlation between input parameters. | Journal | 2019 |
A14 | Example-dependent cost-sensitive credit cards fraud detection using SMOTE and Bayes minimum risk. | Journal | 2020 |
A15 | Credit card fraud detection on skewed data using machine learning techniques. | Journal | 2021 |
A16 | Facilitating user authorization from imbalanced data logs of credit card using artificial intelligence. | Journal | 2020 |
A17 | Intelligence feature selection with social spider optimization based artificial neural network model for credit card fraud detection. | Journal | 2020 |
A18 | Deal-deep ensemble algorithm framework for credit card fraud detection in real-time data stream with Google TensorFlow. | Journal | 2020 |
A19 | Credit card fraud detection using artificial neural network. | Journal | 2021 |
A20 | IFDTC4.5: intuitionistic fuzzy logic based decision tree for E-transactional fraud detection. | Journal | 2020 |
A21 | Credit card fraud detection using hybrid models. | Journal | 2019 |
A22 | Comparative analysis of different distribution dataset by using data mining techniques on credit card fraud detection. | Journal | 2020 |
A23 | Improving detection of credit card fraudulent transaction using generative adversarial networks. | Journal | 2019 |
A24 | Credit card fraud detection using pipeling and ensemble learning. | Journal | 2020 |
A25 | Emerging approach for detection of financial fraud using machine learning. | Journal | 2021 |
A26 | Detection of fraud transactions using recurrent neural network during COVID-19: fraud transaction during COVID-19. | Journal | 2020 |
A27 | Enhancing the credit card fraud detection through ensemble techniques. | Journal | 2019 |
A28 | Credit card fraud detection using data mining and statistical methods. | Journal | 2020 |
A29 | Credit card fraud detection model based on LSTM recurrent neural networks. | Journal | 2021 |
A30 | Credit card fraud detection using machine learning algorithms. | Journal | 2020 |
A31 | Credit card fraud detection using autoencoders. | Journal | 2020 |
A32 | Credit card fraud detection using naïve Bayes and robust scaling techniques. | Journal | 2021 |
A33 | A closer look into the characteristics of fraudulent and transactions. | Journal | 2020 |
A34 | Evaluation of deep neural networks for reduction of credit card fraud alerts. | Journal | 2020 |
A35 | Deep convolution neural network model for credit-card fraud detection and alert. | Journal | 2021 |
A36 | Graph neural network for fraud detection spatial-temporal attention. | Journal | 2020 |
A37 | Deep learning-based hybrid approach of detecting fraudulent transactions. | Journal | 2021 |
A38 | Combined technique of supervised classifier for the credit card fraud detection. | Journal | 2020 |
A39 | Supervised machine learning algorithms for detection credit card fraud. | Journal | 2021 |
A40 | Using harmony search algorithm in neural networks to improve fraud detection in banking system. | Journal | 2020 |
A41 | Detecting electronic banking fraud on highly imbalanced data using hidden Markov models. | Journal | 2021 |
A42 | Machine learning based on resampling approaches and deep reinforcement learning for credit card fraud detection systems. | Journal | 2021 |
A43 | Credit card fraud detection system using data mining. | Journal | 2020 |
A44 | A comparative study on credit card fraud detection. | Journal | 2021 |
A45 | Supervised machine learning algorithms for credit card fraudulent transaction detection. | Journal | 2019 |
A46 | Credit card fraud detection analysis using robust space invariant artificial neural networks (RSIANN). | Journal | 2019 |
A47 | Credit card fraud detection system. | Journal | 2020 |
A48 | Artificial intelligence based credit card fraud identification using fusion method. | Journal | 2019 |
A49 | Credit card fraud detection using random forest. | Journal | 2019 |
A50 | Performance evaluation of credit card fraud transaction using boosting algorithms. | Journal | 2019 |
A51 | Fraud detection in credit card transaction using anomaly detection. | Journal | 2021 |
A52 | Semi-supervised classification on credit card fraud detection using autoencoders. | Journal | 2021 |
A53 | Artificial neural network technique for improving predication of credit card default: a stacked sparse autoencoder approach. | Journal | 2021 |
A54 | Credit card fraud detection based on machine learning. | Journal | 2019 |
A55 | Comparison of different ensemble methods in credit card default prediction. | Journal | 2021 |
A56 | A novel method for detection of fraudulent bank transactions using multi-layer neural networks with adaptive learning rate. | Journal | 2020 |
A57 | Using generative adversarial networks for improving classification effectives in credit card fraud detection. | Journal | 2019 |
A58 | Ensemble of deep sequential models for credit card fraud detection. | Journal | 2021 |
A59 | Detection of credit card fraudulent transaction using boosting algorithms. | Journal | 2021 |
A60 | Predication credit card transaction fraud using machine learning algorithms. | Journal | 2019 |
A61 | Financial fraud detection using naïve Bayes algorithm in highly imbalance data set. | Journal | 2021 |
A62 | Anomaly detection in credit card transactions using machine learning. | Journal | 2020 |
A63 | Uncertainty-aware credit card fraud detection using deep learning. | Journal | 2021 |
A64 | Credit card fraud detection using ensemble classifier. | Journal | 2019 |
A65 | An implementation of decision tree algorithm augmented with regression analysis for fraud detection in credit card. | Journal | 2020 |
A66 | Credit card fraud detection technique using hybrid approach: an amalgamation of self-organizing maps and neural networks. | Journal | 2020 |
A67 | Machine learning methods for discovering credit card fraud. | Journal | 2020 |
A68 | Improved deep forest more for detection of fraudulent online transaction. | Journal | 2020 |
A69 | Using variational auto encoding in credit card fraud detection. | Journal | 2020 |
A70 | Credit card fraud detection using naïve Bayesian and c4.5 decision tree classifiers. | Journal | 2020 |
A71 | Credit card fraud detection using fuzzy rough nearest neighbor and sequential minimal optimization with logistic regression. | Journal | 2021 |
A72 | Fraud classification and detection model using different machine learning algorithm. | Journal | 2021 |
A73 | An efficient domain-adaptation method using different machine learning GAN for fraud detection. | Journal | 2020 |
A74 | Service-based credit card fraud detection using oracle SOA suite. | Journal | 2021 |
A75 | Comparison and analysis of logistic regression, naïve Bayes and KNN machine learning algorithms for credit card fraud detection. | Journal | 2021 |
A76 | Credit card fraud detection using isolation forest and local factor. | Journal | 2021 |
A77 | Credit card fraud detection using random forest algorithm. | Journal | 2019 |
A78 | A multiple classifiers system for anomaly detection in credit card data with unbalanced and overlapped classes. | Journal | 2020 |
A79 | Supervised machine learning algorithms for credit card fraudulent transaction detection. | Journal | 2019 |
A80 | Credit card fraud detection using machine learning. | Journal | 2019 |
A81 | Champion-challenger analysis for credit card fraud detection: hybrid ensemble and deep learning. | Journal | 2019 |
A82 | A novel framework for credit card fraud detection. | Journal | 2021 |
A83 | Automatic machine learning algorithms for fraud detection in digital payment systems. | Journal | 2020 |
A84 | A new hybrid method for credit card fraud detection on financial data. | Journal | 2019 |
A85 | A study of fraud detection approaches in credit card transactions. | Journal | 2020 |
A86 | Credit card fraud detection using Bayesian belief network. | Journal | 2020 |
A87 | An efficient approach for credit card fraud detection. | Journal | 2020 |
A88 | Comparative analysis for fraud detection using logistic regression, random forest and support vector machine. | Journal | 2020 |
A89 | Fraud detection and prevention in banking financial transaction with machine learning using R. | Journal | 2020 |
A90 | Comparative study on credit card fraud detection based on different support vector machines. | Journal | 2021 |
A91 | Credit card fraud detection with autoencoder and probabilistic random forest. | Journal | 2021 |
A92 | Towards automated feature engineering for credit card fraud detection using multi-perspective HMMs. | Journal | 2020 |
A93 | An experimental study with imbalanced classification approaches for credit card fraud detection. | Journal | 2019 |
A94 | Credit card fraud detection system using machine learning. | Journal | 2021 |
A95 | Analysis of credit card fraud detection using machine learning models on balanced and imbalanced datasets. | Journal | 2021 |
A96 | Credit card fraud detection using machine learning and data science. | Journal | 2019 |
A97 | Novel machine learning approach for analysis anonymous credit card fraud patterns. | Journal | 2019 |
A98 | Credit card fraud detection using machine learning. | Journal | 2021 |
A99 | Detection fraudulent credit card transactions using outlier detection. | Journal | 2019 |
A100 | Credit card fraud detection in payment using machine learning classifiers. | Journal | 2020 |
A101 | An autoencoder based model for detecting fraudulent credit card transaction. | Journal | 2020 |
A102 | A comparative study on classification algorithms for credit card fraud detection. | Journal | 2020 |
A103 | Credit card fraud detection using random forest algorithm. | Journal | 2019 |
A104 | Credit card fraud detection using supervised learning approach. | Journal | 2021 |
A105 | A SOMTE based oversampling data-point approach to solving the credit card data imbalance problem in financial fraud detection. | Journal | 2021 |
A106 | Using machine learning to detect credit card fraudulent transactions. | Journal | 2021 |
A107 | Credit card fraud detection using autoencoder neural network. | Journal | 2019 |
A108 | Credit card fraud detection using ANN. | Journal | 2019 |
A109 | An improved hybrid system for the prediction of debit and credit card fraud. | Journal | 2019 |
A110 | Deep learning methods for credit card fraud detection. | Journal | 2020 |
A111 | A comparison of data sampling techniques for credit card fraud detection. | Journal | 2020 |
A112 | Credit card fraud detection using machine learning algorithms. | Journal | 2020 |
A113 | A machine learning approach for detecting credit card fraudulent transaction. | Journal | 2021 |
A114 | Credit card fraud detection using AdaBoost. | Journal | 2020 |
A115 | A comparison study of credit card fraud detection: supervise unsupervised. | Journal | 2019 |
A116 | Credit card fraud detection using random forest algorithm. | Journal | 2019 |
A117 | A comparative study of machine learning classifiers for credit card fraud detection. | Journal | 2020 |
A118 | Spectral-cluster solution for credit-card fraud detection using a genetic algorithm trained modular deep learning neural network. | Journal | 2021 |
A119 | Comparative analysis of credit card fraud detection in simulated annealing trained artificial neural network and hierarchical temporal memory. | Journal | 2021 |
A120 | Credit card fraud detection using isolation forest. | Journal | 2021 |
A121 | Credit card fraud detection using machine learning algorithms. | Journal | 2020 |
A122 | Credit card fraud detection framework a machine learning perspective. | Journal | 2020 |
A123 | The improving prediction of credit card fraud detection on PSO optimized SVM. | Journal | 2019 |
A124 | Credit card fraud detection using boosted stacking. | Journal | 2019 |
A125 | Credit card fraud detection technique by applying graph database model. | Journal | 2021 |
A126 | Online fraud detection using deep learning techniques. | Journal | 2021 |
A127 | A hybrid method for credit card fraud detection using machine learning algorithm. | Journal | 2021 |
A128 | Anomaly detection using unsupervised methods: credit card fraud case study. | Journal | 2019 |
A129 | Discovering of credit card scheme with enhance and common by vote. | Journal | 2021 |
A130 | Enhanced credit card fraud detection based on SVM-recursive feature elimination and hyper-parameters optimization. | Journal | 2020 |
A131 | Bidirectional gated recurrent unit for improving classification on credit card fraud detection. | Journal | 2021 |
A132 | Credit card fraud detection system using smote technique and whale optimization algorithm. | Journal | 2019 |
A133 | Fraud detection in online transaction. | Journal | 2020 |
A134 | Credit card fraud detection using machine learning. | Journal | 2021 |
A135 | Machine learning approach on apache spark for credit card fraud detection. | Journal | 2020 |
A136 | Credit card fraud detection using weighted support vector machine. | Journal | 2020 |
A137 | Machine learning methods for analysis fraud credit card transaction. | Journal | 2019 |
A138 | A review on credit card fraud detection using machine learning. | Journal | 2019 |
A139 | Financial fraud detection using bio-inspired key optimization and machine learning technique. | Journal | 2019 |
A140 | Semisupervised algorithms based credit card fraud detection using majority voting. | Journal | 2021 |
A141 | Artificial intelligence framework for credit card fraud detection using supervised random forest. | Journal | 2021 |
A142 | An intelligent payment card fraud detection system. | Journal | 2021 |
A143 | HOBA: a novel feature engineering methodology for credit card fraud detection with a deep learning architecture. | Journal | 2021 |
A144 | Dual autoencoders generative adversarial network for imbalanced classification problem. | Journal | 2020 |
A145 | Performance analysis of isolation forest algorithm in fraud detection of credit card transactions. | Journal | 2020 |
A146 | Credit card fraud detection from imbalanced dataset using machine learning algorithm. | Journal | 2020 |
A147 | Credit card fraud forecasting model based on clustering analysis and integrated support vector machine. | Journal | 2019 |
A148 | Credit card anomaly detection using improved deep autoencoder algorithm. | Journal | 2020 |
A149 | Credit card fraud detection using deep learning techniques. | Journal | 2021 |
A150 | Detecting credit card frauds using different machine learning algorithms. | Journal | 2021 |
A151 | Isolation forest and local outlier factor for credit card fraud detection system. | Journal | 2020 |
A152 | Analysis of machine learning credit card fraud detection models. | Journal | 2021 |
A153 | Time varying inertia weight dragonfly algorithm with weighted feature-based support vector machine for credit card fraud detection. | Journal | 2021 |
A154 | Predicting credit card fraud on a imbalanced data. | Journal | 2019 |
A155 | Master card fraud detection using arbitrary forest. | Journal | 2019 |
A156 | Credit card fraud detection using data analytic techniques. | Journal | 2020 |
A157 | Optimized stacking ensemble (OSE) for credit card fraud detection using synthetic minority oversampling model. | Journal | 2021 |
A158 | Aggrandized random forest to detect the credit card frauds. | Journal | 2019 |
A159 | An efficient credit card fraud detection model based on machine learning methods. | Journal | 2020 |
A160 | Modified focal loss in imbalanced XGBoost for credit card fraud detection. | Journal | 2021 |
A161 | Credit card fraud detection using hidden Markov model. | Journal | 2019 |
A162 | Credit card fraud detection using isolation forest. | Journal | 2020 |
A163 | Comparing different models for credit card fraud detection. | Journal | 2020 |
A164 | Credit card fraud detection under extreme imbalanced data: a comparative study of data-level algorithms. | Journal | 2021 |
A165 | Credit card fraud detection: a comparison using random forest, SVM and ANN. | Journal | 2019 |
A166 | Credit card fraud detection using machine learning methodology. | Journal | 2019 |
A167 | Credit card fraud detection using machine learning. | Journal | 2021 |
A168 | An intelligent approach to credit card fraud detection using an optimized light gradient boosting machine. | Journal | 2020 |
A169 | Real time credit card fraud detection. | Journal | 2021 |
A170 | Credit card fraud detection using federated learning techniques. | Journal | 2020 |
A171 | A supervised learning algorithm for credit card fraud detection. | Journal | 2021 |
A172 | A comparative study of credit card fraud detection using machine learning for United Kingdom dataset. | Journal | 2019 |
A173 | Outlier detection credit card transactions using local outlier factor algorithm (LOF). | Journal | 2019 |
A174 | Credit card fraud detection using machine learning approach. | Journal | 2021 |
A175 | Real-time deep learning based credit card fraud detection. | Journal | 2020 |
A176 | A perceptron based neural network data analysis architecture for the detection of fraud in credit card transactions in financial legacy system. | Journal | 2021 |
A177 | Credit card fraud detection techniques. | Journal | 2020 |
A178 | Adaptive model for credit card fraud detection. | Journal | 2020 |
A179 | Credit card fraud detection by modelling behaviour pattern using hybrid ensemble model. | Journal | 2021 |
A180 | Credit card fraud detection using PSO optimized neural network. | Journal | 2020 |
A181 | Detection and prediction of credit card fraud transactions using machine learning. | Journal | 2019 |

Data synthesis results

This section examines the 181 articles ultimately selected. The data are synthesised in order to answer each of our four research questions, beginning with RQ1: What types of ML/DL algorithms and datasets are used in credit card cyber fraud detection?

Cyber fraud detection techniques

In this part we address RQ1, which seeks to specify the ML/DL techniques used in detecting credit card cyber fraud from 2019 to 2021.

Machine learning

ML is recognised as a technique relevant to a wide range of problems, especially in sectors requiring data analysis and processing. ML, which is classified into supervised, unsupervised, and reinforcement learning, plays a crucial role in handling unbalanced datasets. ML techniques are highly effective for detecting and preventing fraud because they enable the automated recognition of patterns across vast amounts of data. Adopting the proper ML models facilitates the differentiation between fraudulent and legitimate behaviour. These intelligent systems may adapt over time to new, unseen fraud schemes. For this to be possible, thousands of computations must be executed correctly in milliseconds. Both supervised and unsupervised techniques help detect cyber fraud and must be included in the next generation of fraud safeguards.

Supervised learning is the technique of training ML algorithms on labelled datasets in which the target variables are known. Classification, regression, and inference are all instances of supervised learning. Across all fields, supervised models trained on a large number of accurately labelled transactions are the most common ML technique. Each transaction is labelled as either fraudulent or legitimate. The models are trained on voluminous labelled transaction data so that they can discover the patterns that best characterise genuine behaviour.
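
As a minimal sketch of supervised learning on labelled transactions, the example below fits a one-feature logistic regression (one of the supervised techniques counted in this review) by batch gradient descent. The feature values, labels, learning rate, and epoch count are invented for illustration and do not come from any reviewed study.

```python
import math

# Invented toy data: one scaled feature per transaction, label 1 = fraud.
X = [0.1, 0.2, 0.3, 0.4, 0.8, 0.9, 1.0, 1.2]
y = [0, 0, 0, 0, 1, 1, 1, 1]


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train(X, y, lr=0.5, epochs=2000):
    """Fit weight w and bias b by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(X)
    for _ in range(epochs):
        # Prediction error for every labelled example.
        errors = [sigmoid(w * xi + b) - yi for xi, yi in zip(X, y)]
        w -= lr * sum(e * xi for e, xi in zip(errors, X)) / n
        b -= lr * sum(errors) / n
    return w, b


def predict(x, w, b):
    """Label a transaction as fraud (1) when the predicted probability is >= 0.5."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0


w, b = train(X, y)
print(predict(0.15, w, b), predict(1.1, w, b))
```

The labels drive the fit: without the `y` column, the model would have no target to learn from, which is the defining difference from the unsupervised case.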

Unsupervised learning trains an ML algorithm on a dataset whose target variables are unknown. The model attempts to discover the most significant patterns in the data. Unsupervised learning techniques include dimensionality reduction and clustering.

Semi-supervised learning combines supervised and unsupervised learning by training a model on both labelled and unlabelled data. In this method, the unsupervised component is used to determine the optimal data representation, while the supervised component is used to analyse the relationships within that representation and subsequently make predictions.

Multiple studies utilised supervised, unsupervised, and semi-supervised ML approaches. Table B1 displays how frequently each type of ML and DL technique is used in the reviewed literature. Note that several articles utilised more than one ML/DL technique.

Learning type | Technique | Usage frequency
Supervised | Logistic regression (LR) | 52
Supervised | Naive Bayes (NB) | 42
Supervised | Decision tree (DT) | 49
Supervised | Random forest (RF) | 74
Supervised | K-nearest neighbor (KNN) | 39
Supervised | Support vector machine (SVM) | 56
Supervised | Bayesian belief networks | 2
Supervised | Genetic algorithm (GA) | 5
Supervised | Artificial immune systems (AIS) | 1
Supervised | Fuzzy logic | 1
Supervised | Logistic model tree (LMT) | 1
Unsupervised | Hidden Markov model (HMM) | 7
Unsupervised | K-means | 7
Unsupervised | Isolation forest | 19
Unsupervised | Self-organizing map (SOM) | 2
Unsupervised | Principal component analysis (PCA) | 3
Unsupervised | Density based spatial clustering of applications with noise (DBSCAN) | 1
Unsupervised | Local outlier factor (LOF) | 13
Unsupervised | One-class SVM | 3
Semi-supervised | Semi-supervised learning | 3
Reinforcement | Reinforcement | 1
Ensemble learning | AdaBoost | 20
Ensemble learning | RUSBoost | 2
Ensemble learning | XGBoost (XG) | 18
Ensemble learning | CatBoost (CB) | 3
Ensemble learning | Gradient boosting | 12
Ensemble learning | Light gradient boosting (LightGBM) | 4
Ensemble learning | Bagging | 5
Ensemble learning | Voting | 10
Ensemble learning | Pipelining | 1
Ensemble learning | Stacking | 4
Deep learning | CNN | 7
Deep learning | DNN | 4
Deep learning | DCNN | 4
Deep learning | Long short-term memory (LSTM)/BiLSTM | 8
Deep learning | Auto-encoder (AE) | 18
Deep learning | Dual autoencoders (DAE) | 4
Deep learning | Deep reinforcement learning (DRL) | 1
Deep learning | Generative adversarial networks (GANs) | 7
Deep learning | Recurrent neural network (RNN) | 7
Deep learning | Gated recurrent units (GRU) | 3
Deep learning | Gradient descent algorithms | 1
Deep learning | Variational autoencoder (VAE) | 1
Deep learning | Artificial neural network (ANN) | 36
Deep learning | Multilayer perceptron (MLP) | 14
Deep learning | Restricted Boltzmann machine (RBM) | 3
Deep learning | Deep belief network (DBN) | 1
Sampling technique | Synthetic minority oversampling technique (SMOTE) | 17
Sampling technique | Adaptive synthetic sampling (ADASYN) | 3
Sampling technique | Random oversampling (ROS) | 1
Sampling technique | Tomek | 2

Supervised techniques

Classification techniques

Supervised algorithms are the most common method for detecting credit card cyber fraud, and various supervised models are utilised in this field. A support vector machine (SVM) classifies data samples into two groups using a maximum-margin hyperplane. It is trained on a labelled dataset covering each category and then classifies new data points. SVM was used in 56 reviewed articles. An SVM kernel is a mathematical function that implicitly maps the input data to a high-dimensional space. SVM can therefore classify both linearly separable data and, through a kernel function, nonlinearly separable data.

The four types of kernel function are linear, radial, polynomial, and sigmoid. These are utilised in Li et al. (2021), who use SVM to detect credit card fraud. The SVM parameters are optimised using the cuckoo search algorithm (CS), the genetic algorithm (GA), and particle swarm optimisation (PSO). The experiments are reported to show that the linear kernel function is the most effective function, and that the kernel function is optimised using the radial basis function. In terms of overall performance, PSO-SVM outperforms CS-SVM and GA-SVM.
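
As an illustration of the four kernel families named above, the following minimal Python sketch writes each one as a plain function of two feature vectors. The parameter names `gamma`, `degree`, and `coef0` and their default values are assumptions chosen for illustration; they are not the settings used in Li et al. (2021).

```python
import math

def linear_kernel(x, z):
    """Plain dot product: no implicit feature mapping."""
    return sum(a * b for a, b in zip(x, z))

def polynomial_kernel(x, z, degree=3, coef0=1.0):
    """(x.z + coef0) ** degree; the defaults are illustrative assumptions."""
    return (linear_kernel(x, z) + coef0) ** degree

def rbf_kernel(x, z, gamma=0.5):
    """Radial basis function: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def sigmoid_kernel(x, z, gamma=0.5, coef0=0.0):
    """tanh(gamma * x.z + coef0); output always lies in (-1, 1)."""
    return math.tanh(gamma * linear_kernel(x, z) + coef0)

# Two hypothetical, already-scaled transaction feature vectors.
x = [1.0, 2.0]
z = [3.0, 4.0]
similarities = {
    "linear": linear_kernel(x, z),
    "polynomial": polynomial_kernel(x, z),
    "rbf": rbf_kernel(x, z),
    "sigmoid": sigmoid_kernel(x, z),
}
```

Only the linear kernel corresponds to a linear decision boundary in the original feature space; the other three allow a nonlinear boundary.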

Pavithra & Thangadurai (2019) suggested a hybrid architecture based on particle swarm optimisation (PSO), in which an SVM-based feature selection algorithm was used to improve the prediction of cyber fraud. The results showed that the PSO-SVM method is an effective preparatory instrument for optimising feature selection. In Zhang, Bhandari & Black (2020), a weighted SVM algorithm is utilised, and experiments revealed that this model significantly enhances performance. A weighted feature based SVM (WFSVM) combined with a time varying inertia weight based dragonfly algorithm (TVIWDA) was proposed in Arun & Venkatachalapathy (2021). Features are first selected with TVIWDA to increase detection accuracy, and classification is then performed using the WFSVM classifier on the selected features. The results showed that the suggested model outperforms the existing random tree based technique and that WFSVM is more efficient with smaller datasets.

The decision tree (DT) approach has attracted considerable interest from researchers and appeared in 49 articles. In Bandyopadhyay et al. (2021), a DT classifier was applied to the detection of financial fraud and performed best among the compared classifiers, with an accuracy of 0.99. DT with a boosting technique was applied in Barahim et al. (2019). The results show that boosting with DT outperforms the other methods, with the model obtaining the highest accuracy of 98.3%. In Choubey & Gautam (2020), a combination of supervised algorithms, namely DT, RF, LR, naive Bayes (NB), and K-nearest neighbor (KNN), was utilised. The study observed that the hybrid classifier combining DT with KNN worked better than any single classifier. Hammed & Soyemi (2020) describe a DT algorithm enhanced with regression analysis. The results indicate enhanced performance, with a reported misclassification error rate of 18.4%, and the system successfully validated all of the inserted intrusions used for testing.
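
To make the DT idea concrete, the sketch below picks a single split threshold on one feature by minimising weighted Gini impurity, which is one common splitting criterion. The choice of Gini, the function names, and the toy transaction amounts are assumptions for illustration and are not drawn from any of the reviewed studies.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (1 = fraud)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best rule 'value <= threshold'."""
    best = (None, float("inf"))
    # Try every distinct value except the largest as a candidate threshold.
    for t in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical transaction amounts and fraud labels.
amounts = [12.0, 25.0, 40.0, 900.0, 1500.0]
is_fraud = [0, 0, 0, 1, 1]
threshold, impurity = best_threshold(amounts, is_fraud)
```

A full tree repeats this search recursively on each side of the chosen split and over every feature.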

Among ML approaches, the C4.5 algorithm acts as a DT classifier whose decisions are based on particular instances of the data. Four articles utilised the C4.5 tree (Askari & Hussain, 2020; Beigi & Amin Naseri, 2020; Husejinovic, 2020; Mijwil & Salem, 2020). Mijwil & Salem (2020) applied C4.5 in a new model and found it to be the best classifier compared with other ML techniques. Husejinovic (2020) applied a C4.5 DT classifier with a bagging ensemble to credit card fraud detection and found that bagging with C4.5 DT is the best algorithm. The logistic model tree (LMT) is another tree-based classifier. In Hussein, Abbas & Mahdi (2021), LMT was applied to fraud classification and detection. The results show that LMT classifies fraud better than the other techniques, with the LMT model obtaining 82.08% accuracy. An intuitionistic fuzzy logic based DT (IFDTC4.5) was applied in Askari & Hussain (2020) for transaction fraud detection. The results show that IFDTC4.5 outperforms other techniques and is able to detect fraud proficiently.

One of the most powerful techniques is random forest (RF), an ensemble extension of DT. According to the examined literature, RF is the most prevalent credit card fraud detection method (74 articles), although some reviewed articles used RF only for comparison with their proposed methods. In Amusan et al. (2021), RF was applied to fraud detection on skewed data. The results indicated that RF recorded the highest accuracy (95.19%) compared with KNN, LR, and DT. RF was also applied alongside other techniques such as SVM, NB, and KNN in Ata & Hazim (2020), where the RF algorithm performed better than the other techniques. A hybrid model combining supervised classifiers appeared in Choubey & Gautam (2020), in which several techniques such as RF, KNN, and LR were applied. The results show that RF combined with KNN worked better than either applied as a single classifier.

Meenakshi et al. (2019) applied RF in a new model. The study revealed that the RF algorithm performs better with more training data, but that testing and application speeds decrease. Jonnalagadda, Gupta & Sen (2019) also applied RF and reported a highest precision of 98.6%. The proposed module is suited to larger datasets and yields more precise results as the amount of training data grows. In Hema & Muttipati (2020), LR, RF, and CatBoost were applied to discover cyber fraud. The results show that RF with CatBoost gives high accuracy, with RF giving the best result at an accuracy of 99.95. RF with SMOTE was applied in Ahirwar, Sharma & Bano (2020). The results obtained by the RF algorithm suggested that this approach would be successful in real time. The model is intended to provide some insight into the identification of fraud.

Bayesian techniques form an additional family of classification methods. We found 42 articles that utilised NB and two articles that used Bayesian belief networks (BBN). Detection of credit card fraud via NB and robust scaling is described in Borse, Patil & Dhotre (2021). The results indicate that the NB classifier with the robust scaler is the most effective at predicting fraudulent activity in the dataset, achieving an accuracy of 97.78%. In Divakar & Chitharanjan (2019), the NB classifier was applied alongside other classifiers and did not obtain the best result. In Gupta, Lohani & Manchanda (2021), by contrast, the performance of NB was remarkable among ML algorithms such as LR, RF, and SVM. BBN was applied in Kumar, Mubarak & Dhanush (2020) to detect credit card fraud. The results showed that a BBN is more accurate than the NB classifier. This is attributed to the Bayesian network exploiting conditional dependence between the attributes, although it requires more computation and a longer training process. The model is trained on the transaction values in the dataset together with their fraud or genuine labels, and it then predicts the label of each individual transaction in the test data.

The K-nearest neighbors (KNN) algorithm was applied in 39 articles. KNN uses neighbouring samples to identify the class label of a new sample and is well suited to overlapping sample sets (Yao et al., 2019). In this review, several articles applied KNN as a classifier. Chowdari (2021) reported that KNN is a stronger classifier for detecting credit card fraud than other techniques such as DT, LR, and RF. In DeepaShree et al. (2019) and Kumar, Student & Budihul (2020), the KNN classifier was applied to the detection of fraudulent credit card transactions and showed higher accuracy than RF and NB. In Parmar, Patel & Savsani (2020) and Vengatesan et al. (2020), KNN was compared with many other techniques such as SVM, LR, DT, RF, and XGBoost. The KNN model was the most precise, with an accuracy score of 99.95%. A new ML approach to detecting anonymous fraud patterns appeared in Manlangit, Azam & Shanmugam (2019), in which the synthetic minority oversampling technique (SMOTE) was combined with KNN. The results reveal that the proposed model performed well, with the KNN model achieving precision values of 98.32% and 97.44%.
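
The neighbour-voting idea described above can be sketched in a few lines of standard-library Python. The feature layout (amount, hour of day), the labels, and the value `k=3` are hypothetical; in practice the features would normally be scaled first, because raw amounts dominate the Euclidean distance here.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    dists = sorted(
        (math.dist(p, query), label)
        for p, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical (amount, hour-of-day) features for six labelled transactions.
points = [(10.0, 1.0), (12.0, 2.0), (11.0, 1.0),
          (950.0, 23.0), (1000.0, 2.0), (980.0, 3.0)]
labels = ["legit", "legit", "legit", "fraud", "fraud", "fraud"]

prediction_high = knn_predict(points, labels, (990.0, 1.0))
prediction_low = knn_predict(points, labels, (13.0, 1.0))
```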

Regression techniques

Logistic regression (LR) was used frequently in the studies in this review: a total of 52 studies employed LR for cyber fraud detection. LR is a statistical technique that models a binary dependent variable using a logistic function, and LR models can be utilised for both binary and multiclass classification. In Adityasundar et al. (2020), LR was applied to highly imbalanced data, and the study developed a classification model that is reported to be extremely robust. A new system that uses LR to build the classifier was proposed in Alenzi & Aljehane (2020) and compared against KNN and voting classifiers. The results demonstrate that the LR-based classifier produces the most accurate findings, with a 97.2% success rate. Itoo & Singh (2021) compared LR, NB, and KNN for fraud detection. LR achieved the best performance, attaining an accuracy of 95%, while NB achieved 91% and KNN achieved 75%. In Karthik et al. (2019), a newly proposed approach showed that a stacking classifier that applies LR as a meta-classifier is the most promising method, followed by SVM, KNN, and LR. A study by Soh & Yusuf (2019) suggested four models to detect fraud on imbalanced data. The results show that RF and KNN overfit, so only DT and LR were compared. LR with stepwise splitting rules outperformed DT, with an error rate of only 0.6%. Sujatha (2019) used single and hybrid models of undersampling and oversampling. The study revealed that LR is the best among all the algorithms, and that the proposed LR and NN approaches outperform DT.
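
As a minimal sketch of how LR models a binary outcome with a logistic function, the code below fits one weight and one bias by batch gradient descent on the log-loss. The single scaled feature, the learning rate, and the number of epochs are assumptions for illustration only.

```python
import math

def sigmoid(z):
    """Logistic function mapping any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=1.0, epochs=2000):
    """Fit p(fraud | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every sample under the current parameters.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b

# Hypothetical amounts scaled to [0, 1], with fraud labels.
scaled_amounts = [0.1, 0.2, 0.3, 0.8, 0.9, 1.0]
is_fraud = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(scaled_amounts, is_fraud)

p_large = sigmoid(w * 1.0 + b)   # estimated fraud probability for a large amount
p_small = sigmoid(w * 0.1 + b)   # estimated fraud probability for a small amount
```

A transaction is flagged when the estimated probability exceeds a chosen cut-off, commonly 0.5.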

Ensemble techniques

The random forest (RF) model is an ensemble approach that appeared in the examined literature. RF often achieves superior performance to a single DT by building a collection of DTs during training. Research conducted in 2021 revealed that RF outperforms K-means and SVM (Al Rubaie, 2021).

Another ensemble method is bagging, which combines a collection of estimators, each built with the same learning process, to improve on a single estimator. The approach creates random subsets of the training sample and reduces the variance of a DT classifier. Five of the reviewed articles applied bagging methods (Alias, Ibrahim & Zin, 2019; Husejinovic, 2020; Lin & Jiang, 2021; Mijwil & Salem, 2020; Karthik, Mishra & Reddy, 2022). Husejinovic (2020) applied C4.5 DT, NB, and a bagging ensemble to predict fraud. The results show that the best algorithm is bagging with C4.5 DT.
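
The bagging procedure outlined above (random subsets, one estimator per subset, combined prediction) can be sketched as follows. The base learner is a deliberately simple threshold rule rather than a C4.5 tree, and the data, the seed, and the assumption that fraudulent amounts are larger are all hypothetical.

```python
import random
from collections import Counter

def fit_stump(sample):
    """Fit a one-feature threshold rule on (amount, label) pairs."""
    legit = [x for x, y in sample if y == 0]
    fraud = [x for x, y in sample if y == 1]
    if not legit or not fraud:
        # The bootstrap sample contained one class only: always predict it.
        only = 1 if fraud else 0
        return lambda x: only
    # Threshold halfway between the two class means (toy assumption).
    threshold = (sum(legit) / len(legit) + sum(fraud) / len(fraud)) / 2
    return lambda x: 1 if x > threshold else 0

def bagging_fit(data, n_estimators=25, seed=0):
    """Train one stump per bootstrap sample (sampling with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        sample = [rng.choice(data) for _ in data]
        models.append(fit_stump(sample))
    return models

def bagging_predict(models, x):
    """Combine the individual predictions by majority vote."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]

# Hypothetical (amount, label) pairs; label 1 marks fraud.
transactions = [(10.0, 0), (12.0, 0), (15.0, 0), (20.0, 0), (18.0, 0), (11.0, 0),
                (900.0, 1), (950.0, 1), (1000.0, 1), (1100.0, 1)]
models = bagging_fit(transactions)
```

Averaging over many resampled estimators is what reduces variance relative to a single estimator.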

Boosting includes the adaptive boosting algorithm (AdaBoost), RUSBoost, the gradient boosting algorithm (GBM), LightGBM, and the XGBoost algorithm. A total of 59 of the reviewed articles utilised boosting techniques. AdaBoost was employed by Barahim et al. (2019), where DT, NB, and SVM were each used with AdaBoost. The results show that AdaBoost with DT outperforms the other techniques. Faraj, Mahmud & Rashid (2021) compared different ensemble methods for predicting credit card fraud. Their experiments show that XGBoost performs better than the other ensemble methods and also better than neural networks.
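
To illustrate how boosting makes later models focus on earlier mistakes, the sketch below performs a single reweighting round of the classic binary AdaBoost update with labels in {-1, +1}. The labels and weak-learner predictions are made up, and the code assumes the weighted error lies strictly between 0 and 1.

```python
import math

def adaboost_round(weights, y_true, y_pred):
    """One AdaBoost reweighting step for labels in {-1, +1}."""
    # Weighted error of the current weak learner.
    error = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    # Learner weight; assumes 0 < error < 1.
    alpha = 0.5 * math.log((1.0 - error) / error)
    # Shrink weights of correct samples, grow weights of misclassified ones.
    updated = [w * math.exp(-alpha * t * p)
               for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(updated)
    return alpha, [w / total for w in updated]

# Hypothetical round: five samples, the second one is misclassified.
y_true = [+1, +1, -1, -1, -1]
y_pred = [+1, -1, -1, -1, -1]
weights = [0.2] * 5
alpha, new_weights = adaboost_round(weights, y_true, y_pred)
```

After the update, the misclassified sample carries half of the total weight, so the next weak learner is pushed to classify it correctly.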

Stacking is an ensemble learning method that combines multiple classification or regression models. In stacking, a single model learns how best to integrate the predictions of the contributing models, whereas in boosting a sequence of models is used to improve on the predictions of earlier models. In contrast to bagging, stacking uses the complete dataset rather than portions of the training dataset. Four articles used stacking to learn a classifier for detecting credit card fraud (Karthik et al., 2019; Muaz, Jayabalan & Thiruchelvam, 2020; Prabhakara et al., 2019; Veigas, Regulagadda & Kokatnoor, 2021). The stacked ensemble approach has demonstrated potential for detecting fraudulent transactions. The stacked ensemble achieved the best performance, at 0.78, after being trained on sampled datasets (Muaz, Jayabalan & Thiruchelvam, 2020).

Unsupervised techniques

Clustering is the process of grouping similar instances together. In the reviewed articles, clustering methods were utilised far less often than classification methods. The hidden Markov model (HMM) is used to model a probability distribution over sequences of observations and consists of hidden states and observable outputs. HMM was applied in seven articles. In Das et al. (2020), an HMM was applied to detect cyber fraud. The results show strong performance of the proposed system and demonstrate the advantage of learning the cardholder's spending behaviour. Singh et al. (2019) suggested a method that identifies the cardholder's spending profile and then derives the observation symbols, which provide an initial estimate of the model parameters. The HMM can then detect whether a transaction is genuine or fraudulent. SMOTE was utilised along with HMM and density based spatial clustering of applications with noise (DBSCAN). This new model (SMOTE+DBSCAN+HMM) performed relatively better across all the hidden states considered.
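
One way to see how an HMM scores a cardholder's spending sequence is the forward algorithm, sketched below. The two hidden states, the three observation symbols (0 = low, 1 = medium, 2 = high spend), and every probability are invented for illustration; they are not parameters from Das et al. (2020) or Singh et al. (2019).

```python
def sequence_likelihood(obs, start, trans, emit):
    """Forward algorithm: probability of an observation sequence under an HMM."""
    n_states = len(start)
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    for o in obs[1:]:
        alpha = [
            emit[s][o] * sum(alpha[r] * trans[r][s] for r in range(n_states))
            for s in range(n_states)
        ]
    return sum(alpha)

# Hypothetical cardholder profile with two hidden spending moods.
start = [0.8, 0.2]                      # P(first hidden state)
trans = [[0.9, 0.1], [0.3, 0.7]]        # P(next state | current state)
emit = [[0.7, 0.25, 0.05],              # P(symbol | state 0): mostly low spend
        [0.1, 0.3, 0.6]]                # P(symbol | state 1): mostly high spend

usual = sequence_likelihood([0, 0, 0], start, trans, emit)
unusual = sequence_likelihood([2, 2, 2], start, trans, emit)
```

A detector built on this idea could flag a new transaction when appending it to the recent history causes the sequence likelihood to drop sharply; the size of drop that counts as suspicious is a design choice.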

K-means was applied in seven articles. The K-means algorithm is a simple, non-hierarchical method for data clustering that partitions a given dataset into a specified number, K, of clusters. In Abdulsalami et al. (2019), K-means was applied alongside a back-propagation neural network (BPNN). The results show a significant difference between BPNN and K-means in detecting fraudulent credit card transactions: the BPNN model achieved higher accuracy with fewer false alarms than the K-means model, with an accuracy of 93.1% for BPNN against 79.9% for K-means.
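
The K-means procedure can be sketched on one-dimensional data as alternating assignment and update steps. The initialisation from the first K points and the toy amounts below are assumptions for illustration; practical implementations typically use randomised or smarter initialisation.

```python
def kmeans_1d(values, k=2, max_iter=100):
    """Cluster scalar values into k groups by alternating assign/update steps."""
    centroids = list(values[:k])          # naive init: first k points (assumption)
    for _ in range(max_iter):
        # Assignment step: attach every value to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        updated = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:          # converged: assignments are stable
            break
        centroids = updated
    return centroids, clusters

# Hypothetical transaction amounts with two obvious groups.
amounts = [10.0, 12.0, 11.0, 13.0, 900.0, 950.0, 1000.0]
centroids, clusters = kmeans_1d(amounts, k=2)
```

K-means itself does not label a cluster as fraudulent; that interpretation has to be added, for example by inspecting the smaller or more extreme cluster.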

Isolation forest is an unsupervised ensemble method. It performs no point-based distance calculation and no profiling of regular instances. Instead, it builds an ensemble of DTs whose purpose is to split the data so as to isolate anomalies. Once the ensemble of DTs has been generated for a particular dataset, the data points with the shortest average path length are considered anomalous. Isolation forest was applied in 19 articles. In Meenu et al. (2020), a new isolation forest model is utilised to detect fraud. The model's fraud detection efficiency was observed to be 98.72%, which is reported as significantly better than other fraud detection techniques. Isolation forest was applied together with local outlier factor in Vijayakumar et al. (2020), where isolation forest showed an accuracy of 99.72% and local outlier factor an accuracy of 99.62%. Isolation forest was observed to perform better on online transactions. A study by Palekar et al. (2020) found that K-means clustering, isolation forest, and local outlier factor can be developed on a very large scale to detect fraud in credit card transactions.
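
The link between short path length and anomaly can be illustrated with a simplified one-dimensional sketch. Unlike a full isolation forest, it does not subsample the data or store trees; it simply repeats random splits until the queried point is alone. The function names, the depth limit, the number of repetitions, and the toy data are all assumptions.

```python
import random

def path_length(point, data, rng, depth=0, max_depth=10):
    """Number of random splits needed to isolate `point` within `data`."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    lo, hi = min(data), max(data)
    if lo == hi:
        return depth
    split = rng.uniform(lo, hi)
    # Keep only the values that fall on the same side of the split as `point`.
    same_side = [v for v in data if (v < split) == (point < split)]
    return path_length(point, same_side, rng, depth + 1, max_depth)

def average_path_length(point, data, n_trees=200, seed=0):
    """Average isolation depth over many independent random partitionings."""
    rng = random.Random(seed)
    return sum(path_length(point, data, rng) for _ in range(n_trees)) / n_trees

# Hypothetical amounts: seven ordinary purchases and one extreme value.
amounts = [10.0, 12.0, 11.0, 13.0, 14.0, 9.0, 15.0, 5000.0]
outlier_depth = average_path_length(5000.0, amounts)
inlier_depth = average_path_length(12.0, amounts)
```

The extreme value is usually separated by the very first split, whereas an ordinary purchase needs several, which is why a shorter average path is read as more anomalous.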

The self-organising map (SOM) is an unsupervised neural network (NN) learning technique. SOM is appropriate for building and analysing customer profiles to detect fraud, and it was applied in two reviewed articles. A hybrid approach combining SOM and NN was applied in Harwani et al. (2020). Compared with using SOM or ANN alone, the suggested model achieved better accuracy and cost. In Deb, Ghosal & Bose (2021), several unsupervised algorithms are presented: K-means, K-means clustering with principal component analysis (PCA), T-distributed stochastic neighbor embedding (T-SNE), and SOM. The model achieved an accuracy of 90% for credit card fraud detection. The results also show that K-means clustering with PCA is much better than simple K-means, and that T-SNE is much better than PCA because PCA is strongly affected by outliers.

Semi-supervised techniques

Semi-supervised learning is a hybrid technique that combines supervised and unsupervised learning. The unsupervised component is utilised to determine the optimal representation of the data, whereas the supervised component is employed to investigate the relationships in that representation before making predictions. Semi-supervised learning is extremely useful when the dataset is unbalanced. Three studies employed semi-supervised techniques to detect credit card fraud (Dzakiyullah, Pramuntadi & Fauziyyah, 2021; Pratap & Vijayaraghavulu, 2021; Shekar & Ramakrisha, 2021). In Dzakiyullah, Pramuntadi & Fauziyyah (2021), a combination of semi-supervised learning and autoencoders is presented for detecting fraudulent transactions. The proposed model uses an autoencoder and then trains a basic linear classifier to assign each data point to its class. T-SNE is also applied to visualise the nature of fraudulent and non-fraudulent transactions. The results are considered helpful because credit card fraud can be classified easily, with a reported figure of 0.98%.

Semi-supervised algorithms using majority voting were applied in Pratap & Vijayaraghavulu (2021), a study in which 12 ML algorithms were applied. The standard models were used first, and AdaBoost and majority voting were then added. The results indicate that the majority voting technique achieves high accuracy.

Deep learning

Deep learning (DL) is a subfield of ML that uses data to teach computers how to perform tasks. The fundamental tenet of DL is that as NNs are enlarged and trained with more data, their performance continues to improve. The main advantage of DL over traditional ML is its higher performance on large datasets. The most frequently used DL algorithms in cybersecurity are feed-forward neural networks (FNNs), stacked autoencoders (SAE), and convolutional neural networks (CNNs). As shown in Fig. 3, DL techniques were used in 34 reviewed articles, and a total of 39 reviewed articles used a combination of DL and ML techniques to detect credit card fraud.

An artificial neural network (ANN) employs cognitive computing to aid the development of machines capable of self-learning, with applications including pattern recognition, natural language processing, and data mining. ANN is reported to give more accurate results because it learns from patterns of authorised behaviour and thus distinguishes between ‘fraud’ and ‘non-fraud’ credit card transactions. We found 36 articles that used ANN in our review. In Agarwal (2021), an ANN was implemented for identity theft detection. The proposed model uses the different layers of an NN to identify fraudulent transactions. The results show that applying an ANN gives an accuracy of nearly 100% and that ANN is well suited to determining whether a transaction is fraudulent. In another recent study, ANN was applied to detect fraud and compared with ML algorithms such as SVM and KNN. The results show that ANN gives higher accuracy than the other ML algorithms, and the authors conclude that the suggested model is optimal for detecting credit card fraud (Asha & Suresh Kumar, 2021).

In Abdulsalami et al. (2019), a back-propagation neural network (BPNN) and K-means are applied. The results indicate that BPNN is more accurate than the K-means algorithm, with BPNN obtaining an accuracy of 93.1% against 79.9% for K-means. The results also indicate that the shorter prediction time of K-means gave it an advantage over BPNN. In Daliri (2020), the harmony search algorithm is combined with ANN (NNHS) to improve fraud detection in a banking system. The results show acceptable capability in detecting fraud based on customer information. In Oumar & Augustin (2019), ANN with LR was applied to fraud detection. Back-propagation decreased the error function and enabled the model to discriminate between fraudulent and legitimate transactions. The suggested model is reported to be 99.48% accurate in its predictions and highly reliable.

Multilayer perceptron (MLP) is one of the most widely used approaches in ML because of its excellent accuracy in approximating nonlinear functions. An MLP comprises three distinct types of layer: input, hidden, and output. We found 14 articles that used MLP in our review. In Alias, Ibrahim & Zin (2019), MLP and fifteen other types of supervised ML techniques are examined to determine the one with the highest accuracy for detecting fraudulent transactions. The results show that MLP achieved the greatest detection accuracy among the algorithms compared, at 98%. Can et al. (2020) applied MLP and other ML techniques such as DT, RF, and NB. With amount-based profiling, both MLP and the other classifiers demonstrated substantial improvements. In Faridpour & Moradi (2020), a novel ML-based model is presented for detecting fraud in banking transactions using customer profile data. In the proposed model, bank transaction data are utilised and an MLP with an adjustable learning rate is trained to establish transaction authenticity, thus improving the detection process. The suggested model surpasses SVM and LR, with an accuracy of 0.9990.

A convolutional neural network (CNN, or ConvNet) is composed of multiple layers, the outputs of which are used as inputs to the layers that follow. Its purpose is to reduce the input to a form that is easier to process without sacrificing the information that is crucial for making accurate predictions. CNN was used in seven articles in the review. In Agarwal et al. (2021), DL techniques such as CNN and BiLSTM with an attention layer were used to detect and classify illegitimate transactions. The CNN-BiLSTM-attention model detects the fraudulent class with high accuracy, and the analysis shows that the model is adequate and yields an accuracy of 95%. The results demonstrate that adding an attention layer increases the performance of the model, allowing it to discriminate accurately between fraudulent and legitimate transactions. CNN, NB, DT, and RF are deployed in Aswathy & Samuel (2019), first as single models and then as hybrid models combined through majority voting. An adaptive boosting algorithm was used to boost the performance of the classifiers.

Deep neural networks (DNNs), which provide potent tools for automatically producing high-level abstractions of complicated multimodal data, have recently attracted a great deal of interest from industry and academia. DNNs learn features on their own, which can make the learning process increasingly accurate, and they have been reported to be more efficient and accurate. Four studies employed DNN. In Arya & Sastry (2020), the proposed model is flexible with respect to data disparity and resistant to hidden transaction patterns, and adaptive optimisation is recommended to improve fraud prediction. The results demonstrate its superiority over other current methods.

Credit card fraud detection using uncertainty-aware DL was implemented in Habibpour et al. (2021), who argue that it is vital to evaluate the uncertainty of DNN predictions. The study considers three uncertainty quantification (UQ) techniques, namely ensemble, Monte Carlo dropout, and ensemble Monte Carlo dropout, which can be used to quantify the level of uncertainty associated with predictions and to produce a reliable classification. According to the findings, the ensemble method is superior at capturing the uncertainty related to predictions.
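
A rough sketch of the ensemble idea is to treat the spread of the members' predicted fraud probabilities as an uncertainty estimate. The summary statistics used here (mean and population standard deviation) and the probabilities are illustrative assumptions; they are not the specific measures reported by Habibpour et al. (2021). The same summary could be applied to repeated stochastic forward passes under Monte Carlo dropout.

```python
import statistics

def ensemble_uncertainty(member_probs):
    """Summarise one transaction's fraud probabilities across ensemble members.

    Returns (mean probability, spread); a larger spread means the members
    disagree more, which is read here as higher predictive uncertainty.
    """
    mean = statistics.fmean(member_probs)
    spread = statistics.pstdev(member_probs)
    return mean, spread

# Hypothetical outputs of a five-member ensemble for two transactions.
confident = [0.95, 0.97, 0.96, 0.94, 0.98]
disputed = [0.10, 0.90, 0.50, 0.80, 0.20]

mean_a, spread_a = ensemble_uncertainty(confident)
mean_b, spread_b = ensemble_uncertainty(disputed)
```

A deployment could, for example, route high-spread transactions to manual review instead of acting on the mean probability alone.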

The deep convolutional neural network (DCNN) was applied in four articles. The DCNN technique can improve detection accuracy when a huge volume of data is involved. Chen & Lai (2021) consider existing ML models, including LR, SVM, and RF, as well as auto-encoder and other DL models. The results show that a detection accuracy of 99% was attained over a 45-s duration, and the model provides enhanced detection despite the vast quantity of data. The DL technique provides high accuracy and speed in detecting complex and unknown patterns. 1DCNN, 2DCNN, and DCNN have also been utilised to detect credit card cyber fraud in Cheng et al. (2020), Deepika & Senthil (2019), and Nguyen et al. (2020).

A recurrent neural network (RNN) is a structure used to remember previous input sequences. It consists of connections between the internal nodes of a directed graph, and its behaviour depends on the amount of internal memory. RNN was applied in seven articles in this review (Bandyopadhyay & Dutta, 2020; Chen & Lai, 2021; Forough & Momtazi, 2021; Hussein et al., 2021; Osegi & Jumbo, 2021; Sadgali, Sael & Benabbou, 2021; Zhang et al., 2021). Bandyopadhyay & Dutta (2020) implemented and applied an RNN on a synthetic dataset. The suggested model can detect fraudulent transactions with 99.87% accuracy, and the outcomes demonstrate that the approach is relevant and appropriate for detecting fraud. In Forough & Momtazi (2021), a deep RNN-based ensemble model with an ANN-based voting approach is proposed. The ensemble model uses several RNNs, each a GRU or LSTM network, as base classifiers and combines their outputs using an FFNN as the voting method. The outcomes indicate that the suggested model outperforms competing models in this field. A bidirectional gated recurrent unit (BGRU) is applied in Sadgali, Sael & Benabbou (2021), in a model that utilises GRU, LSTM, BRU, and SMOTE. BGRU obtained a high accuracy of 97.16%.

Long short-term memory (LSTM) is a helpful technique for predicting fraud because it retains knowledge of the transaction history and captures the link between prediction outputs and historical inputs. The LSTM architecture enables sequence prediction problems with long-term dependencies to be learned. LSTM and BiLSTM were applied in eight articles (Agarwal et al., 2021; Alghofaili, Albattah & Rassam, 2020; Benchaji, Douzi & El Ouahidi, 2021; Cheon et al., 2021; Forough & Momtazi, 2021; Nguyen et al., 2020; Osegi & Jumbo, 2021; Sadgali, Sael & Benabbou, 2021). In Alghofaili, Albattah & Rassam (2020), a new model was developed to improve on present detection techniques and detection accuracy for big data. The findings demonstrated that LSTM performed very well, achieving 99.95% accuracy. Benchaji, Douzi & El Ouahidi (2021) proposed a model intended to capture the previous purchasing behaviour of cardholders. The results show that the LSTM model obtained a high level of performance and accuracy.

A DL-based hybrid approach to detecting fraudulent transactions was applied in Cheon et al. (2021). The new model combines a Bi-LSTM autoencoder with isolation forest and achieved a detection rate of 87% for fraudulent transactions. The suggested model obtained the highest score and has the potential to be employed as an effective method for detecting fraud.

A deep belief network (DBN) was applied in one article (Zhang et al., 2021). The new model utilised DBN together with advanced feature engineering based on homogeneity-oriented behaviour analysis (HOBA). The results indicate that the suggested model is effective and capable of identifying fraud, and that the DBN classifier with HOBA achieves performance superior to that of the standard models.

A restricted Boltzmann machine (RBM) comprises visible and hidden layers linked by symmetric weights. The neurones in the visible layer correspond to the inputs X, whilst the responses of the neurones H in the hidden layer reflect the probability distribution of the inputs. RBM appeared in three articles in the review (Niu, Wang & Yang, 2019; Suthan, 2021; Suvarna & Kowshalya, 2020). In Niu, Wang & Yang (2019), supervised and unsupervised techniques were applied. The supervised techniques XGB and RF obtain the best performance, with an AUROC of 0.961, while RBM provides the best performance among the unsupervised techniques. The results indicate that supervised models outperform unsupervised models. Nevertheless, because of the problems of inadequate annotation and data imbalance, unsupervised techniques remain promising for credit card fraud detection.

A generative adversarial network (GAN) comprises two feed-forward neural networks, a generator (G) and a discriminator (D), competing against each other. G produces new candidates while D evaluates their quality. Each of the two networks is typically a DNN with multiple interconnected layers. GAN appeared in seven articles (Ba, 2019; Fiore et al., 2019; Tingfei, Guangquan & Kuihua, 2020; Hwang & Kim, 2020; Niu, Wang & Yang, 2019; Wu, Cui & Welsch, 2020; Veigas, Regulagadda & Kokatnoor, 2021). In Ba (2019), GANs were employed as an oversampling technique. The findings indicate that Wasserstein-GAN is reliable during training and creates more accurate fraudulent transactions than other GANs. In Fiore et al. (2019), a GAN was employed to enhance the effectiveness of classification, and a model for addressing the problem of class imbalance is described. The GAN is trained to generate minority class instances, which are then combined with the training data to create an augmented training set. The results indicate that a classifier trained on the augmented data outperforms the same classifier trained on the original data.

An autoencoder (AE) learns an input-output mapping through an encoding phase and a decoding phase. The encoder maps the input to the hidden layer, and the decoder reconstructs the input from the hidden layer. AE appeared in 18 articles in this review. In Misra et al. (2020) , an autoencoder model is applied to cyber fraud detection. The model has two stages: in the first, an autoencoder converts the transaction characteristics to a lower-dimensional feature vector; in the second, these feature vectors are fed to a classifier. The results show that the suggested model outperforms other models.
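A two-stage pipeline of this kind can be sketched as follows. This is only an illustration under strong assumptions: the encoder and decoder are linear with hand-picked (untrained) weights, and the second-stage classifier is a hypothetical threshold rule standing in for whatever trained classifier a real system would use.

```python
def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def encode(x, W_enc):
    return matvec(W_enc, x)      # stage 1: map 4 features to a 2-dimensional code

def decode(z, W_dec):
    return matvec(W_dec, z)      # rebuild the 4 features from the code

def reconstruction_error(x, x_hat):
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

# Hand-picked weights (assumption): the encoder averages feature pairs,
# the decoder copies each code value back to both positions of its pair.
W_enc = [[0.5, 0.5, 0.0, 0.0],
         [0.0, 0.0, 0.5, 0.5]]
W_dec = [[1.0, 0.0],
         [1.0, 0.0],
         [0.0, 1.0],
         [0.0, 1.0]]

def stage_two_classifier(z, threshold=1.5):
    """Placeholder for the downstream classifier; a real one would be trained."""
    return 1 if sum(z) > threshold else 0

x = [1.0, 1.0, 0.2, 0.2]
z = encode(x, W_enc)
x_hat = decode(z, W_dec)
err = reconstruction_error(x, x_hat)
label = stage_two_classifier(z)
```

In practice the encoder and decoder weights are learned by minimising the reconstruction error over many transactions, and only the encoder output is passed on to the classifier.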

In Wu, Cui & Welsch (2020) , dual autoencoders generative adversarial networks (DAEGAN) are employed for the imbalanced classification problem. The model trains a GAN to replicate fraudulent transactions for autoencoder training, and two autoencoders then encode the samples to create two sets of features. The model outperforms several classification algorithms. Because their class distributions are extremely skewed, credit card datasets pose imbalanced classification problems. To address this difficulty, Tingfei, Guangquan & Kuihua (2020) propose a model that employs an oversampling technique based on a variational autoencoder (VAE) in combination with DL techniques. The results demonstrate that the VAE model outperforms synthetic minority oversampling strategies and conventional DNN methods. In addition, it performs better than previous oversampling techniques based on GAN models.

Metaheuristic techniques

In Makolo & Adeboye (2021) , a new hybrid model is created by applying a genetic algorithm and a multivariate normal distribution to an unbalanced dataset. After training on the same dataset, its prediction accuracy was compared with that of DT, ANN, and SVM. The model yielded an F-score of 93.5%, compared with 68.5% for ANN, 80.0% for DT, and 84.2% for SVM. An enhanced hybrid system for credit card fraud prediction is presented in Nwogu & Nwachukwu (2019) , which employs a genetic algorithm with RF model optimisation (GAORF), utilising real and genetic algorithms. The model’s classification accuracy is enhanced through the optimisation of the RF models. This can help to resolve the problem of a shortage of transaction data, as well as the problem of inadequate optimisation and convergence of RF algorithms. The model significantly reduced the overall number of misclassifications.
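The general idea of using a genetic algorithm to tune a model can be sketched as below. This is a generic illustration, not the method of either cited article: the two hyperparameters (`n_trees`, `max_depth`), their ranges, and the `fitness` function (a made-up stand-in for a validation score) are all assumptions.

```python
import random

def fitness(candidate):
    """Stand-in for a validation score; peaks at n_trees=120, max_depth=8 (made up)."""
    n_trees, max_depth = candidate
    return -((n_trees - 120) ** 2) / 1000.0 - (max_depth - 8) ** 2

def random_candidate(rng):
    return (rng.randint(10, 300), rng.randint(1, 20))

def crossover(a, b, rng):
    # Each child takes one hyperparameter from each parent.
    return (a[0], b[1]) if rng.random() < 0.5 else (b[0], a[1])

def mutate(c, rng, rate=0.3):
    n_trees, max_depth = c
    if rng.random() < rate:
        n_trees = min(300, max(10, n_trees + rng.randint(-20, 20)))
    if rng.random() < rate:
        max_depth = min(20, max(1, max_depth + rng.randint(-2, 2)))
    return (n_trees, max_depth)

def genetic_search(generations=30, pop_size=20, seed=0):
    rng = random.Random(seed)
    population = [random_candidate(rng) for _ in range(pop_size)]
    initial_best = max(fitness(c) for c in population)
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]   # selection: keep the fitter half
        children = [parents[0]]                  # elitism: the best survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = children
    best = max(population, key=fitness)
    return best, fitness(best), initial_best

best, best_fitness, initial_best = genetic_search()
```

In a real setting, `fitness` would train the classifier with the candidate settings and return a score such as the F-score on held-out data, which is far more expensive than this toy function.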

The use of the harmony search algorithm (HSA) with a NN to improve fraud detection is described in Daliri (2020) . The model uses HSA to optimise the parameters of the ANN. The proposed NNHS model provides an HSA-based method that successfully predicts the optimal structure for the ANN and identifies the algorithm hidden inside the data. The comparisons revealed that the highest accuracy achieved is 86%.
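A basic harmony search loop can be sketched as follows. This is a generic textbook-style version and not the NNHS model itself: the `objective` function is a made-up stand-in for the validation error of a network, and the parameter names (`hmcr`, `par`, `bandwidth`) and their values are assumptions chosen for illustration.

```python
import random

def objective(x):
    """Stand-in for the validation error of an ANN with settings x (made up)."""
    return (x[0] - 0.3) ** 2 + (x[1] + 0.5) ** 2

def harmony_search(bounds, memory_size=10, hmcr=0.9, par=0.3, bandwidth=0.05,
                   iterations=500, seed=0):
    rng = random.Random(seed)
    # Harmony memory: a pool of candidate solutions drawn at random.
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(memory_size)]
    initial_best = min(objective(h) for h in memory)
    for _ in range(iterations):
        new = []
        for d, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                value = rng.choice(memory)[d]                    # memory consideration
                if rng.random() < par:
                    value += rng.uniform(-bandwidth, bandwidth)  # pitch adjustment
            else:
                value = rng.uniform(lo, hi)                      # random selection
            new.append(min(hi, max(lo, value)))
        worst = max(memory, key=objective)
        if objective(new) < objective(worst):
            memory[memory.index(worst)] = new                    # replace the worst harmony
    best = min(memory, key=objective)
    return best, objective(best), initial_best

bounds = [(-1.0, 1.0), (-1.0, 1.0)]
best, best_value, initial_best = harmony_search(bounds)
```

Because only the worst harmony is ever replaced, the best value in memory can never get worse from one iteration to the next.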

Instance-based learning

In Hussein, Abbas & Mahdi (2021) , a fraud detection model is implemented using various ML algorithms, including NB, DT, a rules classifier, lazy classifiers (IBK, LWL, and KStar), a meta classifier, and a function classifier. The results indicate that the LMT technique is the most accurate, with an accuracy of 82.086%.

What percentage of articles address supervised, unsupervised, or semi-supervised techniques in credit card fraud detection?

This section answers RQ2, which concerns the proportion of the gathered research articles that employ supervised, unsupervised, or semi-supervised techniques. We examined the credit card fraud detection techniques described in each article. According to Fig. 4 , 74% of the chosen articles utilised supervised techniques, making this the most commonly employed approach in the reviewed articles. In contrast, 12% utilised unsupervised techniques, and 12% utilised both supervised and unsupervised techniques. A total of 2% of the reviewed articles utilised semi-supervised learning, and 1% utilised reinforcement learning. Supervised and unsupervised learning were implemented in 2019, 2020, and 2021, whereas semi-supervised learning was implemented only three times, in 2021. Likewise, reinforcement learning was utilised only in 2021. Compared with supervised and unsupervised learning, semi-supervised learning and reinforcement learning were not embraced by a large number of researchers. The type of ML/DL technique used in each study is listed in Table C1. The proportions of supervised, unsupervised, and semi-supervised techniques are shown in Fig. 4 .


Article ID | ML/DL technique | Performance metrics | Results and value | Dataset | Future work
A1 | Back propagation neural network (BPNN), K-means | Precision, recall, error rate, FPR, accuracy, hit and miss rate | There is a significant difference between K-means and BPNN. BPNN model has higher accuracy compared with K-means. BPNN accuracy = 93.1%. K-means accuracy = 79.9%. | Real credit card data/European cardholders | Comparing the effect of combining these two models together so as to optimise the accuracy.
A2 | LR | Accuracy, recall, precision | The model reached high performance using imbalanced dataset. L-BFGS is 0.980. Lib-linear is 0.9816. Newton-CG is 0.9812. Sag is 0.997. Saga is 0.996. | Real data/European cardholders | NA
A3 | CNN, BILSTM | Confusion matrix, accuracy, precision, recall | The proposed model (CNN-BI-LSTM-ATTENTION) achieved high accuracy in fraud detecting. Adding attention layer enhances performance. Accuracy is 95%. | IEEE-CIS fraud detection from | NA
A4 | ANN | Accuracy, precision, recall | The ANN proposed model is best suited for detecting fraud. The accuracy is around 100%. | NA | Combining this algorithm with other algorithms.
A5 | HMM | NA | Applying HMM model to detect credit card fraud would be successful. | NA | NA
A6 | RF, SMOTE | Sensitivity, specificity, precision, F-measure, accuracy, misclassification rate, ROC | The model showed high performance. When using RF the large number of datasets can be processed automatically. Quick RF classifier accuracy with imbalanced dataset is 98%. Quick RF Classifier accuracy with balanced dataset is 99%. | Real-world data/UCSD FICO/2009 | NA
A7 | ADA boost majority balloting; NB, QDA, LR, DT, RF, NN, KNN, and SVM | Accuracy, Matthews correlation coefficient (MCC) | Results showed that using bulk balloting technique achieves high accuracy in detecting fraud. NB: 0.9458. QDA: 0.9544. LR: 0.9913. DT: 0.9837. RF: 0.9869. NN: 0.971. KNN: 0.9718. SVM: 0.8526. | Genuine world MasterCard data set. | Procedures be stretched out to the internet becoming acquainted with designs.
A8 | K-means, RF, J48, SVM | Accuracy | Results showed that RF is better on global dataset with 92.1% accuracy. K-means: 85.6%. RF: 92.1%. J48 DT: 89.3%. SVM: 89.9%. | Two types of data: Global/Bank. User dataset | For this model, the transaction time is required.
A9 | FraudMiner, RUSBoost, Bagged, KNN, SVM | Sensitivity, false alarm rate, balanced classification rate, MCC | This model showed great performance with catch rate 85.3% and MCC of 0.83. | Public dataset/Provided by FISCO/UCSD | NA
A10 | Autoencoder, LR | Confusion matrix, accuracy, recall, F1-score, precision | Results showed that proposed model can detect fraud transaction between 64%, 79%, and 91%. This model is better than LR (57%) with unbalanced dataset. The model solved data balancing problem. : accuracy is 97.23. Recall is 0.90. Precision is 0.06. The F1-score is 0.12. While results on accuracy is 99.91. Recall is 0.57. Precision is 0.93 and F1-score is 0.71. | Real dataset from ULB | Compare the performance of this model with other classification algorithms.
A11 | LR, KNN | Confusion matrix, accuracy, sensitivity, error rate | The LR-based model is the best compared with KNN and voting classifier. Accuracy is 97.2%. Sensitivity is 97%. Error rate is 2.8%. | Real dataset/European cardholders | Proposed model suffers in the response time.
A12 | LSTM | Accuracy, loss rate, execution time | Results showed great performance of LSTM compared with Autoencoder. Model accuracy is 99.95%. | Real dataset/European cardholders | Calculate timing and location of fraud
A13 | LR, MLP, XGBoost, K-fold cross, RF, Bagging Gradient Boosting, Voting, KNN SVM, GNB. | Accuracy, confusion matrix | MLP achieved highest accuracy compared with 15 algorithms. The accuracy is 98% | Real dataset/European cardholders | Further research of MLP to increase the detection performance.
A14 | LR, RF, XG, CatBoost (CB) | F1-score, AUC, savings | Results showed that the CatBoost obtained the best savings with 0.7158 alone. When applying SMOTE the savings is 0.971. When applying SMOTE and BMR, the saving is 0.9762. XGBoost achieved the best saving 0.757 when applying BMR without the SMOTE. XG + BMR: F1-score is (0.2890). AUC is (0.9699). Savings is (0.7570). CB + SMOTE + BMR: F1-score is (0.8250). AUC is (0.9999). Savings is (0.9762). | Real dataset/European cardholders | Using another dataset. Also testing XG and CB
A15 | LR, RF, KNN, DT | Accuracy, precision, recall | The results show that RF achieved highest performance. RF: accuracy (95.19%), precision (0.9794), recall (0.9226). | Real dataset/European cardholders | Other data balancing techniques be explored.
A16 | SVM, RUSBoost, LR, MLP, DT, KNN, AdaBoost, RF | Accuracy, precision, specificity, F1-score, AUPR, ROC | The results showed that CtRUSBoost outperformed other algorithms. Results scores on three dataset: A, B, and C. sensitivity (96.30), specificity (85.60), precision (94.20), F1-score (88.60). Sensitivity (99.60), specificity (98.70), precision (95.70), F1-score (97.60). Dataset C: sensitivity (100), specificity (99.80), precision (99.30), F1-score (99.60). | Three datasets from (A, B, C) | Customized the model and adding new algorithms.
A17 | Social spider optimisation (SSO), ant colony optimisation (ACO), ANN | Sensitivity, specificity, accuracy, F-score, kappa | The model SSO-ANN achieved high performance with 93.20% accuracy on German dataset, and 92.82% on Kaggle dataset. | Benchmark dataset. Kaggle dataset | Improving the model by using clustering techniques.
A18 | Deep ensemble algorithm (DEAL). CNN. DNN. MLP, Auto encoder. SVM, LR | Mean absolute error (MAE), fraud catching rate (FCR), accuracy | DEAL model obtained high performance in detecting fraud. Model accuracy is 99.81% | Real dataset/European cardholders | Using AI and IoT in cloud computing
A19 | SVM, KNN, ANN | Confusion matrix, accuracy, precision, recall | ANN provides high accuracy in detecting fraud compared with the unsupervised algorithms. | Real dataset/European cardholders | NA
A20 | DT, IFDTC4.5, intuitionistic fuzzy logic | Accuracy, sensitivity, false positive rate, specificity | IFDTC4.5 outperforms existing techniques. The model able to detect fraud efficiently. However, still the frauds cannot be eliminated by 100%. | Singaporean bank and one similar synthetic data set. | Add multi factor authentication using the biometrics like iris, voice.
A21 | NB, DT, RF, CNN | Precision, recall, accuracy | Algorithms like NB, DT, RF and CNN are used. These algorithms are used as single models. Then these are used as hybrid models using majority voting technique. Adaptive boost also used in the model. | Publicly available credit card data set. | This model will extend to online model.
A22 | SVM, NB, KNN, RF | Accuracy, sensitivity, specificity, precision | Results showed that RF performs better than other algorithms. Applying sampling approach will improve the performance. NB: 97.80%. SVM: 97%. KNN: 46.98%. RF: 98.23%. | Real dataset/European cardholders/ULB | Using huge dataset instead of sampling techniques
A23 | Generative adversarial networks (GANs) | AUC, AUPRC, recall, F1-score, precision | The results show that applying Wasserstein-GAN will improve detecting fraudulent transactions compared with traditional GAN. WCGAN model achieves: AUC is 0.948. AUPRC is 0.717. Recall is 0.6420. Precision is 0.852. F1-score is 0.710. | NA | NA
A24 | LR, KNN, RF, NB, MLP, AdaBoost, pipeling | Accuracy, precision, recall, F1-score | The results showed that applying pipeling can improve the model’s performance. Accuracy: 00.99%. Precision: 0.84. Recall: 0.86. F1-score: 0.85. | Real dataset/European cardholders | NA
A25 | GNB, LR, DT, RF | Accuracy, recall, precision, F1-score, MSE | The result showed that DT algorithm is the best with an accuracy: 0.999. Recall: 0.782. Precision: 0.766. F1-score: 0.774. MSE: 0.0008 | Real dataset/European cardholders/ULB | NA
A26 | RNN | Accuracy, recall, precision, F1-score, MSE | The result showed that RNN model is capable in detecting fraud. The accuracy is 99.87%. MSE is 0.01. F1-score is 0.99. | Synthetic dataset and real dataset | NA
A27 | DT, NB, SVM, AdaBoost | Accuracy, sensitivity, specificity, precision, ROC, F1-measure | The results showed that applying Boosting with DT outperforms other methods. The model obtained highest accuracy of 98.3%. F measure is 93.98%. Using boosting techniques improve the performance. | Real dataset/European cardholders | NA
A28 | DT, SVM, k-means, optimal resampling strategy, C4.5 DT, AdaBoost | Accuracy, sensitivity, cost sensitive | The suggested model obtained high performance with 96.59% accuracy and 67.52% sensitivity. | Real dataset/CB bank/Brazilian bank | Compare this model with other models
A29 | LSTM | MSE, MAE, RMSE | Results showed that the LSTM model achieves perfect performance. AUC: 0.995. MSE: 0.0035. MAE: 0.0065 | From the Kaggle website. | Further study of other types of RNN technique.
A30 | SMOTE, LOF, isolation forest, SVM, LR, DT, RF | Accuracy, precision, MCC | LR, DT and RF are the best algorithms. The better parameter to deal with unbalanced data is MCC. Classifiers performing better when using SMOTE. RF: accuracy (0.9998), precision (0.9996), MCC (0.9996). DT: accuracy (0.9708), precision (0.9814), MCC (0.9420). LR: accuracy (0.9718), precision (0.9831), MCC (0.9438). | Real dataset/European cardholders/Kaggle | NA
A31 | Autoencoders | NA | The results showed that Autoencoders model most promising for detecting fraud in credit card. | Real data/European cardholders | Using balanced dataset and unhidden features.
A32 | NB using robust scaling | Accuracy, precision, recall, sensitivity, AUC score, F1-score | The result shows NB which used Robust Scaler showed improvements in predicting and detecting fraud in credit card. Accuracy: 97.78%. Precision: 99.79%. Recall: 97.78. F1-score 98.71. AUC: 95.73. | Real dataset/European cardholders/Kaggle | NA
A33 | NB, RF, DT, MLP | Precision, recall, F-measure, specificity | The result showed that the amount-based profiling both MLP and RF obtained high improvement. This model boost fraud detection. | Dataset from 35 banks in Turkey | The high number of false positive needs further study.
A34 | CNN, DAE, MLP | Precision, recall, AUC, confusion matrix, ROC curves | Results showed that DNN is capable in fraud detection. MLP2OH128H918 obtained an alert reduction rate. Threshold/D (0:1) of 35.16% when capturing 91.79% fraud cases. The rate of misclassification is 8.21%. Threshold/D (0:2) of 41.47% when capturing 87.75% fraud cases. Misclassification rate is 12.25%. | Dataset from a Spanish organisation. | NA
A35 | DCNN, RNN, SVM, LR, RF. | Accuracy | Proposed model obtained accuracy of 99% in detecting fraud in credit card in time duration of 45 seconds. | Real dataset/European cardholders | Applying the fraud location and timing calculation.
A36 | 3DCNN, Spatial-temporal attention-based graph network (STAGN) | AUC, precision, recall | The suggested model showed a high performance in detecting fraud in credit card. The model is effective and accurate. | Real-world data (Commercial bank) | Builds a real-time detection system.
A37 | Bi-LSTM-autoencoder and isolation forest | Accuracy, confusion matrix | The suggested hybrid model contains Bi-LSTM Autoencoder and the isolation forest with unbalanced data. This model obtained the highest detection rate with 87% | Real dataset/European cardholders | NA
A38 | KNN, DT, RF, LR, NB | Confusion matrix, recall/sensitivity, precision, time | Hybrid classifier/combination of supervised classifiers which worked better than any other single classifier. KNN + DT: Sensitivity: 85.63%. Precision: 86.90%. KNN + LR: Sensitivity: 57%. Precision: 85.55%. KNN + RF: Sensitivity: 82%. Precision: 95.89%. KNN + NB: Sensitivity: 58%. Precision: 80.57% | Real dataset/European cardholders/Kaggle | Use unsupervised combined classifier for better result and use more classifier.
A39 | LR, RF, DT, KNN | Accuracy, specificity, precision, sensitivity | The accuracy of LR is 94.9%, DT accuracy is 91.9%, and RF accuracy is 92.9%. KNN has a 93.9% success rate. Despite LR was more accurate, majority of this algorithm under fit. Thus, KNN is the best technique. | Real dataset/European cardholders/Kaggle | NA
A40 | ANN, harmony search algorithm (HSA) | Accuracy, recall, SM calculation, confusion matrix | The suggested model NNHS provides a solution using HSA for ANN. The best accuracy achieved is 86. Recall is 87. | German dataset available at the UCI website | NA
A41 | HMM, SMOTE, DBSCAN | Precision, recall, F1-score | Proposed approach (SMOTE + DBSCAN + HMM) performed relatively better for all the various hidden states. | Simulated mobile based transactions | NA
A42 | Deep reinforcement. Resampling SMOTE and ADASYN | Accuracy, precision, sensitivity, specificity | The proposed model of ML with two resampling techniques and DRL is reliable. SMOTE and ADASYN are used to resampling dataset. The proposed system obtained high accuracy with 99%. RF and XGBoost are the best techniques. | Real dataset/European cardholders/Kaggle | Extend dataset. Applying new ML and DL algorithms
A43 | HMM | Accuracy | The model is very efficient and showed the importance in learning spending behaviour. The accuracy is 80%. | NA | NA
A44 | K-means, PCA, T-SNE, SOM | Accuracy | The model obtained an accuracy of 90%. The results were vary as the initialization of the weight of nodes SOM grid is done by randomly records or patterns. | Statlog Australian dataset. | Trying different iterations and store weights of SOM
A45 | KNN, RF, NB | Accuracy | KNN showed the highest accuracy than the RF algorithm and NB. | Real-world dataset | More ML supervised algorithm can be added.
A46 | DCNN, space invariant ANN | Accuracy | The results showed that proposed robust SIANN (RSIANN) is outperformed other techniques. The accuracy is 85%. SVM accuracy is: 0.77. RF accuracy is: 0.72. NB accuracy is: 0.70. DCNN accuracy is 0.82. | NA | Using kernels technique also using pre trained CNN.
A47 | SVM, NB, DT | Accuracy | Results showed that the new system will reduce the frauds which are happening while transactions. | NA | NA
A48 | SVM, GNB, DT | Execution times | The proposed model using fusion of detection algorithms and AI. Support Vector Classifier take less time. SVC obtained solution with less time. 0.191343 ms. | Real data/European cardholders | Using other datasets also applying other algorithms
A49 | RF | Accuracy | The result showed that RF obtained high performance. However, the speed will suffer. On the other hand, SVM suffer from unbalanced data. The SVM obtained good performance. | NA | NA
A50 | NB, DT, RF, LR, AdaBoost, Gradient Boost, XGBoost | Accuracy, recall, precision, confusion matrix | Results showed that XGBoost is the best boosting technique in predicting fraud. The accuracy is 100%. F1-score is 0.88. NB classifier: 95.6%. DT classifier: 90.0%. RF classifier: 97.7%. LR: 98.3%. AdaBoost: 99.9%. Gradient boost: 99.9%. XGBoost: 100%. | Real dataset/European cardholders/Kaggle | NA
A51 | RF, DT, LR, LOF, isolation forest | F1-scores, precision, recall | Results showed that isolation forest obtained better efficiency. RF: 95.5%. DT: 94.3%. LR: 90%. Isolation forest: 99.77%. Local outlier factor: 99.69%. | Real dataset/European cardholders | Using NN for training the system, to obtain better accuracy.
A52 | Semi-supervised learning. AutoEncoders | Precision, recall, F1-score | The results show that using semi-supervised technique is efficient to detect fraud. Accuracy is 0.98%. | Real dataset/European cardholders | Investigate the intelligent dependent attributes.
A53 | Autoencoders | Accuracy, precision, F1-score, sensitivity | The proposed model obtained high performance. SSAE+LDA model showed significant improvement compared with other research on same dataset. Accuracy is 90%, F1-score is 90%, precision is 91%, sensitivity is 90%. | Real dataset/UCI | Study effect of optimizers, stacking diverse autoencoders
A54 | Light gradient boosting. RF | Accuracy, AUC | This study only used to identify the fraudulent user. The results show that light gradient boosting obtained great performance with a total recall rate of 99%. | Real dataset/European cardholders | Further study on how to judge fraud ring based on relation map.
A55 | XGBoosting, neural network | Accuracy, precision, F1-score, recall, ROC, AUC | Results indicated that XGBoosting performs better when compared with other ensemble models. XGB AUS is 0.778 | Consumer’s dataset/Taiwan. | NA
A56 | MLP, LR, SVM. Gradient descent algorithms. | Accuracy | Results showed that proposed model performs good compared with LR and SVM. MLP Accuracy: 0.9990. LR Accuracy: 0.9723. SVM Accuracy: 0.9345 | NA | A dependent variable with numerous classifications can be used.
A57 | GAN | Accuracy, precision | The model obtained an improved sensitivity. GAN model can training of small dataset. GAN Accuracy: 0.99962. Precision: 0.9583. | Real dataset/European cardholders | Develop a strategy to reduce the decreasing in specificity to minimum
A58 | Ensemble learning approach, RNN, FFNN, LSTM, GRU | Recall, precision, F1-score | Results showed that proposed model based on LSTM with ensemble GRU on two datasets outperforms other models. The new model is efficient in term of realtime. | Real dataset/European cardholders; Brazilian bank | Develop new Model to take advantage of deep encoder and decoder.
A59 | CatBoost, XGBoost, stochastic gradient boosting | Precision, recall, confusion matrix | Results showed that the CatBoost is the best compared with XGBoost and SGB boosting. CatBoost accuracy is 0.921. Recall is 1.00. XGBoost accuracy is 0.914. Recall is 0.99. SGB accuracy is 0.907. Recall is 0.97. | NA | New models using supervised and unsupervised.
A60 | LR, ANN, SVM, RF, Boosted Tree | Kolmogorov-Smirnov Formula, FDR | The new model using boosted tree shows best performance in fraud detection. FDR = 49.83% | Real dataset/government agency/USA | Some data and fields such as time, day point of sale should be added.
A61 | NB, RF, LR, SVM | AUC, precision, recall | NB technique shows high performance compared with other techniques. Accuracy is 80.4%. Area under curve is 96.3% | Real dataset/European cardholders | Develop another model for sampling imbalanced data.
A62 | Isolation forest | Precision-recall curve (AUCPR), AUC | The proposed model demonstrate the efficiency in fraud detection, observed to be 98.72%, which indicates a significantly better approach than other techniques. | Real dataset/European cardholders/Kaggle | Financial institutions must make available data set. Thus, outcome will be more efficient.
A63 | UQ techniques: MCD, EMCD | Confusion matrix, UAcc, USen, USpe, UPre | The suggested model using UQ provide high performance in predicting fraud. Ensemble technique is efficient in fraud prediction. MCD: UAcc (0.82). Ensemble: UAcc (0.85). EMCD: UAcc (0.84) | Publicly available dataset/Vesta corporation | The quality of final uncertainty estimates should be improved.
A64 | RF, LR, DT, GNB combination with ensemble. | Matthews correlation coefficient (MCC) | The accuracy of all the five models is 100% & even the MCC score is +1 for the models been evaluated. | Real dataset/European cardholders | NA
A65 | DT augmented with regression analysis. | Accuracy, confusion matrix | The results showed that new model successfully verified the injected intrusions. Accuracy is 81.6% with 18.4% misclassification error. | Dataset from the UCI repository | NA
A66 | SOM, ANN | Accuracy | Using hybrid model of SOM and ANN achieved high performance compared to use ANN or SOM alone. | Dataset from the UCI repository | Creating a NN with some optimization technique.
A67 | LR, RF, and CatBoost | Accuracy, precision, recall | The result showed that model of RF with CatBoost provides efficient accuracy. RF technique has the most elevated incentive than the LR and CatBoost algorithm. RF: Accuracy (99.95). CatBoost: Accuracy (99.93). LR: Accuracy (99.88). | Real dataset/European cardholders/Kaggle | NA
A68 | Deep forest, XGBoost, AE, gcForest | Accuracy, precision, recall, confusion matrix | The proposed model showed high performance in detecting card fraud. | Dataset from China’s bank. | NA
A69 | GAN, variational autoencoder (VAE) | Accuracy, F-measure, precision | The model showed that VAE-based oversampling performs better than the normal DNN and synthetic minority over sampling technique as it can solve the imbalanced problem. | Real dataset/European cardholders | Improving the model recall rate
A70 | C4.5 DT, NB, bagging ensemble | Accuracy, precision, recall | The model shows that bagging with C4.5 DT is the best algorithm with rate of 1,000 for class 0.0825 for class 1. | Real dataset/European cardholders | NA
A71 | Fuzzy rough nearest neighbor (FRNN), SMO, LR, MLP, NB, IBK, RF | Positive predictive value (PPV), F-measure, specificity | The results showed that the suggested model provided significant results. The rate of detection is 84.90, AUC is 0.8555/Australian dataset. While 76.30% detection rate with 0.679 AUC/German dataset. | Australian dataset/German dataset | Other ensemble techniques should be considered.
A72 | NB, DT (LMT, J48), rules classifier, lazy classifier, meta classifier | Accuracy, recall, precision, F1-score | The result showed that applying LMT algorithm to classification fraud is better than other techniques. LMT model obtained 82.08% accuracy. | Client’s data in Taiwan. Data available on: | Further study to find out new algorithms with higher voting.
A73 | Feature maps and GANs, SVM, CNN | AUC score, ROC, confusion matrix | Results showed that the suggested model is applicable to test datasets and less time is required for learning. SVM obtained better detection. However, learning time exceeds other models when dataset increase. CNN-based model needs long time. SMOTE performance is effective. | Machine learning group ULB. Kaggle. | Change on oversampling techniques in the suggested model.
A74 | CNN, SVM, RF, isolation forest, autoencoder | Accuracy, precision | ML models have been implemented for classification purpose. Achieved competitive accuracy in CNN model. CNN: Accuracy (99.51). | Real dataset/European cardholders | Predict fraud in real-time. Applying service on the cloud platform.
A75 | LR, NB, KNN | Accuracy, recall, specificity, sensitivity, F-measure, precision | Results showed that LR showed optimal performance. It is getting high accuracy of 95%. NB accuracy is 91%. KNN accuracy is 75%. LR showed better sensitivity, precision, specificity, and F-measure. | Real dataset/European cardholders/Kaggle | NA
A76Isolate forest and local outlier factor (LOF) algorithmsAccuracy
Precision
Recall
F-measure
The result showed that local outlier factor achieved high accuracy with 97%. Isolation forest accuracy is 76%Real dataset/Europeans cardholders/KaggleNA
A77RF, DTAccuracy
Sensitivity
Specificity
Precision
The result showed that this model is accurate on large dataset with 98.6% accuracy. RF provides high performance, however, it needs many training data.Dataset from product reviews on credit card transaction.Develop AI/ML/DL techniques
A78Multiple classifiers system (MCS). NB, C4.5, KNN, ANN, SVM.TPR
TNR
Accuracy
Results showed that the suggested model can tackle the unbalanced class distribution and overlapping class samples. The proposed model obtained high TPR, which is 0.840 and 0.930 accuracy. TNR is 0.955.Dataset1: ULB
Dataset2: credit cardholders/Taiwan bank
Considering combining the DL algorithms for promising detection results.
A79KNN, SVM, LR
HYBRID NB-RF
XGB
Accuracy, recall, precision, TPR, FPR,Results showed that all proposed models are superior in performance. Staking classifier using LR as meta classifier is most promising then SVM, LR, KNN and HNB-RF. Stacking classifier accuracy is 0.95. RF accuracy is 0.94.Real dataset/Europeans cardholders/KaggleApplying Voting classifier.
A80Hybrid models using AdaBoost
and majority voting, NB, SVM
MCCResults showed that the majority voting obtained high accuracy. The best MCC score is 0.823.A publicly available data set/Turkish bank.Applying online learning models so we enable efficient fraud detection.
A81DT, LR, Shallow NN. Challenger model: DL model with ensemble.AUROC
K–S statistics
alert rate, recall precision
Results showed that after testing off-line and post-line, operate the FDS with DL model. This shown +3.8% improvement of recall. The hybrid ensemble model perform well in detecting fraud.Dataset from company/South KoreaNA
A82LR, NB, AdaBoost, and voting classifierAccuracy, recall, precision, sensitivity
F1-score
Results showed a good accuracy for
NB: 91.41%. LR: 94.51%. AdaBoost: 95.67%. Voting: 94.69%.
Real dataset/Europeans cardholders/KaggleA hybrid classification method will be designed.
A83Ensembles of classifiers based on DT, XGBoost and LightGBM.Accuracy,
precision, recall
AUC
Confusion matrix
The result showed that the ensemble of models allowed to detect maximum 85.7% of fraud. Accuracy is 79‒85%.Real dataset/Europeans cardholders/KaggleNA
A84AdaBoost voting
KNN, greater part casting ballot techniques.
MCCThe results showed that perfect MCC score achieved when using AdaBoost and greater part casting a ballot. Commotion from 10% to 30% included with data. The model yielded best MCC of 0.942.Informational index from a Turkish bankNA
A85RF, KNN, NB, SVMAccuracyThe result shows that RF has the highest accuracy of detection of fraud. RF accuracy is: 0.9996.NASeeking information from advanced technologies.
A86ANN, BBNConfusion matrixResult showed a Bayesian Network is more accurate than the NB Classifier. This is disturbed with using the fact of conditional dependence between the attributes in Bayesian Network, but it requires more difficult to calculation and as training process.Real dataset/Europeans cardholders/KaggleNA
A87KNN, NB, LRAccuracy, sensitivity, specificity,The result showed that KNN performed high performance of matrices except accuracy.Real data/European cardholdersNA
A88LR, RF, SVMAccuracy,
precision,
F1-score, recall
Compression between LR, RF and SVM is performed and the accuracy of LR is 77.97%, RF is 81.79% and SVM is 65.16. So, RF is better than the SVM and LR.Real dataset/UCINA
A89LR, RF, XGBoost, ANN, isolation forest, PCA with SVM.Accuracy, sensitivity, specificity, MCC precision, BCRResults show that RF and XGBoost provided better result than other models. The accuracy of XGBoost is 0.9951. RF accuracy is 0.9955.Mobile money transactions published on Kaggle.Combined ANN with genetic algorithm to enhance accuracy.
A90SVM, GA
Cuckoo search
Particle swarm
Accuracy
Precision
Recall
The results showed that Linear kernel function is the best. Accuracy is 91.56%. Radial basis used to enhance kernel accuracy. The accuracy improved from 42.86 to 98.05%. Overall, PSO-SVM better than CS-SVM and GA-SVM.Data from law enforcement department in ChinaLook for new algorithms to optimize SVM
A91AE-PRE
Bootstrap aggregating
Bagging
Accuracy
TPR, TNR
FPR ROC curve
AUC, MCC
The result shows that AE-PRF is efficient when dataset is unbalanced. AE-PRF obtained high performance in accuracy.Real dataset/Europeans cardholders/KaggleImprove AE-PRF model with adding fine-tuning the hyperparameters of AE and RF models.
A92Multi-perspective HMMsPR-AUCThe results showed that HMM model is powerful in detecting fraud.Real dataset/BelgianCombine LSTM with HMM-base features
A93C5.0, SVM, ANN NB, BBN, LR, KNN, artificial immune systems (AIS).Accuracy
Recall
Precision
The results showed that C5.0, SVM, and ANN are performing well with imbalanced classification problem. Even these techniques improve the classifier’s performance in fraud, high number of fraud cases continue undetected.Two dataset available at Develop new model with big data driven ecosystem.
A94Hybrid model:
DT, SVM, ANN
genetic algorithm (GA).
F-score
Accuracy
Recall
The results showed that the suggested hybrid model obtained high accuracy with 93.5% comparing with ANN, SVM, and DT. The hybrid model applied GA outperform other techniques.Realworld dataset from financial institutionReal-life test for the suggested model
A95DT, RF, KNN, LR
K-means, DBSCAN, MLP, NB, XGBoost
Gradient boost
Accuracy
Precision
Recall
F1-score
The result showed that RF yielded perfect performance result with accuracy 99.995. RF is suitable for large datasets.Real dataset/Europeans cardholders/KaggleNA
A96 | Local outlier factor, isolation forest | Precision, accuracy | The results showed that the model reached over 99.6% accuracy, with precision at 28%. When more data was fed into the model, the precision rose to 33%. | Dataset from German bank in 2006. | Adding more algorithms. Using more datasets.
A97 | KNN, PCA, SMOTE | Recall, precision, F1-score | The results showed that the suggested model performed well. For KNN: precision 98.32, F-score 97.44%. For Time subset when using the misclassified instance, precision is 100% and F-score is 98.24%. | Real dataset/European cardholders/Kaggle | Know how PCA can affect the performance of a dataset.
A98 | KNN, DT, LR, RF, XGBoost | Accuracy, F1-score, precision, recall, AUC-ROC | The results show that XGBoost and DT outperform all other algorithms in detecting fraud. | Real dataset/European cardholders/Kaggle | Study on other ML algorithms and various forms of stacked classifiers.
A99 | Outlier detection, DT, RF and NN | Precision, recall, ROC, confusion matrix | The results showed that RF is the most precise and accurate technique. However, it takes a long time to train. NN is the next best algorithm. DT is the least accurate. In terms of time efficiency and computational resource utilization, NN is the best technique. | Real dataset/European cardholders/Kaggle | NA
A100 | NB, C4.5 DT, and bagging ensemble learner | Precision, recall, PRC | The result showed that the performance is between 99.9% and 100%. The best classifier is C4.5 DT with 94.1% precision and 78.9% recall. The acceptable performance is bagging ensemble with 91.6% precision and 80.7% recall. As for the worst performance, it is the NB classifier with precision of 65.6% and a recall of 81%. | Real dataset/European cardholders/Kaggle | Other classifiers will be used and applied to a set of local data that will be collected from banks in Iraq.
A101 | Autoencoders, MLP, KNN and LR | Accuracy, precision, recall, F1-score | Results showed that the suggested model maintains a good performance. It outperforms the systems based on either different classifiers or variants of autoencoder. It establishes the efficiency of the proposed two-stage model. Proposed method accuracy is 0.9994. Precision is 0.8534. F1-score is 0.8265. | Dataset from ULB machine learning group on Kaggle. | Proposed two-stage model can be tuned to handle stream data. The model can be trained on a batch of transactions.
A102 | LOF, AdaBoost, RF, isolation forest, DT, KNN, HMM, GA, ANN, NB, LR | Accuracy, confusion matrix | Results showed that the local outlier factor accuracy is greater than that of the other algorithms. Local outlier factor accuracy is 0.898. | Real data/European cardholders | NA
A103 | RF | Accuracy | The results showed that RF performs better with large dataset. The accuracy is 99.9%. The SVM algorithm can be used instead of RF. However, SVM still suffers from the imbalanced dataset. | NA | Privacy preserving techniques can be applied in distributed environment.
A104 | RF | Accuracy, F1-score, precision, recall | The result showed that RF performed better compared with DT and NB. The suggested model showed better accuracy on huge dataset. | Real dataset/100,000 cardholders | Applying semi-supervised technique
A105 | Oversampling with SMOTE, SVM, LR, DT, RF | Accuracy, precision, recall, F1-score | Results showed that the model works better in predicting fraud when the SMOTE technique is used. RF and DT provided the best performance. | Real dataset/European cardholders | Building a real-time solution to detect fraud.
A106 | RF, AdaBoost, oversampling ADASYN | Accuracy, recall, precision, F1-score | This research examines various existing credit card fraud systems using ML approaches. Despite the fact that RF produces outstanding results on tiny sets of data, there are still certain problems, such as data imbalance. RF accuracy is 0.999. | Real dataset/European cardholders/Kaggle | Using large amount of data. More pre-processing procedures.
A107 | Autoencoder neural network DAE | Recall, accuracy | The results showed that the DAE improves classification accuracy of the minority class of imbalanced datasets. Proposed model increases accuracy of minority class. When the threshold equals 0.6, the model achieves best performance with 97.93%. | Real dataset/European cardholders/Kaggle | Dimensionality reduction of high-dimensional data needs further research.
A108 | ANN with LR | Accuracy, precision and recall | The results show that the model is very good. Accuracy achieved is 0.9948, recall is 0.8639 and precision is 0.2134. | Real data/European cardholders | NA
A109 | GAORF | Accuracy, confusion matrix | The study used real data and genetic algorithm optimised RF models. The model shows good improvement and brings down misclassifications. | Commercial bank in Nigeria | NA
A110 | 2DCNN, 1DCNN, LSTM, NLP, SMOTE | Accuracy, F-score, precision, recall | The result showed that using CNN and LSTM yielded better performance. LSTM (50 blocks) was the highest with F1-score of 84.85%. Sampling techniques were applied to handle the imbalanced dataset and improve model performance. | Real dataset/European cardholders/Kaggle | Tune hyperparameters of DL techniques to improve performance.
A111 | ANN, RF, GBM, RUS, SMOTE, DBSMOTE, SMOTEENN | F1-score, recall, precision, accuracy | The result showed that using sampling techniques enhanced the detection of credit card fraud. Recall obtained with SMOTE by DRF classifier is 0.81, which is the best. Precision is 0.86. Stacked ensemble showed promise in detecting fraud. | Real dataset/European cardholders/Kaggle | Using other sampling techniques. Applying unsupervised and semi-supervised techniques.
A112 | Local outlier factor, LR, RF, DT, isolation forest | Accuracy, MCC | The result showed that LR and SVM obtained higher accuracy. SVM accuracy is 0.9987. LR accuracy is 0.9990. One-class SVM applied in this study. | Real dataset/European cardholders | NA
A113 | ANN, LR, DT, RF and XGBoost | Accuracy, precision, recall, F1-score | Results showed that ANN and XGBoost achieved high performance. ANN achieved a 99% accuracy. | Real dataset/European cardholders; synthetic dataset | Use more real-world datasets.
A114 | NB, SVM, AdaBoost | MCC, accuracy | The results showed that the boosting technique achieved a good accuracy. The best MCC score is 0.823. | Real-world dataset. | Extend the model to online learning model.
A115 | KNN, LR, SVM, RF, DT, XGB, OCSVM, AE, RBM, GAN | AUROC, FPR, TPR | The results showed that supervised approaches such as RF and XGB achieved better performance. XGB obtained 0.989 AUROC. RF obtained 0.988 AUROC. Among unsupervised techniques, RBM achieved the best performance with 0.961 AUROC. | Real dataset/European cardholders/Kaggle | Focuses on new GAN model
A116 | RF | Accuracy, sensitivity, specificity, precision | The result showed that building multiple DT achieved good performance with 98.6% accuracy. | Real dataset/European cardholders/Kaggle | NA
A117 | IBk, IB1, KStar, RandomCommittee, and RandomTree, AdaBoost | Accuracy, precision, recall | The results showed that the best accuracy was achieved by Bagging, Rotation Forest, Random SubSpace, Random Committee, LMT, and REPTree. The IBk, IB1, RandomCommittee, KStar, and RandomTree obtained good accuracy and can detect fraud 348 (35.27%), 354 (40.97%), 396 (45.83%), 397 (45.94%), and 399 (46.18%) respectively. | UCSD-FICO dataset. | NA
A118 | Spectral-clustering hybrid of GA trained modular NN | Sensitivity, specificity, accuracy | Results showed that the hybrid model is efficient in detecting fraud. The model obtained sensitivity of 90%, specificity of 19% and prediction accuracy of 74% with improvement rate of 12% for data inclusion. | Dataset from banks/Africa and Nigeria. | NA
A119 | ANN, SA-ANN, HTM-CLA, DRNN, LSTM | Accuracy | Results showed that HTM-CLA offered realistic features. HTM-CLA with SA-ANN achieved good performance. The maximum accuracy was obtained from SA-ANN. | Real dataset/Australia; real dataset/German | Reduce computational burden in HTM-CLA technique
A120 | Isolation forest, KNN, DT, LR, RF | Sensitivity, time and precision | The result showed that KNN sensitivity is better than DT. However, DT needs less time to detect fraud. DT is the best model. | Real data/European cardholders | NA
A121 | LR, DT, RF, NB, ANN | Accuracy, recall, precision | Results showed that the accuracy is 94.84% when using LR, 91.62% when using NB, and 92.88% when using DT. ANN obtained a better accuracy of 98.69%. ANN is the best. | Real dataset/European cardholders | NA
A122 | KNN, DT, SVM, LR, RF, XGBoost | Accuracy, F1-score, confusion matrix | The result showed that the KNN model is the best compared with other techniques. Accuracy is 99.95%. F1-score is 85.71%. | Real dataset/European cardholders | Using other resampling and applying DL techniques.
A123 | Hybrid architecture involving particle swarm optimization (PSO), SVM | Accuracy, F1-score, confusion matrix | The PSO algorithm is used to select characteristics and the SVM is used for the iterative development of the feature selection. Results showed that a minimum of functionalities is extracted by the suggested PSOSVM. The PSO-SVM algorithm is an optimal preparatory instrument for enhancing feature selection optimisation. Accuracy for German dataset: SVM 78.69, PSOSVM 89.42. Accuracy for Australian dataset: SVM 78.84, PSOSVM 89.27. | German credit card datasets; Australian credit cards | NA
A124 | Stacking, AdaBoost, majority voting, LR, DT, RF | Accuracy | The result showed that the suggested model provided better fraud detection. The boosted stacking performs better than others. Boosted stacking accuracy is 94.5%. | Real dataset/European cardholders/Kaggle | NA
A125 | Neo4j, PageRank, RF, DT, KNN, SVM, MLP, LOF, isolation forest | Accuracy, MCC, F1-score, recall, precision, ROC, AUC, AUPR | The result showed significant improvement in the performance metrics of DT. LOF yielded a better result with 99.54% accuracy and recall 83.39%. When using PageRank graph feature. RF accuracy is 99.47%. | Synthetic dataset/BankSim | Other graph algorithms to extract features and DL should be studied further.
A126 | Autoencoder, RBM | Recall, precision, AUC | The result showed the AE and RBM can make AUC more accurate. AE based camera and H2O applied. | Real dataset/European cardholders | NA
A127 | AdaBoost, majority vote, MLP, SVM, LOR, HS | MCC metrics | The result showed that the hybrid model of majority voting provided good accuracy. The model achieved great location rate 98% with 0.1%. Perfect MCC score when using AdaBoost and majority voting. | Real dataset/European cardholders/Kaggle | Examined other internet study models
A128 | AE, one-class SVM and robust Mahalanobis outlier detection | Precision, error rate, MSE | Results showed that the advantage of robust Mahalanobis is that it does not need labels for training. The performance of the three models varied. To gain insight into the performance of the models, the available labels were used for model performance evaluation. | Real dataset from international corporation | Global and local outliers and cardholder behaviour need to be considered.
A129 | AdaBoost, NB, RT, majority voting, DT, GBM, NN, SVM, Spark ML | MCC | The results showed that the hybrid model of NB, SVM, and DL techniques obtained an ideal MCC score of 0.823. | Public real dataset/bank | Expand to internet learning
A130 | SVM-RFE, hyper-parameters optimization, SMOTE | Accuracy, precision, recall, specificity, F-score | Results showed that the proposed model is highly effective and obtained the best accuracy with 99%. | Real dataset/European cardholders/Kaggle | Using more complex datasets
A131 | RNN, SMOTE Tomek, LSTM, BLSTM, GRU, BGRU | Accuracy, recall, precision, AUC | Results showed that BGRU achieved the best accuracy of 97.16%, followed by BLSTM with 96.04%. | Real dataset/European cardholders/Kaggle | Focuses on the behavior of customer.
A132 | WOA, SMOTE, BPNN | Accuracy | The result showed that WOA with SMOTE was more efficient than BPNN. | Real dataset/European cardholders | NA
A133 | NB, SVM, RF | Accuracy | The results showed that RF is the best technique with accuracy of 100%. | Real data/European cardholders | NA
A134 | RF, SVM, LOF, isolation forest | Accuracy, precision, recall | The result showed that RF obtained 99.92 accuracy. RF performed better compared with other techniques. | Real dataset/European cardholders | Improve dataset and add other algorithms to the suggested model
A135 | K-means, C5.0 DT, Hadoop and Spark | Accuracy, ROC, AUC | The results showed that the Spark-based IHA hybrid model obtained 94% accuracy. It is suitable for detecting fraud. | Public domain | Applying this model to other fields
A136 | SVM, undersampling techniques | Accuracy, precision, recall | Results showed that the new model improves the performance. Accuracy is 99.9%. SVM obtained best precision with 89.5%. | Real dataset/European cardholders | NA
A137 | Isolation forest | Accuracy | The results showed that the isolation forest obtained accuracy of 99.87. | Professional survey organizations. | Using hybrid techniques and AI
A138 | RF | Accuracy | The results showed that RF using feedback and delayed supervised sample is better than other techniques. RF accuracy is 0.962. | NA | Applying semi-supervised techniques
A139 | SVM, KNN, AdaBoost, PSOS, RIG | Accuracy, precision, recall, F-measure | The results point out that the PSOS technique is the best feature optimisation technique. This technique enhanced the accuracy from 82.90% to 85.51%. PSOS technique gives more performance. | Australian financial dataset. | Extend the model by using hybrid techniques
A140 | AdaBoost, majority voting, NB, SVM, DL | MCC | The results showed that majority voting obtained a high accuracy and the best MCC score of 0.823. | Public real-world dataset | Extend to online learning model
A141 | RF, NN | Accuracy, precision, recall, F-measure | The result showed that RF obtained accuracy of 90%. RF is a suitable technique. | Real-life B2C dataset | The RF itself needs improvement.
A142 | NB, RF, DT, GBT, DS, ANN, RT, MLP, LIR, LOR, SVM | Accuracy, ACC, MCC | The results showed that the best AUC obtained is 0.937 from GBT using aggregated features. Aggregated features improve the models' performance. | Public datasets. Benchmark databases. | Further evaluation of these models using different datasets.
A143 | HOBA, DBN, RNN, CNN, BPNN, SVM, RF | Accuracy, precision, recall, F1-measure | The results showed that the DBN with HOBA variable obtained better performance. Using DL techniques and HOBA feature engineering improves the performance. | Real-world dataset/bank in China | Build real-time model. Build a combination model of ML and DL
A144 | AE, GAN | Precision, recall, F1-measure | The result shows that the DAEGAN model achieved best performance. AUC is 0.958. Recall is 0.815. AUPRC is 0.805. DAEGAN improves accuracy. | Real dataset/European cardholders | Improve the model
A145 | Isolation forest | AUCPR, F1-score, precision, recall, ROC-AUC | The result showed that the model achieved good performance. AUCPR is better than ROC-AUC in describing performance. Precision is 0.807. Recall 0.763. F1-score is 0.784. ROC-AUC is 0.973. AUCPR is 0.759. | Real-life dataset from ULB. Kaggle. | NA
A146 | SVM, LOF, isolation forest | Accuracy, precision, F1-score, recall | The results point out that the isolation forest with LOF model is very fast and accurate. The accuracy is 99.74%, SVM obtained 45.84%. LOF achieved 99.66%. | NA | NA
A147 | SVM, K-means, AdaBoost | Recall, accuracy | The result showed that SVM and AdaBoost obtained high performance. | Dataset from a bank. | NA
A148 | Deep auto-encoder | Accuracy, precision, recall, AUC-ROC curve | The results showed that the algorithm is perfect and gave high performance 98.8% acceptance rate. The proposed algorithm can be used for any binary classification task. | Real dataset/European cardholders | NA
A149 | NN | Accuracy | The result showed that the suggested model can be integrated with mobile apps to detect fraud. Model obtained excellent accuracy with 99.75%. | Real dataset/European cardholders | NA
A150 | RF, DT, SVM, GNB, LR | Accuracy | The result showed that DT provided better performance. However, speed still suffers. | NA | Using other ML and DL techniques
A151 | Isolation forest, LOF | Recall, precision, F1-score | The isolation forest obtained accuracy of 99.72%, with 71 errors. LOF accuracy is 99.62%, with 107 errors. Isolation forest is better in detecting fraud. | Real dataset/European cardholders | Using NN technique
A152 | DT, RF, HMM, NN | Accuracy, false alarm rate, MCC | The results point out that RF obtained high performance with 0.999% accuracy in fraud detection. | Real dataset/UCI | NA
A153 | TVIWDA, SVM, WFSVM | Accuracy, precision, recall, F1-score | The result showed that using TVIWDA with WFSVM improved the accuracy of detection. The suggested system obtained 97.82% accuracy. Precision is 92.62%. | German credit card dataset. | Solving the imbalanced data problem
A154 | Oversampling pre-processing technique SAS, RF, KNN, DT, LR | Accuracy | The study proposed 4 models to detect credit card fraud. The result showed that RF and KNN are overfitting. Thus, only DT and LR have been compared. The best performing model is LR. Result shows that LR with stepwise splitting rules has outperformed the DT with only 0.6% error rate. | Real dataset/European cardholders/Kaggle | Use different sampling techniques such as undersampling, SMOTE or roughly balancing to compare the result.
A155 | RF, DNN | Accuracy | The results showed that RF performs perfectly with a large amount of data. RF accuracy is 0.999. | NA | NA
A156 | KNN, LR | Accuracy, precision, recall, F-measure | The result shows that the KNN technique achieved the best result. Precision is 0.95. Recall is 0.72. F1-score is 0.82. | Real dataset/European cardholders | NA
A157 | SMOTE, MLP, KNN, SVM, OSE, NN, GAN | Accuracy, F1-score | The model uses a stacking classifier that combines GAN-improved MLP with SVM and KNN. OSE is preferred because of its ability to harness the abilities of MLP, which works better in finding hidden patterns. The accuracy of OSE is 99.8%. | Real dataset/European cardholders/Kaggle | Apply weighted voting and boosting algorithms
A158 | Aggrandized RF | Accuracy, precision, recall, F-measure, sensitivity, specificity | The result showed that the aggrandized random forest obtained high accuracy with 0.9972% for balanced data and 0.9995% for imbalanced data. RF is the best technique in detecting fraud. | NA | NA
A159 | RF, ANN, SVM, LR, tree classifier, gradient boosting | Accuracy, precision, recall, F1-score | Results showed that the RF algorithm demonstrates an accuracy of 95.988%. SVM accuracy is 93.228%. LR accuracy is 92.89%. NB accuracy is 91.2%. DT accuracy is 90.9%. GBM accuracy is 93.99%. | ULB dataset from Kaggle | Apply other ML techniques
A160 | SVM, NB, KNN, focal loss, XGBoost, W-CEL, LR | Accuracy, precision, recall, MCC | The result showed that the suggested model achieved accuracy of 100%, precision of 0.97, recall of 0.56, and MCC of 0.72 using the extremely imbalanced dataset. When using the mildly balanced dataset, the accuracy is 99%, with 0.88 precision, 0.87 recall, and 0.89 MCC. The suggested model does not work well on the extreme dataset. XGBoost improves model performance. | ULB dataset from Kaggle | Solve the imbalanced dataset problem
A161 | HMM | NA | The study provided a method to find out the spending behaviour of the cardholder and then the observation symbols, which helps in estimating the model performance. | NA | NA
A162 | LOF, K-means, isolation forest | Precision, recall, F1-score | The result shows that the proposed model provided an accuracy of 98%. K-means clustering, isolation forest and LOF. | Real dataset/European cardholders | NA
A163 | KNN, LR, RF, XGBoost (extreme gradient boost) | Precision, ROC-AUC | XGBoost shows higher accuracy than the other models. Of these algorithms, the XGBoost model is preferable over the RF model and LR model. | Real data/European cardholders | RF model would be improved
A164 | AdaSyn, ROS, RUS, Tomek links, AllKNN, Tomek, SMOTE+ENN, AdaBoost, KNN, RF, SVM, eXtreme XGBoost, LR | Accuracy, precision, recall, K-fold, AUC-ROC, execution time | The result showed that oversampling followed by undersampling performs well for ensemble classification models. AllKNN, SMTN, and RUS are performing well. SVM and KNN achieved perfect results. Best precision was provided by oversampling followed by undersampling methods in conjunction with RF. NB classifier was the least. | Machine Learning Group ULB. Kaggle. | NA
A165 | RF, SVM, ANN | Accuracy | The result showed that ANN produced the highest accuracy, followed by RF and then SVM. | NA | Using more techniques.
A166 | KNN, isolation forest, local outlier factor | Accuracy, recall score | Results showed that all algorithms achieved 95.0% accuracy. Isolation forest had high accuracy and K-means produced the low accuracy. LR and vanilla LR gave great accuracy. | Real dataset/European cardholders | Implement an autoencoder or SVM.
A167 | LightGBM, AdaBoost, RF | Accuracy, precision and recall | The results showed that AdaBoost provided the highest result with 0.9613. In terms of precision, LightGBM produces the highest result with 0.986. AdaBoost provided the highest recall with 0.889. | Real dataset/European cardholders/Kaggle | Adding more parameters.
A168 | OLightGBM, RF, LR, SVM, DT, KNN, NB | Accuracy, recall, precision, F1-measure | The results highlight the importance of adopting an efficient parameter optimization strategy for enhancing the predictive performance. The proposed model outperformed other techniques with accuracy 98.40%. AUC is 92.88%. Precision is 97.34%. F1-score is 56.95%. | Real dataset/European cardholders/Kaggle | NA
A169 | RF, Apache Kafka | True positive rate (TPR), TNR, recall, precision, accuracy | Using Apache Kafka to consume the transactions from the transaction record and publish them in real time. This project is using Cassandra as the storage layer. This proposed system offers the user maximum security and precision. | Data from the file system to the Cassandra database. | NA
A170 | Autoencoder, RBM, federated learning | Accuracy, ROC, recall, precision | The results showed that the average accuracy of the autoencoder is 94% and that of RBM is 88%. AUC achieved a result of 0.94. | Real dataset/European cardholders | NA
A171 | RF | Accuracy, recall, precision, F1-score | The result showed that RF obtains good performance on small dataset. Some problems with imbalanced dataset. RF accuracy is 0.9632. Precision is 0.894. Recall is 0.85. F1-score is 0.871. | Real dataset/European cardholders | Improve RF itself
A172 | RF, LR, DT, KNN, NB, undersampling and oversampling techniques | Accuracy, sensitivity, specificity, precision, Matthews correlation | Results showed that LR is the best algorithm. The proposed classifier NN and LR outperform DT. LR accuracy is 0.9699. | Real dataset/European cardholders/Kaggle | NA
A173 | Local outlier factor, LOF, INFLO, and AVF | Accuracy, recall, precision | The results showed that using LOF, INFLO, and AVF resulted in the highest level of LOF. 96% accuracy, 98% recall, and 93% precision. | World website | Trying other algorithms.
A174 | LR, DT, SVM, NB, RF, KNN | Accuracy, precision, recall | The result showed that RF obtained the best accuracy of 99.947%, precision is 76%, and recall is 92.68%. | Real dataset/European cardholders | ANN can be used to construct new classification techniques.
A175 | Deep learning based fraud detection model (DLFD) | Accuracy, precision, recall | DL model is constructed for the prediction process using Keras. Comparison with existing models indicates high performance in detecting fraud. Detection rate is 8.7%. DLFD accuracy is 0.997, precision 0.929, recall 0.795. | BankSim dataset was used for analysis of performance. | Improving the TPR levels and also on handling the concept drift.
A176 | ANN | Accuracy | The result showed that ANN is successful in fraud detection. Accuracy is 98%. However, ANN faced problems when training on huge datasets. | Dataset from company/South Africa | NA
A177 | ANN, GA, LR, SMOTE | Accuracy, precision, recall, F1-score | The results showed that the ANN with genetic algorithm obtained accurate results. The accuracy is 99.83%. Precision is 50.70%. Recall is 97.27%. F1-score is 66.66%. | Real dataset/European cardholders | NA
A178 | SVM, fuzzy association rules (FAR), gradient recurrent unit | NA | The results showed that the proposed framework provided a significant contribution. The framework allows abnormal transactions to be detected. | NA | Implementation and evaluation of the framework.
A179 | Hybrid ensemble-based. Boosting and bagging, RF, LR | MCC, precision, recall, detection rate, accuracy | Results showed that the model is efficient in detecting fraud. MCC is 1.00. The false positive rate is 0.00235. False negative rate is 0.0003048. The detection rate is 0.9918. Accuracy is 0.9996. MCC is 0.9959. | Brazilian bank data and UCSD-FICO data | NA
A180 | Particle swarm optimization (PSO), NN | Accuracy, precision, recall | Results showed that the performance of PSO is very high with 99.9% accuracy. | Real dataset/European cardholders | Focus on solving the data imbalance problem.
A181 | LR, RF, undersampling and oversampling | Confusion matrix, precision, F1-score, ROC-AUC | RF precision is 0.93. F1-score is 0.85. Oversampling and undersampling of the data show promise for the accuracy of the classifiers. The oversampling technique gave better fraud prediction results than random undersampling. | Real dataset/European cardholders | NN and using combination of HMM or KNN to achieve better in fraud detection.

Overall performance estimation of ML/DL models in credit card fraud detection

This section addresses RQ3, which concerns how the performance of ML/DL models is estimated. Estimation accuracy is the primary performance indicator for ML/DL models. The question covers three aspects of accuracy estimation: the performance metric, the accuracy value, and the dataset. Because the construction of ML/DL models depends on the dataset, we examined the data sources used in the reviewed articles. The experiments in these articles drew on a number of datasets, which fall into two types: real-world and synthetic. Real-world datasets were used most frequently: 154 articles employed real-world datasets, eight used synthetic datasets, and 19 did not specify the dataset source.

Evaluation metrics were used to calculate ML/DL model performance. The confusion matrix is an output matrix that characterises a model's overall effectiveness. Models are compared using metrics derived from the confusion matrix, such as sensitivity, specificity, precision, and F-score, together with the receiver operating characteristic (ROC) curve and the area under the precision-recall curve (AUPR).
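
As a minimal sketch of how a confusion matrix is built for a binary fraud classifier, the following Python snippet counts true and false positives and negatives and derives sensitivity and specificity from them. The labels are hypothetical (1 = fraudulent, 0 = legitimate) and the function names are illustrative only; they are not taken from any of the reviewed articles.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for binary labels; `positive` marks fraud."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1   # fraud correctly flagged
            else:
                fp += 1   # legitimate transaction flagged as fraud
        else:
            if truth == positive:
                fn += 1   # fraud that went undetected
            else:
                tn += 1   # legitimate transaction correctly passed
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """Share of actual fraud cases that were detected (true positive rate)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn, fp):
    """Share of legitimate transactions that were correctly passed."""
    return tn / (tn + fp) if (tn + fp) else 0.0


# Hypothetical ground truth and predictions for eight transactions.
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print("TP, FP, TN, FN:", tp, fp, tn, fn)
print("sensitivity:", sensitivity(tp, fn))
print("specificity:", specificity(tn, fp))
```

All other metrics discussed in this section can be computed from the same four counts.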

In this review, a number of performance indicators were used in addition to accuracy. As shown in Table C1, 177 articles clearly presented the performance metrics of their proposed models, whereas four did not mention them. Accuracy, recall, precision, and F-score were the most frequently employed indicators. Accuracy is the proportion of test-set transactions that were correctly classified as fraudulent or non-fraudulent. Precision is the ratio of true positives to all predicted positives, that is, the share of transactions flagged as fraudulent that really are fraudulent. Recall is the proportion of fraudulent transactions correctly detected as fraudulent out of the total number of fraudulent transactions. The F1-score combines precision and recall into a single score (their harmonic mean). AUC measures the full two-dimensional area under the entire ROC curve, and the ROC curve is one of the best indicators for analysing the effectiveness of credit card fraud detection. MCC measures the quality of the classification; because it takes true positives, true negatives, false positives, and false negatives into account, it is a balanced metric. MCC was used in 13 of the reviewed articles.
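
The definitions above can be illustrated with a short Python sketch that computes accuracy, precision, recall, F1-score, MCC, and AUC using the standard formulas. The confusion-matrix counts are hypothetical (10 fraud cases among 1,000 transactions) and are chosen only to show why accuracy can look excellent on imbalanced data even when no fraud is detected; none of the numbers come from the reviewed articles.

```python
import math


def accuracy(tp, fp, tn, fn):
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def precision(tp, fp):
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(tp, fp, fn):
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true, scores):
    """AUC as the probability that a random fraud case receives a higher
    score than a random legitimate one (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical model A labels every transaction as legitimate.
print("A accuracy:", accuracy(tp=0, fp=0, tn=990, fn=10))
print("A recall:  ", recall(tp=0, fn=10))
print("A MCC:     ", mcc(tp=0, fp=0, tn=990, fn=10))

# Hypothetical model B catches 8 of 10 fraud cases with 4 false alarms.
print("B accuracy: ", accuracy(tp=8, fp=4, tn=986, fn=2))
print("B precision:", precision(tp=8, fp=4))
print("B recall:   ", recall(tp=8, fn=2))
print("B F1-score: ", f1_score(tp=8, fp=4, fn=2))
print("B MCC:      ", mcc(tp=8, fp=4, tn=986, fn=2))
```

In this sketch both models reach at least 99% accuracy, but only recall, F1-score, and MCC reveal that model A detects no fraud at all.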

In addition, 30 of the 181 studies employed only a single performance metric: accuracy (24 articles), MCC (five articles), or execution time (one article). A single performance metric is insufficient for determining the quality of an ML/DL model. In contrast, articles such as 43 and 74 used more than five performance indicators to represent the performance of their ML/DL models. A number of the reviewed articles also report computational performance measurements alongside the performance metrics. Execution time is the length of time the model takes to complete the assigned task; it is calculated to ascertain how long the model takes to detect fraud, which helps confirm that the model meets its goal in practice. Execution time was reported in Alghofaili, Albattah & Rassam (2020), Devi, Thangavel & Anbhazhagan (2019), and Singh, Ranjan & Tiwari (2021). The loss function compares the actual and expected training outputs to guide learning; the loss rate was reported in Alghofaili, Albattah & Rassam (2020). Almhaithawi, Jafar & Aljnidi (2020) tested the effect of cost-sensitive wrapping with Bayes minimum risk (BMR) as a cost-saving measure. Balanced accuracy (BCR) combines sensitivity and specificity to produce a balanced outcome; it was presented in Layek (2020). In Arun & Venkatachalapathy (2020), Kappa assesses the prediction performance of the classifier model. A few articles (Arya & Sastry, 2020; Bandyopadhyay et al., 2021; Bandyopadhyay & Dutta, 2020; Benchaji, Douzi & El Ouahidi, 2021; Rezapour, 2019) used the mean square error (MSE), mean absolute error (MAE), and root mean square error (RMSE) as assessment metrics. Table C1 shows the proposed ML/DL models along with their performance and datasets.
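
The less common measures mentioned above can be sketched in a few lines of Python. The snippet assumes the standard textbook definitions: balanced accuracy as the mean of sensitivity and specificity, Cohen's kappa as agreement corrected for chance, MSE/MAE/RMSE over predicted scores, and wall-clock execution time. The helper names and inputs are hypothetical and are not taken from the cited studies.

```python
import math
import time


def balanced_accuracy(tp, fp, tn, fn):
    """Mean of sensitivity and specificity (BCR)."""
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return (sens + spec) / 2


def cohen_kappa(tp, fp, tn, fn):
    """Observed agreement corrected for the agreement expected by chance."""
    n = tp + fp + tn + fn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return (observed - expected) / (1 - expected) if expected != 1 else 0.0


def error_metrics(y_true, y_score):
    """MSE, MAE, and RMSE between true labels and predicted scores."""
    errors = [t - s for t, s in zip(y_true, y_score)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return {"mse": mse, "mae": mae, "rmse": math.sqrt(mse)}


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Hypothetical counts: 8 of 10 fraud cases caught, 4 false alarms.
print("BCR:  ", balanced_accuracy(tp=8, fp=4, tn=986, fn=2))
print("kappa:", cohen_kappa(tp=8, fp=4, tn=986, fn=2))
print(error_metrics([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.1]))
_, seconds = timed(balanced_accuracy, 8, 4, 986, 2)
print("execution time (s):", seconds)
```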

Trend of research

To answer RQ4, we examined the trend of the reviewed articles. We also compared the models created over the three years to determine which techniques have recently garnered more attention. This also helps to identify gaps that future research can address. First, we examined the distribution of the selected articles by publication year: 47 articles in 2019, 70 in 2020, and 64 in 2021. The number of published articles on credit card fraud detection rose markedly between 2019 and 2020 (by 23 articles), whereas the difference between 2020 and 2021 was small (six articles fewer). Fig. 2 illustrates this comparison.

In response to RQ1, we showed that researchers have used 110 distinct ML models, 34 distinct DL models, and 39 models that combine ML and DL. RF, LR, and SVM are the most commonly employed ML approaches, and ANN, autoencoders (AE), and LSTM are the most used DL approaches. In addition, we observed increased interest in combining ML and DL models.

To answer RQ2, we counted the learning-based credit card cyber fraud detection techniques applied in the reviewed articles. Supervised algorithms were the most common, applied in 74% of the reviewed articles. A further 12% used unsupervised techniques, 12% used both supervised and unsupervised techniques, 2% applied semi-supervised techniques, and 1% used reinforcement learning. For RQ3, we listed the performance metrics that each article applied and found that 24 of the 181 reviewed articles used accuracy as their only performance metric. We also identified the datasets used in the reviewed articles. The majority relied on real-world datasets: 154 articles applied real-world data, eight used synthetic data, and 19 did not mention the source.

For RQ4, we identified research gaps by investigating unexplored or infrequently studied algorithms. We found supervised learning to be the most prevalent learning technique and SMOTE the most prevalent oversampling technique. The majority of researchers focused on supervised techniques such as LR, RF, SVM, and NN.
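
To make the oversampling idea concrete, the following Python sketch shows the core step of SMOTE: a synthetic minority (fraud) sample is created by interpolating between an existing minority sample and one of its nearest minority neighbours. This is a simplified illustration under stated assumptions (the base sample is drawn at random, distances are Euclidean, and the feature values are hypothetical); it is not the implementation used in any of the reviewed articles.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and one of their k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sq_dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical fraud samples with two scaled features.
fraud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
for point in smote(fraud, n_new=5):
    print(point)
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay within the region already occupied by the fraud class.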

Combination techniques that employ multiple algorithms are becoming increasingly prevalent in the detection of cyber fraud, and detecting credit card cyber fraud increasingly involves DL. DL techniques were used 34 times in the reviewed articles, and a further 39 articles applied a combination of DL and ML techniques for credit card cyber fraud detection. DL is advantageous for fraud detection because it can recognise unexpected and sophisticated fraud patterns. Moreover, DL may be effective given that the number of fraud cases to be recognised is relatively limited. DL has recently garnered the most attention and had the most success in combating cyberthreats, owing to its ability to minimise overfitting, to discover underlying fraud tendencies, and to handle massive datasets.

For supervised learning algorithms to predict future credit card transactions, each observation must have a label. When no labels are available for these observations, this is a problem for identifying fraudulent transactions. Additionally, since fraudsters constantly alter their behaviour, it is challenging to develop a supervised learning model for a given transaction. Unsupervised algorithms often need labels only for the normal class, and they can predict future observations based on deviations from the normal data. Future research should give more attention to unsupervised and semi-supervised techniques, which can yield new insights and improve ML model performance. Further research on DL techniques such as CNN, RNN, and LSTM is also needed. Because balanced datasets are unavailable and datasets in general are scarce, financial institutions are encouraged to make the essential datasets available so that research outputs will be more effective and of higher quality.
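The idea of predicting from deviations from the normal data can be illustrated with a minimal sketch. It is not a method taken from any reviewed article: the function names, the single `amount` feature, the sample values and the three-standard-deviation threshold are all assumptions chosen for illustration.

```python
from statistics import mean, stdev

def fit_normal_profile(amounts):
    """Summarise legitimate (normal-class) transactions by mean and standard deviation."""
    return mean(amounts), stdev(amounts)

def is_anomalous(amount, profile, threshold=3.0):
    """Flag a transaction whose amount deviates strongly from the normal profile.

    threshold is a hypothetical cut-off in standard deviations.
    """
    mu, sigma = profile
    return abs(amount - mu) / sigma > threshold

# Hypothetical legitimate transaction amounts; only the normal class is needed.
normal_amounts = [12.5, 40.0, 23.9, 18.2, 35.0, 27.4, 22.1, 30.8]
profile = fit_normal_profile(normal_amounts)

# Score two new, unlabelled transactions.
flags = [is_anomalous(a, profile) for a in [25.0, 950.0]]
```

Techniques named in the review, such as isolation forest or one-class SVM, follow the same pattern of learning the normal data and scoring deviations, but with far richer models than this one-feature profile.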

To detect credit card cyber fraud, the reviewed articles applied supervised, unsupervised, and semi-supervised ML/DL techniques. Figure 4 shows that 74% of the reviewed articles used supervised techniques, making this the most common approach. In addition, according to the reviewed articles, classification and regression techniques have always been of interest. On the other hand, 12% of the selected articles applied unsupervised techniques, 12% applied both supervised and unsupervised techniques, 2% applied semi-supervised techniques, and 1% applied reinforcement learning. A growing trend in this field is the use of ensemble techniques that capitalise on the benefits of several classification methods. The use of ensemble methods increased in 2020 and 2021 compared with 2019. Another interesting finding is that DL approaches attracted considerable interest from 2019 to 2021. The number of articles that used DL techniques, either alone or combined with other ML techniques, was 15 in 2019, 30 in 2020, and 28 in 2021. It appears that the popularity of DL algorithms has increased.

The number of countries publishing research on ML/DL techniques for detecting credit card cyber fraud is growing over time. In 2021, Ghana, Romania, Taiwan, and Vietnam were among the new countries contributing to cyber fraud detection research. India is the pioneer in the publication of ML/DL studies. Figure 5 depicts the number of articles published by country and year (2019, 2020, and 2021).

Gap analysis and the future direction

The most effective way to determine the approaches best suited to this research problem is to categorise the ML/DL algorithms used to detect credit card cyber fraud. It is also useful to determine why particular techniques were chosen. Supervised algorithms have always been of interest: 74% of the reviewed articles used supervised algorithms, most commonly RF, followed by LR and SVM. Unsupervised learning algorithms were applied in 12% of the articles, most commonly isolation forest. It is notable that only 12% of the 181 reviewed studies used unsupervised learning techniques, and a semi-supervised approach was employed in just 2% of the reviewed articles. Semi-supervised and unsupervised learning techniques therefore merit further research. According to reviewed articles ( Choubey & Gautam, 2020 ; More et al., 2021 ; Muaz, Jayabalan & Thiruchelvam, 2020 ; Shirgave et al., 2019 ), unsupervised or semi-supervised learning techniques such as one-class SVM, isolation forest, and K-means clustering should be used more in credit card fraud detection.

Over the three years reviewed, DL techniques were examined increasingly frequently, with the aim of achieving greater accuracy and more efficient performance. By applying DL techniques, new fraudulent patterns can be recognised and the system can respond flexibly to complex data patterns. Thus, for efficient credit card fraud detection, researchers are encouraged to conduct additional studies on DL techniques. Several studies ( Benchaji, Douzi & El Ouahidi, 2021 ; Jonnalagadda, Gupta & Sen, 2019 ; Kalid et al., 2020 ) suggested further study of DL techniques for credit card fraud detection. Moreover, as each ML/DL technique has its own limitations, it is necessary to consider combining ML and DL algorithms for promising detection results. Several articles ( Agarwal, 2021 ; Dang et al., 2021 ; Gamini et al., 2021 ; Kalid et al., 2020 ; Singh & Jain, 2019 ) suggested combining DL methods with traditional ML methods to detect cyber fraud in unbalanced data and enhance accuracy.

Several reviewed articles cited the lack of datasets as a limitation of their work. According to Meenu et al. (2020) , research outcomes will be more effective and of higher quality if financial institutions make crucial datasets of various fraudulent actions available. One of the key problems in many studies is therefore the lack of data. Limitations on data availability could be overcome if a dataset of diverse fraudulent activities across nations were available. Maniraj et al. (2019) noted that algorithm precision increases as dataset size increases. It appears that adding data is likely to increase a model’s ability to detect fraud and decrease the number of false positives. The banks themselves must formally support this. Seera et al. (2021) proposed further evaluating their model with real data from diverse regions.

Additionally, the datasets are significantly skewed, which is a problem. Numerous studies attempted to develop a model that performs properly on highly skewed data. In several articles ( Balne, Singh & Yada, 2020 ; Ojugo & Nwankwo, 2021 ; Shekar & Ramakrisha, 2021 ; Voican, 2021 ; Vengatesan et al., 2020 ), unbalanced data was used, and balancing the dataset with sampling techniques such as oversampling or undersampling was left as future work. Several articles ( Ahirwar, Sharma & Bano, 2020 ; Almhaithawi, Jafar & Aljnidi, 2020 ; Manlangit, Azam & Shanmugam, 2019 ) applied oversampling techniques.

Undersampling techniques were applied in several articles ( Amusan et al., 2021 ; Ata & Hazim, 2020 ; Muaz, Jayabalan & Thiruchelvam, 2020 ; Rezapour, 2019 ; Zhang, Bhandari & Black, 2020 ). In Amusan et al. (2021) , a random undersampling technique was used, and the study recommended that other data balancing techniques be explored. One reviewed article ( Ata & Hazim, 2020 ) applied an undersampling technique but recommended applying the proposed model to a massive dataset instead of using a sampling technique. In addition, some articles, such as Trisanto et al. (2021) and Singh, Ranjan & Tiwari (2021) , applied both undersampling and oversampling techniques.

Oversampling techniques such as SMOTE, ADASYN, DBSMOTE, and SMOTE-ENN have been used, as have undersampling techniques such as random undersampling (RUS). In light of this, future studies should consider applying alternative oversampling techniques, such as borderline-SMOTE and borderline oversampling with SVM, as well as undersampling techniques. In addition to fraud location, an algorithm to determine the timing of the fraud is required ( Alghofaili, Albattah & Rassam, 2020 ; Chen & Lai, 2021 ). Furthermore, an algorithm could be developed to predict fraudulent transactions in real time, with the service deployed on various cloud platforms to make it easily accessible and reliable ( Ingole et al., 2021 ).
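The two families of balancing techniques mentioned above can be sketched as follows. This is a simplified illustration, not the implementation used in any reviewed article: `random_undersample` drops majority-class records at random, and `smote_like_oversample` mimics the core SMOTE idea of interpolating between a minority point and one of its nearest minority neighbours. All names, the value of `k` and the toy data are assumptions.

```python
import random

def random_undersample(majority, minority, seed=0):
    """Randomly keep as many majority records as there are minority records (RUS)."""
    rng = random.Random(seed)
    return rng.sample(majority, len(minority)), list(minority)

def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points by interpolating towards a near neighbour."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k minority points closest to base (squared Euclidean distance), excluding base.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic

# Hypothetical two-feature records: many legitimate, few fraudulent.
legit = [(float(i), float(i % 7)) for i in range(50)]
fraud = [(100.0, 3.0), (104.0, 5.0), (98.0, 4.0), (110.0, 6.0)]

kept_legit, kept_fraud = random_undersample(legit, fraud)
new_fraud = smote_like_oversample(fraud, n_new=6)
```

Undersampling discards information from the majority class, while interpolation-based oversampling adds points that were never observed, which is one reason the review suggests comparing several balancing techniques.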

Limitation of the review

Our review is restricted to journal articles published in 2019, 2020, and 2021 that apply ML/DL techniques. By applying our methodology in the early stages, we eliminated several irrelevant articles, which ensured that the selected articles met the requirements of our review. Although we searched the most prominent digital libraries, other digital libraries may contain relevant articles that were not included in this study. To address this limitation, we used the snowballing method to include relevant articles that were excluded during automatic searching. In addition, we may have missed some synonyms when searching for keywords; hence, we also analysed the search terms and keywords of a recognised collection of research works. We restricted our search to English-language articles. This creates a language bias, as there may be articles in this field written in other languages.

Conclusions

This review studied credit card cyber fraud detection using ML/DL techniques. We examined ML/DL models from the perspectives of ML/DL technique type, ML/DL performance estimation, and learning-based fraud detection. The study focused on relevant studies published in 2019, 2020, and 2021. To address the four research questions posed in this study, we reviewed 181 research articles. We have provided a detailed analysis of ML/DL techniques and their function in credit card cyber fraud detection and offered recommendations for selecting the most suitable techniques for detecting cyber fraud. The study also covers research trends, gaps, future directions, and limitations in detecting credit card cyber fraud. We believe that this comprehensive review enables researchers and the banking industry to develop innovative systems for cyber fraud detection.

On the basis of this analysis, we suggest that more research be conducted on semi-supervised and unsupervised learning techniques. We also recommend that DL techniques be further researched for credit card cyber fraud detection. Researchers are encouraged to conduct further research on integrating ML and DL algorithms for effective detection outcomes. In addition, researchers are advised to use both oversampling and undersampling techniques because the datasets are extremely skewed. Furthermore, we recommend that researchers report the dataset sources and performance metrics used to present their outcomes. Banks are also encouraged to make datasets of different fraudulent activities across nations available for further research.

Funding Statement

The authors received no funding for this work.

Additional Information and Declarations

The authors declare that they have no competing interests.

Eyad Abdel Latif Marazqah Btoush conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.

Xujuan Zhou conceived and designed the experiments, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.

Raj Gururajan conceived and designed the experiments, performed the experiments, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.

Ka Ching Chan conceived and designed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.

Rohan Genrich conceived and designed the experiments, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.

Prema Sankaran conceived and designed the experiments, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.

Determinants of consumers’ intention to use credit card: a perspective of multifaceted perceived risk

Asian Journal of Economics and Banking

ISSN : 2615-9821

Article publication date: 24 August 2020

Issue publication date: 18 December 2020

The purpose of this study is to develop a theoretical model for consumer behavioral intention by integrating the technology acceptance model (TAM) and the theory of perceived risk, which is tested on the intended use of credit cards in Vietnam.

Design/methodology/approach

The data were collected from 485 bank customers through a nationwide online survey. Exploratory and confirmatory factor analyses were performed to validate the factor structure of the measurement items, while structural equation modeling was used to validate the proposed model and test the hypotheses.

The results of structural equation modeling reveal that perceived risk, perceived usefulness, social influence and perceived ease of use were significant determinants of consumer intention to use a credit card. Of them, only perceived risk discouraged the intended use of a credit card, which was synthesized from psychological, financial, performance, privacy, time, social and security risk.

Research limitations/implications

This study measured the first-order risk dimensions based on the payment function of the credit card only; these measurements missed potential losses relevant to the credit function of credit cards.

Practical implications

This study can be beneficial to banks enacting policies to attract more consumers and to help decide how to allocate resources to retain and expand their customer base.

Originality/value

The study adds value to the literature on consumer behavior by confirming the impact of second-order perceived risk on the intended use of credit cards, which most previous studies have not demonstrated. The research also provides empirical evidence to the academic research platform on e-banking services in Vietnam, especially in relation to the credit card industry.

  • Perceived risk
  • Behavioral intention
  • Credit card

Trinh, H.N. , Tran, H.H. and Vuong, D.H.Q. (2020), "Determinants of consumers’ intention to use credit card: a perspective of multifaceted perceived risk", Asian Journal of Economics and Banking , Vol. 4 No. 3, pp. 105-120. https://doi.org/10.1108/AJEB-06-2020-0018

Emerald Publishing Limited

Copyright © 2020, Hoang Nam Trinh, Hong Ha Tran and Duc Hoang Quan Vuong.

Published in Asian Journal of Economics and Banking . Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode

1. Introduction

Credit cards, a combination of payment card and personal consumption credit, are widely used around the world. Arising from the relationship between vendors and consumers and the need to buy first and pay later, the first credit cards were issued by Franklin National Bank in New York, USA, in 1951. Year after year, the rapid growth of consumer demand for credit cards exceeded the bank’s responsibility and management capacity. Consequently, many international credit card organizations were established and now operate independently around the world, with six famous brands: American Express, Diners Club, Japan Credit Bureau, Visa, MasterCard and China UnionPay. Banks join these institutions and are licensed to issue and acquire credit cards. To expand the credit card market segment, banks constantly issue cards to new customers and encourage existing customers to use them in daily spending. Given these practical requirements, many researchers are interested in consumers’ intended and actual use of credit cards.

Studies of consumer behavior on credit cards have mainly focused on the decisive role of individual demographic characteristics, credit card attributes and personal perceptions of credit cards. Some authors showed that differences in demographics such as age, gender, occupation and financial status lead to differences in the intention to use credit cards (Dewri et al. , 2016 ; Foscht et al. , 2010 ; Porto and Xiao, 2019 ). Others have confirmed that consumers decide to use credit cards because of their advantages over other payment methods such as cash, e-money or debit cards (Chahal et al. , 2014 ; Ooi and Tan, 2016 ; Qureshi et al. , 2018 ). Assuming consumers are always rational in their behavior (Fishbein and Ajzen, 1975 ), some authors argued that a person decides to use credit cards because of their ability to finance daily expenses effectively (Porto and Xiao, 2019 ; Tan et al. , 2014 ; Trinh and Vuong, 2017 ). Moreover, some empirical studies have highlighted that social groups such as family, friends and colleagues have a significant influence on consumers’ intended use of credit cards (Ali et al. , 2017 ; Amin, 2013 ; Tan et al. , 2014 ; Varaprasad et al. , 2013 ).

Reasonable consumers are interested not only in the benefits of using a credit card but also in the potential losses (Fishbein and Ajzen, 1975 ; Mitchell, 1999 ). Many authors agreed that perceived risk is a major barrier to the intended use of e-services (Roy et al. , 2017 ; Yang et al. , 2015 ). Similarly, perceived risk has been considered a deciding factor for the intention to use credit cards (Nguyen and Cassidy, 2018 ; Tan et al. , 2014 ; Tseng, 2016 ; Varaprasad et al. , 2013 ). However, their outcomes were inconsistent: perceived risk had a significantly negative impact (Nguyen and Cassidy, 2018 ), a significantly positive influence (Varaprasad et al. , 2013 ) or an insignificant effect on consumers’ intended use of credit cards (Tan et al. , 2014 ; Tseng, 2016 ).

As the credit card market becomes more competitive, a better understanding of consumer behavior becomes imperative for banks. Unlike previous research, this study focuses on the impact of perceived risk on the intended use of credit cards. To achieve this goal, the study begins with a brief review of consumer behavior, from which a theoretical model and testable hypotheses are developed, followed by the methodology and the data collected. The findings are then described and discussed before conclusions are drawn and future research directions are suggested.

2. Literature review and proposed theoretical model

2.1 Literature review

Several research frameworks have been developed over the years to explain consumers’ intended and actual behavior. Prominent among them, the theory of perceived risk (TPR) (Bauer, 1960 ) focuses on how consumers’ concerns about potential losses influence their intention in a specific purchase situation. However, consumers are not only risk averse but also rational; they intend to do something when they find the behavior useful or easy to do, or when they are encouraged by influencers. These ideas derive from the theory of reasoned action (Fishbein and Ajzen, 1975 ), the technology acceptance model (TAM) (Davis et al. , 1989 ), the theory of planned behavior (TPB) (Ajzen, 1991 ) and the unified theory of acceptance and use of technology (UTAUT) (Venkatesh et al. , 2003 ). These theories are applied independently or together in many studies on consumers’ intended use of e-services (Alalwan et al. , 2017 ; Liu et al. , 2019 ; Pelaez et al. , 2019 ; Tam and Oliveira, 2017 ).

A credit card is a technology product, used on electronic devices, with two basic functions: payment and credit (Foscht et al. , 2010 ). A credit cardholder can buy first and pay later based on the bank’s commitment (Amin, 2013 ). Accordingly, the issuing bank pays the biller on behalf of the cardholder, who is responsible for repaying in full and on time (Foscht et al. , 2010 ). In modern commerce, credit cards are becoming increasingly important and popular all over the world (Porto and Xiao, 2019 ). Studies on credit cards are conducted and published in prestigious scientific journals, in which perceived risk from TPR, perceived usefulness from TAM/UTAUT, perceived ease of use from TAM/TPB/UTAUT and social influence from TPB/UTAUT are frequently used to predict consumers’ intended use of credit cards. These concepts are briefly described as follows:

Perceived usefulness was proposed as the degree to which a person believes that using a particular system would enhance his/her performance (Davis et al. , 1989 ; Venkatesh et al. , 2003 ). Credit cards are appreciated for non-cash payments and personal consumer credit (Chahal et al. , 2014 ). Consumers prefer credit cards because of the uncertainty of carrying cash (Khare et al. , 2012 ) or because of special discounts from famous brands (Dali et al. , 2015 ). They use credit cards as a source of revolving credit with a long grace period (Chahal et al. , 2014 ; Khare et al. , 2012 ). They can even withdraw cash with credit cards as required (Chahal et al. , 2014 ). As a result, consumers appreciate the performance of credit cards and are more likely to use them for their daily expenses (Amin, 2013 ; Nguyen and Cassidy, 2018 ; Ooi and Tan, 2016 ; Trinh and Vuong, 2017 ; Varaprasad et al. , 2013 ).

Ajzen (1991) and Davis et al. (1989) considered perceived ease of use as the degree to which a person believes that using a particular system would be easy. Ajzen (1991) assumed that this perception is determined by the total set of accessible control beliefs. Qureshi et al. (2018) stated that it is easy for consumers to register for a credit card through a quick and simple procedure. Chahal et al. (2014) and Dali et al. (2015) pointed to the credit card’s non-stop usability on numerous electronic devices. Moreover, the credit card payment process is so simple that cardholders do not need much effort to learn it and use it regularly (Khare et al. , 2012 ). Consequently, many studies have confirmed that consumers appreciate credit cards and tend to use them daily (Ali et al. , 2017 ; Amin, 2013 ; Nguyen and Cassidy, 2018 ; Porto and Xiao, 2019 ; Trinh and Vuong, 2017 ; Tseng, 2016 ).

Social influence refers to the degree to which a consumer perceives that important people believe that he/she should or should not perform a particular behavior (Ajzen, 1991 ; Venkatesh et al. , 2003 ). Consumers cannot help observing and evaluating credit card features, and they feel uncomfortable when their friends and colleagues constantly use and talk about them (Qureshi et al. , 2018 ). Amin (2013) argued that consumers tend to acquire and imitate the financial attitudes and behaviors of family members. Moreover, the media, which are designed specifically to reach a large audience, have contributed to raising consumer awareness of credit cards (Ali et al. , 2017 ). Empirical evidence suggested that social groups’ perspectives may enhance one’s intended use of credit cards (Ali et al. , 2017 ; Amin, 2013 ; Nguyen and Cassidy, 2018 ; Trinh and Vuong, 2017 ; Varaprasad et al. , 2013 ). However, Leong et al. (2013) suggested that social influence affects the intended use of credit cards only indirectly, through perceived usefulness and perceived ease of use.

Perceived risk, from a consumer behavior perspective, refers primarily to consumers’ subjective expectations of losses (Bauer, 1960 ; Featherman and Pavlou, 2003 ). Consumers are granted a credit line to pay their bills, and they must spend considerable time, money and effort to use it safely and effectively (Chahal et al. , 2014 ; Yang et al. , 2015 ). However, their payments are not always successful because of operational breakdowns or system malfunctions (Varaprasad et al. , 2013 ). Meanwhile, losses of personal privacy and system security are serious, and consumers may be held accountable until the authorities clarify the responsibilities of stakeholders (Tan et al. , 2014 ; Tseng, 2016 ). As a result, consumers are less likely to use credit cards when they are deeply concerned about such uncertainty (Nguyen and Cassidy, 2018 ). However, some studies found that users’ credit card adoption does not depend on how they perceive the losses caused by its use (Tan et al. , 2014 ; Tseng, 2016 ). Varaprasad et al. (2013) argued that the bank’s efforts make consumers choose credit cards even if they are afraid of unexpected outcomes caused by this type of payment instrument. Despite some differences, most of these studies share a one-dimensional approach to perceived risk on credit cards. This approach treats perceived risk as a single general perception, defined by several observed variables, and therefore does not reflect consumers’ valuation of the different types of potential losses relevant to credit card use.

2.2 Proposed research model

Based on the above review of consumer behavior and prior studies on the intention to use credit cards, the study proposes a theoretical model of intended behavior that integrates some prominent adoption theories. The model suggests perceived risk, usefulness, ease of use and social influence as explanatory factors to predict consumers’ intended use of credit cards. These constructs and their hypotheses are described below:

Perceived usefulness affects positively the intention to use credit cards.

Consumers are rational: they are interested not only in benefits but also in losses whenever they make a decision, especially for behaviors they cannot see or touch and can only sense in operation. These concerns are referred to as risk perceptions, which were first proposed in TPR (Bauer, 1960 ). Nowadays, this concept has become more serious in the context of e-services, where data are transferred between connected e-devices. Such e-transactions are invisible to consumers, who may face unexpected outcomes, and this may prevent them from performing the behavior. Literature reviews of perceived risk have been conducted in technology adoption, including e-shopping (Pelaez et al. , 2019 ), e-payment (Patil et al. , 2018 ) and e-banking (Mutahar et al. , 2018 ). Among the many approaches to using perceived risk in studies on consumers’ intended use of technology, Featherman and Pavlou (2003) and Hanafizadeh and Khedmatgozar (2012) summarized that perceived risk is situation specific and is considered a second-order factor, commonly formed by performance, financial, social, time, psychological, security and privacy factors ( Table 1 ). This approach has been used in many empirical studies (Martins et al. , 2014 ; Mutahar et al. , 2018 ; Tandon et al. , 2016 ; Yang et al. , 2015 ). As such, this study hypothesizes that:

Perceived risk is a second-order construct of seven first-order risks, including financial, performance, psychological, social, time, security and privacy risk.

Financial, performance, psychological, social, time, security and privacy risk perceptions are positively related to perceived risk.

Perceived risk affects negatively perceived usefulness on credit cards.

Perceived risk affects negatively the intention to use credit cards.

Perceived ease of use affects positively perceived usefulness on credit cards.

Perceived ease of use affects positively the intention to use a credit card.

Social influence affects positively perceived usefulness on credit card.

Social influence affects positively intended use of credit card.

Based upon above discussions, a theoretical model is developed to predict consumer intended use of credit cards with four explanatory factors, including perceived usefulness, perceived risk, perceived ease of use and social influence, where perceived risk is a second-order construct related to seven first-order risk dimensions, including financial, performance, social, psychological, time, security and privacy risk ( Figure 1 ).

3. Methodology

The empirical data for this study were obtained through an online survey, whose items were based on our review of prior studies relevant to the proposed theoretical model. Some expressions were customized to fit the context of credit cards. Items were measured on a five-point Likert-type scale ranging from “1 (strongly disagree)” to “5 (strongly agree).” A pre-test was performed with five banking experts with a background in credit cards to ensure that the questionnaire had no semantic problems. Some modifications to content and structure were made based on their feedback. The instrument was then pilot-tested with 15 consumers who had experience using credit cards to pay bills. Only minor changes were made to the wording as a result of these tests. The final questionnaire covers 11 first-order constructs corresponding to the proposed model, with 46 questions ( Table 2 ).

The survey was administered to 724 respondents selected through convenience sampling of Vietnamese bank customers who were potential customers encouraged by the bank to register for and use credit cards. Only 485 responses were valid and usable, yielding a valid response rate of 67% among volunteer participants. With 46 observed variables, the required sample size ranges from 138 to 230 (Cattell, 1978 ). The data from 485 respondents are, therefore, sufficient. Based on the collected data, both exploratory factor analysis and confirmatory factor analysis (CFA) are conducted to select the significant variables and assign them to particular factors (Byrne, 2010 ; Hair et al. , 2014 ). Finally, structural equation modeling is used to build the model of determinants of the intention to use credit cards (Anderson and Gerbing, 1991 ; Byrne, 2010 ).
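The sample-size figures in this paragraph can be reproduced with simple arithmetic. The sketch below assumes that the range of 138 to 230 corresponds to three to five respondents per observed variable; these ratios are inferred from the numbers (46 × 3 and 46 × 5) rather than stated in the text, and the function names are illustrative.

```python
def required_sample_range(n_items, low_ratio=3, high_ratio=5):
    """Return the (minimum, maximum) sample size for a respondents-per-item rule.

    The 3:1 and 5:1 ratios are assumptions inferred from the reported range.
    """
    return n_items * low_ratio, n_items * high_ratio

def response_rate(valid, total):
    """Share of distributed questionnaires that produced a usable response."""
    return valid / total

low, high = required_sample_range(46)   # 46 observed variables
rate = response_rate(485, 724)          # 485 valid out of 724 respondents
sample_is_sufficient = 485 >= high
```

Under these assumptions the 485 valid responses exceed the upper bound of the range, which matches the conclusion drawn in the paragraph.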

4. Findings

4.1 Profile of respondents and intention to use credit cards

The data presented in Table 3 provide demographic details on the gender, marital status, occupation, age and highest level of academic qualification of the respondents. These control variables are considered in this study based on prior studies relevant to consumers’ intended use of credit cards. Prior studies suggested that differences in these demographic characteristics may lead to differences in the intention to use credit cards (Dewri et al. , 2016 ; Porto and Xiao, 2019 ; Qureshi et al. , 2018 ).

In our sample, the majority of respondents are male (51.3%) and married (61.4%), compared with female (48.7%) and single (38.6%). Survey participants are mostly young adults, with 73% below the age of 45. The results also show that 20.5% of respondents have a college education, 44.7% are graduates and the remaining 34.8% are postgraduates. Regarding occupation, the largest proportion of respondents work in public services (30.5%), followed by trading services (26.4%), financial services (25.4%) and industries (15.1%). However, one-way ANOVA tests comparing the mean intention to use credit cards indicate that there is no significant difference between the groups defined by these demographic variables, which is inconsistent with prior studies (Dewri et al. , 2016 ; Porto and Xiao, 2019 ; Qureshi et al. , 2018 ).
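For readers unfamiliar with the test, a one-way ANOVA compares the variation between group means with the variation within groups. The sketch below computes the F statistic by hand; the function name and the Likert-style scores are hypothetical and are not the study's data.

```python
def one_way_anova_f(groups):
    """Return the F statistic: between-group mean square over within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical intention-to-use scores (1 to 5) for three occupation groups.
scores_by_group = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_stat = one_way_anova_f(scores_by_group)
```

Turning the F statistic into a p-value requires the F distribution with the two degrees of freedom; a statistics package such as SciPy (`scipy.stats.f_oneway`) returns both values directly.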

4.2 Factor analysis

Applying exploratory factor analysis to the data collected from the survey questionnaires, 10 factors are extracted from 39 observed variables; PU4, FIR1 and SOR1 are eliminated from the analysis because their factor loadings are less than 0.5 (Hair et al. , 2014 ). These extracted factors are consistent with the proposed model ( Table 4 ). The Kaiser-Meyer-Olkin measure is 0.847 with a statistical significance of 0.000, indicating that the exploratory factor analysis (EFA) of the independent components is appropriate. The total variance extracted is 62.944%, greater than the 50% required by Anderson and Gerbing (1991) . Observed variables for intention to use credit cards (IU) have high loading coefficients (≥0.82) and their variation is well explained (≥78%). Therefore, the measurements are acceptable for CFA ( Byrne, 2010 ).

A CFA is applied to 11 first-order factors with 43 observed variables to examine the model-data fit. The results are as follows: χ²/df = 2.301, comparative fit index (CFI) = 0.915, Tucker-Lewis index (TLI) = 0.904 and root mean square error of approximation (RMSEA) = 0.052 (p = 0.000), so the measurement model is compatible with the data (McDonald and Ho, 2002). Next, convergent validity is achieved because all factor loadings are greater than 0.5 (Table 4) and have significant t-statistics (Anderson and Gerbing, 1991). Moreover, the average variance extracted (AVE) values (Table 4) are between 0.519 and 0.788, which are greater than both 0.5 and the squared correlation coefficients of the corresponding constructs (Table 5), so each construct is distinct and discriminant validity is acceptable (Fornell and Larcker, 1981). Therefore, the CFA results confirm that the 43 observed variables load onto 11 first-order constructs and that the measurements show adequate model-data fit, discriminant validity, unidimensionality, convergent validity and internal consistency reliability.
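The AVE range quoted above can be recovered from the CFA loadings in Table 4. The sketch below assumes that the CFA column holds standardized loadings, in which case AVE is the mean of the squared loadings; the function name is illustrative.

```python
# CFA loadings copied from Table 4 (assumed to be standardized).
cfa_loadings = {
    "IU": [0.877, 0.927, 0.913, 0.831],
    "PU": [0.771, 0.784, 0.637, 0.766, 0.675, 0.674],
}


def average_variance_extracted(loadings):
    """AVE for standardized loadings: the mean of the squared loadings."""
    return sum(value * value for value in loadings) / len(loadings)


ave = {name: average_variance_extracted(vals) for name, vals in cfa_loadings.items()}

# Discriminant validity for one pair: both AVEs should exceed the squared
# correlation between the two constructs (IU-PU correlation from Table 5).
r_iu_pu = 0.478
pair_is_distinct = min(ave["IU"], ave["PU"]) > r_iu_pu ** 2

for name, value in ave.items():
    print(f"AVE({name}) = {value:.3f}")
print("IU and PU are distinct:", pair_is_distinct)
```

IU and PU give the largest and smallest AVE values, which match the reported bounds of 0.788 and 0.519.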

Because the proposed model contains a second-order factor, a further CFA is needed to estimate the relationships between the seven first-order risk dimensions (financial, performance, psychological, social, time, security and privacy risk) and the second-order reflective perceived risk construct in the measurement model. The results are as follows: χ²/df = 2.343, CFI = 0.91, TLI = 0.904 and RMSEA = 0.053 (p = 0.000), so the model fits the data well (McDonald and Ho, 2002). Thus, hypothesis H2 is supported.
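The fit statistics of the two CFA models can be screened against cutoffs. The text cites McDonald and Ho (2002) but does not state which thresholds were applied, so the cutoffs in this sketch (χ²/df ≤ 3, CFI ≥ 0.90, TLI ≥ 0.90, RMSEA ≤ 0.08) are commonly used rules of thumb assumed here for illustration only.

```python
# Rule-of-thumb cutoffs: assumptions for illustration, not taken from the paper.
CUTOFFS = {
    "chi2_df": ("max", 3.0),
    "CFI": ("min", 0.90),
    "TLI": ("min", 0.90),
    "RMSEA": ("max", 0.08),
}

# Fit statistics reported in the text for the two CFA models.
models = {
    "first-order CFA": {"chi2_df": 2.301, "CFI": 0.915, "TLI": 0.904, "RMSEA": 0.052},
    "second-order CFA": {"chi2_df": 2.343, "CFI": 0.910, "TLI": 0.904, "RMSEA": 0.053},
}


def passes(index, value):
    """Return True when a fit index is on the acceptable side of its cutoff."""
    kind, cutoff = CUTOFFS[index]
    return value <= cutoff if kind == "max" else value >= cutoff


report = {
    name: {index: passes(index, value) for index, value in stats.items()}
    for name, stats in models.items()
}

for name, checks in report.items():
    print(name, "->", "all pass" if all(checks.values()) else checks)
```

Stricter cutoffs are sometimes recommended (for example CFI ≥ 0.95), under which these models would not pass, so the choice of threshold matters when reading "fits the data well".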

4.3 Structural equation modeling

A structural equation model (SEM) is estimated to test the proposed model with three independent constructs (social influence, perceived ease of use and perceived risk) and two dependent constructs (perceived usefulness and intention to use credit cards), which are measured by the 43 observed variables described in the factor analyses above. Figure 2 shows the full SEM for the proposed model. All indicators (χ²/df = 2.340, CFI = 0.910, TLI = 0.904 and RMSEA = 0.053) show that the proposed model is appropriate for the data collected from the market (McDonald and Ho, 2002). The SEM results are reported in Table 6. Perceived usefulness, perceived risk, social influence and perceived ease of use account for 50.1% of the variance in intention to use credit cards, with coefficients of 0.320, −0.539, 0.141 and 0.089, respectively. Moreover, perceived risk, social influence and perceived ease of use are determinants of the perceived usefulness of credit cards. Finally, perceived risk on credit cards is a multi-dimensional construct composed of psychological, financial, performance, privacy, time, social and security risk, in decreasing order of contribution. Therefore, all hypotheses are accepted.

5. Discussion

The purpose of this study was to examine the effect of perceived risk on the intended use of credit cards. By integrating popular technology adoption theories, the study assessed the relationships among three exogenous variables (perceived risk, perceived ease of use and social influence) and two endogenous variables (perceived usefulness and behavioral intention). Table 6 and Figure 2 present the results of hypothesis testing for the research model, including the path coefficients and their significance values.

First, perceived risk was defined as the consumer’s subjective expectation of losses arising from credit card use, in line with previous research (Nguyen and Cassidy, 2018; Tan et al., 2014; Tseng, 2016; Varaprasad et al., 2013). The CFA results indicated that perceived risk was a second-order reflective construct related to seven first-order risk dimensions: financial, performance, psychological, social, time, security and privacy risk. This finding distinguishes the study from prior work, in which perceived risk was conceptualized as a one-dimensional construct (Nguyen and Cassidy, 2018; Tan et al., 2014; Varaprasad et al., 2013) or as two one-dimensional constructs (Tseng, 2016). The SEM analysis showed that the psychological risk (PSR) dimension had the strongest relationship with perceived risk, followed by financial risk (FIR), performance risk (PER), privacy risk (PRR), time risk (TIR), social risk (SOR) and security risk (SER).

Subsequently, perceived risk was found to have a negative effect on intended use, with the largest impact of any factor (β = −0.539), almost equal in magnitude to the combined impact of the three remaining factors in the model. This finding contributes to the TPR (Bauer, 1960) by confirming the negative impact of perceived risk in behavioral research on credit cards, which Tan et al. (2014), Tseng (2016) and Varaprasad et al. (2013) could not. Furthermore, the effect was stronger than that reported by Nguyen and Cassidy (2018), whose impact level was −0.18. The results also confirmed a significant relationship between perceived risk and perceived usefulness, which Nguyen and Cassidy (2018), Tan et al. (2014) and Varaprasad et al. (2013) did not examine and Tseng (2016) failed to establish. These findings distinguish the present study from previous work.

Finally, the SEM analysis confirmed the relationships among perceived usefulness, perceived ease of use, social influence and behavioral intention. Perceived ease of use and social influence had positive impacts on both perceived usefulness (β_EOU = 0.428, β_SI = 0.218) and intended use (β_EOU = 0.089, β_SI = 0.141). In turn, perceived usefulness also affected the intention to use. Thus, this study supported all hypotheses related to perceived usefulness, perceived ease of use and social influence. These findings were consistent with prior studies (Leong et al., 2013; Nguyen and Cassidy, 2018; Tan et al., 2014).
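Because perceived usefulness mediates part of each predictor's influence, the path coefficients in Table 6 can be combined into direct, indirect and total effects on intention to use. The sketch below is an illustration only: the source does not report these totals, and it assumes that the estimates are standardized and that the indirect effect is the product of the two path coefficients. It also checks the comparison made in the discussion between perceived risk and the other three factors.

```python
# Path coefficients copied from Table 6 (assumed to be standardized).
to_pu = {"PR": -0.103, "EOU": 0.428, "SI": 0.218}                # predictor -> PU
to_iu = {"PU": 0.320, "PR": -0.539, "EOU": 0.089, "SI": 0.141}   # predictor -> IU


def effects(predictor):
    """Return (direct, indirect via PU, total) effects of a predictor on IU."""
    direct = to_iu[predictor]
    indirect = to_pu[predictor] * to_iu["PU"]
    return direct, indirect, direct + indirect


for name in to_pu:
    direct, indirect, total = effects(name)
    print(f"{name:<4} direct={direct:+.3f} indirect={indirect:+.3f} total={total:+.3f}")

# Combined magnitude of the three direct effects other than perceived risk.
others = sum(abs(value) for key, value in to_iu.items() if key != "PR")
print(f"|PR -> IU| = {abs(to_iu['PR']):.3f} vs. sum of the others = {others:.3f}")
```

The direct effect of perceived risk (0.539 in magnitude) is indeed close to the combined 0.550 of the other three direct effects, and the indirect paths through perceived usefulness are small by comparison.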

6. Conclusions

This study is a pioneering effort in the context of credit card adoption: it proposes a theoretical model of the factors affecting consumer intention to use credit cards, combining perceived risk from TPR (Bauer, 1960) with perceived usefulness, perceived ease of use and social influence from TRA, TAM, TPB and UTAUT. Based on data collected from 485 bank customers, the study reveals that perceived risk is a reflective second-order factor related to seven first-order risk dimensions: psychological, financial, performance, privacy, time, social and security risk. The results show that the intended use of credit cards is affected most by perceived risk, followed by perceived usefulness, social influence and perceived ease of use, in decreasing order of impact. All of these factors except perceived risk encourage consumers to use credit cards. Moreover, perceived risk, perceived ease of use and social influence are antecedents of the perceived usefulness of credit cards.

This study makes both theoretical and practical contributions. First, it conceptualizes perceived risk as a reflective second-order construct, modeled and decomposed into seven first-order risk dimensions: psychological, financial, performance, privacy, time, social and security risk. Second, it contributes to the literature on consumer behavior by confirming the impact of perceived risk on the intended use of credit cards, which most previous studies have not demonstrated. Finally, the findings provide empirical evidence for academic research on e-banking services in Vietnam, especially in relation to the credit card industry.

This study can help banks design policies to attract more consumers and decide how to allocate resources to retain and expand their customer base. Knowing the factors that influence consumers’ intended use of credit cards, banks can encourage consumers to own credit cards and use them to pay for goods and services. As the findings imply, banks should focus their resources on addressing the risk aspects, which can help motivate potential consumers. Banks should communicate that credit cards are not a risky service by providing positive reviews at points of sale or in mass media. Publicizing loss protection policies and service-level agreements may reduce potential performance or financial losses. Additional risk-prevention policies may include money-back guarantees, so that consumers feel more comfortable and safe with the system. In addition, banks can exploit the positive impact of perceived usefulness, perceived ease of use and social influence on credit card acceptance when framing or refining transactional procedures and related services. In a constantly changing business world, banks and related stakeholders should add more useful features and services to credit cards and simplify the procedures for paying by credit card. Consumers will then be more ready to accept the offers made by credit card issuers and to encourage others to use credit cards.

Although this study provides substantive explanations of perceived risk and its effect on consumer intention to use credit cards, it has several limitations. First, the first-order risk dimensions were measured with respect to the payment function of the credit card only; these measurements miss potential losses related to the credit function. Second, the study focused on perceived risk and other factors as antecedents of the intention to use credit cards, while these relationships might be moderated by age, gender, experience, etc. Finally, the empirical data were collected randomly from Vietnamese bank customers only; this limited sample may reduce the accuracy and explanatory power of the proposed theoretical model. Future studies may therefore conduct a multi-national survey covering both the payment and credit functions of credit cards and integrate suitable moderators into the proposed model to address these shortcomings.

Figure 1. Proposed theoretical model

Figure 2. Proposed research model and the result of SEM

Table 1. Multi-dimensional perceived risk

| Dimension of perceived risk | Definition |
|---|---|
| FIR | Potential financial losses due to purchasing a subscription to a poorly performing e-service or potential internet-based fraud |
| PER | Potential performance problems, malfunctioning, transaction processing errors, reliability and/or security problems, and therefore, not performing as expected |
| SOR | Potential losses to their perceived status in their social group as a result of using an e-service |
| PSR | Potential losses to their self-esteem, peace of mind or self-perception (ego) due to worrying, feeling frustrated, foolish or stressful as a result of using an e-service |
| TIR | Potential losses to convenience, time and effort caused by wasting time researching, purchasing, setting up, switching to and learning how to use the e-service |
| SER | Potential losses involving transmitting sensitive data through e-services that breach technological data protection |
| PRR | Potential losses to the privacy and confidentiality of their personally identifying information and that e-service usage exposes them to potential identity theft |

| Constructs | No. of items | Sources |
|---|---|---|
| Perceived usefulness (PU) | 7 | |
| FIR | 4 | |
| PER | 4 | (2015) |
| SOR | 4 | (2015) |
| PSR | 3 | (2015) |
| TIR | 3 | (2015) |
| SER | 4 | |
| PRR | 4 | |
| Perceived ease of use (EOU) | 5 | |
| Social influence (SI) | 4 | |
| Intention to use credit card (IU) | 4 | |

Table 3. Descriptive statistics and mean comparative analysis

| Variable | Freq. | (%) | Mean |
|---|---|---|---|
| Female | 236 | 48.7 | 3.72 |
| Male | 249 | 51.3 | 3.62 |
| Under 35 | 207 | 42.7 | 3.73 |
| From 35 to 45 | 147 | 30.3 | 3.71 |
| Above 45 | 131 | 27.0 | 3.68 |
| Under 500 | 89 | 18.4 | 3.70 |
| 500–900 | 208 | 42.9 | 3.66 |
| 900–1,600 | 131 | 27.0 | 3.61 |
| 1,600–3,200 | 46 | 9.4 | 3.74 |
| Above 3,200 | 11 | 2.3 | 4.18 |
| Single | 187 | 38.6 | 3.65 |
| Married | 298 | 61.4 | 3.68 |
| College and lower | 99 | 20.5 | 3.65 |
| Graduated | 217 | 44.7 | 3.71 |
| Higher graduated | 169 | 34.8 | 3.70 |
| Industries | 73 | 15.1 | 3.62 |
| Trading services | 128 | 26.4 | 3.66 |
| Financial services | 123 | 25.4 | 3.76 |
| Public services | 148 | 30.5 | 3.68 |
| Other | 13 | 2.6 | 3.31 |

Table 4. Factor analysis

| Construct | EFA loading | CFA loading | Item-total correlation |
|---|---|---|---|
| PU1. Purchase without carrying cash | 0.719 | 0.771 | 0.680 |
| PU2. Buy first and repay later | 0.844 | 0.784 | 0.714 |
| PU3. Pay the bill | 0.593 | 0.637 | 0.590 |
| PU4. Cash withdraw at ATM | | | |
| PU5. Installment purchase | 0.774 | 0.766 | 0.722 |
| PU6. Free of interest for up to 45 days | 0.656 | 0.675 | 0.608 |
| PU7. Revolving credit | 0.635 | 0.674 | 0.618 |
| EOU1. Simple registration | 0.699 | 0.684 | 0.650 |
| EOU2. Use credit card easily | 0.854 | 0.839 | 0.775 |
| EOU3. Learn to use easily | 0.927 | 0.913 | 0.810 |
| EOU4. Ease to use | 0.825 | 0.827 | 0.739 |
| EOU5. Use everywhere and every time | 0.549 | 0.581 | 0.555 |
| SI1. Family | 0.736 | 0.724 | 0.659 |
| SI2. Friends | 0.762 | 0.791 | 0.708 |
| SI3. Colleagues | 0.794 | 0.791 | 0.717 |
| SI4. Multi-media | 0.759 | 0.772 | 0.691 |
| SER1. Credit card may be copied or counterfeited | 0.860 | 0.844 | 0.794 |
| SER2. Payment via website is unsecured | 0.865 | 0.856 | 0.811 |
| SER3. Payment on ATM/POS is unsecured | 0.826 | 0.847 | 0.799 |
| SER4. Payment systems may be attacked or hacked | 0.848 | 0.856 | 0.811 |
| PRR1. Personal information is collected | 0.883 | 0.884 | 0.836 |
| PRR2. Personal information is shared in internet | 0.903 | 0.88 | 0.837 |
| PRR3. Personal information is used illegally | 0.852 | 0.842 | 0.806 |
| PRR4. Personal information is hijacked | 0.879 | 0.887 | 0.844 |
| PER1. Unusable due to technical errors | 0.645 | 0.727 | 0.621 |
| PER2. Insatiable my spending needs | 0.826 | 0.757 | 0.668 |
| PER3. Do not help me control spending | 0.717 | 0.701 | 0.624 |
| PER4. Not well-performed as advertised | 0.664 | 0.707 | 0.617 |
| FIR1. It will cost me money to use credit card | | | |
| FIR2. Lose by my typing mistakes | 0.668 | 0.7 | 0.614 |
| FIR3. Lose by others’ unlawful activity | 0.820 | 0.799 | 0.702 |
| FIR4. There is no compensation for lost money | 0.786 | 0.806 | 0.679 |
| TIR1. It takes time to learn how to use | 0.848 | 0.844 | 0.730 |
| TIR2. It takes time to perform transactions | 0.771 | 0.725 | 0.645 |
| TIR3. It takes time to solve problems | 0.737 | 0.795 | 0.677 |
| SOR1. My relatives discourage me | | | |
| SOR2. I am judged negatively by others | 0.831 | 0.833 | 0.754 |
| SOR3. I look foolish to others | 0.881 | 0.879 | 0.790 |
| SOR4. No direct support from service providers | 0.794 | 0.803 | 0.741 |
| PSR1. I feel anxious | 0.715 | 0.694 | 0.575 |
| PSR2. I feel frustrated | 0.737 | 0.881 | 0.666 |
| PSR3. I feel depressed | 0.626 | 0.564 | 0.493 |
| IU1. I am desire to use | 0.867 | 0.877 | 0.829 |
| IU2. I plan to use | 0.930 | 0.927 | 0.884 |
| IU3. I use it as soon as possible | 0.922 | 0.913 | 0.879 |
| IU4. I will use it usually in the future | 0.825 | 0.831 | 0.797 |

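Table 5. Correlation coefficients matrix

| | TIR | PU | PRI | SEC | EOU | SOR | IU | FIR | SI | PER | PSR |
|---|---|---|---|---|---|---|---|---|---|---|---|
| TIR | 0.786 | | | | | | | | | | |
| PU | −0.040 | 0.719 | | | | | | | | | |
| PRI | 0.341 | 0.007 | 0.873 | | | | | | | | |
| SEC | 0.192 | −0.084 | 0.224 | 0.853 | | | | | | | |
| EOU | −0.054 | 0.479 | 0.115 | −0.041 | 0.777 | | | | | | |
| SOR | 0.273 | −0.157 | 0.066 | 0.131 | −0.173 | 0.839 | | | | | |
| IU | −0.262 | 0.478 | −0.294 | −0.199 | 0.295 | −0.351 | 0.888 | | | | |
| FIR | 0.334 | −0.056 | 0.409 | 0.132 | 0.049 | 0.188 | −0.345 | 0.774 | | | |
| SI | −0.059 | 0.350 | −0.148 | −0.163 | 0.324 | −0.135 | 0.348 | −0.054 | 0.770 | | |
| PER | 0.282 | −0.154 | 0.382 | 0.300 | 0.003 | 0.158 | −0.406 | 0.320 | −0.148 | 0.723 | |
| PSR | 0.428 | −0.100 | 0.402 | 0.271 | −0.085 | 0.344 | −0.396 | 0.461 | −0.110 | 0.427 | 0.725 |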
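The discriminant validity claim can be checked directly against the correlation matrix. The sketch below applies the Fornell and Larcker (1981) comparison under one assumption: the diagonal entries are the square roots of the AVE values. The table does not say so, but 0.888² ≈ 0.788 matches the largest reported AVE. The helper names are illustrative.

```python
# Lower triangle of the correlation matrix, row by row, as printed.
labels = ["TIR", "PU", "PRI", "SEC", "EOU", "SOR", "IU", "FIR", "SI", "PER", "PSR"]
lower = [
    [0.786],
    [-0.040, 0.719],
    [0.341, 0.007, 0.873],
    [0.192, -0.084, 0.224, 0.853],
    [-0.054, 0.479, 0.115, -0.041, 0.777],
    [0.273, -0.157, 0.066, 0.131, -0.173, 0.839],
    [-0.262, 0.478, -0.294, -0.199, 0.295, -0.351, 0.888],
    [0.334, -0.056, 0.409, 0.132, 0.049, 0.188, -0.345, 0.774],
    [-0.059, 0.350, -0.148, -0.163, 0.324, -0.135, 0.348, -0.054, 0.770],
    [0.282, -0.154, 0.382, 0.300, 0.003, 0.158, -0.406, 0.320, -0.148, 0.723],
    [0.428, -0.100, 0.402, 0.271, -0.085, 0.344, -0.396, 0.461, -0.110, 0.427, 0.725],
]


def corr(i, j):
    """Correlation between constructs i and j, read from the lower triangle."""
    return lower[max(i, j)][min(i, j)]


def fornell_larcker():
    """Map each construct to (diagonal, largest |r| with another construct, passes)."""
    results = {}
    for i, name in enumerate(labels):
        diagonal = lower[i][i]
        largest = max(abs(corr(i, j)) for j in range(len(labels)) if j != i)
        results[name] = (diagonal, largest, diagonal > largest)
    return results


check = fornell_larcker()
for name, (diagonal, largest, ok) in check.items():
    print(f"{name:<4} diagonal={diagonal:.3f} max|r|={largest:.3f} ok={ok}")
```

Every diagonal entry exceeds the largest correlation in its row and column (the largest being 0.479 between PU and EOU), which agrees with the conclusion that the constructs are distinct.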

Table 6. Results of the structural equation model

| Hypothesis | Relationship | Estimate | S.E. | C.R. | p | Result |
|---|---|---|---|---|---|---|
| | PU → IU | 0.320 | 0.048 | 6.359 | *** | Accepted |
| | FIR ← PR | 0.609 | | | | Accepted |
| | PER ← PR | 0.590 | 0.126 | 7.360 | *** | Accepted |
| | PSR ← PR | 0.707 | 0.145 | 7.414 | *** | Accepted |
| | SOR ← PR | 0.392 | 0.141 | 7.698 | *** | Accepted |
| | TIR ← PR | 0.553 | 0.112 | 5.979 | *** | Accepted |
| | SER ← PR | 0.340 | 0.125 | 5.478 | *** | Accepted |
| | PRR ← PR | 0.569 | 0.152 | 7.838 | *** | Accepted |
| | PR → PU | −0.103 | 0.071 | −1.951 | 0.051 | Accepted |
| | PR → IU | −0.539 | 0.087 | −7.934 | *** | Accepted |
| | EOU → PU | 0.428 | 0.047 | 8.080 | *** | Accepted |
| | EOU → IU | 0.089 | 0.038 | 1.987 | 0.047 | Accepted |
| | SI → PU | 0.218 | 0.047 | 4.434 | *** | Accepted |
| | SI → IU | 0.141 | 0.038 | 3.327 | *** | Accepted |
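The two numeric p values in the table above can be reproduced from the critical ratios. The sketch assumes that each critical ratio is treated as a z statistic with a two-tailed test, and that "***" stands for p < 0.001; neither convention is stated in the source.

```python
from math import erfc, sqrt


def two_tailed_p(z):
    """Two-tailed p value of a z statistic under the standard normal distribution."""
    return erfc(abs(z) / sqrt(2))


# Critical ratios copied from the SEM results table.
critical_ratios = {"PR -> PU": -1.951, "EOU -> IU": 1.987, "SI -> IU": 3.327}
p_values = {path: two_tailed_p(cr) for path, cr in critical_ratios.items()}

for path, p in p_values.items():
    shown = "***" if p < 0.001 else f"{p:.3f}"  # assumed meaning of "***"
    print(f"{path:<10} p = {shown}")
```

Under these assumptions the PR → PU path gives p ≈ 0.051, slightly above the conventional 0.05 level, so its acceptance rests on a marginal result.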

References

Ajzen, I. (1991), “The theory of planned behavior”, Organizational Behavior and Human Decision Processes, Vol. 50 No. 2, pp. 179-211.

Alalwan, A., Dwivedi, Y. and Rana, N. (2017), “Factors influencing adoption of mobile banking by Jordanian bank customers: extending UTAUT2 with trust”, International Journal of Information Management, Vol. 37 No. 3, pp. 99-110, doi: 10.1016/j.ijinfomgt.2017.01.002.

Ali, M., Raza, S. and Puah, C. (2017), “Factors affecting to select Islamic credit cards in Pakistan: the TRA model”, Journal of Islamic Marketing, Vol. 8 No. 3, pp. 330-344, doi: 10.1108/JIMA-06-2015-0043.

Amin, H. (2013), “Factors influencing Malaysian bank customers to choose Islamic credit cards: empirical evidence from the TRA model”, Journal of Islamic Marketing, Vol. 4 No. 3, pp. 245-263, doi: 10.1108/JIMA-02-2012-0013.

Anderson, J.C. and Gerbing, D.W. (1991), “Predicting the performance of measures in a confirmatory factor analysis with a pretest assessment of their substantive validities”, Journal of Applied Psychology, Vol. 76 No. 5, pp. 732-740.

Bauer, R.A. (1960), Consumer Behavior as Risk Taking, Dynamic Marketing for a Changing World, American Marketing Association, Chicago.

Byrne, B.M. (2010), Structural Equation Modeling with Amos: Basic Concepts, Applications and Programming, 2nd ed., Taylor and Francis Group, New Jersey.

Cao, Q. and Niu, X. (2019), “Integrating context-awareness and UTAUT to explain Alipay user adoption”, International Journal of Industrial Ergonomics, Vol. 69, pp. 9-13, doi: 10.1016/j.ergon.2018.09.004.

Cattell, R.B. (1978), “Matched determiners vs factor invariance: a reply to Korth”, Multivariate Behavioral Research, Vol. 13 No. 4, pp. 431-448.

Chahal, H., Sahi, G. and Rani, A. (2014), “Moderating role of perceived risk in credit card usage and experience link”, Journal of Indian Business Research, Vol. 6 No. 4, pp. 286-308, doi: 10.1108/JIBR-06-2014-0034.

Chhonker, M., Verma, D. and Kar, A. (2017), “Review of technology adoption frameworks in mobile commerce”, Procedia Computer Science, Vol. 122, pp. 888-895.

Dali, N., Yousafzai, S. and Hamid, H. (2015), “Credit cards preferences of Islamic and conventional credit card”, Journal of Islamic Marketing, Vol. 6 No. 1, pp. 72-94, doi: 10.1108/JIMA-05-2013-0039.

Davis, F.D., Bagozzi, R.P. and Warshaw, P.R. (1989), “User acceptance of computer technology: a comparison of two theoretical models”, Management Science, Vol. 35 No. 8, pp. 982-1003.

Dewri, L., Islam, R. and Saha, N. (2016), “Behavioral analysis of credit card users in a developing country: a case of Bangladesh”, International Journal of Business and Management, Vol. 11 No. 4, p. 299, doi: 10.5539/ijbm.v11n4p299.

Featherman, M.S. and Pavlou, P.A. (2003), “Predicting e-services adoption: a perceived risk facets perspective”, International Journal of Human-Computer Studies, Vol. 59 No. 4, pp. 451-474, doi: 10.1016/S1071-5819(03)00111-3.

Fishbein, M. and Ajzen, I. (1975), Belief, Attitude, Intention and Behavior: An Introduction to Theory and Research, Addison-Wesley, Reading.

Fornell, C. and Larcker, D.F. (1981), “Evaluating structural equation models with unobservable variables and measurement error”, Journal of Marketing Research, Vol. 18 No. 1, pp. 39-50.

Foscht, T., Maloles, C., Swoboda, B. and Chia, S. (2010), “Debit and credit card usage and satisfaction: who uses which and why – evidence from Austria”, International Journal of Bank Marketing, Vol. 28 No. 2, pp. 150-165, doi: 10.1108/02652321011018332.

Hair, J., Black, W., Babin, B., Anderson, R. and Tatham, R. (2014), Multivariate Data Analysis, 7th ed., Pearson Education, Hoboken, NJ.

Hanafizadeh, P. and Khedmatgozar, H. (2012), “The mediating role of the dimensions of the perceived risk in the effect of customers’ awareness on the adoption of internet banking in Iran”, Electronic Commerce Research, Vol. 12 No. 2, pp. 151-175, doi: 10.1007/s10660-012-9090-z.

Khare, A., Khare, A. and Singh, S. (2012), “Factors affecting credit card use in India”, Asia Pacific Journal of Marketing and Logistics, Vol. 24 No. 2, pp. 236-256, doi: 10.1108/13555851211218048.

Leong, L., Hew, T., Tan, G. and Ooi, K. (2013), “Predicting the determinants of the NFC-enabled mobile credit card acceptance: a neural networks approach”, Expert Systems with Applications, Vol. 40 No. 14, pp. 5604-5620, doi: 10.1016/j.eswa.2013.04.018.

Liébana, F., Luna, I. and Montoro, F. (2017), “Intention to use new mobile payment systems: a comparative analysis of SMS and NFC payments”, Economic Research-Ekonomska Istrazivanja, Vol. 30, pp. 892-910, doi: 10.1080/1331677X.2017.1305784.

Liu, Z., Ben, S. and Zhang, R. (2019), “Factors affecting consumers’ mobile payment behavior: a meta-analysis”, Electronic Commerce Research, Vol. 19 No. 3, pp. 575-601, doi: 10.1007/s10660-019-09349-4.

McDonald, R.P. and Ho, M.R. (2002), “Principles and practice in reporting structural equation analyses”, Psychological Methods, Vol. 7 No. 1, pp. 64-82, doi: 10.1037/1082-989X.7.1.64.

Malaquias, R. and Hwang, Y. (2019), “Mobile banking use: a comparative study with Brazilian and US participants”, International Journal of Information Management, Vol. 44, pp. 132-140, doi: 10.1016/j.ijinfomgt.2018.10.004.

Martins, C., Oliveira, T. and Popovič, A. (2014), “Understanding the internet banking adoption: a unified theory of acceptance and use of technology and perceived risk application”, International Journal of Information Management, Vol. 34 No. 1, pp. 1-13, doi: 10.1016/j.ijinfomgt.2013.06.002.

Mitchell, V.W. (1999), “Consumer perceived risk conceptualizations and models”, European Journal of Marketing, Vol. 33 Nos 1/2, pp. 163-195.

Mutahar, A., Daud, N., Ramayah, T., Isaac, O. and Aldholay, A. (2018), “The effect of awareness and perceived risk on the technology acceptance model (TAM): mobile banking in Yemen”, International Journal of Services and Standards, Vol. 12 No. 2, pp. 180-204, doi: 10.1504/IJSS.2018.091840.

Nguyen, O.D.Y. and Cassidy, J.F. (2018), “Consumer intention and credit card adoption in Vietnam”, Asia Pacific Journal of Marketing and Logistics, Vol. 30 No. 4, pp. 779-796, doi: 10.1108/APJML-01-2017-0010.

Ooi, K. and Tan, G. (2016), “Mobile technology acceptance model: an investigation using mobile users to explore smartphone credit card”, Expert Systems with Applications, Vol. 59, pp. 33-46, doi: 10.1016/j.eswa.2016.04.015.

Patil, P., Rana, N., Dwivedi, Y. and Hamour, H. (2018), “The role of trust and risk in mobile payments adoption: a meta-analytic review”, Pacific Asia Conference on Information Systems.

Pelaez, A., Chen, C., Chen, Y.X. and Pelaez, A. (2019), “Effects of perceived risk on intention to purchase: a meta-analysis”, Journal of Computer Information Systems, Vol. 59 No. 1, pp. 73-84, doi: 10.1080/08874417.2017.1300514.

Porto, N. and Xiao, J. (2019), “Credit card adoption and usage in China: urban-rural comparisons”, The Singapore Economic Review, Vol. 64 No. 01, pp. 41-56, doi: 10.1142/S021759081743010X.

Qureshi, J., Baqai, S. and Qureshi, M. (2018), “Consumers’ attitude towards usage of debit and credit cards: evidences from the digital economy of Pakistan”, International Journal of Economics and Financial Issues, Vol. 8, pp. 220-228.

Roy, S., Balaji, M., Kesharwani, A. and Sekhon, H. (2017), “Predicting internet banking adoption in India: a perceived risk perspective”, Journal of Strategic Marketing, Vol. 25 Nos 5/6, pp. 418-438, doi: 10.1080/0965254X.2016.1148771.

Sripalawat, J., Thongmak, M. and Ngramyarn, A. (2011), “M-banking in metropolitan Bangkok and a comparison with other countries”, Journal of Computer Information Systems, Vol. 51, pp. 67-76, doi: 10.1080/08874417.2011.11645487.

Tam, C. and Oliveira, T. (2017), “Literature review of mobile banking and individual performance”, International Journal of Bank Marketing, Vol. 35 No. 7, pp. 1044-1067, doi: 10.1108/ijbm-09-2015-0143.

Tan, G., Ooi, K., Chong, S. and Hew, T. (2014), “NFC mobile credit card: the next frontier of mobile payment?”, Telematics and Informatics, Vol. 31 No. 2, pp. 292-307, doi: 10.1016/j.tele.2013.06.002.

Tandon, U., Kiran, R. and Sah, A. (2016), “Understanding online shopping adoption in India: unified theory of acceptance and use of technology 2 (UTAUT2) with perceived risk application”, Service Science, Vol. 8 No. 4, pp. 420-437, doi: 10.1287/serv.2016.0154.

Trinh, H.N. and Vuong, D.H.Q. (2017), “Developing credit card market from Vietnamese consumers’ perspective”, Journal of Science Ho Chi Minh City Open University, Vol. 21 No. 1, pp. 61-75.

Tseng, S. (2016), “Bringing enjoy shopping by using credit cards: the antecedents of internal beliefs”, Journal of Economics and Economic Education Research, Vol. 17, p. 16.

Varaprasad, G., Chandran, K., Sridharan, R. and Unnithan, A. (2013), “An empirical investigation on credit card adoption in India”, International Journal of Service Science, Management, Engineering, and Technology, Vol. 4, pp. 13-29, doi: 10.4018/jssmet.2013010102.

Venkatesh, V., Morris, M., Davis, G. and Davis, F. (2003), “User acceptance of information technology: toward a unified view”, MIS Quarterly: Management Information Systems, Vol. 27, pp. 425-478, doi: 10.2307/30036540.

Yang, Y., Liu, Y., Li, H. and Yu, B. (2015), “Understanding perceived risks in mobile payment acceptance”, Industrial Management and Data Systems, Vol. 115 No. 2, pp. 253-269, doi: 10.1108/IMDS-08-2014-0243.

Zhang, Y., Weng, Q. and Zhu, N. (2018), “The relationships between electronic banking adoption and its antecedents: a meta-analytic study of the role of national culture”, International Journal of Information Management, Vol. 40, pp. 76-87, doi: 10.1016/j.ijinfomgt.2018.01.015.

Data analysis on credit card debt: rate of consumption and impact on individuals and the US economy

15 Jul 2024 · Mayowa Akinwande, Alexander Lopez, Tobi Yusuf, Austine Unuriode, Babatunde Yusuf, Toyyibat Yussuph, Stanley Okoro

This paper provides a comprehensive examination of the evolution of credit cards in the United States, tracing their historical development, causes, consequences, and impact on both individuals and the economy. It delves into the transformation of credit cards from specialized merchant cards to ubiquitous financial tools, driven by legal changes like the Marquette decision. Credit card debt has emerged as a significant financial challenge for many Americans due to economic factors, consumerism, high healthcare costs, and financial illiteracy. The consequences of this debt on individuals are extensive, affecting their financial well-being, credit scores, savings, and even their physical and mental health. On a larger scale, credit cards stimulate consumer spending, drive e-commerce growth, and generate revenue for financial institutions, but they can also contribute to economic instability if not managed responsibly. The paper emphasizes various strategies to prevent and manage credit card debt, including financial education, budgeting, responsible credit card use, and professional counselling. Empirical studies support the relationship between credit card debt and factors such as financial literacy and consumer behavior. Regression analysis reveals that personal consumption and GDP positively impact credit card debt, indicating that responsible management is essential. The paper offers comprehensive recommendations for addressing credit card debt challenges and maximizing the benefits of credit card usage, encompassing financial education, policy reforms, and public awareness campaigns. These recommendations aim to transform credit cards into tools that empower individuals financially and contribute to economic stability, rather than sources of financial stress.


COMMENTS

  1. (PDF) Credit Card Analytics: A Review of Fraud Detection and Risk

    The paper further explores the intricacies of data management within the credit card industry, underscoring the importance of high-quality, standardized data for accurate modeling.

  2. Credit Card Fraud Detection using Machine Learning Algorithms

    Abstract. Credit card frauds are easy and friendly targets. E-commerce and many other online sites have increased the online payment modes, increasing the risk for online frauds. Increase in fraud rates, researchers started using different machine learning methods to detect and analyse frauds in online transactions.

  3. Neural mechanisms of credit card spending

    Abstract. Credit cards have often been blamed for consumer overspending and for the growth in household debt. Indeed, laboratory studies of purchase behavior have shown that credit cards can ...

  4. Credit card fraud detection in the era of disruptive technologies: A

    The work in Al-Hashedi and Magalingam (2021) covers research papers on financial fraud in general from 2009 to 2019 inclusive. It mainly discusses works based on data mining techniques and classifies the literature based on range of factors, including publication year, publisher, method used, and research area (credit fraud, cryptocurrency ...

  5. Credit Card Fraud Detection Using Machine Learning

    card statistics 2021) the number of people using credit cards around the world was 2.8 billion in 2019, in addition 70% of those users own a single card at least. Reports of Credit card fraud in the US rose by 44.7% from 271,927 in 2019 to 393,207 reports in 2020. There are two kinds of credit card fraud, the first one is by having a credit

  6. (PDF) Credit Card Fraud Detection

    1.3 "A Research Paper on Credit Card Fraud Detection" The proposed model involves pre-processing the credit card transaction data and then applying various

  7. A machine learning based credit card fraud detection using the GA

    The recent advances of e-commerce and e-payment systems have sparked an increase in financial fraud cases such as credit card fraud. It is therefore crucial to implement mechanisms that can detect the credit card fraud. Features of credit card frauds play important role when machine learning is used for credit card fraud detection, and they must be chosen properly. This paper proposes a ...

  8. Enhanced credit card fraud detection based on attention mechanism and

    As credit card becomes the most popular payment mode particularly in the online sector, the fraudulent activities using credit card payment technologies are rapidly increasing as a result. For this end, it is obligatory for financial institutions to continuously improve their fraud detection systems to reduce huge losses. The purpose of this paper is to develop a novel system for credit card ...

  9. Modelling customers credit card behaviour using bidirectional LSTM

    The model was trained on a real credit card dataset and the customer behavioural scores are analysed using classical measures such as accuracy, Area Under the Curve, Brier score, Kolmogorov-Smirnov test, and H-measure. ... Therefore, the research of this paper is motivated by the necessity of automatically scoring the customer's behaviour ...

  10. Credit Card Fraud Detection: A Systematic Review

    When the research is based on big data analytics, there will be a huge volume of data which can be implemented in Apache Hadoop, Spark, etc. Tensorflow, H2O, Pytorch, Keras, etc. are the libraries imported in the application of deep learning. ... Artikis, A., et al.: A prototype for credit card fraud management: industry paper. In: Proceedings ...

  11. (PDF) Consumers and credit cards: A review of the

    Research in the area of consumer credit card abundance of literature in the business, psychology, and public policy fields. 1960s, the work revolved around descriptive characteristics and evolved as scholars probed deeper by investigating ... Since the first paper on consumer credit cards was published in 1969, researchers have attempted to ...

  12. Examining the dynamics leading towards credit card usage ...

    Many researchers have investigated the consumer's attitude towards using credit cards. However, how the different attributes contribute to credit card usage attitude is not evident. Thus, the main theoretical contribution of this study is to examine the importance and performance of a set of variables that explain the attitude towards using credit cards. It provides essential inputs to ...

  13. The Impact of Credit Cards on Spending: A Field Experiment

    1 Introduction. In this paper, we report results from the first field experiment to examine the impact of credit cards on spending, a question of great interest for economics, law and public ...

  14. Review of Machine Learning Approach on Credit Card Fraud Detection

    This research paper seeks to review and evaluate various aspects of credit and debit fraud detection. The paper examines various techniques used to detect fraudulent credit card transactions and finally proposes a better technique for credit card fraud. ... There has been various research done by using Credit card data in a privacy-preserving ...

  15. A systematic review of literature on credit card cyber fraud detection

    The review investigates the present status of research on detecting cyber fraud in credit cards and addresses our research questions. The methodology begins with a description of the data sources, the search strategy, the inclusion and exclusion criteria, as well as the quantity of research articles selected from the different databases. ...

  16. Research article Investigating the associations of consumer financial

    According to the U.S. Credit Card Statistics in 2021, 70.2% of consumers have at least one credit card, and 14% have at least ten. Moreover, the number of credit card accounts increased by 2.5% year-over-year, implying that credit cards have become a primary and vital payment method in modern societies.

  17. Credit Card Fraud Detection Using Machine Learning

    Credit card fraud detection is presently the most frequently occurring problem in the present world. This is due to the rise in both online transactions and e-commerce platforms. Credit card fraud generally happens when the card was stolen for any of the unauthorized purposes or even when the fraudster uses the credit card information for his use. In the present world, we are facing a lot of ...

  18. Determinants of consumers' intention to use credit card: a perspective

    Consumers prefer credit cards due to uncertainty when carrying cash (Khare et al., 2012) or special discounts from famous brands (Dali et al., 2015). They use credit cards as a source of revolving credit with long grace period (Chahal et al., 2014; Khare et al., 2012). They can even withdraw cash by credit cards as required (Chahal et al., 2014).

  19. Credit Cards, Credit Utilization, and Consumption

    Figure 1 shows how the average U.S. consumer's credit card limit and debt varied significantly from 2000-2014. From 2000-2008, the average credit card limit increased by approximately 40 percent, from around $10,000 to a peak of $14,000. During 2009, overall limits collapsed rapidly before recovering slightly in 2012.

  20. (PDF) 2021 Consumer Credit Card Market Report

    Credit cards are central to the financial lives of over 175 million American consumers. Over the last few years and through 2019, the credit card market, the largest U.S. consumer lending market measured by number of users, continued to grow in almost all measures until suddenly reversing course in March 2020.

  21. Papers with Code

    Credit card debt has emerged as a significant financial challenge for many Americans due to economic factors, consumerism, high healthcare costs, and financial illiteracy. The consequences of this debt on individuals are extensive, affecting their financial well-being, credit scores, savings, and even their physical and mental health.

  22. Buy now, pay later (BNPL) ...on your credit card

    1. Introduction. 'Buy now, pay later' (BNPL) is an unregulated FinTech credit product enabling consumers to defer payments interest-free into one or more (often four or fewer) instalments. With £2.7bn in UK BNPL lending during 2020, the UK BNPL market is larger by volume of lending than the UK payday loan market at its peak.

  23. Antecedents of credit card usage behaviour: An Indian perspective

    This paper will explore the potential connections between credit card usage and financial well-being in India, drawing on available research and data. We will look at factors such as debt levels, savings rates, and financial literacy in relation to credit card usage, and examine the ways in which cultural and societal factors may shape the ...

  24. (PDF) Credit Cards: A Sectoral Analysis

    Objective: This paper aims at sectoral analysis of the credit card industry in India by considering top three credit card issuers i.e., HDFC bank, SBI Cards, and ICICI Bank. Methodology: In order ...