At Sift Science, we analyze a lot of data. We distill fraud signals in real time from terabytes of data and more than a billion global events per month. Previously, we discovered that the U.S. has more fraud than Nigeria and solved the mystery of Doral, FL. At our “Cats N’ Hacks” Hackathon last week, I decided to put some of our fraud signals to the test. Together with our Machine Learning Engineer, Keren Gu, I found some interesting fraud patterns[1]:

Habit #1: Fraudsters Go Hungry

Normal Transactions

When we looked at total non-fraudulent (normal) transactions by hour, normal users had slow starts to their mornings. We noticed a slight dip in transaction volume around lunchtime and suspect that’s because people are taking lunch breaks! Happily fed, they resumed activity in the afternoon and activity petered out as users went home for the day.

What about fraudsters?

Fraudulent Transactions

Fraudsters, however, work through lunch. We don’t see the same dip in activity during lunchtime in the fraudulent sample. It seems that fraudsters are too busy scheming their next move.

Habit #2: Fraudsters Are Night Owls


When we analyzed fraudulent transactions as a percentage of all transactions, 3AM was the most fraudulent hour in the day, and night-time in general was a more dangerous time. This finding is consistent with our historical findings and it makes sense: fraudsters are more likely to execute attacks outside of normal business hours when employees aren’t around to monitor fraud.
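To make "fraudulent transactions as a percentage of all transactions" concrete, here is a minimal sketch of the per-hour calculation. The `(local_hour, is_fraud)` record shape and the sample data are made up for illustration; this is not our production pipeline.

```python
from collections import Counter

def fraud_rate_by_hour(transactions):
    """Return {hour: fraudulent / total} for every local hour that has data.

    `transactions` is an iterable of (local_hour, is_fraud) pairs. That
    record shape is an assumption made for this example.
    """
    total = Counter()
    fraud = Counter()
    for hour, is_fraud in transactions:
        total[hour] += 1
        if is_fraud:
            fraud[hour] += 1
    return {hour: fraud[hour] / total[hour] for hour in total}

# Made-up sample: three transactions at 3AM, four at 2PM.
sample = [
    (3, True), (3, False), (3, True),
    (14, False), (14, False), (14, True), (14, False),
]
rates = fraud_rate_by_hour(sample)
riskiest_hour = max(rates, key=rates.get)
```

Because this is a ratio, a high value at 3AM can come from more fraud, less legitimate traffic, or both. Keeping the raw counts next to the rate helps tell these apart.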

Habit #3: Fraudsters Are International


Indian email address domains had one of the highest fraud rates when compared to other top-level domains. However, don’t give up on those great Bollywood movies just yet! We’re only looking at data from the past three months. We’ve seen this list fluctuate quite a bit depending on what new tactics fraudsters use.

Habit #4: Fraudsters Don Multiple Identities

multipleidentities

Fraudsters tend to make multiple accounts on their laptop or phone to commit fraud. The more accounts associated with the same device, the higher the likelihood of fraud. The graph above shows how many times more likely a user is to be fraudulent given the number of accounts associated with the user’s device. Phew, that was a mouthful! Said another way, a user who has 6 accounts on her laptop is 15 times more likely to be fraudulent than the average person. Users with only 1 account, however, are less likely than average to be fraudulent.
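The "times more likely than average" figure is a ratio of two fraud rates: the rate within a group divided by the rate across everyone. The sketch below shows that arithmetic on made-up data; the `(accounts_on_device, is_fraud)` record shape and the function name are hypothetical, and the sample numbers are not the ones behind the graph.

```python
from collections import defaultdict

def fraud_lift_by_account_count(users):
    """Return {accounts_on_device: group fraud rate / overall fraud rate}.

    `users` is an iterable of (accounts_on_device, is_fraud) pairs, a
    hypothetical record shape chosen for this example.
    """
    users = list(users)
    if not users:
        raise ValueError("need at least one user")
    overall_rate = sum(is_fraud for _, is_fraud in users) / len(users)
    if overall_rate == 0:
        raise ValueError("no fraud in the sample, so lift is undefined")

    counts = defaultdict(lambda: [0, 0])  # accounts -> [fraud, total]
    for accounts, is_fraud in users:
        counts[accounts][0] += int(is_fraud)
        counts[accounts][1] += 1

    return {
        accounts: (fraud / total) / overall_rate
        for accounts, (fraud, total) in counts.items()
    }

# Made-up sample: 16 users with one account (1 fraudulent),
# 4 users sharing a device with six accounts (3 fraudulent).
sample = [(1, False)] * 15 + [(1, True)] + [(6, True)] * 3 + [(6, False)]
lift = fraud_lift_by_account_count(sample)
```

A value above 1 means the group is more fraudulent than average, and a value below 1 means it is less fraudulent than average.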

Habit #5: Fraudsters Still Use Microsoft


Some of the most fraudulent email domains are operated by Microsoft. Why could this be? Two possible reasons are that 1) Microsoft has been around for a lot longer and 2) email addresses were easier to create back in the day. Today, websites use challenge responses such as image verification or two-factor authentication to verify your legitimate (and innocent!) identity.

Habit #6: Fraudsters Are Really Boring


One of the most widely recognized predictors of fraud is the number of digits in an email address. The more digits, the more likely that it’s fraud. Why? Because fraudsters are boring (and lazy). They use computer programs to sequentially generate email addresses so they don’t have to think of new ones. Emails such as “foo1234@test.com” or “foo1234568@testing.com” are highly suspicious. However, detecting fraud using email address alone can be really difficult. The only way to really get good at detecting fraud is to look at hundreds of signals, sometimes in the thousands (that’s where machine learning can help).
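As a rough illustration of what this signal looks like as a feature, the sketch below counts ASCII digits in the part of the address before the "@". The function name is hypothetical, and the output is meant as one weak input among many, not a rule to block on.

```python
import string

def count_digits_in_local_part(email):
    """Count ASCII digits in the part of `email` before the last '@'.

    If there is no '@', the whole string is treated as the local part.
    """
    local_part = email.rsplit("@", 1)[0]
    return sum(ch in string.digits for ch in local_part)

# The two addresses quoted above, plus a made-up address with no digits.
examples = ["foo1234@test.com", "foo1234568@testing.com", "jane.doe@example.com"]
digit_counts = {email: count_digits_in_local_part(email) for email in examples}
```

Only the local part is counted, so digits in the domain name do not affect the result.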

Habit #7: Fraudsters Are Sneaky

sneaky

Fraudsters like to create disposable accounts that are short-lived. In analyzing the age of fraudulent user accounts (meaning, the amount of time between account creation and a fraudulent transaction), we found that they sign up on sites and then quickly commit fraud. The longer the account age, the less likely the user is committing fraud. Nonetheless, experienced fraudsters know that fraud detection companies track this type of signal. In the graph above, we noticed “sleeper” fraud agents that became active 30 and 60 days after account creation. Fraudsters are sneaky!
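Account age as defined here is simply the gap between two timestamps. A minimal sketch follows, assuming both timestamps are available as `datetime` values; the bucket edges and labels are hypothetical choices for the example, not the bins used in the graph.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical bucket edges (in days) and their labels.
BUCKET_EDGES_DAYS = (1, 7, 30, 60)
BUCKET_LABELS = ("<1d", "1-7d", "7-30d", "30-60d", "60d+")

def account_age_days(created_at, transaction_at):
    """Days between account creation and the transaction, as a float."""
    return (transaction_at - created_at).total_seconds() / 86400

def age_bucket(age_days):
    """Map an age in days to a label; each edge belongs to the older bucket."""
    return BUCKET_LABELS[bisect_right(BUCKET_EDGES_DAYS, age_days)]

created = datetime(2014, 1, 1, 12, 0)
quick_age = account_age_days(created, datetime(2014, 1, 1, 18, 0))
sleeper_age = account_age_days(created, datetime(2014, 1, 31, 12, 0))
```

Here `quick_age` is a quarter of a day and falls in the youngest bucket, while `sleeper_age` is exactly 30 days and falls in the "30-60d" bucket.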

Obviously, the above is not a definitive sample set. Data can help us find potential answers as to why fraudsters behave in the ways that they do, but as statisticians say, “correlation is not causation”! It’s important to use common sense and human intuition when it comes to dealing with fraud.

These Insights Brought To You By:

Sift Science Cats n' Hacks Hackathon

Did you like reading about these insights and patterns? What other fraud signals interest you? Let us know in the comments, and we’ll pick out a few to write about in our next installment!

[1] Data was collected from the past three months over our entire network. From the hundreds of millions of transactions we processed during that time, we analyzed about 6 million. Our “fraud” sample consisted of transactions confirmed fraudulent by our customers; our “normal” sample consisted of transactions confirmed by our customers to be non-fraudulent, as well as a subset of unlabeled transactions. Please keep in mind that every company faces different types of fraud, and that our findings may not be representative of what you see. All transaction timestamps are local to the user.

  1. Non-fraudulent users such as myself also use digits in their e-mail addresses. It is an anti-spam measure. Such an address is not a new identity; the purpose is to generate temporary e-mail addresses for online transactions, rather than to give away your real address (thereby opening it to spam). The temporary address with digits works only for a while, then is discarded.
    I hope nobody reads your article thinking that they can start writing automated rules which identify fraudsters by the number of digits which appear in their e-mail address.

    1. Hey Kaz, that’s a great point. It’s generally a bad idea to make business decisions off of simplified rules. That’s why we use an adaptive approach using machine learning technology to analyze thousands of signals at once – not just the number of digits. We’re really careful not to disclose our more complex and powerful ones to the public, but I can say that we use signals such as the position of digits in the email address, the user’s page visit sequence, etc. Every company has a different type of fraud problem and we help them predict the likelihood of fraud based on their customers’ behaviors and signals we see in our global network.

  2. Generally valid (along with other considerations) among fraudster orders. @Kaz it’s made pretty clear that you can’t write one rule for email digits in the sentence, “However, detecting fraud using email address alone can be really difficult. The only way to really get good at detecting fraud is to look at hundreds of signals, sometimes in the thousands.” Anyone doing so is going to quickly find their fraud filtering is majorly lacking.
    One of the clues I like is phone number area codes that don’t match the billing and shipping addresses. People move, but often, fraudsters leave either a bogus number, or their own number for a different city/state than the card holder. Fraudsters will go as far as to link the same bogus phone number on multiple accounts.

    I’d give you more insights, but publicly that would only help the fraudsters get better. 😉

  3. Thank you for a very interesting article. Whilst the results are not a total surprise, it does go to show how fraudsters behave and that they are still not that bright.

  4. A legitimate user could easily trip several of these red flags. Some people are insomniacs, or do shift work. Or have temporary jet lag. So: legitimate transaction happens at 3 a.m. Some people don’t like putting honest information into every field on a form: they put in a legitimate shipping address (since you can’t get the stuff without it), yet use a fake phone number. Strike two! Use of a throwaway e-mail address containing digits to thwart spam: strike three …
    Just because you have statistics with clear patterns doesn’t mean they can be effectively used without a huge proportion of false positives, even if you are relying on multiple estimators.

    Let’s say that 95% of the fraudsters check off on seven different estimators. Gee, that seems really useful, right? But what if 3% of legit users also check off on those estimators?

    Oops, depending on the volume of transactions, that’s not even good enough for implementing a system where these factors are only used as alerts to mobilize human investigators, let alone any automatic rejection.

    The false positives have to be very, very, low.

    1. @Kaz, I get what you’re saying about false positives.
      For us though, and our fraud requirements might be unique, the false positives are not worth the risk. Providing us with false information is suspicious. Purchasing at 3am is suspicious. If you’ve got a decent record with us then these signals don’t trip any flags, but a new user with a false telephone number purchasing at 3am is too risky for us to take a chance on. Statistically there’s a HUGE probability for that user to be a fraudster.

      I understand that as a customer giving this information can sometimes feel invasive and so we put in false information, ESPECIALLY for a new account at a site we don’t necessarily trust, but sitting on the other side of the fence these are suspicious signals. It’s like if someone was walking up and down your pavement at 3AM with a laptop open. He could be a network technician needing to work on the neighbourhood after hours to avoid disruptions, but I’ll take a wager that people would call the cops on such behaviour.

      When the vast majority of legitimate users don’t present these signals, we’re obviously going to be suspicious of those that do.

  5. Nice post! You attribute the 3am spike to fraudsters that want to avoid detection when staffing is low. However, the graph doesn’t show absolute numbers to back that up. An equally valid conclusion is that fraudulent access stays consistent throughout the day, while legitimate usage goes down.

      1. Lol, fraud will always live. All these programs and systems are just the basic knowledge of how fraud works, and companies like Sift Science only use this as a business opportunity. Tbh, fraud will never end: anybody can buy card details on the dark net, have all the info for the card holder, bypass 80% of checks and live a comfortable life for free 🙂

  6. In the last three years (I am working on it; it’s a much older set) of improving our anti-fraud system, we have also been looking at our customer profile. If the name doesn’t fit our profile, it will get a flag. If you use Internet Explorer, or a bot passes browser info with IE, something is wrong. Female name, flag it. So each company can make its own anti-fraud system. A fraudster has more than one failed payment pending, always! You can see them trying to order 200 dollars, 100 dollars, 75 dollars, if the CC passes validation. Sometimes everything is perfect and the gut feeling of a staff member asks for revalidation of account info. Gut feeling isn’t always based on numbers, but you have to keep it.
    The main focus in catching fraud is to have a very low false positive rate, which I think we achieved, because the false positive rate decreases the effectiveness of your staff’s approval behavior. They tend to help the fraudsters because they helped the false positives. So the system should be focussing on decreasing the fraud rate and decreasing the human error rate (performance driven).

  7. I have had over 300 credit card fraud attempts in the last year. My card processing company catches 98% of them. I always verify an order by calling the provided phone number or emailing the billing address, with a limited time for responses. Usually neither returns a personal connection. Also, when the shipping address differs from the billing address, I do an address search on Google Earth, mostly finding vacant buildings or lots. I mark these as fraudulent and refund or decline the order. I’ve only had one order get to the doorstep, for which I recovered the funds via dispute.

  8. An email address is associated with one or multiple accounts: Facebook, Twitter, LinkedIn, online stores, etc. You can easily verify this information and create a score based on the email’s reputation history, not just because someone uses a temporary email address.
