API Development at Sift Science

At Sift Science, APIs are incredibly important. We spend a lot of time improving upon our existing APIs and thinking about how to design even better ones. A couple of weeks ago at the San Francisco API Craft Meetup, I gave a talk on how we built the API that powers our new Sift console.

At its inception, the Sift console was an internal Rails app built for investigating model issues. As we made the console accessible to our users, we rewrote it as a single-page JavaScript app driven by a set of private, undocumented APIs. For the third iteration of our console, we took an API-driven approach. The new APIs that drive the console are powerful and comprehensive enough to allow our users to build their own interfaces atop their data. The console is just another consumer of these underlying APIs.

Some of the technologies we utilized to build it include:

In my talk, I discussed how we migrated our API and console, as well as some of the lessons we learned along the way.

If you missed the talk, you can check it out below! Questions about Sift? Feel free to drop us a line any time.

Running ML Infrastructure on HBase

We recently hosted our first ever HBase meetup! This was a very exciting event for us as it was the first time we showed off some of the great infrastructure and systems we've built to power our machine learning platform.

Of course, we didn't start with HBase. When we first launched in April 2012, our platform was built on MongoDB. At the time, Mongo provided a great balance between flexibility and operability, but we very quickly outgrew it and moved to HBase. We now proudly serve thousands of sites and tens of thousands of requests per second on our HBase cluster.

In our talk at the Meetup, Andrey focuses on the underlying infrastructure we have built to support both online and offline learning at scale and how HBase, in particular, lends itself to this problem.

We look forward to hosting more meetups around infrastructure, systems and data science in the coming weeks and months. If you're interested in learning more or hacking on HBase and machine learning, please don't hesitate to reach out to us! 

Service Incident Postmortem: Breakdown and Root Cause

We experienced an outage of our APIs from approximately 11:36PM PDT on August 26th, 2014 to 12:53AM PDT on August 27th, 2014. While we've outlined what happened and the impact to our customers, we'd like to detail the root cause, how we fixed the issue and what we're doing to ensure it doesn't happen again.

Root Cause

Our event processing system is asynchronous. At the edge of our networks we run API servers that receive event data from both our JavaScript Snippet and Event APIs. These servers are mostly stateless but depend on a small database of account information (for validating input) and Amazon SQS for queuing work for our classifier fleet. While the small database of account information is accessed using a full read-through cache that is tolerant of downstream outages, our use of SQS had no provisions for unavailability.
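To make the idea of an outage-tolerant read-through cache concrete, here is a minimal sketch. It is not our production code: the class name, the `loader` callback (standing in for the account database lookup), and the TTL handling are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Generic, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")


class StaleTolerantCache(Generic[K, V]):
    """Read-through cache that serves stale entries when the backing store fails."""

    def __init__(self, loader: Callable[[K], V], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader      # hypothetical function that queries the account database
        self._ttl = ttl_seconds    # how long an entry counts as fresh
        self._clock = clock        # injectable so the behavior can be tested
        self._entries: Dict[K, Tuple[V, float]] = {}  # key -> (value, time loaded)

    def get(self, key: K) -> V:
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[1] < self._ttl:
            return cached[0]       # fresh hit, no database call
        try:
            value = self._loader(key)  # read through to the database
        except Exception:
            if cached is not None:
                return cached[0]   # database is down: serve the stale copy
            raise                  # nothing cached, so surface the failure
        self._entries[key] = (value, now)
        return value
```

The key design choice is that an expired entry is never evicted until a fresh value has been loaded successfully, so a downstream outage degrades to slightly stale data rather than to errors.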

We made the naive assumption, based on SQS documentation, that a queue would always be available given the Redundant Infrastructure guarantee. More specifically, we assumed that for any logical queue, there were many physical queues across Availability Zones providing the queue's availability. This evidently is not the case. While our primary event processing queue had been alive for over 2 years and processed tens of billions of events, on the evening of August 26th it simply vanished and reappeared a few hours later.

The True Cost of e-Commerce Fraud For A Store Owner

How do experts measure fraud? A recurring theme in any fraud-centric conversation is how to comprehend its total costs. Throughout my 12 years in e-Commerce, I’ve worked with countless merchants and their many partners in finance, operations, and marketing. Too often, businesses push fraud to the back-burner, not realizing its true costs. The reality is that the impact of e-Commerce fraud on a merchant’s bottom line is deeply damaging. In this post, I’ll share a real-world example to better illustrate the true cost of fraud.  

Meet Jennifer

Jennifer is a store owner who sells jeans through Shopify, an e-Commerce platform. She buys her most popular product - the Boyfriend Jeans - from her local wholesale vendor at $20 a pair. Jennifer uses keystone markup (twice the wholesale cost) to price her item at $40 and offers free shipping on all purchases.

At first glance, a simple calculation shows a 50% profit margin ($20 profit on a $40 sale) for her Boyfriend Jeans. Although a 50% margin on every sale sounds appealing to many merchants, there are many more costs that haven’t been accounted for.
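The arithmetic behind Jennifer's numbers can be sketched in a few lines. The function names here are ours, chosen for illustration; the point is that the 50% figure is profit as a share of the sale price, while keystone markup is 100% of the wholesale cost.

```python
def markup_pct(cost: float, price: float) -> float:
    """Profit as a percentage of the wholesale cost."""
    return (price - cost) / cost * 100


def margin_pct(cost: float, price: float) -> float:
    """Profit as a percentage of the sale price."""
    return (price - cost) / price * 100


cost, price = 20.0, 40.0           # Jennifer's wholesale cost and retail price
print(markup_pct(cost, price))     # 100.0
print(margin_pct(cost, price))     # 50.0
```

Neither figure yet includes shipping, payment processing, or fraud losses, which is why the headline margin overstates what Jennifer actually keeps.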

5 Worst Internet Scams of All Time

Online fraud is expensive. The recent StubHub scam cost $1.6 million and the Target data breach cost an estimated $200M (and counting). At Sift Science, we help customers fight back by analyzing millions of data points on patterns of fraudulent behavior and new tactics. We hear about fraud stories, large and small, and discover something new every day. Today, we take you back in time to show you 5 of the worst online scams of all time:

5. $1.3MM Lost in Online Dating Scam

In 2013, Ellen, a comfortably retired Canadian woman, lost her life savings of $1.3 million to “Dave”. “Dave” connected with and wooed the lonely Ellen, who thought she had found companionship on an online dating site. Dating site fraudsters prey on vulnerable men and women to elicit money, gifts, and other favors. It’s almost too easy to fabricate stories, personalities, and relationships from behind a screen. After crooks form “relationships” digitally, all they have to do is devise legitimate-sounding reasons for their victims to send money overseas.

Seven Habits of Highly Fraudulent Users

At Sift Science, we analyze a lot of data. We distill fraud signals in real-time from terabytes of data and more than a billion global events per month. Previously, we discovered that the U.S. has more fraud than Nigeria and solved the mystery of Doral, FL. At our “Cats N’ Hacks” Hackathon last week, I decided to put some of our fraud signals to the test. Working with our Machine Learning Engineer, Keren Gu, we discovered some interesting fraud patterns[1]:  

Habit #1: Fraudsters Go Hungry

[Figure: Normal Transactions]

When we looked at total non-fraudulent (normal) transactions by hour, normal users had slow starts to their mornings. We noticed a slight dip in transaction volume around lunchtime and suspect that's because people are taking lunch breaks! Happily fed, they resumed activity in the afternoon and activity petered out as users went home for the day.

What about fraudsters?

[Figure: Fraudulent Transactions]

Fraudsters, however, work through lunch. We don’t see the same dip in activity during lunchtime in the fraudulent sample. It seems that fraudsters are too busy scheming their next move.

Habit #2: Fraudsters Are Night Owls

[Figure: night owls]

When we analyzed fraudulent transactions as a percentage of all transactions, 3AM was the most fraudulent hour in the day, and night-time in general was a more dangerous time. This finding is consistent with our historical findings and it makes sense: fraudsters are more likely to execute attacks outside of normal business hours when employees aren’t around to monitor fraud.
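For readers who want to run the same kind of analysis on their own data, here is a rough sketch of computing fraud as a percentage of all transactions for each hour. The input shape (a local hour plus a confirmed-fraud flag) and the sample values are made up for illustration and are not drawn from our dataset.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def fraud_rate_by_hour(transactions: Iterable[Tuple[int, bool]]) -> Dict[int, float]:
    """Map each local hour (0-23) to the fraction of its transactions labeled fraudulent."""
    totals: Counter = Counter()
    frauds: Counter = Counter()
    for hour, is_fraud in transactions:
        totals[hour] += 1
        if is_fraud:
            frauds[hour] += 1
    # Only hours that actually have transactions appear in the result.
    return {hour: frauds[hour] / totals[hour] for hour in totals}


# Made-up sample: (local hour, confirmed fraudulent?)
sample = [(3, True), (3, False), (14, False), (14, False), (14, True), (14, False)]
rates = fraud_rate_by_hour(sample)
riskiest_hour = max(rates, key=rates.get)
print(rates, riskiest_hour)
```

Dividing by each hour's own total matters: a quiet hour like 3AM can have few fraudulent transactions in absolute terms and still have the highest rate.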

Habit #3: Fraudsters Are International

[Figure: international]

Indian email address domains had one of the highest fraud rates when compared to other top-level domains. However, don't give up on those great Bollywood movies just yet! We're only looking at data from the past three months. We've seen this list fluctuate quite a bit depending on what new tactics fraudsters use.

Habit #4: Fraudsters Don Multiple Identities

[Figure: multiple identities]

Fraudsters tend to make multiple accounts on their laptop or phone to commit fraud. The more accounts associated with the same device, the higher the likelihood of fraud. The graph above shows how many times more likely a user is to be fraudulent given the number of accounts associated with the user's device. Phew, that was a mouthful! Said another way, a user who has 6 accounts on her laptop is 15 times more likely to be fraudulent than the average person. Users with only 1 account, however, are less likely to be fraudulent.
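The "times more likely" figure is a ratio: the fraud rate among users with a given number of accounts on their device, divided by the fraud rate across all users. A minimal sketch, with a hypothetical input format and made-up sample data:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def fraud_lift_by_account_count(users: Iterable[Tuple[int, bool]]) -> Dict[int, float]:
    """users: (accounts on the user's device, is_fraud).

    Returns account count -> fraud rate for that group divided by the overall fraud rate.
    """
    rows = list(users)
    overall = sum(1 for _, is_fraud in rows if is_fraud) / len(rows)
    if overall == 0:
        raise ValueError("no fraudulent users in the sample, lift is undefined")
    groups: Dict[int, List[bool]] = defaultdict(list)
    for account_count, is_fraud in rows:
        groups[account_count].append(is_fraud)
    return {n: (sum(flags) / len(flags)) / overall for n, flags in groups.items()}


# Made-up sample: six single-account users (one fraudulent), two six-account users (one fraudulent)
sample_users = [(1, False)] * 5 + [(1, True), (6, True), (6, False)]
lift = fraud_lift_by_account_count(sample_users)
print(lift)
```

A lift above 1 means the group is riskier than average, and a lift below 1 means it is safer, which is how single-account users end up "less likely to be fraudulent".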

Habit #5: Fraudsters Still Use Microsoft

[Figure: outlook]

Some of the most fraudulent email domains are operated by Microsoft. Why could this be? Two possible reasons are that 1) Microsoft has been around for a lot longer and 2) email addresses were easier to create back in the day. Today, websites use challenge responses such as image verification or two-factor authentication to verify your legitimate (and innocent!) identity.

Habit #6: Fraudsters Are Really Boring

[Figure: boring]

One of the most widely recognized predictors of fraud is the number of digits in an email address. The more numbers, the more likely that it’s fraud. Why? Because fraudsters are boring (and lazy). They use computer programs to sequentially generate email addresses so they don’t have to think of new ones. Emails such as “foo1234@test.com” or “foo1234568@testing.com” are highly suspicious. However, detecting fraud using email address alone can be really difficult. The only way to really get good at detecting fraud is to look at hundreds of signals, sometimes in the thousands (that’s where machine learning can help).
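As a small illustration, counting the digits in an address takes only a couple of lines. This sketch looks only at the part before the "@", which is our assumption about how to apply the signal; the third address is a made-up comparison case.

```python
def count_digits_in_local_part(email: str) -> int:
    """Count the digits before the '@'; the domain is ignored."""
    local_part = email.rsplit("@", 1)[0]
    return sum(ch.isdigit() for ch in local_part)


for address in ("foo1234@test.com", "foo1234568@testing.com", "jane.doe@example.com"):
    print(address, count_digits_in_local_part(address))
```

On its own this count is a weak signal, which is why it works best as one input among many rather than as a rule.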

Habit #7: Fraudsters Are Sneaky

[Figure: sneaky]

Fraudsters like to create disposable accounts that are short-lived. In analyzing the age of fraudulent user accounts (meaning, the amount of time between account creation and a fraudulent transaction), we found that they sign up on sites and then quickly commit fraud. The longer the account age, the less likely the user is committing fraud. Nonetheless, experienced fraudsters know that fraud detection companies track this type of signal. In the graph above, we noticed "sleeper" fraud agents becoming active 30 and 60 days after account creation. Fraudsters are sneaky!

Obviously, the above is not a definitive sample set. Data can help us find potential answers as to why fraudsters behave in the ways that they do, but as statisticians say, “correlation is not causation”! It’s important to use common sense and human intuition when it comes to dealing with fraud.

These Insights Brought To You By:

[Figure: Sift Science Cats n' Hacks Hackathon]

Did you like reading about these insights and patterns? What other fraud signals interest you? Let us know in the comments, and we’ll pick out a few to write about in our next installment!

[1] Data was collected from the past three months over our entire network. From the hundreds of millions of transactions we processed during that time, we analyzed about 6 million. Our “fraud” sample consisted of transactions confirmed fraudulent by our customers; our “normal” sample consisted of transactions confirmed by our customers to be non-fraudulent, as well as a subset of unlabeled transactions. Please keep in mind that every company faces different types of fraud, and that our findings may not be representative of what you see. All transaction timestamps are local to the user.

Custom Workflows to Match Your Business

Our customers range from on-demand services like Instacart to online retailers like JackThreads to small stores using platforms like Shopify.

Each of our customers is unique not only in the way that fraud affects them, but also in the way their fraud teams work through manual reviews of suspicious orders and users. Many of our customers prefer to review just their most recent orders, while others prefer to focus on orders that have high order values or mismatches between shipping and billing addresses.

We’ve listened, and with the latest release of the Sift Science console, we’re really proud to give customers the ability to customize manual review queues in the way that makes the most sense for their business.

Custom queues that are personalized for your business

You can now filter queues by any attribute that you send Sift, including order value or country. Also, you can create queues using attributes our algorithms calculate, like the distance between billing address and shipping address or the number of failed transactions.

You still have built-in Orders and Users queues, but now you’ll have the ability to customize those queues further. Also, you can now build a queue completely from scratch through Search, and share that queue with other analysts by sharing a URL.

It's now easier to train Sift Science to spot fraud

We’ve also made labeling users a one-click experience in Queues and the User Details panel to help analysts understand the labeling process better as well as be more efficient. You can still add a reason (like chargeback or spam) after you’ve labeled a user.

Release

We’ll be rolling these changes out to you on August 4, and we won’t be supporting earlier versions of the console moving forward.

Help make Sift Science better!

We love feedback! If you have any thoughts you'd like to share, please let us know what you think by emailing support@siftscience.com.

Thanks!
The Sift Scientists

Behind the Signal: Doral, FL

What’s up with Doral? Let’s say you’re going through orders, and you come across one with a high order value where the billing and shipping addresses don’t match. You decide to do a bit of sleuthing, starting with research on the shipping city: Doral, FL.

At first glance, shipping to Doral seems like a no-brainer:

Based on that information, it’d be perfectly reasonable to ship that order.

However, there’s also cause for caution. Sift Science has found that -- despite Doral’s wealth and status as a member of the Trump empire -- orders shipped there are 8X riskier than normal!

What Versus Why

At Sift, insights like these are discovered automatically, and often the signals are subtle and not immediately intuitive. After all, a computer can say "what", but it takes a human being to say "why".

For Doral specifically, I did ask “why”, and here’s what I found.

Doral’s land zoning looks like this:

[Figure: Map of Doral, FL]

Yes, Doral is home to not only 90 holes of golf, but also a lot of industrial land!

It has over 3K logistics-related companies and the Miami Free Zone, which offers 750K sq. ft. of duty-free warehouse space.  Its proximity to Miami International Airport (the #1 airport in international freight) and Port Miami (which moved 12.5B tons of cargo last year) means it has a thriving logistics industry.

Some of these warehouses offer package forwarding as a service. So, shipping to Doral is risky because that package has a higher likelihood of being forwarded to someplace else.

So what do you do?

Clearly, you shouldn’t blacklist Doral, since most of its inhabitants (population: 48K and growing) could be great customers. Similarly, you shouldn’t blacklist forwarding addresses since not all fraudsters use forwarding addresses and not all forwarding addresses ship to fraudsters.

Ultimately, shipping city is only one factor you should take into account when assessing fraud risk. It could be worth cancelling the order if there were other risky signals, such as how the email was typed in, what products -- colors, sizes, etc. -- make up the order, and shipping address.

At Sift, we have a proprietary database of tens of thousands of known risky addresses. However, the number of risky addresses is growing every day. As some addresses get blacklisted by larger retailers, package forwarding companies will change their warehouse locations. Plus, the freight industry itself has been under pressure, and package forwarding as a side-business could be a nice source of incremental profit.

While we’re not experts in package forwarding and freight, we are experts in identifying risk signals associated with this industry using machine learning. We track addresses and more to predict the likelihood of fraud. Follow us on Twitter to learn about more fraud signals or let us know about fraud signals you've investigated in the comments below. We'd love to hear them!

How Did My Credit Card Info Get Stolen?

Nobody likes dealing with credit card fraud. It can be embarrassing and difficult to admit that you’ve been a victim. At Sift Science, we often hear from our customers about 2AM nights at the office spent triaging thousands of orders that were placed with stolen credit cards. Today, we thought it would be helpful to understand how it all starts. To do this, we need to go underground, deep inside criminal territory. It goes without saying that credit card fraud is malicious and illegal. It can result in felony charges and several years of imprisonment.

Simply put, credit card fraud starts with theft. With determination and time, fraudsters can obtain credit card numbers and information at any price. In fact, an entire underground economy, complete with moderators and reviewers, exists for criminals to buy and sell your information online. Databases of people’s names, credit card numbers, and even complete bank account login information (also known as “FULLINFO” or “FULLZ”) can be sold for anywhere from $2 to $50. “Carders”, as these thieves are called, even share tutorials and spread information on which sites are vulnerable to attack.

The theft itself can take shape in a number of ways. The most common methods are hacking databases, sending phony emails (also known as “phishing”), and exploiting security holes. Sophisticated carders usually hoard the information and sell it in bulk to consolidators. The consolidators then sell it on the black market, lurking in secret online forums or chat rooms. They even offer flash sales and bulk discounts. Here is a sampling of “products” and prices we found in our own research via Google:

[Figure: menu]

Once thieves obtain these credit card numbers, they run test transactions to make sure the cards are valid. In the old days, thieves would encode the credit card numbers onto fake plastic cards. Today, with the increasing prevalence of online payments, thieves first test them on sites by buying small ticket items (e.g. $3 earrings) or signing up on sites that offer free product trials. Once they verify that the card works, they move on to bigger ticket items, leading to outright theft and chargeback fraud.

By the time we realize we have been victims of fraud, it’s too late. Goods will have been shipped and gift cards redeemed. It shouldn’t be a surprise, then, that fraud causes over $5B in losses a year and has shut down thousands of businesses.

To learn more about simple steps you can take to prevent credit card fraud yourself, check out the Federal Trade Commission’s article, Protecting Against Credit Card Fraud.

Three Ways Gamers Cheat in Online Poker

As we mentioned before, there are many signals linked to fraud in the digital world. At Sift Science, we use advanced fraud detection technology to help customers identify bad behavior and adapt to tactics in real time. In the online gambling sphere, where regulations and oversight are unclear, gaining player trust by providing a safe and fair environment is paramount. One way to improve game experience is to prevent fraudulent behavior. Here are three common ways gamers commit fraud in online poker.

1. Bonus Abuse Through Multiple Accounts

Poker sites often give away play money using bonus codes to attract new players. Fraudsters try to take advantage of this and sign up using multiple accounts at the same game table or tournament, causing the poker site to lose money while also providing a bad experience for other players. Usually it’s enough to track account registration by IP address, but for advanced cases, more sophisticated tools are required. The best fraud detection tools use device fingerprinting to find multiple accounts created by a single laptop or computer.
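As a rough sketch of the basic version of this check, the snippet below groups registrations by a shared key and reports any key tied to more than one account. The key could be an IP address or, for the more advanced cases, a device fingerprint. The function name, input format, and sample values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def accounts_sharing_key(registrations: Iterable[Tuple[str, str]],
                         min_accounts: int = 2) -> Dict[str, List[str]]:
    """registrations: (account_id, key), where key is an IP address or device fingerprint.

    Returns each key that is linked to at least `min_accounts` distinct accounts.
    """
    by_key: Dict[str, Set[str]] = defaultdict(set)
    for account_id, key in registrations:
        by_key[key].add(account_id)
    return {key: sorted(ids) for key, ids in by_key.items() if len(ids) >= min_accounts}


# Made-up registrations using documentation-range IP addresses
signups = [
    ("alice", "203.0.113.7"),
    ("bob", "198.51.100.4"),
    ("alice2", "203.0.113.7"),
    ("alice3", "203.0.113.7"),
]
print(accounts_sharing_key(signups))
```

Shared IP addresses also show up for legitimate players behind the same household or office network, so a match here is a reason to look closer rather than proof of abuse.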

2. Computer Bots in Poker Rooms

Hackers have created computer programs (“bots”) that automate online poker play. Bots are banned from poker sites because they create an unfair advantage: computers have no emotion, so they are not subject to “tilt” (the poker term for a player who becomes overly aggressive and adopts a poor strategy). Fraud rings have been caught colluding and cheating players out of hundreds of thousands of dollars using bots.

So how do poker sites detect bots? While most detection techniques are proprietary and unknown to the general public, some measures include monitoring player reaction time, suspicious mouse movements, and randomized pop-up windows with challenge questions.

3. Chip Dumping in Tournaments or Ring Games

Chip dumping happens when a player intentionally loses chips to another player at the table to give them a better chance to win. It has become a way for players to launder money. A fraudster uses stolen credit cards to deposit funds and then dumps chips at a cash table to another account he or she created. In other cases, the fraudster will hijack an innocent player’s account (“account takeover”). Online poker rooms typically check for players making curiously large bets with a terrible hand or folding on a relatively safe bet.

Interestingly, most fraud is caught by vigilant human players who report fraudulent behavior. However, cyber criminals can still take advantage of even the most experienced (and most valued) players. One reason is that online poker is still mostly illegal in the US and most sites are physically located offshore. It can be difficult to determine whether sites are legitimate and whether it’s safe to hand over your credit card number. The good news is that there are simple steps players can take to protect themselves from fraud.

To learn more about common methods online poker rooms use to combat fraud, check out Cheating & Collusion at Online Poker Rooms. If you’ve been a victim of online fraud or would like to learn more about us, let's talk.

Our next chapter

The internet offers unprecedented connectivity, scalability, and anonymity. Unfortunately, it can also be abused. As activity moves from the physical to the online world, so does fraud. Online chargebacks, spam, referral abuse, and account takeovers cause all sorts of headaches for businesses that would rather focus on their core competencies. At Sift Science, we make world-class online fraud detection easy and accessible to merchants of all sizes. Just over a year ago, we launched our first product: a fraud detection API that empowers online merchants with real-time, large-scale machine learning. This is the same core fraud detection technology used by giants like Amazon and Google.

And boy oh boy, it’s been a busy year. We launched a new version of our API, a real-time fraud console, plugins for Shopify and Magento, and many other exciting changes. We now analyze more than $1.5 billion of transactions and 600 million events each month. We’ve helped customers detect, in real time, 95% of their fraud with an industry-leading 7% false positive rate. We’ve cut their manual review rate more than sixfold, while enabling them to capture revenue that would have otherwise been rejected. Our customers include retailers of physical and digital goods, financial services companies, marketplaces, mobile-only companies, nonprofit organizations, and online communities on all six habitable continents. They range from high-growth businesses like Airbnb, Uber, OpenTable, Indeed, JackThreads, Kickstarter, and HotelTonight, to mom-and-pop shops collecting their first dollars. We also won the Best Emerging Technology Award at this year’s Merchant Risk Council conference (a key event in the anti-fraud industry). Woohoo!

And now, some exciting news. We recently closed an $18M Series B round of funding led by Spark Capital. We welcome Mo Koyfman to our board of directors, a kindred spirit who shares our passion for great product experiences and big thinking. We’ll use the funds to grow our team and accelerate our sales, marketing, and product development initiatives. We have just begun our mission to make the internet a better place. Our machine learning product improves with more customers and data, and over time we believe that this network can deliver tremendous value across the web.

To our customers and investors - thank you for your continued support. We will work hard to deliver even more value. To our potential customers - don’t hesitate to contact us and learn how we can help protect your business. To potential candidates - we’re hiring across the board.

Onward!

What is Big Data (Part I)

This post is part of a series that discusses, in simple terms, machine learning and big data. Today we're demystifying big data. To learn about machine learning, check out Machine Learning For Poets.

What is Big Data?

What is big data? Many define it in terms of the computing power it requires. To understand what big data is, however, you first need to know what big data means. In this post, we’ll discuss the implications of big data’s meteoric rise.

What big data means

The excitement around big data isn’t just marketing hype. In fact, it captures a qualitative shift, from model complexity to data complexity.

Answering complicated questions used to require equally complicated models. Despite their elegant mathematical underpinnings, these were usually imperfect, especially when modeling real life. They required many assumptions, which didn’t always hold true (e.g. “Humans are rational”).

Human behavior is more complicated than E = mc². Therefore, when making predictions about humans, discovering how things actually work has proven more effective than depending on a caveat-laden model.

In other words, big data frees us to derive insights empirically. With enough information, you can approximate what you want to know by "asking the data directly" rather than relying on assumptions. Fewer assumptions mean fewer places for things to go wrong.

Of course, the quantity of data required to reduce model complexity results in -- you guessed it -- increased data complexity.

Fight fraud with big data

At Sift, we know that big data is critical to staying ahead of fraudsters. Contemplating what I think fraudsters do is less important than discovering what they actually do.  Predicting fraudster attacks based solely on recent trends is less effective than incorporating all information.  Constraining your fraud team to a limited set of variables is less efficient than using every piece of information available.

So now you understand the most important aspect of what big data is: its implications. Next up: the logistical challenges that define it.

For more insight, look at Alon Halevy, Peter Norvig, and Fernando Pereira’s excellent paper The Unreasonable Effectiveness of Data. Stay tuned for more explanations, applications, and discussion on machine learning and big data. If there are specific topics you’d like us to cover, let us know at info@siftscience.com or @siftscience!

Five Fun Fraud Facts

As an e-commerce fraud analyst, you’re expected to decide whether a transaction is good or bad, often with ambiguous transaction and customer data. This can leave you feeling like Lucy, especially during the holiday season.

In the absence of a fraud detection system, here are five signals you can use to assess fraud risk. Remember, these are aggregate signals based on data from many companies. Your mileage may vary.

  1. Fraudsters have stacks on stacks of cards. If the customer has multiple credit cards on file from different banks, their order is 7x more likely fraudulent.
  2. fraudsters dislike capital letters. if a customer wrote their billing name in all lowercase letters, the order is 2.7x more suspicious.
  3. Fraudsters stay (virtually) on the move. A buyer with multiple billing zip codes within a week is 30x more likely to be fraudulent.
  4. Fraudsters favor disposable email addresses. An email address with two or more digits is twice as likely to be fraud as one with zero or one digit.
  5. Fraudsters are night owls. Transactions at 2AM are 50% more likely fraudulent, while 4AM transactions are 100% more likely fraudulent.
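Two of these signals are simple enough to check by hand or with a few lines of code. The sketch below covers the all-lowercase billing name (signal 2) and multiple billing zip codes within a week (signal 3). The function names and the input format are assumptions for illustration, and the multipliers quoted above are not baked in.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple


def is_all_lowercase_name(billing_name: str) -> bool:
    """True if the name contains letters and none of them are uppercase."""
    letters = [ch for ch in billing_name if ch.isalpha()]
    return bool(letters) and all(ch.islower() for ch in letters)


def has_multiple_zips_within_week(history: Iterable[Tuple[datetime, str]]) -> bool:
    """history: (timestamp, billing zip). True if two different zips occur within 7 days."""
    events = sorted(history)
    for i, (first_time, first_zip) in enumerate(events):
        for later_time, later_zip in events[i + 1:]:
            if later_time - first_time > timedelta(days=7):
                break  # events are sorted, so everything after this is even later
            if later_zip != first_zip:
                return True
    return False


print(is_all_lowercase_name("john smith"))
print(has_multiple_zips_within_week([
    (datetime(2013, 12, 1), "94103"),
    (datetime(2013, 12, 3), "10001"),
]))
```

Each check returns a plain yes or no, so an analyst can treat the results as prompts for a closer look rather than as a verdict.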

How can you further improve your fraud detection accuracy? Customization. Advanced fraud detection solutions like Sift Science can incorporate data unique to your business into our scoring. We call these custom events.

For example, an online shoe store like Zappos could send us the shoe size for each transaction as a custom event. It might turn out that size 10 shoes are more fraudulent than size 15 shoes. This makes sense intuitively: there are more people walking around with size 10 feet and fraudsters often focus on goods they can resell easily. Wondering how Sift Science can solve your e-commerce fraud challenges? Drop us a line; we’d love to help.

E-commerce fraud: where it hurts

Here at Sift Science, we make powerful fraud detection software available to companies of all sizes. Fraud can mean many things and impact many different parts of these organizations. As noted in our post on global fraud, we detect three main kinds of e-commerce fraud (plus other specialized kinds): payment fraud, new account fraud and account takeover. Below, we’ll take a closer look at each type and whom within a company it hurts.

Payment fraud

Online payment fraud means using stolen means of payment to make a purchase. Typically, this involves stolen credit card numbers, although it could entail other payment info like bank account routing numbers or PayPal credentials. Stolen credit card numbers are shockingly easy to obtain cheaply online. The first time many unprotected merchants learn of payment fraud is after they’ve fulfilled an order, when their credit card acquirer notifies them of a chargeback (the original charge is reversed because the consumer has realized their card was used without their permission). For card-not-present transactions (those where the card is not physically present at the time of the transaction, e.g. all e-commerce), the merchant must make restitution, meaning they lose both the revenue and the merchandise itself.

While payment fraud understandably impacts the Finance team through the chargeback fees and lost revenue, it also hurts Sales via channel cannibalization. In one case, an e-commerce company detected fraudsters buying their products with stolen credit cards and reselling them on Amazon Marketplace at deeply discounted prices. Fraudsters were thus not only stealing merchandise, but were undercutting legitimate online vendors, causing them to become angry with the original company.

New account fraud

New account fraud occurs when a fraudster opens a new account on a site and does something undesirable with it. Marketplaces and social networks in particular focus on limiting this kind of activity due to their peer-to-peer models. If fake listings or users proliferate, legitimate ones will be spooked, defrauded or otherwise deterred from participating, and the community could become stagnant.

New account fraud impacts Community and User Experience teams due to its drag on user engagement. These teams must spend valuable resources sifting out fraudulent activity patterns. The recent uncovering of a major attempted Kickstarter scam that raised over $120,000 for Kobe beef jerky serves as a prominent example of this sort of fraud. Luckily, a film documentary team caught them and Kickstarter froze the account before any backers were charged.

Account takeover

Account takeover is simply when a fraudster commandeers an existing account and uses it for malicious purposes. Wired reporter Mat Honan’s detailed account of how hackers systematically gained access to his entire online identity (iPhone, Amazon, Gmail, Twitter) provides a chilling example of its feasibility. On a larger scale, Riot Games, maker of the popular online game League of Legends, announced this week that its databases were hacked and user names, (salted) passwords, email addresses as well as 120,000 old transaction records were compromised. Account takeover hurts Customer Service teams given the need to change passwords, create new accounts and otherwise repair the damage caused.

Specialized challenges: referral fraud

Fraud takes many other specialized forms. Consider referral fraud, a.k.a. affiliate fraud. Fraudsters take advantage of refer-a-friend programs at many e-commerce sites by creating multiple identities to maximize their gains. The vast array of sites offering such bonuses can be found at refAround. With referral fraud, it’s Marketing who loses, as customer acquisition spending comes from their budget.

What can you do to fight back? Consider a fraud detection solution like Sift Science. We provide protection from all of the e-commerce fraud types mentioned above and constantly update our models with emerging threats, leveraging insights from across the customer base. Please get in touch to discuss your challenges. We’d love to talk.

The USA has more e-commerce fraud than Nigeria

Sift Science customers hail from all [tooltip tip="On that note, let us know if you know any Antarctica startups..."]six habitable continents[/tooltip]. We’re seeing e-commerce fraud activity from practically everywhere as well, Albania to Vietnam. Since the Sift Science team includes quite a few data geeks who love #uberdata and OkCupid’s OkTrends blog, we thought we’d share a visualization of our global fraudulent transactions. What sort of fraud are we seeing? That deserves its own post (coming soon), but there are three major types: payment fraud (e.g. using stolen credit cards to buy goods), new account fraud (i.e. creating an account to do illicit stuff like money laundering) and account takeover (i.e. using someone's existing account to do illicit stuff).

Global e-commerce fraud rates by country

Above is a map of [tooltip tip="Defined as reported fraudulent transactions / total transactions originating in that country"]fraud rates[/tooltip] by country. Based on a sample of our transaction data, here are the top ten most fraudulent countries. You can see the top 25 countries at the end of the post.

  1. Latvia
  2. Egypt
  3. United States
  4. Mexico
  5. Ukraine
  6. Hungary
  7. Malaysia
  8. Colombia
  9. Romania
  10. Philippines

Biggest surprise? Nigeria. For all of the flak Nigeria gets with their e-mail scams (not all of which originate in Nigeria), we’re not seeing a lot of fraud from Nigerian IPs. In fact, Nigeria (#17) has only slightly more fraud than Canada (#18).

Several caveats are worth noting. Since this is based on a sample of our collected transaction data, it is not necessarily representative of the overall e-commerce fraud rates globally. For simplicity’s sake (developer time is a precious commodity at Sift!), we used the reported IP address as the country of origin. Lastly, just because a country shows up as higher fraud on this list doesn’t mean a merchant should create a fraud rule for it. We instead suggest adopting a more robust and versatile solution able to adapt to new patterns.

For our more technical readers: we used just over half a million transactions and included only those countries with at least 1000 total samples and at least 10 fraud samples. That puts the size of the 95% confidence interval on the fraud rate at just under 1%. To draw the map itself, we used d3 with topojson. Then, we overlaid the countries onto a Mercator projection, and computed the color as [percent of transactions labeled as fraud]*[max red saturation].
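To make the numbers above concrete, here is a minimal Ruby sketch of the filtering, interval, and color steps. It is an illustration only: the country names and counts are invented, the interval uses the common normal-approximation (Wald) formula, which may not be the method we actually used, and the 0-255 red scale is an assumption.

```ruby
# Minimum sample sizes quoted in the post.
MIN_TOTAL = 1000
MIN_FRAUD = 10

# Hypothetical per-country counts; these numbers are made up for illustration.
COUNTS = {
  "Examplestan" => { total: 1000,   fraud: 50 },
  "Samplevia"   => { total: 20_000, fraud: 120 },
  "Tinyland"    => { total: 400,    fraud: 12 }, # dropped: too few transactions
}.freeze

# Fraud rate = reported fraudulent transactions / total transactions.
def fraud_rate(total:, fraud:)
  fraud.to_f / total
end

# Half-width of a 95% normal-approximation (Wald) interval for a proportion.
# Assumption: the post does not say which interval method was used.
def ci_half_width(total:, fraud:, z: 1.96)
  rate = fraud_rate(total: total, fraud: fraud)
  z * Math.sqrt(rate * (1 - rate) / total)
end

# Map color: fraud fraction times the maximum red saturation (0..255 assumed).
def red_channel(rate, max_red: 255)
  (rate * max_red).round
end

# Keep only countries that meet both sample-size thresholds.
ELIGIBLE = COUNTS.select do |_, c|
  c[:total] >= MIN_TOTAL && c[:fraud] >= MIN_FRAUD
end

ELIGIBLE.each do |country, c|
  rate = fraud_rate(**c)
  puts format("%-12s rate=%.2f%% +/-%.2f%% red=%d",
              country, rate * 100, ci_half_width(**c) * 100, red_channel(rate))
end
```

Under this approximation, a country sitting exactly at the minimum thresholds (1000 transactions, 10 of them fraudulent) has an interval half-width below one percentage point.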

In the future, we’ll be sharing other insights from the terabytes of data we analyze to detect fraud. What would you like to see? Get in touch via Twitter or email with your suggestions.

Here are the top 25 fraudulent countries, from most fraudulent to least fraudulent.

  1. Latvia
  2. Egypt
  3. United States
  4. Mexico
  5. Ukraine
  6. Hungary
  7. Malaysia
  8. Colombia
  9. Romania
  10. Philippines
  11. Greece
  12. Brazil
  13. China
  14. Indonesia
  15. Russia
  16. Singapore
  17. Nigeria
  18. Canada
  19. Portugal
  20. Switzerland
  21. United Kingdom
  22. India
  23. Netherlands
  24. France
  25. Austria

Mobile e-commerce fraud detection insights

Mobile e-commerce is exploding. In the US, 56% of people already own smartphones. Internationally, adoption projections for countries like China show this trend is just beginning. Unfortunately, with the increasing limitations on mobile device fingerprinting, mobile e-commerce fraud detection has also become more complex.

Less data, mo problems

Mobile fraud suffers from two data-related problems: merchants ask for less customer information and the device data they do collect is less useful. Merchants request less info because conversion stands as their greatest challenge. Specifically, mobile customers give up nearly half their shopping attempts because the process takes too long. While the prioritization of conversion and growth over fraud detection is understandable, merchants are increasing their risk.

Example of efficiency versus mobile e-commerce fraud detection

Besides the fact that some traditional signals are unavailable on mobile devices (e.g. IP-based location), merchants are finding that the remaining data is often insufficient. In May, Gartner estimated that ~40% of mobile devices could not be uniquely identified, which is quite problematic as fraudsters shift to mobile along with legitimate customers.

Unique mobile e-commerce fraud detection patterns

Large-scale machine learning solutions like Sift Science provide a competitive advantage due to their breadth and flexibility. Two examples from our data illustrate machine learning’s power in e-commerce fraud detection. First, when comparing top fraud signals for a desktop web site to a mobile app, we found almost entirely different predictive fraud patterns (see table).

Differences between desktop and mobile app fraud detection

Notably, while behavior matters in both environments, the nature of in-app navigation requires a detection solution able to take into account the unique way each app is designed.  At Sift, we do this by accepting custom events. These are crucial in understanding whether a customer is a potential fraudster. The results also make a strong case for capturing more data, given the potential for any pattern to be predictive in detecting fraud.

New accounts: always riskier?

As a second example, consider the common belief that transactions from newly created accounts are riskier. In fact, our system uncovered a more nuanced reality, one difficult to detect without machine learning.

Nuanced results spotted by machine learning-based e-commerce fraud detection

Why might this be? Many sites have a “sign up when you make your first purchase” option that’s used by legitimate customers. In contrast, fraudsters tend to create accounts and then go shop for merchandise. Of course, time ranges will differ between companies, so custom variables are crucial for a fraud detection system.
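As a rough illustration of the kind of custom variable described above, this Ruby sketch turns the gap between signup and purchase into a coarse bucket. The method name, bucket labels, and cutoffs are all hypothetical; real cutoffs would differ from site to site.

```ruby
require "time"

# Hypothetical bucket boundaries in seconds; real cutoffs would differ per site.
BUCKETS = [
  [60,     "under_1_minute"],
  [3600,   "under_1_hour"],
  [86_400, "under_1_day"],
].freeze

# Returns a label for how long an account existed before it made a purchase.
# Both arguments are timestamp strings that Time.parse understands.
def account_age_bucket(signed_up_at, purchased_at)
  age = Time.parse(purchased_at) - Time.parse(signed_up_at)
  raise ArgumentError, "purchase precedes signup" if age.negative?

  match = BUCKETS.find { |limit, _label| age < limit }
  match ? match.last : "1_day_or_more"
end

puts account_age_bucket("2013-08-01T12:00:00Z", "2013-08-01T12:00:30Z")
puts account_age_bucket("2013-08-01T12:00:00Z", "2013-08-01T13:30:00Z")
```

A value like this could be sent alongside a transaction as a custom field, so the model can learn which time ranges matter for a particular business.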

These mobile e-commerce fraud detection insights demonstrate how a large-scale machine learning based solution not only catches more fraud, but also more efficiently identifies legitimate customers. Check back here often (or sign up for our email list) because we’ll be covering other fraud-related topics in future posts, such as technical aspects of mobile fraud and a look at fraud by country.

Mobile fraud detection in iOS 7

Rapid smartphone and tablet adoption has led to huge new opportunities for mobile e-commerce, including the rise of mobile-only companies like Uber and HotelTonight. However, mobile fraud patterns are significantly different from desktop ones. Most notably, device fingerprinting, a common anti-fraud technique, is ineffective for mobile fraud detection and leads to many false positives.

Limits of mobile device fingerprinting

A device fingerprint is the set of system configuration settings that collectively can identify a computer. On the mobile web, many of these settings, such as Flash cookies or user-customizable plug-ins/extensions, don’t exist. As a result, sites frequently see seemingly identical devices, a problem made worse by certain mobile carriers (e.g. MetroPCS) having a relatively small IP address pool. Inside mobile apps, the customer’s system configuration settings aren’t available at all. Therefore, since the demise of UDID this past spring, mobile app developers haven’t had an easy way to identify devices.

MAC address: a temporary fix?

To address this, we’d advised our clients to implement Bluetooth MAC address identification as a substitute for mobile device fingerprinting. For some customers, this reduced credit card fraud by 80% almost immediately. Unfortunately, Apple will effectively be blocking access to the MAC address in iOS 7, since every iOS device will report an identical 02:00:00:00:00:00. Beta testers are already seeing this. While just the latest step in Apple’s efforts to get developers onto their iAd platform, it poses a serious challenge to mobile fraud detection.

Mobile fraud detection with large-scale machine learning

There are no easy workarounds. It is possible that Apple will grant select companies access to the device’s MAC address, but so far that hasn't happened. The only truly long-term solution is a mobile fraud detection model that analyzes data beyond the device fingerprint (e.g. fine-grained GPS location) and outputs an assessment of fraud likelihood. At Sift Science, we believe that large-scale machine learning is the best approach, since it integrates every possible data point and adapts to the specific fraud patterns of each business. Drop us a line. We’d love to chat about your specific fraud challenges.

Prevent chargebacks strategically

Ten-second summary: Prevent chargebacks (both criminal and friendly)  with cross-functional coordination. To maximize profits, your fraud team should work with:

  • The website design team to rationalize payment form fields by substituting user-entered information with insights you already have from machine learning.
  • The customer service team to efficiently prevent and dispute friendly fraud.

................................................................

Thinking strategically about preventing chargebacks

Preventing e-commerce chargebacks -- due to malicious purchases with stolen credit cards or ‘friendly-fraud’ from disgruntled customers --  is a challenge that requires strategic coordination.

By working cross-functionally, your fraud team can:

  • Maximize your bottom line with an effective checkout form (with the website design teams)
  • Minimize the impact of friendly fraud (with the customer service team).

Maximize your bottom line with a balanced checkout form

Frictionless checkout is critical. An estimated 18% of customers who abandon carts do so because of checkout page complexity. ((The Baymard Institute’s E-Commerce Checkout Usability Report (June 2013) and blog have extensive research on ecommerce desktop and mobile best practices, like this post on streamlining the checkout process.)) Too simple a payment form, however, hinders chargeback prevention.

We’ve previously discussed why payment forms are so complicated. So how should your web design and fraud teams simplify checkout?

Rationalize fields. Some payment fields can be inferred from others and removed without impacting fraud prevention. For example, the customer’s billing address and zip code are critical to assessing chargeback risk, but billing city & state can be inferred from zip code. Similarly, credit card issuer maps to credit card number.
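To show what inferring fields might look like in practice, here is a small Ruby sketch. The ZIP table is a two-entry stand-in for a real postal database, the prefix rules cover only a few well-known card brands, and the method names are hypothetical.

```ruby
# Tiny stand-in for a real ZIP code database (assumption: a full lookup
# table or service would be used in production).
ZIP_TO_CITY_STATE = {
  "94111" => ["San Francisco", "CA"],
  "10001" => ["New York", "NY"],
}.freeze

# Guess the card brand from the leading digits of the card number.
# Only a few common prefixes are covered here.
def card_network(number)
  digits = number.delete("^0-9") # strip spaces, dashes, etc.
  case digits
  when /\A4/      then "Visa"
  when /\A5[1-5]/ then "Mastercard"
  when /\A3[47]/  then "American Express"
  when /\A6011/   then "Discover"
  else "unknown"
  end
end

# Fill in the fields the shopper no longer has to type.
def infer_billing_fields(zip:, card_number:)
  city, state = ZIP_TO_CITY_STATE[zip]
  { city: city, state: state, card_network: card_network(card_number) }
end

p infer_billing_fields(zip: "94111", card_number: "4111 1111 1111 1111")
```

With a lookup like this, the form only needs the ZIP code and card number; city, state, and card type can be derived on the server.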

Apple's payment form has low friction

Simplify field entry. Let shoppers input information as directly as possible. Expiration date fields formatted like "07-2012" and "July 2013" provide the same data to the fraud team, but the latter requires converting month number to name. Also, be mindful of the order of text fields and drop-down boxes to reduce a customer's switching between keyboard and mouse.

This payment form won't prevent chargebacks -- it'll give you carpal tunnel!

Turbocharge chargeback prevention with data. Some fields are bottlenecks to conversion, but incrementally useful for fighting fraud. Remove the field, but keep chargebacks low by using signals derived from the customer themselves. Sift Science’s machine learning technology, for example, can surface subtle fraud signals based on a user's behavioral, network and identity traits. ((Conventional rules-based fraud detection assumes the past is like the present, which isn't the case with fraudsters. We’ll write more on this later, but data will only boost your bottom line if you have the right tools!))

Every business is different, so run A/B tests to optimize your unique conversion-information trade-offs. Event tracking in Google Analytics can elucidate form abandonment issues on a field-by-field basis. (See footnote for info & alternatives). ((Be sure to use _trackPageview and be sure to set up a separate profile. For more about event tracking, see here. There are also jQuery-based methods, simpler but less granular tracking, and standalone software options.))

Minimize the impact of friendly fraud

Friendly fraud accounts for 23% of revenue lost to chargebacks, but this does not include the 57% of all fraud losses from credits issued. ((Cybersource (2013) via the Fraud Practice)) Friendly fraud is exasperating but inevitable since it’s easy for consumers to contest charges.

  • The good: Unlike a cancelled check, you can dispute chargebacks.
  • The bad: Contesting a chargeback is as fun as filing taxes.
  • The ugly:  While 80% of taxpayers qualify for a refund, merchants win back only 43% of chargeback revenue. ((Chargeback win rate is from Cybersource (2013). Tax refund data is from NPR.))

Collaborate with the customer service team. Your fraud team must be efficient and thorough in chargeback disputes. When a customer disputes a transaction, the onus is on you to prove that you acted correctly. ((The Fair Credit Billing Act protects consumers (and their credit scores) from billing errors and requires that credit card companies address disputes in a certain amount of time. While it is effective at protecting consumers from potential abuse by credit card companies, an unfortunate side effect is that chargebacks become an easy and effective way for disgruntled ecommerce shoppers to express discontent.))

  • Keep a record of all customer interactions (especially those related to returns or replacements) and make these easy for the fraud team to access.
  • Be clear about billing timing and method, return procedures (how and to what address), and ways to contact the customer service team.
  • Choose a billing name that is easy to recognize (or find on the internet) on credit card statements, especially if customers know your product rather than your business name. (37signals cut chargebacks 30% by changing theirs.) ((37signals, maker of SaaS software like Highrise, gives details in this blog post. Spoiler alert: They list a URL where customers can find information to get around character restrictions.))
  • Provide easy access to commonly requested information, such as tracking number, date of transaction/shipping/delivery, delivery address, item purchased, and purchaser name.

Fraud losses, whether from malicious criminals or friendly fraud, are a challenge. With strategic collaboration, data, and machine learning, you can overcome them and boost your bottom line.

At Sift Science, we believe all merchants should have access to the same class of technology used by Google and Amazon to prevent e-commerce fraud at a reasonable price. Get started with Sift's simple, one-hour integration!

Any other questions? We'd love to hear from you! Ping us at info@siftscience.com. 

Fight fraud frugally

At Sift Science, we’re developing an advanced machine learning system that is revolutionizing how e-commerce businesses protect themselves from cybercrime. We believe that all merchants should have access to the same class of technology used by Google and Amazon to prevent e-commerce fraud.

Today, we're delighted to announce new prices, which make Sift Science's simple and effective fraud detection free for 95% of e-commerce merchants and less than one penny per transaction for the others. 

The costs of fraud are high. The price to prevent it shouldn't be.

With fraud, everyone loses... except fraudsters

In the US, e-commerce fraud cost the industry $3.5 billion in 2012 (0.9% of all revenue).1

Three things you can do with $3.5 billion:

      • Buy an Audi A3 for each resident of Ann Arbor, Michigan.2
      • Fix the 2013 budget deficits of 11 states and the District of Columbia.3
      • Become the 4th largest owner of Facebook (~6% stake).4

Two reasons 0.9% is unacceptably high:

  • At 0.9%, the e-commerce fraud loss rate is roughly an eighth of the industry’s average profit margin (6.8%).5 Without e-commerce fraud, industry profits would be ~13% higher.
  • Businesses would have reinvested those profits in ways that help their customers and the economy, such as offering lower prices or hiring more employees.
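The ~13% figure above follows from dividing the fraud loss rate by the profit margin, both expressed as a share of revenue. A quick Ruby check, under the simplifying assumption that every dollar of avoided fraud loss becomes profit (the method name is ours, for illustration):

```ruby
# Both inputs are percentages of revenue.
# Assumption: all avoided fraud losses flow straight through to profit.
def profit_uplift(fraud_loss_rate, profit_margin)
  fraud_loss_rate / profit_margin
end

uplift = profit_uplift(0.9, 6.8)
puts format("Profits would be about %.1f%% higher", uplift * 100)
```

Running this prints a value of about 13.2%, which matches the rounded figure in the first bullet.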

It’s infuriating that fraudsters hurt the online economy so much. Merchants face chargebacks, lose reputation, and devote resources to their worst customers. Shoppers pay higher prices, get frustrated by lengthy checkout pages, and wait longer for their orders to be delivered. Finally, payments companies see their networks shrink as fraudsters drive merchants into excessive chargeback inferno.

Sift Science: now free for 95% of all e-commerce businesses

At Sift Science, we're building a product that makes it easy for everyone to detect fraud. Although we serve businesses, our prices are not those of a traditional 'enterprise' company:

Fight e-commerce fraud for free

Yes, you read that right. Every month, the first 10,000 transactions are free. Ninety-five percent of e-commerce businesses can now use our innovative machine learning product, completely free. A higher-volume business pays a bit more -- our team needs to eat, after all -- but the average cost per transaction will always be one penny or less.

We're betting that Sift Science clients -- with more efficient fraud-prevention teams, fewer chargebacks, and negligible third-party-tool costs -- will reinvest their savings and experience incredible growth. And we'll grow with them.

Check out our pricing page for more details and to calculate your monthly costs. Then get started with our simple, one-hour integration. Questions? Ping us at info@siftscience.com. 

Thanks for reading, and have a great day!

Sources:
1 Cybersource 2013
2 An Audi A3 costs $27,270 (Edmunds). Ann Arbor's population was 116,121 (US Census, 2012)
3 Math based on the Kaiser Family Foundation's State Budget Shortfalls SFY2013
4 FB market cap as of 6/20/201, ownership stake based on http://whoownsfacebook.com
5 IBISWorld, cited by Inc.com

Announcing Sift Science: fight fraud with large-scale machine learning

"Invariably, simple models and a lot of data trump more elaborate models based on less data." - Fernando Pereira, Peter Norvig, and Alon Halevy, The Unreasonable Effectiveness of Data

Imagine your website's traffic is skyrocketing. Sales are up each month. Then one day, your payment processor calls you up and tells you you've been hit—with $50,000 in credit card chargebacks.

What happened? Fraud. A criminal ring bought $50,000 of goods with stolen credit card numbers from the black market. Weeks after you shipped the goods, the original cardholders noticed a suspicious charge on their monthly statement, called up their bank, and reversed it, generating a chargeback. You, the site owner, are left footing the bill.

When clobbered by fraud, most sites default to fixed rules, such as reviewing every account with more than ten transactions. Commercial systems today deploy 175-225 standard rules, sometimes supplemented by crude statistical models with a few hundred parameters.

But fraudsters don't play by a fixed set of rules. So why should you?

Large-scale machine learning to the rescue

Sift Science uses large-scale machine learning to automatically discover new fraud patterns. Our algorithm has pored over hundreds of millions of user actions, from both good users and confirmed fraudsters, and distilled them down into one million statistical patterns that predict fraud.

What makes large-scale learning unique is the detail of patterns learned. Like peering through a microscope, when you up the resolution, you can spot surprising details the naked eye would never notice. For example, a user who signs up and waits an hour before making a purchase is 7x more likely to generate a chargeback than a user who purchases immediately after signup. Our system has pinpointed particular page navigation sequences, IP ranges, email address patterns, graph connectivity structures, browser configurations, and even types of text entered that predict fraudulent activity. And it’s learning more patterns each day.

Sites like Airbnb, Uber, Listia, and others already rely on Sift Science. When you use us, you're joining a network of sites fighting fraud together. As the network grows, our algorithm will crunch more data, learn more patterns, and fight fraud more accurately for everybody.

How it works

You can get started with Sift Science in minutes using our interactive quickstart, which is just three easy steps: copy-and-paste a JavaScript snippet, log transactions, and send examples of banned users. The JavaScript snippet captures in-browser data like page views or properties of the user’s machine. Transactions tell Sift Science about payments. If a user's IP address is from Nigeria but their billing zip is from San Mateo, CA, that's a suspicious sign. Examples of known fraudsters train our learning algorithm to spot patterns unique to your site.

We’ve designed our API to be simple and quick to integrate. For example, here’s how you would send us a transaction in Ruby:

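    require "httparty"
    require "json"

    # Send a $transaction event to the events endpoint as a JSON body.
    HTTParty.post("https://api.siftscience.com/v202/events", body: {
      "$user_id"       => "al_capone",
      "$type"          => "$transaction",
      "$amount"        => 153250000,  # $153.25 in micro USD
      "$currency_code" => "USD",
      "$billing_zip"   => "94111",
      "$user_email"    => "al@thecapones.com",
      "$api_key"       => "XXXXXX",   # replace with your own API key
      "trip_time"      => 231,        # custom field (no "$" prefix)
    }.to_json).body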

That’s it! As data flows in, Sift Science will start crunching it. Every site has its own unique twists, so we’ve built a trainer that lets you explicitly mark users as fraudulent or not fraudulent. Just like marking an e-mail as spam, as you mark more users, our algorithm will learn to detect exactly the type of fraud patterns you’re dealing with.

Ready, set, go

Building a large-scale machine learning system takes time and patience. Sift Science started as part of Y Combinator's summer 2011 batch, but a machine learning system can't be launched in the span of one summer. Each site is a little different, and to make a product that works across verticals, we put in long hours, held late-night debugging sessions, and ran hundreds of accuracy experiments. Along the way, we've been lucky enough to have the support of a dream team of investors with expertise in payments, artificial intelligence, fraud, and security:

  • Max Levchin (PayPal, Slide) led our seed round, with participation from Chris Dixon, Founder Collective, Marc Benioff (Salesforce), SV Angel, Start Fund, Alex Rampell (TrialPay, SiteAdvisor), Kevin Scott (LinkedIn), Lee Linden (Karma Science), Garry Tan (Posterous, Y Combinator), Harj Taggar (Y Combinator), and Alexis Ohanian (Reddit, Y Combinator).
  • We recently raised a Series A from Union Square Ventures and First Round Capital, and Albert Wenger of USV has joined our board. Rich Barton (Zillow, Expedia), Chris Dixon (Hunch, SiteAdvisor), and previous investors participated.

We're thrilled, at long last, to launch our public beta and show you what we’ve built. So kick the tires and give it a spin. Try Sift Science, and start fighting fraud with large-scale machine learning today!

Update: check out the coverage in Wired, AllThingsD, GigaOm, TechCrunch, TheNextWeb, VentureBeat, WSJ, and Silicon Valley Business Journal.