“Invariably, simple models and a lot of data trump more elaborate models based on less data.” – Fernando Pereira, Peter Norvig, and Alon Halevy, The Unreasonable Effectiveness of Data

Imagine your website’s traffic is skyrocketing. Sales are up each month. Then one day your payment processor calls to tell you that you’ve been hit with $50,000 in credit card chargebacks.

What happened? Fraud. A criminal ring bought $50,000 worth of goods from your site using stolen credit card numbers purchased on the black market. Weeks after you shipped the goods, the real cardholders noticed suspicious charges on their monthly statements, called their banks, and disputed them, generating chargebacks. You, the site owner, are left footing the bill.

When clobbered by fraud, most sites default to fixed rules, such as reviewing every account with more than ten transactions. Commercial systems today deploy 175-225 standard rules, sometimes supplemented by crude statistical models with a few hundred parameters.
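A fixed rule is just a hard-coded predicate. The Ruby sketch below is hypothetical: the `Account` struct, the `RULES` table, and the country-mismatch rule are illustrative additions, not rules taken from any commercial system. It shows how such a rule set flags accounts, and why a fraudster who stops at exactly ten transactions slips straight through.

```ruby
# Hypothetical account record; field names are illustrative only.
Account = Struct.new(:id, :transaction_count, :ip_country, :billing_country, keyword_init: true)

# A fixed rule set: each rule is a name plus a hard-coded predicate.
# The first mirrors the example in the text; the second is a made-up illustration.
RULES = {
  "more_than_ten_transactions" => ->(a) { a.transaction_count > 10 },
  "ip_billing_country_mismatch" => ->(a) { a.ip_country != a.billing_country }
}.freeze

# Returns the names of every rule the account trips.
def triggered_rules(account)
  RULES.select { |_name, rule| rule.call(account) }.keys
end

# Any triggered rule sends the account to manual review.
def needs_review?(account)
  !triggered_rules(account).empty?
end

heavy = Account.new(id: 1, transaction_count: 11, ip_country: "US", billing_country: "US")
careful = Account.new(id: 2, transaction_count: 10, ip_country: "US", billing_country: "US")
puts triggered_rules(heavy).inspect # the ten-transaction rule fires
puts needs_review?(careful)         # stays just under every threshold
```

The thresholds never change unless someone edits them by hand, which is exactly the rigidity the next paragraph objects to.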

But fraudsters don’t play by a fixed set of rules. So why should you?

Large-scale machine learning to the rescue

Sift Science uses large-scale machine learning to automatically discover new fraud patterns. Our algorithm has pored over hundreds of millions of user actions, from both good users and confirmed fraudsters, and distilled them down into one million statistical patterns that predict fraud.

What makes large-scale learning unique is the level of detail in the patterns it learns. It’s like peering through a microscope: when you turn up the resolution, you spot surprising details the naked eye would never notice. For example, a user who signs up and waits an hour before making a purchase is 7x more likely to generate a chargeback than a user who purchases immediately after signing up. Our system has pinpointed particular page navigation sequences, IP ranges, email address patterns, graph connectivity structures, browser configurations, and even types of text entered that predict fraudulent activity. And it’s learning more patterns each day.
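A claim like “7x more likely” is a relative risk: the chargeback rate in one group divided by the rate in another. The sketch below shows only that arithmetic. The method name is hypothetical, and the counts in the usage lines are made up for illustration; they are not Sift Science’s data.

```ruby
# Relative risk: chargeback rate in group A divided by chargeback rate in group B.
# For example, A = users who waited an hour after signup before buying,
#              B = users who bought immediately after signup.
def relative_risk(chargebacks_a, users_a, chargebacks_b, users_b)
  raise ArgumentError, "group sizes must be positive" unless users_a.positive? && users_b.positive?
  raise ArgumentError, "baseline group has no chargebacks" if chargebacks_b.zero?

  # Rational keeps the ratio exact instead of accumulating Float error.
  rate_a = Rational(chargebacks_a, users_a)
  rate_b = Rational(chargebacks_b, users_b)
  rate_a / rate_b
end

# Made-up counts: 14 chargebacks per 1,000 "waited" users vs 2 per 1,000 "immediate" users.
puts relative_risk(14, 1000, 2, 1000).to_f
```

Each learned pattern can be read this way, as one group of users whose fraud rate differs sharply from a baseline. A large-scale system tracks far more such patterns than anyone could write by hand.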

Sites like Airbnb, Uber, and Listia already rely on Sift Science. When you use us, you’re joining a network of sites fighting fraud together. As the network grows, our algorithm will crunch more data, learn more patterns, and fight fraud more accurately for everybody.

How it works

You can get started with Sift Science in minutes using our integration guide. The JavaScript snippet captures in-browser data such as page views and properties of the user’s machine. Transaction events tell Sift Science about payments: if a user’s IP address is from Nigeria but their billing zip code is in San Mateo, CA, that’s a suspicious sign. Users you confirm as fraudsters then train our learning algorithm to spot patterns unique to your site.
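Raw signals like an IP/billing mismatch have to be turned into features before a model can weigh them. The following minimal sketch assumes each event already carries an IP-derived country, a billing country, and ISO 8601 signup and purchase timestamps. These keys and the `extract_features` method are hypothetical and are not Sift Science’s schema. The signup-delay feature echoes the one-hour example earlier in the post.

```ruby
require "time"

# Turns one raw purchase event (a hash with hypothetical keys) into named
# true/false features that a model can weigh.
def extract_features(event)
  # Minutes elapsed between account signup and this purchase.
  minutes = (Time.iso8601(event[:purchase_at]) - Time.iso8601(event[:signup_at])) / 60.0

  {
    # IP geolocation disagrees with the billing address.
    "ip_billing_country_mismatch" => event[:ip_country] != event[:billing_country],
    # The user waited at least an hour after signing up before buying.
    "waited_an_hour_or_more_after_signup" => minutes >= 60
  }
end

features = extract_features({
  ip_country: "FR", billing_country: "US",
  signup_at: "2012-11-01T10:00:00Z", purchase_at: "2012-11-01T11:30:00Z"
})
puts features.inspect
```

In practice the IP’s country would come from a GeoIP lookup, which this sketch leaves out.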

We’ve designed our API to be simple and quick to integrate. For example, here’s how you would send us a transaction in Ruby:

require "httparty"
require "json"

HTTParty.post(
  "https://api.siftscience.com/v202/events",
  body: {
    "$user_id"       => "al_capone",
    "$type"          => "$transaction",
    "$amount"        => 153250000,  # $153.25 in micro USD
    "$currency_code" => "USD",
    "$billing_zip"   => "94111",
    "$user_email"    => "al@thecapones.com",
    "$api_key"       => "XXXXXX",
    "trip_time"      => 231,
  }.to_json
).body
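The `$amount` field above is an integer in micro-units (millionths of a dollar), so $153.25 becomes 153,250,000. If you’d rather avoid the HTTParty dependency, the following sketch builds a similar request with Ruby’s standard library. It converts a decimal dollar string to micros with `Rational`, which parses the string exactly instead of going through a Float. The helper names `to_micros` and `build_transaction_request` are hypothetical. The endpoint and field names are copied from the example above and may have changed since this post was written. The request is only built here, not sent.

```ruby
require "json"
require "net/http"
require "uri"

EVENTS_URI = URI("https://api.siftscience.com/v202/events")

# Hypothetical helper: "153.25" (dollars) -> 153_250_000 (micro-dollars).
def to_micros(amount)
  (Rational(amount) * 1_000_000).round
end

# Hypothetical helper: builds, but does not send, a $transaction event request.
def build_transaction_request(api_key:, user_id:, amount:, currency_code: "USD", extra: {})
  body = {
    "$api_key" => api_key,
    "$type" => "$transaction",
    "$user_id" => user_id,
    "$amount" => to_micros(amount),
    "$currency_code" => currency_code
  }.merge(extra)

  request = Net::HTTP::Post.new(EVENTS_URI)
  request["Content-Type"] = "application/json"
  request.body = body.to_json
  request
end

request = build_transaction_request(
  api_key: "XXXXXX", user_id: "al_capone", amount: "153.25",
  extra: { "$billing_zip" => "94111", "$user_email" => "al@thecapones.com", "trip_time" => 231 }
)
# To actually send it:
# Net::HTTP.start(EVENTS_URI.host, EVENTS_URI.port, use_ssl: true) { |http| http.request(request) }
```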

That’s it! As data flows in, Sift Science will start crunching it. Every site has its own unique twists, so we’ve built a trainer that lets you explicitly mark users as fraudulent or not fraudulent. Just like marking an e-mail as spam, as you mark more users, our algorithm will learn to detect exactly the type of fraud patterns you’re dealing with.
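Marking users works like any supervised learning loop: each mark becomes a training example. The post doesn’t say which model Sift Science uses, so the following sketch substitutes a plain online logistic regression over named true/false features. The `FraudScorer` class and its methods are hypothetical. Each call to `label` takes one gradient step that nudges the weights of the features present on that user toward the label.

```ruby
# Minimal online logistic regression over named true/false features.
# A generic illustration of learning from fraud / not-fraud labels,
# not Sift Science's actual model.
class FraudScorer
  def initialize(learning_rate: 0.1)
    @weights = Hash.new(0.0) # unseen features start at weight 0
    @bias = 0.0
    @learning_rate = learning_rate
  end

  # Estimated probability that a user with these features is fraudulent.
  def score(features)
    z = @bias + active(features).sum { |name| @weights[name] }
    1.0 / (1.0 + Math.exp(-z))
  end

  # One stochastic gradient step on a labeled user (fraud = true or false).
  def label(features, fraud)
    error = (fraud ? 1.0 : 0.0) - score(features)
    @bias += @learning_rate * error
    active(features).each { |name| @weights[name] += @learning_rate * error }
  end

  private

  # Names of the features switched on for this user.
  def active(features)
    features.select { |_name, on| on }.keys
  end
end
```

As with spam filtering, the more users you mark, the more the weights reflect the fraud patterns specific to your site.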


Ready, set, go

Building a large-scale machine learning system takes time and patience. Sift Science started as part of Y Combinator’s summer 2011 batch, but a machine learning system can’t be launched in the span of one summer. Each site is a little different, and to make a product that works across verticals, we put in long hours, held late-night debugging sessions, and ran hundreds of accuracy experiments. Along the way, we’ve been lucky enough to have the support of a dream team of investors with expertise in payments, artificial intelligence, fraud, and security:

  • Max Levchin (PayPal, Slide) led our seed round, with participation from Chris Dixon, Founder Collective, Marc Benioff (Salesforce), SV Angel, Start Fund, Alex Rampell (TrialPay, SiteAdvisor), Kevin Scott (LinkedIn), Lee Linden (Karma Science), Garry Tan (Posterous, Y Combinator), Harj Taggar (Y Combinator), and Alexis Ohanian (Reddit, Y Combinator).
  • We recently raised a Series A from Union Square Ventures and First Round Capital, and Albert Wenger of USV has joined our board. Rich Barton (Zillow, Expedia), Chris Dixon (Hunch, SiteAdvisor), and previous investors participated.

We’re thrilled, at long last, to launch our public beta and show you what we’ve built. So kick the tires and give it a spin. Try Sift Science, and start fighting fraud with large-scale machine learning today!

Update: check out the coverage in Wired, AllThingsD, GigaOm, TechCrunch, The Next Web, VentureBeat, WSJ, and Silicon Valley Business Journal.

  1. What a fantastic idea. I’m surprised that I don’t see the source IP address getting analyzed, though; in my experience, that can be a good indicator of fraud. Do you have plans to include this in your engine?

  2. Nice work! A decent use of machine learning. But why the Nigeria example? An example like that doesn’t help a country already struggling with its image. This isn’t to excuse the bad eggs or deny that fraud happens. I just think it is too easy for bloggers to default to Nigeria when it comes to fraud and scams. Not a good trend… I guess I’m digressing from the great work you guys have done.

    1. This idea is fantastic. I thought of it years ago but never tried it. What kind of models do you use? Neural networks? Logistic regression? Some other machine learning method? I am really interested in this. Thanks.
