Introducing "Unlabeling" - A new word and a new Sift feature!

Have you ever accidentally labeled a user as “Bad” or “Not Bad”? Or perhaps further investigation on a user left you wishing you could undo the label you initially selected? If this describes you - you aren’t alone! In response to popular demand, we're excited to introduce the new and amazing “unlabeling” feature. Starting today, you can fix your labels quickly and easily.  

Why does accurate labeling matter?  

When you mark a user as “Bad” or “Not Bad”, you are training Sift's advanced machine learning system to better find and predict fraud for your business. For example, if you’ve mistakenly labeled a legitimate and good user as “Bad”, then Sift will learn incorrectly and may mistakenly identify other good users as potentially fraudulent.

We understand that labeling errors do happen and that people change their minds. After all, we're all only human.

This new feature lets you remove the “Bad” or "Not Bad” labels for a user - think of it as the undo button for your fraud team! With unlabeling, you don’t have to worry because it’s easier than ever to focus on fighting fraud.

Best of all? Unlabeling is available today! You can either unlabel directly in the console (as shown in the GIF below) or use our Labels API.

unlabeling.gif

New to labeling? No worries - read about it here and start labeling today!

Happy Labeling (and Unlabeling)!

ML Meetup Success!

This is a guest post from ML Meetup organizer Tony Tran.


The SF Bay Area ML Meetup group recently held an event at Sift Science. It was our first time hosting an event at Sift, as well as our first time having Sift engineers present. Overall, it was an excellent event. The food was great, the venue was beautiful, and our hosts were extremely kind. Those of you who couldn’t attend really missed out! But not to worry, I’ll give you a quick run-down of what happened.

“What did I miss with the talks?”

Both of the presentation slides are available online (ml_infrastructure, feature_engineering) and are easy to understand without narration.

Andrey Gusev gave a lightning talk on “Machine Learning Infrastructure.” In it he discussed:

  • Data transformations
  • Online and Batch learning
  • Motivations for using HBase

Doug Beeferman gave the main presentation on “Feature Engineering for Real-Time Fraud Detection.” In it he discussed:

  • What “fraud” means
  • Useful features for fraud detection
  • 10 Lessons Learned (this was extremely insightful)

Unfortunately, my above bullet points don’t do the talks justice, so definitely check out the slides (ml_infrastructure, feature_engineering). Also, Andrey gave a similar talk at the HBase meetup that was recorded (link).

“How was Sift Science?”

One of the most common questions that I get from people who didn’t attend the event is, “how was Sift Science?”

There were a total of 15 Sift Science employees at the event including the CEO and CTO. In my opinion, it really says a lot about the company when both the CEO and CTO are present for community events like these -- I haven’t seen this happen too often.

My impression of Sift was that everyone was extremely humble and willing to help. In addition, the team came off as being very serious when it came to engineering quality, yet very lighthearted when it came to interacting with one another. I would say that Sift did a great job at building their company culture. It feels like a place where people would genuinely feel happy coming to work (at least that’s what they tell me).

Would I recommend checking them out if you’re an engineer, or even a non-engineer, looking for a place to work? Absolutely.

Just to clarify, I am in no way affiliated with Sift Science. If you have any questions about this event, or want to get my thoughts on Sift Science, feel free to reach out to me on Twitter (@quicksorter) or message me via the meetup group.

 

The Summary of Sift in Fall 2014

It's been a whirlwind of a fall for Sift Science. Twelve-plus events in 3 months -- including incredible tech talk opportunities, university visits, and interesting conferences -- helped us to close out our 2014 push. What are your favorite events to attend? What tech talk should Sift host next?

Join us on this recap of our fall, where we fell in love.... With great new friends, places, potential Sifties, and Sifterns!


Greylock Tech Fair

July 31, 2014

Held in our own backyard at the San Francisco Ferry Building, the Greylock tech fair was a fun event! Sift joined 49 other startups to meet hundreds of local CS students.

HBase Meetup

August 28, 2014

Sift hosted the SF HBase Meetup, welcoming 52 guests and 3 incredible speakers, including Michael Stack of Cloudera and our own Tech Lead for ML Infrastructure, Andrey Gusev! This event was an opportunity to showcase our amazing new office and gather some like-minded individuals, excited about scaling and data management.

PennApps Hackathon

September 12-14, 2014

In our first trip to UPenn’s Fall Hackathon, we encountered some incredible projects and awesome teams. Sift sponsored 2 prizes, one for Best Data Viz and another for Best Use of ML, and the hacks really blew us away.

Doug at Airbnb Nerds

October 1, 2014

Sift’s first engineer, Doug Beeferman, spoke at Airbnb Engineering’s regularly scheduled tech talk night in early October. Feature engineering proved a fascinating topic for his audience, as Doug demonstrated how Sift takes machine learning out of academia and into real-world, real-time fraud detection.

Doug at the nerds.airbnb.com event

Micah for APICraftSF

Oct 2, 2014

The very next day, Sift engineer Micah Wylde impressed the API Craft SF crowd with an overview of the evolution of our API. His talk provided great insight into the time and effort that Sift dedicated to making its console as user-friendly as possible.

Reflections | Projections

October 2, 2014

For R|P, Sift sent two Illinois alumni to connect with the students and share their sifty knowledge. In addition to setting up a table at the career fair, Andrey and Alex shared their insights with the Illinois SIGMIS group on machine learning in the real world and working at a startup.

Sift at Illinois

Tech Talk, MIT

October 6, 2014

With the help of our summer Siftern Keren, Doug took his tech talk on the road. Four pizzas and 40 cannoli later, we connected with a host of ML-focused students on their home turf.

Grace Hopper Celebration

October 8-11, 2014

Sift’s first official visit to the Grace Hopper Conference was totally awesome. Not only did we meet hundreds of smart, ambitious, and inspired women, but we also spread the word about Sift Science! Our Lead Solutions Engineer, Katherine, presented on “Fraud Detection with Machine Learning: A Case Study from Sift Science", and drew quite the crowd. We can’t wait for GHC15!

Code@Night, Princeton

October 10, 2014

The next stop on Doug’s East Coast tour en route to Start@A Startup was at Princeton’s Code@Night. Very sifty Siftern-emeritus David hosted and rounded up fifty Tigers for the event.

Start @ A Startup

October 11-12, 2014

Sift CEO Jason and Doug took NYC by storm with Start@, hosting a few panels and offering key learnings for the cream-of-the-crop student attendees. With another talk in the bag and countless engaging conversations, our second year at Start@ was a great success.

UW Startup Career Fair

October 21, 2014

Ahh, UW. As Jason’s alma mater and the old stomping grounds of many a Siftern, UW holds a very special place in Sift’s heart. We had a table set up at the career fair, and a special talk during the post-event reception. Did you see our twinkling Sift booth?

UW booth 2014

UC Berkeley Startup Fair

October 22, 2014

Sift Science employs several proud Golden Bears, and with Cal right across the Bay, we couldn’t help but join in on the UCB Startup Fair. As expected, we met so many amazing students of all years and emphases.

SF Bay Area Machine Learning Meetup at Sift

December 3, 2014

Our final external event of the year takes place tonight! Will you be there?


What are some of the events that you attend or host every year? Which are on your not-to-be-missed list?

Our biggest release yet - the new Sift Science Console!

Today, we released a new version of the Sift Science Console that makes manual reviews even faster, easier and more accurate.

Here are some of the new features we've introduced:

Lists

With Lists, you can create and save an unlimited number of manual review queues instead of being limited to the Orders, Users and Search tabs. Sharing your Lists with coworkers is as easy as sharing a link in an email.

You can also make decisions faster directly in a List. We now summarize order details and user attributes, like Number of Billing Addresses or Account Age, saving you a trip to User Details to get the same information.

Social Identity Checks

Users who have social profiles tend to be much less suspicious than users without a presence on social media. We now link to profiles on major social networking sites, including LinkedIn, Facebook, Instagram and Twitter, directly from User Details.

Better User Details

We've redesigned User Details to organize Orders, Identity, Network and Social data into dedicated sections. For example, you can now see every order a user has placed all in one section.

A Fresh Coat of Paint

Last, we've redesigned the look-and-feel of the Console to reflect a more modern style we think you'll love.

If you have any feedback, questions or concerns, please reach out to me personally at sripad@siftscience.com or contact our support team at support@siftscience.com.

Thanks so much for your support!

Sripad Sriram

API Development at Sift Science

At Sift Science, APIs are incredibly important. We spend a lot of time improving upon our existing APIs and thinking about how to design even better ones. A couple of weeks ago at the San Francisco API Craft Meetup, I gave a talk on how we built the API that powers our new Sift console.

At its inception, the Sift console was an internal Rails app built for investigating model issues. As we made the console accessible to our users, we rewrote it as a single-page JavaScript app driven by a set of private, undocumented APIs. For the third iteration of our console, we took an API-driven approach. The new APIs that drive the console are powerful and comprehensive enough to allow our users to build their own interfaces atop their data. The console is just another consumer of these underlying APIs.

Some of the technologies we utilized to build it include:

In my talk, I discussed how we migrated our API and console, as well as some of the lessons we learned along the way.

If you missed the talk, you can check it out below! Questions about Sift? Feel free to drop us a line any time.

Running ML Infrastructure on HBase

We recently hosted our first ever HBase meetup! This was a very exciting event for us as it was the first time we showed off some of the great infrastructure and systems we've built to power our machine learning platform.

Of course, we didn't start with HBase. When we first launched in April 2012 our platform was built on MongoDB. At the time, Mongo provided a great balance between flexibility and operability, but we very quickly outgrew it and moved to HBase and now proudly serve thousands of sites and many 10s of thousands of requests per second on our HBase cluster. 

In our talk at the Meetup, Andrey focuses on the underlying infrastructure we have built to support both online and offline learning at scale and how HBase, in particular, lends itself to this problem.

We look forward to hosting more meetups around infrastructure, systems and data science in the coming weeks and months. If you're interested in learning more or hacking on HBase and machine learning, please don't hesitate to reach out to us! 

Service Incident Postmortem: Breakdown and Root Cause

Our APIs experienced an outage from approximately 11:36PM PDT on August 26th, 2014 to 12:53AM PDT on August 27th, 2014. While we've outlined what happened and the impact to our customers, we'd like to detail the root cause, how we fixed the issue and what we're doing to ensure it doesn't happen again.

Root Cause

Our event processing system is asynchronous. At the edge of our network we run API servers that receive event data from both our JavaScript Snippet and Event APIs. These servers are mostly stateless but depend on a small database of account information (for validating input) and Amazon SQS for queuing work for our classifier fleet. While the small database of account information is accessed using a full read-through cache that is tolerant of downstream outages, our use of SQS had no provisions for unavailability.

We made the naive assumption, based on SQS documentation, that a queue would always be available given the Redundant Infrastructure guarantee. More specifically, we assumed that for any logical queue, there were many physical queues across Availability Zones providing the queue's availability. This evidently is not the case. While our primary event processing queue had been alive for over 2 years and processed many 10s of billions of events, on the evening of August 26th it simply vanished and reappeared a few hours later.
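To make the "read-through cache that is tolerant of downstream outages" idea concrete, here is a minimal Python sketch of the general pattern. It is not Sift's actual implementation: the class name `StaleTolerantCache`, the `loader` callback, and the TTL value are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class StaleTolerantCache:
    """Read-through cache that falls back to the last known value
    when the backing store cannot be reached (hypothetical sketch)."""

    def __init__(self, loader: Callable[[str], dict], ttl_seconds: float = 60.0):
        self._loader = loader          # function that reads from the database
        self._ttl = ttl_seconds        # how long an entry counts as fresh
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, key: str) -> dict:
        cached = self._entries.get(key)
        now = time.monotonic()
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]           # fresh hit, no downstream call
        try:
            value = self._loader(key)  # read through to the database
        except Exception:
            if cached is not None:
                return cached[1]       # downstream outage: serve the stale value
            raise                      # nothing cached, so the error surfaces
        self._entries[key] = (now, value)
        return value
```

The point of the pattern is that a dependency outage degrades freshness rather than availability. A queue dependency with no comparable fallback has nothing to degrade to, which is the gap described above.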

Read More

The True Cost of e-Commerce Fraud For A Store Owner

How do experts measure fraud? A recurring theme in any fraud-centric conversation is how to comprehend its total costs. Throughout my 12 years in e-Commerce, I’ve worked with countless merchants and their many partners in finance, operations, and marketing. Too often, businesses push fraud to the back-burner, not realizing its true costs. The reality is that the impact of e-Commerce fraud on a merchant’s bottom line is deeply damaging. In this post, I’ll share a real-world example to better illustrate the true cost of fraud.  

Meet Jennifer

Jennifer is a store owner who sells jeans through Shopify, an e-Commerce platform. She buys her most popular product - the Boyfriend Jeans - from her local wholesale vendor at $20 a pair. Jennifer uses keystone markup (twice the wholesale cost) to price her item at $40 and offers free shipping on all purchases.

At first glance, a simple calculation shows a 50% profit margin ($20 profit from a $40 sale) for her Boyfriend Jeans. Although a 50% margin on every sale sounds appealing to many merchants, there are many more costs that haven’t been accounted for.
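For readers who like to see the arithmetic, here is a small Python sketch of Jennifer's numbers. The function names are ours, and the $5 of "other costs" in the last line is a made-up figure used only to show how extra costs eat into the margin; it does not come from Jennifer's books.

```python
def gross_margin(price: float, cost: float, other_costs: float = 0.0) -> float:
    """Profit as a fraction of the sale price."""
    return (price - cost - other_costs) / price


def markup(price: float, cost: float) -> float:
    """Profit as a fraction of the wholesale cost."""
    return (price - cost) / cost


wholesale_cost = 20.00
retail_price = 2 * wholesale_cost  # keystone markup: twice the wholesale cost

print(gross_margin(retail_price, wholesale_cost))  # 0.5 (the "50% profit")
print(markup(retail_price, wholesale_cost))        # 1.0 (a 100% markup)

# Hypothetical: $5 of additional per-order costs (illustrative value only).
print(gross_margin(retail_price, wholesale_cost, other_costs=5.00))  # 0.375
```

Note that keystone pricing is a 100% markup on cost but a 50% margin on the sale price; the two are easy to confuse.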

Read More

5 Worst Internet Scams of All Time

Online fraud is expensive. The recent StubHub scam cost $1.6 million and the Target data breach cost an estimated $200M (and counting). At Sift Science, we help customers fight back by analyzing millions of data points on patterns of fraudulent behavior and new tactics. We hear about fraud stories, large and small, and discover something new every day. Today, we take you back in time to show you 5 of the worst online scams of all time:

5. $1.3MM Lost in Online Dating Scam

In 2013, Ellen, a comfortably retired Canadian woman, lost her life savings of $1.3 million to “Dave”. “Dave” connected with and wooed the lonely Ellen, who thought she had found companionship on an online dating site. Dating site fraudsters prey on vulnerable men and women to elicit money, gifts, and other favors. It’s almost too easy to fabricate stories, personalities, and relationships from behind a screen. After crooks form “relationships” digitally, all they have to do is devise legitimate-sounding reasons for their victims to send money overseas.

Read More

Seven Habits of Highly Fraudulent Users

At Sift Science, we analyze a lot of data. We distill fraud signals in real-time from terabytes of data and more than a billion global events per month. Previously, we discovered that the U.S. has more fraud than Nigeria and solved the mystery of Doral, FL. At our “Cats N’ Hacks” Hackathon last week, I decided to put some of our fraud signals to the test. Working with our Machine Learning Engineer, Keren Gu, we discovered some interesting fraud patterns[1]:  

Habit #1: Fraudsters Go Hungry

Normal Transactions

When we looked at total non-fraudulent (normal) transactions by hour, normal users had slow starts to their mornings. We noticed a slight dip in transaction volume around lunchtime and suspect that's because people are taking lunch breaks! Happily fed, they resumed activity in the afternoon and activity petered out as users went home for the day.

What about fraudsters?

Fraudulent Transactions

Fraudsters, however, work through lunch. We don’t see the same dip in activity during lunchtime in the fraudulent sample. It seems that fraudsters are too busy scheming their next move.

Habit #2: Fraudsters Are Night Owls

night owls

When we analyzed fraudulent transactions as a percentage of all transactions, 3AM was the most fraudulent hour in the day, and night-time in general was a more dangerous time. This finding is consistent with our historical findings and it makes sense: fraudsters are more likely to execute attacks outside of normal business hours when employees aren’t around to monitor fraud.
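If you want to run the same kind of analysis on your own data, the calculation is simple. Here is a minimal Python sketch, assuming each transaction has been reduced to a `(local_hour, is_fraud)` pair; that input shape and the function name are our assumptions, and the sample data is invented.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def fraud_rate_by_hour(transactions: Iterable[Tuple[int, bool]]) -> Dict[int, float]:
    """Return fraudulent transactions as a fraction of all transactions, per hour."""
    totals: Counter = Counter()
    frauds: Counter = Counter()
    for hour, is_fraud in transactions:
        totals[hour] += 1
        if is_fraud:
            frauds[hour] += 1
    return {hour: frauds[hour] / totals[hour] for hour in sorted(totals)}


# Invented sample: (hour of day in the user's local time, confirmed fraud?)
sample = [(3, True), (3, False), (14, False), (14, False), (14, True), (14, False)]
rates = fraud_rate_by_hour(sample)
riskiest_hour = max(rates, key=rates.get)
print(rates)          # {3: 0.5, 14: 0.25}
print(riskiest_hour)  # 3
```

Dividing by the hour's total volume matters: a busy afternoon hour can have more fraudulent transactions in absolute terms and still be safer than 3AM.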

Habit #3: Fraudsters Are International

international

Indian email address domains had one of the highest fraud rates when compared to other top-level domains. However, don't give up on those great Bollywood movies just yet! We're only looking at data from the past three months. We've seen this list fluctuate quite a bit depending on what new tactics fraudsters use.

Habit #4: Fraudsters Don Multiple Identities

multipleidentities

Fraudsters tend to make multiple accounts on their laptop or phone to commit fraud. The more accounts associated with the same device, the higher the likelihood of fraud. The graph above shows how many times more likely a user is to be fraudulent given the number of accounts associated with the user's device. Phew, that was a mouthful! Said another way, a user who has 6 accounts on her laptop is 15 times more likely to be fraudulent than the average person. Users with only 1 account, however, are less likely to be fraudulent.
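The "times more likely than average" number is sometimes called lift. Here is a rough Python sketch of how it could be computed, assuming records of the form `(user_id, device_id, is_fraud)`. The record shape, the function name, and the sample data are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def fraud_lift_by_account_count(
    users: Iterable[Tuple[str, str, bool]]
) -> Dict[int, float]:
    """Map 'number of accounts on the user's device' to
    (fraud rate in that group) / (overall fraud rate)."""
    users = list(users)
    if not users:
        return {}
    # Count distinct accounts seen on each device.
    device_accounts = defaultdict(set)
    for user_id, device_id, _ in users:
        device_accounts[device_id].add(user_id)
    overall_rate = sum(1 for _, _, is_fraud in users if is_fraud) / len(users)
    if overall_rate == 0:
        return {}
    # Bucket users by how many accounts share their device.
    buckets = defaultdict(lambda: [0, 0])  # accounts -> [fraud count, total count]
    for _, device_id, is_fraud in users:
        n_accounts = len(device_accounts[device_id])
        buckets[n_accounts][1] += 1
        if is_fraud:
            buckets[n_accounts][0] += 1
    return {
        n: (fraud / total) / overall_rate
        for n, (fraud, total) in sorted(buckets.items())
    }


# Invented sample: two single-account devices, one device shared by two accounts.
sample_users = [
    ("u1", "d1", False),
    ("u2", "d2", False),
    ("u3", "d3", True),
    ("u4", "d3", True),
]
print(fraud_lift_by_account_count(sample_users))  # {1: 0.0, 2: 2.0}
```

A lift above 1 means the group is riskier than average, and a lift below 1 means it is safer.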

Habit #5: Fraudsters Still Use Microsoft

outlook

Some of the most fraudulent email domains are operated by Microsoft. Why could this be? Two possible reasons are that 1) Microsoft has been around for a lot longer and 2) email addresses were easier to create back in the day. Today, websites use challenge responses such as image verification or two-factor authentication to verify your legitimate (and innocent!) identity.

Habit #6: Fraudsters Are Really Boring

boring

One of the most widely recognized predictors of fraud is the number of digits in an email address. The more numbers, the more likely that it’s fraud. Why? Because fraudsters are boring (and lazy). They use computer programs to sequentially generate email addresses so they don’t have to think of new ones. Emails such as “foo1234@test.com” or “foo1234568@testing.com” are highly suspicious. However, detecting fraud using email address alone can be really difficult. The only way to really get good at detecting fraud is to look at hundreds of signals, sometimes in the thousands (that’s where machine learning can help).
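The digit-count signal is easy to compute yourself. Here is a minimal Python sketch; the function name is ours, and counting only the part before the "@" is our assumption about how such a signal might be defined.

```python
def digits_in_local_part(email: str) -> int:
    """Count the digits before the '@' in an email address."""
    local_part = email.rsplit("@", 1)[0]
    return sum(ch.isdigit() for ch in local_part)


print(digits_in_local_part("foo1234@test.com"))        # 4
print(digits_in_local_part("foo1234568@testing.com"))  # 7
print(digits_in_local_part("jane.doe@example.com"))    # 0
```

On its own this is a weak signal (plenty of honest people have a birth year in their address), which is why it works best as one input among many.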

Habit #7: Fraudsters Are Sneaky

sneaky

Fraudsters like to create disposable accounts that are short-lived. In analyzing the age of fraudulent user accounts (meaning, the amount of time between account creation and a fraudulent transaction), we found that they sign up on sites and then quickly commit fraud. The longer the account age, the less likely the user is committing fraud. Nonetheless, experienced fraudsters know that fraud detection companies track this type of signal. In the graph above, we noticed "sleeper" fraud agents became active after 30 and 60 days of account creation. Fraudsters are sneaky!

Obviously, the above is not a definitive sample set. Data can help us find potential answers as to why fraudsters behave in the ways that they do, but as statisticians say, “correlation is not causation”! It’s important to use common sense and human intuition when it comes to dealing with fraud.

These Insights Brought To You By:

Sift Science Cats n' Hacks Hackathon

Did you like reading about these insights and patterns? What other fraud signals interest you? Let us know in the comments, and we’ll pick out a few to write about in our next installment!

[1] Data was collected from the past three months over our entire network. From the hundreds of millions of transactions we processed during that time, we analyzed about 6 million. Our “fraud” sample consisted of transactions confirmed fraudulent by our customers; our “normal” sample consisted of transactions confirmed by our customers to be non-fraudulent, as well as a subset of unlabeled transactions. Please keep in mind that every company faces different types of fraud, and that our findings may not be representative of what you see. All transaction timestamps are local to the user.

Custom Workflows to Match Your Business

Our customers range from on-demand services like Instacart to online retailers like JackThreads to small stores using platforms like Shopify.

Each of our customers is unique not only in the way that fraud affects them, but also in the way their fraud teams work through manual reviews of suspicious orders and users. Many of our customers prefer to review just their most recent orders while others prefer to focus on orders with high order values or with mismatches between shipping and billing addresses.

We’ve listened, and with the latest release of the Sift Science console, we’re really proud to give customers the ability to customize manual review queues in the way that makes the most sense for their business.

Custom queues that are personalized for your business

You can now filter queues by any attribute that you send Sift, including order value or country. Also, you can create queues using attributes our algorithms calculate, like the distance between billing address and shipping address or the number of failed transactions.

You still have built-in Orders and Users queues, but now you’ll have the ability to customize those queues further. Also, you can now build a queue completely from scratch through Search, and share that queue with other analysts by sharing a URL.
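Conceptually, a custom queue is just a saved set of filters over your orders. The Python sketch below illustrates that idea only; it is not the console's implementation or API, and the `Order` fields, `build_queue` helper, and thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Order:
    order_id: str
    amount_usd: float                     # attribute you send
    country: str                          # attribute you send
    billing_shipping_distance_km: float   # derived attribute


def build_queue(
    orders: Iterable[Order], *filters: Callable[[Order], bool]
) -> List[Order]:
    """Keep only the orders that satisfy every filter."""
    return [order for order in orders if all(f(order) for f in filters)]


orders = [
    Order("o1", 35.00, "US", 2.0),
    Order("o2", 820.00, "US", 1900.0),
    Order("o3", 640.00, "CA", 5.0),
]

# A "high value, far from billing address" queue (illustrative thresholds).
review_queue = build_queue(
    orders,
    lambda o: o.amount_usd >= 500,
    lambda o: o.billing_shipping_distance_km >= 1000,
)
print([o.order_id for o in review_queue])  # ['o2']
```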

It's now easier to train Sift Science to spot fraud

We’ve also made labeling users a one-click experience in Queues and the User Details panel to help analysts understand the labeling process better as well as be more efficient. You can still add a reason (like chargeback or spam) after you’ve labeled a user.

Release

We’ll be rolling these changes out to you on August 4, and we won’t be supporting earlier versions of the console moving forward.

Help make Sift Science better!

We love feedback! If you have any thoughts you'd like to share, please let us know what you think by emailing support@siftscience.com.

Thanks!

The Sift Scientists

Behind the Signal: Doral, FL

What’s up with Doral? Let’s say you’re going through orders, and you come across one with a high order value where the billing and shipping addresses don’t match. You decide to do a bit of sleuthing, starting with research on the shipping city: Doral, FL.

At first glance, shipping to Doral seems like a no-brainer:

Based on that information, it’d be perfectly reasonable to ship that order.

However, there’s also cause for caution. Sift Science has found that -- despite Doral’s wealth and status as a member of the Trump empire -- orders shipped there are 8X more risky than normal!

What Versus Why

At Sift, insights like these are discovered automatically, and often the signals are subtle and not immediately intuitive. After all, a computer can say "what", but it takes a human being to say "why".

For Doral specifically, I did ask “why”, and here’s what I found.

Doral’s land zoning looks like this:

Map of Doral, FL

Yes, Doral is home to not only 90 holes of golf, but also a lot of industrial land!

It has over 3K logistics-related companies and the Miami Free Zone, which offers 750K sq. ft. of duty-free warehouse space.  Its proximity to Miami International Airport (the #1 airport in international freight) and Port Miami (which moved 12.5B tons of cargo last year) means it has a thriving logistics industry.

Some of these warehouses offer package forwarding as a service. So, shipping to Doral is risky because that package has a higher likelihood of being forwarded to someplace else.

So what do you do?

Clearly, you shouldn’t blacklist Doral, since most of its inhabitants (population: 48K and growing) could be great customers. Similarly, you shouldn’t blacklist forwarding addresses since not all fraudsters use forwarding addresses and not all forwarding addresses ship to fraudsters.

Ultimately, shipping city is only one factor you should take into account when assessing fraud risk. It could be worth cancelling the order if there were other risky signals, such as how the email was typed in, what products -- colors, sizes, etc. -- make up the order, and shipping address.

At Sift, we have a proprietary database of tens of thousands of known risky addresses. However, the number of risky addresses is growing every day. As some addresses get blacklisted by larger retailers, package forwarding companies will change their warehouse locations. Plus, the freight industry itself has been under pressure, and package forwarding as a side-business could be a nice source of incremental profit.

While we’re not experts in package forwarding and freight, we are experts in identifying risk signals associated with this industry using machine learning. We track addresses and more to predict the likelihood of fraud. Follow us on Twitter to learn about more fraud signals or let us know about fraud signals you've investigated in the comments below. We'd love to hear them!

How Did My Credit Card Info Get Stolen?

Nobody likes dealing with credit card fraud. It can be embarrassing and difficult to admit that you’ve been a victim. At Sift Science, we often hear from our customers about 2AM nights at the office spent triaging thousands of orders that were placed with stolen credit cards. Today, we thought it would be helpful to understand how it all starts. To do this, we need to go underground, deep inside criminal territory. It goes without saying that credit card fraud is malicious and illegal. It can result in felony charges and several years of imprisonment.

Simply put, credit card fraud starts with theft. With determination and time, fraudsters can obtain credit card numbers and information at any price. In fact, an entire underground economy, complete with moderators and reviewers, exists for criminals to buy and sell your information online. Databases of people’s names, credit card numbers, and even complete bank account login information (also known as “FULLINFO” or “FULLZ”) can sell for anywhere from $2 to $50. “Carders,” as these thieves are called, even share tutorials and spread information on which sites are vulnerable to attack.

The theft itself can take a number of forms. The most common are hacking databases, sending phony emails (also known as “phishing”), and exploiting security holes. Sophisticated carders usually hoard the information and sell it in bulk to consolidators. The consolidators then sell it on the black market, lurking in secret online forums or chat rooms. They even offer flash sales with bulk discounts. Here is a sampling of “products” and prices we found in our own research via Google:

menu

Once thieves obtain these credit card numbers, they run test transactions to make sure the cards are valid. In the old days, thieves would encode the credit card numbers onto fake plastic cards. Today, with the increasing prevalence of online payments, thieves first test them on sites by buying small ticket items (e.g. $3 earrings) or signing up on sites that offer free product trials. Once they verify that the card works, they move on to bigger ticket items, leading to outright theft and chargeback fraud.
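The "test small, then go big" pattern described above can be expressed as a very simple rule. The Python sketch below is a toy illustration, not how a real fraud system would be built: the function name and both dollar thresholds are assumptions, and a single rule like this would produce plenty of false positives on its own.

```python
from typing import Sequence


def looks_like_card_testing(
    amounts: Sequence[float], small_max: float = 5.0, large_min: float = 200.0
) -> bool:
    """Return True if a small purchase on a card is later followed by a large one.

    `amounts` is the card's purchase history in chronological order.
    """
    seen_small = False
    for amount in amounts:
        if seen_small and amount >= large_min:
            return True
        if amount <= small_max:
            seen_small = True
    return False


print(looks_like_card_testing([3.00, 450.00]))  # True  ($3 earrings, then a big order)
print(looks_like_card_testing([450.00, 3.00]))  # False (order matters)
print(looks_like_card_testing([40.00, 60.00]))  # False
```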

By the time we realize we have been victims of fraud, it’s too late. Goods will have been shipped and gift cards redeemed. It shouldn’t be a surprise then that fraud has caused over $5B in losses a year and has shut down thousands of businesses.

To learn more about simple steps you can take to prevent credit fraud yourself, check out Federal Trade Commission’s article, Protecting Against Credit Card Fraud.

Three Ways Gamers Cheat in Online Poker

As we mentioned before, there are many signals linked to fraud in the digital world. At Sift Science, we use advanced fraud detection technology to help customers identify bad behavior and adapt to tactics in real time. In the online gambling sphere, where regulations and oversight are unclear, gaining player trust by providing a safe and fair environment is paramount. One way to improve game experience is to prevent fraudulent behavior. Here are three common ways gamers commit fraud in online poker.

1. Bonus Abuse Through Multiple Accounts

Poker sites often give away play money using bonus codes to attract new players. Fraudsters try to take advantage of this and sign up using multiple accounts at the same game table or tournament, causing the poker site to lose money while also providing a bad experience for other players. Usually it’s enough to track account registration by IP address, but for advanced cases, more sophisticated tools are required. The best fraud detection tools use device fingerprinting to find multiple accounts created by a single laptop or computer.

2. Computer Bots in Poker Rooms

Hackers have created computer programs (“bots”) that automate online poker play. Bots are banned from poker sites because they create an unfair advantage: computers have no emotion, so they are not subject to “tilt” (the poker term for a frustrated player slipping into a poor, overly aggressive strategy). Fraud rings have been caught colluding and cheating players out of hundreds of thousands of dollars using bots.

So how do poker sites detect bots? While most detection techniques are proprietary and unknown to the general public, some measures include monitoring player reaction time, suspicious mouse movements, and randomized pop-up windows with challenge questions.
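Since real detection techniques are proprietary, we can only guess at the details, but the reaction-time idea is easy to illustrate. The Python sketch below flags a player whose response times are either faster than a person could plausibly manage or suspiciously uniform. The function name and every threshold are our assumptions, chosen for illustration only.

```python
from statistics import mean, pstdev
from typing import Sequence


def reaction_times_look_automated(
    times_ms: Sequence[float],
    min_samples: int = 5,
    min_mean_ms: float = 200.0,
    min_variation: float = 0.1,
) -> bool:
    """Heuristic check on a player's reaction times (in milliseconds)."""
    if len(times_ms) < min_samples:
        return False  # not enough data to judge
    average = mean(times_ms)
    if average < min_mean_ms:
        return True   # consistently faster than a person is likely to act
    # Coefficient of variation: humans tend to be far less regular than scripts.
    return pstdev(times_ms) / average < min_variation


print(reaction_times_look_automated([500, 501, 499, 500, 500]))    # True
print(reaction_times_look_automated([800, 1500, 2300, 950, 4000])) # False
```

A bot author can of course add random delays, which is presumably why sites combine this with other measures like mouse-movement checks and challenge questions.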

3. Chip Dumping in Tournaments or Ring Games

Chip dumping happens when a player intentionally loses chips to another player at the table to give them a better chance to win. It has become a way for players to launder money. Fraudsters use stolen credit cards to deposit funds and then dump chips at a cash table to another account they created. In other cases, the fraudster will hijack an innocent player’s account (“account takeover”). Online poker rooms typically check for players making curiously large bets with a terrible hand or folding on a relatively safe bet.

 

Interestingly, most fraud is caught by vigilant human players who report fraudulent behavior. However, cyber criminals can still take advantage of even the most experienced (and most valued) players. One reason is that online poker is still mostly illegal in the US and most sites are physically located offshore. It can be difficult to determine whether sites are legitimate and whether it’s safe to hand over your credit card number. The good news is that there are simple steps players can take to protect themselves from fraud.

To learn more about common methods online poker rooms use to combat fraud, check out Cheating & Collusion at Online Poker Rooms. If you’ve been a victim of online fraud or would like to learn more about us, let's talk.

Our next chapter

The internet offers unprecedented connectivity, scalability, and anonymity. Unfortunately, it can also be abused. As activity moves from the physical to the online world, so does fraud. Online chargebacks, spam, referral abuse, and account takeovers cause all sorts of headaches for businesses that would rather focus on their core competencies. At Sift Science, we make world-class online fraud detection easy and accessible to merchants of all sizes. Just over a year ago, we launched our first product: a fraud detection API that empowers online merchants with realtime, large-scale machine learning. This is the same core fraud detection technology used by giants like Amazon and Google.

And boy oh boy, it’s been a busy year. We launched a new version of our API, a real-time fraud console, plugins for Shopify and Magento, and many other exciting changes. We now analyze more than $1.5 billion of transactions and 600 million events each month. We’ve helped customers detect, in realtime, 95% of their fraud with an industry-leading 7% false positive rate. We’ve cut their manual review rate more than sixfold, while enabling them to capture revenue that would have otherwise been rejected. Our customers include retailers of physical and digital goods, financial services companies, marketplaces, mobile-only companies, nonprofit organizations, and online communities on all six habitable continents. They range from high-growth businesses like Airbnb, Uber, OpenTable, Indeed, JackThreads, Kickstarter, and HotelTonight, to mom-and-pop shops collecting their first dollars. We also won the Best Emerging Technology Award at this year’s Merchant Risk Council conference (a key event in the anti-fraud industry). Woohoo!

And now, some exciting news. We recently closed an $18M Series B round of funding led by Spark Capital. We welcome Mo Koyfman to our board of directors, a kindred spirit who shares our passion for great product experiences and big thinking. We’ll use the funds to grow our team and accelerate our sales, marketing, and product development initiatives. We have just begun our mission to make the internet a better place. Our machine learning product improves with more customers and data, and over time we believe that this network can deliver tremendous value across the web.

To our customers and investors - thank you for your continued support. We will work hard to deliver even more value. To our potential customers - don’t hesitate to contact us and learn how we can help protect your business. To potential candidates - we’re hiring across the board.

Onward!

 

What is Big Data (Part I)

This post is part of a series that discusses, in simple terms, machine learning and big data. Today we're demystifying big data. To learn about machine learning, check out Machine Learning For Poets.

What is Big Data?

What is big data? Many define it in terms of the computing power it requires. To understand what big data is, however, you first need to know what big data means. In this post, we’ll discuss the implications of big data’s meteoric rise.

What big data means

The excitement around big data isn’t just marketing hype. In fact, it captures a qualitative shift, from model complexity to data complexity.

Answering complicated questions used to require equally complicated models. Despite their elegant mathematical underpinnings, these were usually imperfect, especially when modeling real life. They required many assumptions, which didn’t always hold true (e.g. “Humans are rational”).

Human behavior is more complicated than E = mc². Therefore, when making predictions about humans, discovering how things actually work has proven more effective than depending on a caveat-laden model.

In other words, big data frees us to derive insights empirically. With enough information, you can approximate what you want to know by "asking the data directly" rather than relying on assumptions. Fewer assumptions mean fewer places for things to go wrong.

Of course, the quantity of data required to reduce model complexity results in -- you guessed it -- increased data complexity.

Fight fraud with big data

At Sift, we know that big data is critical to staying ahead of fraudsters. Contemplating what we think fraudsters do is less important than discovering what they actually do. Predicting fraudster attacks based solely on recent trends is less effective than incorporating all information. Constraining your fraud team to a limited set of variables is less efficient than using every piece of information available.

So now you understand the most important aspect of what big data is: its implications. Next up: the logistical challenges that define it.

For more insight, look at Alon Halevy, Peter Norvig, and Fernando Pereira’s excellent paper The Unreasonable Effectiveness of Data. Stay tuned for more explanations, applications, and discussion on machine learning and big data. If there are specific topics you’d like us to cover, let us know at info@siftscience.com or @siftscience!

Five Fun Fraud Facts

As an e-commerce fraud analyst, you’re expected to decide whether a transaction is good or bad, often with ambiguous transaction and customer data. This can leave you feeling like Lucy, especially during the holiday season.

In the absence of a fraud detection system, here are five signals you can use to assess fraud risk. Remember, these are aggregate signals based on data from many companies. Your mileage may vary.

  1. Fraudsters have stacks on stacks of cards. If the customer has multiple credit cards on file from different banks, their order is 7x more likely to be fraudulent.
  2. fraudsters dislike capital letters. if a customer wrote their billing name in all lowercase letters, the order is 2.7x more suspicious.
  3. Fraudsters stay (virtually) on the move. A buyer with multiple billing zip codes within a week is 30x more likely to be fraudulent.
  4. Fraudsters favor disposable email addresses. An email address with two or more digits is twice as likely to be fraudulent as one with zero or one digit.
  5. Fraudsters are night owls. Transactions at 2AM are 50% more likely to be fraudulent, while 4AM transactions are 100% more likely to be fraudulent.
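If you wanted to check these five signals by hand, a rough sketch in Python might look like the following. The order fields (`billing_name`, `email`, `card_banks`, `billing_zips_last_7_days`, `placed_at`) are hypothetical names made up for this example. Counting digits only in the part of the email before the `@`, and treating 2AM through 4AM as "late night", are also assumptions on our part.

```python
from datetime import datetime


def fraud_signals(order):
    """Return the five rule-of-thumb signals as booleans for one order dict."""
    # Assumption: only digits before the "@" count toward signal 4.
    local_part = order["email"].split("@", 1)[0]
    digit_count = sum(ch.isdigit() for ch in local_part)
    return {
        # 1. Cards on file from more than one bank.
        "multiple_card_banks": len(set(order["card_banks"])) > 1,
        # 2. Billing name typed entirely in lowercase.
        "lowercase_name": order["billing_name"].islower(),
        # 3. More than one billing zip code within the past week.
        "multiple_billing_zips": len(set(order["billing_zips_last_7_days"])) > 1,
        # 4. Two or more digits in the email address.
        "digits_in_email": digit_count >= 2,
        # 5. Placed between 2AM and 4AM (assumed window).
        "late_night": 2 <= order["placed_at"].hour <= 4,
    }


example_order = {
    "billing_name": "john smith",
    "email": "js1987@example.com",
    "card_banks": ["Bank A", "Bank B"],
    "billing_zips_last_7_days": ["94107"],
    "placed_at": datetime(2013, 12, 1, 2, 30),
}
for name, fired in fraud_signals(example_order).items():
    print(name, fired)
```

Each flag is a hint, not a verdict. The multipliers quoted above are aggregates across many companies, so how much weight each one deserves will depend on your own data.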

How can you further improve your fraud detection accuracy? Customization. Advanced fraud detection solutions like Sift Science can incorporate data unique to your business into our scoring. We call these custom events.

For example, an online shoe store like Zappos could send us the shoe size for each transaction as a custom event. It might turn out that size 10 shoes are more fraudulent than size 15 shoes. This makes sense intuitively: there are more people walking around with size 10 feet and fraudsters often focus on goods they can resell easily. Wondering how Sift Science can solve your e-commerce fraud challenges? Drop us a line. We’d love to help.
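As a rough illustration of the shoe-size idea, the sketch below builds the kind of JSON payload such a custom event might carry. It only constructs and prints the payload; it does not call any API. The event name `shoe_purchase`, the `shoe_size` field, and the `$`-prefixed field names are assumptions for this example, so check the API documentation for the exact format before relying on them.

```python
import json


def build_custom_event(user_id, shoe_size, api_key="YOUR_API_KEY"):
    """Build an illustrative custom-event payload (field names are assumed)."""
    return {
        "$type": "shoe_purchase",  # hypothetical custom event name
        "$api_key": api_key,       # placeholder, not a real key
        "$user_id": user_id,
        "shoe_size": shoe_size,    # the business-specific custom field
    }


print(json.dumps(build_custom_event("user_123", 10), indent=2))
```

The point of the design is that the custom field travels alongside the standard ones, so the scoring system can learn whether it is predictive for your business.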

E-commerce fraud: where it hurts

Here at Sift Science, we make powerful fraud detection software available to companies of all sizes. Fraud can mean many things and impact many different parts of these organizations. As noted in our post on global fraud, we detect three main kinds of e-commerce fraud (plus other specialized kinds): payment fraud, new account fraud and account takeover. Below, we’ll take a closer look at each type and who within a company it hurts.

Payment fraud

Online payment fraud means using stolen means of payment to make a purchase. Typically, this involves stolen credit card numbers, although it could entail other payment info like bank account routing numbers or PayPal credentials. Stolen credit card numbers are shockingly easy to obtain cheaply online. The first time many unprotected merchants learn of payment fraud is after they’ve fulfilled an order, when their credit card acquirer notifies them of a [tooltip tip="the original charge will be reversed, as the consumer has realized their card was used without their permission"]chargeback[/tooltip]. For [tooltip tip="when the physical card is not physically present at the time of the transaction. E.g. all e-commerce"]card not present[/tooltip] transactions, the merchant must make restitution, meaning they lose both the revenue and merchandise itself.

While payment fraud understandably impacts the Finance team through the chargeback fees and lost revenue, it also hurts Sales via channel cannibalization. In one case, an e-commerce company detected fraudsters buying their products with stolen credit cards and reselling them on Amazon Marketplace at deeply discounted prices. Fraudsters were thus not only stealing merchandise, but were undercutting legitimate online vendors, causing them to become angry with the original company.

New account fraud

New account fraud occurs when a fraudster opens a new account on a site and does something undesirable with it. Marketplaces and social networks in particular focus on limiting this kind of activity due to their peer-to-peer models. If fake listings or users proliferate, legitimate ones will be spooked, defrauded or otherwise deterred from participating, and the community could become stagnant.

New account fraud impacts Community and User Experience teams due to its drag on user engagement. These teams must spend valuable resources sifting out fraudulent activity patterns. The recent uncovering of a major attempted Kickstarter scam that raised over $120,000 for Kobe beef jerky serves as a prominent example of this sort of fraud. Luckily, a film documentary team caught them and Kickstarter froze the account before any backers were charged.

Account takeover

Account takeover is simply when a fraudster commandeers an existing account and uses it for malicious purposes. Wired reporter Mat Honan’s detailed account of how hackers systematically gained access to his entire online identity (iPhone, Amazon, Gmail, Twitter) provides a chilling example of its feasibility. On a larger scale, Riot Games, maker of the popular online game League of Legends, announced this week that its databases were hacked and user names, (salted) passwords, email addresses as well as 120,000 old transaction records were compromised. Account takeover hurts Customer Service teams given the need to change passwords, create new accounts and otherwise repair the damage caused.

Specialized challenges: referral fraud

Fraud takes many other specialized forms. Consider referral fraud, a.k.a. affiliate fraud. Fraudsters will take advantage of refer-a-friend programs at many e-commerce sites by creating multiple identities to maximize their gains. The vast array of sites offering such program bonuses can be found at refAround. With referral fraud, it’s Marketing who loses, as customer acquisition spending comes from their budget.

What can you do to fight back? Consider a fraud detection solution like Sift Science. We provide protection from all the above-mentioned e-commerce fraud types and constantly update our models with emerging threats, leveraging insights from across the customer base. Please get in touch to discuss your challenges. We’d love to talk.

The USA has more e-commerce fraud than Nigeria

Sift Science customers hail from all [tooltip tip="On that note, let us know if you know any Antarctica startups..."]six habitable continents[/tooltip]. We’re seeing e-commerce fraud activity from practically everywhere as well, Albania to Vietnam. Since the Sift Science team includes quite a few data geeks who love #uberdata and OkCupid’s OkTrends blog, we thought we’d share a visualization of our global fraudulent transactions. What sort of fraud are we seeing? That deserves its own post (coming soon), but there are three major types: payment fraud (e.g. using stolen credit cards to buy goods), new account fraud (i.e. creating an account to do illicit stuff like money laundering) and account takeover (i.e. using someone's existing account to do illicit stuff).

Global e-commerce fraud rates by country

Above is a map of [tooltip tip="Defined as reported fraudulent transactions / total transactions originating in that country"]fraud rates[/tooltip] by country. Based on a sample of our transaction data, here are the top ten most fraudulent countries. You can see the top 25 countries at the end of the post.

  1. Latvia
  2. Egypt
  3. United States
  4. Mexico
  5. Ukraine
  6. Hungary
  7. Malaysia
  8. Colombia
  9. Romania
  10. Philippines

Biggest surprise? Nigeria. For all of the flak Nigeria gets with their e-mail scams (not all of which originate in Nigeria), we’re not seeing a lot of fraud from Nigerian IPs. In fact, Nigeria (#17) has only slightly more fraud than Canada (#18).

Several caveats are worth noting. Since this is based on a sample of our collected transaction data, it is not necessarily representative of the overall e-commerce fraud rates globally. For simplicity’s sake (developer time is a precious commodity at Sift!), we used the reported IP address as the country of origin. Lastly, just because a country shows up as higher fraud on this list doesn’t mean a merchant should create a fraud rule for it. We instead suggest adopting a more robust and versatile solution able to adapt to new patterns.

For our more technical readers: we used just over half a million transactions and included only those countries with at least 1000 total samples and at least 10 fraud samples. That puts the size of the 95% confidence interval on the fraud rate at just under 1%. To draw the map itself, we used d3 with topojson. Then, we overlaid the countries onto a Mercator projection, and computed the color as [percent of transactions labeled as fraud]*[max red saturation].
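For the curious, the filtering and coloring steps described above can be sketched in a few lines of Python. This is not the code we ran, just an illustration: the `(country, is_fraud)` input format, the maximum saturation of 255, and the use of a normal-approximation (Wald) interval are all assumptions made for the example.

```python
import math
from collections import Counter

MIN_TOTAL = 1000  # at least 1000 total samples per country
MIN_FRAUD = 10    # at least 10 fraud samples per country


def fraud_rates(transactions):
    """Map country -> fraud rate, keeping only countries above both minimums.

    transactions: iterable of (country, is_fraud) pairs (assumed format).
    """
    totals, frauds = Counter(), Counter()
    for country, is_fraud in transactions:
        totals[country] += 1
        if is_fraud:
            frauds[country] += 1
    return {
        country: frauds[country] / total
        for country, total in totals.items()
        if total >= MIN_TOTAL and frauds[country] >= MIN_FRAUD
    }


def ci_half_width(rate, n, z=1.96):
    """Half-width of a 95% normal-approximation interval (an assumed method)."""
    return z * math.sqrt(rate * (1 - rate) / n)


def red_saturation(rate, max_saturation=255):
    """Color = fraud rate * max red saturation (255 is an assumed maximum)."""
    return round(rate * max_saturation)


# Toy data: only "US" clears both thresholds.
sample = (
    [("US", i < 20) for i in range(1000)]
    + [("CA", i < 5) for i in range(1000)]
    + [("LV", True)] * 50
)
for country, rate in fraud_rates(sample).items():
    print(country, rate, ci_half_width(rate, 1000), red_saturation(rate))
```

The two minimums matter because a country with only a handful of transactions could otherwise top the list on the strength of one or two fraudulent orders.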

In the future, we’ll be sharing other insights from the terabytes of data we analyze to detect fraud. What would you like to see? Get in touch via Twitter or email with your suggestions.

Here are the top 25 fraudulent countries, from most fraudulent to least fraudulent.

  1. Latvia
  2. Egypt
  3. United States
  4. Mexico
  5. Ukraine
  6. Hungary
  7. Malaysia
  8. Colombia
  9. Romania
  10. Philippines
  11. Greece
  12. Brazil
  13. China
  14. Indonesia
  15. Russia
  16. Singapore
  17. Nigeria
  18. Canada
  19. Portugal
  20. Switzerland
  21. United Kingdom
  22. India
  23. Netherlands
  24. France
  25. Austria

Mobile e-commerce fraud detection insights

Mobile e-commerce is exploding. In the US, 56% of people already own smartphones. Internationally, adoption projections for countries like China show this trend is just beginning. Unfortunately, with the increasing limitations on mobile device fingerprinting, mobile e-commerce fraud detection has also become more complex.

Less data, mo problems

Mobile fraud suffers from two data-related problems: merchants ask for less customer information and the device data they do collect is less useful. Merchants request less info because conversion stands as their greatest challenge. Specifically, mobile customers give up nearly half their shopping attempts because the process takes too long. While the prioritization of conversion and growth over fraud detection is understandable, merchants are increasing their risk.

Example of efficiency versus mobile e-commerce fraud detection

Besides the fact that some traditional signals are unavailable on mobile devices (e.g. IP-based location), merchants are finding that remaining data is often insufficient. In May, Gartner estimated that ~40% of mobile devices could not be uniquely identified...quite problematic as fraudsters shift to mobile along with legitimate customers.

Unique mobile e-commerce fraud detection patterns

Large-scale machine learning solutions like Sift Science provide a competitive advantage due to their breadth and flexibility. Two examples from our data illustrate machine learning’s power in e-commerce fraud detection. First, when comparing top fraud signals for a desktop web site to a mobile app, we found almost entirely different predictive fraud patterns (see table).

Differences between desktop and mobile app fraud detection

Notably, while behavior matters in both environments, the nature of in-app navigation requires a detection solution able to take into account the unique way each app is designed. At Sift, we do this by accepting custom events. These are crucial in understanding whether a customer is a potential fraudster. The results also make a strong case for capturing more data, given the potential for any pattern to be predictive in detecting fraud.

New accounts: always riskier?

As a second example, consider the common belief that transactions from newly created accounts are riskier. In fact, our system uncovered a more nuanced reality, one difficult to detect without machine learning.

Nuanced results spotted by machine learning-based e-commerce fraud detection

Why might this be? Many sites have a “sign up when you make your first purchase” option that’s used by legitimate customers. In contrast, fraudsters tend to create accounts and then go shop for merchandise. Of course, time ranges will differ between companies, so custom variables are crucial for a fraud detection system.
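One way to explore this in your own data is to bucket each purchase by how old the account was at the time and compare fraud rates across buckets. The Python sketch below shows the idea. The bucket boundaries and the `(created_at, purchased_at, is_fraud)` input format are hypothetical choices for illustration, not the ranges from our analysis.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical bucket boundaries; the right ranges differ between companies.
BUCKETS = [
    (timedelta(minutes=10), "under 10 minutes"),
    (timedelta(days=1), "10 minutes to 1 day"),
    (timedelta(days=30), "1 to 30 days"),
]
OLDEST = "over 30 days"


def age_bucket(created_at, purchased_at):
    """Label a purchase by the account's age at purchase time."""
    age = purchased_at - created_at
    for upper, label in BUCKETS:
        if age < upper:
            return label
    return OLDEST


def fraud_rate_by_bucket(orders):
    """orders: iterable of (created_at, purchased_at, is_fraud) tuples."""
    counts = defaultdict(lambda: [0, 0])  # label -> [fraud count, total count]
    for created_at, purchased_at, is_fraud in orders:
        label = age_bucket(created_at, purchased_at)
        counts[label][0] += int(is_fraud)
        counts[label][1] += 1
    return {label: fraud / total for label, (fraud, total) in counts.items()}


signup = datetime(2013, 6, 1, 12, 0)
print(age_bucket(signup, signup + timedelta(minutes=5)))
print(age_bucket(signup, signup + timedelta(hours=3)))
```

Sending the account age as a custom variable lets a learning system find the cut-offs for you, rather than relying on hand-picked buckets like the ones above.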

These mobile e-commerce fraud detection insights demonstrate how a large-scale machine learning based solution not only catches more fraud, but also more efficiently identifies legitimate customers. Check back here often (or sign up for our email list) because we’ll be covering other fraud-related topics in future posts, such as technical aspects of mobile fraud and a look at fraud by country.