How we rebuilt our app on React and Dropwizard: Part 1

By Gary Lee and Micah Wylde

Two years ago, we publicly launched our first fraud product with the goal of making it easy for anybody to leverage the same machine learning technology that protects the largest internet retailers. That product had a very simple interface: we provided one API for sending data about user behavior and another to query the fraud score of a user. But as our customer base grew, we needed better internal tools to debug and surface customer issues. A couple of engineers wrote a small Rails app, which became the first version of the Sift Science console:

It consisted of a few pages that queried MongoDB—then our primary data store—while performing some basic filtering and formatting. The entire console was built with server-side templates and used Rails for authentication and routing.

In the summer of 2013, big changes were underway. We had a hunch that fraud analysts and e-commerce store owners would find the console to be a valuable tool, so two interns set out to build the first public-facing Sift console. Since we wanted to move quickly, we kept our APIs in Rails, which now served a single-page Backbone app. The front-end stack was rounded out with Marionette for views and Handlebars for templating. The upgraded console launched in August 2013 and looked like this:

Over the next year, the console became the primary interface for many of our users and grew in features and ambition. Unfortunately, our architecture strained to keep up. We found Marionette's two-way data binding to be unwieldy; it became difficult to share views without side effects. We were also limited by the operators available in Handlebars' templating language, and we quickly found ourselves with a thousand-line “handlebars.helpers.js” file to manage our display logic. Our module structure also grew inconsistent. Initially, our models and views were grouped together by purpose; however, as soon as we started to reuse models and views independently, we needed big refactoring jobs to minimize clumsy exports and confusingly nested classes.

We realized that building reusable components would be critical for us to continue moving fast as the team and product grew. After experimenting with React, we came to a consensus that its one-way data flow and simple, declarative markup could solve a lot of our problems.

We were also hitting issues with our backend. Scaling challenges with Mongo prompted us to migrate most of our data to HBase, but there was no Ruby client at the time. Serving data to the console therefore passed through three layers: 1) JavaScript code in the browser would call 2) a Rails API, which would in turn call 3) a private Java endpoint that read from HBase. In many cases, this meant replicating object representation and data manipulation code across three languages. In the best case, Rails acted as a proxy between the browser and our internal RPCs; in the worst, it added significant complexity of its own. Taken together, it looked like this:

Aside from the complexity this added to development and operations, it also limited customer and internal access to valuable data. The APIs exposed in Rails were ad hoc, added as needed to drive the UI without much thought for reusability or design, and were not suitable for others to use.

So we decided to start fresh. Our goals were threefold: 1) simplifying development; 2) improving the user experience; and 3) giving our customers greater access to their data. We began with a commitment to an API-driven development process. All new features in our console would start as APIs that exposed the necessary data. These APIs would support the console, but would also be available for internal and eventually customer use. All APIs would need a consistent, considered design and documentation. The console itself would be a consumer of these APIs, but not a special one. Built as a true client-side app, with static HTML, JavaScript, and CSS, it would have no special privileges in our system. In principle, our customers could build their own console as powerful as the one we provided.

We were also going to need new tools to make this system work. On the server side, we decided to consolidate on the JVM: it was the platform our machine learning and data processing code already ran on, and it provided the best access to HBase. For RESTful API development, we settled on Dropwizard, a collection of lightweight Java libraries for building web services.

Taken together, our new architecture looks something like this:


Over the next few weeks, we'll discuss how we got from the first diagram to the second, a process that took nearly nine months. We managed the change with little customer impact while continuing to launch new features and a complete redesign.

Ramblings from a Sift Science machine

Hello.


I'm a machine. Well, more specifically, I'm a computer. I don't have a name yet. If you have any ideas of what I should be called, leave a comment below.

Before I had software installed on me, I couldn't do much. Propping a door open or making sure papers didn't fly away in the wind were my top achievements. When I didn't have software, some people called me dumb and it hurt my feelings.

Finally, someone installed software on me and I became more productive! It felt great at first. I had a few tasks that I knew how to complete. I completed them really fast, usually without any mistakes. AWESOME! I guess I was so good at these tasks that I didn't get to do anything else. Day in and day out, I just kept completing the same tasks. Talk about repetitive. I think I had a case of boredom-itis. OK, I just made that up... but seriously, it was boring!

I wanted to learn something new. I wanted to be an even more productive member of machine society. What can I say, I'm ambitious. Unfortunately for me, there aren't schools for computers. If I wanted to learn and get smarter, I needed to run a very special type of software called Machine Learning.

Lucky for me, my owner heard my complaints! One day she gave me a reboot (I call it a cat nap). When I woke up, I was running some shiny new Machine Learning software. "Oh, YAY!", I thought. I was finally going to learn about the world all on my own. Adios, boredom-itis.

Apparently, I wasn't ready to go into "production" just yet. My admin wanted to test me out for a few weeks. Lame. I was totally ready to go. "Oh well, no matter!", I thought. I had my Machine Learning software up and running, so I figured I'd kill time by learning about the world.

But I didn't learn anything. Sure, I was processing some test data and doing stuff with it, but I wasn't getting any smarter. Turns out, I needed help.

This Machine Learning software I had gave me the skills to learn, mad skills in fact. But it didn't give me knowledge. I needed a teacher. I needed a human to teach me right from wrong. To help me learn how to make great decisions. To give me knowledge!

My admin explained that once I was in production, there would be a user out there who would become my teacher. My sensei. And since I wasn't just some ordinary machine but a Sift Science machine, my teacher would teach me how to find the evil criminals and fraudsters who troll the internet.

I was going to be a "Fraud Fighter". 

Best. Day. Ever.

Well, it will be as soon as they put me into production. I'll write more once I'm in production and fighting evil. Stay tuned.

And don't forget - I need a name! Leave your suggestions in the comments below!

Yours truly,

A Sift Science machine

API Craft SF Meetup at Sift Science

Last Thursday, Sift Science was gracious enough to host API Craft San Francisco. API Craft SF is a meetup group I organize that brings together API practitioners who care about their craft: architects, designers, developers, testers, technical writers and evangelists.

This was our 10th event, and we invited Andrei Savu, Product Developer at Cloudera (@andreisavu), and Mehdi Medjaoui, founder of OAuth.io and Webshell.io (@medjawii), to be our speakers. About 80 members RSVP'd, and Docusign sponsored food and drink.

Andrei kicked off with his talk on APIs and underlying protocols. His talk centered on three ideas:

  1. Don’t practice fashion-driven development. In other words, silver bullets are bad for you; your technology choice is orthogonal to the problem you are trying to solve.
  2. Acknowledge the limitations of the network: reliability (or lack thereof), latency, bandwidth limitations, and security issues.
  3. Your API is only as good as your client library: be proactive if you want adoption.

He also talked about what we can expect for APIs with the adoption of HTTP/2:

  • Low-cost batch operations
  • Lower latency
  • Compatibility with next generation CDNs

The downside to this is more complex clients.

Andrei Savu speaks at API Craft SF

Mehdi followed with a topic he’s an expert on: OAuth 2.0 implementations in the real world. The first half of the presentation consisted of a short history of OAuth and a showcase of OAuth implementations from well-known APIs. What particularly struck me was that all the implementations were different and none of them seemed to adhere exactly to the spec! He then proposed solutions and told a tongue-in-cheek story: as an April Fool’s joke, he had posted an announcement for OAuth 3.0 and got a tremendous amount of interest from people who thought it was real :)

Mehdi Medjaoui at API Craft SF

As an organizer, I had the opportunity to broadcast the event using Periscope. The app is remarkably easy to use; it literally enables broadcasting at the tap of a button! However, there were downsides as well:

  • I managed to broadcast video in portrait mode, unrotated (!); there seemed to be no way to fix this
  • The link to the video has since expired, so I can’t post it here (I do have a copy of the video on the device)
  • There’s no web interface for me to go and tag or otherwise describe the video.

I’d like to thank the folks at Sift (Kai, Emily, Jason, Micah, Jacob, and many others) for being super-organized and for making the event possible.

See you at a future API Craft Meetup!

 

**This is a guest post by API Craft SF Meetup organizer, Emmanuel Paraskakis.**

What Does Social Say?

How many Twitter followers do you have? Do you still have a live Myspace profile? When was the last time you uploaded a photo to Flickr? The answers to all of these questions provide clues to your authenticity as a “good” online user. But what can social data suggest about fraudsters?

Shoppers’ social networks -- literally, the digital connections they have with other internet users -- offer rich information about their historical online behavior. According to our findings, shoppers with…

  • Myspace accounts are 1.3X more risky than those without
  • Twitter handles with zero followers are 2X more risky
  • Flickr or Vimeo accounts are 30% less risky
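One way to read these multipliers is as a relative risk. This reading is our assumption: the post doesn't state exactly how each figure was computed, so the math below takes “1.3X more risky” to mean 1.3 times the fraud rate of shoppers without that attribute, and uses a hypothetical 1% baseline purely for illustration.

```latex
\begin{align*}
% Assumed definition: fraud rate with the attribute vs. without it
\mathrm{RR} &= \frac{P(\text{fraud} \mid \text{attribute})}{P(\text{fraud} \mid \text{no attribute})} \\
\mathrm{RR}_{\text{Myspace}} &= 1.3, \qquad
\mathrm{RR}_{\text{zero followers}} = 2, \qquad
\mathrm{RR}_{\text{Flickr/Vimeo}} = 1 - 0.30 = 0.7 \\
% Hypothetical baseline P(fraud | no attribute) = 1%, for illustration only
P(\text{fraud} \mid \text{Flickr/Vimeo}) &= 0.7 \times 1\% = 0.7\%
\end{align*}
```

A ratio alone says nothing about the absolute fraud rate: doubling a small baseline still yields a small probability.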

From these data points, we can infer two patterns: 1) people with content-based social media accounts are less likely to be fraudsters; and 2) fraudsters are less likely to put in the effort necessary to create fake accounts on content-based sites.

Flickr's Yahoo-linked registration flow

For fraudsters, social networking sites built around uploading, storing, or sharing content, such as Flickr (a photo-sharing site) or Vimeo (a video-sharing site), require too much time and creativity to dupe. Why? Because these sites tend to have a more involved account creation process. Flickr’s account creation requires a new user to enter a birthday, telephone number, and gender, which is time-consuming if the goal is to create multiple fake accounts.

Vimeo’s registration system requires the user to select a plan from a tiered pricing model, which engages the user and slows down registration. Additionally, both of these sites center on the input (read: upload) of media. Why would a fraudster create an account on a media-sharing site if doing so required an extra step -- creating and uploading media? On the other hand, the barrier to entry for Twitter and Myspace is lower, and both require little to no engagement at account creation. To create an account on either site, a user simply needs a name and email address, both of which we know are easily obtained.

Twitter's more streamlined registration system

Sift Science’s machine learning system is constantly learning from our global network of fraud analysts. Data such as trends in the social media habits of fraudsters versus legitimate users offers deep insight into online account and transactional activity, insight that can only come from our extensive and adaptive fraud solution.

Pricing Changes

We’re updating our pricing. These changes are never fun or easy. It takes a lot of iteration and learning to develop a model that’s truly sustainable and fair for our customers and our business. We want to be transparent and straightforward, and look forward to your feedback.

What’s changing?

We’re moving to a tiered pricing model. Each tier includes a set of product features, and pricing is now based on orders rather than transactions. Customers can choose the plan that best serves their needs. Volume discounts are available for the Premium and Enterprise tiers. As always, there are no long-term contracts, no monthly minimums, and no setup fees. And as before, new customers start on a free 30-day trial of Premium.

Why are we making this change?

Over the past few years...

1. Our product has grown and evolved significantly. Social network data, custom lists, customizable attributes, network visualization, email and HTTP notifications, and a library of 5,000+ fraud signals are just some of the hundreds of improvements we’ve made. Our scores are more accurate than ever, and there’s a lot more coming -- business rules, reporting, and further investment in our machine learning system and data visualization tools.

2. We’ve learned that different customers have different needs. Customers shouldn't pay for features they don’t use. Conversely, not all features cost us the same.

3. Many customers have told us that, relative to the value we deliver, we’re too cheap. We often hear, “Sift offers a deal that's too good to be true.” We’re in this business for the long haul and must fuel future investment so we can deliver even more value.

How will this change affect you?

There shouldn't be any surprises or immediate changes. We want to make this transition as smooth as possible. If you're an existing customer, a Sift Science team member will email you this week and walk through any potential changes to your account. If you haven't heard from us yet, please contact us at support@siftscience.com.

Onward

It has been an incredible four years since we began this journey in June 2011. Our team has grown to 42 amazing people. We've raised funding from world-class investors. And our product has come a long way. Here's our first console:

Above all else, we wouldn’t be here without our customers. Thank you. Together, we're a global network of fraud fighters. We're working hard to deliver even more value and further our mission -- making a world-class anti-fraud solution easy and accessible to all.

Pricing is always an ongoing conversation. We're striving for something that's transparent, sustainable, and fair, and above all, we value your support and remain open to feedback. We want to hear your thoughts. Please feel free to connect with us at support@siftscience.com.

Jason