“Real-time Fraud Detection”

This is part of a series of interviews on how companies are building data products. In these interviews, we’re sharing how data teams use data, with a deep dive into a data product at the company. We also cover tech stacks, best practices and other lessons learned.

About

Aaron Biller is a lead data engineer at Postmates. Postmates is an on-demand delivery platform with operations in 3,500 cities in the US and Mexico. With over 5 million deliveries each month, Postmates is transforming the way food and merchandise are moved around cities.

A successful, on-time delivery is the single most important event for Postmates’ business.

“In the very, very early days, we would query our production database to understand how many deliveries we had for the past day. Reporting on our metrics would happen with spreadsheets. Clearly that wouldn’t scale, and so we shifted our analytics to a data warehouse and used Amazon Redshift,” says Aaron Biller, Engineering Lead for Data at Postmates.

What business problem does this data product solve? 

“Data is ubiquitous at Postmates. It’s much more than reporting – we’re delivering data as a product and support data microservices,” says Biller.

Consider fraud prevention. On-demand platforms like Postmates have a unique exposure to payments and fraud because they have to assess risk in real-time. 

While the warehouse does not operate in real-time, it ingests and transforms event data from all transactions for downstream consumption by predictive models and real-time services.

The Postmates risk team has engineered an internal risk detection microservice called “Pegasus”. Event data passes through a series of transformations in Redshift and feeds into “business rules”. These rules take the transformed data as input and produce decisions as output, yielding a live decision for every individual transaction on the Postmates platform.
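To make the idea of “business rules” concrete, here is a minimal sketch in Python of rules that take transformed data as input and return a decision as output. It is not Pegasus itself, whose internals are not described here. The feature names, thresholds, rule functions and the `decide` helper are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class TransactionFeatures:
    """Hypothetical features, assumed to be precomputed by warehouse transformations."""
    order_total_usd: float
    account_age_days: int
    failed_payments_last_24h: int


@dataclass(frozen=True)
class Decision:
    action: str          # "approve" or "review"
    reasons: List[str]   # which rules fired, if any


# A rule returns a human-readable reason when it fires, or None otherwise.
Rule = Callable[[TransactionFeatures], Optional[str]]


def large_order_on_new_account(f: TransactionFeatures) -> Optional[str]:
    # Illustrative threshold only; not taken from the interview.
    if f.account_age_days < 1 and f.order_total_usd > 200:
        return "large order on new account"
    return None


def repeated_payment_failures(f: TransactionFeatures) -> Optional[str]:
    # Illustrative threshold only; not taken from the interview.
    if f.failed_payments_last_24h >= 3:
        return "repeated payment failures"
    return None


RULES: List[Rule] = [large_order_on_new_account, repeated_payment_failures]


def decide(features: TransactionFeatures, rules: List[Rule] = RULES) -> Decision:
    """Run every rule against one transaction and combine the results."""
    reasons = []
    for rule in rules:
        reason = rule(features)
        if reason is not None:
            reasons.append(reason)
    return Decision("review" if reasons else "approve", reasons)
```

Calling `decide(TransactionFeatures(250.0, 0, 0))` would flag the transaction for review, while an established account placing a small order would be approved. Keeping each rule as a small function makes it straightforward to add, remove or audit rules one at a time.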

Fraud is one of four major use cases for data that the data team’s infrastructure drives:

  • Reporting: Daily reporting on successful deliveries and other KPIs along different dimensions like market, partners, service, etc.
  • Operations: Minimize the cost per delivery by optimizing route efficiencies, batching / chaining orders, improving delivery times and reducing wait times for couriers.
  • Payments: Calculate payouts to Postmates’ fleet of couriers, based on their completed pick-ups, deliveries, wait times and distance traveled, as well as bonuses and tips.
  • Risk / Fraud: Calculate and predict risk factors for credit card fraud and other transactions that involve payments such as growth promotions and funding of Fleet debit cards.

What is the tech-stack used?

As Postmates has grown, the company has added more developers, more microservices and more data sources. “The amount of data we have, period, and the amount of new data we generate every day has expanded exponentially,” says Biller, describing the growth at Postmates.

Consider the amount of data collected during “peak delivery time” on Sunday nights, when people order their dinner to eat at home.

“Three years ago, we captured data from a certain number of ongoing deliveries on a Sunday night at peak. We’re now at about 30x the number of deliveries in flight. And we’re also monitoring and tracking so many more events per delivery. In short, we’re doing 30x the deliveries, and a single delivery includes 10x the data, and it just keeps growing,” explains Biller.

Amazon Redshift and Google BigQuery are the primary data warehouses.  

  • Data ingestion is done via EMR and Kinesis
  • Business intelligence is done in Chartio
  • Modeling is done in the data warehouse using Airflow

What are the sources of data?

The vast majority of raw data comes from the Postmates app itself. In addition to the app, the data team has built integrations with 3rd party services. Examples include:

  • Facebook for ad campaigns
  • Stripe for payments
  • Zendesk for support

“You can write a query that combines 13 data sources in one single query and just run it and get data. That’s extraordinarily useful and powerful from an analytics and reporting perspective.”
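As a rough illustration of what combining sources in a single query can look like, the sketch below uses Python’s built-in `sqlite3` module as a stand-in for the warehouse. The table names, columns and sample rows are hypothetical and do not reflect Postmates’ actual schema. It joins app delivery data with payment and support data in one statement.

```python
import sqlite3


def build_demo_warehouse() -> sqlite3.Connection:
    """Create an in-memory database with three hypothetical source tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE deliveries (delivery_id INTEGER PRIMARY KEY, market TEXT);
        CREATE TABLE stripe_charges (delivery_id INTEGER, amount_usd REAL);
        CREATE TABLE zendesk_tickets (ticket_id INTEGER PRIMARY KEY, delivery_id INTEGER);

        INSERT INTO deliveries VALUES (1, 'SF'), (2, 'SF'), (3, 'LA');
        INSERT INTO stripe_charges VALUES (1, 20.0), (2, 35.5), (3, 12.0);
        INSERT INTO zendesk_tickets VALUES (100, 2), (101, 2), (102, 3);
        """
    )
    return conn


# Tickets are aggregated per delivery before joining, so a delivery with
# several tickets does not duplicate its charge in the revenue total.
QUERY = """
    SELECT d.market,
           COUNT(*)                     AS deliveries,
           SUM(c.amount_usd)            AS revenue_usd,
           SUM(COALESCE(t.tickets, 0))  AS support_tickets
    FROM deliveries d
    JOIN stripe_charges c ON c.delivery_id = d.delivery_id
    LEFT JOIN (
        SELECT delivery_id, COUNT(*) AS tickets
        FROM zendesk_tickets
        GROUP BY delivery_id
    ) t ON t.delivery_id = d.delivery_id
    GROUP BY d.market
    ORDER BY d.market
"""


def market_summary(conn: sqlite3.Connection) -> list:
    """Return one row per market combining app, payment and support data."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in market_summary(build_demo_warehouse()):
        print(row)
```

The same pattern extends to more sources: once each integration lands in the warehouse as a table with a shared key, adding it to a report is one more join.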