Robinhood is a stock brokerage application that democratizes access to the financial markets, which enables its customers to buy and sell U.S. listed stocks and ETFs with zero commission. The company debuted with a waiting list of nearly 1 million people, which means they had to pay attention to scale from the very beginning.
Robinhood’s data stack is hosted on AWS, and the core technology they use is ELK (Elasticsearch, Logstash, and Kibana) – a tool for powering search and analytics. Logstash is responsible for collecting, parsing and transforming logs, before passing them on to Elasticsearch, while data is visualized through Kibana. They grew up from a single ELK cluster with a few GBs of data to three clusters with over 15 TBs. Before data goes to ELK clusters, it is buffered in Kafka, as the rates of which documents enter vary significantly between different data sources. Kafka also shields the system from failures and communicates its state with data producers and consumers. As with many other companies, Robinhood uses Airflow to schedule various jobs across the stack, beating competition such as Pinball, Azkaban and Luigi. Robinhood data science team uses Amazon Redshift to help identify possible instances of fraud and money laundering.