Funnel analysis is awesome. Whether your company has a checkout, a registration, or any kind of process on a website or even in real life, funnel analysis lets you see how many customers are lost at each step of the way before reaching the golden goal of conversion. It helps you find troublesome steps on the path, fix them, and improve conversion rates.

Basic funnel analysis is prepared as follows:

    1. Figure out the steps, e.g. homepage - product page - check out page - billing page.

    2. Determine how to identify the user, e.g. userid for signed up users, ip, session id.

    3. Write queries to get the actual users per step.

    4. Outer join the queries by user id so you can track her in the funnel.

    5. Aggregate.

Sadly, it’s easier said than done. Obviously there is Google Analytics offering basic capabilities, as well as other tools like Woopra. But whether you use them or not, funnel analysis presents technical challenges on the stairway to analytics heaven. Here are 5 of them.

1. Overcoming the Code Barrier

Filtering funnel data by various dimensions requires SQL or another programming language. However, most BI analysts don’t know how to code and the learning curve is rough. Even spending time at the Codeacademy doesn’t help since the queries need complicated join clauses to combine data from different tables. Don’t try this at home kids.

2. Defining Funnel Steps

Page views are the simplest form of steps in the funnel. But what about more advanced definitions? You might want to look at the whole login process as a single step. Maybe you’d like a time limited step, for example, users who entered the shopping cart within 2 minutes of loading a sneakers page. It ain’t easy to define them. In fact, to look at the funnel from different perspectives you may need to redefine them further down the way. Do some stretches, run a warm up lap, and prepare to lift heavyweight queries.

3. Identifying Entities

Sneaker collectors visit your website looking for rare shoes. You want to know their locations, pageviews, visit times, buttons clicked, and so on, so you track every visit and every action. They may access your site via desktop one time and mobile another time. Most of them won’t buy a pair during the first visit, but only upon later visits after comparing prices with the competition. So how do you define each entity and tie all that info together? By IP, username, something else? It isn’t easy. You’ll need to think hard about entity definitions, track all the events per entity, and write long queries to combine all the data.

4. Filtering Out False Data

Consider the following path: homepage > product page > shopping cart > checkout > thank you page. If someone skips the homepage and gets to your vintage Adidas Stan Smiths via organic search results, this session isn’t included in this path. Therefore, your queries must be precise to filter out such cases. Otherwise you’re going to analyze false data and deduce the wrong insights.

5. Choosing Data Format & Platform

Choosing the right data format and platform affects funnel analysis performance and scalability. Take Redis Bitmap. Bitmap is a binary array. You can use it to represent the funnel, each index being a step in the funnel with "1" where the user has been and "0" where she hasn’t. Storing this bitmap per user gives you great performance for analysis. Unfortunately, Redis doesn’t scale well, and bitmaps don’t support advanced analysis - number of visits per funnel step, date of visit, and the visiting order.

Another option is to use MongoDB, keeping a document of pageviews per user. You could use MapReduce to check if the user was at a specific step in the funnel or not, and a second pass to sum up the funnel for all users. It’s easy to setup this process, and if you run it on Hadoop it also scales (running it on MongoDB does not scale as well to petabytes).

Wrapping It Up

There are plenty of challenges when it comes to funnel analysis and no easy solutions. You’ll need to step up your coding skills and write smart queries that process the data correctly. You’ll have to think hard about defining the funnel and entities and choose the right data format and platform. Make sure to wear comfortable sneakers.