Monitoring your data apps with app tracing gives you better control and can help train new hires to use tools. Amazon Redshift training, for example, can become much easier when you use Integrate.io Data Tracing. If you don’t know how monitoring data apps benefits your organization and tools, this article will give you a few ideas.

App Tracing surfaces important information about how apps & users interact with your data. It can help answer questions like:

  • Which user is responsible for this spike in concurrency?
  • Who is the most “expensive” Looker user?
  • What is the average latency of a dashboard or model? Of all dashboards executed by a particular user?
  • My Apache Airflow task latency is increasing or jobs are failing. What is causing that?

What is a “Data App”?

Data apps typically fall into one or more of three categories:

  1. Data integration services: Vendors who ETL data from external systems or applications into your data environment.
  2. Workflow orchestration: Tools that schedule and run jobs on your data pipeline, typically as batch processing.
  3. Visualization & Analysis: Reporting, modeling, and visualization apps used by analysts and data scientists to highlight emerging trends in information.

Data Integration

Integrate can fulfill all three of these functions. It integrates with your favorite tools to show you which end-users connect to your data warehouse. It comes ready to integrate with popular tools like:

  • Integrate.io
  • Dataform
  • Airflow
  • Talend
  • Fivetran

Whether you want to connect your data to more BI tools or you want insights that lead to better Amazon Redshift training, Integrate can help.

Visualization & Analysis

Integrate App Tracing thrives on visualization and analysis. If you want to know how much data an app uses, just click on the icon to get a graph that shows a timeline of the app’s data usage. 

Visual analysis can also show you:

  • How your users interact with tables.
  • Whether you have tables that never get used.
  • Which models and tables get used most often.

Visualization makes information easier for everyone to understand. Whether you have years of experience in IT or you just want an overview of your data use, Integrate.io App Tracing gives you a straightforward approach to view information.

How it Works

App Tracing requires the data app to annotate each SQL statement it executes with a comment. The comment encodes metadata about the application that submitted the query.

Integrate.io automatically indexes all data contained in the annotation and makes it accessible as first-class labels in our system, for example in Discover searches, Saved Searches, and aggregations on the Throughput Analysis page.
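
To make the idea concrete, here is a minimal Python sketch of annotating a query with a metadata comment and reading the labels back out. The `APP_TRACE` marker, the JSON payload, and both function names are assumptions made for illustration; they are not Integrate.io's actual tag format, which comes from the Tag Generator.

```python
import json
import re

# Hypothetical annotation format, for illustration only. The real tag
# format is whatever the Tag Generator produces.
TAG_PATTERN = re.compile(r"/\*\s*APP_TRACE:\s*(\{.*?\})\s*\*/", re.DOTALL)


def annotate_sql(sql: str, metadata: dict) -> str:
    """Prepend a SQL comment that carries app metadata as JSON."""
    payload = json.dumps(metadata, sort_keys=True)
    # A "*/" inside the payload would close the comment early.
    if "*/" in payload:
        raise ValueError("metadata must not contain '*/'")
    return f"/* APP_TRACE: {payload} */\n{sql}"


def extract_labels(sql: str) -> dict:
    """Return the metadata found in the annotation, or {} if there is none."""
    match = TAG_PATTERN.search(sql)
    if match is None:
        return {}
    return json.loads(match.group(1))


tagged = annotate_sql(
    "SELECT COUNT(*) FROM orders",
    {"app": "looker", "user": "248"},
)
labels = extract_labels(tagged)  # the same dict that was passed in
```

Because the metadata travels inside the SQL text itself, it ends up in the warehouse's query logs alongside the query, which is what lets a monitoring tool attribute each query to an app and user.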

Supported Apps

Out of the box, we support:

  • Looker
  • Mode
  • Periscope Data
  • Chartio
  • Stitch Data
  • Segment
  • ETLeap
  • Apache Airflow (via a plugin)

Don’t see your data app? No problem. Any queries tagged with our format will be automatically detected. See here for instructions on using the Tag Generator to create tags to embed into your SQL.

Example: Which Looker User is Causing a Concurrency Spike

In the example below, a query spike in WLM 3 causes a bottleneck in query latency. As a result, queries that would otherwise take 13-14 seconds to execute are stuck in the queue for more than 3 minutes.

App Tracing detects that the majority of these queries are from Looker. How do you know which user is causing this?

Click on the chart, and a widget will pinpoint the specific Looker user(s) who ran those queries. In this example, we see that user 248 is responsible.
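
Conceptually, pinpointing the user comes down to counting queries per user label within the spike window. The sketch below shows that aggregation in Python. The record fields (`app`, `user`, `started_at`), the `top_users` helper, and the sample data are all hypothetical, and this is not how the widget is implemented.

```python
from collections import Counter


def top_users(queries, app, start, end, n=3):
    """Count queries per user for one app with start <= started_at < end."""
    counts = Counter(
        q["user"]
        for q in queries
        if q["app"] == app and start <= q["started_at"] < end
    )
    return counts.most_common(n)


# Made-up sample records; started_at is seconds since some reference time.
sample = [
    {"app": "looker", "user": "248", "started_at": 10},
    {"app": "looker", "user": "248", "started_at": 25},
    {"app": "looker", "user": "17", "started_at": 40},
    {"app": "airflow", "user": "etl", "started_at": 45},
    {"app": "looker", "user": "248", "started_at": 90},
    {"app": "looker", "user": "17", "started_at": 500},  # outside the window
]

spike = top_users(sample, app="looker", start=0, end=300)
```

Here `spike` ranks user 248 first with three queries in the window, ahead of user 17 with one.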

App Tracing in Looker

Armed with this information, you can now:

  • Ask the person why they are running so many queries
  • Optimize the queries executed by this dashboard
  • Increase the concurrency of the queue to reduce queue times

Monitoring & Setting an Alarm

See all the activity for this user by heading to Discover and using the new ‘App’ filter to search for Looker user 248.

To set up an alarm to get email notifications, save that search and stream the following metrics to CloudWatch:

  • Query count
  • Execution time & queue time
  • The number of rows scanned & memory consumed by queries run by this user

See What Customers are Saying

The following Slack conversation took place the morning we soft-launched app tracing in June 2018:

[Image: Slack conversation about app tracing]

Make Amazon Redshift More Effective

Integrate.io makes Amazon Redshift more powerful. We’ve seen clients use Integrate.io to:

  • Manage costs by getting a deeper understanding of how people use Amazon Redshift. 
  • Discover historic trends that help users find the data they need in Amazon Redshift.
  • Unmask end-users to gain a better understanding of their workflows.
  • Develop Amazon Redshift training that addresses the specific problems their users have.

Using Apache Airflow?

If you’re using Amazon Redshift in combination with Apache Airflow and you’re trying to monitor your DAGs, we’d love to talk! We’re running a private beta for a new Airflow plug-in with a few select customers. Click on the chat widget at the bottom right of this window, answer three simple questions, schedule a call, and mention “Airflow” at the end, and we’ll get you set up! As a bonus, we’ll throw in an extended trial of 4 weeks instead of 2!