Pro Tableau: Visual Analytics for Business

Maksim Krupenin
6 min read · Dec 1, 2016


In this article I’ll describe a great tool for data analytics: Tableau. Strictly speaking, it’s not just a tool but a BI platform, and one of the market leaders according to Gartner’s annual report, Magic Quadrant for Business Intelligence and Analytics Platforms. Here are some typical uses of Tableau:

  • Analysis of a dataset on a desktop. Data sources: structured files, databases, web services (Google Analytics, Amazon, …), etc. Tableau Desktop is the tool for these tasks.
  • Infographics for journalists. You can publish a data analysis to the Tableau Public cloud and embed it in an article. Here’s an example.
  • Diploma theses in data analysis for students — Tableau Desktop. It isn’t a free tool, but there is a free one-year license for students.
  • A corporate analytics system on Tableau Server or Tableau Online: interactive dashboards for business users and ad-hoc analysis (self-service BI) for advanced analysts.

My colleagues and I develop and implement BI systems and data warehouses for corporate customers (banks and other financial institutions), so this article should interest anyone who is developing or implementing corporate solutions, or planning to. It should be noted that we have used different BI platforms in our solutions (for example, Oracle BI, Qlik View/Sense, and MS BI), so Tableau is the choice of an experienced team.

I won’t cover the basics here; there are good training videos for that. What follows is a set of practical tips on using Tableau. I hope they are useful to someone.

DWH. To be, or not to be?!

Currently there are a lot of quick implementations of Tableau and other data discovery tools (for example, Qlik View/Sense). These are usually “departmental” local solutions without a data warehouse, which is fine for a local or temporary solution. But if you are planning to implement a BI system in a large organization with many source systems, you should implement an Enterprise Data Warehouse (EDW). Moreover, you can implement the EDW and BI in parallel.

Which DBMS is best for an EDW? In my opinion, an MPP (massively parallel processing) database is the right choice for analytics: for example, Teradata, HPE Vertica, or Greenplum. We chose Vertica for new projects, and Tableau + Vertica work very well together.

There is also an interesting solution for big data: Apache Kylin, an extreme OLAP engine for Hadoop. We haven’t tested it yet. Could you share your experience in the comments?

Data access architecture (extract vs live)

There are two types of connection to data in Tableau:

  • Extract — all data is copied from the data source (database, file, etc.) into special files and memory. The data is compressed, and incremental refreshes are possible. This works well if you have enough memory and don’t need the very latest data.
  • Live — Tableau queries the data source for only the data required by the visualization (for example, SQL queries are generated when the source is an RDBMS). This is the preferable type if you use a DWH with a large amount of historical data, but you have to optimize your data structures for fast reads.
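To make the live mode concrete, here is a sketch of the kind of aggregate SQL Tableau might generate for a monthly chart against an RDBMS source. This is purely illustrative; the table and column names are hypothetical, not from a real Tableau query log:

```sql
-- Hypothetical query of the kind Tableau generates in live mode
-- for a chart of transaction amounts by month
-- (fact_transactions, transaction_date, amount are made-up names)
SELECT
    TRUNC(t.transaction_date, 'MM') AS transaction_month,
    SUM(t.amount)                   AS total_amount
FROM fact_transactions t
GROUP BY TRUNC(t.transaction_date, 'MM')
ORDER BY transaction_month;
```

Every filter or refresh in the dashboard produces queries of this shape, which is why the underlying tables must be optimized for fast reads.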

We work with a DWH, so we usually use a live connection. Below I briefly describe the data architecture of our DWH. It consists of three main layers:

  • Staging Layer — an intermediate storage area for raw (dirty) data from the source systems.
  • Core Layer (Detail Data Store) — cleansed, transformed, and consolidated detail data, stored in a normalized structure (3NF).
  • Access Layer (Data Marts) — detail data, derived calculated indicators, and aggregates. We use dimensional modeling (the Kimball approach). The main requirement is fast reads for reporting and analytics.

Tableau sends its queries to the Access Layer, where data is stored in dimension and fact (measure) tables. An additional benefit of this architecture is that the physical data model (Data Source) can be described in Tableau quickly, because Tableau also works with Dimension and Measure entities. An example of a Data Source:
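A minimal sketch of what such an Access Layer star schema could look like, assuming hypothetical table and column names (foreign-key columns map naturally to Dimensions in Tableau, numeric columns to Measures):

```sql
-- Hypothetical star-schema fragment for the Access Layer
-- (all names are illustrative, not from a real project)
CREATE TABLE dim_date (
    date_id       NUMBER PRIMARY KEY,
    calendar_date DATE,
    month_name    VARCHAR2(20),
    year_number   NUMBER
);

CREATE TABLE dim_account (
    account_id   NUMBER PRIMARY KEY,
    account_name VARCHAR2(100),
    branch_name  VARCHAR2(100)
);

-- Fact table: its foreign keys become Dimensions in Tableau,
-- its numeric columns become Measures
CREATE TABLE fact_payments (
    date_id    NUMBER REFERENCES dim_date (date_id),
    account_id NUMBER REFERENCES dim_account (account_id),
    amount     NUMBER(18,2)
);
```

Because the model is already split into dimensions and facts, describing it as a Tableau Data Source is mostly a matter of joining the fact table to its dimensions.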

Approaches for improving performance in Oracle DB

As noted above, you have to optimize the data model if you use the live connection mode. You can connect Tableau directly to an OLTP system, but you will have performance problems. Below I describe some approaches to improving performance when Oracle Database is used as the data source:

  • Data model denormalization. As a result, you get simpler and faster SQL queries. As noted above, we use dimensional modeling (the Kimball approach) in the Access Layer design.
  • Oracle Partitioning option. In my opinion, a large DWH on Oracle and this option are inseparable. Partitioning is a licensed option of Oracle Enterprise Edition. The main idea is simple: you split a big table into partitions (for example, by date) which are stored separately. So if you only need the latest data, Oracle scans just the relevant partition.
  • Parallel query execution. An example: you have a huge fact table with financial transactions, partitioned by month, and you set a degree of parallelism (ALTER TABLE <table_name> PARALLEL 8;). A monthly chart in Tableau needs to sum and group all transactions in this table by month; Oracle executes such queries in several parallel threads. It’s much faster but uses more server resources. This feature is available in Oracle Enterprise Edition.
  • Materialized views. If the previous methods are not enough, you can use materialized views for groups of similar queries. With this technology, the database stores the results of particular heavy queries on disk and refreshes them.
  • Oracle In-Memory option. One more licensed option, introduced in version 12.1.0.2. With it you can mark tables for automatic duplication in memory; the in-memory copy is stored by column (as in Vertica), which is optimal for analytical queries. We have not adopted this feature yet because we didn’t get the significant performance improvement we expected, but it’s a promising option.
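The partitioning, parallelism, and materialized-view approaches above can be sketched in Oracle DDL roughly as follows. This is a hedged illustration: the table, column, and view names are made up, and the exact options should be tuned per project:

```sql
-- Monthly range (interval) partitioning: queries on recent data
-- scan only the relevant partitions (partition pruning)
CREATE TABLE fact_transactions (
    transaction_id   NUMBER,
    transaction_date DATE,
    account_id       NUMBER,
    amount           NUMBER(18,2)
)
PARTITION BY RANGE (transaction_date)
INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
(PARTITION p_initial VALUES LESS THAN (DATE '2016-01-01'));

-- Default degree of parallelism for scans of this table
ALTER TABLE fact_transactions PARALLEL 8;

-- Pre-aggregated monthly totals for dashboard queries
CREATE MATERIALIZED VIEW mv_monthly_amounts
BUILD IMMEDIATE
REFRESH COMPLETE ON DEMAND
AS
SELECT TRUNC(transaction_date, 'MM') AS transaction_month,
       SUM(amount)                   AS total_amount
FROM   fact_transactions
GROUP  BY TRUNC(transaction_date, 'MM');
```

Partitioning and the materialized view attack the same problem from different sides: the first reduces how much data a query scans, the second avoids re-scanning at all for recurring aggregates.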

There are some other techniques, but I will cover them in a future article. Also note that the described approaches apply not only to Oracle Database.

Performance Optimization on Tableau side

In addition to the previous section, I will mention performance features on the Tableau side. They are well described in the official documentation. I like the following powerful features:

  • Parallel Queries — using the source database’s ability to execute several queries at the same time;
  • Query Fusion — Tableau looks at all of the queries in your dashboard and finds ways to combine them into fewer queries;
  • External Query Caching — Tableau saves the query cache as part of the workbook, on both Desktop and Server, offering near-instant load times for applicable workbooks. There is no need to re-run the queries since the results are already there, and users can simply refresh to replace the cache.

There is also a very useful tool in Tableau that helps to optimize dashboards: Help > Settings and Performance > Start/Stop Performance Recording. It lets you analyze every stage of a dashboard’s rendering: you can see which queries were sent to the database, how long they took, calculation times, and other events. The next image shows an example of such an analysis. The performance improvements described above were added in Tableau 9.0, and you can see the difference compared with version 8.3:

Great visualization

In my opinion, it’s much more interesting to analyze data with a tool that offers beautiful, interactive visualization, and Tableau is one of the best tools in this respect. You can see examples in the Tableau gallery, along with some of our dashboards. You can also embed Tableau dashboards in a web application. Here’s a video with an example:

A fly in the ointment

You can read a lot about Tableau’s other great features, but I would like to mention some disadvantages that are significant for us:

  • There’s no functionality for creating pixel-perfect reports in Tableau; we have to use our own proprietary tool for this. Do you know a good open-source alternative?
  • Tableau Server is available only for Windows, which is a problem for some of our customers.

We hope the Tableau team will fix these.

Thank you for your attention!

Written by Maksim Krupenin
Data & Analytics Solutions Architect