Using Apache Superset, a Powerful and Free Data Analysis Tool

Introduction

Among data analysis tools, Apache Superset is an open-source option that is well suited to deploying reports and dashboards at scale, and it is completely free of charge. In this article, I will guide you through installing and configuring Superset and connecting it to a data source.

Superset was started by Maxime Beauchemin (also the creator of Apache Airflow) as a hackathon project while he was working at Airbnb, and it joined the Apache Incubator program in 2017.

Superset's core features are similar to those of other data analysis software, including:

  • Creating and managing dashboards
  • Supporting multiple database types: SQLite, PostgreSQL, MySQL, etc.
  • Querying data directly with SQL

Installation and Configuration

Here, I will install Superset with Docker. The general form of the command is shown below; inside the container Superset listens on port 8088, so use that as the inside port:

docker run -d -p {outside port}:{inside port} --name {container name} apache/superset

Example:

docker run -d -p 8080:8088 --name superset apache/superset
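
Recent releases of the apache/superset image, as far as I know, refuse to start with the built-in default secret key, so the container may exit shortly after the command above. The Python sketch below generates a random key and assembles the same docker run command with a SUPERSET_SECRET_KEY environment variable. The helper name build_run_command is hypothetical, and the variable name is an assumption based on the image's documentation, so check it against the version you pull.

```python
import secrets
import shlex


def build_run_command(host_port: int, name: str, secret_key: str) -> list[str]:
    """Assemble the `docker run` argument list used in the example above.

    Assumption: the image reads its secret key from SUPERSET_SECRET_KEY.
    """
    return [
        "docker", "run", "-d",
        "-p", f"{host_port}:8088",  # Superset listens on 8088 inside the container
        "-e", f"SUPERSET_SECRET_KEY={secret_key}",
        "--name", name,
        "apache/superset",
    ]


if __name__ == "__main__":
    key = secrets.token_urlsafe(42)  # random, URL-safe secret
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(build_run_command(8080, "superset", key)))
```

Printing the command rather than executing it keeps the sketch safe to run anywhere. Paste the output into a terminal once it looks right, and store the key somewhere safe if you plan to recreate the container.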


Once the Superset container is running, create an admin account by running the following command inside the container:

docker exec -it superset superset fab create-admin --username {username} --firstname {firstname} --lastname {lastname} --email {email} --password {password}

Example:

docker exec -it superset superset fab create-admin --username admin --firstname Superset --lastname Admin --email admin@superset.com --password admin

Then apply the database migrations so that Superset's metadata schema is up to date:

docker exec -it superset superset db upgrade


Next, run the following command to load the bundled example data (this step is optional):

docker exec -it superset superset load_examples


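Finally, initialize Superset's default roles and permissions (the web server is already running inside the container):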

docker exec -it superset superset init

After that, open http://localhost:8080 and log in with the admin account you created. You will see the example data loaded in the previous step.
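
The container can take a little while to become ready, so the page may not load immediately. Below is a minimal sketch of a readiness check in Python. It assumes Superset answers on a /health endpoint and that the port mapping from the example (8080) is in use. The function names is_superset_up and wait_for_superset are hypothetical.

```python
import time
import urllib.request


def is_superset_up(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error statuses
        # (urllib's URLError and HTTPError both derive from OSError).
        return False


def wait_for_superset(base_url: str, attempts: int = 30, delay: float = 2.0) -> bool:
    """Poll the health endpoint until it responds or the attempts run out."""
    for _ in range(attempts):
        if is_superset_up(base_url):
            return True
        time.sleep(delay)
    return False
```

For example, wait_for_superset("http://localhost:8080") returns True as soon as the server responds, and False if it has not come up after roughly a minute.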


Connecting Data Sources

To analyze data, you first need to create a connection to a database (such as PostgreSQL or MySQL). The process is simple and similar to other database client tools. Here, I will show how to connect to PostgreSQL. If you are not familiar with PostgreSQL, you can refer to this article on installing it and using its basic features.

First, open the page for creating a new database connection.


Next, enter the SQLAlchemy URI with the following structure:

postgresql://{username}:{password}@{host}:{port}/{database}
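
If the username or password contains characters such as @, : or /, pasting them into the URI as-is will break parsing. A small Python sketch of one way to build the URI safely is shown below. The function name build_postgres_uri and the sample credentials are made up for illustration.

```python
from urllib.parse import quote_plus


def build_postgres_uri(username: str, password: str, host: str,
                       port: int, database: str) -> str:
    """Build a SQLAlchemy-style PostgreSQL URI.

    The username and password are percent-encoded so that reserved
    characters (e.g. '@', ':', '/') do not break URI parsing.
    """
    user = quote_plus(username)
    pwd = quote_plus(password)
    return f"postgresql://{user}:{pwd}@{host}:{port}/{database}"


if __name__ == "__main__":
    # Hypothetical credentials, for illustration only.
    print(build_postgres_uri("analyst", "p@ss/word", "db.example.com", 5432, "sales"))
```

Note that when Superset runs in Docker, localhost in the URI refers to the Superset container itself and not to your machine. If PostgreSQL runs on the host, you will likely need the host's address instead (for example host.docker.internal on Docker Desktop).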


After connecting successfully, you can use the features Apache Superset provides, such as creating dashboards and charts (many chart types are supported, with extensive customization options), querying data, saving queries, and viewing query history.

Creating Charts based on Datasets


SQL Query

Conclusion

Apache Superset provides a fairly comprehensive set of tools for data analysis and visualization. It lets you embed query results in other applications, connects to a wide range of data sources, and, importantly, is open source and completely free.

Although it may not match powerful paid tools such as Tableau or Power BI in every respect, Superset is well worth using because it meets most data analysis and reporting needs.

What do you think? Leave a comment below!
