Using Apache Superset, a Powerful and Free Data Analysis Tool
Introduction
Among data analysis tools, the open-source Apache Superset is a strong choice for delivering reports at scale, efficiently and free of charge. In this article, I will guide you through installing and configuring Superset and connecting it to a data source.
Superset was started by Maxime Beauchemin (the creator of Apache Airflow) as a hackathon project while he was working at Airbnb, and it joined the Apache Incubator program in 2017.
Superset's core features are similar to those of other data analysis tools, including:
- Creating and managing dashboards
- Supporting multiple database types: SQLite, PostgreSQL, MySQL, etc.
- Querying databases directly with SQL
Installation and Configuration
Here, we install Superset by running it as a Docker container.
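The original command is not preserved in this copy of the article. As a minimal sketch, the step could look like the following, assuming the official `apache/superset` image from Docker Hub and a container named `superset`. The secret key is a placeholder to replace with your own value, and `start_superset` is a hypothetical function name used only to make the step easy to rerun. Host port 8080 matches the URL used later in this article, and 8088 is the port the web server listens on inside the container.

```shell
# Assumptions: official apache/superset image, container name "superset".
# Replace the placeholder secret key with a long random value of your own.
start_superset() {
  docker run -d \
    -p 8080:8088 \
    -e "SUPERSET_SECRET_KEY=your_secret_key_here" \
    --name superset \
    apache/superset
}
```

Calling `start_superset` starts the container in the background; `docker ps` should then list a container named `superset`.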
Once the Superset container is running, we run a command inside it to create the initial admin account.
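The exact command is missing here, so the following is a sketch rather than the author's original. It assumes the container is named `superset` and uses Superset's `fab create-admin` command. The function name `create_superset_admin` and all account details (username, names, email, password) are placeholders, so pick your own, especially the password.

```shell
# Assumptions: container name "superset"; every account detail below is a
# placeholder and should be replaced before real use.
create_superset_admin() {
  docker exec -it superset superset fab create-admin \
    --username admin \
    --firstname Superset \
    --lastname Admin \
    --email admin@example.com \
    --password admin
}
```

After calling `create_superset_admin`, these credentials are the ones to use on the Superset login page.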
Next, load the example data that ships with Superset.
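The command itself did not survive in this copy. As a sketch, again assuming a container named `superset`, the examples can be loaded with `superset load_examples`. Running `superset db upgrade` first is an addition here, not something the article states; it applies the metadata database migrations that the examples rely on. The function name `load_superset_examples` is hypothetical.

```shell
# Assumption: container name "superset".
load_superset_examples() {
  # Migrate the metadata database first, then load the bundled examples.
  docker exec -it superset superset db upgrade &&
    docker exec -it superset superset load_examples
}
```

Call `load_superset_examples` once. Loading the example datasets and dashboards can take a while.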
Finally, start Superset.
After that, open http://localhost:8080 in your browser to start using Superset. It will already contain the example data loaded earlier.
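The original command for this step is not preserved. With the official `apache/superset` image, the web server already starts together with the container, so the remaining step is assumed to be `superset init`, which creates the default roles and permissions. The container name `superset` and the function name `init_superset` are assumptions for illustration.

```shell
# Assumption: container name "superset".
init_superset() {
  # Create the default roles and permissions.
  docker exec -it superset superset init
}
```

Once `init_superset` finishes, the login page should be reachable on the host port that was mapped when the container was started.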
Connecting Data Sources
To analyze data, you first need to create a connection to a database (such as PostgreSQL or MySQL). The process is simple and similar to how other database client tools work. Here, I will show you how to connect to PostgreSQL. If you are not familiar with PostgreSQL, you can refer to this article to install it and learn the basics.
First, open the page for creating a new database connection.
Next, enter the connection string for your PostgreSQL database in the SQLALCHEMY URI field.
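The URI structure itself is missing from this copy of the article. SQLAlchemy connection URIs generally follow the shape `dialect://user:password@host:port/database`, so a PostgreSQL URI can be sketched as below. Every connection detail shown (user, password, host, port, database name) is a hypothetical placeholder.

```shell
# Hypothetical connection details; replace each value with your own.
PG_USER="analyst"
PG_PASSWORD="secret"
PG_HOST="db.example.com"
PG_PORT="5432"
PG_DATABASE="sales"

# General shape: postgresql://<user>:<password>@<host>:<port>/<database>
SQLALCHEMY_URI="postgresql://${PG_USER}:${PG_PASSWORD}@${PG_HOST}:${PG_PORT}/${PG_DATABASE}"
printf '%s\n' "$SQLALCHEMY_URI"
```

Paste the printed value into the SQLALCHEMY URI field. Because Superset runs inside a container in this setup, `localhost` in the URI refers to the container itself rather than your machine, so use an address the container can reach. Special characters in the password need to be URL-encoded.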
Once the connection succeeds, you can use the features Apache Superset offers: building dashboards, creating charts (with many chart types and extensive customization), querying data, saving queries, and viewing query history.
Conclusion
Apache Superset provides relatively comprehensive tools to support data analysis and visualization. It can embed query results into other applications, connect to various data sources, and, importantly, it is open-source and completely free.
Although it may not match powerful paid tools like Tableau or Power BI in some respects, Superset is well worth using because it covers most data analysis and reporting needs.
What do you think? Leave a comment below!