Must-know information about data analytics, data stacks and business value realization for a decision-maker.
In our day-to-day lives, we monitor many indicators of physical health, financial health, mental health, and more. Similarly, a multitude of indicators help enterprises understand their current and target states of business operation. If you are on the journey of capturing these KPIs, you need to know the data sources and the software ecosystem that render error-free, information-rich, actionable KPIs from the available data. Building KPIs is largely an exercise in data analysis, which is a lot like baking.
Unclean ingredients make a pastry inedible; similarly, unclean data needs to be processed appropriately before it is brought to the baker’s table. The right ingredient quantities and oven temperature produce crisp cookies; similarly, math applied well to the data produces error-free, acceptable KPIs.
Good data infrastructure coupled with competent data analysis yields dependable KPIs for making your business data-driven.
Let us discuss the data stack and the data definition.
Data Stack: A data stack is a set of software units that moves data from different sources (SAP, CRM, HRMS, financial systems, etc.), loads it into a new unified destination, cleans it, and makes it ready for data visualization (for business users) and for consumption by data scientists (for advanced use cases). You can learn more details here.
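To make the idea of a data stack concrete, here is a minimal sketch in Python. It assumes two hypothetical source extracts (a CRM and an ERP), uses an in-memory SQLite database as a stand-in for the unified destination, and applies a little cleaning on the way in. All table names, field names, and values are invented for illustration; a real stack would use dedicated loading tools and a proper warehouse.

```python
import sqlite3

# Hypothetical extracts from two source systems (illustrative values only).
crm_rows = [
    {"customer_id": "C001", "name": " Acme Corp ", "country": "de"},
    {"customer_id": "C002", "name": "Globex", "country": "US"},
]
erp_rows = [
    {"cust": "C001", "invoice_total": "1200.50"},
    {"cust": "C002", "invoice_total": "830.00"},
    {"cust": "C002", "invoice_total": None},  # incomplete record
]


def load_and_clean(conn, crm, erp):
    """Load both extracts into one database, cleaning them on the way in."""
    conn.execute(
        "CREATE TABLE customers (customer_id TEXT PRIMARY KEY, name TEXT, country TEXT)"
    )
    conn.execute("CREATE TABLE invoices (customer_id TEXT, invoice_total REAL)")
    # Cleaning: trim stray whitespace and normalise country codes to upper case.
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(r["customer_id"], r["name"].strip(), r["country"].upper()) for r in crm],
    )
    # Cleaning: skip invoices without an amount and convert text to numbers.
    conn.executemany(
        "INSERT INTO invoices VALUES (?, ?)",
        [
            (r["cust"], float(r["invoice_total"]))
            for r in erp
            if r["invoice_total"] is not None
        ],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the unified destination
load_and_clean(conn, crm_rows, erp_rows)

# The unified data is now ready for a dashboard or a data scientist to query.
report = conn.execute(
    "SELECT c.name, c.country, SUM(i.invoice_total) "
    "FROM customers c JOIN invoices i ON c.customer_id = i.customer_id "
    "GROUP BY c.customer_id ORDER BY c.customer_id"
).fetchall()
print(report)
```

The point of the sketch is only the shape of the flow: several sources go in, one clean and queryable destination comes out.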
Data Definition: A data definition specifies how various data points (variables) are arithmetically processed to get a final value that helps in making a business decision. Let me demonstrate this with an example.
The data set below captures the sales of outlets. Product visibility in the storefront makes a product easier to access and eventually leads to more sales. But some necessary items, such as fruits and vegetables, generate enough sales even when moved to less visible areas. A new KPI for product placement/visibility in a Type I supermarket in a Tier 1 location would help accelerate sales. Building it requires answering further questions about product attributes, day of sale, and current product visibility.
The statistical and mathematical calculation that renders this new KPI, so that business users can make error-free, recurring product placement decisions across cities and supermarket types, is termed a data definition (some practitioners call this data augmentation or concoction).
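As a rough illustration of what such a data definition could look like in practice, the Python sketch below computes one possible placement KPI: sales per percentage point of shelf visibility, by item type, for a chosen outlet type and location tier. The column names, the sample rows, and the KPI formula itself are assumptions made for this example; they are not taken from the data set discussed above.

```python
from collections import defaultdict

# Hypothetical rows; names and values are illustrative only.
rows = [
    {"item_type": "Fruits and Vegetables", "visibility_pct": 2,
     "outlet_type": "Type I", "tier": "Tier 1", "sales": 4000.0},
    {"item_type": "Fruits and Vegetables", "visibility_pct": 3,
     "outlet_type": "Type I", "tier": "Tier 1", "sales": 5000.0},
    {"item_type": "Snack Foods", "visibility_pct": 10,
     "outlet_type": "Type I", "tier": "Tier 1", "sales": 3000.0},
    {"item_type": "Snack Foods", "visibility_pct": 15,
     "outlet_type": "Type I", "tier": "Tier 1", "sales": 4500.0},
    {"item_type": "Snack Foods", "visibility_pct": 10,
     "outlet_type": "Type II", "tier": "Tier 3", "sales": 9999.0},
]


def sales_per_visibility(rows, outlet_type, tier):
    """Assumed KPI: total sales divided by total visibility, per item type."""
    totals = defaultdict(lambda: [0.0, 0])  # item_type -> [sales, visibility]
    for r in rows:
        if r["outlet_type"] == outlet_type and r["tier"] == tier:
            totals[r["item_type"]][0] += r["sales"]
            totals[r["item_type"]][1] += r["visibility_pct"]
    # Skip item types with zero recorded visibility to avoid dividing by zero.
    return {item: s / v for item, (s, v) in totals.items() if v > 0}


kpi = sales_per_visibility(rows, "Type I", "Tier 1")
for item, value in kpi.items():
    print(f"{item}: {value:.0f} sales per visibility point")
```

In this made-up sample, fruits and vegetables earn far more per point of visibility than snack foods, which mirrors the observation that necessary items sell well even from less visible shelves. Writing the rule down once as a function is what lets the same calculation be rerun for every city and supermarket type.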
Your company should set up its infrastructure with a central database that holds the data for analysis (by your business users and data scientists) and reporting.
This paves the path to a data-driven business: you have a single version of the data that gives the necessary information for your business operations. This data needs to be cleaned and packed into different boxes that can be accessed by different groups.
OK, but where to begin?
First and foremost is C-suite support. This goes without saying.
What could be the potential use case in your industry? In most settings, it is best to start with a business intelligence project rather than a data science project.
After deciding on the use cases, you need to work out the data stack (or data infrastructure), and then decide who will handle the data and its governance within your organization. This potentially answers the questions:
Your data infrastructure decision is very much dependent on the type of data you have (structured/unstructured) and the use cases that you have decided to work on.
A few more things to consider:
The general layout of the BI project is as follows:
Data loading is the process of moving data from source systems such as ERP, CRM, HRMS, and other third-party applications to the data warehouse. Here we have two options for data loading:
So, how do you choose between these options?
If your answer is “yes” to any one of the above, then you need to go with the “make” decision. If your situation differs from those quoted above, write to DataTheta to obtain our opinion.
There are multiple factors you need to consider:
Speed to data consumption: How fast does your team want to consume the data? If you have ample time in hand, then plan for your first data engineer hire. If you are new to the business intelligence space, it is advisable to outsource the data engineering work, as the workload will be less than 3 weeks for the initial projects.
Skill Availability: Skilled data engineers are scarce due to skyrocketing demand in the market. At the time of drafting this article, there are 14,000+ job openings in India, and the count is high in other markets too.
Cost: Hiring a data engineer with 3+ years of experience costs around 80k to 110k EUR per annum in Europe and 110k USD per annum in the US. Moreover, the workload is heaviest in the initial days of the project. Hence outsourcing makes perfect sense in most cases.
How much data will your business aggregate in the next year? The expected data volume is the deciding factor in selecting a database.
There are regular databases and massively parallel processing (MPP) databases.
If your data tables do not exceed 5 to 8 million records, you may opt for a regular SQL database. We recommend PostgreSQL for various reasons. If your business stores more data, then you may consider Snowflake, Redshift, or BigQuery. If you have multiple petabytes of data, you need to consider the Data Lake architecture. Is this term new to you? Read more about Data Lake here.
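The rule of thumb above can be written down as a small Python function. This is only a sketch of the guidance in this article: the 8 million row cut-off is the upper end of the range quoted above, and reading "multiple petabytes" as two petabytes or more is an assumption made for the example. The function name and parameters are hypothetical.

```python
REGULAR_DB_MAX_ROWS = 8_000_000  # upper end of the 5 to 8 million range above
PETABYTE = 10**15                # bytes
DATA_LAKE_MIN_BYTES = 2 * PETABYTE  # assumption: "multiple" means two or more


def suggest_storage(largest_table_rows, total_bytes):
    """Return the storage category suggested by the rule of thumb."""
    if total_bytes >= DATA_LAKE_MIN_BYTES:
        return "data lake architecture"
    if largest_table_rows <= REGULAR_DB_MAX_ROWS:
        return "regular SQL database (e.g. PostgreSQL)"
    return "MPP database (e.g. Snowflake, Redshift, BigQuery)"


# Example: expected volumes for the next year (made-up figures).
print(suggest_storage(3_000_000, 50 * 10**9))
print(suggest_storage(200_000_000, 10**12))
print(suggest_storage(10**10, 5 * PETABYTE))
```

Treat the output as a starting point for discussion rather than a verdict; query patterns, budget, and existing cloud contracts also influence the choice.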
After the data arrives in the central database, it is imperative to break it into clean, small, useful chunks and store them in different tables. This process is called data transformation and aggregation. There is a multitude of tools available to do this job.
DataTheta uses Pentaho for data transformation jobs. Another tool worth mentioning is dbt, an open-source tool that reduces the repetitive work of a data engineer. Automation of CRUD procedures and tracking of table lineage are useful features, in addition to data testing and version control.
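Whatever the tool, the transformation and aggregation step boils down to turning a raw table into a clean, smaller one. The sketch below shows that idea with plain SQL run from Python against an in-memory SQLite database. The raw_sales and monthly_sales tables, their columns, and the sample rows are hypothetical, and this is not how Pentaho or dbt is configured; it only illustrates the kind of work those tools automate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Raw data as it might arrive from a source system (illustrative rows).
conn.execute("CREATE TABLE raw_sales (outlet TEXT, sale_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO raw_sales VALUES (?, ?, ?)",
    [
        ("Outlet A ", "2024-01-05", 120.0),  # trailing space in the name
        ("outlet a", "2024-01-17", 80.0),    # inconsistent letter case
        ("Outlet B", "2024-01-09", 200.0),
        ("Outlet B", "2024-02-02", 50.0),
        ("Outlet B", "2024-02-03", None),    # missing amount
    ],
)

# Transformation: normalise outlet names and drop rows without an amount.
# Aggregation: one row per outlet and month, ready for a dashboard.
conn.execute(
    """
    CREATE TABLE monthly_sales AS
    SELECT UPPER(TRIM(outlet))     AS outlet,
           SUBSTR(sale_date, 1, 7) AS month,
           SUM(amount)             AS total_amount,
           COUNT(*)                AS n_sales
    FROM raw_sales
    WHERE amount IS NOT NULL
    GROUP BY UPPER(TRIM(outlet)), SUBSTR(sale_date, 1, 7)
    """
)

monthly = conn.execute(
    "SELECT outlet, month, total_amount, n_sales "
    "FROM monthly_sales ORDER BY outlet, month"
).fetchall()
for row in monthly:
    print(row)
```

Five messy raw rows become three tidy monthly rows. In a real project, each such table would be a versioned and tested model, which is where the lineage and testing features mentioned above pay off.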
Data visualization is a critical component of a BI project. Mere visual appeal is not the deciding factor. Instead, the following need to be considered:
At DataTheta, we tried Power BI, Tableau, Metabase, and Highcharts.
Metabase is an open-source tool. If you have fewer than 50 dashboard users and they know how to read information from data, then this is for you. If Metabase is hosted on your own server, it is free to use, yet powerful.
If you are a bigger organization with more than 100 end users and require centralized control of the dashboards, then Power BI is the best option. Other tools such as Tableau and Qlik are also worth exploring.
We also tried Highcharts, which comes with a perpetual license model. If you have an internal team to handle BI, it is a low-cost alternative to Power BI and Tableau.
The cloud service provider plays an important role in your business intelligence journey. If you prefer to stick to an on-premise data centre, you need to rethink the decision. Cloud services are useful and efficient in various aspects.
AWS, Azure, and Google Cloud are the market leaders in this space. If you are already utilizing the service of any of these providers, consider building the data stacks with them. You may negotiate a better deal based on your existing relationship. This article has covered the subject comprehensively.
It is important to know how the entire data ecosystem works. More important is deriving business value from data-driven decision-making. Data literacy is an important outcome of these efforts. This can be achieved by:
This is an ever-growing field and the technology evolves fast, so allow the right people to support your data analytics journey.
This article aims to give an overall picture of data analytics stacks. If you are interested in our analytics service and skills-as-a-service, reach out to us. You are welcome to write your comments and queries.
Images Credit: Freepik.com and Excalidraw.com