Let’s talk about mini-marts. When we build out BI, traditionally, we work towards launching the data-mart; a complete and cohesively built star or snowflake schema that allows us to create reporting and self-service areas to launch to a complete business area.
This is a wonderful endeavour, and I need to ensure that everyone knows this is not something I am against at all. What I have found though, in my experience, is that when working with a small team, or starting a BI program, it creates a far-reaching target to get to, before you can show value to your consumers, in this case, your internal or external clients that consume your data from the mart.
I need to also add the caveat right near the top here, that if you and your team are not documenting what you are doing, even with something as simple as a text document that highlights the purpose of every action along the way, you are setting yourselves up for failure. Full transparency within your team, is a must.
To build value rapidly, as we do in AGILE and in DevOps, and to achieve the shortest turn-arounds for our clients, then, we deploy mini-marts. We concentrate on the value that the business users identify as being of the most import to them; the areas that they already use to describe their business. We usually target areas that are being manually compiled by a clerk, team lead, or manager, by collecting data from pre-built queries and then cleansing it through a manual process, usually in Excel.
We identify all the areas and fields from the transactional system, as well as any low-hanging fruit that is adjacent, and we concentrate on building our mini-mart based on this; the customer identified value, and a few extras. Using the 80/20 principle, we begin the process of designing our mart from the front end backward.
Power BI On-premises is our current reporting server of choice in our organization, so we usually begin with a mock up, consisting of a bunch of fake data on a single spreadsheet that feeds the visuals. This is something that can be achieved in half a day at most, and provides our users with something they can see, and interact with to guide the design. This allows them to make rapid choices about what really matters to them, long before we put in any effort in creating the ETL for the initiative.
Once they are happy and have given us the feedback, we have a much more solid design from our users that we can then use to design the mart. This technique is guided through having the mock up, and a conversational meeting that is led by the 7 question technique outlined in the model-storming process from the “Agile Data Warehouse” by Lawrence Corr. This model-storming technique gives us the measures and dimensions and we can design our star schema model, usually in a visual manner, using software from QUEST or just in Visio.
Now we know what needs to be pulled, transformed, and prepared, and the realm building work begins. Using SSIS we will pull the data into our landing zone in the staging instance of SQL Server. This landing zone is a partition specifically designed for us to pull raw data into and, if we need to, add static tables such as translation values, grouping values, or targets that are updated yearly. (Some of which we then go on to automate through SharePoint forms for ease of data entry.)
We use T-SQL, Python and R to affect our stage 2 ETL, and transform and align the base tables for our model. This is the point where we are creating the star schema, and that is located in our clean staging area of the data warehouse. This area also doubles as our future sandbox area, when we are ready to engage with our organization’s citizen data scientists.
This clean staging area contains mostly transformed data, although we do not concentrate on ensuring all the final measures are there, but that our fact tables are well built and designed for the final model. We then pull the tables for our mini mart into a SSAS tabular model, creating the final measures, building all our joins, changing titles to plain language, and ensuring that the mini-mart encompasses the areas we need in the value-add reporting initiative for our business user.
We deploy the mini-mart to our warehouse, connect it up to the mock-up report to display real data, and then launch in alpha stage to a limited user group that allows us to get them involved in the final data validation of all the business logic, built into all three stages of the ETL process. (Next week we will discuss three phase ETL, and why it is my choice for ETL design.)
As I have outlined in the previous post, we now go through the iterative processes and we have created value for our users in a very short amount of time. Additionally, we have created what I would refer to as the embryo of the greater data mart; the seed that will allow us to build outward, in an organic and client-driven way. (Much like a fractal)
When we get the next project, initiative, or request from the same business area, we already have some semblance of an idea of where we are starting. We may already have twenty, forty, or even eighty percent of the data we need for the next project completed in the existing mart. Now we build up on it with this next initiative, adding to our dimension tables, maybe needing to re-create them slightly to conform them to the new project, but it is not ground up again, and we are still building value for our consumers.
This is the basic concept of building the mini-mart, and why I use them for rapid development, and immediate value-add to our users. Instead of spending a year to three building a larger data mart, during the first year we launched 4 primary mini-marts. I then went on to create a second generation of 2 of them, being able to provide over 15 value-adding reporting initiatives to my internal clients. This process allowed our minuscule team to create labour time savings from automating manual compilation and business rules into these mini-marts that exceeded the amount of time we spent building them, hour for hour, by over fifteen percent!