Tuesday, 12 November 2019

Reflections from the Summit


This past week I attended PASS Summit 2019 in Seattle. I had an amazing time, and it was great to catch up with friends and colleagues, both old and new! This was my first time attending the Summit, although I had made some friends along the road to the conference through speaking at SQL Saturdays in Edmonton, AB and Victoria, BC. Reflecting after arriving home, I have some initial thoughts and impressions from the conference that I would like to share with everyone.

Some additional context for this post: my employer paid my conference registration this year, with the caveat that I would cover my own travel and expenses. The registration did not include the pre-cons, which I was unable to find the budget for this year.

Firstly, I would like to commend the organizers, PASS board of directors, Community team, sponsors and attendees for making this past week a memorable and amazing experience. The session content was amazing and valuable, the keynotes were excellent, the food was good, and the camaraderie was steps above anything I have experienced at conferences before. Truly the community is #SQLFamily.

I took a lot of value from this time, in both professional and personal growth. The week served to highlight the areas where I am happy to be excelling, and to shine a spotlight on the areas I need to improve. That, I believe, should be anyone's first goal in attending such an event, and in this I was able to take a great many things away.

For others who are thinking of attending for the first time, or who may have had some struggles their first time around, I would like to share a few things. Number one: find a mentor. I was lucky enough to be close to Chris Wood here in Edmonton, who has attended Summit for 14 years, in addition to attending Tech Outbound and SQL Cruise. His knowledge and wisdom were invaluable in helping me take in all that was on offer, which is, honestly, a lot. Secondly, the sessions will be there, and they will continue; you will never be able to attend all the ones you want to, because there is too much concurrency. Plan to purchase the downloads or the USB, and focus on the areas where you have questions that could be answered either in the session or one-to-one with the speaker afterward.

Lastly, do not neglect the "hallway track". The opportunity to swap ideas, share challenges and solutions, and pick the brains of the top minds in our industry is the primary resource this conference has on offer, and the community is genuinely interested in helping each other out.

Your perspective will change; expect that. Your perspective on your work will shift as new ideas and concepts find a fit in what you are doing, and new technology, announced releases, and solution brainstorms will provide insight to improve your work. You will also grow personally. I did, at least. Knowledge of your strengths and weaknesses, and tracks to address both, will provide context to your value as a professional. This is the opportunity to help direct your growth for the next year.

I now understand more about the areas where I struggle (emotional intelligence), and also where I have improved in this area over the last couple of years. I was able to connect with others who struggle as well, and with people who are masters of this in their field and work. I got guidance on new resources to explore and found new colleagues who can work on this with me over the next year.

I will conclude by saying that every penny spent was worth it, and I will be working hard to ensure I can come back next year, when Summit will be hosted in Houston!

Cheers!





Saturday, 20 July 2019

BI Development and First Principles Thinking


Good Day to all! 

Recently I have heard more and more talk in the business world of the concept of first principles thinking. Elon Musk has been quite vocal about it, and as his star continues to rise and people take more notice of the methods to his madness, it has become something I hear around the water cooler, so to speak.



This is not a new concept; in fact, it is one of the oldest in philosophical thought (thank you to my lovely wife for helping me get a grasp on the history of this). In classical circles it has more commonly been referred to as the Aristotelian process, standing in stark contrast to Socratic thought, which has existed for around the same amount of time.



Although Socratic thought is a common element in clinical psychology, and Socratic methods are a leading component of academia, there is something truly wonderful about Aristotelian methods, and there are advantages in using first principle methods in developing and delivering business intelligence.



With that minor digression complete, let us take a moment to talk about how we apply what we know and what we have learned, and how it is we measure, define, predict, and otherwise provide intelligence for the business.



When developing our mini-marts, the business should primarily lead the Minimum Viable Products we deliver to business users, defining the scope, scale, and content of the intelligence we provide. What happens, then, when we have to build a proof of concept before that guidance exists? Do we just throw random numbers up on a dashboard and hope we are on the right track? I would argue this is where we apply some element of first-principles thinking.



Every single business in history, and in existence today, is nearly entirely focussed on one primary goal: make money. That is it. Some people want to change the world (looking at you, Mr. Musk), and some run businesses designed solely to break even. They all, though, share the commonality that money comes in, money goes out, and the goal is to have more of the former than the latter. This means that no matter how you look at things within any business, whether it is public sector, private sector, healthcare, education, sales, manufacturing, shipping, retail, service, hospitality, or any other, we can make the very safe assumption that the movement and management of money will be important. If it is a department within a larger organization, the same remains true; money will be the ultimate tip of the balance in the decision-making process.



As we are in the business of decision support, that means we need to look at the money, ask the questions about the money, and find out how our users are interacting with their money. Ultimately, it is very hard to go wrong with delivering a proof of concept that looks at gross revenue and breaks it down by department, division, location, or what have you.
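As a minimal, hypothetical sketch of what that first proof-of-concept cut might look like in T-SQL, assuming a flat stage-one extract called stg.SalesExtract with made-up column names:

-- Hypothetical stage-one extract; table and column names are illustrative only.
SELECT
    Department,
    Location,
    SUM(GrossRevenue)             AS TotalGrossRevenue,
    COUNT(DISTINCT InvoiceNumber) AS InvoiceCount
FROM stg.SalesExtract
WHERE InvoiceDate >= DATEADD(MONTH, -12, CAST(GETDATE() AS date))
GROUP BY Department, Location
ORDER BY TotalGrossRevenue DESC;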



Money doesn’t materialize from thin air, though (unless you are in financial management); however, the adage that time is money has a lot of truth to it. Most, if not all, businesses rely on trading time as a resource toward making money: the time that people invest in producing, manufacturing, selling, shipping, delivering services, or just waiting (looking at financial management again). This means we can, again, make a very safe assumption that the measurement of time is going to be important to the business, regardless of what business it is. I am not referring to the dimensional split of time into reporting periods, but rather the measurement, or fact, of time passing. This can be cycle times, time to complete, manufacturing time, time to fill positions, training time, transition times, client contact delays, constituent engagement times, and so on. Regardless of the exact context, time will be an important factor in determining the success of the business, and as such will be something important to deliver in reporting.
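Against the same hypothetical extract, a time measure can be as simple as a cycle-time calculation; the date columns here are assumptions, not a prescription:

-- Average and worst-case cycle time from order to fulfilment, per department.
SELECT
    Department,
    AVG(DATEDIFF(HOUR, OrderDate, FulfilledDate)) AS AvgCycleTimeHours,
    MAX(DATEDIFF(HOUR, OrderDate, FulfilledDate)) AS MaxCycleTimeHours
FROM stg.SalesExtract
WHERE FulfilledDate IS NOT NULL
GROUP BY Department;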



Time, though, is also the bridge between what the business does and what the business makes in revenue; this is why the last thing we need to focus on in this triad of business intelligence delivery is volume. This doesn’t mean the relative audio level of the business (looking at Matthew), but rather the amount of something the business does. This is easy in sales, manufacturing, and retail, but can get a bit muddy in more service-oriented businesses and departments. Let me assure you, the volume, or the number of times a service is completed, delivered, fulfilled, or otherwise, remains a measurable and important facet in these businesses too. Find the thing that the business does, which I would hope you have a firm grasp on if you are working within it, and describe the volume of that thing: the number of interactions with customers, the number of sales, the number of widgets produced and shipped. There will always be an easy win in delivering a measure of the volume of business the business does.



Everything else in developing KPIs, metrics, measures, decision supports, and all manner of analytics will ultimately deliver a measure of either money, time, or volume, or more likely a combination of these things. If you are new to interacting with a business, a department, or an industry, just remember: a business can be as complex and multi-faceted as the people who make it up, but it will only ever really measure these three things. Find them in your data, earmark them, understand them, and everything past that becomes considerably easier in the long run.
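Pulling the triad together, one summary query against the same hypothetical extract covers money, time, and volume in a single pass:

-- Money, time, and volume side by side; one row per department.
SELECT
    Department,
    SUM(GrossRevenue)                             AS Money_GrossRevenue,
    AVG(DATEDIFF(HOUR, OrderDate, FulfilledDate)) AS Time_AvgCycleHours,
    COUNT(*)                                      AS Volume_OrderLines
FROM stg.SalesExtract
GROUP BY Department;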

Cheers!
SQLDoch

Sunday, 7 July 2019

Mini-Marts Redux! Director's Cut.


In my previous post, I outlined how to utilize mini-marts to deliver value to the organization while also building towards a robust and sustainable framework on which to build additional value in the future. I want to come back and revisit this in slightly more depth, as I have had a chance to employ these techniques in a couple of different positions over the last couple of years, and there have been some definite opportunities for learning along the way.

First off, let’s revisit the overall outline, as I have found ways to engage even earlier with our business users and allow them to outline, or outright define, the solution with just some gentle guidance along the way.

First up, we engage in reconnaissance with the business unit. This is a pre-engagement phase where we speak with individual users, managers, and others from within, and informally associated with, our target department. We attempt to gain some insight into the pain points we may be able to address. In most cases, this will be something everyone has become aware of, but that no one yet knows how to solve or has the time to devote to solving.

From this recon phase, we can build out a wireframe: a basic representation of the capabilities of our BI solution within the Microsoft BI stack. We will generally find that the department in question has more than one Excel report they are compiling manually from some system extract, either scheduled or ad hoc. This sort of basic Excel report can form an excellent foundation for building a wireframe.

Once the wireframe is complete, we can begin the engagement with the target department. The wireframe acts as a central talking point for the discussion and helps solidify, in our users' minds, the direction we can go. Most of the time, the users will not be aware, as you and I are, of how much our capabilities in delivering business intelligence have changed over the last decade. The advances in platform, capability, and delivery are our purview, and as such we should be guiding our users towards seeing the best value on the investment in ourselves and in the data platform.

Once we have a go-ahead, we can begin work. We take the ideas from our users and the wireframe discussions and develop a proof of concept. Consider this a first draft of the MVP. The proof of concept can be built from a stage-one extract; we are not aiming for performance, translated values, or complete usability. The idea with the proof is that it can be developed in an incredibly short turnaround time. We build the source extract, move data into our warehouse in a flat format, and build a basic concept dashboard or BI product from this flat extract (a sketch of such a load follows below). Functions will be there, and some basic layout, but we know this will be the first draft only.
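For illustration only, here is a minimal sketch of what that stage-one load might look like, assuming the extract has already been landed by SSIS into a landing schema; every object and column name here is hypothetical:

-- Rebuild the flat staging copy that feeds the proof-of-concept dashboard.
IF OBJECT_ID(N'stg.DeptExtract_Flat', N'U') IS NOT NULL
    DROP TABLE stg.DeptExtract_Flat;

SELECT
    src.OrderNumber,
    src.OrderDate,
    src.CustomerName,
    src.ProductCode,
    src.Quantity,
    src.LineAmount,
    SYSDATETIME() AS LoadedAt          -- simple load-audit column
INTO stg.DeptExtract_Flat
FROM landing.OrderExport AS src;       -- table landed by the SSIS extract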

With the proof of concept in hand, we re-engage with our users, forming the core of the focus group for the ongoing design and development of the dashboard product. We give access to a limited number of people in the user group and allow them to interact with the product; their only instruction is to provide as much feedback as possible. We begin pre-validation of our source extract at this stage, and we work with our users to enhance and clarify the design vision. This is the first half of the first cycle. From here, the real work of developing a mini-mart and a BI product for the users begins.

With fresh feedback in hand, relevant to the product and based on real usage, we begin the work of putting together the mini-mart. Stage-two ETL is designed and completed, and deployed in a segmented development zone. This isolated zone in our warehouse development environment is the core of our new mini-mart. It may include elements or even dimensions from other areas we have previously developed, or it may stand in isolation, depending on the context of the business.
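A sketch of what that segmented zone could look like, using a dedicated schema as the isolation boundary; the schema, table, and column names are assumptions for illustration, not a standard:

-- An isolated schema acts as the development zone for the new mini-mart.
CREATE SCHEMA mart_dept_dev AUTHORIZATION dbo;
GO

CREATE TABLE mart_dept_dev.DimProduct
(
    ProductKey  int IDENTITY(1,1) PRIMARY KEY,
    ProductCode varchar(20)   NOT NULL,
    ProductName nvarchar(100) NOT NULL
);

CREATE TABLE mart_dept_dev.FactOrders
(
    OrderKey   bigint IDENTITY(1,1) PRIMARY KEY,
    ProductKey int           NOT NULL
        REFERENCES mart_dept_dev.DimProduct (ProductKey),
    OrderDate  date          NOT NULL,
    Quantity   int           NOT NULL,
    LineAmount decimal(18,2) NOT NULL
);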

We can build our MVP once we have completed stage-two ETL, or we can dive right into the final stage-three ETL build, which should be relatively rapid if we have been building for future scope all along (more on scoping in future articles). Ideally, with stage-three ETL complete and validation feedback in hand from our focus group working with the proof-of-concept dashboard, all the pieces fall into place to develop the MVP with as little pain as possible. The wireframe-to-proof-to-MVP cycle is complete.

With the MVP launched, we move then onto the full sized focus group, engaging further with our users, showing the fruits of their labours and feedback, and showing them a product they feel not only happy to use, but a part of creating. As has been discussed previously, we are always aiming above buy-in on our projects, and the goal is to have the business users feel ownership of the product. This ownership comes from early and ongoing engagement, and from feeling that they have been heard and understood through every phase of development.

The addition of the proof-of-concept phase over the last year has been a great boon in my work developing BI value rapidly for my organizations, and in increasing uptake and ownership of the BI products within the business group. Ultimately, if I have developed something they will use, and something that makes consuming and working with data easier and less painful for the business, I have achieved some measure of success in my role.

Until next time, Cheers!



SQLDoch

Sunday, 23 June 2019

Graph Databases, It's not what, but how.


It has been too long. Way too long. There have been some big changes in my life of late, not the least of which is the continued growth of my miniature DBA at home. In February of 2019, our condo building had a major flood in the mechanical room, and we have been under evacuation since then; we are currently expected to return in mid-July, and my prayers go out that this holds. I have also taken a new position, back in the private sector, and am working diligently to bring value to my new company. We have achieved some amazing short-term success, and there will be updates on how we accomplished this in future installments discussing the Agile BI methodology I have developed.

In recent months I have mentioned or brought up the idea of using graph databases within SQL Server to a fair number of people in different contexts. Whether at a local PASS event, in the workplace, or over lunch with colleagues, invariably the first question posed is “What are graph databases?”

Outside of a few people who are much more steeped in advanced combinatorial maths than I am, there are very few times where it matters what a graph database is. How many of us took the time to truly examine the relational maths before diving in and understanding relational databases?

The better question, then, is “What can a graph database be used for?”. The answer to that is what I want to explore in this and future blog posts.

If you have datasets that use multiple key reference tables to manage many-to-many relationships between tables; if you have recursive relationships that need to be mapped within an application or for reporting; if you need to manage retrieval of granular data across large arrays of related tables in an efficient way; if you want to add attributes to the relationships between tables themselves: these are some of the reasons and use cases for employing graph databases within a newer SQL environment (SQL Server 2017+ / Azure SQL).
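To make the syntax concrete, here is a minimal sketch of node and edge tables in SQL Server 2017+; the tables and names are purely illustrative:

-- Two node tables and an edge table; the edge can carry its own attributes.
CREATE TABLE dbo.Person
(
    PersonId int IDENTITY(1,1) PRIMARY KEY,
    FullName nvarchar(100) NOT NULL
) AS NODE;

CREATE TABLE dbo.Company
(
    CompanyId   int IDENTITY(1,1) PRIMARY KEY,
    CompanyName nvarchar(100) NOT NULL
) AS NODE;

CREATE TABLE dbo.WorksFor
(
    StartDate date NULL    -- an attribute on the relationship itself
) AS EDGE;
GO

-- Traverse the relationship with MATCH instead of joining a junction table.
SELECT p.FullName, c.CompanyName
FROM dbo.Person AS p, dbo.WorksFor AS w, dbo.Company AS c
WHERE MATCH(p-(w)->c);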

There are a growing number of applications for complex many-to-many relationships within the scope of database operations. The most common example brought to the forefront is managing social networks, but this is not my favourite example at all. Very few of us support operations where social networking is a key component of the business, which begs the question of what other examples we have.

The most common example I have encountered of many-to-many relationships within regular business operations is in the realm of inventory management. In the case of a company that sells a product, there will likely exist a many-to-many relationship between orders and products. The vast majority of the time this is accommodated through a junction table of some sort, where a client table is joined to an order table, which is linked to the product table as a reference; alternatively, the orders are linked to the product table in a denormalized fashion, causing rapid growth of the orders table to the point where it becomes unable to scale past a certain growth target without additional investment in infrastructure.

To avoid the exponential growth of our database, and to preserve the ability of our applications to manage retrieval of order data, we can utilize a graph database, where our clients, orders, and products exist as nodes, and the association of products to an order and orders to a client is managed through edge tables. The edge tables can be large, but because the engine only uses these edges to manage relationships between the orders and products, we see marked increases in performance for retrieving individual order files or aggregates up to the client level. In this case, we are managing the many-to-many of orders to products with a graph relationship rather than a relational one.
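As a hedged sketch of that order scenario, with clients, orders, and products as nodes and the associations carried on edge tables (all names are hypothetical):

-- Nodes for the core entities.
CREATE TABLE dbo.Client  (ClientId  int PRIMARY KEY, ClientName  nvarchar(100)) AS NODE;
CREATE TABLE dbo.[Order] (OrderId   int PRIMARY KEY, OrderDate   date)          AS NODE;
CREATE TABLE dbo.Product (ProductId int PRIMARY KEY, ProductName nvarchar(100)) AS NODE;

-- Edges replace the junction tables; the order-to-product edge carries line detail.
CREATE TABLE dbo.Placed AS EDGE;                                             -- client -> order
CREATE TABLE dbo.OrderLine (Quantity int, LineAmount decimal(18,2)) AS EDGE; -- order -> product
GO

-- Every product on every order for one client, resolved through the edges.
SELECT c.ClientName, o.OrderId, p.ProductName, ol.Quantity
FROM dbo.Client AS c, dbo.Placed AS pl, dbo.[Order] AS o,
     dbo.OrderLine AS ol, dbo.Product AS p
WHERE MATCH(c-(pl)->o-(ol)->p)
  AND c.ClientName = N'Contoso Ltd.';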

Expanding upon our use of databases to support applications, recursive tables, and especially recursive many-to-many or recursive one-to-many relationships, can be difficult to manage within the scope of a relational database without creating many denormalized or recursive relationships. The example that comes to mind is managing client hierarchies within a growing, complex corporate landscape.

If your clientele includes larger corporate entities with multiple subsidiaries or affiliated companies that make up additional clientele, it sometimes becomes necessary to manage the hierarchy of clients both for the application's display of data and for reporting and business intelligence. When you begin to interweave subsidiaries, branches, or other such client entries within the same table as higher-level client entities, it becomes a mix of entities that must be managed for billing, marketing, or other business uses.

To manage this in a relational database, we would look at creating a denormalized clientele table and attempt to manage it with hierarchical ID management, or we would use one or more junction tables to manage the recursion within the single table; either way, we add overhead to our table and to our queries against it.

Utilizing a node table for the primary client table, and managing the hierarchical relationships within that same table with a recursive edge table, allows us to manage the relationships between our clients while providing a single, normalized client or bill-to table.
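A minimal sketch of that pattern, with a single client node table and a recursive edge pointing back into it (names are illustrative only):

-- One node table for all client entities, at every level of the hierarchy.
CREATE TABLE dbo.ClientEntity
(
    ClientId   int PRIMARY KEY,
    ClientName nvarchar(100) NOT NULL,
    IsBillTo   bit NOT NULL DEFAULT (0)
) AS NODE;

-- A recursive edge: both ends point at dbo.ClientEntity.
CREATE TABLE dbo.SubsidiaryOf AS EDGE;
GO

-- Direct parent of each subsidiary, with no denormalized hierarchy columns.
SELECT child.ClientName  AS Subsidiary,
       parent.ClientName AS ParentClient
FROM dbo.ClientEntity AS child,
     dbo.SubsidiaryOf  AS s,
     dbo.ClientEntity  AS parent
WHERE MATCH(child-(s)->parent);

For walking the full hierarchy in one query, SQL Server 2019 adds SHORTEST_PATH to MATCH; on 2017 you would pair a query like this with a recursive CTE.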

Stay tuned for the next installment, which should not be eight months in the making, where we will explore other use cases for Graph databases, and then dive into them a bit more.

Cheers!






Monday, 15 October 2018

Data is a Secure Asset, so Secure It.


Good day! This will be the last post in this series before I pivot to speaking on graph databases in SQL Server for a few posts. We will certainly return to the topic of data management in the future, though, so stay tuned. On a personal note, I was recently let go, without reason or notice, from my position, so as of today I am contemplating where I want to go next in my career; I can assure you that it will be fun and exciting! I pray for success for my former employers, and that with time they cultivate wisdom and peace in their endeavors.


On to the matter at hand: data security. Securing our data is a matter of utmost importance. Every day, it seems, we hear about a recent data breach, and over a few months we find out the reasons for it. We always shake our heads and ask how any company could let their data be so vulnerable, and then we look at where we are and ignore the evidence of the same sort of vulnerabilities. I have been a staunch advocate for data security throughout my career, so perhaps I can share a bit of what I know on the subject and help you become the same.


Firstly, let us talk about the reasons. There is, of course, the obvious reason that we want to protect our data: it is an asset to our company. That being said, we can also look at a larger picture of data security. We have an obligation, in most cases, under the laws of the land to keep our data safe and secure, whether through GDPR, FOIP, HIA, HIPAA, or some other data protection act passed in our jurisdiction. We also need the peace of mind that we are doing everything we possibly can to ensure our data is safe for our clients, and to pass that assurance along to them.


I’ve said it in a previous post, but it bears repeating: data is an asset, and should be given the same organizational gravitas as cash. It is not an afterthought, and not something that should be left lying around. It also should be treated with the same diligence for chain of custody as cash and, luckily, we live in an era where the tools are available to us to do so.


Data security, like network security, is a matter of layers. A wise man once told me that obfuscation is no security, so we will step aside from that as a means of securing our data. We cannot just hide our databases and hope they are not found by bad actors; we must fortify them with as many layers of security as possible. For those of you coming from a background in network security and the OSI model, the majority of this will be dealing with layers 5 through 7, as we are going to assume that network security is doing their job well in layers 1 through 4.



Securing your data in layers involves a fairly simple stepped list:



1.    Know your zone!

2.    Understand your security!

3.    Use the tools!

4.    Avoid defaults!



Seems simple, so let’s dive into the details on these.



Knowing your zone involves being aware of your organization's architecture, network security, and data needs. Your data should always be housed exclusively in the most secure zones of the network, usually layer 5. Find out what your applications and data-consuming structures need access to. Vertically align three-tier applications and avoid replication where possible. If you are housing data in layer 6, and it is mission-critical or personally identifiable information, you should address that and find ways to move that data to a more secure location. If you need layer 7 applications to use that data, find a way to proxy it securely. With the release of SQL Server 2017 we saw some new and interesting ways to proxy data without replication, which is the ideal way to manage these vertically aligned applications.

Understand your security! Don't just know the layers of network security; find out what you are doing with your instances and databases in your zone. Dig in and find out what solutions are being used to address data at rest and data in motion. Find out how many linked servers are in place, and why. Do what you can to address the biggest vulnerability of any data structure: surface area. Reduce it as much as possible. Think about what would happen if a bad actor got into a specific database. What could they do from there? What data could they access? Could they redirect, copy, export, or otherwise take data that would affect the organization? The customers? The staff?

Within SQL Server there exist many tools to secure our data, both at rest and in motion. If we have a database or instance that isn't using a database master key, we have immediately failed at securing our data. This fundamental step is not optional and should be completed on all databases, including master, before any data goes into them. As a side note, I prefer 21-character generated passwords from an enterprise password manager for this, and I do not recommend anything less than 14 characters. Ensure that you have a valid and current certificate for your instance connections, and that those connections are encrypted. If you get a warning in SSMS, VS, or ADS about an insecure connection to a database, address it immediately. Ensure you have implemented Transparent Data Encryption (TDE) on your databases, so the data files remain encrypted at rest. Use Always Encrypted where possible, to ensure that your data in motion is encrypted, and make sure things are keyed differently all the way down each layer. Encrypt your file backups of certificates and keys from SQL Server with a separate password, which I have begun to refer to as a Back-Up Key (BUK), after the infamous Buck Woody.
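As a hedged sketch of those basics: the passwords, file paths, and database name below are placeholders only, and in practice the passwords come out of the enterprise password manager.

-- Instance level: master key and a certificate to protect database encryption keys.
USE master;
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<21+ character generated password>';
CREATE CERTIFICATE TDECert WITH SUBJECT = 'TDE certificate';
GO

-- Database level: enable TDE on a hypothetical user database.
USE SalesDB;
CREATE DATABASE ENCRYPTION KEY
    WITH ALGORITHM = AES_256
    ENCRYPTION BY SERVER CERTIFICATE TDECert;
ALTER DATABASE SalesDB SET ENCRYPTION ON;
GO

-- Back up the certificate and key with a separate password (the "BUK"),
-- and store the files somewhere other than the data drive.
USE master;
BACKUP CERTIFICATE TDECert
    TO FILE = 'D:\SecureBackups\TDECert.cer'
    WITH PRIVATE KEY (
        FILE = 'D:\SecureBackups\TDECert.pvk',
        ENCRYPTION BY PASSWORD = '<separate backup key password>'
    );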

Avoid defaults like the plague. Every single person in the world who has ever touched SQL Server in any way knows about the SA account. Disable it, use your enterprise domain controller, and use Windows Authentication wherever possible. Everyone knows the default schema of [dbo]; do not use it where possible. Replace the dbo schema with one of your own and secure that schema. Always make sure your logins are granted access to the schema where needed, and remove that access when it is not. Remove any BUILTIN\ domain accounts so that if your network security is breached, your databases remain secured. Utilize separate service accounts for file systems; I prefer gMSA accounts, but the method is up to you. And always, always encrypt the instance, the databases, and the data movement.
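A sketch of a few of those default-removal steps; the schema, role, and domain group names are examples only, and the script assumes a Windows login already exists for the group:

-- Server level: disable sa and remove the built-in Administrators login if present.
ALTER LOGIN sa DISABLE;

IF EXISTS (SELECT 1 FROM sys.server_principals WHERE name = N'BUILTIN\Administrators')
    DROP LOGIN [BUILTIN\Administrators];
GO

-- Database level: use an application schema instead of dbo, and grant access
-- through a role mapped to a domain group rather than individual logins.
CREATE SCHEMA app AUTHORIZATION dbo;
GO
CREATE USER [CONTOSO\BI_Readers] FOR LOGIN [CONTOSO\BI_Readers];  -- hypothetical AD group
CREATE ROLE app_readers;
ALTER ROLE app_readers ADD MEMBER [CONTOSO\BI_Readers];
GRANT SELECT ON SCHEMA::app TO app_readers;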

As a final note, I do not like linked servers. I make no bones about that and will happily argue the reasons until the cows come home. Do not use them. Manage data movement through SSIS, which allows encrypted data movement and avoids increasing the surface area of your enterprise data. I cannot say it with more emphasis: do not use linked servers in an enterprise.

Tuesday, 9 October 2018

Data Driven means Data is an Asset (Pt. 2)


To continue my streak of slightly-ranting, slightly-advice columns on how to run a Business Intelligence program, today I will touch on data as an asset. This is going to be a bit more on the ranting side, as I have a bone to pick with people who put the term 'data-driven' into their mission statement or board packages, and then turn around and do nothing to manage, care for, foster, or otherwise tend to their data as an asset. Too many times I have seen this term thrown around within organizations that treat data as a by-product, with all the respect and dignity they would give industrial waste.

Data is an asset. This means it holds value to an organization and, like all assets, it depreciates over time. The nice thing about data is that the business generally generates it just through daily operations. Monetary assets, which the company earns through sales, services, or other income streams, are more difficult to come by. This ease of generation tends to cause many organizations to overlook proper care of the asset.

Proper caring for this asset, maintenance or upkeep if you will, starts from the bottom with a strong core of expertise in the organization. These are the people who know and understand data, and are passionate about keeping it, using it, monetizing it, and maintaining it. We layer onto that a strong foundation of technical infrastructure, and then ice the whole thing with data management processes into a delicious cake of data assets for the organization. (Is anyone else hungry?)

The last four companies I have been involved with professionally, including my current one, have all stated without reservation during the interview process that they were 'data-driven' companies. None of those four companies could tell me, during the interview, who their data stewards were. One was able to tell me who their lead DBA was (because he was on the interview panel), and two were able to accurately tell me what type of data servers they primarily used as an organization. (One was Oracle, the other Microsoft.)

Let’s start with the people. I do not believe it unreasonable to have certain expectations of a person who makes their career as a DBA. I have done the job before, and I take what I do seriously. I expect that there are good working processes in place, good documentation of those processes, and that at least half of the business practices follow industry best practice. (I learned young that expecting all of them to is not realistic.) And finally, I expect there to be solid documentation on all data assets.

When it comes to the infrastructure, I have higher expectations. I expect solid HA infrastructure for primary assets, with backups offsite, that run on appropriate schedules. I expect good disk capacity and planning, and I expect that all servers are set up to best practice guidelines running current software, and up to date security patches. I expect there to be scheduled maintenance, a solid DR plan, strong security, and a sustainable retention policy in place and enacted with procedures.

Finally, the data management processes. The business of data. I have expectations that any organization that calls itself ‘data-driven’ has a solid data management policy for the enterprise, first and foremost. There should be stewardship and ownership baked into that policy and practiced in the organization, as a whole. I expect that there should be mastered data for the primary KPIs of the organization, there should be domain controls and audit processes set up for data entry, and there should be processes and procedures for all of these things. I expect the organization to treat data generation as seriously as data security, which they enforce and control with the same vigor.

I don’t see these things. I have seen parts working in some organizations, and other parts working in others. This is going to change. Over the last decade, data has become the buzzword: the idea that keeps boardrooms buzzing but doesn't mean anything. Over the next decade, it will become a lynchpin. Any company that isn't currently on track to engage with, and treat, its data like an actual asset will suffer for it. They will be left in the dust by the companies that do.

There are exceptions. There are industries as a whole that lag behind in this and will always be out of date. I don't see that as an excuse, but rather as a challenge. As such, I will continue to fight for treating data as an asset: it has value, and it should be protected.


