Sunday 23 June 2019

Graph Databases, It's not what, but how.

It has been too long. Way too long. There have been some big changes in my life of late, not the least of which is the continued growth of my miniature DBA at home. In February of 2019, our condo building had a major flood in the mechanical room, and we have been under evacuation since then. We are currently expected to return in mid-July, and my prayers go out that this holds. Also, I have taken a new position, back in the private sector, and am working diligently to bring value to my new company. We have achieved some amazing short-term success, and there will be updates on how we accomplished this in future installments on the Agile BI Methodology I have developed.

In recent months I have brought up the idea of using graph databases within SQL Server to a fair number of people in different contexts. Whether it be at a local PASS event, at the workplace, or meeting with colleagues over lunch, invariably the first question posed is “What are graph databases?”

Outside of a few people who are much more steeped in advanced combinatorial maths than I am, there are very few times where it matters what a graph database is. How many of us have taken the time to truly examine relational maths before diving in and understanding relational databases?

The better question, then, is “What can a graph database be used for?”. The answer to that is what I want to explore in this and future blog posts.

Graph databases are worth considering if you have datasets that use multiple key reference tables to manage many-to-many relationships between tables; if you have recursive relationships that must be mapped within an application or for reporting; if you need to retrieve granular data efficiently across large arrays of related tables; or if you want to add attributes to the relationships between tables themselves. These are some of the use cases for employing graph databases within a newer SQL environment (SQL Server 2017 and later, or Azure SQL).

There are a growing number of applications for complex many-to-many relationships within the scope of database operations. The most common example brought to the forefront is managing social networks, but this is not my favourite example at all. Very few of us support operations where social networking is a key component of the business, which raises the question of what other examples we have.

The most common example I have encountered of many-to-many relationships within regular business operations is in the realm of inventory management. In the case of a company that sells a product, there will likely exist many-to-many relationships between orders and products. The vast majority of the time this is accommodated through a junction table of some sort, where a client table is joined to an order table, which is linked to the product table as a reference. Alternatively, the order table is linked to the product table in a denormalized fashion, causing rapid growth of the orders table, to the point where it cannot scale past a certain growth target without additional investment in infrastructure.

To avoid this rapid growth of our database, and to preserve our applications' ability to retrieve order data, we can utilize a graph database, where our clients, orders and products exist as nodes, and our associations of products to an order and of orders to a client are managed through the edge tables of the graph. The edge tables can be large, but as the engine only uses these edges to manage the relationships between orders and products, we can see marked increases in performance when retrieving individual orders or aggregates up to the client level. In this case, we are managing the many-to-many relationship of orders to products with a graph relationship, rather than a relational one.
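To make the node and edge idea concrete, here is a minimal conceptual sketch in plain Python. It is not SQL Server's graph syntax, and every name, ID and quantity in it is a hypothetical made up for illustration. It only shows the shape of the model described above: one collection per node type, and edge collections that link a "from" node to a "to" node and can carry their own attributes (here, a quantity on the order-to-product edge).

```python
from collections import defaultdict

# Node "tables": one dict per entity type, keyed by a hypothetical ID.
clients = {"C1": {"name": "Client A"}}
orders = {"O1": {"date": "2019-06-01"}, "O2": {"date": "2019-06-15"}}
products = {"P1": {"name": "Widget"}, "P2": {"name": "Gadget"}}

# Edge "tables": each row is (from_id, to_id, attributes).
placed = [("C1", "O1", {}), ("C1", "O2", {})]       # client -> order
contains = [                                         # order -> product
    ("O1", "P1", {"quantity": 2}),
    ("O1", "P2", {"quantity": 1}),
    ("O2", "P1", {"quantity": 5}),
]


def outgoing(edges, from_id):
    """Return (to_id, attributes) for every edge leaving from_id."""
    return [(to_id, attrs) for src, to_id, attrs in edges if src == from_id]


def order_lines(order_id):
    """List (product name, quantity) for a single order."""
    return [
        (products[pid]["name"], attrs["quantity"])
        for pid, attrs in outgoing(contains, order_id)
    ]


def client_totals(client_id):
    """Aggregate quantity per product across all of a client's orders."""
    totals = defaultdict(int)
    for order_id, _ in outgoing(placed, client_id):
        for name, qty in order_lines(order_id):
            totals[name] += qty
    return dict(totals)


print(order_lines("O1"))
print(client_totals("C1"))
```

Notice that the quantity lives on the edge, not on the order or the product. That is the "attributes on the relationship" idea mentioned earlier, and it is the part a plain key-only junction table does not give you.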

Expanding upon our use of databases to support applications, recursive tables, and especially recursive many-to-many or recursive one-to-many relationships, can be difficult to manage within the scope of a relational database without creating many denormalized or recursive relationships. The example that comes to mind is managing client hierarchies within a growing, complex corporate landscape.

If your clientele consists of larger corporate entities with multiple subsidiaries or affiliated companies that are themselves clients, it sometimes becomes necessary to manage the hierarchy of clients both for the application's display of data and for reporting and business intelligence. When you begin to interweave subsidiaries, branches, or other such client entries within the same table as higher-level client entities, it becomes a mix of entities that must be managed for billing, marketing, or other business uses.

In order to manage this in a relational database, we would look at creating a denormalized clientele table and attempt to manage it with hierarchical ID management, or we would use single or multiple junction tables to manage the recursion within the single table. Either way, we add overhead to our table and to our queries against it.

Utilizing a node table for the primary client table, and managing the hierarchical relationships within that same table through a recursive edge table, allows us to manage the relationships between our clients while providing a single, normalized client or bill-to table.
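As a rough illustration of that recursive edge, the following Python sketch keeps all clients in one collection and stores the parent-to-child links separately, with both ends of each edge pointing back into the same client collection. It is a conceptual model only, not SQL Server syntax, and the client names, IDs and the "relationship" attribute are all hypothetical.

```python
# Single client node "table"; names and IDs are hypothetical.
clients = {
    1: "Parent Corp",
    2: "Subsidiary East",
    3: "Subsidiary West",
    4: "East Branch Office",
}

# Recursive edge "table": (parent_id, child_id, attributes).
# Both ends of every edge refer back to the same clients collection.
parent_of = [
    (1, 2, {"relationship": "subsidiary"}),
    (1, 3, {"relationship": "subsidiary"}),
    (2, 4, {"relationship": "branch"}),
]


def children(client_id):
    """Direct children of a client, in edge order."""
    return [child for parent, child, _ in parent_of if parent == client_id]


def descendants(client_id):
    """Walk the edges depth-first; return (depth, name) for every client below client_id."""
    result = []
    seen = set()  # guards against accidental cycles in the edge data
    stack = [(child, 1) for child in reversed(children(client_id))]
    while stack:
        node, depth = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        result.append((depth, clients[node]))
        stack.extend((c, depth + 1) for c in reversed(children(node)))
    return result


for depth, name in descendants(1):
    print("  " * depth + name)
```

The client collection itself never changes shape as the hierarchy deepens; only rows in the edge collection are added, which is the point of keeping the hierarchy out of the client table.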

Stay tuned for the next installment, which should not be eight months in the making, where we will explore other use cases for Graph databases, and then dive into them a bit more.

