SQL Doch: September 2018

This week has been an interesting one for me, and as such, it has led me to my topic this week. I will be circling back again to some more info on Agile BI, and specifically on the service-oriented methods. This week though, we are talking about leading people and inspiring loyalty. Early in the week, I received a very interesting offer to leave the post-secondary institution I have been working at for the last year and move into a consulting firm. I spent the week agonizing and weighing pros and cons of both. Eventually, I decided to stay where I am and continue building up enterprise BI, following the structure of a six-year roadmap I have been working from.

The primary motivation for staying where I am: in a word, loyalty. There are fantastic people I work with who are driven and passionate about what we are building. But the primary thing that made the decision easier for me, was not the loyalty I have, but rather the loyalty I was shown. Both of my colleagues in data management asked me, on a personal level, to stay. Both used the same words to describe what their reasons were for wanting me to stay, and that struck a nerve with me. I thought about what I was giving to my team, and what that looks like to others.

To put this in some perspective, we have a mixed matrix structure at the office, and I do not sit at the top. I do not hold a position as a manager, or even a team lead. I am a senior BI analyst and run backup as DBA for liability coverage. Having such a small team to work with (there are three of us) though, has given us plenty of opportunity to all work together and help each other out. I come from a background of leading people, and I believe it when it was pointed out to me that I have the “no man left behind” philosophy as a hold over from my military days.

We do not always get to choose the people we work with, but this has never been an issue for me. I lead naturally from the front, and those that want to follow, are welcome to join. It does not take a long time for the people I work with to make the decision to either keep up, or not. This leads me to the core of this week’s argument. From my experience as a leader, team leader, manager and executive, I have only two rules about leading people. These two rules are the only rules you need to affectively lead people and inspire loyalty in the people you lead. These two things are also the two things I get from my current manager, and it is more than enough for me to respect him as a leader, and as a person.

Number One: Challenge your people, constantly.

I do not mean be difficult or push them to constantly feel overworked. This isn’t about piling on hundreds of hours of menial tasks that you just don’t want to do. I mean, actually challenging them to do better and be better at what they do. Push the limits of their comfort zone and introduce them to new tasks that make them think and become engaged in their work. Find out what drives them and use it to push them forward. Make them better at what they do by leading them towards the things they want to do, and task them to discover more, learn more, and be more than they are, every day.

Challenge yourself as much, or more than those around you. Find the things that will make you a better leader, a better worker, and a better person and face them head on. Lead from the front by challenging yourself and showing results. It will inspire those that want to follow to do the same. My father has always said of me that to reach for the moon was too easy, Saturn is out there somewhere. No truer words have ever been said.

Number Two: Support your people, always.

Again, I will start with what I do not mean. I do not mean hold their hands and guide them through each step. I do not mean to do work for them. These are your colleagues, and they are adults (In theory). What I mean is, support them in facing the challenges you present. Send them articles and links that would interest them, that show them the way forward, and open their minds. Suggest webinars, seminars, and conferences. Push them to seek out authors and to read about the things that inspire them. Help them think or rethink solutions to problems. Play devil’s advocate and continue to challenge them to think from new perspectives.

Support yourself in the same way. Seek out new ideas, new perspectives and different ways of solving the problems you are facing at work. Reach out to the same colleagues and lean on their expertise as well. Engage with them on things that are difficult. Working together as a team is not a sign of weakness, and it is not a sign of poor leadership. If you are open about the issues being faced, and earnest in seeking perspective, you will build a strong and solid team, and one that will follow you into the very depths of hell.

That is all there is to it. There is no “magic bullet” here, just earnest, open, honest hard work as a team. Work together to push and drive each other, don’t separate yourself from your team, and lead from the front, constantly challenging and always supporting. I suppose that would fall as the secret third rule. If you want to lead a team instead of just manage people, do not separate yourself from them. You don’t necessarily need to sit in the middle of your team, you earned that office after all, but don’t be absent either. If your team only hears from you as you assign tasks and collect results, you are not leading the team. You can be replaced by a Kanban board and an algorithm.

As a final note for those who struggle with the belief that because you are a manager you are infallible, or otherwise above your team in some way; you are not either of those. We are all human, and we all crave the same things: respect, challenge, and support.

I have had four fantastic managers in my lifetime. Four out of dozens. All four of these people worked very visibly within their teams, challenged me, supported me in facing those challenges, and solely led from the front. Currently, the two different managers I work with, follow at least two of the 2 and a half rules above. I am extremely lucky to be surround by people that have chosen to follow me, and I have chosen to follow, and that is worth more than dollars in a lot of cases.

Cheers!

SQLDoch

In the recent months I had the pleasure of hearing a presentation by Randolph West, data platform MVP, on how he took on a challenge of replicating a trillion row insert into a table in SQL Server. This was a combination of the Mount Everest idiom, and to prove that SQL Server performance is not as bad as some people say it is. This lit a spark in me, and I wanted to try it out as well. Joe Obbish was the first to do it, and Randolph was the second, so now I wanted to be in that group.

In order to remain unbiased in my attempt I refused to read either blog about how it was accomplished. I instead decided to run a series of test at 1 billion rows, to see if I could get it running with the best efficiency possible and create a baseline for how long I could expect the entire process to take. Joe’s method accomplished the task in approximately two and a half days.

I will state early on that I did not manage to match this, instead the total operation at the end took 111 hours to insert 1,000,000,715,000 rows into a clustered columnstore index. Given that I accomplished this with a simple query structure though, I feel that I was able to achieve what I set out to do.

The Hardware

The data lab machine I have at home is called the Databox, and is a stand-alone custom build I put together specifically for data operations testing and data science competitions I am involved in. I built it myself, and I designed the build to be optimized for disk IO and data operations.

· Intel 8^th generation I3 (4 core, no hyperthreading) i3-8100 3.6 ghz 16 PCI Express lanes

· MSI H370M Bazooka Motherboard

· 16 GB DDR4-2400 Ram

· 32 GB Intel Optane 3dX-Point Memory Cache (4 PCIe Lanes)

· GTX 750 ti (8 PCIe Lanes)

· 2 TB SeaGate FireCuda SHDD (Hybrid Hdd) (8gb NAND Flash on 2tb 4 head disk with 64 mb cache)

(Benchmark Disk Write is just shy of 300mbs second on the disk, achieving between 2x and 3x the throughput of the FireCuda alone, and achieving results that rival well over half the current market of SATA SSDs, and for a fraction of the out of pocket cost of going to a large volume NVMe drive)

Software

The data lab machine runs the current build of insider edition Windows 10 Pro, and SQL Server 2017 (Version 14.0.2002.14) Developer Edition. I also run the SQL ML Server, and ata the time of testing the March 2018 Edition of Power BI Report Server, and Power BI Desktop, but the reporting server is not the important part today.

For the purposes of the query, my instance was configured with minimum 6gb, maximum 12gb of memory is reserved for SQL Server, MaxDOP is set to 4 Cost threshold was set to 4. Query store was active on the database set to all, single disk filegroup on Primary fast disk, and Memory optimized filegroup of 200 MB.

The Testing Process

First thing first, I needed a structure. After some quick reading on the Clustered Columnstore Index compression structures, I found that a columnstore contains 1,048,576 rows. So I determined that this would be the optimal load volume per cycle, as it would allow the columnstore to be loaded without having to manage a deltastore in a separate column throughout the operation.

I built two tables, one that would hold an insert value, and one that would be the target table. I populated the insert table with 1,048,576 rows of the value “False”, in a bit format column with a clustered columnstore index.

USE ColumstoreTest;

SET NOCOUNT ON;

DECLARE @i INT = 0

BEGIN TRANSACTION

WHILE @i <= 1048575

BEGIN

INSERT INTO dbo.InstTbl (Beta) VALUES (0)

SET @i = @i + 1

END

COMMIT TRANSACTION

SET NOCOUNT OFF

If at First you don’t Succeed, Fail and Fail Again.

Now to the testing, with some guesswork along the way. My first thought was to run a single transaction process, in the same way I ran the InstTbl. This led to some disappoint results, as the compression wouldn’t run until after the insert was completed, and this created a large amount of extra write operations which dragged performance down considerably.

My second thought was to create a stored procedure to run the inserts, and to call the stored procedure, loading the insert through temp. This was an improvement, but still a disappointing result. Onward and upward.

My third attempt was to load the insert table into memory optimized temp storage and then drop that into the insert operation. This too yielded disappointing results, as the memory optimized table required a separate index and as such, it decreased index scan performance. This was also the cause of a concern with running a partition swap operation and chunking the insert into larger sections but was not possible because the indexes did not properly match. (If someone has a suggestion for this approach I am open to hear it.)

My fourth attempt was to load a table variable and then insert from there, but again, this created too much overhead on the bus, and my write speeds went down.

Fifth attempt, and the final one that I ended up using was simple, probably a bit too simple, but it worked amazingly well. Set stats off, Set processor affinity, (Was supposed to set nocount on, but forgot during the test run), and insert with OPTION MaxDOP 4 and force a parallel plan with a query hint (I know, but it worked)

In the course of running the test I averaged out to 150 million rows per minute being inserted into the table and was able to complete it in just over 111 hours, as stated. I can put Trillion Row + Dataset on my resume and move on with some other projects. I will circle back, after now reading how Joe Accomplished it, and see if I can achieve similar or better results.

USE ColumstoreTest;

ALTER SERVER CONFIGURATION

SET PROCESS AFFINITY CPU=0;

SET NOCOUNT OFF

SET STATISTICS TIME OFF

SET STATISTICS IO OFF

SET NOCOUNT ON;

INSERT INTO TrillionPlusTwo (Alpha)

SELECT Beta FROM InstTbl

OPTION (MAXDOP 4, USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE'))

GO 954675

The Lesson

We don’t always get things a 100% right the first time. There are many factors that affect performance, and a series of smaller scale test was able to get me into the performance range I wanted. If I had gone ahead with attempt 1, it would have taken over a month to run. By looking at different configurations, and settings I was able to achieve what I wanted to, and now have this table available to me for additional testing.

When I speak about AGILE and DevOps, this test shows a microcosm of why I prefer this method. I don’t go through a lot of prepatory planning, but use some expertise and develop working options. We toss a bunch of small value tests through these options (ideally with an end user, but for this case it was just me), and within a very short time, come up with a direction that will work long term, and check all the boxes necessary and if necessary make cuts where we can regain value fast, and circle back later on these.

This was a fantastic experience, and something I will build on in future.

SQL Doch

PS Special thanks to Joe Obbish and Randoph West.

Sunday, 9 September 2018

Leading people, inspiring loyalty.

Number One: Challenge your people, constantly.

Number Two: Support your people, always.

Monday, 3 September 2018

A TrillionPlus, a few more.

Software

Reflections from the Summit