So you have NULL values in your SQL Server table and you want to populate those NULL values with the last non-NULL value, based on a particular order. Once you have only one NULL values encapsulated between two populated values, there are quick and fast solutions. But what if you find a larger gap of NULL values and you want to populate these values as well?
Let’s make a small example to visualize the problem. Connect to your favorite SQL Server database and fun this:
DROP TABLE IF EXISTS testNonNull; GO CREATE TABLE testNonNull (id INT IDENTITY(1,1) NOT NULL ,valFluct TINYINT ,valIncrem TINYINT ,dateD SMALLDATETIME); GO INSERT INTO testNonNull (valFluct,valIncrem, dateD) SELECT 10,10, '2018/08/01' UNION ALL SELECT 14,14, '2018/08/02' UNION ALL SELECT NULL,NULL, '2018/08/03' UNION ALL SELECT 9,16, '2018/08/04' UNION ALL SELECT NULL,NULL, '2018/08/05' UNION ALL SELECT NULL,NULL, '2018/08/06' UNION ALL SELECT NULL,NULL, '2018/08/07' UNION ALL SELECT 32,18, '2018/08/08' UNION ALL SELECT 45,22, '2018/08/09' UNION ALL SELECT NULL,NULL, '2018/08/10' UNION ALL SELECT NULL,NULL, '2018/08/11' UNION ALL SELECT 14,25, '2018/08/12' UNION ALL SELECT NULL,NULL, '2018/08/13'; GO
And once you run a simple SELECT statement against this testNonNULL table, you will see the gaps:
A simple solution using OVER() clause for the ROWS UNBOUNDED PRECEDING would be a easy solution, but once applied, you quickly realize, that it is working correctly only for incremental values, eg.:
SELECT dateD ,valFluct AS FluctuatingValue ,MAX(valFluct) OVER (ORDER BY dateD ASC ROWS UNBOUNDED PRECEDING) AS nonNULL_FluctuatingValues ,valIncrem ,MAX(valIncrem) OVER (ORDER BY dateD ASC ROWS UNBOUNDED PRECEDING) AS nonNULL_IncrementalValues FROM testNonNull ORDER BY dateD
All works fine for the Incremental values, where as the Fluctuating values are not populated correctly:
So the query returns wrong values for the FluctuatingValue column two times, every time the lower number of currently maximum running number appears – and this is absolutely not correct. On the other hand, Incremental values (since they increment with every new value), are populated correctly.
In order to keep the simplicity of the window function and not overdo with complicated T-SQL, the following solution is genius, yet intriguing. Converting the value and column – upon which the sort is being done – to binary to extract the max value for the current window, will satisfy both the incremental and fluctuating type of values. The original post was introduced by Itzik Ben-Gan in SQL Magazine.
Simple substitution will do the trick:
SELECT dateD ,valFluct AS FluctuatingValue ,CAST(SUBSTRING(MAX(CAST(dateD AS BINARY(32)) + CAST(valFluct AS BINARY(12))) OVER( ORDER BY dateD ASC ROWS UNBOUNDED PRECEDING ),33,12) AS INT) AS nonNULL_FluctuatingValues ,valIncrem ,MAX(valIncrem) OVER (ORDER BY dateD ASC ROWS UNBOUNDED PRECEDING) AS nonNULL_IncrementalValues FROM testNonNull ORDER BY dateD
And the return results are what you would want for the values to be filled in with.
Enjoy!
“Propagading” is not a word. “Propagating” is: https://www.merriam-webster.com/dictionary/propagate
LikeLike
Thanks. Typo.
LikeLike
thanks for sharing. this helped me today 🙂
Itzik is insane! My brain was mush after sitting through one of his full day training sessions.
LikeLiked by 1 person
Hi, please forgive the newbe question, but this may help me help my programmer solve an almost identical problem.
Will this solution work with MS SQL?
Thanks!
LikeLike
Thank you so much, it helped me !!
LikeLike
Note, it does not work if you use [date] data type instead of [smalldatetime]
LikeLike
just change smalldatetime to date in the table DDL, and add UNION ALL SELECT 33,NULL, ‘2019/08/13’ to the values to insert, and you will see it does not work.
LikeLike