why can't I use a windowed function in a where clause in SQL Server?
One answer, though not particularly informative, is because the spec says that you can't.
See the article by Itzik Ben Gan - Logical Query Processing: What It Is And What It Means to You and in particular the image here. Window functions are evaluated at the time of the SELECT on the result set remaining after all the WHERE/JOIN/GROUP BY/HAVING clauses have been dealt with (step 5.1).
really I'm looking for the reasoning behind not being able to use
windowing functions in where clauses.
As mentioned, a window order clause is mandatory, and SQL Server
doesn’t allow the ordering to be based on a constant—for example,
ORDER BY NULL. But surprisingly, when passing an expression based on a
subquery that returns a constant—for example, ORDER BY (SELECT
NULL)—SQL Server will accept it. At the same time, the optimizer
un-nests, or expands, the expression and realizes that the ordering is
the same for all rows. Therefore, it removes the ordering requirement
from the input data. Here’s a complete query demonstrating this
technique:
SELECT actid, tranid, val,
ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS rownum
FROM dbo.Transactions;
Observe in the properties of the Index Scan iterator that the Ordered
property is False, meaning that the iterator is not required to return
the data in index key order
The above means that when you are using constant ordering is not performed. I will strongly recommend to read the book as Itzik Ben-Gan describes in depth how the window functions are working and how to optimize various of cases when they are used.
The index you have is not usefull for your window function because
1.To get ID column value, SQL may end up doing key lookups or even end up scanning whole other index if it crosses Tipping point.So your index may not be used at all.
2.You are ordering by val desc which requires a sort with no suitable index and may even endup spilling to TEMPDB
3.For one more interesting fragmenation aspect ,see below
Typically for a Window function to perform well,you will need a POCindex which means
P,O--Partition and order by columns should be in key clause
C--covering --columns you are including in select should be included
So for below query to work optimally.
SELECT
Id,
Name,
Value,
RANK() OVER (PARTITION BY Name ORDER BY Value DESC) AS Rank
FROM
Stats
ORDER BY
Value DESC
You will need below index
create index nci_test on dbo.table(name,value desc)
include(id)
There is one more issue with your index created with " value desc".
Normally in an index all values will be stored in Ascending order by default,but with this index you are asking to store in a reverse way which can cause logical fragmentation which can be seen from answer of Martin Smith here ..Pasting relevant terms from the answer here ...
If the index is created with keys descending but new rows are appended with ascending key values then you can end up with every page out of logical order. This can severely impact the size of the IO reads when scanning the table and it is not in cache
So few options..
1.Run the index rebuild based on your frequency to see if it helps
2.Changing the query to order by partition clause will eliminate the need for index to be created with "val desc" option
SELECT
Id,
Name,
Value,
RANK() OVER (PARTITION BY Name ORDER BY Value DESC) AS Rank
FROM
Stats
ORDER BY
name DESC
The above query doesnt need an index to be created like the one you created .you can change it like below..which also takes care of Fragmentation aspects noted above
CREATE INDEX Leaderboard__index ON Stats (Name, Value)
include(id);
-- sample data
IF OBJECT_ID('tempdb..#yourtable') IS NOT NULL DROP TABLE #yourtable;
CREATE TABLE #yourtable
(
someID int identity primary key,
[date] date NOT NULL,
[Returns] decimal(10,9) NOT NULL,
[Final Return Index] decimal(10,9) NULL
);
INSERT #yourtable([date], [Returns])
VALUES
('11/1/2016', 0.000542159 ),
('11/2/2016',-0.001629094 ),
('11/3/2016', 0.000568779 ),
('11/4/2016',-0.001246407 ),
('11/7/2016', 0.000795611 ),
('11/8/2016', 0.000663507 ),
('11/9/2016',-0.000254819 );
GO
-- Review what you have
SELECT * FROM #yourtable;
-- The solution
DECLARE @runningTotal decimal(10,9) = 1;
UPDATE #yourtable
SET @runningTotal = [Final Return Index] = @runningTotal+[Returns]
FROM #yourtable WITH (TABLOCKX)
OPTION (MAXDOP 1);
SELECT [date], [Returns], [Final Return Index]
FROM #yourtable;
Results:
date Returns Final Return Index
---------- --------------- --------------------
2016-11-01 0.000542159 1.000542159
2016-11-02 -0.001629094 0.998913065
2016-11-03 0.000568779 0.999481844
2016-11-04 -0.001246407 0.998235437
2016-11-07 0.000795611 0.999031048
2016-11-08 0.000663507 0.999694555
2016-11-09 -0.000254819 0.999439736
I don't know what data type you are using in SQL or Excel but you'll have to iron that out to get more precise results. Note the "rules" at the end of the aforementioned article by Jeff Moden.
Alternatively, if you are running SQL Server 2012+ you can use what's referred to as a "window aggregate function" like so:
-- sample data
IF OBJECT_ID('tempdb..#yourtable') IS NOT NULL DROP TABLE #yourtable;
CREATE TABLE #yourtable
(
[date] date NOT NULL,
[Returns] decimal(10,9) NOT NULL
);
INSERT #yourtable([date], [Returns])
VALUES
('11/1/2016', 0.000542159 ),
('11/2/2016',-0.001629094 ),
('11/3/2016', 0.000568779 ),
('11/4/2016',-0.001246407 ),
('11/7/2016', 0.000795611 ),
('11/8/2016', 0.000663507 ),
('11/9/2016',-0.000254819 );
-- Solution:
SELECT
[date],
[Returns],
[Final Return Index] = 1 + SUM([Returns]) OVER (PARTITION BY (SELECT NULL) ORDER BY date)
FROM #yourtable;
One answer, though not particularly informative, is because the spec says that you can't.
See the article by Itzik Ben Gan - Logical Query Processing: What It Is And What It Means to You and in particular the image here. Window functions are evaluated at the time of the
SELECT
on the result set remaining after all theWHERE
/JOIN
/GROUP BY
/HAVING
clauses have been dealt with (step 5.1).The reason that they are not allowed in the
WHERE
clause is that it would create ambiguity. Stealing Itzik Ben Gan's example from High-Performance T-SQL Using Window Functions (p.25)Suppose your table was
And your query
What would be the right result? Would you expect that the
col1 > 'B'
predicate ran before or after the row numbering?There is no need to worry about specifying constant in the
ORDER BY
expression. The following is quoted from the Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions written byItzik Ben-Gan
(it was available for free download from Microsoft free e-books site):The above means that when you are using constant ordering is not performed. I will strongly recommend to read the book as
Itzik Ben-Gan
describes in depth how the window functions are working and how to optimize various of cases when they are used.The index you have is not usefull for your window function because
Typically for a Window function to perform well,you will need a POCindex which means
So for below query to work optimally.
You will need below index
There is one more issue with your index created with "
value desc
".Normally in an index all values will be stored in Ascending order by default,but with this index you are asking to store in a reverse way which can cause logical fragmentation which can be seen from answer of Martin Smith here ..Pasting relevant terms from the answer here ...
So few options..
1.Run the index rebuild based on your frequency to see if it helps
2.Changing the query to order by partition clause will eliminate the need for index to be created with "val desc" option
The above query doesnt need an index to be created like the one you created .you can change it like below..which also takes care of Fragmentation aspects noted above
References:
Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions
Here's one approach referred to as the "Quirky Update" method.You can read more about in this article by Jeff Moden: http://www.sqlservercentral.com/articles/T-SQL/68467/ and in Itzik Ben-Gan's Book: https://www.amazon.com/Microsoft-High-Performance-Functions-Developer-Reference/dp/0735658366
Results:
I don't know what data type you are using in SQL or Excel but you'll have to iron that out to get more precise results. Note the "rules" at the end of the aforementioned article by Jeff Moden.
Alternatively, if you are running SQL Server 2012+ you can use what's referred to as a "window aggregate function" like so:
Results will be the same as above.