Imagine the following table (called
id somedate somevalue -- -------- --------- 45 01/Jan/09 3 23 08/Jan/09 5 12 02/Feb/09 0 77 14/Feb/09 7 39 20/Feb/09 34 33 02/Mar/09 6
I would like a query that returns a running total in date order, like:
id somedate somevalue runningtotal -- -------- --------- ------------ 45 01/Jan/09 3 3 23 08/Jan/09 5 8 12 02/Feb/09 0 8 77 14/Feb/09 7 15 39 20/Feb/09 34 49 33 02/Mar/09 6 55
I know there are various ways of doing this in SQL Server 2000 / 2005 / 2008.
I am particularly interested in this sort of method that uses the aggregating-set-statement trick:
INSERT INTO @AnotherTbl(id, somedate, somevalue, runningtotal) SELECT id, somedate, somevalue, null FROM TestTable ORDER BY somedate DECLARE @RunningTotal int SET @RunningTotal = 0 UPDATE @AnotherTbl SET @RunningTotal = runningtotal = @RunningTotal + somevalue FROM @AnotherTbl
... this is very efficient but I have heard there are issues around this because you can't necessarily guarantee that the
UPDATE statement will process the rows in the correct order. Maybe we can get some definitive answers about that issue.
But maybe there are other ways that people can suggest?
edit: Now with a SqlFiddle with the setup and the 'update trick' example above
Update, if you are running SQL Server 2012 see: https://stackoverflow.com/a/10309947
The problem is that the SQL Server implementation of the Over clause is somewhat limited.
Oracle (and ANSI-SQL) allow you to do things like:
SELECT somedate, somevalue, SUM(somevalue) OVER(ORDER BY somedate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningTotal FROM Table
SQL Server gives you no clean solution to this problem. My gut is telling me that this is one of those rare cases where a cursor is the fastest, though I will have to do some benchmarking on big results.
The update trick is handy but I feel its fairly fragile. It seems that if you are updating a full table then it will proceed in the order of the primary key. So if you set your date as a primary key ascending you will
probably be safe. But you are relying on an undocumented SQL Server implementation detail (also if the query ends up being performed by two procs I wonder what will happen, see: MAXDOP):
Full working sample:
drop table #t create table #t ( ord int primary key, total int, running_total int) insert #t(ord,total) values (2,20) -- notice the malicious re-ordering insert #t(ord,total) values (1,10) insert #t(ord,total) values (3,10) insert #t(ord,total) values (4,1) declare @total int set @total = 0 update #t set running_total = @total, @total = @total + total select * from #t order by ord ord total running_total ----------- ----------- ------------- 1 10 10 2 20 30 3 10 40 4 1 41
You asked for a benchmark this is the lowdown.
The fastest SAFE way of doing this would be the Cursor, it is an order of magnitude faster than the correlated sub-query of cross-join.
The absolute fastest way is the UPDATE trick. My only concern with it is that I am not certain that under all circumstances the update will proceed in a linear way. There is nothing in the query that explicitly says so.
Bottom line, for production code I would go with the cursor.
create table #t ( ord int primary key, total int, running_total int) set nocount on declare @i int set @i = 0 begin tran while @i < 10000 begin insert #t (ord, total) values (@i, rand() * 100) set @i = @i +1 end commit
SELECT ord,total, (SELECT SUM(total) FROM #t b WHERE b.ord <= a.ord) AS b FROM #t a -- CPU 11731, Reads 154934, Duration 11135
SELECT a.ord, a.total, SUM(b.total) AS RunningTotal FROM #t a CROSS JOIN #t b WHERE (b.ord <= a.ord) GROUP BY a.ord,a.total ORDER BY a.ord -- CPU 16053, Reads 154935, Duration 4647
DECLARE @TotalTable table(ord int primary key, total int, running_total int) DECLARE forward_cursor CURSOR FAST_FORWARD FOR SELECT ord, total FROM #t ORDER BY ord OPEN forward_cursor DECLARE @running_total int, @ord int, @total int SET @running_total = 0 FETCH NEXT FROM forward_cursor INTO @ord, @total WHILE (@@FETCH_STATUS = 0) BEGIN SET @running_total = @running_total + @total INSERT @TotalTable VALUES(@ord, @total, @running_total) FETCH NEXT FROM forward_cursor INTO @ord, @total END CLOSE forward_cursor DEALLOCATE forward_cursor SELECT * FROM @TotalTable -- CPU 359, Reads 30392, Duration 496
declare @total int set @total = 0 update #t set running_total = @total, @total = @total + total select * from #t -- CPU 0, Reads 58, Duration 139
While Sam Saffron did great work on it, he still didn't provide recursive common table expression code for this problem. And for us who working with SQL Server 2008 R2 and not Denali, it's still fastest way to get running total, it's about 10 times faster than cursor on my work computer for 100000 rows, and it's also inline query.
So, here it is (I'm supposing that there's an
ord column in the table and it's sequential number without gaps, for fast processing there also should be unique constraint on this number):
;with CTE_RunningTotal as ( select T.ord, T.total, T.total as running_total from #t as T where T.ord = 0 union all select T.ord, T.total, T.total + C.running_total as running_total from CTE_RunningTotal as C inner join #t as T on T.ord = C.ord + 1 ) select C.ord, C.total, C.running_total from CTE_RunningTotal as C option (maxrecursion 0) -- CPU 140, Reads 110014, Duration 132
I also was curious about this update with variable or quirky update. So usually it works ok, but how we can be sure that it works every time? well, here's a little trick (found it here - http://www.sqlservercentral.com/Forums/Topic802558-203-21.aspx#bm981258) - you just check current and previous
ord and use
1/0 assignment in case they are different from what you expecting:
declare @total int, @ord int select @total = 0, @ord = -1 update #t set @total = @total + total, @ord = case when ord <> @ord + 1 then 1/0 else ord end, ------------------------ running_total = @total select * from #t -- CPU 0, Reads 58, Duration 139
From what I've seen if you have proper clustered index/primary key on your table (in our case it would be index by
ord_id) update will proceed in a linear way all the time (never encountered divide by zero). That said, it's up to you to decide if you want to use it in production code :)
update 2 I'm linking this answer, cause it includes some useful info about unreliability of the quirky update - nvarchar concatenation / index / nvarchar(max) inexplicable behavior.
The APPLY operator in SQL 2005 and higher works for this:
select t.id , t.somedate , t.somevalue , rt.runningTotal from TestTable t cross apply (select sum(somevalue) as runningTotal from TestTable where somedate <= t.somedate ) as rt order by t.somedate
SELECT TOP 25 amount, (SELECT SUM(amount) FROM time_detail b WHERE b.time_detail_id <= a.time_detail_id) AS Total FROM time_detail a
You can also use the ROW_NUMBER() function and a temp table to create an arbitrary column to use in the comparison on the inner SELECT statement.
Use a correlated sub-query. Very simple, here you go:
SELECT somedate, (SELECT SUM(somevalue) FROM TestTable t2 WHERE t2.somedate<=t1.somedate) AS running_total FROM TestTable t1 GROUP BY somedate ORDER BY somedate
The code might not be exactly correct, but I'm sure that the idea is.
The GROUP BY is in case a date appears more than once, you would only want to see it once in the result set.
If you don't mind seeing repeating dates, or you want to see the original value and id, then the following is what you want:
SELECT id, somedate, somevalue, (SELECT SUM(somevalue) FROM TestTable t2 WHERE t2.somedate<=t1.somedate) AS running_total FROM TestTable t1 ORDER BY somedate
You can also denormalize - store running totals in the same table:
Selects work much faster than any other solutions, but modifications may be slower
If you are using Sql server 2008 R2 above. Then, It would be shortest way to do;
Select id ,somedate ,somevalue, LAG(runningtotal) OVER (ORDER BY somedate) + somevalue AS runningtotal From TestTable
LAG is use to get previous row value. You can do google for more info.
Assuming that windowing works on SQL Server 2008 like it does elsewhere (that I've tried), give this a go:
select testtable.*, sum(somevalue) over(order by somedate) from testtable order by somedate;
MSDN says it's available in SQL Server 2008 (and maybe 2005 as well?) but I don't have an instance to hand to try it.
EDIT: well, apparently SQL Server doesn't allow a window specification ("OVER(...)") without specifying "PARTITION BY" (dividing the result up into groups but not aggregating in quite the way GROUP BY does). Annoying-- the MSDN syntax reference suggests that its optional, but I only have SqlServer 2000 instances around at the moment.
The query I gave works in both Oracle 10.2.0.3.0 and PostgreSQL 8.4-beta. So tell MS to catch up ;)
Though best way is to get it done will be using a window function, it can also be done using a simple correlated sub-query.
Select id, someday, somevalue, (select sum(somevalue) from testtable as t2 where t2.id = t1.id and t2.someday <= t1.someday) as runningtotal from testtable as t1 order by id,someday;
I believe a running total can be achieved using the simple INNER JOIN operation below.
SELECT ROW_NUMBER() OVER (ORDER BY SomeDate) AS OrderID ,rt.* INTO #tmp FROM ( SELECT 45 AS ID, CAST('01-01-2009' AS DATETIME) AS SomeDate, 3 AS SomeValue UNION ALL SELECT 23, CAST('01-08-2009' AS DATETIME), 5 UNION ALL SELECT 12, CAST('02-02-2009' AS DATETIME), 0 UNION ALL SELECT 77, CAST('02-14-2009' AS DATETIME), 7 UNION ALL SELECT 39, CAST('02-20-2009' AS DATETIME), 34 UNION ALL SELECT 33, CAST('03-02-2009' AS DATETIME), 6 ) rt SELECT t1.ID ,t1.SomeDate ,t1.SomeValue ,SUM(t2.SomeValue) AS RunningTotal FROM #tmp t1 JOIN #tmp t2 ON t2.OrderID <= t1.OrderID GROUP BY t1.OrderID ,t1.ID ,t1.SomeDate ,t1.SomeValue ORDER BY t1.OrderID DROP TABLE #tmp
The following will produce the required results.
SELECT a.SomeDate, a.SomeValue, SUM(b.SomeValue) AS RunningTotal FROM TestTable a CROSS JOIN TestTable b WHERE (b.SomeDate <= a.SomeDate) GROUP BY a.SomeDate,a.SomeValue ORDER BY a.SomeDate,a.SomeValue
Having a clustered index on SomeDate will greatly improve the performance.
Using join Another variation is to use join. Now the query could look like:
SELECT a.id, a.value, SUM(b.Value)FROM RunTotalTestData a, RunTotalTestData b WHERE b.id <= a.id GROUP BY a.id, a.value ORDER BY a.id;
for more you can visite this link http://askme.indianyouth.info/details/calculating-simple-running-totals-in-sql-server-12
Here are 2 simple ways to calculate running total:
Approach 1: It can be written this way if your DBMS supports Analytical Functions
SELECT id ,somedate ,somevalue ,runningtotal = SUM(somevalue) OVER (ORDER BY somedate ASC) FROM TestTable
Approach 2: You can make use of OUTER APPLY if your database version / DBMS itself does not support Analytical Functions
SELECT T.id ,T.somedate ,T.somevalue ,runningtotal = OA.runningtotal FROM TestTable T OUTER APPLY ( SELECT runningtotal = SUM(TI.somevalue) FROM TestTable TI WHERE TI.somedate <= S.somedate ) OA;
Note:- If you have to calculate the running total for different partitions separately, it can be done as posted here: Calculating Running totals across rows and grouping by ID
BEGIN TRAN CREATE TABLE #Table (_Id INT IDENTITY(1,1) ,id INT , somedate VARCHAR(100) , somevalue INT) INSERT INTO #Table ( id , somedate , somevalue ) SELECT 45 , '01/Jan/09', 3 UNION ALL SELECT 23 , '08/Jan/09', 5 UNION ALL SELECT 12 , '02/Feb/09', 0 UNION ALL SELECT 77 , '14/Feb/09', 7 UNION ALL SELECT 39 , '20/Feb/09', 34 UNION ALL SELECT 33 , '02/Mar/09', 6 ;WITH CTE ( _Id, id , _somedate , _somevalue ,_totvalue ) AS ( SELECT _Id , id , somedate , somevalue ,somevalue FROM #Table WHERE _id = 1 UNION ALL SELECT #Table._Id , #Table.id , somedate , somevalue , somevalue + _totvalue FROM #Table,CTE WHERE #Table._id > 1 AND CTE._Id = ( #Table._id-1 ) ) SELECT * FROM CTE ROLLBACK TRAN