SQL find sum of entries by date including previous date
Posted By: Anonymous
This might not be possible, but here this is the request I’m trying to translate into SQL:
For every date where at least one entry exists in the ACCOUNT_ENTRY
table, I want to find the sum of all entries for all accounts (in the ACCOUNT
table) for that date. If an account doesn’t have an entry for that particular date, I want to use the next latest entry up to that point.
Note that I’m currently doing this on an H2 DB, but that could change in the future so I’m trying to stay away from vendor specific Stored Procedures.
Example
ACCOUNT
table
ID | ACCOUNT_LABEL |
---|---|
1 | account 123 |
2 | account 456 |
3 | account 789 |
ACCOUNT_ENTRY
table
ID | ACCOUNT_ID | ENTRY_DATE | BOOK_VALUE | MARKET_VALUE |
---|---|---|---|---|
7 | 1 | 2021-05-31 | 100 | 110 |
5 | 1 | 2021-05-28 | 90 | 100 |
6 | 2 | 2021-05-28 | 70 | 80 |
4 | 3 | 2021-05-28 | 50 | 60 |
3 | 1 | 2021-05-27 | 80 | 90 |
2 | 2 | 2021-05-27 | 60 | 70 |
1 | 3 | 2021-05-27 | 40 | 50 |
Desired result
ENTRY_DATE | SUM_OF_BOOK_VAL | SUM_OF_MARKET_VAL |
---|---|---|
2021-05-27 | 180 | 210 |
2021-05-28 | 210 | 240 |
2021-05-31 | 220 | 250 |
2021-05-27
Book value = 80 + 60 + 40 (Row ID 1 + 2 + 3)2021-05-27
Market value = 90 + 70 + 50 (Row ID 1 + 2 + 3)2021-05-28
Book value = 90 + 70 + 50 (Row ID 4 + 5 + 6)2021-05-28
Market value = 100 + 80 + 60 (Row ID 4 + 5 + 6)2021-05-31
is equal to the results from2021-05-28
+ the new single new entry from2021-05-31
(Row ID = 7)- Book Value = 70 + 50 + 100 = 220
- Market Value = 80 + 60 + 110 = 250
Result I can get so far
ENTRY_DATE | SUM_OF_BOOK_VAL | SUM_OF_MARKET_VAL |
---|---|---|
2021-05-27 | 180 | 210 |
2021-05-28 | 210 | 240 |
2021-05-31 | 100 | 110 |
This would be easy to do in my application code, but I was hoping to leave the compute to the DB and not transport all the data over to the application. I could also roll data forward, but then I could end up with significant chunks of data which aren’t necessary.
Here’s the query I’m using so far, but it doesn’t handle situations where it has to look back at previous entries.
Query
SELECT
TO_CHAR(ENTRY_DATE, 'yyyy-mm-dd') AS ENTRYDATE,
SUM(
CASE
WHEN A.JOINT_ACCOUNT = 'TRUE' THEN MARKET_VALUE / 2
ELSE MARKET_VALUE
END
) AS MARKETVALUE,
SUM(
CASE
WHEN A.JOINT_ACCOUNT = 'TRUE' THEN BOOK_VALUE / 2
ELSE BOOK_VALUE
END
) AS BOOKVALUE
FROM
ACCOUNT_ENTRY AE
INNER JOIN ACCOUNT A ON AE.ACCOUNT_ID = A.ID
INNER JOIN (
SELECT
MAX(ID) AS MAX_ID,
TO_CHAR(ENTRY_DATE, 'yyyy-mm-dd'),
AE.ACCOUNT_ID AS AID
FROM
ACCOUNT_ENTRY AE
GROUP BY
AE.ACCOUNT_ID,
TO_CHAR(ENTRY_DATE, 'yyyy-mm-dd')
) MAX_ENTRIES ON AE.ID = MAX_ENTRIES.MAX_ID
GROUP BY
TO_CHAR(ENTRY_DATE, 'yyyy-mm-dd')
ORDER BY
TO_CHAR(ENTRY_DATE, 'yyyy-mm-dd') DESC
This might not be possible, but here this is the request I’m trying to translate into SQL:
For every date where at least one entry exists in the ACCOUNT_ENTRY
table, I want to find the sum of all entries for all accounts (in the ACCOUNT
table) for that date. If an account doesn’t have an entry for that particular date, I want to use the next latest entry up to that point.
Note that I’m currently doing this on an H2 DB, but that could change in the future so I’m trying to stay away from vendor specific Stored Procedures.
Example
ACCOUNT
table
ID | ACCOUNT_LABEL |
---|---|
1 | account 123 |
2 | account 456 |
3 | account 789 |
ACCOUNT_ENTRY
table
ID | ACCOUNT_ID | ENTRY_DATE | BOOK_VALUE | MARKET_VALUE |
---|---|---|---|---|
7 | 1 | 2021-05-31 | 100 | 110 |
5 | 1 | 2021-05-28 | 90 | 100 |
6 | 2 | 2021-05-28 | 70 | 80 |
4 | 3 | 2021-05-28 | 50 | 60 |
3 | 1 | 2021-05-27 | 80 | 90 |
2 | 2 | 2021-05-27 | 60 | 70 |
1 | 3 | 2021-05-27 | 40 | 50 |
Desired result
ENTRY_DATE | SUM_OF_BOOK_VAL | SUM_OF_MARKET_VAL |
---|---|---|
2021-05-27 | 180 | 210 |
2021-05-28 | 210 | 240 |
2021-05-31 | 220 | 250 |
2021-05-27
Book value = 80 + 60 + 40 (Row ID 1 + 2 + 3)2021-05-27
Market value = 90 + 70 + 50 (Row ID 1 + 2 + 3)2021-05-28
Book value = 90 + 70 + 50 (Row ID 4 + 5 + 6)2021-05-28
Market value = 100 + 80 + 60 (Row ID 4 + 5 + 6)2021-05-31
is equal to the results from2021-05-28
+ the new single new entry from2021-05-31
(Row ID = 7)- Book Value = 70 + 50 + 100 = 220
- Market Value = 80 + 60 + 110 = 250
Result I can get so far
ENTRY_DATE | SUM_OF_BOOK_VAL | SUM_OF_MARKET_VAL |
---|---|---|
2021-05-27 | 180 | 210 |
2021-05-28 | 210 | 240 |
2021-05-31 | 100 | 110 |
This would be easy to do in my application code, but I was hoping to leave the compute to the DB and not transport all the data over to the application. I could also roll data forward, but then I could end up with significant chunks of data which aren’t necessary.
Here’s the query I’m using so far, but it doesn’t handle situations where it has to look back at previous entries.
Query
SELECT
TO_CHAR(ENTRY_DATE, 'yyyy-mm-dd') AS ENTRYDATE,
SUM(
CASE
WHEN A.JOINT_ACCOUNT = 'TRUE' THEN MARKET_VALUE / 2
ELSE MARKET_VALUE
END
) AS MARKETVALUE,
SUM(
CASE
WHEN A.JOINT_ACCOUNT = 'TRUE' THEN BOOK_VALUE / 2
ELSE BOOK_VALUE
END
) AS BOOKVALUE
FROM
ACCOUNT_ENTRY AE
INNER JOIN ACCOUNT A ON AE.ACCOUNT_ID = A.ID
INNER JOIN (
SELECT
MAX(ID) AS MAX_ID,
TO_CHAR(ENTRY_DATE, 'yyyy-mm-dd'),
AE.ACCOUNT_ID AS AID
FROM
ACCOUNT_ENTRY AE
GROUP BY
AE.ACCOUNT_ID,
TO_CHAR(ENTRY_DATE, 'yyyy-mm-dd')
) MAX_ENTRIES ON AE.ID = MAX_ENTRIES.MAX_ID
GROUP BY
TO_CHAR(ENTRY_DATE, 'yyyy-mm-dd')
ORDER BY
TO_CHAR(ENTRY_DATE, 'yyyy-mm-dd') DESC
SOLUTION Found
Thanks to @Gordon Linoff for sharing the COALESCE and LAG functions.
First, just to simplify my query, I created a view. This isn’t relevant to the challenge I was facing, but important so that I could eliminate multiple entries for the same account and same day
create view LATEST_ACCOUNT_ENTRY as
select
max(id) as entry_id,
account_id,
to_char(entry_date, 'yyyy-mm-dd') as e_date
from
account_entry
group by
account_id,
e_date
Final Query (going to be simplified)
select
e_date,
sum(coalesce(BOOK_VALUE, PREVIOUS_BOOK_VALUE)),
sum(coalesce(MARKET_VALUE, PREVIOUS_MARKET_VALUE))
from
(
select
dates.e_date,
acc_id,
entry_id,
book_value,
LAG(book_value) OVER (
partition by dates.acc_id
order by
dates.e_date
) as previous_book_value,
market_value,
LAG(market_value) OVER (
partition by dates.acc_id
order by
dates.e_date
) as previous_market_value
from
(
select
distinct to_char(entry_date, 'yyyy-mm-dd') as e_date,
accts.id as acc_id
from
account_entry
cross join (
select
id
from
account
) accts
) dates
left join (
select
lae.ACCOUNT_ID,
lae.E_DATE,
lae.ENTRY_ID,
CASE
WHEN JOINT_ACCOUNT = 'TRUE' THEN BOOK_VALUE / 2
ELSE BOOK_VALUE
END as BOOK_VALUE,
CASE
WHEN JOINT_ACCOUNT = 'TRUE' THEN MARKET_VALUE / 2
ELSE MARKET_VALUE
END as MARKET_VALUE
from
LATEST_ACCOUNT_ENTRY lae
inner join account_entry ae on lae.ENTRY_ID = ae.id
inner join account acc on ae.account_id = acc.id
) entries on dates.e_date = entries.e_date
and dates.acc_id = entries.account_id
)
group by
e_date
Solution
You can do this by generating rows for all dates and accounts and then using lag(ignore nulls)
to get the most recent value. So, for the rows:
select a.account_id, d.entry_date, ae.book_val, ae.market_val,
coalesce(ae.book_val,
lag(ae.book_val) ignore nulls over (partition by a.account_id order by d.entry_date)
) as imputed_book_val,
coalesce(ae.market_val,
lag(ae.market_val) ignore nulls over (partition by a.account_id order by d.entry_date)
) as imputed_market_val
from (select distinct account_id from account_entry) a cross join
(select distinct entry_date from account_entry) d left join
account_entry ae
on ae.account_id = a.account_id and
ae.entry_date = d.entry_date;
Then you can aggregate this:
select entry_date, sum(imputed_book_val), sum(imputed_market_val)
from (select a.account_id, d.entry_date, ae.book_val, ae.market_val,
coalesce(ae.book_val,
lag(ae.book_val) ignore nulls over (partition by a.account_id order by d.entry_date)
) as imputed_book_val,
coalesce(ae.market_val,
lag(ae.market_val) ignore nulls over (partition by a.account_id order by d.entry_date)
) as imputed_market_val
from (select distinct account_id from account_entry) a cross join
(select distinct entry_date from account_entry) d left join
account_entry ae
on ae.account_id = a.account_id and
ae.entry_date = d.entry_date
) ad
group by entry_date;
Answered By: Anonymous
Disclaimer: This content is shared under creative common license cc-by-sa 3.0. It is generated from StackExchange Website Network.