Wednesday, October 19, 2016

Select Rows With Max Value


We often need to select rows based on the maximum value of a specific column, such as date. Let’s say we  have a table that stores customers orders, and we want to list the last order dates and amounts for each customer.
Here is a demo data set and query to achieve this. The same query can be extended to select rows based on the maximum value of any other column.
The table data set is:
SQL> select * from t_orders
  2  order by cust_id, order_date desc;

CUST_ID         ORDER_DAT  ORDER_AMT
--------------- --------- ----------
100             27-APR-10          8
100             17-JAN-10         21
101             02-AUG-10          7
101             26-JUL-10          4
101             17-APR-10          4
101             28-MAR-10         55
102             28-MAR-10          9
103             27-APR-10          4
103             18-MAR-10          3
103             17-JAN-10         15
103             07-JAN-10          1
104             26-JUN-10          2
104             18-MAR-10          2

13 rows selected.
We see from the table t_orders that four customers have placed a total of 13 orders. Of these we want to choose the latest order details for each customer.
The query to do this is:
SQL> -- Last order date of each customer
SQL> select cust_id
  2       , last_order_date
  3       , order_amt last_order_amt
  4  from
  5  (
  6    select cust_id
  7       , order_date
  8       , max(order_date) over
  9           (partition by cust_id) last_order_date
 10       , order_amt
 11    from t_orders
 12  )
 13  where order_date = last_order_date;

CUST_ID         LAST_ORDE LAST_ORDER_AMT
--------------- --------- --------------
100             27-APR-10              8
101             02-AUG-10              7
102             28-MAR-10              9
103             27-APR-10              4
104             26-JUN-10              2
The inner query uses the analytics function max() on a partition by customer id. The outer query picks that record for each customer in which the order date matches the maximum order date.
A similar query to select order details for the maximum order amount placed by each customer. In this case, max() is applied to order_amt.
SQL> -- Max order amount of each customer
SQL> select cust_id
  2       , order_date date_of_max_order
  3       , max_order_amt
  4  from
  5  (
  6    select cust_id
  7       , order_date
  8       , order_amt
  9       , max(order_amt) over
 10           (partition by cust_id) max_order_amt
 11    from t_orders
 12  )
 13  where order_amt = max_order_amt;

CUST_ID         DATE_OF_M MAX_ORDER_AMT
--------------- --------- -------------
100             17-JAN-10            21
101             28-MAR-10            55
102             28-MAR-10             9
103             17-JAN-10            15
104             26-JUN-10             2
104             18-MAR-10             2
Notice that in the second query, two records are retrieved for cust_id 104 as it has two orders with the same maximum amount.

No comments:

Post a Comment

How To Fix Hive – Partition Table Query Failed When Stored As Parquet

This article is about the bug in Hive filtering option, when the partition table query stored as parquet. Big data developers will help y...