Home » Code » 经典sql难题——组内极值

经典sql难题——组内极值

表结构如下:

CREATE TABLE `salaries` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `emp_no` int(11) DEFAULT NULL COMMENT '员工编号',
  `dept_no` int(11) DEFAULT NULL COMMENT '部门编号',
  `salary` int(10) DEFAULT NULL COMMENT '薪水',
  PRIMARY KEY (`id`)
);

INSERT INTO `test`.`salaries`(`id`, `emp_no`, `dept_no`, `salary`) VALUES (1, 1, 1, 10000);
INSERT INTO `test`.`salaries`(`id`, `emp_no`, `dept_no`, `salary`) VALUES (2, 2, 1, 10000);
INSERT INTO `test`.`salaries`(`id`, `emp_no`, `dept_no`, `salary`) VALUES (3, 3, 1, 8000);
INSERT INTO `test`.`salaries`(`id`, `emp_no`, `dept_no`, `salary`) VALUES (4, 4, 1, 8000);
INSERT INTO `test`.`salaries`(`id`, `emp_no`, `dept_no`, `salary`) VALUES (5, 5, 1, 7000);
INSERT INTO `test`.`salaries`(`id`, `emp_no`, `dept_no`, `salary`) VALUES (6, 6, 2, 10000);
INSERT INTO `test`.`salaries`(`id`, `emp_no`, `dept_no`, `salary`) VALUES (7, 7, 2, 9000);
INSERT INTO `test`.`salaries`(`id`, `emp_no`, `dept_no`, `salary`) VALUES (8, 8, 2, 8000);
INSERT INTO `test`.`salaries`(`id`, `emp_no`, `dept_no`, `salary`) VALUES (9, 9, 3, 10000);
INSERT INTO `test`.`salaries`(`id`, `emp_no`, `dept_no`, `salary`) VALUES (10, 10, 3, 9000);
INSERT INTO `test`.`salaries`(`id`, `emp_no`, `dept_no`, `salary`) VALUES (11, 11, 3, 9000);
INSERT INTO `test`.`salaries`(`id`, `emp_no`, `dept_no`, `salary`) VALUES (12, 12, 3, 8000);

组内最大:求出各部门薪水最高的员工,如果有多个员工薪水相同,需要全部显示。

方法1,相关子查询:

mysql> select * from salaries s1 where salary = (select max(salary) from salaries s2 where s1.dept_no = s2.dept_no);
+----+--------+---------+--------+
| id | emp_no | dept_no | salary |
+----+--------+---------+--------+
|  1 |      1 |       1 |  10000 |
|  2 |      2 |       1 |  10000 |
|  6 |      6 |       2 |  10000 |
|  9 |      9 |       3 |  10000 |
+----+--------+---------+--------+
4 rows in set (0.00 sec)

方法2:非相关子查询:

mysql> select s1.* from salaries s1 inner join (select dept_no, max(salary) as salary from salaries group by dept_no) s2 on s1.dept_no = s2.dept_no and s1.salary = s2.salary;
+----+--------+---------+--------+
| id | emp_no | dept_no | salary |
+----+--------+---------+--------+
|  1 |      1 |       1 |  10000 |
|  2 |      2 |       1 |  10000 |
|  6 |      6 |       2 |  10000 |
|  9 |      9 |       3 |  10000 |
+----+--------+---------+--------+
4 rows in set (0.00 sec)

方法3:使用 left join:

mysql> select s1.* from salaries s1 left join salaries s2 on s1.dept_no = s2.dept_no and s1.salary < s2.salary where s2.id is null;
+----+--------+---------+--------+
| id | emp_no | dept_no | salary |
+----+--------+---------+--------+
|  1 |      1 |       1 |  10000 |
|  2 |      2 |       1 |  10000 |
|  6 |      6 |       2 |  10000 |
|  9 |      9 |       3 |  10000 |
+----+--------+---------+--------+
4 rows in set (0.00 sec)

left join 的原理是,在笛卡尔积中,当左表的 salary 处于最大值时,右表没有更大的 salary,因此通过 s2.id is null 筛选出最大的记录。看下边完整的记录即可明白:

mysql> select * from salaries s1 left join salaries s2 on s1.dept_no = s2.dept_no and s1.salary < s2.salary order by s1.dept_no;
+----+--------+---------+--------+------+--------+---------+--------+
| id | emp_no | dept_no | salary | id   | emp_no | dept_no | salary |
+----+--------+---------+--------+------+--------+---------+--------+
|  5 |      5 |       1 |   7000 |    4 |      4 |       1 |   8000 |
|  4 |      4 |       1 |   8000 |    1 |      1 |       1 |  10000 |
|  5 |      5 |       1 |   7000 |    1 |      1 |       1 |  10000 |
|  3 |      3 |       1 |   8000 |    2 |      2 |       1 |  10000 |
|  1 |      1 |       1 |  10000 | NULL |   NULL |    NULL |   NULL |
|  4 |      4 |       1 |   8000 |    2 |      2 |       1 |  10000 |
|  2 |      2 |       1 |  10000 | NULL |   NULL |    NULL |   NULL |
|  5 |      5 |       1 |   7000 |    2 |      2 |       1 |  10000 |
|  5 |      5 |       1 |   7000 |    3 |      3 |       1 |   8000 |
|  3 |      3 |       1 |   8000 |    1 |      1 |       1 |  10000 |
|  7 |      7 |       2 |   9000 |    6 |      6 |       2 |  10000 |
|  8 |      8 |       2 |   8000 |    6 |      6 |       2 |  10000 |
|  8 |      8 |       2 |   8000 |    7 |      7 |       2 |   9000 |
|  6 |      6 |       2 |  10000 | NULL |   NULL |    NULL |   NULL |
| 12 |     12 |       3 |   8000 |   10 |     10 |       3 |   9000 |
| 12 |     12 |       3 |   8000 |   11 |     11 |       3 |   9000 |
| 10 |     10 |       3 |   9000 |    9 |      9 |       3 |  10000 |
| 11 |     11 |       3 |   9000 |    9 |      9 |       3 |  10000 |
|  9 |      9 |       3 |  10000 | NULL |   NULL |    NULL |   NULL |
| 12 |     12 |       3 |   8000 |    9 |      9 |       3 |  10000 |
+----+--------+---------+--------+------+--------+---------+--------+
20 rows in set (0.00 sec)

组内Top N: 求出各部门薪水排名前2的员工,如果有多个员工薪水相同,需要全部显示。

这其实是问题1的一般化,问题1就是排名第1,这是排名Top N。相关子查询的方式如下:

mysql> select * from salaries s1 where (select count(distinct(s2.salary)) from salaries s2 where s1.dept_no = s2.dept_no and s1.salary < s2.salary) < 2;
+----+--------+---------+--------+
| id | emp_no | dept_no | salary |
+----+--------+---------+--------+
|  1 |      1 |       1 |  10000 |
|  2 |      2 |       1 |  10000 |
|  3 |      3 |       1 |   8000 |
|  4 |      4 |       1 |   8000 |
|  6 |      6 |       2 |  10000 |
|  7 |      7 |       2 |   9000 |
|  9 |      9 |       3 |  10000 |
| 10 |     10 |       3 |   9000 |
| 11 |     11 |       3 |   9000 |
+----+--------+---------+--------+
9 rows in set (0.00 sec)

这语句要怎么理解呢?首先,如何定义薪水排名第1,换个说法就是薪水比我高的一个都没有,有0个。如何定义薪水排名第2,就是在薪水在我之上的,薪水去重后值只是1个,依此类推。上边语句把最末的”< 2″换成”in(0, 1)”估计就更好理解了。Top N就是要把排名第1,第2,…第N的取回来,in (0, 1, … , N – 1),换个简洁的写法”< N”。因此,如果把问题改为排名第N的,很显然,结果就是”in(N – 1)”。

同样的可以使用left join的方式实现:

mysql> select s1.* from salaries s1 left join salaries s2 on s1.dept_no = s2.dept_no and s1.salary < s2.salary group by s1.dept_no, s1.emp_no, s1.salary having(count(distinct(s2.salary))) < 2;
+----+--------+---------+--------+
| id | emp_no | dept_no | salary |
+----+--------+---------+--------+
|  1 |      1 |       1 |  10000 |
|  2 |      2 |       1 |  10000 |
|  3 |      3 |       1 |   8000 |
|  4 |      4 |       1 |   8000 |
|  6 |      6 |       2 |  10000 |
|  7 |      7 |       2 |   9000 |
|  9 |      9 |       3 |  10000 |
| 10 |     10 |       3 |   9000 |
| 11 |     11 |       3 |   9000 |
+----+--------+---------+--------+
9 rows in set (0.00 sec)

在这种情况下,分组后,count(distinct(s2.salary)) 就是比它大的薪水去重后有几个值。

mysql> select *, count(distinct(s2.salary)) from salaries s1 left join salaries s2 on s1.dept_no = s2.dept_no and s1.salary < s2.salary group by s1.dept_no, s1.emp_no, s1.salary;
+----+--------+---------+--------+------+--------+---------+--------+----------------------------+
| id | emp_no | dept_no | salary | id   | emp_no | dept_no | salary | count(distinct(s2.salary)) |
+----+--------+---------+--------+------+--------+---------+--------+----------------------------+
|  1 |      1 |       1 |  10000 | NULL |   NULL |    NULL |   NULL |                          0 |
|  2 |      2 |       1 |  10000 | NULL |   NULL |    NULL |   NULL |                          0 |
|  3 |      3 |       1 |   8000 |    2 |      2 |       1 |  10000 |                          1 |
|  4 |      4 |       1 |   8000 |    1 |      1 |       1 |  10000 |                          1 |
|  5 |      5 |       1 |   7000 |    1 |      1 |       1 |  10000 |                          2 |
|  6 |      6 |       2 |  10000 | NULL |   NULL |    NULL |   NULL |                          0 |
|  7 |      7 |       2 |   9000 |    6 |      6 |       2 |  10000 |                          1 |
|  8 |      8 |       2 |   8000 |    6 |      6 |       2 |  10000 |                          2 |
|  9 |      9 |       3 |  10000 | NULL |   NULL |    NULL |   NULL |                          0 |
| 10 |     10 |       3 |   9000 |    9 |      9 |       3 |  10000 |                          1 |
| 11 |     11 |       3 |   9000 |    9 |      9 |       3 |  10000 |                          1 |
| 12 |     12 |       3 |   8000 |   10 |     10 |       3 |   9000 |                          2 |
+----+--------+---------+--------+------+--------+---------+--------+----------------------------+
12 rows in set (0.00 sec)

一个变形题:求各部门员工薪水部门内排名,薪水相同的排名相同。有了上边基础,采用 left join 形式,so easy:

mysql> select s1.*, count(distinct(s2.salary))+1 as rank from salaries s1 left join salaries s2 on s1.dept_no = s2.dept_no and s1.salary < s2.salary group by s1.dept_no, s1.emp_no, s1.salary;
+----+--------+---------+--------+------+
| id | emp_no | dept_no | salary | rank |
+----+--------+---------+--------+------+
|  1 |      1 |       1 |  10000 |    1 |
|  2 |      2 |       1 |  10000 |    1 |
|  3 |      3 |       1 |   8000 |    2 |
|  4 |      4 |       1 |   8000 |    2 |
|  5 |      5 |       1 |   7000 |    3 |
|  6 |      6 |       2 |  10000 |    1 |
|  7 |      7 |       2 |   9000 |    2 |
|  8 |      8 |       2 |   8000 |    3 |
|  9 |      9 |       3 |  10000 |    1 |
| 10 |     10 |       3 |   9000 |    2 |
| 11 |     11 |       3 |   9000 |    2 |
| 12 |     12 |       3 |   8000 |    3 |
+----+--------+---------+--------+------+
12 rows in set (0.00 sec)

ddd

Leave a Reply

Your email address will not be published. Required fields are marked *

*

Time limit is exhausted. Please reload CAPTCHA.