What is a Semi-Join?
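Semi-join: when a row of the outer table finds a match in the inner table, the semi-join returns that outer row. Even if several matching rows exist in the inner table, the outer row is returned only once. A subquery, by contrast, is executed once for every qualifying tuple of the outer table, which is inefficient. Using a semi-join to optimize the subquery reduces the number of lookups and improves query performance. The main idea is to pull the subquery up into the parent query, so that the inner and outer tables become siblings and each qualifying outer tuple only has to find a matching tuple in the inner table.

When the expression used in an equi-join contains duplicate values, and no other column of that table is needed (only the join column or expression is used), the join only has to look at the first row for each value and can then jump to the next value. Databases commonly use this to optimize in, exists and any(), and in its anti-join form not exists and except (or any other join where the same logic holds).

For other special joins, see "PostgreSQL and relational algebra (Equi-Join, Semi-Join, Anti-Join, Division)".

Not every database implements semi-join for every scenario. See, for example, semi-joins in Oracle; MySQL also has semi-joins.

If a semi-join is not implemented, how can it be simulated? With recursion, group by, distinct on, or distinct.

Semi-Join example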
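Before the full-size test, a small sketch of the semantics described above may help. It contrasts a plain inner join with an exists semi-join. The tables `outer_t` and `inner_t` and their contents are made up for this illustration and are not part of the test that follows.

```sql
-- Hypothetical demo tables (names and data are assumptions for illustration).
create temp table outer_t (id int, info text);
create temp table inner_t (id int);

insert into outer_t values (1, 'one'), (2, 'two'), (3, 'three');
-- id 1 appears three times in the inner table, id 3 never appears.
insert into inner_t values (1), (1), (1), (2);

-- Inner join: one output row per matching pair, so id 1 is repeated.
select o.* from outer_t o join inner_t i on o.id = i.id order by o.id;

-- Semi-join: each outer row is returned at most once,
-- however many inner rows match it.
select o.* from outer_t o
where exists (select 1 from inner_t i where i.id = o.id)
order by o.id;
```

With this data the inner join returns four rows (id 1 three times, id 2 once), while the exists form returns two rows (id 1 and id 2).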
Prepare the test data

postgres=# create table a (id int, info text, ts timestamp);
CREATE TABLE
postgres=# create table b (like a);
CREATE TABLE
postgres=# insert into a select id, md5(random()::text), now() from generate_series(0,1000000) as t(id);
INSERT 0 1000001
-- Among the roughly 1 million rows of table b, b.id has only 11 distinct values
postgres=# insert into b select random()*10, md5(random()::text), now() from generate_series(0,1000000) as t(id);
INSERT 0 1000001
postgres=# create index on a (id);
CREATE INDEX
postgres=# create index on b (id);
CREATE INDEX

Unoptimized SQL
select a.* from a where exists (select 1 from b where a.id=b.id);

postgres=# explain analyze select a.* from a where exists (select 1 from b where a.id=b.id);
                                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=18436.17..18436.66 rows=11 width=45) (actual time=226.590..226.598 rows=11 loops=1)
   Merge Cond: (a.id = b.id)
   ->  Index Scan using a_id_idx on a  (cost=0.42..27366.04 rows=1000001 width=45) (actual time=0.010..0.013 rows=12 loops=1)
   ->  Sort  (cost=18435.74..18435.77 rows=11 width=4) (actual time=226.576..226.577 rows=11 loops=1)
         Sort Key: b.id
         Sort Method: quicksort  Memory: 25kB
         ->  HashAggregate  (cost=18435.44..18435.55 rows=11 width=4) (actual time=226.568..226.570 rows=11 loops=1)
               Group Key: b.id
               Batches: 1  Memory Usage: 24kB
               ->  Index Only Scan using b_id_idx on b  (cost=0.42..15935.44 rows=1000001 width=4) (actual time=0.010..77.936 rows=1000001 loops=1)
                     Heap Fetches: 0
 Planning Time: 0.189 ms
 Execution Time: 226.630 ms
(13 rows)

The query above does not use a semi-join: the planner deduplicates b.id with a HashAggregate over all 1,000,001 index entries of b, and performance is mediocre.
Since b.id has only 11 distinct values among the roughly 1 million rows of table b, a semi-join can be used to speed this up.
For usage, see: 《用PostgreSQL找回618秒逝去的青春 - 递归收敛优化》 (on recursive convergence optimization).
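Whether this kind of rewrite pays off depends on how few distinct values the join column has. As a rough sketch of how that could be checked, the following compares an exact count with the planner's estimate in the `pg_stats` view. The table `b_stats_demo` is a hypothetical stand-in for b with made-up data, not the table used in this test.

```sql
-- Hypothetical stand-in for table b: 11000 rows, 11 distinct id values.
create temp table b_stats_demo as
  select (i % 11) as id from generate_series(1, 11000) as t(i);

-- Autovacuum does not analyze temporary tables, so collect statistics explicitly.
analyze b_stats_demo;

-- Exact figures, computed by scanning the table.
select count(*) as total_rows, count(distinct id) as distinct_ids
from b_stats_demo;

-- The planner's estimate taken from the collected statistics.
select n_distinct
from pg_stats
where tablename = 'b_stats_demo' and attname = 'id';
```

In `pg_stats`, a positive `n_distinct` is an estimated number of distinct values, and a negative one is the negated ratio of distinct values to rows. A small positive value relative to the row count suggests the column is a candidate for the recursive technique.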
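Simulating the semi-join with recursion takes only 0.171 ms to obtain the 11 distinct values of b.id. The first branch fetches the smallest id, and each recursive step fetches the smallest id greater than the previous one, so every step is a single index probe.

with recursive tmp as (
  select min(id) as id from b
  union all
  select (select min(b.id) from b where b.id > tmp.id) from tmp where tmp.id is not null
)
select * from tmp where tmp.id is not null;
 id
----
  0
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
(11 rows)

The execution plan is as follows: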
postgres=# explain analyze with recursive tmp as (
  select min(id) as id from b
  union all
  select (select min(b.id) from b where b.id > tmp.id) from tmp where tmp.id is not null
)
select * from tmp where tmp.id is not null;
                                                                          QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 CTE Scan on tmp  (cost=50.07..52.09 rows=100 width=4) (actual time=0.028..0.134 rows=11 loops=1)
   Filter: (id IS NOT NULL)
   Rows Removed by Filter: 1
   CTE tmp
     ->  Recursive Union  (cost=0.44..50.07 rows=101 width=4) (actual time=0.025..0.126 rows=12 loops=1)
           ->  Result  (cost=0.44..0.45 rows=1 width=4) (actual time=0.024..0.025 rows=1 loops=1)
                 InitPlan 3 (returns $1)
                   ->  Limit  (cost=0.42..0.44 rows=1 width=4) (actual time=0.021..0.022 rows=1 loops=1)
                         ->  Index Only Scan using b_id_idx on b b_1  (cost=0.42..18435.44 rows=1000001 width=4) (actual time=0.020..0.020 rows=1 loops=1)
                               Index Cond: (id IS NOT NULL)
                               Heap Fetches: 0
           ->  WorkTable Scan on tmp tmp_1  (cost=0.00..4.76 rows=10 width=4) (actual time=0.007..0.007 rows=1 loops=12)
                 Filter: (id IS NOT NULL)
                 Rows Removed by Filter: 0
                 SubPlan 2
                   ->  Result  (cost=0.45..0.46 rows=1 width=4) (actual time=0.007..0.007 rows=1 loops=11)
                         InitPlan 1 (returns $3)
                           ->  Limit  (cost=0.42..0.45 rows=1 width=4) (actual time=0.006..0.006 rows=1 loops=11)
                                 ->  Index Only Scan using b_id_idx on b  (cost=0.42..6979.51 rows=333334 width=4) (actual time=0.006..0.006 rows=1 loops=11)
                                       Index Cond: ((id IS NOT NULL) AND (id > tmp_1.id))
                                       Heap Fetches: 0
 Planning Time: 0.177 ms
 Execution Time: 0.171 ms
(23 rows)

To simulate the semi-join with recursion, the SQL is rewritten as follows:
select a.* from a where exists (select 1 from b where a.id=b.id);

is rewritten as

select a.* from a where exists (select 1 from
(
with recursive tmp as (
  select min(id) as id from b
  union all
  select (select min(b.id) from b where b.id > tmp.id) from tmp where tmp.id is not null
)
select * from tmp where tmp.id is not null
) b where a.id=b.id);

After the rewrite, execution time drops from 226.630 ms to 0.246 ms.
postgres=# explain analyze select a.* from a where exists (select 1 from
(
with recursive tmp as (
  select min(id) as id from b
  union all
  select (select min(b.id) from b where b.id > tmp.id) from tmp where tmp.id is not null
)
select * from tmp where tmp.id is not null
) b where a.id=b.id);
                                                                                QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=53.76..318.49 rows=100 width=45) (actual time=0.154..0.189 rows=11 loops=1)
   ->  HashAggregate  (cost=53.34..54.34 rows=100 width=4) (actual time=0.144..0.149 rows=11 loops=1)
         Group Key: tmp.id
         Batches: 1  Memory Usage: 24kB
         ->  CTE Scan on tmp  (cost=50.07..52.09 rows=100 width=4) (actual time=0.027..0.139 rows=11 loops=1)
               Filter: (id IS NOT NULL)
               Rows Removed by Filter: 1
               CTE tmp
                 ->  Recursive Union  (cost=0.44..50.07 rows=101 width=4) (actual time=0.024..0.130 rows=12 loops=1)
                       ->  Result  (cost=0.44..0.45 rows=1 width=4) (actual time=0.023..0.024 rows=1 loops=1)
                             InitPlan 3 (returns $1)
                               ->  Limit  (cost=0.42..0.44 rows=1 width=4) (actual time=0.020..0.021 rows=1 loops=1)
                                     ->  Index Only Scan using b_id_idx on b b_1  (cost=0.42..18435.44 rows=1000001 width=4) (actual time=0.019..0.019 rows=1 loops=1)
                                           Index Cond: (id IS NOT NULL)
                                           Heap Fetches: 0
                       ->  WorkTable Scan on tmp tmp_1  (cost=0.00..4.76 rows=10 width=4) (actual time=0.008..0.008 rows=1 loops=12)
                             Filter: (id IS NOT NULL)
                             Rows Removed by Filter: 0
                             SubPlan 2
                               ->  Result  (cost=0.45..0.46 rows=1 width=4) (actual time=0.007..0.007 rows=1 loops=11)
                                     InitPlan 1 (returns $3)
                                       ->  Limit  (cost=0.42..0.45 rows=1 width=4) (actual time=0.006..0.006 rows=1 loops=11)
                                             ->  Index Only Scan using b_id_idx on b  (cost=0.42..6979.51 rows=333334 width=4) (actual time=0.006..0.006 rows=1 loops=11)
                                                   Index Cond: ((id IS NOT NULL) AND (id > tmp_1.id))
                                                   Heap Fetches: 0
   ->  Index Scan using a_id_idx on a  (cost=0.42..2.63 rows=1 width=45) (actual time=0.003..0.003 rows=1 loops=11)
         Index Cond: (id = tmp.id)
 Planning Time: 0.295 ms
 Execution Time: 0.246 ms
(29 rows)
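Besides recursion, the overview names group by, distinct on and distinct as ways to simulate a semi-join. The following is a minimal sketch of those three forms for comparison. The tables `a_small` and `b_small` are hypothetical, scaled-down stand-ins for a and b, and no timings are claimed for them. Each form first reduces the inner table to its distinct id values and then joins.

```sql
-- Small stand-ins for tables a and b (hypothetical names, made-up sizes).
create temp table a_small as
  select id, md5(id::text) as info from generate_series(0, 999) as t(id);
create temp table b_small as
  select (i % 11) as id from generate_series(1, 5000) as t(i);

-- 1. distinct
select a.* from a_small a
join (select distinct id from b_small) b on a.id = b.id;

-- 2. group by
select a.* from a_small a
join (select id from b_small group by id) b on a.id = b.id;

-- 3. distinct on (PostgreSQL-specific syntax)
select a.* from a_small a
join (select distinct on (id) id from b_small order by id) b on a.id = b.id;
```

All three return the same 11 rows as the exists form on this data. Unlike the recursive rewrite, they still have to deduplicate the inner table by reading all of its rows or index entries, much as the HashAggregate does in the unoptimized plan, so they are likely to help less when the inner table is large and has few distinct values.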