Background
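Anyone familiar with PostgreSQL knows that snapshot acquisition easily becomes a performance bottleneck under high concurrency. The reason is that PG keeps the state of all transactions in a global array in shared memory, and taking a snapshot requires a lock to keep that data consistent. A backend taking a snapshot must hold ProcArrayLock in shared mode while it scans the active transactions in the ProcArray. At the same time, every transaction that commits or rolls back must acquire ProcArrayLock in exclusive mode to remove itself from the array. Under high concurrency, contention on ProcArrayLock therefore becomes the bottleneck of the database. To overcome this, PolarDB introduces a CSN (Commit Sequence Number) snapshot mechanism that avoids acquiring ProcArrayLock.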
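1 CSN mechanism
1.1 How CSN works
At the transaction layer, PolarDB replaces the native PG snapshot with a CSN snapshot. As the figure shows, every non-read-only transaction is assigned an xid while it runs. When the transaction commits, the CSN is advanced, and the mapping between the current CSN and the transaction's XID is saved. The solid vertical line in the figure marks the moment the snapshot is taken: the snapshot gets the value after the most recently committed CSN, which is 4. TX1, TX3, and TX5 have already committed, with CSNs 1, 2, and 3. TX2, TX4, and TX6 are still running, and TX7 and TX8 are future transactions that have not started yet. For the current snapshot, the results of transactions whose CSN is strictly less than 4 are visible; all other transactions have not committed and are not visible.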
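The visibility rule in this example fits in a few lines of C. This is a minimal sketch, not PolarDB code: the type `csn_t`, the constant `CSN_INPROGRESS`, and the function `csn_visible` are hypothetical names chosen for illustration, and the value 0 is assumed to mean "no commit CSN yet".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t csn_t;

/* Hypothetical marker: the transaction has no commit CSN yet. */
#define CSN_INPROGRESS ((csn_t) 0)

/*
 * A transaction's result is visible to a snapshot only if the transaction
 * committed with a CSN strictly smaller than the snapshot's CSN.
 */
static bool
csn_visible(csn_t xact_csn, csn_t snapshot_csn)
{
    assert(snapshot_csn != CSN_INPROGRESS);

    if (xact_csn == CSN_INPROGRESS)
        return false;           /* still running, or not started yet */

    return xact_csn < snapshot_csn;
}
```

With a snapshot CSN of 4, `csn_visible` returns true for TX1, TX3, and TX5 (CSNs 1, 2, 3) and false for TX2, TX4, and TX6, which have no commit CSN at that point.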
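1.2 CSN implementation
A CSN (Commit Sequence Number) is kept in a mapping with its XID (transaction ID) so that a transaction can be associated with its visibility. This mapping is stored in the CSNLog. In the example, transaction IDs 2048, 2049, 2050, 2051, 2052, and 2053 map to CSNs 5, 4, 7, 10, 6, and 8, so the commit order is 2049, 2048, 2052, 2050, 2053, 2051. PolarDB allocates an 8-byte (uint64) CSN for every transaction ID, so one 8 kB page holds the CSNs of 1k transactions. Once the CSNLOG reaches a certain size it is split into blocks, and each CSNLOG file block is 256 kB. As with xids, a few special CSN values are reserved. CSNLOG is defined in the PolarDB source code.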
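The sizing arithmetic above (8-byte entries, 8 kB pages, 256 kB file blocks) can be checked with a short sketch. All names here (`xid_t`, `csn_t`, the `CSNLOG_*` macros, `xid_to_page`, and so on) are hypothetical and only mirror the numbers in the text; the way PolarDB actually maps an xid to a page and file block is assumed to follow the usual divide-and-modulo scheme and is not taken from its source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t xid_t;  /* stand-in for TransactionId */
typedef uint64_t csn_t;  /* stand-in for CommitSeqNo: 8 bytes per transaction */

/* Sizes taken from the text; macro names are illustrative only. */
#define CSNLOG_PAGE_BYTES        8192u
#define CSNLOG_CSNS_PER_PAGE     (CSNLOG_PAGE_BYTES / (uint32_t) sizeof(csn_t))
#define CSNLOG_SEGMENT_BYTES     (256u * 1024u)
#define CSNLOG_PAGES_PER_SEGMENT (CSNLOG_SEGMENT_BYTES / CSNLOG_PAGE_BYTES)

/* Page that holds the CSN of a given xid (assumed layout). */
static uint32_t
xid_to_page(xid_t xid)
{
    return xid / CSNLOG_CSNS_PER_PAGE;
}

/* Index of the xid's CSN inside its page. */
static uint32_t
xid_to_entry(xid_t xid)
{
    return xid % CSNLOG_CSNS_PER_PAGE;
}

/* 256 kB file block that holds the page. */
static uint32_t
xid_to_segment(xid_t xid)
{
    return xid_to_page(xid) / CSNLOG_PAGES_PER_SEGMENT;
}

/* Read the CSN of xid from an in-memory copy of its page. */
static csn_t
page_get_csn(const csn_t *page, xid_t xid)
{
    assert(page != NULL);
    return page[xid_to_entry(xid)];
}
```

Under these assumptions a page holds 1024 CSNs and a file block holds 32 pages, so xids 2048 through 2053 from the example all sit at the start of page 2 in the first file block.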
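2 CSN snapshots and visibility checks
2.1 CSN data structures
The polar_csn_mvcc_var_cache struct maintains the xid of the oldest active transaction, the next CSN to be assigned, and the xid of the most recently completed transaction. When another transaction needs the CSN status of a transaction that is in the middle of committing, it waits for that commit to finish by acquiring CommitSeqNoLock in exclusive mode. CSNLogControlLock protects writes to the csnlog file.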
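2.2 Acquiring a CSN snapshot
In PolarDB the function that takes a CSN snapshot is GetSnapshotDataCSN. It works as follows:
1. Read polar_shmem_csn_mvcc_var_cache->polar_next_csn and use it as snapshot->polar_snapshot_csn.
2. snapshot->xmin = polar_shmem_csn_mvcc_var_cache->polar_oldest_active_xid
3. snapshot->xmax = polar_shmem_csn_mvcc_var_cache->polar_latest_completed_xid + 1
4. Depending on the GUC parameter old_snapshot_threshold, decide whether snapshot->lsn and snapshot->whenTaken need to be set.
5. Finally, the GUC parameter polar_csn_xid_snapshot decides whether an xid snapshot is generated from the CSN snapshot.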
static Snapshot
GetSnapshotDataCSN(Snapshot snapshot)
{
    TransactionId xmin;
    TransactionId xmax;
    CommitSeqNo snapshotcsn;

    Assert(snapshot != NULL);

    /*
     * The ProcArrayLock is not needed here. We only set our xmin if
     * it's not already set. There are only a few functions that check
     * the xmin under exclusive ProcArrayLock:
     * 1) ProcArrayInstallRestored/ImportedXmin -- can only care about
     * our xmin long after it has been first set.
     * 2) ProcArrayEndTransaction is not called concurrently with
     * GetSnapshotData.
     */

    /* Anything older than oldestActiveXid is surely finished by now. */
    xmin = pg_atomic_read_u32(&polar_shmem_csn_mvcc_var_cache->polar_oldest_active_xid);

    /* If no performance issue, we try best to maintain RecentXmin for xid based snapshot */
    RecentXmin = xmin;

    /* Announce my xmin, to hold back GlobalXmin. */
    if (!TransactionIdIsValid(MyPgXact->xmin))
    {
        TransactionId oldest_active_xid;

        MyPgXact->xmin = xmin;
        TransactionXmin = xmin;

        /*
         * Recheck, if oldestActiveXid advanced after we read it.
         *
         * This protects against a race condition with GetRecentGlobalXmin().
         * If a transaction ends runs GetRecentGlobalXmin(), just after we fetch
         * polar_oldest_active_xid, but before we set MyPgXact->xmin, it's possible
         * that GetRecentGlobalXmin() computed a new GlobalXmin that doesn't
         * cover the xmin that we got. To fix that, check polar_oldest_active_xid
         * again, after setting xmin. Redoing it once is enough, we don't need
         * to loop, because the (stale) xmin that we set prevents the same
         * race condition from advancing RecentGlobalXmin again.
         *
         * For a brief moment, we can have the situation that our xmin is
         * lower than RecentGlobalXmin, but it's OK because we don't use that xmin
         * until we've re-checked and corrected it if necessary.
         */

        /*
         * memory barrier to make sure that setting the xmin in our PGPROC entry
         * is made visible to others, before the read below.
         */
        pg_memory_barrier();

        oldest_active_xid = pg_atomic_read_u32(&polar_shmem_csn_mvcc_var_cache->polar_oldest_active_xid);
        if (oldest_active_xid != xmin)
        {
            /*no cover begin*/
            xmin = oldest_active_xid;

            RecentXmin = xmin;
            MyPgXact->xmin = xmin;
            TransactionXmin = xmin;
            /*no cover end*/
        }
    }

    /*
     * Get the current snapshot CSN. This
     * serializes us with any concurrent commits.
     */
    snapshotcsn = pg_atomic_read_u64(&polar_shmem_csn_mvcc_var_cache->polar_next_csn);

    /*
     * Also get xmax. It is always latestCompletedXid + 1.
     * Make sure to read it after CSN (see TransactionIdAsyncCommitTree())
     */
    pg_read_barrier();
    xmax = pg_atomic_read_u32(&polar_shmem_csn_mvcc_var_cache->polar_latest_completed_xid);
    Assert(TransactionIdIsNormal(xmax));
    TransactionIdAdvance(xmax);

    snapshot->xmin = xmin;
    snapshot->xmax = xmax;
    snapshot->polar_snapshot_csn = snapshotcsn;
    snapshot->polar_csn_xid_snapshot = false;
    snapshot->xcnt = 0;
    snapshot->subxcnt = 0;
    snapshot->suboverflowed = false;
    snapshot->curcid = GetCurrentCommandId(false);

    /*
     * This is a new snapshot, so set both refcounts are zero, and mark it as
     * not copied in persistent memory.
     */
    snapshot->active_count = 0;
    snapshot->regd_count = 0;
    snapshot->copied = false;

    if (old_snapshot_threshold < 0)
    {
        /*
         * If not using "snapshot too old" feature, fill related fields with
         * dummy values that don't require any locking.
         */
        snapshot->lsn = InvalidXLogRecPtr;
        snapshot->whenTaken = 0;
    }
    else
    {
        /*
         * Capture the current time and WAL stream location in case this
         * snapshot becomes old enough to need to fall back on the special
         * "old snapshot" logic.
         */
        snapshot->lsn = GetXLogInsertRecPtr();
        snapshot->whenTaken = GetSnapshotCurrentTimestamp();
        MaintainOldSnapshotTimeMapping(snapshot->whenTaken, xmin);
    }

    /*
     * We get RecentGlobalXmin/RecentGlobalDataXmin lazily in polar csn.
     * In master mode, we reset it when end transaction;
     * In hot standby mode, wal replayed by startup backend, we has to reset
     * it when get snapshot,
     * because RecentGlobalXmin/RecentGlobalDataXmin are backend variables.
     */
    if (RecoveryInProgress())
        resetGlobalXminCacheCSN();

    /*
     * We need xid snapshot, should generate it from csn snapshot.
     * The logic is:
     * 1. Scan csnlog from xmin(inclusive) to xmax(exclusive)
     * 2. Add xids whose status are in_progress or committing or
     *    committed csn >= snapshotcsn to xid array
     * Like hot standby, we don't know which xids are top-level and which are
     * subxacts. So we use subxip to store xids as more as possible.
     */
    if (polar_csn_xid_snapshot)
    {
        if (TransactionIdPrecedes(xmin, xmax))
            polar_csnlog_get_running_xids(xmin, xmax, snapshotcsn, GetMaxSnapshotSubxidCount(),
                                          &snapshot->subxcnt, snapshot->subxip, &snapshot->suboverflowed);
        snapshot->polar_csn_xid_snapshot = true;
    }

    return snapshot;
}

2.3 MVCC visibility check flow
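Combining the xmin and xmax fields of the tuple header with the Clog and the CSNLOG mapping described above, the MVCC check proceeds roughly as shown in the figure. It is implemented by HeapTupleSatisfiesMVCC. Whether an xid is visible in a CSN snapshot is decided by XidVisibleInSnapshotCSN, whose logic is shown in the flowchart.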
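To make the xid-level check concrete, here is a simplified sketch of how an xid could be tested against a CSN snapshot. It is not the body of XidVisibleInSnapshotCSN: the types, the special values `CSN_INPROGRESS`, `CSN_ABORTED`, and `CSN_FIRST_NORMAL`, and the in-memory `toy_csnlog` are all hypothetical. The sketch also assumes no xid wraparound, no subtransactions, and no "committing" intermediate state. The example data reuses the xid-to-CSN mapping from section 1.2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t xid_t;
typedef uint64_t csn_t;

/* Hypothetical reserved CSN values; the real ones are not given in the text. */
#define CSN_INPROGRESS   ((csn_t) 0)
#define CSN_ABORTED      ((csn_t) 1)
#define CSN_FIRST_NORMAL ((csn_t) 2)

typedef struct
{
    xid_t xmin;          /* every xid below this has finished */
    xid_t xmax;          /* every xid at or above this is treated as running */
    csn_t snapshot_csn;  /* commits with a smaller CSN are visible */
} csn_snapshot;

/* Toy CSN log: csns[i] is the CSN of xid base_xid + i. */
typedef struct
{
    xid_t        base_xid;
    size_t       count;
    const csn_t *csns;
} toy_csnlog;

static csn_t
toy_csnlog_get(const toy_csnlog *log, xid_t xid)
{
    assert(xid >= log->base_xid);
    assert((size_t) (xid - log->base_xid) < log->count);
    return log->csns[xid - log->base_xid];
}

static bool
xid_visible_in_snapshot(const toy_csnlog *log, const csn_snapshot *snap, xid_t xid)
{
    csn_t csn;

    /* Started after the snapshot's upper bound: never visible. */
    if (xid >= snap->xmax)
        return false;

    csn = toy_csnlog_get(log, xid);

    /* In progress or aborted: not visible. */
    if (csn < CSN_FIRST_NORMAL)
        return false;

    /* Finished before the snapshot's lower bound and committed: visible. */
    if (xid < snap->xmin)
        return true;

    /* Committed inside the window: compare commit order with the snapshot. */
    return csn < snap->snapshot_csn;
}

/* Example data: xids 2048..2053 with the CSNs from section 1.2. */
static const csn_t example_csns[] = {5, 4, 7, 10, 6, 8};
static const toy_csnlog example_log = {2048, 6, example_csns};

/* A snapshot taken when the next CSN was 7 (2050, 2051, 2053 still running). */
static const csn_snapshot example_snap = {2050, 2053, 7};
```

For `example_snap`, xids 2048, 2049, and 2052 (CSNs 5, 4, 6) are visible, while 2050, 2051, and 2053 are not. The log here holds the final CSNs, but the answer is the same as at snapshot time because a CSN of 7 or more is never smaller than the snapshot CSN.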
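2.4 How commit and abort update the CSN
Taking a CSN snapshot relies mainly on the members maintained in the polar_shmem_csn_mvcc_var_cache variable (see the snapshot acquisition section above). The focus here is therefore on how those members are updated when a transaction commits or aborts.
The AdvanceOldestActiveXidCSN function advances polar_shmem_csn_mvcc_var_cache->polar_oldest_active_xid. This value needs to be advanced when a backend exits, after a transaction commits or rolls back, and when a standby replays commit and abort records. The value is advanced only when the transaction's xid equals polar_shmem_csn_mvcc_var_cache->polar_oldest_active_xid; otherwise the function returns immediately.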
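The "advance only if I am the oldest" rule can be sketched with C11 atomics. This is an illustration under stated assumptions, not the PolarDB function: `toy_mvcc_cache`, `advance_oldest_active_xid`, and `CSN_INPROGRESS` are hypothetical names, and the text does not say how the next oldest active xid is found. The forward scan over a CSN array is an assumption made for this sketch, which also ignores xid wraparound and concurrent callers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t xid_t;
typedef uint64_t csn_t;

/* Hypothetical marker for "no CSN assigned yet". */
#define CSN_INPROGRESS ((csn_t) 0)

/* Toy stand-in for polar_shmem_csn_mvcc_var_cache. */
typedef struct
{
    _Atomic xid_t oldest_active_xid;
    _Atomic csn_t next_csn;
    _Atomic xid_t latest_completed_xid;
} toy_mvcc_cache;

/*
 * Called after my_xid has finished (its slot in csns no longer holds
 * CSN_INPROGRESS). csns[i] is the CSN of xid base_xid + i, and next_xid
 * is the first xid that has not been assigned yet.
 */
static void
advance_oldest_active_xid(toy_mvcc_cache *cache, const csn_t *csns,
                          xid_t base_xid, xid_t next_xid, xid_t my_xid)
{
    xid_t xid;

    assert(my_xid >= base_xid && my_xid < next_xid);

    /* Someone older is still active: nothing to advance. */
    if (atomic_load(&cache->oldest_active_xid) != my_xid)
        return;

    /* Assumed strategy: skip forward over xids that have already finished. */
    xid = my_xid + 1;
    while (xid < next_xid && csns[xid - base_xid] != CSN_INPROGRESS)
        xid++;

    atomic_store(&cache->oldest_active_xid, xid);
}
```

If xids 2048 and 2049 have finished and 2050 is still running, a call for 2049 made while 2048 is still recorded as the oldest leaves the value untouched, and a call for 2048 moves it to 2050.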
On rollback, polar_xact_abort_tree_csn sets the transaction's CSN to POLAR_CSN_ABORTED and advances polar_shmem_csn_mvcc_var_cache->polar_latest_completed_xid. On commit, polar_xact_commit_tree_csn sets the transaction's CSN and advances both polar_shmem_csn_mvcc_var_cache->polar_latest_completed_xid and polar_shmem_csn_mvcc_var_cache->polar_next_csn.
polar_shmem_csn_mvcc_var_cache->polar_next_csn is advanced only by commits; a rolled-back transaction does not advance it.
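The difference between the two paths can be sketched as follows. This is a single-threaded illustration, not the PolarDB implementation: `toy_mvcc_cache`, `xact_commit`, `xact_abort`, and the value chosen for `CSN_ABORTED` are hypothetical, and the sketch leaves out the intermediate "committing" state, subtransactions, memory-ordering details, and xid wraparound.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t xid_t;
typedef uint64_t csn_t;

/* Hypothetical value; the text only names POLAR_CSN_ABORTED. */
#define CSN_ABORTED ((csn_t) 1)

/* Toy stand-in for polar_shmem_csn_mvcc_var_cache. */
typedef struct
{
    _Atomic xid_t oldest_active_xid;
    _Atomic csn_t next_csn;
    _Atomic xid_t latest_completed_xid;
} toy_mvcc_cache;

/* Move latest_completed_xid forward, never backward. */
static void
advance_latest_completed_xid(toy_mvcc_cache *cache, xid_t xid)
{
    if (xid > atomic_load(&cache->latest_completed_xid))
        atomic_store(&cache->latest_completed_xid, xid);
}

/* Commit: take the next CSN, record it for xid, and advance both counters. */
static csn_t
xact_commit(toy_mvcc_cache *cache, csn_t *csn_slot, xid_t xid)
{
    csn_t csn;

    assert(csn_slot != NULL);

    csn = atomic_fetch_add(&cache->next_csn, 1);
    *csn_slot = csn;
    advance_latest_completed_xid(cache, xid);
    return csn;
}

/* Abort: mark the xid aborted; next_csn is left untouched. */
static void
xact_abort(toy_mvcc_cache *cache, csn_t *csn_slot, xid_t xid)
{
    assert(csn_slot != NULL);

    *csn_slot = CSN_ABORTED;
    advance_latest_completed_xid(cache, xid);
}
```

Here `csn_slot` stands for the transaction's entry in the CSN log. Because an abort never consumes a CSN, snapshots taken before and after a rollback carry the same snapshot CSN, which is consistent with aborted work never becoming visible.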
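Once the CSN feature is enabled, only ShmemVariableCache->nextXid is still updated among the members of ShmemVariableCache, the global variable PG originally uses to manage xid allocation, and it is used for assigning xids. Members such as ShmemVariableCache->latestCompletedXid have been replaced by polar_shmem_csn_mvcc_var_cache->polar_latest_completed_xid, so their values no longer need to be maintained when transaction state changes.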