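In a B-tree you can store both keys and data in the internal nodes and the leaf nodes, but in a B+ tree you have to store the data in the leaf nodes only.

Is there any advantage to doing this in a B+ tree?

Why not use B-trees everywhere instead of B+ trees, since intuitively they seem faster?

I mean, why do you need to replicate the key (data) in a B+ tree?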


Current answer

The primary distinction between a B-tree and a B+ tree is that a B-tree eliminates the redundant storage of search-key values. Since search keys are not repeated in the B-tree, we may be able to store the index using fewer tree nodes than in the corresponding B+ tree index. However, since search keys that appear in non-leaf nodes appear nowhere else in the B-tree, we are forced to include an additional pointer field for each search key in a non-leaf node. There are space advantages to the B-tree, since no repetition occurs, and it can be used for large indices.

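Other answers

Example from Database System Concepts

B+ tree

Corresponding B-tree

The image below helps show the differences between B+ trees and B trees.

Advantages of B+ trees: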

Because B+ trees don't have data associated with interior nodes, more keys can fit on a page of memory. Therefore, it will require fewer cache misses in order to access data that is on a leaf node. The leaf nodes of B+ trees are linked, so doing a full scan of all objects in a tree requires just one linear pass through all the leaf nodes. A B tree, on the other hand, would require a traversal of every level in the tree. This full-tree traversal will likely involve more cache misses than the linear traversal of B+ leaves.
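
The "more keys fit on a page" point can be made concrete with a little arithmetic. The sketch below is only an illustration: the page size, key size, pointer size, and per-row data size are all assumed values, not figures from any particular database.

```python
# All sizes below are assumptions chosen for illustration.
PAGE_SIZE = 4096     # bytes per page
KEY_SIZE = 8         # bytes per key
POINTER_SIZE = 8     # bytes per child pointer
RECORD_SIZE = 200    # bytes of row data stored next to a key


def entries_per_page(entry_size: int, page_size: int = PAGE_SIZE) -> int:
    """Number of fixed-size entries that fit in one page (headers ignored)."""
    return page_size // entry_size


# B+ tree interior node: each entry is just a key plus a child pointer.
bplus_interior = entries_per_page(KEY_SIZE + POINTER_SIZE)

# B tree interior node: each entry also carries the row data.
btree_interior = entries_per_page(KEY_SIZE + POINTER_SIZE + RECORD_SIZE)

print(bplus_interior)  # 256
print(btree_interior)  # 18
```

Under these assumed sizes, one page read in the B+ tree narrows the search about fourteen times more than one page read in the B tree.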

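Advantages of B trees:

Because B trees contain data with each key, frequently accessed nodes can lie closer to the root and can therefore be accessed more quickly.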


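Take an example: you have a table with a lot of data per row. That means every instance of the object is big.

If you use a B tree here, most of the time is spent scanning pages that contain data, which is of no use. In databases, that is the reason for using B+ trees: to avoid scanning the object data.

B+ trees separate keys from data.

But if your data size is small, you can store it with the key, as a B tree does.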

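One possible use of B+ trees is in situations where the tree grows so large that it does not fit into available memory. In that case you would generally expect to perform multiple I/Os. It does often happen that a B+ tree is used even when it in fact fits into memory, and then your cache manager might keep it there permanently. But this is a special case, not the general one, and caching policy is separate from B+ tree maintenance as such.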

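Also, in a B+ tree the leaf pages are linked together in a linked list (or doubly linked list), which optimizes traversals (for range searches, sorting, and so on). So the number of pointers is a function of the specific algorithm that is used.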
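
To illustrate how linked leaves help a range search, here is a minimal sketch. The names `Leaf`, `build_leaf_chain`, and `range_scan` are hypothetical, and the interior nodes are left out entirely: a real B+ tree would first descend from the root to find the starting leaf, whereas this sketch simply starts at the first leaf.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Optional, Tuple


@dataclass
class Leaf:
    """A leaf page: sorted keys, their values, and a link to the next leaf."""
    keys: List[int] = field(default_factory=list)
    values: List[str] = field(default_factory=list)
    next_leaf: Optional["Leaf"] = None


def build_leaf_chain(items: Iterable[Tuple[int, str]], leaf_capacity: int = 3) -> Leaf:
    """Pack (key, value) pairs into linked leaves and return the first leaf."""
    ordered = sorted(items)
    head: Optional[Leaf] = None
    prev: Optional[Leaf] = None
    for i in range(0, len(ordered), leaf_capacity):
        chunk = ordered[i:i + leaf_capacity]
        leaf = Leaf([k for k, _ in chunk], [v for _, v in chunk])
        if prev is None:
            head = leaf
        else:
            prev.next_leaf = leaf
        prev = leaf
    return head if head is not None else Leaf()


def range_scan(start: Leaf, lo: int, hi: int) -> Iterator[Tuple[int, str]]:
    """Yield (key, value) pairs with lo <= key <= hi by following leaf links."""
    leaf: Optional[Leaf] = start
    while leaf is not None:
        i = bisect_left(leaf.keys, lo)  # skip keys below lo within this leaf
        for k, v in zip(leaf.keys[i:], leaf.values[i:]):
            if k > hi:
                return  # keys are sorted, so nothing further can match
            yield k, v
        leaf = leaf.next_leaf


first = build_leaf_chain((k, f"row{k}") for k in range(1, 11))
print(list(range_scan(first, 4, 8)))
```

Once the scan reaches the first matching leaf, it only moves sideways through the chain and never goes back up to an interior node.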

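B+ trees are especially good on block-based storage (e.g. hard disks). With that in mind, you get several advantages, for example (off the top of my head):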

- high fanout / low depth: that means you have to fetch fewer blocks to get to the data. With data intermingled with the pointers, each read gets fewer pointers, so you need more seeks to get to the data.
- simple and consistent block storage: an inner node has N pointers and nothing else; a leaf node has data and nothing else. That makes it easy to parse, debug, and even reconstruct.
- high key density means the top nodes are almost certainly in cache; in many cases all inner nodes get cached quickly, so only the data access has to go to disk.
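
The link between fanout and depth can be sketched with a rough estimate. The function name `levels_needed` and the two fanout values are assumptions for illustration (a high fanout for nodes that hold only keys and pointers, a low one for nodes that also hold row data); real trees are rarely completely full, so actual depths may be somewhat larger.

```python
def levels_needed(num_keys: int, fanout: int) -> int:
    """Smallest number of levels h such that fanout ** h >= num_keys."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    levels = 1
    capacity = fanout
    while capacity < num_keys:
        capacity *= fanout  # each extra level multiplies the reachable keys
        levels += 1
    return levels


NUM_KEYS = 10_000_000  # assumed table size

print(levels_needed(NUM_KEYS, 256))  # 3
print(levels_needed(NUM_KEYS, 18))   # 6
```

Each level is potentially one block read, so under these assumptions the high-fanout tree needs about half as many reads per lookup, and its few upper levels are small enough to stay cached.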