范围查询
我们已经实现了KV存储顶部的表结构,我们能够通过主键检索记录。在本章中,我们将添加以排序顺序检索一系列记录的能力。
B-树迭代器
第一步是将范围查询添加到B树中。
BIter
类型允许我们迭代地遍历B树。// B-tree iterator type BIter struct { tree *BTree path []BNode // from root to leaf pos []uint16 // indexes into nodes } // get the current KV pair func (iter *BIter) Deref() ([]byte, []byte) // precondition of the Deref() func (iter *BIter) Valid() bool // moving backward and forward func (iter *BIter) Prev() func (iter *BIter) Next()
BIter
是从根结点到叶结点中的KV对的路径。移动迭代器只是将位置或节点移动到同级。:func iterPrev(iter *BIter, level int) { if iter.pos[level] > 0 { iter.pos[level]-- // move within this node } else if level > 0 { iterPrev(iter, level-1) // move to a slibing node } else { return // dummy key } if level+1 < len(iter.pos) { // update the kid node node := iter.path[level] kid := iter.tree.get(node.getPtr(iter.pos[level])) iter.path[level+1] = kid iter.pos[level+1] = kid.nkeys() - 1 } } func (iter *BIter) Prev() { iterPrev(iter, len(iter.path)-1) }
B树SeekLE是在范围查询中查找初始位置的函数。它只是一个记录路径的普通B树查找。
// find the closest position that is less or equal to the input key func (tree *BTree) SeekLE(key []byte) *BIter { iter := &BIter{tree: tree} for ptr := tree.root; ptr != 0; { node := tree.get(ptr) idx := nodeLookupLE(node, key) iter.path = append(iter.path, node) iter.pos = append(iter.pos, idx) if node.btype() == BNODE_NODE { ptr = node.getPtr(idx) } else { ptr = 0 } } return iter }
NodeLookupLE函数仅适用于范围查询中的“小于或等于”运算符,对于其他3个运算符(小于、大于、大于或等于),结果可能为一个。我们会用BT树解决的。寻求功能。
const ( CMP_GE = +3 // >= CMP_GT = +2 // > CMP_LT = -2 // < CMP_LE = -3 // <= ) // find the closest position to a key with respect to the `cmp` relation func (tree *BTree) Seek(key []byte, cmp int) *BIter { iter := tree.SeekLE(key) if cmp != CMP_LE && iter.Valid() { cur, _ := iter.Deref() if !cmpOK(cur, cmp, key) { // off by one if cmp > 0 { iter.Next() } else { iter.Prev() } } } return iter } // key cmp ref func cmpOK(key []byte, cmp int, ref []byte) bool { r := bytes.Compare(key, ref) switch cmp { case CMP_GE: return r >= 0 case CMP_GT: return r > 0 case CMP_LT: return r < 0 case CMP_LE: return r <= 0 default: panic("what?") } }
数据序列化
要支持范围查询,必须在KV存储区中正确比较序列化主键。一种方法是反序列化主键,然后逐列比较。我们将使用另一种方法,让序列化的键字节反映它们的字典顺序,也就是说,键可以按字节进行正确的比较。比
bytes.Compare
较或memcmp
,而不首先对它们进行反序列化。让我们称这种技术为“顺序保持编码”,它可以在不控制底层KV存储的关键字比较功能的情况下使用。对于整数,您可以很容易地看到,无符号大端整数是顺序保持-最重要的位在大端格式的第一位。和以空值结尾的字符串也是保序的。
对于有符号整数,问题是负数具有最有效位(符号位)集。我们需要在大端编码之前翻转符号位,使负数更低。
// order-preserving encoding func encodeValues(out []byte, vals []Value) []byte { for _, v := range vals { switch v.Type { case TYPE_INT64: var buf [8]byte u := uint64(v.I64) + (1 << 63) binary.BigEndian.PutUint64(buf[:], u) out = append(out, buf[:]...) case TYPE_BYTES: out = append(out, escapeString(v.Str)...) out = append(out, 0) // null-terminated default: panic("what?") } } return out } func decodeValues(in []byte, out []Value) { // omitted... }
以空结尾的字符串的问题是它们不能包含空字节。我们将通过“转义”空字节来解决这个问题。
“\x00“
替换为“\x01\x01”
,则转义字节“\x01”
本身替换为“\x01\x02”
,则仍保留排序顺序。// Strings are encoded as nul terminated strings, // escape the nul byte so that strings contain no nul byte. func escapeString(in []byte) []byte { zeros := bytes.Count(in, []byte{0}) ones := bytes.Count(in, []byte{1}) if zeros+ones == 0 { return in } out := make([]byte, len(in)+zeros+ones) pos := 0 for _, ch := range in { if ch <= 1 { out[pos+0] = 0x01 out[pos+1] = ch + 1 pos += 2 } else { out[pos] = ch pos += 1 } } return out }
范围查询
最后,我们将添加Scanner类型,它允许我们按照排序顺序遍历一系列记录.
// the iterator for range queries type Scanner struct { // the range, from Key1 to Key2 Cmp1 int // CMP_?? Cmp2 int Key1 Record Key2 Record // internal tdef *TableDef iter *BIter // the underlying B-tree iterator keyEnd []byte // the encoded Key2 } // within the range or not? func (sc *Scanner) Valid() bool // move the underlying B-tree iterator func (sc *Scanner) Next() // fetch the current row func (sc *Scanner) Deref(rec *Record) func (db *DB) Scan(table string, req *Scanner) error { tdef := getTableDef(db, table) if tdef == nil { return fmt.Errorf("table not found: %s", table) } return dbScan(db, tdef, req) }
初始化迭代器:
func dbScan(db *DB, tdef *TableDef, req *Scanner) error { // sanity checks switch { case req.Cmp1 > 0 && req.Cmp2 < 0: case req.Cmp2 > 0 && req.Cmp1 < 0: default: return fmt.Errorf("bad range") } values1, err := checkRecord(tdef, req.Key1, tdef.PKeys) if err != nil { return err } values2, err := checkRecord(tdef, req.Key2, tdef.PKeys) if err != nil { return err } req.tdef = tdef // seek to the start key keyStart := encodeKey(nil, tdef.Prefix, values1[:tdef.PKeys]) req.keyEnd = encodeKey(nil, tdef.Prefix, values2[:tdef.PKeys]) req.iter = db.kv.tree.Seek(keyStart, req.Cmp1) return nil }
移动迭代器:
// within the range or not? func (sc *Scanner) Valid() bool { if !sc.iter.Valid() { return false } key, _ := sc.iter.Deref() return cmpOK(key, sc.Cmp2, sc.keyEnd) } // move the underlying B-tree iterator func (sc *Scanner) Next() { assert(sc.Valid()) if sc.Cmp1 > 0 { sc.iter.Next() } else { sc.iter.Prev() } }
点查询只是范围查询的特殊情况,那么为什么不去掉它们呢?
// get a single row by the primary key func dbGet(db *DB, tdef *TableDef, rec *Record) (bool, error) { // just a shortcut for the scan operation sc := Scanner{ Cmp1: CMP_GE, Cmp2: CMP_LE, Key1: *rec, Key2: *rec, } if err := dbScan(db, tdef, &sc); err != nil { return false, err } if sc.Valid() { sc.Deref(rec) return true, nil } else { return false, nil } }
我们只允许对完整的主键进行范围查询,但是对主键的前缀进行范围查询也是合法的。我们将在下一章中解决这个问题,以及二级索引。