These are the 53 commits in which the Protocol Buffers files changed:
| Commit | Author | Committer |
|---|---|---|
| bccf8a2 | Manish R Jain | GitHub |

feat(simplify): Remove 25% of Badger code (#12)

In this PR, I only keep the features that are needed by Outserv. This results in a 25% reduction of the Go codebase, from 15K LOC to 11K LOC, excluding protos, flat buffers, and tests.

- Managed mode is the only mode available.
- Write transactions and Oracle are gone.
- Memtable and WAL are gone.
- The SyncWrites option is gone. We no longer need it because we only have SSTables and we always sync them.
- Backup and Load are gone.
- WriteBatch now uses Skiplists.
- The separate memtable held by the DB object is gone. It only holds immutable memtables now.
- The Sequence struct is gone.
- TTL for keys is gone.
- Various tools are gone: bank.go, backup.go, restore.go.
- For the sake of speed, I commented out lots of tests from db_test.go, managed_db_test.go, stream_writer_test.go, txn_test.go, and rotate_test.go. I'll fix them later.
The documentation is generated from this commit.
| Commit | Author |
|---|---|
| 7689a23 | Manish R Jain |
Expel to Outcaste
| Commit | Author | Committer |
|---|---|---|
| 73c1ce3 | Naman Jain | GitHub |

pb: avoid protobuf warning due to common filename (#1519) (#1731)

If a Go program imports both badger (v1) and badger/v2, a warning will be produced at init time:

    WARNING: proto: file "pb.proto" is already registered
    A future release will panic on registration conflicts.

See: https://developers.google.com/protocol-buffers/docs/reference/go/faq#namespace-conflict

The problem is the "pb.proto" filename; it's registered globally, which makes it very likely to cause conflicts with other protobuf-generated packages in a Go binary. Coincidentally, this is a problem with badger's pb package in the v1 module too, since that also uses the name "pb.proto". Instead, call the file "badgerpb2.proto", which should be unique enough, and it's also the name we use for the protobuf package.

Finally, update gen.sh to work out of the box via just `bash gen.sh` without needing extra tweaks, thanks to the `paths=source_relative` option. It forces output files to be produced next to the input files, which is what we want.

(cherry picked from commit 3e6a4b7ca9637d3e2b7522c4ef04052ef31ec6d9)

Co-authored-by: Daniel Martí <mvdan@mvdan.cc>
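The rename described above comes down to two declarations in the `.proto` file. A minimal sketch of what the fixed header plausibly looks like — the `go_package` path matches badger/v2's `pb` package, but treat the exact contents as illustrative rather than the file's real text:

```proto
// badgerpb2.proto -- protoc registers this filename globally, so a
// project-specific name avoids the init-time conflict described above.
syntax = "proto3";

package badgerpb2;

// With paths=source_relative in gen.sh, generated files land next to
// this file; go_package sets the import path of the generated Go code.
option go_package = "github.com/dgraph-io/badger/v2/pb";
```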
| Commit | Author | Committer |
|---|---|---|
| 74ade98 | Manish R Jain | GitHub |

opt(stream): add option to directly copy over tables from lower levels (#1700)

This PR adds a FullCopy option to Stream. This allows sending tables to the writer in their entirety. If this option is set to true, we directly copy over the tables from the last 2 levels. This option increases stream speed while also lowering memory consumption on the DB that is streaming the KVs. For a 71GB compressed and encrypted DB, we observed a 3x improvement in speed. The DB contained ~65GB in the last 2 levels, with the remainder in the levels above.

To use this option, the following must be set in Stream:
- stream.KeyToList = nil
- stream.ChooseKey = nil
- stream.SinceTs = 0
- db.managedTxns = true

If we use the stream writer for receiving the KVs, the encryption mode has to be the same in the sender and receiver. This restricts db.StreamDB() to using the same encryption mode in both the input and output DB. Added a TODO for allowing different encryption modes.
| Commit | Author | Committer |
|---|---|---|
| 1bebc26 | Manish R Jain | GitHub |
feat(Trie): Working prefix match with holes (#1654) This PR adds a way to match a prefix, while ignoring certain portions of the key (aka holes). This is useful to implement multi-tenancy in Dgraph, where namespace is stored on the byte index 3-11 in the prefix key, and those bytes need to be ignored to subscribe to updates across all the namespaces.
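The matching rule above — compare a key against a prefix while skipping a fixed range of "hole" bytes — can be sketched in a few lines. This is a simplified illustration of the idea in #1654, not Badger's Trie implementation; the function name and signature are ours:

```go
package main

import "fmt"

// matchWithHoles reports whether key starts with prefix, treating the byte
// positions marked true in ignore as "holes" that match any byte. In Dgraph's
// use case, the namespace occupies bytes 3-11 of the prefix, and marking those
// positions as holes subscribes to updates across all namespaces.
func matchWithHoles(key, prefix []byte, ignore []bool) bool {
	if len(key) < len(prefix) {
		return false
	}
	for i := range prefix {
		if i < len(ignore) && ignore[i] {
			continue // hole: any byte matches here
		}
		if key[i] != prefix[i] {
			return false
		}
	}
	return true
}

func main() {
	// A 12-byte prefix whose bytes 3-11 (a namespace field) are ignored.
	prefix := []byte("abc00000000x")
	ignore := make([]bool, len(prefix))
	for i := 3; i < 11; i++ {
		ignore[i] = true
	}
	fmt.Println(matchWithHoles([]byte("abc12345678xtail"), prefix, ignore)) // true
	fmt.Println(matchWithHoles([]byte("zbc12345678xtail"), prefix, ignore)) // false
}
```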
| Commit | Author |
|---|---|
| 6266a4e | aman-bansal |
changing metrics to v3 and badgerpb2 to badgerpb3
| Commit | Author | Committer |
|---|---|---|
| b69163b | aman bansal | GitHub |

changing badger module path to v3 (#1636)

- changing badger module path to v3
- running go mod tidy
- updating metric keys + proto package
| Commit | Author |
|---|---|
| 54688d8 | aman-bansal |
changing badger module path to v3
| Commit | Author | Committer |
|---|---|---|
| 8d26d52 | Manish R Jain | GitHub |

opt(memory): Use z.Calloc for allocating KVList (#1563)

KVs can take up a lot of memory in the stream framework. With this change, we allocate them using z.Allocator, and allow callers of KeyToList to use the allocator to generate KVs as well. After we call Send, we release them. Also change the Stream framework to spit out StreamDone markers.
| Commit | Author | Committer |
|---|---|---|
| 599363b | Ibrahim Jarif | GitHub |

[BREAKING] feat(index): Use flatbuffers instead of protobuf (#1546)

This PR:
- Uses flatbuffers instead of protobufs for the table index and directly stores byte slices in the cache.
- Uses MaxVersion to pick the oldest tables for compaction first.
- Uses leveldb/bloom so that we can test it without unmarshalling.
- Adds uncompressed size and key count to the table index.
- Updates the write bench tool to use managed mode.
| Commit | Author | Committer |
|---|---|---|
| 58460bb | Daniel Martí | Ibrahim Jarif |

pb: avoid protobuf warning due to common filename (#1519)

If a Go program imports both badger (v1) and badger/v2, a warning will be produced at init time:

    WARNING: proto: file "pb.proto" is already registered
    A future release will panic on registration conflicts.

See: https://developers.google.com/protocol-buffers/docs/reference/go/faq#namespace-conflict

The problem is the "pb.proto" filename; it's registered globally, which makes it very likely to cause conflicts with other protobuf-generated packages in a Go binary. Coincidentally, this is a problem with badger's pb package in the v1 module too, since that also uses the name "pb.proto". Instead, call the file "badgerpb2.proto", which should be unique enough, and it's also the name we use for the protobuf package.

Finally, update gen.sh to work out of the box via just `bash gen.sh` without needing extra tweaks, thanks to the `paths=source_relative` option. It forces output files to be produced next to the input files, which is what we want.
| Commit | Author |
|---|---|
| 7d91a85 | Ibrahim Jarif |

feat(index): Use flatbuffers instead of protobuf

Use flatbuffers instead of protobuf for storing the table index.
| Commit | Author | Committer |
|---|---|---|
| 3e6a4b7 | Daniel Martí | GitHub |

pb: avoid protobuf warning due to common filename (#1519)

If a Go program imports both badger (v1) and badger/v2, a warning will be produced at init time:

    WARNING: proto: file "pb.proto" is already registered
    A future release will panic on registration conflicts.

See: https://developers.google.com/protocol-buffers/docs/reference/go/faq#namespace-conflict

The problem is the "pb.proto" filename; it's registered globally, which makes it very likely to cause conflicts with other protobuf-generated packages in a Go binary. Coincidentally, this is a problem with badger's pb package in the v1 module too, since that also uses the name "pb.proto". Instead, call the file "badgerpb2.proto", which should be unique enough, and it's also the name we use for the protobuf package.

Finally, update gen.sh to work out of the box via just `bash gen.sh` without needing extra tweaks, thanks to the `paths=source_relative` option. It forces output files to be produced next to the input files, which is what we want.
| Commit | Author | Committer |
|---|---|---|
| cddf7c0 | Ibrahim Jarif | GitHub |
Proto: Rename dgraph.badger.v2.pb to badgerpb2 (#1314) This PR renames badger protobuf package from `dgraph.badger.v2.pb` to `badgerpb2`. The `pb.pb.go` file has been regenerated using the `pb/gen.sh` script.
| Commit | Author | Committer |
|---|---|---|
| 8097259 | Daniel Martí | GitHub |

Proto: make badger/v2 compatible with v1 (#1293)

There were two instances of init-time work being incompatible with v1. That is, if one imported both v1 and v2 as part of a Go build, the resulting binary would end up panicking before main could run. The examples below are done with the latest versions of v1 and v2, and a main.go as follows:

    package main

    import (
        _ "github.com/dgraph-io/badger"
        _ "github.com/dgraph-io/badger/v2"
    )

    func main() {}

First, the protobuf package used "pb" as its proto package name. This is a problem, because types are registered globally with their fully qualified names, like "pb.Foo". Since both badger/pb and badger/v2/pb tried to globally register types with the same qualified names, we'd get a panic:

    $ go run .
    panic: proto: duplicate enum registered: pb.ManifestChange_Operation

    goroutine 1 [running]:
    github.com/golang/protobuf/proto.RegisterEnum(...)
        .../go/pkg/mod/github.com/golang/protobuf@v1.3.1/proto/properties.go:459
    github.com/dgraph-io/badger/v2/pb.init.0()
        .../badger/pb/pb.pb.go:638 +0x459

To fix this, make v2's proto package fully qualified. Since the namespace is global, just "v2.pb" wouldn't suffice; it's not unique enough. "dgraph.badger.v2.pb" seems good, since it follows the Go module path pretty closely.

The second issue was with expvar, which too uses globally registered names:

    $ go run .
    2020/04/08 22:59:20 Reuse of exported var name: badger_disk_reads_total
    panic: Reuse of exported var name: badger_disk_reads_total

    goroutine 1 [running]:
    log.Panicln(0xc00010de48, 0x2, 0x2)
        .../src/log/log.go:365 +0xac
    expvar.Publish(0x906fcc, 0x17, 0x9946a0, 0xc0000b0318)
        .../src/expvar/expvar.go:278 +0x267
    expvar.NewInt(...)
        .../src/expvar/expvar.go:298
    github.com/dgraph-io/badger/v2/y.init.1()
        .../badger/y/metrics.go:55 +0x65
    exit status 2

This time, replacing the "badger_" var prefix with "badger_v2_" seems like a simple enough fix.

Fixes #1208.
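The expvar half of the failure mode above is easy to reproduce with nothing but the standard library: `expvar.Publish` panics on a duplicate name, so the second registration of `badger_disk_reads_total` kills the process at init time, and a version-specific prefix sidesteps the clash. A small sketch (helper name is ours):

```go
package main

import (
	"expvar"
	"fmt"
)

// tryPublish registers an expvar counter and reports whether doing so
// panicked. expvar names are global to the process, so registering the
// same name twice -- as badger v1 and v2 both did -- panics.
func tryPublish(name string) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	expvar.NewInt(name)
	return false
}

func main() {
	fmt.Println(tryPublish("badger_disk_reads_total"))    // false: first registration is fine
	fmt.Println(tryPublish("badger_disk_reads_total"))    // true: duplicate name panics
	fmt.Println(tryPublish("badger_v2_disk_reads_total")) // false: the v2 prefix avoids the clash
}
```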
| Commit | Author | Committer |
|---|---|---|
| 69f35b3 | michele meloni | GitHub |
Add go_package in .proto (#1282)
| Commit | Author | Committer |
|---|---|---|
| ea19351 | Ashish Goswami | Ibrahim Jarif |

Introduce StreamDone in Stream Writer (#1061)

This PR introduces a way to tell StreamWriter to close a stream. Previously, streams were always open until Flush was called on StreamWriter. This resulted in high memory utilisation, because of the underlying TableBuilder held by each sortedWriter; closing all sorted writers in a single call also resulted in more memory allocation (during Flush()). Closing streams early can be useful in some cases, such as the bulk loader in Dgraph, where only one stream is active at a time.

(cherry picked from commit 385da9100e3534a5f82b3c37e0990305f8008a3a)
| Commit | Author | Committer |
|---|---|---|
| f46f8ea | Ibrahim Jarif | GitHub |
Store total key-value size in table footer (#1137) This PR stores the total key-value size in a table in the table footer. The key-value size can be accessed via db.Tables(..) call. It returns the list of all tables along with the total size of key-values.
| Commit | Author | Committer |
|---|---|---|
| 385da91 | Ashish Goswami | GitHub |

Introduce StreamDone in Stream Writer (#1061)

This PR introduces a way to tell StreamWriter to close a stream. Previously, streams were always open until Flush was called on StreamWriter. This resulted in high memory utilisation, because of the underlying TableBuilder held by each sortedWriter; closing all sorted writers in a single call also resulted in more memory allocation (during Flush()). Closing streams early can be useful in some cases, such as the bulk loader in Dgraph, where only one stream is active at a time.
| Commit | Author | Committer |
|---|---|---|
| e7d0a7b | Ibrahim Jarif | GitHub |

Support compression in Badger (#1013)

This commit adds support for compression in badger. Two compression algorithms, Snappy and ZSTD, are supported for now. The compression algorithm information is stored in the manifest file. We compress the blocks (typically of size 4KB) stored in the SST. The compression algorithm can be specified via the CompressionType option to badger.
| Commit | Author | Committer |
|---|---|---|
| a425b0e | balaji | GitHub |

Support encryption at rest (#1042)

This PR adds support for encrypting data that goes to disk. Two components are encrypted: the SSTs and the vlog. In an SST, each block is encrypted with a separate IV using AES CTR mode. In the vlog, each entry is encrypted. Each vlog file has a base IV of 12 bytes; the IV for each entry is generated by merging the base IV with the entry offset. Data is encrypted using a data key, which is generated by Badger. The data key is in turn encrypted using the user-provided key and stored on disk, so that the user can change their key. To change the key, the user provides the old key and the new key: we decrypt the data key using the old key and store it back to disk encrypted with the new key. This mechanism keeps key changes simple.
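The per-entry IV construction described above — a 12-byte per-file base IV merged with the entry's offset to fill AES's 16-byte block — can be sketched with the standard library. This is an illustration of the scheme, not Badger's code; the function name, offset width, and byte order are our assumptions:

```go
package main

import (
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"encoding/binary"
	"fmt"
)

// xorEntry encrypts (or, CTR being symmetric, decrypts) one vlog-style entry.
// The 16-byte IV is the file's 12-byte base IV plus the entry's offset packed
// into the remaining 4 bytes, so every entry gets a distinct counter stream.
func xorEntry(key, baseIV []byte, offset uint32, data []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	iv := make([]byte, aes.BlockSize)           // 16 bytes
	copy(iv, baseIV)                            // first 12 bytes: per-file base IV
	binary.BigEndian.PutUint32(iv[12:], offset) // last 4 bytes: entry offset
	out := make([]byte, len(data))
	cipher.NewCTR(block, iv).XORKeyStream(out, data)
	return out, nil
}

func main() {
	key := []byte("0123456789abcdef") // 16-byte AES-128 key (demo only)
	baseIV := []byte("unique-12-by")  // 12-byte base IV (demo only)
	ct, _ := xorEntry(key, baseIV, 4096, []byte("hello vlog entry"))
	pt, _ := xorEntry(key, baseIV, 4096, ct) // same call decrypts
	fmt.Println(bytes.Equal(pt, []byte("hello vlog entry"))) // true
}
```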
| Commit | Author |
|---|---|
| d9b02b7 | பாலாஜி ஜின்னா |
add encryption details to table manifest changes Signed-off-by: பாலாஜி ஜின்னா <balaji@dgraph.io>
| Commit | Author | Committer |
|---|---|---|
| e9fab80 | balaji | GitHub |

integrate encryption to db (#1000)

- add encryption to the table
| Commit | Author |
|---|---|
| dd79bfa | பாலாஜி ஜின்னா |
add support for encryption to the db Signed-off-by: பாலாஜி ஜின்னா <balaji@dgraph.io>
| Commit | Author | Committer |
|---|---|---|
| 8868b65 | balaji | GitHub |

encryption registry (#975)

For encryption-at-rest support, we'll be using two keys. One key is provided by the user; the other (the data key) is generated by Badger. The Badger-generated key is used for encryption and decryption, and is itself encrypted with the user-provided key and persisted to disk. The main reason behind this design is that it makes key rotation easy: we just need to decrypt the data key with the old key and encrypt it back with the new key. As a first step, this commit adds the Key Registry, which is responsible for maintaining all the Badger-generated keys and performing key rotation. We generate a new key every 10 days, which can be changed via an option.
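The rotation argument above — wrap the data key under the user key, so changing the user key never touches the data — can be sketched with AES-GCM from the standard library. This is a minimal sketch of the idea, not the registry's actual on-disk format; function names are ours:

```go
package main

import (
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// wrap seals the Badger-generated data key under the user-provided key.
func wrap(userKey, dataKey []byte) ([]byte, error) {
	block, err := aes.NewCipher(userKey)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Prepend the nonce so unwrap can find it.
	return gcm.Seal(nonce, nonce, dataKey, nil), nil
}

// unwrap recovers the data key given the user key it was sealed under.
func unwrap(userKey, sealed []byte) ([]byte, error) {
	block, err := aes.NewCipher(userKey)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	n := gcm.NonceSize()
	return gcm.Open(nil, sealed[:n], sealed[n:], nil)
}

func main() {
	oldKey := []byte("old-user-key-16b")
	newKey := []byte("new-user-key-16b")
	dataKey := []byte("badger-data-key!") // pretend Badger generated this

	sealed, _ := wrap(oldKey, dataKey)
	// Key rotation: decrypt with the old user key, re-seal with the new one.
	dk, _ := unwrap(oldKey, sealed)
	resealed, _ := wrap(newKey, dk)
	dk2, _ := unwrap(newKey, resealed)
	fmt.Println(bytes.Equal(dk2, dataKey)) // true: data key survives rotation
}
```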
| Commit | Author | Committer |
|---|---|---|
| 9f39780 | பாலாஜி ஜின்னா | பாலாஜி ஜின்னா |
encryption registry Signed-off-by: பாலாஜி ஜின்னா <balaji@dgraph.io>
| Commit | Author |
|---|---|
| 9ce7439 | பாலாஜி ஜின்னா |
clean up Signed-off-by: பாலாஜி ஜின்னா <balaji@dgraph.io>
| Commit | Author |
|---|---|
| adfb181 | பாலாஜி ஜின்னா |
new change Signed-off-by: பாலாஜி ஜின்னா <balaji@dgraph.io>
| Commit | Author |
|---|---|
| b05fa0a | பாலாஜி ஜின்னா |
hook key registry Signed-off-by: பாலாஜி ஜின்னா <balaji@dgraph.io>
| Commit | Author |
|---|---|
| 4c9b1b4 | பாலாஜி ஜின்னா |
add key registry Signed-off-by: பாலாஜி ஜின்னா <balaji@dgraph.io>
| Commit | Author |
|---|---|
| 4603b3a | பாலாஜி ஜின்னா |
wip Signed-off-by: பாலாஜி ஜின்னா <balaji@dgraph.io>
| Commit | Author | Committer |
|---|---|---|
| 61c492d | Ibrahim Jarif | GitHub |

[breaking/format] Add key-offset index to the end of SST table (#881)

This commit adds the key-offset index used for searching the blocks in a table to the end of the SST file. The length of the index is stored in the last 4 bytes of the file.
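The footer layout described above — data blocks, then the serialized index, then the index length in the file's last 4 bytes — can be sketched in a few lines. Function names are ours, and the byte order of the length field is an assumption for illustration, not necessarily what Badger writes:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// appendIndex appends the serialized key-offset index to the file bytes,
// followed by the index length in a 4-byte footer.
func appendIndex(file, index []byte) []byte {
	file = append(file, index...)
	var sz [4]byte
	binary.BigEndian.PutUint32(sz[:], uint32(len(index)))
	return append(file, sz[:]...)
}

// readIndex recovers the index: read the trailing 4-byte length first,
// then slice the index out from just before the footer.
func readIndex(file []byte) []byte {
	n := len(file)
	sz := binary.BigEndian.Uint32(file[n-4:])
	return file[n-4-int(sz) : n-4]
}

func main() {
	blocks := []byte("...block data...")
	index := []byte(`{"offsets":[0,4096]}`) // stand-in for the real index bytes
	file := appendIndex(blocks, index)
	fmt.Println(string(readIndex(file))) // the index round-trips intact
}
```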
| Commit | Author | Committer |
|---|---|---|
| 069bc6b | ashish | ashish |

Remove proto, store plain offsets for binary search in block
| Commit | Author | Committer |
|---|---|---|
| 58b24a8 | Ibrahim Jarif | Ibrahim Jarif |
fixup
| Commit | Author | Committer |
|---|---|---|
| 5061241 | ashish | ashish |
Add Checksum at block level
| Commit | Author | Committer |
|---|---|---|
| d0ded42 | ashish | ashish |
Rebase with ibrahim/footer-protobuf
| Commit | Author | Committer |
|---|---|---|
| 6751285 | ashish | ashish |

Change Checksum proto
| Commit | Author | Committer |
|---|---|---|
| 222923c | ashish | ashish |

Add checksum implementation

- Add checksum in protos
- Add CRC32C and xxHash checksum implementations
- Change blockMeta to blockIndex in protos
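Of the two algorithms mentioned, CRC32C (Castagnoli) is in the Go standard library, so the block-verification idea is easy to sketch; xxHash would need a third-party package. Helper names here are ours, not Badger's:

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// The Castagnoli polynomial is what "CRC32C" refers to; it has hardware
// support on modern CPUs, which is why storage systems favour it.
var castagnoli = crc32.MakeTable(crc32.Castagnoli)

// checksum computes the CRC32C of one block's bytes.
func checksum(block []byte) uint32 {
	return crc32.Checksum(block, castagnoli)
}

// verifyBlock compares a freshly computed checksum against the stored one,
// as a block-level integrity check would on read.
func verifyBlock(block []byte, stored uint32) bool {
	return checksum(block) == stored
}

func main() {
	block := []byte("4KB of SST block bytes...")
	sum := checksum(block) // stored alongside the block at write time
	fmt.Println(verifyBlock(block, sum))              // true
	fmt.Println(verifyBlock(append(block, 'x'), sum)) // false: corruption detected
}
```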
| Commit | Author |
|---|---|
| db8bb57 | Ibrahim Jarif |
Change pb.checksum format and add checksum.go file
| Commit | Author |
|---|---|
| 362779c | Ibrahim Jarif |
Remove checksum from manifest file
| Commit | Author |
|---|---|
| aca502d | Ibrahim Jarif |
Address review comments
| Commit | Author | Committer |
|---|---|---|
| 413ce9e | Ibrahim Jarif | Ibrahim Jarif |
fixup
| Commit | Author |
|---|---|
| fc950be | Ibrahim Jarif |
address review comments
| Commit | Author | Committer |
|---|---|---|
| e711436 | Ibrahim Jarif | Ibrahim Jarif |

Add key-offset index to the end of SST table

This commit adds the key-offset index used for searching the blocks in a table to the end of the SST file. The length of the index is stored in the last 4 bytes of the file.
| Commit | Author | Committer |
|---|---|---|
| 7116e16 | Ashish Goswami | Manish R Jain |

Add StreamWriter for fast sorted stream writes (#802)

Changes:
- Start work on a sorted stream writer which can avoid paying the cost of compactions.
- Logic for sorted stream writer is all written.
- Add a stream id to each key in the output from the Stream framework.
- Make StreamWriter work. Performance is incredible, able to write at 100MBps.
- Add tests for StreamWriter.
- Update Oracle after StreamWriter is done.
- Mods in the stream framework so we can decrease the number of key ranges.
- Add awareness of stream id in sortedWriter.
- Sync the directories when StreamWriter is done.
- Add licenses.
- Moving builder.Finish within the goroutine improves throughput by 15%, getting 112MBps write speed.
- Change StreamWriter tests to also cover managed mode.
- Rename files to stream_writer.go and the corresponding test file.
- Fix a bug caused by marking an index done below the current done-until, causing a panic. Now we just recreate the oracle.
- Update protos.
| Commit | Author | Committer |
|---|---|---|
| 2017987 | Manish R Jain | GitHub |
Introduce SSTable sha256 checksums (#689) Add SHA256 checksums for SSTables in MANIFEST. If a table no longer matches this checksum, that table would be skipped over with an error. Tested that it works with previous badger directories. As new tables get created, Badger would store their checksums in MANIFEST. Modified `badger info` to show the checksums stored in MANIFEST, so user can manually compare the output from `sha256sum <filename>` if needed. Fixes #680 .
| Commit | Author | Committer |
|---|---|---|
| 599bc29 | Manish R Jain | GitHub |

Renaming and Refactoring (#655)

- Flatten function can now be called from the live code path, i.e. while Badger is running.
- Refactor the stop and start compactions code, so it can be used in Flatten, DropAll, and other places.
- Rename the protos package to pb. Consolidate both proto files into one pb.proto.
- Rename KeyToKVList to just KeyToList in Stream.
- Rename KVPair to just KV.

Changes:
- Rename protos to pb.
- Rename KeyToKVList to KeyToList.
- Refactor stop and start compactions, so we can use it in various places.
- Rename KVPair to KV.
- Defer startCompactions right next to stopCompactions.
- Catch all usages of KVPair. Rename to KV.
| Commit | Author | Committer |
|---|---|---|
| 14cbd89 | Manish R Jain | GitHub |

Port Stream framework from Dgraph to Badger (#653)

This PR ports over the Stream framework from Dgraph to Badger. This framework allows users to concurrently iterate over Badger, converting data to key-value lists, which then get sent out serially. In Dgraph we use this framework for shipping snapshots from leader to followers, doing periodic delta merging of updates, moving predicates from one Alpha group to another, and so on. However, the framework is general enough that it could live within Badger, and that's what this PR achieves. In addition, the Backup API of Badger now uses this framework to make taking backups faster.

Changes:
- Port Stream from Dgraph to Badger.
- Switch Backup to use the new Stream framework.
- Update godocs.
- Remove a t.Logf data race.
- Self-review.
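The shape of the framework — concurrent iteration producing key-value lists, serialized through a single Send stage — can be shown in miniature with goroutines and a channel. All types and names here are illustrative, not Badger's Stream API:

```go
package main

import (
	"fmt"
	"sync"
)

// kv stands in for a single key-value pair in a list.
type kv struct{ key, value string }

// stream fans out over key ranges concurrently (the iterate/KeyToList stage)
// and funnels the resulting lists through one channel, so send is called
// serially even though production is parallel.
func stream(ranges [][]kv, send func([]kv)) {
	out := make(chan []kv)
	var wg sync.WaitGroup
	for _, r := range ranges {
		wg.Add(1)
		go func(r []kv) { // concurrent producer per key range
			defer wg.Done()
			out <- r
		}(r)
	}
	go func() {
		wg.Wait()
		close(out) // all producers done: end the serial consumer loop
	}()
	for list := range out { // serial Send stage
		send(list)
	}
}

func main() {
	ranges := [][]kv{
		{{"a", "1"}, {"b", "2"}},
		{{"c", "3"}},
	}
	total := 0
	stream(ranges, func(list []kv) { total += len(list) })
	fmt.Println(total) // 3
}
```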
| Commit | Author | Committer |
|---|---|---|
| ba13ac7 | Gus | Manish R Jain |

Exclude deleted and expired items from backup (#624)

- Added a check to exclude deleted and expired items from backup, and a check to exclude expired items from restore.
- Changed the delete and expire checks to keep a copy of the data, but save a marker indicating state; the delete state is detected during restore using a key prefix.
- Added a meta field to protos.KVPair to store meta state during backup. Regenerated protos. Changed Backup and Load to store and use meta values. Added backup and restore tests to complement the existing ones.
- Fixed a test.
- Handle DiscardEarlierVersions and IsDeletedOrExpired correctly.
- Removed unneeded test funcs. Cleaned up tests. Added an incremental backup test.
- Fixed a small data race. Updated tests.
- Cosmetic changes.
| Commit | Author | Committer |
|---|---|---|
| a5499e5 | Deepak Jois | GitHub |
Add support for TTL We add an additional field in the LSM key structure to capture a Unix timestamp, beyond which the key would be considered expired and would be treated as if it is deleted. This changes the on-disk format, so we also need to increment the manifest version number. Fixes #298
| Commit | Author | Committer |
|---|---|---|
| 671c20e | Deepak Jois | Deepak Jois |
Add DB.Backup() and DB.Load() for backup purposes. We write a backup of the latest snapshot in the DB (including all previous versions) to a protobuf-encoded file. Fixes #135
| Commit | Author | Committer |
|---|---|---|
| ff3547c | Deepak Jois | Deepak Jois |
Add KV.Backup() to backup data to a file.
| Commit | Author | Committer |
|---|---|---|
| f3d5152 | Sam Hughes | Sam Hughes |

Replace compaction log with manifest

What this does do:
- appends to the manifest and syncs before deleting old files from disk, to ensure safety amidst kill -9 or a system crash

What this could do but does not:
- include file size and smallest/biggest keys in the manifest
- keep track of value log files

It might be nice to have file sizes and include all files, to verify that a `cp -R` or file-based backup/restore has included everything. Right now this just gets us crash safety.