Proto commits in outcaste-io/badger

These are the 53 commits in which the Protocol Buffers files changed:

Commit:bccf8a2
Author:Manish R Jain
Committer:GitHub

feat(simplify): Remove 25% of Badger code (#12)

In this PR, I only keep the features that are needed by Outserv. This results in a 25% reduction of the Go codebase, from 15K LOC to 11K LOC, excluding protos, flat buffers, and tests.
- Managed mode is the only mode available.
- Write transactions and Oracle are gone.
- Memtable and WAL are gone.
- SyncWrites option is gone. We no longer need it because we only have SSTables and we always sync them.
- Backup and Load are gone.
- WriteBatch now uses Skiplists.
- A separate memtable held by the DB object is gone. It only holds immutable memtables now.
- Sequence struct is gone.
- TTL for keys is gone.
- Various tools are gone:
  - bank.go
  - backup.go
  - restore.go
- For the sake of speed, I commented out lots of tests from:
  - db_test.go
  - managed_db_test.go
  - stream_writer_test.go
  - txn_test.go
  - rotate_test.go

I'll fix them later.

The documentation is generated from this commit.

Commit:7689a23
Author:Manish R Jain

Expel to Outcaste

Commit:73c1ce3
Author:Naman Jain
Committer:GitHub

pb: avoid protobuf warning due to common filename (#1519) (#1731)

If a Go program imports both badger (v1) and badger/v2, a warning will be produced at init time:

    WARNING: proto: file "pb.proto" is already registered
    A future release will panic on registration conflicts. See:
    https://developers.google.com/protocol-buffers/docs/reference/go/faq#namespace-conflict

The problem is the "pb.proto" filename; it's registered globally, which makes it very likely to cause conflicts with other protobuf-generated packages in a Go binary. Coincidentally, this is a problem with badger's pb package in the v1 module, since that too uses the name "pb.proto".

Instead, call the file "badgerpb2.proto", which should be unique enough, and it's also the name we use for the Protobuf package.

Finally, update gen.sh to work out of the box via just "bash gen.sh" without needing extra tweaks, thanks to the "paths=source_relative" option. It forces output files to be produced next to the input files, which is what we want.

(cherry picked from commit 3e6a4b7ca9637d3e2b7522c4ef04052ef31ec6d9)
Co-authored-by: Daniel Martí <mvdan@mvdan.cc>
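
As a quick illustration of the effect of this fix (not part of the commit itself), the protobuf runtime's global registry can be queried for the filename a descriptor was registered under. FindFileByPath is a real google.golang.org/protobuf API; the surrounding function is hypothetical and assumes the badger pb package is blank-imported elsewhere so its init runs:

```go
package example

import (
	"fmt"

	"google.golang.org/protobuf/reflect/protoregistry"
)

// checkRegistration prints which filename the badger proto descriptor is
// registered under: after this commit it should be the unique
// "badgerpb2.proto" rather than the conflict-prone "pb.proto".
func checkRegistration() {
	if _, err := protoregistry.GlobalFiles.FindFileByPath("pb.proto"); err != nil {
		fmt.Println(`no conflict-prone "pb.proto" registered:`, err)
	}
	if fd, err := protoregistry.GlobalFiles.FindFileByPath("badgerpb2.proto"); err == nil {
		fmt.Println("registered as:", fd.Path())
	}
}
```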

Commit:74ade98
Author:Manish R Jain
Committer:GitHub

opt(stream): add option to directly copy over tables from lower levels (#1700)

This PR adds a FullCopy option to Stream, which allows sending tables to the writer in their entirety. If this option is set to true, we directly copy over the tables from the last 2 levels. This option increases the stream speed while also lowering the memory consumption on the DB that is streaming the KVs. For a 71GB compressed and encrypted DB, we observed a 3x improvement in speed. The DB contained ~65GB in the last 2 levels, with the remainder in the levels above.

To use this option, the following must be set in Stream:
- stream.KeyToList = nil
- stream.ChooseKey = nil
- stream.SinceTs = 0
- db.managedTxns = true

If we use the stream writer for receiving the KVs, the encryption mode has to be the same in sender and receiver. This restricts db.StreamDB() to using the same encryption mode in both input and output DB. Added a TODO for allowing different encryption modes.
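
A hedged sketch of the constraint list above. The module path and Send signature follow the dgraph-io badger v3 API of this era; FullCopy is the field this commit introduces, and `send` is a hypothetical stand-in for the real receiver:

```go
package example

import (
	"context"

	badger "github.com/dgraph-io/badger/v3"
	"github.com/dgraph-io/ristretto/z"
)

// streamFullCopy streams the whole DB, letting Stream ship tables from the
// last two levels verbatim instead of iterating over their keys.
func streamFullCopy(ctx context.Context, db *badger.DB, send func(*z.Buffer) error) error {
	stream := db.NewStream()
	stream.KeyToList = nil // required: no per-key transformation
	stream.ChooseKey = nil // required: no key filtering
	stream.SinceTs = 0     // required: stream all versions
	stream.FullCopy = true // copy tables from the last two levels directly
	stream.Send = send     // receives serialized KV batches (and whole tables)
	return stream.Orchestrate(ctx)
}
```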

Commit:1bebc26
Author:Manish R Jain
Committer:GitHub

feat(Trie): Working prefix match with holes (#1654)

This PR adds a way to match a prefix while ignoring certain portions of the key (aka holes). This is useful for implementing multi-tenancy in Dgraph, where the namespace is stored at byte indices 3-11 of the prefix key, and those bytes need to be ignored in order to subscribe to updates across all namespaces.
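
A sketch of the use case described above: subscribe to a key prefix while treating the namespace bytes (indices 3-11) as holes, so the subscription fires for every namespace. pb.Match and its IgnoreBytes field are from the badger v3 API this Trie change feeds into; the 11-byte prefix here is hypothetical:

```go
package example

import (
	"context"

	badger "github.com/dgraph-io/badger/v3"
	"github.com/dgraph-io/badger/v3/pb"
)

// subscribeAllNamespaces watches one logical prefix across all namespaces.
func subscribeAllNamespaces(ctx context.Context, db *badger.DB) error {
	prefix := make([]byte, 11)
	copy(prefix, "key") // hypothetical 3-byte type marker; bytes 3-11 are the namespace
	m := pb.Match{
		Prefix:      prefix,
		IgnoreBytes: "3-11", // byte positions to skip while matching
	}
	return db.Subscribe(ctx, func(kvs *pb.KVList) error {
		// updates from every namespace arrive here
		return nil
	}, []pb.Match{m})
}
```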

Commit:6266a4e
Author:aman-bansal

changing metrics to v3 and badgerpb2 to badgerpb3

Commit:b69163b
Author:aman bansal
Committer:GitHub

changing badger module path to v3 (#1636)
* changing badger module path to v3
* running go mod tidy
* updating metric keys + proto package

Commit:54688d8
Author:aman-bansal

changing badger module path to v3

Commit:8d26d52
Author:Manish R Jain
Committer:GitHub

opt(memory): Use z.Calloc for allocating KVList (#1563)

KVs can take up a lot of memory in the stream framework. With this change, we allocate them using z.Allocator, and allow callers of KeyToList to use the allocator to generate KVs as well. After we call Send, we release them. Also change the Stream framework to spit out StreamDone markers.
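
A hedged sketch of a KeyToList callback built on the stream's allocator, as this commit enables. The itr.Alloc field, y.NewKV helper, and Allocator.Copy method are assumptions based on the badger v3 and ristretto z APIs:

```go
package example

import (
	badger "github.com/dgraph-io/badger/v3"
	"github.com/dgraph-io/badger/v3/pb"
	"github.com/dgraph-io/badger/v3/y"
)

// allocKeyToList builds the outgoing KV through the allocator owned by the
// stream; the allocator is released after Send returns.
func allocKeyToList(key []byte, itr *badger.Iterator) (*pb.KVList, error) {
	a := itr.Alloc
	item := itr.Item()
	val, err := item.ValueCopy(nil)
	if err != nil {
		return nil, err
	}
	kv := y.NewKV(a)      // assumed helper: allocates the KV via the allocator
	kv.Key = a.Copy(key)  // copies into allocator-owned memory
	kv.Value = a.Copy(val)
	return &pb.KVList{Kv: []*pb.KV{kv}}, nil
}
```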

Commit:599363b
Author:Ibrahim Jarif
Committer:GitHub

[BREAKING] feat(index): Use flatbuffers instead of protobuf (#1546)

This PR
- Uses flatbuffers instead of protobufs for the table index and directly stores byte slices in the cache.
- Uses MaxVersion to pick the oldest tables for compaction first.
- Uses leveldb/bloom so that we can test it without unmarshalling.
- Adds uncompressed size and key count to the table index.
- Updates the write bench tool to use managed mode.

Commit:58460bb
Author:Daniel Martí
Committer:Ibrahim Jarif

pb: avoid protobuf warning due to common filename (#1519)

If a Go program imports both badger (v1) and badger/v2, a warning will be produced at init time:

    WARNING: proto: file "pb.proto" is already registered
    A future release will panic on registration conflicts. See:
    https://developers.google.com/protocol-buffers/docs/reference/go/faq#namespace-conflict

The problem is the "pb.proto" filename; it's registered globally, which makes it very likely to cause conflicts with other protobuf-generated packages in a Go binary. Coincidentally, this is a problem with badger's pb package in the v1 module, since that too uses the name "pb.proto".

Instead, call the file "badgerpb2.proto", which should be unique enough, and it's also the name we use for the Protobuf package.

Finally, update gen.sh to work out of the box via just "bash gen.sh" without needing extra tweaks, thanks to the "paths=source_relative" option. It forces output files to be produced next to the input files, which is what we want.

Commit:7d91a85
Author:Ibrahim Jarif

feat(index): Use flatbuffers instead of protobuf

Use flatbuffers instead of protobuf for storing the table index.

Commit:3e6a4b7
Author:Daniel Martí
Committer:GitHub

pb: avoid protobuf warning due to common filename (#1519)

If a Go program imports both badger (v1) and badger/v2, a warning will be produced at init time:

    WARNING: proto: file "pb.proto" is already registered
    A future release will panic on registration conflicts. See:
    https://developers.google.com/protocol-buffers/docs/reference/go/faq#namespace-conflict

The problem is the "pb.proto" filename; it's registered globally, which makes it very likely to cause conflicts with other protobuf-generated packages in a Go binary. Coincidentally, this is a problem with badger's pb package in the v1 module, since that too uses the name "pb.proto".

Instead, call the file "badgerpb2.proto", which should be unique enough, and it's also the name we use for the Protobuf package.

Finally, update gen.sh to work out of the box via just "bash gen.sh" without needing extra tweaks, thanks to the "paths=source_relative" option. It forces output files to be produced next to the input files, which is what we want.

Commit:cddf7c0
Author:Ibrahim Jarif
Committer:GitHub

Proto: Rename dgraph.badger.v2.pb to badgerpb2 (#1314)

This PR renames badger protobuf package from `dgraph.badger.v2.pb` to `badgerpb2`. The `pb.pb.go` file has been regenerated using the `pb/gen.sh` script.

Commit:8097259
Author:Daniel Martí
Committer:GitHub

Proto: make badger/v2 compatible with v1 (#1293)

There were two instances of init-time work being incompatible with v1. That is, if one imported both v1 and v2 as part of a Go build, the resulting binary would end up panicking before main could run. The examples below are done with the latest versions of v1 and v2, and a main.go as follows:

    package main

    import (
        _ "github.com/dgraph-io/badger"
        _ "github.com/dgraph-io/badger/v2"
    )

    func main() {}

First, the protobuf package used "pb" as its proto package name. This is a problem, because types are registered globally with their fully qualified names, like "pb.Foo". Since both badger/pb and badger/v2/pb tried to globally register types with the same qualified names, we'd get a panic:

    $ go run .
    panic: proto: duplicate enum registered: pb.ManifestChange_Operation

    goroutine 1 [running]:
    github.com/golang/protobuf/proto.RegisterEnum(...)
        .../go/pkg/mod/github.com/golang/protobuf@v1.3.1/proto/properties.go:459
    github.com/dgraph-io/badger/v2/pb.init.0()
        .../badger/pb/pb.pb.go:638 +0x459

To fix this, make v2's proto package fully qualified. Since the namespace is global, just "v2.pb" wouldn't suffice; it's not unique enough. "dgraph.badger.v2.pb" seems good, since it follows the Go module path pretty closely.

The second issue was with expvar, which too uses globally registered names:

    $ go run .
    2020/04/08 22:59:20 Reuse of exported var name: badger_disk_reads_total
    panic: Reuse of exported var name: badger_disk_reads_total

    goroutine 1 [running]:
    log.Panicln(0xc00010de48, 0x2, 0x2)
        .../src/log/log.go:365 +0xac
    expvar.Publish(0x906fcc, 0x17, 0x9946a0, 0xc0000b0318)
        .../src/expvar/expvar.go:278 +0x267
    expvar.NewInt(...)
        .../src/expvar/expvar.go:298
    github.com/dgraph-io/badger/v2/y.init.1()
        .../badger/y/metrics.go:55 +0x65
    exit status 2

This time, replacing the "badger_" var prefix with "badger_v2_" seems like it's simple enough as a fix.

Fixes #1208.
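
For the expvar half of the fix, a minimal sketch of the renamed registration; the metric name mirrors the panic trace above, and the variable itself is hypothetical:

```go
package example

import "expvar"

// With the "badger_v2_" prefix, this registration no longer collides with
// v1's "badger_disk_reads_total" expvar.
var numReads = expvar.NewInt("badger_v2_disk_reads_total")
```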

Commit:69f35b3
Author:michele meloni
Committer:GitHub

Add go_package in .proto (#1282)

Commit:ea19351
Author:Ashish Goswami
Committer:Ibrahim Jarif

Introduce StreamDone in Stream Writer (#1061)

This PR introduces a way to tell StreamWriter to close a stream. Previously, streams were always open until Flush was called on StreamWriter. This resulted in high memory utilisation because of the TableBuilder underlying each sortedWriter, and closing all sorted writers in a single call resulted in more memory allocation (during Flush()). Closing streams early can be useful in some cases, such as the bulk loader in Dgraph, where only one stream is active at a time.

(cherry picked from commit 385da9100e3534a5f82b3c37e0990305f8008a3a)
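
A hedged sketch of the mechanism: a KV carrying only StreamId and StreamDone tells the StreamWriter that the stream is finished, so its sortedWriter (and TableBuilder memory) can be released before Flush. Field and method names are assumed from the badger v2 API of this era; the helper is hypothetical:

```go
package example

import (
	badger "github.com/dgraph-io/badger/v2"
	"github.com/dgraph-io/badger/v2/pb"
)

// closeStream signals that stream `id` has no more data.
func closeStream(w *badger.StreamWriter, id uint32) error {
	done := &pb.KVList{Kv: []*pb.KV{{
		StreamId:   id,
		StreamDone: true, // marker this commit introduces
	}}}
	return w.Write(done)
}
```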

Commit:f46f8ea
Author:Ibrahim Jarif
Committer:GitHub

Store total key-value size in table footer (#1137)

This PR stores the total key-value size of a table in the table footer. The key-value size can be accessed via the db.Tables(..) call, which returns the list of all tables along with the total size of their key-values.

Commit:385da91
Author:Ashish Goswami
Committer:GitHub

Introduce StreamDone in Stream Writer (#1061)

This PR introduces a way to tell StreamWriter to close a stream. Previously, streams were always open until Flush was called on StreamWriter. This resulted in high memory utilisation because of the TableBuilder underlying each sortedWriter, and closing all sorted writers in a single call resulted in more memory allocation (during Flush()). Closing streams early can be useful in some cases, such as the bulk loader in Dgraph, where only one stream is active at a time.

Commit:e7d0a7b
Author:Ibrahim Jarif
Committer:GitHub

Support compression in Badger (#1013)

This commit adds support for compression in badger. Two compression algorithms, Snappy and ZSTD, are supported for now. The compression algorithm information is stored in the manifest file. We compress blocks (typically of size 4KB) stored in the SST. The compression algorithm can be specified via the CompressionType option to badger.
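
A minimal sketch of enabling block compression, assuming the badger v2 options API (WithCompression and the constants in the options package):

```go
package example

import (
	badger "github.com/dgraph-io/badger/v2"
	"github.com/dgraph-io/badger/v2/options"
)

// openCompressed opens a DB with ZSTD block compression; the chosen
// algorithm is recorded in the manifest file.
func openCompressed(dir string) (*badger.DB, error) {
	opts := badger.DefaultOptions(dir).
		WithCompression(options.ZSTD) // or options.Snappy / options.None
	return badger.Open(opts)
}
```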

Commit:a425b0e
Author:balaji
Committer:GitHub

Support encryption at rest (#1042)

This PR adds support for encrypting data that goes to disk. Two components are encrypted: the sst and the vlog. In the sst, each block is encrypted with a separate IV using AES CTR mode. In the vlog, each entry is encrypted. Each vlog has a base IV of 12 bytes, and the IV for each entry is generated by merging the base IV with the entry offset. Data is encrypted using a data key, which is generated by badger. The data key is further encrypted using the user-provided key and stored on disk, so that the user can change keys. In order to change keys, the user has to provide the old key and the new key; we decrypt using the old key and store the data key back to disk encrypted with the new key. This mechanism simplifies key changes.
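
A sketch of the per-entry IV scheme described above, using only the Go standard library. The byte order used to fold the offset into the IV is an assumption here, not necessarily what badger does:

```go
package example

import (
	"crypto/aes"
	"crypto/cipher"
	"encoding/binary"
)

// xorEntry combines a 12-byte base IV with the entry offset to form the
// 16-byte AES-CTR IV, then encrypts (or decrypts) src with the data key.
func xorEntry(dataKey, baseIV []byte, offset uint32, src []byte) ([]byte, error) {
	iv := make([]byte, aes.BlockSize) // 16 bytes
	copy(iv, baseIV)                  // 12 random bytes per vlog file
	binary.BigEndian.PutUint32(iv[12:], offset) // unique per entry
	block, err := aes.NewCipher(dataKey)
	if err != nil {
		return nil, err
	}
	dst := make([]byte, len(src))
	cipher.NewCTR(block, iv).XORKeyStream(dst, src)
	return dst, nil
}
```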

Commit:d9b02b7
Author:பாலாஜி ஜின்னா

add encryption details to table manifest changes

Signed-off-by: பாலாஜி ஜின்னா <balaji@dgraph.io>

Commit:e9fab80
Author:balaji
Committer:GitHub

integrate encryption to db (#1000)
* add encryption to the table

Commit:dd79bfa
Author:பாலாஜி ஜின்னா

add support for encryption to the db

Signed-off-by: பாலாஜி ஜின்னா <balaji@dgraph.io>

Commit:8868b65
Author:balaji
Committer:GitHub

encryption registry (#975)

For encryption at rest support, we'll be using two keys for encryption. One key is provided by the user; the other is generated by badger (the data key). The key generated by badger will be used for encryption and decryption. It will be encrypted by the user-provided key and persisted to disk. The main reason behind this implementation is that it makes it easy to rotate keys: we just need to decrypt the data key with the old key and encrypt it back with the new key. As a first step, this commit adds the Key Registry, which is responsible for maintaining all the badger-generated keys and doing key rotation. We generate a new key every 10 days, which can be changed via an option.
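
A sketch of opening a DB under this two-key scheme, assuming the option names that later shipped in badger v2 (WithEncryptionKey and WithEncryptionKeyRotationDuration):

```go
package example

import (
	"time"

	badger "github.com/dgraph-io/badger/v2"
)

// openEncrypted wires in the user key that encrypts badger's generated data
// keys; the rotation duration mirrors the 10-day default mentioned above.
func openEncrypted(dir string, userKey []byte) (*badger.DB, error) {
	opts := badger.DefaultOptions(dir).
		WithEncryptionKey(userKey). // 16, 24, or 32 bytes for AES-128/192/256
		WithEncryptionKeyRotationDuration(10 * 24 * time.Hour)
	return badger.Open(opts)
}
```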

Commit:9f39780
Author:பாலாஜி ஜின்னா
Committer:பாலாஜி ஜின்னா

encryption registry

Signed-off-by: பாலாஜி ஜின்னா <balaji@dgraph.io>

Commit:9ce7439
Author:பாலாஜி ஜின்னா

clean up

Signed-off-by: பாலாஜி ஜின்னா <balaji@dgraph.io>

Commit:adfb181
Author:பாலாஜி ஜின்னா

new change

Signed-off-by: பாலாஜி ஜின்னா <balaji@dgraph.io>

Commit:b05fa0a
Author:பாலாஜி ஜின்னா

hook key registry

Signed-off-by: பாலாஜி ஜின்னா <balaji@dgraph.io>

Commit:4c9b1b4
Author:பாலாஜி ஜின்னா

add key registry

Signed-off-by: பாலாஜி ஜின்னா <balaji@dgraph.io>

Commit:4603b3a
Author:பாலாஜி ஜின்னா

wip

Signed-off-by: பாலாஜி ஜின்னா <balaji@dgraph.io>

Commit:61c492d
Author:Ibrahim Jarif
Committer:GitHub

[breaking/format] Add key-offset index to the end of SST table (#881)

* Add key-offset index to the end of SST table

This commit adds the key-offset index, used for searching the blocks in a table, to the end of the SST file. The length of the index is stored in the last 4 bytes of the file.
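
A sketch of reading that layout back, per the description above; the big-endian byte order is an assumption:

```go
package example

import "encoding/binary"

// readIndex extracts the key-offset index from an SST: the last 4 bytes
// hold the index length, and the index sits immediately before them.
func readIndex(sst []byte) []byte {
	n := len(sst)
	indexLen := int(binary.BigEndian.Uint32(sst[n-4:]))
	return sst[n-4-indexLen : n-4]
}
```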

Commit:069bc6b
Author:ashish
Committer:ashish

Remove proto, store plain offsets for binary search in block

Commit:58b24a8
Author:Ibrahim Jarif
Committer:Ibrahim Jarif

fixup

Commit:5061241
Author:ashish
Committer:ashish

Add Checksum at block level

Commit:d0ded42
Author:ashish
Committer:ashish

Rebase with ibrahim/footer-protobuf

Commit:6751285
Author:ashish
Committer:ashish

Change Checksum proto

Commit:222923c
Author:ashish
Committer:ashish

Add checksum implementation
* Add checksum in protos
* Add CRC32C and xxHash checksum implementation
* Change blockMeta to blockIndex in protos
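
Sketches of the two algorithms named above. CRC32C (Castagnoli) comes from the standard library; the xxhash dependency shown is an assumption about which library is wrapped:

```go
package example

import (
	"hash/crc32"

	"github.com/cespare/xxhash"
)

// Castagnoli polynomial table, the "C" in CRC32C.
var castagnoli = crc32.MakeTable(crc32.Castagnoli)

// crc32c computes the block checksum using CRC32C.
func crc32c(block []byte) uint32 {
	return crc32.Checksum(block, castagnoli)
}

// xxh64 computes the block checksum using xxHash64.
func xxh64(block []byte) uint64 {
	return xxhash.Sum64(block)
}
```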

Commit:db8bb57
Author:Ibrahim Jarif

Change pb.checksum format and add checksum.go file

Commit:362779c
Author:Ibrahim Jarif

Remove checksum from manifest file

Commit:aca502d
Author:Ibrahim Jarif

Address review comments

Commit:413ce9e
Author:Ibrahim Jarif
Committer:Ibrahim Jarif

fixup

Commit:fc950be
Author:Ibrahim Jarif

address review comments

Commit:e711436
Author:Ibrahim Jarif
Committer:Ibrahim Jarif

Add key-offset index to the end of SST table

This commit adds the key-offset index, used for searching the blocks in a table, to the end of the SST file. The length of the index is stored in the last 4 bytes of the file.

Commit:7116e16
Author:Ashish Goswami
Committer:Manish R Jain

Add StreamWriter for fast sorted stream writes (#802)

Changes:
* Start work on a sorted stream writer which can avoid paying the cost of compactions.
* Logic for sorted stream writer is all written.
* Add a stream id to each key in the output from the Stream framework.
* Make StreamWriter work. Performance is incredible, able to write at 100MBps.
* Add tests for StreamWriter.
* Update Oracle after StreamWriter is done.
* Mods in the stream framework so we can decrease the number of key ranges.
* Add awareness of stream id in sortedWriter.
* Sync the directories when StreamWriter is done.
* Add licenses.
* Moving builder.Finish within the goroutine improves throughput by 15%. Getting 112MBps write speed.
* Change StreamWriter tests to cover managed mode as well.
* Rename files to stream_writer.go and the corresponding test file.
* Fix a bug caused by marking an index done below the current done-until, causing a panic. Now we just recreate the oracle.
* Update protos.

Commit:2017987
Author:Manish R Jain
Committer:GitHub

Introduce SSTable sha256 checksums (#689)

Add SHA256 checksums for SSTables in the MANIFEST. If a table no longer matches its checksum, that table is skipped over with an error. Tested that it works with previous badger directories. As new tables get created, Badger stores their checksums in the MANIFEST. Modified `badger info` to show the checksums stored in the MANIFEST, so the user can manually compare against the output of `sha256sum <filename>` if needed. Fixes #680.
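
The manual check the commit suggests, sketched with the standard library; the hex digest should match what `badger info` prints from the MANIFEST (equivalent to running `sha256sum <filename>`):

```go
package example

import (
	"crypto/sha256"
	"fmt"
	"io"
	"os"
)

// tableSum returns the hex-encoded SHA256 of an SST file.
func tableSum(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return fmt.Sprintf("%x", h.Sum(nil)), nil
}
```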

Commit:599bc29
Author:Manish R Jain
Committer:GitHub

Renaming and Refactoring (#655)

- Flatten function can now be called from live code path, i.e. while Badger is running.
- Refactor stop and start compactions code, so it can be used in Flatten, DropAll, and other places.
- Rename protos package to pb. Consolidate both the proto files into one pb.proto.
- Rename KeyToKVList to just KeyToList in Stream.
- Rename KVPair to just KV.

Changes:
* Rename protos to pb.
* Rename KeyToKVList to KeyToList.
* Refactor stop and start compactions, so we can use it in various places.
* Rename KVPair to KV
* Defer startCompactions right next to stopCompactions.
* Catch all usages of KVPair. Rename to KV.

Commit:14cbd89
Author:Manish R Jain
Committer:GitHub

Port Stream framework from Dgraph to Badger (#653)

This PR ports over the Stream framework from Dgraph to Badger. This framework allows users to concurrently iterate over Badger, converting data to key-value lists, which then get sent out serially. In Dgraph we use this framework for shipping snapshots from leader to followers, doing periodic delta merging of updates, moving predicates from one Alpha group to another, and so on. However, the framework is general enough that it could live within Badger, and that's what this PR achieves. In addition, the Backup API of Badger now uses this framework to make taking backups faster.

Changes:
* Port Stream from Dgraph to Badger.
* Switch Backup to use the new Stream framework.
* Update godocs.
* Remove a t.Logf data race.
* Self-review
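
A minimal sketch of driving the framework: several goroutines iterate and build key-value lists, and Send receives the batches serially. Field names follow the badger Stream API of this era; module paths may differ in later versions:

```go
package example

import (
	"context"

	badger "github.com/dgraph-io/badger"
	"github.com/dgraph-io/badger/pb"
)

// streamPrefix streams every key under prefix through the Stream framework.
func streamPrefix(ctx context.Context, db *badger.DB, prefix []byte) error {
	stream := db.NewStream()
	stream.NumGo = 16      // concurrent iterators
	stream.Prefix = prefix // only keys under this prefix
	stream.Send = func(list *pb.KVList) error {
		// called serially; ship the batch to a backup file, a follower, etc.
		return nil
	}
	return stream.Orchestrate(ctx)
}
```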

Commit:ba13ac7
Author:Gus
Committer:Manish R Jain

Exclude deleted and expired items from backup (#624)

* Added check to exclude deleted and expired items from backup; also added check to exclude expired items from restore.
* Changed delete and expire checks to keep a copy of the data, but save a marker indicating state. Delete state is detected during restore using a key prefix.
* Added meta field to protos.KVPair to store meta state during backup. Regenerated protos. Changed Backup and Load to store and use meta values. Added backup and restore tests to complement the existing ones.
* Fixed test.
* Handle DiscardEarlierVersions and IsDeletedOrExpired correctly.
* Removed unneeded test funcs. Cleaned up tests. Added incremental backup test.
* Fixed small data race. Updated tests.
* Cosmetic changes.

Commit:a5499e5
Author:Deepak Jois
Committer:GitHub

Add support for TTL

We add an additional field to the LSM key structure to capture a Unix timestamp, beyond which the key is considered expired and is treated as if it were deleted. This changes the on-disk format, so we also need to increment the manifest version number. Fixes #298
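
The expiry rule described above, as a one-line check (a sketch of the semantics, not badger's exact code):

```go
package example

import "time"

// isExpired reports whether a key with the given expiry timestamp should be
// treated as deleted: a non-zero ExpiresAt at or before the current Unix time.
func isExpired(expiresAt uint64) bool {
	return expiresAt != 0 && expiresAt <= uint64(time.Now().Unix())
}
```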

Commit:671c20e
Author:Deepak Jois
Committer:Deepak Jois

Add DB.Backup() and DB.Load() for backup purposes.

We write a backup of the latest snapshot in the DB (including all previous versions) to a protobuf-encoded file. Fixes #135
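
A sketch of taking a full backup with the API this commit adds; passing since=0 requests everything, and the returned version can be passed as since for the next incremental backup:

```go
package example

import (
	"os"

	badger "github.com/dgraph-io/badger"
)

// backupTo writes a protobuf-encoded backup of the latest snapshot to path.
func backupTo(db *badger.DB, path string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	if _, err := db.Backup(f, 0); err != nil { // since=0 → full backup
		return err
	}
	return f.Sync()
}
```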

Commit:ff3547c
Author:Deepak Jois
Committer:Deepak Jois

Add KV.Backup() to backup data to a file.

Commit:f3d5152
Author:Sam Hughes
Committer:Sam Hughes

Replace compaction log with manifest

What this does do:
- appends to manifest and syncs before deleting old files from disk, to ensure safety amidst kill -9 or system crash

What this could do but does not:
- include file size, smallest/biggest keys in the manifest
- keep track of value log files

It might be nice to have file size and include all files, to verify that a `cp -R` or file-based backup/restore has included everything. Right now this just gets us crash safety.