The Protocol Buffers files changed in the following 94 commits:
| Commit: | 05648d5 | |
|---|---|---|
| Author: | Matt Davis | |

# Add per-field doc-values skip index (default on for new fields)

Lucene 10.0 introduced the doc-values skip index (`DocValuesSkipper`), and 10.3/10.4/10.5 built a series of optimizations on top of it: block-skipping `SortedNumeric`/`SortedSet` range queries, dynamic-pruning numeric and sorted-set sorts, and skipper-backed `FieldExistsQuery`/`count()`/`rewrite()`. None of these fire unless the doc-values field was written with a skip index. Zulia stores every value as doc values and drives range queries and sorts off them, so adding this optimization helps many places.

# Turn on doc-values skip index for new fields and the id field in new indexes

* Write `SortAs` sort doc-values via `SortedNumericDocValuesField.indexedField` / `SortedSetDocValuesField.indexedField` when the field has the flag set, otherwise the plain constructor.
* `zulia_index.proto`: add `optional bool docValueSkipIndex` to `FieldConfig`. `optional` so we can distinguish "unset" (default on for a new field) from an explicit opt-out.
* Any new field defaults on unless it explicitly opts out.
* A field that already exists is frozen to its persisted flag. Lucene treats the skip index as part of the immutable field schema and the IndexWriter throws if it changes, so we never flip it on an existing field.
* The built-in `zuliaId` sort field is not a `FieldConfig`, so it is gated on a new index-level version instead. `ShardDocumentIndexer` builds the id-sort skip index when `createdIndexVersion >= ZuliaIndexVersion.ID_SORT_DOC_VALUE_SKIP`, so new indexes get it and existing indexes keep the plain field.
* `ZuliaIndexVersion` holds the current version plus per-feature introduction versions. It generalizes to other creation-time defaults that can't be expressed per field.
* Tests: `DocValueSkipIndexTest` (sort field type, skipper present in segment, id-sort skipper follows index version) and `DocValueSkipIndexPolicyTest` (per-field default-on / freeze policy).
* The query side needs no changes.
* No index-format break. Existing indexes stay readable and keep working unchanged, but they do not get the optimization.
  - To get the benefit on existing data, that data must be reindexed into a new index.

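
The default-on / freeze policy described in this commit can be modeled in a few lines. The sketch below is only an illustration: the class, method names, and the numeric value of `ID_SORT_DOC_VALUE_SKIP` are hypothetical, and the real logic lives in Zulia's own classes.

```java
import java.util.Optional;

/** Hypothetical model of the per-field skip-index policy; not Zulia's actual code. */
final class SkipIndexPolicySketch {

    // Assumed value for illustration only; the real constant is defined in ZuliaIndexVersion.
    static final int ID_SORT_DOC_VALUE_SKIP = 2;

    /**
     * @param requested the optional proto flag on the incoming FieldConfig (empty = unset)
     * @param persisted the flag already stored for the field, or empty when the field is new
     *                  (a field created before this feature would be passed as Optional.of(false))
     */
    static boolean resolveFieldFlag(Optional<Boolean> requested, Optional<Boolean> persisted) {
        if (persisted.isPresent()) {
            // Existing field: frozen, because Lucene rejects a change to the field schema.
            return persisted.get();
        }
        // New field: default on unless the request explicitly opts out.
        return requested.orElse(true);
    }

    /** The built-in id sort field is gated on the version the index was created with. */
    static boolean idSortUsesSkipIndex(int createdIndexVersion) {
        return createdIndexVersion >= ID_SORT_DOC_VALUE_SKIP;
    }
}
```

Modeling "unset" as an empty `Optional` mirrors why the proto field is declared `optional`: a plain `bool` could not tell "not specified" apart from an explicit `false`.
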
The documentation is generated from this commit.
| Commit: | 5b7f979 | |
|---|---|---|
| Author: | Matt Davis | |
Add maxDocFreqPct to More-Like-This term selection with a default of 25
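
As a rough illustration of what a document-frequency percentage cap does during term selection, the sketch below converts the percentage into an absolute ceiling and drops terms above it. The class and method names are hypothetical, and the rounding and comparison are assumptions rather than a statement of how Zulia or Lucene implement it.

```java
/** Hypothetical illustration of a maxDocFreqPct cut-off for More-Like-This terms. */
final class MaxDocFreqPctSketch {

    // Default named in the commit message.
    static final int DEFAULT_MAX_DOC_FREQ_PCT = 25;

    /** Converts the percentage into an absolute document-frequency ceiling. */
    static int maxDocFreq(int pct, int numDocs) {
        return Math.toIntExact((long) pct * numDocs / 100);
    }

    /** A term stays a candidate only if it occurs in at most maxDocFreq documents. */
    static boolean keepTerm(int docFreq, int pct, int numDocs) {
        return docFreq <= maxDocFreq(pct, numDocs);
    }
}
```

With the default of 25 and an index of 1,000 documents, a term found in 251 documents would be treated as too common to be a useful similarity signal.
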
| Commit: | 50adcd2 | |
|---|---|---|
| Author: | Matt Davis | |

More-like-this: fetch source documents from other indexes

* Add `sourceIndex` to more-like-this so the source documents (resolved from `documentId`) can be fetched from indexes other than the ones the similarity search runs on. This lets you find documents in one index that are similar to a document living in a different index.
* When given, the queried indexes are ignored for fetching; sources are fetched only from the source indexes, tried in order: the first source index that has the document wins on collision (same ordering rule the queried indexes used). Wildcards and aliases resolve as usual.
* Source-doc exclusion (`includeSourceDocs=false`) is dropped when the source indexes are disjoint from the queried indexes: the source docs can't be in the queried results.
* Add test cases for source index MLT queries.

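
The "first source index wins" rule and the disjointness check can be sketched as follows. All names here are hypothetical, and the `hasDocument` predicate stands in for whatever fetch the server actually performs.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.function.BiPredicate;

/** Hypothetical sketch of source-index resolution for more-like-this. */
final class SourceIndexResolutionSketch {

    /** Returns the first index, in request order, that contains the document. */
    static Optional<String> resolveSourceIndex(String documentId, List<String> sourceIndexes,
            BiPredicate<String, String> hasDocument) {
        for (String index : sourceIndexes) {
            if (hasDocument.test(index, documentId)) {
                return Optional.of(index);
            }
        }
        return Optional.empty();
    }

    /** Exclusion is only meaningful when a source index is also one of the queried indexes. */
    static boolean exclusionApplies(boolean includeSourceDocs, Collection<String> sourceIndexes,
            Collection<String> queriedIndexes) {
        return !includeSourceDocs && !Collections.disjoint(sourceIndexes, queriedIndexes);
    }
}
```
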
| Commit: | c4c47bc | |
|---|---|---|
| Author: | Matt Davis | |
| Committer: | Matt Davis | |

Add more-like-this (MLT) query support

Adds a MORE_LIKE_THIS query type supporting lexical (Lucene MoreLikeThis), vector (KNN on averaged source vectors), and hybrid similarity search.

- Sources are document IDs (fetched server-side), raw text, and/or raw vectors
- MoreLikeThisLazyQuery defers MLT construction to per-shard rewrite so each shard uses its own IndexReader for IDF stats
- Document IDs are resolved once on the federating node via batch fetch, and then resolved text and centroid vector are passed to shards in the query
- When a doc ID exists in multiple queried indexes, the first index in request order wins
- Centroid is normalized for UNIT_VECTOR fields
- Validates fields, types, and cross-index consistency and caps source docs at 100 for stability
- Requires at least one source (document IDs, like text, or like vectors) up front, and fails once on the federating node with a diagnostic message when source docs resolve to no text or vectors
- Queries (all types) now fail fast with "Query requires at least one index" instead of silently returning zero hits when the index list is empty. The shared validator is unchanged, so warming searches, which imply their index, are unaffected
- Wildcard index patterns and aliases are resolved via the same logic as regular queries
- Source documents are excluded from results by default (includeSourceDocs to opt in)
- minimum_should_match (mm) is honored on the per-field term disjunction
- Realtime searches propagate the realtime flag to the source doc fetch so just-stored documents resolve
- Client builder MoreLikeThisQuery with forText/forDocuments factories
- REST access via existing queryJson parameter (protobuf Query JSON)

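
The "KNN on averaged source vectors" idea, including normalization for unit-vector fields, can be sketched as below. This is a minimal illustration with hypothetical names; it assumes a plain arithmetic mean followed by L2 normalization, which the commit message implies but does not spell out.

```java
/** Hypothetical sketch: average the source vectors, then optionally L2-normalize. */
final class CentroidSketch {

    static float[] centroid(float[][] vectors, boolean unitVector) {
        if (vectors.length == 0) {
            throw new IllegalArgumentException("at least one source vector is required");
        }
        int dim = vectors[0].length;
        float[] mean = new float[dim];
        for (float[] v : vectors) {
            if (v.length != dim) {
                throw new IllegalArgumentException("vector dimensions differ");
            }
            for (int i = 0; i < dim; i++) {
                mean[i] += v[i] / vectors.length;
            }
        }
        if (unitVector) {
            // Rescale to length 1 so the centroid is valid for a unit-vector field.
            double norm = 0;
            for (float x : mean) {
                norm += (double) x * x;
            }
            norm = Math.sqrt(norm);
            if (norm > 0) {
                for (int i = 0; i < dim; i++) {
                    mean[i] = (float) (mean[i] / norm);
                }
            }
        }
        return mean;
    }
}
```

The normalization step matters because the mean of several unit vectors is generally shorter than unit length.
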
| Commit: | ffd5144 | |
|---|---|---|
| Author: | Matt Davis | |

# Add zuliaadmin updateIndex command

Add an `updateIndex` subcommand to the `zuliaadmin` CLI for changing settings on existing index(es) without recreating them, building on the new replica-count change support.

- Uses the shared `--index` argument, so a single invocation can target one index, a comma-separated list, or a wildcard (e.g. `--index "log_*"`). Only the options supplied are changed; every other setting is left as-is.
- Reports per index whether the update changed anything or was a no-op.
- Fix a stale comment in `zulia_index.proto`: the default for `numberOfReplicas` is 0, not 1.

## Usage

```
zuliaadmin updateIndex --index myIndex --numberOfReplicas 2
zuliaadmin updateIndex --index "log_*" --numberOfReplicas 1 --indexWeight 5
```

| Commit: | 5b9b8b2 | |
|---|---|---|
| Author: | Matt Davis | |
Merge remote-tracking branch 'origin/main' into worktree-replica-config # Conflicts: # zulia-common/src/main/proto/zulia_index.proto
| Commit: | 59b4d22 | |
|---|---|---|
| Author: | Matt Davis | |

# Support changing the replica count for an existing index

Previously the replica count was fixed at index creation, and any attempt to change it threw "Cannot change replication factor for existing index yet". This adds the ability to grow or shrink the number of replicas on a live index via `UpdateIndex` (or by re-applying a changed config through `CreateIndex`).

| Commit: | 15e068f | |
|---|---|---|
| Author: | Matt Davis | |

# Add configurable, live NRT cache sizes

The index and taxonomy `NRTCachingDirectory` instances used hardcoded RAM cache sizes (50/150 MB for the index, 5/15 MB for the taxonomy) per shard. That ~165 MB/shard ceiling is oversized for many small or low-write indexes. Expose the sizes as per-index settings, allow disabling NRT caching entirely, and make the values changeable live.

`ZuliaNRTCachingDirectory` is a subclass of Lucene `NRTCachingDirectory` that reads its thresholds and the disabled flag from the live `ServerIndexConfig` on each `doCacheWrite`. Because `reloadIndexSettings()` reconfigures that same config instance in place, an `updateIndexSettings` change takes effect with no IndexWriter reopen. Lowering a cap or disabling only stops caching new segments, and already-cached bytes drain at the next commit.

No `CreateIndexRequestValidator` change: defaults are applied at use time (like `ramBufferMB`), not baked into stored settings, which keeps "0 = current default" and is what makes the live subclass simple.

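
The "read the thresholds on every decision, apply defaults at use time" pattern can be modeled without Lucene. The sketch below is a simplified stand-in, not `ZuliaNRTCachingDirectory` itself: the class names are hypothetical, and reading "50/150 MB" as a per-file limit and a total cache limit is an assumption.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical model of a cache policy that reads live settings on each decision. */
final class LiveNrtCachePolicySketch {

    /** Per-index settings in MB; 0 means "use the current default". */
    static final class CacheSettings {
        final boolean disabled;
        final double maxMergeSizeMB;
        final double maxCachedMB;

        CacheSettings(boolean disabled, double maxMergeSizeMB, double maxCachedMB) {
            this.disabled = disabled;
            this.maxMergeSizeMB = maxMergeSizeMB;
            this.maxCachedMB = maxCachedMB;
        }
    }

    // Index-directory defaults taken from the commit text.
    static final double DEFAULT_MAX_MERGE_SIZE_MB = 50;
    static final double DEFAULT_MAX_CACHED_MB = 150;
    private static final double BYTES_PER_MB = 1024 * 1024;

    private final AtomicReference<CacheSettings> live;

    LiveNrtCachePolicySketch(AtomicReference<CacheSettings> live) {
        this.live = live;
    }

    /** Decides whether a new file should go to the RAM cache, using the settings as they are now. */
    boolean shouldCache(long fileBytes, long alreadyCachedBytes) {
        CacheSettings s = live.get();
        if (s.disabled) {
            return false;
        }
        // Defaults are resolved here, at use time, so a stored 0 keeps meaning "current default".
        double mergeMB = s.maxMergeSizeMB > 0 ? s.maxMergeSizeMB : DEFAULT_MAX_MERGE_SIZE_MB;
        double cachedMB = s.maxCachedMB > 0 ? s.maxCachedMB : DEFAULT_MAX_CACHED_MB;
        return fileBytes <= mergeMB * BYTES_PER_MB
                && fileBytes + alreadyCachedBytes <= cachedMB * BYTES_PER_MB;
    }
}
```

Because nothing is copied out of the settings object at construction time, swapping the referenced settings changes the next decision without rebuilding the policy.
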
| Commit: | 99a5278 | |
|---|---|---|
| Author: | Matt Davis | |

Add segment replication

Adds Lucene segment replication: a primary fans new commits out to its configured replica nodes via a per-shard generation gate, virtual-thread executor, and per-replica circuit breaker. Adds two internal gRPC RPCs: `InternalGetSegmentFileInfo` (unary: replica reports its current file list and `segments_N` footer checksums) and `InternalSendSegmentFiles` (bidi stream: primary pushes byte chunks). The receiver stages `segments_N` under a `pending-` prefix and atomically renames it on commit so a refresh mid-stream cannot observe a torn header.

A 30s watchdog re-fires replication on every primary shard to recover from missed checkpoints (replica unreachable, circuit open, inbound semaphore exhausted); on a healthy cluster every tick hits the generation gate and returns. A `/replication/{indexName}` REST endpoint reports per-shard and per-replica progress (last attempted generation, lag, last-success timestamp, circuit-breaker state).

The publisher ships every file the new commit references when its `segments_N` differs from the replica's, since segment-name reuse is per-index (a fresh replica's bootstrap `_0.si` can match the primary's by length but not by content). The receiver deletes any existing file before writing so the bootstrap files get clobbered on first replication.

Wire-level: both replication RPCs advertise `gzip` compression on the client side; the replica response to `InternalGetSegmentFileInfo` includes the replica's `Version.LATEST` so the primary can reject mismatches before any byte transfer, requiring `replicaVersion >= primaryVersion` so operators can roll Lucene-version upgrades replica-first. Bytes leaving each node are gated by a shared `ReplicationRateLimiter` token bucket (default 60 MiB/s, `0` disables).

Server changes:

- `ZuliaIndex` owns the `SegmentReplicationManager` and the watchdog timer; registers a post-commit hook on each primary shard at load time.
- `ShardWriteManager` wraps `KeepOnlyLastCommitDeletionPolicy` in a `SnapshotDeletionPolicy` and uses `SnapshotDirectoryTaxonomyWriter` so the manager can snapshot both directories for a consistent commit view.
- `ZuliaShard` gains a read-only replica constructor (backed by `ShardReadManager`) and a `refreshReaders` entry point the applier calls after each stream completes.
- `ZuliaIndexManager` exposes `getIndexFromName` and a new `getReplicationState(String)` for the REST controller and the segment-file handlers; constructs the shared `ReplicationRateLimiter` once from `ZuliaConfig` and threads it through every `ZuliaIndex`.
- `InternalRpcConnection` adds an async stub for the bidi stream.
- `ZuliaConfig` gains `replicaResponseTimeout` (minutes, default 10) for the bidi-stream deadline and `replicationMaxBytesPerSec` (default 60 MiB/s) for the per-node bandwidth cap.

Internal structure inside the manager:

- `ShardState` (per shard) holds the `ReentrantLock` and last attempted generation; `ReplicaState` (per replica per shard) holds the last replicated generation, last-success timestamp, and circuit-breaker counters with encapsulated synchronisation.
- `CommitSnapshot` is an `AutoCloseable` record that owns the index + taxonomy `IndexCommit` pair for one replication pass; `tryTake` handles the "no commit yet" case and the receiver uses try-with-resources for release.
- `ReplicationUtil` consolidates the shared `isSegmentsFile` predicate and footer-checksum helper used by both the publisher and the receiver.

Tests:

- End-to-end `ReplicationTest` across two nodes: create index with one shard and one replica, index docs, force commit via optimize, wait for replica convergence, verify `PRIMARY_ONLY` / `REPLICA_ONLY` queries match, and check the REST endpoint reports a populated state from the primary node.
- Unit `SegmentReplicationManagerTest` covers the no-replica short-circuit, the observability snapshot shape before any attempt, idempotent shutdown, watchdog-after-shutdown silence, and that publisher exceptions can't fail the commit hook.

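
Three of the checks named in this commit (the generation gate, the `replicaVersion >= primaryVersion` rule, and the per-replica circuit breaker) are small enough to sketch. Everything below is a hypothetical simplification: the class names, the "greater than" gate, the version encoding as `{major, minor, bugfix}`, and the failure threshold are assumptions, not Zulia's implementation.

```java
import java.util.Arrays;

/** Hypothetical sketch of the cheap checks that run before any bytes are shipped. */
final class ReplicationGateSketch {

    /** Nothing to send when the replica already holds this commit generation. */
    static boolean needsReplication(long primaryGeneration, long lastReplicatedGeneration) {
        return primaryGeneration > lastReplicatedGeneration;
    }

    /** Versions are {major, minor, bugfix}; the replica must be at least as new as the primary. */
    static boolean versionCompatible(int[] replicaVersion, int[] primaryVersion) {
        return Arrays.compare(replicaVersion, primaryVersion) >= 0;
    }

    /** Opens after a run of consecutive failures; any success closes it again. */
    static final class CircuitBreaker {
        private final int threshold;
        private int consecutiveFailures;

        CircuitBreaker(int threshold) {
            this.threshold = threshold;
        }

        synchronized void recordSuccess() {
            consecutiveFailures = 0;
        }

        synchronized void recordFailure() {
            consecutiveFailures++;
        }

        synchronized boolean isOpen() {
            return consecutiveFailures >= threshold;
        }
    }
}
```

Requiring the replica to be at least as new as the primary is what allows replica-first rolling upgrades: an upgraded replica can still read segments written by an older primary.
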
| Commit: | 7b39e24 | |
|---|---|---|
| Author: | Matt Davis | |

Merge origin/main into worktree-primary-replica-rename

Conflict in `ZuliaIndexManager` (clear / optimize / reindex), where both branches modified the same federator construction lines:

- main switched the local lookup from `getIndexFromName(...)` to `getIndexForWrite(...)` so write ops resolve through multi-index aliases
- HEAD renamed `MasterSlaveSettings.MASTER_ONLY` to `PrimaryReplicaSettings.PRIMARY_ONLY`

Resolved by keeping `getIndexForWrite(...)` from main alongside the `PrimaryReplicaSettings.PRIMARY_ONLY` rename from HEAD. `./gradlew compileJava compileTestJava` passes cleanly.

| Commit: | fb2abb8 | |
|---|---|---|
| Author: | Matt Davis | |

Rename MasterSlaveSettings to PrimaryReplicaSettings

Renames the proto enum `MasterSlaveSettings` to `PrimaryReplicaSettings` and its values (`MASTER_ONLY`/`SLAVE_ONLY`/`MASTER_IF_AVAILABLE` -> `PRIMARY_ONLY`/`REPLICA_ONLY`/`PRIMARY_IF_AVAILABLE`). The old names are kept as deprecated aliases via `option allow_alias = true` so existing wire traffic and on-disk protobufs stay compatible. The Java accessors and field names change, so consumers of the generated client/server code must recompile.

Also renames the supporting Java classes for consistency:

- `MasterSlaveSelector` -> `PrimaryReplicaSelector`
- `MasterSlaveNodeRequestFederator` -> `PrimaryReplicaNodeRequestFederator`
- `getMasterSlaveSettings` / `setMasterSlaveSettings` -> `getPrimaryReplicaSettings` / `setPrimaryReplicaSettings`

Rename only, no behavior changes.

| Commit: | 6423511 | |
|---|---|---|
| Author: | Matthew Davis | |
| Committer: | GitHub | |
Merge pull request #184 from zuliaio/worktree-multi-index-aliases Support multi-index aliases with designated write index
| Commit: | 8d845b6 | |
|---|---|---|
| Author: | Matt Davis | |

# Support multi-index aliases with designated write index

* Extend `IndexAlias` so a single alias can point at multiple indexes and fan out queries across all of them. An optional `writeIndex` designates which member receives any single-index mutation (store, delete, batchDelete, clear, optimize, reindex, associated-doc writes, updateIndex, getIndexSettings), enabling the rollover / zero-downtime reindex pattern without juggling two aliases.
* `IndexAlias` gains `repeated string indexNames` and `string writeIndex`.
* Old persisted aliases (only `indexName` set) resolve through the fallback in `IndexAliasUtil.getIndexNames`. New writes populate `indexNames` only. Old-format data stays readable, new-format data is written cleanly.

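
The legacy fallback and the write-index lookup can be illustrated with a small stand-in for the `IndexAlias` message. The `Alias` class and both methods are hypothetical, and rejecting a write against a multi-index alias that has no `writeIndex` is an assumption about the behavior, not something the commit message states.

```java
import java.util.List;

/** Hypothetical sketch of alias resolution with the old single-index fallback. */
final class AliasResolutionSketch {

    /** Stand-in for the IndexAlias proto; empty string or list means "unset", as in proto3. */
    static final class Alias {
        final String indexName;        // legacy single-index field
        final List<String> indexNames; // new multi-index field
        final String writeIndex;

        Alias(String indexName, List<String> indexNames, String writeIndex) {
            this.indexName = indexName;
            this.indexNames = indexNames;
            this.writeIndex = writeIndex;
        }
    }

    /** Prefer the new repeated field; fall back to the legacy field for old persisted aliases. */
    static List<String> getIndexNames(Alias alias) {
        if (!alias.indexNames.isEmpty()) {
            return alias.indexNames;
        }
        return alias.indexName.isEmpty() ? List.of() : List.of(alias.indexName);
    }

    /** Single-index mutations need exactly one target. */
    static String resolveWriteIndex(Alias alias) {
        if (!alias.writeIndex.isEmpty()) {
            return alias.writeIndex;
        }
        List<String> names = getIndexNames(alias);
        if (names.size() == 1) {
            return names.get(0);
        }
        throw new IllegalArgumentException("alias does not designate a single write index");
    }
}
```
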
| Commit: | 2ee303f | |
|---|---|---|
| Author: | Matt Davis | |

Adds per-clause `boost` and a `VECTOR_SHOULD` query type that allows linear hybrid retrieval (BM25 + KNN).

- `Query.boost` is a score multiplier applied per clause via Lucene `BoostQuery`. The proto default of 0 is treated as unset, and 1.0 is treated as a no-op (no wrapping). It is ignored at runtime for non-scoring clause types.
- With `QueryType.VECTOR_SHOULD`, the vector query contributes to the score but does not constrain the result set to the top-K.

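
The boost rules and the resulting linear combination can be written out as a short sketch. The class and method names are hypothetical, and summing the boosted clause scores is an assumption based on how optional (SHOULD) clauses are usually combined.

```java
/** Hypothetical sketch of per-clause boost handling for hybrid BM25 + KNN scoring. */
final class BoostSketch {

    /** The proto default 0 means "unset"; 1.0 changes nothing, so neither needs a wrapper. */
    static boolean shouldWrapInBoostQuery(float boost) {
        return boost != 0f && boost != 1f;
    }

    /** The multiplier actually applied to a clause's score. */
    static float effectiveBoost(float boost) {
        return boost == 0f ? 1f : boost;
    }

    /** Linear hybrid score: each clause contributes its own score times its boost. */
    static float hybridScore(float bm25Score, float bm25Boost, float knnScore, float knnBoost) {
        return bm25Score * effectiveBoost(bm25Boost) + knnScore * effectiveBoost(knnBoost);
    }
}
```

For example, a BM25 score of 2.0 with no boost set and a KNN score of 0.5 boosted by 2.0 would combine to 3.0 under this model.
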
| Commit: | e5721d5 | |
|---|---|---|
| Author: | Matt Davis | |

Add configurable merge scheduler and indexing throttle settings

Expose maxMergeThreads, maxMergePending, and indexingThrottle as per-index settings that can be configured at create time and updated live. Previously merge thread count was auto-detected and the indexing throttle semaphore was hardcoded to 1 permit.

* Added maxMergeThreads (default 0 for auto-detect), maxMergePending (default 0 for 5 pending), and indexingThrottle (default 0 for 1 concurrency when throttled) to IndexSettings and UpdateIndexSettings proto messages
* Removed hardcoded setMaxMergeAtOnce(4) from TieredMergePolicy (to be removed in Lucene 11) in favor of scheduler-level thread control

| Commit: | 0c4555d | |
|---|---|---|
| Author: | Matt Davis | |
Add index-level description field (This will support the upcoming MCP server by allowing LLMs to understand what each index contains)
| Commit: | 5d58859 | |
|---|---|---|
| Author: | Matthew Davis | |
| Committer: | GitHub | |
Merge pull request #172 from zuliaio/geo_location Add `GEO_POINT` field type with geo-spatial search support
| Commit: | 4f3354d | |
|---|---|---|
| Author: | Matt Davis | |
| Committer: | Matt Davis | |

* Add BatchDeleteGroup with federated routing, mirroring BatchFetch pattern
* Cache MasterSlaveSelector per index in both BatchDeleteRequestFederator and BatchFetchRequestFederator
* Add test cases for batch delete features

| Commit: | 2db32a0 | |
|---|---|---|
| Author: | Matt Davis | |

## Add `GEO_POINT` field type with geo-spatial search support

- **New `GEO_POINT` field type**
  - `.index()` adds `LatLonPoint` for geo filtering (`zl:geo()`, `zl:geoBbox()`)
  - `.sort()` adds `LatLonDocValuesField` for `geodist()` sorting and score functions
  - Sorting and score functions can be used independently, consistent with other field types where doc values are gated behind `.sort()`
- **Query parser additions**
  - `zl:geo`
  - `zl:geoBbox`
- **New `geodist()` function**
  - Supported in both sort expressions and score functions
- **Multi-shard geo sorting**
  - Implemented in `QueryCombiner` and `ZuliaPostSortingComparator`
- **Client API additions**
  - `Sort.geoDistance()`
  - `FieldConfigBuilder.createGeoPoint()`
- **Server-side validation**
  - Added in `CreateIndexRequestValidator`
  - `GEO_POINT` must enable `.index()` and/or `.sort()`
  - `GEO_POINT` cannot be faceted
- **Build updates**
  - Updated `javacc.gradle` for Gradle 9.3.0
  - Upgraded JavaCC to `7.0.13`

| Commit: | c5ae62b | |
|---|---|---|
| Author: | Matt Davis | |

### Add index size on disk and doc count to displayIndexes --detailed

The `displayIndexes --detailed` CLI command now shows document count and index size in MB for each index alongside the shard location info. Index size is computed from the Lucene index and taxonomy directories on the filesystem. The size is collected per shard, summed per index on each node, and aggregated across nodes through the existing getNumberOfDocs federator.

Use of the getNumberOfDocs request is not ideal from a naming perspective, but it makes sense codewise. Need a breaking change to rename this in a later release to getIndexDetails or similar.

| Commit: | 69bc74b | |
|---|---|---|
| Author: | Matt Davis | |
| Committer: | Matt Davis | |
Skip index reload when settings haven't changed Add no-op detection to createIndex and updateIndex so unchanged settings skip the store and cluster-wide reload. Strip createTime/updateTime before comparing in createIndex to avoid false mismatches from protobuf defaults. Add `changed` boolean to CreateIndexResponse and UpdateIndexResponse, exposed via isChanged() on the client result classes. Fix createIndex to preserve createTime on re-create and updateIndex to set updateTime when settings change. Add IndexTest cases for changed flag and timestamp behavior.
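
The "strip timestamps, then compare" step behind the `changed` flag can be sketched with plain maps standing in for the settings message. The class and method names are hypothetical; only the `createTime` and `updateTime` field names come from the commit message.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of no-op detection for createIndex / updateIndex. */
final class SettingsNoOpSketch {

    /** Returns a copy without the bookkeeping timestamps, which must not affect the comparison. */
    static Map<String, Object> stripTimestamps(Map<String, Object> settings) {
        Map<String, Object> copy = new HashMap<>(settings);
        copy.remove("createTime");
        copy.remove("updateTime");
        return copy;
    }

    /** True only when something other than the timestamps differs. */
    static boolean changed(Map<String, Object> stored, Map<String, Object> requested) {
        return !stripTimestamps(stored).equals(stripTimestamps(requested));
    }
}
```

When `changed` is false, the store and the cluster-wide reload can both be skipped.
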
| Commit: | 3f6794f | |
|---|---|---|
| Author: | Matt Davis | |

Lucene filter cache support with per-index disable option

Add configurable Lucene filter (query) cache at the server level and a per-index disableFilterCache option to opt out individual indexes from caching.

Server-level filter cache configuration:

- New filterCacheMaxQueries (default 10000) and filterCacheMaxMemoryMB (default 256) settings in ZuliaConfig
- ZuliaDConfig.setLuceneStatic() now accepts ZuliaConfig and configures IndexSearcher.setDefaultQueryCache() with an LRUQueryCache, or disables caching entirely when either value is 0
- Moved setLuceneStatic() call in StartNodeCmd to after config is loaded so it can read cache settings
- Updated TestHelper to pass default ZuliaConfig to match new signature

Per-index filter cache disable:

- New disableFilterCache field in IndexSettings proto and corresponding update fields in UpdateIndexSettings
- ShardReader sets indexSearcher.setQueryCache(null) when disableFilterCache is true, overriding the server default
- ZuliaIndexManager handles the setDisableFilterCache update operation
- ClientIndexConfig and UpdateIndex expose disableFilterCache getter/setter and wire it through to proto builders

| Commit: | 2fe956f | |
|---|---|---|
| Author: | Matt Davis | |
Merge remote-tracking branch 'origin/main' into conditional_facets # Conflicts: # zulia-server/src/test/java/io/zulia/server/test/node/GeneralFeaturesTest.java
| Commit: | d41a189 | |
|---|---|---|
| Author: | Matt Davis | |

Add per-facet/stat facet conditional aggregation based on total hits threshold

Allow individual CountRequest and StatRequest to specify maxTotalHitsForFacet and maxShardHitsForFacet thresholds that skip expensive aggregation when hit counts are too high. If maxShardHitsForFacet is not set, the default is maxTotalHitsForFacet for the shard. Shard-level filtering in ShardReader prevents aggregation work early, while QueryCombiner applies the global threshold and drops incomplete combiners when shards disagree (preventing partial/incorrect counts).

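
The two-level threshold can be modeled as below. This is a hypothetical sketch: the names are invented, and treating 0 as "unset" follows the usual proto3 default convention rather than anything stated in the commit message.

```java
/** Hypothetical sketch of the shard-level and global facet thresholds. */
final class FacetThresholdSketch {

    /** The shard limit falls back to the total limit when it is not set (0 = unset). */
    static int effectiveShardLimit(int maxShardHitsForFacet, int maxTotalHitsForFacet) {
        return maxShardHitsForFacet > 0 ? maxShardHitsForFacet : maxTotalHitsForFacet;
    }

    /** Shard side: skip the aggregation work early when the shard alone has too many hits. */
    static boolean aggregateOnShard(long shardHits, int maxShardHitsForFacet, int maxTotalHitsForFacet) {
        int limit = effectiveShardLimit(maxShardHitsForFacet, maxTotalHitsForFacet);
        return limit <= 0 || shardHits <= limit;
    }

    /** Combiner side: keep the facet only if every shard aggregated and the total is in range. */
    static boolean keepGlobally(long totalHits, int maxTotalHitsForFacet, boolean everyShardAggregated) {
        return everyShardAggregated && (maxTotalHitsForFacet <= 0 || totalHits <= maxTotalHitsForFacet);
    }
}
```

Dropping the facet when any shard skipped it avoids returning counts that silently cover only part of the index.
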
| Commit: | 20dd2f1 | |
|---|---|---|
| Author: | Matt Davis | |

Optimize BatchFetch with TermInSetQuery and add BatchFetchGroup API

- Replace sequential per-document TermQuery lookups in batch fetch with batched TermInSetQuery per shard, reducing reader acquires and Lucene queries from N to S (number of shards).
- Add InternalBatchFetch RPC with pre-grouped shard-level requests (InternalShardBatchFetchRequest) to avoid redundant regrouping on receiving nodes and eliminate recursive call risk.
- Add BatchFetchGroup proto message and BatchFetchGroupBuilder client API for efficiently fetching multiple documents by index, ID list, and shared fetch profile (document fields, masked fields, associated fetch type) without creating individual FetchRequest objects per document.

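
The reduction from N lookups to S comes from grouping the requested IDs by shard before querying. The sketch below shows only that grouping step; the class is hypothetical and the hash-based `shardFor` routing is a placeholder, not Zulia's actual shard assignment.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: group document IDs by shard so each shard is queried once. */
final class BatchFetchGroupingSketch {

    /** Placeholder routing function; the real shard assignment may differ. */
    static int shardFor(String id, int numberOfShards) {
        return Math.floorMod(id.hashCode(), numberOfShards);
    }

    /** One entry per shard that owns at least one requested ID. */
    static Map<Integer, List<String>> groupByShard(Collection<String> ids, int numberOfShards) {
        Map<Integer, List<String>> byShard = new TreeMap<>();
        for (String id : ids) {
            byShard.computeIfAbsent(shardFor(id, numberOfShards), k -> new ArrayList<>()).add(id);
        }
        return byShard;
    }
}
```

Each resulting list would then back a single set-membership query on its shard, instead of one term lookup per document.
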
| Commit: | bb178c2 | |
|---|---|---|
| Author: | Matt Davis | |
Include search ID and label in shard-level debug log lines for correlation
| Commit: | 16e306b | |
|---|---|---|
| Author: | Matthew Davis | |
| Committer: | GitHub | |

Add HTML stripping as an option for analyzer settings (#152)

* Add default standard analyzer with html stripping built in
* Refactor ZuliaPerFieldAnalyzer to split out the tokenization chain into ZuliaFieldAnalyzer
* Make sure in analysis requests that a custom analyzer used would be closed

| Commit: | f9a8101 | |
|---|---|---|
| Author: | Matthew Davis | |
| Committer: | GitHub | |

Adding individual facet fields, and facet groups optimizations, merge optimizations, and shard info to the query controller (#150)

* Individual facet fields, and facet groups optimizations
* mark which shard the result is returned by
* adjust merge policy

| Commit: | 779343b | |
|---|---|---|
| Author: | Matthew Davis | |
| Committer: | GitHub | |
refactor stats api to include information about the cache stats for both the general and pinned caches. Change stats endpoint to use the proto object directly instead of the statsdto object. (#146)
| Commit: | 20c175a | |
|---|---|---|
| Author: | Matt Davis | |

* bump mongo driver to 5.5.1 from 5.4.0 (performance gains in theory, but probably not much for the zulia use case)
* bump common compress to 1.28.0 from 1.27.1, fixes CVE-2025-48924
* upgrade micronaut to 4.9.1 from 4.7.6 (virtual threads can be enabled, although experimental now https://micronaut.io/2025/06/30/transitioning-to-virtual-threads-using-the-micronaut-loom-carrier/)
* upgrade protobuf to 4.31.1 from 3.25.5
* upgrade grpc to 1.73.0 from 1.72.0
* import DDSketch from the ddsketch repo and disable using the version from the jar in the build with the protobuf syntax in the gradle to get around the poisoning due to the CVE until they fix it. Explicitly place the zulia commons project at the top of the classpath to use our version of ddsketches first.

| Commit: | 023f768 | |
|---|---|---|
| Author: | Matthew Davis | |
| Committer: | GitHub | |

Virtual threads search and aggregation (#140)

* implement facet concurrency using virtual threads
* allow concurrency to be specified on search / index / or node level

| Commit: | b753485 | |
|---|---|---|
| Author: | Matt Davis | |
Allow searches that are not real time (see only what has been committed). Non realtime searches allow higher performance during indexing. Searches, fetches (documents only, not associated files), Get Terms, Get Fields all support the real time flag. The default for real time is false. Previous versions of Zulia were realtime=true and not modifiable.
| Commit: | aeeec4c | |
|---|---|---|
| Author: | Matthew Davis | |
| Committer: | GitHub | |
add german normalization filter and custom analyzer test case (#115)
| Commit: | 07f1d15 | |
|---|---|---|
| Author: | Matthew Davis | |
| Committer: | GitHub | |
add snappy compression for stored BSON documents and metadata by default. enable it to be turned off via an index flag in create/update index (#109)
| Commit: | 96e26d9 | |
|---|---|---|
| Author: | Matthew Davis | |
| Committer: | GitHub | |

Create index field mapping aliases (#102)

* update proto for field aliases
* add field mapping aliases

| Commit: | 20b54b8 | |
|---|---|---|
| Author: | Matthew Davis | |
switch stored docs to binary doc values
| Commit: | 811b05e | |
|---|---|---|
| Author: | Matthew Davis | |
add more complete facet drill down with all/any and exclude logic
| Commit: | c9f544c | |
|---|---|---|
| Author: | William Millman | |
| Committer: | GitHub | |

External document (#88)

* Adding the ability to link external files to documents in zulia
* Added a helper builder to ensure external S3 documents use the correct keys.

| Commit: | adb59cc | |
|---|---|---|
| Author: | Matthew Davis | |
reformat code
| Commit: | 699b645 | |
|---|---|---|
| Author: | Matthew Davis | |
Reformat all files using correct formatter
| Commit: | d4c8d70 | |
|---|---|---|
| Author: | Matthew Davis | |

* improve index settings display on REST to show warmed searches
* add ability to see if search response is pinned
* change log line to show if search is returned from the cache

| Commit: | 9e4d6a3 | |
|---|---|---|
| Author: | Ian Swepston | |
| Committer: | GitHub | |

Better numeric behaviors and support for shards (#80)

* Adds support for shards on StatFacet
* Adds ability to request percentiles in a StatFacet and NumericStat
* Adds ability to generate numeric filters with a factory and apply them to a query
* Fixes some bugs in query validation of NumericStat in sharded environment
* Added a change which will add a match all query if only exclude filters are specified. Also added filter tests for the numeric range filter

| Commit: | ad81341 | |
|---|---|---|
| Author: | Matthew Davis | |
| Committer: | GitHub | |

Cache updates (#79)

* Implement cache pinning. Searches that are pinned remain in cache until the index is changed or zulia is restarted. There is no limit to the number of searches that can be pinned to an index so watch out for RAM use.
* Allow zulia client to know if query response was cached
* Add warming searches to create index and update index with changes to proto files, java client, and backend. Warmed searches are stored as bytes instead of a QueryRequest to reduce dependency between query and indexing and to allow a future incompatibility not to break indexing loading. Actual warming of searches is not part of this commit.
* switch to Caffeine for query caching (implements Window TinyLfu for near optimal hit rate)
* add shard warming timer config and warm searches on timer. refactor shard query to allow the logic to be abstracted
* ensure order is kept on index updates of field config, warmed searches, ....
* basic index creation test cases with warmed searches
* Add framework to shard for displaying cache statistics (not usable by a client yet)
* add numeric set query type that is like term query for numbers
* upgrade to lucene 9.4
* bump gradle to 7.5.1
* update mongodb java driver to 4.7.2 (mongodb 6.0 feature support)
* upgrade to micronaut 3.7.1 from 3.4.3
* bump snakeyaml for vulnerability patch
* bump embedded mongo from 3.4.5 to 3.4.11
* bump protobuf to 3.21.7
* fix compile post protobuf bump to 3.21.7 (https://github.com/protocolbuffers/protobuf/issues/10695)
* remove legacy parser and dismax features only supported through legacy parser
* add swagger-ui to rest service
* fix serving of index html page for the rest service with links to swagger and wiki

| Commit: | 394ef89 | |
|---|---|---|
| Author: | Matthew Davis | |
remove legacy parser and dismax features only supported through legacy parser
| Commit: | e050395 | |
|---|---|---|
| Author: | Matthew Davis | |
add numeric set query type that is like term query for numbers
| Commit: | c144daf | |
|---|---|---|
| Author: | Matt Davis | |
| Committer: | GitHub | |

add (partial) update index functionality to zulia (#72)

* add (partial) update index functionality to zulia
* document new features:
  - add update index java client documentation
  - document metadata in create index
  - vector queries
* improve documentation for existing:
  - delete index with associated documents
  - term queries

| Commit: | 12f4300 | |
|---|---|---|
| Author: | Matthew Davis | |
add ability to set metadata bson on the zulia index
| Commit: | d1cf3b5 | |
|---|---|---|
| Author: | Matt Davis | |
| Committer: | GitHub | |
add vector searching from lucene 9.0 using KnnVectorQuery and KnnVectorField replacing the old zulia handling (#70)
| Commit: | 1ab1999 | |
|---|---|---|
| Author: | Matt Davis | |
| Committer: | GitHub | |
add default false flag to not delete associated files to delete index (#61) command line option to actually delete needs to be added still
| Commit: | fe88003 | |
|---|---|---|
| Author: | Karl | |
| Committer: | GitHub | |

Zulia Version Information (#54)

* Improved versioning support.
* Added version flag to command-line
* Added version info to nodes
* Added version header to getCurrentNodes

| Commit: | b56eb01 | |
|---|---|---|
| Author: | Matt Davis | |
| Committer: | GitHub | |

Lucene 9.0 (#52)

* Update to lucene 9.0
* Update to grpc-java v1.43.2 and Protobuf to 3.19.2 to avoid CVE-2021-22569.
* Update mongo java driver to 4.4.0
* Add better logging for channel close
* Make sure to clean/close up internal connections
* General warning cleanup, package cleanup, and refactoring for simplicity
* Clear out the old closed nodes in the test helper
* Fix java 17 deprecation warning
* Update to gradle 7.4
* Add new zulia query parser based on org.apache.lucene.queryparser.flexible.standard
* New parser supports minimum should match added via (term term2 term3)~n syntax
* New parser supports multiple field search via field1,field2:...
* New parser supports GTE/LTE/GT/LT operators like pubYear>=2020
* Legacy parser is still available with legacy option on Query
* Legacy parser supports dismax but new one does not yet
* new boolean facet handling maps to True/False only from true/t/y/yes f/false/n/no
* change boolean sort to be numeric based doc values
* Ensure field name cannot contain a comma
* Add taxonomy stats support for a boolean by treating it like 0,1

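
The boolean facet mapping listed in this commit can be sketched as a small lookup. The class is hypothetical, and trimming plus case-insensitive matching are assumptions; the commit message only lists the accepted spellings.

```java
import java.util.Locale;
import java.util.Optional;

/** Hypothetical sketch of mapping raw boolean values to the True/False facet labels. */
final class BooleanFacetSketch {

    /** Returns "True" or "False" for a recognized spelling, or empty for anything else. */
    static Optional<String> facetLabel(String raw) {
        switch (raw.trim().toLowerCase(Locale.ROOT)) {
            case "true":
            case "t":
            case "y":
            case "yes":
                return Optional.of("True");
            case "false":
            case "f":
            case "n":
            case "no":
                return Optional.of("False");
            default:
                return Optional.empty();
        }
    }
}
```
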
| Commit: | 8d37c95 | |
|---|---|---|
| Author: | Matt Davis | |
| Committer: | GitHub | |

Index alias (#51)

* Add index alias support to zulia
* Minor cleanup of test cases and client

| Commit: | db5737e | |
|---|---|---|
| Author: | Payam Meyer | |
| Committer: | GitHub | |
#48 - Added allDocCounts so that documents with null or empty values for a facet get counted towards total count. (#49)
| Commit: | 09db76a | |
|---|---|---|
| Author: | Matthew Davis | |
add TaxonomyStatsHandler and wire into shard reader
| Commit: | 730b4c2 | |
|---|---|---|
| Author: | Matthew Davis | |
add stat request proto messages and refactor shardreader to allow easier coding of actual request
| Commit: | 0521f23 | |
|---|---|---|
| Author: | Matthew Davis | |
add hierarchical faceting to zulia
| Commit: | 2e5d45c | |
|---|---|---|
| Author: | Matthew Davis | |

* add FILTER_NOT and TERMS_NOT query type on server
* add .exclude() and .include() to TermQuery and FilterQuery

| Commit: | 6e9909a | |
|---|---|---|
| Author: | Matthew Davis | |
Add new scored function api and a new java search builder to support new functionality (old query will be deprecated soon)
| Commit: | 4c7d476 | |
|---|---|---|
| Author: | Matthew Davis | |
refactor client protocol to allow more advanced types of queries; add terms query using new refactor
| Commit: | 555d5fa | |
|---|---|---|
| Author: | Matthew Davis | |
change missing last to missing first
| Commit: | 8265230 | |
|---|---|---|
| Author: | Matthew Davis | |
fix sorting bugs with null (missing) values; add ability to put missing first or last
| Commit: | 5ef21c8 | |
|---|---|---|
| Author: | Matt Davis | |
#19 add field display name and description as optional attributes of a field
| Commit: | f348195 | |
|---|---|---|
| Author: | Matt Davis | |
| Committer: | GitHub | |
Reindex (#16)
* add reindex in place functionality
| Commit: | f845f1b | |
|---|---|---|
| Author: | Matt Davis | |
#8 switch metadata to be bson/json format
| Commit: | 874f021 | |
|---|---|---|
| Author: | Matt Davis | |
add case protected words filter
| Commit: | bf70085 | |
|---|---|---|
| Author: | Matt Davis | |
add option to set ram buffer to larger size
| Commit: | 57fd003 | |
|---|---|---|
| Author: | Matt Davis | |
make sure index writer and analyzers are reopened
| Commit: | e25131d | |
|---|---|---|
| Author: | payammeyer | |
Fixed typo
| Commit: | adf975c | |
|---|---|---|
| Author: | Matt Davis | |
default value to tfidf (Breaking change to make proto3 work)
| Commit: | 4dcfe94 | |
|---|---|---|
| Author: | Matt Davis | |
#2 allow query field (qf) to have wildcards; allow multiple default search fields
| Commit: | 0161f97 | |
|---|---|---|
| Author: | payammeyer | |
Updated docId to luceneShardId. Added query command to zulia.
| Commit: | a324892 | |
|---|---|---|
| Author: | payammeyer | |
Added scored filter query.
| Commit: | 9a1d659 | |
|---|---|---|
| Author: | Matt Davis | |
update to new gradle, fix compile issues
| Commit: | 6618cea | |
|---|---|---|
| Author: | Payam Meyer | |
Filled in query defaults and created QueryRequestValidator.
| Commit: | d91c272 | |
|---|---|---|
| Author: | Matt Davis | |
start filling in default values / validation
| Commit: | 75caa60 | |
|---|---|---|
| Author: | Matt Davis | |
set up index routing framework for internal requests
| Commit: | 004f0e7 | |
|---|---|---|
| Author: | Matt Davis | |
implement basic create index logic; start to implement primary/replica search logic
| Commit: | f9fa517 | |
|---|---|---|
| Author: | Matt Davis | |
add internal create index and delete index request
| Commit: | 1b37e52 | |
|---|---|---|
| Author: | Matt Davis | |
make batch fetch/delete stream; remove client side cache; add optional fetching of all nodes instead of just live nodes in service call
| Commit: | 7748e82 | |
|---|---|---|
| Author: | Matt Davis | |
optimize updates
| Commit: | 8cf9937 | |
|---|---|---|
| Author: | Matt Davis | |
add back internal version of store, fetch, and delete
| Commit: | bb2a598 | |
|---|---|---|
| Author: | Matt Davis | |
refactor to use federator or router
| Commit: | 722f862 | |
|---|---|---|
| Author: | Matt Davis | |
continue refactor from lumongo
| Commit: | 839d68f | |
|---|---|---|
| Author: | Payam Meyer | |
Added handlers for server service
| Commit: | 08a08e0 | |
|---|---|---|
| Author: | Matt Davis | |
start internal client refactor with common pool 2.x and grpc and zulia
| Commit: | 527bd95 | |
|---|---|---|
| Author: | Matt Davis | |
start the removal of hazelcast
| Commit: | 89e9538 | |
|---|---|---|
| Author: | Matt Davis | |
add master slave settings
| Commit: | c60b6eb | |
|---|---|---|
| Author: | Payam Meyer | |
Finished porting the client.
| Commit: | fcebab5 | |
|---|---|---|
| Author: | Payam Meyer | |
Added REST resources, some minor refactor.
| Commit: | 271747d | |
|---|---|---|
| Author: | Matt Davis | |
beginning of the client port
| Commit: | 5f2704d | |
|---|---|---|
| Author: | Matt Davis | |
move over document storage from lumongo and some analysis (filter, analyzer, similarity) classes from lumongo
| Commit: | 0495913 | |
|---|---|---|
| Author: | Matt Davis | |
make sure segment is renamed to shard
| Commit: | 4a41b70 | |
|---|---|---|
| Author: | Matt Davis | |
add start scripts; add some configuration setup
| Commit: | a9cb044 | |
|---|---|---|
| Author: | Matt Davis | |
bring over proto files from lumongo, rearrange them, and make them proto3 compatible