Proto commits in zuliaio/zuliasearch

The Protocol Buffers files changed in the following 94 commits:

Commit:05648d5
Author:Matt Davis

# Add per-field doc-values skip index (default on for new fields)

Lucene 10.0 introduced the doc-values skip index (`DocValuesSkipper`), and 10.3/10.4/10.5 built a series of optimizations on top of it: block-skipping `SortedNumeric`/`SortedSet` range queries, dynamic-pruning numeric and sorted-set sorts, and skipper-backed `FieldExistsQuery`/`count()`/`rewrite()`. None of these fire unless the doc-values field was written with a skip index. Zulia stores every value as doc values and drives range queries and sorts off them, so adding this optimization helps many places.

# Turn on doc-values skip index for new fields and the id field in new indexes

* write `SortAs` sort doc-values via `SortedNumericDocValuesField.indexedField` / `SortedSetDocValuesField.indexedField` when the field has the flag set, otherwise the plain constructor
* `zulia_index.proto`: add `optional bool docValueSkipIndex` to `FieldConfig`. `optional` so we can distinguish "unset" (default on for a new field) from an explicit opt-out.
* Any new field defaults on unless it explicitly opts out
* A field that already exists is frozen to its persisted flag. Lucene treats the skip index as part of the immutable field schema and the IndexWriter throws if it changes, so we never flip it on an existing field.
* The built-in `zuliaId` sort field is not a `FieldConfig`, so it is gated on a new index-level version instead. `ShardDocumentIndexer` builds the id-sort skip index when `createdIndexVersion >= ZuliaIndexVersion.ID_SORT_DOC_VALUE_SKIP`, so new indexes get it and existing indexes keep the plain field.
* `ZuliaIndexVersion` holds the current version plus per-feature introduction versions. It generalizes to other creation-time defaults that can't be expressed per field
* Tests: `DocValueSkipIndexTest` (sort field type, skipper present in segment, id-sort skipper follows index version) and `DocValueSkipIndexPolicyTest` (per-field default-on / freeze policy).
* The query side needs no changes
* No index-format break. Existing indexes stay readable and keep working unchanged, but they do not get the optimization
  - To get the benefit on existing data, that data must be reindexed into a new index
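
As a rough illustration of the default-on / freeze policy and the version gate described in this commit, the decision can be sketched in plain Java. This is not Zulia's code: the class name, the method names, and the numeric value of `ID_SORT_DOC_VALUE_SKIP` are hypothetical stand-ins, and the persisted and requested flags are modeled as `Optional<Boolean>` values.

```java
import java.util.Optional;

/** Hypothetical model of the skip-index policy; not Zulia's actual classes. */
final class SkipIndexPolicySketch {

    /** Assumed placeholder for ZuliaIndexVersion.ID_SORT_DOC_VALUE_SKIP; the real value is not given in the commit. */
    static final int ID_SORT_DOC_VALUE_SKIP = 2;

    /**
     * @param persisted flag stored for a field that already exists, empty for a new field
     * @param requested value of the optional proto field, empty when the caller left it unset
     */
    static boolean effectiveFlag(Optional<Boolean> persisted, Optional<Boolean> requested) {
        if (persisted.isPresent()) {
            // Existing field: frozen to what was persisted, whatever the request says.
            return persisted.get();
        }
        // New field: default on unless the request explicitly opts out.
        return requested.orElse(true);
    }

    /** The built-in id sort field is gated on the version the index was created with. */
    static boolean idSortUsesSkipIndex(int createdIndexVersion) {
        return createdIndexVersion >= ID_SORT_DOC_VALUE_SKIP;
    }
}
```

Modeling "unset" as an empty `Optional` mirrors why the proto field is declared `optional`: a plain `bool` could not distinguish an explicit `false` from a missing value.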

The documentation is generated from this commit.

Commit:5b7f979
Author:Matt Davis

Add maxDocFreqPct to More-Like-This term selection with a default of 25

Commit:50adcd2
Author:Matt Davis

More-like-this: fetch source documents from other indexes

* Add `sourceIndex` to more-like-this so the source documents resolved from `documentId` can be fetched from indexes other than the ones the similarity search runs on. This lets you find documents in one index that are similar to a document living in a different index.
* When given, the queried indexes are ignored for fetching; sources are fetched only from the source indexes, tried in order: the first source index that has the document wins on collision (same ordering rule the queried indexes used). Wildcards and aliases resolve as usual.
* Source-doc exclusion (`includeSourceDocs=false`) is dropped when the source indexes are disjoint from the queried indexes: the source docs can't be in the queried results.
* Add test cases for source index MLT queries
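
The ordering and exclusion rules above can be sketched as follows. This is a minimal model, not the server implementation: the class and method names are hypothetical, and the per-index document lookup is replaced by an in-memory map from index name to the set of document IDs it holds.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical model of source-index resolution for more-like-this. */
final class MltSourceIndexSketch {

    /** Returns the first source index, in request order, that contains the document. */
    static Optional<String> owningIndex(String documentId, List<String> sourceIndexes,
            Map<String, Set<String>> idsByIndex) {
        for (String index : sourceIndexes) {
            if (idsByIndex.getOrDefault(index, Set.of()).contains(documentId)) {
                return Optional.of(index);
            }
        }
        return Optional.empty();
    }

    /**
     * Source documents only need to be excluded from results when the caller did not
     * ask to include them and at least one source index is also a queried index.
     */
    static boolean applySourceDocExclusion(boolean includeSourceDocs, Set<String> queriedIndexes,
            Set<String> sourceIndexes) {
        if (includeSourceDocs) {
            return false;
        }
        return !Collections.disjoint(queriedIndexes, sourceIndexes);
    }
}
```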

Commit:c4c47bc
Author:Matt Davis
Committer:Matt Davis

Add more-like-this (MLT) query support

Adds a MORE_LIKE_THIS query type supporting lexical (Lucene MoreLikeThis), vector (KNN on averaged source vectors), and hybrid similarity search.

- Sources are document IDs (fetched server-side), raw text, and/or raw vectors
- MoreLikeThisLazyQuery defers MLT construction to per-shard rewrite so each shard uses its own IndexReader for IDF stats
- Document IDs are resolved once on the federating node via batch fetch, and then resolved text and centroid vector are passed to shards in the query
- When a doc ID exists in multiple queried indexes, the first index in request order wins
- Centroid is normalized for UNIT_VECTOR fields
- Validates fields, types, and cross-index consistency and caps source docs at 100 for stability
- Requires at least one source (document IDs, like text, or like vectors) up front, and fails once on the federating node with a diagnostic message when source docs resolve to no text or vectors
- Queries (all types) now fail fast with "Query requires at least one index" instead of silently returning zero hits when the index list is empty. The shared validator is unchanged, so warming searches, which imply their index, are unaffected
- Wildcard index patterns and aliases are resolved via the same logic as regular queries
- Source documents are excluded from results by default (includeSourceDocs to opt in)
- minimum_should_match (mm) is honored on the per-field term disjunction
- Realtime searches propagate the realtime flag to the source doc fetch so just-stored documents resolve
- Client builder MoreLikeThisQuery with forText/forDocuments factories
- REST access via existing queryJson parameter (protobuf Query JSON)
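
The "KNN on averaged source vectors" step with centroid normalization can be illustrated with a small sketch. It assumes the centroid is a plain component-wise mean and that normalization means scaling to unit Euclidean length; the class and method names are hypothetical and the actual server code may differ.

```java
import java.util.List;

/** Hypothetical sketch of averaging source vectors and normalizing the centroid. */
final class CentroidSketch {

    /** Component-wise mean of the source vectors. */
    static float[] centroid(List<float[]> vectors) {
        if (vectors.isEmpty()) {
            throw new IllegalArgumentException("at least one source vector is required");
        }
        int dims = vectors.get(0).length;
        float[] mean = new float[dims];
        for (float[] vector : vectors) {
            if (vector.length != dims) {
                throw new IllegalArgumentException("dimension mismatch between source vectors");
            }
            for (int i = 0; i < dims; i++) {
                mean[i] += vector[i];
            }
        }
        for (int i = 0; i < dims; i++) {
            mean[i] /= vectors.size();
        }
        return mean;
    }

    /** Scales a vector to unit Euclidean length; a zero vector is returned unchanged. */
    static float[] toUnitLength(float[] vector) {
        double norm = 0;
        for (float x : vector) {
            norm += (double) x * x;
        }
        norm = Math.sqrt(norm);
        if (norm == 0) {
            return vector.clone();
        }
        float[] unit = new float[vector.length];
        for (int i = 0; i < vector.length; i++) {
            unit[i] = (float) (vector[i] / norm);
        }
        return unit;
    }
}
```

The mean of unit vectors is generally shorter than unit length, which is presumably why the commit normalizes the centroid for `UNIT_VECTOR` fields.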

Commit:ffd5144
Author:Matt Davis

# Add zuliaadmin updateIndex command

Add an `updateIndex` subcommand to the `zuliaadmin` CLI for changing settings on existing index(es) without recreating them, building on the new replica-count change support.

- Uses the shared `--index` argument, so a single invocation can target one index, a comma-separated list, or a wildcard (e.g. `--index "log_*"`). Only the options supplied are changed; every other setting is left as-is.
- Reports per index whether the update changed anything or was a no-op.
- Fix a stale comment in `zulia_index.proto`: the default for `numberOfReplicas` is 0, not 1.

## Usage

```
zuliaadmin updateIndex --index myIndex --numberOfReplicas 2
zuliaadmin updateIndex --index "log_*" --numberOfReplicas 1 --indexWeight 5
```

Commit:5b9b8b2
Author:Matt Davis

Merge remote-tracking branch 'origin/main' into worktree-replica-config # Conflicts: # zulia-common/src/main/proto/zulia_index.proto

Commit:59b4d22
Author:Matt Davis

# Support changing the replica count for an existing index Previously the replica count was fixed at index creation, and any attempt to change it threw "Cannot change replication factor for existing index yet". This adds the ability to grow or shrink the number of replicas on a live index via `UpdateIndex` (or by re-applying a changed config through `CreateIndex`).

Commit:15e068f
Author:Matt Davis

# Add configurable, live NRT cache sizes

The index and taxonomy `NRTCachingDirectory` instances used hardcoded RAM cache sizes (50/150 MB for the index, 5/15 MB for the taxonomy) per shard. That ~165 MB/shard ceiling is oversized for many small or low-write indexes. Expose the sizes as per-index settings, allow disabling NRT caching entirely, and make the values changeable live.

`ZuliaNRTCachingDirectory` is a subclass of Lucene `NRTCachingDirectory` that reads its thresholds and the disabled flag from the live `ServerIndexConfig` on each `doCacheWrite`. Because `reloadIndexSettings()` reconfigures that same config instance in place, an `updateIndexSettings` change takes effect with no IndexWriter reopen. Lowering a cap or disabling only stops caching new segments, and already-cached bytes drain at the next commit.

No `CreateIndexRequestValidator` change: defaults are applied at use time (like `ramBufferMB`), not baked into stored settings, which keeps "0 = current default" and is what makes the live subclass simple.
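
The idea of reading thresholds from a live config on every caching decision, with 0 meaning "use the current default", can be sketched without Lucene. Everything named here is hypothetical (`LiveConfig`, `shouldCache`, the field names); the sketch also assumes the first number of each pair quoted above is the per-file limit and the second is the total cache limit, and it only loosely mirrors the shape of Lucene's size check.

```java
/** Hypothetical model of a cache-write decision that consults live settings each time. */
final class LiveNrtCacheSketch {

    /** Mutable settings object, reconfigured in place when index settings are updated. */
    static final class LiveConfig {
        volatile boolean disabled;
        volatile double maxMergeSizeMB; // 0 = use the default
        volatile double maxCachedMB;    // 0 = use the default
    }

    // Assumed mapping of the index-directory defaults quoted in the commit message.
    static final double DEFAULT_MAX_MERGE_SIZE_MB = 50;
    static final double DEFAULT_MAX_CACHED_MB = 150;

    private final LiveConfig config;

    LiveNrtCacheSketch(LiveConfig config) {
        this.config = config;
    }

    /** Decides whether a new file should go to the RAM cache, using the current settings. */
    boolean shouldCache(long newFileBytes, long alreadyCachedBytes) {
        if (config.disabled) {
            return false;
        }
        long maxMergeBytes = toBytes(orDefault(config.maxMergeSizeMB, DEFAULT_MAX_MERGE_SIZE_MB));
        long maxCachedBytes = toBytes(orDefault(config.maxCachedMB, DEFAULT_MAX_CACHED_MB));
        return newFileBytes <= maxMergeBytes && newFileBytes + alreadyCachedBytes <= maxCachedBytes;
    }

    // Defaults are resolved at use time, so stored settings can keep 0 as "default".
    private static double orDefault(double configured, double defaultValue) {
        return configured == 0 ? defaultValue : configured;
    }

    private static long toBytes(double megabytes) {
        return (long) (megabytes * 1024 * 1024);
    }
}
```

Because the decision reads `config` on each call, changing a field on the shared `LiveConfig` affects the next write without rebuilding the directory, which is the property the commit relies on.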

Commit:99a5278
Author:Matt Davis

Add segment replication

Adds Lucene segment replication: a primary fans new commits out to its configured replica nodes via a per-shard generation gate, virtual-thread executor, and per-replica circuit breaker. Adds two internal gRPC RPCs: `InternalGetSegmentFileInfo` (unary: replica reports its current file list and `segments_N` footer checksums) and `InternalSendSegmentFiles` (bidi stream: primary pushes byte chunks). The receiver stages `segments_N` under a `pending-` prefix and atomically renames it on commit so a refresh mid-stream cannot observe a torn header.

A 30s watchdog re-fires replication on every primary shard to recover from missed checkpoints (replica unreachable, circuit open, inbound semaphore exhausted); on a healthy cluster every tick hits the generation gate and returns. A `/replication/{indexName}` REST endpoint reports per-shard and per-replica progress (last attempted generation, lag, last-success timestamp, circuit-breaker state).

The publisher ships every file the new commit references when its `segments_N` differs from the replica's, since segment-name reuse is per-index (a fresh replica's bootstrap `_0.si` can match the primary's by length but not by content). The receiver deletes any existing file before writing so the bootstrap files get clobbered on first replication.

Wire-level: both replication RPCs advertise `gzip` compression on the client side; the replica response to `InternalGetSegmentFileInfo` includes the replica's `Version.LATEST` so the primary can reject mismatches before any byte transfer, requiring `replicaVersion >= primaryVersion` so operators can roll Lucene-version upgrades replica-first. Bytes leaving each node are gated by a shared `ReplicationRateLimiter` token bucket (default 60 MiB/s, `0` disables).

Server changes:
- `ZuliaIndex` owns the `SegmentReplicationManager` and the watchdog timer; registers a post-commit hook on each primary shard at load time.
- `ShardWriteManager` wraps `KeepOnlyLastCommitDeletionPolicy` in a `SnapshotDeletionPolicy` and uses `SnapshotDirectoryTaxonomyWriter` so the manager can snapshot both directories for a consistent commit view.
- `ZuliaShard` gains a read-only replica constructor (backed by `ShardReadManager`) and a `refreshReaders` entry point the applier calls after each stream completes.
- `ZuliaIndexManager` exposes `getIndexFromName` and a new `getReplicationState(String)` for the REST controller and the segment-file handlers; constructs the shared `ReplicationRateLimiter` once from `ZuliaConfig` and threads it through every `ZuliaIndex`.
- `InternalRpcConnection` adds an async stub for the bidi stream.
- `ZuliaConfig` gains `replicaResponseTimeout` (minutes, default 10) for the bidi-stream deadline and `replicationMaxBytesPerSec` (default 60 MiB/s) for the per-node bandwidth cap.

Internal structure inside the manager:
- `ShardState` (per shard) holds the `ReentrantLock` and last attempted generation; `ReplicaState` (per replica per shard) holds the last replicated generation, last-success timestamp, and circuit-breaker counters with encapsulated synchronisation.
- `CommitSnapshot` is an `AutoCloseable` record that owns the index + taxonomy `IndexCommit` pair for one replication pass; `tryTake` handles the "no commit yet" case and the receiver uses try-with-resources for release.
- `ReplicationUtil` consolidates the shared `isSegmentsFile` predicate and footer-checksum helper used by both the publisher and the receiver.

Tests:
- End-to-end `ReplicationTest` across two nodes: create index with one shard and one replica, index docs, force commit via optimize, wait for replica convergence, verify `PRIMARY_ONLY` / `REPLICA_ONLY` queries match, and check the REST endpoint reports a populated state from the primary node.
- Unit `SegmentReplicationManagerTest` covers the no-replica short-circuit, the observability snapshot shape before any attempt, idempotent shutdown, watchdog-after-shutdown silence, and that publisher exceptions can't fail the commit hook.
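
For readers unfamiliar with the token-bucket technique named for `ReplicationRateLimiter`, a generic sketch follows. It is not the Zulia class: the name `TokenBucketSketch`, the non-blocking `tryAcquire` shape, and the explicit clock argument are assumptions made so the example is deterministic. Only the "0 disables" rule comes from the commit message.

```java
/** Generic token bucket; a hypothetical stand-in, not Zulia's ReplicationRateLimiter. */
final class TokenBucketSketch {

    private final long bytesPerSecond; // 0 disables rate limiting
    private double availableBytes;
    private long lastRefillNanos;

    TokenBucketSketch(long bytesPerSecond, long startNanos) {
        this.bytesPerSecond = bytesPerSecond;
        this.availableBytes = bytesPerSecond; // start with one second of budget
        this.lastRefillNanos = startNanos;
    }

    /** Returns true and spends budget if the bytes may be sent at time nowNanos. */
    synchronized boolean tryAcquire(long bytes, long nowNanos) {
        if (bytesPerSecond == 0) {
            return true;
        }
        // Refill in proportion to elapsed time, capped at one second of budget.
        double elapsedSeconds = (nowNanos - lastRefillNanos) / 1_000_000_000.0;
        availableBytes = Math.min(bytesPerSecond, availableBytes + elapsedSeconds * bytesPerSecond);
        lastRefillNanos = nowNanos;
        if (bytes <= availableBytes) {
            availableBytes -= bytes;
            return true;
        }
        return false;
    }
}
```

A production limiter would more likely block or sleep until budget is available rather than return `false`; the boolean form is used here only to keep the behavior easy to check.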

Commit:7b39e24
Author:Matt Davis

Merge origin/main into worktree-primary-replica-rename

Conflict in `ZuliaIndexManager` (clear / optimize / reindex): both branches modified the same federator construction lines:
- main switched the local lookup from `getIndexFromName(...)` to `getIndexForWrite(...)` so write ops resolve through multi-index aliases
- HEAD renamed `MasterSlaveSettings.MASTER_ONLY` to `PrimaryReplicaSettings.PRIMARY_ONLY`

Resolved by keeping `getIndexForWrite(...)` from main alongside the `PrimaryReplicaSettings.PRIMARY_ONLY` rename from HEAD. `./gradlew compileJava compileTestJava` passes cleanly.

Commit:fb2abb8
Author:Matt Davis

Rename MasterSlaveSettings to PrimaryReplicaSettings

Renames the proto enum `MasterSlaveSettings` to `PrimaryReplicaSettings` and its values (`MASTER_ONLY`/`SLAVE_ONLY`/`MASTER_IF_AVAILABLE` -> `PRIMARY_ONLY`/`REPLICA_ONLY`/`PRIMARY_IF_AVAILABLE`). The old names are kept as deprecated aliases via `option allow_alias = true` so existing wire traffic and on-disk protobufs stay compatible. The Java accessors and field names change, so consumers of the generated client/server code must recompile.

Also renames the supporting Java classes for consistency:
- `MasterSlaveSelector` -> `PrimaryReplicaSelector`
- `MasterSlaveNodeRequestFederator` -> `PrimaryReplicaNodeRequestFederator`
- `getMasterSlaveSettings` / `setMasterSlaveSettings` -> `getPrimaryReplicaSettings` / `setPrimaryReplicaSettings`

Rename only, no behavior changes.

Commit:6423511
Author:Matthew Davis
Committer:GitHub

Merge pull request #184 from zuliaio/worktree-multi-index-aliases Support multi-index aliases with designated write index

Commit:8d845b6
Author:Matt Davis

# Support multi-index aliases with designated write index

* Extend `IndexAlias` so a single alias can point at multiple indexes and fan out queries across all of them. An optional `writeIndex` designates which member receives any single-index mutation (store, delete, batchDelete, clear, optimize, reindex, associated-doc writes, updateIndex, getIndexSettings), enabling the rollover / zero-downtime reindex pattern without juggling two aliases.
* `IndexAlias` gains `repeated string indexNames` and `string writeIndex`
* Old persisted aliases (only `indexName` set) resolve through the fallback in `IndexAliasUtil.getIndexNames`. New writes populate `indexNames` only. Old-format data stays readable, new-format data is written cleanly.
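
The old-format fallback can be sketched as below. The real `IndexAliasUtil.getIndexNames` takes the proto message; this hypothetical version takes the two fields directly and relies on proto3 defaults (an unset string is `""`, an unset repeated field is empty). The `writeTarget` rule for an alias with several members and no write index is an assumption, since the commit message does not say what happens in that case.

```java
import java.util.List;

/** Hypothetical model of alias resolution with the old-format fallback. */
final class IndexAliasSketch {

    /** New-format aliases use indexNames; old persisted aliases only have indexName. */
    static List<String> getIndexNames(String legacyIndexName, List<String> indexNames) {
        if (!indexNames.isEmpty()) {
            return indexNames;
        }
        return List.of(legacyIndexName);
    }

    /** Picks the index that receives single-index mutations. */
    static String writeTarget(String legacyIndexName, List<String> indexNames, String writeIndex) {
        if (!writeIndex.isEmpty()) {
            return writeIndex;
        }
        List<String> names = getIndexNames(legacyIndexName, indexNames);
        if (names.size() == 1) {
            return names.get(0);
        }
        // Assumed behavior: refuse to guess a target among several members.
        throw new IllegalStateException("alias spans multiple indexes and has no write index");
    }
}
```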

Commit:2ee303f
Author:Matt Davis

Adds per-clause `boost` and a `VECTOR_SHOULD` query type that allows linear hybrid retrieval (BM25 + KNN).

`Query.boost` is a score multiplier applied per clause via Lucene `BoostQuery`. The proto default of 0 is treated as unset; 1.0 is treated as a no-op (no wrapping). It is ignored at runtime for non-scoring clause types.

With `QueryType.VECTOR_SHOULD`, the vector query contributes to the score but does not constrain the result set to the top-K.
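
The boost conventions and the linear combination can be illustrated with a small sketch. The class and method names are hypothetical, and the combined score is modeled as a plain weighted sum of a lexical score and a vector score, which is an assumption about how the two scoring clauses add up rather than a description of the server code.

```java
/** Hypothetical sketch of the boost conventions for hybrid (BM25 + KNN) scoring. */
final class BoostSketch {

    /** 0 is the proto default and means "unset"; 1.0 changes nothing, so neither needs a wrapper. */
    static boolean needsBoostWrapper(float boost) {
        return boost != 0f && boost != 1f;
    }

    /** The multiplier actually applied to a clause's score. */
    static float effectiveBoost(float boost) {
        return boost == 0f ? 1f : boost;
    }

    /** Linear hybrid score: each clause's score times its boost, summed. */
    static float hybridScore(float bm25Score, float bm25Boost, float knnScore, float knnBoost) {
        return effectiveBoost(bm25Boost) * bm25Score + effectiveBoost(knnBoost) * knnScore;
    }
}
```

For example, `hybridScore(2f, 0f, 0.5f, 4f)` leaves the lexical score unweighted and multiplies the vector score by 4.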

Commit:e5721d5
Author:Matt Davis

Add configurable merge scheduler and indexing throttle settings

Expose maxMergeThreads, maxMergePending, and indexingThrottle as per-index settings that can be configured at create time and updated live. Previously merge thread count was auto-detected and the indexing throttle semaphore was hardcoded to 1 permit.

* Added maxMergeThreads (default 0 for auto-detect), maxMergePending (default 0 for 5 pending), and indexingThrottle (default 0 for 1 concurrency when throttled) to IndexSettings and UpdateIndexSettings proto messages
* Removed hardcoded setMaxMergeAtOnce(4) from TieredMergePolicy (to be removed in Lucene 11) in favor of scheduler-level thread control

Commit:0c4555d
Author:Matt Davis

Add index-level description field (This will support the upcoming MCP server by allowing LLMs to understand what each index contains)

Commit:5d58859
Author:Matthew Davis
Committer:GitHub

Merge pull request #172 from zuliaio/geo_location Add `GEO_POINT` field type with geo-spatial search support

Commit:4f3354d
Author:Matt Davis
Committer:Matt Davis

* Add BatchDeleteGroup with federated routing, mirroring BatchFetch pattern
* Cache MasterSlaveSelector per index in both BatchDeleteRequestFederator and BatchFetchRequestFederator
* Add test cases for batch delete features

Commit:2db32a0
Author:Matt Davis

## Add `GEO_POINT` field type with geo-spatial search support

- **New `GEO_POINT` field type**
  - `.index()` adds `LatLonPoint` for geo filtering (`zl:geo()`, `zl:geoBbox()`)
  - `.sort()` adds `LatLonDocValuesField` for `geodist()` sorting and score functions
  - Sorting and score functions can be used independently, consistent with other field types where doc values are gated behind `.sort()`
- **Query parser additions**
  - `zl:geo`
  - `zl:geoBbox`
- **New `geodist()` function**
  - Supported in both sort expressions and score functions
- **Multi-shard geo sorting**
  - Implemented in `QueryCombiner` and `ZuliaPostSortingComparator`
- **Client API additions**
  - `Sort.geoDistance()`
  - `FieldConfigBuilder.createGeoPoint()`
- **Server-side validation**
  - Added in `CreateIndexRequestValidator`
  - `GEO_POINT` must enable `.index()` and/or `.sort()`
  - `GEO_POINT` cannot be faceted
- **Build updates**
  - Updated `javacc.gradle` for Gradle 9.3.0
  - Upgraded JavaCC to `7.0.13`

Commit:c5ae62b
Author:Matt Davis

### Add index size on disk and doc count to displayIndexes --detailed

The `displayIndexes --detailed` CLI command now shows document count and index size in MB for each index alongside the shard location info. Index size is computed from the Lucene index and taxonomy directories on the filesystem. The size is collected per shard, summed per index on each node, and aggregated across nodes through the existing getNumberOfDocs federator.

Use of the getNumberOfDocs request is not ideal from a naming perspective, but it makes sense codewise. Need a breaking change to rename this in a later release to getIndexDetails or similar.

Commit:69bc74b
Author:Matt Davis
Committer:Matt Davis

Skip index reload when settings haven't changed Add no-op detection to createIndex and updateIndex so unchanged settings skip the store and cluster-wide reload. Strip createTime/updateTime before comparing in createIndex to avoid false mismatches from protobuf defaults. Add `changed` boolean to CreateIndexResponse and UpdateIndexResponse, exposed via isChanged() on the client result classes. Fix createIndex to preserve createTime on re-create and updateIndex to set updateTime when settings change. Add IndexTest cases for changed flag and timestamp behavior.
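
The comparison behind the `changed` flag can be sketched by ignoring the timestamp fields before testing equality. This is only an illustration: settings are modeled as a plain map rather than the protobuf message, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of no-op detection that ignores createTime/updateTime. */
final class NoOpDetectionSketch {

    /** Copy of the settings without the timestamp fields. */
    static Map<String, Object> withoutTimestamps(Map<String, Object> settings) {
        Map<String, Object> copy = new HashMap<>(settings);
        copy.remove("createTime");
        copy.remove("updateTime");
        return copy;
    }

    /** True only when something other than the timestamps differs. */
    static boolean isChanged(Map<String, Object> stored, Map<String, Object> requested) {
        return !withoutTimestamps(stored).equals(withoutTimestamps(requested));
    }
}
```

Without the stripping step, a request that omits `createTime` would never equal the stored settings, and every re-create would trigger a store and a cluster-wide reload.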

Commit:3f6794f
Author:Matt Davis

Lucene filter cache support with per-index disable option

Add configurable Lucene filter (query) cache at the server level and a per-index disableFilterCache option to opt out individual indexes from caching.

Server-level filter cache configuration:
- New filterCacheMaxQueries (default 10000) and filterCacheMaxMemoryMB (default 256) settings in ZuliaConfig
- ZuliaDConfig.setLuceneStatic() now accepts ZuliaConfig and configures IndexSearcher.setDefaultQueryCache() with an LRUQueryCache, or disables caching entirely when either value is 0
- Moved setLuceneStatic() call in StartNodeCmd to after config is loaded so it can read cache settings
- Updated TestHelper to pass default ZuliaConfig to match new signature

Per-index filter cache disable:
- New disableFilterCache field in IndexSettings proto and corresponding update fields in UpdateIndexSettings
- ShardReader sets indexSearcher.setQueryCache(null) when disableFilterCache is true, overriding the server default
- ZuliaIndexManager handles the setDisableFilterCache update operation
- ClientIndexConfig and UpdateIndex expose disableFilterCache getter/setter and wire it through to proto builders

Commit:2fe956f
Author:Matt Davis

Merge remote-tracking branch 'origin/main' into conditional_facets # Conflicts: # zulia-server/src/test/java/io/zulia/server/test/node/GeneralFeaturesTest.java

Commit:d41a189
Author:Matt Davis

Add per-facet/stat facet conditional aggregation based on total hits threshold Allow individual CountRequest and StatRequest to specify maxTotalHitsForFacet and maxShardHitsForFacet thresholds that skip expensive aggregation when hit counts are too high. If maxShardHitsForFacet is not set, the default is maxTotalHitsForFacet for the shard. Shard-level filtering in ShardReader prevents aggregation work early, while QueryCombiner applies the global threshold and drops incomplete combiners when shards disagree (preventing partial/incorrect counts).
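
The two-level threshold check can be sketched as follows. The names are hypothetical, and two details are assumptions not stated in the commit message: that 0 means "no threshold set", and that a hit count exactly equal to the threshold still aggregates.

```java
/** Hypothetical sketch of the per-facet hit thresholds. */
final class FacetThresholdSketch {

    /** The shard threshold falls back to the total threshold when it is not set. */
    static long effectiveShardLimit(long maxShardHitsForFacet, long maxTotalHitsForFacet) {
        return maxShardHitsForFacet != 0 ? maxShardHitsForFacet : maxTotalHitsForFacet;
    }

    private static boolean withinLimit(long hits, long limit) {
        return limit == 0 || hits <= limit; // assumed: 0 means unlimited, limit is inclusive
    }

    /** Shard-level check, done before any aggregation work. */
    static boolean aggregateOnShard(long shardHits, long maxShardHitsForFacet, long maxTotalHitsForFacet) {
        return withinLimit(shardHits, effectiveShardLimit(maxShardHitsForFacet, maxTotalHitsForFacet));
    }

    /** Combine-level check: drop the facet if any shard skipped it or the total is too high. */
    static boolean keepCombined(long totalHits, long maxTotalHitsForFacet, boolean everyShardAggregated) {
        return everyShardAggregated && withinLimit(totalHits, maxTotalHitsForFacet);
    }
}
```

Requiring `everyShardAggregated` reflects the rule that incomplete combiners are dropped so that partial counts are never reported.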

Commit:20dd2f1
Author:Matt Davis

Optimize BatchFetch with TermInSetQuery and add BatchFetchGroup API Replace sequential per-document TermQuery lookups in batch fetch with batched TermInSetQuery per shard, reducing reader acquires and Lucene queries from N to S (number of shards). Add InternalBatchFetch RPC with pre-grouped shard-level requests (InternalShardBatchFetchRequest) to avoid redundant regrouping on receiving nodes and eliminate recursive call risk. Add BatchFetchGroup proto message and BatchFetchGroupBuilder client API for efficiently fetching multiple documents by index, ID list, and shared fetch profile (document fields, masked fields, associated fetch type) without creating individual FetchRequest objects per document.
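
The reduction from N lookups to S can be pictured as grouping the requested IDs by shard and issuing one set query per group. The sketch below only shows the grouping; the routing function (`hashCode` modulo the shard count) is a placeholder assumption and is not necessarily how Zulia assigns documents to shards, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of grouping document IDs by shard before a batch fetch. */
final class BatchFetchGroupingSketch {

    /** Placeholder routing function; the real shard assignment may differ. */
    static int shardFor(String documentId, int numberOfShards) {
        return Math.floorMod(documentId.hashCode(), numberOfShards);
    }

    /** One entry per shard that holds at least one requested ID, so at most S groups. */
    static Map<Integer, List<String>> groupByShard(List<String> documentIds, int numberOfShards) {
        Map<Integer, List<String>> groups = new TreeMap<>();
        for (String documentId : documentIds) {
            groups.computeIfAbsent(shardFor(documentId, numberOfShards), shard -> new ArrayList<>())
                    .add(documentId);
        }
        return groups;
    }
}
```

Each group would then back a single per-shard set query instead of one term query per document.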

Commit:bb178c2
Author:Matt Davis

Include search ID and label in shard-level debug log lines for correlation

Commit:16e306b
Author:Matthew Davis
Committer:GitHub

Add HTML stripping as an option for analyzer settings (#152)

Add default standard analyzer with html stripping built in
Refactor ZuliaPerFieldAnalyzer to split out the tokenization chain into ZuliaFieldAnalyzer
Make sure in analysis requests that a custom analyzer used would be closed

Commit:f9a8101
Author:Matthew Davis
Committer:GitHub

Adding individual facet fields, and facet groups optimizations, merge optimizations, and shard info to the query controller (#150)

* Individual facet fields, and facet groups optimizations
* mark which shard the result is returned by
* adjust merge policy

Commit:779343b
Author:Matthew Davis
Committer:GitHub

refactor stats api to include information about the cache stats for both the general and pinned caches. Change stats endpoint to use the proto object directly instead of the statsdto object. (#146)

Commit:20c175a
Author:Matt Davis

bump mongo driver to 5.5.1 from 5.4.0 (performance gains in theory, but probably not much for the zulia use case)
bump common compress to 1.28.0 from 1.27.1, fixes CVE-2025-48924
upgrade micronaut to 4.9.1 from 4.7.6 (virtual threads can be enabled, although experimental now https://micronaut.io/2025/06/30/transitioning-to-virtual-threads-using-the-micronaut-loom-carrier/)
upgrade protobuf to 4.31.1 from 3.25.5
upgrade grpc to 1.73.0 from 1.72.0
import DDSketch from the ddsketch repo and disable using the version from the jar in the build with the protobuf syntax in the gradle to get around the poisoning due to the CVE until they fix it.
Explicitly place the zulia commons project at the top of the classpath to use our version of ddsketches first.

Commit:023f768
Author:Matthew Davis
Committer:GitHub

Virtual threads search and aggregation (#140)

* implement facet concurrency using virtual threads
* allow concurrency to be specified on search / index / or node level

Commit:b753485
Author:Matt Davis

Allow searches that are not real time (see only what has been committed). Non realtime searches allow higher performance during indexing. Searches, fetches (documents only, not associated files), Get Terms, Get Fields all support the real time flag. The default for real time is false. Previous versions of Zulia were realtime=true and not modifiable.

Commit:aeeec4c
Author:Matthew Davis
Committer:GitHub

add german normalization filter and custom analyzer test case (#115)

Commit:07f1d15
Author:Matthew Davis
Committer:GitHub

add snappy compression for stored BSON documents and metadata by default. enable it to be turned off via an index flag in create/update index (#109)

Commit:96e26d9
Author:Matthew Davis
Committer:GitHub

Create index field mapping aliases (#102)

* update proto for field aliases
* add field mapping aliases

Commit:20b54b8
Author:Matthew Davis

switch stored docs to binary doc values

Commit:811b05e
Author:Matthew Davis

add more complete facet drill down with all/any and exclude logic

Commit:c9f544c
Author:William Millman
Committer:GitHub

External document (#88)

* Adding the ability to link external files to documents in zulia
* Added a helper builder to ensure external S3 documents use the correct keys.

Commit:adb59cc
Author:Matthew Davis

reformat code

Commit:699b645
Author:Matthew Davis

Reformat all files using correct formatter

Commit:d4c8d70
Author:Matthew Davis

improve index settings display on REST to show warmed searches
add ability to see if search response is pinned
change log line to show if search is returned from the cache

Commit:9e4d6a3
Author:Ian Swepston
Committer:GitHub

Better numeric behaviors and support for shards (#80)

* Adds support for shards on StatFacet
* Adds ability to request percentiles in a StatFacet and NumericStat
* Adds ability to generate numeric filters with a factory and apply them to a query
* Fixes some bugs in query validation of NumericStat in sharded environment
* Added a change which will add a match all query if only exclude filters are specified. Also added filter tests for the numeric range filter

Commit:ad81341
Author:Matthew Davis
Committer:GitHub

Cache updates (#79)

* Implement cache pinning. Searches that are pinned remain in cache until the index is changed or zulia is restarted. There is no limit to the number of searches that can be pinned to an index so watch out for RAM use.
* Allow zulia client to know if query response was cached
* Add warming searches to create index and update index with changes to proto files, java client, and backend. Warmed searches are stored as bytes instead of a QueryRequest to reduce dependency between query and indexing and to allow a future incompatibility not to break index loading. Actual warming of searches is not part of this commit.
* switch to caffeine for query caching (implements Window TinyLfu for near optimal hit rate)
* add shard warming timer config and warm searches on timer. refactor shard query to allow the logic to be abstracted
* ensure order is kept on index updates of field config, warmed searches, ....
* basic index creation test cases with warmed searches
* Add framework to shard for displaying cache statistics (not usable by a client yet)
* add numeric set query type that is like term query for numbers
* upgrade to lucene 9.4
* bump gradle to 7.5.1
* update mongodb java driver to 4.7.2 (mongodb 6.0 feature support)
* upgrade to micronaut 3.7.1 from 3.4.3
* bump snakeyaml for vulnerability patch
* bump embedded mongo from 3.4.5 to 3.4.11
* bump protobuf to 3.21.7
* fix compile post protobuf bump to 3.21.7 (https://github.com/protocolbuffers/protobuf/issues/10695)
* remove legacy parser and dismax features only supported through legacy parser
* add swagger-ui to rest service
* fix serving of index html page for the rest service with links to swagger and wiki

Commit:394ef89
Author:Matthew Davis

remove legacy parser and dismax features only supported through legacy parser

Commit:e050395
Author:Matthew Davis

add numeric set query type that is like term query for numbers

Commit:c144daf
Author:Matt Davis
Committer:GitHub

add (partial) update index functionality to zulia (#72)

* add (partial) update index functionality to zulia
* document new features:
  - add update index java client documentation
  - document metadata in create index
  - vector queries
* improve documentation for existing:
  - delete index with associated documents
  - term queries

Commit:12f4300
Author:Matthew Davis

add ability to set metadata bson on the zulia index

Commit:d1cf3b5
Author:Matt Davis
Committer:GitHub

add vector searching from lucene 9.0 using KnnVectorQuery and KnnVectorField replacing the old zulia handling (#70)

Commit:1ab1999
Author:Matt Davis
Committer:GitHub

add default false flag to not delete associated files to delete index (#61)

command line option to actually delete needs to be added still

Commit:fe88003
Author:Karl
Committer:GitHub

Zulia Version Information (#54)

* Improved versioning support.
* Added version flag to command-line
* Added version info to nodes
* Added version header to getCurrentNodes

Commit:b56eb01
Author:Matt Davis
Committer:GitHub

Lucene 9.0 (#52)

* Update to lucene 9.0
* Update to grpc-java v1.43.2 and Protobuf to 3.19.2 to avoid CVE-2021-22569.
* Update mongo java driver to 4.4.0
* Add better logging for channel close
* Make sure to clean/close up internal connections
* General warning cleanup, package cleanup, and refactoring for simplicity
* Clear out the old closed nodes in the test helper
* Fix java 17 deprecation warning
* Update to gradle 7.4
* Add new zulia query parser based on org.apache.lucene.queryparser.flexible.standard
* New parser supports minimum should match added via (term term2 term3)~n syntax
* New parser supports multiple field search via field1,field2:...
* New parser supports GTE/LTE/GT/LT operators like pubYear>=2020
* Legacy parser is still available with legacy option on Query
* Legacy parser supports dismax but new one does not yet
* new boolean facet handling maps to True/False only from true/t/y/yes f/false/n/no
* change boolean sort to be numeric based doc values
* Ensure field name cannot contain a comma
* Add taxonomy stats support for a boolean by treating it like 0,1
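
The boolean facet mapping listed in this commit can be sketched as a small lookup. The accepted tokens come from the commit message; the class name, the trimming, and the case-insensitive matching are assumptions made for the example.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch of mapping raw boolean-like values to the True/False facet labels. */
final class BooleanFacetSketch {

    private static final Set<String> TRUE_TOKENS = Set.of("true", "t", "y", "yes");
    private static final Set<String> FALSE_TOKENS = Set.of("false", "f", "n", "no");

    /** Returns "True" or "False", or empty when the value is not a recognized token. */
    static Optional<String> facetLabel(String rawValue) {
        String token = rawValue.trim().toLowerCase(Locale.ROOT); // assumed normalization
        if (TRUE_TOKENS.contains(token)) {
            return Optional.of("True");
        }
        if (FALSE_TOKENS.contains(token)) {
            return Optional.of("False");
        }
        return Optional.empty();
    }
}
```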

Commit:8d37c95
Author:Matt Davis
Committer:GitHub

Index alias (#51)

* Add index alias support to zulia
* Minor cleanup of test cases and client

Commit:db5737e
Author:Payam Meyer
Committer:GitHub

#48 - Added allDocCounts so that documents with null or empty values for a facet get counted towards total count. (#49)

Commit:09db76a
Author:Matthew Davis

add TaxonomyStatsHandler and wire into shard reader

Commit:730b4c2
Author:Matthew Davis

add stat request proto messages and refactor shardreader to allow easier coding of actual request

Commit:0521f23
Author:Matthew Davis

add hierarchical faceting to zulia

Commit:2e5d45c
Author:Matthew Davis

add FILTER_NOT and TERMS_NOT query type on server
add .exclude() and .include() to TermQuery And FilterQuery

Commit:6e9909a
Author:Matthew Davis

Add new scored function api and a new java search builder to support new functionality (old query will be deprecated soon)

Commit:4c7d476
Author:Matthew Davis

refactor client protocol to allow more advanced types of queries add terms query using new refactor

Commit:555d5fa
Author:Matthew Davis

change missing last to missing first

Commit:8265230
Author:Matthew Davis

fix sorting bugs with null (missing) values
add ability to put missing first or last

Commit:5ef21c8
Author:Matt Davis

#19 add field display name and description as optional attributes of a field

Commit:f348195
Author:Matt Davis
Committer:GitHub

Reindex (#16) * add reindex in place functionality

Commit:f845f1b
Author:Matt Davis

#8 switch metadata to be bson/json format

Commit:874f021
Author:Matt Davis

add case protected words filter

Commit:bf70085
Author:Matt Davis

add option to set ram buffer to larger size

Commit:57fd003
Author:Matt Davis

make sure index writer and analyzers are reopened

Commit:e25131d
Author:payammeyer

Fixed typo

Commit:adf975c
Author:Matt Davis

default value to tfidf (Breaking change to make proto3 work)

Commit:4dcfe94
Author:Matt Davis

#2 allow query field (qf) to have wildcards allow multiple default search fields

Commit:0161f97
Author:payammeyer

Updated docId to luceneShardId. Added query command to zulia.

Commit:a324892
Author:payammeyer

Added scored filter query.

Commit:9a1d659
Author:Matt Davis

update to new gradle, fix compile issues

Commit:6618cea
Author:Payam Meyer

Filled in query defaults and created QueryRequestValidator.

Commit:d91c272
Author:Matt Davis

start filling in default values / validation

Commit:75caa60
Author:Matt Davis

set up index routing framework for internal requests

Commit:004f0e7
Author:Matt Davis

implement basic create index logic start to implement primary/replica search logic

Commit:f9fa517
Author:Matt Davis

add internal create index and delete index request

Commit:1b37e52
Author:Matt Davis

make batch fetch/delete stream
remove client side cache
add optional fetching of all nodes instead of just live nodes in service call

Commit:7748e82
Author:Matt Davis

optimize updates

Commit:8cf9937
Author:Matt Davis

add back internal version of store, fetch, and delete

Commit:bb2a598
Author:Matt Davis

refactor to use federator or router

Commit:722f862
Author:Matt Davis

continue refactor from lumongo

Commit:839d68f
Author:Payam Meyer

Added handlers for server service

Commit:08a08e0
Author:Matt Davis

start internal client refactor with common pool 2.x and grpc and zulia

Commit:527bd95
Author:Matt Davis

start the removal of hazelcast

Commit:89e9538
Author:Matt Davis

add master slave settings

Commit:c60b6eb
Author:Payam Meyer

Finished porting the client.

Commit:fcebab5
Author:Payam Meyer

Added REST resources, some minor refactor.

Commit:271747d
Author:Matt Davis

beginning of the client port

Commit:5f2704d
Author:Matt Davis

move over document storage from lumongo and some analysis (filter, analyzer, similarity) classes from lumongo

Commit:0495913
Author:Matt Davis

make sure segment is renamed to shard

Commit:4a41b70
Author:Matt Davis

add start scripts add some configuration setup

Commit:a9cb044
Author:Matt Davis

bring over proto files from lumongo and rearrange and make proto3 compat