Proto commits in zuliaio/zuliasearch

These 94 commits changed the Protocol Buffers files:

2026-06-22

Commit:	05648d5
Author:	Matt Davis	2026-06-22 14:17:14 -0400

# Add per-field doc-values skip index (default on for new fields)

Lucene 10.0 introduced the doc-values skip index (`DocValuesSkipper`), and 10.3/10.4/10.5 built a series of optimizations on top of it: block-skipping `SortedNumeric`/`SortedSet` range queries, dynamic-pruning numeric and sorted-set sorts, and skipper-backed `FieldExistsQuery`/`count()`/`rewrite()`. None of these fire unless the doc-values field was written with a skip index. Zulia stores every value as doc values and drives range queries and sorts off them, so enabling the skip index benefits many code paths.

# Turn on doc-values skip index for new fields and the id field in new indexes

* Write `SortAs` sort doc values via `SortedNumericDocValuesField.indexedField` / `SortedSetDocValuesField.indexedField` when the field has the flag set; otherwise use the plain constructor.
* `zulia_index.proto`: add `optional bool docValueSkipIndex` to `FieldConfig`. The field is `optional` so we can distinguish "unset" (default on for a new field) from an explicit opt-out.
* Any new field defaults on unless it explicitly opts out.
* A field that already exists is frozen to its persisted flag. Lucene treats the skip index as part of the immutable field schema and the IndexWriter throws if it changes, so we never flip it on an existing field.
* The built-in `zuliaId` sort field is not a `FieldConfig`, so it is gated on a new index-level version instead. `ShardDocumentIndexer` builds the id-sort skip index when `createdIndexVersion >= ZuliaIndexVersion.ID_SORT_DOC_VALUE_SKIP`, so new indexes get it and existing indexes keep the plain field.
* `ZuliaIndexVersion` holds the current version plus per-feature introduction versions. It generalizes to other creation-time defaults that cannot be expressed per field.
* Tests: `DocValueSkipIndexTest` (sort field type, skipper present in segment, id-sort skipper follows index version) and `DocValueSkipIndexPolicyTest` (per-field default-on / freeze policy).
* The query side needs no changes.
* No index-format break. Existing indexes stay readable and keep working unchanged, but they do not get the optimization. To get the benefit on existing data, that data must be reindexed into a new index.
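
Below is a minimal sketch of the default-on / freeze decision. It assumes the persisted flags can be looked up by field name, and it models an unset proto3 `optional` as `null`. The class name, method names, and the version number are hypothetical. This is not Zulia's actual policy or `ZuliaIndexVersion` code.

```java
import java.util.Map;

/** Hypothetical sketch of the per-field skip-index policy; names and numbers are illustrative only. */
final class SkipIndexPolicySketch {

    // Hypothetical index-level version at which the id-sort skip index was introduced
    static final int ID_SORT_DOC_VALUE_SKIP = 2;

    /**
     * @param persisted flags already stored for existing fields (field name -> flag)
     * @param fieldName the field being configured
     * @param requested the request's optional flag; null stands in for "unset"
     */
    static boolean useSkipIndex(Map<String, Boolean> persisted, String fieldName, Boolean requested) {
        Boolean existing = persisted.get(fieldName);
        if (existing != null) {
            // Existing field: frozen to its persisted flag, since the IndexWriter rejects a schema change
            return existing;
        }
        // New field: default on unless the request explicitly opts out
        return requested == null || requested;
    }

    /** The built-in id sort field has no FieldConfig, so it is gated on the index creation version. */
    static boolean idSortUsesSkipIndex(int createdIndexVersion) {
        return createdIndexVersion >= ID_SORT_DOC_VALUE_SKIP;
    }
}
```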

The documentation is generated from this commit.

2026-06-18

Commit:	5b7f979
Author:	Matt Davis	2026-06-18 16:24:39 -0400

Add maxDocFreqPct to More-Like-This term selection with a default of 25
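
A `maxDocFreqPct` cutoff treats terms that appear in more than the given percentage of documents as too common to help similarity. Those terms are skipped during term selection. The sketch below converts the percentage into an absolute document-frequency cap and filters candidate terms, roughly mirroring Lucene's `MoreLikeThis.setMaxDocFreqPct`. The class and method names are hypothetical and standalone.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical sketch of a max-document-frequency percentage filter for MLT term selection. */
final class MaxDocFreqFilterSketch {

    /** Converts a percentage of the document count into an absolute document-frequency cap. */
    static int maxDocFreq(int maxDocFreqPct, int numDocs) {
        return (int) ((long) maxDocFreqPct * numDocs / 100);
    }

    /** Keeps only candidate terms whose document frequency does not exceed the cap, preserving input order. */
    static List<String> selectTerms(Map<String, Integer> docFreqs, int maxDocFreqPct, int numDocs) {
        int cap = maxDocFreq(maxDocFreqPct, numDocs);
        return docFreqs.entrySet().stream()
                .filter(e -> e.getValue() <= cap)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

With the default of 25 and 1,000 documents, any term found in more than 250 documents is ignored.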

2026-06-16

Commit:	50adcd2
Author:	Matt Davis	2026-06-16 12:19:42 -0400

More-like-this: fetch source documents from other indexes

* Add `sourceIndex` to more-like-this so the source documents (resolved from `documentId`) can be fetched from indexes other than the ones the similarity search runs on. This lets you find documents in one index that are similar to a document living in a different index.
* When `sourceIndex` is given, the queried indexes are ignored for fetching. Sources are fetched only from the source indexes, tried in order, and the first source index that has the document wins on collision (the same ordering rule the queried indexes used). Wildcards and aliases resolve as usual.
* Source-doc exclusion (`includeSourceDocs=false`) is dropped when the source indexes are disjoint from the queried indexes, since the source docs cannot be in the queried results.
* Add test cases for source-index MLT queries
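
The ordering and exclusion rules above can be sketched with in-memory maps standing in for a batch fetch. Source indexes are tried in order, the first index holding an id wins, and exclusion is only needed when a source index is also queried. Everything here, including the `store` shape, is a hypothetical illustration rather than Zulia's resolver.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of MLT source-document resolution across source indexes. */
final class SourceDocResolverSketch {

    /**
     * @param sourceIndexes ordered source index names (after wildcard/alias expansion)
     * @param store         index name -> (document id -> document text), standing in for a batch fetch
     * @param ids           requested source document ids
     */
    static Map<String, String> resolve(List<String> sourceIndexes, Map<String, Map<String, String>> store, List<String> ids) {
        Map<String, String> resolved = new LinkedHashMap<>();
        for (String index : sourceIndexes) {
            Map<String, String> docs = store.getOrDefault(index, Map.of());
            for (String id : ids) {
                String doc = docs.get(id);
                if (doc != null) {
                    resolved.putIfAbsent(id, doc); // an earlier index already supplied this id, so keep it
                }
            }
        }
        return resolved;
    }

    /** Source-doc exclusion only matters when at least one source index is also being queried. */
    static boolean needsSourceExclusion(boolean includeSourceDocs, Set<String> sourceIndexes, Set<String> queriedIndexes) {
        return !includeSourceDocs && !Collections.disjoint(sourceIndexes, queriedIndexes);
    }
}
```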

2026-06-11

Commit:	c4c47bc
Author:	Matt Davis	2026-06-11 13:57:29 -0400
Committer:	Matt Davis	2026-06-11 14:01:58 -0400

Add more-like-this (MLT) query support

Adds a MORE_LIKE_THIS query type supporting lexical (Lucene MoreLikeThis), vector (KNN on averaged source vectors), and hybrid similarity search.

- Sources are document IDs (fetched server-side), raw text, and/or raw vectors
- MoreLikeThisLazyQuery defers MLT construction to per-shard rewrite so each shard uses its own IndexReader for IDF stats
- Document IDs are resolved once on the federating node via batch fetch, and the resolved text and centroid vector are then passed to the shards in the query
- When a doc ID exists in multiple queried indexes, the first index in request order wins
- Centroid is normalized for UNIT_VECTOR fields
- Validates fields, types, and cross-index consistency, and caps source docs at 100 for stability
- Requires at least one source (document IDs, like text, or like vectors) up front, and fails once on the federating node with a diagnostic message when source docs resolve to no text or vectors
- Queries (all types) now fail fast with "Query requires at least one index" instead of silently returning zero hits when the index list is empty. The shared validator is unchanged, so warming searches, which imply their index, are unaffected
- Wildcard index patterns and aliases are resolved via the same logic as regular queries
- Source documents are excluded from results by default (set includeSourceDocs to opt in)
- minimum_should_match (mm) is honored on the per-field term disjunction
- Realtime searches propagate the realtime flag to the source doc fetch so just-stored documents resolve
- Client builder MoreLikeThisQuery with forText/forDocuments factories
- REST access via the existing queryJson parameter (protobuf Query JSON)
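
The vector side averages the source vectors into a centroid. For a field declared as holding unit vectors, the average of unit vectors is generally shorter than 1, so it is rescaled back to unit length. The sketch below shows that arithmetic only. The class name is hypothetical, and the real implementation may handle edge cases differently.

```java
import java.util.List;

/** Hypothetical sketch: average source vectors into a centroid and optionally L2-normalize it. */
final class CentroidSketch {

    static float[] centroid(List<float[]> vectors, boolean unitVectorField) {
        if (vectors.isEmpty()) {
            throw new IllegalArgumentException("at least one source vector is required");
        }
        int dims = vectors.get(0).length;
        float[] sum = new float[dims];
        for (float[] v : vectors) {
            if (v.length != dims) {
                throw new IllegalArgumentException("all source vectors must have " + dims + " dimensions");
            }
            for (int i = 0; i < dims; i++) {
                sum[i] += v[i];
            }
        }
        for (int i = 0; i < dims; i++) {
            sum[i] /= vectors.size(); // arithmetic mean
        }
        if (unitVectorField) {
            // Rescale the mean back to unit length so it remains a valid unit vector
            double norm = 0;
            for (float x : sum) {
                norm += (double) x * x;
            }
            norm = Math.sqrt(norm);
            if (norm > 0) {
                for (int i = 0; i < dims; i++) {
                    sum[i] = (float) (sum[i] / norm);
                }
            }
        }
        return sum;
    }
}
```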

2026-05-25

Commit:	ffd5144
Author:	Matt Davis	2026-05-25 12:33:27 -0400

# Add zuliaadmin updateIndex command

Add an `updateIndex` subcommand to the `zuliaadmin` CLI for changing settings on existing index(es) without recreating them, building on the new replica-count change support.

- Uses the shared `--index` argument, so a single invocation can target one index, a comma-separated list, or a wildcard (e.g. `--index "log_*"`). Only the options supplied are changed; every other setting is left as-is.
- Reports per index whether the update changed anything or was a no-op.
- Fix a stale comment in `zulia_index.proto`: the default for `numberOfReplicas` is 0, not 1.

## Usage

```
zuliaadmin updateIndex --index myIndex --numberOfReplicas 2
zuliaadmin updateIndex --index "log_*" --numberOfReplicas 1 --indexWeight 5
```

Commit:	5b9b8b2
Author:	Matt Davis	2026-05-25 12:20:33 -0400

Merge remote-tracking branch 'origin/main' into worktree-replica-config

# Conflicts:
# zulia-common/src/main/proto/zulia_index.proto

Commit:	59b4d22
Author:	Matt Davis	2026-05-25 11:57:07 -0400

# Support changing the replica count for an existing index Previously the replica count was fixed at index creation, and any attempt to change it threw "Cannot change replication factor for existing index yet". This adds the ability to grow or shrink the number of replicas on a live index via `UpdateIndex` (or by re-applying a changed config through `CreateIndex`).

Commit:	15e068f
Author:	Matt Davis	2026-05-24 20:13:08 -0400

# Add configurable, live NRT cache sizes

The index and taxonomy `NRTCachingDirectory` instances used hardcoded RAM cache sizes per shard (50/150 MB for the index, 5/15 MB for the taxonomy). That ~165 MB/shard ceiling is oversized for many small or low-write indexes. Expose the sizes as per-index settings, allow disabling NRT caching entirely, and make the values changeable live.

`ZuliaNRTCachingDirectory` is a subclass of Lucene `NRTCachingDirectory` that reads its thresholds and the disabled flag from the live `ServerIndexConfig` on each `doCacheWrite`. Because `reloadIndexSettings()` reconfigures that same config instance in place, an `updateIndexSettings` change takes effect with no IndexWriter reopen. Lowering a cap or disabling caching only stops caching new segments; already-cached bytes drain at the next commit.

No `CreateIndexRequestValidator` change: defaults are applied at use time (like `ramBufferMB`), not baked into stored settings, which keeps "0 = current default" and is what makes the live subclass simple.
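
The per-write cache decision can be sketched in plain Java. The live settings object is re-read on every call, `0` falls back to the former hardcoded index defaults at use time, and a file is cached only if it is small enough and fits under the total cap. This roughly follows the shape of Lucene's `NRTCachingDirectory.doCacheWrite` check. The classes and fields here are hypothetical, and the taxonomy defaults (5/15 MB) are not modeled.

```java
/** Hypothetical stand-in for the live, mutable per-index settings object. */
final class LiveNrtSettings {
    volatile boolean disabled;
    volatile double maxMergeSizeMB; // 0 = use the default
    volatile double maxCachedMB;    // 0 = use the default
}

/** Hypothetical sketch of a cache-write decision driven by live settings. */
final class NrtCachePolicySketch {
    // Former hardcoded index-directory sizes, applied at use time when a setting is 0
    static final double DEFAULT_MAX_MERGE_SIZE_MB = 50;
    static final double DEFAULT_MAX_CACHED_MB = 150;

    /** Re-reads the live settings on every call, so an update applies without reopening the writer. */
    static boolean shouldCache(LiveNrtSettings s, long estimatedFileBytes, long currentlyCachedBytes) {
        if (s.disabled) {
            return false;
        }
        double mergeMB = s.maxMergeSizeMB > 0 ? s.maxMergeSizeMB : DEFAULT_MAX_MERGE_SIZE_MB;
        double cachedMB = s.maxCachedMB > 0 ? s.maxCachedMB : DEFAULT_MAX_CACHED_MB;
        long maxMergeBytes = (long) (mergeMB * 1024 * 1024);
        long maxCachedBytes = (long) (cachedMB * 1024 * 1024);
        return estimatedFileBytes <= maxMergeBytes && currentlyCachedBytes + estimatedFileBytes <= maxCachedBytes;
    }
}
```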

2026-05-19

Commit:	99a5278
Author:	Matt Davis	2026-05-19 17:18:35 -0400

Add segment replication

Adds Lucene segment replication: a primary fans new commits out to its configured replica nodes via a per-shard generation gate, virtual-thread executor, and per-replica circuit breaker.

Adds two internal gRPC RPCs: `InternalGetSegmentFileInfo` (unary: replica reports its current file list and `segments_N` footer checksums) and `InternalSendSegmentFiles` (bidi stream: primary pushes byte chunks). The receiver stages `segments_N` under a `pending-` prefix and atomically renames it on commit so a refresh mid-stream cannot observe a torn header.

A 30s watchdog re-fires replication on every primary shard to recover from missed checkpoints (replica unreachable, circuit open, inbound semaphore exhausted); on a healthy cluster every tick hits the generation gate and returns.

A `/replication/{indexName}` REST endpoint reports per-shard and per-replica progress (last attempted generation, lag, last-success timestamp, circuit-breaker state).

The publisher ships every file the new commit references when its `segments_N` differs from the replica's, since segment-name reuse is per-index (a fresh replica's bootstrap `_0.si` can match the primary's by length but not by content). The receiver deletes any existing file before writing so the bootstrap files get clobbered on first replication.

Wire-level: both replication RPCs advertise `gzip` compression on the client side; the replica response to `InternalGetSegmentFileInfo` includes the replica's `Version.LATEST` so the primary can reject mismatches before any byte transfer, requiring `replicaVersion >= primaryVersion` so operators can roll Lucene-version upgrades replica-first. Bytes leaving each node are gated by a shared `ReplicationRateLimiter` token bucket (default 60 MiB/s, `0` disables).

Server changes:
- `ZuliaIndex` owns the `SegmentReplicationManager` and the watchdog timer; registers a post-commit hook on each primary shard at load time.
- `ShardWriteManager` wraps `KeepOnlyLastCommitDeletionPolicy` in a `SnapshotDeletionPolicy` and uses `SnapshotDirectoryTaxonomyWriter` so the manager can snapshot both directories for a consistent commit view.
- `ZuliaShard` gains a read-only replica constructor (backed by `ShardReadManager`) and a `refreshReaders` entry point the applier calls after each stream completes.
- `ZuliaIndexManager` exposes `getIndexFromName` and a new `getReplicationState(String)` for the REST controller and the segment-file handlers; constructs the shared `ReplicationRateLimiter` once from `ZuliaConfig` and threads it through every `ZuliaIndex`.
- `InternalRpcConnection` adds an async stub for the bidi stream.
- `ZuliaConfig` gains `replicaResponseTimeout` (minutes, default 10) for the bidi-stream deadline and `replicationMaxBytesPerSec` (default 60 MiB/s) for the per-node bandwidth cap.

Internal structure inside the manager:
- `ShardState` (per shard) holds the `ReentrantLock` and last attempted generation; `ReplicaState` (per replica per shard) holds the last replicated generation, last-success timestamp, and circuit-breaker counters with encapsulated synchronisation.
- `CommitSnapshot` is an `AutoCloseable` record that owns the index + taxonomy `IndexCommit` pair for one replication pass; `tryTake` handles the "no commit yet" case and the receiver uses try-with-resources for release.
- `ReplicationUtil` consolidates the shared `isSegmentsFile` predicate and footer-checksum helper used by both the publisher and the receiver.

Tests:
- End-to-end `ReplicationTest` across two nodes: create index with one shard and one replica, index docs, force commit via optimize, wait for replica convergence, verify `PRIMARY_ONLY` / `REPLICA_ONLY` queries match, and check the REST endpoint reports a populated state from the primary node.
- Unit `SegmentReplicationManagerTest` covers the no-replica short-circuit, the observability snapshot shape before any attempt, idempotent shutdown, watchdog-after-shutdown silence, and that publisher exceptions can't fail the commit hook.
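
The per-node bandwidth cap can be pictured as a token bucket. The sketch below is a hypothetical, simplified stand-in for `ReplicationRateLimiter`, and its API is an assumption. A caller reserves bytes before sending a chunk and sleeps for any deficit, and a rate of `0` disables limiting, matching the setting's described semantics. The clock is injectable so the arithmetic can be checked without sleeping.

```java
import java.util.function.LongSupplier;

/** Hypothetical token-bucket byte limiter; not Zulia's ReplicationRateLimiter. */
final class TokenBucketSketch {
    private final double bytesPerNano; // refill rate; 0 means unlimited
    private final double capacity;     // maximum burst in bytes (one second's worth here)
    private final LongSupplier clock;  // nanosecond clock, injectable for tests
    private double tokens;
    private long lastRefill;

    TokenBucketSketch(long bytesPerSec, LongSupplier clock) {
        this.bytesPerNano = bytesPerSec / 1_000_000_000.0;
        this.capacity = bytesPerSec;
        this.clock = clock;
        this.tokens = capacity;
        this.lastRefill = clock.getAsLong();
    }

    /** Takes bytes from the bucket and returns how many nanoseconds the caller should wait (0 if none). */
    synchronized long reserve(long bytes) {
        if (bytesPerNano == 0) {
            return 0; // a rate of 0 disables limiting
        }
        long now = clock.getAsLong();
        tokens = Math.min(capacity, tokens + (now - lastRefill) * bytesPerNano);
        lastRefill = now;
        tokens -= bytes; // may go negative; the deficit is paid off by waiting
        return tokens >= 0 ? 0 : (long) Math.ceil(-tokens / bytesPerNano);
    }

    /** Blocks long enough to keep the outgoing byte rate under the configured limit. */
    void acquire(long bytes) throws InterruptedException {
        long waitNanos = reserve(bytes);
        if (waitNanos > 0) {
            Thread.sleep(waitNanos / 1_000_000, (int) (waitNanos % 1_000_000));
        }
    }
}
```

Letting the balance go negative lets a single large chunk through immediately and charges the wait to the caller, which keeps the average rate bounded without splitting chunks.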

2026-05-14

Commit:	7b39e24
Author:	Matt Davis	2026-05-14 10:54:35 -0400

Merge origin/main into worktree-primary-replica-rename

Conflict in `ZuliaIndexManager` (clear / optimize / reindex), where both branches modified the same federator construction lines:
- main switched the local lookup from `getIndexFromName(...)` to `getIndexForWrite(...)` so write ops resolve through multi-index aliases
- HEAD renamed `MasterSlaveSettings.MASTER_ONLY` to `PrimaryReplicaSettings.PRIMARY_ONLY`

Resolved by keeping `getIndexForWrite(...)` from main alongside the `PrimaryReplicaSettings.PRIMARY_ONLY` rename from HEAD. `./gradlew compileJava compileTestJava` passes cleanly.

2026-05-04

Commit:	fb2abb8
Author:	Matt Davis	2026-05-04 12:35:27 -0400

Rename MasterSlaveSettings to PrimaryReplicaSettings

Renames the proto enum `MasterSlaveSettings` to `PrimaryReplicaSettings` and its values (`MASTER_ONLY`/`SLAVE_ONLY`/`MASTER_IF_AVAILABLE` -> `PRIMARY_ONLY`/`REPLICA_ONLY`/`PRIMARY_IF_AVAILABLE`). The old names are kept as deprecated aliases via `option allow_alias = true` so existing wire traffic and on-disk protobufs stay compatible. The Java accessors and field names change, so consumers of the generated client/server code must recompile.

Also renames the supporting Java classes for consistency:
- `MasterSlaveSelector` -> `PrimaryReplicaSelector`
- `MasterSlaveNodeRequestFederator` -> `PrimaryReplicaNodeRequestFederator`
- `getMasterSlaveSettings` / `setMasterSlaveSettings` -> `getPrimaryReplicaSettings` / `setPrimaryReplicaSettings`

Rename only, no behavior changes.

2026-04-23

Commit:	6423511
Author:	Matthew Davis	2026-04-23 16:56:21 -0400
Committer:	GitHub	2026-04-23 16:56:21 -0400

Merge pull request #184 from zuliaio/worktree-multi-index-aliases Support multi-index aliases with designated write index

Commit:	8d845b6
Author:	Matt Davis	2026-04-23 15:01:42 -0400

# Support multi-index aliases with designated write index

* Extend `IndexAlias` so a single alias can point at multiple indexes and fan out queries across all of them. An optional `writeIndex` designates which member receives any single-index mutation (store, delete, batchDelete, clear, optimize, reindex, associated-doc writes, updateIndex, getIndexSettings), enabling the rollover / zero-downtime reindex pattern without juggling two aliases.
* `IndexAlias` gains `repeated string indexNames` and `string writeIndex`.
* Old persisted aliases (only `indexName` set) resolve through the fallback in `IndexAliasUtil.getIndexNames`. New writes populate `indexNames` only. Old-format data stays readable, and new-format data is written cleanly.
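
A minimal sketch of the alias fallback and write-index routing, using a record in place of the protobuf message. Empty strings model proto3 defaults. Two parts are assumptions rather than documented behavior: routing writes to the sole member when `writeIndex` is unset, and failing when there are several members and no `writeIndex`. The names are hypothetical, not `IndexAliasUtil`'s real code.

```java
import java.util.List;

/** Hypothetical stand-in for a persisted alias: new-format indexNames plus the legacy single indexName. */
record AliasSketch(String aliasName, String indexName, List<String> indexNames, String writeIndex) {

    /** Old persisted aliases only set indexName, so fall back to it when indexNames is empty. */
    List<String> memberIndexes() {
        if (!indexNames.isEmpty()) {
            return indexNames;
        }
        return indexName.isEmpty() ? List.of() : List.of(indexName);
    }

    /** Assumed routing: the designated write index, else the sole member, else an error. */
    String indexForWrite() {
        if (!writeIndex.isEmpty()) {
            return writeIndex;
        }
        List<String> members = memberIndexes();
        if (members.size() == 1) {
            return members.get(0);
        }
        throw new IllegalStateException("alias " + aliasName + " spans " + members.size() + " indexes but has no writeIndex");
    }
}
```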

Commit:	2ee303f
Author:	Matt Davis	2026-04-23 13:53:36 -0400

Adds per-clause `boost` and a `VECTOR_SHOULD` query type that allows linear hybrid retrieval (BM25 + KNN).

`Query.boost` is a score multiplier applied per clause via Lucene `BoostQuery`. The proto default of 0 is treated as unset; 1.0 is treated as a no-op (no wrapping). It is ignored at runtime for non-scoring clause types.

With `QueryType.VECTOR_SHOULD`, the vector query contributes to the score but does not constrain the result set to the top-K.
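
The boost rule and the linear combination can be sketched on plain floats. In a Lucene `BooleanQuery`, the scores of matching SHOULD clauses are summed, so each clause's boost acts as a linear weight on its contribution. The class and method names are hypothetical. The real code wraps the clause's Lucene query rather than multiplying scores directly.

```java
/** Hypothetical sketch of the per-clause boost rule and a linear BM25 + KNN combination. */
final class BoostRuleSketch {

    /** Wrap only scoring clauses whose boost is set (not 0) and not the identity (1.0). */
    static boolean shouldWrap(float boost, boolean scoringClause) {
        return scoringClause && boost != 0f && boost != 1f;
    }

    /** Applies the boost the way a multiplicative wrapper would. */
    static float boostedScore(float rawScore, float boost, boolean scoringClause) {
        return shouldWrap(boost, scoringClause) ? rawScore * boost : rawScore;
    }

    /** Matching SHOULD clause scores are summed, so boosts act as linear weights. */
    static float hybridScore(float bm25, float bm25Boost, float knn, float knnBoost) {
        return boostedScore(bm25, bm25Boost, true) + boostedScore(knn, knnBoost, true);
    }
}
```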

2026-04-07

Commit:	e5721d5
Author:	Matt Davis	2026-04-07 14:48:54 -0400

Add configurable merge scheduler and indexing throttle settings

Expose maxMergeThreads, maxMergePending, and indexingThrottle as per-index settings that can be configured at create time and updated live. Previously the merge thread count was auto-detected and the indexing throttle semaphore was hardcoded to 1 permit.

* Added maxMergeThreads (default 0, meaning auto-detect), maxMergePending (default 0, meaning 5 pending merges), and indexingThrottle (default 0, meaning a concurrency of 1 when throttled) to the IndexSettings and UpdateIndexSettings proto messages
* Removed the hardcoded setMaxMergeAtOnce(4) from TieredMergePolicy (to be removed in Lucene 11) in favor of scheduler-level thread control
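
The "0 means default" convention for these three settings can be sketched as use-time resolution. The `-1` sentinel stands in for "let the scheduler auto-detect". The class, methods, and sentinel are hypothetical, and the sketch does not model how the real code hands these values to Lucene's merge scheduler.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical sketch of resolving the "0 = default" merge and throttle settings at use time. */
final class MergeSettingsSketch {
    static final int AUTO_DETECT = -1; // stand-in sentinel for "let the scheduler pick"

    /** 0 means auto-detect the merge thread count. */
    static int resolveMaxMergeThreads(int configured) {
        return configured > 0 ? configured : AUTO_DETECT;
    }

    /** 0 means the default of 5 pending merges. */
    static int resolveMaxMergePending(int configured) {
        return configured > 0 ? configured : 5;
    }

    /** 0 means one concurrent indexing operation while throttled. */
    static Semaphore throttleSemaphore(int configured) {
        return new Semaphore(configured > 0 ? configured : 1);
    }
}
```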

2026-03-20

Commit:	0c4555d
Author:	Matt Davis	2026-03-20 11:08:16 -0400

Add index-level description field (This will support the upcoming MCP server by allowing LLMs to understand what each index contains)

2026-03-11

Commit:	5d58859
Author:	Matthew Davis	2026-03-11 16:31:28 -0400
Committer:	GitHub	2026-03-11 16:31:28 -0400

Merge pull request #172 from zuliaio/geo_location Add `GEO_POINT` field type with geo-spatial search support

Commit:	4f3354d
Author:	Matt Davis	2026-03-11 13:31:58 -0400
Committer:	Matt Davis	2026-03-11 13:32:19 -0400

* Add BatchDeleteGroup with federated routing, mirroring the BatchFetch pattern
* Cache MasterSlaveSelector per index in both BatchDeleteRequestFederator and BatchFetchRequestFederator
* Add test cases for batch delete features

2026-03-10

Commit:	2db32a0
Author:	Matt Davis	2026-03-10 17:09:36 -0400

## Add `GEO_POINT` field type with geo-spatial search support

- **New `GEO_POINT` field type**
  - `.index()` adds `LatLonPoint` for geo filtering (`zl:geo()`, `zl:geoBbox()`)
  - `.sort()` adds `LatLonDocValuesField` for `geodist()` sorting and score functions
  - Sorting and score functions can be used independently, consistent with other field types where doc values are gated behind `.sort()`
- **Query parser additions**
  - `zl:geo`
  - `zl:geoBbox`
- **New `geodist()` function**
  - Supported in both sort expressions and score functions
- **Multi-shard geo sorting**
  - Implemented in `QueryCombiner` and `ZuliaPostSortingComparator`
- **Client API additions**
  - `Sort.geoDistance()`
  - `FieldConfigBuilder.createGeoPoint()`
- **Server-side validation**
  - Added in `CreateIndexRequestValidator`
  - `GEO_POINT` must enable `.index()` and/or `.sort()`
  - `GEO_POINT` cannot be faceted
- **Build updates**
  - Updated `javacc.gradle` for Gradle 9.3.0
  - Upgraded JavaCC to `7.0.13`
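
Sorting by `geodist()` orders documents by great-circle distance from a reference point. The sketch below is a standalone haversine distance that shows the kind of value being compared. Lucene has its own helpers for this (for example `SloppyMath.haversinMeters`). The class name and the Earth-radius constant are assumptions, not Zulia's code.

```java
/** Hypothetical sketch of a great-circle distance like the one geodist() sorts on. */
final class GeoDistSketch {
    // Mean Earth radius in meters (an assumed constant; implementations differ slightly)
    static final double EARTH_RADIUS_METERS = 6_371_008.7714;

    /** Haversine distance in meters between two latitude/longitude points given in degrees. */
    static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_METERS * Math.asin(Math.sqrt(a));
    }
}
```

One degree of longitude along the equator comes out to roughly 111.2 km with this radius.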

2026-03-06

Commit:	c5ae62b
Author:	Matt Davis	2026-03-06 14:29:16 -0500

### Add index size on disk and doc count to displayIndexes --detailed

The `displayIndexes --detailed` CLI command now shows the document count and index size in MB for each index alongside the shard location info. Index size is computed from the Lucene index and taxonomy directories on the filesystem. The size is collected per shard, summed per index on each node, and aggregated across nodes through the existing getNumberOfDocs federator.

Reusing the getNumberOfDocs request is not ideal from a naming perspective, but it makes sense in the code. Renaming it (to getIndexDetails or similar) will need a breaking change in a later release.
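
The per-node half of that aggregation can be sketched with `java.nio.file`. Each shard's size is the total bytes of the regular files under its index and taxonomy directories, and a node sums its shards and reports megabytes. Cross-node aggregation through the federator is not shown. `ShardDirs` and `IndexSizeSketch` are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical pair of directories backing one shard. */
record ShardDirs(Path indexDir, Path taxonomyDir) {}

/** Hypothetical sketch of computing on-disk index size per node. */
final class IndexSizeSketch {

    /** Total bytes of all regular files under a directory (0 if it does not exist). */
    static long directoryBytes(Path dir) {
        if (!Files.isDirectory(dir)) {
            return 0;
        }
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile).mapToLong(IndexSizeSketch::sizeOf).sum();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static long sizeOf(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Sums index + taxonomy bytes over the shards of one index held on this node, in MB. */
    static double nodeIndexSizeMB(List<ShardDirs> shards) {
        long total = 0;
        for (ShardDirs shard : shards) {
            total += directoryBytes(shard.indexDir()) + directoryBytes(shard.taxonomyDir());
        }
        return total / (1024.0 * 1024.0);
    }
}
```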

2026-02-26

Commit:	69bc74b
Author:	Matt Davis	2026-02-26 17:03:43 -0500
Committer:	Matt Davis	2026-02-26 17:26:31 -0500

Skip index reload when settings haven't changed Add no-op detection to createIndex and updateIndex so unchanged settings skip the store and cluster-wide reload. Strip createTime/updateTime before comparing in createIndex to avoid false mismatches from protobuf defaults. Add `changed` boolean to CreateIndexResponse and UpdateIndexResponse, exposed via isChanged() on the client result classes. Fix createIndex to preserve createTime on re-create and updateIndex to set updateTime when settings change. Add IndexTest cases for changed flag and timestamp behavior.
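
The no-op check and timestamp handling can be sketched with a simplified record in place of the protobuf settings message. Both sides drop the bookkeeping timestamps before comparing. On a real change, the original `createTime` is kept and a new `updateTime` is stamped. The record fields and method names are hypothetical, and the commit only describes stripping timestamps for `createIndex`.

```java
/** Hypothetical, simplified settings record; the real code compares protobuf settings messages. */
record SettingsSketch(String indexName, int numberOfReplicas, double indexWeight, long createTime, long updateTime) {

    /** Copy with the bookkeeping timestamps cleared so they cannot cause a false mismatch. */
    SettingsSketch withoutTimestamps() {
        return new SettingsSketch(indexName, numberOfReplicas, indexWeight, 0L, 0L);
    }
}

/** Hypothetical sketch of no-op detection for create/update index. */
final class NoOpDetectionSketch {

    /** True when the requested settings differ from the stored ones in anything other than timestamps. */
    static boolean changed(SettingsSketch stored, SettingsSketch requested) {
        return !stored.withoutTimestamps().equals(requested.withoutTimestamps());
    }

    /** On a real change keep the original createTime and stamp a new updateTime. */
    static SettingsSketch applyUpdate(SettingsSketch stored, SettingsSketch requested, long now) {
        if (!changed(stored, requested)) {
            return stored; // no-op: skip the store and the cluster-wide reload
        }
        return new SettingsSketch(requested.indexName(), requested.numberOfReplicas(), requested.indexWeight(),
                stored.createTime(), now);
    }
}
```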

2026-02-25

Commit:	3f6794f
Author:	Matt Davis	2026-02-25 11:17:28 -0500

Lucene filter cache support with per-index disable option

Add a configurable Lucene filter (query) cache at the server level and a per-index disableFilterCache option to opt individual indexes out of caching.

Server-level filter cache configuration:
- New filterCacheMaxQueries (default 10000) and filterCacheMaxMemoryMB (default 256) settings in ZuliaConfig
- ZuliaDConfig.setLuceneStatic() now accepts ZuliaConfig and configures IndexSearcher.setDefaultQueryCache() with an LRUQueryCache, or disables caching entirely when either value is 0
- Moved the setLuceneStatic() call in StartNodeCmd to after the config is loaded so it can read the cache settings
- Updated TestHelper to pass a default ZuliaConfig to match the new signature

Per-index filter cache disable:
- New disableFilterCache field in the IndexSettings proto and corresponding update fields in UpdateIndexSettings
- ShardReader sets indexSearcher.setQueryCache(null) when disableFilterCache is true, overriding the server default
- ZuliaIndexManager handles the setDisableFilterCache update operation
- ClientIndexConfig and UpdateIndex expose a disableFilterCache getter/setter and wire it through to the proto builders

2026-02-24

Commit:	2fe956f
Author:	Matt Davis	2026-02-24 10:08:42 -0500

Merge remote-tracking branch 'origin/main' into conditional_facets

# Conflicts:
# zulia-server/src/test/java/io/zulia/server/test/node/GeneralFeaturesTest.java

2026-02-23

Commit:	d41a189
Author:	Matt Davis	2026-02-23 15:23:14 -0500

Add per-facet/stat facet conditional aggregation based on total hits threshold Allow individual CountRequest and StatRequest to specify maxTotalHitsForFacet and maxShardHitsForFacet thresholds that skip expensive aggregation when hit counts are too high. If maxShardHitsForFacet is not set, the default is maxTotalHitsForFacet for the shard. Shard-level filtering in ShardReader prevents aggregation work early, while QueryCombiner applies the global threshold and drops incomplete combiners when shards disagree (preventing partial/incorrect counts).
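
The two-level gate can be sketched as follows. Each shard decides whether to aggregate using its own limit, which falls back to the total limit when unset. The combiner keeps a facet only if the global total is under the threshold and every shard actually computed it. The comparison direction (`hits <= limit` passes) and the use of `0` for "unset" are assumptions, and the names are hypothetical.

```java
import java.util.List;

/** Hypothetical sketch of the per-facet hit-count gate at the shard and combiner levels. */
final class FacetGateSketch {

    /** Shard-level check; maxShardHits falls back to maxTotalHits when unset (0 stands in for unset). */
    static boolean shardComputesFacet(long shardHits, long maxTotalHits, long maxShardHits) {
        long limit = maxShardHits > 0 ? maxShardHits : maxTotalHits;
        return limit <= 0 || shardHits <= limit;
    }

    /** Combiner check: drop the facet if the total is too high or any shard skipped it (no partial counts). */
    static boolean keepCombinedFacet(long totalHits, long maxTotalHits, List<Boolean> shardComputed) {
        boolean underTotal = maxTotalHits <= 0 || totalHits <= maxTotalHits;
        return underTotal && shardComputed.stream().allMatch(Boolean::booleanValue);
    }
}
```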

2026-02-22

Commit:	20dd2f1
Author:	Matt Davis	2026-02-22 16:24:40 -0500

Optimize BatchFetch with TermInSetQuery and add BatchFetchGroup API Replace sequential per-document TermQuery lookups in batch fetch with batched TermInSetQuery per shard, reducing reader acquires and Lucene queries from N to S (number of shards). Add InternalBatchFetch RPC with pre-grouped shard-level requests (InternalShardBatchFetchRequest) to avoid redundant regrouping on receiving nodes and eliminate recursive call risk. Add BatchFetchGroup proto message and BatchFetchGroupBuilder client API for efficiently fetching multiple documents by index, ID list, and shared fetch profile (document fields, masked fields, associated fetch type) without creating individual FetchRequest objects per document.
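
Grouping requested ids by shard is what turns N per-document lookups into S per-shard set lookups. The sketch below uses a stand-in hash routing function, which is an assumption since Zulia's real id-to-shard mapping may differ. Each resulting list is what one per-shard `TermInSetQuery` would receive. The names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: group requested ids by shard so each shard runs one set lookup. */
final class BatchFetchGroupingSketch {

    /** Stand-in routing function; Zulia's real id-to-shard mapping may differ. */
    static int shardFor(String id, int numberOfShards) {
        return Math.floorMod(id.hashCode(), numberOfShards);
    }

    /** Each value would become a single set-membership query on that shard. */
    static Map<Integer, List<String>> groupByShard(List<String> ids, int numberOfShards) {
        Map<Integer, List<String>> byShard = new TreeMap<>();
        for (String id : ids) {
            byShard.computeIfAbsent(shardFor(id, numberOfShards), s -> new ArrayList<>()).add(id);
        }
        return byShard;
    }
}
```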

2026-02-19

Commit:	bb178c2
Author:	Matt Davis	2026-02-19 12:16:13 -0500

Include search ID and label in shard-level debug log lines for correlation

2026-01-09

Commit:	16e306b
Author:	Matthew Davis	2026-01-09 11:30:44 -0800
Committer:	GitHub	2026-01-09 14:30:44 -0500

Add HTML stripping as an option for analyzer settings (#152)

* Add a default standard analyzer with HTML stripping built in
* Refactor ZuliaPerFieldAnalyzer to split the tokenization chain out into ZuliaFieldAnalyzer
* Make sure a custom analyzer used in an analysis request is closed

2026-01-05

Commit:	f9a8101
Author:	Matthew Davis	2026-01-05 12:13:11 -0800
Committer:	GitHub	2026-01-05 15:13:11 -0500

Adding individual facet fields, and facet groups optimizations, merge optimizations, and shard info to the query controller (#150)

* Individual facet fields, and facet groups optimizations
* mark which shard the result is returned by
* adjust merge policy

2025-11-13

Commit:	779343b
Author:	Matthew Davis	2025-11-13 11:02:39 -0800
Committer:	GitHub	2025-11-13 14:02:39 -0500

refactor stats api to include information about the cache stats for both the general and pinned caches. Change stats endpoint to use the proto object directly instead of the statsdto object. (#146)

2025-08-20

Commit:	20c175a
Author:	Matt Davis	2025-08-20 13:51:19 -0400

* bump mongo driver to 5.5.1 from 5.4.0 (performance gains in theory, but probably not much for the zulia use case)
* bump commons-compress to 1.28.0 from 1.27.1, which fixes CVE-2025-48924
* upgrade micronaut to 4.9.1 from 4.7.6 (virtual threads can be enabled, although they are experimental for now: https://micronaut.io/2025/06/30/transitioning-to-virtual-threads-using-the-micronaut-loom-carrier/)
* upgrade protobuf to 4.31.1 from 3.25.5
* upgrade grpc to 1.73.0 from 1.72.0
* import DDSketch from the ddsketch repo and disable the version from the jar in the build (using the protobuf syntax in the gradle config) to get around the poisoning caused by the CVE until upstream fixes it. Explicitly place the zulia commons project at the top of the classpath so our version of DDSketch is used first.

2025-06-27

Commit:	023f768
Author:	Matthew Davis	2025-06-27 15:25:33 -0400
Committer:	GitHub	2025-06-27 15:25:33 -0400

Virtual threads search and aggregation (#140)

* implement facet concurrency using virtual threads
* allow concurrency to be specified at the search, index, or node level

2025-05-22

Commit:	b753485
Author:	Matt Davis	2025-05-21 21:21:17 -0400

Allow searches that are not real time (see only what has been committed). Non realtime searches allow higher performance during indexing. Searches, fetches (documents only, not associated files), Get Terms, Get Fields all support the real time flag. The default for real time is false. Previous versions of Zulia were realtime=true and not modifiable.

2023-07-13

Commit:	aeeec4c
Author:	Matthew Davis	2023-07-13 08:20:40 -0400
Committer:	GitHub	2023-07-13 08:20:40 -0400

add german normalization filter and custom analyzer test case (#115)

2023-05-28

Commit:	07f1d15
Author:	Matthew Davis	2023-05-28 11:10:08 -0400
Committer:	GitHub	2023-05-28 11:10:08 -0400

add snappy compression for stored BSON documents and metadata by default, and allow it to be turned off via an index flag in create/update index (#109)

2023-05-08

Commit:	96e26d9
Author:	Matthew Davis	2023-05-08 08:49:39 -0400
Committer:	GitHub	2023-05-08 08:49:39 -0400

Create index field mapping aliases (#102)

* update proto for field aliases
* add field mapping aliases

2023-04-07

Commit:	20b54b8
Author:	Matthew Davis	2023-04-07 11:05:44 -0400

switch stored docs to binary doc values

2023-03-31

Commit:	811b05e
Author:	Matthew Davis	2023-03-31 14:03:21 -0400

add more complete facet drill down with all/any and exclude logic

2023-03-04

Commit:	c9f544c
Author:	William Millman	2023-03-04 08:07:05 -0500
Committer:	GitHub	2023-03-04 08:07:05 -0500

External document (#88)

* Adding the ability to link external files to documents in zulia
* Added a helper builder to ensure external S3 documents use the correct keys.

2023-01-31

Commit:	adb59cc
Author:	Matthew Davis	2023-01-31 13:38:33 -0500

reformat code

Commit:	699b645
Author:	Matthew Davis	2023-01-31 13:18:27 -0500

Reformat all files using correct formatter

2022-11-09

Commit:	d4c8d70
Author:	Matthew Davis	2022-11-09 13:35:54 -0500

* improve index settings display on REST to show warmed searches
* add ability to see if search response is pinned
* change log line to show if search is returned from the cache

2022-11-08

Commit:	9e4d6a3
Author:	Ian Swepston	2022-11-08 09:53:44 -0600
Committer:	GitHub	2022-11-08 10:53:44 -0500

Better numeric behaviors and support for shards (#80)

* Adds support for shards on StatFacet
* Adds ability to request percentiles in a StatFacet and NumericStat
* Adds ability to generate numeric filters with a factory and apply them to a query
* Fixes some bugs in query validation of NumericStat in sharded environment
* Added a change which will add a match all query if only exclude filters are specified. Also added filter tests for the numeric range filter

2022-11-01

Commit:	ad81341
Author:	Matthew Davis	2022-11-01 10:38:05 -0400
Committer:	GitHub	2022-11-01 10:38:05 -0400

Cache updates (#79)

* Implement cache pinning. Searches that are pinned remain in cache until the index is changed or zulia is restarted. There is no limit to the number of searches that can be pinned to an index, so watch out for RAM use.
* Allow zulia client to know if query response was cached
* Add warming searches to create index and update index with changes to proto files, java client, and backend. Warmed searches are stored as bytes instead of a QueryRequest to reduce the dependency between querying and indexing, and so that a future incompatibility does not break index loading. Actual warming of searches is not part of this commit.
* switch to Caffeine for query caching (implements Window TinyLfu for a near-optimal hit rate)
* add shard warming timer config and warm searches on timer; refactor shard query to allow the logic to be abstracted
* ensure order is kept on index updates of field config, warmed searches, ...
* basic index creation test cases with warmed searches
* Add framework to shard for displaying cache statistics (not usable by a client yet)
* add numeric set query type that is like term query for numbers
* upgrade to lucene 9.4
* bump gradle to 7.5.1
* update mongodb java driver to 4.7.2 (mongodb 6.0 feature support)
* upgrade to micronaut 3.7.1 from 3.4.3
* bump snakeyaml for vulnerability patch
* bump embedded mongo from 3.4.5 to 3.4.11
* bump protobuf to 3.21.7
* fix compile post protobuf bump to 3.21.7 (https://github.com/protocolbuffers/protobuf/issues/10695)
* remove legacy parser and dismax features only supported through legacy parser
* add swagger-ui to rest service
* fix serving of index html page for the rest service with links to swagger and wiki

2022-10-12

Commit:	394ef89
Author:	Matthew Davis	2022-10-12 12:43:23 -0400

remove legacy parser and dismax features only supported through legacy parser

2022-09-23

Commit:	e050395
Author:	Matthew Davis	2022-09-23 14:46:29 -0400

add numeric set query type that is like term query for numbers

2022-06-15

Commit:	c144daf
Author:	Matt Davis	2022-06-15 09:19:12 -0400
Committer:	GitHub	2022-06-15 09:19:12 -0400

add (partial) update index functionality to zulia (#72)

* add (partial) update index functionality to zulia
* document new features:
  - add update index java client documentation
  - document metadata in create index
  - vector queries
* improve documentation for existing:
  - delete index with associated documents
  - term queries

2022-06-04

Commit:	12f4300
Author:	Matthew Davis	2022-06-04 08:37:15 -0400

add ability to set metadata bson on the zulia index

2022-05-23

Commit:	d1cf3b5
Author:	Matt Davis	2022-05-23 11:15:23 -0400
Committer:	GitHub	2022-05-23 11:15:23 -0400

add vector searching from lucene 9.0 using KnnVectorQuery and KnnVectorField replacing the old zulia handling (#70)

2022-04-22

Commit:	1ab1999
Author:	Matt Davis	2022-04-22 11:12:12 -0400
Committer:	GitHub	2022-04-22 11:12:12 -0400

add a flag to delete index, defaulting to false, so associated files are not deleted unless requested (#61). The command line option to actually delete them still needs to be added.

2022-03-04

Commit:	fe88003
Author:	Karl	2022-03-04 11:08:41 -0600
Committer:	GitHub	2022-03-04 12:08:41 -0500

Zulia Version Information (#54)

* Improved versioning support.
* Added version flag to command-line
* Added version info to nodes
* Added version header to getCurrentNodes

2022-02-26

Commit:	b56eb01
Author:	Matt Davis	2022-02-26 09:11:14 -0500
Committer:	GitHub	2022-02-26 09:11:14 -0500

Lucene 9.0 (#52)

* Update to lucene 9.0
* Update to grpc-java v1.43.2 and Protobuf to 3.19.2 to avoid CVE-2021-22569.
* Update mongo java driver to 4.4.0
* Add better logging for channel close
* Make sure to clean/close up internal connections
* General warning cleanup, package cleanup, and refactoring for simplicity
* Clear out the old closed nodes in the test helper
* Fix java 17 deprecation warning
* Update to gradle 7.4
* Add new zulia query parser based on org.apache.lucene.queryparser.flexible.standard
* New parser supports minimum should match added via (term term2 term3)~n syntax
* New parser supports multiple field search via field1,field2:...
* New parser supports GTE/LTE/GT/LT operators like pubYear>=2020
* Legacy parser is still available with legacy option on Query
* Legacy parser supports dismax but new one does not yet
* new boolean facet handling maps only true/t/y/yes to True and f/false/n/no to False
* change boolean sort to be numeric based doc values
* Ensure field name cannot contain a comma
* Add taxonomy stats support for a boolean by treating it like 0,1
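
The boolean facet mapping can be sketched as a normalizer that accepts only the listed spellings. Two details are assumptions: the case-insensitive comparison, and returning nothing for unrecognized input. The class name is hypothetical.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch of normalizing boolean facet input to "True"/"False". */
final class BooleanFacetSketch {
    private static final Set<String> TRUE_VALUES = Set.of("true", "t", "y", "yes");
    private static final Set<String> FALSE_VALUES = Set.of("false", "f", "n", "no");

    /** Returns the facet label, or empty when the input is not a recognized boolean spelling. */
    static Optional<String> facetLabel(String raw) {
        String v = raw.trim().toLowerCase(Locale.ROOT);
        if (TRUE_VALUES.contains(v)) {
            return Optional.of("True");
        }
        if (FALSE_VALUES.contains(v)) {
            return Optional.of("False");
        }
        return Optional.empty();
    }
}
```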

2022-01-03

Commit:	8d37c95
Author:	Matt Davis	2022-01-03 10:23:42 -0500
Committer:	GitHub	2022-01-03 10:23:42 -0500

Index alias (#51)

* Add index alias support to zulia
* Minor cleanup of test cases and client

2021-12-14

Commit:	db5737e
Author:	Payam Meyer	2021-12-14 10:44:06 -0500
Committer:	GitHub	2021-12-14 10:44:06 -0500

#48 - Added allDocCounts so that documents with null or empty values for a facet get counted towards total count. (#49)

2021-11-17

Commit:	09db76a
Author:	Matthew Davis	2021-11-17 09:51:57 -0500

add TaxonomyStatsHandler and wire into shard reader

2021-11-10

Commit:	730b4c2
Author:	Matthew Davis	2021-11-10 16:08:45 -0500

add stat request proto messages and refactor shardreader to allow easier coding of actual request

2021-10-29

Commit:	0521f23
Author:	Matthew Davis	2021-10-29 09:16:01 -0400

add hierarchical faceting to zulia

2020-10-29

Commit:	2e5d45c
Author:	Matthew Davis	2020-10-29 09:59:11 -0400

add FILTER_NOT and TERMS_NOT query types on server; add .exclude() and .include() to TermQuery and FilterQuery

2020-09-17

Commit:	6e9909a
Author:	Matthew Davis	2020-09-17 12:21:25 -0400

Add new scored function api and a new java search builder to support new functionality (old query will be deprecated soon)

2020-08-30

Commit:	4c7d476
Author:	Matthew Davis	2020-08-30 10:42:07 -0400

refactor client protocol to allow more advanced types of queries; add terms query using new refactor

2020-04-17

Commit:	555d5fa
Author:	Matthew Davis	2020-04-16 22:31:59 -0400

change missing last to missing first

2020-04-16

Commit:	8265230
Author:	Matthew Davis	2020-04-16 19:10:29 -0400

fix sorting bugs with null (missing) values; add ability to put missing first or last

2019-12-04

Commit:	5ef21c8
Author:	Matt Davis	2019-12-04 10:15:10 -0500

#19 add field display name and description as optional attributes of a field

2019-10-18

Commit:	f348195
Author:	Matt Davis	2019-10-18 13:13:40 -0400
Committer:	GitHub	2019-10-18 13:13:40 -0400

Reindex (#16)
* add reindex in place functionality

2018-11-22

Commit:	f845f1b
Author:	Matt Davis	2018-11-21 20:04:20 -0500

#8 switch metadata to be bson/json format

2018-05-04

Commit:	874f021
Author:	Matt Davis	2018-05-04 12:30:17 -0400

add case protected words filter

2018-02-28

Commit:	bf70085
Author:	Matt Davis	2018-02-28 15:06:57 -0500

add option to set ram buffer to larger size

2018-01-09

Commit:	57fd003
Author:	Matt Davis	2018-01-09 14:17:45 -0500

make sure index writer and analyzers are reopened

2017-12-06

Commit:	e25131d
Author:	payammeyer	2017-12-06 11:30:35 -0500

Fixed typo

2017-11-28

Commit:	adf975c
Author:	Matt Davis	2017-11-27 20:04:16 -0500

default value to tfidf (Breaking change to make proto3 work)

2017-11-04

Commit:	4dcfe94
Author:	Matt Davis	2017-11-04 15:39:50 -0400

#2 allow query field (qf) to have wildcards; allow multiple default search fields

2017-11-03

Commit:	0161f97
Author:	payammeyer	2017-11-03 12:11:38 -0400

Updated docId to luceneShardId. Added query command to zulia.

2017-10-27

Commit:	a324892
Author:	payammeyer	2017-10-27 11:32:44 -0400

Added scored filter query.

2017-09-26

Commit:	9a1d659
Author:	Matt Davis	2017-09-26 11:46:34 -0400

update to new gradle, fix compile issues

2017-09-19

Commit:	6618cea
Author:	Payam Meyer	2017-09-19 08:58:34 -0400

Filled in query defaults and created QueryRequestValidator.

2017-09-15

Commit:	d91c272
Author:	Matt Davis	2017-09-15 14:34:11 -0700

start filling in default values / validation

2017-09-14

Commit:	75caa60
Author:	Matt Davis	2017-09-14 06:40:57 -0700

set up index routing framework for internal requests

2017-08-23

Commit:	004f0e7
Author:	Matt Davis	2017-08-23 15:50:21 -0400

implement basic create index logic; start to implement primary/replica search logic

2017-08-21

Commit:	f9fa517
Author:	Matt Davis	2017-08-20 21:47:28 -0400

add internal create index and delete index requests

2017-08-18

Commit:	1b37e52
Author:	Matt Davis	2017-08-18 17:59:47 -0400

make batch fetch/delete stream; remove client-side cache; add optional fetching of all nodes instead of just live nodes in service call

2017-08-14

Commit:	7748e82
Author:	Matt Davis	2017-08-13 21:48:17 -0400

optimize updates

2017-08-13

Commit:	8cf9937
Author:	Matt Davis	2017-08-13 14:22:55 -0400

add back internal version of store, fetch, and delete

2017-08-12

Commit:	bb2a598
Author:	Matt Davis	2017-08-12 15:11:56 -0400

refactor to use federator or router

2017-08-10

Commit:	722f862
Author:	Matt Davis	2017-08-10 16:26:55 -0400

continue refactor from lumongo

Commit:	839d68f
Author:	Payam Meyer	2017-08-10 09:33:50 -0400

Added handlers for server service

2017-08-09

Commit:	08a08e0
Author:	Matt Davis	2017-08-09 14:57:12 -0400

start internal client refactor with Commons Pool 2.x, gRPC, and zulia

2017-08-08

Commit:	527bd95
Author:	Matt Davis	2017-08-08 12:18:00 -0400

start the removal of hazelcast

2017-08-07

Commit:	89e9538
Author:	Matt Davis	2017-08-07 17:54:48 -0400

add master slave settings

Commit:	c60b6eb
Author:	Payam Meyer	2017-08-07 12:42:11 -0400

Finished porting the client.

Commit:	fcebab5
Author:	Payam Meyer	2017-08-07 09:38:10 -0400

Added REST resources, some minor refactor.

Commit:	271747d
Author:	Matt Davis	2017-08-07 08:55:21 -0400

beginning of the client port

2017-08-04

Commit:	5f2704d
Author:	Matt Davis	2017-08-04 18:46:09 -0400

move over document storage from lumongo and some analysis (filter, analyzer, similarity) classes from lumongo

2017-08-03

Commit:	0495913
Author:	Matt Davis	2017-08-03 11:40:06 -0400

make sure segment is renamed to shard

2017-08-02

Commit:	4a41b70
Author:	Matt Davis	2017-08-02 16:54:02 -0400

add start scripts; add some configuration setup

Commit:	a9cb044
Author:	Matt Davis	2017-08-02 12:09:59 -0400

bring over proto files from lumongo, rearrange them, and make them proto3 compatible