Proto commits in tensorflow/metadata

These are the commits in which the Protocol Buffers files changed (only the last 100 relevant commits are shown):

Commit:4cd97fe
Author:tf-metadata-team
Committer:tf-metadata-team

Support squared pearson correlation metric PiperOrigin-RevId: 747427219

The documentation is generated from this commit.

Commit:a3f72a5
Author:tf-metadata-team
Committer:tf-metadata-team

Proto changes for supporting content chunk semantic type in AI Flow PiperOrigin-RevId: 742870644

Commit:d4c2564
Author:tf-metadata-team
Committer:tf-metadata-team

Add Video as a domain to TFMD schema PiperOrigin-RevId: 729170148

Commit:f440b43
Author:tf-metadata-team
Committer:tf-metadata-team

Support Audio as a domain in Schema PiperOrigin-RevId: 699716595

Commit:cff231e
Author:tf-metadata-team
Committer:tf-metadata-team

Automated g4 rollback of changelist 669103619.
*** Reason for rollback ***
Broke TAP tests the LegoML project
*** Original change description ***
Mark message types as requiring the go/jspb object format methods. This CL marks types that use the JSPB object format accessors (see go/jspb-api-gotchas#objects) so that we do not remove them in the future. This is a backwards compatibility option and will represent no immediate change. The implementations of these methods are large and expensive for the JSCompiler to process; so we intend to limit their generation to improve compilation performance. See go/lsc-constrain-jspb-object-format-us...
***
PiperOrigin-RevId: 670640538

Commit:2ad4ebe
Author:tf-metadata-team
Committer:tf-metadata-team

Internal only change. PiperOrigin-RevId: 669103619

Commit:bce3c31
Author:tf-metadata-team
Committer:tf-metadata-team

Updates schema proto documentation to clarify that top-level float/int domains are not supported in TFDV. PiperOrigin-RevId: 660899053

Commit:ec9005b
Author:tf-metadata-team
Committer:tf-metadata-team

For nested features with N nested levels (N > 1), the statistics counting the number of values in `CommonStatistics` and `WeightedCommonStatistics` will rely on the innermost level. PiperOrigin-RevId: 631265288

Commit:8ea7f6a
Author:tf-metadata-team
Committer:tf-metadata-team

Remove unused field NaturalLanguageDomain.location_constraint_regex. It was documented as "please do not use" and never implemented. PiperOrigin-RevId: 621881996

Commit:f7118d0
Author:tf-metadata-team
Committer:tf-metadata-team

Comment fix for copy&paste glitch PiperOrigin-RevId: 621830437

Commit:7c1ecd5
Author:tf-metadata-team
Committer:tf-metadata-team

* Add `embedding_type` to `FloatDomain` to specify the semantic type of the embedding, which is useful for use cases where downstream tasks depend on knowing where the embedding came from. PiperOrigin-RevId: 611538841

Commit:ed7ce77
Author:tf-metadata-team
Committer:tf-metadata-team

[Tunelab Integration] Update PSW to support text_generation task type PiperOrigin-RevId: 611265010

Commit:02145df
Author:tf-metadata-team
Committer:tf-metadata-team

Clarify comment on tensorflow.metadata.v0.FixedShape: it matches tensorflow.TensorShapeProto only for fully defined shapes. PiperOrigin-RevId: 596643499

Commit:056ecff
Author:tf-metadata-team
Committer:tf-metadata-team

Create prototype LoRA trainer in AutoTFX service
This change implements the flow described in go/autotfx-pets-lora-backend PiperOrigin-RevId: 590643060

Commit:0808918
Author:tf-metadata-team
Committer:tf-metadata-team

Enable schema configuration to set default feature value for failed slicing sql when using SqlDeriver. PiperOrigin-RevId: 575254073

Commit:c65424c
Author:tf-metadata-team
Committer:tf-metadata-team

[MTL] All primary final model selection with task weight. PiperOrigin-RevId: 569627723

Commit:a85e542
Author:tf-metadata-team
Committer:tf-metadata-team

#MulticlassDistillation Update the distillation spec to include config for multiclass distillation. PiperOrigin-RevId: 564478592

Commit:62bf3b5
Author:tf-metadata-team
Committer:tf-metadata-team

Remove optional from problem statement proto. PiperOrigin-RevId: 563160918

Commit:c973240
Author:tf-metadata-team
Committer:tf-metadata-team

Internal only PiperOrigin-RevId: 559467373

Commit:ede25a3
Author:tf-metadata-team
Committer:tf-metadata-team

* Add `embedding_dim` to `FloatDomain` to specify the embedding dimension, which is useful for use cases such as restoring shapes for flattened sequence of embeddings.
* Add `sequence_truncation_limit` to `SequenceMetadata` to specify the maximum sequence length that should be processed. PiperOrigin-RevId: 554643195

Commit:b5f35ca
Author:tf-metadata-team
Committer:tf-metadata-team

Add BOOL_TYPE_INVALID_CONFIG anomaly types. PiperOrigin-RevId: 551650214

Commit:8d3a752
Author:tf-metadata-team
Committer:tf-metadata-team

Internal only PiperOrigin-RevId: 538947625

Commit:985d366
Author:tf-metadata-team
Committer:tf-metadata-team

Add `joint_group` to `SequenceMetadata` to specify which group this sequence feature belongs to so that they can be modeled jointly. PiperOrigin-RevId: 527186422

Commit:6a7fab3
Author:tf-metadata-team
Committer:tf-metadata-team

Internal only PiperOrigin-RevId: 525823318

Commit:eac14dc
Author:tf-metadata-team
Committer:RaviTeja Gorijala

Add deriver classes and tests for image feature along with associated schema configuration support. PiperOrigin-RevId: 521859426

Commit:9bb595d
Author:tf-metadata-team
Committer:tf-metadata-team

Add deriver classes and tests for image feature along with associated schema configuration support. PiperOrigin-RevId: 521859426

Commit:a7991e5
Author:tf-metadata-team
Committer:tf-metadata-team

Adds currently unused HistogramSelection field to Schema PiperOrigin-RevId: 513062587

Commit:f0c4a7a
Author:tf-metadata-team
Committer:tf-metadata-team

Internal only PiperOrigin-RevId: 507908632

Commit:b2972c7
Author:tf-metadata-team
Committer:tf-metadata-team

MVP of BooleanFlipRate metric for use as a secondary tuning objective PiperOrigin-RevId: 507862768

Commit:013b564
Author:tf-metadata-team
Committer:tf-metadata-team

Add documentation for sql-based derived features. https://g3doc.corp.google.com/third_party/py/tensorflow_data_validation/google/g3doc/sql_derived_features.md?cl=495457531 PiperOrigin-RevId: 501017930

Commit:2f81d3e
Author:tf-metadata-team
Committer:tf-metadata-team

Adds a knob to TFMD schema to infer RaggedTensors for variable length features. By default they are inferred as ragged left-aligned SparseTensors. PiperOrigin-RevId: 499913930

Commit:9e59299
Author:tf-metadata-team
Committer:tf-metadata-team

Add supporting schema configuration for sql-based derived features. PiperOrigin-RevId: 495452787

Commit:40856c6
Author:tf-metadata-team
Committer:tf-metadata-team

Propagate `is_sorted` (`already_sorted`) field from schema's `SparseFeature` to `SparseTensor` TR. PiperOrigin-RevId: 488767868

Commit:a2d0b71
Author:tf-metadata-team
Committer:tf-metadata-team

Supports normalized absolute difference validation in tfdv. This can be used to verify that the exact numeric values of counts are similar when normalized by the overall size of two datasets. PiperOrigin-RevId: 483486591

Commit:ac771bf
Author:tf-metadata-team
Committer:tf-metadata-team

Internal only PiperOrigin-RevId: 482544627

Commit:1de3c4f
Author:tf-metadata-team
Committer:tf-metadata-team

Adds a new feature comparator NormalizedAbsoluteDifference for use in comparing datasets that are expected to have identical categorical value counts. A followup will implement the comparison. PiperOrigin-RevId: 482258142

Commit:bdf2c0e
Author:tf-metadata-team
Committer:tf-metadata-team

Add a CUSTOM_VALIDATION anomaly Type. PiperOrigin-RevId: 480641213

Commit:066c341
Author:tf-metadata-team
Committer:tf-metadata-team

NA PiperOrigin-RevId: 477099310

Commit:b0838a9
Author:tf-metadata-team
Committer:tf-metadata-team

Add the SequenceMetadata field to the schema to specify if this feature could be treated as a sequence feature. PiperOrigin-RevId: 476400441

Commit:c4decb1
Author:tf-metadata-team
Committer:tf-metadata-team

1) improved histogram accuracy
Previously, QUANTILES histograms were generated with identical counts per bucket. When the elements selected as quantiles boundaries are far from the idealized boundary - e.g., you have a distribution that's far from continuous - this was very wrong. This is a common enough case that it probably affects histogram accuracy for many users. STANDARD histograms, being derived from the same quantiles source, were also affected. This CL propagates the cumulative weight sum from the underlying quantiles sketch, and uses those values to fill in sample counts. Since what we get from the sketch is a sequence of bin upper bounds and associated weights, this requires changing Bucket semantics to include their upper bound and not lower bound, except for the first bin which includes both, because the quantiles sketch always gives us the minimum element and its count separately.
2) simplified infinity handling
The old handling of infinite values was complicated, and could result in bins that mixed finite and infinite values. For STANDARD histograms this is a problem, since we'd like to be able to align histograms using interpolation to calculate distance measures. I've updated STANDARD histogram generation to generate separate -inf and +inf bins, if applicable.
3) fixed nested list length custom stat
It looks like we were computing custom stats for nested list length based on the count of elements up a level in the nested list hierarchy, which I've fixed.
4) handles float64 overflow by omitting the standard histogram PiperOrigin-RevId: 473787450
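
As an illustration of item 1) in the commit message above, the following is a minimal sketch of how per-bucket sample counts could be derived from cumulative weights rather than assumed equal. It is not the TFDV implementation: the `Bucket` dataclass, the function name, and the input layout (first boundary is the minimum element, later boundaries are bin upper bounds, weights are cumulative) are assumptions made for this example only.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Bucket:
    """Hypothetical stand-in for a histogram bucket (not the TFMD proto)."""
    low_value: float
    high_value: float
    sample_count: float


def buckets_from_cumulative_weights(
    boundaries: Sequence[float],
    cumulative_weights: Sequence[float],
) -> List[Bucket]:
    """Builds buckets whose counts come from cumulative weights.

    Assumed input layout (an assumption of this sketch):
      boundaries[0]          the minimum element
      boundaries[i], i >= 1  upper bound of bin i
      cumulative_weights[i]  total weight of elements <= boundaries[i]
    """
    if len(boundaries) != len(cumulative_weights):
        raise ValueError("boundaries and cumulative_weights must align")
    if len(boundaries) < 2:
        raise ValueError("need the minimum element and at least one upper bound")

    buckets = []
    for i in range(1, len(boundaries)):
        if i == 1:
            # The first bin includes both its lower and upper bound, so it
            # also carries the weight of the minimum element.
            count = cumulative_weights[1]
        else:
            # Later bins include only their upper bound.
            count = cumulative_weights[i] - cumulative_weights[i - 1]
        buckets.append(Bucket(boundaries[i - 1], boundaries[i], count))
    return buckets


# A far-from-continuous distribution: the value 1 repeats so often that two
# quantile boundaries coincide.
example = buckets_from_cumulative_weights([0, 1, 1, 5], [2, 6, 6, 10])
print([b.sample_count for b in example])  # [6, 0, 4]
```

With identical counts per bucket, each of the three bins above would have been reported as 10/3 samples, even though the middle bin is empty.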

Commit:6f8049b
Author:tf-metadata-team
Committer:tf-metadata-team

Add is_auxiliary field to Task in problem_statement. PiperOrigin-RevId: 471911554

Commit:bc23278
Author:tf-metadata-team
Committer:tf-metadata-team

internal PiperOrigin-RevId: 471150553

Commit:dd1324f
Author:tf-metadata-team
Committer:tf-metadata-team

Add a categorical indicator to the schema for StringDomain PiperOrigin-RevId: 469291771

Commit:07cc0d3
Author:tf-metadata-team
Committer:tf-metadata-team

Mark task_weight and weight as deprecated PiperOrigin-RevId: 468596453

Commit:254e0c9
Author:tf-metadata-team
Committer:tf-metadata-team

Clarifies that num_non_missing in statistics.proto includes examples that define a feature but contain an explicitly empty value list. PiperOrigin-RevId: 467706566

Commit:b850105
Author:tf-metadata-team
Committer:tf-metadata-team

Internal cleanup PiperOrigin-RevId: 462255959

Commit:31635e3
Author:tf-metadata-team
Committer:tf-metadata-team

Internal only PiperOrigin-RevId: 460497139

Commit:7041bed
Author:tf-metadata-team
Committer:tf-metadata-team

Add FPR and FNR as DifferenceAcrossSlice metrics
We originally avoided these because we disprefer threshold-based metrics (thresholds don't have a priori meaning unless a model is calibrated). However, clients find these fairness metrics much more intuitive and they align better with PA / policy guidance. Also, equalizing FPR and FNR is more directly what MinDiff is able to achieve. Moreover, we've already added threshold-based metrics recently for multi-label. The new fairness metrics added here are for binary classification where the final model is always calibrated, and thus the threshold is only unstable in the AutoML loop, which we address in documentation. PiperOrigin-RevId: 458987803

Commit:cb32430
Author:tf-metadata-team
Committer:tf-metadata-team

Make ThresholdConfig.threshold subfield into a oneof
This is to ensure future compatibility with other kinds of thresholds. PiperOrigin-RevId: 457881611

Commit:771bef6
Author:tf-metadata-team
Committer:tf-metadata-team

TFDV: fixes a bug in validation of derived features wherein we'd always produce a DERIVED_FEATURE_BAD_LIFECYCLE anomaly, and adds test coverage. PiperOrigin-RevId: 455209162

Commit:c337b40
Author:tf-metadata-team
Committer:tf-metadata-team

internal PiperOrigin-RevId: 450971903

Commit:0ab8c01
Author:tf-metadata-team
Committer:tf-metadata-team

internal PiperOrigin-RevId: 450458691

Commit:770c81d
Author:tf-metadata-team
Committer:tf-metadata-team

Remove `option jspb_use_correct_proto2_semantics = false` from proto files where it has no effect, allowing the value to be the default `true`. This option only affects jspb gencode for singular primitive fields without default values in proto2 files. All of these files either have no such fields or are proto3.
More info: go/jspb-correctness-lsc
Tested: TAP for global presubmit queue http://test/OCL:448077555:BASE:449009947:1652809126374:7c4f8811 PiperOrigin-RevId: 449856532

Commit:ebaa8e1
Author:tf-metadata-team
Committer:tf-metadata-team

Introduce a new metric that computes multilabel recall at a given score threshold. For quality experimentation with the Feedback team. PiperOrigin-RevId: 448441850

Commit:07aed25
Author:tf-metadata-team
Committer:tf-metadata-team

Changes tfdv and tfx-bsl to use renamed derived source fields, and removes the old names from the proto. PiperOrigin-RevId: 446050596

Commit:4717022
Author:tf-metadata-team
Committer:tf-metadata-team

Changes the names of several TFMD fields pertaining to derived features to avoid conflict with existing uses of "derived feature". The old names will be deleted in a followup. PiperOrigin-RevId: 445952456

Commit:d8aa0e5
Author:tf-metadata-team
Committer:tf-metadata-team

Marks derived stats protos as experimental in tfmd. PiperOrigin-RevId: 445184813

Commit:75fb5a5
Author:tf-metadata-team
Committer:tf-metadata-team

Minor update to lifecycle stage documentation. PiperOrigin-RevId: 443753669

Commit:ada450b
Author:tf-metadata-team
Committer:tf-metadata-team

Adds a second anomaly type for derived features covering the source being set incorrectly PiperOrigin-RevId: 443423121

Commit:6cc84bd
Author:tf-metadata-team
Committer:tf-metadata-team

Adds DERIVED_FEATURE_BAD_LIFECYCLE anomaly to signal schema anomalies when a derived feature appears in the schema with an incompatible lifecycle (e.g., PRODUCTION). PiperOrigin-RevId: 443222946

Commit:fcb60ac
Author:tf-metadata-team
Committer:tf-metadata-team

internal PiperOrigin-RevId: 442882736

Commit:ff6e00d
Author:tf-metadata-team
Committer:tf-metadata-team

Handle derived source in tfx-bsl proto merger code. PiperOrigin-RevId: 441549608

Commit:3294624
Author:tf-metadata-team
Committer:tf-metadata-team

Introduce a new metric that computes multilabel precision at a given score threshold. For quality experimentation with feedback. PiperOrigin-RevId: 441069918

Commit:8a98a73
Author:tf-metadata-team
Committer:tf-metadata-team

Adds a DerivedFeatureSource message to TFMD to track metadata describing derived features. Derived features will be features that are computed from ordinary features during statistics generation, and which are available for exploratory analysis or validation, but not present in the raw inputs. PiperOrigin-RevId: 438584379

Commit:47dd73e
Author:tf-metadata-team
Committer:tf-metadata-team

Remove "do not use" from TensorRepresentationGroup docstring. The attribute is fairly mature and is already in use in a number of TFDV and TFT features PiperOrigin-RevId: 436783627

Commit:03b7c0f
Author:tf-metadata-team
Committer:tf-metadata-team

Add new Jaccard score performance metric. PiperOrigin-RevId: 429182336

Commit:210e7ca
Author:tf-metadata-team
Committer:tf-metadata-team

Rolls back addition of utf8_encoded field to StringDomain. PiperOrigin-RevId: 421914463

Commit:b004599
Author:tf-metadata-team
Committer:tf-metadata-team

rollback PiperOrigin-RevId: 421902202

Commit:c8713da
Author:tf-metadata-team
Committer:tf-metadata-team

Adds an anomaly type for unexpected non-utf8 strings. PiperOrigin-RevId: 421868016

Commit:1030660
Author:tf-metadata-team
Committer:tf-metadata-team

Introduces fields for unicode validation. PiperOrigin-RevId: 420781430

Commit:5231f60
Author:tf-metadata-team
Committer:tf-metadata-team

Mark various unsupported metrics as deprecated
These metrics aren't supported in AutoTFX (the only implementation of this proto) PiperOrigin-RevId: 417707332

Commit:a343f99
Author:tf-metadata-team
Committer:tf-metadata-team

Fully deprecate objective_function and mark multi_objective for deprecation. The `multi_objective` bool isn't understood by our system currently and it's not at all clear that a single boolean like this is how we'd express a pareto search longer term. Since we're starting to support thresholded metrics and possibly soon weighted combinations, having this extra unused config adds confusion. PiperOrigin-RevId: 414920758

Commit:c28a163
Author:tf-metadata-team
Committer:tf-metadata-team

Proposes API for thresholded metrics
A more complete proposal of the behavior/contract can be found here: https://docs.google.com/document/d/1xT8Fq2fc2j3ZkrNamFEK9I2OQg5988X4xV8fQFXBUH0/edit?resourcekey=0-9uxb8KFe_JCOesZo4Z317Q#heading=h.da395nbulm5q
(Ideally contractual information belongs in the proto, but AutoTFX-specific details don't seem appropriate here so I think the above will have to be translated into some kind of AutoTFX g3doc instead?) PiperOrigin-RevId: 407688935

Commit:f5bc9e9
Author:tf-metadata-team
Committer:tf-metadata-team

Add a Google-only Differential Privacy Performance Metric to Problem Statement. PiperOrigin-RevId: 406898733

Commit:62c3d17
Author:tf-metadata-team
Committer:tf-metadata-team

Proposal for experimental version of a Google-only API for a coverage safety metric. PiperOrigin-RevId: 402708855

Commit:c69c81a
Author:tf-metadata-team
Committer:tf-metadata-team

Updates tensorflow metadata schema to contain is_categorical field in FloatDomain. PiperOrigin-RevId: 396027954

Commit:8ca2ce8
Author:tf-metadata-team
Committer:tf-metadata-team

Add specification of Positive and Negative class to the problem statement and handle it in AutoTFX schema augmentation. Note, this does not actually plug in the schema augmentation at this time. PiperOrigin-RevId: 384726076

Commit:5494f39
Author:tf-metadata-team
Committer:tf-metadata-team

Update warnings now that we have NLP support in TFDV. PiperOrigin-RevId: 373807037

Commit:af1e53a
Author:tf-metadata-team
Committer:tf-metadata-team

Adding probability prediction to problem_statement.proto PiperOrigin-RevId: 370727045

Commit:bee7839
Author:tf-metadata-team
Committer:tf-metadata-team

Add IFTTT guard to update TFDV anomalies docs. PiperOrigin-RevId: 368731855

Commit:7f9fb4b
Author:tf-metadata-team
Committer:tf-metadata-team

Add anomaly when natural language stats are not computed. PiperOrigin-RevId: 368680677

Commit:4ffe57b
Author:tf-metadata-team
Committer:tf-metadata-team

Add new anomaly info types. PiperOrigin-RevId: 368249522

Commit:4c8e8d6
Author:tf-metadata-team
Committer:tf-metadata-team

Schema: Adding sequence length min / max and set of tokens to ignore when computing the sequence length. Statistics: Adding sequence length distribution, min, max PiperOrigin-RevId: 362120228

Commit:fb26fc9
Author:tf-metadata-team
Committer:tf-metadata-team

Add (Google-only) API for fairness remediation problem specification
This is the direction I'm leaning based on review. I think we're close enough to start talking about naming, etc. After getting feedback from you three, I'll run it by Kapla. Note the API does not support pre-split paths like Kapla has. They'd need to refactor their data materialization to align with this API. We might build them a little tool to do the data massaging but it would be outside of AutoTFX for now. PiperOrigin-RevId: 358469597

Commit:6ff1147
Author:tf-metadata-team
Committer:tf-metadata-team

Extend RaggedTensor representation to support more ragged tensor representations. PiperOrigin-RevId: 358190488

Commit:117a9c4
Author:tf-metadata-team
Committer:tf-metadata-team

Added a new type of anomaly for feature shape validation. PiperOrigin-RevId: 356351795

Commit:e347095
Author:tf-metadata-team
Committer:tf-metadata-team

add MAX_IMAGE_BYTE_SIZE_EXCEEDED to tfmd anomalies and max_num_bytes_int field to bytes_stats. PiperOrigin-RevId: 355641556

Commit:c75cb7a
Author:tf-metadata-team
Committer:tf-metadata-team

Implements reported_sequences and token statistics in natural language stats generator. PiperOrigin-RevId: 355009342

Commit:43d792a
Author:tf-metadata-team
Committer:tf-metadata-team

Add is_embedding to FloatDomain and a corresponding Anomaly type for invalid embeddings PiperOrigin-RevId: 352034922

Commit:7d6df7b
Author:tf-metadata-team
Committer:tf-metadata-team

[TFDV for NLP] Extend anomalies proto to report back anomalies associated with NaturalLanguageDomain. PiperOrigin-RevId: 347106293

Commit:43fa6b0
Author:tf-metadata-team
Committer:tf-metadata-team

Update schema to support differentiating between tokens to be excluded from the coverage calculation and those that represent oov tokens, e.g., [PAD] represents the former while [UNK] represents the latter. PiperOrigin-RevId: 345335922

Commit:bca48d9
Author:tf-metadata-team
Committer:tf-metadata-team

[TFDV for NLP] Update statistics.proto to define NaturalLanguageStatistics. PiperOrigin-RevId: 344852583

Commit:88fdc7e
Author:tf-metadata-team
Committer:tf-metadata-team

[TFDV for NLP] Definition of NaturalLanguageDomain. PiperOrigin-RevId: 344132237

Commit:6c734ea
Author:tf-metadata-team
Committer:tf-metadata-team

LSC: opt-out of default proto2 semantic correctness, we're inverting the default. This is intended to be a no-op change to your codebase, please rollback and let us know if there are unexpected issues caused by this change. To learn more, please visit: go/jspb-correct-proto2 #jspb-correct-proto2-lsc PiperOrigin-RevId: 340256812

Commit:ce8d5cc
Author:tf-metadata-team
Committer:tf-metadata-team

Add a DISABLED lifecycle stage
This can be used to denote features that are excluded from a model. Whereas the semantics of DEPRECATED indicate that the feature was previously used but should not be in the future, DISABLED is more temporally agnostic and simply indicates a feature that is not currently in use. PiperOrigin-RevId: 339273198

Commit:aa10b5c
Author:tf-metadata-team
Committer:tf-metadata-team

Added a new proto DriftSkewInfo for keeping the raw measurements of skew/drift. Also added a new repeated field of that type in the Anomalies proto so that TFDV can report the raw measurements regardless of whether a drift/skew was detected. PiperOrigin-RevId: 336910440

Commit:8406f6d
Author:tf-metadata-team
Committer:tf-metadata-team

Fixing comment typos that caused me small confusion. PiperOrigin-RevId: 335090115

Commit:0d2d176
Author:tf-metadata-team
Committer:tf-metadata-team

Add new Anomaly type to describe when a domain is incompatible with the data type. PiperOrigin-RevId: 333393799

Commit:c1d9377
Author:tf-metadata-team
Committer:tf-metadata-team

Document skew/drift detection support for numeric features and remove experimental warning from Jensen-Shannon divergence field in schema proto. PiperOrigin-RevId: 333191383

Commit:967df77
Author:tf-metadata-team
Committer:tf-metadata-team

Allow for specifying a label as a Path or a string PiperOrigin-RevId: 329725820