These are the commits in which the Protocol Buffers files changed (only the last 100 relevant commits are shown):
| Commit: | 4cd97fe | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Support squared pearson correlation metric PiperOrigin-RevId: 747427219
The documentation is generated from this commit.
| Commit: | a3f72a5 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Proto changes for supporting content chunk semantic type in AI Flow PiperOrigin-RevId: 742870644
| Commit: | d4c2564 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Add Video as a domain to TFMD schema PiperOrigin-RevId: 729170148
| Commit: | f440b43 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Support Audio as a domain in Schema PiperOrigin-RevId: 699716595
| Commit: | cff231e | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Automated g4 rollback of changelist 669103619. *** Reason for rollback *** Broke TAP tests the LegoML project *** Original change description *** Mark message types as requiring the go/jspb object format methods. This CL marks types that use the JSPB object format accessors (see go/jspb-api-gotchas#objects) so that we do not remove them in the future. This is a backwards compatibility option and will represent no immediate change. The implementations of these methods are large and expensive for the JSCompiler to process; so we intend to limit their generation to improve compilation performance. See go/lsc-constrain-jspb-object-format-us... *** PiperOrigin-RevId: 670640538
| Commit: | 2ad4ebe | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Internal only change. PiperOrigin-RevId: 669103619
| Commit: | bce3c31 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Updates schema proto documentation to clarify that top-level float/int domains are not supported in TFDV. PiperOrigin-RevId: 660899053
| Commit: | ec9005b | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
For nested features with N nested levels (N > 1), the statistics counting the number of values in `CommonStatistics` and `WeightedCommonStatistics` will rely on the innermost level. PiperOrigin-RevId: 631265288
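
As a rough illustration of what "rely on the innermost level" could mean, the sketch below collects the length of every innermost list in a nested value and derives value-count summaries from those lengths. This is one reading of the commit message, not the TFDV implementation; the function name `innermost_list_lengths` and the summary keys are hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, List


def innermost_list_lengths(value: Any) -> List[int]:
    """Return the length of every innermost list inside a nested list.

    Hypothetical helper: a list that contains no further lists is treated
    as an innermost list, and its length is one "number of values" sample.
    """
    if not isinstance(value, list):
        raise TypeError("expected a (possibly nested) list")
    if not any(isinstance(v, list) for v in value):
        # No nested lists below this point, so this is the innermost level.
        return [len(value)]
    lengths: List[int] = []
    for v in value:
        lengths.extend(innermost_list_lengths(v))
    return lengths


def summarize_num_values(value: Any) -> Dict[str, float]:
    """Summarize value counts using only the innermost level (assumed reading)."""
    lengths = innermost_list_lengths(value)
    return {
        "min": min(lengths),
        "max": max(lengths),
        "total": sum(lengths),
        "avg": sum(lengths) / len(lengths),
    }


# A feature with two nested levels: three innermost lists of sizes 2, 3 and 0.
example = [[1, 2], [3, 4, 5], []]
summary = summarize_num_values(example)
```

Under this reading, the outer list length (3 in the example) does not enter the value-count summaries at all; only the sizes of the innermost lists do.
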
| Commit: | 8ea7f6a | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Remove unused field NaturalLanguageDomain.location_constraint_regex. It was documented as "please do not use" and never implemented. PiperOrigin-RevId: 621881996
| Commit: | f7118d0 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Comment fix for copy&paste glitch PiperOrigin-RevId: 621830437
| Commit: | 7c1ecd5 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
* Add `embedding_type` to `FloatDomain` to specify the semantic type of the embedding, which is useful for use cases where downstream tasks depend on knowing where the embedding came from. PiperOrigin-RevId: 611538841
| Commit: | ed7ce77 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
[Tunelab Integration] Update PSW to support text_generation task type PiperOrigin-RevId: 611265010
| Commit: | 02145df | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Clarify comment on tensorflow.metadata.v0.FixedShape: it matches tensorflow.TensorShapeProto only for fully defined shapes. PiperOrigin-RevId: 596643499
| Commit: | 056ecff | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Create prototype LoRA trainer in AutoTFX service This change implements the flow described in go/autotfx-pets-lora-backend PiperOrigin-RevId: 590643060
| Commit: | 0808918 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Enable schema configuration to set default feature value for failed slicing sql when using SqlDeriver. PiperOrigin-RevId: 575254073
| Commit: | c65424c | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
[MTL] All primary final model selection with task weight. PiperOrigin-RevId: 569627723
| Commit: | a85e542 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
#MulticlassDistillation Update the distillation spec to include config for multiclass distillation. PiperOrigin-RevId: 564478592
| Commit: | 62bf3b5 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Remove optional from problem statement proto. PiperOrigin-RevId: 563160918
| Commit: | c973240 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Internal only PiperOrigin-RevId: 559467373
| Commit: | ede25a3 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
* Add `embedding_dim` to `FloatDomain` to specify the embedding dimension, which is useful for use cases such as restoring shapes for flattened sequence of embeddings.
* Add `sequence_truncation_limit` to `SequenceMetadata` to specify the maximum sequence length that should be processed.

PiperOrigin-RevId: 554643195
| Commit: | b5f35ca | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Add BOOL_TYPE_INVALID_CONFIG anomaly types. PiperOrigin-RevId: 551650214
| Commit: | 8d3a752 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Internal only PiperOrigin-RevId: 538947625
| Commit: | 985d366 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Add `joint_group` to `SequenceMetadata` to specify which group this sequence feature belongs to so that they can be modeled jointly. PiperOrigin-RevId: 527186422
| Commit: | 6a7fab3 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Internal only PiperOrigin-RevId: 525823318
| Commit: | eac14dc | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | RaviTeja Gorijala | |
Add deriver classes and tests for image feature along with associated schema configuration support. PiperOrigin-RevId: 521859426
| Commit: | 9bb595d | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Add deriver classes and tests for image feature along with associated schema configuration support. PiperOrigin-RevId: 521859426
| Commit: | a7991e5 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Adds currently unused HistogramSelection field to Schema PiperOrigin-RevId: 513062587
| Commit: | f0c4a7a | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Internal only PiperOrigin-RevId: 507908632
| Commit: | b2972c7 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
MVP of BooleanFlipRate metric for use as a secondary tuning objective PiperOrigin-RevId: 507862768
| Commit: | 013b564 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Add documentation for sql-based derived features. https://g3doc.corp.google.com/third_party/py/tensorflow_data_validation/google/g3doc/sql_derived_features.md?cl=495457531 PiperOrigin-RevId: 501017930
| Commit: | 2f81d3e | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Adds a knob to TFMD schema to infer RaggedTensors for variable length features. By default they are inferred as ragged left-aligned SparseTensors. PiperOrigin-RevId: 499913930
| Commit: | 9e59299 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Add supporting schema configuration for sql-based derived features. PiperOrigin-RevId: 495452787
| Commit: | 40856c6 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Propagate `is_sorted`(`already_sorted`) field from schema's `SparseFeature` to `SparseTensor` TR. PiperOrigin-RevId: 488767868
| Commit: | a2d0b71 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Supports normalized absolute difference validation in tfdv. This can be used to verify that the exact numeric values of counts are similar when normalized by the overall size of two datasets. PiperOrigin-RevId: 483486591
| Commit: | ac771bf | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Internal only PiperOrigin-RevId: 482544627
| Commit: | 1de3c4f | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Adds a new feature comparator NormalizedAbsoluteDifference for use in comparing datasets that are expected to have identical categorical value counts. A followup will implement the comparison. PiperOrigin-RevId: 482258142
| Commit: | bdf2c0e | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Add a CUSTOM_VALIDATION anomaly Type. PiperOrigin-RevId: 480641213
| Commit: | 066c341 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
NA PiperOrigin-RevId: 477099310
| Commit: | b0838a9 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Add the SequenceMetadata field to the schema to specify if this feature could be treated as a sequence feature. PiperOrigin-RevId: 476400441
| Commit: | c4decb1 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
1) improved histogram accuracy

Previously, QUANTILES histograms were generated with identical counts per bucket. When the elements selected as quantiles boundaries are far from the idealized boundary - e.g., you have a distribution that's far from continuous - this was very wrong. This is a common enough case that it probably affects histogram accuracy for many users. STANDARD histograms, being derived from the same quantiles source, were also affected. This CL propagates the cumulative weight sum from the underlying quantiles sketch, and uses those values to fill in sample counts. Since what we get from the sketch is a sequence of bin upper bounds and associated weights, this requires changing Bucket semantics to include their upper bound and not lower bound, except for the first bin which includes both, because the quantiles sketch always gives us the minimum element and its count separately.

2) simplified infinity handling

The old handling of infinite values was complicated, and could result in bins that mixed finite and infinite values. For STANDARD histograms this is a problem, since we'd like to be able to align histograms using interpolation to calculate distance measures. I've updated STANDARD histogram generation to generate separate -inf and +inf bins, if applicable.

3) fixed nested list length custom stat

It looks like we were computing custom stats for nested list length based on the count of elements up a level in the nested list hierarchy, which I've fixed.

4) handles float64 overflow by omitting the standard histogram

PiperOrigin-RevId: 473787450
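
The bucket-filling idea in the first item of this commit message can be sketched as follows. This is a minimal illustration under stated assumptions, not the TFDV code: it assumes the sketch yields a list of boundaries whose first entry is the minimum element, together with a parallel list of cumulative weights whose first entry is the weight of that minimum. The `Bucket` class and `buckets_from_cumulative_weights` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Bucket:
    low_value: float     # excluded, except for the first bucket
    high_value: float    # always included
    sample_count: float


def buckets_from_cumulative_weights(
    boundaries: Sequence[float],
    cumulative_weights: Sequence[float],
) -> List[Bucket]:
    """Build buckets whose counts come from cumulative weights.

    boundaries[0] is assumed to be the minimum element; every later entry
    is a bucket upper bound. cumulative_weights[i] is assumed to be the
    total weight of all values <= boundaries[i].
    """
    if len(boundaries) != len(cumulative_weights):
        raise ValueError("boundaries and cumulative_weights must align")
    if len(boundaries) < 2:
        raise ValueError("need the minimum plus at least one upper bound")

    buckets: List[Bucket] = []
    for i in range(1, len(boundaries)):
        if i == 1:
            # The first bucket includes both ends, so it also carries the
            # weight of the minimum element.
            count = cumulative_weights[1]
        else:
            # Later buckets cover (previous bound, this bound].
            count = cumulative_weights[i] - cumulative_weights[i - 1]
        buckets.append(Bucket(boundaries[i - 1], boundaries[i], count))
    return buckets


# A skewed sample: most of the weight sits at or below 1.0.
buckets = buckets_from_cumulative_weights(
    boundaries=[0.0, 1.0, 5.0, 9.0],
    cumulative_weights=[2.0, 6.0, 7.0, 10.0],
)
counts = [b.sample_count for b in buckets]
```

With these inputs the three buckets receive unequal counts that sum to the total weight, in contrast to the earlier behavior the commit describes, where every QUANTILES bucket was assigned the same count.
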
| Commit: | 6f8049b | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Add is_auxiliary field to Task in problem_statement. PiperOrigin-RevId: 471911554
| Commit: | bc23278 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
internal PiperOrigin-RevId: 471150553
| Commit: | dd1324f | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Add a categorical indicator to the schema for StringDomain PiperOrigin-RevId: 469291771
| Commit: | 07cc0d3 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Mark task_weight and weight as deprecated PiperOrigin-RevId: 468596453
| Commit: | 254e0c9 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Clarifies that num_non_missing statistics.proto includes examples that define a feature but contain an explicitly empty value list. PiperOrigin-RevId: 467706566
| Commit: | b850105 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Internal cleanup PiperOrigin-RevId: 462255959
| Commit: | 31635e3 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Internal only PiperOrigin-RevId: 460497139
| Commit: | 7041bed | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Add FPR and FNR as DifferenceAcrossSlice metrics We originally avoided these because we disprefer threshold-based metrics (thresholds don't have a priori meaning unless a model is calibrated). However, clients find these fairness metrics much more intuitive and they align better with PA / policy guidance. Also, equalizing FPR and FNR is more directly what MinDiff is able to achieve. Moreover, we've already added threshold-based metrics recently for multi-label. The new fairness metrics added here are for binary classification where the final model is always calibrated, and thus the threshold is only unstable in the AutoML loop, which we address in documentation. PiperOrigin-RevId: 458987803
| Commit: | cb32430 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Make ThresholdConfig.threshold subfield into a oneof This is to ensure future compatibility with other kinds of thresholds. PiperOrigin-RevId: 457881611
| Commit: | 771bef6 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
TFDV: fixes a bug in validation of derived features wherein we'd always produce a DERIVED_FEATURE_BAD_LIFECYCLE anomaly, and adds test coverage. PiperOrigin-RevId: 455209162
| Commit: | c337b40 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
internal PiperOrigin-RevId: 450971903
| Commit: | 0ab8c01 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
internal PiperOrigin-RevId: 450458691
| Commit: | 770c81d | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Remove `option jspb_use_correct_proto2_semantics = false` from proto files where it has no effect, allowing the value to be the default `true`. This option only affects jspb gencode for singular primitive fields without default values in proto2 files. All of these files either have no such fields or are proto3. More info: go/jspb-correctness-lsc Tested: TAP for global presubmit queue http://test/OCL:448077555:BASE:449009947:1652809126374:7c4f8811 PiperOrigin-RevId: 449856532
| Commit: | ebaa8e1 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Introduce a new metric that computes multilabel recall at a given score threshold. For quality experimentation with the Feedback team. PiperOrigin-RevId: 448441850
| Commit: | 07aed25 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Changes tfdv and tfx-bsl to use renamed derived source fields, and removes the old names from the proto. PiperOrigin-RevId: 446050596
| Commit: | 4717022 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Changes the names of several TFMD fields pertaining to derived features to avoid conflict with existing uses of "derived feature". The old names will be deleted in a followup. PiperOrigin-RevId: 445952456
| Commit: | d8aa0e5 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Marks derived stats protos as experimental in tfmd. PiperOrigin-RevId: 445184813
| Commit: | 75fb5a5 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Minor update to lifecycle stage documentation. PiperOrigin-RevId: 443753669
| Commit: | ada450b | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Adds a second anomaly type for derived features covering the source being set incorrectly PiperOrigin-RevId: 443423121
| Commit: | 6cc84bd | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Adds DERIVED_FEATURE_BAD_LIFECYCLE anomaly to signal schema anomalies when a derived feature appears in the schema with an incompatible lifecycle (e.g., PRODUCTION). PiperOrigin-RevId: 443222946
| Commit: | fcb60ac | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
internal PiperOrigin-RevId: 442882736
| Commit: | ff6e00d | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Handle derived source in tfx-bsl proto merger code. PiperOrigin-RevId: 441549608
| Commit: | 3294624 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Introduce a new metric that computes multilabel precision at a given score threshold. For quality experimentation with feedback. PiperOrigin-RevId: 441069918
| Commit: | 8a98a73 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Adds a DerivedFeatureSource message to TFMD to track metadata describing derived features. Derived features will be features that are computed from ordinary features during statistics generation, and which are available for exploratory analysis or validation, but not present in the raw inputs. PiperOrigin-RevId: 438584379
| Commit: | 47dd73e | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Remove "do not use" from TensorRepresentationGroup docstring. The attribute is fairly mature and is already in use in a number of TFDV and TFT features PiperOrigin-RevId: 436783627
| Commit: | 03b7c0f | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Add new Jaccard score performance metric. PiperOrigin-RevId: 429182336
| Commit: | 210e7ca | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Rolls back addition of utf8_encoded field to StringDomain. PiperOrigin-RevId: 421914463
| Commit: | b004599 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
rollback PiperOrigin-RevId: 421902202
| Commit: | c8713da | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Adds an anomaly type for unexpected non-utf8 strings. PiperOrigin-RevId: 421868016
| Commit: | 1030660 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Introduces fields for unicode validation. PiperOrigin-RevId: 420781430
| Commit: | 5231f60 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Mark various unsupported metrics as deprecated These metrics aren't supported in AutoTFX (the only implementation of this proto) PiperOrigin-RevId: 417707332
| Commit: | a343f99 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Fully deprecate objective_function and mark multi_objective for deprecation. The `multi_objective` bool isn't understood by our system currently and it's not at all clear that a single boolean like this is how we'd express a pareto search longer term. Since we're starting to support thresholded metrics and possibly soon weighted combinations, having this extra unused config adds confusion. PiperOrigin-RevId: 414920758
| Commit: | c28a163 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Proposes API for thresholded metrics A more complete proposal of the behavior/contract can be found here: https://docs.google.com/document/d/1xT8Fq2fc2j3ZkrNamFEK9I2OQg5988X4xV8fQFXBUH0/edit?resourcekey=0-9uxb8KFe_JCOesZo4Z317Q#heading=h.da395nbulm5q (Ideally contractual information belongs in the proto, but AutoTFX-specific details don't seem appropriate here so I think the above will have to be translated into some kind of AutoTFX g3doc instead?) PiperOrigin-RevId: 407688935
| Commit: | f5bc9e9 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Add a Google-only Differential Privacy Performance Metric to Problem Statement. PiperOrigin-RevId: 406898733
| Commit: | 62c3d17 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Proposal for experimental version of a Google-only API for a coverage safety metric. PiperOrigin-RevId: 402708855
| Commit: | c69c81a | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Updates tensorflow metadata schema to contain is_categorical field in FloatDomain. PiperOrigin-RevId: 396027954
| Commit: | 8ca2ce8 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Add specification of Positive and Negative class to the problem statement and handle it in AutoTFX schema augmentation. Note, this does not actually plug in the schema augmentation at this time. PiperOrigin-RevId: 384726076
| Commit: | 5494f39 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Update warnings now that we have NLP support in TFDV. PiperOrigin-RevId: 373807037
| Commit: | af1e53a | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Adding probability prediction to problem_statement.proto PiperOrigin-RevId: 370727045
| Commit: | bee7839 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Add IFTTT guard to update TFDV anomalies docs. PiperOrigin-RevId: 368731855
| Commit: | 7f9fb4b | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Add anomaly when natural language stats are not computed. PiperOrigin-RevId: 368680677
| Commit: | 4ffe57b | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Add new anomaly info types. PiperOrigin-RevId: 368249522
| Commit: | 4c8e8d6 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Schema: Adding sequence length min / max and set of tokens to ignore when computing the sequence length. Statistics: Adding sequence length distribution, min, max PiperOrigin-RevId: 362120228
| Commit: | fb26fc9 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Add (Google-only) API for fairness remediation problem specification This is the direction I'm leaning based on review. I think we're close enough to start talking about naming, etc. After getting feedback from you three, I'll run it by Kapla. Note the API does not support pre-split paths like Kapla has. They'd need to refactor their data materialization to align with this API. We might build them a little tool to do the data massaging but it would be outside of AutoTFX for now. PiperOrigin-RevId: 358469597
| Commit: | 6ff1147 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Extend RaggedTensor representation to support more ragged tensor representations. PiperOrigin-RevId: 358190488
| Commit: | 117a9c4 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Added a new type of anomaly for feature shape validation. PiperOrigin-RevId: 356351795
| Commit: | e347095 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
add MAX_IMAGE_BYTE_SIZE_EXCEEDED to tfmd anomalies and max_num_bytes_int field to bytes_stats. PiperOrigin-RevId: 355641556
| Commit: | c75cb7a | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Implements reported_sequences and token statistics in natural language stats generator. PiperOrigin-RevId: 355009342
| Commit: | 43d792a | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Add is_embedding to FloatDomain and a corresponding Anomaly type for invalid embeddings PiperOrigin-RevId: 352034922
| Commit: | 7d6df7b | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
[TFDV for NLP] Extend anomalies proto to report back anomalies associated with NaturalLanguageDomain. PiperOrigin-RevId: 347106293
| Commit: | 43fa6b0 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Update schema to support differentiating between tokens to be excluded from the coverage calculation and those that represent oov tokens. For example, [PAD] represents the former while [UNK] represents the latter. PiperOrigin-RevId: 345335922
| Commit: | bca48d9 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
[TFDV for NLP] Update statistics.proto to define NaturalLanguageStatistics. PiperOrigin-RevId: 344852583
| Commit: | 88fdc7e | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
[TFDV for NLP] Definition of NaturalLanguageDomain. PiperOrigin-RevId: 344132237
| Commit: | 6c734ea | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
LSC: opt-out of default proto2 semantic correctness, we're inverting the default. This is intended to be a no-op change to your codebase, please rollback and let us know if there are unexpected issues caused by this change. To learn more, please visit: go/jspb-correct-proto2 #jspb-correct-proto2-lsc PiperOrigin-RevId: 340256812
| Commit: | ce8d5cc | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Add a DISABLED lifecycle stage This can be used to denote features that are excluded from a model. Whereas the semantics of DEPRECATED indicate that the feature was previously used but should not be in the future, DISABLED is more temporally agnostic and simply indicates a feature that is not currently in use. PiperOrigin-RevId: 339273198
| Commit: | aa10b5c | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Added a new proto DriftSkewInfo for keeping the raw measurements of skew/drift. Also added a new repeated field of that type in the Anomalies proto so that TFDV can report the raw measurements regardless of whether a drift/skew was detected. PiperOrigin-RevId: 336910440
| Commit: | 8406f6d | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Fixing comment typos that caused me small confusion. PiperOrigin-RevId: 335090115
| Commit: | 0d2d176 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Add new Anomaly type to describe when a domain is incompatible with the data type. PiperOrigin-RevId: 333393799
| Commit: | c1d9377 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Document skew/drift detection support for numeric features and remove experimental warning from Jensen-Shannon divergence field in schema proto. PiperOrigin-RevId: 333191383
| Commit: | 967df77 | |
|---|---|---|
| Author: | tf-metadata-team | |
| Committer: | tf-metadata-team | |
Allow for specifying a label as a Path or a string PiperOrigin-RevId: 329725820