Proto commits in google/deepvariant

These are the commits in which the Protocol Buffers files changed (only the last 100 relevant commits are shown):

Commit:d6da5ac
Author:Genomics team in Google Health
Committer:Copybara-Service

Implement methylation-aware phasing support PiperOrigin-RevId: 738436840

Commit:36bce4d
Author:lucasbrambrink
Committer:Copybara-Service

[small model] Expands the SM code to handle multiple samples.
- tracks the `sample_name` in the `DeepVariantCall.ReadSupport` proto.
- updates the `small_model_makes_examples` to compute the `BaseFeatures` per sample
- update training code/configs to set the above params.
PiperOrigin-RevId: 736249896

Commit:ed53c6d
Author:koles
Committer:Copybara-Service

Add `--output_debug_info` to control if extra debug info should be output. By default, it's False, to reduce the amount of log info. PiperOrigin-RevId: 735921061

Commit:534ed69
Author:koles
Committer:Copybara-Service

Add non-uniform downsampling flags for DeepSomatic. PiperOrigin-RevId: 734640273

Commit:4a01e8e
Author:koles
Committer:Copybara-Service

Change window_selector logic. PiperOrigin-RevId: 734309163

Commit:f605fa7
Author:danielecook
Committer:Copybara-Service

Add 6mA channel. PiperOrigin-RevId: 734230252

Commit:5bf1d7f
Author:lucasbrambrink
Committer:Copybara-Service

[small model] Refactors how small model is instantiated in `make_examples`. This ties the variant caller to the `sample_lib.Sample`, similar to how each sample has its own reader.
- moves `trained_small_model_path` to `make_examples.py` for each binary (currently DV and Pangenome-DV).
- this is a no-op: everything should continue working as before.
PiperOrigin-RevId: 734127195

Commit:8dfe4f8
Author:Genomics team in Google Health
Committer:Copybara-Service

Add methylation information to CVO PiperOrigin-RevId: 730626158

Commit:15cc49c
Author:lucasbrambrink
Committer:Copybara-Service

[small model] Accept multi-allelic candidates
* adds 2 `BaseFeatures`: `alt_indices_depth` and `alt_indices_variant_allele_frequency`
* adds 2 `VariantFeatures`: `is_multiallelic` and `is_multiple_alt_alleles` features
* adds field `make_examples_alt_allele_indices` to `DeepVariantCall` (the candidate proto), which instructs make_examples_native to restrict examples to the alt_allele_indices combinations that the small model did not call confidently.
PiperOrigin-RevId: 726626783

Commit:4e3b586
Author:Genomics team in Google Health
Committer:Copybara-Service

Add option to enable methylation calling in DeepVariant PiperOrigin-RevId: 726141448

Commit:f32b5ef
Author:danielecook
Committer:Copybara-Service

Move base modification parsing to nucleus. PiperOrigin-RevId: 723315638

Commit:d8b4083
Author:koles
Committer:Copybara-Service

Part (4) of complex variants implementation. PiperOrigin-RevId: 722839178

Commit:2b807f3
Author:koles
Committer:Copybara-Service

All realigning all regions. PiperOrigin-RevId: 722786313

Commit:45318e2
Author:danielecook
Committer:Copybara-Service

Add a 5mC methylation channel to deepvariant. PiperOrigin-RevId: 713528298

Commit:05f0933
Author:koles
Committer:Copybara-Service

Addressing the issue raised in https://github.com/google/deepvariant/issues/811 This change ensures that PL value is calculated according to the ploidy in reference blocks of gVCF. PiperOrigin-RevId: 698266848

The documentation is generated from this commit.
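
As background for the ploidy-dependent PL change above: the number of PL values in a VCF record depends on both the allele count and the ploidy. The sketch below only illustrates that standard counting rule; it is not DeepVariant's code, and the function name `num_genotype_likelihoods` is hypothetical. It assumes a reference block with two alleles (the reference plus one symbolic non-reference allele).

```python
from math import comb


def num_genotype_likelihoods(num_alleles: int, ploidy: int) -> int:
    """Count the unordered genotypes, i.e. the number of PL entries.

    Choosing `ploidy` alleles with repetition from `num_alleles` gives
    C(num_alleles + ploidy - 1, ploidy) genotypes.
    """
    if num_alleles < 1 or ploidy < 1:
        raise ValueError("num_alleles and ploidy must be positive")
    return comb(num_alleles + ploidy - 1, ploidy)


# Two alleles (assumed: reference plus one non-reference allele).
diploid_count = num_genotype_likelihoods(2, 2)  # genotypes 0/0, 0/1, 1/1
haploid_count = num_genotype_likelihoods(2, 1)  # genotypes 0, 1
```

Under these assumptions a haploid reference block carries fewer PL entries than a diploid one, which is why the PL vector has to be computed with the ploidy in mind.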

Commit:f36f747
Author:mobinasri
Committer:Copybara-Service

Fixing the memory issue with GBZ reader PiperOrigin-RevId: 695498239

Commit:61777eb
Author:koles
Committer:Copybara-Service

Add phasing info to candidate output. PiperOrigin-RevId: 683811208

Commit:2b1c194
Author:mobinasri
Committer:Copybara-Service

- Fixed slowness in parsing alignments from gbz. The path index is now created when the gbzReader is instantiated instead of creating an index for each query.
- Instead of hard-coding context size to 1k it is now set based on the partition_size
- Added a flag `--ref_chrom_prefix` for adding a prefix to the chromosome names extracted from gbz file. It is needed when we have a reference like /cns/oz-d/home/brain-genomics/pichuan/b199781544/GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.hprc_july7.fna whose chromosomes have the prefix of "GRCh38."
- Renamed `pangenome_ref_name` to `ref_name_pangenome` since all other pangenome-related flags have `_pangenome` as suffix.
- Removed printing runtimes in the query function of gbzReader
PiperOrigin-RevId: 681641986

Commit:ca88780
Author:koles
Committer:Copybara-Service

Add an option to skip normalization in make_examples. PiperOrigin-RevId: 681532370

Commit:a87fd7a
Author:mobinasri
Committer:Copybara-Service

Added a GBZ reader to Nucleus and integrated it natively into make_examples for DeepVariant. The GBZ reader can be used to extract reads from a GBZ file. It can be slow at the moment and further optimizations will follow. PiperOrigin-RevId: 678760711

Commit:bfc5437
Author:lucasbrambrink
Committer:Copybara-Service

[small model]
- Adds haplotype information to small model if `phase_reads` is enabled.
- Add strand information as two features.
PiperOrigin-RevId: 677840848

Commit:b4e1ab6
Author:koles
Committer:Copybara-Service

Add a sample option to skip phasing for a sample. PiperOrigin-RevId: 674437025

Commit:cff0882
Author:Genomics team in Google Health
Committer:Copybara-Service

Automated Code Change PiperOrigin-RevId: 673655198

Commit:6b74831
Author:lucasbrambrink
Committer:kishwarshafin

[small model] Enable the small model to call INDELs PiperOrigin-RevId: 663489298

Commit:87f57dd
Author:lucasbrambrink
Committer:kishwarshafin

[small model] Capture additional signals (mapq, baseq, and context VAF) PiperOrigin-RevId: 657733223

Commit:04f922c
Author:juancmier
Committer:kishwarshafin

Add functionality to blank out channels for a given variant type. PiperOrigin-RevId: 655808421

Commit:d2a279b
Author:lucasbrambrink
Committer:kishwarshafin

[small model] Fixes memory leak during small model inference PiperOrigin-RevId: 653710878

Commit:5b3f233
Author:lucasbrambrink
Committer:Copybara-Service

[small model] Enable the small model to call INDELs PiperOrigin-RevId: 663489298

Commit:50f4830
Author:lucasbrambrink
Committer:Copybara-Service

[small model] Capture additional signals (mapq, baseq, and context VAF) PiperOrigin-RevId: 657733223

Commit:0fe7253
Author:juancmier
Committer:Copybara-Service

Add functionality to blank out channels for a given variant type. PiperOrigin-RevId: 655808421

Commit:348e2ce
Author:lucasbrambrink
Committer:Copybara-Service

[small model] Fixes memory leak during small model inference PiperOrigin-RevId: 653710878

Commit:be4cc46
Author:juancmier
Committer:pichuan

This CL adds a new channel to DeepVariant that contains the average coverage of the reads aligned to any given position or base in the genome. PiperOrigin-RevId: 643492127

Commit:c6ec0c9
Author:juancmier
Committer:pichuan

Add mean coverage estimation per sample.
* Gets the total possible regions using `UNION(RefContigs, SampleContigs) - ExcludeContigs`
* Uses sampling rate to get a subset of total possible regions consecutively in the middle of the possible regions. This has been tested empirically to be fast and accurate but is subject to change.
* Use allele counter to process reads in each region. Allele counter already performs the logic to account for CIGAR operations and counts reads at specific positions.
PiperOrigin-RevId: 645466191
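
The region selection described in this commit message could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the DeepVariant implementation: the function names `candidate_contigs` and `middle_subset` are hypothetical, regions are represented by plain contig names, and the subset size is obtained by truncating `len(regions) * sampling_rate`.

```python
def candidate_contigs(ref_contigs, sample_contigs, exclude_contigs):
    """Compute UNION(RefContigs, SampleContigs) - ExcludeContigs.

    First-seen order is preserved so the result stays deterministic.
    """
    excluded = set(exclude_contigs)
    seen = set()
    result = []
    for name in list(ref_contigs) + list(sample_contigs):
        if name not in seen and name not in excluded:
            seen.add(name)
            result.append(name)
    return result


def middle_subset(regions, sampling_rate):
    """Return a consecutive run of regions taken from the middle of the list."""
    if not 0 < sampling_rate <= 1:
        raise ValueError("sampling_rate must be in (0, 1]")
    if not regions:
        return []
    # Assumption: keep at least one region, truncate the fractional part.
    count = max(1, int(len(regions) * sampling_rate))
    start = (len(regions) - count) // 2
    return regions[start:start + count]


contigs = candidate_contigs(["chr1", "chr2", "chrM"], ["chr2", "chr3"], ["chrM"])
sampled = middle_subset(list(range(10)), 0.5)
```

Mean coverage would then be estimated only from reads in the sampled regions, which is where the commit message's allele counter comes in; that step is not modeled here.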

Commit:e08da3e
Author:lucasbrambrink
Committer:pichuan

Adds `--skip_pileup_image_generation` to `make_examples` that determines if pileup images are also generated when generating small model examples. PiperOrigin-RevId: 641053022

Commit:0b8f6da
Author:pichuan
Committer:pichuan

Add optional flags: exclude_variants_vcf_filename and exclude_variants_af_threshold. PiperOrigin-RevId: 638788722

Commit:5a36126
Author:danielecook
Committer:pichuan

Add flag for downsampling training classes. PiperOrigin-RevId: 640646334

Commit:76f727c
Author:shafin
Committer:pichuan

Internal change PiperOrigin-RevId: 636994465

Commit:64d3775
Author:koles
Committer:pichuan

make_example flags to support fast pipeline. PiperOrigin-RevId: 634539441

Commit:42ba475
Author:lucasbrambrink
Committer:pichuan

Integrates small model variant calling with make_examples_core PiperOrigin-RevId: 625487979

Commit:130b6bc
Author:pichuan
Committer:pichuan

Use the channels_enum_to_blank sample option in pangenome-aware implementation. PiperOrigin-RevId: 625533161

Commit:b0a1752
Author:lucasbrambrink
Committer:pichuan

Adds small model example generation code. PiperOrigin-RevId: 621262627

Commit:e154980
Author:gjorge
Committer:pichuan

Small documentation fixes. PiperOrigin-RevId: 618323356

Commit:b693bbf
Author:danielecook
Committer:pichuan

Update DeepSomatic options proto. PiperOrigin-RevId: 613625838

Commit:d437bc8
Author:koles
Committer:pichuan

Add --trim_reads_for_pileup flag. PiperOrigin-RevId: 610442006

Commit:09193d8
Author:mobinasri
Committer:pichuan

Pangenome aware calling PiperOrigin-RevId: 612593727

Commit:0eba6d0
Author:pichuan
Committer:pichuan

In call_variants, if --include_debug_info is true, we'll incorporate the functionality in `vis.curate_pileup` to add extra annotations to the images. PiperOrigin-RevId: 603201809

Commit:3fc2539
Author:msamman
Committer:pichuan

remove base channels from pileup_image_native.cc PiperOrigin-RevId: 603467957

Commit:83b69fe
Author:juancmier
Committer:Copybara-Service

Add mean coverage estimation per sample.
* Gets the total possible regions using `UNION(RefContigs, SampleContigs) - ExcludeContigs`
* Uses sampling rate to get a subset of total possible regions consecutively in the middle of the possible regions. This has been tested empirically to be fast and accurate but is subject to change.
* Use allele counter to process reads in each region. Allele counter already performs the logic to account for CIGAR operations and counts reads at specific positions.
PiperOrigin-RevId: 645466191

Commit:8a36d86
Author:juancmier
Committer:Copybara-Service

This CL adds a new channel to DeepVariant that contains the average coverage of the reads aligned to any given position or base in the genome. PiperOrigin-RevId: 643492127

Commit:68e429e
Author:lucasbrambrink
Committer:Copybara-Service

Adds `--skip_pileup_image_generation` to `make_examples` that determines if pileup images are also generated when generating small model examples. PiperOrigin-RevId: 641053022

Commit:48e95c6
Author:danielecook
Committer:Copybara-Service

Add flag for downsampling training classes. PiperOrigin-RevId: 640646334

Commit:d96c5a8
Author:pichuan
Committer:Copybara-Service

Add optional flags: exclude_variants_vcf_filename and exclude_variants_af_threshold. PiperOrigin-RevId: 638788722

Commit:2c68cfe
Author:shafin
Committer:Copybara-Service

Internal change PiperOrigin-RevId: 636994465

Commit:40a8cba
Author:koles
Committer:Copybara-Service

make_example flags to support fast pipeline. PiperOrigin-RevId: 634539441

Commit:87ad5fc
Author:pichuan
Committer:Copybara-Service

Use the channels_enum_to_blank sample option in pangenome-aware implementation. PiperOrigin-RevId: 625533161

Commit:aacaec8
Author:lucasbrambrink
Committer:Copybara-Service

Integrates small model variant calling with make_examples_core PiperOrigin-RevId: 625487979

Commit:6991c56
Author:lucasbrambrink
Committer:Copybara-Service

Adds small model example generation code. PiperOrigin-RevId: 621262627

Commit:f91cc35
Author:gjorge
Committer:Copybara-Service

Small documentation fixes. PiperOrigin-RevId: 618323356

Commit:03df409
Author:pichuan
Committer:kishwarshafin

Resolve a trivial conflict in deepvariant.proto

Commit:e56cfa2
Author:danielecook
Committer:Copybara-Service

Update DeepSomatic options proto. PiperOrigin-RevId: 613625838

Commit:7f0f6ff
Author:mobinasri
Committer:Copybara-Service

Pangenome aware calling PiperOrigin-RevId: 612593727

Commit:7a9df63
Author:koles
Committer:Copybara-Service

Add --trim_reads_for_pileup flag. PiperOrigin-RevId: 610442006

Commit:d8611fd
Author:msamman
Committer:Copybara-Service

remove base channels from pileup_image_native.cc PiperOrigin-RevId: 603467957

Commit:cc11840
Author:pichuan
Committer:Copybara-Service

In call_variants, if --include_debug_info is true, we'll incorporate the functionality in `vis.curate_pileup` to add extra annotations to the images. PiperOrigin-RevId: 603201809

Commit:1a97858
Author:yuchenzz
Committer:Pi-Chuan Chang

Add a new field in DebugInfo, so we can later add activation layers outputs. PiperOrigin-RevId: 600887239

Commit:0afce8c
Author:pichuan
Committer:Pi-Chuan Chang

Optionally set `deterministic` for SerializedToString, which can help the testdata creation more deterministic. Add a flag to set this value, but default is False (no-op). PiperOrigin-RevId: 579340935

Commit:8e0dc61
Author:yuchenzz
Committer:Copybara-Service

Add a new field in DebugInfo, so we can later add activation layers outputs. PiperOrigin-RevId: 600887239

Commit:dbd39b5
Author:pichuan
Committer:Copybara-Service

Optionally set `deterministic` for SerializedToString, which can help the testdata creation more deterministic. Add a flag to set this value, but default is False (no-op). PiperOrigin-RevId: 579340935

Commit:28c2a0a
Author:akiraly
Committer:kishwarshafin

Add encoded_image in DebugInfo. PiperOrigin-RevId: 571982144

Commit:a25a3d1
Author:akiraly
Committer:Copybara-Service

Add encoded_image in DebugInfo. PiperOrigin-RevId: 571982144

Commit:493b9d5
Author:shafin
Committer:Pi-Chuan Chang

Add `de_novo` label in deepvariant examples for training with `sample_weight`. PiperOrigin-RevId: 544565682

Commit:ae6b749
Author:shafin
Committer:Copybara-Service

Add `de_novo` label in deepvariant examples for training with `sample_weight`. PiperOrigin-RevId: 544565682

Commit:30cac7f
Author:danielecook
Committer:Pi-Chuan Chang

Add a `--output_sitelist` flag to DeepVariant. PiperOrigin-RevId: 528879605

Commit:2eff181
Author:koles
Committer:Pi-Chuan Chang

Exclude non-DNA regions larger than 300,000 bases. PiperOrigin-RevId: 520783371

Commit:4a35682
Author:pichuan
Committer:Pi-Chuan Chang

[somatic] Add `max_fraction_snps_for_non_target_sample` and `max_fraction_indels_for_non_target_sample` which is used to remove variants that have higher AF from the normal (non-target in the somatic context) sample. PiperOrigin-RevId: 520666963

Commit:7f574af
Author:pichuan
Committer:Pi-Chuan Chang

For make_examples_somatic calling mode, we don't need to generate examples for _normal. `make_examples_somatic` uses multisample code, which previously will always generate examples for each of the sample. In this change, we introduce an option to skip examples and other outputs for specified samples. And, if only one sample is generating output, the code will skip adding the _sample suffix completely. PiperOrigin-RevId: 524380775

Commit:82a5837
Author:pichuan
Committer:Pi-Chuan Chang

Limit reads per region depending on dynamic number of bases covered. PiperOrigin-RevId: 527439333

Commit:4e0d0a0
Author:koles
Committer:Pi-Chuan Chang

Add optional phasing info output. PiperOrigin-RevId: 520372805

Commit:49d3eb2
Author:danielecook
Committer:Copybara-Service

Add a `--output_sitelist` flag to DeepVariant. PiperOrigin-RevId: 528879605

Commit:11e4fb6
Author:pichuan
Committer:Copybara-Service

Limit reads per region depending on dynamic number of bases covered. PiperOrigin-RevId: 527439333

Commit:e39891b
Author:pichuan
Committer:Copybara-Service

For make_examples_somatic calling mode, we don't need to generate examples for _normal. `make_examples_somatic` uses multisample code, which previously will always generate examples for each of the sample. In this change, we introduce an option to skip examples and other outputs for specified samples. And, if only one sample is generating output, the code will skip adding the _sample suffix completely. PiperOrigin-RevId: 524380775

Commit:ed415a5
Author:koles
Committer:Copybara-Service

Exclude non-DNA regions larger than 300,000 bases. PiperOrigin-RevId: 520783371

Commit:b52cd1c
Author:pichuan
Committer:Copybara-Service

[somatic] Add `max_fraction_snps_for_non_target_sample` and `max_fraction_indels_for_non_target_sample` which is used to remove variants that have higher AF from the normal (non-target in the somatic context) sample. PiperOrigin-RevId: 520666963

Commit:f163e64
Author:koles
Committer:Copybara-Service

Add optional phasing info output. PiperOrigin-RevId: 520372805

Commit:1c7b84a
Author:pichuan
Committer:Copybara-Service

Internal change. PiperOrigin-RevId: 508414273

Commit:4d0eb0d
Author:pichuan
Committer:Copybara-Service

Internal change. PiperOrigin-RevId: 508184474

Commit:20316b0
Author:koles
Committer:Copybara-Service

Internal change. PiperOrigin-RevId: 508127777

Commit:e03fd88
Author:pichuan
Committer:Copybara-Service

Internal change. PiperOrigin-RevId: 504968788

Commit:dcbc452
Author:pichuan
Committer:Copybara-Service

Internal change. PiperOrigin-RevId: 504412600

Commit:b7a042f
Author:pichuan
Committer:Copybara-Service

For adding the normalize_reads field to the config used for internal scripts. PiperOrigin-RevId: 503846990

Commit:538e8cb
Author:koles
Committer:Copybara-Service

Set region padding for phasing according to the region length. PiperOrigin-RevId: 503528336

Commit:aa27de1
Author:pichuan
Committer:Copybara-Service

This commit is to set up a new feature contributed by Doron Shem-Tov (@doron-st, from Ultima Genomics): for the multi-sample use case, optionally enable realignment jointly among all samples. Current default behavior is that we realign each sample individually. PiperOrigin-RevId: 494850034

Commit:b1f1b08
Author:koles
Committer:Copybara-Service

Added option to run make_examples to generate candidate positions only. PiperOrigin-RevId: 489575282

Commit:66766e3
Author:slzarate
Committer:Copybara-Service

Fix typo in channel specification. PiperOrigin-RevId: 475619583

Commit:a60bd4d
Author:danielecook
Committer:Copybara-Service

Internal Update PiperOrigin-RevId: 473070351

Commit:1ba2f50
Author:pichuan
Committer:Copybara-Service

Add phase_max_candidates: If the number of candidates exceeds this number in a window, skip direct phasing. PiperOrigin-RevId: 449537411

Commit:ffd1516
Author:pichuan
Committer:Copybara-Service

Automated g4 rollback of changelist 404595398 PiperOrigin-RevId: 446832797

Commit:3c5bcef
Author:pichuan
Committer:Copybara-Service

Make DeepVariantChannelEnum. Add the corresponding code so that make_examples code will store the list of channel enum in each examples. Also, change the training code so that it generates a model.ckpt.input_info file based on the first example it sees. PiperOrigin-RevId: 441069906

Commit:1e331ca
Author:pichuan
Committer:Copybara-Service

Update for DeepTrio Direct Phasing. PiperOrigin-RevId: 440477898

Commit:307f99f
Author:koles
Committer:Copybara-Service

Added "reverse_haplotypes" flag to improve training using direct phasing. PiperOrigin-RevId: 437321625