Tags: samtools/bcftools

1.21

Verified

This tag was signed with the committer’s verified signature.
bcftools release 1.21:

Changes affecting the whole of bcftools, or multiple commands:

* Support multiple semicolon-separated strings when filtering by ID
  using -i/-e (#2190).
  For example, `-i 'ID="rs123"'` now correctly matches `rs123;rs456`
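The matching rule above can be modelled with a minimal Python sketch. It assumes the ID column is split on semicolons and compared by exact string equality; the function name `id_matches` is hypothetical and is not part of bcftools.

```python
def id_matches(id_column: str, wanted: str) -> bool:
    """Return True if `wanted` equals any semicolon-separated ID in the column."""
    # A missing ID is written as "." in VCF; this sketch treats it as matching nothing.
    if id_column == ".":
        return False
    # Exact comparison against each element, so "rs123" does not match "rs1234".
    return wanted in id_column.split(";")
```

Under these assumptions `id_matches("rs123;rs456", "rs123")` is true, while a plain whole-string comparison would have been false.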

* The filtering expression ILEN can be positive (insertion), negative
  (deletion), zero (balanced substitutions), or set to missing value
  (symbolic alleles).
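As an illustration of the four ILEN cases listed above, here is a small Python model. It assumes ILEN is the length difference between ALT and REF and that any allele written in angle brackets counts as symbolic; the helper `ilen` is hypothetical and only mirrors the description, not the bcftools source.

```python
from typing import Optional


def ilen(ref: str, alt: str) -> Optional[int]:
    """Length difference between ALT and REF, or None for symbolic alleles."""
    # Symbolic alleles such as <DEL> or <*> carry no sequence, so the value is missing.
    if alt.startswith("<") and alt.endswith(">"):
        return None
    # Positive for insertions, negative for deletions, zero for balanced substitutions.
    return len(alt) - len(ref)
```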

* bcftools query
* bcftools +split-vep

    - The column indices printed by default with `-H` (e.g.,
      "#[1]CHROM") can now be suppressed by giving the option
      twice, `-HH` (#2152)

Changes affecting specific commands:

* bcftools annotate

    - Support dynamic variables read from a tab-delimited annotation
      file (#2151). For example, in the two cases below the field
      'STR' from the -a file is required to match the INFO/TAG in
      VCF. In the first example the alleles REF,ALT must match, in
      the second example they are ignored. The option -k is required
      to output also records that were not annotated:

       bcftools annotate -a ann.tsv.gz \
          -c CHROM,POS,REF,ALT,SCORE,~STR -i'TAG={STR}' -k in.vcf

       bcftools annotate -a ann.tsv.gz \
          -c CHROM,POS,-,-,SCORE,~STR     -i'TAG={STR}' -k in.vcf

    - When adding Type=String annotations from a tab-delimited file,
      encode characters with special meaning using percent encoding
      (';', '=' in INFO and ':' in FORMAT) (#2202)
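The percent encoding mentioned in the last item can be sketched in Python as follows. Only the characters named in the note are encoded here (';' and '=' for INFO, ':' for FORMAT); whether bcftools encodes further characters is not stated above, so the `SPECIAL` table and the function name `encode_value` are assumptions made for illustration.

```python
# Characters named in the release note; the complete set used by bcftools is an assumption.
SPECIAL = {"INFO": ";=", "FORMAT": ":"}


def encode_value(value: str, column: str) -> str:
    """Percent-encode the characters that are special in the given column."""
    out = []
    for ch in value:
        if ch in SPECIAL[column]:
            # Two-digit uppercase hex code, e.g. ';' becomes %3B.
            out.append("%{:02X}".format(ord(ch)))
        else:
            out.append(ch)
    return "".join(out)
```

A string such as `a=b;c` destined for an INFO tag would come out as `a%3Db%3Bc` in this model.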

* bcftools consensus

    - Allow applying a reference allele which overlaps a previous
      deletion; there is no need to complain about overlapping
      alleles in such a case

    - Fix a bug which required `-s -` to be present even when there
      were no samples in the VCF (#2260)

* bcftools csq

    - Fix a rare bug where indel combined with a substitution ending
      at exon boundary is incorrectly predicted to have 'inframe'
      rather than 'frameshift' consequence (#2212)

* bcftools gtcheck

    - Fix a segfault with --no-HWE-prob. The bug was introduced with
      the output format change in 1.19 which replaced the DC section
      with DCv2 (#2180)

    - The number of matching genotypes in the DCv2 output was not
      calculated correctly with non-zero `-E, --error-probability`.
      Consequently, also the average HWE score was incorrect. The
      main output, the discordance score, was not affected by the bug

* bcftools +mendelian2

    - Include the number of good cases where at least one of the trio
      genotypes has an alternate allele (#2204)

    - Fix the error message which would report the wrong sample when
      a non-existent sample is given. Note that the bug only affected
      the error message, the program otherwise assigns the family
      members correctly (#2242)

* bcftools merge

    - Fix a severe bug in merging of FORMAT fields with Number=R and
      Number=A values. For example, rows with high-coverage FORMAT/AD
      values (greater than or equal to 128) could have been assigned
      to incorrect samples. The bug was introduced in version 1.19.
      For details see #2244.

* bcftools mpileup

    - Return non-zero error code when the input BAM/CRAM file is
      truncated (#2177)

    - Add FORMAT/AD annotation by default, disable with `-a -AD`

* bcftools norm

    - Support realignment of symbolic <DUP.*> alleles, similarly to
      <DEL.*> added previously (#1919,#2145)

    - Fix in reporting reference allele genotypes with
      `--multi-overlaps .` (#2160)

    - Support of duplicate removal of symbolic alleles of the same
      type but different SVLEN (#2182)

    - New `-S, --sort` switch to optionally sort output records by
      allele (#1484)

    - Add the `-i/-e` filtering options to select records for
      normalization. Note duplicate removal ignores this option.

    - Fix a bug where `--atomize` would not fill GT alleles for
      atomized SNVs followed by an indel (#2239)

* bcftools +remove-overlaps

    - Revamp the program to allow greater flexibility, with the
      following new options:

      -M, --mark-tag TAG   Mark -m sites with INFO/TAG

      -m, --mark EXPR      Mark (if also -M is present)
                           or remove sites [overlap]

            dup       .. all overlapping sites
            overlap   .. overlapping sites
            min(QUAL) .. mark sites with lowest QUAL until overlaps
                         are resolved

      --missing EXPR       Value to use for missing tags
                           with -m 'min(QUAL)'

            0   .. the default
            DP  .. heuristics, scale maximum QUAL value proportionally
                   to INFO/DP

      --reverse            Apply the reverse logic, for example
                           preserve duplicates instead of removing

      -O, --output-type t  t: plain list of sites (chr,pos),
                           tz: compressed list

* bcftools +tag2tag

    - Fix specific conversions such as --LAD-to-AD, which were not
      working even though the generic --LXX-to-XX and --XX-to-LXX
      conversions were.

    - Print a more informative error message when the source tag type
      violates the VCF specification

* bcftools +trio-dnm2

    - Better handling of the --strictly-novel functionality,
      especially with respect to chrX inheritance

1.20

Verified

This tag was signed with the committer’s verified signature.
bcftools release 1.20:

Changes affecting the whole of bcftools, or multiple commands:

* Add short option -W for --write-index. The option now accepts an
  optional parameter which allows to choose between TBI and CSI index
  format.

Changes affecting specific commands:

* bcftools consensus

    - Add new --regions-overlap option which allows to take into
      account overlapping deletions that start out of the fasta file
      target region.

* bcftools isec

    - Add new option `-l, --file-list` to read the list of file names
      from a file

* bcftools merge

    - Add new option `--force-single` to support single-file edge
      case (#2100)

* bcftools mpileup

    - Add new option --indels-cns for an alternative indel calling
      model, which should increase the speed on long read data
      (thanks to using edlib) and the precision (thanks to a number
      of heuristics).

* bcftools norm

    - Change the order of atomization and multiallelic splitting
      (when both -a,-m are given) from "atomize first, then split"
      to "split first, then atomize". This usually results in a
      simpler VCF representation. The previous behaviour can be
      achieved by explicitly streaming the output of the --atomize
      command into the --multiallelics splitting command.

    - Fix Type=String multiallelic splitting for Number=A,R,G tags
      with incorrect number of values.

    - Merging into multiallelic sites with `bcftools norm -m +indels`
      did not work. This is now fixed and the merging is now more
      strict about variant types, for example complex events, such as
      AC>TGA, are not considered as indels anymore (#2084)

* bcftools reheader

    - Allow reading the input file from a stream with --fai (#2088)

* bcftools +setGT

    - Support custom genotypes based on the allele with higher
      depth, for example `--new-gt c:0/X` (#2065)

* bcftools +split-vep

    - When only one of the tags is present, automatically choose
      INFO/BCSQ (the default tag name produced by `bcftools csq`)
      or INFO/CSQ (produced by VEP). When both tags are present,
      use the default INFO/CSQ.

    - Transcript selection by MANE, PICK, and user-defined
      transcripts, for example

       --select CANONICAL=YES
       --select MANE_SELECT!=""
       --select PolyPhen~probably_damaging

    - Select all matching transcripts via --select, not just one

    - Change automatic type parsing of VEP fields cDNA_position,
      CDS_position, and Protein_position from Integer to String,
      as they can be of the form "8586-8599/9231". The type Integer
      can still be enforced with
      `-c cDNA_position:int,CDS_position:int,Protein_position:int`.

    - Recognize `-c field:str`, not just `-c field:string`, as
      advertised in the usage page

    - Fix a bug which made filtering expression containing missing
      values crash (#2098)

* bcftools stats

    - When GT is missing but AD is present, the program determines
      the alternate allele from AD. However, if the AD tag has
      incorrect number of values, the program would exit with an
      error printing "Requested allele outside valid range". This
      is now fixed by taking into account the actual number of ALT
      alleles.

* bcftools +tag2tag

    - Support for conversion from tags using localized alleles (e.g.
      LPL, LAD) to the family of standard tags (PL, AD)

* bcftools +trio-dnm2

    - Extend --strictly-novel to exclude cases where the
      non-Mendelian allele is the reference allele. The change is
      motivated by the observation that this class of variants is
      enriched for errors (especially for indels), and better
      corresponds with the option name.

1.19

Verified

This tag was signed with the committer’s verified signature.
bcftools release 1.19:

Changes affecting the whole of bcftools, or multiple commands:

* Filtering expressions can be given a file with list of strings to
  match, this was previously possible only for the ID column. For
  example

  ID=@file            .. selects lines with ID present in the file
  INFO/TAG=@file.txt  .. selects lines where TAG has a string value
                         listed in the file
  INFO/TAG!=@file.txt .. TAG must not have a string
                         value listed in the file

* Allow to query REF,ALT columns directly, for example

    -e 'REF="N"'

Changes affecting specific commands:

* bcftools annotate

    - Fix `bcftools annotate --mark-sites`, VCF sites overlapping
      regions in a BED file were not annotated (#1989)

    - Add flexibility to FILTER column transfers and allow
      transfers within the same file, across files, and in
      combination. For examples see
      http://samtools.github.io/bcftools/howtos/annotate.html#transfer_filter_to_info

* bcftools call

    - Output MIN_DP rather than MinDP in gVCF mode

    - New `-*, --keep-unseen-allele` option to output the unobserved
      allele <*>, intended for gVCF.

* bcftools head

    - New `-s, --samples` option to include the #CHROM header line
      with samples.

* bcftools gtcheck

    - Add output options `-o, --output` and `-O, --output-type`

    - Add filtering options `-i, --include` and `-e, --exclude`

    - Rename the short option `-e, --error-probability` from lower
      case to upper case `-E, --error-probability`

    - Changes to the output format, replace the DC section with DCv2:

        - adds a new column for the number of matching genotypes

        - The --error-probability is newly interpreted as the
          probability of erroneous allele rather than genotype. In
          other words, the calculation of the discordance score
          now considers the probability of genotyping error to be
          different for HOM and HET genotypes, i.e. P(0/1|dsg=0) >
          P(1/1|dsg=0).

        - fixes in HWE score calculation plus output average HWE
          score rather than absolute HWE score

        - better description of fields

* bcftools merge

    - Add `-m` modifiers to suppress the output of the unseen allele
      <*> or <NON_REF> at variant sites (e.g. `-m both,*`) or all
      sites (e.g. `-m both,**`)

* bcftools mpileup

    - Output MIN_DP rather than MinDP in gVCF mode

* bcftools norm

    - Add the number of joined lines to the summary output, for
      example

      Lines   total/split/joined/realigned/skipped:  6/0/3/0/0

    - Allow combining -m and -a with --old-rec-tag (#2020)

    - Symbolic <DEL> alleles caused norm to expand REF to the full
      length of the deletion. This was not intended and problematic
      for long deletions, the REF allele should list one base only
      (#2029)

* bcftools query

    - Add new `-N, --disable-automatic-newline` option to restore the
      pre-1.18 query formatting behavior, in which a missing newline
      was not added automatically

    - Make the automatic addition of the newline character more
      predictable and, when missing, always put it at the end of
      the expression. In version 1.18 it could be added at the end of
      the expression (for per-site expressions) or inside the square
      brackets (for per-sample expressions). The new behavior is:

        - if the formatting expression contains a newline character,
          do nothing

        - if there is no newline character and -N,
          --disable-automatic-newline is given, do nothing

        - if there is no newline character and -N is not given,
          insert newline at the end of the expression

      See #1969 for details
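The three newline rules can be written down as a short Python sketch. It assumes that "contains a newline character" means the two-character escape `\n` as typed in the formatting expression; the function `finalize_format` and its parameter name are hypothetical.

```python
def finalize_format(expr: str, disable_automatic_newline: bool = False) -> str:
    """Apply the three newline rules to a query formatting expression."""
    newline = "\\n"  # the two-character escape as typed on the command line
    if newline in expr:
        return expr  # rule 1: the expression already has a newline
    if disable_automatic_newline:
        return expr  # rule 2: -N was given, leave the expression alone
    return expr + newline  # rule 3: append at the very end of the expression
```

In this model a per-sample expression such as `[%SAMPLE=%GT ]` gets the newline after the closing bracket, not inside it, which is the difference from 1.18 described above.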

    - Add new `-F, --print-filtered` option to output a default
      string for samples that would otherwise be filtered by
      `-i/-e` expressions.

    - Include sample name in the output header with `-H` whenever it
      makes sense (#1992)

* bcftools +split-vep

    - Fix on the fly filtering involving numeric subfields, e.g. `-i
      'MAX_AF<0.001'` (#2039)

    - Interpret default column type names (--columns-types) as entire
      strings, rather than substrings to avoid unexpected spurious
      matches (i.e. internally add ^ and $ to all field names)
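The effect of anchoring type names with ^ and $ can be illustrated in Python, where `re.fullmatch` behaves like wrapping the pattern in `^...$`. The lookup table, the fallback type and the function `type_for_field` are assumptions made for the example and do not reproduce the plugin's internals.

```python
import re


def type_for_field(field: str, type_patterns: dict) -> str:
    """Look up a column type, matching each pattern against the whole field name."""
    for pattern, col_type in type_patterns.items():
        # fullmatch requires the entire name to match, unlike a substring search.
        if re.fullmatch(pattern, field):
            return col_type
    return "String"  # assumed fallback when nothing matches


patterns = {"AF": "Float"}  # hypothetical type table
```

With a substring search, the pattern `AF` would also hit a field named `MAX_AF_POPS`; with whole-string matching it does not.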

* bcftools +trio-dnm2

    - Do not flag paternal genotyping errors as de novo mutations.
      Specifically, when father's chrX genotype is 0/1 and mother's
      0/0, 0/1 in the child will not be marked as DNM.

* bcftools view

    - Add new `-A, --trim-unseen-allele` option to remove the unseen
      allele <*> or <NON_REF> at variant sites (`-A`) or all sites
      (`-AA`)

1.18

Verified

This tag was signed with the committer’s verified signature.
bcftools release 1.18:

Changes affecting the whole of bcftools, or multiple commands:

* Support auto indexing during writing BCF and VCF.gz via new
  `--write-index` option

Changes affecting specific commands:

* bcftools annotate

    - The `-m, --mark-sites` option can be now used to mark all sites
      without the need to provide the `-a` file (#1861)

    - Fix a bug where the `-m` function did not respect the
      `--min-overlap` option (#1869)

    - Fix a bug where an update of INFO/END resulted in an assertion
      error (#1957)

* bcftools concat

    - New option `--drop-genotypes`

* bcftools consensus

    - Support higher-ploidy genotypes with `-H, --haplotype` (#1892)

    - Allow `--mark-ins` and `--mark-snv` with a character, similarly
      to `--mark-del`

* bcftools convert

    - Support for conversion from tab-delimited files
      (CHROM,POS,REF,ALT) to sites-only VCFs

* bcftools csq

    - New `--unify-chr-names` option to automatically unify different
      chromosome naming conventions in the input GFF, fasta and VCF
      files (e.g. "chrX" vs "X")

    - More versatility in parsing various flavors of GFF

    - A new `--dump-gff` option to help with debugging and
      investigating the internals of GFF parsing

    - When printing consequences in nonsense mediated decay
      transcripts, include 'NMD_transcript' in the consequence part
      of the annotation. This is to make filtering easier and
      analogous to VEP annotations. For example the consequence
      annotation 3_prime_utr|PCGF3|ENST00000430644|NMD is newly
      printed as 3_prime_utr&NMD_transcript|PCGF3|ENST00000430644|NMD

* bcftools gtcheck

    - Add stats for the number of sites matched in the GT-vs-GT,
      GT-vs-PL, etc modes. This information is important for
      interpretation of the discordance score, as only the
      GT-vs-GT matching can be interpreted as the number of
      mismatching genotypes.

* bcftools +mendelian2

    - Fix in command line argument parsing, the `-p` and `-P` options
      were not functioning (#1906)

* bcftools merge

    - New `-M, --missing-rules` option to control the behavior of
      merging of vector tags to prevent mixtures of known and missing
      values in tags when desired

    - Use values pertaining to the unknown allele (<*> or <NON_REF>)
      when available to prevent mixtures of known and missing values
      (#1888)

    - Revamped line matching code to fix problems in gVCF merging
      where split gVCF blocks would not update genotypes (#1891,
      #1164).

* bcftools mpileup

    - Fix a bug in --indels-2.0 which caused an endless loop when
      CIGAR operator 'H' or 'P' was encountered

* bcftools norm

    - The `-m, --multiallelics +` mode now preserves phasing (#1893)

    - Symbolic <DEL.*> alleles are now normalized too (#1919)

    - New `-g, --gff-annot` option to right-align indels in forward
      transcripts to follow HGVS 3'rule (#1929)

* bcftools query

    - Force newline character in formatting expression when not given
      explicitly

    - Fix `-H` header output in formatting expressions containing
      newlines

* bcftools reheader

    - Make `-f, --fai` aware of long contigs not representable by
      32-bit integer (#1959)

* bcftools +split-vep

    - Prevent a segfault when `-i/-e` use a VEP subfield not included
      in `-f` or `-c` (#1877)

    - New `-X, --keep-sites` option complementing the existing `-x,
      --drop-sites` options

    - Force newline character in formatting expression when not given
      explicitly

    - Fix a subtle ambiguity: identical rows must be returned when
      `-s` is applied regardless of `-f` containing the `-a` VEP tag
      itself or not.

* bcftools stats

    - Collect new VAF (variant allele frequency) statistics from
      FORMAT/AD field

    - When counting transitions/transversions, consider also
      alternate het genotypes

* plot-vcfstats

    - Add three new VAF plots

1.17

Verified

This tag was signed with the committer’s verified signature.
bcftools release 1.17:

Changes affecting the whole of bcftools, or multiple commands:

* The -i/-e filtering expressions

    - Error checks were added to prevent incorrect use of vector
      arithmetics. For example, when evaluating the sum of two
      vectors A and B, the resulting vector could contain nonsense
      values when the input vectors were not of the same length.
      The fix introduces the following logic:

        - evaluate to C_i = A_i + B_i when length(A)==length(B) and
          set length(C)=length(A)

        - evaluate to C_i = A_i + B_0 when length(B)=1 and set
          length(C)=length(A)

        - evaluate to C_i = A_0 + B_i when length(A)=1 and set
          length(C)=length(B)

        - throw an error when length(A)!=length(B) AND length(A)!=1
          AND length(B)!=1
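The four cases above can be sketched in Python as follows. This is only a model of the stated logic, using plain lists for the vectors; `vec_add` is a hypothetical name and the error type is an assumption.

```python
from typing import List


def vec_add(a: List[float], b: List[float]) -> List[float]:
    """Element-wise sum following the length rules listed above."""
    if len(a) == len(b):
        return [x + y for x, y in zip(a, b)]  # C_i = A_i + B_i
    if len(b) == 1:
        return [x + b[0] for x in a]          # C_i = A_i + B_0
    if len(a) == 1:
        return [a[0] + y for y in b]          # C_i = A_0 + B_i
    # Lengths differ and neither vector is a scalar: refuse to guess.
    raise ValueError("vectors have incompatible lengths")
```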

    - Arrays in Number=R tags can be now subscripted by alleles found
      in FORMAT/GT. For example,

        FORMAT/AD[GT] > 10        .. require support of more than 10 reads
                                     for each allele
        FORMAT/AD[0:GT] > 10      .. same as above, but in the first sample
        sSUM(FORMAT/AD[GT]) > 20  .. require total sample depth bigger than 20
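To make the GT subscript concrete, the following Python sketch picks the AD values that belong to the alleles named in a sample's genotype. It assumes AD is ordered REF, ALT1, ALT2 and so on (as for any Number=R tag) and that missing alleles are skipped; how bcftools treats repeated alleles in homozygous genotypes is not covered by the note, so the helper `ad_by_gt` is purely illustrative.

```python
import re
from typing import List


def ad_by_gt(gt: str, ad: List[int]) -> List[int]:
    """Return the AD values of the alleles present in GT, skipping missing ones."""
    # Split on either the unphased (/) or phased (|) separator.
    alleles = [a for a in re.split(r"[/|]", gt) if a != "."]
    return [ad[int(a)] for a in alleles]
```

For a sample with GT `0/1` and AD `12,8,3`, the values compared against the threshold would be 12 and 8 in this model.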

* The commands `consensus -H` and `+split-vep -H`

    - Drop unnecessary leading space in the first header column
      and newly print `#[1]columnName` instead of the previous
      `# [1]columnName` (#1856)

Changes affecting specific commands:

* bcftools +allele-length

    - Fix overflow for indels longer than 512bp and aggregate alleles
      equal or larger than that in the same bin (#1837)

* bcftools annotate

    - Support sample reordering of annotation file (#1785)

    - Restore lost functionality of the --pair-logic option (#1808)

* bcftools call

    - Fix a bug where too many alleles passed to `-C alleles` via
      `-T` caused memory corruption (#1790)

    - Fix a bug where indels constrained with `-C alleles -T` would
      sometimes be missed (#1706)

* bcftools consensus

    - BREAKING CHANGE: the option `-I, --iupac-codes` newly outputs
      IUPAC codes based on FORMAT/GT of all samples. The `-s,
      --samples` and `-S, --samples-file` options can be used to
      subset samples. In order to ignore samples and consider only
      the REF and ALT columns (the original behavior prior to
      1.17), run with `-s -` (#1828)

* bcftools convert

    - Make variantkey conversion work for sites without an ALT allele
      (#1806)

* bcftools csq

    - Fix a bug where a MNV with multiple consequences (e.g. missense
      + stop_gained) would report only the less severe one (#1810)

    - GFF file parsing was made slightly more flexible, newly ids can
      be just 'XXX' rather than, for example, 'gene:XXX'

    - New gff2gff perl script to fix GFF formatting differences

* bcftools +fill-tags

    - More of the available annotations are now added by the `-t all`
      option

* bcftools +fixref

    - New INFO/FIXREF annotation

    - New -m swap mode

* bcftools +mendelian

    - The +mendelian plugin has been deprecated and replaced with
      +mendelian2. The function of the plugin is the same but the
      command line options and the output format have changed, and
      for this reason it was introduced as a new plugin.

* bcftools mpileup

    - Most of the annotations generated by mpileup are now optional
      via the `-a, --annotate` option, and several new (mostly
      experimental) annotations were added.

    - New option `--indels-2.0` for an EXPERIMENTAL indel calling
      model. This model aims to address some known deficiencies of
      the current indel calling algorithm, specifically, it uses
      diploid reference consensus sequence. Note that in the current
      version it has the potential to increase sensitivity but at
      the cost of decreased specificity.

    - Make the FS annotation (Fisher exact test strand bias)
      functional and remove it from the default annotations

* bcftools norm

    - New --multi-overlaps option allows to set overlapping alleles
      either to the ref allele (the current default) or to a missing
      allele (#1764 and #1802)

    - Fixed a bug in `-m -` which did not split missing FORMAT
      values correctly and could lead to empty FORMAT fields such
      as `::` instead of the correct `:.:` (#1818)

    - The `--atomize` option previously would not split complex
      indels such as C>GGG. Newly these will be split into two
      records C>G and C>CGG (#1832)

* bcftools query

    - Fix a rare bug where the printing of SAMPLE field with `query`
      was incorrectly suppressed when the `-e` option contained a
      sample expression while the formatting query did not. See #1783
      for details.

* bcftools +setGT

    - Add new `--new-gt X` option (#1800)

    - Add new `--target-gt r:FLOAT` option to randomly select a
      proportion of genotypes (#1850)

    - Fix a bug where `-t ./x` mode was advertised as selecting both
      phased and unphased half-missing genotypes, but was in fact
      selecting only unphased genotypes (#1844)

* bcftools +split-vep

    - New options `-g, --gene-list` and `--gene-list-fields` which
      allow to prioritize consequences from a list of genes, or
      restrict output to the listed genes

    - New `-H, --print-header` option to print the header with `-f`

    - Work around a bug in the LOFTEE VEP plugin used to annotate
      gnomAD VCFs. There the LoF_info subfield contains commas
      which, in general, makes it impossible to parse the VEP
      subfields. The +split-vep plugin can now work with such files,
      replacing the offending commas with slash (/) characters. See
      also Ensembl/ensembl-vep#1351

    - Newly the `-c, --columns` option can be omitted when a
      subfield is used in `-i/-e` filtering expression. Note that
      `-c` may still have to be given when it is not possible to
      infer the type of the subfield. Note that this is an
      experimental feature.

* bcftools stats

    - The per-sample stats (PSC) would not be computed when `-i/-e`
      filtering options and the `-s -` option were given but the
      expression did not include sample columns (#1835)

* bcftools +tag2tag

    - Revamp of the plugin to allow wider range of tag
      conversions, specifically all combinations from
      FORMAT/GL,PL,GP to FORMAT/GL,PL,GP,GT

* bcftools +trio-dnm2

    - New `-n, --strictly-novel` option to downplay alleles which
      violate Mendelian inheritance but are not novel

    - Allow to set the `--pn` and `--pns` options separately for SNVs
      and indels and make the indel settings more strict by default

    - Output missing FORMAT/VAF values in non-trio samples, rather
      than random nonsense values

* bcftools +variant-distance

    - New option `-d, --direction` to choose the directionality:
      forward, reverse, nearest (the default) or both (#1829)

1.16

Verified

This tag was signed with the committer’s verified signature.
bcftools release 1.16:

* New plugin `bcftools +variant-distance` to annotate records with
  distance to the nearest variant (#1690)

Changes affecting the whole of bcftools, or multiple commands:

* The -i/-e filtering expressions

    - Added support for querying of multiple filters, for example `-i
      'FILTER="A;B"'` can be used to select sites with two filters
      "A" and "B" set. See the documentation for more examples.

    - Added modulo arithmetic operator

Changes affecting specific commands:

* bcftools annotate

    - A bug introduced in 1.14 caused records with an INFO/END
      annotation to incorrectly trigger the `-c ~INFO/END` mode of
      comparison even when not explicitly requested, which would
      result in not transferring the annotation from a tab-delimited
      file (#1733)

* bcftools merge

    - New `-m snp-ins-del` switch to merge SNVs, insertions and
      deletions separately (#1704)

* bcftools mpileup

    - New NMBZ annotation for Mann-Whitney U-z test on number of
      mismatches within supporting reads

    - Suppress the output of MQSBZ and FS annotations in absence of
      alternate allele

* bcftools +scatter

    - Fix erroneous addition of duplicate PG lines

* bcftools +setGT

    - Custom genotypes (e.g. `-n c:1/1`) now correctly override
      ploidy

1.15.1

Verified

This tag was signed with the committer’s verified signature.
bcftools release 1.15.1:

* bcftools annotate

    - New `-H, --header-line` convenience option to pass a header
      line on command line, this complements the existing `-h,
      --header-lines` option which requires a file with header lines

* bcftools csq

    - A list of consequence types supported by `bcftools csq` has
      been added to the manual page. (#1671)

* bcftools +fill-tags

    - Extend generalized functions so that FORMAT tags can be filled
      as well, for example:

       bcftools +fill-tags in.bcf -o out.bcf -- \
           -t 'FORMAT/DP:1=int(smpl_sum(FORMAT/AD))'

    - Allow multiple custom functions in a single run. Previously the
      program would silently go with the last one, assigning the same
      values to all (#1684)

* bcftools norm

    - Fix an assertion failure triggered when a faulty VCF file with
      a '-' character in the REF allele was used with `bcftools norm
      --atomize`.  This option now checks that the REF allele only
      includes the allowed characters A, C, G, T and N. (#1668)

    - Fix the loss of phasing in half-missing genotypes in variant
      atomization (#1689)

* bcftools roh

    - Fix a bug that could result in an endless loop or incorrect
      AF estimate when missing genotypes are present and the
      `--estimate-AF -` option was used (#1687)

* bcftools +split-vep

    - VEP fields with characters disallowed in VCF tag names by the
      specification (such as '-' in 'M-CAP') couldn't be queried.
      This has been fixed, the program now sanitizes the field names,
      replacing invalid characters with underscore (#1686)
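The renaming described above can be approximated in Python with a single substitution. The set of characters treated as valid (letters, digits, underscore and dot) is an assumption based on the VCF tag naming rules, not taken from the plugin source, and `sanitize_field_name` is a hypothetical helper.

```python
import re


def sanitize_field_name(name: str) -> str:
    """Replace characters not allowed in a VCF tag name with an underscore."""
    # Assumed valid set: ASCII letters, digits, underscore and dot.
    return re.sub(r"[^0-9A-Za-z_.]", "_", name)
```

Under this assumption the VEP field `M-CAP` would be queried as `M_CAP`.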

1.15

Verified

This tag was signed with the committer’s verified signature.
bcftools release 1.15:

* New `bcftools head` subcommand for conveniently displaying the
  headers of a VCF or BCF file. Without any options, this is
  equivalent to `bcftools view --header-only --no-version` but
  more succinct and memorable.

* The `-T, --targets-file` option had the following bug originating
  in HTSlib code: when an uncompressed file with multiple columns
  CHR,POS,REF was provided, the REF would be interpreted as 0
  gigabases (#1598)

Changes affecting specific commands:

* bcftools annotate

    - In addition to `--rename-annots`, which requires a file with
      name mappings, it is now possible to do the same on the command
      line `-c NEW_TAG:=OLD_TAG`

    - Add new option --min-overlap which allows to specify the
      minimum required overlap of intersecting regions

    - Allow to transfer ALT from VCF with or without replacement
      using:
       bcftools annotate -a annots.vcf.gz -c ALT file.vcf.gz
       bcftools annotate -a annots.vcf.gz -c +ALT file.vcf.gz

* bcftools convert

    - Revamp of `--gensample`, `--hapsample` and `--haplegendsample`
      family of options which includes the following changes:

        - New `--3N6` option to output/input the new version of the
          .gen file format, see
          https://www.cog-genomics.org/plink/2.0/formats#gen

        - Deprecate the `--chrom` option in favor of `--3N6`. A
          simple `cut` command can be used to convert from the new
          3*M+6 column format to the format printed with `--chrom`
          (`cut -d' ' -f1,3-`).

        - The CHROM:POS_REF_ALT IDs which are used to detect strand
          swaps are required and must appear either in the "SNP ID"
          column or the "rsID" column. The column is autodetected for
          `--gensample2vcf`, can be the first or the second for
          `--hapsample2vcf` (depending on whether the `--vcf-ids` option
          is given), must be the first for `--haplegendsample2vcf`.
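For readers who prefer to do the `cut -d' ' -f1,3-` conversion inside a script, a Python equivalent is sketched below. It simply drops the second space-delimited column of each line; the function name `drop_second_column` and the placeholder column values are hypothetical.

```python
def drop_second_column(line: str) -> str:
    """Equivalent of `cut -d' ' -f1,3-` for a single line."""
    fields = line.rstrip("\n").split(" ")
    # Keep field 1, skip field 2, keep everything from field 3 onwards.
    return " ".join(fields[:1] + fields[2:])
```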

* bcftools csq

    - Allow GFF files with phase column unset

* bcftools filter

    - New `--mask`, `--mask-file` and `--mask-overlap` options to
      soft filter variants in regions (#1635)

* bcftools +fixref

    - The `-m id` option now works also for non-dbSNP ids, i.e. not
      just `rsINT`

    - New `-m flip-all` mode for flipping all sites, including
      ambiguous A/T and C/G sites

* bcftools isec

    - Prevent segfault on sites filtered with -i/-e in all files
      (#1632)

* bcftools mpileup

    - More flexible read filtering using the options:
      --ls, --skip-all-set   skip reads with all of the FLAG bits set
      --ns, --skip-any-set   skip reads with any of the FLAG bits set
      --lu, --skip-all-unset skip reads with all of the FLAG bits unset
      --nu, --skip-any-unset skip reads with any of the FLAG bits unset

      The existing synonymous options will continue to function but
      their use is discouraged:
      --rf, --incl-flags  Required flags: skip reads with mask bits unset
      --ff, --excl-flags  Filter flags: skip reads with mask bits set
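The four skip modes differ only in how the read's FLAG is compared with the mask, which the following Python sketch spells out. The mode strings and the function `skip_read` are hypothetical names for illustration, and a non-zero mask is assumed.

```python
def skip_read(flag: int, mask: int, mode: str) -> bool:
    """Decide whether a read is skipped under one of the four modes."""
    hits = flag & mask  # the mask bits that are set in the read's FLAG
    if mode == "all-set":      # --ls, --skip-all-set
        return hits == mask
    if mode == "any-set":      # --ns, --skip-any-set
        return hits != 0
    if mode == "all-unset":    # --lu, --skip-all-unset
        return hits == 0
    if mode == "any-unset":    # --nu, --skip-any-unset
        return hits != mask
    raise ValueError("unknown mode: " + mode)
```

For example, with FLAG 0x3 and mask 0x5 only one of the two mask bits is set, so the read is skipped by the "any set" and "any unset" modes but not by the two "all" modes.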

* bcftools query

    - Make the `--samples` and `--samples-file` options work also in
      the `--list-samples` mode. Add a new `--force-samples` option
      which allows to proceed even when some of the requested samples
      are not present in the VCF (#1631)

* bcftools +setGT

    - Fix a bug in `-t q -e EXPR` logic applied on FORMAT fields,
      sites with all samples failing the expression EXPR were
      incorrectly skipped. This problem affected only the use of
      `-e` logic, not the `-i` expressions (#1607)

* bcftools sort

    - make use of the TMPDIR environment variable when defined

* bcftools +trio-dnm2

    - The --use-NAIVE mode now also adds the de novo allele in
      FORMAT/VA

1.14

Verified

This tag was signed with the committer’s verified signature.
bcftools release 1.14:

Changes affecting the whole of bcftools, or multiple commands:

* New `--regions-overlap` and `--targets-overlap` options which
  address a long-standing design problem with subsetting VCF files
  by region.  BCFtools recognizes two sets of options, one for
  streaming (`-t/-T`) and one for index-jumping (`-r/-R`). They
  behave differently: the first includes only records with POS
  coordinate within the regions, the other also includes records
  overlapping the regions. The two new options allow modifying the
  default behaviour, see the man page for more details.
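
The difference between the two behaviours can be illustrated with a short sketch. It assumes 1-based inclusive coordinates and a record spanning POS to POS+len(REF)-1; the function names are hypothetical and the sketch does not model the actual values accepted by the new options.

```python
def pos_within(pos, ref_len, start, end):
    """Streaming-style test: only the POS coordinate matters."""
    return start <= pos <= end

def record_overlaps(pos, ref_len, start, end):
    """Index-jumping-style test: any base of the record may overlap."""
    rec_end = pos + ref_len - 1  # last reference base covered by the record
    return pos <= end and rec_end >= start

# A 5 bp REF allele starting at 98 covers 98..102; the region is 100..200.
print(pos_within(98, 5, 100, 200))       # POS lies before the region
print(record_overlaps(98, 5, 100, 200))  # the record reaches into the region
```

A deletion that starts just upstream of a region is therefore reported by one test and not by the other, which is the inconsistency the new options are meant to control.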

* The `--output-type` option can be used to override the default
  compression level

Changes affecting specific commands:

* bcftools annotate

    - when `--set-id` and `--remove` are combined, `--set-id` cannot
      use tags deleted by `--remove`. This is now detected and the
      program exits with an informative error message instead of
      segfaulting (#1540)

    - while non-symbolic variants are uniquely identified by
      POS,REF,ALT, symbolic alleles starting at the same position
      were indistinguishable. This prevented correct matching of
      records with the same positions and variant type but different
      length given by INFO/END (samtools/htslib@60977f2). When
      annotating from a VCF/BCF, the matching is done automatically.
      When annotating from a tab-delimited text file, this feature
      can be invoked by using `-c INFO/END`.

    - add a new '.' modifier to control whether missing values should
      be carried over from a tab-delimited file or not. For example: 

      -c TAG .. adds TAG if the source value is not missing. If TAG
      exists in the target file, it will be overwritten

      -c .TAG .. adds TAG even if the source value is missing. This
      can overwrite non-missing values with a missing value and can
      create empty VCF fields (`TAG=.`)
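
The two cases can be sketched as follows. This is only a model of the rule described above: INFO is represented as a plain dict, "." stands for a missing value, and `apply_annotation` with its `carry_missing` argument is a hypothetical helper, not part of bcftools.

```python
MISSING = "."

def apply_annotation(info, tag, source_value, carry_missing=False):
    """Return a copy of `info` after annotating `tag`.

    carry_missing=False models `-c TAG`, carry_missing=True models
    `-c .TAG` (names chosen for this sketch only).
    """
    out = dict(info)
    # A non-missing source value is always written and overwrites any
    # existing value; a missing one is written only in the `.TAG` case.
    if source_value != MISSING or carry_missing:
        out[tag] = source_value
    return out

print(apply_annotation({"TAG": "5"}, "TAG", "."))
print(apply_annotation({"TAG": "5"}, "TAG", ".", carry_missing=True))
```

The first call leaves the existing value untouched, the second replaces it with the missing value, which corresponds to the `TAG=.` case mentioned above.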

* bcftools +check-ploidy

    - by default missing genotypes are not used when determining
      ploidy. With the new option `-m, --use-missing` it is possible
      to use the information carried in the missing and half-missing
      genotypes (e.g. ".", "./." or "./1")

* bcftools concat:

    - new `--ligate-force` and `--ligate-warn` options for finer
      control of `-l, --ligate` behaviour in imperfect overlaps.
      The new default is to throw an error when sites present in
      one chunk but absent in the other are encountered. To drop
      such sites and proceed, use the new `--ligate-warn` option
      (previously this was the default). To keep such sites, use
      the new `--ligate-force` option (#1567).
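
The three behaviours can be summarised in a small sketch. It is a simplification that treats the overlap of two chunks as two lists of site positions and ignores phasing entirely; `ligate_overlap` and the mode names "error", "warn" and "force" are hypothetical labels for the default, `--ligate-warn` and `--ligate-force` respectively.

```python
def ligate_overlap(sites_a, sites_b, mode="error"):
    """Return the sites kept from the overlap of two chunks."""
    if mode not in ("error", "warn", "force"):
        raise ValueError(f"unknown mode: {mode}")
    a, b = set(sites_a), set(sites_b)
    unmatched = a ^ b  # sites present in one chunk but absent in the other
    if mode == "force":
        return sorted(a | b)  # keep the unmatched sites
    if unmatched and mode == "error":
        raise RuntimeError(f"sites present in only one chunk: {sorted(unmatched)}")
    return sorted(a & b)  # "warn" drops the unmatched sites and proceeds

print(ligate_overlap([100, 200, 300], [200, 300, 400], mode="warn"))
print(ligate_overlap([100, 200, 300], [200, 300, 400], mode="force"))
```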

* bcftools consensus:

    - Apply mask even when the VCF has no notion about the
      chromosome. It was possible to encounter this problem when
      `contig` lines were not present in the VCF header and no
      variants were called on that chromosome (#1592)

* bcftools +contrast:

    - support for chunking within map/reduce framework allowing to
      collect NASSOC counts even for empty case/control sample sets
      (#1566)

* bcftools csq:

    - bug fix, compound indels were not recognised in some cases
      (#1536)

    - compound variants were incorrectly marked as 'inframe' even
      when stop codon would occur before the frame was restored
      (#1551)

    - bug fix, FORMAT/BCSQ bitmasks could have been assigned
      incorrectly to some samples at multiallelic sites, a
      superset of the correct consequences would have been set
      (#1539)

    - bug fix, the upstream stop could be falsely assigned to all
      samples in a multi-sample VCF even if the stop was relevant for
      a single sample only (#1578)

    - further improve the detection of mismatching chromosome naming
      (e.g. "chrX" vs "X") in the GFF, VCF and fasta files

* bcftools merge:

    - keep (sum) INFO/AN,AC values when merging VCFs with no samples
      (#1394)

* bcftools mpileup:

    - new --indel-size option which allows to increase the maximum
      indel size considered, large deletions in long read data are
      otherwise lost.

* bcftools norm:

    - atomization now supports Number=A,R string annotations (#1503)

    - assign as many alternate alleles as possible to genotypes at
      multiallelic sites in the `-m +` mode, disregarding the phase.
      Previously the program assumed it was being run as the inverse
      operation of `-m -`, but when that was not the case, reference
      alleles would have been filled instead of multiple alternate
      alleles (#1542)

* bcftools sort:

    - increase accuracy of the --max-mem option limit, previously the
      limit could be exceeded by more than 20% (#1576)

* bcftools +trio-dnm:

    - new `--with-pAD` option to allow processing of VCFs without
      FORMAT/QS. The existing `--ppl` option was changed to the
      analogous `--with-pPL`

* bcftools view:

    - the functionality of the option --compression-level lost in
      1.12 has been restored

1.13

bcftools release 1.13:

This release brings new options and significant changes in BAQ
parametrization in `bcftools mpileup`. The previous behavior can
be triggered by providing the `--config 1.12` option. Please see
https://github.com/samtools/bcftools/pull/1474 for details.

Changes affecting the whole of bcftools, or multiple commands:

* Improved build system

Changes affecting specific commands:

* bcftools annotate:

    - Fix a rare bug which occurred when INFO/END is present, all
      INFO fields are removed with `bcftools annotate -x INFO`, and
      BCF output is produced. In that case the removed INFO/END
      continued to inform the end coordinate and caused incorrect
      retrieval of records with the -r option (#1483)

    - Support for matching annotation line by ID, in addition to
      CHROM,POS,REF, and ALT (#1461)

       bcftools annotate -a annots.tab.gz \
          -c CHROM,POS,~ID,REF,ALT,INFO/END input.vcf

* bcftools csq:

    - When GFF and VCF/fasta use a different chromosome naming
      convention (e.g. chrX vs X), no consequences would be added.
      Newly the program attempts to detect these differences and
      remove/add the "chr" prefix to chromosome name to match the
      GFF and VCF/fasta (#1507)
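
The idea of reconciling the two naming conventions can be sketched as below. This is an assumption about the general approach (try the name as given, then with the "chr" prefix added or removed), not the actual detection code; `match_chrom` is a hypothetical helper.

```python
def match_chrom(name, known):
    """Map `name` onto one of the chromosome names in `known`.

    Returns the matching name, or None when neither the name itself
    nor its "chr"-toggled form is known.
    """
    if name in known:
        return name
    # Toggle the prefix: "chrX" -> "X", "X" -> "chrX"
    alt = name[3:] if name.startswith("chr") else "chr" + name
    return alt if alt in known else None

print(match_chrom("chrX", {"1", "2", "X"}))
print(match_chrom("X", {"chr1", "chr2", "chrX"}))
```

Names that differ by more than the prefix (for example "MT" versus "chrM") are not reconciled by this sketch.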

    - Parametrize brief-predictions parameter to allow explicit
      number of amino acids to be printed. Note that the `-b,
      --brief-predictions` option is being replaced with `-B,
      --trim-protein-seq INT`

* bcftools +fill-tags:

    - Generalization and better support for custom functions that
      allow adding new INFO tags based on arbitrary `-i, --include`
      type of expressions. For example, to calculate a missing
      INFO/DP annotation from FORMAT/AD, it is possible to use:

       -t 'DP:1=int(sum(FORMAT/AD))'

      Here the optional ":1" part specifies that a single value will
      be added (by default Number=. is used) and the optional
      int(...) adds an integer value (by default Type=Float is used).
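
What the expression computes can be mirrored in a few lines. The sketch assumes FORMAT/AD is given as one list of allele depths per sample and that missing values are represented as None and ignored; both are assumptions of this illustration, and `info_dp_from_ad` is a hypothetical name.

```python
def info_dp_from_ad(format_ad):
    """Sum allele depths over all samples and alleles, as an integer."""
    total = sum(depth
                for sample in format_ad   # one AD vector per sample
                for depth in sample       # one depth per allele
                if depth is not None)     # skip missing values (assumption)
    return int(total)

# Three samples: REF/ALT depths 10/2 and 7/0, and one sample with missing AD
print(info_dp_from_ad([[10, 2], [7, 0], [None]]))
```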

    - When FORMAT/GT is not present, the INFO/AF tag will be newly
      calculated from INFO/AC and INFO/AN.

* bcftools gtcheck:

    - Switch between FORMAT/GT or FORMAT/PL when one is (implicitly)
      requested but only the other is available

    - Improve diagnostics, printing warnings when a line cannot be
      matched and the number of lines skipped for various reasons
      (#1444)

    - Minor bug fix, with PLs being the default, the
      `--distinctive-sites` option started to require explicit
      `--error-probability 0`

* bcftools index:

    - The program now accepts both data file name and the index
      file name. This adds to user convenience when running index
      statistics (-n, -s)

* bcftools isec:

    - Always generate sites.txt with isec -p (#1462)

* bcftools +mendelian:

    - Consider only complete trios, do not crash on sample name typos
      (#1520)

* bcftools mpileup:

    - New `--seed` option for reproducibility of subsampling code in
      HTSlib

    - The SCR annotation which shows the number of soft-clipped reads
      now correctly pools reads together regardless of the variant
      type. Previously only reads with indels were included at indel
      sites.

    - Major revamp of BAQ. Please see
      #1474 for details.
      The previous behavior can be triggered by providing the
      `--config 1.12` option.

    - Thanks to improvements in HTSlib, the removal of
      overlapping reads (which can be disabled with the `-x,
      --ignore-overlaps` option) is no longer systematically
      biased (samtools/htslib#1273)

    - Modified scale of Mann-Whitney U tests. Newly INFO/*Z
      annotations will be printed, for example MQBZ replaces MQB.

* bcftools norm:

    - Fix Type=Flag output in `norm --atomize` (#1472)

    - Atomization must not discard ALT=. records

    - Atomization of AD and QS tags now correctly updates occurrences
      of duplicate alleles within different haplotypes

    - Fix a bug in atomization of Number=A,R tags

* bcftools reheader:

    - Add `-T, --temp-prefix` option

* bcftools +setGT:

    - A wider range of genotypes can be set by the plugin by
      allowing specifying custom genotypes. For example, to force a
      heterozygous genotype it is now possible to use expressions
      like:

     c:'m|M' c:0/1 c:0

* bcftools +split-vep:

    - New `-u, --allow-undef-tags` option

    - Better handling of ambiguous keys such as INFO/AF and CSQ/AD.
      The `-p, --annot-prefix` option is now applied before doing
      anything else which allows its use with `-f, --format` and `-c,
      --columns` options.

    - Some consequence field names may not constitute a valid tag
      name, such as "pos(1-based)". Newly field names are trimmed to
      exclude brackets.

* bcftools +tag2tag:

    - New --QR-QA-to-QS option to convert annotations generated by
      Freebayes to QS used by BCFtools

* bcftools +trio-dnm:

    - Add support for sites with more than four alleles. Note that
      only the four most frequent alleles are considered, the model
      remains unchanged. Previously such sites were skipped.

    - New --use-NAIVE option for a naive DNM calling based solely on
      FORMAT/GT and expected Mendelian inheritance. This option is
      suitable for prefiltering.

    - Fix behavior to match the documentation, the `--dnm-tag DNG`
      option now correctly outputs log scaled values by default, not
      phred scaled.

    - Fix bug in VAF calculation, homozygous de novo variants were
      incorrectly reported as having VAF=50%

    - Fix arithmetic underflow which could lead to imprecise scores
      and improve sensitivity in high coverage regions

    - Allow combining --pn and --pns to set the noise thresholds
      independently