Package org.snpeff.vcf
Class VcfEntry
java.lang.Object
  org.snpeff.interval.Interval
    org.snpeff.interval.Marker
      org.snpeff.vcf.VcfEntry
All Implemented Interfaces:
java.io.Serializable, java.lang.Cloneable, java.lang.Comparable<Interval>, java.lang.Iterable<VcfGenotype>, TxtSerializable
public class VcfEntry extends Marker implements java.lang.Iterable<VcfGenotype>
A VCF entry (a line) in a VCF file.
Author:
pablocingolani
See Also:
Serialized Form
-
Nested Class Summary
static class VcfEntry.AlleleFrequencyType
-
Field Summary
static double ALLELE_FEQUENCY_COMMON
static double ALLELE_FEQUENCY_LOW
protected java.lang.String[] alts
protected java.lang.String altStr
protected java.lang.String chromosomeName
static java.lang.String[] EMPTY_STRING_ARRAY
protected java.lang.String filter
static java.lang.String FILTER_PASS
protected java.lang.String format
protected java.lang.String[] formatFields
protected java.lang.String[] genotypeFields
protected java.lang.String genotypeFieldsStr
protected byte[] genotypeScores
protected java.util.HashMap<java.lang.String,java.lang.String> info
static java.util.regex.Pattern INFO_KEY_PATTERN
protected java.lang.String infoStr
protected java.lang.String line
protected int lineNum
protected java.lang.Double quality
protected java.lang.String ref
static java.lang.String SUB_FIELD_SEP
protected java.util.LinkedList<Variant> variants
static java.lang.String VCF_ALT_MISSING_REF
static java.lang.String[] VCF_ALT_MISSING_REF_ARRAY
static java.lang.String VCF_ALT_NON_REF
static java.lang.String[] VCF_ALT_NON_REF_ARRAY
static java.lang.String VCF_ALT_NON_REF_gVCF
static java.lang.String[] VCF_ALT_NON_REF_gVCF_ARRAY
static java.lang.String VCF_INFO_END
static java.lang.String VCF_INFO_HETS
static java.lang.String VCF_INFO_HOMS
static java.lang.String VCF_INFO_NAS
static java.lang.String VCF_INFO_PRIVATE
protected java.util.List<VcfEffect> vcfEffects
protected VcfFileIterator vcfFileIterator
protected java.util.ArrayList<VcfGenotype> vcfGenotypes
static char WITHIN_FIELD_SEP
-
Fields inherited from class org.snpeff.interval.Interval
chromosomeNameOri, end, id, parent, start, strandMinus
-
-
Constructor Summary
VcfEntry(VcfFileIterator vcfFileIterator, java.lang.String line, int lineNum, boolean parseNow)
Create a line from a file iterator
VcfEntry(VcfFileIterator vcfFileIterator, Marker parent, java.lang.String chromosomeName, int start, java.lang.String id, java.lang.String ref, java.lang.String altsStr, double quality, java.lang.String filterPass, java.lang.String infoStr, java.lang.String format)
-
Method Summary
void addFilter(java.lang.String filterStr)
Add a string to the FILTER field
void addFormat(java.lang.String formatName)
Add a 'FORMAT' field
void addGenotype(java.lang.String vcfGenotypeStr)
Add a genotype as a string
void addInfo(java.lang.String key, java.lang.String value)
Add a "key=value" tuple to the INFO field
VcfEntry.AlleleFrequencyType alleleFrequencyType()
Categorization by allele frequency
java.lang.Boolean calcHetero()
Is this entry heterozygous? Infer Hom/Het if there is only one sample in the file.
java.lang.String check()
Perform several simple checks and report problems (if any).
static java.lang.String cleanUnderscores(java.lang.String s)
Return a string without leading, trailing and duplicated underscores
Cds cloneShallow()
Perform a shallow clone
boolean compressGenotypes()
Compress genotypes into "HO/HE/NA" INFO fields
boolean delFilter(java.lang.String filterStr)
Remove a string from the FILTER field
int getAltIndex(java.lang.String alt)
Get the index of the matching ALT entry
java.lang.String[] getAlts()
java.lang.String getAltsStr()
Create a comma-separated ALTs string
java.lang.String getChromosomeNameOri()
Original chromosome name (as it appeared in the VCF file)
java.lang.String getFilter()
java.lang.String getFormat()
java.lang.String[] getFormatFields()
byte[] getGenotypesScores()
Return genotypes parsed as an array of codes
java.lang.String getInfo(java.lang.String key)
Get INFO string
java.lang.String getInfo(java.lang.String key, java.lang.String allele)
Get INFO string for a specific allele
java.lang.String getInfo(java.lang.String key, Variant var)
Get an INFO field matching a variant
boolean getInfoFlag(java.lang.String key)
Does the INFO entry exist?
double getInfoFloat(java.lang.String key)
Get INFO field as a 'double' number. The VCF specification declares this data type as 'Float', which is why the method name might not be intuitive.
long getInfoInt(java.lang.String key)
Get INFO field as a 'long' number. The VCF specification declares this data type as 'Integer', which is why the method name might not be intuitive.
java.util.Set<java.lang.String> getInfoKeys()
Get all keys available in the INFO field
java.lang.String getInfoStr()
Get the full (unparsed) INFO field
java.lang.String getLine()
Original VCF line (from file)
int getLineNum()
int getNumberOfSamples()
Number of samples in this VCF file
double getQuality()
java.lang.String getRef()
java.lang.String getStr()
java.util.List<VcfEffect> getVcfEffects()
java.util.List<VcfEffect> getVcfEffects(EffFormatVersion formatVersion)
Parse the 'EFF' INFO field and get a list of effects
VcfFileIterator getVcfFileIterator()
VcfGenotype getVcfGenotype(int index)
java.util.List<VcfGenotype> getVcfGenotypes()
VcfHeaderInfo getVcfInfo(java.lang.String id)
Get the VcfInfo type for a given ID
VcfInfoType getVcfInfoNumber(java.lang.String id)
Get the INFO number for a given ID
boolean hasField(java.lang.String filedName)
boolean hasGenotypes()
boolean hasInfo(java.lang.String infoFieldName)
boolean hasQuality()
boolean isBiAllelic()
Is this bi-allelic (based ONLY on the number of ALTs)? WARNING: You should use the 'calcHetero()' method for a more precise calculation.
boolean isCompressedGenotypes()
Do we have compressed genotypes in "HO,HE,NA" INFO fields?
static boolean isEmpty(java.lang.String value)
Does 'value' represent an EMPTY / MISSING value in a VCF field (or multiple comma-separated MISSING values)?
boolean isFilterPass()
boolean isMultiallelic()
Is this multi-allelic (based ONLY on the number of ALTs)? WARNING: You should use the 'calcHetero()' method for a more precise calculation.
protected boolean isShowWarningIfParentDoesNotInclude()
Show an error if parent does not include child?
boolean isSingleSnp()
Is this a VCF entry with a single SNP?
boolean isSingleton()
Is this variant a singleton (appears only in one genotype)?
static boolean isValidInfoKey(java.lang.String key)
Make sure the INFO key matches the regular expression (as specified in VCF spec 4.3)
static boolean isValidInfoValue(java.lang.String value)
Check that this value can be added to an INFO field
boolean isVariant()
Is this a change, or are the ALTs actually the same as the reference?
boolean isVariant(java.lang.String alt)
Is this ALT string a variant?
java.util.Iterator<VcfGenotype> iterator()
int mac()
Calculate the minor allele count
double maf()
Calculate the minor allele frequency
void parse()
Parse a 'line' from a 'vcfFileIterator'
java.util.List<VcfLof> parseLof()
Parse LOF from VcfEntry
java.util.List<VcfNmd> parseNmd()
Parse NMD from VcfEntry
void removeInfo(java.lang.String key)
Remove INFO field
boolean rmInfo(java.lang.String info)
Parse INFO fields
void setFilter(java.lang.String filter)
void setFormat(java.lang.String format)
void setGenotypeStr(java.lang.String genotypeFieldsStr)
void setLineNum(int lineNum)
java.lang.String toStr()
To string as a simple "CHR:START_REF/ALTs" format
java.lang.String toString()
java.lang.String toStringNoGt()
Show only the first eight fields (no genotype entries)
VcfEntry uncompressGenotypes()
Uncompress a VCF entry having genotypes in "HO,HE,NA" fields
java.util.List<Variant> variants()
Create a list of variants from this VcfEntry
static java.lang.String vcfInfoDecode(java.lang.String str)
Decode INFO value
static java.lang.String vcfInfoEncode(java.lang.String str)
Encode a string to be used in an 'INFO' field value, using the capitalized percent encoding from the VCF 4.3 specification (for example ';' becomes %3B)
static java.lang.String vcfInfoKeySafe(java.lang.String str)
Return a string safe to be used in an 'INFO' field key
static java.lang.String vcfInfoValueSafe(java.lang.String str)
Return a string safe to be used in an 'INFO' field value
-
Methods inherited from class org.snpeff.interval.Marker
adjust, apply, applyDel, applyDup, applyIns, applyMixed, clone, codonTable, compareTo, compareToPos, distance, distanceBases, getParent, getType, idChain, idChain, idChain, includes, intersect, isAdjustIfParentDoesNotInclude, isDeferredAnalysis, minus, query, query, readTxt, serializeParse, serializeSave, shouldApply, union, variantEffect, variantEffectNonRef
-
Methods inherited from class org.snpeff.interval.Interval
equals, findParent, getChromosome, getChromosomeName, getChromosomeNum, getEnd, getGenome, getGenomeName, getId, getStart, getStrand, hashCode, intersects, intersects, intersects, intersects, intersectSize, isCircular, isSameChromo, isStrandMinus, isStrandPlus, isValid, setChromosomeNameOri, setEnd, setId, setParent, setStart, setStrandMinus, shiftCoordinates, size, toStringAsciiArt, toStrPos
-
Field Detail
-
FILTER_PASS
public static final java.lang.String FILTER_PASS
- See Also:
- Constant Field Values
-
WITHIN_FIELD_SEP
public static final char WITHIN_FIELD_SEP
- See Also:
- Constant Field Values
-
SUB_FIELD_SEP
public static final java.lang.String SUB_FIELD_SEP
- See Also:
- Constant Field Values
-
EMPTY_STRING_ARRAY
public static final java.lang.String[] EMPTY_STRING_ARRAY
-
ALLELE_FEQUENCY_COMMON
public static final double ALLELE_FEQUENCY_COMMON
- See Also:
- Constant Field Values
-
ALLELE_FEQUENCY_LOW
public static final double ALLELE_FEQUENCY_LOW
- See Also:
- Constant Field Values
-
INFO_KEY_PATTERN
public static final java.util.regex.Pattern INFO_KEY_PATTERN
-
VCF_INFO_END
public static final java.lang.String VCF_INFO_END
- See Also:
- Constant Field Values
-
VCF_ALT_NON_REF
public static final java.lang.String VCF_ALT_NON_REF
- See Also:
- Constant Field Values
-
VCF_ALT_NON_REF_gVCF
public static final java.lang.String VCF_ALT_NON_REF_gVCF
- See Also:
- Constant Field Values
-
VCF_ALT_MISSING_REF
public static final java.lang.String VCF_ALT_MISSING_REF
- See Also:
- Constant Field Values
-
VCF_ALT_NON_REF_gVCF_ARRAY
public static final java.lang.String[] VCF_ALT_NON_REF_gVCF_ARRAY
-
VCF_ALT_NON_REF_ARRAY
public static final java.lang.String[] VCF_ALT_NON_REF_ARRAY
-
VCF_ALT_MISSING_REF_ARRAY
public static final java.lang.String[] VCF_ALT_MISSING_REF_ARRAY
-
VCF_INFO_HOMS
public static final java.lang.String VCF_INFO_HOMS
- See Also:
- Constant Field Values
-
VCF_INFO_HETS
public static final java.lang.String VCF_INFO_HETS
- See Also:
- Constant Field Values
-
VCF_INFO_NAS
public static final java.lang.String VCF_INFO_NAS
- See Also:
- Constant Field Values
-
VCF_INFO_PRIVATE
public static final java.lang.String VCF_INFO_PRIVATE
- See Also:
- Constant Field Values
-
alts
protected java.lang.String[] alts
-
altStr
protected java.lang.String altStr
-
chromosomeName
protected java.lang.String chromosomeName
-
filter
protected java.lang.String filter
-
format
protected java.lang.String format
-
formatFields
protected java.lang.String[] formatFields
-
genotypeFields
protected java.lang.String[] genotypeFields
-
genotypeFieldsStr
protected java.lang.String genotypeFieldsStr
-
genotypeScores
protected byte[] genotypeScores
-
info
protected java.util.HashMap<java.lang.String,java.lang.String> info
-
infoStr
protected java.lang.String infoStr
-
line
protected java.lang.String line
-
lineNum
protected int lineNum
-
quality
protected java.lang.Double quality
-
ref
protected java.lang.String ref
-
variants
protected java.util.LinkedList<Variant> variants
-
vcfEffects
protected java.util.List<VcfEffect> vcfEffects
-
vcfFileIterator
protected VcfFileIterator vcfFileIterator
-
vcfGenotypes
protected java.util.ArrayList<VcfGenotype> vcfGenotypes
-
-
Constructor Detail
-
VcfEntry
public VcfEntry(VcfFileIterator vcfFileIterator, Marker parent, java.lang.String chromosomeName, int start, java.lang.String id, java.lang.String ref, java.lang.String altsStr, double quality, java.lang.String filterPass, java.lang.String infoStr, java.lang.String format)
-
VcfEntry
public VcfEntry(VcfFileIterator vcfFileIterator, java.lang.String line, int lineNum, boolean parseNow)
Create a line from a file iterator
-
-
Method Detail
-
cleanUnderscores
public static java.lang.String cleanUnderscores(java.lang.String s)
Return a string without leading, trailing and duplicated underscores
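A minimal sketch of this behavior as a standalone helper, assuming "duplicated" means runs of two or more underscores. The class 'UnderscoreSketch' is hypothetical and is not the SnpEff source:

```java
// Hypothetical helper, sketched from the description above; not the SnpEff source.
final class UnderscoreSketch {
    private UnderscoreSketch() {}

    /** Collapses runs of '_' into a single '_' and strips leading/trailing underscores. */
    static String cleanUnderscores(String s) {
        if (s == null) return null;                   // sketch choice: pass null through
        String collapsed = s.replaceAll("_+", "_");    // "a__b" -> "a_b"
        return collapsed.replaceAll("^_|_$", "");      // "_a_b_" -> "a_b"
    }
}
```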
-
isEmpty
public static boolean isEmpty(java.lang.String value)
Does 'value' represent an EMPTY / MISSING value in a VCF field? (or multiple MISSING comma-separated values)
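A sketch of the missing-value test, assuming "." is the only MISSING marker (as in the VCF specification) and that a comma-separated list counts as empty only when every element is ".". 'VcfMissingSketch' is a hypothetical name:

```java
// Hypothetical helper: "." is the VCF MISSING marker (assumed to be the only one).
final class VcfMissingSketch {
    private VcfMissingSketch() {}

    /** True for null, "", "." or a comma-separated list made only of "." (e.g. ".,."). */
    static boolean isEmpty(String value) {
        if (value == null || value.isEmpty()) return true;
        for (String part : value.split(",", -1)) {    // -1 keeps trailing empty parts
            if (!part.equals(".")) return false;       // any real value means "not empty"
        }
        return true;
    }
}
```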
-
isValidInfoKey
public static boolean isValidInfoKey(java.lang.String key)
Make sure the INFO key matches the regular expression (as specified in VCF spec 4.3)
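The VCF 4.3 specification gives the INFO key pattern as ^([A-Za-z_][0-9A-Za-z_.]*|1000G)$. The sketch below assumes INFO_KEY_PATTERN is equivalent to that expression (its actual value is not shown on this page); 'InfoKeySketch' is hypothetical:

```java
import java.util.regex.Pattern;

// Hypothetical validator using the INFO key regex from the VCF 4.3 specification.
final class InfoKeySketch {
    private InfoKeySketch() {}

    // Letter or '_' first, then letters/digits/'_'/'.', or the legacy key "1000G".
    private static final Pattern KEY = Pattern.compile("^([A-Za-z_][0-9A-Za-z_.]*|1000G)$");

    static boolean isValidInfoKey(String key) {
        return key != null && KEY.matcher(key).matches();
    }
}
```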
-
isValidInfoValue
public static boolean isValidInfoValue(java.lang.String value)
Check that this value can be added to an INFO field
Returns:
true if OK, false if invalid value
-
vcfInfoDecode
public static java.lang.String vcfInfoDecode(java.lang.String str)
Decode INFO value
-
vcfInfoEncode
public static java.lang.String vcfInfoEncode(java.lang.String str)
Encode a string to be used in an 'INFO' field value. From the VCF 4.3 specification: characters with special meaning (such as field delimiters ';' in INFO or ':' in FORMAT fields) must be represented using the capitalized percent encoding:
%3A : (colon)
%3B ; (semicolon)
%3D = (equal sign)
%25 % (percent sign)
%2C , (comma)
%0D CR
%0A LF
%09 TAB
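A sketch of this percent encoding, plus a matching decoder in the spirit of vcfInfoDecode. Working character by character sidesteps the ordering pitfall of chained String.replace calls, where replacing '%' after ';' would re-encode the '%' in "%3B". The decoder here is more permissive than the specification's list (it decodes any %XX hex pair), and 'InfoPercentCodecSketch' is hypothetical:

```java
// Hypothetical codec for the VCF 4.3 percent-encoding list; not the SnpEff source.
final class InfoPercentCodecSketch {
    private InfoPercentCodecSketch() {}

    /** Encodes only the characters listed by the specification; others pass through. */
    static String encode(String str) {
        StringBuilder sb = new StringBuilder(str.length());
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            switch (c) {
                case ':':  sb.append("%3A"); break;
                case ';':  sb.append("%3B"); break;
                case '=':  sb.append("%3D"); break;
                case '%':  sb.append("%25"); break;
                case ',':  sb.append("%2C"); break;
                case '\r': sb.append("%0D"); break;
                case '\n': sb.append("%0A"); break;
                case '\t': sb.append("%09"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Turns every "%XX" (two hex digits) back into its character; a lone '%' is kept. */
    static String decode(String str) {
        StringBuilder sb = new StringBuilder(str.length());
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (c == '%' && i + 2 < str.length()
                    && Character.digit(str.charAt(i + 1), 16) >= 0
                    && Character.digit(str.charAt(i + 2), 16) >= 0) {
                sb.append((char) Integer.parseInt(str.substring(i + 1, i + 3), 16));
                i += 2;                                // skip the two hex digits
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```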
-
vcfInfoKeySafe
public static java.lang.String vcfInfoKeySafe(java.lang.String str)
Return a string safe to be used in an 'INFO' field key
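This page does not say how vcfInfoKeySafe sanitizes a key. One plausible approach, shown only as a hypothetical sketch ('InfoKeySafeSketch' is not SnpEff code), is to replace disallowed characters with '_', tidy underscores the way cleanUnderscores describes, and prefix '_' when the result does not start with a letter, so the output matches the VCF 4.3 key pattern:

```java
// Hypothetical sanitizer; one possible strategy, not the SnpEff implementation.
final class InfoKeySafeSketch {
    private InfoKeySafeSketch() {}

    static String keySafe(String str) {
        // 1. Replace every character that may not appear in a key with '_'.
        String s = str.replaceAll("[^0-9A-Za-z_.]", "_");
        // 2. Collapse repeated underscores and trim them from both ends.
        s = s.replaceAll("_+", "_").replaceAll("^_|_$", "");
        // 3. A key must start with a letter or '_' (VCF 4.3), so prefix '_' otherwise.
        if (s.isEmpty() || !Character.isLetter(s.charAt(0))) s = "_" + s;
        return s;
    }
}
```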
-
vcfInfoValueSafe
public static java.lang.String vcfInfoValueSafe(java.lang.String str)
Return a string safe to be used in an 'INFO' field value
-
addFilter
public void addFilter(java.lang.String filterStr)
Add string to FILTER field
-
addFormat
public void addFormat(java.lang.String formatName)
Add a 'FORMAT' field
-
addGenotype
public void addGenotype(java.lang.String vcfGenotypeStr)
Add a genotype as a string
-
addInfo
public void addInfo(java.lang.String key, java.lang.String value)
Add a "key=value" tuple to the INFO field
Parameters:
key - INFO key name
value - can be null if it is a boolean (Flag) field
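A sketch of how key=value tuples and Flag keys (null value) can be rendered into an INFO column, using an insertion-ordered map. 'InfoBuilderSketch' and its methods are hypothetical; they only mirror the semantics of addInfo and removeInfo described on this page:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical INFO builder; not the SnpEff implementation.
final class InfoBuilderSketch {
    // Insertion order is kept so the rendered INFO column is stable.
    private final Map<String, String> info = new LinkedHashMap<>();

    /** A null value marks a Flag key, which is written without "=value". */
    void addInfo(String key, String value) { info.put(key, value); }

    void removeInfo(String key) { info.remove(key); }

    /** Renders the INFO column; "." is the VCF placeholder for an empty column. */
    String toInfoString() {
        if (info.isEmpty()) return ".";
        StringJoiner sj = new StringJoiner(";");
        for (Map.Entry<String, String> e : info.entrySet()) {
            sj.add(e.getValue() == null ? e.getKey() : e.getKey() + "=" + e.getValue());
        }
        return sj.toString();
    }
}
```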
-
alleleFrequencyType
public VcfEntry.AlleleFrequencyType alleleFrequencyType()
Categorization by allele frequency
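The thresholds behind ALLELE_FEQUENCY_COMMON and ALLELE_FEQUENCY_LOW are not listed on this page. Assuming the category is derived from the minor allele frequency, a classifier could look like the following; the 0.05 and 0.01 cut-offs and the enum constant names are assumptions, not the contents of VcfEntry.AlleleFrequencyType:

```java
// Hypothetical categories and thresholds, for illustration only.
enum AfCategorySketch { COMMON, LOW_FREQUENCY, RARE }

final class AfClassifierSketch {
    static final double COMMON_MIN = 0.05; // assumed value
    static final double LOW_MIN = 0.01;    // assumed value

    /** Buckets a minor allele frequency into one of three categories. */
    static AfCategorySketch classify(double maf) {
        if (maf >= COMMON_MIN) return AfCategorySketch.COMMON;
        if (maf >= LOW_MIN) return AfCategorySketch.LOW_FREQUENCY;
        return AfCategorySketch.RARE;
    }
}
```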
-
calcHetero
public java.lang.Boolean calcHetero()
Is this entry heterozygous? Infer Hom/Het if there is only one sample in the file. Otherwise the result is null.
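A sketch of the single-sample inference, assuming the input is the GT value of each sample (e.g. "0/1"). It returns null when there is not exactly one sample, and also (a choice made for this sketch) when the call has a missing allele. 'HeteroSketch' is hypothetical:

```java
import java.util.List;

// Hypothetical helper; input is one GT string per sample.
final class HeteroSketch {
    /** TRUE = heterozygous, FALSE = homozygous, null = cannot be inferred. */
    static Boolean calcHetero(List<String> gtValues) {
        if (gtValues.size() != 1) return null;          // only defined for single-sample files
        String[] alleles = gtValues.get(0).split("[/|]");
        for (String a : alleles) {
            if (a.equals(".")) return null;              // missing allele (sketch choice)
        }
        for (int i = 1; i < alleles.length; i++) {
            if (!alleles[i].equals(alleles[0])) return Boolean.TRUE;
        }
        return Boolean.FALSE;
    }
}
```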
-
check
public java.lang.String check()
Perform several simple checks and report problems (if any).
-
cloneShallow
public Cds cloneShallow()
Description copied from class: Marker
Perform a shallow clone
Overrides:
cloneShallow in class Marker
-
compressGenotypes
public boolean compressGenotypes()
Compress genotypes into "HO/HE/NA" INFO fields
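This page names the INFO keys (HO, HE, NA) but not their value format. The sketch below assumes each key holds a comma-separated list of 0-based sample indexes and that homozygous-reference samples are left implicit; both are assumptions, only diploid calls are handled, and 'GenotypeCompressSketch' is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical compressor; index base and value layout are assumptions.
final class GenotypeCompressSketch {
    /** Builds "HO=...;HE=...;NA=..." from diploid GT strings; "" if all samples are hom-REF. */
    static String compress(List<String> gts) {
        List<String> ho = new ArrayList<>();
        List<String> he = new ArrayList<>();
        List<String> na = new ArrayList<>();
        for (int i = 0; i < gts.size(); i++) {
            String[] a = gts.get(i).split("[/|]");
            String idx = Integer.toString(i);
            if (a.length != 2 || a[0].equals(".") || a[1].equals(".")) {
                na.add(idx);                          // missing (or non-diploid) call
            } else if (!a[0].equals(a[1])) {
                he.add(idx);                          // heterozygous
            } else if (!a[0].equals("0")) {
                ho.add(idx);                          // homozygous ALT
            }                                         // hom-REF: implied, not stored
        }
        StringJoiner sj = new StringJoiner(";");
        if (!ho.isEmpty()) sj.add("HO=" + String.join(",", ho));
        if (!he.isEmpty()) sj.add("HE=" + String.join(",", he));
        if (!na.isEmpty()) sj.add("NA=" + String.join(",", na));
        return sj.toString();
    }
}
```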
-
delFilter
public boolean delFilter(java.lang.String filterStr)
Remove a string from FILTER field
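The VCF specification defines FILTER as "PASS" when all filters passed, otherwise a semicolon-separated list of failed filter codes, with "." for missing. A hypothetical sketch of add/remove operations under those rules follows ('FilterSketch' is not SnpEff code); resetting an emptied field to "." is a choice made for this sketch, not documented behavior:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical FILTER column manipulation following the VCF FILTER semantics.
final class FilterSketch {
    private String filter;

    FilterSketch(String filter) { this.filter = filter; }

    String get() { return filter; }

    /** "PASS" means the record passed all filters. */
    boolean isFilterPass() { return "PASS".equals(filter); }

    /** Appends a failed-filter code; "PASS", "." or an empty field is replaced. */
    void addFilter(String code) {
        if (filter == null || filter.isEmpty() || filter.equals(".") || filter.equals("PASS")) {
            filter = code;
        } else {
            filter = filter + ";" + code;
        }
    }

    /** Removes one code and reports whether it was present; an emptied field becomes ".". */
    boolean delFilter(String code) {
        List<String> codes = new ArrayList<>(Arrays.asList(filter.split(";")));
        boolean removed = codes.remove(code);
        if (removed) filter = codes.isEmpty() ? "." : String.join(";", codes);
        return removed;
    }
}
```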
-
getAltIndex
public int getAltIndex(java.lang.String alt)
Get index of matching ALT entry
Returns:
-1 if not found
-
getAlts
public java.lang.String[] getAlts()
-
getAltsStr
public java.lang.String getAltsStr()
Create a comma separated ALTS string
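A minimal sketch of getAltIndex and getAltsStr over the comma-separated ALT column. 'AltListSketch' is hypothetical, and exact (case-sensitive) string matching is an assumption:

```java
// Hypothetical wrapper around the ALT column; not the SnpEff implementation.
final class AltListSketch {
    private final String[] alts;

    AltListSketch(String altsStr) { this.alts = altsStr.split(","); }

    /** Position of 'alt' in the ALT column, or -1 if it is not listed. */
    int getAltIndex(String alt) {
        for (int i = 0; i < alts.length; i++) {
            if (alts[i].equals(alt)) return i;
        }
        return -1;
    }

    /** Rebuilds the comma-separated ALT column. */
    String getAltsStr() { return String.join(",", alts); }
}
```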
-
getChromosomeNameOri
public java.lang.String getChromosomeNameOri()
Original chromosome name (as it appeared in the VCF file)
Overrides:
getChromosomeNameOri in class Interval
-
getFilter
public java.lang.String getFilter()
-
getFormat
public java.lang.String getFormat()
-
getFormatFields
public java.lang.String[] getFormatFields()
-
getGenotypesScores
public byte[] getGenotypesScores()
Return genotypes parsed as an array of codes
-
getInfo
public java.lang.String getInfo(java.lang.String key)
Get info string
-
getInfo
public java.lang.String getInfo(java.lang.String key, java.lang.String allele)
Get info string for a specific allele
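Per-allele INFO values are usually declared with Number=A in the header: one comma-separated value per ALT, in ALT order. Assuming that layout, looking up the value for one allele reduces to finding the ALT's index; for Number=R fields the index would shift by one because REF comes first. 'PerAlleleInfoSketch' is hypothetical:

```java
// Hypothetical helper; assumes a Number=A INFO field (one value per ALT, in ALT order).
final class PerAlleleInfoSketch {
    private PerAlleleInfoSketch() {}

    /** Value for 'allele', or null if the allele is not an ALT or has no value. */
    static String valueForAllele(String infoValue, String[] alts, String allele) {
        if (infoValue == null) return null;
        String[] values = infoValue.split(",");
        for (int i = 0; i < alts.length; i++) {
            if (alts[i].equals(allele)) return i < values.length ? values[i] : null;
        }
        return null;
    }
}
```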
-
getInfo
public java.lang.String getInfo(java.lang.String key, Variant var)
Get an INFO field matching a variant
-
getInfoFlag
public boolean getInfoFlag(java.lang.String key)
Does the INFO entry exist?
-
getInfoFloat
public double getInfoFloat(java.lang.String key)
Get INFO field as a 'double' number. The VCF specification declares this data type as 'Float', which is why the method name might not be intuitive.
-
getInfoInt
public long getInfoInt(java.lang.String key)
Get INFO field as a 'long' number. The VCF specification declares this data type as 'Integer', which is why the method name might not be intuitive.
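A sketch of the typed accessors over a parsed INFO column, with Flag keys stored as null values. Returning NaN or 0 for absent or non-numeric values is a choice made for this sketch; this page does not document what getInfoFloat and getInfoInt return in that case. 'InfoReaderSketch' is hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical reader over an INFO column such as "DP=14;AF=0.5;DB".
final class InfoReaderSketch {
    private final Map<String, String> info = new HashMap<>();

    InfoReaderSketch(String infoStr) {
        if (infoStr == null || infoStr.isEmpty() || infoStr.equals(".")) return; // no INFO
        for (String kv : infoStr.split(";")) {
            int eq = kv.indexOf('=');
            if (eq < 0) info.put(kv, null);                      // Flag: key without value
            else info.put(kv.substring(0, eq), kv.substring(eq + 1));
        }
    }

    /** A Flag is "set" when its key is present at all. */
    boolean getInfoFlag(String key) { return info.containsKey(key); }

    /** VCF Float -> Java double; NaN if absent or not numeric (sketch choice). */
    double getInfoFloat(String key) {
        String v = info.get(key);
        if (v == null) return Double.NaN;
        try {
            return Double.parseDouble(v);
        } catch (NumberFormatException e) {
            return Double.NaN;
        }
    }

    /** VCF Integer -> Java long; 0 if absent or not numeric (sketch choice). */
    long getInfoInt(String key) {
        String v = info.get(key);
        if (v == null) return 0L;
        try {
            return Long.parseLong(v);
        } catch (NumberFormatException e) {
            return 0L;
        }
    }
}
```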
-
getInfoKeys
public java.util.Set<java.lang.String> getInfoKeys()
Get all keys available in the info field
-
getInfoStr
public java.lang.String getInfoStr()
Get the full (unparsed) INFO field
-
getLine
public java.lang.String getLine()
Original VCF line (from file)
-
getLineNum
public int getLineNum()
-
getNumberOfSamples
public int getNumberOfSamples()
number of samples in this VCF file
-
getQuality
public double getQuality()
-
getRef
public java.lang.String getRef()
-
getStr
public java.lang.String getStr()
-
getVcfEffects
public java.util.List<VcfEffect> getVcfEffects()
-
getVcfEffects
public java.util.List<VcfEffect> getVcfEffects(EffFormatVersion formatVersion)
Parse 'EFF' info field and get a list of effects
-
getVcfFileIterator
public VcfFileIterator getVcfFileIterator()
-
getVcfGenotype
public VcfGenotype getVcfGenotype(int index)
-
getVcfGenotypes
public java.util.List<VcfGenotype> getVcfGenotypes()
-
getVcfInfo
public VcfHeaderInfo getVcfInfo(java.lang.String id)
Get VcfInfo type for a given ID
-
getVcfInfoNumber
public VcfInfoType getVcfInfoNumber(java.lang.String id)
Get Info number for a given ID
-
hasField
public boolean hasField(java.lang.String filedName)
-
hasGenotypes
public boolean hasGenotypes()
-
hasInfo
public boolean hasInfo(java.lang.String infoFieldName)
-
hasQuality
public boolean hasQuality()
-
isBiAllelic
public boolean isBiAllelic()
Is this bi-allelic (based ONLY on the number of ALTs)? WARNING: You should use the 'calcHetero()' method for a more precise calculation.
-
isCompressedGenotypes
public boolean isCompressedGenotypes()
Do we have compressed genotypes in "HO,HE,NA" INFO fields?
-
isFilterPass
public boolean isFilterPass()
-
isMultiallelic
public boolean isMultiallelic()
Is this multi-allelic (based ONLY on the number of ALTs)? WARNING: You should use the 'calcHetero()' method for a more precise calculation.
-
isShowWarningIfParentDoesNotInclude
protected boolean isShowWarningIfParentDoesNotInclude()
Description copied from class: Marker
Show an error if parent does not include child?
Overrides:
isShowWarningIfParentDoesNotInclude in class Marker
-
isSingleSnp
public boolean isSingleSnp()
Is this a VCF entry with a single SNP?
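A sketch of the ALT-count checks described by isBiAllelic and isMultiallelic, plus a single-SNP test. Treating "." as "no ALT" and requiring both REF and ALT to be one of A, C, G, T for a SNP are assumptions; as the warnings above note, none of this looks at the genotypes. 'AlleleShapeSketch' is hypothetical:

```java
// Hypothetical ALT-shape checks; counts ALT entries only, ignores genotypes.
final class AlleleShapeSketch {
    /** Number of ALT entries, ignoring the "." placeholder for "no ALT". */
    private static int altCount(String[] alts) {
        int n = 0;
        for (String a : alts) {
            if (!a.equals(".")) n++;
        }
        return n;
    }

    static boolean isBiAllelic(String[] alts) { return altCount(alts) == 1; }

    static boolean isMultiallelic(String[] alts) { return altCount(alts) > 1; }

    /** One ALT, and REF and ALT are each a single, different nucleotide. */
    static boolean isSingleSnp(String ref, String[] alts) {
        if (alts.length != 1 || ref.length() != 1 || alts[0].length() != 1) return false;
        char r = Character.toUpperCase(ref.charAt(0));
        char a = Character.toUpperCase(alts[0].charAt(0));
        return "ACGT".indexOf(r) >= 0 && "ACGT".indexOf(a) >= 0 && a != r;
    }
}
```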
-
isSingleton
public boolean isSingleton()
Is this variant a singleton (appears only in one genotype)
-
isVariant
public boolean isVariant()
Is this a change or are the ALTs actually the same as the reference
-
isVariant
public boolean isVariant(java.lang.String alt)
Is this ALT string a variant?
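A sketch of the ALT-versus-REF comparison. It treats ".", "<NON_REF>" (the GATK gVCF placeholder) and "<*>" (the symbolic allele VCF uses for "any other allele") as non-variant; this set is an assumption about what the VCF_ALT_* constants cover, and 'VariantCheckSketch' is hypothetical:

```java
import java.util.Set;

// Hypothetical check; the placeholder set is an assumption, not the VCF_ALT_* constants.
final class VariantCheckSketch {
    private static final Set<String> PLACEHOLDER_ALTS = Set.of(".", "<NON_REF>", "<*>");

    /** True if 'alt' describes a concrete change relative to 'ref'. */
    static boolean isVariant(String ref, String alt) {
        if (alt == null || alt.isEmpty() || PLACEHOLDER_ALTS.contains(alt)) return false;
        return !alt.equalsIgnoreCase(ref);              // bases compared case-insensitively
    }

    /** Entry-level check: true if at least one ALT is a real change. */
    static boolean isVariant(String ref, String[] alts) {
        for (String alt : alts) {
            if (isVariant(ref, alt)) return true;
        }
        return false;
    }
}
```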
-
iterator
public java.util.Iterator<VcfGenotype> iterator()
Specified by:
iterator in interface java.lang.Iterable<VcfGenotype>
-
mac
public int mac()
Calculate Minor allele count
-
maf
public double maf()
Calculate Minor allele frequency
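A sketch using the usual definitions: count REF ("0") and non-REF alleles over all called genotypes, take the smaller count as the minor allele count, and divide by the number of called alleles for the frequency. Pooling every non-REF allele into one class is a simplification for multi-allelic sites, and this is not necessarily how SnpEff computes mac() and maf(). 'MinorAlleleSketch' is hypothetical:

```java
// Sketch with textbook definitions; every non-"0" allele is pooled as ALT.
final class MinorAlleleSketch {
    /** {REF count, ALT count} over called alleles of GT strings such as "0/1" or "1|1". */
    private static int[] counts(String[] gts) {
        int ref = 0, alt = 0;
        for (String gt : gts) {
            for (String a : gt.split("[/|]")) {
                if (a.equals(".")) continue;            // missing allele: not counted
                if (a.equals("0")) ref++; else alt++;
            }
        }
        return new int[] {ref, alt};
    }

    /** Minor allele count: the smaller of the REF and ALT counts. */
    static int mac(String[] gts) {
        int[] c = counts(gts);
        return Math.min(c[0], c[1]);
    }

    /** Minor allele frequency: mac divided by the number of called alleles (0 if none). */
    static double maf(String[] gts) {
        int[] c = counts(gts);
        int total = c[0] + c[1];
        return total == 0 ? 0.0 : (double) Math.min(c[0], c[1]) / total;
    }
}
```

When ALT is the common allele, the minor allele is REF: for the genotypes 1/1, 1/1, 0/1 this sketch gives a minor allele count of 1.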
-
parse
public void parse()
Parse a 'line' from a 'vcfFileIterator'
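The column layout that parse() has to handle is fixed by the VCF specification: CHROM, POS, ID, REF, ALT, QUAL, FILTER and INFO, then an optional FORMAT column and one column per sample. A hypothetical sketch that splits a data line into those fields ('VcfLineSketch' is not SnpEff code; it does not parse INFO values or genotypes):

```java
import java.util.Arrays;

// Hypothetical splitter for one VCF data line; not the SnpEff parser.
final class VcfLineSketch {
    final String chrom, id, ref, filter, info, format;
    final int pos;              // 1-based, as written in the file
    final String[] alts, samples;
    final Double quality;       // null when QUAL is "."

    VcfLineSketch(String line) {
        String[] f = line.split("\t", -1);      // -1 keeps trailing empty columns
        if (f.length < 8) throw new IllegalArgumentException("VCF data lines need at least 8 columns");
        chrom = f[0];
        pos = Integer.parseInt(f[1]);
        id = f[2];
        ref = f[3];
        alts = f[4].split(",");
        quality = f[5].equals(".") ? null : Double.valueOf(f[5]);
        filter = f[6];
        info = f[7];
        format = f.length > 8 ? f[8] : null;
        samples = f.length > 9 ? Arrays.copyOfRange(f, 9, f.length) : new String[0];
    }
}
```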
-
parseLof
public java.util.List<VcfLof> parseLof()
Parse LOF from VcfEntry
-
parseNmd
public java.util.List<VcfNmd> parseNmd()
Parse NMD from VcfEntry
-
removeInfo
public void removeInfo(java.lang.String key)
Remove INFO field
-
rmInfo
public boolean rmInfo(java.lang.String info)
Parse INFO fields
-
setFilter
public void setFilter(java.lang.String filter)
-
setFormat
public void setFormat(java.lang.String format)
-
setGenotypeStr
public void setGenotypeStr(java.lang.String genotypeFieldsStr)
-
setLineNum
public void setLineNum(int lineNum)
-
toStr
public java.lang.String toStr()
To string as a simple "CHR:START_REF/ALTs" format
-
toStringNoGt
public java.lang.String toStringNoGt()
Show only first eight fields (no genotype entries)
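Sketches of the two string views, under stated assumptions: toStr is assumed to print POS as written in the file and to join multiple ALTs with commas, and toStringNoGt keeps columns 1 to 8 (CHROM through INFO). 'VcfStringsSketch' is hypothetical:

```java
import java.util.Arrays;

// Hypothetical string views of a VCF entry; not the SnpEff implementation.
final class VcfStringsSketch {
    private VcfStringsSketch() {}

    /** "CHR:START_REF/ALTs"; POS is printed as given and ALTs are joined with ','. */
    static String toStr(String chrom, int pos, String ref, String[] alts) {
        return chrom + ":" + pos + "_" + ref + "/" + String.join(",", alts);
    }

    /** Keeps only the first eight tab-separated columns, dropping FORMAT and samples. */
    static String toStringNoGt(String line) {
        String[] f = line.split("\t", -1);
        return String.join("\t", Arrays.copyOf(f, Math.min(8, f.length)));
    }
}
```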
-
uncompressGenotypes
public VcfEntry uncompressGenotypes()
Uncompress VCF entry having genotypes in "HO,HE,NA" fields
-
variants
public java.util.List<Variant> variants()
Create a list of variants from this VcfEntry
-
-