Package org.snpeff.interval
Class Transcript
- java.lang.Object
  - org.snpeff.interval.Interval
    - org.snpeff.interval.Marker
      - org.snpeff.interval.IntervalAndSubIntervals<Exon>
        - org.snpeff.interval.Transcript
- All Implemented Interfaces:
java.io.Serializable, java.lang.Cloneable, java.lang.Comparable<Interval>, java.lang.Iterable<Exon>, TxtSerializable
public class Transcript extends IntervalAndSubIntervals<Exon>
Interval for a transcript, as well as some other information: exons, UTRs, CDSs, etc.
- Author: pcingola
- See Also: Serialized Form
Field Summary
-
Fields inherited from class org.snpeff.interval.Interval
chromosomeNameOri, end, id, parent, start, strandMinus
Constructor Summary
Constructors
- Transcript()
- Transcript(Gene gene, int start, int end, boolean strandMinus, java.lang.String id)
-
Method Summary
- int[] aaNumber2Pos(): Calculate the chromosome position as a function of amino acid number. Note that it returns the chromosomal position of the first base of each amino acid.
- int aaNumber2Pos(int aaNum): Find the genomic position of the first base of amino acid 'aaNum'.
- void add(Cds cdsInt): Add a CDS.
- void add(Intron intron): Add an intron.
- void add(SpliceSite spliceSite): Add a splice site.
- void add(Utr utr): Add a UTR.
- boolean adjust(): Adjust transcript coordinates.
- Transcript apply(Variant variant): Create a new transcript after applying the changes in 'variant'.
- java.lang.String baseAt(int pos): Find the base at genomic coordinate 'pos'.
- int baseNumber2MRnaPos(int pos): Calculate the distance from the transcript start to position 'pos'. mRNA is roughly the same as cDNA.
- int baseNumberCds(int pos, boolean usePrevBaseIntron): Calculate the base number in the CDS where 'pos' maps.
- java.lang.String baseNumberCds2Codon(int cdsBaseNumber): Return the codon that includes 'cdsBaseNumber'.
- int[] baseNumberCds2Pos(): Calculate the chromosome position as a function of CDS base number.
- int baseNumberCds2Pos(int cdsBaseNum)
- java.lang.String cds(): Retrieve the coding sequence.
- Marker cdsMarker(): Create a marker of the coding region in this transcript.
- Transcript cloneShallow(): Perform a shallow clone.
- int[] codonNumber2Pos(int codonNum): Return an array of 3 genomic positions where codon number 'codonNum' maps.
- boolean collapseZeroGap(): Collapse exons having gaps of zero (i.e. exons that are followed by other exons).
- double cpgExonBias(): Calculate CpG bias: number of CpG / expected[CpG].
- int cpgExons(): Count total CpG in this transcript's exons.
- void createSpliceSites(int spliceSiteSize, int spliceRegionExonSize, int spliceRegionIntronMin, int spliceRegionIntronMax): Find all splice sites.
- void createUpDownStream(int upDownLength): Create a list of upstream/downstream regions (for each transcript). The upstream (downstream) region is defined as the upDownLength bases before (after) the transcript.
- boolean deleteRedundant(): Delete redundant exons (i.e. exons that are totally included in other exons).
- Cds findCds(Exon exon): Find a CDS that exactly matches the exon.
- Exon findExon(int pos): Return an exon that intersects 'pos'.
- Exon findExon(Marker marker): Return an exon intersecting 'marker' (first exon found).
- Intron findIntron(int pos): Return an intron overlapping position 'pos'.
- Utr findUtr(int pos): Return the UTR that hits position 'pos'.
- java.util.List<Utr> findUtrs(Marker marker): Return the UTRs that intersect 'marker' (null if not found).
- boolean frameCorrection(): Correct exons based on frame information.
- java.util.List<Utr3prime> get3primeUtrs(): Create a list of 3' UTRs.
- java.util.List<Utr3prime> get3primeUtrsSorted()
- java.util.List<Utr5prime> get5primeUtrs(): Create a list of 5' UTRs.
- java.util.List<Utr5prime> get5primeUtrsSorted()
- BioType getBioType()
- java.util.List<Cds> getCds(): Get all CDSs.
- int getCdsEnd()
- int getCdsStart()
- Downstream getDownstream()
- java.util.Collection<Exon> getExons(): A more intuitive name for 'subintervals'.
- Exon getFirstCodingExon(): Get the first coding exon.
- Gene getGene()
- TranscriptSupportLevel getTranscriptSupportLevel()
- Marker getTss(): Create a TSS marker.
- Upstream getUpstream()
- java.util.List<Utr> getUtrs(): Get all UTRs.
- java.lang.String getVersion()
- boolean hasCds()
- boolean hasError(): Does this transcript have any errors?
- boolean hasErrorOrWarning(): Does this transcript have any errors or warnings?
- boolean hasTranscriptSupportLevelInfo()
- boolean hasWarning(): Does this transcript have any warnings?
- java.util.List<Intron> introns(): Get all introns (lazy init).
- boolean isAaCheck()
- protected boolean isAdjustIfParentDoesNotInclude(Marker parent): Adjust parent if it does not include child?
- boolean isCanonical()
- boolean isChecked(): Has this transcript been checked against CDS/DNA/AA sequences?
- boolean isCorrected()
- boolean isDnaCheck()
- boolean isDownstream(int pos)
- boolean isErrorProteinLength(): Check whether the coding length is a multiple of 3 in protein coding transcripts.
- boolean isErrorStartCodon(): Is the first codon a START codon?
- boolean isErrorStopCodonsInCds(): Check whether the protein sequence has STOP codons in the middle of the coding sequence.
- boolean isIntron(int pos)
- boolean isProteinCoding()
- boolean isRibosomalSlippage()
- boolean isUpstream(int pos)
- boolean isUtr(int pos)
- boolean isUtr(Marker marker)
- boolean isUtr3(int pos)
- boolean isUtr5(int pos)
- boolean isWarningStopCodon(): Is the last codon a STOP codon?
- Markers markers(): A list of all markers in this transcript.
- java.lang.String mRna(): Retrieve the coding sequence AND the UTRs (mRNA = 5'UTR + CDS + 3'UTR), i.e. concatenate all exon sequences.
- java.lang.String protein(): Protein sequence (amino acid sequence produced by this transcript).
- Markers query(Marker marker): Query all genomic regions that intersect 'marker'.
- Exon queryExon(Marker interval): Return the first exon that intersects 'interval' (null if not found).
- boolean rankExons(): Assign ranks to exons.
- void reset(): Remove all intervals.
- void resetCache()
- void resetExons()
- ErrorWarningType sanityCheck(Variant variant): Perform some basic checks and return the error type, if any.
- void serializeParse(MarkerSerializer markerSerializer): Parse a line from a serialized file.
- java.lang.String serializeSave(MarkerSerializer markerSerializer): Create a string to serialize to a file.
- void setAaCheck(boolean aaCheck)
- void setBioType(BioType bioType)
- void setCanonical(boolean canonical)
- void setDnaCheck(boolean dnaCheck)
- void setProteinCoding(boolean proteinCoding)
- void setRibosomalSlippage(boolean ribosomalSlippage)
- void setTranscriptSupportLevel(TranscriptSupportLevel transcriptSupportLevel)
- void setVersion(java.lang.String version)
- void sortCds()
- java.util.List<SpliceSite> spliceSites()
- java.lang.String toString()
- java.lang.String toString(boolean full)
- java.lang.String toStringAsciiArt(boolean full): Show a transcript as ASCII art.
- boolean utrFromCds(boolean verbose): Calculate UTR regions from CDSs.
- boolean variantEffect(Variant variant, VariantEffects variantEffects): Get some details about the effect on this transcript.
-
Methods inherited from class org.snpeff.interval.IntervalAndSubIntervals
add, addAll, addAll, clone, containsId, get, invalidateSorted, iterator, numChilds, remove, setStrandMinus, shiftCoordinates, sorted, sortedStrand, subIntervals
-
Methods inherited from class org.snpeff.interval.Marker
adjust, applyDel, applyDup, applyIns, applyMixed, codonTable, compareTo, compareToPos, distance, distanceBases, getParent, getType, idChain, idChain, idChain, includes, intersect, isDeferredAnalysis, isShowWarningIfParentDoesNotInclude, minus, query, readTxt, shouldApply, union, variantEffectNonRef
-
Methods inherited from class org.snpeff.interval.Interval
equals, findParent, getChromosome, getChromosomeName, getChromosomeNameOri, getChromosomeNum, getEnd, getGenome, getGenomeName, getId, getStart, getStrand, hashCode, intersects, intersects, intersects, intersects, intersectSize, isCircular, isSameChromo, isStrandMinus, isStrandPlus, isValid, setChromosomeNameOri, setEnd, setId, setParent, setStart, size, toStr, toStringAsciiArt, toStrPos
Constructor Detail
-
Transcript
public Transcript()
-
Transcript
public Transcript(Gene gene, int start, int end, boolean strandMinus, java.lang.String id)
Method Detail
-
aaNumber2Pos
public int[] aaNumber2Pos()
Calculate the chromosome position as a function of amino acid number. Note that it returns the chromosomal position of the first base of each amino acid.
If you need the chromosomal position of each base
-
aaNumber2Pos
public int aaNumber2Pos(int aaNum)
Find the genomic position of the first base of amino acid 'aaNum'
-
add
public void add(Cds cdsInt)
Add a CDS
-
add
public void add(Intron intron)
Add an intron
-
add
public void add(SpliceSite spliceSite)
Add a SpliceSite
-
add
public void add(Utr utr)
Add a UTR
-
adjust
public boolean adjust()
Adjust transcript coordinates
-
apply
public Transcript apply(Variant variant)
Create a new transcript after applying the changes in 'variant'.
Note: If this transcript is unaffected, no new transcript is created (the same transcript is returned).
- Overrides: apply in class IntervalAndSubIntervals<Exon>
- Returns: the marker resulting from applying the variant
-
baseAt
public java.lang.String baseAt(int pos)
Find base at genomic coordinate 'pos'
-
baseNumber2MRnaPos
public int baseNumber2MRnaPos(int pos)
Calculate the distance from the transcript start to position 'pos' (the mRNA position). mRNA is roughly the same as cDNA; strictly speaking, mRNA also has a poly-A tail and a 5' cap.
-
baseNumberCds
public int baseNumberCds(int pos, boolean usePrevBaseIntron)
Calculate the base number in the CDS where 'pos' maps.
- Parameters: usePrevBaseIntron: when 'pos' is intronic, this method returns:
  - if usePrevBaseIntron == false: the first base in the exon after 'pos' (i.e. the first coding base after the intron)
  - if usePrevBaseIntron == true: the last base in the exon before 'pos' (i.e. the last coding base before the intron)
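A minimal sketch of the usePrevBaseIntron convention, assuming a plus-strand transcript whose coding segments are sorted {start, end} pairs (0-based, inclusive) and CDS base numbers counted from 0. The class name is hypothetical and plain arrays stand in for snpEff's Exon/Cds objects; returning -1 outside the coding span is also an assumption.

```java
/**
 * Hypothetical sketch of baseNumberCds(pos, usePrevBaseIntron).
 * Assumptions: plus strand only, segments sorted by start,
 * 0-based inclusive coordinates, CDS base numbers start at 0.
 */
class CdsBaseNumberSketch {

    /** Returns the CDS base number for 'pos', or -1 if 'pos' is outside the coding span. */
    static int baseNumberCds(int[][] segments, int pos, boolean usePrevBaseIntron) {
        int basesBefore = 0; // coding bases in segments that lie entirely before 'pos'
        for (int i = 0; i < segments.length; i++) {
            int start = segments[i][0];
            int end = segments[i][1];
            if (pos >= start && pos <= end) return basesBefore + (pos - start);
            if (pos < start) {
                if (i == 0) return -1; // upstream of the first coding base
                // 'pos' is in the intron between segment i-1 and segment i
                return usePrevBaseIntron ? basesBefore - 1 : basesBefore;
            }
            basesBefore += end - start + 1;
        }
        return -1; // downstream of the last coding base
    }

    public static void main(String[] args) {
        int[][] cds = { {100, 102}, {200, 202} };
        System.out.println(baseNumberCds(cds, 150, false)); // 3: first coding base after the intron
        System.out.println(baseNumberCds(cds, 150, true));  // 2: last coding base before the intron
    }
}
```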
-
baseNumberCds2Codon
public java.lang.String baseNumberCds2Codon(int cdsBaseNumber)
Return a codon that includes 'cdsBaseNumber'
-
baseNumberCds2Pos
public int[] baseNumberCds2Pos()
Calculate chromosome position as function of CDS number
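The relation between baseNumberCds2Pos() and aaNumber2Pos() can be illustrated with a small sketch: build the CDS-base-to-chromosome map once, then take every third entry as the first base of each amino acid. This is not the snpEff implementation; it assumes coding segments given as sorted {start, end} pairs (0-based, inclusive), and the class name is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch: CDS base number -> chromosome position, and amino acid -> first base. */
class CdsToGenomeSketch {

    /** cds2pos[i] = chromosome position of CDS base i, in transcription order. */
    static int[] baseNumberCds2Pos(int[][] segments, boolean strandMinus) {
        int total = 0;
        for (int[] s : segments) total += s[1] - s[0] + 1;
        int[] cds2pos = new int[total];
        int i = 0;
        if (!strandMinus) {
            for (int[] s : segments)
                for (int p = s[0]; p <= s[1]; p++) cds2pos[i++] = p;
        } else {
            // Minus strand: transcription runs from the highest coordinate downwards
            for (int k = segments.length - 1; k >= 0; k--)
                for (int p = segments[k][1]; p >= segments[k][0]; p--) cds2pos[i++] = p;
        }
        return cds2pos;
    }

    /** The first base of amino acid n is CDS base 3 * n. */
    static int[] aaNumber2Pos(int[] cds2pos) {
        int[] aa2pos = new int[cds2pos.length / 3];
        for (int aa = 0; aa < aa2pos.length; aa++) aa2pos[aa] = cds2pos[3 * aa];
        return aa2pos;
    }

    public static void main(String[] args) {
        int[][] cds = { {100, 102}, {200, 202} };
        int[] minus = baseNumberCds2Pos(cds, true);
        System.out.println(Arrays.toString(minus));               // [202, 201, 200, 102, 101, 100]
        System.out.println(Arrays.toString(aaNumber2Pos(minus))); // [202, 102]
    }
}
```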
-
baseNumberCds2Pos
public int baseNumberCds2Pos(int cdsBaseNum)
-
cds
public java.lang.String cds()
Retrieve coding sequence
-
cdsMarker
public Marker cdsMarker()
Create a marker of the coding region in this transcript
-
cloneShallow
public Transcript cloneShallow()
Description copied from class: Marker
Perform a shallow clone
- Overrides: cloneShallow in class IntervalAndSubIntervals<Exon>
-
codonNumber2Pos
public int[] codonNumber2Pos(int codonNum)
Return an array of 3 genomic positions where codon number 'codonNum' maps.
- Returns: aa2pos[0], aa2pos[1], aa2pos[2] are the coordinates (within the chromosome) of the three bases forming codon 'codonNum'. Any aa2pos[i] = -1 means that the corresponding base in the codon could not be mapped. Bases in the array are sorted by chromosome position, so aa2pos[0] < aa2pos[1] < aa2pos[2].
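A hedged sketch of the contract described for codonNumber2Pos, assuming a precomputed map cds2pos (CDS base number to chromosome position). Bases outside the map become -1, and the three positions are then sorted. With Arrays.sort any -1 entries end up first, a detail the description above does not specify. The class name is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch of codonNumber2Pos on a precomputed cds2pos map. */
class CodonPositionsSketch {

    static int[] codonNumber2Pos(int[] cds2pos, int codonNum) {
        int[] codon = new int[3];
        for (int j = 0; j < 3; j++) {
            int cdsBase = 3 * codonNum + j;
            // -1 marks a base that cannot be mapped to the chromosome
            codon[j] = (cdsBase >= 0 && cdsBase < cds2pos.length) ? cds2pos[cdsBase] : -1;
        }
        Arrays.sort(codon); // ascending chromosome order; -1 entries sort first
        return codon;
    }

    public static void main(String[] args) {
        int[] cds2pos = { 202, 201, 200, 102, 101, 100 }; // a minus-strand CDS
        System.out.println(Arrays.toString(codonNumber2Pos(cds2pos, 0))); // [200, 201, 202]
    }
}
```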
-
collapseZeroGap
public boolean collapseZeroGap()
Collapses exons having gaps of zero (i.e. exons that are followed by other exons). Does the same for CDSs and UTRs.
- Returns: true if any exon in the transcript was 'collapsed'
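A minimal sketch of zero-gap collapsing on plain {start, end} intervals (0-based, inclusive; hypothetical class name). Two exons are merged when the second starts exactly one base after the first ends. The real method also handles CDSs and UTRs and returns a boolean; in this sketch a caller can compare list sizes instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of collapseZeroGap() on {start, end} intervals. */
class CollapseZeroGapSketch {

    /** Returns merged copies of the intervals; the input array rows are not modified. */
    static List<int[]> collapseZeroGap(int[][] exons) {
        int[][] sorted = exons.clone();
        Arrays.sort(sorted, (a, b) -> Integer.compare(a[0], b[0]));
        List<int[]> out = new ArrayList<>();
        for (int[] e : sorted) {
            int[] last = out.isEmpty() ? null : out.get(out.size() - 1);
            if (last != null && e[0] == last[1] + 1) {
                last[1] = e[1]; // zero gap: extend the previous (copied) exon
            } else {
                out.add(new int[] { e[0], e[1] });
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] exons = { {100, 199}, {200, 299}, {400, 499} };
        for (int[] e : collapseZeroGap(exons)) System.out.println(e[0] + "-" + e[1]);
        // 100-299
        // 400-499
    }
}
```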
-
cpgExonBias
public double cpgExonBias()
Calculate CpG bias: number of CpG / expected[CpG]
-
cpgExons
public int cpgExons()
Count total CpG in this transcript's exons
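A sketch of the CpG count and observed/expected ratio over a transcript's exon sequences. The expected count used here, count(C) * count(G) / length, is a common definition and an assumption; the descriptions above do not say which formula snpEff uses. CpG pairs spanning two exons are not counted, and the class name is hypothetical.

```java
/**
 * Hypothetical sketch of cpgExons() / cpgExonBias() on plain exon sequences.
 * Assumption: expected[CpG] = count(C) * count(G) / total length.
 */
class CpgSketch {

    /** Count "CG" dinucleotides inside each exon (pairs across exon boundaries are ignored). */
    static int cpgExons(String[] exonSeqs) {
        int cpg = 0;
        for (String exon : exonSeqs) {
            String s = exon.toUpperCase();
            for (int i = 0; i + 1 < s.length(); i++)
                if (s.charAt(i) == 'C' && s.charAt(i + 1) == 'G') cpg++;
        }
        return cpg;
    }

    /** Observed CpG / expected CpG; returns 0 when there is no C or no G (assumption). */
    static double cpgExonBias(String[] exonSeqs) {
        long c = 0, g = 0, len = 0;
        for (String exon : exonSeqs) {
            String s = exon.toUpperCase();
            len += s.length();
            for (int i = 0; i < s.length(); i++) {
                if (s.charAt(i) == 'C') c++;
                else if (s.charAt(i) == 'G') g++;
            }
        }
        if (c == 0 || g == 0) return 0.0;
        double expected = (double) c * g / len;
        return cpgExons(exonSeqs) / expected;
    }

    public static void main(String[] args) {
        String[] exons = { "ACG", "TACGT" };
        System.out.println(cpgExons(exons));    // 2
        System.out.println(cpgExonBias(exons)); // 4.0
    }
}
```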
-
createSpliceSites
public void createSpliceSites(int spliceSiteSize, int spliceRegionExonSize, int spliceRegionIntronMin, int spliceRegionIntronMax)
Find all splice sites.
-
createUpDownStream
public void createUpDownStream(int upDownLength)
Creates a list of upstream/downstream regions (for each transcript). The upstream (downstream) region is defined as the upDownLength bases before (after) the transcript.
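A sketch of strand-aware upstream/downstream regions of upDownLength bases, assuming 0-based inclusive coordinates. Clipping at position 0 is an assumption, clipping at the chromosome end is omitted, and the class name is hypothetical.

```java
/** Hypothetical sketch of the upstream/downstream regions created by createUpDownStream(). */
class UpDownStreamSketch {

    /** Returns { {upStart, upEnd}, {downStart, downEnd} } for a transcript [start, end]. */
    static int[][] upDownStream(int start, int end, boolean strandMinus, int upDownLength) {
        int[] before = { Math.max(0, start - upDownLength), start - 1 }; // lower coordinates
        int[] after = { end + 1, end + upDownLength };                   // higher coordinates
        // On the minus strand, "upstream" lies at higher coordinates
        return strandMinus ? new int[][] { after, before } : new int[][] { before, after };
    }

    public static void main(String[] args) {
        int[][] r = upDownStream(1000, 2000, true, 100);
        System.out.println("upstream " + r[0][0] + "-" + r[0][1] + ", downstream " + r[1][0] + "-" + r[1][1]);
        // upstream 2001-2100, downstream 900-999
    }
}
```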
-
deleteRedundant
public boolean deleteRedundant()
Deletes redundant exons (i.e. exons that are totally included in other exons). Does the same for CDSs and UTRs.
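A sketch of removing intervals that are totally included in another interval, using plain {start, end} pairs (hypothetical class name). Sorting by start ascending and end descending guarantees that a containing interval is seen before anything it contains, so one pass with a running maximum end is enough; of two identical intervals, one copy is kept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of deleteRedundant() on {start, end} intervals. */
class DeleteRedundantSketch {

    static List<int[]> deleteRedundant(int[][] intervals) {
        int[][] sorted = intervals.clone();
        // Start ascending, then end descending: a container precedes what it contains
        Arrays.sort(sorted, (a, b) -> a[0] != b[0] ? Integer.compare(a[0], b[0]) : Integer.compare(b[1], a[1]));
        List<int[]> kept = new ArrayList<>();
        int maxEnd = Integer.MIN_VALUE;
        for (int[] iv : sorted) {
            if (iv[1] <= maxEnd) continue; // totally included in an earlier interval
            kept.add(iv);
            maxEnd = iv[1];
        }
        return kept;
    }

    public static void main(String[] args) {
        int[][] exons = { {200, 300}, {100, 500}, {600, 700}, {600, 700} };
        for (int[] e : deleteRedundant(exons)) System.out.println(e[0] + "-" + e[1]);
        // 100-500
        // 600-700
    }
}
```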
-
findExon
public Exon findExon(int pos)
Return an exon that intersects 'pos'
-
findExon
public Exon findExon(Marker marker)
Return an exon intersecting 'marker' (first exon found)
-
findIntron
public Intron findIntron(int pos)
Return an intron overlapping position 'pos'
-
findUtr
public Utr findUtr(int pos)
Return the UTR that hits position 'pos'
- Returns: a UTR intersecting 'pos' (null if not found)
-
findUtrs
public java.util.List<Utr> findUtrs(Marker marker)
Return the UTRs that intersect 'marker' (null if not found)
-
frameCorrection
public boolean frameCorrection()
Correct exons based on frame information. E.g. if the frame information (from a genomic database file, such as a GTF) does not match the calculated frame, the exon boundaries are corrected to make them match.
This is performed in two stages: i) the first exon is corrected by adding a fake 5'UTR; ii) the other exons are corrected by changing their start (or end) coordinates.
-
get3primeUtrs
public java.util.List<Utr3prime> get3primeUtrs()
Create a list of 3 prime UTRs
-
get3primeUtrsSorted
public java.util.List<Utr3prime> get3primeUtrsSorted()
-
get5primeUtrs
public java.util.List<Utr5prime> get5primeUtrs()
Create a list of 5 prime UTRs
-
get5primeUtrsSorted
public java.util.List<Utr5prime> get5primeUtrsSorted()
-
getBioType
public BioType getBioType()
-
setBioType
public void setBioType(BioType bioType)
-
getCds
public java.util.List<Cds> getCds()
Get all CDSs
-
getCdsEnd
public int getCdsEnd()
-
getCdsStart
public int getCdsStart()
-
getDownstream
public Downstream getDownstream()
-
getExons
public java.util.Collection<Exon> getExons()
A more intuitive name for 'subintervals'
-
getFirstCodingExon
public Exon getFirstCodingExon()
Get first coding exon
-
getGene
public Gene getGene()
-
getTranscriptSupportLevel
public TranscriptSupportLevel getTranscriptSupportLevel()
-
setTranscriptSupportLevel
public void setTranscriptSupportLevel(TranscriptSupportLevel transcriptSupportLevel)
-
getTss
public Marker getTss()
Create a TSS marker
-
getUpstream
public Upstream getUpstream()
-
getUtrs
public java.util.List<Utr> getUtrs()
Get all UTRs
-
getVersion
public java.lang.String getVersion()
-
setVersion
public void setVersion(java.lang.String version)
-
hasCds
public boolean hasCds()
-
hasError
public boolean hasError()
Does this transcript have any errors?
-
hasErrorOrWarning
public boolean hasErrorOrWarning()
Does this transcript have any errors or warnings?
-
hasTranscriptSupportLevelInfo
public boolean hasTranscriptSupportLevelInfo()
-
hasWarning
public boolean hasWarning()
Does this transcript have any warnings?
-
introns
public java.util.List<Intron> introns()
Get all introns (lazy init)
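Introns can be derived as the gaps between consecutive exons. A minimal sketch on {start, end} pairs (0-based, inclusive; hypothetical class name) that skips zero-length gaps. The real introns() caches its result (lazy init), which this sketch does not model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: introns as the gaps between sorted exons. */
class IntronSketch {

    static List<int[]> introns(int[][] exons) {
        int[][] sorted = exons.clone();
        Arrays.sort(sorted, (a, b) -> Integer.compare(a[0], b[0]));
        List<int[]> out = new ArrayList<>();
        for (int i = 1; i < sorted.length; i++) {
            int start = sorted[i - 1][1] + 1;
            int end = sorted[i][0] - 1;
            if (start <= end) out.add(new int[] { start, end }); // skip zero-length gaps
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] exons = { {300, 399}, {100, 199}, {500, 599} };
        for (int[] in : introns(exons)) System.out.println(in[0] + "-" + in[1]);
        // 200-299
        // 400-499
    }
}
```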
-
isAaCheck
public boolean isAaCheck()
-
setAaCheck
public void setAaCheck(boolean aaCheck)
-
isAdjustIfParentDoesNotInclude
protected boolean isAdjustIfParentDoesNotInclude(Marker parent)
Description copied from class: Marker
Adjust parent if it does not include child?
- Overrides: isAdjustIfParentDoesNotInclude in class Marker
-
isCanonical
public boolean isCanonical()
-
setCanonical
public void setCanonical(boolean canonical)
-
isChecked
public boolean isChecked()
Has this transcript been checked against CDS/DNA/AA sequences?
-
isCorrected
public boolean isCorrected()
-
isDnaCheck
public boolean isDnaCheck()
-
setDnaCheck
public void setDnaCheck(boolean dnaCheck)
-
isDownstream
public boolean isDownstream(int pos)
-
isErrorProteinLength
public boolean isErrorProteinLength()
Check whether the coding length is a multiple of 3 in protein coding transcripts
- Returns: true on error
-
isErrorStartCodon
public boolean isErrorStartCodon()
Is the first codon a START codon?
-
isErrorStopCodonsInCds
public boolean isErrorStopCodonsInCds()
Check whether the protein sequence has STOP codons in the middle of the coding sequence
- Returns: true on error
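A sketch of the two error checks above on a plain CDS string, assuming the stop codons of the standard genetic code (TAA, TAG, TGA); snpEff's handling of other codon tables is not modeled here. Both methods return true on error, and the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of isErrorProteinLength() and isErrorStopCodonsInCds(). */
class CdsChecksSketch {

    // Stop codons of the standard genetic code (assumption: other codon tables ignored)
    static final Set<String> STOP_CODONS = new HashSet<>(Arrays.asList("TAA", "TAG", "TGA"));

    /** Error if the coding length is not a multiple of 3. */
    static boolean isErrorProteinLength(String cds) {
        return cds.length() % 3 != 0;
    }

    /** Error if any complete codon before the last one is a STOP codon. */
    static boolean isErrorStopCodonsInCds(String cds) {
        String s = cds.toUpperCase();
        int codons = s.length() / 3;
        for (int c = 0; c < codons - 1; c++) {
            if (STOP_CODONS.contains(s.substring(3 * c, 3 * c + 3))) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isErrorProteinLength("ATGAA"));          // true
        System.out.println(isErrorStopCodonsInCds("ATGTAAAAATAG")); // true
        System.out.println(isErrorStopCodonsInCds("ATGAAATAA"));    // false
    }
}
```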
-
isIntron
public boolean isIntron(int pos)
-
isProteinCoding
public boolean isProteinCoding()
-
setProteinCoding
public void setProteinCoding(boolean proteinCoding)
-
isRibosomalSlippage
public boolean isRibosomalSlippage()
-
setRibosomalSlippage
public void setRibosomalSlippage(boolean ribosomalSlippage)
-
isUpstream
public boolean isUpstream(int pos)
-
isUtr
public boolean isUtr(int pos)
-
isUtr
public boolean isUtr(Marker marker)
-
isUtr3
public boolean isUtr3(int pos)
-
isUtr5
public boolean isUtr5(int pos)
-
isWarningStopCodon
public boolean isWarningStopCodon()
Is the last codon a STOP codon?
-
markers
public Markers markers()
A list of all markers in this transcript
- Overrides: markers in class IntervalAndSubIntervals<Exon>
-
mRna
public java.lang.String mRna()
Retrieve the coding sequence AND the UTRs (mRNA = 5'UTR + CDS + 3'UTR), i.e. concatenate all exon sequences
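A sketch of building an mRNA sequence by concatenating exon sequences, assuming the chromosome sequence is available as a plus-strand string and exons are {start, end} pairs (0-based, inclusive). On the minus strand the concatenation is reverse-complemented. Names are hypothetical, and the poly-A tail and 5' cap are not modeled.

```java
import java.util.Arrays;

/** Hypothetical sketch: mRNA = concatenated exon sequences in transcription order. */
class MrnaSketch {

    static String reverseComplement(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = s.length() - 1; i >= 0; i--) {
            switch (s.charAt(i)) {
                case 'A': sb.append('T'); break;
                case 'C': sb.append('G'); break;
                case 'G': sb.append('C'); break;
                case 'T': sb.append('A'); break;
                default: sb.append('N'); // ambiguous or unknown base
            }
        }
        return sb.toString();
    }

    static String mRna(String chromosome, int[][] exons, boolean strandMinus) {
        int[][] sorted = exons.clone();
        Arrays.sort(sorted, (a, b) -> Integer.compare(a[0], b[0]));
        StringBuilder sb = new StringBuilder();
        for (int[] e : sorted) sb.append(chromosome, e[0], e[1] + 1); // end is inclusive
        String plus = sb.toString().toUpperCase();
        return strandMinus ? reverseComplement(plus) : plus;
    }

    public static void main(String[] args) {
        int[][] exons = { {6, 8}, {0, 2} };
        System.out.println(mRna("AAACCCGGGTTT", exons, false)); // AAAGGG
        System.out.println(mRna("AAACCCGGGTTT", exons, true));  // CCCTTT
    }
}
```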
-
protein
public java.lang.String protein()
Protein sequence (amino acid sequence produced by this transcript)
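Translation from CDS to protein can be sketched with the standard genetic code (NCBI translation table 1), encoded as a 64-character string indexed by codon with bases ordered T, C, A, G. Using only the standard table, keeping '*' for stop codons, and emitting 'X' for unknown bases are assumptions of this sketch, not statements about snpEff's protein(); the class name is hypothetical.

```java
/** Hypothetical sketch of CDS -> protein translation with the standard genetic code. */
class TranslateSketch {

    // Standard code, codon index = 16 * b1 + 4 * b2 + b3 with T=0, C=1, A=2, G=3
    private static final String AAS =
            "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";
    private static final String BASES = "TCAG";

    static String protein(String cds) {
        String s = cds.toUpperCase();
        StringBuilder aa = new StringBuilder();
        for (int i = 0; i + 3 <= s.length(); i += 3) { // an incomplete last codon is ignored
            int b1 = BASES.indexOf(s.charAt(i));
            int b2 = BASES.indexOf(s.charAt(i + 1));
            int b3 = BASES.indexOf(s.charAt(i + 2));
            if (b1 < 0 || b2 < 0 || b3 < 0) aa.append('X'); // unknown base, e.g. N
            else aa.append(AAS.charAt(16 * b1 + 4 * b2 + b3));
        }
        return aa.toString();
    }

    public static void main(String[] args) {
        System.out.println(protein("ATGGCCTGGTAA")); // MAW*
    }
}
```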
-
query
public Markers query(Marker marker)
Query all genomic regions that intersect 'marker'
- Overrides: query in class IntervalAndSubIntervals<Exon>
-
queryExon
public Exon queryExon(Marker interval)
Return the first exon that intersects 'interval' (null if not found)
-
rankExons
public boolean rankExons()
Assign ranks to exons
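Exon ranks follow transcription order: on the plus strand the exon with the lowest start comes first, on the minus strand the one with the highest start. A minimal sketch, assuming 1-based ranks and plain start coordinates; the class and method names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch of rankExons(): 1-based ranks in transcription order. */
class RankExonsSketch {

    /** rank[i] = rank of the exon whose start is exonStarts[i]. */
    static int[] rankExons(int[] exonStarts, boolean strandMinus) {
        Integer[] order = new Integer[exonStarts.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        // Plus strand: ascending start; minus strand: descending start
        Arrays.sort(order, (a, b) -> strandMinus
                ? Integer.compare(exonStarts[b], exonStarts[a])
                : Integer.compare(exonStarts[a], exonStarts[b]));
        int[] rank = new int[exonStarts.length];
        for (int r = 0; r < order.length; r++) rank[order[r]] = r + 1;
        return rank;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(rankExons(new int[] { 300, 100, 200 }, true))); // [1, 3, 2]
    }
}
```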
-
reset
public void reset()
Description copied from class: IntervalAndSubIntervals
Remove all intervals
- Overrides: reset in class IntervalAndSubIntervals<Exon>
-
resetCache
public void resetCache()
-
resetExons
public void resetExons()
-
sanityCheck
public ErrorWarningType sanityCheck(Variant variant)
Perform some basic checks and return the error type, if any
-
serializeParse
public void serializeParse(MarkerSerializer markerSerializer)
Parse a line from a serialized file
- Specified by: serializeParse in interface TxtSerializable
- Overrides: serializeParse in class IntervalAndSubIntervals<Exon>
-
serializeSave
public java.lang.String serializeSave(MarkerSerializer markerSerializer)
Create a string to serialize to a file
- Specified by: serializeSave in interface TxtSerializable
- Overrides: serializeSave in class IntervalAndSubIntervals<Exon>
-
sortCds
public void sortCds()
-
spliceSites
public java.util.List<SpliceSite> spliceSites()
-
toString
public java.lang.String toString(boolean full)
-
toStringAsciiArt
public java.lang.String toStringAsciiArt(boolean full)
Show a transcript as ASCII art
-
utrFromCds
public boolean utrFromCds(boolean verbose)
Calculate UTR regions from CDSs
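A sketch of deriving UTRs as the exon parts outside the coding span [cdsStart, cdsEnd]. On the plus strand the parts before the CDS are 5' UTRs and the parts after it are 3' UTRs; on the minus strand the roles swap. Plain {start, end} pairs (0-based, inclusive) and the class name are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of utrFromCds(): result.get(0) = 5' UTRs, result.get(1) = 3' UTRs. */
class UtrFromCdsSketch {

    static List<List<int[]>> utrFromCds(int[][] exons, int cdsStart, int cdsEnd, boolean strandMinus) {
        List<int[]> before = new ArrayList<>(); // exon parts at lower coordinates than the CDS
        List<int[]> after = new ArrayList<>();  // exon parts at higher coordinates than the CDS
        for (int[] e : exons) {
            if (e[0] < cdsStart) before.add(new int[] { e[0], Math.min(e[1], cdsStart - 1) });
            if (e[1] > cdsEnd) after.add(new int[] { Math.max(e[0], cdsEnd + 1), e[1] });
        }
        List<List<int[]>> result = new ArrayList<>();
        // Plus strand: 5' UTR precedes the CDS in coordinates; minus strand: it follows it
        result.add(strandMinus ? after : before);
        result.add(strandMinus ? before : after);
        return result;
    }

    public static void main(String[] args) {
        int[][] exons = { {100, 200}, {300, 400} };
        List<List<int[]>> utrs = utrFromCds(exons, 150, 350, true);
        for (int[] u : utrs.get(0)) System.out.println("5' UTR " + u[0] + "-" + u[1]); // 5' UTR 351-400
        for (int[] u : utrs.get(1)) System.out.println("3' UTR " + u[0] + "-" + u[1]); // 3' UTR 100-149
    }
}
```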
-
variantEffect
public boolean variantEffect(Variant variant, VariantEffects variantEffects)
Get some details about the effect on this transcript
- Overrides: variantEffect in class Marker