Command Line Menus

AggregatePlotter
AggregateQCStats
AlignmentEndTrimmer
AllelicExpressionComparator
AllelicExpressionDetector
AllelicExpressionMerger
AllelicExpressionRNASeqWriter
AllelicMethylationDetector
AnnotateBedWithGenes
AnnotatedVcfParser
ArupPipelineWrapper
AvatarAssembler
AvatarComparator
BamConcordance
BamContextInspector
BamHg19ToB37Converter
BamBlaster
BamMixer
Bar2Gr
Bar2USeq
Bed2Bar
BedStats
BedTabix
Bed2UCSCRefFlat
BedRegionSplitter
BisSeq
BisSeqAggregatePlotter
BisSeqErrorAdder
BisStat
BisStatRegionMaker
CalculatePerCycleErrorRate
ChIPSeq
ClusterMultiSampleVCF
CollectBamStats
CompareIntersectingRegions
CompareIntersectingVcfs
CompareParsedAlignments
ConcatinateFastas
CorrectVCFEnds
CorrelatePointData
CountChromosomes
BisulfiteConvertFastas
Consensus
CorrelationMaps
ConvertFasta2GCBarGraph
DbNSFPCoordinateConverter
DefinedRegionBisSeq
DefinedRegionDifferentialSeq
DefinedRegionRNAEditing
DefinedRegionScanSeqs
DRDSAnnotator
EnrichedRegionMaker
EstimateErrorRates
ExactBamMixer
ExportExons
ExportIntergenicRegions
ExportIntronicRegions
ExportTrimmedGenes
FastqBarcodeTagger
FastqInterlacer
FastqRenamer
FetchGenomicSequences
FindNeighboringGenes
FindOverlappingGenes
FindSharedRegions
FileCrossFilter
FileMatchJoiner
FileJoiner
FileSplitter
FilterIntersectingRegions
FilterPointData
FoundationVcfComparator
FoundationXml2Vcf
FreebayesVCFParser
GatkCalledSegmentAnnotator
GatkRunner
GeneiASEParser
Graph2Bed
GenerateOverlapStats
Gr2Bar
InosinePredict
IntersectLists
IntersectKeyWithRegions
IntersectRegions
JointGenotypeVCFParser
KeggPathwayEnrichment
KnownSpliceJunctionScanner
LofreqVCFParser
MafParser
MakeSpliceJunctionFasta
MakeTranscriptome
MaskExonsInFastaFiles
MaskRegionsInFastaFiles
MatchMates
MaxEntScanScore3
MaxEntScanScore5
MergeAdjacentRegions
MergeExonMetrics
MergeOverlappingGenes
MergePairedAlignments
MergePointData
MergeRegions
MergeSams
MergeUCSCGeneTable
MethylationArrayScanner
MethylationArrayDefinedRegionScanner
MicrosatelliteCounter
MiRNACorrelator
MpileupParser
MpileupRandomizer
MultipleReplicaScanSeqs
MultiSampleVCFFilter
MutectVCFParser
Mutect4VCFParser
NonReferenceRegionMaker
NovoalignBisulfiteParser
NovoalignIndelParser
NovoalignParser
NovoalignPairedParser
OligoTiler
OverdispersedRegionScanSeqs
ParseExonMetrics
ParseIntersectingAlignments
ParsePointDataContexts
PeakShiftFinder
PointDataManipulator
PoReCNV
Primer3Wrapper
PrintSelectColumns
QCSeqs
QueryIndexer
RandomizeTextFile
RankedSetAnalysis
ReadCoverage
ReferenceMutator
RNAEditingPileUpParser
RNAEditingScanSeqs
RNASeq
RNASeqSimulator
S3UrlMaker
Sam2Fastq
SamFastqLoader
Sam2USeq
SamAlignmentDepthMatcher
SamAlignmentExtractor
SamComparator
SamParser
SamTranscriptomeParser
SamSplitter
SamReadDepthSubSampler
SamSVFilter
SamSubsampler
ScalpelVCFParser
ScanSeqs
SubtractRegions
ScoreChromosomes
ScoreParsedBars
ScoreSequences
Sgr2Bar
Simulator
StrandedBisSeq
SRAProcessor
SubSamplePointData
Tag2Point
TempusJson2Vcf
TempusVcfComparator
Text2USeq
TomatoFarmer
Telescriptor
TNRunner
TRunner
UCSCBig2USeq
USeq2UCSCBig
USeq2Text
VarScanVCFParser
VCFBackgroundChecker
VCFBamAnnotator
VCF2Bed
VCFAnnotator
VCFCallFrequency
VCFComparator
VCFConsensus
VCFFdrEstimator
VCFMerger
VCFMpileupAnnotator
VCFVariantMaker
VCFNoCallFilter
VCFRegionFilter
VCFRegionMarker
VCFReporter
VCFSelector
VCFSpliceScanner
VCFTabix
VCF2Tsv
Wig2Bar
Wig2USeq
ScoreMethylatedRegions
ScoreEnrichedRegions
SomaticSniperVCFParser
StrelkaVCFParser

**************************************************************************************
**                            Aggregate Plotter:  Nov 2017                          **
**************************************************************************************
Fetches point data contained within each region, inverts - stranded annotation, zeros
the coordinates, sums, and window averages the values.  Useful for generating
class averages from a list of annotated regions. Use a spreadsheet app to graph the
results.
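
The stacking described above can be sketched in a few lines of Python. This is
an illustration only, not the USeq implementation: the function names, the
in-memory dict of point scores, and the centered window are assumptions made
for the example.

```python
def aggregate(regions, points):
    """Stack equally sized regions into one summed profile.

    regions: list of (start, stop, strand) on one chromosome, stop exclusive.
    points:  dict mapping base position -> score (hypothetical stand-in for
             the PointData files).
    """
    size = regions[0][1] - regions[0][0]
    summed = [0.0] * size
    for start, stop, strand in regions:
        if stop - start != size:
            raise ValueError("all regions must be the same size")
        for pos in range(start, stop):
            offset = pos - start              # zero the coordinates
            if strand == "-":
                offset = size - 1 - offset    # invert minus strand regions
            summed[offset] += points.get(pos, 0.0)
    return summed


def window_average(values, window):
    """Average each position with its neighbors in a centered window."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


profile = aggregate([(10, 14, "+"), (20, 24, "-")], {10: 1.0, 21: 5.0, 23: 2.0})
print(profile)
print(window_average(profile, 3))
```

The printed lists can be pasted into a spreadsheet for graphing, as suggested
above.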

Options:
-t PointData directories, full path, comma delimited. These should contain chromosome
       specific xxx.bar.zip files.
-b Bed file (chr, start, stop, text, score, strand(+/-/.)), full path, containing
       regions to stack. Must be all the same size.
-p Peak shift, average distance between + and - strand peaks. Will be used to shift
       the PointData by 1/2 the peak shift, defaults to 0. 
-u Strand usage, defaults to 0 (combine), 1 (use only same strand), 2 (opposite
       strand), or 3 (ignore). Use this option to select particular stranded data
       to aggregate.
-r Replace scores with 1.
-f Pad start and stop of each bed region xxx bps, defaults to 0.
-d Delog2 scores. Do it if your data is in log2 space.
-v Convert each region scores to % of total.
-n Divide scores by the number of regions.
-k Divide each region's score by this value.
-l Divide each region's score by the total number of observations.
-s Scale all regions to a particular size. Defaults to max region size.
-a Average region scores instead of summing.

Example: java -Xmx1500M -jar pathTo/USeq/Apps/AggregatePlotter -t
      /Data/PolIIRep1/,/Data/PolIIRep2/ -b /Anno/tssSites.bed -p 73 -u 1
      -l 

**************************************************************************************

**************************************************************************************
**                            Aggregate QC Stats: May 2019                          **
**************************************************************************************
Parses and aggregates alignment quality statistics from json files produced by the
SamAlignmentExtractor, MergePairedAlignments, Sam2USeq, BamConcordance and Fastq rule.

Options:
-j Directory containing xxx.json.gz files for parsing. Recurses through all other
      directories contained within.
-r Results directory for writing the summary xls spreadsheets for graphing.

Default Options:
-f FastqCount regex for parsing sample name, note the name must be identical across
the json files, defaults to (.+)_FastqCount.json.gz, case insensitive.
-s SAE regex, defaults to (.+)_SamAlignmentExtractor.json.gz
-m MPA regex, defaults to (.+)_MergePairedAlignments.json.gz
-u S2U regex, defaults to (.+)_Sam2USeq.json.gz
-b BC regex, defaults to (.+)_BamConcordance.json.gz
-p String to prepend onto output file names.
-c Don't calculate detailed region read coverage statistics, saves memory and time.
-v Print verbose debugging output.
-e Replace Exome with DNA in all file reference names.

Example: java -Xmx1G -jar pathToUSeq/Apps/AggregateQCStats -j . -r QCStats/ -p TR774_ 

**************************************************************************************

**************************************************************************************
**                            Alignment End Trimmer: Dec 2017                       **
**************************************************************************************
This application can be used to trim alignments according to the density of mismatches.
Each base of the alignment is compared to the reference sequence from the start of the
alignment to the end.  If the bases match, the score is increased by -m. If the bases
don't match, the score is decreased by -n.  The alignment position with the highest 
score is used as the new alignment end point. The cigar string, alignment position,
mpos and flags are all updated to reflect trimming. 

Notes:
1) Insertions, deletions and skips are currently not counted as matches or mismatches
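
The scoring walk described above can be sketched as follows. This is a minimal
illustration under stated assumptions, not the application's code: the function
name is hypothetical, the read and reference are taken as already aligned
strings of equal length with no indels or skips, and ties are resolved to the
earliest position.

```python
def best_trim_end(read, ref, match=1.0, mismatch=2.0):
    """Return how many leading bases to keep (0 if no prefix scores above 0).

    match and mismatch mirror the -m and -n options. A matching base adds
    `match` to the running score, a mismatching base subtracts `mismatch`.
    """
    score = 0.0
    best_score = 0.0
    best_len = 0
    for i, (read_base, ref_base) in enumerate(zip(read.upper(), ref.upper())):
        score += match if read_base == ref_base else -mismatch
        if score > best_score:        # assumption: first highest score wins
            best_score = score
            best_len = i + 1
    return best_len


# Two trailing mismatches are trimmed away, keeping the first 8 bases.
print(best_trim_end("ACGTACGTTT", "ACGTACGTAA"))
```

A read whose kept length falls below the -l minimum would then be switched to
unaligned.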

Required:
-i Path to the original alignment, sam/bam/sam.gz OK.
-r Path to the reference sequence, gzipped OK.
-o Name of the trimmed alignment output.  Output is bam and bai.

Optional:
-m Score of match. Default 1
-n Score of mismatch. Default 2
-v Verbose output.  This will write out detailed information for every trimmed read.
    It is suggested to use this option only on small test files.
-l Min length.  If the trimmed length is less than this value, the read is switched
    to unaligned. Default 10bp
-e Turn on RNA Editing mode.  A>G (forward reads) and T>C (reverse reads) are considered
   matches.
-s Turn on mismatch scoring mode. Reads with more than -x mismatches are dropped. If 
   RNA Editing mode is on, A>G (forward reads) and T>C (reverse reads) are considered 
   matches.
-x Max number of mismatches allowed in mismatch scoring mode. Default 0

Examples: 
1) java -Xmx4G -jar /path/to/AlignmentEndTrimmer -i 1000X1.bam -o 100X1.trim.bam
           -r /path/to/hg19.fasta
2) java -Xmx4G -jar /path/to/AlignmentEndTrimmer -i 1000X1.bam -o 100X1.trim.bam
           -r /path/to/hg19.fasta -m 0.5 -n 3
3) java -Xmx4G -jar /path/to/AlignmentEndTrimmer -i 1000X1.test.bam 
           -o 100X1.test.trim.bam -r /path/to/hg19.fasta -v
**************************************************************************************

**************************************************************************************
**                      Allelic Expression Comparator:  Oct 2014                    **
**************************************************************************************
Looks for changes in allelic expression between two conditions. First run the
AllelicExpressionDetector on each condition. Only snps that have the minimum # samples
and that pass the FDR and log2Rto thresholds in at least one condition are compared.

Required Options:
-d Directory containing nameSnp.obj files to compare from the AllelicExpressionDetector.
-s Save directory.

Default Options:
-f Minimum -10Log10(FDR) for individual condition allelic expression, default 13
-l Minimum abs(log2Ratio) for individual condition allelic expression, default 1
-m Minimum samples in each condition to compare Snp count data, defaults to 2.
-r Full path to R. Defaults to '/usr/bin/R'

Example: java -Xmx4G -jar pathTo/USeq/Apps/AllelicExpressionComparator -s EyeAEC/
       -d EyeAED

**************************************************************************************

**************************************************************************************
**                      Allelic Expression Detector:  Sept 2016                     **
**************************************************************************************
Application for identifying allelic expression based on a table of snps and bam
alignments that have been filtered for alignment bias.  See the ReferenceMutator and
SamComparator apps. Uses DESeq2 to identify differential expression between alleles.

Required Options:
-n Sample names to process, comma delimited, no spaces.
-b Directory containing coordinate sorted bam and index files named according to their
      sample name.
-d SNP data file containing all sample snp calls.
-e Results directory.
-s SNP map bed file from the ReferenceMutator app.
-t Tabix gz indexed bed file of exons where the name column is the gene name, see
      ExportExons and https://github.com/samtools/htslib

Default Options:
-g Minimum GenCall score, defaults to 0.25
-q Minimum alignment base quality at snp, defaults to 20
-m Minimum alignment read coverage, defaults to 4
-p Minimum number replicas with heterozygous snp to score, defaults to 3
-r Full path to R (version 3+) loaded with DESeq2, see http://www.bioconductor.org
       Type 'library(DESeq2)' in R to see if it is installed. Defaults to '/usr/bin/R'

Example: java -Xmx4G -jar pathTo/USeq/Apps/AllelicExpressionDetector -b Bam/RPENormal/
-n D002-14,D005-14,D006-14,D009-14 -d GenotypingResults.txt.gz -s SNPMap_Ref2Alt_Int.txt
-e RPENormal -t ~/Anno/b37EnsGenes7Sept2016_Exons.bed.gz

**************************************************************************************

**************************************************************************************
**                          Allelic Expression Merger :  Sept 2016                  **
**************************************************************************************
App to merge two GeneiASE tables from the AllelicExpressionDetector and the 
AllelicExpressionRNASeqWriter. Where geneName coor duplicates are found, writes out
the first table's record to the merged file.

Required Arguments:
-f First GeneiASE table (gene snp.id alt.dp ref.dp)
-s Second GeneiASE table (gene snp.id alt.dp ref.dp)
-m Merged output table results.

Example: java -Xmx4G -jar pathTo/USeq/Apps/AllelicExpressionMerger -f snpTable.txt
-s rnaSeqTable.txt -m mergedGeneiASETable.txt

**************************************************************************************

**************************************************************************************
**                       Allelic Expression RNASeq Writer :  Sept 2016               **
**************************************************************************************
Application for parsing count data for downstream Allele Specific Gene Expression
detection, e.g. GeneiASE. Avoids snvs with vars within the read length, skips INDELs.

Required Arguments:
-b Bam file with associated index from an RNASeq experiment after filtering for
      allelic alignment bias.
-v Vcf file containing snvs to use in extracting alignment counts from the bam. These
      will be filtered using the args below before saving.
-t Tabix gz indexed bed file of exons where the name column is the gene name, see
      ExportExons and https://github.com/samtools/htslib
-o Output file.

Default Arguments:
-l Read length, defaults to 50
-c Minimum alignment depth for quality bases, defaults to 10
-q Minimum base quality, defaults to 20
-a Minimum Alt alignment depth, defaults to 2
-r Minimum Ref alignment depth, defaults to 2
-m Minimum allele frequency, defaults to 0.05
-x Maximum allele frequency, defaults to 0.95
-p Don't print header
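
How the default thresholds above might combine can be sketched as a single
filter. This is an assumption-laden illustration, not the application's logic:
the function name is hypothetical, depth is taken to be ref plus alt quality
base counts, and allele frequency is taken to be alt / (ref + alt).

```python
def passes_filters(ref_depth, alt_depth, min_depth=10, min_alt=2, min_ref=2,
                   min_af=0.05, max_af=0.95):
    """Apply the -c, -a, -r, -m and -x style thresholds to one snv's counts."""
    depth = ref_depth + alt_depth          # assumption: quality bases only
    if depth < min_depth or alt_depth < min_alt or ref_depth < min_ref:
        return False
    allele_freq = alt_depth / depth        # assumption: alt / (ref + alt)
    return min_af <= allele_freq <= max_af


print(passes_filters(8, 4))     # enough depth, both alleles seen
print(passes_filters(99, 1))    # too few alt observations
```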

Example: java -Xmx4G -jar pathTo/USeq/Apps/AllelicExpressionRNASeqWriter -b proc.bam
-v lofreq.vcf.gz -t ~/Anno/b37EnsGenes7Sept2016_Exons.bed.gz -o forGeneiASE.txt.gz

**************************************************************************************

**************************************************************************************
**                     Allelic Methylation Detector:  March 2014                    **
**************************************************************************************
AMD identifies regions displaying allelic methylation, e.g. ~50% average mCG
methylation yet individual read pairs show a bimodal fraction distribution of either
fully methylated or unmethylated.

Options:
-s Save directory.
-f Fasta file directory.
-t BAM file directory containing one or more xxx.bam files with their associated xxx.bai
       index. The BAM files should be sorted by coordinate and have passed Picard
       validation.
-a Minimum number alignments per region, defaults to 15.
-e Minimum number Cs in each alignment, defaults to 6
-m Minimum region fraction methylation, defaults to 0.4
-x Maximum region fraction methylation, defaults to 0.6
-r Full path to R, defaults to /usr/bin/R
-c Converted CG context PointData directories, full path, comma delimited. These 
       should contain stranded chromosome specific xxx_-/+_.bar.zip files. One
       can also provide a single directory that contains multiple PointData
       directories. Use the ParsePointDataContexts on the output of the
       NovoalignBisulfiteParser to select CG contexts. 
-n Non-converted PointData directories, ditto. 
-b Provide a bed file (chr, start, stop,...), full path, to scan a list of regions
       instead of the genome.  See, http://genome.ucsc.edu/FAQ/FAQformat#format1

Example: java -Xmx4G -jar pathTo/USeq/Apps/AllelicMethylationDetector -s AMD
-f Fastas/ -t Bams/ -c PointData/Con -n PointData/NonCon 

**************************************************************************************

**************************************************************************************
**                           Annotate Bed With Genes   Nov 2018                     **
**************************************************************************************
Takes a bed like file and a UCSC gene table, intersects them and adds a new column to
the file with the gene names that intersect the gene exons or regions. 

Parameters:
-u UCSC RefFlat or RefSeq gene table file, full path. See,
       http://genome.ucsc.edu/cgi-bin/hgTables, (geneName name2(optional) chrom strand
       txStart txEnd cdsStart cdsEnd exonCount exonStarts exonEnds). 
-b Bed like file of regions to intersect with genes, gz/zip OK
-i Indexes defining chr start stop columns, defaults to 0,1,2 for bed format.
-r Gzipped results file.
-p Bp padding to expand the bed regions when intersecting with genes.
-g Intersect gene regions with bed, not gene exons.

Example: java -Xmx2G -jar pathTo/USeq/Apps/AnnotateBedWithGenes -p 100 -g -i 1,2,3
      -b targetRegions.bed -r targetRegionsWithGenes.txt.gz -u hg19EnsGenes.ucsc.gz
**************************************************************************************

**************************************************************************************
**                             Annotated Vcf Parser  Sept 2018                      **
**************************************************************************************
Splits VCF files that have been annotated with SnpEff w/ dbNSFP and clinvar, plus the
VCFBackgroundChecker and VCFSpliceScanner USeq apps into passing and failing records.
Use the -r option to inspect the effect of the various filters on each record. Use
the VCFRegionFilter app to restrict variants to particular gene regions.

Options:
-v File path or directory containing xxx.vcf(.gz/.zip OK) file(s) to filter.
-s Directory for saving the results.
-f Perform a candidate somatic variant processing. Setting the following overrides
        the defaults. 
-d Minimum DP alignment depth
-m Minimum AF allele frequency
-x Maximum AF allele frequency
-j Ignore the max AF filter for ACMG incidental germline gene variants.
-p Maximum population allele frequency, only applies if present.
-b Maximum fraction of BKAF samples with allele frequency >= VCF AF, only applies
       if present.
-g Splice junction types to scan.
-n Minimum difference in splice junction scores, only applies if present.
-a Comma delimited list of SnpEff ANN impact categories to select for.
-c Comma delimited list of CLINSIG terms to select for.
-e Comma delimited list of CLINSIG terms to select against.
-i Comma delimited list of VCF ID keys to select for. If the VCF ID contains one or
       more, the record is passed regardless of other filters. The match is not exact.
-o Only require, if set or present, SnpEff ANN or CLINSIG or Splice to be true to pass.
       Defaults to require that all set pass.
-r Verbose per record output.
-y Path to a config txt file for setting the above.

Example: java -jar pathToUSeq/Apps/AnnotatedVcfParser -v VCFFiles/ -s Parsed/
        -d 75 -m 0.05 -x 0.75 -j -p 0.02 -b 0.1 -g D5S,D3S,G5S,G3S -n 3.5 -a
        HIGH,MODERATE -c Pathogenic,Likely_pathogenic -i Foundation,Tempus -r

**************************************************************************************

**************************************************************************************
**                           ArupPipelineWrapper: April 2016                        **
**************************************************************************************
This app wraps ARUP's pipeline.jar app for generating QC metrics, annotating variants,
and lastly creates the review directory.

Params:
  -o Job ID
  -m Submitter
  -y Analysis type
  -w Provide a root path for web links if you'd like to make them.
  -i Minimum alignment depth
  -t Threads
  -s Sample ID, defaults to name of output directory.
  -d Path to the output directory
  -j Path to the pipeline.jar application
  -p Path to the truncated pipeline properties file needing Reference prepending.
  -c Path to the properties Reference directory containing the Data, Apps, and Bed dirs.
  -q Path to the bed file for coverage QC
  -b Path to the bed file for variant calling
  -r Path to the fasta reference file w/ index and dict
  -u Path to the unfiltered bam file
  -f Path to the filtered bam file
  -v Path to the final vcf file
  -e SnpEff genome, defaults to hg19_ucsc_20150427
  -l Upload variants to NGSWeb, defaults to not uploading

Example: java -Xmx4G -jar pathTo/USeq/Apps/ArupPipelineWrapper -o MyJobNix3 -m DNix 
    -j ~/BioApps/Pipeline-1.0-SNAPSHOT-jar-with-dependencies.jar -y TestAnaly -w 
    ~/WebLinks -i 300 -d Results -p truncPipeProp.xml -c /Pipe/Reference/
    -q 0758221_compPad25bp_v1.bed -b 0758221_v1.bed -t 24 -r 
    ~/HCIAtlatl/data/Human/B37/human_g1k_v37_decoy.fasta -u CNV36B_unfiltered.bam -f 
    CNV36B_final.bam -v CNV36B_snvIndel.vcf

**************************************************************************************

**************************************************************************************
**                              Avatar Assembler : April 2019                       **
**************************************************************************************
Tool for assembling fastq avatar datasets based on the results of three sql queries.
See https://ri-confluence.hci.utah.edu/x/KwBFAg   Login as root on hci-clingen1

Options:
-i Info
-d Diagnosis
-g Gender
-p Path to Exp dir w/o Year, e.g. /Repository/PersonData/
-y Year dirs to examine for fastq linking, defaults to 2017,2018,2019,2020,2021
-j Job dir to place linked fastq
-f Only keep patients with a diagnosis containing this String, defaults to all.
-l Create Fastq links for all patient datasets, defaults to just those with both a
    Tumor and Normal exome.
-r Patient stats output file.

Example: java -jar -Xmx2G ~/USeqApps/AvatarAssembler -p /Repository/PersonData/
    -r avatarAssembler.log.gz -i sampleInfo.txt -d sampleDiagnosis.txt -g 
    sampleGender.txt > avatarAssemblerProblemSamples.txt -f HEM -y 2018,2019

**************************************************************************************

**************************************************************************************
**                           Avatar Comparator : Feb 2019                           **
**************************************************************************************
Tool for identifying AVATAR datasets that are ready for analysis or need attention.

Options:
-j Patient job directory
-v Verbose output

Example: java -jar -Xmx2G ~/USeqApps/AvatarComparator -j AJobs/ 

**************************************************************************************

**************************************************************************************
**                             Bam Concordance: April 2019                          **
**************************************************************************************
BC calculates sample level concordance based on uncommon homozygous SNVs found in bam
files. Samples from the same person will show high similarity (>0.9). Run BC on
related sample bams (e.g. tumor & normal exomes, tumor RNASeq) plus an unrelated bam
for comparison. Mismatches passing filters are written to file. BC also generates a
variety of AF histograms for checking gender and sample contamination. Although
threaded, BC runs slowly with more than a few bams. Use the USeq ClusterMultiSampleVCF
app to check large batches of vcfs to identify the likely mismatched sample pairs.

WARNING: Mpileup does not check that the chr order is the same across samples and the
fasta reference, be certain they are or mpileup will produce incorrect counts. Use
Picard's ReorderSam app if in doubt.

Note re FFPE derived RNASeq data: A fair bit of systematic error is found in these
datasets.  As such, the RNA->DNA contrasts are low, yet the DNA->RNA are > 0.9

Options:
-r Path to a bed file of regions to interrogate.
-s Path to the samtools executable.
-f Path to an indexed reference fasta file.
-b Path to a directory containing indexed bam files.
-c Path to a tabix indexed bed file of common dbSNPs. Download 00-common_all.vcf.gz 
       from ftp://ftp.ncbi.nih.gov/snp/organisms/, grep for 'G5;' containing lines, 
       run VCF2Bed, bgzip and tabix it with https://github.com/samtools/htslib,
       defaults to no exclusion from calcs. 
-d Minimum read depth, defaults to 25.
-a Minimum allele frequency to count as a homozygous variant, defaults to 0.95
-m Minimum allele frequency to count a homozygous match, defaults to 0.9
-q Minimum base quality, defaults to 20.
-u Minimum mapping quality, defaults to 20.
-n Minimum fraction similarity to pass sample set, defaults to 0.85
-x Maximum log2Rto score for calling a sample female, defaults to 1.5
-y Minimum log2Rto score for calling a sample male, defaults to 2.5
-e Sample name to ignore in scoring similarity and gender, defaults to 'RNA'
-j Write gzipped summary stats in json format to this file.
-t Number of threads to use.  If not set, determines this based on the number of
      threads and memory available to the JVM so set the -Xmx value to the max.

Example: java -Xmx100G -jar pathTo/USeq/Apps/BamConcordance -r ~/exomeTargets.bed
      -s ~/Samtools1.3.1/bin/samtools -b ~/Patient7Bams -d 10 -a 0.9 -m 0.8 -f
      ~/B37/human_g1k_v37.fasta -c ~/B37/b38ComSnps.bed.gz -j bc.json.gz 

**************************************************************************************

**************************************************************************************
**                           Bam Context Inspector:  Oct 2016                       **
**************************************************************************************
Application for scanning the surrounding context of a set of regions for non reference
bps.  Use to flag variants with adjacent potentially confounding changes.

Required Options:
-b Sorted bam alignment file with associated index.
-r Bed file of regions to split into pass (no non refs) or fail (with a non ref bp).
-f Path to the reference fasta with an xxx.fai index

Default Options:
-p Bp of flanking bases to scan for non ref bases, defaults to 25
-c Minimum alignment coverage of a base before scanning, defaults to 6
-n Minimum non reference base count, defaults to 2
-q Minimum base quality, defaults to 13
-m Minimum alignment mapping quality, defaults to 13
-a Maximum non ref allele frequency, defaults to 0.03
-x Maximum number non ref bps in flanks, defaults to 1
-i Don't fail regions with an indel in the flanks

Example: java -Xmx4G -jar pathTo/USeq/Apps/BamContextInspector -b Bam/rPENormal.bam
-r rPENormal_calls.bed -f Ref/human_g1k_v37_decoy.fasta -q 20 -m 20 -a 2 

**************************************************************************************

**************************************************************************************
**                            Bam Hg19 to B37 Converter: Aug 2016                   **
**************************************************************************************
Cuts off the chr from each reference chromosome, converts chrM to MT, and swaps out
the header to convert hg19 alignments to b37 alignments.
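
The chromosome renaming described above can be sketched as below. This covers
only the name mapping, not the header swap, and the function name is
hypothetical.

```python
def hg19_to_b37_name(name):
    """Map an hg19 style reference name to its b37 style equivalent."""
    if name == "chrM":
        return "MT"                 # mitochondrial contig is renamed
    if name.startswith("chr"):
        return name[3:]             # cut off the leading 'chr'
    return name                     # assumption: other names pass through


for hg19 in ("chr1", "chrX", "chrM"):
    print(hg19, "->", hg19_to_b37_name(hg19))
```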

Options:
-b Bam files to convert to b37, a directory with such or a single file.
-e A bam file with a good b37 header to add to the converted hg19 alignments.

Example: java -Xmx1500M -jar pathToUSeq/Apps/BamHg19ToB37Converter -b . -e ~/b37.bam

**************************************************************************************

**************************************************************************************
**                              Bam Blaster : April 2019                            **
**************************************************************************************
Injects SNVs and INDELs from a vcf file into bam alignments. These and their mates are
extracted as fastq for realignment. For SNVs, only alignment bases that match the
reference and have a CIGAR of M are modified. Not all alignments can be modified.
Secondary/supplemental/not proper are skipped. One var per alignment. Variants within
read length distance of prior are ignored and saved to file for iterative processing.
Be sure to normalize and decompose your vcf file (e.g. https://github.com/atks/vt).
The first base of each INDEL must be reference. Use the ExactBamMixer or BamMixer to
add realignments (e.g. 10%) with the unmodified bams (e.g. 90%). Use the VCFVariantMaker
to generate random vcf variants or pull a VCF from Clinvar/ Cosmic.

Required:
-b Path to a coordinate sorted bam file with index.
-v Path to a trimmed, normalized, decomposed vcf variant file, zip/gz OK.
-r Full path to a directory to save the results.
-s Max size INDEL, defaults to 50
-d Min alignment depth, defaults to 25
-m Min distance between variants, defaults to 150

Example: java -Xmx10G -jar pathTo/USeq/Apps/BamBlaster -b ~/BMData/na12878.bam
    -r ~/BMData/BB0 -v ~/BMData/clinvar.pathogenic.SnvIndel.vcf.gz 

**************************************************************************************

**************************************************************************************
**                               Bam Mixer : April 2018                             **
**************************************************************************************
Combines bam alignment files in different fractions to simulate multiple variant
frequencies. Run BamBlaster first.

Required:
-r Path to a directory to save the results
-u Path to the xxx_unmodified.bam from your BamBlaster run
-f Path to the xxx_filtered.bam from your BamBlaster run
-p Path to your realigned paired end bam
-s Path to your realigned single end bam

Optional:
-m Fractions to mix in the variant alignments, comma delimited, no spaces, defaults to
     0.025,0.05,0.1,0.2
-v Verbose output.

Example: java -Xmx10G -jar pathTo/USeq/Apps/BamMixer -r ~/TumorSim/
    -u ~/bb_unmodified.bam -f ~/bb_filtered.bam -p ~/bb_paired.bam -s ~/bb_single.bam 

**************************************************************************************

**************************************************************************************
**                                 Bar2Gr: Nov 2006                                 **
**************************************************************************************
Converts xxx.bar to text xxx.gr files.

-f The full path directory/file text for your xxx.bar file(s).

Example: java -Xmx1500M -jar pathTo/T2/Apps/Bar2Gr -f /affy/BarFiles/ 

**************************************************************************************

**************************************************************************************
**                                 Bar 2 USeq: Mar 2011                             **
**************************************************************************************
Recurses through directories and sub directories of xxx.bar(.zip/.gz OK) files
converting them to xxx.useq files (http://useq.sourceforge.net/useqArchiveFormat.html).  

Required Options:
-f Full path directory containing bar files or directories of bar files.

Default Options:
-i Index size for slicing split chromosome data (e.g. # rows per file),
      defaults to 10000.
-r For graphs, select a style, defaults to 0
      0	Bar
      1	Stairstep
      2	HeatMap
      3	Line
-h Color, hexadecimal (e.g. #6633FF), enclose in quotations
-d Description, enclose in quotations 
-g Reset genome version, defaults to that indicated by the bar files.
-e Delete original folders, use with caution.
-m Replace bar files with new xxx.useq file in bar file directory, use with caution.

Example: java -Xmx4G -jar pathTo/USeq/Apps/Bar2USeq -f
      /AnalysisResults/ -i 5000 -h '#6633FF' -g D_rerio_Jul_2010 
      -d 'Final processed chIP-Seq results for Bcd and Hunchback, 30M reads' 

**************************************************************************************

**************************************************************************************
**                                  Bed2Bar: June 2010                              **
**************************************************************************************
Bed2Bar builds stair step graphs from bed files for display in IGB. Strands are merged
and text information removed. Will also generate a merged bed file by thresholding
the graph at the -t threshold. 

-f Full path file or directory containing xxx.bed(.zip/.gz OK) files
-v Genome version (e.g. H_sapiens_Mar_2006), get from UCSC Browser,
      http://genome.ucsc.edu/FAQ/FAQreleases.
-s Sum bed scores for overlapping regions, defaults to assigning the highest score.
-t Threshold, defaults to 0.
-g Maximum gap, defaults to 0.

Example: java -Xmx4G -jar pathTo/Apps/Bed2Bar -f /affy/res/zeste.bed.gz -v 
      M_musculus_Jul_2007 -g 1000 -s -t 100 

**************************************************************************************

**************************************************************************************
**                                BedStats: June 2010                               **
**************************************************************************************
Calculates several statistics on bed files where the name column contains a short read
sequence. This includes a read length distribution and frequencies of the 1st and last
bps. Can also trim your reads to a particular length. 

Options:
-b Full path file name for your alignment bed file or directory containing such. The
       name column should contain just your sequence or seq;qual .
-t Trim the 3' ends of your reads to the indicated length, defaults to not trimming.
-s Calculate base frequencies for the given 0 indexed base instead of the last base.
-r Reverse complement sequences before calculating stats and trimming.

Example: java -Xmx1500M -jar pathToUSeq/Apps/BedStats -b /Res/ex1.bed.gz -s 9 -t 10
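
The statistics described above can be sketched as follows. This is a minimal
illustration, not the BedStats implementation; the function name and the handling of
the optional ;qual suffix are assumptions.

```python
from collections import Counter

def bed_read_stats(names):
    """Tally read lengths and first/last base frequencies from bed name fields."""
    lengths, first, last = Counter(), Counter(), Counter()
    for name in names:
        seq = name.split(";")[0].upper()  # drop an optional ;qual suffix (assumed layout)
        if not seq:
            continue
        lengths[len(seq)] += 1
        first[seq[0]] += 1
        last[seq[-1]] += 1
    return lengths, first, last
```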

**************************************************************************************

**************************************************************************************
**                                BedTabix: Jan 2013                                **
**************************************************************************************
Converts bed files to a SAMTools compressed bed tabix format. Recursive.

Required Options:
-v Full path file or directory containing xxx.bed(.gz/.zip OK) file(s). Recursive!
-t Full path tabix directory containing the compiled bgzip and tabix executables. See
      http://sourceforge.net/projects/samtools/files/tabix/

Default Options:
-f Force overwriting of existing indexed bed files, defaults to skipping.
-d Do not delete non gzipped bed files after successful indexing, defaults to deleting.
-e Only print error messages.

Example: java -jar pathToUSeq/Apps/BedTabix -v /VarScan2/BEDFiles/
     -t /Samtools/Tabix/tabix-0.2.6/ 

**************************************************************************************

**************************************************************************************
**                            Bed 2 UCSC RefFlat   June 2015                        **
**************************************************************************************
Takes a bed file and a UCSC gene table, intersects them and assigns each bed region
to a gene, then builds a new gene table using the bed region coordinates. Note, each
bed region must intersect only one gene. Modify the input gene table
(MergeUCSCGeneTable and manually trim) based on the errors. Lastly, all bed regions
must be assigned to genes.

Parameters:
-u UCSC RefFlat or RefSeq gene table file, full path. See,
       http://genome.ucsc.edu/cgi-bin/hgTables, (geneName name2(optional) chrom strand
       txStart txEnd cdsStart cdsEnd exonCount exonStarts exonEnds). 
-b Bed file of regions to intersect with the gene table.
-t Don't remove UTRs, if present, from the gene table.
-r Results file.

Example: java -Xmx2G -jar pathTo/USeq/Apps/Bed2UCSCRefFlat -u refSeqJun2015.ucsc
      -b targetRegionsFat.bed -r targetRegionsFat.ucsc
**************************************************************************************

**************************************************************************************
**                             Bed Region Splitter : June 2017                      **
**************************************************************************************
Regions exceeding the chunk size are split into multiple parts.

Required:
-d Path to a bed file, or a directory containing such, to chunk.

Optional:
-c BP chunk size, defaults to 2000.

Example: java -Xmx4G -jar pathTo/USeq/Apps/BedRegionSplitter -d ToSplit/ -c 5000 
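
A minimal sketch of the splitting idea, assuming fixed-size chunks with a shorter
final remainder. The menu does not state how the app divides a region, so the rule
and the function name here are illustrative only.

```python
def split_region(chrom, start, stop, chunk=2000):
    """Split one bed region into pieces no longer than `chunk` bp."""
    if stop - start <= chunk:
        return [(chrom, start, stop)]
    # Assumed rule: walk the region in fixed steps; the last piece may be shorter.
    return [(chrom, s, min(s + chunk, stop)) for s in range(start, stop, chunk)]
```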

**************************************************************************************

**************************************************************************************
**                                   BisSeq: June 2016                              **
**************************************************************************************
Takes two condition (treatment and control) PointData from converted and non-converted
C bisulfite sequencing data parsed using the NovoalignBisulfiteParser and scores
regions for differential methylation using either a Fisher exact or chi-square test 
for changes in methylation.  A Benjamini & Hochberg correction is applied to convert
the pvalues to FDRs. Data is only collected on bases that meet the minimum
read coverage threshold in both datasets.  The fraction differential methylation
statistic is calculated by taking the pseudomedian of all of the log2 paired base level
fraction methylations in a given window. Overlapping windows that meet both the
FDR and pseLog2Ratio thresholds are merged when generating enriched and reduced
regions. BisSeq generates several tracks for browsing and lists of differentially
methylated regions. To examine only mCG contexts, first filter your PointData using the
ParsePointDataContexts app. 
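
The pseudomedian mentioned above is commonly computed as the Hodges-Lehmann
estimator, the median of all pairwise (Walsh) averages. A minimal sketch under that
assumption follows; whether BisSeq includes the self-pairs is not stated here.

```python
from itertools import combinations_with_replacement
from statistics import median

def pseudomedian(values):
    """Median of all pairwise averages (a + b) / 2, self-pairs included (assumption)."""
    walsh = [(a + b) / 2 for a, b in combinations_with_replacement(values, 2)]
    return median(walsh)
```

The input would be the per-base log2 ratios falling in one window.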

Options:
-s Save directory, full path.
-c Treatment converted PointData directories, full path, comma delimited. These should
       contain stranded chromosome specific xxx_-/+_.bar.zip files from the NBP app.
       One can also provide a single directory that contains multiple PointData
       directories.
-C Control converted PointData directories, ditto. 
-n Treatment non-converted PointData directories, ditto. 
-N Control non-converted PointData directories, ditto. 
-a Scramble control data.

Default Options:
-d Minimum per base read coverage, defaults to 5.
-w Window size, defaults to 250.
-m Minimum number methy C observations in window, defaults to 5. 
-f FDR threshold, defaults to 30 (-10Log10(0.001)).
-l Log2Ratio threshold, defaults to 1.585 (3x).
-r Full path to R, defaults to '/usr/bin/R'
-g Don't print graph files.

Example: java -Xmx10G -jar pathTo/USeq/Apps/BisSeq -c /Sperm/Converted -n 
      /Sperm/NonConverted -C /Egg/Converted -N /Egg/NonConverted -s /Res/BisSeq
      -w 500 -m 10 -l 2 -f 50 
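
The -f and -l thresholds are given on transformed scales. A small sketch of the two
conversions (function names are illustrative):

```python
import math

def fdr_to_score(fdr):
    """Convert an FDR (e.g. 0.01) to the -10Log10 scale used by -f."""
    return -10 * math.log10(fdr)

def fold_to_log2(fold):
    """Convert a fold change (e.g. 3x) to the log2 scale used by -l."""
    return math.log2(fold)
```

For example, an FDR of 0.01 corresponds to a score of 20, and a 3x change to a
log2 ratio of about 1.585.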

**************************************************************************************

**************************************************************************************
**                       Bis Seq Aggregate Plotter: October 2012                    **
**************************************************************************************
BSAP merges bisulfite data over equally sized regions to generate data for class
average aggregate plots of fraction methylation.  A smoothing window is also applied.
Data for unstranded, sense, and antisense are produced.

Options:
-c Converted PointData directories, full path, comma delimited. These should
       contain stranded chromosome specific xxx_-/+_.bar.zip files. One
       can also provide a single directory that contains multiple PointData
       directories. See the NovoalignBisulfiteParser app.
-n Non-converted PointData directories, ditto. 
-b Bed file (tab delim: chr start stop name score strand(+/-/.)), full path.
-i Don't invert - stranded regions, defaults to inverting.
-s Scale all regions to a particular size. Defaults to scaling to max region size.
-m Calculate individual base fractions and then take a mean, ignoring zeros, over
       the window, instead of summing the obs in the window and taking the fraction.
-o Minimum number of observations before scoring base fraction methylation, defaults
       to 8.

Example: java -Xmx1500M -jar pathTo/USeq/Apps/BisSeqAggregatePlotter -c
      /NBP/Con -n /NBP/NonCon -b /Anno/tssSites.bed -m

**************************************************************************************

**************************************************************************************
**                                BisSeqErrorAdder: June 2012                       **
**************************************************************************************
Takes PointData from converted and non-converted C bisulfite sequencing data parsed
using the NovoalignBisulfiteParser and simulates a worse non-conversion rate by 
randomly picking converted observations and making them non-converted. This is
accomplished by first measuring the non-conversion rate in the test chromosome (e.g.
chrLambda), calculating the fraction of converted C's that need to flip to non-converted
to reach the target fraction non-converted and then using this flip fraction
to modify the other chromosome data. 

Options:
-s Save directory, full path.
-c Converted PointData directories, full path, comma delimited. These should
       contain stranded chromosome specific xxx_-/+_.bar.zip files. One
       can also provide a single directory that contains multiple PointData
       directories.
-n Non-converted PointData directories, ditto. 
-f Target fraction non-converted for test chromosome, this cannot be less than the
       current fraction.
-t Test chromosome, defaults to chrLambda* .

Example: java -Xmx12G -jar pathTo/USeq/Apps/BisSeqErrorAdder -c /Data/Sperm/Converted
      -n /Data/Sperm/NonConverted -f 0.02 
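
The flip fraction follows from simple arithmetic: with a current non-converted
fraction f0 and a target f1, the fraction p of converted observations to flip
satisfies f0 + p(1 - f0) = f1. A minimal sketch, with illustrative names and counts:

```python
def flip_fraction(n_converted, n_nonconverted, target):
    """Fraction of converted observations to flip to reach the target fraction."""
    if n_converted == 0:
        raise ValueError("no converted observations to flip")
    current = n_nonconverted / (n_converted + n_nonconverted)
    if target < current:
        raise ValueError("target fraction is below the current fraction")
    return (target - current) / (1 - current)
```

With 9900 converted and 100 non-converted observations on the test chromosome, a
target of 0.02 requires flipping 100 of the 9900 converted observations.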

**************************************************************************************

**************************************************************************************
**                                 BisStat: June 2018                               **
**************************************************************************************
Takes PointData from converted and non-converted C bisulfite sequencing data parsed
using the NovoalignBisulfiteParser and generates several xxCxx context statistics and
graphs (bp and window level fraction converted Cs) for visualization in IGB.
BisStat estimates whether a given C is methylated using a binomial distribution where
the expectation can be calculated using the fraction of non-converted Cs present in the
lambda data. Binomial p-values are converted to FDRs using the Benjamini & Hochberg
method. This app requires considerable RAM (10-64G).

Options:
-s Save directory, full path.
-c Converted PointData directories, full path, comma delimited. These should
       contain stranded chromosome specific xxx_-/+_.bar.zip files. One
       can also provide a single directory that contains multiple PointData
       directories.
-n Non-converted PointData directories, ditto. 
-f Directory containing chrXXX.fasta(/.fa .zip/.gz OK) files for each chromosome.

Default Options:
-p Minimal FDR for non-converted C's to be counted as methylated, defaults to 20, a
       -10Log10(FDR = 0.01) conversion.
-e Expected fraction non-converted Cs due to partial bisulfite conversion and
       sequencing error, defaults to 0.005 .
-l Use the unmethylated lambda alignment data to set the expected fraction of
       non-converted Cs due to partial conversion and sequencing error. This is
       predicated on including a 'chrLambda' fasta sequence while aligning your data.
-o Minimum read coverage to count mC fractions, defaults to 8
-w Window size, defaults to 1000.
-m Minimum number Cs passing read coverage in window to score, defaults to 5. 
-r Full path to R, defaults to '/usr/bin/R'
-g Don't merge stranded data, defaults to running a non stranded analysis. Affects CG's.
-a First density quartile fraction methylation threshold, defaults to 0.25
-b Fourth density quartile fraction methylation threshold, defaults to 0.75

Example: java -Xmx12G -jar pathTo/USeq/Apps/BisStat -c /Data/Sperm/Converted -n 
      /Data/Sperm/NonConverted -s /Data/Sperm/BisSeq -w 5000 -m 10 -f
      /Genomes/Hg18/Fastas -o 10 
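
A minimal sketch of the two statistical steps named above: an upper-tail binomial
p-value for observing k non-converted reads out of n given the expected
non-conversion fraction, and a Benjamini & Hochberg adjustment. This illustrates the
general technique and is not BisStat's own code.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (FDRs) in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```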

**************************************************************************************

**************************************************************************************
**                           BisStat Region Maker:    Nov 2014                      **
**************************************************************************************
Takes serialized window objects from BisStat, thresholds based on the min and max
fraction methylation params and prints regions in bed format meeting the criteria.
May also build regions based on the density of a given fraction methylation quartile.
For example, to identify regions where at least 0.8 of the sequenced Cs are low
methylated (<= 0.25 default settings in BisStat) set -q 1 -m 0.8 . To find regions
with >= 0.9 of the Cs with high methylation (>= 0.75 default BisStat setting), set
-q 3 -m 0.9  . 

Options:
-s SerializedWindowObject directory from BisStat, full path.
-m Minimum fraction.
-x Maximum fraction.
-g Maximum gap, defaults to 0.
-q Merge windows based on their quartile density score, not fraction methylation, by
      indicating 1,2,or 3 for 1st, 2nd+3rd, or 4th, respectively.
-r Full path to R, defaults to '/usr/bin/R'

Example: java -Xmx4G -jar pathTo/USeq/Apps/BisStatRegionMaker -m 0.8 -x 1.0 -g 100
      -s /Data/BisStat/SerializedWindowObjects  
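
The threshold-and-merge step can be sketched as below, assuming windows on one
chromosome sorted by start and a merge whenever the gap to the previous passing
window is at most the maximum gap. Names and the tuple layout are illustrative.

```python
def windows_to_regions(windows, min_frac, max_frac, max_gap=0):
    """windows: (start, stop, fraction) tuples on one chromosome, sorted by start."""
    regions = []
    for start, stop, frac in windows:
        if not (min_frac <= frac <= max_frac):
            continue  # window fails the fraction thresholds
        if regions and start - regions[-1][1] <= max_gap:
            regions[-1][1] = max(regions[-1][1], stop)  # extend the open region
        else:
            regions.append([start, stop])
    return [tuple(r) for r in regions]
```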

**************************************************************************************

**************************************************************************************
**                    Calculate Per Cycle Error Rate : July 2015                    **
**************************************************************************************
Calculates per cycle snv error rate provided a sorted indexed bam file and a fasta
sequence file. Only checks CIGAR M bases, not masked or INDEL bases.

Required Options:
-b Full path to a coordinate sorted bam file (xxx.bam) with its associated (xxx.bai)
      index or directory containing such. Multiple files are processed independently.
-f Full path to the single fasta file you wish to use in calculating the error rate.

Default Options:
-s Perform separate first read second read analysis, defaults to merging.
-c Maximum fraction failing cycles, defaults to 0.1
-1 Maximum first read or merged read error rate, defaults to 0.01
-2 Maximum second read error rate, defaults to 0.0175
-o Write coverage statistics to this log file instead of stdout.
-j Write summary stats in json format to this file. Only stats for the first bam file
      are saved. Only separate strand analysis permitted.
-m Set minimum mapping quality for inclusion. Default: 0.
-p Require that a read be mapped in a proper pair for inclusion in error rate calculations.

Example: java -Xmx1500M -jar pathTo/USeq/Apps/CalculatePerCycleErrorRate
     -b /Data/Bam/ -f /Fastas/chrPhiX_Illumina.fasta.gz 
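
One plausible reading of the -c and -1/-2 thresholds is sketched below: a cycle fails
when its error rate exceeds the maximum, and a dataset fails when too large a
fraction of cycles fail. How the app actually combines these options is an
assumption here, as are the function names.

```python
def per_cycle_error_rates(mismatches, totals):
    """Error rate per sequencing cycle; cycles with no observations score 0."""
    return [m / t if t else 0.0 for m, t in zip(mismatches, totals)]

def passes(rates, max_error=0.01, max_fraction_failing=0.1):
    """Assumed rule: pass when the fraction of failing cycles is within the limit."""
    if not rates:
        raise ValueError("no cycles to evaluate")
    failing = sum(r > max_error for r in rates)
    return failing / len(rates) <= max_fraction_failing
```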

**************************************************************************************

**************************************************************************************
**                                   ChIPSeq: May 2014                              **
**************************************************************************************
The ChIPSeq application is a wrapper for processing ChIP-Seq data through a variety of
USeq applications. It:
   1) Parses raw alignments (sam, eland, bed, or novoalign) into binary PointData
   2) Filters PointData for duplicate alignments
   3) Makes relative ReadCoverage tracks from the PointData (reads per million mapped)
   4) Runs the PeakShiftFinder to estimate the peak shift and optimal window size
   5) Runs the MultipleReplicaScanSeqs to window scan the genome generating enrichment
        tracks using DESeq2's negative binomial pvalues and B&H's FDRs
   6) Runs the EnrichedRegionMaker to identify likely chIP peaks (FDR < 1%, >2x).

Options:
-s Save directory, full path.
-t Treatment alignment file directories, full path, comma delimited, no spaces, one
       for each biological replica. These should each contain one or more text
       alignment files (gz/zip OK) for a particular replica. Alternatively, provide
       one directory that contains multiple alignment file directories.
-c Control alignment file directories, ditto. 
-y Type of alignments, either novoalign, sam, bed, or eland (sorted or export).
-v Genome version (e.g. H_sapiens_Feb_2009, M_musculus_Jul_2007), see UCSC FAQ,
      http://genome.ucsc.edu/FAQ/FAQreleases.
-r Full path to R, defaults to '/usr/bin/R'. Be sure to install DESeq2, gplots, and
      qvalue Bioconductor packages.

Advanced Options:
-m Combine any replicas and run single replica analysis (ScanSeqs), defaults to
      using DESeq2.
-a Maximum alignment score. Defaults to 60, smaller numbers are more stringent.
-q Minimum mapping quality score. Defaults to 13, bigger numbers are more stringent.
      This is a phred-scaled posterior probability that the mapping position of the read
      is incorrect. Set to 0 for RNASeq data.
-p Peak shift, defaults to the PeakShiftFinder peak shift or 150bp. Set to 0 for
      RNASeq data.
-w Window size, defaults to the PeakShiftFinder peak shift + stnd dev or 250bp.
-i Minimum number reads in window, defaults to 10.
-f Filter bed file (tab delimited: chr start stop) to use in excluding intersecting
      windows while making peaks, e.g. satelliteRepeats.bed .
-g Print verbose output from each application.
-e Don't look for reduced regions.

Example: java -Xmx2G -jar pathTo/USeq/Apps/ChIPSeq -y eland -v D_rerio_Dec_2008 -t 
      /Data/PolIIRep1/,/Data/PolIIRep2/ -c /Data/PolIINRep1/,/Data/PolIINRep2/ -s
      /Data/Results/WtVsNull -f /Anno/satelliteRepeats.bed

**************************************************************************************

**************************************************************************************
**                          Cluster Multi Sample VCF: Nov 2014                      **
**************************************************************************************
Clusters samples based on the genotypes that differ in one or more samples.

Options:
-v Full path to a multi sample vcf file (xxx.vcf/xxx.vcf.gz). Note, Java often fails
       to parse tabix compressed vcf files.  Best to uncompress.

-r Minimum record QUAL score, defaults to 20.
-g Minimum sample genotype GT score, defaults to 20.
-i Use sample index instead of trimmed name in output.
-c Minimum # samples with given genotype, defaults to 1.

Example: java -Xmx2G -jar pathTo/USeq/Apps/ClusterMultiSampleVCF -v ~/UGP/suicide.vcf

**************************************************************************************

**************************************************************************************
**                              Collect Bam Stats: Dec 2014                         **
**************************************************************************************
Parses and plots bam alignment quality statistics from a log file containing the
output of the MergePairedAlignments and Sam2USeq apps. Will flag datasets that fail
the set thresholds.

Options:
-l Directory containing a combined log file of MergePairedAlignments and Sam2USeq,
      one per sample.

Default Options:
-x Minimum alignment coverage threshold, defaults to 10.
-c Minimum fraction interrogated bases at the coverage threshold, defaults to 0.95
-u Maximum fraction unmapped reads, defaults to 0.01
-d Maximum fraction duplicate reads, defaults to 0.15
-p Minimum fraction passing alignments, defaults to 0.8
-o Maximum fraction overlapping bps in paired alignments, defaults to 0.1

Example: java -Xmx1500M -jar pathToUSeq/Apps/CollectBamStats -l /QC/Sam2USeqLogs/
     -x 15 -c 0.9  

**************************************************************************************

**************************************************************************************
**                        Compare Intersecting Regions: Nov 2012                    **
**************************************************************************************
Compares test region file(s) against a master set of regions for intersection.
Reports the results as columns relative to the master. Assumes interbase coordinates.

Options:
-m Full path for the master bed file (tab delim: chr start stop ...).
-t Full path to the test bed file to intersect or directory of files.
-g Maximum bp gap allowed for scoring an intersection, defaults to 0 bp. Negative gaps
     force overlaps, positive gaps allow non intersecting bases between regions.

Example: java -Xmx4G -jar pathTo/Apps/CompareIntersectingRegions -g 1000
        -m /All/mergedRegions.bed.gz -t /IndividualERs/
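
The gap rule can be illustrated with interbase (start, stop) pairs. This sketch
assumes an intersection is scored when the number of bases between two regions is
at most the -g value, so abutting regions count at the default of 0; the app's exact
boundary convention is not stated in this menu.

```python
def bases_between(a, b):
    """Gap between two interbase (start, stop) regions; negative values are overlap."""
    return max(a[0], b[0]) - min(a[1], b[1])

def intersects(a, b, max_gap=0):
    """Assumed rule: score an intersection when the gap is at most max_gap."""
    return bases_between(a, b) <= max_gap
```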

**************************************************************************************

**************************************************************************************
**                       Compare Intersecting Vcfs : February 2019                  **
**************************************************************************************
Compares vcf files by creating a master list of variants and then scores each for the
presence of the same CHROM POS ALT REF in each vcf file.

Options:
-v A directory of vcf files to compare (xxx.vcf(.gz/.zip OK)).
-r Name of a spreadsheet results file, should end in xxx.txt.gz

Example: java -Xmx10G -jar pathTo/USeq/Apps/CompareIntersectingVcfs -v VCFs/
       -r comparisonVcf.txt.gz
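
A minimal sketch of the master-list comparison, keying each variant on its CHROM,
POS, REF and ALT columns. The function name and the in-memory input (file name to
text lines) are illustrative; multi-allelic ALT fields are kept as written.

```python
def variant_presence(vcfs):
    """vcfs: dict of file name -> iterable of VCF text lines."""
    keys_by_file = {}
    for name, lines in vcfs.items():
        keys = set()
        for line in lines:
            if line.startswith("#") or not line.strip():
                continue  # skip header and blank lines
            f = line.rstrip("\n").split("\t")
            keys.add((f[0], int(f[1]), f[3], f[4]))  # CHROM, POS, REF, ALT
        keys_by_file[name] = keys
    master = sorted(set().union(*keys_by_file.values()))
    # One row per variant, one True/False column per input file.
    return {k: [k in keys_by_file[n] for n in vcfs] for k in master}
```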

**************************************************************************************

**************************************************************************************
**                         Compare Parsed Alignments: Nov 2009                      **
**************************************************************************************
Compares two parsed alignments for a common distribution of snps using R's Fisher's
Exact. Run the ParseIntersectingAlignments with the same snp table first.

Options:
-a Full path file name for the first xxx.alleles file.
-b Full path file name for the second xxx.alleles file.
-d Full path directory name for writing temporary files.
-r Full path file name for R, defaults to '/usr/bin/R'

Example: java -Xmx1500M -jar pathToUSeq/Apps/CompareParsedAlignments
     -a /SeqData/lymphSNPs.alleles -b /SeqData/normalSNPs.alleles -d /temp/

**************************************************************************************

**************************************************************************************
**                              Concatinate Fastas: Oct 2010                        **
**************************************************************************************
Concatenates a directory of fasta files into a single sequence separated by a defined
number of Ns.  Outputs the merged fasta as well as bed files for the junctions and
spacers as well as a file to be used to shift UCSC gene table annotations. Use this
app to create artificial chromosomes for poorly assembled genomes. 

Options:
-d Full path directory for saving the results.
-f Full path directory containing fasta files to concatenate.
-n Number of Ns to use as a spacer, defaults to 1000.
-c Name to give the concatenated sequence, defaults to chrConcat .

Example: java -Xmx4G -jar pathTo/USeq/Apps/ConcatinateFastas -n 2000 -d
    /zv8/MergedNA_Scaffolds -f /zv8/BadFastas/ -c chrNA_Scaffold 
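
The concatenation and the bookkeeping needed to shift annotations can be sketched as
follows. The function name and the returned offset tuples are illustrative, not the
app's output format.

```python
def concatenate(seqs, n_spacer=1000):
    """seqs: list of (name, sequence). Returns the merged sequence and offsets."""
    parts, offsets, pos = [], [], 0
    spacer = "N" * n_spacer
    for i, (name, seq) in enumerate(seqs):
        if i:  # spacer goes between sequences, not before the first
            parts.append(spacer)
            pos += n_spacer
        offsets.append((name, pos, pos + len(seq)))  # interbase start, stop
        parts.append(seq)
        pos += len(seq)
    return "".join(parts), offsets
```

Each offset start is the amount to add to a coordinate on the original sequence.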

**************************************************************************************

**************************************************************************************
**                                Correct VCF Ends: July 2017                       **
**************************************************************************************
Use to correct the END=xxx tags in a Crossmap vcf . Removes any MC tags. Adds chr.

Required Options:
-v Path to the Crossmap vcf file.
-b Path to the VCF2Bed -> Crossmap bed file
-r Path to save the modified gzipped file. 

Ex: java -jar USeq_XXX/Apps/CorrectVCFEnds -v b38.vcf -b b38.bed -r finalB38.vcf.gz

**************************************************************************************

**************************************************************************************
**                                CorrelatePointData: Aug 2011                      **
**************************************************************************************
Calculates a Pearson Correlation Coefficient on the values of PointData found with the
same positions in the two datasets. Do NOT use on stair-step/ heat-map graph data.
Only use on point representation data.

Options:
-f First PointData set. This directory should contain chromosome specific xxx.bar.zip
       files, stranded or unstranded.
-s Second PointData set, ditto. 
-p Full path file name to use in saving paired scores, defaults to not printing.

Example: java -Xmx4G -jar pathTo/USeq/Apps/CorrelatePointData -f /BaseFracMethyl/X1
      -s /BaseFracMethyl/X2 
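
A minimal sketch of the calculation, assuming each dataset is reduced to a mapping
of position to value for one chromosome. Only positions present in both datasets
contribute, as described above.

```python
from math import sqrt

def pearson_shared(a, b):
    """a, b: dicts of position -> value. Pearson r over the shared positions."""
    shared = sorted(a.keys() & b.keys())
    if len(shared) < 2:
        raise ValueError("need at least two shared positions")
    x = [a[p] for p in shared]
    y = [b[p] for p in shared]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("zero variance at the shared positions")
    return sxy / sqrt(sxx * syy)
```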

**************************************************************************************

***************************************************************
*                      CountChromosomes                       *
*                                                             *
* This script drives samtools view command.  It will create   *
* a report that lists counts for standard chroms, extra       *
* chroms, phiX and adapter.  This data will be used in the    *
* ParseMetrics App.                                           *
*                                                             *
* -i Input file (bam format)                                  *
* -o Output file (.txt format)                                *
* -r Reference (hg19, hg18, mm10, mm9 etc.)                   *
* -p path to samtools                                         *
***************************************************************

**************************************************************************************
**                        Bisulfite Convert Fastas: Dec 2008                        **
**************************************************************************************
Converts all the c/C's to t/T's in fasta file(s) maintaining case.

Required Parameters:
-f Full path name for the xxx.fasta file or directory containing such.

Example: java -Xmx2000M -jar pathTo/Apps/BisulfiteConvertFastas -f /affy/Fastas/
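
The case-preserving conversion amounts to a two-character translation applied to
sequence lines only. A minimal sketch (function name illustrative):

```python
_C_TO_T = str.maketrans("cC", "tT")

def bisulfite_convert(fasta_lines):
    """Convert c/C to t/T in sequence lines, leaving '>' header lines untouched."""
    return [l if l.startswith(">") else l.translate(_C_TO_T) for l in fasta_lines]
```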

**************************************************************************************

**************************************************************************************
**                                  Consensus : March 2017                          **
**************************************************************************************
Consensus clusters alignments sharing the same unclipped start position and molecular
barcode. It then calls consensus on the clustered alignments outputting fastq for
realignment and unmodified bam records. After running, align the fastq files and merge
the new bams with those in the save directory. 

 Required arguments:
-b Path to the mate matched bam file created by FastqBarcodeTagger | cutadapt | bwa |
     MatchMates.  See FBT and MM for details. 

Optional Arguments:
-s Path to a directory to save the results, defaults to a derivative of the
     bam file.
-t Number concurrent threads to run, defaults to the max available to the jvm / 2.
-c Number of alignments to process in one chunk, defaults to 1,000,000. Adjust for the
     available RAM.
-x Maximum number of alignments to cluster before subsampling, defaults to 20000.
-q Minimum barcode base quality, defaults to 13, anything less is assigned an N.
-n Minimum number of non N barcode bases, defaults to 7, anything less is tossed.
-f Minimum fraction barcode identity for inclusion in a cluster, defaults to 0.875 .
-u Minimum read base quality for inclusion in consensus calling, defaults to 13.
-r Minimum read base fraction identity to call a consensus base, defaults to 0.66 .
     Anything less is assigned an N.

Example: java -Xmx100G -jar pathTo/USeq/Apps/Consensus -b MM/passingMM.sorted.bam 
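
How the -u and -r thresholds could act on a single alignment column is sketched
below. This is an illustration of majority-style consensus calling under stated
assumptions (N bases are ignored, ties resolve to the first most common base), not
the app's algorithm.

```python
from collections import Counter

def consensus_base(bases, quals, min_qual=13, min_fraction=0.66):
    """Call one consensus base from the clustered reads covering a position."""
    kept = [b for b, q in zip(bases, quals) if q >= min_qual and b != "N"]
    if not kept:
        return "N"
    base, count = Counter(kept).most_common(1)[0]
    return base if count / len(kept) >= min_fraction else "N"
```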

**************************************************************************************

**************************************************************************************
**                           Correlation Maps:    Nov 2007                         **
**************************************************************************************
CM calculates a correlation score for each window of genes and using permutation, an
empirical p-value.  The correlation score is the mean of all pairwise Spearman rank
correlations for the gene expression profiles in each window. If a single value is
given (unlogged!) for each gene, a mean of the scores within each window is
calculated.

To calculate p-values, X randomized datasets are created by shuffling the expression
profiles between genes, windows are scored and pooled.  P-values for each real
score are calculated based on the area under the right side of the randomized score
distribution. In addition to a spread sheet report summary, heat map xxx.bar files
for the p-values and mean correlation are created for visualization in IGB.
Note, this analysis is not stranded.  If so desired parse lists appropriately.

Parameters:
-f The full path file name for a tab delimited gene file (name,chr,start,stop,scores)
-o Region filter file, full path file name for a tab delimited region file to use in
      removing genes from correlation analysis. (chrom, start, stop).
-g Genome version for IGB visualizations (e.g. C_elegans_May_2007).
-w Window size, default is 50000bp. Setting this too small may exclude some regions.
-n Minimum number of genes required in each window, defaults to 3. Setting this too
       high will exclude some regions.
-r Number random trials, defaults to 100

Example: java -Xmx256M -jar pathTo/T2/Apps/CorrelationMaps -f /Mango/geneFile.txt
       -w 30000 -n 2 -o /Mango/operons.txt
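
The right-tail empirical p-value can be approximated as the fraction of pooled
randomized window scores at or above the real score. A minimal sketch; counting
ties and omitting a pseudocount are assumptions.

```python
def empirical_p(real_score, randomized_scores):
    """Fraction of randomized scores greater than or equal to the real score."""
    if not randomized_scores:
        raise ValueError("no randomized scores")
    return sum(r >= real_score for r in randomized_scores) / len(randomized_scores)
```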

**************************************************************************************

**************************************************************************************
**                     Convert Fasta 2 GC Bar Graphs: April 2011                    **
**************************************************************************************
Converts fasta files into graph files containing a 1 over each C in a CpG context.

Required Parameters:
-f Full path name for the directory containing xxx.fasta(.gz/.zip OK).
-v Versioned Genome (e.g. H_sapiens_Mar_2006), see UCSC Browser,
      http://genome.ucsc.edu/FAQ/FAQreleases.

Example: java -Xmx4G -jar pathTo/Apps/ConvertFasta2GCBarGraph -f /affy/Fastas/
      -v H_sapiens_Feb_2009
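
Locating the Cs in a CpG context on the given strand reduces to a scan for C
followed by G. A minimal sketch; it is case-insensitive and ignores the reverse
strand, both of which are assumptions about the app's behavior.

```python
def cpg_c_positions(seq):
    """0-based positions of each C that is immediately followed by a G."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i] == "C" and s[i + 1] == "G"]
```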

**************************************************************************************

**************************************************************************************
**                         DbNSFP Coordinate Converter: Dec 2017                    **
**************************************************************************************
Walks a directory of dbNSFP files swapping the B38 coordinates with the B37, splits
by chromosome, sorts, and writes out the final composite. 

Options:
-d Path to a directory of dbNSFP files to parse.
-s Path to a directory for saving the results.

Example: java -Xmx20G -jar pathToUSeq/Apps/DbNSFPCoordinateConverter -d DbNSFP3.5a 
     -s B37_DbNSFP3.5a 

**************************************************************************************

**************************************************************************************
**                             Defined Region Bis Seq: Dec 2013                     **
**************************************************************************************
Takes two condition (treatment and control) PointData from converted and non-converted
C bisulfite sequencing data parsed using the NovoalignBisulfiteParser and scores user
defined regions for differential methylation using either a fisher or chi-square test. 
A Benjamini & Hochberg correction is applied to convert the pvalues to FDRs. Data is
only collected on Cs that meet the minimum read coverage threshold in both datasets. 
The fraction differential methylation statistic is calculated by taking the
pseudomedian of all of the log2 paired base level fraction methylations in a given
region. To examine particular mC contexts (e.g. mCG), first filter your PointData
using the ParsePointDataContexts app.
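The Benjamini & Hochberg step mentioned above can be illustrated with a short Python sketch. This is not the code DefinedRegionBisSeq runs; the function name benjamini_hochberg is hypothetical and the sketch only shows the standard step-up adjustment of a list of p-values into FDRs.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini & Hochberg adjusted p-values (FDRs) in input order."""
    n = len(pvalues)
    # Indexes of the p-values ordered from the largest to the smallest.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for rank_from_top, i in enumerate(order):
        rank = n - rank_from_top  # 1-based rank when sorted ascending
        # Scale by n/rank and enforce monotonicity with a running minimum.
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, benjamini_hochberg([0.01, 0.04, 0.03, 0.005]) gives approximately [0.02, 0.04, 0.04, 0.02].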

Options:
-b A bed file of regions to score (tab delimited: chr start stop ...)
-s Save directory, full path.
-c Treatment converted PointData directories, full path, comma delimited. These should
       contain stranded chromosome specific xxx_-/+_.bar.zip files from the NBP app.
       One can also provide a single directory that contains multiple PointData
       directories.
-C Control converted PointData directories, ditto. 
-n Treatment non-converted PointData directories, ditto. 
-N Control non-converted PointData directories, ditto. 

Default Options:
-d Minimum per base read coverage, defaults to 5.
-r Full path to R, defaults to '/usr/bin/R'

Example: java -Xmx10G -jar pathTo/USeq/Apps/DefinedRegionBisSeq -c /Sperm/Converted
      -n /Sperm/NonConverted -C /Egg/Converted -N /Egg/NonConverted -s /Res/DRBS
      -b /Res/CpGIslands.bed 

**************************************************************************************

**************************************************************************************
**                     Defined Region Differential Seq: Aug 2016                    **
**************************************************************************************
DRDS takes sorted bam files, one per replica, minimum one per condition, minimum two
conditions (e.g. treatment and control or a time course/ multiple conditions) and
identifies differentially expressed genes using DESeq2 or SAMseq. DESeq2's rLog
normalized count data is used to hierarchically cluster the samples. Differential
splicing is estimated using a chi-square test of independence. When testing only a
few genes or regions, append these onto a full gene table so that DESeq2 can
appropriately estimate the library size and replica variance.

Options:
-s Save directory.
-c Conditions directory containing one directory for each condition with one xxx.bam
       file per biological replica and their xxx.bai indexes. 3-4 reps recommended per
       condition. The BAM files should be sorted by coordinate using Picard's SortSam.
       All splice junction coordinates should be converted to genomic coordinates, see
       USeq's SamTranscriptomeParser.
-r Full path to R (version 3+) loaded with DESeq2, samr, and gplots defaults to
       '/usr/bin/R' file, see http://www.bioconductor.org . Type 'library(DESeq2);
       library(samr); library(gplots)' in R to see if they are installed. 
-u UCSC RefFlat or RefSeq gene table file, full path. Tab delimited, see RefSeq Genes
       http://genome.ucsc.edu/cgi-bin/hgTables, (uniqueName1 name2(optional) chrom
       strand txStart txEnd cdsStart cdsEnd exonCount (commaDelimited)exonStarts
       (commaDelimited)exonEnds). Example: ENSG00000183888 C1orf64 chr1 + 16203317
       16207889 16203385 16205428 2 16203317,16205000 16203467,16207889 . NOTE:
       this table should contain only ONE composite transcript per gene (e.g. use
       Ensembl genes NOT transcripts). Use the MergeUCSCGeneTable app to collapse
       transcripts. See http://useq.sourceforge.net/usageRNASeq.html for details.
-b (Or) a bed file (chr, start, stop,...), full path, See,
       http://genome.ucsc.edu/FAQ/FAQformat#format1
-g Genome Version  (ie H_sapiens_Mar_2006), see UCSC Browser,
      http://genome.ucsc.edu/FAQ/FAQreleases.
-f Turn off DESeq2 independent filtering.

Advanced Options:
-m Mask overlapping gene annotations, recommended for well annotated genomes.
-x Max per base alignment depth, defaults to 50000. Genes containing such high
       density coverage are ignored.
-n Max number alignments per read. Defaults to 1, unique.  Assumes 'NH' tags have
      been set by processing raw alignments with the SamTranscriptomeParser.
-e Minimum number alignments per gene-region per replica, defaults to 10.
-i Score introns instead of exons.
-p Perform a stranded analysis. Only collect reads from the same strand as the
      annotation.
-j Reverse stranded analysis.  Only collect reads from the opposite strand of the
      annotation.  This setting should be used for Illumina's strand-specific
      dUTP protocol.
-k Second read's strand is flipped. Otherwise, assumes this was not done in the 
      SamTranscriptomeParser.
-t Don't delete temp files (R script, R results, Rout, etc..).
-a Run SAMseq in place of DESeq2.  This is only recommended with five or more
      replicates per condition.
-v Use these 3 -10Log10(AdjPVal) thresholds, comma delimited, no spaces, defaults
      to 10,20,30
-w Use these 3 absolute log2 ratio thresholds, comma delimited, no spaces, defaults
      to 0.585,1,1.585
-y Add in non phred AdjPVal columns, defaults to excluding.
 
Example: java -Xmx4G -jar pathTo/USeq/Apps/DefinedRegionDifferentialSeq -c
      /Data/TimeCourse/ESCells/ -s /Data/TimeCourse/DRDS -g H_sapiens_Feb_2009
     -u /Anno/mergedHg19EnsemblGenes.ucsc.gz -w 0.322,0.585,1 -y 
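
The -w thresholds are absolute log2 ratios, so it can help to see them as fold changes. A minimal Python sketch (the function name log2_ratio_to_fold is hypothetical and not part of USeq):

```python
def log2_ratio_to_fold(log2_ratio):
    """Convert an absolute log2 ratio threshold into a fold change."""
    return 2.0 ** abs(log2_ratio)

# The default -w thresholds 0.585, 1, 1.585 correspond to roughly 1.5x, 2x, and 3x.
default_folds = [round(log2_ratio_to_fold(x), 2) for x in (0.585, 1, 1.585)]
```

The thresholds in the example above (0.322, 0.585, 1) correspond to roughly 1.25x, 1.5x, and 2x.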

**************************************************************************************

**************************************************************************************
**                           Defined Region RNA Editing: April 2014                 **
**************************************************************************************
DRRE scores regions for the pseudomedian of the base fraction edits as well as the
probability that the observations occurred by chance using a permutation test based on
the chiSquare goodness of fit statistic. 
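
As an illustration of the pseudomedian, here is a minimal Python sketch. It assumes the usual Hodges-Lehmann definition (the median of all pairwise averages, including each value paired with itself); the exact calculation used by DRRE may differ, and the function name pseudomedian is hypothetical.

```python
from itertools import combinations_with_replacement
from statistics import median

def pseudomedian(values):
    """Median of the Walsh averages (a + b) / 2 over all pairs with a <= b by index."""
    walsh_averages = [
        (a + b) / 2.0 for a, b in combinations_with_replacement(values, 2)
    ]
    return median(walsh_averages)
```

For the base fraction edits 0.1, 0.2, and 0.9 this returns 0.35, which is less sensitive to the single high value than the mean (0.4).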

Options:
-b A bed file of regions to score (tab delimited: chr start stop ...)
-e Edited PointData directory from the RNAEditingPileUpParser.
       These should contain stranded chromosome specific xxx_-/+_.bar.zip files. One
       can also provide a single directory that contains multiple PointData
       directories. These will be merged when scanning.
-r Reference PointData directory from the RNAEditingPileUpParser. Ditto.
-a Minimum base read coverage, defaults to 5.
-t Run a stranded analysis, defaults to non-stranded.
-i Remove base fraction edits that are non zero and represented by just one edited
       base.

Example: java -Xmx4G -jar pathTo/USeq/Apps/DefinedRegionRNAEditing -b hg19UTRs.bed
-e /PointData/Edited -r /PointData/Reference 

**************************************************************************************

**************************************************************************************
**                           Defined Region Scan Seqs: March 2011                   **
**************************************************************************************
DRSS takes chromosome specific PointData xxx.bar.zip files and extracts scores under
each region to calculate several statistics including a binomial p-value, Storey
q-value FDR, an empirical FDR, a p-value for strand skew, and a chi-square test of
independence between the exon read count distributions between treatment and control
data (a test for alternative splicing). Several measures of read counts are provided
including counts for each strand, a normalized log2 ratio, and RPKMs (# reads per kb
of interrogated region per total million mapped reads). If a gene table is provided,
scores under each exon are summed to give a whole gene summary. It is also recommended
to run a gene table of introns (see the ExportIntronicRegions app) to look for
intronic retention and novel transfrags/ exons.  If one provides splice junction bed
files for treatment and control RNA-Seq data, see the NovoalignParser, splice
junctions will be scored for differential expression. This is an additional
calculation unrelated to the chi-square independence test. Lastly, if control
data is not provided, simple region sums are calculated.
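
The RPKM definition given above (reads per kb of interrogated region per million mapped reads) can be written out as a short Python sketch. The function and parameter names are hypothetical, not taken from the DRSS source.

```python
def rpkm(region_reads, region_length_bp, total_mapped_reads):
    """Reads per kb of interrogated region per million mapped reads."""
    kilobases = region_length_bp / 1_000.0
    millions_mapped = total_mapped_reads / 1_000_000.0
    return region_reads / kilobases / millions_mapped
```

For example, 500 reads over a 2000 bp region in a dataset of 10 million mapped reads gives rpkm(500, 2000, 10_000_000) = 25.0.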

Options:
-s Save directory, full path.
-t Treatment PointData directories, full path, comma delimited. These should
       contain unshifted stranded chromosome specific xxx_-/+_.bar.zip files. One
       can also provide a single directory that contains multiple PointData
       directories.
-c Control PointData directories, ditto. 
-p Peak shift, average distance between + and - strand peaks for chIP-Seq data, see
       PeakShiftFinder. For RNA-Seq set to the smallest expected fragment size. Will
       be used to shift the PointData 3' by 1/2 the peak shift.
-r Full path to R loaded with Storey's q-value library, defaults to '/usr/bin/R'
       file, see http://genomics.princeton.edu/storeylab/qvalue/
-u UCSC RefFlat or RefSeq gene table file, full path. See,
       http://genome.ucsc.edu/cgi-bin/hgTables, (name1 name2(optional) chrom strand
       txStart txEnd cdsStart cdsEnd exonCount exonStarts exonEnds)
-b (Or) a bed file (chr, start, stop,...), full path, See,
       http://genome.ucsc.edu/FAQ/FAQformat#format1

Advanced Options:
-o Don't remove overlapping exons, defaults to filtering gene annotation for overlaps.
-i Score introns instead of exons.
-f Scan for just enriched regions, defaults to look for both. Only use with chIP-Seq
       datasets where the control is input. This turns on the empFDR estimation.
-d Treatment splice junction bed file(s) from the NovoalignParser, comma delimited,
       full path.
-e Control splice junction bed file(s), comma delimited, full path.
-m Minimum number of reads in associated gene before scoring splice junctions.
       Used in estimating the expected proportion of T and scaling the log2Ratio. 
       Defaults to 100.
-w Use read score probabilities (assumes scores are > 0 and <= 1), defaults to
       assigning 1 to each read score. Experimental.

Example: java -Xmx4G -jar pathTo/USeq/Apps/DefinedRegionScanSeqs -t
      /Data/PolIIRep1/,/Data/PolIIRep2/ -c /Data/Input1/,/Data/Input2/ -s
      /Data/PolIIResults -p 100 -b /Data/selectRegions.bed -f 

**************************************************************************************

**************************************************************************************
**                                DRDS Annotator: January 2014                      **
**************************************************************************************
This application annotates DefinedRegionDifferentialSeq xlsx files using Ensembl 
biomart tab-delimited annotation files. By default, ensembl biomart output files will 
list the Ensembl gene id in the first column and Ensembl transcript id in the second 
column.  This application assumes these defaults.  It will match the gene id in the 
first column of the biomart file to the name listed in the 'IGB HyperLink' column 
found in the 'Analyzed Genes' tab of the DRDS xlsx output. All biomart columns after 
the transcript id column are added to the output file.  The data is inserted between 
the 'Alt Name' and locus columns in the 'Analyzed Genes' tab.

The biomart output files can have multiple annotation lines for each gene id.  
Currently, this app uses the first annotation line encountered.


Required Arguments:

-i Input file. Path to DRDS xlsx output file you wish to annotate 
-a Annotation file. Path to biomart annotation file. 
-o Annotated output file. Path to the annotated output file

Example: java -Xmx4G -jar pathTo/USeq/Apps/DRDSAnnotator -i geneStats.xlsx 
               -a mm10.biomart.txt -o geneStats.ann.xlsx

**************************************************************************************

**************************************************************************************
**                          Enriched Region Maker: July 2013                        **
**************************************************************************************
ERM combines windows from ScanSeqs xxx.swi files into larger enriched or reduced
regions based on one or more scores. For each score index, you must provide a minimal
score. Adjacent windows that exceed the minimum score(s) are merged and the best
window scores applied to the region. If treatment and control PointData are provided,
the best 25bp peak within each region will be identified and each ER rescored. To
select for ERs with a 1% FDR and 2x enrichment above control, follow the example
assuming score indexes 1,2,4 correspond to QValFDR, EmpFDR, and 
Log2Ratio. Note, if you are performing a static analysis comparing chIP vs chIP,
don't set thresholds on the EmpFDR, this was disabled and all of the values are zero.
To print descriptions of the score indexes, complete the command line and skip the 
-i option. Lastly, FDRs and p-values are represented in USeq in a transformed state,
as -10Log10(FDR/p-val) where 13 = 5%, 20 = 1%, etc. To select for regions with an FDR of
less than 1% you would set a threshold of 20 for the QValFDR and, if running a static
analysis, the EmpFDR. 
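
The -10Log10 transformation described above can be checked with a small Python sketch. The function names to_transformed and from_transformed are hypothetical helpers for choosing -s thresholds, not part of USeq.

```python
import math

def to_transformed(fdr_or_pvalue):
    """Convert an FDR or p-value (0 < x <= 1) into the -10Log10(x) scale."""
    return -10.0 * math.log10(fdr_or_pvalue)

def from_transformed(score):
    """Convert a -10Log10 score back into an FDR or p-value."""
    return 10.0 ** (-score / 10.0)
```

Here to_transformed(0.05) is about 13 and to_transformed(0.01) is 20, matching the values quoted above, and from_transformed(30) is 0.001.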

Options:
-f Full path file name for the serialized xxx.swi file from ScanSeqs, if a
      directory is specified, all xxx.swi files will be processed.
-s Minimal score(s) one for each score index, comma delimited, no spaces.
-i Score index(s) one for each minimum score. 

Advanced Options:
-n Make a given number of ERs, one or more, comma delimited, no spaces. Uses score
      index 0.
-m Multiply scores by -1 to make reduced regions instead of enriched regions.
-r Remove windows that intersect a list of regions. Enter a full path tab delimited
      regions file text (chr start stop) Coordinates are assumed to be zero based and
      stop inclusive. Useful for excluding regions from ER generation.
-b BP buffer to subtract and add to start and stops of regions used in filtering
      intersecting windows, defaults to 0.
-e Exclude entire ERs that intersect the -r regions, defaults to removing windows.
      This is more exclusive and will not simply punch holes in ERs but throw out
      the entire ER.
-g Max gap, defaults to the size of the window used in ScanSeqs.
-t Provide treatment PointData directories, full path, comma delimited to ID the peak
       center in each ER. These should contain the same unshifted stranded chromosome
       specific xxx_-/+_.bar.zip files used in ScanSeqs.
-c Control PointData directories, ditto. 
-p Full path to R, defaults to '/usr/bin/R', required for rescanning ERs.
-w Sub window size, defaults to 25bp.

Example: java -Xmx500M -jar pathTo/USeq/Apps/EnrichedRegionMaker -f /solexa/zeste.swi
      -i 1,2,4 -s 20,20,1 -w 50

**************************************************************************************

**************************************************************************************
**                            Estimate Error Rates: Jan 2017                        **
**************************************************************************************
EER scans an mpileup file looking for short windows of adjacent bps (default 7) where
1) each base exceeds a minimum read depth of high quality bases (>100)
2) shows little evidence of indels (<0.1), and 
3) the fraction of poor quality bps isn't excessive (<0.5). 
The non reference snv observations are then tabulated for the center base in each
window, if low (<0.1), they are assumed error and saved. For indel error calculations,
each bp is scored as above sans the indel filter and window requirement. Insertions
are counted once regardless of the size, whereas deletions are counted for every base
affected. Run this app on samples where real snvs and indels are expected to have an
allele frequency of  > ~0.5 , e.g. normal or pure single clone somatic.

Required Options:
-m Path to a normal sample mpileup file (gz/zip OK), 'samtools mpileup -B -q 20 -d 
     1000000 -f $fastaIndex -l $bedFile *.bam | gzip > mpileup.gz' Multiple samples
     in the file are merged.

Default Options:
-b Minimum base quality, default 20
-r Minimum good base coverage, default 100
-i Maximum INDEL allele freq for snv counting, default 0.1
-n Maximum non reference allelic freq, default 0.1
-p Maximum failing base allele freq, default 0.5
-f Number flanking bp to define scorable region, default 3
-s Comma delimited list (zero is 1st sample, no spaces) of sample indexes to merge,
     defaults to all.
-c File path to save a count table of parsed observations, defaults to none.

Example: java -Xmx4G -jar pathToUSeq/Apps/EstimateErrorRates -m normExo.mpileup.gz
     -r 200 -i 0.15 -f 2 -s 0,3,4 -c countTable.txt
**************************************************************************************

**************************************************************************************
**                             Exact Bam Mixer : March 2019                         **
**************************************************************************************
Combines bam alignment files in different fractions to simulate multiple variant
frequencies. Run BamBlaster first. Threaded, so provide almost all the memory available
to java. The ExactBamMixer attempts to create bam files containing variants with very
similar AFs.  The BamMixer produces more of a spread of AFs.

Required:
-r Path to a directory to save the results
-u Path to the xxx_unmodified.bam from your BamBlaster run
-f Path to the xxx_filtered.bam from your BamBlaster run
-i Path to your realigned bam containing injected variants, merge the single and 
     paired end alignment files with MergeSams USeq app.
-v Path to the vcf file containing variants used to modify the BamBlaster alignments

Optional:
-m Fractions to mix in the variant alignments, comma delimited, no spaces, defaults to
     0.025,0.05,0.1,0.2
-t Number of threads to use, defaults to all
-a Minimum number alt read pairs to include an injected variant in a particular mixed
     bam, defaults to 2

Example: java -Xmx100G -jar pathTo/USeq/Apps/ExactBamMixer -r ~/TumorSim/ -v inject.vcf
    -u ~/bb_unmodified.bam -f ~/bb_filtered.bam -i ~/bb_realigned.bam

**************************************************************************************

**************************************************************************************
**                              Export Exons   Nov 2014                             **
**************************************************************************************
EE takes a UCSC Gene table and prints the exons to a bed file.

Parameters:
-g Full path file text for the UCSC Gene table.
-a Expand the size of each exon by X bp, defaults to 0
-u Remove UTRs if present, defaults to including
-n Append exon numbers to the gene name field.  This makes the bed file compatible 
      with DRDS
-f Export just 5' UTRs

Example: java -Xmx1000M -jar pathTo/T2/Apps/ExportExons -g /user/Jib/ucscPombe.txt
      -a 50
**************************************************************************************

**************************************************************************************
**                        Export Intergenic Regions    May 2007                     **
**************************************************************************************
EIR takes a gff file and uses it to mask a boolean array.  Parts of the boolean array
that are not masked are returned and represent intergenic sequences. Be sure to put in
a gff line at the stop of each chromosome noting the last base so you capture the last
intergenic region. (eg chr1 GeneDB lastBase 3600000 3600001 . + . lastBase). Base
coordinates are assumed to be stop inclusive, not interbase.

Parameters:
-g Full path file text for a gff file or directory containing such.
-t Base pairs to trim from the ends of each intergenic region, defaults to 0.
-m Minimum acceptable intergenic size, those smaller will be tossed, defaults to 60bp
-s Subtract one from the start and stop coordinates.

Example: java -Xmx1000M -jar pathTo/T2/Apps/ExportIntergenicRegions -s -m 100 -g
                 /user/Jib/GffFiles/Pombe/sanger.gff
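
The boolean array masking idea can be sketched in Python as follows. This is an illustration under stated assumptions rather than the EIR implementation: the function name intergenic_regions is hypothetical, annotations are passed as (start, stop) pairs with stop inclusive coordinates, and the chromosome length is given directly instead of through a lastBase gff line.

```python
def intergenic_regions(chrom_length, annotations, min_size=60, trim=0):
    """Return unmasked (start, stop) runs, zero based and stop inclusive."""
    masked = [False] * chrom_length
    for start, stop in annotations:
        # Mask every base covered by an annotation, clipped to the chromosome.
        for i in range(max(start, 0), min(stop, chrom_length - 1) + 1):
            masked[i] = True
    regions = []
    run_start = None
    # Walk one position past the end so a trailing unmasked run is closed.
    for i in range(chrom_length + 1):
        unmasked = i < chrom_length and not masked[i]
        if unmasked and run_start is None:
            run_start = i
        elif not unmasked and run_start is not None:
            start, stop = run_start + trim, i - 1 - trim
            if stop - start + 1 >= min_size:
                regions.append((start, stop))
            run_start = None
    return regions
```

With a 20 bp chromosome and annotations at 3-5 and 10-14, intergenic_regions(20, [(3, 5), (10, 14)], min_size=1) returns [(0, 2), (6, 9), (15, 19)].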

**************************************************************************************

**************************************************************************************
**                         Export Intronic Regions    June 2007                     **
**************************************************************************************
EIR takes a UCSC Gene table and fetches the most conservative/ smallest intronic
regions. Base coordinates are assumed to be stop inclusive, not interbase.

Parameters:
-g Full path file text for the UCSC Gene table.
-m Minimum acceptable intron size, those smaller will be tossed, defaults to 60bp
-s Subtract one from the stop coordinates of your UCSC table to convert from interbase.

Example: java -Xmx1000M -jar pathTo/T2/Apps/ExportIntronicRegions -s -m 100 -g
                 /user/Jib/ucscPombe.txt

**************************************************************************************

**************************************************************************************
**                              Export Trimmed Genes    May 2012                    **
**************************************************************************************
ETG takes a UCSC Gene table and clips each gene back to the first intron closed by a
coding sequence exon. Thus these include all of the 5'UTRs. Genes with no introns are
removed.

Parameters:
-g Full path file text for the UCSC Gene table.
-u Print just UTRs, defaults to UTRs plus 1st CDS intron with flanking exon.
-i Print just 1st CDS intron with flanking exons.

Example: java -Xmx1000M -jar pathTo/T2/Apps/ExportTrimmedGenes -u -g 
      /user/Jib/ucscPombe.txt
**************************************************************************************

**************************************************************************************
**                          Fastq Barcode Tagger: August 2018                       **
**************************************************************************************
Takes 2 or 3 fastq files (paired end reads and possibly a third containing unique 
molecular barcodes/ indexes), appends the barcode and quality to the fastq header, and
writes out the modified records. For IDT inline 2 fastq UMI data sets, the barcode is
parsed from the beginning of each fastq. Be sure to clip 5Ns from the 3' end when
adapter trimming.

Options:
-f First fastq file, .gz/.zip OK.
-s Second fastq file, .gz/.zip OK.
-b Barcode fastq file, .gz/.zip OK, or set -e
-e Parse barcodes from the first 3bp of each read and combine the two 3mers into a
      6mer barcode. 5bp are trimmed from the ends of each read to remove the UMI and
      2bp constant seq as well as any potential read through. IDT's current strategy.
-i Write interlaced fastq to stdout for direct piping to other apps
-r Directory to save the modified fastqs, defaults to the parent of -f
-l Max length of barcode, defaults to all. Use to trim 3' end.
-a Append the line number to the read name to uniquify.

Example: java -Xmx1G -jar pathToUSeq/Apps/FastqBarcodeTagger -f lob_1.fastq.gz
     -s lob_2.fastq.gz -b lob_barcode.fastq.gz -i | bwa mem -p /ref/hg19.fa 

**************************************************************************************

**************************************************************************************
**                            Fastq Interlacer: April 2016                          **
**************************************************************************************
Takes paired fastq files and writes interlaced/ interleaved fastq to stdout. 

Options:
-f First fastq file, .gz/.zip OK.
-s Second fastq file, .gz/.zip OK.

Example: java -Xmx1G -jar pathToUSeq/Apps/FastqInterlacer -f lob_1.fastq.gz
     -s lob_2.fastq.gz | cutadapt | bwa | samblaster .... 
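
For readers unfamiliar with the interlaced layout, the following Python sketch shows the idea: each four line record from the first file is followed by its mate from the second. The function names are hypothetical and this is not the FastqInterlacer code; it assumes both inputs hold the same number of records in the same order and stops at the shorter input.

```python
from itertools import islice

def fastq_records(lines):
    """Yield fastq records as lists of four lines."""
    iterator = iter(lines)
    while True:
        record = list(islice(iterator, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated fastq record")
        yield record

def interlace(first_lines, second_lines):
    """Yield lines alternating record by record: read 1, then its mate."""
    pairs = zip(fastq_records(first_lines), fastq_records(second_lines))
    for first_record, second_record in pairs:
        yield from first_record
        yield from second_record
```

Any iterable of lines works as input, for example file handles opened with gzip.open(path, "rt"), and the output lines can be written to sys.stdout with writelines.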

**************************************************************************************

**************************************************************************************
**                            Fastq Renamer: April 2018                             **
**************************************************************************************
Takes paired fastq files and replaces the header with the record count. 

Options:
-f First fastq file, .gz/.zip OK.
-s Second fastq file, .gz/.zip OK.
-d Path to a directory for saving the modified fastq files.

Example: java -Xmx1G -jar pathToUSeq/Apps/FastqRenamer -f lob_1.fastq.gz
     -s lob_2.fastq.gz -d UniquifiedFastq/ 

**************************************************************************************

**************************************************************************************
**                           FetchGenomicSequences: Feb 2013                        **
**************************************************************************************
Given a file containing genomic coordinates, fetches and saves the sequence (column
output: chrom origStart origStop fetchedStart fetchedStop completeFetch seq).

-f Full path to a file or directory containing tab delimited chrom, start,
        stop text files.  Interbase coordinates (zero based, stop excluded).
-s Full path directory text containing genomic fasta files. The fasta
        header defines the name of the sequence, not the file name. 
-b Fetch flanking bases, defaults to 0. Will set start to zero or stop to last base if
        boundaries are exceeded.
-r Reverse complement fetched sequences, defaults to returning the + genomic strand.
-a Output fasta format.

Example: java -Xmx1000M -jar pathTo/T2/Apps/FetchGenomicSequences -f /data/miRNAs.txt
      -s /genomes/human/v35.1/ -b 5000 -r   


**************************************************************************************

**************************************************************************************
**                          Find Neighboring Genes:   Nov 2008                      **
**************************************************************************************
FNG takes a list of genes in UCSC Gene Table format and intersects them with a list of
regions finding the closest gene to each region as well as all of the genes that fall
within a given neighborhood. Distance is measured from the center of the region to the
transcription start site/ 1st base position in 1st exon. See Tables link under
http://genome.ucsc.edu/ . Note, output coordinates are zero based, stop inclusive.

-g Full path file text for a tab delimited UCSC Gene Table (text chrom strand txStart
      txEnd cdsStart cdsEnd exonCount exonStarts exonEnds etc...) .
-p Full path file/directory text for tab delimited region list(s) (chr, start, stop) .
-b Size of neighborhood in bp, default is 10000 
-f Find genes that overlap neighborhood regardless of distance to TSS.
-c Only print closest genes.
-o Print neighbors on one line.

Example: java -jar pathTo/T2/Apps/FindNeighboringGenes -g /anno/hg17Ensembl.txt -p
      /affy/p53/finalPicks.txt -b 5000 -c

**************************************************************************************

**************************************************************************************
**                           Find Overlapping Genes: Oct 2010                       **
**************************************************************************************
Finds overlapping genes that converge, diverge, or contain one another given a UCSC
gene table.

Options:
-u UCSC RefFlat or RefSeq gene table file, full path. See,
       http://genome.ucsc.edu/cgi-bin/hgTables, (name1 name2(optional) chrom strand
       txStart txEnd cdsStart cdsEnd exonCount exonStarts exonEnds). NOTE:
       this table should contain only one composite transcript per gene (e.g. Use
       Ensembl genes NOT transcripts. See MergeUCSCGeneTable app.). 

Example: java -Xmx4G -jar pathTo/USeq/Apps/FindOverlappingGenes -u 
      /data/zv8EnsemblGenes.ucsc.gz

**************************************************************************************

**************************************************************************************
**                            Find Shared Regions: Dec 2018                         **
**************************************************************************************
Writes out a bed file of shared regions. Interbase coordinates.

Options:
-f First bed file (tab delimited: chr start stop ...).
-s Second bed file.
-r Results file.
-m Minimum length, defaults to 0.

Example: java -Xmx4G -jar pathTo/USeq/Apps/FindSharedRegions -f 
      /Res/firstBedFile.bed -s /Res/secondBedFile.bed -r /Res/common.bed -m 100
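
A minimal Python sketch of finding shared regions in interbase coordinates is given below. It is an illustration only: the function name shared_regions is hypothetical, regions are (chrom, start, stop) tuples, overlapping regions within one input are not merged first, and empty intersections are dropped, which may not match how FindSharedRegions treats them.

```python
def shared_regions(first, second, min_length=0):
    """Return sorted intersections of two lists of (chrom, start, stop) regions."""
    second_by_chrom = {}
    for chrom, start, stop in second:
        second_by_chrom.setdefault(chrom, []).append((start, stop))
    shared = []
    for chrom, start, stop in first:
        for other_start, other_stop in second_by_chrom.get(chrom, []):
            # Interbase (stop excluded): the overlap is [max start, min stop).
            lo, hi = max(start, other_start), min(stop, other_stop)
            if hi > lo and hi - lo >= min_length:
                shared.append((chrom, lo, hi))
    return sorted(shared)
```

Because the stop is excluded, abutting regions such as chr1 100-200 and chr1 200-300 share no bases and produce no output.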

**************************************************************************************

**************************************************************************************
**                            File Cross Filter: Sept 2017                         **
**************************************************************************************
FCF takes one or more columns in the matcher file and uses these as a key to find and
save matching lines in the file(s) to parse. Use this to pull lines from files that match
those in another. Keys must be unique. The order and number of the rows in the matcher
file are preserved; if a match is not found in the parsed file, a blank line is inserted
instead.

-m Path to a tab delimited txt (.gz/.zip OK) file to use in matching.
-a One or more column indexes in the matcher file to use as the key.
-p Path to a file or directory of files to parse (.gz/.zip OK).
-b One or more column indexes in the parse file(s) to use as a key.

Example: java -jar pathTo/USeq/Apps/FileCrossFilter -m intRegions.bed -a 0,1,2
     -p SpreadSheetData/ -b 0,1,2 
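
The key matching behavior described above can be sketched in Python. This is a hedged illustration, not the FCF source: the function name cross_filter is hypothetical, rows are lists of already split tab delimited fields, and an empty list stands in for the blank line written when no match is found.

```python
def cross_filter(matcher_rows, matcher_columns, parse_rows, parse_columns):
    """Return one parsed row per matcher row, in matcher order."""
    def make_key(row, columns):
        return tuple(row[c] for c in columns)

    lookup = {}
    for row in parse_rows:
        key = make_key(row, parse_columns)
        if key in lookup:
            raise ValueError(f"duplicate key in parse file: {key}")
        lookup[key] = row
    # An empty list marks a matcher row with no match (the blank line).
    return [lookup.get(make_key(row, matcher_columns), []) for row in matcher_rows]
```

The output always has the same number of rows as the matcher input, so it can be pasted alongside the matcher file column for column.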

**************************************************************************************

**************************************************************************************
**                            File Match Joiner:  July 2008                         **
**************************************************************************************
FMJ loads a file and a particular column containing unique entries, a key, and then
appends the key line to lines in the parsed file that match a particular column.
Useful for appending, say, chromosome coordinates to snp id data, etc.

-k Full path file text for a tab delimited txt file (key) containing unique entries.
-f Ditto but for the file to parse, can specify a directory too.
-i Collapse duplicate keys.
-j Skip duplicate keys.
-a Column index containing the unique IDs in key, defaults to 0.
-b Column index containing the unique IDs in parsers, defaults to 0.
-p Print only matches.

Example: java -jar pathTo/Apps/FileMatchJoiner -k /snpChromMap.txt -f /SNPData/
     -b 2 -p

**************************************************************************************

**************************************************************************************
**                             File Joiner: Feb 2016                                **
**************************************************************************************
Joins text files into a single file, avoiding line concatenations. This is a problem
with using 'cat * >> combine.txt'. Removes empty lines. Option to follow custom order.

Parameters:
-f Full path text for the directory containing the text files.
-o (Optional) Order the files using this comma delimited list, no spaces. Not all
         need to exist.
-c (Optional) Concatenated results file.

Example: java -jar pathTo/T2/Apps/FileJoiner -f /affy/SplitFiles/
    -o 1.fasta,2.fasta,3.fasta,4.fasta
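
The joining behaviour described above can be sketched in Python. This is only an
illustration of the idea, not FileJoiner's own code: the function name
join_text_files and its arguments are hypothetical, and the handling of the order
list (listed names first, in order, missing names skipped) is an assumption drawn
from the -o description.

```python
from pathlib import Path


def join_text_files(directory, output_path, order=None):
    """Join text files line by line so that lines never run together.

    Hypothetical helper, not part of USeq. Empty lines are dropped and every
    written line ends with exactly one newline, even when a source file
    lacks a trailing newline (the case where 'cat' fuses two lines).
    """
    directory = Path(directory)
    if order:
        # Assumption: keep the listed order and silently skip missing names.
        paths = [directory / name for name in order if (directory / name).is_file()]
    else:
        paths = sorted(p for p in directory.iterdir() if p.is_file())

    with open(output_path, "w") as out:
        for path in paths:
            with open(path) as handle:
                for line in handle:
                    line = line.rstrip("\r\n")
                    if line.strip():
                        out.write(line + "\n")
    return paths
```

Writing the output outside the input directory keeps it from being picked up as an
input on a later run.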
**************************************************************************************

**************************************************************************************
**                          File Splitter: July 2010                                **
**************************************************************************************
Splits a big text file into smaller files given a maximum number of lines.

Required Parameters:
-f Full path file text or directory for the text file(s) (.zip/.gz OK).
-n Maximum number of lines to place in each.
-g GZip split files.

Example: java -Xmx256M -jar pathTo/T2/Apps/FileSplitter -f /affy/bpmap.txt -n 50000

**************************************************************************************

**************************************************************************************
**                        Filter Intersecting Regions: Dec 2018                     **
**************************************************************************************
Flattens the mask regions and uses them to split the split file(s) into intersecting
and non intersecting regions based on the minimum fraction intersection. For UCSC gene
tables, exons are compared and if any in the gene intersect, the whole gene is moved
accordingly.

Options:
-m Full path file text for the masking bed file (tab delim: chr start stop ...).
-s Full path file or directory containing bed, gtf/gff, or ucsc gene table files to 
        split.
-t Type of files to split, indicate: bed, gff, or ucsc
-i Minimum fraction of each split region required to score as an intersection with
        the flattened mask, defaults to 1x10^-1074
-b Expand start and stop of regions to mask by xxx bps, defaults to 0

Example: java -Xmx4000M -jar pathTo/Apps/FilterIntersectingRegions -i 0.5
        -m /ArrayDesigns/repMskedDesign.bed -s /ArrayDesigns/ -t bed
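
The flatten-then-split idea can be sketched in Python for a single chromosome. This
is a minimal illustration under stated assumptions, not the app's implementation:
the function names are hypothetical, coordinates are treated as interbase
(start included, stop excluded), and the -b expansion and gene table handling are
left out.

```python
def flatten(regions):
    """Merge overlapping or abutting (start, stop) intervals on one chromosome."""
    merged = []
    for start, stop in sorted(regions):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], stop)
        else:
            merged.append([start, stop])
    return [tuple(r) for r in merged]


def fraction_covered(region, flat_mask):
    """Fraction of a region's bases that fall inside the flattened mask."""
    start, stop = region
    covered = sum(
        max(0, min(stop, m_stop) - max(start, m_start))
        for m_start, m_stop in flat_mask
    )
    return covered / (stop - start)


def split_by_mask(regions, mask, min_fraction):
    """Return (intersecting, non_intersecting) lists of regions."""
    flat_mask = flatten(mask)
    hits, misses = [], []
    for region in regions:
        if fraction_covered(region, flat_mask) >= min_fraction:
            hits.append(region)
        else:
            misses.append(region)
    return hits, misses
```

Flattening first matters because overlapping mask regions would otherwise count the
same bases twice when the covered fraction is summed.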

************************************************************************************

**************************************************************************************
**                          Filter Point Data: May 2016                             **
**************************************************************************************
FPD drops or saves observations from PointData that intersect a list of regions
      (e.g. repeats, interrogated regions).

Options:
-p Point Data directories, full path, comma delimited. These should contain
      chromosome specific xxx.bar.zip files. 
-r Full path file text for a tab delimited text file containing regions to use in
      filtering the intersecting data (chr start stop ..., interbase coordinates).
-i Select data that intersects the list of regions, defaults to selecting data that
      doesn't intersect.
-a Acceptable intersection, fraction, defaults to 0.5
-n Just calculate the number of observations after filtering, don't save any data.
-f Save directory, defaults to derivative of parent.

Example: java -Xmx1500M -jar pathTo/USeq/Apps/FilterPointData -p /data/PointData 
      -r /repeats/hg18RepeatMasker.bed -a 0.75

**************************************************************************************

**************************************************************************************
**                           Foundation Vcf Comparator: Nov 2018                    **
**************************************************************************************
FVC compares a Foundation vcf generated with the FoundationXml2Vcf to a recalled vcf.
Exact recall vars are so noted and removed. Foundation vcf with no exact but one
overlapping record can be merged with -k. Be sure to vt normalize each before running.
Recall variants failing FILTER are not saved.

Options:
-f Path to a FoundationOne vcf file, see the FoundationXml2Vcf app.
-r Path to a recalled snv/indel vcf file.
-m Path to named vcf file for saving the results.
-c Append chr if absent in chromosome name.
-e Exclude Foundation ##contig header lines.
-k Attempt to merge Foundation records that overlap a recall and are the same type. 
     Defaults to printing both.

Example: java -Xmx2G -jar pathToUSeq/Apps/FoundationVcfComparator -f /F1/TRF145.vcf
     -r /F1/TRF145_recall.vcf.gz -e -c -m /F1/TRF145_merged.vcf.gz -k 

**************************************************************************************

**************************************************************************************
**                             Foundation Xml 2 Vcf: Nov 2018                       **
**************************************************************************************
Attempts to parse xml foundation reports to vcf. This is an imprecise process with
some insertions, multi snv, and multi vars. VCF variants have not been normalized.
Consider left aligning and demultiplexing with vt. Remove PHI elements first with grep: 
grep -vwE '(MRN|FullName|FirstName|LastName|ReportPDF)' TRF123.xml > clnTRF123.xml

Options:
-x Path to a FoundationOne xml report or directory containing such.
-s Path to a directory for saving the results.
-f Path to the reference fasta with xxx.fai index
-o Skip variants that clearly fail to convert, e.g. var seq doesn't match fasta.
     Defaults to marking 'ci' in FILTER field.

Example: java -Xmx2G -jar pathToUSeq/Apps/FoundationXml2Vcf -x /F1/TRF145179.xml
     -f /Ref/human_g1k_v37.fasta -s /F1/VCF/ 

**************************************************************************************

**************************************************************************************
**                            Freebayes VCF Parser: Mar 2017                        **
**************************************************************************************
Parses Freebayes VCF files, filtering for read depth, allele frequency diff ratio, etc.
Inserts AF and DP for the tumor sample into the INFO field. Changes the sample
order to Normal and Tumor and updates the #CHROM line. Put the tumor bam first when
calling freebayes.

Required Options:
-v Full path file or directory containing xxx.vcf(.gz/.zip OK) file(s).
-t Minimum tumor allele frequency (AF), defaults to 0.
-n Maximum normal AF, defaults to 1.
-u Minimum tumor alignment depth, defaults to 0.
-o Minimum normal alignment depth, defaults to 0.
-d Minimum T-N AF difference, defaults to 0.
-r Minimum T/N AF ratio, defaults to 0.
-p Remove non PASS filter field records.

Example: java -jar pathToUSeq/Apps/FreebayesVCFParser -v /VCFFiles/ -t 0.05 -n 0.5
        -u 100 -o 20 -d 0.05 -r 2
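
How the thresholds combine for one variant can be sketched in Python. This is an
illustration only: the function name and argument names are hypothetical, the
inclusive comparisons are an assumption, and so is the treatment of a normal AF of
zero as an unbounded T/N ratio.

```python
def passes_thresholds(tumor_af, normal_af, tumor_dp, normal_dp,
                      min_tumor_af=0.0, max_normal_af=1.0,
                      min_tumor_dp=0, min_normal_dp=0,
                      min_af_diff=0.0, min_af_ratio=0.0):
    """Apply the -t, -n, -u, -o, -d and -r style thresholds to one variant."""
    # Assumption: a normal AF of zero gives an unbounded T/N ratio.
    ratio = float("inf") if normal_af == 0 else tumor_af / normal_af
    return (tumor_af >= min_tumor_af
            and normal_af <= max_normal_af
            and tumor_dp >= min_tumor_dp
            and normal_dp >= min_normal_dp
            and tumor_af - normal_af >= min_af_diff
            and ratio >= min_af_ratio)
```

With the thresholds from the example command, a variant with tumor AF 0.20 at depth
150 and normal AF 0.02 at depth 40 passes, while one with tumor AF 0.06 and normal
AF 0.04 fails on both the difference and the ratio.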

**************************************************************************************

**************************************************************************************
**                      Gatk Called Segment Annotator: December 2018                **
**************************************************************************************
Annotates GATK's CallCopyRatioSegments output with denoised copy ratio and heterozygous
allele frequency data from the tumor and matched normal samples. Enables filtering
using these values to remove copy ratio calls with high normal background. Adds 
intersecting gene names.

Required Options:
-r Results directory to save the passing and failing segments.
-s Called segment file from GATK's CallCopyRatioSegments app, e.g. xxx.called.seg
-t Tumor denoised copy ratio file, from GATK's DenoiseReadCounts app. Bgzip compress
      and tabix index it with https://github.com/samtools/htslib :
      grep -vE '(@|CONTIG)' tumor.cr.tsv > tumor.cr.txt
      ~/HTSLib/bgzip tumor.cr.txt
      ~/HTSLib/tabix -s 1 -b 2 -e 3 tumor.cr.txt.gz
-n Normal denoised copy ratio file, ditto.
-u Tumor allele frequency file, from GATK's ModelSegments app. Bgzip compress
      and tabix index it with https://github.com/samtools/htslib :
      grep -vE '(@|CONTIG)' gbm7.hets.tsv > gbm7.hets.txt
      ~/HTSLib/bgzip gbm7.hets.txt
      ~/HTSLib/tabix -s 1 -b 2 -e 2 gbm7.hets.txt.gz
-o Normal allele frequency file, ditto.
-g RefFlat UCSC gene file, run USeq's MergeUCSCGeneTable to collapse transcripts.

Default Options:
-c Minimum absolute tumor log2 copy ratio, defaults to 0.15
-x Maximum absolute normal log2 copy ratio, defaults to 0.5
-m Minimum absolute log2 TN ratio of copy ratios, defaults to 0.15
-a Maximum bp gap for intersecting a segment with a gene, defaults to 1000

Example: java -Xmx4G -jar pathTo/USeq/Apps/GatkCalledSegmentAnnotator -r AnnoResults/
       -s gbm7.called.seg -t tumor.cr.txt.gz -n normal.cr.txt.gz -u gbm7.hets.txt.gz
       -o gbm7.hets.normal.txt.gz -g ~/UCSC/hg38RefSeq_Merged.refFlat.gz -a 100 

**************************************************************************************

**************************************************************************************
**                               Gatk Runner: March 2018                            **
**************************************************************************************
Takes a bed file of target regions, splits it by the number of threads, writes out
each, executes the supplied GATK command on each split, and merges the results. Set
the -Xmx to the maximum available on the machine to enable correct cpu thread usage.

Options:
-r A regions bed file (chr, start, stop,...) to intersect, see
       http://genome.ucsc.edu/FAQ/FAQformat#format1 , gz/zip OK.
-s Path to a directory for saving the results.
-t Number of concurrent threads, overrides the default, which is set based on the
     memory and cpus available to the JVM.
-c GATK command to execute, see the example below, modify to match your environment.
     Most resources require full paths. Don't set -o or -L
-l Use lowercased l for Lofreq compatibility.
-b Add a -bamout argument and merge bam chunks.

Example: java -Xmx24G -jar pathToUSeq/Apps/GatkRunner -b -r /SS/targets.bed -s
     /SS/HC/ -c 'java -Xmx4G -jar /SS/GenomeAnalysisTK.jar -T MuTect2 
    -R /SS/human_g1k_v37.fasta --dbsnp /SS/dbsnp_138.b37.vcf 
    --cosmic /SS/v76_GRCh37_CosmicCodingMuts.vcf.gz -I:tumor /SS/sar.bam -I:normal 
    /SS/normal.bam'
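
Splitting the target regions into one chunk per thread can be sketched in Python.
This is not GatkRunner's code: the function name chunk_regions is hypothetical, and
balancing the chunks by total base pairs with a greedy largest-first assignment is
an assumption; the app's real strategy may differ.

```python
def chunk_regions(regions, num_chunks):
    """Distribute (chrom, start, stop) regions over at most num_chunks lists.

    Assumption: chunks are balanced by total base pairs with a greedy
    largest-first assignment.
    """
    chunks = [[] for _ in range(num_chunks)]
    sizes = [0] * num_chunks
    for region in sorted(regions, key=lambda r: r[2] - r[1], reverse=True):
        smallest = sizes.index(min(sizes))
        chunks[smallest].append(region)
        sizes[smallest] += region[2] - region[1]
    # Restore coordinate order inside each chunk and drop empty chunks.
    return [sorted(chunk) for chunk in chunks if chunk]
```

Each chunk would then be written to its own bed file and passed to a separate
caller process, with the per-chunk results merged at the end.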

**************************************************************************************

**************************************************************************************
**                              GeneiASE Parser:  Sept 2016                         **
**************************************************************************************
Combines the GeneiASE results file with the input data file.

Required Options:
-r GeneiASE results output file
-d GeneiASE input data file
-o Output file for the summary spreadsheet

Example: java -Xmx4G -jar pathTo/USeq/Apps/GeneiASEParser -r geneiASE.results.txt
     -d geneiASE.input.txt -o geneiASE.summary.txt

**************************************************************************************

**************************************************************************************
**                                 Graph 2 Bed: Feb 2011                            **
**************************************************************************************
Converts USeq stair step and heat map graphs into region bed files using a threshold.
Do not use this with non USeq generated graphs. Won't work with bar or point graphs.

Options:
-p Point Data directories, full path, comma delimited. Should contain chromosome
       specific xxx.bar.zip or xxx_-_.bar files. May point this to a single directory
       of such too.
-t Threshold, regions exceeding it will be saved, defaults to 0.

Example: java -Xmx1500M -jar pathTo/USeq/Apps/Graph2Bed -t 9 -p /data/ReadCoverage

**************************************************************************************

**************************************************************************************
**                         Generate Overlap Stats: Dec 2012                         **
**************************************************************************************
Merges proper paired alignments that pass a variety of checks and thresholds. Only
unambiguous pairs will be merged. Increases base calling accuracy in overlap and helps
avoid non-independent variant observations and other double counting issues. Identical
overlapping bases are assigned the higher quality scores. Disagreements are resolved
toward the higher quality base. If too close in quality, then the quality is set to 0.
Be certain your input bam/sam file(s) are sorted by query name, NOT coordinate. 

Options:
-f The full path file or directory containing raw xxx.sam(.gz/.zip OK)/.bam file(s)
      paired alignments. 
      Multiple files will be merged.

Default Options:
-a Maximum alignment score (AS:i: tag). Defaults to 120, smaller numbers are more
      stringent. Approx 30pts per mismatch for novoalignments.
-q Minimum mapping quality score, defaults to 13, larger numbers are more stringent.
      Set to 0 if processing splice junction indexed RNASeq data.
-r The second paired alignment's strand is reversed. Defaults to not reversed.
-d Maximum acceptable base pair distance for merging, defaults to 5000.
-m Don't cross check read mate coordinates, needed for merging repeat matches. Defaults
      to checking.
-l Output file name.  Write merging statistics to file instead of standard output.

Example: java -Xmx1500M -jar pathToUSeq/Apps/GenerateOverlapStats -f /Novo/Run7/
     -d 10000

**************************************************************************************

**************************************************************************************
**                                 Gr2Bar: Nov 2006                                 **
**************************************************************************************
Converts xxx.gr.zip files to chromosome specific bar files.

-f The full path directory/file text for your xxx.gr.zip file(s).
-v Genome version (ie H_sapiens_Mar_2006), get from UCSC Browser,
      http://genome.ucsc.edu/FAQ/FAQreleases
-o Orientation of GR file.  If not specified, orientation is left as '.'

Example: java -Xmx1500M -jar pathTo/T2/Apps/Gr2Bar -f /affy/GrFiles/ -v hg17 

**************************************************************************************

**************************************************************************************
**                               Inosine Predict: Aug 2010                          **
**************************************************************************************
IP estimates the likelihood of ADAR RNA editing using the multiplicative 4L,4R model
described in Eggington et al. 2010.

Options:
-f Multi fasta file containing sequence(s) to score.
-m Matrix scoring file.
-p Print an example matrix.
-o Don't include the opposite strand.
-s Save directory, defaults to parent of the fasta file.
-z Name of a zip archive to create containing the results.

Example: java -Xmx2G -jar pathTo/USeq/Apps/InosinePredict -m 
    ~/ADARMatrix/hADAR1-D.matrix.txt -f ~/SeqsToScore/candidates.fasta.gz

**************************************************************************************

**************************************************************************************
**                            Intersect Lists: Dec 2008                             **
**************************************************************************************
IL intersects two lists (of genes) and using randomization, calculates the
significance of the intersection and the fold enrichment over random. Note, duplicate
items are filtered from each list prior to analysis.

-a Full path file text for list A (or directory containing), one item per line.
-b Full path file text for list B (or directory containing), one item per line.
-t The total number of unique items from which A and B were drawn.
-n Number of permutations, defaults to 1000.
-p Print the intersection sets (common, unique to A, unique to B) to screen.

Example: java -Xmx1500M -jar pathTo/Apps/IntersectLists -a /Data/geneListA.txt -b 
     /Data/geneListB.txt -t 28356 -n 10000
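
The randomization test can be sketched in Python. This is an illustration of the
general technique, not the app's implementation: the function name and arguments are
hypothetical, and the p-value is taken here as the fraction of random trials whose
overlap is at least as large as the observed one, which is an assumption.

```python
import random


def intersect_lists(list_a, list_b, total, permutations=1000, seed=None):
    """Estimate significance and fold enrichment of the overlap of two lists."""
    set_a, set_b = set(list_a), set(list_b)  # duplicates are filtered first
    observed = len(set_a & set_b)

    rng = random.Random(seed)
    universe = range(total)
    random_counts = []
    for _ in range(permutations):
        # Draw two random sets of the same sizes from the shared universe.
        rand_a = set(rng.sample(universe, len(set_a)))
        rand_b = set(rng.sample(universe, len(set_b)))
        random_counts.append(len(rand_a & rand_b))

    mean_random = sum(random_counts) / permutations
    as_extreme = sum(1 for count in random_counts if count >= observed)
    p_value = as_extreme / permutations
    fold = observed / mean_random if mean_random > 0 else float("inf")
    return observed, p_value, fold
```

The total (-t) sets the size of the universe the random lists are drawn from, so an
understated total inflates the random overlap and hides real enrichment.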

**************************************************************************************

**************************************************************************************
**                       Intersect Key With Regions: July 2012                      **
**************************************************************************************
IR intersects lists of genomic regions (chrom start stop(inclusive)) with a key, and
assumes the lists are sorted from most confident to least confident. Multiple hits to
the same key region are ignored.

-k Full path file text for the key genomic regions file, tab delimited (chr start
      stop(inclusive)).
-r Full path file text or directory containing your region files to score.
-g Max gap, defaults to -1. A max gap of 0 = genomic regions must abut, negative
      values force overlap (ie -1= 1bp overlap, be careful not to exceed the length of
      the smaller region), positive values enable gaps (ie 1=1bp gap).
-s Subtract 1 from end coordinates.  Use for interbase.

Example: java -Xmx1500M -jar pathTo/Apps/IntersectKeyWithRegions -k /data/key.txt
      -r /data/HitLists/ 

**************************************************************************************

**************************************************************************************
**                            Intersect Regions: May 2017                           **
**************************************************************************************
IR intersects lists of regions (tab delimited: chrom start stop(inclusive)). Random
regions can also be used to calculate a p-value and fold enrichment.

-f First regions files, a single file, or a directory of files.
-s Second regions files, a single file, or a directory of files.
-g Max gap, defaults to 0. A max gap of 0 = regions must at least abut or overlap,
      negative values force overlap (ie -1= 1bp overlap, be careful not to exceed the
      length of the smaller region), positive values enable gaps (ie 1=1bp gap).
-e Score intersections where second regions are entirely contained by first regions.
-r Make random regions matched to the second regions file(s) and intersect with the
      first.  Enter either a bed file or full path directory that contains chromosome
      specific interrogated regions files (ie named: chr1, chr2 ...: chrom start stop).
-c Match GC content of second regions file(s) when selecting random regions, rather
      slow. Provide a full path directory text containing chromosome specific genomic
      sequences.
-n Number of random region trials, defaults to 1000.
-w Write intersections and differences.
-x Write paired intersections.
-p Print length distribution histogram for gaps between first and closest second.
-q Parameters for histogram, comma delimited list, no spaces:
       minimum length, maximum length, number of bins.  Defaults to -100, 2400, 100.

Example: java -Xmx1500M -jar pathTo/Apps/IntersectRegions -f /data/miRNAs.txt
      -s /data/DroshaLists/ -g 500 -n 10000 -r /data/InterrogatedRegions/
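
The max gap rule can be sketched in Python for two regions on the same chromosome.
This is one reading of the -g description, not the app's code: the function names
are hypothetical and the stop coordinates are assumed to be inclusive, as stated
above.

```python
def gap_between(region_a, region_b):
    """Bases separating two (start, stop) regions with inclusive stops.

    0 means the regions abut, a negative value is the size of the overlap,
    and a positive value is the number of bases between them.
    """
    start_a, stop_a = region_a
    start_b, stop_b = region_b
    return max(start_a, start_b) - min(stop_a, stop_b) - 1


def intersects(region_a, region_b, max_gap=0):
    """True when the gap between the regions does not exceed max_gap."""
    return gap_between(region_a, region_b) <= max_gap
```

Under this reading, a max gap of -20 can never be met by a region only 11 bp long,
which is why the option text warns against exceeding the length of the smaller
region.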


**************************************************************************************

**************************************************************************************
**                          Joint Genotype VCF Parser: Oct 2018                     **
**************************************************************************************
Splits and filters GATK joint genotyped multi sample vcf files. Use vt to decompose 
the multi alts. See https://genome.sph.umich.edu/wiki/Vt#Decompose . Replaces the AF
and DP INFO fields with the sample level values.

Required Params:
-v Path to vt decomposed GATK joint genotyped multi sample vcf file, gz/zip OK.
     ~/BioApps/vt decompose -s jointGenotyped.vcf.gz -o jointGenotyped.decomp.vcf.gz
-s Path to a directory to save the split files.

Optional Params:
-q Minimum QUAL value, defaults to 20
-d Minimum read depth based on the AD sample values, defaults to 10
-a Minimum AF allele freq, defaults to 0.2
-g Minimum GT genotype quality, defaults to 20
-f Print debugging output to screen

Example: java -Xmx2G -jar pathToUSeq/Apps/JointGenotypeVCFParser -d 20 -a 0.25 -g 30
      -f -v jointGenotyped.decomp.vcf.gz -s SplitFilteredVcfs/ -q 30

**************************************************************************************

**************************************************************************************
**                           Kegg Pathway Enrichment:  Aug 2009                     **
**************************************************************************************
KPE looks for overrepresentation of genes from a user's list in Kegg pathways using a
random permutation test. Several files are needed from http://www.genome.jp/kegg 
Gene names must be in Ensembl Gene notation and begin with ENSG.

Options:
-e Full path file text for a KeggGeneIDs : EnsemblGeneIDs file (e.g. Human 
      ftp://ftp.genome.jp/pub/kegg/genes/organisms/hsa/hsa_ensembl-hsa.list)
-p Full path file text for a KeggPathwayIDs : TextDescription file (e.g. Human 
      ftp://ftp.genome.jp/pub/kegg/pathway/map_title.tab)
-g Full path file text for a KeggGeneIDs : KeggPathwayIDs file (e.g. Human 
      ftp://ftp.genome.jp/pub/kegg/pathway/organisms/hsa/hsa_gene_map.tab)
-a Full path file text for your all interrogated Ensembl gene list (e.g. ENSG00...)
      One gene per line.
-s Full path file text for your select gene list.
-n Number of random iterations, defaults to 10000

Example: java -Xmx1500M -jar pathTo/USeq/Apps/KeggPathwayEnrichment -e 
      /Kegg/hsa_ensembl-hsa.list -p /Kegg/map_title.tab -g /Kegg/hsa_gene_map.tab
      -a /HCV/ensemblGenesWith20OrMoreReads.txt -s /HCV/upRegInHCV_Norm.txt

**************************************************************************************

**************************************************************************************
**                      Known Splice Junction Scanner : Sept 2017                   **
**************************************************************************************
Scores known splice junctions using the MaxEntScan algorithms. See Yeo and
Burge 2004, http://www.ncbi.nlm.nih.gov/pubmed/15285897 for details. 

Required Options:
-r Name of a gzipped bed file to use in saving the results, will overwrite.
-f Path to the reference fasta with associated xxx.fai index
-u UCSC RefFlat or RefSeq transcript (not merged genes) file, full path. See RefSeq 
       http://genome.ucsc.edu/cgi-bin/hgTables, (uniqueName1 name2(optional) chrom
       strand txStart txEnd cdsStart cdsEnd exonCount (commaDelimited)exonStarts
       (commaDelimited)exonEnds). Example: ENSG00000183888 C1orf64 chr1 + 16203317
       16207889 16203385 16205428 2 16203317,16205000 16203467,16207889 .
-m Full path directory name containing the me2x3acc1-9, splice5sequences and me2x5
       splice model files. See USeq/Documentation/splicemodels/ or 
       http://genes.mit.edu/burgelab/maxent/download/ 

Example: java -Xmx10G -jar ~/USeq/Apps/KnownSpliceJunctionScanner -f ~/Hg19/hg19.fasta
       -r ~/exm2.bed.gz -m ~/USeq/Documentation/splicemodels -u ~/hg19EnsTrans.ucsc.zip

**************************************************************************************

**************************************************************************************
**                             Lofreq VCF Parser: May 2018                          **
**************************************************************************************
Parses Lofreq vcf files with options for filtering for minimum QUAL, modifying the
FILTER field, removing non SNVs, and appending FORMAT info for downstream merging.

Required Params:
-v Full path file or directory containing xxx.vcf(.gz/.zip OK) file(s)

Optional Params:
-s File path to a directory for saving the modified vcfs
-m Minimum QUAL score, defaults to 0
-d Minimum DP read depth, defaults to 0
-t Minimum AF allele freq, defaults to 0
-r Minimum Alt count, defaults to 0
-i Remove non SNV records
-f Replace the FILTER field with '.'
-a Append FORMAT NORMAL TUMOR to #CHROM line and add empty columns to records
-n Mark variants failing thresholds FAIL instead of not printing

Example: java -jar pathToUSeq/Apps/LofreqVCFParser -v VCFFiles/ -m 32 -i -f -a
      -s FilteredLofreqVcfs/ -r 3 

**************************************************************************************

**************************************************************************************
**                                Maf Parser: Sept 2016                             **
**************************************************************************************
Parses and manipulates variant maf files. Provide a path to the tabix executables
(https://github.com/samtools/htslib) for TQuery lookup and IGV compatibility.

Options:
-m Path to a xxx.maf file (xxx.maf.txt and .zip/.gz OK) or directory containing such.
-o Output directory, will overwrite.
-t To tabix index the output, provide a path to the dir containing bgzip and tabix
-c Convert M chroms to MT

Example: java -Xmx4G -jar pathTo/USeq/Apps/MafParser -m MafTCGAFiles/ -o Sorted/ 
              -t ~/BioApps/HTSlib/1.3/bin/ -c 

**************************************************************************************

**************************************************************************************
**                        Make Splice Junction Fasta: Nov 2010                     **
**************************************************************************************

DEPRECATED, don't use!  See MakeTranscriptome app!
MSJF creates a multi fasta file containing sequences representing all possible linear
splice junctions. The header on each fasta is the chr_endPosExonA_startPosExonB. The
length of sequence collected from each junction is 2x the radius. A word of warning,
be very careful about the coordinate system used in the gene table to define the
start and stop of exons.  UCSC uses interbase and this is assumed in this app. Check
a few of the junctions to be sure correct splices were made. All junction sequences
are from the top (plus) strand of the genome, they are not reverse complemented. Exon
sequence shorter than the radius will be appended with Ns.

Options:
-f Fasta file directory, should contain chromosome specific xxx.fasta files.
-u UCSC gene table file, full path. See, http://genome.ucsc.edu/cgi-bin/hgTables
-s Sequence length radius.
-r Results fasta file, full path.

Example: java -Xmx1500M -jar pathTo/USeq/Apps/MakeSpliceJunctionFasta -s 32 
      -f /Genomes/Hg18/Fastas/ -u /Anno/Hg18/ucscKnownGenes.txt -r
      /Genomes/Hg18/Fastas/hg18_32_splices.fasta 

************************************************************************************

**************************************************************************************
**                           Make Transcriptome:  June 2012                         **
**************************************************************************************
Takes a UCSC ref flat table of transcripts and generates two multi fasta files of
transcripts and splices (known and theoretical). All possible unique splice junctions
are created given the exons from each gene's transcripts. In some cases this is
computationally intractable and theoretical splices from these are not complete.
Read through occurs with small exons to the next up or downstream so keep the sequence
length radius to a minimum to reduce the number of junctions. Overlapping exons are
assumed to be mutually exclusive. All sequence is from the plus genomic strand, no
reverse complementation. Interbase coordinates. This app can take a very long time to
run. Break up gene table by chromosome and run on a cluster. 

To incorporate additional splice-junctions, add a new annotation line containing two
exons representing the junction to the table. If needed, set the -s option to skip
duplicates. 

Options:
-f Fasta file directory, one per chromosome (e.g. chrX.fasta or chrX.fa, .gz/.zip OK)
-u UCSC RefFlat gene table file, full path. See,
       http://genome.ucsc.edu/cgi-bin/hgTables, (geneName transcriptName chrom strand
       txStart txEnd cdsStart cdsEnd exonCount (commaDelimited)exonStarts
       (commaDelimited)exonEnds). Example: ENSG00000183888 ENST00000329454 chr1 + 
       16203317 16207889 16203385 16205428 2 16203317,16205000 16203467,16207889 .
-r Sequence length radius. Set to the read length - 4bp.
-n Max number splices per transcript, defaults to 100000.
-m Max minutes to process each gene's splices before interrupting, defaults to 10.
-s Skip subsequent occurrences of splices with the same coordinates. Memory intensive.

Example: java -Xmx4G -jar pathTo/USeq/Apps/MakeTranscriptome -f /Genomes/Hg18/Fastas/
      -u /Anno/Hg18/ensemblGenes.txt.ucsc -r 46 -s 

************************************************************************************

**************************************************************************************
**                        Mask Exons In Fasta Files: June 2011                      **
**************************************************************************************
Replaces the exonic sequence with Ns.

Options:
-f Fasta file directory, one per chromosome (e.g. chrX.fasta or chrX.fa, .gz/.zip OK)
-u UCSC RefFlat gene table file, full path. See,
       http://genome.ucsc.edu/cgi-bin/hgTables, (geneName transcriptName chrom strand
       txStart txEnd cdsStart cdsEnd exonCount (commaDelimited)exonStarts
       (commaDelimited)exonEnds). Example: ENSG00000183888 ENST00000329454 chr1 + 
       16203317 16207889 16203385 16205428 2 16203317,16205000 16203467,16207889 .
-s Save directory, full path.

Example: java -Xmx4G -jar pathTo/USeq/Apps/MaskExonsInFastaFiles -f 
      /Genomes/Hg18/Fastas/ -u /Anno/Hg18/ensemblTranscripts.txt.ucsc -s 
      /Genomes/Hg18/MaskedFastas/

************************************************************************************

**************************************************************************************
**                       Mask Regions In Fasta Files: Aug 2016                      **
**************************************************************************************
Replaces the region (or non region) sequence with Ns. Interbase coordinates.

Options:
-f Fasta file directory, one per chromosome (e.g. chrX.fasta or chrX.fa, .gz/.zip OK)
-b Bed file of regions to mask.
-s Save directory, full path.
-r Mask sequence not in regions, reverse mask.

Example: java -Xmx4G -jar pathTo/USeq/Apps/MaskRegionsInFastaFiles -f 
      /Genomes/Hg18/Fastas/ -b /Anno/Hg18/badRegions.bed -s 
      /Genomes/Hg18/MaskedFastas/
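
Masking with interbase coordinates, including the -r reverse mask, can be sketched
in Python for a single sequence. This is an illustration only: the function name
mask_sequence is hypothetical, and fasta parsing and file handling are left out.

```python
def mask_sequence(sequence, regions, reverse=False):
    """Replace bases inside (or, with reverse=True, outside) regions with N."""
    in_region = [False] * len(sequence)
    for start, stop in regions:
        # Interbase coordinates: start is included, stop is excluded.
        for i in range(max(0, start), min(stop, len(sequence))):
            in_region[i] = True
    return "".join(
        "N" if flagged != reverse else base
        for base, flagged in zip(sequence, in_region)
    )
```

For the sequence ACGTACGT and the region 2 to 4, the normal mask replaces the third
and fourth bases, while the reverse mask keeps only those two.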

************************************************************************************

**************************************************************************************
**                              MatchMates: February 2019                           **
**************************************************************************************
This app attaches mates of aligned first of pair reads to the attributes and modifies
the start position to enable sorting by unclipped start. Call Consensus to cluster and
collapse alignments with related molecular barcodes.

Options:
-s (Required) Provide a directory path for saving the modified alignments.
-b Path to a query name sorted bam/sam alignment file, defaults to reading from STDIN. 
-j Write summary stats in json format to this file.

Example: myAligner | java -Xmx2G -jar pathTo/USeq/Apps/MatchMates -s ReadyForConsensus

**************************************************************************************

**************************************************************************************
**                               MaxEntScanScore3: Nov 2013                         **
**************************************************************************************
Implementation of Max Ent Scan's score3 algorithm for human splice site detection. See
Yeo and Burge 2004, http://www.ncbi.nlm.nih.gov/pubmed/15285897 

Options:
-s Full path directory name containing the me2x3acc1-9 splice model files. See
     USeq/Documentation/ or http://genes.mit.edu/burgelab/maxent/download/ 
-t Full path file name for 23mer test sequences, GATCgatc only, one per line. Fasta OK.

Example: java -Xmx10G -jar pathTo/USeq/Apps/MaxEntScanScore3 -s ~/MES/splicemodels -t
     ~/MES/seqsToTest.fasta 

**************************************************************************************

**************************************************************************************
**                               MaxEntScanScore5: Nov 2013                         **
**************************************************************************************
Implementation of Max Ent Scan's score5 algorithm for human splice site detection. See
Yeo and Burge 2004, http://www.ncbi.nlm.nih.gov/pubmed/15285897 

Options:
-s Full path directory containing the splice5sequences and me2x5 splice model files.
     See USeq/Documentation/ or http://genes.mit.edu/burgelab/maxent/download/ 
-t Full path file name for 9mer test sequences, GATCgatc only, one per line. Fasta OK.

Example: java -Xmx10G -jar pathTo/USeq/Apps/MaxEntScanScore5 -s ~/MES/splicemodels -t
     ~/MES/seqsToTest.fasta 

**************************************************************************************
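Both MaxEntScan apps expect fixed length test sequences (23mers for score3, 9mers for
score5) made only of GATCgatc. A small Python pre-check, sketched below, can catch bad
input before a run. The helper name is hypothetical, and whether the apps skip or reject
invalid lines is not stated in the menus, so this sketch simply drops them.

```python
import re


def valid_test_sequences(lines, length):
    """Return sequences of exactly `length` GATCgatc characters, skipping fasta headers."""
    pattern = re.compile(r"[GATCgatc]{%d}" % length)
    kept = []
    for line in lines:
        seq = line.strip()
        if not seq or seq.startswith(">"):
            continue  # blank line or fasta header
        if pattern.fullmatch(seq):
            kept.append(seq)
    return kept


print(valid_test_sequences([">seq1", "cagGTAAGT", "cagGTNAGT", "cagGT"], 9))
```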

**************************************************************************************
**                           Merge Adjacent Regions: Oct  2018                      **
**************************************************************************************
Merges regions within a max bp gap and tracks the number merged.  Regions must not
overlap. If in doubt, run the MergeRegions app first.

Options:
-b Path to a bed file of non overlapping regions, xxx.gz/.zip OK.
-r Path for saving the merged xxx.bed.gz file.
-m Max bp gap, defaults to 5000.

Example: java -Xmx4G -jar pathTo/USeq/Apps/MergeAdjacentRegions -b myRegions.bed.zip 
     -m 1000 -r mergedRegions.bed.gz 

**************************************************************************************
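As an illustration of gap based merging, here is a minimal Python sketch. It is an
assumption about the general technique, not the app's code: regions are (chrom, start,
stop) tuples in interbase coordinates, and the gap is measured from the previous stop to
the next start.

```python
def merge_adjacent(regions, max_gap=5000):
    """Merge same-chromosome regions separated by at most max_gap bp.

    Returns (chrom, start, stop, number_merged) tuples.
    """
    merged = []
    for chrom, start, stop in sorted(regions):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= max_gap:
            last = merged[-1]
            # Extend the previous region and bump its merged count.
            merged[-1] = (chrom, last[1], max(last[2], stop), last[3] + 1)
        else:
            merged.append((chrom, start, stop, 1))
    return merged


regions = [("chr1", 100, 200), ("chr1", 700, 900), ("chr1", 5000, 5100), ("chr2", 10, 20)]
print(merge_adjacent(regions, max_gap=1000))
```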

**************************************************************************************
**                           MergeExonMetrics: June 2013                            **
**************************************************************************************
This app simply merges the output from several metrics html files.


Required:
-f Directory containing metrics dictionary files and an image directory
-o Name of the combined metrics file

Example: java -Xmx1500M -jar pathTo/USeq/Apps/MergeExonMetrics -f metrics -o 9908_metrics 
**************************************************************************************

**************************************************************************************
**                           Merge Overlapping Genes: Feb  2015                     **
**************************************************************************************
Merges transcript models that share exonic bps. Maximizes exons, minimizes introns.
Assumes interbase coordinates.

Options:
-u Path to a UCSC RefFlat or RefSeq gene table file or directory with such to merge.
       See http://genome.ucsc.edu/cgi-bin/hgTables, (geneName name2(optional) chrom
       strand txStart txEnd cdsStart cdsEnd exonCount exonStarts exonEnds). 
-r Path for results file.
-m Minimum fraction exonic bp overlap for merging, defaults to 0.05

Example: java -Xmx4G -jar pathTo/USeq/Apps/MergeOverlappingGenes -u 
      /CufflinkTranscripts/zv9Genes.ucsc.gz -m 0.25 -r merged.ucsc

**************************************************************************************

**************************************************************************************
**                            Merge Paired Alignments: Oct 2018                     **
**************************************************************************************
Merges proper paired alignments that pass a variety of checks and thresholds. Only
unambiguous pairs will be merged. Increases base calling accuracy in overlap and helps
avoid non-independent variant observations and other double counting issues. Identical
overlapping bases are assigned the higher quality scores. Disagreements are resolved
toward the higher quality base. If too close in quality, then the quality is set to 0.

Options:
-b Path to a coordinate sorted xxx.bam file containing paired alignments.
-d Path to a directory for saving the results.

Default Options:
-s Save merged xxx.sam.gz alignments instead of binary ChromData. Either works
      in Sam2USeq for read coverage analysis, the ChromData is much faster.
-e Only process and save alignments overlapping this bed format region file.
-u Remove all alignments marked as duplicates, defaults to keeping.
-a Maximum alignment score (AS:i: tag). Defaults to 300, smaller numbers are more
      stringent for novoalign where each mismatch is ~30pts.
-q Minimum mapping quality score, defaults to 0, larger numbers are more stringent.
-r The second paired alignment's strand has been reversed. Defaults to not reversed.
-i Maximum acceptable base pair distance for merging, defaults to 5000.
-m Don't cross check read mate coordinates, needed for merging repeat matches. Defaults
      to checking.
-o Merge all proper paired alignments. Defaults to only merging those that overlap.
-p Don't print detailed paired alignment statistics and insert size histogram.
-t Number of concurrent threads to run, defaults to the max available to the jvm.
-j Write summary stats in json format to this file.

Example: java -Xmx20G -jar pathToUSeq/Apps/MergePairedAlignments -b /Bams/ms.bam
     -p -s -d /Bams/MergedPairs/ -i 10000 

**************************************************************************************
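The base resolution rules in the description can be sketched in Python as follows. This
is only an illustration: the menu does not say how close "too close in quality" is, so
the min_diff parameter is a hypothetical threshold, and keeping the higher quality base
when the quality is zeroed is likewise an assumption.

```python
def merge_base(base1, qual1, base2, qual2, min_diff=5):
    """Resolve one overlapping position from two mates into (base, quality)."""
    if base1 == base2:
        # Identical calls keep the higher quality score.
        return base1, max(qual1, qual2)
    if qual1 >= qual2:
        winner, high, low = base1, qual1, qual2
    else:
        winner, high, low = base2, qual2, qual1
    if high - low < min_diff:
        # Disagreement with similar qualities: the call is not trusted.
        return winner, 0
    return winner, high


print(merge_base("A", 30, "A", 20))
print(merge_base("A", 35, "C", 10))
print(merge_base("A", 20, "C", 22))
```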

**************************************************************************************
**                             Merge Point Data: Jan 2011                           **
**************************************************************************************
Efficiently merges PointData, collapsing by position and possibly strand. Identical
position scores are either summed or converted into counts. DO NOT use this app on
PointData that will be part of a primary chIP/RNA-seq analysis.  It is only for
bis-seq and visualization purposes.

Options:
-p Point Data directories, full path, comma delimited. Should contain chromosome
       specific xxx.bar.zip or xxx_-_.bar files. Alternatively, provide one directory
       containing multiple PointData directories.
-s Save directory, full path.
-c Don't replace scores with hit count, just sum existing scores.
-m Merge strands

Example: java -Xmx1500M -jar pathTo/USeq/Apps/MergePointData -p
      /Data/Ets1Rep1/,/Data/Ets1Rep2/ -s /Data/MergedEts1 -m 

**************************************************************************************
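A minimal Python sketch of the collapsing logic, assuming each point is a (chrom, strand,
position, score) tuple. The names are hypothetical and the real app works on binary bar
files; the sum_scores and merge_strands arguments stand in for the -c and -m options.

```python
from collections import defaultdict


def merge_point_data(points, sum_scores=False, merge_strands=False):
    """Collapse points by position; values are hit counts unless sum_scores is set."""
    merged = defaultdict(float)
    for chrom, strand, position, score in points:
        key = (chrom, "." if merge_strands else strand, position)
        merged[key] += score if sum_scores else 1
    return dict(merged)


points = [("chr1", "+", 100, 2.0), ("chr1", "+", 100, 3.0), ("chr1", "-", 100, 1.0)]
print(merge_point_data(points))
print(merge_point_data(points, sum_scores=True, merge_strands=True))
```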

**************************************************************************************
**                             Merge Regions: July 2017                             **
**************************************************************************************
Flattens tab delimited bed files (chr start stop ...). Assumes interbase coordinates.

Options:
-d Directory containing bed files.

Example: java -Xmx4000M -jar pathTo/USeq/Apps/MergeRegions -d /Anno/TilingDesign/

************************************************************************************

**************************************************************************************
**                                 Merge Sams: May 2017                             **
**************************************************************************************
Merges sam and bam files. Adds a consensus header if one is not provided. The merged
files may not work with downstream GATK or Picard apps but are fine for USeq.

Options:
-d The full path to a directory containing xxx.bam or xxx.sam.gz files to merge.

Default Options:
-s Save file, must end in xxx.bam, defaults to merge.bam in -d.
-a Maximum alignment score. Defaults to 300, smaller numbers are more stringent.
      Approx 30pts per mismatch.
-m Minimum mapping quality score, defaults to 0 (no filtering), larger numbers are
      more stringent. Set to 13 or more to require near unique alignments. DO NOT set
      for alignments parsed by the SamTranscriptomeParser!
-f Save reads failing filters, defaults to tossing them.
-h Full path to a txt file containing a sam header, defaults to autogenerating the
      header from the sam/bam headers.
-t Don't delete temp xxx.sam.gz file.
-p Add program arguments to header, defaults to deleting. Note, duplicates cause
      Picard apps to fail.
-q Quiet, print only errors.

Example: java -Xmx1500M -jar pathToUSeq/Apps/MergeSams -d /Novo/Run7/
     -m 20 -a 120  

**************************************************************************************

**************************************************************************************
**                           Merge UCSC Gene Table: Aug  2018                       **
**************************************************************************************
Merges transcript models that share the same gene name (in column 0). Maximizes exons,
minimizes introns. Assumes interbase coordinates.

Options:
-u UCSC RefFlat or RefSeq gene table file, full path. See,
       http://genome.ucsc.edu/cgi-bin/hgTables, (geneName name2(optional) chrom strand
       txStart txEnd cdsStart cdsEnd exonCount exonStarts exonEnds). 

Example: java -Xmx4G -jar pathTo/USeq/Apps/MergeUCSCGeneTable -u 
      /data/zv8EnsemblGenes.ucsc.gz

**************************************************************************************

**************************************************************************************
**                        Methylation Array Scanner: March 2014                     **
**************************************************************************************
MAS takes paired or non-paired sample PointData representing beta values (0-1) from
arrays and scores regions with enriched/reduced signal using a sliding window
approach. A B&H corrected Wilcoxon signed rank (or rank sum test for non-paired),
pseudo median of the log2(treat/control) ratios (or log2(pseT/pseC) for non-paired),
and permutation test FDR is calculated for each window. Use the EnrichedRegionMaker
to identify enriched and reduced regions by picking thresholds (e.g. -i 0,1 -s 0.2,13).
MAS generates several data tracks for visualization in IGB including paired sample bp
log2 ratios, window level Wilcoxon FDRs, and window level pseudomedian log2 ratios. 
Note, non-paired analyses are very underpowered and require > 30 obs/window to see
any significant FDRs.

Required Options:
-s Path to a directory for saving the results.
-d Path to a directory containing individual sample PointData directories, each of
      which should contain chromosome split bar files (e.g. chr1.bar, chr2.bar, ...)
-t Names of the treatment sample directories in -d, comma delimited, no spaces.
-c Ditto but for the control samples, the ordering is critical and describes how to
      pair the samples for a paired analysis.

Advanced Options:
-n Run a non-paired analysis where t and c are treated as groups and pooled.
-w Window size, defaults to 1000.
-o Minimum number observations in window, defaults to 10.
-p Minimum pseudomedian log2 ratio for estimating the permutation FDR, defaults to 0.2
-r Number permutations, defaults to 5
-e Run T-Test instead of Wilcoxon rank sum test for non-paired samples.
-v Save coefficient of variation tracks

Example: java -Xmx4G -jar pathTo/USeq/Apps/MethylationArrayScanner -s ~/MAS/Res
     -d ~/MAS/Bar/ -t Early1,Early2,Early3 -c Late1,Late2,Late3
     -w 1500

**************************************************************************************
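The "B&H corrected" p-values mentioned above refer to the Benjamini & Hochberg
procedure. A minimal Python sketch of that standard adjustment follows; it illustrates
the general method and is not taken from the MAS source.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini & Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for steps_from_top, i in enumerate(order):
        rank = n - steps_from_top  # 1-based rank among ascending p-values
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```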

**************************************************************************************
**                    Methylation Array Defined Region Scanner: July 2013           **
**************************************************************************************
MADRS takes paired sample PointData representing beta values (0-1) from arrays and
a list of regions to score for differential methylation using a B&H corrected Wilcoxon
signed rank test and pseudo median of the paired log2(treat/control) ratios. Pairs
containing a zero value are ignored. It generates a spreadsheet of statistics for each
region. If a non-paired analysis is selected, a Wilcoxon rank sum test and
log2(pseT/pseC) are calculated on each region. Note this is a very underpowered test
requiring >30 observations to see any significant FDRs.

Required Options:
-b A bed file of regions to score (tab delimited: chr start stop ...)
-d Path to a directory containing individual sample PointData directories, each of
      which should contain chromosome split bar files (e.g. chr1.bar, chr2.bar, ...)
-t Names of the treatment sample directories in -d, comma delimited, no spaces.
-c Ditto but for the control samples, the ordering is critical and describes how to
      pair the samples for a paired analysis.
-o Minimum number paired observations in window, defaults to 3.
-z Skip printing regions with less than minimum observations.
-n Run a non-paired analysis where t and c are treated as groups and pooled. Uneven
      numbers of t and c are allowed.

Example: java -Xmx4G -jar pathTo/USeq/Apps/MethylationArrayDefinedRegionScanner 
     -v H_sapiens_Feb_2009 -d ~/MASS/Bar/ -t Early1,Early2,Early3
     -c Late1,Late2,Late3 

**************************************************************************************
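The paired statistic described above can be sketched in Python. The pseudo median is
computed here as the Hodges-Lehmann estimator (the median of all pairwise Walsh
averages), which is the usual meaning of the term; whether MADRS computes it exactly this
way is an assumption. Pairs containing a zero are skipped, as the menu states.

```python
from itertools import combinations_with_replacement
from math import log2
from statistics import median


def paired_log2_ratios(treatment, control):
    """log2(treat/control) for each pair, ignoring pairs that contain a zero."""
    return [log2(t / c) for t, c in zip(treatment, control) if t > 0 and c > 0]


def pseudo_median(values):
    """Hodges-Lehmann estimator: median of all pairwise averages, self pairs included."""
    walsh = [(a + b) / 2 for a, b in combinations_with_replacement(values, 2)]
    return median(walsh)


ratios = paired_log2_ratios([0.5, 0.0, 0.25], [0.25, 0.5, 0.25])
print(ratios, pseudo_median(ratios))
```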

**************************************************************************************
**                       Microsatellite Counter: Jan 2014                           **
**************************************************************************************
MicrosatelliteCounter identifies and counts microsatellite repeats in MiSeq fastq 
files. This iteration of the software requires you to specify the primers used in the
sequencing project.  It will automatically find the most likely microsatellite by 
looking at all possible repeats of length 1 through length 10 and finding the longest
repeat by length, not repeat unit.  There are two output files generated, the first 
lists primer statistics (currently only reads with both primers are used), the 
second lists repeat data.  Note that the input file must contain fastq sequences that
were merged using a program like PEAR.


Required Arguments:

-f Merged fastq file. Path to merged fastq file. We currently suggest using PEAR to 
       merge fastq sequences.
-p Primer file.  Path to primer reference file.  This file lists each primer used in 
       the sequencing project in the format NAMEPRIMER1PRIMER2.
-n Sample name.  This string will be appended to the output file names.
-d Directory. Output directory. Output files will be written to this directory.
-b Require both primers.  Both primers must be identified in order to move forward 
       with the analysis.

Example: java -Xmx4G -jar pathTo/USeq/Apps/MicrosatelliteCounter -f Merged.fastq 
      -r PrimerReference.txt -p 10511X1.primer.txt -o 10511X1.repeat.txt

**************************************************************************************

**************************************************************************************
**                            MiRNA Correlator: March 2014                          **
**************************************************************************************
Generates a spreadsheet to use in comparing changing miRNA levels to changes in gene
expression.

Options:
-r Results file.
-a All miRNA name file (single column of miRNA names).
-m MiRNA data (two columns: miRNA name, miRNA log2Rto).
-t Gene target to miRNA data (two columns: gene target name, miRNA name).
-e Gene expression data (three columns: gene name, log2Rto, FDR).
-f Don't print the gene expression FDR value in the spreadsheet.

Example: java -Xmx4G -jar pathTo/USeq/Apps/MiRNACorrelator -m miRNA_CLvsMOR.txt -a 
allMiRNANamesNoPs.txt -t targetGene2MiRNA.txt -e geneExp_CLvsMOR.txt -r results.xls

**************************************************************************************

**************************************************************************************
**                              MpileUp Parser: Sept 2015                           **
**************************************************************************************
Parses a SAMTools mpileup output file for non reference bases generating bed files and
data tracks with information related to error prone bases. Multiple samples are merged.

Options:
-p Path to a mpileup file (.gz or .zip OK, use 'samtools mpileup -Q 20 -A -B *bam').
-v Versioned Genome (ie H_sapiens_Mar_2006), see UCSC Browser,
      http://genome.ucsc.edu/FAQ/FAQreleases.
-s Save directory, full path, defaults to pileup file directory.
-r Minimum read coverage, defaults to 15.
-e Max nonRef base fraction, defaults to 0.05
-w Window size, defaults to 50
-f Max fraction failing bp in window, defaults to 0.05

Example: java -Xmx4G -jar pathTo/USeq/Apps/MpileupParser -p /Pileups/N2.mpileup.gz -v
      H_sapiens_Feb_2009 -e 0.1 -w 25

**************************************************************************************

**************************************************************************************
**                             Mpileup Randomizer: May 2018                         **
**************************************************************************************
Upon finding a gap in the coverage, the sample order is randomized and maintained. Use
this app to 'de-identify' a multi sample mpileup file while maintaining INDEL blocks.

Required Options:
-m Path to a Samtools mpileup file (gz/zip OK).

Default Options:
-r Minimum read depth to pass a sample, default 10
-s Minimum number of samples that must pass to save line, default 3
-g Minimum gap, defaults to 125

Example: java -Xmx4G -jar pathToUSeq/Apps/MpileupRandomizer -m normExo.mpileup.gz
    -r 20 -s 4 
**************************************************************************************

**************************************************************************************
**                        Multiple Replica Scan Seqs: May 2014                      **
**************************************************************************************
MRSS uses a sliding window and Anders' DESeq negative binomial pvalue -> Benjamini & 
Hochberg AdjP statistics to identify enriched and reduced regions in a genome. Both
treatment and control PointData sets are required, one or more biological replicas.
MRSS generates window level differential count tracks for the AdjP and normalized
log2Ratio as well as a binary window object xxx.swi file for downstream use by the
EnrichedRegionMaker. MRSS also makes use of DESeq's variance corrected count data to
cluster your biological replicas. Given R's poor memory management, running DESeq
requires lots of RAM, 64bit R, and 1-3 hrs.

Options:
-s Save directory, full path.
-t Treatment replica PointData directories, full path, comma delimited, no spaces,
       one per biological replica. Use the PointDataManipulator app to merge same
       replica and technical replica datasets. Each directory should contain stranded
       chromosome specific xxx_-/+_.bar.zip files. Alternatively, provide one
       directory that contains multiple biological replica PointData directories.
-c Control replica PointData directories, ditto. 
-r Full path to 64bit R loaded with DESeq library, defaults to '/usr/bin/R' file, see
       http://www-huber.embl.de/users/anders/DESeq/ . Type 'library(DESeq)' in
       an R terminal to see if it is installed.
-p Peak shift, average distance between + and - strand peaks for chIP-Seq data, see
       PeakShiftFinder or set it to 100bp. For RNA-Seq set it to 0. It will be used
       to shift the PointData by 1/2 the peak shift.
-w Window size, defaults to the peak shift. For chIP-Seq data, a good alternative 
       is the peak shift plus the standard deviation, see the PeakShiftFinder app.
       For RNA-Seq data, set this to 100-250.

Advanced Options:
-m Minimum number of reads in a window, defaults to 15
-d Don't delete temp files

Example: java -Xmx4G -jar pathTo/USeq/Apps/MultipleReplicaScanSeqs -t
      /Data/PolIIRep1/,/Data/PolIIRep2/ -c /Data/Input1/,/Data/Input2/ -s
      /Data/PolIIResults/ -p 150 -w 250 -b 

**************************************************************************************
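The -p option shifts PointData by half the peak shift. A minimal Python sketch of that
idea is below. The direction of the shift (plus strand moves 3', minus strand moves 5'
in genomic coordinates) is the conventional chIP-Seq choice and is an assumption here, as
are the function name and the floor at zero.

```python
def shift_position(position, strand, peak_shift):
    """Move a read position toward the presumed binding site by half the peak shift."""
    half = peak_shift // 2
    if strand == "+":
        return position + half
    # Minus strand reads shift the other way; never go below coordinate 0.
    return max(0, position - half)


print(shift_position(1000, "+", 150), shift_position(1000, "-", 150))
```

With -p 0, as suggested for RNA-Seq, positions are left unchanged.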

**************************************************************************************
**                        Multi Sample VCF Filter: July 2015                        **
**************************************************************************************
Filters a vcf file containing multiple sample records into those that pass or fail the
tests below. This works with VCFv4.1 files created by the GATK package. Note, the 
records are not modified. If the number of records in the VCF file is greater than 
500000, the VCF file is intersected in chunks.

Required:
-v Full path to a sorted single or multi sample vcf file (xxx.vcf/xxx.vcf.gz). Note,
       Java often fails to parse tabix compressed vcf files.  Best to uncompress.

Optional:
-p Full path to an output VCF (xxx.vcf or xxx.vcf.gz).  Specifying xxx.vcf.gz will 
       compress and index the VCF using tabix (set -t too). Defaults to input_Filt.vcf
-f Print out failing records, defaults to printing those passing the filters.
-a Fail records where no sample passes the sample thresholds.
-i Fail records where the original FILTER field is not 'PASS' or '.'
-c Fail records that don't intersect the regions in this bed file, full path.
-b Filter by genotype flags.  -n, -u and -l must be set.
-n Sample names ordered by category.  
-u Number of samples in each category.  
-l Requirement flags for each category. All samples that pass the specified filters 
       must meet the flag requirements, or the variant isn't reported.  At least one 
       sample in each group must pass the specified filters, or the variant isn't 
       reported.
           a) 'W' : homozygous common 
           b) 'H' : heterozygous 
           c) 'M' : homozygous rare 
           d) '-W' : not homozygous common 
           e) '-H' : not heterozygous 
           f) '-M' : not homozygous rare
-e Strict genotype matching. If this is selected, records with no-call samples 
       or samples falling below either minimum sample genotype quality (-g) or 
       minimum sample read depth (-r) won't be reported. Only samples listed in (-n)
       will be checked.
-d Minimum record QUAL score, defaults to 0, recommend >=20 
-g Minimum sample genotype quality GQ, defaults to 0, recommend >= 20 
-r Minimum sample read depth DP, defaults to 0, recommend >=10 
-x Maximum sample read depth DP, defaults to unlimited
-y Minimum sample allele count read depth AD or DP4, defaults to 0
-s Print sample names and exit.
-t Path to tabix.

Example: java -Xmx10G -jar pathTo/USeq/Apps/MultiSampleVCFFilter 
       -v DEMO.passing.vcf -p DEMO.intersection.vcf -c exomeV4.bed -b 
       -n SRR504516,SRR776598,SRR504515,SRR504517,SRR504483 -u 2,2,1 -l M,H,-M 

**************************************************************************************
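How -n, -u and -l fit together can be sketched in Python. This is an interpretation of
the menu text, not the app's code: 'W' is assumed to mean homozygous reference and 'M'
homozygous alternate, a no-call stands in for "failing the sample filters", and all
function names are hypothetical.

```python
def genotype_class(gt):
    """Map a diploid VCF GT string to 'W', 'H', 'M', or None for no-calls."""
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return None
    if alleles[0] != alleles[1]:
        return "H"
    return "W" if alleles[0] == "0" else "M"


def meets_flag(gt_class, flag):
    """A leading '-' negates the flag, e.g. '-M' means not homozygous rare."""
    if flag.startswith("-"):
        return gt_class != flag[1:]
    return gt_class == flag


def record_passes(genotypes, names, counts, flags):
    """names are ordered by category; counts and flags give one entry per category."""
    start = 0
    for count, flag in zip(counts, flags):
        group = names[start:start + count]
        start += count
        called = [c for c in (genotype_class(genotypes[n]) for n in group) if c]
        # At least one usable sample per group, and every usable sample must match.
        if not called or not all(meets_flag(c, flag) for c in called):
            return False
    return True


names = ["SRR504516", "SRR776598", "SRR504515", "SRR504517", "SRR504483"]
gts = dict(zip(names, ["1/1", "1/1", "0/1", "./.", "0/0"]))
print(record_passes(gts, names, [2, 2, 1], ["M", "H", "-M"]))
```

The sample names and the 2,2,1 / M,H,-M settings mirror the menu's example; the
genotypes are made up for illustration.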

**************************************************************************************
**                               Mutect VCF Parser: May 2018                        **
**************************************************************************************
Parses Mutect2 VCF files, filtering for read depth, allele frequency diff ratio, etc.
Inserts AF and DP for the tumor sample into the INFO field. Changes the sample order
to Normal and Tumor and updates the #CHROM line. Replaces the QUAL with TLOD.

Options:
-v Full path file or directory containing xxx.vcf(.gz/.zip OK) file(s).
-f Directory to save the parsed files, defaults to the parent dir of the first vcf.
-t Minimum tumor allele frequency (AF), defaults to 0.
-n Maximum normal AF, defaults to 1.
-u Minimum tumor alignment depth, defaults to 0.
-a Minimum tumor alt count, defaults to 0.
-o Minimum normal alignment depth, defaults to 0.
-d Minimum T-N AF difference, defaults to 0.
-r Minimum T/N AF ratio, defaults to 0.
-t Minimum TLOD score, defaults to 0.
-p Remove non PASS filter field records.
-s Print spreadsheet variant summary.

Example: java -jar pathToUSeq/Apps/MutectVCFParser -v /VCFFiles/ -t 0.05 -n 0.5 -u 100
        -o 20 -d 0.05 -r 2 -a 3 

**************************************************************************************
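The tumor/normal thresholds above combine as a series of independent checks. The Python
sketch below shows one plausible reading; it is not the parser's code. The argument
names are hypothetical, and treating a normal AF of zero as an infinite T/N ratio is an
assumption.

```python
def passes_thresholds(tumor_af, normal_af, tumor_dp, normal_dp, tumor_alt,
                      min_tumor_af=0.0, max_normal_af=1.0, min_tumor_dp=0,
                      min_normal_dp=0, min_alt=0, min_diff=0.0, min_ratio=0.0):
    """Return True when a variant clears every tumor/normal threshold."""
    ratio = tumor_af / normal_af if normal_af > 0 else float("inf")
    return (tumor_af >= min_tumor_af
            and normal_af <= max_normal_af
            and tumor_dp >= min_tumor_dp
            and normal_dp >= min_normal_dp
            and tumor_alt >= min_alt
            and tumor_af - normal_af >= min_diff
            and ratio >= min_ratio)


# Thresholds taken from the menu example: -t 0.05 -n 0.5 -u 100 -o 20 -d 0.05 -r 2 -a 3
settings = dict(min_tumor_af=0.05, max_normal_af=0.5, min_tumor_dp=100,
                min_normal_dp=20, min_alt=3, min_diff=0.05, min_ratio=2)
print(passes_thresholds(0.12, 0.01, 150, 40, 18, **settings))
print(passes_thresholds(0.06, 0.04, 150, 40, 9, **settings))
```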

**************************************************************************************
**                               Mutect 4 VCF Parser: Oct 2018                      **
**************************************************************************************
Parses Mutect2 VCF files from the GATK 4.0+ package, filtering for read depth, allele
frequency diff ratio, etc. Inserts AF and DP for the tumor sample into the INFO field.
Replaces the QUAL with TLOD.

Options:
-v Full path file or directory containing xxx.vcf(.gz/.zip OK) file(s). It is REQUIRED
       to run 'vt decompose -s ' on these first. Recommend running decompose_blocksub
       too. See https://github.com/atks/vt 
-f Directory to save the parsed files, defaults to the parent dir of the first vcf.
-t Minimum tumor allele frequency (AF), defaults to 0.
-n Maximum normal AF, defaults to 1.
-u Minimum tumor alignment depth, defaults to 0.
-a Minimum tumor alt count, defaults to 0.
-o Minimum normal alignment depth, defaults to 0.
-d Minimum T-N AF difference, defaults to 0.
-r Minimum T/N AF ratio, defaults to 0.
-t Minimum TLOD score, defaults to 0.
-p Remove non PASS filter field records.

Example: java -jar pathToUSeq/Apps/Mutect4VCFParser -v /VCFFiles/ -t 0.05 -n 0.5 -u 100
        -o 20 -d 0.05 -r 2 -a 3 

**************************************************************************************

**************************************************************************************
**                           Non Reference Region Maker: Jan 2018                   **
**************************************************************************************
NRRM scans a single sample mpileup file looking for non reference base pairs. If these
pass read depth, allele frequency, and non ref base count thresholds, the base is
written to a bed file. BPs with insertions are saved as a 2 BP region. Run MergeRegions
or MergeAdjacentRegions to join proximal non ref BPs. 

Options:
-m Provide a path to a single sample samtools mpileup file or pipe mpileup output.
-b Path to write the bed file output, should end in xxx.bed.gz
-r Minimum read depth, default 10
-a Minimum non reference allelic frequency (SNVs + INDELS), default 0.05
-c Minimum non reference base count, default 3
-q Minimum base quality for inclusion in AF calculation, default 10

Example: samtools mpileup -B -d 1000000 -f $faIndex -l $bed $bam | java
     -Xmx4G -jar pathToUSeq/Apps/NonReferenceRegionMaker -q 13 -r 20 -a 0.025 -b 
     0.025normExoNonRefMask.bed.gz -c 4 
**************************************************************************************
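To make the thresholds concrete, here is a minimal Python sketch that counts non
reference observations in the read bases column of one samtools mpileup line and applies
the depth, count, and allele frequency cutoffs. It is an illustration only: function
names are hypothetical, base quality filtering (-q) is left out, and how NRRM itself
tallies indels is an assumption.

```python
import re


def count_non_ref(bases):
    """Return (snv_count, indel_count) from an mpileup read bases string."""
    bases = re.sub(r"\^.", "", bases)  # read start marker plus its mapping quality char
    bases = bases.replace("$", "")     # read end marker
    kept, indels, i = [], 0, 0
    while i < len(bases):
        ch = bases[i]
        if ch in "+-":
            # Indel: sign, decimal length, then that many inserted/deleted bases.
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            indels += 1
            i = j + int(bases[i + 1:j])
        else:
            kept.append(ch)
            i += 1
    snvs = sum(1 for ch in kept if ch in "ACGTNacgtn")
    return snvs, indels


def passes(depth, snvs, indels, min_depth=10, min_af=0.05, min_count=3):
    """Apply the read depth, non ref count, and allele frequency thresholds."""
    non_ref = snvs + indels
    return (depth > 0 and depth >= min_depth
            and non_ref >= min_count and non_ref / depth >= min_af)


snvs, indels = count_non_ref("..,^F.A$+2AG,-1c*")
print(snvs, indels, passes(20, snvs, indels))
```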

**************************************************************************************
**                        Novoalign Bisulfite Parser: May 2016                      **
**************************************************************************************
Parses Novoalign -b2 and -b4 single and paired bisulfite sequence alignment files into
PointData file formats. Generates several summary statistics on converted and non-
converted C contexts. Flattens overlapping reads in a pair to call consensus bps.
Note: for paired read RNA-Seq data run through the SamTranscriptomeParser first.

Options:
-a Alignment file or directory containing non merged novoalignments in SAM/BAM
      (xxx.sam(.zip/.gz OK) or xxx.bam) format. Multiple files are combined. 
-f Fasta file directory, chromosome specific xxx.fa/.fasta(.zip/.gz OK) files.
-s Save directory.
-v Versioned Genome (ie H_sapiens_Mar_2006), see UCSC Browser,
      http://genome.ucsc.edu/FAQ/FAQreleases.

Default Options:
-p Print bed file parsed data.
-x Maximum alignment score. Defaults to 300, smaller numbers are more stringent.
-q Minimum mapping quality score. Defaults to 13, bigger numbers are more stringent.
      This is a phred-scaled posterior probability that the mapping position of the
      read is incorrect. For RNASeq data, set this to 0.
-b Minimum base quality score for reporting a non/converted C, defaults to 13.
-c Minimum base quality score for reporting an overlapping non/converted C not found
      in the other pair, defaults to 13.
-d Remove duplicate reads prior to generating PointData. Defaults to not removing
      duplicates.

Example: java -Xmx25G -jar pathToUSeq/Apps/NovoalignBisulfiteParser -x 240 -a
      /Novo/Run7/ -f /Genomes/Hg19/Fastas/ -v H_sapiens_Feb_2009 -s /Novo/Run7/NBP 

**************************************************************************************

**************************************************************************************
**                         Novoalign Indel Parser: June 2010                        **
**************************************************************************************
Parses Novoalign alignment xxx.txt(.zip/.gz) files for consensus indels, something
currently not supported by the maq apps. Generates a consensus indel allele file,
interbase coordinates, for running through the Alleler application. Also creates two
bed files for the insertions and deletions.

Options:
-f The full path directory/file text of your Novoalign xxx.txt(.zip or .gz) file(s).
-r Full path directory for saving the results.
-p Minimum alignment posterior probability (-10Log10(prob)) of being incorrect,
      defaults to 13 (0.05). Larger numbers are more stringent.
-b Minimum affected indel base quality score(s), ditto, defaults to 13.
-u Minimum number of unique reads covering indel, defaults to 2.

Example: java -Xmx1500M -jar pathToUSeq/Apps/NovoalignIndelParser -f /Novo/Run7/
     -r /Novo/Run7/indelAlleleTable.txt -p 20 -b 20 -u 3 
**************************************************************************************

**************************************************************************************
**                            Novoalign Parser: Jan 2011                            **
**************************************************************************************
Parses Novoalign xxx.txt(.zip/.gz) files into center position binary PointData xxx.bar
files, xxx.bed files, and if appropriate, a splice junction bed file. For the latter,
create a gene regions bed file and run it through the MergeRegions application to
collapse overlapping transcripts. We recommend using the following settings while
running Novoalign 'novoalign -r0.2 -q5 -d yourDataBase -f your_prb.txt | grep '>chr' >
yourResultsFile.txt'. NP works with native, colorspace, and miRNA novoalignments. 

Options:
-v Versioned Genome (ie H_sapiens_Mar_2006), see UCSC Browser,
      http://genome.ucsc.edu/FAQ/FAQreleases.
-f The full path directory/file text of your Novoalign xxx.txt(.zip or .gz) file(s).
-r Full path directory text for saving the results.
-p Posterior probability threshold (-10Log10(prob)) of being incorrect, defaults to 13
      (0.05). Larger numbers are more stringent. The parsed scores are delogged and
      converted to 1-prob.
-q Alignment score threshold, smaller numbers are more stringent, defaults to 60
-c Chromosome prefix, defaults to '>chr'.
-i Ignore strand when making splice junctions.
-g (Optional) Full path gene region bed file (chr start stop...) containing gene
      regions to use in scaling intersecting splice junctions.
-s Just print alignment stats, don't save any data.

Example: java -Xmx1500M -jar pathToUSeq/Apps/NovoalignParser -f /Novo/Run7/
     -v H_sapiens_Mar_2006 -p 20 -q 30 -r /Novo/Run7/mRNASeq/ -i -g
     /Anno/Hg18/mergedUCSCKnownGenes.bed 

**************************************************************************************
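
Note on the -p thresholds above: they are phred-style scores, -10Log10(prob of being
incorrect). The sketch below shows that conversion and the "delogged and converted to
1-prob" step in Python. The function names are illustrative only and are not part of
USeq.

```python
import math

def phred_to_error_prob(score):
    """Convert a -10*log10(p) score back to the probability p of being incorrect."""
    return 10 ** (-score / 10.0)

def error_prob_to_phred(prob):
    """Convert a probability of being incorrect to a -10*log10(p) score."""
    return -10.0 * math.log10(prob)

def phred_to_confidence(score):
    """Delog the score and report 1 - p, as described for the parsed scores."""
    return 1.0 - phred_to_error_prob(score)

print(round(phred_to_error_prob(13), 3))  # the default threshold
print(round(phred_to_error_prob(20), 3))  # the threshold used in the example
```

Larger scores correspond to smaller error probabilities, which is why larger numbers
are more stringent.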

**************************************************************************************
**                        Novoalign Paired Parser: January 2009                     **
**************************************************************************************
Parses Novoalign paired alignment files xxx.txt(.zip/.gz) into xxx.bed format.

Options:
-f The full path directory/file text of your Novoalign xxx.txt(.zip or .gz) file(s).
-e Exclude half matches with a high quality unmatched pair, defaults to keeping them.
-m Maximum size for paired reads mapping to the same chromosome, defaults to 100000.
-s Splice junction radius, defaults to 34. See the MakeSpliceJunctionFasta app.

Example: java -Xmx1500M -jar pathToUSeq/Apps/NovoalignPairedParser -f /Novo/Run7/
 

**************************************************************************************

**************************************************************************************
**                              Oligo Tiler: Oct 2009                               **
**************************************************************************************
OT tiles oligos across genomic regions returning their forward and reverse sequences.
Won't tile oligos with non GATC characters, case insensitive. Replaces non GATC chars
in offset regions with 'a'. Note, the defaults are set for generating a 60 mer Agilent
specific tiling microarray design where the first 10bp of the 3' stop are buried in the
matrix and the effective oligo length is 50bp. Adjust accordingly for other platforms.

Options:
-f Fasta file directory, should contain chromosome specific xxx.fasta files.
-r Regions file to tile (tab delimited: chr start stop ...) interbase coordinates.
-o Effective oligo size, defaults to 50.
-s Spacing to place oligos, defaults to 25.
-t Three prime offset, defaults to 10.
-m Minimum size of region to tile, defaults to 20.
-a Print oligo FASTA instead of an Agilent eArray text seq formatted results.
-c Tile CpG (spacing not used, see max gap option).
-g Max gap between adjacent CpGs to include in same oligo, defaults to 8.
-e Split export files by strand instead of alternating strand.
-b Replace 3' stop of oligos with the human 11-nullomer 'ccgatacgtcg'. The first
       ~10bp don't contribute to hybridization on Agilent arrays.

Example: java -Xmx4000M -jar pathTo/Apps/OligoTiler -s 40 -f /Genomes/Hg18/Fastas/ 
     -r /Designs/cancerArray.bed -p -a 

************************************************************************************

**************************************************************************************
**                        Overdispersed Region Scan Seqs: May 2012                  **
**************************************************************************************

WARNING: this application is deprecated and no longer maintained, use the
DefinedRegionDifferentialSeq app instead!

ORSS takes bam alignment files and extracts reads under each region or gene's exons to
calculate several statistics. Makes use of Simon Anders' DESeq R package with its
negative binomial p-value test to control for overdispersion. A Benjamini-Hochberg FDR
correction is used to control for multiple testing. DESeq is run with and without
variance outlier filtering. A chi-square test of independence between the exon read
count distributions is used to score alternative splicing. Several read count measures
are provided including counts for each replica, FPKMs (# frags per kb of int region
per total mill mapped reads) as well as DESeq's variance adjusted counts (use these for
clustering, correlation, and other distance type analysis). If replicas are provided
either the smallest all pair log2Ratio is reported (default) or the pseudomedian.
Several results files are written: two spreadsheets, one containing all of the genes
and one containing those that pass the thresholds, as well as egr, bed12, and useq
region files for visualization in genome browsers.

Required Options:
-s Save directory.
-t Treatment directory containing one xxx.bam file with xxx.bai index per biological
       replica. The BAM files should be sorted by coordinate and have passed Picard
       validation. Use the SamTranscriptomeParser to convert your aligned transcriptome
       data to genomic coordinates.
-c Control directory, ditto. 
-u UCSC RefFlat or RefSeq Gene table file, full path. See,
       http://genome.ucsc.edu/cgi-bin/hgTables, (name1 name2(optional) chrom strand
       txStart txEnd cdsStart cdsEnd exonCount exonStarts exonEnds). WARNING!!!!!!
       This table should contain only one composite transcript per gene. Use the
       MergeUCSCGeneTable app to collapse Ensembl transcripts downloaded from UCSC in
       RefFlat format.
-b (Or) a bed file (chr, start, stop,...), full path, See,
       http://genome.ucsc.edu/FAQ/FAQformat#format1
-v Versioned Genome (ie H_sapiens_Mar_2006, D_rerio_Jul_2010), see UCSC Browser,
      http://genome.ucsc.edu/FAQ/FAQreleases. 

Advanced/ Default Options:
-o Don't remove overlapping exons, defaults to filtering gene annotation for overlaps.
-i Score introns instead of exons.
-a Data is stranded. Only collect reads from the same strand as the annotation.
-f Minimum FDR threshold, defaults to 10 (-10Log10(FDR=0.1))
-l Minimum absolute log2 ratio threshold, defaults to 1 (2x)
-e Minimum number mapping reads per region, defaults to 20
-d Don't delete temp files used by DESeq
-p Use a pseudo median log2 ratio in place of the smallest all pair log2 ratios for
      scoring the degree of differential expression when replicas are present.
      Recommended for experiments with 4 or more replicas.
-r Full path to R loaded with DESeq library, defaults to '/usr/bin/R' file, see
       http://www-huber.embl.de/users/anders/DESeq/ . Type 'library(DESeq)' in
       an R terminal to see if it is installed.

Example: java -Xmx4G -jar pathTo/USeq/Apps/OverdispersedRegionScanSeqs -t
      /Data/PolIIRep1/,/Data/PolIIRep2/ -c /Data/Input1/,/Data/Input2/ -s
      /Data/PolIIResults/ -f 30 -e 30 -u /Anno/mergedZv9EnsemblGenes.ucsc.gz

**************************************************************************************
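
Note on the FPKM measure and the -f/-l thresholds above: the sketch below computes an
FPKM as defined in the description (# frags per kb of region per million mapped reads)
and applies the FDR cutoff on the -10Log10 scale. It is a minimal Python illustration.
The function names and the use of >= at the cutoffs are assumptions, not the ORSS
implementation.

```python
import math

def fpkm(fragment_count, region_length_bp, total_mapped_reads):
    """Fragments per kb of interrogated region per million mapped reads."""
    kilobases = region_length_bp / 1000.0
    millions = total_mapped_reads / 1_000_000.0
    return fragment_count / (kilobases * millions)

def passes_thresholds(fdr, log2_ratio, min_fdr_score=10.0, min_abs_log2=1.0):
    """Apply -f and -l style cutoffs; the FDR is compared as -10*log10(FDR)."""
    fdr_score = -10.0 * math.log10(fdr)
    return fdr_score >= min_fdr_score and abs(log2_ratio) >= min_abs_log2

# 500 fragments over a 2 kb region with 10 million mapped reads
print(fpkm(500, 2000, 10_000_000))
```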

**************************************************************************************
**                         Create Exon Summary Metrics : April 2013                 **
**************************************************************************************
This script runs a bunch of summary metric programs and compiles the results.  It uses
R and LaTeX to generate a fancy pdf as an output. Can also generate html.


Required:
-a Alignment statistics from Picard's CollectAlignmentMetrics
-b Alignment counts from USeq's CountChromosomes
-c Coverage of CCDS exons from USeq's Sam2USeq
-d Duplication statistics from Picard's MarkDuplicates
-e Error rate from USeq's CalculatePerCycleErrorRate
-f Overlap Statistics from USeq's MergePaired Sam Alignment
-o Output file name
Optional:
-r Path to R
-l Path to pdflatex
-t Generate html instead
-i Generate dictionary (for pipeline)
-c Coverage file name 


Example: java -Xmx1500M -jar pathTo/USeq/Apps/VCFAnnovar -v 9908R.vcf                 
**************************************************************************************

**************************************************************************************
**                        ParseIntersectingAlignments: June 2010                    **
**************************************************************************************
Parses bed alignment files for intersecting reads provided another bed file of alleles.

Options:
-s Full path file text for your SNP allele five column bed file (tab delimited chr,
      start,stop,text,score,strand)
-a Full path file text for your alignment bed file from the NovoalignParser.
-m Minimum base quality, defaults to 13

Example: java -Xmx1500M -jar pathToUSeq/Apps/ParseIntersectingAlignments 
     -s /LympAlleles/ex1.bed -a /SeqData/lymphAlignments.bed -m 13

**************************************************************************************

**************************************************************************************
**                           ParsePointDataContexts: Feb 2011                       **
**************************************************************************************
Parses PointData for particular 5bp genomic sequence contexts.

Options:
-s Save directory, full path.
-p PointData directories, full path, comma delimited. These should
       contain stranded chromosome specific xxx_-/+_.bar.zip files. One
       can also provide a single directory that contains multiple PointData
       directories. These will be merged before splitting by summing overlapping
       position scores.
-f Fasta files for each chromosome.
-c Context java regular expression, must be 5bp long, 5'->3', case insensitive, e.g.:
       '..CG.' for CG
       '..C[CAT]G' for CHG
       '..C[CAT][CAT]' for CHH
       '..C[CAT].' for nonCG
       '..C[^G].' for nonCG


Example: java -Xmx12G -jar pathTo/USeq/Apps/ParsePointDataContexts -c '..CG.' -s
      /Data/PointData/CG -f /Genomes/Hg18/Fastas -p /Data/PointData/All/  

**************************************************************************************
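
Note on the -c context expressions above: the patterns are plain character-class
regular expressions, so they behave the same way in Python's re module as in Java. The
sketch below classifies a 5bp context using three of the listed patterns. The labels
and the function name are illustrative and are not part of USeq.

```python
import re

# Context patterns copied from the -c option; the labels are for illustration.
CONTEXTS = {
    "CG": "..CG.",
    "CHG": "..C[CAT]G",
    "CHH": "..C[CAT][CAT]",
}

def classify_context(five_mer):
    """Return the label of the first pattern matching a 5bp, 5'->3' sequence."""
    if len(five_mer) != 5:
        raise ValueError("context must be exactly 5bp long")
    for label, pattern in CONTEXTS.items():
        if re.fullmatch(pattern, five_mer, flags=re.IGNORECASE):
            return label
    return None

for seq in ("AACGT", "ttcag", "GGCTA", "AAGTT"):
    print(seq, classify_context(seq))
```

In each pattern the cytosine being scored sits at the third position of the 5bp
context.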

**************************************************************************************
**                               PeakShiftFinder: May 2010                          **
**************************************************************************************
PeakShiftFinder estimates the bp difference between sense and antisense proximal chIP-
seq peaks. It calculates the shift in two ways: by generating a composite peak from a
set of the top peaks in a dataset and by taking the median shift for the top peaks.
The latter appears more reliable for some datasets. Inspect the results in IGB by
loading the xxx.bar graphs. When in doubt, run ScanSeqs with just your
treatment data setting the peak shift to 0 and window size to 50 and manually inspect
the shift in IGB.

Options:
-t Treatment Point Data directories, full path, comma delimited. These should
       contain stranded chromosome specific xxx_+_.bar.zip and xxx_-_.bar.zip  files.
-c Control Point Data directories, ditto. 
-s Save directory, full path.

Advanced Options:
-e Two chIP samples are provided, no input, scan for reduced peaks too.
-w Window size in bps, defaults to 50.
-a Minimum number window reads, defaults to 10
-d Minimum normalized window score, defaults to 2.5
-r Minimum fold of treatment to control window reads, defaults to 5
-n Number of peaks to merge for composite, defaults to 100
-p Distance off peak center to collect from 5' stop, defaults to 500
-m Distance off peak center to collect from 3' stop, defaults to 1000

Example: java -Xmx1500M -jar pathTo/USeq/Apps/PeakShiftFinder -t
      /Data/Ets1Rep1/,/Data/Ets1Rep2/ -c /Data/Input1/,/Data/Input2/ -s
      /Results/Ets1PeakShiftResults -w 25 -d 5

**************************************************************************************
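
Note on the median shift method described above: a minimal Python sketch, assuming each
top peak has already been reduced to one sense strand summit and one antisense strand
summit position. The input layout and function name are hypothetical. How
PeakShiftFinder locates the summits is not shown here.

```python
from statistics import median

def median_peak_shift(peak_pairs):
    """Median bp distance between sense and antisense summits of the top peaks."""
    shifts = [antisense - sense for sense, antisense in peak_pairs]
    return median(shifts)

# (sense summit, antisense summit) positions for four made-up peaks
top_peaks = [(1000, 1150), (5000, 5140), (9000, 9400), (12000, 12160)]
print(median_peak_shift(top_peaks))
```

A median is little affected by a single aberrant peak (the 400 bp pair above).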

**************************************************************************************
**                            Point Data Manipulator: Oct 2010                      **
**************************************************************************************
Manipulates PointData to merge strands, shift base positions, replace scores with 1
and sum identical positions. If multiple PointData directories are given, the data is
merged. 

Options:
-p Point Data directories, full path, comma delimited. Should contain chromosome
       specific xxx.bar.zip or xxx_-_.bar files. Alternatively, provide one directory
       containing multiple PointData directories.
-s Save directory, full path.
-o Replace PointData scores with 1
-d Shift base position XXX bases 3', defaults to 0
-i Sum identical base position scores
-m Merge strands

Example: java -Xmx1500M -jar pathTo/USeq/Apps/PointDataManipulator -p
      /Data/Ets1Rep1/,/Data/Ets1Rep2/ -s /Data/MergedEts1 -o -i -m 

**************************************************************************************
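
Note on the -o, -d and -i options above: the sketch below applies the same three
manipulations to a list of (position, score) pairs in Python. It is an illustration
under stated assumptions: a single plus-strand chromosome, and scores replaced before
identical positions are summed. The function name is hypothetical.

```python
from collections import defaultdict

def manipulate(points, shift=0, replace_with_one=False, sum_identical=False):
    """points: list of (position, score) tuples for one chromosome and strand."""
    out = []
    for position, score in points:
        # -d style shift and -o style score replacement
        out.append((position + shift, 1.0 if replace_with_one else score))
    if sum_identical:
        # -i style summing of scores that share a base position
        totals = defaultdict(float)
        for position, score in out:
            totals[position] += score
        out = sorted(totals.items())
    return out

print(manipulate([(10, 2.0), (10, 3.0), (20, 0.5)],
                 replace_with_one=True, sum_identical=True))
```

Combining score replacement with summing, as in the example command (-o -i), turns the
scores into per position hit counts.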

**************************************************************************************
**                                PoRe CNV: April 2017                              **
**************************************************************************************
Uses Poisson regression and Pearson residuals to identify exons or genes whose counts
differ significantly from the fitted value based on all the exon sample counts. This
app wraps an algorithm developed by Alun Thomas.  Data tracks are generated for the
residuals and log2(observed/ fitted counts) as well as detailed spreadsheets. A bed
regions file of merged adjacent exons passing thresholds is also created. Use this
app for identifying CNVs in next gen seq datasets with > 10 normal samples.

Required Options:
-s Save directory.
-b BAM file directory, sorted and indexed by coordinate. One bam per sample. 
-u UCSC RefFlat or RefSeq gene table file, full path. Tab delimited, see RefSeq Genes
       http://genome.ucsc.edu/cgi-bin/hgTables, (uniqueName1 name2(optional) chrom
       strand txStart txEnd cdsStart cdsEnd exonCount (commaDelimited)exonStarts
       (commaDelimited)exonEnds). Example: ENSG00000183888 C1orf64 chr1 + 16203317
       16207889 16203385 16205428 2 16203317,16205000 16203467,16207889
-a Alun Thomas R script file.
-g Genome Version  (ie H_sapiens_Mar_2006), see UCSC Browser,
      http://genome.ucsc.edu/FAQ/FAQreleases.

Default Options:
-c BAM file names for samples to process, comma delimited, no spaces, defaults to all.
-r Full path to R, defaults to /usr/bin/R
-l Minimum abs(log2(obs/exp)) for inclusion in the pass spreadsheet, defaults to 0.585
-p Maximum adjusted p-value for inclusion in the pass spreadsheet, defaults to 0.01
-e Minimum all sample exon count for inclusion in analysis, defaults to 20.
-d Max per sample exon alignment depth, defaults to 50000. Exons containing higher
       counts are ignored.
-t Number concurrent threads to run, defaults to the max available to the jvm.
-m Minimum number exons per data chunk, defaults to 1500.
-w Examine whole gene counts for CNVs, defaults to exons.
-x Keep sex chromosomes (X,Y), defaults to removing. Don't mix sexes!
-z Don't delete temp files.

Example: java -Xmx4G -jar pathTo/USeq/Apps/PoReCNV -s PRCnvResults/  -b BamFiles
       -u hg19EnsGenes.ucsc.gz -a RBambedSource.R  -g H_sapiens_Feb_2009 

**************************************************************************************
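
Note on the residual and log2 ratio tracks above: for a Poisson fit the variance equals
the mean, so the Pearson residual is (observed - fitted)/sqrt(fitted). The Python sketch
below shows that and the -l/-p style filter. The default -l of 0.585 is approximately
log2(1.5). The function names are illustrative. This is not Alun Thomas's R script, and
zero counts are not handled.

```python
import math

def pearson_residual(observed, fitted):
    """Pearson residual for a Poisson fit, where the variance equals the mean."""
    return (observed - fitted) / math.sqrt(fitted)

def log2_ratio(observed, fitted):
    """log2(observed / fitted counts); requires observed > 0."""
    return math.log2(observed / fitted)

def passes(observed, fitted, adj_p, min_abs_log2=0.585, max_adj_p=0.01):
    """Apply -l and -p style cutoffs for the pass spreadsheet."""
    return abs(log2_ratio(observed, fitted)) >= min_abs_log2 and adj_p <= max_adj_p

print(pearson_residual(150, 100), log2_ratio(200, 100))
```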

**************************************************************************************
**                            Primer3 Wrapper: Dec  2006                            **
**************************************************************************************
Wrapper for the primer3 application. Extracts sequence, formats for primer3, executes
and parses the output to a spreadsheet. See http://frodo.wi.mit.edu/primer3/

-f Full path file text for your sequence file, tab delimited, sequence in 1st column.
-s Pick small product sizes (45-80bp), defaults to standard (80-150bp)
-p Full path file text for the primer3_core application. Defaults to
     /nfs/transcriptome/software/noarch/T2/64Bit_Primer3_1.0.0/src/primer3_core
-m Full path file text for the mispriming library. Defaults to
     /nfs/transcriptome/software/noarch/T2/64Bit_Primer3_1.0.0/
     cat_humrep_and_simple.cgi.txt

Example: java -jar pathTo/T2/Apps/Primer3Wrapper -f /home/dnix/seqForQPCR.txt -p
    /nfs/transcriptome/software/noarch/T2/64Bit_Primer3_1.0.0/src/primer3_core
    -m /nfs/transcriptome/software/noarch/T2/64Bit_Primer3_1.0.0/
    cat_humrep_and_simple.cgi.txt -s 
**************************************************************************************

**************************************************************************************
**                           Print Select Columns: Sept 2010                        **
**************************************************************************************
Spread sheet manipulation.

Required Parameters:
-f Full path file or directory text for tab delimited text file(s)
-i Column indexes to print, comma delimited, no spaces
-n Number of initial lines to skip
-l Print only this last number of lines
-c Column word to append onto the start of each line
-r Append a row number column as the first column in the output
-d Append file text onto the start of each line
-s Skip blank lines and those with less than the indicated number of columns.
-a Print all available columns.

Example: java -jar pathTo/T2/PrintSelectColumns -f /TabFiles/ -i 0,3,9 -n 1 -c chr

**************************************************************************************
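
Note on the example above (-i 0,3,9 -n 1 -c chr): the Python sketch below performs the
same kind of column selection on in-memory lines. It assumes 0-based column indexes,
which the example suggests but the menu does not state. The function name is
hypothetical.

```python
def select_columns(lines, indexes, skip=0, prefix=""):
    """Keep the given 0-based tab-delimited columns, skipping initial lines."""
    out = []
    for line in lines[skip:]:
        cells = line.rstrip("\n").split("\t")
        out.append(prefix + "\t".join(cells[i] for i in indexes))
    return out

rows = ["name\tstart\tstop\tscore", "1\t100\t200\t5.5", "X\t10\t20\t1.0"]
for row in select_columns(rows, [0, 1, 3], skip=1, prefix="chr"):
    print(row)
```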

**************************************************************************************
**                                 QCSeqs: Nov 2009                                 **
**************************************************************************************
QCSeqs takes directories of chromosome specific PointData xxx.bar.zip files that 
represent replicas of signature sequencing data, merges the strands, uses a sliding
window to sum the hits, and calculates Pearson correlation coefficients for the window
sums between each pair of replicas.  Only windows with a sum score >= the minimum 
are included in the correlation.

-d Split chromosome Point Data directories, full path, comma delimited. (These should
       contain chromosome specific xxx.bar.zip files). 
-t Temp directory, full path. This will be created and then deleted.
-w Window size in bps, defaults to 500.
-s Window step size in bps, defaults to 250.
-m Minimum window sum score, defaults to 5.
-e (Optional) Provide a full path file name in which to write the window sums.

Example: java -Xmx1500M -jar pathTo/USeq/Apps/QCSeqs -d /Solexa/PolII/Rep1PntData/,
      /Solexa/PolII/Rep2PntData/ -t /Solexa/PolII/TempDelMe -w 1000 -s 250 

**************************************************************************************
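
Note on the window correlation described above: the Python sketch below sums hits in
sliding windows for two replicas and computes a Pearson correlation over the windows
that meet the minimum. It assumes strand-merged hit positions for one chromosome and
that a window must meet the minimum in both replicas. The menu does not say which rule
QCSeqs uses, and the function names are hypothetical.

```python
import math

def window_sums(positions, chrom_length, window=500, step=250):
    """Count hits falling in each [start, start + window) sliding window."""
    sums = []
    for start in range(0, max(chrom_length - window, 0) + 1, step):
        sums.append(sum(1 for p in positions if start <= p < start + window))
    return sums

def pearson(xs, ys):
    """Pearson correlation coefficient; inputs must not be constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def replica_correlation(rep1, rep2, chrom_length, window=500, step=250, minimum=5):
    """Correlate window sums, keeping windows where both sums meet the minimum."""
    a = window_sums(rep1, chrom_length, window, step)
    b = window_sums(rep2, chrom_length, window, step)
    kept = [(x, y) for x, y in zip(a, b) if x >= minimum and y >= minimum]
    return pearson([x for x, _ in kept], [y for _, y in kept])
```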

**************************************************************************************
**                                Query Indexer: May 2018                           **
**************************************************************************************
Builds index files for Query by recursing through a data directory looking for bgzip
compressed and tabix indexed genomic data files (e.g. vcf, bed, maf, and custom).
Interval trees are built containing regions that overlap with one or more data sources.
These are used by the Query REST service to rapidly identify which data files contain
records that overlap a user's ROI. This app is threaded for simultaneous file loading
and requires >30G RAM to run on large data collections so use a big analysis server.
Note, relative file paths are saved. So long as the structure of the Data Directory is
preserved, the QueryIndexer and Query REST service don't need to run on the same file
system.

Required Params:
-c A bed file of chromosomes and their lengths (e.g. chr21 0 48129895) to use in
     building the intersection index. Exclude those you don't want to index. For
     multiple builds and species, add all, duplicates will be collapsed taking the
     maximum length. Any 'chr' prefixes are ignored when indexing and searching.
-d A data directory containing bgzipped and tabix indexed data files. Known file types
     include xxx.vcf.gz, xxx.bed.gz, xxx.bedGraph.gz, and xxx.maf.txt.gz. Others will
     be parsed using info from the xxx.gz.tbi index. Be sure to normalize and
     decompose_blocksub all VCF records, see http://genome.sph.umich.edu/wiki/Vt.
     Files may be hard linked but not soft.
-t Full path directory containing the compiled bgzip and tabix executables. See
     https://github.com/samtools/htslib
-i A directory in which to save the index files

Optional Params:
-s One or more directory paths, comma delimited no spaces, to skip when building
     interval trees but make available for data source record retrieval. Useful for
     whole genome gVCFs and read coverage files that cover large genomic regions.
-q Quiet output, no per record warnings.

Example for generating the test index using the GitHub Query/TestResources files, see
https://github.com/HuntsmanCancerInstitute/Query

d=/pathToYourLocalGitHubInstalled/Query/TestResources
java -Xmx10G -jar pathToUSeq/Apps/QueryIndexer -c $d/b37Chr20-21ChromLen.bed -d $d/Data
-i $d/Index -t ~/BioApps/HTSlib/1.3/bin/ -s $d/Data/Public/B37/GVCFs 

**************************************************************************************
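
Note on the -c chromosome bed file above: the Python sketch below applies the two rules
stated for it, ignoring any 'chr' prefix and collapsing duplicates to the maximum
length. It is an illustration only. The function name is hypothetical, and the lengths
other than the chr21 value from the menu are made up.

```python
def collapse_chrom_lengths(bed_lines):
    """Map chromosome name (without any 'chr' prefix) to its maximum length."""
    lengths = {}
    for line in bed_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        chrom, _start, stop = line.split()[:3]
        if chrom.startswith("chr"):
            chrom = chrom[3:]
        lengths[chrom] = max(lengths.get(chrom, 0), int(stop))
    return lengths

bed = ["chr21\t0\t48129895", "21\t0\t46000000", "chr20 0 63000000"]
print(collapse_chrom_lengths(bed))
```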

**************************************************************************************
**                            Randomize Text File: May 2013                         **
**************************************************************************************
Randomizes the lines of a text file(s).

Options:
-f Full path to a text file or directory containing such to randomize. Gzip/zip OK.
-n Number of lines to print, defaults to all.

Example: java -Xmx4G -jar pathTo/Apps/RandomizeTextFile -n 24560 -f
       /TilingDesign/oligos.txt.gz

************************************************************************************

**************************************************************************************
**                          Ranked Set Analysis: Jan 2006                           **
**************************************************************************************
RSA performs set analysis (intersection, union, difference) on lists of
genomic regions (tab delimited: chrom, start, stop, score, (optional notes)).

-a Full path file text for the first list of genomic regions.
-b Full path file text for the second list of genomic regions.
-d (Optional) Full path directory containing region files for all pair analysis.
-m Max gap, bps, set negative to force an overlap, defaults to -100
-s Save comparison as a PNG, default is no.

Example: java -jar pathTo/T2/Apps/RankedSetAnalysis -a /affy/nonAmpA.txt -b
      /affy/nonAmpB.txt -s

**************************************************************************************
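
Note on the -m max gap option above: one way to read "set negative to force an overlap"
is that two regions intersect when the gap between them, negative when they overlap, is
no larger than the max gap. The Python sketch below uses that reading. The exact
boundary rule in RSA is an assumption, and the function names are hypothetical.

```python
def gap(a, b):
    """bp between two (chrom, start, stop) regions; negative values are overlap."""
    return max(a[1], b[1]) - min(a[2], b[2])

def intersects(a, b, max_gap=-100):
    """True when both regions share a chromosome and the gap is <= max_gap."""
    return a[0] == b[0] and gap(a, b) <= max_gap

def intersection(list_a, list_b, max_gap=-100):
    """Regions of list_a that intersect at least one region of list_b."""
    return [a for a in list_a if any(intersects(a, b, max_gap) for b in list_b)]

a = ("chr1", 100, 500)
print(intersects(a, ("chr1", 350, 900)))             # 150 bp overlap
print(intersects(a, ("chr1", 450, 900)))             # only 50 bp overlap
print(intersects(a, ("chr1", 450, 900), max_gap=0))  # any overlap accepted
```

With the default of -100, regions must overlap by at least 100 bp to be counted as
intersecting under this reading.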

**************************************************************************************
**                               Read Coverage: Feb 2012                            **
**************************************************************************************
Generates read coverage stair-step xxx.bar graph files for visualization in IGB. Will
also calculate per base coverage stats for a given file of interrogated regions and
create a bed file of regions with low coverage based on the minimum number of reads.
By default, graph values are scaled per million mapped reads.

Options:
-p Point Data directories, full path, comma delimited. Should contain chromosome
       specific xxx.bar.zip or xxx_-_.bar files. Can also provide one dir containing
       PointData dirs.
-s Save directory, full path.
-k Data is stranded, defaults to merging strands while generating graphs.
-a Data contains hit counts due to running it through the MergePointData app.
-r Don't scale graph values. Leave as actual read counts. 
-i (Optional) Full path file text for a tab delimited bed file (chr start stop ...)
       containing interrogated regions to use in calculating per base coverage
       statistics. Interbase coordinates assumed.
-m Minimum number reads for defining good coverage, defaults to 8. Use this in combo
       with the interrogated regions file to identify poor coverage regions.
-b Just calculate stats, skip coverage graph generation.
-l Plus scalar, for stranded RC output, defaults to # plus observations/1000000
-n Minus scalar, for stranded RC output, defaults to # minus observations/1000000
-c Combined scalar, defaults to # observations/1000000

Example: java -Xmx1500M -jar pathTo/USeq/Apps/ReadCoverage -p
      /Data/Ets1Rep1/,/Data/Ets1Rep2/ -s /Data/MergedHitTrckEts1 -i 
      /CapSeqDesign/interrogatedExonsChrX.bed

**************************************************************************************
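
Note on the scaling and low coverage reporting above: the Python sketch below divides
raw depths by the default scalar (# observations/1000000) and finds interbase blocks
whose raw depth falls below the -m minimum. It assumes a list of per-base depths for one
interrogated region. The function names are hypothetical.

```python
def scale_per_million(depths, total_observations):
    """Divide raw per-base read depths by (# observations / 1,000,000)."""
    scalar = total_observations / 1_000_000.0
    return [d / scalar for d in depths]

def low_coverage_blocks(depths, region_start, minimum=8):
    """Return interbase (start, stop) blocks where raw depth is below the minimum."""
    blocks, block_start = [], None
    for offset, depth in enumerate(depths):
        if depth < minimum and block_start is None:
            block_start = region_start + offset
        elif depth >= minimum and block_start is not None:
            blocks.append((block_start, region_start + offset))
            block_start = None
    if block_start is not None:
        blocks.append((block_start, region_start + len(depths)))
    return blocks

print(scale_per_million([10, 20], 2_000_000))
print(low_coverage_blocks([9, 3, 2, 8, 10, 1], region_start=100))
```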

**************************************************************************************
**                            Reference Mutator  : Aug 2014                         **
**************************************************************************************
Takes a directory of fasta chromosome sequence files and converts the reference allele
to the alternate provided by a snp mapping table.

Required:
-f Full path to a directory containing chromosome specific fasta files. zip/gz OK.
-t Full path to a snp mapping table.
-s Full path to a directory to save the alternate fasta files.

Example: java -Xmx10G -jar pathTo/USeq/Apps/ReferenceMutator -f /Hg19/Fastas
    -s /Hg19/AltFastas/ -t /Hg19/omni2.5SnpMap.txt

**************************************************************************************

**************************************************************************************
**                          RNA Editing PileUp Parser: June 2013                    **
**************************************************************************************
Parses a SAMTools mpileup output file for refseq A bases that show evidence of
RNA editing via conversion to Gs, stranded. Base fraction editing is calculated for
bases passing the thresholds for viewing in IGB and subsequent clustering with
the RNAEditingScanSeqs app. The parsed PointData can be further processed using the
methylome analysis applications.

Options:
-p Path to a mpileup file (.gz or .zip OK, use 'samtools mpileup -Q 13 -A -B' params).
-v Versioned Genome (ie H_sapiens_Mar_2006), see UCSC Browser,
      http://genome.ucsc.edu/FAQ/FAQreleases.
-s Save directory, full path, defaults to pileup file directory.
-r Minimum read coverage, defaults to 5.
-t Generate stranded specific reference calls, defaults to non stranded. Required for
      stranded down stream analysis.
-m Skip processing chrM.

Example: java -Xmx4G -jar pathTo/USeq/Apps/RNAEditingPileUpParser -t -p 
      /Pileups/N2.mpileup.gz -v C_elegans_Oct_2010

**************************************************************************************

**************************************************************************************
**                           RNA Editing Scan Seqs: April 2014                      **
**************************************************************************************
RESS attempts to identify clustered editing sites across a genome using a sliding
window approach.  Each window is scored for the pseudomedian of the base fraction
edits as well as the probability that the observations occurred by chance using a
permutation test based on the chiSquare goodness of fit statistic. 

Options:
-s Save directory, full path.
-e Edited PointData directory from the RNAEditingPileUpParser.
       These should contain stranded chromosome specific xxx_-/+_.bar.zip files. One
       can also provide a single directory that contains multiple PointData
       directories. These will be merged when scanning.
-r Reference PointData directory from the RNAEditingPileUpParser. Ditto.

Advanced Options:
-a Minimum base read coverage, defaults to 5.
-b Minimum base fraction edited to use in analysis, defaults to 0.01
-w Window size, defaults to 50.
-p Minimum window pseudomedian, defaults to 0.005.
-m Minimum number observations in window, defaults to 3. 
-t Run a stranded analysis, defaults to non-stranded.
-i Remove base fraction edits that are non zero and represented by just one edited
       base.

Example: java -Xmx4G -jar pathTo/USeq/Apps/RNAEditingScanSeqs -s /Results/RESS -p 0.01
-e /PointData/Edited -r /PointData/Reference 

**************************************************************************************
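
Note on the window pseudomedian above: a pseudomedian is commonly defined as the median
of all pairwise averages (Walsh averages) of the observations. The Python sketch below
uses that definition with each value also paired with itself. Whether RESS includes the
self pairs is an assumption, and the function names are hypothetical.

```python
from itertools import combinations_with_replacement
from statistics import median

def pseudomedian(values):
    """Median of all pairwise averages (Walsh averages), self pairs included."""
    walsh = [(a + b) / 2.0 for a, b in combinations_with_replacement(values, 2)]
    return median(walsh)

def window_passes(fractions, min_pseudomedian=0.005, min_observations=3):
    """Apply -p and -m style cutoffs to one window of base fraction edits."""
    return (len(fractions) >= min_observations
            and pseudomedian(fractions) >= min_pseudomedian)

print(pseudomedian([1, 2, 3]))
print(window_passes([0.02, 0.03, 0.04]))
```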

**************************************************************************************
**                                    RNASeq: Aug 2016                              **
**************************************************************************************
The RNASeq application is a wrapper for processing RNA-Seq data through a variety of
USeq applications. It uses the DESeq2 package for calling significant differential
expression.  3-4 biological replicas per condition are strongly recommended. See 
http://useq.sourceforge.net/usageRNASeq.html for details on constructing splice indexes,
aligning your reads, and building a proper gene (NOT transcript) table. Use this
application as a first pass transcriptome analysis. Run the individual apps for a fine
tuned analysis (e.g. DefinedRegionDifferentialSeq), see 
http://useq.sourceforge.net/usageRNASeq.html

The workflow:
   1) Converts raw sam alignments containing splice junction coordinates into genome
          coordinates outputting sorted bam alignments.
   2) Makes relative read depth coverage tracks.
   3) Scores known genes for differential exonic and intronic expression using DESeq2
         and alternative splicing with a chi-square test.
   4) Identifies unannotated differentially expressed transfrags using a window
         scan and DESeq2.

Options:
-s Save directory, full path.
-t Treatment alignment file directory, full path.  Contained within should be one
       directory per biological replica, each containing one or more raw
       SAM (.gz/.zip OK) files.
-c Control alignment file directory, ditto.  
-n Data is stranded. Only analyze reads from the same strand as the annotation.
-j Reverse stranded analysis.  Only count reads from the opposite strand of the
       annotation.  This setting should be used for strand-specific dUTP protocols.
-k Flip the strand of the second read pair.
-b Reverse the strand of both pairs.  Use this option if you would like the orientation
      of the alignments to match the orientation of the annotation in Illumina stranded 
      dUTP sequencing.
-w Don't add non phred transformed p-value columns to spreadsheet, defaults to adding.
-x Max per base alignment depth, defaults to 50000. Genes containing such high
       density coverage are ignored.
-v Genome version (e.g. H_sapiens_Feb_2009, M_musculus_Jul_2007), see UCSC FAQ,
      http://genome.ucsc.edu/FAQ/FAQreleases.
-g UCSC RefFlat or RefSeq gene table file, full path. Tab delimited, see RefSeq Genes
       http://genome.ucsc.edu/cgi-bin/hgTables, (uniqueName1 name2(optional) chrom
       strand txStart txEnd cdsStart cdsEnd exonCount (commaDelimited)exonStarts
       (commaDelimited)exonEnds). Example: ENSG00000183888 C1orf64 chr1 + 16203317
       16207889 16203385 16205428 2 16203317,16205000 16203467,16207889 . NOTE:
       this table should contain only ONE composite transcript per gene (e.g. use
       Ensembl genes NOT transcripts). Use the MergeUCSCGeneTable app to collapse
       transcripts to genes. See the RNASeq usage guide for details.
-r Full path to R, defaults to '/usr/bin/R'. Be sure to install DESeq2, gplots, and
       qvalue Bioconductor packages.

Advanced Options:
-m Combine replicas and run single replica analysis using binomial based statistics,
       defaults to DESeq and a negative binomial test.
-a Maximum alignment score. Defaults to 120, smaller numbers are more stringent.
-o Don't delete overlapping exons from the gene table.
-e Print verbose output from each application and retain temp files.
-p Run SAMseq in place of DESeq.  This is suggested when you have five or more
      replicates in each condition, and not suggested if you have fewer.  Note 
      that it can't be run if you don't have at least two replicates per condition.

Example: java -Xmx2G -jar pathTo/USeq/Apps/RNASeq -v D_rerio_Dec_2008 -t 
      /Data/PolIIMut/ -c /Data/PolIIWT/ -s
      /Data/Results/MutVsWT -g /Anno/zv8Genes.ucsc 
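
To illustrate the -g gene table layout, the following Python sketch splits the example
row from the -g description into its eleven columns and sums the exon lengths. It is
only an illustration: parse_gene_row and the dictionary keys are hypothetical names,
not part of USeq, and it assumes the optional name2 column is present and that a
length is simply end minus start (UCSC-style coordinates).

```python
def parse_gene_row(line):
    """Split one whitespace-delimited gene table row (with name2) into named fields."""
    f = line.split()
    if len(f) != 11:
        raise ValueError("expected 11 columns, found %d" % len(f))
    exon_count = int(f[8])
    # exonStarts and exonEnds are comma delimited and may carry a trailing comma
    starts = [int(x) for x in f[9].split(",") if x]
    ends = [int(x) for x in f[10].split(",") if x]
    if len(starts) != exon_count or len(ends) != exon_count:
        raise ValueError("exonCount does not match exonStarts/exonEnds")
    return {
        "name": f[0], "name2": f[1], "chrom": f[2], "strand": f[3],
        "txStart": int(f[4]), "txEnd": int(f[5]),
        "cdsStart": int(f[6]), "cdsEnd": int(f[7]),
        "exons": list(zip(starts, ends)),
    }

row = ("ENSG00000183888 C1orf64 chr1 + 16203317 16207889 16203385 16205428 2 "
       "16203317,16205000 16203467,16207889")
gene = parse_gene_row(row)
exon_bases = sum(end - start for start, end in gene["exons"])
print(gene["name"], gene["chrom"], exon_bases)
```

A check like this can catch rows whose exonCount disagrees with the exon lists before
the table is handed to the app.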

**************************************************************************************

**************************************************************************************
**                            RNA Seq Simulator: Aug 2011                           **
**************************************************************************************
RSS takes SAM alignment files from RNA-Seq data and simulates over dispersed, multiple
replica, differential, non-stranded RNA-Seq datasets. 

Options:
-u UCSC RefFlat or RefSeq gene table file, full path. See,
       http://genome.ucsc.edu/cgi-bin/hgTables, (name1 name2(optional) chrom strand
       txStart txEnd cdsStart cdsEnd exonCount exonStarts exonEnds)
-p PointData directories, full path, comma delimited. These should contain parsed
       PointData (chromosome specific xxx_-/+_.bar.zip files) from running the
       NovoalignParser on all of your novoaligned RNA-Seq data. 
-n A full path directory name containing 3 or 4 equally split, randomized alignment
       xxx.sam (.zip or .gz) files. One for each replica you wish to simulate. Use the
       RandomizeTextFile and FileSplitter apps to generate these.

Default Options:
-g Number of genes to make differentially expressed, defaults to 500
-r Minimum number of mapped reads to include a gene in the differential expression
       defaults to 50.
-a Smallest skew factor for differential expression, defaults to 0.2
-b Largest skew factor for differential expression, defaults to 0.8
-c Smallest excluded skew factor for differential expression and for overdispersion,
       defaults to 0.45
-d Largest excluded skew factor for differential expression and for overdispersion,
       defaults to 0.55
-o Don't overdisperse datasets, defaults to overdispersing data using -c and -d params.
-s Skip intersecting genes.

Example: java -Xmx12G -jar pathTo/USeq/Apps/RNASeqSimulator -u 
       /anno/hg19RefFlatKnownGenes.ucsc.txt -p /Data/Heart/MergedPointData/ -n
       /Data/Heart/SplitSAM/ -s 46 -r 15 -g 1000 

**************************************************************************************

**************************************************************************************
**                               S3UrlMaker : April 2016                            **
**************************************************************************************
Generates Amazon S3 signed and timed URLs.

Required:
-b Amazon bucket name.
-p Relative 'path' to a 'directory' or 'file' within the bucket to fetch S3 objects,
     recursive.
-c Path to an Amazon profile credentials text file containing:
     [USER_NAME]
     aws_access_key_id=YOUR_ACCESS_KEY_ID
     aws_secret_access_key=YOUR_SECRET_ACCESS_KEY
-u Amazon USER_NAME in credentials file.

Optional:
-d Create the URLs, defaults to just listing the objects for creation.
-t Hours until URLs expire, defaults to 72.
-s Silence non error messages.

Example: java -Xmx4G -jar pathTo/USeq/Apps/S3UrlMaker -b aruplab -t 24 
     -u gip@gmail.com -p results/dev/2016/12-123-123456 -c ~/Amazon/cred.txt 

**************************************************************************************

**************************************************************************************
**                                Sam 2 Fastq: May 2018                              **
**************************************************************************************
Given a query name sorted alignment file, S2F writes out fastq data for paired and 
unpaired alignments. Any non-primary, secondary, or supplemental 
alignments are written to a failed sam file.  This app doesn't have the memory leak
found in Picard, writes gzipped fastq, and error checks the reads. Provide an
unfiltered fastq-bam to use in retrieving missing mates. S2F will remove pre and
post naming info from the BamBlaster restoring the original fragment names.

Options:
-s Path to a directory for saving parsed data.
-a Path to a query name sorted bam/sam alignment file. 
-u (Optional) Path to a query name sorted unfiltered bam/sam alignment file for use
      in fetching missing mates of the first bam. Convert fastq to bam, then qn sort.

Example: java -Xmx2G -jar pathTo/USeq/Apps/Sam2Fastq -a myQNSorted.bam -s S2F/

**************************************************************************************

**************************************************************************************
**                              Sam Fastq Loader: March 2012                       **
**************************************************************************************
Extracts the original Illumina fastq data from single or paired end sam alignments.
Assumes alignments and reads are in the same order. In novoalign, set -oSync .

Options:
-a Sam alignment txt file, full path, .gz/.zip OK.
-f First read fastq file, ditto.
-s (Optional) Second read fastq file, from paired read sequencing, ditto.

Example: java -Xmx1G -jar pathToUSeq/Apps/Sam2Fastq -a /SAM/unaligned.sam.gz -f 
     /Fastq/X1_110825_SN141_0377_AD06YNACXX_1_1.txt.gz -s 
     /Fastq/X1_110825_SN141_0377_AD06YNACXX_1_2.txt.gz

**************************************************************************************

**************************************************************************************
**                                Sam 2 USeq : April 2019                           **
**************************************************************************************
Generates per base read depth stair-step graph files for genome browser visualization.
By default, values are scaled per million mapped reads with no score thresholding. Can
also generate a list of regions that pass and fail a minimum coverage depth.

Required Options:
-f Full path to a bam or a sam file (xxx.sam(.gz/.zip OK) or xxx.bam) or directory
      containing such. Multiple files are merged. Also works with a directory of
      ChromData from MergePairedAlignments (faster and no overlap double counting).
-v Versioned Genome (e.g. H_sapiens_Mar_2006, D_rerio_Jul_2010), see UCSC Browser,
      http://genome.ucsc.edu/FAQ/FAQreleases.

Default Options:
-s Generate strand specific coverage graphs.
-m Minimum mapping quality score. Defaults to 0, bigger numbers are more stringent.
      This is a phred-scaled posterior probability that the mapping position of the
      read is incorrect.
-a Maximum alignment score. Defaults to 1000, smaller numbers are more stringent.
-r Don't scale graph values. Leave as actual read counts. 
-e Scale repeat alignments by dividing the alignment count at a given base by the
      total number of genome wide alignments for that read.  Repeat alignments are
      thus given fractional count values at a given location. Requires that the IH
      tag was set.
-g Set the scalar count to this value, defaults to the number of passing alignments.
-b Path to a region bed file (tab delim: chr start stop ...) to use in calculating
      read coverage statistics.  Be sure these do not overlap! Run the MergeRegions app
      if in doubt.
-x Maximum read coverage stats calculated, defaults to 100, for use with -b.
-p Path to a file for saving per region coverage stats. Defaults to variant of -b.
-c Print all regions that meet a minimum # counts, defaults to 0, don't print.
      Requires -b to enable whole genome scan.
-l Print regions that also meet a minimum length, defaults to 0.
-n Root name for the pass and fail coverage bed files generated by -b and -c
-o Path to log file.  Write coverage statistics to a log file instead of stdout.
-k Make average alignment length graph instead of read depth.
-d Full path to a directory for saving bar binary PointData, defaults to not saving.
-j Write summary stats in json format to this file, requires -b and -c.
-y Include CIGAR Ns in read coverage, defaults to just M values.
-z Flip the strand of the 2nd of pair alignments.
-w Path to a config txt file for setting the above.

Example: java -Xmx1500M -jar pathTo/USeq/Apps/Sam2USeq -f /Data/SamFiles/ -r
     -v H_sapiens_Feb_2009 -b ccdsExons.bed.gz 
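
The default per-million scaling and the -e fractional counting described above can be
pictured with this small Python sketch. The function names are hypothetical and the
code is not taken from USeq. It assumes the scalar is the number of passing
alignments (or the -g value) and that the IH tag holds the number of genome-wide
alignments for the read.

```python
def scale_per_million(depths, scalar_count):
    """Scale raw per-base depths to depth per million alignments."""
    per_million = scalar_count / 1_000_000
    return [d / per_million for d in depths]

def fractional_depth(ih_values):
    """Depth at one base when each covering alignment contributes 1/IH."""
    return sum(1.0 / ih for ih in ih_values)

# Two bases with raw depths 10 and 20 in a dataset of 2 million passing alignments
print(scale_per_million([10, 20], 2_000_000))
# One unique read, one read with 2 hits, and one read with 4 hits cover a base
print(fractional_depth([1, 2, 4]))
```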

**************************************************************************************

**************************************************************************************
**                            SamAlignmentDepthMatcher: Oct 2018                    **
**************************************************************************************
Performs a region by region alignment depth subsampling to output a sam file with
matching alignment depths. Alignments are extracted, mates matched, randomized, then
saved to match the count from the other file. Uses all threads available.

Required options:
-m Bam file, coordinate sorted with index, to use in calculating the alignment depths
      to match.
-s Bam file, coordinate sorted with index, to subsample to match -m and save. Be sure
      this is significantly bigger than -m .
-o Gzipped sam file to write the unsorted alignments from -s, must end in xxx.sam.gz
-b Bed file of regions in which to match alignment depths, xxx.bed(.gz/.zip OK).

Example: java -Xmx25G -jar pathToUSeq/Apps/SamAlignmentDepthMatcher -m tumor.bam
      -s bigMockTumor.bam -o matchedMock.sam.gz -b exomeCaptureTargets.bed.gz
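
The region by region subsampling can be sketched in Python as follows. This is a
simplified illustration under stated assumptions, not the app's implementation: the
alignments are plain identifiers already grouped by region, mate matching is ignored,
and match_depths is a hypothetical name.

```python
import random

def match_depths(target_counts, pool_by_region, seed=0):
    """For each region, shuffle the pool and keep as many as the target count."""
    rng = random.Random(seed)
    kept = {}
    for region, target in target_counts.items():
        pool = list(pool_by_region.get(region, []))
        rng.shuffle(pool)
        # If the pool is smaller than the target, everything available is kept
        kept[region] = pool[:target]
    return kept

targets = {"chr1:100-200": 2, "chr1:500-600": 5}
pool = {"chr1:100-200": ["a1", "a2", "a3", "a4"],
        "chr1:500-600": ["b1", "b2", "b3"]}
matched = match_depths(targets, pool)
print({region: len(alns) for region, alns in matched.items()})
```

The second region shows why the -s file should be significantly bigger than -m: with
too few alignments in the pool, the target depth cannot be reached.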

**************************************************************************************

**************************************************************************************
**                        Sam Alignment Extractor: April 2018                       **
**************************************************************************************
Splits an alignment file into those that pass or fail thresholds and intersects
regions of interest. Calculates a variety of QC statistics.

Required Options:
-b Bam alignment file with its associated xxx.bai index, sorted by coordinate. 
-r A regions bed file (chr, start, stop,...) to intersect, full path, see,
       http://genome.ucsc.edu/FAQ/FAQformat#format1 , gz/zip OK.
-s Provide a directory path for saving the filtered alignments

Default Options:
-q Minimum mapping quality, defaults to no filtering, recommend 13.
-a Alignment score threshold, defaults to no filtering. 
-n Smaller alignment scores are better (novo), defaults to bigger are better (bwa).
-d Divide alignment score by the number of CIGAR M bases.
-m Minimum molecular barcode family size, defaults to 0. Requires :FS:# in read name.
-j Write summary stats in json format to this file.
-f Save off target alignments that meet thresholds to the pass file, defaults to fail.
-x Save secondary, supplemental, and non primary alignments, that pass the thresholds
       defaults to fail.
-w Skip writing failing alignments. Speeds up processing but kills the stats.
-u Write unmapped alignments to the pass file, defaults to writing to fail.

Example: java -Xmx4G -jar pathTo/USeq/Apps/SamAlignmentExtractor -q 20 -a 0.75 -d -b
      /Data/raw.bwaMem.bam -r /Data/targetsPad25bp.bed.gz -s /Data/SAE/ 
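
The -d normalization used in the example (-a 0.75 -d) can be sketched in Python as
shown here. The helper names are hypothetical, and only CIGAR M operations are counted,
following the wording of the -d option; how the app treats = and X operations is not
stated in this menu.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def count_m_bases(cigar):
    """Sum the lengths of all M operations in a CIGAR string."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op == "M")

def normalized_score(alignment_score, cigar):
    """Divide the alignment score by the number of CIGAR M bases."""
    m_bases = count_m_bases(cigar)
    if m_bases == 0:
        raise ValueError("no M bases in CIGAR: " + cigar)
    return alignment_score / m_bases

cigar = "5S70M2I25M"
score = normalized_score(76, cigar)
# With bigger-is-better scoring (the bwa default), 0.8 passes a 0.75 threshold
print(count_m_bases(cigar), score, score >= 0.75)
```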

**************************************************************************************

**************************************************************************************
**                             Sam Comparator  : Dec 2015                           **
**************************************************************************************
Compares coordinate sorted, unique, alignment sam/bam files.  Splits alignments into
those that match or mismatch chrom and position (or sequence).

Required:
-a Full path sam/bam file name. zip/gz OK.
-b Full path sam/bam file name. zip/gz OK.
-s Full path to a directory to save the results.

Optional:
-p Print paired mismatches to screen.
-f Only process first chrom, defaults to all.
-e Check sequence of pairs.

Example: java -Xmx10G -jar pathTo/USeq/Apps/SamComparator -a /hg19/ref.sam.gz
       -b /hg19/alt.sam.gz -s /hg19/SplitAlignments/

**************************************************************************************

**************************************************************************************
**                                Sam Parser: June 2013                             **
**************************************************************************************
Parses SAM and BAM files into alignment center position PointData xxx.bar files.
For RNASeq data, first run the SamTranscriptomeParser to convert splice junction
coordinates to genomic coordinates and set -m to 0 below.

Options:
-v Versioned Genome (e.g. H_sapiens_Mar_2006), see UCSC Browser,
      http://genome.ucsc.edu/FAQ/FAQreleases.
-f The full path file or directory containing xxx.sam(.gz/.zip OK) or xxx.bam file(s).
      Multiple files will be merged.
-r Full path directory for saving the results.
-m Minimum mapping quality score. Defaults to 13, bigger numbers are more stringent.
      This is a phred-scaled posterior probability that the mapping position of the
      read is incorrect. For RNA-Seq data from the SamTranscriptomeParser, set this
      to 0.
-a Maximum alignment score. Defaults to 60, smaller numbers are more stringent.

Example: java -Xmx1500M -jar pathToUSeq/Apps/SamParser -f /Novo/Run7/
     -v C_elegans_May_2008 -m 0 -a 120  
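
For reference, the phred scaling behind the -m threshold is the standard
-10*log10(probability) transform. The Python sketch below converts in both directions.
The function names are hypothetical and the code is independent of USeq.

```python
import math

def mapq_to_error_probability(mapq):
    """Probability that the mapping position is wrong for a phred-scaled MAPQ."""
    return 10 ** (-mapq / 10)

def error_probability_to_mapq(probability):
    """Phred-scale a probability: -10 * log10(p)."""
    return -10 * math.log10(probability)

# The default threshold of 13 corresponds to roughly a 5% chance of a wrong position
print(round(mapq_to_error_probability(13), 4))
print(round(error_probability_to_mapq(0.01), 1))
```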

**************************************************************************************

**************************************************************************************
**                          Sam Transcriptome Parser: March 2017                    **
**************************************************************************************
STP takes SAM alignment files that were aligned against chromosomes and extended
splice junctions (see MakeTranscriptome app), converts the coordinates to genomic
space and sorts and saves the alignments in BAM format. Although alignments don't need
to be sorted by chromosome and position, it is assumed all the alignments for a given
fragment are grouped together. 

Options:
-f The full path file or directory containing raw xxx.sam(.gz/.zip OK) file(s).
      Multiple files will be merged. Skip -f and specify -s to read from standard in.

Default Options:
-s Save file, defaults to that inferred by -f. If an xxx.sam extension is provided,
      the alignments won't be sorted by coordinate or saved as a bam file.
-a Maximum alignment score. Defaults to Float.MAX_VALUE, no threshold.
-m Minimum mapping quality score, defaults to 0 (no filtering), larger numbers are
      more stringent. Only applies to genomic matches, not splice junctions. Set to 13
      or more to require near unique alignments.
-x Maximum mapping quality, reset reads with a mapping quality greater than the max to
      this max.
-n Maximum number of locations each read may align, defaults to 1 (unique matches).
-e Don't randomly pick one alignment when the maximum number of locations
       threshold fails, fail all.
-r Reverse the strand of the second paired alignment. Reversing the strand is
      needed for proper same strand visualization of paired stranded Illumina data.
-b Reverse the strand of both pairs.  Use this option if you would like the orientation
      of the alignments to match the orientation of the annotation in Illumina stranded 
      dUTP sequencing.
-u Save unmapped reads and those that fail the alignment score.
-c Don't remove chrAdapt and chrPhiX alignments.
-j Only print splice junction alignments, defaults to all.
-p Merge proper paired unique alignments. Those that cannot be unambiguously merged
      are left as pairs. Recommended to avoid double counting errors and increase
      base calling accuracy. For paired Illumina dUTP data, use -p -r -b .
-q Maximum acceptable base pair distance for merging, defaults to 300000.
-h Full path to a txt file containing a sam header, defaults to autogenerating the
      header from the read data.

Example: java -Xmx1500M -jar pathToUSeq/Apps/SamTranscriptomeParser -f /Novo/Run7/
     -m 20 -s /Novo/STPParsedBams/run7.bam -p -r  

**************************************************************************************

**************************************************************************************
**                                SamSplitter: July 2015                            **
**************************************************************************************
Randomly splits a sam or bam file in ~1/2. To maintain pairs, sort by queryname! Can
also split by paired and unpaired.

Options:
-s The full path to a queryname sorted xxx.bam or xxx.sam (.gz OK) file.

Default Options:
-a Maximum alignment score. Defaults to 240, smaller numbers are more stringent.
      Approx 30pts per mismatch.
-m Minimum mapping quality score, defaults to 0 (no filtering), larger numbers are
      more stringent. Set to 13 or more to require near unique alignments. DO NOT set
      for alignments parsed by the SamTranscriptomeParser!
-b Bypass all filters and thresholds.
-p Split into paired alignments and unpaired alignments.
-d Don't add PG line to sam header.

Example: java -Xmx1500M -jar pathToUSeq/Apps/SamSplitter -s /Novo/Run7/exome.bam
     -m 20 -a 120  
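
Why queryname sorting keeps pairs together can be seen in this Python sketch. Records
that share a read name are adjacent in a queryname sorted file, so each group can be
sent as a unit to one half or the other. The records here are simple (name, detail)
tuples and split_by_query_name is a hypothetical name; this is an illustration, not
the app's code.

```python
import random
from itertools import groupby

def split_by_query_name(records, seed=None):
    """Randomly send each run of same-named records to one of two halves."""
    rng = random.Random(seed)
    first, second = [], []
    # groupby only groups adjacent records, hence the queryname sort requirement
    for _, group in groupby(records, key=lambda rec: rec[0]):
        target = first if rng.random() < 0.5 else second
        target.extend(group)
    return first, second

records = [("r1", "mate1"), ("r1", "mate2"), ("r2", "mate1"),
           ("r2", "mate2"), ("r3", "single")]
half_a, half_b = split_by_query_name(records, seed=1)
print(len(half_a), len(half_b))
```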

**************************************************************************************

**************************************************************************************
**                         Sam Read Depth Sub Sampler: Feb 2019                     **
**************************************************************************************
Filters, randomizes, and subsamples a coordinate sorted bam alignment file to a target
base level read depth over each of the provided regions. Depending on the gaps between
your regions, you may need to remove duplicate lines, e.g. 'sort -u body.sam > uni.sam'

Options:
-a Alignment xxx.bam file, coordinate sorted with index.
-b Bed file of regions to subsample (e.g. use Sam2USeq -c 1 -b hg38StdChrms.bed)
-t Target read depth.

Default Options:
-p Keep read groups together.  Causes greater variation in depth.
-x Maximum alignment score. Defaults to 300, smaller numbers are more stringent.
-q Minimum mapping quality score. Defaults to 13, bigger numbers are more stringent.
      For RNASeq data, set this to 0.

Example: java -Xmx25G -jar pathToUSeq/Apps/SamReadDepthSubSampler -x 240 -q 20 -a
      /Novo/Run7/full.bam -t 100 -b regionsWith1PlusAlignment.bed.gz 

**************************************************************************************

**************************************************************************************
**                               Sam SV Filter: Oct 2015                            **
**************************************************************************************
Filters SAM records based on their intersection with a list of target regions for
structural variation analysis. Both mates of a paired alignment are kept if they align to
at least one target region. These are split into those that align to different targets,
(span) the same target with sufficient softmasking on either the left or right side of the
mate pair (soft), or to one target and somewhere else outside of the bed file (single).

Options:
-a Alignment file or directory containing NAME sorted SAM/BAM files. Multiple files
       are processed independently. Xxx.sam(.gz/.zip) or xxx.bam are OK. Assumes only
       uniquely aligned reads. Remove duplicates with Picard's MarkDuplicates app.
-s Save directory for the results.
-b Bed file (tab delim: chr, start, stop, ...) of target regions interbase coordinates.

Default Options:
-n Mark passing alignments as secondary. Needed for Delly with -n 30 novoalignments.
-d Don't coordinate sort and index alignments.
-x Maximum alignment score. Defaults to 1000, smaller numbers are more stringent.
-q Minimum mapping quality score. Defaults to 5, bigger numbers are more stringent.
-c Chromosomes to skip, defaults to 'chrAdap,chrPhi,chrM,random,chrUn'. Any SAM
       record chromosome name that contains one will be failed.
-m Minimum number of soft masked bases needed to keep paired alignments. Both must intersect
       a target region in the bed file, defaults to 10

Example: java -Xmx25G -jar pathTo/USeq_xxx/Apps/SamSVFilter -x 150 -q 13 -a
      /Novo/Run7/ -s /Novo/Run7/SSVF/ -c 'chrPhi,_random,chrUn_' 

**************************************************************************************

**************************************************************************************
**                              SamSubsampler: June 2016                            **
**************************************************************************************
Filters, randomizes, subsamples and sorts sam/bam alignments, doesn't keep pairs.

Options:
-a Alignment file or directory containing SAM/BAM (xxx.sam(.zip/.gz OK) or xxx.bam).
      Multiple files are merged.
-r Results directory.

Default Options:
-n Number of alignments to print, defaults to all passing thresholds.
-f Fraction alignments to keep, defaults to 1.
-s Sort and index output alignments.
-x Maximum alignment score. Defaults to 300, smaller numbers are more stringent.
-q Minimum mapping quality score. Defaults to 13, bigger numbers are more stringent.
      For RNASeq data, set this to 0.
-b Bypass all filters and thresholds.

Example: java -Xmx25G -jar pathToUSeq/Apps/SamSubsampler -x 240 -q 20 -a
      /Novo/Run7/ -r /Novo/Run7/SR -f 0.05 

**************************************************************************************

**************************************************************************************
**                            Scalpel VCF Parser: Jan 2017                         **
**************************************************************************************
Filters Scalpel VCF INDEL files for various thresholds.  Adds tumor DP and AF values
to the info field.

Required Params:
-v Full path file or directory containing xxx.vcf(.gz/.zip OK) file(s)
-m Minimum QUAL score, defaults to 0
-t Minimum tumor allele frequency (AF), defaults to 0.
-n Maximum normal AF, defaults to 1.
-u Minimum tumor alignment depth, defaults to 0.
-o Minimum normal alignment depth, defaults to 0.
-d Minimum T-N AF difference, defaults to 0.
-r Minimum T/N AF ratio, defaults to 0.
-p Remove non PASS filter field records.

Example: java -jar pathToUSeq/Apps/ScalpelVCFParser -v /VCFFiles/ -t 0.05 -n 0.5 -u 100
        -o 20 -d 0.05 -r 2

**************************************************************************************

**************************************************************************************
**                                  Scan Seqs: July 2015                            **
**************************************************************************************
Takes unshifted stranded chromosome specific PointData and uses a sliding window to
calculate several smoothed window statistics. These include a binomial p-value, a
q-value FDR, an empirical FDR, and a Bonferroni corrected binomial p-value for peak
shift strand skew. These are saved as heat map/ stairstep xxx.bar graph files for
direct viewing in the Integrated Genome Browser. The empFDR is only calculated when
scanning for enriched regions. Provide >2x the # of control reads relative to
treatment to prevent significant sub sampling when calculating the empFDR. If control
data is not provided, simple window sums are calculated.

Options:
-s Save directory, full path.
-t Treatment PointData directories, full path, comma delimited. These should
       contain unshifted stranded chromosome specific xxx_-/+_.bar.zip files. One
       can also provide a single directory that contains multiple PointData
       directories.
-c Control PointData directories, ditto. 
-p Peak shift, see the PeakShiftFinder app. Average distance between + and - strand
       peaks. Will be used to shift the PointData and set the window size.
-r Full path to R loaded with Storey's q-value library, defaults to '/usr/bin/R'
       file, see http://genomics.princeton.edu/storeylab/qvalue/

Advanced Options:
-w Window size, defaults to peak shift. A good alternative window size is the
       peak shift plus the standard deviation, see the PeakShiftFinder app.
-e Scan for both reduced and enriched regions, defaults to look for only enriched
       regions. This turns off the empFDR estimation.
-j Scan only one strand, defaults to both, enter either + or - 
-q Don't filter windows using q-value FDR threshold, save all to bar graphs,
       defaults to saving those with a q-value < 40%.
-m Minimum number reads in window, defaults to 2. Increasing this threshold will
       speed up processing considerably but compromises the q-value estimation.
-f Filter windows with high control read counts. Don't use if looking for
       reduced regions.
-g Control window read count threshold, # stnd devs off median, defaults to 4.
-n Print point graph window representation xxx.bar files.
-a Number treatment observations to use in defining expect and ratio scalars.
-b Number control observations to use in defining expect and ratio scalars.
-u Use read score probabilities (assumes scores are > 0 and <= 1), defaults to
       assigning 1 to each read score. Experimental.

Example: java -Xmx4G -jar pathTo/USeq/Apps/ScanSeqs -t
      /Data/PolIIRep1/,/Data/PolIIRep2/ -c /Data/Input1/,/Data/Input2/ -s
      /Data/PolIIResults -w 200 -p 100 -f -g 5 
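
As a rough picture of a window binomial p-value, the Python sketch below computes the
upper tail probability of seeing at least the observed number of treatment reads in a
window, given the window's total reads and an expected treatment fraction. This is a
generic textbook calculation offered as an assumption about the idea; ScanSeqs' exact
calculation, including how it derives the expected fraction, is not described in this
menu, and binomial_upper_tail is a hypothetical name.

```python
from math import comb

def binomial_upper_tail(treatment, control, expected_fraction=0.5):
    """P(X >= treatment) for X ~ Binomial(treatment + control, expected_fraction)."""
    n = treatment + control
    p = expected_fraction
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(treatment, n + 1))

# 3 treatment reads and 0 control reads with an expected 50:50 split
print(binomial_upper_tail(3, 0))
# A window with more control than treatment reads shows no enrichment
print(binomial_upper_tail(2, 8))
```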

**************************************************************************************

**************************************************************************************
**                             Subtract Regions: May 2009                           **
**************************************************************************************
Removes regions and parts thereof that intersect the masking region file.  Provide
tab delimited bed files (chr start stop ...). Assumes interbase coordinates.

Options:
-m Bed file to use in subtracting/ masking.
-d Directory containing bed files to mask.

Example: java -Xmx4000M -jar pathTo/Apps/SubtractRegions -d /Anno/TilingDesign/
       -m /Anno/repeatMaskerHg18.bed
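
The subtraction can be sketched in Python for a single chromosome as shown here.
Regions and masks are (start, stop) tuples in interbase coordinates, so a region's
length is stop minus start. subtract_regions is a hypothetical name and this is an
illustration, not the app's code.

```python
def subtract_regions(regions, masks):
    """Return the parts of each region not covered by any mask."""
    masks = sorted(masks)
    kept = []
    for start, stop in regions:
        cursor = start
        for m_start, m_stop in masks:
            if m_stop <= cursor:
                continue          # mask lies entirely before the uncovered part
            if m_start >= stop:
                break             # sorted masks: nothing further can overlap
            if m_start > cursor:
                kept.append((cursor, m_start))
            cursor = max(cursor, m_stop)
            if cursor >= stop:
                break
        if cursor < stop:
            kept.append((cursor, stop))
    return kept

print(subtract_regions([(100, 200), (300, 350)], [(150, 160), (290, 400)]))
```

The first region is split around the mask at 150-160, and the second is removed
entirely because a mask covers it.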

**************************************************************************************

**************************************************************************************
**                           Score Chromosomes: Oct  2012                           **
**************************************************************************************
SC scores chromosomes for the presence of transcription factor binding sites. Use the
following options:

-g The full path directory text to the split genomic sequences (i.e. chr2L.fasta, 
      chr3R.fasta...), FASTA format.
-t Full path file text for the FASTA file containing aligned trimmed examples of
      transcription factor binding sites.  A log likelihood position specific
      probability matrix will be generated from these sequences and used to scan the
      chromosomes for hits to the matrix.
-s Score cut off for the matrix. Defaults to the score of the lowest scoring sequence
      used in making the LLPSPM.
-p Print hits to screen, default is no.
-v Provide a versioned genome (e.g. H_sapiens_Mar_2006), see UCSC Browser,
      http://genome.ucsc.edu/FAQ/FAQreleases, if you would like to write graph LLPSPM
      scores in xxx.bar format for direct viewing in IGB.

Example: java -Xmx4000M -jar pathTo/T2/Apps/ScoreChromosomes -g /my/affy/Hg18Seqs/ -t 
      /my/affy/fgf8.fasta -s 4.9 -v H_sapiens_Mar_2006
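
The idea of building a log likelihood position specific probability matrix (LLPSPM)
from aligned sites and scanning with it can be sketched in Python. Several choices
here are assumptions made for illustration and may differ from the app: a pseudocount
of 1, a uniform 0.25 background, log base 2, and forward strand scanning only. All
function names are hypothetical.

```python
import math

BASES = "ACGT"

def build_llpspm(sites, pseudocount=1.0, background=0.25):
    """One dict of log2(observed frequency / background) per aligned position."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned and trimmed to one length")
    total = len(sites) + 4 * pseudocount
    matrix = []
    for i in range(length):
        column = [s[i].upper() for s in sites]
        matrix.append({b: math.log2(((column.count(b) + pseudocount) / total)
                                    / background) for b in BASES})
    return matrix

def score(matrix, window):
    """Sum the per-position log likelihood scores for one window."""
    return sum(col[base] for col, base in zip(matrix, window.upper()))

def scan(matrix, sequence, cutoff):
    """Return (position, score) for forward strand windows scoring >= cutoff."""
    width = len(matrix)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width].upper()
        if any(b not in BASES for b in window):
            continue              # skip windows containing N or other symbols
        s = score(matrix, window)
        if s >= cutoff:
            hits.append((i, s))
    return hits

sites = ["ACGT", "ACGT", "ACGA"]
matrix = build_llpspm(sites)
# Mirror the -s default: the score of the lowest scoring example site
cutoff = min(score(matrix, s) for s in sites)
hits = scan(matrix, "TTACGTTT", cutoff)
print(hits)
```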

**************************************************************************************

**************************************************************************************
**                           ScoreParsedBars: Sept 2008                             **
**************************************************************************************
For each region finds the underlying scores from the chromosome specific bar files.
Prints the scores as well as their mean. A p-value for each region's score can be
calculated using chromosome, interrogated region, length, # scores, and gc matched
random regions. Be sure to set the -u flag if your scores are log2 values.

-r Full path file text for your region file (tab delimited: chr start stop(inclusive)).
-b Full path directory text for the chromosome specific data xxx.bar files.
-o Bp offset to add to the position coordinates, defaults to 0.
-s Bp offset to add to the stop of each region, defaults to 0.
-u Unlog the bar values, set this flag if your scores are log2 transformed.
-g Estimate a p-value for the score associated with each region. Provide a full path
         directory text for chromosome specific gc content boolean arrays. See
         ConvertFasta2GCBoolean app. Also requires option -i.
-i If estimating p-values, provide a full path file text containing the interrogated
         regions (tab delimited: chr start stop ...) to use in drawing random regions.
-n Number of random region sets, defaults to 1000.
-d Don't print individual scores to screen.

Example: java -jar pathTo/Apps/ScoreParsedBars -b /BarFiles/Oligos/
       -r /Res/miRNARegions.bed -o -30 -s -60 -i /Res/interrRegions.bed
       -g /Genomes/Hg18/GCBooleans/

**************************************************************************************

**************************************************************************************
**                           Score Sequences: July 2007                             **
**************************************************************************************
SS scores sequences for the presence of transcription factor binding sites. Use the
following options:

-g The full path FASTA formatted file text for the sequence(s) to scan.
-t Full path file text for the FASTA file containing aligned trimmed examples of
      transcription factor binding sites.  A log likelihood position specific
      probability matrix will be generated from these sequences and used to scan the
      sequences for hits to the matrix.
-s Score cut off for the matrix. Defaults to zero.

Example: java -Xmx500M -jar pathTo/T2/Apps/ScoreSequences -g /my/affy/DmelSeqs.fasta
      -t /my/affy/zeste.fasta

**************************************************************************************

**************************************************************************************
**                               Sgr2Bar: Jan 2012                                  **
**************************************************************************************
Converts xxx.sgr(.zip) files to chromosome specific bar files.

-f The full path directory/file text for your xxx.sgr(.zip or .gz) file(s).
-v Genome version (e.g. H_sapiens_Mar_2006, M_musculus_Jul_2007), get from UCSC
      Browser.
-s Strand, defaults to '.', use '+', or '-'
-t Graphs should be viewed as a stair-step, defaults to bar

Example: java -Xmx1500M -jar pathTo/Apps/Sgr2Bar -f /affy/sgrFiles/ -s + -t
      -v D_rerio_Jul_2006

**************************************************************************************

**************************************************************************************
**                               Simulator: Nov 2008                                **
**************************************************************************************
Generates ChIP-seq simulated sequences for aligning to a reference genome.

-f Directory containing xxx.fasta files with genomic sequence. File names should
     represent chromosome names (e.g. chr1.fasta, chrY.fasta...)
-r Results directory
-b Bed file containing repeat locations (e.g. RepeatMasker.bed)
-n Number of spike-ins, defaults to 1000
-g Number of random fragments to generate for each spike-in, defaults to 1000
-s Minimum size of a fragment, defaults to 150
-x Maximum size of a fragment, defaults to 350
-l Length of read, defaults to 26
-e Comma delimited text of per base % error rates, defaults to 0.5,0.528,0.556,...

Example: java -Xmx1500M -jar pathTo/USeq/Apps/Simulator -f /Hg18/Fastas -r /Spikes/
    -b /Hg18/Repeats/repMsker.bed -l 36

**************************************************************************************

**************************************************************************************
**                                 StrandedBisSeq: Feb 2011                         **
**************************************************************************************
Looks for strand bias in CG methylation from one dataset using Fisher's exact or
chi-square tests followed by a Benjamini and Hochberg FDR correction. Merges significant
CGs within max gap into larger regions. WARNING: many bisulfite datasets display strand
bias due to preferential breakage of C rich regions.  Use this app with caution.

Options:
-s Save directory, full path.
-c Converted PointData directories, full path, comma delimited. These should
       contain stranded chromosome specific xxx_-/+_.bar.zip files. One
       can also provide a single directory that contains multiple PointData
       directories. These will be merged. Use the ParsePointDataContexts app to
       filter for just CG contexts.
-n Non-converted PointData directories, ditto. 
-f Fasta files for each chromosome.

Default Options:
-p Minimum FDR for stranded methylation, defaults to 30, a -10Log10(FDR = 0.001)
       conversion.
-l Log2Ratio threshold for stranded methylation, defaults to 1.585 (3x).
-w Window size, defaults to 500.
-m Minimum #C obs in window, defaults to 4. 
-o Minimum coverage for CG bp methylation scanning, defaults to 2.
-x Max gap between significant CGs to merge, defaults to 500bp.
-g Generate graph files for IGB, defaults to just identifying biased regions.
-r Full path to R, defaults to '/usr/bin/R'

Example: java -Xmx12G -jar pathTo/USeq/Apps/StrandedBisSeq -c /Data/Sperm/Converted -n 
      /Data/Sperm/NonConverted -s /Data/Sperm/StrandedBisSeqRes -g -p 20 -l 1 -f
      /Genomes/Hg18/Fastas/ 
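
The -p and -l thresholds above are given on transformed scales. The following is a
minimal Python sketch of those two conversions only; the function names are
illustrative and are not part of USeq.

```python
import math

def fdr_to_minus10_log10(fdr):
    """Convert an FDR (0 < fdr <= 1) to the -10*Log10(FDR) scale used by -p."""
    return -10.0 * math.log10(fdr)

def fold_to_log2_ratio(fold):
    """Convert a fold difference to the log2 ratio scale used by -l."""
    return math.log2(fold)

# An FDR of 0.001 corresponds to the default -p threshold of 30.
print(round(fdr_to_minus10_log10(0.001), 3))
# A 3x difference corresponds to the default -l threshold of 1.585.
print(round(fold_to_log2_ratio(3), 3))
```

With these helpers, the '-p 20 -l 1' settings in the example correspond to an FDR of
0.01 and a 2x difference.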

**************************************************************************************

**************************************************************************************
**                               SRA Processor: Nov 2013                            **
**************************************************************************************
Fetches SRA files from the Sequence Read Archive and converts them to gzipped fastq.
Use in conjunction with Tomato to align these on the ember cluster. Be sure the SRA
archives you want are really in fastq format.

Required Parameters:
-n Names of SRRs (runs) or SRPs (projects) to fetch, comma delimited, no spaces.
       (e.g. SRR016669 or SRP000401).
-f Fastq-dump executable, full path, from the SRA Toolkit, download from
       http://www.ncbi.nlm.nih.gov/Traces/sra/?view=software
-s Save directory, full path.

Optional Parameters:
-c Full path to a cmd.txt file to copy into converted SRA folders. If the save
       directory is scanned by tomato, a tomato job is then launched,
       see http://bioserver.hci.utah.edu/BioInfo/index.php/Software:Tomato
-q Set quality score offset to 64, defaults to 33. Needed for some Illumina datasets.

Example: java -Xmx4G -jar pathTo/USeq/Apps/SRAProcessor -n SRP000401
      -s /tomato/job/Nix/SRP000401/ -f ~/sratoolkit.2.1.8-centos_linux64/fastq-dump
      -c /tomato/job/Nix/SRP000401/cmd.txt 
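
The -q option refers to the ASCII offset used to encode base qualities in fastq. The
following Python sketch only illustrates what the two offsets mean; it is not how
SRAProcessor or fastq-dump is implemented, and the function names are hypothetical.

```python
def decode_qualities(quality_line, offset=33):
    """Return the numeric Phred scores encoded in a fastq quality line."""
    return [ord(char) - offset for char in quality_line]

def phred64_to_phred33(quality_line):
    """Re-encode a Phred+64 quality line using the Phred+33 offset."""
    return "".join(chr(ord(char) - 64 + 33) for char in quality_line)

old_style = "hhhh"                       # Phred 40 for each base with offset 64
converted = phred64_to_phred33(old_style)
print(converted)
print(decode_qualities(old_style, offset=64) == decode_qualities(converted, offset=33))
```

Decoding an offset 64 line with offset 33 inflates every score by 31, which is why
the offset has to be stated for older Illumina datasets.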

**************************************************************************************

**************************************************************************************
**                             SubSamplePointData:  Dec 2008                        **
**************************************************************************************
SSPD takes PointData directories and randomly selects points from each directory and
saves the merge.

-f Comma delimited full path PointDataDirectories from which to draw or a single 
       directory containing multiple PointDataDirectories.
-n Total number of observations desired.
-s Full path file directory in which to save the results.

Example: java -Xmx1500M -jar pathTo/USeq/Apps/SubSamplePointData -n 10000000 -f
    /Data/WCE1_Point,/Data/WCE2_Point,/Data/WCE3_Point -s /Data/Sub/ 

**************************************************************************************

**************************************************************************************
**                                Tag2Point: May 2010                               **
**************************************************************************************
Splits and converts tab delimited text files (chr start stop ... strand (+ or -)) into
center position binary xxx.bar files. Use the appropriate options
to convert your coordinates into interbase coordinates (zero based, stop excluded).

-v Versioned Genome (e.g. H_sapiens_Mar_2006), see UCSC Browser,
      http://genome.ucsc.edu/FAQ/FAQreleases.
-i Strand column index, defaults to 5. 1st column is zero.
-b Subtract one from the beginning of each region.
-e Add one to the stop of each region.
-s Shift centered position x bps 3', defaults to 0.
-f The full path directory/file name of your text file(s) (.gz/.zip OK).
-c Append 'chr' onto the chromosome column (your data lacks the prefix).

Example: java -Xmx1500M -jar pathTo/T2/Apps/Tag2Point -f /Solexa/BedFiles/
     -v H_sapiens_Mar_2006 -b 
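
The -b and -e adjustments can be sketched in Python as follows. This is a minimal
illustration of moving common coordinate systems to interbase (zero based, stop
excluded); the function name and arguments are hypothetical, not Tag2Point code.

```python
def to_interbase(start, stop, subtract_from_start=False, add_to_stop=False):
    """Apply the -b (start - 1) and/or -e (stop + 1) adjustments to a region."""
    if subtract_from_start:
        start -= 1
    if add_to_stop:
        stop += 1
    return start, stop

# One based, stop included (e.g. GFF style): needs the -b adjustment.
print(to_interbase(101, 150, subtract_from_start=True))
# Zero based, stop included: needs the -e adjustment.
print(to_interbase(100, 149, add_to_stop=True))
# Already interbase (e.g. BED style): no adjustment.
print(to_interbase(100, 150))
```

All three calls describe the same 50 bp region, so each returns the same interbase
pair.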

**************************************************************************************

**************************************************************************************
**                             Tempus Json 2 Vcf: March 2019                        **
**************************************************************************************
Parses json Tempus reports to vcf. Leave in PHI to enable calculating age at
diagnosis. Summary statistics calculated for all reports. Vcfs will contain a mix of 
somatic and inherited snvs, indels, and cnvs. Be sure to vt normalize the exported
vcfs, https://github.com/atks/vt 

Options:
-j Path to Tempus json report or directory containing such, xxx.json(.gz/.zip OK)
-s Path to a directory for saving the results.
-b Path to a bed file for converting CNV gene names to coordinates where the name
     column contains just the gene name.
-f Path to the reference fasta with xxx.fai index. Required for CNV conversions.

Example: java -Xmx2G -jar pathToUSeq/Apps/TempusJson2Vcf -j /F1/TempusJsons
     -f /Ref/human_g1k_v37.fasta -s /F1/VCF/ -b /Ref/b37TempusGeneRegions.bed.gz 

**************************************************************************************

**************************************************************************************
**                           Tempus Vcf Comparator: March 2019                      **
**************************************************************************************
TVC compares a Tempus vcf generated with the TempusJson2Vcf to a recalled vcf.
Exact recall vars are so noted and removed. Tempus vcf with no exact but one
overlapping record can be merged with -k. Be sure to vt normalize each before running.
Recall variants failing FILTER are not saved.

Options:
-t Path to a TempusOne vcf file, see the TempusJson2Vcf app.
-r Path to a recalled snv/indel vcf file.
-m Path to named vcf file for saving the results.
-c Append chr if absent in chromosome name.
-g Exclude 'inherited' germline Tempus records from the comparison and merged output.
-s Exclude 'somatic' tumor Tempus records from the comparison and merged output.
-k Merge Tempus records that have no exact match but one overlapping recall record.

Example: java -Xmx2G -jar pathToUSeq/Apps/TempusVcfComparator -t TL-18-03CFD6.vcf
     -r /F1/TL-18-03CFD6_recall.vcf.gz -g -c -m /F1/TL-18-03CFD6_merged.vcf.gz -k 

**************************************************************************************

**************************************************************************************
**                                 Text 2 USeq: Nov 2018                            **
**************************************************************************************
Converts text genomic data files (e.g. xxx.bed, xxx.gff, xxx.sgr, etc.) to
binary USeq archives (xxx.useq).  Assumes interbase coordinates. Only select
the columns that contain relevant information.  For example, if your data isn't
stranded, or you want to ignore strands, then skip the -s option.  If your data
doesn't have a value/ score then skip the -v option. Etc. Use the USeq2Text app to
convert back to text format. 

Required Parameters:
-f Full path file/directory containing tab delimited genomic data files.
-g Genome version using DAS notation (e.g. H_sapiens_Mar_2006, M_musculus_Jul_2007),
      see http://genome.ucsc.edu/FAQ/FAQreleases#release1
-c Chromosome column index
-b Position/Beginning column index

Optional Parameters:
-s Strand column index (+, -, or .; NOT F, R)
-e End column index
-t Text column index(s), comma delimited, no spaces, defines which columns
      to join using a tab.
-v Value column index(s), ditto. One or two, if two, i[0]/(i[0]+i[1]) is calculated.
-i Index size for slicing split chromosome data (e.g. # rows per slice),
      defaults to 10000.
-r For graphs, select a style, defaults to 0
      0	Bar
      1	Stairstep
      2	HeatMap
      3	Line
-h Color, hexadecimal (e.g. #6633FF), enclose in quotations
-d Description, enclose in quotations 
-p Prepend chr onto chromosome name. Required for UCSC bb or bw formats.
-l Minus 10 Log10 transform values. Requires setting -v .
-m Convert chromosome names containing M to chrM .
-o Subtract one from beginning position.

Example: java -Xmx4G -jar pathTo/USeq/Apps/Text2USeq -f
      /AnalysisResults/BedFiles/ -c 0 -b 1 -e 2 -i 5000 -h '#6633FF'
      -d 'Final processed chIP-Seq results for Bcd and Hunchback, 30M reads'
      -g H_sapiens_Feb_2009 

Indexes for common formats:
       bed3 -c 0 -b 1 -e 2
       bed5 -c 0 -b 1 -e 2 -t 3 -v 4 -s 5
       bed12 -c 0 -b 1 -e 2 -t 3,6,7,8,9,10,11 -v 4 -s 5
       gff w/scr,stnd,name -c 0 -b 3 -e 4 -v 5 -s 6 -t 8

**************************************************************************************

**************************************************************************************
**                                TomatoFarmer: June 2015                           **
**************************************************************************************
TomatoFarmer controls an exome analysis from start to finish.  It creates alignment 
jobs for each of the samples in your directory, waits for all jobs to finish and then 
launches metrics and variant calling jobs.  Jobs will be resubmitted up to a set 
number of times to combat spurious CHPC errors.  Job directories are left behind so 
you can save your log files.


Required Arguments:

-d Job directory.  This directory must be a subdirectory of /tomato/version/job. It 
       can be several directory levels below /tomato/version/job. 
       Example: '-d /tomato/dev/job/krustofsky/demo'. 
-e Email address.  TomatoFarmer emails you once the job completes/fails. You can also
       opt to get all tomato emails as individual jobs start/end (see option -x).
       Example: '-e hershel.krustofsky@hci.utah.edu'.
-y Analysis pipeline. The analysis pipeline or step to run.  Current options are: 
          Full pipeline:
          1) ugp_full - UGP v1.3.0 pipeline, includes: Alignment, metrics and variant
             calls. Defaults to bwa and raw variant filtering
          Individual steps:
          2) ugp_align - Alignment/recalibration only. Defaults to bwa.
          6) metrics - Sample QC metrics only. Requires sample level *final.bam 
             files from alignment steps.
          7) ugp_variant - Variant detection and filtering. Requires 1 or more 
             *final.bam files from alignment step. Defaults to raw variant filtering.
      Example: '-y ugp_full'.
-p Properties file.  This file contains a list of cluster-specific paths and options.
      It doesn't need to be changed by the user. Example: '-p properties.txt' 

Optional Arguments:

-n Use novoalign. This option will change the aligner from bwa to novoalign.
-r Use variant recalibration. This option will change variant filtration from raw 
   to vqsr.  This should only be used if there are enough samples in the study or 
   1K genome background files are used.
-g Full genome.  Use this option if you want to detect variants genome-wide
-t Target regions. Setting this argument will restrict coverage metrics and variant 
      detection to targeted regions.  This speeds up the variation detection process
      and reduces noise. Options are:
          1) AgilentAllExonV4
          2) AgilentAllExonV5
          3) AgilentAllExonV5UTR
          4) AgilentAllExon50MB
          5) NimbleGenEZCapV2
          6) NimbleGenEZCapV3
          7) TruSeq
          8) path to custom target bed file.
      If nothing is specified for this argument, gatk exome boundaries are used. 
      Example: '-t truseq'.
-b 1K Genome samples.  Use this option if you want to spike in 200 1K genome samples 
      as the background sample set.  This should improve VQSR variant calling and 
      VAAST, but it will take a lot more time to process. BETA, only works for core 
      users!
-s Study name.  Set this if you want your VCF files to have a prefix other than 
      'STUDY'. Example: '-s DEMO'.
-v Validate fastq files.  TomatoFarmer will validate your fastq files before running.
      This is required if any of your samples are ASCII-64.
-c Cluster. Specify cluster, defaults to all available clusters

Example: java -Xmx4G -jar pathTo/USeq/Apps/TomatoFarmer -d /tomato/version/job/demo/
      -e hershel.krustofsky@hci.utah.edu -y ugp_full -r -b -s DEMO 
      -t AgilentAllExon50MB

**************************************************************************************

**************************************************************************************
**                              Telescriptor:  Sept 2014                            **
**************************************************************************************
Compares two RNASeq datasets for possible telescripting. Generates a spreadsheet of
statistics for each gene as well as a variety of graphs in exonic bp space. The
ordering of A and B is important since A is window scanned to identify the maximal 5'
region. Thus A should be where you suspect telescripting, B where you do not.

Options:
-t Directory of bam files representing the first condition A.
-c Directory of bam files representing the second condition B.
-u UCSC refflat formatted Gene table. Run MergeUCSCGeneTable on a transcript table.
-s Directory in which to save the results.
-r Full path to R, defaults to '/usr/bin/R', with installed ggplot2 package.

Default Options:
-g Minimum gene alignment count, defaults to 50
-a Minimum window + background alignment count, defaults to 25
-k Minimum Log2(ASkew/BSkew), defaults to 2. Set to 0 to print all.
-b Minimum base read coverage for log2Ratio graph output, defaults to 10
-l Minimum transcript exonic length, defaults to 250
-w Size of 5' window for scanning, defaults to 125
-f Fraction of exonic gene length to calculate background, defaults to 0.5
-i Data is not stranded, assumes both first and second reads follow annotation.

Example: java -Xmx4G -jar pathTo/USeq/Apps/Telescriptor -u hg19EnsTrans.ucsc -t Bam/T
       -c Bam/C -s GV_MOR 

**************************************************************************************

**************************************************************************************
**                                  TNRunner : April 2019                           **
**************************************************************************************
TNRunner is designed to execute several dockerized snakemake workflows on human tumor
normal datasets via a slurm cluster.  Based on the availability of fastq, Hg38
alignments are run, somatic and germline variants are called, and concordance measured
between sample bams. To execute TNRunner, create the following directory structure and
link or copy in the corresponding paired end Illumina gzipped fastq files.

MyPatientSampleDatasets
   MyPatientA
      Fastq
         NormalDNA
         TumorDNA
         TumorRNA
   MyPatientB
      Fastq
         TumorDNA
         TumorRNA
   MyPatientC....

The Fastq directory and sub directories must match this naming. Only include those
for which you have fastq.  Change the MyXXX to something relevant. TNRunner is
stateless so as more Fastq becomes available or issues are addressed, relaunch the
app. This won't affect running or queued slurm jobs. Relaunch periodically to assess
the current processing status and queue additional tasks or set option -l. Download the
latest workflows from 
https://github.com/HuntsmanCancerInstitute/Workflows/tree/master/Hg38RunnerWorkflows 
and the matching resource bundle from
https://hci-bio-app.hci.utah.edu/gnomex/gnomexFlex.jsp?analysisNumber=A5578 .
All workflow docs are optional although some require output from prior Analysis.

Options:
-p Directory containing one or more patient data directories to process.
-o Other patient's directory, containing additional xxx_final.bam files to include in
   sample concordance. The patient directory naming must match.
-k Directory containing xxxMalePoN.hdf5 and xxxFemalePoN.hdf5 GATK copy ratio
      background files.
-e Workflow docs for launching DNA alignments.
-t Workflow docs for launching RNA alignments.
-c Workflow docs for launching somatic variant calling.
-a Workflow docs for launching variant annotation.
-b Workflow docs for launching bam concordance.
-j Workflow docs for launching joint genotyping.
-y Workflow docs for launching copy analysis.
-v Workflow docs for launching clinical test variant info. Add a ClinicalReport folder to
      each patient dir containing the json formatted clinical information.
-g Germline AnnotatedVcfParser options, defaults to '-d 15 -m 0.2 -x 1 -p 0.01 -g 
      D5S,D3S -n 5 -a HIGH -c Pathogenic,Likely_pathogenic -o -e Benign,Likely_benign'
-s Somatic AnnotatedVcfParser options, defaults to '-d 20 -f'
-r Attempt to restart FAILED jobs from last successfully completed rule.
-d Delete and restart FAILED jobs.
-f Force a restart of all running and uncompleted jobs.
-q Quiet output.
-x Maximum # jobs to launch, defaults to 40.
-l Check and launch jobs every hour until all are complete, defaults to launching once.

Example: java -jar pathToUSeq/Apps/TNRunner -p PatientDirs -o ~/FoundationPatients/
     -e ~/Hg38/DNAAlignQC/ -c ~/Hg38/SomaticCaller/ -a ~/Hg38/Annotator/ -b 
     ~/Hg38/BamConcordance/ -j ~/Hg38/JointGenotyping/ -t ~/Hg38/RNAAlignQC/
     -y /Hg38/CopyRatio/ -k /Hg38/CopyRatio/Bkg/ -s '-d 30 -r' -x 10
     -v /Hg38/Tempus/TempusVcf -l
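
The required directory layout can be created with a few lines of Python. This is a
convenience sketch, not part of TNRunner; the function name is hypothetical and the
demonstration writes to a temporary directory that should be replaced with your own
dataset root.

```python
import tempfile
from pathlib import Path

# Fastq sub directory names required by the layout described above.
ALLOWED_SAMPLE_DIRS = ("NormalDNA", "TumorDNA", "TumorRNA")

def make_patient_dirs(root, patient, sample_dirs):
    """Create root/patient/Fastq/<sample_dir> for each sample type with fastq."""
    created = []
    for name in sample_dirs:
        if name not in ALLOWED_SAMPLE_DIRS:
            raise ValueError(f"Unexpected Fastq sub directory: {name}")
        path = Path(root) / patient / "Fastq" / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created

# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    make_patient_dirs(root, "MyPatientA", ["NormalDNA", "TumorDNA", "TumorRNA"])
    make_patient_dirs(root, "MyPatientB", ["TumorDNA", "TumorRNA"])
    for path in sorted(Path(root).rglob("*")):
        print(path.relative_to(root))
```

Only sub directories for which fastq exists are created, matching the note that
missing sample types should be left out.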

**************************************************************************************

**************************************************************************************
**                                TRunner: December 2018                            **
**************************************************************************************
TRunner is designed to execute several dockerized snakemake workflows on human tumor
only datasets via a slurm cluster.  Based on the availability of the bams and xml,
fastq is extracted, Hg38 alignments run, somatic variants called, and xml variant info
parsed and compared. To execute TRunner, create the following directory structure and
link or copy in the corresponding raw bam, bai, and xml files.

MyPatientSampleDatasets
   MyPatient1
      Bam
         TumorDNA
         TumorRNA
      Xml
   MyPatient2
      Bam
         TumorDNA
      Xml
   MyPatient3....

The Bam directory and sub directories must match this naming. Change MyXXX to 
something relevant. As more files become available or issues are addressed, relaunch
the app. This won't affect running slurm jobs. Relaunch periodically to assess
the current processing status and queue additional tasks. Download the latest
workflows from https://github.com/HuntsmanCancerInstitute/Workflows/tree/master/
Hg38RunnerWorkflows/Foundation and the matching resource bundle from
https://hci-bio-app.hci.utah.edu/gnomex/gnomexFlex.jsp?analysisNumber=A5578

Options:
-p Directory containing one or more patient data directories to process.
-e Workflow docs for launching DNA capture alignments.
-t Workflow docs for launching transcriptome alignments.
-c Workflow docs for launching somatic variant calling.
-a Workflow docs for launching variant annotation.
-v Workflow docs for launching xml variant integration.
-s Somatic AnnotatedVcfParser options, defaults to '-d 50 -f'
-r Attempt to restart FAILED jobs from last successfully completed rule.
-d Delete and restart FAILED jobs.
-f Force a restart of all running and uncompleted jobs.
-q Quiet output.
-x Maximum # jobs to launch, defaults to 0, no limit

Example: java -jar pathToUSeq/Apps/TRunner -p FoundationPatients -e 
     ~/Hg38/CaptureAlignQC/WorkflowDocs/ -c ~/Hg38/SomExoCaller/WorkflowDocs/ -a 
     ~/Hg38/Annotator/WorkflowDocs/ -b ~/Hg38/BamConcordance/WorkflowDocs/ -j
     ~/Hg38/JointGenotyping/WorkflowDocs/ -t ~/Hg38/TranscriptomeAlignQC/WorkflowDocs/ 
     -s '-d 100 -f' -x 20 

**************************************************************************************

**************************************************************************************
**                              UCSC Big 2 USeq: Jan 2013                           **
**************************************************************************************
Converts UCSC bigWig (xxx.bw) or bigBed (xxx.bb) archives to xxx.useq archives.

Options:
-b Full path file/directory containing xxx.bw and xxx.bb files. Recurses through sub 
       directories if a directory is given.
-d Full path directory containing the UCSC bigWigToBedGraph, bigWigToWig, and 
       bigBedToBed apps, download from http://hgdownload.cse.ucsc.edu/admin/exe/ and
       make executable (e.g. chmod 755 /MyApps/UCSC/*).
-v Genome version (e.g. H_sapiens_Mar_2006), get from UCSC Browser,
      http://genome.ucsc.edu/FAQ/FAQreleases or IGB 
      http://bioviz.org/igb/releases/current/igb-large.jnlp
-f Force conversion of xxx.bw or xxx.bb overwriting any existing xxx.useq archives.
       Defaults to skipping those already converted.
-e Only print error messages.

Example: java -Xmx4G -jar pathTo/USeq/Apps/UCSCBig2USeq -v M_musculus_Jul_2007 -b
      /AnalysisResults/USeqDataArchives/ -d /MyApps/UCSC/

**************************************************************************************

**************************************************************************************
**                               USeq 2 UCSC Big: May 2016                          **
**************************************************************************************
Converts USeq archives to UCSC bigWig (xxx.bw) or bigBed (xxx.bb) archives based on
the data type. WARNING: bigBed format conversion will clip any associated scores to
between 0-1000. 

Options:
-u Full path file/directory containing xxx.useq files. Recurses through sub 
       directories if a directory is given.
-d Full path directory containing the UCSC wigToBigWig and bedToBigBed apps, download
       from http://hgdownload.cse.ucsc.edu/admin/exe/ and make executable with chmod.
-b Alternative to -d, specify path to the bedToBigBed app.
-w Ditto, path to wigToBigWig app.
-f Force conversion of xxx.useq to xxx.bw or xxx.bb overwriting any UCSC big files.
       Defaults to skipping those already converted.
-e Only print error messages.
-t Sandbox the UCSC apps by providing a full path file name to the timeout.pl app.
       Download from https://github.com/pshved/timeout . Max time and mem per file 
       conversion 1hr and 4G.
-m Don't delete temp files.

Example: java -Xmx4G -jar pathTo/USeq/Apps/USeq2UCSCBig -u
      /AnalysisResults/USeqDataArchives/ -d /Apps/UCSC/

**************************************************************************************

**************************************************************************************
**                                USeq 2 Text: Oct 2012                             **
**************************************************************************************
Converts USeq archives to text either as minimal native, bed, or wig graph format. 


Options:
-f Full path file/directory containing xxx.useq files.
-b Print bed format, defaults to native text format.
-c Convert scores to bed format 0-1000.
-w Print wig graph format (var step or bed graph), defaults to native format.


Example: java -Xmx4G -jar pathTo/USeq/Apps/USeq2Text -f
      /AnalysisResults/USeqDataArchives/ 

**************************************************************************************

**************************************************************************************
**                             VarScan VCFParser: Dec 2014                          **
**************************************************************************************
Parses and filters VarScan VCF files for those called SOMATIC.  Replaces the QUAL
score with the ssc score.

Required Options:
-v Full path file or directory containing xxx.vcf(.gz/.zip OK) file(s). Recursive!

Example: java -jar pathToUSeq/Apps/VarScanVCFParser -v /VarScan2/VCFFiles/

**************************************************************************************

**************************************************************************************
**                         VCF Background Checker : Jan 2019                        **
**************************************************************************************
VBC calculates non-reference allele frequencies (AF) from a background multi-sample 
mpileup file over each vcf record. It then calculates a z-score for the vcf AF and 
appends it to the INFO field. If multiple bps are affected (e.g. INDELs) or bp padding
provided, the lowest bp z-score is appended. Z-scores < ~4 are indicative of non
reference bps in the background samples. A flag is appended to the FILTER field if a
background AF was found within 10% of the vcf AF. Note, VBC requires AF and DP tags
in the INFO field of each record to use in the z-score calculation, see -f and -p. 

Required:
-v Path to a xxx.vcf(.gz/.zip OK) file or directory containing such.
-m Path to a bgzip compressed and tabix indexed multi-sample mpileup file. e.g.:
      1) Mpileup: 'echo "#SampleOrder: "$(ls *bam) > bkg.mpileup; samtools mpileup
             -B -q 13 -d 1000000 -f $fastaIndex -l $bedFile *.bam >> bkg.mpileup'
      2) (Optional) MpileupRandomizer: java -jar -Xmx10G ~/USeqApps/MpileupRandomizer
             -r 20 -s 3 -m bkg.mpileup
      3) Bgzip: 'tabix-0.2.6/bgzip bkg.mpileup_DP20MS3.txt'
         Tabix: 'tabix-0.2.6/tabix -s 1 -b 2 -e 2 bkg.mpileup_DP20MS3.txt.gz'
-s Path to directory in which to save the modified vcf file(s)

Optional:
-f Tumor AF INFO name, defaults to T_AF
-p Tumor DP INFO name, defaults to T_DP
-z Minimum vcf z-score, defaults to 0, no filtering. Unscored vars are kept.
-q Minimum mpileup sample bp quality, defaults to 20
-c Minimum mpileup sample read coverage, defaults to 20
-f Maximum mpileup sample AF, defaults to 0.3
-a Minimum # mpileup samples for z-score calculation, defaults to 3
-e Exclude vcf records that could not be z-scored
-u Replace QUAL value with z-score. Unscored vars will be assigned 0
-d Print verbose debugging output
-t Number of threads to use, defaults to all

Example: java -Xmx4G -jar pathTo/USeq/Apps/VCFBackgroundChecker -v SomaticVcfs/ -z 3
      -m bkg.mpileup.gz -s BkgFiltVcfs/ -q 13 -u 
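
The z-score idea can be sketched in Python as below. This uses the textbook
definition, (vcf AF - mean background AF) / background standard deviation, and is
an assumption about the general approach rather than VBC's actual code. The use of
the sample standard deviation and the handling of zero variance are choices made
for the sketch, and the function name is hypothetical.

```python
import statistics

def background_z_score(vcf_af, background_afs, min_samples=3):
    """Return a z-score for vcf_af against background AFs, or None if unscorable."""
    if len(background_afs) < min_samples:
        return None                      # too few background samples, see -a
    mean_af = statistics.mean(background_afs)
    sd_af = statistics.stdev(background_afs)
    if sd_af == 0:
        return None                      # assumption: no variance, leave unscored
    return (vcf_af - mean_af) / sd_af

background = [0.0, 0.01, 0.02]           # non-reference AFs at one bp in 3 samples
print(background_z_score(0.05, background))
print(background_z_score(0.05, background[:2]))
```

A variant AF close to the background AFs gives a low z-score, which is the case the
-z filter is meant to catch.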

**************************************************************************************

**************************************************************************************
**                             VCF Bam Annotator : Aug 2017                         **
**************************************************************************************
This app pulls read coverage information from each bam over the vcf variants and adds
the data as sample information in the vcf. Of particular use is the breakdown of
alignment depth by R1 and R2. 

Options:
-v Vcf file (xxx.vcf(.gz/.zip OK)) to annotate.
-b Comma delimited list of indexed bam files to extract alignment info.
-n Comma delimited list of 'Sample Names' corresponding to the bams, defaults to the
     bam file names.
-r Annotated vcf results file.
-f Indexed reference fasta used in aligning the bams.
-s Path to the samtools executable.
-g Path to the HTSlib bgzip executable.
-t Path to the HTSlib tabix executable.
-p Number of threads to use, defaults to all available.

Example: java -Xmx4G -jar pathTo/USeq/Apps/VCFConsensus -p illumina.vcf -q Strelka
      -s stnd.indel.vcf.gz -t Scalpel -o indelCalls.vcf.gz 

**************************************************************************************

**************************************************************************************
**                                  VCF 2 Bed: June 2017                            **
**************************************************************************************
Converts a vcf file to bed format.

Required Options:
-v Full path file or directory containing xxx.vcf(.gz/.zip OK) file(s).
-p Padding to expand each variant's size, defaults to 0
-s Directory to save the bed files, defaults to the parent of the vcf
-e Print out only the END=xxx containing vcf records as the bed based on the end value.

Example: java -jar pathToUSeq/Apps/VCF2Bed -v /VCFFiles/ -p 25 
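
A minimal Python sketch of converting one vcf record to a padded bed region. It
assumes the region spans the REF allele, converts the one based vcf POS to a zero
based bed start, and clamps the start at zero. How VCF2Bed itself defines the span
for indels or END= records is not stated here, and the function name is hypothetical.

```python
def vcf_record_to_bed(chrom, pos, ref, padding=0):
    """Return (chrom, start, end) in bed coordinates for a vcf record."""
    start = pos - 1              # vcf POS is one based, bed start is zero based
    end = start + len(ref)       # bed end is excluded
    return chrom, max(0, start - padding), end + padding

print(vcf_record_to_bed("chr1", 1000, "A", padding=25))
print(vcf_record_to_bed("chr1", 10, "ACG"))
```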

**************************************************************************************

**************************************************************************************
**                              VCF Annotator : November 2014                       **
**************************************************************************************
VCFAnnotator adds user-specified annotations to the VCF file INFO field.  Only hg19 is 
supported at this time.  If your VCF file has more than 500,000 records, it will be 
split into smaller VCF files that are annotated separately.  Once annotation is 
complete, the individual annotated files are merged and compressed.  This application 
uses a lot of memory when running large VCF files, so use 20gb of memory when starting
java.

Required:
-v VCF file. Path to a multi-sample vcf file, compressed ok (XXX.vcf/XXX.vcf.gz).
-o Output VCF file.  Path to the annotated vcf file, can be specified as XXX.vcf or 
   XXX.vcf.gz. If XXX.vcf.gz, the file will be compressed and indexed using tabix.
-p Path to annovar directory.

Optional:
-d dbSNP database.  By default, this application uses dbSNP 137 for annotation. Use 
      this option along with dbSNP database identifier to use a different version, 
      i.e. snp129. The annovar-formatted dbSNP database must be in the annovar data 
      directory for this option to work.
-e Ethnicity.  By default, the 1K frequency is calculated across all ethnicities.  
      If you want to restrict it to one of African (AFR), Admixed American (AMR), 
      East Asian (EAS), European (EUR), South Asian (SAS), use this option 
      followed by the ethnicity identifier.
-a Annotations to add.  By default, this application uses a subset of annovar  
      annotations.  Use a comma-separated list of keys to specify a custom set. 
      Available annotations with (keys): ensembl gene annotations (ENSEMBL), refSeq 
      gene names (REFSEQ), transcription factor binding sites (TFBS), segmental 
      duplications (SEGDUP), database of genomic variants (DGV), variant scores 
      (SCORES), GWAS catalog annotations (GWAS), clinvar variations (CLINVAR), 
      dbsnp annotations (DBSNP), nci60 annotations (NCI60), 1K genomes annotations 
      (ONEK), COSMIC annotations (COSMIC), ESP annotations (ESP), OMIM genes and 
      diseases (OMIM), and NIST callable regions (NIST).  The SCORES option 
      includes SIFT, PolyPhen2, MutationTaster, MutationAssessor, LRT, PhyloP and 
      SiPhy. The ENSEMBL option includes the columns EnsemblRegion, EnsemblName, 
      VarType and VarDesc. 
-r Path to annovar repository.  By default, VCFAnnotator uses the humandb directory in 
      the annovar program directory.  Use this if you want to specify a different 
      directory.
-n VAAST output.  If a VAAST output file is specified, the VCF file is annotated with
      the VAAST variation score and gene rank.
-t Path to tabix directory.
-c Number of genotypes to process per chunk.  The total number of genotypes is the 
      number of records in the VCF * the number of samples in the VCF.  By default 
      the file is split into 10M genotypes.  Can take 20GB or more to run.


Example: java -Xmx20G -jar pathTo/USeq/Apps/VCFAnnotator -v 9908R.vcf 
      -o 9908_ann.vcf.gz 

**************************************************************************************

**************************************************************************************
**                            VCF Call Frequency: Jan 2019                          **
**************************************************************************************
Calculates a vcf call frequency for each variant when pointed at a genomic Query
service (https://github.com/HuntsmanCancerInstitute/Query) or the Data and Index
directories the service is accessing. CallFreq's are calculated 
by first counting the number of exact vcf matches present and dividing it by the
number of intersecting bed files. Use this to flag variants with high call rates that
are potential false positives. Some are not, e.g. BRAF V600E occurs in 58% of cancers
with a BRAF mutation. So treat the call freq as an annotation requiring context level
interpretation. Use the file filter to limit which files are included in the call freq
calculations. Only file paths containing the file filter will be included. Be sure to
place both the sample vcf and associated callable region bed files in the same Query
index folder path as defined by -f. 

Required Options:
-v Full path to a file or directory containing xxx.vcf(.gz/.zip OK) file(s)
-s Directory to save the annotated vcf files
-f Query service file filter, e.g. /B37/Somatic/Avatar/
-c Config txt file containing two tab delimited columns with host, queryUrl, realm, 
     userName, password, and (optionally) fileFilter and/or maxCallFreq. 'chmod 600'
     the file! e.g.: 
     host hci-clingen1.hci.utah.edu
     queryUrl http://hci-clingen1.hci.utah.edu:8080/Query/
     realm QueryAPI
     userName FColins
     password g0QueryAP1
-i (Alternative to -c), provide a path to the Index directory generated by the USeq 
     QueryIndexer app.
-d (Alternative to -c), provide a path to the Data directory used in creating the Index.

Options:
-m Maximum call freq, defaults to 1, before appending 'CallFreq' to the FILTER field.
-x Remove failing max call freq records, not recommended.
-b Minimum bed call count before applying a max call freq filter, defaults to 8.
-e Print verbose debugging output.

Example: java -jar pathToUSeq/Apps/VCFCallFrequency -v Vcf/ -s CFVcfs -f 
    /B37/Somatic/Avatar/ -m 0.05 -c vcfCFConfig.txt 
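
The call freq calculation and the -m/-b thresholds described above can be sketched in
a few lines of Python. This is an illustrative sketch only, not the app's code: the
function name is hypothetical, and it assumes the filter is applied only when the bed
count reaches -b and the call freq strictly exceeds -m.

```python
def call_freq_filter(vcf_matches, bed_hits, max_call_freq=1.0, min_bed_count=8):
    """Return (call_freq, fails) for one variant.

    vcf_matches: number of exact vcf matches found for the variant
    bed_hits:    number of callable region bed files intersecting the variant
    Assumption: a record fails only when bed_hits >= min_bed_count and the
    call freq is strictly greater than max_call_freq.
    """
    if bed_hits == 0:
        # Nothing intersects, so no call freq can be estimated
        return 0.0, False
    call_freq = vcf_matches / bed_hits
    fails = bed_hits >= min_bed_count and call_freq > max_call_freq
    return call_freq, fails


# 3 matches over 40 bed files exceeds -m 0.05, so 'CallFreq' would be appended
print(call_freq_filter(3, 40, max_call_freq=0.05))
# Only 5 bed files intersect, below the -b default of 8, so no filter is applied
print(call_freq_filter(3, 5, max_call_freq=0.05))
```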

**************************************************************************************

**************************************************************************************
**                              VCF Comparator : March 2017                         **
**************************************************************************************
Compares test vcf file(s) against a gold standard key of trusted vcf calls. Only calls
that fall in the common interrogated regions are compared. WARNING tabix gzipped files
often fail to parse correctly with java. Seeing odd error messages? Try uncompressing.
Be sure a score is provided in the QUAL field.

Required Options:
-a VCF file for the key dataset (xxx.vcf(.gz/.zip OK)).
-b Bed file of interrogated regions for the key dataset (xxx.bed(.gz/.zip OK)).
-c VCF file for the test dataset (xxx.vcf(.gz/.zip OK)). May also provide a directory
       containing xxx.vcf(.gz/.zip OK) files to compare.
-d Bed file of interrogated regions for the test dataset (xxx.bed(.gz/.zip OK)).

Optional Options:
-k Use a bed file of approx key variants (chr start stop type[#alt_#ref_SNV/INS/DEL])
       instead of a vcf key.
-g Require the genotype to match, defaults to scoring a match when the alternate
       allele is present.
-f Only require the position to match, don't consider the alt base or genotype.
-v Use VQSLOD score as ranking statistic in place of the QUAL score.
-s Only compare SNPs, defaults to all.
-n Only compare non SNPs, defaults to all.
-p Provide a full path directory for saving the parsed data. Defaults to not saving.
-e Exclude test and key records whose FILTER field is not . or PASS. Defaults to
       scoring all.
-i Relax matches to key INDELs to include all test variants within x bps.

Example: java -Xmx10G -jar pathTo/USeq/Apps/VCFComparator -a /NIST/NA12878/key.vcf
       -b /NIST/NA12878/regions.bed.gz -c /EdgeBio/Exome/testHaploCaller.vcf.zip
       -d /EdgeBio/Exome/NimbleGenExomeV3.bed -g -v -s -e -p /CompRes/ 

**************************************************************************************

**************************************************************************************
**                              VCF Consensus : June 2017                           **
**************************************************************************************
Merges VCF files with the same #CHROM line. Primary records with the same chrPosRefAlt
as a secondary are saved after appending the secondary ID and FILTER; the secondary
is dropped. Headers are joined, keeping the primary header line when the same. Run
iteratively with multiple VCF files you'd like to merge.  Good for combining multiple
variant callers run on the same sample. The ID field lists which callers found each
variant.

Required:
-p Path to a primary vcf file (xxx.vcf(.gz/.zip OK)) to merge.
-s Path to a secondary vcf file (xxx.vcf(.gz/.zip OK)) to merge.
-o Path to an output xxx.vcf.gz file.

Optional:
-q Primary name to replace the ID column.
-t Secondary name to replace the ID column.

Example: java -Xmx4G -jar pathTo/USeq/Apps/VCFConsensus -p illumina.vcf -q Strelka
-s stnd.indel.vcf.gz -t Scalpel -o indelCalls.vcf.gz 

**************************************************************************************

**************************************************************************************
**                            VCF Fdr Estimator  :   Jan 2019                       **
**************************************************************************************
Estimates false discovery rates for each QUAL score in a somatic VCF file by counting
the number of records that are >= that QUAL score in a matched background VCF. The
estimated FDR = #Bkg/ #Som. In cases where increasingly stringent QUAL thresholds
reduce the number of Som records but not Bkg records, the FDR increases. To control
for this inconsistency, the prior FDR is assigned to the more stringent QUAL, a 'dFDR'.
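
A minimal Python sketch of the FDR and dFDR calculation described above. It is an
illustration under stated assumptions, not the app's implementation: the function
name is hypothetical, records are reduced to bare QUAL scores, and the FDR is not
capped at 1.

```python
from bisect import bisect_left


def estimate_dfdr(som_quals, bkg_quals):
    """Map each distinct somatic QUAL to (fdr, dfdr).

    fdr  = (# background records with QUAL >= q) / (# somatic records with QUAL >= q)
    dfdr = fdr, unless it rose relative to the prior (less stringent) threshold,
           in which case the prior value is carried forward.
    """
    som = sorted(som_quals)
    bkg = sorted(bkg_quals)
    results = {}
    prior = None
    for q in sorted(set(som)):
        n_som = len(som) - bisect_left(som, q)  # somatic records >= q
        n_bkg = len(bkg) - bisect_left(bkg, q)  # background records >= q
        fdr = n_bkg / n_som
        dfdr = fdr if prior is None else min(fdr, prior)
        results[q] = (fdr, dfdr)
        prior = dfdr
    return results


# At QUAL 30 the raw FDR rises above the value at QUAL 20, so the dFDR keeps
# the prior value.
for qual, (fdr, dfdr) in estimate_dfdr([10, 20, 30, 40], [5, 15, 35]).items():
    print(qual, round(fdr, 3), round(dfdr, 3))
```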

To generate a matched background VCF file, use the SamReadDepthMatcher app to
subsample a high depth normal bam file to match the read depth over each exon in the
tumor bam file.  Run the same somatic variant calling and filtering workflow used in
generating the real somatic VCF file but substitute the matched depth mock tumor bam
for the real tumor bam. Lastly, use a low stringency set of germline variants
identified in the high depth normal sample to filter out any het and hom variants in
the bkg VCF.

Required Options:
-b Background VCF file (xxx.vcf(.gz/.zip OK)).
-s Somatic VCF file (xxx.vcf(.gz/.zip OK)).
-r VCF file for saving the estimated FDR results.

Example: java -Xmx10G -jar pathTo/USeq/Apps/VCFFdrEstimator -b patient123Bkg.vcf.gz
       -s patient123Somatic.vcf.gz -r FinalVcfs/patient123SomaticFdr.vcf.gz

**************************************************************************************

**************************************************************************************
**                              VCF Merger : Feb 2016                               **
**************************************************************************************
Merges VCF files with the same samples. Collapses the headers with a simple hash. Will
not work well with downstream apps that cannot process mixed INFO and FORMAT records.

Required:
-v Full path to a vcf file (xxx.vcf(.gz/.zip OK)) or directory containing such. Note,
       Java often fails to parse tabix compressed vcf files.  Best to uncompress.

Optional:
-o Full path to an output vcf file, defaults to merged.vcf.gz in parent -v dir.

Example: java -Xmx4G -jar pathTo/USeq/Apps/VCFMerger -v /CancerSamples/

**************************************************************************************

**************************************************************************************
**                          VCF Mpileup Annotator : April 2019                      **
**************************************************************************************
VMA estimates the AF and DP of a vcf record from a single sample mpileup file.  It 
replaces the AF or DP INFO values in the vcf records if present. For INDELs, the
region affected is scanned and the maximum AF and DP assigned. Provide the max memory
available to the app to maximize cpu usage.

Required:
-v Path to a xxx.vcf(.gz/.zip OK) file.
-m Path to a bgzip compressed and tabix indexed single sample mpileup file. e.g.:
      1) Mpileup: 'samtools mpileup -B -R -A -d 1000000 -f $fastaIndex -l
         $bedFile $indexedBamFile > bam.mpileup'
      2) Bgzip: 'tabix-0.2.6/bgzip bam.mpileup'
         Tabix: 'tabix-0.2.6/tabix -s 1 -b 2 -e 2 bam.mpileup.gz'
-o Path to a xxx.vcf.gz file to save the modified vcf records.

Optional:
-q Minimum mpileup sample bp quality, defaults to 0
-e Number of decimals in the AF, defaults to 4
-d Print verbose debugging output.
-t Number of threads to use, defaults to all/ 5GB.

Example: java -Xmx64G -jar pathTo/USeq/Apps/VCFMpileupAnnotator -v spikes.vcf -q 13
-m bam.mpileup.gz -o spikes.mod.vcf.gz

**************************************************************************************

**************************************************************************************
**                           Vcf Mutant Maker  : June 2015                          **
**************************************************************************************
Generates a VCF file of random SNVs and INDELs for BamBlaster. Only one variant is
made per input region. Be sure to left align this vcf file before using, see
https://github.com/atks/vt .

Required:
-b Full path bed file containing regions in which to create mutations.
-f Path to a directory of chrom specific reference fasta files (gzip/zip OK, 
      e.g. 1.fa.gz, 2.fa.gz, X.fa.gz...)

Optional:
-s Number of SNVs to make, defaults to 1/2 the number of regions.
-i Number of INDELs to make, ditto.
-m Max indel size, defaults to 10.
-v VCF results file, defaults to a permutation of the bed file.

Example: java -Xmx10G -jar pathTo/USeq/Apps/VCFMutantMaker -b ~/Sim/highCov.bed 
     -f ~/Sim/human_g1k_v37.fasta.gz -m 12 -v ~/Sim/unNormalizedVMM.vcf

**************************************************************************************

**************************************************************************************
**                             VCFNoCallFilter: April 2015                          **
**************************************************************************************
Parses multi sample VCF records for too many no call or low Genotype Quality records.
Good for removing records where the background is poorly called.

Required Options:
-v Full path file or directory containing xxx.vcf(.gz/.zip OK) file(s).
-m Maximum number of no call or low GQ samples to pass a record
-b Beginning sample index, zero based, included, defaults to 0
-e Ending sample index, not included, defaults to last
-g Minimum Genotype Quality GQ, defaults to 13

Example: java -jar pathToUSeq/Apps/VCFNoCallFilter -v /VCFFiles/ -m 5 -b 15 -e 83 
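
The per record test can be pictured with the following Python sketch. It is not the
app's code: the function name is hypothetical, and it assumes that a sample with a
missing GQ counts as low quality and that a record passes when the count of failing
samples is <= the -m value.

```python
def passes_no_call_filter(vcf_line, max_fails, begin=0, end=None, min_gq=13):
    """Count no call or low GQ samples in [begin, end) and compare to max_fails."""
    fields = vcf_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")          # FORMAT column, e.g. GT:GQ
    samples = fields[9:][begin:end]      # zero based, end not included
    gq_idx = keys.index("GQ") if "GQ" in keys else None
    fails = 0
    for sample in samples:
        values = sample.split(":")
        no_call = "." in values[0]       # GT is the first key when present
        gq = None
        if gq_idx is not None and gq_idx < len(values) and values[gq_idx] != ".":
            gq = float(values[gq_idx])
        # Assumption: a missing GQ is treated as low quality
        if no_call or gq is None or gq < min_gq:
            fails += 1
    return fails <= max_fails


line = "1\t100\t.\tA\tG\t50\tPASS\t.\tGT:GQ\t0/1:40\t./.:.\t0/0:5\t1/1:99"
print(passes_no_call_filter(line, max_fails=1))   # two samples fail, record dropped
print(passes_no_call_filter(line, max_fails=2))   # record kept
```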

**************************************************************************************

**************************************************************************************
**                              VCF Region Filter : Dec 2015                        **
**************************************************************************************
Sorts each vcf record (based on its position and max length of the alt or ref)
by intersection with a chr start stop....bed file.

Required Params:
-v VCF file or directory containing such (xxx.vcf(.gz/.zip OK)) to parse
-b Bed file of regions (chr start stop ...), interbase coordinates 
       (xxx.bed(.gz/.zip OK)) to intersect

Optional Params:
-s Save directory for the modified vcfs
-p Pad bed start stop by this # bps, defaults to 0

Example: java -Xmx9G -jar pathTo/USeq/Apps/VCFRegionFilter -v Vcfs/ 
       -b /Anno/offTargetRegions.bed.gz -p 10 
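
The intersection test can be sketched in Python as below. VCF POS values are 1-based
while interbase bed coordinates use a 0-based start and an excluded stop. The function
names are hypothetical, and the handling of multi-allelic ALT fields (taking the
longest allele) is an assumption rather than a description of the app's code.

```python
def vcf_span(pos, ref, alt):
    """Convert a vcf record to an interbase (start, stop) span.

    The length is the max length of the ref or any comma separated alt allele.
    """
    start = pos - 1
    length = max(len(ref), max(len(a) for a in alt.split(",")))
    return start, start + length


def intersects(pos, ref, alt, bed_start, bed_stop, pad=0):
    """True if the vcf span overlaps the bed region after padding both ends."""
    start, stop = vcf_span(pos, ref, alt)
    return start < bed_stop + pad and stop > max(0, bed_start - pad)


# A SNV just past the end of a bed region only intersects once the region is padded
print(intersects(101, "A", "G", 90, 100))
print(intersects(101, "A", "G", 90, 100, pad=10))
# A 6 bp ref allele starting before the region reaches into it
print(intersects(95, "ACGTAC", "A", 99, 120))
```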

**************************************************************************************

**************************************************************************************
**                              VCF Region Marker : May 2016                        **
**************************************************************************************
Intersects each vcf record (based on its position and max length of the alt or ref)
with a chr start stop text.... bed file(s). For those that intersect, the bed text is 
added to the vcf FILTER field. Multiple hits are concatenated.

Required Params:
-v VCF file or directory containing such (xxx.vcf(.gz/.zip OK)) to parse
-b Bed file(s) of regions (minimum chr start stop text), interbase coordinates 
       (xxx.bed(.gz/.zip OK)) to intersect, comma delimit multiple files, no spaces.

Optional Params:
-s Save directory for the modified vcfs
-p Pad bed start stop by this # bps, defaults to 0
-c Clear starting FILTER field

Example: java -Xmx9G -jar pathTo/USeq/Apps/VCFRegionMarker -v testHaploCaller.vcf.zip
       -b /Anno/offTargetRegions.bed.gz,/Anno/pseudogene.bed -p 10 -s MarkedVcfs -c 

**************************************************************************************

**************************************************************************************
**                            VCF Reporter: April 2013                              **
**************************************************************************************
This application takes a VCF file as input and returns either a modified VCF file or a
tab-delimited text file containing user-specified and optionally formatted INFO 
fields.  The modified VCF file is useful if you want to view annotations in IGV.  The
standard set of INFO fields is quite large and can't fit in the IGV window. The
tab-delimited text file allows the annotations to be viewed in Excel for easier sorting 
and filtering. If the number of VCF records is greater than 500,000, the reporting 
will be done in chunks.  The chunks are merged and compressed automatically at the end
of the application.

Required:
-v VCF file. Full path to a multi sample vcf file (xxx.vcf(.gz/.zip OK)).
-o Output file.  Full path to the output file. If xxx.txt is specified, output will 
      be a tab-delimited text file.  If xxx.vcf is specified, output will be an 
      uncompressed vcf file.  If xxx.vcf.gz is specified, output will be a tabix 
      compressed and indexed vcf file.

Optional:
-d Desired Columns.  A comma-separated list of INFO-field names that will be reported
      in the output vcf.
-u Unwanted Columns. A comma-separated list of INFO-field names that will not be 
      reported in the output vcf.
-r Reporting Style. INFO field styles.  Only two styles are currently supported, 
      'unmodified' and 'short'. Unmodified is used by default.  Short truncates some
      of the longer fields, which helps visibility in IGV.
-a Annotations only.  Report standard annotations using the 'short' reporting style.
      Skip INFO fields reported by GATK to reduce clutter.  The skipped fields are 
      used by GATK to determine variation quality and might not be useful to the 
      general user.
-x Damaging only.  Only report nonsynonymous, frameshift or splicing variants.
-p Path to tabix directory.  Set this variable if the application is not run on 
      moab/alta

Tab-delimited only options:
-k Generate key.  Text document that lists descriptions of each column in the output
      table.


Example: java -Xmx10G -jar pathTo/USeq/Apps/VCFReporter -v 9908R.vcf 
      -d SIFT,LRT,MT,MT_P -r short -o 9908.ann.txt 

**************************************************************************************

**************************************************************************************
**                              VCF Selector : April 2019                           **
**************************************************************************************
Selects variants for injection with the BamBlaster tool. Prioritizes variants by
position, key vs other, and average read depth.  First run the VCFMpileupAnnotator
with the bam files you plan on modifying.

Required Params:
-b Bed file of regions (chr start stop ...) to use in selecting intersecting vcf 
     records, (xxx.bed(.gz/.zip OK)).
-k VCF file (xxx.vcf(.gz/.zip OK)) of key priority variants to attempt to inject
     first, e.g. annotated as pathogenic.
-s Save directory for the modified vcfs

Optional Params:
-o VCF file (xxx.vcf(.gz/.zip OK)) of other variants to attempt to include when 
     priority variants aren't present.
-p BP distance to keep between vcf records, defaults to 150

Example: java -Xmx9G -jar pathTo/USeq/Apps/VCFSelector -b GBMTargets.bed.gz -p 160 
     -k pathoClinvarSNVs.vcf.gz -o allCosmicClinvarSNVs.vcf.gz 

**************************************************************************************

**************************************************************************************
**                            VCF Splice Scanner : Sept 2018                        **
**************************************************************************************
Scores variants for changes in splicing using the MaxEntScan algorithms. See Yeo and
Burge 2004, http://www.ncbi.nlm.nih.gov/pubmed/15285897 for details. Known splice
acceptors and donors are scored for loss of a junction.  Exonic, intronic, and splice
bases are scanned for novel junctions in a window around each variant. See the vcf
INFO header for a description of the output. Use this information to identify 
snv and indel variants that may affect splicing.

Required Options:
-v Path to a vcf file to annotate (xxx.vcf(.gz/.zip OK)).
-r Name of a gzipped vcf file to use in saving the results, will overwrite.
-f Path to the reference fasta with xxx.fai index
-u UCSC RefFlat or RefSeq transcript (not merged genes) file, full path. See RefSeq 
       http://genome.ucsc.edu/cgi-bin/hgTables, (uniqueName1 name2(optional) chrom
       strand txStart txEnd cdsStart cdsEnd exonCount (commaDelimited)exonStarts
       (commaDelimited)exonEnds). Example: ENSG00000183888 C1orf64 chr1 + 16203317
       16207889 16203385 16205428 2 16203317,16205000 16203467,16207889 .
-m Full path directory name containing the me2x3acc1-9, splice5sequences and me2x5
       splice model files. See USeqDocumentation/splicemodels/ or 
       http://genes.mit.edu/burgelab/maxent/download/ 

Optional options:
-x Export category for adding info to the vcf file, defaults to 2:
       0 All types (gain or damaged in exon, intron, and splice)
       1 Just report damaged splices
       2 Report damaged splices and novel splices in exons and splice junctions.
-a Minimum new splice junction score in Alt, max score in Ref, defaults to 3.
-b Minimum new splice junction score difference, new - refseq, defaults to 1.
-c Maximum damaged splice junction score, defaults to 3.
-d Minimum damaged score difference, refseq - new, defaults to 1.
-s Format vcf with minimal output for downstream annotators, e.g. snpEff.

Example: java -Xmx10G -jar ~/USeq/Apps/VCFSpliceAnnotator -f ~/Hg19/Fa/ -v ~/exm2.vcf
       -m ~/USeq/Documentation/splicemodels -i -u ~/Hg19/hg19EnsTrans.ucsc.zip -r
       ~/ExmSJAnno/exm2VSSAnno.vcf.gz -x 0
**************************************************************************************

**************************************************************************************
**                                VCFTabix: Jan 2013                                **
**************************************************************************************
Converts vcf files to a SAMTools compressed vcf tabix format. Recursive.

Required Options:
-v Full path file or directory containing xxx.vcf(.gz/.zip OK) file(s). Recursive!
-t Full path tabix directory containing the compiled bgzip and tabix executables. See
      http://sourceforge.net/projects/samtools/files/tabix/
-f Force overwriting of existing indexed vcf files, defaults to skipping.
-d Do not delete non gzipped vcf files after successful indexing, defaults to deleting.
-e Only print error messages.

Example: java -jar pathToUSeq/Apps/VCFTabix -v /VarScan2/VCFFiles/
     -t /Samtools/Tabix/tabix-0.2.6/ 

**************************************************************************************

**************************************************************************************
**                               VCF 2 Tsv: February 2019                           **
**************************************************************************************
Converts vcf files' SNVs and INDELs to tsv Illumina CE/CA format. 

Required Options:
-v Full path file or directory containing xxx.vcf(.gz/.zip OK) file(s).
-s Directory to save the tsv files, defaults to the parent of the vcf
-q Minimum QUAL, defaults to 0, no threshold.
-z Ignore QUAL thresholding when BKZ= is absent from INFO field.
-b Max frac BKAFs >= AF, defaults to 1, no threshold.
-i ID column string forcing export. Case insensitive.

Example: java -jar pathToUSeq/Apps/VCF2Tsv -v VCFss/ -q 3 -z -b 0.2 -i Foundation 

**************************************************************************************

**************************************************************************************
**                                Wig2Bar: Oct 2009                                 **
**************************************************************************************
Converts variable step and fixed step xxx.wig(.zip/.gz OK) files to chrom specific
bar files.

-f The full path directory/file text for your xxx.wig(.gz/.zip OK) file(s).
-v Genome version (e.g. H_sapiens_Mar_2006), get from UCSC Browser,
      http://genome.ucsc.edu/FAQ/FAQreleases
-s Skip wig lines with designated value/score.

Example: java -Xmx1500M -jar pathTo/Apps/Wig2Bar -f /WigFiles/ -v hg18 -s 0.0 

**************************************************************************************

**************************************************************************************
**                                Wig 2 USeq: May 2012                              **
**************************************************************************************
Converts variable step, fixed step, and bedGraph xxx.wig/bedGraph4(.zip/.gz OK) files
into stair step/ heat map useq archives. Span parameters are not supported.

-f The full path directory/file text for your xxx.wig(.gz/.zip OK) file(s).
-v Genome version (e.g. H_sapiens_Mar_2006), get from UCSC Browser,
      http://genome.ucsc.edu/FAQ/FAQreleases
-s Skip wig lines with designated value/score.
-i Index size for slicing split chromosome data (e.g. # rows per file), defaults to
      100000.
-r Initial graph style, defaults to 1
      0	Bar
      1	Stairstep
      2	HeatMap
      3	Line
-h Initial graph color, hexadecimal (e.g. #6633FF), enclose in quotations!
-d Description, enclose in quotations! 
-p Prepend a 'chr' onto bedGraph chromosomes.

Example: java -Xmx1G -jar path2/Apps/Wig2USeq -f /WigFiles/ -v H_sapiens_Feb_2009

**************************************************************************************

**************************************************************************************
**                        Score Methylated Regions: Dec 2013                        **
**************************************************************************************
For each region finds the underlying methylation data. A p-value (Bon Corr) for each
region's fraction methylated (# nonConObs/ # totalObs) as well as a fold enrichment
can be calculated using regions randomly drawn matched by chromosome, region length,
# obs, and GC content.

Options:
-c Converted PointData directories, full path, comma delimited. These should
       contain stranded chromosome specific xxx_-/+_.bar.zip files. One
       can also provide a single directory that contains multiple PointData
       directories. See the NovoalignBisulfiteParser app.
-n Non-converted PointData directories, ditto. 
-r Full path file text for your region of interest file (tab delim: chr start stop).
-g To calculate p-values for methylation enrichment/ reduction,  provide a full path
         directory containing chromosome specific gc content boolean arrays. See
         the ConvertFasta2GCBoolean app. Also complete option -i.
-i Likewise, to calculate p-values, also provide a full path file text containing the
         interrogated regions (tab delim: chr start stop ...) to use in drawing
         random regions.
-u Number of random region sets, defaults to 1000.
-m Minimum number of observations in a region to score, defaults to 10.
-o Minimum read coverage to count mC fraction, defaults to 8
-b Minimum number of Cs passing read coverage in region to score, defaults to 1
-p Print only regions that pass thresholds, defaults to all

Example: java -jar pathTo/Apps/ScoreMethylatedRegions -c /Data/Sperm/Converted -n 
      /Data/Sperm/NonConverted -r /Res/miRNARegions.bed -i /Res/interrRegions.bed
       -g /Genomes/Hg18/GCBooleanArrays/

**************************************************************************************

**************************************************************************************
**                        Score Enriched Regions: December 2012                     **
**************************************************************************************
ScoreEnrichedRegions determines if the set of regions specified by the user is more or
less enriched than a randomly generated set of regions matched on chromosome, region
length and GC content.  The software determines if each individual region is more/less
enriched as well as the dataset as a whole.  Individual region p-values are calculated
by comparing the region fold-enrichment to the fold-enrichment of each randomly
generated region.  Aggregate p-values are calculated by comparing the median
fold-enrichment of the user-specified dataset to the median fold-enrichment of each
randomly generated dataset. The software uses 1000 randomly generated regions by
default. Pseudocounts are added to moderate fold-change values.

Options:
-c Conditions directory containing one directory for each condition with one xxx.bam
       file per biological replica and their xxx.bai indexes. The BAM files should be 
       sorted by coordinate using Picard's SortSam.
-r Regions of interest in Bed format (chr, start, stop,...), full path, See,
       http://genome.ucsc.edu/FAQ/FAQformat#format1
-i Interrogated regions in Bed format (chr, start, stop, ...), full path to use in drawing
       random regions.
-g A full path directory containing chromosome specific gc content boolean arrays. See
       the ConvertFasta2GCBoolean app
-o Full path to the output file

Advanced Options:
-x Max per base alignment depth, defaults to 50000. Genes containing such high
       density coverage are ignored. Warnings are thrown.
-f Pseudocounts to add to each region.  Defaults to 10.

Example: java -Xmx10G -jar pathTo/USeq/Apps/ScoreEnrichedRegions -c
      /Data/TimeCourse/ESCells/ -r regionOfInterest.bed -i sequencedRegions.bed -g gcContent/ 
     -o resultsGoHere.txt 

**************************************************************************************

**************************************************************************************
**                           Somatic Sniper VCF Parser: March 2015                  **
**************************************************************************************
Parses Somatic Sniper VCF files, replacing the QUAL score with the SSC score. Also 
filters for minimum tumor normal read depth, difference in alt allelic ratios, and on
normal alt allelic ratios.

Required Options:
-v Full path file or directory containing xxx.vcf(.gz/.zip OK) file(s).
-m Minimum SSC score, defaults to 0.
-a Minimum alignment depth for both tumor and normal samples, defaults to 0.
-r Minimum absolute difference in alt allelic ratios, defaults to 0.
-n Maximum normal alt allelic fraction, defaults to 1.

Example: java -jar pathToUSeq/Apps/SomaticSniperVCFParser -v /VCFFiles/ -m 32 -a 15
      -r 0.25 -n 0.02

**************************************************************************************

**************************************************************************************
**                            Strelka VCF Parser: Dec 2018                          **
**************************************************************************************
Parses Strelka VCF INDEL and SNV files, replacing the QUAL score with the QSI or QSS
score. Also filters for read depth, T/N alt allelic ratio and diff, ref/alt with '.',
and tumor and normal alt allelic ratios. Lastly, it inserts the tumor DP and AF info.
For somatic exome datasets sequenced at >100X unique observation read depth, try the 
tier filtering to select for lists with approx 1%, 3-5%, and 9-15% FDR. Follow the
example.

Required Params:
-v Full path file or directory containing xxx.vcf(.gz/.zip OK) file(s).
-m Minimum QSI or QSS score, defaults to 0.
-e Apply a tuned 100X exome QSI/S stringency tier: 1 (9-15%FDR), 2 (4-6%FDR),
         3 (1-2%FDR), defaults to 0, no tiered filtering (37-63%FDR).
-t Minimum tumor allele frequency (AF), defaults to 0.
-n Maximum normal AF, defaults to 1.
-u Minimum tumor alignment depth, defaults to 0.
-a Minimum tumor alt count, defaults to 0.
-o Minimum normal alignment depth, defaults to 0.
-d Minimum T-N AF difference, defaults to 0.
-r Minimum T/N AF ratio, defaults to 0.
-p Remove non PASS filter field records.
-s Print spreadsheet variant summary.
-f Directory in which to save the parsed files, defaults to the parent dir of the vcfs.

Example: java -jar pathToUSeq/Apps/StrelkaVCFParser -v /VCFFiles/ -t 0.03 -n 0.6 
-u 30 -o 10 -a 3 -d 0.03 -r 2 -e 1 
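
The depth and allele frequency thresholds in the example above can be pictured with
this Python sketch. It is illustrative only: the function and parameter names are
hypothetical, the tier (-e) and score (-m) filters are left out, and treating the T/N
ratio as infinite when the normal AF is 0 is an assumption.

```python
import math


def passes_strelka_filters(t_af, n_af, t_dp, n_dp, t_alt,
                           min_t_af=0.0, max_n_af=1.0, min_t_dp=0, min_n_dp=0,
                           min_t_alt=0, min_diff=0.0, min_ratio=0.0):
    """Apply the -t, -n, -u, -o, -a, -d and -r style thresholds to one record."""
    # Assumption: a normal AF of 0 gives an infinite T/N ratio
    ratio = math.inf if n_af == 0 else t_af / n_af
    return (t_af >= min_t_af and n_af <= max_n_af
            and t_dp >= min_t_dp and n_dp >= min_n_dp
            and t_alt >= min_t_alt
            and (t_af - n_af) >= min_diff
            and ratio >= min_ratio)


# Thresholds taken from the example command line: -t 0.03 -n 0.6 -u 30 -o 10 -a 3
# -d 0.03 -r 2
example = dict(min_t_af=0.03, max_n_af=0.6, min_t_dp=30, min_n_dp=10,
               min_t_alt=3, min_diff=0.03, min_ratio=2)
print(passes_strelka_filters(0.20, 0.05, 100, 50, 20, **example))  # ratio 4, passes
print(passes_strelka_filters(0.08, 0.05, 100, 50, 8, **example))   # ratio 1.6, fails
```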

**************************************************************************************