Introduction to Galaxy
Instructor | Stephanie Le Gras |
---|---|
Duration | 3.5 hours |
Content | Description of the key features of Galaxy (Lecture) |
Practical session on basic features of Galaxy (Hands-on) | |
Prerequisites | None |
1 Log in to GalaxEast
- Go to Galaxy France website
- Log in with your account.
2 History
2.1 Create a new history
2.2 Change the name of the history to “DNA-seq data analysis"
3 Import files from your computer to Galaxy
- Download the file “sample.bed.gz” following this link and upload it to Galaxy.
- The genome is: Mouse (mm9)
- The format is: bed
4 Remove a dataset
- Remove the dataset sample.bed from your history by clicking on the button
- You are told that your history is empty. Look at the size of your history
- Click on “deleted” in the top of the history panel (below the history name). Remove definitely the file from the disk by clicking on “Supprimer définitivement du disque”.
- Click on “hide deleted”
5 Running a tool
- Download the two files CRN-107_11-R1.fastq.gz and CRN-107_11-R2.fastq.gz following this link.
- Import them to your history called “DNA-seq data analysis”
- The genome is: Human (hg19)
- The format: <auto detect>
- Use the tool “FastQC Read Quality reports” to compute quality analysis on the datasets “CRN-107_11-R1.fastq” and “CRN-107_11-R2.fastq”
- Use default parameters.
What is the quality encoding of the two fastq files?
6 Running tools without a workflow
Analyze CRN-107 data from reads to variant annotation.
Run the following tools:
- BWA mem to align reads to the reference genome
- Picard markduplicates to identify duplicated reads
- Freebayes to detect variants
- snpEff to annotate variants
To run the tools you will need the following files:
- CRN-107_11-R1.fastq
- CRN-107_11-R2.fastq
- CaptureDesign_chr4.bed (download it from here)
Import missing files from the data library “DNA-seq test datasets”
Here are the parameters to use for each of the tools. All parameters not mentioned are to be used with default values.
- Map with BWA-MEM - map medium and long reads (> 100 bp) against reference genome
- Using reference genome: hg19
- Single or Paired-end reads: Paired
- Select first set of reads: CRN-107_11-R1.fastq
- Select second set of reads: CRN-107_11-R2.fastq.
- Set read groups information? Set read groups (Picard style)
- Read group identifier (ID): Auto-assign Yes
- Read group sample name (SM): Auto-assign Yes
- Library name (LB): Auto-assign Yes
- Platform/technology used to produce the reads (PL): ILLUMINA
- Platform unit (PU): HS026.2
- Sequencing center that produced the read (CN): Genomeast
- Description (DS): CRN-107
- Predicted median insert size (PI): 250
- Date that run was produced (DT): 2017-12-13
- MarkDuplicates examine aligned records in BAM datasets to locate duplicate molecules.
- Select SAM/BAM dataset or dataset collection: output of BWA mem
- Select validation stringency: Silent
- FreeBayes bayesian genetic variant detector
- BAM or CRAM dataset: output (bam) of markduplicates
- Using reference genome: hg19
- Limit analysis to regions in this BED dataset: CaptureDesign_chr4.bed
- SnpEff Variant effect and annotation
- Sequence changes (SNPs, MNPs, InDels): output of FreeBayes (VCF)
- Input format: VCF
- Output format: VCF (only if input is VCF)
- Genome source: Downloaded on demand
- Snpff Genome Version Name (e.g. GRCh38.86): hg19
- VCFtoTab-delimited: Convert VCF data into TAB-delimited format
- Select VCF dataset to convert: output of SnpEff
- How many variants are called?
7 Create a workflow out of an existing history
One can create a workflow from an existing history going to the history button and selecting “Extract Workflow”.
7.1 Extract a workflow out of the history called "DNA-seq data analysis"
7.2 Rename the workflow "DNA-seq data analysis"
8 Edit a workflow with the workflow editor
8.1 Open the workflow editor with the workflow "DNA-seq data analysis"
8.2 Add steps to the workflow
Your workflow should look like this before editing:
Add the following tools:
- Samtools flagstat to compute mapping statistics (after BWA mem)
- Filter SAM or BAM, output SAM or BAM to select aligned reads with a mapping quality >= 20 (after MarkDuplicates)
- Samtools flagstat to compute mapping statistics after removing reads with low mapping qualities (after Filter)
Here are the parameters to use for each of the tools:
- Flagstat tabulate descriptive stats for BAM dataset
- BAM File to Convert: output of BWA mem
- Filter SAM or BAM, output SAM or BAM files on FLAG MAPQ RG LN or by region
- SAM or BAM file to filter: output of Picard MarkDuplicates
- Minimum MAPQ quality score: 20
- Flagstat tabulate descriptive stats for BAM dataset
- BAM File to Convert: output of Filter
The final workflow should look like this (new tools are in black boxes):
Save the workflow once you are done editing it:
9 Run a workflow
9.1 Import files
Import the following files from the data library “DNA-seq test datasets” to a new history:
- CRN-107_11-R1.fastq
- CRN-107_11-R2.fastq
- CaptureDesign_chr4.bed