Differences

This shows you the differences between two versions of the page.

training:dudijon:galaxy [2021/12/14 18:12]
slegras [9.2 Run the workflow DNA-seq data analysis]
training:dudijon:galaxy [2022/12/09 15:37] (current)
slegras
Line 3: Line 3:
 ^ Instructor ^ Stephanie Le Gras ^ ^ Instructor ^ Stephanie Le Gras ^
 ^ Duration | 3.5 hours | ^ Duration | 3.5 hours |
-^ Content | {{:training:dudijon:introgalaxy_2020_compressed.pdf|Description of the key features of Galaxy (Lecture)}} |
+^ Content | {{:training:dudijon:introgalaxy_2021_compressed.pdf|Description of the key features of Galaxy (Lecture)}} |
 ^ ::: | Practical session on basic features of Galaxy (Hands-on) | ^ ::: | Practical session on basic features of Galaxy (Hands-on) |
 ^ Prerequisites | None | ^ Prerequisites | None |
Line 27: Line 27:
 ++++ ++++
  
-==== - Import data into Galaxy ====
+==== - Import files from your computer to Galaxy ====
-=== - Import files from your computer to Galaxy ===
-  - Download the two files **CRN-107_11-R1.fastq.gz** and **CRN-107_11-R2.fastq.gz** following this [[https://seafile.igbmc.fr/d/345d7581237d4295bf2c/|link]].
-  - Import them to your history called “DNA-seq data analysis”
 
-  * The genome is: Human (hg19)
+  - Download the file “**sample.bed.gz**” following this [[https://seafile.igbmc.fr/d/1adaad8f80394182a784/|link]] and upload it to Galaxy.
-  * The format: <auto detect>
-
-++++ Answer |
-  - Click on "Upload Data"
-  - Drag and drop the two fastq files “**CRN-107_11-R1.fastq**” and "**CRN-107_11-R2.fastq**"
-  - Select/Enter Genome for both datasets as: hg19
-  - Click on Start
-
-{{:training:dudijon:03.1-LaunchDragAndDrop.png?|}}
-{{:training:dudijon:03.2-UploadFastqFiles.png?|}}
-++++
-
-==== - Remove a dataset ====
-=== - Import a file from your computer ===
-  - Download the file “**sample.bed.gz**” following this [[https://seafile.igbmc.fr/d/345d7581237d4295bf2c/|link]] and upload it to Galaxy.
   * The genome is: Mouse (mm9)   * The genome is: Mouse (mm9)
   * The format is: bed   * The format is: bed
Line 62: Line 44:
 ++++ ++++
  
 +==== - Remove a dataset ====
   - Remove the dataset **sample.bed** from your history by clicking on the button ​   - Remove the dataset **sample.bed** from your history by clicking on the button ​
   - You are told that your history is empty. Look at the size of your history   - You are told that your history is empty. Look at the size of your history
-    - Click on “**deleted**” in the top of the history panel (below the history name). Remove definitely the file from the disk by clicking on "**Permanently remove it from disk**”.
+    - Click on “**deleted**” in the top of the history panel (below the history name). Remove definitely the file from the disk by clicking on "**Supprimer définitivement du disque**”.
     - Click on “hide deleted”     - Click on “hide deleted”
  
 ==== - Running a tool ==== ==== - Running a tool ====
 +  - Download the two files **CRN-107_11-R1.fastq.gz** and **CRN-107_11-R2.fastq.gz** following this [[https://​seafile.igbmc.fr/​d/​1adaad8f80394182a784/​|link]].
 +  - Import them to your history called “DNA-seq data analysis”
 +    * The genome is: Human (hg19)
 +    * The format: <auto detect>
 +
 +++++ Answer |
 +  - Click on "​Upload Data"
 +  - Drag and drop the two fastq files “**CRN-107_11-R1.fastq.gz**” and "​**CRN-107_11-R2.fastq.gz**"​
 +  - Select/​Enter Genome for both datasets as: hg19
 +  - Click on Start
 +
 +{{:​training:​dudijon:​03.1-LaunchDragAndDrop.png?​|}}
 +{{:​training:​dudijon:​03.2-UploadFastqFiles.png?​|}}
 +++++
 +
   - Use the tool “FastQC Read Quality reports” to compute quality analysis on the datasets “**CRN-107_11-R1.fastq**” and "​**CRN-107_11-R2.fastq**"​   - Use the tool “FastQC Read Quality reports” to compute quality analysis on the datasets “**CRN-107_11-R1.fastq**” and "​**CRN-107_11-R2.fastq**"​
     - Use default parameters.     - Use default parameters.
Line 95: Line 93:
   * CRN-107_11-R1.fastq   * CRN-107_11-R1.fastq
   * CRN-107_11-R2.fastq   * CRN-107_11-R2.fastq
-  * CaptureDesign_chr4.bed (download it from [[https://seafile.igbmc.fr/d/345d7581237d4295bf2c/|here]])
+  * CaptureDesign_chr4.bed (download it from [[https://seafile.igbmc.fr/d/1adaad8f80394182a784/|here]])
  
 Import missing files from the data library "​**DNA-seq test datasets**"​ Import missing files from the data library "​**DNA-seq test datasets**"​
Line 123: Line 121:
     - Limit analysis to regions in this BED dataset: CaptureDesign_chr4.bed     - Limit analysis to regions in this BED dataset: CaptureDesign_chr4.bed
   - __SnpEff__ Variant effect and annotation   - __SnpEff__ Variant effect and annotation
-    - Sequence changes (SNPs, MNPs, InDels): **output of GATK Haplotype Caller (VCF)**
+    - Sequence changes (SNPs, MNPs, InDels): **output of FreeBayes (VCF)**
     - Input format: VCF     - Input format: VCF
     - Output format: VCF (only if input is VCF)     - Output format: VCF (only if input is VCF)
Line 143: Line 141:
 === - Rename the workflow "​DNA-seq data analysis"​ === === - Rename the workflow "​DNA-seq data analysis"​ ===
 ++++ Answer | ++++ Answer |
-{{:training:dudijon:04-manageworkflow.png?|}}
+{{:training:dudijon:05-editorrunworklow.png?|}}
  
-Now your can edit or run the workflow: 
- 
-{{:​training:​dudijon:​05-editorrunworklow.png?​|}} 
 ++++ ++++
  
Line 174: Line 169:
  
   - __Samtools flagstat__ to compute mapping statistics (after BWA mem)   - __Samtools flagstat__ to compute mapping statistics (after BWA mem)
-  - __Filter__ to select aligned reads with a mapping quality >= 20 (after MarkDuplicates)
+  - __Filter SAM or BAM, output SAM or BAM__ to select aligned reads with a mapping quality >= 20 (after MarkDuplicates)
   - __Samtools flagstat__ to compute mapping statistics after removing reads with low mapping qualities (after Filter)   - __Samtools flagstat__ to compute mapping statistics after removing reads with low mapping qualities (after Filter)
  
Line 180: Line 175:
   - __Flagstat__ tabulate descriptive stats for BAM dataset   - __Flagstat__ tabulate descriptive stats for BAM dataset
     - BAM File to Convert: **output of BWA mem**     - BAM File to Convert: **output of BWA mem**
-  - __Filter__ BAM datasets on a variety of attributes
-    - BAM dataset(s) to filter: **output of Picard MarkDuplicates**
-    - Select BAM property to filter on: mapQuality
-      - Filter on read mapping quality (phred scale): **>=20** (this exact expression, including ">="!)
+  - __Filter SAM or BAM, output SAM or BAM__ files on FLAG MAPQ RG LN or by region
+    - SAM or BAM file to filter: **output of Picard MarkDuplicates**
+    - Minimum MAPQ quality score: **20**
   - __Flagstat__ tabulate descriptive stats for BAM dataset   - __Flagstat__ tabulate descriptive stats for BAM dataset
     - BAM File to Convert: **output of Filter**     - BAM File to Convert: **output of Filter**
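
The mapping-quality filter configured above (minimum MAPQ of 20) can be illustrated outside Galaxy. The following is a minimal sketch, not the Galaxy tool's implementation: it assumes plain-text SAM input, where MAPQ is the fifth tab-separated column and header lines start with "@". The function name `filter_by_mapq` and the toy records are hypothetical.

```python
def filter_by_mapq(sam_lines, min_mapq=20):
    """Yield SAM header lines and alignments whose MAPQ is >= min_mapq."""
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines are passed through unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            # A SAM alignment has 11 mandatory fields; skip malformed lines.
            continue
        if int(fields[4]) >= min_mapq:  # column 5 is MAPQ
            yield line


# Toy SAM records (hypothetical data, for illustration only).
toy_sam = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t0\tchr4\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "read2\t0\tchr4\t200\t5\t5M\t*\t0\t0\tACGTA\tIIIII",
    "read3\t0\tchr4\t300\t20\t5M\t*\t0\t0\tACGTA\tIIIII",
]

kept = list(filter_by_mapq(toy_sam))
# read1 (MAPQ 60) and read3 (MAPQ 20) are kept; read2 (MAPQ 5) is dropped.
for line in kept:
    print(line)
```

On the command line, `samtools view -b -q 20 in.bam > out.bam` applies a comparable MAPQ threshold to a BAM file (the file names here are placeholders).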
Line 215: Line 209:
  
 ++++ Answer | ++++ Answer |
-561598 - 531417 = 30181
+561598 - 530355 = 31243
 ++++ ++++
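
The subtraction in the current answer can be checked with a few lines of Python. This is a sketch under an assumption: the two figures are taken to be read counts reported by Samtools flagstat before and after the mapping-quality filter, which this comparison view does not state explicitly. The variable names are illustrative.

```python
# Figures from the current revision of the answer (assumed to be read counts
# before and after filtering on mapping quality).
reads_before_filter = 561598
reads_after_filter = 530355

removed = reads_before_filter - reads_after_filter
fraction_removed = removed / reads_before_filter

print(f"{removed} reads removed ({fraction_removed:.1%} of the input)")
```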