Introduction to ChIP-seq data analysis

Instructor Stephanie Le Gras
Duration 4 hours
Content Analyzing ChIP-seq data (Lecture)
Hands-on exercises run on Galaxy (Practical)
link to UCSC
Prerequisites Introduction to Galaxy training
Basic knowledge of ChIP-seq experiments

We are using ChIP-seq data from:

Strub, T., Giuliano, S., Ye, T., Bonet, C., Keime, C., Kobi, D., Le Gras, S., Cormont, M., Ballotti, R., and Bertolotto, C. (2011). Essential role of microphthalmia transcription factor for DNA replication, mitosis and genomic stability in melanoma. Oncogene 30, 2319–2332.

In this study, the authors performed ChIP-seq on the transcription factor MITF in the 501Mel melanoma cell line.

We have 2 datasets:

  • MITF
  • Control

During the practical session, we are going to use the GalaxEast platform.

1 Question

1.1 Visualize the WIG files for MITF and the control in UCSC

Go to the UCSC website.

Files are:

(to download, click on Télécharger, i.e. “Download”)

To upload your own tracks to UCSC, go to My Data (top menu)/Custom Tracks.

Select the right genome assembly before uploading your data. The data were aligned to the human genome assembly hg19.

Answer

1.2 Go to chromosome 2 in the Genome Browser

Today's analysis is limited to chr2 data only.

1.3 Change display mode for each track from dense to full

To change the display mode of a track, you can either:

  • right-click on the track and select the required display mode, or
  • scroll down below the plot to the “Custom tracks” section and change the display mode of the two uploaded tracks there.

Look at the following genes:

  • ANKRD30BL
  • CFAP221
  • DBI

Do you see peaks at these locations?

2 Question

2.1 Go to GalaxEast

2.2 Log in to GalaxEast

2.3 Create a new history called « ChIP-seq data analysis »

2.4 Import the datasets from the “Shared Data” top menu. The data are in « Chip seq test dataset (chr2) ».

Go to Shared Data/Data Libraries/Chip seq test dataset (chr2). Import the two datasets.

Answer

3 Question

3.1 Run MACS 1.4.2 on the data, using the MITF (2) and control datasets as inputs.

Use default parameters except for:

  • tag size (54)
  • Effective genome size (75% of the size of chr2: 182400000)
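The effective genome size above can be checked with a short calculation. The sketch below assumes a chr2 length of 243,199,373 bp for hg19 (a value taken from the UCSC chromosome sizes, not from this handout) and rounds the result to the nearest 100 kb.

```python
# Length of chr2 in the hg19 assembly, in base pairs.
# Assumption: value taken from the UCSC hg19 chromosome sizes.
CHR2_LENGTH_HG19 = 243_199_373

# Fraction of the chromosome used as effective size (the 75% stated above).
EFFECTIVE_FRACTION = 0.75

effective_size = CHR2_LENGTH_HG19 * EFFECTIVE_FRACTION

# Round to the nearest 100 kb to get the value entered in the tool form.
rounded_size = int(round(effective_size, -5))

print(effective_size)
print(rounded_size)
```

The rounded value is the one to type into the “Effective genome size” field.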

Answer

3.2 Take a look at the result files

What is the fragment length estimated by MACS? How many peaks are called?
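If you download the peak file, the number of peaks can also be counted outside Galaxy. The following is a minimal sketch, assuming a BED-like text file in which each non-header line is one peak; the example lines are made up for illustration and are not real MACS output.

```python
def count_peaks(lines):
    """Count BED records, skipping blank lines, comments and track/browser headers."""
    count = 0
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", "track", "browser")):
            continue
        count += 1
    return count


# Hypothetical toy content standing in for a downloaded peak file.
example_lines = [
    'track name="toy peaks"',
    "chr2\t1000\t1500\tpeak_1\t80.5",
    "chr2\t20000\t20400\tpeak_2\t120.0",
    "",
]

n_peaks = count_peaks(example_lines)
print(n_peaks)
```

With a real file, the list of lines could be replaced by an open file handle, since iterating over a file yields its lines.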

Answer

3.3 Have a look at the different MACS result files

4 Question

4.1 Re-run MACS using changed parameters

To rerun a tool with the same parameters, click on the button with the two rounded arrows in one of the datasets generated by MACS.

In the tool form, change only the following parameters:

  • Do not build the shifting model
  • Arbitrary shift size: 100

How many peaks are found now?
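To see what a fixed shift size means, the sketch below moves each tag by the shift size towards its 3' end, so that tags from both strands pile up around the presumed binding site. It illustrates the general idea only and is not the MACS source code; the function name and the representation of a tag by its 5' end position are assumptions made for this example.

```python
def shift_tag(five_prime_position, strand, shift_size):
    """Shift a tag towards its 3' end by shift_size nucleotides.

    Tags on the + strand move to higher coordinates,
    tags on the - strand move to lower coordinates.
    """
    if strand == "+":
        return five_prime_position + shift_size
    if strand == "-":
        return five_prime_position - shift_size
    raise ValueError(f"unknown strand: {strand!r}")


SHIFT_SIZE = 100  # the arbitrary shift size set in the tool form

# Hypothetical tags flanking a binding site located around position 1100.
forward_tag = shift_tag(1000, "+", SHIFT_SIZE)
reverse_tag = shift_tag(1200, "-", SHIFT_SIZE)

print(forward_tag, reverse_tag)
```

In this toy case both tags end up at the same position, which is the effect the shift is meant to achieve when the true fragment length is about twice the shift size.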

Answer

5 Question

5.1 Annotate the peaks with Homer annotatePeaks

Now that we have a list of regions bound by the protein, we would like to know which genes lie near the MITF peaks. This is done using Homer annotatePeaks.

Tips

  • Use the (peaks: bed) dataset generated by MACS as the input for this step.
  • The only parameter to change is the genome version, which has to be set to hg19.
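The core idea of this annotation step, assigning each peak to the closest transcription start site (TSS), can be sketched in a few lines. This is a simplified illustration and not Homer's implementation: it ignores gene strand, and the gene names and TSS positions are hypothetical.

```python
def nearest_gene(peak_center, tss_by_gene):
    """Return (gene, signed distance) for the TSS closest to the peak center.

    The distance is peak_center minus TSS position; strand is ignored here.
    """
    gene = min(tss_by_gene, key=lambda g: abs(peak_center - tss_by_gene[g]))
    return gene, peak_center - tss_by_gene[gene]


# Hypothetical TSS positions on a single chromosome.
toy_tss = {"GENE_A": 10_000, "GENE_B": 55_000, "GENE_C": 120_000}

result = nearest_gene(50_500, toy_tss)
print(result)
```

The real tool works from the full gene annotation of the selected genome version, which is why hg19 has to be chosen.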

Answer

6 De novo motif discovery

We are going to run the de novo motif discovery in regions +/- 40 nucleotides around the summits of peaks detected by MACS. The tool (MEME) we are going to use to run the analysis needs the nucleotide sequences as input. So far, we have the genomic coordinates of the peak summits (1 nucleotide long). To get the right input to MEME, we need:

  • To compute the genomic coordinates of the peak summits +/- 40 nucleotides
  • To extract the nucleotide sequences at those coordinates

6.1 Upload the dataset with the positions of the peak summits generated by MACS to Galaxy.

Use the dataset generated by MACS with peak summits (html dataset).

Download it from Galaxy, then upload it back to Galaxy as a new dataset.

Tips:

  • Data type: bed
  • Genome: hg19

Answer

6.2 Compute the coordinates of the peak summits +/- 40

Tips:

  • Use the utility called “SlopBed”
  • Use the chromosome length file hg19.len from the data library “Chromosome length”
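What this step computes can be sketched as follows: each summit interval is extended by 40 nucleotides on both sides and clipped to the chromosome boundaries, which is why a chromosome length file is needed. The function below is a minimal sketch of that behaviour, not the tool's code; the summit coordinates are made up, and the chr2 length is the hg19 value from the UCSC chromosome sizes.

```python
def slop(chrom, start, end, flank, chrom_lengths):
    """Extend a BED interval by `flank` on both sides, clipped to the chromosome."""
    new_start = max(0, start - flank)
    new_end = min(chrom_lengths[chrom], end + flank)
    return chrom, new_start, new_end


# Chromosome lengths as they would be read from a file such as hg19.len.
CHROM_LENGTHS = {"chr2": 243_199_373}

# Hypothetical 1-nucleotide summit in BED coordinates (0-based start, exclusive end).
summit = ("chr2", 1_000_000, 1_000_001)
window = slop(*summit, flank=40, chrom_lengths=CHROM_LENGTHS)

# A summit close to the chromosome start is clipped at 0.
clipped = slop("chr2", 10, 11, flank=40, chrom_lengths=CHROM_LENGTHS)

print(window)
print(clipped)
```

Away from the chromosome ends, every resulting interval is 81 nucleotides long (40 + 1 + 40).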

Answer

6.3 Extract the fasta sequences

Tips:

  • Use the utility called “Extract Genomic DNA using coordinates from assembled/unassembled genomes”
  • Use the dataset with coordinates of peak summits +/-40 as input
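Extracting a sequence from BED coordinates amounts to slicing the chromosome sequence from the 0-based start up to, but not including, the end. The sketch below shows this on a tiny made-up sequence; the chromosome name, the sequence and the helper names are hypothetical and serve only as an illustration.

```python
def extract_sequence(genome, chrom, start, end):
    """Return the sequence of a BED interval (0-based start, exclusive end)."""
    return genome[chrom][start:end]


def to_fasta(name, sequence, width=60):
    """Format a sequence as a FASTA record with lines of at most `width` characters."""
    lines = [f">{name}"]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines)


# Hypothetical toy genome with a single short chromosome.
toy_genome = {"chrToy": "ACGTTCACGTGACCGGTTAA"}

seq = extract_sequence(toy_genome, "chrToy", 4, 12)
record = to_fasta("chrToy:4-12", seq)

print(record)
```

The FASTA records produced this way are the kind of input MEME expects in the next step.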

Answer

6.4 Run MEME

Use MEME 4.8.0!

Tips:

  • Use default parameters except for:
    • Search for 2 motifs
    • Width of motifs should be between 6 and 12
    • E-value to stop looking for motifs: 1
    • I certify that I am not using this tool for commercial purposes.: Yes
  • To display additional parameters select Advanced in the “Options Configuration” drop down list.
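For reference, the settings above correspond roughly to the following options of the MEME command-line program. This is a sketch under assumptions: the input file name is hypothetical, and the command actually built by the Galaxy tool may include other options.

```python
def build_meme_args(fasta_path, n_motifs, min_width, max_width, evalue_threshold):
    """Build an argument list for the `meme` command-line program."""
    return [
        "meme", fasta_path,
        "-dna",                        # input sequences are DNA
        "-nmotifs", str(n_motifs),     # number of motifs to search for
        "-minw", str(min_width),       # minimum motif width
        "-maxw", str(max_width),       # maximum motif width
        "-evt", str(evalue_threshold)  # stop when a motif's E-value exceeds this
    ]


# "summits_pm40.fasta" is a hypothetical name for the sequences extracted in 6.3.
meme_args = build_meme_args("summits_pm40.fasta", 2, 6, 12, 1)
print(" ".join(meme_args))
```

The list is only printed here; on a machine with MEME installed it could be passed to `subprocess.run`.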

Answer