Introduction to Ensembl Genome Browser/Biomart
Instructor | Stephanie Le Gras |
---|---|
Duration | 2 hours |
Content | Introduction to Ensembl Genome Browsers/Biomart (Lecture); practical session on basic features of Ensembl Genome Browsers/Biomart (Hands-on) |
Prerequisites | None |
1 Practical session: Ensembl Genome Browser
1.1 Question
- What is the assembly version of the human genome available on the current Ensembl release?
- How many coding genes does the Human genome contain in the current Ensembl release (v103)?
- How many coding genes does the Human genome contain in an earlier Ensembl release (v75)?
1.2 Question
- What is the version of the genome assembly of the female Naked mole-rat (Heterocephalus glaber)?
- How many genes were predicted by Genscan?
1.3 Question
- Find out the size of the human chromosome 4 for assembly hg19 (GRCh37).
1.4 Question
Go back to current genome assembly (GRCh38).
- How many alternative transcripts does the human gene BBS5 have?
- How many exons does transcript BBS5-205 have?
- How many coding exons?
Comment – Go back to the “Gene: BBS5” tab and have a look at the colors of transcripts BBS5-201 and BBS5-202 in the genome browser view
1.5 Question
1.6 Question
1.7 Question
2 Practical session: Biomart
2.1 Question
2.1.1 Question
Using Ensembl/BioMart, retrieve all transcript IDs and the gene ID of the human IDH1 gene. How many transcripts does IDH1 have? Use Ensembl Genes v103, for Human GRCh38.p13.
- Click on Filters:
  - Expand the GENE section
  - Select “Input external references ID list”
  - Select HGNC symbol(s) in the drop-down menu
  - Enter IDH1 in the text box
- Click on Attributes:
  - Select “Features” (top panel, selected by default)
  - Select Gene stable ID, Transcript stable ID, Gene Name
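The same filter and attribute selection can also be written as a BioMart XML query and sent to the BioMart web service. The following is a minimal sketch, not the course's reference solution. It assumes the internal names `hsapiens_gene_ensembl`, `hgnc_symbol`, `ensembl_gene_id`, `ensembl_transcript_id` and `external_gene_name`, which should be checked against the XML shown by the BioMart interface, and it assumes the service URL below. The main site serves the current release, so querying v103 would require the corresponding archive site instead.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
from xml.sax.saxutils import quoteattr

# Assumed endpoint of the public Ensembl BioMart service (current release).
BIOMART_URL = "https://www.ensembl.org/biomart/martservice"


def build_query(dataset, filters, attributes):
    """Return a BioMart XML query that asks for tab-separated output."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        "<!DOCTYPE Query>",
        '<Query virtualSchemaName="default" formatter="TSV" header="1" '
        'uniqueRows="1" datasetConfigVersion="0.6">',
        f'  <Dataset name={quoteattr(dataset)} interface="default">',
    ]
    # One <Filter> per restriction, one <Attribute> per output column.
    for name, value in filters.items():
        lines.append(f"    <Filter name={quoteattr(name)} value={quoteattr(value)}/>")
    for name in attributes:
        lines.append(f"    <Attribute name={quoteattr(name)}/>")
    lines += ["  </Dataset>", "</Query>"]
    return "\n".join(lines)


def run_query(xml):
    """Send the query to BioMart and return the raw text (needs network access)."""
    with urlopen(f"{BIOMART_URL}?{urlencode({'query': xml})}") as response:
        return response.read().decode("utf-8")


# Filter on the HGNC symbol IDH1 and request the three attributes listed above.
idh1_query = build_query(
    dataset="hsapiens_gene_ensembl",
    filters={"hgnc_symbol": "IDH1"},
    attributes=["ensembl_gene_id", "ensembl_transcript_id", "external_gene_name"],
)
print(idh1_query)
```

Calling `run_query(idh1_query)` would return one line per transcript, so counting the data lines gives the number of transcripts for the release being queried.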
2.1.2 Question
Extract all exon sequences of the IDH1 gene in fasta format. Headers will contain the Gene names, Transcript stable IDs and Exon stable IDs.
2.1.3 Question
Extract all coding sequences of the IDH1 gene in fasta format. Headers will contain the Transcript stable IDs and Exon stable IDs.
2.1.4 Question
Retrieve the GO terms associated with the IDH1 gene (select GO Term Name, GO domain and GO Term Accession along with Gene stable ID, Transcript stable ID and Gene Name).
2.1.5 Question
Retrieve the germline variations found in this gene. Annotations to retrieve: Variant Name, Variant Alleles, Minor allele frequency, Chromosome/scaffold name, Chromosome/scaffold position start (bp), Chromosome/scaffold position end (bp) and Variant Consequence, along with Gene stable ID, Transcript stable ID and Gene Name.
2.2 Question
We have run an RNA-seq experiment and extracted the upregulated genes. We are using RNA-seq data from:
Strub, T., Giuliano, S., Ye, T., Bonet, C., Keime, C., Kobi, D., Le Gras, S., Cormont, M., Ballotti, R., and Bertolotto, C. (2011). Essential role of microphthalmia transcription factor for DNA replication, mitosis and genomic stability in melanoma. Oncogene 30, 2319–2332.
In this study, the authors compared the transcriptome of melanoma cell lines treated with an siRNA against MITF or an siRNA against Luciferase (used as a control). The data are given as a TSV (Tab-Separated Values) file which contains the number of reads per gene. Genes are identified by their Ensembl gene IDs.
Data have been analyzed using the Human genome hg38/GRCh38 - Ensembl v84.
Here are the columns of the file to be analyzed:
Column | Description |
---|---|
Gene id | Ensembl gene ID |
siLuc2 | Raw read counts - control - 1st biological replicate |
siLuc3 | Raw read counts - control - 2nd biological replicate |
siMitf3 | Raw read counts - siMITF - 1st biological replicate |
siMitf4 | Raw read counts - siMITF - 2nd biological replicate |
norm.siLuc2 | Normalized read counts - control - 1st biological replicate |
norm.siLuc3 | Normalized read counts - control - 2nd biological replicate |
norm.siMitf3 | Normalized read counts - siMITF - 1st biological replicate |
norm.siMitf4 | Normalized read counts - siMITF - 2nd biological replicate |
… | … |
2.2.1 Question
Download and uncompress the file siMitfvssiLuc.up.txt.zip, then use Ensembl/BioMart to extract gene annotations for the genes it contains. Use the column Gene ID to extract annotations. Annotations to extract are: gene IDs, chromosome, start of gene, end of gene, strand, Associated Gene Name, gene type.
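BioMart accepts a list of IDs pasted into a filter box, so the gene ID column first has to be pulled out of the TSV file. The following is a small sketch of that step. It assumes the header of the ID column is spelled `Gene id`, as in the column table above, and the rows used here are made-up placeholders rather than real data.

```python
import csv
import io


def read_gene_ids(handle, column="Gene id"):
    """Return the unique, non-empty values of `column`, in file order."""
    reader = csv.DictReader(handle, delimiter="\t")
    seen = set()
    gene_ids = []
    for row in reader:
        gene_id = (row.get(column) or "").strip()
        if gene_id and gene_id not in seen:
            seen.add(gene_id)
            gene_ids.append(gene_id)
    return gene_ids


# Made-up rows standing in for the real file (placeholder IDs and counts).
example = io.StringIO(
    "Gene id\tsiLuc2\tsiLuc3\n"
    "ENSG_PLACEHOLDER_1\t10\t12\n"
    "ENSG_PLACEHOLDER_2\t0\t3\n"
    "ENSG_PLACEHOLDER_1\t10\t12\n"
)
ids = read_gene_ids(example)
print("\n".join(ids))  # one ID per line, ready to paste into the filter box
```

With the real file, the same function can be called on `open("siMitfvssiLuc.up.txt", newline="")` once the archive has been uncompressed.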
2.2.2 Question
You want to run a de novo motif discovery on the promoters of all upregulated genes (the ones from the file siMitfvssiLuc.up.txt). Extract the promoter sequences of these genes: retrieve the 200 nt upstream of their transcripts.
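"Upstream" depends on the strand of the transcript. BioMart takes care of this when it exports flanking sequences, so the sketch below is only meant to illustrate the arithmetic. It assumes 1-based inclusive coordinates and the strand encoded as 1 or -1, and the function name and example coordinates are hypothetical.

```python
def upstream_window(start, end, strand, flank=200):
    """Return (window_start, window_end) for the `flank` bases upstream of a feature.

    Coordinates are 1-based and inclusive; strand is 1 (forward) or -1 (reverse).
    """
    if strand == 1:
        # Forward strand: transcription begins at `start`, upstream lies before it.
        return max(1, start - flank), start - 1
    if strand == -1:
        # Reverse strand: transcription begins at `end`, upstream lies after it.
        return end + 1, end + flank
    raise ValueError("strand must be 1 or -1")


# Hypothetical transcript coordinates, chosen only to show the two cases.
forward = upstream_window(1000, 5000, 1)
reverse = upstream_window(1000, 5000, -1)
print(forward)  # (800, 999)
print(reverse)  # (5001, 5200)
```

Both windows are 200 bases long. For a reverse-strand transcript the extracted sequence also has to be reverse-complemented to read in the direction of transcription.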
2.3 Question
2.3.1 Question
How many genes are located in the genomic region 2:208226227-208276270?
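The region string has to be split into a chromosome, a start and an end before it can be entered in the BioMart REGION filters. A minimal sketch of that parsing, assuming the `chromosome:start-end` layout used above with 1-based inclusive coordinates (the function name is illustrative):

```python
def parse_region(region):
    """Split 'chromosome:start-end' into (chromosome, start, end)."""
    chromosome, span = region.split(":")
    start, end = (int(value) for value in span.split("-"))
    if start > end:
        raise ValueError("start must not be greater than end")
    return chromosome, start, end


chromosome, start, end = parse_region("2:208226227-208276270")
length = end - start + 1  # inclusive coordinates, hence the +1
print(chromosome, start, end, length)  # 2 208226227 208276270 50044
```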
2.3.2 Question
Extract the coordinates of all human genes located on chromosomes (exclude scaffolds). Information to extract for each gene: Gene stable ID, Chromosome/scaffold name, Gene Start (bp), Gene End (bp), strand and Gene Name.
2.4 Question
The following is a list of 11 IDs of human proteins from the NCBI RefSeq database:
NP_001218
NP_203125
NP_203124
NP_203126
NP_001007233
NP_150636
NP_150635
NP_001214
NP_150637
NP_150634
NP_150649
Generate a list that shows to which Gene stable IDs and to which Gene names these RefSeq IDs correspond. Do these 11 proteins correspond to 11 genes?
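Once BioMart has returned the table, the last question comes down to counting distinct genes. The sketch below assumes a tab-separated export without a header, with the columns in the order Gene stable ID, Gene name, RefSeq peptide ID. The rows are hypothetical placeholders and are not the answer to the exercise.

```python
from collections import defaultdict


def proteins_per_gene(tsv_text):
    """Map each (gene ID, gene name) pair to the set of protein IDs listed for it."""
    genes = defaultdict(set)
    for line in tsv_text.strip().splitlines():
        gene_id, gene_name, protein_id = line.split("\t")
        genes[(gene_id, gene_name)].add(protein_id)
    return dict(genes)


# Hypothetical export: placeholder identifiers only.
export = (
    "GENE_ID_A\tNAME_A\tNP_PLACEHOLDER_1\n"
    "GENE_ID_A\tNAME_A\tNP_PLACEHOLDER_2\n"
    "GENE_ID_B\tNAME_B\tNP_PLACEHOLDER_3\n"
)
mapping = proteins_per_gene(export)
proteins = set().union(*mapping.values())
print(f"{len(proteins)} proteins map to {len(mapping)} genes")
```

If the number of genes is smaller than the number of proteins, several of the RefSeq IDs are isoforms of the same gene.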
2.5 Question
Forrest et al. performed a microarray analysis of peripheral blood mononuclear cell gene expression in benzene-exposed workers (Environ Health Perspect. 2005 June; 113(6): 801–807). The microarray used was the human Affymetrix U133A/B (also called U133 plus 2) GeneChip. The top 8 up-regulated probe-sets were:
207630_s_at
221840_at
219228_at
204924_at
227613_at
223454_at
228962_at
214696_at
2.5.1 Question
For the genes corresponding to these probe-sets, retrieve the Gene and Transcript stable IDs as well as the Gene names and descriptions.
2.5.2 Question
In order to be able to study these human genes in mouse, identify their mouse orthologues. Also retrieve the genomic coordinates of these orthologues.
Some exercises are taken from Ensembl tutorials.