Introduction to Ensembl Genome Browser/BioMart

Instructor: Stephanie Le Gras
Duration: 2 hours
Content:
  • Introduction to Ensembl Genome Browsers/BioMart (Lecture)
  • Practical session on basic features of Ensembl Genome Browsers/BioMart (Hands-on)
Prerequisites: None

1.1 Question

  • What is the assembly version of the human genome available on the current Ensembl release?

Answer

  • How many coding genes does the Human genome contain in the current Ensembl release (v103)?

Answer

  • How many coding genes does the Human genome contain in a previous Ensembl release (v75)?

Answer

1.2 Question

  • What is the version of the genome assembly of the Naked mole-rat (female)? (Heterocephalus glaber)

Answer

  • How many genes were predicted by Genscan?

Answer

1.3 Question

  • Find out the size of the human chromosome 4 for assembly hg19 (GRCh37).

Answer

1.4 Question

Go back to current genome assembly (GRCh38).

  • How many alternative transcripts does the human gene BBS5 have?

Answer

  • How many exons does transcript BBS5-205 have?

Answer

  • How many coding exons?

Answer

Comment – Go back to the “Gene: BBS5” tab and have a look at the colors of transcripts BBS5-201 and BBS5-202 in the genome browser view

1.5 Question

  • Retrieve this page

Answer

1.6 Question

  • Retrieve this page

Answer

1.7 Question

Display RNAseq data for breast and skeletal muscle in the genome browser (in BRCA1 gene region).

Answer

2.1 Question

2.1.1 Question

Using Ensembl/BioMart, retrieve all transcript IDs and the gene ID of the human IDH1 gene. How many transcripts does IDH1 have? Use Ensembl Genes v103, for Human GRCh38.p13.

  • Click on Filters:
    • Expand the GENE section
    • Select “Input external references ID list”
    • Select HGNC symbol(s) in the drop-down menu
    • Enter IDH1 in the text box
  • Click on Attributes:
    • Select “Features” (top panel, selected by default)
    • Select Gene stable ID, Transcript stable ID, Gene Name
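
The same filter and attribute choices can also be written as a BioMart XML query. The following is a minimal sketch, not the official procedure: the dataset name (hsapiens_gene_ensembl) and the internal filter and attribute names (hgnc_symbol, ensembl_gene_id, ensembl_transcript_id, external_gene_name) are assumptions about how the labels above are named internally, and the helper build_biomart_query is hypothetical. The “XML” button in the BioMart interface shows the query that the site itself generates and can be used to check these names.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, filters, attributes):
    """Return a BioMart-style XML query string for a single dataset.

    filters: dict mapping internal filter name -> value
    attributes: list of internal attribute names (output columns)
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",   # tab-separated output
        "header": "1",        # include column names
        "uniqueRows": "1",    # drop duplicated rows
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# Assumed internal names for: HGNC symbol filter, Gene stable ID,
# Transcript stable ID, Gene name.
xml_query = build_biomart_query(
    "hsapiens_gene_ensembl",
    {"hgnc_symbol": "IDH1"},
    ["ensembl_gene_id", "ensembl_transcript_id", "external_gene_name"],
)
print(xml_query)
```

The sketch only builds the query text; it does not contact Ensembl. Counting the distinct Transcript stable IDs in the returned table gives the number of transcripts.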

Answer

2.1.2 Question

Extract all exon sequences of the IDH1 gene in fasta format. Headers will contain the Gene names, Transcript stable IDs and Exon stable IDs.

Answer

2.1.3 Question

Extract all coding sequences of the IDH1 gene in fasta format. Headers will contain the Transcript stable IDs and Exon stable IDs.

Answer

2.1.4 Question

Retrieve the GO terms associated with the IDH1 gene (select GO Term Name, GO domain and GO Term Accession along with Gene stable ID, Transcript stable ID and Gene Name).

Answer

2.1.5 Question

Retrieve the germline variations found in this gene, with the following annotations: Variant Name, Variant Alleles, Minor allele frequency, Chromosome/scaffold name, Chromosome/scaffold position start (bp), Chromosome/scaffold position end (bp) and Variant Consequence, along with Gene stable ID, Transcript stable ID and Gene Name.

Answer

2.2 Question

We have run an RNA-seq experiment and extracted the upregulated genes. We are using RNA-seq data from:

Strub, T., Giuliano, S., Ye, T., Bonet, C., Keime, C., Kobi, D., Le Gras, S., Cormont, M., Ballotti, R., and Bertolotto, C. (2011). Essential role of microphthalmia transcription factor for DNA replication, mitosis and genomic stability in melanoma. Oncogene 30, 2319–2332.

In this study, the authors compared the transcriptomes of melanoma cells treated with an siRNA against MITF or with an siRNA against Luciferase (used as a control). The data are given as a TSV (Tab-Separated Values) file which contains the number of reads per gene. Genes are identified by their Ensembl gene IDs.

Data have been analyzed using the Human genome hg38/GRCh38 - Ensembl v84.

Here are the columns of the file to be analyzed:

Column        Description
Gene id       Ensembl gene ID
siLuc2        Raw read counts - control - 1st biological replicate
siLuc3        Raw read counts - control - 2nd biological replicate
siMitf3       Raw read counts - siMITF - 1st biological replicate
siMitf4       Raw read counts - siMITF - 2nd biological replicate
norm.siLuc2   Normalized read counts - control - 1st biological replicate
norm.siLuc3   Normalized read counts - control - 2nd biological replicate
norm.siMitf3  Normalized read counts - siMITF - 1st biological replicate
norm.siMitf4  Normalized read counts - siMITF - 2nd biological replicate
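
Only the first column is needed as input for BioMart. A minimal sketch of pulling the gene IDs out of such a file is shown here, assuming the header of the first column is spelled exactly “Gene id” (adjust id_column if the real file differs). The rows below are toy placeholders, not values from the actual data, and read_gene_ids is a hypothetical helper.

```python
import csv
import io


def read_gene_ids(handle, id_column="Gene id"):
    """Return the non-empty values of the gene ID column of a TSV file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row[id_column] for row in reader if row[id_column]]


# Toy content with invented IDs and counts, only to show the layout.
toy_tsv = (
    "Gene id\tsiLuc2\tsiLuc3\tsiMitf3\tsiMitf4\n"
    "ENSG_TOY_1\t10\t12\t40\t38\n"
    "ENSG_TOY_2\t0\t1\t9\t11\n"
)
gene_ids = read_gene_ids(io.StringIO(toy_tsv))

# One ID per line, ready to paste into a BioMart ID list text box.
print("\n".join(gene_ids))
```

With the real file, the io.StringIO object would be replaced by an opened file handle, for example open("siMitfvssiLuc.up.txt", newline="").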

2.2.1 Question

Download and uncompress the file siMitfvssiLuc.up.txt.zip, then use Ensembl/BioMart to extract annotations for the genes it lists. Use the Gene id column as input. The annotations to extract are: gene ID, chromosome, gene start, gene end, strand, associated gene name and gene type.

Answer

2.2.2 Question

You want to run a de novo motif discovery on the promoters of all upregulated genes (the ones from the file siMitfvssiLuc.up.txt). Extract the promoter sequences of these genes: retrieve the 200 nt upstream of each of their transcripts.
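
What “upstream” means depends on the strand, which is easy to get wrong when checking the result by hand. The sketch below illustrates the arithmetic, assuming 1-based inclusive coordinates and strand coded as 1 or -1; it is not how BioMart computes flanks internally, upstream_window is a hypothetical helper, and the coordinates are invented.

```python
def upstream_window(start, end, strand, flank=200):
    """Return (start, end) of the `flank` bases upstream of a transcript.

    Coordinates are 1-based and inclusive. On the forward strand the
    upstream region lies before `start`; on the reverse strand it lies
    after `end`. The chromosome end is not checked here.
    """
    if strand == 1:
        return max(1, start - flank), start - 1
    if strand == -1:
        return end + 1, end + flank
    raise ValueError("strand must be 1 or -1")


# Invented transcript spanning positions 1000-5000.
fwd = upstream_window(1000, 5000, 1)
rev = upstream_window(1000, 5000, -1)
print(fwd, rev)
```

For the reverse strand, the extracted sequence would also need to be reverse-complemented to read in the direction of transcription.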

Answer

2.3 Question

2.3.1 Question

How many genes are located in the genomic region 2:208226227-208276270?
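
A region written as chromosome:start-end can be split into its three parts before being entered in the BioMart region filter. The following sketch parses the region and tests whether a gene overlaps it, assuming 1-based inclusive coordinates and assuming that any overlap counts as “located in” the region; parse_region and overlaps are hypothetical helpers, and the gene coordinates in the checks are invented.

```python
def parse_region(region):
    """Split 'chrom:start-end' into (chrom, start, end) with integer coordinates."""
    chrom, span = region.split(":")
    start, end = (int(x) for x in span.split("-"))
    if start > end:
        raise ValueError("region start must not exceed region end")
    return chrom, start, end


def overlaps(region, gene_start, gene_end):
    """True if a gene on the same chromosome shares at least one base with the region."""
    _, start, end = region
    return gene_start <= end and gene_end >= start


region = parse_region("2:208226227-208276270")
length = region[2] - region[1] + 1  # inclusive length in bp
print(region, length)
```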

Answer

2.3.2 Question

Extract the coordinates of all human genes located on chromosomes (exclude scaffolds). Information to extract for each gene: Gene stable ID, Chromosome/scaffold name, Gene Start (bp), Gene End (bp), strand and Gene Name.

Answer

2.4 Question

The following is a list of 11 IDs of human proteins from the NCBI RefSeq database:

NP_001218
NP_203125
NP_203124
NP_203126
NP_001007233
NP_150636
NP_150635
NP_001214
NP_150637
NP_150634
NP_150649

Generate a list that shows to which Gene stable IDs and to which Gene names these RefSeq IDs correspond. Do these 11 proteins correspond to 11 genes?
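
Once the table has been exported, the last question comes down to counting distinct genes rather than rows. Here is a minimal sketch of that grouping step, assuming each exported row holds a Gene stable ID, a Gene name and a RefSeq peptide ID in that order. The rows are placeholders and deliberately not real identifiers, and group_by_gene is a hypothetical helper.

```python
from collections import defaultdict


def group_by_gene(rows):
    """Group RefSeq protein IDs by (gene ID, gene name)."""
    genes = defaultdict(set)
    for gene_id, gene_name, refseq_id in rows:
        genes[(gene_id, gene_name)].add(refseq_id)
    return dict(genes)


# Placeholder rows: two proteins mapping to one gene, one protein to another.
toy_rows = [
    ("GENE_ID_A", "NAME_A", "NP_TOY_1"),
    ("GENE_ID_A", "NAME_A", "NP_TOY_2"),
    ("GENE_ID_B", "NAME_B", "NP_TOY_3"),
]
grouped = group_by_gene(toy_rows)
for (gene_id, gene_name), proteins in grouped.items():
    print(gene_id, gene_name, len(proteins))
```

If the number of groups is smaller than the number of input IDs, several proteins come from the same gene.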

Answer

2.5 Question

Forrest et al. performed a microarray analysis of peripheral blood mononuclear cell gene expression in benzene-exposed workers (Environ Health Perspect. 2005 June; 113(6): 801–807). The microarray used was the human Affymetrix U133A/B (also called U133 plus 2) GeneChip. The top 8 up-regulated probe-sets were:

207630_s_at
221840_at
219228_at
204924_at
227613_at
223454_at
228962_at
214696_at

2.5.1 Question

For the genes corresponding to these probe-sets, retrieve the Gene and Transcript stable IDs as well as the Gene names and descriptions.

Answer

2.5.2 Question

In order to be able to study these human genes in mouse, identify their mouse orthologues. Also retrieve the genomic coordinates of these orthologues.

Answer

Some exercises are taken from Ensembl tutorials.