Blog

How to extract promoters positions

Introduction

In this post, I will show how easy it is to extract the genomic positions of all promoters for a specific genome build.

For this demo, you will need the TxDb.Hsapiens.UCSC.hg19.knownGene package:

library(GenomicFeatures)  # Bioconductor: provides genes() for TxDb objects
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
# To avoid typing the whole package name every time, we store the TxDb object in the variable txdb
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

TL;DR

promoters(genes(txdb), upstream = 1500, downstream = 500)...
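To make the `upstream`/`downstream` arithmetic concrete, here is a minimal base-R sketch of the strand-aware rule that `promoters()` applies. On the + strand the window runs from TSS - upstream to TSS + downstream - 1. On the - strand the TSS is the gene end and the window is mirrored. The helper `promoter_bounds()` and the toy genes are hypothetical and made up for illustration. On real data, use `promoters()` itself.

```r
# Hypothetical helper: mimics the strand-aware arithmetic of
# GenomicRanges::promoters() on plain vectors (1-based, closed intervals).
promoter_bounds <- function(start, end, strand, upstream = 1500, downstream = 500) {
  plus <- strand != "-"                 # "+" and "*" are handled like "+"
  tss <- ifelse(plus, start, end)       # TSS is the 5' end of the feature
  data.frame(
    start  = ifelse(plus, tss - upstream, tss - downstream + 1),
    end    = ifelse(plus, tss + downstream - 1, tss + upstream),
    strand = strand,
    stringsAsFactors = FALSE
  )
}

# Two made-up genes, one on each strand
genes_toy <- data.frame(
  start  = c(10000, 50000),
  end    = c(20000, 60000),
  strand = c("+", "-"),
  stringsAsFactors = FALSE
)

prom <- with(genes_toy, promoter_bounds(start, end, strand))
print(prom)
```

Each window is `upstream + downstream` = 2000 bp wide regardless of strand. Note that `promoters()` does not clip windows that run past a chromosome end. If that matters, `trim()` from GenomicRanges can be applied to the result when the sequence lengths are known.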
Read more about How to extract promoters positions

MySQL vs MongoDB

Introduction

Relational databases held the lead for decades, and the choice was fairly obvious: MySQL, Oracle, or MS SQL, to name a few. They have served as the basis for countless enterprise applications, but modern apps often require more flexible data models and greater scalability. Non-relational databases such as MongoDB emerged to meet these requirements and, in some cases, to replace existing relational setups.

This post originally appeared on the DA-14 website. Read the ...

Read more about MySQL vs MongoDB

The Human Epigenome Roadmap

Aside from the occasional somatic mutation, the genome of every cell in an individual’s body is largely preserved. Yet different types of cells (and tissues, and organs) are incredibly diverse. The majority of that specialization is governed by epigenetic changes — histone modifications, DNA accessibility, and methylation — that influence when and how genes are expressed.

Our knowledge of the epigenome has lagged well behind our knowledge of the genome, partly because it’s been difficult to study. The application of next-gen sequencing to RNA libraries (RNA-Seq),...

Read more about The Human Epigenome Roadmap

MSigDB relate TFs to Genes

collection: Motif gene sets

Gene sets representing potential targets of regulation by transcription factors or microRNAs. The sets consist of genes grouped by short sequence motifs they share in their non-protein coding regions. The motifs represent known or likely cis-regulatory elements in promoters and 3'-UTRs. These gene sets make it possible to link changes in an expression profiling experiment to a putative cis-regulatory element. The C3 collection is divided into two sub-collections: microRNA targets (MIR) and transcription factor targets (TFT). 
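MSigDB gene sets can be downloaded as GMT files. Each line holds a gene set name, a description, and then the member genes, all tab-separated. The base-R sketch below parses such lines and inverts them, so each gene maps to the motif gene sets that contain it. This is one way to start relating regulators to genes. The set and gene names are hypothetical placeholders. For real data, you would pass `readLines()` on your downloaded .gmt file instead of the inline vector. Mapping a motif set to a specific TF is a separate step and is not shown here.

```r
# Hypothetical GMT lines: set name, description, then member genes (tab-separated)
gmt_lines <- c(
  "MOTIF_SET_A\tna\tGENE1\tGENE2\tGENE3",
  "MOTIF_SET_B\tna\tGENE2\tGENE4"
)

# Parse GMT lines into a named list: gene set -> character vector of genes
parse_gmt <- function(lines) {
  fields <- strsplit(lines, "\t", fixed = TRUE)
  sets <- lapply(fields, function(f) f[-(1:2)])      # drop name and description
  names(sets) <- vapply(fields, `[`, character(1), 1) # first field is the set name
  sets
}

# Invert the mapping: gene -> gene sets (motifs) that contain it
invert_sets <- function(sets) {
  pairs <- data.frame(
    set  = rep(names(sets), lengths(sets)),
    gene = unlist(sets, use.names = FALSE),
    stringsAsFactors = FALSE
  )
  split(pairs$set, pairs$gene)
}

sets <- parse_gmt(gmt_lines)
by_gene <- invert_sets(sets)
print(by_gene[["GENE2"]])
```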
...

Read more about MSigDB relate TFs to Genes

dbNSFP

Database type:          variant
Number of records:      89,617,785
Distinct variants:      84,484,850
Reference genome hg18:  chr, hg18_pos, ref, alt
Reference genome hg19:  chr, pos, ref, alt

Field:                  chr
Type:                   string
Comment:                Chromosome number
Missing entries:        0 
Unique Entries:         24

Field:                  pos
Type:                   integer
Comment:                physical position on the chromosome as to hg19
                        (1-based coordinate)
Missing entries:        0 
Unique Entries:         28,060,014
Range...
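Because `pos` is a 1-based coordinate on hg19, it needs a small conversion before dbNSFP variants can be combined with 0-based, half-open formats such as BED. The base-R sketch below shows that conversion. The rows are made up. Two further assumptions are loudly flagged: that the `chr` field holds bare names like "1" or "X", and that UCSC-style "chr" prefixes are wanted, for example to match hg19 UCSC annotations. The helper `to_bed()` is hypothetical.

```r
# Made-up rows using dbNSFP's hg19 key fields (chr, pos, ref, alt)
variants <- data.frame(
  chr = c("1", "X"),
  pos = c(1000L, 2500L),
  ref = c("A", "C"),
  alt = c("G", "T"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: 1-based closed positions -> 0-based half-open BED intervals.
# Assumption: chr holds bare names ("1", "X") and UCSC-style "chr1" names are wanted.
to_bed <- function(v, add_chr_prefix = TRUE) {
  chrom <- if (add_chr_prefix) paste0("chr", v$chr) else v$chr
  data.frame(
    chrom      = chrom,
    chromStart = v$pos - 1L,                 # BED start is 0-based
    chromEnd   = v$pos - 1L + nchar(v$ref),  # half-open end; equals pos for one base
    name       = paste(v$ref, v$alt, sep = ">"),
    stringsAsFactors = FALSE
  )
}

bed <- to_bed(variants)
print(bed)
```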
Read more about dbNSFP

C4A

C4A is part of the complement system. The name reflects its role in "complementing" antibodies and immune cells in killing bacteria, which contributes to immune defense. However, excessive complement activation can cause tissue damage and trigger allergic-type reactions. C4A also helps activate other complement proteins further down the cascade. The C3a, C4a, and C5a components are referred to as anaphylatoxins: they cause smooth muscle contraction, histamine release from mast cells, and increased vascular permeability. They also mediate inflammation and the generation of free ...

Read more about C4A