Blog

MySQL vs MongoDB

Introduction

Relational databases led the field for decades, and during that time the choice was fairly obvious: MySQL, Oracle, or MS SQL, to name a few. They have served as the basis for countless enterprise applications, but modern apps demand more diversity and scalability. Non-relational databases such as MongoDB have emerged to meet these requirements and to replace existing relational environments.

This post originally appeared on the DA-14 website. Read the ...

Read more about MySQL vs MongoDB

The Human Epigenome Roadmap

Aside from the occasional somatic mutation, the genome of every cell in an individual’s body is largely preserved. Yet different types of cells (and tissues, and organs) are incredibly diverse. The majority of that specialization is governed by epigenetic changes — histone modifications, DNA accessibility, and methylation — that influence when and how genes are expressed.

Our knowledge of the epigenome has lagged well behind our knowledge of the genome, partly because it’s been difficult to study. The application of next-gen sequencing to RNA libraries (RNA-Seq),...

Read more about The Human Epigenome Roadmap

MSigDB relate TFs to Genes

collection: Motif gene sets

Gene sets representing potential targets of regulation by transcription factors or microRNAs. The sets consist of genes grouped by short sequence motifs they share in their non-protein coding regions. The motifs represent known or likely cis-regulatory elements in promoters and 3'-UTRs. These gene sets make it possible to link changes in an expression profiling experiment to a putative cis-regulatory element. The C3 collection is divided into two sub-collections: microRNA targets (MIR) and transcription factor targets (TFT). 
... Read more about MSigDB relate TFs to Genes
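
As a rough illustration of how a motif gene set can be linked to an expression experiment, the sketch below tests whether a list of changed genes overlaps a gene set more than chance would suggest, using a hypergeometric tail probability. The gene names, set names, background size, and the `enrich` helper are all hypothetical; this is not MSigDB's own tooling, only one common way to frame the comparison.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when N genes are drawn from M, of which n belong to the set."""
    upper = min(n, N)
    total = comb(M, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / total


def enrich(changed_genes, gene_sets, background_size):
    """Rank gene sets by how strongly they overlap the changed genes."""
    changed = set(changed_genes)
    results = []
    for name, members in gene_sets.items():
        members = set(members)
        overlap = changed & members
        p = hypergeom_sf(len(overlap), background_size, len(members), len(changed))
        results.append((name, sorted(overlap), p))
    # Smallest p-value first: the most over-represented set leads the list.
    return sorted(results, key=lambda r: r[2])


# Hypothetical toy data, not real MSigDB sets.
gene_sets = {
    "MOTIF_A": ["G1", "G2", "G3", "G4"],
    "MOTIF_B": ["G5", "G6", "G7", "G8"],
}
changed = ["G1", "G2", "G3", "G9"]
ranked = enrich(changed, gene_sets, background_size=100)
```

A real analysis would also correct for testing many gene sets at once; that step is left out here to keep the sketch short.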

dbNSFP

Database type:          variant
Number of records:      89,617,785
Distinct variants:      84,484,850
Reference genome hg18:  chr, hg18_pos, ref, alt
Reference genome hg19:  chr, pos, ref, alt

Field:                  chr
Type:                   string
Comment:                Chromosome number
Missing entries:        0 
Unique Entries:         24

Field:                  pos
Type:                   integer
Comment:                Physical position on the chromosome according to hg19
                        (1-based coordinate)
Missing entries:        0 
Unique Entries:...
Read more about dbNSFP
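
The field listing above suggests that an hg19 variant is identified by the combination chr, pos, ref, alt, with chr stored as a string and pos as a 1-based integer. A minimal sketch of reading such a key from one line of text follows. The tab delimiter, the column order, the `VariantKey` and `parse_variant_key` names, and the example values are assumptions made for illustration, not details taken from the dbNSFP files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantKey:
    chrom: str  # "chr" field: string
    pos: int    # "pos" field: integer, 1-based hg19 coordinate
    ref: str    # reference allele
    alt: str    # alternative allele


def parse_variant_key(line, columns=("chr", "pos", "ref", "alt")):
    """Parse one tab-separated line into a VariantKey (assumed layout)."""
    values = line.rstrip("\n").split("\t")
    row = dict(zip(columns, values))
    pos = int(row["pos"])
    if pos < 1:
        raise ValueError("pos is a 1-based coordinate and must be >= 1")
    return VariantKey(row["chr"], pos, row["ref"], row["alt"])


# Hypothetical example line, not a record copied from the database.
key = parse_variant_key("1\t69091\tA\tC")
```

Keeping chr as a string rather than an integer lets the same type hold non-numeric chromosome names such as X and Y.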

C4A

C4A is part of the “complement” group of proteins. Complement proteins help kill bacteria and contribute to immune defenses. However, too much complement can cause tissue damage and trigger an allergic reaction. C4A is an activation protein, which means it also raises the levels of the other complement proteins. The C3a, C4a, and C5a components are referred to as anaphylatoxins: they cause smooth muscle contraction, histamine release from mast cells, and enhanced vessel permeability. They also mediate inflammation and the generation of free ...

Read more about C4A

Protein function predictions

For human mutations that are predicted to result in an amino acid substitution we provide SIFT and PolyPhen predictions for the effect of this substitution on protein function. We compute the predictions for each of these tools for all possible single amino acid substitutions in the Ensembl human proteome. This means we can provide predictions for novel mutations for VEP and API users. We were able to compute predictions from at least one tool for over 95% of the human proteins in Ensembl. SIFT predictions are also available for chicken, cow, dog, horse, mouse, pig, rat...

Read more about Protein function predictions

Variant Consequences

For each variant that is mapped to the reference genome, we identify each Ensembl transcript that overlaps the variant. We then use a rule-based approach to predict the effects that each allele of the variant may have on the transcript. The set of consequence terms, defined by the Sequence Ontology (SO), that can currently be assigned to each combination of an allele and a transcript is shown in the table below. Note that each allele of each variant may have a different...

Read more about Variant Consequences
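
The two steps described in this excerpt, finding overlapping transcripts and then applying rules per allele and transcript, can be sketched in a few lines. This is only a toy illustration: the `Transcript` type, the coordinates, and the `toy_exonic` / `toy_non_exonic` labels are hypothetical placeholders, not Sequence Ontology terms and not Ensembl's actual rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    name: str
    start: int    # inclusive, 1-based
    end: int      # inclusive
    exons: tuple  # tuple of (start, end) pairs, inclusive


def overlaps(a_start, a_end, b_start, b_end):
    """True when two inclusive intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def predict(var_start, var_end, alt_alleles, transcripts):
    """Return one (allele, transcript, label) call per overlapping pair."""
    calls = []
    for tx in transcripts:
        # Step 1: keep only transcripts that overlap the variant.
        if not overlaps(var_start, var_end, tx.start, tx.end):
            continue
        # Step 2: a single toy rule, standing in for the real rule set.
        in_exon = any(overlaps(var_start, var_end, s, e) for s, e in tx.exons)
        for allele in alt_alleles:
            label = "toy_exonic" if in_exon else "toy_non_exonic"
            calls.append((allele, tx.name, label))
    return calls


# Hypothetical transcripts and variant.
transcripts = [
    Transcript("tx1", 100, 500, ((100, 200), (400, 500))),
    Transcript("tx2", 150, 300, ((150, 300),)),
    Transcript("tx3", 1000, 2000, ((1000, 2000),)),
]
calls = predict(250, 250, ["A", "T"], transcripts)
```

In this toy rule the label depends only on position, so both alleles receive the same call; rules that inspect the allele sequence itself are what allow each allele to receive a different result.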