What is bioinformatics?
In biology, bioinformatics is defined as “the use of computers to store, retrieve, analyze or predict the composition or structure of biomolecules”. Bioinformatics is the application of computational techniques and information technology to the organization and management of biological data. Classical bioinformatics deals primarily with sequence analysis.
Aims of bioinformatics
- Development of database containing all biological information.
- Development of better tools for data designing, annotation and mining.
- Design and development of drugs by using simulation software.
- Design and development of software tools for protein structure prediction, function annotation and docking analysis.
- Creation and development of software to improve tools for analyzing sequences for their function and similarity with other sequences.
Biological databases
Biological data are complex, exception-ridden, vast, and incomplete. Several databases have therefore been created and curated to keep results unambiguous. A biological database is a collection of biological data arranged in a computer-readable form that speeds up search and retrieval and is convenient to use. A good database must be kept up to date.
Importance of biological database
A range of information such as biological sequences, structures, binding sites, metabolic interactions, molecular action, functional relationships, protein families, motifs and homologs can be retrieved from biological databases. The main purpose of a biological database is to store and manage biological data and information in computer-readable form.
In this course we cover the different biological databases used in bioinformatics and look at each of them in some detail. These databases fall into four main categories, which we go through one by one. We also explain the difference between primary and secondary databases and how each is used in bioinformatics.
This course will be especially helpful to students of data analysis and to bioinformaticians, because they use these databases a lot in their work.
If you have any questions or suggestions, please let me know through the instructor inbox. I’ll try to answer all of your questions within 12 hours.
Primary Databases
-
1. Introduction of Biological Databases
Biological databases are libraries of life sciences information, collected from scientific experiments, published literature, high-throughput experiment technology, and computational analysis. They contain information from research areas including genomics, proteomics, metabolomics, microarray gene expression, and phylogenetics.
Information contained in biological databases includes gene function, structure, localization (both cellular and chromosomal), clinical effects of mutations as well as similarities of biological sequences and structures.
Biological databases can be broadly classified into sequence, structure and functional databases. Nucleic acid and protein sequences are stored in sequence databases, while structure databases store solved structures of RNA and proteins. Functional databases provide information on the physiological role of gene products, for example enzyme activities, mutant phenotypes, or biological pathways.
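To make the idea of a sequence database entry more concrete, here is a minimal sketch of reading sequence records in FASTA format, a plain-text format commonly used to exchange nucleotide and protein sequences. The function name `parse_fasta` and the two example records are hypothetical and only for illustration; real tools handle many more edge cases.

```python
from typing import Dict, Iterable, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each record identifier (first word after '>') to its sequence."""
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: the identifier is the first whitespace-separated word.
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence line: append to the record currently being read.
            records[current_id] += line
    return records


# Hypothetical records, not taken from any real database.
example = """>seq1 hypothetical example record
ATGGCC
ATTGTA
>seq2 another hypothetical record
GGGTTT
"""

records = parse_fasta(example.splitlines())
print(records)
```

Each header line starts a new record, and the lines that follow are joined into one sequence string.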
-
2. Types of Biological Databases
There are four main types of biological databases:
Primary databases
Secondary databases
Derived databases
Literature databases
-
3. Difference Between Primary and Secondary Databases
Primary databases
In bioinformatics, and indeed in other data intensive research fields, databases are often categorised as primary or secondary. Primary databases are populated with experimentally derived data such as nucleotide sequence, protein sequence or macromolecular structure. Experimental results are submitted directly into the database by researchers, and the data are essentially archival in nature. Once given a database accession number, the data in primary databases are never changed: they form part of the scientific record.
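As a small illustration of working with accession numbers, identifiers from nucleotide sequence databases are often written with a version suffix, as in `accession.version`, so that one specific state of a record can be cited. The sketch below splits such an identifier. The helper `split_accession` and the type `VersionedAccession` are hypothetical names, and the identifier is used purely as an illustrative input.

```python
from typing import NamedTuple, Optional


class VersionedAccession(NamedTuple):
    accession: str          # stable part of the identifier
    version: Optional[int]  # version suffix, or None if absent


def split_accession(identifier: str) -> VersionedAccession:
    """Split 'ACCESSION.VERSION' into its two parts (assumed format)."""
    cleaned = identifier.strip()
    base, dot, version = cleaned.partition(".")
    if dot and version.isdigit():
        return VersionedAccession(base, int(version))
    # No usable version suffix: keep the identifier as given.
    return VersionedAccession(cleaned, None)


print(split_accession("U49845.1"))
print(split_accession("U49845"))
```

Keeping the accession and the version apart makes it easy to group all versions of one record under the same stable key.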
Secondary databases
By contrast, secondary databases comprise data derived from the results of analysing primary data. They are often referred to as curated databases but this is a bit of a misnomer because primary databases are also curated to ensure that the data in them is consistent and accurate.
Secondary databases often draw upon information from numerous sources, including other databases (primary and secondary), controlled vocabularies and the scientific literature. They are highly curated, often using a complex combination of computational algorithms and manual analysis and interpretation to derive new knowledge from the public record of science.
Secondary databases have become the molecular biologist’s reference library over the past decade or so, providing a wealth of (often daunting) information on just about any gene or gene product that has been investigated by the research community. The potential for mining this information to make new discoveries is vast. It’s our job in this course to reduce your activation energy to make more of these resources for your research.
Secondary Databases
-
4. Primary Databases
The International Nucleotide Sequence Database Collaboration (INSDC) consists of the following databases:
DNA Data Bank of Japan (DDBJ, National Institute of Genetics)
EMBL (European Bioinformatics Institute)
GenBank (National Center for Biotechnology Information)
A few popular databases are GenBank from NCBI (National Center for Biotechnology Information) and UniProt, which is maintained by the UniProt Consortium (EMBL-EBI, the Swiss Institute of Bioinformatics and the Protein Information Resource).
-
5. Explaining Primary Databases
Explained the different primary databases
GenBank
EMBL
UniProt
DDBJ
-
6. Explaining Primary Databases 2nd Part
Explained primary databases
ArrayExpress and BioStudies
GEO
-
7. Explaining Primary Databases Last Part
Explained Secondary Databases
REACTOME
Broad Institute of Harvard and MIT
Genome Analysis Toolkit - Broad Institute
Firebrowse
KEGG: Kyoto Encyclopedia of Genes and Genomes
TCGA Computational Tools - National Cancer Institute
Literature Databases
-
8. Introduction of Secondary Databases
Secondary databases make use of publicly available sequence data in primary databases to provide layers of information on top of DNA or protein sequence data. ... Secondary databases comprise data derived from analysing entries in primary databases.
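As a toy sketch of "deriving" data from primary entries, the snippet below computes a simple annotation layer (sequence length and GC content) for a few sequences. The record identifiers and sequences are hypothetical, and real secondary databases rely on far richer analysis and manual curation; this only illustrates the direction of the data flow.

```python
from typing import Dict


def gc_content(sequence: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence (0.0 if empty)."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# Hypothetical stand-ins for entries taken from a primary database.
primary_records: Dict[str, str] = {
    "rec1": "ATGC",
    "rec2": "GGCC",
    "rec3": "ATAT",
}

# Derived layer: one small annotation record per primary entry.
derived = {
    record_id: {"length": len(seq), "gc": gc_content(seq)}
    for record_id, seq in primary_records.items()
}

for record_id, annotation in derived.items():
    print(record_id, annotation)
```

The primary sequences are left untouched; the derived values live in a separate structure keyed by the same identifiers.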
-
9. Explaining Secondary Databases
Explained Secondary Databases
1000 Genomes Browser
BLAST
ClinVar
-
10. Explaining Secondary Databases 2nd Part
Explained Secondary Databases
dbSNP
dbVar
dbGaP
-
11. Explaining Secondary Databases 3rd Part
Explained Secondary Databases
Gene
Genome
MedGen
-
12. Explaining Secondary Databases 4th Part
Explained secondary databases
NCBI Develop
E-Utilities API for NCBI
ENCODE
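For the E-Utilities API listed above, here is a minimal sketch of how a request URL for the `efetch` endpoint might be put together. The helper name `efetch_url`, the default parameter values and the example accession are assumptions chosen for illustration; the code only builds the URL and does not send a request, so check the current NCBI documentation for usage limits and API keys before calling the service.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_url(db: str, record_id: str,
               rettype: str = "fasta", retmode: str = "text") -> str:
    """Build an efetch URL for one record (no network access happens here)."""
    params = {
        "db": db,            # target database, e.g. "nuccore"
        "id": record_id,     # accession or UID of the record
        "rettype": rettype,  # requested record type
        "retmode": retmode,  # requested output mode
    }
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


# Illustrative accession; any valid identifier could be used instead.
url = efetch_url("nuccore", "U49845")
print(url)
```

The resulting URL could then be opened with any HTTP client, for example `urllib.request.urlopen`.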