This Bioinformatics course is going to be a game changer for you. There is currently an explosion of biological data, and bioinformatics sits at the intersection of biology and computer science.
What is Bioinformatics?
In biology, bioinformatics is defined as “the use of computers to store, retrieve, analyze, or predict the composition or structure of biomolecules”. Bioinformatics is the application of computational techniques and information technology to the organization and management of biological data. Classical bioinformatics deals primarily with sequence analysis.
Aims of Bioinformatics
- Development of databases containing all biological information.
- Development of better tools for data design, annotation, and mining.
- Design and development of drugs by using simulation software.
- Design and development of software tools for protein structure prediction, function annotation, and docking analysis.
- Creation and development of software to improve tools for analyzing sequences for their function and similarity with other sequences.
You will be using the online DNA and protein sequence databases that are the core of bioinformatics. There are two general types of sequence databases. Primary databases contain experimental results in an accessible format; their entries are individually submitted sequences rather than a population consensus. DDBJ, EMBL, and GenBank are primary databases. Secondary databases are curated to reflect consensus sequences from multiple experiments and usually use the primary databases as their sources.
Abbreviations
DDBJ – DNA Data Bank of Japan
EMBL – European Molecular Biology Laboratory
NCBI – National Center for Biotechnology Information
BLAST – Basic Local Alignment Search Tool
ClustalW2 – a tool used for multiple sequence alignment (MSA)
This course will be extremely helpful to data analysis students and bioinformaticians, because they use these databases a lot in their work.
If you have any questions or suggestions, please let me know through the instructor inbox. I’ll try to answer all of your questions within 12 hours.
Introduction to Bioinformatics Databases
-
1. What is Bioinformatics?
What is Bioinformatics?
Bioinformatics is the acquisition, storage, arrangement, identification, analysis, and communication of information related to biology. Think of it as the “theoretical” branch of molecular biology, much like the relationship of theoretical physics to the general field of physics.
The term bioinformatics was coined in 1970 by Paulien Hogeweg and Ben Hesper to describe "the study of informatic processes in biotic systems". It came into wide use around 1990 with the use of computers in DNA sequence analysis, when biological sequence data began to be shared.
-
2. History and Application of Bioinformatics
History of Bioinformatics (Emerged as a scientific discipline)
1962:
Zuckerkandl and Pauling proposed that the variability between protein sequences could be used to study evolution.
1965-1978:
Margaret O. Dayhoff at the NBRF compiled the "Atlas of Protein Sequence and Structure".
1980-1984:
The EMBL database was established, and the Protein Information Resource (PIR) was established by the NBRF.
Sequence Search Introduction
-
3. Scope & Broad Coverage of Bioinformatics
Broad Coverage of Bioinformatics
Bioinformatics covers many specialized and advanced areas of biology.
Functional genomics:
Identification of genes and their respective functions.
Structural genomics:
Prediction of protein structures and of the functions related to them.
Comparative genomics:
Comparison of the genomes of different species to understand how they are related.
DNA microarrays:
These are designed to measure the levels of gene expression in different tissues, various stages of development and in different diseases.
Medical informatics:
This involves the management of biomedical data with special reference to biomolecules, in vitro assays, and clinical trials.
-
4. Components and Applications of Bioinformatics
Components of Bioinformatics
Bioinformatics comprises three components:
1. Creation of databases
2. Development of algorithms and statistics
3. Analysis of data and interpretation
Applications of Bioinformatics
A selected list of applications of bioinformatics is given below:
i. Sequence mapping of biomolecules (DNA, RNA, proteins).
ii. Identification of nucleotide sequences of functional genes.
iii. Identification of sites that can be cut by restriction enzymes.
iv. Design of primer sequences for the polymerase chain reaction (PCR).
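As a small illustration of application iii, the sketch below scans a DNA string for the recognition sequence of a restriction enzyme. It is a minimal example under stated assumptions: the helper name find_sites and the test sequence are made up for illustration, and only an exact match on one strand is considered.

```python
# Hypothetical helper for illustration only; not taken from the course material.
def find_sites(sequence: str, site: str) -> list[int]:
    """Return the 0-based start position of every occurrence of `site`."""
    sequence = sequence.upper()
    site = site.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        # Continue one base further on so overlapping matches are also found.
        start = sequence.find(site, start + 1)
    return positions


ECORI_SITE = "GAATTC"  # recognition sequence of the enzyme EcoRI
dna = "ttGAATTCagcgaattcGG"  # made-up sequence, mixed case on purpose

print(find_sites(dna, ECORI_SITE))  # prints [2, 11]
```

A real analysis would also handle ambiguous bases and the exact cut position within the site, which this sketch leaves out.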
-
5. Introduction of Biological Databases
Biological databases are libraries of biological information, collected from scientific experiments, published literature, high-throughput experimental technology, and computational analysis. They contain information from research areas including genomics, proteomics, metabolomics, microarray gene expression, and phylogenetics. Information contained in biological databases includes gene function, structure, localization (both cellular and chromosomal), and clinical effects of mutations, as well as similarities of biological sequences and structures.
Importance of Biological Database
Databases are important tools that help scientists analyze and explain a host of biological phenomena, from the structure of biomolecules and their interactions, to the whole metabolism of organisms, to the evolution of species. This knowledge helps in the fight against diseases, the development of medications, the prediction of certain genetic diseases, and the discovery of basic relationships among species in the history of life.
Biological database design, development, and long-term management is a core area of the discipline of bioinformatics.
Introduction of BLAST & ClustalW2
-
6. Types of Biological Databases
Types of biological databases:
There are four main types of biological databases, based on the data they contain:
Primary databases
Secondary databases
Specialized databases
Literature databases
-
7. Introduction of Entrez
Entrez is a data retrieval system developed by the National Center for Biotechnology Information (NCBI) that provides integrated access to a wide range of data domains, including literature, nucleotide and protein sequences, complete genomes, three-dimensional structures, and more. Entrez includes powerful search features that retrieve not only the exact search results but also related records within a data domain that might not be retrieved otherwise and associated records across data domains.
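Entrez can also be queried from a program through NCBI's E-utilities web interface. The sketch below only builds a search URL and does not send any request. The function name build_esearch_url and the search term are illustrative assumptions, not part of the course material.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities, the programmatic interface to Entrez.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


# Hypothetical helper for illustration only.
def build_esearch_url(db: str, term: str, retmax: int = 5) -> str:
    """Build an ESearch URL for one Entrez database (no network access)."""
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{EUTILS_BASE}/esearch.fcgi?{query}"


# Example term, assumed for illustration: records for one gene in one organism.
url = build_esearch_url("nucleotide", "BRCA1[Gene] AND Homo sapiens[Organism]")
print(url)
```

Opening the printed URL in a browser returns the matching record identifiers. The db parameter selects the data domain, for example nucleotide, protein, or pubmed.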
-
8. Introduction of FASTA & Reference Sequence
FASTA Sequence
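FASTA format is a text-based format for representing either nucleotide sequences or peptide sequences, in which nucleotides or amino acids are written as single-letter codes. Each sequence is preceded by a description line that starts with ">". The database record for a sequence typically provides: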
a unique identification number (the accession number)
the version number of the sequence
the length of the sequence
molecule type (DNA or mRNA)
taxonomic division (for instance, INV = invertebrate)
last release date
source organism
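To make the FASTA format concrete, here is a minimal sketch of reading FASTA text in Python. The function name parse_fasta and the two sequences are made up for illustration. The only rule assumed is that a line starting with ">" opens a new record and every following line belongs to that record's sequence.

```python
# Hypothetical helper for illustration only.
def parse_fasta(text: str) -> dict[str, str]:
    """Map each description line (without the leading '>') to its sequence."""
    records: dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence data may be wrapped over several lines; join them.
            records[header] += line
    return records


# Made-up example data, not real database entries.
example = """>seq1 made-up example sequence
ATGGCC
TTAGGC
>seq2 another made-up sequence
GGGCCC
"""

records = parse_fasta(example)
for header, sequence in records.items():
    print(header, len(sequence))
```

For real work, an established library such as Biopython is usually a better choice than a hand-written parser.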
Reference Sequence (RefSeq)
The Reference Sequence (RefSeq) collection provides a comprehensive, integrated, non-redundant, well-annotated set of sequences, including genomic DNA, transcripts, and proteins.
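RefSeq accession numbers begin with a two-letter prefix followed by an underscore that indicates the molecule type, for example NT_ for genomic DNA, NM_ for mRNA, or NP_ for protein.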
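The prefix rule can be sketched as a simple lookup. This example covers only the three prefixes named above; the function name refseq_molecule_type is a hypothetical helper, and the accession strings are used purely as examples of the format.

```python
# Only the three prefixes mentioned in the text; RefSeq defines others as well.
REFSEQ_PREFIXES = {
    "NT": "genomic DNA",
    "NM": "mRNA",
    "NP": "protein",
}


# Hypothetical helper for illustration only.
def refseq_molecule_type(accession: str) -> str:
    """Return the molecule type implied by a RefSeq-style accession prefix."""
    prefix, separator, _ = accession.strip().partition("_")
    if not separator:
        return "unknown"  # no underscore, so not a RefSeq-style accession
    return REFSEQ_PREFIXES.get(prefix.upper(), "unknown")


for accession in ["NM_000546", "NP_000537", "AB123456"]:
    print(accession, "->", refseq_molecule_type(accession))
```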
-
9. Entrez Search to Retrieve Sequence (Practical)