A knowledge-driven protocol for prediction of proteins of interest with an emphasis on biosynthetic pathways

Joshi, Adwait G.; Harini, K.; Meenakshi, Iyer; Shafi, K. Mohamed; Pasha, Shaik Naseer; Mahita, Jarjapu; Sajeevan, Radha Sivarajan; Karpe, Snehal D.; Ghosh, Pritha; Nitish, Sathyanarayanan; Gandhimathi, A.; Mathew, Oommen K.; Prasanna, Subramanian Hari; Malini, Manoharan; Mutt, Eshita; Naika, Mahantesha; Ravooru, Nithin; Rao, Rajas M.; Shingate, Prashant N.; Sukhwal, Anshul; Sunitha, Margaret S.; Upadhyay, Atul K.; Vinekar, Rithvik S.; Sowdhamini, Ramanathan

doi:10.1016/j.mex.2020.101053

Research article; 2020; Peer reviewed; Open access

Abstract

This protocol describes a stepwise process to identify proteins of interest from a query proteome derived from NGS data. We applied it to the Moringa oleifera transcriptome to identify proteins involved in secondary metabolite biosynthesis, vitamin biosynthesis, and ion transport. This knowledge-driven protocol identifies proteins through an integrated approach that combines sensitive sequence searches with evolutionary relationships. We use functionally important residues (FIRs) specific to the query protein family, identified from its homologous sequences and from the literature. We screen protein hits by their clustering with true homologues in a reconstructed phylogenetic tree, complemented by FIR mapping. The protein hits were validated through qRT-PCR and transcriptome quantification. The protocol showed higher specificity than other methods, particularly in distinguishing cross-family hits. It was effective in the transcriptome data analysis of M. oleifera described in Pasha et al.

- Knowledge-driven protocol to identify secondary-metabolite-synthesizing proteins in a highly specific manner.
- Use of functionally important residues for screening of true hits.
- Beneficial for metabolite pathway reconstruction in any NGS data (single-species or metagenomic).

© 2020 The Authors. Published by Elsevier B.V.
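The FIR mapping step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `fir_columns` and `fir_conservation`, the toy aligned sequences, and the residue positions are all hypothetical, and the sketch assumes that a multiple sequence alignment of the reference and the candidate hits already exists.

```python
from typing import Dict


def fir_columns(reference_aligned: str, fir_positions: Dict[int, str]) -> Dict[int, str]:
    """Map 1-based ungapped reference positions to 0-based alignment columns.

    fir_positions maps a residue position in the ungapped reference to the
    set of residues (as a string) accepted at that position.
    """
    columns: Dict[int, str] = {}
    position = 0
    for column, residue in enumerate(reference_aligned):
        if residue == "-":
            continue  # gaps do not advance the ungapped position counter
        position += 1
        if position in fir_positions:
            columns[column] = fir_positions[position]
    return columns


def fir_conservation(hit_aligned: str, columns: Dict[int, str]) -> float:
    """Return the fraction of FIR columns where the hit carries an accepted residue."""
    if not columns:
        raise ValueError("no FIR columns were mapped onto the alignment")
    matched = sum(1 for column, allowed in columns.items()
                  if hit_aligned[column] in allowed)
    return matched / len(columns)


# Toy alignment (hypothetical data): FIRs at reference positions 3 (H) and 4 (D or E).
reference = "MK-HDLS"
columns = fir_columns(reference, {3: "H", 4: "DE"})

hits = {
    "hit_1": "MKAHELS",  # both FIRs conserved
    "hit_2": "MK-ADLS",  # one FIR conserved
    "hit_3": "MRGQ-LT",  # no FIR conserved
}
scores = {name: fir_conservation(seq, columns) for name, seq in hits.items()}
```

In a screen of this kind, a hit would typically be retained only if it both clusters with true homologues in the phylogenetic tree and scores highly on FIR conservation; the threshold to apply is a choice this sketch leaves open.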

Keywords

Pathway; Homology; Multiple sequence alignment; Functionally important residue; Phylogenetic analysis

Published in

MethodsX
2020, volume: 7, article number: 101053
Publisher: Elsevier BV

SLU Authors

Radha Sivarajan, Sajeevan
- National Centre for Biological Sciences (NCBS)

UKÄ Subject classification

Bioinformatics (Computational Biology)

Publication identifier

DOI: https://doi.org/10.1016/j.mex.2020.101053

Permanent link to this page (URI)

https://res.slu.se/id/publ/120205
