Next Generation Sequencing and Structural Identification of retinoblastoma data Analysis

doi:https://doi.org/10.61336/ejcm/24-4-77

Research Article | Volume 14 Issue: 4 (Jul-Aug, 2024) | Pages 590 - 597


TV Venkateswaran

Uma Kumari

Project Trainee at Bioinformatics Project and Research Institute, Noida - 201301, India; Senior Bioinformatics Scientist, Bioinformatics Project and Research Institute, Noida - 201301, India

Under a Creative Commons license

Open Access


Received: June 10, 2024

Revised: June 28, 2024

Accepted: July 25, 2024

Published: Aug. 9, 2024

Abstract

Retinoblastoma is the most common type of eye cancer in children. Four categories are used to categorise the disease. One classification is based on whether the condition affects one eye or both; these are referred to as unilateral and bilateral, respectively. Based on gene expression analysis, retinoblastomas can also be classified into two groups. Group 1 exhibits an invasive tumour pattern together with a variety of retinal cell types. Group 2 exhibits a distinct cone photoreceptor expression profile. RBBP9 is a binding partner of the retinoblastoma susceptibility protein (Rb) and is involved in human cancer. The sequence of RBBP9 contains LxCxE, the Rb binding motif. Yeast two-hybrid experiments revealed that RBBP9 interacts with Rb. RBBP9 is 21 kD in size, and its crystal structure has been investigated. There is evidence linking RBBP9 to pancreatic cancer: the protein's serine hydrolase activity is required for pancreatic cancer to develop. This activity suppresses TGF-β antiproliferative signalling by reducing Smad2/3 phosphorylation. Our goals in this study are to visualise the protein structure in PyMOL; determine the nature of the protein (hydrophobic or hydrophilic); use the multiple alignment tool COBALT to check for protein conservation across other species; identify both chains (A and B) using PyMOL; examine the interaction between 2QS9 and 7OEX (a protein that is very similar to 2QS9) in PyMOL; perform multiple sequence alignment using Clustal Omega to determine the nature of the protein; plot the Ramachandran plot, using Biopython and the SAVES server, to visualise the energetically allowed regions; and perform molecular docking of 2QS9 and Topotecan using CB-Dock2.

Keywords

Retinoblastoma

Evolutionary relationship

NGS

Molecular Docking

Structure visualization

Domain analysis

Biopython

Introduction

Retinoblastoma is the most common type of eye cancer in children. Leukocoria, strabismus, buphthalmos, advanced intraocular tumour, extraocular tumour involving orbital tissue, and cellulitis are the main signs of the illness. Four categories are used to categorise the disease. One classification is based on whether the condition affects one eye or both; these are referred to as unilateral and bilateral, respectively. The other classification, intraocular versus extraocular, depends on whether the disease is confined to the eyeball or is present in structures outside it (1). Retinoblastoma affects 1 in 14,000–20,000 live births worldwide. About 40% of cases are bilateral and 60% are unilateral. The illustration below shows a healthy eye and an affected eye; in the affected eye, cancerous cells arise from the retina. The genetic basis of the disorder can be germline or somatic. The causative gene of retinoblastoma is RB1, and most bilateral retinoblastomas are caused by germline pathogenic mutations in this gene. Of unilateral retinoblastomas, 10% to 15% result from germline pathogenic mutations, whereas the remaining 85-90% are caused by somatic changes (2).

According to gene expression profiling, there are two groups of retinoblastomas. Group 1 contains a range of retinal cell types and has an invasive tumour pattern. Group 2 shows a unique cone photoreceptor expression profile. Examples of unilateral and bilateral retinoblastoma are given below (3).

Unilateral retinoblastoma: one eye shows leukocoria, the white pupillary reflex.

Bilateral retinoblastoma: both eyes show leukocoria.

RBBP9 is a binding partner of the retinoblastoma susceptibility protein (Rb) and is important in the human cancer pathway. Its sequence contains LxCxE, the Rb binding motif. RBBP9 was shown to interact with Rb by yeast two-hybrid and co-immunoprecipitation experiments. Substitution of glutamine for leucine in this motif blocked the binding of Rb. RBBP9 is 21 kD in size, and its crystal structure has been studied. RBBP9 has been implicated in pancreatic cancer: the serine hydrolase activity of the protein is necessary for the development of pancreatic carcinoma. This activity inhibits TGF-β antiproliferative signaling by suppressing Smad2/3 phosphorylation (4).
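The LxCxE motif described above (leucine, any residue, cysteine, any residue, glutamate) can be located in a one-letter protein sequence with a regular expression. The following is a minimal sketch only; the function name `find_lxcxe` and the toy sequences are hypothetical and are not the RBBP9 sequence.

```python
import re

# LxCxE in one-letter code: L, any residue, C, any residue, E.
# The lookahead lets overlapping occurrences be reported as well.
LXCXE = re.compile(r"(?=(L.C.E))")


def find_lxcxe(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched fragment) for each LxCxE hit."""
    return [(m.start() + 1, m.group(1)) for m in LXCXE.finditer(sequence.upper())]


# Toy sequences for illustration only (hypothetical, NOT RBBP9).
toy_wild_type = "MKLTCHEAAGS"
toy_leu_to_gln = "MKQTCHEAAGS"  # Leu replaced by Gln at the first motif position

print(find_lxcxe(toy_wild_type))   # motif found
print(find_lxcxe(toy_leu_to_gln))  # motif no longer matches
```

In the toy example, replacing the leucine with glutamine removes the match, which mirrors at the sequence level why such a substitution would be expected to disrupt the motif.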

The crystal structure of RBBP9 is shown below:

Figure 1: RBBP9 interaction with other Rb family proteins (image taken from Sergey M. Vorobiev et al., 2012).

Figure 2: Image taken from Sergey M. Vorobiev et al., 2012.

MATERIAL AND METHOD

The drug compounds used in this study were retrieved from PubChem. PubChem is an indispensable resource for chemistry and the biological sciences. This large database is maintained by the National Center for Biotechnology Information (NCBI), a division of the US National Library of Medicine (NLM). Its main goal is to provide free access to data on the biological activities of small molecules [5]. The protein structure was obtained from the PDB; the protein's PDB ID is 6U9N. The Protein Data Bank (PDB) is an essential repository for three-dimensional structural data on biological macromolecules, primarily proteins and nucleic acids. The archived data are freely accessible to researchers worldwide, enabling them to explore the structures of proteins, nucleic acids, and complex assemblies. Scientists use this information for many purposes, including understanding biological functions, drug discovery, protein engineering, and molecular modelling [6]. RasMol was used to study the protein structure, and PyMOL was used to analyse the ligand-protein complex interaction. RasMol is an influential and pioneering molecular visualization program that has contributed significantly to structural biology. It lets users rotate, translate, and zoom three-dimensional representations of molecules and examine different angles, surface properties, and structural details, which aids the understanding of molecular interactions, folding patterns, and active sites [7,8].
PyMOL offers a wide array of tools for structural analysis and exploration. It allows users to measure distances, angles, and dihedral angles, perform alignments between structures, analyze electrostatic potentials, generate molecular surfaces, and visualize molecular dynamics trajectories [9]. BLAST and COBALT were used for sequence similarity and phylogenetic analysis of the protein, respectively. BLAST (Basic Local Alignment Search Tool) is a fundamental and widely used bioinformatics algorithm and software tool. It compares biological sequences, such as DNA, RNA, or protein sequences, against large databases to identify similarities and infer functional, structural, or evolutionary relationships between sequences. BLAST reports statistical measures of the significance of each match, which help researchers assess the likelihood that a match occurred by chance and prioritize the most relevant matches for further investigation [10,11]. COBALT uses a constraint-based approach to multiple sequence alignment: it gathers a set of pairwise constraints from database searches, sequence similarity data, and user input, and these constraints guide the alignment process [12]. CB-Dock2 was used to dock the protein and the drug. CB-Dock2 is a user-friendly web server designed for blind docking, meaning that it predicts binding modes without prior information about binding sites. It has a reported success rate of around 85% for binding pose prediction at a root-mean-square deviation (RMSD) below 2.0 Å [13]. A further tool was used to compute various physicochemical properties and analyze the protein sequence [14]. Protein interactions and gene ontology were then examined using the STRING tool.
STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) is a bioinformatics database and web-based tool that consolidates and predicts protein-protein interactions (PPIs) and functional associations across various organisms. It combines predicted and known interactions to provide a thorough resource for examining protein interactions and their functional implications (15). Protein quality was assessed using ERRAT and SAVES. ERRAT is a bioinformatics tool that analyses protein structures from their atomic coordinates. It evaluates the statistics of interactions between non-bonded atoms and flags faults or anomalies in the structure. By comparing the distribution of atomic interactions with that of high-resolution structures, ERRAT calculates a quality factor that represents the overall quality of the model. SAVES (Structural Analysis and Verification Server) is a collection of tools for evaluating the accuracy and reliability of protein structures [16,17].
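The 2.0 Å RMSD criterion mentioned for docking pose prediction can be made concrete with a short calculation. The sketch below computes the RMSD between a predicted and a reference ligand pose without superposition, on the assumption that both poses are expressed in the same receptor coordinate frame and list their atoms in the same order. The function name `pose_rmsd` and the coordinates are hypothetical.

```python
import math


def pose_rmsd(predicted, reference):
    """RMSD (same units as the input) between two equally ordered atom lists.

    Each pose is a sequence of (x, y, z) tuples. No superposition is applied,
    because docked poses are assumed to share the receptor's frame.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (p - r) ** 2
        for atom_p, atom_r in zip(predicted, reference)
        for p, r in zip(atom_p, atom_r)
    )
    return math.sqrt(squared / len(predicted))


# Hypothetical three-atom ligand: the predicted pose is shifted 1 Å along x.
reference_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
predicted_pose = [(x + 1.0, y, z) for x, y, z in reference_pose]

value = pose_rmsd(predicted_pose, reference_pose)
print(f"RMSD = {value:.2f} Å, within 2.0 Å: {value < 2.0}")
```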

RESULTS:

Figure 3: Representation of the N- and C-termini of 2QS9. The N-terminus is blue and the C-terminus is red.

Figure 4: Representation of helices, sheets and loops in PyMOL.

Figure 5: Multiple alignment results from COBALT on the hydropathy scale. On this scale, red bars are hydrophobic and blue bars are hydrophilic. There are more red bars than blue ones, which suggests that the protein is predominantly hydrophobic.
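The hydrophobic or hydrophilic character read off the red and blue bars can also be estimated numerically from a sequence. The sketch below uses the Kyte-Doolittle hydropathy scale, which is an assumption here (the scale used by the COBALT view is not stated in the text), and a toy sequence rather than the RBBP9 sequence. A positive mean value suggests an overall hydrophobic sequence, and a negative value a hydrophilic one.

```python
# Kyte-Doolittle hydropathy values (one-letter amino acid codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value of all residues."""
    sequence = sequence.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def window_means(sequence: str, window: int = 9) -> list[float]:
    """Mean hydropathy of each sliding window, as plotted in hydropathy profiles."""
    sequence = sequence.upper()
    return [
        sum(KYTE_DOOLITTLE[aa] for aa in sequence[i:i + window]) / window
        for i in range(len(sequence) - window + 1)
    ]


# Toy sequence for illustration only (hypothetical, NOT RBBP9).
toy = "MKLVIAGFDERSTLLV"
score = gravy(toy)
print(f"GRAVY = {score:.2f} ->", "hydrophobic" if score > 0 else "hydrophilic")
```

Residues outside the twenty standard amino acids raise a `KeyError` in this sketch; a real analysis would need to decide how to treat them.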

Sequence similarity analysis

Description of RBBP9 across species with percentage identity. Here we compare the percentage identity of RBBP9 (Homo sapiens) with the same protein in other species. 7OEX is identical to 2QS9.
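Percentage identity, as compared across species above, is the share of aligned positions that carry the same residue. The sketch below is one common way to compute it and is not necessarily the definition used by BLAST or COBALT: columns in which either sequence has a gap are excluded from the denominator, which is an assumption. The aligned toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over the gap-free aligned columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Hypothetical pre-aligned fragments (NOT real RBBP9 orthologues).
human_like = "MKT-LV"
other_like = "MKTALI"
print(f"{percent_identity(human_like, other_like):.1f}% identity")
```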

Figure 6: Active site representation in 2QS9 (Chain A and Chain B)

Figure 7: Protein-ligand interaction with the ligand Topotecan, with an AutoDock Vina score of -8.3.

Figure 8: Protein-ligand interaction with the ligand Cytoxan (also known as cyclophosphamide), with an AutoDock Vina score of -4.5.

Figure 9: Docking score table of molecular docking interactions of 2QS9 and Topotecan

Figure 10: Docking score table of molecular docking interactions of 2QS9 and Cytoxan

Active Residues where the Protein Ligand interaction was observed (Topotecan)

Chain A: LYS42 ASN43 PRO45 ASP46 PRO47 ILE48 THR49 ARG51 ILE54 PHE58 GLU62 ASN107

Chain B: ARG51 GLU52 SER53 ILE54 LEU56 PRO57 ARG83 THR87 HIS88 GLU106 ARG109 ALA110 SER111 GLY112 THR115 ARG116 PRO117

Active Residues where the Protein Ligand interaction was observed (Cytoxan also known as cyclophosphamide)

Chain A: PRO45 ASP46 PRO47 ILE48 THR49 ARG51 ILE54 PHE58

Chain B: ARG51 GLU52 SER53 LEU56 PRO57 GLU86 ALA110 SER111 GLY112 TYR113 THR115 ARG116
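The two residue lists above can be compared directly to see which contacts the two ligands share. The following sketch uses the residues exactly as listed for Topotecan and Cytoxan; the helper names `residue_number` and `compare_contacts` are hypothetical.

```python
# Contact residues per chain, copied from the lists above.
TOPOTECAN = {
    "A": "LYS42 ASN43 PRO45 ASP46 PRO47 ILE48 THR49 ARG51 ILE54 PHE58 GLU62 ASN107".split(),
    "B": ("ARG51 GLU52 SER53 ILE54 LEU56 PRO57 ARG83 THR87 HIS88 GLU106 "
          "ARG109 ALA110 SER111 GLY112 THR115 ARG116 PRO117").split(),
}
CYTOXAN = {
    "A": "PRO45 ASP46 PRO47 ILE48 THR49 ARG51 ILE54 PHE58".split(),
    "B": "ARG51 GLU52 SER53 LEU56 PRO57 GLU86 ALA110 SER111 GLY112 TYR113 THR115 ARG116".split(),
}


def residue_number(residue: str) -> int:
    """Numeric position from a label such as 'ARG51' (three-letter code + number)."""
    return int(residue[3:])


def compare_contacts(first, second):
    """Split two residue lists into shared and ligand-specific contacts."""
    a, b = set(first), set(second)
    return {
        "shared": sorted(a & b, key=residue_number),
        "first_only": sorted(a - b, key=residue_number),
        "second_only": sorted(b - a, key=residue_number),
    }


for chain in ("A", "B"):
    result = compare_contacts(TOPOTECAN[chain], CYTOXAN[chain])
    print(f"Chain {chain} shared:", " ".join(result["shared"]))
    print(f"Chain {chain} Topotecan only:", " ".join(result["first_only"]))
    print(f"Chain {chain} Cytoxan only:", " ".join(result["second_only"]))
```

On these lists, every chain A contact of Cytoxan is also a Topotecan contact, while in chain B only GLU86 and TYR113 are specific to Cytoxan.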

Figure 11: Molecular docking of Topotecan using CB-Dock2, showing the distribution of hydrophobic areas (green and yellow).

Figure 12: Identification of domains, with chain A in yellow and chain B in red on a white background.

Figure 13: ERRAT error values and quality factor for chain A. The error values fall below the 95% rejection limit, which implies that the quality of the structure is good enough for use in the analysis.

ERRAT error values and quality factor for chain B. The error values fall below the 95% rejection limit, which implies that the quality of the structure is good enough for use in the analysis.
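The ERRAT overall quality factor is reported as the percentage of the protein whose error value falls below the 95% rejection limit. The sketch below illustrates only that final percentage step; the per-window error values and the numeric limit are hypothetical placeholders and are not taken from the ERRAT output for 2QS9.

```python
def quality_factor(error_values, rejection_limit):
    """Percentage of windows whose error value is below the rejection limit."""
    if not error_values:
        raise ValueError("at least one error value is required")
    below = sum(1 for value in error_values if value < rejection_limit)
    return 100.0 * below / len(error_values)


# Hypothetical per-window error values and limit, for illustration only.
toy_errors = [1.2, 3.4, 2.8, 9.9, 0.7, 4.1, 2.2, 3.0]
toy_limit = 5.0  # placeholder standing in for the 95% rejection limit

print(f"Overall quality factor: {quality_factor(toy_errors, toy_limit):.1f}%")
```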

Figure 14: Clustal Omega multiple sequence alignment between 2QS9 and 7OEX. The high number of blue residues shows the acidic nature of the proteins.

Figure 15: Representation of the protein change in PDBsum for the variant at position Asp102.

Figure 16: Interface statistics between chains A and B of the 2QS9 protein. The interaction between the chains consists of non-bonded contacts.

Given below is the schematic diagram of interactions between protein chains. Interacting chains are joined by coloured lines, each representing a different type of interaction. The area of each circle is proportional to the surface area of the corresponding protein chain. The extent of the interface region on each chain is represented by the black wedge, whose size signifies the interface surface area.

Figure 17: Protein-Protein interaction residues of 2QS9 between chain A and chain B showing non-bonded contacts.

Residue interactions across the interface, coloured by residue type and showing non-bonded contacts (protein-protein analysis in the PDB web server).

Residue analysis:

Positive – Arg51

Negative – Glu62

Neutral – Ser53

Proline – Pro45 and Pro57

Aliphatic – Ile54

Figure 18: Functional domain analysis using InterProScan. The 2QS9 protein has Ser_hydrolase as the representative domain and belongs to the hydrolase_RBBP9/YdeN family.

Figure 19: Gene view histogram of mutations across RBBP9.

Figure 20: 3D structure view of RBBP9 from the COSMIC database.

Figure 21: The red site in the 3D structure of RBBP9 corresponds to the maximum frequency of mutations.

Figure 22: Ramachandran Plot of 2QS9 using Biopython.
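A Ramachandran plot places each residue at its backbone dihedral angles phi and psi. The sketch below shows how these two angles follow from backbone atom coordinates, using only the standard library. It is not the Biopython routine used for the figure, and the function names and coordinates are hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)  # normals of the two planes
    b1_len = math.sqrt(_dot(b1, b1))
    b1_unit = (b1[0] / b1_len, b1[1] / b1_len, b1[2] / b1_len)
    x = _dot(n1, n2)
    y = _dot(b1_unit, _cross(n1, n2))
    return math.degrees(math.atan2(y, x))


def phi_psi(prev_c, n, ca, c, next_n):
    """phi = C(i-1)-N-CA-C and psi = N-CA-C-N(i+1) for one residue."""
    return dihedral(prev_c, n, ca, c), dihedral(n, ca, c, next_n)


# Hypothetical coordinates (in Å) for one residue and its neighbours.
phi, psi = phi_psi(
    prev_c=(1.0, 0.0, 0.0), n=(0.0, 0.0, 0.0), ca=(0.0, 0.0, 1.5),
    c=(0.0, 1.4, 2.0), next_n=(1.2, 1.8, 2.6),
)
print(f"phi = {phi:.1f}, psi = {psi:.1f}")
```

Repeating this for every residue with both neighbours present yields the (phi, psi) pairs that are scattered on the plot.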

Figure 23: Protein contact map of 2QS9 using Biopython. This map shows the interactions between amino acids within the protein. The yellow spots on the map represent the amino acids. Spots that lie close to each other depict strong interactions, whereas scattered spots depict weak interactions.
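A contact map is conventionally built by marking every pair of residues whose representative atoms lie within a distance cutoff. The sketch below shows that construction with the standard library only. The 8 Å cutoff on C-alpha atoms is a commonly used choice and an assumption here, and the coordinates are hypothetical rather than taken from 2QS9.

```python
import math


def contact_map(ca_coords, cutoff=8.0):
    """Boolean matrix: True where two C-alpha atoms are within `cutoff` Å."""
    n = len(ca_coords)
    return [
        [math.dist(ca_coords[i], ca_coords[j]) <= cutoff for j in range(n)]
        for i in range(n)
    ]


# Hypothetical C-alpha coordinates (in Å) for four residues.
toy_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]

for row in contact_map(toy_ca):
    print("".join("#" if in_contact else "." for in_contact in row))
```

The matrix is symmetric with a filled diagonal; contacts far from the diagonal indicate residues that are distant in sequence but close in space.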

Figure 24: Phylogenetic tree using Biopython. The tree shows that 2QS9 is closely related to 7OEX.

The phylogenetic score between 2QS9 and 7OEX, computed using Biopython, is 190.

CONCLUSION

When applied to retinoblastoma, next-generation sequencing (NGS) offers a thorough understanding of the disease's genetic makeup, facilitating precise diagnosis, molecular categorization, and tailored treatment plans. It aids in structural validation, in understanding tumour heterogeneity, in differentiating between sporadic and inherited cases, and in identifying possible targets for therapy. Consequently, NGS has developed into a vital tool for improving retinoblastoma patient care and outcomes.

REFERENCES

Rootman DB, Gonzalez E, Mallipatna A, et al. Hand-held high-resolution spectral domain optical coherence tomography in retinoblastoma: clinical and morphologic considerations. Br J Ophthalmol. 2013 Jan;97(1):59-65. doi: 10.1136/bjophthalmol-2012-302133. Epub 2012 Oct 26. PMID: 23104902.
Shields CL, Schoenberg E, Kocher K, et al. Lesions simulating retinoblastoma (pseudoretinoblastoma) in 604 cases: results based on age at presentation. Ophthalmology. 2013 Feb;120(2):311-6. doi: 10.1016/j.ophtha.2012.07.067. Epub 2012 Oct 27. PMID: 23107579.
Norrie JL, Nityanandam A, Lai K, et al. Retinoblastoma from human stem cell-derived retinal organoids. Nat Commun. 2021 Jul 27;12(1):4535. doi: 10.1038/s41467-021-24781-7. PMID: 34315877; PMCID: PMC8316454.
Kapatai G, Brundler MA, Jenkinson H, et al. Gene expression profiling identifies different sub-types of retinoblastoma. Br J Cancer. 2013 Jul 23;109(2):512-25. doi: 10.1038/bjc.2013.283. Epub 2013 Jun 11. PMID: 23756868; PMCID: PMC3721394.
Castela G, Providência J, Monteiro M, et al. Characterization of the Portuguese population diagnosed with retinoblastoma. Sci Rep. 2022 Mar 14;12(1):4378. doi: 10.1038/s41598-022-08326-6. PMID: 35288594; PMCID: PMC8921246.
Vorobiev SM, Huang YJ, Seetharaman J, et al. Human retinoblastoma binding protein 9, a serine hydrolase implicated in pancreatic cancers. Protein Pept Lett. 2012 Feb;19(2):194-7. doi: 10.2174/092986612799080356. PMID: 21933118; PMCID: PMC3677193.
Kim S, Thiessen PA, Bolton EE, et al. PubChem Substance and Compound databases. Nucleic Acids Res. 2016 Jan 4;44(D1):D1202-13. doi: 10.1093/nar/gkv951. Epub 2015 Sep 22. PMID: 26400175; PMCID: PMC4702940.
Wang Y, Bolton E, Dracheva S, et al. An overview of the PubChem BioAssay resource. Nucleic Acids Res. 2010 Jan;38(Database issue):D255-66. doi: 10.1093/nar/gkp965. Epub 2009 Nov 19. PMID: 19933261; PMCID: PMC2808922.
Berman HM, Westbrook J, Feng Z, et al. The Protein Data Bank. Nucleic Acids Res. 2000 Jan 1;28(1):235-42. doi: 10.1093/nar/28.1.235. PMID: 10592235; PMCID: PMC102472.
Westbrook J, Feng Z, Jain S, et al. The Protein Data Bank: unifying the archive. Nucleic Acids Res. 2002 Jan 1;30(1):245-8. doi: 10.1093/nar/30.1.245. PMID: 11752306; PMCID: PMC99110.
Berman HM, Battistuz T, Bhat TN, et al. The Protein Data Bank. Acta Crystallogr D Biol Crystallogr. 2002 Jun;58(Pt 6 No 1):899-907. doi: 10.1107/s0907444902003451. Epub 2002 May 29. PMID: 12037327.
Sayle RA, Milner-White EJ. RASMOL: biomolecular graphics for all. Trends Biochem Sci. 1995 Sep;20(9):374. doi: 10.1016/s0968-0004(00)89080-5. PMID: 7482707.
Pembroke T. Bio-molecular modelling utilising RasMol and PDB resources: a tutorial with HEW lysozyme. Biochemistry and Molecular Biology Education. 2000;28(6):297-300. ISSN 1470-8175. https://doi.org/10.1016/S1470-8175(00)00050-3.
Seeliger D, de Groot BL. Ligand docking and binding site analysis with PyMOL and Autodock/Vina. J Comput Aided Mol Des. 2010 May;24(5):417-22. doi: 10.1007/s10822-010-9352-6. Epub 2010 Apr 17. PMID: 20401516; PMCID: PMC2881210.
Mura C, McCrimmon CM, et al. An introduction to biomolecular graphics. PLoS Comput Biol. 2010 Aug 26;6(8):e1000918. doi: 10.1371/journal.pcbi.1000918. PMID: 20865174; PMCID: PMC2928806.
McGinnis S, Madden TL. BLAST: at the core of a powerful and diverse set of sequence analysis tools. Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W20-5. doi: 10.1093/nar/gkh435. PMID: 15215342; PMCID: PMC441573.
Boratyn GM, Camacho C, Cooper PS, et al. BLAST: a more efficient report with usability improvements. Nucleic Acids Res. 2013 Jul;41(Web Server issue):W29-33. doi: 10.1093/nar/gkt282. Epub 2013 Apr 22. PMID: 23609542; PMCID: PMC3692093.
Johnson M, Zaretskaya I, Raytselis Y, et al. NCBI BLAST: a better web interface. Nucleic Acids Res. 2008 Jul 1;36(Web Server issue):W5-9. doi: 10.1093/nar/gkn201. Epub 2008 Apr 24. PMID: 18440982; PMCID: PMC2447716.
Papadopoulos JS, Agarwala R. COBALT: constraint-based alignment tool for multiple protein sequences. Bioinformatics. 2007 May 1;23(9):1073-9. doi: 10.1093/bioinformatics/btm076. Epub 2007 Mar 1. PMID: 17332019.
Liu Y, Yang X, Gan J, et al. CB-Dock2: improved protein-ligand blind docking by integrating cavity detection, docking and homologous template fitting. Nucleic Acids Res. 2022 Jul 5;50(W1):W159-W164. doi: 10.1093/nar/gkac394. PMID: 35609983; PMCID: PMC9252749.
Gasteiger E, Gattiker A, Hoogland C, et al. ExPASy: The proteomics server for in-depth protein knowledge and analysis. Nucleic Acids Res. 2003 Jul 1;31(13):3784-8. doi: 10.1093/nar/gkg563. PMID: 12824418; PMCID: PMC168970.
Szklarczyk D, Franceschini A, Wyder S, et al. STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 2015 Jan;43(Database issue):D447-52. doi: 10.1093/nar/gku1003. Epub 2014 Oct 28. PMID: 25352553; PMCID: PMC4383874.
Al-Khayyat MZ, Al-Dabbagh AG. In silico Prediction and Docking of Tertiary Structure of LuxI, an Inducer Synthase of Vibrio fischeri. Rep Biochem Mol Biol. 2016 Apr;4(2):66-75. PMID: 27536699; PMCID: PMC4986264.
Messaoudi A, Belguith H, Ben Hamida J. Homology modeling and virtual screening approaches to identify potent inhibitors of VEB-1 β-lactamase. Theor Biol Med Model. 2013 Apr 2;10:22. doi: 10.1186/1742-4682-10-22. PMID: 23547944; PMCID: PMC3668210.
Uma Kumari, K. S. (2023). CADD Approaches For The Early Diagnosis Of Lung Cancer. Journal of Clinical Otorhinolaryngology, Head, and Neck Surgery, 27(1), 5190-5199.
Uma Kumari, N. B. (2023). Computer Aided Drug Designing Approach for Prospective Human Metastatic Cancer. International Journal for Research in Applied Science and Engineering Technology, 11, 1874-1879. doi:10.22214/ijraset.2023.550014
Uma Kumari, Gurpreet Kaur, et al. "Biopython/Network Of Protein Identification And NGS Analysis Of Glioma Cancer ATP Competitive Type III C-MET Inhibitor". 7.367, Volume 11, Issue 2, 27-Jun-2024, pp. 41-51.
Uma Kumari, Sharvari Santosh Kulkarini, et al. "Structure Analysis And Glioma Targeted Therapeutics Cancer In ATP Competitive Type III C-MET Inhibitor", International Journal of Emerging Technologies and Innovative Research, ISSN: 2349-5162, Vol. 11, Issue 6, pp. i271-i282, June 2024, Av

European Journal of Cardiovascular Medicine
