From Surf Wiki (app.surf) — the open knowledge base

Open reading frame

DNA section marked with start and stop codon of different length


In molecular biology, open reading frames (ORFs) are defined as spans of DNA sequence between a start codon and a stop codon. Usually, this is considered within a studied region of a prokaryotic DNA sequence, where only one of the six possible reading frames will be "open" (the "reading", however, refers to the RNA produced by transcription of the DNA and its subsequent interaction with the ribosome in translation). Such an ORF may contain a start codon (usually AUG in terms of RNA) and by definition cannot extend beyond a stop codon (usually UAA, UAG or UGA in RNA). That start codon (not necessarily the first) indicates where translation may start. The transcription termination site is located after the ORF, beyond the translation stop codon. If transcription were to cease before the stop codon, an incomplete protein would be made during translation.
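The start-to-stop definition can be illustrated with a minimal Python sketch. It scans the three forward frames of a DNA string for stretches that begin at ATG and end at the first in-frame stop codon. The function name `find_orfs`, the `min_codons` parameter, and the choice to report only the first ATG after each stop are assumptions made for illustration, not part of any tool described in this article.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # DNA equivalents of UAA, UAG, UGA
START_CODON = "ATG"                  # DNA equivalent of AUG


def find_orfs(dna, min_codons=1):
    """Return (start, end, frame) tuples for ATG...stop ORFs on one strand.

    Coordinates are 0-based; `end` is exclusive and includes the stop codon.
    `min_codons` counts codons from the start codon up to, but excluding,
    the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # first ATG seen since the previous in-frame stop
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # an ORF cannot extend beyond a stop codon
    return orfs


if __name__ == "__main__":
    # The ATG sits in the third forward frame of this toy sequence.
    print(find_orfs("CCATGAAATAGCC"))
```

A stretch that starts with ATG but never reaches an in-frame stop is not reported by this sketch; scanning the reverse complement as well would cover the other three frames.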

In eukaryotic genes with multiple exons, introns are removed and exons are then joined together after transcription to yield the final mRNA for protein translation. In the context of gene finding, the start-stop definition of an ORF therefore only applies to spliced mRNAs, not genomic DNA, since introns may contain stop codons and/or cause shifts between reading frames. An alternative definition says that an ORF is a sequence that has a length divisible by three and is bounded by stop codons. This more general definition can be useful in the context of transcriptomics and metagenomics, where a start or stop codon may not be present in the obtained sequences. Such an ORF corresponds to parts of a gene rather than the complete gene.
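The alternative, stop-bounded definition can be sketched in the same way. The hypothetical helper below splits one reading frame into stop-free stretches whose lengths are multiples of three. As an assumption made here, stretches that touch the beginning or end of the sequence are also reported, which mirrors the transcriptomics and metagenomics case where a bounding codon may be missing from the obtained fragment.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_bounded_regions(dna, frame=0, min_codons=1):
    """Split one reading frame into stretches that contain no stop codon.

    Returns (start, end) pairs, 0-based and end-exclusive. Each stretch has
    a length divisible by three and excludes the stop codons themselves.
    The first and last stretches may be bounded by the sequence edge
    instead of a stop codon (partial ORFs).
    """
    dna = dna.upper()
    regions = []
    region_start = frame
    last = frame  # end of the last complete codon seen in this frame
    for i in range(frame, len(dna) - 2, 3):
        if dna[i:i + 3] in STOP_CODONS:
            if (i - region_start) // 3 >= min_codons:
                regions.append((region_start, i))
            region_start = i + 3
        last = i + 3
    # Trailing stretch that runs to the end of the sequence.
    if (last - region_start) // 3 >= min_codons:
        regions.append((region_start, last))
    return regions


if __name__ == "__main__":
    print(stop_bounded_regions("AAACCCTAAGGGTTTTGACC"))
```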

Biological significance

One common use of open reading frames (ORFs) is as one piece of evidence in gene prediction. Long ORFs are often used, along with other evidence, to initially identify candidate protein-coding regions or functional RNA-coding regions in a DNA sequence. A minimum length is sometimes required for this purpose, for example 150 codons. By itself, even a long open reading frame is not conclusive evidence for the presence of a gene.
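A rough calculation shows why length is informative. Under the simplifying assumption (made here, not by the article) that all 64 codons are equally likely and independent, 3 of them are stop codons in the standard genetic code, so long stop-free stretches become rare by chance alone. The function names below are illustrative.

```python
# Assumption: uniform, independent codons; standard code with 3 stop codons.
STOP_FRACTION = 3 / 64


def expected_codons_between_stops():
    """Mean spacing of stop codons in one frame of a random sequence."""
    return 1 / STOP_FRACTION


def prob_stop_free(n_codons):
    """Probability that n consecutive random codons contain no stop codon."""
    return (1 - STOP_FRACTION) ** n_codons


if __name__ == "__main__":
    print(f"expected spacing: {expected_codons_between_stops():.1f} codons")
    for n in (50, 100, 150):
        print(f"P(no stop in {n} codons) = {prob_stop_free(n):.5f}")
```

Real genomes deviate from this model (base composition and codon usage are not uniform), which is one reason a long ORF remains only one piece of evidence.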

Short open reading frames

Some short open reading frames (sORFs), also known as small open reading frames (smORFs), usually

Six-frame translation

Since DNA is interpreted in groups of three nucleotides (codons), a DNA strand has three distinct reading frames. The double helix of a DNA molecule has two anti-parallel strands; with the two strands having three reading frames each, there are six possible frame translations.

Example of a six-frame translation. The nucleotide sequence is shown in the middle with forward translations above and reverse translations below. Two possible open reading frames with the sequences are highlighted.
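A six-frame translation can be sketched in a few lines of Python. The codon table below is built from the standard genetic code in TCAG order; the frame labels (`+1` to `+3`, `-1` to `-3`) and the use of `X` for codons containing non-ACGT characters are conventions assumed for this example.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement, i.e. the other strand read 5' to 3'."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Translate three frames on each strand and return them by label."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in sorted(six_frame_translation("ATGGCCTAA").items()):
        print(label, protein)
```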

Software

ORF Finder

The ORF Finder (Open Reading Frame Finder) is a graphical analysis tool which finds all open reading frames of a selectable minimum size in a user's sequence or in a sequence already in the database. This tool identifies all open reading frames using the standard or alternative genetic codes. The deduced amino acid sequence can be saved in various formats and searched against the sequence database using the basic local alignment search tool (BLAST) server. The ORF Finder should be helpful in preparing complete and accurate sequence submissions. It is also packaged with the Sequin sequence submission software (sequence analyser).

ORF Investigator

ORF Investigator is a program that not only gives information about coding and non-coding sequences but can also perform pairwise global alignment of different gene or DNA region sequences. The tool finds the ORFs for corresponding amino acid sequences, converts them into single-letter amino acid code, and reports their locations in the sequence. The pairwise global alignment makes it convenient to detect mutations, including single-nucleotide polymorphisms. The Needleman–Wunsch algorithm is used for the gene alignment. ORF Investigator is written in the portable Perl programming language and is therefore available to users of all common operating systems.
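For readers unfamiliar with Needleman–Wunsch global alignment, the following is a generic textbook-style sketch in Python. It is not ORF Investigator's Perl code, and the scoring values (match +1, mismatch -1, linear gap penalty -1) are assumptions chosen only for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    s, x, y = needleman_wunsch("ATGC", "ATC")
    print(s)
    print(x)
    print(y)
```

Columns where the two aligned strings differ without a gap correspond to substitutions, which is how an alignment of this kind can expose candidate single-nucleotide differences.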

OrfPredictor

OrfPredictor is a web server designed for identifying protein-coding regions in expressed sequence tag (EST)-derived sequences. For query sequences with a hit in BLASTX, the program predicts the coding regions based on the translation reading frames identified in the BLASTX alignments; otherwise, it predicts the most probable coding region based on the intrinsic signals of the query sequences. The output is the predicted peptide sequences in FASTA format, with a definition line that includes the query ID, the translation reading frame, and the nucleotide positions where the coding region begins and ends. OrfPredictor facilitates the annotation of EST-derived sequences, particularly for large-scale EST projects.

OrfPredictor uses a combination of the two different ORF definitions mentioned above. It searches for stretches starting with a start codon and ending at a stop codon. As an additional criterion, it searches for a stop codon in the 5' untranslated region (UTR, also called the nontranslated region, NTR). The OrfPredictor web server is no longer supported; the standalone OrfPredictor tool can be downloaded from http://bioinformatics.ysu.edu/publication/tools_download/.

ORFik

ORFik is an R package in Bioconductor for finding open reading frames and for using next-generation sequencing data to justify ORFs.

orfipy

orfipy is a tool written in Python / Cython to extract ORFs in an extremely fast and flexible manner. orfipy can work with plain or gzipped FASTA and FASTQ sequences, and provides several options to fine-tune ORF searches; these include specifying the start and stop codons, reporting partial ORFs, and using custom translation tables. The results can be saved in multiple formats, including the space-efficient BED format. orfipy is particularly fast for data containing multiple smaller FASTA sequences, such as de novo transcriptome assemblies.
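To illustrate why BED is compact, the sketch below formats ORF coordinates as six-column BED lines (sequence name, 0-based start, exclusive end, feature name, score, strand) instead of storing the ORF sequences themselves. The helper names and the naming scheme for ORFs are hypothetical, and the output is not claimed to match orfipy's actual BED files.

```python
def orf_to_bed_line(seq_id, start, end, strand, name, score=0):
    """Format one ORF as a tab-separated, six-column BED line.

    BED uses 0-based start and end-exclusive coordinates.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    return "\t".join([seq_id, str(start), str(end), name, str(score), strand])


def orfs_to_bed(seq_id, orfs):
    """Format (start, end, strand) tuples as BED text, one ORF per line."""
    lines = [
        # Hypothetical naming scheme: <sequence id>_ORF<running number>
        orf_to_bed_line(seq_id, start, end, strand, f"{seq_id}_ORF{k}")
        for k, (start, end, strand) in enumerate(orfs, start=1)
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(orfs_to_bed("contig1", [(2, 11, "+"), (40, 130, "-")]))
```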

Info: Wikipedia Source

This article was imported from Wikipedia and is available under the Creative Commons Attribution-ShareAlike 4.0 License. Content has been adapted to SurfDoc format. Original contributors can be found on the article history page.
