This Artificial Intelligence-Based Protein Language Model Unlocks General-Purpose Sequence Modelling

January 22, 2023


Comparing the syntax and semantics of natural languages with the sequence-function relationship of proteins has fundamentally altered the way people study the language of life. Although this comparison has real value as a historical milestone that helped bring NLP techniques such as language models into the protein domain, results from NLP do not translate wholesale to protein language. Moreover, scaling up protein language models may carry far greater consequences than scaling up NLP models.

The observation that language models with enormous parameter counts, trained for enormous numbers of steps, still show noticeable learning gradients and therefore appear under-fitted has encouraged the assumption, rather falsely, that model size is proportional to the richness of its learned representations. As a result, the search for more accurate or relevant protein representations has gradually turned into a search for bigger models, which demand more computing power and are therefore less accessible. Notably, PLM sizes recently jumped from 10⁶ to 10⁹ parameters. The authors anchor their size-performance benchmark on ProtTrans's ProtT5-XL-U50, an encoder-decoder transformer pre-trained on the UniRef50 database with 3B parameters for training and 1.5B for inference, which historically marked the protein language model state of the art (SOTA).
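
To make the 3B-training versus 1.5B-inference distinction concrete, the sketch below loads only the encoder half of ProtT5-XL-U50 through the Hugging Face transformers library and extracts per-residue embeddings. The checkpoint name Rostlab/prot_t5_xl_uniref50 and the preprocessing steps follow the publicly documented ProtTrans usage pattern; treat this as an illustrative example rather than the exact benchmark setup used in the paper.

```python
# Minimal sketch: per-residue embeddings from ProtT5-XL-U50 (encoder only, ~1.5B params).
# Assumes `transformers`, `sentencepiece`, and `torch` are installed; checkpoint name
# follows the ProtTrans documentation.
import re
import torch
from transformers import T5Tokenizer, T5EncoderModel

tokenizer = T5Tokenizer.from_pretrained("Rostlab/prot_t5_xl_uniref50", do_lower_case=False)
model = T5EncoderModel.from_pretrained("Rostlab/prot_t5_xl_uniref50")
model.eval()

sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
# ProtTrans convention: map rare amino acids (U, Z, O, B) to X and separate residues with spaces.
prepared = " ".join(re.sub(r"[UZOB]", "X", sequence))

inputs = tokenizer(prepared, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One embedding vector per residue (plus the special end-of-sequence token).
per_residue = outputs.last_hidden_state[0]
print(per_residue.shape)  # roughly (len(sequence) + 1, 1024)
```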

To develop scaling principles for protein sequence modeling, the RITA family of language models took a first step in that direction, showing how a model's performance changes with its size. RITA presents four models whose performance rises in proportion to size, from 85M to 300M to 680M to 1.2B parameters. A similar pattern was later confirmed by ProGen2, a collection of protein language models trained on various sequence datasets and reaching 6.4B parameters. Finally, as of the time this study was published, ESM-2, a suite of general-purpose protein language models that likewise shows performance rising with size from 650M to 3B to 15B parameters, is the most recent addition encouraging model up-scaling.
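
The parameter counts cited for these models can be checked directly once a checkpoint is loaded. As a minimal sketch, and assuming the publicly released facebook/esm2_t33_650M_UR50D checkpoint on the Hugging Face Hub, the snippet below counts the parameters of the 650M ESM-2 variant; the 3B and 15B variants follow the same pattern under different checkpoint names.

```python
# Minimal sketch: count the parameters of a released protein language model checkpoint.
# Assumes the facebook/esm2_t33_650M_UR50D checkpoint is available on the Hugging Face Hub.
from transformers import AutoModel

model = AutoModel.from_pretrained("facebook/esm2_t33_650M_UR50D")

# Shared (tied) tensors are only counted once by model.parameters().
n_params = sum(p.numel() for p in model.parameters())
print(f"ESM-2 650M variant: ~{n_params / 1e6:.0f}M parameters (masked-LM head excluded)")
```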

The simple equation of larger with ostensibly better PLMs ignores several factors, including computing costs and the design and deployment of task-agnostic models. It raises the barrier to entry for innovative research and limits its ability to scale. Although model size unquestionably matters for the goals above, it is not the only factor. Pre-training dataset scaling is conditional in the same way: larger datasets are not always preferable to smaller datasets of higher quality. The authors argue that scaling up protein language models is similarly conditional, i.e., bigger models are not necessarily better than smaller models optimized with protein-knowledge-guided methods.

The primary goal of this study is to incorporate knowledge-guided optimization into an iterative empirical framework that broadens access to research innovation through practical resources. Because their model "unlocks" the language of life by learning better representations of its "letters," the amino acids, they named the project "Ankh" (a reference to the Ancient Egyptian symbol for the key of life). This is developed into two pieces of evidence for assessing Ankh's generality and optimization.

The first piece of evidence is outperforming the SOTA across a wide range of structure and function benchmarks, together with a generation study for protein engineering on High-N (family-based) and One-N (single-sequence-based) applications, where N is the number of input sequences. The second is reaching this performance through a survey of optimal attributes, covering not only the model architecture but also the software and hardware used for the model's creation, training, and deployment. Depending on the application's needs, they provide two pre-trained models, Ankh large and Ankh base, each offering two modes of computation. For convenience, they refer to the flagship model, Ankh large, simply as Ankh. The pre-trained models are available on their GitHub page, which also includes details on how to run the codebase.
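
As a rough illustration of how the released checkpoints are meant to be used, the sketch below loads the base model through the project's ankh Python package and embeds a single sequence. The function name load_base_model and the tokenization call follow the usage shown in the project's README; consult the GitHub page for the authoritative interface.

```python
# Minimal sketch of embedding a protein with Ankh, following the usage pattern
# described in the project's README (function and argument names assumed from that documentation).
import torch
import ankh

model, tokenizer = ankh.load_base_model()  # ankh.load_large_model() for the flagship model
model.eval()

sequences = ["MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"]
inputs = tokenizer.batch_encode_plus(
    [list(seq) for seq in sequences],  # per-residue tokens
    add_special_tokens=True,
    padding=True,
    is_split_into_words=True,
    return_tensors="pt",
)

with torch.no_grad():
    embeddings = model(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
    ).last_hidden_state

print(embeddings.shape)  # (batch, sequence length + special tokens, embedding dim)
```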


Check out the Paper and GitHub. All credit for this research goes to the researchers on this project. Also, don't forget to join our Reddit page, Discord channel, and email newsletter, where we share the latest AI research news, cool AI projects, and more.



Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence from the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.



