I've just whipped one together and am currently testing it out. I won't have time tonight, but check back tomorrow and I should have it up by then. (I'll post a reply to my own comment here.)
"The human genome contains approximately 19,000 to 20,000 protein-coding genes. While each gene can initiate the production of at least one protein, the total count of distinct proteins is significantly higher. Estimates suggest the human body contains 80,000 to 400,000 different protein types, with some projections reaching up to a million, depending on how a “distinct protein” is defined."
Plus, that's just in the human DNA. In your body are a whole bunch of bacteria, adding even more types of protein.
> The actual number of protein molecules? Billions. Trillions if we're counting across all your cells.
(The linked-to piece later says "every single one of your 37 trillion cells", showing that "trillions" is far from the correct characterization. "trillions of trillions" would get the point across better.)
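A quick back-of-the-envelope check makes the point. The "37 trillion cells" figure is from the linked piece; the per-cell protein count (roughly 1e9 to 1e10 molecules per human cell) is a commonly cited ballpark, not something the article states:

```python
# Order-of-magnitude estimate of total protein molecules in a human body.
# The per-cell counts (~1e9 to 1e10) are an assumed ballpark, not a measured value.
cells = 37e12                     # "37 trillion cells" from the linked piece
total_low = cells * 1e9           # ~3.7e22 molecules
total_high = cells * 1e10         # ~3.7e23 molecules

trillion = 1e12
print(f"{total_low / trillion:.1e} to {total_high / trillion:.1e} trillions of molecules")
```

Either way you land ten or eleven orders of magnitude past "trillions", which is why "trillions of trillions" is closer to the right characterization.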
> Each one has a specific job.
Proteins can do multiple jobs, unless you define "job" as "whatever the protein does."
"many of the proteins or protein domains encoded by viruses are multifunctional. The transmembrane (TM) domains of Hepatitis C Virus envelope glycoprotein are extreme examples of such multifunctionality. Indeed, these TM domains bear ER retention signals, demonstrate signal function and are involved in E1:E2 heterodimerization (Cocquerel et al. 1999; Cocquerel et al. 1998; Cocquerel et al. 2000). All these functions are partially overlapped and present in the sequence of <30 amino acids"
> And if even ONE type folds wrong, one could get ... sickle cell anemia
Sickle cell anemia is due to a mutation in the hemoglobin gene causing a hydrophobic patch to appear on the surface, which causes the hemoglobins to stick to each other.
"Errors are more frequent during protein synthesis, resulting either from misacylation of tRNAs or from tRNA selection errors that cause insertion of an incorrect amino acid (misreading), shifting out of the normal reading frame (frameshifting), or spontaneous release of the peptidyl-tRNA (drop-off) (Kurland et al. 1996). Misreading errors are arguably the most common translational errors (Kramer and Farabaugh 2007; Kramer et al. 2010; Yadavalli and Ibba 2012)."
> Then AI companies showed up in 2020 and said "we got this" and solved it in an afternoon.
"The first protein successfully designed completely de novo was done by Stephen Mayo and coworkers in 1997 ... Later, in 2008, Baker's group computationally designed enzymes for two different reactions.[7] In 2010, one of the most powerful broadly neutralizing antibodies was isolated from patient serum using a computationally designed protein probe.[8] In 2024, Baker received one half of the Nobel Prize in Chemistry for his advancement of computational protein design, with the other half being shared by Demis Hassabis and John Jumper of Deepmind for protein structure prediction."
> These are called secondary structures, local patterns in the protein backbone
The corresponding figure is really messed up. The sequence of atoms in the amino acids is wrong, and the pairs of atoms that are hydrogen-bonded are wrong. For example, it shows a hydrogen bond between two double-bonded oxygens, which don't have a hydrogen, and a hydrogen bond between two hydrogens, which would both carry a partial positive charge. The hydrogen bonds are supposed to go from the N-H to the O=C. See https://en.wikipedia.org/wiki/Beta_sheet#Hydrogen_bonding_pa...
> Given the same sequence, you get the same structure.
The structure may depend on environmental factors. For example, https://en.wikipedia.org/wiki/%CE%91-Lactalbumin "α-lactalbumin is a protein that regulates the production of lactose in the milk of almost all mammalian species ... A folding variant of human α-lactalbumin that may form in acidic environments such as the stomach, called HAMLET, probably induces apoptosis in tumor and immature cells."
There can also be post-translational modifications.
> The sequence contains all the instructions needed to fold into the correct shape.
Assuming you know the folding environment.
> Change the shape even slightly, and the protein stops working.
I don't know how to interpret this. Some proteins require changing their shape to work. Myosin, a muscle protein, changes its shape during its power stroke.
> Prions are misfolded proteins that can convert normal proteins into the misfolded form, spreading like an infection
Earlier the author wrote "It's deterministic (mostly, there are exceptions called intrinsically disordered proteins, but let's not go there)."
https://en.wikipedia.org/wiki/Prion says "Prions are a type of intrinsically disordered protein that continuously changes conformation unless bound to a specific partner, such as another protein."
So the author went there. :)
Either accept that proteins aren't always deterministically folded based on their sequence, or don't use prions as an example of misfolding.
Physical insights are built into the network structure, not just a process around it
- End-to-end system directly producing a structure instead of inter-residue distances
- Inductive biases reflect our knowledge of protein physics and geometry
- The positions of residues in the sequence are de-emphasized
- Instead residues that are close in the folded protein need to communicate
- The network iteratively learns a graph of which residues are close, while reasoning
over this implicit graph as it is being built
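A toy numpy sketch of that last idea (purely illustrative, not AlphaFold's actual architecture): keep a soft "closeness" matrix over residue pairs, let residues aggregate features from residues the current graph says are near them, then update the graph from the refined features, so graph and features are learned together:

```python
import numpy as np

rng = np.random.default_rng(0)
n_res, d = 8, 4                          # toy protein: 8 residues, 4-dim features
feats = rng.normal(size=(n_res, d))      # per-residue features
pair = rng.normal(size=(n_res, n_res))   # logits for "residue i is close to j"

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

for _ in range(3):  # a few refinement iterations
    adj = softmax(pair, axis=-1)         # soft graph of which residues are close
    feats = feats + adj @ feats          # residues communicate along the graph
    feats = feats / np.linalg.norm(feats, axis=-1, keepdims=True)  # keep scale stable
    pair = pair + (feats @ feats.T) / np.sqrt(d)  # refine graph from features

print(feats.shape, pair.shape)           # (8, 4) (8, 8)
```

The real system does something far richer (attention, triangle updates, etc.), but this is the basic shape of "reasoning over an implicit graph as it is being built."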
What went badly:
- Manual work required to get a very high-quality Orf8 prediction
- Genetics search works much better on full sequences than individual domains
- Final relaxation required to remove stereochemical violations
What went well:
- Building the full pipeline as a single end-to-end deep learning system
- Building physical and geometric notions into the architecture instead of a search process
- Models that predict their own accuracy can be used for model-ranking
- Using model uncertainty as a signal to improve our methods (e.g. training new models to
eliminate problems with long chains)
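The ranking trick in those last two bullets is simple to sketch (the candidate names and scores below are hypothetical, not real AlphaFold output): each candidate model emits a structure plus its own predicted accuracy, and you keep the candidate with the highest self-reported score.

```python
# Hypothetical (name, self-predicted accuracy) pairs from an ensemble of models.
# In AlphaFold the self-predicted score is a per-residue confidence (pLDDT-like);
# a single scalar per candidate is used here for illustration.
candidates = [
    ("model_1", 72.4),
    ("model_2", 91.3),
    ("model_3", 85.0),
]

best_name, best_score = max(candidates, key=lambda c: c[1])
print(best_name)  # model_2
```

The same self-predicted score also flags systematic failure modes (e.g. consistently low confidence on long chains), which is what "using model uncertainty as a signal to improve our methods" refers to.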
====
Also you can read the papers, e.g. https://www.nature.com/articles/s41586-019-1923-7 (available if you search the title on Google Scholar; also https://www.nature.com/articles/s41586-021-03819-2_reference...). There is actual, genuinely good science, physics, and engineering going on here, compared to e.g. LLMs or computer vision models that are just trained on the internet, where all the engineering is focused on managing finicky training and compute costs. AlphaFold requires all this and more.
EDIT: Basically, the article makes it sound like deep models just allowed scientists to sidestep all the complicated physics and magically solve the problem. While that is arguably somewhat correct for computer vision and much of NLP, it is the exact opposite of the truth for AlphaFold.
Is there a way to run these Omni models on a Macbook quantized via GGUF or MLX? I know I can run it in LMStudio or Llama.cpp but they don't have streaming microphone support or streaming webcam support.
Qwen usually provides example code in Python that requires Cuda and a non-quantized model. I wonder if there is by now a good open source project to support this use case?
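For the streaming part, the missing piece in most example scripts is just a chunked capture loop. A minimal sketch of that loop, simulating the microphone with a generated waveform so it runs anywhere; the real version would pull chunks from an audio capture library and feed them to the model instead of the stub below:

```python
import numpy as np

SAMPLE_RATE = 16_000
CHUNK_SECONDS = 0.5
CHUNK_SIZE = int(SAMPLE_RATE * CHUNK_SECONDS)

# Stand-in for a live microphone: 3 seconds of a 440 Hz tone.
t = np.arange(3 * SAMPLE_RATE) / SAMPLE_RATE
waveform = np.sin(2 * np.pi * 440 * t).astype(np.float32)

def mic_chunks(signal, chunk_size):
    """Yield fixed-size chunks, as a streaming capture callback would."""
    for start in range(0, len(signal) - chunk_size + 1, chunk_size):
        yield signal[start:start + chunk_size]

def fake_model_step(chunk):
    """Stub for feeding one chunk to an Omni model; reports chunk loudness."""
    return f"rms={np.sqrt(np.mean(chunk ** 2)):.2f}"

outputs = [fake_model_step(c) for c in mic_chunks(waveform, CHUNK_SIZE)]
print(len(outputs))  # 6 chunks of 0.5 s from 3 s of audio
```

The hard part any such open-source project has to solve is the model-side streaming interface, not the capture loop; the loop itself is this simple.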
I oftentimes run a Linux desktop fullscreen in a VM on macOS; macOS acts like a hardware abstraction layer in that case. Depending on the task and the tools, I sometimes prefer this option. (I do like the macOS UI, current version aside; I just like to use the right tool for the job.)
Hey Marcin :) Really dig this. A much friendlier alternative to Swift development than having to use the monstrosity that is Xcode. Especially for people that want to get something done quickly. The Linux support is also really cool.
The Xcode-alternatives market is really starved, so I'm very happy this exists. It's possible to configure VSCode to support Swift, but that's a lot of configuration and messing around.
> but that's a lot of configuration and messing around.
Maybe I’m missing something but the last time I did this I clicked “install” on the official Swift VSCode extension and that was it. Not a lot of messing around needed, for me at least!
You're thinking of QuickTime 7, that can be optionally installed (as a separate app) even on macOS 10.14 Mojave! But the website is referring to versions of QuickTime X. QuickTime 10.2, which was included with Mountain Lion, was the last to support third-party components. (If you've ever used "Perian", that's what I'm referring to.)