We actually just wrote a book with your profile in mind -- especially if by "AI" you mean LLMs in particular and you're a visual learner. It's called Hands-On Large Language Models and it contains 300 original figures explaining the couple hundred main intuitions and applications behind these models. You can also read it online on the O'Reilly platform. I find that after acquiring the main intuitions, people find it much easier to move on to code implementations or papers.
This is my sense as well. Text generation LLMs haven't been the best source of embeddings for other downstream use cases. If you're optimizing for token embeddings (e.g., for NER, span detection, or token classification tasks), then a token-level training objective is important. If you need text-level embeddings (e.g., for semantic search or text classification), then a text-level training objective is what's needed (e.g., what Sentence-BERT did to optimize BERT embeddings for semantic search).
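To make that distinction concrete, here's a rough sketch of both kinds of embeddings pulled from an encoder (the model choice and mean pooling here are illustrative assumptions, not what any particular production system does):

```python
# Rough sketch: token embeddings vs. a naive text-level embedding.
# Model choice and mean pooling are illustrative assumptions only.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Jay lives in Amsterdam", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Token embeddings: one vector per token -- what a NER or
# token-classification head would consume.
token_embeddings = outputs.last_hidden_state              # (1, seq_len, 768)

# A naive text-level embedding: mean-pool the token vectors.
# Sentence-BERT's contribution was fine-tuning with a sentence-level
# objective so that pooled vectors like this work well for semantic search.
mask = inputs["attention_mask"].unsqueeze(-1)              # ignore padding
text_embedding = (token_embeddings * mask).sum(1) / mask.sum(1)   # (1, 768)
```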
SGPT is a high-performing text embedding model adapted from a decoder. Using the same techniques with Llama-2 might perform better than you expect. I think someone will need to try these things before we know for certain. I believe there is still room for significant improvement with embedding models.
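For context, SGPT's core trick is (roughly) a position-weighted mean pooling over the decoder's hidden states, so that later tokens, which have seen more of the context under causal attention, contribute more. A rough sketch of that idea, with the model name as a stand-in and SGPT's contrastive fine-tuning step omitted:

```python
# Rough sketch of SGPT-style position-weighted mean pooling over a
# decoder's hidden states. "gpt2" is a placeholder model; SGPT also
# fine-tunes with a contrastive objective, which this sketch omits.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

inputs = tokenizer("How far is the moon from the earth?", return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state            # (1, seq_len, dim)

seq_len = hidden.shape[1]
# Weight each position by its index: later tokens have attended to more
# of the context, so they get larger weights.
weights = torch.arange(1, seq_len + 1, dtype=hidden.dtype)
weights = weights / weights.sum()                          # (seq_len,)
embedding = (hidden * weights[None, :, None]).sum(dim=1)   # (1, dim)
```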
This is a field I find fascinating. It's generally the research field of Machine Learning Interpretability. The BlackboxNLP workshop is one of the main places for investigating this and is a very popular academic workshop https://blackboxnlp.github.io/
One of the most interesting presentations in the last session of the workshop is this talk by David Bau titled "Direct Model Editing and Mechanistic Interpretability". David and his team locate specific facts inside the model and edit them. For example, they edit the location of the Eiffel Tower to be Rome, so whenever the model generates anything involving its location (e.g., the view from the top of the tower), it actually describes Rome.
There is also work on "Probing" the representation vectors inside the model and investigating what information is encoded at the various layers. One early Transformer explainability paper, "BERT Rediscovers the Classical NLP Pipeline" (https://arxiv.org/abs/1905.05950), found that "the model represents the steps of the traditional NLP pipeline in an interpretable and localizable way: POS tagging, parsing, NER, semantic roles, then coreference". Meaning that the representations in the earlier layers encode things like whether a token is a verb or noun, and later layers encode other, higher-level information. I've made an intro to these probing methods here: https://www.youtube.com/watch?v=HJn-OTNLnoE
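If you want a feel for the probing setup itself, the recipe is: freeze the model, take hidden states from one layer, and fit a small classifier on top to predict a linguistic property. A bare-bones sketch (the layer index, toy sentences, and tags below are placeholders, not a real evaluation):

```python
# Bare-bones probing sketch: fit a linear classifier on frozen hidden
# states from one layer to see how much POS information that layer encodes.
# Layer index, toy sentences, and tags are placeholder assumptions.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

sentences = ["The cat sleeps", "Dogs bark loudly"]         # toy corpus
pos_tags  = [["DET", "NOUN", "VERB"], ["NOUN", "VERB", "ADV"]]

layer = 6                          # probe one mid-stack layer
features, labels = [], []
for sent, tags in zip(sentences, pos_tags):
    enc = tokenizer(sent, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).hidden_states[layer][0]      # (seq_len, dim)
    for tok_idx, word_idx in enumerate(enc.word_ids()):
        if word_idx is not None:                           # skip [CLS]/[SEP]
            features.append(hidden[tok_idx].numpy())
            labels.append(tags[word_idx])

probe = LogisticRegression(max_iter=1000).fit(features, labels)
# Higher probe accuracy at a layer suggests that layer encodes more POS info.
```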
A lot of applied work doesn't require interpretability and explainability at the moment, but I suspect the interest will continue to increase.
I wasn't aware of that BERT explainability paper - will be reading it, and watching your video.
Are there any more recent Transformer Explainability papers that you would recommend - maybe ones that build on this and look at what's going on in later layers?
The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns via Spotlights of Attention https://arxiv.org/abs/2202.05798
Another piece of the puzzle seems to be transformer "induction heads" where attention heads in consecutive layers work together to provide a mechanism that is believed to be responsible for much of in-context learning. The idea is that earlier instances of a token pattern/sequence in the context are used to predict the continuation of a similar pattern later on.
In the simplest case this is a copying operation, such that an early occurrence of AB predicts that a later A should be followed by B. In the more general case this becomes A'B' => AB, which seems to be more of an analogy-type relationship.
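A toy way to see the pattern (this mimics the behavior the heads are believed to implement, not the attention computation itself):

```python
# Toy illustration of the pattern an induction head implements:
# find the previous occurrence of the current token and predict the
# token that followed it ("AB ... A -> B"). This mimics the behavior,
# not the underlying attention circuit.
def induction_guess(tokens):
    current = tokens[-1]
    # Walk backwards through the context looking for an earlier "A".
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]       # ...and copy the "B" that followed it
    return None

context = ["Mr", "D", "##urs", "##ley", "was", "thin", ".", "Mr", "D"]
print(induction_guess(context))        # -> "##urs"
```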
The landing page is technically the course overview. I'd love to hear what you think would've made it more engaging for you. We can probably pull some of the visuals up into it as a preview. Let me see what we can do on that front.
I know this course is meant to be content marketing for your company, and I don't mean this in a derogatory sense--you're doing content marketing right by providing high quality information to an audience who could be interested in your services--but it's a bit odd that going to https://llm.university unceremoniously drops me in the middle of what looks to be your product's documentation. That's not wrong necessarily, but it all adds up to arriving at a site and having a feeling of "what am I looking at?".
Like the grandparent comment mentioned, the pitch is "visual, intuitive explanations", but I don't see that on the landing page. I'm looking for a way to get to the start of your content, but the top and left hand menus don't help and are, if anything, confusing until I realize that I'm now inside of a larger set of documentation unrelated to the course.
Below the fold we see a "Let's get started!", but the link I see, "Structure of the Course", doesn't sound like getting started. It sounds like more front matter. From the nav menu I see that after that I still won't get to the content, but instead a page about the instructors. Do I really need to read blurbs about the instructors before I get to the meat of the course?
It just feels like too much wrapping paper and packaging to get to the good stuff--and it really does seem like good stuff! And I think the way that you've embedded this course into the rest of your documentation prevents you from presenting it in a structure that is more familiar and easy to navigate (e.g. an 'About' link at the top that talks about the instructors and Cohere).
It might be frustrating to put a lot of time and effort into high quality materials, only for people to not want to spend a few minutes looking around, but from the audience perspective, there's a sea of LLM-related content out there. I want to quickly determine if this is worth adding to my already-too-long list of LLM related bookmarks of things I want to read.
Is your goal to make it feel like a typical university course or like something else?
If like a typical university course, start with a syllabus and a course description and all the logistics.
If like something else then the first 10 seconds of the experience should make people go "Oh, this is different."
What's happening in the first second, 30 seconds, 1 minute, 10 minutes, etc. that is reflective of the rest of my experience and will serve as an advance organizer for what's to follow?
The very first graphic I see is labeled as a "quiz" and requires me to read a bunch of surrounding text to make sense of it.
That's the vibe: a promise of something visual and intuitive, first consummated by a long syllabus and a quiz.
The goal is to make the materials as accessible as possible. So we're definitely not limited to the structure of a typical university course and are happy to iterate on it.
I appreciate you elaborating on your feedback. Thank you.
The landing page should probably be more of a marketing page explaining why the course is worth checking out, with low information density and large visuals
I'm the author of https://jalammar.github.io/illustrated-transformer/ and have spent the years since introducing people to Transformers and thinking about how best to communicate those concepts. I've found that different people need different kinds of introductions, and this thread includes some of the often-cited resources.
Looking back at The Illustrated Transformer, when I introduce people to the topic now, I find I can hide some complexity by omitting the full encoder-decoder architecture and focusing on just one of the two stacks. Decoders are great for this because a lot of people now come to Transformers having heard of GPT models (which are decoder-only). So my canonical intro to Transformers now only touches on a decoder model. You can see this narrative here: https://www.youtube.com/watch?v=MQnJZuBGmSQ
There's a lot you can do with the vectors themselves without needing to embed any more text (e.g., clustering, exploration, visualization after dimensionality reduction, etc.). Here's a previous embeddings exploration of top HN posts: https://txt.cohere.com/combing-for-insight-in-10-000-hacker-... A lot of that code can be used here as well.
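As a rough sketch of that workflow (the embeddings array and titles file below are placeholders for whatever you've already embedded):

```python
# Minimal sketch: cluster precomputed text embeddings and project them to
# 2-D for plotting. "embeddings.npy" and "titles.txt" are placeholders
# for whatever you've already embedded.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

embeddings = np.load("embeddings.npy")        # (n_texts, dim) -- placeholder
titles = open("titles.txt").read().splitlines()

# Group similar texts together without embedding anything new.
clusters = KMeans(n_clusters=8, random_state=0).fit_predict(embeddings)

# Reduce to 2-D for a quick visual exploration (UMAP is a common
# alternative to PCA here).
coords = PCA(n_components=2).fit_transform(embeddings)

plt.scatter(coords[:, 0], coords[:, 1], c=clusters, s=5)
plt.title("Clusters of embedded texts")
plt.show()
```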
If you want to query for a search term, you can use a trial API key which is free to use for prototyping. The embedding model itself is not open source, though. [co-author of the post here]
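A rough sketch of what querying those precomputed embeddings could look like (the Cohere client call here is from memory, so check the current SDK docs for the exact model name and signature; the stored embeddings are assumed to come from the same model as the query embedding):

```python
# Sketch of querying precomputed embeddings with a search term.
# The Cohere client call is written from memory -- verify the exact
# signature and model name against the current SDK docs.
import numpy as np
import cohere

co = cohere.Client("YOUR_TRIAL_API_KEY")
query_embedding = np.array(co.embed(texts=["show hn projects about music"]).embeddings[0])

post_embeddings = np.load("embeddings.npy")     # (n_posts, dim), precomputed
titles = open("titles.txt").read().splitlines()

# Cosine similarity between the query and every stored post.
sims = post_embeddings @ query_embedding
sims /= np.linalg.norm(post_embeddings, axis=1) * np.linalg.norm(query_embedding)

for idx in np.argsort(-sims)[:5]:
    print(f"{sims[idx]:.3f}  {titles[idx]}")
```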
Cohere actually trains its own models and they are not based on models from other providers [I work at Cohere].
Your prompt suggestion is a good one for LLMs as a whole. Any information added to the context informs the model and nudges it towards the expected answer format.
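For example (a made-up illustration), even one formatted example in the context tends to lock the model into that answer format:

```python
# Made-up illustration: a single in-context example nudges the model
# toward the expected answer format (here, a JSON object).
prompt = """Extract the city and country from the sentence.

Sentence: The Eiffel Tower draws millions of visitors to Paris every year.
Answer: {"city": "Paris", "country": "France"}

Sentence: Fans gathered outside the stadium in Buenos Aires.
Answer:"""

# Any LLM completion API can be called with this prompt; the added example
# both informs the model and constrains the shape of its answer.
```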
Oh sorry, I got mixed up about who runs what model. I thought Cohere's was Claude, but that's Anthropic. I wasn't trying to say it was based on another.