What is the implementation difference between using the system WebView (fragmented, especially bad under Linux) and using one shared Tauri base runtime that only gets breaking-change updates every two years or so, so there aren't twenty different ones running at the same time and it ends up like Electron?
Would bundling one extended-support release of Chromium's or Firefox's backend, shared between all Tauri apps, not suffice?
They mention FSR specifically in the trailer, but this comes with RDNA3, meaning no FSR4 currently. Does this mean the INT8 path for FSR4 is going to become official to support this and the PS5 Pro?
Now for speculation on top of speculation on top of speculation: Valve's next VR headset, the Deckard / Steam Frame, is also rumored to be using an ARM chip, and with Valve being quite close with AMD since the Steam Deck's custom APU (although that one was apparently just something originally intended for Magic Leap before that fell apart), this chip could end up in there and be powerful enough to run standalone VR.
Is there anything preventing them from using heterogeneous memory chips, like 1/4 GDDR7 and 3/4 LPDDR? It could enable new MoE-like architectures with finer-grained performance tuning for long contexts.
Rumor has it (according to MLID, so no one knows whether it's accurate) that AMD is also looking to use regular LPDDR memory for some of its lower-end next-gen GPUs so it doesn't have to contend with Nvidia over limited, cartel-controlled GDDR7 supply. Maybe they're going to increase parallel bandwidth to compensate? Or they have wholly different tricks up their sleeve.
LPDDR5X really just means LPDDR5 running faster than the original speed of 6400 MT/s. Absent any information about which higher speed they'll be using, this correction doesn't add anything to the discussion. Nobody would expect even Intel to use 6400 MT/s for a product that far in the future; where they'll land on the spectrum from 8533 MT/s to 10700 MT/s is just a matter of speculation at the moment.
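For a rough sense of what those transfer rates mean, here's a back-of-the-envelope calculation; the 128-bit bus width is just an assumed value for illustration, not anything known about the product.

```python
# Per-channel bandwidth arithmetic for the speculated speed range.
# GB/s = transfers per second * bits per transfer / 8 bits per byte.
def bandwidth_gbs(mt_per_s: int, bus_width_bits: int = 128) -> float:
    return mt_per_s * 1e6 * bus_width_bits / 8 / 1e9

for speed in (6400, 8533, 10700):
    print(f"{speed} MT/s -> {bandwidth_gbs(speed):.1f} GB/s")
# 6400 MT/s -> 102.4 GB/s, 8533 -> 136.5 GB/s, 10700 -> 171.2 GB/s
```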
Xe3P, as far as I remember, is built in Intel's own fabs, as opposed to Xe3 at TSMC. That could give them a huge advantage as possibly the only competitor not competing for the same TSMC wafers.
One thing I don't get about the ever-recurring RAG discussions and the hype men proclaiming "RAG is dead" is that people seem to be talking about wholly different things.
My mental model is that what is called RAG can either be:
- a predefined document store / document chunk store where every chunk gets a vector embedding, and a lookup decides what gets pulled into context, so you don't have to pull in whole classes of documents and fill it up (roughly the setup in the sketch after this list)
- the web-search-like features in LLM chat interfaces, where they do a keyword search and pull relevant documents into context, but somehow only ephemerally, with the full documents not taking up context later in the thread (unsure about this, did I understand it right?).
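For concreteness, a minimal sketch of that first sense of RAG: chunks go into an index with one embedding each, and a similarity lookup decides which ones get pulled into context. The embed() function here is a toy bag-of-words stand-in for a real embedding model, and the chunks are made up.

```python
import math

def embed(text: str) -> dict[str, float]:
    # Toy "embedding": bag-of-words term frequencies. A real system would
    # call an embedding model and get a dense vector instead.
    counts: dict[str, float] = {}
    for token in text.lower().split():
        counts[token] = counts.get(token, 0.0) + 1.0
    return counts

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

chunks = [
    "LPDDR5X runs the LPDDR5 protocol at speeds above 6400 MT/s.",
    "GDDR7 is aimed at high-bandwidth discrete GPU memory.",
    "Vector databases index embeddings for nearest-neighbour search.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(query: str, k: int = 2) -> list[str]:
    # Only the top-k chunks get pulled into the prompt, not whole documents.
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

print(retrieve("what is LPDDR5X memory?"))
```

A real system swaps in a proper embedding model and an approximate-nearest-neighbour index, but the control flow is the same.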
With the new models with million-plus-token context windows, some were arguing that we can just throw whole books into the context non-ephemerally, but doesn't that significantly reduce the diversity of possible sources we can include at once if we hard-commit to everything staying in context forever? I guess it might help with consistency? But isn't the mechanism by which we decide what to keep in context still some kind of RAG, just with larger chunks of whole documents instead of only parts?
I'd be ecstatic if someone who really knows their stuff could clear this up for me.
Technically, RAG is anything that augments generation with external search. However, it often has a narrower meaning: "uses a vector DB."
Throwing everything into one large context window is often impractical - it takes much more time to process, and many models struggle to find information accurately if too much is going on in the context window ("lost in the middle").
The "classic" RAG still has its place when you want low latency (or you're limited by VRAM) and the results are already good enough.
We can't throw infinite things into the context, though.
My impression is that GPT-5 gets confused not quite right away, but after a couple of pages it has no idea; it doesn't take pages upon pages before it forgets things.
I’m currently experimenting with prompts of ~300k tokens for a certain classification task and I think I might be able to make it work. GPT5 chokes but Gemini 2.5 Pro is showing promise. Jury’s still out and I might change my tune in a couple of weeks.
It should also be said that what I say here is focused on things where these models have problems.
For example, I consider the model confused when it starts outputting stereotyped or cliche responses, and I intentionally go at problems that I know the models struggle with (I already know they can program and do some maths, but I want to see what they can't do). But if you're using them for things they're made for, and which aren't confusing, such as people arguing with each other, then you're probably going to succeed.
Prompts with lots of examples are reasonable and I know they can get very long.
In both cases for question answering it's about similarity search, but there are two main orthogonal differences between RAG and non-RAG:
- Knowing the question at the time of index building
- Higher-order features: the ability to compare fetched documents with one another and refine the question
Non-RAG, aka a multi-layer (non-causal) transformer with infinite context, is the more generic version. It's fully differentiable, meaning you can use machine learning to learn how to non-RAG better. Each layer of the transformer can use the previous layer to reason and refine the similarity search. (A causal transformer knows the question at the time it is fed the question, and can choose to focus its attention on different parts of the previously computed features of the provided documents, but it may benefit from having some reflection tokens, or better: being given the question before being presented with the documents, provided you've trained it to answer like that.)
RAG is an approximation of the generic case to make it faster and cheaper. Usually it breaks end-to-end differentiability by using external tools, which means that if you want to use machine learning to learn how to RAG better, you'll need some variant of reinforcement learning, which is slower at learning things. RAG usually doesn't know the question at the time of index building, and documents are treated independently of each other, so there are no (automatic) higher-order features (embeddings are fixed).
A third usual approximation is to feed the output of RAG into non-RAG, to hopefully get the best of both worlds. You can learn the non-RAG given the RAG with machine learning (if you train it on some conversations where it used RAG), but the RAG part won't improve by itself.
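A minimal sketch of that third setup, assuming a retrieve(question, k) function like the chunk-store sketch earlier in the thread and a placeholder generate() standing in for any LLM call: retrieval picks a few chunks, and the model then reasons over them inside its prompt.

```python
def generate(prompt: str) -> str:
    # Placeholder for any LLM call; not a specific provider's API.
    return "[model answer would go here]"

def answer(question: str, retrieve) -> str:
    # The retrieval step is fixed (not differentiable); only the generation
    # side keeps improving end to end with training.
    sources = retrieve(question, k=3)
    prompt = (
        "Answer using only the sources below.\n\n"
        + "\n\n".join(f"Source {i + 1}:\n{s}" for i, s in enumerate(sources))
        + f"\n\nQuestion: {question}\nAnswer:"
    )
    return generate(prompt)
```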
Non-RAG needs to learn, so it needs a big training dataset, but fortunately it can pick up question-answer pairs in an unsupervised fashion when you feed it the whole web, and you only need a small instruction-tuning and preference-optimization dataset to shape it to your needs. If performance isn't what you expect in a specific case, you can provide more specific examples and retrain the model until it gets it and you get better performance for the case you were interested in. You can improve the best case, but it's hard to improve the worst case.
RAG gives you more control over what you feed it, but the content has to be structured more deliberately. You can prevent the worst cases more easily, but it's hard to improve the good case.
> My mental model is that what is called RAG can either be:
RAG is confusing, because if you look at the words making up the acronym, it seems like it could be either of the things you mentioned. But it originally referred to a specific technique of embeddings + vector search - that's the way it was used in the ML article that defined the term, and that's the way most people in the industry actually use it.
It annoys me, because I think it should refer to all techniques of augmenting, but in practice it's often not used that way.
There are reasons that specifically make the "embeddings" idea special - namely, it's a relatively new technique that fits LLMs very well, because it's a semantic search - meaning it works on "the same input" as LLMs do, which is a free-text query. (As opposed to traditional lookups that work on keyword search or similar.)
As for whether RAG is dead - if you mean specifically vector embeddings and semantic search, it's possible, because you could theoretically use other techniques for augmentation, e.g. an agent that understands a user's question about a codebase and uses grep/find/etc. to look for the information, or composes a query to search the internet for something. But it's definitely not going to die in that second sense of "we need some way to augment the LLM's knowledge before text generation"; that will probably always be relevant, as you say.
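As a rough sketch of that non-embedding flavour of augmentation, here's what a grep-style retrieval step over a codebase might look like; the file glob and keyword are made up for illustration.

```python
from pathlib import Path

def grep_repo(root: str, keyword: str, max_hits: int = 20) -> list[str]:
    # Plain substring search over source files; the matching lines, not the
    # whole repository, are what get appended to the prompt.
    hits: list[str] = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            lines = path.read_text(errors="ignore").splitlines()
        except OSError:
            continue
        for lineno, line in enumerate(lines, start=1):
            if keyword.lower() in line.lower():
                hits.append(f"{path}:{lineno}: {line.strip()}")
                if len(hits) >= max_hits:
                    return hits
    return hits

print(grep_repo(".", "retrieve"))
```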
Oh wow, I would not have caught that. I had a look at the first couple of pages, and as not-a-C-expert, it looked pretty solid to me. Readjusting our heuristics to generated slop (or even non-slop?) is gonna take so much more energy than before.
Although I've also been thinking about the overall role of effort in products, art, or any output really. Necessary effort to produce something is, or at least was, some indicator of quality: it meant the author spent a certain amount of time with the material, and probably didn't want to release something bad if they had to put in a certain threshold of effort anyway. With that gone, of course some people are going to get their productivity enhanced and use this tool to make even better things, more often. But having to expend even more energy as a consumer to find out whether something is worth it is incredibly hard.
Because all the content is taken from my personal notes (with more to come on building a search engine, vector database, and graph database in C and Go), in the last step I used an LLM for editing and fixing grammar and formulas (typing LaTeX by hand takes a lot of time). If you find the content to be just AI slop, I'm sorry for taking your time.
Oh wow, a lot of focus on code from the big labs recently. In hindsight it makes sense that the domain the people building it know best is the one getting the most attention, and it's also the one where the models have shown the most undeniable usefulness so far. Though personally, the unpredictability of where all of this is going is a bit unsettling at the same time...
Along with developers wanting to build tools for developers like you said, I think code is a particularly good use case for LLMs (large language models), since the output product is a language.
It's because the output is testable. If the model outputs a legal opinion or medical advice, a human needs to be looped in to verify that the advice is not batshit insane. Meanwhile, if the output is code, it can be run through a compiler and (unit) tests to verify that the generated code is cromulent without a human being in the loop for 100% of it, which means the supercomputer can just go off and do its thing with less supervision.
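A minimal sketch of that loop, assuming a hypothetical generate_patch() model call and pytest as the test runner: generated code is only accepted when the tests pass, and failures get fed back as context for the next attempt.

```python
import pathlib
import subprocess

def run_tests(workdir: str) -> tuple[bool, str]:
    # Any test runner works; pytest is just one common choice.
    result = subprocess.run(
        ["python", "-m", "pytest", "-q"],
        cwd=workdir, capture_output=True, text=True,
    )
    return result.returncode == 0, result.stdout + result.stderr

def attempt_fix(generate_patch, workdir: str, max_rounds: int = 3) -> bool:
    feedback = ""
    for _ in range(max_rounds):
        code = generate_patch(feedback)                  # model proposes code
        (pathlib.Path(workdir) / "solution.py").write_text(code)
        passed, output = run_tests(workdir)              # tests act as the judge
        if passed:
            return True
        feedback = output                                # loop the failure back in
    return False
```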
Thing is, though, if you are good at code it solves many other adjacent tasks for LLMs, like formatting docs for output, presentations, spreadsheet analysis, data crawling, etc.
Congrats! You’re now on the p(doom)-aware path. People have been concerned for decades and are properly scared today. That doesn’t stop the tools from being useful, though, so enjoy while the golden age lasts.