It started with friends introducing me to anime in high school (with English subtitles), which I got hooked on. Then I got into the music as well, and later into VTubers (so no subtitles when watching live). I haven't ever really been into other entertainment, so for a little over a decade I've been listening to Japanese on a near-daily basis.
I know it's a meme for people to claim to know Japanese from watching anime, which is why I don't claim to be able to speak it, but over time I did pick up enough that I don't need subtitles anymore. I'm slowly working on reading with practice books, WaniKani, etc., and will eventually figure out some way to practice speaking too.
Web search is not a capability of a “bare” LLM, but in an LLM-based system it can be done by giving the LLM access to a “web search tool”, i.e. you instruct it to output specific structured text (typically JSON, but it doesn’t have to be) indicating its “intent” to search. Your wrapper intercepts/detects this structured response, does the actual search, returns the results (e.g. snippets from the top k results) into the context of the LLM, and has it use these to respond to your question.
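To make the loop concrete, here's a minimal sketch of what the wrapper does. The JSON schema, the `web_search` tool name, and the `fake_search` function are all made up for illustration; a real system would call an actual search API and then feed the snippets back to the LLM for a second generation step.

```python
import json

def fake_search(query, k=3):
    # Stand-in for a real search API: returns snippets from the top-k results.
    return [f"snippet {i} about {query!r}" for i in range(1, k + 1)]

def handle_llm_output(text):
    """Detect a structured 'tool intent' in the LLM's output.
    If found, run the tool and return its results (which would be
    appended to the LLM's context for a follow-up call);
    otherwise treat the output as a plain answer and pass it through."""
    try:
        msg = json.loads(text)
    except json.JSONDecodeError:
        return text  # plain-text answer, no tool call detected
    if isinstance(msg, dict) and msg.get("tool") == "web_search":
        return "\n".join(fake_search(msg["query"]))
    return text

# Example: the LLM emitted a tool intent instead of an answer.
print(handle_llm_output('{"tool": "web_search", "query": "capital of France"}'))
```

In practice the detection is fussier than a bare `json.loads` (models wrap JSON in prose or code fences), but the control flow is the same: parse, dispatch, inject results, re-prompt.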
A similar thing can be done with external documents - your wrapper retrieves docs/fragments relevant to the query, puts them in the context and lets the LLM use them to answer the query. This is called Retrieval Augmented Generation (RAG).
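A toy version of the retrieval step might look like the following. The corpus and query are invented, and word-overlap scoring stands in for what a real system would do with embeddings and a vector store; the point is just the shape: retrieve relevant fragments, stuff them into the prompt, let the LLM answer.

```python
# Hypothetical document fragments the wrapper can draw on.
docs = [
    "Langroid is a multi-agent LLM framework.",
    "RAG retrieves relevant fragments and puts them in the LLM context.",
    "The capital of France is Paris.",
]

def retrieve(query, docs, k=2):
    # Crude relevance score: number of shared lowercase words.
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query, docs):
    # Put the retrieved fragments into the context ahead of the question.
    context = "\n".join(retrieve(query, docs))
    return f"Use the context to answer.\nContext:\n{context}\nQuestion: {query}"

print(build_prompt("what does RAG do with relevant fragments", docs))
```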
The above is a highly simplified description.
In the Langroid library (a multi-agent framework from ex-CMU/UW-Madison researchers) we have these and more. For example here’s a script that combines web search and RAG:
I don't know about ML, but if you want to learn applied stats I would look up Andrew Gelman's books, or one of the newer books on Bayesian inference using Stan, and do them cover to cover.
Lately I've been using ChatGPT as my first choice when searching for something, then googling if I'm not sure whether GPT is hallucinating. Are they able to sustain the load?