I don't think this is an inherent problem of having a compiled dynamic (in other words, JITted) language.
Things like JavaScript (V8) and Lua (LuaJIT) manage to have fast startup times while still delivering exceptional performance in hot paths. This is because they have a fast bytecode interpreter that executes the script immediately while the actual compilation takes place in the background. Unfortunately, Julia in its current state doesn't have a fallback interpreter for its bytecode/IR (which is similar to LLVM IR). And LLVM IR isn't really suited for fast execution in an interpreter: it's designed more as an intermediate representation for a heavyweight compiler than as a bytecode for a dynamic language.
Maybe some heroic figure will come out of the fog and suddenly bring a whole new bytecode representation and a fast interpreter for Julia, but that would be quite a project.
Julia does have a minimal compilation path with an interpreter. You can even configure this on a per-module basis, which I believe some of the plotting packages do to reduce latency. There is even a JIT-style dynamic compiler which works similarly to the VMs you listed: https://github.com/tisztamo/Catwalk.jl/.
IMO, the bigger issue is one of predictability and control. Some users may not care about latency at all, whereas others have it as a primary concern. JS and related runtimes don't give you much control over when optimization happens and are thus black boxes, whereas Julia has known semantics around it. I think fine-grained tools to externally control optimization behaviour for certain modules (in addition to the current global CLI options and per-package opt-ins) would go a long way towards addressing this.
You're right: I've done some more research, and there does seem to be an interpreter in the compiler: https://github.com/JuliaDebug/JuliaInterpreter.jl. It's only enabled by explicitly adding annotations in your code, and is mainly used for the internal debugger, but it's still there.
Still, it seems to execute the internal SSA IR in its raw form (which is geared more towards compilation than dynamic execution in a VM). I was talking about a conventional bytecode interpreter (which you can optimize the hell out of, like LuaJIT did). A bytecode format carefully designed for fast execution (in either a stack-based or register-based VM) would be much better for an interpreter, but I'm not sure whether Julia's language semantics / object model would allow it. Maybe some intelligent people out there can make the whole thing work, is what I was trying to say.
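To make the "conventional bytecode interpreter" idea concrete, here is a toy stack-based VM sketch. The opcodes and encoding are entirely made up for illustration; they have nothing to do with Julia's actual IR or any real VM design:

```julia
# Toy stack-based bytecode VM. Each instruction is an (opcode, operand) pair.
# Hypothetical opcodes: :push loads a constant, :add and :mul pop two values
# and push the result. A real design would use a compact binary encoding.
function run_vm(program)
    stack = Float64[]
    for (op, arg) in program
        if op == :push
            push!(stack, arg)
        elseif op == :add
            b = pop!(stack); a = pop!(stack)
            push!(stack, a + b)
        elseif op == :mul
            b = pop!(stack); a = pop!(stack)
            push!(stack, a * b)
        else
            error("unknown opcode: $op")
        end
    end
    return pop!(stack)
end

# Bytecode for (2 + 3) * 4
program = [(:push, 2.0), (:push, 3.0), (:add, 0.0), (:push, 4.0), (:mul, 0.0)]
run_vm(program)  # 20.0
```

The point of such a format is that dispatch per instruction is trivial and cache-friendly, which is what makes interpreters like LuaJIT's so fast before the JIT even kicks in.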
The naming is unfortunate, but JuliaInterpreter is not the built-in one; it's a separate package for use in external tooling. The built-in one can be run globally via the --compile=min CLI flag. Likewise, you can pass -O0 through -O3 to configure the level of optimization (which, predictably, affects latency).
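For instance (a usage sketch of the flags just described; script.jl is a placeholder):

```shell
# Run with the built-in interpreter instead of compiling everything:
julia --compile=min script.jl

# Lower the optimization level to trade runtime speed for lower latency:
julia -O0 script.jl
```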
As for IR representation, I'm not aware of any limitations of the IR (remember, there are multiple levels) over a LuaJIT-style bytecode for interpretation performance. After all, the Futamura projections tell us that a compiler is really an interpreter that's undergone some partial application. Of course, that's a theoretical correspondence that has little bearing on real-world performance, but I don't think you can confidently say that Julia's lowered or unlowered IR forms are fundamentally bad for fast interpretation.
You can set optimization per module with `Base.Experimental.@optlevel`, though I'm not finding any documentation for it. Could swear it was in a release note.
help?> Base.Experimental.@optlevel
Experimental.@optlevel n::Int
Set the optimization level (equivalent to the -O command line argument) for code in
the current module. Submodules inherit the setting of their parent module.
Supported values are 0, 1, 2, and 3.
The effective optimization level is the minimum of that specified on the command line
and in per-module settings.
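Based on that docstring, usage looks something like this (a sketch; the module and function names are made up):

```julia
module MyPlots

# Disable optimization for this module to reduce compile-time latency.
# Per the docstring, submodules inherit this setting, and the effective
# level is the minimum of this and the -O command line argument.
Base.Experimental.@optlevel 0

# Latency-sensitive code that doesn't need optimized machine code:
draw_axes(xs) = xs .+ 1

end
```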
Inside the Julia compiler/runtime there is an interpreter, because Julia uses a heuristic to decide whether to compile or interpret a function. There is also interpreter code in the Julia debugger. I don't know how full-featured they are, but one does not have to start from scratch.
On the other hand, implementing a tracing JIT for Julia is going to be such a big task, I am not sure how much help existing interpreters are going to be. At the very least there needs to be a new GC, which necessitates changes everywhere except the parser. LLVM integration may also prove awkward for a tracing JIT.