More

SaveTheRbtz · on April 7, 2023

"An In-Depth look at Lerp, Smoothstep, and Shaping Functions"[1] by SimonDev is the best 8min introduction to Lerp for a software engineer that I saw.

[1] https://www.youtube.com/watch?v=YJB1QnEmlTs

SaveTheRbtz · on Feb 1, 2023

We're mostly on the Grafana stack, so we tend to favor their open-source solutions for metric backends. For client libraries, we're currently using Prometheus for metrics and OpenTelemetry (OTEL) for tracing, but we're looking to migrate fully to OTEL once metrics are stable there. As for logging, it depends on the project, but generally we use either Zerolog or Zap, with either Loki Docker plugin or Promtail.

SaveTheRbtz · on April 23, 2022

I was recently playing[0] with the ZSTD seekable format[1]: a *valid* ZSTD stream format (utilizing ZSTD skippable frames, that are ignored by de-compressor) w/ an additional index at the end of the file that allows for random access to the underlying compressed data based on uncompressed offsets. This combined w/ a Content-Defined-Chunking[2] and a cryptographic hash[3] in the index allows for a very efficient content-addressable storage. For example I've successfully applied it to a bazel-cache, which gave me between 10x and 100x wins on repository size w/ negligible CPU usage increase.

[0] https://github.com/SaveTheRbtz/zstd-seekable-format-go

[1] https://github.com/facebook/zstd/blob/dev/contrib/seekable_f...

[2] e.g FastCDC https://www.usenix.org/system/files/conference/atc16/atc16-p...

[3] https://github.com/facebook/zstd/pull/2737

wscott · on April 23, 2022

BitKeeper used a seekable compressed file format for revision control data. It allowed a data-structure to be dynamically loaded on demand without needed to uncompress the whole file. A large empty memory buffer was allocated and then read permission was removed with mprotect(). Then a signal handler populate regions of that buffer with data from the compressed file on demand using the ability to seek to certain boundaries.

This change achieved a 10X speedup on normal operations compared to the old code that used SCCS files.

The compressed format stored data blocks in arbitrary order and then the index at the end of the file gave the data layout. Then allows write to the file without rewriting. BitKeeper itself only needed to append to the end of the file, but the format could support inserting data in the middle by only appending to the physical file.

It also had a data redundancy CRC layer that could detect data corruption and recover data from some types of corruption.

https://www.bitkeeper.org/ https://github.com/bitkeeper-scm/bitkeeper/blob/master/src/l... https://github.com/bitkeeper-scm/bitkeeper/blob/994fb651a404...

rwmj · on April 23, 2022

Really wish this was part of official zstd (https://github.com/facebook/zstd/issues/395#issuecomment-535...) and not a contrib / separate tool.

tomberek · on April 23, 2022

That last proposal is mine! Thanks for the Go implementation, I’ll have to play with it some time. Looks like you’ve built something very close to my intention. Did you come across any additional changes needed to the spec? Any suggestions or results from going through the implementation that would help?

Another thought is if it is possible and how to coordinate te re-using dictionaries.

SaveTheRbtz · on April 28, 2022

Thanks for creating this proposal! Would love to see it committed upstream. For me it worked just fine -- I'll need to play a bit more with it and maybe put it into a real-world production setup to see how well it preforms on real-world data.

kldx · on April 23, 2022

Is your bazel cache implementation open source? I am dabbling in bazel and I am not sure where zstd fits in the bazel cache model. I'm interested in learning more about this

SaveTheRbtz · on April 28, 2022

I did PoC experiments with compression, chunking, and IPFS here: https://github.com/SaveTheRbtz/bazel-cache

If you need a mature compression implementation for bazel I would recommend using recent bazel versions w/ gRPC-based bazel-remote: https://github.com/buchgr/bazel-remote

bazel nowadays supports end-to-end compression w/ `--experimental_remote_cache_compression`: https://github.com/bazelbuild/bazel/pull/14041

SaveTheRbtz · on May 21, 2021

The analysis itself is quite impressive: a very systematic top-down approach. We need more people doing stuff like this!

But! Be careful applying tunables from the article "as-is"[1]: some of them would destroy TCP performance:

  net.ipv4.tcp_sack=0
  net.ipv4.tcp_dsack=0
  net.ipv4.tcp_timestamps=0
  net.ipv4.tcp_moderate_rcvbuf=0
  net.ipv4.tcp_congestion_control=reno
  net.core.default_qdisc=noqueue

Not to mention that `gro off` that will bump CPU usage by ~10-20% on most real world workload, Security Team would be really against turning off mitigations, and usage of `-march=native` will cause a lot of core dumps in heterogenous production environments.

[1] This is usually the case with single purpose micro-benchmarks: most of the tunables have side effects that may not be captured by a single workflow. Always verify how the "tunings" you found on the internet behave in your environment.

SaveTheRbtz · on May 19, 2021

> tcpdump ... linux TCP stack ... ebpf

> Networks are not my specialty

I wish all network non-specialist were like you!

Natfan · on May 19, 2021

I have nowhere near any of these skills, and I know I'm not a specialist.

SaveTheRbtz · on May 19, 2021

Thanks! Added it back to the blogpost (seems like we accidentally lost the `pktmon` reference during the editing process =)

SaveTheRbtz · on May 19, 2021

A couple of questions:

* What are the reasons for disabling TCP timestamps by default? (If you can answer) will they be eventually enabled by default? (The reason I'm asking is that Linux uses TS field as storage for syncookies, and without it will drop WScale and SACK options greatly degrading Windows TCP perf in case of a synflood.[1])

* I've noticed "Pacing Profile : off" in the `netsh interface tcp show global` output. Is that the same as tcp pacing in fq qdisc[2]? (If you can answer) will it be eventually enabled by default?

[1] https://elixir.bootlin.com/linux/v5.13-rc2/source/net/ipv4/s... [2] https://man7.org/linux/man-pages/man8/tc-fq.8.html

slowstart · on May 19, 2021

Windows historically defaulted to accepting timestamps when negotiated by the peer but didn't initiate the negotiation. There are benefits to timestamps and one downside (12 bytes overhead per packet). Re. syncookies, that's an interesting problem but under a severe syn attack, degraded performance is not going to be the biggest worry for the server. We might turn them on but for the other benefits, no committed plans. Re. pacing profile, no that's pacing implemented at the TCP layer itself (unlike fq disc) and is an experimental knob off by default.

SaveTheRbtz · on May 19, 2021

re. syncookies: Linux by default starts issuing syncookies when listening socket's backlog overflows, so it may be accidentally triggered even by a small connection spike. (This, of course, is not an excuse for a service misconfiguration but it is quite common: somaxconn on Linux before 5.4 used to be 128 and many services use the default.)

re: pacing: Awesome!! I would guess it is similar to Linux "internal implementation for pacing"[1]. Looking forward to it eventually graduating form being experimental! As a datapoint: enabling pacing on our Edge hosts (circa 2017) resulted in ~17% reduction in packet loss (w/ CUBIC) and even fully eliminated queue drops on our shallow-buffered routers. There were a couple of roadbumps (e.g. "tcp: do not pace pure ack packets"[2]) but Eric Dumazet fixed all of them very quickly.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin... [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...

slowstart · on May 19, 2021

Thanks for the heads up. We will investigate to see what fraction of connections end up losing these options. Pacing TCP is certainly on our roadmap. Our QUIC implementation MsQuic paces by default already.

drummer · on May 19, 2021

Do you have any details on how or when Microsoft will roll out quick in Windows? Will it work by just specifying the quick protocol when creating a socket like with tcp?

SaveTheRbtz · on May 19, 2021

Sadly, middleboxes are a real problem, esp. with our Enterprise customers. We had this problem even with HTTP/2 rollout so there is even a special HTTP/1.1-only mode in the Desktop Client for environments where h2 is disabled.

In the future we are planing on having an HTTP/3 support which will give us pretty much the same benefits as SCTP with a better middlebox compatibility.

SaveTheRbtz · on May 18, 2021

We didn't mention RSS/RPS in the post mostly because they are stable. (Albeit, relatively ineffective in terms of L2 cache misses.) FlowDirector, OTOH, breaks that stability and causes a lot of migrations, and hence a lot of re-ordering.

Anyways, nice reference for TAPS! Fo those wanting to dig into it a bit more, consider reading an introductory paper (before a myriad of RFC drafts from the "TAPS Working Group"): https://arxiv.org/pdf/2102.11035.pdf

PS. We went through most of our low-level web-server optimization for the Edge Network in an old blogpost: https://dropbox.tech/infrastructure/optimizing-web-servers-f...

SaveTheRbtz · on May 18, 2021

We will be eventually migrating to UDP (HTTP/3) once it is rolled out on Envoys[0] on Dropbox Edge Network[1].

[0] https://dropbox.tech/infrastructure/how-we-migrated-dropbox-...

[1] https://dropbox.tech/infrastructure/dropbox-traffic-infrastr...