djb_hackernews's comments | Hacker News

Can you expand on that? In my experience nothing about Docker implies a performance impact in terms of size or start time.


Well, there is the overhead of creating and removing namespaces each time a container is run, and of communicating with the Docker daemon.

I think to most people it would be negligible, but FB operates at a scale where these normally insignificant pieces matter. I would be interested to hear more about the _why_ of a system like this over containerization.

edit: rwmj's comment has a good discussion over the benefits of this over containerization.
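If you want to put a rough number on that overhead, one crude way (assuming Docker is installed and an alpine image is already pulled locally) is to time a no-op command with and without the container wrapper:

```python
import subprocess
import time

def avg_runtime(cmd, runs=10):
    """Average wall-clock time of a command over several runs."""
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return (time.perf_counter() - start) / runs

# Bare process: just fork/exec.
bare = avg_runtime(["/bin/true"])

# Same no-op wrapped in a container: namespace and cgroup setup, a round
# trip to the Docker daemon, mounting the image layers, and teardown.
containerized = avg_runtime(["docker", "run", "--rm", "alpine", "true"])

print(f"bare process:    {bare * 1000:8.1f} ms")
print(f"docker run --rm: {containerized * 1000:8.1f} ms")
```

The gap is typically a few hundred milliseconds per container, which is exactly the kind of cost that's negligible for most people and adds up fast at FB's job volume.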


I promise you 100% the overhead is docker and nothing else.


The clustering story for etcd is pretty lacking in general. The discovery mechanisms are not built for cattle-type infrastructure or public clouds, i.e., it is difficult to bootstrap a cluster on a public cloud without first knowing the network addresses your nodes will have; otherwise it requires you to already have an etcd cluster or to use SRV records. In my experience etcd makes it hard to use auto scaling groups for healing and rolling updates.
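To make the bootstrapping pain concrete, here is a minimal sketch (hypothetical names and addresses) of building etcd's static --initial-cluster flags: every member's peer URL must be known before the first node starts, which is exactly what clashes with autoscaling-group-style provisioning.

```python
# Sketch: static etcd bootstrap. All member addresses must be listed up
# front, e.g. computed ahead of time by Terraform; the values here are
# made up for illustration.
peers = {
    "infra0": "10.0.1.10",
    "infra1": "10.0.1.11",
    "infra2": "10.0.1.12",
}

initial_cluster = ",".join(f"{name}=http://{ip}:2380" for name, ip in peers.items())

for name, ip in peers.items():
    # The command each member would be started with.
    print(
        f"etcd --name {name} "
        f"--initial-advertise-peer-urls http://{ip}:2380 "
        f"--listen-peer-urls http://{ip}:2380 "
        f"--advertise-client-urls http://{ip}:2379 "
        f"--listen-client-urls http://{ip}:2379,http://127.0.0.1:2379 "
        f"--initial-cluster {initial_cluster} "
        f"--initial-cluster-state new"
    )
```

The alternatives are the public discovery service or SRV records, both of which just push the chicken-and-egg problem somewhere else.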

From my experience consul seems to have a better clustering story but I'd be curious why etcd won out over other technologies as the k8s datastore of choice.


> From my experience consul seems to have a better clustering story but I'd be curious why etcd won out over other technologies as the k8s datastore of choice.

That'd be some interesting history. That choice had a big impact in making etcd relevant, I think. As far as I know, etcd was chosen before Kubernetes ever went public, pre-2014? So it must have been really bleeding edge at the time. I don't think Consul was even out then - it might be that they were simply too late to the game. The only other reasonable option was probably ZooKeeper.


I was around at CoreOS before Kubernetes existed. I don't recall exactly when etcd was chosen as the data store, but the Google team valued focus for this very important part of the system.

etcd didn't have an embedded DNS server, etc. Of course, these things can be built on top of etcd easily. Upstream has taken advantage of this by swapping the DNS server used in Kubernetes twice, IIRC.

Contrast this with Consul which contains a DNS server and is now moving into service mesh territory. This isn't a fault of Consul at all, just a desire to be a full solution vs a building block.


My understanding is that Google valued the fact that etcd was willing to support gRPC and Consul wasn't -- i.e., raw performance/latency was the gating factor. etcd was historically far less stable and less well documented than Consul, even though Consul had more functionality. etcd may have caught up in the last couple years, though.


At the time gRPC was not part of etcd - that only arrived in etcd 3.x.

The design of etcd 3.x was heavily influenced by the Kube usecase, but the original value of etcd was that

A) you could actually do a reasonably cheap HA story (vs singleton DBs)

B) the clustering fundamentals were sound (ZooKeeper at the time was not able to do dynamic reconfiguration, although in practice this hasn’t been a big issue)

C) Consul came with a lot of baggage and we wanted to do things differently - not to knock Consul, it just overlapped with alternate design decisions (like a large local agent instead of a set of lightweight agents)

D) etcd was the simplest possible option that also supported efficient watch

While I wasn’t part of the pre-open-sourcing discussions, I agreed with the initial rationale and I don’t regret the choice.

The etcd 2-to-3 migration was more painful than it needed to be, but I think most of the challenges were exacerbated by us not pulling the bandaid off early and forcing a 2-to-3 migration for all users right after 1.6.


My impression is that etcd operates at a lower-level data store abstraction than Consul, which is exactly why it's less feature-rich but works well as a building block. Consul packs more out of the box if that's what you need.

Both are still much better to operate than ZooKeeper.


There are several ways of bootstrapping etcd. The one I use is the one you mention: since the nodes are brought up with Terraform, always on a brand new VPC, we can calculate what the IP addresses will be in Terraform itself and fill the initial node list that way. We can destroy an etcd node if need be and recreate it. Granted, it is nowhere near as convenient as an ASG.

The alternate method, and the method we used before, is to use an existing cluster, as you mention. If cattle-style self-healing is that important, perhaps you could afford a small cluster used only for bootstrapping? Load will be very low unless you are bootstrapping a node somewhere. There are costs involved in keeping those instances running 24/7, but they may be acceptable in your environment (and the instances can be small). Then the only thing you need is to store the discovery token and inject it with cloud-init or some other mechanism.

That said, I just finished a task to automate our ELK clusters. For Elasticsearch I can just point at a load balancer that fronts the masters and be done with it. I wish I could do the same for etcd.


This has a name! Baader-Meinhof Phenomenon.


It's the same advantages as running anything else in a container: cgroups and namespaces.


> Trump is a skilled negotiator

I think this meme died 6 months ago. If /s, carry on.


350TB of memory, and 50,000 cores, nice.

ARP caching seems to be a common issue in cloud environments. AWS recommends turning it off and does so itself in their Amazon Linux distro.


There are quite a few red flags in the story, never mind the ICO.

- Pitching before building

- Delaying release for more development

- "Tremendous growth" but no one willing to put up $$

- What is forcing a series A? Are you out of runway?

- An investor taking the lazy approach to investing (requiring others to invest first). This is so common, so lazy, and not confidence-inspiring.


My use of language might be confusing. We had a proof-of-concept implementation before the first investment. We got the investment after pitching for a year. We built and released v1.0 in 6 months. Improved it for 2 years. Pivoted last summer. Retention rate is still increasing today. The meeting about the ICO happened recently.


As in you investigate why their employers couldn't retain their talent?

The way employers view "job hoppers" says a lot about the workplace culture, speaking from experience on both sides and with both philosophies. (One philosophy solely faults the employee; the other recognizes the current state of the industry.)


I like to try to understand what someone is hoping to get out of their job. People have any number of reasons for working, from 'putting food on the table' to 'changing the world.' There are a lot of intermediate motivations as well.

I try to understand that because I have a fairly good idea what the job can offer and the extent to which the job can be tailored to help align it with the employee's goals.

I try not to generalize. I start with the thesis that someone who loves their job, and whose job is meeting all of their current personal goals, will keep working at it until one of those two things changes. In my experience that sort of change generally takes years, not months.

When people go job to job to job, it can be something simple, like trying to ratchet their salary up faster than the typical 'annual' raise cycle. It can be something complex, like taking jobs that meet some requirement they have imposed on themselves even though they don't actually like those sorts of jobs. Or it can be something else entirely. I'd like to understand what it is so that I can gauge whether the person is likely to stick around long enough to have an impact.


What do the people that use cloud instances that don't have local disks do? You definitely don't want swap to be on EBS...


Allocate 20+% more memory than you could ever need. Use cgroups, LXC, or systemd to constrain applications and containers to specific amounts of memory, CPU, etc. Properly engineered systems will have enough memory for everything else under the hood beyond the application. VM and container abstraction does not negate the requirement to calculate this.
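For what it's worth, the raw cgroup interface underneath systemd, LXC, and Docker is small. Here is a sketch assuming cgroup v2 is mounted at /sys/fs/cgroup, the memory controller is enabled for child groups, and the script runs as root; the group name and workload are hypothetical:

```python
import os
import subprocess

# Hypothetical group for one application.
CGROUP = "/sys/fs/cgroup/myapp"
os.makedirs(CGROUP, exist_ok=True)

# Hard-cap the application at 4 GiB: the kernel reclaims and OOM-kills
# within the group instead of eating the headroom reserved for everything else.
with open(os.path.join(CGROUP, "memory.max"), "w") as f:
    f.write(str(4 * 1024**3))

# Launch a stand-in workload and move it into the group.
proc = subprocess.Popen(["sleep", "60"])
with open(os.path.join(CGROUP, "cgroup.procs"), "w") as f:
    f.write(str(proc.pid))

proc.wait()
```

In practice you would let systemd (MemoryMax=) or your container runtime write these files for you; the point is just that the cap is explicit and has to come out of the memory you budgeted.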


It's a good question, but the swap advice is suspicious. If I would normally purchase a 4GB RAM instance with 4GB of swap, then what happens when I purchase an 8GB instance? Do I still need 4GB of swap? Hopefully this helps illustrate that what matters is really the applications you intend to run; adding swap doesn't mean the system gets more efficient.


It's not clear how this is different from what is currently possible. I'm not a Route 53 guru, but can't you already a) create a subdomain microservice.mydomain.com, b) create instances, and c) add the instances' IP addresses to an A or AAAA record for the subdomain?

Is it that they didn't have APIs for these operations and now they do?

I know I'm missing something.


If I'm reading the underlying docs correctly, previously you would have called ChangeResourceRecordSets[0] with a quite verbose XML document. It looks like you'd need to first query for the existing RR set, modify it, then update it, and deal with potential race conditions if two service instances are starting concurrently. Technically possible, but quite a bit of complexity.

Now with auto-naming, you create a service[1], then a service instance calls RegisterInstance[2] on start-up with a much simpler JSON payload.

0: https://docs.aws.amazon.com/Route53/latest/APIReference/API_...

1: https://docs.aws.amazon.com/Route53/latest/APIReference/API_...

2: https://docs.aws.amazon.com/Route53/latest/APIReference/API_...
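For concreteness, roughly what the two paths look like through boto3 (which hides the raw XML); the zone, service, and instance IDs below are placeholders:

```python
import boto3

INSTANCE_IP = "10.0.12.34"  # placeholder

# Old way: manage the record set yourself via ChangeResourceRecordSets.
# UPSERT replaces the whole record set, so with multiple instances you
# still have to read, merge, and write it back, which is where the
# race conditions come from.
route53 = boto3.client("route53")
route53.change_resource_record_sets(
    HostedZoneId="Z123EXAMPLE",
    ChangeBatch={
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "microservice.mydomain.com",
                "Type": "A",
                "TTL": 60,
                "ResourceRecords": [{"Value": INSTANCE_IP}],
            },
        }]
    },
)

# New way: with auto naming, each instance registers itself against a
# pre-created service and Route 53 maintains the records for you.
servicediscovery = boto3.client("servicediscovery")
servicediscovery.register_instance(
    ServiceId="srv-example123",
    InstanceId="i-0abc123example456",
    Attributes={
        "AWS_INSTANCE_IPV4": INSTANCE_IP,
        "AWS_INSTANCE_PORT": "8080",
    },
)
```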


There were APIs to do the operations you mentioned. I think for most services an ELB would do the trick; create an ELB, add instances to it, create a CNAME or Alias to the ELB.

The one time I've wanted this is with auto scaling groups for services that don't use ELBs. I haven't found docs on it, but if this could be used to add/remove DNS records based on auto scaling events, that would be useful. It would save you from using lifecycle hooks to trigger a Lambda function.
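For reference, that lifecycle-hook glue looks roughly like this today; a sketch of an SNS-triggered Lambda with placeholder names, no error handling, and only the launch branch spelled out:

```python
import json
import boto3

route53 = boto3.client("route53")
ec2 = boto3.client("ec2")
autoscaling = boto3.client("autoscaling")

HOSTED_ZONE_ID = "Z123EXAMPLE"             # placeholders
RECORD_NAME = "microservice.mydomain.com"


def handler(event, context):
    """Keep a weighted A record in sync from an ASG lifecycle hook."""
    msg = json.loads(event["Records"][0]["Sns"]["Message"])
    instance_id = msg["EC2InstanceId"]

    if msg["LifecycleTransition"] == "autoscaling:EC2_INSTANCE_LAUNCHING":
        res = ec2.describe_instances(InstanceIds=[instance_id])
        ip = res["Reservations"][0]["Instances"][0]["PrivateIpAddress"]
        route53.change_resource_record_sets(
            HostedZoneId=HOSTED_ZONE_ID,
            ChangeBatch={"Changes": [{
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": RECORD_NAME,
                    "Type": "A",
                    "TTL": 60,
                    "SetIdentifier": instance_id,  # one weighted record per instance
                    "Weight": 1,
                    "ResourceRecords": [{"Value": ip}],
                },
            }]},
        )
    # (the EC2_INSTANCE_TERMINATING branch would DELETE the matching record)

    # Let the ASG proceed with the launch/termination.
    autoscaling.complete_lifecycle_action(
        LifecycleHookName=msg["LifecycleHookName"],
        AutoScalingGroupName=msg["AutoScalingGroupName"],
        LifecycleActionToken=msg["LifecycleActionToken"],
        LifecycleActionResult="CONTINUE",
    )
```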

Also this seems to be a larger service discovery play, it just doesn't seem very fleshed out yet.


It's not a long article, just three paragraphs. The complexity is handling health checks and the like. If one of your endpoints goes down, you want to update the DNS record to remove it, which means you have to build or use software that continuously monitors your endpoints and updates DNS accordingly. Now Route 53 will do those health checks for you automatically.


When I first read the article I was under the impression that one would now be able to connect a zone with an autoscaling group (and, as you mentioned, avoid allocating internal ELBs), but it looks like it's really just some sugar on top of the existing API.

Am I right?

