
dundun · on May 8, 2024

I'm going to fight tooth and nail when someone calls NYC unsafe, but it's going to be very difficult to argue against the store closings because of theft (as at least one factor).

I've personally witnessed three blatant thefts in the last few years at my local Duane Reade (which closed down in April). Every time, the clerks are like "pretty sure that was the same guy from yesterday". It's never violent or scary. It's just like watching a fight between homeless people in a subway station -- you look, think that's odd, and move on.

> Where in NY are these stores closing?

Four different pharmacies have closed down since the pandemic just on my path to work, including two a stone's throw from the New York Stock Exchange: https://maps.app.goo.gl/fJcHCgjVacP5pEuHA https://maps.app.goo.gl/kmDXnjHruMCvS2CA6

I suspect it's not all shrinkage, though. I imagine the continuing shift toward buying more and more things from online retailers like Amazon, along with the growth of online/mail-order pharmacies, has contributed too. CVS/Duane Reade are still opening new locations as well, so it can't be all that bad.

wernercd · on May 8, 2024

> I'm going to fight tooth and nail when someone calls NYC unsafe

Not the main thing I've been arguing, per se, but the fact that the National Guard is being deployed into places like the subway seems to bolster the notion that NY isn't doing well.

https://abc7ny.com/subway-crime-nyc-statistics-assault/14381...

"According to the NYPD, there were 570 reports of felony assault on trains or in stations in 2023, that's the highest number in more than 20 years and a 53% jump from pre-pandemic levels."

January crime was up 50% compared to 2023. And yes, that's a two-year snapshot. Statistics is the game of picking two points and saying "SEE! I'M RIGHT!"

But the main point is that violent crime is up.

The main point of my comments is more about crime in general: it's hard to say crime is down when recent decisions to raise the bar for charging people have literally made fewer things crimes. So "crime is down" can be true from a "statistically reported" perspective while the actual numbers are up.

Look at California, which raised the misdemeanor threshold for theft to $950... so felonies are down? Gee... I wonder why? Even though objectively more crime is happening, less is getting reported because people won't waste their time on "misdemeanors" that won't get charged by a soft-on-crime DA. Crime's down? True... but also a lie.

Crime is running rampant, as criminality is now, for all intents and purposes, legal if you're under a threshold (or if you're of the right demographics to "atone for past injustice").

dundun · on April 9, 2024

This is the biggest one: https://feast.dev/

echrisinger · on April 9, 2024

This isn't really a drop-in replacement; they don't offer transforms out of the box.

Admittedly, some of the transforms proposed in this article are a little simple and don't represent the full space of feature engineering requirements for all large orgs.

econometrician · on April 9, 2024

Actually, Feast does support transformations, depending on the source. It supports transforming data on demand and via streaming. It doesn't support batch transformations only because, technically, that should just be an upload, but we can revisit that decision.
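For readers unfamiliar with the distinction: an on-demand transformation (Feast calls these on-demand feature views) computes derived features at request time from stored features plus values that only arrive with the request. The sketch below illustrates the idea in plain Python. It does not use Feast's API; the `online_store` dict, the feature names, and both helper functions are hypothetical.

```python
from typing import Dict

# Hypothetical precomputed features keyed by entity id (a stand-in for an online store).
online_store: Dict[int, Dict[str, float]] = {
    1001: {"conv_rate": 0.42, "avg_daily_trips": 12.0},
    1002: {"conv_rate": 0.17, "avg_daily_trips": 3.0},
}


def on_demand_transform(stored: Dict[str, float], request: Dict[str, float]) -> Dict[str, float]:
    """Derive features at request time from stored features plus request-only data."""
    return {
        # Combines a stored feature with a value that only exists at request time.
        "conv_rate_plus_bonus": stored["conv_rate"] + request["bonus"],
        # Derived purely from stored features, but computed lazily on each request.
        "trips_per_week": stored["avg_daily_trips"] * 7,
    }


def get_online_features(entity_id: int, request: Dict[str, float]) -> Dict[str, float]:
    """Look up precomputed features, then layer the request-time features on top."""
    stored = online_store[entity_id]
    return {**stored, **on_demand_transform(stored, request)}


if __name__ == "__main__":
    print(get_online_features(1001, {"bonus": 0.1}))
```

The point of the split is that nothing here has to be materialized ahead of time; the trade-off is that the transform runs on every request, so it needs to be cheap.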

xLaszlo · on April 9, 2024

I think Feast is sunsetted.

econometrician · on April 9, 2024

There are new maintainers: https://feast.dev/blog/the-future-of-feast/

dundun · on April 9, 2024

How does this relate to Zipline and Bighead? Does it replace those projects or is it a continuation of them?

nikhilsimha · on April 9, 2024

Bighead is the model training and inference platform.

Chronon is a full rewrite of Zipline with 1) a different underlying algorithm for time-travel, to address scalability concerns, and 2) a different serde and fetching strategy, to address latency concerns.
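"Time-travel" in feature stores generally refers to point-in-time correct lookups: when building training data, each labeled event should only see the feature values that were known at that event's timestamp, never later ones. The sketch below is a naive as-of join in plain Python using binary search. It is a generic illustration of the problem, not Chronon's or Zipline's algorithm, and the data layout (sorted `(timestamp, value)` lists per entity) is assumed.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

# Assumed layout: feature history per entity as (timestamp, value), sorted by timestamp.
FeatureLog = Dict[str, List[Tuple[int, float]]]


def as_of_lookup(history: List[Tuple[int, float]], ts: int) -> Optional[float]:
    """Return the latest value whose timestamp is <= ts, or None if none exists yet."""
    timestamps = [t for t, _ in history]
    i = bisect_right(timestamps, ts)
    return history[i - 1][1] if i > 0 else None


def point_in_time_join(
    labels: List[Tuple[str, int]], features: FeatureLog
) -> List[Tuple[str, int, Optional[float]]]:
    """Attach to each (entity, event_ts) label the feature value as of event_ts."""
    return [
        (entity, ts, as_of_lookup(features.get(entity, []), ts))
        for entity, ts in labels
    ]


if __name__ == "__main__":
    features: FeatureLog = {"user_a": [(10, 1.0), (20, 2.0), (30, 3.0)]}
    labels = [("user_a", 5), ("user_a", 20), ("user_a", 25), ("user_b", 25)]
    for row in point_in_time_join(labels, features):
        print(row)
```

This version rebuilds the timestamp list on every lookup and keeps everything in memory, which is fine for a toy but is exactly the kind of cost that becomes a scalability problem over very large label and feature sets.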

echrisinger · on April 9, 2024

I'd imagine a continuation... he is also the author of Zipline

dundun · on March 14, 2024

There's a ton of Dunning-Kruger going on, but I think that, for college kids riding the high of winning a contest and learning a ton, it can be excused.

rjbwork · on March 14, 2024

Absolutely. I'm not trying to dump on them. It may not have come through in my choice of verbiage, but there's something romantic (in the aesthetic sense) and admirable about what they've done and their attitudes. It's that kind of unbridled, youthful optimism and confidence that can be hard to cultivate as we get older.

mewpmewp2 · on March 14, 2024

Life is full of cringe, and I've certainly had more than my fair share of it, so who am I to judge anyone? Maybe I'm even far more cringe than the average person, or maybe I just notice my own cringe much more. I hope it's the latter, but I can never tell.

dundun · on July 14, 2023

Google open-sourced TensorFlow because they believed it would help with hiring: if researchers could do their PhDs with the same framework Google used in its production systems, that was seen as an advantage.

Maybe that's Meta's play here? Maybe the idea is also that the ecosystem around a model could be as valuable as the model itself, or more so, and an OSS model could benefit Meta a lot more by capturing more of the ecosystem mindshare?

Or maybe Yann LeCun is just a hippie who dreams of free love, hard drugs and open-source models?

dundun · on Feb 19, 2020

There was also a breaking story about Google acquiring Spotify on the exact same date a year later!

dundun · on Feb 25, 2016

You certainly can use a db for an event source. This article does a really good job of explaining how: http://www.confluent.io/blog/turning-the-database-inside-out...

As mentioned in the post, we've pushed Kafka to at least 700,000 events per second. We have room to push it much further, but stay tuned for posts 2 and 3 to see what we're doing instead.

pheeney · on Feb 25, 2016

That is actually the post that got me into event sourcing/streams. As far as user-analytics-type events go, this makes complete sense to me. What I haven't been able to work out is whether it's useful to use this architecture at a much, much smaller scale, for things that may not be user events.

I love the thought of throwing everything into a stream and populating the read models, analytics, search index, etc. from that data. However, if you had, for example, a CMS/e-commerce site for a smaller organization, should the admin actions also be events? If your database is itself the event source, they would have to be, and you'd get all the benefits outlined in the article.

At what point do you decide what to put in the stream and what to build without it? Are there events that should never go in a stream? Those are the questions I've been researching, but I haven't found many resources or discussions about making these decisions.
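To make the "one stream, many read models" idea concrete, here is a minimal sketch in which admin actions are ordinary events and three independent views (a catalog, a search index, and an audit log) are derived from the same stream. The event names, fields, and classes are hypothetical; in a real system each projection would typically consume the stream separately and could be rebuilt by replaying it from the start.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class Event:
    kind: str  # hypothetical event names, e.g. "ProductCreated", "ProductRenamed"
    product_id: str
    name: str = ""


@dataclass
class Projections:
    """Several independent read models derived from the same event stream."""

    catalog: Dict[str, str] = field(default_factory=dict)  # product_id -> current name
    search_index: Dict[str, Set[str]] = field(default_factory=lambda: defaultdict(set))  # token -> ids
    audit_log: List[str] = field(default_factory=list)  # admin history, in event order

    def apply(self, e: Event) -> None:
        if e.kind in ("ProductCreated", "ProductRenamed"):
            old = self.catalog.get(e.product_id)
            if old is not None:
                # Drop tokens from the previous name so search reflects the current name only.
                for token in old.lower().split():
                    self.search_index[token].discard(e.product_id)
            self.catalog[e.product_id] = e.name
            for token in e.name.lower().split():
                self.search_index[token].add(e.product_id)
        self.audit_log.append(f"{e.kind}:{e.product_id}")


if __name__ == "__main__":
    stream = [
        Event("ProductCreated", "p1", "Red Mug"),
        Event("ProductCreated", "p2", "Blue Mug"),
        Event("ProductRenamed", "p1", "Green Mug"),
    ]
    views = Projections()
    for event in stream:
        views.apply(event)
    print(views.catalog)  # {'p1': 'Green Mug', 'p2': 'Blue Mug'}
```

Because every view is a pure function of the stream, adding a new view later (say, a sales report) means writing a new `apply` and replaying history, rather than migrating existing tables.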

pheeney · on Feb 26, 2016

My current thinking is that you use a relational DB like Postgres with JSON support to get from hobby/early startup to the kind of traction where you need to start worrying about scaling. At that point you switch to Kafka or related hosted tools.

As for the data you put into the stream, I'd think it could be everything if you treated all data as immutable, even admin actions. The only thing that seems up in the air is transactions.

That is as far as I got though. I don't work with a company that has that kind of scale to use this, but I'd like to start working with it.
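A minimal sketch of the "relational DB with JSON as an event store" starting point, assuming an append-only `events` table. It uses the standard-library `sqlite3` module as a stand-in for Postgres, with JSON stored as TEXT (in Postgres you would likely use a `jsonb` column). The table layout, function names, and event types are hypothetical. For the transaction question, it shows one common approach, optimistic concurrency: a writer states the stream version it expects, and a unique key on `(stream_id, version)` rejects a writer that read a stale version.

```python
import json
import sqlite3
from typing import Any, Dict, List


class ConcurrencyError(Exception):
    """Raised when another writer appended to the stream first."""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # Append-only table; the UNIQUE constraint makes two writers with the same
    # expected version collide instead of silently interleaving.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS events (
               stream_id TEXT NOT NULL,
               version   INTEGER NOT NULL,
               type      TEXT NOT NULL,
               payload   TEXT NOT NULL,
               UNIQUE (stream_id, version)
           )"""
    )
    return conn


def append(conn: sqlite3.Connection, stream_id: str, expected_version: int,
           event_type: str, payload: Dict[str, Any]) -> int:
    """Append one event at expected_version + 1 and return the new version."""
    new_version = expected_version + 1
    try:
        with conn:  # commits on success, rolls back if the INSERT raises
            conn.execute(
                "INSERT INTO events (stream_id, version, type, payload) VALUES (?, ?, ?, ?)",
                (stream_id, new_version, event_type, json.dumps(payload)),
            )
    except sqlite3.IntegrityError as exc:
        raise ConcurrencyError(f"{stream_id} is no longer at version {expected_version}") from exc
    return new_version


def load(conn: sqlite3.Connection, stream_id: str) -> List[Dict[str, Any]]:
    """Read a stream back in order, decoding the JSON payloads."""
    rows = conn.execute(
        "SELECT version, type, payload FROM events WHERE stream_id = ? ORDER BY version",
        (stream_id,),
    ).fetchall()
    return [{"version": v, "type": t, "data": json.loads(p)} for v, t, p in rows]


if __name__ == "__main__":
    conn = open_store()
    v = append(conn, "order-42", 0, "OrderPlaced", {"sku": "abc", "qty": 2})
    v = append(conn, "order-42", v, "OrderShipped", {"carrier": "ups"})
    try:
        append(conn, "order-42", 1, "OrderCancelled", {})  # a writer working from a stale read
    except ConcurrencyError as err:
        print("rejected:", err)
    print(load(conn, "order-42"))
```

This only guards consistency within a single stream; anything spanning several streams still needs a different answer (one larger stream, a process manager, or accepting eventual consistency).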

macca321 · on Feb 26, 2016

I built a CRM/CMS application where every single controller action call is event sourced.

The whole application lives in memory as a single object aggregate, which gets rebuilt on startup. I started off writing JSON to the file system, moved to compressing and appending to a log file, and then moved to Azure cloud tables.
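The pattern described here is often called a memory image (or system prevalence). As a rough sketch only: every state change is appended to a journal before being applied in memory, and on startup the in-memory model is rebuilt by replaying the journal. This is plain Python with an uncompressed JSON-lines file, not the commenter's C# code, and the `Crm` model and event types are hypothetical.

```python
import json
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict


@dataclass
class Crm:
    """The entire application state as one in-memory aggregate (hypothetical model)."""

    contacts: Dict[str, str] = field(default_factory=dict)  # contact_id -> email

    def apply(self, event: dict) -> None:
        if event["type"] == "ContactAdded":
            self.contacts[event["id"]] = event["email"]
        elif event["type"] == "ContactRemoved":
            self.contacts.pop(event["id"], None)


class MemoryImage:
    """Journal every event to an append-only JSON-lines file; rebuild state by replay."""

    def __init__(self, log_path: Path) -> None:
        self.log_path = log_path
        self.model = Crm()
        if log_path.exists():
            with log_path.open() as f:
                for line in f:  # replay the full history on startup
                    self.model.apply(json.loads(line))

    def execute(self, event: dict) -> None:
        # Write to the journal before mutating memory, so every change visible
        # in memory has already been handed to the log.
        with self.log_path.open("a") as f:
            f.write(json.dumps(event) + "\n")
        self.model.apply(event)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        log = Path(d) / "events.jsonl"
        app = MemoryImage(log)
        app.execute({"type": "ContactAdded", "id": "c1", "email": "a@example.com"})
        app.execute({"type": "ContactAdded", "id": "c2", "email": "b@example.com"})
        app.execute({"type": "ContactRemoved", "id": "c1"})
        restarted = MemoryImage(log)  # simulates a process restart
        print(restarted.model.contacts)  # {'c2': 'b@example.com'}
```

Reads never touch storage, which is where response times like the ones mentioned come from; the cost is that startup time grows with the length of the log unless you add snapshots.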

It's awesomely fast to respond to requests (15ms) and to add new features to, but you do get interesting new problems. For example, along the way I had to:

- come up with a way of migrating events (as my storage formats changed while I improved my frameworks)
- find a good way to do fast full-text search against in-memory objects, as I had no SQL or Elasticsearch infrastructure (I ended up using LINQ against in-memory Lucene RAMDirectories)
- deal with concurrency issues in a fairly novel manner (as all users are acting against a single in-memory object)
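On the first point, one common technique for migrating events is "upcasting": stored events are never rewritten; instead, each event carries a schema version, and a chain of small conversion functions upgrades old versions to the current shape as they are read. The commenter doesn't say this is what they did; the sketch below is a generic illustration, and the `ContactAdded` schema history and function names are hypothetical.

```python
from typing import Callable, Dict, Tuple

# Hypothetical schema history for a "ContactAdded" event:
#   v1: {"name": "Ada Lovelace"}
#   v2: {"first": "Ada", "last": "Lovelace"}
#   v3: v2 plus an explicit "email" field defaulting to None


def v1_to_v2(data: dict) -> dict:
    first, _, last = data["name"].partition(" ")
    return {"first": first, "last": last}


def v2_to_v3(data: dict) -> dict:
    return {**data, "email": None}


# (event type, stored version) -> (upgrade step, version it produces)
UPCASTERS: Dict[Tuple[str, int], Tuple[Callable[[dict], dict], int]] = {
    ("ContactAdded", 1): (v1_to_v2, 2),
    ("ContactAdded", 2): (v2_to_v3, 3),
}


def upcast(event_type: str, version: int, data: dict) -> Tuple[int, dict]:
    """Apply upgrade steps until none match; each step returns a new dict,
    so the stored event data is never mutated."""
    while (event_type, version) in UPCASTERS:
        step, version = UPCASTERS[(event_type, version)]
        data = step(data)
    return version, data


if __name__ == "__main__":
    print(upcast("ContactAdded", 1, {"name": "Ada Lovelace"}))
```

The domain code then only has to understand the latest version of each event, and each storage-format change adds one small, testable step instead of a bulk rewrite of the log.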

I'm hoping this architecture will start to become more popular. I think we need a framework equivalent to Rails to take it mainstream.

pheeney · on Feb 26, 2016

That is very interesting. I'm guessing this is a closed-source application? Did you do something along the lines of CQRS (the Command/Query split), or just write directly to the event source? At what point did appending to a log file stop working and cause the switch to the cloud (or was that for unrelated reasons)?

I'm also hoping it will become more popular, as the pros seem to vastly outweigh the cons. But I think you're right about the framework. From my research, it seems to be medium to large enterprises that are typically best suited to using and developing something like Kafka, and those enterprises typically don't open-source their applications. So I definitely think a framework from a company that is using it at scale would be huge.

Until then, I suppose I'll keep reading up, learning all I can, and figuring out how to implement this at a much smaller scale.

macca321 · on Feb 28, 2016

Cloud storage was just used so I didn't have to manage backups myself.

I absolutely didn't separate command and query - the commands themselves are actions which execute against the domain model, and that domain is used to build responses.

My project is here: [Sourcery](https://github.com/mcintyre321/Sourcery), but I think a more mature project you might like to look into is [OrigoDB](http://origodb.com/).

Another thing that gets tricky is making your application deterministic: any calls to the current time, random number or GUID generation, or third-party services have to be recorded and replayable in the correct order for when you reconstruct your application instance. This can get tricky if you refactor your application or change its logic later.
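A minimal sketch of one way to get that determinism, assuming the log stores the resulting events rather than the raw controller calls (a different choice from the design described above): the command handler reads the clock, the random number generator, and the GUID generator exactly once, when it first runs live, and writes the results into the event. Replay only applies events, so it never calls those sources again. The event type and field names are hypothetical.

```python
import random
import time
import uuid
from typing import Dict, List


def create_ticket_command(title: str) -> Dict:
    """Runs once, live: captures every nondeterministic input into the event itself."""
    return {
        "type": "TicketCreated",
        "id": str(uuid.uuid4()),           # recorded, not regenerated on replay
        "created_at": time.time(),         # recorded wall-clock time
        "priority_roll": random.random(),  # recorded random draw
        "title": title,
    }


def apply(state: Dict[str, Dict], event: Dict) -> None:
    """Deterministic: depends only on the current state and values stored in the event."""
    if event["type"] == "TicketCreated":
        state[event["id"]] = {
            "title": event["title"],
            "created_at": event["created_at"],
            "urgent": event["priority_roll"] > 0.9,
        }


def replay(events: List[Dict]) -> Dict[str, Dict]:
    """Rebuild state from scratch; the same log always yields the same state."""
    state: Dict[str, Dict] = {}
    for e in events:
        apply(state, e)
    return state


if __name__ == "__main__":
    log = [create_ticket_command("printer on fire"), create_ticket_command("reset password")]
    print(replay(log) == replay(log))  # True
```

The refactoring problem doesn't fully go away: `apply` must keep interpreting old events the same way (here, the `> 0.9` urgency rule), so changing that logic later effectively changes history unless it is versioned.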

It's also worth reading up on Prevalence/MemoryImage and looking into NEventStore.

dundun · on Feb 24, 2016

Nothing in this move should affect our contributions to Snakebite and Luigi. If anything, it will just make them easier to use in cloud environments in addition to bare metal.

With over 20K jobs/day, Hadoop will be part of Spotify's data processing stack for quite a while. BigQuery is just one (awesome) piece of the full puzzle.

[spotifier]

dundun · on May 18, 2015

Browsing through the code, this looks a bit rough around the edges and untested. I'd wonder how it compares to similar packages for Spark, Hadoop or Flink. The lack of numbers and users makes me wonder if it's anything more than someone's PhD project at CMU.

dundun · on Feb 8, 2015

MR is actually still growing at Google and is heavily used by legacy systems and experienced engineers.

Compared to newer frameworks like those described in the FlumeJava and MillWheel papers, MR's growth is flat.

This is happening in the Hadoop ecosystem too: if you're writing Java MR by hand, you're probably spending more time and writing less efficient jobs than you'd get from an optimized Pig/Hive job with Tez under the hood, or from something in the Cascading or Crunch family, which provides useful abstractions on top of MR or other execution engines.

Then there are also a lot of tools popping up that take over some of the use cases that were shoehorned into MR but are more natural outside it, like ML/iterative computation with Spark.

MR isn't dying inside or outside Google, it's just being abstracted away.
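To make "abstracted away" concrete, here is a single-process sketch of the three phases a hand-written MR word count spells out: map, shuffle (group by key), and reduce. It is plain Python written for illustration, not Hadoop's Java API. Tools like Pig, Hive, Cascading, and Crunch let you state the same job at a higher level (in Hive, roughly a GROUP BY with a count) and generate and optimize these phases for you.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values seen for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["the cat", "the hat"]))  # {'the': 2, 'cat': 1, 'hat': 1}
```

In a real cluster, the shuffle is the expensive part (it moves data across the network and sorts it), which is why planners that can fuse or skip phases tend to beat jobs written by hand.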