Hacker News | diego's comments

Meta has extremely opaque account policies. For example, I bought the Meta Raybans a month ago. It kept telling me the AI features were not available in my region, even though I am in San Francisco. I joined Facebook in 2006, and I have used my account for the Oculus headset without a problem. But no matter what I did, the AI function of the Raybans wouldn't work.

I ended up creating a brand new account just for that, and it worked fine. No idea why it would work with a brand new account and not with my old account in good standing, never suspended or warned about anything.


I have recently had a need to create an Instagram account. I logged in from my home IP and it was recognized as coming from Vietnam (my home IP has been the same since 2016, always with the same ISP). Everything was in Vietnamese and I had to spend half an hour figuring out how to switch it back to English. But in the home feed I still got only Vietnamese influencers, and there was nothing in the settings to change that. I got assigned to Vietnam for life.

Well, I did nothing with the account except setting up the profile and following some people. Then I logged in to the account on my phone, which of course is not from Vietnam. Bam, account suspended for violating the TOS. I appealed, and after one day got a message that the ban was upheld because I did violate the TOS.

I guess no Instagram for me. That's probably for the better.


There's a mid-sized international bus company over here and once I bought a ticket for the wrong day, realized only after payment. I simply called the phone number, the lady spoke my language, reissued the ticket for a different day, that's it.

I was shocked that customer support can work like that.


Is it possible your FB account has been compromised for a while? Is your ISP's RIR whois information correct?


No, the Instagram account was completely unrelated to my FB account (different email, different browser). Every other tool I have ever used over the last decade showed my IP location correctly, so I don't think there are any mistakes in my ISP's WHOIS.

As a bonus, Facebook Business Manager sometimes shows me messages in Russian.


my best guess is, you could've connected from a different ip once 10 years ago and it improperly geolocated that ip as being in a tiny country and now it thinks you're secretly from that country even though you've been accessing the site from a US ip forever. it's the only plausible reason i can really think of. unless they set up a "country estimation" ai or a similar newfangled system and it's convinced for some reason you're actually not american. it's too out there but you never know


Some GEOIP databases are rotting, near as I can tell.

I’ve got a proxy on a random machine in an OVH DC in Oregon. Always properly geo-located to Oregon - until a few months ago.

Now YouTube insists I’m in France. Which is quite entertaining, ads wise.


I think the title just resonates. Older folks like me must reflexively upvote this stuff. I admit I do sometimes.



Since there's already an active thread with the correct link at https://news.ycombinator.com/item?id=36881948, I've merged the relevant comments thither. That way we can leave the complaints about the link here and just kill this thread.


https://iamnotarobot.substack.com/

Lately I've been writing about AI because it's impossible not to. But I generally write shower thoughts about tech and my experiences in the industry.


No complaint. It's more of a warning about how the main players (OpenAI, LangChain) share notebooks and cookbooks that illustrate how to make the LLMs "query" the databases. At the very least one would expect some language telling people to not do that in production. And it's not unique to SQL, this is just an extreme example.


> At the very least one would expect some language telling people to not do that in production. And it's not unique to SQL, this is just an extreme example.

In professional communication, is it necessary to repeat the obvious all the time? Does an article in a medical journal or a law journal need to explicitly remind its readers of 101 level stuff? If an unqualified person reads the article, misinterprets it because they don’t understand the basics of the discipline, and causes some harm as a result, how is that the responsibility of the authors of the article? Why should software engineering be any different?


> In professional communication, is it necessary to repeat the obvious all the time?

Based on the “repeat” dev articles I’ve seen on HN over the many years and the “repeat” mistakes replicated in the actual workplace, I think it is necessary.

> Why should software engineering be any different?

I don’t think it is. But also see my point below.

I understand the example you were trying to use but it wasn’t very effective. Dev blogs are not equivalent to medical or law journals in many ways that I don’t need to list. Academic computer science white papers are a bit closer.

Thinking about this more, in my experience and across multiple fields, I always see a phenomenon where colleagues/classmates/whoever reference a _popular_ but _problematic_ resource, which leads to a shitshow.


> Dev blogs are not equivalent to medical or law journals in many ways that I don’t need to list. Academic computer science white papers are a bit closer.

Okay, there are law blogs and medicine blogs too, which are directly comparable to dev blogs. And by that I mean blogs targeted at legal and medical professionals, not blogs on those topics targeted at consumers. For example, BMJ's Frontline Gastroenterology blog [0], whose target audience is practicing and trainee gastroenterologists, and its authors write for their target audience – it is public and anyone can read it, but I don't think the authors spend too much time worrying "what if an unqualified person reads this and misinterprets it due to a lack of basic medical knowledge?"

Or similarly, consider Opinio Juris, the most popular international law blog on the Internet. When a blog post contains the sentence "As most readers will know, lex specialis was created by the International Court of Justice in the Nuclear Weapons Case, to try to explain the relationship between international humanitarian law (IHL) and international human rights law (IHRL)", [1] you know you are not reading something aimed at a general audience.

[0] https://blogs.bmj.com/fg/

[1] http://opiniojuris.org/2020/01/13/the-soleimani-case-and-the...


> but I don't think the authors spend too much time worrying "what if an unqualified person reads this and misinterprets it due to a lack of basic medical knowledge?"

1) You don’t sound too sure about this. Your previous comment sounded like speculation also. Do you actually read these blogs and/or journals?

2) Again, you’re making comparisons that aren’t equivalent. Your argument fails when you replace “unqualified person” with “unqualified target person”. My pizza delivery driver is not reading dev blogs. The junior and senior engineers on my team over the years who passed 5 rounds of interviews yet still make simple but devastating mistakes are reading these blogs.

> lex specialis

1) In your previous comment, you said that medical and law journals _don’t_ explain every basic little thing. And now you provided a quote where the law blog is explicitly explaining a very basic thing even to their _qualified target audience_. If “most readers” already know something, then what’s the point of re-explaining it? You’re proving my point instead.

2) Another comparison that isn’t equivalent. Even if an “unqualified” person were to read a _professional_ law or medical blog/journal, what’s the worst that could happen? Nothing.

The answer to that question above will definitely change if we’re talking about _nonprofessional_ content (e.g. TikTok law and medical advice). Frankly, more dev blogs veer towards the “unprofessional” side than “professional”.


> 1) You don’t sound too sure about this. Your previous comment sounded like speculation also. Do you actually read these blogs and/or journals?

I have read some of them before. Not Frontline Gastroenterology, but I have spent a lot of time reading psychology/psychiatry journals, since they have some personal relevance. Some of my favourite papers are https://link.springer.com/article/10.1007/s40489-016-0085-x and https://www.nature.com/articles/s41398-019-0631-2 and also (not a paper, a letter to the editor) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9054657/

I have in the past read https://dontforgetthebubbles.com/ which is a paediatrics blog–again, has some personal relevance.

I also am very interested in law. I actually applied to law school once, but didn't get in, and gave up on the idea after that. If they'd accepted me, I might have been a lawyer right now rather than a software engineer. Public international law was always an area of particular fascination for me. I remember being at university, and I was supposed to be at my CS lecture, but instead I was in the library reading books like Restatement (Third) of the Foreign Relations Law of the United States and some textbook (I forget its name now) I found on the EU treaties and ECJ case law. So yes, I do read law blogs sometimes. I went through a period when I was reading SCOTUSblog a lot. Not a blog, but I actually enjoy reading stuff like this: https://legal.un.org/ilc/texts/instruments/english/draft_art...

> And now you provided a quote where the law blog is explicitly explaining a very basic thing even to their _qualified target audience_. If “most readers” already know something, then what’s the point of re-explaining it? You’re proving my point instead.

Even that quoted sentence is assuming the reader already knows what "international humanitarian law" and "international human rights law" are, and what is the difference between them. There are also many cases in that post in which (unlike lex specialis) the author uses technical legal terminology without ever explaining it: for example, his repeated invocation of jus ad bellum, or his mention of the "Inter-American System". Another example is where he cites the Vienna Convention on the Law of Treaties, which assumes the reader understands its significance.

> Even if an “unqualified” person were to read a _professional_ law or medical blog/journal, what’s the worst that could happen? Nothing.

For a medical journal – a person reads an article about using drug X to treat condition Y. They then proceed to misdiagnose themselves with condition Y, and then somehow acquire drug X without having been prescribed it, and start taking it. A person could cause quite serious medical harm to themselves in this way. Reading medical journals can also contribute to the development of illness anxiety disorder – you don't need to be a medical student to develop medical student's disease.

For a law journal - a criminal defendant reads something in a law journal and thinks it helps their case. Their lawyer tries to explain to them that they are misunderstanding it and it isn't actually relevant to their situation, but they refuse to listen. They fire their lawyer and then try to argue in court based on that misunderstanding. It is easy to see how they could end up with a significantly worse outcome as a result, maybe even many extra years in prison.

Conversely, our 10 year old sometimes writes Python programs. They aren't anything special, but better than I could do at his age. I bet you his Python programs are full of security holes and nasty bugs and bad practices. Who cares, what possible harm could result? And he isn't at the stage yet of reading development blogs, but I've seen him before copying code off random websites he found, so maybe he has stumbled on to one of them. My brother is a (trainee) oncologist, but he did an introductory programming course as an undergrad, and he wrote some Python programs in that too, although he hasn't done any programming in years–what harm could his programs have done? If he started trying to modify the software in one of the radiation therapy machines, I'd be worried (but he's too responsible for that); if he decided to try writing a game in Python for fun, why should anyone worry, no matter what the quality of his code is?


Maybe "complaint" was the wrong word but I disagree with the conclusion that LLMs are "not for trustworthy production systems" for the reasons I stated.

Full disclosure, I wrote a blog post called "Text to SQL in Production." Maybe I should add a follow-up covering our guardrails. I agree that they are necessary.

https://canvasapp.com/blog/text-to-sql-in-production
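
To make "guardrails" concrete, here is a minimal sketch of one possible guardrail: run the model-generated SQL on a read-only connection and cap the number of rows returned. This is an illustration, not what the post above describes. SQLite, the `run_generated_query` helper, and the `users` demo table are all assumptions made for the example.

```python
import os
import sqlite3
import tempfile


def run_generated_query(db_path: str, sql: str, max_rows: int = 100):
    """Run model-generated SQL on a read-only SQLite connection (hypothetical helper)."""
    # mode=ro makes SQLite reject any statement that writes (INSERT, UPDATE, DROP, ...).
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        # execute() accepts a single statement, so stacked statements are rejected too.
        cursor = conn.execute(sql)
        # Cap the result size so a broad SELECT cannot return the whole table.
        return cursor.fetchmany(max_rows)
    finally:
        conn.close()


def make_demo_db() -> str:
    """Create a throwaway database with one hypothetical table for the demo."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("ada",), ("bob",)])
    conn.commit()
    conn.close()
    return path
```

A read-only connection only prevents writes. It does nothing about a query reading data the user should not see, so it would be one layer among several rather than a complete answer.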


tl;dr: nothing we didn't know. Since the beginning of time, startups with lots of funding have failed for a number of reasons. AI is no different in that regard.


Seconded. Start here, and pause whenever you realize there is something that you need to learn in order to follow. Learn that and keep going.


I'm already lost 1 minute in when he's talking about 'evaluating the gradient of a loss function'.


Taking the first derivative of the objective function, to figure out if it’s close enough to zero, right?
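
Roughly, yes, although the gradient is mostly used to decide which way to move the parameters, not only to check whether it is near zero. A minimal sketch, assuming a one-parameter model y = w * x with a mean squared error loss (the model, data, and function names are made up for illustration and are not from the video):

```python
def loss(w, xs, ys):
    """Mean squared error of the one-parameter model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def loss_gradient(w, xs, ys):
    """Derivative of the loss with respect to w, worked out by hand."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, w=0.0, learning_rate=0.05, steps=200):
    """Repeatedly step against the gradient so the loss goes down."""
    for _ in range(steps):
        w -= learning_rate * loss_gradient(w, xs, ys)
    return w


# Toy data generated by y = 2 * x, so the best w is 2.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
w_fit = gradient_descent(xs, ys)
```

With more parameters the derivative becomes a vector of partial derivatives, one per parameter, and that vector is the gradient.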


So ask ChatGPT what it's about. Seriously, learning shit became much easier nowadays.


I had not seen that article. I'm not surprised someone had a similar idea before.


If you try Bard or Claude or character.ai they are not far behind GPT4. They might even be on par in terms of raw LLM capabilities. ChatGPT has better marketing and in some cases better UX. A lot of this is self-fulfilling. We think it's far ahead, so it appears to be far ahead.


> If you try Bard or Claude or character.ai they are not far behind GPT4

Bard is way behind ChatGPT with GPT-3.5, much less GPT-4. Haven’t tried the others, though.

OTOH, that’s way behind qualitatively, not in terms of time-of-progress. So I don’t think it is at all an insurmountable lead, as much as it is a big utility gap.


In my experiments:

GPT4>ChatGPT>Claude>Character AI> Bard

Claude and Character AI are great at holding a conversation but they lack the ability to do anything specialized that really makes these LLMs useful in my day to day life. I ask GPT-4 and ChatGPT questions I would ask on Stack Overflow; I can’t do that with Claude or Character AI. Bard actually seems behind the rest even conversationally.


That's exactly what I did here. https://github.com/dbasch/semantic-search-tweets


Thank you! Comparing this and the link the other commenter posted, what handles the actual search querying? Does instructor-xl include an LLM in addition to the embeddings? The other commenter's repo uses Pinecone for the embeddings and OpenAI for the LLM.

My apologies if I am completely mangling the vocabulary here - I have an, at best, rudimentary understanding of this stuff that I am trying to hack my education on.

Edit: If you're at the SF meetup tomorrow, I'd happily buy you a beverage in return for this explanation :)


It's in the repo:

You first create embeddings. What is this? It's an n-dimensional vector space with your tweets 'embedded' in that space. Each word is an n-dimensional vector in this space. The vectorization is supposed to maintain 'semantic distance'. Basically, if two words are very close in meaning or related (by, say, frequently appearing next to each other in the corpus) they should be 'close' in some of those n dimensions as well. The result at the end is the '.bin' file, the 'semantic model' of your corpus.

https://github.com/dbasch/semantic-search-tweets/blob/main/e...

For semantic search, you run the same embedding algorithm against the query, take the resultant vectors, and do a similarity search via matrix ops, resulting in a set of results with similarity scores. These point back to the original source, here the tweets, and you just print the tweet(s) that you select from that result set (here the top 10).

https://github.com/dbasch/semantic-search-tweets/blob/main/s...

Experts can chime in here but there are knobs such as 'batch size' and the functions you use to index. (cosine was used here.)

So the various performance dimensions of the process should also be clear. There is a fixed cost of making the embeddings of your data. There is a per-op embedding of your query, and then running the similarity algorithm to find the result set.
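
The steps above can be sketched in a few lines. This is a toy version and not the code from the repo: the `embed` function here is a stand-in that just counts words over a fixed vocabulary, so it captures word overlap and not meaning. A real setup would swap in a model such as instructor-xl. All function names are made up for the example.

```python
import math
from collections import Counter


def embed(text, vocabulary):
    """Toy embedding (an assumption): one word-count dimension per vocabulary word."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(documents):
    """Fixed cost: embed every document once, up front."""
    vocabulary = sorted({word for doc in documents for word in doc.lower().split()})
    vectors = [embed(doc, vocabulary) for doc in documents]
    return vocabulary, vectors


def search(query, documents, vocabulary, vectors, top_k=10):
    """Per-query cost: embed the query, score it against every document, keep the best."""
    query_vector = embed(query, vocabulary)
    scored = [
        (cosine_similarity(query_vector, vector), doc)
        for vector, doc in zip(vectors, documents)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


tweets = [
    "the cat sat on the mat",
    "stock markets fell today",
    "my cat chased a mouse",
]
vocabulary, vectors = build_index(tweets)
results = search("cat mouse", tweets, vocabulary, vectors)
```

Here `build_index` is the one-time cost and `search` is the per-query cost mentioned above. With a real model the vectors would be dense and the scoring would normally be a single matrix multiplication.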


Thank you for this walkthrough, and for citing the code alongside!


hth

