Congrats to the Arango team! I used it at my earlier workplace for computing suggested friends/followers which replaced an older service (Postgres + redis and application server-side caches). The resulting solution was faster, ran on a single modest machine (8 GB RAM, 4 cores) and allowed us to spin down 3 higher-end machines and reduce a layer of caches on application-servers. Writing the microservice using Foxx (which is built-in to Arango) was a pleasure and the easy deployment + Swagger API was a great developer experience. The community slack was friendly and helped me out with some AQL.
Imagine you would go to your preferred online marketplace and search for a generic product.
You get 1000+ results.
So you filter by average star rating > 4.0.
Still 500+ results.
And products with just one 5-star rating rank ahead of the one with 300 reviews and a 4.8 average. Annoying.
What I really want:
I would like to filter for products that have at least 5 (relatively long) reviews, an average rating of at least 4.0, and at least 2 review comments mentioning the use case for which I would like to use this product. Maybe I want only verified purchases to be counted, or only the reviews of friends and friends of friends...
Using a native multi-model approach you can do both. Simply retrieve all category X products ranked by product rating (limit 50 per page), or perform advanced lookups - without having to synchronize data from a document or relational model into an additional graph or search engine.
Combining full text search with scorers, graph traversals and/or join operations you could do an ad-hoc query in AQL to get the most relevant products & reviews with a single query.
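To make that concrete, such an ad-hoc query could look roughly like the sketch below. The collection names (`products`, `reviews`), field names, and the "commute" use-case keyword are all invented for illustration, and a simple CONTAINS stands in for a proper ArangoSearch view:

```aql
FOR p IN products
  FILTER p.category == "headphones"
  LET revs = (
    FOR r IN reviews
      FILTER r.productId == p._key AND r.verifiedPurchase
      RETURN r
  )
  /* at least 5 reviews, average rating >= 4.0 */
  FILTER LENGTH(revs) >= 5 AND AVERAGE(revs[*].rating) >= 4.0
  /* at least 2 reviews mentioning the intended use case */
  FILTER LENGTH(
    FOR r IN revs
      FILTER CONTAINS(LOWER(r.text), "commute")
      RETURN 1
  ) >= 2
  SORT AVERAGE(revs[*].rating) DESC
  LIMIT 50
  RETURN { product: p, avgRating: AVERAGE(revs[*].rating) }
```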
Multi-model provides choice. In data modeling and querying.
Never heard of this before either, and it looks like a very good fit for our ML production data. Currently we use Cassandra, and I would have to see how easy migrating would be and whether it's worth it.
Seeing all the comments here it seems like Arango is a good fit for many use cases.
I would really recommend that the founders of the company invest in marketing; it's really important for developers to have something that speaks to them.
My point is that maybe the issue here isn't the performance or features of the database, but rather the marketing that prevents it from finding its market fit.
Well deserved! At work I have been using ArangoDB for a few years now as a graph database. So far it's been working great with up to 100K graph nodes across two dozen collections.
ArangoDB is a perfect tool for prototyping or early stages of companies that need data that might have multiple looks to it. I've used it on tons of small projects and have nothing but praise. It's a solid not-so-little beast.
I don't understand under what specific use case ArangoDB works best; the comparisons section lists Cassandra and Neo4j, and my understanding was Cassandra was for something like chat apps and Neo4j was for something like GIS analytics. Enlighten me?
Also, how convertible is a proprietary query language like AQL/CQL to SQL? Is it fully declarative, and does it version completely independently of the database core?
Regarding the question on the query language, AQL is fully declarative. In this respect it is like SQL. However, there are a few differences between AQL and SQL:
* SQL is an all-purpose database management and querying language. It is complex and heavy-weight because it has to solve a lot of different problems, e.g. data definition, data retrieval and manipulation, stored procedures, etc. AQL is much more lightweight, as its purpose is querying and manipulating database data. Data definition and database administration commands are not part of AQL, but can be achieved using other, dedicated commands/APIs.
* for data retrieval and manipulation, the functionality of SQL and AQL overlaps a lot, but they use different keywords for similar things. Still, simple SQL queries can be converted to AQL easily and vice versa. There are some specialized parts of AQL, such as graph traversals and shortest-path queries, for which there may be no direct equivalent in SQL.
AQL is versioned along with the database core, as sometimes features are added to AQL which the database core must also support, and vice versa. However, during further development of AQL and the database core, one of the major goals is to keep it backwards-compatible, meaning that existing AQL queries are expected to work and behave identically in newer versions of the database (but ideally run faster or are better optimized there).
Okay, I like how backwards compatibility is preserved. I worked with MongoDB at my previous company and we ended up not being able to migrate to MongoDB 3.x. I think it was because we forked 'eve-mongoengine' and couldn't merge upstream changes, which ended up forcing us to version the entire stack together with the database at the same time, which exceeded the threshold of feasibility.
We were absolute idiots, but I still think a data warehouse should be idiot-proof, which is why I like SQL.
I read through the documentation for ArangoDB and I would be concerned about the lack of native strict type definitions and referencing in AQL, as well as the dearth of type availability in ArangoDB in general. Is this a design decision related to not supporting data/database administration, or something to be added later to the roadmap?
It sounds like, if you support write-intensive paths through the database, it would be considered an OLTP database for some workloads; do you publish TPC-C benchmarks anywhere? What about resource utilization?
Is there a particular reason to support JavaScript first? Is it because Swagger has JavaScript-first support, or a different reason?
ArangoDB is a schema-less database.
There is currently no support for schemas or schema validation on the database core level, but it may be added later, because IMHO it is a very sensible feature.
When that is in place, AQL may also be extended to get more strict about the types used in queries. However, IMHO that should only be enforced if there is a schema present.
To keep things simple and manageable, we originally started with AQL just being a language for querying the database. It was extended years ago to support data manipulation operations. I don't exclude the possibility that at some point it will support database administration or DDL commands, however, I am just one of the developers and not the product manager.
And you are right about the main use case being OLTP workloads. For OLAP use cases, dedicated analytical databases (with fixed data schemas) are probably superior, because they can run much more specialized and streamlined operations on the data.
To the best of my knowledge we never published any TPC benchmark results anywhere. I think it's possible to implement TPC-C even without SQL; however, implementing the full benchmark is a huge amount of work, so we never did...
Forgot to answer the JavaScript question...
JavaScript can be used in ArangoDB to run something like stored procedures. ArangoDB comes with a JavaScript-based framework (named Foxx) for building data-centric micro services. Its usage is completely optional however. When using the framework, it will allow you to easily write custom REST APIs for certain database operations. The API description is consumable via Swagger too, so API documentation and discoverability are no-brainers.
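A minimal Foxx route might look roughly like this. This is a sketch that only runs when mounted inside ArangoDB as a Foxx service (not in plain Node.js), and the `users` collection and route path are made up for illustration:

```js
'use strict';
// Foxx services use ArangoDB's built-in modules, so this code only
// works when mounted inside the database, not in a standalone runtime.
const createRouter = require('@arangodb/foxx/router');
const db = require('@arangodb').db;

const router = createRouter();
module.context.use(router);

// GET /users/:key -> returns one document from the (assumed) users collection.
router.get('/users/:key', (req, res) => {
  res.send(db._collection('users').document(req.pathParams.key));
})
  .summary('Fetch a user')
  .description('Returns a single user document by key; appears in the Swagger API docs automatically.');
```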
Apart from that, ArangoDB comes with a JavaScript-enabled shell (arangosh) that can be used for scripting and automating database operations.
Somewhat at least...
N1QL tries to be closer to SQL in terms of keywords and such, whereas the AQL approach was to pick different keywords than SQL. Apart from the difference in keywords, I tend to agree.
As an aside, Apache Cassandra CQL is now used by a growing number of wire-compliant databases:
• Cassandra
• Scylla (full disclosure: I work for ScyllaDB)
• DataStax Enterprise (DSE)
• Cosmos DB
• Yugabyte
Also, Cassandra, and Cassandra-like databases (like Scylla) are capable of far more than 'chat apps.' There are a lot of IoT, adtech, and other use cases. I just published this blog today: https://www.scylladb.com/2019/03/14/from-sap-to-scylla-track...
(Apologies for coming in sideways to this thread. Hats off to ArangoDB, and all in the NoSQL arena who are pushing the envelope in terms of new Big Data solutions.)
ArangoDB is a multi-model database so it tries to target several use cases. It provides functionality working with key-values, documents, graphs and fulltext indexing/searching. It provides some flexibility in the sense that it does not force you into a specific way of working with the data. For example, it does not force you to treat each use case as a graph use case.
This is in contrast to some other specialized databases, which excel at their specific area, but also force you to completely adopt the type of data-modeling they support.
Think we have to be a bit more precise here. ArangoDB supports documents, key/value, and graph. It is not really optimized for large timeseries use cases which might need windowing or other features. Influx or Timescale might provide better characteristics here. However, for the supported data models we found a way to combine them quite efficiently.
Many search engines access data stored in JSON format. Hence, integrating a search engine like ArangoSearch as an additional layer on top of the existing data models is no magic, but makes a lot of sense. Allowing users to combine models with search was then a rather obvious task for us.
Specialized databases have the advantage of being, well, specialized...
For example, a specialized OLAP database which knows about the schema of the data can employ much more streamlined storage and query operators, so it should have a "natural" performance advantage.
However, a very specialized database may later lock you into something, and in case you need something different, you will end up running multiple different special-purpose databases.
Not saying this is necessarily bad (or good), but it is at least one aspect to consider: how many different databases do you want to operate and manage in your stack?
Interesting. Pretty much every startup I've worked for has run 2-3 databases. Usually Redis plus some search (typically Elastic now). I could see this making that easier.
I also feel a managed service is quite a critical offering they're missing right now. Considering the recent AWS and Elastic debacle, the market is going to be tough for open-source products like ArangoDB.
I'm a bit more confused on this one, not having seen the tech before. Isn't the DB space absurdly saturated with open-source tools like this, most of which don't really have much life in them?
Postgres is a multi-model DB, with document/key-value/graph support -- isn't it pretty easy for an established player to add another data model onto their platform?
I played with Arango a few years ago to prototype some graph stuff. Super fun to play with and it was awesome being able to traverse the graph so easily.
We were playing with data to make it easy to go from a specific analyte that was generated all the way up through its protein, DNA, chromosome, disease, and phenotype via the graph. I'm sad the project never went anywhere, but even back then Arango was great.
ArangoDB is definitely my database of choice. There is a lot to like. Ease of setup and clustering, free REST API, solid graph features with AQL, great docs. I have been promoting it in my projects. I would love to be their partner or tech evangelist for Southeast Asia. If you guys are looking, I am game for it.
I wonder how one would douse investors' concerns about having an open-source product like ArangoDB, with AWS effectively eating their lunch if/when wide adoption comes?
Congratulations on the funding btw! I'm a happy and grateful user.
The database market with all its competition is definitely challenging. I have no doubt AWS will increase their database market share over time.
The good thing about this competition is that it is forcing all vendors to be innovative and to find (more) USPs.
AWS DocumentDB seems to be pretty much tied to the MongoDB API right now... so at the moment this will somewhat limit its functionality. However, they will not stand still and will probably also extend into the multi-model space at some point. Apart from that, not everyone will be willing to pay for DocumentDB or have their data located in Amazon datacenters.
"AWS DocumentDB seems to be pretty much tied to the MongoDB API"
I could imagine that they didn't build DocumentDB from the ground up.
DocumentDB is probably just a MongoDB compatible API for one of their base services (S3 or DynamoDB).
As far as I know, they built Serverless Aurora on top of S3, with the help of S3 Select. So they will probably just create another custom-DB-compatible API whenever they get the impression that some custom DB is becoming the next big thing.
Exactly, AWS DocumentDB is only MongoDB API-compatible, but it's not using any MongoDB components.
It's an implementation of its own, leveraging many of the base building blocks and infrastructure Amazon has created.
DocumentDB is currently tied to the MongoDB 3.6 API, which means all the transactional extensions MongoDB has added recently are not present in DocumentDB (yet).
We've used ArangoDB for a while where I work, and have only had positive experiences so far. The query language, speed, and flexibility are all nice to work with.
Hmm, I wonder how hard it would be to make a JavaScript driver that lets you manipulate data just like you do in JavaScript, e.g. using map/reduce/filter, push, etc. For me there's a lot of overhead when switching back and forth between different languages, e.g. between JS and SQL, even though SQL is a powerful language and I'm really good at it.
(full disclosure: I work for ArangoDB but this is my own personal opinion)
Coming from a JS background AQL is actually pretty easy to learn. Personally the only thing that keeps tripping me up is that AQL doesn't have a triple-equals and JS has trained me to avoid double-equals in comparisons.
This is how you fetch every user in a collection:
FOR user IN users RETURN user
This is how you fetch every admin:
FOR user IN users FILTER user.role == "admin" RETURN user
This is how you fetch their email addresses:
FOR user IN users FILTER user.role == "admin" RETURN user.email
Compare this to the equivalent in SQL:
SELECT email FROM users WHERE role = 'admin'
The AQL example is IMO easy to read if you know JS or any similar language. AQL even has object and array literals. There are a few idiosyncrasies but you can get very far without needing to invest time to "properly" learn the language. The naive approach usually results in pretty good performance out of the box.
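For comparison, here is the same filter-and-project step done with plain JavaScript array methods on in-memory data (the toy records below are invented for illustration, not an ArangoDB API); the shape of the chain mirrors the AQL query almost one-to-one:

```javascript
// Toy in-memory data standing in for a `users` collection.
const users = [
  { name: "alice", role: "admin", email: "alice@example.com" },
  { name: "bob", role: "user", email: "bob@example.com" },
  { name: "carol", role: "admin", email: "carol@example.com" },
];

// Equivalent of: FOR user IN users FILTER user.role == "admin" RETURN user.email
const adminEmails = users
  .filter((user) => user.role === "admin")
  .map((user) => user.email);

console.log(adminEmails); // [ 'alice@example.com', 'carol@example.com' ]
```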
I'd say the mental overhead of switching into AQL and out doesn't quite compare to that of e.g. SQL or even MongoDB queries but you are of course correct that there is some overhead nevertheless. That said, there are community-maintained ODMs for ArangoDB if you don't want to touch another language to write the queries by hand.
I would strongly recommend giving AQL a try though. When I started using ArangoDB (before becoming a contributor) I was hesitant as well but what quickly won me over was that I was able to read most AQL queries without having to learn an entirely new language.