http://www.paulgraham.com/carl.html
Here is what I got out of it.
1. They keep large amounts of data in memory mapped files
This puts the onus of memory management on the OS file system cache, which makes sense. Kafka, lucene, elasticsearch, varnish, mongodb, and many other systems take a similar strategy.
They also chose to hold the data in C/C++ structs and just access pointers to that object in lisp. I think this is probably similar to what numpy does in python, keeping the large data objects out of the python runtime and avoiding GC.
2. They find options for flights to and flights from using simple graph search
That makes sense too. The amount of data here is relatively limited.
3. To find optimal flights they use "very clever algorithms" on a "pricing-graph".
I'm thinking for this they use something like A* as a path finding algorithm (for a great intro to A*, see this series from Red Blob Games). This would allow them to say that they search through the full cartesian product (every possible combination) of pricing combinations but really only evaluate items that are likely to have the lowest prices.
The article mentions that results are
ordered according to the function f", assuming of course certain restrictions on f,
This makes A* (or another Djikstra derivative) even more likely, since this sounds like a requirement for an admissible distance heuristic.
4. They preallocate data structures and just fail queries when per query memory runs out
[...] we pre-allocate all data structures we need and die on queries that exceed them
This reminds me a bit of how elasticsearch works. They use threadpools to maintain a bunch of threads ready to do a specific amount of work. Each thread pool also has a queue of a certain size. When the pool if full, it will either block or just reject the attempt to add to the queue, depending on the type of process. This allows elasticsearch to use a consistent bounded amount of memory for its core performance critical code.
Manual memory management is also very common in other performance critical systems like game engines, scientific computing, and virtual machines / language runtimes.
High level take aways:
- they use smart algorithms in stages to yield a solution
- they avoid GC as much as possible
- long queries can be ok, as long as they yield a lot of value (or make the wait feel less like waiting)