ML Notes: Advocacy

Garbage Collection

GC for Systems Programmers - a very effective defense of GC as such, in particular "Lies people believe about memory management"
an anonymous malloc() rant - not praising GC at all, but if you aren't using composable and explicit custom allocators, which can dodge the complaints expressed here in a way that RAII and Rust don't dodge, then why not just use GC in a more disciplined manner? Because you're not getting as much as you might think from the technology vs. the discipline.

Exceptions

Why I prefer exceptions to error values - exceptions are cleaner, provide better error messages, can even be faster, and don't create the same moral hazards, like forcing a choice between "change my interface and every caller's use of it" vs. "Rust's unwrap: assert that this error never happens"
Response to the last from The PrimeTime - his best argument is at the end, that with exceptions it's harder to know where errors are even possible. I'd appreciate some static assertions about that, and some way to show possible exceptions from a function with merlin.
How OCaml exceptions are implemented
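To make the trade-off concrete, here is a minimal sketch of the same lookup written both ways. The `config` list and the names `port_exn`, `port_res`, and `port_unwrapped` are hypothetical, invented for illustration; `Result.get_ok` plays roughly the role of Rust's unwrap.

```ocaml
(* Hypothetical configuration: an association list of settings. *)
let config = [ ("host", "localhost"); ("port", "8080") ]

(* Exception style: the return type is a plain int.
   List.assoc raises Not_found; int_of_string raises Failure. *)
let port_exn cfg = int_of_string (List.assoc "port" cfg)

(* Error-value style: every failure shows up in the return type,
   and every caller has to handle or propagate it. *)
let port_res cfg =
  match List.assoc_opt "port" cfg with
  | None -> Error "port: missing"
  | Some s -> (
      match int_of_string_opt s with
      | None -> Error ("port: not an integer: " ^ s)
      | Some n -> Ok n)

(* The "assert that this never happens" escape hatch:
   Result.get_ok raises Invalid_argument when given an Error. *)
let port_unwrapped cfg = Result.get_ok (port_res cfg)
```

Note that the types of `port_exn` and `port_unwrapped` say nothing about failure, which is the "harder to know where errors are even possible" objection in miniature.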

Comparisons

vs. Haskell: 8 months of OCaml after 8 years of Haskell in production
If I come to an existing OCaml project, the worst thing previous developers could do to it is have poor variable names, minimal documentation, and 200+ LOC functions. That’s fine, nothing extraordinary, I can handle that.

If I come to an existing Haskell project, the worst thing previous developers could do… Well, my previous 8 years of Haskell experience can’t prepare me for that 😅

That’s why I feel more productive in OCaml.
vs. F#: Why I chose OCaml as my primary language - not focused on the comparison. In F#'s favor: computation expressions, type providers, active patterns, statically resolved type parameters, .NET interop. And against it: lack of a module language, incompatible object model, no local/generalized opens, no polymorphic variants, no GADTs, no user-defined effects, no extensible sum types.
vs. Python: OCaml: what you gain - mostly positive, but notably:
In the SAT solver, I declared the type of a literal abstractly as type lit and, internally, I used type lit = int (an array index). That worked fine. Later, I changed the internal representation from an int to a record. Ideally, that would have no effect on users of the module, but OCaml allows testing abstract types for equality, which resulted in each comparison recursively exploring the whole SAT problem. It can also cause runtime crashes if it encounters a function in this traversal. Haskell's type classes avoid this problem by letting you control which types can be compared and how the comparison should be done.
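The hazard in the quote can be reproduced in a few lines. This is a sketch, not the SAT solver's code: the module `Lit`, its record fields, and its function names are all hypothetical.

```ocaml
(* Hypothetical module standing in for the solver's literal type. *)
module Lit : sig
  type t                       (* abstract: callers cannot see the record *)
  val make : int -> t
  val equal : t -> t -> bool   (* the comparison the author intends *)
end = struct
  type t = { id : int; on_assign : unit -> unit }
  let make id = { id; on_assign = (fun () -> ()) }
  let equal a b = a.id = b.id
end

let a = Lit.make 1
let b = Lit.make 1

(* The intended comparison looks only at the id. *)
let same = Lit.equal a b

(* Polymorphic (=) still typechecks on the abstract type. It walks the
   whole representation and raises when it reaches the closure. *)
let poly_equal_raises =
  match a = b with
  | (_ : bool) -> false
  | exception Invalid_argument _ -> true
```

Nothing in the signature of `Lit` warns a caller away from `a = b`, which is why a change of representation can break code outside the module.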
vs. F#, Haskell, Scala, Rust, SML: Real World OCaml's prologue
OCaml stands apart because it manages to provide a great deal of power while remaining highly pragmatic. The compiler has a straightforward compilation strategy that produces performant code without requiring heavy optimisation and without the complexities of dynamic just-in-time (JIT) compilation. This, along with OCaml’s strict evaluation model, makes runtime behavior easy to predict. The garbage collector is incremental, (letting you avoid large GC-related pauses) and precise, meaning it will collect all unreferenced data (unlike many reference-counting collectors). Plus, the runtime is simple and highly portable.
vs. Erlang, with SML and Haskell: My Road to Erlang
the tutorials were self-absorbed in the accoutrements of functional programming: type systems, fancy ways of using types for generic programming, lambda calculus tricks like currying. The tutorials for all three languages were surprisingly similar. The examples were either trivial or geared toward writing compilers.

...

There were a few trivial games written in OCaml, but they made heavy use of imperative features which made me wonder what the point was.
And also Would You Bet $100,000,000 on Your Pet Programming Language?
Say you've got a program that operates on a large set of floating point values. Hundreds of megabytes of floating point values. And then one day, your Objective Caml program runs out of memory and dies. You were smart of course, and knew that floating point numbers are boxed most of the time in OCaml, causing them to be larger than necessary. But arrays of floats are always unboxed, so that's what you used for the big data structures. And you're still out of memory. The problem is that "float" in OCaml means "double." In C it would be a snap to switch from the 64-bit double type to single precision 32-bit floats, instantly saving hundreds of megabytes. Unfortunately, this is something that was never considered important by the OCaml implementers, so you've got to go in and mess with the compiler to change it. I'm not picking on OCaml here; the same issue applies to many languages with floating point types.
(On this specific concern, Bigarray gives you 32-bit floats, and even 16-bit floats as of OCaml 5.2, if you want them.)
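A minimal sketch of the Bigarray route, using single-precision elements since that is what the quoted complaint asks for. The element count is arbitrary.

```ocaml
(* Compare the storage cost of n floats as 64-bit vs 32-bit elements. *)
let n = 1_000_000

let doubles = Bigarray.Array1.create Bigarray.float64 Bigarray.c_layout n
let singles = Bigarray.Array1.create Bigarray.float32 Bigarray.c_layout n

let () =
  Bigarray.Array1.fill doubles 0.0;
  Bigarray.Array1.fill singles 0.0;
  (* Reads and writes still use OCaml's float type; the value is
     rounded to single precision when it is stored. *)
  singles.{0} <- 0.1;
  Printf.printf "float64: %d bytes, float32: %d bytes\n"
    (Bigarray.Array1.size_in_bytes doubles)
    (Bigarray.Array1.size_in_bytes singles)
```

The cost is that Bigarray data lives outside the OCaml heap and is accessed through its own API rather than through ordinary `float array` functions.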
vs. Lean: Why Lean 4 replaced OCaml as my Primary Language - OCaml has frank and predictable compilation, a conservative release cycle, restrained metaprogramming, and a professional build system, vs. some Haskell-like proof assistant with fractal complexity.

Performance

https://roscidus.com/blog/blog/2024/07/22/performance/ - very thorough and tool-heavy troubleshooting of a performance degradation on a rewrite from Lwt to Eio. With a followup, in particular:
OCaml has a major heap for long-lived values, plus one fixed-size minor heap for each domain. New allocations are made sequentially on the allocating domain's minor heap (which is very fast, just adjusting a pointer by the size required).
When the minor heap is full the program performs a minor GC, moving any values that are still reachable to the major heap and leaving the minor heap empty.
Garbage collection of the major heap is done in small slices so that the application doesn't pause for long, and domains can do marking and sweeping work without needing to coordinate (except at the very end of a major cycle, when they briefly synchronise to agree a new cycle is starting).
However, as minor GCs move values that other domains may be using, they do require all domains to stop.
(The significant improvement from removing an auxiliary loop is kind of appalling.)
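The minor-heap behaviour described in the quote can be observed from inside a program with the standard Gc module. This is a rough single-domain sketch; the allocation count is arbitrary, and it does not demonstrate the cross-domain stop that the quote mentions.

```ocaml
(* Number of minor collections run so far, as reported by the runtime. *)
let minor_collections () = (Gc.quick_stat ()).Gc.minor_collections

(* Allocate n short-lived two-word blocks on the minor heap.
   Sys.opaque_identity keeps the compiler from removing the allocation. *)
let allocate_garbage n =
  for i = 1 to n do
    ignore (Sys.opaque_identity (ref i))
  done

let before = minor_collections ()
let () = allocate_garbage 10_000_000
let after = minor_collections ()

let () =
  Printf.printf "minor collections during the loop: %d\n" (after - before)
```

Because none of the allocated blocks stay reachable, each minor GC here has almost nothing to move to the major heap.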

Reservations and dissatisfactions

It's ugly, I don't like the syntax, SML looks better, Haskell looks better, Rust looks better. I don't like having to parenthesize (-1) to avoid subtraction. I don't like semicolons as the list separator. I don't like all the dots in floating-point math.

OK. I found that my aggravation dropped a huge amount when I got past the initial stage of "why does adding ;; fix it? Should it be let ... in or not, and why not?" After that, I found that it grew on me quickly. Good tooling also helps a lot - especially ocp-indent and ocamlformat and merlin, and utop, and getting comfortable with odocs after seriously poring over ocamlgraph docs during Advent of Code.

One big reason that I won't write Lisp anymore is that I got tired of needing lots of editor help just to edit it in any way, so I get objecting to a language on a syntax level. The dots also can make math code look ugly very quickly, especially when written without whitespace. ocamlformat is also seriously hampered, compared to zig's formatter, by the language not having subtle ways to guide formatting. You can't just add a trailing comma to get a vertical layout of a function's parameters, for example.
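For readers who have not met them, the syntax complaints above look like this in practice. The binding names are made up for the example.

```ocaml
(* Negative literals as arguments need parentheses:
   `abs -1` would parse as the subtraction `abs - 1`. *)
let one = abs (-1)

(* Semicolons separate list elements. *)
let three_ints = [ 1; 2; 3 ]

(* Commas build tuples, so this is a one-element list holding a triple. *)
let one_triple = [ 1, 2, 3 ]

(* Float arithmetic has its own dotted operators. *)
let hypotenuse a b = sqrt ((a *. a) +. (b *. b))
```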

There are too many stdlibs, which fractures the already small community. This is the Python2/3 problem, the D attribute/betterc problem, the Erlang/Elixir problem!

Very surprisingly, it's not that bad, because code using different stdlibs actually continues to work fine together. You can have a Base module and a Stdlib module and link them into the same program. There's certainly a potential maintenance and human burden, but it's really not the deep split that you might anticipate. A win for OCaml's superior module system, perhaps.

at this point the core language is mostly frozen, isn’t it? We’re not going to get Rust’s loop, or if let, or break/continue, or safer syntax.

It's not that frozen when OCaml releases keep adding language features, like immutable arrays (sharing the literal syntax!), labeled tuples, atomic record fields in 5.4, syntax for deep effect handlers in 5.3, concurrency and parallelism in 5.0 from just three years ago.

But mainly, there's a very important opposite hazard that OCaml avoids: recklessly adding tons of ill-advised features and then struggling to cope with it all or having to walk it back, like in Perl, C++, D, Nim - even Zig, which added async and then ripped it out, and has gone 4 releases since without a replacement. Languages join fads like defer, add it to say "yeah, we've got defer", and then don't care that it interacts strangely with the rest of the language or results in codebases that in the aggregate are harder to maintain.

.mli files feel like a chore with duplicate definitions, when most recent languages don't have such things

Other languages have other (worse) problems in this chore's stead, but it's fair that it feels that way. I don't think OCaml would be much different if it had some sugar for the case that you only want to make some small changes to a module's inferred type.
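A small sketch of the duplication in question, written as one inline module so that it fits in a single file. The `Counter` module is hypothetical; its `sig ... end` part is what would live in a .mli file and its `struct ... end` part in the matching .ml file.

```ocaml
module Counter : sig
  type t                   (* abstract: hides the int representation *)
  val zero : t
  val incr : t -> t
  val to_int : t -> int
end = struct
  type t = int
  let step = 1             (* not in the signature, so private *)
  let zero = 0
  let incr n = n + step
  let to_int n = n
end
```

Every exported name appears twice, but the only places where the signature differs from what the compiler would infer are the abstract `type t` and the omission of `step`.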

OCaml's lack of equational definitions, in particular, is ugly

This is a sentiment that I'd almost forgotten I ever had, but I did have it at one point. Standard ML, Haskell, Erlang (and Prolog et al.) all have function-level pattern-matching, where function definitions can resemble mathematical definitions a little more:

fun length [] = 0
  | length (_ :: tl) = 1 + length tl

mylength([]) -> 0;
mylength([_|T]) -> 1 + mylength(T).

OCaml has some restrained sugar with function, but it really doesn't do this at all:

let rec length = function
  | [] -> 0
  | _ :: tl -> 1 + length tl

let rec length list =
  match list with
  | [] -> 0
  | _ :: tl -> 1 + length tl

SML's charming here but it's not something I miss. It gets bad, as well. Here's an example from The Little MLer:

(* book formatting *)
fun subst_anchovy_by_cheese(Crust)
    = Crust
  | subst_anchovy_by_cheese(Cheese(x))
    = Cheese(subst_anchovy_by_cheese(x))
  | subst_anchovy_by_cheese(Onion(x))
    = Onion(subst_anchovy_by_cheese(x))
  | subst_anchovy_by_cheese(Anchovy(x))
    = Cheese(subst_anchovy_by_cheese(x))
  | subst_anchovy_by_cheese(Sausage(x))
    = Sausage(subst_anchovy_by_cheese(x))

(* smlfmt *)
fun subst_anchovy_by_cheese (Crust) = Crust
  | subst_anchovy_by_cheese (Cheese (x)) =
      Cheese (subst_anchovy_by_cheese (x))
  | subst_anchovy_by_cheese (Onion (x)) =
      Onion (subst_anchovy_by_cheese (x))
  | subst_anchovy_by_cheese (Anchovy (x)) =
      Cheese (subst_anchovy_by_cheese (x))
  | subst_anchovy_by_cheese (Sausage (x)) =
      Sausage (subst_anchovy_by_cheese (x))

With a long function name, even a function this simple becomes a wall of text. OCaml:

let rec subst_anchovy_by_cheese = function
  | Crust -> Crust
  | Cheese x -> Cheese (subst_anchovy_by_cheese x)
  | Onion x -> Onion (subst_anchovy_by_cheese x)
  | Anchovy x -> Cheese (subst_anchovy_by_cheese x)
  | Sausage x -> Sausage (subst_anchovy_by_cheese x)

Hostile reviews

100 languages speedrun - "possibly the ugliest syntax of any major programming language", and
Should you use OCaml?
No.

OCaml offered a mix of features that was somewhat appealing a few decades ago - it's a functional garbage-collected language, statically compiled to speeds comparable to Java, with easy to understand eager semantics (no laziness and monads), and syntax which while godawful at least doesn't have millions of parentheses. All alternatives back then were either not really functional (C, Java), too parenthesized (Lisp), semantically too weird (Haskell), or too slow (Lisp, Ruby; generally Haskell too unless you put a lot of effort to work around its laziness).

Nowadays most languages have sufficient functional features (even totally non-functional ones like Kotlin), there's a plethora of LLVM-based languages that are fast enough, so OCaml's niche disappeared - and it was a small niche to begin with.

Strict ordering

in F# it's great for taming dependencies