ML Notes: Distinctiveness

ocaml.org has a Why OCaml page, adapted from Real World OCaml, which presents OCaml with a familiar bag of features that make OCaml seem like an already-familiar language:

GC'd
First-class functions
Static type-checking
Parametric polymorphism
Support for immutable programming
Type inference
Algebraic data-types and pattern-matching

The pitch there is not just the feature list, but the combination and implementation of these features. Still, the list is what's prominent, and even Python ticks every box on it at least partially. What might actually surprise you about OCaml?

Thoughtful design

Like the next point, this is relative to other languages and it's hard to argue in isolation, but you will notice it. OCaml avoids the pitfall of "someone thought this feature might be cool so I added it" design, and also the pitfall of "I am divorced from any goal other than helping me and my friends get academic credits" design. OCaml has some core decisions, like the +. *. /. float operators, that chafe many people, and often, but that OCaml sticks to for sound reasons and gets sound benefits from. At the same time there are restrained compromises with a lot of practical value, like Format strings.
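A minimal sketch of both points, with made-up function names. It uses Printf rather than Format, on the assumption that the point is the type-checked format string, which the two share:

```ocaml
(* Integer and float arithmetic use distinct operators, so the type of
   every arithmetic expression is known without any annotation. *)
let average_int a b = (a + b) / 2        (* int -> int -> int *)
let average_float a b = (a +. b) /. 2.0  (* float -> float -> float *)

(* Format strings are checked at compile time: %d demands an int and
   %.2f a float, so swapping the two arguments is a type error. *)
let report = Printf.sprintf "%d items, mean %.2f" 3 (average_float 1.0 2.5)

let () = print_endline report
```

Writing average_float 1 2 is rejected at compile time instead of silently converting, which is the chafing part and also the benefit.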

One result of this is simply that you'll notice it. You'll notice that privacy annotations in classic OOP languages are a much worse version of the encapsulation you get with module types, in theory and in practice.
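For a sense of what encapsulation with module types looks like, here is a small sketch. COUNTER and Counter are hypothetical names invented for the illustration:

```ocaml
(* The signature is the whole public surface. The type t is abstract:
   clients cannot see that it is an int and cannot forge one. *)
module type COUNTER = sig
  type t
  val zero : t
  val incr : t -> t
  val value : t -> int
end

module Counter : COUNTER = struct
  type t = int
  let zero = 0
  let incr n = n + 1
  let value n = n
  (* Anything not listed in COUNTER is invisible outside this module. *)
  let _debug n = Printf.sprintf "counter=%d" n
end

let two = Counter.(incr (incr zero))
let () = Printf.printf "%d\n" (Counter.value two)
```

Outside the module, Counter.incr 5 is a type error because t is not known to be int. There's no per-member annotation: what is public is decided in one place, by the signature.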

The more important result is that things somehow work out better. There's a community where people have been complaining bitterly, for years, about the quality and performance of the LSP: it takes way too much memory on a simple file, it slows down, it crashes. Maybe that community only has bad programmers, or no good programmer is willing to make the tool work? There's another community where the recurring gripe is that the language is practically forked by its features: if you use X, you can't use Y, and as a result if you make a library you need to target the lowest common denominator or these 'feature' forks can't use your library, and the language is not really as worthwhile when you do that. And then occasionally you hear news of some scripting language that's just seen some very impressive performance gains after some Herculean efforts to improve its runtime - which rather seem like Sisyphean efforts when you look at the result.

Yeah, this is talking OCaml up so much that I immediately want to talk it down, for balance. If it's so good then why do people keep dropping it, huh? How about that crypto project that said "we need to rewrite this OCaml bit because we can't find anyone willing to maintain it?" Didn't multicore take a long time and doesn't it have some important caveats vs. Erlang?

Still, you'll notice.

Going much harder than usual with type inference

# let a = Hashtbl.create 0;;
val a : ('_weak1, '_weak2) Hashtbl.t = <abstr>
# Hashtbl.add a "type inference" 100;;
- : unit = ()
# a;;
- : (string, int) Hashtbl.t = <abstr>

Idiomatic OCaml implementation code doesn't have much type annotation, and really doesn't need it. Frustration with OCaml failing to make some inference is rare; what you get instead is (mainly newbie) frustration with OCaml complaining about a type error on line 10 due to a mistake that you made on line 3.

Code stays very clean, while Merlin or an LSP can instantly tell you what type anything is, even while you're mid-edit and the code won't build. This plus the strong static typing is a huge gift to refactoring: you can completely change the representation of some data while changing very little code and while having high confidence in the outcome. In "little code changed" this is the extreme opposite from Rust, and in "high confidence" this is the extreme opposite from Python.

The unbelievably good module system

You can create modules anywhere, open them in a scope or even a single expression or value, nest them, parameterize them, pass them as values to functions that return other modules, and more. Everyone knows now that C headers are a hack, but every module system is a hack next to OCaml's module system.
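A compact sketch of several of those features at once. All the names here (SHOW, Show_int, Show_list, show_with) are invented for the example:

```ocaml
module type SHOW = sig
  type t
  val show : t -> string
end

module Show_int = struct
  type t = int
  let show = string_of_int
end

(* Parameterized module (functor): builds a SHOW for lists from a SHOW
   for their elements. *)
module Show_list (S : SHOW) = struct
  type t = S.t list
  let show xs = "[" ^ String.concat "; " (List.map S.show xs) ^ "]"
end

module Show_int_list = Show_list (Show_int)

(* First-class module: a module passed to a function as an ordinary value. *)
let show_with (type a) (module S : SHOW with type t = a) (x : a) = S.show x

let s1 = show_with (module Show_int) 42
let s2 = show_with (module Show_int_list) [1; 2; 3]

(* Module opened for a single expression. *)
let s3 = Show_int_list.(show [4; 5])

(* Module created in the middle of an expression. *)
let s4 =
  let module B = struct
    let show b = if b then "yes" else "no"
  end in
  B.show true

let () = List.iter print_endline [s1; s2; s3; s4]
```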

Have you ever seen OOP in Perl?

package Dog;
use strict;
use warnings;

our $kind = 'canine';
sub new {
	my ($class, $name) = @_;
	my $self = { name => $name };
	return bless $self, $class;
}
1; # important if the above is in its own Dog.pm - the norm

package main;
my $dog = Dog->new("Fido");
print $dog->{name}, "\n";
print $Dog::kind, "\n";

Or how about a C hashtable where the way to specialize it on different types is to have multiple #include blocks preceded by specialized #defines?

This is how awkward and limited other languages' module systems look next to OCaml.
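For comparison, the OCaml counterpart of the #define-specialized C hashtable is a functor application. This sketch uses the standard library's Hashtbl.Make; the Point and Ci_string modules are made up for the example:

```ocaml
(* Hashtbl.Make takes a module providing a key type with equal and hash,
   and returns a hashtable module specialized to that key type. *)
module Point = struct
  type t = int * int
  let equal (x1, y1) (x2, y2) = x1 = x2 && y1 = y2
  let hash (x, y) = Hashtbl.hash (x, y)
end

module Point_table = Hashtbl.Make (Point)

(* A second specialization: string keys compared case-insensitively. *)
module Ci_string = struct
  type t = string
  let equal a b = String.lowercase_ascii a = String.lowercase_ascii b
  let hash s = Hashtbl.hash (String.lowercase_ascii s)
end

module Ci_table = Hashtbl.Make (Ci_string)

let labels = Point_table.create 16
let () = Point_table.replace labels (0, 0) "origin"

let counts = Ci_table.create 16
let () =
  Ci_table.replace counts "OCaml" 1;
  Ci_table.replace counts "ocaml" 2 (* same key as "OCaml" under Ci_string *)
```

Both specializations live in the same file, are type-checked independently, and need no textual inclusion.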

No runtime type information

RTTI comes with conveniences (to printing, to matching OOP subclasses out of the box), but also comes with size and time and complexity costs. OCaml does without and benefits from doing without, though you'll mostly notice this in the lack of the conveniences.
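The missing printing convenience looks like this in practice. Values don't carry their types at runtime, so there is no generic "print anything", and you write (or generate) a printer per type. The shape type below is a hypothetical example:

```ocaml
type shape =
  | Circle of float
  | Rect of float * float

(* A hand-written printer: the compiler knows the type statically, and the
   pattern match is checked for exhaustiveness, but nothing at runtime
   could produce this string for us. *)
let show_shape = function
  | Circle r -> Printf.sprintf "Circle %g" r
  | Rect (w, h) -> Printf.sprintf "Rect (%g, %g)" w h

let () = print_endline (show_shape (Rect (2.0, 3.0)))
```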

Lexical bindings

In many languages, toplevel definitions can be in any order, and something like this would be fine:

(* invalid.ml *)
let f () = print_endline "f"; g () (* Error: Unbound value g *)
let g () = print_endline "g"; f ()

In OCaml, bindings are lexical, and the minimal change to make the previous work is simultaneous bindings:

(* this-works.ml *)
let rec f () = print_endline "f"; g ()
and g () = print_endline "g"; f ()
let () = f ()

This is usually encountered as a limitation, but it has some subtle advantages. For one, it allows you to read a file in order and understand it in order, since everything a binding depends on appears before the binding itself (although I wouldn't state this too strongly, since type inference on what you've read can still depend on unread code).

Acyclic module dependencies

This is also often experienced as a limitation but also has subtle benefits. The other day I wrote a program with a dune layout with a bin/main.ml and four libraries and some tests, and built and formatted with dune while working on it - and then I wrote a Makefile that concatenated all the files into a single script with a #! /usr/bin/env ocaml at the top and deployed that.

That concatenation needed this much massaging: add appropriate #directory and #load lines, then recreate the module structure around my libs:

#! /usr/bin/env ocaml
#directory "+unix"
#directory "+str"
#load "str.cma"
#load "unix.cma"
module Defs_update = struct
  module Config = struct
    {{lib/config.ml}}
  end
  module Pkg = struct
    {{lib/pkg.ml}}
  end
  ...
end
{{bin/main.ml}}

This worked just fine. And although I had to take some care with the order of the modules, OCaml requires there to be some order that works. (NB. manual ordering like this is rare, since dune normally handles it. I might also be able to use ocamldep.)

Fast separate compilation

Outside of Go, compilers are getting slower and innovating towards slower builds, with features that only work with full knowledge of the program, with heavy linking steps, with more computation at compile-time. OCaml's stayed fast and has innovated in the build system instead.

This isn't purely an implementation issue, as language features constrain the implementation.

Many deployment and backend options

On one Kubernetes container where the image's Perl randomly broke one day and stopped having even basic modules like Sys::Hostname, I just shrugged and rewrote a rarely-used script in OCaml. The script now is just #!ocaml and the source code. It runs by firing up the ocaml frontend that then runs the code. It's a little bit slower to start than Perl, but more reliable, and it works.

With #!ocamlscript it'd only be slower to start the first time, or after an edit, when it has to rebuild and cache an executable. Normally it'd be as fast as native-compiled code while retaining the convenience of a script.

Or I could've compiled it once and used the binary: either to bytecode, which would run without any issues (as long as no C stubs are involved) even though it's a Linux target and I'm developing on a macOS host, or to native code.

Or, if it wasn't such an easy task and I cared about a faster runtime while still not caring much about startup time, I could use ocamlnat.

There's also js_of_ocaml to run on node or possibly bun or in a browser, and js_of_ocaml works so well because it uses the stable output of the bytecode compiler.

And all of this is treating ocaml/ocamlc/ocamlopt as a given, but that has many options as well: a heavier optimizer or not, better debuggability or not, static and more portable binaries or not, 32-bit or 64-bit, or with development support like fuzzing and thread or memory sanitizers.

This is mostly an implementation issue, but it's not completely detached from the language. It was easier for OCaml to have such a rich implementation because of the restrained features of the language.