tee/Grammar

For a long time I never read https://ocaml.org/manual/5.4/language.html carefully. Let's go through it and pick out some of the subtler points.

Identifiers

(* identifiers can, after the first character, contain any number of ' and _ *)
(* identifiers that start with _ are conventionally 'throwaway' variables, but can still be referenced *)
# let _a'______' = 1;;
# _a'______';;
- : int = 1

(* capitalized identifiers can't start with _ *)
# module _M'_'_ = struct end;;
Error: Syntax error
# module M'_'_ = struct end;;
module M'_'_ : sig end

(* Latin1 and some extra characters are valid in identifiers *)
# let ø = 1;;
val ø : int = 1

(* identifiers can be obnoxiously but not infinitely long. *)

Integers

It was surprising to see that a leading minus is really part of the integer literal syntax, since it's so often encountered as an infix operator.
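
A small sketch of where the literal-with-minus reading shows up in practice. The names describe, smallest and zero are made up for illustration; the int32 suffix is used so the range check doesn't depend on the platform word size.

```ocaml
(* A negative literal can appear directly in a pattern, which would not
   be possible if the minus were only ever an operator. *)
let describe = function
  | -1 -> "minus one"
  | 0 -> "zero"
  | _ -> "something else"

(* -2147483648l is accepted as an int32 literal even though 2147483648l
   on its own is rejected as out of range. *)
let smallest = -2147483648l

(* In an application, however, the minus is read as the infix operator:
   [succ -1] would mean [succ - 1] and fail to type-check, so negative
   arguments need parentheses. *)
let zero = succ (-1)

let () =
  assert (describe (-1) = "minus one");
  assert (smallest = Int32.min_int);
  assert (zero = 0)
```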

# 2, 2l, 2L, 2n;;
- : int * int32 * int64 * nativeint = (2, 2l, 2L, 2n)
# 0100, 0b100, 0o100, 0x100;;
- : int * int * int * int = (100, 4, 64, 256)
# Printf.printf "%x\n%lx\n%Lx\n%nx" max_int Int32.max_int Int64.max_int Nativeint.max_int;;
3fffffffffffffff
7fffffff
7fffffffffffffff
7fffffffffffffff- : unit = ()
# Printf.printf "%x" 0xDEAD______beef__;;
deadbeef- : unit = ()

(* 0u also exists, but only in string parsing *)
# Int32.of_string_opt "0u2147483648";;
- : int32 option = Some (-2147483648l)
# Int32.of_string_opt "2147483648";;
- : int32 option = None
# let u = Int32.of_string "0u2147483648" in Printf.printf "%ld\n %lu\n" u u;;
-2147483648
 2147483648
- : unit = ()
(* OCaml stdlib is like ANS Forth: there are unsigned operations rather than unsigned values *)
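
To make the "unsigned operations, not unsigned values" point concrete, here is a sketch using the Int32.unsigned_* functions (this assumes OCaml 4.08 or later, where I believe they were added). The value u is just an example bit pattern.

```ocaml
(* All 32 bits set. Read as signed, this is -1l. *)
let u = Int32.of_string "0u4294967295"

let () =
  assert (u = -1l);
  (* Signed comparison: -1 < 1. Unsigned comparison: 4294967295 > 1. *)
  assert (Int32.compare u 1l < 0);
  assert (Int32.unsigned_compare u 1l > 0);
  (* Signed division: -1 / 2 = 0. Unsigned: 4294967295 / 2 = 2147483647. *)
  assert (Int32.div u 2l = 0l);
  assert (Int32.unsigned_div u 2l = Int32.max_int);
  (* The %lu conversion prints the unsigned reading of the same bits. *)
  assert (Printf.sprintf "%lu" u = "4294967295")
```

The type never changes: it is int32 throughout, and only the choice of function decides how the bits are interpreted.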

Floats

Floats are wild in any language. OCaml also has hexadecimal floats with a power-of-2 exponent.

# 0x1p-10;;
- : float = 0.0009765625
# 0x1p10;;
- : float = 1024.
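
One practical use of hex floats, as a sketch: they round-trip exactly through text. The helper roundtrip is a made-up name; it relies on the %h conversion of Printf and on float_of_string accepting the same notation.

```ocaml
(* 0x1.8p1 is 1.5 times 2 to the power 1. The digits after the point
   are hexadecimal, the exponent after p is decimal and scales by
   powers of two. *)
let three = 0x1.8p1

(* %h prints a float in this notation; float_of_string reads it back. *)
let roundtrip x = float_of_string (Printf.sprintf "%h" x)

let () =
  assert (three = 3.0);
  assert (0x1p-10 = 1. /. 1024.);
  (* The hex form carries every bit of the mantissa, so nothing is lost. *)
  assert (roundtrip 0.1 = 0.1)
```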

Characters

OCaml is more internally consistent than most languages here: it declines to read 0123 as octal both in integer literals and in character escapes, so '\027' is decimal 27. Decimal escapes are very unusual across programming languages, though. OCaml is also stricter about escapes: it rejects an unrecognized escape in a character literal, rather than silently dropping the backslash or silently leaving it in place:

# List.map Char.code ['\027'; '\x1b'; '\o033'];;
- : int list = [27; 27; 27]
# '\y';;
Error: Line 1, characters 0-3:
Error: Illegal backslash escape in string or character (\y)

vs.

$ echo -e "\y"
\y
$ perl -le 'print "\y"'
y
$ python -c 'print("\y")' # didn't fix this with v3?
<string>:1: SyntaxWarning: "\y" is an invalid escape sequence. Such sequences will not work in the future. Did you mean "\\y"? A raw string is also an option.
\y
$ lua -e 'print("\y")'  # honorable mention
lua: (command line):1: invalid escape sequence near '"\y'

Strings

A char is a byte, and there's no other literal syntax but strings for text. Fortunately, strings in OCaml are nearly ideal.

# "\u{1F42a} \u{1F42b}";;
- : string = "🐪 🐫"
# {camels|"\u{1F42a} \u{1F42b}"|camels};;
- : string = "\"\\u{1F42a} \\u{1F42b}\""
# "\
    ^ \n even the file has \r\n\
    < not present.";;
- : string = "^ \n even the file has \r\n< not present."
# "\
    ^ \n even the file has \r\n\
      < not present either :(";;
- : string = "^ \n even the file has \r\n< not present either :("
# "a
b
c
d";;
- : string = "a\nb\nc\nd"
# "a
b
  can still have leading spaces if you want
d";;
- : string = "a\nb\n  can still have leading spaces if you want\nd"
# "a\
b\
c\
d";;
- : string = "abcd"
# "\
   <- not present
  \  <- four spaces";;
- : string = "<- not present\n    <- four spaces"

NB. {||} strings aren't safe for binary data, even data that doesn't contain the closing delimiter, because \r\n still gets normalized to \n.
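
A sketch of what "a char is a byte" means for the camels above. The decoding part assumes OCaml 4.14 or later, where I believe String.get_utf_8_uchar and the Uchar.utf_decode functions were added; camel is just a local name.

```ocaml
let camel = "\u{1F42a}"

let () =
  (* One code point, four bytes of UTF-8. *)
  assert (String.length camel = 4);
  assert (camel = "\xF0\x9F\x90\xAA");
  (* Decoding the first code point back out of the bytes. *)
  let d = String.get_utf_8_uchar camel 0 in
  assert (Uchar.utf_decode_is_valid d);
  assert (Uchar.to_int (Uchar.utf_decode_uchar d) = 0x1F42A);
  (* In a quoted string nothing is an escape: this is a backslash and an n. *)
  assert (String.length {|\n|} = 2)
```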

Naming labels

The surprising thing here is that whitespace, and even a comment, can sit between the ~ or ? and the label name:

# let f ?a b = ();;
val f : ?a:'a -> 'b -> unit = <fun>
# let f ? a b = ();;
val f : ?a:'a -> 'b -> unit = <fun>
# let f ~(*not optional:*)a b = ();;
val f : a:'a -> 'b -> unit = <fun>
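
The same thing as a compilable sketch, with the two spellings side by side and an optional argument with a default thrown in. The functions scale_a and scale_b are made up for illustration.

```ocaml
(* Whitespace after ~ and ? in a definition does not change the meaning. *)
let scale_a ~factor ?(offset = 0) x = (x * factor) + offset
let scale_b ~ factor ? (offset = 0) x = (x * factor) + offset

let () =
  (* The optional argument falls back to its default when omitted. *)
  assert (scale_a ~factor:2 10 = 20);
  assert (scale_b ~factor:2 10 = 20);
  (* Labeled arguments can be passed in any order. *)
  assert (scale_b ~factor:2 ~offset:1 10 = scale_a ~offset:1 ~factor:2 10)
```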

Operators

NB. Most initial operator characters produce a symbol that can only be a binary (infix) operator. Only !, ? and ~ start prefix operators, which is easy to remember from ref dereferencing and labeled parameters.

# let ( << ) x = print_endline x;;
val ( << ) : string -> unit = <fun>
# << "hello";;
Error: Syntax error
# let ( !<< ) x = print_endline x;;
val ( !<< ) : string -> unit = <fun>
# !<< "world";;
world
- : unit = ()

Precedence and associativity are covered in section 7, Expressions.
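
As a small preview of that, here is a sketch showing that the first character of a user-defined operator decides how it parses. The operators +|, *| and !+ are made up for illustration.

```ocaml
(* Infix: the leading + or * decides precedence, as for the built-ins. *)
let ( +| ) a b = a + b
let ( *| ) a b = a * b

(* Prefix: starts with !, so it applies to the single operand after it. *)
let ( !+ ) x = x + 1

let () =
  (* Parsed as 1 +| (2 *| 3), not as (1 +| 2) *| 3. *)
  assert (1 +| 2 *| 3 = 7);
  (* Prefix operators bind tighter than infix ones: (!+ 1) *| 2. *)
  assert (!+ 1 *| 2 = 4)
```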

Values

Noteworthy here is how high all the limits are. In the distant past I used OCaml as a data description language and ran into compiler limitations early on. Those seem to be gone.

Also noteworthy is that list literals aren't mentioned in this section, apart from [], because you can actually reuse list syntax for other kinds of values.
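
A minimal sketch of that reuse, assuming a reasonably recent compiler: a type that declares its own [] and (::) constructors gets the list literal syntax for free. The module name Pile is hypothetical; wrapping the type in a module keeps ordinary lists usable next to it.

```ocaml
module Pile = struct
  (* Not the built-in list: a separate type with the same constructor names. *)
  type 'a t = [] | ( :: ) of 'a * 'a t

  let rec depth = function
    | [] -> 0
    | _ :: rest -> 1 + depth rest
end

(* The local open makes the literal build a Pile.t instead of a list. *)
let p = Pile.[1; 2; 3]

(* Outside the module, the same syntax is still an ordinary list. *)
let l = [1; 2; 3]

let () =
  assert (Pile.depth p = 3);
  assert (List.length l = 3)
```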