A tutorial for the Mercury programming language

With Mercury, the entry point (e.g. main in C) must be defined as a main/2 , i.e. main with 2 parameters. If you attempt to define main as anything else, you would get a compile error: % mmc ./rall.m /usr/bin/ld: rall_init.o: in function `mercury_init': rall_init.c:(.text+0x4a0): undefined reference to `' collect2: error: ld returned 1 exit status The two parameters of this main/2 must be of type io.state . If you don't define it as io.state , you would get a compile error: ./rall.m:005: Error: `main'/2 must have mode `(di, uo)'. ./rall.m:005: Error: both arguments of `main/2' must have type `io.state'. So we now have the first few lines of Mercury: :- interface . % NOTE: to use io.state we must import the io module. :- import_module io. :- pred main(io. state , io. state ). You might wonder what exactly is io.state and why main needs two of them. We will explain this later. At the very least, these are not the argc argv thing you would see in C. There are dedicated predicates for retrieving command line arguments; we will get to that later. The (di, uo) part needs explanation. The message says it's a mode. What is a mode? Modes are things that describe the change of instantiatedness before and after the execution of a goal. Instantiatedness - that's a long word. What is instantiatedness? Roughly speaking, it's a concept for describing whether a slot is free or bound. (In Mercury, instantiatedness is actually a tree, because obviously you'd have - or at the very least you could manually construct - terms that are only bounded at certain parts.) Two kinds of basic instantiatedness exists in Mercury: free , which refers to variables that are not bound by any values; bound , which means that the term is not a variable but rather some concrete term at least at that level. Consider the term blah(A, b) ; it's bound at the top (with blah ) and at one of the children node (at b ) but free at another children node (at A , provided that A does not have a value already). A mode, ground , also exists, and refers to the terms that are completely bound (e.g. blah(a, b) is ground but blah(A, b) isn't, provided that A is still free .) With this knowledge, we should first understand what in and out modes actually are. If you have worked with some other languages you might have seen thing similar to this before: C libraries often ask you to pass in a reference for retrieving the actual result because the return value is used for returning error status; you can think of it as "in" (inputting) parameters and "out" (returning) parameters; some languages (e.g. Ada) even explicitly label them as such. I do not wish to introduce you to wrong analogies that will become detrimental for your future learning, but I have to say they do be somewhat similar. That said, in and out is properly defined in Mercury, as follows: :- mode in == ground >> ground . :- mode out == free >> ground . This largely fits with our intuition about in and out parameters: we expect in arguments to be fully grounded and they stay grounded, and we expect out arguments are variables which we will provide a solid term to (thus making it grounded). There are also "polymorphic" version of in and out , defined as follows: :- mode in ( Inst ) == Inst >> Inst . :- mode out ( Inst ) == free >> Inst . You can indeed define your own instantiatedness, with which then you can use these definitions to get the in and out version of it; but we will not need this for our example this time. Now we can finally come back to di and uo . di stands for "destructive input", and uo stands for "unique output". If you have used Rust (or, in the case where you are really adventurous, Clean) before, you might have a vague idea of what this is. In Mercury, there are two special instantiatedness named unique and dead , the former conceptually refers to values that can only have one reference at all time, and the latter refers to reference that are once "unique" but is now "dead". (Mercury also has muo , mdi and mui which stands for "mostly unique output", "mostly destructive input" and "mostly unique input". They are meant to support backtracking.) % unique output - used to create a "unique" value :- mode uo == free >> unique. % unique input - used to inspect a unique value without causing % reference to become dead :- mode ui == unique >> unique. % destructive input - used to deallocate or reuse the memory % occupied by a value that will not be used. :- mode di == unique >> dead. Up to now, our code would be something that's similar to this: :- module mercury_rall. :- interface . :- import_module io. :- pred main(io. state , io. state ). :- implementation . main ( _ , _ ) :- % some dummy body for our main predicate. 1 = 1. If you compile this, the Mercury compiler would complain about not having a mode declaration. For this reason, we will add the following line and compile: :- mode main( di , uo ). But this time we would see the compiler complain about different things: ./rall.m:007: Error: no determinism declaration for exported predicate ./rall.m:007: `main'/2. ./rall.m:012: In clause for `main(di, uo)': ./rall.m:012: mode error: argument 2 did not get sufficiently ./rall.m:012: instantiated. ./rall.m:012: Final instantiatedness of `HeadVar__2' was `free', ./rall.m:012: expected final instantiatedness was `unique'. The compiler expects the second argument would be supplied with a free argument and that argument should become a unique value at the end of main ! How do we do such a thing? Fortunately the io module has what we need to stuff the body of main , and we will use one of them: main ( In , Out ) :- io. write_string ( "blah" , In , Out ). But this time the compiler started complaining about other things: ./rall.m:007: Error: no determinism declaration for exported predicate ./rall.m:007: `main'/2. Determinism in this case, informally speaking, refers to "how a certain thing would succeed/fail". It's not something we'd care about in other languages, at least not in an explicit, supported-by-the-language-itself manner; (we normally only talk about when programs terminate at a certain state or not.) In Mercury, we have the following determinism categories: Deterministic ( det ): guaranteed to have one and exactly one solution. ): guaranteed to have one and exactly one solution. Semideterministic ( semidet ): have exactly one solution, but does not guarantee to produce it. ): have exactly one solution, but does not guarantee to produce it. Multisolution ( multi ): guaranteed to have a solution among possibly many solutions. ): guaranteed to have a solution among possibly many solutions. Nondeterministic ( nondet ): have possibly many solutions, does not guarantee to produce one. ): have possibly many solutions, does not guarantee to produce one. Failure ( failure ): cases where there's zero solutions. They are not actual errors but a part of the logic flow (e.g. arity mismatch during unification, which will never produce a solution because the arity is different). ): cases where there's zero solutions. They are not actual errors but a part of the logic flow (e.g. arity mismatch during unification, which will never produce a solution because the arity is different). Errorneous ( errorneous ): also have zero solutions, but they do represent actual errors which in other languages would be represented in the form of runtime exception throw or panic. (Some people might not understand why det and semidet are separate things. Imagine a function that takes the "head" of a linked list; this function is obviously only defined on non-empty list and not defined on empty lists, so for any given list there's either only one solution or no solution, thus semidet instead of det .) (NOTE the difference between errorneous and failure might be clearer if I explain their behaviours when it comes to negation. Basically in Mercury (and other logic programming languages), you can take a predicate, put a negation on it, and then ask for a case where it does not succeed. Naturally, the negation of determinisms that are guaranteed to produce a result like det and multi would be failure (i.e. negating a definite success would be a failure), the negation of failure would be det (i.e. negating a definite failure would be a definite success), and the negation of semidet and nondet would be semidet (i.e. negating these cases turns it into a case that asks if the match would be successful or not); but negating an errorneous would only produce an errorneous .) (If you're going to ask this question - yes, solving the problem of perfect determinism inference does mean solving the halting problem, and is thus undecidable.) Other than these six categories there are also this thing called the "committed choice nondeterminism", which adds two more determinism class: cc_multi and cc_nondet . The difference between committed choice nondeterminism and "normal" nondeterminism is that normal nondeterminism supports backtracking while CC nondeterminism, despite potentially having more than one solutions, commits to only one of them and thus does not backtrack. The main entry point can be defined to be det or cc_multi , since both of them guarantee one and only one solution ( det , of course, is more strict than cc_multi , and if the Mercury compiler can determine something that should be able to be a det got labelled as a cc_multi , it would spit out a warning saying you could've gone with the stricter option.) In this case, we should add the string is det at the end of our mode declaration, so that the whole line would be :- mode main(di, uo) is det. . After this modification, the compiler should have finally stopped complaining and gives you an executable; when you run it, it would display a string blah , which should be obvious. At this point, the code shall look like something like this: :- module rall. :- interface . :- import_module io. :- pred main(io. state , io. state ). :- mode main( di , uo ) is det . % also: you can combine the `pred` and the `mode` line into one like this: % % :- pred main(io.state::di, io.state::uo) is det. :- implementation . main ( In , Out ) :- io. write_string ( "blah" , In , Out ). Now, if you attempt to write multiple strings like this: main ( In , Out ) :- io. write_string ( "blah" , In , Out ), io. write_string ( "blah" , In , Out ). The compiler would produce this error: ./rall.m:013: In clause for `main(di, uo)': ./rall.m:013: in argument 2 of call to predicate `io.write_string'/3: ./rall.m:013: unique-mode error: the called procedure would clobber its ./rall.m:013: argument, but variable `In' is still live. For more information, recompile with `-E'. If we look up the declaration of io.write_string : :- pred write_string( string :: in , io:: di , io:: uo ) is det . You should already know that di specifies a turn of a unique value into a dead value, so In after the first write_string would be considered dead and not unique , which does not fit the requirement of the di of the second write_string . If you try to experiment and do this: main ( In , Out ) :- io. write_string ( "blah" , In , Out ), io. write_string ( "blah" , Out , Out2 ). This would have the following violation: since we declared Out to be uo which is free >> unique , Out must be unique at the end of main , but the second write_string turns Out into a dead because that slot is di which is unique >> dead . A solution to this would be to do this: main ( In , Out ) :- io. write_string ( "blah" , In , Out1 ), io. write_string ( "blah" , Out1 , Out ). This would compile, and the generated executable would write "blah" twice, as expected. To keep coming up with new variable name is tedious if this gets long. To solve this problem, Mercury have something called the "state variable". With state variables, the example above can be written like this: main (!. IO , !: IO ) :- io. write_string ( "blah" , !. IO , !: IO ), io. write_string ( "blah" , !. IO , !: IO ). !.IO refers to the value bound to IO at the current moment, and !:IO makes the value at that slot bound to IO ; so in the first write_string , the current value IO was destroyed, and the value bound to Out1 in our non-state-variable version was bound to IO as its next value; and in the second write_string , the value of Out1 , which itself has become the current value of IO and being referred to with !.IO , was destroyed, and the value bound to Out in our non-state-variable version was bound to IO as its next value, and subsequently became the output of main . You can write !.IO, !:IO as !IO , because the former is also quite tedious as well: main (! IO ) :- io. write_string ( "blah" , ! IO ), io. write_string ( "blah" , ! IO ). It would still feel a bit tedious for people who are used to other programming languages, but I suppose this is what you give up for being as explicit as possible for the sake of improved correctness and things…

A tutorial for the Mercury programming language

Share this article

Related Articles