This document describes GNU m4, as included with Linux; areas of potential incompatibility of which I am aware are mentioned as they arise and highlighted with a boldface “GNU”.
This was originally based on GNU m4 version 1.4.5; it has been updated for version 1.4.10.
You may find this helpful if
You should already be familiar with fundamental programming concepts (e.g., recursion).
There is a substantial overlap between the GNU m4 info pages and this document. The info pages are designed to be a comprehensive reference. This document is a much shorter “m4 by example” which is still “practically” complete – that is, I have tried to include:
Examples of the kind of details omitted are:
changequote macro (in practice, all you need to know are the restrictions to observe in order to ensure compatibility)
There is also some original material here:
M4 can be called a “template language”, a “macro language” or a “preprocessor language”. The name “m4” also refers to the program which processes texts in this language: this “preprocessor” or “macro processor” takes as input an m4 template and sends this to the output, after acting on any embedded directives, called macros.
At its most basic, it can be used for simple embedded text replacement. If m4 receives the input
define(AUTHOR, William Shakespeare)
A Midsummer Night's Dream by AUTHOR
then it outputs
A Midsummer Night's Dream by William Shakespeare
While similar in principle to the better-known C preprocessor, it is a far more powerful, general-purpose tool. Some significant uses are:
M4 is a Unix filter program. Its arguments, if any, are the files it is to read; if none is specified then it reads from stdin. The resulting text is sent to stdout.
M4 comes with an initial set of built-in macros, often simply called “builtins”. The most basic of these, define, is used to create new macros:
define(AUTHOR, W. Shakespeare)
After this definition, the word “AUTHOR” is recognized as a macro that expands to “W. Shakespeare”.
The define macro itself, including its two arguments, expands to an empty string, that is, it produces no output. However, the newline at the end of the AUTHOR definition above would be echoed to the output. If a blank line added to the output is a problem then you can suppress it using the “delete to newline” macro, dnl:
define(AUTHOR, W. Shakespeare)dnl
There is no space between the end of the macro and the dnl: if there were then that space would be echoed to the output.
No whitespace is allowed between a macro name and the opening parenthesis. Any whitespace before the beginning of a parameter is discarded. Thus the following definition is equivalent to the one above:
define( AUTHOR,W. Shakespeare)dnl
It's also possible to pass definitions on the command line using the -D option, for example:
m4 -DAUTHOR="W. Shakespeare" -DYEAR=1587 input_file.m4
Quoting a string suppresses macro expansion. The default quote characters are the backtick (`) and apostrophe ('). M4 strips off these delimiters before outputting the string. Thus
define(AUTHOR, W. Shakespeare)dnl
`AUTHOR' is AUTHOR
produces the output
AUTHOR is W. Shakespeare
For conciseness, most examples will show m4's output in the following way:
`AUTHOR' is AUTHOR # -> AUTHOR is W. Shakespeare
In m4, the hash character # is the default opening delimiter of a comment. A comment lasts up to and including the following newline character. The contents of a comment are not examined by m4; however, contrary to what you might expect, comments are echoed to the output. Thus, the previous line, if entered in full, would actually produce the output
AUTHOR is W. Shakespeare # -> AUTHOR is W. Shakespeare
Opening comment delimiters can be protected by quotes:
`#' AUTHOR # -> # W. Shakespeare
Nested quotes are recognized as such:
``AUTHOR'' is AUTHOR # -> `AUTHOR' is W. Shakespeare
Quoted strings can include newlines:
define(newline,`line
break')
a newline here
outputs
a line
break here
Without a matching opening quote character (`), a closing quote (') is simply echoed to the output. Thus
`AUTHOR ' is AUTHOR.''
produces
AUTHOR is W. Shakespeare.''
M4 also understands nested parentheses within a macro's argument list:
define(PARENS, ())
brackets: PARENS      # -> brackets: ()
Unbalanced parentheses can be quoted to protect them:
define(LPAREN,`(')
define(RPAREN,`)')
LPAREN bracketed RPAREN   # -> ( bracketed )
(Unbalanced quote characters are more problematic; a solution is given later.)
Pitfall: In fact, quoting of the macro name is also recommended. Consider the following:
define(LEFT, [)
LEFT                  # -> [
define(LEFT, {)
LEFT                  # -> [
Why didn't the second define work? The problem is that, within the second define, the macro LEFT was expanded before the define macro itself took effect:
define(LEFT, {) # -> define([, {) ->
That is, instead of redefining the macro LEFT, a new macro named [ was defined. GNU m4 allows macros to have non-standard names, including punctuation characters like [.
In fact, the new macro doesn't seem to work either:
[ # -> [
That's because GNU m4 doesn't ordinarily recognize a macro as a macro unless it has a valid name, that is, a sequence of ASCII letters, underscores, or digits, beginning with an underscore or letter. For example, my_macro1 and _1stMacro are both valid names; my.macro1 and 1stMacro are not. (We will see later how the ability to define macros with invalid names can be useful.)
Quoting the macro's arguments avoids this problem:
define(`LEFT',`[')
LEFT                  # -> [
define(`LEFT',`{')
LEFT                  # -> {
For the same reason, the undefine macro will normally work as expected only if its argument is quoted:
define(`RIGHT', `]')
undefine(RIGHT)       # -> undefine(]) ->
RIGHT                 # -> ]
undefine(`RIGHT')
RIGHT                 # -> RIGHT
(Note that undefine does not complain if it is given the name of a non-existent macro; it simply does nothing.)
M4's behaviour can be mystifying. It is best to get an early understanding of how it works. This should save you time figuring out what's going on when it doesn't do what you expect.
First, m4 looks for tokens in its input – roughly speaking, it divides it into quoted strings, macro arguments, names (i.e., identifiers), numbers and other symbols (punctuation characters). Whitespace (including newlines), numbers and punctuation usually mark token boundaries; exceptions are when they appear within a quoted string or a macro argument.
define( `Version2', A – 1 )99Version2:Version2_ Version22 # -> 99A – 1 :Version2_ Version22
Above, since a valid name can include digits but cannot begin with one, the names seen after the definition are Version2, Version2_, and Version22; only the first of these corresponds to a defined macro.
Continuing:
Version2(arg1, arg2) Version2 (junk) garbage(trash)Version2() # -> A – 1 A – 1 (junk) garbage(trash)A – 1
If the name of a macro is followed immediately by a '(' then m4 reads in a list of arguments. The Version2 macro we have defined ignores its arguments, but that doesn't matter to m4: it swallows up the arguments and outputs only the macro's expansion “A – 1 ”.
In general, m4 passes input tokens and separators straight through to the output, making no change except to remove the quotes surrounding quoted string tokens. When it encounters a macro name, however, it stops echoing to the output. Instead:
If while reading in a macro's arguments, m4 encounters another macro then it repeats this process for the nested macro.
An example makes this clearer:
define(`definenum', `define(`num', `99')')
num                   # -> num
definenum num         # -> define(`num', `99') num -> 99
As soon as m4 gets to the end of “definenum” on the last line above, it recognizes it as a macro and replaces it with “define(`num', `99')”. However, instead of outputting this expansion, it sticks it back on the beginning of its input buffer and starts again from there. Thus, the next thing it reads in is “define(`num', `99')”. As the define macro expands to an empty string, nothing is output; however, the new macro num is now defined. Then m4 reads in a space which it echoes to the output, followed by the macro num, which it replaces with its expansion. The last line therefore results in the output “ 99”.
Unless a nested macro is quoted, it is expanded immediately:
define(`definenum', define(`num', `99'))
num                   # -> 99
definenum             # ->
Here, when m4 reads in the nested define macro, it immediately defines num; it also replaces the macro “define(`num', `99')” with its expansion, an empty string. Thus, “definenum” ends up being defined as an empty string.
Arbitrary nesting is possible -- with (ordinarily) an extra layer of protective quotes at each level of nesting:
define(`definedefineX',`define(`defineX',`define(`X',`xxx')')')
defineX X             # -> defineX X
definedefineX X       # ->  X
defineX X             # ->  xxx
If rescanning of a macro's expansion is not what you want then just add more quotes:
define(`stmt',``define(`Y',`yyy')'')
stmt                  # -> define(`Y',`yyy')
Y                     # -> Y
Above, the outermost quotes are removed when the nested macro is being read in, so stmt expands first to `define(`Y',`yyy')'; m4 then rescans this as a string token and removes the second layer of quotes before sending it to the output.
Now consider the definition
define(`plus', `+')
Suppose we want to use this plus macro twice in succession with no intervening space. Clearly, plusplus doesn't work: it is read as a single token, plusplus, not two plus tokens:
plusplus # -> plusplus
We can use an argument list as a separator:
plus()plus # -> ++
But watch what happens with an extra level of indirection:
define(`oper', `plus')
oper()oper            # -> plusoper
Here, oper() expands to plus; but then rescanning of the input starts from the beginning of the expansion. Thus, the next thing read in is the token plusoper. As it doesn't correspond to a macro, it is copied straight to the output.
The problem can be solved by adding an empty quote as a separator:
oper`'oper # -> plus`'oper -> +`'oper -> ... -> ++
It is a good idea to include such a separator in macro definitions as a matter of policy:
define(`oper',`plus`'')
oper()oper            # -> plus`'oper -> +`'oper -> +oper -> ... -> ++
If ever m4 seems to hang or stop working, it is probably because a faulty macro has sent it into an infinite loop:
define(`Bye', `Bye for now')
Hello.                # -> Hello.
Bye.                  # -> Bye for now. -> Bye for now for now. -> ...
Such an error is not always this obvious: the cycle may involve more than one macro.
Finally, look at this example:
define(`args', ``NAME', `Marie'')
define(args)          # -> define(`NAME', `Marie') ->
NAME                  # -> Marie
args(define(`args',`Rachel'))  # -> args() -> `NAME', `Marie' -> NAME, Marie
args                  # -> Rachel
In the second part of the example, although args doesn't take an argument, we can still pass it one. In this case the argument redefines the macro that's currently being expanded. However, it is the expansion that was in force when the macro identifier was read in that is output.
Similarly, it is possible to define a self-modifying macro or even a self-destructing macro:
define(`msg', `undefine(`msg')Secret message.')
msg                   # -> Secret message.
msg                   # -> msg
Recursive macros can also be defined.
A deficiency of m4 is that there is no escape character. This means that if you want to use the backtick (`) for anything other than an opening quote delimiter you need to take care. Sometimes you can just add an extra layer of quotes:
I said, ``Quote me.'' # -> I said, `Quote me.'
However, in other cases, you might need an opening quote without m4 interpreting it as such.
The general way around this problem is to use the changequote macro, e.g.,
changequote(<!,!>)
a `quo<!ted str!>ing'
outputs
a `quoted string'
Without parameters, changequote restores the default delimiters.
In general, it is best to avoid using changequote. You can define macros to insert literal quotes should you need them.
Sometimes, however, it is necessary to change the quote character globally, e.g., because the backtick character is not available on some keyboards or because the text being processed makes extensive use of the default quote characters. If you do use changequote then be aware of the pitfalls:
GNU m4's changequote can differ from other implementations of m4 and from earlier versions of GNU m4. For portability, call changequote only with two arguments, or with no arguments, i.e.,
changequote`' # (trailing `' is separator if needed)
Note that changequote changes how existing macros are interpreted, e.g.,
define(x,``xyz'')
x                     # -> xyz
changequote({,})
x                     # -> `xyz'
Don't choose the same delimiter for the left and right quotes: doing so makes it impossible to have nested quotes.
Don't change a quote delimiter to anything that begins with a letter or underscore or a digit; m4 won't complain but it only recognizes a delimiter if it starts with a punctuation character. A digit may be recognized as a delimiter but not if it is scanned as part of the preceding token.
While later versions of GNU m4 have a greater tolerance for non-ASCII characters (e.g., the pound sign or an accented character) it is better to avoid them, certainly in macro names and preferably in delimiters too. If you do use 8-bit characters and m4 is not behaving quite as you expect, this may be the reason. Where multibyte character encoding is used, m4 should not be used at all.
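As a minimal sketch of the advice above, the following switches to a two-argument, punctuation-only pair of delimiters, uses them, and then restores the defaults. The choice of [ and ] and the macro name GREETING are assumptions made purely for illustration.

```m4
changequote([,])dnl
define([GREETING], [Don't panic])dnl
GREETING              # -> Don't panic
changequote`'dnl
`GREETING' again      # -> GREETING again
```

While [ and ] are the delimiters, the apostrophe in the definition is ordinary text; once the defaults are restored, the backtick and apostrophe quote again.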
As mentioned above, line comments are echoed to the output, e.g.,
define(`VERSION',`A1')
VERSION # VERSION `quote' unmatched`
expands to
A1 # VERSION `quote' unmatched`
Comments are not very useful. However, even if you don't use them you need to remember to quote any hash character in order to prevent it being interpreted as the beginning of a comment:
`#' VERSION -> # A1
You can change the opening comment delimiter, e.g., changecom(`@@'); as with changequote, the new delimiter should start with a punctuation character.
If you want echoing block comments, you can also change the closing delimiter, e.g., for C-like comments,
changecom(/*,*/)
VERSION `quote' /* VERSION
`quote' ` */ VERSION
# ->
# A1 quote /* VERSION
# `quote' ` */ A1
Without arguments, changecom restores the default comment delimiters.
For a comment that should not be echoed to the output, use dnl: this macro not only prevents the following newline from being output (as we saw above), it also discards everything up to the newline.
dnl These two lines will not result
dnl in any output.
Non-echoing block comments: multiline comments that are not echoed to the output can be written like this
ifelse(`
This is a comment
spanning more than
one line.
')dnl
This is a hack which takes advantage of the fact that the ifelse macro (described below) has no effect if it is passed only one argument. Some versions of m4 may therefore issue a warning about insufficient arguments; GNU m4 doesn't.
Be sure there are no unmatched quotes in the comment text.
ifdef(`a',b) outputs b if a is defined; ifdef(`a',b,c) outputs c if a is not defined.
The definition being tested may be empty, e.g.,
define(`def')
`def' is ifdef(`def', , not )defined.   # -> def is defined.
ifelse(a,b,c,d) compares the strings a and b. If they match, the macro expands to string c; if not, string d.
This can be extended to multiple else-ifs:
ifelse(a,b,c,d,e,f,g)
means that if a matches b, then return (expand to) c; else if d matches e, then return f; else return g. In other words, it's shorthand for
ifelse(a,b,c,ifelse(d,e,f,g))
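To make the multi-branch form concrete, here is a small sketch that selects a greeting by language code. The macro name greet and the strings it returns are hypothetical, chosen only for illustration.

```m4
define(`greet', `ifelse(`$1', `en', `Hello', `$1', `fr', `Bonjour', `Hi')')dnl
greet(`en')           # -> Hello
greet(`fr')           # -> Bonjour
greet(`de')           # -> Hi
```

The final, unpaired argument acts as the "else" branch: it is used when none of the preceding comparisons match.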
M4 normally treats numbers as strings.
However, the eval macro allows access to integer arithmetic; expressions can include these operators (in order of precedence):
+ -                  unary plus and minus
**                   exponent
* / %                multiplication, division, modulo (eval(8/-5) -> -1)
+ -                  addition and subtraction
<< >>                shift up or down (eval(-8>>1) -> -4)
== != < <= >= >      relational
!                    logical not (converts non-zero to 0, 0 to 1)
~                    bitwise not (eval(~0) -> -1)
&                    bitwise and (eval(6&5) -> 4)
^                    bitwise exclusive or (eval(3^2) -> 1)
|                    bitwise or (eval(1|2) -> 3)
&&                   logical and
||                   logical or
The above table is for GNU m4; unfortunately, the operators and precedence are version-dependent. Some versions of m4 incorrectly treat ^ the same as ** (exponent). For maximum compatibility, make liberal use of parentheses to enforce precedence.
Should you need it, octal, hexadecimal and indeed arbitrary radix arithmetic are available. It's also possible to specify the width of eval's output. (See the m4 info pages for details on these.)
eval(7*6)             # -> 42
eval(7/3+100)         # -> 102
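As a brief sketch of the radix and width features mentioned above, assuming GNU m4: a leading 0x or 0 in the expression marks a hexadecimal or octal literal, an optional second argument to eval gives the output radix, and an optional third gives the minimum number of digits.

```m4
eval(0x1f)            # -> 31
eval(010)             # -> 8
eval(255, 16)         # -> ff
eval(5, 2, 8)         # -> 00000101
```

Other implementations may not accept the extra arguments, so treat these as GNU-specific.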
There are also incr and decr builtins as shortcuts which expand to the argument plus or minus one, e.g., incr(x) is equivalent to eval(x+1):
define(`n', 0)
n                     # -> 0
define(`n', incr(n))
n                     # -> 1
Beware of silent integer overflow, e.g., on my machine, the integer range is -2**31 ... 2**31-1; eval(2**31) erroneously expands to -2147483648.
Logical conditions can be checked like this:
`n' is ifelse(eval(n < 2), 1, less than , eval(n == 2), 1, , greater than )2
len:
len(`hello')                  # -> 5
substr:
substr(`hello', 1, 3)         # -> ell
substr(`hello', 2)            # -> llo
index:
index(`hello',`llo')          # -> 2
index(`not in string', `xyz') # -> -1
translit:
define(`ALPHA', `abcdefghijklmnopqrstuvwxyz')
define(`ALPHA_UPR', `ABCDEFGHIJKLMNOPQRSTUVWXYZ')
define(`ROT13', `nopqrstuvwxyzabcdefghijklm')
translit(`abc ebg13', ALPHA, ALPHA_UPR)   # -> ABC EBG13
translit(`abc ebg13', ALPHA, ROT13)       # -> nop rot13
GNU m4 includes some additional string macros: regexp, to search for a regular expression in a string, and patsubst, to do find and replace.
Unfortunately, m4's usual approach of rescanning the expansion of a macro can be a problem with macros that operate on strings:
define(`eng',`engineering')
substr(`engineer',0,3)           # -> eng -> engineering
translit(`rat', ALPHA, ROT13)    # -> eng -> engineering
This is not normally the desired behaviour and is arguably a design bug in m4: the builtins should at least provide some way to allow us to prevent the extracted or transformed substring from being expanded. A workaround is suggested below.
In standard m4 (Unix), a macro can have up to 9 arguments; within the macro definition, these are referenced as $1 ... $9. (GNU m4 has no fixed limit on the number of arguments.) Arguments default to the empty string, e.g., if 2 arguments are passed then $3 will be empty.
Going in at the deep end, here is a reimplementation of the len builtin (replacing it) as a recursive macro.
define(`len',`ifelse($1,,0,`eval(1+len(substr($1,1)))')')
In a macro definition, argument references like $1 expand immediately, regardless of surrounding quotes. For example, len(`xyz') above would expand (at the first step) to
ifelse(xyz,,0,`eval(1+len(substr(xyz,1)))')
Where necessary, this immediate expansion can be prevented by breaking up the reference with an empty pair of quotes, e.g., $`'1.
The name of the macro is given by $0; $# expands to the number of arguments. Note in the following example that empty parentheses are treated as delimiting a single argument, namely an empty string:
define(`count', ``$0': $# args')
count                 # -> count: 0 args
count()               # -> count: 1 args
count(1)              # -> count: 1 args
count(1,)             # -> count: 2 args
$* expands to the list of arguments; $@ does the same but protects each one with quotes to prevent them being expanded:
define(`list',`$`'*: $*; $`'@: $@')
list(len(`abc'),`len(`abc')')
# -> $*: 3,3; $@: 3,len(`abc')
A common requirement is to process a list of arguments where we don't know in advance how long the list will be. Here, the shift macro comes in useful: it expands to the same list of arguments with the first one removed:
shift(1,2, `abc', 4)      # -> 2,abc,4
shift(one)                # ->
define(`echolast',`ifelse(eval($#<2),1,`$1`'',
  `echolast(shift($@))')')
echolast(one,two,three)   # -> three
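The same shift-and-recurse pattern can also accumulate a result rather than pick out one item. The following is only a sketch; the macro name sum is hypothetical, and it assumes every argument is an integer expression that eval accepts.

```m4
define(`sum', `ifelse($#, 0, `0', $#, 1, `eval($1 + 0)',
  `eval($1 + sum(shift($@)))')')dnl
sum(1, 2, 3, 4)       # -> 10
sum(5)                # -> 5
```

Each step adds the first argument to the sum of the rest; the recursion stops when only one argument remains.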
All macros have global scope.
What if we want a “local variable” – a macro that is used only within the definition of another macro? In particular, suppose we want to avoid accidentally redefining a macro used somewhere else.
One possibility is to prefix “local” macro names with the name of the containing macro. Unfortunately, this isn't entirely satisfactory – and it won't work at all in a recursive macro. A better approach is described in the next section.
For each macro, m4 actually creates a stack of definitions; the current definition is just the one on top of the stack. It's possible to temporarily redefine a macro by using pushdef to add a definition to the top of the stack and, later, popdef to destroy only the topmost definition:
define(`USED',1)
define(`proc', `pushdef(`USED',10)pushdef(`UNUSED',20)dnl
`'`USED' = USED, `UNUSED' = UNUSED`'dnl
`'popdef(`USED',`UNUSED')')
proc                  # -> USED = 10, UNUSED = 20
USED                  # -> 1
If the macro hasn't yet been defined then pushdef is equivalent to define. As with undefine, it is not an error to popdef a macro which isn't currently defined; it simply has no effect.
In GNU m4, define(X,Y) works like popdef(X)pushdef(X,Y), i.e., it replaces only the topmost definition on the stack; in some implementations, define(X,Y) is equivalent to undefine(X)define(X,Y), i.e., the new definition replaces the whole stack.
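The stack behaviour can be seen directly in a short sketch; the macro name COLOR and its values are hypothetical, used only for illustration.

```m4
define(`COLOR', `red')dnl
pushdef(`COLOR', `green')dnl
COLOR                 # -> green
popdef(`COLOR')dnl
COLOR                 # -> red
popdef(`COLOR')dnl
COLOR                 # -> COLOR
```

After the second popdef the stack is empty, so COLOR is no longer a macro and the word is passed through unchanged.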
When GNU m4 encounters a word such as “define” that corresponds to a builtin that requires arguments, it leaves the word unchanged unless it is immediately followed by an opening parenthesis.
define(`MYMACRO',`text')  # ->
define a macro            # -> define a macro
Actually, we can say that m4 does expand the macro, but that it expands only to the same literal string.
We can make our own macros equally intelligent by adding an ifelse, or an extra clause to an existing “ifelse”:
define(`reverse',`ifelse($1,,,
  `reverse(substr($1,1))`'substr($1,0,1)')')
reverse drawer: reverse(`drawer')   # ->  drawer: reward
define(`reverse',`ifelse($#,0,``$0'',$1,,,
  `reverse(substr($1,1))`'substr($1,0,1)')')
reverse drawer: reverse(`drawer')   # -> reverse drawer: reward
Unfortunately, some macros do not require arguments and so m4 has no way of knowing whether a word corresponding to a macro name is intended to be a macro call or just accidentally present in the text being processed.
Also, other versions of m4, and older versions of GNU m4, may expand macro names which are not followed by arguments even where GNU m4 does not:
# GNU m4 1.4.10
we shift the responsibility  # -> we shift the responsibility
# GNU m4 1.4.5
we shift the responsibility  # -> we  the responsibility
In general, the problem is dealt with by quoting any word that corresponds to a macro name:
we `shift' the responsibility # -> we shift the responsibility
However if you are not fully in control of the text being passed to m4 this can be troublesome. Many macro names, like “changequote”, are unlikely to occur in ordinary text. Potentially more problematic are dictionary words that are recognized as macros even without arguments:
divert, undivert (covered below)
windows (“windows”, as well as “unix” and “os2”, is defined in some versions of m4 as a way of testing the platform on which m4 is running; by default it is not defined in GNU m4.)
An alternative to quoting macro names is to change all m4's macro names so that they won't clash with anything. Invoking m4 with the -P command-line option prefixes all builtins with “m4_”:
define(`M1',`text1')M1      # -> define(M1,text1)M1
m4_define(`M1',`text1')M1   # -> text1
On the basis that unnecessary changes to a language are generally undesirable, I suggest not using the -P option if you can comfortably avoid it.
However, if you are writing a set of m4 macros that may be included by others as a module, do add some kind of prefix to your own macros to reduce the possibility of clashes.
Although m4 provides no builtins for iteration, it is not difficult to create macros which use recursion to do this. Various implementations can be found on the web. This author's “for” loop is:
define(`for',`ifelse($#,0,``$0'',`ifelse(eval($2<=$3),1,
  `pushdef(`$1',$2)$4`'popdef(`$1')$0(`$1',incr($2),$3,`$4')')')')
for n = for(`x',1,5,`x,')...   # -> for n = 1,2,3,4,5,...
for(`x',1,3,`for(`x',0,4,`eval(5-x)') ')   # -> 54321 54321 54321
Note the use of pushdef and popdef to prevent loop variables clobbering any existing variable; in the nested for loop, this causes the second x to hide (shadow) the first one during execution of the inner loop.
A “for each” macro might be written:
define(`foreach',`ifelse(eval($#>2),1,
  `pushdef(`$1',`$3')$2`'popdef(`$1')dnl
`'ifelse(eval($#>3),1,`$0(`$1',`$2',shift(shift(shift($@))))')')')
foreach(`X',`Open the X. ',`door',`window')
# -> Open the door. Open the window.
foreach(`X',`foreach(`Y',`Y the X. ',`Open',`Close')',`door',`window')
# -> Open the door. Close the door. Open the window. Close the window.
define(`OPER',``$2 the $1'')
foreach(`XY',`OPER(XY). ', ``window',`Open'', ``door',`Close'')
# -> Open the window. Close the door.
In a “for” loop of either kind, it can be useful to know when you've reached the last item in the sequence:
define(`foreach',`ifelse(eval($#>2),1,
  `pushdef(`last_$1',eval($#==3))dnl
`'pushdef(`$1',`$3')$2`'popdef(`$1')dnl
`'popdef(`last_$1')dnl
`'ifelse(eval($#>3),1,`$0(`$1',`$2',shift(shift(shift($@))))')')')
define(`everyone',``Tom',`Dick',`Harry'')
foreach(`one',`one`'ifelse(last_one,0,` and ')',everyone).
# -> Tom and Dick and Harry.
Finally, a simple “while” loop macro:
define(`while',`ifelse($#,0,``$0'',eval($1+0),1,`$2`'$0($@)')')
define(`POW2',2)
while(`POW2<=1000',`define(`POW2',eval(POW2*2))')
POW2                  # -> 1024
Here, the apparently redundant +0 in eval($1+0) does have a purpose: without it, a while without arguments expands to
ifelse(0,0,``while'',eval() ...
whereupon eval() produces an empty argument warning.
To discard output (in particular, to prevent newlines in a set of definitions being output), use divert:
divert(-1)
<definitions...>
divert(0)dnl
Unlike the contents of a comment, the definitions (and any other macros) are still processed by m4; divert(-1) merely causes m4 to do this silently, without sending anything to the output.
The last line above, with its dnl to prevent the following newline being echoed, could also have been written:
divert`'dnl
divnum expands to the number of the currently active diversion; 0, the default, means standard output (stdout); positive numbers are temporary buffers which are output in numeric order at the end of processing. Standard m4 has 9 buffers (1..9); in GNU m4 there is no fixed limit.
undivert(num) appends the contents of diversion num to the current diversion (normally stdout), emptying it; without arguments, undivert retrieves all diversions in numeric order. Note that undivert() is the same as undivert(0) and has no effect: diversion 0 is stdout which is effectively an empty buffer.
The contents of the buffer are not interpreted when undivert is run; they are simply output as raw text, e.g., the following code results in Z Z Z being output (not 9 9 9):
divert(1)
Z Z Z
divert
define(`Z',9)
undivert(1)
There is an implicit divert and undivert when m4 reaches the end of the input, i.e., all buffers are flushed to the standard output. If you want to avoid this for any reason, you can of course discard the contents of the buffers by putting the following line at the end of your input
divert(-1)undivert
or by exiting using the m4exit builtin.
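A small sketch of the implicit flush at end of input: text is sent to diversions 2 and 1 in that order, yet it appears after the main text and in numeric order. The diversion numbers and the sample text are arbitrary choices for illustration.

```m4
dnl Text sent to numbered diversions is held back until the end of input.
divert(2)dnl
second
divert(1)dnl
first
divert(0)dnl
main text
dnl Output, in order: main text, first, second
```

This ordering is what makes diversions useful for collecting material (for example, a list of declarations) that must appear at a fixed place relative to the rest of the output.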
include(filename.m4) causes the contents of the named file to be read and interpreted as if it was part of the current file (just like #include in the C preprocessor).
GNU m4 allows for an include file search path. To specify directories to be searched for include files use the -I option on the command line, e.g.,
m4 -I ~/mydir -Ilocaldir/subdir
or use the environment variable M4PATH, e.g. (bash shell)
export M4PATH=~/mydir:localdir/subdir
m4 test.m4
sinclude(nonexistentfile) (silent include) is a version of include that doesn't complain if the file doesn't exist.
To include a file uninterpreted, GNU m4 allows undivert to be passed a filename argument. If inc.m4 contains
define(`planet',`jupiter')
then
undivert(`inc.m4')       # -> define(`planet',`jupiter')
planet                   # -> planet
include(`inc.m4')planet  # -> jupiter
A system command can be passed to the shell, e.g.,
syscmd(`date --iso-8601|sed s/-/./g')
outputs something like 2007.10.16. The output from the command sent to syscmd is not interpreted:
syscmd(`echo "define(\`AUTHOR',\`Orwell')"')
# -> define(`AUTHOR',`Orwell')
AUTHOR                   # -> AUTHOR
However GNU m4 provides another macro, esyscmd, that does process the output of the shell command:
esyscmd(`echo "define(\`AUTHOR',\`Orwell')"')
# ->
AUTHOR                   # -> Orwell
The macro sysval expands to the exit status of the last shell command issued (0 for success):
sysval                   # -> 0
esyscmd(`ls /no-dir/')
sysval                   # -> 2
Naturally, m4 can be used as a filter in shell scripts or interactively:
echo "eval(98/3)"|m4
outputs 32.
Temporary files can be created to store the output of shell commands: maketemp(prefixXXXXXX) creates a temporary file and expands to the filename; this name will be the (optional) prefix with the six X's replaced by six random letters and digits. In older versions of GNU m4 and in other implementations of m4, the X's are generated from the process ID. In certain contexts, this may be a security hole. Another macro, mkstemp, is available in newer m4's which always generates a random filename extension.
define(`FILENAME',mkstemp(`/tmp/myscriptXXXXXX'))
The temporary file can be read in using include (perhaps in conjunction with divert).
Most bugs relate to problems with quoting so check that first.
If you want to see step-by-step what m4 is doing, either invoke it with the -dV option or, to limit full debug output to one part of the file,
debugmode(V)
...problematic section...
debugmode
The V flag is for full debugging; other flags for finer control are described in the info pages.
dumpdef(`macro', ...) outputs to standard error the formatted definition of each argument, or just <macro> if macro is a builtin; dumpdef without arguments dumps all definitions to stderr. Nothing is sent to stdout.
For user-defined macros, defn(`macro') expands to the definition string (i.e., not prefixed by the macro name).
errprint(`this message goes to standard error (stderr)')
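One way errprint might be put to work is in a guard that stops processing when a required macro is missing. This is only a sketch: the macro name require, the message text, and the exit status 1 are assumptions for illustration, combined here with the ifdef and m4exit builtins.

```m4
define(`require', `ifdef(`$1', , `errprint(`error: $1 is not defined
')m4exit(1)')')dnl
define(`VERSION', `1.0')dnl
require(`VERSION')dnl
Version: VERSION      # -> Version: 1.0
```

Had VERSION not been defined, the message would go to stderr and m4 would exit before producing any further output.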
Suppose we want to allow strlen to be used instead of len. This won't work:
define(`strlen',`len')
strlen(`hello')          # -> len
because we forgot to relay the arguments:
define(`strlen',`len($@)')
strlen(`hello')          # -> 5
OK, but suppose we want to replace len altogether. Clearly, this doesn't work:
define(`strlen',`len($@)')undefine(`len')
strlen(`hello')          # -> len(hello)
since expansion now stops at len.
However, using the builtin defn to access the definition of a macro, it's possible to alias or rename macros quite simply. For user-defined macros, defn expands to the text of the macro (protected with quotes before being output). The defn of a builtin expands in most contexts to the empty string, but when passed as an argument to “define” it expands to a special token that has the desired effect:
define(`rename', `define(`$2',defn(`$1'))undefine(`$1')')
rename(`define',`create')
create(`vehicle',`truck')
vehicle                  # -> truck
define(`fuel',`diesel')  # -> define(fuel,diesel)
fuel                     # -> fuel
And, because of the intelligence built into the original macro definition, m4 is smart enough not to expand the word “create” unless it is followed by parentheses. Compare the indirect approach, where defn is not used:
create(`new',`create($@)')
new(`wheels', 4)
define wheels            # -> define 4
create wheels            # -> create 4
create() wheels          # ->  4
new wheels               # ->  4
Even when you undefine a builtin or define another macro with the same name, GNU m4 still keeps the internal definition which can be called indirectly via the macro builtin:
define(`TREE',`maple') undefine(`define',`undefine') undefine(`TREE') # -> undefine(TREE) TREE # -> maple builtin(`undefine',`TREE') TREE # -> TREE builtin(`define',`create',`builtin'(``define'',$`'@)) create(`TREE',`ash') TREE # -> ash
(Note the judicious use of quotes for the last argument to the call to builtin which defines the create macro above. Because of the use of inner quotes, the usual approach of surrounding the whole argument with quotes, i.e.,
builtin(`define',`create',`builtin(`define',$`'@)')
would not have worked as desired: instead, any call to the create macro would have ended up defining a macro called “$@”.)
Because they can be accessed only indirectly and so don't need to be protected, the names of these internal macros are not changed by the -P flag.
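As a small illustration of the point above, here is a sketch in which a user-defined macro shadows the builtin len (the replacement text is arbitrary); the internal definition remains reachable through builtin:

```m4
dnl Shadow the builtin len with a user-defined macro of the same name.
define(`len', `shadowed')dnl
len(`hello')               # -> shadowed
builtin(`len', `hello')    # -> 5
```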
The obvious way to prevent the characters ` and ' being interpreted as quotes is to change m4's quote delimiters as described above. This has some drawbacks. For example, to ensure the new delimiters don't accidentally occur anywhere else, more than one character may be needed for each delimiter, and if there's a lot of quoting, the code will become more verbose and perhaps more difficult to read.
Another approach is to keep m4's existing quote delimiters and define macros which hide the backtick and apostrophe from m4. The trick is to balance the quotes while m4 still sees them as nested quotes, temporarily change the quoting, and then prevent one of the quotes being output:
define(`LQ',`changequote(<,>)`dnl'
changequote`'')
define(`RQ',`changequote(<,>)dnl`
'changequote`'')

define(myne, `It`'RQ()s mine!')
LQ()LQ()myne''            # -> ``It's mine!''
GNU m4 allows any macro to be called indirectly using the macro indir:
indir(`define',`SIZE',78)
SIZE                      # -> 78
indir(`SIZE')             # -> 78
This is useful where the name of the macro to be called is derived dynamically or where it does not correspond to a token (i.e., a macro name with spaces or punctuation).
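Compared to an ordinary call, there are two differences to be aware of: the named macro must exist, otherwise m4 reports an error; and the arguments are processed before the definition of the macro being called is looked up:
indir(`define(`SIZE')',67)                # -> m4: undefined macro `define(`SIZE')'
indir(`SIZE', indir(`define',`SIZE',53))  # -> 53
indir(`SIZE', indir(`undefine',`SIZE'))   # -> m4: undefined macro `SIZE'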
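To illustrate the case where the macro name is built dynamically and is not a valid token, here is a minimal sketch. The names http:404, http:403 and status are hypothetical, chosen only for this example; because they contain a colon, the http: macros can never be called directly:

```m4
dnl Hypothetical lookup table: the names contain a colon, so they are
dnl never recognised as macro calls in ordinary text.
define(`http:404', `$1: not found')dnl
define(`http:403', `$1: forbidden')dnl
dnl status builds the macro name from its first argument and relays
dnl the second argument to the macro it selects.
define(`status', `indir(`http:$1', `$2')')dnl
status(404, `page.txt')    # -> page.txt: not found
```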
We can of course define our own higher-order macros.
For example, here is a macro, do, roughly similar to indir above:
define(do, $1($2, $3, $4, $5))
do(`define', ``x'', 4)
x                         # -> 4
Since extra arguments are normally ignored, do works for any macro taking up to 4 arguments.
Note however that the example here, which expands to define(`x', 4, , , ), does generate a warning: “excess arguments to builtin `define' ignored”.
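One possible way to avoid both the fixed limit and the warning is to relay the arguments with shift($@) instead of listing them one by one. The following is only a sketch; apply is a hypothetical name, not a builtin:

```m4
dnl apply is a hypothetical name. shift($@) relays exactly the
dnl arguments that follow the macro name, however many there are.
define(`apply', `$1(shift($@))')dnl
apply(`define', `x', 4)dnl
x                          # -> 4
apply(`len', `hello')      # -> 5
```

Because $@ re-quotes each argument, the name x needs only the usual single layer of quotes in this version.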
Pretend we don't know that the sum n + (n-1) + ... + 1 is given by n*(n+1)/2 and so we define a recursive macro to calculate it:
define(`sigma',`ifelse(eval($1<=1),1,$1,`eval($1+sigma(decr($1)))')')
If too large a number is passed to this macro then m4 may crash with a message like
ERROR: recursion limit of 1024 exceeded
(for GNU m4 1.4.10).
In fact, the problem is not that sigma is recursive; it is the degree of nesting in the expansion, e.g., sigma(1000) will expand to
eval(1000 + eval(999 + eval(998 + eval(997 + ...
The nesting limit could be increased using a command line option (-L).
However, we do better to avoid the problem by performing
the calculation as we go using an extra parameter as an
accumulator:
define(`sigma',`ifelse(eval($1<1),1,$2,`sigma(decr($1),eval($2+$1))')')
Now, no matter how many steps in the expansion, the amount of nesting is limited at every step, e.g., sigma(1000) becomes
ifelse(eval(1000<1),1,,`sigma(decr(1000),eval(+1000))')
which becomes sigma(999,1000), which in turn expands to
ifelse(eval(999<1),1,1000,`sigma(decr(999),eval(1000+999))')
and so on.
Here, the default value of the added parameter (an empty string) worked OK. In other cases, an auxiliary macro may be required: the auxiliary macro will then be the recursive one; the main macro will call it, passing the appropriate initial value for the extra parameter.
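As a sketch of the auxiliary-macro case, consider a factorial. Here an empty accumulator would not work, since eval(*5) is not a valid expression, so the main macro has to pass the initial value 1. The names fact and _fact are hypothetical:

```m4
dnl fact and _fact are hypothetical names. _fact carries the running
dnl product in its second argument; fact supplies the initial value 1.
define(`fact', `_fact($1, 1)')dnl
define(`_fact',
  `ifelse(eval($1<=1),1,$2,`_fact(decr($1),eval($2*$1))')')dnl
fact(5)                    # -> 120
```

As with the second version of sigma, each step expands to a single new call, so the depth of nesting does not grow with the argument.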
Although it is not standard, GNU m4 allows any text string
to be defined as a macro.
Since only valid identifiers are checked against macros,
macros whose names include spaces or punctuation characters
will not be expanded.
However, they can still be accessed as variables using the defn macro:
define(`my var', `a strange one')
my var is defn(`my var').    # -> my var is a strange one.
This feature can be used to implement arrays and hashes (associative arrays):
define(`_set', `define(`$1[$2]', `$3')')
define(`_get', `defn(`$1[$2]')')
_set(`myarray', 1, `alpha')
_get(`myarray', 1)                    # -> alpha
_set(`myarray', `alpha', `omega')
_get(`myarray', _get(`myarray',1))    # -> omega
defn(`myarray[alpha]')                # -> omega
Above, we noted a problem with the string macros: it's not possible to prevent the string that's returned from being expanded.
Steven Simpson wrote a patch for m4 which fixes the problem by allowing an extra parameter to be passed to string macros – however this of course means using a non-standard m4.
A less radical fix is to redefine the substr macro as follows.
It works by extracting the substring one letter at a time, thus avoiding any unwanted expansion (assuming, of course, that no one-letter macros have been defined):
define(`substr',`ifelse($#,0,``$0'',
$#,2,`substr($@,eval(len(`$1')-$2))',
`ifelse(eval($3<=0),1,,
`builtin(`substr',`$1',$2,1)`'substr(
`$1',eval($2+1),eval($3-1))')')')dnl
define(`eng',`engineering')
substr(`engineer',0,3)    # -> eng
To keep it simple, this definition assumes reasonably sensible arguments, e.g., it doesn't allow for substr(`abcdef', -2) or substr(`abc').
Note that, as with the corresponding builtin substr, you may have problems where a string contains quotes, e.g.,
substr(``quoted'',0,3)
The new version of substr can in turn be used to implement a new version of translit:
define(`translit',`ifelse($#,0,``$0'',
len(`$1'),0,,
`builtin(`translit',substr(`$1',0,1),`$2',`$3')`'translit(
substr(`$1',1),`$2',`$3')')')dnl
define(`ALPHA', `abcdefghijklmnopqrstuvwxyz')
define(`ALPHA_UPR', `ABCDEFGHIJKLMNOPQRSTUVWXYZ')
translit(`alpha', ALPHA, ALPHA_UPR)    # -> ALPHA
M4's general character as a macro language can be seen by comparing it to another, very different macro language: FreeMarker.
GNU m4 and FreeMarker are both free in both senses of the word: FreeMarker is covered by a BSD-style license. They are more-or-less equally “powerful”, e.g., both languages support recursive macros.
In some respects, m4 has an edge over FreeMarker:
The two languages are quite different in appearance and
how they work.
In m4, macros are ordinary identifiers; FreeMarker uses XML-like markup for the <#opening> and </#closing> delimiters of macros.
While m4's textual rescanning approach is conceptually
elegant, it can be confusing in practice and demands
careful attention to layers of nested quotes.
FreeMarker, in comparison, works like a conventional
structured programming language, making it much easier
to read, write and debug.
On the other hand, FreeMarker markup is more verbose and
might seem intrusive in certain contexts, for example,
where macros are used to extend an existing programming
language.
FreeMarker has several distinct advantages:
Ultimately, which language is “better” depends on the importance of their relative advantages in different contexts. This author has very positive experience of using FreeMarker/FMPP for automatic code generation where, for several reasons, m4 was unsuitable. On the other hand, m4 is clearly a more sensible and appropriate choice for Unix sendmail's configuration macros.