Emacs is an advanced text editor with great support for programming. But let's face it: not every language is supported by the ecosystem. Sometimes you have to write your own major mode.

A few months ago, I wrote a major mode for Pest grammar files. Yesterday, I used it again and found it quite usable. So I decided to write a post summarizing how to write a "good" major mode. For technical details, please refer to the Emacs Lisp manual.

Here is a quick checklist:

  • Syntax highlighting
  • Indentation
  • Commenting
  • imenu
  • flymake
  • eldoc
  • xref

Basic: Syntax table

The very first thing is to define a syntax table. The syntax table assigns each character a syntactic class: which characters delimit strings, which start and end comments, which are parentheses, which are word constituents, and so on. This information is used by font-lock-mode, motion commands, etc.

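(defconst pest-mode-syntax-table
  (let ((table (make-syntax-table)))
    ;; Both quote characters delimit strings
    (modify-syntax-entry ?' "\"" table)
    (modify-syntax-entry ?\" "\"" table)

    ;; "//" starts a comment: "/" is punctuation (".") and both the
    ;; first ("1") and second ("2") character of a comment starter
    (modify-syntax-entry ?/ ". 12" table)
    ;; A newline ends the comment
    (modify-syntax-entry ?\n ">" table)
    table)
  "Syntax table for `pest-mode'.")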

(define-derived-mode pest-mode prog-mode "Pest"
  ;; [...]
  :syntax-table pest-mode-syntax-table
  ;; [...]
  )

Note that syntax tables can be inherited, so if you are merely customizing an existing major mode, there's a good chance you don't have to define one yourself.
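To see what the syntax table buys you, you can ask syntax-ppss whether a position sits inside a string or a comment. Below is a minimal sketch, assuming a copy of the table above under the hypothetical name pest-demo-syntax-table (the helper pest-demo-context-at is hypothetical too):

```elisp
(defconst pest-demo-syntax-table
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?' "\"" table)   ; 'a' is a string
    (modify-syntax-entry ?\" "\"" table)  ; "abc" is a string
    (modify-syntax-entry ?/ ". 12" table) ; "//" starts a comment
    (modify-syntax-entry ?\n ">" table)   ; a newline ends it
    table)
  "Hypothetical copy of the Pest syntax table, for experimenting.")

(defun pest-demo-context-at (text pos)
  "Return `string', `comment' or nil for position POS in TEXT."
  (with-temp-buffer
    (set-syntax-table pest-demo-syntax-table)
    (insert text)
    (let ((state (syntax-ppss pos)))
      (cond ((nth 3 state) 'string)   ; element 3: inside a string
            ((nth 4 state) 'comment)  ; element 4: inside a comment
            (t nil)))))

;; (pest-demo-context-at "a = { \"x\" } // done" 8)  => string
;; (pest-demo-context-at "a = { \"x\" } // done" 16) => comment
```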

Syntax highlighting

This is the bare minimum. In Emacs, highlighting is handled by font-lock-mode, which attaches faces to parts of the buffer based on their syntactic role. Highlighting happens in two phases:

  1. Syntactic fontification, driven by the syntax table: highlights comments, strings, etc.
  2. Search-based fontification: highlights keywords, etc.

To enable highlighting, just set font-lock-defaults buffer-locally.

(defvar pest--highlights
  `((,(rx "'" (char alpha) "'")                         . font-lock-string-face)
    (,(rx (or "SOI" "EOI" "@" "+" "*" "?" "~"))         . font-lock-keyword-face)
    (,(rx (+ (or alpha "_")) (* (or (char alnum) "_"))) . font-lock-variable-name-face)))

(defun pest-font-lock-syntactic-face-function (state)
  "Return syntactic face given STATE."
  (if (nth 3 state)
      font-lock-string-face
    font-lock-comment-face))

(define-derived-mode pest-mode prog-mode "Pest"
  ;; [...]
  (setq-local font-lock-defaults
              '(pest--highlights
                nil nil nil nil
                (font-lock-syntactic-face-function . pest-font-lock-syntactic-face-function)))
  ;; [...]
  )

Regex-based highlighting is preferable because it's fast, but it has no real understanding of the code, so it gets inaccurate once you need more context. In that case, you may need something like tree-sitter or a language server.
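A quick way to check that your search-based rules fire the way you expect is to fontify a scratch buffer and read back the face property. A minimal sketch with a hypothetical pest-demo-mode and a simplified rule list (the symbol-start/symbol-end anchors are my addition, so that SOI inside a longer name isn't treated as a keyword):

```elisp
(require 'rx)

(defvar pest-demo--highlights
  `((,(rx symbol-start (or "SOI" "EOI") symbol-end) . font-lock-keyword-face)
    (,(rx (any alpha "_") (* (any alnum "_")))       . font-lock-variable-name-face))
  "Simplified search-based rules; earlier rules win.")

(define-derived-mode pest-demo-mode prog-mode "Pest-Demo"
  "Hypothetical minimal mode used only to exercise font-lock."
  (setq-local font-lock-defaults '(pest-demo--highlights)))

(defun pest-demo-face-at (text pos)
  "Fontify TEXT in `pest-demo-mode' and return the face at POS."
  (with-temp-buffer
    (pest-demo-mode)
    (insert text)
    ;; Fontify the whole buffer now, even without font-lock-mode
    (font-lock-ensure)
    (get-text-property pos 'face)))

;; (pest-demo-face-at "file = { SOI }" 10) => font-lock-keyword-face
;; (pest-demo-face-at "file = { SOI }" 1)  => font-lock-variable-name-face
```

Rules without an OVERRIDE flag never repaint text that's already fontified, which is why SOI keeps the keyword face even though the identifier rule matches it too.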

Indentation

Indentation is the next thing you expect from a "standard" major mode. Unfortunately, a reliable, reusable, and maintainable indentation function is hard to write. What you usually end up doing is writing a simple one that handles the common cases, then fixing bugs as they show up.

Emacs has a minimal "parser" based on syntax tables called syntax-ppss. It tells you how deeply you are nested in parentheses and where the innermost enclosing paren starts. Calculating indentation requires a lot of searches, so don't forget to bound them (for example, by that paren's position) to avoid performance degradation in large buffers.
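Here is a small sketch of the two pieces pest--calculate-indentation relies on: the depth and innermost open paren reported by syntax-ppss, and a backward search bounded by that paren. pest-demo-paren-context is a hypothetical helper, and it uses the standard syntax table, which already pairs (), [] and {}:

```elisp
(defun pest-demo-paren-context (text pos)
  "Return (DEPTH OPEN-POS SEP-POS) for position POS in TEXT.
DEPTH is the paren depth, OPEN-POS the innermost enclosing open
paren (nil at top level), and SEP-POS the nearest \"|\" between
OPEN-POS and POS, if any."
  (with-temp-buffer
    (insert text)
    (goto-char pos)
    (let* ((ppss (syntax-ppss))
           (depth (nth 0 ppss))
           (open (nth 1 ppss))
           ;; Bound the backward search by the open paren, so it never
           ;; scans past the enclosing group (or the whole buffer)
           (sep (and open
                     (save-excursion (re-search-backward "|" open t)))))
      (list depth open sep))))

;; (pest-demo-paren-context "rule = { a ~ (b | c) }" 19) => (2 14 17)
;; (pest-demo-paren-context "rule = { a ~ (b | c) }" 10) => (1 8 nil)
```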

It's worth noting that the usual pattern, pressing Tab anywhere in a line to indent that line, is not automatically supported by Emacs. You have to code it yourself, duh! Fortunately, this code is reusable (see pest-indent-line).

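(defun pest--calculate-indentation ()
  "Calculate the indentation of the current line.
Return nil for top-level lines, which are left alone."
  (let (indent)
    (save-excursion
      (back-to-indentation)
      (let* ((ppss (syntax-ppss))
             ;; How many parens enclose the line
             (depth (car ppss))
             ;; Position of the innermost enclosing open paren
             (paren-start-pos (cadr ppss))
             (base (* 4 depth))
             ;; Is there a "|" between the open paren and this line?
             ;; The search is bounded by the open paren.
             (rule-sep (save-excursion
                         (or (looking-at "|")
                             (re-search-backward "|" paren-start-pos t)))))
        (unless (= depth 0)
          (setq indent
                (cond
                 ;; A closing paren lines up with the enclosing level
                 ((looking-at "\\s)") (- base 4))
                 ;; First alternative: leave room for a later "| "
                 ((null rule-sep) (+ 2 base))
                 (t base))))))
    indent))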

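(defun pest-indent-line (&optional indent)
  "Indent the current line according to the Pest syntax, or supply INDENT.
Interactively, a numeric prefix argument is used as INDENT."
  ;; A raw "P" prefix would pass (4) for C-u, which breaks the arithmetic
  (interactive (list (and current-prefix-arg
                          (prefix-numeric-value current-prefix-arg))))
  ;; Remember point as a distance from the end of the buffer, which
  ;; survives changes to the whitespace before it
  (let ((pos (- (point-max) (point)))
        (indent (or indent (pest--calculate-indentation)))
        (beg (progn (beginning-of-line) (point))))
    (skip-chars-forward " \t")
    (if (null indent)
        ;; No indentation computed: just restore point
        (goto-char (- (point-max) pos))
      ;; Only modify the buffer if the indentation actually changes
      (unless (= indent (current-column))
        (delete-region beg (point))
        (indent-to indent))
      ;; If point was after the indentation, put it back on the same
      ;; character; otherwise leave it at the first non-blank one
      (when (> (- (point-max) pos) (point))
        (goto-char (- (point-max) pos))))))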

(define-derived-mode pest-mode prog-mode "Pest"
  ;; [...]
  (setq-local indent-line-function #'pest-indent-line)
  ;; [...]
  )
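Alternatively, much of the point bookkeeping in pest-indent-line can be delegated to the built-in indent-line-to, which leaves point at the end of the indentation. A sketch of a generic wrapper (pest-demo-indent-line-with is a hypothetical name) that takes any function computing the target column, or returning nil to leave the line alone:

```elisp
(defun pest-demo-indent-line-with (calc-fn)
  "Indent the current line to the column returned by CALC-FN.
Do nothing if CALC-FN returns nil.  If point was after the
indentation, it stays on the same character; otherwise it ends
up at the first non-blank character."
  (let ((indent (save-excursion (funcall calc-fn)))
        ;; Distance from point to the end of the buffer survives
        ;; edits made to the whitespace before point
        (from-end (- (point-max) (point)))
        (in-text (> (point) (save-excursion (back-to-indentation) (point)))))
    (when indent
      ;; `indent-line-to' leaves point at the end of the indentation
      (indent-line-to indent)
      (when in-text
        (goto-char (- (point-max) from-end))))))
```

With it, the mode could set indent-line-function to (lambda () (pest-demo-indent-line-with #'pest--calculate-indentation)). As far as I can tell it treats point the same way as the explicit version above; which one reads better is a matter of taste.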

As the Emacs Lisp manual correctly points out, a reliable indentation function usually needs a parser. Emacs itself ships with a framework called Semantic and two parser generators (Bovine and Wisent), though I'm not familiar with them. tree-sitter might help here as well.

Commenting

This enables M-; to comment or uncomment the region. It's the easiest step, provided the comment syntax is simple.

(define-derived-mode pest-mode prog-mode "Pest"
  ;; [...]
  (setq-local comment-start "// ")
  (setq-local comment-end "")
  ;; [...]
  )
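To see what M-; will do with these settings, you can comment and uncomment a region from Lisp. A minimal sketch (pest-demo-toggle-comments is hypothetical) that also installs a stripped-down copy of the Pest comment syntax, so Emacs can recognize the comments it inserts:

```elisp
(require 'newcomment)

(defun pest-demo-toggle-comments (text)
  "Comment out TEXT with Pest settings, then uncomment it again.
Return a cons (COMMENTED . RESTORED) of the two buffer states."
  (with-temp-buffer
    ;; Stripped-down Pest comment syntax: "//" up to the newline
    (let ((table (make-syntax-table)))
      (modify-syntax-entry ?/ ". 12" table)
      (modify-syntax-entry ?\n ">" table)
      (set-syntax-table table))
    ;; The same two settings as in `pest-mode'
    (setq-local comment-start "// ")
    (setq-local comment-end "")
    (insert text)
    (comment-region (point-min) (point-max))
    (let ((commented (buffer-string)))
      (uncomment-region (point-min) (point-max))
      (cons commented (buffer-string)))))
```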

imenu

imenu allows users to jump quickly between significant points (like the beginning of a function) in a buffer. One way to support it is through two buffer-local variables:

  1. imenu-prev-index-position-function: moves point backwards to the previous index position (usually by searching), returning nil when there is none
  2. imenu-extract-index-name-function: returns the name of the item at an index position found by the first function
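
(defvar pest--rule-regexp
  (rx bol
      ;; Group 1: the rule name
      (group (or alpha "_") (* (or alnum "_")))
      (* blank)
      ;; Optional modifier (_, @, !, $) followed by the opening brace
      "=" (* blank) (or "_{" "@{" "!{" "${" "{"))
  "Regexp matching the beginning of a Pest rule definition.")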

(defun pest--match-rule-name ()
  "Extract the rule name from last match."
  (match-string-no-properties 1))

(defun pest-imenu-prev-index-position ()
  "Jump to the beginning of the previous rule."
  (re-search-backward pest--rule-regexp (point-min) t))

(defun pest-imenu-extract-index-name ()
  "Extract rule name here.
Should be called right after `pest-imenu-prev-index-position'."
  (pest--match-rule-name))

(define-derived-mode pest-mode prog-mode "Pest"
  ;; [...]
  (setq-local imenu-prev-index-position-function #'pest-imenu-prev-index-position)
  (setq-local imenu-extract-index-name-function #'pest-imenu-extract-index-name)
  ;; [...]
  )

Note the little trick in pest-imenu-extract-index-name: it reuses the match data left behind by pest-imenu-prev-index-position. There's no guarantee that the extract function will be called right after prev-index-position! A more reliable way is to do another search.
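Here is a sketch of the more reliable variant, where the extract function matches the rule regexp again at point instead of trusting the previous search. All pest-demo-* names are hypothetical; pest-demo-rule-index only exists to show the two functions working together through imenu-default-create-index-function, which is what imenu uses by default to build its index from these two variables:

```elisp
(require 'imenu)
(require 'rx)

(defconst pest-demo--rule-regexp
  (rx bol
      (group (any alpha "_") (* (any alnum "_")))
      (* blank) "=" (* blank)
      (or "_{" "@{" "!{" "${" "{"))
  "Regexp matching the start of a Pest rule; group 1 is the name.")

(defun pest-demo-imenu-prev-index-position ()
  "Move to the beginning of the previous rule; return nil if none."
  (re-search-backward pest-demo--rule-regexp nil t))

(defun pest-demo-imenu-extract-index-name ()
  "Return the name of the rule starting on the current line.
Match again here: nothing guarantees that the match data from the
previous search is still intact."
  (save-excursion
    (beginning-of-line)
    (when (looking-at pest-demo--rule-regexp)
      (match-string-no-properties 1))))

(defun pest-demo-rule-index (text)
  "Return the imenu entry names for TEXT, in buffer order."
  (with-temp-buffer
    (insert text)
    (setq-local imenu-prev-index-position-function
                #'pest-demo-imenu-prev-index-position)
    (setq-local imenu-extract-index-name-function
                #'pest-demo-imenu-extract-index-name)
    (mapcar #'car (imenu-default-create-index-function))))
```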

flymake

Flymake checks the buffer as the user edits it and reports diagnostics. The entry point is flymake-diagnostic-functions, a hook to which each checker (backend) adds a function.

(define-derived-mode pest-mode prog-mode "Pest"
  ;; [...]
  (add-hook 'flymake-diagnostic-functions #'pest-flymake nil t)
  ;; [...]
  )

However, depending on how you check the buffer, the diagnostic function can be pretty long and complex, especially when you rely on external commands. The following is a shortened and commented version of pest-flymake.

The built-in facility for running external commands is clunky, shabby, and hard to memorize. I recommend looking for better third-party alternatives.

(require 'cl-lib)
(require 'flymake)

(defvar-local pest--meta-flymake-proc nil
  "The currently running check process, if any.")

(defun pest-flymake (report-fn &rest _args)
  "Flymake backend for Pest; report diagnostics with REPORT-FN."
  ;; Make sure previous process doesn't interfere
  (when (process-live-p pest--meta-flymake-proc)
    (kill-process pest--meta-flymake-proc))
  ;; Save current buffer, as we need its buffer-local variables later
  (let ((source (current-buffer)))
    (save-restriction
      ;; In case the buffer is narrowed
      (widen)
      ;; Create an asynchronous process
      (setq pest--meta-flymake-proc
            (make-process
             :name "pest-meta-flymake"
             :noquery t
             :connection-type 'pipe
             :buffer (generate-new-buffer " *pest-meta-flymake*")
             ;; :command is a list: the program, then its arguments
             :command '("pesta" "meta_check")
             :sentinel
             ;; Sentinel is called whenever process status changes
             (lambda (proc _event)
               ;; Check finished, either normally or killed by a signal
               (when (memq (process-status proc) '(exit signal))
                 (unwind-protect
                     ;; Check if `proc' is still the latest check
                     (if (with-current-buffer source (eq proc pest--meta-flymake-proc))
                         ;; Inside the process buffer, which holds the output
                         ;; of the process
                         (with-current-buffer (process-buffer proc)
                           ;; Go to the beginning. Initially it will be at the end.
                           (goto-char (point-min))
                           ;; The UPPERCASE forms are placeholders for parsing
                           ;; the checker's output, which is elided here
                           (cl-loop
                            while (FIND-NEXT-DIAGNOSIS)
                            for msg = (GET-MESSAGE)
                            for beg = (GET-BEGIN)
                            for end = (GET-END)
                            for type = :error
                            collect (flymake-make-diagnostic source
                                                             beg
                                                             end
                                                             type
                                                             msg)
                            into diags
                            ;; Call report-fn with diagnostics
                            finally (funcall report-fn diags)))
                       (flymake-log :warning "Canceling obsolete check %s"
                                    proc))
                   ;; Don't forget to kill the process buffer
                   (kill-buffer (process-buffer proc)))))))
      ;; Send the buffer contents to the process
      (process-send-region pest--meta-flymake-proc (point-min) (point-max))
      (process-send-eof pest--meta-flymake-proc))))

Note: you can send EOF only once. If you need to send more than one piece of data, bundle them with a serialization format such as JSON.
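For comparison, a backend that needs no external process is much shorter: it computes the diagnostics synchronously and calls report-fn right away. A sketch with a hypothetical check (pest-demo-flymake-unclosed) that flags a delimiter still open at the end of the buffer, reusing syntax-ppss from the indentation section:

```elisp
(require 'flymake)

(defun pest-demo-flymake-unclosed (report-fn &rest _args)
  "Hypothetical synchronous Flymake backend.
Report an error at the innermost delimiter left open at the end
of the buffer.  No external process is involved, so REPORT-FN is
called immediately."
  (save-excursion
    (save-restriction
      (widen)
      (let* ((ppss (syntax-ppss (point-max)))
             ;; Innermost paren still open at the end of the buffer
             (open (nth 1 ppss)))
        (funcall report-fn
                 (when open
                   (list (flymake-make-diagnostic
                          (current-buffer) open (1+ open)
                          :error "Unclosed delimiter"))))))))

;; In the mode body:
;; (add-hook 'flymake-diagnostic-functions #'pest-demo-flymake-unclosed nil t)
```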

eldoc

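eldoc shows a short string in the echo area depending on where point is. A typical use case is showing a function's signature. The interface is simply eldoc-documentation-function: all you have to do is write a function that returns a string. (Since Emacs 28, the preferred interface is the eldoc-documentation-functions hook, whose functions may likewise just return a string.)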

(define-derived-mode pest-input-mode text-mode "Pest-Input"
  ;; [...]
  (setq-local eldoc-documentation-function #'pest-input-eldoc)
  (eldoc-mode))
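The post doesn't show pest-input-eldoc itself. Purely as an illustration, here is a hypothetical eldoc function for Pest grammar buffers that shows the definition line of the rule named under point:

```elisp
(require 'thingatpt)

(defun pest-demo-eldoc (&rest _)
  "Return the definition line of the Pest rule named at point, or nil.
Extra arguments (such as the CALLBACK passed to functions in
`eldoc-documentation-functions') are ignored; returning the string
directly is allowed by that protocol."
  (let ((name (thing-at-point 'symbol t)))
    (when name
      (save-excursion
        (save-restriction
          (widen)
          (goto-char (point-min))
          ;; A rule definition starts a line: NAME, blanks, "="
          (when (re-search-forward
                 (concat "^" (regexp-quote name) "[ \t]*=") nil t)
            (buffer-substring-no-properties (line-beginning-position)
                                            (line-end-position))))))))

;; Emacs 28 and later:
;;   (add-hook 'eldoc-documentation-functions #'pest-demo-eldoc nil t)
;; Older versions:
;;   (setq-local eldoc-documentation-function #'pest-demo-eldoc)
```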

xref

xref is an extensible cross-referencing framework that enables jumping to definitions, finding references, etc.

Most of the interfaces mentioned above are buffer-local variables, which are pretty straightforward and easy to understand. xref is not one of them. It is built on generic functions and EIEIO, Emacs Lisp's take on the Common Lisp Object System (CLOS). To make matters worse, xref is not as extensively documented as everything above: it doesn't have a dedicated manual. Therefore, you have to read the code, both xref's and emacs-lisp-mode's. And of course, you have to know a bit of CLOS.

To define an xref backend:

  1. Add a function to xref-backend-functions: it should simply return a backend identifier
  2. Use cl-defmethod to implement the interface methods, specialized on that identifier

Some important interfaces are

  • xref-backend-identifier-at-point
  • xref-backend-definitions
  • xref-backend-references
  • xref-backend-apropos

Pest-mode only implements definitions, so the full code is simply:

(defun pest--xref-backend () 'pest)

(cl-defmethod xref-backend-definitions ((_backend (eql pest)) identifier)
  (save-excursion
    (goto-char (point-min))
    (cl-loop
     while (re-search-forward pest--rule-regexp nil t 1)
     if (string= identifier (pest--match-rule-name))
     collect (xref-make (pest--match-rule-name)
                        (xref-make-buffer-location (current-buffer)
                                                   (match-beginning 0))))))

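(define-derived-mode pest-mode prog-mode "Pest"
  ;; [...]
  ;; The trailing t makes the hook buffer-local, so the backend is
  ;; only consulted in Pest buffers
  (add-hook 'xref-backend-functions #'pest--xref-backend nil t))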

Interestingly, xref supported finding references automatically, without a single line of code: the default xref-backend-references searches the current project's files for the identifier.
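If you also want M-. with a prefix argument to offer rule names for completion, one more method from the same interface does it: xref-backend-identifier-completion-table, which returns a completion table (a plain list of strings works). A sketch, assuming a hypothetical backend identifier pest-demo and a hypothetical helper pest-demo--rule-names:

```elisp
(require 'cl-lib)
(require 'rx)
(require 'xref)

(defconst pest-demo--rule-regexp
  (rx bol
      (group (any alpha "_") (* (any alnum "_")))
      (* blank) "=" (* blank)
      (or "_{" "@{" "!{" "${" "{"))
  "Regexp matching the start of a Pest rule; group 1 is the name.")

(defun pest-demo--rule-names ()
  "Return the names of all rules in the current buffer, in order."
  (save-excursion
    (save-restriction
      (widen)
      (goto-char (point-min))
      (cl-loop while (re-search-forward pest-demo--rule-regexp nil t)
               collect (match-string-no-properties 1)))))

;; Emacs 28+ evaluates the argument of an `eql' specializer, hence
;; the quote; older versions expect the bare symbol instead.
(cl-defmethod xref-backend-identifier-completion-table
  ((_backend (eql 'pest-demo)))
  "Offer every rule name in the buffer as a completion candidate."
  (pest-demo--rule-names))
```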

Conclusion

Did I mention that all of these are built-in? Yes, I mean it, all of them are built-in. Emacs is more than an editor. It's a powerful platform that can support all kinds of editing needs.

However, I have to say the interfaces are messy and clunky. Meh. I'm not saying Elisp is bad; I just don't like the interfaces. You can smell the '90s.