# **Graph Neural Networks and Arithmetic Circuits**

#### **Timon Barlag**

Institute for Theoretical Computer Science Leibniz University Hanover Hanover, Germany barlag@thi.uni-hannover.de

## Laura Strieker

Institute for Theoretical Computer Science Leibniz University Hanover Hanover, Germany strieker@thi.uni-hannover.de

#### Vivian Holzapfel

Institute for Theoretical Computer Science
Leibniz University Hanover
Hanover, Germany
holzapfel@thi.uni-hannover.de

## Jonni Virtema

School of Computer Science University of Sheffield Sheffield, United Kingdom j.t.virtema@sheffield.ac.uk

#### **Heribert Vollmer**

Institute for Theoretical Computer Science
Leibniz University Hanover
Hanover, Germany
vollmer@thi.uni-hannover.de

#### Abstract

We characterize the computational power of neural networks that follow the graph neural network (GNN) architecture, not restricted to aggregate-combine GNNs or other particular types. We establish an exact correspondence between the expressivity of GNNs using diverse activation functions and arithmetic circuits over real numbers. In our results the activation function of the network becomes a gate type in the circuit. Our result holds for families of constant depth circuits and networks, both uniformly and non-uniformly, for all common activation functions.

# 1 Introduction

Neural networks have recently received growing attention from a theoretical point of view in a number of papers studying their computational properties. Relevant to this paper are examinations of the computational power of neural networks after training, i.e., the training process is not taken into account but instead the computational power of an optimally trained network is studied. Starting already in the nineties, the expressive power of feed-forward neural networks (FNNs) has been related to Boolean threshold circuits, see, e.g., [Maass et al., 1991, Siegelmann and Sontag, 1995, Maass, 1997, He and Papakonstantinou, 2022]. Most importantly, Maass [1997] showed that when restricting networks to Boolean inputs, a language can be decided by a family of (in a certain sense) "polynomial-size" FNNs if and only if it belongs to the class  $TC^0$ , i.e., the class of all languages decidable by families of Boolean circuits of constant depth and polynomial size using negation gates and unbounded fan-in AND, OR, and threshold gates.

In the last five years attention has shifted to the study of *graph neural networks* (*GNNs*) which is a model for machine learning tasks on graph-structured inputs. Their expressive power can be closely related to the Weisfeiler-Leman algorithm [Xu et al., 2019, Morris et al., 2019, 2021]. Originally developed to solve the graph isomorphism problem, the WL algorithm is connected to fragments of first-order logic; hence the logical expressiveness of GNNs was subsequently studied. Barceló et al.

[2020] consider so-called logical classifiers—these are unary formulas of first-order (FO) predicate logic that classify a node in a given graph according to whether the formula holds for this node. Barceló et al. proved that all classifiers definable in a certain fragment GC (guarded first-order logic with counting quantifiers) are computable by GNNs. The converse direction is a bit problematic; it is only known to hold for unary queries definable in first-order logic. This result has been broadened recently by [Benedikt et al., 2024] using logics with Presburger quantifiers. Barceló et al. then consider an extension of GNNs by so-called global readout and show that these GNNs can compute all queries from the logic FOC<sub>2</sub> (first-order restricted to two variables, but extended with counting quantifiers, a superclass of GC), but the converse is open.

**Our motivation.** Neural networks in general and GNNs specifically have mostly been studied through the lens of Boolean functions. This does not necessarily reflect on the usage of machine learning models in real-world applications. We want to shift this focus to real-valued computations and hence in this paper, we make real numbers first-class citizens. When networks are digitally simulated in practical applications, of course only rational numbers are used as inputs, and the real numbers that appear during their computation (e.g., via the sigmoid function) are approximated by rationals. However, we are interested in principal statements about the computational power of neural networks—hence we study networks as devices operating with real numbers. We do not solely consider networks that are restricted to Boolean inputs or Boolean outputs (Boolean queries, logical classifiers), as done in all the papers cited above. Instead, we consider GNNs as a model to compute functions from (vectors of) real numbers to (vectors of) real numbers, or from labeled graphs (i.e., undirected graphs whose nodes are annotated with vectors of real numbers) to labeled graphs.

Instead of turning to logics, we focus on a computation model that has been used to capture the expressivity of neural networks in the past: we use circuits, but since we turn away from the Boolean computation model, we use *arithmetic circuits* over the reals, that is, circuits that take real numbers as inputs and have nodes that compute real functions such as addition, multiplication, projection, or a constant function. Why do we do that? It will turn out that in this way we obtain a close correspondence between GNNs and arithmetic circuits. Arithmetic circuits can in a second step be simulated by a Boolean computation model, but this does not say anything about the computational power of the networks. Going directly from neural networks to Boolean circuits mixes up two different issues and obscures statements about the power of GNNs. By separating the two aspects, we do not only obtain an upper bound, or correspondence with respect to discrete classification tasks, but an equivalence between GNNs and constant-depth arithmetic circuits over the real functions they compute. In this way, we shed more light on the computational model behind GNNs. Our results show very explicitly what the computational abilities of GNNs are and what elementary operations they can perform.

We also want to make a general statement about scaling and complexity. A common narrative nowadays seems to be that by making neural networks larger and larger, they can solve more complex reasoning tasks. However, theoretical limitations as proven in this paper show that scaling is not all we need. Our results show that to improve the expressivity of GNNs, more computationally complex aggregation and combine functions are needed. More precisely, these functions need to come outside of the class  $FAC_{\mathbb{R}}^0$ , which contains functions that can be computed by arithmetic circuits that are bounded in size and depth, see Definition 2.11 for a precise definition. Another approach would be to change the fixed depth framework of GNNs to one that includes recursion, for example.

Following a broad line of research outlined by Merrill [2022], in this paper, we aim at a general statement about the computational power of neural networks following the GNN architecture, i.e., networks operating in layers where connections within layers and hence their communication capabilities correspond to the given input graph. However, contrary to most predecessor papers [Morris et al., 2019, Barceló et al., 2020], we do not only consider so-called AC-GNNs, i.e., GNNs where the computation in each node of a layer of the network consists first of an aggregation step (collecting information from the adjacent nodes of the graph—very often simply summation), followed by a combine step (combining the aggregated information with the current information of the node—mostly a linear function followed by a unary non-linear activation function such as sigmoid or ReLU). AC-GNNs like these are just one example of a GNN architecture. We take a more general approach. The GNN framework defines a continuous graph-based "message passing" [Gilmer et al., 2017, Morris et al., 2019]. In our networks we keep the layout and message passing mechanisms of GNNs but equip the nodes in each layer with constant-depth arithmetic circuits (taken from a

previously fixed basis of circuits), not limited to the usual aggregate-combine (AC) operations. We call these networks C-GNNs, circuit graph neural networks. AC-GNNs whose aggregation function is computable by a constant-depth circuit, can be simulated by C-GNNs (using the same activation function). The computational upper bounds we obtain thus point out general computational limitations for networks following the GNN message passing framework.

**Main Results.** The main contribution of our paper is an exact correspondence between C-GNNs and arithmetic circuits. A function (from labeled graphs to labeled graphs) is computable by a C-GNN with a constant number of layers if and only if it is computable by a constant-depth arithmetic circuit over the real numbers. The activation function of the neural network will be a gate type in the arithmetic circuit. Thus, the (set of) actual activation function(s) becomes a parameter in a general statement equating the computational power of GNNs and arithmetic circuits.

A number of remarks are in order.

- Our result holds for all commonly used activation functions.
- Our result is uniform. If we start with a family of GNNs whose individual networks can be generated via some algorithm, the resulting family of arithmetic circuits will be uniform in the same way, and vice versa.
- Our result gives general limits for neural networks following the GNN message passing framework, including but not restricted to the currently widely used AC-GNNs. This means that in order to use a GNN to compute a function not computable by constant-depth arithmetic circuits, scaling GNNs up or adding computational power in the individual nodes will not help, but neural networks of a completely different architecture are needed.

This paper follows a modular approach. We stick to the GNN message passing framework, but allow computations of arbitrary power in the individual nodes of the graph. Our approach is adaptable and opens possibilities for many extensions; we point out a few in our conclusion section.

We would like to stress that theoretical insights like those presented in this paper have the potential to guide practical development. We prove principal limits of networks following the GNN framework, extending the current AC-GNNs. Our results enables the use of theoretical results regarding arithmetic circuit complexity for arguing about the complexity of GNNs. For example findings for a complexity hierarchy within the real valued  $FAC^0_{\mathbb{R}}$  will lead to corresponding statements about the power of GNNs. They may also be used to design GNN architectures with provable expressivity bounds.

**Organization.** In the next section, we introduce the relevant notions about GNNs and arithmetic circuits. Section 3 contains our results. In Subsection 3.1, we introduce Circuit-GNNs as a generalization of AC-GNNs. In Subsections 3.2 and 3.3, we show how to simulate C-GNNs by arithmetic circuits, and vice versa. In Section 4, we point out a number of questions for further research. Due to lack of space, some proofs marked with † are deferred to the appendix.

Related Work. The first model of GNNs was introduced by Scarselli et al. [2009]. Since then, their expressive power has been studied in numerous works and from various perspectives as discussed above. As mentioned already, Barceló et al. [2020] established a close connection between classifiers definable in guarded FOC<sub>2</sub> and computable by AC-GNNs. It is worth to point out that a connection very close to the aforementioned one can be obtained by connecting the results of Sato et al. [2019], that relate graph neural networks with more general models of distributed computation, to the results of Hella et al. [2015], that logically characterize expressivity of those models of distributed computation with graded modal logics (known to be equivalent with guarded FOC<sub>2</sub>). Grohe [2023] very recently extended these characterization results for GNNs, and furthermore connected GNNs to Boolean circuits. He considers an FO-fragment GFO+C (guarded fragment of FO plus a more general form of counting than used by Barceló et al.) and proves equivalence of this logic to GNNs for unary queries, that is for Boolean functions (functions with Boolean output). However, his overall result holds only in the non-uniform setting (that is, the sequence of GNNs for graphs of different sizes is non-uniform in the sense that there is not necessarily an algorithm that, given a graph size, computes that network from the sequence responsible for the given size), and the activation functions have to be "rpl approximable" (i.e., rational piecewise linear approximable, i.e., the real functions must in a certain way be a approximable by Boolean circuits or logic). This includes most of the commonly used activation functions. Since queries expressible in this logic are known to be  $TC^0$  computable, he obtains a  $TC^0$  upper bound for the computational power of GNNs. For the converse connection, an extension of GNNs is necessary, and it is shown that a unary query

computable by a  $TC^0$  circuit is computable by a GNN with so-called random initialization. Another line of work that focuses on real-valued computations includes [Geerts et al., 2022] where the logic MPLang operating directly on real numbers was introduced to readily express GNN computations. The main goal of that paper was to reason about GNNs using a logic, extended by real-number computations. However, the converse connection, from MPLang to GNNs, is highly unclear. Other work addressing the precise characterizations of expressivity issues of GNNs outside of the restriction to the Boolean setting with GNNs using formal logic include [Cucala et al., 2023, Pfluger et al., 2024]. To the best of our knowledge, there are no other works considering real-valued computations and circuits or arithmetic circuits. Another approach that leans more into comparing the expressivity when using different functions for aggregation in the GNN can be found in Rosenbluth et al. [2023], where they compared the expressivity of GNNs using sum as their aggregation with mean and max GNNs.

# 2 Preliminaries

Throughout this paper, the graphs we consider have an ordered set of vertices and are undirected unless otherwise specified. We use overlined letters to denote tuples, write  $|\overline{x}|$  to denote the length of  $\overline{x}$  (i.e., the number of elements of  $\overline{x}$ ), and use [n] to denote the first n nonzero natural numbers  $\{1,\ldots,n\}$ . For a  $k\in\mathbb{N}$  we denote all possible n-tuples for every n of  $\mathbb{R}^k$  with  $(\mathbb{R}^k)^*$ . The notation  $\{\!\{\}\!\}$  is used for multisets. A restriction of a function f to set f is written as  $f \upharpoonright S$ .

**Definition 2.1.** Let  $k \in \mathbb{N}$ , and let G = (V, E) be a graph with an ordered set of vertices V and a set of undirected edges E on V. Let  $g_V \colon V \to \mathbb{R}^k$  be a function which labels the vertices with attribute vectors. We then call  $\mathfrak{G} = (V, E, g_V)$  a *labeled graph of dimension* k and denote by Graph the set of all labeled graphs and by  $\mathfrak{Graph}_k$  the set of all labeled graphs of dimension k.

For a node  $v \in V$ , the *neighborhood of* v is defined as  $N_{\mathfrak{G}}(v) := \{w \in V \mid \{v, w\} \in E\}$ .

In order to compare standard graph neural networks and the circuit graph neural networks which we will introduce later on, we fix the notion of a graph neural network first. Generally, GNNs can classify either individual nodes or whole graphs. For the purpose of this paper, we will only introduce node classification as graph classification works analogously. We focus on aggregate combine graph neural networks that aggregate the information of every neighbor of a node and then combine this information with the information of the node itself.

An aggregation function (of dimension k) is a permutation-invariant function AGG:  $\mathbb{R}^k \times \cdots \times \mathbb{R}^k \to \mathbb{R}^k$ , a combine function (of dimension k) is a function COM:  $\mathbb{R}^k \times \mathbb{R}^k \to \mathbb{R}^k$  and a classification function (of dimension k) is a function CLS:  $\mathbb{R}^k \to \{0,1\}$  that classifies a real vector as either true or false. Finally, an activation function (of dimension k) is a componentwise function  $\sigma \colon \mathbb{R}^k \to \mathbb{R}^k$ .

**Definition 2.2** (AC-GNN, cf. [Barceló et al., 2020]). An L layer aggregate combine graph neural network (AC-GNN) is a tuple  $\mathcal{D}=(\{\mathrm{AGG}^{(i)}\}_{i=1}^L,\{\mathrm{COM}^{(i)}\}_{i=1}^L,\{\sigma^{(i)}\}_{i=1}^L,\mathrm{CLS}),$  where  $\{\mathrm{AGG}^{(i)}\}_{i=1}^L$  and  $\{\mathrm{COM}^{(i)}\}_{i=1}^L$  are sequences of aggregation and combine functions,  $\{\sigma^{(i)}\}_{i=1}^L$  is a sequence of activation functions and CLS is a classification function.

Given a labeled graph  $\mathfrak{G}=(V,E,g_V)$ , the AC-GNN model computes vectors  $\overline{x}_v^{(i)}$  for every  $v\in V$  in every layer  $1\leq i\leq L$  as follows:  $\overline{x}_v^{(0)}=g_V(v)$  is the initial feature vector of v, and for i>1

$$\overline{x}_v^{(i)} = \sigma^{(i)}\left(\mathrm{COM}^{(i)}\left(\overline{x}_v^{(i-1)}, \overline{y}\right)\right), \text{ where } \overline{y} = \mathrm{AGG}^{(i)}\left(\left\{\!\!\left\{\overline{x}_u^{(i-1)} \mid u \in N_{\mathfrak{G}}(v)\right\}\!\!\right\}\right).$$

The classification function CLS:  $\mathbb{R}^k \to \{0,1\}$  is applied to the resulting feature vectors  $\overline{x}_v^{(L)}$ .

In this paper we focus on the real-valued computation part of GNNs and discard the classification function. We consider the feature vectors  $\overline{x}_v^{(L)}$  after the computation of layer L as our output. While we could also integrate CLS into our model, for our concerns this is not needed.

Next we define arithmetic circuits as a model of computation for computing real functions. Since we will put them in the context of graph neural networks, which inherently operate on vectors of real numbers rather than individual reals, we will accordingly define arithmetic circuits relative to  $\mathbb{R}^k$  rather than relative to  $\mathbb{R}$ , as is done more commonly (cf. e.g. [Blum et al., 1997]).

**Definition 2.3.** Let  $k, n, m \in \mathbb{N}$ . An  $\mathbb{R}^k$ -arithmetic circuit with n inputs and m outputs is a simple directed acyclic graph of labeled nodes, also called *gates*, such that

- there are exactly n gates labeled *input*, which each have indegree 0,
- there are exactly m gates labeled output, which have indegree 1 and outdegree 0,
- there are gates labeled *constant*, which have indegree 0 and are labeled with a tuple  $c \in \mathbb{R}^k$ ,
- there are gates labeled  $projection_{i,j}$  for  $1 \le i, j \le k$ , which have in- and outdegree 1,
- there are gates labeled *addition* and *multiplication*.

Additionally, both the input and the output gates are ordered.

An  $\mathbb{R}^k$ -arithmetic circuit C with n inputs and m outputs computes the function  $f_C\colon (\mathbb{R}^k)^n\to (\mathbb{R}^k)^m$  as follows: Initially, the input to the circuit is placed in the input gates. Afterwards, in each step, each arithmetic gate whose predecessor gates all have a value, computes the respective function it is labeled with, using the values of its predecessors as inputs. By addition and multiplication we refer to the respective componentwise operations and the projection computes the function  $\operatorname{proj}_{i,j}\colon \mathbb{R}^k\to \mathbb{R}^k, \ (x_1,\ldots,x_i,\ldots x_k)\mapsto (0,\ldots,0,\underset{\operatorname{position} j}{x_i},0,\ldots,0).$  Analogously, each output

gate takes the value of its predecessor, once its predecessor has one. The output of  $f_C$  is then the tuple of values in the m output gates of C after the computation.

The depth of C (written depth(C)) is the length of the longest path from an input gate to an output gate in C and the size of C (written size(C)) is the number of gates in C. For a gate g in C, we will write depth(g) to denote the length of the longest path from an input gate to g in C.

Remark 2.4. In the context of the circuits over  $\mathbb{R}^k$  that we have just defined, a circuit over  $\mathbb{R}$  is just a circuit over  $\mathbb{R}^1$ . Note that for circuits over  $\mathbb{R}^1$ , projection gates are just identity gates and can therefore be omitted. It should also be noted that for any fixed  $k \in \mathbb{N}$ , circuits over  $\mathbb{R}^k$  and circuits over  $\mathbb{R}$  can easily simulate each other. This result is discussed more extensively in Section A.1 of the Appendix.

Since an arithmetic circuit itself can only compute a function with a fixed number of arguments, we extend this definition to families of arithmetic circuits in a natural way.

**Definition 2.5.** An  $\mathbb{R}^k$ -arithmetic circuit family  $\mathcal{C}$  is a sequence  $(C_n)_{n\in\mathbb{N}}$  of circuits, where each circuit  $C_n$  has exactly n input gates. Its *depth* and *size* are functions mapping natural numbers n to  $depth(C_n)$  and  $size(C_n)$ , respectively.

An arithmetic circuit family  $C = (C_n)_{n \in \mathbb{N}}$  computes the function  $f_C \colon (\mathbb{R}^k)^* \to (\mathbb{R}^k)^*$  defined as  $f_C(\overline{x}) \coloneqq f_{C_{|\overline{x}|}}(\overline{x})$ .

Remark 2.6. Since a circuit family is an infinite sequence of circuits, one would be hard pressed to consider such a family a (finite) algorithm. Such models of computation, where a different instance is needed for any different input length, is called *non-uniform*. In order to deem a circuit family  $(C_n)_{n\in\mathbb{N}}$  to be an algorithm, a frequent requirement is the existence of an algorithm which, when given n as an input, outputs the circuit  $C_n$ . Circuit families, for which such an algorithm exists are called *uniform* circuit families. For more details on circuit uniformity, see e.g. Vollmer [1999] and for more on uniformity with respect to real computation, see Blum et al. [1997].

**Definition 2.7.** For any  $k \in \mathbb{N}$  and any two functions  $s,d \colon \mathbb{N} \to \mathbb{N}$ , FSIZE-DEPTH $\mathbb{R}^k(s,d)$  is the class of all functions  $(\mathbb{R}^k)^* \to (\mathbb{R}^k)^*$  that are computable by arithmetic circuit families of size in  $\mathcal{O}(s)$  and depth in  $\mathcal{O}(d)$ . Let  $\mathcal{A}$  be a set of functions of the form  $f \colon \mathbb{R}^k \to \mathbb{R}^k$ . Then FSIZE-DEPTH $\mathbb{R}^k(s,d)[\mathcal{A}]$  is the class of functions computable by arithmetic circuits with the same constraints, but with additional gate types  $g_f$ , with indegree and outdegree 1 that compute f, for each  $f \in \mathcal{A}$ .

We call all classes of the form FSIZE-DEPTH<sub> $\mathbb{R}^k$ </sub> $(s,d)[\mathcal{A}]$  (and their subclasses) *circuit function classes*. As usual in circuit complexity, an  $\mathfrak{F}$ -circuit family, where  $\mathfrak{F}$  is a class of functions, will denote a circuit family that computes a function in  $\mathfrak{F}$ .

We restrict this work to versions of one specific circuit function class.

**Definition 2.8.**  $\operatorname{FAC}^0_{\mathbb{R}^k}$  is the class of all functions  $f\colon (\mathbb{R}^k)^* \to (\mathbb{R}^k)^*$  that can be computed by arithmetic circuit families  $(C_n)_{n\in\mathbb{N}}$  of constant depth and polynomial size, i.e.,  $\operatorname{FAC}^0_{\mathbb{R}^k} = \operatorname{FSIZE-DEPTH}_{\mathbb{R}^k}(n^{\mathcal{O}(1)}, 1)$ .

By Definition 2.7,  $FAC_{\mathbb{R}^k}^0[\mathcal{A}]$ -circuit families are  $FAC_{\mathbb{R}^k}^0$ -circuit families where the circuits are expanded by additional gate types for all functions in  $\mathcal{A}$ .

A node in a graph neural network aggregates the values of its neighbors in each step, irrespective of their order, and then combines them with its own previous value. In order for our circuits to mimic this behaviour, we impose a symmetry condition on them. We require the functions they compute to be *tail-symmetric*, i.e., to only be able to single out their first argument.

**Definition 2.9.** A function  $f: (\mathbb{R}^k)^* \to (\mathbb{R}^k)^*$  is *tail-symmetric*, if  $f(\overline{x}_1, \ldots, \overline{x}_n) = f(\overline{x}_1, \pi(\overline{x}_2, \ldots, \overline{x}_n))$  for all permutations  $\pi$ .

Remark 2.10. One way to construct a tail-symmetric function f is to take a binary function g and a fully symmetric function h of arbitrary arity and compose them:  $f(x_1,\ldots,x_n) \coloneqq g(x_1,h(x_2,\ldots,x_n))$ . Note that this is precisely the notion required for the aggregate-combine step in AC-GNNs as per Definition 2.2.

An arithmetic circuit C is *tail-symmetric* if  $f_C$  is tail-symmetric. In the sequel, we will consider families of tail-symmetric functions computed by some circuit family C. In this setting, the circuit family computes one unique function  $f_n$  for each arity  $n \in \mathbb{N}$ . We will then write  $f_C(\overline{x}_1, \{\!\{\overline{x}_2, \ldots, \overline{x}_n\}\!\})$  to denote  $f_n(\overline{x}_1, \ldots, \overline{x}_n)$ , when  $\{\!\{\overline{x}_2, \ldots, \overline{x}_n\}\!\}$  is a multiset of cardinality n-1.

**Definition 2.11.** Let  $\mathfrak{F}$  be a circuit function class. We denote by  $t\mathfrak{F}$  the class which contains all functions of  $\mathfrak{F}$  that are tail-symmetric.

# 3 Graph Neural Networks using Circuits

We now define the main computation model of interest for this paper, namely circuit graph neural networks (C-GNNs). C-GNNs work similarly to plain AC graph neural networks. Instead of using the concept of a two-step aggregate and combine computation via which the new feature vector of a node is computed, C-GNNs permit that this computation is done by an arithmetic circuit (with particular resource bounds). This allows us to classify GNNs based on the complexity of their internal computing functions.

## 3.1 Model of Computation

A *basis* of a C-GNN is a set of functions of a specific circuit function class along with a set of activation functions. The respective networks will be based on these functions.

**Definition 3.1.** Let S be a non-empty set of functions from  $(\mathbb{R}^k)^*$  to  $(\mathbb{R}^k)^*$  and let A be a non-empty set of activation functions of dimension k. Then we call the set  $S \times A$  a C-GNN-basis of dimension k.

C-GNNs of a particular basis essentially consist of a fixed depth and a function assigning circuit families and activation functions from its basis to its different layers.

**Definition 3.2.** Let  $B = \mathcal{S} \times \mathcal{A}$  be a C-GNN-basis of dimension k. A *circuit graph neural network* (of dimension k) of depth  $d \in \mathbb{N}$  is a function  $\mathcal{N} \colon [d] \to B$ . If all functions in  $\mathcal{S}$  belong to a circuit function class  $\mathfrak{F}$ , we also say that  $\mathcal{N}$  is a  $(\mathfrak{F}, \mathcal{A})$ -GNN.

**Definition 3.3.** Let  $\mathcal{N}$  be a C-GNN of depth d. The function  $f_{\mathcal{N}} \colon \mathfrak{Graph}_k \to \mathfrak{Graph}_k$  computed by  $\mathcal{N}$  is defined as follows:

Let  $\mathfrak{G} = (V, E, g_V)$  be a labeled graph. The labeled graph  $\mathfrak{G}'$  computed by  $f_{\mathcal{N}}(\mathfrak{G})$  has the same structure as  $\mathfrak{G}$ , however, its nodes have different feature vectors. That is  $f_{\mathcal{N}}(\mathfrak{G}) = \mathfrak{G}'$ , where  $\mathfrak{G}' = (V, E, h_V)$  and  $h_V$  is defined inductively as follows:

$$\begin{split} h_V^{(0)}(w) &\coloneqq g_V(w) \\ h_V^{(i)}(w) &\coloneqq \sigma^{(i)}\left(f_{\mathcal{C}^{(i)}}\left(h_V^{(i-1)}(w), M\right)\right), \text{ with } M = \left\{\!\!\left\{h_V^{(i-1)}(u) \mid u \in N_{\mathfrak{G}}(w)\right\}\!\!\right\} \end{split}$$
 where  $\mathcal{N}(i) = \left(\mathcal{C}^{(i)}, \sigma^{(i)}\right)$ . Finally  $h_V(w) \coloneqq h_V^{(d)}(w)$ .

The C-GNN model that we have just defined computes functions from labeled graphs to labeled graphs (with the same graph structure).

Our results are stated for C-GNNs using functions from versions of the function class  $FAC_{\mathbb{R}^k}^0$  as defined in Definitions 2.8 and 2.11.

Remark 3.4. We can relate our C-GNN model to the continuous computation of traditional AC-GNNs. For any AC-GNN following Definition 2.2 with activation functions  $\mathcal{A}$  and where the aggregation functions are computable by  $FAC_{\mathbb{R}^k}^0$ -circuit families, there exists a  $(FAC_{\mathbb{R}^k}^0, \mathcal{A})$ -GNN with a constant number of layers which computes the same function, omitting only the functionality of the classification function. This holds for the common aggregation functions like the sum, product or mean.

# 3.2 Simulating C-GNNs with arithmetic circuits

**Definition 3.5.** Let  $\mathfrak{G} = (V, E, g_V)$  be a labeled graph,  $n \coloneqq |V|$ , and  $M = \operatorname{adj}(\mathfrak{G})$  be the adjacency matrix of (V, E), where the columns are ordered in accordance with the ordering of V. We write  $\langle M \rangle$  to denote the encoding of M as the  $n^2$  matrix entries written using vectors  $(0, \ldots, 0)$  and  $(1, \ldots, 1)$  in  $\mathbb{R}^k$ , ordered in a row wise fashion. We denote them by  $\overline{m}_{ij}$  encoding the matrix entry  $m_{ij}$ . We write  $\langle \mathfrak{G} \rangle$  to denote the encoding of  $\mathfrak{G}$  as a tuple of real valued vectors, such that  $\langle \mathfrak{G} \rangle = (\langle M \rangle, \operatorname{vec}(\mathfrak{G})) \in (\mathbb{R}^k)^{n^2+n}$ , which consists of the encoding of M followed by  $\operatorname{vec}(\mathfrak{G})$ , the feature vectors of  $\mathfrak{G}$ . The feature vectors  $\operatorname{vec}(\mathfrak{G})$  are  $g_V(v)$  for all  $v \in V$  and ordered like V.

**Theorem 3.6**<sup>†</sup>. Let  $\mathcal{N}$  be a  $(tFAC^0_{\mathbb{R}^k}[\mathcal{A}], \{id\})$ -GNN. Then there exists an  $FAC^0_{\mathbb{R}^k}[\mathcal{A}]$ -circuit family  $\mathcal{C}$ , such that for all labeled graphs  $\mathfrak{G}$  the following holds:  $f_{\mathcal{N}}(\mathfrak{G}) = \mathfrak{G}'$  iff  $\mathcal{C}(\langle \mathfrak{G} \rangle) = \langle \mathfrak{G}' \rangle$ , where  $\langle \mathfrak{G} \rangle = (\langle adj(\mathfrak{G}) \rangle, \langle vec(\mathfrak{G}) \rangle)$ .

*Proof.* We essentially roll out the given  $(tFAC_{\mathbb{R}^k}^0[\mathcal{A}], \{id\})$ -GNN  $\mathcal{N}$  by using a circuit to simulate each individual layer of  $\mathcal{N}$  and concatenating these circuits to then simulate  $\mathcal{N}$ . Since the computation in each node of  $\mathcal{N}$  can be performed by a circuit of an  $FAC_{\mathbb{R}^k}^0[\mathcal{A}]$ -family, simulating this computation can be done without an issue. Similarly, connecting the simulated nodes and layers can be done relatively simply, and thus the resulting circuit family is an  $FAC_{\mathbb{R}^k}^0[\mathcal{A}]$ -family.  $\square$ 

Remark 3.7. Note that this proof preserves uniformity: if the sequence of C-GNNs is uniform, so is the circuit family. Furthermore, for any set of activation functions  $\mathcal{A}$ ,  $(tFAC_{\mathbb{R}^k}^0[\mathcal{A}], \{id\})$ -GNNs are at least as powerful as  $(tFAC_{\mathbb{R}^k}^0, \{id\} \cup \mathcal{A})$ -GNNs and  $(tFAC_{\mathbb{R}^k}^0, \{id\})$ -GNNs. The latter do not have access to the functions of  $\mathcal{A}$  at all and  $(tFAC_{\mathbb{R}^k}^0, \{id\} \cup \mathcal{A})$ -GNNs are more restricted in the use of the functions than  $(tFAC_{\mathbb{R}^k}^0[\mathcal{A}], \{id\})$ -GNNs.

## 3.3 Simulating arithmetic circuits with C-GNNs

In order to simulate  $FAC_{\mathbb{R}^k}^0$ -circuit families using C-GNNs, we utilize the following normal form.

**Definition 3.8.** A circuit C is in *path-length normal form* if every path from an input to an output gate has the same length and every non-input and non-output gate has exactly one successor.

**Lemma 3.9.** Let C be an  $FAC^0_{\mathbb{R}^k}$ -circuit family. Then for every circuit  $C \in C$  there exists a circuit C' of the same depth in path-length normal form such that  $f_{C'}(\overline{x}) = f_C(\overline{x})$ , for all  $\overline{x} \in (\mathbb{R}^k)^*$ .

*Proof.* We describe a procedure that transforms C to C'. If a gate h in C has more than one successor, the subgraph consisting of h and all paths from input gates to h are copied in C' for each successor of h. These copies are then used as the respective input for those successors which results in every gate in C' having only one successor. This is done iteratively, where in each step only gates in one depth layer of the circuit are modified, starting with depth 1, the gates closest to the input gates. Each step in this procedure incurs a polynomial overhead in size. Since C is of constant depth there are only constantly many iteration steps and thus C' remains polynomial in size.

Let  $\ell_{max} = depth(C) (= depth(C'))$  be the length of the longest path from an input to an output node in C. For every path of length  $\ell$  which is shorter than  $\ell_{max}$ , we add to C'  $\ell_{max} - \ell$  unary addition gates directly after the input. The depth of the resulting circuit C' equals the one of C, as all changes to path lengths are made with this upper bound in mind. Since all changes made preserve the computed function and the produced C' is in path-length normal form, the claim follows.  $\square$ 

Since C-GNNs get labeled graphs as their input, while arithmetic circuits typically work on vectors over  $\mathbb{R}^k$ , we show next how to represent a circuit as an input to a C-GNN.

**Definition 3.10.** If C is a circuit with input  $\overline{x}$  whose gates are additionally numbered with unique elements of  $\mathbb{R}$ , let  $\mathfrak{G}_{C,\overline{x}}$  be the labeled graph  $(G,g_V)$ , where G=(V,E) is the underlying graph of the circuit and the function  $g_V$  is defined as follows. The feature vector of a graph node corresponding to an input gate is the corresponding input value. Each other node gets its own number according to the gate numbering of the circuit as its feature vector.

The following theorems describe how C-GNNs are able to simulate arithmetic circuits. The proofs are given for  $\mathbb R$  circuits, but as mentioned in Remark 2.4 and proved in Corollary A.3 families of constant depth and polynomial size  $\mathbb R^k$  and  $\mathbb R$  circuits can simulate each other with constant increase/decrease in size, so they also hold for  $\mathbb R^k$  circuits.

**Theorem 3.11**<sup>†</sup>. Let  $C = (C_n)_{n \in \mathbb{N}}$  be an  $FAC_{\mathbb{R}}^0$ -circuit family. Then there exists a  $(tFAC_{\mathbb{R}}^0, \{id\})$ - $GNN \ \mathcal{N}$ , such that for all  $n \in \mathbb{N}$  and  $\overline{x} \in \mathbb{R}^n$ , the feature vectors of the nodes of  $f_{\mathcal{N}}(\mathfrak{G}_{C_n,\overline{x}})$  corresponding to output gates in  $C_n$  are exactly the components of  $f_{C_n}(\overline{x})$  in the same order. The number of layers in  $\mathcal{N}$  is equal to the depth of C.

*Proof.* The general idea is to use the graphs of the circuits from  $\mathcal{C}$  as input graphs for a C-GNN and then simulate the gates layerwise. The gates of  $\mathcal{C}$  are numbered and the numbering is represented in the feature vectors for each node. In each layer circuits of the C-GNN use a lookup table to decide which operation is applied to a feature vector (see Lemma A.6). An example illustrating the proof idea is given in Figure 1 and Table 1. For simplicity reasons an example omitting any special cases is chosen.



(a) Underlying circuit, to be simulated by the C-GNN (nr(\*) denotes the unique number of the gate).

(b) The labeled graph and initial feature vectors of the C-GNN  $(v_g^{(i)})$  denotes the feature vector of a gate at layer i).

Figure 1: Example illustrating the proof of Theorem  $3.11^{\dagger}$ .

Table 1: Example illustrating the proof of Theorem 3.11<sup>†</sup>: The values of the feature vectors during the computation of the C-GNN.

| LAYER | $\overline{v}_{\mathrm{in}_1}$ | $\overline{v}_{\mathrm{in}_2}$ | $\overline{v}_{\mathrm{in}_3}$ | $\overline{v}_+$ | $\overline{v}_{\times}$ |
|-------|--------------------------------|--------------------------------|--------------------------------|------------------|-------------------------|
| 1     | 6                              | 9                              | 5                              | 1                | 2                       |
| 2     | 6                              | 9                              | 5                              | 6 + 9 = 15       | 2                       |
| 3     | 6                              | 9                              | 5                              | 15               | $5 \times 15 = 75$      |

Since we have shown in Section 3.2 that C-GNNs can be simulated by  $FAC_{\mathbb{R}^k}^0[\mathcal{A}]$ -circuit families we now examine the converse direction: simulating these families with C-GNNs.

**Theorem 3.12**<sup>†</sup>. Let  $C = (C_n)_{n \in \mathbb{N}}$  be an  $FAC_{\mathbb{R}}^0[A]$ -circuit family. Then there exists a  $(tFAC_{\mathbb{R}}^0[A], \{id\})$ - $GNN \mathcal{N}$ , such that for all  $n \in \mathbb{N}$  and  $\overline{x} \in \mathbb{R}^n$ , the feature vectors of the nodes of  $f_{\mathcal{N}}(\mathfrak{G}_{C_n,\overline{x}})$  corresponding to output gates in  $C_n$  are exactly the components of  $f_{C_n}(\overline{x})$  in the same order. The number of layers in  $\mathcal{N}$  is equal to the depth of C.

*Proof.* The proof follows the same concept as the proof of Theorem 3.11<sup>†</sup>. With the exception that the internal circuits of the C-GNN now have access to the same set of functions A in the form of gates that compute them. Those are then used to simulate gates from  $\mathcal{C}$  that compute functions of  $\mathcal{A}$ .  $\square$ 

In the previous two theorems, we have shown that C-GNNs can simulate  $FAC^0_{\mathbb{R}}$ -circuit families as is and that they can simulate  $FAC_{\mathbb{R}}^{0}[A]$ -circuit families if they have arbitrary access to the functions in A. If we consider A to consist of activation functions, though, it would be more natural to only permit them as such in our C-GNN model. In the following we will introduce a structural restriction of circuits, which will allow us to capture a subset of  $FAC^0_{\mathbb{R}}[\mathcal{A}]$  using C-GNNs that have access to A only as activation functions.

**Definition 3.13.** A circuit C is said to be in function-layer form if it is in path-length normal form and for each depth  $d \leq depth(C)$  all gates of C at depth d have the same gate type.

**Definition 3.14.** Let  $\mathfrak{F}$  be a circuit function class. We denote by  $f\mathfrak{F}$  the class which contains all functions of \( \mathcal{F} \) that can be computed by circuit families where all circuits are in function-layer form.

We also need a small restriction on the functions we will permit as activation functions.

**Definition 3.15.** A computable function  $f: \mathbb{R}^k \to \mathbb{R}^k$  is *countably injective* if there is a countably infinite set  $S \subseteq \mathbb{R}^k$  such that  $f \upharpoonright_S$  is injective. We also say that f is countably injective on S.

Remark 3.16. The commonly used activation functions  $\operatorname{ReLU}(x) = \max(0, x)$ ,  $\sigma(x) = \frac{1}{1 + e^{-x}}$  and  $\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$  are countably injective.

**Theorem 3.17**<sup>†</sup>. Let  $\mathcal{A}$  be a set of countably injective activation functions and let  $\mathcal{C}=(C_n)_{n\in\mathbb{N}}$  be a fFAC $^0_{\mathbb{R}}[\mathcal{A}]$ -circuit family. Then there exists a (tFAC $^0_{\mathbb{R}},\mathcal{A}\cup\{\mathrm{id}\}$ )-GNN  $\mathcal{N}$ , such that for all  $n\in\mathbb{N}$  and  $\overline{x}\in\mathbb{R}^n$ , the feature vectors of the nodes of  $f_{\mathcal{N}}(\mathfrak{G}_{C_n,\overline{x}})$  corresponding to output gates in  $C_n$  are exactly the components of  $f_{C_n}(\overline{x})$  in the same order. The number of layers in  $\mathcal{N}$  is equal to the depth of C.



(a) Underlying circuit, to be simulated by the C-GNN (nr(\*) denotes the unique number of the gate). (b) The labeled graph and initial feature vectors of the C-GNN  $(v_g^{(i)})$  denotes the feature vector of a gate at layer i).

Figure 2: Example illustrating the proof of Theorem 3.17<sup>†</sup>. <sup>3</sup>

Table 2: Example illustrating the proof of Theorem 3.17†: The values of the feature vectors during the computation of the C-GNN.

| Layer | $\overline{v}_{\mathrm{in}_1}$ | $\overline{v}_{\mathrm{in}_2}$ | $\overline{v}_{\sigma}$ | $\overline{v}'_{\sigma}$ | $\overline{v}_+$             |
|-------|--------------------------------|--------------------------------|-------------------------|--------------------------|------------------------------|
| 1     | 6                              | 9                              | 1                       | 2                        | 3                            |
| 2     | $\sigma(6)$                    |                                | $\sigma(1)$             |                          | $\sigma(\sigma^{-1}(3)) = 3$ |
| 3     | $\sigma(6)$                    | $\sigma(9)$                    | $\sigma(1)$             | $\sigma(9)$              | $\sigma(6) + \sigma(9)$      |

*Proof.* Contrary to the proof of Theorem 3.12<sup>†</sup> the gates in C computing a function  $\sigma \in A$  are simulated in the C-GNN by using the respective function as an activation function applied to all nodes. To distinguish between nodes where this function should and should not be applied we assign  $\sigma^{-1}$  of the respective feature vectors of nodes where we do not want to change the value. This is done prior to the application of the activation function. Therefore the function needs to be injective on some countable set. An example illustrating the proof idea is given in Figure 2 and Table 2. As in the proof of Theorem 3.17<sup>†</sup> an example omitting any special cases is chosen.

## 4 Conclusion

In this paper we showed a correspondence between arithmetic circuits and a generalization of graph neural networks using circuits. However, some restrictions needed to be imposed on the particular circuits used in our constructions. In particular, the circuits used in our C-GNNs need to be tail symmetric, and in Theorem 3.17 $^{\dagger}$  the corresponding  $FAC_{\mathbb{R}}^{0}[\mathcal{A}]$ -circuit family needs to be in function-layer form as well. An interesting avenue for further research is the question, whether those restrictions can be made "on both sides in our simulations". This means, in particular, to ask if  $(tFAC_{\mathbb{R}^k}^0, \{id\} \cup \mathcal{A})$ -GNNs can be simulated by  $fFAC_{\mathbb{R}^k}^0[\mathcal{A}]$  families and, if  $(tfFAC_{\mathbb{R}^k}^0, \{id\} \cup \mathcal{A})$ -GNNs and  $fFAC_{\mathbb{R}^k}^0[\mathcal{A}]$  can simulate each other. Another direction of further work is to study whether by relating our C-GNNs to variants of so-called VVc-GNNs of Sato et al. [2019], we may drop the assumption of tail-symmetry.

We already mentioned the issue of uniformity. If the sequence of GNNs, one for each graph size, is uniform in the sense that there is an algorithm that, given the size of the graph as an input, outputs the GNN responsible for that size, then the sequence of arithmetic circuits will also be uniform, because our simulation proof explicitly shows how to construct the circuit; and vice versa. In future work, this should be made more precise. In the case of Boolean circuits, logtime-uniformity ( $U_{E*}$ -uniformity) has become the standard requirement [Vollmer, 1999]. What is the corresponding precise uniformity notion for GNNs?

In the introduction we mentioned the complexity of the problem of training neural networks. We studied the expressiveness (or computational power) of neural networks. The higher the expressiveness of a network is, the more complicated the training process will be; however, the question to decide if a trained network of given quality exists might become easier to decide. Can this be made precise? Is there a formal result connecting complexity of training and network expressiveness?

Further investigations are required to obtain practical implications of our theoretical results. Once we fix the architecture of our C-GNNs (i.e., a circuit function class for the GNN nodes), we obtain a specific circuit class that characterizes the computational power of the C-GNN model which can be then rigorously studied. The question of what can be computed by arithmetic circuits using activation functions is related to the question of which functions can be computed using other functions, or in other words, which functions are "more complex" than others. We are only aware of very little work in this area. Results include a PTIME upper bound (in the so-called BSS-model of computation) for the arithmetic circuit complexity class  $NC_{\mathbb{R}}$  that uses the sign function Cucker [1992].

As it currently stands, there does not seem to be meaningful experiments that could be run for our model of computation. The problem is that not much is known about the computational power of the circuit classes we utilize. Further research into functions computable by practically implementable arithmetic circuits of different sizes and depths would be required. This could e.g. be done by investigating the Boolean parts of different complexity classes defined by families of arithmetic circuits, i.e., the classes when restricted to Boolean inputs. This research would pinpoint properties that would be provably outside of the capabilities of particular GNN models, whose learnability could be then tested in practice. We believe that our results will motivate research for these circuit classes.

Merrill et al. [2022], Merrill and Sabharwal [2022] have studied transformer networks from a computational complexity perspective and obtained a  $TC^0$  upper bound, similar to the result for GNNs by Grohe [2023]. We think it is worthwhile to study the computational power of transformers from the point of view of computation over the reals, similar to what we have done in this paper for GNNs.

<sup>&</sup>lt;sup>3</sup>For readability purposes the one-dimensional vectors in the C-GNN are written as real numbers.

# Acknowledgements

The first and the fourth author were partially supported by the DFG grant VI 1045-1/1.

## References

- P. Barceló, E. V. Kostylev, M. Monet, J. Pérez, J. L. Reutter, and J. P. Silva. The Logical Expressiveness of Graph Neural Networks. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020, 2020. URL https://openreview.net/forum?id=r11Z7AEKvB.
- M. Benedikt, C. Lu, B. Motik, and T. Tan. Decidability of graph neural networks via logical characterizations. In K. Bringmann, M. Grohe, G. Puppis, and O. Svensson, editors, 51st International Colloquium on Automata, Languages, and Programming, ICALP 2024, July 8-12, 2024, Tallinn, Estonia, volume 297 of LIPIcs, pages 127:1–127:20. Schloss Dagstuhl Leibniz-Zentrum für Informatik, 2024. doi: 10.4230/LIPICS.ICALP.2024.127. URL https://doi.org/10.4230/LIPIcs.ICALP.2024.127.
- L. Blum, F. Cucker, M. Shub, and S. Smale. *Complexity and Real Computation*. Springer-Verlag, Berlin, Heidelberg, 1997. ISBN 0387982817.
- D. T. Cucala, B. C. Grau, B. Motik, and E. V. Kostylev. On the correspondence between monotonic max-sum gnns and datalog. In P. Marquis, T. C. Son, and G. Kern-Isberner, editors, *Proceedings of the 20th International Conference on Principles of Knowledge Representation and Reasoning, KR 2023, Rhodes, Greece, September 2-8, 2023*, pages 658–667, 2023. doi: 10.24963/KR.2023/64. URL https://doi.org/10.24963/kr.2023/64.
- F. Cucker.  $P_{\mathbb{R}} \neq NC_{\mathbb{R}}$ . *J. Complexity*, 8(3):230–238, 1992. URL https://doi.org/10.1016/0885-064X(92)90024-6.
- F. Geerts, J. Steegmans, and J. Van den Bussche. On the Expressive Power of Message-Passing Neural Networks as Global Feature Map Transformers. In I. Varzinczak, editor, Foundations of Information and Knowledge Systems 12th International Symposium, FoIKS 2022, Helsinki, Finland, June 20-23, 2022, Proceedings, volume 13388 of Lecture Notes in Computer Science, pages 20–34. Springer, 2022. doi: 10.1007/978-3-031-11321-5\2. URL https://doi.org/10.1007/978-3-031-11321-5\2.
- J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl. Neural message passing for quantum chemistry. In *International conference on machine learning*, pages 1263–1272. PMLR, 2017.
- M. Grohe. The Descriptive Complexity of Graph Neural Networks. In LICS, pages 1–14, 2023. doi: 10.1109/LICS56636.2023.10175735. URL https://doi.org/10.1109/LICS56636.2023.10175735.
- S. He and P. A. Papakonstantinou. Deep Neural Networks: The Missing Complexity Parameter. *Electron. Colloquium Comput. Complex.*, TR22-159, 2022. URL https://eccc.weizmann.ac.il/report/2022/159.
- L. Hella, M. Järvisalo, A. Kuusisto, J. Laurinharju, T. Lempiäinen, K. Luosto, J. Suomela, and J. Virtema. Weak models of distributed computing, with connections to modal logic. *Distributed Comput.*, 28(1):31–53, 2015. doi: 10.1007/S00446-013-0202-3. URL https://doi.org/10.1007/s00446-013-0202-3.
- W. Maass. Bounds for the Computational Power and Learning Complexity of Analog Neural Nets. SIAM J. Comput., 26(3):708–732, 1997. doi: 10.1137/S0097539793256041. URL https://doi.org/10.1137/S0097539793256041.
- W. Maass, G. Schnitger, and E. D. Sontag. On the Computational Power of Sigmoid versus Boolean Threshold Circuits. In 32nd Annual Symposium on Foundations of Computer Science, San Juan, Puerto Rico, 1-4 October 1991, pages 767–776. IEEE Computer Society, 1991. doi: 10.1109/SFCS.1991.185447. URL https://doi.org/10.1109/SFCS.1991.185447.

- W. Merrill. Transformers are Uniform Constant-Depth Threshold Circuits. https://www.youtube.com/watch?v=XF2UoA0oovg, 2022.
- W. Merrill and A. Sabharwal. Log-Precision Transformers are Constant-Depth Uniform Threshold Circuits. *CoRR*, abs/2207.00729, 2022. doi: 10.48550/ARXIV.2207.00729. URL https://doi.org/10.48550/arXiv.2207.00729.
- W. Merrill, A. Sabharwal, and N. A. Smith. Saturated Transformers are Constant-Depth Threshold Circuits. *Trans. Assoc. Comput. Linguistics*, 10:843–856, 2022. URL https://transacl.org/ojs/index.php/tacl/article/view/3465.
- C. Morris, M. Ritzert, M. Fey, W. L. Hamilton, J. E. Lenssen, G. Rattan, and M. Grohe. Weisfeiler and Leman Go Neural: Higher-Order Graph Neural Networks. In *The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, January 27 February 1, 2019*, pages 4602–4609. AAAI Press, 2019. doi: 10.1609/AAAI.V33I01.33014602. URL https://doi.org/10.1609/aaai.v33i01.33014602.
- C. Morris, M. Fey, and N. M. Kriege. The Power of the Weisfeiler-Leman Algorithm for Machine Learning with Graphs. In Z. Zhou, editor, *Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI 2021, Virtual Event / Montreal, Canada, 19-27 August 2021*, pages 4543–4550. ijcai.org, 2021. doi: 10.24963/IJCAI.2021/618. URL https://doi.org/10.24963/ijcai.2021/618.
- M. Pfluger, D. T. Cucala, and E. V. Kostylev. Recurrent graph neural networks and their connections to bisimulation and logic. In *Proceedings of the AAAI Conference on Artificial Intelligence*, volume 38, pages 14608–14616, 2024.
- E. Rosenbluth, J. Tönshoff, and M. Grohe. Some might say all you need is sum. *CoRR*, abs/2302.11603, 2023. doi: 10.48550/ARXIV.2302.11603. URL https://doi.org/10.48550/arXiv.2302.11603.
- R. Sato, M. Yamada, and H. Kashima. Approximation ratios of graph neural networks for combinatorial problems. In H. M. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. B. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pages 4083-4092, 2019. URL https://proceedings.neurips.cc/paper/2019/hash/635440afdfc39fe37995fed127d7df4f-Abstract.html.
- F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini. The graph neural network model. *IEEE Trans. Neural Networks*, 20(1):61–80, 2009. doi: 10.1109/TNN.2008.2005605. URL https://doi.org/10.1109/TNN.2008.2005605.
- H. T. Siegelmann and E. D. Sontag. On the Computational Power of Neural Nets. J. Comput. Syst. Sci., 50(1):132–150, 1995. doi: 10.1006/JCSS.1995.1013. URL https://doi.org/10.1006/jcss.1995.1013.
- H. Vollmer. *Introduction to Circuit Complexity A Uniform Approach*. Texts in Theoretical Computer Science. An EATCS Series. Springer, 1999. ISBN 978-3-540-64310-4. doi: 10.1007/978-3-662-03927-4. URL https://doi.org/10.1007/978-3-662-03927-4.
- K. Xu, W. Hu, J. Leskovec, and S. Jegelka. How Powerful are Graph Neural Networks? In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, 2019. URL https://openreview.net/forum?id=ryGs6iA5Km.

# A Appendix

The Appendix is organized as follows: Section A.1 extends the results on the correspondence of circuits over  $\mathbb{R}^k$  and  $\mathbb{R}$ . Section A.2 goes into the concepts of Section 3.1 and contains a proof for the remark concerning the connection between AC-GNNs and C-GNNs. The other Sections A.3 and A.4 contain the full proofs for the theorems of Sections 3.2 and 3.3 in the paper, as well as additional required results.

#### A.1 Proofs of Section 2

The following Lemma shows in depth that  $\mathbb{R}^k$  circuits can be simulated by  $\mathbb{R}$  circuits.

**Lemma A.1.** Let  $k, n, m \in \mathbb{N}$  be arbitrary. For each arithmetic circuit C over  $\mathbb{R}^k$  with n input and m output gates, there exists an arithmetic circuit C' over  $\mathbb{R}$  with  $n \cdot k$  input and  $m \cdot k$  output gates, such that  $\operatorname{size}(C') \leq k \cdot \operatorname{size}(C)$ ,  $\operatorname{depth}(C') \leq \operatorname{depth}(C)$  and for all  $\overline{x}, \overline{y} \in \mathbb{R}^k$ 

$$f_C((x_{11},\ldots,x_{1k}),\ldots,(x_{n1},\ldots,x_{nk}))=((y_{11},\ldots,y_{1k}),\ldots,(y_{m1},\ldots,y_{nk}))$$

if and only if

$$f_{C'}(x_{11},\ldots,x_{1k},\ldots,x_{n1},\ldots,x_{nk})=(y_{11},\ldots,y_{1k},\ldots,y_{m1},\ldots,y_{mk}).$$

*Proof.* Each componentwise operation is replaced by the respective individual real operations. Projections are realized by directly working with the projected value and disregarding the values corresponding to the other elements of the vectors and replacing them by constant 0 gates. This way, there are k arithmetic gates in C' for each arithmetic gate in C and the depth of C is not increased.  $\Box$ 

Similarly,  $\mathbb{R}$  circuits can be simulated by  $\mathbb{R}^k$  circuits.

**Lemma A.2.** Let  $k, n, m \in \mathbb{N}$  be arbitrary. For each arithmetic circuit C over  $\mathbb{R}$  with  $n \cdot k$  input and  $m \cdot k$  output gates, there exists an arithmetic circuit C' over  $\mathbb{R}^k$  with n input and m output gates, such that  $\operatorname{size}(C') \leq k \cdot (n+m) + m + \operatorname{size}(C)$ ,  $\operatorname{depth}(C') \leq \operatorname{depth}(C) + 3$  and for all  $\overline{x}, \overline{y} \in \mathbb{R}^k$ 

$$f_C(x_{11},\ldots,x_{1k},\ldots,x_{n1},\ldots,x_{nk})=(y_{11},\ldots,y_{1k},\ldots,y_{m1},\ldots,y_{mk})$$

if and only if

$$f_{C'}((x_{11},\ldots,x_{1k}),\ldots,(x_{n1},\ldots,x_{nk})) = ((y_{11},\ldots,y_{1k}),\ldots,(y_{m1},\ldots,y_{mk})).$$

*Proof.* First project all tuple elements to the first component of each tuple (i.e.  $(x_1,\ldots,x_k)\mapsto (x_1,0,\ldots,0),(x_2,0,\ldots,0),\ldots,(x_k,0,\ldots,0)$ ), then compute as in the  $\mathbb{R}^1$  circuit and finally project back when reaching the output gates. Projecting from the input gates requires k projection gates in C' for each input gate in C and projecting back to the output gates requires k projection gates and one addition gate, to put all projected values back into one vector. This results in a total overhead in size of  $k \cdot (n+m) + m$  and a total overhead in depth of 3.

**Corollary A.3.** For each arithmetic circuit family C over  $\mathbb{R}$  of polynomial size and constant depth, there exists an arithmetic circuit family C' over  $\mathbb{R}^k$  of polynomial size and constant depth such that for all  $\overline{x}, \overline{y} \in \mathbb{R}^k$ 

$$f_{\mathcal{C}}(x_{11},\ldots,x_{1k},\ldots,x_{n1},\ldots,x_{nk})=(y_{11},\ldots,y_{1k},\ldots,y_{m1},\ldots,y_{mk})$$

if and only if

$$f_{\mathcal{C}'}((x_{11},\ldots,x_{1k}),\ldots,(x_{n1},\ldots,x_{nk}))=((y_{11},\ldots,y_{1k}),\ldots,(y_{m1},\ldots,y_{nk})).$$

Respectively an analogous family over  $\mathbb{R}$  exists for each family over  $\mathbb{R}^k$ .

#### A.2 Proofs of Section 3.1

As mentioned in Remark 3.4 our considered model of C-GNNs relates to the classical AC-GNNs in the following way. The definition of an AC-GNN does not fix the complexity of its aggregate and combine functions, hence to be able to compare the two models we need to restrict both functions to be of a specific circuit function class. This restriction holds for most practically used functions like aggregating via the weighted sum and combining via summation of the previous feature vector and the aggregate, each multiplied by a parameter matrix. We now proceed to prove this.

Remark 3.4. We can relate our C-GNN model to the continuous computation of traditional AC-GNNs. For any AC-GNN following Definition 2.2 with activation functions  $\mathcal{A}$  where the aggregation functions are computable by  $FAC_{\mathbb{R}^k}^0$ -circuit families, there exists a  $(FAC_{\mathbb{R}^k}^0,\mathcal{A})$ -GNN with a constant number of layers which computes the same function, omitting only the functionality of the classification function. This holds for the common aggregation functions like the sum, product or mean.

*Proof.* Let  $\mathcal{D}=(\{\mathrm{AGG}^{(i)}\}_{i=1}^L,\{\mathrm{COM}^{(i)}\}_{i=1}^L,\{\sigma^{(i)}\}_{i=1}^L,\mathrm{CLS})$  be an arbitrary AC-GNN as defined in Definition 2.2. A  $(\mathrm{FAC}_{\mathbb{R}^k}^0,\mathcal{A})$ -GNN computing the same function can be realized by using  $\mathrm{FAC}_{\mathbb{R}^k}^0$ -circuit families which compute the functions in  $\{\mathrm{AGG}^{(i)}\}_{i=1}^L$  and  $\{\mathrm{COM}^{(i)}\}_{i=1}^L$ . To compute a permutation-invariant aggregation function a tail-symmetric circuit family  $\mathcal{C}_{\mathrm{AGG}}$  computing a function  $f_{\mathcal{C}_{\mathrm{AGG}}}=\mathrm{AGG}^{(i)}$  as described in Definition 2.9 is used. The computation of a combine function can also be realised by using  $\mathrm{FAC}_{\mathbb{R}^k}^0$ - circuit families computing a function  $f_{\mathcal{C}_{\mathrm{COMB}}}=\mathrm{COMB}^{(i)}$ . Afterwards the application of the activation function  $\sigma^{(i)}$  can be achieved by adding  $\sigma^{(i)}$  to the set  $\mathcal{A}$  and denoting it as the activation function for layer i which is applied to all feature vectors after the circuit computations of these layers are done. As said in the beginning we are not concerned with classification in this paper, so we do not aim to simulate the transition from the continuous to the discrete as performed in the classification function of AC-GNNs. □

#### A.3 Proofs of Section 3.2

The proof of Theorem  $3.6^{\dagger}$  requires us to be able to distinguish the feature vectors of the neighborhood of a given node. This can be achieved by essentially zeroing out all non-neighbors as follows.

**Lemma A.4.** For each  $k, n, i \in \mathbb{N}$  where  $1 \leq i \leq n$  there exists an  $\mathrm{FAC}^0_{\mathbb{R}^k}$ -circuit family  $\mathcal C$  such that for all  $\mathfrak G = (V, E, g_V) \in \mathfrak{Graph}_k$  with  $V = \{v_1, \dots, v_n\}$  it holds that

$$f_{\mathcal{C}}(\langle \mathfrak{G} \rangle) = \langle vec(\mathfrak{G})' \rangle$$

where  $\langle \mathfrak{G} \rangle = (\langle adj(\mathfrak{G}) \rangle, \langle vec(\mathfrak{G}) \rangle)$  and for all  $1 \leq j \leq n$ 

$$vec(\mathfrak{G})'_{j} = \begin{cases} vec(\mathfrak{G})_{j}, & if \{v_{i}, v_{j}\} \in E \\ \overline{0}, & otherwise \end{cases}$$

where  $vec(\mathfrak{G})'_i$  and  $vec(\mathfrak{G})_i$  are the respective jth vectors of  $vec(\mathfrak{G})'$  and  $vec(\mathfrak{G})$ .

*Proof.* We write  $v_i$  for the node v which has number i and  $\overline{v}_i$  for its feature vector. Since the vectors in  $\langle M \rangle$  with  $M = \operatorname{adj}(\mathfrak{G})$  are ordered and the vectors  $\overline{m}_{ij}$  for  $1 \leq j \leq |V|$  correspond to one row in the adjacency matrix, row  $\overline{m}_{ij}$  describes all edge relations of  $v_i$ . Given the ordering on  $\operatorname{vec}(\mathfrak{G})$  we know that the ith vector in  $\operatorname{vec}(\mathfrak{G})$  is the vector  $\overline{v}_i$ . We multiply each vector  $\overline{m}_{ij}$  with  $\overline{v}_j$ , resulting in

$$\overline{m}_{ij}\overline{v}_j = \begin{cases} \overline{v}_j, & \text{if } (v_i, v_j) \in E \\ \overline{0}, & \text{otherwise} \end{cases}.$$

We continue to work with these values. This sets all feature vectors of non-neighboring nodes of  $v_i$  to  $\overline{0}$ .

We now proceed to prove Theorem  $3.6^{\dagger}$ . A simplified illustration of the proof is given in Figure 3.

**Theorem 3.6**<sup>†</sup>. Let  $\mathcal{N}$  be a  $(tFAC^0_{\mathbb{R}^k}[\mathcal{A}], \{id\})$ -GNN. Then there exists an  $FAC^0_{\mathbb{R}^k}[\mathcal{A}]$ -circuit family  $\mathcal{C}$ , such that for all labeled graphs  $\mathfrak{G}$  the following holds:

$$f_{\mathcal{N}}(\mathfrak{G}) = \mathfrak{G}' \text{ iff } \mathcal{C}(\langle \mathfrak{G} \rangle) = \langle \mathfrak{G}' \rangle$$

where 
$$\langle \mathfrak{G} \rangle = (\langle adj(\mathfrak{G}) \rangle, \langle vec(\mathfrak{G}) \rangle).$$

*Proof.* We would like to mimic each of the (constantly many) C-GNN layers with a circuit of constant depth. To do this we need to retrieve the correct circuit from the circuit families of our C-GNN for each layer i and for each possible size of the neighborhood for a node  $v \in V$ . These values are bounded by the number of layers d of the C-GNN and the size n = |V| of the input labeled graph  $\mathfrak{G} = (V, E, g_V)$ .

Since  $\mathfrak{G}$  has n nodes, our encoding  $\langle \mathfrak{G} \rangle$  consists of  $n^2$  vectors for the encoding of the adjacency matrix M and the feature vectors of the n nodes. We construct a circuit family  $\mathcal{K}$  which consists of a series of circuits for any possible number of nodes n.

While constructing a circuit K from K we use  $\mathcal{C}^{(i)} = proj_1(\mathcal{N}(i))$  (see Definition 3.3) to retrieve the circuit family of layer i of the C-GNN. The first observation we make is that if a node v has k neighbors, we need to consider the k+1st circuit from the circuit family for the computation, as this circuit gets all of the feature vectors of the neighbors of v and the feature vector of v as input. We also observe that, since our circuits are tail-symmetric, the order of the neighbors' feature vectors does not matter. We can figure out how many neighbors v has by summing over v's row in the adjacency matrix of  $\mathfrak{G}$ .

The circuit K works for a node v in the following way. When given the initial feature vectors and the encoding of the adjacency matrix M of  $\langle \mathfrak{G} \rangle$  as input it first computes the size k of the neighborhood of v. It then sets the feature vectors of non-neighboring nodes of v to  $\overline{0}$  via the functionality described in Lemma A.4. Afterwards it orders the feature vectors of all nodes in  $\mathfrak{G}$  such that the initial order remains the same, but all  $\overline{0}$  vectors get pushed to the end.

We can achieve such an ordering by essentially using the idea of the counting sort algorithm. We start by zeroing the non-neighbors, and then do a pairwise comparison with all nodes in the following way: When we compare two nodes  $v_p$  and  $v_q$  where  $v_p$  is a neighbor of v and  $v_q$  is not, then  $v_p > v_q$  and if neither or both are neighbors of v, then  $v_p > v_q$  iff p > q. We can do these kinds of comparisons, since we know that the comparison only needs to work for all numbers  $\leq n-1$  in  $\mathbb{N}_0$  and we can thus create a polynomial that works like the sign function for all these values. We then count (i.e. sum up) the number of comparison victories for each node and use equality checks (again, for numbers  $\leq n-1$  in  $\mathbb{N}_0$ ) to make sure the right vector is put in the right place.

Afterwards all circuits  $C_j^{(i)}$  from the circuit family  $\mathcal{C}^{(i)}$  for  $j \in \{1,\dots,n\}$  are simulated in parallel. Since the circuits  $C_j^{(i)}$  are tail-symmetric, it does not matter that we changed the order of the vectors. Moreover since at most k+1 vectors are nonzero after we zeroed the non-neighbors, they are in the first k+1 vectors in our new ordering. Each of these simulations is followed by an equality check on j and k+1. The sum is then formed over these which ensures that only the results of the circuit simulation for the correct number of neighbors of v is considered further.

We repeat the described computations for all nodes  $v \in V$  for every layer  $0 \le i \le d$ , using the resulting feature vectors from the computations before as input to the following ones. Now we have the following:

- The circuit K has constant depth.
- The computations for one layer are done in parallel for every node.
- The number of layers is a constant given by the C-GNN.
- As  $\mathcal N$  is a  $(t{\rm FAC}^0_{\mathbb R^k}[\mathcal A],\{{\rm id}\})$ -GNN the simulation of its circuit families are also  $t{\rm FAC}^0_{\mathbb R^k}[\mathcal A]$ -circuit families.

• The family of subcircuit described in Lemma A.4 is also an  $FAC^0_{\mathbb{R}^k}$ -circuit family.

Therefore  $\mathcal{K}$  is an  $FAC_{\mathbb{R}^k}^0[\mathcal{A}]$ -circuit family.

Other types of C-GNNs, mainly  $(tFAC_{\mathbb{R}^k}^0, \{id\} \cup \mathcal{A})$ -GNNs and  $(tFAC_{\mathbb{R}^k}^0, \{id\})$ -GNNs are also introduced in this paper. Restricting Theorem 3.6<sup>†</sup> to  $(tFAC_{\mathbb{R}^k}^0[\mathcal{A}], \{id\})$ -GNNs is not a restraint on generality as the following corollary shows.

**Corollary A.5.** Let  $\mathcal{A}$  be a set of activation functions. Then  $FAC^0_{\mathbb{R}^k}[\mathcal{A}]$ -circuit families can simulate  $(tFAC^0_{\mathbb{R}^k}, \{id\} \cup \mathcal{A})$ -GNNs and  $(tFAC^0_{\mathbb{R}^k}, \{id\})$ -GNNs in the sense of Theorem 3.6<sup>†</sup>.

*Proof.* As  $(tFAC^0_{\mathbb{R}^k}[\mathcal{A}], \{id\})$ -GNNs have access to the functions in  $\mathcal{A}$  at any point during the computations in a layer they are less restricted than  $(tFAC^0_{\mathbb{R}^k}, \{id\} \cup \mathcal{A})$ -GNNs and  $(tFAC^0_{\mathbb{R}^k}, \{id\})$ -GNNs, since those can either only apply the functions from  $\mathcal{A}$  as activation functions to all nodes at



Figure 3: Proof of Theorem 3.6<sup>†</sup>, where  $\overline{v}_i$  are the feature vectors of the C-GNN,  $\overline{u}_j$  the feature vectors of the neighbors of a node.

the end of a layers computation or do not have access to them at all. This is especially relevant for functions which cannot be computed by  $tFAC_{\mathbb{R}^k}^0$ -circuit families.

## A.4 Proofs of Section 3.3

The complete proofs of Section 3.3 make use of the following lemma which allows us to distinguish individual vectors from a finite subset of  $\mathbb{R}^k$  while still following our desired complexity constraints.

**Lemma A.6.** Let  $A \subseteq \mathbb{R}^k$  be finite and let  $\overline{a} \in A$ . Then the function  $\chi_{A,\overline{a}} \colon A \to \{\overline{0},\overline{1}\}$  defined as

$$\chi_{A,\overline{a}}(\overline{x}) := \begin{cases} \overline{1}, & \text{if } \overline{x} = \overline{a} \\ \overline{0}, & \text{otherwise} \end{cases}$$

where  $\overline{1}$  is the k-dimensional vector of ones, respectively  $\overline{0}$  the k-dimensional vector of zeros, is computable by a  $\mathbb{R}^k$  circuit whose size only depends on |A| and whose depth is constant.

*Proof.* Let  $A = \{\overline{a}, \overline{a}_1, \dots, \overline{a}_\ell\} \subseteq \mathbb{R}^k$  be a finite set and let the univariate polynomial  $p_{A,\overline{a}} \colon A \to \mathbb{R}^k$  be defined as follows:

$$p_{A,\overline{a}}(\overline{x}) := \left(\prod_{1 \leq i \leq \ell} (\overline{a}_i - \overline{x})\right) \cdot \frac{\overline{1}}{\prod_{1 \leq i \leq \ell} (\overline{a}_i - \overline{a})},$$

where all operations are applied componentwise. Now, for every element  $e \in A \setminus \{\overline{a}\}$ ,  $p_{A,\overline{a}}(e)$  yields  $\overline{0}$ , since one factor in the left multiplication will be  $\overline{0}$ . On the other hand,  $p_{A,\overline{a}}(\overline{a})$  evaluates to  $\overline{1}$ , since the left product yields exactly  $\prod_{1 \leq i \leq \ell} (\overline{a_i} - \overline{a})$ , which is what it gets divided by on the right. Thus,  $p_{A,\overline{a}} = \chi_{A,\overline{a}}$ , and since for any given A and  $\overline{a} \in A$ ,  $p_{A,\overline{a}}$  is a polynomial of constant degree, it can be evaluated by a circuit of constant depth and polynomial size in |A|.

**Definition A.7.** Let C be a circuit of an  $FAC^0_{\mathbb{R}^k}$ -circuit family with n inputs, m outputs and numbered gates via an injective function  $nr\colon V_C\to\mathbb{R}$  such that for all gates g,g' of  $C_n$ :

 $depth(g) < depth(g') \implies nr(g) < nr(g')$ . Let C also be in path-length-normal form and let  $\overline{x} \in (\mathbb{R}^k)^n$  be a tuple of vectors, the input of the circuit.

Then  $\mathfrak{G}_{C,\overline{x}}$  is the labeled graph  $(G_C,g_{V_C})$ , where  $G_C=(V_C,E_C)$  is the underlying graph of the circuit and the function  $g_{V_C}$  is defined as follows: The feature vector of a graph node corresponding to an input gate is the corresponding input value. Each other node gets its own number according to the gate numbering of the circuit as its feature vector.

We now proceed to prove the theorems of Section 3.3 in depth.

As mentioned in the main paper our proofs of this section are formulated using  $\mathbb R$  circuits. This technically results in our C-GNNs having feature vectors of dimension 1, i.e. vectors in  $\mathbb R^1$ . For simplicity reasons and because we also stated that  $\mathbb R$  circuits and  $\mathbb R^1$  circuits are equivalent we use real numbers, still called feature vectors, instead.

**Theorem 3.11**<sup>†</sup>. Let  $C = (C_n)_{n \in \mathbb{N}}$  be an  $FAC_{\mathbb{R}}^0$ -circuit family. Then there exists a  $(tFAC_{\mathbb{R}}^0, \{id\})$ - $GNN \ \mathcal{N}$ , such that for all  $n \in \mathbb{N}$  and  $\overline{x} \in \mathbb{R}^n$ , the feature vectors of the nodes of  $f_{\mathcal{N}}(\mathfrak{G}_{C_n,\overline{x}})$  corresponding to output gates in  $C_n$  are exactly the components of  $f_{C_n}(\overline{x})$ . The number of layers in  $\mathcal{N}$  is equal to the depth of C.

*Proof.* Without loss of generality let all circuits of  $\mathcal{C}$  be in path-length normal form and let the gates in the circuits  $C_n$  of  $\mathcal{C}$  be numbered via an injective function  $nr \colon V_{C_n} \to \mathbb{R}$  satisfying the conditions of Definition A.7.

We then define  $\mathcal N$  as follows: Let d be the depth of the circuits in  $\mathcal C$ .  $\mathcal N$  has d layers and maps each number in [d] to a circuit family  $\mathcal C_{\mathcal N}^{(i)}$ , i.e. it assigns a circuit family to every layer i. Since each node  $v_g$  in the given labeled graph  $\mathfrak G_{C_n,\overline x}$  represents a gate g in the underlying circuit  $C_n$ , each of the circuit families  $\mathcal C_{\mathcal N}^{(i)}$  of the C-GNN essentially make use of local neighborhood information in the graph of  $C_n$ . The circuit family  $\mathcal C_{\mathcal N}^{(i)}$  uses the gate numbers in the gate numbering of  $C_n$ , the depth and gate type of the respective gates and the number of the unique successor of each gate.

Let  $v_g^{(i)}$  denote the feature vector of node  $v_g$  in layer i. As per the definition of  $\mathfrak{G}_{C_n,\overline{x}}$ , the feature vector of each node in  $\mathfrak{G}_{C_n,\overline{x}}$  is initialized with the gate number of the gate in  $C_n$  it represents, and the nodes representing input gates are initialized with the respective input values of  $\overline{x}$ . In each layer i, the respective circuit of  $\mathcal{C}_N^{(i)}$  determines whether to perform a computation or just keep the feature vector as is. The circuit gets the previous feature vector  $v_g^{(i-1)}$  of the node as one of its inputs. It then checks whether  $v_g^{(i-1)}$  is among the gate numbers of gates with depth i in  $C_n$ . This is done in the following way: Let  $Y^{(i)}$  be the set of all gate numbers for gates in  $C_n$  that have depth i. Now  $\mathcal{C}_N^{(i)}$  simply computes  $\chi_{Y^{(i)},y}\left(v_g^{(i-1)}\right)$  for each  $y\in Y^{(i)}$  (see Lemma A.6) to check if  $v_g^{(i-1)}$  is the number of a gate at depth i. If that is not the case, then the gate g with nr(g)=g is not at depth i and should not be evaluated in layer i and  $\mathcal{C}_N^{(i)}$  keeps its feature vector as is, i.e., outputs the one it was given as its input. Otherwise,  $\mathcal{C}_N^{(i)}$  computes the respective function of g in the following way:  $\mathcal{C}_N^{(i)}$  computes  $\chi_{X_{op},y}\left(v_g^{(i-1)}\right)$  for  $op\in\{+,\times,out\}$ , where  $X_{op}$  contains the gate numbers of op-gates in  $C_n$ , to determine which function it needs to evaluate. If op=+, then  $\mathcal{C}_N^{(i)}$  computes the sum of all the feature vectors in the neighborhood of  $v_g$ .

Since the neighborhood of each node in  $\mathfrak{G}_{C_n,\overline{x}}$  contains not only the predecessors of the node, but also the successor of the node, this value still needs to be removed from the aggregation. This is achieved by figuring out which node is the successor of  $v_g$  using the gate number of the gate corresponding to  $v_g$ . We extract this information from the circuit  $C_n$  when constructing the C-GNN circuits and use it in the manner of a lookup table. For every  $y \in Y^{(i)}$  we check whether  $v_g^{(i-1)} = y$ , using the  $\chi$  function from Lemma A.6, and if true subtract the gate number we saved for y, which is nr(w) where w is the successor of  $v_g$ .  $\mathcal{C}_{\mathcal{N}}^{(i)}$  proceeds analogously for nodes corresponding to  $\times$  gates. For nodes corresponding to output gates  $\mathcal{C}_{\mathcal{N}}^{(i)}$  computes the sum over the single neighbor. As all input circuits to the C-GNN are in path length normal form it is ensured that nodes corresponding to output gates are only evaluated in the last layer of the C-GNN. The complete functionality of  $\mathcal{C}_{\mathcal{N}}^{(i)}$  for each layer i is described by Algorithm 1. By Lemma A.6 we know that for any finite set A and

any  $a \in A$ ,  $\chi_{A,a}$  can be evaluated with a circuit whose size only depends on |A| and whose depth is constant. Since we only need to evaluate polynomially many such functions and we can compute them in parallel, the circuit family we outline here is still an FAC $^0_{\mathbb{R}}$ -circuit family.

For example a node corresponding to a + gate with number 5 at depth 2, which knows that the gate its successor has number 9, will sum up all of its neighbors values and subtract 9 in the second layer of the C-GNN. See Figure 4 for visualisation. The only activation function used in each C-GNN layer is the identity function.

## **Algorithm 1** Proof Theorem 3.11<sup>†</sup>: algorithm for the circuit family in layer *i* of the C-GNN

```
Input: v_g^{(i-1)}, U_g = \{u^{(i-1)} \mid u \in \mathcal{N}_G(v_g)\} Y^{(i)} = \{nr(g) \mid depth(g) = i\} {The set of numbers of gates that need to be evaluated in this layer} X_+ = \{nr(g) \mid g \text{ is } + \text{gate}\} X_\times = \{nr(g) \mid g \text{ is } \times \text{ gate}\} X_{out} = \{nr(g) \mid g \text{ is out gate}\} if v_g^{(i-1)} \in Y^{(i)} then {If a node needs to be evaluated in this layer} if v_g^{(i-1)} \in X_+ then {If v_g corresponds to an addition gate} v_g^{(i)} = \sum_j U_g - nr\left(suc\left(v_g\right)\right) else if v_g^{(i)} = \prod_j U_g/nr\left(suc\left(v_g\right)\right) else {If v_g corresponds to an output gate} v_g^{(i)} = \sum_j U_g end if else {If v_g does not correspond to a gate at depth i} v_g^{(i)} = v_g^{(i-1)} end if
```

**Theorem 3.12**<sup>†</sup>. Let  $C = (C_n)_{n \in \mathbb{N}}$  be an  $FAC_{\mathbb{R}}^0[A]$ -circuit family. Then there exists a  $(tFAC_{\mathbb{R}}^0[A], \{id\})$ -GNN  $\mathcal{N}$ , such that for all  $n \in \mathbb{N}$  and  $\overline{x} \in \mathbb{R}^n$ , the feature vectors of the nodes of  $f_{\mathcal{N}}(\mathfrak{G}_{C_n,\overline{x}})$  corresponding to output gates in  $C_n$  are exactly the components of  $f_{C_n}(\overline{x})$ . The number of layers in  $\mathcal{N}$  is equal to the depth of C.

*Proof.* This proof follows the same ideas as the proof of Theorem 3.11<sup>†</sup> but includes the occurrence of gates computing functions besides + and × in the circuit family  $\mathcal{C}$  as well as in the circuit families  $\mathcal{C}_N^{(i)}$  used in the C-GNN.

Without loss of generality let all circuits of  $\mathcal{C}$  be in path-length normal form and let the gates in the circuits  $C_n$  of  $\mathcal{C}$  be numbered via an injective function as described in Definition A.7. We will use  $nr(C_n)$  to denote the set of all gate numbers in the circuit  $C_n$ .

We now define  $\mathcal N$  as follows: For every circuit in all circuit families  $\mathcal C_{\mathcal N}^{(i)}$ , we initialize all nodes with their respective gate number or input value as described in the proof of Theorem 3.11<sup>†</sup>. The sets  $Y^{(i)}$  for all layers i of the  $\mathcal C$ -GNN and the sets  $X_+$ ,  $X_\times$  and  $X_{out}$  are defined the same as well. Additionally, we now need sets  $X_\sigma$  for every function  $\sigma \in \mathcal A$ , such that  $X_\sigma$  contains all gate numbers of gates computing  $\sigma$  in  $C_n$ . All these sets are finite, since they are all subsets of  $nr(C_n)$  which is the set of gate numbers of  $C_n$ .

The idea is to handle the nodes corresponding to  $\sigma$ -gates in the same way as the other nodes – for every node  $v_g$  we check if it is to be evaluated in this layer, i.e. the depth of gate g in  $C_n$  equals the current layer i, by checking whether  $v_g^{(i-1)} \in Y^{(i)}$  and if so, assess of which type g is by checking  $v_g^{(i-1)}$  for inclusion in a set  $X_{op}$  where  $op \in \{+, \times, out\} \cup \mathcal{A}$ . We do this as in the proof of Theorem 3.11 $^\dagger$  by computing  $\chi_{X_{op},x}\left(v_g^{(i-1)}\right)$  for all  $X_{op}$  and all  $x \in X_{op}$ .

<sup>&</sup>lt;sup>4</sup>For readability purposes the one-dimensional vectors in the C-GNN are written as real numbers.



Figure 4: Example in the proof of Theorem 3.11<sup>†4</sup>

If we determined that a gate g corresponding to the node  $v_g$  computes a function  $\sigma \in \mathcal{A}$ , we compute the function on the feature vector of the node corresponding to its predecessor in  $C_n$ . Since all  $\sigma \in \mathcal{A}$  are unary functions, g has only one predecessor and given that  $C_n$  is in path-length normal form, it also has only one successor. Let u be the predecessor and w be the successor of  $v_g$ . We compute  $v_g^{(i)} = \sigma\left(u^{(i-1)}\right)$  by summing up the feature vectors of all neighbors of  $v_g$  in  $\mathfrak{G}_{C_n,\overline{x}}$  and then determining the gate number of the gate corresponding to w and subtract this value from the sum as described in the proof of Theorem  $3.11^{\dagger}$ . This leaves us with the feature vector of the predecessor of  $v_g$ , since  $u^{(i-1)} + w^{(i-1)} - w^{(i-1)} = u^{(i-1)}$ . Then  $\sigma$  is applied to  $u^{(i-1)}$  using an  $\sigma$ -gate and  $v_g^{(i)}$  is set to this value.

For all nodes  $v_h$  which do not represent gates which have depth i in  $C_n$ , the new feature vector  $v_h^{(i)}$  equals  $v_h^{(i-1)}$ .

All layers of N use the identity as their activation function.

**Theorem 3.17**<sup>†</sup>. Let  $\mathcal{A}$  be a set of countably injective activation functions and let  $\mathcal{C} = (C_n)_{n \in \mathbb{N}}$  be an  $fFAC_{\mathbb{R}}^0[\mathcal{A}]$ -circuit family. Then there exists a  $(tFAC_{\mathbb{R}}^0, \mathcal{A} \cup \{id\})$ -GNN  $\mathcal{N}$ , such that for all  $n \in \mathbb{N}$  and  $\overline{x} \in (\mathbb{R})^n$ , the feature vectors of the nodes of  $f_{\mathcal{N}}(\mathfrak{G}_{C_n},\overline{x})$  corresponding to output gates in  $C_n$  are exactly the components of  $f_{C_n}(\overline{x})$ . The number of layers in  $\mathcal{N}$  is equal to the depth of  $\mathcal{C}$ .

*Proof.* We again reuse the basic concept of the proof of Theorem 3.11<sup>†</sup> but adapt it for function gates. Unlike in Theorem 3.12<sup>†</sup>, the circuit families of our  $(tFAC_{\mathbb{R}}^0, \mathcal{A} \cup \{id\})$ -GNN do not have arbitrary access to functions in  $\mathcal{A}$  and thus cannot decide for each individual node, whether to apply an activation or not. We can only apply it to all of our nodes' feature vectors, or to none of them.

Let  $\sigma \in \mathcal{A}$  be countably injective on a set  $S \subseteq \mathbb{R}$  and let the gates in  $C_n$  be numbered as described in Definition A.7 by values from S. Let  $i \in \mathbb{N}$  be such that all gates at depth i in  $C_n$  are  $\sigma$ -gates. Since  $C_n$  is in function-layer form, if any gate at depth i is a  $\sigma$ -gate, then all gates at depth i are. The circuit family  $\mathcal{C}_{\mathcal{N}}^{(i)}$  used for each node in layer i in  $\mathcal{N}$  then first checks whether its previous feature vector  $v_g^{(i-1)}$  corresponds to a gate number of a gate g in layer i by querying  $v_g^{(i-1)} \in Y^{(i)}$  using  $\chi_{Y^{(i)},y}\left(v_g^{(i-1)}\right)$ , where  $Y^{(i)}$  is the set of all gate numbers for gates in  $C_n$  that have depth i, as introduced before.

If  $v_g^{(i-1)} \in Y^{(i)}$ , then  $\mathcal{C}_{\mathcal{N}}^{(i)}$  proceeds as in the case for addition gates in Algorithm 1, summing over its neighbors and removing the value of its successor. This way, after the activation function is applied,  $v_g^{(i)} = \sigma\left(v_g^{(i-1)}\right)$ , the intended value after layer i in  $\mathcal{N}$ . If  $v_g^{(i-1)} \notin Y^{(i)}$ , we need to make sure that

**Algorithm 2** Proof Theorem  $3.17^{\dagger}$ : algorithm for the circuit in layer i of the C-GNN if layer i evaluates function gates

```
Input: v_g^{(i-1)}, U_g = \{u^{(i-1)} \mid u \in \mathcal{N}_G(v_g)\} Y^{(i)} = \{nr(g) \mid depth(g) = i\} {The set of numbers of gates that need to be evaluated in this layer} X_+ = \{nr(g) \mid g \text{ is } \times \text{ gate}\} X_- = \{nr(g) \mid g \text{ is } \times \text{ gate}\} X_{out} = \{nr(g) \mid g \text{ is out gate}\} if v_g^{(i-1)} \in Y^{(i)} then {If a node needs to be evaluated in this layer} if v_g^{(i-1)} \in X_+ then {If v_g corresponds to addition gate} v_g^{(i)} = \sum U_g - nr(suc(v_g)) else if v_g^{(i-1)} \in \sigma(X_+) then {If v_g corresponds to multiplication gate} v_g^{(i)} = \prod U_g/nr(suc(v_g)) else {If v_g corresponds to output gate} v_g^{(i)} = \sum U_g end if else {If v_g does not correspond to a gate at depth i} v_g^{(i)} = \sigma^{-1}\left(v_g^{(i-1)}\right) end if
```

after the application of  $\sigma$  as an activation function,  $v_g^{(i)}$  is still the gate number of its corresponding gate g in  $C_n$ . For this purpose,  $\mathcal{C}_{\mathcal{N}}^{(i)}$  determines and outputs  $\sigma^{-1}\left(v_g^{(i-1)}\right)$ , so that after applying the activation function,  $v_g^{(i)} = \sigma\left(\sigma^{-1}\left(v_g^{(i-1)}\right)\right) = v_g^{(i-1)}$ . The inverse  $\sigma^{-1}\left(v_g^{(i-1)}\right)$  exists, because  $\sigma$  is countably injective in the set of gate numbers of  $C_n$ . The circuit family  $\mathcal{C}_{\mathcal{N}}^{(i)}$  sets its output to  $\sigma^{-1}\left(v_g^{(i-1)}\right)$  by computing  $\sum_{v\in nr(C_n)}\sigma^{-1}(v)\cdot\chi_{S,v}\left(v_g^{(i-1)}\right)$ , where  $nr(C_n)$  is the set of gate numbers of  $C_n$ . This works for all gates at depths >i. Gates at depths < i do not have an influence on the output of the computation anymore, and what happens to them is thus of no importance.

The procedure performed by  $\mathcal{C}_{\mathcal{N}}^{(i)}$  is outlined in Algorithm 2.

The preceding procedure requires that all functions in  $\mathcal{A}$  are countably injective in the same set. If they are not countably injective in the same set, we modify the construction by using a new gate numbering of  $C_n$  for each function layer, assigning new values to the feature vectors before the activation function is used. All sets of numbers of specific gates, e.g. numbers of addition gates are updated accordingly.