Introduction

Time-varying networks are ubiquitous. Examples are found in the social, cognitive, technological and ecological domains as well as in many others1. The temporal nature of such systems has a deep influence on dynamical processes occurring on top of them2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21. Indeed, the spreading of sexual transmitted diseases, the diffusion of topics over social networks and the propagation of ideas in scientific environments are affected by duration, sequence and concurrency of contacts2,4,17,18,19,22,23. In all these cases the timescale characterizing the evolution of the network is comparable with the timescale ruling the unfolding of the process and they cannot be decoupled. However, empirical datasets are often reduced to a series of static networks by introducing a time-integrating window, Δt1,24,25,26,27. This is the case, for instance, of face-to-face interaction networks28, for which the fine-grained temporal resolution of (e.g.) phone call networks is not available, or of infants' semantic networks29, whose evolution can be studied only through the analysis of few snapshots30. In other instances, a time window is introduced to reduce the amount of stored information, or to simplify the application of mathematical frameworks developed for static or annealed systems. This is the case, for example, of online social networks where, although usually the original information has time resolutions down to the second, the available datasets are integrated over different windows of hours, days, months, or even years. Thus, the introduction of an integrating window is either intrinsic to the system under study or dictated by practical reasons.

In this work we address the impact of an arbitrary Δt on the description of a discrete dynamical process taking place upon a time-varying network. Despite recent results showing that the presence of any level of temporal aggregation may affect the correct characterization of dynamical processes evolving on top of such datasets2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21, an analytical formalization, characterization and understanding of these effects for a general Δt is still missing.

In particular, we focus on the prototypical random walk process evolving on time-varying networks integrated over a general time window Δt. First, we clarify the relevance of the integrating window issue by studying the behavior of random walk processes on real time-varying networks as a function of Δt. Then, we introduce a mathematical framework that well describes the observed behavior on synthetic activity driven networks17 as well as on two different real datasets.

Results

We aim to understand how Δt affects the behavior of dynamical processes taking place on time-varying networks. To this end, we consider the fundamental random walk (RW) process on two different real time-varying networks in which the links have been integrated over different integrating windows Δt (see Fig. 1). Typically, the RW asymptotic occupation probability ρ (see Methods for the formal definition) is computed grouping the nodes according to their the degree k31,32,33. The quantity ρk is then defined as the average asymptotic occupation probability of a node in the degree class k31,32,33. However, in time-varying networks the degree of a node is not univocally defined and, more importantly, is a function of Δt. For example, the degree might be the number of connections integrated over the time window, or the average number of connections across the Tt static frames (where T is the total time span of the data). Thus, the same node could contribute to different degree classes depending on the value of Δt. We, therefore, focus on a different node measure that has been shown to be mostly invariant to Δt, namely the activity rate a of a node17. The activity rate a is defined as the average rate at which each node interacts with others during the observation period [0, T] and can be interpreted as the intrinsic attitude of each node to engage in interactions with other nodes. We aim to calculate the occupation probability as a function of a.

Figure 1
figure 1

Example of time integration on time-varying networks.

The random walker is located on the colored node and can travel on the links depicted as continuous line, while Δt defines the integration window. Dashed lines represent links that are present in the system, but are out of reach for the walker.

In our simulations we consider two real time-varying networks and investigate the RW occupation probability function of activity rate a and the integrating window Δt: ρat). The first dataset is the co-authorship network of the Physical Review Letters (PRL) journal from 1980 to 200634. The second dataset is the Yahoo! music dataset with ~4.6 × 105 songs rated by ~2 × 104 Yahoo! users over six months35. We run the RW process over these two time-varying networks for different values of Δt and record the occupation probability over multiple runs (see SI for details). Fig. 2 shows the empirical values of ρat) (solid points) observed in the PRL dataset for four distinct values of Δt = {1, 10, 60, 182} days. Error bars represent the the standard deviation obtained from distinct simulation runs starting at times t0 {0, 1, …, Δt − 1} from the beginning of the dataset.

Figure 2
figure 2

Occupation probability ρa of a RW at the end of the simulation as a function of node activity.

The points are the values of ρa of a RW over the Physics Review Letters time-varying co-authorship network from 1980 to 2006 for different integrating windows Δt {1, 10, 60, 182} days. The error bars are evaluated starting the process at different days from the beginning of the dataset.

The effect of Δt is dramatic. Over large values of Δt the RW behaves roughly as could be expected. The share of random walkers increases with the node activity, i.e., highly active nodes are collect more walkers at the end of the simulation than nodes with low activity. However, as Δt decreases, more active nodes lose their power to attract walkers and the occupation probability becomes more uniform. A similar scenario is observed over the Yahoo! dataset over four values of Δt, namely one second, one hour, six hours and one day (points in Fig. 3). In the next section we will see that the reason for this behavior rests solely in the probability that the RW sees no edges when it decides to move, which turns out to be a function of three factors: Δt, the activity of node the walker resides and the average node activity in the system.

Figure 3
figure 3

Occupation probability ρa of a RW at the end of the simulation as a function of node activity.

Points represent the ρa values of a RW over the time-varying graph of Yahoo! song ratings for different integrating windows Δt of one second, one hour, six hours and one day. The standard deviations are too small to be shown in the plots.

Mathematical formulation

Let us consider a random walker diffusing at discrete time steps Δt over a time-varying network characterized by N nodes. Starting at node V (t) at step t, the walker takes step t + 1 at time (t + 1) Δt diffusing over a network Gtt), where Gtt) is the result of the union of all the edges generated in the interval [tΔt, (t + 1) Δt). We focus on the general case of an arbitrary time aggregation window Δt > 0.

We consider a simple class of time-varying networks called activity driven networks17. The crucial ingredients of these models are: dF(a), the fraction of nodes with activity rate a and m, the number of edges that are simultaneously created by a node (see Methods for further details). The activity rate determines the probability per unit time for a node to establish (m, simultaneously) edges to other nodes in the system. The value of parameter m is dictated by the specific system under consideration. The case m > 1 is appropriate to describe one-to-many interactions, found for example in such systems as Twitter and blog networks36,37. On the other hand, m = 1 describes two-party (dyadic) communications that are characteristic of phone-call and text-message networks38,39. At each step t = 0, 1, … an unweighted network Gtt) is generated as follows:

  1. a

    Gtt) starts with N disconnected nodes;

  2. b

    The the number of times a node with activity a is active during interval Δt, KΔt,a, is Poisson distributed

    Node generates mKΔt,a undirected edges connected to mKΔt,a randomly selected nodes (without replacement or self-loops). Inactive nodes in this observed period of Δt may receive connections from other active vertices;

  3. c

    At time (t + 1)Δt the process starts over from step a) to generate network Gt+1t).

Although activity driven networks are Markovian (memoryless) and lack of some properties observed in real temporal systems, they can be considered as the simplest yet nontrivial framework to study the concurrence of changes in connectivity pattern of the network and dynamical processes unfolding on their structure17,18.

To describe the RW behavior, we need to evaluate the transition probability that a walker starting at a node with activity a′ moves to a node with activity a at the next Δt time step, Qa|at). Without loss of generality in what follows we focus on the case m = 1. Detailed results for the m > 1 one-to-many interactions are discussed in the Supplementary Information. At step t + 1 the neighbors of V (t) can be classified into two types:

  1. 1

    Passive destinations, are neighbors of V (t) connected by edges created due to the activity of V (t) itself. They are randomly selected from the graph and thus their activity is distributed according to dF(a). We define KΔt,A(t) to be the number of such passive destinations, where A(t) is the activity rate of node V (t).

  2. 2

    Active destinations, are neighbors of V (t) connected to V (t) by edges created due to their own activity. Thus, their activity is distributed as adF(a)/〈a〉, where 〈a〉 is the average activity rate in the system. We define define HΔt as the number of such active destinations.

The word destinations highlights the fact that the walker moves from V (t) to one of these KΔt,a + HΔt neighbors of V (t). For sufficiently large N, HΔt and KΔt,a are both Poisson distributed with average 〈a〉Δt and a′Δt, respectively. If V (t) has at least one edge, the walker follows the edge of a passive destination with probability KΔt,a/(KΔt,a + HΔt), while it moves towards an active destination with probability HΔt/(KΔt,a+ HΔt). Unconditioning the latter expressions with respect to the values of KΔt,a and HΔt we obtain

where δ(x) is the Dirac delta function. While we refer the reader to the SI for the detailed derivation, each term in eq. (1) has a simple interpretation. The two terms inside the double sum represent, respectively, the probability that the walker moves to a passive destination that has activity a and the probability that the walker moves to an active destination that has activity a. The terms multiplying the two terms inside the double summation are related to the probability that KΔt,a = k and HΔt = h. The δ(aa′) term considers the probability that the node has no edges after Δt and thus the walker must remain at V (t).

Thankfully, eq. (1) can be simplified (see SI) yielding

where ζa′,Δt = e−(a′+〈a〉)Δt is the probability that no edge is created at a node with activity a′ during interval Δt. Note that in eq. (2) the parameter Δt only affects the probability that no edge is created until the next time step.

To find the RW stationary distribution we first note that the RW on the time-varying network is stationary and ergodic (see SI). Thus, the RW occupation probability ρa, defined as the probability of finding the walker in a given node of activity a, exists and is unique40. The value of ρa is the fixed point solution of the following Chapman-Kolmorogov set of equations41

where Ω is the set of all activity rates in the system. The solution to eq. (3) can be obtained numerically. Interestingly, we can extend eq. (3) to consider lazy random walks where the walker moves with probability p (0, 1] or does not move with probability 1 − p. For the lazy walker we just need to replace Qa|at) in eq. (3) with Qa|at)p + δ(a′ − a)(1 − p). A simple algebraic manipulation shows that ρa does not change with p. Hence, the steady state of the lazy walker for any p (0, 1) is the same as the walker that moves with probability p = 1.

We also find that closed-form solutions of eq. (3) exist in the limits of and . In the case, links are integrated over a large time window and the time-varying network can be considered static. Recall that ζat = e(a+〈a〉)Δt. For the value of ζat ≈ 0, a Ω and thus the second term of eq. (2) is close to zero. In this scenario Qa|at) = C(a + 〈a〉)dF(a), where C = 1/2〈a〉 yielding the fixed point solution of eq. (3)

The asymptotic occupation probability of a given node of class a is simply proportional to its activity. Since in the regime of large Δt the degree of a node v, kv, is proportional to its activity, av, eq. (4) yields . Thus, for sufficiently large Δt, we recover the well-known behavior of static networks, where the occupation probability of a node is proportional to its degree31. Furthermore, in the SI we show that eqs. (2), (3) and (4) hold for weighted aggregation procedures where integrated edges have weights proportional to how often they appeared during an interval Δt.

In the regime of very short aggregating windows we have limΔt→0 ζat → 1, a Ω. Thus, the first term of eq. (2) is zero yielding Qa|at) = dF(a) and the trivial fixed point solution of eq. (3)

Thus, the walker is equally likely to be found at any node regardless of its activity rate. In fact, when Δt is small the probability a node has more than one edge is close to zero. Consequently, highly active nodes lose and gain walkers at the same rate, giving rise to homogeneous occupation probabilities in eq. (5). Interestingly, in previous work on general time-varying network processes we show that the result in eq. (5) holds even when aggregated snapshots have arbitrary strong spatio-temporal correlations40.

Numerical validation on synthetic networks

We validated our analytical results through extensive numerical simulations. We considered networks with N = 105 nodes and a power-law activity distribution dF(a) a−γ (as observed in many real networks17), restricted to the interval Ω = [10−3, 1] to avoid divergencies in the limit . As shown in Fig. 4, the exact solution reproduces the simulations accurately for the entire spectrum of integrating windows Δt (case m = 1 in main panel). Interestingly, as Δt grows, the occupation probability increases sharply in high-activity vertices while slightly decreasing at low activity nodes. Moreover, as Δt increases ρa a as predicted by eq. (4), while as Δt gets smaller, ρa = 1/N, as predicted by eq. (5). The equations describe correctly also the behavior observed for one-to-many simultaneous connections m, characterized by a smoother increase in ρa at high activity nodes (see m = 6 case in Fig. 4, inset). The SI contains more details on the formulation of the m > 1 case.

Figure 4
figure 4

Occupation probability ρa of a RW over an activity-driven network with activity distribution dF(a) a−2, a (10−3, 1), N = 105, for different values of m.

Curves in the main plot concern the m = 1 case, where each node can only simultaneously connect to one node. In the inset, the case m = 6 is considered, where a node simultaneously connect to six other nodes. Solid curves represent the analytical prediction of eq. (3) integrated over Δt = 1, 10, 100 (diamonds, squares and circles) time windows. Note that in both panels as Δt gets larger ρaa. Averages performed over 103 independent simulations.

Numerical validation on real-world networks

The analytical framework discussed above qualitatively reproduce also the behavior observed in real datasets. In Figs. 5 and 6 the solid lines show the numerical solution obtained by applying eq. (2) into eq. (3) (see SI), for the PRL and Yahoo! datasets, respectively. The gray points in Figs. 5 and 6 reproduce the simulation results already shown in Figs. 2 and 3, respectively. All numerical solutions use the same activity distribution dF(a), extracted from the time-varying graph of Δt = 1 day for the PRL dataset and Δt = 1 second for the Yahoo! dataset (dF(a) extracted from larger values of Δt provide similar results17, see SI for details).

Figure 5
figure 5

Occupation probability ρa of a RW at the end of the simulation as a function of node activity.

The points are the values of ρa of a RW over the Physics Review Letters time-varying co-authorship network from 1980 to 2006 for different integrating windows Δt {1, 10, 60, 182} days. The solid curves show the respective numerical solutions of eq. (3) and the black curve shows eq. (4).

Figure 6
figure 6

Occupation probability ρa of a RW at the end of the simulation as a function of node activity.

The points are the values of ρa of a RW over the time-varying graph of Yahoo! song ratings for different integrating windows Δt of one second, one hour, six hours and one day. The solid curves show the respective numerical solutions of eq. (3) and the black curve shows eq. (4).

The theoretical results accurately describe real data, with some deviations for nodes in the intermediate activity range at Δt of one day. The RW occupation probability is uniform and independent of node activity for small Δt as predicted by eq. (5). As predicted by eq. (4), the RW occupation probability ρa approaches (a + 〈a〉)/(2Na〉) (black curve) as Δt increases, an effect particularly noticeable for high-activity nodes. It is also worth highlighting that the data matches well the theoretical equations for the case m = 1, suggesting a connection between the datasets and the fundamental mechanisms described in our model (for the similarity in behavior between m = 1 and projected networks such as the PRL co-authorship networks see SI).

Discussion

Our results clarify the effect of time aggregation procedures on the behavior of the RW, taken as the simplest instance of dynamical process, even when aggregation windows are “short”. We have quantified this effect in a rigorous mathematical framework that (i) allows us to recover the results concerning static networks in the limit of infinite aggregation windows, (ii) accurately describes the behavior observed in numerical simulations upon synthetic time-varying networks and (iii) captures the phenomenology observed on real datasets. Overall, while for practical or technical reasons researchers are often forced, or simply tempted, to work with time aggregated representations of time-varying networks, our work suggests that caution should be used when drawing general conclusions about dynamical processes based upon time-aggregated networks. At the same time, moreover, our theoretical results may help to investigate possible distortions introduced by the aggregating windows of data collection methods.

The proposed framework considers inherently discrete processes, such as spreading phenomena in contact networks that are, also at the smallest time resolution possible, discrete. We leave the generalization to continuous processes for further work.

Methods

Occupation probability

The asymptotic occupation probability is the steady state probability of finding the walker in a node with activity a, which is guaranteed to exist and be unique if the time-varying network that is stationary, ergodic and T-connected (see SI), such as in activity driven networks. A time-varying network is T-connected if there is a temporal path between any two nodes40. In our simulations we consider the RW occupation probability ρa to be the probability of finding the walker in a node with activity a at the end of the simulation period [0, T], given the walker starts at a random node.

Activity-driven networks

Activity-driven network models are based on the activity patterns of nodes, that are used to explicitly model the evolution of the network structure over time17. It can be shown that the full dynamics of the network are encoded in the activity rate distribution, dF(a) and that the time-aggregated measurement of network connectivity yields a degree distribution that follows the same functional form as the distribution dF(a) in the limit of small kt and k/N17. This is an important feature of the model, that is able to reproduce basic statistical properties found in many real networks giving a simple prescription to characterize explicitly dynamical connectivity patterns.

Datasets & simulation

In this study we considered two different empirical projections of bipartite time-varying networks. The collaborations in the journal Physical Review Letters (PRL) published by the American Physical Society34 and the Yahoo! music dataset made available by Yahoo!35.

PRL dataset

The bipartite network representation of this dataset has two type of nodes: authors and papers. An author is connected to all the papers she/he wrote in a integrating window Δt. We study the bipartite projection of the authors. In this representation each author of an article in PRL as a node. Undirected edges connect authors that collaborate in the same article. We focus just on small collaborations filtering out all the articles with more than 10 authors. We consider the period between 1958 and 2006. The datasets contains 80,554 authors and 66,892 articles. The smallest timescale available is one day.

Yahoo! music dataset

In this dataset the bipartite network has two type of nodes: users and songs. We study the bipartite projection over the songs. Each node is a song and two songs are connected if at least one user rated both in a time window Δt. The dataset contains 4.6 × 105 songs rated by 199,719 users of Yahoo! users collected in the course of six months35. User activity is recorded at a time resolution of seconds.

Simulation setup

We obtain the empirical walker occupation probability, ρa, as follows. Construct the transition probability matrix Pt associated to the RW on the t-th aggregated network Gtt), t = 0, …, Tt, where T is the time of the last event in the dataset. The empirical RW occupation probability is obtained by multiplying the matrices and then left-multiplying the result by the vector (1/N, …, 1/N), which gives equal probability that for the walker to start at any node. We note in passing that similar results are obtained when the walker starts at a handful of high activity nodes.