Bayesian Modeling Approach for Forecast of Structural Stress Response Using Structural Health Monitoring Data

: The advancement in structural health monitoring (SHM) technology has been evolving from monitoring-based diagnosis to monitoring-based prognosis. The structural stress response derived by the measured strain data is increasingly used for structural condition diagnosis and prognosis because it can be directly used to indicate the safety reserve of a structural component or provide information regarding the load-carrying capacity of the whole structure. Therefore, accurate forecasting of structural stress responses is an essential step for the reliable diagnosis and prognosis of structural condition. For a large-scale, complex structure subjected to multisource effects such as live loads and environmental loads, its stress evolution is a typically nonlinear dynamic process. Moreover, the online monitoring-derived stress data extracted from an SHM system are extremely massive. This arouses a strong demand for developing a computationally efficient and accurate method for forecasting structural stress responses. In this work, we propose the use of a Bayesian modeling approach with Gaussian processes (GPs), which allows for probabilistic forecasts of structural stress responses and has great capability of modeling the underlying nonlinear dynamic process. Although powerful for characterizing dynamic nonlinearity of structural stress responses, the conven-tional GP-based Bayesian modeling approach remains computationally intensive because of the massive stress data increasingly gathered by the monitoring system. We propose a moving window strategy to substantially shrink the size of training data, thus leading to a reduced-order GP model and effectively alleviating the high computational cost. The feasibility of the reduced-order GP-based Bayesian modeling approach is illustrated by using the real-time monitoring-derived stress data acquired from a supertall structure. Its performance is compared with the full GP-based Bayesian approach, and the comparison results indicate that the proposed approach holds higher computational accuracy and efficiency for stress response forecasts than the traditional method. DOI: 10.1061/(ASCE)ST.1943-541X.0002085. This work is made available under the terms of the Creative Commons Attribution 4.0 International license, http://creativecommons.org/licenses/by/4.0/.


Introduction
Civil infrastructure systems such as tunnels, pipelines, railways, buildings, and bridges, among many others, play a vital role in maintaining the well-being of people, protecting significant capital investments, and promoting the regional and national prosperity. These infrastructures are designed to provide many years of safe functionality not only under normal operational conditions but also during extreme events (such as typhoons and earthquakes). Structural health monitoring (SHM) has attracted increasing attention in both academic and industrial communities in the interest of ensuring the safe and reliable operation of infrastructure systems and issuing early warnings on damage or deterioration prior to costly repair or even catastrophic failure. The past decades have witnessed a surge of research dedicated to SHM (Doebling et al. 1998;Catbas and Aktan 2002;Sohn et al. 2003;Spencer et al. 2004;Ko and Ni 2005;Brownjohn 2007;Farrar and Lieven 2007;Lynch 2007;Song et al. 2009;Ou and Li 2010;Wang and Yim 2010;Yang et al. 2015) and its applications to a variety of infrastructure systems such as bridges (Pines and Aktan 2002;Wong 2004;Chan et al. 2006;Peeters et al. 2009;Ni et al. 2011;Li et al. 2014), buildings (Lin et al. 2005;Kijewski-Correa et al. 2006;Brownjohn and Pan 2008;Ni et al. 2009), and tunnels (Glisic et al. 2000;Mohamad et al. 2012;Zheng and Lei 2017). The deployment of online SHM systems on infrastructures allows structural operators and asset managers to rate the structural performance in real time and conduct maintenance and rehabilitation in a timely manner according to the monitored structural condition.
The real-time monitoring data are highly valuable because they reflect the authentic behavior of the monitored structure under operational conditions. Forecasting of the future states and performance from the measured structural responses is becoming crucial for condition prognosis and control of an engineered structure. It is also central to sensor fault diagnosis, signal outlier detection, and reconstruction of incomplete data (Abdelghani and Friswell 2007;Hernandez-Garcia and Masri 2008;Kullaa 2011). Without much effort, the real-time data can be gradually accumulated from a monitoring system over time. The fast-varying acceleration data and the slow-varying strain/stress data are two main measurands from the instrumentation systems. The former has been extensively used in structural health diagnosis by the traditional vibrationbased methods (Doebling et al. 1998;Sohn et al. 2003;Jaishi and Ren 2005;Ni et al. 2008;Hua et al. 2009;Moaveni et al. 2009) or in combination with Bayesian approaches (Papadimitriou et al. 2001;Christodoulou et al. 2008;Beck 2010 Wang et al. 2018) because it can be used to reliably extract the modal properties (i.e., frequencies, mode shapes, and their derivatives such as modal flexibility), which the vibration-based methods rely on. In recent years, the strain response, as a localized structural response, has gained growing interest for structural condition diagnosis and prognosis. In particular, online monitoring of strain can offer a way to derive the stresses experienced by the monitored structure during its service. Stress data can be directly used to indicate the safety reserve of a structural component or provide information about the load capacity of the whole structure; they would be better suited to characterize local damage of a structure than the vibration (acceleration) data. On the other hand, the emergence of innovative optical fiber sensors has made it possible to conduct densely distributed monitoring of strain for SHM applications. A lot of research activities have been focused on structural health assessment using successively accumulated strain data (DeWolf et al. 2002;Chen et al. 2004;Chakraborty and DeWolf 2006;Wu et al. 2008;Ni et al. 2012). To take into account the uncertainty in stress responses, a variety of procedures have been proposed for SHM-based structural reliability assessment (Catbas et al. 2008;Liu et al. 2009;Xia et al. 2012;Zhu and Frangopol 2013). Apparently, the accuracy in the stress data-based structural condition prognosis relies largely on the validity and reliability of the forecasted stresses, which highlights the importance of accurate forecasting of the structural stress responses.
Predicting the evolution of time-varying structural stress behavior using monitoring-derived stress data poses certain challenges. First, for an in-service structure experiencing various types of load effects such as live loads and environmental loads, its stress evolution is a typically nonlinear dynamic process, and the underlying mechanism may not be well defined. Second, the monitoringderived stress data continuously collected by a monitoring system are tremendously massive by nature. As a result, modeling approaches that bear adequate performance in both computational accuracy and computational efficiency are favored to forecast structural stress responses from the massive monitoring data. In the literature, the autoregressive (AR) model has been widely investigated for SHM applications (Fugate et al. 2001;Gul and Catbas 2009;Magalhães et al. 2012;Harmanci et al. 2016). Specifically, the AR model defines a linear parametric structure to map the relationship between the current output and the previous outputs, and the AR coefficient can be efficiently obtained by the classical least-squares estimator. As a nonparametric one, the Gaussian process model (GPM) has been increasingly considered as a highly promising modeling technique for a wide range of engineering problems such as model updating (Khodaparast et al. 2011;Erdogan et al. 2014), uncertainty quantification (Fricker et al. 2011;Wan et al. 2014Wan et al. , 2017a, sensitivity analysis (Rohmer and Foerster 2011;Wan and Ren 2015;Wan et al. 2017b), and time-series modeling (Dervilis et al. 2016;Worden and Cross 2018). The wide applications of the GPM to different research topics can be attributed to its two admirable merits. First, the GPM, which is a probabilistic model elicited within a Bayesian framework, allows modelers to estimate the uncertainty in prediction. Second, the GPM has a data-driven feature that makes it not be restricted to a certain algebraic structure and thus ensures a high modeling flexibility and great expressive power (the AR model is in reality a special case of the GPM when using specific covariance functions). These two desirable features render the Gaussian process (GP)-based Bayesian method suitable for forecasting the stress responses of an in-service structure with monitoring data. The first feature allows for estimating the uncertainty in structural stress forecasts, whereas the second one guarantees the capability of characterizing the nonlinear pattern inherent in the stress responses.
Although the GP-based Bayesian modeling approach is powerful for modeling nonlinear dynamic systems, its practical applications in forecasting stress responses from the massive measurement data are inevitably hindered by the prohibitive cost of the required computations. In the current way, the currently available data are used to construct the GPM for ahead forecasts; when new data are available, the GPM is reconstructed for further ahead forecasts. This iterative forecasting procedure indicates that with more and more data available over time, the size of the training data set becomes bigger and bigger, making the forecasts further ahead in the future more computationally expensive. The increasing size of the training data set is a large barrier, which limits the application of the GP-based Bayesian modeling approach for forecasts, because the construction of GPM requires a number of computations in the order of Oðn 3 Þ (Rasmussen and Williams 2006), where n is the training data set size. To address this issue, we propose a moving window strategy to reduce the size of the training data set, thus leading to a considerable relief in the computational burden associated with the reconstruction of the GPM. The proposed reduced-order GP-based Bayesian modeling approach is applied for forecasting the structural stress responses of the 600-m-high Canton Tower based on the online monitoringderived stress data. Performance comparisons with the full GP-based method constructed using the full set of available data are provided to investigate the feasibility of the proposed reducedorder GP-based approach. The comparison results show that the reduced-order GP-based approach greatly outperforms the full GP-based method in terms of both prediction accuracy and computational speed.

Bayesian Modeling with Gaussian Processes
Gaussian processes offer a powerful method to perform Bayesian modeling of the underlying functions. A GP is a collection of random variables, any finite number of which has a joint Gaussian distribution (Rasmussen and Williams 2006). A multivariate Gaussian distribution of finite dimension is fully specified by its mean function and covariance function. The GPM is formulated within a Bayesian framework. Specifically, GP modeling begins with a Gaussian prior over the model outputs, specified through a prior mean function and a covariance function. Then, the observations of the model outputs are assumed to be under a Gaussian likelihood. Finally, through the maximum likelihood estimate given the training data set, a Gaussian prior combined with a Gaussian likelihood gives rise to a posterior GP over the unobserved model output. As the prior mean represents whatever we expect for the target function before seeing any data, it is generally set to be zero in the absence of prior knowledge about the mean variation (Neal 1999). It is easy to generalize into an arbitrary mean function, such as linear and polynomial functions. In the present study, the squared exponential (SE) covariance function, which is most commonly used in engineering literature, is selected for defining a GPM. The SE covariance function has the following form: where η 2 = signal variance; l k = characteristic length scale; and d = input dimensionality. In addition to the SE function, the Martérn class (MC) function and periodic (PE) function are also typical covariance functions used in GPs (Ko 2011). The MC function is suitable for modeling significantly rough variations, whereas the PE function is appropriate for cyclical variations. Consider a set of observed data D ¼ fðx i ; y i Þj n i¼1 g consisting of n pairs of vector input x and noisy scalar output y. For notational convenience, we denote the input and output components as X ¼ fx i gj n i¼1 and Y ¼ fy i gj n i¼1 , respectively. With the assumption of Gaussian noise, the observation model can be expressed as where fðxÞ = latent function; and σ 2 noise = variance of noise. Note that in modeling deterministic systems, which always produce the same outputs with the identical inputs, one just needs to remove the noise item ε noise or set the variance σ 2 noise to be zero. Interested readers may refer to Wan and Ren (2015) and Wan et al. (2017a) for the deterministic case. With Gaussian prior, the joint distribution of the latent function realizations becomes where f ¼ fðXÞ; and covariance matrix C ¼ CðX; XÞ, which is the covariance function encoding the correlations between pairs of inputs. Let f Ã denote the latent function realization to be predicted at an unobserved point x Ã . Again, under the assumption of Gaussian prior, one can obtain the following joint Gaussian distribution: where By applying Bayes' theorem, the joint posterior distribution of f and f Ã conditioned on observations is obtained as where, by using the observation model defined in Eq.
(2), pðYjfÞ can be expressed as where I = identity matrix of size n × n.
To evaluate the posterior distribution of f Ã given the observed outputs Y, we marginalize the joint posterior expressed in Eq. (5) such that where denominator pðYÞ = normalizing factor (also known as the marginal likelihood) ensuring pðf Ã jYÞ is a valid posterior distribution such that ∫ pðf Ã jYÞdf Ã ¼ 1. Fortunately, calculating pðf Ã jYÞ is extremely simple because both pðf; f Ã Þ and pðYjfÞ are of the Gaussian form. Finally, one can obtain its expression in a closed-form Gaussian distribution as with the mean and the variance given by where K ¼ C þ σ 2 noise I. For the noisy case, the hyperparameters to be determined include the covariance function parameters and the noise parameter, denoted as Θ ¼ fl 1 ; : : : ; l d ; η 2 ; σ 2 noise g. In the Bayesian context, it is common practice to infer an optimal set of the hyperparameters that maximize the marginal likelihood of the training data. In so doing, the estimation of the hyperparameters is converted to the solution of an optimization problem of minimizing the negative logarithmic marginal likelihood (NLML). That iŝ With a Gaussian likelihood, the NLML LðΘÞ has an analytical expression as The partial derivatives with respect to the hyperparameters are also analytically tractable, expressed by where j · j, trð·Þ, and ð·Þ ⊤ = determinant, trace, and transpose operators, respectively.

Moving Window Strategy for Forecasting
Forecasting time series refers to learning dynamical systems (also known as time series modeling), which aim at characterizing the behavior of system responses over time. The model to be constructed from the obtained time series data is dynamic; that is, it is continuously updated when more and more data are available over time. As aforementioned, the GP-based Bayesian modeling approach requires a number of computations that scales Oðn 3 Þ, with n as the number of data in the training data set. As a result, the GP-based method is computationally demanding when the training data size is huge. Moreover, even though the size of the training data set is not prohibitively large, the total time consumed in continuous forecasting by repeatedly building the GPM likely becomes unaffordable. To overcome the limitation of high computational cost, a moving window strategy is proposed in this study in an effort to significantly reduce the size of the training data set, thus resulting in a computationally efficient, reduced-order GPM. The principle behind the moving window is well known. As the window slides along the data, a new process model is generated by including the newest sample and excluding the oldest one.
With the moving window strategy, a small set of training data, which is closest to the prediction point and has a fixed size, is used to build a reduced-order GPM. In line with the idea of moving window and the theory of GPM described previously, the reducedorder GP-based Bayesian modeling approach for time series forecasts is detailed in Algorithm 1. Note that the full GP-based approach is also given for comparison.
Algorithm 1. Forecasting using Bayesian modeling approach with Gaussian process Input: Initial training data D ¼ fX; Yg with X ¼ ft i gj n i¼1 and Y ¼ fy i gj n i¼1 . Output: Forecasts of y nþ1; : : : ;N at the times t nþ1; : : : ;N in term of mean vector μ and variance vector Σ. 1. Initialize μ and Σ such as μ←∅, Σ←∅. 2. Compute the NLML LðΘÞ and its partial derivatives using the following expressions: 3. Use the multistarting point strategy-aided conjugate gradient routine to solve the optimization problem defined asΘ ¼ arg min Θ LðΘÞ to infer the optimal set of hyperparameters.
4. Evaluate the prediction mean and variance from 6. Update the training data D as follows: a. With moving windows: D ¼ fX; Yg with X←½X −1 ; t nþ1 and Y←½Y −1 ; y nþ1 , in which X −1 denotes a collection of inputs X excluding the element t 1 , and similarly for Y −1 .

Repeat
Steps 2-6 to iteratively forecast the stress responses at other time instances.
8. Return the forecasting results μ and Σ.

Description of Canton Tower
The Canton Tower is located in the city of Guangzhou, China. This supertall structure, being a landmark in Guangzhou, consists of a 454-m-high main tower and a 146-m-high antennary mast, resulting in a total height of 600 m. The shape of the Canton Tower is generated by two ellipses: one at foundation level of the main tower and the other at the top. These two ellipses are rotated relative to each another. The tightening caused by the rotation between the two ellipses forms a "waist" halfway up the tower. More specifically, the ellipse decreases from 50 × 80 m at the ground to the minimum of 20.65 × 27.5 m at 280 m high, and then increases to 41 × 55 m at 454 m high (the top of the main tower). As shown in Fig. 1, the main tower is a tube-in-tube structure that comprises a core tube inside and a usual tube outside. The core tube is constructed by using reinforced concrete; whereas the usual tube is composed of 24 concrete-filled-tube (CFT) columns, uniformly spaced in an ellipse while inclined in the vertical direction. The CFT columns are interconnected transversely by steel ring beams and bracings. The inner tube is also ellipses-shaped but with a constant cross section of 14 × 17 m, and its centroid differs from that of the outer tube. The inner and outer tubes are connected through 37 functional floors, which hold services such as utilities, tour, catering, and TV and radio signal transmission facilities. The antennary mast on the top of the main tower is a steel spatial structure with an octagonal cross section, with a maximum diagonal of 14 m long.

Structural Health Monitoring System
To ensure the safety of the Canton Tower in both in-construction and in-service stages, a sophisticated long-term SHM system has been designed and implemented by The Hong Kong Polytechnic University for online monitoring of this landmark building ). The construction of the Canton Tower has been completed at the end of May 2009. Since then, the integrated SHM system embracing more than 800 sensors of 16 types (including anemometers, accelerometers, strain gauges, GPS, and digital video camera) has continuously operated for 8 years . The focus of the present work is on forecasts of structural stress responses, so we are more concerned with the deployment of strain sensors. The main tower is equipped with a total of 412 vibratingwire strain gauges, 148 of which are the embedded-type strain sensors and the rest are the surface-type strain sensors. In particular, each of the 12 cross sections at different heights is installed with 12 strain sensors: each of the first five rings corresponding to Cross Sections 1-5 is installed with 20 strain sensors, and each of the last seven rings associated with Cross Sections 6-12 is installed with 24 strain sensors. The cross sections selected for sensor deployment correspond to the concrete inner core wall at the elevations of 32. 8, 100.4, 121.2, 173.2, 204.4, 230.4, 272.0, 303.2, 334.4, 255.2, 286.4, and 438.4 m. The corresponding ring numbers are 3,9,11,17,21,24,28,32,35,38,40, and 45 at the outer tubular structure. The arrangement of the strain sensors on the inner and outer tubes of the Canton Tower is illustrated in Fig. 2. Each section is equipped with one substation to collect data from all its sensors, and then the data stored in these substations are transmitted to the central data warehouse. Fig. 3 shows a typical floor plan with the numbered CFT columns and the layout of the sensors in Cross Section 6 around halfway up the main body. In the inner tube, four positions [1-4 in Fig. 3(b)] at each critical section are selected for strain and temperature Vibrating wire strain gage (12) Section 11 Vibrating wire strain gage (12)

Ring 40
Ring 45 Fig. 3(c)]. Because the vibrating-wire strain gauges are unable to measure transient dynamic strain, the present system collects the strain data once per minute; that is, the sampling frequency of strain sensors is equal to 1=60 Hz.

Forecasting Results
Without loss of generality, two strain gauges at Cross Section 6 shown in Fig. 3 are chosen to illustrate the proposed GP-based Bayesian modeling approach: one is Sensor 1 of the strain rosette corresponding to Point 1 of the core tube [ Fig. 3(b)], and the other is located at Point 4 of CFT columns [ Fig. 3(c)]. For the sake of brevity, the first selected gauge is denoted CT 1, and the second is denoted CFT 1. The method discussed in the preceding section can also be applied to forecast the stress responses at any other monitoring points. First, the measured strain should be converted to the stress for the subsequent forecasting. For the strain gauge mounted on the steel surface, the conversion of the measured strain to stress is conducted by σ s ¼ E s ε m , where E s is the elastic modulus of the steel, and ε m is the measured total strain. For the strain gauge embedded inside the concrete, because temperature-induced expansions and contractions can give rise to changes of stress in the concrete, the conversion from the measured strain to stress is made by σ c ¼ E c ðε m þ α s ΔTÞ, where E c is the elastic modulus of the concrete, α s is the coefficient expansion of the strain gauge material, and ΔT is the temperature change relative to the reference temperature predefined for the gauge. The aforementioned strainto-stress conversion formulas are suitable for indeterminate (constrained) structural components. For determinate (unconstrained) structural components, however, one needs to further exclude the temperature-induced strain from the measured total strain so as to extract the strain that indeed induces stress and then converts  the extracted strain to stress. More details on the strain-to-stress conversion can be found in Su et al. (2017). In this study, the real-time monitoring-derived stress data extracted during a 3-month period (January 1 to March 31, 2012) are of concern. To decrease the uncertainty arising from the measurement facilities, the average stresses of every hour are used as the target quantities. After deducting a small number of missing data mainly caused by the shutdown of the acquisition system, a total of 2,161 sets of stress data are collected. Fig. 4 shows the averaged stresses of inner and outer tubes.
Following the implementation procedures summarized in Algorithm 1, the reduced-order and full GPMs are formulated separately for stress response forecasts. The moving window length is set as 10. The reason for this window size is discussed subsequently. In the statistical community, in addition to one-step ahead forecasting cases, multistep ahead forecasting situations are also widely explored. To comprehensively assess the performance of the GP-based Bayesian approach, the scenarios of multistep ahead forecasting are also presented in this study. More specifically, three scenarios, namely, one-, three-, and five-step ahead forecasts, are considered. The forecasting results produced by the reduced-order and full GPMs are shown in Figs. 5 and 6, respectively. In addition, the results of forecasting variances are provided in Fig. 7.

Comparison of Computational Accuracy
By comparing the results in each scenario, it can be found that the forecasting performance of the reduced-order GP-based method is better than that of the full GP-based method, especially for the multistep ahead forecasting cases. Specifically, the performance is evaluated by comparing the forecasting stress responses with the measured counterparts and quantifying the forecasting uncertainty. As the visual discrimination is more or less subjective, performance metrics that allow for evaluating the forecasting capabilities quantitatively are desired. This study adopts two measure criteria: root-mean-square error (RMSE) and mean likelihood (ML), defined as where μ y i and σ 2 y i = predictive mean and variance of stress, respectively; and y i = measured stress. The RMSE metric measures the overall accuracy of the forecasts, whereas the ML metric quantifies how likely the measurements are reproduced by the forecasts, which takes into account the effect of predictive variance. As a consequence, the smaller the value of RMSE and the greater the magnitude of ML, the more accurate the GPM. The results of the RMSE and ML metrics for different step ahead forecasts are demonstrated in Fig. 8. Again, it can be concluded that the reduced-order GP-based method outperforms the full GP-based method. It is evident from Fig. 7 that the forecasting error and uncertainty become larger with the increasing step of the ahead  forecasting. More specifically, the RMSE value of the five-step ahead forecast is biggest, followed by that of the three-step ahead forecast and then that of the one-step ahead forecast; so are the forecasting uncertainty bounds. This phenomenon can be explained by the fact that the bigger the forecasting step, the larger the forecasting period; in other words, we are more uncertain about the future behavior of the underlying system, which leads to higher forecasting error and uncertainty (variance). Another important observation is that the reducedorder GPM is more suitable for characterizing the nonlinear dynamic process of the stress responses than the full GPM because the former fits the shape of the stress curves well (see, e.g., the region around mid-January in Figs. 5 and 6).

Comparison of Computational Efficiency
Apart from the forecasting accuracy of the reduced-order and full GP-based Bayesian approaches, their computational efficiency is also our concern. The implementations are all performed on a Dell PowerEdge T420 (Dell, Round Rock, Texas) desktop with Dual Intel Xeon E5-2403V2 processor (IBM, Armonk, New York) and 16 GB of memory. The comparison of CPU time consumed by these two approaches is presented in Fig. 9. Specifically, for the one-, three-, and five-step ahead forecasting scenarios, the reduced-order GP-based approach takes 1,366 s (0.38 h), 435 s (0.12 h), and 263 s (0.07 h), respectively; the full GP-based method takes 121,141 s (33.65 h), 40,184 s (11.16 h), and 24,397 s (6.78 h), respectively. Apparently, the full GP-based method consumes tremendously longer CPU time for the forecasts of the stress responses. The comparison of CPU time consumed indicates that the reduced-order GP-based forecasting method exhibits overwhelming superiority over the full GP-based forecasting method in terms of computational time. The appealing advantage of high computational efficiency of the reduced-order GP-based method is attributed to a small set of training data used for learning the GPM with the moving window strategy. By contrast, the size of training data associated with the full GPM becomes bigger and bigger with the increase in the number of forecasting executions, resulting in consuming more and more CPU time. To have a clear picture of the evolution of CPU time consumed during the forecasting process, the CPU time consumption versus the number of forecasting executions is shown in Fig. 10. Obviously, for the reduced-order GPM, the CPU time of each forecasting run is small and quite stable; whereas for the full GPM, the CPU time gradually rises as the number of forecasting executions increases. However, the relationship between the CPU time consumption and the number of forecasting executions is not strictly monotonic because the convergence rates of optimization for hyperparameter estimation can be different. In summary, our investigations verify that the proposed reduced-order GP-based Bayesian modeling approach using the moving window strategy is more accurate and more computationally efficient than the full GP-based method that uses all available measurements as the training data set.

Investigation of Moving Window Size
A crucial issue of the reduced-order GPM using moving window strategy is the choice of a proper window length. The window length affects the time consumption in GPM construction; that is, the smaller the window, the smaller the training data set, and the less the time consumed in building the GPM. The total time consumption for continuous forecasting depends on not only the   time cost for each GPM construction but also the number of GPMs. In this respect, the smaller size of the window does not necessarily result in a reduction in computational expense. To explore the effect of the window size, the forecasting is pursued by using different window lengths. To be specific, we increase the window length from 1 to 30 with an interval of 1 and then check their computational accuracy and efficiency. Also, the multistep ahead forecasts are of interest. The performance results obtained using different window sizes are demonstrated in Fig. 11. From this figure, several observations can be obtained: (1) the computational cost in the case of a small window size (e.g., 3) is not necessarily lower than that in the case of big window size because the number of GPMs of the former is greater than that of the latter; (2) with the same window size, the forecasting accuracy for a small forecast horizon (one-step ahead) is higher than that for a big forecast horizon (three-step ahead); and (3) the RMSE, ML, and time consumption do not change much when the moving window size is larger than 6. Ideally, the optimal window size is determined when both computational accuracy and efficiency are highest. Although the ideal case does not exist, we may still select a good window size. Considering RMSE, ML, and time consumption, the "optimal" window length can be set as 10 because RMSE is not big, ML is not small, and time assumption is low.

Discussion
The aforementioned observations can be summarized as follows: • The forecasting error and uncertainty become larger as the forecasting horizon (step) increases, which can be confirmed by the resulting magnitudes of RMSE and ML. This trend is easy to understand. The ahead forecasts are out-of-sample predictions, so typically the further in the future we attempt to forecast, the larger the error and uncertainty are because of increasing lack of confidence regarding the long-term behavior. • The reduced-order GP-based method achieves better accuracy in stress response forecasts than the full GP-based approach. The reason is that the former, using a small set of measurements closest to the prediction point as the neighboring training data, is more capable in capturing the local nonlinearity around the prediction point; the latter, being global, sacrifices its ability of modeling local nonlinearity to be more representative of the overall trend of all available measurements. Therefore, the proposed reduced-order GP-based method is more apt to characterize the evolution of the structural stress responses, which is a highly nonlinear dynamic process. • In addition to high forecasting accuracy, the reduced-order GP-based method bears competitive advantage over the full GP-based approach in terms of computational efficiency. This overwhelming advantage is attributed to the fact that the training data size of the former is small and fixed, whereas that of the latter becomes larger and larger with more and more observations available. The different strategies of selecting training data give rise to totally different CPU time consumption. Specifically, during the forecasting process, the former takes small and stable computational time, whereas the latter consumes exponentially increasing computational time.

Conclusions
In this work, the Bayesian modeling approach with GP prior is adopted for forecasts of structural stress responses. The present methodology allows us to quantify the uncertainty in stress prediction through offering not only the predictive mean but also the associated predictive variance. Within the GP framework, the calculations of predictive mean and variance are both analytically tractable, and thus the complex and time-consuming integration procedure is eschewed. In this respect, the GP-based forecasting itself is computationally efficient because the probabilistic predictions can be analytically evaluated. In practice, the monitoringderived stress data are tremendously massive by nature and their size experiences a steady increase over time. Unfortunately, the construction of the GPM becomes computationally prohibitive when the size of the training data set is huge. In this sense, when the size of the training data set is quite large, the standard GP would be computationally expensive. Moreover, even though one-time GPM construction does not consume much time, the total time consumption in continuous forecasting by repeatedly building the GPM likely becomes unaffordable. Both the big size of the training data and the continuous forecasting scenario are realities we encounter in monitoring-based structural condition prognosis. To ease the high computational burden involved in GPM construction, the moving window strategy is proposed in this work to shrink the training data size, which results in a computationally efficient reduced-order GPM. The real-time monitoring-derived stress data from a supertall structure is provided to demonstrate the feasibility of the proposed reduced-order GP-based approach. Three forecasting scenarios, namely, one-, three-, and five-step ahead forecasts, are considered. The performance of the reduced-order GP-based method is compared with the full GP-based method in terms of both computational accuracy and efficiency. Specifically, two measure criteria, namely, RMSE and ML, are introduced to evaluate the forecasting performance in a quantitative manner. We also investigate the total CPU time consumption and the evolution of CPU time consumption with the increasing size of the data set used for stress response forecasts. Our studies indicate that with the increasing number of the forecasting executions, the time consumption of the full GP-based approach exponentially rises, whereas that of the reduced-order GP-based method is small and quite stable.