Earthquake Risk of Gas Pipelines in the Conterminous United States and its Sources of Uncertainty

Relatively little research has been conducted to systematically quantify the nationwide earthquake risk of gas pipelines in the United States; simultaneously, national guidance is limited for operators across the country to consistently evaluate earthquake risk of their assets. Furthermore, many challenges and uncertainties exist in a comprehensive seismic risk assessment of gas pipelines. As a first stage in a systematic nationwide assessment, we quantify the earthquake risk of gas transmission pipelines in the conterminous United States due to strong ground shaking, including the associated uncertainties. Specifically, we integrate the U.S. Geological Survey 2018 National Seismic Hazard Model, a logic tree-based exposure model, three different vulnerability models, and a consequence model. The results enable comparison against other risk assessment efforts, encourage more transparent deliberation regarding alternative approaches, and facilitate decisions on potentially assessing localized risks due to ground failures that require site-specific data. Based on the uncertainties approximated herein, the resulting sensitivity analyses suggest that the vulnerability model is the most influential source of uncertainty. Finally, we highlight research needs such as (i) developing more vulnerability models for regional seismic risk assessment of gas pipelines, (ii) identifying, prioritizing, and measuring input pipeline attributes that are important for estimating seismic damage, and (iii) better quantifying seismic hazards with their uncertainties at the national scale, for both ground failures and ground shaking.


Introduction
The United States manages the largest gas pipeline network in the world. Besides heating homes, the gas pipeline network also provides fuel for electricity generation and different modes of transportation; hence, it is a vital asset for all phases of disaster recovery (Applied Technology Council 2016). As of 2018, distribution, transmission, and gathering lines compose, respectively, partially because of its close relationship to longitudinal ground strain (Honegger and Nyman 2004;Newmark 1968;Tsinidis et al. 2019).
The total risk for a specified group of pipeline segments depends on the risk for a given segment. For example, the average annual number of leaks for a given pipeline segment, , , is determined by integrating the PGV hazard curve at the segment's midpoint, ( ), with the expected number of leaks, ⋅ ( , ), yielding: Similarly, the average annual number of breaks for a given segment, , , is determined by integrating its PGV hazard curve with the expected number of breaks: Incorporating the cost to repair a leak, , and the cost to repair a break, , the AAL for a given segment captures the total loss from both leaks and breaks: = , ⋅ ( ) + , ⋅ ( ) (3) Although the repair costs may also depend on the segment's pipeline attributes such as geographic region or pipeline cross-sectional area, this notation is dropped henceforth to emphasize the more critical relationship between the RR of a pipeline segment and its input pipeline attributes. Furthermore, if indirect economic losses associated with leaks and breaks such as service disruption or downtime were known (American Lifelines Alliance 2005), such losses could be substituted into Equation 3; while such losses can greatly exceed costs associated with physical repairs (Eguchi 1995), this important aspect of the problem requires more spatially accurate data (e.g., system redundancy, percent of population serviced) and is beyond the scope of the present study. Finally, the total AAL for a specified group Ω of pipeline segments (e.g., segments sharing a common geographic region, operator, or interstate classification) is determined by aggregating the AALs across the different segments within the group: The term |d ( )| in Equations 1 to 4 for defining risk refers essentially to the annual frequency of PGV occurrence and, hence, captures the aleatory variability, which reflects irreducible randomness in seismic hazard (Ang and Tang 2007;Kiureghian and Ditlevsen 2009). Unlike some studies where earthquake risk is quantified for a specified hazard level (e.g., a return period of 2,475 years for PGV exceedance events), this study integrates the risks across all hazard levels.
To explicitly identify and quantify the different sources of uncertainty that influence an estimate of AAL, we substitute Equations 1 to 3 into Equation 4 and distinguish estimates from corresponding exact quantities using hat symbols: For a given dataset of pipeline segments with known lengths, Equation 5 identifies four key sources of overall uncertainty in the final AAL estimate: (i) hazard, or ̂( ); (ii) exposure, or ̂; (iii) vulnerability, or ̂ and ̂; and (iv) consequence, or ̂ and ̂. Put differently, the quality of the AAL estimate depends primarily on the quality of estimates for these six inputs, and, hence, these six quantities are viewed as random variables; estimation and discussion on the nature of these uncertainties are provided in the subsequent sections.
Equation 5 also reveals the coupled nature between the uncertainty in the vulnerability model and that in the exposure model. For example, part of the uncertainty in the estimated rate of repairing leaks for segment at PGV level , or ̂( , ), is due to the uncertainty in the estimated input pipeline attributes, ̂ (e.g., decade of installation, pipe material, nominal diameter). To separate the uncertainty in exposure model from that in the vulnerability model, we rewrite Equation 5 by explicitly capturing the epistemic uncertainty of input pipeline attributes with a logic tree for each pipeline segment: where denotes the total number of final branches in the logic tree for a given segment, , denotes the vector of input pipeline attributes for the th branch of a given segment, and ̂, denotes the corresponding branch probability.
The total number of final branches in a logic tree depends on the vulnerability model of interest. For example, one vulnerability model may indicate that only diameter range is important for estimating RRs (i.e., 1 node in logic tree), whereas another vulnerability model may indicate that both pipe material and decade of installation are important for estimating RRs (i.e., 2 nodes). The hat symbol in ̂, denotes uncertainty in the branch probability for a given segment, and, hence, ̂, is viewed as a random variable. For example, the diameter of a given pipeline segment may be unknown, and, hence, probabilities need to be estimated for the different possible diameter ranges. Finally, the lack of hat symbol in , indicates the transfer of epistemic uncertainty in input pipeline attributes ̂ in Equation 5 to epistemic uncertainty in the branch probabilities ̂, in Equation 6. As a result, the uncertainty in RR for known exposure, ̃ in Equation 6, is less than the uncertainty in RR for unknown exposure, ̂ in Equation 5. For consistency, the tilde symbols are also employed to distinguish the uncertainty in repair costs for known exposure, ̃ and ̃, from that for unknown exposure, ̂ and ̂. In the subsequent sections, we present information that is currently available in the literature for estimating the quantities in Equation 6 (i.e., their central values and corresponding uncertainties).

Hazard Available information
We utilized the U.S. Geological Survey (USGS) 2018 National Seismic Hazard Model (NSHM) (Petersen et al. 2019) to quantify the aleatory variability of seismic hazard for strong ground shaking at the national scale. The 2018 NSHM does not provide seismic hazard for PGV (Powers, Altekruse, and Abrahamson), so we estimate it from 5%-damped spectral acceleration at a vibration period of 1 second using the Hazus methodology (Federal Emergency Management Agency 2012; Mousavi, Hesari, and Azarbakht 2014). Previous research has shown that hazard curves based only on reference soil conditions (i.e., 30 =760 m/sec) tend to underestimate the annualized earthquake losses (Jaiswal et al. 2015). To account for local soils, we first estimate the site-specific value of 30 using the USGS global server (Wald and Allen 2007) at the midpoint of each pipeline segment. Then, we applied linear interpolation of 30 and logarithmic interpolation . "Earthquake risk of gas pipelines in the conterminous United States and its sources of uncertainty." ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part A: Civil Engineering, in press. Page 6 of 42 of exceedance frequencies to the 30 -specific hazard curves from the 2018 NSHM (Shumway et al. 2021). The resulting hazard curves serve as best estimates of ̂( ) in Equation 6. Figure 1 presents the estimated PGV hazard curves at four cities across the CONUS using each city's estimate of 30 , which are shown in the legend. While the highest hazard in this figure corresponds to Los Angeles, CA, the lowest hazard corresponds to Chicago, IL. However, being situated in the central or eastern United States does not necessarily imply low seismic hazard because the hazard for Memphis, TN, which is near the NMSZ, exceeds that for Portland, OR, at exceedance frequencies that are lower than ~4 × 10 −3 . Therefore, the earthquake threat for gas transmission pipelines is a national concern.
Figure 1 -Peak ground velocity (PGV) hazard curves at four cities based on city-specific estimates of local soils; estimated epistemic uncertainties shown by dashed lines.

Uncertainty analysis
When estimating AAL using Equation 6, the estimated PGV hazard curve for a given pipeline segment, ̂( ), contains epistemic uncertainty because of limited data and knowledge in characterizing the seismic sources and defining the ground motion model (Kiureghian and Ditlevsen 2009;McGuire 2004). Furthermore, this epistemic uncertainty increases for increasing levels of seismic hazard. This feature implies larger uncertainty for low-probability-highconsequence events (e.g., large RRs associated with rare PGV exceedance) than for highprobability-low-consequence events (e.g., small RRs associated with frequent PGV exceedance). The 2018 NSHM does not provide estimates of epistemic uncertainty in seismic hazard, so we estimate it by first assuming that the annual exceedance frequency at a given PGV level is lognormally distributed (Bradley 2009;Y. Lee, Graf, and Hu 2018) and then estimating the logarithmic standard deviation as a function of annual exceedance frequency. While the annual exceedance frequency may not strictly follow a lognormal distribution (e.g., Lacour and Abrahamson 2019), the lognormal assumption facilitates capturing the center, body, and range of . "Earthquake risk of gas pipelines in the conterminous United States and its sources of uncertainty." ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part A: Civil Engineering, in press. technically defensible interpretations of data (Atkinson, Bommer, and Abrahamson 2014). The probabilistic seismic hazard analysis conducted for Yucca Mountain, NV, in Stepp et al. (2001) was one of the most comprehensive analyses because it rigorously captured epistemic uncertainty in the seismic source characterization through six teams of experts on seismic sources and epistemic uncertainty in ground motion model through seven experts on ground motions. Therein, their Figure 10b documents several "fractiles" for the annual probability of exceeding spectral acceleration at 1 second. Assuming that probabilities are approximately equal to frequencies and data for 1-second spectral acceleration are applicable to PGV, their Figure 10b essentially portrays a realistic estimate of epistemic uncertainties in hazard. Based on that figure, we created a model to approximately estimate the logarithmic standard deviation of annual exceedance frequency, denoted by ln , as a function of annual exceedance frequency, . Specifically, we assume a constant epistemic uncertainty of ln = 0.67 for annual frequencies of 0.01 or higher, whereas for lower annual frequencies, the epistemic uncertainty increases linearly to a value of ln = 1.6 at an annual frequency of 1.7 × 10 −6 .
We applied this approximate model for ln to all estimated PGV hazard curves in the CONUS to study epistemic uncertainty in seismic hazard and its impact on the resulting risk. For a given site, we used the approximate model for ln to determine the median hazard curve; this assumes that the previously estimated PGV hazard curve corresponds to the mean (Abrahamson and Bommer 2005;McGuire, Cornell, and Toro 2005) and that the annual exceedance frequency at a given PGV level is lognormally distributed. Next, we determine various percentiles (e.g., 85, 97.5) of hazard curves by multiplying the median hazard curve with exp[Φ −1 ( ) ln ] where Φ −1 ( ) denotes the inverse of the standard normal cumulative distribution function at the specified th percentile. Since the mean hazard curve from the 2018 NSHM varies spatially with different locations, the percentile hazard curves also vary spatially with different sites.
As examples, Figure 1 presents 15 and 85 percentile curves for each of the four cities. The 85-percentile curve for Memphis, TN, exceeds the 15-percentile curve for Los Angeles, CA, at annual frequencies lower than 10 −3 , highlighting the importance of quantifying epistemic uncertainty in seismic hazard when quantifying risk. The figure also shows that the approximate model for ln captures increasing epistemic uncertainty for increasing levels of hazard. However, our estimate of epistemic uncertainty in seismic hazard, ln , is based on a comprehensive analysis of one location in NV, which may not be accurate for other sites. While we have also considered other choices for estimating this epistemic uncertainty (e.g., trying an approximate model based on data from Bradley 2009), improved estimates will better inform the subsequent sensitivity analyses.

Available information
The locations of gas pipelines are sensitive information even though they are necessary inputs for seismic risk assessments (Applied Technology Council 2016). Fortunately, PHMSA annually collects this information from operators in a standardized format (Pipeline and Hazardous Materials Safety Administration 2017). Access to the 2018 National Pipeline Mapping System (NPMS) dataset of gas transmission pipelines in the CONUS was obtained through a confidential agreement with PHMSA. This dataset provides input pipeline attributes such as geographic location, total mileage, managing operator, classification of interstate vs. intrastate, or commodity. However, many other pipeline attributes that are needed to develop an engineering exposure model and estimate potential seismic damage are unavailable (e.g., burial depth and decade of installation are unknown for all segments). Pipe diameter is available for 58.4% of the total mileage because operators were not required to provide this information.
We supplemented information from the NPMS dataset by compiling and analyzing annual reports that are submitted by pipeline operators to PHMSA (Pipeline and Hazardous Materials Safety Administration 2020b). These reports provide distributions of total mileage by pipe diameter, pipe material, and decade of installation for each operator's assets. For 2018, we found that 99.5% of the total pipeline mileage is made of steel and that almost all have some form of protection against corrosion. Moreover, the next dominant category of pipe material is plastic, and none of the transmission lines are made of cast iron, which differs from gas distribution lines. We also found that 3.79% of the total mileage was essentially installed before 1940, which roughly distinguishes pipeline response from brittle vs. ductile behavior (Federal Emergency Management Agency 2012;T. D. O'Rourke and Palmer 1996).
One of the most important pipeline attributes that is missing in the NPMS dataset is the classification of a pipeline segment as an HCA (49 C.F.R. § 192.903). Due to the importance of HCAs, we discretized all pipeline segments in the NPMS dataset that are longer than 1.6 km (1 mile) into segments whose length is 1.6 km (1 mile) long or shorter (ASME 2018). We used this discretization and data on nearby buildings to approximately classify each segment as either HCA or non-HCA, but this effort is not discussed further due to space limitations. Based on PHMSA annual reports for 2018, approximately 6.89% of all mileage for onshore gas transmission pipelines in the CONUS corresponds to HCAs. Moreover, the top five states with the largest share of the total HCA mileage are CA (15.0%), TX (13.4%), IL (5.51%), PA (5.42%), and FL (4.12%). For context, the corresponding share of the total mileage in these states are CA (4.13%), TX (15.4%), IL (3.09%), PA (3.48%), and FL (1.83%). These statistics show that in addition to operators in CA, operators in other states such as TX or PA are also required to consider seismicity in their Integrity Management Programs, highlighting the need to better understand earthquake risk of gas pipelines at the national scale.

Uncertainty analysis
In addition to epistemic uncertainty in hazard curves, the epistemic uncertainty in pipeline attributes ̂ also contributes to the overall uncertainty in the risk estimates (Equation 5). For example, a different decade of installation leads to a different vulnerability and, hence, risk. This source of uncertainty is characterized as epistemic (Kiureghian and Ditlevsen 2009) because it can be reduced by acquiring more information from the operator.
To capture this source of uncertainty, we propose a logic tree framework where each node in the logic tree of a given pipeline segment corresponds to a unique pipeline attribute (Equation 6). Specifically, Table 1 presents the default nodes and branches chosen based on data available from PHMSA annual reports (Pipeline and Hazardous Materials Safety Administration 2020b). For a given node, all branches are mutually exclusive and collectively exhaustive. Therefore, the final (9×9×28=2,268) branches are also mutually exclusive and collectively exhaustive. These branches are denoted as 'default' because a vulnerability model may indicate that only a subset may be relevant in terms of estimating RRs.
While all pipeline segments share a common set of final branches, the segments' known attributes from NPMS (i.e., operator, state, interstate classification, and commodity) determine the appropriate branch probabilities ̂, . The PHMSA 2018 annual reports provide distributions of pipeline mileage by pipeline attribute, and these distributions vary depending on the operator, state, interstate classification, and commodity. For example, the thin orange lines in Figure 2 show the distribution of mileage by diameter for natural gas intrastate pipelines from an operator with the most HCA mileage in CA. This distribution reveals that slightly more than 60% of the mileage corresponds to diameters of 610 mm (24 inches) or less. In contrast, interstate pipelines in CA (thick orange lines in Figure 2) are generally larger in diameter with a most likely value of approximately 762 mm (30 inches). Several more distributions for other combinations are also shown in the figure. We used a segment's known attributes from the NPMS to determine the appropriate branch probabilities for the given node, which approximately captures correlations between segments (e.g., two adjacent intrastate segments in CA that are managed by the same operator would share the same distribution of diameter). Repeating this process for the other nodes (i.e., pipeline attributes) and assuming each node is conditionally independent of the other nodes leads to the final set of branch probabilities ̂, for a given segment. Pre -1940, 1940 to 1949, 1950 to 1959, 1960 to 1969, …, 2010  Considering all possibilities using the proposed logic tree framework in Equation 6 provides a flexible way to update the exposure model given new information. For example, if the diameter of a segment is known with certainty, a value of unity can be assigned to the corresponding diameter branch in the logic tree, while zeros can be assigned to the remaining diameter branches. In this sense, the uncertainty in exposure model is epistemic in nature. Most importantly, expressing AAL via Equation 6 instead of Equation 5 enables conducting sensitivity analyses with realistic values of the exposure model (i.e., varying attributes only for segments in which nonzero probability exists for more than one branch).
Due to mutual exclusivity and collective exhaustivity, the default logic tree (Table 1) can be readily modified to improve computational efficiency when applying different vulnerability models for estimating RRs. For example, Figure 3 illustrates an example logic tree for a given pipeline segment where the hypothetical vulnerability model captures the effects from decade of installation, pipe material, and pipe diameter using only four different time intervals, two different materials, and four different ranges of diameter (i.e., a total of 32 branches). Instead of applying the vulnerability model to all 2,268 branches, one can equivalently first combine and reweigh the relevant branches before applying the vulnerability model.   . "Earthquake risk of gas pipelines in the conterminous United States and its sources of uncertainty." ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part A: Civil Engineering, in press.

Available information
In earthquake engineering of gas pipelines, there are essentially two categories of models for estimating potential seismic damage: empirical RR models (e.g., American Lifelines Alliance 2001; Eidinger 2020; Federal Emergency Management Agency 2012; O'Rourke et al. 2014) and numerical fragility models (e.g., Ashrafi et al. 2019;Jahangiri and Shakib 2018;Lee et al. 2016;Tsinidis et al. 2020b;Yoon, Lee, and Jung 2019). Empirical RR models are typically developed from observed damage to pipelines in past earthquakes and yield the average number of repairs per length of pipe, ( , ). Given strong ground shaking, common practice is to assume that 80% of repairs corresponds to leaks, whereas 20% of repairs corresponds to breaks (Federal Emergency Management Agency 2012; Pineda-Porras and Najafi 2010; De Risi et al. 2018) because repair records after earthquakes do not always provide enough information to distinguish between different damage states (O'Rourke et al. 2014). Therefore, the RR functions in Equation 6 can be expressed as ̃( , , ) =̃( , , ) ⋅ 0.8 and ̃( , , ) =̃( , , ) ⋅ 0.2. In contrast, numerical fragility models are typically developed from finite element simulations of gas pipelines subjected to a range of conditions and yield probabilities of damage state exceedance, Pr( ≥ | , ). While exceedance probabilities can be readily used to obtain the occurrence probability for a given damage state, Pr( = | , ), these probabilities do not directly provide the expected number of damage state occurrences along a pipeline. One way of potentially applying fragility models for regional seismic risk assessments is to model the occurrence of damage along the pipeline as a binomial process using segments of length equal to that used in the development of the fragility model. For example, 1-km-long pipelines were studied in Jahangiri and Shakib (2018), and, hence, the RR functions in Equation 6 might alternatively be expressed as ̃( , , ) = Pr( = | , , ) ÷ 1 km and ̃( , , ) = Pr( = | , , ) ÷ 1 km; however, this approach implies an upper limit of one repair per kilometer, which might be reasonable for some but not all values of PGV (e.g., Table 3 in O' Rourke and Palmer 1996).
There are both advantages and disadvantages to empirical RR models. For example, they are commonly used in regional seismic risk assessments of pipelines, including gas pipelines (Ameri and van de Lindt 2019; Esposito et al. 2013Esposito et al. , 2015Gehl et al. 2014;Jahangiri and Shakib 2018;Mousavi, Hesari, and Azarbakht 2014;De Risi et al. 2018;Shabarchin and Tesfamariam 2017). Furthermore, they are developed empirically from observed damage in past earthquakes. However, such data are almost exclusively from water pipelines, and, hence, the resulting models may not be directly applicable to gas pipelines (Honegger and Wijewickreme 2013;Karamanos et al. 2017;Tsinidis et al. 2019). Additionally, empirical RR models do not directly distinguish between different degrees of seismic damage (e.g., leak vs. break) and cannot be used for strainor stress-based design of gas pipelines.
Similarly, there are both advantages and disadvantages to numerical fragility models. For example, they are directly applicable to gas pipelines (e.g., representative joint types, grades of steel, wall thicknesses, operating pressures) and distinguish directly between different degrees of seismic damage. Furthermore, they typically capture a variety of input pipeline attributes that may impact seismic performance. However, numerical fragility models may not be directly applicable to regional seismic risk assessments because the key output is exceedance probability instead of RR. Strictly speaking, while numerical fragility models are directly applicable to gas pipelines, . "Earthquake risk of gas pipelines in the conterminous United States and its sources of uncertainty." ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part A: Civil Engineering, in press. each model is applicable only to the specific conditions for which the model was developed (e.g., in the case of Tsinidis et al. (2020b), buried gas transmission pipelines that cross locations of vertical geotechnical discontinuities).
Given this context, we focus on two empirical RR models that have been accepted by the profession for regional seismic risk assessment of gas pipelines. The first model is the one that has been adopted into the Hazus methodology (Esposito et al. 2015;Federal Emergency Management Agency 2012;Mousavi, Hesari, and Azarbakht 2014;O'Rourke and Ayala 1993). In this model, the estimated RR is a function of pipeline ductility in addition to PGV. A "ductile" pipeline is defined in Hazus as one that is made of ductile material (e.g., steel with arc-welded joints), whereas other materials, including steel with gas-welded joints, are classified as "brittle" (page 8-17 in Federal Emergency Management Agency 2012). The second model is the "X grade" version of the model by the American Lifelines Alliance (ALA) 2001 (page 42 in American Lifelines Alliance 2001). In this model, the RR function is based on the cast iron damage algorithm for unknown soil conditions but scaled by 0.01 to reflect the general quality controls and design procedures that are commonly used for gas pipelines. This X grade version is a function of PGV only and differs from the "Welded steel" versions in Table 4-5 of the same reference, which are intended for construction characteristics that are typical of water pipelines (American Lifelines Alliance 2001).
The preceding two empirical RR models are illustrated in Figure 4. While the ductile curve from Hazus is lower than the brittle curve, it greatly exceeds the X grade curve from ALA. This level of difference highlights relatively large uncertainty in the estimated RR, even when both the level of PGV and the type of joint (e.g., not gas-welded) are known.
Figure 4 -Comparison of available repair rate curves. The three shaded regions correspond respectively to 2.5 and 97.5 percentiles of the repair rate curve for "Hazus brittle," "Hazus ductile," and "ALA X grade." . "Earthquake risk of gas pipelines in the conterminous United States and its sources of uncertainty." ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part A: Civil Engineering, in press.
Page 13 of 42 Empirical evidence for gas transmission pipelines suggests that both vulnerability models may be plausible. On the one hand, the Hazus model seems plausible because relatively high RRs have indeed been observed for gas transmission pipelines in past earthquakes. For example, Table  3 in O' Rourke and Palmer (1996) shows that in areas with no reported permanent ground deformation during the 1994 Northridge earthquake, a RR of 0.27 repairs per kilometer was observed for Line No. 1001, which was installed in 1925, joined by oxy-acetylene welds, and damaged by transient ground deformations (Honegger 2000). Similarly, the same table shows that in the 1971 San Fernando earthquake, a RR of 1.03 repairs per kilometer was observed for Line No. 102.90, which was installed in 1920-1921 and joined by oxy-acetylene welds. Using the USGS ShakeMaps (Wald et al. 2005) for these two earthquakes as well as the approximate locations of these repairs depicted in Figures 3 and 6 of O' Rourke and Palmer (1996), a PGV range of 20 to 50 cm/sec was estimated for the RR of 0.27, and a PGV range of 70 to 220 cm/sec was estimated for the RR of 1.03, lending plausibility to both curves from Hazus.
On the other hand, the ALA X grade model also seems plausible because gas transmission pipelines have rarely been damaged by strong ground shaking in past earthquakes. For example, Table 3 in O' Rourke and Palmer (1996) shows one instance of no damage in the 1933 Long Beach earthquake, five instances of no damage in the 1952 and 1954 Kern County earthquakes, one instance of no damage in the 1971 San Fernando earthquake, and three instances of no damage in the 1994 Northridge earthquake. Furthermore, unlike the gas distribution system (Honegger 1998), the gas transmission system in San Francisco was virtually undamaged during the 1989 Loma Prieta earthquake (page 140 in National Research Council 1994). Similarly, no leaks were found for gas transmission pipelines after the 2019 Ridgecrest earthquake sequence (Jacobson 2019). These data lend plausibility to the X grade curve from ALA.
As noted on page 494 of O' Rourke and Palmer (1996), in which the authors reviewed the earthquake performance of gas transmission pipelines over a period of 61 years, the higher incidence of oxy-acetylene weld damage is associated with the construction quality (e.g., poor root penetration, lack of good fusion between pipe and weld), rather than the type of weld (e.g., oxyacetylene weld vs. electric arc weld), though the two aspects are related (e.g., administering welds with an oxy-acetylene torch may cause poor construction quality). Consequently, we reviewed the literature on construction practices, revealing several key milestones that may impact the construction quality of gas transmission pipelines. For example, shielded-metal-arc welding started becoming the standard approach for joining pipes in the field around the 1930s, replacing acetylene girth welds (page 29 in Kiefner and Rosenfeld 2012). Later, the time period of 1949 to 1962 was characterized by huge growth in pipe manufacturing, with new processes being developed and implemented. In 1955, the ASME standard B31 was revised, and for the first time, this revised ASME standard required recordkeeping of welding procedure qualification tests (page 13 in Rosenfeld and Gailing 2013) and hydrostatic pressure testing of pipelines before their operation; however, no duration of the pressure testing was specified at that time (Jacobs Consultancy Inc. and Gas Transmission Systems Inc. 2013). In 1961, the CA Public Utilities Commission established state regulations (General Order 112) that required hydrostatic pressure testing of newly constructed pipelines for at least 1 hour (Rosenfeld and Gailing 2013). In 1970, the federal safety regulations for gas transmission pipelines were issued, setting minimum requirements for designing, constructing, operating, and maintaining gas transmission pipelines across the nation. These minimum requirements include additional recordkeeping beyond that for welding and hydrostatic pressure testing of newly constructed pipelines for at least 8 hours. In . "Earthquake risk of gas pipelines in the conterminous United States and its sources of uncertainty." ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part A: Civil Engineering, in press. Page 14 of 42 summary, pipelines constructed since 1970 represent state of the art in metallurgy of steel, pipe mill practices, and construction techniques (Kiefner and Trench 2001).
Given the plausibility of both vulnerability models and the preceding brief historical review of construction practices for gas transmission pipelines, we propose a third model that uses the pipeline segment's decade of installation to combine existing models. Specifically, the brittle curve from Hazus is used for segments of pre-1940 vintage, whereas the X grade curve from ALA is used for segments that have been installed in 1970 or later; for the decades of installation between 1940 and 1969, the ductile curve from Hazus is used. When applying this proposed "Mixed" model, = 3 logic tree branches were considered for each pipeline segment to capture the decade of installation. Assuming 80% of repairs as leaks and 20% as breaks, these decadedependent RR curves serve as best estimates of ̃( , , ) and ̃( , , ) in Equation 6. In passing, we note that a segment's decade of installation is used as a proxy for quality of construction, not to convey that vulnerability depends on the segment's age (Kiefner and Rosenfeld 2012).

Uncertainty analysis
The RR for leaks of a given pipeline segment , with known PGV and known exposure, is denoted by ̃( , , ) in Equation 6. The overall uncertainty in this RR arises from three underlying sources of uncertainty. First, it arises primarily from the functional relationship between the estimated RR and the input variables (i.e., PGV, ), or "model uncertainty" (Kiureghian and Ditlevsen 2009). Since RRs are typically estimated from a regression model, this model uncertainty further consists of two components: uncertainty due to omitting input variables (e.g., omitting properties of surrounding soil when defining for the vulnerability model) and uncertainty due to potentially inaccurate functional form (e.g., assuming a power law between PGV and RR). This model uncertainty can be viewed as partly epistemic because it can be reduced by employing more accurate functional forms in future models, but it can also be viewed as partly aleatory because our state of scientific knowledge may not allow refinement in functional form. Regardless of the categorization, this model uncertainty contributes to uncertainty in ̃( , , ), which propagates to uncertainty in the final AAL estimate.
Second, the overall uncertainty in ̃( , , ) also arises from "parameter" (or statistical) uncertainty (Kiureghian and Ditlevsen 2009). For example, even when the relevant input variables have been included and the functional form is known, uncertainty still exists in estimating the parameters for the functional form from limited data and this uncertainty propagates to uncertainty in the final AAL estimate. This parameter uncertainty can be viewed as epistemic because, in principle, the precision of parameter estimates increases with increasing data.
Third, the overall uncertainty in ̃( , , ) arises from the assumed distribution of repairs as leaks (e.g., Bonneau and O'Rourke,page 113). Even when the vulnerability model for estimating RRs, ̃( , , ), is known with certainty (i.e., all input variables have been included, the accurate functional form was used, parameter estimates are precise), the uncertainty in distributing the repairs as leaks (i.e., 80% given strong ground shaking) propagates to uncertainty in the final AAL estimate. The categorization of this third source of uncertainty depends on whether the modeler foresees the potential for a reduction, but more importantly, this source of uncertainty also contributes to uncertainty in the final AAL estimate. . "Earthquake risk of gas pipelines in the conterminous United States and its sources of uncertainty." ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part A: Civil Engineering, in press. Page 15 of 42 In this paper, the preceding three sources of uncertainty in ̃( , , ) are combined as a single uncertainty in 'specification of the vulnerability model,' denoted by lñ. For example, using ̃( , , ) =̃( , , ) ⋅ 0.8 is one modeling choice and using ̃( , , ) = Pr( = | , , ) ÷ 1 km is another. Furthermore, utilizing the version of a model that has been updated with additional empirical data constitutes yet another choice in specification of the vulnerability model. Finally, the model uncertainty consists of specifying the input pipeline attributes that are relevant and specifying a functional form, which are both additional modeling choices. While the preceding discussion focused on leaks to identify the three underlying sources of uncertainty, it applies equally to breaks.
Unfortunately, no information is available to directly quantify each of the preceding underlying sources of uncertainty. Therefore, to estimate the uncertainty in ̃( , , ) and in ̃( , , ), we assume a lognormal distribution for each RR. The corresponding medians are given by 80% and 20%, respectively, of the RR from the proposed Mixed vulnerability model. Furthermore, their logarithmic standard deviations, lñ and lñ, are both assumed equal to 1.15, which is the model uncertainty for the backbone vulnerability model in ALA (Table  4-4 in American Lifelines Alliance 2001). While this estimate of uncertainty may seem large, it is not unreasonably large because it excludes both parameter uncertainty and uncertainty in distributing repairs as leaks or breaks; moreover, it is within the range of estimates from other studies (see total uncertainties listed in Table D1 of Bellagamba et al. 2019). Most importantly, this relatively large uncertainty is adopted herein to reflect the fact that even when a segment's decade of installation is known, the resulting RR remains uncertain because it can conceivably be influenced by other input pipeline attributes (e.g., buried vs. aboveground, diameter to thickness ratio, friction at soil-pipe interface, operating pressure, temperature). Figure 4 illustrates this uncertainty using shaded regions that are defined by 2.5 and 97.5 percentiles of each empirical RR curve in the proposed Mixed model.

Available information
Modern gas transmission pipelines in the United States have performed relatively well during past earthquakes, even though they are not immune to permanent ground deformation (Lanzano et al. 2013;O'Rourke and Palmer 1996). For example, lack of damage has been observed in 11 earthquakes (Eidinger 2020), even though major damage to gas transmission pipelines has been observed in the 1971 San Fernando and 1994 Northridge earthquakes (Ariman 1983;O'Rourke and Palmer 1994); in the latter two earthquakes, repairs were observed in areas with reported permanent ground deformation as well as in areas without reported permanent ground deformation. Based on the subset of PHMSA incident reports since 2010 that correspond to earthquake-induced consequences, no fatalities or injuries have been observed; however, total costs on the order of millions of dollars have been documented (Pipeline and Hazardous Materials Safety Administration 2020a). As a result, we focused primarily on repair costs when quantifying earthquake risk using AAL.
After an earthquake occurs, the location and severity of defects in the gas network may not be known immediately, and, hence, restoration activities may include leak surveys, patrols, or integrity management engineering (Pacific Gas and Electric Company 2016). When a defect requiring a remedy is identified, an operator may decide to repair the defect or replace the segment . "Earthquake risk of gas pipelines in the conterminous United States and its sources of uncertainty." ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part A: Civil Engineering, in press. (Batisse 2008). If the defect is repairable, the repair process may involve activities such as excavation of soil and cleaning of pipe before applying sleeves, as well as testing and evaluation of pipeline after repair (Interstate Natural Gas Association of America 2016). If the damaged pipe segment needs to be replaced, the existing segment is removed and the new segment is recoated.
In earthquake engineering of gas pipelines, there are two stages for estimating the costs to repair a leak (break), ( ): (i) estimating the replacement value of the pipeline and (ii) estimating the damage ratios that correspond to the damage state of interest (Federal Emergency Management Agency 2012). Table 15.27 of the Hazus methodology provides best estimate damage ratios of 0.1 and 0.75 for leaks and breaks, respectively. Furthermore, the damage ratio for leaks could range from 0.05 to 0.2 while that for breaks could range from 0.5 to 1.0. However, these damage ratios are adapted from those for oil pipelines and more importantly, the replacement value for gas pipelines is a major source of uncertainty.
We attempted several approaches to estimate the replacement value for a kilometer of gas pipeline. For example, a regression model from the literature is available for estimating the construction cost for gas pipelines, herein referred to as Rui et al. 2011(Rui et al. 2011). The construction cost comprises four components (material, labor, securing right-of-way access, and miscellaneous) and varies with the length, cross-sectional area, and geographic region of the pipeline. We applied this regression model to all pipeline systems in the NPMS dataset and then normalized the resulting construction cost by the system length to estimate the replacement value per kilometer of pipe. Figure 5 presents the resulting distribution of replacement values. The figure shows relatively large uncertainty, with a logarithmic standard deviation of 0.714, because the replacement value varies with geographic region, among other factors (Rui et al. 2011). Independently, we also obtained replacement values per kilometer of pipe from FEMA, which vary by different states in the CONUS; these values are also shown in Figure 5. Collectively, these estimates suggest that on average, the replacement value per kilometer of gas pipeline is roughly $300,000 to $500,000. Furthermore, applying these geographic-region-dependent cost rates to the NPMS dataset suggests that the total economic value of onshore gas transmission pipelines in the CONUS is approximately $200 billion. Henceforth, we estimate repair costs using the regression model by Rui et al. 2011 because this model facilitates estimation of uncertainty in repair costs.  . "Earthquake risk of gas pipelines in the conterminous United States and its sources of uncertainty." ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part A: Civil Engineering, in press. Page 17 of 42

Uncertainty analysis
As illustrated by Equation 6, the uncertainties in the repair costs, ̃ and ̃, also contribute to the overall uncertainty in the resulting AAL estimates. We modeled these two sources of uncertainty by quantifying the plausible range for these two types of repair costs. Specifically, the normalized replacement values in Figure 5 were first grouped by six geographic regions of the CONUS, where the regions are defined by the U.S. Energy Information Administration and are identical to those in Rui et al. (2011). Then, for each region, the median was multiplied by a damage ratio of 0.1 to obtain the best estimate of cost to repair a leak, ̃. As shown in Figure  6 with solid lines, these best estimates of ̃ vary with geographic region and range from approximately $30,000 to $70,000 per leak. Similarly, the median of the replacement values was multiplied by a damage ratio of 0.75 to obtain the best estimate of cost to repair a break, ̃. As shown in Figure 7 with solid lines, these best estimates of ̃ also vary with geographic region and range from approximately $200,000 to $600,000 per break. The best estimates of ̃ and ̃ were quantified using the median because the distribution of normalized replacement value is skewed, which is evident from the shape of the histogram in Figure 5. To capture a 'low' estimate of ̃ and ̃ for a given geographic region, the 2.5 percentile replacement value was multiplied by a damage ratio of 0.05 and 0.5, respectively. To capture a 'high' estimate of ̃ and ̃ for a given geographic region, the 97.5 percentile replacement value was multiplied by a damage ratio of 0.2 and 1.0, respectively. The low and high estimates of ̃ are shown as dashed lines in Figure 6, whereas those for . "Earthquake risk of gas pipelines in the conterminous United States and its sources of uncertainty." ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part A: Civil Engineering, in press.
Page 18 of 42 ̃ are shown as dashed lines in Figure 7. These results indicate a relatively large uncertainty in precisely estimating both ̃ and ̃. To examine the reasonableness of the estimated repair costs, we also present repair costs from two alternative sources of data. In the first case, the normalized replacement values from FEMA are grouped by the same geographic regions and multiplied by the best estimate damage ratios of 0.1 and 0.75 for leaks and breaks, respectively. Except for the Central region, these FEMA estimates are slightly lower than those estimated using the regression model from Rui et al. (2011). In the second case, the industry averages from the Interstate Natural Gas Association of America are also superimposed in the figures; they are lower than the best estimates from the other two approaches because the industry averages exclude costs associated with excavation, recoating, and post-repair evaluation of pipelines (Interstate Natural Gas Association of America 2016).

Risk estimates from individual vulnerability models
To develop a better understanding of the risk estimates, we isolated each of the three vulnerability models' influence; consideration of additional models can be found in the appendix. For each model, we implemented Equation 6 to quantify the AAL for every county Ω within the CONUS. Specifically, we used the 30 -adjusted hazard curves from the USGS 2018 NSHM (Figure 1) to estimate ̂( ), the variation of default logic trees (Table 1 and Figure 3) to quantify ̂, , and the best-estimate repair costs ( Figure 6 and Figure 7) to quantify ̃ and ̃. Using the proposed Mixed vulnerability model that varies with decade of installation ( Figure 4) to estimate ̃( , , ) and ̃( , , ), the resulting geographic distribution . "Earthquake risk of gas pipelines in the conterminous United States and its sources of uncertainty." ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part A: Civil Engineering, in press. of nationwide AAL by county is portrayed in Figure 8. Most of the CONUS corresponds to low risk (0.1% or less); the counties with very low risk (0%) correspond to either no damage from strong ground shaking or lack of gas transmission pipelines within the county (but distribution lines can still exist within these areas). In the other extreme, CA corresponds to the highest risk, which is expected because CA experiences a higher frequency of earthquakes than other parts of the CONUS (Petersen et al. 2019). This result for CA is especially noteworthy because among all states in the CONUS, the largest share of total HCA mileage exists in CA. In between these extremes, medium levels of risk are observed in several states in the western United States and in the NMSZ (i.e., IL, KY, TN, MS, AR, MO).
A slightly different geographic distribution of risk is observed when the nationwide seismic risk assessment was repeated separately with the other two vulnerability models. For example, implementing Equation 6 with the same best estimates of hazard, exposure, and consequence but using the X grade model from ALA yields the geographic distribution shown in Figure 9. Under this assumed vulnerability model, most of the CONUS again corresponds to low risk, whereas CA again corresponds to the highest risk. However, the risk is now lower in several places such as the NMSZ, SC, and OR; furthermore, the risk is now higher in NV, AZ, WY, NM, and TX. The difference between the two maps highlights the influence from the considered vulnerability models.
Since CA consistently corresponds to the highest risk within the CONUS and since multiple research efforts are underway to advance understanding of the earthquake threat to CA gas pipelines (e.g., Eidinger 2020), we present the nationwide risk from all three vulnerability models in Table 2 separately for CA and the remaining CONUS. For each vulnerability model, we also present the average annual number of leaks, , and the average annual number of breaks, , which are based on aggregating results using Equations 1 and 2, respectively. To facilitate interpretation, we normalize the average annual number of leaks and breaks by the total mileage within each geographic region.
Regardless of leaks, breaks, or total repair costs, the X grade model yields the lowest risk while the Hazus model yields the highest risk. Note that the zero values for X grade are rounded values since the resulting rates and AALs are nonzero. Applying the X grade model to all pipeline segments essentially assumes that all are constructed with the same joint type and quality of construction because the X grade model depends on PGV only (i.e., = 1 in Equation 6). However, this assumption contradicts the fact that not all the pipelines are of the same joint type and construction quality. In contrast, applying the Hazus model to all pipeline segments required = 2 logic tree branches for each pipeline segment to capture pipeline ductility. The corresponding branch probabilities for ductility were determined by combining the default branch probabilities for both pipe material and decade of installation (Table 1). Since 99.5% of the total pipeline mileage is made of steel and 96.2% was installed 1940 or later, the estimates from this separate application of the Hazus model are approximately equal to those obtained by applying the ductile curve from Hazus to all pipeline segments, which was used as a check of the results shown in Table 2. . "Earthquake risk of gas pipelines in the conterminous United States and its sources of uncertainty." ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part A: Civil Engineering, in press. Page 20 of 42   . "Earthquake risk of gas pipelines in the conterminous United States and its sources of uncertainty." ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part A: Civil Engineering, in press.  Figure 4); = average annual number of leaks; = average annual number of breaks; AAL = average annual loss. *In this paper, a "leak" is a modest damage mode that can be repaired, whereas a "break" is a more catastrophic damage mode that requires pipe replacement. The AAL estimates include only costs associated with physical repairs of damage modes; they exclude costs associated with the amount of released gas, duration of service disruption, or other indirect economic losses.
As discussed previously in the Vulnerability section, the proposed Mixed model captures both the X grade and Hazus models using each segment's decade of installation as a proxy for quality of construction. Therefore, unsurprisingly, the risk estimates from the proposed Mixed model fall somewhere in between those from X grade and Hazus. For example, the AAL estimate for CA from the Mixed model is about $4 million, which lies between the estimates from X grade ($0.1 million) and Hazus ($6 million) models. Finally, while the risk estimates may vary appreciably depending on the assumed vulnerability model, similar features are observed in the relative geospatial distribution of risk across the different vulnerability models; additional results for other models can be found in the appendix. This observation is consistent with preliminary findings (Kwong et al.) and suggests that maps of relative risk may still be helpful despite relatively large uncertainties associated with estimating potential seismic damage of gas pipelines; however, more research is needed to confirm or deny this finding.
While separately applying each vulnerability model develops an understanding of each model's influence on the resulting risk, this approach does not fully capture the uncertainty in the AAL estimate arising from specification of a vulnerability model because the models are not collectively exhaustive. For example, if only one vulnerability model were available to estimate RRs, using only one model in the preceding approach will erroneously convey lack of uncertainty from the choice of vulnerability model. Furthermore, the vulnerability models are not mutually exclusive because the empirical data used for developing the Hazus model are a subset of the data used for developing the X grade model. Alternatively, estimates of the uncertainties in ̃( , , ) and in ̃( , , ) will be used in the next subsection to more fully capture the uncertainty in the AAL estimate arising from specification of a vulnerability model. . "Earthquake risk of gas pipelines in the conterminous United States and its sources of uncertainty." ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part A: Civil Engineering, in press. Page 22 of 42 Despite the preceding caveats, documenting nationwide estimates of earthquake risk enables comparison against other risk assessment efforts, encourages more transparent deliberation regarding alternative approaches, and facilitates decisions on potentially assessing localized risks that require more spatially accurate data. Considering all three vulnerability models, Table 2 documents that for CA, the leak rate varies from 4 × 10 −5 year-km to 2 × 10 −3 year-km, the break rate varies from 9 × 10 −6 year-km to 6 × 10 −4 year-km, and the AAL varies from $0.1 million to $6 million. For the remaining CONUS, the leak rate varies from 1 × 10 −6 year-km to 4 × 10 −5 year-km, the break rate varies from 3 × 10 −7 year-km to 1 × 10 −5 year-km, and the AAL varies from $0.07 million to $3 million.
To provide additional context for readers, we also utilized PHMSA data to document "background rates" (see Table 5-4 in Eidinger 2020), which capture the relative frequency of gas pipeline damage in a given year from a variety of causes. In 2018, the background rate of "leaks" (defined per Pipeline and Hazardous Materials Safety Administration 2020b) for gas transmission pipelines in the United States is approximately 2.9 × 10 −3 leaks per year-km, and the background rate of "significant incidents" (defined per 49 C.F.R. § 191.3) is approximately 1.2 × 10 −4 incidents per year-km. Our long-term seismic risk estimates for the entire CONUS (from 3 × 10 −6 year-km to 1 × 10 −4 year-km for leaks and from 6 × 10 −7 year-km to 3 × 10 −5 year-km for breaks) are lower than the background rates, which is expected because the latter are due to a variety of threats that may or may not include earthquakes. We note that in such comparisons, background rates are for a given year, whereas large earthquakes occur infrequently over the span of many years.
To better understand the relative influence on risk from variations in these six inputs, we conducted a sensitivity analysis (e.g., Appendix A in Porter 2017) in the context of estimating AAL for the CONUS. First, to capture the plausible range of possibilities for each input, we defined three levels: low, best estimate, and high. Second, we used the best estimates for all six inputs to develop a "baseline" estimate of the nationwide AAL. Third, for each of the six inputs, we used the low and high estimates to quantify the resulting impact on the AAL estimate (i.e., "swing") while keeping the other five inputs unchanged. Finally, the inputs were sorted in order of decreasing swing.
In more detail, this sensitivity analysis required 12 additional implementations of Equation  6 for the CONUS. For the input of hazard curve ̂( ), the best estimate corresponds to hazard curves from the USGS 2018 NSHM (Figure 1), whereas the low (high) estimate corresponds to the 2.5 (97.5) percentile hazard curves. For the input of branch probability corresponding to decade of installation ̂, , the best estimate corresponds to the distribution of mileage from PHMSA 2018 annual reports, whereas the low (high) estimate corresponds to post-1970 (pre-1940); however, . "Earthquake risk of gas pipelines in the conterminous United States and its sources of uncertainty." ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part A: Civil Engineering, in press. the decade of installation for a given segment is varied only within its nonzero branches to maintain plausible heterogeneity in the exposure model. For the input of leak RR ̃( , , ), the best estimate corresponds to 80% of the median RR curve that depends on the segment's decade of installation (Figure 4), whereas the low (high) estimate corresponds to 80% of the 2.5 (97.5) percentile RR curve. For the input of break RR ̃( , , ), the best estimate corresponds to 20% of the median RR curve that depends on the segment's decade of installation (Figure 4), whereas the low (high) estimate corresponds to 20% of the 2.5 (97.5) percentile RR curve. For the input of leak cost ̃, the best estimate corresponds to the region-dependent median value in Figure 6, whereas the low (high) estimate corresponds to the region-dependent 2.5 (97.5) percentile. Finally, for the input of break cost ̃, the best estimate corresponds to the regiondependent median value in Figure 7, whereas the low (high) estimate corresponds to the regiondependent 2.5 (97.5) percentile. In summary, care was taken to utilize segment-dependent estimates that are plausible in the AAL calculation while simultaneously ensuring that comparable ranges were employed for uncertainty in each of the six key inputs.
Results from the sensitivity analysis are summarized in a 'tornado diagram' (Figure 10). The baseline estimate of AAL for the CONUS is about $5.5 million, which is the sum of the AAL estimates from the Mixed model in Table 2. The figure shows that the break RR is the most influential input because it results in the largest swing. Specifically, assuming a high estimate for ̃( , , ) only (i.e., the median RR curves are still used for leaks) leads to an AAL estimate of $36 million. The second and third most influential inputs are decade of installation and hazard curve, respectively. As the decade of installation for segments with uncertain exposure varies from post-1970 to pre-1940, the brittle curve from Hazus is used instead of the X grade curve from ALA (Figure 4), and, hence, the resulting AAL estimate increases from $0.2 million to $22 million. This uncertainty in the AAL estimate can be reduced by decreasing the uncertainty in decade of installation for each pipeline segment (e.g., PHMSA is planning to collect data on decade of installation in Phase 2 of information collection activities for the NPMS). Finally, as the hazard curves for all segments increase from the 2.5 percentile curves to the 97.5 percentile curves, the resulting AAL estimate increases from $0.7 million to $21 million. refers to hazard curves from the USGS 2018 NSHM (Figure 1), logic trees for decade of installation from PHMSA 2018 annual reports, median RR curves from the proposed Mixed . "Earthquake risk of gas pipelines in the conterminous United States and its sources of uncertainty." ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part A: Civil Engineering, in press. vulnerability model that varies with each segment's decade of installation (Figure 4), and median repair costs from application of regression model after Rui et al. (2011) (Figure 6, Figure 7). The vulnerability model (i.e., estimating potential seismic damage of gas pipelines) is the most influential source of uncertainty because it dictates the relative influence of the remaining sources of uncertainty ( Figure 10). To explore this hypothesis, we conducted two additional sensitivity analyses for the County of Los Angeles, CA, in which the uncertainty in vulnerability model is removed. More precisely, conditioned upon a vulnerability model, uncertainty in the AAL estimate from Equation 6 arises from uncertainties only in the following four remaining inputs: ̂( ), ̂, , ̃, and ̃.
In the first additional sensitivity analysis, we condition upon the X grade model. Because this model depends on PGV only (i.e., no uncertainty in ̂, ), the remaining uncertainty in the AAL estimate for the county comes from the hazard curves and repair costs. Figure 11 presents the results from systematically varying the remaining three inputs and quantifying the resulting influence on the AAL estimate. By conditioning upon the X grade vulnerability model, the cost to repair a break now becomes the most influential input, compared to the ranking of break cost in Figure 10, where the RRs were uncertain. In the second additional sensitivity analysis, we conditioned upon the fragility model by Tsinidis et al. (2020b). In this model, the fragility depends on properties of the surrounding soil (i.e., trench type, depth to bedrock, degree of contrast between two adjacent soil sub-deposits in a vertical geotechnical discontinuity, and soil cohesiveness) in addition to other input pipeline attributes (i.e., burial depth, grade of steel, and pipe diameter), resulting in a total of seven elements in the vector . The total number of branches for a given segment now becomes = 1,296 branches, and the corresponding branch probabilities were estimated from PHMSA annual reports (Pipeline and Hazardous Materials Safety Administration 2020b), USGS National Crustal Model (Boyd 2020), and PHMSA incident reports (Pipeline and Hazardous Materials Safety Administration 2020a) (more details are provided in the appendix). Since the output is exceedance probability, we apply the fragility model by expressing ̃( , , ) = Pr( = | , , ) ÷ 1 km and ̃( , , ) = Pr( = | , , ) ÷ 1 km, where ULS and GCLS denote ultimate limit state and global collapse limit state (Tsinidis et al. 2020b), respectively, and , now refers to many more pipeline attributes for the th branch of segment . Unlike the X grade model, the remaining uncertainty in the AAL estimate for the county now comes from the estimated branch probabilities ̂, , which correspond to seven input pipeline attributes, in addition to the hazard curve ̂( ) and repair costs, ̃, and ̃. Figure 12 -Example sensitivity analysis of estimating AAL in the County of Los Angeles, CA, conditioned upon the fragility model by Tsinidis et al. (2020b). Figure 12 presents the results from systematically varying these 10 uncertain inputs and quantifying the resulting influence on the AAL estimate. By conditioning upon this fragility model, the cost to repair a break and the hazard curve still remain the top two most influential inputs as in the previous case with the X grade model ( Figure 11); however, trench type (i.e., compaction level of backfill) and soil pair (i.e., type of vertical geotechnical discontinuity) now become more influential than the cost to repair a leak. (No influence from steel grade is shown in the figure because geospatial interpolation of data from PHMSA incident reports suggests that all segments in this county have specified minimum yield strength less than 414 MPa (60 ksi) and at the same time "X60" was the lowest steel grade considered in the fragility model.) This illustrative example highlights the importance of developing more vulnerability models that are directly applicable to regional seismic risk assessments of gas pipelines, not only for estimating potential seismic damage but also for identifying the input pipeline attributes that can be important.
Collectively, these sensitivity analyses suggest that the vulnerability model is the most influential source of uncertainty because it dictates the relative influence of the remaining sources of uncertainty. However, the rankings for these sensitivity analyses can vary for different estimates of the underlying uncertainties, and, hence, future work should improve such estimates. For example, more accurate estimates of epistemic uncertainties in hazard (Figure 1) can move the horizontal bar for 'hazard curve' in the tornado diagrams either up or down.

Limitations and research needs
While this paper offers insights into earthquake risk and its uncertainty for gas pipelines at the national scale, it is prudent to reiterate the scope and highlight critical limitations. First, our risk estimates are not a substitute for localized seismic risk assessments with more spatially accurate data; instead, our risk estimates are intended to facilitate decisions on research needs and whether . "Earthquake risk of gas pipelines in the conterminous United States and its sources of uncertainty." ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part A: Civil Engineering, in press. more detailed analyses are warranted. Our seismic risk assessment is similar to the first of two phases as described in (American Lifelines Alliance 2005). If more detailed analyses are warranted, earthquake scenarios can be used to examine aspects such as spatial correlation of ground motions, potential service disruption from ground failures, or potential disproportionate impacts on marginalized communities (Davis, Mostafavi, and Wang 2018;Eguchi and Taylor 1987;De Risi et al. 2018). Second, the presented earthquake risk is due to impacts from strong ground shaking only; it does not account for the possibility of higher risk from ground failures, even for post-1945 shielded electric arc steel pipelines (Baum, Galloway, and Harp 2008;O'Rourke and Palmer 1996). Third, the results described in this paper apply only to onshore gas transmission pipelines in the CONUS; risk from impacts on compressor stations, storage facilities, or distribution lines are beyond the scope of the paper. Fourth, the AAL captures only the costs required to repair leaks and breaks; it excludes costs associated with the release of gas, service disruption, or other indirect economic losses (American Lifelines Alliance 2005;Eguchi 1995;Mousavi, Hesari, and Azarbakht 2014), which could greatly exceed costs associated only with physical repairs. However, given appropriate caveats, Equation 6 may provide a framework to incorporate such costs if available; e.g., an estimate of cost associated with service disruption from a leak could be substituted into ̃. Finally, our estimates for the various sources of underlying uncertainties can be further improved, which may change the tornado diagrams' rankings.
To outline a path for potentially reducing uncertainties, we highlight major research needs. First, more vulnerability models that are directly applicable to regional seismic risk assessment of gas pipelines need to be developed. For example, more research on experimental testing (e.g., O'Rourke 2010; Sextos et al. 2018;Stewart et al. 2015) and finite element simulations (e.g., Ni, Mangalathu, and Liu 2020;Ni, Mangalathu, and Yi 2018;Tsinidis et al. 2020b) would help close this knowledge gap. Second, more research is needed to identify, prioritize, and measure pipeline attributes that are important for estimating potential seismic damage of gas pipelines. For example, while our sensitivity analyses offered some insight into this challenge, the analyses did not capture other potentially important pipeline attributes such as buried vs. aboveground pipelines. Third, more research is needed to quantify seismic hazards at the national scale for both ground failures and ground shaking.

Conclusion
We quantified the earthquake risk for onshore gas transmission pipelines in the CONUS due to strong ground shaking. Specifically, we integrated 30 -specific hazard curves, a logic tree-based exposure model, three different vulnerability models, and a consequence model for estimating costs required to repair leaks and breaks along gas pipelines. We found that: 1. The earthquake risk in CA is consistently the highest, whereas the earthquake risk in the rest of the CONUS is low, irrespective of the conditioned vulnerability model; 2. In CA, the leak rate varied from 4 × 10 −5 year-km to 2 × 10 −3 year-km, the break rate varied from 9 × 10 −6 year-km to 6 × 10 −4 year-km, and the AAL varied from $0.1 million to $6 million; and These risk estimates enable comparison against other risk assessment efforts, encourage more transparent deliberation regarding alternative approaches, and facilitate decisions on potentially assessing localized risks due to ground failures that require more spatially accurate data. Furthermore, we identified and estimated different sources of uncertainty in the seismic risk assessment. Based on the uncertainties approximated herein, the resulting sensitivity analyses suggest that the vulnerability model is the most influential source of uncertainty because it dictates the relative influence of the remaining sources of uncertainty. For example, a vulnerability model that includes properties of surrounding soil as input pipeline attributes can reveal that the uncertainty in compaction level of trench backfill is more influential than that in the cost to repair a leak. However, the relative influence can vary for different estimates of the underlying uncertainties and, hence, future work should improve such estimates.
As a first stage in a systematic nationwide assessment, our study illustrated the complexity of quantifying seismic risk to gas pipelines because even without ground failures, many different types of uncertainties exist. Considering the scope and limitations, our study highlighted major research needs such as: 1. Developing more vulnerability models that are directly applicable to regional seismic risk assessment of gas pipelines; 2. Identifying, prioritizing, and measuring input pipeline attributes that are important for estimation of seismic damage; and 3. Better quantifying seismic hazards and their uncertainties at the national scale, for both ground failures and ground shaking.

Appendix
Besides results for the three vulnerability models considered in the main body of this paper, results for additional models are presented in this appendix. Table 3 summarizes the additional damage models considered. While ALA was derived for water pipelines, it is commonly used in regional seismic risk assessments of gas pipelines (Gehl et al. 2014). The L2013 model is used in the gas industry (Mangold and Huntley 2016), and the T2020 model captures effects of soils surrounding the pipeline, though both models provide exceedance probabilities instead of RRs. The E2020 model provides RRs that are specific to gas pipelines in CA; the backbone model was used when the operator is neither Pacific Gas and Electric Company (PG&E) nor Southern California Gas Company (SoCalGas).  (Lanzano et al. 2013); T2020 = Tsinidis et al. 2020 (Tsinidis et al. 2020b); RS1 = risk state 1 (i.e., leak); RS2 = risk state 2 (i.e., break); OLS = operable limit state, PILS = pressure integrity limit state, ULS = ultimate limit state (i.e., leak), GCLS = global collapse limit state (i.e., break); E2020 = Eidinger 2020 (Eidinger 2020  . "Earthquake risk of gas pipelines in the conterminous United States and its sources of uncertainty." ASCE-ASME Journal of Risk  Table 4 summarizes the variation of default logic trees for isolating the influence of the ALA model on earthquake risk. Two different diameter ranges were considered in the ALA model, resulting in two different branches for the node corresponding to pipe diameter; corresponding branch probabilities were obtained by summing the probabilities for the default branches of pipe diameter. A subset of relevant pipe materials was selected as logic tree branches; the corresponding probabilities were obtained by summing the probabilities for the default branches of steel. To capture the effect of corrosion on earthquake risk (Section 4.4.4 in American Lifelines Alliance 2001), the branch probabilities for corrosion were obtained by summing the probabilities for the default branches of steel and decade of installation.  (2001). Total of 2×2×3=12 branches with branch probabilities that vary per pipeline segment. *When material is non-steel, "backbone" version is used, irrespective of diameter and corrosion. ** When material is steel and diameter is large, "large steel" version is used, irrespective of corrosion. ***Probability of "corrosive" corresponds to probability of "cathodically unprotected bare steel" and "installed before 1970," whereas probability of "non-corrosive" corresponds to probability of "steel with some corrosion protection" and "installed 1990 or later."  . "Earthquake risk of gas pipelines in the conterminous United States and its sources of uncertainty." ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part A: Civil Engineering, in press. Table 5 summarizes the variation of default logic trees for isolating the influence of the L2013 model on earthquake risk. Two different materials were considered: (i) diameter-specific version of the model was used for steel pipes, whereas (ii) the "backbone" version was used for non-steel pipes (e.g., plastic). Furthermore, two different diameter ranges were considered in the L2013 model, resulting in two different branches for the node corresponding to pipe diameter. Like other damage models, the branch probabilities for each node were obtained by summing the probabilities for the corresponding default branches.

Input pipeline attribute
Table 5 -Variation of default logic trees for the L2013 model by Lanzano et al. (2013). Total of 2×2=4 branches with branch probabilities that vary per pipeline segment. *For non-steel material (e.g., plastic), "backbone" version is used, regardless of diameter.  Table 6 summarizes the variation of default logic trees for the T2020 model. The T2020 model dictated the possibilities for each input pipeline attribute. For example, six different diameters were considered therein, resulting in six different branches for the diameter node. As another example, three different soil pairs were considered in the T2020 model, resulting in three different branches for the corresponding node. To capture different soil pairs for different segments, we first determined the soil class for a given segment using its 30 ; in the T2020 paper (Tsinidis et al. 2020b), soil class A refers to soft soil ( 30 <360 m/sec), whereas soil class C refers to stiff soil ( 30 >800 m/sec). Then, we used the worst-case scenario between the segment's soil class and soil classes from all adjacent segments within a 1.6-km (1-mile) radius to uniquely assign a soil pair. Table 6 -Variation of default logic trees for the T2020 model by Tsinidis et al. (2020b). Total of 6×3×2×2×3×2×3=1,296 branches with branch probabilities that vary per pipeline segment. *For diameter, used following diameter thresholds (mm) to assign mileage distributions from PHMSA annual reports to branch probabilities: 457, 635, 838, 991, and 1143. **For trench type, a depth of 1.5 m below the ground surface was chosen to evaluate shear modulus and estimate compaction level of backfill soils above buried pipelines because all pipeline models in T2020 were below a surficial layer of soil with thickness of 3.0 m. ***For soil pair, assumed each segment corresponds to a unique pair of soils; VS30 thresholds from Table 3.1 in Eurocode (CEN (European Committee for Standardization) 2004).

Input pipeline attribute
No. of branches Branch possibilities Basis for branch probabilities . "Earthquake risk of gas pipelines in the conterminous United States and its sources of uncertainty." ASCE-ASME Journal of Risk Besides PHMSA annual reports (Pipeline and Hazardous Materials Safety Administration 2020b), we also used data from the USGS 2020 National Crustal Model (Boyd 2020) to assign input pipeline attributes related to the soils surrounding each pipeline segment. Based on the T2020 model, "bedrock" is defined as subsurface material with a shear wave velocity of 1 km/sec; therefore, we chose the parameter 1.0 (Petersen et al. 2019) to quantify bedrock depth. Specifically, we obtained the value of 1.0 from USGS 2020 National Crustal Model for each segment and used linear interpolation to assign the appropriate branch probabilities. All pipelines in the T2020 model were buried below a surficial layer of backfill soil with thickness of 3.0 m. Furthermore, the T2020 model considers two categories of backfill soils that reflect different compaction levels: (i) trench type TA with a shear modulus of 37.1 MPa and (ii) trench type TB with a shear modulus of 63.15 MPa. With this context, we chose the shear modulus at a depth of 1.5 m below the ground surface, (1.5), to classify the trench type.
We relied on data from PHMSA incident reports (Pipeline and Hazardous Materials Safety Administration 2020a) to estimate burial depths and steel grades. First, we gathered data on all gas transmission pipeline incidents since 2010 from PHMSA reports and then identified the subset of . "Earthquake risk of gas pipelines in the conterminous United States and its sources of uncertainty." ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part A: Civil Engineering, in press. data where both the "depth of cover" and "specified minimum yield strength" were documented for the pipelines. Finally, we geospatially interpolated these data to assign different values of burial depth and steel grade for the different segments.
The results for CA from the proposed Mixed model were examined with information from the E2020 model. Specifically, for SoCalGas pipelines, the oxy-acetylene, backbone, and arcwelded sub-models were used for segments of pre-1940, 1940 to 1969, and post-1970 vintage; these sub-models correspond respectively to Figures 8-3, 8-1, and 8-2 in Eidinger (2020). For PG&E pipelines, the sub-model from Figure 8-5 in Eidinger (2020) was used and for pipelines in CA that are managed by neither SoCalGas nor PG&E, the sub-model from Table 8-6 in Eidinger (2020) was used.
Separate implementations of Equation 6, using the additional models listed in Table 3, lead to the nationwide estimates of earthquake risk shown in Table 7 and Figure 13 to Figure 16.  *In this paper, a "leak" is a modest damage mode that can be repaired, whereas a "break" is a more catastrophic damage mode that requires pipe replacement. The AAL estimates include only costs associated with physical repairs of damage modes; they exclude costs associated with the amount of released gas, duration of service disruption, or other indirect economic losses. Figure 14 -Distribution of nationwide AAL by county using the L2013 fragility model after Lanzano et al. (2013). . "Earthquake risk of gas pipelines in the conterminous United States and its sources of uncertainty." ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part A: Civil Engineering, in press.
Page 33 of 42 Figure 15 -Distribution of nationwide AAL by county using the T2020 fragility model after Tsinidis et al. (2020b).

Notation
The following symbols are used in this paper: Ω = Attribute of pipeline segments (e.g., county, operator) for grouping segments and aggregating corresponding earthquake risk; = Cost to repair a break along pipeline; = Cost to repair a leak along pipeline; = Shear modulus of soil; ( ) = PGV hazard curve at midpoint of a given pipeline segment ; = Length of a given pipeline segment ; , = Probability for th branch of the logic tree for quantifying epistemic uncertainty of pipeline attributes for a given pipeline segment ; ( , ) = Repair rate for breaks as a function of PGV level and vector of pipeline attributes; ( , ) = Repair rate for leaks as a function of PGV level and vector of pipeline attributes; = Vector of pipeline attributes (e.g., geographic location, decade of installation, pipe diameter, characteristics of surrounding soil) for a given pipeline segment ;