Large-Data Omics Approaches in Modern Remediation

Forum papers are thought-provoking opinion pieces or essays founded in fact, sometimes containing speculation, on a civil engineering topic of general interest and relevance to the readership of the journal. The views expressed in this Forum article do not necessarily reflect the views of ASCE or the Editorial Board of the journal.

Background
Hydrocarbons, heavy metals, and nitroaromatic compounds are prominent contaminants of concern because of their recalcitrance and ubiquity. Removal of these contaminants traditionally involves physicochemical approaches, but bioremediation offers a sustainable path for contaminant detoxification. Bioremediation relies on microbially mediated transformations of environmental contaminants and on the interactions among biological systems (Chandran et al. 2020).
A systems biology approach has been effective in identifying sources of contamination and developing remediation strategies by supplementing purely geochemical observations with biologically relevant measurements. Such an approach relies heavily on biomolecules such as genomes, transcribed RNA, proteins, lipids, and metabolites that function as biosignatures. Investigations of these biomolecules are commonly referred to as omics; they are becoming increasingly accessible and cost-effective while producing progressively larger and more informative data sets. Omics tools provide targeted or community-wide investigations of organism abundance, functional capabilities, and biological activity, and together these approaches offer multiple lines of evidence for remediation strategies. Some technologies, such as polymerase chain reaction (PCR) or compound-specific assays, offer low-cost targeted source tracking of proteins, metabolites, DNA, and RNA. Omics approaches provide biologically relevant information ranging from a community-level assessment of the genes and bacteria present to a molecular-level view of active protein production, metabolic activity, and environmental responses (Fig. 1). If a site plans to stimulate reductive processes for remediation, adequate knowledge of the resident microbiome allows an informed choice of injectate. Surveys of microbial activity during the injection can then indicate the byproducts to anticipate and the amendments needed to sustain reduction. Combined with bioinformatics techniques, omics data can provide single-point assessments of sites or can be used to generate informative time-series evaluations and environmental models.
This paper provides examples and discussions of effective uses of omics techniques in the remediation of metals, chlorinated solvents, and hydrocarbons, as well as the advantages and limitations of omics within the context of remediation and site management.

Using Omics to Track, Identify, and Assess the Extent of Site Contamination
Using omics in the early phases of site investigation and remediation helps determine the history and extent of contamination at a site (Smith et al. 2015). Whole-community analyses can track organisms or genes of interest using specific genetic or biological markers, and targeted approaches such as quantitative PCR (qPCR) can track a specific gene or species and return a result within a few hours of sample collection. Gene tracking with qPCR has been used successfully to track nitrate contamination (Carrey et al. 2021), bacterially mediated denitrification in groundwater (Kim 2020), and communities of oil-degrading bacteria during the Deepwater Horizon spill (Hazen et al. 2010). The qPCR technique also has been used effectively to determine the source of pathogens (Vadde et al. 2019). Metatranscriptomes from active eukaryotic cells have identified target genes that can serve as biosensors for detecting organic contaminants and heavy metals in soil and water (Lehembre et al. 2013; Pei et al. 2020).
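qPCR results are usually converted from cycle-threshold (Ct) values to gene copy numbers with a standard curve built from a dilution series of known copy numbers. The Python sketch below illustrates that calculation under stated assumptions: the dilution series, the unknown Ct, the extraction volumes, and all function names are hypothetical and are not taken from the studies cited above. It assumes a linear relationship Ct = slope × log10(copies) + intercept and computes amplification efficiency as 10^(-1/slope) - 1.

```python
from statistics import mean


def fit_standard_curve(log10_copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    x_bar, y_bar = mean(log10_copies), mean(ct_values)
    sxx = sum((x - x_bar) ** 2 for x in log10_copies)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def amplification_efficiency(slope):
    """E = 10^(-1/slope) - 1; a slope of about -3.32 gives E of about 1.0 (100%)."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate starting copies per reaction."""
    return 10 ** ((ct - intercept) / slope)


def copies_per_gram(copies_per_reaction, template_ul, elution_ul, sample_g):
    """Hypothetical helper: scale copies in one reaction to the whole extract, then per gram.

    Assumes all DNA from sample_g grams of sediment was eluted in elution_ul microliters.
    """
    return copies_per_reaction * (elution_ul / template_ul) / sample_g


# Hypothetical tenfold dilution series of a plasmid standard (10^7 to 10^3 copies).
standards_log10 = [7, 6, 5, 4, 3]
standards_ct = [12.00, 15.32, 18.64, 21.96, 25.28]

slope, intercept = fit_standard_curve(standards_log10, standards_ct)
efficiency = amplification_efficiency(slope)
unknown = copies_from_ct(20.0, slope, intercept)  # hypothetical sediment sample
per_gram = copies_per_gram(unknown, template_ul=2, elution_ul=100, sample_g=0.5)

print(f"slope={slope:.2f}, efficiency={efficiency:.1%}")
print(f"{unknown:.3g} copies per reaction, {per_gram:.3g} copies per gram")
```

A slope near -3.32 corresponds to roughly 100% efficiency, meaning the target doubles each cycle; curves that deviate strongly from that value are a signal to check the assay before interpreting field samples.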
The success of bioremediation depends on whether contaminant concentrations have been reduced to environmentally acceptable levels and continue to satisfy regulatory requirements. During the Deepwater Horizon oil spill, results from these omics approaches were relayed to the Joint Command and demonstrated that the oil was degrading and dispersing rapidly and that oil-degrading bacteria were present in the water column even at a depth of 1,500 m (Hazen et al. 2010). Although publication of Hazen et al. (2010) was embargoed until August 2010, Science allowed the authors to release the data and conclusions of the paper to the government Joint Command in July, which allowed regulators to decide that no further engineering of the oil plume in the water column was necessary. Many recent advances in systems biology and metabolic engineering are enabling bioremediation strategies for various contaminants (Dangi et al. 2019).
Although targeted approaches are valuable, high-throughput omics approaches return large data sets whose correlative power can be used to track contaminants and stressors such as nitric acid and metals through biological indicators (Smith et al. 2015; Tian et al. 2020).
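One common way to use that correlative power is a rank correlation between a biological indicator and a measured contaminant concentration across wells. The Python sketch below computes Spearman's rho, assigning tied values their average rank; the well data are hypothetical, and choosing Spearman (rather than, for example, a machine-learning model) is an illustrative assumption, not the method of the cited studies.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data from five monitoring wells.
nitrate_mg_l = [5, 40, 120, 300, 800]
marker_relative_abundance = [0.1, 0.5, 0.4, 2.0, 3.5]  # e.g., a stress-associated gene

rho = spearman(nitrate_mg_l, marker_relative_abundance)
print(f"Spearman rho = {rho:.2f}")
```

A rank correlation tolerates the skewed concentration distributions typical of plumes, but it says nothing about causation, and with only a handful of wells it should be paired with a significance test.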
Biological indicators of stress may be the presence or absence of certain taxa, as well as the presence of genes, proteins, metabolites, or lipids associated with microbial stressors. The result can be a comprehensive assessment of community composition and genetic potential from metagenomics, or simply a roster of microorganisms obtained by sequencing the bacterial 16S ribosomal RNA (rRNA) gene. Gene surveys from metagenomes and functional gene microarrays such as the GeoChip can provide quantitative data on potential functions and on specific gene markers for contaminants such as toxic metals, nitrogen compounds, aromatic compounds, chlorinated solvents, and others (He et al. 2010). Targeted proteomic and transcriptomic analyses also have successfully identified active microbial responses to contamination (Singh 2006).
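Because metagenomes differ in sequencing depth and marker genes differ in length, raw read counts for gene markers are often normalized before samples are compared. The sketch below applies reads per kilobase per million reads (RPKM), one widely used normalization. The gene names are real functional markers (narG, nitrate reductase; dsrA, dissimilatory sulfite reductase; merA, mercuric reductase), but the gene lengths, read counts, and library sizes are hypothetical placeholders, and the function names are illustrative.

```python
def rpkm(mapped_reads, gene_length_bp, total_reads):
    """Reads per kilobase of gene per million reads in the library."""
    return mapped_reads / ((gene_length_bp / 1_000) * (total_reads / 1_000_000))


def normalize_markers(read_counts, gene_lengths_bp, total_reads):
    """Apply RPKM to every marker gene counted in one metagenome."""
    return {
        gene: rpkm(count, gene_lengths_bp[gene], total_reads)
        for gene, count in read_counts.items()
    }


# Hypothetical placeholder lengths (bp); real values depend on the reference sequences used.
GENE_LENGTHS_BP = {"narG": 3000, "dsrA": 1200, "merA": 1500}

# Hypothetical read counts and library sizes for an upgradient well and a source-zone well.
samples = {
    "upgradient": ({"narG": 120, "dsrA": 40, "merA": 10}, 20_000_000),
    "source_zone": ({"narG": 450, "dsrA": 30, "merA": 90}, 25_000_000),
}

normalized = {
    name: normalize_markers(counts, GENE_LENGTHS_BP, total)
    for name, (counts, total) in samples.items()
}
for name, values in normalized.items():
    print(name, {gene: round(v, 2) for gene, v in values.items()})
```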
Non-DNA-based proteomics and metabolomics approaches provide insight into the phenotypic and metabolic expression of a microbial community by assessing the proteins and metabolites produced by active cells. Both approaches investigate cellular activity at a molecular level (Mapelli et al. 2008), producing a survey of phenotypic expression and microbial interactions (Kleiner 2019). Metaproteomics has been used in multiple remediation studies to identify proteins that indicate a microbial stress response and are integral to the tolerance and biodegradation of metal or organic contaminants (Khatiwada et al. 2020; Oka et al. 2011; Yun et al. 2016). Extraction of phospholipid fatty acids (PLFA) from bacterial cell membranes can generate a rapid summary of bacterial stress markers, which have been used to track hydrocarbon contaminants from oil spills (Brewer et al. 2015; Hazen 2020; Willers et al. 2015). When expression and activity data complement genomic community data, the combination provides powerful insight into both the composition of the community and its function.
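PLFA stress indicators are often expressed as ratios, for example trans/cis 16:1ω7 and cyclopropyl fatty acids relative to their monounsaturated precursors (cy17:0/16:1ω7c and cy19:0/18:1ω7c), with higher ratios generally read as a sign of physiological stress or starvation. The Python sketch below computes these ratios from a PLFA profile. The profiles are hypothetical, fatty acid names use "w" in place of ω, and which ratios to report is an assumption that should follow the analyzing laboratory's conventions.

```python
# Each ratio: (numerator fatty acid, denominator fatty acid); "w" stands for omega.
STRESS_RATIOS = {
    "trans/cis 16:1w7": ("16:1w7t", "16:1w7c"),
    "cy17:0/16:1w7c": ("cy17:0", "16:1w7c"),
    "cy19:0/18:1w7c": ("cy19:0", "18:1w7c"),
}


def to_mol_percent(profile):
    """Convert absolute amounts (e.g., nmol per sample) to mol% of total PLFA."""
    total = sum(profile.values())
    return {fa: 100.0 * amount / total for fa, amount in profile.items()}


def stress_ratios(profile):
    """Compute each ratio whose numerator and (nonzero) denominator were detected."""
    ratios = {}
    for label, (numerator, denominator) in STRESS_RATIOS.items():
        if numerator in profile and profile.get(denominator, 0) > 0:
            ratios[label] = profile[numerator] / profile[denominator]
    return ratios


# Hypothetical PLFA profiles (nmol per sample) from a background and an impacted well.
background = {"16:0": 30, "16:1w7c": 20, "16:1w7t": 1, "cy17:0": 4, "18:1w7c": 25, "cy19:0": 5}
impacted = {"16:0": 30, "16:1w7c": 10, "16:1w7t": 2, "cy17:0": 8, "18:1w7c": 15, "cy19:0": 10}

for name, profile in [("background", background), ("impacted", impacted)]:
    # Ratios are unchanged by mol% normalization; mol% is used only as the usual summary unit.
    ratios = stress_ratios(to_mol_percent(profile))
    print(name, {label: round(value, 2) for label, value in ratios.items()})
```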

Integration of Omics in Remediation
Metabolomics and biomarkers have been used successfully to suggest beneficial bioremediation strategies (Desai et al. 2010; Grostern et al. 2012). Genomic approaches have successfully determined the effects of contamination in shaping a community, as well as the geochemical and community transformations during remediation (Anderson et al. 2003; Paradis et al. 2016, 2022; Smith et al. 2015). In environmental systems contaminated with metals, nitric acid, solvents, or hydrocarbons, reduced community richness and shifts in the abundant microbes have been observed (Hazen et al. 2010; Smith et al. 2015; Techtmann et al. 2015; Wu et al. 2017), with the system becoming naturally enriched in more-resistant species (Hazen et al. 2010). This natural, stress-induced enrichment yields a community with mechanistic and selective advantages similar to those targeted by some bioremediation treatments.
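Reduced richness and shifts in dominant taxa are commonly summarized with diversity indices computed from taxon count tables, such as 16S rRNA sequence counts per taxon. The Python sketch below computes observed richness, Shannon diversity H' = -Σ p_i ln p_i, and Pielou's evenness. The two communities are hypothetical, and the sketch assumes both samples were sequenced or rarefied to comparable depth, because observed richness rises with sequencing effort.

```python
import math


def richness(counts):
    """Observed richness: number of taxa with at least one read."""
    return sum(1 for c in counts.values() if c > 0)


def shannon(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)


def pielou_evenness(counts):
    """Pielou's J = H' / ln(S); undefined for a single taxon, returned here as 0.0."""
    s = richness(counts)
    return shannon(counts) / math.log(s) if s > 1 else 0.0


# Hypothetical 16S rRNA read counts per taxon, rarefied to the same depth (100 reads).
reference = {"taxon_A": 25, "taxon_B": 25, "taxon_C": 25, "taxon_D": 25}
contaminated = {"taxon_A": 90, "taxon_B": 10, "taxon_C": 0, "taxon_D": 0}

for name, counts in [("reference", reference), ("contaminated", contaminated)]:
    print(f"{name}: S={richness(counts)}, H'={shannon(counts):.3f}, "
          f"J={pielou_evenness(counts):.3f}")
```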

Omics in Electron Donor Injections
Electron donor injections target existing genomic features or functions: abundant quantities of nutrients stimulate reductive processes that transform organic or redox-sensitive contaminants to less-toxic states. Taxonomic and functional profiles sampled before and during injections have been used to identify particular groups of bacteria and community functions related to nitrate, sulfate, and metal reduction (Anderson et al. 2003; Gihring et al. 2011; Paradis et al. 2016, 2022; Watson et al. 2013). Sulfate-reducing bacteria (SRB) are very resilient to variable conductivity, contamination, and pH, and respond consistently when stimulated by electron donor injections. Injections of electron donors such as ethanol (Jin and Roden 2011; Paradis et al. 2016, 2022), emulsified vegetable oil (Gihring et al. 2011; Watson et al. 2013), and acetate (Anderson et al. 2003; Hwang et al. 2009) select for reductive processes that favor SRB, and in environments with high uranium concentrations these injections can temporarily bioimmobilize uranium by reducing soluble U(VI) to less soluble U(IV) (Anderson et al. 2003; Gihring et al. 2011). During acetate injection at the uranium-contaminated Rifle, Colorado, site, there was a significant increase in the population of metal- and sulfate-reducing bacteria, including several Geobacter species (Anderson et al. 2003; Hwang et al. 2009). When 13C-labeled acetate was supplied, 13C was incorporated into DNA and into PLFA extracted from cell membranes, and the resulting PLFA profile resembled those of sulfate-reducing bacteria (Anderson et al. 2003; Hwang et al. 2009).
Fig. 1. Omics approaches provide community-level and cellular-level data. These data can be related to measured activity and interaction among cells or interpreted to predict potential or capable function for a specific environment.
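Choosing an injectate and dose can start from an electron balance: the electron acceptors present (oxygen, nitrate, sulfate, U(VI), and so on) set a demand that the donor must supply. The Python sketch below computes a stoichiometric minimum donor mass from half-reaction electron counts (12 e- per mole for ethanol oxidized to CO2, 8 for acetate, 5 for nitrate reduced to N2, 8 for sulfate reduced to sulfide, 2 for U(VI) to U(IV), and 4 for O2). The groundwater concentrations, the treatment volume, the `safety_factor` parameter, and the function names are hypothetical. The calculation ignores donor diverted to biomass and acceptors such as Fe(III), which is why designs often apply a multiplier.

```python
# (electrons transferred per mole, molar mass in g/mol) for each half-reaction.
ACCEPTORS = {
    "O2": (4, 32.00),      # O2 -> H2O
    "NO3-": (5, 62.00),    # NO3- -> N2 (concentration expressed as nitrate, not as N)
    "SO4^2-": (8, 96.06),  # SO4^2- -> HS-
    "U(VI)": (2, 238.03),  # U(VI) -> U(IV), concentration expressed as U
}
DONORS = {
    "ethanol": (12, 46.07),  # C2H5OH -> CO2
    "acetate": (8, 59.04),   # CH3COO- -> CO2
}


def electron_demand_meq_per_l(conc_mg_per_l):
    """Sum the electron equivalents (meq/L) needed to reduce each listed acceptor."""
    demand = 0.0
    for species, mg_per_l in conc_mg_per_l.items():
        n_electrons, molar_mass = ACCEPTORS[species]
        demand += mg_per_l / molar_mass * n_electrons  # mmol/L times e- per mol
    return demand


def donor_mass_g(conc_mg_per_l, volume_l, donor, safety_factor=1.0):
    """Stoichiometric donor mass (g) for a water volume; safety_factor is a hypothetical knob."""
    n_electrons, molar_mass = DONORS[donor]
    total_meq = electron_demand_meq_per_l(conc_mg_per_l) * volume_l
    return total_meq / n_electrons * molar_mass / 1000.0 * safety_factor


# Hypothetical groundwater chemistry (mg/L) for a treatment zone.
groundwater = {"O2": 2.0, "NO3-": 62.0, "SO4^2-": 96.06, "U(VI)": 1.0}
demand = electron_demand_meq_per_l(groundwater)
print(f"electron demand: {demand:.2f} meq/L")
for donor in DONORS:
    print(donor, f"{donor_mass_g(groundwater, volume_l=10_000, donor=donor):.1f} g")
```

Comparing donors on a per-electron basis like this shows why ethanol, at 12 electrons per mole, supplies more reducing equivalents per gram than acetate; the choice in practice also depends on cost, handling, and the microbial response observed with omics monitoring.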
A proteomic survey of the acetate-injected groundwater confirmed an abundance of Geobacter-associated proteins for acetate metabolism and energy generation (Wilkins et al. 2009). Although the genes, pathways, and taxonomic composition may vary, the responsiveness of sulfate- and nitrate-reducing bacteria to injections in shallow groundwater systems has been well demonstrated (Gihring et al. 2011; Jin and Roden 2011; Paradis et al. 2016, 2022; Watson et al. 2013). Repeated injections of ethanol at the uranium-contaminated Y-12 site in Oak Ridge, Tennessee, shifted the microbial community composition, but the increased functional capability for nitrate reduction was retained for several weeks postinjection (Paradis et al. 2022). The ethanol injections increased microbial activity, including the production of acetate and the reduction of nitrate and of uranium(VI) to uranium(IV) (Paradis et al. 2016, 2022). Similarly, following a single injection of emulsified vegetable oil at the uranium- and nitrogen-contaminated Y-12 site in Oak Ridge, the community continued to reduce uranium for more than 269 days postinjection (Gihring et al. 2011). The emulsified vegetable oil injection also stimulated an increase in sulfate-reducing bacteria and Geobacter that persisted for more than 269 days postinjection (Gihring et al. 2011; Watson et al. 2013).
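Persistence of a stimulated group after injection can be summarized as a log2 fold change in its relative abundance against a preinjection baseline. The Python sketch below does this for a hypothetical time series; the abundances, the pseudocount, the function names, and the threshold of a twofold increase (log2 fold change of at least 1) are illustrative assumptions, not values from Gihring et al. (2011) or Watson et al. (2013).

```python
import math


def log2_fold_change(series, baseline, pseudocount=1e-4):
    """log2((x + c) / (baseline + c)) for each sampling day; c avoids taking log of zero."""
    return {
        day: math.log2((value + pseudocount) / (baseline + pseudocount))
        for day, value in series.items()
    }


def persistence_day(lfc, threshold=1.0):
    """Last day of the unbroken run, starting at the first sample, with change >= threshold."""
    last = None
    for day in sorted(lfc):
        if lfc[day] < threshold:
            break
        last = day
    return last


# Hypothetical relative abundance of sulfate-reducing taxa (fraction of 16S reads).
preinjection = 0.01
post_injection = {7: 0.08, 30: 0.10, 90: 0.06, 180: 0.04, 270: 0.015}

lfc = log2_fold_change(post_injection, preinjection)
for day, value in sorted(lfc.items()):
    print(f"day {day}: log2 fold change {value:+.2f}")
print("at least twofold above baseline through day", persistence_day(lfc))
```

Relative abundances are compositional, so an apparent increase in one group can partly reflect declines in others; pairing this kind of summary with absolute measures such as qPCR copy numbers makes persistence claims more robust.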

Interpretation of Omics Data
The vast data sets generated by omics studies, together with the associated environmental metadata, require computational methods that can help predict the biogeochemical parameters of contaminated sites from the available genomic information (Faure et al. 2021; Smith et al. 2015). Metabolite profiling and proteome assessments provide information vital to understanding and developing biodegradation pathways and models. Combining these approaches with quantitative studies of metabolic reaction rates has led to the discovery of novel pathways for substrate utilization in bacteria exposed to varying contaminant stresses (Kitamura et al. 2019; Tang et al. 2007). Metabolic pathway databases such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways (Kanehisa and Goto 2000) and the Biocatalysis/Biodegradation Database (EAWAG-BBD) (Gao et al. 2010) contain a large number of microbial degradation reactions and biodegradation pathways, which can be a stepping stone for metabolic engineering of existing or activatable pathways for contaminant removal (Dangi et al. 2019). Omics techniques, like any other analytical method, have their own strengths and weaknesses. Each technique is designed to answer specific types of questions (Table 1); some techniques can indicate only what is present, and others can indicate only what is occurring (Fig. 1). These limitations often require several types of omics approaches, which can demand energy-intensive computational analysis of millions of base pairs, thousands of metabolites, and even thousands of proteins. The high-throughput nature of omics approaches makes large data sets available at a lower price per gigabyte, but can increase the computational resources and expense of analysis.
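Pathway information is often used to check whether a site's detected genes cover every step of a degradation pathway, so that a missing step can be flagged before remediation stalls. The Python sketch below does this for the reductive dechlorination sequence PCE → TCE → cis-DCE → VC → ethene, using a simplified, illustrative mapping to commonly used reductive dehalogenase biomarker genes (pceA, tceA, vcrA, bvcA). The mapping and the well detections are assumptions to verify against current literature or a database such as EAWAG-BBD; the code does not query KEGG or EAWAG-BBD.

```python
# Simplified, illustrative step-to-gene mapping (an assumption to verify, not a curated database).
DECHLORINATION_STEPS = [
    ("PCE", "TCE", {"pceA"}),
    ("TCE", "cis-DCE", {"pceA", "tceA"}),
    ("cis-DCE", "VC", {"tceA", "vcrA", "bvcA"}),
    ("VC", "ethene", {"vcrA", "bvcA"}),
]


def pathway_coverage(detected_genes, steps=DECHLORINATION_STEPS):
    """Return (fraction of steps with a detected gene, first unsupported step or None)."""
    detected = set(detected_genes)
    supported = 0
    first_gap = None
    for substrate, product, genes in steps:
        if detected & genes:
            supported += 1
        elif first_gap is None:
            first_gap = (substrate, product)
    return supported / len(steps), first_gap


# Hypothetical gene detections (e.g., from qPCR or metagenome annotation) at two wells.
for well, genes in {"well_1": ["pceA", "tceA"], "well_2": ["pceA", "tceA", "vcrA"]}.items():
    coverage, gap = pathway_coverage(genes)
    status = "all steps supported" if gap is None else f"first gap at {gap[0]} -> {gap[1]}"
    print(f"{well}: {coverage:.0%} of steps supported; {status}")
```

In this hypothetical case, well_1 lacks vcrA and bvcA, which would flag possible vinyl chloride accumulation; gene presence alone does not confirm activity, so transcript, protein, or geochemical evidence would still be needed.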
In addition to computational costs, analyzing samples for omics data may incur other operational costs such as specialized clean spaces, supplies, and the purchase of expensive analytical equipment required for omics detection.

Future Directions for Omics
Synthetic biology is a growing area of interest. As understanding of the natural world evolves, so does the ability to target a specific response beneficial to remediation, and advancements in synthetic biology may shape the future of remediation. These advancements may include engineered microorganisms, synthetic communities, and synthetic biomarkers (Beabout et al. 2021). As methods continue to improve in accuracy and resolution, libraries of known analytes also continue to grow. Greater volumes of data and open access to computational tools, some of which even allocate provisional space on supercomputers, create a promising environment for improved accessibility and data-processing capacity.

Data Availability Statement
No data, models, or code were generated or used during the study.