Introduction

Two of the most common methods for high-throughput screening in directed evolution are fluorescence-activated cell sorting (FACS) and its more recent variation, fluorescence-activated droplet sorting (FADS). FACS cytometers can analyze and sort both cells and droplets (with or without cells), but they cannot generate droplets. FADS cytometers, on the other hand, generate droplets that encapsulate cell-free mixtures or cells for subsequent analysis and sorting. Despite their differences, both systems are used for high-throughput screening of particles. We will mostly focus on FACS applications, as FACS instruments are more common and accessible among laboratories. FACS cytometers are complex instruments that analyze the fluorescence of individual “particles” (e.g., cells, liposomes, beads, emulsion droplets, or polymer shells) at extremely high event rates (~10^7 per hour) and isolate those whose signal indicates the desired trait.

Challenges

The main bottleneck in the use of these systems is that they require a genotype (a nucleic acid sequence) to be linked to a phenotype (a functional feature such as binding or catalytic activity) in an ultrahigh-throughput context (Griffiths and Tawfik, 2006). In other words, the activity of an evolved molecule must be linked to an output that the FACS machine can detect (usually a fluorescence signal). Once detected, the particle is sorted and its content can be sequenced to establish the genotype-phenotype relationship. Creating experimental setups for these studies can be very difficult, depending on the system one is working with. If you work on the directed evolution of fluorescent proteins, the signal is provided directly by your protein. If you work with metabolic enzymes, however, you might have to use indirect methods, such as fluorogenic substrates (if available) or transcription factors that are induced by the product of your reaction and activate the expression of a fluorescent protein. Screenings that rely on such indirect relationships can therefore become much more complicated. Below we will see many examples addressing these challenges and, hopefully, they will serve as inspiration for your project!

Firstly, we will briefly explore different FACS-based screening techniques in directed evolution. They can be divided into six distinct categories according to the method used to link genotype and phenotype:
- cell surface display (or in vivo display) of enzymes,
- in vitro display,
- fluorescent protein-based methods,
- entrapment of the product inside the cells,
- in vitro compartmentalization,
- single-cell/microcolony compartmentalization.

Cell surface display

This technique is based on the attachment of heterologous enzymes to the surface of microorganisms, most commonly bacteria (bacterial surface display; van Bloois et al., 2011), yeast (yeast surface display; Boder and Wittrup, 1997) or phages (phage surface display; Fernandez-Gacio et al., 2003). Cell display offers ultrahigh-throughput capacity of up to 10^9 clones per round and allows direct contact between the enzyme and the added substrate, without requiring diffusion through the cell membrane. However, this accessibility is also a major drawback, as the product is freely diffusible too. This drawback has been overcome by the development of cell microencapsulation techniques that imitate natural cell compartmentalization, e.g., by enclosing the target enzyme in microemulsion droplets along with the substrate and products (Griffiths and Tawfik, 2006). The main advantages of the cell-in-droplet approach are its high-throughput capacity (~10^9 clones per round) and the small volume of the microemulsion droplets (typically ~5 femtoliters), such that little substrate is needed (Aharoni et al., 2005). More recently, novel screening platforms using polymer-encapsulated cells have been developed, such as: (1) the cellular high-throughput encapsulation, solubilization and screening (CHESS) method, which encapsulates libraries of approximately 10^8 mutants in a polymer to form cell-like microcapsules (Yong and Scott, 2015), and (2) the polymer shell method, which relies on fluorescent hydrogel formation around E. coli cells (Pitzler et al., 2014) or yeast cells (Vanella et al., 2019).

In vitro display

This method is based on the synthesis of the target protein by an in vitro translation system, so that the genotype-phenotype linkage is achieved using molecules such as ribosomes (ribosome display), puromycin (mRNA display), and the DNA replication initiator protein RepA (CIS display). More recently, novel in vitro display approaches have been reported, including liposome display and megavalent bead display. Liposome display enables membrane proteins to be engineered. It relies on their display on liposome membranes through the transcription and translation of a single DNA molecule by an encapsulated cell-free translation system (Fujii et al., 2014). Megavalent bead surface display allows tailor-made protein binders to be generated by laboratory evolution. In this method, the gene that encodes the protein of interest is attached to a bead by strong noncovalent interactions and the corresponding protein is displayed via a covalent thioether bond on the DNA (Diamante et al., 2013).

Fluorescent protein-based methods

Due to their fluorescent properties, GFP and other fluorescent protein (FP) variants are ideal candidates for single-cell fluorescence analysis (you can find more about them on this very resourceful website: https://www.fpbase.org/) and can therefore be used in flow cytometry screening (Yang and Withers, 2009). The main advantages of GFP-based screening methods are their very high throughput (~10^7–10^8 clones per round) and the fact that no expensive fluorescent substrates are needed. On the other hand, the biggest challenge in this approach is to link the enzymatic function or activity of interest to FP expression, which can be very difficult depending on the protein studied. While the results achieved with flow cytometry-based screens are promising, we must emphasize that flow cytometry is unlikely to serve as a standalone method. It is well known that individual cells from a bacterial culture composed of a single genotype may display a range of phenotypes attributable to the stochastic nature of gene expression. Because the selection criteria in flow cytometry experiments are applied to the phenotype of an individual cell, the inherent population heterogeneity of a bacterial culture, especially one containing a library of genotypes, makes the task of counter-selecting for one phenotype against another extremely challenging. For example, isolating a single bacterium displaying a low level of fluorescence does not guarantee that a culture grown from that bacterium will possess, as a whole, the same fluorescence characteristics. With this in mind, the need to reinforce a single-cell enrichment strategy with traditional assays that measure average expression levels for an entire population should be clear.
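The effect of this heterogeneity on a single sort can be sketched numerically. The example below is only an illustration under stated assumptions: fluorescence within one genotype is modelled as log-normal, and the medians, spread, gate position and library composition are hypothetical values, not data from any study cited here. It estimates how many of the cells collected above a gate actually carry the bright genotype.

```python
import math


def fraction_above_gate(median, log_sd, gate):
    """Probability that a cell exceeds the sort gate, assuming the
    fluorescence of one genotype is log-normally distributed."""
    z = (math.log(gate) - math.log(median)) / log_sd
    return 0.5 * math.erfc(z / math.sqrt(2))


def sorted_pool_purity(frac_bright, p_bright, p_dim):
    """Share of sorted cells that truly carry the bright genotype."""
    true_pos = frac_bright * p_bright
    false_pos = (1.0 - frac_bright) * p_dim
    return true_pos / (true_pos + false_pos)


# Hypothetical library: 1% bright genotype, 99% dim genotype.
p_dim = fraction_above_gate(median=100, log_sd=0.8, gate=500)
p_bright = fraction_above_gate(median=1000, log_sd=0.8, gate=500)
purity = sorted_pool_purity(0.01, p_bright, p_dim)

print(f"dim cells above gate:    {p_dim:.3f}")
print(f"bright cells above gate: {p_bright:.3f}")
print(f"purity of sorted pool:   {purity:.2f}")
```

With these assumed numbers, most cells in the sorted pool still carry the dim genotype even though few dim cells pass the gate individually, which is why several sorting rounds and population-level re-testing are usually combined.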
As FP-based methods are becoming more and more common, a few examples of the directed evolution of different molecules can be found below:

- Transcription factor directed evolution (usually targeted to ligand and/or DNA binding domains)

Tang and colleagues evolved a XylR regulator towards an improved operating range, producing a more linear response of the system to the xylose inducer. In other words, a system that was originally considered a bistable (digital) molecular switch was evolved into an analog one. In this example, the link between genotype and protein function was established by a transcriptional interaction: XylR directly regulated the expression of a GFP gene, and mutations affecting the function of the regulator could be readily tracked by analyzing and selecting cells whose fluorescence levels differed from those of the original system.
https://pubs.acs.org/doi/10.1021/acssynbio.0c00225

In a different approach, Machado and colleagues evolved a repressor called PcaV to act as a biosensor for hydroxyl-substituted benzoic acids (such as vanillin and other closely related aromatic aldehydes). The overall system was similar to the one discussed above: protein function was linked to a fluorescent output through a transcriptional system directly repressed by PcaV, and this repression was relieved by inducing molecules. After the generation of PcaV mutants, cells were screened through multiple rounds of counter-selection using FACS. In the first round, libraries were uninduced and selected for low fluorescence levels in order to remove non-functional repressors (cells expressing GFP without the inducer probably carried non-functional repressors). In the second round, the previously sorted libraries were induced with the new molecule, e.g. vanillin (which did not induce the original PcaV), and selected for GFP expression (functional mutants responding to vanillin).
In the third round, the uninduced library was selected for cells with low fluorescence levels (this time with a less stringent cutoff than in the first round). Finally, cells were again induced with the new molecule and the most fluorescent cells were selected. With this approach, a new biosensor capable of detecting vanillin and other related molecules was generated.
https://jbioleng.biomedcentral.com/articles/10.1186/s13036-019-0214-z

Our last example employed a good combination of dry-lab and wet-lab work. The IPTG-responsive transcription factor LacI was computationally redesigned and mutated to recognize four new non-native inducers (Taylor et al., 2016). This study is a good example of how computation-guided mutagenesis based on the Rosetta algorithm, combined with different diversification techniques (such as epPCR and saturation mutagenesis), can accelerate the discovery and generation of new molecular functions. In a counter-selection round, the mutant libraries were pre-screened for retained ability to repress a TolC reporter (a porin that allows entry of the toxic colicin E1) by growth enrichment in the absence of inducers but in the presence of colicin E1. The enriched LacI libraries were then positively screened by FACS for their ability to activate production of GFP upon induction with different sugars, resulting in variants of LacI that were responsive to fucose, gentiobiose, lactitol and sucralose (Taylor et al., 2016). In this instance, no active counter-screening was performed against activation by IPTG, so nearly all mutants retained their ability to respond to IPTG, a phenotype that was reduced for some designs upon shuffling of mutations for activity screening.
https://academic.oup.com/nar/article/48/1/e3/5645005?login=true

- Evolution of promoters/cis-regulatory regions

For the study and directed evolution of promoters and cis-regulatory regions, a recent and extremely powerful strategy is Sort-Seq.
It relies on the power of FACS to sort individual cells from libraries of regulatory sequences (controlling the expression of an FP) into different bins according to their fluorescence levels. Each bin is then sequenced, and the pool of regulatory sequences in each bin is analysed to understand, predict and re-engineer sequence-function (promoter-expression) relationships.
https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-016-2533-5
https://www.annualreviews.org/doi/full/10.1146/annurev-genom-083118-014845

Although not exactly a directed evolution project, the work of de Boer and colleagues (de Boer et al., 2019) provides an insightful resource for the generation and study of promoter libraries in yeast. Using random mutagenesis, a library of 100,000,000 promoters was cloned upstream of a Yellow Fluorescent Protein (YFP) gene and screened by FACS for different promoter activities (cells were binned according to their YFP expression levels). Each bin was then sequenced and an interpretable machine learning model was built from these data, allowing accurate prediction of the expression driven by library promoters and by independent test promoters that were designed de novo following the model rules.
https://www.nature.com/articles/s41587-019-0315-8

In a similar approach, Urtecho and colleagues (Urtecho et al., 2019) explored the combinatorial space of sigma70 promoter elements in E. coli: a combinatorial library of promoter elements was generated, cloned upstream of a GFP reporter gene, sorted into bins by FACS and sequenced (both DNA-seq and RNA-seq) to establish the relationships between genotype (promoter sequence) and phenotype (GFP expression bins).
One could use this strategy to evolve not only constitutive promoters but also inducible promoters towards different expression dynamics (see Rohlhill et al., 2017 for the engineering of a formaldehyde-inducible promoter and Yu et al., 2021 for the engineering of LacI-based systems).
https://pubs.acs.org/doi/10.1021/acs.biochem.7b01069
https://www.nature.com/articles/s41467-020-20094-3
https://pubs.acs.org/doi/10.1021/acssynbio.7b00114
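The Sort-Seq analysis step described above (turning per-bin sequencing reads into an expression estimate for each regulatory sequence) can be sketched as follows. This is a simplified illustration, not the pipeline of any of the cited studies: the function name, the variant names, the read counts, the numbers of sorted cells and the bin centre values are all hypothetical. Reads are first rescaled by the number of cells sorted into each bin, then each variant is assigned a weighted mean of the log10 bin centres and the bin in which it is most enriched.

```python
import math


def estimate_expression(read_counts, cells_per_bin, bin_centers):
    """Estimate expression per variant from Sort-Seq bin counts.

    read_counts:   {variant: [reads in bin 0, reads in bin 1, ...]}
    cells_per_bin: number of cells sorted into each bin
    bin_centers:   representative fluorescence of each bin (a.u.)
    Returns {variant: (estimated fluorescence, index of most enriched bin)}.
    """
    n_bins = len(bin_centers)
    bin_totals = [sum(c[b] for c in read_counts.values()) for b in range(n_bins)]
    estimates = {}
    for variant, counts in read_counts.items():
        # Rescale reads so that each bin contributes in proportion to the
        # number of cells that were actually sorted into it.
        cells = [
            counts[b] / bin_totals[b] * cells_per_bin[b] if bin_totals[b] else 0.0
            for b in range(n_bins)
        ]
        total = sum(cells)
        if total == 0:
            continue  # variant not observed in any bin
        weights = [c / total for c in cells]
        log_mean = sum(w * math.log10(f) for w, f in zip(weights, bin_centers))
        estimates[variant] = (10 ** log_mean, weights.index(max(weights)))
    return estimates


# Hypothetical data: three promoter variants sorted into four bins.
reads = {
    "weak": [900, 100, 0, 0],
    "medium": [50, 400, 500, 50],
    "strong": [0, 0, 100, 900],
}
result = estimate_expression(reads, [10000] * 4, [10, 100, 1000, 10000])
for name, (fluorescence, peak_bin) in result.items():
    print(f"{name}: ~{fluorescence:.0f} a.u., most enriched in bin {peak_bin}")
```

Because one genotype usually appears in several bins, working with the whole distribution across bins rather than with a single bin makes the estimate less sensitive to cell-to-cell variability.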

- Directed evolution of enzymes by linking the product to a biosensor

Although linking enzyme mutant libraries to FPs for FACS-based directed evolution is not a straightforward task, many groups have achieved it by adopting in vivo biosensors. These biosensors can be any molecular sensor (usually protein- or RNA-based) that interacts with the product of a specific reaction and modulates the expression of an FP according to the product concentration. The biggest challenge is to find or engineer sensors that detect specific metabolites (at native cellular concentrations) and also possess a dynamic range good enough to reliably connect the input (product concentration) to the output (FP production). A comprehensive review of this topic has been written by Lin and colleagues and will probably give you some inspiring ideas if you decide to work in this area.
https://www.sciencedirect.com/science/article/abs/pii/S0734975017300812

As an example of using a transcription factor to link the activity of an enzyme to the expression of an FP, let's consider the work of Kwon and collaborators (Kwon et al., 2018). In this study, the main goal was to change the substrate scope of a tryptophan-indole lyase from tryptophan-indole to tyrosine-phenol. The expression of the FP in E. coli cells was controlled by a phenol-inducible transcription factor (DmpR). Using this genetic circuit-based biosensor for the detection of phenolic compounds, the substrate scope of the tryptophan-indole lyase was re-engineered by directed evolution towards the desired tyrosine-phenol lyase, with complete loss of the original activity.
https://www.nature.com/articles/s41598-018-20943-8

RNA sensors have also been used to screen directed evolution libraries for enzymes with improved activities. For example, Michener and Smolke (Michener and Smolke, 2012) targeted yCDM1, an enzyme that is not very efficient at demethylating caffeine to theophylline.
An RNA switch was used as a theophylline biosensor. The switch was placed in the 3′ UTR of a fluorescent reporter gene. If no theophylline was available, the poly-A tail of the mRNA was removed, leading to rapid degradation and low gene expression. Addition of the small-molecule ligand favored a conformation in which the mRNA was stable and translation could proceed. The enzyme was mutagenized by epPCR and the RNA-based biosensor was used for screening by FACS. A mutant version of yCDM1 was obtained that showed a 33-fold increase in enzyme activity and a 22-fold increase in selectivity (Michener and Smolke, 2012).
https://www.sciencedirect.com/science/article/abs/pii/S109671761200047X?via%3Dihub
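The requirement that a biosensor connects product concentration to FP output over a useful range can be illustrated with a simple dose-response model. The sketch below assumes a Hill-type transfer function, which is a common idealization and not a model taken from the studies above; all parameter values and product concentrations are hypothetical. It compares how well two sensors separate a parent enzyme from an improved variant.

```python
def hill_response(ligand, basal, maximal, k_half, n):
    """Fluorescence output of a sensor for a given ligand concentration,
    assuming a Hill-type dose-response curve."""
    occupancy = ligand ** n / (k_half ** n + ligand ** n)
    return basal + (maximal - basal) * occupancy


def dynamic_range(basal, maximal):
    """Fold change between fully induced and uninduced output."""
    return maximal / basal


# Hypothetical intracellular product levels (µM) for two enzyme variants.
parent_product, improved_product = 10.0, 50.0

# Sensor A responds within the relevant concentration window,
# sensor B only at much higher concentrations.
sensors = {
    "A": dict(basal=100.0, maximal=10000.0, k_half=30.0, n=2),
    "B": dict(basal=100.0, maximal=10000.0, k_half=5000.0, n=2),
}

for name, params in sensors.items():
    low = hill_response(parent_product, **params)
    high = hill_response(improved_product, **params)
    print(f"sensor {name}: improved/parent signal ratio = {high / low:.2f}")
```

Both hypothetical sensors have the same 100-fold dynamic range, yet only the one whose half-maximal concentration lies near the product levels reached in the cell separates the two variants, which is why sensitivity at native concentrations matters as much as dynamic range.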
- Directed evolution of RNA/DNA aptamers/switches

If a biosensor for your system is not available, why not try to develop one through directed evolution? This strategy, although very laborious, has been adopted by many groups, with a special focus on RNA-based sensors. The first class of RNA-based sensors we will look at is riboswitches. They are biosensors typically composed of an aptamer domain, which recognizes a specific ligand, and an expression platform, which couples ligand binding to a change in FP expression. A classic example of riboswitch directed evolution was demonstrated by Lynch and Gallivan (Lynch and Gallivan, 2009). Libraries (a theophylline aptamer followed by a sequence of eight randomized bases, the RBS, and the red fluorescent reporter gene DsRed) were generated with randomized primers and cloned into plasmids for in vivo expression in E. coli. The system was coupled to an FP in such a way that FP expression increased in the presence of theophylline through exposure of the RBS region, which was otherwise “masked” by the mRNA secondary structure. A more recent study by Page and colleagues (Page et al., 2018) using the same system also generated novel theophylline riboswitches by coupling a dual genetic selection with FACS.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2615613/
https://pubs.acs.org/doi/10.1021/acssynbio.8b00099

The second class of RNA-based sensors is toehold switches. A toehold switch is an RNA fragment with a hairpin secondary structure that is used to regulate translation. The toehold switch sequence contains a strong ribosome binding site (RBS) and a start codon, followed by the coding sequence of a reporter gene. In the absence of a complementary sequence, the switch is in its native hairpin conformation, acting as a translational repressor: it prevents ribosomes from binding to the RBS and thus from translating the downstream reporter gene.
In the presence of a single-stranded trigger complementary to the stem of the hairpin, the switch unfolds and exposes the ribosome binding site and the start codon, so that translation can be initiated. In a very comprehensive study, Green and colleagues (Green et al., 2014) designed and characterized libraries of toehold switches, ending up with 26 sensors that could be used together with very little crosstalk and a high dynamic range for the detection of endogenous mRNA in E. coli and the building of genetic circuits. The researchers used the NUPACK nucleic acid sequence design package (Zadeh et al., 2011) to generate libraries of de novo designed translational activators satisfying their desired parameters, followed by synthesis of the oligos, cloning and in vivo testing of the libraries by FACS. In a more recent study, Angenent-Mari and colleagues (Angenent-Mari et al., 2020) went one step further, using Deep Neural Networks (DNNs) to predict toehold switch function as a canonical riboswitch model in synthetic biology. To facilitate DNN training, they synthesized and characterized in vivo a dataset of 91,534 toehold switches spanning 23 viral genomes and 906 human transcription factors. DNNs trained on nucleotide sequences outperformed previous state-of-the-art thermodynamic and kinetic models.
https://www.sciencedirect.com/science/article/pii/S0092867414012896
https://www.nature.com/articles/s41467-020-18677-1
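The base-pairing requirement between a trigger and the sensing region of a toehold switch can be checked with a few lines of code. The sketch below is a deliberately minimal illustration: it only tests perfect Watson-Crick complementarity, ignores G-U wobble pairs and folding energetics (for which a dedicated package such as NUPACK is needed), and the sequences are hypothetical, not taken from Green et al.

```python
# Watson-Crick pairing table for RNA (G-U wobble pairs are ignored here).
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def count_mismatches(trigger, sensing_region):
    """Number of positions at which the trigger fails to pair with the
    sensing region of the switch (sequences must have equal length)."""
    if len(trigger) != len(sensing_region):
        raise ValueError("trigger and sensing region must have equal length")
    expected = reverse_complement(trigger)
    return sum(a != b for a, b in zip(expected, sensing_region.upper()))


# Hypothetical 12-nt trigger and two candidate sensing regions.
trigger = "GGAUCCAUGACU"
cognate_switch = reverse_complement(trigger)
other_switch = "AGUCAUCCAUCC"

print(count_mismatches(trigger, cognate_switch))  # cognate pair
print(count_mismatches(trigger, other_switch))    # non-cognate pair
```

A count of this kind is only a first, rough filter for crosstalk between switches in a library; it does not replace thermodynamic design or experimental testing by FACS.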

Entrapment of the substrate/product inside the cells

An alternative to cell surface display strategies and FP-based screening methods is to use approaches in which the fluorescent product is trapped within the cell (Yang and Withers, 2009). This strategy faces two main challenges: (i) obtaining commercial fluorogenic substrates for the specific reaction and (ii) retaining such molecules inside the cell. Before starting a project based on this approach, one therefore has to check for the availability of commercial fluorogenic substrates whose products cannot diffuse into the extracellular environment. Entrapment is usually achieved when the reaction modifies the size, polarity, or other chemical properties of the substrate, resulting in the retention of the product within the cell. As a successful example, Tan and colleagues (Tan et al., 2019) improved the fucosylation catalytic efficiency of an α1,3-fucosyltransferase (α1,3-FucT) by 14-fold after three rounds of directed evolution in E. coli. Two kinds of fluorescently labeled acceptor substrates (LacNAc derivatives) were designed and synthesized for cell entrapment. They were transported into the cell via the LacY transporter. The fucosylation reaction resulted in a trisaccharide product that accumulated inside the cell, as the LacY transport rate for such products was very low. The fluorescence intensity of each cell thus carried information about its fucosylation efficiency, allowing FACS-based screening. Another possibility is to work with fluorogenic probes inside the cell that detect the substrate or product of the reaction. Sadler and colleagues (Sadler et al., 2018) engineered a monoamine oxidase with improved activity towards a novel secondary amine substrate, using a screen based on the oxidation of a fluorogenic probe by intracellular bacterial peroxidases. The authors expressed the amine oxidase variant library in E. coli and stained the cells with a fluorogenic probe that is sensitive to oxidation by H2O2 (a byproduct of the monoamine oxidase reaction). The probe was introduced into the cells with the help of freeze-thaw cycles and chemicals that increase membrane permeability. Finally, the limitations of substrate/product entrapment in the cell can also be overcome by adopting cell microencapsulation techniques that imitate natural cell compartmentalization, as previously discussed in the Cell surface display section. As an example, a microdroplet-enabled FACS-based screening strategy was developed for improving cellulase activity. In this study, Ostafe et al. used double emulsions to encapsulate yeast cells expressing cellulase in microfluidic droplets (Ostafe et al., 2013). Glucose released by cellulase activity on a carboxymethylcellulose substrate was detected via a hexose oxidase-based coupled enzyme assay, and enrichment of a positive population was demonstrated based on droplet fluorescence.
https://advances.sciencemag.org/content/5/10/eaaw8451
https://pubs.rsc.org/en/content/articlelanding/2018/AN/C8AN00851E#!divAbstract

In vitro compartmentalization (IVC)

Microencapsulation can also be performed in the absence of cells, producing the enzyme directly from the DNA library by in vitro protein translation. Such IVC offers the advantage of generating a large number of droplets per unit volume of emulsion (>10^10 in 1 mL of emulsion), coupled with the ease of preparing emulsions and their high stability to changes in temperature, pH, and salt concentration (Aharoni et al., 2005). Additionally, because the DNA used to generate the mutant library is translated in vitro, IVC avoids the loss of gene diversity caused by cloning and transformation steps. It also reduces experimental time by skipping the cell growth steps associated with cloning, transformation, and protein production (Martínez and Schwaneberg, 2013).

Single-cell encapsulation

https://pubs.acs.org/doi/pdf/10.1021/acssynbio.9b00103
Cells are the most straightforward means of protein production, yet they often lack the ability to take up the desired screening substrate or to retain the nascent product. They can be co-encapsulated in an emulsion droplet with all compounds required for the screening reaction (substrates, cell lysis agents, buffer, etc.), and the enzyme of interest can be released from the cell in situ. This is an elegant approach, as it combines the advantages of cells and emulsion compartments: the cells produce the enzyme, and the emulsion retains the genotype-phenotype connection. Although high sorting rates are feasible with emulsions in a conventional flow cytometer or in custom-made chip devices, cell loading follows a Poisson distribution, so restricting droplets to a maximum of one cell each leaves most of the droplets empty. The effective throughput is therefore reduced. Nevertheless, from a sustainability point of view, microcompartments are superior to conventional assays, since only picoliters or even femtoliters of reagents are used per sample, as previously discussed by Agresti et al. Moreover, the compartmentalization of a single cell in such a small reaction volume typically results in high local enzyme concentrations during screening, which benefits the signal-to-noise ratio. As previously discussed in the Fluorescent protein-based methods section, the stochastic nature of gene expression, i.e. phenotypic cell-to-cell variability (Elowitz, 2002), can make FACS screenings misleading in some cases (Schaerli and Isalan, 2013). Imagine that, due to noise, a single genotype (promoter sequence or synthetic circuit) produces different fluorescence levels within the population. In this case, many cells will be sorted into different bins although they have the same genotype.
One way to tackle this problem is to use statistical methods after the sequencing step to find the bin in which a given genotype is enriched (although it might also be present in other bins). A second approach is to select genotypes with high stochasticity, re-clone them and analyse their population-level behaviour by both flow cytometry and plate reader. Lastly, an alternative method that has been gaining attention in recent years is based on the gel encapsulation of microcolonies. With this technology, it is possible to use the high-throughput power of FACS to screen cell populations, circumventing the problem of high cell-to-cell variability (Weaver et al., 1991; Sahar et al., 1994; Zengler et al., 2002; Fischlechner et al., 2014; Meyer et al., 2015; Duarte et al., 2017).
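The Poisson loading constraint mentioned above for single-cell encapsulation can be made concrete with a short calculation. The sketch below assumes that cells are distributed among droplets independently, so that the number of cells per droplet follows a Poisson distribution with mean lam; the value of lam used in the example is a hypothetical choice, not a recommendation from the cited literature.

```python
import math


def poisson_pmf(k, lam):
    """Probability of finding exactly k cells in a droplet."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def droplet_occupancy(lam):
    """Fractions of droplets that are empty, hold exactly one cell,
    or hold more than one cell, for a mean loading of lam cells/droplet."""
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    multiple = 1.0 - empty - single
    return empty, single, multiple


lam = 0.1  # hypothetical mean number of cells per droplet
empty, single, multiple = droplet_occupancy(lam)
print(f"empty droplets:        {empty:.1%}")
print(f"single-cell droplets:  {single:.1%}")
print(f"multi-cell droplets:   {multiple:.1%}")
print(f"occupied droplets holding one cell: {single / (1 - empty):.1%}")
```

Lowering lam makes droplets with more than one cell rarer but increases the share of empty droplets that still have to pass through the sorter, which is the throughput trade-off described in the text.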

Protocol and application:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6047729/pdf/emss-78570.pdf https://sci-hub.se/https://pubs.acs.org/doi/pdf/10.1021/acssynbio.7b00111

Review:

https://www.mdpi.com/2072-666X/10/11/734/htm