Introduction

Characterization of microstructural and nanoscale features in full 3D samples of materials is emerging as a key challenge across a range of technological applications. These features range from grain size distributions in metals and voids or porosity in soft materials such as polymers, to hierarchical structures and their distributions during self- and directed-assembly processes. It is well known that there is a strong correlation between the microstructural/nanoscale features of materials and their observed properties. For the most part, however, grain size characterization is performed on 2D samples, and the information from 2D slices is collated to derive the 3D microstructural information, which is inefficient and leads to potential loss of information. As such, a direct 3D classification approach for arbitrary polycrystalline microstructures is crucial and highly desirable, especially given the advancement of 3D characterization techniques such as tomography,1 high energy diffraction microscopy (HEDM),2 and coherent diffraction X-ray imaging.

Most industry-relevant structural materials are polycrystalline in nature and often contain thousands or millions of grains. Within each grain, the lattice arrangement of atoms is nearly identical, but the atomic orientation differs for each adjoining grain. Grain boundaries are interfaces where two grains or crystallites of different orientations meet without a disruption in the continuity of the material. Note that the thermodynamic equilibrium state of these polycrystalline materials is a single crystal.3 It is, however, well known that materials are often arrested or trapped in local minima, i.e., in the polycrystalline state. Grain formation in polycrystalline films during their growth and processing is a complex process that is highly sensitive to parameters such as temperature, deposition rate, dopant concentration, pressure, and impurity concentration. Nuclei are nanoscopic when formed (critical sizes start from tens of atoms) and lead to nanocrystalline solids that subsequently consolidate into larger grains. These ubiquitous phenomena, from “rare events” such as nucleation to the subsequent phase transformation in crystalline solids, lie at the heart of a spectrum of physico-chemical processes that govern nanoscale material transformation. They have been a fundamental problem in materials science and are also relevant to a broad range of energy applications.

Average grain size and grain size distribution are critical microstructural features that impact physical, mechanical, optical, chemical, and thermal properties, and represent fundamental quantities for characterizing polycrystalline materials.4,5,6,7,8,9 For example, the Hall–Petch relationship10,11 relates the final average grain size after transformation to the strength, hardness, stress–strain behavior, and fatigue of a material. Several previous investigations have shown that the grain size distribution has a significant effect on mechanical properties. For example, Berbenni et al.12 showed that for a given average grain size, broadening the grain size dispersion reduces the strength of a material. The classification and quantification of polycrystalline microstructure is therefore critically important for predicting material responses. A microstructural understanding is also important for the design and discovery of new materials with tailored properties, such as stronger materials that minimize fatigue failure of machine components over their operational lifetime.

The ubiquitous connection between the microstructure (mainly, the grain-size distribution) of a material and its physical properties has motivated numerous studies on developing robust techniques to analyze microscopy/tomography images.13,14,15,16,17,18 ASTM outlines the industry standard for grain identification in 2D data,16 which includes the matching, planimetric, and intercept methods. Although these methods can achieve high accuracy (±0.25 grain size units) and reproducibility, they can be severely impaired when the intersection criterion (for distinguishing grains) is poorly chosen or the grain-size distribution is non-uniform.16 In addition, these techniques often require tedious manual measurements, and automation is challenging due to variability in etching level or contrast differences, although electron backscatter diffraction methods have recently been proposed to eliminate subjectivity surrounding the existence/location of grain boundaries.15,19 Automated methods for grain identification in 2D data have been developed over the years. For example, there are supervised convolutional neural network (CNN) based methods,20 as well as unsupervised clustering or Voronoi based methods. Supervised methods, once trained, can achieve high accuracy, but the required prior training makes them specific to the material system that they are trained for. Unsupervised methods based on a combination of histogram thresholding, watershed algorithms, and k-means clustering can sometimes perform on par with supervised methods when a priori information (e.g., number of grains, crystal structure/orientation) and optimized hyperparameters are given, but in that case they inherit the same material-system specificity because of the information required from a specific dataset or experimental technique. Unsupervised methods that rely only on the local density of atoms/electrons are applicable to a much wider range of material systems and experimental techniques, but at the expense of accuracy. Nevertheless, existing grain-analysis techniques are largely focused on 2D images, and extending them to 3D images is not trivial. Extension of 2D-based techniques to 3D is routinely done via stacks of 2D image slices, which can be affected by the number or orientation of the slices and often leads to time-consuming processing. Evidently, a fast, general, reliable, and accurate way of identifying and analyzing grains in 3D images is still elusive.

With the advent of fourth-generation synchrotron X-ray sources, which possess extreme brightness and increased coherence, it has become possible to image materials over time in 3D (i.e., 4D imaging). Such advanced imaging is invaluable when seeking information about material response under in-situ or operando conditions. For polycrystalline materials, a few imaging modalities, including diffraction contrast tomography (DCT), Laue diffraction, and HEDM, have been used to create 3D maps of the polycrystalline state of the sample.21 Segmentation, i.e., the partitioning of the resulting image into discrete domains, is often a challenge, especially in tomographic images. When the contrast between regions or segments is faint, simple thresholding is often insufficient, and more advanced techniques such as clustering, deformable models, or gradient-based techniques are required; these have been employed with varying degrees of success.21 The ability to rapidly and accurately segment images, not just for polycrystalline materials but also to identify inclusions and precipitates within a matrix, would be invaluable for real-time characterization of materials.

Here, we present a method that combines topology classification, image processing, and unsupervised machine learning including clustering algorithms to enable rapid microstructural characterization of 3D samples. Our method provides the grain size distribution of samples derived from either simulations or experiments. We demonstrate the method on synthetic data of several representative polycrystal types, namely metals (fcc, bcc, hcp) and ice (hexagonal/cubic), as well as experimentally collected data of a Ni-based superalloy. The method is insensitive to the presence of extended defect structures such as stacking faults and semi-amorphous domains, which stymie standard classification methods. We have also extended the method to the characterization of other microstructural features, such as voids in porous materials22 (i.e., polymer matrices) and micellar distributions in complex solutions. The technique is computationally efficient and enables fast identification, tracking, and quantification of microstructural features that affect material properties. We envision this approach to be vital for future real-time analysis of data obtained from large characterization facilities such as synchrotrons, and broadly applicable to any 3D crystallographic data. The approach also enables characterization across a broad class of materials, from polycrystalline inorganics such as metals and ceramics to soft materials such as polymers and self-/directed-assembled structures in complex fluids.

Results

Microstructural characterization

Figure 1 illustrates the major steps in our ML method for autonomous microstructural characterization. These steps can be loosely organized into three main processes, analogous to those in a data science workflow (i.e., data collection and cleaning, data analysis, and data finishing).

Fig. 1: A schematic showing the major steps of our ML method for autonomous microstructural characterization of 3D polycrystalline samples.
figure 1

a Identification of local structures using topological classifiers. b Voxelization improves the processing efficiency and enables image-based processing techniques. c Thresholding enhances the distinction between microstructures and boundaries. d Clustering algorithm identifies individual microstructures. e Refinement process improves the size estimation and distribution of identified microstructures. f An optional back-mapping step transforms voxel data back to an atomistic representation.

Process 1: Preconditioning and topological classifiers

The first step in our microstructural analysis is to distinguish between the microstructures (e.g., grains) and their boundaries (Fig. 1a). For atomistic polycrystalline systems, this can be done via local structure identification using topological classifiers, such as common neighbor analysis (CNA) for fcc, bcc, and hcp structure types, which requires topological information up to the 1st nearest neighbors, and extended CNA for diamond (hexagonal/cubic) structure types, which requires information up to the 2nd nearest neighbors. These classifiers assign local structure labels to atoms based on their topological relationships with nearby neighbors. Unknown (amorphous) or unlabeled atoms are typically excluded from the microstructural analysis. For soft materials, the labeling can be done via atom type assignment based on chemical elements, bond topology, local charges, etc. Next, voxelization (Fig. 1b) is performed on the labeled (e.g., crystalline) atoms/beads, which makes efficient data preconditioning with standard image processing techniques possible. Lastly, preconditioning procedures such as image filters (e.g., uniform blur, local variance, etc.) and thresholding are applied to the voxelized data or experimental images to identify the boundaries of microstructures (Fig. 1c).
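As an illustration, the voxelization and thresholding steps can be sketched in a few lines of Python. The arrays, file names, bin size, and percentile cutoff below are illustrative placeholders rather than the exact settings used in this work:

```python
import numpy as np
from scipy.ndimage import uniform_filter

# Hypothetical inputs: positions is an (N, 3) array of coordinates in
# angstroms; crystalline is an (N,) boolean array of topological labels
# (True for atoms classified as crystalline, False for boundary/amorphous).
positions = np.load("positions.npy")
crystalline = np.load("crystalline_labels.npy")

box = positions.max(axis=0)                  # orthogonal box, origin at 0
bin_size = 4.5                               # angstroms; system dependent
nbins = np.ceil(box / bin_size).astype(int)

# Voxelization: per-voxel number density of crystalline atoms.
density, _ = np.histogramdd(positions[crystalline], bins=nbins,
                            range=[(0, b) for b in box])

# Preconditioning: a uniform blur followed by percentile thresholding of
# the non-zero voxels suppresses boundary voxels before clustering.
smoothed = uniform_filter(density, size=3)
cutoff = np.percentile(smoothed[smoothed > 0], 40)
grain_voxels = smoothed > cutoff             # boolean 3D grain mask
```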

Process 2: Unsupervised machine learning

Microstructural analysis is performed via clustering of the preconditioned voxels (Fig. 1d). Voxels with similar local structure labels are clustered. The number of clusters and their volumes, e.g., the number of grains and their sizes, provide an estimate of the size distribution. Furthermore, each individual microstructure is assigned a unique cluster label that can be utilized for visualization purposes. The choice of clustering algorithm (e.g., K-Means, DBSCAN, Mean-Shift, Gaussian mixture models) depends on the amount of pre-existing knowledge about the system, which can include the number of microstructures, the characteristics of the boundaries, etc. In the Results section, however, we demonstrate that density-based clustering algorithms (e.g., DBSCAN) can effectively handle all the tested polycrystal types and soft material systems even with limited pre-existing knowledge.
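A minimal sketch of this clustering step using scikit-learn's DBSCAN, continuing from the illustrative grain_voxels mask above (the min_samples value is a placeholder; its selection is discussed later):

```python
import numpy as np
from sklearn.cluster import DBSCAN

coords = np.argwhere(grain_voxels)       # (M, 3) integer voxel indices

# In voxel-index units, an eps just above sqrt(3) makes each voxel's
# neighborhood its 26 adjacent voxels, i.e., its 1st nearest voxels.
labels = DBSCAN(eps=np.sqrt(3) + 1e-6,
                min_samples=14).fit_predict(coords)

n_clusters = labels.max() + 1            # noise voxels are labeled -1
grain_sizes = np.bincount(labels[labels >= 0])
```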

Process 3: Refinements and back-mapping

The number and size estimates of microstructures obtained from the unsupervised machine learning process can be improved via a refinement step. Techniques such as label propagation and label spreading can effectively be used to assign cluster labels to unlabeled voxels/atoms near the boundaries (Fig. 1e). This step recovers information that might have been lost during the preconditioning process (e.g., thresholding and blur filters). Finally, for atomistic systems, a quick back-mapping based on the spatial relationship between voxels and atom coordinates can be used to transform the voxels back to their corresponding atomistic representation (Fig. 1f).
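The sketch below illustrates one simplified realization of the refinement and back-mapping, using a nearest-neighbor majority vote as a stand-in for full label propagation/spreading; coords, labels, grain_voxels, positions, and bin_size carry over from the earlier illustrative sketches:

```python
import numpy as np
from scipy.spatial import cKDTree

# Full 3D label grid; -1 marks voxels without a cluster label.
label_grid = np.full(grain_voxels.shape, -1, dtype=int)
label_grid[tuple(coords.T)] = labels

# Refinement: give each unlabeled voxel the most frequent label among
# labeled voxels within one voxel shell (a stand-in for label spreading).
tree = cKDTree(coords)
for voxel in np.argwhere(label_grid == -1):
    nbrs = labels[tree.query_ball_point(voxel, r=np.sqrt(3) + 1e-6)]
    nbrs = nbrs[nbrs >= 0]
    if nbrs.size:
        label_grid[tuple(voxel)] = np.bincount(nbrs).argmax()

# Back-mapping: each atom inherits the label of the voxel it falls in
# (approximate; assumes bins of exactly bin_size, clamped to the grid).
atom_bins = np.minimum((positions // bin_size).astype(int),
                       np.array(label_grid.shape) - 1)
atom_labels = label_grid[tuple(atom_bins.T)]
```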

Microstructural characterization applied to example systems

To demonstrate the generality of the described approach, we apply our ML method to the characterization of microstructural features both in polycrystalline materials and in soft materials such as polymers and micelles. In the former, the goal is to characterize the grain size distribution in 3D polycrystalline samples, whereas in the latter, the ML algorithm is used to identify porosity and voids in soft materials such as polymer matrices, and the micellar distribution during a typical aggregation process in complex fluids. To adapt the ML method to these systems, mainly the preconditioning process (local structure classification, voxelization bin size, etc.) needs to be customized; the details are discussed case by case. Below, we first describe our approach for the clustering and refinement processes.

In all the systems described above, the number of microstructures is not known and the microstructures can be irregularly shaped, so we choose a local density-based clustering algorithm, DBSCAN, for the microstructural analysis. The algorithm has two hyperparameters in its clustering criterion: the neighborhood cutoff (ε) and the minimum required number of neighbors (Nmin). For simplicity, we define ε to include only the 1st nearest voxels of each voxel, start with the strictest criterion (Nmin = 27 for 3D or 9 for 2D), and loosen it until the total number of clusters is maximized. Refinement of the clusters is done by assigning unlabeled voxels to the neighboring cluster label of maximum occurrence, with priority given to unlabeled voxels close to smaller microstructures. Finally, to recover an atomistic representation from voxels, atoms are assigned the cluster labels of the voxels in which they are located.
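The hyperparameter loosening described above can be sketched as a simple sweep (again illustrative; the stopping rule shown retains the Nmin that yields the most clusters):

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_with_loosening(coords, n_min_start=27):
    """Sweep min_samples from the strictest criterion (a voxel plus its
    full 26-neighborhood) downward and keep the clustering that
    maximizes the total number of clusters."""
    eps = np.sqrt(3) + 1e-6          # 1st nearest voxels only
    best_labels, best_count = None, -1
    for n_min in range(n_min_start, 1, -1):
        labels = DBSCAN(eps=eps, min_samples=n_min).fit_predict(coords)
        n_clusters = labels.max() + 1
        if n_clusters > best_count:
            best_labels, best_count = labels, n_clusters
    return best_labels, best_count
```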

Case 1: Grain size distribution in metal polycrystals

Four samples (aluminum, iron, silicon, and titanium) are chosen as representatives of common polycrystal types (fcc, bcc, diamond, and hcp). For benchmarking, we prepared synthetic polycrystalline samples with a known size distribution (see Methods). The preconditioning process begins with local structure identification of atoms using standard CNA for fcc, bcc, and hcp structures, and extended CNA for diamond structures. The atoms are classified as either “crystalline” or “boundary” types. Voxelization of the atoms is based on the number densities of crystalline atoms using a uniform bin size (4.5 Å for fcc Al, 4.1 Å for bcc Fe, 4.0 Å for diamond Si, and 4.4 Å for hcp Ti). A 40-percentile thresholding of non-zero voxels is applied to all samples to exclude grain boundary voxels from the clustering process.
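For reference, the OVITO Python module provides one common implementation of CNA; a sketch of how the per-atom labels for the fcc Al sample might be obtained (the file name and usage are illustrative, not the exact scripts used in this work):

```python
from ovito.io import import_file
from ovito.modifiers import CommonNeighborAnalysisModifier

pipeline = import_file("al_polycrystal.dump")   # hypothetical input file
pipeline.modifiers.append(CommonNeighborAnalysisModifier())
data = pipeline.compute()

# Per-atom structure labels; atoms not matching a known lattice type
# (Type.OTHER) are treated as boundary atoms and excluded.
types = data.particles['Structure Type']
crystalline = types == CommonNeighborAnalysisModifier.Type.FCC
positions = data.particles.positions[...]       # input to voxelization
```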

Results of the ML grain analysis for the four samples are shown in Fig. 2. For each polycrystal type, a plot shows the target (in red) and predicted (in blue) grain size distributions, sampled using Gaussian kernel density estimation. The snapshots next to each plot visualize the polycrystallinity of these samples, where individual grains are colored by their sizes (smallest in red, largest in blue). Comparison between the target and predicted distributions indicates that our unsupervised method achieves >94% accuracy in predicting the number of grains and correctly identifies grains larger than ~200 atoms in size.
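Such curves can be reproduced with SciPy's Gaussian kernel density estimator; a minimal sketch, assuming grain_sizes holds the per-grain atom counts from the clustering step:

```python
import numpy as np
from scipy.stats import gaussian_kde

kde = gaussian_kde(grain_sizes)
x = np.linspace(0, grain_sizes.max() * 1.1, 500)

# gaussian_kde integrates to 1; rescale so the area under the curve
# equals the total number of grains, matching the figure normalization.
distribution = kde(x) * len(grain_sizes)
```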

Fig. 2: Application of our ML method on several representative polycrystalline metal samples.
figure 2

Each of the samples (aluminum, iron, silicon, and titanium) is ~20 nm × 20 nm × 20 nm in size (~500,000 atoms). All samples have 300 grains. The plots show the target (in red) and predicted (in blue) grain size distributions. The distributions are normalized such that the shaded area equates to the total number of grains. The polycrystallinity of these samples is visualized by the snapshots next to the plots, where individual grains are colored by their sizes (smallest in red, largest in blue). The sample set consists of common polycrystal types: a fcc, b bcc, c diamond, d hcp.

Case 2: In situ visualization and 3D analysis of simulation trajectories

The high computational efficiency of our ML method makes it suitable for in-situ post-processing of molecular dynamics (MD) trajectories. To demonstrate this, we apply the grain analysis to an entire >1 µs MD simulation trajectory (with a frame every 0.1 ns) of a polycrystalline ice sample, previously generated using a coarse-grained (CG) model of water.23 The preconditioning process is similar to that of diamond Si, where extended CNA is used to identify the hexagonal, cubic, and stacking-disordered phases of ice. Due to the larger size of CG beads compared to atoms, a larger bin size of 5 Å was used in the voxelization process. The voxelization is based on the number densities of cubic and hexagonal beads.
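A sketch of such frame-by-frame processing using OVITO's trajectory interface (the file name is hypothetical; IdentifyDiamondModifier is OVITO's implementation of the extended CNA used to classify cubic/hexagonal phases):

```python
from ovito.io import import_file
from ovito.modifiers import IdentifyDiamondModifier

pipeline = import_file("ice_trajectory.lammpstrj")   # hypothetical file
pipeline.modifiers.append(IdentifyDiamondModifier())

for frame in range(pipeline.source.num_frames):
    data = pipeline.compute(frame)
    types = data.particles['Structure Type']
    # ... voxelize the cubic/hexagonal beads (5 A bins), threshold,
    # cluster, and record this frame's grain size distribution.
```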

Figure 3 shows two representative snapshots at t = 330 ns (smaller grains) and t = 669 ns (larger grains). The bottom left of each snapshot shows the result of the grain analysis, where individual grains are colored by their sizes. Despite the analysis being performed on an uncorrelated frame-by-frame basis, the coloring is relatively consistent because grains are sorted by size. However, changes in the number of grains across frames can lead to inconsistent assignment of cluster labels, which makes it difficult to isolate one grain and track its time evolution. We envision resolving this in future work by introducing correlation across frames based on the spatial proximity and lattice orientation of individual grains.

Fig. 3: Demonstration of our unsupervised ML grain analysis method on large-scale MD simulations.
figure 3

Snapshots from a 2-million-molecule simulation of polycrystalline ice performed using a CG model of water.23 The right side of each snapshot shows the hexagonal/cubic stacking-disordered ice grains and their grain boundaries. The bottom left of each snapshot shows the result of the grain analysis, where individual grains are colored by their sizes.

Case 3: Characterization of porosity and voids in polymer matrix

The described ML-based approach can easily be extended to void analysis in porous material samples. To demonstrate this, high-density polysiloxane and polyethylene samples were prepared (see Methods). These samples were equilibrated and densified using MD simulations. The preconditioning process is simply voxelization of the system based on the number densities of the polymer atoms. A bin size of 3 Å was used in the voxelization step to sample larger void spaces, although smaller bin sizes (higher resolution) can be used to sample much smaller spaces. Results of the void analysis are shown in Fig. 4. The method can handle large voids (Fig. 4a) as well as small voids (Fig. 4c), and provides void size distributions, such as those shown in Fig. 4b, d, which can be used to characterize the porous nature of the matrix samples.
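A minimal sketch of the void analysis, reusing the illustrative arrays from the earlier sketches; the only conceptual change is that the empty voxels, rather than the occupied ones, are clustered:

```python
import numpy as np
from sklearn.cluster import DBSCAN

bin_size = 3.0                               # angstroms, as in the text
nbins = np.ceil(box / bin_size).astype(int)
density, _ = np.histogramdd(positions, bins=nbins,
                            range=[(0, b) for b in box])

# Voids are connected regions of voxels containing no polymer atoms.
void_coords = np.argwhere(density == 0)
labels = DBSCAN(eps=np.sqrt(3) + 1e-6,
                min_samples=14).fit_predict(void_coords)
void_sizes = np.bincount(labels[labels >= 0])   # voxels per void
```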

Fig. 4: Demonstration of our unsupervised ML method on the analysis of voids in polymeric systems.
figure 4

The figure shows a polysiloxane sample (top) and a polyethylene sample (bottom). a, c Snapshots from atomistic MD simulations showing the identified void spaces. Individual voids are colored by their sizes. b, d Plots showing the size distribution of the voids. The distributions are normalized such that the shaded area equates to the total number of voids.

Case 4: Characterization of micellar size distribution in complex fluids

The described ML-based approach is also suitable for the structural analysis of hierarchical soft materials in complex fluids. The dynamics of ions and mesoscale structure in complex organic fluids is a fascinating fundamental science problem with deep implications for many important energy, chemical, and biological systems. Many recent studies24,25,26,27 have indicated that ion dynamics and transport can be strongly influenced by the hierarchical mesoscale ordering and internal interfaces that often occur in these systems.28,29 The formation of such hierarchical structures provides a broad opportunity to design new materials with outstanding performance for diverse applications such as battery electrolytes, MRI contrast reagents, sensors, catalysts, and solvent extraction systems.30 Although the equilibrium structure and phase behavior of complex fluids have been the subject of much study, there is a need to characterize the dynamics to understand and control ion transport, complexation, and aggregation processes. Here, we use our ML algorithm to characterize the micellar size distribution during the aggregation process in a 3D colloidal sample obtained from a molecular simulation trajectory.

To demonstrate this, we obtained a configuration of reverse micelles from CG MD simulations (see Methods). The preconditioning process includes voxelization of the system based on the number densities of water beads. Due to the coarse 4:1 mapping of this CG model, a large bin size of 8 Å is used in the voxelization step. Figure 5a shows the water clusters within individual equilibrated micelles colored by their sizes, and Fig. 5b shows the micellar size distribution as a function of these water cluster sizes.

Fig. 5: Demonstration of our unsupervised ML method on the size distribution analysis of reverse micelles in a complex solution.
figure 5

a Snapshots from CG MD simulations showing clusters of water beads within individual micelles, colored by their sizes. b Plot showing the size distribution of the micelles as a function of the water cluster sizes. The distribution is normalized such that the shaded area equates to the total number of micelles.

Case 5: Grain size distribution in superalloy sample from experiment

The described ML-based approach can also be applied to images collected from experiments. Unlike voxelized atomistic data, 3D images obtained from experimental characterization techniques, such as tomography and coherent diffraction X-ray imaging, contain more noise and artifacts. Furthermore, these images can be of bright-field or dark-field type, and the grains can span a range of pixel/voxel intensity values, which requires grain-boundary detection techniques beyond thresholding with a single cutoff. Figure 6 shows examples of such images and demonstrates the use of a local variance filter to effectively identify the boundaries of microstructures. Our method utilizes the same boundary detection approach as outlined earlier. Figure 7 demonstrates the use of our method on an IN100 Ni-based superalloy sample collected from serial-sectioning experiments. The processing pipeline for such data is illustrated in Fig. 7a, and the resulting grain size distribution is shown in Fig. 7b. Figure 7c, d demonstrate the same processing pipeline applied to input images of lower resolution, which results in a significant speedup in processing, albeit at the expense of lower feature detection resolution.
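The local variance map can be computed with two mean filters via Var[x] = E[x²] − E[x]²; a minimal sketch (image is an assumed 2D or 3D intensity array, and the window size and cutoff percentile are illustrative):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_variance(image, size=3):
    """Per-pixel/voxel intensity variance over a small window,
    computed as E[x^2] - E[x]^2 with uniform (mean) filters."""
    img = image.astype(float)
    return uniform_filter(img**2, size=size) - uniform_filter(img, size=size)**2

# Grain boundaries appear as high-variance regions in both bright- and
# dark-field images, so threshold the variance map, not raw intensity.
variance = local_variance(image)
boundary = variance > np.percentile(variance[variance > 0], 40)
grain_mask = ~boundary
```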

Fig. 6: Examples illustrating the use of local variance filter for grain boundary identification.
figure 6

2D images of polycrystalline grain samples are reproduced with permission from Campbell et al.36 and Groeber et al.15 The method can handle both bright-field and dark-field images and is sensitive only to the local variance of pixel intensity, which eliminates problems associated with direct thresholding based on absolute pixel intensity.

Fig. 7: Demonstration of our unsupervised ML method on grain identification of an IN100 Ni-based superalloy sample collected from serial-sectioning experiments.
figure 7

a 3D input image reconstructed from electron backscatter diffraction (EBSD) data15 and the corresponding target grain segmentation labeled using inverse pole figure (IPF) coloring. In our method, the input image is pre-processed using a local variance filter and thresholding prior to the clustering and refinement steps. b The predicted grain size distribution and grain segmentation obtained using our method. Boundary and unidentified voxels are colored green and gray, respectively. c, d Lower-resolution input images obtained by down-sampling and the corresponding grain segmentation predicted by our method. The effect of down-sampling is analogous to using a large bin size in the voxelization step for atomistic data. Down-sampling significantly speeds up the processing, but at the expense of accuracy (i.e., the ability to detect small and fine features).

Discussion

The robustness of our microstructural analysis method can be assessed by the deviation in results upon introducing variations to the data and hyperparameters at each of the major steps in the processing pipeline (Fig. 1). Because the operations are cascaded, errors early in the processing pipeline can propagate downstream. Of the six major steps, only the first four (i.e., local structure classification, voxelization, thresholding, and clustering) are likely to be affected by variations in the data or the choice of hyperparameters. Furthermore, the hyperparameters in the clustering step can be optimized on-the-fly based on the number of identified clusters.

Here, we provide a quantitative assessment of the error sensitivities associated with the remaining three steps. We use the fcc Al system as an example and manually introduce variations to the data and hyperparameters. Figure 8a–c shows snapshots of a sample with ~0%, 15%, and 25% randomly perturbed local structure labels. These incorrect labels in atomistic data affect the identification of grain boundaries (i.e., crystalline versus amorphous atoms), which is analogous to introducing noise into images from experimental measurements. The grain size distribution plots in Fig. 8d–f demonstrate that our method is resilient to such noise. For instance, the method can handle up to ~25% variation in the fcc Al data when the voxelization bin size is 5.5 Å, and up to ~15% data variation when the bin size is 4.5 Å. This robustness is attributed to the various down-sampling operations (e.g., voxelization and local variance/uniform filters) and the use of a density-based clustering algorithm that can handle noise (i.e., DBSCAN). As the voxelization bin size increases, the amount of data averaging increases, which makes the method more resilient to variations in the data. There is, however, a trade-off: a larger bin size leads to more efficient processing (Fig. 7c, d), but at the expense of losing small features or fine details in the grain size distribution. Also note that the use of a local variance filter for boundary identification alleviates the sensitivity of the method to different bin sizes, but the method remains sensitive to the cutoff value in the thresholding step. We found that a 90-percentile thresholding of non-zero voxels works well for atomistic data from simulations, and a 40-percentile thresholding generally works for images from experiments. This thresholding cutoff, similar to the hyperparameters (ε and Nmin) associated with the DBSCAN clustering algorithm, can potentially be optimized on-the-fly based on the number of identified grains. Future studies might investigate techniques analogous to Otsu thresholding for choosing such a cutoff in a deterministic manner.
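As a purely hypothetical illustration of such a deterministic choice (not part of the present method), Otsu's method from scikit-image could replace the hand-picked percentile:

```python
import numpy as np
from skimage.filters import threshold_otsu

# Hypothetical alternative: let Otsu's method pick the cutoff that best
# separates the distribution of non-zero (smoothed) voxel values.
values = smoothed[smoothed > 0]
cutoff = threshold_otsu(values)
grain_voxels = smoothed > cutoff
```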

Fig. 8: Error sensitivity of our unsupervised ML method on grain identification.
figure 8

a–c Snapshots of the atomistic fcc Al sample with a varying number of randomly perturbed local structure labels. d–f Grain size distribution plots of the system. A local variance filter with a 90-percentile thresholding is used for grain boundary identification, which alleviates the error sensitivity of the method to different voxelization bin sizes. Plots from left to right correspond to the amount of data variation in a–c, whereas from top to bottom, the voxelization bin size changes from 5.5 Å to 4.5 Å to 3.5 Å. As the bin size increases, the method becomes more resilient to variations in data due to more data averaging from down-sampling. This, however, comes at the expense of losing fine structures in the grain size distribution.

The efficiency of our method can be analyzed based on the time complexity of the steps in the workflow. Excluding the time that it takes to load the input data, the major time-consuming steps are voxelization, clustering, and refinement. The voxelization step has a time complexity of O(n) since each atom/bead is processed once during the conversion into voxels. This operation provides a significant time saving for the remaining steps in the workflow, since the voxelized system is typically ~25% of the original system size, and is further reduced via subsequent preconditioning and thresholding. The clustering step, in particular DBSCAN clustering in 3D space, has a typical time complexity of O(n log n), where n is the number of voxels remaining after thresholding in the preconditioning process. The clustering step is supported by a k-d tree (incorporating periodic boundary conditions) for fast nearest-neighbor search, which has a worst-case time complexity of O(n log n) to build and an average time complexity of O(log n) per neighbor query. The same k-d tree is used in the refinement step, where repeated queries are performed on voxels with no cluster labels. The time spent in the refinement step varies depending on the nature of the (grain) boundaries.
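For reference, SciPy's k-d tree supports periodic boundaries directly through its boxsize argument; a sketch of the neighbor-query pattern on the illustrative voxel coordinates from the earlier sketches:

```python
import numpy as np
from scipy.spatial import cKDTree

# boxsize enables periodic boundary conditions; coordinates must lie in
# [0, boxsize) along each dimension. Build: O(n log n) worst case.
tree = cKDTree(coords, boxsize=nbins)

# One fixed-radius neighbor query per voxel (average O(log n) each);
# the same tree serves both the clustering and refinement steps.
neighbor_lists = tree.query_ball_point(coords, r=np.sqrt(3) + 1e-6)
```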

In conclusion, we summarized the importance of and challenges in microstructural analysis of polycrystalline samples in full 3D and outlined an unsupervised ML approach to solve this major problem. Our ML method starts with data preconditioning using local structure topological classifiers, voxelization, and image processing. An unsupervised ML clustering algorithm is then used to obtain the statistics and distribution of the microstructures. Finally, techniques such as label spreading are used to refine the results, and back-mapping is performed to recover the atomistic representation. We demonstrated the efficacy of our method on several different classes of materials ranging from polycrystalline solids to soft materials such as polymers and complex fluids. The technique is applicable to the characterization of grain size distributions, voids, porosity, and similar microstructural features across a broad class of inorganic and soft material systems. It can be applied to synthetic data samples as well as experimentally measured data. We also highlighted the computational efficiency and error sensitivity of the method and emphasized its suitability for future real-time analysis of data from large characterization facilities.

Methods

Polycrystal sample preparation

Synthetic polycrystalline metal samples with a fixed number of grains (300) were prepared using Voronoi tessellation. Each sample is ~20 nm × 20 nm × 20 nm in size (~500,000 atoms) with periodic boundaries applied in the x-, y-, and z-directions. Grain size distribution curves of these samples were obtained for benchmarking purposes, where atoms at the grain (Voronoi) boundaries were identified and excluded from the grain size distribution curves to provide a more accurate grain size count. The identification of boundary atoms was done using standard CNA for fcc, bcc, and hcp lattice types and extended CNA for diamond lattice types.

Polycrystalline ice simulation

Polycrystalline ice samples were obtained from previously performed CG MD simulations of homogeneous nucleation runs23 using LAMMPS.31 The sample size is ~40 nm × 40 nm × 40 nm (~2-million water molecules), and the microstructural analysis was performed on the entire trajectory up to t = 1.2 μs, with a frame every 0.1 ns.

Polymer sample preparation

Two types of polymer matrix samples, polysiloxane and polyethylene, were prepared using atomistic fixed-bond models. The sample sizes were ~5 nm × 6 nm × 6 nm (~17k atoms) and ~8 nm × 9 nm × 8 nm (~33k atoms), respectively. These samples were minimized and equilibrated for up to 200 ns in LAMMPS31 using an empirical class2 potential with parameters from the COMPASS and PCFF force fields.32,33 An isothermal-isobaric (NPT) ensemble at T = 300 K and varying pressures was used to densify the samples.

Micelle sample preparation

A sample of a complex solution was prepared using a CG model (4:1 mapping). The sample size was ~82 nm × 82 nm × 90 nm, containing 125,000 water molecules, 1,500,000 dodecane molecules, and 120,400 surfactant-like molecules. The sample was minimized and equilibrated for up to 200 ns in NAMD34 using the MARTINI force field35 to obtain a configuration of reverse micelles. An isothermal-isobaric (NPT) ensemble at T = 300 K and P = 1 bar was used for the equilibration.