Session: 10-02: Photovoltaic, Photovoltaic-Thermal, and Electrochemical Technologies II
Paper Number: 156881
156881 - Feature Engineering for Optimizing Perovskite Solar Cells Using Perovskite Database Project Data
Abstract: This work presents a novel approach to optimizing perovskite solar cells using machine learning and data from the Perovskite Database Project (PDP). Using Shapley Additive Explanations (SHAP) for interpretability, the study highlights the key features that influence device efficiency and reliability. The research begins with extensive data cleaning: formatting errors are corrected, categorical features such as solvents and chemicals are converted into binary variables, and text features such as device architectures are encoded numerically. Noisy features with a large share of missing values are removed, and the remaining missing ('nan') values are filled in using K-Nearest Neighbors (KNN) imputation. The dataset is then preprocessed with the ColumnTransformer from scikit-learn so that categorical data is one-hot encoded.
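The cleaning steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the toy records, the column names (`architecture`, `solvents`, `anneal_temp_C`, `thickness_nm`, `humidity_pct`), the 50% missing-value threshold, and `n_neighbors=2` are all hypothetical choices made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import KNNImputer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy records; the column names are illustrative, not PDP fields.
df = pd.DataFrame({
    "architecture": ["nip", "pin", "nip", "nip", "pin", "nip"],
    "solvents": ["DMF;DMSO", "DMF", "DMSO", "DMF;DMSO", "GBL", "DMF"],
    "anneal_temp_C": [100.0, np.nan, 150.0, 100.0, 120.0, np.nan],
    "thickness_nm": [400.0, 350.0, np.nan, 500.0, 450.0, 380.0],
    "humidity_pct": [np.nan, np.nan, np.nan, np.nan, np.nan, 30.0],
})

# Multi-valued text column -> one binary indicator column per solvent.
solvent_flags = df["solvents"].str.get_dummies(sep=";").add_prefix("solvent_")
df = pd.concat([df.drop(columns="solvents"), solvent_flags], axis=1)

# Drop features that are mostly missing (the 50% threshold is an assumption).
max_missing_fraction = 0.5
df = df.loc[:, df.isna().mean() <= max_missing_fraction]

numeric_cols = ["anneal_temp_C", "thickness_nm"]
categorical_cols = ["architecture"]

# KNN-impute the numeric columns, one-hot encode the text column,
# and pass the binary solvent indicators through unchanged.
preprocess = ColumnTransformer(
    transformers=[
        ("num", KNNImputer(n_neighbors=2), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    remainder="passthrough",
    sparse_threshold=0,  # always return a dense array
)
X = preprocess.fit_transform(df)
print(X.shape)
```

Keeping the imputer and the encoder inside one ColumnTransformer means the same fitted transformations can be reapplied to new records, which avoids leaking statistics from held-out data into training.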
At present, many machine learning models are trained on data sets generated from simulations. Simulation data offers large data volume and good interpretability, because the variables are physical parameters. For example, a study investigating ideal bromine doping concentrations in MASnI(3-x)Br(x) simulated 42,000 devices by varying five material parameters [1]. Similarly, 50,000 artificial J-V curves were used to train a neural network functioning as a surrogate model [2], and the same method was later adopted to optimize the GaAs deposition process based on 20,000 simulated J-V curves [3]. These studies investigated five parameters with a comparable number of simulations (~10^4), while a recent study used over 2 million simulated devices [4]. That study found that a random forest model outperformed a single-tree model, achieving high trust scores and over 80% prediction accuracy in distinguishing different recombination sources in solar cells. Taken together, these studies indicate that prediction accuracy is influenced not just by the size of the dataset but also by its quality.
In this work, we leverage experimental data from the Perovskite Database Project, which has extracted up to 100 device parameters from more than 42,000 perovskite solar cells (PSCs). Compared to simulation data, published experimental data can significantly enhance prediction accuracy. With only a few hundred data points, prior research has found suitable electron transport layer (ETL) materials, optimized processing conditions, and identified material and method combinations that lead to stable PSCs [5]. The PDP dataset is unprecedented in size for experimental data and comparable to simulation data in volume, but very few machine learning models have been applied to fully exploit this resource. Using this real data, material properties, device parameters, and processing conditions can all be correlated with power conversion efficiency (PCE) and device stability, providing comprehensive insights into device performance.
To identify the most relevant features in the PDP dataset, SHAP is used to rank them by their contribution to performance metrics such as PCE, short-circuit current density (Jsc), and open-circuit voltage (Voc). Important categories include materials (perovskite, ETL, and hole transport layer (HTL)), solvents, anti-solvents, additives, and deposition methods. We further refine the feature space using Principal Component Analysis (PCA) for dimensionality reduction, which retains critical information while simplifying the model. Initial findings reveal significant insights into the behavior of key components. The SHAP results have identified features that impact efficiency and reliability, though these findings remain general at this stage due to time constraints. Neptune.ai is employed for hyperparameter tuning to enhance model robustness and accuracy.
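The idea behind ranking features by Shapley values can be illustrated with a brute-force sketch. Several things here are assumptions for illustration only: the code does not use the SHAP library, the feature names are hypothetical, and a toy linear surrogate stands in for a trained PCE model. The function computes exact Shapley values by enumerating every feature coalition, which is feasible only for a handful of features; features are then ranked by mean absolute attribution, the same summary commonly used for SHAP rankings.

```python
import itertools
import math

import numpy as np


def exact_shapley(predict, x, background):
    """Exact Shapley values for one sample x (cost grows as 2^n_features).

    A coalition's value is the mean prediction when its features are fixed
    to the values in x and all other features come from the background rows.
    """
    n = x.shape[0]

    def value(subset):
        data = background.copy()
        idx = list(subset)
        if idx:
            data[:, idx] = x[idx]
        return predict(data).mean()

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for subset in itertools.combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi


# Hypothetical feature names and a toy linear surrogate (both assumptions).
feature_names = ["perovskite_bandgap", "etl_thickness", "anneal_temp", "unused_flag"]
weights = np.array([3.0, -1.5, 0.5, 0.0])


def predict(data):
    return data @ weights


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
background = X

# Attribute the first 50 samples, then rank features by mean |Shapley value|.
phi_all = np.array([exact_shapley(predict, x, background) for x in X[:50]])
importance = np.abs(phi_all).mean(axis=0)
ranking = [feature_names[i] for i in np.argsort(-importance)]
print(ranking)
```

For a linear model the attribution of feature i reduces to its weight times the sample's deviation from the background mean, which gives a convenient sanity check. Real models with many features need the approximations that the SHAP library provides, since exact enumeration scales exponentially.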
References:
[1] Al Jame, H., et al., Supervised Machine Learning-Aided SCAPS-Based Quantitative Analysis for the Discovery of Optimum Bromine Doping in Methylammonium Tin-Based Perovskite (MASnI(3-x)Br(x)). ACS Applied Materials & Interfaces, 2022. 14(1): p. 502-516.
[2] Ren, Z.K., et al., Physics-guided characterization and optimization of solar cells using surrogate machine learning model. In: IEEE 46th Photovoltaic Specialists Conference (PVSC). 2019. Chicago, IL: IEEE.
[3] Ren, Z.K., et al., Embedding physics domain knowledge into a Bayesian network enables layer-by-layer process innovation for photovoltaics. npj Computational Materials, 2020. 6(1).
[4] Le Corre, V.M., et al., Identification of the dominant recombination process for perovskite solar cells based on machine learning. Cell Reports Physical Science, 2021. 2(2).
[5] She, C.L., et al., Machine learning-guided search for high-efficiency perovskite solar cells with doped electron transport layers. Journal of Materials Chemistry A, 2021. 9(44): p. 25168-25177.
Presenting Author: Jiawei Gong, Penn State Behrend
Presenting Author Biography: Dr. Jiawei Gong is an Associate Professor of Mechanical Engineering at Penn State Behrend. He earned his PhD (2017) and MS (2014) in Mechanical Engineering from North Dakota State University and a BE in Polymer Materials and Engineering from East China University of Science and Technology (2010). Dr. Gong's research interests include energy materials and devices, particularly dye-sensitized and perovskite solar cells. His recent research focuses on leveraging machine learning for data acquisition and analysis, aimed at advancing solar energy technologies.
Paper Type
Technical Presentation Only