a format that can be provided to machine learning algorithms, which typically require numerical input. This is a scikit-learn class used to convert categorical variables into a numerical form that can be more easily used by machine learning models. It does this by creating a new binary column for each unique category in the original column. During data transformation, it may occur that categories appear that were not present in the original data set during encoder training. By default, OneHotEncoder returns a sparse array to save memory, especially useful when there are many categories. However, by setting sparse=False, the encoder will return a dense array of NumPy. Primera Imputación incluyendo la columna ‘soil’ -> Preprocesamiento