Validating your data early helps mitigate project defects. Without validating data, you risk basing decisions on data with imperfections that do not accurately represent the situation at hand.
Deepchecks is a Python package for comprehensively validating your machine learning models and data, and it enables doing so with minimal effort. Deepchecks accompanies you through various validation needs, such as:
- Verifying your data's integrity
- Inspecting its distributions
- Validating data splits
- Evaluating your model and comparing different models
1. When you have a new dataset: Validate New Data.
2. When you split the data (before training / various cross-validation splits / hold-out test set / …): Validate the Split.
3. When you evaluate a model: Validate Model Performance.
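To make these three moments concrete, here is a minimal sketch that maps each one to a built-in deepchecks suite. It assumes the tabular API of a recent deepchecks release (deepchecks.tabular); in early versions, Dataset and the suites were imported from the top-level deepchecks package. The iris data and the random-forest model are only placeholders.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

from deepchecks.tabular import Dataset
from deepchecks.tabular.suites import (
    data_integrity,         # 1. a new dataset
    train_test_validation,  # 2. after splitting the data
    model_evaluation,       # 3. after training a model
)

# Placeholder data and model for the sketch
iris = load_iris(as_frame=True).frame
train_df, test_df = train_test_split(iris, test_size=0.3, random_state=42)
model = RandomForestClassifier(random_state=42).fit(
    train_df.drop(columns="target"), train_df["target"]
)

# Wrap the DataFrames so deepchecks knows which column is the label
train_ds = Dataset(train_df, label="target", cat_features=[])
test_ds = Dataset(test_df, label="target", cat_features=[])

# 1. New dataset -> validate the new data
data_integrity().run(train_ds).show()

# 2. Data split -> validate the split
train_test_validation().run(train_dataset=train_ds, test_dataset=test_ds).show()

# 3. Model evaluation -> validate model performance
model_evaluation().run(train_dataset=train_ds, test_dataset=test_ds, model=model).show()
```

show() renders each report inline in a notebook; in a plain script you can call save_as_html() on the result instead.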
Each check inspects a specific aspect of your data and models, such as data drift, duplicate values, etc. Each check can produce two types of results:
- A visual result meant for display (e.g. a figure or a table)
- A return value that can be used for programmatically validating the expected check results
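A single check can also be run on its own. The sketch below uses the DataDuplicates check and again assumes the deepchecks.tabular API: show() renders the visual result, while the value attribute holds the programmatic one (for this check, the share of duplicate samples, though the exact semantics may vary across versions).

```python
import pandas as pd

from deepchecks.tabular import Dataset
from deepchecks.tabular.checks import DataDuplicates

# A tiny frame with one deliberately duplicated row
df = pd.DataFrame({"a": [1, 2, 2, 3], "b": ["x", "y", "y", "z"]})
dataset = Dataset(df, cat_features=["b"])

result = DataDuplicates().run(dataset)
result.show()        # visual result: a table highlighting the duplicate rows
print(result.value)  # return value: the share of duplicate samples (assumption:
                     # exact semantics may differ between deepchecks versions)
```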
Deepchecks can be installed using pip or conda, depending on the package manager you use for most of your packages.
- Using pip: pip install deepchecks
- Using conda: conda install -c conda-forge deepchecks
- Using Google Colab or a Kaggle kernel: !pip install deepchecks --user
All you need to have is the data and model that you wish to validate. More specifically, you need:
- Your train and test data (as pandas DataFrames or NumPy arrays)
- (optional) A supported model (including XGBoost, scikit-learn models, and many more). The model is required only for checks that use its predictions.
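Putting those requirements together, the following sketch wraps plain pandas DataFrames in deepchecks Dataset objects and runs the full suite. The breast-cancer data and the scikit-learn model are placeholders, and the model argument can simply be omitted if you only want the data-level checks.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

from deepchecks.tabular import Dataset
from deepchecks.tabular.suites import full_suite

# Train/test data as pandas DataFrames (NumPy arrays work too)
data = load_breast_cancer(as_frame=True).frame
train_df, test_df = train_test_split(data, test_size=0.25, random_state=0)

train_ds = Dataset(train_df, label="target", cat_features=[])
test_ds = Dataset(test_df, label="target", cat_features=[])

# Optional: a supported model; checks that need predictions are skipped without it
model = RandomForestClassifier(random_state=0).fit(
    train_df.drop(columns="target"), train_df["target"]
)

result = full_suite().run(train_dataset=train_ds, test_dataset=test_ds, model=model)
result.save_as_html("deepchecks_report.html")  # or result.show() in a notebook
```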