MAR univariate example
A basic example of generating artificial missing data under the Missing at Random (MAR) mechanism with the mdatagen library, using the Iris dataset from scikit-learn. The observed feature is "petal length" and the feature that receives the missing values is "petal width". The simulated missing rate is 12%, and the values to remove are chosen with the random method.
In [1]:
# Import the libraries
import pandas as pd
from sklearn.datasets import load_iris
from mdatagen.univariate.uMAR import uMAR
# Load the data
iris = load_iris()
iris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
X = iris_df.copy() # Features
y = iris.target # Label values
# Create an instance with a 12% missing rate under the MAR mechanism
generator = uMAR(X=X,
                 y=y,
                 missing_rate=12,
                 x_miss='petal width (cm)',
                 x_obs='petal length (cm)')
# Generate the missing data under the MAR mechanism
generate_data = generator.random()
print(generate_data.isna().sum())
sepal length (cm)     0
sepal width (cm)      0
petal length (cm)    38
petal width (cm)      0
target                0
dtype: int64
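To make the MAR idea in this example more concrete, the following is a minimal sketch written with NumPy and pandas only. It is not mdatagen's implementation: the helper `mar_mask_sketch`, its rank-based weighting, and the synthetic `demo` data are all assumptions made for illustration. It hides a fixed percentage of one column, choosing rows with a probability that depends only on a second, fully observed column.

```python
import numpy as np
import pandas as pd


def mar_mask_sketch(df, x_miss, x_obs, missing_rate, seed=0):
    """Hypothetical helper (not part of mdatagen).

    Hides `missing_rate` percent of `x_miss`. Rows are sampled with a
    probability proportional to the rank of `x_obs`, so missingness
    depends only on an observed feature, which is the MAR assumption.
    """
    rng = np.random.default_rng(seed)
    out = df.copy()
    n_missing = int(round(len(out) * missing_rate / 100))
    # Rank-based weights: rows with a larger x_obs are more likely to be masked.
    weights = out[x_obs].rank(method="first").to_numpy()
    probs = weights / weights.sum()
    rows = rng.choice(len(out), size=n_missing, replace=False, p=probs)
    out.iloc[rows, out.columns.get_loc(x_miss)] = np.nan
    return out


# Synthetic stand-in data (assumed values, not the Iris dataset)
data_rng = np.random.default_rng(42)
demo = pd.DataFrame({
    "petal length (cm)": data_rng.uniform(1.0, 7.0, 150),
    "petal width (cm)": data_rng.uniform(0.1, 2.5, 150),
})

masked = mar_mask_sketch(demo,
                         x_miss="petal width (cm)",
                         x_obs="petal length (cm)",
                         missing_rate=12)
# Missing count and missing percentage per column
print(masked.isna().sum())
print(masked.isna().mean() * 100)
```

Checking `isna().mean() * 100` per column is a quick way to confirm that the realised missing rate matches the requested one and that the missing values landed in the intended feature.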