MCAR multivariate: mMCAR Class

mMCAR

mMCAR(X: pd.DataFrame, y: np.array, missing_rate: int = 10, missTarget: bool = False)

A class to generate missing data in a dataset based on the Missing Completely At Random (MCAR) mechanism for multiple features simultaneously.

Args:
    X (pd.DataFrame): The dataset to receive the missing data.
    y (np.array): The label values of the dataset.
    missing_rate (int, optional): The rate of missing data to be generated. Default is 10.
    missTarget (bool, optional): A flag to also generate missing values in the target. Default is False.

Example Usage:

# Create an instance of the mMCAR class
generator = mMCAR(X, y, missing_rate=20)

# Generate missing values using the random strategy
data_md = generator.random()

random

random() -> pd.DataFrame

Randomly generates missing data across the entire dataset.

Returns: dataset (DataFrame): The dataset with missing values generated under the MCAR mechanism.

Reference: [1] Santos, M. S., R. C. Pereira, A. F. Costa, J. P. Soares, J. Santos, and P. H. Abreu. 2019. Generating Synthetic Missing Data: A Review by Missing Mechanism. IEEE Access 7: 11651–67.
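
To illustrate the idea behind masking an entire dataset completely at random, here is a minimal sketch. It is not the library's implementation: `mcar_random_sketch` and its `seed` parameter are hypothetical names, and the sketch assumes that `missing_rate` is a percentage of all cells and that every feature is numeric.

```python
import numpy as np
import pandas as pd


def mcar_random_sketch(X: pd.DataFrame, missing_rate: int = 10, seed: int = 0) -> pd.DataFrame:
    """Hypothetical helper: set a fixed share of cells, chosen uniformly at random, to NaN."""
    rng = np.random.default_rng(seed)
    # Work on a float copy so NaN can be stored (assumes numeric features).
    values = X.to_numpy(dtype=float, copy=True)
    n_cells = values.size
    # Assumption: missing_rate is a percentage of the total number of cells.
    n_missing = int(round(n_cells * missing_rate / 100))
    # Draw distinct cell positions, independent of the data values (MCAR).
    flat_idx = rng.choice(n_cells, size=n_missing, replace=False)
    rows, cols = np.unravel_index(flat_idx, values.shape)
    values[rows, cols] = np.nan
    return pd.DataFrame(values, index=X.index, columns=X.columns)
```

Because the positions are drawn without looking at any feature or label value, the missingness cannot depend on the data, which is the defining property of MCAR.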

binomial

binomial(columns: list = None)

Generates missing data in the specified columns by drawing from a Bernoulli distribution for each listed attribute.

Args: columns (list): A list of strings containing column names.

Returns: dataset (DataFrame): The dataset with missing values generated under the MCAR mechanism.

Reference: [1] Santos, M. S., R. C. Pereira, A. F. Costa, J. P. Soares, J. Santos, and P. H. Abreu. 2019. Generating Synthetic Missing Data: A Review by Missing Mechanism. IEEE Access 7: 11651–67.
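
The Bernoulli approach can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: `mcar_binomial_sketch` and `seed` are hypothetical names, `missing_rate` is treated as a percentage used as the per-cell probability, and `columns=None` is assumed to mean "all columns".

```python
import numpy as np
import pandas as pd


def mcar_binomial_sketch(X: pd.DataFrame, columns: list = None,
                         missing_rate: int = 10, seed: int = 0) -> pd.DataFrame:
    """Hypothetical helper: mask each cell of the chosen columns with a Bernoulli draw."""
    rng = np.random.default_rng(seed)
    out = X.copy()
    # Assumption: no column list means every column is eligible.
    cols = list(X.columns) if columns is None else columns
    # Assumption: missing_rate is a percentage, used as the success probability.
    p = missing_rate / 100
    for col in cols:
        # One independent Bernoulli(p) trial per row; 1 means "make this cell missing".
        mask = rng.binomial(n=1, p=p, size=len(out)).astype(bool)
        # Keep values where the mask is False; masked cells become NaN.
        out[col] = out[col].where(~mask)
    return out
```

Unlike drawing a fixed number of cells, independent Bernoulli trials yield a missing share per column that only approximates the requested rate, and the approximation tightens as the number of rows grows.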