MCAR univariate: uMCAR Class

uMCAR

uMCAR(X: pd.DataFrame, y: np.array, missing_rate: int = 10, x_miss: str = None, method: str = 'random')

A class to generate missing values in a dataset based on the Missing Completely At Random (MCAR) univariate mechanism.

Args:
    X (pd.DataFrame): The dataset to receive the missing data.
    y (np.array): The label values of the dataset.
    missing_rate (int, optional): The rate of missing data to generate. Default is 10.
    x_miss (str, optional): The name of the feature in which to insert the missing data.
    method (str, optional): The method used to choose x_miss. If x_miss is not provided by the user, it is chosen randomly. The options are ["random", "correlated", "min", "max"]. Default is "random".

Example Usage:

# Create an instance of the uMCAR class
generator = uMCAR(X, y, missing_rate=20, method="correlated")

# Generate missing values using the random strategy
data_md = generator.random()

random

random()

Randomly selects the positions in the feature (x_miss) to be set as missing.

Returns: dataset (DataFrame): The dataset with missing values generated under the MCAR mechanism.

Reference: [1] Santos, M. S., R. C. Pereira, A. F. Costa, J. P. Soares, J. Santos, and P. H. Abreu. 2019. Generating Synthetic Missing Data: A Review by Missing Mechanism. IEEE Access 7: 11651–67.
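The idea behind random() can be illustrated with a minimal standalone sketch. This is not the library's implementation: the function name mcar_random_sketch and the seed parameter are hypothetical, missing_rate is assumed to be a percentage of rows, and x_miss is assumed to be a numeric feature.

```python
import numpy as np
import pandas as pd


def mcar_random_sketch(X, x_miss, missing_rate=10, seed=None):
    """Hypothetical illustration of univariate MCAR by random position selection."""
    rng = np.random.default_rng(seed)
    out = X.copy()  # leave the caller's DataFrame untouched
    # Assumption: numeric feature; cast to float so NaN can be stored.
    out[x_miss] = out[x_miss].astype(float)
    n_rows = len(out)
    # Assumption: missing_rate is a percentage of the rows.
    n_missing = int(round(n_rows * missing_rate / 100))
    # Pick distinct row positions uniformly at random, independent of the data.
    positions = rng.choice(n_rows, size=n_missing, replace=False)
    out.iloc[positions, out.columns.get_loc(x_miss)] = np.nan
    return out
```

Because the positions are drawn without looking at any feature or label value, the missingness does not depend on the data, which is what makes the mechanism MCAR. The number of missing entries is fixed by the rate.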

binomial

binomial()

Selects the positions in the feature (x_miss) to be set as missing by drawing from a Bernoulli distribution.

Returns: dataset (DataFrame): The dataset with missing values generated under the MCAR mechanism.

Reference: [1] Santos, M. S., R. C. Pereira, A. F. Costa, J. P. Soares, J. Santos, and P. H. Abreu. 2019. Generating Synthetic Missing Data: A Review by Missing Mechanism. IEEE Access 7: 11651–67.
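The Bernoulli-based selection can be sketched in the same way. Again, this is only an illustration under stated assumptions, not the library's code: mcar_binomial_sketch and seed are hypothetical names, missing_rate is assumed to be a percentage used as the per-row probability of being missing, and x_miss is assumed to be numeric.

```python
import numpy as np
import pandas as pd


def mcar_binomial_sketch(X, x_miss, missing_rate=10, seed=None):
    """Hypothetical illustration of univariate MCAR with Bernoulli draws."""
    rng = np.random.default_rng(seed)
    out = X.copy()  # leave the caller's DataFrame untouched
    # Assumption: numeric feature; cast to float so NaN can be stored.
    out[x_miss] = out[x_miss].astype(float)
    # One Bernoulli trial per row; 1 means "set this row to missing".
    # Assumption: missing_rate is a percentage, so p = missing_rate / 100.
    draws = rng.binomial(n=1, p=missing_rate / 100, size=len(out))
    mask = draws.astype(bool)
    out.loc[mask, x_miss] = np.nan
    return out
```

Unlike the fixed-count selection, each row is decided independently here, so the observed share of missing values only approximates missing_rate and varies from run to run.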