MCAR univariate: uMCAR Class
uMCAR
uMCAR(X: pd.DataFrame, y: np.array, missing_rate: int = 10, x_miss: str = None, method: str = 'random')
A class to generate missing values in a dataset based on the Missing Completely At Random (MCAR) univariate mechanism.
Args:
    X (pd.DataFrame): The dataset to receive the missing data.
    y (np.array): The label values of the dataset.
    missing_rate (int, optional): The rate of missing data to be generated. Default is 10.
    x_miss (str, optional): The name of the feature in which to insert the missing data.
    method (str, optional): The method used to choose x_miss. If x_miss is not provided by the user, it is chosen randomly. The options for choosing x_miss are ["random", "correlated", "min", "max"]. Default is "random".
Example Usage:
# Create an instance of the uMCAR class
generator = uMCAR(X, y, missing_rate=20, method="correlated")
# Generate missing values using the random strategy
data_md = generator.random()
random
Function to randomly select locations in the feature (x_miss) to be missing.
Returns: dataset (DataFrame): The dataset with missing values generated under the MCAR mechanism.
Reference: [1] Santos, M. S., R. C. Pereira, A. F. Costa, J. P. Soares, J. Santos, and P. H. Abreu. 2019. Generating Synthetic Missing Data: A Review by Missing Mechanism. IEEE Access 7: 11651–67.
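The selection step described for random can be illustrated with a minimal sketch. It is not the library's implementation: the helper name mcar_random_sketch and its seed parameter are hypothetical, and it assumes that missing_rate is a percentage of rows and that the chosen feature is numeric.

```python
import numpy as np
import pandas as pd


def mcar_random_sketch(X, x_miss, missing_rate=10, seed=None):
    """Hypothetical helper: blank out a fixed share of rows in one feature."""
    rng = np.random.default_rng(seed)
    # Assumption: missing_rate is a percentage of the number of rows.
    n_missing = int(round(len(X) * missing_rate / 100))
    # Pick row positions uniformly at random, without replacement.
    rows = rng.choice(len(X), size=n_missing, replace=False)
    out = X.copy()
    # Cast to float so the column can hold NaN (assumes a numeric feature).
    out[x_miss] = out[x_miss].astype(float)
    out.iloc[rows, out.columns.get_loc(x_miss)] = np.nan
    return out


X = pd.DataFrame({"a": np.arange(10.0), "b": np.arange(10.0) * 2})
data_md = mcar_random_sketch(X, "a", missing_rate=20, seed=0)
print(data_md["a"].isna().sum())  # 2 of the 10 rows in "a" are missing
```

Because the rows are drawn without replacement, this variant produces exactly the requested number of missing values, and only the selected feature is affected.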
binomial
Function to choose the locations in the feature (x_miss) to be missing by drawing from a Bernoulli distribution.
Returns: dataset (DataFrame): The dataset with missing values generated under the MCAR mechanism.
Reference: [1] Santos, M. S., R. C. Pereira, A. F. Costa, J. P. Soares, J. Santos, and P. H. Abreu. 2019. Generating Synthetic Missing Data: A Review by Missing Mechanism. IEEE Access 7: 11651–67.
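The Bernoulli-based selection described for binomial can be sketched in the same way. Again, this is an illustration under assumptions rather than the library's code: mcar_binomial_sketch and seed are hypothetical names, missing_rate is assumed to be a percentage, and the feature is assumed to be numeric.

```python
import numpy as np
import pandas as pd


def mcar_binomial_sketch(X, x_miss, missing_rate=10, seed=None):
    """Hypothetical helper: each row goes missing with an independent coin flip."""
    rng = np.random.default_rng(seed)
    # One Bernoulli trial per row; assumption: p = missing_rate / 100.
    flips = rng.binomial(n=1, p=missing_rate / 100, size=len(X)).astype(bool)
    out = X.copy()
    # Series.mask replaces the entries where the condition is True with NaN.
    out[x_miss] = out[x_miss].astype(float).mask(flips)
    return out


X = pd.DataFrame({"a": np.arange(1000.0), "b": np.arange(1000.0) * 2})
data_md = mcar_binomial_sketch(X, "a", missing_rate=20, seed=0)
# The observed share of missing values is close to, but not exactly, 0.20.
print(data_md["a"].isna().mean())
```

Unlike a fixed-count draw, each row is decided independently here, so the realized share of missing values fluctuates around the requested rate from run to run.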