MNAR multivariate: mMNAR Class

mMNAR

mMNAR(X: pd.DataFrame, y: np.array, **kwargs)

A class to generate missing values in a dataset based on the Missing Not At Random (MNAR) mechanism for multiple features simultaneously.

Args:
    X (pd.DataFrame): The dataset to receive the missing data.
    y (np.array): The label values from the dataset.
    missing_rate (int, optional): The rate of missing data to be generated. Default is 10.

Keyword Args:
    n_xmiss (int, optional): The number of features in the dataset that will receive missing values. Default is the number of features in the dataset.
    threshold (float, optional): The threshold used to select the locations in the feature (x_miss) that receive missing values, where 0 selects the lowest values and 1 the highest. Default is 0.
    missTarget (bool, optional): A flag to also generate missing values in the target.

Example Usage:

# Create an instance of the mMNAR class
generator = mMNAR(X, y)

# Generate missing values using the random strategy
data_md = generator.random()

random

random(missing_rate: int = 10, deterministic: bool = False)

Function to randomly choose the feature (x_miss) in the dataset in which to generate missing data. The missing locations in x_miss are those with the lowest values of an unobserved feature or of x_miss itself.

Args:
    missing_rate (int, optional): The rate of missing data to be generated. Default is 10.
    deterministic (bool, optional): A flag that determines whether the missing locations in x_miss are based on x_miss itself or on an unobserved feature. Default is False (i.e., an unobserved feature).

Returns: dataset (DataFrame): The dataset with missing values generated under the MNAR mechanism.

Reference: [1] Santos, M. S., R. C. Pereira, A. F. Costa, J. P. Soares, J. Santos, and P. H. Abreu. 2019. Generating Synthetic Missing Data: A Review by Missing Mechanism. IEEE Access 7: 11651–67.
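The difference between the two settings of the deterministic flag can be illustrated with a small, library-independent sketch. It is not the mMNAR implementation: the helper names (lowest_value_mask, apply_mask), the use of plain lists with None standing in for NaN, and the reading of missing_rate as a percentage of rows are all assumptions made for illustration.

```python
import random


def lowest_value_mask(driver, missing_rate):
    """Mark the lowest `missing_rate` percent of `driver` (hypothetical helper)."""
    n_missing = round(len(driver) * missing_rate / 100)
    # Indices sorted by driver value, ascending; the first n_missing are masked.
    order = sorted(range(len(driver)), key=lambda i: driver[i])
    to_mask = set(order[:n_missing])
    return [i in to_mask for i in range(len(driver))]


def apply_mask(values, mask):
    """Replace masked entries with None, standing in for NaN."""
    return [None if m else v for v, m in zip(values, mask)]


x_miss = [5.0, 1.0, 4.0, 2.0, 3.0, 9.0, 8.0, 7.0, 6.0, 10.0]

# Analogue of deterministic=True: x_miss drives its own missingness,
# so its two lowest values (1.0 and 2.0) are removed.
own = apply_mask(x_miss, lowest_value_mask(x_miss, missing_rate=20))

# Analogue of deterministic=False: a hidden feature that is never stored
# in the dataset decides which entries of x_miss are removed.
rng = random.Random(0)
hidden = [rng.random() for _ in x_miss]
unobserved = apply_mask(x_miss, lowest_value_mask(hidden, missing_rate=20))
```

In both cases the cause of the missingness cannot be recovered from the observed data, which is what makes the mechanism MNAR.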

correlated

correlated(missing_rate: int = 10, deterministic: bool = False)

Function to generate missing data in the dataset based on correlated pairs of features. In each pair, the feature (x_miss) most correlated with the class receives the missing data, based on the lowest values of an unobserved feature or of x_miss itself.

Args:
    missing_rate (int, optional): The rate of missing data to be generated. Default is 10.
    deterministic (bool, optional): A flag that determines whether the missing locations in x_miss are based on x_miss itself or on an unobserved feature. Default is False (i.e., an unobserved feature).

Returns: dataset (DataFrame): The dataset with missing values generated under the MNAR mechanism.

Reference: [1] Santos, M. S., R. C. Pereira, A. F. Costa, J. P. Soares, J. Santos, and P. H. Abreu. 2019. Generating Synthetic Missing Data: A Review by Missing Mechanism. IEEE Access 7: 11651–67.
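The selection step described above, choosing within a pair the feature most correlated with the class, might look like the following sketch. It is an illustration only: pearson and pick_x_miss are hypothetical helpers, and both the use of Pearson correlation and the comparison by absolute value are assumptions rather than confirmed details of the library.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def pick_x_miss(features, pair, y):
    """Within a pair of feature names, return the one most correlated with y.

    Assumption: strength is measured by the absolute Pearson correlation.
    """
    return max(pair, key=lambda name: abs(pearson(features[name], y)))


features = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "f2": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
}
y = [0, 0, 0, 1, 1, 1]

# f1 tracks the class more closely than f2, so it is chosen as x_miss.
x_miss = pick_x_miss(features, ("f1", "f2"), y)
```

Once x_miss is chosen for a pair, the missing locations would be taken from the lowest values of x_miss itself or of an unobserved feature, as described above.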

median

median(missing_rate: int = 10, deterministic: bool = False)

Function to generate missing data across the whole dataset based on the median of each feature. The missing locations are chosen from the lowest values of an unobserved feature or of x_miss itself.

Args:
    missing_rate (int, optional): The rate of missing data to be generated. Default is 10.
    deterministic (bool, optional): A flag that determines whether the missing locations in x_miss are based on x_miss itself or on an unobserved feature. Default is False (i.e., an unobserved feature).

Returns: dataset (DataFrame): The dataset with missing values generated under the MNAR mechanism.

Reference: [1] Santos, M. S., R. C. Pereira, A. F. Costa, J. P. Soares, J. Santos, and P. H. Abreu. 2019. Generating Synthetic Missing Data: A Review by Missing Mechanism. IEEE Access 7: 11651–67.

MBOUV

MBOUV(missing_rate: int = 10, depend_on_external=None, ascending=True)

Function to generate missing data based on Missingness Based on Own and Unobserved Values (MBOUV).

Args: missing_rate (int, optional): The rate of missing data to be generated. Default is 10.

Returns: dataset (DataFrame): The dataset with missing values generated under the MNAR mechanism.

Reference: [2] Pereira, R. C., Abreu, P. H., Rodrigues, P. P., Figueiredo, M. A. T., (2024). Imputation of data Missing Not at Random: Artificial generation and benchmark analysis. Expert Systems with Applications, 249(B), 123654.

MBOV_randomness

MBOV_randomness(missing_rate: int = 10, randomness: float = 0, columns: list = None)

Function to generate missing data based on Missingness Based on Own Values (MBOV), using randomness to choose the missing locations in each feature.

Args:
    missing_rate (int, optional): The rate of missing data to be generated. Default is 10.
    randomness (float, optional): The randomness rate used to choose the missing locations. Default is 0, which selects the lowest values.
    columns (list): A list of strings containing column names.

Returns: dataset (DataFrame): The dataset with missing values generated under the MNAR mechanism.

Reference: [2] Pereira, R. C., Abreu, P. H., Rodrigues, P. P., Figueiredo, M. A. T., (2024). Imputation of data Missing Not at Random: Artificial generation and benchmark analysis. Expert Systems with Applications, 249(B), 123654.
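One plausible reading of the randomness parameter is sketched below: a fraction of the missing locations is drawn at random, and the rest are taken from the lowest values of the feature. This is an assumption about the behaviour, not the library's code. The function mbov_randomness_sketch, its seed argument, and the treatment of missing_rate as a percentage are hypothetical.

```python
import random


def mbov_randomness_sketch(values, missing_rate=10, randomness=0.0, seed=None):
    """Remove the lowest values of a feature, with an optional random share.

    Assumption: `randomness` is the fraction of missing locations picked at
    random; the remainder are the lowest values of the feature itself.
    """
    rng = random.Random(seed)
    n_missing = round(len(values) * missing_rate / 100)
    n_random = round(n_missing * randomness)
    n_lowest = n_missing - n_random

    # Indices ordered by value, ascending.
    order = sorted(range(len(values)), key=lambda i: values[i])
    chosen = set(order[:n_lowest])
    # Draw the random share from the locations not already chosen.
    chosen.update(rng.sample(order[n_lowest:], n_random))
    return [None if i in chosen else v for i, v in enumerate(values)]


values = [float(v) for v in range(1, 21)]

# randomness=0: the four lowest values are removed.
lowest_only = mbov_randomness_sketch(values, missing_rate=20, randomness=0.0)

# randomness=0.5: two lowest values plus two randomly chosen locations.
mixed = mbov_randomness_sketch(values, missing_rate=20, randomness=0.5, seed=0)
```

Under this reading, randomness=1 would make every missing location random, so the result would no longer depend on the feature's own values.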

MBOV_median

MBOV_median(missing_rate: int = 10, columns: list = None)

Function to generate missing data based on Missingness Based on Own Values (MBOV), using the median to choose the missing locations in each feature.

Args:
    missing_rate (int, optional): The rate of missing data to be generated. Default is 10.
    columns (list): A list of strings containing column names.

Returns: dataset (DataFrame): The dataset with missing values generated under the MNAR mechanism.

Reference: [2] Pereira, R. C., Abreu, P. H., Rodrigues, P. P., Figueiredo, M. A. T., (2024). Imputation of data Missing Not at Random: Artificial generation and benchmark analysis. Expert Systems with Applications, 249(B), 123654.

MBIR

MBIR(missing_rate: int = 10, columns: list = None, statistical_method: str = 'Mann-Whitney')

Function to generate missing data based on Missingness Based on Intra-Relation (MBIR).

Args:
    missing_rate (int, optional): The rate of missing data to be generated. Default is 10.
    columns (list): A list of strings containing column names.
    statistical_method (str, optional): The statistical method to use. The options are ["Mann-Whitney", "Bayesian"]. Default is "Mann-Whitney".

Returns: dataset (DataFrame): The dataset with missing values generated under the MNAR mechanism.

Reference: [2] Pereira, R. C., Abreu, P. H., Rodrigues, P. P., Figueiredo, M. A. T., (2024). Imputation of data Missing Not at Random: Artificial generation and benchmark analysis. Expert Systems with Applications, 249(B), 123654.