MNAR multivariate: mMNAR Class
mMNAR
A class to generate missing values in a dataset based on the Missing Not At Random (MNAR) mechanism for multiple features simultaneously.
Args:
    X (pd.DataFrame): The dataset to receive the missing data.
    y (np.array): The label values from the dataset.
    missing_rate (int, optional): The rate of missing data to be generated. Default is 10.
Keyword Args:
    n_xmiss (int, optional): The number of features in the dataset that will receive missing values. Default is the number of features in the dataset.
    threshold (float, optional): The threshold used to select the locations in a feature (x_miss) that receive missing values, where 0 indicates the lowest values and 1 the highest. Default is 0.
    missTarget (bool, optional): A flag to also generate missing values in the target.
Example Usage:
# Create an instance of the mMNAR class
generator = mMNAR(X, y)
# Generate missing values using the random strategy
data_md = generator.random()
random
Randomly chooses the features (x_miss) in the dataset that will receive missing data. For each x_miss, the missing locations are the rows with the lowest values of either an unobserved feature or x_miss itself.
Args:
    missing_rate (int, optional): The rate of missing data to be generated. Default is 10.
    deterministic (bool, optional): Whether the missing locations in x_miss are determined by x_miss itself (True) or by an unobserved feature (False). Default is False (i.e., an unobserved feature).
Returns: dataset (DataFrame): The dataset with missing values generated under the MNAR mechanism.
Reference: [1] Santos, M. S., R. C. Pereira, A. F. Costa, J. P. Soares, J. Santos, and P. H. Abreu. 2019. Generating Synthetic Missing Data: A Review by Missing Mechanism. IEEE Access 7: 11651–67.
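The rule that `random` describes for placing missing values can be illustrated with a small pandas sketch. This is not the library's implementation: `mask_lowest` and its `driver` argument are hypothetical names, and `missing_rate` is assumed to be a percentage of rows.

```python
import numpy as np
import pandas as pd


def mask_lowest(df, x_miss, missing_rate=10, driver=None):
    """Hypothetical helper: blank out x_miss where the driver values are lowest.

    driver=None means x_miss decides its own missingness; passing a Series
    stands in for an unobserved feature (an assumption of this sketch).
    """
    out = df.copy()
    out[x_miss] = out[x_miss].astype(float)  # make room for NaN
    values = out[x_miss] if driver is None else driver
    n_missing = int(round(len(out) * missing_rate / 100))
    # Row labels of the n_missing smallest driver values
    rows = values.nsmallest(n_missing).index
    out.loc[rows, x_miss] = np.nan
    return out


rng = np.random.default_rng(0)
df = pd.DataFrame({"a": np.arange(10.0), "b": rng.normal(size=10)})

# Missingness driven by the feature itself
self_masked = mask_lowest(df, "a", missing_rate=20)

# Missingness driven by a feature that is not part of the dataset
hidden = pd.Series(rng.normal(size=10), index=df.index)
externally_masked = mask_lowest(df, "b", missing_rate=20, driver=hidden)
```

In both cases the missingness depends on values the analyst cannot see afterwards, which is what makes the mechanism MNAR.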
correlated
Generates missing data in the dataset based on correlated pairs of features. In each pair, the feature (x_miss) most correlated with the class receives the missing data, at the rows with the lowest values of either an unobserved feature or x_miss itself.
Args:
    missing_rate (int, optional): The rate of missing data to be generated. Default is 10.
    deterministic (bool, optional): Whether the missing locations in x_miss are determined by x_miss itself (True) or by an unobserved feature (False). Default is False (i.e., an unobserved feature).
Returns: dataset (DataFrame): The dataset with missing values generated under the MNAR mechanism.
Reference: [1] Santos, M. S., R. C. Pereira, A. F. Costa, J. P. Soares, J. Santos, and P. H. Abreu. 2019. Generating Synthetic Missing Data: A Review by Missing Mechanism. IEEE Access 7: 11651–67.
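As a rough illustration of the selection step in `correlated`, the sketch below picks, from one pair of features, the one with the strongest absolute Pearson correlation with the label, then blanks its lowest values. How the library forms the pairs and which correlation measure it uses are not stated here, so the caller-supplied pair, the Pearson measure, and the name `pick_x_miss` are all assumptions.

```python
import numpy as np
import pandas as pd


def pick_x_miss(df, y, pair):
    """Hypothetical helper: return the column in `pair` most correlated with y."""
    label = pd.Series(np.asarray(y, dtype=float), index=df.index)
    # Absolute Pearson correlation of each column in the pair with the label
    strength = df[list(pair)].corrwith(label).abs()
    return strength.idxmax()


df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "b": [3.0, 1.0, 2.0, 1.0, 3.0, 2.0],
})
y = np.array([0, 0, 0, 1, 1, 1])

x_miss = pick_x_miss(df, y, ("a", "b"))

# Blank the two lowest values of the selected feature (self-driven case)
n_missing = 2
masked = df.copy()
masked.loc[masked[x_miss].nsmallest(n_missing).index, x_miss] = np.nan
```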
median
Generates missing data across the whole dataset based on the median of each feature. The missing locations are chosen from the lowest values of either an unobserved feature or x_miss itself.
Args:
    missing_rate (int, optional): The rate of missing data to be generated. Default is 10.
    deterministic (bool, optional): Whether the missing locations in x_miss are determined by x_miss itself (True) or by an unobserved feature (False). Default is False (i.e., an unobserved feature).
Returns: dataset (DataFrame): The dataset with missing values generated under the MNAR mechanism.
Reference: [1] Santos, M. S., R. C. Pereira, A. F. Costa, J. P. Soares, J. Santos, and P. H. Abreu. 2019. Generating Synthetic Missing Data: A Review by Missing Mechanism. IEEE Access 7: 11651–67.
MBOUV
Function to generate missing data based on Missingness Based on Own and Unobserved Values (MBOUV).
Args: missing_rate (int, optional): The rate of missing data to be generated. Default is 10.
Returns: dataset (DataFrame): The dataset with missing values generated under the MNAR mechanism.
Reference: [2] Pereira, R. C., Abreu, P. H., Rodrigues, P. P., Figueiredo, M. A. T. (2024). Imputation of data Missing Not at Random: Artificial generation and benchmark analysis. Expert Systems with Applications, 249(B), 123654.
MBOV_randomness
Function to generate missing data based on Missingness Based on Own Values (MBOV), using a randomness rate to choose the missing locations in each feature.
Args:
    missing_rate (int, optional): The rate of missing data to be generated. Default is 10.
    randomness (float, optional): The randomness rate used to choose the missing locations. Default is 0, which selects the lowest values.
    columns (list): A list of strings containing column names.
Returns: dataset (DataFrame): The dataset with missing values generated under the MNAR mechanism.
Reference: [2] Pereira, R. C., Abreu, P. H., Rodrigues, P. P., Figueiredo, M. A. T. (2024). Imputation of data Missing Not at Random: Artificial generation and benchmark analysis. Expert Systems with Applications, 249(B), 123654.
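One way to read the `randomness` parameter of `MBOV_randomness` is as the share of missing locations that are drawn at random instead of being taken from the lowest values. The sketch below follows that reading; it is an interpretation, not the library's code. The function name, the `seed` argument, and treating `randomness` as a fraction in [0, 1] and `missing_rate` as a percentage are all assumptions.

```python
import numpy as np
import pandas as pd


def mbov_with_randomness(col, missing_rate=10, randomness=0.0, seed=None):
    """Hypothetical sketch: mix lowest-value masking with random masking."""
    rng = np.random.default_rng(seed)
    n_missing = int(round(len(col) * missing_rate / 100))
    n_random = int(round(n_missing * randomness))
    n_lowest = n_missing - n_random

    # Part of the missing rows comes from the lowest own values ...
    lowest = list(col.nsmallest(n_lowest).index)
    # ... and the rest is drawn uniformly from the remaining rows
    remaining = col.index.difference(lowest).to_numpy()
    random_rows = list(rng.choice(remaining, size=n_random, replace=False))

    out = col.astype(float)  # make room for NaN
    out.loc[lowest + random_rows] = np.nan
    return out


col = pd.Series(np.arange(20.0))

purely_lowest = mbov_with_randomness(col, missing_rate=20, randomness=0.0)
half_random = mbov_with_randomness(col, missing_rate=20, randomness=0.5, seed=1)
```

With `randomness=0.0` the four lowest values are removed; with `randomness=0.5` only the two lowest are guaranteed to be removed and the other two positions vary with the seed.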
MBOV_median
Function to generate missing data based on Missingness Based on Own Values (MBOV), using the median to choose the missing locations in each feature.
Args:
    missing_rate (int, optional): The rate of missing data to be generated. Default is 10.
    columns (list): A list of strings containing column names.
Returns: dataset (DataFrame): The dataset with missing values generated under the MNAR mechanism.
Reference: [2] Pereira, R. C., Abreu, P. H., Rodrigues, P. P., Figueiredo, M. A. T. (2024). Imputation of data Missing Not at Random: Artificial generation and benchmark analysis. Expert Systems with Applications, 249(B), 123654.
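The description of `MBOV_median` does not spell out how the median selects the missing locations. The sketch below assumes one plausible reading, namely that the values closest to the feature's median are the ones removed. That reading, the helper name `mask_nearest_median`, and treating `missing_rate` as a percentage are assumptions and should be checked against the library before relying on them.

```python
import numpy as np
import pandas as pd


def mask_nearest_median(col, missing_rate=10):
    """Hypothetical sketch: blank out the values closest to the median."""
    n_missing = int(round(len(col) * missing_rate / 100))
    # Absolute distance of every value from the feature's median
    distance = (col - col.median()).abs()
    rows = distance.nsmallest(n_missing).index
    out = col.astype(float)  # make room for NaN
    out.loc[rows] = np.nan
    return out


col = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0])
masked = mask_nearest_median(col, missing_rate=20)
missing_rows = list(masked.index[masked.isna()])
```

Here the median is 5.5, so the two values removed are 5.0 and 6.0, while the outlier 100.0 is untouched.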
MBIR
Function to generate missing data based on Missingness Based on Intra-Relation (MBIR).
Args:
    missing_rate (int, optional): The rate of missing data to be generated. Default is 10.
    columns (list): A list of strings containing column names.
    statistical_method (str, optional): The statistical method to use. The options are ["Mann-Whitney", "Bayesian"]. Default is "Mann-Whitney".
Returns: dataset (DataFrame): The dataset with missing values generated under the MNAR mechanism.
Reference: [2] Pereira, R. C., Abreu, P. H., Rodrigues, P. P., Figueiredo, M. A. T. (2024). Imputation of data Missing Not at Random: Artificial generation and benchmark analysis. Expert Systems with Applications, 249(B), 123654.