MAR univariate: uMAR Class
uMAR
A class to generate missing values in a dataset based on the Missing At Random (MAR) univariate mechanism.
Args:
    X (pd.DataFrame): The dataset to receive the missing data.
    y (np.array): The label values of the dataset.
    missing_rate (int, optional): The rate of missing data to generate. Default is 10.
    x_miss (str, optional): The name of the feature in which to insert the missing data. If not provided, x_miss is the feature most correlated with the target.
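The default choice of x_miss can be pictured with the sketch below. It is not the library's code: the helper name most_correlated_feature is hypothetical, and using the absolute Pearson correlation is an assumption about what "most correlated" means.

```python
import numpy as np
import pandas as pd


def most_correlated_feature(X: pd.DataFrame, y) -> str:
    """Hypothetical helper: name of the column of X most correlated with y."""
    # Align the labels with the rows of X before correlating.
    target = pd.Series(np.asarray(y), index=X.index)
    # Assumption: "most correlated" means largest absolute Pearson correlation.
    correlations = X.corrwith(target).abs()
    return correlations.idxmax()


X = pd.DataFrame({
    "feature1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "feature2": [5.0, 1.0, 4.0, 2.0, 3.0],
})
y = np.array([0, 0, 1, 1, 1])

x_miss = most_correlated_feature(X, y)  # "feature1" for this toy data
```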
Example Usage:
# Create an instance of the uMAR class
generator = uMAR(X, y, missing_rate=20, x_miss='feature1')
# Generate missing values using the lowest strategy
data_md = generator.lowest()
lowest
Function to generate missing values in the feature (x_miss) using the lowest values from an observed feature.
Returns: dataset (DataFrame): The dataset with missing values generated under the MAR mechanism.
Reference: [1] Santos, M. S., R. C. Pereira, A. F. Costa, J. P. Soares, J. Santos, and P. H. Abreu. 2019. Generating Synthetic Missing Data: A Review by Missing Mechanism. IEEE Access 7: 11651–67.
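The idea behind the lowest strategy can be sketched without the library. The helper mar_lowest and its x_obs argument are hypothetical names, and treating missing_rate as a percentage of the rows is an assumption; this is an illustration of the mechanism, not the class's implementation.

```python
import numpy as np
import pandas as pd


def mar_lowest(X, x_miss, x_obs, missing_rate=10):
    """Hypothetical helper: blank x_miss in the rows where x_obs is lowest."""
    # Assumption: missing_rate is a percentage of the number of rows.
    n_missing = round(len(X) * missing_rate / 100)
    # Index labels of the rows holding the smallest x_obs values.
    rows = X[x_obs].nsmallest(n_missing).index
    data = X.copy()  # leave the caller's DataFrame untouched
    data.loc[rows, x_miss] = np.nan
    return data


X = pd.DataFrame({
    "feature1": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0],
    "feature2": [5.0, 3.0, 9.0, 1.0, 7.0, 2.0, 8.0, 4.0, 6.0, 10.0],
})
data_md = mar_lowest(X, x_miss="feature1", x_obs="feature2", missing_rate=20)
missing_rows = data_md.index[data_md["feature1"].isna()].tolist()
```

Because missingness in feature1 depends only on the fully observed feature2, the result follows the MAR pattern.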
rank
Function to generate missing values in the feature (x_miss) using a rank from an observed feature.
Returns: dataset (DataFrame): The dataset with missing values generated under the MAR mechanism.
Reference: [1] Santos, M. S., R. C. Pereira, A. F. Costa, J. P. Soares, J. Santos, and P. H. Abreu. 2019. Generating Synthetic Missing Data: A Review by Missing Mechanism. IEEE Access 7: 11651–67.
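A rank-based variant might look like the sketch below. It assumes each row's probability of being selected is its rank in the observed feature divided by the sum of all ranks, and that missing_rate is a percentage; mar_rank, x_obs, and seed are hypothetical names, not part of the documented API.

```python
import numpy as np
import pandas as pd


def mar_rank(X, x_miss, x_obs, missing_rate=10, seed=None):
    """Hypothetical helper: sample rows with probability proportional to rank."""
    rng = np.random.default_rng(seed)
    # Assumption: missing_rate is a percentage of the number of rows.
    n_missing = round(len(X) * missing_rate / 100)
    # Higher x_obs values get higher ranks, hence a higher chance of selection.
    ranks = X[x_obs].rank(method="average")
    probs = (ranks / ranks.sum()).to_numpy()
    rows = rng.choice(X.index.to_numpy(), size=n_missing, replace=False, p=probs)
    data = X.copy()
    data.loc[rows, x_miss] = np.nan
    return data


X = pd.DataFrame({
    "feature1": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0],
    "feature2": [5.0, 3.0, 9.0, 1.0, 7.0, 2.0, 8.0, 4.0, 6.0, 10.0],
})
data_md = mar_rank(X, x_miss="feature1", x_obs="feature2", missing_rate=20, seed=0)
n_missing_values = int(data_md["feature1"].isna().sum())
```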
median
Function to generate missing data in the feature (x_miss) using the median of an observed feature.
Returns: dataset (DataFrame): The dataset with missing values generated under the MAR mechanism.
Reference: [1] Santos, M. S., R. C. Pereira, A. F. Costa, J. P. Soares, J. Santos, and P. H. Abreu. 2019. Generating Synthetic Missing Data: A Review by Missing Mechanism. IEEE Access 7: 11651–67.
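One way to picture a median-based strategy is to split the rows at the median of the observed feature and give one group a larger chance of losing its value. In this sketch the 9:1 weighting, the choice of the upper group as the heavier one, and the names mar_median, x_obs, high_weight, and seed are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def mar_median(X, x_miss, x_obs, missing_rate=10, high_weight=9.0, seed=None):
    """Hypothetical helper: weight rows by which side of the median they fall on."""
    rng = np.random.default_rng(seed)
    # Assumption: missing_rate is a percentage of the number of rows.
    n_missing = round(len(X) * missing_rate / 100)
    # Rows at or above the median of x_obs form the heavier group (assumption).
    above = (X[x_obs] >= X[x_obs].median()).to_numpy()
    weights = np.where(above, high_weight, 1.0)
    probs = weights / weights.sum()
    rows = rng.choice(X.index.to_numpy(), size=n_missing, replace=False, p=probs)
    data = X.copy()
    data.loc[rows, x_miss] = np.nan
    return data


X = pd.DataFrame({
    "feature1": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0],
    "feature2": [5.0, 3.0, 9.0, 1.0, 7.0, 2.0, 8.0, 4.0, 6.0, 10.0],
})
data_md = mar_median(X, x_miss="feature1", x_obs="feature2", missing_rate=20, seed=0)
n_missing_values = int(data_md["feature1"].isna().sum())
```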
highest
Function to generate missing values in the feature (x_miss) using the highest values from an observed feature.
Returns: dataset (DataFrame): The dataset with missing values generated under the MAR mechanism.
Reference: [1] Santos, M. S., R. C. Pereira, A. F. Costa, J. P. Soares, J. Santos, and P. H. Abreu. 2019. Generating Synthetic Missing Data: A Review by Missing Mechanism. IEEE Access 7: 11651–67.
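The highest strategy mirrors the lowest one, selecting rows from the top of the observed feature instead of the bottom. As a rough sketch (mar_highest and x_obs are hypothetical names, and missing_rate is assumed to be a percentage of the rows):

```python
import numpy as np
import pandas as pd


def mar_highest(X, x_miss, x_obs, missing_rate=10):
    """Hypothetical helper: blank x_miss in the rows where x_obs is highest."""
    # Assumption: missing_rate is a percentage of the number of rows.
    n_missing = round(len(X) * missing_rate / 100)
    # Index labels of the rows holding the largest x_obs values.
    rows = X[x_obs].nlargest(n_missing).index
    data = X.copy()
    data.loc[rows, x_miss] = np.nan
    return data


X = pd.DataFrame({
    "feature1": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0],
    "feature2": [5.0, 3.0, 9.0, 1.0, 7.0, 2.0, 8.0, 4.0, 6.0, 10.0],
})
data_md = mar_highest(X, x_miss="feature1", x_obs="feature2", missing_rate=20)
missing_rows = data_md.index[data_md["feature1"].isna()].tolist()
```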
mix
Function to generate missing values in the feature (x_miss) using the N/2 lowest values and the N/2 highest values from an observed feature, where N is the number of values to be made missing.
Returns: dataset (DataFrame): The dataset with missing values generated under the MAR mechanism.
Reference: [1] Santos, M. S., R. C. Pereira, A. F. Costa, J. P. Soares, J. Santos, and P. H. Abreu. 2019. Generating Synthetic Missing Data: A Review by Missing Mechanism. IEEE Access 7: 11651–67.
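The mix strategy can be sketched by combining both tails of the observed feature. Here N is taken to be the number of values to remove, missing_rate is assumed to be a percentage of the rows, and an odd N gives the extra row to the upper tail; those choices, along with the names mar_mix and x_obs, are assumptions for illustration only.

```python
import numpy as np
import pandas as pd


def mar_mix(X, x_miss, x_obs, missing_rate=10):
    """Hypothetical helper: blank x_miss at both extremes of x_obs."""
    # Assumption: missing_rate is a percentage of the number of rows.
    n_missing = round(len(X) * missing_rate / 100)
    n_low = n_missing // 2
    n_high = n_missing - n_low  # odd N: the extra row comes from the top
    lowest = X[x_obs].nsmallest(n_low).index
    highest = X[x_obs].nlargest(n_high).index
    rows = lowest.union(highest)
    data = X.copy()
    data.loc[rows, x_miss] = np.nan
    return data


X = pd.DataFrame({
    "feature1": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0],
    "feature2": [5.0, 3.0, 9.0, 1.0, 7.0, 2.0, 8.0, 4.0, 6.0, 10.0],
})
data_md = mar_mix(X, x_miss="feature1", x_obs="feature2", missing_rate=40)
missing_rows = data_md.index[data_md["feature1"].isna()].tolist()
```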