MAR univariate: uMAR Class

uMAR

uMAR(X: pd.DataFrame, y: np.array, missing_rate: int = 10, x_miss: str = None)

A class to generate missing values in a dataset based on the Missing At Random (MAR) univariate mechanism.

Args:
    X (pd.DataFrame): The dataset to receive the missing data.
    y (np.array): The label values from the dataset.
    missing_rate (int, optional): The rate of missing data to be generated. Default is 10.
    x_miss (str, optional): The name of the feature in which to insert the missing data. If not provided, x_miss will be the feature most correlated with the target.
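The default choice of x_miss described above can be illustrated with a small sketch. It assumes that "most correlated" means the largest absolute Pearson correlation with the target; the helper name `most_correlated_feature` and the toy data are hypothetical and not part of the library.

```python
import numpy as np
import pandas as pd


def most_correlated_feature(X: pd.DataFrame, y: np.ndarray) -> str:
    """Return the column of X with the largest absolute Pearson correlation with y."""
    # Align the target with the rows of X so corrwith compares matching rows.
    target = pd.Series(np.asarray(y), index=X.index)
    # Assumption: absolute Pearson correlation is the selection criterion.
    correlations = X.corrwith(target).abs()
    return correlations.idxmax()


X = pd.DataFrame({
    "feature1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "feature2": [5.0, 3.0, 4.0, 1.0, 2.0],
})
y = np.array([0, 0, 1, 1, 1])

default_x_miss = most_correlated_feature(X, y)
print(default_x_miss)
```

Here feature1 rises with the label while feature2 is only loosely related to it, so feature1 would be picked when x_miss is left unset.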

Example Usage:

# Create an instance of the uMAR class
generator = uMAR(X, y, missing_rate=20, x_miss='feature1')

# Generate missing values using the lowest strategy
data_md = generator.lowest()

lowest

lowest()

Function to generate missing values in the feature (x_miss) using the lowest values from an observed feature.

Returns: dataset (DataFrame): The dataset with missing values generated under the MAR mechanism.

Reference: [1] Santos, M. S., R. C. Pereira, A. F. Costa, J. P. Soares, J. Santos, and P. H. Abreu. 2019. Generating Synthetic Missing Data: A Review by Missing Mechanism. IEEE Access 7: 11651–67.
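As a rough illustration of the lowest strategy, the sketch below blanks out x_miss in the rows that hold the smallest values of an observed feature. It is written in plain pandas and is not the library's implementation: the function `mar_lowest`, the explicit `x_obs` argument, and the reading of missing_rate as a percentage of rows are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def mar_lowest(X: pd.DataFrame, x_miss: str, x_obs: str, missing_rate: int = 10) -> pd.DataFrame:
    """Blank out x_miss in the rows holding the lowest values of x_obs."""
    # Assumption: missing_rate is a percentage of the number of rows.
    n_missing = round(len(X) * missing_rate / 100)
    data_md = X.copy()
    # Rows with the n_missing smallest observed values lose their x_miss entry.
    lowest_rows = data_md[x_obs].nsmallest(n_missing).index
    data_md.loc[lowest_rows, x_miss] = np.nan
    return data_md


X = pd.DataFrame({
    "age": [23.0, 35.0, 41.0, 29.0, 52.0, 47.0, 31.0, 38.0, 26.0, 60.0],
    "income": [1.8, 3.1, 4.0, 2.5, 5.2, 4.4, 2.9, 3.6, 2.1, 6.0],
})
data_md = mar_lowest(X, x_miss="income", x_obs="age", missing_rate=20)
print(data_md)
```

The missingness in income depends only on age, which stays fully observed; that dependence on an observed feature is what makes the mechanism MAR rather than MNAR.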

rank

rank()

Function to generate missing values in the feature (x_miss) using a rank from an observed feature.

Returns: dataset (DataFrame): The dataset with missing values generated under the MAR mechanism.

Reference: [1] Santos, M. S., R. C. Pereira, A. F. Costa, J. P. Soares, J. Santos, and P. H. Abreu. 2019. Generating Synthetic Missing Data: A Review by Missing Mechanism. IEEE Access 7: 11651–67.
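One way to read the rank strategy is that each row's chance of losing its x_miss value grows with the rank of its observed value. The sketch below follows that reading, sampling rows without replacement with probability proportional to rank. The function `mar_rank`, the `x_obs` and `seed` arguments, and the percentage interpretation of missing_rate are assumptions for illustration only.

```python
import numpy as np
import pandas as pd


def mar_rank(X: pd.DataFrame, x_miss: str, x_obs: str,
             missing_rate: int = 10, seed=None) -> pd.DataFrame:
    """Blank out x_miss in rows sampled with probability proportional to the rank of x_obs."""
    rng = np.random.default_rng(seed)
    # Assumption: missing_rate is a percentage of the number of rows.
    n_missing = round(len(X) * missing_rate / 100)
    # Higher observed values get higher ranks, hence a higher chance of being selected.
    ranks = X[x_obs].rank(method="average")
    probs = (ranks / ranks.sum()).to_numpy()
    chosen = rng.choice(X.index.to_numpy(), size=n_missing, replace=False, p=probs)
    data_md = X.copy()
    data_md.loc[chosen, x_miss] = np.nan
    return data_md


X = pd.DataFrame({
    "age": [23.0, 35.0, 41.0, 29.0, 52.0, 47.0, 31.0, 38.0, 26.0, 60.0],
    "income": [1.8, 3.1, 4.0, 2.5, 5.2, 4.4, 2.9, 3.6, 2.1, 6.0],
})
data_md = mar_rank(X, x_miss="income", x_obs="age", missing_rate=30, seed=0)
print(data_md)
```

Unlike the deterministic lowest and highest strategies, the rows selected here vary with the random seed; only the number of missing values is fixed.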

median

median()

Function to generate missing data in the feature (x_miss) using the median of an observed feature.

Returns: dataset (DataFrame): The dataset with missing values generated under the MAR mechanism.

Reference: [1] Santos, M. S., R. C. Pereira, A. F. Costa, J. P. Soares, J. Santos, and P. H. Abreu. 2019. Generating Synthetic Missing Data: A Review by Missing Mechanism. IEEE Access 7: 11651–67.
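A possible reading of the median strategy is sketched below: rows are split into two groups at the median of the observed feature, one group is picked at random, and its rows are given a higher chance of losing their x_miss value. The function `mar_median`, the `x_obs`, `ratio`, and `seed` arguments, and the default weighting of 9 to 1 are assumptions chosen for this example, not values confirmed by the documentation above.

```python
import numpy as np
import pandas as pd


def mar_median(X: pd.DataFrame, x_miss: str, x_obs: str, missing_rate: int = 10,
               ratio: float = 9.0, seed=None) -> pd.DataFrame:
    """Blank out x_miss, favouring one side of the median of x_obs."""
    rng = np.random.default_rng(seed)
    # Assumption: missing_rate is a percentage of the number of rows.
    n_missing = round(len(X) * missing_rate / 100)
    # Split the rows into two groups at the median of the observed feature.
    above = (X[x_obs] >= X[x_obs].median()).to_numpy()
    # Randomly decide which group is more likely to receive missing values.
    favour_above = bool(rng.random() < 0.5)
    weights = np.where(above == favour_above, ratio, 1.0)
    probs = weights / weights.sum()
    positions = rng.choice(len(X), size=n_missing, replace=False, p=probs)
    data_md = X.copy()
    data_md.iloc[positions, data_md.columns.get_loc(x_miss)] = np.nan
    return data_md


X = pd.DataFrame({
    "age": [23.0, 35.0, 41.0, 29.0, 52.0, 47.0, 31.0, 38.0, 26.0, 60.0],
    "income": [1.8, 3.1, 4.0, 2.5, 5.2, 4.4, 2.9, 3.6, 2.1, 6.0],
})
data_md = mar_median(X, x_miss="income", x_obs="age", missing_rate=20, seed=0)
print(data_md)
```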

highest

highest()

Function to generate missing values in the feature (x_miss) using the highest values from an observed feature.

Returns: dataset (DataFrame): The dataset with missing values generated under the MAR mechanism.

Reference: [1] Santos, M. S., R. C. Pereira, A. F. Costa, J. P. Soares, J. Santos, and P. H. Abreu. 2019. Generating Synthetic Missing Data: A Review by Missing Mechanism. IEEE Access 7: 11651–67.
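The highest strategy can be sketched as the mirror image of the lowest one: the rows with the largest observed values lose their x_miss entry. As with the other sketches, `mar_highest`, the explicit `x_obs` argument, and the treatment of missing_rate as a percentage of rows are assumptions rather than the library's code.

```python
import numpy as np
import pandas as pd


def mar_highest(X: pd.DataFrame, x_miss: str, x_obs: str, missing_rate: int = 10) -> pd.DataFrame:
    """Blank out x_miss in the rows holding the highest values of x_obs."""
    # Assumption: missing_rate is a percentage of the number of rows.
    n_missing = round(len(X) * missing_rate / 100)
    data_md = X.copy()
    # Rows with the n_missing largest observed values lose their x_miss entry.
    highest_rows = data_md[x_obs].nlargest(n_missing).index
    data_md.loc[highest_rows, x_miss] = np.nan
    return data_md


X = pd.DataFrame({
    "age": [23.0, 35.0, 41.0, 29.0, 52.0, 47.0, 31.0, 38.0, 26.0, 60.0],
    "income": [1.8, 3.1, 4.0, 2.5, 5.2, 4.4, 2.9, 3.6, 2.1, 6.0],
})
data_md = mar_highest(X, x_miss="income", x_obs="age", missing_rate=20)
print(data_md)
```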

mix

mix()

Function to generate missing values in the feature (x_miss) using the N/2 lowest values and the N/2 highest values from an observed feature, where N is the number of values to be made missing.

Returns: dataset (DataFrame): The dataset with missing values generated under the MAR mechanism.

Reference: [1] Santos, M. S., R. C. Pereira, A. F. Costa, J. P. Soares, J. Santos, and P. H. Abreu. 2019. Generating Synthetic Missing Data: A Review by Missing Mechanism. IEEE Access 7: 11651–67.
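The mix strategy can be sketched by combining the two extremes of the observed feature. The example assumes that N is the number of values to remove, derived from missing_rate as a percentage of rows, and that an odd N gives the extra row to the high end; `mar_mix` and its `x_obs` argument are hypothetical names for illustration.

```python
import numpy as np
import pandas as pd


def mar_mix(X: pd.DataFrame, x_miss: str, x_obs: str, missing_rate: int = 10) -> pd.DataFrame:
    """Blank out x_miss in the rows with the N/2 lowest and N/2 highest values of x_obs."""
    # Assumption: missing_rate is a percentage of the number of rows.
    n_missing = round(len(X) * missing_rate / 100)
    # Assumption: when N is odd, the extra row is taken from the high end.
    n_low = n_missing // 2
    n_high = n_missing - n_low
    data_md = X.copy()
    low_rows = data_md[x_obs].nsmallest(n_low).index
    high_rows = data_md[x_obs].nlargest(n_high).index
    data_md.loc[low_rows.union(high_rows), x_miss] = np.nan
    return data_md


X = pd.DataFrame({
    "age": [23.0, 35.0, 41.0, 29.0, 52.0, 47.0, 31.0, 38.0, 26.0, 60.0],
    "income": [1.8, 3.1, 4.0, 2.5, 5.2, 4.4, 2.9, 3.6, 2.1, 6.0],
})
data_md = mar_mix(X, x_miss="income", x_obs="age", missing_rate=40)
print(data_md)
```

With a 40% rate on ten rows, the two youngest and the two oldest rows lose their income value, while the middle of the age range stays complete.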