MAR multivariate: mMAR Class

mMAR

mMAR(X: pd.DataFrame, y: np.array, n_xmiss: int = 2, missTarget: bool = False)

A class to generate missing data in a dataset based on the Missing At Random (MAR) mechanism for multiple features simultaneously.

Args:
    X (pd.DataFrame): The dataset to receive the missing data.
    y (np.array): The label values from the dataset.
    n_xmiss (int): The number of features in the dataset that will receive missing values. Default is 2.
    missTarget (bool, optional): A flag to also generate missing values in the target. Default is False.

Example Usage:

# Create an instance of the mMAR class
generator = mMAR(X, y, n_xmiss=4)

# Generate missing values using the random strategy
data_md = generator.random(missing_rate=20)

random

random(missing_rate: int = 10) -> pd.DataFrame

Generates artificial missing data in n_xmiss randomly chosen features. For each feature x_miss, the lowest values of an observed feature determine which positions in x_miss become missing.

Args: missing_rate (int, optional): The rate of missing data to be generated. Default is 10.

Returns: dataset (DataFrame): The dataset with missing values generated under the MAR mechanism.

Reference: [1] Santos, M. S., R. C. Pereira, A. F. Costa, J. P. Soares, J. Santos, and P. H. Abreu. 2019. Generating Synthetic Missing Data: A Review by Missing Mechanism. IEEE Access 7: 11651–67.
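
The random strategy described above can be illustrated with a minimal sketch. This is not the library's implementation: the function name `mar_random_sketch`, the `seed` parameter, and the demo data are hypothetical, and the sketch assumes that all features are numeric and that `missing_rate` is a percentage of rows applied to each selected feature.

```python
import numpy as np
import pandas as pd


def mar_random_sketch(X: pd.DataFrame, n_xmiss: int = 2,
                      missing_rate: int = 10, seed: int = 0) -> pd.DataFrame:
    """Hypothetical helper illustrating the random MAR strategy."""
    rng = np.random.default_rng(seed)
    # Work on a float copy so NaN can be stored (assumes numeric features).
    data = X.astype(float)
    # Assumption: missing_rate is a percentage of rows per selected feature.
    n_missing = round(len(data) * missing_rate / 100)

    columns = list(data.columns)
    # Randomly pick the n_xmiss features that will receive missing values.
    picked = rng.choice(len(columns), size=n_xmiss, replace=False)
    for i in picked:
        x_miss = columns[i]
        # Pick a different feature to act as the observed feature.
        candidates = [c for c in columns if c != x_miss]
        x_obs = candidates[rng.integers(len(candidates))]
        # Rows with the lowest x_obs values lose their x_miss value.
        # Ranking uses the original X, so earlier deletions do not interfere.
        lowest_rows = X[x_obs].nsmallest(n_missing).index
        data.loc[lowest_rows, x_miss] = np.nan
    return data


# Demo data (hypothetical): three numeric features, ten rows.
X_demo = pd.DataFrame({
    "a": [float(v) for v in range(10)],
    "b": [float(v) for v in range(9, -1, -1)],
    "c": [5.0, 3.0, 8.0, 1.0, 9.0, 0.0, 7.0, 2.0, 6.0, 4.0],
})
result = mar_random_sketch(X_demo, n_xmiss=2, missing_rate=20, seed=0)
```

Because the missing positions in x_miss depend only on the values of another, fully observed feature, the missingness is MAR rather than completely random.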

correlated

correlated(missing_rate: int = 10) -> pd.DataFrame

Generates missing data in the features of the dataset, excluding the class (target). For each correlated pair of features, the lowest values of the observed feature determine which positions in feature x_miss become missing.

Args: missing_rate (int, optional): The rate of missing data to be generated. Default is 10.

Returns: dataset (DataFrame): The dataset with missing values generated under the MAR mechanism.

Reference: [1] Santos, M. S., R. C. Pereira, A. F. Costa, J. P. Soares, J. Santos, and P. H. Abreu. 2019. Generating Synthetic Missing Data: A Review by Missing Mechanism. IEEE Access 7: 11651–67.
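
A minimal sketch of the correlated-pair idea follows. It is an illustration under stated assumptions, not the library's code: the helper names `correlated_pairs_sketch` and `mar_correlated_sketch` are hypothetical, pairs are formed greedily by absolute Pearson correlation, the first feature of each pair is treated as the observed one, and `missing_rate` is taken as a percentage of rows per pair.

```python
import numpy as np
import pandas as pd


def correlated_pairs_sketch(X: pd.DataFrame) -> list:
    """Greedily pair features by absolute Pearson correlation (hypothetical)."""
    corr = X.corr().abs()
    remaining = list(X.columns)
    pairs = []
    while len(remaining) >= 2:
        first = remaining.pop(0)
        # Partner is the remaining feature most correlated with `first`.
        partner = corr.loc[first, remaining].idxmax()
        remaining.remove(partner)
        pairs.append((first, partner))
    return pairs


def mar_correlated_sketch(X: pd.DataFrame, missing_rate: int = 10) -> pd.DataFrame:
    """Hypothetical helper illustrating the correlated MAR strategy."""
    # Work on a float copy so NaN can be stored (assumes numeric features).
    data = X.astype(float)
    # Assumption: missing_rate is a percentage of rows per pair.
    n_missing = round(len(data) * missing_rate / 100)
    # Assumption: the first feature of each pair is observed, the second is x_miss.
    for x_obs, x_miss in correlated_pairs_sketch(X):
        # Rows with the lowest x_obs values lose their x_miss value.
        lowest_rows = X[x_obs].nsmallest(n_missing).index
        data.loc[lowest_rows, x_miss] = np.nan
    return data


# Demo data (hypothetical): "b" is linear in "a", and "d" is linear in "c".
a = np.arange(10, dtype=float)
c = np.array([5.0, 3.0, 8.0, 1.0, 9.0, 0.0, 7.0, 2.0, 6.0, 4.0])
X_demo = pd.DataFrame({"a": a, "b": 2 * a + 1, "c": c, "d": -c})

pairs = correlated_pairs_sketch(X_demo)
result = mar_correlated_sketch(X_demo, missing_rate=20)
```

In this demo, `b` loses values where `a` is lowest and `d` loses values where `c` is lowest, while `a` and `c` stay fully observed.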

median

median(missing_rate: int = 10) -> pd.DataFrame

Generates missing data in the features of the dataset. For each correlated pair, the median of the observed feature splits the rows into two groups. One group is chosen at random, and the lowest values of the observed feature within that group determine which positions in feature x_miss become missing.

Args: missing_rate (int, optional): The rate of missing data to be generated. Default is 10.

Returns: dataset (DataFrame): The dataset with missing values generated under the MAR mechanism.

Reference: [1] Santos, M. S., R. C. Pereira, A. F. Costa, J. P. Soares, J. Santos, and P. H. Abreu. 2019. Generating Synthetic Missing Data: A Review by Missing Mechanism. IEEE Access 7: 11651–67.
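
The median-based grouping can be sketched for a single pair of features as follows. This is an illustration only: the function name `mar_median_pair_sketch`, its `x_obs`, `x_miss`, and `seed` parameters, and the demo data are hypothetical. The sketch assumes numeric features, treats `missing_rate` as a percentage of rows, and caps the number of missing values at the size of the chosen group.

```python
import numpy as np
import pandas as pd


def mar_median_pair_sketch(X: pd.DataFrame, x_obs: str, x_miss: str,
                           missing_rate: int = 10, seed: int = 0) -> pd.DataFrame:
    """Hypothetical helper illustrating the median MAR strategy for one pair."""
    rng = np.random.default_rng(seed)
    # Work on a float copy so NaN can be stored (assumes numeric features).
    data = X.astype(float)
    # Assumption: missing_rate is a percentage of rows.
    n_missing = round(len(data) * missing_rate / 100)

    # The median of the observed feature splits the rows into two groups.
    median = X[x_obs].median()
    lower_group = X.index[X[x_obs] <= median]
    upper_group = X.index[X[x_obs] > median]

    # Pick one of the two groups at random.
    chosen = lower_group if rng.random() < 0.5 else upper_group

    # Within the chosen group, the lowest x_obs values mark the missing rows.
    # nsmallest returns at most len(chosen) rows, which caps the request.
    rows = X.loc[chosen, x_obs].nsmallest(n_missing).index
    data.loc[rows, x_miss] = np.nan
    return data


# Demo data (hypothetical): "a" is the observed feature, "b" receives NaN.
X_demo = pd.DataFrame({
    "a": [float(v) for v in range(10)],
    "b": [float(v) * 3 for v in range(10)],
})
result = mar_median_pair_sketch(X_demo, x_obs="a", x_miss="b", missing_rate=20)
missing_rows = sorted(result.index[result["b"].isna()])
```

Depending on which group is drawn, `missing_rows` contains either the two lowest rows overall or the two lowest rows above the median.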