Pandas: how to find NaN

You have a couple of options.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10,6))
# Make a few areas have NaN values
df.iloc[1:3,1] = np.nan
df.iloc[5,3] = np.nan
df.iloc[7:9,5] = np.nan

Now the data frame looks something like this:

          0         1         2         3         4         5
0  0.520113  0.884000  1.260966 -0.236597  0.312972 -0.196281
1 -0.837552       NaN  0.143017  0.862355  0.346550  0.842952
2 -0.452595       NaN -0.420790  0.456215  1.203459  0.527425
3  0.317503 -0.917042  1.780938 -1.584102  0.432745  0.389797
4 -0.722852  1.704820 -0.113821 -1.466458  0.083002  0.011722
5 -0.622851 -0.251935 -1.498837       NaN  1.098323  0.273814
6  0.329585  0.075312 -0.690209 -3.807924  0.489317 -0.841368
7 -1.123433 -1.187496  1.868894 -2.046456 -0.949718       NaN
8  1.133880 -0.110447  0.050385 -1.158387  0.188222       NaN
9 -0.513741  1.196259  0.704537  0.982395 -0.585040 -1.693810
  • Option 1: df.isnull().any().any() — This returns a boolean value

You already know that isnull() returns a DataFrame like this:

       0      1      2      3      4      5
0  False  False  False  False  False  False
1  False   True  False  False  False  False
2  False   True  False  False  False  False
3  False  False  False  False  False  False
4  False  False  False  False  False  False
5  False  False  False   True  False  False
6  False  False  False  False  False  False
7  False  False  False  False  False   True
8  False  False  False  False  False   True
9  False  False  False  False  False  False

If you make it df.isnull().any(), you can find just the columns that have NaN values:

0    False
1     True
2    False
3     True
4    False
5     True
dtype: bool

One more .any() will tell you whether any of the above are True:

> df.isnull().any().any()
True
  • Option 2: df.isnull().sum().sum() — This returns an integer of the total number of NaN values:

This operates the same way as the .any().any() does, by first giving a summation of the number of NaN values in a column, then the summation of those values:

df.isnull().sum()
0    0
1    2
2    0
3    1
4    0
5    2
dtype: int64

Finally, to get the total number of NaN values in the DataFrame:

df.isnull().sum().sum()
5
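
Building on the two options above, here is a minimal sketch that also lists which columns contain NaN. It rebuilds the sample frame with a fixed seed (the seed and the variable names are assumptions added for reproducibility, not part of the original answer).

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame; the seed is an assumption added for reproducibility
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 6)))
df.iloc[1:3, 1] = np.nan
df.iloc[5, 3] = np.nan
df.iloc[7:9, 5] = np.nan

has_nan = bool(df.isnull().any().any())    # Option 1: is there any NaN at all?
total_nan = int(df.isnull().sum().sum())   # Option 2: how many NaN in total?

# Column labels for which the per-column any() is True
nan_columns = df.columns[df.isnull().any()].tolist()

print(has_nan, total_nan, nan_columns)
```

The per-column mask from df.isnull().any() doubles as a boolean index on df.columns, which is why no loop is needed.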


    NaN stands for Not a Number and is one of the common ways to represent a missing value in data. It is a special floating-point value, so a NumPy-backed integer column that contains NaN is stored as float.
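
Because NaN is a floating-point value with unusual comparison rules, an equality test cannot find it. The sketch below (variable names are illustrative assumptions) shows why isnull() is needed and how mixing integers with NaN changes the dtype.

```python
import numpy as np
import pandas as pd

# NaN never compares equal to anything, including itself
nan_equals_itself = (np.nan == np.nan)

# An equality test therefore misses NaN, while isnull() finds it
s = pd.Series([10, 15, np.nan])
found_by_eq = int((s == np.nan).sum())
found_by_isnull = int(s.isnull().sum())

# Integers mixed with NaN are stored as float64
dtype_name = str(s.dtype)

print(nan_equals_itself, found_by_eq, found_by_isnull, dtype_name)
```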

    NaN value is one of the major problems in Data Analysis. It is very essential to deal with NaN in order to get the desired results.

    Check for NaN Value in Pandas DataFrame

    The ways to check for NaN in Pandas DataFrame are as follows: 

    • Check for NaN with isnull().values.any() method
    • Count the NaN Using isnull().sum() Method
    • Check for NaN Using isnull().sum().any() Method
    • Count the NaN Using isnull().sum().sum() Method

    Method 1: Using isnull().values.any() method

    Example: 

    import pandas as pd
    import numpy as np

    num = {'Integers': [10, 15, 30, 40, 55, np.nan,
                        75, np.nan, 90, 150, np.nan]}
    df = pd.DataFrame(num, columns=['Integers'])

    check_nan = df['Integers'].isnull().values.any()
    print(check_nan)

    Output: 

    True

    It is also possible to get the exact positions where NaN values are present by dropping .values.any() and calling isnull() on its own:

    check_nan = df['Integers'].isnull()
    print(check_nan)

    Output: 

    0     False
    1     False
    2     False
    3     False
    4     False
    5      True
    6     False
    7      True
    8     False
    9     False
    10     True
    Name: Integers, dtype: bool
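
If only the row labels of the missing entries are wanted, rather than the full boolean Series, the mask can be used to index df.index. This is a small sketch on the same data; the names mask and nan_positions are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Integers': [10, 15, 30, 40, 55, np.nan,
                                75, np.nan, 90, 150, np.nan]})

# Boolean mask: True where the value is NaN
mask = df['Integers'].isnull()

# Keep only the index labels where the mask is True
nan_positions = df.index[mask].tolist()
print(nan_positions)
```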

    Method 2: Using isnull().sum() Method

    Example: 

    import pandas as pd
    import numpy as np

    num = {'Integers': [10, 15, 30, 40, 55, np.nan,
                        75, np.nan, 90, 150, np.nan]}
    df = pd.DataFrame(num, columns=['Integers'])

    count_nan = df['Integers'].isnull().sum()
    print('Number of NaN values present: ' + str(count_nan))

    Output:

    Number of NaN values present: 3
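
The same count can be cross-checked from the other direction: Series.count() returns the number of non-missing values, so subtracting it from the length gives the number of NaN. A minimal sketch on the same data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Integers': [10, 15, 30, 40, 55, np.nan,
                                75, np.nan, 90, 150, np.nan]})

non_missing = int(df['Integers'].count())   # count() skips NaN
count_nan = len(df) - non_missing           # what is left must be NaN

print(non_missing, count_nan)
```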

    Method 3: Using isnull().sum().any() Method

    Example: 

    import pandas as pd
    import numpy as np

    nums = {'Integers_1': [10, 15, 30, 40, 55, np.nan, 75,
                           np.nan, 90, 150, np.nan],
            'Integers_2': [np.nan, 21, 22, 23, np.nan, 24, 25,
                           np.nan, 26, np.nan, np.nan]}
    df = pd.DataFrame(nums, columns=['Integers_1', 'Integers_2'])

    nan_in_df = df.isnull().sum().any()
    print(nan_in_df)

    Output: 

    True

    To get the exact positions where NaN values are present, remove .sum().any() and call isnull() on its own.
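
For a DataFrame with several columns, the boolean frame from isnull() can be turned into a list of (row label, column label) pairs. The sketch below uses np.where on the underlying array; this is one possible approach, not the article's own code, and the name positions is illustrative.

```python
import numpy as np
import pandas as pd

nums = {'Integers_1': [10, 15, 30, 40, 55, np.nan, 75,
                       np.nan, 90, 150, np.nan],
        'Integers_2': [np.nan, 21, 22, 23, np.nan, 24, 25,
                       np.nan, 26, np.nan, np.nan]}
df = pd.DataFrame(nums)

# np.where on a 2-D boolean array returns the row and column positions of True
rows, cols = np.where(df.isnull().to_numpy())

# Translate integer positions back into index and column labels
positions = [(int(df.index[r]), df.columns[c]) for r, c in zip(rows, cols)]
print(positions)
```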

    Method 4: Using isnull().sum().sum() Method

    Example: 

    import pandas as pd
    import numpy as np

    nums = {'Integers_1': [10, 15, 30, 40, 55, np.nan, 75,
                           np.nan, 90, 150, np.nan],
            'Integers_2': [np.nan, 21, 22, 23, np.nan, 24, 25,
                           np.nan, 26, np.nan, np.nan]}
    df = pd.DataFrame(nums, columns=['Integers_1', 'Integers_2'])

    nan_in_df = df.isnull().sum().sum()
    print('Number of NaN values present: ' + str(nan_in_df))

    Output:

    Number of NaN values present: 8

    Last Updated :
    30 Jan, 2023


    Here are 4 ways to check for NaN in Pandas DataFrame:

    (1) Check for NaN under a single DataFrame column:

    df['your column name'].isnull().values.any()
    

    (2) Count the NaN under a single DataFrame column:

    df['your column name'].isnull().sum()
    

    (3) Check for NaN under an entire DataFrame:

    df.isnull().values.any()
    

    (4) Count the NaN under an entire DataFrame:

    df.isnull().sum().sum()
    

    (1) Check for NaN under a single DataFrame column

    In the following example, we’ll create a DataFrame with a set of numbers and 3 NaN values:

    import pandas as pd
    import numpy as np
    
    data = {'set_of_numbers': [1,2,3,4,5,np.nan,6,7,np.nan,8,9,10,np.nan]}
    df = pd.DataFrame(data)
    print (df)
    

    You’ll now see the DataFrame with the 3 NaN values:

        set_of_numbers
    0              1.0
    1              2.0
    2              3.0
    3              4.0
    4              5.0
    5              NaN
    6              6.0
    7              7.0
    8              NaN
    9              8.0
    10             9.0
    11            10.0
    12             NaN
    

    You can then use the following template in order to check for NaN under a single DataFrame column:

    df['your column name'].isnull().values.any()

    For our example, the DataFrame column is ‘set_of_numbers.’

    And so, the code to check whether a NaN value exists under the ‘set_of_numbers’ column is as follows:

    import pandas as pd
    import numpy as np
    
    data = {'set_of_numbers': [1,2,3,4,5,np.nan,6,7,np.nan,8,9,10,np.nan]}
    df = pd.DataFrame(data)
    
    check_for_nan = df['set_of_numbers'].isnull().values.any()
    print (check_for_nan)
    

    Run the code, and you’ll get ‘True’ which confirms the existence of NaN values under the DataFrame column:

    True
    

    And if you want to get the actual breakdown of the instances where NaN values exist, then you may remove .values.any() from the code. So the complete syntax to get the breakdown would look as follows:

    import pandas as pd
    import numpy as np
    
    data = {'set_of_numbers': [1,2,3,4,5,np.nan,6,7,np.nan,8,9,10,np.nan]}
    df = pd.DataFrame(data)
    
    check_for_nan = df['set_of_numbers'].isnull()
    print (check_for_nan)
    

    You’ll now see the 3 instances of the NaN values:

    0     False
    1     False
    2     False
    3     False
    4     False
    5      True
    6     False
    7     False
    8      True
    9     False
    10    False
    11    False
    12     True
    

    Here is another approach where you can get all the instances where a NaN value exists:

    import pandas as pd
    import numpy as np
    
    data = {'set_of_numbers': [1,2,3,4,5,np.nan,6,7,np.nan,8,9,10,np.nan]}
    df = pd.DataFrame(data)
    
    df.loc[df['set_of_numbers'].isnull(),'value_is_NaN'] = 'Yes'
    df.loc[df['set_of_numbers'].notnull(), 'value_is_NaN'] = 'No'
    
    print (df)
    

    You’ll now see a new column (called ‘value_is_NaN’), which indicates all the instances where a NaN value exists:

        set_of_numbers  value_is_NaN
    0              1.0            No
    1              2.0            No
    2              3.0            No
    3              4.0            No
    4              5.0            No
    5              NaN           Yes
    6              6.0            No
    7              7.0            No
    8              NaN           Yes
    9              8.0            No
    10             9.0            No
    11            10.0            No
    12             NaN           Yes
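
The two .loc assignments can also be collapsed into a single step. This sketch uses np.where as an alternative to the approach shown above; it is not taken from the article.

```python
import numpy as np
import pandas as pd

data = {'set_of_numbers': [1, 2, 3, 4, 5, np.nan, 6, 7, np.nan, 8, 9, 10, np.nan]}
df = pd.DataFrame(data)

# np.where picks 'Yes' where the mask is True and 'No' everywhere else
df['value_is_NaN'] = np.where(df['set_of_numbers'].isnull(), 'Yes', 'No')

flags = df['value_is_NaN'].tolist()
print(df)
```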
    

    (2) Count the NaN under a single DataFrame column

    You can apply this syntax in order to count the NaN values under a single DataFrame column:

    df['your column name'].isnull().sum()

    Here is the syntax for our example:

    import pandas as pd
    import numpy as np
    
    data = {'set_of_numbers': [1,2,3,4,5,np.nan,6,7,np.nan,8,9,10,np.nan]}
    df = pd.DataFrame(data)
    
    count_nan = df['set_of_numbers'].isnull().sum()
    print ('Count of NaN: ' + str(count_nan))
    

    You’ll then get the count of 3 NaN values:

    Count of NaN: 3
    

    And here is another approach to get the count:

    import pandas as pd
    import numpy as np
    
    data = {'set_of_numbers': [1,2,3,4,5,np.nan,6,7,np.nan,8,9,10,np.nan]}
    df = pd.DataFrame(data)
    
    df.loc[df['set_of_numbers'].isnull(),'value_is_NaN'] = 'Yes'
    df.loc[df['set_of_numbers'].notnull(), 'value_is_NaN'] = 'No'
    
    count_nan = df.loc[df['value_is_NaN']=='Yes'].count()
    print (count_nan)
    

    As before, you’ll get the count of 3 instances of NaN values:

    set_of_numbers    0
    value_is_NaN      3
    dtype: int64
    

    (3) Check for NaN under an entire DataFrame

    Now let’s add a second column into the original DataFrame. This column would include another set of numbers with NaN values:

    import pandas as pd
    import numpy as np
    
    data = {'first_set_of_numbers': [1,2,3,4,5,np.nan,6,7,np.nan,8,9,10,np.nan],
            'second_set_of_numbers': [11,12,np.nan,13,14,np.nan,15,16,np.nan,np.nan,17,np.nan,19]}
    df = pd.DataFrame(data)
    
    print (df)
    

    Run the code, and you’ll get 8 instances of NaN values across the entire DataFrame:

        first_set_of_numbers  second_set_of_numbers
    0                    1.0                   11.0
    1                    2.0                   12.0
    2                    3.0                    NaN
    3                    4.0                   13.0
    4                    5.0                   14.0
    5                    NaN                    NaN
    6                    6.0                   15.0
    7                    7.0                   16.0
    8                    NaN                    NaN
    9                    8.0                    NaN
    10                   9.0                   17.0
    11                  10.0                    NaN
    12                   NaN                   19.0
    

    You can then apply this syntax in order to verify the existence of NaN values under the entire DataFrame:

    df.isnull().values.any()

    For our example:

    import pandas as pd
    import numpy as np
    
    data = {'first_set_of_numbers': [1,2,3,4,5,np.nan,6,7,np.nan,8,9,10,np.nan],
            'second_set_of_numbers': [11,12,np.nan,13,14,np.nan,15,16,np.nan,np.nan,17,np.nan,19]}
    df = pd.DataFrame(data)
    
    check_nan_in_df = df.isnull().values.any()
    print (check_nan_in_df)
    

    Once you run the code, you’ll get ‘True’ which confirms the existence of NaN values in the DataFrame:

    True
    

    You can get a further breakdown by removing .values.any() from the code:

    import pandas as pd
    import numpy as np
    
    data = {'first_set_of_numbers': [1,2,3,4,5,np.nan,6,7,np.nan,8,9,10,np.nan],
            'second_set_of_numbers': [11,12,np.nan,13,14,np.nan,15,16,np.nan,np.nan,17,np.nan,19]}
    df = pd.DataFrame(data)
    
    check_nan_in_df = df.isnull()
    print (check_nan_in_df)
    

    Here is the result of the breakdown:

        first_set_of_numbers  second_set_of_numbers
    0                  False                  False
    1                  False                  False
    2                  False                   True
    3                  False                  False
    4                  False                  False
    5                   True                   True
    6                  False                  False
    7                  False                  False
    8                   True                   True
    9                  False                   True
    10                 False                  False
    11                 False                   True
    12                  True                  False
    

    (4) Count the NaN under an entire DataFrame

    You may now use this template to count the NaN values under the entire DataFrame:

    df.isnull().sum().sum()

    Here is the code for our example:

    import pandas as pd
    import numpy as np
    
    data = {'first_set_of_numbers': [1,2,3,4,5,np.nan,6,7,np.nan,8,9,10,np.nan],
            'second_set_of_numbers': [11,12,np.nan,13,14,np.nan,15,16,np.nan,np.nan,17,np.nan,19]}
    df = pd.DataFrame(data)
    
    count_nan_in_df = df.isnull().sum().sum()
    print ('Count of NaN: ' + str(count_nan_in_df))
    

    You’ll then get the total count of 8:

    Count of NaN: 8
    

    And if you want to get the count of NaN by column, then you may use the following code:

    import pandas as pd
    import numpy as np
    
    data = {'first_set_of_numbers': [1,2,3,4,5,np.nan,6,7,np.nan,8,9,10,np.nan],
            'second_set_of_numbers': [11,12,np.nan,13,14,np.nan,15,16,np.nan,np.nan,17,np.nan,19]}
    df = pd.DataFrame(data)
    
    count_nan_in_df = df.isnull().sum()
    print (count_nan_in_df)
    

    And here is the result:

    first_set_of_numbers     3
    second_set_of_numbers    5
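
Raw counts are hard to compare between columns of different lengths or across datasets, so the share of missing values is often more useful. Since True counts as 1, the mean of the boolean frame gives that share directly. A minimal sketch on the same data (the name nan_share is illustrative):

```python
import numpy as np
import pandas as pd

data = {'first_set_of_numbers': [1, 2, 3, 4, 5, np.nan, 6, 7, np.nan, 8, 9, 10, np.nan],
        'second_set_of_numbers': [11, 12, np.nan, 13, 14, np.nan, 15, 16, np.nan, np.nan, 17, np.nan, 19]}
df = pd.DataFrame(data)

# Mean of a boolean column = fraction of True values = fraction of NaN
nan_share = df.isnull().mean()
print((nan_share * 100).round(1))
```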
    

    You just saw how to check for NaN in Pandas DataFrame.

    By using isnull().values.any() method you can check if a pandas DataFrame contains NaN/None values in any cell (all rows & columns ). This method returns True if it finds NaN/None on any cell of a DataFrame, returns False when not found. In this article, I will explain how to check if any value is NaN in a pandas DataFrame.

    NaN stands for Not a Number and is one of the common ways to represent a missing value in data. NaN values are a major problem in data analysis because operations on them can have side effects, so it is best practice to check whether a DataFrame has missing data and replace it with values that make sense, for example an empty string or numeric zero.

    1. Quick Examples of Check If any Value is NaN

    If you are in a hurry, below are some quick examples of how to check if any value is nan in a pandas DataFrame.

    
    # Below are some quick examples
    # Checking NaN on entire DataFrame
    value = df.isnull().values.any()
    
    # Checking on Single Column
    value = df['Fee'].isnull().values.any()
    
    # Checking on multiple columns
    value = df[['Fee','Duration']].isnull().values.any()
    
    # Count NaN per column on entire DataFrame
    result = df.isnull().sum()
    
    # Count NaN on single column of DataFrame
    result = df['Fee'].isnull().sum()
    
    # Count NaN per column on selected columns of DataFrame
    result = df[['Fee','Duration']].isnull().sum()
    
    # Get Total Count of all Columns
    count = df.isnull().sum().sum()
    print('Number of NaN values present:' + str(count))
    

    Now, let’s create a DataFrame with a few rows and columns and execute some examples and validate the output. Our DataFrame contains column names Courses, Fee, Duration, and Discount with some NaN values.

    
    # Create Sample DataFrame
    import pandas as pd
    import numpy as np
    technologies = ({
         'Courses':["Spark","Java","Hadoop","Python","pandas"],
         'Fee' :[20000,np.nan,26000,np.nan,24000],
         'Duration':['30days',np.nan,'35days','40days',np.nan],
         'Discount':[1000,np.nan,2500,2100,np.nan]
                   })
    df = pd.DataFrame(technologies)
    print(df)
    

    Yields below output.

    
      Courses      Fee Duration  Discount
    0   Spark  20000.0   30days    1000.0
    1    Java      NaN      NaN       NaN
    2  Hadoop  26000.0   35days    2500.0
    3  Python      NaN   40days    2100.0
    4  pandas  24000.0      NaN       NaN
    

    Use the DataFrame.isnull().values.any() method to check whether a pandas DataFrame has any missing data; missing data is represented as NaN or None. When your data contains NaN or None, this method returns the boolean True; otherwise it returns False. After identifying the columns with NaN, you may want to replace NaN with zero or with a blank or empty string.

    
    # Check across all cells for NaN values
    value = df.isnull().values.any()
    print(value)
    # Outputs: True
    

    The above example checks all columns and returns True when it finds at least a single NaN/None value.

    3. Check for NaN Values on Selected Columns

    If you want to check whether NaN values exist in selected columns (single or multiple), first select the columns and then run the same method.

    
    # Checking on Single Column
    value = df['Fee'].isnull().values.any()
    print(value)
    # Outputs: True
    
    # Checking on Single Column
    value = df['Courses'].isnull().values.any()
    print(value)
    # Outputs: False
    
    # Checking on multiple columns
    value = df[['Fee','Duration']].isnull().values.any()
    print(value)
    # Outputs: True
    

    3. Using DataFrame.isnull() Method

    DataFrame.isnull() checks each cell for a missing value: it returns True where it finds NaN/None and False otherwise.

    
    # Using DataFrame.isnull() method
    df2 = df['Fee'].isnull()
    print(df2)
    

    Yields below output.

    
    0    False
    1     True
    2    False
    3     True
    4    False
    Name: Fee, dtype: bool
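
The boolean result of isnull() is most often used as a filter. The sketch below, on the same sample data, selects the rows where Fee is missing and the rows where any column is missing; the variable names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

technologies = {
    'Courses': ["Spark", "Java", "Hadoop", "Python", "pandas"],
    'Fee': [20000, np.nan, 26000, np.nan, 24000],
    'Duration': ['30days', np.nan, '35days', '40days', np.nan],
    'Discount': [1000, np.nan, 2500, 2100, np.nan],
}
df = pd.DataFrame(technologies)

# Rows where the Fee column is missing
rows_missing_fee = df[df['Fee'].isnull()]

# Rows where at least one column is missing (any() across each row)
rows_with_any_nan = df[df.isnull().any(axis=1)]

print(rows_missing_fee)
print(rows_with_any_nan)
```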
    

    4. Count the NaN Values on Single or Multiple DataFrame Columns

    You can also count the NaN/None values present in the entire DataFrame, single or multiple columns.

    
    # Count NaN per column on entire DataFrame
    result = df.isnull().sum()
    print(result)
    # Outputs
    #Courses     0
    #Fee         2
    #Duration    2
    #Discount    2
    #dtype: int64
    
    # Count NaN on single column of DataFrame
    result = df['Fee'].isnull().sum()
    print(result)
    # Outputs
    #2
    
    # Count NaN on selected columns of DataFrame
    result = df[['Fee','Duration']].isnull().sum()
    print(result)
    # Outputs
    #Fee         2
    #Duration    2
    #dtype: int64
    

    Note that when you use sum() on multiple columns or the entire DataFrame, it returns the NaN count for each column.

    5. Total Count NaN Values on Entire DataFrame

    To get the combined total count of NaN values, use isnull().sum().sum() on DataFrame. The below example returns the total count of NaN values from all columns.

    
    # To get the Count
    count = df.isnull().sum().sum()
    print('Number of NaN values present:' +str(count))
    

    Yields below output.

    
    Number of NaN values present:6
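
The total can also be reached by counting along the other axis: sum(axis=1) gives the number of missing cells in each row, and those per-row counts add up to the same figure. A minimal sketch on the same sample data:

```python
import numpy as np
import pandas as pd

technologies = {
    'Courses': ["Spark", "Java", "Hadoop", "Python", "pandas"],
    'Fee': [20000, np.nan, 26000, np.nan, 24000],
    'Duration': ['30days', np.nan, '35days', '40days', np.nan],
    'Discount': [1000, np.nan, 2500, 2100, np.nan],
}
df = pd.DataFrame(technologies)

# Number of missing cells in each row
per_row = df.isnull().sum(axis=1).tolist()

# Summing the per-row counts gives the same total as sum().sum()
total = int(df.isnull().sum().sum())

print(per_row, total)
```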
    

    6. Complete Example For Check If any Value NaN

    Below is the complete example of how to check if any value is NaN in pandas DataFrame.

    
    import pandas as pd
    import numpy as np
    technologies = ({
         'Courses':["Spark","Java","Hadoop","Python","pandas"],
         'Fee' :[20000,np.nan,26000,np.nan,24000],
         'Duration':['30days',np.nan,'35days','40days',np.nan],
         'Discount':[1000,np.nan,2500,2100,np.nan]
                   })
    df = pd.DataFrame(technologies)
    print(df)
    
    # Checking NaN on entire DataFrame
    value = df.isnull().values.any()
    print(value)
    
    # Checking on Single Column
    value = df['Fee'].isnull().values.any()
    print(value)
    
    # Checking on Single Column
    value = df['Courses'].isnull().values.any()
    print(value)
    
    # Checking on multiple columns
    value = df[['Fee','Duration']].isnull().values.any()
    print(value)
    
    # Using DataFrame.isnull() method
    df2 = df['Fee'].isnull()
    print(df2)
    
    # Count NaN per column on entire DataFrame
    result = df.isnull().sum()
    print(result)
    
    # Count NaN on single column of DataFrame
    result = df['Fee'].isnull().sum()
    print(result)
    
    # Count NaN on selected columns of DataFrame
    result = df[['Fee','Duration']].isnull().sum()
    print(result)
    
    # To get the Count
    count = df.isnull().sum().sum()
    print('Number of NaN values present:' +str(count))
    

    Conclusion

    In this article, you have learned how to check if any value is NaN in the entire pandas DataFrame, or in a single column or multiple columns, using DataFrame.isnull().values.any() and DataFrame.isnull().sum(). You have also learned how to get the total count of NaN values using DataFrame.isnull().sum().sum().


    In this article I would like to describe how to find NaN values in a pandas DataFrame. This kind of operation can be very useful given that it is common to find datasets with missing or incorrect data values.

    I will be using the numpy package to generate some data with NaN values.

    Import necessary packages

    import pandas as pd
    import numpy as np
    import platform
    


    print(f'Python version: {platform.python_version()} ({platform.python_implementation()})')
    print(f'Pandas version: {pd.__version__}')
    print(f'Numpy version: {np.__version__}')
    


    Python version: 3.6.4 (CPython)
    Pandas version: 0.23.1
    Numpy version: 1.14.5
    


    Generate data with NaN values

    num_nan = 25 # number of NaN values wanted in the generated data
    np.random.seed(6765431)  # set a seed for reproducibility
    A = np.random.randn(10, 10)
    print(A)
    


    [[-1.56132314 -0.16954058 -0.17845422 -1.33689111 -0.19185078 -1.18617765
       0.44499302 -0.61209568  0.31170935  1.4127548 ]
     [ 0.85330488  0.68517546 -1.10140989  0.84918019  0.72802961 -0.35161197
       0.73519152  1.13145412  0.53231247  0.78103143]
     [-0.81614324  0.15906898  0.49940119 -0.09319255 -1.07837721 -0.76053341
       0.73622083 -0.45518154 -0.69194032  1.02550409]
     [-1.96339975  0.07593331 -0.16798377 -1.20398958  0.88333656  1.17908422
       0.26324698 -2.65442248 -0.31583796 -0.16065732]
     [-1.24321376 -0.89816898  0.02824671  0.15304093  0.56505667 -0.78115883
       0.74504467  1.14025258 -0.04518221 -0.83908358]
     [ 1.00967019  0.84240102  1.15043436 -0.40120489  0.00664105 -1.23247563
       0.64738343  1.66096762 -0.92556683  0.47575796]
     [ 0.96516278  1.11158059 -0.82155143  0.88900313  2.16943761 -2.05250161
       2.40156233  0.92453867 -0.24437783 -2.91029265]
     [-0.86492662  0.82443151 -0.48246862 -1.05183143 -1.15272524 -0.77170733
       0.07177233  1.02820181 -2.08947076  0.89859677]
     [-0.07263982 -0.56840867  1.30910275 -0.52846822  0.06019191 -0.61000727
       0.40782356 -0.36124333 -1.54522486 -0.07891861]
     [-1.96361682 -1.06315325 -0.45582138 -0.74566868  1.27579529 -2.46306005
       0.57022673 -0.02793746  0.78652775  1.27690195]]
    


    # Set random values to nan
    A.ravel()[np.random.choice(A.size, num_nan, replace=False)] = np.nan
    print(A)
    


    [[-1.56132314 -0.16954058 -0.17845422 -1.33689111 -0.19185078 -1.18617765
              nan -0.61209568  0.31170935  1.4127548 ]
     [ 0.85330488  0.68517546         nan  0.84918019         nan -0.35161197
       0.73519152         nan  0.53231247  0.78103143]
     [-0.81614324  0.15906898  0.49940119         nan -1.07837721 -0.76053341
       0.73622083         nan -0.69194032  1.02550409]
     [-1.96339975  0.07593331         nan -1.20398958  0.88333656         nan
       0.26324698         nan -0.31583796 -0.16065732]
     [-1.24321376 -0.89816898  0.02824671  0.15304093  0.56505667 -0.78115883
       0.74504467  1.14025258 -0.04518221 -0.83908358]
     [ 1.00967019  0.84240102         nan -0.40120489  0.00664105         nan
       0.64738343  1.66096762 -0.92556683  0.47575796]
     [ 0.96516278         nan -0.82155143  0.88900313  2.16943761         nan
       2.40156233         nan -0.24437783         nan]
     [-0.86492662  0.82443151 -0.48246862 -1.05183143 -1.15272524 -0.77170733
       0.07177233  1.02820181 -2.08947076         nan]
     [-0.07263982         nan  1.30910275 -0.52846822  0.06019191 -0.61000727
       0.40782356 -0.36124333         nan         nan]
     [        nan         nan         nan         nan  1.27579529 -2.46306005
              nan         nan  0.78652775  1.27690195]]
    


    # Create a DataFrame from the generated data
    df = pd.DataFrame(A)
    df
    


              0         1         2         3         4         5         6         7         8         9
    0 -1.561323 -0.169541 -0.178454 -1.336891 -0.191851 -1.186178       NaN -0.612096  0.311709  1.412755
    1  0.853305  0.685175       NaN  0.849180       NaN -0.351612  0.735192       NaN  0.532312  0.781031
    2 -0.816143  0.159069  0.499401       NaN -1.078377 -0.760533  0.736221       NaN -0.691940  1.025504
    3 -1.963400  0.075933       NaN -1.203990  0.883337       NaN  0.263247       NaN -0.315838 -0.160657
    4 -1.243214 -0.898169  0.028247  0.153041  0.565057 -0.781159  0.745045  1.140253 -0.045182 -0.839084
    5  1.009670  0.842401       NaN -0.401205  0.006641       NaN  0.647383  1.660968 -0.925567  0.475758
    6  0.965163       NaN -0.821551  0.889003  2.169438       NaN  2.401562       NaN -0.244378       NaN
    7 -0.864927  0.824432 -0.482469 -1.051831 -1.152725 -0.771707  0.071772  1.028202 -2.089471       NaN
    8 -0.072640       NaN  1.309103 -0.528468  0.060192 -0.610007  0.407824 -0.361243       NaN       NaN
    9       NaN       NaN       NaN       NaN  1.275795 -2.463060       NaN       NaN  0.786528  1.276902

    Check for NaN values

    Now that we have some data to operate on let’s see the different ways we can check for missing values.

    There are two DataFrame methods that can be used: DataFrame.isna() and DataFrame.isnull(). The pandas documentation describes isnull() as an alias of isna(), so both return the same result. To keep it simple I will only use isna().

    df.isna()
    


           0      1      2      3      4      5      6      7      8      9
    0  False  False  False  False  False  False   True  False  False  False
    1  False  False   True  False   True  False  False   True  False  False
    2  False  False  False   True  False  False  False   True  False  False
    3  False  False   True  False  False   True  False   True  False  False
    4  False  False  False  False  False  False  False  False  False  False
    5  False  False   True  False  False   True  False  False  False  False
    6  False   True  False  False  False   True  False   True  False   True
    7  False  False  False  False  False  False  False  False  False   True
    8  False   True  False  False  False  False  False  False   True   True
    9   True   True   True   True  False  False   True   True  False  False

    As it can be seen above when we use the isna() method it returns a DataFrame with boolean values, where True indicates NaN values and False otherwise.
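
A quick way to confirm that isna() and isnull() behave identically, and that both treat None as missing too, is to compare their results on a small frame. The frame below is a made-up example, not the article's data.

```python
import numpy as np
import pandas as pd

# Small illustrative frame with both NaN and None
df = pd.DataFrame({'a': [1.0, np.nan], 'b': [None, 'x']})

# DataFrame.equals compares shape, labels, and values
same_result = df.isna().equals(df.isnull())

# None in an object column is reported as missing as well
b_mask = df.isna()['b'].tolist()

print(same_result, b_mask)
```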

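    If we want to know how many missing values there are in each column or row, we can use the DataFrame.sum() method. Summing over axis='rows' collapses the rows and gives one count per column; axis='columns' gives one count per row: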

    df.isna().sum(axis='rows')  # 'rows' or 0
    


    0    1
    1    3
    2    4
    3    2
    4    1
    5    3
    6    2
    7    5
    8    1
    9    3
    dtype: int64
    


    df.isna().sum(axis='columns')  # 'columns' or 1
    


    0    1
    1    3
    2    2
    3    3
    4    0
    5    2
    6    4
    7    1
    8    3
    9    6
    dtype: int64
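
The axis argument is easy to mix up, because it names the axis that is collapsed rather than the one that is kept. A tiny made-up frame with named columns (an illustrative assumption, not the article's data) makes the direction visible:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, np.nan],
                   'y': [np.nan, np.nan],
                   'z': [5, 6]})

# axis=0 (or 'rows'): collapse the rows, one count per column
per_column = df.isna().sum(axis=0).to_dict()

# axis=1 (or 'columns'): collapse the columns, one count per row
per_row = df.isna().sum(axis=1).tolist()

print(per_column, per_row)
```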
    


    To simply know the total number of missing values we can call sum() again:

    df.isna().sum().sum()
    


    25
    


    If we simply want to know whether there is any missing value, regardless of how many, we can use the any() method:

    df.isna().any()  # can also receive axis='rows' or 'columns'
    


    0    True
    1    True
    2    True
    3    True
    4    True
    5    True
    6    True
    7    True
    8    True
    9    True
    dtype: bool
    


    Calling it again we have a single boolean output:

    df.isna().any().any()
    


    True
    


    Besides the isna() method we also have the notna() method which is its boolean inverse. Applying it we can get the number of values that are not missing or simply if all values are not missing (but using the all() method instead of any()).

    print(df.notna().sum().sum())  # not missing
    print(df.notna().all().all())
    


    75
    False
    


    Note 1: the examples use DataFrame methods to check for missing values, but the pandas package also has top-level functions with the same purpose that can be applied to other objects. Example:

    print(pd.isna([1, 2, np.nan]))
    print(pd.notna([1, 2, np.nan]))
    


    [False False  True]
    [ True  True False]
    


    Note 2: the methods applied here on DataFrame objects are also available for Series and Index objects.
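
To illustrate Note 2, here is a short sketch applying isna() to a Series and an Index. It also uses the hasnans property, which both objects expose as a shortcut for "is anything missing?". The sample values are made up.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])
idx = pd.Index([1.0, np.nan, 3.0])

# isna() on a Series returns a boolean Series; on an Index, a boolean array
series_mask = s.isna().tolist()
index_mask = idx.isna().tolist()

# hasnans answers the yes/no question without building the mask by hand
series_has_nan = bool(s.hasnans)
index_has_nan = bool(idx.hasnans)

print(series_mask, index_mask, series_has_nan, index_has_nan)
```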

    Time comparison

    Comparing the time taken by the two methods we can see that using any() is faster but sum() will give us the additional information about how many missing values there are.

    %timeit df.isna().any().any()
    


    333 µs ± 33.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    


    %timeit df.isna().sum().sum()
    


    561 µs ± 97.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
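
The %timeit magic only works inside IPython or Jupyter. In a plain Python script, a rough equivalent can be sketched with the standard timeit module. The seed, the repeat count n, and the variable names are assumptions, and the timings will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Rebuild a 10x10 frame with 25 NaN values (seed chosen arbitrarily)
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 10))
A.ravel()[rng.choice(A.size, 25, replace=False)] = np.nan
df = pd.DataFrame(A)

n = 200  # number of calls to average over
t_any = timeit.timeit(lambda: df.isna().any().any(), number=n) / n
t_sum = timeit.timeit(lambda: df.isna().sum().sum(), number=n) / n

print(f'any: {t_any * 1e6:.0f} µs per call, sum: {t_sum * 1e6:.0f} µs per call')
```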
    


    Dealing with missing values

    Two easy ways to deal with missing values are removing them or filling them with some value. These can be achieved with the dropna() and fillna() methods.

    By default, the dropna() method returns a DataFrame without the rows that contain missing values; passing axis='columns' drops the affected columns instead. Here only row 4 has no NaN:

    df.dropna()
    


              0         1         2         3         4         5         6         7         8         9
    4 -1.243214 -0.898169  0.028247  0.153041  0.565057 -0.781159  0.745045  1.140253 -0.045182 -0.839084
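
Dropping every row that has a NaN is often too aggressive, as the single surviving row above shows. dropna() accepts a few parameters that narrow what gets removed. The sketch below uses a small made-up frame (not the article's data) so the effect of each option is easy to follow.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0],
                   'b': [np.nan, np.nan, 6.0],
                   'c': [7.0, 8.0, 9.0]})

rows_dropped = df.dropna()                # default: drop rows with any NaN
cols_dropped = df.dropna(axis='columns')  # drop columns with any NaN
subset_only = df.dropna(subset=['a'])     # only consider NaN in column 'a'
at_least_two = df.dropna(thresh=2)        # keep rows with >= 2 non-NaN values

print(rows_dropped, cols_dropped, subset_only, at_least_two, sep='\n\n')
```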

    The fillna() method will return a DataFrame with the missing values filled with a specified value.

    df.fillna(value=5)
    


              0         1         2         3         4         5         6         7         8         9
    0 -1.561323 -0.169541 -0.178454 -1.336891 -0.191851 -1.186178  5.000000 -0.612096  0.311709  1.412755
    1  0.853305  0.685175  5.000000  0.849180  5.000000 -0.351612  0.735192  5.000000  0.532312  0.781031
    2 -0.816143  0.159069  0.499401  5.000000 -1.078377 -0.760533  0.736221  5.000000 -0.691940  1.025504
    3 -1.963400  0.075933  5.000000 -1.203990  0.883337  5.000000  0.263247  5.000000 -0.315838 -0.160657
    4 -1.243214 -0.898169  0.028247  0.153041  0.565057 -0.781159  0.745045  1.140253 -0.045182 -0.839084
    5  1.009670  0.842401  5.000000 -0.401205  0.006641  5.000000  0.647383  1.660968 -0.925567  0.475758
    6  0.965163  5.000000 -0.821551  0.889003  2.169438  5.000000  2.401562  5.000000 -0.244378  5.000000
    7 -0.864927  0.824432 -0.482469 -1.051831 -1.152725 -0.771707  0.071772  1.028202 -2.089471  5.000000
    8 -0.072640  5.000000  1.309103 -0.528468  0.060192 -0.610007  0.407824 -0.361243  5.000000  5.000000
    9  5.000000  5.000000  5.000000  5.000000  1.275795 -2.463060  5.000000  5.000000  0.786528  1.276902
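
A single constant is rarely the right fill for every column. fillna() also accepts a dict or a Series keyed by column label, which allows a different fill value per column. The sketch below uses a small made-up frame and fills once with explicit values and once with each column's mean.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0],
                   'b': [np.nan, np.nan, 6.0]})

# Different constant per column, given as a dict
filled_const = df.fillna({'a': 0.0, 'b': -1.0})

# df.mean() is a Series indexed by column label, so each column
# is filled with its own mean (NaN is skipped when computing it)
filled_mean = df.fillna(df.mean())

print(filled_const, filled_mean, sep='\n\n')
```

Like dropna(), fillna() returns a new DataFrame here and leaves df unchanged.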

