Pandas: how to find NaN

You have a couple of options.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10,6))
# Make a few areas have NaN values
df.iloc[1:3,1] = np.nan
df.iloc[5,3] = np.nan
df.iloc[7:9,5] = np.nan

Now the data frame looks something like this:

          0         1         2         3         4         5
0  0.520113  0.884000  1.260966 -0.236597  0.312972 -0.196281
1 -0.837552       NaN  0.143017  0.862355  0.346550  0.842952
2 -0.452595       NaN -0.420790  0.456215  1.203459  0.527425
3  0.317503 -0.917042  1.780938 -1.584102  0.432745  0.389797
4 -0.722852  1.704820 -0.113821 -1.466458  0.083002  0.011722
5 -0.622851 -0.251935 -1.498837       NaN  1.098323  0.273814
6  0.329585  0.075312 -0.690209 -3.807924  0.489317 -0.841368
7 -1.123433 -1.187496  1.868894 -2.046456 -0.949718       NaN
8  1.133880 -0.110447  0.050385 -1.158387  0.188222       NaN
9 -0.513741  1.196259  0.704537  0.982395 -0.585040 -1.693810

Option 1: df.isnull().any().any() returns a boolean value

Calling isnull() on the frame returns a DataFrame of booleans like this:

       0      1      2      3      4      5
0  False  False  False  False  False  False
1  False   True  False  False  False  False
2  False   True  False  False  False  False
3  False  False  False  False  False  False
4  False  False  False  False  False  False
5  False  False  False   True  False  False
6  False  False  False  False  False  False
7  False  False  False  False  False   True
8  False  False  False  False  False   True
9  False  False  False  False  False  False

If you make it df.isnull().any(), you can find just the columns that have NaN values:

0    False
1     True
2    False
3     True
4    False
5     True
dtype: bool

One more .any() tells you whether any of the above are True:

df.isnull().any().any()
True

Option 2: df.isnull().sum().sum() returns the total number of NaN values as an integer

This operates the same way as the .any().any() does, by first giving a summation of the number of NaN values in a column, then the summation of those values:

df.isnull().sum()
0    0
1    2
2    0
3    1
4    0
5    2
dtype: int64

Finally, to get the total number of NaN values in the DataFrame:

df.isnull().sum().sum()
5
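Both options can be tried side by side. The sketch below rebuilds the same frame with a fixed seed; the seed value and the use of np.random.default_rng are assumptions made here for repeatability and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Fixed seed (an assumption) so that repeated runs give the same frame.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 6)))

# Same NaN placement as in the example above.
df.iloc[1:3, 1] = np.nan
df.iloc[5, 3] = np.nan
df.iloc[7:9, 5] = np.nan

has_nan = bool(df.isnull().any().any())   # Option 1: is there at least one NaN?
per_column = df.isnull().sum()            # NaN count for each column
total_nan = int(per_column.sum())         # Option 2: total number of NaN values

print(has_nan)
print(total_nan)
print(per_column[per_column > 0])         # only the columns that contain NaN
```

Option 1 is enough when a yes/no answer is all you need; Option 2 costs a little more but also tells you how much data is missing.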

Source

NaN stands for Not A Number and is one of the common ways to represent the missing value in the data. It is a special floating-point value and cannot be converted to any other type than float.

NaN values are one of the major problems in data analysis. Dealing with them is essential in order to get the desired results.
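The claim that NaN is a float can be checked directly. The following is a minimal sketch; the variable names are illustrative, and the last part assumes a pandas version that ships the nullable "Int64" extension dtype (1.0 or later), which the article itself does not mention.

```python
import numpy as np
import pandas as pd

# Integers mixed with NaN are stored as float64, because NaN is a float.
s_float = pd.Series([1, 2, np.nan])
print(s_float.dtype)

# Casting NaN to a plain Python int raises ValueError.
try:
    int(np.nan)
    cast_failed = False
except ValueError:
    cast_failed = True
print(cast_failed)

# Assumption: the nullable "Int64" dtype keeps integers and marks gaps as <NA>.
s_int = pd.Series([1, 2, None], dtype="Int64")
print(s_int.isnull().tolist())
```

This is why the columns in the examples below print as 10.0, 15.0 and so on, even though the input lists contain integers.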

Check for NaN Value in Pandas DataFrame

The ways to check for NaN in Pandas DataFrame are as follows:

Check for NaN with isnull().values.any() method
Count the NaN Using isnull().sum() Method
Check for NaN Using isnull().sum().any() Method
Count the NaN Using isnull().sum().sum() Method

Method 1: Using isnull().values.any() method

Example:

import pandas as pd
import numpy as np

num = {'Integers': [10, 15, 30, 40, 55, np.nan,
                    75, np.nan, 90, 150, np.nan]}

df = pd.DataFrame(num, columns=['Integers'])

check_nan = df['Integers'].isnull().values.any()
print(check_nan)

Output:

True

It is also possible to get the exact positions where NaN values are present by removing .values.any() from isnull().values.any():

check_nan = df['Integers'].isnull()
print(check_nan)

Output:

0     False
1     False
2     False
3     False
4     False
5      True
6     False
7      True
8     False
9     False
10     True
Name: Integers, dtype: bool
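When only the positions matter, the boolean mask can be used to select the matching index labels. A small sketch on the same data (the names mask and nan_labels are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Integers': [10, 15, 30, 40, 55, np.nan,
                                75, np.nan, 90, 150, np.nan]})

# Boolean mask: True where the value is NaN.
mask = df['Integers'].isnull()

# Index labels of the rows that hold NaN.
nan_labels = df.index[mask].tolist()
print(nan_labels)  # [5, 7, 10]
```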

Method 2: Using isnull().sum() Method

Example:

import pandas as pd
import numpy as np

num = {'Integers': [10, 15, 30, 40, 55, np.nan,
                    75, np.nan, 90, 150, np.nan]}

df = pd.DataFrame(num, columns=['Integers'])

count_nan = df['Integers'].isnull().sum()
print('Number of NaN values present: ' + str(count_nan))

Output:

Number of NaN values present: 3

Method 3: Using isnull().sum().any() Method

Example:

import pandas as pd
import numpy as np

nums = {'Integers_1': [10, 15, 30, 40, 55, np.nan, 75,
                       np.nan, 90, 150, np.nan],
        'Integers_2': [np.nan, 21, 22, 23, np.nan, 24, 25,
                       np.nan, 26, np.nan, np.nan]}

df = pd.DataFrame(nums, columns=['Integers_1', 'Integers_2'])

nan_in_df = df.isnull().sum().any()
print(nan_in_df)

Output:

True

To get the exact positions where NaN values are present, remove .sum().any() from isnull().sum().any(), which leaves df.isnull().
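For a whole DataFrame the positions can also be listed as (row label, column name) pairs. The sketch below is one possible way to do it using np.where on the boolean mask; it is not taken from the article.

```python
import numpy as np
import pandas as pd

nums = {'Integers_1': [10, 15, 30, 40, 55, np.nan, 75,
                       np.nan, 90, 150, np.nan],
        'Integers_2': [np.nan, 21, 22, 23, np.nan, 24, 25,
                       np.nan, 26, np.nan, np.nan]}
df = pd.DataFrame(nums)

# Integer coordinates of every True cell in the NaN mask (row-major order).
rows, cols = np.where(df.isnull().to_numpy())

# Translate integer coordinates into row labels and column names.
nan_positions = [(df.index[r], df.columns[c]) for r, c in zip(rows, cols)]

for position in nan_positions:
    print(position)
```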

Method 4: Using isnull().sum().sum() Method

Example:

import pandas as pd
import numpy as np

nums = {'Integers_1': [10, 15, 30, 40, 55, np.nan, 75,
                       np.nan, 90, 150, np.nan],
        'Integers_2': [np.nan, 21, 22, 23, np.nan, 24, 25,
                       np.nan, 26, np.nan, np.nan]}

df = pd.DataFrame(nums, columns=['Integers_1', 'Integers_2'])

nan_in_df = df.isnull().sum().sum()
print('Number of NaN values present: ' + str(nan_in_df))

Output:

Number of NaN values present: 8

Last Updated: 30 Jan, 2023

Source

Here are 4 ways to check for NaN in Pandas DataFrame:

(1) Check for NaN under a single DataFrame column:

df['your column name'].isnull().values.any()

(2) Count the NaN under a single DataFrame column:

df['your column name'].isnull().sum()

(3) Check for NaN under an entire DataFrame:

df.isnull().values.any()

(4) Count the NaN under an entire DataFrame:

df.isnull().sum().sum()

(1) Check for NaN under a single DataFrame column

In the following example, we’ll create a DataFrame with a set of numbers and 3 NaN values:

import pandas as pd
import numpy as np

data = {'set_of_numbers': [1,2,3,4,5,np.nan,6,7,np.nan,8,9,10,np.nan]}
df = pd.DataFrame(data)
print (df)

You’ll now see the DataFrame with the 3 NaN values:

    set_of_numbers
0              1.0
1              2.0
2              3.0
3              4.0
4              5.0
5              NaN
6              6.0
7              7.0
8              NaN
9              8.0
10             9.0
11            10.0
12             NaN

You can then use the following template in order to check for NaN under a single DataFrame column:

df['your column name'].isnull().values.any()

For our example, the DataFrame column is ‘set_of_numbers.’

And so, the code to check whether a NaN value exists under the ‘set_of_numbers’ column is as follows:

import pandas as pd
import numpy as np

data = {'set_of_numbers': [1,2,3,4,5,np.nan,6,7,np.nan,8,9,10,np.nan]}
df = pd.DataFrame(data)

check_for_nan = df['set_of_numbers'].isnull().values.any()
print (check_for_nan)

Run the code, and you’ll get ‘True’ which confirms the existence of NaN values under the DataFrame column:

True

And if you want to get the actual breakdown of the instances where NaN values exist, then you may remove .values.any() from the code. So the complete syntax to get the breakdown would look as follows:

import pandas as pd
import numpy as np

data = {'set_of_numbers': [1,2,3,4,5,np.nan,6,7,np.nan,8,9,10,np.nan]}
df = pd.DataFrame(data)

check_for_nan = df['set_of_numbers'].isnull()
print (check_for_nan)

You’ll now see the 3 instances of the NaN values:

0     False
1     False
2     False
3     False
4     False
5      True
6     False
7     False
8      True
9     False
10    False
11    False
12     True

Here is another approach where you can get all the instances where a NaN value exists:

import pandas as pd
import numpy as np

data = {'set_of_numbers': [1,2,3,4,5,np.nan,6,7,np.nan,8,9,10,np.nan]}
df = pd.DataFrame(data)

df.loc[df['set_of_numbers'].isnull(),'value_is_NaN'] = 'Yes'
df.loc[df['set_of_numbers'].notnull(), 'value_is_NaN'] = 'No'

print (df)

You’ll now see a new column (called ‘value_is_NaN’), which indicates all the instances where a NaN value exists:

    set_of_numbers  value_is_NaN
0              1.0            No
1              2.0            No
2              3.0            No
3              4.0            No
4              5.0            No
5              NaN           Yes
6              6.0            No
7              7.0            No
8              NaN           Yes
9              8.0            No
10             9.0            No
11            10.0            No
12             NaN           Yes
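The same Yes/No column can be produced in a single step. This is a sketch of an alternative that uses np.where instead of the two df.loc assignments; it is a swapped-in technique, not the one used above.

```python
import numpy as np
import pandas as pd

data = {'set_of_numbers': [1, 2, 3, 4, 5, np.nan, 6, 7, np.nan, 8, 9, 10, np.nan]}
df = pd.DataFrame(data)

# 'Yes' where the value is NaN, 'No' everywhere else.
df['value_is_NaN'] = np.where(df['set_of_numbers'].isnull(), 'Yes', 'No')

print(df)
```

A single assignment avoids building the column in two passes and keeps the condition in one place.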

(2) Count the NaN under a single DataFrame column

You can apply this syntax in order to count the NaN values under a single DataFrame column:

df['your column name'].isnull().sum()

Here is the syntax for our example:

import pandas as pd
import numpy as np

data = {'set_of_numbers': [1,2,3,4,5,np.nan,6,7,np.nan,8,9,10,np.nan]}
df = pd.DataFrame(data)

count_nan = df['set_of_numbers'].isnull().sum()
print ('Count of NaN: ' + str(count_nan))

You’ll then get the count of 3 NaN values:

Count of NaN: 3

And here is another approach to get the count:

import pandas as pd
import numpy as np

data = {'set_of_numbers': [1,2,3,4,5,np.nan,6,7,np.nan,8,9,10,np.nan]}
df = pd.DataFrame(data)

df.loc[df['set_of_numbers'].isnull(),'value_is_NaN'] = 'Yes'
df.loc[df['set_of_numbers'].notnull(), 'value_is_NaN'] = 'No'

count_nan = df.loc[df['value_is_NaN']=='Yes'].count()
print (count_nan)

As before, you’ll get the count of 3 instances of NaN values:

set_of_numbers    0
value_is_NaN      3
dtype: int64

(3) Check for NaN under an entire DataFrame

Now let’s add a second column into the original DataFrame. This column would include another set of numbers with NaN values:

import pandas as pd
import numpy as np

data = {'first_set_of_numbers': [1,2,3,4,5,np.nan,6,7,np.nan,8,9,10,np.nan],
        'second_set_of_numbers': [11,12,np.nan,13,14,np.nan,15,16,np.nan,np.nan,17,np.nan,19]}
df = pd.DataFrame(data)

print (df)

Run the code, and you’ll get 8 instances of NaN values across the entire DataFrame:

    first_set_of_numbers  second_set_of_numbers
0                    1.0                   11.0
1                    2.0                   12.0
2                    3.0                    NaN
3                    4.0                   13.0
4                    5.0                   14.0
5                    NaN                    NaN
6                    6.0                   15.0
7                    7.0                   16.0
8                    NaN                    NaN
9                    8.0                    NaN
10                   9.0                   17.0
11                  10.0                    NaN
12                   NaN                   19.0

You can then apply this syntax in order to verify the existence of NaN values under the entire DataFrame:

df.isnull().values.any()

For our example:

import pandas as pd
import numpy as np

data = {'first_set_of_numbers': [1,2,3,4,5,np.nan,6,7,np.nan,8,9,10,np.nan],
        'second_set_of_numbers': [11,12,np.nan,13,14,np.nan,15,16,np.nan,np.nan,17,np.nan,19]}
df = pd.DataFrame(data)

check_nan_in_df = df.isnull().values.any()
print (check_nan_in_df)

Once you run the code, you’ll get ‘True’ which confirms the existence of NaN values in the DataFrame:

True

You can get a further breakdown by removing .values.any() from the code:

import pandas as pd
import numpy as np

data = {'first_set_of_numbers': [1,2,3,4,5,np.nan,6,7,np.nan,8,9,10,np.nan],
        'second_set_of_numbers': [11,12,np.nan,13,14,np.nan,15,16,np.nan,np.nan,17,np.nan,19]}
df = pd.DataFrame(data)

check_nan_in_df = df.isnull()
print (check_nan_in_df)

Here is the result of the breakdown:

    first_set_of_numbers  second_set_of_numbers
0                  False                  False
1                  False                  False
2                  False                   True
3                  False                  False
4                  False                  False
5                   True                   True
6                  False                  False
7                  False                  False
8                   True                   True
9                  False                   True
10                 False                  False
11                 False                   True
12                  True                  False
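The boolean breakdown can also be used as a filter to show only the rows that contain a NaN. A minimal sketch on the same two-column data (variable names are illustrative):

```python
import numpy as np
import pandas as pd

data = {'first_set_of_numbers': [1, 2, 3, 4, 5, np.nan, 6, 7, np.nan, 8, 9, 10, np.nan],
        'second_set_of_numbers': [11, 12, np.nan, 13, 14, np.nan, 15, 16, np.nan, np.nan, 17, np.nan, 19]}
df = pd.DataFrame(data)

# Rows where at least one column is NaN.
rows_with_any_nan = df[df.isnull().any(axis=1)]

# Rows where every column is NaN.
rows_with_all_nan = df[df.isnull().all(axis=1)]

print(rows_with_any_nan)
print(rows_with_all_nan)
```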

(4) Count the NaN under an entire DataFrame

You may now use this template to count the NaN values under the entire DataFrame:

df.isnull().sum().sum()

Here is the code for our example:

import pandas as pd
import numpy as np

data = {'first_set_of_numbers': [1,2,3,4,5,np.nan,6,7,np.nan,8,9,10,np.nan],
        'second_set_of_numbers': [11,12,np.nan,13,14,np.nan,15,16,np.nan,np.nan,17,np.nan,19]}
df = pd.DataFrame(data)

count_nan_in_df = df.isnull().sum().sum()
print ('Count of NaN: ' + str(count_nan_in_df))

You’ll then get the total count of 8:

Count of NaN: 8

And if you want to get the count of NaN by column, then you may use the following code:

import pandas as pd
import numpy as np

data = {'first_set_of_numbers': [1,2,3,4,5,np.nan,6,7,np.nan,8,9,10,np.nan],
        'second_set_of_numbers': [11,12,np.nan,13,14,np.nan,15,16,np.nan,np.nan,17,np.nan,19]}
df = pd.DataFrame(data)

count_nan_in_df = df.isnull().sum()
print (count_nan_in_df)

And here is the result:

first_set_of_numbers     3
second_set_of_numbers    5
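Counts are easier to compare across columns when expressed as a share of the rows. Because True counts as 1, taking the mean of the boolean mask gives that share directly. A short sketch, not part of the original tutorial:

```python
import numpy as np
import pandas as pd

data = {'first_set_of_numbers': [1, 2, 3, 4, 5, np.nan, 6, 7, np.nan, 8, 9, 10, np.nan],
        'second_set_of_numbers': [11, 12, np.nan, 13, 14, np.nan, 15, 16, np.nan, np.nan, 17, np.nan, 19]}
df = pd.DataFrame(data)

# Fraction of missing values per column (3 of 13 rows and 5 of 13 rows).
share_nan = df.isnull().mean()

print((share_nan * 100).round(1))  # percentages
```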

Source

Using the isnull().values.any() method, you can check whether a pandas DataFrame contains NaN/None values in any cell (across all rows and columns). It returns True if at least one cell is NaN/None and False otherwise. In this article, I will explain how to check if any value is NaN in a pandas DataFrame.

NaN stands for Not A Number and is one of the common ways to represent a missing value in data. NaN values are one of the major problems in data analysis because operations on them can have side effects, so it is good practice to check whether a DataFrame has any missing data and replace it with values that make sense, for example an empty string or numeric zero.

1. Quick Examples of Check If any Value is NaN

If you are in a hurry, below are some quick examples of how to check if any value is nan in a pandas DataFrame.


# Below are some quick examples
# Check for NaN on the entire DataFrame
value = df.isnull().values.any()

# Check a single column
value = df['Fee'].isnull().values.any()

# Check multiple columns
value = df[['Fee','Duration']].isnull().values.any()

# Count NaN per column on the entire DataFrame
result = df.isnull().sum()

# Count NaN on a single column of the DataFrame
result = df['Fee'].isnull().sum()

# Count NaN on selected columns of the DataFrame
result = df[['Fee','Duration']].isnull().sum()

# Get the total count across all columns
count = df.isnull().sum().sum()
print('Number of NaN values present:' + str(count))

Now, let’s create a DataFrame with a few rows and columns and execute some examples and validate the output. Our DataFrame contains column names Courses, Fee, Duration, and Discount with some NaN values.


# Create Sample DataFrame
import pandas as pd
import numpy as np
technologies = ({
     'Courses':["Spark","Java","Hadoop","Python","pandas"],
     'Fee' :[20000,np.nan,26000,np.nan,24000],
     'Duration':['30days',np.nan,'35days','40days',np.nan],
     'Discount':[1000,np.nan,2500,2100,np.nan]
               })
df = pd.DataFrame(technologies)
print(df)

This yields the output below.


  Courses      Fee Duration  Discount
0   Spark  20000.0   30days    1000.0
1    Java      NaN      NaN       NaN
2  Hadoop  26000.0   35days    2500.0
3  Python      NaN   40days    2100.0
4  pandas  24000.0      NaN       NaN

Use the DataFrame.isnull().values.any() method to check whether a pandas DataFrame has any missing data; missing data is represented as NaN or None. When your data contains NaN or None, this expression returns the boolean True; otherwise it returns False. After identifying the columns with NaN, you may want to replace NaN with zero or with a blank or empty string.


# Check across all cells for NaN values
value = df.isnull().values.any()
print(value)
# Outputs: True

The above example checks all columns and returns True when it finds at least a single NaN/None value.
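It helps to know exactly what isnull() treats as missing. The sketch below uses a small illustrative Series (not the article's data) to show that None and NaN are flagged, while an empty string and the literal text 'NaN' are not.

```python
import numpy as np
import pandas as pd

# Hypothetical values chosen only to illustrate the rule.
s = pd.Series(['Spark', None, np.nan, '', 'NaN'], dtype=object)

flags = s.isnull().tolist()
print(flags)  # [False, True, True, False, False]
```

If blank strings should count as missing in your data, they need to be converted first, for example with s.replace('', np.nan).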

3. Check for NaN Values on Selected Columns

If you want to check whether NaN values exist in selected columns (single or multiple), first select the columns and then run the same method.


# Checking on Single Column
value = df['Fee'].isnull().values.any()
print(value)
# Outputs: True

# Checking on Single Column
value = df['Courses'].isnull().values.any()
print(value)
# Outputs: False

# Checking on multiple columns
value = df[['Fee','Duration']].isnull().values.any()
print(value)
# Outputs: True

3. Using DataFrame.isnull() Method

DataFrame.isnull() checks each cell for a missing value: it returns True where it finds NaN/None and False otherwise. The example below applies it to a single column.


# Using DataFrame.isnull() method
df2 = df['Fee'].isnull()
print(df2)

This yields the output below.


0    False
1     True
2    False
3     True
4    False
Name: Fee, dtype: bool

4. Count the NaN Values on Single or Multiple DataFrame Columns

You can also count the NaN/None values present in the entire DataFrame, single or multiple columns.


# Count NaN per column on the entire DataFrame
result = df.isnull().sum()
print(result)
# Outputs
#Courses     0
#Fee         2
#Duration    2
#Discount    2
#dtype: int64

# Count NaN on a single column of the DataFrame
result = df['Fee'].isnull().sum()
print(result)
# Outputs
#2

# Count NaN on selected columns of the DataFrame
result = df[['Fee','Duration']].isnull().sum()
print(result)
# Outputs
#Fee         2
#Duration    2
#dtype: int64

Note that when you use sum() on multiple columns or the entire DataFrame, it returns the NaN count for each column.
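The per-column result can be turned into a list of the affected column names, and the same idea works for rows. A minimal sketch on the sample DataFrame (the names cols_with_nan and rows_with_nan are illustrative):

```python
import numpy as np
import pandas as pd

technologies = {
    'Courses': ["Spark", "Java", "Hadoop", "Python", "pandas"],
    'Fee': [20000, np.nan, 26000, np.nan, 24000],
    'Duration': ['30days', np.nan, '35days', '40days', np.nan],
    'Discount': [1000, np.nan, 2500, 2100, np.nan],
}
df = pd.DataFrame(technologies)

# Names of the columns that contain at least one NaN.
cols_with_nan = df.columns[df.isnull().any()].tolist()

# Index labels of the rows that contain at least one NaN.
rows_with_nan = df.index[df.isnull().any(axis=1)].tolist()

print(cols_with_nan)  # ['Fee', 'Duration', 'Discount']
print(rows_with_nan)  # [1, 3, 4]
```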

5. Total Count NaN Values on Entire DataFrame

To get the combined total count of NaN values, use isnull().sum().sum() on DataFrame. The below example returns the total count of NaN values from all columns.


# To get the Count
count = df.isnull().sum().sum()
print('Number of NaN values present:' + str(count))

This yields the output below.


Number of NaN values present:6

6. Complete Example For Check If any Value NaN

Below is the complete example of how to check if any value is NaN in pandas DataFrame.


import pandas as pd
import numpy as np
technologies = ({
     'Courses':["Spark","Java","Hadoop","Python","pandas"],
     'Fee' :[20000,np.nan,26000,np.nan,24000],
     'Duration':['30days',np.nan,'35days','40days',np.nan],
     'Discount':[1000,np.nan,2500,2100,np.nan]
               })
df = pd.DataFrame(technologies)
print(df)

# Checking NaN on entire DataFrame
value = df.isnull().values.any()
print(value)

# Checking on Single Column
value = df['Fee'].isnull().values.any()
print(value)

# Checking on Single Column
value = df['Courses'].isnull().values.any()
print(value)

# Checking on multiple columns
value = df[['Fee','Duration']].isnull().values.any()
print(value)

# Using DataFrame.isnull() method
df2 = df['Fee'].isnull()
print(df2)

# Count NaN per column on the entire DataFrame
result = df.isnull().sum()
print(result)

# Count NaN on a single column of the DataFrame
result = df['Fee'].isnull().sum()
print(result)

# Count NaN on selected columns of the DataFrame
result = df[['Fee','Duration']].isnull().sum()
print(result)

# To get the Count
count = df.isnull().sum().sum()
print('Number of NaN values present:' +str(count))

Conclusion

In this article, you have learned how to check if any value is NaN in the entire pandas DataFrame, or in a single column or multiple columns, using DataFrame.isnull().values.any() and DataFrame.isnull().sum(). You have also learned how to get the total count of NaN values using DataFrame.isnull().sum().sum().

Happy Learning !!

References

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.isnull.html

Source

In this article I would like to describe how to find NaN values in a pandas DataFrame. This kind of operation can be very useful, given that it is common to find datasets with missing or incorrect data values.

I will be using the numpy package to generate some data with NaN values.

Import necessary packages

import pandas as pd
import numpy as np
import platform

print(f'Python version: {platform.python_version()} ({platform.python_implementation()})')
print(f'Pandas version: {pd.__version__}')
print(f'Numpy version: {np.__version__}')

Python version: 3.6.4 (CPython)
Pandas version: 0.23.1
Numpy version: 1.14.5

Generate data with NaN values

num_nan = 25 # number of NaN values wanted in the generated data
np.random.seed(6765431)  # set a seed for reproducibility
A = np.random.randn(10, 10)
print(A)

[[-1.56132314 -0.16954058 -0.17845422 -1.33689111 -0.19185078 -1.18617765
   0.44499302 -0.61209568  0.31170935  1.4127548 ]
 [ 0.85330488  0.68517546 -1.10140989  0.84918019  0.72802961 -0.35161197
   0.73519152  1.13145412  0.53231247  0.78103143]
 [-0.81614324  0.15906898  0.49940119 -0.09319255 -1.07837721 -0.76053341
   0.73622083 -0.45518154 -0.69194032  1.02550409]
 [-1.96339975  0.07593331 -0.16798377 -1.20398958  0.88333656  1.17908422
   0.26324698 -2.65442248 -0.31583796 -0.16065732]
 [-1.24321376 -0.89816898  0.02824671  0.15304093  0.56505667 -0.78115883
   0.74504467  1.14025258 -0.04518221 -0.83908358]
 [ 1.00967019  0.84240102  1.15043436 -0.40120489  0.00664105 -1.23247563
   0.64738343  1.66096762 -0.92556683  0.47575796]
 [ 0.96516278  1.11158059 -0.82155143  0.88900313  2.16943761 -2.05250161
   2.40156233  0.92453867 -0.24437783 -2.91029265]
 [-0.86492662  0.82443151 -0.48246862 -1.05183143 -1.15272524 -0.77170733
   0.07177233  1.02820181 -2.08947076  0.89859677]
 [-0.07263982 -0.56840867  1.30910275 -0.52846822  0.06019191 -0.61000727
   0.40782356 -0.36124333 -1.54522486 -0.07891861]
 [-1.96361682 -1.06315325 -0.45582138 -0.74566868  1.27579529 -2.46306005
   0.57022673 -0.02793746  0.78652775  1.27690195]]

# Set random values to nan
A.ravel()[np.random.choice(A.size, num_nan, replace=False)] = np.nan
print(A)

[[-1.56132314 -0.16954058 -0.17845422 -1.33689111 -0.19185078 -1.18617765
          nan -0.61209568  0.31170935  1.4127548 ]
 [ 0.85330488  0.68517546         nan  0.84918019         nan -0.35161197
   0.73519152         nan  0.53231247  0.78103143]
 [-0.81614324  0.15906898  0.49940119         nan -1.07837721 -0.76053341
   0.73622083         nan -0.69194032  1.02550409]
 [-1.96339975  0.07593331         nan -1.20398958  0.88333656         nan
   0.26324698         nan -0.31583796 -0.16065732]
 [-1.24321376 -0.89816898  0.02824671  0.15304093  0.56505667 -0.78115883
   0.74504467  1.14025258 -0.04518221 -0.83908358]
 [ 1.00967019  0.84240102         nan -0.40120489  0.00664105         nan
   0.64738343  1.66096762 -0.92556683  0.47575796]
 [ 0.96516278         nan -0.82155143  0.88900313  2.16943761         nan
   2.40156233         nan -0.24437783         nan]
 [-0.86492662  0.82443151 -0.48246862 -1.05183143 -1.15272524 -0.77170733
   0.07177233  1.02820181 -2.08947076         nan]
 [-0.07263982         nan  1.30910275 -0.52846822  0.06019191 -0.61000727
   0.40782356 -0.36124333         nan         nan]
 [        nan         nan         nan         nan  1.27579529 -2.46306005
          nan         nan  0.78652775  1.27690195]]

# Create a DataFrame from the generated data
df = pd.DataFrame(A)
df

	0	1	2	3	4	5	6	7	8	9
0	-1.561323	-0.169541	-0.178454	-1.336891	-0.191851	-1.186178	NaN	-0.612096	0.311709	1.412755
1	0.853305	0.685175	NaN	0.849180	NaN	-0.351612	0.735192	NaN	0.532312	0.781031
2	-0.816143	0.159069	0.499401	NaN	-1.078377	-0.760533	0.736221	NaN	-0.691940	1.025504
3	-1.963400	0.075933	NaN	-1.203990	0.883337	NaN	0.263247	NaN	-0.315838	-0.160657
4	-1.243214	-0.898169	0.028247	0.153041	0.565057	-0.781159	0.745045	1.140253	-0.045182	-0.839084
5	1.009670	0.842401	NaN	-0.401205	0.006641	NaN	0.647383	1.660968	-0.925567	0.475758
6	0.965163	NaN	-0.821551	0.889003	2.169438	NaN	2.401562	NaN	-0.244378	NaN
7	-0.864927	0.824432	-0.482469	-1.051831	-1.152725	-0.771707	0.071772	1.028202	-2.089471	NaN
8	-0.072640	NaN	1.309103	-0.528468	0.060192	-0.610007	0.407824	-0.361243	NaN	NaN
9	NaN	NaN	NaN	NaN	1.275795	-2.463060	NaN	NaN	0.786528	1.276902

Check for NaN values

Now that we have some data to operate on let’s see the different ways we can check for missing values.

There are two methods of the DataFrame object that can be used: DataFrame.isna() and DataFrame.isnull(). If you check the source code, isnull() is simply an alias for isna(). To keep it simple I will only use the isna() method, as we would get the same result using isnull().

df.isna()

	0	1	2	3	4	5	6	7	8	9
0	False	False	False	False	False	False	True	False	False	False
1	False	False	True	False	True	False	False	True	False	False
2	False	False	False	True	False	False	False	True	False	False
3	False	False	True	False	False	True	False	True	False	False
4	False	False	False	False	False	False	False	False	False	False
5	False	False	True	False	False	True	False	False	False	False
6	False	True	False	False	False	True	False	True	False	True
7	False	False	False	False	False	False	False	False	False	True
8	False	True	False	False	False	False	False	False	True	True
9	True	True	True	True	False	False	True	True	False	False

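As can be seen above, the isna() method returns a DataFrame of boolean values, where True marks a NaN value and False marks everything else.

If we want to know how many missing values there are in each column or in each row, we can use the DataFrame.sum() method. With axis='rows' the sum runs down the rows and gives one count per column; with axis='columns' it runs across the columns and gives one count per row: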

df.isna().sum(axis='rows')  # 'rows' or 0

0    1
1    3
2    4
3    2
4    1
5    3
6    2
7    5
8    1
9    3
dtype: int64

df.isna().sum(axis='columns')  # 'columns' or 1

0    1
1    3
2    2
3    3
4    0
5    2
6    4
7    1
8    3
9    6
dtype: int64
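The axis argument is easy to mix up, so here is a small sketch on a deliberately non-square, made-up frame where the two results have different lengths:

```python
import numpy as np
import pandas as pd

# Hypothetical 3x2 frame: column 'a' has two NaN values, column 'b' has one.
df_small = pd.DataFrame({'a': [1.0, np.nan, np.nan],
                         'b': [4.0, 5.0, np.nan]})

# axis=0 ('rows'): sum down the rows, one count per column.
per_column = df_small.isna().sum(axis=0)

# axis=1 ('columns'): sum across the columns, one count per row.
per_row = df_small.isna().sum(axis=1)

print(per_column.to_dict())  # {'a': 2, 'b': 1}
print(per_row.tolist())      # [0, 1, 2]
```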

To simply know the total number of missing values we can call sum() again:

df.isna().sum().sum()

25

If we only want to know whether there is any missing value, regardless of how many, we can use the any() method:

df.isna().any()  # can also receive axis='rows' or 'columns'

0    True
1    True
2    True
3    True
4    True
5    True
6    True
7    True
8    True
9    True
dtype: bool

Calling it again we have a single boolean output:

df.isna().any().any()

True

Besides the isna() method we also have the notna() method which is its boolean inverse. Applying it we can get the number of values that are not missing or simply if all values are not missing (but using the all() method instead of any()).

print(df.notna().sum().sum())  # not missing
print(df.notna().all().all())

75
False

Note 1: the examples use the DataFrame methods to check for missing values, but the pandas package also has top-level functions with the same purpose that can be applied to other objects. Example:

print(pd.isna([1, 2, np.nan]))
print(pd.notna([1, 2, np.nan]))

[False False  True]
[ True  True False]

Note 2: the methods applied here on DataFrame objects are also available for Series and Index objects.
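A dedicated check such as pd.isna() is needed because NaN never compares equal to anything, including itself. The sketch below illustrates this with a scalar and a small made-up Series:

```python
import math

import numpy as np
import pandas as pd

x = float('nan')

print(x == x)         # False: NaN is not equal to itself
print(math.isnan(x))  # True
print(pd.isna(x))     # True
print(pd.isna(None))  # True: None is treated as missing too

s = pd.Series([1.0, np.nan])
naive_check = bool((s == np.nan).any())  # always False, the comparison never matches
proper_check = bool(s.isna().any())      # True

print(naive_check, proper_check)
```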

Time comparison

Comparing the time taken by the two methods, we can see that any() is faster, but sum() gives us the additional information of how many missing values there are.

%timeit df.isna().any().any()

333 µs ± 33.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit df.isna().sum().sum()

561 µs ± 97.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
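The %timeit magic only works in IPython or Jupyter. Outside a notebook, a comparable measurement can be sketched with the standard timeit module. The data below is regenerated with an arbitrary seed, so it is not the same array as above, and the numbers will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Assumption: any 10x10 frame with 25 NaN values is good enough for timing.
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 10))
A.ravel()[rng.choice(A.size, 25, replace=False)] = np.nan
df = pd.DataFrame(A)

candidates = {
    "isna().any().any()": lambda: df.isna().any().any(),
    "isna().values.any()": lambda: df.isna().values.any(),
    "isna().sum().sum()": lambda: df.isna().sum().sum(),
}

timings = {}
for name, func in candidates.items():
    # Best of 3 repeats, 200 calls each, converted to seconds per call.
    timings[name] = min(timeit.repeat(func, number=200, repeat=3)) / 200
    print(f"{name}: {timings[name] * 1e6:.1f} µs per call")
```

On a frame this small the differences are tiny; the comparison becomes more meaningful on larger data.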

Dealing with missing values

Two easy ways to deal with missing values are removing them or filling them with some value. These can be achieved with the dropna() and fillna() methods.

By default, the dropna() method returns a DataFrame without the rows that contain any missing value. Passing axis='columns' drops the columns with missing values instead.

df.dropna()

	0	1	2	3	4	5	6	7	8	9
4	-1.243214	-0.898169	0.028247	0.153041	0.565057	-0.781159	0.745045	1.140253	-0.045182	-0.839084

The fillna() method will return a DataFrame with the missing values filled with a specified value.

df.fillna(value=5)

	0	1	2	3	4	5	6	7	8	9
0	-1.561323	-0.169541	-0.178454	-1.336891	-0.191851	-1.186178	5.000000	-0.612096	0.311709	1.412755
1	0.853305	0.685175	5.000000	0.849180	5.000000	-0.351612	0.735192	5.000000	0.532312	0.781031
2	-0.816143	0.159069	0.499401	5.000000	-1.078377	-0.760533	0.736221	5.000000	-0.691940	1.025504
3	-1.963400	0.075933	5.000000	-1.203990	0.883337	5.000000	0.263247	5.000000	-0.315838	-0.160657
4	-1.243214	-0.898169	0.028247	0.153041	0.565057	-0.781159	0.745045	1.140253	-0.045182	-0.839084
5	1.009670	0.842401	5.000000	-0.401205	0.006641	5.000000	0.647383	1.660968	-0.925567	0.475758
6	0.965163	5.000000	-0.821551	0.889003	2.169438	5.000000	2.401562	5.000000	-0.244378	5.000000
7	-0.864927	0.824432	-0.482469	-1.051831	-1.152725	-0.771707	0.071772	1.028202	-2.089471	5.000000
8	-0.072640	5.000000	1.309103	-0.528468	0.060192	-0.610007	0.407824	-0.361243	5.000000	5.000000
9	5.000000	5.000000	5.000000	5.000000	1.275795	-2.463060	5.000000	5.000000	0.786528	1.276902
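Both methods take parameters that give finer control than the defaults. The sketch below uses a small made-up frame to show a few of them (axis, how and thresh for dropna(), and a per-column dictionary for fillna()); the frame and the fill values are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 'a' has one NaN, 'b' has two, 'c' has none.
df_small = pd.DataFrame({'a': [1.0, np.nan, 3.0],
                         'b': [np.nan, np.nan, 6.0],
                         'c': [7.0, 8.0, 9.0]})

only_complete_rows = df_small.dropna()                # keep rows without any NaN
only_complete_cols = df_small.dropna(axis='columns')  # keep columns without any NaN
not_all_nan_rows = df_small.dropna(how='all')         # drop rows that are entirely NaN
at_least_two = df_small.dropna(thresh=2)              # keep rows with >= 2 non-missing values

# Fill each column with its own value: 0 for 'a', the column mean for 'b'.
filled = df_small.fillna({'a': 0.0, 'b': df_small['b'].mean()})

print(only_complete_rows)
print(only_complete_cols)
print(at_least_two)
print(filled)
```

None of these calls modifies df_small; each returns a new DataFrame.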

Source
