
Display full content of a column

pd.set_option("display.max_colwidth", None)

Display full array

np.set_printoptions(threshold=np.inf)

Set display options back to their defaults

pd.reset_option("all")
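
As an alternative to setting and then resetting options globally, both libraries offer context managers that restore the previous settings on exit. The sketch below is only an illustration: the sample DataFrame and the variable names are made up, and `sys.maxsize` is used for the NumPy threshold instead of `np.inf`.

```python
import sys

import numpy as np
import pandas as pd

# Hypothetical sample data: one very long text cell.
long_text = "x" * 200
df = pd.DataFrame({"renderedContent": [long_text]})

# Inside the block the full cell is shown; outside, the default truncation applies.
with pd.option_context("display.max_colwidth", None):
    full_repr = repr(df)
truncated_repr = repr(df)

# Same idea for NumPy: print every element only inside the block.
big = np.arange(2000)
with np.printoptions(threshold=sys.maxsize):
    full_array = str(big)
short_array = str(big)
```

Because the settings are scoped to the `with` block, no reset call is needed afterwards.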

Select only particular columns

T1 = T1[['likeCount', 'retweetCount', 'renderedContent', 'hashtags', 'date']]

Drop multiple columns at a time

T1 = T1.drop(['a', 'b', 'c'], axis=1)
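
A minimal sketch of both operations on a small made-up DataFrame (the values and the `url` column are invented for illustration). `drop(columns=[...])` is equivalent to `drop([...], axis=1)`.

```python
import pandas as pd

# Hypothetical sample data.
T1 = pd.DataFrame({
    "likeCount": [10, 3],
    "retweetCount": [2, 0],
    "renderedContent": ["hello", "world"],
    "url": ["u1", "u2"],
})

# Keep only the listed columns (note the double brackets: a list inside []).
selected = T1[["likeCount", "renderedContent"]]

# Drop the listed columns; drop() is a method, so it takes parentheses.
dropped = T1.drop(columns=["retweetCount", "url"])
```

Neither call modifies `T1` itself; assign the result back if the change should persist.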

Error: unhashable Series error

Check the code for misplaced square brackets []

Convert list to string

s = T1['hashtags']
listToStr = ' '.join(map(str, s))

Divide single column into multiple

T1['hashtags'] = T1['hashtags'].astype(str)
New = T1['hashtags'].str.split(',', expand=True)
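
To show what `expand=True` produces, here is a small sketch with made-up hashtag strings. Each piece lands in its own integer-labelled column, and shorter rows are padded with missing values.

```python
import pandas as pd

# Hypothetical sample data: comma-separated hashtags stored as strings.
T1 = pd.DataFrame({"hashtags": ["nft,meta,web3", "zuck,meta"]})

# One column per piece; columns are labelled 0, 1, 2, ...
New = T1["hashtags"].str.split(",", expand=True)

first_tag = New[0].tolist()
```

The integer column labels are why a later rename uses `0` rather than `'0'`.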

Replace [ and ' with empty strings to work further

listToStr = listToStr.replace("'", "")
listToStr = listToStr.replace("[", "")

Change a list-type column to a string-type column

T1['hashtagsNew'] = T1['hashtags'].apply(lambda x: ','.join(map(str, x)))
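
A sketch of joining each row's list into one comma-separated string, using made-up data. The guard for non-list cells (for example `None` when a row has no hashtags) is an assumption about the data, not something the notes above state.

```python
import pandas as pd

# Hypothetical sample data: each cell holds a list, an empty list, or None.
T1 = pd.DataFrame({"hashtags": [["nft", "meta"], ["web3"], [], None]})

def join_tags(tags):
    # Join list items with commas; treat anything that is not a list as empty.
    if isinstance(tags, list):
        return ",".join(map(str, tags))
    return ""

T1["hashtagsNew"] = T1["hashtags"].apply(join_tags)
```

Joining the items directly avoids the brackets and quotes that `astype(str)` leaves behind.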

Replace a character in a string-type column

T1['hashtagsNew'] = T1['hashtagsNew'].str.replace("'", "", regex=False)
T1['hashtagsNew'] = T1['hashtagsNew'].str.replace("[", "", regex=False)

Convert a list-type column to string type

T1['hashtags'] = T1['hashtags'].astype(str)

Now replace characters

T1['hashtags'] = T1['hashtags'].str.replace("'", "", regex=False)
T1['hashtags'] = T1['hashtags'].str.replace("[", "", regex=False)
T1['hashtags'] = T1['hashtags'].str.replace("]", "", regex=False)
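
The convert-then-clean steps can be sketched end to end on made-up data. The one-pass regular expression at the end is an alternative added here for comparison, not part of the original notes.

```python
import pandas as pd

# Hypothetical sample data: each cell holds a Python list.
T1 = pd.DataFrame({"hashtags": [["nft", "meta"], ["web3"]]})

# astype(str) gives the list's printed form, e.g. "['nft', 'meta']".
as_text = T1["hashtags"].astype(str)

# Literal replacements, one character at a time (regex=False treats "[" literally).
step_by_step = (
    as_text.str.replace("'", "", regex=False)
    .str.replace("[", "", regex=False)
    .str.replace("]", "", regex=False)
)

# Alternative: strip all three characters with a single character class.
one_pass = as_text.str.replace(r"[\[\]']", "", regex=True)
```

Passing `regex=False` explicitly matters because `[` on its own is not a valid regular expression.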

Bar chart

import matplotlib.pyplot as mlt
mlt.figure(figsize=(9, 6))
mlt.bar(x=T1['sentiment'], height=T1['likeCount'])
mlt.xticks(rotation=45)

Rename a column

rankings_pd.rename(columns={'test': 'TEST'}, inplace=True)
If a column label is a number, do not put it in quotes. Note that inplace=False returns a renamed copy, so assign the result:
New = New.rename(columns={0: "NT"}, inplace=False)
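
A short sketch of both rename styles on made-up DataFrames, showing the difference between renaming in place and getting a renamed copy.

```python
import pandas as pd

# Hypothetical sample data with default integer column labels 0 and 1.
New = pd.DataFrame([["nft", "meta"], ["zuck", "web3"]])

# Integer labels are matched as integers, so the key is 0, not "0".
renamed = New.rename(columns={0: "NT"})  # returns a copy; New is unchanged

# String labels, modified in place.
rankings_pd = pd.DataFrame({"test": [1, 2]})
rankings_pd.rename(columns={"test": "TEST"}, inplace=True)
```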

Print unique values

df.B.unique() (B is a column name)

Convert to lower case

New['NT'] = New['NT'].str.lower()

Convert a Series object to a DataFrame

Test1 = pd.DataFrame({'Values': Test1.index, 'Frequency': Test1.values}) (the index of the Series becomes the first column and its values become the second)
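
The three steps above (unique values, lower-casing, Series to DataFrame) can be sketched together. This assumes the Series being converted comes from `value_counts()`, which the notes do not state explicitly; the sample tags are made up.

```python
import pandas as pd

# Hypothetical sample data with mixed case.
tags = pd.Series(["NFT", "nft", "Meta", "nft", "meta", "web3"]).str.lower()

# Distinct values, in order of first appearance.
unique_tags = tags.unique()

# Frequency of each value (a Series indexed by the value), most frequent first.
counts = tags.value_counts()

# Index becomes the first column, counts become the second.
Test1 = pd.DataFrame({"Values": counts.index, "Frequency": counts.values})
```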

Pie chart

mlt.pie(Test1['Frequency']) (Frequency is a column here)

Search string with some common format

Test1[Test1['Values'].str.contains(r'mark(?!$)')] (Test1 is the DataFrame, Values is the column, and mark is the search word; the pattern matches mark only when it is not at the very end of the string)

Search string with common format, skipping NaN values

Test1[Test1['NT1'].str.contains("nft", na=False)]
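
A small sketch of how the `mark(?!$)` pattern and `na=False` behave, using made-up values. The negative lookahead `(?!$)` rejects a match where `mark` is the last thing in the string.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, including one missing value.
Test1 = pd.DataFrame(
    {"Values": ["marketing", "bookmark", "mark zuck", np.nan, "nft"]}
)

# na=False makes missing values count as "no match" instead of propagating NaN.
mask = Test1["Values"].str.contains(r"mark(?!$)", na=False)
matches = Test1[mask]
```

Here `bookmark` is excluded because `mark` sits at the end of the string, while `marketing` and `mark zuck` are kept.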

Drop the index in a DataFrame

Test3.reset_index(drop=True, inplace=True) (Test3 is a DataFrame here)

Check for any NaN values in a column

Test3['NT1 Freq'].isnull().values.any() returns True if the column contains any NaN.
Test3['NT1 Freq'].isnull() returns True or False for each row, which shows the row numbers of the NaN values.

Drop NaN values before plotting

Test3 = Test3.dropna()
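
The NaN checks, the drop, and the index reset fit together as sketched below on made-up data. Restricting `dropna` to one column with `subset` is an optional refinement, not something the notes require.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing frequency.
Test3 = pd.DataFrame(
    {"NT1": ["nft", "meta", "zuck"], "NT1 Freq": [3.0, np.nan, 1.0]}
)

# True if at least one value in the column is NaN.
has_nan = Test3["NT1 Freq"].isnull().values.any()

# Index labels of the rows where the column is NaN.
nan_rows = Test3.index[Test3["NT1 Freq"].isnull()].tolist()

# Drop rows with NaN in that column, then renumber the index from 0.
cleaned = Test3.dropna(subset=["NT1 Freq"]).reset_index(drop=True)
```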

Sorting by a column

Test3.sort_values(by=['NT1 Freq']) (returns a sorted copy; assign it back to keep the result)

Changing an object column into float

Test3['NT1 Freq'] = Test3['NT1 Freq'].astype(float)

Listing the highest values in a column

Test3.nlargest(20, ['NT1 Freq'])

Listing the smallest values in a column

Test3.nsmallest(20, ['NT1 Freq'])
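
A sketch tying the last four snippets together on made-up data: the frequency column starts out as strings, so it is converted to float before sorting or ranking, since `nlargest` and `nsmallest` expect a numeric column.

```python
import pandas as pd

# Hypothetical sample data: frequencies stored as strings (object dtype).
Test3 = pd.DataFrame(
    {"NT1": ["nft", "meta", "zuck", "web3"], "NT1 Freq": ["3", "10", "2.5", "7"]}
)

# Convert to float so comparisons are numeric rather than alphabetical.
Test3["NT1 Freq"] = Test3["NT1 Freq"].astype(float)

sorted_desc = Test3.sort_values(by="NT1 Freq", ascending=False)
top2 = Test3.nlargest(2, "NT1 Freq")
bottom2 = Test3.nsmallest(2, "NT1 Freq")
```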

Extracting one string, counting its occurrences in each column, and adding them up

New[New['NT'].str.contains(r'zuck(?!$)')].count().sum()
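
It may help to see what `count().sum()` actually adds up: `count()` gives the number of non-null cells per column among the matching rows, and `sum()` totals those across columns. The sketch below uses made-up data and contrasts that with counting matching rows.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; likeCount has one missing value.
New = pd.DataFrame(
    {
        "NT": ["zuckerberg", "zuck", "nft", "zuckbucks"],
        "likeCount": [5, 1, 2, np.nan],
    }
)

# "zuck" followed by at least one more character.
mask = New["NT"].str.contains(r"zuck(?!$)", na=False)

# Non-null cells over all columns of the matching rows.
cells = New[mask].count().sum()

# Number of matching rows.
rows = mask.sum()
```

With two matching rows and one missing `likeCount`, `cells` is 3 while `rows` is 2, so use `mask.sum()` when the number of matching rows is what is wanted.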

Group by

title_type = Test1.groupby('NT1').agg('count')
print(title_type)

Output (the index is NT1; each column holds the count of non-null values for that group)
1milliondancestudio 1 1 1
3d 2 2 2
3dclothing 1 1 1
3dweb 1 1 1
5g 1 1 1
… .. .. ..
zepetox1m 2 2 2
zucc 1 1 1
zuck 1 1 1
zuckerberg 3 3 3
فورتنايت 1 1 1
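
A sketch of the same group-by on made-up data. `agg('count')` repeats the count once per remaining column, which is why the output above shows the same number several times per row; `size()` is shown as an alternative that returns a single count per group.

```python
import pandas as pd

# Hypothetical sample data; the column names other than NT1 are invented.
Test1 = pd.DataFrame(
    {
        "NT1": ["zuck", "3d", "3d", "zuckerberg", "zuckerberg", "zuckerberg"],
        "likeCount": [1, 2, 3, 4, 5, 6],
        "retweetCount": [0, 1, 0, 2, 1, 0],
    }
)

# One row per NT1 value, one non-null count per remaining column.
title_type = Test1.groupby("NT1").agg("count")

# One number per group: the row count, including rows with missing values.
sizes = Test1.groupby("NT1").size()
```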

