Display full content of a column
pd.set_option("display.max_colwidth", None)  # None means no limit; older pandas used -1 for this
Display full array
np.set_printoptions(threshold=np.inf)
Reset the pandas display options to their defaults
pd.reset_option("all")
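The display settings above can be tried out with a minimal sketch like the one below. The frame `demo` and the array `arr` are made up for illustration. `pd.option_context` and `np.printoptions` are the context-manager forms of the same settings; they revert automatically when the block ends, which avoids a global reset.

```python
import sys

import numpy as np
import pandas as pd

# Hypothetical one-row frame with a long text value.
demo = pd.DataFrame({"text": ["x" * 120]})

default_width = pd.get_option("display.max_colwidth")

# None removes the truncation limit for cell contents.
pd.set_option("display.max_colwidth", None)
full_repr = repr(demo)

# Restore only this one option instead of resetting everything.
pd.reset_option("display.max_colwidth")
short_repr = repr(demo)

# Scoped alternative: the option applies only inside the with-block.
with pd.option_context("display.max_colwidth", None):
    scoped_repr = repr(demo)

# Hypothetical array that is long enough to be summarized with "...".
arr = np.arange(2000)
summarized_arr = repr(arr)
with np.printoptions(threshold=sys.maxsize):
    full_arr = repr(arr)
```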
Select only particular columns
T1 = T1[['likeCount', 'retweetCount', 'renderedContent', 'hashtags', 'date']]
Drop multiple columns at a time
T1 = T1.drop(['a', 'b', 'c'], axis=1)
Error: unhashable type: 'Series'
Check the [] in the code; a misplaced or missing bracket can cause this error.
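The column selection and drop snippets above can be checked on a small frame. This is a sketch only: the data and the extra `url` column are made up, and `T1` here is a stand-in for the real tweet table.

```python
import pandas as pd

# Made-up stand-in for the tweet table T1.
T1 = pd.DataFrame({
    "likeCount": [10, 3],
    "retweetCount": [2, 0],
    "renderedContent": ["hello", "world"],
    "hashtags": [["nft", "meta"], ["3d"]],
    "date": ["2022-01-01", "2022-01-02"],
    "url": ["u1", "u2"],
})

# Double brackets: the outer pair indexes, the inner pair is a list of names.
subset = T1[["likeCount", "retweetCount", "renderedContent"]]

# drop is a method, so it takes parentheses; the list of names goes inside.
trimmed = T1.drop(["url", "date"], axis=1)

# Equivalent spelling with the columns keyword.
trimmed_kw = T1.drop(columns=["url", "date"])
```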
Convert list to string
s = T1['hashtags']
listToStr = ' '.join(map(str, s))
Split a single column into multiple columns
T1['hashtags'] = T1['hashtags'].astype(str)
New = T1['hashtags'].str.split(',', expand=True)
Replace [ and ' with empty strings before working further
listToStr = listToStr.replace("'", "")
listToStr = listToStr.replace("[", "")
Change a list-type column to a string-type column
T1['hashtagsNew'] = T1['hashtags'].apply(lambda x: ','.join(map(str, x)))
Replace a character in a string-type column
T1['hashtagsNew'] = T1['hashtagsNew'].str.replace("'", "", regex=False)
T1['hashtagsNew'] = T1['hashtagsNew'].str.replace("[", "", regex=False)
Convert a list-type column to string type
T1['hashtags'] = T1['hashtags'].astype(str)
Now replace characters
T1['hashtags'] = T1['hashtags'].str.replace("'", "", regex=False)
T1['hashtags'] = T1['hashtags'].str.replace("[", "", regex=False)
T1['hashtags'] = T1['hashtags'].str.replace("]", "", regex=False)
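The list-to-string steps above can be compared side by side on made-up data. The sketch assumes each cell of `hashtags` holds a Python list. Joining the items directly avoids the bracket and quote clean-up altogether, while the `astype(str)` route needs the replacements.

```python
import pandas as pd

# Made-up data: each cell holds a Python list of hashtags.
T1 = pd.DataFrame({"hashtags": [["nft", "meta"], ["3d"], []]})

# Option 1: join the list items directly; nothing to clean up afterwards.
T1["hashtagsNew"] = T1["hashtags"].apply(lambda tags: ",".join(map(str, tags)))

# Option 2: stringify each list, then strip the list punctuation.
# regex=False treats "[" and "]" as literal characters, not as a pattern.
as_text = T1["hashtags"].astype(str)
for ch in ["[", "]", "'"]:
    as_text = as_text.str.replace(ch, "", regex=False)

# Split the joined string into one column per hashtag.
New = T1["hashtagsNew"].str.split(",", expand=True)
```

Option 2 leaves a space after each comma because that is how Python prints a list, so the two results differ slightly.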
Bar chart
import matplotlib.pyplot as mlt  # assuming mlt is an alias for matplotlib.pyplot
mlt.figure(figsize=(9, 6))
mlt.bar(x=T1['sentiment'], height=T1['likeCount'])
mlt.xticks(rotation=45)
Rename a column
rankings_pd.rename(columns={'test': 'TEST'}, inplace=True)
If the column label is a number, do not put it in quotes:
New.rename(columns={0: 'NT'}, inplace=False)
(With inplace=False the renamed frame is returned, so assign the result to keep it.)
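A small sketch of both rename cases above, using a made-up frame `df` that has one string label and one integer label:

```python
import pandas as pd

# Made-up frame with a string column label and an integer column label.
df = pd.DataFrame({"test": [1, 2], 0: ["a", "b"]})

# String labels are quoted; the integer label 0 is not.
renamed = df.rename(columns={"test": "TEST", 0: "NT"})

# rename returns a new frame by default, so df itself is unchanged here.
original_labels = list(df.columns)
```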
### Print unique values
df.B.unique()
(B is a column name.)
### Convert to lower case
New['NT'] = New['NT'].str.lower()
### Convert a Series object to a DataFrame
Test1 = pd.DataFrame({'Values': Test1.index, 'Frequency': Test1.values})
(The index becomes the first column and the values become the second column.)
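As an illustration of the Series-to-DataFrame conversion above, the sketch below assumes the Series comes from `value_counts()`; the notes do not say where it came from, so that part is a guess. The `tags` data is made up.

```python
import pandas as pd

# Made-up column of hashtags.
tags = pd.Series(["nft", "meta", "nft", "3d", "nft"])

# value_counts returns a Series: the index holds the values, the data holds the counts.
counts = tags.value_counts()

# Build the frame explicitly from the index and the values.
Test1 = pd.DataFrame({"Values": counts.index, "Frequency": counts.values})

# Similar result: name the index, then move it into a regular column.
alt = counts.rename_axis("Values").reset_index(name="Frequency")
```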
### Pie chart
mlt.pie(Test1['Frequency'])
(Frequency is a column of Test1 here.)
### Search for strings matching a common pattern
Test1[Test1['Values'].str.contains(r'mark(?!$)')]
(Test1 is the DataFrame, Values is the column, and mark is the search word. The (?!$) part requires at least one more character after mark.)
Search for strings matching a common pattern, ignoring NaN values
Test1[Test1['NT1'].str.contains('nft', na=False)]
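The two search snippets above can be tried on made-up data. With `na=False`, missing values count as "no match", so no separate `dropna` or `notnull` step is needed. The column contents below are invented for illustration.

```python
import numpy as np
import pandas as pd

# Made-up column with one missing value.
Test1 = pd.DataFrame({"NT1": ["nftart", "mark", "marketing", np.nan, "metaverse"]})

# Plain substring search; NaN rows are treated as non-matching.
has_nft = Test1[Test1["NT1"].str.contains("nft", na=False)]

# (?!$) is a negative lookahead: "mark" must be followed by at least one character,
# so the bare word "mark" is excluded while "marketing" is kept.
mark_prefix = Test1[Test1["NT1"].str.contains(r"mark(?!$)", na=False)]
```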
Reset the index of a DataFrame and discard the old one
Test3.reset_index(drop=True, inplace=True)
(Test3 is a DataFrame here.)
Check whether a column has any NaN values
Test3['NT1 Freq'].isnull().values.any()
(Returns True if any value is NaN.)
Test3['NT1 Freq'].isnull()
(Returns True or False for each row, alongside the row index.)
Drop NaN values before plotting
Test3 = Test3.dropna()
Sorting by a column
Test3.sort_values(by=['NT1 Freq'])
Changing an object column to float
Test3['NT1 Freq'] = Test3['NT1 Freq'].astype(float)
Listing the highest values in a column
Test3.nlargest(20, ['NT1 Freq'])
Listing the smallest values in a column
Test3.nsmallest(20, ['NT1 Freq'])
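The NaN check, clean-up, type conversion, sorting, and top/bottom steps above chain together as in this sketch. The frame is made up, with the frequencies stored as strings to mimic an object column.

```python
import numpy as np
import pandas as pd

# Made-up frequencies stored as strings, with one missing value.
Test3 = pd.DataFrame({
    "NT1": ["nft", "meta", "3d", "zuck"],
    "NT1 Freq": ["12", "7", np.nan, "30"],
})

# True if at least one value in the column is NaN.
has_nan = Test3["NT1 Freq"].isnull().values.any()

# Drop incomplete rows, renumber the index, and convert the text to numbers.
Test3 = Test3.dropna().reset_index(drop=True)
Test3["NT1 Freq"] = Test3["NT1 Freq"].astype(float)

# sort_values returns a sorted copy; Test3 keeps its original order.
ordered = Test3.sort_values(by=["NT1 Freq"])

# The n rows with the largest and smallest values in the column.
top2 = Test3.nlargest(2, ["NT1 Freq"])
bottom2 = Test3.nsmallest(2, ["NT1 Freq"])
```

Converting to float before sorting matters: as strings, "12", "30", and "7" would sort alphabetically rather than numerically.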
Extracting one string, counting its occurrences in each column, and adding them up
New[New['NT'].str.contains(r'zuck(?!$)')].count().sum()
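To see what the counting expression above returns, here is a sketch on a made-up two-column frame. The second column `NT2` is hypothetical. The original expression counts every non-null cell in the rows where `NT` matches. A per-column match count is shown next to it for comparison, in case that is what is wanted.

```python
import pandas as pd

# Made-up frame; NT2 is a hypothetical second column.
New = pd.DataFrame({
    "NT": ["zuckerberg", "nft", "zuckbucks"],
    "NT2": ["meta", "zucc", "3d"],
})

pattern = r"zuck(?!$)"

# Original expression: keep rows whose NT matches, then count all
# non-null cells in those rows and add the column counts together.
cells_in_matching_rows = New[New["NT"].str.contains(pattern, na=False)].count().sum()

# Alternative: count the matches inside each column, then add them up.
per_column = New.apply(lambda col: col.str.contains(pattern, na=False).sum())
total_matches = per_column.sum()
```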
Group by
title_type = Test1.groupby('NT1').agg('count')
print(title_type)
Output (indexed by NT1):
1milliondancestudio 1 1 1
3d 2 2 2
3dclothing 1 1 1
3dweb 1 1 1
5g 1 1 1
… .. .. ..
zepetox1m 2 2 2
zucc 1 1 1
zuck 1 1 1
zuckerberg 3 3 3
فورتنايت 1 1 1
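The shape of the output above, with the same count repeated across columns, follows from how `agg('count')` works: it reports the number of non-null values per group for every remaining column. A small sketch on made-up data (the `likeCount` and `date` columns are stand-ins):

```python
import pandas as pd

# Made-up frame; one date is missing on purpose.
Test1 = pd.DataFrame({
    "NT1": ["nft", "3d", "nft", "zuck", "3d"],
    "likeCount": [1, 2, 3, 4, 5],
    "date": ["d1", "d2", "d3", "d4", None],
})

# One row per NT1 value, one non-null count per remaining column.
title_type = Test1.groupby("NT1").agg("count")

# size() gives a single count of rows per group, missing values included.
sizes = Test1.groupby("NT1").size()
```

When only the number of rows per group is needed, `size()` avoids the repeated columns.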