Kaggle - Titanic -1
Continued from the previous post: Kaggle
Ref: http://www.jianshu.com/p/32def2294ae6
Continuing to look into each column
1. Analyzing each column
info()
Data columns (total 12 columns):
PassengerId 891 non-null int64
Survived 891 non-null int64
Pclass 891 non-null int64
Name 891 non-null object
Sex 891 non-null object
Age 714 non-null float64
SibSp 891 non-null int64
Parch 891 non-null int64
Ticket 891 non-null object
Fare 891 non-null float64
Cabin 204 non-null object
Embarked 889 non-null object
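The summary above is what pandas prints for `DataFrame.info()`. As a minimal sketch of how to reproduce it and count the missing values per column, the snippet below uses three illustrative rows in the train.csv layout as a stand-in (an assumption, so that it runs without the file); on the real data you would call `pd.read_csv("train.csv")` instead.

```python
import io

import pandas as pd

# Illustrative rows in the train.csv layout (a stand-in, not the full data set)
csv_text = """PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
"""

df = pd.read_csv(io.StringIO(csv_text))

# Prints column names, non-null counts and dtypes, as in the listing above
df.info()

# Number of missing values per column; Age and Cabin stand out
missing = df.isnull().sum()
print(missing[missing > 0])
```

On the full training set, the non-null counts show that Age, Cabin and Embarked are the only columns with gaps.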
--
PassengerId: an int, equal to the row index + 1
--
Survived : 0 / 1
--
Pclass : 1 - first class, 2 - second class, 3 - third class
--
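Name : the passenger's name; I keep only the last name (the part before the comma)
Note that feeding strings into a random forest classifier raises a type error, so they have to be converted first.
Options:
- LabelEncoder : maps each distinct string to an integer label (0, 1, 2, ...)
- OneHotEncoder : uses one-of-K (one-hot) encoding to turn each category into its own 0/1 indicator column
I'll try LabelEncoder first.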
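A minimal sketch of the Name handling described above, assuming scikit-learn's `LabelEncoder` and a few made-up sample names (the variable names are illustrative, not taken from the original code). `pd.get_dummies` is shown only as a quick stand-in for the one-hot alternative.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Sample names in the "Last, Title First" format used by the Name column
names = pd.Series([
    "Braund, Mr. Owen Harris",
    "Cumings, Mrs. John Bradley",
    "Braund, Mr. Lewis Richard",
])

# Keep only the part before the comma, i.e. the last name
last_names = names.str.split(",").str[0].str.strip()

# LabelEncoder assigns one integer per distinct string (classes are sorted)
encoder = LabelEncoder()
name_codes = encoder.fit_transform(last_names)

# One-hot alternative: one 0/1 indicator column per distinct last name
one_hot = pd.get_dummies(last_names)

print(list(encoder.classes_), list(name_codes))
print(one_hot.shape)
```

The integer codes carry an arbitrary ordering, which tree-based models usually tolerate; one-hot encoding avoids that ordering but adds one column per distinct value.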
--
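Age :
I tried using a random forest to predict the missing Age values, but the accuracy was poor (< 0.1),
so for now I fall back to filling with the mean, as in the original example.
Later, Age will probably need a good correlation plot to find features that relate to it.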
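The mean fallback for Age can be sketched as follows, assuming pandas and a small made-up column with one missing value.

```python
import numpy as np
import pandas as pd

# Made-up ages with one missing entry, standing in for the Age column
df = pd.DataFrame({"Age": [22.0, 38.0, np.nan, 30.0]})

# Series.mean() skips NaN by default, so only the known ages are averaged
mean_age = df["Age"].mean()

# Replace the missing entries with that mean
df["Age"] = df["Age"].fillna(mean_age)

print(mean_age)
print(df["Age"].tolist())
```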
--
Ticket : encoded with LabelEncoder as well
--
Cabin : lots of NaN; maybe I'll find a better way to handle it in the next version
Embarked :
Both of these columns are dropped for now.
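Dropping Cabin and Embarked before training might look like the sketch below. The four rows and the `n_estimators` / `random_state` settings are made up for illustration and are not the values used in the original code.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Made-up rows with a few of the train.csv columns
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 3, 1],
    "Fare": [7.25, 71.28, 7.92, 53.10],
    "Cabin": [None, "C85", None, "C123"],
    "Embarked": ["S", "C", "S", None],
})

# Drop the two columns that are skipped for now, plus the target itself
features = df.drop(columns=["Survived", "Cabin", "Embarked"])
target = df["Survived"]

# Only numeric, non-missing columns remain, so the classifier accepts them
model = RandomForestClassifier(n_estimators=10, random_state=0)
model.fit(features, target)
predictions = model.predict(features)

print(list(features.columns))
```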
--
Code : github
---
Result : the score went up by 0.01
Todo:
https://www.kaggle.com/c/titanic/details/further-reading-watching
Further Reading / Watching
Want more? Here are some links to some tutorial pages that might interest you!
- triangleinequality's Complete Guide to improving your score with a great section on Feature Engineering
- agconti's iPython Tutorial which touches upon other machine learning methods to try
- Getting Started with Pandas exercise, more advanced one that uses a completely different dataset
- Getting Into Shape for Competitive Data Science (Video)
- Kaggle wiki of Data Science Tutorials