「 Machine Learning 」
August 20, 2018
Word count: 8.9k
Reading time: 8 min
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(666)
X = np.random.normal(0,1,size=(200,2))
y = np.array(X[:,0]**2 + X[:,1] < 1.5,dtype='int')
for _ in range(20):
    # set 20 randomly chosen labels to 1 to simulate label noise
    y[np.random.randint(200)] = 1
plt.scatter(X[y==0,0],X[...
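One common way to fit the curved, noisy boundary generated above is to expand the two features into degree-2 polynomial terms and train a logistic regression with a small L2 penalty. The sketch below is plain NumPy with batch gradient descent. The helpers `poly2_features` and `fit_logistic_l2` are hypothetical, and the values of `alpha`, `eta` and `n_iters` are illustrative assumptions, not the article's code.

```python
import numpy as np

# Recreate the noisy dataset from the excerpt above.
np.random.seed(666)
X = np.random.normal(0, 1, size=(200, 2))
y = np.array(X[:, 0]**2 + X[:, 1] < 1.5, dtype='int')
for _ in range(20):
    y[np.random.randint(200)] = 1  # label noise

def poly2_features(X):
    """Hypothetical helper: degree-2 expansion [x0, x1, x0^2, x0*x1, x1^2]."""
    x0, x1 = X[:, 0], X[:, 1]
    return np.column_stack([x0, x1, x0**2, x0 * x1, x1**2])

def sigmoid(t):
    return 1 / (1 + np.exp(-t))

def fit_logistic_l2(X, y, alpha=0.01, eta=0.1, n_iters=10000):
    """Batch gradient descent on mean log-loss + (alpha/2)*||w||^2.
    The intercept theta[0] is not penalized (assumed design choice)."""
    X_b = np.hstack([np.ones((len(X), 1)), X])
    theta = np.zeros(X_b.shape[1])
    for _ in range(n_iters):
        p = sigmoid(X_b @ theta)
        grad = X_b.T @ (p - y) / len(y)
        grad[1:] += alpha * theta[1:]
        theta -= eta * grad
    return theta

F = poly2_features(X)
F_std = (F - F.mean(axis=0)) / F.std(axis=0)  # one learning rate for all columns
theta = fit_logistic_l2(F_std, y)
X_b = np.hstack([np.ones((len(F_std), 1)), F_std])
y_pred = (sigmoid(X_b @ theta) >= 0.5).astype(int)
train_acc = np.mean(y_pred == y)
print(train_acc)
```

The x0² term lets a linear model in the expanded space represent the parabola-shaped boundary, and the L2 term keeps the weights from chasing the flipped labels.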
「 Machine Learning 」
August 20, 2018
Word count: 5k
Reading time: 5 min
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(666)
X = np.random.normal(0,1,size=(200,2))
y = np.array(X[:,0]**2+X[:,1]**2 <1.5,dtype='int')
plt.scatter(X[y==0,0],X[y==0,1])
plt.scatter(X[y==1,0],X[y==1,1])
plt.show()...
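The labels in this excerpt are defined by a circle, which no straight line in (x0, x1) can separate. After adding the squared features z1 = x0² and z2 = x1², however, the circle becomes a straight line in (z1, z2). The sketch below shows this with hand-picked parameters `theta = (1.5, -1, -1)` (an illustrative assumption, not fitted) and no training at all.

```python
import numpy as np

np.random.seed(666)
X = np.random.normal(0, 1, size=(200, 2))
y = np.array(X[:, 0]**2 + X[:, 1]**2 < 1.5, dtype='int')

# New features: z1 = x0^2, z2 = x1^2. The circle x0^2 + x1^2 = 1.5
# becomes the straight line z1 + z2 = 1.5 in (z1, z2) space.
Z = np.column_stack([X[:, 0]**2, X[:, 1]**2])

# Hand-picked linear parameters: score = 1.5 - z1 - z2, positive inside the circle.
theta = np.array([1.5, -1.0, -1.0])
scores = theta[0] + Z @ theta[1:]
y_pred = (scores > 0).astype(int)
accuracy = np.mean(y_pred == y)
print(accuracy)  # 1.0
```

This is the idea behind adding polynomial features to logistic regression: the model stays linear in its parameters while the boundary in the original space can curve.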
「 Machine Learning 」
August 20, 2018
Word count: 2.7k
Reading time: 2 min
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
iris = datasets.load_iris()
X = iris.data
y = iris.target
X = X[y<2, :2]
y = y[y<2]
X.shape  # (100, 2)
y.shape  # (100,)
plt.scatter(X[y==0,0],X[y==0,1],color='...
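Keeping only two classes and two features lets the decision boundary be drawn as a line. Assuming a logistic regression with p = sigmoid(θ0 + θ1·x1 + θ2·x2), the boundary p = 0.5 is where θ0 + θ1·x1 + θ2·x2 = 0, so x2 = -(θ0 + θ1·x1) / θ2. The helper `boundary_x2` and the parameter values below are hypothetical and are not fitted to the iris data.

```python
import numpy as np

def boundary_x2(theta, x1):
    """Solve theta0 + theta1*x1 + theta2*x2 = 0 for x2 (requires theta2 != 0)."""
    theta0, theta1, theta2 = theta
    return -(theta0 + theta1 * np.asarray(x1)) / theta2

# Hypothetical parameters, for illustration only.
theta = np.array([-2.0, 3.0, -4.0])
x1 = np.linspace(4.0, 7.5, 5)
x2 = boundary_x2(theta, x1)

# Every point on the line has score 0, i.e. predicted probability 0.5.
scores = theta[0] + theta[1] * x1 + theta[2] * x2
print(np.allclose(scores, 0.0))  # True
```

Plotting `x1` against `x2` over the range of the first feature gives the straight line that separates the two scatter groups.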
「 Machine Learning 」
August 20, 2018
Word count: 514
Reading time: 1 min
「 Machine Learning 」
August 20, 2018
Word count: 1.2k
Reading time: 1 min
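Solving classification problems
- How can a regression method solve a classification problem?
Link a sample's features to the probability that the sample's event occurs (for example, that it belongs to class 1). A probability is a single number, so it can be predicted the way regression predicts a value.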
import numpy as np
import matplotlib.pyplot as plt
def sigmoid(t):
    # squash any real number into the interval (0, 1)
    return 1 / (1 + np.exp(-t))
x = np.linspace(-10,10,500)
y = sigmoid(x)
plt.plot(x,y)
plt.show()
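The plot shows the properties that make the sigmoid usable as a probability: outputs lie strictly between 0 and 1, σ(0) = 0.5, and σ(-t) = 1 - σ(t). With t = θᵀx_b, the usual classification rule "predict 1 when p ≥ 0.5" is therefore the same as "predict 1 when t ≥ 0". The sample values of `t` below are arbitrary.

```python
import numpy as np

def sigmoid(t):
    return 1 / (1 + np.exp(-t))

t = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
p = sigmoid(t)
print(p[2])                               # 0.5, since sigmoid(0) = 0.5
print(np.allclose(p + sigmoid(-t), 1.0))  # True: sigmoid(-t) = 1 - sigmoid(t)

# Decision rule: predict 1 when p >= 0.5, equivalently when t >= 0.
y_hat = (p >= 0.5).astype(int)
print(y_hat)                              # [0 0 1 1 1]
```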
Problem:
For a given sample dataset X, y, how do we find the parameters theta such that...
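The standard way to choose theta for logistic regression is to minimize the log-loss (cross-entropy), with p = σ(X_b·θ):

J(θ) = -(1/m) Σ [ y·log(p) + (1 - y)·log(1 - p) ]

Its gradient has the compact form (1/m)·X_bᵀ(σ(X_b·θ) - y), which gradient descent can use directly. The sketch below checks that analytic gradient against central finite differences. The random data and the helper names `log_loss` and `log_loss_grad` are assumptions for illustration.

```python
import numpy as np

def sigmoid(t):
    return 1 / (1 + np.exp(-t))

def log_loss(theta, X_b, y):
    """Mean cross-entropy J(theta) for logistic regression."""
    p = sigmoid(X_b @ theta)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def log_loss_grad(theta, X_b, y):
    """Analytic gradient: X_b^T (sigmoid(X_b theta) - y) / m."""
    return X_b.T @ (sigmoid(X_b @ theta) - y) / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
X_b = np.hstack([np.ones((50, 1)), X])  # prepend the intercept column
y = (X[:, 0] + X[:, 1] > 0).astype(float)
theta = rng.normal(size=3)

# Central finite differences, one coordinate at a time.
eps = 1e-6
numeric = np.array([
    (log_loss(theta + eps * e, X_b, y) - log_loss(theta - eps * e, X_b, y)) / (2 * eps)
    for e in np.eye(3)
])
analytic = log_loss_grad(theta, X_b, y)
print(np.allclose(numeric, analytic, atol=1e-6))  # True
```

At θ = 0 every p equals 0.5, so the loss starts at log 2 regardless of the data.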
「 Machine Learning 」
August 20, 2018
Word count: 4.8k
Reading time: 4 min
Least Absolute Shrinkage and Selection Operator Regression
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42)
x = np.random.uniform(-3.0,3.0,size=100)
X = x.reshape(-1,1)
y = 0.5 * x + 3 + np.random.normal(0,1,size=100)
plt.scatter...
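LASSO adds an L1 penalty α·Σ|θ_i| to the regression loss. Unlike an L2 penalty, it can drive weights to exactly zero, which is the "selection" in its name. A common way to solve it is coordinate descent with soft-thresholding; the objective below, (1/(2m))·‖y - b - Xw‖² + α·‖w‖₁, uses the same scaling as scikit-learn's `Lasso`. The helpers `soft_threshold` and `lasso_cd`, the extra x² column and the α values are assumptions for illustration.

```python
import numpy as np

def soft_threshold(rho, alpha):
    """S(rho, alpha) = sign(rho) * max(|rho| - alpha, 0)."""
    return np.sign(rho) * max(abs(rho) - alpha, 0.0)

def lasso_cd(X, y, alpha, n_sweeps=1000):
    """Coordinate descent for (1/(2m))*||y - b - Xw||^2 + alpha*||w||_1.
    The intercept b is not penalized; it is handled by centering X and y."""
    m, n = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    z = (Xc ** 2).sum(axis=0) / m  # per-column curvature
    w = np.zeros(n)
    for _ in range(n_sweeps):
        for j in range(n):
            # residual with feature j's current contribution added back
            r = yc - Xc @ w + Xc[:, j] * w[j]
            rho = Xc[:, j] @ r / m
            w[j] = soft_threshold(rho, alpha) / z[j]
    b = y_mean - x_mean @ w
    return w, b

np.random.seed(42)
x = np.random.uniform(-3.0, 3.0, size=100)
y = 0.5 * x + 3 + np.random.normal(0, 1, size=100)
X = np.column_stack([x, x ** 2])  # x^2 is irrelevant by construction

w0, b0 = lasso_cd(X, y, alpha=0.0)        # no penalty: ordinary least squares
w_big, b_big = lasso_cd(X, y, alpha=1e4)  # huge penalty: all weights become 0
print(w0, w_big)
```

With α = 0 the result matches least squares; as α grows, weights are zeroed one by one, leaving only the intercept (the mean of y) in the extreme case.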
「 Machine Learning 」
August 20, 2018
Word count: 3.9k
Reading time: 4 min
Model regularization: constraining the size of the parameters
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42)
x = np.random.uniform(-3.0,3.0,size=100)
X = x.reshape(-1,1)
y = 0.5 * x + 3 + np.random.normal(0,1,size=100)
plt.scatter(x, y)
plt.show()
# Use polynomial regression
from sklearn....
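Regularization limits parameter size by adding a penalty on the weights to the loss. For the L2 (ridge) penalty the minimizer has a closed form. The sketch below assumes the objective ‖y - b - Xw‖² + α·‖w‖² with an unpenalized intercept (handled by centering) and degree-5 polynomial features of the same synthetic data. `ridge_fit` and the α grid are illustrative assumptions.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Minimize ||y - b - Xw||^2 + alpha*||w||^2; the intercept b is not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    n = X.shape[1]
    # normal equations with alpha added to the diagonal
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b

np.random.seed(42)
x = np.random.uniform(-3.0, 3.0, size=100)
y = 0.5 * x + 3 + np.random.normal(0, 1, size=100)
X_poly = np.column_stack([x ** d for d in range(1, 6)])  # x, x^2, ..., x^5

norms = []
for alpha in [0.0, 1.0, 10.0, 100.0]:
    w, b = ridge_fit(X_poly, y, alpha)
    norms.append(np.linalg.norm(w))
print(norms)  # the weight norm shrinks as alpha grows
```

Larger α trades a slightly worse fit on the training data for smaller, smoother weights, which is what tames an overfitting high-degree polynomial.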
「 Machine Learning 」
August 20, 2018
Word count: 1.1k
Reading time: 1 min
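Model error = Bias² + Variance + irreducible error
Bias and variance usually pull against each other: reducing bias tends to increase variance, and reducing variance tends to increase bias.
The main challenge in machine learning comes from variance.
Common ways to deal with high variance:
- Reduce model complexity
- Reduce data dimensionality; denoise
- Increase the number of samples
- Use a validation set
- Regularize the model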
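The decomposition can be estimated by simulation: refit the same kind of model on many fresh training sets, then measure how far the average prediction is from the truth (bias²) and how much individual predictions scatter around that average (variance). The sketch assumes a true function f = sin, Gaussian noise with standard deviation 0.5, and polynomial degrees 1 and 7 as the "simple" and "complex" models. Error is measured against the noise-free f, so the irreducible-noise term is excluded. `np.polyfit` and `np.polyval` are NumPy functions; `bias_variance` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)
f = np.sin                          # assumed "true" function (illustration only)
x_test = np.linspace(-2.5, 2.5, 50)

def bias_variance(degree, n_trials=300, n_train=30, noise=0.5):
    """Refit a polynomial of the given degree on many training sets and
    split its error at x_test into bias^2 and variance."""
    preds = np.empty((n_trials, len(x_test)))
    for i in range(n_trials):
        x = rng.uniform(-3, 3, n_train)
        y = f(x) + rng.normal(0, noise, n_train)
        preds[i] = np.polyval(np.polyfit(x, y, degree), x_test)
    mean_pred = preds.mean(axis=0)
    bias2 = np.mean((mean_pred - f(x_test)) ** 2)
    variance = np.mean(preds.var(axis=0))
    mse = np.mean((preds - f(x_test)) ** 2)  # equals bias2 + variance
    return bias2, variance, mse

results = {d: bias_variance(d) for d in (1, 7)}
for d, (b2, var, mse) in results.items():
    print(d, round(b2, 4), round(var, 4), round(mse, 4))
```

The straight line has high bias and low variance; the degree-7 polynomial has almost no bias but noticeably higher variance, which is the trade-off described above.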
「 Machine Learning 」
August 20, 2018
Word count: 5.4k
Reading time: 5 min
The solution is to split the data into three parts: training data, validation data, and test data.
import numpy as np
from sklearn import datasets
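A three-way split can be done with a single shuffle followed by two cuts. The model is trained on the training part, hyperparameters are tuned on the validation part, and the test part is used only once for the final estimate. The helper `train_val_test_split`, its 20%/20% default ratios and the toy arrays below are hypothetical.

```python
import numpy as np

def train_val_test_split(X, y, val_ratio=0.2, test_ratio=0.2, seed=None):
    """Hypothetical helper: shuffle once, then cut into train / validation / test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(len(X) * test_ratio)
    n_val = int(len(X) * val_ratio)
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]
    return (X[train_idx], y[train_idx],
            X[val_idx], y[val_idx],
            X[test_idx], y[test_idx])

# Toy data: row i of X is [2i, 2i + 1] and its label is i.
X = np.arange(100).reshape(50, 2)
y = np.arange(50)
X_tr, y_tr, X_val, y_val, X_te, y_te = train_val_test_split(X, y, seed=666)
print(len(X_tr), len(X_val), len(X_te))  # 30 10 10
```

Because the same index array selects rows of X and y, each feature row stays paired with its label.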