Data Scientist Interview Summary

Why use an eight-layer CNN?

Why apply dropout and pooling every two layers, rather than after every layer or every three layers?

Explain dropout.
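
One way to make the answer concrete is a minimal sketch of inverted dropout on a plain list of activations. The function name `dropout`, the `keep_prob` and `training` parameters, and the use of the standard `random` module are assumptions made for this illustration; they are not taken from the interview itself.

```python
import random

def dropout(activations, keep_prob, training=True, rng=random):
    """Inverted dropout: zero each unit with probability 1 - keep_prob and
    scale the survivors by 1 / keep_prob so the expected value is unchanged."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    if not training:
        # At inference time nothing is dropped and nothing is rescaled.
        return list(activations)
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]

rng = random.Random(0)  # fixed seed so repeated runs give the same mask
layer_output = [1.0] * 10000
dropped = dropout(layer_output, keep_prob=0.5, rng=rng)

# Roughly half of the units are zeroed, and the mean stays close to 1.0.
print(sum(1 for v in dropped if v == 0.0) / len(dropped))
print(sum(dropped) / len(dropped))
```

Because each forward pass samples a different mask, the network cannot rely on any single unit, which is the usual argument for why dropout reduces overfitting.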

How was the accuracy computed? (softmax)
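
A small sketch of how accuracy can be derived from softmax outputs, written in plain Python so it runs without TensorFlow. The helper names `softmax`, `argmax`, and `accuracy`, and the sample logits and labels, are made up for this illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    m = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(batch_logits, batch_one_hot_labels):
    """Fraction of rows whose most probable class matches the label."""
    correct = 0
    for logits, label in zip(batch_logits, batch_one_hot_labels):
        if argmax(softmax(logits)) == argmax(label):
            correct += 1
    return correct / len(batch_logits)

# Hypothetical batch: three samples, three classes.
logits = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3], [1.2, 0.2, 0.1]]
labels = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(accuracy(logits, labels))  # two of the three predictions are correct
```

Softmax is monotonic, so taking the argmax of the raw logits gives the same predicted class as taking the argmax of the probabilities.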

Explain k-means clustering. Which distance metric did you use? (Euclidean distance in this case.)
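
The two alternating steps of k-means (assign each point to its nearest centroid, then move each centroid to the mean of its members) can be sketched as follows. This is a dependency-free illustration under assumed names (`euclidean`, `kmeans`, `initial_centroids`), not the implementation used in the project being discussed.

```python
import math
import random

def euclidean(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def kmeans(points, k, iterations=100, seed=0, initial_centroids=None):
    """Lloyd's algorithm. Returns (centroids, assignments)."""
    rng = random.Random(seed)
    if initial_centroids is not None:
        centroids = list(initial_centroids)
    else:
        centroids = rng.sample(points, k)  # pick k distinct points as a start
    assignments = None
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        new_assignments = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members)
                                     for dim in zip(*members))
    return centroids, assignments

# Hypothetical 2-D data with two well-separated groups.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
centroids, assignments = kmeans(data, k=2)
print(centroids)
print(assignments)
```

The result depends on the initial centroids, so in practice the algorithm is often run several times with different seeds and the run with the lowest within-cluster distance is kept.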

Which TensorFlow features have you used? Give examples. (This checks whether you have actually used TensorFlow.)

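import tensorflow as tf  # TensorFlow 1.x graph API

# Variable: a trainable tensor, here initialized from a normal distribution
tf.Variable(tf.random_normal(shape, stddev=stddev), name=name)

# Placeholders: a batch of 784-feature inputs and one-hot labels over 10 classes
x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])

# Hidden layer: affine transform followed by ReLU (weight1 and bias1 are Variables)
hidden = tf.matmul(x, weight1) + bias1
hidden = tf.nn.relu(hidden)

# Loss: mean softmax cross-entropy, where y holds the network's output logits
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=y, labels=y_))

# Training step: plain gradient descent on the loss
train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

# Accuracy: fraction of samples whose predicted class matches the label
accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1)), tf.float32))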

How do you convert different types of data, such as matrices or text, into a form TensorFlow can consume?
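
For text, one common answer is to turn each document into a fixed-length numeric row. The sketch below uses a simple bag-of-words encoding. The function names `build_vocabulary` and `to_bag_of_words`, whitespace tokenization, and the sample sentences are all assumptions chosen for illustration. Other encodings, such as token indices fed to an embedding layer, are equally valid answers.

```python
def build_vocabulary(documents):
    """Map each distinct lowercase token to a column index, in first-seen order."""
    vocab = {}
    for doc in documents:
        for token in doc.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab

def to_bag_of_words(documents, vocab):
    """Encode each document as a fixed-length row of token counts."""
    rows = []
    for doc in documents:
        row = [0.0] * len(vocab)
        for token in doc.lower().split():
            if token in vocab:  # tokens missing from the vocabulary are ignored
                row[vocab[token]] += 1.0
        rows.append(row)
    return rows

# Hypothetical corpus of two short documents.
docs = ["the cat sat", "the dog sat down"]
vocab = build_vocabulary(docs)
features = to_bag_of_words(docs, vocab)
print(vocab)
print(features)
```

The result is a rectangular matrix of floats with one row per document, which is the shape a float32 placeholder of size `[None, len(vocab)]` expects. Data that is already a numeric matrix only needs a consistent dtype and shape.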

How did you use Folium in your data analysis project? When actually drawing the map, what do you do if GPS points coincide?

import folium

# task1_frame is a pandas DataFrame loaded earlier (not shown here)
cols = ["Latitude", "Longitude"]
task4_frame = task1_frame[task1_frame["TRIP_ID"] == "190676"][cols]
task4_frame = task4_frame.dropna()  # drop every row that contains a NaN
latitude = task4_frame["Latitude"].values
longitude = task4_frame["Longitude"].values

center = (latitude.mean(), longitude.mean())
points = [(lat, lon) for lat, lon in zip(latitude, longitude)]
start_point, end_point = points[0], points[-1]

# cf. http://python-visualization.github.io/folium/docs-v0.5.0/modules.html#id1
folium_map = folium.Map(location=center, zoom_start=13, tiles="OpenStreetMap")
# start point
folium.CircleMarker(location=start_point, radius=10, weight=3, color='red', opacity=0.8).add_to(folium_map)
# end point
folium.CircleMarker(location=end_point, radius=10, weight=3, color='blue', opacity=0.8).add_to(folium_map)
# route
folium.PolyLine(points, color="green", weight=2.5, opacity=1).add_to(folium_map)
folium_map.save('results/route_plot.html')  # the results/ directory must already exist
folium_map  # in a Jupyter notebook, this renders the map inline
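
The code above does not show how coinciding GPS points were handled. One possible approach, sketched here as an assumption rather than as the author's answer, is to collapse points that share the same rounded coordinates and keep a count for each location. The function name `collapse_coincident`, the `precision` parameter, and the sample coordinates are hypothetical.

```python
from collections import Counter

def collapse_coincident(points, precision=5):
    """Group points that round to the same (lat, lon) and count them.

    Five decimal places is roughly one metre of latitude. This precision
    is an assumption and should be tuned to the GPS data at hand.
    """
    counts = Counter(
        (round(lat, precision), round(lon, precision)) for lat, lon in points
    )
    # [((lat, lon), count), ...] in first-seen order
    return list(counts.items())

# Hypothetical track in which the first position was recorded twice.
sample_points = [(41.15, -8.61), (41.15, -8.61), (41.16, -8.62)]
for (lat, lon), n in collapse_coincident(sample_points):
    print(lat, lon, n)
```

Each count could then drive the `radius` of a `folium.CircleMarker`, so that overlapping fixes show up as one larger marker instead of hiding each other. Folium's `MarkerCluster` plugin is another option when there are many markers.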

Which project left the deepest impression on you?