{"id":19136,"date":"2026-05-05T11:10:03","date_gmt":"2026-05-05T04:10:03","guid":{"rendered":"https:\/\/mb668s.com\/cam-nang-7mb66-xoc-dia\/?p=19136"},"modified":"2026-05-28T12:01:08","modified_gmt":"2026-05-28T05:01:08","slug":"cau-hoi-phong-van-data-scientist","status":"publish","type":"post","link":"https:\/\/mb668s.com\/cam-nang-7mb66-xoc-dia\/phong-van-viec-lam\/cau-hoi-phong-van-data-scientist","title":{"rendered":"30 Data Scientist interview questions \u2013 Stats, ML, Coding, System"},"content":{"rendered":"\n
Data Scientist is one of the most in-demand and best-paid roles in Vietnam's IT sector in 2026. According to CareerLink data (05\/2026) and the Talentnet 2025 survey, Data Scientist salaries in Vietnam rose 22% compared with 2024, the second highest among IT roles (after ML Engineer). Data Scientist interview questions<\/strong> usually fall into 5 groups: Statistics & Math, Machine Learning, Data Engineering, Programming (Python\/SQL), and Behavioral case. This article collects the 30 most common questions, with answer outlines tailored to the Vietnamese market.<\/p>\n\n\n\n Quick overview:<\/strong><\/p>\n \u2013 The Data Scientist interview process usually has 4\u20135 rounds: HR \u2192 Take-home assignment \u2192 Technical Interview \u2192 System Design \u2192 Hiring Manager.<\/p>\n \u2013 5 question groups: Stats & ML (35%), Coding Python\/SQL (25%), Data Engineering (15%), System Design (15%), Behavioral (10%).<\/p>\n \u2013 2026 salary ranges (CRL Q2 + Talentnet): Junior 18\u201328 million VND, Mid 30\u201355 million VND, Senior 60\u2013100 million VND, Lead 90\u2013160 million VND.<\/p>\n \u2013 Top 5 companies hiring the most: VNG Cloud, MoMo, Be Group, FPT AI, VinAI Research.<\/p>\n<\/div>\n\n\n\n This is foundational knowledge: most first-round interviews ask 3\u20135 questions from this group.<\/p>\n\n\n\n \u2013 Question 1:<\/strong> “What is the difference between Type I and Type II error?”. Outline: Type I (false positive): rejecting H0 when it is true. Type II (false negative): failing to reject H0 when it is false. 
The trade-off is governed by the significance level \u03b1 (the Type I error rate) and the power 1-\u03b2 (where \u03b2 is the Type II error rate).<\/p>\n\n\n\n \u2013 Question 2:<\/strong> “When should you use a t-test and when a z-test?”. Outline: use a z-test when the population \u03c3 is known (in practice also when the sample is large, n > 30). Use a t-test when \u03c3 is unknown and estimated from the sample, which matters most for small samples (n < 30). Both assume the data are approximately normally distributed, or that n is large enough for the CLT to apply.<\/p>\n\n\n\n \u2013 Question 3:<\/strong> “What is a p-value? What threshold is typical?”. Outline: the probability of observing data at least as extreme as the current sample if H0 is true. It is not the probability that H0 is true. p < 0.05 is the common cutoff for rejecting H0 (significance level \u03b1).<\/p>\n\n\n\n \u2013 Question 4:<\/strong> “What does the Central Limit Theorem (CLT) state?”. Outline: for independent, identically distributed observations with finite variance, the sampling distribution of the mean approaches a normal distribution as n grows (n \u2265 30 is a common rule of thumb), regardless of the original distribution. It is the basis for confidence intervals and hypothesis testing.<\/p>\n\n\n\n \u2013 Question 5:<\/strong> “What is the difference between correlation and causation?”. Outline: correlation means two variables vary together (Pearson r). Causation means A causes B. Correlation does not imply causation: establishing it requires an experiment (RCT) or a quasi-experiment (DiD, IV).<\/p>\n\n\n\n This is the hands-on part: there are usually 1\u20132 live coding exercises.<\/p>\n\n\n\n \u2013 Question 12 (Python):<\/strong> “Write a function that computes the moving average of a list”. Outline: use a deque or a rolling window. Code: \u2013 Question 13 (SQL):<\/strong> “Write a query to find the top 3 products by revenue in each month”. Outline: compute ROW_NUMBER() OVER (PARTITION BY month ORDER BY revenue DESC) AS rn in a CTE or subquery, then filter in the outer query with WHERE rn <= 3, because a window function cannot be used in the WHERE clause of the same SELECT.<\/p>\n\n\n\n \u2013 Question 14 (Python):<\/strong> “What is the difference between a list comprehension and a generator?”. 
Outline: a list comprehension evaluates eagerly and holds every element in memory. A generator evaluates lazily, keeping only its iteration state and producing one item at a time, and it can be consumed only once. Generators suit large or streaming data.<\/p>\n\n\n\n
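The eager versus lazy distinction in the list comprehension and generator answer can be checked directly. The following is a minimal sketch; the function names squares_list and squares_gen are made up for illustration and are not from the original article.

```python
import sys

def squares_list(n):
    # List comprehension: builds all n values immediately and stores them.
    return [i * i for i in range(n)]

def squares_gen(n):
    # Generator expression: produces values lazily, one at a time.
    return (i * i for i in range(n))

eager = squares_list(100_000)
lazy = squares_gen(100_000)

# The list container grows with n; the generator object stays small.
print(sys.getsizeof(eager) > sys.getsizeof(lazy))

# Both yield the same values when fully consumed.
total_from_gen = sum(lazy)
print(total_from_gen == sum(eager))

# A generator is exhausted after one pass, so a second pass sees nothing.
second_pass = sum(lazy)
print(second_pass)
```

In an interview it is worth mentioning the single-pass behaviour explicitly, since reusing an exhausted generator is a common source of silent bugs.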
<\/figure>\n\n\n\n1. Statistics & Probability group<\/h2>\n\n\n\n
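For the Type I error, Type II error, and p-value questions in this group, a small simulation can make the definitions concrete. The sketch below is illustrative only: it assumes a two-sided z-test with known sigma, and the sample size, effect size, and trial count are arbitrary choices, not values from the article.

```python
import random
from statistics import NormalDist, mean

def z_test_p_value(sample, mu0, sigma):
    # Two-sided z-test p-value for H0: population mean == mu0, sigma known.
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def rejection_rate(true_mu, mu0=0.0, sigma=1.0, n=40,
                   alpha=0.05, trials=4000, seed=0):
    # Fraction of simulated experiments in which H0 is rejected.
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        if z_test_p_value(sample, mu0, sigma) < alpha:
            rejections += 1
    return rejections / trials

# H0 is true here, so every rejection is a Type I error (rate close to alpha).
type_i_rate = rejection_rate(true_mu=0.0)

# H0 is false here, so the rejection rate estimates the power 1 - beta.
power = rejection_rate(true_mu=0.5)
type_ii_rate = 1 - power

print(type_i_rate, power, type_ii_rate)
```

Lowering alpha in this sketch reduces the Type I rate but also reduces the power, which is the trade-off the answer outline refers to.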
2. Machine Learning group<\/h2>\n\n\n\n
\n\n
\n \nQuestion<\/th>\n Key points<\/th>\n<\/tr>\n<\/thead>\n \n Bias-variance tradeoff?<\/td>\n High bias = underfitting; high variance = overfitting. Expected error = bias\u00b2 + variance + irreducible noise. Balance the two with regularization and cross-validation<\/td>\n<\/tr>\n \n When to use L1 vs L2 regularization?<\/td>\n L1 (Lasso): drives some weights to exactly zero, giving feature selection and a sparse model. L2 (Ridge): keeps all features and shrinks weights smoothly. Elastic Net = L1 + L2<\/td>\n<\/tr>\n \n ROC AUC vs Precision-Recall?<\/td>\n ROC AUC works well on balanced data. The PR curve is more informative on imbalanced data (fraud detection, churn)<\/td>\n<\/tr>\n \n Random Forest vs XGBoost?<\/td>\n RF: bagging, trees trained in parallel, robust to noise. XGBoost: boosting, trees built sequentially, often higher accuracy but more prone to overfitting and in need of tuning<\/td>\n<\/tr>\n \n Cross-validation strategies?<\/td>\n k-fold (k=5\/10), stratified k-fold for classification, time-series split for temporal data<\/td>\n<\/tr>\n \n How to handle class imbalance?<\/td>\n SMOTE oversampling, undersampling, class weights, threshold tuning, focal loss<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n 3. Python & SQL coding group<\/h2>\n\n\n\n
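For the SQL question in this group (top 3 products by revenue per month), the ranking has to be computed in a CTE or subquery before it can be filtered. The sketch below runs that pattern through Python's built-in sqlite3 module. The table name sales, its columns, and the sample rows are hypothetical, and it assumes the bundled SQLite is version 3.25 or later, which is when window functions were added.

```python
import sqlite3

# Hypothetical schema and data, used only for illustration.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE sales (month TEXT, product TEXT, revenue REAL)')
rows = [
    ('2026-01', 'A', 500.0), ('2026-01', 'B', 300.0),
    ('2026-01', 'C', 200.0), ('2026-01', 'D', 100.0),
    ('2026-02', 'A', 50.0), ('2026-02', 'B', 400.0),
    ('2026-02', 'C', 250.0), ('2026-02', 'D', 350.0),
]
conn.executemany('INSERT INTO sales VALUES (?, ?, ?)', rows)

query = '''
WITH monthly AS (
    -- Total revenue per product per month.
    SELECT month, product, SUM(revenue) AS revenue
    FROM sales
    GROUP BY month, product
),
ranked AS (
    -- Rank products inside each month, highest revenue first.
    SELECT month, product, revenue,
           ROW_NUMBER() OVER (
               PARTITION BY month ORDER BY revenue DESC
           ) AS rn
    FROM monthly
)
SELECT month, product, revenue, rn
FROM ranked
WHERE rn <= 3
ORDER BY month, rn
'''

top3 = conn.execute(query).fetchall()
for row in top3:
    print(row)
```

ROW_NUMBER() breaks ties arbitrarily; if tied products should share a position, RANK() or DENSE_RANK() is the usual substitute, at the cost of possibly returning more than three rows per month.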
def moving_avg(arr, w): return [sum(arr[i:i+w])\/w for i in range(len(arr)-w+1)]<\/code>. Note: this slicing version costs O(n*w); pandas df.rolling(w).mean()<\/code> is more efficient for large data.<\/p>\n\n\n\n
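The answer outline for the moving average question mentions a deque, while the one-liner uses slicing. A possible deque-based variant is sketched below. The name moving_avg_stream is made up for illustration, and the running-sum approach is one common choice rather than the article's own method.

```python
from collections import deque

def moving_avg_stream(values, w):
    # Yield the mean of each full window of size w in O(1) work per element.
    if w <= 0:
        raise ValueError('window size must be positive')
    window = deque()
    running_sum = 0.0
    for x in values:
        window.append(x)
        running_sum += x
        if len(window) > w:
            # Drop the oldest value once the window is over capacity.
            running_sum -= window.popleft()
        if len(window) == w:
            yield running_sum / w

result = list(moving_avg_stream([1, 2, 3, 4, 5], 3))
print(result)  # [2.0, 3.0, 4.0]
```

Because it is a generator, this version also works on streams that do not fit in memory. Two caveats are worth raising in an interview: a running sum over floats can accumulate rounding error on very long inputs, and pandas rolling(w).mean() by default returns NaN for the first w-1 positions, so its output length differs from this function's.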