bicloud

May 19, 2017
 
2017 Data Science Bowl, 2nd place solution https://github.com/dhammack/DSB2017/ https://github.com/juliandewit/kaggle_ndsb2017/
Introduction to reinforcement learning https://lufficc.com/blog/reinforcement-learning-and-implementation https://github.com/lufficc/dqn
Demystifying Deep Reinforcement Learning https://www.nervanasys.com/demystifying-deep-reinforcement-learning/
Santander Product Recommendation https://github.com/ttvand/Santander-Product-Recommendation
https://github.com/aaron-xichen/pytorch-playground Base pretrained models and datasets in pytorch (MNIST, SVHN, CIFAR10, CIFAR100, STL10, AlexNet, VGG16, VGG19, ResNet, Inception, SqueezeNet)
Berkeley deep reinforcement learning course http://rll.berkeley.edu/deeprlcourse/
DQN from Beginner to Giving Up, part 1: DQN and reinforcement learning https://zhuanlan.zhihu.com/p/21262246
Introduction to AlphaGo http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Resources_files/AlphaGo_IJCAI.pdf
geohash https://en.wikipedia.org/wiki/Geohash http://blog.csdn.net/zmx729618/article/details/53068170 http://www.cnblogs.com/dengxinglin/archive/2012/12/14/2817761.html
    Maven: com.spatial4j:spatial4j:0.5
    Maven: ch.hsr:geohash:1.3.0
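A minimal Python sketch of geohash encoding, following the scheme described on the Wikipedia page linked above: interleave longitude and latitude bisection bits, then emit one base-32 character per 5 bits. This is for intuition only. It is not the API of spatial4j or ch.hsr geohash, and the name `geohash_encode` is made up for this note.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=12):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0       # number of bits collected for the current character
    value = 0      # the bits themselves
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # 5 bits make one base-32 character
            chars.append(BASE32[value])
            bits = 0
            value = 0
    return "".join(chars)


print(geohash_encode(42.6, -5.6, precision=5))  # ezs42
```

Nearby points usually share a prefix, which is why geohashes are handy as index keys, though points on opposite sides of a cell boundary can have very different hashes.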
http://blog.csdn.net/ghsau/article/details/50591932
deepmind papers https://deepmind.com/research/publications/
How we built Tagger News: machine learning on a tight schedule http://varianceexplained.org/programming/tagger-news/  https://github.com/dodger487
mnist data csv format https://pjreddie.com/projects/mnist-in-csv/
deep chatbots https://github.com/mckinziebrandon/DeepChatModels

 
 Posted at 11:36 PM
May 12, 2017
 
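A recent real-world project requires building complex networks. I had no hands-on experience in this area; until now I had mostly read papers, especially on graph computation models for big data. Working with Giraph, the Hadoop-based graph computation framework (used in production at Facebook), deepened my understanding of Pregel, for example by implementing a heat conduction algorithm.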
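A rough single-machine sketch of the Pregel idea in Python: in each superstep every vertex sends messages to its neighbors, then updates its own value from the messages it received. The diffusion rule here (each vertex gives away a fraction `alpha` of its heat, split evenly among neighbors) is an assumption chosen for illustration; it is not necessarily the heat conduction algorithm from the project, and none of this is the Giraph API.

```python
def heat_diffusion(graph, heat, alpha=0.5, supersteps=10):
    """Pregel-style heat diffusion (illustrative rule, see note above).

    graph: dict mapping each vertex to a list of neighbor vertices
           (for an undirected graph, list each edge in both directions).
    heat:  dict mapping each vertex to its initial heat value.
    alpha: fraction of its heat a vertex gives away per superstep.
    """
    heat = dict(heat)
    for _ in range(supersteps):
        # Message phase: every vertex splits alpha * heat among its neighbors.
        inbox = {v: 0.0 for v in graph}
        for v, neighbors in graph.items():
            if neighbors:
                share = alpha * heat[v] / len(neighbors)
                for n in neighbors:
                    inbox[n] += share
        # Compute phase: keep the remainder and add what arrived.
        # All updates read heat values from the previous superstep (the barrier).
        heat = {
            v: (heat[v] * (1 - alpha) if graph[v] else heat[v]) + inbox[v]
            for v in graph
        }
    return heat


# Path graph a - b - c with all heat starting at a.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
result = heat_diffusion(graph, {"a": 1.0, "b": 0.0, "c": 0.0}, supersteps=50)
print(result)
```

Total heat is conserved, and on this three-vertex path the values settle near 0.25 / 0.5 / 0.25, proportional to vertex degree. In Giraph the two phases would live inside a per-vertex compute method and the framework would handle message delivery and the barrier.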
Learning and practicing Hadoop graph frameworks: Giraph http://giraph.apache.org/ , http://grafos.ml/  http://arabesque.io/
Arabesque: A System for Distributed Graph Mining http://sigops.org/sosp/sosp15/current/2015-Monterey/printable/093-teixeira.pdf
tensorflow https://github.com/skcript/tensorflow-resources
spark https://github.com/endymecy/spark-ml-source-analysis
Spark: turning off runtime logging http://stackoverflow.com/questions/25193488/how-to-turn-off-info-logging-in-pyspark
Architecture of Giants: Data Stacks at Facebook, Netflix, Airbnb, and Pinterest https://blog.keen.io/architecture-of-giants-data-stacks-at-facebook-netflix-airbnb-and-pinterest-9b7cd881af54?imm_mid=0f1550
TensorFlow template application for deep learning https://github.com/tobegit3hub/deep_recommend_system
Top 20 Recent Research Papers on Machine Learning and Deep Learning http://www.kdnuggets.com/2017/04/top-20-papers-machine-learning.html
jblas http://jblas.org/
Machine Learning with Spark (book) https://book.douban.com/subject/26350074/
machine learning dataset http://persoal.citius.usc.es/manuel.fernandez.delgado/papers/jmlr/data.tar.gz
Do we Need Hundreds of Classifiers to Solve Real World Classification Problems? http://jmlr.org/papers/volume15/delgado14a/delgado14a.pdf
CTR prediction
Y.-W. Chang, C.-J. Hsieh, K.-W. Chang, M. Ringgaard, and C.-J. Lin, “Training and testing low-degree polynomial data mappings via linear SVM,” Journal of Machine Learning Research, vol. 11, pp. 1471–1490, 2010.
T. Kudo and Y. Matsumoto, “Fast methods for kernel-based text analysis,” in Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL), 2003.
S. Rendle, “Factorization machines,” in Proceedings of IEEE International Conference on Data Mining (ICDM), pp. 995–1000, 2010.
H. B. McMahan, G. Holt, D. Sculley, et al., “Ad Click Prediction: a View from the Trenches”
X. He, J. Pan, O. Jin, T. Xu, et al., “Practical Lessons from Predicting Clicks on Ads at Facebook”
Y. Juan, Y. Zhuang, W.-S. Chin, and C.-J. Lin, “Field-aware Factorization Machines for CTR Prediction”
G. James, D. Witten, T. Hastie, R. Tibshirani, “An Introduction to Statistical Learning”, 2013.
Neural Models for Information Retrieval https://arxiv.org/pdf/1705.01509.pdf
https://kowshik.github.io/JPregel/pregel_paper.pdf Pregel: A System for Large-Scale Graph Processing
Learning Piece-wise Linear Models from Large Scale Data for Ad Click Prediction https://arxiv.org/abs/1704.05194
High Performance Linear Algebra OOP https://github.com/fommil/matrix-toolkits-java
Scraping JD.com review data https://github.com/awolfly9

 
 Posted at 8:32 PM
May 5, 2017
 
learning deep learning with keras http://p.migdal.pl/2017/04/30/teaching-deep-learning.html
Mobile video in 2017: user profiles and trend forecasts http://mp.weixin.qq.com/s?src=3&timestamp=1493786539&ver=1&signature=LmxQAN5pURJyKafpyA7pOMD85zwzyCuxMHTKCKqXVC7D-a*-DWOtWapMbH0LA6VWonKHQy1pAp*EX0bRu5lpiTELozDJCDoTiZL4*5LVI2sMLN5DECkwAslE1rtyOmOs8zc8opBzTAfPs7sN9uqgLVhn20e-HC1lJG7SBSj*gmQ=
https://chrisalbon.com/ Notes on Data Science, Machine Learning, & Artificial Intelligence
TensorFlow template application for deep learning https://github.com/tobegit3hub/deep_recommend_system.git
Python + Scrapy + MongoDB . 5 million data per day !!!💥 The world's largest website. 🔞  https://github.com/xiyouMc/WebHubBot
QuestionAnsweringSystem is a question answering system implemented in Java that automatically analyzes questions and proposes candidate answers https://github.com/ysc/QuestionAnsweringSystem
Python: word frequencies of Xinwen Lianbo (the CCTV evening news) with jieba https://github.com/maxiee/MyCodes/blob/master/PythonJiebaProjects/XWLB_words_freq/xwlb_jieba.py
2nd place solution for the 2017 National Data Science Bowl http://juliandewit.github.io/kaggle-ndsb2017/
feature hash https://github.com/wush978/FeatureHashing
A framework for training and evaluating AI models on a variety of openly available dialog datasets https://github.com/facebookresearch/ParlAI


 
 Posted at 10:25 PM
May 1, 2017
 
How transferable are features in deep neural networks? https://arxiv.org/abs/1411.1792
TensorFlow CNN for fast style transfer https://github.com/lengstrom/fast-style-transfer
https://github.com/HappyShadowWalker/ChineseTextClassify Chinese text classification using the Sogou text classification corpus
https://lukeoakdenrayner.wordpress.com/2017/04/24/the-end-of-human-doctors-understanding-medicine/ The End of Human Doctors – Understanding Medicine
Machine Learning in Science and Industry slides http://arogozhnikov.github.io/2017/04/20/machine-learning-in-science-and-industry.html https://github.com/yandexdataschool/MLAtGradDays
all the available code repos for the NIPS 2016's top papers https://www.reddit.com/r/MachineLearning/comments/5hwqeb/project_all_code_implementations_for_nips_2016/
Best Practices for Applying Deep Learning to Novel Applications https://arxiv.org/abs/1704.01568v1
Medical Image Analysis with Deep Learning https://medium.com/@taposhdr/medical-image-analysis-with-deep-learning-i-23d518abf531
https://royalsociety.org/~/media/policy/projects/machine-learning/publications/machine-learning-report.pdf
Chinese rumor database http://rumor.thunlp.org/


 
 Posted at 10:33 PM
April 23, 2017
 
http://alias-i.com/lingpipe/  LingPipe is a toolkit for processing text using computational linguistics.
http://svmlight.joachims.org/ SVM-based text classification
http://adrem.ua.ac.be/~tmartin/  JNI Java interface for SVM
https://github.com/antoniosehk/keras-tensorflow-windows-installation  Installing the Keras deep learning package with tensorflow-gpu on Windows
http://thegrandjanitor.com/  machine learning
http://www.wsdm-conference.org/2017/accepted-papers/  WSDM 2017 accepted papers
https://www.slideshare.net/BhaskarMitra3/neural-text-embeddings-for-information-retrieval-wsdm-2017
https://github.com/laura-dietz/tutorial-utilizing-kg
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
https://cmusatyalab.github.io/openface/
https://eliasvansteenkiste.github.io/ Predicting lung cancer
https://brage.bibsys.no/xmlui/handle/11250/2433761 Tree Boosting With XGBoost - Why Does XGBoost Win "Every" Machine Learning Competition?
https://github.com/YaronBlinder/MIMIC-III_readmission/ Predicting 30-day ICU readmissions from the MIMIC-III database
https://github.com/caffe2/caffe2  Caffe2, Facebook's open-source deep learning framework
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications https://arxiv.org/abs/1704.04861
https://zhuanlan.zhihu.com/p/24322376 A feast of fraud: a million-strong underground army and twenty million phone numbers carving up a ten-billion cake


 
 Posted at 6:16 PM
April 22, 2017
 

Common tools

1. Text processing

(1) Atom, https://atom.io/, commonly used plugins:

Column editing, https://atom.io/packages/Sublime-Style-Column-Selection

Run code in Atom, mainly for running Python https://atom.io/packages/script

Project management https://atom.io/packages/project-manager

Markdown editing https://atom.io/packages/markdown-preview-plus

markdown-scroll-sync https://atom.io/packages/markdown-scroll-sync

Python Autocomplete Package https://atom.io/packages/autocomplete-python

HQL (Apache Hive) query language https://atom.io/packages/language-hql

(2) Sublime Text, http://www.sublimetext.com/

(3) Markdown editor for Mac, http://macdown.uranusjr.com/

(4) pandoc, http://www.pandoc.org/, format conversion, processing Markdown and other formats

(5) Word, PowerPoint, Excel, OneNote: drawing, note-taking, spreadsheets

2. Coding tools

PyCharm, http://www.jetbrains.com/pycharm/

IntelliJ IDEA, http://www.jetbrains.com/idea/

Maven, http://maven.apache.org/, project management

Visual Studio Code, https://code.visualstudio.com/

RStudio, https://www.rstudio.com/

3. Mind mapping

XMind http://www.xmindchina.net/

4. Terminal login tools

iTerm2 (macOS) http://www.iterm2.com/

PuTTY (Windows)

5. Network analysis

Gephi, https://gephi.org/

6. Visualization tools

Graphviz, http://www.graphviz.org/

7. FTP tools

FileZilla, https://filezilla-project.org/

8. Version control

git, https://git-scm.com/

9. Documentation writing

mkdocs, http://www.mkdocs.org/

10. Database tools

MySQL, https://www.mysql.com/ PostgreSQL, https://www.postgresql.org/


 
 Posted at 12:22 AM
April 15, 2017
 
http://adventuresinmachinelearning.com/neural-networks-tutorial/
http://adventuresinmachinelearning.com/improve-neural-networks-part-1/
http://adventuresinmachinelearning.com/stochastic-gradient-descent/
https://github.com/adventuresinML/adventures-in-ml-code
https://github.com/yandexdataschool/Practical_RL practical reinforcement learning course
https://github.com/yandexdataschool/YSDA_deeplearning17 Deep Learning course, 2017
https://webhose.io/datasets  free datasets
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/keras/python/keras/applications/vgg16.py
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/keras/python/keras/applications/vgg19.py
https://arxiv.org/pdf/1409.1556.pdf VERY DEEP CONVOLUTIONAL NETWORKS FOR LARGE-SCALE IMAGE RECOGNITION
https://github.com/machrisaa/tensorflow-vgg
transfer learning
pre-trained networks
https://github.com/BVLC/caffe/tree/master/models
http://mscoco.org/dataset/#download
https://github.com/visipedia/inat_comp
https://rahulduggal2608.wordpress.com/2017/04/02/alexnet-in-keras/


Dissecting the Internet underground economy: fake phone numbers
http://mp.weixin.qq.com/s?__biz=MzA4MjI2MTcwMw==&mid=2650485616&idx=1&sn=d26063f090b936d7efd3fedf32108df0&chksm=8787f0d8b0f079ce57ba26a9a6deb1f444a7939b6a1d443a3cbc773b662084bc6eaf8a16ea74&scene=21#wechat_redirect
Dissecting the Internet underground economy: proxies and anonymity
http://mp.weixin.qq.com/s?__biz=MzA4MjI2MTcwMw==&mid=2650485686&idx=1&sn=b8d3fd492e7fd27c0ceec7a98511c63b&chksm=8787f01eb0f0790824d1d0ac37a6817416e7ec79e535d7ba8e628628eef6fcaac962ca3ebfe0&scene=21#wechat_redirect
Everything you want to know about IP addresses (part 1)
http://mp.weixin.qq.com/s?__biz=MzA4MjI2MTcwMw==&mid=2650485704&idx=1&sn=a34cb411701008ed13b1042ba549d341&chksm=8787f060b0f0797642628a9bba9f4ea5f713f4bb8b0347a69630938c333ca6ee3ce46ca470a3&scene=21#wechat_redirect

https://github.com/stuxuhai/jpinyin  JPinyin is an open-source Java library for converting Chinese characters to pinyin
   Maven: com.github.stuxuhai:jpinyin:1.1.8


https://github.com/NLPchina/ansj_seg  Chinese word segmentation
    Maven: org.ansj:ansj_seg:5.1.1

emoji-java is a lightweight Java library that helps you use emojis in your Java applications.
  Maven: com.vdurmont:emoji-java:3.2.0


 
 Posted at 6:22 PM
April 8, 2017
 

https://github.com/kwotsin/awesome-deep-vision
from https://github.com/m2dsupsdlclass/lectures-labs
LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based learning applied to document recognition. LeNet
Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." (2014) VGG-16
Simplified version of Krizhevsky, Alex, Sutskever, and Hinton. "Imagenet classification with deep convolutional neural networks." NIPS 2012 AlexNet
He, Kaiming, et al. "Deep residual learning for image recognition." CVPR. 2016. ResNet
Szegedy, et al. "Inception-v4, inception-resnet and the impact of residual connections on learning." (2016)
Canziani, Paszke, and Culurciello. "An Analysis of Deep Neural Network Models for Practical Applications." (May 2016).

classification and localization
Redmon, Joseph, et al. "You only look once: Unified, real-time object detection." CVPR (2016)
Liu, Wei, et al. "SSD: Single shot multibox detector." ECCV 2016
Girshick, Ross, et al. "Fast r-cnn." ICCV 2015
Ren, Shaoqing, et al. "Faster r-cnn: Towards real-time object detection with region proposal networks." NIPS 2015
Redmon, Joseph, et al. "YOLO9000: Better, Faster, Stronger." 2017

segmentation
Long, Jonathan, et al. "Fully convolutional networks for semantic segmentation." CVPR 2015
Noh, Hyeonwoo, et al. "Learning deconvolution network for semantic segmentation." ICCV 2015
Pinheiro, Pedro O., et al. "Learning to segment object candidates" / "Learning to refine object segments", NIPS 2015 / ECCV 2016
Li, Yi, et al. "Fully Convolutional Instance-aware Semantic Segmentation." Winner of COCO challenge 2016.

Weak supervision
Joulin, Armand, et al. "Learning visual features from large weakly supervised data." ECCV, 2016
Oquab, Maxime, et al. "Is object localization for free? – Weakly-supervised learning with convolutional neural networks", 2015

Self-supervised learning
Doersch, Carl, Abhinav Gupta, and Alexei A. Efros. "Unsupervised visual representation learning by context prediction." ICCV 2015.


DNN optimization
Ren, Mengye, et al. "Normalizing the Normalizers: Comparing and Extending Network Normalization Schemes." 2017
Salimans, Tim, and Diederik P. Kingma. "Weight normalization: A simple reparameterization to accelerate training of deep neural networks." NIPS 2016.
Ba, Jimmy Lei, Jamie Ryan Kiros, and Geoffrey E. Hinton. "Layer normalization." 2016.
Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." ICML 2015
Generalization
Understanding deep learning requires rethinking generalization, C. Zhang et al., 2016.
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, N. S. Keskar et al., 2016
1. A strong optimizer is not necessarily a strong learner.
2. DL optimization is non-convex but bad local minima and saddle structures are rarely a problem (on common DL tasks).
3. Neural Networks are over-parametrized but can still generalize.
4. Stochastic Gradient is a strong implicit regularizer.
5. Variance in gradient can help with generalization but can hurt final convergence.
6. We need more theory to guide the design of architectures and optimizers that make learning faster with fewer labels.
7. Overparametrize deep architectures
8. Design architectures to limit conditioning issues:
(1)Use skip / residual connections
(2)Internal normalization layers
(3)Use stochastic optimizers that are robust to bad conditioning
9. Use small minibatches (at least at the beginning of optimization)
10. Use validation set to anneal learning rate and do early stopping
11. It is very often possible to trade more compute for less overfitting with data augmentation and stochastic regularizers (e.g. dropout).
12. Collecting more labelled data is the best way to avoid overfitting.
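Point 10 above can be sketched in a few lines of framework-free Python. The class name `PlateauController`, its parameters, and the validation losses below are all made up for illustration; deep learning frameworks ship their own callbacks for this, so treat it as a sketch of the idea rather than anyone's actual API.

```python
class PlateauController:
    """Anneal the learning rate and stop early based on validation loss."""

    def __init__(self, lr, factor=0.5, anneal_patience=2, stop_patience=5):
        self.lr = lr
        self.factor = factor                    # multiply lr by this on a plateau
        self.anneal_patience = anneal_patience  # bad epochs between lr cuts
        self.stop_patience = stop_patience      # bad epochs before stopping
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
            return False
        self.bad_epochs += 1
        if self.bad_epochs % self.anneal_patience == 0:
            self.lr *= self.factor
        return self.bad_epochs >= self.stop_patience


# Hypothetical validation losses, one per epoch (not real results).
val_losses = [1.0, 0.8, 0.9, 0.85, 0.7, 0.75, 0.74, 0.73, 0.72, 0.71, 0.9]
ctl = PlateauController(lr=0.1)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if ctl.step(loss):
        stopped_at = epoch
        break
print(stopped_at, ctl.lr, ctl.best)
```

In a real training loop one would also checkpoint the weights whenever `best` improves and restore them after stopping.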


 
 Posted at 4:04 PM
April 5, 2017
 

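Big data risk control startups

1. Sift Science

Introduction

https://siftscience.com

Services offered:

Account takeover, payment fraud, spam content, account abuse, promotion abuse, device fingerprinting

Industries served:

E-commerce, travel, ticketing, digital goods, and others.

Technical blog

https://engineering.siftscience.com covers Sift Science's risk control technology, engineering, algorithms, and architecture.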

2. Forter

Introduction

https://www.forter.com

Services: All E-Commerce Needs: Marketplaces, Digital Goods, Services, Physical Goods, Travel, Mobile (SDK & API), Alternative Payments

Technology: Machine Learning with a Human Touch, Understanding the Context of a Transaction, Real-Time Approve/Decline Decision

Technical blog

http://blog.forter.com covers anti-fraud business topics, technology developments, and reports.

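3. DataVisor

Introduction

https://www.datavisor.com

Services offered:

Financial fraud, anti-money laundering, e-commerce fraud prevention

Industries served:

Yelp, Momo, Changba, and others

Technical blog

https://www.datavisor.com/blog/ covers DataVisor's big data risk control technology, products and architecture, machine learning, rule engines, decision platforms, and more.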

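4. PatternEx

Introduction

https://www.patternex.com

Services offered: data analysis, account takeover detection, and an AI risk control assistant. It provides big data risk control services using AI driven by big data analysis.

Technical blog

https://www.patternex.com/blog covers PatternEx's research and exploration of AI techniques for big data risk control, technical introductions, and fraud-related reports.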


 
 Posted at 8:55 PM