tips

May 5, 2017
 
learning deep learning with keras http://p.migdal.pl/2017/04/30/teaching-deep-learning.html
Mobile video in 2017: user profiles and trend forecasts http://mp.weixin.qq.com/s?src=3&timestamp=1493786539&ver=1&signature=LmxQAN5pURJyKafpyA7pOMD85zwzyCuxMHTKCKqXVC7D-a*-DWOtWapMbH0LA6VWonKHQy1pAp*EX0bRu5lpiTELozDJCDoTiZL4*5LVI2sMLN5DECkwAslE1rtyOmOs8zc8opBzTAfPs7sN9uqgLVhn20e-HC1lJG7SBSj*gmQ=
https://chrisalbon.com/ Notes on Data Science, Machine Learning, & Artificial Intelligence
TensorFlow template application for deep learning https://github.com/tobegit3hub/deep_recommend_system.git
Python + Scrapy + MongoDB . 5 million data per day !!!💥 The world's largest website. 🔞  https://github.com/xiyouMc/WebHubBot
QuestionAnsweringSystem is a question answering system implemented in Java that automatically analyzes a question and returns candidate answers https://github.com/ysc/QuestionAnsweringSystem
Python word-frequency script for Xinwen Lianbo (CCTV evening news) using jieba https://github.com/maxiee/MyCodes/blob/master/PythonJiebaProjects/XWLB_words_freq/xwlb_jieba.py
2nd place solution for the 2017 National Data Science Bowl http://juliandewit.github.io/kaggle-ndsb2017/
feature hash https://github.com/wush978/FeatureHashing
A framework for training and evaluating AI models on a variety of openly available dialog datasets https://github.com/facebookresearch/ParlAI


 
Posted at 10:25 PM
May 1, 2017
 
How transferable are features in deep neural networks? https://arxiv.org/abs/1411.1792
TensorFlow CNN for fast style transfer https://github.com/lengstrom/fast-style-transfer
https://github.com/HappyShadowWalker/ChineseTextClassify Chinese text classification using the Sogou text classification corpus
https://lukeoakdenrayner.wordpress.com/2017/04/24/the-end-of-human-doctors-understanding-medicine/ The End of Human Doctors – Understanding Medicine
Machine Learning in Science and Industry slides http://arogozhnikov.github.io/2017/04/20/machine-learning-in-science-and-industry.html https://github.com/yandexdataschool/MLAtGradDays
all the available code repos for the NIPS 2016's top papers https://www.reddit.com/r/MachineLearning/comments/5hwqeb/project_all_code_implementations_for_nips_2016/
Best Practices for Applying Deep Learning to Novel Applications https://arxiv.org/abs/1704.01568v1
Medical Image Analysis with Deep Learning https://medium.com/@taposhdr/medical-image-analysis-with-deep-learning-i-23d518abf531
https://royalsociety.org/~/media/policy/projects/machine-learning/publications/machine-learning-report.pdf
Chinese rumor database http://rumor.thunlp.org/


 
Posted at 10:33 PM
April 23, 2017
 
http://alias-i.com/lingpipe/  LingPipe is a toolkit for processing text using computational linguistics.
http://svmlight.joachims.org/ SVM-based text classification
http://adrem.ua.ac.be/~tmartin/  SVM JNI Java interface
https://github.com/antoniosehk/keras-tensorflow-windows-installation  installing the Keras deep learning package with tensorflow-gpu on Windows
http://thegrandjanitor.com/  machine learning
http://www.wsdm-conference.org/2017/accepted-papers/  WSDM 2017 accepted papers
https://www.slideshare.net/BhaskarMitra3/neural-text-embeddings-for-information-retrieval-wsdm-2017
https://github.com/laura-dietz/tutorial-utilizing-kg
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
https://cmusatyalab.github.io/openface/
https://eliasvansteenkiste.github.io/ Predicting lung cancer
https://brage.bibsys.no/xmlui/handle/11250/2433761 Tree Boosting With XGBoost - Why Does XGBoost Win "Every" Machine Learning Competition?
https://github.com/YaronBlinder/MIMIC-III_readmission/ Predicting 30-day ICU readmissions from the MIMIC-III database
https://github.com/caffe2/caffe2  Caffe2, Facebook's open-source deep learning framework
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications https://arxiv.org/abs/1704.04861
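https://zhuanlan.zhihu.com/p/24322376 A feast of fraud: a million-strong underground-economy army and twenty million phone numbers carving up a ten-billion-yuan cake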


 
Posted at 6:16 PM
April 22, 2017
 

Common tools

1. Text processing

(1) Atom, https://atom.io/, with these frequently used plugins:

Column editing, https://atom.io/packages/Sublime-Style-Column-Selection

Run code in Atom (mainly for running Python) https://atom.io/packages/script

Project management https://atom.io/packages/project-manager

Markdown editing https://atom.io/packages/markdown-preview-plus

markdown-scroll-sync https://atom.io/packages/markdown-scroll-sync

Python Autocomplete Package https://atom.io/packages/autocomplete-python

HQL (Apache Hive) query language https://atom.io/packages/language-hql

(2) Sublime Text, http://www.sublimetext.com/

(3) Markdown editor for Mac, http://macdown.uranusjr.com/

(4) pandoc, http://www.pandoc.org/, format conversion and processing of Markdown and other formats

(5) Word, PowerPoint, Excel, OneNote: diagrams, notes, and spreadsheets

2. Coding tools

PyCharm, http://www.jetbrains.com/pycharm/

IntelliJ IDEA, http://www.jetbrains.com/idea/

Maven, http://maven.apache.org/, project management

Visual Studio Code, https://code.visualstudio.com/

RStudio, https://www.rstudio.com/

3. Mind mapping

XMind http://www.xmindchina.net/

4. Terminal login tools

iTerm2 (macOS) http://www.iterm2.com/

PuTTY (Windows)

5. Network analysis

Gephi, https://gephi.org/

6. Visualization tools

Graphviz, http://www.graphviz.org/

7. FTP tools

FileZilla, https://filezilla-project.org/

8. Version control

git, https://git-scm.com/

9. Documentation writing

mkdocs, http://www.mkdocs.org/

10. Database tools

MySQL, https://www.mysql.com/; PostgreSQL, https://www.postgresql.org/


 
Posted at 12:22 AM
April 15, 2017
 
http://adventuresinmachinelearning.com/neural-networks-tutorial/
http://adventuresinmachinelearning.com/improve-neural-networks-part-1/
http://adventuresinmachinelearning.com/stochastic-gradient-descent/
https://github.com/adventuresinML/adventures-in-ml-code
https://github.com/yandexdataschool/Practical_RL practical reinforcement learning course
https://github.com/yandexdataschool/YSDA_deeplearning17 Deep Learning course, 2017
https://webhose.io/datasets  free datasets
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/keras/python/keras/applications/vgg16.py
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/keras/python/keras/applications/vgg19.py
https://arxiv.org/pdf/1409.1556.pdf Very Deep Convolutional Networks for Large-Scale Image Recognition
https://github.com/machrisaa/tensorflow-vgg
transfer learning
pre-trained networks
https://github.com/BVLC/caffe/tree/master/models
http://mscoco.org/dataset/#download
https://github.com/visipedia/inat_comp
https://rahulduggal2608.wordpress.com/2017/04/02/alexnet-in-keras/


Anatomy of the internet underground economy: fake phone numbers
http://mp.weixin.qq.com/s?__biz=MzA4MjI2MTcwMw==&mid=2650485616&idx=1&sn=d26063f090b936d7efd3fedf32108df0&chksm=8787f0d8b0f079ce57ba26a9a6deb1f444a7939b6a1d443a3cbc773b662084bc6eaf8a16ea74&scene=21#wechat_redirect
Anatomy of the internet underground economy: proxies and anonymity
http://mp.weixin.qq.com/s?__biz=MzA4MjI2MTcwMw==&mid=2650485686&idx=1&sn=b8d3fd492e7fd27c0ceec7a98511c63b&chksm=8787f01eb0f0790824d1d0ac37a6817416e7ec79e535d7ba8e628628eef6fcaac962ca3ebfe0&scene=21#wechat_redirect
Everything you want to know about IP addresses (part 1)
http://mp.weixin.qq.com/s?__biz=MzA4MjI2MTcwMw==&mid=2650485704&idx=1&sn=a34cb411701008ed13b1042ba549d341&chksm=8787f060b0f0797642628a9bba9f4ea5f713f4bb8b0347a69630938c333ca6ee3ce46ca470a3&scene=21#wechat_redirect

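https://github.com/stuxuhai/jpinyin  JPinyin is an open-source Java library for converting Chinese characters to pinyin
   Maven dependency:
   groupId: com.github.stuxuhai
   artifactId: jpinyin
   version: 1.1.8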


https://github.com/NLPchina/ansj_seg  Chinese word segmentation
    Maven dependency:
    groupId: org.ansj
    artifactId: ansj_seg
    version: 5.1.1

emoji-java is a lightweight Java library that helps you use emojis in your Java applications.
  Maven dependency:
  groupId: com.vdurmont
  artifactId: emoji-java
  version: 3.2.0


 
Posted at 6:22 PM
April 5, 2017
 

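Big-data risk control startups

1. siftscience

Introduction

https://siftscience.com

Services offered:

Account takeover, payment fraud, spam content, account abuse, promotion abuse, device fingerprinting

Industries served:

E-commerce, travel, ticketing, digital goods, and others.

Technical blog

https://engineering.siftscience.com covers Sift Science's risk control technology, engineering, algorithms, and architecture.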

2. forter

Introduction

https://www.forter.com

Services: All E-Commerce Needs: Marketplaces, Digital Goods, Services, Physical Goods, Travel, Mobile (SDK & API), Alternative Payments

Technology: Machine Learning with a Human Touch, Understanding the Context of a Transaction, Real-Time Approve/Decline Decision

Technical blog

http://blog.forter.com covers anti-fraud business topics, technology developments, and reports.

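3. datavisor

Introduction

https://www.datavisor.com

Services offered:

Financial fraud detection, anti-money laundering, e-commerce anti-fraud

Industries served:

Customers such as Yelp, Momo, and Changba

Technical blog

https://www.datavisor.com/blog/ covers DataVisor's big-data risk control technology, products, and architecture, including machine learning, rule engines, and decision platforms.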

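4. patternex

Introduction

https://www.patternex.com

Services offered: data analytics, account takeover detection, and an AI risk-control assistant. The company provides big-data risk control services built on AI driven by big-data analysis.

Technical blog

https://www.patternex.com/blog covers PatternEx's research and exploration of AI for big-data risk control, along with technical introductions and anti-fraud reports.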


 
Posted at 8:55 PM
March 25, 2017
 

Downloading the TensorFlow course lecture notes with Python

# -*- coding: utf-8 -*-
# @DATE    : 2017/3/25 11:08
# @Author  : 
# @File    : pdf_download.py

import os
import shutil
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def download_file(url, file_folder):
    """Stream one file into file_folder and return the local path."""
    file_name = url.split("/")[-1]
    file_path = os.path.join(file_folder, file_name)
    r = requests.get(url=url, stream=True)
    r.raise_for_status()  # fail early instead of saving an error page
    with open(file_path, "wb") as f:
        for chunk in r.iter_content(chunk_size=1024 * 1024):
            if chunk:
                f.write(chunk)
    r.close()
    return file_path


def get_pdfs(url, file_folder):
    """Download every PDF linked from the page at url."""
    html = requests.get(url).text
    soup = BeautifulSoup(html, "lxml")
    cnt = 0
    for link in soup.find_all("a"):
        href = link.get("href")
        # Anchors without an href return None; skip them and non-PDF links.
        if not href or not href.endswith(".pdf"):
            continue
        # Resolve relative links against the page URL.
        file_url = urljoin(url, href)
        file_path = download_file(file_url, file_folder)
        print("downloaded {} -> {}".format(file_url, file_path))
        cnt += 1
    print("downloaded {} pdfs".format(cnt))


def main():
    course_url = "http://web.stanford.edu/class/cs20si/syllabus.html"
    file_folder = "./course_note"
    # Start from an empty output folder on every run.
    if os.path.exists(file_folder):
        shutil.rmtree(file_folder)
    os.mkdir(file_folder)
    get_pdfs(course_url, file_folder)


if __name__ == "__main__":
    main()
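
One detail worth checking when collecting links this way: href values on a page are often relative, and some anchors have no href at all, so links should be filtered and resolved against the page URL before downloading. The sketch below illustrates that step offline with only the standard library. The names PdfLinkParser and extract_pdf_links and the sample HTML are hypothetical, made up for illustration; they are not part of the script above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkParser(HTMLParser):
    """Collects absolute URLs of all anchor links that point to a .pdf file."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Anchors without an href yield None; skip them and non-PDF links.
        if href and href.lower().endswith(".pdf"):
            # urljoin keeps absolute URLs as they are and resolves relative ones.
            self.links.append(urljoin(self.base_url, href))


def extract_pdf_links(html, base_url):
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical sample page, for illustration only.
sample_html = """
<a name="top">no href here</a>
<a href="lectures/slides_01.pdf">relative link</a>
<a href="http://example.com/notes/extra.pdf">absolute link</a>
<a href="syllabus.html">not a pdf</a>
"""
page_url = "http://web.stanford.edu/class/cs20si/syllabus.html"
for link in extract_pdf_links(sample_html, page_url):
    print(link)
```

Resolving against the page URL rather than a hard-coded root keeps the same code usable for other course pages.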

 
Posted at 7:42 PM
March 17, 2017
 

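Learning mkdocs, a documentation tool

I have been writing project documentation recently and found mkdocs very lightweight: it is based on Markdown and generates static HTML that can be hosted on any server, which makes it convenient and practical.

1. Introduction

mkdocs is a tool for building project documentation from Markdown. It outputs static HTML files that can be deployed on a server and accessed from there.

2. Installing mkdocs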

pip install mkdocs


$ mkdocs --version
mkdocs, version 0.16.1

3.1 Usage

Create a new project with mkdocs new followed by the project name. This generates a configuration file, mkdocs.yml, and a Markdown file, index.md, in the docs folder.


$ mkdocs new deeplearning
INFO    -  Creating project directory: deeplearning
INFO    -  Writing config file: deeplearning/mkdocs.yml
INFO    -  Writing initial docs: deeplearning/docs/index.md
$ cd deeplearning/
$ ll
total 8
drwxr-xr-x  3   staff   102B  3 17 11:11 docs
-rw-r--r--  1   staff    19B  3 17 11:11 mkdocs.yml

Start the built-in mkdocs development server


$ mkdocs serve
INFO    -  Building documentation...
INFO    -  Cleaning site directory
[I 170317 11:35:02 server:283] Serving on http://127.0.0.1:8000
[I 170317 11:35:02 handlers:60] Start watching changes
[I 170317 11:35:02 handlers:62] Start detecting changes

http://localhost:8000



3.2 Editing documents

(1) Edit the pages

(2) Edit the configuration file


$ cat mkdocs.yml
site_name: 深度学习
pages:
    - home: index.md
    - content: deeplearning.md
    - about: about.md
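
Each entry under pages maps a navigation title to a Markdown file in the docs folder, so the files named there (deeplearning.md and about.md here) need to exist next to index.md. A minimal Python sketch that creates stub files for any missing pages might look like this. The helper name create_missing_pages and the stub heading are assumptions made for illustration, and the page list is copied by hand rather than parsed from mkdocs.yml.

```python
import os
import tempfile


def create_missing_pages(docs_dir, pages):
    """Create a stub Markdown file for every page that does not exist yet.

    pages is a list of (title, file_name) pairs mirroring the pages
    section of mkdocs.yml. Returns the file names that were created.
    """
    os.makedirs(docs_dir, exist_ok=True)
    created = []
    for title, file_name in pages:
        path = os.path.join(docs_dir, file_name)
        if os.path.exists(path):
            continue  # never overwrite a page that already has content
        with open(path, "w", encoding="utf-8") as f:
            f.write("# {}\n".format(title))
        created.append(file_name)
    return created


# Page list copied by hand from the mkdocs.yml shown above.
pages = [
    ("home", "index.md"),
    ("content", "deeplearning.md"),
    ("about", "about.md"),
]

# Demonstrated in a temporary directory; in a real project docs_dir
# would be the docs folder next to mkdocs.yml.
with tempfile.TemporaryDirectory() as project_dir:
    docs_dir = os.path.join(project_dir, "docs")
    os.makedirs(docs_dir)
    # Simulate the index.md that "mkdocs new" already generated.
    with open(os.path.join(docs_dir, "index.md"), "w", encoding="utf-8") as f:
        f.write("# Welcome\n")
    print(create_missing_pages(docs_dir, pages))
```

Existing files are skipped, so the helper can be rerun safely after new entries are added to pages.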




 
Posted at 8:36 PM
January 7, 2017
 

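Making a Windows 7 USB boot drive

I recently replaced my computer and needed to reinstall the operating system. It had been a long time since I last reinstalled Windows; as far as I recall, the last time was years ago, from a Windows installation disc. Computers with optical drives are rare now, so I needed to make a USB-bootable Windows installer. The process is summarized below for future reinstalls.

1. Download Microsoft's official creation tool

http://www.microsoft.com/en-us/download/windows-usb-dvd-download-tool/
http://wudt.codeplex.com

2. Install the USB creation tool

Open the downloaded tool and run the installer.

3. Create the USB boot drive

(1) Download a Windows 7 ISO file. My computer is 64-bit, so I prepared the 64-bit Windows 7 ISO.
(2) Launch the installed USB creation tool.
Step 1: select the ISO file of the operating system.
Step 2: select the media type, USB or DVD; choose USB here.
Step 3: insert the USB drive and select it; the tool warns that the drive will be formatted.
Step 4: the tool installs the boot loader and copies the files. This step may fail with the error: We were unable to copy your files. Please check your USB device and the selected ISO file and try again.
This is a pitfall; it is best to format the drive manually, as follows.
Open the Windows command prompt and enter the commands below, replacing # with the number of the USB drive shown by list disk (clean erases everything on the selected disk):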
diskpart
list disk
select disk #
clean
create partition primary
select partition 1
active
format quick fs=fat32
assign
exit
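Then repeat the earlier steps. Step 4 may now fail with a bootsect error; the fix is as follows.
Download the 32-bit Windows 7 ISO, open it, go to the boot folder, and copy bootsect.exe into the installation folder of the USB creation tool. (I also tried the bootsect file from the 64-bit ISO, but it still failed.) Run through the process again; this time the tool reports that the drive was created successfully.

4. Install Windows 7

Plug the USB boot drive into the new computer and choose to boot from USB. My computer detected the drive automatically and went straight to the Windows installer; I followed the prompts and the installation succeeded.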


 
Posted at 10:29 PM
December 27, 2016
 
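Ant Financial's Risk Intelligence department is hiring for data mining and machine learning roles, based in Shanghai or Hangzhou
Job description:
Mainly data mining for risk control in internet finance.
Requirements:
(1) Coding ability (Python or Java)
(2) Hands-on experience applying data mining and machine learning (in the internet industry)
(3) Familiarity with Hadoop, TensorFlow, or Spark
(4) Enthusiasm for the work
Feel free to contact me through the site.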

 
Posted at 8:46 PM