282017
 

Did you know that… Scientists have concluded that the chicken came first, not the egg, because the protein which makes the egg shells is only produced by hens. (Source) A toaster uses almost half as much energy as a full-sized oven. (Source) The London Eye in England is the largest [...]

The post Are you learning what future employers are really looking for? appeared first on SAS Analytics U Blog.

272017
 
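A small experiment: predicting stock prices with an RNN

RNN example

This example builds a time-series forecasting model on a basic RNN (the code below uses an LSTM cell) to predict the trend of a stock price. It is meant purely as a learning experiment with RNNs.

Code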


# -*- coding: utf-8 -*-
# @DATE    : 2017/2/14 17:50
# @Author  : 
# @File    : stock_predict.py

import os
import sys
import datetime

import tensorflow as tf
import pandas as pd
import numpy as np
from yahoo_finance import Share
import matplotlib.pyplot as plt

from utils import get_n_day_before, date_2_str


class StockRNN(object):
    def __init__(self, seq_size=12, input_dims=1, hidden_layer_size=12, stock_id="BABA", days=365, log_dir="stock_model/"):
        self.seq_size = seq_size
        self.input_dims = input_dims
        self.hidden_layer_size = hidden_layer_size
        self.stock_id = stock_id
        self.days = days
        self.data = self._read_stock_data()["Adj_Close"].astype(float).values
        self.log_dir = log_dir

    def _read_stock_data(self):
        stock = Share(self.stock_id)
        end_date = date_2_str(datetime.date.today())
        start_date = get_n_day_before(self.days)  # use the configured look-back window
        # print(start_date, end_date)

        his_data = stock.get_historical(start_date=start_date, end_date=end_date)
        stock_pd = pd.DataFrame(his_data)
        stock_pd["Adj_Close"] = stock_pd["Adj_Close"].astype(float)
        stock_pd.sort_values(["Date"], inplace=True, ascending=True)
        stock_pd.reset_index(inplace=True)
        return stock_pd[["Date", "Adj_Close"]]

    def _create_placeholders(self):
        with tf.name_scope(name="data"):
            self.X = tf.placeholder(tf.float32, [None, self.seq_size, self.input_dims], name="x_input")
            self.Y = tf.placeholder(tf.float32, [None, self.seq_size], name="y_input")

    def init_network(self, log_dir):
        print("Init RNN network")
        self.log_dir = log_dir
        self.sess = tf.Session()
        self.summary_op = tf.summary.merge_all()
        self.saver = tf.train.Saver()
        self.summary_writer = tf.summary.FileWriter(self.log_dir, self.sess.graph)
        self.sess.run(tf.global_variables_initializer())
        ckpt = tf.train.get_checkpoint_state(self.log_dir)
        if ckpt and ckpt.model_checkpoint_path:
            self.saver.restore(self.sess, ckpt.model_checkpoint_path)
            print("Model restore")

        self.coord = tf.train.Coordinator()
        self.threads = tf.train.start_queue_runners(self.sess, self.coord)

    def _create_rnn(self):
        # name the variables themselves, not the random-init ops
        W = tf.Variable(tf.random_normal([self.hidden_layer_size, 1]), name="W")
        b = tf.Variable(tf.random_normal([1]), name="b")
        with tf.variable_scope("cell_d"):
            cell = tf.contrib.rnn.BasicLSTMCell(self.hidden_layer_size)
        with tf.variable_scope("rnn_d"):
            outputs, states = tf.nn.dynamic_rnn(cell, self.X, dtype=tf.float32)

        W_repeated = tf.tile(tf.expand_dims(W, 0), [tf.shape(self.X)[0], 1, 1])
        out = tf.matmul(outputs, W_repeated) + b
        out = tf.squeeze(out)
        return out

    def _data_prepare(self):
        self.train_x = []
        self.train_y = []
        # data
        data = np.log1p(self.data)
        # each window of seq_size points is paired with the same window shifted by one step
        for i in range(len(data) - self.seq_size):
            self.train_x.append(np.expand_dims(data[i: i + self.seq_size], axis=1).tolist())
            self.train_y.append(data[i + 1: i + self.seq_size + 1].tolist())

    def train_pred_rnn(self):

        self._create_placeholders()

        y_hat = self._create_rnn()
        self._data_prepare()
        loss = tf.reduce_mean(tf.square(y_hat - self.Y))
        train_optim = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss)
        feed_dict = {self.X: self.train_x, self.Y: self.train_y}

        saver = tf.train.Saver(tf.global_variables())
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            for step in range(1, 20001):
                _, loss_ = sess.run([train_optim, loss], feed_dict=feed_dict)
                if step % 100 == 0:
                    print("{} {}".format(step, loss_))
            saver.save(sess, self.log_dir + "model.ckpt")

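            # prediction: seed with the most recent seq_size observations (log scale),
            # so that the first predicted value is the first unseen day
            prev_seq = np.expand_dims(np.log1p(self.data[-self.seq_size:]), axis=1)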
            predict = []
            for i in range(5):
                next_seq = sess.run(y_hat, feed_dict={self.X: [prev_seq]})
                predict.append(next_seq[-1])
                prev_seq = np.vstack((prev_seq[1:], next_seq[-1]))
            predict = np.exp(predict) - 1
            print(predict)
            self.pred = predict

    def visualize(self):
        pred = self.pred
        plt.figure()
        plt.plot(list(range(len(self.data))), self.data, color='b', label="history")
        plt.plot(list(range(len(self.data), len(self.data) + len(pred))), pred, color='r', label="forecast")
        # legend must be created after the labelled lines exist
        plt.legend(prop={'size': 15})
        plt.title(u"{} stock price forecast".format(self.stock_id))
        plt.xlabel(u"Date")
        plt.ylabel(u"Stock price")
        plt.savefig("stock.png")
        plt.show()


if __name__ == "__main__":
    stock = StockRNN()
    # print(stock.read_stock_data())
    log_dir = "stock_model"
    stock.train_pred_rnn()
    stock.visualize()

Output: predicted BABA stock price for the next 5 days (2017.2.27)


[ 104.23436737  103.82189941  103.59770966  103.43360138  103.29838562]

Sample of the data:

2017-02-08,103.57
2017-02-09,103.339996
2017-02-10,102.360001
2017-02-13,103.099998
2017-02-14,101.589996
2017-02-15,101.550003
2017-02-16,100.82
2017-02-17,100.519997
2017-02-21,102.120003
2017-02-22,104.199997
2017-02-23,102.459999
2017-02-24,102.949997
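To make the data preparation step more concrete, here is a minimal, standard-library-only sketch of the windowing idea behind `_data_prepare`, applied to the sample prices above. The function name `make_windows` and the window length of 4 are assumptions for illustration (the listing defaults to `seq_size=12` and works on NumPy arrays); the sketch keeps every window whose shifted target still fits inside the series.

```python
import math


def make_windows(prices, seq_size):
    """Pair each window of log1p prices with the same window shifted by one step."""
    logged = [math.log1p(p) for p in prices]
    xs, ys = [], []
    for i in range(len(logged) - seq_size):
        xs.append(logged[i:i + seq_size])          # inputs:  days i .. i+seq_size-1
        ys.append(logged[i + 1:i + seq_size + 1])  # targets: days i+1 .. i+seq_size
    return xs, ys


# Adjusted closes copied from the sample data in the post.
prices = [103.57, 103.339996, 102.360001, 103.099998, 101.589996, 101.550003,
          100.82, 100.519997, 102.120003, 104.199997, 102.459999, 102.949997]

xs, ys = make_windows(prices, seq_size=4)  # seq_size=4 is only for the demo
print(len(xs), "windows of length", len(xs[0]))
```

Each target window overlaps its input window in all but the last position, so the network is trained to predict the next value at every time step, and only the final output is used when forecasting.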



 
 Posted by at 10:04 PM
272017
 

Longtime SAS programmers know that the SAS DATA step and SAS procedures are very tolerant of typographical errors. You can misspell most keywords and SAS will "guess" what you mean. For example, if you mistype "PROC" as "PRC," SAS will run the program but write a warning to the log: WARNING 14-169: Assuming the symbol PROC was misspelled as PRC.

This feature provided a big productivity boost in the days before GUI program editors. Imagine submitting a program from a command line in the early 1980s. If you mistyped one keyword you would have to retype the entire statement. As a convenience, SAS implemented an algorithm that checks the "spelling distance" between the tokens that you submit and a list of valid keywords for the procedure that you are calling. DATA step programmers might be familiar with the SPEDIS function, which measures how close two words are to each other in the English language. The SAS language parser uses the same algorithm.
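The idea of matching a mistyped token to the nearest valid keyword can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses plain Levenshtein edit distance rather than the SPEDIS cost model, and the keyword list, the `autocorrect` function, and the `max_distance` threshold are hypothetical names chosen for the example, not anything taken from SAS.

```python
def edit_distance(a, b):
    """Levenshtein distance: single-character inserts, deletes, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def autocorrect(token, keywords, max_distance=2):
    """Return the closest keyword, or None if nothing is close enough."""
    token = token.upper()
    best = min(keywords, key=lambda kw: edit_distance(token, kw))
    return best if edit_distance(token, best) <= max_distance else None


# Hypothetical keyword list; a real parser would use the keywords that are
# valid in the context of the procedure being called.
KEYWORDS = ["PROC", "DATA", "ORDER", "TABLE", "CHISQ", "RUN"]

for typo in ["prc", "tble", "chsq", "runn"]:
    print(typo, "->", autocorrect(typo, KEYWORDS))
```

Restricting the candidate list to keywords that are valid in the current context keeps the search small and makes accidental matches less likely.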

Not everyone wants this feature. Many companies in regulated industries (such as pharmaceuticals) turn off the autocorrect feature in SAS because they want to force their programmers to type every keyword correctly. You can determine whether the AUTOCORRECT option is enabled on your system by running PROC OPTIONS:

proc options option=AUTOCORRECT value;  run;

The AUTOCORRECT option is turned on by default. You can turn off the option by submitting options NOAUTOCORRECT or by putting -NOAUTOCORRECT in a configuration file.

Today I've invited two people to argue for and against using this feature. Larry Literal is a programmer who believes that no program should ever accept a syntax error. Annie Intel sees nothing wrong with programs that self-correct. She argues that it is desirable for programs to interpret the intention of the programmer. Which do you agree with? Do you have something to add? Leave a comment.

Point: A program should not allow ambiguity

My name is Larry Literal and I believe that computer programming should be an exact science. There is no room for ambiguity. A program that runs because it is "close to" a correct program is an abomination. I do not want a computer to change the code that I write!

When my system administrator installs a new version of SAS, the first thing I do is turn off the autocorrect feature. (I've also turned off the autocorrect feature on my phone. What a pain!) My main argument against the AUTOCORRECT option is that it makes code unreadable. Take a look at the following program:

/* The correct program is:
   proc freq data=sashelp.class order=freq;
      table sex / chisq;
   run;
*/
prc freq dta=sashelp.class ordor=freqq;
   tble sex / chsq;
runn;

Every keyword in this program is mistyped. The only tokens that are specified correctly are the name of the procedure, the name of the data set, and the name of the variable. The program looks more like the Klingon language than the SAS language, yet this program runs if you use the AUTOCORRECT option!

And what happens if SAS introduces a new keyword that is closer to a mistyped word than a previous keyword? Then the procedure might do something different even though I have not changed the program! The autocorrect feature is an abomination and should never be used!

Counterpoint: Computers should interpret what you say

Really, Larry? "An abomination"? What century are you living in?

My name is Annie Intel, but my friends call me "A.I." I think the SAS autocorrect feature was way ahead of its time. Today we have autocorrecting logic on smartphones and word processors. Applying the same techniques to computer programs is no different. In fact, if you use a modern SAS program editor, the editor will suggest valid keywords and flag any keyword that is not valid.

Let's be real: Larry's example is not realistic. No programmer is going to use that garbled call to PROC FREQ in a production job. The autocorrect feature does not "make code unreadable." It is a convenience while developing a program, not an excuse to write nonsense. Any competent programmer will check the log for warning messages and correct the typos.

Larry claims that he doesn't want a computer munging and altering the code he writes. But optimizing compilers have been doing exactly that for decades! Programmers write instructions in a high-level language and an optimizing compiler maps the code to a set of machine instructions. The compiler will sometimes rearrange the structure of the program to get better performance. If it is okay for a compiler to map a program into an optimal version of itself, why is it not okay for a parser to do the same by correcting misspellings?

I want computers to recognize my intentions. When I give a voice command to my smartphone or personal home device, the audio signal is mapped to an action. I am allowed a certain amount of flexibility. "Turn on the lights" and "turn da light on" are equivalent phrases that should be understood and mapped to the same action. The SAS AUTOCORRECT feature is similar. The interpreter has a context (the name of the procedure) which is used to standardize your input. I think it is very cool. In the future, I think more programming languages will accept ambiguities.

The post Point/Counterpoint: Should a programming language accept misspelled keywords? appeared first on The DO Loop.

272017
 

Manufacturers are used to operating in challenging circumstances – whether financial, economic, political or competitive. And it's a good thing since the coming years seem likely to be as uncertain as those following the global economic crash. But manufacturing will have an excellent opportunity to thrive because, like many sectors, [...]

Why quality is the answer to manufacturing challenges was published on SAS Voices by Tim Clark

252017
 

Forecasting at Scale

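1. Facebook time-series forecasting

Facebook has open-sourced a time-series forecasting algorithm. It is based on an additive model and supports non-linear trends, change points, periodicity, seasonality, holidays, and more.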

It is based on an additive model where non-linear trends are fit with yearly and weekly seasonality, plus holidays. It works best with daily periodicity data with at least one year of historical data. Prophet is robust to missing data, shifts in the trend, and large outliers.

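Time-series forecasting comes up very often in practice: for example, forecasting business growth and setting business targets, or setting product KPIs and forecasting future UV and PV.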

2. Time-series forecasting framework



3. Algorithm

Additive model

$y(t) = g(t) + s(t) + h(t) + \epsilon_t$, where

$g(t)$ is the growth function, which fits the non-periodic changes in the time series;

$s(t)$ represents periodic seasonal changes, such as weekly or yearly seasonality;

$h(t)$ represents the effect of holidays or events on the forecast.
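As a rough illustration of how the additive components combine, the following standard-library sketch simulates a series as trend plus weekly seasonality plus a holiday bump plus noise. It is not how Prophet fits or parameterizes these terms; the function names, the linear trend, the single sine wave, and all numeric values are assumptions chosen to keep the example short.

```python
import math
import random


def trend(t, slope=0.05, intercept=10.0):
    """g(t): non-periodic growth, here simply a straight line."""
    return intercept + slope * t


def seasonality(t, period=7.0, amplitude=2.0):
    """s(t): periodic component, here a single weekly sine wave."""
    return amplitude * math.sin(2.0 * math.pi * t / period)


def holiday_effect(t, holidays, bump=5.0):
    """h(t): additive bump on the day indices listed in `holidays`."""
    return bump if t in holidays else 0.0


def simulate(n_days, holidays, noise_sd=0.0, seed=0):
    """y(t) = g(t) + s(t) + h(t) + eps_t for t = 0 .. n_days - 1."""
    rng = random.Random(seed)
    return [trend(t) + seasonality(t) + holiday_effect(t, holidays)
            + rng.gauss(0.0, noise_sd)
            for t in range(n_days)]


y = simulate(n_days=14, holidays={3})
print([round(v, 2) for v in y])
```

Because the components are simply summed, each one can be inspected or replaced on its own, which is what makes the additive form convenient for adding holidays or extra seasonalities.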

4. Example


# -*- coding: utf-8 -*-
# @DATE    : 2017/2/25 18:18
# @Author  : 
# @File    : fb_example1.py

import pandas as pd
import numpy as np
from fbprophet import Prophet

data_df = pd.read_csv("data/example_wp_peyton_manning.csv")
data_df["y"] = np.log(data_df["y"])
print(data_df.head())
print(data_df.tail())

# fit the model, model params
# growth = 'linear',
# changepoints = None,
# n_changepoints = 25,
# yearly_seasonality = True,
# weekly_seasonality = True,
# holidays = None,
# seasonality_prior_scale = 10.0,
# holidays_prior_scale = 10.0,
# changepoint_prior_scale = 0.05,
# mcmc_samples = 0,
# interval_width = 0.80,
# uncertainty_samples = 1000
m = Prophet()
m.fit(data_df)

# make prediction
data_future = m.make_future_dataframe(periods=30)
print(data_future.tail())
pred_res = m.predict(data_future)
print(pred_res[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail())

# visualization
m.plot(pred_res)

Output:


           ds         y
0  2007-12-10  9.590761
1  2007-12-11  8.519590
2  2007-12-12  8.183677
3  2007-12-13  8.072467
4  2007-12-14  7.893572
              ds          y
2900  2016-01-16   7.817223
2901  2016-01-17   9.273878
2902  2016-01-18  10.333775
2903  2016-01-19   9.125871
2904  2016-01-20   8.891374
STAN OPTIMIZATION COMMAND (LBFGS)
init = user
save_iterations = 1
init_alpha = 0.001
tol_obj = 1e-12
tol_grad = 1e-08
tol_param = 1e-08
tol_rel_obj = 10000
tol_rel_grad = 1e+07
history_size = 5
seed = 1691376609
initial log joint probability = -19.4685
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
      99       7977.57   0.000941357       431.339      0.3404      0.3404      134   
     199        7988.7   0.000894011       356.862       0.739       0.739      241   
     299       7996.29    0.00359033       180.856           1           1      358   
     399       8000.11   0.000546236       205.358     0.09131      0.7253      481   
     499       8002.89    0.00024026        99.613           1           1      608   
     514       8003.11   5.25911e-05       135.817   7.646e-07       0.001      671  LS failed, Hessian reset 
     580       8003.41   3.04884e-05       92.4947    1.88e-07       0.001      798  LS failed, Hessian reset 
     599       8003.49   8.15685e-05        83.046      0.6885      0.6885      821   
     607        8003.5   2.60204e-05       67.9783   1.712e-07       0.001      874  LS failed, Hessian reset 
     654       8003.64   0.000118504       280.906   6.562e-07       0.001      973  LS failed, Hessian reset 
     699       8003.75   2.52751e-06       58.0645      0.3238           1     1029   
     705       8003.75   4.61033e-07       59.0008      0.2964           1     1037   
Optimization terminated normally: 
  Convergence detected: relative gradient magnitude is below tolerance
             ds
2930 2016-02-15
2931 2016-02-16
2932 2016-02-17
2933 2016-02-18
2934 2016-02-19
             ds      yhat  yhat_lower  yhat_upper
2930 2016-02-15  8.021739    7.371417    8.641458
2931 2016-02-16  7.710504    7.079853    8.334700
2932 2016-02-17  7.448298    6.849103    8.012131
2933 2016-02-18  7.370376    6.724225    8.004908
2934 2016-02-19  7.305117    6.683996    8.001754

Process finished with exit code 0

5. References

facebook prophet 

https://facebookincubator.github.io/prophet/

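PS: In day-to-day work, forecasts of transaction value, sales, PV, and so on can borrow Facebook's time-series techniques by bringing in seasonality, holidays, and promotional events (such as Double 11 and Double 12).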
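One way to describe such promotional events is as rows with the column names that Prophet's `holidays` argument expects (`holiday`, `ds`, and optionally `lower_window` and `upper_window`). The sketch below only builds those rows with the standard library; the helper name `promo_rows`, the event names, the years, and the one-day spillover after Double 11 are all assumptions for illustration.

```python
from datetime import date


def promo_rows(name, month, day, years, lower_window=0, upper_window=0):
    """One row per year for a recurring promotional event."""
    return [{"holiday": name,
             "ds": date(year, month, day).isoformat(),
             "lower_window": lower_window,   # days before the event to include
             "upper_window": upper_window}   # days after the event to include
            for year in years]


years = range(2014, 2018)  # hypothetical history span
rows = (promo_rows("double_11", 11, 11, years, upper_window=1)
        + promo_rows("double_12", 12, 12, years))

# With pandas and Prophet installed, the rows could then be used roughly as:
#   holidays = pd.DataFrame(rows)
#   m = Prophet(holidays=holidays)
for row in rows[:2]:
    print(row)
```

Listing future occurrences of each event as well as past ones matters, because the forecast can only apply an event effect on dates that appear in the table.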


 
252017
 

Let’s have some fun, shall we? Share your video or photo!

The SAS User Community, albeit spread around the world, is a tight-knit group. We may sit alone in our offices pounding out code, developing applications, tweaking system performance or creating reports, but the truth is that other SAS users (our colleagues at work, in online communities, and at local user group meetings) are always there to assist us and to socialize with from time to time. We rely on our fellow SAS Users for support and companionship, as well as for new ideas and techniques. Then, once each year, we join users on a global scale by gathering for a few days at SAS Global Forum.

The opportunity to strengthen and extend our bonds with other SAS Users makes SASGF a much sought-after event. We will go to great lengths to attend: demonstrating value to our employer to secure permission, presenting content to receive a registration discount, applying for an award or scholarship, volunteering as a presenter or room coordinator, joining the Conference Team, or even becoming Conference Chair!

What might these efforts look like if we were to record metaphors for them? What I mean is, how would you represent your effort?

For example, here is a photo of two determined SAS Users negotiating a portage on Lady Evelyn River (Ontario, Canada) on their way to SAS Global Forum.

These two must really understand the value of attending!

So...

What are you willing to do to get to SAS Global Forum?!

Share your videos and photos that represent your efforts to get to SASGF in Orlando. We’ll have some fun seeing how our fellow SAS Users spend their non-SAS-coding time. I’m looking forward to seeing new faces and new places.

Simply follow @SASsoftware on Twitter and Instagram, then post your video, photo or gift. Make sure you tag your post with #GetToSASGF and @SASsoftware.

Share more than one, and encourage your fellow SAS Users to play along. Check back often to see what your peers have shared.

Who knows, you may even see your picture or video on the Big Screen at SASGF 2017!

 

What are you willing to do to get to SAS Global Forum? was published on SAS Users.

252017
 

Machine learning is a type of artificial intelligence that uses algorithms to iteratively learn from data and find hidden insights in data without being explicitly programmed where to look or how to find the answer. Here at SAS, we hear questions every day about machine learning: what it is, how it compares to [...]

12 machine learning articles to catch you up on the latest trend was published on SAS Voices by Alison Bolen

252017
 

Editor's note: This series of blogs addresses the questions we are most frequently asked at SAS Press! It is worth spending some time on this. Arguably, this is one of the most important parts of the book. The table of contents and outline provide the blueprint of your book – [...]

The post 2 Easy Steps to Write a Great Book Outline appeared first on SAS Learning Post.