“The software you use not only shapes what you learn from your data; it shapes the questions you ask!” -- Tom Treynor, Director, Zymergen The Beyond Spreadsheets blog series shows how JMP customers are augmenting their tools and processes for exploratory data analysis to make breakthrough discoveries. The series features […]
Last week, I attended the announcement of the Longhaus Pulse Research on BI and Analytics for 2011. Longhaus Pulse is the most comprehensive and regionally focused assessment of the Business Intelligence and Analytics market in Australia.
One area I found interesting was Longhaus Managing Director Peter Carr's observations around the evolution of the Business Intelligence market. Having worked in the BI industry for almost 20 years, this is something I can definitely relate to. Peter compared the current exciting trends around "Next Generation BI" to the prevalent trends of the 1980's, 1990's and 2000's. Right now, Peter observed, trends around "Cloud" and Social Media are "putting the power back into the hands of the knowledge workers". It's certainly interesting that the "Next Generation BI" trends are all around Data Visualization, End User Self-Service, and Mobility (Mobile access to information). In other words, empowering the business user. (Funny, I thought we already had that in the 90's...). Unfortunately, the 2000s became about "BI Standardization", Portals and the like which were all heavily IT-centric. In the process, BI vendors perhaps lost sight of what BI was all about. Enabling users to get access to the Right Information, at the Right Time, to make Better Decisions. Longhaus talk about the "Digital Divide", with BI technology outpacing end-user capability/need (just how many 'Studios' do I actually need to get access to my information???)
The consequence? Spreadsheets. And more spreadsheets. Spreadsheets everywhere.
The problem with this? Disparate data silos. Poor data integrity. Lack of an audit trail. Lost productivity. Inability to make timely, fact-based decisions.
Which brings me to my recent survey of BI trends in A/NZ. Covering all industries and geographies across A/NZ, I received 230 responses (7% response rate). What became clear from the results was that BI, although assumed to be mature and pervasive, had not solved the Reporting problem in most organisations. For example:
Most surprising of all, 40% are currently evaluating (or re-evaluating) their BI vendors, with a further 26% planning to do so within the next 12 months.
Good news for SAS, named by Longhaus as the No.1 BI Vendor in Australia for 2011 ("Longhaus names SAS institute BI as No 1 vendor"), as we heed the message of frustrated users, and focus on empowering the Knowledge Workers to solve complex business problems. Intuitively. And fast.
For anyone interested in my full survey results report, I can be contacted at email@example.com
In order to use the data in statistical software, you have to get it out of Excel. While Excel does provide some tools for exporting data, these are not easily replicable, since they rely on menu choices through the GUI. A better approach is to read the file directly from within the statistical software.
An additional complication is that a single file may contain several sheets, each of which may have unique columns and rows. While importing from Excel into SAS is shown in section 1.1.5, we don't discuss reading from specific sheets or show how to read an Excel file in R.
In SAS, it's possible to use the "Import Data" wizard to gain access via the GUI (File; Import Data; etc.) but this is no better than using the GUI in Excel. However, all the wizard does is compose a proc import to read from the file. This turns out to be important, because the documentation for using proc import for Excel files is hard to find.
The documentation is buried in the on-line help at SAS Products; SAS/ACCESS; SAS/ACCESS 9.2 for PC Files: Reference; File Format-Specific Reference; Microsoft Excel Workbook Files. This is not the same material found through SAS Products; SAS Procedures; Proc Import. The code below was derived by running the wizard and using its option to save the resulting commands. The help.xlsx file can be downloaded from the book website; SAS currently cannot read an Excel file in directly from a URL-- you must download the file manually and read it locally.
PROC IMPORT OUT= WORK.test
The range option can be used to specify the desired sheet. This means that you must know the name of the sheet you want to import and type it in. The trailing "$" can be used to read in just a specific range of cells, but with no additional information it implies the full sheet. Since there is only one sheet in the helpdata.xlsx file, it's not required here.
The foreign package reads data from many file types. However, the .xls and .xlsx formats are not among them, as far as we know. Fortunately, the files can be read with the gdata package, using the read.xls() function. This will read files in either the .xls or the .xlsx format.
ffdata = read.xls("http://www.math.smith.edu/r/data/help.xlsx",
In this implementation, you specify the sheet by number, rather than name. This may be less precise than using the full name, but it does spare some tedious typing.