Data access

10月 122016
 

For most of us, the data we analyze in JMP starts out somewhere else: in a relational database, Excel, a CSV file or perhaps SAS. The need to seamlessly move such data into JMP and prepare it for analysis led us to introduce the Query Builder feature in JMP 12. Query Builder helps you select multiple tables from an external data source and join them. Then, you can interactively filter (creating a prompting filter if desired), sample and set column names, formats and modeling types for the imported data.

The feedback we’ve gotten from users about Query Builder suggests that you are finding it useful. We have also gotten suggestions for fixes and enhancements, both for Query Builder and other aspects of data access. With JMP 13, we are delivering a boatload of such fixes and enhancements. The 13 most important such fixes and enhancements are detailed below.

Filtering

The first four enhancements all relate to filtering data.

Careful, my data could be huge – When you create a filter for a categorical column, Query Builder retrieves the values to display in a list. With large tables, this can take a long time. In JMP 12, value retrieval was unconditional, plus there was not a way to cancel it. In JMP 13, we have made several changes to prevent long waits:

  • Cancelable value retrieval – JMP 13 puts up a progress bar with a Cancel button when retrieving categorical column values. This is supported for all ODBC drivers we have tested when JMP is running on Windows. It is not supported when connecting to SAS or for most ODBC drivers available for the Macintosh.
  • Too big to attempt – If there are more than 1,000,000 rows in a table, JMP will not even attempt to retrieve unique column values. The 1,000,000 value can be changed via a preference.
  • Simpler list – In JMP 12, the Check Box List was the only type of filter available for selecting from a list of values. In JMP 13, we have added a plain List Box filter type. The List Box filter is less resource-intensive than the Check Box List filter. This makes it better-suited for larger lists. The default filter type for categorical columns is the List Box in JMP 13.

New filter types – In addition to the new, simpler List Box filter type, two more filter types have been added for categorical columns in JMP 13:

  • Contains filter – Enter some text, and JMP will match all rows that contain that text. You can also ask to match rows that do not contain the text.
  • Manual List filter – Allows you to create a list of selections yourself to avoid the need for values to be looked up.

List filters are now invertible – All of the list-type filters (List Box, Manual List, Check Box List and Match Column Values) now have a Not in List check box. This allows you to select a couple items and retrieve all rows that do not match the selected values. For example, this filter will return all movies rated something other than “G”:

List filters can now be conditional – This one is sort of a big deal.  Using the red-triangle menu on a list-type filter, you can set the filter to be conditional. Conditional filters only display values that match other filters that precede them in the list. Below is an example using movie Rating and movie Genre. In this example, I have asked for the Genre filter to be conditional. When I select G in the list for Rating, the Genre filter changes to list only genres that contain at least one G-rated film:

     

This symbol indicates that the filter is conditional. Only filters for columns from the same table affect the values displayed in a conditional filter.

JMP Query Builder

After using Query Builder, some users would ask us, “What if I just have a folder full of JMP data tables. Can I use Query Builder on them?” In JMP 13, the answer is a resounding “Yes!” Or perhaps you use ODBC Query Builder, Text Import or the Excel Import Wizard to import several tables. It would nice to be able to use Query Builder to join the results. With JMP 13, you can!

To use Query Builder on JMP data tables, first open the tables, and then select JMP Query Builder from the Tables menu.

tables_jmp_query_builder

For example, JMP has two sample data tables, SATByYear.jmp and CrimeData.jmp, that both have State and Year columns. Another sample table, US Demographics.jmp, has a State column. I can easily join these three tables with JMP Query Builder:

  

JMP Query Builder allows up to 64 tables to be joined together. If you ever get that many tables into one query, please send me a screenshot.

All of the other features of Query Builder, such as filters and prompting, are also available with JMP Query Builder.

Query() JSL Function

We built a SQL engine into JMP to allow Query Builder to work on JMP data tables. A new JSL function, Query(), gives you direct access to that SQL engine. You can use the Query() function to manipulate JMP data tables using SQL statements. Here is an example using SATByYear and CrimeData sample data tables:

Run on Open

In JMP 13, you can configure a query to immediately run when you open it instead of opening the Query Builder window. Simply check the Run on Open option on the red-triangle menu at the top of the Query Builder window:

This is especially useful for queries that have prompted filters. You can send these queries to others (or incorporate them into a JMP add-in), and when the other user opens them, they will just see the filter prompt. This allows them to make their filter selections without having to wade through the complexities of Query Builder.

When a query has been set to Run on Open, but you need to open it into Query Builder to make changes, you have a few options. If you hold down the Ctrl key while opening the query, it will open into the Query Builder window. Alternatively, you can right-click on the query file in the JMP Home Window and select Edit Query.

Creating queries that will work in JMP 12

One caveat to all these neat new JMP 13 Query Builder features – if you create queries that use these features, you will not be able to open them in JMP 12. At the same time, you may get JMP 13 earlier than your co-workers so that you can try out other new features.

To help with this scenario, we have added a preference in JMP 13 that hides all of the new JMP 13 features of Query Builder so that the queries you build will still be compatible with JMP 12. The preference is on the Query Builder Preferences page:

Any ODBC or SAS queries you build after setting that preference will only allow features that are compatible with JMP 12. If you want to relax that rule for a particular query, there is an option on Query Builder’s red-triangle menu that you can uncheck to allow JMP 13 features for that query:

New features on the Tables panel

The Tables panel on the Query Builder window in JMP 12 did not have much functionality other than showing you the list of tables in your query.  In JMP 13, that panel gains a number of features:

  • Selecting one or more tables in the Tables panel restricts the columns listed in the Available Columns panel to just columns from the selected tables, making columns easier to find.
  • The Tables panel now displays the Venn diagram icon corresponding to the join type for each table, and you can edit the join, change the table alias, or remove the table from the query from the context menu.
  • When querying JMP data tables, double-clicking a table in the Tables panel makes the table visible and brings it to the front (or select the View item on the context menu).

“First N Rows” Sampling

When querying large tables from databases, sometimes it is helpful to retrieve just the first thousand or so rows of data for a query to experiment with before you spend the time and resources to retrieve all the data.

In JMP 12, First N Rows sampling was supported for the Oracle and SQL Server databases. In JMP 13, support has been added for most other databases, including PostgreSQL, MySQL, Microsoft Access, SQLite, Apache Hive, and Cloudera Impala.

Improved Hadoop and Text File Support

More and more data is being stored in “big data” databases these days. JMP 13 improves date support for sources like Apache Hive, Cloudera Impala and HortonWorks. Also, saving tables with File > Database > Save Table did not work well with some of these data sources. That has been improved in JMP 13, with the caveat that using ODBC to save data to Hadoop-based data sources is not a very efficient way to get data to them.

If you do a lot with CSV files, support for the Microsoft Text Files ODBC driver has been improved in JMP 13.

Saving JMP data to a database is much faster

Keeping data in a database makes it convenient to provide access to whoever needs it. For many releases, JMP has supported saving JMP data tables to databases via the File > Database > Save Table feature. However, with data sizes getting larger and larger, we have had reports that saving JMP tables to a database was taking much longer than people felt that it should. We listened and investigated, and we are happy to report that, in JMP 13, the performance of saving JMP tables to databases has significantly improved, in some cases dramatically. Please try this feature again and let us know what you experience.

Virtual Join

With JMP, all of the data you are analyzing has to fit in memory. When you join JMP data tables with either Tables > Join or the new JMP Query Builder, data tends to get duplicated from smaller “look-up” tables into the larger join result. To help prevent this duplication, the Virtual Join feature has been added in JMP 13. For example, a DVD store might have an inventory table that knows where all the DVD’s are and a film table with details about each title. In the film table, I can set the film_id column to be the Link ID for the table:

Then, in the inventory table, I can set film_id to be a link reference to the film table. This action effectively joins the two tables based on the film_id column.

Once I’ve set that up, columns from the film table now appear in the column list for inventory. They are designated "referenced columns" and are initially hidden. I can unhide whichever columns I want to appear in the inventory table, in this case title[film_id]:

Virtual Join allows me to see the values from the film table in the inventory table. However, they have not been physically copied. They are looked up as needed, which saves memory.

This just scratches the surface of Virtual Join, which is worthy of a blog post all on its own.

So, there you have it – a look at the many enhancements for accessing and manipulating data in JMP 13. Which feature is your favorite? What feature were you hoping to see that was not mentioned? Let me know in the comments.

For more information on using Query Builder for JMP data tables, check out my Discovery Summit poster presentation in the JMP User Community. While you're there, you can also see the slides from my Discovery Summit tutorial titled, "Wrangling All Your Data With Query Builder in JMP 13."

tags: Data access, Databases, JMP 13, Query Builder

The post 13 reasons data access is better than ever in JMP 13 appeared first on JMP Blog.

8月 222016
 

You may have data in a relational database that you need to bring into JMP. Your data may be spread across multiple tables. And you might want only part of it rather than all. Query Builder simplifies these types of data access tasks, and in the latest version of JMP […]

The post JMP 13 Preview: Query Builder wrangles data better than ever appeared first on JMP Blog.

8月 032016
 

Data access and data privacy are often fundamentally at odds with each other. Organizations want unfettered access to the data describing customers. Meanwhile, customers want their data – especially their personally identifiable information – to remain as private as possible. Organizations need to protect data privacy by only granting data access to authorized […]

The post Who was that masked data? appeared first on The Data Roundtable.

2月 192015
 
Hadoop is increasingly being adopted as the go-to platform for large-scale data analytics. However, it is still not necessarily clear that Hadoop is always the optimal choice for traditional data warehousing for reporting and analysis, especially in its “out of the box” configuration. That is because Hadoop itself is not […]
7月 102014
 

SAS users love to look at data. And the data grid in SAS Enterprise Guide is a convenient way to view the contents of a data set. While small data sets can be rendered lickity-split for quick viewing, sometimes people get justifiably anxious when opening very large data. Perhaps they've been burned with a bad experience, because opening a huge transaction table can hang up a system. When dealing with very large data, it's wise to be cautious.

In general, the SAS Enterprise Guide data grid can open and show data quickly, even if the data sets are very large. This is because the data grid does not read in all of the data records at once, but reads in only the records that are needed to fill the rows of the grid. The data grid uses a "virtual mode", which allows you to scroll quickly around your data set, no matter the size.

datagrid
In addition to the scroll bar, which shows you approximately where you are in the data set, there are a few keyboard shortcuts that can help you to navigate:

  • Ctrl+End - takes you to the last record, last column in the data set.
  • Ctrl+Home - takes you to the first record, first column in the data set.
  • Ctrl+G - brings up a "Go to" window, in which you specify exactly which record number to bring to the top of the view.

Note: The concept of "record number" is a specific to SAS data sets, since traditional SAS files store records sequentially. When you view a database table (accessed through a SAS/ACCESS library), the idea of "record number" might not make as much sense, since the database rules determine the sequence. However, you can still use the keyboard shortcuts to navigate your view.

Even for all of its built-in smarts, the data grid can still perform slowly in certain situations. If you're experiencing slow performance, here are some tips to help.

Understand your network latency. SAS Enterprise Guide retrieves data records from your SAS session, which is typically on another machine in your network. If your network connection is slow or the remote machine is very remote (such as in another country), fetching those records will take more time.

Avoid opening data unnecessarily. When you add data to your project, whether using File->Open Data or by running a task that produces output data, SAS Enterprise Guide will automatically open the data for viewing. Simply opening the data can cause a noticable delay when there is network latency or a slow database connection. If you're already familiar with the data and don't need to see the records every time, you can change this default behavior. From Tools->Options, select Data General and uncheck the option for "Automatically open data when added to the project."

Avoid scrolling through very wide data. As I stated earlier, large data isn't generally a problem in SAS Enterprise Guide. However, when the data are very wide, the grid can be slow. Why? Because even though the data grid reads only enough data records to fill the grid, it reads only entire records -- even if some of the variables in the record aren't visible on the screen at the time.

I define "wide" data as hundreds or thousands of variables, or maybe fewer variables that each have very large lengths. Consider these two LENGTH statements, each of which allocate 2000 variables but yield dramatically different record lengths:

/* 2000 numeric vars, 8 bytes each = 16,000 bytes per record */
/* a data set with just these fields can render quickly in SAS EG */
length x1-x2000 8;
 
/* 2000 character vars, 1000 bytes each = 2,000,000 bytes per record! */
/* a data set with these fields can really slow things down */
length c1-c2000 $ 1000;

Use the Data Explorer with care. The Data Explorer (added in SAS Enterprise Guide 5.1) is a popular new feature (see Marje Fecht's enthusiastic review). The Data Explorer can provide at-a-glance "Quick Stats" of variable distribution and frequency. However, the behind-the-scenes analyses that support those stats can be expensive for very large data. SAS Enterprise Guide sets some automatic constraints in order to limit the impact, but you should set your expectations accordingly.

Understand the special treatment for database tables. When you access data in a library using a SAS/ACCESS engine (for example, connecting to Oracle or Teradata), SAS Enterprise Guide automatically limits the data grid view to 10,000 records. You can configure that number in the application options, but the reason for the constraint is to limit the impact on the database connection and to keep your DBA happy (well, as happy as DBAs tend to be, anyway). 10,000 records should be enough for you to "eyeball" the shape of the data. If you need to see rows that meet specific criteria, consider using the Query Builder.

To paraphrase King Burger's Bon Qui Qui (from the famous MAD TV sketch): You can view your data your way, but don't get crazy. I hope that with a little knowledge about what happens behind the scenes, you can click through your data with more confidence.

tags: big data, data access, SAS Enterprise Guide, sas/access
5月 212014
 

I’ve recently had the opportunity to learn a little more about administering SAS Visual Analytics.  The sessions introduced me to two new GUI interfaces that simplify the work for SAS administrators.  In my last post, I shared how to load data into memory using the SAS Visual Data Builder.

After building the table, loading into memory and scheduling the query, you will want to work on the user or group permissions for the dataset.  You can use the SAS Visual Analytics Administrator for this task.

Accessing the table

To access the SAS Visual Analytics Administrator, return to the SAS Visual Analytics home page by selecting the HOME icon and select Manage Environment from the right hand pane. Your userid must have the appropriate permissions to be able to utilize all of the features I am going to show below:VApermissions1

To view the current permissions on the table, expand the LASR directory tree in the left hand side pane.  Double-click on the SampleClaims table that we’ve been using for this discussion. You’ll see the permissions listed by type of access permitted for each user role.

VApermissions2

Setting permissions 

Defining user groups can be done within SAS Management Console. Once I have my user groups, I can then set permissions within SAS Visual Analytics. In this particular situation, SASUSERS is the collection of all users who have access to SAS. SAS Administrators and SAS System Services are a subset of the SASUSERS group by definition. You may have other groups defined, such as Finance or Marketing. The following tasks can be done on any of your user groups as needed.

I want to give my SASUSERS read permissions, but I don’t want them to delete or overwrite the existing table. When I click on “grant” or “deny”, it becomes an explicit permission. An explicit permission is one that the administrator has expressly set. Implicit or inherited permissions will then be set for the subset of users within SASUSERS (my SAS Administrators and SAS System Services groups).

To change the permission explicitly, I select the icon I wish to change for the user or user group I wish to change.

VApermissions3

When I make SASUSERS an explicit Deny for WriteMetadata access, the SAS System Services and SAS Administrators inherit a Deny as shown below. Explicit permissions are denoted by a yellow star.

VApermissions4

Since SAS Administrators and SAS System Services should have WriteMetadata access, I will need to go in and give the SAS System Services and SAS Administrators groups an explicit grant.

The same process will need to be repeated for other permissions columns such as Write, Administer, and Delete.  All the permissions should look like the table below, with explicit permissions denoted by a yellow star.

VApermissions5

Testing the result

Once your query has been scheduled and you’ve set all your user permissions, it’s good practice to create an exploration to determine the table has loaded correctly and to see what end-users will see.

Returning to the Home page, select Create Exploration --> Select a Data Source.  Scroll down to your table (here we're testing the SampleClaims table) and select Open.

VApermissions6

Take advantage of the auto-charting feature  in the SAS Visual Analytics Data Explorer by simply dragging and dropping whatever categories or measures you’d like to view onto the workspace.  In this example, I’ve selected the variables Claim Amount and Type of Car, and the auto-charting facility automatically selects a bar chart for me.

VApermissions7

I hope this has helped you get up and running with some common administrator tasks within SAS Visual Analytics!

tags: data access, permissions, SAS Administrators, SAS Visual Analytics, SAS Visual Analytics Administrator