Wednesday, July 5, 2017

Transactional Machine Learning and Analytics: Industry Example



What is Machine Learning for z/OS?



Machine learning is a type of artificial intelligence (AI) that gives computers the ability to learn without being explicitly programmed.  Using algorithms that iteratively learn from data, machine learning allows computers to find hidden insights without being told where to look.  Machine learning systems can find correlations in data and recognize patterns to provide early detection and to predict events before they happen.  In healthcare, this can mean early detection of conditions, prediction of the factors that lead to better patient adherence or better clinical outcomes, or algorithms that enable new levels of personalized care and tailored treatment protocols.



Machine learning projects generally include tasks such as data cleansing and ingestion, feature engineering and selection, data transformation, model training, model evaluation, model deployment, scoring, re-evaluation, and re-training (the feedback loop). Many of these tasks must be performed iteratively to reach the desired results. Each task requires heavy engagement from experienced analytics personas across the organization, from data scientists and software/data engineers to application developers. As a result, a machine learning project usually takes weeks to months before a usable model can be generated and deployed in production.
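To make these stages concrete, here is a minimal sketch of the train-and-evaluate loop in Python with scikit-learn. The input file and column names are illustrative assumptions, not part of Machine Learning for z/OS:

    # Minimal sketch of the ingest/cleanse/transform/train/evaluate flow.
    # "historical_claims.csv" and its columns are illustrative assumptions.
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    df = pd.read_csv("historical_claims.csv").dropna()      # ingestion + cleansing
    X, y = df.drop(columns=["outcome"]), df["outcome"]      # feature selection

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    model = Pipeline([("scale", StandardScaler()),          # transformation
                      ("clf", LogisticRegression())])
    model.fit(X_train, y_train)                             # training

    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"evaluation AUC: {auc:.3f}")                     # evaluation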



IBM Machine Learning for z/OS (Machine Learning for z/OS) is an end-to-end enterprise machine learning platform that helps simplify and significantly reduce the time needed to create and deploy machine learning models by:

  • Integrating all the tools and functions needed for machine learning and automating the machine learning workflow.
  • Providing a platform with freedom of choice and productivity for better collaboration across different personas, including data scientists, data engineers, business analysts, and application developers.
  • Infusing cognitive capabilities into the machine learning workflow to help determine when model results deteriorate and need tuning, and to provide suggestions for updates or changes.

Machine learning is needed where business rules change rapidly, where application development can't keep pace with the changes that need to be made, or where applications need to be continually tuned. Instead of writing many complex business rules, you use machine learning: select the appropriate algorithm and parameters and build the model. Once the model is created, it can be trained on historical data and deployed to recognize patterns and make future predictions.  Predictions are retained and compared to actual results as part of model monitoring. As the environment evolves, model results may deteriorate, at which point the data scientist can choose to retrain the model with stored feedback data. By simplifying model management, Machine Learning for z/OS reduces the amount of maintenance in an application because the model is "aware" and always learning, becoming smarter over time.
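A minimal sketch of that monitoring step, assuming retained prediction scores and actual outcomes are available as arrays; the baseline and tolerance values are illustrative assumptions:

    # Sketch of the feedback loop: compare retained predictions with actual
    # results and flag the model for retraining when quality degrades.
    from sklearn.metrics import roc_auc_score

    BASELINE_AUC = 0.85   # quality measured at deployment time (assumed)
    TOLERANCE = 0.05      # acceptable drop before retraining (assumed)

    def needs_retraining(retained_scores, actual_outcomes):
        """True when live model quality falls below the deployment baseline."""
        current_auc = roc_auc_score(actual_outcomes, retained_scores)
        return current_auc < BASELINE_AUC - TOLERANCE

    # If needs_retraining(...) returns True, retrain on stored feedback data.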



Why is Machine Learning Important to zEnterprise Customers?



Many of our enterprise customers have expressed an interest in leveraging the latest analytics technology with the flexibility to deploy on premises, in the cloud, or in a hybrid environment.  That said, many of our z Systems customers are not yet ready to move their most sensitive data to the cloud.  They want to take advantage of their significant existing investment in infrastructure, minimize costly data movement, and ensure data governance and security.  For some customers this will be their first entry into the machine learning domain; for them we have made the process much simpler by lowering the bar for developing and maintaining predictive behavior models.  For customers with extensive data science expertise, we have simplified development and maintenance by providing cognitive assistance for building behavioral models and automation for maintaining those models over time, freeing up their data developers and data scientists to enhance existing models and bring data science to new areas of the business.



Machine Learning for z/OS also offers RESTful APIs and programming APIs to perform tasks such as transactional scoring.  Scoring evaluates a transaction against a machine learning model to determine, in real time, things such as the risk of pre-diabetes, the likelihood of medication adherence/compliance, or the risk of over-payment prior to claims payment, and to make real-time decisions based on this information (e.g., elastic drug pricing).  This type of real-time scoring requires access to the actual transactional data, which means the model scoring engine should be collocated with the transactions to meet transactional SLAs.  Machine Learning for z/OS includes the tools and functions needed to train and deploy machine learning models and to automate machine learning workflows.  It includes collaboration features for personas such as data scientists and application developers, as well as capabilities to determine when models need to be tuned and to suggest changes.  Through its web UI, RESTful APIs, and programming APIs, it provides a suite of functions to:

  • Ingest all types of zEnterprise data, then transform and cleanse the data.
  • Train models with a selected algorithm, evaluate trained models, and select optimal models/algorithms through the Cognitive Assistant for Data Scientist (CADS) interface.
  • Manage models, deploy them into production, and automate feedback to ingest new data and re-train models.
  • Monitor model status and resource utilization.
  • Call RESTful APIs for online scoring with deployed models.
  • Use the machine learning APIs interactively through a data scientist notebook interface.
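As a rough illustration of transactional scoring over REST, the call below posts one transaction's fields to a deployed model's scoring endpoint. The URL, payload shape, and token handling are hypothetical, not the documented Machine Learning for z/OS API:

    # Hypothetical online-scoring call; the endpoint, payload fields, and
    # auth scheme are illustrative assumptions, not the product API.
    import requests

    SCORING_URL = "https://mlz.example.com/v1/deployments/diabetes-risk/score"
    record = {"age": 54, "a1c": 7.9, "bmi": 31.2, "systolic_bp": 142}

    resp = requests.post(
        SCORING_URL,
        json={"fields": list(record), "values": [list(record.values())]},
        headers={"Authorization": "Bearer <token>"},
        timeout=2)  # tight timeout, since scoring sits inside a transaction
    resp.raise_for_status()
    print(resp.json())  # e.g. a risk score for this transaction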



IBM makes it possible for customers to satisfy these requirements while benefiting from the latest analytics advancements, such as Machine Learning for z/OS. They can access z Systems data in place and combine it with other sources of information, such as structured and unstructured data from other systems. They can then build models that predict customer behavior in order to make optimal business decisions. And by accessing live data they can be more agile. This is exactly what our large customers want to do.



What is the IBM DB2 Analytics Accelerator?



The IBM DB2 Analytics Accelerator for z/OS (the Accelerator) is a high-performance appliance for DB2 for z/OS that deeply integrates Netezza's balanced, highly parallelized, asymmetric massively parallel processing technology with IBM z Systems technology at the database kernel level.  The Accelerator allows DB2 to offload data-intensive and complex static and dynamic DB2 queries (e.g., data warehousing, business intelligence, and analytics workloads) without any application changes. With the Accelerator, these queries can be executed significantly faster than was previously possible, while avoiding expensive general-purpose CPU (GP) utilization in DB2 for z/OS. The performance and cost savings of the Accelerator open up unprecedented opportunities for organizations to make use of their data on the zEnterprise platform.



The Analytics Accelerator is conceptually similar to a hybrid automobile.  A hybrid automobile has a standard vehicle user interface (steering wheel, brake, accelerator pedal).  At any given time it may run on its gasoline or its electric power source to optimize fuel economy.  The switching between power sources is done by the automobile itself, without constant manual intervention by the driver and without a change in the standard vehicle interfaces.



With the DB2 Analytics Accelerator, DB2 for z/OS offloads these queries transparently to the application. Users can run workloads that historically were moved off z Systems, or run queries that were previously governed or shunted in DB2 for z/OS, such as ad hoc queries whose performance characteristics are typically unknown at runtime. IT administrators can allow DB2 for z/OS to choose where to run these queries, or they can force them to the DB2 Analytics Accelerator to prevent additional DB2 for z/OS consumption (a minimal sketch of this control follows the list below). The key benefits:
  • The Accelerator delivers dramatic improvements in response time for unpredictable, complex, and long-running dynamic and static query workloads.  It helps meet SLAs and shorten batch windows by offloading complex query workloads.  The idea is to keep what's working well in DB2 and improve response times for CPU-intensive queries.
  • The Accelerator allows users to run new workloads that had previously not been considered for the mainframe, or run queries that had previously been governed or shunted in DB2 (e.g., ad hoc queries whose performance characteristics are typically unknown at runtime).  Clients can allow DB2 to choose where to run these queries, or they can force them to the Accelerator to prevent additional DB2 consumption.
  • By offloading resource-intensive queries and the associated processing onto the Accelerator, clients can lower MSU consumption.  Additionally, they can reduce the cost of storing, managing, and processing historical data with a near-line storage solution.
  • Costs are also reduced by cutting the time spent on the general tuning and administration tasks associated with supporting and improving performance for resource-intensive workloads in DB2 for System z.
  • Clients can lower or eliminate the cost of acquiring hardware and software for data warehousing and analytics, as well as the costs incurred from data movement, transformation, landing, storage, and system maintenance.  With the Accelerator, clients can consolidate disparate data onto their existing zEnterprise platform while benefiting from integrated operational BI.
  • With Accelerator-only tables and in-database transformation capabilities, data can be Extracted from a number of source systems, Loaded into the Accelerator, and Transformed within the Accelerator (ELT).  Applications access the transformed data directly through DB2 for z/OS.  Accelerator-only tables store the transformed data only in the Accelerator, without maintaining a second copy on z/OS.
  • Organizational agility is increased by being able to respond rapidly with immediate, accurate information and deliver new insights to business users.
  • Reporting is consolidated on zEnterprise, where the majority of the data being analyzed lives, while retaining zEnterprise security and reliability.
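As a sketch of the routing control described above: DB2 for z/OS exposes the CURRENT QUERY ACCELERATION special register, which an application can set per connection. The DSN, credentials, and table names below are illustrative assumptions:

    # Sketch: steer an analytic query to the Analytics Accelerator with the
    # CURRENT QUERY ACCELERATION special register. Connection details and
    # table names are illustrative assumptions.
    import pyodbc

    conn = pyodbc.connect("DSN=DB2ZOS;UID=user;PWD=pwd")
    cur = conn.cursor()

    # ENABLE lets DB2 choose where to run the query; ALL requires eligible
    # queries to run on the accelerator; NONE keeps them in DB2 for z/OS.
    cur.execute("SET CURRENT QUERY ACCELERATION = ALL")
    cur.execute("""SELECT region, SUM(claim_amount)
                   FROM claims_history
                   GROUP BY region""")
    for row in cur.fetchall():
        print(row)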



How Does the Analytics Accelerator Complement and Improve the Enterprise Data Lake Strategy?



The Analytics Accelerator was designed to be used in concert with DB2 for z/OS, with a vision of becoming the first true Hybrid Transactional and Analytical Processing (HTAP) engine.  It is intended to complement a zEnterprise data lake strategy, not compete with it.  Several new features within the Analytics Accelerator actually reduce the cost of data movement to the data lake AND improve the latency of the data that lands in the data lake.


In 2017, two new features will further the Analytics Accelerator’s ability to complement a zEnterprise data lake strategy.

  • Transactional consistency in the Analytics Accelerator: With this feature, DB2 applications will no longer need to be concerned with data currency within the Analytics Accelerator: the most current result set will be guaranteed. This removes the largest obstacle to much broader use of the Analytics Accelerator; today, many customers hesitate to use it because they cannot guarantee that their queries can tolerate potentially stale data.  With this feature, there will be no difference in latency between data returned by DB2 and by the Analytics Accelerator, making DB2 plus the Analytics Accelerator the only true Hybrid Transactional and Analytical Processing (HTAP) solution on the market.

  • Removing the cost of replication to the Analytics Accelerator from the 4HRA: When customers say 'we can replicate to other environments', the Analytics Accelerator has two major advantages.  First, a separate environment cannot guarantee transactional consistency (see above).  Second, sending data to an external environment incurs replication and ETL costs on z/OS, on top of the usual people, process, infrastructure, and data-breach liability costs of maintaining two copies with two separate access points (see our 'Cost of ETL' calculator).  With this feature, the cost of replicating to the Analytics Accelerator will be removed from the four-hour rolling average (4HRA); any other replication or ETL to disparate environments will still impact the 4HRA and thus lead to additional costs.



As mentioned above, the Analytics Accelerator also supports Accelerator-only tables (AoTs).  With AoTs and in-database transformation capabilities, data can be Extracted from a number of source systems, Loaded into the Accelerator, and Transformed within the Accelerator (ELT).  Applications access the transformed data directly through DB2 for z/OS, while the transformed data itself is stored only in the Accelerator.
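A sketch of that ELT pattern, using DB2's CREATE TABLE ... IN ACCELERATOR syntax for an accelerator-only table; the accelerator name, tables, and transformation are illustrative assumptions:

    # Sketch of ELT with an accelerator-only table: load detail data into
    # the Accelerator, then transform it in place. Names are illustrative.
    import pyodbc

    conn = pyodbc.connect("DSN=DB2ZOS;UID=user;PWD=pwd")
    cur = conn.cursor()

    # Accelerator-only table: lives only in the Accelerator, no z/OS copy.
    cur.execute("""CREATE TABLE claims_summary
                       (region VARCHAR(32), total DECIMAL(15, 2))
                   IN ACCELERATOR MYACCEL""")

    # The "T" of ELT happens inside the Accelerator itself.
    cur.execute("""INSERT INTO claims_summary
                   SELECT region, SUM(claim_amount)
                   FROM claims_history
                   GROUP BY region""")
    conn.commit()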



What this all means is that the Analytics Accelerator data will be transactionally consistent with DB2 data, that replication from DB2 to the Analytics Accelerator will carry no additional cost, and that the Accelerator will support in-accelerator data transformation.  Data can therefore be replicated to the Accelerator, transformed to match the structure of the data in the data lake, and extracted with zero latency relative to the DB2 data, without incurring costs in DB2 and without an intermediate ETL server to perform the transformations.  Such a solution avoids the high cost of extracting data from DB2 for System z, the cost of maintaining a set of ETL servers and complex ETL flows (test and production), the additional data-breach liability of maintaining extra data copies and interfaces to them, and the latency of moving data through disparate systems before it lands in the data lake.  Many customers already use federation technologies between DB2 plus the Analytics Accelerator and the data lake (Big SQL, Impala) to reduce data movement.  With HTAP, zero-cost replication to the Analytics Accelerator, and AoTs, the Analytics Accelerator is completely complementary to the enterprise data lake strategy and reduces the costs, data-breach liability, and latency associated with getting data from System z to the data lake.



Pharmaceutical Benefits Manager (PBM) Example


The proposed solution architecture, with Machine Learning for z/OS and the IBM DB2 Analytics Accelerator at its core, is intended to drive substantial new analytics-driven revenue for clients while reducing existing people, process, and infrastructure costs.  The solution provides the tooling to derive a tremendous amount of actionable insight from transactional data (monetizing that data), reduces existing costs by reducing data and infrastructure sprawl across the enterprise, improves existing Service Level Agreements (SLAs), reduces data latency for analytics initiatives, and improves data governance.  Ultimately, the goal of machine learning for clients is to take new transactional AI solutions to market in an efficient and scalable manner.  In the case of a 'transparent Pharmaceutical Benefits Manager (PBM)', machine learning and the Analytics Accelerator can serve as the transactional analytics engine that delivers new revenue opportunities to consumers.  Showcasing state-of-the-art analytics and AI solutions may also attract new PBM opportunities (e.g., marketing machine-learning-based formulary and rebate management processes to earn new claims adjudication business).  Some examples of opportunities for Machine Learning and the Analytics Accelerator in the PBM context are:


Example 1: Health Outcomes Optimization (e.g., Diabetes)



Most health conditions being treated have metrics associated with success.  Conditions can be segmented into common chronic conditions (e.g., diabetes, asthma, high cholesterol, high blood pressure, heart disease, arthritis) and uncommon, high-cost conditions requiring specialty medications (e.g., rheumatoid arthritis, Crohn's disease, multiple sclerosis, cancers).  Diabetes has very clear metrics tied to success (the ABCs: A1c, or average blood sugar; Blood pressure; and Cholesterol).  Unfortunately, payers and providers have limited views of these success metrics for a given population.  A PBM can build a predictive risk model that provides a health score for patients with diabetes and thereby segment the diabetes population into well controlled, moderate control, and poor control.  By having this information available for real-time analysis inside its DB2 adjudication system, a PBM can enable its health plan clients to treat and manage these segments differently: someone who is poorly controlled may receive additional counseling at the pharmacy, a different copay, or a different message to the physician.  At both the point of care in the doctor's office and the point of sale, the PBM would measure adherence to medications.  If someone is not at goal and was not taking their medications regularly, an adherence program could be implemented.  If the patient was taking their medications, then a more potent or new medication may be needed.
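A minimal sketch of that segmentation, assuming a labeled history of diabetic patients; the file names, features, and cut-off points are illustrative assumptions, not clinical guidance:

    # Sketch: score a diabetes population and bucket it into the three
    # control segments. Files, columns, and cut-offs are illustrative.
    import pandas as pd
    from sklearn.ensemble import GradientBoostingClassifier

    hist = pd.read_csv("diabetes_history.csv")       # assumed training data
    features = ["a1c", "systolic_bp", "ldl"]
    model = GradientBoostingClassifier().fit(hist[features], hist["poor_control"])

    pop = pd.read_csv("diabetes_population.csv")     # assumed current members
    risk = model.predict_proba(pop[features])[:, 1]  # health score, 0..1
    pop["segment"] = pd.cut(risk, bins=[0, 0.33, 0.66, 1.0],
                            include_lowest=True,
                            labels=["well controlled", "moderate control",
                                    "poor control"])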



The value of this type of analysis to consumers is that the PBM can help patients meet clinical goals and drive lower copays.  For physicians, the analysis can drive pay-for-performance programs.  It can also drive value between health plans and pharmaceutical companies: by leveraging the concept of differential rebates, this technology can help members achieve clinical goals, and as achievement of clinical goals increases, the pharmaceutical companies get paid more while the health care systems reduce costs.  A PBM can monetize this by further aligning itself with the health systems (increased value to the health system from better clinical outcomes, more effective transactional scoring and auditing within fast-pass and e-Prior Authorization control processes, etc.) and potentially driving increased revenue through its 'prescription outcomes' contracts.



Example 2: Major Changes in "Risk" - Resource Utilization Bands (RUBs)


The Johns Hopkins ACG system uses regression-based modeling, primarily on historical pharmacy and medical claims, to profile and predict risk for a population.  Each member in a population receives a variety of risk scores.  Patients are also grouped into one of six RUBs (resource utilization bands): no data, healthy, low risk, medium risk, high risk, and very high risk.  As the data inputs expand beyond medical and pharmacy claims to include behavioral data, care management data, EMR data, and consumer data, we can use machine learning to predict changes in risk sooner and more accurately.  For example, 60% of people who take a chronic medication have at least one 30-day gap in a year.  Some resume the medication after a few months, whereas many stop altogether.  Machine learning techniques can be used to identify the correlation between non-compliance and hospitalizations for specific diseases (e.g., high cholesterol is unlikely to show a correlation, whereas heart failure is likely to show a high one).  This machine-learning-based modeling helps identify potential causes of changes in patient morbidity risk: it analyzes changes to Resource Utilization Bands across consecutive periods and attempts to find correlations with a number of patient-related features.  Uncovering factors that may predict changes in morbidity risk can be used to alert providers and health systems to potentially increasing risk and to suggest possible interventions for reducing it.
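A sketch of the gap-detection step behind that 30-day-gap statistic, assuming a refill history with one row per fill; the file and column names are illustrative assumptions:

    # Sketch: flag members with a >30-day gap between refills of a chronic
    # medication. File and column names are illustrative assumptions.
    import pandas as pd

    fills = pd.read_csv("refill_history.csv", parse_dates=["fill_date"])
    # expected columns: member_id, fill_date, days_supply

    fills = fills.sort_values(["member_id", "fill_date"])
    fills["runs_out"] = fills["fill_date"] + pd.to_timedelta(
        fills["days_supply"], unit="D")
    fills["gap_days"] = (fills.groupby("member_id")["fill_date"].shift(-1)
                         - fills["runs_out"]).dt.days

    had_gap = fills.groupby("member_id")["gap_days"].max() > 30
    print(f"{had_gap.mean():.0%} of members had at least one 30-day gap")
    # Next step: correlate had_gap with hospitalization flags per disease.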


Example 3: Showcasing the Value of Machine Learning Driven Insight to Existing Clients  



With the ability to access medical data files from existing customers, a PBM can:
  • Use ML capabilities to show correlations (e.g. patient attributes and co-morbidity) using medical/health data
  • Apply Johns Hopkins ACG functions to this data
  • Show clients the value of ML to clinical outcomes
  • Integrate ML features into the existing application (e.g. via a Bot)
  • Sell this new application as a service to clients



Other clients may have other interesting data sources.  For example, some customers engage human coaching companies that have a wealth of data, interactivity with members, and a wealth of asynchronous communications that can be leveraged in machine learning modeling.



Example 4: Fast Pass, e-Prior Authorization, Alternative Drug Recommendation


The first case is where the drug is covered (i.e., it is the preferred drug); machine learning can be used to determine additional situations where fast pass is appropriate versus where additional controls are warranted.  The second case relates to the process of recommending alternative drugs, which requires the pharmacy to contact the provider.  The third case exists within the e-Prior Authorization control mechanisms.  In each case, streamlining the process and determining where additional controls make sense (or do not) is something that machine learning models can help with.  This is value added for the pharmacy, the providers, the consumers, and the health plans.


Example 5: Drive new revenue at Hospital Systems



There are several immediate potential opportunities for machine learning within small-to-medium hospital health systems.



The first opportunity is employee health at these hospital systems.  Small-to-medium systems may have 50K employees, and every 10K employees represents roughly $100M in employee health spend, so a 50K-employee system represents about $500M per year.  Machine-learning-driven insight can be used to show these hospital systems how a PBM can help save 5-7% of that spend (roughly $25M-$35M in this example) while improving the quality of service for their employees.



The second potentially large opportunity is to use machine learning to help hospital systems optimize revenue for specialty products.  A transparent PBM typically wants to align with the hospital systems.  For example, there are cases where hospital systems treat patients who require expensive drugs (MS, HIV).  Historically, some of these health systems would start prescribing the drugs and then send them to a third party who handled filling the medication; these prescriptions often represented $50K of medication.  This presents an opportunity for the PBM to showcase what it can do as a partner and to sell new core services to the hospital health system.



Smaller hospital health systems may also be more interested in population health management, for instance, understanding the factors that lead some people to take their medications while others skip or never fill them.  Machine learning is key to uncovering factors that humans may not have previously considered.


Example 6: Drive Value to Retail Clinics/Stores



Promote patient medication adherence using other financial motivators, such as free co-pay cards to use with retail pharmacies or retail grocery store coupons for healthy food options.