System z Chief Data Officer: Lessons from the Field

Wednesday, July 5, 2017

Transactional Machine Learning and Analytics: Industry Example

What is Machine Learning for z/OS?

Machine Learning is a type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed. Using algorithms that iteratively learn from data, machine learning allows computers to find hidden insights in the data without being explicitly programmed where to look. Machine learning systems can find correlations in data and recognize patterns to provide early detection and to predict events before they happen. This can mean early detection of healthcare conditions, prediction of factors that lead to better patient adherence or better clinical outcomes, or algorithms to reach new heights of personalized care and tailored treatment protocols.

Machine learning projects generally include tasks such as data cleansing and ingestion, feature engineering and selection, data transformation, model training, model evaluation, model deployment, scoring, re-evaluation, and re-training (the feedback loop). Many of these tasks need to be performed iteratively to reach the desired results. Each task requires heavy engagement from experienced analytics personas across the organization, from data scientists and software or data engineers to application developers. As a result, a machine learning project usually takes weeks to months before a usable model can be generated and deployed in production.

IBM Machine Learning for z/OS is an end-to-end enterprise machine learning platform that helps simplify the creation and deployment of machine learning models, and significantly reduce the time they take, by:

Integrating all the tools and functions needed for machine learning and automating the machine learning workflow.
Providing a platform with freedom of choice and productivity for better collaboration across different personas, including data scientists, data engineers, business analysts, and application developers, for a successful machine learning project.
Infusing cognitive capabilities into the machine learning workflow to help determine when model results deteriorate and need to be tuned, and to provide suggestions for updates or changes.

Machine learning is needed where business rules are rapidly changing, where application development can’t keep pace with the changes that need to be made, or where applications need to be continually tuned. Instead of writing many complex business rules, you select an appropriate algorithm and parameters and build a model. Once created, the model is trained on historical data and deployed to recognize patterns and make predictions. Predictions are retained and compared to actual results as part of model monitoring. As the environment evolves, model results may deteriorate, at which point the data scientist can choose to retrain the model with the stored feedback data. By simplifying model management, Machine Learning for z/OS reduces the amount of maintenance in an application because the model is "aware" and always learning, becoming smarter over time.
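The monitoring loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the product's implementation: the class name `ModelMonitor`, the `tolerance` and `window` parameters, and the simple accuracy metric are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ModelMonitor:
    """Retains (prediction, actual) pairs and flags accuracy deterioration.

    All names and defaults here are hypothetical, chosen for illustration.
    """
    baseline_accuracy: float   # accuracy measured when the model was deployed
    tolerance: float = 0.05    # assumed allowed drop before retraining is suggested
    window: int = 100          # number of most recent outcomes to evaluate
    history: List[Tuple[int, int]] = field(default_factory=list)

    def record(self, predicted: int, actual: int) -> None:
        """Store one prediction together with the outcome observed later."""
        self.history.append((predicted, actual))

    def recent_accuracy(self) -> float:
        """Fraction of correct predictions over the most recent window."""
        recent = self.history[-self.window:]
        if not recent:
            return self.baseline_accuracy
        hits = sum(1 for predicted, actual in recent if predicted == actual)
        return hits / len(recent)

    def needs_retraining(self) -> bool:
        """True when recent accuracy has fallen below baseline minus tolerance."""
        return self.recent_accuracy() < self.baseline_accuracy - self.tolerance
```

A caller would invoke `record` each time an actual outcome arrives and check `needs_retraining` periodically. A real deployment would likely use a richer metric than plain accuracy.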

Why is Machine Learning Important to zEnterprise Customers?

Many of our enterprise customers have expressed an interest in leveraging the latest analytics technology with the flexibility to deploy on premises, in the cloud, or in a hybrid environment. That said, many of our z Systems customers are not yet ready to move their most sensitive data to the cloud. They want to take advantage of their significant existing investment in infrastructure, minimize costly data movement, and ensure data governance and security. For some customers this will be their first entry into the machine learning domain. For them, we have made the process much simpler by lowering the bar for development and maintenance of predictive behavior models. For customers who already have extensive data science expertise, we have simplified development and maintenance by providing cognitive expertise to build behavioral models and automation to maintain those models over time. This frees their data developers and data scientists to enhance existing models and to bring data science to new areas of the business.

Machine Learning for z/OS also offers RESTful APIs and programming APIs to perform tasks such as transactional scoring. Scoring allows zEnterprise customers to evaluate a transaction against a machine learning model to determine in real time, for example, the risk of pre-diabetes, the likelihood of medication adherence/compliance, or the risk of over-payment prior to claims payment, and to make real-time decisions based on this information (e.g. elastic drug pricing). This type of real-time scoring requires access to the actual transactional data, which means the model scoring engine should be collocated with the transactions to meet transactional SLAs. Machine Learning for z/OS includes the tools and functions needed to train and deploy machine learning models and to automate machine learning workflows. It includes collaboration features for personas such as data scientists and application developers, and capabilities to determine when models need to be tuned and to advise changes. Through its web UI, RESTful APIs, and programming APIs, it provides a suite of functions to: ingest all types of zEnterprise data; transform and cleanse the data; train models on the data with a selected algorithm; evaluate a trained model; select optimal models and algorithms through the Cognitive Assistant for Data Scientist (CADS) interface; manage models; deploy models into production; automate feedback to ingest new data and re-train models; monitor model status and resource utilization; call models for online scoring through RESTful APIs; and use the machine learning APIs interactively through a data scientist notebook interface.

IBM makes it possible for customers to satisfy these requirements while benefiting from the latest analytics advancements, such as Machine Learning for z/OS. They can access z Systems data in place and combine that data with other sources of information, such as structured and unstructured data from other systems. They can then build models that predict customer behavior and use them to make the best business decisions. And by accessing live data they can be more agile. This is exactly what our large customers want to do.

What is the IBM DB2 Analytics Accelerator?

The IBM DB2 Analytics Accelerator for z/OS (the Accelerator) is a high-performance appliance for DB2 for z/OS that deeply integrates Netezza's balanced, highly parallelized, asymmetric massively parallel processing technology with IBM z Systems technology at the database kernel level. The Accelerator allows DB2 to offload data-intensive and complex static and dynamic DB2 queries (e.g. data warehousing, business intelligence, and analytic workloads) to the Accelerator without any application changes. With the Accelerator, these queries can be executed significantly faster than was previously possible, while avoiding expensive general purpose CPU (GP) utilization in DB2 for z/OS. The performance and cost savings of the Accelerator open up unprecedented opportunities for organizations to make use of their data on the zEnterprise platform.

The Analytics Accelerator is conceptually similar to a hybrid automobile. The hybrid automobile has a standard vehicle user interface (e.g. steering wheel, brake, accelerator pedal). At any given time it may run on its gasoline or its electrical power source to optimize fuel economy. The automobile itself switches between power sources, without requiring constant manual intervention by the user or a change in the standard vehicle APIs.

With the DB2 Analytics Accelerator, DB2 for z/OS can offload data-intensive and complex static and dynamic DB2 for z/OS queries, such as data warehousing, business intelligence and analytic workloads, transparently to the application. The DB2 Analytics Accelerator then executes these queries significantly faster than previously possible—all while avoiding CPU utilization by DB2 for z/OS. It allows users to run workloads that historically were offloaded from z Systems, or run queries that were governed or shunted in DB2 for z/OS such as ad hoc queries whose performance characteristics are typically unknown at runtime. And IT administrators can allow DB2 for z/OS to choose where to run these queries, or they can force these queries to the DB2 Analytics Accelerator to prevent additional DB2 for z/OS consumption.

The accelerator delivers dramatic improvement in response time on unpredictable, complex, and long-running dynamic and static query workloads. It helps in meeting SLAs and shortening batch windows by offloading complex query workloads. The idea is to keep what’s working well in DB2 and improve response times for CPU intensive queries.
The accelerator allows users to run new workloads that had previously not been considered for the mainframe, or run queries that had previously been governed or shunted in DB2 (e.g. ad hoc queries whose performance characteristics are typically unknown at runtime). Clients can allow DB2 to choose where to run these queries, or they can force these types of queries to the accelerator to prevent additional DB2 consumption.
By offloading resource intensive queries and the associated processing onto the accelerator, clients can lower MSU consumption. Additionally, they can reduce the cost of storing, managing, and processing historical data with a near line storage solution.
There is also the reduction in costs associated with the time it takes to perform general tuning and administration tasks associated with supporting and improving performance for resource intensive workloads in DB2 for System z.
Clients can also lower or eliminate the cost of acquiring HW and SW for data warehousing and analytics as well as lowering or eliminating the cost incurred from data movement, transformation, landing, storage, and maintenance of systems. With the accelerator, clients can consolidate disparate data to their existing zEnterprise platform while benefiting from integrated operational BI.
With Accelerator-only tables and in-DB transformation capabilities, data can be Extracted from a number of source systems, Loaded into the Accelerator, and Transformed within the Accelerator (ELT). Applications directly access the transformed data through DB2 for z/OS. Accelerator-only tables can be used to store transformed data ‘only’ in the Accelerator and not maintain a second copy in z/OS.
Organizations increase their agility by responding more rapidly with immediate, accurate information and delivering new insights to business users.
Reporting is consolidated on zEnterprise where the majority of the data being analyzed lives, while retaining zEnterprise security and reliability.

How Does the Analytics Accelerator Complement and Improve the Enterprise Data Lake Strategy?

The Analytics Accelerator was designed to be used in concert with DB2 z/OS with a vision to become the first true Hybrid Transactional and Analytics Processing Engine (HTAP). The Analytics Accelerator was intended to be complementary to a zEnterprise data lake strategy and not competitive. Several new features within the Analytics Accelerator actually reduce the costs of data movement to the data lake AND improve the data latency of the data that is landed in the data lake.

In 2017, two new features will further the Analytics Accelerator’s ability to complement a zEnterprise data lake strategy.

Transactional consistency in the Analytics Accelerator: With this feature, DB2 applications will no longer need to be concerned with data currency within the Analytics Accelerator: the most current result set will be guaranteed. This removes the largest obstacle for much broader use of the Analytics Accelerator. Today many customers hesitate to use the Analytics Accelerator because they cannot guarantee that the queries can tolerate potentially stale data. With this feature, there will be no difference in latency between data returned by DB2 and by the Analytics Accelerator. This will make DB2 + the Analytics Accelerator the only true Hybrid Transactional and Analytics Processing Engine (HTAP) solution in the market.

Remove the cost of replication to the Analytics Accelerator from the 4HRA (the four-hour rolling average of MSU consumption): When customers say 'we can replicate to other environments', the Analytics Accelerator will have two major advantages. First, customers cannot guarantee transactional consistency when replicating to a separate environment (see above). Second, when sending data to an external environment, replication and ETL have a cost on z/OS, on top of the standard people, process, infrastructure, and data-breach liability costs of maintaining two copies with two separate access points. See our 'Cost of ETL' Calculator. With this feature, the cost of replication to the Analytics Accelerator will be removed from the 4HRA. Any other replication or ETL to disparate environments will still impact the 4HRA and thus lead to additional costs.
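To make the 4HRA argument concrete, here is a rough sketch of how a peak four-hour rolling average changes when replication overhead is counted or excluded. The hourly MSU figures are invented for illustration, and the sketch uses hourly samples for simplicity, so treat it as arithmetic only, not as a billing calculation.

```python
from typing import List


def peak_4hra(hourly_msu: List[float], window_hours: int = 4) -> float:
    """Return the highest average over any `window_hours` consecutive samples."""
    if len(hourly_msu) < window_hours:
        raise ValueError("need at least one full window of samples")
    averages = [
        sum(hourly_msu[i:i + window_hours]) / window_hours
        for i in range(len(hourly_msu) - window_hours + 1)
    ]
    return max(averages)


# Hypothetical hourly MSU consumption (invented numbers, for illustration only).
base_workload = [200, 220, 260, 300, 310, 280, 240, 210]
replication_overhead = [10, 10, 15, 20, 20, 15, 10, 10]

# Workload as it would be measured if replication were counted toward the 4HRA.
with_replication = [b + r for b, r in zip(base_workload, replication_overhead)]

peak_excluded = peak_4hra(base_workload)      # replication removed from the 4HRA
peak_included = peak_4hra(with_replication)   # replication counted in the 4HRA

print(f"Peak 4HRA, replication excluded: {peak_excluded} MSU")  # 287.5 MSU
print(f"Peak 4HRA, replication included: {peak_included} MSU")  # 305.0 MSU
```

With these invented figures, counting replication raises the peak by 17.5 MSU. That difference is the kind of cost the feature is meant to remove.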

As was mentioned above, the Analytics Accelerator also supports Accelerator-only tables (AoTs). With Accelerator-only tables and in-DB transformation capabilities, data can be Extracted from a number of source systems, Loaded into the Accelerator, and Transformed within the Accelerator (ELT). Applications directly access the transformed data through DB2 for z/OS. Accelerator-only tables can be used to store transformed data ‘only’ in the Accelerator.

What this all means is that the Analytics Accelerator data will be transactionally consistent with DB2 data, the replication of data from DB2 to the Analytics Accelerator will carry $0 cost, and the Analytics Accelerator will support in-accelerator transformations of data. Therefore, data can be replicated to the Accelerator, transformed to match the structure of data in the data lake, and extracted with zero latency relative to the DB2 data, without incurring any costs in DB2 and without having to extract to an intermediate ETL server to do the transformations. Such a solution avoids the high cost of extracting data from DB2 for System z, the cost of maintaining a set of ETL servers and complex ETL flows (test and production), the additional liability of data breach from maintaining additional data copies and interfaces to these copies, and the latency of moving this data through disparate systems before it lands in the data lake. Many customers are already using federation technologies between DB2 + the Analytics Accelerator and the data lake (Big SQL, Impala) to reduce data movement processes. With HTAP, $0 cost of replication to the Analytics Accelerator, and AoTs, the Analytics Accelerator is completely complementary to the enterprise data lake strategy and reduces the costs, data-breach liability, and latency associated with getting data from System z to the data lake.

How Can Machine Learning and the Analytics Accelerator Be Used?

Pharmaceutical Benefits Manager (PBM) example

The proposed solution architecture, with Machine Learning for z/OS and the IBM DB2 Analytics Accelerator at its core, is intended to drive substantial new analytics-driven revenue for clients while reducing existing people, process, and infrastructure costs. The solution provides the tooling for a client to derive actionable insight from its transactional data (that is, to monetize that data), reduces existing costs by reducing data and infrastructure sprawl across the enterprise, improves existing Service Level Agreements (SLAs), reduces data latency for analytics initiatives, and improves data governance. Ultimately, the goal of Machine Learning for clients is to take new, transactional AI solutions to market in an efficient and scalable manner. In the case of a 'Transparent Pharmaceutical Benefits Manager (PBM)', machine learning and the Analytics Accelerator can serve as the transactional analytics engine that delivers new revenue opportunities to a consumer. Showcasing state-of-the-art analytics and AI solutions may also attract new PBM opportunities (e.g. marketing machine learning based formulary and rebate management processes to earn new claims adjudication business). Some examples of opportunities for Machine Learning and the Analytics Accelerator in the PBM example are:

Example 1: Health Outcomes Optimization; ex Diabetes

Most health conditions being treated have metrics associated with success. Conditions can be segmented into common chronic conditions (e.g. diabetes, asthma, high cholesterol, high blood pressure, heart disease, arthritis) and uncommon, high-cost conditions needing specialty medications (e.g. rheumatoid arthritis, Crohn's disease, multiple sclerosis, cancers). Diabetes has very clear metrics tied to success (ABC: A = A1c, or average blood sugar; B = blood pressure; C = cholesterol). Unfortunately, payers and providers have limited views of the success metrics for a given population. A PBM can build out a predictive risk model to provide a health score for patients with diabetes and thereby segment the diabetes population into well controlled, moderate control, and poor control. By having this information available for real-time analysis inside its DB2 adjudication system, a PBM can enable its health plan clients to “treat/manage” these segments differently. For example, a member who is poorly controlled may receive additional counseling at the pharmacy, have a different copay, or trigger a different message to the physician. At both the point of care in the doctor’s office and the point of sale, the PBM would measure adherence to medications. If a patient is not at goal and was not taking their medications regularly, an adherence program could be implemented. If the patient was taking their medications, then a more potent medication or a new medication may be needed.
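As a rough illustration of the segmentation step, the sketch below sorts patients into the three segments by counting how many ABC metrics are at goal. It is a simple rule-based stand-in, not the predictive risk model described above. The goal thresholds and all names (`AbcReading`, `control_segment`) are placeholder assumptions for the example, not clinical guidance.

```python
from dataclasses import dataclass


@dataclass
class AbcReading:
    """One patient's latest ABC measurements (field names are hypothetical)."""
    a1c_percent: float
    systolic_bp: int
    ldl_cholesterol: int


# Placeholder goal thresholds, assumed for illustration only.
A1C_GOAL = 7.0
SYSTOLIC_GOAL = 140
LDL_GOAL = 100


def control_segment(reading: AbcReading) -> str:
    """Segment a patient by the number of ABC metrics that are at goal."""
    at_goal = sum([
        reading.a1c_percent < A1C_GOAL,
        reading.systolic_bp < SYSTOLIC_GOAL,
        reading.ldl_cholesterol < LDL_GOAL,
    ])
    if at_goal == 3:
        return "well controlled"
    if at_goal == 2:
        return "moderate control"
    return "poor control"


print(control_segment(AbcReading(6.5, 120, 90)))   # well controlled
print(control_segment(AbcReading(8.0, 120, 90)))   # moderate control
print(control_segment(AbcReading(9.0, 150, 90)))   # poor control
```

A trained model would replace the fixed thresholds with a learned score, but the downstream use is the same: the segment label drives counseling, copay, or physician messaging.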

The value of this type of analysis to consumers is that the PBM can help patients meet clinical goals and drive lower copays. For physicians, this type of analysis can be used to drive pay-for-performance programs. The analysis can also be used to drive value between the health plans and pharmaceutical companies. By leveraging the concept of differential rebates, this technology can help members achieve clinical goals. As more clinical goals are achieved, the pharmaceutical companies get paid more, and the health care systems can reduce costs. A PBM can monetize this by further aligning itself with the health systems (increased value to the health system from better clinical outcomes, more effective transactional scoring and auditing within fast pass and e-Prior Authorization control processes, etc.) and potentially driving increased revenue through its ‘prescription outcomes’ contracts.

Example 2: Major changes in “risk” - Resource Utilization Bands (RUBs)

The Johns Hopkins ACG uses regression-based modeling, drawing primarily on historical pharmacy and medical claims, to profile and predict risk for a population. Each member in a population receives a variety of risk scores. Members are also grouped into one of six RUBs (resource utilization bands): no data, healthy, low risk, medium risk, high risk, and very high risk. As the data inputs grow beyond medical and pharmacy claims to include behavioral data, care management data, EMR data, and consumer data, we can use ML to predict changes in risk more accurately and in a more timely manner. For example, 60% of people who take a chronic medication have at least one 30-day gap in a year. Some resume the medication after a few months, whereas many stop altogether. ML techniques can be used to identify the correlation between noncompliance and hospitalizations for certain diseases (e.g. a correlation is unlikely for high cholesterol, whereas a high correlation is likely for heart failure). This machine learning based modeling helps identify potential causes of changes in patient morbidity risk. The modeling analyzes changes in Resource Utilization Bands across consecutive periods and attempts to find correlations with a number of patient-related features. Uncovering factors that may predict changes in morbidity risk can be used to alert providers and health systems to potentially increasing morbidity risk and to suggest possible interventions to reduce it.
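The 30-day gap mentioned above is the kind of feature that would be derived from pharmacy claims before any modeling. A minimal sketch follows, assuming each fill is recorded as a fill date plus a days-supply count. The function name and input shape are hypothetical, and for simplicity the sketch ignores supply carried over from early refills.

```python
from datetime import date, timedelta
from typing import List, Tuple


def find_gaps(
    fills: List[Tuple[date, int]], min_gap_days: int = 30
) -> List[Tuple[date, date]]:
    """Return (gap_start, gap_end) for each period of at least `min_gap_days`
    between the day one fill's supply runs out and the date of the next fill.

    `fills` holds (fill_date, days_supply) pairs; this shape is an assumption.
    """
    gaps = []
    ordered = sorted(fills)
    # Pair each fill with the one that follows it chronologically.
    for (fill_date, days_supply), (next_fill, _) in zip(ordered, ordered[1:]):
        run_out = fill_date + timedelta(days=days_supply)
        if (next_fill - run_out).days >= min_gap_days:
            gaps.append((run_out, next_fill))
    return gaps


# Invented fill history: two on-time fills, then a late third fill.
fills = [
    (date(2017, 1, 1), 30),
    (date(2017, 1, 31), 30),
    (date(2017, 4, 15), 30),
]
print(find_gaps(fills))  # one gap, from 2017-03-02 to 2017-04-15
```

A per-member flag such as "had at least one gap this year" could then be used as one input feature when looking for correlations with hospitalizations or RUB changes.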

Example 3: Showcasing the Value of Machine Learning Driven Insight to Existing Clients

With the ability to access medical data files from existing customers, a PBM can:

Use ML capabilities to show correlations (e.g. patient attributes and co-morbidity) using medical/health data
Apply Johns Hopkins ACG functions to this data
Show clients the value of ML to clinical outcomes
Integrate ML features into the existing application (e.g. via a Bot)
Sell this new application as a service to clients

Other clients may have other interesting data sources. For example, some customers may engage human coaching companies that have a wealth of data, interactions with members, and asynchronous communications that can be leveraged in machine learning modeling.

Example 4: Fast Pass, e-Prior Authorization, Alternative Drug Recommendation

There are three cases to consider. The first is where the drug is covered (i.e. it is the preferred drug). Machine learning can be used to identify additional situations where fast pass is appropriate versus those that need additional controls. The second case is the process of recommending alternative drugs, which requires the pharmacy to contact the provider. The third case exists within the e-Prior Authorization control mechanisms. Again, streamlining these processes and determining where additional controls make sense (or do not) is something that machine learning models can help with. This adds value for the pharmacy, the providers, the consumers, and the health plans.

Example 5: Drive new revenue at Hospital Systems

There are several immediate potential opportunities that exist within small to medium hospital health systems using Machine Learning.

The first opportunity is with employee health at these hospital systems. Small to medium systems may have 50K employees. In the case of employee health, every 10K employees represents $100M in employee spend. Machine learning driven insight can be used to show these hospital systems how a PBM can help save 5-7% on employee health costs and improve the quality of service for its employees.
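Worked through with the figures quoted above (50K employees, $100M of spend per 10K employees, 5-7% savings), the opportunity looks like this. The function name is hypothetical, and integer arithmetic is used to keep the dollar amounts exact.

```python
SPEND_PER_10K_EMPLOYEES = 100_000_000   # USD, figure quoted in the text
SAVINGS_RANGE_PERCENT = (5, 7)          # savings range quoted in the text


def savings_range(employees: int) -> tuple:
    """Return (annual_spend, low_savings, high_savings) in whole USD."""
    annual_spend = employees * SPEND_PER_10K_EMPLOYEES // 10_000
    low, high = (annual_spend * pct // 100 for pct in SAVINGS_RANGE_PERCENT)
    return annual_spend, low, high


spend, low, high = savings_range(50_000)
print(f"Annual spend: ${spend:,}; savings: ${low:,} to ${high:,}")
# Annual spend: $500,000,000; savings: $25,000,000 to $35,000,000
```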

The second potentially large opportunity is to use machine learning to help hospital systems optimize revenue for specialty products. A transparent PBM typically wants to align with the hospital systems. For example, there are cases where hospital systems are treating patients that require expensive drugs (MS, HIV). Historically, some of these health systems started prescribing the drugs and sending them out to a 3rd party who would handle the filling of the medication. These drugs often represented $50K of medication. This presents an opportunity for the PBM to showcase what they can do as a partner and sell new core services to the hospital health system.

Smaller hospital health systems may also be more interested in population health management, for instance in understanding the factors that lead some people to take their medications and others to skip doses or not fill their prescriptions. Machine learning is key to uncovering factors that humans may not have previously considered.

Example 6: Drive Value to Retail Clinics/Stores

Promote patient medication adherence using other financial motivators, such as free co-pay cards for use at retail pharmacies or retail grocery store coupons for health food options.

Saturday, June 17, 2017

CSI: DB2 Historical Data Forensics On Demand for Audit Defense

Imagine that you get audited by the IRS for claiming a large business loss due to your online retail business facing some unforeseen competition. You try to recall the details of all your business expenses, such as the times you used your car and home for business purposes. You wish you had kept log records of all your business activities neatly organized and indexed on your computer for quick analysis. Instead, you attempt to cobble the details together to put forth some semblance of proof. Every detail that you cannot prove costs you money.

Now imagine you are the Risk Officer at a $30 billion/year enterprise that services some of the most sensitive transactional data in the world. This could be Social Security numbers, medical records and lab results, credit card numbers, or account balances. Changes to this data are under constant scrutiny by regulatory bodies in each industry sector. Many organizations devote significant financial and technical resources to risk management. For example, internal governance rules may require housing 20+ years of historical records in case of a lawsuit. Audits related to government regulations (HIPAA, SEC Rule 17a-4) may require not only maintenance of historical data, but also a view of all data changes. In order to do this, organizations may:

Transform all transactional ‘Update’ operations into ‘Insert’ and ‘Delete’ pairs to retain before-and-after images of records.
Employ procedural code (e.g. triggers) to keep track of changes.
Create copies of the historical data on external systems which may increase the liability of data breach and lead to additional costs related to data copying, transformation, storage, and maintenance.

Performing these tasks may cost millions of dollars per year in additional transactions, additional procedural computing, increased storage, copying data to external environments, etc. But what if there was a way to:

Keep an entire history of changes to the data without manually changing the transactions themselves (i.e. without requiring code to transform updates into insert/delete pairs)
Automatically maintain beginning and end timestamps for each row of data where the timestamps indicate the “life” of the data (i.e. without requiring procedural code)
Access and analyze this data via the transactional systems (without impacting resources on these transactional systems)
Create a snapshot of the data as it existed at any point in time or range(s) of time with massive parallelism (without creating separate data connections and credentials)

All of these "data forensic" enabling features are made possible on System z through two technologies. The first is a capability within DB2 for z/OS called Temporal Tables. The second is the IBM DB2 Analytics Accelerator (the Accelerator). Please see the following paper (published soon) for details on using Temporal Tables and the Accelerator for 'Historical Data Forensic' capabilities On Demand!
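The idea behind begin and end timestamps can be illustrated with a small in-memory model. This is a conceptual sketch only, not DB2 syntax: with temporal tables the database maintains the row versions itself. The names `RowVersion`, `apply_update`, and `as_of` are hypothetical, and the sketch assumes an inclusive start and exclusive end for each version.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

END_OF_TIME = datetime.max  # marks the version that is currently in effect


@dataclass
class RowVersion:
    """One version of a row with the period during which it was in effect."""
    key: str
    balance: int
    sys_start: datetime   # inclusive
    sys_end: datetime     # exclusive


def apply_update(versions: List[RowVersion], key: str,
                 new_balance: int, when: datetime) -> None:
    """Close the current version of `key` (if any) and add the new one.

    This mimics what the database would do automatically on an UPDATE,
    so the application never writes insert/delete pairs itself.
    """
    for version in versions:
        if version.key == key and version.sys_end == END_OF_TIME:
            version.sys_end = when
    versions.append(RowVersion(key, new_balance, when, END_OF_TIME))


def as_of(versions: List[RowVersion], when: datetime) -> List[RowVersion]:
    """Return the row versions that were in effect at the instant `when`."""
    return [v for v in versions if v.sys_start <= when < v.sys_end]


history: List[RowVersion] = []
apply_update(history, "ACCT-1", 100, datetime(2017, 1, 1))
apply_update(history, "ACCT-1", 250, datetime(2017, 6, 1))

snapshot = as_of(history, datetime(2017, 3, 1))
print([(v.key, v.balance) for v in snapshot])  # [('ACCT-1', 100)]
```

The snapshot query is a simple filter on the two timestamps. That makes point-in-time analysis over long histories a natural fit for a massively parallel engine.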

Thursday, June 15, 2017

New opportunities to drive analytics value into business operations: IBM DB2 Analytics Accelerator

Today, many System z clients are using the IBM DB2 Analytics Accelerator (the Accelerator) to help their organizations gain even greater insight and value from their data. Organizations can offload data-intensive and complex DB2 for z/OS queries to the Accelerator in order to support data warehousing, business intelligence and analytic workloads. The Accelerator executes these queries quickly, without requiring CPU utilization by DB2 for z/OS. The Accelerator is a logical extension of DB2 for z/OS, so DB2 manages and regulates all access to the Accelerator. DB2 for z/OS directly processes relevant workloads, such as OLTP queries and operational analytics. Queries that run more efficiently in a massively parallel processing (MPP) environment are seamlessly rerouted by DB2 for z/OS to the Accelerator. There is one set of credentials that is governed by RACF security, and all access flows through DB2 for z/OS. Users often first see the business value of the Accelerator in handling long-running queries, but many are also finding that the Accelerator can drive cost savings in areas such as administration, storage and consolidation as well as delivering real-time analytics.

This white paper discusses how organizations can improve analytic insight with the IBM DB2 Analytics Accelerator. It offers guidance to help organizations more quickly uncover new opportunity areas where the Accelerator can have the greatest impact. The paper covers topic areas including:

    •    Accessing enterprise data in place
    •    Gaining advocates from IT, application teams and Lines of Business
    •    Uncovering and expanding opportunities for the DB2 Analytics Accelerator
    •    Measuring the business value of the DB2 Analytics Accelerator
    •    Case studies
    •    The potential for the DB2 Analytics Accelerator to provide even greater ROI

Monday, September 14, 2015

z Analytics Business Value Validation Methodology

Are you considering an investment in a z Systems Analytics solution? How will you evaluate the Return on Investment (ROI) that will be realized using this solution? Does the measurement of 'Return' align with your business objectives? The z Systems 'Business value validation workshop' offered by IBM will validate, both technically and financially, whether and how a z Systems centric solution can help you meet your key business objectives. Typical business objectives include cost savings, cost avoidance, new customer value, customer satisfaction, reduced liability, and increased security.

For example, a fictional company 'Acme Systec' is focused on reducing costs and data sprawl. This assessment would be used to explore the savings, efficiencies, and new value that can be gained by reducing data sprawl within Acme Systec's IT infrastructure through use case definition, requirement gathering, technical validation, and a cost benefit analysis. The workshop would focus on Acme Systec's specific environment and business requirements, forging a partnership between the application teams, infrastructure teams, and key decision makers. The application teams provide relevant insight into use cases and business usage, while the infrastructure teams provide insight into current costs and technical configurations. The workshop recommendations provide a holistic approach to both technical architecture improvement and financial cost reduction.

The following link contains a sample offering focused on determining the cost savings that can be realized through DB2 z/OS + the IBM DB2 Analytics Accelerator: IDAA Cost Benefit Analysis Link. For more information about the Business Value Validation Methodology for z Systems, please contact your local IBM z Systems sales specialist.

-Shantan

Friday, September 11, 2015

Could your analytics strategy cost your business USD 100 million?

How new technologies can help protect your analytics data and your bottom line

    Technology trends and forces such as cloud, mobile and big data can represent big opportunities to bring analytic insight to the enterprise. They can also represent big risks if proper data security and governance controls are not in place. In 2015, one of the largest health benefits companies in the United States reported that its systems were the target of a massive data breach. This exposed millions of records containing sensitive consumer information such as social security numbers, medical IDs and income information. Various sources, including The Insurance Insider, suggest that this company's USD 100 million cyber-insurance policy would be depleted by the costs of notifying consumers of the breach and providing credit monitoring services—and that doesn’t consider other significant costs associated with a breach such as lost business, regulatory fines and lawsuits.
    Data is now so important that it has a value on the balance sheet. Cyber criminals know this. Without exception, every industry has been under attack and suffered data breaches: healthcare, government, banking, insurance, retail, telco. Once a company has been breached, hackers focus on other companies in that same industry to exploit similar vulnerabilities. In 2015 the average cost of a data breach was USD 3.79 million, with breaches also causing long-term damage to the brand, loss of faith, and customer churn.
    As you think about the impacts of this and other data security breaches occurring at organizations worldwide, consider this question: how exposed is your business to a similar type of breach? To answer this question, you must first ask, “Where does the data that feeds our analytics processes originate?”

See my full paper here

-Shantan