An organization’s analytics strategy is how its people, processes, tools, and data work together to collect, store, and analyze data. Processes refers to how analytics are produced, consumed, and maintained. A more modern approach to analytics is intended to support greater business agility at scale, which requires faster data preparation from a wider variety of sources, rapid prototyping and analytics model building, and cross-team collaboration processes. Tools, or technologies, are the programs and applications used to prepare for and perform analyses, along with the provisioning, flow, and automation of tasks and resources. As an analytics strategy matures, the technologies used to implement it tend to move from monolithic structures to composable microservices. The last element is data. A modern analytics architecture supports a growing volume and variety of data sources, which may include data from data warehouses and data lakes: streaming data, relational databases, graph databases, unstructured or semi-structured data, text data, and images.
Analytics Past, Present, and Future
| | Past | Present | Future |
|---|---|---|---|
| | This refers to an era of analytics starting in the 1990s and running through the mid-2000s. During this phase, organizations consolidated mostly transactional data into a unified system, often a data warehouse, which limited end users’ ability to interact directly with the data due to technical and governance requirements. | Starting in the late 2000s, organizations were forced to rethink how they used analytics, in no small part due to the explosion of data during this time. This was the era of “Big Data” and its infamous “V’s”: volume, velocity, and variety. As organizations shifted their approach during this period, they unlocked diagnostic analytics, or the capability to answer “Why did it happen?” | The future of analytics is converged. Converged analytics unifies advances in AI, streaming data, and related technologies into a seamless analytics experience for all users. This arrangement unlocks prescriptive analytics across an organization, allowing anyone to make data-driven decisions that answer important questions. |
| People | IT professionals were needed to kick off any data-based work by extracting data from a centralized, difficult-to-use source. This process could take multiple days, and the number of query requests could easily exceed both the IT team’s capacity to fulfill them and the opportune window for new insights. If some change was needed to data collection or storage methods, it could easily take IT months to perform. The data analysis and modeling work could take nearly as long. Rank-and-file domain experts did have some access to data through so-called self-service business intelligence (BI) features. However, due to the same speed and accessibility issues that technical professionals faced, it was often difficult for domain experts like line-of-business leaders to truly lead with data for decision making. | It’s no coincidence that around the same time Big Data emerged, so did the role of the data scientist. Compared with earlier roles like researcher or statistician, the data scientist blends quantitative and domain expertise with a greater degree of computational thinking. These skills became necessary both to handle the greater variety and volume of data sources and to update and deploy data and analytics models without the assistance of IT professionals. Whereas IT once sought to meticulously catalog and structure data before entering it into a data warehouse, they no longer needed to clean all data before collecting it; analytics teams could instead focus on ease of use and speed to governed access. With these new workflows and organizational structures in place, domain leaders are better able to lead with data, both via self-service BI tools and through frequent collaboration with data analysts, data scientists, and other data specialists. | Statisticians and IT served information to business users at the inception of wider analytics adoption; further into maturity, data analysts and data scientists built systems where business users could self-serve insights. In a converged architecture, not only is the business user at the center, but their decision making is augmented by automation. The convergence of teams and workstreams yields more collaboration, more automation, and greater scale for data-driven insights. Teams can work cross-functionally and in parallel across different domains, iterating the system to their needs and gaining the time and human resources needed to create and maintain analytics products such as dashboards and models. |
| Processes | IT professionals spent long periods gathering requirements for analytics projects before they could build or deploy solutions. The team meticulously catalogued sources of data used across the organization, from financial or point-of-sale systems to frequently used external datasets. As part of the warehousing process, the team decided which of these data sources to store and how to store them. Once deployed, data passed into the warehouse through an extract-transform-load (ETL) process, in which the data was copied from these various sources, cleaned and reshaped into the defined structure of the data warehouse, then inserted into production. In other words, data went through rigorous cleaning and preprocessing before use. To reach this data, users needed to write time-consuming ad hoc queries. Alternatively, particular data segments or summaries that were frequently requested by business users could be delivered via scheduled automation to reports, dashboards, and scorecards. | As opposed to earlier analytics strategies, IT professionals now seek to collect data as is from any possible source of value. This data can come in a variety of formats, so few predefined rules or relationships are established for ingestion. Depending on data size, data is processed in batches over discrete time periods or as streams and events in near real time. Because data cleaning is the last step, this process is sometimes referred to as extract-load-transform (ELT), as opposed to the ETL of earlier architectures (a minimal sketch contrasting the two follows this table). For data scientists and other technical professionals, faster access to more, and more dynamic, data better enables the rapid development of training datasets for machine learning models, whose performance improves as more data is passed to them. As more data is collected and put into production, the importance of a data governance process typically grows, describing who has authority over data and how that data should be used. Similar approaches are necessary to audit how models are put into production and how they work. | While perhaps using different means, the ends of older analytics approaches were the same: insights, whether historic or in support of future decisions, using governed data and processes. In pursuing those ends, however, infrastructure tended to bloat, whether from fragile data storage jobs or increasingly complex data pipelines. Given the volume, velocity, and variety of data needed for prescriptive analytics, such monolithic, centralized approaches are less than optimal. Using the tools discussed in the next section, a converged architecture offers a more nimble approach for providing the right insights at the right time to users of all technical levels. Such democratization relies on quick deployment and adjustment of data products; optimizing production, for example, requires bringing more machine learning models to production faster and at scale. The practice of ModelOps is used to institute and govern such rapid production. These processes have become a necessity in rapidly changing business conditions; for example, as the COVID-19 pandemic made structural changes to the economy, many models lost their predictive edge in the face of fundamentally different data (a simple drift-monitoring sketch follows this table). |
| Tools | Data warehouses implemented some new technologies relative to the traditional relational database model. Importantly, the data warehouse separated data into fact tables, where measurements were stored, and dimension tables, which contained descriptive attributes. Business users interacted with the data via reporting software to view static data summaries, which tended to rely on overnight batch jobs to update. In a more sophisticated architecture, analysts could take advantage of online analytical processing (OLAP) cubes. Usually relying on a star schema, OLAP cubes let users query the data across dimensions during interactive sessions; for example, they could “slice and dice” or “roll up and drill down” on the data (a toy star-schema example follows this table). By this point, end users had some autonomy in how they looked at and acted upon the data. Automated processes to inform business activities through data were also put into place, such as alerts when inventory or sales dropped below some threshold. Basic what-if analyses also helped business users evaluate decisions and plan for the future. That said, given the limited sources of data in the data warehouse, there were limited ways to customize and work with the data. While reporting and basic analytics were automated, end users operated largely without the assistance of models developed by statisticians. Although business intelligence and operations research both seek to create value from data, too often these complementary tools were siloed. | In 2011, James Dixon, then chief technology officer of Pentaho, coined the term data lake for the architecture needed to support the next level of analytics maturity. Dixon argued that the inherently rigid structures of data warehouses made it difficult to get value from the increasing volume and variety of data associated with Big Data; that arrangement simply wasn’t suited to operate on or capitalize on such expansion. A data lake, “a repository of data stored in its natural/raw format,” was a better approach. The data lake is often powered by cloud computing for the benefits of reliability, redundancy, and scalability. Dominant cloud service providers include Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). Open source technologies like Hadoop and Spark are used to process and store massive datasets using parallel computing and distributed storage. Because this data is often unstructured, it may be stored in graph, document, or other non-relational databases. With the increasing volume and velocity of data, and the use of data lakes alongside data warehouses to enable data-driven decisions, businesses needed better ways to scale and share business intelligence. One such path was through interactive, immersive exploration and visualization of the data, as pioneered by Spotfire. Other paths were through visual reports and dashboards, as offered not just by Spotfire but also by Jaspersoft, Power BI, WebFOCUS, and many others. As BI tools matured, so did self-service capabilities and automation for end users. | If maintaining legacy analytics is like raising a thoroughbred, then developing converged analytics is like cultivating a school of goldfish. That is, the backend provisioning is no longer served by monolithic systems but rather by composable groups of microservices. This arrangement supports elastic and scalable analytics; composability makes it easier to adapt to changes driven in part by a growing volume and variety of data sources. In previous analytics approaches, the distinction between backward-facing BI and prediction-focused data science was clear. Under convergence, analytics at the edge is possible: automating analytic computations so they can be performed on non-centralized data generated by sensors, switches, and similar devices. With converged analytics, individuals no longer need to wait for data science teams to provide deeper ad hoc insights. They have all the data-driven insights at their fingertips, assisted by AI to quickly explore and make decisions. This isn’t just the case for back-office analysts: frontline workers can, for example, adjust how they interact with a customer based on data retrieved about that customer at the time of the interaction. |
| Data | During this period, data tended to be transactional, or related to sales and purchases. Take a point-of-sale (POS) system, for example. Each time a sale is made, information about what was sold, and possibly to whom, is recorded in the POS system. Those records can be compiled into tables and ultimately processed into a data warehouse. Under this process, data is gathered from prespecified sources at prespecified times, such as a nightly POS extract. Not all data made its way to the data warehouse, especially in the earlier days of analytics, either because it was judged unimportant or because it was not prioritized. | Contemporary analytics expands the variety of data available and used: both structured tables and unstructured sources like natural language and images are available. Thanks to stream processing, refreshes of this data are available in minutes or even less (a minimal streaming-aggregation sketch follows this table). In particular, the data lake can accommodate real-time events such as IoT sensor readings, GPS signals, and online transactions as they happen. | A primary feature of converged analytics is the blending of historical and real-time data. According to a study by Seagate and International Data Corporation (IDC), 30% of all data will be real time by 2025. In particular, IoT sensor readings, GPS signals, and online transactions are available for immediate analysis and modeling as they happen. |
| Agility | The relatively rigid nature of the data warehouse made changes to the collection and dissemination of data difficult. Consequently, business agility was limited. Business users could get historic data about the business through static reports (descriptive analytics). Through OLAP cubes, they could possibly even dig into the data to parse out cause and effect (diagnostic analytics). But without more immediate access to broader data, it was difficult to advance to predictive analytics, or the ability to ask: “What is going to happen?” | This next phase in the evolution of analytics gets data-driven insights into the hands of end users quickly, with technology allowing them to interact with those insights at a deeper level. Data scientists are able to build machine learning systems that improve with more data. Using drag-and-drop tools, business users can process and analyze data without technical assistance. With cloud, automation, and streaming technologies, organizations have been better able to adapt to and plan for changing circumstances. That said, machine learning works only so long in production before the algorithm struggles to account for changes to the business and needs intervention. While data scientists undertake these predictive challenges, BI professionals and domain experts tend to focus solely on analyzing current or past data. The next generation of analytics architecture will further reflect organizational needs for greater collaboration among data scientists, BI and analytics teams, and business users and consumers of analytics insights. | Earlier analytics tended to isolate skills and processes: technical versus highly technical roles, data collection versus deployment versus modeling, and so forth. Converged analytics promotes close collaboration between teams to rapidly model, deploy, and act on data. As data operations become decentralized, teams and individuals can rapidly mine and act on the analytics. In particular, the marriage of real-time data with machine learning and AI-infused BI allows any user to magnify their own domain knowledge with data-driven insights. These features square precisely with the definition of business agility as “innovation via collaboration to be able to anticipate challenges and opportunities before they occur.” With the support of converged analytics, any professional can detect and act on both challenges and opportunities at the moment of impact, rather than months later. |
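The fact-and-dimension split and the OLAP-style operations described in the Tools row can be made concrete with a small example. The following Python sketch uses pandas in place of a real warehouse or OLAP engine; the table names, columns, and values are invented for illustration only.

```python
# A toy star schema in pandas: a fact table of measurements joined to a
# dimension table of descriptive attributes, then "rolled up" by region.
import pandas as pd

sales_fact = pd.DataFrame({            # fact table: one row per sale (measurements)
    "store_id": [1, 1, 2, 3],
    "units":    [5, 2, 7, 1],
    "revenue":  [50.0, 20.0, 84.0, 9.5],
})
store_dim = pd.DataFrame({             # dimension table: descriptive attributes
    "store_id": [1, 2, 3],
    "region":   ["East", "West", "West"],
    "format":   ["Mall", "Street", "Street"],
})

# "Roll up": aggregate the fact measurements along a dimension attribute.
rollup = (sales_fact.merge(store_dim, on="store_id")
                    .groupby("region")[["units", "revenue"]]
                    .sum())
print(rollup)

# "Slice": restrict the analysis to one member of a dimension.
west_slice = sales_fact.merge(store_dim, on="store_id").query("region == 'West'")
print(west_slice)
```

A real OLAP cube would precompute such aggregates across many dimensions, but the join-then-aggregate pattern is the same idea.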
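The ETL-versus-ELT distinction in the Processes row comes down to when data is cleaned: before loading into a warehouse, or after loading into a lake. The sketch below is a minimal illustration, using in-memory SQLite databases as stand-ins for a warehouse and a lake; the records, columns, and cleaning rule are assumptions, not a prescribed implementation.

```python
# A minimal ETL vs. ELT sketch with pandas and SQLite.
import sqlite3
import pandas as pd

# A few raw point-of-sale records; one is missing its order_id.
raw = pd.DataFrame({
    "order_id":  ["A1", None, "A3"],
    "sale_date": ["2021-06-01", "2021-06-01", "2021-06-02"],
    "amount":    [19.99, 5.00, 42.50],
})

warehouse = sqlite3.connect(":memory:")  # stand-in for a data warehouse
lake = sqlite3.connect(":memory:")       # stand-in for a data lake

# ETL: transform (clean, enforce the schema) *before* loading into the warehouse.
clean = raw.dropna(subset=["order_id"]).assign(
    sale_date=lambda df: pd.to_datetime(df["sale_date"]).dt.strftime("%Y-%m-%d")
)
clean.to_sql("sales_fact", warehouse, index=False)

# ELT: load the raw data as is; transform later, at analysis time.
raw.to_sql("raw_pos_events", lake, index=False)
later = pd.read_sql("SELECT * FROM raw_pos_events", lake)
modeled = later.dropna(subset=["order_id"])  # the transformation happens last
print(len(clean), len(modeled))              # both paths yield two usable records
```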
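The Data row mentions stream processing, where events such as sensor readings or online transactions are handled as they happen rather than in a nightly batch. As a rough sketch of the idea, the snippet below folds incoming sale events into one-minute tumbling windows; the event format and the 60-second window size are assumptions for illustration.

```python
# Near-real-time aggregation sketch: fold each event into a tumbling window
# as it arrives, instead of waiting for a scheduled batch extract.
from collections import defaultdict
from datetime import datetime, timezone

WINDOW_SECONDS = 60
revenue_by_window = defaultdict(float)  # window start (epoch seconds) -> total revenue

def handle_event(event):
    """Add one sale event to its one-minute window as soon as it arrives."""
    ts = datetime.fromisoformat(event["timestamp"]).replace(tzinfo=timezone.utc)
    window_start = int(ts.timestamp()) // WINDOW_SECONDS * WINDOW_SECONDS
    revenue_by_window[window_start] += event["revenue"]

# In practice, events would arrive from a message bus or IoT gateway;
# a small list stands in for the stream here.
for event in [
    {"timestamp": "2021-06-01T12:00:05", "revenue": 19.99},
    {"timestamp": "2021-06-01T12:00:42", "revenue": 5.00},
    {"timestamp": "2021-06-01T12:01:10", "revenue": 42.50},
]:
    handle_event(event)

print(dict(revenue_by_window))  # running totals are fresh within a minute
```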
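The Processes and Agility rows both note that a model in production eventually struggles as business conditions change, and that ModelOps governs when to intervene. A minimal sketch of that kind of monitoring, with an invented error baseline and tolerance, might look like the following.

```python
# Drift-monitoring sketch: compare a production model's recent error against
# its validation-time baseline and flag it for retraining when it degrades.
from statistics import mean

BASELINE_MAE = 4.2       # mean absolute error observed when the model was validated
DRIFT_TOLERANCE = 1.5    # tolerate up to 50% degradation before intervening

def needs_retraining(actuals, predictions):
    """Return True when recent error exceeds the tolerated drift threshold."""
    recent_mae = mean(abs(a - p) for a, p in zip(actuals, predictions))
    return recent_mae > BASELINE_MAE * DRIFT_TOLERANCE

# Example: post-shock data looks nothing like the training data,
# so the check signals that the model needs intervention.
print(needs_retraining([100, 80, 120], [60, 150, 40]))  # True
```

In a real ModelOps pipeline, a check like this would run on a schedule against logged predictions, and a failure would trigger retraining or human review.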
©️2021, RecoHut.