On-line data analysis and on-line analytical processing. On-line analytic application servers integrating internal operating data and external subscription-based information via XML and the Web. Been there, done that, for quite a while, using a general-purpose, temporal, object-oriented database management engine, Vision, that is equally at home supporting portfolio analytics, TV ratings, and market research applications. Vision has been doing that job successfully and reliably, with no fundamental change to its architecture or information model, since its initial release in 1986. The in-house systems powered by Vision are mission critical in their organizations.
While Vision is a long-established technology, it remains state of the art in its ability to manage, analyze, and disseminate complex cross-sectional and time-varying information. With the recent acquisition of Innovative Systems Techniques and Vision by FactSet Research Systems (NYSE:FDS), a global provider of integrated online database and information services to the financial community, the scope, role, and deployment of Vision-based analytic data warehouses and applications are expanding, driven by the powerful synergies that come from possessing both an analytic database technology and the analytic data and applications that exploit it.
It is axiomatic that data cleaning and data integrity are the most costly, least glamorous, and most essential components of any analytic information system. They can easily be the most nuanced as well. Even in communities that are heavily invested in standardized infrastructure, data from different sources is rarely directly comparable. Keys that ostensibly name the same real-world entity are not stable. Different sources of information can be counted upon to reflect those instabilities at different times and, in many cases, in different, often idiosyncratic, ways. Keys are routinely overloaded to refer to different, but functionally related, things. GM, the company, is not the same as the common stock of GM, which is certainly not the same as the tracking stock of GM Hughes; in many cases, however, the distinctions are blurred, leaving context alone to disambiguate the intent. Even when keys are stable, the interpretation of the data associated with those keys is sensitive to its context in other ways. That context can include factors such as the currency in which the data was reported or the timing of events such as stock splits. Unlike errors in transaction files that track consumer activity, these problems cannot be ‘cleaned’ with simple statistical filtering and sampling techniques. Every piece of information counts. Using that information correctly requires embedding complex interpretative rules in the infrastructure of the data model and making those rules accessible to the database engine that manages and queries the data.
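The interpretative rules described above can be illustrated with a small sketch. The Python below is not Vision code and does not reflect how Vision itself models these rules; it assumes a hypothetical effective-dated ticker map (TICKER_HISTORY), a hypothetical split table (SPLITS), and a hypothetical exchange-rate table (FX_TO_USD), all with invented identifiers and values. It shows three of the context rules the paragraph names: resolving an unstable key as of a date, converting a reported amount into a common currency, and adjusting a historical price for a later stock split.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical effective-dated key map: a vendor's ticker names different
# internal security IDs over time, so every lookup is made "as of" a date.
TICKER_HISTORY = {
    ("vendorA", "XYZ"): [(date(1990, 1, 1), "SEC-001"),
                         (date(2000, 6, 1), "SEC-002")],  # ticker reassigned
}

# Hypothetical corporate actions: (effective date, split ratio); 2.0 = 2-for-1.
SPLITS = {"SEC-002": [(date(2001, 3, 1), 2.0)]}

# Hypothetical exchange rates into USD, keyed by (currency, date).
FX_TO_USD = {("EUR", date(2001, 1, 2)): 0.9}


def resolve(source, ticker, on):
    """Return the internal security ID that (source, ticker) named on date `on`."""
    history = TICKER_HISTORY[(source, ticker)]
    # Index of the last mapping whose start date is on or before `on`.
    i = bisect_right([start for start, _ in history], on) - 1
    if i < 0:
        raise KeyError(f"{source}:{ticker} was not mapped on {on}")
    return history[i][1]


def split_factor(security, price_date, as_of):
    """Product of split ratios effective after price_date, up to and including as_of."""
    factor = 1.0
    for effective, ratio in SPLITS.get(security, []):
        if price_date < effective <= as_of:
            factor *= ratio
    return factor


def to_usd(amount, currency, on):
    """Convert a reported amount to USD using the rate for the reporting date."""
    return amount if currency == "USD" else amount * FX_TO_USD[(currency, on)]


def normalize(source, ticker, price_date, price, currency, as_of):
    """Map a raw vendor price to (security ID, split-adjusted USD price as of `as_of`)."""
    security = resolve(source, ticker, price_date)
    usd = to_usd(price, currency, price_date)
    return security, usd / split_factor(security, price_date, as_of)
```

With these made-up tables, normalize("vendorA", "XYZ", date(2001, 1, 2), 100.0, "EUR", date(2002, 1, 1)) returns ("SEC-002", 45.0): the ticker resolves to the security it named in January 2001, the EUR price converts to 90.0 USD, and the 2-for-1 split in March 2001 halves it. The same ticker looked up as of 1995 resolves to SEC-001 instead.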
While complex time-varying analytic applications and long, exploratory transactions are clearly the forte of an analytic data warehouse, an often overlooked fact in the search for needles in haystacks is that, once built, an integrated data warehouse provides day-to-day value simply by existing and by virtue of the integration and quality of its data. That is especially true when the warehouse incorporates operational data on a timely basis. The basic applications that we have found to be the bread and butter of the systems we deploy run the gamut from one-time ad hoc queries to Web-packaged fact-lets containing standard bundles of information and analysis. These applications typically require high performance, especially when used as building blocks for Web content; however, because of the complexity of the interpretative rules that must be obeyed and the aggregations that must be performed, these are not simple queries. Supporting them efficiently requires recognizing that analytic applications take functional paths through multiple, usually wide, ‘tables’ and typically do not require record-oriented access to individual entities. While the principle is not new, the message bears repeating.
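A minimal sketch of the column-at-a-time access pattern the paragraph describes, assuming a simple in-memory layout in which each attribute of a wide ‘table’ is held as its own position-aligned list. The column names and values are invented for illustration, and this is a Python stand-in, not Vision's storage model or query language. The aggregation follows a functional path (sector, then price × shares, then the metric) and reads only the four columns it needs, never assembling whole records.

```python
from collections import defaultdict

# Hypothetical wide "table" stored column-wise: row i is the same security
# in every list. A real table might carry hundreds of such columns.
columns = {
    "name":     ["AAA", "BBB", "CCC", "DDD"],
    "sector":   ["Tech", "Tech", "Energy", "Energy"],
    "price":    [50.0, 25.0, 30.0, 20.0],
    "shares":   [100.0, 200.0, 200.0, 100.0],
    "pe_ratio": [25.0, 15.0, 10.0, 6.0],
    "country":  ["US", "US", "UK", "US"],  # never read by the query below
}


def cap_weighted_mean(cols, group_by, metric):
    """Market-cap-weighted mean of `metric` per group.

    Touches only the group_by, price, shares, and metric columns; other
    attributes of each security are never loaded or assembled into records.
    """
    weighted_sum = defaultdict(float)
    total_cap = defaultdict(float)
    for group, price, shares, value in zip(
        cols[group_by], cols["price"], cols["shares"], cols[metric]
    ):
        cap = price * shares  # derived value along the functional path
        weighted_sum[group] += cap * value
        total_cap[group] += cap
    return {g: weighted_sum[g] / total_cap[g] for g in weighted_sum}
```

Here cap_weighted_mean(columns, "sector", "pe_ratio") returns {"Tech": 20.0, "Energy": 9.0}. A record-oriented layout would fetch every attribute of every row to answer the same question, including the columns the query ignores.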
Before HTML and the Web, a database could supply data to whoever wanted it, provided they came to it on the database’s terms. The question of the day asked by information consumers was “What do I have to do to get what I want out of that system over there?” Typically, that meant using obscure APIs and arcane terminology. Databases were middlemen in a seller’s market dominated by those who understood their workings: take it or leave it. Of course, for those of us who serve, that’s no way to treat our clients, but that’s the way it was. With the arrival of the Web as a catalyst and HTML and XML as delivery vehicles, the world as we know it became a consumer’s marketplace. “I understand HTML, or this particular XML schema; can you give me what you have in a form that I understand?” became, or ought to become, the relevant question. Along the way, by learning anew to speak languages and not just APIs, the world has rediscovered the value of the interpreted language as a vehicle for communication. At least for us, that has greatly simplified the middleware and allowed us to do something we could always do: speak directly to information consumers in languages they understand.
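As a sketch of answering consumers in the form they ask for, the Python below renders one hypothetical fact-let either as a small XML document or as an HTML table, depending on the requested format. The element names (factlet, fact), the respond function, and the format keys are assumptions made for illustration, not a schema or API from Vision or FactSet; only the standard-library xml.etree.ElementTree and html modules are used.

```python
import html
import xml.etree.ElementTree as ET


def to_xml(facts):
    """Render facts as <factlet><fact name="...">value</fact>...</factlet> (hypothetical schema)."""
    root = ET.Element("factlet")
    for name, value in facts.items():
        ET.SubElement(root, "fact", {"name": name}).text = str(value)
    return ET.tostring(root, encoding="unicode")


def to_html(facts):
    """Render facts as a two-column HTML table, escaping names and values."""
    rows = "".join(
        f"<tr><th>{html.escape(name)}</th><td>{html.escape(str(value))}</td></tr>"
        for name, value in facts.items()
    )
    return f"<table>{rows}</table>"


# The consumer names the language it understands; the server picks the renderer.
RENDERERS = {"xml": to_xml, "html": to_html}


def respond(facts, requested_format):
    """Answer in whichever format the consumer says it understands."""
    try:
        render = RENDERERS[requested_format]
    except KeyError:
        raise ValueError(f"unsupported format: {requested_format}") from None
    return render(facts)
```

For example, respond({"ticker": "XYZ", "price": 47.0}, "html") yields a two-row HTML table, while the same call with "xml" yields a factlet element containing two fact children. Adding a new consumer language means registering one more renderer rather than changing the data.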
Having an integrated analytic database/application server technology like Vision changes one’s perspective on the problem at hand. With the ability to capture and use complex rules and relationships from within the database, including the rules that contextually present information in the language of its consumers, we significantly reduce the complexity of our deployments. Not only can our clients be arbitrarily thin; so can our plumbing. Most new development is conducted at the level of application and application support code written in the language of the Vision database system itself. Few new features are ever added to the core technology, and those that are added are implementation refinements. We are free to concentrate on the business of adding value to the data and the information. Such is the state of our art.
Michael J. Caruso
Senior Vice President, Strategic Vision Solutions
FactSet Research Systems, Inc.
One Gateway Center, Suite 910
Newton, MA 02458
(617) 965-8450 ext. 3002