This is a list of features we propose to deliver in future releases of Mondrian. Each feature is linked to a high-level description. Complex features will have more detailed specifications in a separate document.
This document has several goals. First, it lets the Mondrian community know what features we are thinking about implementing. There may be better ways of delivering the same functionality that we haven't thought of.
Second, since there is always more work than time, it allows us to prioritize. If we hear that a particular feature is important to a lot of people, we will try to get to it sooner.
Third, it allows us to attract resources. If there are features in this roadmap which are important to your organization, consider sponsoring Mondrian's development.
Mondrian's goal is to bring multidimensional analysis to the masses.
To do this it needs to be:
Mondrian is an open-source OLAP server written in pure Java, and we feel that it meets these goals. We can't anticipate all of our customers' requirements, but open-source combined with Java keeps Mondrian flexible. It's easy to add functionality or to integrate third-party tools, and Mondrian can be integrated into a variety of environments.
Mondrian is part of the Pentaho Open Source BI Suite. Pentaho aims to deliver the best possible user experience by integrating Mondrian with other open-source components such as Kettle, Pentaho Reporting, and Weka. While building this integration, Pentaho is committed to keeping Mondrian independent from other components, and available under a commercial-friendly open-source license.
Mondrian can't do everything. If it did everything, it would be a huge download, difficult to install, and even more difficult to integrate with other software; and we'd never finish writing it. But the good news is, this is open source. If a feature is missing, it's often easy to add the feature to Mondrian or to integrate with another open-source product that provides the feature.
JPivot is Mondrian's sister project. It provides an excellent user-interface, and shows off what Mondrian can do. But we have been careful to keep the two projects separate. (You can use another user-interface to Mondrian, and you can also use JPivot with other data-sources.) If you've run Mondrian's demo and you have suggestions on how to improve the web interface, please make your suggestion to the JPivot project directly.
Pentaho encourages companies to sponsor development of features which are important to them. Sponsorship allows Mondrian developers to spend more time adding features to Mondrian, rather than having to find other ways to pay the rent. The results are always contributed back to the project as open-source.
Another way companies can help Mondrian is to assign employees to co-develop features. We can help specify and design these features, provided that the resulting code is contributed to the project.
If your organization would like to sponsor development of features, please contact Julian Hyde.
Targeted release timeframe: Q3 2008.
olap4j is a proposed standard API for access to any OLAP data source from Java. See www.olap4j.org.
As of mondrian-3.0, olap4j is the primary API to mondrian; mondrian's driver is based on olap4j-0.9.4 (beta). olap4j release 1.0 will be the first production release of the olap4j specification. It will include a full Test Compatibility Kit (TCK) and incorporate bug fixes and feedback from the drivers and applications built using the olap4j beta.
Targeted release timeframe: Q3 2008
Feature | Effort | Importance |
---|---|---|
Remove support for old API | low | medium |
3.12 Bridge to CWM. Integration with Pentaho Metadata. Could be incubator project. Note that someone has already implemented a bridge in one way. | high | high |
3.10 Further work on Aggregate Tables. To support the aggregation designer, mondrian release 3.1 will probably include utilities (2) DDL generation and (3) Utility (maybe graphical, maybe text-based) to recommend a set of aggregate tables. | high | high |
TBD |
Targeted release timeframe: Q2 2008
Effort: high, Importance: high, Priority: high
Release Highlights:
Targeted release timeframe – not specified
Effort: high, Importance: high, Priority: high
Release Highlights:
Effort: medium; importance: medium; priority: medium.
Whereas a regular cube has a single fact table, a partitioned cube has several fact tables, which are unioned together. The fact tables must have the same column names.
Each fact table can have a range (similar to 'cache ranges', above) which describes which data it holds. When looking for a particular cell, Mondrian scans the tables' criteria to determine which table to look in. For example, T1 holds data for Texas, 2005 onwards; T2 holds data for 2004 onwards; T3 holds all other data. The cell (Oklahoma, January 2005) would be found in T2: it is not in Texas, so T1 does not apply, and January 2005 falls within T2's range.
Partitioned tables are useful for real-time analysis. For example, one partition might contain today's data, while another might hold historical data. The 'hot' partition with today's data would typically have fewer or no aggregation tables and have caching disabled; its fact table might have different physical options in the RDBMS, say fewer indexes to maximize insert performance.
Example schema:
<Cube name="Sales">
  <Partitions>
    <Partition name="partition1"
        cache="false">
      <Table name="sales_fact_this_month"/>
      <Ranges>
        <Range dimension="[Time]">
          <RangeMember bound="lower" member="[Time].[2005].[9]"/>
        </Range>
        <Range dimension="[Store]">
          <RangeMember member="[Store].[USA].[CA]"/>
          <RangeMember member="[Store].[USA].[WA].[Seattle]"/>
        </Range>
      </Ranges>
    </Partition>
    <Partition name="partition2"
        cache="true">
      <Table name="sales_fact"/>
      <Ranges/>
    </Partition>
  </Partitions>
</Cube>
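The T1/T2/T3 example above can be sketched as a small routing routine. This is only an illustration of the idea, not Mondrian code: the class `PartitionRouter`, its methods, and the rule that partitions are scanned in declaration order with the first match winning are all assumptions made for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: picks the first partition whose range covers a cell. */
final class PartitionRouter {
    /** A partition's range: an optional state and an optional earliest year. */
    static final class Partition {
        final String table;
        final String state;      // null means "any state"
        final Integer fromYear;  // null means "any year"

        Partition(String table, String state, Integer fromYear) {
            this.table = table;
            this.state = state;
            this.fromYear = fromYear;
        }

        boolean covers(String cellState, int cellYear) {
            boolean stateOk = state == null || state.equals(cellState);
            boolean yearOk = fromYear == null || cellYear >= fromYear;
            return stateOk && yearOk;
        }
    }

    private final List<Partition> partitions = new ArrayList<>();

    PartitionRouter add(String table, String state, Integer fromYear) {
        partitions.add(new Partition(table, state, fromYear));
        return this;
    }

    /** Scans partitions in declaration order (an assumption); first match wins. */
    String route(String cellState, int cellYear) {
        for (Partition p : partitions) {
            if (p.covers(cellState, cellYear)) {
                return p.table;
            }
        }
        throw new IllegalStateException("no partition covers the cell");
    }

    /** The T1/T2/T3 layout from the example in the text. */
    static PartitionRouter example() {
        return new PartitionRouter()
            .add("T1", "Texas", 2005)
            .add("T2", null, 2004)
            .add("T3", null, null);
    }

    public static void main(String[] args) {
        PartitionRouter router = example();
        System.out.println(router.route("Oklahoma", 2005)); // T2
        System.out.println(router.route("Texas", 2005));    // T1
        System.out.println(router.route("Texas", 2003));    // T3
    }
}
```

A catch-all partition with no range, like T3 here or `partition2` in the schema, has to come last under this first-match rule, otherwise it would shadow the more specific partitions.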
Effort: medium; importance: medium; priority: low.
When Mondrian initializes and starts to process the first queries, it makes SQL calls to get member lists and determine cardinality, and then to load segments into the cache. When Mondrian is closed and restarted, it has to do that work again. This can be a significant chunk of time depending on the cube size. For example, in one test an 8GB cube (55M row fact table) took 15 minutes (mostly doing a group by) before it returned results from its first query, and absent any caching on the database server would take another 15 minutes if you closed and reopened the application. Now, this cube was just one month of data; imagine the time if there were 5 years' worth.
What ideas and designs can you come up with to speed that up, in other words to do anything time consuming only once and reuse it between instances?
Gang Chen: If it's possible, can we calculate the real levels of a parent-child hierarchy? This would bring Mondrian's metadata closer to MS AS's.
Julian Hyde: Can you give me more details on how that would work? Start a discussion forum or feature request on SourceForge.
Other options for cold start:
Effort: medium; importance: medium; priority: low.
If the cache contains aggregates for all children of a member, then Mondrian would be able to compute the aggregate for the parent member by rolling up.
See the email thread "grouper in Mondrian".
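A minimal sketch of the roll-up idea, assuming an additive measure such as sum. The class `RollupCache` and its methods are hypothetical and are not part of Mondrian; the sketch only shows that a parent's value can be derived when every child is already cached, and that a single missing child forces a trip to the database.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalDouble;

/** Hypothetical sketch: derives a parent's sum from cached child sums. */
final class RollupCache {
    private final Map<String, Double> sums = new HashMap<>();
    private final Map<String, List<String>> children = new HashMap<>();

    void putChildren(String parent, String... kids) {
        children.put(parent, Arrays.asList(kids));
    }

    void putSum(String member, double value) {
        sums.put(member, value);
    }

    /** Returns the cached or rolled-up sum, or empty if the database is needed. */
    OptionalDouble sum(String member) {
        Double cached = sums.get(member);
        if (cached != null) {
            return OptionalDouble.of(cached);
        }
        List<String> kids = children.get(member);
        if (kids == null || kids.isEmpty()) {
            return OptionalDouble.empty();
        }
        double total = 0;
        for (String kid : kids) {
            OptionalDouble part = sum(kid);
            if (!part.isPresent()) {
                return OptionalDouble.empty(); // one child missing: cannot roll up
            }
            total += part.getAsDouble();
        }
        sums.put(member, total); // remember the rolled-up value
        return OptionalDouble.of(total);
    }

    public static void main(String[] args) {
        RollupCache cache = new RollupCache();
        cache.putChildren("USA", "CA", "OR", "WA");
        cache.putSum("CA", 10.0);
        cache.putSum("WA", 5.0);
        System.out.println(cache.sum("USA").isPresent()); // false: OR is not cached
        cache.putSum("OR", 2.5);
        System.out.println(cache.sum("USA").getAsDouble()); // 17.5
    }
}
```

The same approach does not carry over to measures that cannot be combined from child values, such as distinct-count.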
Effort: medium; importance: low; priority: low.
Process to validate a schema.
Process to validate a set of queries. Maybe an option to ignore errors due to specific members not existing because the data hasn't been loaded yet.
Expose validation via Eclipse plugin.
Mondrian's name resolution is not always compatible with other MDX implementations such as MSAS and SAS.
[Products].[Boston Lager] seems to be valid in MSAS if product names are unique, whereas Mondrian currently requires [Products].[Beverages].[Beer].[Samual Adams].[Boston Lager].
[Customers].[(All customers)].[USA] would become [Customers].[USA]. Mondrian would still understand names of the previous form.
Implement standard MDX functions:
(Except is implemented in Mondrian 1.2, except for the ALL keyword.)
CWM (Common Warehouse Model) is a standard model for defining data warehouse and multidimensional schemas. It allows interoperability with tools such as UML diagrams, relational report design tools, and ETL tools.
This feature will add:
The standard aggregate functions are sum, count, distinct-count, min, max and avg. This feature will provide an SPI by which application developers can write their own aggregate functions.
The SPI will include:
The SPI will support functions which map to a SQL expression rather than a SQL aggregate function. The "avg" function is an example of this: it works by expanding itself to sum / count.
The SPI will support functions which can be computed from unaggregated fact table data, but cannot be rolled up. The "distinct-count" function is an example of this.
You will be able to include user-defined aggregate functions in aggregate tables.
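The shape of such an SPI might look like the following sketch. Everything here is an assumption made for illustration: the nested interface `AggregateSketches.Function`, its three methods, and the generated SQL strings are hypothetical and do not describe an existing Mondrian interface.

```java
/** Hypothetical sketch of a user-defined aggregate function SPI. */
final class AggregateSketches {
    private AggregateSketches() {}

    /** Illustrative SPI shape; not Mondrian's actual API. */
    interface Function {
        String name();

        /** SQL expression that computes the aggregate over an operand column. */
        String toSql(String operand);

        /** Whether finer-grained results can be combined into coarser ones. */
        boolean canRollUp();
    }

    /** Maps to a SQL expression rather than a single SQL aggregate function. */
    static final Function AVG = new Function() {
        @Override public String name() { return "avg"; }
        @Override public String toSql(String operand) {
            return "sum(" + operand + ") / count(" + operand + ")";
        }
        // An average of averages is not the overall average; in this sketch
        // the sum and count parts would have to be rolled up separately.
        @Override public boolean canRollUp() { return false; }
    };

    /** Computable from unaggregated fact rows, but cannot be rolled up. */
    static final Function DISTINCT_COUNT = new Function() {
        @Override public String name() { return "distinct-count"; }
        @Override public String toSql(String operand) {
            return "count(distinct " + operand + ")";
        }
        @Override public boolean canRollUp() { return false; }
    };

    public static void main(String[] args) {
        System.out.println(AVG.toSql("unit_sales"));
        System.out.println(DISTINCT_COUNT.toSql("customer_id"));
    }
}
```

The column names `unit_sales` and `customer_id` are placeholders.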
Utility to populate (or generate INSERT statements to populate) the agg tables. (For extra credit: populate the tables in topological order, so that higher level aggregations can be built from lower level aggregations.)
Utility to generate a script containing CREATE TABLE and CREATE INDEX statements for all possible aggregate tables (including indexes), XML for these tables, and comments indicating the estimated number of rows in these tables. Clearly this will be a huge script, and it would be ridiculous to create all of these tables. The person designing the schema could copy/paste from this file to create their own schema.
This is essentially an optimization algorithm, and it is described in the academic literature. Constraints on the optimization process are the amount of storage required and the estimated time to populate the agg tables. The algorithm could also take into account usage information.
I'm thinking of these being utilities, not part of the core runtime engine. There's plenty of room to wrap these utilities in nice graphical interfaces, make them smarter.
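The "topological order" idea for populating aggregate tables can be sketched with a depth-first topological sort. This is a sketch under assumptions: the class `AggPopulationOrder`, the dependency map, and the table names are hypothetical, and the sketch takes it as given that some other step has already decided which table each aggregate is built from.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: orders tables so each is populated after its sources. */
final class AggPopulationOrder {
    private AggPopulationOrder() {}

    /** dependsOn maps each aggregate table to the tables it is built from. */
    static List<String> order(Map<String, List<String>> dependsOn) {
        List<String> result = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        // TreeSet only makes the starting order deterministic.
        for (String table : new TreeSet<>(dependsOn.keySet())) {
            visit(table, dependsOn, done, inProgress, result);
        }
        return result;
    }

    private static void visit(String table, Map<String, List<String>> dependsOn,
            Set<String> done, Set<String> inProgress, List<String> result) {
        if (done.contains(table)) {
            return;
        }
        if (!inProgress.add(table)) {
            throw new IllegalArgumentException("cycle at " + table);
        }
        List<String> sources = dependsOn.get(table);
        if (sources == null) {
            sources = Collections.emptyList(); // a base table, e.g. the fact table
        }
        for (String source : sources) {
            visit(source, dependsOn, done, inProgress, result);
        }
        inProgress.remove(table);
        done.add(table);
        result.add(table); // emitted only after all of its sources
    }

    public static void main(String[] args) {
        Map<String, List<String>> dependsOn = new HashMap<>();
        dependsOn.put("agg_a_year", Arrays.asList("agg_b_month"));
        dependsOn.put("agg_b_month", Arrays.asList("agg_c_day"));
        dependsOn.put("agg_c_day", Arrays.asList("sales_fact"));
        System.out.println(order(dependsOn));
    }
}
```

In the example, the fact table appears first in the result simply because everything else depends on it; a real utility would skip it when generating INSERT statements.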
The old API (mondrian.olap package) still exists but is deprecated; from mondrian-3.1 onwards, classes and methods in this API may not exist, may not work, or may change.
rollupPolicy attribute of the <HierarchyGrant> element.
Descendants(<Member>, , LEAVES); Format can now be applied to DateTime values; Iif can be applied to member, level, hierarchy, dimension, tuple and set values; Levels can be applied to a string expression.
Removed methods that were deprecated in 2.4, plus:
GROUPING SETS SQL construct, for databases which support it. By leveraging Grouping Sets, Mondrian can reduce the number of SQL queries necessary to fulfill an MDX request, and databases can often execute the combined queries more efficiently than the individual queries. Grouping Sets are currently supported in Oracle, DB2, Teradata and Microsoft SQL Server.
Extract(<Set>, <Dimension>[, <Dimension>...]), Generate, Iif(bool, bool, bool), Len, Left, Mid, UCase.
[Products].&[1234].
API changes in release 2.4
DynamicSchemaProcessor. Moved the mondrian.rolap.DynamicSchemaProcessor interface to package mondrian.spi. The processSchema(URL, PropertyList) method now has signature processSchema(String, PropertyList), and the URL is intended to be interpreted as an Apache VFS URL. Class mondrian.spi.impl.FilterDynamicSchemaProcessor is a partial implementation.
Methods that took String or String[] to look up multi-part identifiers such as '[Store].[USA].[CA]' now have counterparts that take Id.Segment or List<Id.Segment>. The previous methods are deprecated and will be removed in mondrian-3.0 (see below).
Methods to look up multi-part identifiers which are deprecated in mondrian-2.4 and will be removed in mondrian-3.0:
Formula.Formula(String[], exp)
Formula.Formula(String[], Exp, MemberProperty[])
QueryPart.addFormula(String[], Exp, MemberProperty[])
SchemaReader.lookupCompound(OlapElement, String[], boolean, int)
SchemaReader.getMemberByUniqueName(String[], boolean)
SchemaReader.getMemberByUniqueName(String[], boolean, MatchType)
Util.explode(String)
Util.lookupCompound(SchemaReader, OlapElement, String[], boolean, int)
Util.lookup(Query, String[])
Other deprecated methods to be removed in mondrian-3.0:
Query.getQueryString()
QueryPart.toMdx()
RolapSchema.flushSchema(String, String, String, String)
RolapSchema.flushSchema(String, DataSource)
RolapSchema.clearCache()
RolapSchema.flushRolapStarCaches(boolean)
RolapSchema.flushAllRolapStarCachedAggregations()
CachePool.flush()
API changes which may impact existing applications:
mondrian-*-embedded.zip, including an embedded Derby database in the WAR. This can be deployed to Tomcat on any platform by simply exploding the WAR into TOMCAT/webapps, allowing folks "kicking the tires" to easily try out Mondrian/JPivot. See how to deploy and run the embedded web app.
VisualTotals, LastPeriods, AddCalculatedMembers, StripCalculatedMembers MDX functions.
/* ... */, -- [rest of line], // [rest of line]).
FunCall with the name of a function but no function definition. This complicated the validation process, because we would discover at runtime that a function call had no definition. Now you should use the new class UnresolvedFunCall.
Exp.getType() used to return int, now returns Type
Old usages of Exp.getType() should use Exp.getCategory()
int[] FunDef.getParameterTypes() is renamed to int[] FunCall.getParameterCategories()
int FunCall.getReturnType() is renamed to int FunCall.getReturnCategory()
Exp.getTypeX() method; old usages of this method should now use Exp.getType().
Cube, Dimension, Hierarchy, Level and Member no longer implement the Exp interface. If you want to use these in expressions, there are wrapper classes: DimensionExpr, HierarchyExpr, LevelExpr, MemberExpr. These are in a new package, mondrian.mdx. Some other parse tree classes (Query, Literal) will move to this package at some time in the future.
WITH SET syntax to define sets within an MDX query.
WITH SET feature and functions such as RANK cause the same expression to be evaluated many times within the course of a single MDX statement. The set-expression cache improves the performance of such queries.
Hierarchize, ":", Aggregate, and statistical functions.
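The set-expression cache mentioned above can be pictured as memoization: evaluate a set expression once per statement and reuse the result. The sketch below is an illustration only, not Mondrian's implementation. The class `SetExpressionCache` is hypothetical, and keying the cache on the expression text alone is a simplifying assumption; a real cache would also have to account for the evaluation context.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: memoizes set-expression results within one statement. */
final class SetExpressionCache {
    private final Map<String, List<String>> cache = new HashMap<>();
    private int evaluations = 0;

    /** Returns the cached result, calling the evaluator only on a miss. */
    List<String> evaluate(String setExpression,
            Function<String, List<String>> evaluator) {
        List<String> result = cache.get(setExpression);
        if (result == null) {
            evaluations++;
            result = evaluator.apply(setExpression);
            cache.put(setExpression, result);
        }
        return result;
    }

    /** Number of times the underlying evaluator actually ran. */
    int evaluations() {
        return evaluations;
    }

    public static void main(String[] args) {
        SetExpressionCache cache = new SetExpressionCache();
        // Stand-in evaluator; the member names are placeholders.
        Function<String, List<String>> evaluator =
            expr -> Arrays.asList("[Store].[USA].[CA]", "[Store].[USA].[WA]");
        cache.evaluate("[Top Stores]", evaluator);
        cache.evaluate("[Top Stores]", evaluator);
        System.out.println(cache.evaluations()); // 1
    }
}
```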
Author: Julian Hyde; last modified February 2008.
Version: $Id: //open/mondrian/doc/roadmap.html#28 $
Copyright (C) 2002-2009 Julian Hyde and others