MapDB 4 and the near future

I decided to start a new major MapDB version. The master branch has already been refactored and tagged as MapDB 4.

MapDB 3

  • MapDB 3 was announced more than 18 months ago.
  • The current stable branch is 3.0.
  • The 3.1 dev branch is cancelled
    • I started backporting changes to the 3.0.x releases;
    • for example, 3.0.5 had a major performance improvement that reduced lock overhead.
  • The 3.0 branch will be maintained until 4.0 is released and becomes stable enough (most likely December 2017)

  • I decided to start a new version because
    • Some features require a format change (external files for large records, extended records)
    • API changes; a lot of refactoring
    • changes in core classes (DBMaker, Serializers)
    • Some parts rewritten (Write Ahead Log, Volumes)

Major news in 4.0

  • Change in the StoreDirect format
    • better support for huge records
    • transparently put large records into external files
      • large records will bypass write-ahead-log, while preserving durability
    • better support for checksums and encryption
    • lazy streaming of large records (right now a record is loaded into byte[])
  • full support for zero copy
    • deserialization input stream reads directly from the mmapped file
    • write-ahead-log
  • redesign Volumes (file IO)
  • format change
    • support for values in external files
    • unified header
    • format evolution
      • old features will be deprecated, but not removed
  • way more automated tests
    • backward compatibility, format spec will be part of tests
  • MapDB will integrate with several libraries
    • it will be able to export/import data to Hadoop file formats, Spark…
    • I do not like having several tiny Maven projects, so everything will be in the MapDB artifact (or perhaps mapdb-extra)
      • in a separate package; later it might move into separate jar files
    • the MapDB artifact will depend on several libraries,
      • but those will be optional compile-time deps
      • the user will be responsible for providing those
  • integration with libs and extras
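
The zero-copy item above (deserialization reading directly from a memory-mapped file) can be illustrated with plain java.nio. This is only a sketch and not MapDB code: the class name ZeroCopyReadSketch and the record layout (int id, int length, UTF-8 bytes) are assumptions made up for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration; none of these names come from MapDB.
final class ZeroCopyReadSketch {

    // Assumed record layout: [int id][int length][UTF-8 bytes].
    static void writeRecord(Path file, int id, String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 8 + utf8.length);
            buf.putInt(id);
            buf.putInt(utf8.length);
            buf.put(utf8);
            buf.force(); // flush the mapped region to disk
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decodes the record straight from the mapped region; this code never
    // copies the payload into an intermediate byte[].
    static String readRecord(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            int id = buf.getInt(0);
            int length = buf.getInt(4);
            ByteBuffer payload = buf.duplicate(); // shares the mapping, own position/limit
            payload.position(8);
            payload.limit(8 + length);
            return id + ":" + StandardCharsets.UTF_8.decode(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes one record to a temporary file and reads it back.
    static String roundTrip(int id, String text) {
        try {
            Path file = Files.createTempFile("zero-copy", ".bin");
            file.toFile().deleteOnExit();
            writeRecord(file, id, text);
            return readRecord(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling ZeroCopyReadSketch.roundTrip(7, "hello") writes a single record and decodes it back from the mapping. A real store would also need bounds checks, checksums and a format header, which this sketch leaves out.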

  • MapDB will unify various types of collections
    • Spark-like
    • Chronicle-like
    • primitive collections over flat arrays (or memory mapped files)
    • flat collections over mmap files
  • support for Streams and Parallel Streams
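
To make the idea of primitive collections over flat memory with Stream support more concrete, here is a minimal sketch of a fixed-size long array stored in a ByteBuffer. FlatLongArraySketch and its methods are hypothetical names, not a planned MapDB API. The buffer here is allocated off-heap; a MappedByteBuffer obtained from FileChannel.map could be passed in the same way.

```java
import java.nio.ByteBuffer;
import java.util.stream.IntStream;
import java.util.stream.LongStream;

// Hypothetical illustration; not part of MapDB.
final class FlatLongArraySketch {
    private final ByteBuffer data;
    private final int size;

    FlatLongArraySketch(ByteBuffer data) {
        this.data = data;
        this.size = data.capacity() / Long.BYTES;
    }

    int size() {
        return size;
    }

    // Absolute get/put do not move the buffer position.
    long get(int index) {
        return data.getLong(index * Long.BYTES);
    }

    void set(int index, long value) {
        data.putLong(index * Long.BYTES, value);
    }

    // Sequential stream over all elements, no boxing.
    LongStream stream() {
        return IntStream.range(0, size).mapToLong(this::get);
    }

    // Parallel variant; safe here only while nobody writes concurrently.
    LongStream parallelStream() {
        return stream().parallel();
    }

    // Fills an off-heap array with 1..n and sums it with a parallel stream.
    static long sumOfFirst(int n) {
        FlatLongArraySketch array =
                new FlatLongArraySketch(ByteBuffer.allocateDirect(n * Long.BYTES));
        for (int i = 0; i < n; i++) {
            array.set(i, i + 1L);
        }
        return array.parallelStream().sum();
    }
}
```

The int-based indexing limits this sketch to buffers below 2 GB; a real implementation would need long offsets and several mapped regions.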

Changes in development

  • I kept too tight a grip on MapDB and tried to make it perfect, which made development too slow
  • way more blog posts
    • comments on various projects, algorithms, papers
    • staging place for documentation,
      • a new feature will first be documented in a blog post for comments, then moved into a separate chapter
  • YouTube channel
    • screencast videos that walk through code in an IDE (very fast to produce, good for a quick introduction)
  • change in the way documentation is made
    • bullet point oriented format
      • very fast to make, very readable
      • Antirez from Redis originally used this format
      • contributors are welcome to re-edit and polish the documentation
    • more code oriented
      • code examples will be written first, before the implementation
  • change in release cycle
    • MapDB 4 is the last major release
    • various formats will be introduced and deprecated, but never removed
    • new formats (or collections) will start a new file header and use different implementations
    • a new minor (4.X) version will be out every month
    • integration tests take about a week to finish
      • a dedicated machine will run integration tests nonstop
      • so every week there will be a stable snapshot release or a minor (4.0.X) bugfix release
  • changes in unit tests
    • way more unit tests
    • test full matrix of all configuration options; CPU is cheap
    • concurrency stress tests
    • performance regression testing (the MapDB 3 release was a disaster)
    • test storage format compatibility (can read and modify files generated by an older 4.0.0 release)
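
Testing the full matrix of configuration options boils down to enumerating the Cartesian product of all option values and running the same test suite against each combination. A minimal sketch follows; the option names ("volume", "transactions", "checksum") are invented for illustration and are not actual MapDB settings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; option names are assumptions, not MapDB settings.
final class ConfigMatrixSketch {

    // Returns every combination of option values (Cartesian product).
    static List<Map<String, Object>> combinations(Map<String, List<?>> options) {
        List<Map<String, Object>> result = new ArrayList<>();
        result.add(new LinkedHashMap<>()); // start with one empty configuration
        for (Map.Entry<String, List<?>> option : options.entrySet()) {
            List<Map<String, Object>> next = new ArrayList<>();
            for (Map<String, Object> partial : result) {
                for (Object value : option.getValue()) {
                    Map<String, Object> extended = new LinkedHashMap<>(partial);
                    extended.put(option.getKey(), value);
                    next.add(extended);
                }
            }
            result = next;
        }
        return result;
    }

    // Example option space: 3 * 2 * 2 = 12 configurations.
    static Map<String, List<?>> exampleOptions() {
        Map<String, List<?>> options = new LinkedHashMap<>();
        options.put("volume", Arrays.asList("heap", "mmap", "file"));
        options.put("transactions", Arrays.asList(true, false));
        options.put("checksum", Arrays.asList(true, false));
        return options;
    }
}
```

Each returned map could then be fed into a parameterized test. The number of combinations grows multiplicatively with every option, which is why such a matrix usually runs on a dedicated machine rather than on every build.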

Roadmap for next 3 months

  • the first priority is to finish the Elsa serialization library, but its final version will be released together with MapDB 4

  • MapDB 4.0 should be out at the end of October
    • with features of MapDB 3, but without open TODOs (missing compaction)
  • there will be many blog posts describing my progress on MapDB

  • a semi-stable release (one that passes acceptance tests) should be out every week

New features after 4.0 release

I have a very long list of ideas, so I will go through my bookmarks and notes and put everything into a series of blog posts.

So far the most requested features are:

  • extra collections to support cryptography and blockchain applications
    • authenticated Merkle tree (immutable, fast creation with data pump)
    • authenticated skip list, already written for IODB
  • LSM Store based on IODB
    • supports snapshots
    • supports branching (the same way as Git or other VCS)
  • data pump for everything
    • including hashmap
    • fast creation is important for merge algorithms
  • Spark compatibility
    • Spark Data Frames is a functional data transformation language
      • it also defines how data should be partitioned to fit into memory on a single node
    • Spark uses several nodes
    • but single-node Spark swaps data in and out of memory
    • MapDB can do it way more efficiently (10x?)
    • so I want to have some compatibility with Spark Data Frames
  • Query planner support
    • support some sort of SQL-ish language with a query planner and executor
    • take inspiration from the Postgres extension API
    • SQL engine from SQLite VM?
    • use Spark Catalyst??
  • reactive support
    • planned for a very long time; I played with Kilim in 2008, and JDBM3 was originally steered in this direction
    • based on Kotlin continuations and perhaps a similar framework
    • based on AsynchronousFileChannel
    • non-blocking disk IO
    • should include MapDB and most of its collections
    • support for Akka, RxJava and similar frameworks
  • time series database
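
The reactive item above mentions non-blocking disk IO based on AsynchronousFileChannel. A minimal sketch of what such a read could look like with the JDK alone is shown below; AsyncReadSketch and readAsync are hypothetical names, and wrapping the callback in a CompletableFuture is just one possible design, not a decided MapDB API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CompletableFuture;

// Hypothetical illustration; not part of MapDB.
final class AsyncReadSketch {

    // Starts a read and returns immediately; the future completes on an IO thread.
    // Note: a single read may deliver fewer bytes than requested.
    static CompletableFuture<ByteBuffer> readAsync(AsynchronousFileChannel channel,
                                                   long offset, int length) {
        CompletableFuture<ByteBuffer> future = new CompletableFuture<>();
        ByteBuffer buffer = ByteBuffer.allocate(length);
        channel.read(buffer, offset, null, new CompletionHandler<Integer, Void>() {
            @Override
            public void completed(Integer bytesRead, Void attachment) {
                buffer.flip(); // make the bytes that were read visible to the consumer
                future.complete(buffer);
            }

            @Override
            public void failed(Throwable error, Void attachment) {
                future.completeExceptionally(error);
            }
        });
        return future;
    }

    // Demo only: writes text to a temp file and reads it back asynchronously.
    // join() blocks here just so the demo can return a value.
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("async-read", ".bin");
            file.toFile().deleteOnExit();
            byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
            Files.write(file, bytes);
            try (AsynchronousFileChannel channel =
                         AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
                return readAsync(channel, 0, bytes.length)
                        .thenApply(buf -> StandardCharsets.UTF_8.decode(buf).toString())
                        .join();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The returned future can be adapted to RxJava, Akka or Kotlin continuations by their respective bridges; this sketch does not attempt that.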

  • graph database…
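
As a rough illustration of the authenticated Merkle tree mentioned in the list above, the sketch below computes a Merkle root bottom-up in a single pass over the leaves, which is the kind of bulk build a data pump would perform. It is not the IODB or MapDB implementation: the class name, the use of SHA-256, and the rule that an unpaired node is hashed with itself are all assumptions of this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not the IODB or MapDB implementation.
final class MerkleRootSketch {

    static byte[] sha256(byte[]... parts) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            for (byte[] part : parts) {
                digest.update(part);
            }
            return digest.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is required on every JVM
        }
    }

    // Builds the tree level by level and returns the root hash.
    static byte[] root(List<byte[]> leaves) {
        if (leaves.isEmpty()) {
            throw new IllegalArgumentException("at least one leaf is required");
        }
        List<byte[]> level = new ArrayList<>();
        for (byte[] leaf : leaves) {
            level.add(sha256(leaf));
        }
        while (level.size() > 1) {
            List<byte[]> parents = new ArrayList<>();
            for (int i = 0; i < level.size(); i += 2) {
                byte[] left = level.get(i);
                // Assumption: an unpaired last node is combined with itself.
                byte[] right = i + 1 < level.size() ? level.get(i + 1) : left;
                parents.add(sha256(left, right));
            }
            level = parents;
        }
        return level.get(0);
    }

    // Convenience wrapper: hex-encoded root for string leaves.
    static String rootHex(String... values) {
        List<byte[]> leaves = new ArrayList<>();
        for (String value : values) {
            leaves.add(value.getBytes(StandardCharsets.UTF_8));
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : root(leaves)) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

Changing any leaf changes the root, which is what makes the structure authenticated. A production design would also separate leaf hashes from inner-node hashes and keep the intermediate levels so that membership proofs can be served.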