MapDB 4 and the near future

I have decided to start a new major version of MapDB. The master branch has already been refactored and tagged as MapDB 4.

MapDB 3

MapDB 3 was announced more than 18 months ago.
The current stable branch is 3.0.
The 3.1 dev branch is cancelled
- I started backporting changes to 3.0.x releases;
- for example, 3.0.5 had a major performance improvement that reduced lock overhead.
The 3.0 branch will be maintained until 4.0 is released and becomes stable enough (most likely December 2017).
I decided to start a new version because
- Some features require format change (external files for large records, extended records)
- API changes; a lot of refactoring
- changes in core classes (DBMaker, Serializers)
- Some parts rewritten (Write Ahead Log, Volumes)

Major news in 4.0

Format change in StoreDirect
- better support for huge records
- transparently put large records into external files
  - large records will bypass write-ahead-log, while preserving durability
- better support for checksums and encryption
- lazy streaming of large records (right now a record is loaded whole into a byte[])
full support for zero copy
- deserialization input stream reads directly from the memory-mapped file
- write-ahead-log
redesign Volumes (file IO)
- refactor file IO to make better use of memory-mapped files
- support for AsynchronousFileChannel and non-blocking disk IO, with continuations and lightweight threads
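To make the non-blocking disk IO idea concrete, here is a minimal sketch using only the JDK's AsynchronousFileChannel. It is not MapDB code: the class name AsyncReadSketch and the helpers readAsync and demo are hypothetical names made up for illustration, and a real Volume would need to re-issue short reads rather than fail on them.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch, not part of MapDB.
final class AsyncReadSketch {

    /** Starts a read of `size` bytes at `offset`; the caller is not blocked. */
    static CompletableFuture<byte[]> readAsync(AsynchronousFileChannel channel,
                                               long offset, int size) {
        CompletableFuture<byte[]> result = new CompletableFuture<>();
        ByteBuffer buffer = ByteBuffer.allocate(size);
        channel.read(buffer, offset, null, new CompletionHandler<Integer, Void>() {
            @Override
            public void completed(Integer bytesRead, Void attachment) {
                if (bytesRead < size) {
                    // Simplification: treat a short read (or EOF, -1) as an error.
                    result.completeExceptionally(
                        new EOFException("expected " + size + " bytes, got " + bytesRead));
                } else {
                    result.complete(buffer.array());
                }
            }

            @Override
            public void failed(Throwable error, Void attachment) {
                result.completeExceptionally(error);
            }
        });
        return result;
    }

    /** Writes 8 bytes to a temp file, then reads 4 of them back asynchronously. */
    static byte[] demo() {
        try {
            Path file = Files.createTempFile("async-read", ".bin");
            Files.write(file, new byte[] {1, 2, 3, 4, 5, 6, 7, 8});
            try (AsynchronousFileChannel channel =
                     AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
                // join() blocks only here, in the demo, to collect the result.
                return readAsync(channel, 2, 4).join();
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo())); // prints [3, 4, 5, 6]
    }
}
```

Returning a CompletableFuture keeps the sketch independent of any particular continuation or reactive framework; a caller can chain further work with thenApply instead of blocking.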
format change
- support for values in external files
- unified header
- format evolution
  - old features will be deprecated, but not removed
way more automated tests
- backward compatibility, format spec will be part of tests
MapDB will integrate with several libraries
- it will be able to export/import data to Hadoop file formats, Spark…
- I do not like having several tiny Maven projects, so everything will be in MapDB artifacts (or perhaps mapdb-extra)
  - in a separate package; later it might move into separate jar files
- MapDB artifact will depend on several libraries,
  - but those will be optional compile time deps
  - user will be responsible for providing those
integration with libs and extras
MapDB will unify various types of collections
- Spark-like
- Chronicle-like
- primitive collections over flat arrays (or memory-mapped files)
- flat collections over mmap files
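As an illustration of what a primitive collection over a memory-mapped file could look like, here is a small sketch built only on the JDK. It is an assumption about the general shape, not the planned MapDB API: MappedLongArray and its methods are hypothetical names, and the sketch ignores resizing and files larger than 2 GB.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.LongBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch, not part of MapDB: a fixed-size long[] backed by a file.
final class MappedLongArray {
    private final MappedByteBuffer mapped;
    private final LongBuffer longs;

    private MappedLongArray(MappedByteBuffer mapped) {
        this.mapped = mapped;
        this.longs = mapped.asLongBuffer(); // view over the same memory, no copy
    }

    /** Maps `length` longs from the file, growing the file if it is shorter. */
    static MappedLongArray open(Path file, int length) {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // The mapping stays valid after the channel is closed.
            return new MappedLongArray(
                channel.map(FileChannel.MapMode.READ_WRITE, 0, (long) length * Long.BYTES));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    long get(int index) { return longs.get(index); }

    void set(int index, long value) { longs.put(index, value); }

    int length() { return longs.capacity(); }

    /** Asks the OS to write dirty pages back to the file. */
    void flush() { mapped.force(); }

    /** Fills an array, re-opens the file, and sums the values read back. */
    static long demo() {
        try {
            Path file = Files.createTempFile("mapped-longs", ".bin");
            file.toFile().deleteOnExit();
            MappedLongArray written = open(file, 1000);
            for (int i = 0; i < written.length(); i++) {
                written.set(i, i);
            }
            written.flush();
            MappedLongArray reopened = open(file, 1000);
            long sum = 0;
            for (int i = 0; i < reopened.length(); i++) {
                sum += reopened.get(i);
            }
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 499500
    }
}
```

Values are stored without boxing or serialization, which is the property that makes flat collections attractive for large data sets.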
support for Streams and Parallel Streams

Changes in development

I kept too tight a grip on MapDB and tried to make it perfect, which made development too slow
- Lessons from MapDB development blog post
- in the future I will move faster, but keep quality where it matters: automated unit and acceptance tests
way more blog posts
- comments on various projects, algorithms, papers
- staging place for documentation,
  - a new feature will first be documented in a blog post for comments, then moved into a separate chapter
YouTube channel
- screencast videos that walk through code in an IDE (very fast to produce, good for a quick introduction)
change in the way documentation is made
- bullet point oriented format
  - very fast to make, very readable
  - Antirez from Redis originally used this format
  - contributors are welcome to re-edit and polish the documentation
- more code oriented
  - code examples will be written first, before the code itself
change in release cycle
- MapDB4 is the last major release
- various formats will be introduced, and deprecated, but never removed
- new formats (or collections) will start new file header, and use different implementations
- a new minor (4.X) version will be out every month
- integration tests take about a week to finish
  - a dedicated machine will run integration tests nonstop
  - so every week there will be a stable snapshot release or a minor (4.0.X) bugfix release
changes in unit tests
- way more unit tests
- test full matrix of all configuration options; CPU is cheap
- concurrency stress tests
- performance regression testing (the MapDB 3 release was a disaster)
- test storage format compatibility (can read and modify files generated by an older 4.0.0 release)
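Testing the full matrix of configuration options amounts to enumerating the Cartesian product of all option values and running the same test against each combination. The sketch below shows one way to generate that matrix. The option names and values ("store", "transactions", "checksum") are hypothetical placeholders, not the actual MapDB configuration keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not part of MapDB.
final class ConfigMatrix {

    /** Returns every combination that picks one value for each option. */
    static List<Map<String, Object>> combinations(Map<String, List<?>> options) {
        List<Map<String, Object>> result = new ArrayList<>();
        result.add(new LinkedHashMap<>()); // start with one empty configuration
        for (Map.Entry<String, List<?>> option : options.entrySet()) {
            List<Map<String, Object>> next = new ArrayList<>();
            for (Map<String, Object> partial : result) {
                for (Object value : option.getValue()) {
                    // Extend each partial configuration with each value of this option.
                    Map<String, Object> extended = new LinkedHashMap<>(partial);
                    extended.put(option.getKey(), value);
                    next.add(extended);
                }
            }
            result = next;
        }
        return result;
    }

    /** Made-up option names and values, for illustration only. */
    static Map<String, List<?>> sampleOptions() {
        Map<String, List<?>> options = new LinkedHashMap<>();
        options.put("store", Arrays.asList("heap", "direct", "mmap"));
        options.put("transactions", Arrays.asList(true, false));
        options.put("checksum", Arrays.asList(true, false));
        return options;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> matrix = combinations(sampleOptions());
        System.out.println(matrix.size()); // prints 12 (3 * 2 * 2)
        for (Map<String, Object> config : matrix) {
            System.out.println(config); // a real suite would run the tests with this config
        }
    }
}
```

The matrix grows multiplicatively with every added option, which is why a dedicated machine and long-running test cycles fit this approach.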

Roadmap for next 3 months

the first priority is to finish the Elsa serialization library, but its final version will be released together with MapDB 4
MapDB 4.0 should be out at the end of October
- with the features of MapDB 3, but without open TODOs (missing compaction)
there will be many blog posts describing my progress on MapDB
a semi-stable release (one that passes acceptance tests) should be out every week

New features after 4.0 release

I have a very long list of ideas, so I will go through my bookmarks and notes and put everything into a series of blog posts.

So far the most requested features are:

extra collections to support cryptography and blockchain applications
- authenticated Merkle tree (immutable, fast creation with data pump)
- authenticated skip list, already written for IODB
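For readers unfamiliar with authenticated structures, the sketch below computes a Merkle root over a list of records, building the tree bottom-up in one pass per level, which is roughly the access pattern a data pump would use. It is a simplified illustration, not the IODB or MapDB implementation: the class MerkleRootSketch, the 0/1 prefix bytes for leaves and inner nodes, and the rule of promoting an unpaired node unchanged are all choices made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of MapDB or IODB.
final class MerkleRootSketch {

    /** SHA-256 over the concatenation of the given byte arrays. */
    static byte[] sha256(byte[]... parts) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            for (byte[] part : parts) {
                digest.update(part);
            }
            return digest.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is required on every JVM
        }
    }

    /** Hashes the records into leaves, then pairs hashes until one root remains. */
    static byte[] root(List<byte[]> records) {
        if (records.isEmpty()) {
            throw new IllegalArgumentException("at least one record is required");
        }
        List<byte[]> level = new ArrayList<>();
        for (byte[] record : records) {
            level.add(sha256(new byte[] {0}, record)); // prefix 0 marks a leaf
        }
        while (level.size() > 1) {
            List<byte[]> parents = new ArrayList<>();
            for (int i = 0; i < level.size(); i += 2) {
                if (i + 1 < level.size()) {
                    // prefix 1 marks an inner node
                    parents.add(sha256(new byte[] {1}, level.get(i), level.get(i + 1)));
                } else {
                    parents.add(level.get(i)); // unpaired node is promoted unchanged
                }
            }
            level = parents;
        }
        return level.get(0);
    }

    /** Convenience wrapper for string records. */
    static byte[] rootOf(String... records) {
        List<byte[]> bytes = new ArrayList<>();
        for (String record : records) {
            bytes.add(record.getBytes(StandardCharsets.UTF_8));
        }
        return root(bytes);
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (byte b : rootOf("a", "b", "c")) {
            hex.append(String.format("%02x", b));
        }
        System.out.println(hex);
    }
}
```

Because the root depends on every record, two parties can compare a single 32-byte hash to check that they hold the same immutable data set.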
LSM Store based on IODB
- supports snapshots
- supports branching (the same way as Git or other VCS)
data pump for everything
- including hashmap
- fast creation is important for merge algorithms
Spark compatibility
- Spark Data Frames are a functional data transformation language
  - it also defines how data should be partitioned to fit into memory on a single node
- Spark uses several nodes
- but single-node Spark swaps data in and out of memory
- MapDB can do it way more efficiently (10x?)
- so I want to have some compatibility with Spark Data Frames
Query planner support
- support some sort of SQL-ish language with a query planner and executor
- take inspiration from Postgres extension API
- SQL engine from SQLite VM?
- use Spark Catalyst??
reactive support
- planned for a very long time; I played with Kilim in 2008, and JDBM3 was originally steered in this direction
- based on Kotlin continuations and perhaps a similar framework
- based on AsynchronousFileChannel
- non-blocking disk IO
- should include MapDB and most of its collections
- support for Akka, RxJava and similar frameworks
time series database
graph database…

Comments

Eduard Dudar • 3 years ago

No pressure of course but wondering what are the current plans for 4.0 release. Some features like non-blocking IO are very sweet but github shows only 1 issue in closed for 4.0 and about 60 opened.