Collecting and organising Big Data with OrientDB

Over the past nine months, we have been redeveloping QOREX (now well past the MVP phase), an application that aims to reduce business complexity by making sense of large amounts of business data: human resources, projects, business targets, and strategic objectives. These elements are common to every organisation and are connected to each other in a complicated net of records, which is a typical case for a NoSQL database.

When designing the architecture for QOREX’s database, we had to solve the problem of retrieving and analysing large amounts of data very quickly and in real time. We used the concept of the Tree Matrix (https://pdfs.semanticscholar.org/da83/7c708c59c2cb3c9c363f8871e4843f4fd7e7.pdf, https://wiki.inf.ed.ac.uk/twiki/pub/CSTR/TrajectoryModelling/HTS-Introduction.pdf) and combined it with three grammar concepts – verbs, nouns, and adjectives – to organise data in QOREX. This is the level of granularity QOREX requires, but one could imagine a more detailed structure that introduces other concepts. A similar concept was used by NASA for the Apollo 11 spacecraft computer (https://en.wikipedia.org/wiki/Apollo_Guidance_Computer).

Let’s quickly remind ourselves about each of these grammar concepts:

  • A noun is a word that names a person, place, thing, or idea.
  • A verb is a word used to describe an action, state, or occurrence that forms the main part of the predicate of a sentence.
  • An adjective is a word naming an attribute of a noun.

Nouns act on other nouns. Let’s think of a user called John, who needs to create a project and assign stakeholders to specific tasks of that project.

  • John -> Created -> Project 1
  • John -> Created -> Fred
  • John -> Updated -> Project 1 (assigning Fred as Project Manager of Project 1)

At this point we can say that:

  • Fred -> IsProjectManagerOf -> Project 1
  • Fred -> Updated -> Project 1 (he changed the status to Approved)
  • … and so on

This structure helps data analytics answer these questions:

  • Who acted on a subject? (What is Project 1’s history?)
  • Which elements is a user related to? (Which projects is Fred linked to?)
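
The noun -> verb -> noun structure can be sketched in a few lines of plain Python. This is only an in-memory illustration, not OrientDB code: the `Edge` tuple, the `history_of` and `related_to` helpers, and the field name `into` (used because `in` is a Python keyword) are all assumptions made for the example. `related_to` follows outgoing edges only.

```python
from collections import namedtuple

# Hypothetical in-memory edge: subject noun, verb, object noun.
Edge = namedtuple("Edge", ["out", "verb", "into"])

# The actions from the John and Fred example.
edges = [
    Edge("John", "Created", "Project 1"),
    Edge("John", "Created", "Fred"),
    Edge("John", "Updated", "Project 1"),
    Edge("Fred", "IsProjectManagerOf", "Project 1"),
    Edge("Fred", "Updated", "Project 1"),
]


def history_of(target):
    """Who acted on a subject? Every (actor, verb) pair pointing at target."""
    return [(e.out, e.verb) for e in edges if e.into == target]


def related_to(actor):
    """Which elements is a user related to? Distinct targets, first-seen order."""
    seen = []
    for e in edges:
        if e.out == actor and e.into not in seen:
            seen.append(e.into)
    return seen


print(history_of("Project 1"))
print(related_to("Fred"))
```

Both questions reduce to filtering the same list of edges on one endpoint, which is why storing actions as edges keeps this kind of analysis simple.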

However, important data is missing from the structure above: the answer to the question “When?”. An action always happens at a precise point in time, so time becomes an attribute of the verb. This is where adjectives come in, embedded in the structure itself: the time is stored as a property of the record that represents the action. If your system needs to analyse a timeline of events or group actions according to when they happened, I recommend that you consider the Time Series approach proposed in the OrientDB documentation (https://orientdb.com/docs/2.2/Time-series-use-case.html).
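
As a rough illustration of time as an attribute of the verb, the sketch below stores a timestamp on each action and sorts by it to rebuild a timeline. It is plain Python, not OrientDB code; the `Action` class, the property name `at`, and the dates are all made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Action:
    out: str       # the noun that acts
    verb: str      # the action
    into: str      # the noun acted on ("in" is a Python keyword)
    at: datetime   # hypothetical property: when the action happened


# Illustrative data only, deliberately stored out of order.
actions = [
    Action("Fred", "Updated", "Project 1", datetime(2018, 3, 5, 14, 0)),
    Action("John", "Created", "Project 1", datetime(2018, 3, 1, 9, 0)),
    Action("John", "Updated", "Project 1", datetime(2018, 3, 2, 10, 30)),
]


def timeline(target):
    """Actions performed on target, oldest first."""
    return sorted((a for a in actions if a.into == target), key=lambda a: a.at)


for a in timeline("Project 1"):
    print(a.at.isoformat(), a.out, a.verb)
```

Sorting on read is fine for a single record's history. Grouping many actions by period is the case where a dedicated time-series structure is more likely to pay off.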

In QOREX’s case, we were particularly interested in the database edges because most of the questions to be answered relate to user actions and to connections or relationships between elements, e.g., tasks in projects, projects linked to strategic objectives, and people in business targets. The Object-Oriented (OO) nature of OrientDB is great for structuring the data, and the OO paradigm works in our favour: a query against a class also covers its subclasses, so one query on a well-chosen parent class replaces several queries on individual classes. By default, OrientDB comes with two basic classes: E (for edges) and V (for vertices). Organising the classes in a meaningful structure proved essential to database and query performance.

We created an abstract class called UserAction (which extends E through QorexEdge) and, from it, concrete classes such as Created, Updated, and Deleted.

In general terms, the database structure looks like this:

  • E
    • QorexEdge (extends E, abstract)
      • UserAction (extends QorexEdge, abstract, out:Person, in:X_V)
        • Created (extends UserAction, concrete)
        • Updated (extends UserAction, concrete)
        • Deleted (extends UserAction, concrete)
      • QorexAssociation (extends QorexEdge, abstract, out: QorexEntity, in: QorexEntity)
        • StrategicObjectiveToBusinessDriver (extends QorexAssociation, concrete)
        • BusinessDriverToBusinessTarget (extends QorexAssociation, concrete)
        • BusinessTargetToBusinessTarget (extends QorexAssociation, concrete)
        • BusinessTargetToEnabler (extends QorexAssociation, concrete)
  • V
    • QorexVertex (extends V, abstract)
      • QorexEntity (extends QorexVertex, abstract)
        • StrategicObjective (extends QorexEntity, concrete)
        • BusinessDriver (extends QorexEntity, concrete)
        • BusinessTarget (extends QorexEntity, concrete)
        • Enabler (extends QorexEntity, concrete)
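
To make the abstract and concrete distinction tangible, here is a minimal Python model of part of this hierarchy. It is a sketch only: the `Record` base class, the `abstract` flag, and the `concrete_subclasses` helper are assumptions for the example and say nothing about how OrientDB implements classes internally.

```python
class Record:
    """Root of the sketch. A class is abstract only if it sets abstract = True itself."""

    def __init__(self, **fields):
        # vars(type(self)) inspects the class's own attributes only, so the
        # flag is not inherited by concrete subclasses.
        if vars(type(self)).get("abstract", False):
            raise TypeError(f"{type(self).__name__} is abstract")
        self.fields = fields


# Edge side of the hierarchy.
class E(Record): pass
class QorexEdge(E): abstract = True
class UserAction(QorexEdge): abstract = True
class Created(UserAction): pass
class Updated(UserAction): pass
class Deleted(UserAction): pass
class QorexAssociation(QorexEdge): abstract = True
class BusinessTargetToBusinessTarget(QorexAssociation): pass
class BusinessTargetToEnabler(QorexAssociation): pass

# Vertex side of the hierarchy (two of the entities shown).
class V(Record): pass
class QorexVertex(V): abstract = True
class QorexEntity(QorexVertex): abstract = True
class BusinessTarget(QorexEntity): pass
class Enabler(QorexEntity): pass


def concrete_subclasses(cls):
    """Names of every concrete class below cls, at any depth."""
    found = []
    for sub in cls.__subclasses__():
        if not vars(sub).get("abstract", False):
            found.append(sub.__name__)
        found.extend(concrete_subclasses(sub))
    return found


print(concrete_subclasses(UserAction))
print(concrete_subclasses(QorexEdge))
```

The abstract classes never hold records themselves. They exist so that a single name, such as UserAction, can stand for every kind of action underneath it.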

Supposing a user has RID (Record ID) #12:34 and we want to know all the actions that user performed in the system, regardless of the action type, we would query as follows:

SELECT FROM UserAction WHERE out IN(#12:34)

Now, suppose we want to know every record created by the user. We would query as follows:

SQL: SELECT FROM Created WHERE out IN(#12:34)
API: query/records-created-by/12:34

If we want to know every action a user performed on a specific record (here, the record with RID #56:78), we would query as follows:

SQL: SELECT FROM UserAction WHERE out IN(#12:34) AND in IN(#56:78)
API: query/actions-performed-by/12:34

If we want to know the full history of actions performed on that record, we would query as follows:

SQL: SELECT FROM UserAction WHERE in IN(#56:78)
API: query/actions-performed-at/56:78

If we want to know who deleted a record, we would query as follows:

SQL: SELECT FROM Deleted WHERE in IN(#56:78)
API: query/who-deleted/56:78
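
The user-action queries above all follow one pattern: pick a class (abstract parent or concrete child) and filter on one or both endpoints. The Python sketch below mimics that pattern in memory with `isinstance`. It is not OrientDB code; the `select` helper, the field name `into`, and the extra RIDs #12:35 and #56:79 are hypothetical.

```python
class UserAction:
    def __init__(self, out, into):
        self.out = out    # RID of the acting Person
        self.into = into  # RID of the record acted on ("in" is a Python keyword)


class Created(UserAction): pass
class Updated(UserAction): pass
class Deleted(UserAction): pass


# Illustrative edges; #12:35 and #56:79 are made up for the example.
edges = [
    Created("#12:34", "#56:78"),
    Updated("#12:34", "#56:78"),
    Updated("#12:34", "#56:79"),
    Deleted("#12:35", "#56:78"),
]


def select(cls, out=None, into=None):
    """Edges of class cls or any subclass, optionally filtered by endpoints."""
    return [
        e for e in edges
        if isinstance(e, cls)
        and (out is None or e.out == out)
        and (into is None or e.into == into)
    ]


all_by_user = select(UserAction, out="#12:34")
created_by_user = select(Created, out="#12:34")
by_user_on_record = select(UserAction, out="#12:34", into="#56:78")
record_history = select(UserAction, into="#56:78")
who_deleted = [e.out for e in select(Deleted, into="#56:78")]

print(len(all_by_user), len(created_by_user), len(by_user_on_record))
print(len(record_history), who_deleted)
```

Choosing the parent class widens the question to any action, and choosing a child class narrows it to one verb, without changing the shape of the query.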

Business Targets (BT) are connected to other Business Targets, to Business Drivers (BD), and to Enablers (EN).

When we need to return all connections between BT #11:22 and other entities, we query as follows:

SQL: SELECT FROM QorexAssociation WHERE out IN(#11:22) OR in IN(#11:22)
API: query/records-associated-to/11:22

If, instead, we want to know only the other BTs connected to BT #11:22, we query as follows:

SQL: SELECT FROM BusinessTargetToBusinessTarget WHERE out IN(#11:22) OR in IN(#11:22)
API: query/records-associated-to/11:22?link=bttobt
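
Association queries differ from user-action queries in one respect: the record of interest may sit at either end of the edge. A minimal in-memory Python sketch of that two-directional lookup follows. The `neighbours` helper, the field name `into`, and every RID other than #11:22 are assumptions for illustration.

```python
class QorexAssociation:
    def __init__(self, out, into):
        self.out = out    # RID at the start of the edge
        self.into = into  # RID at the end of the edge ("in" is a Python keyword)


class BusinessDriverToBusinessTarget(QorexAssociation): pass
class BusinessTargetToBusinessTarget(QorexAssociation): pass
class BusinessTargetToEnabler(QorexAssociation): pass


# Illustrative links around BT #11:22; all other RIDs are made up.
links = [
    BusinessDriverToBusinessTarget("#20:1", "#11:22"),
    BusinessTargetToBusinessTarget("#11:22", "#11:23"),
    BusinessTargetToBusinessTarget("#11:24", "#11:22"),
    BusinessTargetToEnabler("#11:22", "#30:1"),
    BusinessTargetToBusinessTarget("#11:23", "#11:24"),  # does not touch #11:22
]


def neighbours(rid, cls=QorexAssociation):
    """RIDs at the other end of every cls edge touching rid, in either direction."""
    result = []
    for link in links:
        if not isinstance(link, cls):
            continue
        if link.out == rid:
            result.append(link.into)
        elif link.into == rid:
            result.append(link.out)
    return result


print(neighbours("#11:22"))
print(neighbours("#11:22", BusinessTargetToBusinessTarget))
```

The first call corresponds to querying the abstract QorexAssociation class, the second to querying one concrete association class.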

Knowing who created what in the system is only the starting point. Many other, more interesting insights were built to support organisational analysis. Those insights help companies identify and tackle problems, predict failure or success, and anticipate issues, giving managers time to correct their organisational trajectory. This real-time analysis is only possible because both the database and the system can deal with large volumes of data.
