# nodata
Nodata is a simple binary that consists of four parts:
1. Data ingest
2. Data storage
3. Data aggregation
4. Data API / egress
## Data ingest
Nodata presents a simple protobuf gRPC API for ingesting either single events or batches of events.
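
To illustrate the single-versus-batch distinction, here is a minimal sketch in plain Rust. It is not the generated protobuf code: `PublishEvent`, `IngestRequest` and `into_events` are hypothetical names, and the fields (topic, id, data) are taken from the data flow described in this README.

```rust
/// Hypothetical event shape: the topic, id and data sent on publish.
#[derive(Debug, Clone, PartialEq)]
pub struct PublishEvent {
    pub topic: String,
    pub id: String,
    pub data: Vec<u8>,
}

impl PublishEvent {
    pub fn new(topic: &str, id: &str, data: &[u8]) -> Self {
        Self {
            topic: topic.to_string(),
            id: id.to_string(),
            data: data.to_vec(),
        }
    }
}

/// Hypothetical request type: either one event or a batch of events.
#[derive(Debug, Clone, PartialEq)]
pub enum IngestRequest {
    Single(PublishEvent),
    Batch(Vec<PublishEvent>),
}

impl IngestRequest {
    /// Normalise both variants to a list so later stages handle one shape.
    pub fn into_events(self) -> Vec<PublishEvent> {
        match self {
            IngestRequest::Single(event) => vec![event],
            IngestRequest::Batch(events) => events,
        }
    }
}
```

Normalising early means the storage side only ever sees a list of events, whichever call the client made.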
## Data storage
Nodata stores data locally as Parquet files in a partitioned layout.
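
As a rough sketch of what a partitioned layout can look like, the function below derives a file path from a topic and a timestamp. The Hive-style `key=value` directories, the hourly bucket and the function name `partition_path` are all assumptions made for illustration; this README does not specify the actual partitioning keys.

```rust
use std::path::PathBuf;

/// Hypothetical layout: <root>/topic=<topic>/hour=<hours since epoch>/part-<n>.parquet
pub fn partition_path(root: &str, topic: &str, unix_secs: u64, part: u32) -> PathBuf {
    // Bucket events by the hour in which they arrived (assumed granularity).
    let hour = unix_secs / 3600;
    PathBuf::from(root)
        .join(format!("topic={topic}"))
        .join(format!("hour={hour}"))
        .join(format!("part-{part:05}.parquet"))
}
```

With these assumptions, `partition_path("data", "orders", 7200, 3)` yields `data/topic=orders/hour=2/part-00003.parquet`.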
## Data aggregation
Nodata accepts wasm routines that run aggregations over the data to be processed.
## Data egress
Nodata exposes aggregations as APIs, or as events streamed over gRPC to a service.
# Architecture
## Data flow
Data enters nodata as follows:
1. Application uses SDK to publish data
2. Data is sent over gRPC with a topic, an id and the data itself
3. Data is sent to a topic
4. A broadcast is sent that said topic was updated with a given offset
5. A client can consume from said topic, given a topic and id
// 6. We need a partition in here to separate handling between partitions and consumer groups
6. A running queue consumes each broadcast message and assigns jobs for each consumer group to delegate messages
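
The steps above can be sketched with an in-memory stand-in for the broker. This is illustrative only: `Broker`, `Event`, `TopicUpdated` and the method names are hypothetical, storage is a plain `Vec` rather than Parquet, and the sketch assumes that `id` acts as a key used to filter events on a topic. Partitions and consumer groups are left out.

```rust
use std::collections::HashMap;

/// One published record: the id and data sent alongside a topic name.
#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub id: String,
    pub data: Vec<u8>,
}

/// Broadcast message: a topic was updated with a given offset.
#[derive(Debug, Clone, PartialEq)]
pub struct TopicUpdated {
    pub topic: String,
    pub offset: usize,
}

/// Hypothetical in-memory broker; not nodata's real implementation.
#[derive(Default)]
pub struct Broker {
    topics: HashMap<String, Vec<Event>>,
    // Broadcasts waiting to be picked up by the queue in step 6.
    broadcasts: Vec<TopicUpdated>,
}

impl Broker {
    /// Steps 2-4: append the event to its topic, then record a broadcast.
    pub fn publish(&mut self, topic: &str, id: &str, data: &[u8]) -> usize {
        let log = self.topics.entry(topic.to_string()).or_default();
        log.push(Event { id: id.to_string(), data: data.to_vec() });
        let offset = log.len() - 1;
        self.broadcasts.push(TopicUpdated { topic: topic.to_string(), offset });
        offset
    }

    /// Step 5: read events for `id` on `topic`, starting at `from_offset`.
    pub fn consume(&self, topic: &str, id: &str, from_offset: usize) -> Vec<(usize, &Event)> {
        self.topics
            .get(topic)
            .map(|log| {
                log.iter()
                    .enumerate()
                    .skip(from_offset)
                    .filter(|(_, event)| event.id == id)
                    .collect()
            })
            .unwrap_or_default()
    }

    /// Step 6: hand all pending broadcasts to the queue and clear them.
    pub fn drain_broadcasts(&mut self) -> Vec<TopicUpdated> {
        std::mem::take(&mut self.broadcasts)
    }
}
```

Offsets here are simply positions in the topic's log, which is what lets a consumer resume from where it last stopped.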
## Components
A component is a consumer on a set topic. It acts as a source, a sink or a transformation between topics. It can declare topics, use topics, transform data and much more.
A component at its most basic is a computational unit implementing a certain interface: source, sink or transformation.
The simplest are the source and the sink, which respectively push data to or pull data from topics.
A component implements one or more of the three SDK interfaces. To create a component:
1. Create a new sample Rust application
2. Add the `nodata-component` dependency
3. In the main function, use `nodata_component::component`
4. Implement the interfaces you want to use
5. Build the application into a Docker image, optionally using the nodata CLI to build the app
6. Register the application as a component
7. For example: `nodata client add-component --image docker.io/kjuulh/nodata-example-transform:latest --tranform <input-topic>:<output-topic>`
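
To make the three interfaces more concrete, here is a minimal sketch using only the standard library. It does not use the `nodata-component` crate, whose actual API is not shown in this README: the traits `Source`, `Sink` and `Transformation`, their method names, and the helper `pump` are hypothetical stand-ins.

```rust
use std::collections::VecDeque;

/// Hypothetical interface: produces data to push into a topic.
pub trait Source {
    fn poll(&mut self) -> Option<Vec<u8>>;
}

/// Hypothetical interface: receives data pulled from a topic.
pub trait Sink {
    fn accept(&mut self, data: Vec<u8>);
}

/// Hypothetical interface: maps input-topic data to output-topic data.
pub trait Transformation {
    fn transform(&self, data: &[u8]) -> Vec<u8>;
}

/// Example source backed by a queue of payloads.
pub struct QueueSource(pub VecDeque<Vec<u8>>);

impl Source for QueueSource {
    fn poll(&mut self) -> Option<Vec<u8>> {
        self.0.pop_front()
    }
}

/// Example sink that collects everything it receives.
pub struct CollectSink(pub Vec<Vec<u8>>);

impl Sink for CollectSink {
    fn accept(&mut self, data: Vec<u8>) {
        self.0.push(data);
    }
}

/// Example transformation: uppercase ASCII bytes.
pub struct Uppercase;

impl Transformation for Uppercase {
    fn transform(&self, data: &[u8]) -> Vec<u8> {
        data.to_ascii_uppercase()
    }
}

/// Move every item from source to sink through the transformation.
/// Returns the number of items moved.
pub fn pump(
    source: &mut impl Source,
    transform: &impl Transformation,
    sink: &mut impl Sink,
) -> usize {
    let mut moved = 0;
    while let Some(data) = source.poll() {
        sink.accept(transform.transform(&data));
        moved += 1;
    }
    moved
}
```

In this sketch a transformation holds no topic state of its own, which matches the `<input-topic>:<output-topic>` pairing supplied at registration time.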