In-Cluster Storage

 

For some on-premises deployments, Staple offers in-cluster storage as an alternative to S3. This is implemented by storing documents in a Postgres database, with a thin server layer for our other applications to interact with. This guide describes how to interact with that layer.

What: S3 replacement
Where: inside the Kubernetes cluster
Why: quickly deploy on-prem without worrying about the underlying infrastructure
When: only when absolutely necessary, because S3 is way better
Who: everything that currently talks to S3

 

Action Item: Add support for your applications to use the in-cluster storage instead of S3.

 

Testing

Contact your cluster admin for the test server.


 

API Reference

The storage API exposes the following endpoints:

Endpoint          Methods     Formdata
/storage_health   GET         -
/create_folder    GET, POST   path: location of the folder to create
                              force: force-create the folder chain;
                                     "true" or "false"; if absent, read as false
/list_contents    GET, POST   path: location to list the contents of
/upload_file      POST        path: location to upload the file to
                              file: the file to upload
                              force: force-create the folder chain and replace
                                     the file if it exists; "true" or "false";
                                     if absent, read as false
/download_file    GET, POST   path: location of the file to download
/delete           GET, POST   type: either "file" or "folder"
                              path: location of the object to delete
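
As a quick sketch of what calling these looks like, here's an example using Python's requests library. The base URL and paths are placeholders, not real values; use whatever test server your cluster admin gives you:

    import requests

    BASE = "http://myserver.com"  # placeholder: your test server URL

    # Health check
    print(requests.get(BASE + "/storage_health").text)

    # Create a folder, force-creating the whole chain
    resp = requests.post(BASE + "/create_folder",
                         data={"path": "/reports/2021", "force": "true"})
    resp.raise_for_status()

    # List that folder's contents
    print(requests.post(BASE + "/list_contents", data={"path": "/reports"}).text)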

Common errors:

 

Postman Collection

I've also created a Postman collection demoing the above endpoints:

https://www.getpostman.com/collections/0fe1211c979c4dc4b88d

You might find this a bit clearer than just documentation.

Please be careful when testing /delete. Deleting a folder is recursive, and once a folder is deleted, no one can view or add anything at that path until it's manually recreated. Since you've all got access to the same Postman collection, that may get a bit tricky.

 


 

DevOps: How to Implement

Before you start wiring in the storage API, some important points about how to include it:

  1. Use a switch to incorporate this new storage mechanism into your existing code
  2. Control which storage mechanism is used with an environment variable

Please please please do not create a new branch for this. I will be using the same Docker image for cloud and on-premises deployments. Only the configuration data (environment variables) should change.

 

Environment Variables

You'll be provided with two new environment variables: STORAGE_MECHANISM, which controls which method to use, and STORAGE_URL, which provides the URL when we're using internal storage.

STORAGE_MECHANISM is an enum whose current possible values are s3 and internal.

When the deployment is meant to use S3 for storage, your environment variables will look like this:
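
Something like the following, where the S3 credential variable names are illustrative (see the note below):

    STORAGE_MECHANISM=s3
    STORAGE_URL=
    AWS_ACCESS_KEY_ID=<your access key>
    AWS_SECRET_ACCESS_KEY=<your secret key>
    S3_BUCKET=<your bucket>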

 

When the deployment is meant to use internal storage, they will be:
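
Again with illustrative S3 variable names, now left empty:

    STORAGE_MECHANISM=internal
    STORAGE_URL=http://myserver.com
    AWS_ACCESS_KEY_ID=
    AWS_SECRET_ACCESS_KEY=
    S3_BUCKET=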

The S3 credential environment variables will be named whatever they're currently named in your codebase. The above is just an example.

 

Switch

Use these variables to program a switch, so that your code will use a different upload method depending on STORAGE_MECHANISM. An example:
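
Here's a sketch in Python; treat it as pseudocode rather than a drop-in implementation. requests stands in for the internal HTTP client, and upload_to_s3 is a placeholder for whatever your existing S3 upload code does:

    import os
    import requests

    def upload_file(local_path, remote_path):
        """Upload a file using whichever storage mechanism is configured."""
        mechanism = os.environ.get("STORAGE_MECHANISM", "s3")

        if mechanism == "s3":
            # Existing S3 path -- unchanged from your current codebase.
            upload_to_s3(local_path, remote_path)
        elif mechanism == "internal":
            # In-cluster storage: POST the file to the /upload_file endpoint.
            with open(local_path, "rb") as f:
                resp = requests.post(
                    os.environ["STORAGE_URL"] + "/upload_file",
                    data={"path": remote_path, "force": "true"},
                    files={"file": f},
                )
            resp.raise_for_status()
        else:
            raise ValueError("Unknown STORAGE_MECHANISM: " + mechanism)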

 


 

Completion Checks

  1. Add the internal storage mechanism to all places where S3 is currently used.

  2. Make the switching logic depend only on environment variables.

  3. Build a single Docker image from this.

  4. Verify this image works under both configurations:

    1. STORAGE_MECHANISM=s3 and the S3 variables are set, but STORAGE_URL is empty.
    2. STORAGE_MECHANISM=internal and STORAGE_URL=http://myserver.com, but the S3 variables are empty.
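
A quick way to smoke-test both configurations with docker run; the image name and S3 variable names are placeholders:

    # Configuration 1: S3
    docker run -e STORAGE_MECHANISM=s3 \
               -e AWS_ACCESS_KEY_ID=<your access key> \
               -e AWS_SECRET_ACCESS_KEY=<your secret key> \
               -e STORAGE_URL= \
               my-app-image

    # Configuration 2: internal storage
    docker run -e STORAGE_MECHANISM=internal \
               -e STORAGE_URL=http://myserver.com \
               my-app-image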