In-Cluster Storage

 

For some on-premises deployments, Staple offers in-cluster storage as an alternative to S3. This is implemented by storing documents in a Postgres database, with a thin server layer for our other applications to interact with. This guide describes how to interact with that layer.

What: S3 replacement
Where: inside the Kubernetes cluster
Why: quickly deploy on-prem without worrying about the underlying infrastructure
When: only when absolutely necessary, because S3 is way better
Who: everything that currently talks to S3

 

Action Item: Add support for your applications to use the in-cluster storage instead of S3.

 

Testing

Contact your cluster admin for the test server.


 

API Reference

The storage API exposes the following endpoints:

Endpoint          Methods     Formdata
/storage_health   GET         -
/create_folder    GET, POST   path: location of the folder to create
                              force: force-create the folder chain;
                                     "true" or "false"; if absent, read as false
/list_contents    GET, POST   path: location to list the contents of
/upload_file      POST        path: location to upload the file to
                              file: the file to upload
                              force: force-create the folder chain and replace
                                     the file if it exists; "true" or "false";
                                     if absent, read as false
/download_file    GET, POST   path: location of the file to download
/delete           GET, POST   type: either "file" or "folder"
                              path: location of the object to delete
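
As a quick sketch of what calling these looks like, here's an example using Python's requests library. The base URL and paths are placeholders, not real values; use whatever test server your cluster admin gives you:

    import requests

    BASE = "http://myserver.com"  # placeholder: your test server URL

    # Health check
    print(requests.get(BASE + "/storage_health").text)

    # Create a folder, force-creating the whole chain
    resp = requests.post(BASE + "/create_folder",
                         data={"path": "/reports/2021", "force": "true"})
    resp.raise_for_status()

    # List that folder's contents
    print(requests.post(BASE + "/list_contents", data={"path": "/reports"}).text)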

Common errors:

 

Postman Collection

I've also created a Postman collection demoing the above endpoints:

https://www.getpostman.com/collections/0fe1211c979c4dc4b88d

You might find this a bit clearer than just documentation.

Please be careful when testing /delete. Deleting a folder is recursive, and once a folder is deleted, no one can view or add anything at that path until it's manually recreated. Since you've all got access to the same Postman collection, that may get a bit tricky.

 


 

DevOps: How to Implement

Before you start wiring in the storage API, some important points about how to include it:

  1. Use a switch to incorporate this new storage mechanism into your existing code
  2. Control which storage mechanism is used with an environment variable

Please please please do not create a new branch for this. I will be using the same Docker image for cloud and on-premises deployments. Only the configuration data (environment variables) should change.

 

Environment Variables

You'll be provided with two new environment variables: STORAGE_MECHANISM, which controls which method to use, and STORAGE_URL, which provides the URL when we're using internal storage.

STORAGE_MECHANISM is an enum whose current possible values are s3 and internal.

When the deployment is meant to use S3 for storage, your environment variables will look like this:
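
Something like the following, where the S3 credential variable names are illustrative (see the note below):

    STORAGE_MECHANISM=s3
    STORAGE_URL=
    AWS_ACCESS_KEY_ID=<your access key>
    AWS_SECRET_ACCESS_KEY=<your secret key>
    S3_BUCKET=<your bucket>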

 

When the deployment is meant to use internal storage, they will be:
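
Again with illustrative S3 variable names, now left empty:

    STORAGE_MECHANISM=internal
    STORAGE_URL=http://myserver.com
    AWS_ACCESS_KEY_ID=
    AWS_SECRET_ACCESS_KEY=
    S3_BUCKET=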

The S3 credential environment variables will be named whatever they're currently named in your codebase. The above is just an example.

 

Switch

Use these variables to program a switch, so that your code will use a different upload method depending on STORAGE_MECHANISM. An example:
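
Here's a sketch in Python; treat it as pseudocode rather than a drop-in implementation. requests stands in for the internal HTTP client, and upload_to_s3 is a placeholder for whatever your existing S3 upload code does:

    import os
    import requests

    def upload_file(local_path, remote_path):
        """Upload a file using whichever storage mechanism is configured."""
        mechanism = os.environ.get("STORAGE_MECHANISM", "s3")

        if mechanism == "s3":
            # Existing S3 path -- unchanged from your current codebase.
            upload_to_s3(local_path, remote_path)
        elif mechanism == "internal":
            # In-cluster storage: POST the file to the /upload_file endpoint.
            with open(local_path, "rb") as f:
                resp = requests.post(
                    os.environ["STORAGE_URL"] + "/upload_file",
                    data={"path": remote_path, "force": "true"},
                    files={"file": f},
                )
            resp.raise_for_status()
        else:
            raise ValueError("Unknown STORAGE_MECHANISM: " + mechanism)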

 


 

Completion Checks

  1. Add the internal storage mechanism to all places where S3 is currently used.

  2. Make the switching logic depend only on environment variables.

  3. Build a single Docker image from this.

  4. Verify this image works under both configurations:

    1. STORAGE_MECHANISM=s3 and the S3 variables are set, but STORAGE_URL is empty.
    2. STORAGE_MECHANISM=internal and STORAGE_URL=http://myserver.com, but the S3 variables are empty.
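
A quick way to smoke-test both configurations with docker run; the image name and S3 variable names are placeholders:

    # Configuration 1: S3
    docker run -e STORAGE_MECHANISM=s3 \
               -e AWS_ACCESS_KEY_ID=<your access key> \
               -e AWS_SECRET_ACCESS_KEY=<your secret key> \
               -e STORAGE_URL= \
               my-app-image

    # Configuration 2: internal storage
    docker run -e STORAGE_MECHANISM=internal \
               -e STORAGE_URL=http://myserver.com \
               my-app-image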