Build a Highly Performant File Server in Kubernetes with SeaweedFS
SeaweedFS stands out as a highly scalable and efficient distributed file store that integrates seamlessly with Kubernetes via Helm. Capable of handling billions of files, SeaweedFS retrieves any file with O(1) disk seeks, meaning lookup cost stays constant no matter how many files are stored.
To explore the basics of SeaweedFS, we'll build a simple file server that writes files to SeaweedFS volumes and stores related metadata in a separate Postgres database. The metadata gives us a fast way to list uploaded files and locate them in SeaweedFS.
The full project is available on GitHub.
We're going to build a simple file server that accepts image files as uploads, stores them in SeaweedFS, and writes related metadata to a Postgres database. The file server will also allow files to be downloaded by clicking a hyperlinked filename in the browser.
Set up a Kubernetes Cluster
To deploy SeaweedFS in Kubernetes, we'll use the official Helm Chart, which will require that we first install Helm. To install on macOS, you can run:
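On macOS, Helm can be installed with Homebrew:

```shell
brew install helm
```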
Then, with Helm installed, we can deploy the SeaweedFS chart by running the following:
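A typical install looks like the following; the repo URL and release name follow the SeaweedFS Helm instructions, but check the chart's documentation for the current values:

```shell
# Add the official SeaweedFS chart repository and deploy it.
helm repo add seaweedfs https://seaweedfs.github.io/seaweedfs/helm
helm repo update
helm install seaweedfs seaweedfs/seaweedfs
```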
NOTE: the example repo includes a Helm values.yaml file that allows you to deploy SeaweedFS on an M1 Mac. To use it, add `-f <path-to-the-example-values.yaml-file>` to the helm install command above.
The Web-API service will handle all queries to SeaweedFS and Postgres.
First, we'll need to define a database table to store file metadata in Postgres, as described above. For this, we'll use Gorm, a popular SQL ORM for Go.
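A sketch of the database setup follows; the DSN values and the package-level `db` variable are assumptions you should adapt to your own Postgres deployment:

```go
package main

import (
	"gorm.io/driver/postgres"
	"gorm.io/gorm"
)

var db *gorm.DB

// FileRecord is the table that tracks each uploaded file's SeaweedFS
// fid and its original filename.
type FileRecord struct {
	gorm.Model
	Fid      string
	Filename string
}

// InitDB creates the database client and migrates the FileRecord table.
func InitDB() error {
	// Assumed in-cluster connection settings; adjust host and credentials.
	dsn := "host=postgres user=postgres password=postgres dbname=files port=5432 sslmode=disable"
	var err error
	db, err = gorm.Open(postgres.Open(dsn), &gorm.Config{})
	if err != nil {
		return err
	}
	return db.AutoMigrate(&FileRecord{})
}
```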
The database table is defined as a Go struct called FileRecord. To create the table, we pass a pointer to an instance of this struct to Gorm's db.AutoMigrate() method inside an InitDB() function, which also creates the database client; we'll call InitDB() from our main() function in main.go.
Next, we have several structs to create. MasterResponse, Location, and Volume, are for parsing JSON responses from various components in the SeaweedFS system, and FileRecord — as described above — is the struct we'll use to write file metadata to Postgres.
Let's begin by defining our main() function and our first Gin endpoint — “/api/upload,” which will accept a POST request with a file as form data.
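A sketch of the handler follows; it assumes the InitDB(), getSeaweedfsFidAndUrl(), and uploadFileToSeaweedfs() helpers defined elsewhere in main.go, and the master's in-cluster service name is an assumption:

```go
package main

import (
	"net/http"

	"github.com/gin-gonic/gin"
)

const seaweedfsMasterURL = "http://seaweedfs-master:9333" // assumed service name

func main() {
	InitDB() // connect to Postgres and migrate the FileRecord table

	router := gin.Default()
	router.POST("/api/upload", func(c *gin.Context) {
		// Parse the uploaded file from the multipart form.
		fileHeader, err := c.FormFile("file")
		if err != nil {
			c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
			return
		}
		file, err := fileHeader.Open()
		if err != nil {
			c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
			return
		}
		defer file.Close()

		// Ask the master for a fid and a volume URL, then upload the file there.
		fid, volumeURL, err := getSeaweedfsFidAndUrl(seaweedfsMasterURL)
		if err != nil {
			c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
			return
		}
		if err := uploadFileToSeaweedfs(volumeURL, fid, fileHeader.Filename, file); err != nil {
			c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
			return
		}

		// Record the fid and filename in Postgres so the file can be listed later.
		db.Create(&FileRecord{Fid: fid, Filename: fileHeader.Filename})
		c.JSON(http.StatusOK, gin.H{"fid": fid, "filename": fileHeader.Filename})
	})
	router.Run(":8080")
}
```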
After importing the required packages and defining the SeaweedFS master URL, we define the Gin router and instantiate our database connection by calling InitDB(), which we defined above.
Then, we parse the incoming file and call two functions in sequence. First, getSeaweedfsFidAndUrl() calls the SeaweedFS master to get an available file id (fid) and the URL of the SeaweedFS volume in which the file will be stored. Then, we call uploadFileToSeaweedfs() with the fid and URL returned from the master, along with the file to be uploaded.
Finally, after the upload has succeeded, we create a FileRecord containing the file's location in SeaweedFS and its filename, which we then write to Postgres. This will allow us to easily track the files that have been uploaded, and the locations in SeaweedFS to which they've been written.
Next, still in our main.go file, let's define the two functions called in the Gin router handler we just defined. First, we'll define getSeaweedfsFidAndUrl():
And then, we'll need to define the uploadFileToSeaweedfs() function in the same file, like so:
With that, the file server's upload functionality is complete. Next, let's define the download functionality by adding the following Gin route handler to our main() function that will handle GET requests to "/api/download/:fid".
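A sketch of the handler, to be added inside main() alongside the upload route; it relies on the two helper functions defined next:

```go
	router.GET("/api/download/:fid", func(c *gin.Context) {
		fid := c.Param("fid")
		// Ask the master which volume server holds this file.
		location, err := getSeaweedfsFileLocation(seaweedfsMasterURL, fid)
		if err != nil {
			c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
			return
		}
		data, err := downloadSeaweedfsFile(location, fid)
		if err != nil {
			c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
			return
		}
		// Note: the filename is hardcoded here; we'll fix this later.
		c.Header("Content-Disposition", `attachment; filename="image.png"`)
		c.Data(http.StatusOK, "application/octet-stream", data)
	})
```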
Here again, we'll need to define the functions called in the route handler — getSeaweedfsFileLocation() and downloadSeaweedfsFile(). Again, these functions will also be defined in main.go.
And now, the Web-API service's download functionality is also ready.
Next, let's define the React frontend that will include a form to upload files and list all files uploaded as hyperlinked file names in the browser.
The frontend will consist of a single React component called FileForm, which will include a simple form for uploading files and will fetch all file metadata from Postgres on load, which will be displayed as an ordered list of filenames as download links.
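A sketch of the component follows; the listing endpoint (`/api/files`) and the metadata field names (`id`, `fid`, `filename`) are assumptions that must match whatever the Web-API exposes:

```javascript
import React, { useEffect, useState } from "react";

function FileForm() {
  const [files, setFiles] = useState([]);
  const [selected, setSelected] = useState(null);

  // Fetch all file metadata (written to Postgres by the Web-API) on load.
  useEffect(() => {
    fetch("/api/files")
      .then((res) => res.json())
      .then(setFiles)
      .catch(console.error);
  }, []);

  const handleSubmit = async (e) => {
    e.preventDefault();
    if (!selected) return;
    const formData = new FormData();
    formData.append("file", selected);
    await fetch("/api/upload", { method: "POST", body: formData });
    window.location.reload();
  };

  return (
    <div>
      <form onSubmit={handleSubmit}>
        <input type="file" onChange={(e) => setSelected(e.target.files[0])} />
        <button type="submit">Upload</button>
      </form>
      <ol>
        {files.map((f) => (
          <li key={f.id}>
            <a href={`/api/download/${f.fid}`}>{f.filename}</a>
          </li>
        ))}
      </ol>
    </div>
  );
}

export default FileForm;
```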
And, finally, we'll have to include our new component FileForm in our App.js file, like so:
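A minimal App.js, assuming a standard create-react-app layout:

```javascript
import FileForm from "./FileForm";

function App() {
  return (
    <div className="App">
      <FileForm />
    </div>
  );
}

export default App;
```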
Before we can deploy the frontend and Web-API services to Kubernetes, we'll need to create Docker images for both services and push those images to a registry.
Frontend Dockerfile
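A two-stage Dockerfile sketch: the first stage builds the React bundle, and the second serves it with NGINX. The Node and NGINX image tags are assumptions:

```dockerfile
# Build the React app.
FROM node:18-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build

# Serve the static bundle with NGINX.
FROM nginx:alpine
COPY --from=build /app/build /usr/share/nginx/html
COPY nginx.conf /etc/nginx/conf.d/default.conf
EXPOSE 80
```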
Because we're serving the React app with NGINX, we'll also need to create a nginx.conf file, which we COPY into the image defined above:
nginx.conf
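A minimal configuration sketch that serves the build output and falls back to index.html for client-side routes:

```nginx
server {
    listen 80;

    location / {
        root /usr/share/nginx/html;
        index index.html;
        try_files $uri $uri/ /index.html;
    }
}
```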
To build the image defined above with Minikube, from the /frontend directory, we can now run:
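The image name is illustrative; adjust it to match your manifests:

```shell
minikube image build -t frontend:latest .
```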
Similarly, to build the Web-API image defined above in our Minikube cluster, from the /web-api directory, we can run:
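Again, the image name is illustrative:

```shell
minikube image build -t web-api:latest .
```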
Next, we will need to create the Kubernetes manifests that will run the above container images, and allow them to be networked together. All required resource definitions are available in GitHub, so we'll just look at one example here, and walk through what each of the different pieces are doing.
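The sketch below shows the four resource types for the Web-API service (plus the Postgres PersistentVolumeClaim); names, ports, and image tags are illustrative and may differ from the repo:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-api
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web-api
  template:
    metadata:
      labels:
        app: web-api
    spec:
      containers:
        - name: web-api
          image: web-api:latest
          imagePullPolicy: Never   # use the image built into Minikube
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: web-api
spec:
  type: ClusterIP
  selector:
    app: web-api
  ports:
    - port: 8080
      targetPort: 8080
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-api
spec:
  rules:
    - http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: web-api
                port:
                  number: 8080
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```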
We have four Kubernetes resource types defined above: a Deployment, a Service, an Ingress, and, in the case of our Postgres deployment, a PersistentVolumeClaim, which allows data to persist even if the Pod running Postgres goes down.
The Deployment is responsible for pulling and running the Docker image we defined above; it can create any number of Kubernetes Pods, each running a copy of our service. The Service, of type ClusterIP, exposes those Pods to other services inside the cluster and load-balances traffic among them.
And finally, the Kubernetes Ingress allows external HTTP traffic to reach the endpoints defined in the spec.rules section of the manifest. Above, we have two ingress rules: one exposes the React frontend on port 80, and a second routes traffic from the browser, where the frontend is running, to the Web-API service so it can read and write from SeaweedFS and Postgres.
With the Web-API service written as above, the upload and download functionality works, but we have a bug! Every file that is downloaded is called “image.png,” as the filename is hardcoded. Let's update that so that files are given the same name they have at upload.
Traditionally, when you have one or more services running in Kubernetes, you would have to repeat most of the above deployment steps to update the container images running in your Deployments. With Velocity, you can instead start a remote development and debugging session from your IDE and update your code as if it were running locally. Velocity automatically syncs your code to the remote cluster, so you can see your changes almost immediately.
To use Velocity, we'll install the VSCode plugin by clicking on the “Extensions” icon in the IDE, searching for Velocity, and clicking “Install.”
Once Velocity is installed, we’ll need to click “Login to Velocity” to login with a Google or GitHub account, and then click the Velocity icon in the leftmost menu in our IDE.
Next, check to make sure that the auto-populated fields are correct — by default, Velocity selects your default Kubernetes environment, but it can work with any Kubernetes environment defined in your Kubeconfig, which can be selected with the “Kubernetes Context” dropdown menu.
Click “Next,” and then in the following view, click “Create.”
When you do, you will see the following as Velocity spins up the required resources in your Kubernetes cluster, and then builds and pushes your local code according to the specifications defined in your selected local Dockerfile. When this process is complete, you'll see your local code running in the cluster, and every time you make a change locally, that code change will be reflected in the remote cluster.
With the Velocity development session running, let's add a function to our main.go file that will get the correct filename from the metadata we're storing in Postgres.
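A sketch of the lookup; the function name and the FileRecord fields are assumptions to adapt to your model:

```go
// getFilenameForFid looks up the original filename recorded in Postgres
// for a given SeaweedFS fid.
func getFilenameForFid(fid string) (string, error) {
	var record FileRecord
	if err := db.Where("fid = ?", fid).First(&record).Error; err != nil {
		return "", err
	}
	return record.Filename, nil
}
```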
And then, update the main() function in your main.go file as follows:
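A sketch of the updated download handler; it assumes a helper such as getFilenameForFid(fid) (a hypothetical name) that reads the filename from Postgres:

```go
	router.GET("/api/download/:fid", func(c *gin.Context) {
		fid := c.Param("fid")
		location, err := getSeaweedfsFileLocation(seaweedfsMasterURL, fid)
		if err != nil {
			c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
			return
		}
		data, err := downloadSeaweedfsFile(location, fid)
		if err != nil {
			c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
			return
		}
		// Use the filename recorded at upload instead of a hardcoded one.
		filename, err := getFilenameForFid(fid)
		if err != nil {
			filename = "image.png" // fall back to the old default
		}
		c.Header("Content-Disposition", fmt.Sprintf(`attachment; filename=%q`, filename))
		c.Data(http.StatusOK, "application/octet-stream", data)
	})
```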
Now, when you download any existing or new files, they will be named as they were uploaded!
SeaweedFS is a super powerful file store that can easily be spun up in Kubernetes. It can store billions of files, which it can serve extremely efficiently. Above, we walked through the basics of getting SeaweedFS up and running in Kubernetes by building a simple file server that stores files in SeaweedFS volumes and related metadata in a separate Postgres instance.
Then, after the application was up and running in a remote Kubernetes cluster, we saw how Velocity can streamline the development and debugging of applications in complex Kubernetes environments.