ShaoLin Cogo File System 1.0 Technical White Paper

ShaoLin Microsystems Ltd. All rights reserved.


Table of Contents
Introduction to Cogo File System
Features and benefits of CogoFS
High Performance
Higher I/O Performance
Higher Network Capacity
Higher Capacity
Stability
High Level of Data Integrity
Intelligent Compression
Transparency
CPU utilization
Current limitations
Summary
List of Figures
1. CogoFS Block Diagram

Introduction to Cogo File System

ShaoLin Cogo File System (CogoFS) is system software that provides high-performance, scalable, high-capacity local and network storage. It provides transparent random read/write access to compressed files located on both local and network-based file systems. CogoFS compresses individual files stored on mounted Linux file systems. Compressed files are stored with a different file extension (.cogo), so CogoFS can mix compressed and uncompressed files in a single directory. Mounting CogoFS over a network-based file system not only reduces disk space consumption but also increases network performance, because data is read and written in compressed form, which reduces network traffic.
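One common way to offer random access to compressed data is to compress a file in fixed-size blocks and keep an index of where each compressed block starts. The sketch below illustrates that general technique only. The paper does not describe the CogoFS on-disk format, so the block size, the index layout, the function names, and the use of zlib in place of Kypertec are all assumptions made for illustration.

```python
import zlib

BLOCK_SIZE = 4096  # hypothetical block size; the paper does not state one


def compress_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Compress data block by block and return (payload, index).

    index[i] is the byte offset of compressed block i inside payload;
    a final entry marks the end of the payload.
    """
    payload = bytearray()
    index = []
    for start in range(0, len(data), block_size):
        index.append(len(payload))
        payload += zlib.compress(data[start:start + block_size])
    index.append(len(payload))
    return bytes(payload), index


def read_range(payload: bytes, index, offset: int, length: int,
               block_size: int = BLOCK_SIZE) -> bytes:
    """Return up to `length` bytes starting at uncompressed `offset`,
    decompressing only the blocks that overlap the request."""
    if length <= 0 or len(index) < 2:
        return b""
    first = offset // block_size
    last = min((offset + length - 1) // block_size, len(index) - 2)
    out = bytearray()
    for i in range(first, last + 1):
        out += zlib.decompress(payload[index[i]:index[i + 1]])
    skip = offset - first * block_size
    return bytes(out[skip:skip + length])
```

With this layout, a read in the middle of a large file touches only the few blocks that cover the requested range, which is what makes random access to a compressed file practical.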

CogoFS uses the Kypertec compression engine, a kernel compressor and decompressor developed by ShaoLin Microsystems. The Kypertec engine was designed from scratch for high-performance compression and decompression. CogoFS is cross-platform system software that works on i386-based PCs as well as on embedded Linux architectures (e.g. StrongARM, XScale). CogoFS uses the VFS layer to access file system drivers; this stackable design makes CogoFS independent of the physical storage media and of the underlying file system type.

Figure 1. CogoFS Block Diagram

CogoFS is not a simple compression utility; it behaves as a real file system inside the kernel. It has its own page cache and dcache and exhibits all the other behavior of a real file system. This means CogoFS can use the Linux VFS page cache to serve cached data without compressing or decompressing it again.

CogoFS is a resident file system driver in Linux kernel space. It therefore works like any other file system driver and is highly transparent to user-level applications. The benefit of running CogoFS in kernel space is that it generates very little overhead. In addition, the Kypertec compression engine runs nearly 3 to 4 times faster than common compression tools such as gzip and zip, even though they use the same Huffman encoding compression algorithm.

As shown in the diagram above, CogoFS calls the Linux VFS API and so uses the standard VFS interface exposed by other file system drivers. This allows CogoFS to work with other file systems directly in the kernel, without extra overhead or additional round trips.
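The stacking idea can be illustrated in user space with a small wrapper that exposes the same read/write interface as the layer beneath it. This is a conceptual sketch, not kernel code: DictFS and StackedCompressFS are hypothetical names, zlib stands in for Kypertec, and appending ".cogo" to the full file name is an assumption, since the paper only says that compressed files carry a different extension.

```python
import zlib

COMPRESSED_SUFFIX = ".cogo"  # extension named in the paper


class DictFS:
    """Hypothetical stand-in for a lower file system (e.g. ext3 or NFS)."""

    def __init__(self):
        self.files = {}

    def write(self, name: str, data: bytes) -> None:
        self.files[name] = data

    def read(self, name: str) -> bytes:
        return self.files[name]

    def exists(self, name: str) -> bool:
        return name in self.files


class StackedCompressFS:
    """Exposes the same read/write interface as the layer below it."""

    def __init__(self, lower):
        self.lower = lower

    def write(self, name: str, data: bytes) -> None:
        # Store compressed bytes under the suffixed name in the lower layer.
        self.lower.write(name + COMPRESSED_SUFFIX, zlib.compress(data))

    def read(self, name: str) -> bytes:
        stored = name + COMPRESSED_SUFFIX
        if self.lower.exists(stored):
            return zlib.decompress(self.lower.read(stored))
        # No compressed copy: pass the request through unchanged.
        return self.lower.read(name)
```

Because the wrapper only calls the lower layer's generic interface, it never needs to know how or where the lower layer stores its data, and files that were never compressed remain readable through the same path.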


Features and benefits of CogoFS

High Performance

Because Kypertec runs in kernel space, it is much more efficient and delivers higher performance than user-space compression utilities. With optimization, the compression engine typically runs 3 to 9 times faster than user-space compression utilities. Kypertec has its own cache management, so it allocates memory without stressing the Linux VM. This design also guarantees that Kypertec has enough memory to operate in low-memory situations, which is important for a file system. On SMP systems, multiple copies of Kypertec can execute in parallel to use the processing power of multiple CPUs. Kypertec has also been tested with Intel Hyper-Threading technology and showed a performance increase.
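The parallel execution described above relies on the fact that independent pieces of data can be compressed without coordination. The following user-space sketch shows that pattern with a thread pool; zlib stands in for Kypertec, and the function name and worker count are illustrative assumptions rather than anything taken from CogoFS.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_parallel(chunks, workers: int = 4):
    """Compress independent chunks concurrently, preserving their order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order even if tasks finish
        # out of order.
        return list(pool.map(zlib.compress, chunks))
```

Each chunk is compressed on its own, so the result of one task never depends on another and the work spreads naturally across available CPUs.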


Higher I/O Performance

With today's powerful processors, compression and decompression on the CPU are extremely fast. CogoFS multiplies effective disk I/O throughput by sending less data through the disk I/O bottleneck. As CPU speed continues to improve faster than disk I/O speed, the benefit CogoFS brings to disk I/O will continue to grow.


Higher Network Capacity

CogoFS can double your network capacity for heavily loaded NFS traffic without upgrading your network hardware or NFS server. Network traffic can be reduced to less than half of its original volume, and storage capacity increases over your original setup at the same time. CogoFS is a low-cost solution for extending the life of network file servers and relieving saturated networks.
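The capacity claim follows from simple arithmetic: if only compressed bytes cross the wire, the rate of useful (uncompressed) data is the link rate divided by the compressed fraction. The helper below is a back-of-the-envelope sketch. Its name and parameters are illustrative, and it deliberately ignores protocol overhead and the CPU time spent compressing.

```python
def effective_capacity(link_mbps: float, compressed_fraction: float) -> float:
    """Uncompressed payload rate a link can carry when only compressed
    bytes cross the wire.

    compressed_fraction = compressed size / original size, in (0, 1].
    """
    if not 0 < compressed_fraction <= 1:
        raise ValueError("compressed_fraction must be in (0, 1]")
    return link_mbps / compressed_fraction


# Traffic that compresses to half its size doubles the useful rate.
print(effective_capacity(100, 0.5))  # 200.0
```

Under these assumptions, any workload that compresses to less than half its original size more than doubles the useful capacity of the same link.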


Higher Capacity

Kypertec uses the dynamic Huffman encoding algorithm, which achieves compression ratios as high as those of gzip and zip. Plain text, the most common storage format for email, web content, XML and other data, can reach a compression ratio as high as 5 to 1. With CogoFS, your storage capacity can easily double or more without any hardware upgrade, together with higher performance.
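To show why Huffman coding shrinks text, the sketch below builds a static Huffman code from byte frequencies and reports how many bits the encoded data would need. It is a textbook illustration, not the Kypertec implementation: the paper refers to dynamic Huffman encoding, whereas this example uses a single static code, and the function names are hypothetical.

```python
import heapq
from collections import Counter


def huffman_code_lengths(data: bytes) -> dict:
    """Return {symbol: code length in bits} for a static Huffman code."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): 1}
    # Heap entries: (weight, tie_breaker, {symbol: depth in subtree}).
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol one level deeper.
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def encoded_bits(data: bytes) -> int:
    """Total payload size in bits, excluding the code table itself."""
    lengths = huffman_code_lengths(data)
    freq = Counter(data)
    return sum(freq[s] * lengths[s] for s in freq)


print(encoded_bits(b"aaaabbc"))  # 10 bits, versus 56 bits uncompressed
```

Frequent symbols receive short codes and rare symbols long ones, so text with a skewed letter distribution needs far fewer than 8 bits per character.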


Stability

CogoFS has been used in 24x7 data center environments for years without causing data corruption. With its stackable design, the risk of losing data is the same as or similar to that of block-managed file systems (e.g. ext3, xfs, reiserfs). CogoFS works in the VFS layer without touching any block-level operations, so it does not interfere with the journaling features of the lower-level file system. This ensures the same level of reliability as the actual block-level file system.


High Level of Data Integrity

Unlike block compression file systems, which compress the whole volume, CogoFS keeps the chance of file system corruption minimal. Its stackable design limits the maximum extent of data corruption to a single file instead of the whole volume. Metadata in CogoFS is not compressed, which reduces the chance of failures caused by misalignment introduced by compression.


Intelligent Compression

CogoFS is equipped with an intelligent compression back-off algorithm. Designed from scratch for performance, CogoFS backs off when it encounters incompressible data and leaves that data uncompressed to save CPU time. In this way CogoFS handles time-consuming incompressible data intelligently.
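A back-off policy of this kind can be sketched as follows: compress a small sample first, and skip the full compression pass when the sample barely shrinks. The paper does not describe the actual CogoFS heuristic, so the sample size, the savings threshold, the function name, and the use of zlib are all assumptions made for illustration.

```python
import os
import zlib

SAMPLE_SIZE = 4096  # hypothetical: bytes probed before committing
MIN_SAVING = 0.10   # hypothetical: require at least a 10% reduction


def store_with_backoff(data: bytes):
    """Return (is_compressed, stored_bytes)."""
    sample = data[:SAMPLE_SIZE]
    if sample and len(zlib.compress(sample)) > len(sample) * (1 - MIN_SAVING):
        # The sample barely shrinks: skip the full compression pass.
        return False, data
    packed = zlib.compress(data)
    if len(packed) >= len(data):
        # Compression did not help overall: keep the original bytes.
        return False, data
    return True, packed


if __name__ == "__main__":
    compressed, _ = store_with_backoff(os.urandom(8192))
    print("random data stored compressed:", compressed)
```

Already-compressed or random data fails the sample check quickly, so the CPU cost of a full compression pass is paid only for data that is likely to benefit.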


Transparency

CogoFS can be used with applications transparently, without the need to modify or recompile existing applications. CogoFS looks like a real file system with persistent Unix inode numbers, symbolic links, hard links and file permissions.


CPU utilization

Higher CPU utilization means that you use system resources more efficiently. CogoFS trades CPU power for resources that are in greater demand: storage capacity and network bandwidth. With this trade, you immediately increase your return on investment.


Current limitations

CogoFS currently supports a maximum file size of 2 GB on 32-bit systems. The reason for this limit is that compressed files over 2 GB are rarely used, while the memory consumed by indexing and the overhead of 64-bit indexing on a 32-bit system are quite high.
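The memory trade-off can be made concrete with a rough estimate. Assuming, purely for illustration, that the index holds one offset per fixed-size block, the calculation below compares 32-bit and 64-bit offsets for a 2 GB file. The 4096-byte block size and the one-entry-per-block layout are assumptions; the paper does not describe the real index structure.

```python
BLOCK_SIZE = 4096  # hypothetical block size; not stated in the paper


def index_bytes(file_size: int, offset_width: int,
                block_size: int = BLOCK_SIZE) -> int:
    """Memory needed for one offset entry per block of the file."""
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks * offset_width


two_gib = 2 * 1024 ** 3
print(index_bytes(two_gib, 4))  # 2097152 bytes with 32-bit offsets
print(index_bytes(two_gib, 8))  # 4194304 bytes with 64-bit offsets
```

Under these assumptions, moving to 64-bit offsets doubles the index memory for every file, in addition to the cost of 64-bit arithmetic on a 32-bit CPU.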


Summary

The design goal of CogoFS is both high performance and high resource utilization on storage systems, achieved by minimizing data transfer on networked file systems and load on file servers. The current CogoFS+NFS combination is known as the best replacement for highly stressed NFS servers and saturated networks. Future development of CogoFS will focus on further performance enhancements and on security features such as POSIX Access Control Lists (ACLs) and encryption. CogoFS will continue to evolve and improve over time.