PostgreSQL Blog

Foreword

PostgreSQL has a number of variable-length data types. All of them except cstring are stored in the varlena format. The statement SELECT typname FROM pg_type WHERE typlen = -1 lists every data type that uses the varlena format, including the common text and json types.

Varlena structure

The varlena structure has a common definition format as follows.

struct varlena
{
	char		vl_len_[4];		/* Do not touch this field directly! */
	char		vl_dat[FLEXIBLE_ARRAY_MEMBER];	/* Data content is here */
};

Note that this is only the generic form. varlena comes in several concrete formats, each with its own definition, and before using a value we must interpret it as the right format based on its first byte. The bit patterns below are shown as they appear on big-endian machines; on little-endian machines PostgreSQL keeps the same flags in the low-order bits of the first byte.

  1. If the first byte is exactly 1000 0000, the value is a varattrib_1b_e, which is used to refer to external data (explained under TOAST).

  2. If the highest bit of the first byte is 1 and the byte is not 1000 0000, the value is a varattrib_1b, which is used to store small data.

  3. If the highest bit of the first byte is 0, the value is a varattrib_4b, which can store data up to 1GB.
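
The three rules above can be sketched as a small classifier. This is only an illustration: the names varlena_kind and classify_first_byte are hypothetical and do not exist in PostgreSQL, and the byte is interpreted using the big-endian bit patterns given in the list.

```c
#include <stdint.h>

/* Hypothetical names for this sketch; not PostgreSQL identifiers. */
typedef enum
{
    KIND_4B,    /* 4-byte header, inline data */
    KIND_1B,    /* 1-byte header, small inline data */
    KIND_1B_E   /* 1-byte header, refers to external data */
} varlena_kind;

/* Classify a value by its first byte, following the three rules above. */
static varlena_kind
classify_first_byte(uint8_t first)
{
    if (first == 0x80)      /* exactly 1000 0000 */
        return KIND_1B_E;
    if (first & 0x80)       /* highest bit set, some length bits set */
        return KIND_1B;
    return KIND_4B;         /* highest bit clear */
}
```

The order of the checks matters: the exact match against 1000 0000 has to come before the test of the highest bit, because the external marker also has its highest bit set.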

Varattrib_1b Type


The format of the varattrib_1b type is as follows:
typedef struct
{
	uint8 va_header;
	char va_data[FLEXIBLE_ARRAY_MEMBER]; /* data starts here */
} varattrib_1b;
The va_header has only 8 bits. The highest bit is the marker bit and has a value of 1. The remaining 7 bits hold the length, which counts the header byte itself, so the total size is at most 127 bytes and the varattrib_1b type can only store small data of up to 126 bytes.


-----------------------------------------
   tag    |   length    |
-----------------------------------------
  1 bit   |   7 bits    |
-----------------------------------------
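
As a rough sketch of the 1-byte header arithmetic, the helpers below build and read such a header. The function and macro names are hypothetical. The sketch assumes the big-endian bit layout shown in the diagram and assumes that the 7-bit length counts the header byte itself, which is why the largest payload is 126 bytes rather than 127.

```c
#include <stddef.h>
#include <stdint.h>

#define SHORT_HDRSZ 1     /* the header is a single byte */
#define SHORT_MAX   0x7F  /* largest total size that fits in 7 bits */

/* Can a payload of data_len bytes use the 1-byte header? */
static int
short_header_fits(size_t data_len)
{
    return data_len + SHORT_HDRSZ <= SHORT_MAX;
}

/* Build the header byte: marker bit set, low 7 bits = total size.
 * The caller must check short_header_fits() first. */
static uint8_t
short_header_make(size_t data_len)
{
    return (uint8_t) (0x80 | (data_len + SHORT_HDRSZ));
}

/* Total size (header plus data) recorded in the header byte. */
static size_t
short_header_total_size(uint8_t header)
{
    return header & 0x7F;
}
```

For a 3-byte payload the header byte is 1000 0100: the marker bit plus a total size of 4.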

Varattrib_4b Type


The varattrib_4b format comes in two variants depending on whether the stored data is compressed. A 0 in the second highest bit indicates that the stored data is uncompressed, and a 1 indicates that it is compressed. varattrib_4b is defined as follows, using a union to represent both cases: uncompressed data is stored in the va_4byte structure, and compressed data in the va_compressed structure.
typedef union
{
	/* va_data stores uncompressed data */
	struct
	{
		uint32 va_header;
		char va_data[FLEXIBLE_ARRAY_MEMBER];
	} va_4byte;

	/* va_data stores data that has been compressed */
	struct
	{
		uint32 va_header;
		uint32 va_rawsize; /* Size of raw data */
		char va_data[FLEXIBLE_ARRAY_MEMBER];
	} va_compressed;
} varattrib_4b;
The first member of both structures, va_header, is of type uint32 and has the same format. The highest bit is a flag bit with a value of 0. The second bit indicates whether the data is compressed. The remaining 30 bits hold the length, which includes the 4-byte header itself, so a value can be at most 1GB (2^30 - 1 bytes) in total.
--------------------------------------------------
    tag    |    compress    |      length        |
--------------------------------------------------
   1 bit   |    1 bit       |      30 bits       |
--------------------------------------------------
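
The header layout in the diagram can be decoded with a few masks. The sketch below uses hypothetical helper names and treats the header as a logical 32-bit value with the tag in the highest bit, as drawn above. It assumes the convention that a set second bit marks compressed data. It does not model the little-endian storage layout.

```c
#include <stdint.h>

#define HDR4_TAG_BIT      0x80000000u  /* always 0 for a 4-byte header */
#define HDR4_COMPRESS_BIT 0x40000000u  /* assumed: 1 means compressed */
#define HDR4_LENGTH_MASK  0x3FFFFFFFu  /* remaining 30 bits */

/* Build a header from a total length (header included) and a flag. */
static uint32_t
hdr4_make(uint32_t total_len, int compressed)
{
    return (total_len & HDR4_LENGTH_MASK) |
           (compressed ? HDR4_COMPRESS_BIT : 0u);
}

/* True when the highest bit is clear, that is, a 4-byte header. */
static int
hdr4_is_4byte(uint32_t header)
{
    return (header & HDR4_TAG_BIT) == 0;
}

/* True when the compression flag is set. */
static int
hdr4_is_compressed(uint32_t header)
{
    return (header & HDR4_COMPRESS_BIT) != 0;
}

/* Total length stored in the low 30 bits. */
static uint32_t
hdr4_length(uint32_t header)
{
    return header & HDR4_LENGTH_MASK;
}
```

The largest value hdr4_length can return is 2^30 - 1, which is where the 1GB limit comes from.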

Varattrib_1b_e Type


A varattrib_1b_e value does not store the data itself; it only records where the external data can be found. Depending on where the external data lives, it takes one of several formats. Its definition is:
typedef struct
{
	uint8 va_header;
	uint8 va_tag; /* type */
	char va_data[FLEXIBLE_ARRAY_MEMBER];
} varattrib_1b_e;
The second byte, va_tag, gives the type, which is one of the four values below. The layout of va_data differs for each type.
typedef enum vartag_external
{
	VARTAG_INDIRECT = 1,
	VARTAG_EXPANDED_RO = 2,
	VARTAG_EXPANDED_RW = 3,
	VARTAG_ONDISK = 18
} vartag_external;

External data stored on disk

If the tag is VARTAG_ONDISK, the external data is stored on disk. In that case va_data holds the following structure.

typedef struct varatt_external
{
	int32		va_rawsize;		/* Original data size (includes header) */
	int32		va_extsize;		/* External saved size (excludes header) */
	Oid			va_valueid;		/* Unique ID of value within TOAST table */
	Oid			va_toastrelid;	/* RelID of TOAST table containing it */
}			varatt_external;

External data stored in memory


If the external data is stored in memory, it corresponds to the types VARTAG_EXPANDED_RO and VARTAG_EXPANDED_RW. The only difference between the two is that the former is read-only and the latter can be read and written.
typedef struct varatt_expanded
{
	ExpandedObjectHeader *eohptr;  // pointers
} varatt_expanded;


Pointer Types


One special format remains, VARTAG_INDIRECT. It is simply a pointer to another varlena in memory, which may itself be an external value (varatt_external or varatt_expanded) or an inline varattrib_1b or varattrib_4b.

typedef struct varatt_indirect
{
	struct varlena *pointer;	/* Pointer to in-memory varlena */
}			varatt_indirect;
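
Because each tag implies a fixed va_data layout, the size of an external value follows from va_tag alone. PostgreSQL's headers provide a VARTAG_SIZE macro for this. The sketch below shows the same idea as a standalone function, with simplified stand-in definitions (Oid as uint32_t, void * instead of the real pointer types) and a hypothetical function name.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;   /* stand-in for PostgreSQL's Oid */

typedef enum
{
    VARTAG_INDIRECT = 1,
    VARTAG_EXPANDED_RO = 2,
    VARTAG_EXPANDED_RW = 3,
    VARTAG_ONDISK = 18
} vartag_external;

typedef struct
{
    int32_t va_rawsize;
    int32_t va_extsize;
    Oid     va_valueid;
    Oid     va_toastrelid;
} varatt_external;

typedef struct { void *eohptr; }  varatt_expanded;  /* simplified */
typedef struct { void *pointer; } varatt_indirect;  /* simplified */

/* Size of the va_data payload for a tag, or 0 if the tag is unknown. */
static size_t
vartag_payload_size(int tag)
{
    switch (tag)
    {
        case VARTAG_INDIRECT:
            return sizeof(varatt_indirect);
        case VARTAG_EXPANDED_RO:
        case VARTAG_EXPANDED_RW:
            return sizeof(varatt_expanded);
        case VARTAG_ONDISK:
            return sizeof(varatt_external);
        default:
            return 0;
    }
}
```

The total size of the value is this payload size plus the two header bytes, va_header and va_tag.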

Usage


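PostgreSQL provides a number of macros (in the src/include/postgres.h file, moved to src/include/varatt.h as of PostgreSQL 16) that make it easy to manipulate varlena data. For the varattrib_4b type, for example, VARSIZE reads the total size from the header, SET_VARSIZE writes it, VARDATA returns a pointer to the data, and VARHDRSZ is the size of the header.
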

The following shows an example of creating varlena data, where mydata points to length bytes of data:

result = (struct varlena *) palloc(length + VARHDRSZ); // Allocate heap memory
SET_VARSIZE(result, length + VARHDRSZ);   // Set the header
memcpy(VARDATA(result), mydata, length); // Write data
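
To try the same three steps outside a PostgreSQL build, here is a standalone sketch that uses malloc in place of palloc and hypothetical helpers in place of the macros. It writes the 4-byte header most significant byte first so that the flag bits land in the first byte as described earlier. Real PostgreSQL stores the header in host byte order, so this shows only the idea and does not reproduce the on-disk format.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define HDRSZ 4   /* plays the role of VARHDRSZ in this sketch */

/* Allocate an uncompressed value with a 4-byte header.
 * Returns NULL if it would exceed the 30-bit length or malloc fails. */
static unsigned char *
make_value(const void *data, uint32_t len)
{
    uint32_t       total;
    unsigned char *v;

    if (len > 0x3FFFFFFFu - HDRSZ)
        return NULL;
    total = len + HDRSZ;

    v = malloc(total);                     /* allocate */
    if (v == NULL)
        return NULL;

    v[0] = (unsigned char) (total >> 24);  /* set the header; both */
    v[1] = (unsigned char) (total >> 16);  /* flag bits stay 0 */
    v[2] = (unsigned char) (total >> 8);
    v[3] = (unsigned char) total;

    memcpy(v + HDRSZ, data, len);          /* write data */
    return v;
}

/* Total size (header included), read back from the header bytes. */
static uint32_t
value_total_size(const unsigned char *v)
{
    return ((uint32_t) (v[0] & 0x3F) << 24) | ((uint32_t) v[1] << 16) |
           ((uint32_t) v[2] << 8) | (uint32_t) v[3];
}

/* Pointer to the payload, just past the header. */
static const unsigned char *
value_data(const unsigned char *v)
{
    return v + HDRSZ;
}
```

The caller owns the returned buffer and releases it with free. In PostgreSQL itself, memory obtained from palloc belongs to the current memory context instead.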

Design Ideas

PostgreSQL designed the varlena structure mainly to solve a problem with cstring: the length of a cstring can only be found by scanning from its start to the terminating null byte, which is inefficient. varlena also has to support data of very different sizes without wasting space, so for small data the length and the marker bit are packed into a single byte. Because the external format stores only a pointer, and the size of that pointer data follows from its va_tag, its first byte does not need to hold a length and uses 1000 0000 purely as a marker. Every other pattern with the highest bit set belongs to the varattrib_1b type. Since the length stored there counts the header byte itself, it is always greater than 0, so the two never conflict.


For large data, the length needs more bits, so PostgreSQL uses four bytes to store it. To avoid waste, the first two bits of those four bytes double as marker bits. PostgreSQL's data format design is carefully thought out and well worth studying.



