Safetensors, CKPT, ONNX, GGUF, and Other Key AI Model Formats [2026]


The growth of artificial intelligence and machine learning has created a critical need for methods to store and distribute models that are efficient, secure, and compatible with different systems. As models become more complex and are used in more diverse settings, the choice of serialization format is a key decision. This choice affects performance, resource use, and the security of AI systems.

This report examines leading model serialization formats, including Safetensors, CKPT, ONNX, and GGUF. It highlights their unique features, common uses, and how they compare to one another.

1. Introduction to Model Serialization in AI/ML

Model serialization is the process of saving a trained machine learning model to a file. This file can then be stored, shared, or reloaded for later use, such as making predictions, continuing training, or performing analysis. This capability is essential for the entire AI/ML lifecycle, from research and development to large-scale deployment.

The Critical Role of Model Formats in the AI/ML Lifecycle

Saving models to a standard format is crucial for several reasons:

  • Reproducibility: It allows research experiments to be precisely replicated and validated.
  • Collaboration: Standard formats make it easy for teams to share models, enabling them to work together and integrate models into larger systems.
  • Deployment: Serialization turns a trained model into a portable file that can be loaded and run in various environments, from cloud servers to edge devices.
  • Transfer Learning: It enables the use of pre-trained models as a foundation for new tasks, which saves significant training time and data.
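In code, serialization is just a round trip from in-memory parameters to a file and back. A minimal sketch using Python's built-in pickle module (the traditional approach examined later), with a toy parameter dict standing in for a real model:

```python
import os
import pickle
import tempfile

# Toy "model": a dict of learned parameters (weights and biases).
model_state = {
    "layer1.weight": [[0.5, -0.2], [0.1, 0.9]],
    "layer1.bias": [0.0, 1.0],
}

path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(model_state, f)   # serialize: model -> file

with open(path, "rb") as f:
    restored = pickle.load(f)     # deserialize: file -> model

assert restored == model_state    # identical state after the round trip
```

The restored dict is byte-for-byte the same state, which is exactly what reproducibility, collaboration, and deployment all rely on.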

Overview of Challenges Addressed by Modern Formats

As machine learning has advanced, modern serialization formats have evolved to solve several key challenges:

  • Security: A major concern is the security risk in traditional methods, especially those using Python's pickle module. These methods can allow malicious code to run when a model is loaded, creating a severe security threat if the model comes from an untrusted source.
  • Performance: Today's large and complex models require very fast loading and efficient memory management. This is especially important for devices with limited resources, like mobile phones, and for applications that need immediate responses.
  • Portability and Interoperability: The machine learning world uses many different frameworks (like PyTorch, TensorFlow, and JAX). Formats are needed that allow models to move easily between these frameworks and run on different hardware (GPUs, TPUs) without major rework.

In recent years, the AI community has shifted towards more efficient and secure formats like GGUF and Safetensors, reflecting a collective effort to address these issues.

Early methods for saving ML models, like PyTorch's use of the Python pickle module for its .pt and .pth files, were chosen for their ease of use. They could easily save complex Python objects, including both the model's design and its training state (like the optimizer). While this was convenient for research in a Python environment, it created a major security flaw. The pickle module is designed in a way that allows it to run any code embedded within a file during the loading process. This means loading a seemingly harmless model from an untrusted source could compromise an entire system.
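The mechanism is easy to demonstrate with the standard library alone. In this sketch, a hypothetical class uses pickle's `__reduce__` hook to smuggle a call to `eval` (a benign stand-in for something like `os.system`) into the file; the call executes during loading:

```python
import pickle

class NotAModel:
    # pickle lets any object dictate what happens at load time:
    # __reduce__ returns a callable plus arguments, and
    # pickle.loads CALLS that callable while deserializing.
    def __reduce__(self):
        # eval("2 + 2") is a harmless stand-in for os.system("...") etc.
        return (eval, ("2 + 2",))

payload = pickle.dumps(NotAModel())
result = pickle.loads(payload)  # executes eval("2 + 2") during loading
print(result)                   # 4 -- code ran; no NotAModel was restored
```

Note that the loader never needed the `NotAModel` class at all: the attack lives entirely inside the serialized bytes, which is why scanning a `.pt`/`.pth` file for "suspicious classes" is unreliable.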

The creation of formats like Safetensors, along with the growing use of ONNX and GGUF, is a direct response to this security risk, as well as the need for better performance and portability. Safetensors, for example, was built specifically to prevent malicious code from running. This shows that as the machine learning field matures and AI moves from research to real-world applications, security and efficiency are no longer afterthoughts but core principles in designing new formats. This change represents a necessary shift from research-focused flexibility to production-level security and robustness, fixing the "technical debt" of older, more permissive methods.

Framework-native formats, such as .pt/.pth for PyTorch and .ckpt/.h5 for TensorFlow/Keras, are tightly integrated with their specific frameworks. While this makes them efficient within a single ecosystem, it causes significant problems with interoperability. A model trained in one framework cannot be easily used in another without complex conversions or maintaining separate systems for each framework. This leads to disconnected development and deployment workflows.

The Open Neural Network Exchange (ONNX) format was created to break down these barriers. It provides a "cross-platform" and "vendor-neutral" standard for models. It achieves this by defining the model's structure (its computation graph) in an abstract way that is independent of any single framework. Similarly, GGUF, though originally made for the llama.cpp project, also focuses on improving compatibility for large language models (LLMs) across different platforms.

The variety of formats today reflects a core tension in the ML industry: the desire for framework-specific features during development (e.g., PyTorch's dynamic graph for research flexibility) versus the need for universal, efficient, and secure deployment. This tension means that multiple formats will continue to exist, making conversion tools and advanced MLOps pipelines increasingly vital to connect model development with deployment. Different formats will continue to be used for specific stages of the ML lifecycle based on their unique strengths.

2. Understanding Safetensors

Safetensors is a major step forward in model serialization, designed specifically to fix the security and efficiency problems of traditional model storage methods.

Definition and Core Design Principles

Safetensors is a modern, secure, and fast serialization format for deep learning models, created by Hugging Face. Its main goal is to provide a safe way to store and share tensors—the multi-dimensional arrays that are the basic data building blocks of machine learning. The format is designed to be safer and faster than older formats like pickle.

A core principle of Safetensors is its strict separation of model weights (tensors) from any runnable code. This design directly addresses the security flaws found in older serialization methods.

Key Features

  • Zero-copy and Lazy Loading: A key to Safetensors' performance is its "zero-copy" capability. This allows model data to be mapped directly from the disk into memory without creating extra copies, which saves memory and speeds up loading. It also supports "lazy loading," meaning only the necessary parts of a large model are loaded into RAM when needed. This is very useful for extremely large models or systems with limited memory.
  • Structured Metadata Handling: Every Safetensors file includes a separate metadata section in JSON format. This section lists all the tensors in the model with details like their shape, data type, and name. The metadata points to where the actual tensor data is stored separately in the file, which improves both readability and security.
  • Tensor-only Data Storage: The most important security feature of Safetensors is that it is designed to contain "only raw tensor data and associated metadata." By its architecture, it "doesn't allow serializing of arbitrary Python code." This fundamental design choice eliminates the risk of running malicious code when loading a model.
  • Quantization Support: Safetensors can handle quantized tensors, which helps make models smaller and use less memory. However, its quantization support is "not as flexible as GGUF" because it depends on the features provided by the PyTorch framework.
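The layout described above can be sketched with nothing but the standard library. This toy example writes a file following the documented safetensors layout (an 8-byte little-endian header size, a JSON header mapping tensor names to dtype/shape/offsets, then raw tensor bytes) and then lazily reads a single tensor via mmap. In practice you would use the safetensors library's `save_file`/`load_file` rather than hand-rolling this:

```python
import json
import mmap
import os
import struct
import tempfile

# Two little float32 tensors as raw bytes.
a = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)   # "a": shape [2, 2], 16 bytes
b = struct.pack("<2f", 5.0, 6.0)             # "b": shape [2],     8 bytes
header = {
    "a": {"dtype": "F32", "shape": [2, 2], "data_offsets": [0, 16]},
    "b": {"dtype": "F32", "shape": [2],    "data_offsets": [16, 24]},
}
hjson = json.dumps(header).encode()

path = os.path.join(tempfile.mkdtemp(), "toy.safetensors")
with open(path, "wb") as f:
    # [8-byte LE header size][JSON header][raw tensor data]
    f.write(struct.pack("<Q", len(hjson)) + hjson + a + b)

# Lazy, mmap-based read: parse only the JSON header, then pull just
# tensor "b" without ever touching "a"'s bytes.
with open(path, "rb") as f, \
        mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
    (hlen,) = struct.unpack_from("<Q", m, 0)
    meta = json.loads(m[8:8 + hlen])
    start, end = meta["b"]["data_offsets"]
    base = 8 + hlen                      # data section starts after header
    vals = struct.unpack("<2f", m[base + start:base + end])
    print(vals)  # (5.0, 6.0)
```

Because the file contains only offsets, shapes, and raw bytes, there is simply nowhere for executable code to hide, and partial loads fall out of the design for free.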

Primary Benefits

  • Enhanced Security (Mitigating Arbitrary Code Execution): This is Safetensors' biggest advantage. By design, it completely prevents Python code from being saved in the file. This eliminates the most serious security risk found in pickle-based formats: running malicious code when a model is loaded. This makes Safetensors the best choice for sharing and using models from public or untrusted sources. Its simple, fully specified layout also makes files straightforward to inspect and validate before loading.
  • Performance Optimization: The use of zero-copy and lazy loading results in "faster loading times and lower memory usage." Benchmarks show it is much "faster" than pickle and can be "76.6X faster on CPU and 2X faster on GPU compared to the traditional PyTorch" saving method.
  • Portability: The format is designed to be portable, meaning it works across different programming languages. This makes it easier to share and use models in various software systems.
  • Seamless Integration: Safetensors offers "seamless integration with existing machine learning frameworks and libraries." This allows developers to adopt this safer format easily, without making major changes to their current workflows.

Comparison with Traditional Serialization (e.g., Pickle)

Python's pickle module, which is used for PyTorch's .pt and .pth files, is inherently unsafe. It allows any code to be hidden inside a serialized file and run automatically when the file is loaded. This is a well-known and severe vulnerability, especially when using models downloaded from public websites. While tools like picklescan can detect some malicious patterns, they are not foolproof and cannot guarantee safety.

Safetensors was created specifically to solve this security problem. By allowing only raw tensor data and structured metadata in the file, it removes the possibility of executing malicious code. Beyond security, Safetensors also offers much better performance. Its design for memory mapping and lazy loading leads to significantly faster loading and more efficient memory use compared to pickle, which typically loads the entire model into memory at once.

The security flaw in Python's pickle means that downloading a .pt or .pth file from an untrusted source is not just downloading data; it is like running a potentially harmful program. It is known that there is "no 100% bullet-proof solution to verifying the safety of a pickle file without execution." This puts the burden of checking the file's safety on the user, which is difficult and unreliable.

Safetensors changes this dynamic by redesigning the format itself to prevent harmful code from being included in the first place. It shifts the security responsibility from the user's difficult verification process to the format's built-in safety. This marks a major shift in the open-source AI community from a "verify, then trust" approach to a "trust by design" model. This change acknowledges that it's nearly impossible to scan for every possible threat in complex files. By blocking the attack vector (arbitrary code execution), Safetensors makes it safer to share models widely, encouraging collaboration and making it easier for more people to use pre-trained models. This "trust by design" principle is essential for the growth and security of the entire AI ecosystem.

While Safetensors was created mainly for security reasons (to fix pickle's vulnerabilities), it also provides major performance boosts, such as faster loading, lower memory use, and zero-copy operations. These performance gains are not just a side effect; they are a direct result of Safetensors' optimized design, which uses memory mapping and lazy loading to efficiently handle data. This makes it naturally more efficient for large models.

This combination of enhanced security and significant performance improvements has been a key driver of its widespread adoption. If Safetensors had only offered better security, its adoption might have been slower, particularly among users not immediately focused on security. However, the clear and measurable performance benefits provide a strong reason for everyone to switch, speeding up its integration into major platforms like Hugging Face. This shows that in AI engineering, a technology often needs to offer both security and performance advantages to be rapidly and widely accepted by the industry.

3. Overview of Key Model Formats

Besides Safetensors, several other formats are important in the machine learning world, each with its own features and use cases.

3.1. CKPT (Checkpoints)

An AI checkpoint is not a single file type but rather a snapshot of a model's state saved at a specific point during training. Checkpoints are essential for saving progress during long training jobs.

Characteristics and Typical Use Cases

A checkpoint typically contains a model's learned parameters, like its weights and biases. It can also store other important information needed to resume training, such as the optimizer's state, the current epoch number, and the learning rate schedule. The file extensions for checkpoints vary by framework. For PyTorch, they are usually .pt or .pth, while for TensorFlow/Keras, they are .ckpt or .h5.

Key benefits of CKPT files include:

  • Reproducibility: They ensure a model behaves consistently when reloaded, which is vital for validating research and maintaining reliable performance.
  • Collaboration: They are easy to share, allowing developers to replicate results or build on existing work.
  • Flexibility: PyTorch's .pt/.pth formats are particularly flexible, making it simple to save and load models for research purposes.

Common use cases for CKPT files include:

  • Resuming Training: Continuing a training session that was interrupted, which saves significant time and computational resources.
  • Fine-Tuning: Using a pre-trained model as a starting point for training on a new, more specific dataset.
  • Model Evaluation: Testing a model's performance at different stages of training without having to retrain it.
  • Inference: Loading a fully trained model into a production system to make predictions.
  • Research and Experimentation: Analyzing how a model evolves over time and systematically tuning its parameters.
  • Transfer Learning: Serving as a powerful starting point for related tasks, which reduces training time and data needs.
  • Disaster Recovery: Acting as a backup to resume work after a failure during a long training process.
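The save/resume cycle above can be sketched as follows. This schematic uses pickle in place of torch.save and plain lists in place of real state_dict objects, but the bundled fields mirror what a typical PyTorch checkpoint would hold:

```python
import os
import pickle
import tempfile

def save_checkpoint(path, epoch, weights, optimizer_state):
    # A checkpoint bundles everything needed to resume: parameters,
    # optimizer state, and the position in the training schedule.
    with open(path, "wb") as f:
        pickle.dump({"epoch": epoch,
                     "weights": weights,
                     "optimizer_state": optimizer_state}, f)

def load_checkpoint(path):
    with open(path, "rb") as f:
        return pickle.load(f)

path = os.path.join(tempfile.mkdtemp(), "ckpt_epoch3.pkl")
save_checkpoint(path, epoch=3, weights=[0.42, -0.17],
                optimizer_state={"lr": 1e-3, "momentum": [0.01, 0.0]})

ckpt = load_checkpoint(path)
start_epoch = ckpt["epoch"] + 1   # resume training from epoch 4
print(start_epoch, ckpt["weights"])
```

In real PyTorch code the dict would hold `model.state_dict()` and `optimizer.state_dict()`, but the shape of the workflow is the same: everything needed to pick up where training stopped lives in one file.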

Security Considerations

The biggest security risk with CKPT files, especially PyTorch's .pt and .pth formats, comes from their reliance on Python's pickle module. This means these files can be designed to contain and run malicious Python code when loaded (if the torch.load function is used without the weights_only=True setting). This vulnerability (CWE-502: Deserialization of Untrusted Data) can have serious consequences, such as data theft, altered model behavior, or even a full system takeover.
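The idea behind restricted loading settings like weights_only=True can be illustrated with the standard library: a restricted Unpickler that refuses to resolve any global reference, so nothing in the stream can ever be called. This is a simplified sketch of the principle, not PyTorch's actual implementation:

```python
import io
import pickle

class NoGlobalsUnpickler(pickle.Unpickler):
    # Refusing find_class means no function or class reference in the
    # stream can be resolved, so nothing can be invoked at load time.
    # Plain containers and numbers still deserialize normally.
    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")

plain_weights = pickle.dumps({"w": [0.1, 0.2, 0.3]})
sneaky = pickle.dumps(eval)  # a global reference an attacker could abuse

print(NoGlobalsUnpickler(io.BytesIO(plain_weights)).load())  # loads fine
try:
    NoGlobalsUnpickler(io.BytesIO(sneaky)).load()
except pickle.UnpicklingError as e:
    print("rejected:", e)
```

Tensor-only data survives; anything that names a callable is rejected before it can run. PyTorch's weights_only mode applies the same allow-list idea with a small set of tensor-related types permitted.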

The industry has acknowledged this risk, and Safetensors has emerged as a safer option. As noted, "Most Stable Diffusion AI checkpoints are saved in formats like .ckpt or .safetensors... .safetensors is a safer alternative, designed to prevent malicious code execution." This shows a clear trend toward more secure formats for sharing models.

CKPTs, particularly in PyTorch's .pt/.pth format, are known for being "highly flexible." This flexibility allows them to save not just model weights but also the optimizer state and even custom Python classes, which is very useful for resuming training precisely.

However, this same flexibility is what creates the security vulnerability. Because the format can save any Python object, an attacker can hide malicious code inside a model file. When the file is loaded without proper precautions, that code runs. This illustrates a fundamental trade-off in system design: more flexibility often leads to a larger attack surface and greater security risks.

The industry's solution is to adopt formats like Safetensors for distributing models, even if the more flexible .pt/.pth formats are still used for training in controlled environments. This shows a growing understanding that different stages of the ML lifecycle require different levels of security. The power of saving the full training state is best kept within a trusted development environment, while sharing and deployment require formats with built-in security guarantees.

3.2. ONNX (Open Neural Network Exchange)

ONNX, which stands for Open Neural Network Exchange, is an open-standard format for machine learning models. It is designed to allow models to work across different deep learning frameworks.

Characteristics and Primary Use Cases

An ONNX file contains a model's complete structure, including its sequence of operations (the computation graph), its learned weights, and other metadata. A major strength of ONNX is that it acts as a universal translator. Models trained in frameworks like PyTorch, TensorFlow, or scikit-learn can be converted to the ONNX format, enabling a "train once, deploy anywhere" approach.

Unlike formats that only store model weights (like Safetensors or GGUF), ONNX includes the model's computation graph. This graph-based structure provides "more flexibility when converting models between different frameworks." ONNX offers excellent portability across many platforms, devices, and hardware accelerators (CPUs, GPUs, AI chips). The models are stored in Protobuf format, which is an efficient, platform-neutral way to save structured data.
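The graph-versus-code distinction can be made concrete with a toy, framework-neutral graph plus a few lines of interpreter. This is a conceptual sketch only: real ONNX models are Protobuf messages with a standardized operator set, executed by engines such as ONNX Runtime, but the principle of "data describing operations" is the same:

```python
# Hypothetical mini-graph: named initializers (weights) plus a list of
# op nodes -- pure data that any "runtime" could execute.
graph = {
    "inputs": ["x"],
    "initializers": {"W": [[2.0, 0.0], [0.0, 3.0]], "b": [1.0, -1.0]},
    "nodes": [
        {"op": "MatVec", "in": ["W", "x"], "out": "h"},
        {"op": "Add",    "in": ["h", "b"], "out": "y"},
    ],
    "outputs": ["y"],
}

def run(graph, **feeds):
    # A minimal "runtime": walk the node list, computing each output.
    env = {**graph["initializers"], **feeds}
    for node in graph["nodes"]:
        a, b = (env[k] for k in node["in"])
        if node["op"] == "MatVec":
            env[node["out"]] = [sum(w * xi for w, xi in zip(row, b))
                                for row in a]
        elif node["op"] == "Add":
            env[node["out"]] = [u + v for u, v in zip(a, b)]
    return [env[k] for k in graph["outputs"]]

print(run(graph, x=[1.0, 1.0]))  # [[3.0, 2.0]]
```

Because the graph is plain data, any runtime that understands the operator vocabulary can execute it; no framework-specific Python objects or code travel with the model.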

Primary use cases for ONNX include:

  • Cross-Framework Deployment: Running a model in a different framework or environment than the one it was trained in.
  • High-Performance Inference: The ONNX Runtime is an inference engine that automatically optimizes models for specific hardware, often leading to faster performance.
  • Edge and Mobile Deployment: Its small footprint and optimized runtime make ONNX a good choice for running models on resource-limited devices.
  • Production Systems: Its robustness and portability make it popular for deploying models in demanding production environments.

Security Considerations

A subtle but serious security risk with ONNX models is the potential for architectural backdoors. An attacker could modify a model's computation graph to include a hidden path that is only triggered by specific inputs. When activated, this backdoor could cause the model to produce malicious or unexpected outputs, all while behaving normally on standard inputs, making it difficult to detect. Other risks include model inversion attacks (extracting sensitive training data) and adversarial attacks (using malicious inputs to fool the model).

To reduce these threats, several practices are recommended:

  • Digitally sign ONNX models to ensure they haven't been tampered with.
  • Deploy models in isolated environments, like Docker containers, with strong network security.
  • Use monitoring tools to track model behavior and detect anomalies.
  • Follow general security best practices, such as sanitizing inputs and keeping software updated.
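A minimal form of the first recommendation is to pin and verify a cryptographic digest of the model file before every load. The sketch below uses a SHA-256 checksum and a hypothetical file name; production deployments would use proper digital signatures (public-key signing or a signing service) rather than a bare hash:

```python
import hashlib
import os
import tempfile

def sha256_of(path, chunk=1 << 20):
    # Stream the file so multi-gigabyte models never need to fit in RAM.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

path = os.path.join(tempfile.mkdtemp(), "model.onnx")  # hypothetical file
with open(path, "wb") as f:
    f.write(b"fake-onnx-bytes")

pinned = sha256_of(path)              # recorded alongside the published model
assert sha256_of(path) == pinned      # verify before every load

with open(path, "ab") as f:           # simulate tampering with the graph
    f.write(b"!")
print(sha256_of(path) == pinned)      # False: tampering detected
```

A checksum detects modification but not a malicious original, so it complements, rather than replaces, sourcing models from trusted publishers.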

ONNX is generally safer than pickle-based formats because it does not run arbitrary code when loaded. However, if an ONNX model uses custom layers implemented externally, those layers could potentially contain malicious Python code if not managed carefully.

Disadvantages

ONNX can represent quantized models, but it "does not natively support quantized tensors" as seamlessly as GGUF: it breaks them down into separate integer and scale-factor tensors, which "can lead to reduced quality." Converting models with complex or custom layers that are not standard in ONNX can also be difficult and may require custom work that slows down performance.

Traditional formats based on Python's pickle (like .pt files) save Python objects, which can include runnable code. This treats the model as a program. In contrast, ONNX focuses on saving the model's "computation graph"—a more abstract representation of its operations and data flow, rather than a specific code implementation.

This graph-centric approach is what gives ONNX its excellent cross-framework portability and allows it to be optimized for different hardware. By defining the model's logic at a higher level, it becomes independent of the framework it was trained in. This is a significant conceptual shift, moving from a framework-specific implementation to a portable computational representation. While this greatly improves deployment flexibility, it also creates new security concerns, like architectural backdoors, which require different security strategies than those used for pickle-based formats.

3.3. GGUF (GPT-Generated Unified Format)

GGUF (GPT-Generated Unified Format) is a file format designed specifically for storing and running large language models (LLMs) efficiently. It is an improved version of its predecessor, GGML, and aims to make LLMs easier to use, especially on personal computers.

Characteristics and Primary Use Cases

GGUF is designed to make LLMs smaller and much faster to load. This is crucial for running models locally, where storage space and RAM are often limited. The format uses "advanced compression techniques" to achieve this. It also provides a standard way to package a model's weights, architecture, and metadata, ensuring it works consistently across different software, especially with inference engines based on llama.cpp.

A key feature of GGUF is its excellent support for quantization. Quantization reduces the numerical precision of a model's weights (e.g., from 16-bit to 4-bit numbers), which drastically cuts down on file size and the computation needed to run it. GGUF models are available in various quantization levels (from Q2 to Q8), offering a range of trade-offs between size and quality.

  • Lower quantization levels (like Q2 or Q3) result in very small files that can run on hardware with less RAM, but may have a slight drop in model quality.
  • Higher quantization levels (like Q6 or Q8) maintain better quality but require more storage and RAM.
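The size/quality trade-off can be seen in a few lines. This sketch applies symmetric linear quantization with one shared scale per tensor; real GGUF variants (Q4_K and friends) quantize in small blocks with per-block scales, but the principle is the same:

```python
def quantize(xs, bits):
    # Symmetric linear quantization: map floats to signed ints of the
    # given width plus one shared scale factor.
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(x) for x in xs) / qmax or 1.0
    return [round(x / scale) for x in xs], scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.812, -0.455, 0.031, 0.998, -0.777]
errs = {}
for bits in (8, 4, 2):
    q, s = quantize(weights, bits)
    errs[bits] = max(abs(w - d)
                     for w, d in zip(weights, dequantize(q, s)))
    print(f"{bits}-bit: max reconstruction error {errs[bits]:.3f}")
```

Fewer bits means a smaller file but coarser reconstruction: the maximum error grows as the bit width shrinks, which is exactly the Q8-versus-Q2 trade-off users see in practice.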

Primary use cases for GGUF include:

  • Local LLM Deployment: Tools like Ollama use GGUF to make it easy for users to run powerful LLMs on their own computers.
  • Offline AI Assistants: Many applications use GGUF models to provide local, private alternatives to cloud-based AI tools.
  • Code Assistance: IDEs and code editors are starting to use GGUF models for intelligent code completion.
  • Local Chatbots: GGUF models are often used for private and responsive conversational AI systems.
  • AI Research: Its flexibility and quantization support make it popular among researchers for experimenting with LLMs on accessible hardware.

Security Considerations

Contrary to popular belief, the underlying GGML library (which GGUF is based on) has had documented vulnerabilities related to "insufficient validation on the input file." These flaws can lead to "potentially exploitable memory corruption vulnerabilities during parsing." Specific security issues have been identified where unchecked user input could cause heap overflows, potentially allowing an attacker to run malicious code.

There is a common misconception that a GGUF file "cannot contain code" and is "solely a modelfile." However, a security report from Databricks showed that while the GGUF file itself doesn't contain executable Python code, a specially crafted file can exploit flaws in the parser (the software that reads the file) to cause memory corruption and achieve code execution.
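A parser sketch shows where such validation belongs. The field layout below (4-byte magic, uint32 version, uint64 tensor count, uint64 metadata key/value count) follows the published GGUF specification; the bounds check is a stand-in for the kind of validation whose absence enabled the reported heap overflows:

```python
import struct

def parse_gguf_header(buf):
    # GGUF begins with: magic b"GGUF", uint32 version,
    # uint64 tensor count, uint64 metadata key/value count (little-endian).
    if len(buf) < 24:
        raise ValueError("truncated header")
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", buf, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    # Never trust counts read from the file: an absurd value here is how
    # a crafted file drives a naive parser into out-of-bounds allocation.
    if n_tensors > 1_000_000 or n_kv > 1_000_000:
        raise ValueError("implausible header counts")
    return version, n_tensors, n_kv

good = struct.pack("<4sIQQ", b"GGUF", 3, 2, 5)
evil = struct.pack("<4sIQQ", b"GGUF", 3, 2 ** 63, 5)  # absurd tensor count

print(parse_gguf_header(good))          # (3, 2, 5)
try:
    parse_gguf_header(evil)
except ValueError as e:
    print("rejected:", e)
```

The file itself is still "just data": the danger lives in reader code that trusts that data, which is why keeping llama.cpp-based tools updated matters as much as vetting the model files.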

To reduce these risks, it is best to:

  • Use models and tools from well-known, reputable sources (like Koboldcpp).
  • Run LLMs in isolated environments (like Docker containers).
  • For highly sensitive tasks, consider using a dedicated machine with no internet access.

Disadvantages

A major drawback of GGUF is that most models are first developed in other frameworks (like PyTorch) and must be converted to the GGUF format. This conversion process is not always easy, and some models may not be fully supported by GGUF-compatible tools. Additionally, modifying or fine-tuning a model after it is in the GGUF format is generally "not straightforward."

While GGUF is designed for fast loading and efficient VRAM usage, the actual inference speed (how fast the model generates responses) can sometimes be slower than unquantized models. This can happen with lower quantization levels because of the extra work needed to dequantize the weights during inference. GGUF's main performance benefit is that it enables large models to run on consumer hardware by saving VRAM, not that it necessarily makes them faster.

GGUF's defining feature is its deep integration with quantization, which allows powerful LLMs to run on "consumer-grade hardware" with limited VRAM. This helps to democratize access to AI. However, this efficiency involves trade-offs. While quantization makes models smaller, lower levels can reduce model quality slightly. Also, inference speed can sometimes be slower than with unquantized models, especially if the unquantized version fits entirely in VRAM.

The "speed" benefit of GGUF usually refers to faster loading and the ability to run a larger model on limited hardware, rather than raw performance. GGUF perfectly captures the "democratization of AI" trend by making advanced models accessible to more people. This requires users to balance model quality with their hardware's limitations. The availability of multiple quantization levels allows users to adapt models to their specific needs, which is key to the format's popularity in the local AI community.

4. Comparative Analysis of Formats

The selection of an appropriate model serialization format is a strategic decision that hinges on balancing various factors, including security, performance, resource efficiency, interoperability, and the specific application context. The table below provides a comparative overview of Safetensors, CKPT, ONNX, and GGUF across these critical dimensions.

| Feature / Format | Safetensors | CKPT (.pt/.pth) | ONNX | GGUF |
|---|---|---|---|---|
| Primary Purpose | Secure, fast tensor storage for deep learning models | Training checkpoints, model parameters, state preservation | Cross-framework interoperability, deployment across diverse hardware | Efficient LLM storage, optimized local inference on consumer hardware |
| Security Profile | High (no arbitrary code execution by design) | Low (arbitrary code execution via pickle deserialization) | Moderate (no arbitrary code execution, but architectural backdoors possible) | Moderate (underlying library vulnerabilities, but file itself not executable Python code) |
| Loading Speed | Very fast (zero-copy, lazy loading) | Varies (can be slower than Safetensors due to full load) | Fast (optimized runtime, graph optimizations) | Fast (mmap, efficient for LLMs) |
| Memory Usage | Efficient (lazy loading, partial loading) | Can be high (loads entire object graph) | Efficient (runtime optimizations) | Very efficient (quantization, VRAM saving) |
| Disk Space | Efficient (compression, tensor-only) | Varies (can be large, includes full state) | Efficient (Protobuf format) | Very efficient (quantization, advanced compression) |
| Quantization Support | Yes, but less flexible than GGUF (PyTorch-dependent) | Yes (framework-dependent) | Limited native support (decomposes tensors) | Robust (multiple levels, Q2-Q8, specialized variants) |
| Portability | High (across different programming languages) | Low (tightly coupled to specific frameworks) | Very high (cross-framework, cross-platform, diverse hardware) | High (especially for the llama.cpp ecosystem) |
| Primary Applications | Secure model sharing, Hugging Face default | Training, fine-tuning, research, model saving | Production deployment, mobile/edge, interoperability | Local LLM inference, consumer hardware, chat applications |
| Key Advantage | Security by design, rapid loading, low memory footprint | Training state preservation, detailed reproducibility | Universal deployment, runtime optimization, framework agnosticism | LLM efficiency on consumer hardware, flexible quantization |
| Key Disadvantage | JSON parser required for metadata in C++ | Arbitrary code execution risk, large file sizes | Complexity for custom layers, limited native quantization | Conversion often required, potential inference slowdown with lower quants |

5. Conclusion

The world of machine learning model formats is constantly evolving, driven by the need for better security, performance, and interoperability. Traditional formats, like pickle-based CKPT files, were flexible for research but introduced serious security risks by allowing arbitrary code execution. This has led to the development and adoption of newer, safer formats.

Safetensors is a leading example of this shift. By separating data from code and using efficient loading techniques, it offers a secure and high-performance alternative for sharing deep learning models, especially in the Hugging Face ecosystem. Its dual benefits of security and speed have made it a popular choice in modern AI workflows.

ONNX solves the major problem of framework incompatibility. By representing models as abstract computation graphs, it allows them to be deployed across different hardware and software. While ONNX prevents the arbitrary code execution seen in pickle, it has its own security concerns, like architectural backdoors, which require different protective measures.

GGUF is a specialized solution for running large language models on consumer hardware. Its powerful quantization features dramatically reduce model size and memory use, making powerful LLMs accessible to more people. However, this efficiency can sometimes result in slower inference speeds, and its underlying libraries have shown vulnerabilities that require users to be cautious.

Ultimately, the best format depends on the specific context.

  • Safetensors is the top choice for securely and efficiently sharing deep learning models.
  • ONNX is ideal for deploying models across different frameworks and hardware.
  • GGUF offers unmatched efficiency for running large language models on local, resource-limited devices.

While traditional CKPT formats are still useful for saving training progress in controlled environments, they are being replaced by safer alternatives for public distribution. As the AI field matures, the continued development of these specialized formats will be essential for advancing the power and reach of machine learning.
