XML vs JSON vs YAML: Which Data Format for Which Job

Three data formats with overlapping use cases, very different strengths. Learn when to use each, the security gotchas, and the surprising places XML still wins.

Three data formats, three communities that each prefer their pick. The honest answer is that each fits different jobs: JSON for APIs, YAML for configs, XML for document-shaped data with strong schema needs. Picking the wrong one for the job creates fragility, security risks, and developer pain.

The three formats at a glance

JSON

{
  "name": "Jane Doe",
  "age": 32,
  "skills": ["python", "rust"],
  "active": true
}

YAML

name: Jane Doe
age: 32
skills:
  - python
  - rust
active: true

XML

<person>
  <name>Jane Doe</name>
  <age>32</age>
  <skills>
    <skill>python</skill>
    <skill>rust</skill>
  </skills>
  <active>true</active>
</person>

JSON: the API standard

JSON dominates web APIs because:

  • Native to JavaScript: JSON.parse is built into every browser, so no parsing library is needed.
  • More compact than XML; more verbose than binary formats, but human-readable.
  • Maps cleanly to most languages' basic data types.
  • No comments, attributes, or namespaces — keeps it simple.
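
The type mapping is easy to see in practice. A minimal sketch, assuming Python's standard json module and the example record from the top of the article:

```python
import json

raw = '{"name": "Jane Doe", "age": 32, "skills": ["python", "rust"], "active": true}'

person = json.loads(raw)

# Each JSON type lands on a built-in Python type, with no schema or hints needed.
kinds = {key: type(value).__name__ for key, value in person.items()}
print(kinds)

# Serializing and parsing again gives back an equal structure.
round_tripped = json.loads(json.dumps(person))
print(round_tripped == person)
```

Other languages map the same six JSON types (object, array, string, number, boolean, null) onto their own basic types in a similar way.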

JSON's real weaknesses

  • No comments. Annoying for configuration files. Workarounds (JSON5, JSONC) exist but aren't universal.
  • Trailing commas not allowed. Diff-unfriendly when adding/removing items at the end of arrays.
  • No date type. Use ISO 8601 strings; some libraries auto-parse, most don't.
  • Number precision. JavaScript stores every number as a 64-bit float, so integers beyond 2^53 lose precision. Big integers (64-bit account IDs, timestamps in nanoseconds) become approximate.
  • No schema in the data itself. JSON Schema exists but is external.
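
The precision and date points can be sketched in a few lines. Python integers are arbitrary precision, so this example uses parse_int=float to imitate a consumer that stores every number as a double, the way JavaScript does. The account_id and created fields are made-up values for illustration.

```python
import json
from datetime import datetime

payload = '{"account_id": 9007199254740993, "created": "2024-03-01T12:30:00+00:00"}'

# Python keeps the integer exact.
exact = json.loads(payload)

# Imitate a double-only consumer: every integer is parsed as a 64-bit float.
as_double = json.loads(payload, parse_int=float)

print(exact["account_id"])
print(int(as_double["account_id"]))  # differs from the value above

# Common workaround: send large identifiers as strings.
safe_payload = json.dumps({"account_id": str(exact["account_id"])})

# Dates arrive as plain strings and must be parsed explicitly.
created = datetime.fromisoformat(exact["created"])
print(created.year, created.tzinfo)
```

Sending identifiers as strings costs a little type information but survives every parser unchanged.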

YAML: the configuration format

YAML is the choice for human-edited configuration:

  • Comments supported.
  • Significantly less syntactic noise than JSON or XML.
  • Reasonable for hand-editing complex nested structures.
  • Multi-line strings have several syntaxes.
  • Anchors and aliases enable reuse within a document.
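
A short sketch of comments, block strings, and anchors in one document. It assumes the third-party PyYAML package; the import is guarded so the snippet still loads without it. All keys in the config are invented for illustration.

```python
try:
    import yaml  # PyYAML, third-party (pip install pyyaml)
except ImportError:
    yaml = None

CONFIG = """\
# Comments are allowed in the file; they do not appear in the parsed data.
base_limits: &limits      # anchor: give this mapping a name
  retries: 3
  timeout: 30

staging:
  limits: *limits         # alias: reuse the anchored mapping
production:
  limits: *limits

banner: |
  Line breaks are kept
  in a literal block.
summary: >
  Folded blocks join
  lines with spaces.
"""

if yaml is not None:
    config = yaml.safe_load(CONFIG)
    print(config["production"]["limits"])
    print(repr(config["banner"]))
    print(repr(config["summary"]))
```

The literal style (|) preserves newlines and the folded style (>) turns single line breaks into spaces, which covers most multi-line needs in config files.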

The YAML gotchas list

YAML's flexibility is also its weakness. Famous traps include:

  • The Norway problem: country: NO parses as boolean false under YAML 1.1 rules. Quote it.
  • Indentation is significant and fragile. Tabs are not allowed for indentation, so a stray tab breaks parsing.
  • Implicit type conversion. An unquoted 1.0 becomes the float 1.0, and an unquoted 01 may become the number 1 or the string "01" depending on the parser.
  • YAML 1.1 vs 1.2 differences. Unquoted y, yes, and on are booleans in 1.1 but strings in 1.2. Many widely used parsers, PyYAML among them, still follow 1.1 rules.
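
The 1.1 versus 1.2 difference comes down to which unquoted words count as booleans. The sketch below is a simplified model, not a YAML parser: it only encodes the boolean patterns from the YAML 1.1 type definitions and the YAML 1.2 core schema, and is_implicit_bool is a hypothetical helper name. Real parsers vary in the details; PyYAML, for instance, does not treat single-letter y and n as booleans.

```python
import re

# Unquoted words resolved as booleans by the YAML 1.1 type definitions.
YAML_1_1_BOOL = re.compile(
    r"^(?:y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF)$"
)

# The YAML 1.2 core schema keeps only true/false.
YAML_1_2_BOOL = re.compile(r"^(?:true|True|TRUE|false|False|FALSE)$")


def is_implicit_bool(scalar: str, version: str = "1.1") -> bool:
    """Return True if an unquoted scalar would be read as a boolean."""
    pattern = YAML_1_1_BOOL if version == "1.1" else YAML_1_2_BOOL
    return pattern.match(scalar) is not None


for value in ["NO", "yes", "on", "true", "no-reply"]:
    print(value, is_implicit_bool(value, "1.1"), is_implicit_bool(value, "1.2"))
```

A check like this could back a lint rule that flags unquoted values whose meaning changes between versions.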

YAML's real weaknesses

  • Implicit typing surprises. Especially with country codes, version strings, and short numbers.
  • Indentation errors are subtle. A misaligned dash can produce structurally different data without an error.
  • Anchors and aliases can DoS parsers. Nested aliases expand exponentially, the YAML version of the "billion laughs" attack.
  • Standards drift. YAML 1.1 vs 1.2 vs library-specific extensions all in the wild.
  • Slow to parse compared to JSON. Often several times slower, though the exact factor depends on the library and the document.
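
To see why nested aliases are dangerous, here is a small model that uses only the standard library. It imitates alias sharing with Python list references rather than parsing YAML, and build_shared_structure is a hypothetical helper. The structure is tiny while shared, and grows exponentially once something expands it, for example a conversion to JSON.

```python
import json


def build_shared_structure(depth: int, fanout: int):
    """Each level holds `fanout` references to the same lower level, like YAML aliases."""
    node = "lol"
    for _ in range(depth):
        node = [node] * fanout  # references to one object, not copies
    return node


shared = build_shared_structure(depth=6, fanout=6)

# In memory there are only depth * fanout list slots (36 here).
# Serializing forces every reference to be written out in full.
expanded = json.dumps(shared)
leaf_count = expanded.count('"lol"')
print(leaf_count, len(expanded))
```

With a depth and fan-out of 9 or 10 the expanded form reaches hundreds of millions of leaves, which is why some tools cap alias expansion.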

XML: still alive in specific niches

XML lost the API war but won where it actually fits:

  • Document-shaped data. Office Open XML (.docx), SVG, RSS/Atom feeds, EPUB.
  • Strict schema validation. XML Schema (XSD) is mature, widely tooled, and in several respects stricter than JSON Schema.
  • Industry standards. SOAP, SAML, HL7 v3/CDA, and financial messaging (FIXML, ISO 20022) are all XML-based.
  • Mixed content. Text with embedded markup (HTML-like) is XML's strength. JSON can't represent this naturally.
  • XSLT transformations. Powerful, declarative XML-to-XML or XML-to-HTML conversion.
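
Mixed content is the clearest of these to show in code. A minimal sketch with Python's standard xml.etree.ElementTree; the paragraph is an invented example:

```python
import xml.etree.ElementTree as ET

paragraph = ET.fromstring(
    "<p>Always use <code>yaml.safe_load</code> for <em>untrusted</em> input.</p>"
)

# Text before the first child lives on the parent element.
print(repr(paragraph.text))

# Each child carries its own text plus the "tail" that follows it.
for child in paragraph:
    print(child.tag, repr(child.text), repr(child.tail))

# The full sentence is recoverable in document order.
plain_text = "".join(paragraph.itertext())
print(plain_text)
```

Expressing the same sentence in JSON would need an ad hoc array of strings and objects, which is why document formats tend to stay with XML.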

XML's real weaknesses

  • Verbosity. Every element name appears twice, in the opening and the closing tag.
  • Attributes vs elements debate. The same data can be expressed multiple ways.
  • Namespaces. Powerful but complex. Adding a namespace breaks code that looks elements up by bare tag name.
  • Security: XXE attacks. External entity expansion can read arbitrary files or DoS the server.
  • Heavyweight tooling. XPath, XSLT, XSD all add complexity beyond what JSON apps need.
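
The namespace problem is easy to reproduce. A small sketch with the standard xml.etree.ElementTree, using a one-element Atom feed as the example document:

```python
import xml.etree.ElementTree as ET

feed = ET.fromstring(
    '<feed xmlns="http://www.w3.org/2005/Atom"><title>Example Feed</title></feed>'
)

# The default namespace becomes part of every tag name.
print(feed.tag)

# A lookup by bare tag name silently finds nothing.
naive = feed.find("title")
print(naive)

# A namespace-aware lookup needs an explicit prefix mapping.
ATOM = {"atom": "http://www.w3.org/2005/Atom"}
qualified = feed.find("atom:title", ATOM)
print(qualified.text)
```

The failure mode is quiet: no exception, just a missing element, which is what makes namespaces a common source of bugs.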

Security comparisons

JSON

Generally safe. Don't use eval() to parse it (always use JSON.parse). The old "JSONP" pattern is dangerous and obsolete.

YAML

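In PyYAML, yaml.load with an unsafe loader (yaml.Loader or yaml.UnsafeLoader) can execute arbitrary Python via tags such as !!python/object/apply. Always use yaml.safe_load for untrusted input. YAML libraries in other languages have had similar object-instantiation risks.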
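
A minimal sketch of the safe pattern, assuming the third-party PyYAML package (the import is guarded so the snippet loads without it). load_untrusted is a hypothetical wrapper name, and the payload is a harmless stand-in for a real attack.

```python
try:
    import yaml  # PyYAML, third-party (pip install pyyaml)
except ImportError:
    yaml = None

# An unsafe loader would call os.system here; safe_load refuses the tag.
MALICIOUS = "!!python/object/apply:os.system ['echo pwned']"


def load_untrusted(text: str):
    """Parse YAML from an untrusted source using only plain data types."""
    if yaml is None:
        raise RuntimeError("PyYAML is not installed")
    return yaml.safe_load(text)


if yaml is not None:
    print(load_untrusted("name: Jane Doe\nage: 32"))
    try:
        load_untrusted(MALICIOUS)
    except yaml.YAMLError as exc:
        print("rejected:", type(exc).__name__)
```

Routing every external YAML input through one wrapper like this makes it harder for an unsafe loader to slip in later.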

XML

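XXE (XML External Entity) attacks: malicious XML can declare an external entity that reads local files (such as /etc/passwd). A related attack, billion laughs, uses nested internal entities to exhaust memory. Disable DTD processing and external entity resolution in your parser unless you specifically need them.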
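
One blunt but effective approach is to reject any document that carries a DOCTYPE at all, since both attacks need one. The sketch below does this with Python's standard library only. parse_xml_without_dtd and DTDForbidden are hypothetical names, and the double parse is a simplification for clarity rather than a tuned implementation.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat


class DTDForbidden(ValueError):
    """Raised when a document contains a DOCTYPE declaration."""


def parse_xml_without_dtd(text: str) -> ET.Element:
    def reject_doctype(name, system_id, public_id, has_internal_subset):
        raise DTDForbidden(f"DOCTYPE {name!r} is not allowed")

    # First pass: expat calls the handler as soon as a DOCTYPE starts,
    # before any entity declarations inside it are processed.
    checker = xml.parsers.expat.ParserCreate()
    checker.StartDoctypeDeclHandler = reject_doctype
    checker.Parse(text, True)

    # Second pass: build the tree now that the document is known to have no DTD.
    return ET.fromstring(text)


XXE_PAYLOAD = (
    '<?xml version="1.0"?>'
    '<!DOCTYPE person [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>'
    "<person><name>&xxe;</name></person>"
)

try:
    parse_xml_without_dtd(XXE_PAYLOAD)
except DTDForbidden as exc:
    print("rejected:", exc)
```

Documents that legitimately need a DTD will be rejected too, so this fits APIs and config loaders better than publishing pipelines.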

The decision framework

Use JSON for

  • HTTP APIs
  • JavaScript-heavy applications
  • Simple data interchange
  • Browser-side data
  • Speed-critical parsing

Use YAML for

  • Application configuration files
  • CI/CD pipelines (GitHub Actions, GitLab)
  • Kubernetes manifests, Docker Compose
  • Hand-edited multi-environment configs
  • Data with comments needed

Use XML for

  • Document-shaped data, where text and markup intermix
  • Industries with mature XML standards
  • Strict schema validation that needs XSD's power

Conversions and tooling

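  • JSON ↔ YAML: yq, online converters (jq itself handles only JSON). JSON to YAML is lossless, since YAML can express everything JSON can; YAML to JSON drops comments and anchors, and has no equivalent for non-string mapping keys.
  • JSON ↔ XML: not lossless. XML's attributes-vs-elements choice creates ambiguity. Use xq (jq-like for XML).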
  • YAML ↔ XML: rare but possible via JSON intermediate.
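
The attribute-versus-element ambiguity can be shown with a deliberately naive converter. This is a sketch using the standard library; element_to_dict is a hypothetical helper that ignores repeated children and mixed content.

```python
import xml.etree.ElementTree as ET


def element_to_dict(elem: ET.Element):
    """Naive conversion: attributes and child elements share one set of keys."""
    if len(elem) == 0 and not elem.attrib:
        return elem.text
    result = dict(elem.attrib)
    for child in elem:
        result[child.tag] = element_to_dict(child)
    return result


as_attribute = ET.fromstring('<person id="7"><name>Jane Doe</name></person>')
as_element = ET.fromstring("<person><id>7</id><name>Jane Doe</name></person>")

# Two different XML documents collapse to the same JSON-like structure,
# so converting back cannot know which form to emit.
print(element_to_dict(as_attribute))
print(element_to_dict(as_element))
```

Real converters avoid the collision with conventions such as prefixing attribute keys, but each tool picks its own convention, so round trips across tools are not reliable.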

Newer alternatives worth knowing

  • TOML: simpler than YAML, supports comments, less ambiguity. Used by Cargo (Rust) and Python's pyproject.toml.
  • HCL (HashiCorp Configuration Language): Terraform configs. Mix of declarative and expressions.
  • JSON5 / JSONC: JSON with comments and trailing commas. Limited adoption but useful for configs.
  • Protocol Buffers / Avro / MessagePack: binary formats for high-throughput, schema-driven systems.

Common mistakes

  • Using YAML for APIs. Slow to parse, ambiguous typing, harder client tooling.
  • Using JSON for hand-editable config. No comments, no trailing commas, brittle for humans.
  • Using XML for new APIs. Verbose, heavyweight, and most clients prefer JSON.
  • Trusting YAML's implicit typing. Quote anything that could ambiguously be a number, boolean, or null.
  • Allowing XML external entities by default. Disable XXE unless explicitly needed.

Key Takeaways

  • JSON for APIs (fast, native, simple). YAML for configs (comments, less syntax). XML for documents (mixed content, schema validation).
  • YAML's implicit typing creates traps: NO becomes false, "01" might become 1. Quote ambiguous values.
  • Security matters: PyYAML's yaml.load can run arbitrary code on untrusted input (use safe_load); XML XXE attacks are real (disable DTDs and external entities).
  • JSON has no comments and no trailing commas, and many parsers lose precision on integers beyond 2^53. JSON5 and JSONC help with the first two for configs.
  • XML still wins for document-shaped data, mixed content, and industries with XML-based standards.