Realistic Fake Data in Java

Realistic Fake Test Data in Java — with DataFaker + EasyRandom (Quarkus and Spring implementation)

• Why fake data matters: demos that feel real, safer staging, better tests and load simulations.
• Quick compare: DataFaker for lifelike fields; EasyRandom for full object graphs. They’re better together.
• Core patterns: generate objects with EasyRandom, then “polish” key fields (name, email, address) using DataFaker.
• API design: clean endpoints (e.g., /users/{count}), consistent envelope with server-side timestamp.
• Determinism: fixed seeds for reproducible tests; dynamic seeds for live demos.
• Localisation: swap locales to make data region-aware (names, addresses, phone formats).
• Large datasets: stream CSV/JSON for seeding databases and performance testing.
• Testing workflow: reusable factories, stable seeds in CI, quick Postman checks.
• Observability: timestamps in responses, lightweight health/ping for sanity.
• Framework-agnostic: generation lives in plain Java; works with Quarkus or Spring—no lock-in.
• Security & ethics: never mix real PII, label fake data, document assumptions.
• Common gotchas: email/phone formatting, string lengths, nulls in nested objects—tune parameters.
• Demo walkthrough: generate users, toggle EasyRandom/locale/seed, export 10k rows.
• Cheat sheet: copy-paste snippets for DataFaker, EasyRandom, and the blend pattern.
• TL;DR: EasyRandom gives structure, DataFaker gives realism—together they produce production-like test data fast.

Author Information:

Wallace Espindola
Software Engineer Sr., Solution Architect, Java & Python Dev

- LinkedIn: https://www.linkedin.com/in/wallaceespindola/
- GitHub: https://github.com/wallaceespindola
- Twitter: https://x.com/wsespindola
- Gravatar: https://gravatar.com/wallacese
- Dev Community: https://dev.to/wallaceespindola
- DZone Articles: https://dzone.com/users/1254611/wallacese.html
- Website: https://www.wtechitsolutions.com/

October 08, 2025

Transcript

  1. Realistic Fake Data in Java

    With DataFaker + EasyRandom: practical patterns for APIs, tests, and demos
    Wallace Espindola, Solution Architect
  2. Agenda

    We'll cover:
    • Why fake data matters
    • DataFaker vs EasyRandom (quick compare)
    • Core patterns & recipes
    • API & testing workflows
    • Production-ish practices (observability, determinism)
    • Short demo plan
    • Wrap-up & resources
  3. Why fake data matters

    What you gain:
    • Realistic demos and prototypes (no more John Doe).
    • Stronger test coverage with varied inputs.
    • Repeatable load & chaos testing with large datasets.
    • Safer staging data without touching production records.
  4. DataFaker vs EasyRandom

    Aspect    DataFaker                                  EasyRandom
    Goal      Realistic, localized field values          Auto-populate full object graphs
    Best at   Names, emails, addresses, business, etc.   POJOs, nested objects, collections
    Usage     faker.name().fullName()                    random.nextObject(User.class)
    Pros      Believable data; locales                   Zero boilerplate for complex models
    Together  Polish fields with realism                 Generate structure; then fine-tune
  5. DataFaker: quick start

        import net.datafaker.Faker;

        Faker faker = new Faker();
        String fullName = faker.name().fullName();
        String email = faker.internet().emailAddress();
        String address = faker.address().fullAddress();

        // Locale example (forLanguageTag avoids the Locale constructors deprecated since Java 19):
        Faker pt = new Faker(java.util.Locale.forLanguageTag("pt"));
        String nome = pt.name().fullName();
  6. EasyRandom: object population

        import org.jeasy.random.EasyRandom;
        import org.jeasy.random.EasyRandomParameters;

        EasyRandomParameters params = new EasyRandomParameters()
            .seed(System.currentTimeMillis())
            .stringLengthRange(5, 20);
        EasyRandom random = new EasyRandom(params);
        User user = random.nextObject(User.class);
  7. Combine both: structure + realism

        User u = random.nextObject(User.class);
        if (u.getId() == null || u.getId().isBlank()) {
            u.setId(java.util.UUID.randomUUID().toString());
        }
        u.setFullName(faker.name().fullName());
        u.setEmail(faker.internet().emailAddress());
  8. API pattern (framework-agnostic)

    Principles:
    • All responses include a server-side timestamp.
    • Prefer path variables for simple params (e.g., /users/{count}).
    • Offer GET for idempotent reads; for demos, mirror POST-only endpoints with a safe GET.
    • Document via OpenAPI / Swagger UI for easy discovery.

        {
          "data": [
            { "id": "...", "fullName": "...", "email": "...", "phone": "...", "address": "..." }
          ],
          "timestamp": "2025-10-07T12:34:56Z"
        }
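
The envelope shape above could be modeled as a small generic record. This is a minimal sketch under stated assumptions, not the deck's actual implementation: the name `ApiResponse` and the `Clock` parameter are hypothetical, introduced here so the timestamp can be pinned in tests.

```java
import java.time.Clock;
import java.time.Instant;

// Hypothetical envelope type: wraps any payload together with a server-side timestamp.
record ApiResponse<T>(T data, Instant timestamp) {

    // Stamps the payload using the supplied clock; pass Clock.systemUTC() in production code.
    static <T> ApiResponse<T> of(T data, Clock clock) {
        return new ApiResponse<>(data, Instant.now(clock));
    }
}
```

A controller or resource would return `ApiResponse.of(users, Clock.systemUTC())`. How the `Instant` is rendered in JSON depends on the mapper configuration of the hosting framework.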
  9. Observability: timestamps everywhere

    Why timestamps help:
    • Correlate client logs with server events.
    • Measure end-to-end latency in demos.
    • Easy debugging across distributed services.
    • Great for screenshot-friendly outputs.
  10. Determinism: seeds & reproducibility

    Tips:
    • Use a fixed seed to reproduce exact datasets in tests.
    • Use dynamic seeds for demos/live streams.
    • Keep a toggle to switch between deterministic and random modes.

        // Deterministic EasyRandom
        EasyRandom r = new EasyRandom(new EasyRandomParameters().seed(42));

        // Deterministic DataFaker
        Faker deterministic = new Faker(new java.util.Random(42));
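
The toggle between deterministic and random modes could be as small as one helper. The sketch below uses only `java.util.Random`; the class name `Seeds` and the convention that a `null` seed means "random mode" are assumptions made for illustration.

```java
import java.util.Random;

// Hypothetical helper: one place that decides between reproducible and fresh randomness.
final class Seeds {
    private Seeds() {}

    // A non-null seed yields a repeatable sequence; null yields an unseeded Random.
    static Random randomFor(Long seed) {
        return seed != null ? new Random(seed) : new Random();
    }
}
```

The returned `Random` could then be handed to `new Faker(...)` as in the slide, and the same seed value passed to `EasyRandomParameters.seed(...)`, so both generators follow the same mode.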
  11. Localization & realism

    Make data feel real:
    • Pick locale per environment or per request (e.g., ?locale=pt).
    • Mix locales for international datasets.
    • Ensure email/phone formats match the locale when showcased.
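
A per-request locale such as `?locale=pt` needs a safe fallback for missing or unexpected values. The following is a sketch under assumptions: the class name `LocaleResolver`, the supported-language list, and the English fallback are all hypothetical choices, not something the deck specifies.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical resolver for a "locale" request parameter.
final class LocaleResolver {
    // Assumed allowlist; adjust to the locales the demo actually showcases.
    private static final Set<String> SUPPORTED = Set.of("en", "pt", "de", "fr");
    private static final Locale FALLBACK = Locale.ENGLISH;

    private LocaleResolver() {}

    // Accepts tags like "pt", "pt-BR" or "pt_BR"; anything else falls back to English.
    static Locale resolve(String param) {
        if (param == null || param.isBlank()) {
            return FALLBACK;
        }
        Locale candidate = Locale.forLanguageTag(param.trim().replace('_', '-'));
        return SUPPORTED.contains(candidate.getLanguage()) ? candidate : FALLBACK;
    }
}
```

The resolved `Locale` can be passed straight to the `Faker` constructor shown in the quick-start slide.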
  12. Generating large datasets (CSV/JSON)

        int n = 10_000;
        try (java.io.PrintWriter out = new java.io.PrintWriter("users.csv")) {
            out.println("id,fullName,email,phone,address");
            for (int i = 0; i < n; i++) {
                User u = fakerUser(); // or EasyRandom + polish
                // Caution: values containing commas or quotes (e.g. addresses) must be CSV-quoted first.
                out.printf("%s,%s,%s,%s,%s%n",
                    u.getId(), u.getFullName(), u.getEmail(), u.getPhone(), u.getAddress());
            }
        }
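
The export loop above writes field values as they are, and generated addresses often contain commas, which would shift columns when the file is opened in a spreadsheet. A minimal quoting helper in the style of RFC 4180 might look like this; the class name `Csv` and its methods are hypothetical, not part of the deck or of either library.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical CSV helper: quotes only the fields that need it.
final class Csv {
    private Csv() {}

    // Wraps the value in double quotes when it contains a comma, quote or line break,
    // doubling any embedded quotes. Null becomes an empty field.
    static String escape(String value) {
        if (value == null) {
            return "";
        }
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        if (!needsQuotes) {
            return value;
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Joins already-ordered field values into one CSV record (no line terminator).
    static String row(String... fields) {
        return Arrays.stream(fields).map(Csv::escape).collect(Collectors.joining(","));
    }
}
```

In the loop, `out.println(Csv.row(u.getId(), u.getFullName(), u.getEmail(), u.getPhone(), u.getAddress()))` would replace the `printf` call.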
  13. Testing workflows

    Patterns:
    • Use EasyRandom for object graphs in unit tests.
    • Override a few fields with DataFaker.
    • Create reusable factories (e.g., UserFactory) for clarity.
    • Keep seeds fixed in CI to avoid flaky tests.
    • Bundle a Postman collection for API checks.
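
One possible shape for such a factory is sketched below using only the JDK. Everything here is an assumption for illustration: the deck never shows its `User` type, so the record is hypothetical, and the `Supplier` parameters stand in for DataFaker calls such as `() -> faker.name().fullName()`.

```java
import java.util.Random;
import java.util.UUID;
import java.util.function.Supplier;

// Hypothetical DTO: the real User class is not shown in the deck.
record User(String id, String fullName, String email) {}

// Reusable factory: value sources are injected, so each test picks its own seed and fakes.
final class UserFactory {
    private final Random random;
    private final Supplier<String> names;
    private final Supplier<String> emails;

    UserFactory(Random random, Supplier<String> names, Supplier<String> emails) {
        this.random = random;
        this.names = names;
        this.emails = emails;
    }

    // Builds a user; the id is derived from the injected Random, so a fixed seed repeats it.
    User create() {
        String id = new UUID(random.nextLong(), random.nextLong()).toString();
        return new User(id, names.get(), emails.get());
    }

    // Same as create(), but pins the email for tests that assert on it.
    User withEmail(String email) {
        User base = create();
        return new User(base.id(), base.fullName(), email);
    }
}
```

With a fixed seed in CI, two factories built from `new Random(42)` produce the same ids in the same order, which keeps assertions stable.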
  14. Health & sanity checks

    Ideas:
    • Expose a health endpoint that includes a timestamp detail.
    • Add a lightweight /ping returning { "ok": true, "timestamp": ... }.
    • Log a tiny sample of generated data at startup for quick visibility.
  15. Keep it framework-agnostic

    Guidance:
    • The generation logic (DataFaker + EasyRandom) lives in plain Java services.
    • Controllers/resources stay thin; any web framework can host them.
    • Works great with Spring or Quarkus; the core code stays the same.
    • Focus on portability: DTOs (records) + minimal dependencies.
  16. Demo flow

    Run-through:
    • 1. Hit GET /users/{count} → see realistic data + timestamp.
    • 2. Toggle ?easy=true → object population differs slightly.
    • 3. Switch locales → names/addresses feel regional.
    • 4. Export 10k users → CSV, open in spreadsheet.
    • 5. Brief on seeds → re-run deterministic dataset.
  17. Security & ethics

    Be careful:
    • Never mix real data with fake data in the same dataset.
    • Label fake data clearly in demos and logs.
    • If masking real data, ensure one-way transforms.
    • Document locale assumptions and format limitations.
  18. Common gotchas

    Watch for:
    • Email format realism vs. deliverability (don't spam domains).
    • Phone number formatting per region.
    • Long strings & edge cases (min/max length).
    • Nulls in nested objects when EasyRandom rules are too strict; tune parameters.
  19. Cheat sheet (copy & paste)

        // DataFaker
        Faker faker = new Faker();
        String name = faker.name().fullName();
        String email = faker.internet().emailAddress();

        // EasyRandom
        EasyRandomParameters p = new EasyRandomParameters().seed(42).stringLengthRange(5, 20);
        EasyRandom rnd = new EasyRandom(p);
        User u = rnd.nextObject(User.class);

        // Blend
        u.setFullName(faker.name().fullName());
        u.setEmail(faker.internet().emailAddress());
  20. TL;DR

    Takeaways:
    • Use DataFaker for realism.
    • Use EasyRandom for structure.
    • Seeded randomness = reproducible tests.
    • Return timestamps for observability.
    • Locale-aware data makes demos shine.
    • Keep the core generation framework-agnostic.
  21. Resources & next steps

    Try this next:
    • Add request params: count, locale, seed.
    • Produce CSV/JSON/SQL dumps to seed environments.
    • Bundle Postman/HTTP files for the team.
    • Wire simple metrics around generation time/count.
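
For the last item, a simple metric can be gathered by timing the generation loop itself. This is only a sketch: `GenerationStats` and `TimedGenerator` are hypothetical names, and a real service would more likely forward these numbers to whatever metrics library it already uses.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical result type: how many items were produced and how long it took.
record GenerationStats(int count, long elapsedNanos) {}

final class TimedGenerator {
    private TimedGenerator() {}

    // Pulls `count` items from the source, hands each to the sink, and reports the wall time.
    static <T> GenerationStats generate(int count, Supplier<T> source, Consumer<T> sink) {
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            sink.accept(source.get());
        }
        return new GenerationStats(count, System.nanoTime() - start);
    }
}
```

The source can be any generator (for example a factory method that blends EasyRandom and DataFaker), and the sink can be a list, a CSV writer, or a database batch.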
  22. Let's stay connected

    LinkedIn: linkedin.com/in/wallaceespindola
    GitHub: github.com/wallaceespindola
    Twitter: @wsespindola
    Dev Community: dev.to/wallaceespindola
    DZone Articles: dzone.com/users/1254611/wallacese.html
    Slides: speakerdeck.com/wallacese
    Medium: medium.com/@wallaceespindola
    Substack: wallaceespindola.substack.com
    Pulse: linkedin.com/in/wallaceespindola/recent-activity/articles/

    Thank you!!!