AI Engineer - LLM Architect

Pittsburgh, PA
Full Time
Experienced

AI Engineer (LLM Architect), Emotional Companion

The Role:

In this role, you will help design and ship an emotionally intelligent conversational companion that reduces loneliness and improves daily life for older adults. You will architect the end-to-end AI stack, move fast with real users, and set the technical bar for the team.

What you will do:

  • Architecture: Design the conversational system from intake to response. Own policy, generation, tool use, long-term memory, personalization, and retrieval.
  • Model selection and training: Choose base models, build data pipelines, and run instruction tuning, safety tuning, and preference optimization. Use techniques such as LoRA, DPO, distillation, and quantization to reach latency and cost targets.
  • Prompt and agent design: Create robust system prompts, function-calling schemas, and tool APIs. Stand up an A/B framework to test prompts, policies, and safety rules with real users.
  • Evaluation: Build an automated and human-in-the-loop eval harness for empathy, helpfulness, safety, groundedness, latency, and cost. Define success metrics and wire them into dashboards.
  • Safety and ethics: Implement guardrails for prompt injection, jailbreaks, self-harm, medical boundaries, and misinformation. Add escalation, deflection, and human handoff paths that respect user consent.
  • Data and privacy: Set standards for PII handling, redaction, consent management, anonymization, and secure storage. Curate, generate, and label data that reflects diverse seniors and scenarios.
  • Serving and MLOps: Ship models to production using efficient inference stacks. Add observability, tracing, rollback, canary releases, and a model registry. Keep the system fast, stable, and affordable.
  • Voice pipeline: Integrate ASR and TTS with expressive prosody, and handle barge-in, turn-taking, and latency budgets so that conversation feels natural.
  • Collaboration: Work with design and research to translate user studies into product requirements. Mentor teammates and help make pragmatic build-vs-buy decisions.

Required skills & experience:

  • Deep Python: Production-grade code, profiling, testing, and packaging.
  • LLM implementation: Strong PyTorch and experience training or fine-tuning open models (e.g., Llama, Mistral, Qwen) including tokenizer issues, data curation, and distributed training with FSDP or DeepSpeed.
  • Inference and optimization: Quantization (GGUF, GPTQ, AWQ), serving stacks (vLLM, TensorRT-LLM, llama.cpp), caching, KV-reuse, streaming, and throughput tuning.
  • Prompt engineering and tool use: System and developer prompts, function calling, tool orchestration, and failure handling. Ability to make prompts measurable and testable.
  • Retrieval-augmented generation: Indexing, chunking, reranking, and grounding. Experience with FAISS, Milvus, Vespa, or Pinecone. Understanding of hallucination mitigation.
  • Evaluation and experimentation: Human ratings at scale, rubric design for empathy and safety, statistical testing, and online A/B testing. Comfort turning qualitative findings into quantitative KPIs.
  • Security and privacy: PII handling, threat modeling for LLMs, prompt-level defenses, rate limiting, abuse detection. Familiarity with HIPAA-adjacent expectations and SOC 2 practices.
  • Product mindset: Ability to ship thin slices, instrument them, and iterate quickly based on user feedback.

Nice-to-have:

  • Affective computing: Emotion and intent classifiers, prosody features, conversation state tracking, de-escalation strategies.
  • Speech: ASR, diarization, VAD, latency-aware pipelines, expressive TTS.
  • Reinforcement and preference learning: DPO, PPO, ORPO, reward modeling, red-teaming loops.
  • On-device and edge: GPU and CPU constraints, memory mapping, mixed precision, mobile or embedded deployment.
  • Compliance awareness: Experience in healthcare or aging tech, consent UX, accessibility standards.
  • HCI and conversation design: Persona, turn-taking, long-term rapport, and evaluation methods suited for vulnerable users.

What success looks like in 90 days:

  • A production-ready conversational MVP with safety guardrails and memory that passes internal red-team checks.
  • An eval harness with live dashboards for empathy, safety, groundedness, latency, and cost per session.
  • A prompt and policy library with A/B tests running weekly and clear learnings.
  • A data pipeline with redaction, consent flags, and a high-quality instruction-tuning set sourced from real use.

Tools you might use:

Python, PyTorch, vLLM or TensorRT-LLM, llama.cpp, Weights & Biases, Ray, FAISS or Milvus, Redis, Postgres, Kubeflow or Flyte, Grafana or OpenTelemetry, Whisper or similar ASR, high-quality TTS, and standard MLOps tooling.

About us:

We are a new, funded, stealth-mode AI startup focused on addressing the negative effects of isolation. You will work with a group of experienced tech entrepreneurs and AI technologists.

What we offer:

  • Competitive base salary
  • Cash bonus
  • Equity stake
  • Unlimited PTO Plan
  • Dental, Vision, and Health Insurance
  • Hybrid Work Schedule in Pittsburgh, PA
  • We sponsor OPT and STEM OPT only

Please include a 1-3 minute video in which you explain your design rationale and demo a working system that you built in full, using any LLM you can implement. A longer video is okay, but no more than 5 minutes. The audience will be our co-founder/hiring manager, who has a lot of technical experience, so be as technical as you can!