# Employee Multi-Month Report Generator - Technical Documentation

## Project Overview

**Purpose:** Generate professional HTML reports showing employee work patterns over multiple months (1-12 months) based on Slack presence tracking data.

**Current Status:** ✅ Fully functional, generating complete HTML reports with interactive charts

**Latest Request:** User wants to produce reports for multiple employees covering several months of data. The system is working correctly, including for employees with non-ASCII characters in their names (e.g., "Rūta" with the Lithuanian ū character).

---

## System Architecture

### Data Flow

```
Raw CSV Files (minute-by-minute)
        ↓
Employee Report Generator Script
        ↓
HTML Report (self-contained)
        ↓
Browser Display / Print to PDF
```

### File Locations

```
slack-presence-tracker/
├── generate_employee_report.py        ← Main report generator
├── data/
│   └── raw/
│       └── username_YYYY-MM-DD.csv    ← Source data files
└── reports/
    ├── employee_name_start_end.html   ← Generated reports
    └── index.html                     ← Directory listing (when using --all)
```

---

## Core Script: `generate_employee_report.py`

### Main Functions

#### 1. `get_date_range(months, start_date, end_date)`

**Purpose:** Calculate the date range for report generation

**Parameters:**
- `months` (int): Number of months to go back from today
- `start_date` (str): Custom start date in YYYY-MM-DD format
- `end_date` (str): Custom end date in YYYY-MM-DD format

**Returns:** Tuple of `(start_date, end_date)` as `datetime.date` objects

**Logic:**
- If both start and end dates are provided: use those dates
- If only `months` is provided: go back `months × 30` days from today
- Default: last 3 months (90 days)

---
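A minimal sketch of the date-range rules above, using the documented approximation of one month as 30 days. The extra `today` parameter and the start-after-end check are assumptions added for testability and safety. They are not part of the documented signature.

```python
from datetime import date, datetime, timedelta
from typing import Optional, Tuple

DEFAULT_MONTHS = 3
DAYS_PER_MONTH = 30  # documented approximation: N months = N * 30 days


def get_date_range(
    months: Optional[int] = None,
    start_date: Optional[str] = None,
    end_date: Optional[str] = None,
    today: Optional[date] = None,  # assumption: injectable "today" for testing
) -> Tuple[date, date]:
    """Return (start, end) as datetime.date objects."""
    if start_date and end_date:
        # An explicit range wins; months is ignored.
        start = datetime.strptime(start_date, "%Y-%m-%d").date()
        end = datetime.strptime(end_date, "%Y-%m-%d").date()
        if start > end:  # assumption: reject inverted ranges early
            raise ValueError(f"start date {start} is after end date {end}")
        return start, end

    end = today or date.today()
    n_months = months if months is not None else DEFAULT_MONTHS
    return end - timedelta(days=n_months * DAYS_PER_MONTH), end
```

With `today=date(2026, 2, 17)` and the default of 3 months, this yields 2025-11-19 to 2026-02-17, which is the range shown in the `generate_report()` console output example.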
#### 2. `load_user_data(user_name, start_date, end_date)`

**Purpose:** Load all CSV data for a specific user within the date range

**Parameters:**
- `user_name` (str): User's display name or user ID
- `start_date` (date): Start of period
- `end_date` (date): End of period

**Returns:** List of dictionaries, each containing one minute of data

**Process:**
1. Scans the `data/raw/` directory for CSV files
2. Extracts the date from each filename (format: `username_YYYY-MM-DD.csv`)
3. Filters files by date range
4. Reads each CSV file
5. Filters rows matching `user_name` or `user_id`
6. Extracts the time from the timestamp field
7. Returns the combined list of all matching rows

**Data Structure Example:**
```python
{
    'date': '2025-11-19',
    'file_date': datetime.date(2025, 11, 19),
    'user_id': 'U123ABC',
    'user_name': 'Rūta',
    'presence': 'active',
    'timestamp': '2025-11-19T08:30:00+00:00',
    'time': '08:30',
    'department': 'Engineering',
    'team': 'Backend'
}
```

**Important Notes:**
- Handles multiple timestamp formats (ISO, space-separated, time-only)
- Supports UTF-8 characters in user names (e.g., ū, ė, š)
- Matching on `user_name` is case-sensitive

---

#### 3. `analyze_daily_activity(data)`

**Purpose:** Aggregate minute-by-minute data into daily statistics

**Parameters:**
- `data` (list): Raw minute-by-minute data from `load_user_data()`

**Returns:** List of daily statistics dictionaries

**Calculations:**
- Total minutes tracked per day
- Active minutes (where `presence == 'active'`)
- First active time (earliest `'active'` presence)
- Last active time (latest `'active'` presence)
- Active hours (`active_minutes / 60`)
- Activity rate (`active_minutes / total_minutes * 100`)

**Output Example:**
```python
{
    'date': '2025-11-19',
    'day_of_week': 'Wednesday',
    'total_minutes': 540,
    'active_minutes': 480,
    'first_active': '08:00',
    'last_active': '17:00',
    'active_hours': 8.0,
    'activity_rate': 88.9
}
```

---
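The sketch below illustrates two of the steps described above. First, it pulls `HH:MM` out of the three timestamp shapes that `load_user_data()` is said to accept. Second, it folds minute rows into the per-day records that `analyze_daily_activity()` returns. It assumes each row already carries `date`, `time`, and `presence` keys, as in the data structure example. `extract_time` is a hypothetical helper name and is not necessarily what the script uses.

```python
from collections import defaultdict
from datetime import date


def extract_time(timestamp):
    """Return 'HH:MM' from an ISO, space-separated, or time-only timestamp."""
    ts = timestamp.strip()
    if "T" in ts:        # '2025-11-19T08:30:00+00:00'
        ts = ts.split("T", 1)[1]
    elif " " in ts:      # '2025-11-19 08:30:00'
        ts = ts.split(" ", 1)[1]
    return ts[:5]        # '08:30:00+00:00' or '08:30' -> '08:30'


def analyze_daily_activity(data):
    """Fold minute rows into one statistics dict per date."""
    rows_by_day = defaultdict(list)
    for row in data:
        rows_by_day[row["date"]].append(row)

    daily_stats = []
    for day in sorted(rows_by_day):
        rows = rows_by_day[day]
        # Zero-padded 'HH:MM' strings sort chronologically.
        active_times = sorted(r["time"] for r in rows if r["presence"] == "active")
        total, active = len(rows), len(active_times)
        daily_stats.append({
            "date": day,
            "day_of_week": date.fromisoformat(day).strftime("%A"),
            "total_minutes": total,
            "active_minutes": active,
            "first_active": active_times[0] if active_times else None,
            "last_active": active_times[-1] if active_times else None,
            "active_hours": round(active / 60, 1),
            "activity_rate": round(active / total * 100, 1) if total else 0.0,
        })
    return daily_stats
```

Because the times are zero-padded strings, plain string sorting is enough to find the first and last active minute. No datetime parsing is needed for that step.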
#### 4. `group_by_week(daily_stats)`

**Purpose:** Group daily statistics into weekly summaries

**Parameters:**
- `daily_stats` (list): Output from `analyze_daily_activity()`

**Returns:** List of week dictionaries

**Logic:**
- Uses ISO week numbers (`week_num` from `date.isocalendar()[1]`)
- Groups consecutive days with the same week number
- Calculates weekly totals and averages

**Output Example:**
```python
{
    'week_num': 47,
    'start_date': '2025-11-17',
    'end_date': '2025-11-23',
    'days': [...],           # list of daily_stats entries
    'total_hours': 42.5,
    'avg_hours': 8.5,
    'working_days': 5
}
```

---

#### 5. `group_by_month(daily_stats)`

**Purpose:** Group daily statistics into monthly summaries with embedded weeks

**Parameters:**
- `daily_stats` (list): Output from `analyze_daily_activity()`

**Returns:** List of month dictionaries

**Output Example:**
```python
{
    'month': '2025-11',
    'month_name': 'November 2025',
    'days': [...],           # all days in November
    'weeks': [...],          # weekly summaries
    'total_hours': 168.5,
    'avg_hours_per_day': 7.7,
    'working_days': 22,
    'total_days': 30
}
```

---

#### 6. `calculate_time_patterns(daily_stats)`

**Purpose:** Analyze typical work patterns across all days

**Parameters:**
- `daily_stats` (list): Output from `analyze_daily_activity()`

**Returns:** Dictionary of time pattern metrics

**Calculations:**
- Typical start time (median of all start times)
- Typical end time (median of all end times)
- Earliest start ever
- Latest end ever
- Hourly activity percentages (hours 0-23)

**Output Example:**
```python
{
    'typical_start': '08:15',
    'typical_end': '17:30',
    'earliest_ever': '07:15',
    'latest_ever': '20:15',
    'hourly_activity': {
        0: 0.0,
        1: 0.0,
        # ...
        8: 95.0,    # 95% of days active at 8 AM
        9: 100.0,   # 100% of days active at 9 AM
        # ...
        17: 85.0,
        18: 20.0,
        # ...
        23: 0.0
    }
}
```

---
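Below is a sketch of the weekly grouping and of the median-based typical times. It assumes `daily_stats` entries are shaped like the `analyze_daily_activity()` output. One choice here is an assumption rather than documented behavior: a "working day" is taken to be any day with `active_hours > 0`. `typical_times` is a hypothetical, partial stand-in for `calculate_time_patterns()`. Hourly activity percentages are left out because they need minute-level data rather than daily summaries.

```python
from datetime import date
from itertools import groupby
from statistics import median


def group_by_week(daily_stats):
    """Group consecutive days sharing an ISO week number into weekly summaries."""
    def week_of(day):
        return date.fromisoformat(day["date"]).isocalendar()[1]

    weeks = []
    for week_num, group in groupby(sorted(daily_stats, key=lambda d: d["date"]), key=week_of):
        days = list(group)
        worked = [d for d in days if d["active_hours"] > 0]  # assumption: "working day"
        total = round(sum(d["active_hours"] for d in days), 1)
        weeks.append({
            "week_num": week_num,
            "start_date": days[0]["date"],
            "end_date": days[-1]["date"],
            "days": days,
            "total_hours": total,
            "avg_hours": round(total / len(worked), 1) if worked else 0.0,
            "working_days": len(worked),
        })
    return weeks


def _minutes(hhmm):
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def _hhmm(total_minutes):
    total_minutes = int(round(total_minutes))
    return f"{total_minutes // 60:02d}:{total_minutes % 60:02d}"


def typical_times(daily_stats):
    """Median start/end plus earliest start and latest end, as 'HH:MM'."""
    starts = [_minutes(d["first_active"]) for d in daily_stats if d["first_active"]]
    ends = [_minutes(d["last_active"]) for d in daily_stats if d["last_active"]]
    if not starts:
        return {}
    return {
        "typical_start": _hhmm(median(starts)),
        "typical_end": _hhmm(median(ends)),
        "earliest_ever": _hhmm(min(starts)),
        "latest_ever": _hhmm(max(ends)),
    }
```

The sketch converts times to minutes since midnight before taking the median, so that averaging two middle values for an even number of days produces a valid clock time.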
#### 7. `generate_html_report(...)`

**Purpose:** Generate the complete HTML report with embedded CSS, JavaScript, and data

**Parameters:**
- `user_name` (str): Employee name
- `daily_stats` (list): Daily statistics
- `months_data` (list): Monthly summaries with weeks
- `time_patterns` (dict): Time pattern analysis
- `start_date` (date): Report start date
- `end_date` (date): Report end date

**Returns:** Complete HTML string (2000-3000 lines)

**HTML Structure:**
```html
<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
    <title>Activity Report - {user_name}</title>
    ...
</head>
<body>
    ...
    <h2>Overall Summary</h2>
    <h2>Monthly Comparison Chart</h2>
    ...
    <h2>Time Patterns</h2>
</body>
</html>
```

**Key Features:**
- Self-contained (no external files except the Chart.js CDN)
- UTF-8 encoding for international characters
- Responsive design
- Print-friendly CSS
- Interactive collapse/expand sections
- Color-coded daily status (normal/short/absent/weekend)

---

#### 8. `generate_report(user_name, months, start_date, end_date)`

**Purpose:** Main orchestration function

**Process:**
1. Calculate the date range
2. Load user data from CSV files
3. Analyze daily activity
4. Group by month (which includes grouping by week)
5. Calculate time patterns
6. Generate HTML
7. Save to file
8. Return the output path

**File Naming:**
- Format: `{safe_name}_{start_date}_{end_date}.html`
- Safe name: lowercase, spaces → underscores, dots → underscores
- Example: `rūta_2025-11-19_2026-02-17.html`

**Console Output:**
```
============================================================
GENERATING REPORT FOR: Rūta
============================================================
Date range: 2025-11-19 to 2026-02-17
Loading data for Rūta...
✅ Loaded 131,040 data points
Analyzing daily activity...
✅ Analyzed 91 days
Grouping by month...
✅ Processed 4 months
Calculating time patterns...
✅ Patterns calculated
Generating HTML report...
✅ Report generated: reports/rūta_2025-11-19_2026-02-17.html
File size: 112.3 KB
```

---

#### 9. `get_all_users()`

**Purpose:** Discover all users in the system by sampling CSV files

**Process:**
1. Samples the first 10 CSV files from `data/raw/`
2. Extracts unique `user_name` values
3. Returns a sorted list

**Returns:** List of user names

**Example:**
```python
['Bartosz Witkowski', 'Jane Doe', 'Rūta', 'Tomas Šimkus', ...]
```

---
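The file-naming rule from `generate_report()` and the sampling approach of `get_all_users()` each fit in a few lines. This sketch makes three assumptions: files are read as UTF-8 with a header row, "first 10 files" means the first 10 in sorted filename order, and the parameters on `get_all_users` (which default to the documented behavior) are illustrative. `safe_report_name` is a hypothetical helper name.

```python
import csv
from pathlib import Path


def safe_report_name(user_name, start, end):
    """Documented rule: lowercase, spaces and dots become underscores."""
    safe = user_name.lower().replace(" ", "_").replace(".", "_")
    return f"{safe}_{start}_{end}.html"


def get_all_users(raw_dir=Path("data/raw"), sample_size=10):
    """Collect unique user_name values from the first `sample_size` CSV files."""
    users = set()
    for csv_path in sorted(Path(raw_dir).glob("*.csv"))[:sample_size]:
        with csv_path.open(newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                name = (row.get("user_name") or "").strip()
                if name:
                    users.add(name)
    return sorted(users)
```

If each file holds a single user's data (as the `{username}_{YYYY-MM-DD}.csv` naming suggests), the first 10 files in sorted order may all belong to one user. In that case, scanning every file would be a safer way to discover users.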
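#### 10. `create_index_file(report_paths)`

**Purpose:** Generate `index.html` listing all reports

**Parameters:**
- `report_paths` (list): List of `Path` objects pointing to the generated reports

**Output:** `reports/index.html` with clickable links to all reports

**HTML Structure:**
```html
<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
    <title>Employee Reports - Index</title>
</head>
<body>
    <h1>📊 Employee Activity Reports</h1>
    <div>
        <h3>Rūta</h3>
        <p>2025-11-19 to 2026-02-17</p>
        <a href="rūta_2025-11-19_2026-02-17.html">View Report →</a>
    </div>
    ...
</body>
</html>
```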
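Below is a minimal sketch of `create_index_file()` that writes a page with the structure shown for the index. It assumes report filenames follow `{safe_name}_{start}_{end}.html`, so the name and dates can be recovered with `rsplit('_', 2)`. This yields the lowercased safe name ("rūta") rather than the display name ("Rūta"), so the real function may pass display names in separately. The sketch HTML-escapes names and percent-encodes links with `urllib.parse.quote` so that non-ASCII filenames are safe as `href` values. The `output_dir` parameter is an added assumption.

```python
import html
from pathlib import Path
from urllib.parse import quote


def create_index_file(report_paths, output_dir=Path("reports")):
    """Write output_dir/index.html with one card per report and return its path."""
    cards = []
    for path in sorted(Path(p) for p in report_paths):
        # Assumes '{safe_name}_{start}_{end}.html'; dates contain no underscores.
        name, start, end = path.stem.rsplit("_", 2)
        cards.append(
            "    <div>\n"
            f"        <h3>{html.escape(name)}</h3>\n"
            f"        <p>{start} to {end}</p>\n"
            f'        <a href="{quote(path.name)}">View Report →</a>\n'
            "    </div>"
        )
    page = "\n".join([
        "<!DOCTYPE html>",
        "<html>",
        "<head>",
        '    <meta charset="UTF-8">',
        "    <title>Employee Reports - Index</title>",
        "</head>",
        "<body>",
        "    <h1>📊 Employee Activity Reports</h1>",
        *cards,
        "</body>",
        "</html>",
    ])
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    index_path = output_dir / "index.html"
    index_path.write_text(page, encoding="utf-8")
    return index_path
```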
---

## Command-Line Interface

### Usage Examples

```bash
# Single employee, last 3 months (default)
python3 generate_employee_report.py --user "Rūta"

# Single employee, last 6 months
python3 generate_employee_report.py --user "Rūta" --months 6

# Single employee, custom date range
python3 generate_employee_report.py --user "Rūta" --start 2025-11-01 --end 2026-01-31

# All employees, last 3 months
python3 generate_employee_report.py --all --months 3

# All employees, last 12 months
python3 generate_employee_report.py --all --months 12
```

### Arguments

| Argument | Type | Default | Description |
|----------|------|---------|-------------|
| `--user` | string | None | User name to generate the report for |
| `--all` | flag | False | Generate reports for all users |
| `--months` | int | 3 | Number of months to include |
| `--start` | string | None | Start date (YYYY-MM-DD) |
| `--end` | string | None | End date (YYYY-MM-DD) |

**Validation:**
- Either `--user` or `--all` must be specified
- If both `--start` and `--end` are provided, `--months` is ignored
- Dates must be in YYYY-MM-DD format

---

## Data Requirements

### Source Data Format

**File Location:** `data/raw/`

**File Naming:** `{username}_{YYYY-MM-DD}.csv`

**Required CSV Columns:**
- `user_id` - Slack user ID
- `user_name` - Display name
- `timestamp` - ISO timestamp with timezone
- `presence` - `'active'` or `'away'`

**Optional CSV Columns:**
- `department` - Department name
- `team` - Team name

**Example CSV:**
```csv
timestamp,user_id,user_name,presence,department,team
2025-11-19T08:00:00+02:00,U123ABC,Rūta,active,Engineering,Backend
2025-11-19T08:01:00+02:00,U123ABC,Rūta,active,Engineering,Backend
2025-11-19T08:02:00+02:00,U123ABC,Rūta,away,Engineering,Backend
```

---
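The column requirements above can be checked before any processing happens. This is one of the error-handling enhancements listed under Future Enhancements. The sketch below is an assumption-based validator and is not part of the current script. It treats the four required columns and the two presence values as strict. It parses timestamps with `datetime.fromisoformat`, which accepts the ISO and space-separated forms but rejects the time-only values that `load_user_data()` tolerates, so loosen that check if you rely on them.

```python
import csv
from datetime import datetime
from pathlib import Path

REQUIRED_COLUMNS = {"timestamp", "user_id", "user_name", "presence"}
PRESENCE_VALUES = {"active", "away"}


def validate_csv(path):
    """Return a list of problems found in one raw CSV file (empty list = OK)."""
    problems = []
    with Path(path).open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            return [f"missing columns: {', '.join(sorted(missing))}"]
        # start=2 because line 1 is the header (assumes no multi-line fields)
        for line_no, row in enumerate(reader, start=2):
            if row["presence"] not in PRESENCE_VALUES:
                problems.append(f"line {line_no}: unexpected presence {row['presence']!r}")
            try:
                datetime.fromisoformat(row["timestamp"])
            except (TypeError, ValueError):  # TypeError covers short rows (None)
                problems.append(f"line {line_no}: unparseable timestamp {row['timestamp']!r}")
    return problems
```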
## Output Format

### HTML Report Structure

**Sections:**
1. **Header** - Name, date range, department/team
2. **Overall Summary** - 8 key metrics across the entire period
3. **Monthly Comparison** - Bar chart showing hours per month
4. **Monthly Breakdown** - Collapsible sections per month:
   - Monthly statistics (4 cards)
   - Weekly breakdown (collapsible)
   - Progress bar
   - Daily bar chart (Chart.js)
   - Daily table with start/end times
5. **Time Patterns** - Typical schedule and hourly activity

### Chart Configuration

**Library:** Chart.js 3.9.1 (loaded from CDN)

**Monthly Chart:**
- Type: Bar chart
- Data: Total hours per month
- Colors: Purple gradient (#667eea)

**Weekly Charts:**
- Type: Bar chart
- Data: Hours per day
- Colors: Purple for weekdays, gray for weekends
- Y-axis: 0-12 hours
- Labels: Day abbreviation + date (e.g., "Mon 11/17")

### Interactive Features

**Collapsible Sections:**
- Month headers: Click to expand/collapse
- Week headers: Click to expand/collapse
- Default: Most recent month expanded, others collapsed

**Toggle Functions:**
```javascript
// Expand or collapse a month section and flip its arrow indicator
function toggleMonth(id) {
    const content = document.getElementById(id);
    const toggle = document.getElementById('toggle-' + id);
    if (content.classList.contains('expanded')) {
        content.classList.remove('expanded');
        toggle.textContent = '▶';
    } else {
        content.classList.add('expanded');
        toggle.textContent = '▼';
    }
}
```

---
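The weekly chart settings above map onto a Chart.js 3 bar-chart configuration. This sketch builds that configuration in Python and serializes it with `json.dumps`. It assumes each day dict has `date` and `active_hours` keys. The exact gray (`#cccccc`), the dataset label, and the function names are illustrative assumptions. Escaping `</` keeps a value containing `</script>` from terminating an inline script block.

```python
import json
from datetime import date

WEEKDAY_COLOR = "#667eea"  # documented purple
WEEKEND_COLOR = "#cccccc"  # assumption: exact gray is not documented


def weekly_chart_data(days):
    """Labels like 'Mon 11/17', hours per day, and a color per bar."""
    labels, hours, colors = [], [], []
    for d in days:
        day = date.fromisoformat(d["date"])
        labels.append(day.strftime("%a %m/%d"))
        hours.append(d["active_hours"])
        colors.append(WEEKEND_COLOR if day.weekday() >= 5 else WEEKDAY_COLOR)
    return {
        "labels": labels,
        "datasets": [{"label": "Active hours", "data": hours, "backgroundColor": colors}],
    }


def weekly_chart_js(canvas_id, days):
    """Return a JavaScript statement that renders one weekly bar chart."""
    config = {
        "type": "bar",
        "data": weekly_chart_data(days),
        "options": {"scales": {"y": {"min": 0, "max": 12}}},  # Chart.js 3 axis syntax
    }
    payload = json.dumps(config, ensure_ascii=False).replace("</", "<\\/")
    return f"new Chart(document.getElementById({json.dumps(canvas_id)}), {payload});"
```

Day abbreviations come from `strftime('%a')`, which is locale-dependent. Under the default C locale, it produces "Mon", "Tue", and so on.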
## Known Issues & Solutions

### Issue 1: UTF-8 Characters in Filenames

**Problem:** Some systems may have issues with UTF-8 characters (ū, ė, š, etc.) in filenames

**Current Behavior:** The script generates filenames with UTF-8 characters preserved
- Example: `rūta_2025-11-19_2026-02-17.html`

**Solution Implemented:**
- HTML uses `<meta charset="UTF-8">`
- File written with `encoding='utf-8'`
- Works correctly on modern systems

**Alternative Solution (if needed):**
```python
import unicodedata

# In generate_report(), replace:
#     safe_filename = user_name.lower().replace(' ', '_').replace('.', '_')
# with an ASCII-only version:
safe_filename = unicodedata.normalize('NFKD', user_name.lower())
safe_filename = safe_filename.encode('ascii', 'ignore').decode('ascii')  # drop combining marks
safe_filename = safe_filename.replace(' ', '_').replace('.', '_')
# This converts "Rūta" → "ruta"
```

---

### Issue 2: No Data Found for User

**Symptoms:**
```
❌ No data found for {user_name} in the specified date range
```

**Causes:**
1. User name spelling mismatch (matching is case-sensitive)
2. No CSV files in the date range
3. User not present in the CSV files

**Debug Steps:**
```python
# Add to the load_user_data() function:
print(f"Checking files for user: {user_name}")
print(f"Found {len(csv_files)} CSV files")
print(f"Date range: {start_date} to {end_date}")
```

**Solutions:**
- Verify the user name spelling (exact match required)
- Check that CSV files exist for the date range
- Try using `user_id` instead of `user_name`

---
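Issue 2 lists a spelling mismatch as one cause. With names like "Rūta", there is an extra pitfall: Unicode normalization. "ū" can be stored either as one precomposed code point or as "u" followed by a combining macron. The two forms look identical but compare unequal. The sketch below is a debugging helper that lists the spellings actually present in the CSVs, comparing names after NFC normalization and `casefold()`. `find_name_candidates` is a hypothetical name. The loose comparison is meant for diagnosis only and does not replace the documented exact match.

```python
import csv
import unicodedata
from pathlib import Path


def normalize_name(name):
    """NFC-normalize, trim and casefold a name for loose comparison."""
    return unicodedata.normalize("NFC", name).strip().casefold()


def find_name_candidates(user_name, raw_dir=Path("data/raw")):
    """List the distinct user_name spellings in raw_dir that loosely match user_name."""
    wanted = normalize_name(user_name)
    found = set()
    for csv_path in sorted(Path(raw_dir).glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                name = row.get("user_name") or ""
                if normalize_name(name) == wanted:
                    found.add(name)  # keep the exact stored spelling for --user
    return sorted(found)
```

If the helper returns a spelling that differs from the one passed to `--user`, rerunning with that exact string should resolve the mismatch without changing the matching rule.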
### Issue 3: Charts Not Displaying in Browser

**Symptoms:** HTML opens but charts are blank

**Cause:** Chart.js CDN not loading

**Solutions:**
1. Check the internet connection (the CDN requires internet access)
2. Check the browser console (F12) for errors
3. Try a different browser (Chrome recommended)

**Alternative:** Download Chart.js locally and reference the local copy
```html
<script src="path/to/chart.min.js"></script>
```

---

## Performance Metrics

### Data Volume

**Test Case:** 19 users, 91 days of data

| Metric | Value |
|--------|-------|
| Input CSV files | ~91 files per user |
| Raw data points | ~131,040 per user (1,440 per day) |
| Processing time | 2-5 seconds per user |
| Output HTML size | 100-150 KB |

### Scalability

| Period | CSV Files | Data Points | Processing Time | HTML Size |
|--------|-----------|-------------|-----------------|-----------|
| 1 month | ~30 | ~43,200 | 1-2 s | 50-70 KB |
| 3 months | ~90 | ~129,600 | 3-5 s | 100-150 KB |
| 6 months | ~180 | ~259,200 | 6-10 s | 200-300 KB |
| 12 months | ~365 | ~525,600 | 12-20 s | 400-600 KB |

**Bottlenecks:**
1. File I/O (reading CSV files)
2. HTML string generation
3. Chart.js code generation

**Optimization Opportunities:**
- Use multiprocessing for `--all` mode
- Cache user data between reports
- Compress HTML output
- Generate charts client-side from JSON data

---

## Testing

### Test Cases

#### Test 1: Single User, Default Period
```bash
python3 generate_employee_report.py --user "Rūta"
```
**Expected:**
- Loads the last 3 months of data
- Generates an HTML file
- File size ~100-150 KB
- Exit code 0

---

#### Test 2: Single User, Custom Period
```bash
python3 generate_employee_report.py --user "Rūta" --start 2025-11-01 --end 2025-12-31
```
**Expected:**
- Loads November-December 2025 data
- Generates an HTML file
- Filename includes the date range
- Exit code 0

---

#### Test 3: All Users
```bash
python3 generate_employee_report.py --all --months 3
```
**Expected:**
- Discovers all users
- Generates a report for each
- Creates `index.html`
- Prints a summary of generated reports
- Exit code 0

---

#### Test 4: User Not Found
```bash
python3 generate_employee_report.py --user "NonexistentUser"
```
**Expected:**
- Error message: "No data found for NonexistentUser"
- No HTML file created
- Exit code 0 (should probably be 1)

---
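Test 4 notes that a missing user currently exits with code 0. The sketch below shows how the rules from the Command-Line Interface section and a non-zero exit status could fit together, using only standard `argparse` features. Several parts are assumptions and not the script's current code: the mutually exclusive `--user`/`--all` group, the 1-12 range check on `--months`, converting dates to `datetime.date`, and the `run()` helper. In `run()`, the `generate` callable stands in for `generate_report()` and is assumed to return `None` when no data is found.

```python
import argparse
import sys
from datetime import datetime


def iso_date(value):
    """argparse type: parse YYYY-MM-DD into a datetime.date."""
    try:
        return datetime.strptime(value, "%Y-%m-%d").date()
    except ValueError:
        raise argparse.ArgumentTypeError(f"invalid date {value!r}, expected YYYY-MM-DD") from None


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Generate employee activity reports")
    target = parser.add_mutually_exclusive_group(required=True)
    target.add_argument("--user", help="User name to generate the report for")
    target.add_argument("--all", action="store_true", help="Generate reports for all users")
    parser.add_argument("--months", type=int, default=3, help="Number of months to include")
    parser.add_argument("--start", type=iso_date, help="Start date (YYYY-MM-DD)")
    parser.add_argument("--end", type=iso_date, help="End date (YYYY-MM-DD)")
    args = parser.parse_args(argv)
    if not 1 <= args.months <= 12:
        parser.error("--months must be between 1 and 12")  # exits with status 2
    if args.start and args.end and args.start > args.end:
        parser.error("--start must not be after --end")
    return args


def run(user_names, generate):
    """Return 0 if every report was generated, 1 if any user had no data."""
    exit_code = 0
    for name in user_names:
        if generate(name) is None:
            print(f"No data found for {name} in the specified date range", file=sys.stderr)
            exit_code = 1
    return exit_code

# In main(): sys.exit(run(users, generate_report)) so shells and cron see the failure.
```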
#### Test 5: UTF-8 Characters
```bash
python3 generate_employee_report.py --user "Rūta"
```
**Expected:**
- Handles UTF-8 correctly
- Filename: `rūta_*.html`
- HTML displays "Rūta" correctly
- Exit code 0

---

### Validation Checklist

**HTML Output:**
- [ ] File exists in `reports/` directory
- [ ] File size > 50 KB
- [ ] Opens in browser without errors
- [ ] Charts render correctly
- [ ] UTF-8 characters display correctly
- [ ] Collapsible sections work
- [ ] Print to PDF works

**Data Accuracy:**
- [ ] Total days matches expected
- [ ] Active days ≤ total days
- [ ] Hours per month reasonable (0-200)
- [ ] Start times < end times
- [ ] Activity rate 0-100%

**Edge Cases:**
- [ ] User with no data (should error gracefully)
- [ ] User with partial month
- [ ] Weekend-only data
- [ ] UTF-8 characters in name
- [ ] Very long names (>50 chars)

---

## Future Enhancements

### High Priority

1. **Error Handling Improvements**
   - Return non-zero exit code on error
   - Better error messages for common issues
   - Validate CSV file format before processing
2. **Performance Optimization**
   - Multiprocessing for `--all` mode
   - Progress bar for long operations
   - Memory optimization for large datasets
3. **Output Options**
   - Export to PDF directly
   - Export to Excel
   - JSON data export for API use

### Medium Priority

4. **Customization**
   - Custom CSS themes
   - Logo upload
   - Company branding
   - Report title customization
5. **Data Validation**
   - Check for data gaps
   - Flag unusual patterns
   - Warn about low activity rates
6. **Comparison Features**
   - Team averages
   - Department comparisons
   - Trend analysis (comparing periods)

### Low Priority

7. **Scheduling**
   - Cron integration
   - Email delivery
   - Automated monthly reports
8. **Web Interface**
   - Flask/Django app
   - Online report generation
   - Real-time updates

---

## Troubleshooting Guide

### Problem: Script crashes during generation

**Debug:**
```bash
# Unbuffered output (-u), saved to debug.log together with any traceback
python3 -u generate_employee_report.py --user "Rūta" 2>&1 | tee debug.log
```

**Check:**
- CSV file format
- Disk space
- Memory usage
- Python version (requires 3.7+)

---

### Problem: HTML file is blank or incomplete

**Causes:**
1. Script crashed mid-generation
2. File write permissions
3. Disk full

**Solutions:**
```bash
# Check file sizes and permissions
ls -lh reports/

# Make the reports readable if permissions are the problem
chmod 644 reports/*.html

# Check free disk space
df -h
```

---

### Problem: Charts don't load

**Causes:**
1. No internet connection (Chart.js is loaded from a CDN)
2. JavaScript errors
3. Browser compatibility

**Solutions:**
1. Open the browser console (F12)
2. Check for errors
3. Try Chrome
4. Check the internet connection

---

## Code Maintenance

### Dependencies

**Python Standard Library:**
- `csv` - CSV file reading
- `argparse` - CLI argument parsing
- `json` - JSON data encoding
- `pathlib` - File path handling
- `datetime` - Date calculations
- `collections.defaultdict` - Data grouping

**External:**
- Chart.js 3.9.1 (CDN) - Charts in HTML

**No `pip install` required:** the script uses only the standard library.

---

### Code Style

**Conventions:**
- Function names: snake_case
- Variables: snake_case
- Constants: UPPER_SNAKE_CASE
- Docstrings: NumPy style
- Max line length: 100 characters

---

### File Structure

```python
#!/usr/bin/env python3
"""Module docstring"""

# Imports
import argparse
import csv
from pathlib import Path
# ...

# Constants
RAW_DATA_DIR = Path("data/raw")
OUTPUT_DIR = Path("reports")

# Helper functions
def get_date_range(months, start_date, end_date): ...
def load_user_data(user_name, start_date, end_date): ...
def analyze_daily_activity(data): ...
def group_by_week(daily_stats): ...
def group_by_month(daily_stats): ...
def calculate_time_patterns(daily_stats): ...
def generate_html_report(user_name, daily_stats, months_data, time_patterns, start_date, end_date): ...

# Main functions
def generate_report(user_name, months, start_date, end_date): ...
def get_all_users(): ...
def create_index_file(report_paths): ...

# CLI
def main():
    parser = argparse.ArgumentParser(...)
    # ...

if __name__ == "__main__":
    main()
```

---

## Integration with Existing System

### Related Scripts

1. **create_master_file.py** - Generates master data files
2. **simple_dashboard.html** - Interactive dashboard
3. **check_presence_team.py** - Data collection (cron)

### Workflow Integration

```
Cron (every minute)
        ↓
check_presence_team.py
        ↓
data/raw/*.csv
        ↓
generate_employee_report.py   ← YOU ARE HERE
        ↓
reports/*.html
```

### Data Sharing

- Both systems read from the `data/raw/` directory
- No conflicts (read-only access)
- Can run simultaneously
- Independent of each other

---

## Quick Reference

### Generate Single Report
```bash
python3 generate_employee_report.py --user "Employee Name" --months 3
```

### Generate All Reports
```bash
python3 generate_employee_report.py --all --months 3
```

### View Reports
```bash
open reports/index.html
```

### File Locations
- Script: `generate_employee_report.py`
- Input: `data/raw/*.csv`
- Output: `reports/*.html`

---

## Current Status (as of last update)

✅ **Working:**
- Single user report generation
- Multi-month reports (1-12 months)
- UTF-8 character support (tested with "Rūta")
- All employees mode
- Custom date ranges
- Interactive HTML with charts
- Print to PDF functionality

❌ **Known Issues:**
- None currently

🔄 **In Progress:**
- Documentation for CLI agent handoff

📋 **Next Steps:**
- Continue development in VSCode with CLI agent
- Possible enhancements as needed
- Bug fixes as discovered

---

## Contact & Support

**Developer:** AI Assistant (Claude)

**User:** Tomas (Head of Development, TravelTime Technologies)

**Project:** Slack Presence Tracker - Employee Reporting Module

**Date:** February 2026

---

## Handoff Notes for CLI Agent

### What Works

The script is **fully functional** and tested. The most recent test generated a report for user "Rūta" covering November 2025 to February 2026 (4 months, 91 days). The HTML file was generated successfully and is valid.

### What You Need to Know

1. **UTF-8 Support:** The system correctly handles international characters
2. **Data Format:** CSV files in `data/raw/` with a specific naming convention
3. **Output:** Self-contained HTML files with embedded CSS/JS
4. **Charts:** Require internet access for the Chart.js CDN

### Where to Focus

If the user requests changes:
- **Performance:** Currently single-threaded; could use multiprocessing
- **Error Handling:** Could be more robust
- **Validation:** Could validate data quality more thoroughly
- **Customization:** Could add themes, logos, branding

### Testing

Always test with:
- UTF-8 characters in names
- Edge cases (no data, partial months)
- Different date ranges
- Multiple users

**Good luck with development!** 🚀